It happens (sometimes) in life that you are very close to something/someone(? :-) and because of that you cannot grasp the full extent of your opportunities. Taking a few steps back might help, if you train yourself to think outside the box; otherwise, taking a break of a few years will definitely bring the "argh, I could have done this & that" moments :-)
Which brings us to the subject: I could have done it in 2000 (maybe), I could have done it in 2004 (definitely, maybe :-), or I'll just do it now (2008).
The task at hand was/is quite daunting, but one can break it down into a few "easy" steps:
1. build your own web crawler/robot
2. feed it with prior art or just Google
3. let it crunch the data/web for a while or two :-)
4. (hopefully your robot behaves and no one will ban you; sorry guys)
5. extract the www sites from the stack
6. perform some IP magic/geo tagging on them
7. find a good/nice way to plot/display your data
I guess that by the time one reaches step 6, the chances of finishing the job go above 80% :-)
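For the curious, here is a minimal sketch of the crawl and "extract the www sites" steps in Python. It is not the robot I actually ran, just an illustration under a few assumptions: `crawl`, `extract_sites`, `LinkParser` and the `fetch` callback are names made up for this example, and `fetch` is passed in so that you can plug in a polite downloader (one that honours robots.txt and rate limits) or, as here, a tiny fake web. The IP/geo tagging step is left out because it depends on whichever geo-IP database you have at hand.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from `seeds`.

    `fetch(url)` is a caller-supplied (hypothetical) function returning the
    page HTML as a string, or None if the page could not be retrieved.
    Returns every http(s) URL seen, fetched or not.
    """
    queue = deque(seeds)
    seen = set(seeds)
    while queue and max_pages > 0:
        url = queue.popleft()
        max_pages -= 1
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def extract_sites(urls):
    """Reduce a pile of URLs to the distinct host names behind them."""
    hosts = (urlparse(u).hostname for u in urls)
    return {h for h in hosts if h}


# A three-page fake web, so the sketch runs without touching the network.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">x</a> <a href="http://b.example/page">y</a>',
    "http://a.example/about": '<a href="mailto:me@a.example">mail</a>',
    "http://b.example/page": "",
}

sites = extract_sites(crawl(["http://a.example/"], FAKE_WEB.get))
print(sorted(sites))  # ['a.example', 'b.example']
```

Swapping `FAKE_WEB.get` for a real downloader is where the "hopefully your robot behaves" part comes in.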
Update: the data was interpolated/reduced (depending on how you look at it :-) to ~50K geo locations that follow the initial distribution found in the collected data set. I have neither the time nor the excessive passion/knowledge(? :-) to run the numbers in a complete statistical way. It's like what I learned the other day: it's better to have some data to question than to have no data at all.
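One possible way to do such a reduction, sketched in Python: bucket the points into a coarse lat/lon grid and keep a share of each bucket proportional to its size. This is an assumption for illustration, not necessarily the procedure used for the plot; `reduce_points`, `cell_deg` and `seed` are hypothetical names and parameters.

```python
import random
from collections import defaultdict


def reduce_points(points, target, cell_deg=1.0, seed=0):
    """Downsample (lat, lon) points to roughly `target` points.

    Each grid cell of `cell_deg` degrees keeps a number of points
    proportional to its share of the full data set, so dense regions
    stay dense and sparse regions stay sparse.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        cells[key].append((lat, lon))

    rng = random.Random(seed)  # fixed seed keeps the output repeatable
    total = len(points)
    reduced = []
    for key in sorted(cells):
        members = cells[key]
        quota = round(target * len(members) / total)
        reduced.extend(rng.sample(members, min(quota, len(members))))
    return reduced
```

Because each cell's quota is rounded separately, the result only lands near `target`, and cells holding a very small share of the data can round down to zero and vanish from the map.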