This is a bit wonkish, as Krugman might say, but bear with me; you may find this interesting:
We moved efolkMusic to a new server in December of last year, and due to a documented bug in our CMS (content management system; ours is Joomla) we inadvertently created around 1.7 million (give or take a few thousand) irrelevant, yet working, URLs.
These “internal links” (on our site and pointing to pages on our site) were, of course, “crawled” by Google’s robots, and since they did return working pages, they continued to be crawled, relentlessly, incessantly, and way too frequently for our server to keep up. We were brought to our knees.
We’re not a big site, in the scheme of things: about 4,000 pages, with 4,000–5,000 human visitors each month. Google and friends, however, never sleep, and were hitting us close to 50,000 times a day. (Hey, it even takes a robot a little time to check on those 1.7 million pages.)
Everything slowed down except our server usage (and monthly bill). So I appealed to Google to delete their cache of our site and re-index.
Well, they won’t do that. My only alternative (to date) is to ban ALL robots from the website (which you do with a little file called “robots.txt”) and come what may.
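For readers who haven't met it, the site-wide ban is only two lines in a robots.txt file at the web root. A less drastic option, if the bogus Joomla URLs all share a recognizable prefix, would be to block just that pattern; the "/component/" path below is purely hypothetical, since the post doesn't say what the runaway URLs look like:

```
# robots.txt — served from the site root, e.g. https://example.com/robots.txt
# Ban ALL compliant crawlers from the entire site:
User-agent: *
Disallow: /

# Less drastic alternative (commented out): block only the runaway
# URL pattern. The "/component/" prefix is a hypothetical example —
# substitute whatever the bogus Joomla URLs actually start with.
# User-agent: *
# Disallow: /component/
```

Note that robots.txt is purely advisory: well-behaved crawlers like Googlebot honor it, but it doesn't stop anyone who chooses to ignore it.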
This stinks, of course, but we are just the little guy against the six-trillion-ton gorilla. Maybe you can help? I’m thinking some viral spreading of the story would be the best remedy; maybe it is possible to exist WITHOUT GOOGLE!! Hell, it might even put us on the map.
PS Hackers: I suggest a “random link generator” on enough choice sites could bring down this never-delete-anything-because-information-is-money oligarchy. How else are we ever going to rid the internet of all the stupid things we did or said that Google is perpetuating?