xangis — 2014-05-25T22:43:03-04:00 — #1
A couple of my sites used to get a lot of spambot traffic, mostly from XRumer, along with a bunch of well-known exploit attempts, including some dictionary attacks. Almost all of these attacks target well-known PHP applications (Drupal, WordPress, phpMyAdmin, etc.). None of the sites use PHP, but the bots don't care.
Almost all of this traffic was from servers in Fujian Province in China or from servers on OVH Systems, so I blocked those IP ranges.
I'm not worried about any negative effects from blocking the Chinese servers, but is it likely that being a black hole to OVH will ever matter? The concern I have is that other legitimate sites might have crawlers/indexers/aggregators that could fail to find and include me and result in less traffic over the long term. It seems like a small possibility, but I have to wonder -- has anyone else seen significant negative impact to useful traffic after blocking swaths of useless traffic?
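For anyone wanting to try the same range-based blocking at the application level, here is a minimal sketch using Python's standard `ipaddress` module. The ranges below are placeholders taken from the RFC 5737 documentation blocks, not the actual OVH or Fujian ranges, and `is_blocked` is a hypothetical helper name; in practice this kind of blocking is usually done at the firewall or web server instead.

```python
import ipaddress

# Placeholder CIDR ranges (RFC 5737 documentation blocks).
# These are assumptions for illustration, NOT real OVH or Fujian ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocked(addr: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_RANGES)
```

Keeping the ranges in one list makes it easy to remove a single network later if blocking it turns out to cost useful traffic.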
eastcoast — 2014-05-26T19:25:12-04:00 — #2
At a previous job where I managed high-traffic international news websites, I faced the same decision. The concern was that the occasional proxy user would be inconvenienced, but analysis of the Apache log files suggested a substantial payoff: lower bandwidth and, above all, higher reliability from eliminating bot spider spikes. After blocking the IP ranges, Google Analytics didn't even twitch (primarily because it only records real traffic generated by real people), whereas server load and bandwidth showed useful savings. I'm not convinced there are any reputable crawlers or aggregators on the likes of OVH; it's primarily web trash and parasites you're better off without.
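A rough sketch of the kind of log analysis described above, assuming the logs are in Apache's Common or Combined Log Format. It tallies requests for `.php` paths per client IP, which on a site that serves no PHP are almost certainly probes. The function name `count_php_probes` and the regular expression are illustrative assumptions, not the analysis actually used at the time.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
# client IP, identd, user, [timestamp], "METHOD path protocol"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')


def count_php_probes(lines):
    """Count requests for .php paths per client IP."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, _method, path = match.groups()
        # Ignore any query string when checking the file extension.
        if path.split("?", 1)[0].lower().endswith(".php"):
            hits[ip] += 1
    return hits
```

Calling `count_php_probes(open("access.log"))` and then `.most_common(20)` on the result gives a quick list of the noisiest addresses, which can then be grouped by network before deciding what to block. The file name here is a placeholder.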
xangis — 2014-05-26T23:51:46-04:00 — #3
Thank you, that's reassuring. Are there other networks you had a lot of trouble with that it might be wise to pre-emptively block before they notice me?
eastcoast — 2014-05-28T18:53:09-04:00 — #4
Hetzner are another major European provider worth blocking, though pretty much all of the large VPS/dedicated hosts that occupy the lower end of the market are going to be a source of trash spidering.