Got a warning for my blog going over 100GB in bandwidth this month… which sounded incredibly unusual. My blog is text and a couple of images, and I haven’t posted anything to it in ages… like how would that even be possible?
Turns out it’s possible when you have crawlers going apeshit on your server. Am I even reading this right? 12,181 with 181 zeros at the end for ‘Unknown robot’? This is actually bonkers.


That’s counting on one machine using the same cookie session continuously, or on them coding up a way to share the tokens across machines. That’s not how the bot farms work.
It will obviously depend heavily on the type of bot doing the crawling, but that isn’t hard coordination for an operation harvesting data for LLMs. They already have strategies to keep nodes from all crawling the same thing, and a simple Valkey cache can store a solved JWT.
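A minimal sketch of what that sharing could look like, assuming an Anubis-style gate that hands back a signed JWT in a cookie once a challenge is solved. The cache key, cookie name, TTL, and `solve_challenge()` helper are all hypothetical, and the cache is reached through the redis-py client, which speaks the same wire protocol a Valkey server does:

```python
# Hypothetical sketch: many crawler nodes reusing one solved anti-bot token
# via a shared Valkey cache. Names here are placeholders, not any real
# service's API.
import redis      # redis-py is protocol-compatible with a Valkey server
import requests

cache = redis.Redis(host="valkey.internal", port=6379, decode_responses=True)

TOKEN_KEY = "solved-jwt:example-blog.com"   # one entry per target site
TOKEN_TTL = 6 * 60 * 60                     # reuse the token for ~6 hours


def solve_challenge(site: str) -> str:
    """Stand-in for whatever expensive step the gate requires
    (proof of work, JS challenge, etc.). Only one node ever runs this."""
    return "dummy.jwt.token"  # placeholder value for the sketch


def get_shared_token(site: str) -> str:
    token = cache.get(TOKEN_KEY)
    if token is None:
        # First node to notice the missing token pays the solve cost once...
        token = solve_challenge(site)
        # ...then every other node picks it up from the cache. A real setup
        # would add locking (e.g. SET NX) so two nodes don't solve at once.
        cache.set(TOKEN_KEY, token, ex=TOKEN_TTL)
    return token


def fetch(url: str) -> requests.Response:
    # Present the shared token as the session cookie the gate expects.
    token = get_shared_token("example-blog.com")
    return requests.get(url, cookies={"challenge-token": token})
```

The point is how little machinery it takes: one cache round trip per request, and the per-node cost of the challenge effectively disappears.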