AI companies are violating a basic social contract of the web and ignoring robots.txt

Andy Reid@lemmy.world · 10 months ago

AI companies are violating a basic social contract of the web and ignoring robots.txt

palordrolap@kbin.social · 10 months ago

Put something in robots.txt that isn’t supposed to be hit and is hard to hit by non-robots. Log and ban all IPs that hit it.

Imperfect, but can’t think of a better solution.
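
A minimal sketch of that trap idea in Python, for illustration only. The trap path, the `TrapBanList` class, and the status codes are assumptions made up for this example, not something from a specific server or library. The path is listed under `Disallow` in robots.txt, so a crawler that honors the file never requests it; any IP that does request it gets logged and banned.

```python
from dataclasses import dataclass, field

# Hypothetical trap path. It should not be linked anywhere a human would click.
TRAP_PREFIX = "/trap-do-not-crawl/"

# robots.txt content that tells every crawler to stay away from the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"


@dataclass
class TrapBanList:
    """Tracks IPs that requested the disallowed trap path."""
    banned: set[str] = field(default_factory=set)
    log: list[tuple[str, str]] = field(default_factory=list)

    def check(self, ip: str, path: str) -> int:
        """Return the HTTP status to send for this request."""
        if ip in self.banned:
            return 403  # already banned
        if path.startswith(TRAP_PREFIX):
            # Only something ignoring robots.txt should land here.
            self.log.append((ip, path))
            self.banned.add(ip)
            return 403
        return 200  # normal request, serve as usual
```

In a real setup `check` would sit in front of the request handler, and the ban list would probably be pushed to a firewall or persisted somewhere. It stays imperfect: crawlers that rotate IPs get around it, and shared IPs can catch innocent users.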

KillingTimeItself@lemmy.dbzer0.com · 10 months ago

hmm, i thought websites just blocked crawler traffic directly? I know one site in particular has rules about it, and will even go so far as to ban you permanently if you continually ignore them.

Bogasse@lemmy.ml · 10 months ago

Detecting crawlers can be easier said than done 🙁

KillingTimeItself@lemmy.dbzer0.com · 10 months ago

i mean yeah, but at a certain point you just have to accept that it’s going to be crawled. The obviously negligent ones are easy to block.

The rise and fall of robots.txt