Neocities and AI

Jan 21, 2025
Tags: neocities blog ai

Is no policy better than a bad policy?

Today Neocities decided to roll out their AI policy, which amounts to an open-sourced robots.txt file now included with every new website. Anyone who has been using Neocities as a web host with any seriousness has undoubtedly already made one of these, as well as thrown the necessary (and most likely ignored) meta tags into the head of their static pages. It's one of the first things I researched when getting a website set up. When I looked it up six months ago, the basic unfortunate reality of robots.txt was that most scrapers will kindly choose to ignore it. This isn't a binding agreement between you and the robots that scour the internet looking for content, and that content is anything robots scrape for, not just AI training data. They look for email addresses, keywords for SEO (search engine optimization), pictures to ingest into their image generation bots, text for their chatbots, etc. It's a never-ending problem, and scrapers have been around almost as long as the internet.
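For anyone who hasn't set one up yet, here's a rough sketch of the kind of thing I mean. The user agents below are the publicly documented crawlers for OpenAI, Anthropic, and Common Crawl; there are plenty more, so treat this as a starting point, not a complete list:

    # robots.txt
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

And the meta tag, which goes in the head of each page. The noai and noimageai values are a community convention rather than any kind of standard, which is part of why they get ignored:

    <meta name="robots" content="noai, noimageai">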

But this response from Neocities, who is my web host (and yours, if you accepted their offer of making the old web new again), is a white flag! Oh no! We cannot possibly aid you in protecting your websites from these (admittedly) nefarious actors. I am not extremely technical; this website is the most code I've ever written in my life, and it's been a wild ride! I do not know how to appropriately tackle the problem of billion-dollar AI companies coming to your websites to harvest your data. However, I could do some research and possibly consult the many people who have written extensively about this topic. This is their business! And they are promising a better web for the nearly 1,000,000 websites they now host.

I don’t think Neocities is alone in this conundrum. “AI,” as the industry and world at large have decided to call it, is being shoehorned into literally everything we see. And the more they use it, the more they force it into the most day-to-day areas, the more scraping will happen. I just want more than a head-in-the-sand approach to this issue. Can we stand up as a group and say “no more”? There is a program called Pi-hole that blocks ads and trackers using blacklists sourced from multiple organizations. It is interesting to envision a crowdsourced blacklist of AI bot IP addresses in the same way. There needs to be a pool of people who aren’t beholden to AI the way so many are, and those pools need to use the amazing ingenuity that so often happens on the internet to stand against this nonsense and protect all of our shit! Or we just lean into it and poison that well, and poison it big time.
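To be clear, only the host can actually enforce something like that, since the rest of us just upload static files. But as a sketch of what it could look like on their end (assuming an nginx-style server, the same publicly documented crawler names as above, and a hypothetical shared blocklist file that a community could maintain):

    # Tag requests from known AI crawlers by user agent
    map $http_user_agent $is_ai_bot {
        default        0;
        "~*GPTBot"     1;
        "~*ClaudeBot"  1;
        "~*CCBot"      1;
    }

    server {
        # Hypothetical crowdsourced file of deny rules for known bot IP ranges
        include /etc/nginx/ai-bot-blocklist.conf;

        # Refuse tagged crawlers outright instead of politely asking them to leave
        if ($is_ai_bot) {
            return 403;
        }
    }

The point isn't this exact config; it's that refusing at the server is the difference between asking nicely with robots.txt and actually saying no.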