# iocaine
The deadliest poison known to AI.
Iocaine is a defense mechanism against unwanted scrapers, sitting between upstream resources and the fronting reverse proxy. It is designed to significantly reduce the load caused by the relentless onslaught of crawlers (mostly originating from various AI companies), in a way that does not place an undue burden on benign visitors. While iocaine does support presenting a proof-of-work challenge, that should be a last resort; even if that path is taken, the challenge is inexpensive for a human visitor.
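To give a rough idea of why such a challenge stays cheap for a human, here is a minimal, hypothetical sketch of a hash-based proof-of-work check in Rust. It is not iocaine's actual implementation: the challenge string, difficulty, and function names are invented for illustration, and it assumes the `sha2` crate.

```rust
// A minimal sketch of a hash-based proof-of-work check, only to illustrate why
// such a challenge is cheap for a human visitor at low difficulty.
// NOT iocaine's actual implementation; names and parameters are made up.
// Assumes the `sha2` crate as a dependency.
use sha2::{Digest, Sha256};

/// Count the leading zero bits of SHA-256(challenge || nonce).
fn leading_zero_bits(challenge: &str, nonce: u64) -> u32 {
    let digest = Sha256::digest(format!("{challenge}{nonce}").as_bytes());
    let mut bits = 0;
    for byte in digest {
        if byte == 0 {
            bits += 8;
        } else {
            bits += byte.leading_zeros();
            break;
        }
    }
    bits
}

fn main() {
    let challenge = "example-challenge"; // would come from the server
    let difficulty = 12; // ~4096 hashes on average: trivial for a browser
    let nonce = (0u64..)
        .find(|&n| leading_zero_bits(challenge, n) >= difficulty)
        .unwrap();
    println!("solved with nonce {nonce}");
    assert!(leading_zero_bits(challenge, nonce) >= difficulty);
}
```

At a difficulty of around 12 bits, a visitor's browser needs on the order of a few thousand hashes, a fraction of a second of work, while the same cost multiplied across millions of requests adds up for a crawler.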
Originally, iocaine started as a garbage generator, a tarpit, a place where the reverse proxy routed unwanted visitors so that they'd crawl an infinite maze of garbage. Since then, it has grown into a more ambitious project, and now includes a scripting engine that lets the operator tell the software how to treat each incoming request. The idea remained the same, however: keep the crawlers in the maze, and let the backend serve the real content to benign visitors.
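As a purely illustrative sketch of the kind of per-request decision an operator can express, here is a tiny Rust example. The enum, function, and patterns below are invented; the real scripting engine and its API are documented on the project website.

```rust
// A hypothetical sketch of per-request routing, only to illustrate the idea
// described above; iocaine's real scripting engine looks different.

/// What to do with an incoming request.
enum Decision {
    /// Send the request into the garbage maze.
    Garbage,
    /// Present a proof-of-work challenge first.
    Challenge,
    /// Pass the request through to the real backend.
    Backend,
}

/// Decide how to treat a request, based on its user agent and path.
/// The patterns here are placeholders, not a recommended block list.
fn decide(user_agent: &str, path: &str) -> Decision {
    let ua = user_agent.to_ascii_lowercase();
    if ua.contains("gptbot") || ua.contains("ccbot") {
        Decision::Garbage
    } else if path.starts_with("/expensive/") {
        Decision::Challenge
    } else {
        Decision::Backend
    }
}

fn main() {
    match decide("Mozilla/5.0 (compatible; GPTBot/1.0)", "/blog/") {
        Decision::Garbage => println!("into the maze"),
        Decision::Challenge => println!("prove some work first"),
        Decision::Backend => println!("served by the backend"),
    }
}
```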
Unlike some similar tools, iocaine does not try to make the bad bots go away; it welcomes them into an infinite maze of garbage. It does so because that makes it possible to serve them poisoned URLs, which in turn makes it possible to identify bad actors even if they come back piggybacking on a real browser. By filling their queue with junk when they visit with their simpler collectors, we can capture the browser-based scrapers in the maze too.
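The following is a hypothetical Rust sketch of the poisoning idea, not iocaine's actual mechanism: links served inside the maze carry tokens that no legitimate page ever links to, so a later request for such a URL, even from a real browser, gives the client away. The struct, token, and path format are invented for illustration.

```rust
// A hypothetical sketch of URL poisoning; iocaine's real mechanism differs.
use std::collections::HashSet;

/// Tokens handed out inside the garbage maze.
struct PoisonedUrls {
    issued: HashSet<String>,
}

impl PoisonedUrls {
    fn new() -> Self {
        Self { issued: HashSet::new() }
    }

    /// Mint a poisoned link to embed in a generated garbage page.
    fn mint(&mut self, token: &str) -> String {
        self.issued.insert(token.to_string());
        format!("/maze/{token}/index.html")
    }

    /// Does this request path contain a token we previously handed out?
    fn is_poisoned(&self, path: &str) -> bool {
        self.issued.iter().any(|t| path.contains(t.as_str()))
    }
}

fn main() {
    let mut poison = PoisonedUrls::new();
    let link = poison.mint("a1b2c3");
    // Later, even a real browser requesting this link identifies the visitor
    // as something that previously crawled the maze.
    assert!(poison.is_poisoned(&link));
    assert!(!poison.is_poisoned("/blog/real-post/"));
}
```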
The garbage generator in iocaine has been engineered to require as few resources as possible, keeping it cheap: pretty much on par with serving a static file from the filesystem. The goal is a tool that costs next to nothing for the operator, reduces the load on upstream services and protects them from the crawlers, has no (or at worst, very little) effect on legitimate visitors, and makes life difficult for crawler operators at the same time.
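To illustrate why generation can be that cheap, here is a toy, standard-library-only Rust sketch of deterministic garbage seeded from the request path: no per-request state is stored, and identical requests always yield identical output. The wordlist and the xorshift step are placeholders; iocaine's real generator is more elaborate.

```rust
// A toy sketch of cheap, deterministic garbage generation, seeded from the
// request path. NOT iocaine's actual generator; wordlist and PRNG are made up.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const WORDS: &[&str] = &["iocaine", "powder", "odorless", "tasteless", "dissolves", "instantly"];

/// Produce `count` pseudo-random words, always the same ones for the same path.
fn garbage_for(path: &str, count: usize) -> String {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    let mut state = hasher.finish();
    (0..count)
        .map(|_| {
            // xorshift64 step: a tiny, allocation-free PRNG
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            WORDS[(state % WORDS.len() as u64) as usize]
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Identical requests get identical garbage, with no stored state at all.
    assert_eq!(garbage_for("/maze/a1b2c3/", 8), garbage_for("/maze/a1b2c3/", 8));
    println!("{}", garbage_for("/maze/a1b2c3/", 8));
}
```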
For more information about what this is, how it works, and how to deploy it, have a look at the dedicated website.
Let's make AI poisoning the norm. If we all do it, they won't have anything to crawl.