iocaine
=======

[![Build status][ci:badge]][ci:url]
[![Container image][oci:badge]][oci:url]
[![Demo][demo:badge]][demo:url]

 [ci:badge]: https://git.madhouse-project.org/algernon/iocaine/actions/workflows/build.yaml/badge.svg?style=for-the-badge&label=CI
 [ci:url]: https://git.madhouse-project.org/algernon/iocaine/actions/workflows/build.yaml/runs/latest
 [oci:badge]: https://img.shields.io/badge/container-latest-blue?style=for-the-badge
 [oci:url]: https://git.madhouse-project.org/algernon/-/packages/container/iocaine/latest
 [demo:badge]: https://img.shields.io/badge/demo-iocaine-seagreen?style=for-the-badge
 [demo:url]: https://poison.madhouse-project.org/

> The deadliest poison known to AI.

This is a tarpit, modeled after [Nepenthes](https://zadzmo.org/code/nepenthes/), intended to catch unwelcome web crawlers, but with a slightly different, more aggressive intended usage scenario. The core idea is to configure a reverse proxy to serve content generated by `iocaine` to AI crawlers, but normal content to every other visitor. This differs from Nepenthes, where the idea is to link to it, and trap crawlers that way. With `iocaine`, the trap is instead laid by the reverse proxy.

`iocaine` does not try to slow crawlers. It does not try to waste their time that way; that is left up to the reverse proxy. `iocaine` is *purely* about generating garbage.

To give you an idea how it works, check the [demo][demo:url], or peek into the [deployment documentation](docs/deploying.md#configuring-the-reverse-proxy). If you wish to know more about how this works, see [docs/how-it-works.md](docs/how-it-works.md).

## Warning

This is deliberately malicious software, intended to cause harm. Do not deploy if you aren't fully comfortable with what you are doing. LLM scrapers are relentless and brutal; they *will* place additional burden on your server, even if you only serve static content. With `iocaine`, even more computing power will be used.
It's *highly* recommended to implement rate limits at the reverse proxy level, such as with the [caddy-ratelimit](https://github.com/mholt/caddy-ratelimit) plugin, if using Caddy.

Entrapment is done by the reverse proxy. Anything that ends up being served by `iocaine` will be trapped there: there are no outgoing links. Be careful what you route towards it.

## Installation

`cargo install --path .`

Or, if you prefer Docker, an [image][oci:url] is available.

If you're on NixOS, this repository is a flake, and provides a NixOS module to help deploy it. See [here](https://pages.madhouse-project.org/algernon/infrastructure.org/eru_services_iocaine) for how to use that.

Expected usage is to hide the tarpit behind a reverse proxy like `nginx` or `Caddy`, and delegate the trapping to them; see the [deployment documentation](docs/deploying.md).

## Configuration

`iocaine` can be configured via a TOML-format configuration file, or via the environment. Almost everything has sane defaults, but providing a word list and at least one source for the Markov generator is **required**.

The configuration file is split into three main sections: [`[server]`](#server), [`[sources]`](#sources), and [`[generator]`](#generator).

### `[server]`

The `[server]` section is used to configure the address and port the server will listen on, via the `bind` property. The default is shown below:

``` toml
[server]
bind = "127.0.0.1:42069"
```

This parameter is available as `IOCAINE_SERVER__BIND` when configuring via environment variables.

### `[sources]`

The `[sources]` section is the only section without defaults; specifying both options here is mandatory.

``` toml
[sources]
words = "/usr/share/dict/wamerican.txt"
markov = ["/var/lib/iocaine/markov/bee-movie.txt", "/var/lib/iocaine/markov/moby-dick.txt"]
```

The first option, `words`, refers to a word list file, with one word per line. When generating links, the *path* of the link will be a word chosen from this word list.

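
As a rough illustration of the word list format described above, the sketch below builds a tiny one-word-per-line file and picks one entry from it at random, the way a link path might be chosen. The three words, the temporary file, and the `/word/` path shape are all made up for this example, and this is not how `iocaine` itself is implemented. It assumes GNU coreutils (`shuf`) is available.

```shell
# Hypothetical three-word list, one word per line, stored in a temporary file.
wordlist="$(mktemp)"
printf '%s\n' apple banana cherry > "$wordlist"

# Pick one word at random and use it as a link path.
# The "/word/" shape is an assumption made for illustration only.
word="$(shuf -n 1 "$wordlist")"
path="/$word/"
echo "$path"
```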
The second option, `markov`, is a list of files used to train the Markov chain generator. These will be used to generate the main content.

These parameters are available as `IOCAINE_SOURCES__WORDS` and `IOCAINE_SOURCES__MARKOV`, respectively, when configuring via environment variables. Do note that if configuring `iocaine` this way, the `IOCAINE_SOURCES__MARKOV` environment variable *must* be a TOML list: `IOCAINE_SOURCES__MARKOV='["/var/lib/iocaine/markov/bee-movie.txt"]'`.

### `[generator]`

The `[generator]` section is used to describe how garbage is generated: how many paragraphs are produced per page, how many words they may have, how many links to place, and whether to add a "Back" link at the top. It looks like this, with defaults shown:

``` toml
[generator.markov.paragraphs]
min = 1
max = 1

[generator.markov.words]
min = 10
max = 420

[generator.links]
min = 2
max = 5
backlink = true

[generator]
initial_seed = ""
```

When configuring through environment variables, these settings are available via `IOCAINE_GENERATOR__MARKOV__PARAGRAPHS__MIN`, `IOCAINE_GENERATOR__MARKOV__PARAGRAPHS__MAX`, `IOCAINE_GENERATOR__MARKOV__WORDS__MIN`, `IOCAINE_GENERATOR__MARKOV__WORDS__MAX`, `IOCAINE_GENERATOR__LINKS__MIN`, `IOCAINE_GENERATOR__LINKS__MAX`, `IOCAINE_GENERATOR__LINKS__BACKLINK`, and `IOCAINE_GENERATOR__INITIAL_SEED`, respectively.

## License & copyright

`iocaine` is © 2025 Gergely Nagy, with code adapted from [lipsum](https://github.com/mgeisler/lipsum) by [Martin Geisler](https://github.com/mgeisler), and is released under the [MIT](LICENSES/MIT.txt) license.

A lot of `iocaine` has been inspired by [Nepenthes](https://zadzmo.org/code/nepenthes/), but shares no code with it, just ideas.

## See Also

Similar software you might be interested in, because the more attempts at poisoning AI, the merrier:

- [Nepenthes](https://zadzmo.org/code/nepenthes/)
- [Quixotic](https://marcusb.org/hacks/quixotic.html)
- [marko](https://codeberg.org/timmc/marko/)
- [Poison the WeLLMs](https://codeberg.org/MikeCoats/poison-the-wellms)
- [django-llm-poison](https://github.com/Fingel/django-llm-poison)
- [konterfai](https://codeberg.org/konterfai/konterfai)
- [caddy-defender](https://github.com/JasonLovesDoggo/caddy-defender)

Let's make AI poisoning the norm. If we all do it, they won't have anything to crawl.