Gergely Nagy 85e6f4f66f Typo fix
Signed-off-by: Gergely Nagy <me@gergo.csillger.hu>
2025-01-19 21:39:52 +01:00

iocaine


The deadliest poison known to AI.

This is a tarpit, modeled after Nepenthes, intended to catch unwelcome web crawlers, but with a slightly different, more aggressive intended usage scenario. The core idea is to configure a reverse proxy to serve content generated by iocaine to AI crawlers, and normal content to every other visitor. This differs from Nepenthes, where the idea is to link to the tarpit and trap crawlers that way. With iocaine, the trap is laid by the reverse proxy instead.
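The routing decision described above can be sketched in a few lines. This is only an illustration of the idea, not a reverse proxy configuration: the `pick_upstream` helper, the list of user agent markers, and the `SITE_UPSTREAM` address are hypothetical, and matching on the `User-Agent` header is an assumption about how crawlers might be recognized. The iocaine address is the default bind address from the [server] section. The deployment documentation covers actual reverse proxy setups.

```python
# Illustrative sketch only: a real deployment makes this decision in the
# reverse proxy, not in Python.

# Hypothetical, incomplete list of substrings seen in AI crawler user agents.
AI_CRAWLER_MARKERS = ("GPTBot", "ClaudeBot", "CCBot")

IOCAINE_UPSTREAM = "127.0.0.1:42069"  # iocaine's default bind address
SITE_UPSTREAM = "127.0.0.1:8080"      # hypothetical address of the real site


def pick_upstream(user_agent: str) -> str:
    """Return the backend that should answer a request with this user agent."""
    ua = user_agent.lower()
    if any(marker.lower() in ua for marker in AI_CRAWLER_MARKERS):
        # Crawlers get generated garbage.
        return IOCAINE_UPSTREAM
    # Everyone else gets the normal content.
    return SITE_UPSTREAM


print(pick_upstream("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(pick_upstream("Mozilla/5.0 (X11; Linux x86_64) Firefox/134.0"))
```

Anything routed to the iocaine upstream stays there, so a marker list that is too broad would trap legitimate visitors as well.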

iocaine does not try to slow crawlers. It does not try to waste their time that way - that is left up to the reverse proxy. iocaine is purely about generating garbage.

To give you an idea how it works, check the demo, or peek into the deployment documentation. If you wish to know more about how this works, see docs/how-it-works.md.

Warning

This is deliberately malicious software, intended to cause harm. Do not deploy it if you aren't fully comfortable with what you are doing. LLM scrapers are relentless and brutal: they will place additional burden on your server even if you only serve static content, and generating garbage with iocaine uses additional computing power on top of that. It's highly recommended to implement rate limits at the reverse proxy level, such as with the caddy-ratelimit plugin if you use Caddy.

Entrapment is done by the reverse proxy. Anything that ends up being served by iocaine will be trapped there: there are no outgoing links. Be careful what you route towards it.

Installation

cargo install --path .

Or, if you prefer Docker, an image is available. If you're on NixOS, this repository is a flake, and provides a NixOS module to help with deploying it. See here for how to use that.

Expected usage is to hide the tarpit behind a reverse proxy like nginx or Caddy, and delegate the trapping to it. See the deployment documentation for details.

Configuration

iocaine can be configured via a TOML-format configuration file, or via the environment. Almost everything has sane defaults, but a word list and at least one source for the Markov generator must be provided.

The configuration file is split into three main sections: [server], [sources], and [generator].

[server]

The [server] section is used to configure the address and port the server will listen on, via the bind property. The default is shown below:

[server]
bind = "127.0.0.1:42069"

This parameter is available as IOCAINE_SERVER__BIND when configuring via environment variables.

[sources]

The [sources] section is the only section without defaults: specifying both options here is mandatory.

[sources]
words = "/usr/share/dict/wamerican.txt"
markov = ["/var/lib/iocaine/markov/bee-movie.txt", "/var/lib/iocaine/markov/moby-dick.txt"]

The first option, words, refers to a word list file, with one word per line. When generating links, the path of the link will be a word chosen from this word list.

The second option, markov, is a list of files used to train the Markov chain generator. These will be used to generate the main content.
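For readers unfamiliar with Markov chain text generation, the idea can be sketched as follows. This is a minimal first-order, word-level chain written for illustration. It is not iocaine's actual generator, and the `train` and `generate` helpers are hypothetical names. The input strings stand in for the contents of the files listed under markov.

```python
import random
from collections import defaultdict


def train(texts):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)


def generate(chain, length, rng):
    """Produce `length` words by repeatedly sampling a follower of the last word."""
    if not chain:
        raise ValueError("the chain is empty; train it on some text first")
    if length <= 0:
        return ""
    starts = sorted(chain)
    word = rng.choice(starts)
    output = [word]
    while len(output) < length:
        followers = chain.get(word)
        # A word seen only at the end of a text has no followers: restart.
        word = rng.choice(followers) if followers else rng.choice(starts)
        output.append(word)
    return " ".join(output)


corpus = [
    "the whale swam under the ship",
    "the bee flew over the whale",
]
print(generate(train(corpus), 12, random.Random()))
```

The output is locally plausible, since every adjacent word pair occurred somewhere in the training text, but meaningless as a whole.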

These parameters are available as IOCAINE_SOURCES__WORDS and IOCAINE_SOURCES__MARKOV, respectively, when configuring via environment variables. Do note that if configuring iocaine this way, the IOCAINE_SOURCES__MARKOV environment variable must be a TOML list: IOCAINE_SOURCES__MARKOV='["/var/lib/iocaine/markov/bee-movie.txt"]'.

[generator]

The [generator] section describes how garbage is generated: how many paragraphs are produced per page, how many words they may have, how many links to place, and whether to add a "Back" link at the top. It looks like this, with defaults shown:

[generator.markov.paragraphs]
min = 1
max = 1

[generator.markov.words]
min = 10
max = 420

[generator.links]
min = 2
max = 5
backlink = true

[generator]
initial_seed = ""
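As a rough illustration of what these ranges might control, the sketch below picks the shape of a single page from the defaults shown above. It is not iocaine's code: the `plan_page` helper is hypothetical, and the uniform choice with inclusive bounds, the per-paragraph word counts, and the `..` back link target are all assumptions. The initial_seed setting is left out.

```python
import random

# Defaults copied from the [generator] example; each range is (min, max).
DEFAULTS = {
    "paragraphs": (1, 1),
    "words": (10, 420),
    "links": (2, 5),
    "backlink": True,
}


def plan_page(word_list, rng, settings=DEFAULTS):
    """Pick paragraph lengths and link paths for one generated page."""
    paragraph_count = rng.randint(*settings["paragraphs"])
    # One word count per paragraph; a Markov generator would fill these in.
    paragraph_lengths = [
        rng.randint(*settings["words"]) for _ in range(paragraph_count)
    ]
    link_count = rng.randint(*settings["links"])
    # Each link path is a word chosen from the word list.
    links = [rng.choice(word_list) for _ in range(link_count)]
    if settings["backlink"]:
        links.insert(0, "..")  # assumed target of the "Back" link
    return {"paragraph_lengths": paragraph_lengths, "links": links}


print(plan_page(["alpha", "beta", "gamma"], random.Random()))
```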

When configuring through environment variables, these settings are available via IOCAINE_GENERATOR__MARKOV__PARAGRAPHS__MIN, IOCAINE_GENERATOR__MARKOV__PARAGRAPHS__MAX, IOCAINE_GENERATOR__MARKOV__WORDS__MIN, IOCAINE_GENERATOR__MARKOV__WORDS__MAX, IOCAINE_GENERATOR__LINKS__MIN, IOCAINE_GENERATOR__LINKS__MAX, IOCAINE_GENERATOR__LINKS__BACKLINK, and IOCAINE_GENERATOR__INITIAL_SEED, respectively.
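The environment variable names in this document appear to follow one pattern: an IOCAINE_ prefix, then the configuration path in upper case, with a double underscore between nesting levels. The sketch below applies that inferred rule to build a nested configuration from an environment. The `env_to_path` and `apply_env` helpers are hypothetical, and this is a reading of the naming pattern, not iocaine's actual loader.

```python
PREFIX = "IOCAINE_"


def env_to_path(name: str) -> list[str]:
    """Translate an environment variable name into a list of config keys."""
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    # A double underscore separates nesting levels; single underscores
    # (as in INITIAL_SEED) remain part of the key.
    return [part.lower() for part in name[len(PREFIX):].split("__")]


def apply_env(config: dict, environ: dict) -> dict:
    """Copy every IOCAINE_* variable into the nested config dictionary."""
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        *parents, key = env_to_path(name)
        section = config
        for parent in parents:
            section = section.setdefault(parent, {})
        # Values stay strings here; a real loader would also convert types.
        section[key] = value
    return config


print(env_to_path("IOCAINE_GENERATOR__MARKOV__WORDS__MAX"))
print(apply_env({}, {"IOCAINE_SERVER__BIND": "127.0.0.1:42069"}))
```

Under this rule a single underscore never starts a new nesting level, which is why every separator between levels has to be doubled.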

iocaine is © 2025 Gergely Nagy, with code adapted from lipsum by Martin Geisler, and is released under the MIT license. A lot of iocaine has been inspired by Nepenthes, but shares no code with it, just ideas.

See Also

Similar software you might be interested in, because the more attempts at poisoning AI, the merrier:

Let's make AI poisoning the norm. If we all do it, they won't have anything to crawl.