Commit 284af56e68 (2025-02-10 08:18:54 -08:00): Do less allocating and copying when generating text
Before, on a low-capacity system (such as an inexpensive cloud host),
doing Markov-chain text generation was _extraordinarily_ slow, taking
half a second or more to produce a page, and if multiple requests came
in simultaneously they could easily swamp the capacity of such a system.

Most of the time was spent in the Words iterator, which did a bunch of
cloning of Strings in what was the hot path.

This changes the Markov generator's internal representation - now, instead
of storing Strings, it stores index-pairs into a single shared String,
normalized so that all references to particular words are collapsed into
a single pair.  This also means that the hash map is working with
fixed-size values, which can't hurt.
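
As a minimal sketch of that representation (the names and layout here are
illustrative, not iocaine's actual code), each word becomes a fixed-size
pair of byte offsets into one shared `String`, and an interning map
collapses every occurrence of a word into a single canonical pair:

```rust
use std::collections::HashMap;

/// A half-open byte range into the shared corpus: fixed-size, `Copy`,
/// and cheap to hash, unlike a heap-allocated `String`.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Substr {
    start: u32,
    end: u32,
}

#[derive(Default)]
struct Interner {
    corpus: String,                // the single shared backing String
    seen: HashMap<String, Substr>, // word -> its canonical index-pair
}

impl Interner {
    /// Returns the canonical `Substr` for `word`, appending the word to
    /// the corpus only the first time it is encountered.
    fn intern(&mut self, word: &str) -> Substr {
        if let Some(&s) = self.seen.get(word) {
            return s; // normalized: every occurrence shares one pair
        }
        let start = self.corpus.len() as u32;
        self.corpus.push_str(word);
        let s = Substr { start, end: self.corpus.len() as u32 };
        self.seen.insert(word.to_owned(), s);
        s
    }

    /// Resolve an index-pair back to the text it denotes.
    fn resolve(&self, s: Substr) -> &str {
        &self.corpus[s.start as usize..s.end as usize]
    }
}
```

Because such a pair is just two `u32`s, it is `Copy` and fixed-size, so the
hash map never has to clone or hash heap-allocated `String`s on the hot path.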

In addition, it does only one hash-map lookup per generated word on the
happy path of not reaching the end of the chain.
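
In isolation, that step might look like the following sketch, reusing the
illustrative `Substr` from above; the chain's key and value types here are
assumptions, not iocaine's real ones:

```rust
use std::collections::HashMap;

/// Assumed chain shape: a second-order chain keyed by a pair of
/// interned words, each value listing the possible follower words.
type State = (Substr, Substr);

/// One generation step: a single `get` covers both the "followers found"
/// case and the end-of-chain case, instead of a contains-then-get pair.
fn next_word<'a>(
    chain: &'a HashMap<State, Vec<Substr>>,
    state: State,
    pick: impl FnOnce(&'a [Substr]) -> Substr,
) -> Option<Substr> {
    chain.get(&state).map(|followers| pick(followers))
}
```

A `contains_key` followed by a `get` would probe the table twice; a single
`get` returning `Option` folds the end-of-chain check into the same lookup.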

The upshot of all this is that where it was taking a half-second or more
to generate a page, it now takes about 0.001 seconds.

On the downside, the initialization of WurstsalatGeneratorPro has become
rather less flexible.  Before, you created one and then taught it various
strings, or gave it a list of paths to read and teach itself from.  Now,
the _only_ way to create one is directly with a list of paths.  Changing
this is possible, but it means `Substr` would have to learn to distinguish
which source data it came from, which would mean a likely 50% increase in
its size.  It didn't seem worth it to preserve that capability, which
wasn't even being used.
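
A constructor with the shape this paragraph describes might look like the
following sketch (`MarkovCorpus` and its internals are hypothetical
stand-ins for `WurstsalatGeneratorPro`, not its real code):

```rust
use std::{fs, io, path::Path};

struct MarkovCorpus {
    corpus: String, // every source file, concatenated into one String
}

impl MarkovCorpus {
    /// The only way in: build directly from a list of paths. Teaching an
    /// existing instance additional text is no longer supported.
    fn from_paths<P: AsRef<Path>>(paths: &[P]) -> io::Result<Self> {
        let mut corpus = String::new();
        for p in paths {
            corpus.push_str(&fs::read_to_string(p)?);
            corpus.push('\n'); // keep adjacent sources from fusing words
        }
        // Tokenizing `corpus` into index-pairs and building the chain
        // would follow here.
        Ok(MarkovCorpus { corpus })
    }
}
```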
| Path | Last commit | Date |
| --- | --- | --- |
| .forgejo/workflows | ci: Update to use cellar | 2025-02-10 14:20:37 +01:00 |
| data | data: Update the grafana dashboard | 2025-02-09 09:48:28 +01:00 |
| docs | metrics: No need to set instance | 2025-02-06 22:01:54 +01:00 |
| LICENSES | Initial import | 2025-01-16 10:44:56 +01:00 |
| nix | Add a number of tests | 2025-01-30 09:18:35 +01:00 |
| src | Do less allocating and copying when generating text | 2025-02-10 08:18:54 -08:00 |
| templates | Make templating actually useful | 2025-01-29 00:20:21 +01:00 |
| tests | Make the metrics always available | 2025-02-07 15:07:40 +01:00 |
| .envrc | Initial import | 2025-01-16 10:44:56 +01:00 |
| .gitattributes | .gitattributes: Try to mark markdown files detectable | 2025-01-25 12:40:00 +01:00 |
| .gitignore | Move documentation to a dedicated site | 2025-01-25 01:31:38 +01:00 |
| .gitmodules | Move documentation to a dedicated site | 2025-01-25 01:31:38 +01:00 |
| Cargo.lock | Implement Prometheus-compatible, optional metrics | 2025-02-05 02:36:13 +01:00 |
| Cargo.toml | Make the metrics always available | 2025-02-07 15:07:40 +01:00 |
| flake.lock | flake.lock: Update | 2025-02-09 11:17:25 +01:00 |
| flake.nix | nix: Add zstd to the devshell | 2025-01-29 08:23:40 +01:00 |
| README.md | Move documentation to a dedicated site | 2025-01-25 01:31:38 +01:00 |
| REUSE.toml | docs: Add a HOWTO about monitoring | 2025-02-05 11:26:49 +01:00 |

iocaine

[Badges: Build status · Container image · Demo · Documentation]

The deadliest poison known to AI.

This is a tarpit, modeled after Nepenthes, intended to catch unwelcome web crawlers, but with a slightly different, more aggressive intended usage scenario. The core idea is to configure a reverse proxy to serve iocaine-generated content to AI crawlers, but normal content to every other visitor. This differs from Nepenthes, where the idea is to link to the tarpit and trap crawlers that way; with iocaine, the trap is laid by the reverse proxy itself.

iocaine does not try to slow crawlers down or waste their time that way; that job is left to the reverse proxy. iocaine is purely about generating garbage.
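
For illustration only, here is a generic nginx sketch of that split. It is not taken from iocaine's documentation, and the user-agent patterns, ports, and addresses are all assumptions:

```nginx
# Send suspected AI crawlers to iocaine, everyone else to the real backend.
map $http_user_agent $ai_crawler {
    default                                 0;
    "~*(GPTBot|ClaudeBot|CCBot|Bytespider)" 1;
}

server {
    listen 80;
    server_name example.com;

    location / {
        # Hypothetical addresses: iocaine on 42069, the real site on 8080.
        if ($ai_crawler) {
            proxy_pass http://127.0.0.1:42069;
        }
        proxy_pass http://127.0.0.1:8080;
    }
}
```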

For more information about what this is, how it works, and how to deploy it, have a look at the dedicated website.

Let's make AI poisoning the norm. If we all do it, they won't have anything to crawl.