Before, on a low-capacity system (such as an inexpensive cloud host), Markov-chain text generation was _extraordinarily_ slow, taking half a second or more to produce a page, and if multiple requests came in simultaneously they could easily swamp the capacity of such a system. Most of the time was spent in the Words iterator, which did a lot of cloning of Strings in the hot path.

This changes the Markov generator's internal representation: instead of storing Strings, it stores index-pairs into a single shared String, normalized so that all references to a particular word collapse into a single pair. This also means the hash map works with fixed-size values, which can't hurt. In addition, it does only one hash-map lookup per generated word on the happy path of not reaching the end of the chain. The upshot of all this is that where generating a page used to take half a second or more, it now takes about 0.001 seconds.

On the downside, the initialization of WurstsalatGeneratorPro has become rather less flexible. Before, you created one and then taught it various strings, or gave it a list of paths to read and teach itself from. Now, the _only_ way to create one is directly with a list of paths. Changing this is possible, but it would mean `Substr` has to learn to distinguish which source data it came from, likely a 50% increase in its size. It didn't seem worth it to preserve that capability, which wasn't even being used.
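For illustration, here is a minimal sketch of the index-pair idea in Rust. Apart from `Substr`, which the text mentions, every name here (`Generator`, `intern_words`, the field layout) is hypothetical and does not mirror iocaine's actual internals; it only demonstrates the technique: intern each distinct word as one canonical (start, end) pair into a shared backing String, so the chain's hash map holds small, Copy values and generation needs a single map lookup per emitted word.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of a fixed-size word reference: a byte range into one
/// shared backing String. Every occurrence of the same word is normalized to
/// the same canonical pair, so keys and values stay small and Copy.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Substr {
    start: u32,
    end: u32,
}

struct Generator {
    corpus: String,
    // Bigram state -> possible following words, all as index pairs.
    chain: HashMap<(Substr, Substr), Vec<Substr>>,
}

// Intern every word of `corpus` as a canonical (start, end) pair, in order.
fn intern_words(corpus: &str) -> Vec<Substr> {
    let mut canonical: HashMap<&str, Substr> = HashMap::new();
    let mut words = Vec::new();
    let mut cursor = 0usize;
    for word in corpus.split_whitespace() {
        // Tokens arrive in order, so the first occurrence after `cursor` is this token.
        let start = cursor + corpus[cursor..].find(word).unwrap();
        let end = start + word.len();
        cursor = end;
        let pair = Substr { start: start as u32, end: end as u32 };
        words.push(*canonical.entry(word).or_insert(pair));
    }
    words
}

impl Generator {
    fn new(corpus: String) -> Self {
        let words = intern_words(&corpus);
        let mut chain: HashMap<(Substr, Substr), Vec<Substr>> = HashMap::new();
        for w in words.windows(3) {
            chain.entry((w[0], w[1])).or_default().push(w[2]);
        }
        Generator { corpus, chain }
    }

    fn resolve(&self, s: Substr) -> &str {
        &self.corpus[s.start as usize..s.end as usize]
    }

    /// Happy path: one hash-map lookup per generated word, no String cloning.
    fn generate(&self, mut state: (Substr, Substr), max_words: usize) -> String {
        let mut out = String::new();
        out.push_str(self.resolve(state.0));
        out.push(' ');
        out.push_str(self.resolve(state.1));
        for _ in 0..max_words {
            let Some(candidates) = self.chain.get(&state) else { break };
            let next = candidates[0]; // a real generator would pick randomly
            out.push(' ');
            out.push_str(self.resolve(next));
            state = (state.1, next);
        }
        out
    }
}

fn main() {
    let g = Generator::new(
        "the quick brown fox jumps over the lazy dog and the quick brown cat naps".to_string(),
    );
    // Start from an arbitrary bigram present in the chain.
    let start = *g.chain.keys().next().expect("corpus too small");
    println!("{}", g.generate(start, 10));
}
```

Whatever the real types look like, the point is the same: fixed-size, Copy index pairs keep the hash map cheap and keep String allocation and cloning off the generation hot path.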
iocaine
The deadliest poison known to AI.
This is a tarpit, modeled after Nepenthes, intended to catch unwelcome web crawlers, but with a slightly different, more aggressive intended usage scenario. The core idea is to configure a reverse proxy to serve content generated by iocaine to AI crawlers, but normal content to every other visitor. This differs from Nepenthes, where the idea is to link to it and trap crawlers that way; with iocaine, the trap is laid by the reverse proxy itself.
iocaine does not try to slow crawlers, and it does not try to waste their time that way - that is left up to the reverse proxy. iocaine is purely about generating garbage.
For more information about what this is, how it works, and how to deploy it, have a look at the dedicated website.
Let's make AI poisoning the norm. If we all do it, they won't have anything to crawl.