The initial seed is used by the RNG, and the intended use is to allow modifying the generated output without otherwise modifying the configuration, while still being a static, controllable seed that can be shared between instances if need be. Thanks to @buherator@infosec.place for the idea! Signed-off-by: Gergely Nagy <me@gergo.csillger.hu>
How does iocaine work?
The goal of iocaine is to generate a stable, infinite maze of garbage. Each page will be randomly generated, but as long as the configuration (and the training data) remains the same, each individual page will always render the same. Because iocaine is expected to work behind a reverse proxy, to shadow the real content when facing unwanted crawlers, it will generate different pages for different hosts, even if the path is the same.
This means that if iocaine is set up to shadow both site1.example.com and site2.example.com, then https://site1.example.com/some/path/ and https://site2.example.com/some/path/ will render different content, but repeated visits to either URL will always render the same content.
This is accomplished by seeding the random number generator with a number derived from the SHA256 digest of the original request URL. This also means that you can deploy multiple iocaine nodes and load-balance between them, if need be, and the output will be stable as long as the configuration and training data are the same between nodes.
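The idea of URL-derived seeding can be illustrated with a minimal Python sketch. This is not iocaine's actual code: the function names are hypothetical, and taking the first 8 bytes of the digest as the seed is an assumption, since the text only says the seed is "derived from" the SHA256 digest.

```python
import hashlib
import random


def seed_from_url(url: str) -> int:
    """Derive an integer seed from the SHA-256 digest of the full request URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Assumption: use the first 8 bytes as a 64-bit seed; the real derivation may differ.
    return int.from_bytes(digest[:8], "big")


def render_numbers(url: str, count: int = 3) -> list[int]:
    """Stand-in for page rendering: draw a few values from a URL-seeded RNG."""
    rng = random.Random(seed_from_url(url))
    return [rng.randrange(1000) for _ in range(count)]


url_a = "https://site1.example.com/some/path/"
url_b = "https://site2.example.com/some/path/"

# The same URL always yields the same sequence, on any node running the same code.
print(render_numbers(url_a) == render_numbers(url_a))
# A different host yields a different seed, even though the path is identical.
print(seed_from_url(url_a) != seed_from_url(url_b))
```

Because the seed depends only on the URL, no state needs to be shared between nodes for their output to agree.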
For a number of reasons, iocaine uses a different seed for each of the Markov-chain generated text, the generated link URLs, and the generated link texts. All three seeds are derived from the original URL, though.
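One way to obtain three independent but URL-based seeds is to mix a purpose label into the hashed material. The sketch below assumes this labelling scheme purely for illustration; the labels and function names are hypothetical, and the source does not say how iocaine separates the three seeds.

```python
import hashlib
import random

# Hypothetical labels for the three generators mentioned in the text.
PURPOSES = ("text", "link-url", "link-text")


def derive_seed(url: str, purpose: str) -> int:
    """Hash the purpose label together with the URL to get a per-purpose seed."""
    material = purpose.encode("utf-8") + b"\x00" + url.encode("utf-8")
    return int.from_bytes(hashlib.sha256(material).digest()[:8], "big")


def page_rngs(url: str) -> dict[str, random.Random]:
    """Build one independently seeded RNG per purpose for a given URL."""
    return {purpose: random.Random(derive_seed(url, purpose)) for purpose in PURPOSES}


url = "https://site1.example.com/some/path/"
seeds = {purpose: derive_seed(url, purpose) for purpose in PURPOSES}
rngs = page_rngs(url)
```

With separate generators, changing how many words a paragraph consumes does not shift the sequence used for link URLs, which keeps the links of a page stable.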
Such seeding is, of course, not secure. But we do not need security here; we need each page to render in a stable way. If a collision happens, it's no big deal: we might end up with a mostly identical page, but we'll remain in the infinite maze nevertheless.
To provide a way to change the generated content without changing any settings or using different sources, it is possible to set an initial seed, which will be factored into the random number generation.
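As a rough sketch of how an initial seed could be factored in, the example below hashes it together with the URL. The exact way iocaine combines the two is not described here, so the combination, the separator byte, and the names `combined_seed` and `initial_seed` are all assumptions.

```python
import hashlib
import random


def combined_seed(initial_seed: str, url: str) -> int:
    """Mix a configured initial seed with the request URL (assumed scheme)."""
    hasher = hashlib.sha256()
    hasher.update(initial_seed.encode("utf-8"))
    hasher.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    hasher.update(url.encode("utf-8"))
    return int.from_bytes(hasher.digest()[:8], "big")


url = "https://site1.example.com/some/path/"

# Same initial seed and URL: identical output, so instances can share the seed.
rng_one = random.Random(combined_seed("first-seed", url))
rng_two = random.Random(combined_seed("first-seed", url))
same_output = rng_one.random() == rng_two.random()

# Changing only the initial seed changes the derived seed, and with it the page.
changed = combined_seed("first-seed", url) != combined_seed("second-seed", url)
```

The initial seed stays static and controllable, so sharing it between instances keeps their output identical.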
Every page has the same structure: an optional "back" link (which just points to ../), followed by a number of Markov-chain generated paragraphs of varying length, and an unordered list of links at the bottom. Each link is relative to the current page and has both a random URI and random text.
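The page layout described above can be sketched as follows. This is only an illustration under several assumptions: random word picks stand in for the Markov chain, a single RNG is used for brevity, the back link is decided by a coin flip (the text does not say what makes it optional), and the word list, counts, and function names are made up.

```python
import hashlib
import random

# Hypothetical vocabulary; a real setup would use a trained Markov chain instead.
WORDS = ["amber", "breeze", "copper", "delta", "ember", "fjord", "grove", "harbor"]


def rng_for(url: str) -> random.Random:
    """Seed an RNG from the SHA-256 digest of the URL (first 8 bytes, assumed)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def render_page(url: str) -> str:
    """Render a back link (maybe), some paragraphs, and a list of relative links."""
    rng = rng_for(url)
    parts = []
    if rng.random() < 0.5:  # assumption: the optional back link is a coin flip
        parts.append('<a href="../">back</a>')
    for _ in range(rng.randint(2, 5)):  # paragraphs of varying length
        length = rng.randint(10, 40)
        parts.append("<p>" + " ".join(rng.choice(WORDS) for _ in range(length)) + "</p>")
    items = []
    for _ in range(rng.randint(3, 6)):  # links relative to the current page
        href = "-".join(rng.choice(WORDS) for _ in range(2)) + "/"
        text = " ".join(rng.choice(WORDS) for _ in range(3))
        items.append(f'<li><a href="{href}">{text}</a></li>')
    parts.append("<ul>\n" + "\n".join(items) + "\n</ul>")
    return "\n".join(parts)


page = render_page("https://site1.example.com/some/path/")
```

Since every link is relative and leads to another deterministically generated page, a crawler that follows them never leaves the maze.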