mirror of
https://git.madhouse-project.org/algernon/iocaine.git
synced 2025-03-10 17:28:49 +01:00
This rebuilds the templating so that the *content* is no longer pre-generated, only the parameters. It is up to the template (and some newly implemented helper functions) to construct the output from those. Signed-off-by: Gergely Nagy <me@gergo.csillger.hu>
---
title: Using Caddy with iocaine
description: Setting up Caddy to front for iocaine
---

# Getting started

Here, I assume that iocaine has already been [configured](@/configuration/index.md) and [deployed](@/deploying/iocaine.md). Let's assume that we have a site running at `[::1]:8080`, and we want to serve it with Caddy. Normally, that would look something like this:

```caddyfile
blog.example.com {
    reverse_proxy [::1]:8080
}
```

# Routing AI agents elsewhere

To serve `iocaine`'s garbage to AI visitors, we need a matcher and a matched `reverse_proxy` pointing at `iocaine` (listening on `127.0.0.1:42069` in this example):

```caddyfile
blog.example.com {
    @ai {
        header_regexp user-agent (?i:gptbot|chatgpt|ccbot|claude)
    }
    reverse_proxy @ai 127.0.0.1:42069
    reverse_proxy [::1]:8080
}
```

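To get a feel for which visitors the matcher catches, here is a minimal Python sketch of the same decision. It is only an illustration, not how Caddy evaluates matchers internally: it assumes the pattern is searched for anywhere in the `User-Agent` value, and the helper names (`is_ai_agent`, `pick_upstream`) and the sample user agents are made up for this example.

```python
import re

# Same pattern as the `header_regexp` line in the Caddyfile. Caddy uses Go's
# regexp syntax; this particular pattern is also valid for Python's `re`.
AI_BOTS = re.compile(r"(?i:gptbot|chatgpt|ccbot|claude)")


def is_ai_agent(user_agent: str) -> bool:
    """Return True if the pattern occurs anywhere in the User-Agent value."""
    return AI_BOTS.search(user_agent) is not None


def pick_upstream(user_agent: str) -> str:
    """Mimic the two `reverse_proxy` lines: iocaine for AI agents, the site otherwise."""
    return "127.0.0.1:42069" if is_ai_agent(user_agent) else "[::1]:8080"


# Hypothetical User-Agent strings, for illustration only.
samples = [
    "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.0)",
    "CCBot/2.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
]
for ua in samples:
    print(f"{pick_upstream(ua):<16} <- {ua}")
```

Because the match is case-insensitive and unanchored, any user agent that merely contains one of the listed names ends up in the maze, so it is worth keeping the list specific.
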
# Applying rate limits

We can do even better than this, though! We can apply rate limits using [caddy-ratelimit](https://github.com/mholt/caddy-ratelimit)! Unfortunately, that leads to a slightly more complex configuration with a bit of repetition, but we can mitigate that with a snippet. Let's start with that:

```caddyfile
(ai-bots) {
    header_regexp user-agent (?i:gptbot|chatgpt|ccbot|claude)
}
```

This is essentially the same thing as the `@ai` matcher, lifted out into a snippet. It had to be lifted out because the same matcher has to be reused in slightly different contexts, including ones where I can't use a named matcher. It sounds more complicated than it is, really, so let me show the final result:

```caddyfile
blog.example.com {
    rate_limit {
        zone ai-bots {
            match {
                import ai-bots
            }
            key {user_agent}
            events 16
            window 1m
        }
    }

    @ai {
        import ai-bots
    }
    @not-ai {
        not {
            import ai-bots
        }
    }

    reverse_proxy @ai 127.0.0.1:42069
    reverse_proxy @not-ai [::1]:8080
}
```

This does two things: it routes AI user agents to `iocaine`, and it applies a rate limit of 16 requests per minute, keyed by user agent. If the limit is exceeded, Caddy returns an HTTP 429 ("Too Many Requests") with a `Retry-After` header, to encourage the crawlers to come back to our little maze. The limit is keyed by user agent rather than by client address because most crawlers use *many* hosts to crawl a site at the same time: each host stays well under any reasonable limit, but together they're a massive pain.
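
The effect of the key choice can be sketched in a few lines of Python. This is a simplified sliding-window limiter written for illustration, not the caddy-ratelimit implementation; the class and function names, the crawler addresses, and the user agent string are all hypothetical.

```python
from collections import defaultdict, deque

EVENTS = 16      # mirrors `events 16`
WINDOW = 60.0    # mirrors `window 1m`, in seconds


class KeyedRateLimiter:
    """Sliding-window limiter that keeps one window per key."""

    def __init__(self, events=EVENTS, window=WINDOW):
        self.events = events
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        """Record a request at time `now`; False stands for an HTTP 429."""
        hits = self.hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.events:
            return False
        hits.append(now)
        return True


def count_allowed(key_of):
    """20 hypothetical crawler hosts, one request per second, same User-Agent."""
    limiter = KeyedRateLimiter()
    hosts = [f"203.0.113.{n}" for n in range(1, 21)]
    return sum(
        limiter.allow(key_of(host, "GPTBot/1.0"), now=float(i))
        for i, host in enumerate(hosts)
    )


print("keyed by host:      ", count_allowed(lambda host, ua: host))  # 20
print("keyed by user agent:", count_allowed(lambda host, ua: ua))    # 16
```

Keyed by host, every one of the twenty crawler hosts gets through, since each makes only a single request. Keyed by user agent, they share one budget, and the last four requests within the minute are rejected.
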