diff --git a/crowdsec-docs/docs/appsec/bot_detection/configuration.md b/crowdsec-docs/docs/appsec/bot_detection/configuration.md new file mode 100644 index 000000000..98b206fa8 --- /dev/null +++ b/crowdsec-docs/docs/appsec/bot_detection/configuration.md @@ -0,0 +1,129 @@ +--- +id: configuration +title: Bot detection configuration +sidebar_position: 2 +--- + + + +This page covers the configuration of the bot detection feature: the signing keys and their rotation, cookie lifetime, and JavaScript bundle obfuscation. + +## Where to set these values + +Bot-detection settings live under a top-level `challenge:` block inside an appsec-config YAML file — the same kind of file documented in [AppSec configuration syntax](../configuration.md). Multiple appsec-configs loaded by your AppSec acquisition combine field by field, so you can keep the upstream collection's appsec-config unchanged and ship a small overlay of your own that only sets what you care about. The mechanics of loading and merging appsec-configs are covered in [AppSec configuration syntax](../configuration.md#configuration-file-format). + +A minimal overlay looks like this — every field below is optional, see the rest of this page for what each one does: + +```yaml +# /etc/crowdsec/appsec-configs/mycorp-overlay.yaml +name: _XX_APPSEC_CONFIG_OVERLAY_XX_ + +challenge: + master_secret: "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" + crypto_obfuscation_pool_size: 3 +``` + +Reload CrowdSec for changes to take effect: + +```bash +sudo systemctl reload crowdsec +``` + +## Key management + +The challenge mechanism is built on a long-lived **master secret**, from which the AppSec component derives two independent key families: one rotated on a schedule that signs challenge tickets and validates proof-of-work, and one (static for the lifetime of the master secret) that seals the success cookie. 
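Below is a minimal Python model of the two key families. It assumes an HMAC-SHA256 derivation, invented labels (`"cookie-key"`, `"sign-key/{epoch}"`), and an epoch counted as whole `key_rotation_interval` periods since the Unix epoch. This page specifies none of these details, so treat every function as hypothetical. What the model shows is structural: the cookie key depends only on the master secret, while the signing key also depends on the current epoch.

```python
import hashlib
import hmac

# Hypothetical derivation scheme for illustration only: HMAC-SHA256 keyed with
# the master secret, used as a PRF over a label. The real KDF and labels may differ.


def derive(master_secret: bytes, label: str) -> bytes:
    """Return a 32-byte subkey bound to `label`."""
    return hmac.new(master_secret, label.encode("utf-8"), hashlib.sha256).digest()


def epoch_index(unix_time: int, rotation_interval_s: int) -> int:
    """Number of whole rotation intervals since the Unix epoch (assumed origin)."""
    return unix_time // rotation_interval_s


def cookie_key(master_secret: bytes) -> bytes:
    # No epoch in the label: static for the lifetime of the master secret.
    return derive(master_secret, "cookie-key")


def sign_key(master_secret: bytes, unix_time: int, rotation_interval_s: int) -> bytes:
    # The epoch is part of the label, so this key advances every interval.
    return derive(master_secret, f"sign-key/{epoch_index(unix_time, rotation_interval_s)}")


if __name__ == "__main__":
    secret = bytes.fromhex("0123456789abcdef" * 4)  # 32 bytes
    now = 1_700_000_000
    # Two instances sharing the secret and the interval derive the same sign key.
    print(sign_key(secret, now, 300) == sign_key(secret, now, 300))  # True
    # A different interval puts the same instant in a different epoch.
    print(sign_key(secret, now, 300) == sign_key(secret, now, 600))  # False
```

`cookie_key` never takes the epoch as input. In this model, advancing the signing key therefore cannot invalidate an already-issued cookie. Two instances with a shared secret but different `key_rotation_interval` values still end up with different signing keys.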
+ +| Key | Default | Operational meaning | +|---|---|---| +| `master_secret` | random, ephemeral (single-instance only) | Long-lived secret used to derive every other key. Hex (≥64 chars, preferred) or passphrase (≥32 bytes of UTF-8). **Required** when running more than one AppSec instance — all instances must share the same value or cookies issued by one are rejected by another. | +| `key_rotation_interval` | `5m` | How often the per-epoch signing key advances. All instances in a distributed setup must agree on this value to derive identical per-epoch keys. Minimum 30s. | +| `max_live_epochs` | `3` | How many past epochs (in addition to the current one) the AppSec component still accepts. Bump this if a meaningful share of your clients need more than `(max_live_epochs + 1) × key_rotation_interval` to solve and submit the challenge (slow mobile networks, long round-trips). | +| `cookie_ttl` | `12h` | How long a successful-challenge cookie stays valid. Decoupled from key rotation — the cookie carries its own `not_after` timestamp sealed under the master cookie key, so rotating the per-epoch sign key does **not** invalidate already-issued cookies. | + +### Single-instance deployments + +Leaving `master_secret` unset is fine: the AppSec component generates a 32-byte random secret at startup and logs a warning. Every restart invalidates all outstanding challenge cookies, which is acceptable for a single host. + +### Multi-instance / HA deployments + +Set `master_secret` and `key_rotation_interval` to the **same value** on every AppSec instance. If the values differ, a cookie issued by instance A will be rejected by instance B and clients will be re-challenged on every request that lands on a different node — a noticeable user-experience regression and a load amplifier. + +To rotate the master secret safely: + +1. Generate a new secret. +2. Roll it out to **every** AppSec instance within one `cookie_ttl` window. +3. Restart each instance after it has the new value. 
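The sketch below shows how `key_rotation_interval` and `max_live_epochs` from the table above together bound the time a client has to solve and submit the challenge. It assumes that a ticket remembers the epoch it was issued in, and that it is accepted while the submit-time epoch is at most `max_live_epochs` later. That rule is inferred from the `(max_live_epochs + 1) × key_rotation_interval` figure, not taken from the AppSec component's source.

```python
# Hypothetical acceptance rule, chosen to match the
# (max_live_epochs + 1) x key_rotation_interval bound quoted in the table:
# a ticket is accepted while the submit-time epoch is at most
# max_live_epochs ahead of the epoch it was issued in.


def epoch_index(unix_time: int, rotation_interval_s: int) -> int:
    return unix_time // rotation_interval_s


def ticket_accepted(issued_at: int, submitted_at: int,
                    rotation_interval_s: int = 300, max_live_epochs: int = 3) -> bool:
    age = epoch_index(submitted_at, rotation_interval_s) - epoch_index(issued_at, rotation_interval_s)
    return 0 <= age <= max_live_epochs


def longest_window_s(rotation_interval_s: int = 300, max_live_epochs: int = 3) -> int:
    # Ticket issued in the first second of an epoch.
    return (max_live_epochs + 1) * rotation_interval_s


def shortest_window_s(rotation_interval_s: int = 300, max_live_epochs: int = 3) -> int:
    # Ticket issued in the last second of an epoch: it survives that second
    # plus max_live_epochs full epochs.
    return max_live_epochs * rotation_interval_s + 1


if __name__ == "__main__":
    # Defaults (5m, 3): between roughly 15 and 20 minutes to submit.
    print(shortest_window_s(), longest_window_s())  # 901 1200
```

Under this assumption, a ticket issued late in an epoch gets close to the shorter bound. With the defaults, roughly 15 minutes is the safer figure to compare against when deciding whether to raise `max_live_epochs`.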
+ +During the rollout, clients holding cookies sealed under the old secret will be re-challenged once on instances that already have the new secret — there is no way to keep both valid simultaneously. + +### Generating a secret + +The recommended form is a 32-byte hex string: + +```bash +openssl rand -hex 32 +# e.g. 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef +``` + +Passphrases are accepted too, but must be at least 32 bytes of UTF-8. An invalid value (too short, or non-hex characters in a hex-looking string) causes CrowdSec to fail to load the config. + +## JS obfuscation + +The AppSec component serves two JavaScript artefacts to the client during a challenge: a **static library bundle** (the fingerprinting + proof-of-work runner) and a **dynamic per-epoch key module** (which embeds the current signing key). Both are obfuscated, and you can tune how many distinct obfuscated variants are kept in memory and how often new ones are produced. + +**Why this matters.** Code running inside an attacker-controlled browser can always be reverse-engineered eventually. Obfuscation cannot prevent that. It raises the time and effort each reverse-engineering attempt costs, and serving several distinct variants multiplies that cost. + +| Key | Default | Recommended | What it controls | +|---|---|---|---| +| `crypto_obfuscation_pool_size` | `1` | `3` | Number of distinct obfuscations of the per-epoch sign-key module kept per live epoch. Each variant costs ~5 s of CPU per rotation. A pool size of 3 is recommended in production: different clients see different obfuscations of the same key, which materially raises the cost of an attacker reverse-engineering the module. The default of `1` exists to keep tests cheap. | +| `library_runtime_obfuscation_enabled` | `false` | `false` | When `false`, the AppSec component serves only the library bundle baked at build time (no runtime cost). When `true`, a background goroutine produces additional obfuscations of the static library bundle on a cadence.
Enable only on hosts with CPU budget to spare — the build-time bundle is already obfuscated and is sufficient for most deployments. | +| `library_obfuscation_pool_size` | `1` | `1` | Maximum number of obfuscated library-bundle variants kept. Has no effect unless `library_runtime_obfuscation_enabled` is `true` — values >1 are clamped to 1 with a startup warning otherwise. | +| `library_obfuscation_refresh_interval` | `1h` | `1h` | How often the background obfuscator produces one new library-bundle variant. Each pass costs roughly one minute of CPU. Ignored when runtime obfuscation is disabled. | + +:::tip +Don't enable `library_runtime_obfuscation_enabled` on a small or shared host — the obfuscator is CPU-heavy and runs every `library_obfuscation_refresh_interval`. The build-time obfuscation is enough for most deployments; only turn this on if you specifically need rotating byte-level library variants in addition to the build-time bundle. +::: + +## Applying changes + +Most fields take effect on the next CrowdSec reload: + +```bash +sudo systemctl reload crowdsec +``` + +Specifically: + +- Changing `master_secret` **invalidates all in-flight challenges** (clients mid-challenge will be re-challenged) and **invalidates every already-issued cookie**. Plan a rotation as described in [Multi-instance / HA deployments](#multi-instance--ha-deployments). +- Changing `key_rotation_interval` or `max_live_epochs` invalidates in-flight challenges but does **not** invalidate already-issued cookies — they remain valid until their own `not_after` timestamp. +- Changing `cookie_ttl` affects only cookies issued **after** the reload; cookies already in the wild keep their original lifetime. +- Changing the JS obfuscation fields takes effect on the next rotation tick / refresh tick. 
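The cookie-related rules in the list above follow from the cookie carrying its own `not_after`. The Python sketch below models this with an HMAC-SHA256 tag over a small JSON payload. The token layout, and the use of HMAC rather than whatever sealing the AppSec component actually applies, are assumptions made for illustration. In the model, a new `cookie_ttl` only changes the `not_after` written into new cookies, and the per-epoch signing key never enters validation. A new master secret gives a new cookie key, so every old tag fails.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical cookie layout for illustration: base64(payload) "." base64(tag),
# where tag = HMAC-SHA256(cookie_key, payload). The real cookie is sealed
# differently; only the not_after logic is the point here.


def _b64e(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode("ascii")


def _tag(cookie_key: bytes, payload: bytes) -> bytes:
    return hmac.new(cookie_key, payload, hashlib.sha256).digest()


def issue_cookie(cookie_key: bytes, fsid: str, now: int, ttl_s: int) -> str:
    # The TTL in force at issue time is baked into not_after.
    payload = json.dumps({"fsid": fsid, "not_after": now + ttl_s}, sort_keys=True).encode("utf-8")
    return f"{_b64e(payload)}.{_b64e(_tag(cookie_key, payload))}"


def cookie_valid(cookie_key: bytes, cookie: str, now: int) -> bool:
    try:
        payload_b64, tag_b64 = cookie.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return False
    if not hmac.compare_digest(tag, _tag(cookie_key, payload)):
        return False  # forged, or sealed under another master secret
    return now < json.loads(payload)["not_after"]


if __name__ == "__main__":
    old_key = hashlib.sha256(b"old master secret").digest()  # stand-in cookie key
    new_key = hashlib.sha256(b"new master secret").digest()
    c = issue_cookie(old_key, "FS1_xyz", now=0, ttl_s=12 * 3600)
    print(cookie_valid(old_key, c, now=3600))  # True: within its own not_after
    print(cookie_valid(new_key, c, now=3600))  # False: master secret changed
```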
+ +## Verification + +Check that the AppSec component picked up the config: + +```bash +sudo cscli metrics show appsec +``` + +Hit a protected endpoint with a clean client and confirm the challenge HTML is served. Tail the CrowdSec log: + +```bash +sudo tail -F /var/log/crowdsec.log | grep -E "challenge submission|on_challenge_submit" +``` + +You should see lines like: + +``` +level=info msg="challenge submission accepted" source=198.51.100.42 fsid=FS1_xyz is_bot=false allowlisted=false +level=info msg="on_challenge_submit rejected" source=203.0.113.7 reason=fast-bot-detection signals="[cdp]" +``` + +If you don't see any `challenge submission` lines at all after a reload, double-check that: + +- The new appsec-config is listed in your AppSec datasource (`appsec_configs:`) — see [Where to set these values](#where-to-set-these-values). +- The bouncer is forwarding `/crowdsec-internal/challenge/*` paths unchanged — see [Prerequisites](intro.md#prerequisites) on the intro page. diff --git a/crowdsec-docs/docs/appsec/bot_detection/intro.md b/crowdsec-docs/docs/appsec/bot_detection/intro.md new file mode 100644 index 000000000..960c9cd26 --- /dev/null +++ b/crowdsec-docs/docs/appsec/bot_detection/intro.md @@ -0,0 +1,277 @@ +--- +id: intro +title: Bot detection +sidebar_position: 1 +--- + + + +Bot detection allows you to block automation before it reaches the application. Where the rest of the WAF reacts to *what a user does* — the payloads they send, the endpoints they hit, the patterns they trigger — bot detection answers a different question: *what a user is*, a real browser or a script pretending to be one. + +## What bot detection does + +Bot detection separates humans from automation. Real browsers pass the check transparently and continue as usual; bots, headless browsers and clients that don't execute JavaScript are filtered out — by the AppSec component for the obvious cases, and by CrowdSec scenarios for the repeat offenders. 
+ +It runs as an extra layer on top of your existing WAF, sharing the same hooks system and event pipeline. See the [request lifecycle](../request-lifecycle.md) for where it fits. + +## Prerequisites + +- A working AppSec setup. If you don't have one yet, follow the [general AppSec quickstart](../quickstart/general.mdx). +- A **compatible bouncer**. Bot detection requires the bouncer to forward the challenge endpoints to the AppSec component, so not every bouncer can serve it. The currently compatible ones (look for the **Bot Detection** badge at the top of their page) are: + - [Nginx](/u/bouncers/nginx) + - [HAProxy SPOA](/u/bouncers/haproxy_spoa) + +## Enable bot detection + +Install the collection that bundles everything (appsec-config + hooks + scenarios): + +```bash +sudo cscli collections install _XX_HUB_COLLECTION_BOT_DETECTION_XX_ +``` + +Then make sure the bundled appsec-config is actually loaded by your AppSec acquisition. Open the AppSec datasource file (typically `/etc/crowdsec/acquis.d/appsec.yaml`) and either list the new appsec-config explicitly: + +```yaml +listen_addr: 127.0.0.1:7422 +appsec_configs: + - crowdsecurity/appsec-default + - _XX_APPSEC_CONFIG_BOT_DETECTION_XX_ +labels: + type: appsec +``` + +…or use a wildcard so any installed `crowdsecurity/*` appsec-config is picked up automatically: + +```yaml +listen_addr: 127.0.0.1:7422 +appsec_configs: + - crowdsecurity/* +labels: + type: appsec +``` + +Reload CrowdSec for the change to take effect: + +```bash +sudo systemctl reload crowdsec +``` + +:::info +If your acquisition already loads appsec-configs via a wildcard, no acquisition change is needed — installing the collection is enough. +::: + +The rest of this section describes what is inside the collection, so you know what behavior you just enabled. None of it requires an extra install step. 
+ +### The appsec-config it installs + +The collection ships `_XX_APPSEC_CONFIG_BOT_DETECTION_XX_`, an appsec-config whose top-level `challenge:` block carries the bot-detection runtime settings: + +```yaml +name: _XX_APPSEC_CONFIG_BOT_DETECTION_XX_ + +challenge: + # All fields below are optional and have sane defaults — see the + # Configuration page for what they mean and when to override them. + # master_secret: "..." + # key_rotation_interval: 5m + # cookie_ttl: 12h + # crypto_obfuscation_pool_size: 3 +``` + +For a **single-instance** deployment you can use this as-is. For **multi-instance / HA** deployments you must set `master_secret` (and keep `key_rotation_interval` consistent) across all WAF instances — see [Key management](configuration.md#key-management). + +### Legitimate bots it allowlists + +The collection ships hooks that proactively grant a challenge cookie to common, well-known bots — Googlebot, Bingbot, and similar — so they never see the challenge page. Internally they call `GrantChallengeCookie("")`, which mints a signed cookie marked as allowlisted and short-circuits the challenge flow. + +Two kinds of allowlist entries are shipped: + +- **Path-based** — well-known endpoints that legitimate non-browser clients hit by design (e.g. `/.well-known/*`, `robots.txt`, ACME challenge paths). Anything matching one of those paths is granted a cookie without further checks. +- **User-agent + identity-verified** — for declared bots like Googlebot, the User-Agent is necessary but not sufficient: the hook also verifies the client's identity via a reverse-DNS lookup (and forward-DNS confirmation) and/or a check against the vendor's published IP ranges. A spoofed UA on an IP that does not resolve back to the vendor is **not** granted a cookie and goes through the normal challenge flow. 
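The identity check in the last bullet is commonly called forward-confirmed reverse DNS. Below is a minimal Python sketch that assumes a hostname-suffix allowlist. The suffixes used in the test (`.googlebot.com`, `.google.com`) follow Google's published crawler-verification guidance. This page does not describe the exact list the collection uses, or any IP-range checks it performs. The resolvers are parameters so the logic can be exercised without network access. By default they use the standard `socket` module.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def _reverse(ip: str) -> str:
    # PTR lookup; gethostbyaddr returns (hostname, aliases, addresses).
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> list[str]:
    # A/AAAA lookup; sockaddr[0] is the address for both IPv4 and IPv6.
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must end in an allowed
    domain, and that name must resolve back to the same IP."""
    try:
        addr = ipaddress.ip_address(ip)
        host = reverse(ip).rstrip(".").lower()
    except (ValueError, OSError):  # bad IP, or no PTR record
        return False
    # Suffixes start with "." so "evilgooglebot.com" does not match ".googlebot.com".
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return any(ipaddress.ip_address(a) == addr for a in forward(host))
    except (ValueError, OSError):  # unresolvable name or malformed answer
        return False
```

Both lookups matter. The PTR record is controlled by whoever owns the IP block, so a spoofer can point it at `crawl.googlebot.com`. Only the forward lookup, answered by the vendor's own DNS, confirms the claim.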
+ +### Bad bots it rejects + +The collection also ships an `on_challenge_submit` hook that calls `RejectSubmission(...)` when the in-browser fast-bot-detection library has flagged the client (headless browser, automation framework, impossible device profile, …): + +```yaml +on_challenge_submit: + - filter: fingerprint.FastBotDetection.Bool() == true + apply: + - RejectSubmission("fast-bot-detection") + - apply: + - LogAccepted("challenge submission accepted") +``` + +A rejected submission produces both a log line you can tail and a structured CrowdSec event — which means it shows up as an **alert in the CrowdSec console** (and in `cscli alerts list`) alongside the rest of your detection signals: + +``` +time="2026-06-03T13:57:49Z" level=info msg="on_challenge_submit rejected" automation=true bouncer=127.0.0.1 component=appsec_runtime_config fsid=FS1_000010000000000000000_00010h02ba_1920x1080c16m32b10011h22f04c_f1000111100010111100011111111e00000000p1100h793814_0h005997_1h-53968_en1tEurope-Paris_h-626_0100h3f9247 is_bot=true module=acquisition.appsec name="127.0.0.1:7422/" platform=Linux reason="Fast Bot Detection" request_uuid=9a822e6b-e20f-465c-8a52-b39ed62e7b7a signals="[cdp]" source=213.44.63.11 type=appsec ua="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36" +``` + +Accepted submissions are not logged by default. + +### Behavioral scenarios it installs + +Per-request hooks can only see the request in front of them. The collection therefore also installs three CrowdSec scenarios that watch the bigger picture and create proper decisions (i.e. blocks at the bouncer level) for repeat offenders: + +| Scenario | What it catches | +|---|---| +| `_XX_SCENARIO_CHALLENGE_FAILED_SUBMITS_XX_` | An IP submits the challenge many times and keeps failing — typical of an automated solver or a script brute-forcing the fingerprint. 
| +| `_XX_SCENARIO_CHALLENGE_NEVER_SUBMITS_XX_` | An IP requests the challenge page repeatedly but never POSTs to `/submit` — typical of scripts that don't execute JavaScript. | +| `_XX_SCENARIO_CHALLENGE_SHARED_COOKIE_XX_` | The same `__crowdsec_challenge` cookie is presented by many distinct IPs — typical of cookie replay or a bot farm sharing state. | + +These are regular scenarios, so they show up in `cscli alerts list`, in the console, and in your decision stream as you'd expect. + +## Verification + +Hit a protected route from a clean client (no cookie) — you should receive the challenge HTML rather than the real response: + +```bash +curl -i https://your-protected-site.example/some/page +# expect a 200 with a small HTML body containing the challenge script, +# and a Set-Cookie for __crowdsec_challenge once the challenge is solved. +``` + +Tail the CrowdSec log and trigger a failed submission (e.g. with `curl` against `/crowdsec-internal/challenge/submit` with garbage payload) to see the `on_challenge_submit rejected` line. After enough failed submissions, the behavioral scenario should fire and appear in `cscli alerts list`. + +## Recipes + +The snippets below are **advanced** — for the helpers (`SendChallenge`, `GrantChallengeCookie`, `RejectSubmission`, …) and the `fingerprint` object, see the [Hooks reference](../hooks.md). + +### Restrict the challenge to a specific path + +By default the appsec-config shipped by the collection challenges every request without a valid cookie. 
If you'd rather narrow the challenge to one section of your application — say a checkout flow — gate the `SendChallenge()` call with a path filter: + +```yaml +inband: + post_eval: + - filter: req.URL.Path startsWith "/checkout/" + apply: + - SendChallenge() +``` + +:::note +A client that has already obtained a cookie via `GrantChallengeCookie(...)` is exempted from `SendChallenge()` **regardless of the path** — the allowlist cookie short-circuits the challenge globally, not per-route. +::: + +### Allowlist an internal probe by header + +Useful for synthetic monitoring or internal health checks that don't run JavaScript: + +```yaml +inband: + pre_eval: + - filter: req.Header.Get("X-Internal-Probe") == "my-shared-secret" + apply: + - GrantChallengeCookie("internal-probe", "24h") +``` + +:::warning +This recipe trusts whoever knows the shared secret. If `my-shared-secret` ever leaks — into a log, a screenshot, a public dashboard — anyone who learns it can present that header and bypass bot detection entirely. Prefer pairing the header check with a source-IP filter (`req.RemoteAddr`) or rotating the secret regularly. +::: + +### Reject submissions flagged by the fast-bot-detection library + +This is what the collection ships by default — shown here so you can adapt it (e.g. tighten / loosen the filter, change the reject reason): + +```yaml +on_challenge_submit: + - filter: fingerprint.FastBotDetection.Bool() == true + apply: + - RejectSubmission("fast-bot-detection") +``` + +## Using the bot signal in appsec-configs and scenarios + +The information the challenge collects about a client is not locked inside the bot-detection collection — it's surfaced as a regular CrowdSec event and made available to both appsec-config hooks and scenario expressions. That means you can react to a "this client is automation" verdict anywhere in your CrowdSec stack, not just inside the dedicated `on_challenge_submit` hook. 
+ +### From an appsec-config + +Inside `on_challenge` and `on_challenge_submit` hooks, the in-flight challenge exposes a `fingerprint` object you can branch on. The most useful entry point is the high-level boolean and its companion counters / helpers: + +| Expression | Returns | Description | +| ------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------- | +| `fingerprint.IsBot()` | `bool` | The bottom-line verdict: `true` if any bot-detection signal fired (automation, headless, impossible device…). | +| `fingerprint.BotSignalCount()` | `int` | How many distinct signals fired — useful for "more than N" thresholds. | +| `fingerprint.HasAutomationSignal()` | `bool` | A webdriver / Selenium / Playwright / CDP indicator was seen. | +| `fingerprint.HasHeadlessSignal()` | `bool` | Headless-browser indicators (no GPU, no real plugins, …). | +| `fingerprint.HasMismatchSignal()` | `bool` | Cross-context inconsistencies (UA vs platform, language vs timezone, …). | +| `fingerprint.HasImpossibleDeviceSignal()` | `bool` | Device specs that don't exist in the wild (e.g. 256 cores, 0 GB RAM). | +| `fingerprint.BotSignals()` | `[]str` | The full list of signal names that fired, for logging. | + +Example — reject only clients with multiple, independent signals so you don't punish a flaky headless screenshot bot for tripping a single check: + +```yaml +on_challenge_submit: + - filter: fingerprint.IsBot() && fingerprint.BotSignalCount() >= 2 + apply: + - RejectSubmission("multiple bot signals") +``` + +See the [Hooks reference](../hooks.md#the-fingerprint-object) for the full list of fingerprint methods. + +### From a scenario + +Every step of the challenge lifecycle (requested / submitted / failed / solved) emits a CrowdSec event with `source: crowdsec-appsec-challenge`, distinct from `crowdsec-appsec` events emitted by WAF rule matches. 
The most important fingerprint signals are also flattened into `evt.Parsed` so scenario `filter` expressions can match on them cheaply, and the full fingerprint object is available under `evt.Unmarshaled.fingerprint` for richer queries. + +Flat fields exposed in `evt.Parsed`: + +| Field | Values | Meaning | +| -------------------------------------- | ----------------------------------- | --------------------------------------------------------------------------------------------------------- | +| `source` | `crowdsec-appsec-challenge` | Distinguishes challenge events from WAF rule events. | +| `challenge_event` | `requested` / `submitted` / `failed` / `solved` | Which step of the lifecycle produced the event. | +| `challenge_difficulty` | integer (as a string) | The PoW difficulty applied to this challenge. | +| `challenge_fail_reason` | string (only on `failed`) | Why a submission was rejected (`fast-bot-detection`, or the reason passed to `RejectSubmission()`). | +| `fsid` | string | Per-fingerprint identifier. Stable across the cookie's lifetime — useful for `groupby`. | +| `fingerprint_bot` | `"true"` / `"false"` | `"true"` if the attached fingerprint was flagged as automation. Only set when a fingerprint was attached to the event. | +| `fingerprint_allowlisted` | `"true"` / `"false"` | Whether this cookie was issued via `GrantChallengeCookie(...)` rather than a real submission. | +| `fingerprint_allowlist_reason` | string | The reason argument passed to `GrantChallengeCookie(...)` (only set when allowlisted). | +| `user_agent` | string | The client's User-Agent at the time of the event. | + +This makes it straightforward to write your own scenarios on top of the built-in ones.
For example, alerting on any client the challenge identified as automation: + +```yaml +type: leaky +name: mycorp/appsec-bot-detected +filter: | + evt.Parsed.source == "crowdsec-appsec-challenge" && + evt.Parsed.fingerprint_bot == "true" +groupby: evt.Meta.source_ip +capacity: 1 +leakspeed: 1m +labels: + type: appsec + service: bot-detection +``` + +Or, more targeted, alerting on repeat offenders that fail submission for the same automation reason: + +```yaml +type: leaky +name: mycorp/appsec-automation-repeat +filter: | + evt.Parsed.source == "crowdsec-appsec-challenge" && + evt.Parsed.challenge_event == "failed" && + evt.Parsed.challenge_fail_reason == "fast-bot-detection" +groupby: evt.Meta.source_ip +capacity: 5 +leakspeed: 10m +``` + +For deeper queries that the flat fields don't cover, `evt.Unmarshaled.fingerprint` exposes the same helper methods as the in-hook `fingerprint` object: + +```yaml +filter: | + evt.Parsed.source == "crowdsec-appsec-challenge" && + evt.Unmarshaled.fingerprint.HasAutomationSignal() +``` + +## Next steps + +- [Bot detection configuration](configuration.md) — tune the master secret, key rotation, cookie TTL, and JS obfuscation pool sizes. +- [Hooks reference](../hooks.md) — full list of helpers and the new `on_challenge` / `on_challenge_submit` stages. +- [Request lifecycle](../request-lifecycle.md) — where the challenge runs relative to WAF rules. diff --git a/crowdsec-docs/docs/appsec/hooks.md b/crowdsec-docs/docs/appsec/hooks.md index 32f58eb72..8a454d6a9 100644 --- a/crowdsec-docs/docs/appsec/hooks.md +++ b/crowdsec-docs/docs/appsec/hooks.md @@ -6,12 +6,14 @@ sidebar_position: 4 The Application Security Component lets you hook into different stages to change behavior at runtime. -Hooks run in four phases: +Hooks run in six phases: - `on_load`: Called just after the rules have been loaded into the engine. - `pre_eval`: Called after a request has been received but before the rules are evaluated. 
- `post_eval`: Called after the rules have been evaluated. - `on_match`: Called after a successful match of a rule. If multiple rules, this hook will be called only once. +- `on_challenge`: Called for in-band requests carrying a valid challenge cookie, with the decoded `fingerprint` object available. See [Bot detection](bot_detection/intro.md). (In-band only.) +- `on_challenge_submit`: Called when a client POSTs a challenge response to `/crowdsec-internal/challenge/submit`, after crypto validation and fingerprint decryption. See [Bot detection](bot_detection/intro.md). (In-band only.) ## Using hooks @@ -98,6 +100,9 @@ This hook is intended to be used to disable rules only for this particular reque | `DisableBodyInspection` | `func()` | Skip body inspection for the current request (also bypasses the maximum body size check). See [Request body size handling](#request-body-size-handling) | | `ValidateRequestWithSchema` | `func(ref str) bool` | Validate the current request against an OpenAPI schema previously loaded under `ref` (returns `true` on success). On failure, structured details are published to `hook_vars` (see [OpenAPI Schema Validation](api_validation.md#validation-result-variables)). | | `hook_vars` | `map[string]string` | Per-request scratch space shared with later hooks and propagated to the resulting event. Helpers such as `ValidateRequestWithSchema` publish their results here. | +| `SendChallenge` | `func()` | Instruct the AppSec component to serve a JavaScript challenge for this request. No-op if the request already carries a valid challenge cookie. See [Bot detection](bot_detection/intro.md). | +| `GrantChallengeCookie` | `func(reason str, ttl str?)` | Mint a valid challenge cookie for this client (allowlist escape hatch for trusted user-agents or internal probes). `reason` is recorded in logs (≤256 bytes); optional `ttl` (a Go duration like `"24h"`) overrides the configured `cookie_ttl`. 
| +| `SetChallengeDifficulty` | `func(level str)` | Override the proof-of-work difficulty for this request. Valid levels: `"disabled"`, `"low"`, `"medium"` (default), `"high"`, `"impossible"`. See [Challenge difficulty levels](#challenge-difficulty-levels). | #### Example @@ -119,12 +124,15 @@ This hook is mostly intended for debugging or threat-hunting purposes. #### Available helpers -| Helper Name | Type | Description | -| ------------- | -------------- | ------------------------------------------------------------ | -| `IsInBand` | `bool` | `true` if the request is in the in-band processing phase | -| `IsOutBand` | `bool` | `true` if the request is in the out-of-band processing phase | -| `DumpRequest` | `func()` | Dump the request to a file | -| `req` | `http.Request` | Original HTTP request received by the remediation component | +| Helper Name | Type | Description | +| ------------------------ | ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `IsInBand` | `bool` | `true` if the request is in the in-band processing phase | +| `IsOutBand` | `bool` | `true` if the request is in the out-of-band processing phase | +| `DumpRequest` | `func()` | Dump the request to a file | +| `req` | `http.Request` | Original HTTP request received by the remediation component | +| `SendChallenge` | `func()` | Instruct the AppSec component to serve a JavaScript challenge for this request. No-op if the request already carries a valid challenge cookie. See [Bot detection](bot_detection/intro.md). | +| `GrantChallengeCookie` | `func(reason str, ttl str?)` | Mint a valid challenge cookie for this client (allowlist escape hatch for trusted user-agents or internal probes). 
`reason` is recorded in logs (≤256 bytes); optional `ttl` (a Go duration like `"24h"`) overrides the configured `cookie_ttl`. | +| `SetChallengeDifficulty` | `func(level str)` | Override the proof-of-work difficulty for this request. Valid levels: `"disabled"`, `"low"`, `"medium"` (default), `"high"`, `"impossible"`. See [Challenge difficulty levels](#challenge-difficulty-levels). | #### DumpRequest @@ -212,6 +220,58 @@ on_match: - SetRemediation("allow") ``` +### `on_challenge` + +This hook fires for in-band requests that carry a valid `__crowdsec_challenge` cookie — i.e. clients that have already passed the JavaScript challenge once. The decoded device `fingerprint` is available, so this is the right place to apply per-request decisions based on what the challenge learned about the client. Skipped if the request has no valid challenge cookie. **In-band only.** + +See [Bot detection](bot_detection/intro.md) for the broader picture. + +#### Available helpers + +| Helper Name | Type | Description | +| -------------------------------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `SendChallenge` | `func()` | Force a re-challenge for this request even though the client already has a cookie (e.g. when fingerprint mismatches indicate the cookie may have been replayed). | +| `SetChallengeDifficulty` | `func(level str)` | Override the proof-of-work difficulty for the next challenge issued. See [Challenge difficulty levels](#challenge-difficulty-levels). | +| `EvaluateMismatches` | `func() MismatchReport` | Run the configured mismatch checks against the fingerprint and return a structured report. Result is cached per request. See [The `MismatchReport` object](#the-mismatchreport-object). 
|
+| `fingerprint` | `FingerprintData` | The decoded fingerprint object. See [The `fingerprint` object](#the-fingerprint-object). |
+| `fingerprint.UAMobileMismatch` | `func() bool` | `true` if the mobile signals carried by the fingerprint contradict the User-Agent header. |
+| `fingerprint.AcceptLanguageMismatch` | `func(req http.Request) bool` | `true` if the `Accept-Language` header is inconsistent with the languages reported by the fingerprint. |
+| `fingerprint.TimezoneCountryMismatch` | `func(country str) bool` | `true` if the timezone reported by the fingerprint is inconsistent with the given country code (typically obtained from a GeoIP lookup on the client IP). |
+
+#### Example
+
+```yaml
+on_challenge:
+  - filter: EvaluateMismatches().High() >= 1
+    apply:
+      - SendChallenge()
+```
+
+### `on_challenge_submit`
+
+This hook fires when a client POSTs a challenge response to `/crowdsec-internal/challenge/submit`, **after** the AppSec component has cryptographically validated the submission and decrypted the fingerprint, but **before** the success cookie is issued. This is the right place to refuse cookies to clients the challenge has positively identified as automation. **In-band only.**
+
+#### Available helpers
+
+| Helper Name | Type | Description |
+| ----------------------- | ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `RejectSubmission` | `func(reason str, verbosity str?)` | Refuse to issue a challenge cookie despite a cryptographically valid submission. `reason` is recorded in logs. The optional `verbosity` controls how much fingerprint detail is logged: `"minimal"`, `"info"` (default), or `"verbose"`. |
+| `GrantChallengeCookie` | `func(reason str, ttl str?)` | Issue the challenge cookie inline as part of the submit response (no 307 redirect). `reason` is recorded in logs; optional `ttl` (a Go duration like `"24h"`) overrides the configured `cookie_ttl`. |
+| `LogAccepted` | `func(msg str, verbosity str?)` | Emit a structured "submission accepted" log line. Same `verbosity` semantics as `RejectSubmission`. |
+| `EvaluateMismatches` | `func() MismatchReport` | Same as in `on_challenge`: runs the mismatch checks against the just-decrypted fingerprint. |
+| `fingerprint` | `FingerprintData` | The decoded fingerprint object. See [The `fingerprint` object](#the-fingerprint-object). |
+
+#### Example
+
+```yaml
+on_challenge_submit:
+  - filter: fingerprint.FastBotDetection.Bool() == true
+    apply:
+      - RejectSubmission("fast-bot-detection")
+  - apply:
+      - LogAccepted("challenge submission accepted")
+```
+
 ## Detailed Helpers Information
 
 ### `SetRemediation*`
@@ -273,3 +333,52 @@ For example:
 - To get the client IP: `req.RemoteAddr`
 - To get the HTTP method: `req.Method`
 - To get the FQDN: `req.Host`
+
+### Challenge difficulty levels
+
+`SetChallengeDifficulty(level)` accepts the following levels. The numbers are approximate proof-of-work iteration counts and rough wall-clock solve times on a modern desktop browser; mobile devices are noticeably slower.
+
+| Level | Approx. iterations | Approx. solve time | When to use |
+| -------------- | ------------------ | ------------------ | ---------------------------------------------------------------------------------------------------------------------------- |
+| `"disabled"` | 0 (any nonce wins) | instant | Functional smoke testing or when you only care about the fingerprint, not the proof-of-work. |
+| `"low"` | ~1 024 | 0.2 – 2 s | Latency-sensitive endpoints, mobile-heavy traffic. |
+| `"medium"` | ~4 096 | 1 – 8 s | **Default.** Reasonable trade-off between user experience and attacker cost. |
+| `"high"` | ~32 768 | 7 – 60 s | Routes under active abuse; clients you already suspect. |
+| `"impossible"` | unsolvable | n/a | Hard block: the AppSec component rejects the submission server-side. Use it to fully block a client without leaking the reason. |
+
+### The `fingerprint` object
+
+In `on_challenge` and `on_challenge_submit` hooks, `fingerprint` exposes the device data collected by the in-browser library. The most commonly used fields are:
+
+| Field | Type | Description |
+| --------------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `fingerprint.FastBotDetection.Bool()` | `bool` | `true` if the in-browser fast-bot-detection library raised any signal (CDP, headless, automation framework, impossible device profile, …). |
+| `fingerprint.Signals` | object | Raw category roll-ups: device, browser, automation, graphics, codecs, locale. |
+| `fingerprint.Bot` | object | Convenience accessor for the individual bot signals. |
+| `fingerprint.Allowlisted` | `bool` | `true` if the cookie was minted via `GrantChallengeCookie(...)` rather than a real challenge submission. |
+| `fingerprint.AllowlistReason` | `str` | Operator-supplied reason from `GrantChallengeCookie(reason, ...)`, copied through to logs. |
+| `fingerprint.FSID` | `str` | Per-fingerprint identifier, stable across the cookie's lifetime. Useful for correlating logs. |
+
+For the higher-level bot detection workflow (what the library actually detects, how to allowlist legitimate bots, behavioral scenarios), see [Bot detection](bot_detection/intro.md).
+
+### The `MismatchReport` object
+
+`EvaluateMismatches()` returns a `MismatchReport` (cached per request) summarising every mismatch signal that fired against the current fingerprint.
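+
+Because the report counts fired signals per severity (`.High()`, `.Medium()`, `.Low()`, listed below), it can also be used to scale the proof-of-work with how suspicious a client already looks. The snippet below is an illustrative sketch, not a recommended policy: the thresholds are arbitrary, and it assumes `SetChallengeDifficulty` can be called from `on_challenge` before `SendChallenge()`. The two filters are mutually exclusive, so the outcome does not depend on whether later hooks keep running once an earlier one has matched:
+
+```yaml
+on_challenge:
+  # Sketch: at least one high-severity mismatch, so make the challenge expensive.
+  - filter: EvaluateMismatches().High() >= 1
+    apply:
+      - SetChallengeDifficulty("high")
+      - SendChallenge()
+  # Medium-severity mismatches only: no SetChallengeDifficulty call,
+  # so the default difficulty ("medium") applies.
+  - filter: EvaluateMismatches().High() == 0 && EvaluateMismatches().Medium() >= 1
+    apply:
+      - SendChallenge()
+```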
+
+| Method | Returns | Description |
+| ---------------------------- | ---------- | -------------------------------------------------------------------------- |
+| `.Count()` | `int` | Total number of signals fired. |
+| `.Empty()` | `bool` | `true` if no signal fired. |
+| `.High() / .Medium() / .Low()` | `int` | Count of fired signals by severity. |
+| `.Has(reason str)` | `bool` | `true` if the specific signal `reason` fired. |
+| `.Reasons()` | `[]string` | Stable-ordered list of fired reason keys. |
+| `.String()` | `str` | Compact human-readable form: `"reason1(sev),reason2(sev)"`. Useful in logs. |
+
+Example:
+
+```yaml
+on_challenge_submit:
+  - filter: EvaluateMismatches().High() >= 1 && EvaluateMismatches().Has("cdp")
+    apply:
+      - RejectSubmission("high-severity-mismatch")
+```
diff --git a/crowdsec-docs/sidebars.ts b/crowdsec-docs/sidebars.ts
index cee84bcf3..feb48c326 100644
--- a/crowdsec-docs/sidebars.ts
+++ b/crowdsec-docs/sidebars.ts
@@ -713,6 +713,14 @@ const sidebarsConfig: SidebarConfig = {
                 { type: "doc", id: "appsec/rules_examples" },
             ],
         },
+        {
+            type: "category",
+            label: "Bot detection",
+            link: { type: "doc", id: "appsec/bot_detection/intro" },
+            items: [
+                { type: "doc", id: "appsec/bot_detection/configuration" },
+            ],
+        },
         {
             type: "category",
             label: "References",
diff --git a/crowdsec-docs/src/components/remediation-support-badge.tsx b/crowdsec-docs/src/components/remediation-support-badge.tsx
index e3f122615..a068f18ad 100644
--- a/crowdsec-docs/src/components/remediation-support-badge.tsx
+++ b/crowdsec-docs/src/components/remediation-support-badge.tsx
@@ -8,6 +8,7 @@ type RemediationSupportBadgesProps = {
     Mode: boolean; // Mode is a boolean that controls the color of the Mode bubble
     Metrics: boolean; // Metrics is a boolean that controls the color of the Metrics bubble
     Appsec?: boolean; // Appsec is a boolean that controls the color of the AppSec bubble
+    BotDetection?: boolean; // BotDetection is a boolean that controls the color of the Bot Detection bubble
 };
 
 const RemediationSupportBadge = ({ title, description, support }: { title: string; description: string; support: string }) => {
@@ -38,12 +39,14 @@ export default function RemediationSupportBadges({
     Prometheus,
     Mode,
     Appsec,
+    BotDetection,
 }: Readonly): React.JSX.Element {
     const mtlsSupport = MTLS ? "Supported" : "Unsupported";
     const metricsSupport = Metrics ? "Supported" : "Unsupported";
     const prometheusSupport = Prometheus ? "Supported" : "Unsupported";
     const modeSupport = Mode ? "Live & Stream" : "Stream only";
     const appsecSupport = Appsec !== undefined && Appsec ? "Supported" : "Unsupported";
+    const botDetectionSupport = BotDetection !== undefined && BotDetection ? "Supported" : "Unsupported";
 
     return (
@@ -54,6 +57,13 @@ export default function RemediationSupportBadges({
                     support={appsecSupport}
                 />
             )}
+            {BotDetection !== undefined && (
+
+            )}
diff --git a/crowdsec-docs/unversioned/bouncers/nginx.mdx b/crowdsec-docs/unversioned/bouncers/nginx.mdx
index b2db62174..69f890e4a 100644
--- a/crowdsec-docs/unversioned/bouncers/nginx.mdx
+++ b/crowdsec-docs/unversioned/bouncers/nginx.mdx
@@ -28,7 +28,7 @@ import RemediationSupportBadges from "@site/src/components/remediation-support-b
 💬 Discourse

- + A lua Remediation Component for nginx.