Stopping distributed scraping with country-level CIDR, on stock nginx

On one website, I noticed that traffic from a particular country’s IPs had grown noticeably. On closer inspection, it wasn’t the usual case of one server hammering the site; it was scraping spread across a great many IPs, each requesting at a modest pace.

This is a record of how I worked out what the traffic was, chose a blocking method by “blast radius” (how far the damage spreads if something breaks), and rolled it out without stopping the running servers. I hope it helps anyone dealing with similar distributed access.

What was happening

Aggregating the access logs, the increased traffic had these characteristics:

So it wasn’t an attack; it was a content-harvesting bot designed to evade rate limits. It appeared to be collecting articles relentlessly, apparently routed through a residential or mobile proxy network.

What matters here is that each IP is modest. The usual approach of looking at per-source behavior, that is, detecting and individually blocking IPs that hit hard in a short window, barely works against this opponent.

Aspect | Typical high-volume access | This distributed access
Source IPs | Concentrated in a few | Spread across 15,000+
Per-IP frequency | High, stands out | Low, hard to tell from real users
Blockable per IP? | ◎ Easy to detect by threshold | ✕ Slips under the threshold; tightening catches real users

If fighting on individual-IP behavior is a losing matchup, the natural answer is to block in bulk at the level of the source’s country or network. Fortunately, this bot was concentrated in one country’s ISPs.
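To make the mismatch concrete, here is a minimal sketch on synthetic data. The addresses are documentation ranges, and the request counts, the threshold of 100, and the function names are all made up for illustration; none of it comes from the real logs. It shows a per-IP threshold catching one loud client while missing 200 quiet ones that only stand out once requests are summed per network.

```python
from collections import Counter
import ipaddress


def per_ip_offenders(hits, threshold):
    """Return the IPs whose request count in the window exceeds the threshold."""
    counts = Counter(hits)
    return {ip for ip, n in counts.items() if n > threshold}


def hits_per_network(hits, prefix_len=24):
    """Sum request counts by the enclosing network of the given prefix length."""
    totals = Counter()
    for ip in hits:
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[str(net)] += 1
    return totals


# Synthetic window: 200 quiet IPs with 3 requests each, plus one loud IP.
distributed = [f"203.0.113.{i}" for i in range(1, 201) for _ in range(3)]
loud = ["192.0.2.10"] * 500
hits = distributed + loud

print(per_ip_offenders(hits, threshold=100))          # only the loud IP
print(hits_per_network(hits, 24)["203.0.113.0/24"])   # 600 requests in total
```

Per IP, the quiet clients look like ordinary readers; per network, they add up to more traffic than the loud one.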

Why block by IP range (CIDR)

When you hear “block by country,” you might brace yourself: “do I have to list 15,000 IPs?” In practice there is no need; CIDR (IP address ranges) handles it.

CIDR is written like 203.0.113.0/24, where one entry represents not a single IP but a whole network range. The ranges can be very wide: the smaller the number after the slash (the prefix length), the larger the range.

CIDR notation | IPs covered by one line (approx.)
/24 | 256
/16 | ~65,500
/12 | ~1.05 million
/10 | ~4.19 million

So a single /10 line covers about 4.19 million IPs.

Why can one line cover so much? In binary, the leading bits of the address, up to the prefix length, are the network part (the range checked for a match), and the rest is the host part (matches any value). With /24 on a 32-bit IPv4 address, the lower 8 bits are free, so one line points at 2^8 = 256 addresses at once.
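The arithmetic can be checked with Python’s standard ipaddress module. This is a small sketch; the 10.0.0.0 ranges are placeholders chosen only because they are valid at each prefix length, and describe is a hypothetical helper name.

```python
import ipaddress


def describe(cidr):
    """Return (prefix length, free host bits, address count) for a CIDR."""
    net = ipaddress.ip_network(cidr)
    host_bits = net.max_prefixlen - net.prefixlen  # 32 - prefix for IPv4
    return net.prefixlen, host_bits, 2 ** host_bits


for cidr in ("203.0.113.0/24", "10.0.0.0/16", "10.0.0.0/12", "10.0.0.0/10"):
    prefix, host_bits, count = describe(cidr)
    print(f"/{prefix}: {host_bits} free bits -> {count:,} addresses")

# Membership compares only the network part; the host part can be anything.
print(ipaddress.ip_address("203.0.113.77") in ipaddress.ip_network("203.0.113.0/24"))
```

The counts come out as 256, 65,536, 1,048,576 and 4,194,304, and the membership check prints True.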

The address space allocated to a given country, as an aggregated CIDR list, comes to roughly 5,500 lines. Those 5,500 lines cover more than 340 million IPs. And crucially:

Blocking individual IPs one by one is whack-a-mole against 340 million. A country-level CIDR list lets you block by area, ahead of time, in about 5,500 lines. That was the essential advantage.
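A list’s line count and coverage can be measured the same way. The sketch below assumes the list is available as plain CIDR strings; the four ranges are placeholders rather than the real country list, and coverage is a hypothetical helper. It uses ipaddress.collapse_addresses to merge overlapping and adjacent ranges before counting.

```python
import ipaddress


def coverage(cidrs):
    """Collapse overlapping/adjacent ranges; return (line count, total addresses)."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    collapsed = list(ipaddress.collapse_addresses(nets))
    return len(collapsed), sum(n.num_addresses for n in collapsed)


sample = [
    "203.0.113.0/25",    # these two halves merge into 203.0.113.0/24
    "203.0.113.128/25",
    "198.51.100.0/24",
    "198.51.100.0/25",   # already contained in the /24 above
]
print(coverage(sample))
```

Here four input lines collapse to two, covering 512 addresses. Run over a real aggregated list, the same function would report figures comparable to the line and address counts quoted above.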

Choosing the approach by “blast radius”

There are several ways to do country-level CIDR blocking. The premise here was a setup with no load balancer or CDN in front: each server exposes 80/443 directly. In other words, if the frontmost nginx goes down, that box’s web service goes offline with it.

Under that premise, the selection criterion is less about performance and more about “how far the collateral damage spreads if it breaks (blast radius).” I lined up three candidates.

Approach | Mechanism | Damage range if it breaks
Host firewall | Drop target IP ranges in the kernel | The whole host. A misconfiguration risks locking out your own SSH. Easy to get container forwarding wrong, too
nginx + external GeoIP database | A dedicated module determines the country | Inside the nginx container. But the official image lacks the module, so you end up owning a self-built image
nginx’s built-in geo module + CIDR list | A standard feature judges IP ranges | Inside the nginx container. Even if the data file breaks, pre-start validation catches it and nginx itself stays unharmed

Placing the three on “effect × blast radius” makes the right choice clear.

[Figure: method-selection chart. The x-axis is damage range (large to small), the y-axis is effect (low to high); nginx geo + CIDR sits in the high-effect, low-impact quadrant.]

I went with the third: nginx’s standard geo module + a CIDR list. The deciding points were:

The external GeoIP database approach is conceptually sound, but it means discarding the official image and turning the frontmost nginx image into something I build and maintain myself. In a setup where the front falling means a box goes down, expanding the failure surface I carry myself wasn’t worth it. I judged that migrating can wait until multi-country support or higher accuracy becomes necessary. The host firewall is the most efficient, but it has the highest chance of a lockout accident, so I kept it in reserve as a last resort for volumetric attacks.

The config is very plain. Load the CIDR list in the http context,

# A lookup table that just returns 0/1
geo $blocked {
    default 0;
    include /etc/nginx/blocked_cidr.conf;   # lines like "203.0.113.0/24 1;"
}

and in the server context, close the connection immediately on a match (444 is a non-standard, nginx-specific code that closes the connection without sending any response).

if ($blocked) {
    return 444;
}

The list itself is just lines of the target country’s CIDRs with 1; appended.

203.0.113.0/24 1;
198.51.100.0/22 1;
...
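Generating that file from a plain list of CIDRs can be sketched as follows. The function name is hypothetical and the input is assumed to be one CIDR string per item; where the real list comes from is outside this sketch. Parsing each entry with ipaddress.ip_network rejects malformed lines before they ever reach nginx.

```python
import ipaddress


def render_geo_entries(cidrs):
    """Turn CIDR strings into geo include lines of the form '<cidr> 1;'."""
    lines = []
    for raw in cidrs:
        # Raises ValueError for malformed input or a network with host bits set
        net = ipaddress.ip_network(raw.strip())
        lines.append(f"{net.with_prefixlen} 1;")
    return "\n".join(lines) + "\n"


print(render_geo_entries(["203.0.113.0/24", "198.51.100.0/22"]), end="")

try:
    render_geo_entries(["203.0.113.1/24"])  # host bits set: not a network address
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly at generation time is a deliberate choice here: a bad entry stops the job instead of producing a file that nginx would later refuse to load.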

Since geo is managed internally with a radix tree, the lookup cost is essentially negligible even with thousands of entries.
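To show why the entry count hardly matters, here is a toy longest-prefix lookup built as an uncompressed binary trie. It is an illustration of the idea only, not nginx’s implementation: the class name is made up, it handles IPv4 only, and a real radix tree is far more compact. The point is that a lookup walks at most 32 bits no matter how many ranges are stored.

```python
import ipaddress


class PrefixTrie:
    """Toy IPv4 longest-prefix-match table (one trie level per address bit)."""

    def __init__(self, default=0):
        self.root = {}
        self.default = default

    def insert(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["value"] = value

    def lookup(self, ip):
        bits = format(int(ipaddress.ip_address(ip)), "032b")
        node = self.root
        best = node.get("value", self.default)
        for b in bits:                      # at most 32 steps per lookup
            node = node.get(b)
            if node is None:
                break
            best = node.get("value", best)  # deeper match overrides shallower
        return best


table = PrefixTrie(default=0)
table.insert("203.0.113.0/24", 1)
table.insert("198.51.100.0/22", 1)
print(table.lookup("203.0.113.77"))  # inside a listed range
print(table.lookup("192.0.2.1"))     # not listed, falls back to the default
```

The first lookup returns 1 and the second returns 0, mirroring what the geo block’s $blocked variable would hold for those addresses.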

Running the list’s auto-update unattended and safely

CIDR allocations shift gradually, so I want to update the list periodically. At the same time, I don’t want to spend human effort on every update, and I absolutely want to avoid restarting nginx with a broken list and taking a box down. So I auto-update via a scheduled job on each server, with several safety nets.

[Figure: auto-update flow. Generate → validate with the full config → on failure do nothing / on success overwrite keeping the same inode and restart → record metrics.]

There are three key points.

1. Validate with the “real config” before going live. Not the new list alone, but the full configuration combining the actual config files and certificates, run through nginx -t in a throwaway container. If it doesn’t pass, the production file is never touched. Even if the download or generation fails, the running nginx is unharmed.

2. Overwrite “by rewriting the same file” (preserving the inode). When config is mounted into a container file by file, replacing the file via rename gives the path a new inode, while the container’s mount can still point at the old one, so the container keeps seeing the old contents. It is a subtle trap that is easy to fall into. So instead of swapping in a new file, I overwrite the contents of the existing file (keeping the same inode) and then restart.

3. Avoid simultaneous restarts. Without a load balancer, multiple boxes restarting at once momentarily reduces the capacity available to accept traffic. I added a random wait to each server’s scheduled job to stagger the restart timing.
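The three points can be sketched together as one update step. This is a minimal illustration under assumptions, not the actual job: the Docker invocation, the nginx:stable image tag, the mount mapping, the 300-second maximum wait, and every function name are hypothetical, and the restart action is passed in by the caller.

```python
import os
import random
import subprocess
import time


def build_validate_command(mounts, image="nginx:stable"):
    """Build a `docker run` command that runs `nginx -t` in a throwaway container."""
    cmd = ["docker", "run", "--rm"]
    for host_path, container_path in mounts.items():
        cmd += ["-v", f"{host_path}:{container_path}:ro"]  # read-only mounts
    return cmd + [image, "nginx", "-t"]


def config_is_valid(mounts, run=subprocess.run):
    """True only if `nginx -t` exits with status 0 against the mounted config."""
    result = run(build_validate_command(mounts), capture_output=True, text=True)
    return result.returncode == 0


def overwrite_in_place(path, new_text):
    """Rewrite an existing file's contents without changing its inode."""
    with open(path, "r+", encoding="utf-8") as f:  # "r+" never creates a new file
        f.write(new_text)
        f.truncate()           # drop any leftover tail of the old contents
        f.flush()
        os.fsync(f.fileno())


def apply_update(new_text, live_path, is_valid, restart,
                 max_wait=300, sleep=time.sleep, rng=random):
    """Validate first; only then wait a random time, overwrite, and restart."""
    if not is_valid():
        return False                    # the live file is never touched
    sleep(rng.uniform(0, max_wait))     # stagger restarts across servers
    overwrite_in_place(live_path, new_text)
    restart()
    return True
```

A caller would pass something like is_valid=lambda: config_is_valid(mounts), with mounts mapping the staged config, certificates, and candidate list to the paths the real nginx uses. Note that an in-place overwrite is not atomic, which is one more reason to validate beforehand and restart only afterwards.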

Update success / failure is emitted as metrics, so failures are noticeable.

Rolling out to a downtime-free fleet, with a canary

Because there’s no load balancer, “try it on just one production box first” can’t be done with a normal deploy (the same config file goes to every box). So I got creative with how the config is shipped.

  1. First, ship to all boxes with the blocking logic commented out. The lookup table, the list, and the auto-update mechanism are in place, but blocking isn’t active (a safe, inert state).
  2. Enable blocking on just one box, by hand, and restart it (the canary).
  3. On that one box’s real traffic, measure before and after.
  4. If there are no problems, remove the comment in the config and roll out to all boxes.

The canary’s measurements were clear-cut.

Aspect | Before | After
Target-country IPs | Thousands passing through | Almost all blocked
Leakage | - | Very slight (only the HTTP→HTTPS redirect that runs first; no content is served)
False blocks of non-targets (legitimate) | - | Zero
Error rate | Normal | No change

Over a full day of observation, there were no restart loops or similar problems, and the box stayed stable. Being able to confirm “it’s working” and “it’s not catching legitimate users” with numbers rather than guesses, before widening to all boxes, was reassuring.
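A before/after check of this kind can be approximated from access logs. The sketch below assumes each log line has already been parsed into an (ip, status) pair and that blocked requests are logged with status 444. The records are synthetic and the function name is hypothetical; the sketch only shows the shape of the tally, not the real measurements.

```python
import ipaddress


def summarize(records, blocked_cidrs):
    """Count requests by (target vs other) and (blocked vs passed)."""
    nets = [ipaddress.ip_network(c) for c in blocked_cidrs]
    stats = {"target_blocked": 0, "target_passed": 0,
             "other_blocked": 0, "other_passed": 0}
    for ip, status in records:
        addr = ipaddress.ip_address(ip)
        group = "target" if any(addr in n for n in nets) else "other"
        outcome = "blocked" if status == 444 else "passed"
        stats[f"{group}_{outcome}"] += 1
    return stats


# Synthetic records: two blocked hits, one redirect that leaked, one normal user.
records = [
    ("203.0.113.5", 444),
    ("203.0.113.9", 444),
    ("203.0.113.7", 301),
    ("192.0.2.44", 200),
]
print(summarize(records, ["203.0.113.0/24"]))
```

In this tally, target_passed corresponds to leakage and other_blocked to false blocks, the two numbers worth watching on the canary. A linear scan over the ranges is fine for a sketch; a full-size list would want a faster lookup.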

Trade-offs and limits

To be honest, country-level blocking requires accepting some compromise.

Even so, it clearly lowered the immediate load with minimal work, on the official image, and in a form whose failure does not ripple to the core. That made it a cost-effective move.

Summary

It’s not flashy, but I hope it serves as an example of building things one piece at a time “in a way that doesn’t break,” for anyone who runs into the same situation.