
Dokkaebi Labs · April 6, 2026 · 4 min read

How We Cleaned Up 875,000 Spam URLs: A Security Incident Response

Learn how we diagnosed and fixed a massive SEO spam attack that indexed 875,000 malicious URLs in 24 hours.

security · incident-response · case-study · seo

The Call

Client woke up to a nightmare: Google Search Console showing 875,000 indexed pages. Their site had 5 real pages.

The legitimate pages were drowning in a sea of spam. Bots were hammering the server at scale, load was at 90%+ and climbing, and nothing was slowing the crawlers down. Legitimate traffic was dying.

This wasn't just an SEO problem — it was a security incident.

What Happened

SEO spam injection attack. Attackers had compromised the site and created hundreds of thousands of malicious URLs designed to rank for high-value keywords: casinos, pharmaceuticals, cheap products. Google's crawler, doing its job, indexed all of them.

The attacker's goal: rank these spam URLs in Google, monetize them, and destroy the legitimate site's SEO in the process. It worked.

How We Found It

Symptom 1: Google Search Console showed 875,000 pages indexed (vs. 5 real pages).

Symptom 2: Server logs showed thousands of requests per minute to URLs like /casino-bonuses, /viagra-online, /cheap-flights-xxxx.

Symptom 3: Legitimate traffic was down 95%.
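Symptom 2 is the kind of thing you pull straight out of the access logs. A minimal sketch of that tally in Python (the log lines here are invented stand-ins; the real pipeline ran over far more data):

```python
import re
from collections import Counter

# Invented sample access-log lines standing in for the real traffic
LOG_LINES = [
    '203.0.113.7 - - [06/Apr/2026] "GET /casino-bonuses HTTP/1.1" 200',
    '203.0.113.7 - - [06/Apr/2026] "GET /viagra-online HTTP/1.1" 200',
    '198.51.100.2 - - [06/Apr/2026] "GET /cheap-flights-1234 HTTP/1.1" 200',
    '192.0.2.9 - - [06/Apr/2026] "GET /about HTTP/1.1" 200',
]

# The same path prefixes the cleanup targets
SPAM = re.compile(r'^/(casino|viagra|cheap-|buy-)', re.IGNORECASE)

def spam_hits(lines):
    """Count requests per spam path."""
    hits = Counter()
    for line in lines:
        m = re.search(r'"(?:GET|POST) (\S+)', line)
        if m and SPAM.match(m.group(1)):
            hits[m.group(1)] += 1
    return hits

print(spam_hits(LOG_LINES))  # 3 of the 4 sample requests are spam
```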

The Fix

We implemented a 6-layer defense strategy:

1. 410 Gone Responses (Immediate)

First instinct: return 404 for spam URLs. Wrong. Google treats 404 as "page disappeared" but might re-crawl later. Better: return 410 Gone — "this page is permanently gone" — which Google respects and de-indexes faster.

Nginx config:

location ~* ^/(casino|viagra|cheap-|buy-) {
    return 410;
}

Key insight: 410 > 404 for spam cleanup. Google starts de-indexing 410 responses within days, while 404s can linger for weeks waiting on re-crawls.
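Before shipping a pattern like that, it's worth sanity-checking exactly which paths it catches. A quick Python mirror of the same regex (the test paths are hypothetical):

```python
import re

# Mirror of the nginx pattern: location ~* ^/(casino|viagra|cheap-|buy-)
GONE = re.compile(r'^/(casino|viagra|cheap-|buy-)', re.IGNORECASE)

def status_for(path):
    """Status the nginx rule would return for a given request path."""
    return 410 if GONE.match(path) else 200

assert status_for("/casino-bonuses") == 410
assert status_for("/CHEAP-flights-1234") == 410  # ~* makes the match case-insensitive
assert status_for("/buy-now") == 410
assert status_for("/about") == 200
assert status_for("/checkout") == 200  # "cheap-" requires the hyphen, so this is safe
```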

2. Bot Filtering (Hours 2–4)

Not all crawlers are legitimate. We filtered by user-agent, but a naive block list needs care: a bare "bot" pattern also matches Googlebot, so the whitelist has to be checked first. In nginx, a map with the allowed engines listed before the block patterns does exactly that (the first matching regex wins):

map $http_user_agent $bad_bot {
    ~*(googlebot|bingbot|duckduckbot) 0;  # whitelist major search engines
    ~*(scrapy|curl|wget|bot)          1;  # block scrapers and generic bots
    default                           0;
}

if ($bad_bot) {
    return 403;
}

This keeps Googlebot and the other major search engines crawling while the scrapers get a 403. (User-agents can be spoofed, of course, so this is a first filter, not the whole defense.)
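The whitelist isn't optional: a bare "bot" token also matches Googlebot's own user-agent string, so the allow check has to run before the block check. A small Python illustration of that ordering (the allow/block patterns are assumptions mirroring this section, not nginx itself):

```python
import re

ALLOW = re.compile(r'(googlebot|bingbot|duckduckbot)', re.IGNORECASE)
BLOCK = re.compile(r'(scrapy|curl|wget|bot)', re.IGNORECASE)

def verdict(user_agent):
    """403 for bad bots, 200 otherwise; the whitelist wins over the block list."""
    if ALLOW.search(user_agent):
        return 200
    if BLOCK.search(user_agent):
        return 403
    return 200

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
assert BLOCK.search(googlebot)    # a naive block list would catch Googlebot...
assert verdict(googlebot) == 200  # ...but the whitelist check runs first
assert verdict("Scrapy/2.11") == 403
assert verdict("curl/8.5.0") == 403
assert verdict("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") == 200
```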

3. Rate Limiting (Hours 4–6)

The attack traffic followed clear per-IP patterns, so we implemented aggressive rate limiting:

# http block: track clients by IP in a 10 MB zone, 10 requests/second each
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

# server/location block: allow a burst of 20, reject the excess immediately
limit_req zone=one burst=20 nodelay;

This means: at most 10 requests per second per IP, with headroom for bursts of 20; anything beyond that is rejected outright (503 by default). Repeat offenders get throttled to nothing.
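Under the hood, limit_req is a leaky-bucket algorithm. A toy Python model of the same policy (10 r/s, burst of 20) shows why steady traffic sails through while floods get cut off; this is an illustration of the idea, not nginx's implementation:

```python
class LeakyBucket:
    """Toy model of nginx limit_req: a drain rate plus a burst allowance."""

    def __init__(self, rate=10.0, burst=20):
        self.rate = rate    # requests per second drained from the bucket
        self.burst = burst  # extra requests allowed to queue up
        self.level = 0.0    # current bucket fill
        self.last = 0.0     # timestamp of the previous request

    def allow(self, now):
        # Drain the bucket for the time elapsed since the last request
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level < self.burst + 1:
            self.level += 1.0
            return True   # request served
        return False      # rejected (nginx would answer 503)

flood = LeakyBucket()
# 40 requests in the same instant: the first one plus the burst of 20
# are served (21 total), the remaining 19 are rejected
results = [flood.allow(now=0.0) for _ in range(40)]
print(results.count(True), "of", len(results), "served")
```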

4. Security Headers (Hours 6–8)

We hardened the response headers to close off common client-side attack vectors (MIME sniffing, clickjacking, script injection) and make re-compromise harder:

add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header Content-Security-Policy "default-src 'self'" always;

5. Firewall Rules (Hours 8–12)

On Cloudflare, we added WAF rules to block known malicious IP ranges and patterns:

  • Block countries not in service region
  • Challenge requests with suspicious patterns
  • Rate limit by path (spam paths get hit harder)

6. Sitemap & Robots.txt Cleanup (Hours 12–24)

We regenerated the sitemap with only legitimate pages and updated robots.txt:

User-agent: *
Disallow: /casino
Disallow: /viagra
Disallow: /cheap-
Disallow: /buy-

Then we submitted the updated sitemap to Google Search Console to accelerate de-indexing. One caveat: a Disallow rule also stops Googlebot from recrawling those paths, which can keep it from ever seeing the 410s, so the robots.txt rules are best layered in once de-indexing is well underway.
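Python's standard urllib.robotparser makes it easy to verify rules like these offline before they go live; a quick check against the robots.txt above:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /casino
Disallow: /viagra
Disallow: /cheap-
Disallow: /buy-
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# The spam prefixes are blocked for all crawlers...
assert not rp.can_fetch("*", "/casino-bonuses")
assert not rp.can_fetch("*", "/cheap-flights-1234")
# ...while the real pages stay crawlable
assert rp.can_fetch("*", "/")
assert rp.can_fetch("*", "/about")
```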

Results

De-indexing: Spam URLs removed from Google index over 2–4 weeks (expected when returning 410).

Server Load: Dropped from 90%+ to ~15%. The site was responsive again.

Bot Traffic: Malicious crawlers blocked. Legitimate traffic (Googlebot) allowed through.

Security: No re-infection with the hardened headers + rate limiting in place.

Timeline

  • Hour 0: Discovery via client alert
  • Hour 1: Root cause analysis (spam injected via a URL parameter)
  • Hours 2–24: Deploy all 6 layers of fixes
  • Days 2–14: Google processes 410 responses, begins de-indexing
  • Weeks 2–4: Spam URLs removed from the index

Lessons

1. 410 > 404

Always return 410 (Gone) for spam URLs you want to remove. It signals permanence to search engines and accelerates de-indexing.

2. Rate Limiting Is Non-Negotiable

Every production site should have rate limiting. It prevents this exact scenario and defends against DDoS.

3. Bot Filtering Matters

Not all traffic is good traffic. Filter by user-agent, IP, and request pattern.

4. Security Headers Are Free

These headers cost nothing to implement and prevent a whole class of attacks. Use them.

5. Monitor Your Indexed Page Count

Set up alerts in Google Search Console and Analytics. An unexpected spike in indexed pages is a red flag.
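The trigger logic itself can be trivial once you have the indexed count in hand. A sketch (the thresholds are arbitrary assumptions, and pulling the count from the Search Console API is left out):

```python
def index_count_alert(previous, current, ratio=5.0, floor=50):
    """Flag a suspicious jump in indexed pages.

    ratio and floor are arbitrary thresholds: alert only when the count
    multiplies by `ratio` AND grows by at least `floor` pages, so a small
    site publishing a handful of posts doesn't trip the alarm.
    """
    return current >= previous * ratio and current - previous >= floor

assert index_count_alert(5, 875_000)      # the incident in this post
assert not index_count_alert(5, 12)       # normal growth on a tiny site
assert not index_count_alert(1000, 1100)  # 10% growth, no alert
```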

The Bigger Picture

This wasn't a simple SEO problem. It was a security incident that required:

  • Root cause analysis (how was it injected?)
  • Incident response (how do we stop it?)
  • Remediation (how do we prevent it again?)

The same principles apply to any security incident: understand it, stop it, learn from it.

Want Similar Help?

If your site is under attack, or you want to audit your infrastructure for vulnerabilities like this, we can help.

Schedule a security consultation →


Learn more about our security consulting services.

Have questions or want to discuss this further? Reach out on WhatsApp or email.

Get in touch →