
How to protect a small app from content scraping?


I run a subscription-based application where my primary value proposition is the content itself. Due to the limited size of my dataset, content scraping poses an existential threat to my business. I need help designing an effective anti-scraping solution for my specific use case.

Current Situation:

  • Small, finite dataset that could be completely scraped with ~20,000 requests
  • Users must have a paid subscription to access content
  • Already experiencing multiple instances of scraping and reselling of content
  • Authentication is user-based (not IP-based)

Proposed Solution:
I’m considering implementing a two-tier rate limiting system:

  1. Per-minute limit: When exceeded, user must solve a CAPTCHA/challenge to continue
  2. Per-day quota: When exceeded, the account is automatically and permanently banned, since no legitimate user would ever reach that limit
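
To make the proposal concrete, here is a minimal sketch of the two-tier check in plain Python. It is an illustration under assumptions, not my actual implementation: the class name `TwoTierLimiter`, the thresholds (30 per minute, 500 per day), and the in-memory storage are all hypothetical. A real deployment would keep the counters in a shared store and key them by the authenticated user ID.

```python
import time
from collections import defaultdict, deque

ALLOW, CHALLENGE, BAN = "allow", "challenge", "ban"


class TwoTierLimiter:
    """Hypothetical two-tier limiter keyed by user ID (in-memory sketch)."""

    def __init__(self, per_minute=30, per_day=500, clock=time.time):
        self.per_minute = per_minute  # assumed threshold, tune per app
        self.per_day = per_day        # assumed threshold, tune per app
        self.clock = clock            # injectable so the logic can be tested
        self.minute_hits = defaultdict(deque)  # user_id -> timestamps in the last 60 s
        self.day_counts = defaultdict(int)     # (user_id, day index) -> request count
        self.banned = set()

    def check(self, user_id):
        """Record one request and return ALLOW, CHALLENGE, or BAN."""
        if user_id in self.banned:
            return BAN

        now = self.clock()

        # Tier 2: daily quota. Exceeding it bans the account permanently.
        day_key = (user_id, int(now // 86400))
        self.day_counts[day_key] += 1
        if self.day_counts[day_key] > self.per_day:
            self.banned.add(user_id)
            return BAN

        # Tier 1: sliding 60-second window. Exceeding it triggers a challenge.
        hits = self.minute_hits[user_id]
        while hits and now - hits[0] >= 60:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.per_minute:
            return CHALLENGE

        return ALLOW
```

In this sketch the caller decides what to do with each result: serve the content on `ALLOW`, show the CAPTCHA on `CHALLENGE`, and reject the request on `BAN`. Because the counters are per user, a scraper spreading requests across several paid accounts would stay under both limits on each one.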

Key Questions:

  1. Is this approach likely to be effective for my use case?
  2. What are potential drawbacks or ways scrapers might circumvent this?
  3. Are there better alternatives I should consider?
  4. What would be appropriate thresholds for the rate limits given my dataset size?

Additional Context:
Traditional rate-limiting solutions I’ve found are designed for large-scale websites where even 5000 requests/day is acceptable. These solutions don’t address my need to protect a small, valuable dataset.
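
A rough way to reason about thresholds is to estimate how many days a full scrape would take at a given daily quota. The sketch below uses my ~20,000-request dataset size; the function name `days_to_scrape` and the 200 requests/day quota are hypothetical values chosen only for illustration.

```python
import math


def days_to_scrape(dataset_size, daily_quota, accounts=1):
    """Days needed to fetch the whole dataset while staying under the daily quota.

    Assumes one request per content item and that every account uses
    its full quota each day.
    """
    return math.ceil(dataset_size / (daily_quota * accounts))


# At the 5000 requests/day that large-site tooling tolerates:
print(days_to_scrape(20_000, 5_000))              # 4
# With a hypothetical 200 requests/day quota:
print(days_to_scrape(20_000, 200))                # 100
# Same quota, but the scraper pays for 10 accounts:
print(days_to_scrape(20_000, 200, accounts=10))   # 10
```

The numbers suggest a per-user quota slows a scraper down but does not stop one who is willing to pay for multiple subscriptions.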

Technical Implementation Details:

  • Backend technology: Django
  • Current authentication method: JWT

Has anyone implemented similar protection for a small-scale content-based service? Looking for both successful approaches and pitfalls to avoid.


