I run a subscription-based application where my primary value proposition is the content itself. Due to the limited size of my dataset, content scraping poses an existential threat to my business. I need help designing an effective anti-scraping solution for my specific use case.
Current Situation:
- Small, finite dataset that could be completely scraped with ~20,000 requests
- Users must have a paid subscription to access content
- Already experiencing multiple instances of scraping and reselling of content
- Authentication is user-based (not IP-based)
Proposed Solution:
I’m considering implementing a two-tier rate limiting system:
- Per-minute limit: When exceeded, user must solve a CAPTCHA/challenge to continue
- Per-day quota: When exceeded, automatic permanent account ban, since no legitimate user would ever reach that volume
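To make the idea concrete, here is a minimal in-memory sketch of the two-tier check I have in mind. All names and thresholds (30/minute, 500/day) are placeholders, and in production the counters would live in something like Redis so they are shared across workers and survive restarts:

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds -- the actual values are one of my open questions.
PER_MINUTE_LIMIT = 30   # above this -> CAPTCHA challenge
PER_DAY_LIMIT = 500     # above this -> account ban

_minute_hits = defaultdict(deque)  # user_id -> request timestamps in last 60 s
_day_counts = defaultdict(int)     # user_id -> requests today (reset by a daily job)

def check_request(user_id, now=None):
    """Return 'ok', 'captcha', or 'ban' for one incoming request."""
    now = time.time() if now is None else now

    # Tier 2: hard daily quota, checked first.
    _day_counts[user_id] += 1
    if _day_counts[user_id] > PER_DAY_LIMIT:
        return "ban"

    # Tier 1: sliding one-minute window.
    window = _minute_hits[user_id]
    window.append(now)
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) > PER_MINUTE_LIMIT:
        return "captcha"
    return "ok"
```

The key design point is that both tiers key on the authenticated user, not the IP, which matches my JWT-based auth.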
Key Questions:
- Is this approach likely to be effective for my use case?
- What are potential drawbacks or ways scrapers might circumvent this?
- Are there better alternatives I should consider?
- What would be appropriate thresholds for the rate limits given my dataset size?
Additional Context:
Traditional rate-limiting solutions I’ve found are designed for large-scale websites where even 5,000 requests/day per user is acceptable. These solutions don’t address my need to protect a small, valuable dataset.
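To illustrate why dataset size matters here, some back-of-the-envelope arithmetic (the 500/day quota is an assumed value, only the 20,000-item figure comes from my situation):

```python
DATASET_SIZE = 20_000   # total requests needed for a full scrape (from the post)
DAILY_QUOTA = 500       # hypothetical per-account daily cap

def days_to_scrape(accounts):
    """Days an attacker needs to scrape everything with N paid accounts."""
    return DATASET_SIZE / (DAILY_QUOTA * accounts)

# One account needs 40 days; ten accounts need only 4 days. A daily quota
# raises the attacker's cost (subscriptions burned per scrape), but it
# cannot make a full scrape impossible.
```

This is why generic "thousands per day" defaults don’t help me: against a 20,000-item dataset, the quota has to be tight enough that multiplying accounts becomes the expensive part.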
Technical Implementation Details:
- Backend technology: Django
- Current authentication method: JWT
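Since auth is user-based via JWT, I plan to key the rate-limit counters on the token’s subject claim rather than the IP. A sketch of that extraction (assuming the token was already signature-verified upstream by the auth middleware; this only decodes the payload segment and does not validate anything):

```python
import base64
import json

def user_key_from_jwt(token):
    """Extract a stable rate-limit key (the 'sub' claim) from a JWT.

    Assumes the token was already verified by the auth layer -- this
    decodes the payload segment only and does NOT check the signature.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return str(payload["sub"])
```

In a real Django setup this would live in middleware (or a DRF throttle class) so every content view goes through the same counter.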
Has anyone implemented similar protection for a small-scale content-based service? I’m looking for both successful approaches and pitfalls to avoid.