Data Sources & Ethics
Where the evidence comes from, and the line we hold around it.
Last updated 2026-05
What we read
In V1 the qualitative evidence comes from public Reddit and YouTube content, scoped to the communities and channels where your target consumer actually talks about the problem. Every quote in a report links back to its source.
How we access it
We work API-first where an official API exists, under the source platform's data terms. For web sources we honor robots.txt and enforce rate limits with per-source and global quotas.
- No private content: no login walls, no paywall bypass, nothing behind authentication.
- Source attribution is preserved internally; reports cite the source URL.
- We operate under the official Reddit Data API terms and respect deletion propagation.
- Analyzed content is never sold or licensed to third parties.
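To make the rate-limiting point concrete, here is a minimal sketch of per-source and global quotas over a sliding window. The limiter class, window size, and limits are illustrative assumptions, not the service's real configuration.

```python
import time
from collections import defaultdict


class QuotaLimiter:
    """Enforce a per-source quota and a global quota over a sliding window.

    Illustrative sketch only: limits and window are made-up defaults,
    not real production values.
    """

    def __init__(self, per_source_limit=10, global_limit=50, window_s=60.0):
        self.per_source_limit = per_source_limit
        self.global_limit = global_limit
        self.window_s = window_s
        self._hits = defaultdict(list)  # source name -> request timestamps

    def allow(self, source, now=None):
        """Return True and record the request if both quotas permit it."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_s
        # Drop timestamps that have aged out of the window.
        for s in list(self._hits):
            self._hits[s] = [t for t in self._hits[s] if t > cutoff]
        total = sum(len(v) for v in self._hits.values())
        if total >= self.global_limit:
            return False  # global quota exhausted
        if len(self._hits[source]) >= self.per_source_limit:
            return False  # this source's quota exhausted
        self._hits[source].append(now)
        return True
```

A request is denied if either quota is exhausted, so a burst against one source cannot crowd out the others, and the global cap bounds total load regardless of how many sources are active.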
Anonymization before storage
Personal identifiers are removed before anything is written to storage. Handles are one-way hashed, names and places are stripped, and a final cleanup pass catches anything the earlier steps missed. Only the anonymized form is persisted; the raw scraped text never is. Reports never expose even a pseudonym.
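The handle-hashing step can be sketched as follows. The salt value and the handle regex are illustrative assumptions; a real pipeline would use a secret salt and also strip names and places (e.g. with named-entity recognition) before the cleanup pass.

```python
import hashlib
import re

SALT = b"example-salt"  # assumption: the real pipeline uses a secret, managed salt

HANDLE_RE = re.compile(r"(?:u/|@)[A-Za-z0-9_-]+")  # e.g. u/someone, @someone


def pseudonymize_handle(handle: str) -> str:
    """One-way hash a handle so it cannot be reversed to the original."""
    digest = hashlib.sha256(SALT + handle.lower().encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}"


def anonymize(text: str) -> str:
    """Replace in-text handles with stable pseudonyms.

    Only this anonymized output is ever stored; the raw text is discarded.
    This sketch covers just the handle step of the full pipeline.
    """
    return HANDLE_RE.sub(lambda m: pseudonymize_handle(m.group(0)), text)
```

Because the hash is deterministic, the same handle always maps to the same internal pseudonym, so quotes from one author can still be grouped without the original identity ever being stored.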
Takedowns & erasure
Source platforms and content authors can request removal through a documented process. Anyone whose post was analyzed — customer or not — can request erasure, and we propagate that deletion across every store and any cited quote within 30 days. Upstream deletions on the source platforms are polled and propagated automatically.
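The upstream-deletion poll described above reduces to a simple cascade. Everything here is a hypothetical interface for illustration: `is_deleted_upstream` stands in for a source-platform API check, and each store is assumed to expose a `delete` method.

```python
def propagate_deletions(source_ids, is_deleted_upstream, stores):
    """Poll analyzed items; if one was deleted upstream, erase it everywhere.

    Hypothetical sketch: `is_deleted_upstream` is a callable standing in for
    a platform API check, and `stores` is a list of objects with a
    `delete(source_id)` method. Neither reflects the real system's interfaces.
    """
    erased = []
    for sid in source_ids:
        if is_deleted_upstream(sid):
            for store in stores:
                store.delete(sid)  # cascade the erasure across every store
            erased.append(sid)
    return erased
```

Running this on a schedule gives the automatic propagation the policy describes; author-initiated erasure requests would feed the same cascade directly instead of waiting for the poll.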
The ethical line
We mine a consumer voice that is already public, to answer a narrow question, under a documented methodology, not to profile individuals. We do not build individual-level profiles, and threads touching special-category data (health, religion, sexuality, and the like) are dropped by default.