The case for a curated chemical safety API when PubChem has 119 million compounds and EPA CompTox has 900,000.
PubChem has 119 million compounds. EPA CompTox has 900,000. These are extraordinary scientific databases, built and maintained by serious people, free to use, and unrivaled for the work they were designed for. PubChem was built for chemistry research. CompTox was built for environmental and occupational toxicology. They are excellent at those jobs.
They were not built for the question a marketplace operator types into Google at 2 AM after a Prop 65 letter shows up in their inbox: "is this ingredient in this product, in this exposure context, currently classified as a safety concern by the regulators that matter to me?"
That's a different question. The shape of the gap shows up in three places.
First, exposure context. Take glyphosate. IARC classifies it as Group 2A (probably carcinogenic to humans). EPA classifies it as "not likely to be carcinogenic to humans." EU CLP classifies it as Acute Tox. 4 (oral), Eye Dam. 1, and Aquatic Chronic 2.
What does "humans" mean here? An average adult? A pregnant person? An infant whose blood-brain barrier hasn't fully formed? An immunocompromised cancer patient on a regimen that changes how their body clears xenobiotics? Someone with chronic kidney disease? A nursing mother passing every metabolite through her breast milk?
These questions do not share one answer. The dose that's tolerable for a healthy adult through dermal exposure is not the dose that's tolerable for an infant through ingestion. The risk threshold that's appropriate for a workplace OSHA assessment is not appropriate for a kid's bath toy.
Public chemical APIs, on the rare occasions they include any classification at all, give you "humans" as a single bucket and call it a day. Sometimes you also get an aquatic-organism note. Almost never do you get the breakdown by life-stage, route, or population subgroup — the breakdowns that consumer-facing safety apps actually need.
We built ALETHEIA around 19 exposure contexts: 8 human life-stages (adult, child, teen, infant, pregnant, nursing, elderly, immunocompromised), 2 pets (domestic dog, domestic cat), 5 routes (dermal, inhalation, ingestion, ocular, occupational), and 4 environmental (general, aquatic-life, terrestrial-wildlife, pollinator). Every compound record is rendered through whichever context the caller asks for. A pregnancy app gets the prenatal answer. A pet-safety app gets the dog-or-cat answer. A marketplace screening kid products gets the infant or child answer. The same API call, the same record, different rendering.
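One way to picture "context as the primary index" is a lookup keyed by context first. The sketch below is illustrative only: the context identifiers, the record layout (`by_context`), and the `render` function are assumptions made for this example, not ALETHEIA's actual schema or parameter names, and the sample assessments are placeholders rather than real safety data.

```python
from enum import Enum


class ContextGroup(Enum):
    HUMAN_LIFE_STAGE = "human_life_stage"
    PET = "pet"
    ROUTE = "route"
    ENVIRONMENTAL = "environmental"


# The 19 contexts named in the text, grouped the way the text groups them.
# The string identifiers are hypothetical, chosen for readability.
EXPOSURE_CONTEXTS = {
    ContextGroup.HUMAN_LIFE_STAGE: (
        "adult", "child", "teen", "infant",
        "pregnant", "nursing", "elderly", "immunocompromised",
    ),
    ContextGroup.PET: ("dog", "cat"),
    ContextGroup.ROUTE: (
        "dermal", "inhalation", "ingestion", "ocular", "occupational",
    ),
    ContextGroup.ENVIRONMENTAL: (
        "general", "aquatic-life", "terrestrial-wildlife", "pollinator",
    ),
}

ALL_CONTEXTS = frozenset(
    name for group in EXPOSURE_CONTEXTS.values() for name in group
)


def render(record: dict, context: str) -> dict:
    """Return the view of one compound record for a single exposure context."""
    if context not in ALL_CONTEXTS:
        raise ValueError(f"unknown exposure context: {context!r}")
    return {
        "compound": record["compound"],
        "context": context,
        # None means the record carries no assessment for this context.
        "assessment": record["by_context"].get(context),
    }


# Placeholder record: same data, rendered differently per caller.
sample = {
    "compound": "example-compound",
    "by_context": {
        "infant": "placeholder infant assessment",
        "dog": "placeholder dog assessment",
    },
}

infant_view = render(sample, "infant")
dog_view = render(sample, "dog")
```

The point of the shape is that the caller names the context and the record stays the same; a pregnancy app and a pet-safety app would differ only in the `context` argument.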
Most public databases treat exposure context as an afterthought. We treat it as the primary index.
Second, regulator disagreement. Glyphosate, again. Three of the most cited regulators classify it three different ways. A normal API call to a public chemical database returns one classification or none at all. The fact that EPA, IARC, and EU CLP have published incompatible answers is the most important piece of information for someone deciding whether to put it on their product page.
ALETHEIA returns the full distribution. Every compound record includes classifications from 15+ regulators (IARC, EPA, EU CLP, ECHA, NTP, CalEPA / OEHHA, FDA, Health Canada, ECCC, WHO, OSHA, NIOSH, plus more) with each agency's specific designation, year, and source attribution. Disagreement between agencies isn't hidden — it's the data.
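As a rough illustration of what "returning the full distribution" means for a consumer of the data, the sketch below groups agencies by the designation each one published instead of collapsing them into a single answer. The `Classification` type and both helper functions are hypothetical, not ALETHEIA's response format. The two sample designations are the ones quoted earlier in this article; years and sources are left empty rather than guessed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    agency: str
    designation: str
    year: Optional[int] = None    # left unset in the sample, not guessed
    source: Optional[str] = None  # likewise


def distribution(classifications):
    """Map each published designation to the agencies that issued it."""
    grouped = defaultdict(list)
    for c in classifications:
        grouped[c.designation].append(c.agency)
    return dict(grouped)


def agencies_disagree(classifications) -> bool:
    """True when more than one distinct designation is present.

    This compares raw designation text, which is naive: agencies use
    different scales, so a real comparison needs a mapping between them.
    """
    return len(distribution(classifications)) > 1


glyphosate_carcinogenicity = [
    Classification("IARC", "Group 2A (probably carcinogenic to humans)"),
    Classification("EPA", "not likely to be carcinogenic to humans"),
]

by_designation = distribution(glyphosate_carcinogenicity)
conflict = agencies_disagree(glyphosate_carcinogenicity)
```

Keeping every agency's designation as its own entry is the design choice that matters here: a caller can show the conflict on a product page instead of inheriting whichever single answer an upstream source happened to pick.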
Public chemical APIs hide the disagreement. We surface it.
Third, licensing. Every regulator publishes its data, and every regulator's data carries a different license. PubChem's terms permit a lot but not everything; some of its underlying sources have stricter reuse terms than the aggregator does. EPA data is US government work and broadly reusable, but not every CompTox source qualifies as government work. ECHA REACH data has explicit reuse restrictions in some product contexts. IARC monograph text is copyrighted.
If you're building a consumer-facing safety app and shipping it commercially, you are doing something that the underlying agencies' license terms may or may not permit. Most builders don't read the licenses. They assume "publicly available" means "I can use it commercially." That's not always true. The day a marketplace's legal team asks "where does this data come from and can we use it commercially?" is the day the licensing chain matters more than anything else.
We did the work. ALETHEIA's terms permit commercial reuse without per-record attribution. That is not because we wrote our own license and ignored the source rules. It is because we cleared the chain at the source level, paid for what needed paying for, and structured the aggregator output so derivative use is unambiguous. This is invisible until you need it.
ALETHEIA is a chemical and consumer-product safety reference API. It covers 1,879 curated compounds, 959 materials, 1,262 consumer products, and 2,325 fragrance ingredients, with cross-agency consensus per record, 19 exposure contexts, embeddable safety badges, and a watchlist with webhooks for compounds whose classifications change. It is available on RapidAPI (Basic: free, with batches of up to 10 compounds; Pro: $29, with batches of up to 50; Ultra: $99, with batches of up to 100) or direct.
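The per-tier batch limits have a practical consequence for anyone screening a large catalog: the ingredient list has to be split into requests no larger than the tier allows. Here is a minimal sketch of that client-side splitting. The tier keys and the `batches` helper are assumptions for illustration; only the limits (10, 50, 100) come from the tier description above, and no actual request is made.

```python
from typing import Iterator, List, Sequence

# Batch limits as stated for the RapidAPI tiers; the key names are hypothetical.
TIER_BATCH_LIMITS = {"basic": 10, "pro": 50, "ultra": 100}


def batches(compounds: Sequence[str], tier: str) -> Iterator[List[str]]:
    """Yield consecutive slices of `compounds`, each within the tier's limit."""
    if tier not in TIER_BATCH_LIMITS:
        raise ValueError(f"unknown tier: {tier!r}")
    limit = TIER_BATCH_LIMITS[tier]
    for start in range(0, len(compounds), limit):
        yield list(compounds[start:start + limit])


# 120 placeholder identifiers standing in for a marketplace's ingredient list.
catalog = [f"compound-{i}" for i in range(120)]

pro_sizes = [len(b) for b in batches(catalog, "pro")]
basic_count = sum(1 for _ in batches(catalog, "basic"))
```

With 120 identifiers, the Pro limit produces three batches and the Basic limit produces twelve, which is the kind of arithmetic worth doing before choosing a tier.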
Smaller than PubChem on raw compound count. Bigger than PubChem on what consumer-facing safety actually needs. We have a full side-by-side comparison with PubChem and EPA CompTox if you want to see how the tradeoffs map to your specific use case.
ALETHEIA could be a Series A. It isn't, by choice. It's a product line under Holistic Quality LLC, an Ohio LLC owned by one person, run with the explicit intent of staying small and durable rather than fast and large.
Two reasons. First: VC-funded chemical-data companies have a pattern. They raise, they grow, they get acquired by a larger SaaS or by a chemical industry trade group, and the public-API surface that drew customers in the first place gets paywalled, deprecated, or shuttered. We didn't want to build something that would predictably go that way.
Second: the consumer-safety problem is not a venture-scale problem. There aren't enough marketplaces in regulated categories to support a billion-dollar valuation. There are more than enough to support a durable mid-six-figure-revenue business that does the work properly, charges fairly, and outlasts the next funding cycle. That's the shape we want.
#FTP — For The People, Always. It's not a marketing slogan. It's a posture about who we're building for.
Coming next: more compounds, especially in the categories that matter most to current customers (cleaning, kids' products, pet products, personal care). More regulators in the consensus pool, with Australia's APVMA, Japan's MHLW, and India's CIB&RC all on the roadmap. Mid-trial check-in emails so trial users actually convert. A status page so compliance buyers don't get cold feet. Better tooling for marketplaces specifically, including webhook templates and listing-pipeline integrations.
The thing we're not doing: chasing PubChem on compound count. We will be smaller on raw catalog forever. The bet is that smaller-and-better serves the marketplace-and-consumer-product job better than larger-and-shallower. So far, the customers we have agree.
If you build something where chemical safety matters, and you've ever spent an afternoon trying to reconcile conflicting IARC and EPA classifications, ALETHEIA was built for you. The free tier is generous, the playground is open, and we'll talk to anyone evaluating it about whether we're actually the right tool for what they're trying to do, including telling them when we're not.
1,879 compounds, 19 exposure contexts, 15+ regulators. Free playground, no API key.