Scoring Overview
ChemAudit provides comprehensive molecular scoring across multiple dimensions to help you assess compound quality, drug-likeness, and suitability for various applications.
Available Scoring Systems
| Score Type | What It Measures | Range | Use Case |
|---|---|---|---|
| ML-Readiness | 4-dimension assessment: structural quality, property profile, complexity, representability | 0–100 | Dataset curation, model training |
| Drug-likeness | 7 filters (Lipinski, QED, Veber, Ro3, Ghose, Egan, Muegge) + consensus + lead-likeness | Multiple | Drug discovery, lead identification |
| Safety Filters | PAINS, Brenk, NIH, ZINC, ChEMBL (7 sub-catalogs) structural alerts | Pass/Fail | Compound library filtering |
| ADMET | SA Score, ESOL solubility, CNS MPO, Pfizer/GSK/Golden Triangle rules | Various | Lead optimization, candidate selection |
| NP-Likeness | Fragment-based natural product similarity (7 categories) | −5 to +5 | Natural product research, diversity analysis |
| Scaffold Analysis | Murcko scaffold extraction (standard + generic) | N/A | SAR analysis, scaffold hopping |
| Aggregator Likelihood | 6 risk indicators + 10 known SMARTS patterns | 0–1 | Assay design, hit validation |
| Bioavailability & Permeation | 6-axis radar (LIPO, SIZE, POLAR, INSOLU, INSATU, FLEX) + BOILED-Egg GI/BBB | Radar: 0–6 axes in range; Egg: yolk/white/grey | Oral bioavailability, CNS drug design |
How to Use Scoring
Web Interface
Scoring results appear automatically on the Scoring tab after validation:
1. Enter and validate a molecule.
2. Navigate to the Scoring tab.
3. Review all scores and their interpretations.
4. Click a score's details for more information.
API - All Scores
Request all available scores:
curl -X POST http://localhost:8001/api/v1/score \
  -H "Content-Type: application/json" \
  -d '{
    "molecule": "CCO",
    "format": "smiles"
  }'
API - Specific Scores
Request only specific score types:
curl -X POST http://localhost:8001/api/v1/score \
  -H "Content-Type: application/json" \
  -d '{
    "molecule": "CCO",
    "include": ["ml_readiness", "druglikeness", "admet"]
  }'
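The same two requests can be issued from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint and JSON fields shown in the curl examples above; the helper names `build_score_request` and `fetch_scores` are hypothetical and not part of ChemAudit, and the shape of the response is not assumed beyond it being JSON.

```python
import json
import urllib.request

# Assumption: a local ChemAudit instance at the URL used in the curl examples.
SCORE_URL = "http://localhost:8001/api/v1/score"


def build_score_request(molecule, fmt="smiles", include=None):
    """Build (but do not send) a POST request for the scoring endpoint.

    Hypothetical helper: `include` is optional; omitting it mirrors the
    "all scores" curl example.
    """
    payload = {"molecule": molecule, "format": fmt}
    if include:
        payload["include"] = list(include)
    return urllib.request.Request(
        SCORE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_scores(molecule, fmt="smiles", include=None, timeout=10):
    """Send the request and return the decoded JSON body."""
    request = build_score_request(molecule, fmt, include)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server running, `fetch_scores("CCO")` would request every score and `fetch_scores("CCO", include=["ml_readiness", "druglikeness", "admet"])` only the listed ones. Separating request construction from sending keeps the payload easy to inspect without a network call.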
Interpreting Scores Together
Different scores provide complementary information. Here's how to interpret them together:
High-Quality Drug Discovery Candidate
- ML-Readiness: 80–100 (descriptors and fingerprints compute reliably)
- Drug-likeness: Passes Lipinski and Veber, QED > 0.5
- Safety: No PAINS, Brenk, or NIH alerts
- ADMET: Good solubility, low SA score, passes Pfizer/GSK rules
- Aggregator: Low risk
Machine Learning Dataset Molecule
- ML-Readiness: 80–100 (high priority)
- Validation: 90–100 (structure quality)
- Drug-likeness: Not critical, but useful for diversity
- ADMET: Not critical for training data
Natural Product
- NP-Likeness: > 1.0 (natural product-like)
- Drug-likeness: May fail Lipinski (NPs often larger)
- Scaffold: Complex ring systems common
- ADMET: SA Score often high (hard to synthesize)
Screening Hit for Validation
- Safety: Check for PAINS alerts (the hit could be an assay artifact)
- Aggregator: Check aggregation risk
- Drug-likeness: Use QED and Lipinski for an early assessment
- ADMET: Review the initial ADMET profile
Score Caveats
Context Matters
Scores are guidelines, not absolutes:
- Lipinski failures: Many approved drugs violate Lipinski's rules (for example, antibiotics and natural products)
- PAINS alerts: 87 FDA-approved drugs match PAINS patterns
- High SA Score: Does not mean the compound is impossible to synthesize
- Low QED: Does not rule out a successful drug
Molecule Size Effects
Very small or very large molecules may have unexpected scores:
- Small molecules (< 10 heavy atoms): May score poorly on drug-likeness
- Large molecules (> 50 heavy atoms): May fail size-based filters
- Macrocycles: Often fail standard drug-likeness rules
Prediction Limitations
ADMET and other predictions are estimates:
- Based on computational models, not experimental data
- May not account for rare functional groups
- Best used for prioritization, not as absolute truth
- Validate experimentally for critical decisions
Score Combinations for Filtering
Strict Drug-like Filter
validation_score >= 90 AND
lipinski_passed = true AND
veber_passed = true AND
qed >= 0.5 AND
pains_passed = true AND
aggregator_likelihood = "Low"
ML Training Data Filter
ml_readiness_score >= 80 AND
validation_score >= 90 AND
(No critical validation errors)
Natural Product Filter
np_likeness_score > 1.0 AND
validation_score >= 70 AND
has_scaffold = true
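The three filters above can be sketched as Python predicates. This sketch assumes each molecule's results have been flattened into a dict keyed by the field names used in the filter expressions; that flat layout, the function names, and the `critical_errors` field (standing in for "no critical validation errors") are assumptions for illustration, not the ChemAudit response format.

```python
def passes_strict_druglike(r):
    """Strict drug-like filter, field for field as written above."""
    return bool(
        r["validation_score"] >= 90
        and r["lipinski_passed"]
        and r["veber_passed"]
        and r["qed"] >= 0.5
        and r["pains_passed"]
        and r["aggregator_likelihood"] == "Low"
    )


def passes_ml_training(r):
    """ML training data filter.

    Assumption: critical validation errors arrive as a list under the
    hypothetical key `critical_errors`; a missing key counts as no errors.
    """
    return bool(
        r["ml_readiness_score"] >= 80
        and r["validation_score"] >= 90
        and not r.get("critical_errors")
    )


def passes_natural_product(r):
    """Natural product filter."""
    return bool(
        r["np_likeness_score"] > 1.0
        and r["validation_score"] >= 70
        and r["has_scaffold"]
    )


# Made-up values for illustration only, not real ChemAudit output.
example = {
    "validation_score": 95,
    "lipinski_passed": True,
    "veber_passed": True,
    "qed": 0.62,
    "pains_passed": True,
    "aggregator_likelihood": "Low",
    "ml_readiness_score": 88,
    "critical_errors": [],
    "np_likeness_score": 0.3,
    "has_scaffold": True,
}

selected = [name for name, check in [
    ("strict_druglike", passes_strict_druglike),
    ("ml_training", passes_ml_training),
    ("natural_product", passes_natural_product),
] if check(example)]
```

Keeping each filter as a separate predicate makes it straightforward to adjust one threshold without touching the others, which matters because the cut-offs are guidelines rather than fixed rules.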
Batch Processing with Scoring
In batch mode, scoring statistics are aggregated:
- Average scores: Mean across all molecules
- Pass rates: Percentage passing each filter
- Distribution: Score histograms
- Outliers: Molecules with unusual score patterns
Use these statistics to:
- Assess overall library quality
- Identify problematic subsets
- Guide curation efforts
- Track improvements over time
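To make the aggregated statistics concrete, the sketch below computes the mean, pass rates, a histogram, and outliers from a list of per-molecule result dicts. The dict layout, the `id` key, and the function name `summarize_batch` are hypothetical, and the z-score rule for outliers is a simple stand-in chosen for illustration; it is not necessarily how ChemAudit identifies unusual score patterns.

```python
from collections import Counter
from statistics import mean, pstdev


def summarize_batch(results, score_key, filter_keys, bin_width=10, z_cutoff=2.0):
    """Aggregate one numeric score and several pass/fail filters.

    Assumption: `results` is a non-empty list of dicts, each with an "id",
    a numeric value under `score_key`, and booleans under `filter_keys`.
    """
    scores = [r[score_key] for r in results]
    avg = mean(scores)
    spread = pstdev(scores)  # population standard deviation

    # Percentage of molecules passing each filter.
    pass_rates = {
        key: 100.0 * sum(bool(r[key]) for r in results) / len(results)
        for key in filter_keys
    }

    # Histogram keyed by the lower edge of each bin (e.g. 90 covers 90-99).
    histogram = Counter(int(s // bin_width) * bin_width for s in scores)

    # Stand-in outlier rule: more than `z_cutoff` standard deviations
    # from the mean. Skipped when all scores are identical.
    outliers = [
        r["id"]
        for r in results
        if spread > 0 and abs(r[score_key] - avg) / spread > z_cutoff
    ]

    return {
        "mean": avg,
        "pass_rates": pass_rates,
        "histogram": dict(histogram),
        "outliers": outliers,
    }
```

Running this on successive versions of a library gives comparable numbers for tracking curation progress; the bin width and z-score cut-off are arbitrary defaults and should be tuned to the score being summarized.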
Next Steps
Explore individual scoring systems in detail:
- ML-Readiness: 4-dimension assessment (structural quality, properties, complexity, representability)
- Drug-likeness: Lipinski, QED, Veber, Ro3, Ghose, Egan, Muegge, consensus, lead-likeness
- Safety Filters: PAINS, Brenk, NIH, ZINC, ChEMBL (7 sub-catalogs)
- ADMET: SA Score, ESOL, CNS MPO, Pfizer/GSK/Golden Triangle rules
- NP-Likeness: Fragment-based natural product classification (7 categories)
- Scaffold Analysis: Murcko scaffold extraction (standard + generic)
- Aggregator Likelihood: 6 risk indicators + 10 known SMARTS patterns
- Bioavailability & Permeation: 6-axis radar + BOILED-Egg GI absorption and BBB penetration
- Scoring Profiles: 8 presets + custom profile builder with desirability scoring