
Scoring Overview

ChemAudit scores molecules along multiple independent dimensions to help you assess compound quality, drug-likeness, and suitability for specific applications such as dataset curation, lead optimization, and compound library filtering.

Available Scoring Systems

Score Type | What It Measures | Range | Use Case
ML-Readiness | 4-dimension assessment: structural quality, property profile, complexity, representability | 0–100 | Dataset curation, model training
Drug-likeness | 7 filters (Lipinski, QED, Veber, Ro3, Ghose, Egan, Muegge) + consensus + lead-likeness | Varies by filter | Drug discovery, lead identification
Safety Filters | PAINS, Brenk, NIH, ZINC, ChEMBL (7 sub-catalogs) structural alerts | Pass/Fail | Compound library filtering
ADMET | SA Score, ESOL solubility, CNS MPO, Pfizer/GSK/Golden Triangle rules | Varies by metric | Lead optimization, candidate selection
NP-Likeness | Fragment-based natural product similarity (7 categories) | −5 to +5 | Natural product research, diversity analysis
Scaffold Analysis | Murcko scaffold extraction (standard + generic) | N/A | SAR analysis, scaffold hopping
Aggregator Likelihood | 6 risk indicators + 10 known SMARTS patterns | 0–1 | Assay design, hit validation
Bioavailability & Permeation | 6-axis radar (LIPO, SIZE, POLAR, INSOLU, INSATU, FLEX) + BOILED-Egg GI/BBB | Radar: 0–6 axes in range; Egg: yolk/white/grey | Oral bioavailability, CNS drug design

How to Use Scoring

Web Interface

Scoring results appear automatically on the Scoring tab after validation:

  1. Enter and validate a molecule
  2. Navigate to the Scoring tab
  3. Review all scores with interpretations
  4. Click score details for more information

API - All Scores

Request all available scores:

curl -X POST http://localhost:8001/api/v1/score \
  -H "Content-Type: application/json" \
  -d '{
    "molecule": "CCO",
    "format": "smiles"
  }'

API - Specific Scores

Request only specific score types:

curl -X POST http://localhost:8001/api/v1/score \
  -H "Content-Type: application/json" \
  -d '{
    "molecule": "CCO",
    "include": ["ml_readiness", "druglikeness", "admet"]
  }'
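When scripting requests, building the JSON body programmatically avoids a common pitfall: stereo SMILES such as F/C=C\F contain backslashes, which must be written as \\ inside a JSON string. The sketch below uses a hypothetical helper, build_score_body, which is not part of ChemAudit. It also assumes the endpoint accepts "format" and "include" together in one request; the examples above only show them separately.

```shell
# Hypothetical helper (not part of ChemAudit): print a JSON body for
# POST /api/v1/score. Arguments: a SMILES string, then zero or more score types.
# Assumes "format" and "include" may be combined in a single request.
build_score_body() {
  smiles=$1
  shift
  # JSON strings need backslashes (SMILES stereo bonds) and quotes escaped;
  # escape backslashes first so the quote escapes are not doubled.
  escaped=$(printf '%s' "$smiles" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  # Join the remaining arguments into JSON array items: "a", "b", ...
  include=""
  for score_type in "$@"; do
    include="${include:+$include, }\"$score_type\""
  done
  if [ -n "$include" ]; then
    printf '{"molecule": "%s", "format": "smiles", "include": [%s]}\n' "$escaped" "$include"
  else
    printf '{"molecule": "%s", "format": "smiles"}\n' "$escaped"
  fi
}
```

Usage, piping the body to curl (-d @- makes curl read the request body from standard input):

build_score_body 'F/C=C\F' ml_readiness druglikeness | curl -X POST http://localhost:8001/api/v1/score -H "Content-Type: application/json" -d @-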

Interpreting Scores Together

Different scores provide complementary information. Here's how to interpret them together:

High-Quality Drug Discovery Candidate

  • ML-Readiness: 80-100 (descriptors and fingerprints can be computed reliably)
  • Drug-likeness: Passes Lipinski and Veber, QED > 0.5
  • Safety: No PAINS, Brenk, or NIH alerts
  • ADMET: Good solubility, low SA score, passes Pfizer/GSK rules
  • Aggregator: Low risk

Machine Learning Dataset Molecule

  • ML-Readiness: 80-100 (high priority)
  • Validation: 90-100 (structure quality)
  • Drug-likeness: Not critical, but useful for diversity
  • ADMET: Not critical for training data

Natural Product

  • NP-Likeness: > 1.0 (natural product-like)
  • Drug-likeness: May fail Lipinski (NPs often larger)
  • Scaffold: Complex ring systems common
  • ADMET: SA score often high (hard to synthesize)

Screening Hit for Validation

  • Safety: Check PAINS (hit could be artifact)
  • Aggregator: Check aggregation risk
  • Drug-likeness: QED and Lipinski for early assessment
  • ADMET: Initial ADMET profile

Score Caveats

Context Matters

Scores are guidelines, not absolutes:

  • Lipinski failures: Many approved drugs violate Lipinski's rules (e.g., many antibiotics and natural products)
  • PAINS alerts: 87 FDA-approved drugs match PAINS patterns
  • High SA score: Does not mean a compound is impossible to synthesize
  • Low QED: Does not mean a compound cannot become a good drug

Molecule Size Effects

Very small or very large molecules may have unexpected scores:

  • Small molecules (< 10 heavy atoms): May score poorly on drug-likeness
  • Large molecules (> 50 heavy atoms): May fail size-based filters
  • Macrocycles: Often fail standard drug-likeness rules
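One way to spot these cases in a batch is to flag molecules by heavy-atom count before reading their drug-likeness results. This sketch assumes results exported as CSV with hypothetical id and heavy_atoms columns and applies the thresholds listed above. It does not detect macrocycles, which requires ring perception from a cheminformatics toolkit.

```shell
# Sketch: flag molecules whose size may distort drug-likeness results.
# Assumes a CSV on stdin with a header row containing hypothetical
# "id" and "heavy_atoms" columns; thresholds follow the list above.
flag_size() {
  awk -F',' '
    # Map each header name to its column index
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      h = $col["heavy_atoms"] + 0
      if (h < 10)      note = "small (< 10 heavy atoms): may score poorly on drug-likeness"
      else if (h > 50) note = "large (> 50 heavy atoms): may fail size-based filters"
      else next        # mid-sized molecules are not flagged
      print $col["id"] ": " note
    }
  '
}
```

Usage: flag_size < results.csv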

Prediction Limitations

ADMET and other predictions are estimates:

  • Based on computational models, not experimental data
  • May not account for rare functional groups
  • Best used for prioritization, not as absolute truth
  • Validate experimentally for critical decisions

Score Combinations for Filtering

Strict Drug-like Filter

validation_score >= 90 AND
lipinski_passed = true AND
veber_passed = true AND
qed >= 0.5 AND
pains_passed = true AND
aggregator_likelihood = "Low"
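The strict filter can be applied to exported batch results with standard command-line tools. This is a sketch that assumes a CSV with a header row whose column names match the filter's field names, plus an id column, and an aggregator_likelihood column holding the "Low" label used above; ChemAudit's actual export layout may differ.

```shell
# Sketch: apply the Strict Drug-like Filter to a CSV on stdin.
# Assumed (hypothetical) columns: id, validation_score, lipinski_passed,
# veber_passed, qed, pains_passed, aggregator_likelihood.
# Prints the id of every molecule that meets all conditions.
strict_druglike() {
  awk -F',' '
    # Map each header name to its column index
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $col["validation_score"] >= 90 &&
    $col["lipinski_passed"] == "true" &&
    $col["veber_passed"] == "true" &&
    $col["qed"] >= 0.5 &&
    $col["pains_passed"] == "true" &&
    $col["aggregator_likelihood"] == "Low" { print $col["id"] }
  '
}
```

Usage: strict_druglike < results.csv > druglike_ids.txt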

ML Training Data Filter

ml_readiness_score >= 80 AND
validation_score >= 90 AND
(No critical validation errors)
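The ML training filter can be scripted the same way. Its last condition has no field name, so this sketch assumes a hypothetical critical_errors column holding the number of critical validation errors per molecule; adapt it to however your export records those errors.

```shell
# Sketch: select ML training candidates from a CSV on stdin.
# Assumed (hypothetical) columns: id, ml_readiness_score, validation_score,
# critical_errors (count of critical validation errors).
ml_training_filter() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header -> column index
    $col["ml_readiness_score"] >= 80 &&
    $col["validation_score"] >= 90 &&
    $col["critical_errors"] == 0 { print $col["id"] }
  '
}
```

Both score thresholds are inclusive, so a molecule scoring exactly 80 and 90 is kept.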

Natural Product Filter

np_likeness_score > 1.0 AND
validation_score >= 70 AND
has_scaffold = true
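A sketch of the natural product filter over the same kind of export, assuming hypothetical id, np_likeness_score, validation_score, and has_scaffold columns. Note that the NP-likeness condition is strict, so a score of exactly 1.0 is excluded.

```shell
# Sketch: select natural-product-like molecules from a CSV on stdin.
# Assumed (hypothetical) columns: id, np_likeness_score, validation_score, has_scaffold.
np_filter() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header -> column index
    $col["np_likeness_score"] > 1.0 &&
    $col["validation_score"] >= 70 &&
    $col["has_scaffold"] == "true" { print $col["id"] }
  '
}
```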

Batch Processing with Scoring

In batch mode, scoring statistics are aggregated:

  • Average scores: Mean across all molecules
  • Pass rates: Percentage passing each filter
  • Distribution: Score histograms
  • Outliers: Molecules with unusual score patterns
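If you want the same averages and pass rates outside the web interface, for example to compare a library between curation rounds, they can be recomputed from per-molecule results. This sketch assumes a CSV with a header row, one numeric score column, and one true/false pass column, whose names are passed as arguments (the names in the test are hypothetical). It covers only means and pass rates, not histograms or outlier detection.

```shell
# Sketch: mean of one score column and pass rate of one true/false column.
# Usage: batch_summary SCORE_COLUMN PASS_COLUMN < results.csv
batch_summary() {
  awk -F',' -v score="$1" -v flag="$2" '
    # Map each header name to its column index
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      n++
      sum += $col[score]                   # accumulate for the mean
      if ($col[flag] == "true") passed++   # count filter passes
    }
    END {
      if (n == 0) { print "no molecules"; exit }
      printf "molecules=%d mean_%s=%.2f %s_pass_rate=%.1f%%\n", n, score, sum / n, flag, 100 * passed / n
    }
  '
}
```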

Use these statistics to:

  • Assess overall library quality
  • Identify problematic subsets
  • Guide curation efforts
  • Track improvements over time

Next Steps

Explore individual scoring systems in detail:

  • ML-Readiness — 4-dimension assessment (structural quality, properties, complexity, representability)
  • Drug-likeness — Lipinski, QED, Veber, Ro3, Ghose, Egan, Muegge, consensus, lead-likeness
  • Safety Filters — PAINS, Brenk, NIH, ZINC, ChEMBL (7 sub-catalogs)
  • ADMET — SA Score, ESOL, CNS MPO, Pfizer/GSK/Golden Triangle rules
  • NP-Likeness — Fragment-based natural product classification (7 categories)
  • Scaffold Analysis — Murcko scaffold extraction (standard + generic)
  • Aggregator Likelihood — 6 risk indicators + known SMARTS patterns
  • Bioavailability & Permeation — 6-axis radar + BOILED-Egg GI absorption and BBB penetration
  • Scoring Profiles — 8 presets + custom profile builder with desirability scoring