ChemAudit

ChemAudit is a comprehensive web-based chemical structure validation suite designed for drug discovery, medicinal chemistry, and cheminformatics research. It combines powerful validation, standardization, and scoring capabilities into an intuitive interface.

What ChemAudit Does

ChemAudit helps you ensure chemical structure quality across your research workflow:

Validate molecules with detailed structural checks (SMILES, InChI, MOL, or IUPAC names)
Screen for problematic substructures using industry-standard alert catalogs
Score molecules for ML-readiness, drug-likeness, and ADMET properties
Standardize structures using the ChEMBL-compatible pipeline with full provenance tracking
Process batches of up to 1 million molecules with real-time progress tracking
Analyze datasets with interactive charts, chemical space maps, and scaffold analysis
Bookmark results and track validation history with an immutable audit trail
Compare molecules side-by-side with property radar overlays
Share results via permalinks and receive notifications on completion
Export results in 9 formats (CSV, Excel, SDF, JSON, PDF, Fingerprints, Dedup, Scaffold, Property Matrix)

Key Features

Single Molecule Validation

Validate SMILES, InChI, or MOL blocks with detailed structural checks including valence, kekulization, sanitization, stereochemistry, and representation quality.

Batch Processing

Process datasets of up to 1 million molecules; the actual limit depends on deployment configuration. Real-time WebSocket progress updates keep you informed during processing.

Structural Alerts

Screen against over 1,500 patterns from PAINS, BRENK, NIH, ZINC, and ChEMBL catalogs. Identify potential assay interference compounds and unwanted chemical moieties before investing time and resources.

Comprehensive Scoring

Evaluate molecules across 10+ dimensions:

ML-readiness: Descriptor calculability, fingerprint generation, size constraints
Drug-likeness: Lipinski, QED, Veber, Rule of Three, Ghose, Egan, Muegge filters
ADMET: Synthetic accessibility, solubility, CNS penetration, bioavailability
NP-likeness: Natural product vs. synthetic classification
Scaffold analysis: Murcko scaffold extraction
Aggregator likelihood: Colloidal aggregation risk assessment
Ligand efficiency: LE and LLE metrics for lead optimization
Bioavailability radar: 6-axis oral bioavailability profile
Property breakdown: Per-atom TPSA and LogP contributions
Salt inventory: Salt form detection and fragment classification
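To make the ligand efficiency entry above concrete, here is a minimal sketch of the commonly used textbook definitions: LE ≈ 1.37 × pActivity / heavy-atom count (kcal/mol per heavy atom, near room temperature) and LLE = pActivity − LogP. The function names and the input values are illustrative assumptions; ChemAudit's own implementation may differ in constants or rounding.

```python
import math


def pactivity_from_nm(value_nm: float) -> float:
    """Convert an IC50/Ki given in nanomolar to its negative log10 molar value."""
    if value_nm <= 0:
        raise ValueError("activity must be positive")
    return 9.0 - math.log10(value_nm)


def ligand_efficiency(pactivity: float, heavy_atoms: int) -> float:
    """LE in kcal/mol per heavy atom, using the 1.37 * pActivity / HA approximation."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return 1.37 * pactivity / heavy_atoms


def lipophilic_ligand_efficiency(pactivity: float, logp: float) -> float:
    """LLE (also called LipE): potency corrected for lipophilicity."""
    return pactivity - logp


# Illustrative (made-up) compound: IC50 = 10 nM, 25 heavy atoms, LogP = 3.0
p = pactivity_from_nm(10.0)
print(round(ligand_efficiency(p, 25), 3), lipophilic_ligand_efficiency(p, 3.0))
```

The heavy-atom count and LogP would normally come from a cheminformatics toolkit; they are passed in directly here so the arithmetic stays visible.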

ChEMBL-Compatible Standardization

Standardize structures using a pipeline compatible with ChEMBL's curation workflow:

Structural issue detection
Salt and solvent removal
Parent molecule extraction
Optional tautomer canonicalization

QSAR-Ready Pipeline

Prepare chemical datasets for machine learning with a multi-step curation pipeline: standardization, salt stripping, neutralization, tautomer canonicalization, and duplicate removal. Curated batches can be exported as CSV, SDF, or JSON.

Structure Filter

Multi-stage funnel filtering for generative chemistry outputs with property filters, SMARTS substructure matching, preset configurations (drug-like, lead-like, fragment-like), funnel visualization, and REINVENT-compatible scoring.
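The funnel idea can be sketched as a sequence of named predicates, recording how many molecules survive each stage. This is only an illustration under stated assumptions: the `Props` record, the `DRUG_LIKE` preset (Lipinski rule-of-five cutoffs), and the sample values are hypothetical and are not taken from ChemAudit's code. Properties are assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Props:
    """Precomputed properties for one molecule (hypothetical record type)."""
    name: str
    mw: float
    logp: float
    hbd: int
    hba: int


Stage = tuple[str, Callable[[Props], bool]]

# Hypothetical "drug-like" preset using Lipinski's rule-of-five cutoffs.
DRUG_LIKE: list[Stage] = [
    ("MW <= 500", lambda p: p.mw <= 500),
    ("LogP <= 5", lambda p: p.logp <= 5),
    ("HBD <= 5", lambda p: p.hbd <= 5),
    ("HBA <= 10", lambda p: p.hba <= 10),
]


def run_funnel(mols: list[Props], stages: list[Stage]):
    """Apply stages in order; return survivors and per-stage survivor counts."""
    survivors = list(mols)
    counts = [("input", len(survivors))]
    for label, keep in stages:
        survivors = [m for m in survivors if keep(m)]
        counts.append((label, len(survivors)))
    return survivors, counts


# Made-up molecules, chosen so that each stage removes something.
mols = [
    Props("mol-A", 320.0, 2.1, 2, 5),
    Props("mol-B", 640.0, 3.0, 1, 8),
    Props("mol-C", 410.0, 6.3, 1, 4),
    Props("mol-D", 450.0, 4.0, 6, 9),
]
survivors, counts = run_funnel(mols, DRUG_LIKE)
for label, n in counts:
    print(f"{label:>10}: {n}")
```

The per-stage counts are the data a funnel visualization needs; the order of stages changes the intermediate counts but not the final survivor set.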

Dataset Audit

Comprehensive dataset health auditing with an overall health score, contradictory label detection, dataset diff/comparison, curation reports, and interactive treemap drill-down.
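As an illustration of contradictory label detection, the sketch below groups records by a structure key and reports keys that carry more than one distinct label. It assumes each record already has a canonical structure key (for example an InChIKey or canonical SMILES); the function name and the placeholder keys are hypothetical, and the real audit may apply additional logic such as activity thresholds.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def find_contradictory_labels(
    records: Iterable[tuple[str, Hashable]],
) -> dict[str, list]:
    """Return {structure_key: sorted labels} for keys with conflicting labels."""
    labels_by_key: dict[str, set] = defaultdict(set)
    for key, label in records:
        labels_by_key[key].add(label)
    return {
        key: sorted(labels)
        for key, labels in labels_by_key.items()
        if len(labels) > 1
    }


# Placeholder keys stand in for canonical structure identifiers.
records = [
    ("KEY-1", "active"),
    ("KEY-2", "inactive"),
    ("KEY-1", "inactive"),  # same structure, different label
    ("KEY-2", "inactive"),  # plain duplicate, not a contradiction
]
print(find_contradictory_labels(records))
```

Exact duplicates with the same label are not flagged; only structures whose labels disagree are returned.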

Diagnostics

Low-level structure analysis tools: SMILES diagnostics, InChI layer diff, round-trip validation, file pre-validation, and coordinate dimension analysis.

Batch Analytics & Visualizations

Explore batch results with interactive charts and chemical space mapping:

Butina clustering with configurable Tanimoto cutoff (Morgan ECFP4 fingerprints)
Chemical taxonomy classification (~50 curated SMARTS rules)
Registration hashing for tautomer-invariant deduplication
Deduplication across 4 levels (exact, tautomer, stereo, salt-form)
Scaffold analysis with diversity metrics
PCA and t-SNE chemical space projections
Matched molecular pairs and activity cliff detection
Linked brushing across all visualizations
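The clustering step listed above can be sketched in pure Python. This is a simplified, Butina-style sphere-exclusion pass over fingerprints represented as sets of on-bit indices, using a Tanimoto similarity cutoff. It is an assumption-laden illustration: the function names are hypothetical, neighbor counts are computed once and not updated after each cluster is removed, and a production system would typically rely on a toolkit such as RDKit for fingerprints and clustering.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_clusters(fps: list[frozenset], cutoff: float = 0.7) -> list[list[int]]:
    """Cluster fingerprints; each cluster lists its centroid index first."""
    n = len(fps)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto(fps[i], fps[j]) >= cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)

    # Visit molecules with the most neighbors first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned: set[int] = set()
    clusters: list[list[int]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbors[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy fingerprints: two obvious groups.
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
]
print(butina_clusters(fps, cutoff=0.5))
```

Raising the cutoff produces more, tighter clusters; lowering it merges groups. The pairwise loop is quadratic, so large batches need a more scalable neighbor search.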

Bookmarks & History

Save validation snapshots and maintain an audit trail:

Bookmark molecules with tags and notes
Browse validation history with filtering by date, outcome, and source
Submit bookmarks as batch jobs
Session-scoped privacy with GDPR-compliant data erasure

Scoring Profiles

Evaluate molecules against customizable property criteria:

8 built-in presets (Lipinski, Lead-like, Fragment, CNS, Ghose, Veber, PPI, NP)
Custom profile builder with 8 property thresholds and weights
Apply profiles to batch jobs or re-score subsets inline
Export/import profiles as JSON for sharing
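A weighted profile with JSON export/import might look like the sketch below. The JSON layout (`name`, `criteria`, `property`, `min`, `max`, `weight`) and the scoring rule (weighted percentage of satisfied criteria) are assumptions made for illustration; they are not ChemAudit's documented profile schema.

```python
import json

# Hypothetical profile layout; the real export format may differ.
profile = {
    "name": "example-oral",
    "criteria": [
        {"property": "mw", "max": 500, "weight": 2},
        {"property": "logp", "max": 5, "weight": 1},
        {"property": "tpsa", "min": 20, "max": 140, "weight": 1},
    ],
}


def score_molecule(props: dict, profile: dict) -> float:
    """Return the weighted percentage (0-100) of criteria the molecule meets."""
    total = sum(c["weight"] for c in profile["criteria"])
    if total <= 0:
        raise ValueError("profile weights must sum to a positive number")
    earned = 0.0
    for c in profile["criteria"]:
        value = props.get(c["property"])
        if value is None:
            continue  # a missing property earns no credit
        low = c.get("min")
        high = c.get("max")
        low = float("-inf") if low is None else low
        high = float("inf") if high is None else high
        if low <= value <= high:
            earned += c["weight"]
    return 100.0 * earned / total


# Round-trip through JSON, as one would when sharing a profile.
restored = json.loads(json.dumps(profile))
props = {"mw": 350.0, "logp": 5.6, "tpsa": 90.0}  # made-up values
print(score_molecule(props, restored))
```

Treating a missing bound as unbounded keeps one-sided thresholds (for example, only a maximum) easy to express in the JSON.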

IUPAC Name Support

Enter chemical names directly: ChemAudit auto-detects IUPAC names, common names, and trade names, and converts them to SMILES via OPSIN or PubChem before validation.

Notifications & Sharing

Stay informed and share results:

Email notifications on batch completion
Webhook callbacks with HMAC-SHA256 signatures
Shareable permalinks for batch reports (30-day expiry)
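On the receiving side, a webhook consumer would typically recompute the HMAC-SHA256 of the raw request body and compare it with the signature sent alongside the callback. The sketch below shows that check with Python's standard library. How ChemAudit transmits the signature (header name, hex versus base64 encoding) is not stated here, so the hex encoding and the sample secret and payload are assumptions.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (hex encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_hex.strip().lower())


# Made-up secret and payload for demonstration only.
secret = b"replace-with-your-webhook-secret"
body = b'{"job_id": "example", "status": "complete"}'
signature = sign_payload(secret, body)

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, body + b" ", signature))  # False
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, since whitespace or key-order changes alter the digest.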

Database Integration

Cross-reference molecules against:

PubChem: Properties, synonyms, IUPAC names
ChEMBL: Bioactivity data, targets, clinical phase
COCONUT: Natural product sources and organisms
Wikidata: Isomeric SMILES, InChI, CAS numbers, molecular mass

Plus:

Identifier Resolution: Resolve 10+ identifier types (SMILES, InChI, InChIKey, CID, ChEMBL ID, CAS, DrugBank, ChEBI, UNII, Wikipedia URL, compound names) with automatic detection and cross-database linking via UniChem
Cross-Database Comparison: Compare structural representations across all four databases with consistency verdicts
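Automatic identifier detection can be approximated with a few format checks, as in the sketch below. It covers only some of the identifier types listed above and is not ChemAudit's resolver: the labels and function names are hypothetical, treating a bare integer as a PubChem CID is an assumption, and anything unrecognized is left for a real SMILES parser or name lookup.

```python
import re

_INCHI = re.compile(r"^InChI=1S?/")
_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
_CHEMBL = re.compile(r"^CHEMBL\d+$", re.IGNORECASE)
_CAS = re.compile(r"^\d{2,7}-\d{2}-\d$")
_INTEGER = re.compile(r"^\d+$")


def cas_checksum_ok(cas: str) -> bool:
    """Validate the CAS check digit: weighted digit sum modulo 10."""
    digits = cas.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def detect_identifier(text: str) -> str:
    """Return a best-guess identifier type label for the input string."""
    s = text.strip()
    if _INCHI.match(s):
        return "inchi"
    if _INCHIKEY.match(s):
        return "inchikey"
    if _CHEMBL.match(s):
        return "chembl_id"
    if _CAS.match(s) and cas_checksum_ok(s):
        return "cas"
    if _INTEGER.match(s):
        return "pubchem_cid"  # assumption: bare integers are treated as CIDs
    return "smiles_or_name"  # needs a SMILES parser or name lookup to decide


for query in ["InChI=1S/H2O/h1H2", "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
              "CHEMBL25", "50-78-2", "2244", "CCO"]:
    print(f"{query!r:35} -> {detect_identifier(query)}")
```

Checking the CAS check digit avoids misclassifying strings that merely look like registry numbers; the order of checks matters because the more specific patterns must run before the bare-integer fallback.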

Quick Start

Get ChemAudit running in minutes with Docker:

# Clone the repository
git clone https://github.com/Kohulan/ChemAudit.git
cd ChemAudit

# Create environment file
cp .env.example .env
# Edit .env to set required secrets (POSTGRES_PASSWORD, SECRET_KEY, etc.)

# Start all services
docker-compose up -d

Access Points (Development):

Web UI: http://localhost:3002
API Documentation: http://localhost:8001/api/v1/docs
API ReDoc: http://localhost:8001/api/v1/redoc

Access Points (Production):

Web UI and API: http://localhost (behind Nginx)

Navigation

Ready to get started? Here's where to go next:

Getting Started — Install and configure ChemAudit
User Guide — Learn all features
QSAR-Ready Pipeline — Curate datasets for ML
Structure Filter — Multi-stage compound filtering
Dataset Audit — Dataset health auditing
Batch Analytics — Interactive dataset exploration
Scoring Profiles — Custom property scoring
Bookmarks & History — Save and track results
API Reference — Integrate ChemAudit into your workflow
Deployment — Deploy to production
Troubleshooting — Solve common issues

External Resources

Support

Questions or issues? Open an issue on GitHub
