ChemAudit
ChemAudit is a web-based chemical structure validation suite for drug discovery, medicinal chemistry, and cheminformatics research. It brings validation, standardization, and scoring together in a single interface.
What ChemAudit Does
ChemAudit helps you ensure chemical structure quality across your research workflow:
- Validate molecules with detailed structural checks (SMILES, InChI, MOL, or IUPAC names)
- Screen for problematic substructures using industry-standard alert catalogs
- Score molecules for ML-readiness, drug-likeness, and ADMET properties
- Standardize structures using the ChEMBL-compatible pipeline with full provenance tracking
- Process batches of up to 1 million molecules (the limit depends on deployment configuration) with real-time progress tracking
- Analyze datasets with interactive charts, chemical space maps, and scaffold analysis
- Bookmark results and track validation history with an immutable audit trail
- Compare molecules side-by-side with property radar overlays
- Share results via permalinks and receive notifications on completion
- Export results in 9 formats (CSV, Excel, SDF, JSON, PDF, Fingerprints, Dedup, Scaffold, Property Matrix)
Key Features
Single Molecule Validation
Validate SMILES, InChI, or MOL blocks with detailed structural checks including valence, kekulization, sanitization, stereochemistry, and representation quality.
Batch Processing
Process datasets of up to 1 million molecules; the exact limit depends on deployment configuration. Real-time WebSocket updates report progress while a job runs.
Structural Alerts
Screen against over 1,500 patterns from PAINS, BRENK, NIH, ZINC, and ChEMBL catalogs. Identify potential assay interference compounds and unwanted chemical moieties before investing time and resources.
Comprehensive Scoring
Evaluate molecules across 10+ dimensions:
- ML-readiness: Descriptor calculability, fingerprint generation, size constraints
- Drug-likeness: Lipinski, Veber, Rule of Three, Ghose, Egan, and Muegge filters, plus the QED score
- ADMET: Synthetic accessibility, solubility, CNS penetration, bioavailability
- NP-likeness: Natural product vs. synthetic classification
- Scaffold analysis: Murcko scaffold extraction
- Aggregator likelihood: Colloidal aggregation risk assessment
- Ligand efficiency: LE and LLE metrics for lead optimization
- Bioavailability radar: 6-axis oral bioavailability profile
- Property breakdown: Per-atom TPSA and LogP contributions
- Salt inventory: Salt form detection and fragment classification
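Two of the dimensions above reduce to short formulas. The sketch below counts Lipinski Rule-of-Five violations and computes ligand efficiency (LE) and lipophilic ligand efficiency (LLE) from precomputed descriptors. It assumes the common approximations LE ≈ 1.37 × pIC50 / heavy-atom count (kcal/mol per heavy atom, 1.37 ≈ 2.303·R·T near 300 K) and LLE = pIC50 − logP; ChemAudit computes its own descriptors, and the constants and inputs it uses may differ. The `Properties` class is a hypothetical container for this example only.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Hypothetical container for precomputed descriptors (not ChemAudit's API)."""
    mol_weight: float   # molecular weight, Da
    clogp: float        # calculated logP
    hbd: int            # hydrogen-bond donors
    hba: int            # hydrogen-bond acceptors (N + O count in the original rule)
    heavy_atoms: int    # non-hydrogen atom count


def lipinski_violations(p: Properties) -> int:
    """Count Rule-of-Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([p.mol_weight > 500, p.clogp > 5, p.hbd > 5, p.hba > 10])


def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    """Assumed LE approximation: 1.37 * pIC50 / heavy atoms (kcal/mol per heavy atom)."""
    return 1.37 * pic50 / heavy_atoms


def lipophilic_ligand_efficiency(pic50: float, clogp: float) -> float:
    """LLE (also called LipE): potency minus lipophilicity."""
    return pic50 - clogp


# Approximate aspirin descriptors; the pIC50 of 6.0 is made up to exercise the functions.
aspirin = Properties(mol_weight=180.16, clogp=1.31, hbd=1, hba=4, heavy_atoms=13)
print(lipinski_violations(aspirin))                       # prints 0
print(f"{ligand_efficiency(6.0, aspirin.heavy_atoms):.2f}")  # prints 0.63
print(f"{lipophilic_ligand_efficiency(6.0, aspirin.clogp):.2f}")  # prints 4.69
```

Many workflows treat a compound as Rule-of-Five compliant when it has at most one violation, which is why the function returns a count rather than a boolean.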
ChEMBL-Compatible Standardization
Standardize structures using a pipeline compatible with ChEMBL's curation workflow:
- Structural issue detection
- Salt and solvent removal
- Parent molecule extraction
- Optional tautomer canonicalization
QSAR-Ready Pipeline
Prepare chemical datasets for machine learning with a multi-step curation pipeline: standardization, salt stripping, neutralization, tautomer canonicalization, and duplicate removal. Batch export in CSV, SDF, or JSON.
Structure Filter
Multi-stage funnel filtering for generative chemistry outputs with property filters, SMARTS substructure matching, preset configurations (drug-like, lead-like, fragment-like), funnel visualization, and REINVENT-compatible scoring.
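A funnel is an ordered list of predicates, where each stage only sees the survivors of the previous one and the per-stage counts are what a funnel chart plots. The sketch below shows that idea with plain dictionaries of precomputed properties. The stage names and thresholds are illustrative and are not ChemAudit's preset values; a SMARTS stage would be one more predicate, for example one built on RDKit's `Mol.HasSubstructMatch` with a pattern from `Chem.MolFromSmarts`.

```python
from typing import Callable

Molecule = dict[str, float]
Stage = tuple[str, Callable[[Molecule], bool]]


def run_funnel(
    molecules: list[Molecule], stages: list[Stage]
) -> tuple[list[Molecule], list[tuple[str, int]]]:
    """Apply stages in order and record how many molecules survive each one."""
    survivors = list(molecules)
    counts = [("input", len(survivors))]
    for name, keep in stages:
        survivors = [m for m in survivors if keep(m)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Illustrative "lead-like" stages; thresholds are examples, not ChemAudit's presets.
lead_like: list[Stage] = [
    ("MW <= 350", lambda m: m["mw"] <= 350),
    ("logP <= 3", lambda m: m["logp"] <= 3),
    ("rotatable bonds <= 7", lambda m: m["rotb"] <= 7),
]

candidates = [
    {"mw": 250.0, "logp": 2.1, "rotb": 4},
    {"mw": 410.0, "logp": 2.5, "rotb": 5},
    {"mw": 300.0, "logp": 3.8, "rotb": 3},
    {"mw": 320.0, "logp": 1.9, "rotb": 9},
]
passed, funnel = run_funnel(candidates, lead_like)
for stage, n in funnel:
    print(f"{stage:<22} {n}")
```

Because stages run in sequence, their order changes the per-stage counts (though not the final survivors), so cheap, high-rejection filters are usually placed first.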
Dataset Audit
Comprehensive dataset health auditing with an overall health score, contradictory label detection, dataset diff/comparison, curation reports, and interactive treemap drill-down.
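Contradictory label detection amounts to grouping records by a structure key and flagging keys that carry more than one label. The sketch below assumes the key is the InChIKey of the standardized parent and that labels are categorical; both are assumptions for illustration, and continuous labels would need a tolerance instead of set comparison.

```python
from collections import defaultdict


def find_contradictions(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Return structure keys that appear with more than one distinct label."""
    labels_by_key: dict[str, set[str]] = defaultdict(set)
    for key, label in records:
        labels_by_key[key].add(label)
    return {key: labels for key, labels in labels_by_key.items() if len(labels) > 1}


# Keys are InChIKeys of standardized parents (assumed keying scheme).
records = [
    ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "active"),    # aspirin
    ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "inactive"),  # aspirin again, conflicting label
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "inactive"),  # caffeine
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "inactive"),  # caffeine duplicate, consistent label
]
for key, labels in find_contradictions(records).items():
    print(key, sorted(labels))
```

Keying on the standardized parent rather than the raw input means salt forms of the same compound collapse together, which is usually what reveals hidden conflicts.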
Diagnostics
Low-level structure analysis tools: SMILES diagnostics, InChI layer diff, round-trip validation, file pre-validation, and coordinate dimension analysis.
Batch Analytics & Visualizations
Explore batch results with interactive charts and chemical space mapping:
- Butina clustering with configurable Tanimoto cutoff (Morgan ECFP4 fingerprints)
- Chemical taxonomy classification (~50 curated SMARTS rules)
- Registration hashing for tautomer-invariant deduplication
- Deduplication across 4 levels (exact, tautomer, stereo, salt-form)
- Scaffold analysis with diversity metrics
- PCA and t-SNE chemical space projections
- Matched molecular pairs and activity cliff detection
- Linked brushing across all visualizations
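Butina clustering picks the molecule with the most neighbors above the similarity cutoff as a centroid, assigns it and its unassigned neighbors to a cluster, and repeats. The sketch below implements that with Tanimoto similarity on fingerprints represented as sets of on-bit indices; the toy bit sets stand in for the Morgan ECFP4 fingerprints ChemAudit uses. RDKit provides a distance-based version as `rdkit.ML.Cluster.Butina.ClusterData`, where the threshold is 1 minus the similarity cutoff.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def butina(fps: list[set[int]], cutoff: float) -> list[list[int]]:
    """Taylor-Butina clustering; molecules with similarity >= cutoff are neighbors."""
    n = len(fps)
    neighbors: list[list[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Candidate centroids: most neighbors first, ties broken by index.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = [False] * n
    clusters = []
    for centroid in order:
        if assigned[centroid]:
            continue
        assigned[centroid] = True
        members = [centroid]
        for nbr in neighbors[centroid]:
            if not assigned[nbr]:
                assigned[nbr] = True
                members.append(nbr)
        clusters.append(members)
    return clusters


# Toy fingerprints: three related molecules, two weakly related, one singleton.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
print(butina(fps, cutoff=0.7))  # prints [[2, 0, 1], [3], [4], [5]]
```

Molecules 0 and 1 are only 0.6 similar to each other, but both are 0.8 similar to molecule 2, so they join its cluster; lowering the cutoff to 0.5 would also merge molecules 3 and 4.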
Bookmarks & History
Save validation snapshots and maintain an audit trail:
- Bookmark molecules with tags and notes
- Browse validation history with filtering by date, outcome, and source
- Submit bookmarks as batch jobs
- Session-scoped privacy with GDPR-compliant data erasure
Scoring Profiles
Evaluate molecules against customizable property criteria:
- 8 built-in presets (Lipinski, Lead-like, Fragment, CNS, Ghose, Veber, PPI, NP)
- Custom profile builder with 8 property thresholds and weights
- Apply profiles to batch jobs or re-score subsets inline
- Export/import profiles as JSON for sharing
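A profile made of property thresholds and weights can be scored as the weighted fraction of criteria a molecule satisfies. The sketch below uses a hypothetical JSON schema (`name`, `criteria`, `property`, `min`, `max`, `weight`) to show the import, scoring, and re-export round trip; ChemAudit's actual profile format and scoring formula may differ.

```python
import json

# Hypothetical profile schema for illustration; not ChemAudit's export format.
PROFILE_JSON = """
{
  "name": "my-cns-like",
  "criteria": [
    {"property": "mw",   "max": 450, "weight": 2.0},
    {"property": "tpsa", "max": 90,  "weight": 2.0},
    {"property": "hbd",  "max": 3,   "weight": 1.0},
    {"property": "logp", "min": 1, "max": 4, "weight": 1.0}
  ]
}
"""


def score(profile: dict, props: dict[str, float]) -> float:
    """Weighted fraction of satisfied criteria, in [0, 1]."""
    total = 0.0
    earned = 0.0
    for criterion in profile["criteria"]:
        total += criterion["weight"]
        value = props[criterion["property"]]
        low = criterion.get("min", float("-inf"))   # missing bound means unbounded
        high = criterion.get("max", float("inf"))
        if low <= value <= high:
            earned += criterion["weight"]
    return earned / total if total else 0.0


profile = json.loads(PROFILE_JSON)                # import a shared profile
mol = {"mw": 320.4, "tpsa": 105.0, "hbd": 2, "logp": 2.7}
print(round(score(profile, mol), 3))              # prints 0.667 (fails only the TPSA criterion)
shared = json.dumps(profile, indent=2)            # export for sharing
```

Normalizing by the total weight keeps scores comparable between profiles with different numbers of criteria.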
IUPAC Name Support
Enter chemical names directly — ChemAudit auto-detects IUPAC names, common names, and trade names, converting them to SMILES via OPSIN or PubChem before validation.
Notifications & Sharing
Stay informed and share results:
- Email notifications on batch completion
- Webhook callbacks with HMAC-SHA256 signatures
- Shareable permalinks for batch reports (30-day expiry)
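On the receiving side, a webhook signature is checked by recomputing the HMAC over the raw request body with the shared secret and comparing in constant time. The sketch below assumes the signature is the hex-encoded HMAC-SHA256 of the raw body; the header that carries it and the payload fields are hypothetical, so check the API reference for ChemAudit's exact scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare in constant time to resist timing attacks."""
    return hmac.compare_digest(sign(secret, body), received_signature)


secret = b"webhook-secret"                           # shared secret (hypothetical)
body = b'{"job_id": "abc123", "status": "complete"}'  # hypothetical payload
signature = sign(secret, body)                        # would arrive in a request header
print(verify(secret, body, signature))               # prints True
print(verify(secret, body + b" ", signature))        # prints False (body was altered)
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, since whitespace or key-order changes alter the digest.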
Database Integration
Cross-reference molecules against:
- PubChem: Properties, synonyms, IUPAC names
- ChEMBL: Bioactivity data, targets, clinical phase
- COCONUT: Natural product sources and organisms
- Wikidata: Isomeric SMILES, InChI, CAS numbers, molecular mass
Plus:
- Identifier Resolution: Resolve 10+ identifier types (SMILES, InChI, InChIKey, CID, ChEMBL ID, CAS, DrugBank, ChEBI, UNII, Wikipedia URL, compound names) with automatic detection and cross-database linking via UniChem
- Cross-Database Comparison: Compare structural representations across all four databases with consistency verdicts
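Automatic identifier detection can start with cheap format checks before any database lookup. The sketch below applies a few rules in order: InChI prefix, InChIKey shape (14-10-1 uppercase blocks), ChEMBL ID, CAS number with its check digit, and an all-digit PubChem CID. These rules and their ordering are hypothetical and not ChemAudit's detector; distinguishing names from SMILES needs a parser, so the sketch leaves them as one bucket.

```python
import re

# Hypothetical detection rules for illustration only.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CHEMBL_ID = re.compile(r"^CHEMBL\d+$", re.IGNORECASE)
CAS = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """CAS check digit: digits weighted 1, 2, 3, ... from the right, summed, mod 10."""
    m = CAS.match(cas)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def detect(identifier: str) -> str:
    """Classify an identifier string by format alone (no network lookups)."""
    s = identifier.strip()
    if s.startswith("InChI="):
        return "inchi"
    if INCHIKEY.match(s):
        return "inchikey"
    if CHEMBL_ID.match(s):
        return "chembl_id"
    if cas_checksum_ok(s):
        return "cas"
    if s.isdigit():
        return "pubchem_cid"
    return "name_or_smiles"  # needs a SMILES parser or name lookup to split further


for query in ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "CHEMBL25", "50-78-2", "2244", "aspirin"]:
    print(f"{query:<30} {detect(query)}")
```

The CAS check digit catches most typos cheaply (for aspirin, 50-78-2: 8·1 + 7·2 + 0·3 + 5·4 = 42, and 42 mod 10 = 2), so malformed numbers fall through instead of triggering a failed lookup.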
Quick Start
Get ChemAudit running in minutes with Docker:
# Clone the repository
git clone https://github.com/Kohulan/ChemAudit.git
cd ChemAudit
# Create environment file
cp .env.example .env
# Edit .env to set required secrets (POSTGRES_PASSWORD, SECRET_KEY, etc.)
# Start all services
docker-compose up -d
Access Points (Development):
- Web UI: http://localhost:3002
- API Documentation: http://localhost:8001/api/v1/docs
- API ReDoc: http://localhost:8001/api/v1/redoc
Access Points (Production):
- Web UI and API: http://localhost (behind Nginx)
Navigation
Ready to get started? Here's where to go next:
- Getting Started — Install and configure ChemAudit
- User Guide — Learn all features
- QSAR-Ready Pipeline — Curate datasets for ML
- Structure Filter — Multi-stage compound filtering
- Dataset Audit — Dataset health auditing
- Batch Analytics — Interactive dataset exploration
- Scoring Profiles — Custom property scoring
- Bookmarks & History — Save and track results
- API Reference — Integrate ChemAudit into your workflow
- Deployment — Deploy to production
- Troubleshooting — Solve common issues
Support
Questions or issues? Open an issue on GitHub: https://github.com/Kohulan/ChemAudit/issues