
Single Molecule Validation

ChemAudit's single molecule validation provides comprehensive structural analysis for individual molecules. This feature is ideal for quick checks, exploring new compounds, or validating structures before batch processing.

Supported Input Formats

ChemAudit automatically detects and supports multiple input formats:

| Format | Example | Auto-Detected |
|---|---|---|
| SMILES | CCO | Yes |
| InChI | InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 | Yes |
| MOL Block | V2000/V3000 format | Yes |
| IUPAC Name | aspirin, 2-acetoxybenzoic acid | Yes |

Auto-Detection

Simply paste your molecule in any format — including IUPAC names like "aspirin" or "2-acetoxybenzoic acid". ChemAudit automatically detects the input type and converts names to SMILES. See IUPAC Name Conversion for details.
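To illustrate how input-type detection of this kind can work, here is a minimal sketch. It is not ChemAudit's actual detection logic: the function name `guess_format`, the returned labels, and the heuristics (InChI prefix, the `M  END` record of a MOL block, a SMILES token pattern, and a fallback to "name") are all assumptions made for the example.

```python
import re

# One or more SMILES tokens covering the common organic subset.
_SMILES = re.compile(
    r"(\[[^\[\]]+\]"      # bracket atom such as [N+] or [13CH4]
    r"|Cl|Br"             # two-letter organic-subset atoms
    r"|[BCNOPSFI]"        # one-letter organic-subset atoms
    r"|[bcnops]"          # aromatic atoms
    r"|[-=#$:/\\.()*]"    # bonds, branches, fragment dot, wildcard
    r"|%\d{2}|\d)+"       # ring-closure labels
)


def guess_format(text: str) -> str:
    """Guess 'inchi', 'mol', 'smiles', or 'name' for a pasted molecule string."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("empty input")
    if stripped.startswith("InChI="):
        return "inchi"
    # MOL blocks are multi-line and end with an 'M  END' record.
    if "\n" in stripped and "M  END" in stripped:
        return "mol"
    # Anything that tokenizes fully as SMILES is treated as SMILES.
    if _SMILES.fullmatch(stripped):
        return "smiles"
    # Everything else is assumed to be a chemical name.
    return "name"
```

A heuristic like this is ambiguous for short strings that are both valid SMILES and plausible words, so a real implementation would likely confirm the guess by actually parsing the input.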

How to Validate

Using the Web Interface

  1. Navigate to the Single Validation page (home)
  2. Enter or paste your molecule in the input field
  3. Click Validate
  4. Review results across all tabs:
    • Validation: Structural checks and overall score
    • Alerts: PAINS, BRENK, NIH, ZINC, ChEMBL screening
    • Scoring: ML-readiness, drug-likeness, ADMET, NP-likeness
    • Scoring Profiles: Consensus score, lead/fragment-likeness, property breakdowns, bioavailability radar
    • Standardization: ChEMBL-compatible cleanup with provenance timeline
    • Database Lookup: PubChem, ChEMBL, COCONUT cross-references

Using the API

curl -X POST http://localhost:8001/api/v1/validate \
-H "Content-Type: application/json" \
-d '{
"molecule": "CCO",
"format": "auto"
}'
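The same request can be made from Python using only the standard library. This is a sketch based on the curl example above: the URL and the two payload fields come from that example, while the helper names `build_validate_request` and `validate` are hypothetical. The response schema is not assumed, so the decoded JSON is returned as-is.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8001/api/v1"


def build_validate_request(molecule: str, fmt: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /validate endpoint."""
    payload = json.dumps({"molecule": molecule, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/validate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def validate(molecule: str, fmt: str = "auto", timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_validate_request(molecule, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `validate("CCO")` requires a running ChemAudit instance at `BASE_URL`; `build_validate_request` can be inspected without one.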

Validation Checks Explained

ChemAudit runs 5 basic checks on every molecule plus 17 deep validation checks organized into three domains. All check severities can be customized through the severity configuration panel.

Basic Checks

These fundamental checks assess structural validity:

| Check | Severity | Description | Common Causes of Failure |
|---|---|---|---|
| Parsability | Critical | Can the input be parsed into a valid molecule? | Malformed SMILES, invalid characters |
| Sanitization | Error | Does the molecule pass RDKit sanitization? | Structural inconsistencies, invalid atom types |
| Valence | Critical | Do all atoms have chemically valid bond counts? | Typos in SMILES, incorrect charges |
| Aromaticity | Error | Can aromatic systems be kekulized? | Invalid aromatic systems, wrong electron count |
| Connectivity | Warning | Is the molecule a single connected component? | Mixtures, salts, disconnected fragments |

Critical Failures

If a critical check fails, the molecule structure is invalid and cannot be used for further analysis.

Deep Validation — Chemical Composition

Six checks examining what the molecule is made of:

| Check | Severity | Description |
|---|---|---|
| Mixture Detection | Warning | Identifies disconnected fragments and classifies each as drug, salt, solvent, or unknown |
| Solvent Contamination | Warning | Matches fragments against 15+ known solvents (water, DMSO, DMF, acetonitrile, methanol, ethanol, etc.) |
| Inorganic Filter | Warning/Error | Detects inorganic (no carbon → Error) or organometallic (carbon + metal → Warning) compounds |
| Radical Detection | Warning | Flags atoms with unpaired electrons |
| Isotope Labels | Info | Detects isotope-labeled atoms (deuterium, ¹³C, tritium, etc.) |
| Trivial Molecule | Error | Flags molecules with ≤ 3 heavy atoms as too small for meaningful analysis |

Deep Validation — Structural Complexity

Six checks assessing structural features that may complicate analysis:

| Check | Severity | Description |
|---|---|---|
| Hypervalent Atoms | Warning | Atoms exceeding maximum allowed valence for their element |
| Polymer Detection | Info | Detected via SGroup markers, MW > 1500 Da, or dummy atoms |
| Ring Strain | Warning | 3- or 4-membered rings with significant angle strain |
| Macrocycle Detection | Info | Rings with > 12 atoms |
| Charged Species | Info | Net charge calculation, positive/negative atoms, zwitterion detection |
| Explicit Hydrogen Audit | Info | Mixed explicit/implicit hydrogen representation |

Deep Validation — Stereo & Tautomers

Five checks related to stereochemistry and tautomeric forms:

| Check | Severity | Description |
|---|---|---|
| Stereoisomer Enumeration | Warning | Enumerates possible stereoisomers from undefined centers (cap: 128) |
| Undefined Stereocenters | Warning | Counts chiral centers without R/S specification |
| Tautomer Detection | Info | Enumerates tautomers and checks if input matches canonical form |
| Aromatic System Validation | Warning | Flags unusual aromatic ring sizes (not 5 or 6) and charged aromatic atoms |
| Coordinate Dimension | Info | Detects 2D, 3D, or no coordinate data |

Severity Customization

All deep validation check severities can be overridden through the severity configuration panel. This lets you adjust which checks are treated as errors vs. warnings vs. informational based on your specific use case.

Validation Options

Preserve Aromatic Notation

By default, ChemAudit outputs canonical SMILES in kekulized form (explicit single/double bonds). Enable this option to preserve aromatic notation:

Input:

c1ccccc1

Output (default - kekulized):

C1=CC=CC=C1

Output (preserve_aromatic=true):

c1ccccc1

Understanding Results

Overall Score

The overall validation score ranges from 0 to 100 and indicates structure quality:

| Score Range | Quality | Interpretation |
|---|---|---|
| 90-100 | Excellent | Structure is valid and ready for use |
| 70-89 | Good | Minor issues detected, review recommended |
| 50-69 | Fair | Significant issues need attention |
| 0-49 | Poor | Critical problems, structure likely invalid |
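When post-processing API results, the score bands can be applied with a small helper. This is a sketch: the function name `quality_band` is hypothetical, and treating each band's lower bound as a threshold (so that a fractional score such as 89.5 falls in "Good") is an assumption, since the table only lists integer ranges.

```python
def quality_band(score: float) -> str:
    """Map an overall validation score (0 to 100) to its quality label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"
```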

Molecule Information

Every validated molecule returns comprehensive information:

  • Input SMILES: Original input
  • Canonical SMILES: Standardized SMILES representation
  • InChI: International Chemical Identifier
  • InChIKey: Hashed InChI for database lookups
  • Molecular Formula: Element composition
  • Molecular Weight: Exact mass
  • Atom Count: Number of heavy atoms

Issue Details

Failed checks appear in the issues list with:

  • Check name: Which validation failed
  • Severity: Critical, Error, Warning, or Info
  • Message: Human-readable description
  • Affected atoms: Atom indices involved (if applicable)
  • Details: Additional technical information

Scoring Profiles Tab

The Scoring Profiles tab provides advanced drug-likeness analysis beyond the standard Scoring tab:

  • Consensus Score — Drug-likeness consensus across 5 rule sets (Lipinski, Ghose, Veber, Egan, Muegge), scored 0–5
  • Lead & Fragment Likeness — Lead-likeness assessment, Rule of 3 compliance, salt inventory, ligand efficiency
  • Property Breakdown — Per-atom TPSA and LogP breakdowns, Bertz complexity index, Fsp3 detail
  • Bioavailability Radar — 6-axis radar chart for oral bioavailability assessment plus BOILED-Egg scatter for GI absorption and BBB permeation predictions
  • Atom Contribution Viewer — Per-atom property contribution heatmap

See Scoring Profiles for full details on profile scoring and the custom profile builder.
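A consensus score of this kind can be thought of as a count of the rule sets a molecule passes. The sketch below is not ChemAudit's implementation. It covers only three of the five rule sets (Lipinski, Veber, Egan) using their commonly cited thresholds, applies Lipinski strictly with no violation allowed, and takes precomputed descriptors as input. The `Descriptors` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float      # Da
    logp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    tpsa: float            # square angstroms


def passes_lipinski(d: Descriptors) -> bool:
    # Rule of 5, applied strictly (no violation tolerated).
    return (d.mol_weight <= 500 and d.logp <= 5
            and d.h_donors <= 5 and d.h_acceptors <= 10)


def passes_veber(d: Descriptors) -> bool:
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def passes_egan(d: Descriptors) -> bool:
    return d.logp <= 5.88 and d.tpsa <= 131.6


# Ghose and Muegge are omitted here, so this sketch scores 0 to 3, not 0 to 5.
RULE_SETS = {
    "Lipinski": passes_lipinski,
    "Veber": passes_veber,
    "Egan": passes_egan,
}


def consensus_score(d: Descriptors) -> tuple[int, dict[str, bool]]:
    """Return the number of rule sets passed and the per-rule-set outcome."""
    results = {name: rule(d) for name, rule in RULE_SETS.items()}
    return sum(results.values()), results
```

Returning the per-rule-set outcome alongside the count makes it easy to show which rule set a molecule failed, not only how many it passed.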

Standardization Provenance

When viewing the Standardization tab, a provenance timeline shows exactly what changed at each pipeline stage:

  • Vertical timeline with per-stage cards that auto-expand when changes occurred
  • Change types tracked: charge normalization, bond normalization, ring aromaticity, radical changes, fragment removal
  • DVAL cross-references: Links to deep validation findings (e.g., "DVAL-01: 2 undefined stereocenters detected")
  • Tautomer summary: Number of tautomers enumerated, canonical form, complexity flag
  • Stereo detail: Per-center before/after configuration and reason for change

Enable provenance in the API with include_provenance=true on the /standardize endpoint. See Standardization for details.

Bookmarking Results

After validation, click the Bookmark button (star icon) in the results header to save the molecule and its full result snapshot. Bookmarked molecules appear on the Bookmarks & History page where you can search, tag, and revisit them.

Severity Configuration

Click the gear icon to open the severity configuration panel. This lets you override the default severity of any validation check:

  • Choose Error, Warning, or Info for each check
  • Reset individual checks back to their defaults
  • Overrides persist in your browser's localStorage
  • The overall verdict dynamically recomputes based on your effective severities
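The override behavior described above can be modeled as a merge of user overrides onto default severities, followed by a recomputed verdict. This is only a sketch: the check keys, the `effective_severities` and `verdict` helpers, and the rule that the verdict equals the worst effective severity among failed checks are assumptions, not ChemAudit's documented logic.

```python
# Hypothetical check keys with defaults taken from the tables in this guide.
DEFAULT_SEVERITY = {
    "parsability": "critical",
    "valence": "critical",
    "undefined_stereocenters": "warning",
    "isotope_labels": "info",
}

# The configuration panel offers these three levels for overrides.
OVERRIDE_LEVELS = {"error", "warning", "info"}
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def effective_severities(defaults: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Merge overrides onto defaults; a check without an override keeps its default."""
    for check, level in overrides.items():
        if check not in defaults:
            raise ValueError(f"unknown check: {check}")
        if level not in OVERRIDE_LEVELS:
            raise ValueError(f"unsupported severity: {level}")
    return {**defaults, **overrides}


def verdict(failed_checks: list[str], severities: dict[str, str]) -> str:
    """Return 'pass' or the worst effective severity among the failed checks."""
    levels = [severities[name] for name in failed_checks]
    if not levels:
        return "pass"
    return max(levels, key=SEVERITY_RANK.__getitem__)
```

In this model, resetting a check to its default simply means removing its key from the overrides mapping.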

RDKit Version Source

The canonical SMILES shown in molecule info includes a tooltip indicating which RDKit version was used for canonicalization. This is useful for reproducibility when sharing results.

Common Validation Errors

Valence Errors

Problem: Atom has incorrect number of bonds

Examples:

| Invalid | Valid | Explanation |
|---|---|---|
| CC(C)(C)(C)C | CC(C)(C)C | Carbon can have at most 4 bonds |
| CN(C)(C)C | C[N+](C)(C)C | A nitrogen with 4 bonds (quaternary nitrogen) needs a positive charge |

Kekulization Failures

Problem: Aromatic ring doesn't follow Hückel's rule (4n+2 pi electrons)

Examples:

| Invalid | Valid | Explanation |
|---|---|---|
| c1cccc1 | c1ccccc1 | Benzene needs 6 carbons (6 pi electrons) |
| c1ccccccc1 | C1=CC=CC=CC=C1 | 8 pi electrons, not aromatic; write with explicit double bonds |

Unclosed Rings

Problem: Ring opening doesn't have matching closing

Examples:

| Invalid | Valid | Explanation |
|---|---|---|
| C1CCC | C1CCCC1 | Ring 1 not closed |
| C1CCC2 | C1CCC2CC2C1 | Rings 1 and 2 must both close |
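Unclosed rings can be caught before submission with a quick pre-check on the SMILES string. The sketch below is a standalone illustration, not part of ChemAudit: the function name `unclosed_ring_labels` is hypothetical, and it only checks that ring-closure labels pair up, not that the SMILES is otherwise valid.

```python
import re


def unclosed_ring_labels(smiles: str) -> list[int]:
    """Return ring-closure labels that are opened but never closed."""
    # Drop bracket atoms first: digits inside [...] are isotopes, hydrogen
    # counts, or charges, not ring-closure labels.
    outside = re.sub(r"\[[^\]]*\]", "", smiles)
    open_labels: set[int] = set()
    # A label is either a single digit or '%' followed by two digits.
    for match in re.finditer(r"%(\d{2})|(\d)", outside):
        label = int(match.group(1) or match.group(2))
        # The first occurrence opens the ring and the next one closes it,
        # so toggling membership tracks which labels are still open.
        open_labels ^= {label}
    return sorted(open_labels)
```

Because SMILES allows a label to be reused after its ring has closed, toggling set membership handles strings such as `C1CC1C1CC1` correctly.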

Best Practices

  1. Validate early: Check structures before starting experiments
  2. Review warnings: Don't ignore stereo or representation warnings
  3. Use canonical forms: Work with canonical SMILES for consistency
  4. Cross-check databases: Use database lookup to verify compound identity
  5. Document issues: Record any validation warnings in your data
