Diagnostics
The Diagnostics page provides low-level chemical structure analysis tools for debugging parsing errors, investigating structural discrepancies, and pre-screening files before batch processing.
Access Diagnostics from the Data Preparation dropdown in the header navigation.
Available Tools
SMILES Diagnostics
Parse and analyze a SMILES string to understand its structure at the atom level:
- Atom-by-atom breakdown: Each atom with its element, charge, isotope, and bonding
- Ring closures: Identification of ring opening/closing tokens
- Branch points: Parenthetical branch structure
- Parse errors: Detailed error messages for invalid SMILES, including the position where parsing failed
When a molecule fails validation with "Cannot parse input," paste its SMILES into SMILES Diagnostics to see exactly where and why parsing fails.
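The token-level scan behind ring-closure, branch, and error-position reporting can be sketched in plain Python. This is a simplified illustration, not the tool's actual parser. `scan_smiles` and `SmilesSyntaxError` are hypothetical names. Bracket atoms are kept as raw tokens rather than decoded into charge and isotope, and valence and aromaticity are not checked, so a string that passes this scan can still be rejected by RDKit.

```python
from dataclasses import dataclass, field

# Organic-subset atoms; two-letter symbols come first so "Cl" is not read as "C".
ORGANIC = ["Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I",
           "b", "c", "n", "o", "p", "s", "*"]
BONDS = set("-=#$:/\\.")


class SmilesSyntaxError(ValueError):
    """Hypothetical error type that records the 0-based failure position."""
    def __init__(self, message: str, position: int):
        super().__init__(f"{message} at position {position}")
        self.position = position


@dataclass
class ScanReport:
    atoms: list = field(default_factory=list)        # (position, token)
    ring_events: list = field(default_factory=list)  # (position, label, "open"/"close")
    branches: list = field(default_factory=list)     # (open_position, close_position)


def scan_smiles(smiles: str) -> ScanReport:
    report = ScanReport()
    open_rings = {}     # ring label -> position where the ring bond was opened
    branch_stack = []   # positions of "(" not yet matched by ")"
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Bracket atom such as [NH4+] or [13CH3]; kept as one raw token.
            end = smiles.find("]", i)
            if end == -1:
                raise SmilesSyntaxError("unterminated bracket atom", i)
            report.atoms.append((i, smiles[i:end + 1]))
            i = end + 1
            continue
        organic = next((a for a in ORGANIC if smiles.startswith(a, i)), None)
        if organic:
            report.atoms.append((i, organic))
            i += len(organic)
            continue
        if ch.isdigit() or ch == "%":
            # Ring-closure label: a single digit, or "%" plus two digits.
            if ch == "%":
                label = smiles[i + 1:i + 3]
                if len(label) != 2 or not label.isdigit():
                    raise SmilesSyntaxError("'%' must be followed by two digits", i)
                width = 3
            else:
                label, width = ch, 1
            if label in open_rings:
                del open_rings[label]  # label can be reused after closing
                report.ring_events.append((i, label, "close"))
            else:
                open_rings[label] = i
                report.ring_events.append((i, label, "open"))
            i += width
            continue
        if ch == "(":
            branch_stack.append(i)
        elif ch == ")":
            if not branch_stack:
                raise SmilesSyntaxError("unmatched ')'", i)
            report.branches.append((branch_stack.pop(), i))
        elif ch not in BONDS:
            raise SmilesSyntaxError(f"unexpected character {ch!r}", i)
        i += 1
    if branch_stack:
        raise SmilesSyntaxError("unclosed '('", branch_stack[-1])
    if open_rings:
        label, position = next(iter(open_rings.items()))
        raise SmilesSyntaxError(f"ring bond {label} never closed", position)
    return report
```

For example, `scan_smiles("c1ccccc1C(=O)O")` reports nine atoms, ring bond 1 opening at position 1 and closing at position 7, and one branch spanning positions 9 to 12, while `scan_smiles("CC(C")` raises an error pointing at the unclosed parenthesis at position 2.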
InChI Layer Diff
Compare InChI layers between two molecules to identify exact structural differences:
- Formula layer — Molecular formula comparison
- Connectivity layer — Atom connectivity differences
- Hydrogen layer — Hydrogen attachment differences
- Charge layer — Formal charge differences
- Stereo layers — Stereochemistry differences (tetrahedral, double bond)
This is useful for understanding why two molecules that look similar have different InChIKeys or why standardization changed a structure.
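A layer-by-layer comparison can be approximated by splitting each InChI string on "/" and matching layers by their one-letter prefix (the formula layer has no prefix). This is a minimal sketch with hypothetical function names (`inchi_layers`, `diff_layers`). It keeps only the first occurrence of each prefix, so it does not separate the repeated sublayers that can appear inside the isotopic (`/i`) or fixed-H (`/f`) layers.

```python
# Human-readable names for common InChI layer prefixes.
LAYER_NAMES = {
    "formula": "formula",
    "c": "connectivity",
    "h": "hydrogen",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion",
    "s": "stereo type",
    "i": "isotope",
    "f": "fixed-H",
}


def inchi_layers(inchi: str) -> dict:
    """Map layer prefix -> layer content for one InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi.split("/")
    layers = {"formula": parts[1]} if len(parts) > 1 else {}
    for part in parts[2:]:
        if part:
            # First occurrence wins; repeated sublayers are ignored.
            layers.setdefault(part[0], part[1:])
    return layers


def diff_layers(inchi_a: str, inchi_b: str) -> dict:
    """Return {layer name: (value in A, value in B)} for layers that differ."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    keys = list(a) + [k for k in b if k not in a]
    return {
        LAYER_NAMES.get(k, k): (a.get(k), b.get(k))
        for k in keys
        if a.get(k) != b.get(k)
    }
```

Ethanol (`InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3`) and dimethyl ether (`InChI=1S/C2H6O/c1-3-2/h1-2H3`) share the formula layer, so `diff_layers` reports only the connectivity and hydrogen layers as different.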
Round-Trip Validation
Validate SMILES → MOL → SMILES round-trip fidelity:
- Parse the input SMILES into an RDKit molecule object
- Convert to MOL block (3D coordinates if possible)
- Convert back to SMILES
- Compare the original and round-tripped SMILES
Differences indicate potential representation loss: the molecule may not come back unchanged in workflows that write it to another format and read it back.
File Pre-Validation
Check a file for parseable SMILES before uploading it to Batch Processing:
- File parsing: Validates that the file format is correct (CSV, SDF)
- Column detection: Identifies SMILES and name columns
- Parse rate: Reports what percentage of SMILES can be parsed by RDKit
- Error summary: Lists unparseable entries with row numbers and error messages
Running file pre-validation before a large batch upload helps you fix format issues and unparseable entries upfront, rather than discovering them during processing.
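The column-detection, parse-rate, and error-summary steps can be approximated for CSV input with the standard `csv` module. In this sketch `prevalidate_csv`, the header-name sets, and the `balanced_parens` check are all hypothetical. The stand-in check only looks at parentheses; in practice you would pass a callable built on a real parser, for example one that returns an error message when RDKit's `Chem.MolFromSmiles` returns `None`.

```python
import csv
import io
from typing import Callable, Optional

# Hypothetical header names treated as SMILES or name columns (case-insensitive).
SMILES_HEADERS = {"smiles", "canonical_smiles", "smi"}
NAME_HEADERS = {"name", "id", "compound_id", "title"}


def detect_column(header: list, candidates: set) -> Optional[str]:
    """Return the first header whose lowercased name is in `candidates`."""
    for column in header:
        if column.strip().lower() in candidates:
            return column
    return None


def balanced_parens(smiles: str) -> Optional[str]:
    """Stand-in check: returns an error message, or None if parentheses balance."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unmatched ')'"
    return "unclosed '('" if depth else None


def prevalidate_csv(text: str, check: Callable[[str], Optional[str]]) -> dict:
    """`check(smiles)` returns None when parseable, else an error message."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    smiles_col = detect_column(header, SMILES_HEADERS)
    if smiles_col is None:
        raise ValueError(f"no SMILES column found in header {header}")
    name_col = detect_column(header, NAME_HEADERS)

    total, errors = 0, []
    # Row 1 is the header, so the first data row is reported as row 2
    # (assumes no quoted field spans several lines).
    for row_number, row in enumerate(reader, start=2):
        total += 1
        smiles = (row.get(smiles_col) or "").strip()
        message = check(smiles) if smiles else "empty SMILES"
        if message:
            errors.append({
                "row": row_number,
                "name": row.get(name_col) if name_col else None,
                "error": message,
            })

    parse_rate = 100.0 * (total - len(errors)) / total if total else 0.0
    return {"smiles_column": smiles_col, "name_column": name_col,
            "total": total, "parse_rate": parse_rate, "errors": errors}
```

Passing the check as a callable keeps the file handling separate from the chemistry, so the same loop works whether the check is this stand-in or a full RDKit parse.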
Coordinate Dimension Analysis
Detect the coordinate dimensionality of molecules in MOL blocks or SDF files:
| Dimension | Meaning |
|---|---|
| 2D | X/Y coordinates only (Z = 0) — suitable for 2D depiction |
| 3D | X/Y/Z coordinates — includes conformational information |
| No coordinates | No atom coordinate data present |
| Mixed | File contains molecules with different dimensionalities |
This helps verify that SDF files have the expected coordinate data before using them for 3D analysis or docking workflows.
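A simple version of this classification reads the fixed-width X, Y, and Z columns of each atom line in a V2000 MOL block and checks which are non-zero. This is a heuristic sketch with hypothetical function names; it does not handle V3000 blocks, ignores the optional dimension code in the header line, and treats all-zero coordinates as "No coordinates" (so a lone atom placed at the origin would be misclassified). SDF records are split on `$$$$` lines, and more than one dimensionality across records is reported as "Mixed".

```python
def mol_block_dimension(mol_block: str, tol: float = 1e-4) -> str:
    """Classify one V2000 MOL block as '2D', '3D', or 'No coordinates'."""
    lines = mol_block.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 MOL block")
    n_atoms = int(lines[3][0:3])  # counts line: atom count in columns 1-3
    coords = []
    for line in lines[4:4 + n_atoms]:
        # Atom lines hold X, Y, Z in three 10-character fields.
        coords.append((float(line[0:10]), float(line[10:20]), float(line[20:30])))
    if all(abs(v) < tol for xyz in coords for v in xyz):
        return "No coordinates"
    if any(abs(z) > tol for _, _, z in coords):
        return "3D"
    return "2D"


def split_sdf(sdf_text: str) -> list:
    """Split SDF text into MOL-block records on '$$$$' delimiter lines."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def sdf_dimension(sdf_text: str) -> str:
    """Return the shared dimensionality of all records, or 'Mixed'."""
    dims = {mol_block_dimension(record) for record in split_sdf(sdf_text)}
    return dims.pop() if len(dims) == 1 else "Mixed"
```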
Use Cases
Debugging Validation Failures
- A molecule fails validation → copy the SMILES
- Open Diagnostics → SMILES Diagnostics
- Paste the SMILES → see the exact parsing error or structural issue
- Fix the SMILES and re-validate
Comparing Standardized vs. Original
- Standardize a molecule → note the original and standardized SMILES
- Open Diagnostics → InChI Layer Diff
- Enter both SMILES → see exactly which structural layers changed
- Understand whether stereochemistry, connectivity, or charge was affected
Pre-Screening Batch Files
- Open Diagnostics → File Pre-Validation
- Upload your CSV or SDF
- Review the parse rate and error summary
- Fix problematic entries
- Upload the cleaned file to Batch Processing
Next Steps
- Single Validation — Validate individual molecules
- Batch Processing — Process large datasets
- Standardization — Understand the standardization pipeline
- QSAR-Ready Pipeline — Prepare datasets for ML