NP-Likeness Scoring
NP-likeness (Natural Product likeness) scoring evaluates whether a molecule resembles known natural products or synthetic compounds.
What It Measures
The NP-likeness score is based on structural fragments and their frequency in:
- Natural product databases (COCONUT, Dictionary of Natural Products)
- Synthetic molecule databases (PubChem, ZINC)
Higher scores indicate natural product-like character, lower scores indicate synthetic-like character.
Score Range and Interpretation
| Score | Category | Description |
|---|---|---|
| ≥ 2.0 | Strong NP-like | Natural product features clearly evident |
| 1.0–2.0 | NP-like | Suggestive of natural origin |
| 0.3–1.0 | Moderate NP-like | Some NP features present |
| −0.3 to 0.3 | Mixed | Both NP and synthetic characteristics |
| −1.0 to −0.3 | Moderate synthetic | More synthetic than NP features |
| −2.0 to −1.0 | Synthetic-like | Typical synthetic compound profile |
| < −2.0 | Strong synthetic | Lacks natural product features |
Color scale in the UI: green (NP-like) → slate (mixed) → red (synthetic).
Typical Scores
Natural products:
- Morphine: +2.5
- Taxol: +3.8
- Quinine: +2.1
Synthetic drugs:
- Ibuprofen: -1.8
- Aspirin: -0.9
- Lipitor: -2.3
API Usage
curl -X POST http://localhost:8001/api/v1/score \
-H "Content-Type: application/json" \
-d '{
"molecule": "CN1CCC23C4C1CC5=C2C(=C(C=C5)O)OC3C(C=C4)O",
"include": ["np_likeness"]
}'
Response:
{
"np_likeness": {
"score": 2.1,
"interpretation": "Natural product-like molecule",
"caveats": []
}
}
Caveats and Limitations
Molecule size effects:
For very large molecules (>50 heavy atoms), scores may be less reliable. The model is trained on typical drug-sized molecules.
Fragment-based:
The score is based on structural fragments. Molecules with unusual fragment combinations may have unexpected scores.
Training data bias:
Scores reflect the training data composition. Rare natural product classes may score as synthetic if underrepresented in training data.
When limitations apply, they're reported in the caveats array for transparency.
Use Cases
Natural Product Research
Identify natural product-like molecules in screening libraries:
np_likeness_score > 1.0
Diversity Analysis
Assess chemical diversity by NP-likeness distribution:
- Wide range (-3 to +3): High diversity
- Narrow range: Homogeneous library
- Skewed distribution: Biased toward NP or synthetic space
Scaffold Analysis
Combine with scaffold analysis to identify natural product-inspired scaffolds:
np_likeness_score > 1.0 AND
has_scaffold = true
Library Design
Balance natural product and synthetic character:
- NP-focused library: Score > 0
- Synthetic-focused library: Score < 0
- Balanced library: Score -1 to +1
Interpretation Guidelines
High positive score (> 2.0):
- Likely contains natural product fragments
- May have complex ring systems
- Often higher molecular weight
- Potentially harder to synthesize
Near zero (-1.0 to +1.0):
- Intermediate character
- May be NP-inspired synthetic molecules
- Simple aromatic compounds
- Modified natural products
High negative score (< -2.0):
- Synthetic-like fragments dominant
- Simpler structure
- Drug-like properties common
- Easier to synthesize
Relationship to Other Scores
NP-likeness vs Drug-likeness:
- Natural products often violate Lipinski
- Higher complexity (more rings, stereocenters)
- Often larger molecular weight
NP-likeness vs Synthetic Accessibility:
- Positive correlation: NP-like → harder to synthesize
- Not absolute: some NPs are easy to synthesize
NP-likeness vs Scaffold Complexity:
- NP-like molecules often have complex scaffolds
- Multiple fused ring systems
- More stereocenters
Best Practices
- Use as a guide: Not a definitive natural product classifier
- Combine with other data: Check COCONUT database for confirmation
- Consider therapeutic area: Some drug classes are NP-like (antibiotics)
- Track during optimization: Monitor changes in NP character
- Balance with drug-likeness: High NP-likeness may sacrifice drug-likeness
Reference
Ertl, P., Roggo, S. & Schuffenhauer, A. (2008). Natural product-likeness score and its application for prioritization of compound libraries. Journal of Chemical Information and Modeling, 48(1), 68–74.
Next Steps
- Database Integrations — Search COCONUT database
- Scaffold Analysis — Analyze ring systems
- Scoring Overview — All scoring systems