Structure Filter

The Structure Filter provides multi-stage funnel filtering for chemical libraries, generative chemistry outputs, or any SMILES collection. Molecules pass through a sequence of configurable filter stages, with a visual funnel showing pass/fail counts at each stage.

Filter Presets

Four built-in presets provide ready-to-use filter configurations:

Preset	Description	Key Criteria
drug_like	Lipinski-based drug-likeness	MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10
lead_like	Lead optimization criteria	MW 200–450, LogP −1 to 4, RotB ≤ 7
fragment_like	Rule of Three for fragments	MW ≤ 300, LogP ≤ 3, HBD ≤ 3, HBA ≤ 3
permissive	Minimal filtering	Basic validity checks only

You can also define custom filter configurations with arbitrary property ranges and SMARTS-based substructure inclusion/exclusion patterns.
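Each preset in the table is a set of property ranges, and a custom configuration can be described the same way. The sketch below is a hypothetical local representation: `PRESET_RANGES` and `failed_criteria` are illustrative names, not part of the Structure Filter API or its request schema. It checks precomputed descriptor values against a range set. Computing the descriptors (for example with RDKit) and SMARTS matching are out of scope.

```python
from typing import Dict, List, Optional, Tuple

# (min, max) bounds for one property; None means "no bound on this side".
Range = Tuple[Optional[float], Optional[float]]

# Hypothetical local encoding of the preset table above (not the API's schema).
PRESET_RANGES: Dict[str, Dict[str, Range]] = {
    "drug_like": {"MW": (None, 500), "LogP": (None, 5), "HBD": (None, 5), "HBA": (None, 10)},
    "lead_like": {"MW": (200, 450), "LogP": (-1, 4), "RotB": (None, 7)},
    "fragment_like": {"MW": (None, 300), "LogP": (None, 3), "HBD": (None, 3), "HBA": (None, 3)},
    "permissive": {},  # validity checks only, so no property ranges
}


def failed_criteria(props: Dict[str, float], ranges: Dict[str, Range]) -> List[str]:
    """Return a reason string for every range the precomputed properties violate."""
    reasons: List[str] = []
    for name, (low, high) in ranges.items():
        value = props[name]
        if low is not None and value < low:
            reasons.append(f"{name} < {low}")
        if high is not None and value > high:
            reasons.append(f"{name} > {high}")
    return reasons


# Hypothetical descriptor values for a small molecule (not computed here).
example = {"MW": 180.0, "LogP": 1.2, "HBD": 1, "HBA": 4, "RotB": 3}
for preset, ranges in PRESET_RANGES.items():
    print(preset, failed_criteria(example, ranges) or "passes")
```

A custom configuration fits the same shape. For example, the hypothetical range set `{"MW": (150, 350), "RotB": (None, 5)}` can be passed in place of a preset's ranges.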

Using the Web Interface

1. Navigate to Structure Filter under the Library dropdown in the header.
2. Choose your input method:
   - Paste SMILES: Enter SMILES strings, one per line.
   - Upload file: Drag and drop a CSV or SDF file.
3. Select a preset or configure custom filter stages.
4. Optionally, add SMARTS patterns for substructure filtering.
5. Click Filter.
6. View the funnel visualization, which shows how many molecules pass or fail at each stage.
7. Explore stage-by-stage results with per-molecule pass/fail details.
8. Download the passing molecules.

Async Processing

For datasets with more than 1,000 molecules, filtering runs asynchronously with a WebSocket progress feed. Smaller datasets return results immediately.
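If you call the API directly, you may want to apply the same threshold on the client side. The sketch below assumes the base URL used in the examples on this page and the documented 1,000-molecule cutoff. `choose_endpoint` is a hypothetical helper name.

```python
# Base URL used in the curl examples on this page (adjust for your deployment).
BASE_URL = "http://localhost:8001/api/v1/structure-filter"

# Documented cutoff: up to 1,000 molecules run synchronously; larger sets use batch jobs.
SYNC_LIMIT = 1_000


def choose_endpoint(n_molecules: int) -> str:
    """Return the synchronous filter URL for small inputs, else the batch upload URL."""
    if n_molecules <= SYNC_LIMIT:
        return f"{BASE_URL}/filter"
    return f"{BASE_URL}/batch/upload"


print(choose_endpoint(250))
print(choose_endpoint(5_000))
```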

Funnel Visualization

The funnel chart shows the filtering pipeline as a series of stages. Each stage displays:

- Stage name and index
- Input count: molecules entering the stage
- Passed count: molecules passing the stage
- Rejected count: molecules filtered out at the stage
- Whether the stage is enabled or skipped

This makes it easy to identify which filter stage removes the most molecules and adjust your criteria accordingly.
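The stage counts chain together. Each stage's input count equals the previous stage's passed count, and a skipped stage is assumed to pass every molecule through. The sketch below shows that bookkeeping with toy predicates that stand in for real checks such as SMILES parsing, property ranges, or SMARTS matches. `run_funnel` is a hypothetical helper. The record keys mirror the stage fields returned by the filter endpoint.

```python
from typing import Callable, Dict, List, Tuple

# A stage is (name, predicate, enabled); predicate returns True if the molecule passes.
Stage = Tuple[str, Callable[[str], bool], bool]


def run_funnel(smiles_list: List[str], stages: List[Stage]) -> Tuple[List[Dict], List[str]]:
    """Apply stages in order and record per-stage counts, funnel style."""
    survivors = list(smiles_list)
    records: List[Dict] = []
    for index, (name, predicate, enabled) in enumerate(stages):
        input_count = len(survivors)
        if enabled:
            survivors = [smi for smi in survivors if predicate(smi)]
        # A disabled (skipped) stage passes every molecule through unchanged.
        records.append({
            "stage_name": name,
            "stage_index": index,
            "input_count": input_count,
            "passed_count": len(survivors),
            "rejected_count": input_count - len(survivors),
            "enabled": enabled,
        })
    return records, survivors


# Toy predicates standing in for real cheminformatics checks.
toy_stages: List[Stage] = [
    ("validity", lambda smi: len(smi) > 0, True),
    ("property_filter", lambda smi: len(smi) <= 5, True),
    ("substructure", lambda smi: "N" not in smi, False),
]
records, passed = run_funnel(["CCO", "", "c1ccccc1", "CCCCCCCC", "CC"], toy_stages)
worst = max(records, key=lambda r: r["rejected_count"])
print(passed, worst["stage_name"])
```

Taking the maximum of `rejected_count` identifies the stage that removes the most molecules, which is the stage to adjust first.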

Scoring Mode

The Structure Filter also provides a continuous 0–1 score for each molecule, useful for integration with generative models:

curl -X POST http://localhost:8001/api/v1/structure-filter/score \
  -H "Content-Type: application/json" \
  -d '{
    "smiles_list": ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"],
    "preset": "drug_like"
  }'

Response:

{
  "scores": [0.85, 0.92, 0.78]
}

A score of null indicates an invalid or unparseable SMILES.
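The score endpoint can be called from Python using only the standard library. The sketch below assumes that scores come back in the same order as `smiles_list`, as in the example above. `score_smiles` and `rank_valid` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

SCORE_URL = "http://localhost:8001/api/v1/structure-filter/score"


def score_smiles(smiles_list: List[str], preset: str = "drug_like") -> List[Optional[float]]:
    """POST to the score endpoint; JSON null scores come back as None."""
    body = json.dumps({"smiles_list": smiles_list, "preset": preset}).encode("utf-8")
    request = urllib.request.Request(
        SCORE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["scores"]


def rank_valid(smiles_list: List[str], scores: List[Optional[float]]) -> List[Tuple[str, float]]:
    """Drop unparseable molecules (score None) and sort the rest best-first."""
    pairs = [(smi, score) for smi, score in zip(smiles_list, scores) if score is not None]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Usage against a running server:
# smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
# print(rank_valid(smiles, score_smiles(smiles)))
```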

REINVENT Integration

A REINVENT-compatible scoring endpoint is available for direct integration with the REINVENT generative model:

curl -X POST "http://localhost:8001/api/v1/structure-filter/reinvent-score?preset=drug_like" \
  -H "Content-Type: application/json" \
  -d '[{"input_string": "CCO", "query_id": 0}, {"input_string": "c1ccccc1", "query_id": 1}]'
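The REINVENT request body is a list of objects with `input_string` and `query_id` fields, and the preset is passed as a query parameter. The sketch below builds this payload from a SMILES list and assumes sequential integer query IDs, as in the curl example. This page does not document the response schema, so `post_reinvent` returns the parsed JSON unchanged. All function names here are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

REINVENT_URL = "http://localhost:8001/api/v1/structure-filter/reinvent-score"


def reinvent_payload(smiles_list: List[str]) -> List[Dict[str, Any]]:
    """Build the request body: one object per SMILES with a sequential query_id."""
    return [{"input_string": smi, "query_id": i} for i, smi in enumerate(smiles_list)]


def reinvent_url(preset: str = "drug_like") -> str:
    """Append the preset as a URL-encoded query parameter."""
    return f"{REINVENT_URL}?{urllib.parse.urlencode({'preset': preset})}"


def post_reinvent(smiles_list: List[str], preset: str = "drug_like") -> Any:
    """POST the payload and return the parsed JSON (schema not documented here)."""
    request = urllib.request.Request(
        reinvent_url(preset),
        data=json.dumps(reinvent_payload(smiles_list)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


print(json.dumps(reinvent_payload(["CCO", "c1ccccc1"])))
```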

API Reference

Filter (Synchronous ≤ 1,000 molecules)

curl -X POST http://localhost:8001/api/v1/structure-filter/filter \
  -H "Content-Type: application/json" \
  -d '{
    "smiles_list": ["CCO", "c1ccccc1", "CC(C)c1c(C(=O)Nc2ccccc2)c(-c2ccccc2)c(-c2ccc(F)cc2)n1CCC(O)CC(O)CC(=O)O"],
    "preset": "drug_like"
  }'

Response:

{
  "input_count": 3,
  "output_count": 2,
  "stages": [
    {
      "stage_name": "validity",
      "stage_index": 0,
      "input_count": 3,
      "passed_count": 3,
      "rejected_count": 0,
      "enabled": true
    },
    {
      "stage_name": "property_filter",
      "stage_index": 1,
      "input_count": 3,
      "passed_count": 2,
      "rejected_count": 1,
      "enabled": true
    }
  ],
  "molecules": [
    { "smiles": "CCO", "status": "passed", "failed_at": null, "rejection_reason": null },
    { "smiles": "c1ccccc1", "status": "passed", "failed_at": null, "rejection_reason": null },
    { "smiles": "CC(C)c1c(C(=O)Nc2ccccc2)c(-c2ccccc2)c(-c2ccc(F)cc2)n1CCC(O)CC(O)CC(=O)O", "status": "rejected", "failed_at": "property_filter", "rejection_reason": "MW > 500" }
  ]
}
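Each response carries both per-stage counts and per-molecule outcomes, so a few lines of post-processing can extract the passing SMILES and find the stage that removes the most molecules. The sketch below relies only on the fields shown in the response above. `summarize` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Extract passing SMILES, the enabled stage with most rejections, and reason counts."""
    passed = [m["smiles"] for m in result["molecules"] if m["status"] == "passed"]
    enabled = [stage for stage in result["stages"] if stage["enabled"]]
    # max() keeps the first stage on ties; None if no stage was enabled.
    worst = max(enabled, key=lambda stage: stage["rejected_count"], default=None)
    reasons = Counter(
        m["rejection_reason"] for m in result["molecules"] if m["status"] == "rejected"
    )
    return {
        "passed": passed,
        "worst_stage": worst["stage_name"] if worst else None,
        "reasons": reasons,
    }


# Usage: summary = summarize(response.json())
```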

Batch Upload (> 1,000 molecules)

curl -X POST http://localhost:8001/api/v1/structure-filter/batch/upload \
  -F "file=@molecules.csv" \
  -F "preset=drug_like"

Check Status

curl http://localhost:8001/api/v1/structure-filter/batch/{job_id}/status

Get Results

curl http://localhost:8001/api/v1/structure-filter/batch/{job_id}/results

Download

# Passing molecules only (one SMILES per line)
curl http://localhost:8001/api/v1/structure-filter/batch/{job_id}/download/passed_txt -o passed.txt

# Full results with status
curl http://localhost:8001/api/v1/structure-filter/batch/{job_id}/download/full_csv -o results.csv
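A typical batch workflow is to upload, poll the status endpoint, and then fetch results or downloads. This page does not show the status response schema, so the polling sketch below assumes a hypothetical `status` field whose value eventually becomes `completed` or `failed`. Adjust the field name and terminal states to match what your server returns. The default 5-second poll interval stays well under the 60 req/min limit on the status endpoint. `fetch_status` and `wait_for_job` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Tuple

BATCH_URL = "http://localhost:8001/api/v1/structure-filter/batch"


def fetch_status(job_id: str) -> Dict[str, Any]:
    """GET the documented status endpoint and parse the JSON body."""
    with urllib.request.urlopen(f"{BATCH_URL}/{job_id}/status", timeout=30) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], Dict[str, Any]] = fetch_status,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    done_states: Tuple[str, ...] = ("completed",),  # ASSUMED terminal value
    failed_states: Tuple[str, ...] = ("failed",),  # ASSUMED terminal value
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state, then return the last status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        state = status.get("status")  # ASSUMED field name; check your server's response
        if state in done_states:
            return status
        if state in failed_states:
            raise RuntimeError(f"job {job_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

After `wait_for_job(job_id)` returns, fetch `/batch/{job_id}/results` or one of the download URLs listed above.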

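Python Example

import requests

# Synchronous filter (up to 1,000 molecules per request)
response = requests.post(
    "http://localhost:8001/api/v1/structure-filter/filter",
    json={
        "smiles_list": ["CCO", "c1ccccc1", "INVALID"],
        "preset": "drug_like"
    },
    timeout=60,
)
response.raise_for_status()  # fail fast on HTTP error responses
result = response.json()
print(f"Input: {result['input_count']}, Output: {result['output_count']}")
for mol in result["molecules"]:
    # rejection_reason is null (None) for passing molecules
    reason = mol["rejection_reason"] or ""
    print(f"  {mol['smiles']}: {mol['status']} {reason}".rstrip())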

Rate Limits

Endpoint	Limit
`POST /structure-filter/filter`	20 req/min
`POST /structure-filter/score`	30 req/min
`POST /structure-filter/reinvent-score`	30 req/min
`POST /structure-filter/batch/upload`	3 req/min
`GET /structure-filter/batch/*/status`	60 req/min
`GET /structure-filter/batch/*/results`	30 req/min
`GET /structure-filter/batch/*/download/*`	10 req/min
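To stay under these limits from a script, you can space requests evenly. The sketch below assumes the limits apply per client and that sending one request every 60/limit seconds is acceptable. `Throttle` is a hypothetical helper whose clock and sleep functions can be injected for testing.

```python
import time
from typing import Callable, Dict, Optional

# Documented per-endpoint limits, in requests per minute.
RATE_LIMITS: Dict[str, int] = {
    "POST /structure-filter/filter": 20,
    "POST /structure-filter/score": 30,
    "POST /structure-filter/reinvent-score": 30,
    "POST /structure-filter/batch/upload": 3,
    "GET /structure-filter/batch/*/status": 60,
    "GET /structure-filter/batch/*/results": 30,
    "GET /structure-filter/batch/*/download/*": 10,
}


class Throttle:
    """Space calls at least 60/limit seconds apart for one endpoint."""

    def __init__(
        self,
        per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage: one throttle per endpoint, with wait() called before each request.
filter_throttle = Throttle(RATE_LIMITS["POST /structure-filter/filter"])
```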

WebSocket

For batch jobs, connect to the WebSocket for real-time progress:

ws://localhost:8001/ws/structure-filter/{job_id}

The message format and keep-alive protocol are the same as for the batch processing WebSocket.
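The following is a minimal Python client sketch. It assumes the third-party `websockets` package (`pip install websockets`) and that progress messages can be printed as they arrive. This page does not describe the message format or any application-level keep-alive, so the sketch neither parses messages nor sends keep-alives. The library itself answers WebSocket protocol pings. `progress_url` and `follow_progress` are hypothetical names.

```python
import asyncio
import sys

WS_BASE = "ws://localhost:8001/ws/structure-filter"


def progress_url(job_id: str) -> str:
    """Build the documented per-job progress URL."""
    return f"{WS_BASE}/{job_id}"


async def follow_progress(job_id: str) -> None:
    """Print each progress message until the server closes the connection."""
    import websockets  # third-party; imported lazily so progress_url works without it

    async with websockets.connect(progress_url(job_id)) as ws:
        async for message in ws:
            print(message)  # format not documented on this page; printed raw


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python follow_progress.py <job_id>
    asyncio.run(follow_progress(sys.argv[1]))
```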

Next Steps

- QSAR-Ready Pipeline: curate molecules for ML before filtering
- Batch Processing: full validation and scoring
- Scoring Profiles: custom scoring criteria
