protein-quest
Python package to search/retrieve/filter proteins and protein structures
It uses
- UniProt SPARQL endpoint to search for proteins and their measured or predicted 3D structures.
- UniProt taxonomy to search for taxa.
- QuickGO to search for Gene Ontology terms.
- gemmi to work with macromolecular models.
- dask-distributed to compute in parallel.
An example workflow:
graph TB;
classDef dashedBorder stroke-dasharray: 5 5;
taxonomy[/Search taxon/]:::dashedBorder
taxonomy[/Search taxon/] -. taxon_ids .-> searchuniprot[/Search UniprotKB/]
searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
fetchpdbe -->|mmcif_files_with_uniprot_acc| chainfilter{Filter on chain of uniprot}
chainfilter --> |mmcif_files| residuefilter{Filter on chain length}
fetchad -->|pdb_files| confidencefilter{Filter out low confidence}
Install
pip install protein-quest
Or to use the latest development version:
pip install git+https://github.com/haddocking/protein-quest.git
Usage
The main entry point is the protein-quest command line tool, which has multiple subcommands to perform actions.
To use it programmatically, see the API documentation.
Search Uniprot accessions
protein-quest search uniprot \
--taxon-id 9606 \
--reviewed \
--subcellular-location-uniprot nucleus \
--subcellular-location-go GO:0005634 \
--molecular-function-go GO:0003677 \
--limit 100 \
uniprot_accs.txt
Search for PDBe structures of uniprot accessions
protein-quest search pdbe uniprot_accs.txt pdbe.csv
The pdbe.csv file is written, containing the PDB id and chain of each UniProt accession.
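To check what the search step wrote before retrieving structures, the CSV can be inspected from Python. The sketch below deliberately makes no assumption about the column names in pdbe.csv: it reads whatever header the file has and returns the first few rows. The helper name `preview_csv` is hypothetical and not part of protein-quest.

```python
import csv
from pathlib import Path


def preview_csv(path, max_rows=5):
    """Return (header, first rows) of a CSV file without assuming column names."""
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
        # fieldnames is None for an empty file, so fall back to an empty list
        header = list(reader.fieldnames or [])
    return header, rows


if __name__ == "__main__":
    csv_path = Path("pdbe.csv")  # file written by `protein-quest search pdbe`
    if csv_path.exists():
        header, rows = preview_csv(csv_path)
        print("columns:", header)
        for row in rows:
            print(row)
```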
Search for Alphafold structures of uniprot accessions
protein-quest search alphafold uniprot_accs.txt alphafold.csv
To retrieve PDB structure files
protein-quest retrieve pdbe pdbe.csv downloads-pdbe/
To retrieve AlphaFold structure files
protein-quest retrieve alphafold alphafold.csv downloads-af/
For each entry, the summary.json and cif file are downloaded.
To filter AlphaFold structures on confidence
Filter AlphaFoldDB structures based on confidence (pLDDT). Keeps entries where the number of residues with a confidence score above the threshold falls within the requested range. Also writes pdb files containing only those residues.
protein-quest filter confidence \
--confidence-threshold 50 \
--min-residues 100 \
--max-residues 1000 \
./downloads-af ./filtered
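The selection rule described above can be sketched in plain Python. This is an illustration of the logic only, not the protein-quest implementation: the names `ConfidenceQuery`, `confident_residues` and `passes_confidence_filter` are hypothetical, the per-residue pLDDT scores are passed in as a plain dictionary rather than read from a structure file (AlphaFold models store pLDDT in the B-factor column), and treating the residue bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceQuery:
    """Hypothetical container mirroring the CLI options shown above."""

    confidence_threshold: float
    min_residues: int
    max_residues: int


def confident_residues(plddt_by_residue, threshold):
    """Return the sorted residue numbers whose pLDDT is above the threshold."""
    return sorted(
        residue for residue, score in plddt_by_residue.items() if score > threshold
    )


def passes_confidence_filter(plddt_by_residue, query):
    """Keep an entry if its count of confident residues is within the bounds.

    Bounds are treated as inclusive here, which is an assumption.
    """
    count = len(confident_residues(plddt_by_residue, query.confidence_threshold))
    return query.min_residues <= count <= query.max_residues


if __name__ == "__main__":
    # Toy scores keyed by residue number; real values come from the model file.
    scores = {1: 91.2, 2: 48.0, 3: 77.5, 4: 30.1}
    query = ConfidenceQuery(confidence_threshold=50, min_residues=2, max_residues=10)
    print(confident_residues(scores, query.confidence_threshold))
    print(passes_confidence_filter(scores, query))
```

Separating residue selection from the keep/discard decision mirrors the two outputs described above: the list of confident residues is what would be written to the reduced file, while the count decides whether the entry is kept at all.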
To filter PDBe files on chain of uniprot accession
Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.
protein-quest filter chain \
pdbe.csv \
./downloads-pdbe ./filtered-chains
To filter PDBe files on number of residues
protein-quest filter residue \
--min-residues 100 \
--max-residues 1000 \
./filtered-chains ./filtered
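The PDBe branch of the workflow (search, retrieve, filter on chain, filter on residue count) can be chained from a script by calling the command line tool. The sketch below only assembles the commands shown in this section and prints them by default; it assumes `protein-quest` is on the PATH when actually run. The function names `build_pdbe_pipeline` and `run_pipeline` are hypothetical helpers, not part of the package.

```python
import shlex
import subprocess


def build_pdbe_pipeline(taxon_id="9606", limit=100):
    """Return the PDBe-branch commands from this README as argument lists."""
    pq = "protein-quest"
    return [
        [pq, "search", "uniprot",
         "--taxon-id", str(taxon_id),
         "--reviewed",
         "--subcellular-location-uniprot", "nucleus",
         "--subcellular-location-go", "GO:0005634",
         "--molecular-function-go", "GO:0003677",
         "--limit", str(limit),
         "uniprot_accs.txt"],
        [pq, "search", "pdbe", "uniprot_accs.txt", "pdbe.csv"],
        [pq, "retrieve", "pdbe", "pdbe.csv", "downloads-pdbe/"],
        [pq, "filter", "chain", "pdbe.csv", "./downloads-pdbe", "./filtered-chains"],
        [pq, "filter", "residue",
         "--min-residues", "100",
         "--max-residues", "1000",
         "./filtered-chains", "./filtered"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(build_pdbe_pipeline(), dry_run=True)
```

Passing `dry_run=False` runs the steps in order; because each step reads the files written by the previous one, stopping on the first failure avoids filtering an incomplete download directory.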
Search Gene Ontology (GO)
When using protein-quest search uniprot, you might not know the identifier of a Gene Ontology (GO) term.
You can use the following command to search for a GO term.
protein-quest search go --limit 5 --aspect cellular_component apoptosome -
Model Context Protocol (MCP) server
protein-quest can also help LLMs, such as Claude Sonnet 4, by providing a set of tools for working with protein structures.
To run the MCP server, you have to install the mcp extra with:
pip install protein-quest[mcp]
# or in development
uv sync --all-extras --all-groups
The server can be started with:
protein-quest mcp
The MCP server contains a prompt template to search/retrieve/filter candidate structures.
Contributing
For development information and contribution guidelines, please see CONTRIBUTING.md.