protein-quest
Python package to search/retrieve/filter proteins and protein structures.
It uses
- UniProt SPARQL endpoint to search for proteins and their measured or predicted 3D structures.
- UniProt taxonomy to search for taxa.
- QuickGO to search for Gene Ontology terms.
- gemmi to work with macromolecular models.
- dask-distributed to compute in parallel.
The package is used by
An example workflow:
graph TB;
taxonomy[/Search taxon/] -. taxon_ids .-> searchuniprot[/Search UniprotKB/]
goterm[/Search GO term/] -. go_ids .-> searchuniprot[/Search UniprotKB/]
searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
searchuniprot -. uniprot_accessions .-> searchuniprotdetails[/Search UniProt details/]
searchintactionpartners[/Search interaction partners/] -.-x |uniprot_accessions|searchuniprot
searchcomplexes[/Search complexes/]
searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
fetchpdbe -->|mmcif_files| chainfilter{{Filter on chain of uniprot}}
chainfilter --> |mmcif_files| residuefilter{{Filter on chain length}}
fetchad -->|mmcif_files| confidencefilter{{Filter out low confidence}}
confidencefilter --> |mmcif_files| ssfilter{{Filter on secondary structure}}
residuefilter --> |mmcif_files| ssfilter
ssfilter -. mmcif_files .-> convert2cif([Convert to cif])
ssfilter -. mmcif_files .-> convert2uniprot_accessions([Convert to UniProt accessions])
classDef dashedBorder stroke-dasharray: 5 5;
goterm:::dashedBorder
taxonomy:::dashedBorder
searchemdb:::dashedBorder
fetchemdb:::dashedBorder
searchintactionpartners:::dashedBorder
searchcomplexes:::dashedBorder
searchuniprotdetails:::dashedBorder
convert2cif:::dashedBorder
convert2uniprot_accessions:::dashedBorder
(Dotted nodes and edges are side-quests.)
Install
pip install protein-quest
Or to use the latest development version:
pip install git+https://github.com/haddocking/protein-quest.git
Usage
The main entry point is the protein-quest command line tool which has multiple subcommands to perform actions.
To use it programmatically, see the Jupyter notebooks and API documentation.
While downloading or copying files it uses a global cache (located at ~/.cache/protein-quest) and hardlinks to save disk space and improve speed.
This behavior can be customized with the --no-cache, --cache-dir, and --copy-method command line arguments.
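The cache-plus-hardlink idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not protein-quest's implementation: the helper name `place_from_cache` is hypothetical, and falling back to a plain copy when linking fails is an assumption.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_from_cache(cached: Path, target: Path) -> str:
    """Put `cached` at `target`, preferring a hardlink over a copy.

    Hypothetical helper for illustration; not part of protein-quest.
    Returns "hardlink" or "copy" depending on which method succeeded.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        target.unlink()
    try:
        # A hardlink is a second name for the same data, so no extra disk space is used.
        os.link(cached, target)
        return "hardlink"
    except OSError:
        # Linking fails, for example, when cache and target are on different filesystems.
        shutil.copy2(cached, target)
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "example.cif"
    cache_file.parent.mkdir()
    cache_file.write_text("data_example\n")
    out_file = Path(tmp) / "downloads" / "example.cif"
    method = place_from_cache(cache_file, out_file)
    print(method, out_file.read_text() == cache_file.read_text())
```

Hardlinks only work within one filesystem, which is one reason a tool might offer a choice of copy method.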
Search UniProt accessions
protein-quest search uniprot \
--taxon-id 9606 \
--reviewed \
--subcellular-location-uniprot "nucleus" \
--subcellular-location-go GO:0005634 \
--molecular-function-go GO:0003677 \
--limit 100 \
uniprot_accs.txt
Search for PDBe structures of UniProt accessions
protein-quest search pdbe uniprot_accs.txt pdbe.csv
This writes a pdbe.csv file containing the PDB id and chain for each UniProt accession.
Search for AlphaFold structures of UniProt accessions
protein-quest search alphafold uniprot_accs.txt alphafold.csv
Search for EMDB structures of UniProt accessions
protein-quest search emdb uniprot_accs.txt emdbs.csv
To retrieve PDB structure files
protein-quest retrieve pdbe pdbe.csv downloads-pdbe/
To retrieve AlphaFold structure files
protein-quest retrieve alphafold alphafold.csv downloads-af/
Downloads the cif file for each entry.
To retrieve EMDB volume files
protein-quest retrieve emdb emdbs.csv downloads-emdb/
To filter AlphaFold structures on confidence
Filter AlphaFoldDB structures based on confidence (pLDDT). Keeps entries in which the number of residues with a confidence score above the threshold falls within the requested range. Also writes pdb files containing only those residues.
protein-quest filter confidence \
--confidence-threshold 50 \
--min-residues 100 \
--max-residues 1000 \
./downloads-af ./filtered
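The keep-or-drop rule described above can be sketched in plain Python. This is a minimal sketch and not the package's code: it assumes the per-residue pLDDT scores have already been extracted into a list, and that the threshold comparison is strict while the residue bounds are inclusive. The function name is hypothetical.

```python
def passes_confidence_filter(
    plddts: list[float],
    threshold: float = 50.0,
    min_residues: int = 100,
    max_residues: int = 1000,
) -> bool:
    """Return True if the count of confident residues is within the requested range.

    Hypothetical helper: `plddts` holds one pLDDT score per residue.
    Strict `>` and inclusive bounds are assumptions.
    """
    confident = sum(1 for score in plddts if score > threshold)
    return min_residues <= confident <= max_residues


# 150 confident residues plus 40 low-confidence ones: kept.
print(passes_confidence_filter([90.0] * 150 + [30.0] * 40))
# Only 50 confident residues: dropped.
print(passes_confidence_filter([90.0] * 50 + [30.0] * 200))
```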
To filter PDBe files on chain of uniprot accession
Makes PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.
protein-quest filter chain \
pdbe.csv \
./downloads-pdbe ./filtered-chains
To filter PDBe files on number of residues
protein-quest filter residue \
--min-residues 100 \
--max-residues 1000 \
./filtered-chains ./filtered
To filter on secondary structure
Filter for structures that are mostly alpha helices and have no beta sheets. See the following notebook to determine the ratio of secondary structure elements.
protein-quest filter secondary-structure \
--ratio-min-helix-residues 0.5 \
--ratio-max-sheet-residues 0.0 \
--write-stats filtered-ss/stats.csv \
./filtered-chains ./filtered-ss
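The ratio thresholds in the command above can be read as a simple check per structure. The sketch below assumes that each ratio is the number of helix (or sheet) residues divided by the total number of residues, and that the residue counts are already known; both the function name and this definition of the ratio are assumptions, not taken from the package.

```python
def passes_secondary_structure_filter(
    nr_helix_residues: int,
    nr_sheet_residues: int,
    nr_residues: int,
    ratio_min_helix: float = 0.5,
    ratio_max_sheet: float = 0.0,
) -> bool:
    """Return True for structures that are helix-rich and sheet-poor enough.

    Hypothetical helper; ratios are assumed to be relative to all residues.
    """
    if nr_residues <= 0:
        return False  # nothing to judge
    helix_ratio = nr_helix_residues / nr_residues
    sheet_ratio = nr_sheet_residues / nr_residues
    return helix_ratio >= ratio_min_helix and sheet_ratio <= ratio_max_sheet


# 60% helix, no sheet: kept with the thresholds from the command above.
print(passes_secondary_structure_filter(60, 0, 100))
# 60% helix but 5% sheet: dropped because no sheet residues are allowed.
print(passes_secondary_structure_filter(60, 5, 100))
```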
Search Taxonomy
protein-quest search taxonomy "Homo sapiens" -
Search Gene Ontology (GO)
You might not know the identifier of the Gene Ontology (GO) term that you want to pass to protein-quest search uniprot.
Use the following command to search for a GO term.
protein-quest search go --limit 5 --aspect cellular_component apoptosome -
Search for interaction partners
Use https://www.ebi.ac.uk/complexportal to find interaction partners of a given UniProt accession.
protein-quest search interaction-partners Q05471 interaction-partners-of-Q05471.txt
The interaction-partners-of-Q05471.txt file contains uniprot accessions (one per line).
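Because the file holds one accession per line, it is easy to load in Python for further processing. The sketch below uses a hypothetical helper name and writes a small stand-in file first so that it runs on its own; the accessions in it are illustrative content, not the actual output of the command.

```python
import tempfile
from pathlib import Path


def read_accessions(path: Path) -> list[str]:
    """Read UniProt accessions from a text file with one accession per line.

    Hypothetical helper; blank lines and surrounding whitespace are ignored.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for interaction-partners-of-Q05471.txt with illustrative content.
    partners_file = Path(tmp) / "interaction-partners.txt"
    partners_file.write_text("P31376\nP35817\n\nP38326\n")
    accessions = read_accessions(partners_file)
    print(len(accessions), accessions)
```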
Search for complexes
Given UniProt accessions, search for macromolecular complexes at https://www.ebi.ac.uk/complexportal and return the complex entries and their members.
echo Q05471 | protein-quest search complexes - complexes.csv
The complexes.csv looks like:
query_protein,complex_id,complex_url,complex_title,members
Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,Swr1 chromatin remodelling complex,P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509
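A file in this shape can be read with Python's csv module. The sketch below embeds the example row from above so that it is self-contained; the helper name is hypothetical, and splitting the members column on ";" follows the example row rather than a documented format guarantee.

```python
import csv
import io

# The example content shown above, embedded so the sketch runs on its own.
SAMPLE_COMPLEXES = (
    "query_protein,complex_id,complex_url,complex_title,members\n"
    "Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,"
    "Swr1 chromatin remodelling complex,"
    "P31376;P35817;P38326;P53201;P53930;P60010;P80428;"
    "Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509\n"
)


def read_complexes(handle) -> list[dict]:
    """Parse complexes CSV rows and turn the members column into a list.

    Hypothetical helper; assumes members are separated by semicolons.
    """
    rows = []
    for row in csv.DictReader(handle):
        row["members"] = [m for m in row["members"].split(";") if m]
        rows.append(row)
    return rows


complexes = read_complexes(io.StringIO(SAMPLE_COMPLEXES))
for entry in complexes:
    print(entry["complex_id"], entry["complex_title"], len(entry["members"]))
```

For a real file, pass an open file handle (opened with newline="") instead of the io.StringIO wrapper.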
Search for UniProt details
Get details (like protein name, sequence length, organism) for a list of UniProt accessions.
protein-quest search uniprot-details uniprot_accs.txt uniprot_details.csv
The uniprot_details.csv looks like:
uniprot_accession,uniprot_id,sequence_length,reviewed,protein_name,taxon_id,taxon_name
A0A087WUV0,ZN892_HUMAN,522,True,Zinc finger protein 892,9606,Homo sapiens
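The details file can be loaded and narrowed down in Python, for instance to reviewed entries within a length range. This sketch embeds the example row from above; the helper names are hypothetical, and interpreting the reviewed column as the literal strings "True" and "False" is an assumption based on that single row.

```python
import csv
import io

# The example content shown above, embedded so the sketch runs on its own.
SAMPLE_DETAILS = (
    "uniprot_accession,uniprot_id,sequence_length,reviewed,protein_name,taxon_id,taxon_name\n"
    "A0A087WUV0,ZN892_HUMAN,522,True,Zinc finger protein 892,9606,Homo sapiens\n"
)


def read_details(handle) -> list[dict]:
    """Parse the details CSV and convert numeric and boolean columns.

    Hypothetical helper; assumes `reviewed` is written as "True" or "False".
    """
    rows = []
    for row in csv.DictReader(handle):
        row["sequence_length"] = int(row["sequence_length"])
        row["taxon_id"] = int(row["taxon_id"])
        row["reviewed"] = row["reviewed"] == "True"
        rows.append(row)
    return rows


def select_reviewed(rows: list[dict], min_length: int, max_length: int) -> list[str]:
    """Return accessions of reviewed entries whose length is within the bounds."""
    return [
        row["uniprot_accession"]
        for row in rows
        if row["reviewed"] and min_length <= row["sequence_length"] <= max_length
    ]


details = read_details(io.StringIO(SAMPLE_DETAILS))
print(select_reviewed(details, min_length=100, max_length=1000))
```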
Convert structure files to .cif format
Some tools (for example powerfit) only work with .cif files and not *.cif.gz or *.bcif files.
protein-quest convert structures --format cif --output-dir ./filtered-cif ./filtered-ss
Convert structure files to UniProt accessions
After running some filters you might want to know which UniProt accessions are still present in the filtered structures.
protein-quest convert uniprot ./filtered-ss uniprot_accs.filtered.txt
Model Context Protocol (MCP) server
Protein quest can also help LLMs like Claude Sonnet 4 by providing a set of tools for protein structures.

To run the MCP server, install the mcp extra with:
pip install "protein-quest[mcp]"
The server can be started with:
protein-quest mcp
The MCP server contains a prompt template to search/retrieve/filter candidate structures.
Shell autocompletion
The protein-quest command line tool supports shell autocompletion using shtab.
Initialize for bash shell with:
mkdir -p ~/.local/share/bash-completion/completions
protein-quest --print-completion bash > ~/.local/share/bash-completion/completions/protein-quest
Initialize for zsh shell with:
mkdir -p ~/.local/share/zsh/site-functions
protein-quest --print-completion zsh > ~/.local/share/zsh/site-functions/_protein-quest
fpath=("$HOME/.local/share/zsh/site-functions" $fpath)
autoload -Uz compinit && compinit
Contributing
For development information and contribution guidelines, please see CONTRIBUTING.md.