protein-quest

Python package to search/retrieve/filter proteins and protein structures.

An example workflow:

graph TB;
    taxonomy[/Search taxon/] -. taxon_ids .-> searchuniprot[/Search UniprotKB/]
    goterm[/Search GO term/] -. go_ids .-> searchuniprot[/Search UniprotKB/]
    searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
    searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
    searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
    searchintactionpartners[/Search interaction partners/] -.-x |uniprot_accessions|searchuniprot
    searchcomplexes[/Search complexes/]
    searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
    searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
    searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
    fetchpdbe -->|mmcif_files| chainfilter{{Filter on chain of uniprot}}
    chainfilter --> |mmcif_files| residuefilter{{Filter on chain length}}
    fetchad -->|mmcif_files| confidencefilter{{Filter out low confidence}}
    confidencefilter --> |mmcif_files| ssfilter{{Filter on secondary structure}}
    residuefilter --> |mmcif_files| ssfilter
    ssfilter -. mmcif_files .-> convert2cif([Convert to cif])
    classDef dashedBorder stroke-dasharray: 5 5;
    goterm:::dashedBorder
    taxonomy:::dashedBorder
    searchemdb:::dashedBorder
    fetchemdb:::dashedBorder
    searchintactionpartners:::dashedBorder
    searchcomplexes:::dashedBorder
    convert2cif:::dashedBorder

(Dotted nodes and edges are side-quests.)

Install

pip install protein-quest

Or to use the latest development version:

pip install git+https://github.com/haddocking/protein-quest.git

Usage

The main entry point is the protein-quest command line tool which has multiple subcommands to perform actions.

To use it programmatically, see the Jupyter notebooks and API documentation.

While downloading or copying files it uses a global cache (located at ~/.cache/protein-quest) and hardlinks to save disk space and improve speed. This behavior can be customized with the --no-cache, --cache-dir, and --copy-method command line arguments.
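
The idea behind a hardlinking cache can be illustrated with a minimal Python sketch. This is not the package's implementation; the function name `place_from_cache` and the file names are hypothetical, and the fallback to copying is an assumption about what a sensible tool would do when hardlinks are unavailable.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_from_cache(cached: Path, target: Path) -> str:
    """Expose a cached file at `target`, preferring a hardlink over a copy.

    Returns "hardlink" or "copy" to show which method was used.
    (Hypothetical helper, not part of protein-quest.)
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        target.unlink()
    try:
        # A hardlink is a second name for the same data: no extra disk space.
        os.link(cached, target)
        return "hardlink"
    except OSError:
        # Hardlinks fail across filesystems or where unsupported: copy instead.
        shutil.copy2(cached, target)
        return "copy"


# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "example.cif"
    cache_file.parent.mkdir()
    cache_file.write_text("data_example\n")
    out_file = Path(tmp) / "downloads" / "example.cif"
    method = place_from_cache(cache_file, out_file)
    same_content = out_file.read_text() == cache_file.read_text()
    print(method, same_content)
```

Because a hardlinked file shares its data with the cached copy, editing one in place would change the other, which is one reason a tool might offer a plain copy as an alternative.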

Search Uniprot accessions

protein-quest search uniprot \
    --taxon-id 9606 \
    --reviewed \
    --subcellular-location-uniprot nucleus \
    --subcellular-location-go GO:0005634 \
    --molecular-function-go GO:0003677 \
    --limit 100 \
    uniprot_accs.txt

(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")

Search for PDBe structures of uniprot accessions

protein-quest search pdbe uniprot_accs.txt pdbe.csv

The pdbe.csv file is written, containing the PDB id and chain of each UniProt accession.

Search for Alphafold structures of uniprot accessions

protein-quest search alphafold uniprot_accs.txt alphafold.csv

Search for EMDB structures of uniprot accessions

protein-quest search emdb uniprot_accs.txt emdbs.csv

To retrieve PDB structure files

protein-quest retrieve pdbe pdbe.csv downloads-pdbe/

To retrieve AlphaFold structure files

protein-quest retrieve alphafold alphafold.csv downloads-af/

For each entry, this downloads the summary.json and cif file.

To retrieve EMDB volume files

protein-quest retrieve emdb emdbs.csv downloads-emdb/

To filter AlphaFold structures on confidence

Filter AlphaFoldDB structures based on confidence (pLDDT). Keeps an entry when the number of residues with a confidence score above the threshold falls within the requested range. Also writes pdb files with only those residues.

protein-quest filter confidence \
    --confidence-threshold 50 \
    --min-residues 100 \
    --max-residues 1000 \
    ./downloads-af ./filtered
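
The selection rule described above can be sketched in plain Python. This is an illustration only, not the package's code: the function name `keep_by_confidence` is hypothetical, the per-residue pLDDT scores are passed in as a list rather than read from a structure file, and treating "above" as strictly greater than the threshold is an assumption.

```python
def keep_by_confidence(plddt, threshold, min_residues, max_residues):
    """Decide whether a structure passes a pLDDT-based filter.

    plddt: per-residue confidence scores (0-100), one per residue.
    Returns (keep, confident_indices), where confident_indices are the
    positions of residues scoring strictly above the threshold (assumption).
    """
    confident = [i for i, score in enumerate(plddt) if score > threshold]
    keep = min_residues <= len(confident) <= max_residues
    return keep, confident


# Hypothetical scores for a five-residue toy example.
scores = [90.0, 40.0, 75.0, 50.0, 30.0]
print(keep_by_confidence(scores, threshold=50, min_residues=2, max_residues=1000))
print(keep_by_confidence(scores, threshold=50, min_residues=3, max_residues=1000))
```

The returned indices correspond to the residues that would be written to the reduced output file.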

To filter PDBe files on chain of uniprot accession

Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.

protein-quest filter chain \
    pdbe.csv \
    ./downloads-pdbe ./filtered-chains

To filter PDBe files on number of residues

protein-quest filter residue \
    --min-residues 100 \
    --max-residues 1000 \
    ./filtered-chains ./filtered

To filter on secondary structure

Keep structures that are mostly alpha helices and have no beta sheets.

protein-quest filter secondary-structure \
    --ratio-min-helix-residues 0.5 \
    --ratio-max-sheet-residues 0.0 \
    --write-stats filtered-ss/stats.csv \
    ./filtered-chains ./filtered-ss
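
As a rough illustration of what ratio-based thresholds like these mean, the sketch below applies them to a per-residue secondary-structure string. It is not how protein-quest computes the ratios: the function name is hypothetical, the input is assumed to be a DSSP-style string ("H" for helix, "E" for strand), and inclusive comparisons are an assumption.

```python
def passes_secondary_structure(ss, ratio_min_helix=None, ratio_max_sheet=None):
    """Check helix and sheet ratios of a per-residue secondary-structure string.

    ss: one character per residue, DSSP-style ("H" helix, "E" strand),
        an assumed input format for this sketch.
    A threshold set to None is not checked.
    """
    total = len(ss)
    if total == 0:
        return False
    helix_ratio = ss.count("H") / total
    sheet_ratio = ss.count("E") / total
    if ratio_min_helix is not None and helix_ratio < ratio_min_helix:
        return False
    if ratio_max_sheet is not None and sheet_ratio > ratio_max_sheet:
        return False
    return True


# Toy strings: 60% helix and no strand passes; any strand residue fails.
print(passes_secondary_structure("HHHHHHCCCC", 0.5, 0.0))
print(passes_secondary_structure("HHHHHHCCEE", 0.5, 0.0))
```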

Search Taxonomy

protein-quest search taxonomy "Homo sapiens" -

Search Gene Ontology (GO)

When using protein-quest search uniprot, you might not know the identifier of a Gene Ontology (GO) term. Use the following command to search for one.

protein-quest search go --limit 5 --aspect cellular_component apoptosome -

Search for interaction partners

Use https://www.ebi.ac.uk/complexportal to find interaction partners of a given UniProt accession.

protein-quest search interaction-partners Q05471 interaction-partners-of-Q05471.txt

The interaction-partners-of-Q05471.txt file contains uniprot accessions (one per line).
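
Since the file holds one accession per line, it is easy to combine with other accession lists before further searches. The following is a minimal sketch; the helper names and demo file contents are hypothetical, and it assumes that other accession files such as uniprot_accs.txt use the same one-per-line layout.

```python
import tempfile
from pathlib import Path


def read_accessions(path):
    """Read accessions (one per line), skipping blanks and duplicates, keeping order."""
    seen = {}
    for line in Path(path).read_text().splitlines():
        accession = line.strip()
        if accession:
            seen.setdefault(accession, None)
    return list(seen)


def merge_accession_files(paths, output):
    """Merge several accession files into one de-duplicated file."""
    merged = {}
    for path in paths:
        for accession in read_accessions(path):
            merged.setdefault(accession, None)
    Path(output).write_text("".join(f"{a}\n" for a in merged))
    return list(merged)


# Demo with throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "uniprot_accs.txt"
    first.write_text("Q05471\nP35817\n")
    second = Path(tmp) / "interaction-partners-of-Q05471.txt"
    second.write_text("P31376\nP35817\n\nP31376\n")
    merged = merge_accession_files([first, second], Path(tmp) / "all_accs.txt")
    written = (Path(tmp) / "all_accs.txt").read_text()
    print(merged)
```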

Search for complexes

Given UniProt accessions, search for macromolecular complexes at https://www.ebi.ac.uk/complexportal and return the complex entries and their members.

echo Q05471 | protein-quest search complexes - complexes.csv

The complexes.csv file looks like:

query_protein,complex_id,complex_url,complex_title,members
Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,Swr1 chromatin remodelling complex,P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509
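
Because the members column packs several accessions into one field separated by semicolons, a little post-processing helps when using the file downstream. The sketch below parses the sample row shown above with Python's standard csv module; the function name `parse_complexes` is hypothetical, and the column names are taken from the header above.

```python
import csv
import io

# The sample output shown above, embedded so the example is self-contained.
SAMPLE = (
    "query_protein,complex_id,complex_url,complex_title,members\n"
    "Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,"
    "Swr1 chromatin remodelling complex,"
    "P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;"
    "Q03940;Q05471;Q06707;Q12464;Q12509\n"
)


def parse_complexes(handle):
    """Parse complexes CSV rows, splitting the members column into a list."""
    rows = []
    for row in csv.DictReader(handle):
        row["members"] = row["members"].split(";") if row["members"] else []
        rows.append(row)
    return rows


complexes = parse_complexes(io.StringIO(SAMPLE))
for entry in complexes:
    # Members other than the query protein itself.
    partners = [m for m in entry["members"] if m != entry["query_protein"]]
    print(entry["complex_id"], len(entry["members"]), len(partners))
```

To read a real file instead, pass `open("complexes.csv", newline="")` to `parse_complexes`.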

Convert structure files to .cif format

Some tools (for example powerfit) only work with .cif files and not *.cif.gz or *.bcif files.

protein-quest convert --output-dir ./filtered-cif ./filtered-ss
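
For the gzip case alone, the conversion amounts to decompressing the file, as the following sketch shows using only the Python standard library. It does not reflect how the convert subcommand is implemented, the helper name `gunzip_cif` is hypothetical, and it does not handle *.bcif files, which need a dedicated parser.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_cif(src: Path, output_dir: Path) -> Path:
    """Decompress a .cif.gz file into output_dir and return the new .cif path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # "model.cif.gz" -> "model.cif"
    dest = output_dir / src.with_suffix("").name
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Demo with a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    compressed = Path(tmp) / "in" / "model.cif.gz"
    compressed.parent.mkdir()
    with gzip.open(compressed, "wb") as handle:
        handle.write(b"data_model\n")
    result = gunzip_cif(compressed, Path(tmp) / "out")
    result_name = result.name
    result_bytes = result.read_bytes()
    print(result_name)
```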

Model Context Protocol (MCP) server

Protein quest can also help LLMs like Claude Sonnet 4 by providing a set of tools for protein structures.

Protein Quest MCP workflow

To run the MCP server, install the mcp extra with:

pip install "protein-quest[mcp]"

The server can be started with:

protein-quest mcp

The MCP server contains a prompt template to search/retrieve/filter candidate structures.

Contributing

For development information and contribution guidelines, please see CONTRIBUTING.md.