retrieve

Retrieve subcommands for protein-quest.

alphafold(alphafold_csv, output_dir, /, *, format_=None, db_version=None, gzip_files=False, all_isoforms=False, max_parallel_downloads=5, cache=None, _=None)

Retrieve AlphaFold files for IDs in CSV.

Retrieve AlphaFold files from the AlphaFold Protein Structure Database.

Parameters:

Name Type Description Default
alphafold_csv InputFile

CSV file with an af_id column, or with model_provider and model_identifier columns. When using model_provider, only rows with model_provider == 'alphafold' are used. Single-column CSV files are also accepted, and the first row is treated as an ID. Use - for stdin.

required
output_dir OutputDir

Directory to store downloaded AlphaFold files.

required
format_ Annotated[set[DownloadableFormat], Parameter(name=--format, negative='')] | None

Formats to retrieve. Defaults to [cif]. Repeat parameter for multiple formats, for example --format cif --format pdb.

None
db_version str | None

AlphaFold database version.

None
gzip_files Annotated[bool, Parameter(negative='')]

Gzip downloaded files.

False
all_isoforms Annotated[bool, Parameter(negative='')]

Retrieve files for all isoforms.

False
max_parallel_downloads BatchSize

Maximum number of parallel downloads.

5
cache CacheParameter | None

Cache options including no_cache, cache_dir, and copy_method.

None
_ Common | None

Common CLI options.

None
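
A minimal sketch of preparing input for this subcommand and assembling its command line from Python. Only the af_id column, the - stdin convention, and the repeatable --format option come from the table above. The command spelling protein-quest retrieve alphafold, the positional argument order, the helper names af_csv_text and alphafold_argv, and the placeholder IDs are assumptions for illustration.

```python
import csv
import io
import shlex


def af_csv_text(ids):
    """Return CSV text with the single `af_id` column described above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["af_id"])
    writer.writerows([i] for i in ids)
    return buffer.getvalue()


def alphafold_argv(csv_path, output_dir, formats=()):
    """Build an argv list, repeating --format once per requested format."""
    # Assumed command spelling; confirm it against the tool's own help output.
    argv = ["protein-quest", "retrieve", "alphafold", csv_path, output_dir]
    for fmt in formats:
        argv += ["--format", fmt]
    return argv


csv_text = af_csv_text(["ID_ONE", "ID_TWO"])  # placeholder IDs, not real entries
argv = alphafold_argv("-", "downloads", ["cif", "pdb"])  # "-" means read CSV from stdin
print(shlex.join(argv))
```

The argv list could be handed to subprocess.run together with input=csv_text and text=True, which would feed the CSV through stdin instead of a file on disk.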

emdb(emdb_csv, output_dir, /, *, cache=None, _=None)

Retrieve EMDB volume files for EMDB IDs in CSV.

Retrieve volume files from the Electron Microscopy Data Bank (EMDB) website for the unique EMDB IDs listed in a CSV file.

Parameters:

Name Type Description Default
emdb_csv InputFile

CSV file with emdb_id column. Other columns are ignored. Single-column CSV files are also accepted, and the first row is treated as an ID. Use - for stdin.

required
output_dir OutputDir

Directory to store downloaded EMDB volume files.

required
cache CacheParameter | None

Cache options including no_cache, cache_dir, and copy_method.

None
_ Common | None

Common CLI options.

None
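
The input rules above (an emdb_id column, other columns ignored, single-column files accepted, duplicates collapsed) can be sketched in a few lines. This is an illustration of those rules only, not the parser protein-quest uses. The helper name unique_ids is hypothetical, the IDs are placeholders, and preserving first-seen order is an assumption.

```python
import csv
import io


def unique_ids(csv_text, column="emdb_id"):
    """Collect unique IDs from CSV text, keeping first-seen order."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return []
    header = rows[0]
    if column in header:
        # Named column present: read only that column, ignore the rest.
        index = header.index(column)
        values = [row[index] for row in rows[1:] if len(row) > index]
    elif len(header) == 1:
        # Single-column file without the named header: the first row is an ID too.
        values = [row[0] for row in rows]
    else:
        raise ValueError(f"no {column!r} column found")
    # dict preserves insertion order, so this de-duplicates without reordering.
    return list(dict.fromkeys(v.strip() for v in values if v.strip()))


with_header = "emdb_id,title\nEMD-0001,first\nEMD-0002,second\nEMD-0001,again\n"
single_column = "EMD-0001\nEMD-0002\n"
print(unique_ids(with_header))
print(unique_ids(single_column))
```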

pdbe(pdbe_csv, output_dir, /, *, max_parallel_downloads=5, cache=None, _=None)

Retrieve mmCIF files from PDBe for PDB IDs in CSV.

Retrieve mmCIF files from the Protein Data Bank in Europe (PDBe) website for the unique PDB IDs listed in a CSV file.

Parameters:

Name Type Description Default
pdbe_csv InputFile

CSV file with a pdb_id column, or with model_provider and model_identifier columns. When using model_provider, only rows with model_provider == 'pdbe' are used. Single-column CSV files are also accepted, and the first row is treated as an ID. Use - for stdin.

required
output_dir OutputDir

Directory to store downloaded PDBe mmCIF files.

required
max_parallel_downloads BatchSize

Maximum number of parallel downloads.

5
cache CacheParameter | None

Cache options including no_cache, cache_dir, and copy_method.

None
_ Common | None

Common CLI options.

None
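
The two accepted column layouts for pdbe_csv can be illustrated with a short filter. This is a sketch of the documented selection rule (keep only rows where model_provider == 'pdbe'), not the library's own code. The helper name pdbe_ids and the sample identifiers are hypothetical.

```python
import csv
import io


def pdbe_ids(csv_text):
    """Pick PDB IDs using either of the two column layouts described above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    if "pdb_id" in fields:
        return [row["pdb_id"] for row in reader]
    if {"model_provider", "model_identifier"} <= set(fields):
        # Only rows whose provider is exactly 'pdbe' are kept.
        return [
            row["model_identifier"]
            for row in reader
            if row["model_provider"] == "pdbe"
        ]
    raise ValueError("expected pdb_id or model_provider/model_identifier columns")


mixed = (
    "model_provider,model_identifier\n"
    "pdbe,1abc\n"
    "alphafold,ID_ONE\n"
    "pdbe,2xyz\n"
)
print(pdbe_ids(mixed))
```

The same shape applies to the alphafold subcommand, with af_id in place of pdb_id and 'alphafold' as the provider value.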

structure(structures_csv, output_dir, /, *, raw=False, max_parallel_downloads=5, cache=None, _=None)

Retrieve structure files from search structure CSV output.

Retrieve structure files from model URLs listed in search structure CSV output.

Parameters:

Name Type Description Default
structures_csv InputFile

CSV file with provider, model_identifier, model_url, and model_format columns. Use - for stdin.

required
output_dir OutputDir

Directory to store retrieved structure files.

required
raw Annotated[bool, Parameter(negative='')]

Download each model in its native format, as listed in the CSV.

False
max_parallel_downloads BatchSize

Maximum number of parallel downloads.

5
cache CacheParameter | None

Cache options including no_cache, cache_dir, and copy_method.

None
_ Common | None

Common CLI options.

None
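
The max_parallel_downloads option caps how many downloads run at the same time. One common way to get that behaviour in Python is an asyncio semaphore, sketched below with a sleep standing in for the network transfer. Whether protein-quest implements the limit this way is not stated here; the function fetch_all and the example.org URLs are hypothetical.

```python
import asyncio


async def fetch_all(urls, max_parallel_downloads=5):
    """Run one simulated download per URL with a cap on concurrency."""
    semaphore = asyncio.Semaphore(max_parallel_downloads)
    running = 0  # downloads currently in flight
    peak = 0     # highest value `running` ever reached

    async def fetch(url):
        nonlocal running, peak
        async with semaphore:  # waits here once the cap is reached
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for the network transfer
            running -= 1
            return url

    # gather returns results in the same order as the input URLs.
    results = await asyncio.gather(*(fetch(u) for u in urls))
    return results, peak


urls = [f"https://example.org/model_{i}.cif" for i in range(12)]
results, peak = asyncio.run(fetch_all(urls, max_parallel_downloads=5))
print(len(results), peak)
```

All twelve tasks are created up front, but the semaphore ensures that no more than five are inside the download step at any moment, which is what the recorded peak checks.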