
fetch

Module for fetching AlphaFold data.

DownloadableFormat = Literal['summary', 'bcif', 'cif', 'pdb', 'paeDoc', 'amAnnotations', 'amAnnotationsHg19', 'amAnnotationsHg38', 'msa', 'plddtDoc'] module-attribute

Types of formats that can be downloaded from the AlphaFold web service.

UrlFileNamePair = tuple[URL, str] module-attribute

A tuple of a URL and a filename.

UrlFileNamePairsOfFormats = dict[DownloadableFormat, UrlFileNamePair] module-attribute

A mapping of DownloadableFormat to UrlFileNamePair.

downloadable_formats = set(get_args(DownloadableFormat)) module-attribute

Set of formats that can be downloaded from the AlphaFold web service.
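As a quick illustration, `downloadable_formats` can be used to validate requested formats before starting any download. This minimal sketch redefines the `DownloadableFormat` alias documented above so that it runs on its own. The helper `validate_formats` is hypothetical and not part of this module.

```python
from typing import Literal, get_args

# Redefined here so the snippet is self-contained; mirrors the alias documented above.
DownloadableFormat = Literal[
    "summary", "bcif", "cif", "pdb", "paeDoc",
    "amAnnotations", "amAnnotationsHg19", "amAnnotationsHg38",
    "msa", "plddtDoc",
]

# get_args() on a Literal returns its allowed values as a tuple.
downloadable_formats = set(get_args(DownloadableFormat))


def validate_formats(requested: set[str]) -> set[str]:
    """Hypothetical helper: raise ValueError if any requested format is unknown."""
    unknown = requested - downloadable_formats
    if unknown:
        raise ValueError(f"Unknown formats: {sorted(unknown)}")
    return requested


print(sorted(validate_formats({"cif", "paeDoc"})))  # ['cif', 'paeDoc']
```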

AlphaFoldEntry dataclass

AlphaFold entry with a summary object and, optionally, local files.

See https://alphafold.ebi.ac.uk/api-docs for more details on the summary data structure.

by_format(dl_format)

Get the file path for a specific format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dl_format` | `DownloadableFormat` | The format for which to get the file path. | required |

Returns:

| Type | Description |
|---|---|
| `Path \| None` | The file path corresponding to the download format, or `None` if the file is not set. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the format is not valid. |

format2attr(dl_format) classmethod

Get the attribute name for a specific download format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dl_format` | `DownloadableFormat` | The format for which to get the attribute name. | required |

Returns:

| Type | Description |
|---|---|
| `str` | The attribute name corresponding to the download format. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the format is not valid. |
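The relationship between `format2attr` and `by_format` can be sketched with a toy dataclass. The class `ToyEntry`, the reduced format subset `Fmt`, and the `<format>_file` attribute naming scheme are all assumptions made for illustration. The real `AlphaFoldEntry` may name its attributes differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Literal, get_args

# Hypothetical reduced subset of DownloadableFormat, for illustration only.
Fmt = Literal["cif", "pdb", "paeDoc"]


@dataclass
class ToyEntry:
    uniprot_accession: str
    cif_file: Path | None = None
    pdb_file: Path | None = None
    paeDoc_file: Path | None = None

    @classmethod
    def format2attr(cls, dl_format: Fmt) -> str:
        # Assumed naming scheme "<format>_file"; reject anything not in Fmt.
        if dl_format not in get_args(Fmt):
            raise ValueError(f"Invalid format: {dl_format!r}")
        return f"{dl_format}_file"

    def by_format(self, dl_format: Fmt) -> Path | None:
        # Returns None when the file for this format has not been set.
        return getattr(self, self.format2attr(dl_format))


entry = ToyEntry("P69905", cif_file=Path("AF-P69905-F1-model_v4.cif"))
print(entry.by_format("cif"))  # AF-P69905-F1-model_v4.cif
print(entry.by_format("pdb"))  # None
```

Routing `by_format` through `format2attr` keeps the format validation (and its `ValueError`) in a single place.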

nr_of_files()

Number of `_file` properties that are set.

Returns:

| Type | Description |
|---|---|
| `int` | The number of `_file` properties that are set. |

relative_to(session_dir)

Convert paths in an AlphaFoldEntry to be relative to the session directory.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session_dir` | `Path` | The session directory to which the paths should be made relative. | required |

Returns:

| Type | Description |
|---|---|
| `AlphaFoldEntry` | An `AlphaFoldEntry` instance with paths relative to the session directory. |
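The following minimal sketch shows what `nr_of_files` and `relative_to` do. It uses a hypothetical `ToyEntry` dataclass whose file attributes are assumed to end in `_file`. The first method counts the file attributes that are set. The second returns a copy whose set paths have been made relative with `Path.relative_to`. Whether the real method copies or mutates the entry is not stated here. This sketch returns a new instance.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace
from pathlib import Path


@dataclass
class ToyEntry:
    uniprot_accession: str
    cif_file: Path | None = None
    paeDoc_file: Path | None = None

    def _file_fields(self) -> list[str]:
        # Assumption: every file attribute name ends in "_file".
        return [f.name for f in fields(self) if f.name.endswith("_file")]

    def nr_of_files(self) -> int:
        # Count the file attributes that are not None.
        return sum(getattr(self, name) is not None for name in self._file_fields())

    def relative_to(self, session_dir: Path) -> ToyEntry:
        # Build a copy in which every set path is relative to session_dir.
        changes = {
            name: getattr(self, name).relative_to(session_dir)
            for name in self._file_fields()
            if getattr(self, name) is not None
        }
        return replace(self, **changes)


session = Path("/data/session1")
entry = ToyEntry("P69905", cif_file=session / "AF-P69905-F1-model_v4.cif")
print(entry.nr_of_files())                  # 1
print(entry.relative_to(session).cif_file)  # AF-P69905-F1-model_v4.cif
```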

fetch_alphafold_db_version() async

Fetch the current version of the AlphaFold database.

Returns:

| Type | Description |
|---|---|
| `str` | The current version of the AlphaFold database as a string, for example `"6"`. |

fetch_many(uniprot_accessions, save_dir, formats, db_version=None, max_parallel_downloads=5, cacher=None, gzip_files=False, all_isoforms=False)

Synchronously fetches summaries and/or files (such as CIF files) from the AlphaFold Protein Structure Database.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uniprot_accessions` | `Iterable[str]` | The UniProt accessions to fetch. | required |
| `save_dir` | `Path` | The directory to save the fetched files to. | required |
| `formats` | `set[DownloadableFormat]` | The formats to download. If `summary` is in the set, summaries are first fetched from the API endpoint and the other files are then downloaded from static file URLs. If `summary` is not in the set, all files are downloaded from static file URLs only. Excluding `summary` is much faster because it avoids slow API calls. | required |
| `db_version` | `str \| None` | The version of the AlphaFold database to use. If `None`, the latest version is used. | `None` |
| `max_parallel_downloads` | `int` | The maximum number of parallel downloads. | `5` |
| `cacher` | `Cacher \| None` | A cacher to use for caching the fetched files. | `None` |
| `gzip_files` | `bool` | Whether to gzip the downloaded files. Summaries are never gzipped. | `False` |
| `all_isoforms` | `bool` | Whether to return all isoforms of each UniProt entry. If `False`, only the canonical sequence of each UniProt entry is returned. | `False` |

Returns:

| Type | Description |
|---|---|
| `list[AlphaFoldEntry]` | A list of `AlphaFoldEntry` dataclasses, each holding the fetched summary and/or the downloaded files for the requested formats. |
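`fetch_many` is the synchronous counterpart of `fetch_many_async`. A common way to build such a wrapper is to drain the async generator inside `asyncio.run`. The sketch below shows that pattern with a hypothetical stand-in generator, `fake_fetch_many_async`, in place of real downloads. It is not necessarily how this module implements `fetch_many`.

```python
import asyncio
from collections.abc import AsyncGenerator, Iterable


async def fake_fetch_many_async(accessions: Iterable[str]) -> AsyncGenerator[str, None]:
    # Hypothetical stand-in: yields one placeholder "entry" per accession.
    for acc in accessions:
        await asyncio.sleep(0)  # pretend to do I/O
        yield f"entry:{acc}"


def fetch_many_sync(accessions: Iterable[str]) -> list[str]:
    async def collect() -> list[str]:
        # Drain the async generator into a list.
        return [entry async for entry in fake_fetch_many_async(accessions)]

    # asyncio.run starts an event loop, so this must not be called from async code.
    return asyncio.run(collect())


print(fetch_many_sync(["P69905", "Q5VSL9"]))  # ['entry:P69905', 'entry:Q5VSL9']
```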

fetch_many_async(uniprot_accessions, save_dir, formats, db_version=None, max_parallel_downloads=5, cacher=None, gzip_files=False, all_isoforms=False)

Asynchronously fetches summaries and/or files from the AlphaFold Protein Structure Database.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uniprot_accessions` | `Iterable[str]` | The UniProt accessions to fetch. | required |
| `save_dir` | `Path` | The directory to save the fetched files to. | required |
| `formats` | `set[DownloadableFormat]` | The formats to download. If `summary` is in the set, summaries are first fetched from the API endpoint and the other files are then downloaded from static file URLs. If `summary` is not in the set, all files are downloaded from static file URLs only. | required |
| `db_version` | `str \| None` | The version of the AlphaFold database to use. If `None`, the latest version is used. | `None` |
| `max_parallel_downloads` | `int` | The maximum number of parallel downloads. | `5` |
| `cacher` | `Cacher \| None` | A cacher to use for caching the fetched files. | `None` |
| `gzip_files` | `bool` | Whether to gzip the downloaded files. Summaries are never gzipped. | `False` |
| `all_isoforms` | `bool` | Whether to yield all isoforms of each UniProt entry. If `False`, only the canonical sequence of each UniProt entry is yielded. | `False` |

Yields:

| Type | Description |
|---|---|
| `AsyncGenerator[AlphaFoldEntry]` | An `AlphaFoldEntry` dataclass holding the fetched summary and/or the downloaded files for the requested formats. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the `formats` set is empty. |
| `ValueError` | If `all_isoforms` is `True` and `summary` is not in the `formats` set. |
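The `max_parallel_downloads` limit and the two `ValueError` checks above can be sketched with `asyncio.Semaphore`. The `download` coroutine is a hypothetical placeholder that only tracks how many downloads run at once. A real implementation would issue HTTP requests instead. Yielding results in completion order is an assumption of this sketch.

```python
import asyncio
from collections.abc import AsyncGenerator, Iterable

peak_concurrency = 0  # highest number of simultaneous downloads observed (demo only)


async def fetch_many_async_sketch(
    accessions: Iterable[str],
    formats: set[str],
    max_parallel_downloads: int = 5,
    all_isoforms: bool = False,
) -> AsyncGenerator[str, None]:
    # Same validation rules as documented for fetch_many_async.
    # In an async generator these run when iteration starts.
    if not formats:
        raise ValueError("'formats' set is empty")
    if all_isoforms and "summary" not in formats:
        raise ValueError("all_isoforms=True requires 'summary' in formats")

    semaphore = asyncio.Semaphore(max_parallel_downloads)
    active = 0

    async def download(acc: str) -> str:
        global peak_concurrency
        nonlocal active
        async with semaphore:  # waits once max_parallel_downloads are running
            active += 1
            peak_concurrency = max(peak_concurrency, active)
            await asyncio.sleep(0.01)  # placeholder for an HTTP request
            active -= 1
            return acc

    tasks = [asyncio.create_task(download(acc)) for acc in accessions]
    # Yield each result as soon as its download finishes.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main() -> list[str]:
    accs = [f"ACC{i}" for i in range(12)]
    return [a async for a in fetch_many_async_sketch(accs, {"cif"}, max_parallel_downloads=3)]


results = asyncio.run(main())
print(len(results), peak_concurrency)  # 12 3
```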

fetch_summary(qualifier, session, semaphore, save_dir, cacher) async

Fetches a summary from the AlphaFold database for a given qualifier.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `qualifier` | `str` | The UniProt accession of the protein or entry to fetch, for example `Q5VSL9`. | required |
| `session` | `RetryClient` | An asynchronous HTTP client session with retry capabilities. | required |
| `semaphore` | `Semaphore` | A semaphore that limits the number of concurrent requests. | required |
| `save_dir` | `Path \| None` | An optional directory in which to save the fetched summary as a JSON file. If set and the summary already exists there, it is loaded from disk instead of being fetched from the API. If not set, the summary is not saved to disk and is always fetched from the API. | required |
| `cacher` | `Cacher` | A cacher to use for caching the fetched summary. Only used if `save_dir` is not `None`. | required |

Returns:

| Type | Description |
|---|---|
| `list[EntrySummary]` | A list of `EntrySummary` objects representing the fetched summary. When the qualifier has multiple isoforms, multiple summaries are returned; otherwise the list contains a single summary. |

files_for_alphafold_entries(uniprot_accessions, formats, db_version, gzip_files)

Get the files to download for multiple AlphaFold entries.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uniprot_accessions` | `Iterable[str]` | The UniProt accessions. | required |
| `formats` | `set[DownloadableFormat]` | The formats to download. | required |
| `db_version` | `str` | The version of the AlphaFold database to use. | required |
| `gzip_files` | `bool` | Whether to download gzipped files. If `False`, uncompressed files are downloaded. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, UrlFileNamePairsOfFormats]` | A mapping from UniProt accession to a mapping of `DownloadableFormat` to `UrlFileNamePair`. |
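The following rough sketch shows the shape of the mapping this function returns. It assumes the static file naming AlphaFold DB has used for model and PAE files, `https://alphafold.ebi.ac.uk/files/AF-<accession>-F1-<kind>_v<version>.<ext>`. The per-format patterns cover only a few formats, are assumptions, and may differ from what this module actually builds. Plain strings stand in for `URL`, and `gzip_files` is ignored here.

```python
# Assumed file-name patterns for a subset of DownloadableFormat.
FILE_PATTERNS = {
    "cif": "model_v{v}.cif",
    "bcif": "model_v{v}.bcif",
    "pdb": "model_v{v}.pdb",
    "paeDoc": "predicted_aligned_error_v{v}.json",
}
BASE_URL = "https://alphafold.ebi.ac.uk/files/"


def files_for_entries_sketch(
    uniprot_accessions: list[str], formats: set[str], db_version: str
) -> dict[str, dict[str, tuple[str, str]]]:
    result: dict[str, dict[str, tuple[str, str]]] = {}
    for acc in uniprot_accessions:
        pairs: dict[str, tuple[str, str]] = {}
        for fmt in sorted(formats):
            if fmt == "summary":
                continue  # assumed: summaries come from the API, not static files
            name = f"AF-{acc}-F1-" + FILE_PATTERNS[fmt].format(v=db_version)
            pairs[fmt] = (BASE_URL + name, name)  # (URL, filename) pair
        result[acc] = pairs
    return result


mapping = files_for_entries_sketch(["P69905"], {"cif", "paeDoc"}, "4")
print(mapping["P69905"]["cif"])
# ('https://alphafold.ebi.ac.uk/files/AF-P69905-F1-model_v4.cif', 'AF-P69905-F1-model_v4.cif')
```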