resolution
Filter structure files by resolution rank.
GroupBy = Literal['uniprot_accession'] | None (module attribute)
Type for grouping strategy in resolution-based filtering.
ResolutionFilterStatistics (dataclass)
Statistics for filtering files based on ranked structure resolution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_file | Path | The path to the input file. | required |
| uniprot_accession | str \| None | UniProt accession used for grouping. | required |
| resolution | float | Resolution from the structure file. | required |
| total_residue_count | int | Total residues across the whole structure. | required |
| is_alphafold | bool | Whether the structure was predicted by AlphaFold. | required |
| passed | bool | Whether the file passed the ranking filter. | required |
| output_file | Path \| None | The path to the output file, if passed. | required |
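As a rough picture of the record these fields describe, the sketch below writes them out as a Python dataclass. It is inferred from the table above rather than copied from the module; the field comments and the example values are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


# Sketch inferred from the documented fields; not the module's own source.
@dataclass
class ResolutionFilterStatistics:
    """Outcome of resolution-based filtering for one structure file."""

    input_file: Path  # structure file that was examined
    uniprot_accession: str | None  # grouping key; None if no accession was found
    resolution: float  # resolution read from the structure file
    total_residue_count: int  # residues across the whole structure
    is_alphafold: bool  # True for AlphaFold-predicted models
    passed: bool  # True if the file survived the ranking filter
    output_file: Path | None  # where the file was written, if it passed


# Example record for a file that has not passed (made-up values).
example = ResolutionFilterStatistics(
    input_file=Path("1abc.cif"),
    uniprot_accession="P12345",
    resolution=2.1,
    total_residue_count=350,
    is_alphafold=False,
    passed=False,
    output_file=None,
)
```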
copy_resolution_statistics(stats, output_dir, copy_method='copy')
Copy files for passed statistics and set their output_file path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stats | Iterable[ResolutionFilterStatistics] | Statistics with | required |
| output_dir | Path | Directory where passed files will be written. | required |
| copy_method | CopyMethod | How to copy passed files to the output directory. | 'copy' |
Yields:
| Type | Description |
|---|---|
| Generator[ResolutionFilterStatistics] | Statistics with |
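A minimal sketch of the copy step, under the assumption that the default 'copy' method behaves like shutil.copy2 and that passed files keep their original filename inside output_dir. The helper name copy_passed is hypothetical, and the other CopyMethod options are not modeled.

```python
from __future__ import annotations

import shutil
import tempfile
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ResolutionFilterStatistics:
    input_file: Path
    uniprot_accession: str | None
    resolution: float
    total_residue_count: int
    is_alphafold: bool
    passed: bool
    output_file: Path | None


def copy_passed(
    stats: Iterable[ResolutionFilterStatistics], output_dir: Path
) -> Iterator[ResolutionFilterStatistics]:
    """Hypothetical helper: copy each passed file into output_dir and record the target."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for s in stats:
        if s.passed:
            target = output_dir / s.input_file.name
            # Assumption: a plain copy stands in for the 'copy' method; others are not sketched.
            shutil.copy2(s.input_file, target)
            s.output_file = target
        yield s  # entries that did not pass come through unchanged (output_file stays None)


# Usage: two files in a scratch directory, only the first marked as passed.
workdir = Path(tempfile.mkdtemp())
(workdir / "a.cif").write_text("data_a\n")
(workdir / "b.cif").write_text("data_b\n")
rows = [
    ResolutionFilterStatistics(workdir / "a.cif", "P12345", 1.8, 300, False, True, None),
    ResolutionFilterStatistics(workdir / "b.cif", "P12345", 2.5, 300, False, False, None),
]
result = list(copy_passed(rows, workdir / "out"))
```

Entries that did not pass are yielded unchanged, so the caller still sees one statistics object per input.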
filter_files_on_resolution(input_files, output_dir, top, group_by='uniprot_accession', copy_method='copy')
Filter structure files by resolution rank.
AlphaFold structures are preferred over non-AlphaFold structures. Structures with a lower (numerically smaller) resolution value are preferred; when resolutions are equal, structures with more residues are preferred. Structures with a missing resolution are disfavored.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_files | list[Path] | Structure files to rank and filter. | required |
| output_dir | Path | Directory where passed files will be written. | required |
| top | int | Maximum number of files to keep. | required |
| group_by | GroupBy | Grouping strategy: 'uniprot_accession' ranks files within each UniProt accession; None ranks all files globally. | 'uniprot_accession' |
| copy_method | CopyMethod | How to copy passed files to the output directory. | 'copy' |
Yields:
| Type | Description |
|---|---|
| Generator[ResolutionFilterStatistics] | Objects describing the filtering result for each input file. |
group_resolution_statistics(stats, top, group_by='uniprot_accession')
Rank stats by resolution and mark the top N as passed.
In group_by='uniprot_accession' mode, files with no UniProt accession
are skipped with a warning and appended last. In group_by=None mode,
all files are ranked globally and no missing-accession warnings are emitted.
AlphaFold structures are preferred over non-AlphaFold structures. Structures with a lower (numerically smaller) resolution value are preferred; when resolutions are equal, structures with more residues are preferred. Structures with a missing resolution are disfavored.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stats | Iterable[ResolutionFilterStatistics] | Resolution statistics to group and rank. | required |
| top | int | Maximum number of structures to pass. | required |
| group_by | GroupBy | Grouping strategy: 'uniprot_accession' ranks files within each UniProt accession; None ranks all files globally. | 'uniprot_accession' |
Returns:
| Type | Description |
|---|---|
| list[ResolutionFilterStatistics] | All statistics with |
| list[ResolutionFilterStatistics] | The entries are sorted alphabetically by filename. |
iter_resolution_statistics(input_files)
Load resolution statistics for each structure file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_files | Iterable[Path] | Structure files to read metadata from. | required |
Yields:
| Type | Description |
|---|---|
| Generator[ResolutionFilterStatistics] | Statistics objects with metadata filled in; |
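One way to picture this function is as a lazy generator over a metadata reader. In the sketch below, read_metadata, StructureMetadata, and iter_stats are hypothetical stand-ins for the real file parsing, and the starting values passed=False and output_file=None are assumed rather than taken from the docs.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path
from typing import NamedTuple


@dataclass
class ResolutionFilterStatistics:
    input_file: Path
    uniprot_accession: str | None
    resolution: float
    total_residue_count: int
    is_alphafold: bool
    passed: bool
    output_file: Path | None


class StructureMetadata(NamedTuple):
    """Hypothetical bundle of the values a structure reader would extract."""

    uniprot_accession: str | None
    resolution: float
    total_residue_count: int
    is_alphafold: bool


def iter_stats(
    input_files: Iterable[Path],
    read_metadata: Callable[[Path], StructureMetadata],
) -> Iterator[ResolutionFilterStatistics]:
    """Hypothetical sketch: lazily turn each file into a statistics record."""
    for path in input_files:
        meta = read_metadata(path)  # parsing happens only when the next item is requested
        yield ResolutionFilterStatistics(
            input_file=path,
            uniprot_accession=meta.uniprot_accession,
            resolution=meta.resolution,
            total_residue_count=meta.total_residue_count,
            is_alphafold=meta.is_alphafold,
            passed=False,  # assumed initial value; ranking decides later
            output_file=None,  # nothing has been written yet
        )


# Usage with a fake reader backed by a dict instead of real file parsing.
fake_metadata = {
    Path("1abc.cif"): StructureMetadata("P12345", 2.1, 350, False),
    Path("af_model.cif"): StructureMetadata("P12345", float("nan"), 400, True),
}
calls: list[Path] = []


def fake_reader(path: Path) -> StructureMetadata:
    calls.append(path)  # record each read to show the generator is lazy
    return fake_metadata[path]


gen = iter_stats(fake_metadata, fake_reader)
```

Because it is a generator, no file is read until the caller iterates, which lets a long list of inputs stream straight into the ranking step.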
resolution_sort_key(stats)
Sort key for resolution-based filtering.
AlphaFold structures are preferred over non-AlphaFold structures. Structures with a lower (numerically smaller) resolution value are preferred; when resolutions are equal, structures with more residues are preferred. Structures with a missing resolution are disfavored.
The resulting order is deterministic, with any remaining ties broken alphabetically by filename.
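The ranking rules above translate naturally into a tuple sort key, since Python compares tuples element by element. This sketch assumes a missing resolution is stored as NaN and places the AlphaFold check ahead of the missing-resolution check, following the order in which the rules are listed; resolution_sort_key_sketch is a hypothetical name, not the module's function.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ResolutionFilterStatistics:
    input_file: Path
    uniprot_accession: str | None
    resolution: float
    total_residue_count: int
    is_alphafold: bool
    passed: bool = False  # defaults added only to keep the example short
    output_file: Path | None = None


def resolution_sort_key_sketch(s: ResolutionFilterStatistics) -> tuple:
    """Hypothetical key: smaller tuples sort first; NaN is assumed to mark a missing resolution."""
    missing = math.isnan(s.resolution)
    return (
        not s.is_alphafold,  # False < True, so AlphaFold models come first
        missing,  # known resolutions before missing ones
        math.inf if missing else s.resolution,  # smaller value first; keeps NaN out of comparisons
        -s.total_residue_count,  # on equal resolution, more residues first
        s.input_file.name,  # final alphabetical tiebreak keeps the order deterministic
    )


rows = [
    ResolutionFilterStatistics(Path("x.cif"), "P1", 1.8, 200, False),
    ResolutionFilterStatistics(Path("z.cif"), "P1", math.nan, 900, False),
    ResolutionFilterStatistics(Path("w.cif"), "P1", 3.0, 1000, False),
    ResolutionFilterStatistics(Path("y.cif"), "P1", 1.8, 400, False),
    ResolutionFilterStatistics(Path("af.cif"), "P1", math.nan, 100, True),
    ResolutionFilterStatistics(Path("v.cif"), "P1", 3.0, 1000, False),
]
ordered = sorted(rows, key=resolution_sort_key_sketch)
```

Substituting math.inf for NaN matters: every ordering comparison involving NaN returns False, which would leave sorted() with an inconsistent ordering.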