Query session database¶
When you interact with protein-detective a session database is created to store the results of your queries. This notebook shows how to query that database.
This notebooks expects that you have run the workflow.ipynb notebook first to create and fill the session database.
In [1]:
Copied!
from pathlib import Path
from protein_detective.db import connect
from pathlib import Path
from protein_detective.db import connect
Configure %sql magic to use the session database.
In [2]:
Copied!
session_dir = Path("session1")
# connect() is a context manager, to use connection over multiple cells, we cal its enter method directly.
conn = connect(session_dir, read_only=True).__enter__()
%load_ext sql
%sql conn
# %sql does not retain variable set in initialize_db(), so we use Python API for queries that need it.
session_dir = Path("session1")
# connect() is a context manager, to use connection over multiple cells, we cal its enter method directly.
conn = connect(session_dir, read_only=True).__enter__()
%load_ext sql
%sql conn
# %sql does not retain variable set in initialize_db(), so we use Python API for queries that need it.
Loading configurations from /home/verhoes/git/protein-detective/protein-detective/pyproject.toml.
Settings changed:
Config | value |
---|---|
displaylimit | 100 |
In [3]:
Copied!
# Paths in the database are relative to the session directory.
# To make full paths, you need to prepend the session directory.
# The session directory is set as a variable in the database.
conn.execute("SELECT getvariable('session_dir')").fetchone()
# Paths in the database are relative to the session directory.
# To make full paths, you need to prepend the session directory.
# The session directory is set as a variable in the database.
conn.execute("SELECT getvariable('session_dir')").fetchone()
Out[3]:
('session1',)
Uniprot search results¶
In [4]:
Copied!
%sql SELECT * FROM proteins
%sql SELECT * FROM proteins
Running query in 'DuckDBPyConnection'
Out[4]:
uniprot_acc |
---|
B1APH4 |
A6NE82 |
A8MYZ6 |
A0A1W2PPK0 |
A8MUV8 |
A8K0S8 |
A6NFI3 |
A8K8V0 |
A8MT69 |
A6NJT0 |
B2RD01 |
A8MTJ6 |
A0A1B0GWH4 |
A0A1W2PPF3 |
A6NK75 |
A0A087WUV0 |
A8MZ59 |
A6NP11 |
A3KN83 |
A6NK53 |
A6NMT0 |
A8MPP1 |
A8MQ14 |
A0A2Z4LIS9 |
A6NJ46 |
B2RPK0 |
A0A1W2PQ73 |
A6NI15 |
A8MXY4 |
A6NCS4 |
A6NDX5 |
A8MTY0 |
A6NN14 |
A0A5F9ZHS7 |
A0A1W2PRP0 |
A6NLW8 |
A2RRD8 |
A0A1B0GVZ6 |
A6NGD5 |
A2RU54 |
A6NJG6 |
B4DX44 |
A9YTQ3 |
A8K830 |
A6NDR6 |
A6NHT5 |
A0A1W2PQL4 |
B4DU55 |
A8MTQ0 |
A6NJL1 |
A8MT65 |
A6NDZ8 |
A6NKF2 |
A6NHJ4 |
A6NNA5 |
A8MUZ8 |
A6NM28 |
A1A519 |
A0A2R8Y619 |
A6NFD8 |
A0A3B3IU63 |
A6NFQ7 |
B2RXF5 |
A0A0U1RQI7 |
A0A1W2PPM1 |
A6NNF4 |
A1YPR0 |
A0A1B0GTS1 |
A6NJ08 |
A0A0C5B5G6 |
A8MWA4 |
In [5]:
Copied!
%sql SELECT * FROM pdbs
%sql SELECT * FROM pdbs
Running query in 'DuckDBPyConnection'
Out[5]:
pdb_id | method | resolution | mmcif_file |
---|---|---|---|
7YWX | Electron_Microscopy | 12.0 | downloads/7ywx.cif |
4NE5 | X-Ray_Crystallography | 2.5 | downloads/4ne5.cif |
4E45 | X-Ray_Crystallography | 2.0 | downloads/4e45.cif |
4NDY | X-Ray_Crystallography | 7.0 | downloads/4ndy.cif |
4NE3 | X-Ray_Crystallography | 1.7999999523162842 | downloads/4ne3.cif |
4E44 | X-Ray_Crystallography | 2.0999999046325684 | downloads/4e44.cif |
4NE6 | X-Ray_Crystallography | 2.0999999046325684 | downloads/4ne6.cif |
7R5S | Electron_Microscopy | 2.8299999237060547 | downloads/7r5s.cif |
7XHN | Electron_Microscopy | 3.7100000381469727 | downloads/7xhn.cif |
7XHO | Electron_Microscopy | 3.2899999618530273 | downloads/7xho.cif |
4DRA | X-Ray_Crystallography | 2.4100000858306885 | downloads/4dra.cif |
4NE1 | X-Ray_Crystallography | 6.5 | downloads/4ne1.cif |
4DRB | X-Ray_Crystallography | 2.630000114440918 | downloads/4drb.cif |
5Y7Y | X-Ray_Crystallography | 2.4000000953674316 | downloads/5y7y.cif |
In [6]:
Copied!
%sql SELECT * FROM proteins_pdbs
%sql SELECT * FROM proteins_pdbs
Running query in 'DuckDBPyConnection'
Out[6]:
uniprot_acc | pdb_id | uniprot_chains | single_chain_pdb_file |
---|---|---|---|
A8MT69 | 7YWX | X=1-81 | single_chain/A8MT69_7ywx_X2A.pdb |
A8MT69 | 4NE5 | B/D/F/H=8-81 | single_chain/A8MT69_4ne5_B2A.pdb |
A8MT69 | 4E45 | B/D/G/I/L/N=1-81 | single_chain/A8MT69_4e45_B2A.pdb |
A8MT69 | 4NDY | B/D/H/L/M/N/U/V/W/X=8-81 | single_chain/A8MT69_4ndy_B2A.pdb |
A8MT69 | 4NE3 | B=8-81 | single_chain/A8MT69_4ne3_B2A.pdb |
A8MT69 | 4E44 | B/D=1-81 | single_chain/A8MT69_4e44_B2A.pdb |
A8MT69 | 4NE6 | B/D=8-81 | single_chain/A8MT69_4ne6_B2A.pdb |
A8MT69 | 7R5S | X=1-81 | single_chain/A8MT69_7r5s_X2A.pdb |
A8MT69 | 7XHN | X=1-81 | single_chain/A8MT69_7xhn_X2A.pdb |
A8MT69 | 7XHO | X=1-81 | single_chain/A8MT69_7xho_X2A.pdb |
A8MT69 | 4DRA | E/F/G/H=1-81 | single_chain/A8MT69_4dra_E2A.pdb |
A8MT69 | 4NE1 | B/D/H/L/M/N/U/V/W/X/Z/b/d/h/i/j/o/p/q/r=8-81 | single_chain/A8MT69_4ne1_B2A.pdb |
A8MT69 | 4DRB | J/K/L/M/N/O=1-81 | single_chain/A8MT69_4drb_J2A.pdb |
A9YTQ3 | 5Y7Y | A=27-280 | single_chain/A9YTQ3_5y7y_A2A.pdb |
In [7]:
Copied!
%sql SELECT * FROM alphafolds LIMIT 1
%sql SELECT * FROM alphafolds LIMIT 1
Running query in 'DuckDBPyConnection'
Out[7]:
uniprot_acc | summary | bcif_file | cif_file | pdb_file | pae_image_file | pae_doc_file | am_annotations_file | am_annotations_hg19_file | am_annotations_hg38_file |
---|---|---|---|---|---|---|---|---|---|
A0A087WUV0 | {"entryId": "AF-A0A087WUV0-F1", "gene": "Unknown", "sequenceChecksum": "5DE83E4BE25B68BD", "sequenceVersionDate": "2014-10-29", "uniprotAccession": "A0A087WUV0", "uniprotId": "A0A087WUV0_HUMAN", "uniprotDescription": "Uncharacterized protein", "taxId": 9606, "organismScientificName": "Homo sapiens", "uniprotStart": 1, "uniprotEnd": 522, "uniprotSequence": "MEPEGRGSLFEDSDLLHAGNPKENDVTAVLLTPGSQELMIRDMAEALTQWRQLNSPQGDVPEKPRNLVLLGLPISTPDVISQLEHEEELEREVSKAASQKHWETIPESKELTPEKDISEEESAPGVLIVRFSKESSSECEDSLESQQENHEKHLIQEAVTEKSSRERSYQSDEFRRNCTQRSLLVQQQGERLHHCDSFKNNLKQNSDIIRHERICAGKKPWKCNECEKAFSYYSAFVLHQRIHTGEKPYECNECGKAFSQSIHLTLHQRIHTGEKPYECHECGKAFSHRSALIRHHIIHTGEKPYECNECGKAFNQSSYLTQHQRIHTGEKPYECNECGKAFSQSTFLTQHQVIHTGEKPYKCNECGKAFSDRSGLIQHQRTHTGERPYECNECGKAFGYCSALTQHQRTHTGEKPYKCNDCAKAFSDRSALIRHQRTHTGEKPYKCKDCGKAFSQSSSLTKHQKTHTGEKPYKCKECGKAFSQSSSLSQHQKTHAGVKTKKYVQALSEHLTFGQHKRIHTG", "modelCreatedDate": "2022-06-01", "latestVersion": 4, "allVersions": [1, 2, 3, 4], "bcifUrl": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-model_v4.bcif", "cifUrl": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-model_v4.cif", "pdbUrl": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-model_v4.pdb", "paeImageUrl": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-predicted_aligned_error_v4.png", "paeDocUrl": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-predicted_aligned_error_v4.json", "amAnnotationsUrl": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-aa-substitutions.csv", "amAnnotationsHg19Url": null, "amAnnotationsHg38Url": "https://alphafold.ebi.ac.uk/files/AF-A0A087WUV0-F1-hg38.csv", "isReviewed": false, "isReferenceProteome": true} | None | None | downloads/AF-A0A087WUV0-F1-model_v4.pdb | None | None | None | None | None |
In [8]:
Copied!
%sql SELECT count(*) FROM alphafolds
%sql SELECT count(*) FROM alphafolds
Running query in 'DuckDBPyConnection'
Out[8]:
count_star() |
---|
71 |
In [9]:
Copied!
# Fetch fields from inside summary
%sql SELECT uniprot_acc, summary.taxId, summary.uniprotStart, summary.uniprotEnd, summary.gene FROM alphafolds
# Fetch fields from inside summary
%sql SELECT uniprot_acc, summary.taxId, summary.uniprotStart, summary.uniprotEnd, summary.gene FROM alphafolds
Running query in 'DuckDBPyConnection'
Out[9]:
uniprot_acc | taxId | uniprotStart | uniprotEnd | gene |
---|---|---|---|---|
A0A087WUV0 | 9606 | 1 | 522 | "Unknown" |
A0A0C5B5G6 | 9606 | 1 | 16 | "MT-RNR1" |
A0A0U1RQI7 | 9606 | 1 | 1052 | "KLF18" |
A0A1B0GTS1 | 9606 | 1 | 333 | "HSFX4" |
A0A1B0GVZ6 | 9606 | 1 | 204 | "MBD3L2B" |
A0A1B0GWH4 | 9606 | 1 | 333 | "HSFX3" |
A0A1W2PPF3 | 9606 | 1 | 345 | "DUXB" |
A0A1W2PPK0 | 9606 | 1 | 400 | "Unknown" |
A0A1W2PPM1 | 9606 | 1 | 405 | "CPHXL" |
A0A1W2PQ73 | 9606 | 1 | 354 | "ERFL" |
A0A1W2PQL4 | 9606 | 1 | 384 | "ZNF722P" |
A0A1W2PRP0 | 9606 | 1 | 233 | "FOXL3" |
A0A2R8Y619 | 9606 | 1 | 122 | "H2BE1" |
A0A2Z4LIS9 | 9606 | 1 | 290 | "FOXO3B" |
A0A3B3IU63 | 9606 | 1 | 148 | "H2AL1RP" |
A0A5F9ZHS7 | 9606 | 1 | 289 | "NFILZ" |
A1A519 | 9606 | 1 | 330 | "FAM170A" |
A1YPR0 | 9606 | 1 | 619 | "ZBTB7C" |
A2RRD8 | 9606 | 1 | 509 | "ZNF320" |
A2RU54 | 9606 | 1 | 273 | "HMX2" |
A3KN83 | 9606 | 1 | 1393 | "SBNO1" |
A6NCS4 | 9606 | 1 | 301 | "NKX2-6" |
A6NDR6 | 9606 | 1 | 274 | "MEIS3P1" |
A6NDX5 | 9606 | 1 | 716 | "ZNF840P" |
A6NDZ8 | 9606 | 1 | 208 | "MBD3L4" |
A6NE82 | 9606 | 1 | 208 | "MBD3L3" |
A6NFD8 | 9606 | 1 | 242 | "HELT" |
A6NFI3 | 9606 | 1 | 1004 | "ZNF316" |
A6NFQ7 | 9606 | 1 | 191 | "DPRX" |
A6NGD5 | 9606 | 1 | 496 | "ZSCAN5C" |
A6NHJ4 | 9606 | 1 | 632 | "ZNF860" |
A6NHT5 | 9606 | 1 | 357 | "HMX3" |
A6NI15 | 9606 | 1 | 193 | "MSGN1" |
A6NJ08 | 9606 | 1 | 208 | "MBD3L5" |
A6NJ46 | 9606 | 1 | 265 | "NKX6-3" |
A6NJG6 | 9606 | 1 | 315 | "ARGFX" |
A6NJL1 | 9606 | 1 | 495 | "ZSCAN5B" |
A6NJT0 | 9606 | 1 | 531 | "UNCX" |
A6NK53 | 9606 | 1 | 670 | "ZNF233" |
A6NK75 | 9606 | 1 | 572 | "ZNF98" |
A6NKF2 | 9606 | 1 | 412 | "ARID3C" |
A6NLW8 | 9606 | 1 | 204 | "DUXA" |
A6NM28 | 9606 | 1 | 416 | "ZFP92" |
A6NMT0 | 9606 | 1 | 343 | "DBX1" |
A6NN14 | 9606 | 1 | 1252 | "ZNF729" |
A6NNA5 | 9606 | 1 | 263 | "DRGX" |
A6NNF4 | 9606 | 1 | 738 | "ZNF726" |
A6NP11 | 9606 | 1 | 495 | "ZNF716" |
A8K0S8 | 9606 | 1 | 358 | "MEIS3P2" |
A8K830 | 9606 | 1 | 154 | "COLCA2" |
A8K8V0 | 9606 | 1 | 405 | "ZNF785" |
A8MPP1 | 9606 | 1 | 907 | "DDX11L8" |
A8MQ14 | 9606 | 1 | 1090 | "ZNF850" |
A8MT65 | 9606 | 1 | 544 | "ZNF891" |
A8MT69 | 9606 | 1 | 81 | "CENPX" |
A8MTJ6 | 9606 | 1 | 420 | "FOXI3" |
A8MTQ0 | 9606 | 1 | 251 | "NOTO" |
A8MTY0 | 9606 | 1 | 619 | "ZNF724" |
A8MUV8 | 9606 | 1 | 499 | "ZNF727" |
A8MUZ8 | 9606 | 1 | 300 | "ZNF705G" |
A8MWA4 | 9606 | 1 | 300 | "ZNF705E" |
A8MXY4 | 9606 | 1 | 864 | "ZNF99" |
A8MYZ6 | 9606 | 1 | 492 | "FOXO6" |
A8MZ59 | 9606 | 1 | 198 | "LEUTX" |
A9YTQ3 | 9606 | 1 | 701 | "AHRR" |
B1APH4 | 9606 | 1 | 448 | "ZNF487" |
B2RD01 | 9606 | 1 | 187 | "CENPBD1" |
B2RPK0 | 9606 | 1 | 211 | "HMGB1P1" |
B2RXF5 | 9606 | 1 | 422 | "ZBTB42" |
B4DU55 | 9606 | 1 | 563 | "ZNF879" |
B4DX44 | 9606 | 1 | 427 | "ZNF736" |
Density filtered models¶
In [10]:
Copied!
%%sql
SELECT
f.confidence, f.min_threshold, f.max_threshold,
density_filtered_alphafolds.*,
alphafolds.summary.uniprotStart,
alphafolds.summary.uniprotEnd,
length(alphafolds.summary.uniprotSequence) AS uniprot_length
FROM density_filtered_alphafolds
JOIN density_filters AS f USING (density_filter_id)
JOIN alphafolds USING (uniprot_acc)
LIMIT 100;
%%sql
SELECT
f.confidence, f.min_threshold, f.max_threshold,
density_filtered_alphafolds.*,
alphafolds.summary.uniprotStart,
alphafolds.summary.uniprotEnd,
length(alphafolds.summary.uniprotSequence) AS uniprot_length
FROM density_filtered_alphafolds
JOIN density_filters AS f USING (density_filter_id)
JOIN alphafolds USING (uniprot_acc)
LIMIT 100;
Running query in 'DuckDBPyConnection'
Out[10]:
confidence | min_threshold | max_threshold | density_filter_id | uniprot_acc | nr_residues_above_confidence | keep | pdb_file | uniprotStart | uniprotEnd | uniprot_length |
---|---|---|---|---|---|---|---|---|---|---|
70.0 | 100 | 500 | 1 | A0A087WUV0 | 283 | True | density_filtered/AF-A0A087WUV0-F1-model_v4.pdb | 1 | 522 | 524 |
70.0 | 100 | 500 | 1 | A0A0C5B5G6 | 10 | False | None | 1 | 16 | 18 |
70.0 | 100 | 500 | 1 | A0A0U1RQI7 | 192 | True | density_filtered/AF-A0A0U1RQI7-F1-model_v4.pdb | 1 | 1052 | 1054 |
70.0 | 100 | 500 | 1 | A0A1B0GTS1 | 116 | True | density_filtered/AF-A0A1B0GTS1-F1-model_v4.pdb | 1 | 333 | 335 |
70.0 | 100 | 500 | 1 | A0A1B0GVZ6 | 54 | False | None | 1 | 204 | 206 |
70.0 | 100 | 500 | 1 | A0A1B0GWH4 | 117 | True | density_filtered/AF-A0A1B0GWH4-F1-model_v4.pdb | 1 | 333 | 335 |
70.0 | 100 | 500 | 1 | A0A1W2PPF3 | 124 | True | density_filtered/AF-A0A1W2PPF3-F1-model_v4.pdb | 1 | 345 | 347 |
70.0 | 100 | 500 | 1 | A0A1W2PPK0 | 71 | False | None | 1 | 400 | 402 |
70.0 | 100 | 500 | 1 | A0A1W2PPM1 | 68 | False | None | 1 | 405 | 407 |
70.0 | 100 | 500 | 1 | A0A1W2PQ73 | 86 | False | None | 1 | 354 | 356 |
70.0 | 100 | 500 | 1 | A0A1W2PQL4 | 226 | True | density_filtered/AF-A0A1W2PQL4-F1-model_v4.pdb | 1 | 384 | 386 |
70.0 | 100 | 500 | 1 | A0A1W2PRP0 | 94 | False | None | 1 | 233 | 235 |
70.0 | 100 | 500 | 1 | A0A2R8Y619 | 92 | False | None | 1 | 122 | 124 |
70.0 | 100 | 500 | 1 | A0A2Z4LIS9 | 52 | False | None | 1 | 290 | 292 |
70.0 | 100 | 500 | 1 | A0A3B3IU63 | 92 | False | None | 1 | 148 | 150 |
70.0 | 100 | 500 | 1 | A0A5F9ZHS7 | 71 | False | None | 1 | 289 | 291 |
70.0 | 100 | 500 | 1 | A1A519 | 64 | False | None | 1 | 330 | 332 |
70.0 | 100 | 500 | 1 | A1YPR0 | 214 | True | density_filtered/AF-A1YPR0-F1-model_v4.pdb | 1 | 619 | 621 |
70.0 | 100 | 500 | 1 | A2RRD8 | 336 | True | density_filtered/AF-A2RRD8-F1-model_v4.pdb | 1 | 509 | 511 |
70.0 | 100 | 500 | 1 | A2RU54 | 83 | False | None | 1 | 273 | 275 |
70.0 | 100 | 500 | 1 | A3KN83 | 844 | False | None | 1 | 1393 | 1395 |
70.0 | 100 | 500 | 1 | A6NCS4 | 75 | False | None | 1 | 301 | 303 |
70.0 | 100 | 500 | 1 | A6NDR6 | 94 | False | None | 1 | 274 | 276 |
70.0 | 100 | 500 | 1 | A6NDX5 | 418 | True | density_filtered/AF-A6NDX5-F1-model_v4.pdb | 1 | 716 | 718 |
70.0 | 100 | 500 | 1 | A6NDZ8 | 64 | False | None | 1 | 208 | 210 |
70.0 | 100 | 500 | 1 | A6NE82 | 60 | False | None | 1 | 208 | 210 |
70.0 | 100 | 500 | 1 | A6NFD8 | 100 | True | density_filtered/AF-A6NFD8-F1-model_v4.pdb | 1 | 242 | 244 |
70.0 | 100 | 500 | 1 | A6NFI3 | 458 | True | density_filtered/AF-A6NFI3-F1-model_v4.pdb | 1 | 1004 | 1006 |
70.0 | 100 | 500 | 1 | A6NFQ7 | 76 | False | None | 1 | 191 | 193 |
70.0 | 100 | 500 | 1 | A6NGD5 | 206 | True | density_filtered/AF-A6NGD5-F1-model_v4.pdb | 1 | 496 | 498 |
70.0 | 100 | 500 | 1 | A6NHJ4 | 378 | True | density_filtered/AF-A6NHJ4-F1-model_v4.pdb | 1 | 632 | 634 |
70.0 | 100 | 500 | 1 | A6NHT5 | 97 | False | None | 1 | 357 | 359 |
70.0 | 100 | 500 | 1 | A6NJ08 | 61 | False | None | 1 | 208 | 210 |
70.0 | 100 | 500 | 1 | A6NJ46 | 81 | False | None | 1 | 265 | 267 |
70.0 | 100 | 500 | 1 | A6NJG6 | 66 | False | None | 1 | 315 | 317 |
70.0 | 100 | 500 | 1 | A6NJL1 | 220 | True | density_filtered/AF-A6NJL1-F1-model_v4.pdb | 1 | 495 | 497 |
70.0 | 100 | 500 | 1 | A6NJT0 | 102 | True | density_filtered/AF-A6NJT0-F1-model_v4.pdb | 1 | 531 | 533 |
70.0 | 100 | 500 | 1 | A6NK53 | 218 | True | density_filtered/AF-A6NK53-F1-model_v4.pdb | 1 | 670 | 672 |
70.0 | 100 | 500 | 1 | A6NK75 | 412 | True | density_filtered/AF-A6NK75-F1-model_v4.pdb | 1 | 572 | 574 |
70.0 | 100 | 500 | 1 | A6NKF2 | 150 | True | density_filtered/AF-A6NKF2-F1-model_v4.pdb | 1 | 412 | 414 |
70.0 | 100 | 500 | 1 | A6NLW8 | 120 | True | density_filtered/AF-A6NLW8-F1-model_v4.pdb | 1 | 204 | 206 |
70.0 | 100 | 500 | 1 | A6NM28 | 276 | True | density_filtered/AF-A6NM28-F1-model_v4.pdb | 1 | 416 | 418 |
70.0 | 100 | 500 | 1 | A6NMT0 | 77 | False | None | 1 | 343 | 345 |
70.0 | 100 | 500 | 1 | A6NN14 | 795 | False | None | 1 | 1252 | 1254 |
70.0 | 100 | 500 | 1 | A6NNF4 | 525 | False | None | 1 | 738 | 740 |
70.0 | 100 | 500 | 1 | A6NP11 | 314 | True | density_filtered/AF-A6NP11-F1-model_v4.pdb | 1 | 495 | 497 |
70.0 | 100 | 500 | 1 | A8K0S8 | 111 | True | density_filtered/AF-A8K0S8-F1-model_v4.pdb | 1 | 358 | 360 |
70.0 | 100 | 500 | 1 | A8K830 | 10 | False | None | 1 | 154 | 156 |
70.0 | 100 | 500 | 1 | A8K8V0 | 214 | True | density_filtered/AF-A8K8V0-F1-model_v4.pdb | 1 | 405 | 407 |
70.0 | 100 | 500 | 1 | A8MPP1 | 706 | False | None | 1 | 907 | 909 |
70.0 | 100 | 500 | 1 | A8MQ14 | 811 | False | None | 1 | 1090 | 1092 |
70.0 | 100 | 500 | 1 | A8MT65 | 246 | True | density_filtered/AF-A8MT65-F1-model_v4.pdb | 1 | 544 | 546 |
70.0 | 100 | 500 | 1 | A8MT69 | 74 | False | None | 1 | 81 | 83 |
70.0 | 100 | 500 | 1 | A8MTJ6 | 94 | False | None | 1 | 420 | 422 |
70.0 | 100 | 500 | 1 | A8MTQ0 | 89 | False | None | 1 | 251 | 253 |
70.0 | 100 | 500 | 1 | A8MTY0 | 483 | True | density_filtered/AF-A8MTY0-F1-model_v4.pdb | 1 | 619 | 621 |
70.0 | 100 | 500 | 1 | A8MUV8 | 326 | True | density_filtered/AF-A8MUV8-F1-model_v4.pdb | 1 | 499 | 501 |
70.0 | 100 | 500 | 1 | A8MUZ8 | 137 | True | density_filtered/AF-A8MUZ8-F1-model_v4.pdb | 1 | 300 | 302 |
70.0 | 100 | 500 | 1 | A8MWA4 | 101 | True | density_filtered/AF-A8MWA4-F1-model_v4.pdb | 1 | 300 | 302 |
70.0 | 100 | 500 | 1 | A8MXY4 | 657 | False | None | 1 | 864 | 866 |
70.0 | 100 | 500 | 1 | A8MYZ6 | 90 | False | None | 1 | 492 | 494 |
70.0 | 100 | 500 | 1 | A8MZ59 | 78 | False | None | 1 | 198 | 200 |
70.0 | 100 | 500 | 1 | A9YTQ3 | 196 | True | density_filtered/AF-A9YTQ3-F1-model_v4.pdb | 1 | 701 | 703 |
70.0 | 100 | 500 | 1 | B1APH4 | 67 | False | None | 1 | 448 | 450 |
70.0 | 100 | 500 | 1 | B2RD01 | 54 | False | None | 1 | 187 | 189 |
70.0 | 100 | 500 | 1 | B2RPK0 | 151 | True | density_filtered/AF-B2RPK0-F1-model_v4.pdb | 1 | 211 | 213 |
70.0 | 100 | 500 | 1 | B2RXF5 | 94 | False | None | 1 | 422 | 424 |
70.0 | 100 | 500 | 1 | B4DU55 | 375 | True | density_filtered/AF-B4DU55-F1-model_v4.pdb | 1 | 563 | 565 |
70.0 | 100 | 500 | 1 | B4DX44 | 290 | True | density_filtered/AF-B4DX44-F1-model_v4.pdb | 1 | 427 | 429 |
70.0 | 100 | 500 | 1 | A6NI15 | 66 | False | None | 1 | 193 | 195 |
70.0 | 100 | 500 | 1 | A6NNA5 | 85 | False | None | 1 | 263 | 265 |
Powerfit¶
In [11]:
Copied!
result = %sql SELECT powerfit_run_id, options.target, options.resolution FROM powerfit_runs;
result = %sql SELECT powerfit_run_id, options.target, options.resolution FROM powerfit_runs;
Running query in 'DuckDBPyConnection'
In [12]:
Copied!
result
result
Out[12]:
powerfit_run_id | target | resolution |
---|---|---|
1 | "../../powerfit-tutorial/ribosome-KsgA.map" | 13 |
In [13]:
Copied!
%%sql
SELECT density_filter_id, uniprot_acc, pdb_file,
parse_filename(pdb_file, true) AS pdb_filename
FROM density_filtered_alphafolds WHERE keep=True
%%sql
SELECT density_filter_id, uniprot_acc, pdb_file,
parse_filename(pdb_file, true) AS pdb_filename
FROM density_filtered_alphafolds WHERE keep=True
Running query in 'DuckDBPyConnection'
Out[13]:
density_filter_id | uniprot_acc | pdb_file | pdb_filename |
---|---|---|---|
1 | A0A087WUV0 | density_filtered/AF-A0A087WUV0-F1-model_v4.pdb | AF-A0A087WUV0-F1-model_v4 |
1 | A0A0U1RQI7 | density_filtered/AF-A0A0U1RQI7-F1-model_v4.pdb | AF-A0A0U1RQI7-F1-model_v4 |
1 | A0A1B0GTS1 | density_filtered/AF-A0A1B0GTS1-F1-model_v4.pdb | AF-A0A1B0GTS1-F1-model_v4 |
1 | A0A1B0GWH4 | density_filtered/AF-A0A1B0GWH4-F1-model_v4.pdb | AF-A0A1B0GWH4-F1-model_v4 |
1 | A0A1W2PPF3 | density_filtered/AF-A0A1W2PPF3-F1-model_v4.pdb | AF-A0A1W2PPF3-F1-model_v4 |
1 | A0A1W2PQL4 | density_filtered/AF-A0A1W2PQL4-F1-model_v4.pdb | AF-A0A1W2PQL4-F1-model_v4 |
1 | A1YPR0 | density_filtered/AF-A1YPR0-F1-model_v4.pdb | AF-A1YPR0-F1-model_v4 |
1 | A2RRD8 | density_filtered/AF-A2RRD8-F1-model_v4.pdb | AF-A2RRD8-F1-model_v4 |
1 | A6NDX5 | density_filtered/AF-A6NDX5-F1-model_v4.pdb | AF-A6NDX5-F1-model_v4 |
1 | A6NFD8 | density_filtered/AF-A6NFD8-F1-model_v4.pdb | AF-A6NFD8-F1-model_v4 |
1 | A6NFI3 | density_filtered/AF-A6NFI3-F1-model_v4.pdb | AF-A6NFI3-F1-model_v4 |
1 | A6NGD5 | density_filtered/AF-A6NGD5-F1-model_v4.pdb | AF-A6NGD5-F1-model_v4 |
1 | A6NHJ4 | density_filtered/AF-A6NHJ4-F1-model_v4.pdb | AF-A6NHJ4-F1-model_v4 |
1 | A6NJL1 | density_filtered/AF-A6NJL1-F1-model_v4.pdb | AF-A6NJL1-F1-model_v4 |
1 | A6NJT0 | density_filtered/AF-A6NJT0-F1-model_v4.pdb | AF-A6NJT0-F1-model_v4 |
1 | A6NK53 | density_filtered/AF-A6NK53-F1-model_v4.pdb | AF-A6NK53-F1-model_v4 |
1 | A6NK75 | density_filtered/AF-A6NK75-F1-model_v4.pdb | AF-A6NK75-F1-model_v4 |
1 | A6NKF2 | density_filtered/AF-A6NKF2-F1-model_v4.pdb | AF-A6NKF2-F1-model_v4 |
1 | A6NLW8 | density_filtered/AF-A6NLW8-F1-model_v4.pdb | AF-A6NLW8-F1-model_v4 |
1 | A6NM28 | density_filtered/AF-A6NM28-F1-model_v4.pdb | AF-A6NM28-F1-model_v4 |
1 | A6NP11 | density_filtered/AF-A6NP11-F1-model_v4.pdb | AF-A6NP11-F1-model_v4 |
1 | A8K0S8 | density_filtered/AF-A8K0S8-F1-model_v4.pdb | AF-A8K0S8-F1-model_v4 |
1 | A8K8V0 | density_filtered/AF-A8K8V0-F1-model_v4.pdb | AF-A8K8V0-F1-model_v4 |
1 | A8MT65 | density_filtered/AF-A8MT65-F1-model_v4.pdb | AF-A8MT65-F1-model_v4 |
1 | A8MTY0 | density_filtered/AF-A8MTY0-F1-model_v4.pdb | AF-A8MTY0-F1-model_v4 |
1 | A8MUV8 | density_filtered/AF-A8MUV8-F1-model_v4.pdb | AF-A8MUV8-F1-model_v4 |
1 | A8MUZ8 | density_filtered/AF-A8MUZ8-F1-model_v4.pdb | AF-A8MUZ8-F1-model_v4 |
1 | A8MWA4 | density_filtered/AF-A8MWA4-F1-model_v4.pdb | AF-A8MWA4-F1-model_v4 |
1 | A9YTQ3 | density_filtered/AF-A9YTQ3-F1-model_v4.pdb | AF-A9YTQ3-F1-model_v4 |
1 | B2RPK0 | density_filtered/AF-B2RPK0-F1-model_v4.pdb | AF-B2RPK0-F1-model_v4 |
1 | B4DU55 | density_filtered/AF-B4DU55-F1-model_v4.pdb | AF-B4DU55-F1-model_v4 |
1 | B4DX44 | density_filtered/AF-B4DX44-F1-model_v4.pdb | AF-B4DX44-F1-model_v4 |
In [14]:
Copied!
%%sql
SELECT uniprot_acc, pdb_id, single_chain_pdb_file,
parse_filename(single_chain_pdb_file, true) AS structure
FROM proteins_pdbs
WHERE single_chain_pdb_file IS NOT NULL
%%sql
SELECT uniprot_acc, pdb_id, single_chain_pdb_file,
parse_filename(single_chain_pdb_file, true) AS structure
FROM proteins_pdbs
WHERE single_chain_pdb_file IS NOT NULL
Running query in 'DuckDBPyConnection'
Out[14]:
uniprot_acc | pdb_id | single_chain_pdb_file | structure |
---|---|---|---|
A8MT69 | 7YWX | single_chain/A8MT69_7ywx_X2A.pdb | A8MT69_7ywx_X2A |
A8MT69 | 4NE5 | single_chain/A8MT69_4ne5_B2A.pdb | A8MT69_4ne5_B2A |
A8MT69 | 4E45 | single_chain/A8MT69_4e45_B2A.pdb | A8MT69_4e45_B2A |
A8MT69 | 4NDY | single_chain/A8MT69_4ndy_B2A.pdb | A8MT69_4ndy_B2A |
A8MT69 | 4NE3 | single_chain/A8MT69_4ne3_B2A.pdb | A8MT69_4ne3_B2A |
A8MT69 | 4E44 | single_chain/A8MT69_4e44_B2A.pdb | A8MT69_4e44_B2A |
A8MT69 | 4NE6 | single_chain/A8MT69_4ne6_B2A.pdb | A8MT69_4ne6_B2A |
A8MT69 | 7R5S | single_chain/A8MT69_7r5s_X2A.pdb | A8MT69_7r5s_X2A |
A8MT69 | 7XHN | single_chain/A8MT69_7xhn_X2A.pdb | A8MT69_7xhn_X2A |
A8MT69 | 7XHO | single_chain/A8MT69_7xho_X2A.pdb | A8MT69_7xho_X2A |
A8MT69 | 4DRA | single_chain/A8MT69_4dra_E2A.pdb | A8MT69_4dra_E2A |
A8MT69 | 4NE1 | single_chain/A8MT69_4ne1_B2A.pdb | A8MT69_4ne1_B2A |
A8MT69 | 4DRB | single_chain/A8MT69_4drb_J2A.pdb | A8MT69_4drb_J2A |
A9YTQ3 | 5Y7Y | single_chain/A9YTQ3_5y7y_A2A.pdb | A9YTQ3_5y7y_A2A |
In [15]:
Copied!
from protein_detective.db import powerfit_solutions
from protein_detective.db import powerfit_solutions
In [16]:
Copied!
df = powerfit_solutions(conn)
df
df = powerfit_solutions(conn)
df
Out[16]:
powerfit_run_id | structure | rank | cc | fishz | relz | translation | rotation | density_filter_id | af_id | pdb_id | pdb_file | uniprot_acc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | A8MT69_4ne6_B2A | 1 | 0.456 | 0.492 | 11.071 | [227.18, 242.53, 211.83] | [0.0, -0.0, -1.0, 0.604, -0.797, 0.0, -0.797, ... | <NA> | None | 4NE6 | session1/single_chain/A8MT69_4ne6_B2A.pdb | A8MT69 |
1 | 1 | A8MT69_4drb_J2A | 1 | 0.444 | 0.477 | 10.588 | [227.18, 242.53, 214.9] | [0.797, -0.604, 0.0, 0.604, 0.797, 0.0, 0.0, 0... | <NA> | None | 4DRB | session1/single_chain/A8MT69_4drb_J2A.pdb | A8MT69 |
2 | 1 | A8MT69_4dra_E2A | 1 | 0.443 | 0.476 | 10.402 | [214.9, 187.27, 214.9] | [1.0, -0.0, 0.0, 0.0, -0.0, 1.0, -0.0, -1.0, -... | <NA> | None | 4DRA | session1/single_chain/A8MT69_4dra_E2A.pdb | A8MT69 |
3 | 1 | A8MT69_4e44_B2A | 1 | 0.440 | 0.472 | 10.099 | [224.11, 236.39, 227.18] | [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0] | <NA> | None | 4E44 | session1/single_chain/A8MT69_4e44_B2A.pdb | A8MT69 |
4 | 1 | A8MT69_7xhn_X2A | 1 | 0.439 | 0.471 | 10.136 | [230.25, 242.53, 217.97] | [-1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0] | <NA> | None | 7XHN | session1/single_chain/A8MT69_7xhn_X2A.pdb | A8MT69 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
62752 | 1 | AF-A8MTY0-F1-model_v4 | 1463 | 0.095 | 0.095 | 5.424 | [168.85, 202.62, 236.39] | [0.0, 1.0, 0.0, 0.797, 0.0, -0.604, -0.604, 0.... | 1 | A8MTY0 | None | session1/density_filtered/AF-A8MTY0-F1-model_v... | A8MTY0 |
62753 | 1 | AF-A8MTY0-F1-model_v4 | 1462 | 0.095 | 0.095 | 5.425 | [122.8, 174.99, 147.36] | [0.797, 0.604, 0.0, 0.604, -0.797, 0.0, 0.0, 0... | 1 | A8MTY0 | None | session1/density_filtered/AF-A8MTY0-F1-model_v... | A8MTY0 |
62754 | 1 | AF-A8MTY0-F1-model_v4 | 1461 | 0.095 | 0.095 | 5.429 | [150.43, 211.83, 104.38] | [0.548, 0.548, 0.632, 0.184, -0.816, 0.548, 0.... | 1 | A8MTY0 | None | session1/density_filtered/AF-A8MTY0-F1-model_v... | A8MTY0 |
62755 | 1 | AF-A8MTY0-F1-model_v4 | 1460 | 0.095 | 0.096 | 5.437 | [224.11, 282.44, 150.43] | [0.0, 0.797, 0.604, -1.0, 0.0, -0.0, 0.0, -0.6... | 1 | A8MTY0 | None | session1/density_filtered/AF-A8MTY0-F1-model_v... | A8MTY0 |
62756 | 1 | AF-A8MTY0-F1-model_v4 | 1459 | 0.095 | 0.096 | 5.445 | [196.48, 196.48, 273.23] | [0.548, 0.548, 0.632, 0.184, -0.816, 0.548, 0.... | 1 | A8MTY0 | None | session1/density_filtered/AF-A8MTY0-F1-model_v... | A8MTY0 |
62757 rows × 13 columns
In [17]:
Copied!
df["cc"].plot(kind="hist", title="Histogram of CC Values", bins=40)
df["cc"].plot(kind="hist", title="Histogram of CC Values", bins=40)
Out[17]:
<Axes: title={'center': 'Histogram of CC Values'}, ylabel='Frequency'>
Show the top fitted models and their scores.
In [ ]:
Copied!
conn.execute("""
SELECT
f.* EXCLUDE (unfitted_model_file),
s.* EXCLUDE (pdb_file),
coalesce(f.unfitted_model_file, s.pdb_file) AS unfitted_model_file,
FROM fitted_models AS f
JOIN solutions AS s USING (powerfit_run_id, structure, rank)""").df()
conn.execute("""
SELECT
f.* EXCLUDE (unfitted_model_file),
s.* EXCLUDE (pdb_file),
coalesce(f.unfitted_model_file, s.pdb_file) AS unfitted_model_file,
FROM fitted_models AS f
JOIN solutions AS s USING (powerfit_run_id, structure, rank)""").df()
Out[ ]:
powerfit_run_id | structure | rank | fitted_model_file | powerfit_run_id_1 | structure_1 | rank_1 | cc | fishz | relz | translation | rotation | density_filter_id | af_id | pdb_id | uniprot_acc | unfitted_model_file | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | A8MT69_4ne6_B2A | 1 | session1/powerfit/1/A8MT69_4ne6_B2A/fit_1.pdb | 1 | A8MT69_4ne6_B2A | 1 | 0.456 | 0.492 | 11.071 | [227.18, 242.53, 211.83] | [0.0, -0.0, -1.0, 0.604, -0.797, 0.0, -0.797, ... | <NA> | None | 4NE6 | A8MT69 | session1/single_chain/A8MT69_4ne6_B2A.pdb |
1 | 1 | A8MT69_4ne6_B2A | 2 | session1/powerfit/1/A8MT69_4ne6_B2A/fit_2.pdb | 1 | A8MT69_4ne6_B2A | 2 | 0.438 | 0.470 | 10.571 | [138.15, 153.5, 138.15] | [-0.797, 0.604, 0.0, 0.604, 0.797, 0.0, 0.0, 0... | <NA> | None | 4NE6 | A8MT69 | session1/single_chain/A8MT69_4ne6_B2A.pdb |
2 | 1 | A8MT69_4ne6_B2A | 3 | session1/powerfit/1/A8MT69_4ne6_B2A/fit_3.pdb | 1 | A8MT69_4ne6_B2A | 3 | 0.428 | 0.457 | 10.297 | [227.18, 138.15, 227.18] | [-0.0, 0.0, 1.0, 0.0, 1.0, 0.0, -1.0, 0.0, -0.0] | <NA> | None | 4NE6 | A8MT69 | session1/single_chain/A8MT69_4ne6_B2A.pdb |
3 | 1 | A8MT69_4ne6_B2A | 4 | session1/powerfit/1/A8MT69_4ne6_B2A/fit_4.pdb | 1 | A8MT69_4ne6_B2A | 4 | 0.407 | 0.432 | 9.720 | [276.3, 248.67, 153.5] | [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0] | <NA> | None | 4NE6 | A8MT69 | session1/single_chain/A8MT69_4ne6_B2A.pdb |
4 | 1 | A8MT69_4ne6_B2A | 5 | session1/powerfit/1/A8MT69_4ne6_B2A/fit_5.pdb | 1 | A8MT69_4ne6_B2A | 5 | 0.399 | 0.422 | 9.498 | [184.2, 190.34, 205.69] | [0.816, 0.548, 0.184, -0.184, 0.548, -0.816, -... | <NA> | None | 4NE6 | A8MT69 | session1/single_chain/A8MT69_4ne6_B2A.pdb |
In [ ]:
Copied!