Classwork for BIMM143
Laura Lu (PID: A17844089)
The EBI maintains the largest database of AlphaFold structure prediction models at: https://alphafold.ebi.ac.uk
In the last class (before Halloween) we saw that the PDB held 244,290 structures (as of Oct 2025).
The total number of protein sequences in UniProtKB is 199,579,901.
Key Point: Only a tiny fraction of sequence space (about 0.12%) has structural coverage.
244290/199579901 * 100
[1] 0.1224021
AFDB is attempting to address this gap…
AlphaFold reports two “quality scores”. The first, pLDDT, is a per-residue score (i.e. one value for each amino acid). The second, PAE, measures the confidence in the relative position of two residues (i.e. one score for every pair of residues).
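To make the difference in shape between the two scores concrete, here is a minimal sketch in R. All values are made up for a hypothetical 5-residue chain (they do not come from the HIV-PR run), and the pLDDT cutoff of 70 is only a commonly used rule of thumb.

```r
# Toy example only: scores are invented for a hypothetical 5-residue chain
n_res <- 5

# pLDDT: one value per residue, on a 0-100 scale (higher = more confident)
plddt <- c(92.1, 88.4, 71.0, 45.3, 30.2)

# PAE: one value per residue pair, in Angstroms (lower = more confident)
set.seed(1)
pae <- matrix(runif(n_res^2, min = 0, max = 30), nrow = n_res)
diag(pae) <- 0  # a residue has no positional error relative to itself

length(plddt)  # a vector of length n_res
dim(pae)       # an n_res x n_res matrix

# Residues falling below the assumed confidence cutoff of 70
low_conf <- which(plddt < 70)
low_conf
```

So a chain of N residues gives N pLDDT values but N x N PAE values, which is why pLDDT is usually shown as a line plot or structure coloring and PAE as a heatmap.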
Figure of 5 generated HIV-PR models

and the top ranked model colored by chain.

pLDDT score for model 1

and model 5

Read key result files into R. The first thing I need to know is what my results directory/folder is called (its name is different for every AlphaFold run/job).
results_dir <- "hivprdimer_23119:"
# File names for all PDB models
pdb_files <- list.files(path = results_dir,
                        pattern = "\\.pdb$",  # escape the dot and anchor to the end of the name
                        full.names = TRUE)
# Print our PDB file names
basename(pdb_files)
character(0)
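An empty result such as `character(0)` means `list.files()` found nothing, which usually points to a folder name that does not match what is on disk (for example a stray character in the path) or a pattern that matches no files. The sketch below shows one way to catch this early. `list_pdb_models()` is a hypothetical helper written for illustration, not a base R or bio3d function, and the demo folder and file names are placeholders.

```r
# Hypothetical helper: list PDB models and fail loudly if the folder is missing
list_pdb_models <- function(dir) {
  if (!dir.exists(dir)) {
    stop("Results folder not found: ", dir)
  }
  files <- list.files(path = dir, pattern = "\\.pdb$", full.names = TRUE)
  if (length(files) == 0) {
    warning("No .pdb files found in: ", dir)
  }
  files
}

# Demonstration with a temporary folder holding two empty placeholder files
demo_dir <- file.path(tempdir(), "demo_results")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("model_1.pdb", "model_2.pdb")))

demo_files <- list_pdb_models(demo_dir)
basename(demo_files)
```

Calling `list_pdb_models(results_dir)` on the real results folder would then stop with a clear message instead of silently returning an empty vector.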