Structural comparison of the alpha and beta globin chains, both taken from human hemoglobin (PDBid 4HHB).
So, after a little warm-up on the basics of protein structure, I would now
like to discuss protein structure databases and file formats, followed by
protein structure comparison and alignment. In Part 1, I briefly discussed
the primary databases of biological macromolecule structures: wwPDB, which
mainly focuses on protein structures, and NDB, which focuses on nucleic acid
structures.
Now I would like to discuss one of the wwPDB member sites, RCSB PDB. As of
January 2016, more than 100,000 structures had been deposited (115,306, to be
precise). That is a lot, but it is small compared with the sequence
databases, which hold more than a hundred million sequences. Okay, back to
the PDB. Just as with its sequence database counterparts, we can download
structural data from the PDB. Each structure is identified by a unique
identifier called a PDBid. A PDBid consists of four alphanumeric characters
and is permanent and immutable, meaning that as long as the structure exists
it keeps that specific PDBid. For example, human hemoglobin has the PDBid
4HHB. Type “4HHB” in the PDB search box and you will be directed to the page
for the hemoglobin structure. The structure page presents several features of
the protein, such as a structure summary, 3D view, annotations, sequence,
sequence and structure similarity, and related literature. We can also
download a structure file from this page.
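The four-character rule for a PDBid is easy to check in code before asking
for a file. Below is a minimal Python sketch, not an official client: the
helper names are hypothetical, and the URL pattern assumes the RCSB file
server layout at files.rcsb.org, which may change. The two `fmt` values name
two of the file formats described next.

```python
import re
import urllib.request

# Assumption: RCSB serves entry files under this base URL.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"

# Assumed file extensions for two of the structure file formats.
EXTENSIONS = {"pdb": "pdb", "mmcif": "cif"}


def is_valid_pdbid(pdb_id):
    """Return True if pdb_id is exactly four alphanumeric ASCII characters."""
    return re.fullmatch(r"[0-9A-Za-z]{4}", pdb_id) is not None


def download_url(pdb_id, fmt="pdb"):
    """Build the (assumed) download URL for one entry in the given format."""
    if not is_valid_pdbid(pdb_id):
        raise ValueError(f"not a four-character PDBid: {pdb_id!r}")
    if fmt not in EXTENSIONS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id.upper()}.{EXTENSIONS[fmt]}"


def fetch_structure(pdb_id, fmt="pdb"):
    """Download the structure file and return its text (needs network access)."""
    with urllib.request.urlopen(download_url(pdb_id, fmt)) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access.
print(download_url("4hhb"))  # prints https://files.rcsb.org/download/4HHB.pdb
```

Calling `fetch_structure("4HHB")` would then return the file contents as
text, provided the assumed URL pattern still holds.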
There are three file formats for structure files: PDB, mmCIF, and XML. All
three contain information about the relative position in 3D space of every
atom in the protein. They differ in how easily the information can be parsed
for further computational analysis. The PDB format is the earliest one
developed to accommodate structural data. It is written in a table-like
layout that lists, for each atom, its serial number, amino acid residue,
protein subunit (chain), and position as xyz coordinates. This format is
relatively easy for people to read and understand, but it is harder for
programs to handle reliably, because the meaning of each field depends on
its fixed column position in a plain text line. The other two formats, mmCIF
and XML, label every value with a named data item, much like columns in a
relational database table, so they are easier for programs to parse for
analyses such as grouping or sorting residues.
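To make the difference concrete, here is a small Python sketch that reads
the same atom from a PDB-style line and from an mmCIF-style table. The
column positions follow the published PDB format description for ATOM
records, and the `_atom_site` item names come from the mmCIF dictionary. The
sample values and function names are invented for illustration, and the
mmCIF reader is deliberately simplified: it ignores quoted and multi-line
values.

```python
# One ATOM record assembled piece by piece so the column widths are visible.
# The values are invented for illustration, not copied from a real entry.
SAMPLE_PDB_LINE = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " CA "        # columns 13-16: atom name
    " "           # column 17: alternate location
    "VAL"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and blanks
    "  11.500"    # columns 31-38: x coordinate
    "  -2.250"    # columns 39-46: y coordinate
    "   3.000"    # columns 47-54: z coordinate
)

# The same atom as a simplified mmCIF loop: item names first, then data rows.
SAMPLE_MMCIF_LOOP = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 CA VAL A 1 11.500 -2.250 3.000
"""


def parse_pdb_atom(line):
    """Read one ATOM record by slicing fixed column positions."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def parse_mmcif_loop(text):
    """Read one simple loop_ table into a list of {item name: value} rows."""
    names, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            names.append(line)  # a column label
        else:
            rows.append(dict(zip(names, line.split())))  # a data row
    return rows


pdb_atom = parse_pdb_atom(SAMPLE_PDB_LINE)
cif_atom = parse_mmcif_loop(SAMPLE_MMCIF_LOOP)[0]
print(pdb_atom["name"], pdb_atom["x"], pdb_atom["y"], pdb_atom["z"])
print(cif_atom["_atom_site.label_atom_id"], cif_atom["_atom_site.Cartn_x"])
```

In the PDB reader, shifting a single character breaks every field after it,
while the mmCIF reader finds values by name. For real work, an established
parsing library is a safer choice than hand-written slicing.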
Having seen a PDB file, does it give you an image of what the protein looks
like? Of course, it is quite hard to imagine the overall protein structure
just by mentally plotting every atom from its coordinates. This is why in
earlier times (around the 1970s) protein structures were built physically
using molecular models (just like in chemistry class) in order to visualize
them. Nowadays there are various sophisticated molecular visualization
programs that let us view a protein in different styles: wireframe, ball and
stick, spacefill, and ribbon. Some of these programs, such as Cn3D, RasMol,
Jmol, YASARA View, and UCSF Chimera, are free to download. Most of them can
read PDB files, and several also read the newer formats; check each
program's documentation for the formats it supports.
One common analysis of protein structure is to measure how well one
structure fits onto another. This kind of analysis lets us see the
structural variation among similar or homologous structures. The difference
between structures is measured as the distance between equivalent atoms in
the compared structures. This measure is called the root-mean-square
deviation, or RMSD for short. Commonly, only the RMSD between equivalent
alpha-carbon atoms is calculated (see the figure above). Just as with
visualization programs, there are several free-to-download or web-based
protein comparison programs, such as CE, UCSF Chimera, Expresso, FATCAT,
MAMMOTH, and VAST+. Besides structural comparison, these programs can also
perform structural alignment, in which the amino acid residues of two
proteins are aligned to each other based on their positions in both
structures. Structural alignment is generally considered more reliable than
ordinary sequence alignment, since structure is more conserved than
sequence: a change that disrupts the fold can ruin the protein, so the
structure tends to be preserved even as the sequence drifts.
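The RMSD described above is the square root of the mean squared distance
between paired atoms. Here is a minimal Python sketch with invented
coordinates and a hypothetical function name. It assumes the two structures
are already superimposed and that the atoms are already paired in the same
order; the comparison programs listed above also search for the best
superposition, which this sketch does not attempt.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes atom i in coords_a is equivalent to atom i in coords_b and that
    both structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented alpha-carbon positions: the first two atoms coincide and the
# last two are displaced by 2 units each.
chain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 2.0, 0.0), (4.5, 2.0, 0.0)]

print(round(rmsd(chain_a, chain_b), 3))  # sqrt((0 + 0 + 4 + 4) / 4), about 1.414
```

Identical structures give an RMSD of 0, and larger values mean the paired
atoms lie farther apart on average.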
Victor