Thursday, 14 January 2016

Structural Bioinformatics - Part 1: The Basics, Protein Structure Determination, and Databases

Structure of Hemoglobin


This article is still part of the “Sebuah Tulisan Bioinformatika” (“A Piece of Bioinformatics Writing”) series, which I previously wrote in Bahasa Indonesia. Starting with this article, I would like to push my luck a little further by writing in English. This time, I’d like to write about the basics of structural bioinformatics. I hope you enjoy the story :)
The story starts with a definition. Structural bioinformatics is the branch of bioinformatics that deals with the three-dimensional structures of biological macromolecules, namely DNA, RNA, and proteins. Nowadays, protein structures far outnumber DNA and RNA structures in the structural databases. Part of the reason is probably historical: the first macromolecular crystal structures, myoglobin and hemoglobin around 1960, were proteins solved by X-ray crystallography, and the Protein Data Bank was founded in 1971 to archive such structures. Therefore, this story will focus mainly on protein structural bioinformatics.
As you might already know, protein structure is described at four hierarchical levels, from primary to quaternary structure. The primary structure is the sequence (string) of amino acids that make up the protein. For example, the sequence WHYGARTFED is a primary structure composed of tryptophan, histidine, tyrosine, glycine, alanine, arginine, and so on. The full list of one-letter amino acid codes is easy to find online.
The secondary structure consists of regular local structures, such as helices and strands, stabilized by hydrogen bonds between backbone atoms. Commonly, eight types of secondary structure are distinguished, as defined in DSSP (Dictionary of Secondary Structure of Proteins) by Kabsch and Sander in 1983: 3₁₀ helix (G), alpha helix (H), pi helix (I), isolated beta bridge (B), extended beta strand (E), hydrogen-bonded turn (T), bend (S), and loop or irregular region (C; DSSP itself writes a blank). To reduce the complexity, these states are often grouped into three larger classes: helix (G, H, and I), strand (B and E), and loop (T, S, and C).
The tertiary structure is the overall three-dimensional fold of a single protein chain. It is stabilized by interactions between residues that may be far apart in the sequence, such as hydrophobic and van der Waals interactions. Sometimes the fold is further reinforced by covalent disulfide bridges between pairs of cysteine residues. A protein needs its native tertiary structure to function properly. However, some large protein complexes need a higher-order structure to function, which we call the quaternary structure: several tertiary-structure subunits assembled together. Hemoglobin is a classic example of a protein with a quaternary structure; it is composed of two alpha-globin and two beta-globin subunits.
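Since both the primary structure and a DSSP secondary-structure assignment are usually handled as plain strings, a small Python sketch can make the two ideas above concrete: decoding one-letter codes into residue names, and collapsing the eight DSSP states into three classes. The dictionary and function names are my own hypothetical choices, the three-class grouping follows the one described above, and real DSSP output may contain symbols outside these eight, which this sketch simply rejects.

```python
# One-letter codes for the 20 standard amino acids.
ONE_LETTER_TO_NAME = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# Eight DSSP states reduced to three classes (H = helix, E = strand, C = loop).
# A blank or '-' is also treated as loop, since tools differ in how they write it.
DSSP_TO_THREE_STATE = {
    "G": "H", "H": "H", "I": "H",                      # helices
    "B": "E", "E": "E",                                # strands
    "T": "C", "S": "C", "C": "C", " ": "C", "-": "C",  # loops
}


def decode_sequence(seq: str) -> list[str]:
    """Translate a one-letter primary structure into full residue names."""
    try:
        return [ONE_LETTER_TO_NAME[ch] for ch in seq.upper()]
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]!r}") from None


def reduce_dssp(ss: str) -> str:
    """Map an 8-state DSSP string to a 3-state H/E/C string."""
    try:
        return "".join(DSSP_TO_THREE_STATE[ch] for ch in ss)
    except KeyError as err:
        raise ValueError(f"unknown DSSP state: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(decode_sequence("WHYGARTFED")[:3])  # ['tryptophan', 'histidine', 'tyrosine']
    print(reduce_dssp("HHHGGEEBTSC"))          # HHHHHEEECCC
```

Keeping the mappings as plain dictionaries makes the grouping easy to change, for example if you prefer a scheme that counts the 3₁₀ helix (G) as loop rather than helix.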
At present, there are three main experimental methods for determining protein structures. Ordered from the earliest to the most recent, they are X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and electron microscopy (EM). Each method has its own advantages and drawbacks.
In X-ray crystallography, an X-ray beam is directed at a crystallized protein. The X-rays are diffracted by the electrons in the crystal, and the resulting diffraction pattern is converted into an electron-density map, which is then used to build a model structure of the protein. Crystallization and the electron-density map together allow a high-resolution model to be built. However, crystallization is almost always the bottleneck, since not every protein can be crystallized.
NMR spectroscopy avoids this problem simply because it does not require crystallization. In NMR, a purified protein solution is placed in a very strong magnetic field and irradiated with radio waves. The resulting resonance signals are analyzed to determine which atomic nuclei lie close to one another, and the model structure is then built from the positions of these nuclei relative to each other. NMR gives structures of intermediate resolution, lower than X-ray, but because no crystal is needed, the method can be used for proteins that are hard to crystallize, such as some transmembrane proteins.
EM is the most recently developed of the three methods. An electron beam is passed through the sample, and images of the protein complex taken from many different angles are combined computationally into a 3D structure, similar to the way electron microscopy is used to visualize cell structures. EM can model large and even huge protein complexes that the other two methods cannot handle. In terms of resolution, however, EM-derived models typically have the lowest resolution of the three methods.
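For readers curious about the X-ray step, where the diffraction pattern is converted into an electron-density map, here is the standard Fourier synthesis used in crystallography (textbook background, not something this story derives). V is the unit-cell volume, (x, y, z) are fractional coordinates in the cell, (h, k, l) index the measured reflections, |F(hkl)| is the structure-factor amplitude obtained from the measured intensities (I ∝ |F|²), and α(hkl) is its phase:

$$
\rho(x, y, z) = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l} \lvert F(hkl) \rvert \, e^{\,i\alpha(hkl)} \, e^{-2\pi i (hx + ky + lz)}
$$

The amplitudes come straight from the diffraction experiment, but the phases are not measured, so they have to be estimated by other means (the well-known "phase problem"). This is one reason why turning diffraction data into a model takes real work.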
After a model structure has been generated, what comes next? Well, in this digital era, where uploading pictures has become practically the definition of “existing”, the same thing happens to protein structures. The Worldwide Protein Data Bank (wwPDB) is the primary archive to which researchers all over the world submit their model structures for deposition. It is run by partner sites in different countries, including these three:
1. Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) in the USA. URL: http://www.rcsb.org/pdb/home/home.do
2. Protein Data Bank in Europe (PDBe) in the UK. URL: http://www.ebi.ac.uk/pdbe/node/1
3. Protein Data Bank Japan (PDBj) in Japan. URL: http://pdbj.org/
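Once a structure is deposited, anyone can download it by its four-character PDB ID. Here is a minimal Python sketch using only the standard library, assuming the RCSB file server pattern `https://files.rcsb.org/download/<ID>.pdb` (or `.cif`); the helper names `rcsb_file_url` and `fetch_entry` are hypothetical, and 4HHB (a human deoxyhemoglobin entry) is just an example ID.

```python
import urllib.request

# Assumed RCSB download pattern; check the RCSB documentation if it changes.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.{fmt}"


def rcsb_file_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build the download URL for one entry (hypothetical helper)."""
    pdb_id = pdb_id.strip().upper()
    # Classic PDB IDs: 4 alphanumeric characters, the first one a digit.
    if len(pdb_id) != 4 or not pdb_id.isalnum() or not pdb_id[0].isdigit():
        raise ValueError(f"not a classic 4-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id, fmt=fmt)


def fetch_entry(pdb_id: str, fmt: str = "pdb", timeout: float = 30.0) -> str:
    """Download an entry as text. Requires network access."""
    url = rcsb_file_url(pdb_id, fmt)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the URL; call fetch_entry("4hhb") to actually download.
    print(rcsb_file_url("4hhb"))  # https://files.rcsb.org/download/4HHB.pdb
```

Very large entries may not be available in the legacy `.pdb` format, so asking for `fmt="cif"` is the safer default for those.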
These three databases also accept structures of other biological macromolecules such as DNA and RNA. However, as the data grew, a separate database was built specifically for DNA and RNA structures. It is called the Nucleic Acid Database (NDB), and you can access it at http://ndbserver.rutgers.edu/. Together, these databases help researchers all around the world deposit and exchange structural data so that they can take their research one step further. Well, I think that’s enough for the first part of the story. In the next part, I will tell you more about structural databases as well as file formats.

Victor
