The Chemical Component Dictionary

The Chemical Component Dictionary is an external reference file describing all residue and small molecule components found in PDB entries. This dictionary contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands, and solvent molecules.

How Does BioJava Decide Which Groups Are Amino Acids?

BioJava uses the Chemical Component Dictionary to achieve a chemically correct representation of each group. To see how this works, let's look at how selenomethionine and water are handled:

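import org.biojava.nbio.structure.Chain;
import org.biojava.nbio.structure.Group;
import org.biojava.nbio.structure.Structure;
import org.biojava.nbio.structure.StructureIO;

// Load PDB entry 1A62
Structure structure = StructureIO.getStructure("1A62");

// Report the type BioJava assigns to every selenomethionine and water group
for (Chain chain : structure.getChains()) {
    for (Group group : chain.getAtomGroups()) {
        String name = group.getPDBName();
        if ("MSE".equals(name) || "HOH".equals(name)) {
            System.out.println(name + " is a group of type " + group.getType());
        }
    }
}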

This produces the following output:

MSE is a group of type amino
MSE is a group of type amino
MSE is a group of type amino
HOH is a group of type hetatm
HOH is a group of type hetatm
HOH is a group of type hetatm
...

As you can see, although MSE is flagged as HETATM in the PDB file, BioJava still represents it correctly as an amino acid. The key is that the definition file for MSE flags it as "L-PEPTIDE LINKING", and BioJava uses this flag to classify the group.

Note: Selenomethionine is a naturally occurring amino acid containing selenium. It has the ID MSE in the Chemical Component Dictionary.
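
To make the idea concrete, here is a minimal, self-contained sketch of how a component's type flag could be mapped to a group category. It is not BioJava's actual implementation: the class `ChemCompTypeSketch`, the enum `Kind`, and the method `classify` are hypothetical names chosen for illustration, and the matching rules are deliberately simplified (the dictionary defines more type values than are handled here).

```java
import java.util.Locale;

/** Hypothetical sketch; not BioJava's actual classification code. */
final class ChemCompTypeSketch {

    /** Simplified group categories, assumed for this example only. */
    enum Kind { AMINO, NUCLEOTIDE, HETATM }

    /**
     * Classifies a component from the type string of its dictionary entry,
     * e.g. "L-PEPTIDE LINKING" for MSE or "NON-POLYMER" for HOH.
     */
    static Kind classify(String chemCompType) {
        if (chemCompType == null) {
            return Kind.HETATM; // no definition available: fall back to hetatm
        }
        String t = chemCompType.trim().toUpperCase(Locale.ROOT);
        if (t.endsWith("PEPTIDE LINKING")) {
            return Kind.AMINO; // covers "L-PEPTIDE LINKING", "D-PEPTIDE LINKING", ...
        }
        if (t.endsWith("DNA LINKING") || t.endsWith("RNA LINKING")) {
            return Kind.NUCLEOTIDE;
        }
        return Kind.HETATM; // everything else, including "NON-POLYMER"
    }

    public static void main(String[] args) {
        System.out.println("MSE -> " + classify("L-PEPTIDE LINKING"));
        System.out.println("HOH -> " + classify("NON-POLYMER"));
    }
}
```

The point of the sketch is that the decision depends on the dictionary entry rather than on the ATOM/HETATM record type in the PDB file, which is why MSE ends up as an amino acid.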

How to Access Chemical Component Definitions

By default, BioJava retrieves the full chemical component definitions provided by the PDB. This ensures that the user gets a correct representation: for example, ligands are distinguished from the polypeptide chain and chemically modified residues are resolved correctly.

The behaviour is configurable by setting a property in the ChemCompGroupFactory singleton:

  1. Use a minimal built-in set of Chemical Component Definitions. This covers only the most frequent chemical components and does not guarantee a correct representation, but it is fast and does not require network access.
     ChemCompGroupFactory.setChemCompProvider(new ReducedChemCompProvider());
  2. Load all Chemical Component Definitions at startup. Startup is slow and more memory is required, but there are no further delays later on.
     ChemCompGroupFactory.setChemCompProvider(new AllChemCompProvider());
  3. Fetch missing Chemical Component Definitions on the fly. This incurs a small download and parsing delay every time a new chemical component is encountered. This is the default behaviour since 4.2.0. The chemical component files are cached in the local file system for subsequent use.
     ChemCompGroupFactory.setChemCompProvider(new DownloadChemCompProvider());
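
The trade-off behind the on-the-fly option can be illustrated with a small fetch-on-miss cache. This is only a sketch under stated assumptions, not the code of `DownloadChemCompProvider`: the class `CachingDefinitionLookup` and its methods are hypothetical, the `fetcher` function stands in for the download and parsing step, and an in-memory map stands in for the files cached on the local file system.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical fetch-on-miss cache; not BioJava's DownloadChemCompProvider. */
final class CachingDefinitionLookup {

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> fetcher; // stands in for download + parse
    private int fetchCount = 0;

    CachingDefinitionLookup(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the definition for an ID, fetching it only on the first request. */
    String get(String id) {
        String definition = cache.get(id);
        if (definition == null) {
            definition = fetcher.apply(id); // the slow path: paid once per new component
            fetchCount++;
            cache.put(id, definition);
        }
        return definition;
    }

    /** Number of times the slow fetch path was taken. */
    int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        CachingDefinitionLookup lookup =
                new CachingDefinitionLookup(id -> "definition of " + id);
        lookup.get("MSE");
        lookup.get("HOH");
        lookup.get("MSE"); // served from the cache, no second fetch
        System.out.println("fetches: " + lookup.fetchCount());
    }
}
```

Loading everything at startup corresponds to filling the cache eagerly, while the reduced set corresponds to a small fixed map with no fetcher at all.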
