The Chemical Component Dictionary is an external reference file describing all residue and small molecule components found in PDB entries. This dictionary contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands, and solvent molecules.
BioJava uses the Chemical Component Dictionary to achieve a chemically correct representation of each group. To see how this works, consider how selenomethionine and water are handled:
// Load PDB entry 1A62
Structure structure = StructureIO.getStructure("1A62");
for (Chain chain : structure.getChains()) {
    for (Group group : chain.getAtomGroups()) {
        // Report only selenomethionine (MSE) and water (HOH) groups
        if (group.getPDBName().equals("MSE") || group.getPDBName().equals("HOH")) {
            System.out.println(group.getPDBName() + " is a group of type " + group.getType());
        }
    }
}
This produces the following output:
MSE is a group of type amino
MSE is a group of type amino
MSE is a group of type amino
HOH is a group of type hetatm
HOH is a group of type hetatm
HOH is a group of type hetatm
...
As you can see, although MSE is flagged as HETATM in the PDB file, BioJava still represents it correctly as an amino acid. The key is that the definition file for MSE flags it as "L-PEPTIDE LINKING", and BioJava uses this flag to decide the group type.
Note: Selenomethionine is a naturally occurring amino acid containing selenium. It has the ID MSE in the Chemical Component Dictionary.
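The classification rule described above can be illustrated with a simplified, library-independent sketch. The names `ChemCompTypeSketch`, `Kind` and `classify` are hypothetical and are not part of BioJava. The sketch recognises only a handful of type strings, whereas the real dictionary defines many more, so treat it as an illustration of the idea rather than as BioJava's actual logic.

```java
import java.util.Locale;

// Hypothetical illustration, NOT BioJava code: maps the "type" string of a
// chemical component definition to a coarse group kind.
final class ChemCompTypeSketch {

    enum Kind { AMINO, NUCLEOTIDE, HETATM }

    static Kind classify(String chemCompType) {
        if (chemCompType == null) {
            return Kind.HETATM; // no definition available: fall back to hetatm
        }
        switch (chemCompType.trim().toUpperCase(Locale.ROOT)) {
            case "L-PEPTIDE LINKING":
            case "D-PEPTIDE LINKING":
            case "PEPTIDE LINKING":
                return Kind.AMINO;      // part of a polypeptide chain, e.g. MSE
            case "DNA LINKING":
            case "RNA LINKING":
                return Kind.NUCLEOTIDE; // part of a nucleic acid chain
            default:
                return Kind.HETATM;     // everything else, e.g. NON-POLYMER water
        }
    }
}
```

With this rule, the MSE definition ("L-PEPTIDE LINKING") is classified as an amino acid regardless of the HETATM record type in the PDB file, while a non-polymer component such as water stays a hetatm group.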
By default, BioJava retrieves the full chemical component definitions provided by the PDB. This ensures that the user gets a correct representation: for example, ligands are distinguished from the polypeptide chain and chemically modified residues are resolved correctly.
The behaviour is configurable by setting a property in the ChemCompGroupFactory singleton:
- Use a minimal built-in set of Chemical Component Definitions. This covers only the most frequent chemical components and does not guarantee a correct representation, but it is fast and does not require network access.
ChemCompGroupFactory.setChemCompProvider(new ReducedChemCompProvider());
- Load all Chemical Component Definitions at startup (slow startup and higher memory use, but no further delays later on).
ChemCompGroupFactory.setChemCompProvider(new AllChemCompProvider());
- Fetch missing Chemical Component Definitions on the fly (small download and parsing delays every time a new chemical component is encountered). This is the default behaviour since 4.2.0. Note that the chemical component files are cached in the local file system for subsequent use.
ChemCompGroupFactory.setChemCompProvider(new DownloadChemCompProvider());
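The fetch-on-the-fly strategy with a local file cache can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of `DownloadChemCompProvider`: the class `CachingDefinitionStore`, the `.cif` file naming and the `fetcher` function (which stands in for the network download) are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical sketch, NOT BioJava's DownloadChemCompProvider: fetch a
// definition only when it is missing from a local cache directory.
final class CachingDefinitionStore {

    private final Path cacheDir;
    private final Function<String, String> fetcher; // stands in for the download

    CachingDefinitionStore(Path cacheDir, Function<String, String> fetcher) {
        this.cacheDir = cacheDir;
        this.fetcher = fetcher;
    }

    String get(String id) {
        Path file = cacheDir.resolve(id + ".cif"); // assumed naming scheme
        try {
            if (Files.exists(file)) {
                // Cache hit: no download needed
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            // Cache miss: fetch once, then store for subsequent calls
            String definition = fetcher.apply(id);
            Files.createDirectories(cacheDir);
            Files.write(file, definition.getBytes(StandardCharsets.UTF_8));
            return definition;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The first request for a given ID pays the fetch cost; later requests for the same ID read the cached file, which mirrors the trade-off described for the on-the-fly option above.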