Package Scientific :: Package IO :: Module PDB

Module PDB

Parsing and writing of Protein Data Bank (PDB) files

This module provides classes that represent PDB (Protein Data Bank) files and configurations contained in PDB files. It provides access to PDB files on two levels: low-level (line by line) and high-level (chains, residues, and atoms).

Caution: The PDB file format has been heavily abused, and it is probably impossible to write code that can deal with all variants correctly. This module tries to read the widest possible range of PDB files, but gives priority to a correct interpretation of the PDB format as defined by the Brookhaven National Laboratory.

Atom names are a particular problem. The PDB file format specifies that the first two characters of the four-character atom name contain the right-justified chemical element symbol. A later modification allowed the initial space in hydrogen names to be replaced by a digit. Many programs ignore all this and treat the name as an arbitrary left-justified four-character name. This makes it difficult to extract the chemical element accurately; most programs write the name "CA" for C_alpha in such a way that it actually stands for a calcium atom. For this reason a dedicated element field was added later, but only a few files use it. In the absence of an element field, the code in this module attempts to guess the element using all the information available.
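
The ambiguity described above can be illustrated with a minimal sketch. It is not the algorithm used by this module, whose guessing rules are not documented here; the helper name guess_element, the is_hetatm flag, and the short element table are assumptions made for illustration only.

```python
# Hypothetical helper, NOT part of Scientific.IO.PDB.
# A small subset of two-letter element symbols, for illustration only.
TWO_LETTER_ELEMENTS = {"CA", "FE", "ZN", "MG", "NA", "CL", "MN", "CU", "SE", "BR"}


def guess_element(name_field, is_hetatm=False):
    """Guess the element symbol from a four-character PDB atom name field."""
    if len(name_field) != 4:
        raise ValueError("atom name field must be exactly four characters")
    head = name_field[:2].upper()
    # A leading space or digit means a one-letter element sits in column 2.
    if head[0] == " " or head[0].isdigit():
        return head[1]
    # Both columns hold letters. Assumed heuristic: accept a two-letter
    # element only for heteroatoms, because a left-justified "CA  " in an
    # amino acid residue almost always means C-alpha, not calcium.
    if is_hetatm and head in TWO_LETTER_ELEMENTS:
        return head.capitalize()
    return head[0]


print(guess_element(" CA "))                  # correctly placed C-alpha
print(guess_element("CA  "))                  # left-justified C-alpha
print(guess_element("CA  ", is_hetatm=True))  # calcium ion
print(guess_element("1HB "))                  # hydrogen with leading digit
```

The same four characters "CA  " yield different elements depending on context, which is why an explicit element field is the more reliable source whenever a file provides one.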

The low-level routines in this module do not try to deal with the atom name problem; they return and expect four-character atom names including spaces in the correct positions. The high-level routines use atom names without leading or trailing spaces, but provide and use the element field whenever possible. For output, they use the element field to place the atom name correctly, and for input, they construct the element field content from the atom name if no explicit element field is found in the file.
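
The difference between the two name conventions can be sketched as follows. This is an illustration under stated assumptions, not the module's own code: the function place_atom_name is hypothetical, and it applies only the column rule described above (one-letter elements start in the second column, two-letter elements in the first).

```python
# Hypothetical helper, NOT part of Scientific.IO.PDB.
def place_atom_name(name, element):
    """Return the four-character name field for a stripped atom name."""
    name = name.strip()
    if len(name) > 4:
        raise ValueError("atom name longer than four characters")
    # Two-letter elements and full four-character names start in column 1.
    if len(element.strip()) == 2 or len(name) == 4:
        return name.ljust(4)
    # One-letter elements start in column 2, leaving column 1 blank.
    return (" " + name).ljust(4)


# High-level names carry no padding; the low-level field is always 4 wide.
print(repr(place_atom_name("CA", "C")))    # C-alpha
print(repr(place_atom_name("CA", "Ca")))   # calcium
print(repr(" CA ".strip()))                # low-level field back to high-level name
```

Going from the low-level field to the high-level name is a plain strip, but the reverse direction needs the element, which is why the high-level routines keep track of it.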

Except where indicated, numerical values use the same units and conventions as specified in the PDB format description.

Example:

 >>> from Scientific.IO.PDB import Structure
 >>> conf = Structure('example.pdb')
 >>> print(conf)
 >>> for residue in conf.residues:
 ...     for atom in residue:
 ...         print(atom)
Classes
  AminoAcidResidue    Amino acid residue in a PDB file
  Atom                Atom in a PDB structure
  Chain               Chain of PDB residues
  Group               Atom group (residue or molecule) in a PDB file
  HetAtom             HetAtom in a PDB structure
  Molecule            Molecule in a PDB file
  NucleotideChain     Nucleotide chain in a PDB file
  NucleotideResidue   Nucleotide residue in a PDB file
  PDBFile             PDB file with access at the record level
  PeptideChain        Peptide chain in a PDB file
  Residue
  ResidueNumber       PDB residue number
  Structure           A high-level representation of the contents of a PDB file
Functions
  defineAminoAcidResidue(symbol)
      Make the parser recognize a particular residue type as an amino acid residue
  defineNucleicAcidResidue(symbol)
      Make the parser recognize a particular residue type as a nucleic acid residue
Variables
  amino_acids = ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'CYX', 'GLN'...
  nucleic_acids = ['A', 'C', 'G', 'I', 'T', 'U', '+A', '+C', '+G...
Function Details

defineAminoAcidResidue(symbol)

Make the parser recognize a particular residue type as an amino acid residue

Parameters:
  • symbol (str) - the three-letter code for an amino acid

defineNucleicAcidResidue(symbol)

Make the parser recognize a particular residue type as a nucleic acid residue

Parameters:
  • symbol (str) - the one-letter code for a nucleic acid
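
What it means to make the parser "recognize" a residue type can be pictured with a small stand-alone model. It assumes, without confirmation from this page, that recognition amounts to membership in the module-level lists amino_acids and nucleic_acids; the names known_amino_acids, known_nucleic_acids, register_residue, and classify are hypothetical and do not exist in this module.

```python
# Hypothetical stand-ins for the module-level lists, shortened for illustration.
known_amino_acids = ["ALA", "ARG", "ASN"]
known_nucleic_acids = ["A", "C", "G"]


def register_residue(symbol, registry):
    """Add a residue symbol to a registry list unless it is already present."""
    symbol = symbol.strip().upper()
    if symbol not in registry:
        registry.append(symbol)


def classify(residue_name):
    """Classify a residue name by looking it up in the two registries."""
    residue_name = residue_name.strip().upper()
    if residue_name in known_amino_acids:
        return "amino acid"
    if residue_name in known_nucleic_acids:
        return "nucleic acid"
    return "other"


print(classify("MSE"))                       # not yet registered
register_residue("MSE", known_amino_acids)   # e.g. selenomethionine
print(classify("MSE"))                       # now treated as an amino acid
```

With the real module, the corresponding call would be defineAminoAcidResidue(symbol) or defineNucleicAcidResidue(symbol); such calls presumably need to be made before a file is parsed in order to affect how its residues are classified.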

Variables Details

amino_acids

Value:
['ALA',
 'ARG',
 'ASN',
 'ASP',
 'CYS',
 'CYX',
 'GLN',
 'GLU',
...

nucleic_acids

Value:
['A',
 'C',
 'G',
 'I',
 'T',
 'U',
 '+A',
 '+C',
...