Module Scientific.IO.PDB

Parsing and writing of Protein Data Bank (PDB) files

This module provides classes that represent PDB files and the configurations they contain. It provides access to PDB files on two levels: low-level (line by line) and high-level (chains, residues, and atoms).
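To make the low-level, line-by-line view concrete, here is a minimal stdlib-only sketch that splits one ATOM or HETATM record into fields by column position. It is not this module's PDBFile interface: the function name parse_atom_record and the dictionary keys are hypothetical. The column ranges follow the PDB format description for coordinate records (1-based, inclusive), converted to 0-based Python slices.

```python
def parse_atom_record(line):
    """Split one ATOM/HETATM record into fields by fixed column positions.

    Hypothetical helper for illustration; not part of Scientific.IO.PDB.
    """
    # Pad short lines so that missing trailing columns read as blanks
    line = line.rstrip("\r\n").ljust(80)
    record = line[0:6].strip()                     # columns 1-6
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record: %r" % record)
    occupancy = line[54:60]                        # columns 55-60
    temp_factor = line[60:66]                      # columns 61-66
    return {
        "record": record,
        "serial": int(line[6:11]),                 # columns 7-11
        "name": line[12:16],                       # columns 13-16, spaces kept
        "alt_loc": line[16].strip(),               # column 17
        "res_name": line[17:20].strip(),           # columns 18-20
        "chain_id": line[21].strip(),              # column 22
        "res_seq": int(line[22:26]),               # columns 23-26
        "insertion_code": line[26].strip(),        # column 27
        "position": (float(line[30:38]),           # x: columns 31-38
                     float(line[38:46]),           # y: columns 39-46
                     float(line[46:54])),          # z: columns 47-54
        "occupancy": float(occupancy) if occupancy.strip() else None,
        "temperature_factor": float(temp_factor) if temp_factor.strip() else None,
        "element": line[76:78].strip(),            # columns 77-78, often blank
    }
```

The atom name is deliberately returned unstripped, matching the module's own low-level convention of four-character names with their spaces intact; the element field comes back empty for files that do not use it.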

Caution: The PDB file format has been heavily abused, and it is probably impossible to write code that can deal with all variants correctly. This module tries to read the widest possible range of PDB files, but gives priority to a correct interpretation of the PDB format as defined by the Brookhaven National Laboratory.

Atom names are a particular problem. The PDB file format specifies that the first two characters of the atom name contain the chemical element symbol, right-justified. A later modification allowed the initial space in hydrogen names to be replaced by a digit. Many programs ignore all this and treat the name as an arbitrary, left-justified four-character name. This makes it difficult to extract the chemical element reliably; most programs write "CA" for C-alpha in such a way that it actually stands for a calcium atom. For this reason, a separate element field was added later, but only a few files use it. In the absence of an element field, the code in this module attempts to guess the element using all available information.
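The naming convention just described can be illustrated with a deliberately naive sketch that reads the element from a four-character atom name field while trusting the standard layout. The function name is hypothetical, and this is not the module's own guessing logic, which draws on more information than the name alone.

```python
def guess_element_from_name(name_field):
    """Naively guess the element symbol from a 4-character atom name field.

    Hypothetical helper. Assumes the standard layout: the first two
    characters hold the element symbol, right-justified, and a digit in
    the first position marks a hydrogen name (e.g. "1HB ").
    """
    if len(name_field) != 4:
        raise ValueError("expected a four-character atom name field")
    first, second = name_field[0], name_field[1]
    if first == " " or first.isdigit():
        # One-letter element in the second position: " CA " is carbon,
        # "1HB " is hydrogen
        return second.upper()
    # Two-letter element in the first two positions: "CA  " is calcium
    return (first + second).strip().capitalize()
```

Because the sketch trusts the standard, a left-justified hydrogen name such as "HG12", written by a program that ignores the convention, comes out as mercury ("Hg"). That is exactly the ambiguity described above, and the reason an explicit element field is preferable when a file provides one.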

The low-level routines in this module do not try to deal with the atom name problem; they return and expect four-character atom names including spaces in the correct positions. The high-level routines use atom names without leading or trailing spaces, but provide and use the element field whenever possible. For output, they use the element field to place the atom name correctly, and for input, they construct the element field content from the atom name if no explicit element field is found in the file.
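The output rule in the paragraph above (use the element field to place the atom name correctly) might look like this in a minimal sketch. The function name is hypothetical, and the rule shown, where a one-letter element leaves the first position blank unless the name already fills all four characters, is a common reading of the convention rather than code taken from this module.

```python
def format_atom_name(name, element):
    """Place a stripped atom name into the 4-character atom name field.

    Hypothetical helper illustrating the placement rule.
    """
    name = name.strip()
    element = element.strip()
    if not name or len(name) > 4:
        raise ValueError("atom name must have 1 to 4 characters")
    if len(element) == 1 and len(name) < 4:
        # One-letter element: leave the first position blank so the
        # element symbol lands in the second position (" CA ")
        return (" " + name).ljust(4)
    # Two-letter element, or a name that already fills the field
    return name.ljust(4)
```

Stripping the result recovers the high-level name, so format_atom_name("CA", "C").strip() gives back "CA", while the element field decides whether "CA" is written as carbon (" CA ") or calcium ("CA  ").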

Except where indicated, numerical values use the same units and conventions as specified in the PDB format description.

Example:

 >>> from Scientific.IO.PDB import Structure
 >>> conf = Structure('example.pdb')
 >>> print(conf)
 >>> for residue in conf.residues:
 ...     for atom in residue:
 ...         print(atom)

Classes
	AminoAcidResidue - Amino acid residue in a PDB file
	Atom - Atom in a PDB structure
	Chain - Chain of PDB residues
	Group - Atom group (residue or molecule) in a PDB file
	HetAtom - HetAtom in a PDB structure
	Molecule - Molecule in a PDB file
	NucleotideChain - Nucleotide chain in a PDB file
	NucleotideResidue - Nucleotide residue in a PDB file
	PDBFile - PDB file with access at the record level
	PeptideChain - Peptide chain in a PDB file
	Residue
	ResidueNumber - PDB residue number
	Structure - A high-level representation of the contents of a PDB file
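The high-level classes organize atoms into chains and residues. As a rough, stdlib-only illustration of that hierarchy (not the module's Structure implementation), the hypothetical function below groups already-parsed atom records by chain identifier and then by residue, keying each residue on its residue number, insertion code, and residue name. The record keys it reads are assumptions of this sketch.

```python
def group_atoms(atom_records):
    """Group atom records into {chain_id: {residue_key: [atom names]}}.

    Hypothetical helper. Each record is a dict with the assumed keys
    chain_id, res_seq, insertion_code, res_name and name. Plain dicts keep
    insertion order (Python 3.7+), so chains and residues stay in file order.
    """
    chains = {}
    for rec in atom_records:
        residues = chains.setdefault(rec["chain_id"], {})
        # Residue number plus insertion code distinguishes e.g. 27 from 27A
        key = (rec["res_seq"], rec["insertion_code"], rec["res_name"])
        # High-level view: atom names without leading or trailing spaces
        residues.setdefault(key, []).append(rec["name"].strip())
    return chains
```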

Functions

defineAminoAcidResidue(symbol)
Make the parser recognize a particular residue type as an amino acid residue

defineNucleicAcidResidue(symbol)
Make the parser recognize a particular residue type as a nucleic acid residue

Variables
	amino_acids = `['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'CYX', 'GLN'...`
	nucleic_acids = `['A', 'C', 'G', 'I', 'T', 'U', '+A', '+C', '+G...`

Function Details

defineAminoAcidResidue(symbol)

Make the parser recognize a particular residue type as an amino acid residue

Parameters:

symbol (str) - the three-letter code for an amino acid

defineNucleicAcidResidue(symbol)

Make the parser recognize a particular residue type as a nucleic acid residue

Parameters:

symbol (str) - the one-letter code for a nucleic acid
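Both functions register a residue name so that the parser classifies residues with that name accordingly. The sketch below models that idea with plain lists. The function names are hypothetical, the lists hold only the few entries visible in the Variables listing (not the module's full lists), and upper-casing the symbol is an assumption.

```python
# Illustrative subsets of the module-level lists, not their full contents
amino_acids = ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'CYX', 'GLN', 'GLU']
nucleic_acids = ['A', 'C', 'G', 'I', 'T', 'U']

def define_amino_acid_residue(symbol):
    """Register a residue name as an amino acid, avoiding duplicates."""
    symbol = symbol.upper()
    if symbol not in amino_acids:
        amino_acids.append(symbol)

def define_nucleic_acid_residue(symbol):
    """Register a residue name as a nucleic acid, avoiding duplicates."""
    symbol = symbol.upper()
    if symbol not in nucleic_acids:
        nucleic_acids.append(symbol)

def classify_residue(res_name):
    """Decide which kind of chain a residue name belongs to."""
    name = res_name.strip().upper()
    if name in amino_acids:
        return "amino acid"
    if name in nucleic_acids:
        return "nucleic acid"
    return "other"
```

For example, selenomethionine (residue name MSE) is classified as "other" until define_amino_acid_residue("MSE") is called, after which it counts as an amino acid. Registration must therefore happen before a file is parsed.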

Variable Details

amino_acids

Value:

['ALA',
 'ARG',
 'ASN',
 'ASP',
 'CYS',
 'CYX',
 'GLN',
 'GLU',
...

nucleic_acids

Value:

['A',
 'C',
 'G',
 'I',
 'T',
 'U',
 '+A',
 '+C',
...