Domain images and architecture pages
We have redesigned the domain images that appear on the peptidase summaries. Each image is scaled according to the sequence length, which is shown as a blue line. The peptidase unit is shown as a green box, with the active site residues and metal ligands (if any) shown as red and blue “lollipops”, respectively, along the bottom edge of the box. The top edge shows disulfide bridges and known carbohydrate-binding sites (as orange lollipops). An inhibitor unit is shown as a large, grey box, with reactive site residues shown on the bottom edge as red lollipops. Other domains that have been annotated by SwissProt or Pfam are shown as smaller boxes. Domains derived from Pfam are shown as red boxes, and clicking on the domain links to the Pfam database. Signal peptides and transmembrane domains are shown as small, black boxes. Propeptides are shown as small, grey boxes. Mouse-over text gives details for each feature displayed.
Because these simpler domain images are quicker to generate, we now include at the family level a page showing the different protein architectures known in the family or subfamily, ordered by MEROPS identifier.
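The drawing conventions described above can be summarised as a small lookup table plus a coordinate conversion. The following Python sketch is only an illustration and is not the code used to generate the MEROPS images: the names `STYLE`, `Feature` and `layout` are hypothetical, and it assumes one reading of "scaled according to the sequence length", namely a constant number of pixels per residue.

```python
from dataclasses import dataclass

# Shapes and colours as listed in the text; the dictionary keys are hypothetical names.
STYLE = {
    "peptidase_unit": ("box", "green"),
    "inhibitor_unit": ("box", "grey"),
    "active_site": ("lollipop", "red"),
    "reactive_site": ("lollipop", "red"),
    "metal_ligand": ("lollipop", "blue"),
    "carbohydrate": ("lollipop", "orange"),
    "pfam_domain": ("box", "red"),
    "signal_peptide": ("box", "black"),
    "transmembrane": ("box", "black"),
    "propeptide": ("box", "grey"),
}


@dataclass
class Feature:
    kind: str   # one of the keys in STYLE
    start: int  # first residue, 1-based
    end: int    # last residue, inclusive


def layout(seq_length, features, px_per_residue=0.5):
    """Convert residue coordinates to horizontal pixel coordinates.

    Assumption: the image width is proportional to the sequence length.
    """
    width = int(seq_length * px_per_residue)
    shapes = []
    for feature in features:
        shape, colour = STYLE[feature.kind]
        shapes.append({
            "kind": feature.kind,
            "shape": shape,
            "colour": colour,
            "x0": int((feature.start - 1) * px_per_residue),
            "x1": int(feature.end * px_per_residue),
        })
    return width, shapes


# Invented example: a 400-residue protein with a peptidase unit and one active site residue.
width, shapes = layout(400, [Feature("peptidase_unit", 21, 380),
                             Feature("active_site", 57, 57)])
```

Keeping the styles in one table means a new feature type needs only one extra entry, which may be part of why simpler images are quicker to generate.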
Comparisons of peptidase specificity
The MEROPS collection of substrate cleavages now exceeds 38,500. There are over three hundred peptidases for which ten or more substrates are known. In addition to the displays on a peptidase summary, MEROPS now includes displays to compare preferences in binding pockets S4 to S4′. These displays are items on the substrate index and show preference in terms of all amino acids, amino acid properties and individual amino acids. The first of these shows, for each peptidase, an amino acid if it occurs in the same binding pocket in 40% or more of the substrates. Consequently, no more than two amino acids can be shown for any one binding pocket. The amino acid is shown with a green background, and the brighter the green, the greater the percentage of substrates with the amino acid in that binding pocket. The second display is similar, but instead of being shown individually the amino acids are collected into “aliphatic”, “aromatic”, “acidic”, “basic” or “small” groups. In the third option the user is prompted to select an amino acid from a pull-down menu, and the display shows the number of substrates with the selected amino acid in each binding pocket for each peptidase. Where an amino acid has not been observed in a binding pocket, the pocket is highlighted in black. In all three displays, where no amino acid is possible (for example P4, P3 and P2 for an aminopeptidase, or P2′, P3′ and P4′ for a carboxypeptidase) the binding pocket is highlighted in grey.
Where it is known, the substrate alignments now show protein secondary structure at the foot of the alignment. A helix is shown as a string of “a’s” highlighted in red, and a beta strand is shown as a string of “b’s” highlighted in green.
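The 40% rule in the first display can be illustrated with a short Python sketch. This is not the MEROPS implementation: the function name `preferred_residues`, the eight-character site strings (P4 to P4′, with "-" for an unoccupied position) and the example cleavage sites are all assumptions made for illustration. Because three residues at 40% each would exceed 100%, at most two residues can pass the threshold for any pocket.

```python
from collections import Counter

# Binding pockets in order; an ASCII apostrophe stands in for the prime symbol.
POCKETS = ["S4", "S3", "S2", "S1", "S1'", "S2'", "S3'", "S4'"]


def preferred_residues(cleavage_sites, min_percent=40):
    """For each pocket, return residues found in at least min_percent of substrates.

    cleavage_sites: list of 8-character strings, one per substrate (assumed format).
    Returns a dict mapping pocket name to {residue: percentage of substrates}.
    """
    total = len(cleavage_sites)
    preferences = {}
    for index, pocket in enumerate(POCKETS):
        counts = Counter(site[index] for site in cleavage_sites if site[index] != "-")
        preferences[pocket] = {
            residue: 100 * n / total
            for residue, n in counts.items()
            if 100 * n >= min_percent * total  # integer comparison avoids rounding issues
        }
    return preferences


# Invented cleavage sites for a hypothetical peptidase.
example_sites = ["AAPFSLKV", "GAPFTLRV", "AVPLSAKE", "GGPFALKV", "LLALSIQE"]
preferences = preferred_residues(example_sites)
# preferences["S2"] is {'P': 80.0}: proline occupies that position in 4 of 5 substrates.
```

The percentage stored for each residue could then drive the brightness of the green background.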
MEROPS identifiers for another model organism
We recently extended MEROPS identifiers to Arabidopsis thaliana, as well as human, mouse and rat, so that every gene product that is likely to be a peptidase has a unique identifier. We have now added identifiers for all probable peptidases in Saccharomyces cerevisiae. Identifiers for peptidases from this organism have the first character after the dot replaced by the letter A. When a homologue is characterized biochemically, we will replace the identifier with one in the standard format (three digits after the dot).
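The two identifier formats described above can be told apart with a pair of regular expressions. The Python sketch below rests on assumptions that go beyond the text: that the part before the dot is one capital letter followed by two digits, and that the two characters after the letter A are digits. The function name, the category labels and the example strings are hypothetical.

```python
import re

# Standard format: three digits after the dot (as stated in the text).
STANDARD = re.compile(r"[A-Z]\d{2}\.\d{3}")
# S. cerevisiae format: first character after the dot replaced by the letter A.
# Assumption: the remaining two characters are digits.
YEAST_SPECIFIC = re.compile(r"[A-Z]\d{2}\.A\d{2}")


def classify_identifier(identifier):
    """Return a label describing which identifier format the string matches."""
    if STANDARD.fullmatch(identifier):
        return "standard"
    if YEAST_SPECIFIC.fullmatch(identifier):
        return "yeast_specific"
    return "unrecognised"


# Illustrative strings only; they are not taken from the database.
for example in ("S01.001", "S01.A01", "S01A01"):
    print(example, classify_identifier(example))
```

A check of this kind could be used to list the identifiers that are still awaiting replacement by one in the standard format.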
The number of Richardson diagrams showing cartoons of structures has substantially increased, thanks to the hard work of Matthew Jenner, who has been working with us this summer. There is now a Richardson diagram for every peptidase or inhibitor for which a tertiary structure has been solved.
Predicted sequences from the chimpanzee genome
Summer student Matthew Jenner has also been predicting protein sequences from the chimpanzee genome. Protein sequences from eukaryote genomes are collected from the Ensembl database. Although Ensembl has a sophisticated, automated pipeline for predicting protein sequences, some predictions require a further manual stage. These are predictions in which exons have been missed, introns have been mistranslated as exons, or genes have been run together. For the orthologous genes whose predicted protein sequences show the greatest difference between human and chimpanzee, the chimpanzee sequences have been recalculated with the GeneWise software, using the human sequence as a template and the nucleotide sequence found in the chimpanzee genome with the Ensembl Blast search service.