The release of MEROPS 9 was more complex than had been realised, and it has taken a few days to get everything working. We apologise for any pages, features or facilities that have been missing. If you notice any pages, features or facilities that are not working properly then please let us know, preferably via the new feedback system.
Archive for December, 2009
New codebase
Release 9 of MEROPS will be in two parts, because the software that produces the web pages has been rewritten by Matthew Waller and Jody Clements from the Wellcome Trust Sanger Institute web team. The new data will be published in January 2010. The release has been done in this way so that any programming bugs can be detected and reliably separated from data errors. The FTP site will also be updated in January. The new software has been designed to resemble the old website, and the changes have mainly been for our benefit, to simplify the maintenance and make it easier to add new features. Users will notice that frames have disappeared, and that the left-hand green menu now scrolls with the rest of the page. Links to external databases now open in the same window, so you will have to use the back button to return to MEROPS. References and sequences are now displayed in small windows in the middle of the screen.
Feedback and reporting errors
The MEROPS website now has a ticketing system for reporting errors and making comments. Each page now has a feedback link in the footer. The user will be asked to enter his or her name and an E-mail address when the comment is posted. The user will receive an automated E-mail which should not be replied to. A member of the MEROPS team will then contact the user and when the problem has been fixed the ticket will be closed. The user will receive a second automated E-mail. This should only be replied to if the issue has not been resolved to the user’s satisfaction: the ticket will then be automatically re-opened. Please use this system to report programming errors, broken links and any errors or omissions in the data.
Links to PubMed Central from Literature pages
There are now links from the clan, family, peptidase and inhibitor literature pages to the full text of papers stored in PubMed Central.
Domain images and architecture pages
We have redesigned the domain images which appear on the peptidase summaries. Each image is scaled according to the sequence length, shown as a blue line. The peptidase unit is shown as a green box, with the active site residues and metal ligands (if any) shown as red and blue “lollipops”, respectively, along the bottom edge of the box. The top edge shows disulfide bridges, and known carbohydrate binding sites (as orange lollipops). An inhibitor unit is shown as a large, grey box, with reactive site residues shown on the bottom edge as red lollipops. Other domains that have been annotated by SwissProt or Pfam are shown as smaller boxes. Domains derived from Pfam are shown as red boxes and links to the Pfam database can be accessed by clicking on the domain. Signal peptides and transmembrane domains are shown as small, black boxes. Propeptides are shown as small, grey boxes. Mouse-over text gives details for each feature displayed.
Because these simpler domain images are quicker to generate, we now include at the family level a page showing the different protein architectures known in the family or subfamily, ordered by MEROPS identifier.
Comparisons of peptidase specificity
The MEROPS collection of substrate cleavages now exceeds 38,500. There are over three hundred peptidases for which ten or more substrates are known. In addition to the displays on a peptidase summary, MEROPS now includes displays to compare preferences in binding pockets S4 to S4′. These are items on the substrate index and show preference in terms of all amino acids, amino acid properties and individual amino acids. The first of these shows, for each peptidase, an amino acid if it occurs in the same binding pocket in 40% or more of the substrates, so no more than two amino acids are shown for any one binding pocket. The amino acid is shown with a green background, and the brighter the green the greater the percentage of substrates with the amino acid in that binding pocket. The second display is similar but instead of showing individual amino acids, these are collected into “aliphatic”, “aromatic”, “acidic”, “basic” or “small” groups. In the third option the user is prompted to select an amino acid from a pull-down menu and the display shows the number of substrates with the selected amino acid in each binding pocket for each peptidase. Where an amino acid has not been observed in a binding pocket, this is highlighted in black. In all three displays, where no amino acid is possible (for example P4, P3 and P2 for an aminopeptidase, or P2′, P3′ or P4′ for a carboxypeptidase) the binding pocket is highlighted in grey.
If known, the substrate alignments now show protein secondary structure at the foot of the alignment. A helix is shown as a string of “a’s” and is highlighted in red; a beta strand is shown as a string of “b’s” and is highlighted in green.
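The 40% rule used by the first display can be illustrated with a short sketch. This is not the MEROPS code: the function name `preferred_residues`, the eight-character input format (one residue per pocket, `-` where no residue is present) and the choice of denominator (all substrates supplied) are assumptions made for illustration only.

```python
from collections import Counter

# Pocket labels in order; an ASCII apostrophe stands in for the prime sign.
POCKETS = ["S4", "S3", "S2", "S1", "S1'", "S2'", "S3'", "S4'"]


def preferred_residues(substrates, min_percent=40):
    """Return, per pocket, the residues seen in at least min_percent of substrates.

    substrates: list of 8-character strings, one residue per pocket
    (hypothetical input format; '-' marks an unoccupied pocket).
    Assumption: the percentage is taken over all substrates supplied.
    """
    total = len(substrates)
    if total == 0:
        raise ValueError("at least one substrate is required")
    for s in substrates:
        if len(s) != len(POCKETS):
            raise ValueError(f"expected {len(POCKETS)} residues, got {s!r}")

    result = {}
    for i, pocket in enumerate(POCKETS):
        counts = Counter(s[i] for s in substrates if s[i] != "-")
        # Integer comparison avoids floating-point rounding at the threshold.
        kept = [aa for aa, n in counts.items() if n * 100 >= min_percent * total]
        # Most frequent first; ties broken alphabetically for a stable order.
        result[pocket] = sorted(kept, key=lambda aa: (-counts[aa], aa))
    return result
```

With a threshold of 40%, three residues would need at least 120% of the substrates between them, which is why no pocket can ever list more than two. Calling `preferred_residues(["AAGRSLVA", "AVGRSLVA", "GAGKTLVA", "TLPRALGG", "TTAKSWVV"])` on these made-up substrates gives two residues for S1 (R in three of five, K in two of five).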
MEROPS identifiers for another model organism
We recently expanded MEROPS identifiers to Arabidopsis thaliana, as well as human, mouse and rat, so that every gene product that is likely to be a peptidase has a unique identifier. We have now added identifiers for all probable peptidases in Saccharomyces cerevisiae. Identifiers for peptidases from this organism have the first character after the dot replaced by the letter A. When a homologue is characterized biochemically, we will replace the identifier with one in the standard format (three digits after the dot).
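The two identifier formats described above can be told apart by looking at the three characters after the dot. The sketch below is only an illustration: the function name `classify_identifier`, the labels it returns, and the example identifiers are hypothetical, and it assumes that the two characters following the letter A are alphanumeric.

```python
def classify_identifier(identifier):
    """Classify an identifier by the three characters after the dot.

    Returns "standard" for three digits after the dot, "provisional" when
    the first character after the dot is the letter A, and "other" for
    anything else (for example, letters used for other organisms).
    """
    family, dot, suffix = identifier.partition(".")
    if not dot or not family or len(suffix) != 3:
        raise ValueError(f"not a recognisable identifier: {identifier!r}")
    if suffix.isascii() and suffix.isdigit():
        return "standard"
    # Assumption: the remaining two characters are alphanumeric.
    if suffix[0] == "A" and suffix[1:].isalnum():
        return "provisional"
    return "other"
```

For example, a made-up identifier such as `S01.A01` would be classed as provisional, and would be expected to change to something of the form `S01.001` once a homologue has been characterized biochemically.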
The number of Richardson diagrams showing cartoons of structures has substantially increased, thanks to the hard work of Matthew Jenner, who has been working with us this summer. There is now a Richardson diagram for every peptidase or inhibitor for which a tertiary structure has been solved.
Predicted sequences from the chimpanzee genome
Summer student Matthew Jenner has also been predicting protein sequences from the chimpanzee genome. Protein sequences from eukaryote genomes are collected from the Ensembl database. Although Ensembl has a sophisticated, automated pipeline for predicting protein sequences, some predictions require a further manual stage. These are predictions where exons are missed, introns are mistranslated as exons, or genes are run together. For the orthologous genes whose predicted protein sequences differ most between human and chimpanzee, the chimpanzee sequences have been recalculated with the GeneWise software, using the human sequence as a template and the nucleotide sequence located in the chimpanzee genome with the Ensembl Blast search service.