Prediction of lipoprotein signal peptides in Gram-negative bacteria.
Journal: 2004/March - Protein Science
ISSN: 0961-8368
Abstract:
A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. The predictor correctly identified 96.8% of the lipoproteins with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from those of Gram-negative bacteria, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/.
Protein Sci 12(8): 1652-1662

Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby 2800, Denmark
Stockholm Bioinformatics Center, Department of Biochemistry, Stockholm University, Stockholm S-106 91, Sweden
These authors contributed equally to the presented work.
Present address: The Bioinformatics Centre, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark
Reprint requests to: Anders Krogh, The Bioinformatics Centre, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark; e-mail: krogh@binf.ku.dk; fax: 45-3532-1300.
Received 2003 Jan 24; Revised 2003 May 15; Accepted 2003 May 19.

Keywords: Signal peptides, lipoprotein prediction, HMM, neural networks

Bacterial lipoproteins constitute a large group of proteins with many different functions. The characteristic feature of all lipoproteins is a signal sequence at the N-terminal end, followed by a cysteine (Hayashi and Wu 1990). The signal sequence is cleaved by signal peptidase II (SPaseII), also called lipoprotein signal peptidase (Lsp). These lipoprotein signal peptides are quite similar to the signal peptides of secreted proteins, which are cleaved by signal peptidase I (SPaseI). So far, a few hundred putative lipoproteins in Gram-negative Eubacteria have been annotated in SWISS-PROT (Bairoch and Apweiler 2000).

Biosynthesis of lipoproteins in Gram-negative and Gram-positive bacteria consists of three steps, as shown in Figure 1: transfer of a diacylglyceride to the cysteine sulphydryl group of the unmodified prolipoprotein; cleavage of the signal peptide by signal peptidase II, forming an apolipoprotein; and, finally, acylation of the α-amino group of the N-terminal cysteine of the apolipoprotein (Sankaran and Wu 1994). Before the processing of the prolipoprotein, which takes place on the periplasmic side of the inner membrane, the prolipoprotein is exported through the inner membrane by the general secretory pathway that is also used by secretory proteins processed by SPaseI (Hayashi and Wu 1990). In Gram-negative bacteria, the lipoproteins are anchored to either the inner or the outer membrane, and a single amino acid in position +2 is proposed to determine the final destination of the lipoproteins (Yamaguchi et al. 1988; Seydel et al. 1999). For more details about biosynthesis and export of lipoproteins, see Braun and Wu (1994).

Figure 1. Biosynthesis of a lipoprotein. Lipids are attached to cysteine. Peptides are shown to the left and to the right of the cysteine residue. Catalytic enzymes are written beside reaction arrows.

The signal sequence can be divided into three regions: an n-region, an h-region, and a c-region. The n-region is characterized by the presence of the positively charged amino acids lysine and/or arginine, the h-region consists of hydrophobic amino acids, and the c-region contains a well-conserved stretch of four amino acids around the cleavage site, the so-called lipobox. The most conserved amino acids in the lipobox are a leucine in position −3 from the cleavage site, an alanine in position −2, and a glycine or an alanine in position −1. The cysteine at position +1 is required: LA(G,A) ↓ C (von Heijne 1989). The consensus for the lipoprotein signal sequence has previously been characterized further so that it could be used for lipoprotein prediction. One example is the consensus made by von Heijne, (LVI)(ASTG)(GA) ↓ C, requiring only one match to the first two positions. This pattern was able to discriminate between all lipoprotein signal peptides and SPaseI-cleaved signal peptides known at the time (von Heijne 1989). The lipoprotein predictor in PSORT (Nakai and Kanehisa 1991) integrates the von Heijne consensus sequence in its predictions. Another example is the Prosite pattern PS00013, {DERK}(6)(LIVMFWSTAG)(2)(LIVMFYSTAGCQ)(AGS) ↓ C, where {DERK}(6) means that none of the four amino acids is allowed in the first six positions (positions −10 to −5 relative to the cleavage site). The pattern has two additional rules: The cysteine must be between positions 15 and 35, and at least one lysine or arginine must be in one of the first seven positions of the signal peptide (Falquet et al. 2002). More recently, a new regular expression was made for Gram-positive bacteria (Sutcliffe and Harrington 2002).
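The two consensus patterns described above can be sketched as simple sequence checks. The following is a minimal illustration in Python, not the implementation used by PSORT, Prosite, or LipoP. The function names and the test sequence are chosen for this example only, and reading "only one match to the first two positions" as "at least one of positions −3 and −2 must match" is an assumption.

```python
import re

# Prosite PS00013 as quoted in the text, rewritten as a Python regex:
# six residues that are not D/E/R/K, two of LIVMFWSTAG, one of
# LIVMFYSTAGCQ, one of AGS, then the cysteine at position +1.
# The lookahead wrapper allows overlapping candidate sites to be found.
PS00013 = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def ps00013_cleavage_sites(seq):
    """Return 1-based positions of cysteines that satisfy the pattern
    and the two additional rules quoted in the text."""
    # Additional rule: at least one K or R in the first seven positions.
    has_kr = any(aa in "KR" for aa in seq[:7])
    sites = []
    for m in PS00013.finditer(seq):
        # Ten residues precede the cysteine inside the match, so its
        # 1-based position is the 0-based match start plus 11.
        cys_pos = m.start() + 11
        # Additional rule: cysteine between positions 15 and 35.
        if has_kr and 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites


def von_heijne_match(seq, cys_pos):
    """Check (LVI)(ASTG)(GA) before a cysteine at 1-based cys_pos.

    Assumption: at least one of positions -3 and -2 must match,
    while position -1 (G or A) and the cysteine are both required.
    """
    i = cys_pos - 1  # 0-based index of the candidate cysteine
    if i < 3 or i >= len(seq) or seq[i] != "C":
        return False
    minus3, minus2, minus1 = seq[i - 3], seq[i - 2], seq[i - 1]
    return minus1 in "GA" and (minus3 in "LVI" or minus2 in "ASTG")


if __name__ == "__main__":
    # Illustrative N-terminal sequence with a lipobox (L-A-G-C).
    example = "MKATKLVLGAVILGSTLLAGCSSNAKIDQ"
    print(ps00013_cleavage_sites(example))
    print(von_heijne_match(example, 21))
```

Both checks are purely pattern-based and ignore the overall n-, h-, and c-region composition of the signal peptide.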

The lipoprotein signal peptide has been compared with the SPaseI-cleaved signal peptides. The lipoprotein signal peptides have a similar n-region, but their h-regions are shorter, and the SPaseI-cleaved signal peptides have a polar c-region before the cleavage site (Klein et al. 1988; von Heijne 1989). For lipoproteins, as well as for the SPaseI-cleaved proteins, the n- and h-regions are required for the translocation of the uncleaved protein precursor through the inner membrane. The c-region is necessary for the recognition of the cleavage site by the signal peptidase (von Heijne 1990).

Methods for prediction of SPaseI-cleaved signal peptides have been available for some time (Nakai and Kanehisa 1991; Nielsen et al. 1997). The performance of these methods is generally quite good, but they have difficulty discriminating SPaseI-cleaved signal peptides from SPaseII-cleaved signal peptides and from N-terminal transmembrane helices (Nielsen et al. 1997; Nielsen and Krogh 1998). Similarly, methods for predicting transmembrane helices often mistake signal peptides for membrane helices (for example, see Krogh et al. 2001).

Here we present a method to predict lipoproteins in Gram-negative bacteria and their signal peptide cleavage site based on a hidden Markov model (HMM) or a neural network. Both methods are significantly better than the above-mentioned existing methods. The HMM is trained on SPaseI-cleaved proteins, lipoproteins, and cytoplasmic and transmembrane proteins, and it is able to classify an N-terminal protein sequence as a lipoprotein signal peptide, a SPaseI-cleaved signal peptide, or a protein without a signal sequence (cytoplasmic or transmembrane) with very low error rates. The HMM is also able to predict the cleavage site in both SPaseI- and SPaseII-cleaved signal peptides.

Acknowledgments

We thank Hajime Tokuda for experimental results prior to publication and Lars Juhl Jensen for his programming advice. This work was sponsored by a grant to the Center for Biological Sequence Analysis (S.B.) from the Danish National Research Foundation. A.K. was supported by EU grant no. QLRI-CT-2001-00015.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Notes

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0303703.
