Molecular Bioinformatics Center
The Molecular Bioinformatics Center (MBC) was established in 2006 as a national resource for bioinformatics and computational systems biology in Taiwan. MBC performs research in computational biology, develops software tools for analyzing biological data, and creates public databases.
Predicting disulfide connectivity patterns
Lu CH et al.
Proteins (2007), 67, 262-270
Disulfide bonds play an important role in stabilizing
protein structure and regulating protein function. Therefore,
the ability to infer disulfide connectivity from protein sequences
will be valuable in structural modeling and functional analysis.
However, predicting disulfide connectivity directly from sequences
is a challenge for computational biologists because of the nonlocal
nature of disulfide bonds, i.e., the close spatial proximity of
the cysteine pair that forms a disulfide bond does not necessarily
imply a short sequence separation between the two cysteine residues.
Recently, Chen and Hwang (Proteins 2005;61:507-512) treated this
problem as a multiclass classification task by defining each distinct
disulfide pattern as a class. They used multiple support vector
machines based on a variety of sequence features to predict the
disulfide patterns. Their results compare favorably with those
in the literature for a benchmark dataset sharing less than 30%
sequence identity. However, since the number of disulfide patterns
grows rapidly as the number of disulfide bonds increases, their
method performs unsatisfactorily for proteins with a large number
of disulfide bonds. In this work, we propose a novel method to
represent disulfide connectivity in terms of cysteine pairs, instead
of disulfide patterns. Since the number of bonding states of a
cysteine pair is independent of the number of disulfide bonds, the
problem of class explosion is avoided. The bonding states of the
cysteine pairs are predicted using support vector machines
together with genetic algorithm optimization for feature selection.
The complete disulfide patterns are then determined from the connectivity
matrices that are constructed from the predicted bonding states
of the cysteine pairs. Our approach outperforms the current approaches
in the literature.
Revised April 2, 2007
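
The class explosion mentioned in the abstract can be made concrete with a small counting sketch. For a protein with n disulfide bonds (2n bonded cysteines), the number of distinct connectivity patterns is (2n)! / (2^n n!), while each cysteine pair has only two bonding states (bonded or not bonded). The sketch below also illustrates one possible way to turn pairwise scores into a complete pattern: a brute-force search for the pairing with the highest total score. This decoding step is an assumption made for illustration and is not necessarily the procedure used in the paper; the function names and the `scores` matrix are hypothetical.

```python
from math import factorial


def count_patterns(n_bonds):
    """Number of ways to pair up 2*n_bonds cysteines: (2n)! / (2^n * n!)."""
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


def count_pairs(n_bonds):
    """Number of candidate cysteine pairs among 2*n_bonds cysteines."""
    n_cys = 2 * n_bonds
    return n_cys * (n_cys - 1) // 2


def perfect_matchings(items):
    """Yield every way to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(scores):
    """Return the pairing with the highest total pair score.

    scores[i][j] (with i < j) is a hypothetical "bonded" score for
    cysteines i and j, e.g. the output of a per-pair classifier.
    Brute force: only practical for a small number of cysteines.
    """
    cysteines = list(range(len(scores)))
    return max(
        perfect_matchings(cysteines),
        key=lambda pairs: sum(scores[i][j] for i, j in pairs),
    )


if __name__ == "__main__":
    # Patterns grow much faster than the number of cysteine pairs.
    for n in range(1, 6):
        print(n, count_patterns(n), count_pairs(n))
```

With five bonds there are already 945 possible patterns but only 45 cysteine pairs, each a two-state prediction. For larger proteins, the exhaustive search in `best_pattern` would typically be replaced by a maximum-weight matching algorithm.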