TMSOC (Transmembrane Helix: Simple Or Complex)

Contact: Protein Sequence Analysis Group, Bioinformatics Institute


This webserver identifies the simple TMs in a protein sequence and masks them out prior to similarity searches.

Wing-Cheong Wong, Sebastian Maurer-Stroh, Georg Schneider, Frank Eisenhaber

FASTA sequence example

Instructions for the TMSOC webserver

Paste a FASTA sequence into the first text box.
Paste the corresponding TM regions into the second text box. If the second box is left blank, the TM segments will be predicted.
Click "example" near the input text area to see the data format.
For an explanation of the webserver output, click "help".

Download TMSOC

The TMSOC program is written in Perl for easy integration into computational pipelines. It calculates the complexity, hydrophobicity and z-score of TM segments to determine their classification (i.e. simple/twilight/complex). It requires only (1) the protein sequence and (2) the TM range(s) as inputs. See the README.
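To illustrate the kind of per-segment quantities mentioned above, here is a minimal Python sketch. It is not the TMSOC code: the use of Shannon entropy as the complexity measure, the Kyte-Doolittle scale as the hydrophobicity measure, and the function names are all assumptions made for illustration. The reference mean and standard deviation needed for a z-score are left to the caller, since the values used by TMSOC are not given here.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale. Using this particular scale is an
# assumption; TMSOC may rely on a different hydrophobicity scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment.

    Used here as a stand-in for 'sequence complexity' (an assumption).
    """
    n = len(segment)
    counts = Counter(segment)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def mean_hydrophobicity(segment):
    """Average Kyte-Doolittle hydropathy over the residues of a segment."""
    return sum(KYTE_DOOLITTLE[r] for r in segment) / len(segment)


def z_score(value, ref_mean, ref_std):
    """Standard z-score of a value against a caller-supplied reference
    distribution (hypothetical parameters, not TMSOC's)."""
    return (value - ref_mean) / ref_std


if __name__ == "__main__":
    # Hypothetical TM segment, for illustration only.
    segment = "LLVLLALLAVLLLLLVAVLL"
    print("complexity:", shannon_entropy(segment))
    print("hydrophobicity:", mean_hydrophobicity(segment))
```

A low-entropy, strongly hydrophobic segment such as the one above is the sort of segment one would expect to fall on the 'simple' side, but the actual classification boundaries are defined by TMSOC and are not reproduced in this sketch.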

Note that this archive contains a command-line version of TMSOC that computes the TM helix classification (simple/twilight/complex). It does not contain the TM prediction module because we do not have permission to redistribute the TM predictors.

WinRAR is recommended for unzipping.

Overview

Transmembrane helical segments (TMs) can be classified into two groups: so-called 'simple' and 'complex' TMs. Whereas simple TMs are mere hydrophobic anchors with an overrepresentation of aliphatic hydrophobic residues and are likely the result of convergent evolution, complex TMs embody ancestral information and tend to have structural and functional roles beyond membrane immersion [1,2]. Hence, the sequence homology concept is not applicable to simple TMs. In practice, simple TMs attract statistically significant but evolutionarily unrelated hits during similarity searches (whether BLAST- or HMM-based) [1]. This is especially problematic for membrane proteins that contain both globular segments and TMs. It is therefore necessary to identify simple TMs and remove them from seed sequences (i.e. by masking these sequence segments) prior to sequence similarity searches. Searches with appropriately masked sequences show a decreased false-discovery rate without sacrificing sensitivity [1,2].

In the webserver "TMSOC (Transmembrane Helix: Simple Or Complex)", the presence and length of TM helices in a protein sequence are first derived with statistical significance criteria from 5 TM predictors (TMHMM, HMMTOP, DASTM, PhobiusTM, SAPS) [1]. Then, a z-score criterion (based on sequence complexity and hydrophobicity) is applied to each TM helix segment to identify the simple TMs [2]. The input can simply be a FASTA-formatted sequence. The output comprises (i) the positions of the predicted/user-defined TM segments, (ii) the sequence complexity, hydrophobicity, z-score and classification (i.e. simple/twilight/complex) of each TM segment following the procedure from ref. [2], (iii) a sequence complexity/hydrophobicity plot including the user's TM segments, and (iv) a FASTA-formatted sequence in which simple TM segments are masked (replaced by a continuous run of 'X') for input into other programs.
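The masking step in output (iv) can be sketched in a few lines of Python. This is an illustration only, not the webserver's code: the function name, the (start, end, label) tuple layout and the 1-based inclusive coordinates are assumptions, and the example sequence and segments are made up.

```python
def mask_simple_tms(sequence, segments):
    """Replace every residue of each 'simple' TM segment with 'X'.

    segments: list of (start, end, label) tuples with 1-based, inclusive
    positions; this layout is hypothetical, not the TMSOC file format.
    Segments labelled 'twilight' or 'complex' are left untouched.
    """
    residues = list(sequence)
    for start, end, label in segments:
        if label != "simple":
            continue
        # Clamp to the sequence bounds so the length never changes.
        for i in range(max(start, 1) - 1, min(end, len(residues))):
            residues[i] = "X"
    return "".join(residues)


if __name__ == "__main__":
    # Made-up sequence and segment annotations, for illustration only.
    seq = "MKTAYIAKQRLLLLLVVVVVAAAAGGSTNDE"
    tms = [(11, 20, "simple"), (21, 24, "complex")]
    print(">example_masked")
    print(mask_simple_tms(seq, tms))
```

Because masking preserves the sequence length, residue positions reported by a downstream similarity search still map directly onto the original sequence.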

References

[1] Wing-Cheong Wong, Sebastian Maurer-Stroh, Frank Eisenhaber, 2010, More than 1001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology, PLoS Computational Biology, 6(7), doi:10.1371/journal.pcbi.1000867
Link to supplementary materials
[2] Wing-Cheong Wong, Sebastian Maurer-Stroh, Frank Eisenhaber, 2011, Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to the membrane proteins, Biology Direct, 6(57), doi:10.1186/1745-6150-6-57
Link to supplementary materials
[3] Wing-Cheong Wong, Sebastian Maurer-Stroh, Georg Schneider, Frank Eisenhaber, 2012, Transmembrane helix: simple or complex, Nucleic Acids Research (Web Server issue), doi:10.1093/nar/gks379