Protein Sequence Analysis Group

Bioinformatics Institute
 
   
  Supplementary materials to:
  Charged residues next to transmembrane regions revisited: 'Positive-inside rule' is complemented by negative charge inside depletion/outside enrichment
 

James Baker1,2, Wing Cheong-Wong1, Birgit Eisenhaber1, Jim Warwicker2*, Frank Eisenhaber1,3*

1Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671

2Faculty of Life Sciences, John Garside Building, 131 Princess Street, Manchester, M1 7DN

3School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, 637553, Singapore

Supplementary files to methods and results section.

About.

These scripts can be used to mine database files from UniProt and TopDB into transmembrane helix (TMH) datasets and to perform sequence distribution analysis on those datasets.

Disclaimer: This release is not intended for redistribution or for reuse as production software; it is provided as is. If you have trouble running the scripts, or if anything is unclear, feel free to contact the authors directly.

Download.

The downloaded zip includes the original files downloaded from the respective databases and the UniProt non-redundant datasets. It also contains the Python scripts used to generate the datasets, tables, and figures, as well as the parsed .csv files of each dataset. Each .csv file contains the ID from the respective database, the full protein sequence, the TMH sequences, the flank sequences (each dataset has one file per cut-off flank length: 5, 10, or 20), the number of TMHs in the given protein, and the orientation of the TMH.
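As a quick way to check what a parsed .csv file contains, the following minimal sketch reads the first few rows with the Python standard library. The function name preview_rows is hypothetical and is not part of the released scripts. The sketch makes no assumption about column order or about whether a header row is present; consult the file itself for that.

```python
import csv


def preview_rows(path, limit=3):
    """Return up to `limit` rows of a dataset csv file as lists of strings.

    Hypothetical helper for inspection only; it does not interpret the
    columns, because their order is not documented here.
    """
    rows = []
    with open(path, "r") as handle:
        for row in csv.reader(handle):
            rows.append(row)
            if len(rows) >= limit:
                break
    return rows


# Example call (the path is an assumption about where the zip was unpacked):
# for row in preview_rows("Datasets/UniPM_20_flanklength_flankclashTrue.csv"):
#     print(row)
```

Printing a handful of rows this way is usually enough to identify which column holds the ID, the TMH sequence, and each flank before writing any analysis code.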

Databases.

Database.zip includes the database files as described in the methods section:
UniArch.txt
UniBacilli.txt
UniCress.txt
UniER.txt
UniEcoli.txt
UniFungi.txt
UniGolgi.txt
UniHuman.txt
UniPM.txt

Scripts.

Scripts.zip includes scripts used for many of the figures and tables throughout the results:

Figure_1_2_3_4_6.py
Figure_5A_vectors.py
Figure_5B_vectors.py
Figure_7.py
Figure_S1.py
Figure_S4.py
Table_1.py
Table_2_central_alignment.py
Table_2_database_defined.py
Table_4.py
Table_S1.py
Table_S2.py
topdb_fasta_dataset_generator.py
Uniprot_dataset_generator.py

The scripts folder also includes various text files to assist the scripts.

Datasets.

Datasets.zip includes the processed database files used in the distribution analysis throughout the results.

Dataset file name key.

Each database file has several processed dataset files, and the name of each dataset file describes how it was generated. The csv file name consists of the name of the original database file, the maximum allowed flank length, the state of the Flankclash variable, and an optional condition indicating that only TMH records with flanks of at least half the maximum length, or of full length, are included in the dataset. If Flankclash is true, the dataset does not allow overlap between the flanking regions of TMHs; if it is false, overlap is allowed.

For example, UniPM_20_flanklength_flankclashTrue_only_half_flanks.csv is derived from the UniPM database file (UniPM), uses a maximum flank length of 20 residues (20_flanklength), does not allow flanks to overlap (flankclashTrue), and only includes records that have at least 10 residues in both flanks (only_half_flanks).

TopDB_5_flanklength_flankclashFalse_only_full_flanks.csv
TopDB_5_flanklength_flankclashFalse_only_half_flanks.csv
TopDB_5_flanklength_flankclashFalse.csv
TopDB_5_flanklength_flankclashTrue_only_full_flanks.csv
TopDB_5_flanklength_flankclashTrue_only_half_flanks.csv
TopDB_5_flanklength_flankclashTrue.csv
TopDB_10_flanklength_flankclashFalse_only_full_flanks.csv
TopDB_10_flanklength_flankclashFalse_only_half_flanks.csv
TopDB_10_flanklength_flankclashFalse.csv
TopDB_10_flanklength_flankclashTrue_only_full_flanks.csv
TopDB_10_flanklength_flankclashTrue_only_half_flanks.csv
TopDB_10_flanklength_flankclashTrue.csv
TopDB_20_flanklength_flankclashFalse_only_full_flanks.csv
TopDB_20_flanklength_flankclashFalse_only_half_flanks.csv
TopDB_20_flanklength_flankclashFalse.csv
TopDB_20_flanklength_flankclashTrue_only_full_flanks.csv
TopDB_20_flanklength_flankclashTrue_only_half_flanks.csv
TopDB_20_flanklength_flankclashTrue.csv
UniArch_5_flanklength_flankclashFalse_only_full_flanks.csv
UniArch_5_flanklength_flankclashFalse_only_half_flanks.csv
UniArch_5_flanklength_flankclashFalse.csv
UniArch_5_flanklength_flankclashTrue_only_full_flanks.csv
UniArch_5_flanklength_flankclashTrue_only_half_flanks.csv
UniArch_5_flanklength_flankclashTrue.csv
UniArch_10_flanklength_flankclashFalse_only_full_flanks.csv
UniArch_10_flanklength_flankclashFalse_only_half_flanks.csv
UniArch_10_flanklength_flankclashFalse.csv
UniArch_10_flanklength_flankclashTrue_only_full_flanks.csv
UniArch_10_flanklength_flankclashTrue_only_half_flanks.csv
UniArch_10_flanklength_flankclashTrue.csv
UniArch_20_flanklength_flankclashFalse_only_full_flanks.csv
UniArch_20_flanklength_flankclashFalse_only_half_flanks.csv
UniArch_20_flanklength_flankclashFalse.csv
UniArch_20_flanklength_flankclashTrue_only_full_flanks.csv
UniArch_20_flanklength_flankclashTrue_only_half_flanks.csv
UniArch_20_flanklength_flankclashTrue.csv
UniBacilli_5_flanklength_flankclashFalse_only_full_flanks.csv
UniBacilli_5_flanklength_flankclashFalse_only_half_flanks.csv
UniBacilli_5_flanklength_flankclashFalse.csv
UniBacilli_5_flanklength_flankclashTrue_only_full_flanks.csv
UniBacilli_5_flanklength_flankclashTrue_only_half_flanks.csv
UniBacilli_5_flanklength_flankclashTrue.csv
UniBacilli_10_flanklength_flankclashFalse_only_full_flanks.csv
UniBacilli_10_flanklength_flankclashFalse_only_half_flanks.csv
UniBacilli_10_flanklength_flankclashFalse.csv
UniBacilli_10_flanklength_flankclashTrue_only_full_flanks.csv
UniBacilli_10_flanklength_flankclashTrue_only_half_flanks.csv
UniBacilli_10_flanklength_flankclashTrue.csv
UniBacilli_20_flanklength_flankclashFalse_only_full_flanks.csv
UniBacilli_20_flanklength_flankclashFalse_only_half_flanks.csv
UniBacilli_20_flanklength_flankclashFalse.csv
UniBacilli_20_flanklength_flankclashTrue_only_full_flanks.csv
UniBacilli_20_flanklength_flankclashTrue_only_half_flanks.csv
UniBacilli_20_flanklength_flankclashTrue.csv
UniCress_5_flanklength_flankclashFalse_only_full_flanks.csv
UniCress_5_flanklength_flankclashFalse_only_half_flanks.csv
UniCress_5_flanklength_flankclashFalse.csv
UniCress_5_flanklength_flankclashTrue_only_full_flanks.csv
UniCress_5_flanklength_flankclashTrue_only_half_flanks.csv
UniCress_5_flanklength_flankclashTrue.csv
UniCress_10_flanklength_flankclashFalse_only_full_flanks.csv
UniCress_10_flanklength_flankclashFalse_only_half_flanks.csv
UniCress_10_flanklength_flankclashFalse.csv
UniCress_10_flanklength_flankclashTrue_only_full_flanks.csv
UniCress_10_flanklength_flankclashTrue_only_half_flanks.csv
UniCress_10_flanklength_flankclashTrue.csv
UniCress_20_flanklength_flankclashFalse_only_full_flanks.csv
UniCress_20_flanklength_flankclashFalse_only_half_flanks.csv
UniCress_20_flanklength_flankclashFalse.csv
UniCress_20_flanklength_flankclashTrue_only_full_flanks.csv
UniCress_20_flanklength_flankclashTrue_only_half_flanks.csv
UniCress_20_flanklength_flankclashTrue.csv
UniEcoli_5_flanklength_flankclashFalse_only_full_flanks.csv
UniEcoli_5_flanklength_flankclashFalse_only_half_flanks.csv
UniEcoli_5_flanklength_flankclashFalse.csv
UniEcoli_5_flanklength_flankclashTrue_only_full_flanks.csv
UniEcoli_5_flanklength_flankclashTrue_only_half_flanks.csv
UniEcoli_5_flanklength_flankclashTrue.csv
UniEcoli_10_flanklength_flankclashFalse_only_full_flanks.csv
UniEcoli_10_flanklength_flankclashFalse_only_half_flanks.csv
UniEcoli_10_flanklength_flankclashFalse.csv
UniEcoli_10_flanklength_flankclashTrue_only_full_flanks.csv
UniEcoli_10_flanklength_flankclashTrue_only_half_flanks.csv
UniEcoli_10_flanklength_flankclashTrue.csv
UniEcoli_20_flanklength_flankclashFalse_only_full_flanks.csv
UniEcoli_20_flanklength_flankclashFalse_only_half_flanks.csv
UniEcoli_20_flanklength_flankclashFalse.csv
UniEcoli_20_flanklength_flankclashTrue_only_full_flanks.csv
UniEcoli_20_flanklength_flankclashTrue_only_half_flanks.csv
UniEcoli_20_flanklength_flankclashTrue.csv
UniER_5_flanklength_flankclashFalse_only_full_flanks.csv
UniER_5_flanklength_flankclashFalse_only_half_flanks.csv
UniER_5_flanklength_flankclashFalse.csv
UniER_5_flanklength_flankclashTrue_only_full_flanks.csv
UniER_5_flanklength_flankclashTrue_only_half_flanks.csv
UniER_5_flanklength_flankclashTrue.csv
UniER_10_flanklength_flankclashFalse_only_full_flanks.csv
UniER_10_flanklength_flankclashFalse_only_half_flanks.csv
UniER_10_flanklength_flankclashFalse.csv
UniER_10_flanklength_flankclashTrue_only_full_flanks.csv
UniER_10_flanklength_flankclashTrue_only_half_flanks.csv
UniER_10_flanklength_flankclashTrue.csv
UniER_20_flanklength_flankclashFalse_only_full_flanks.csv
UniER_20_flanklength_flankclashFalse_only_half_flanks.csv
UniER_20_flanklength_flankclashFalse.csv
UniER_20_flanklength_flankclashTrue_only_full_flanks.csv
UniER_20_flanklength_flankclashTrue_only_half_flanks.csv
UniER_20_flanklength_flankclashTrue.csv
UniFungi_5_flanklength_flankclashFalse_only_full_flanks.csv
UniFungi_5_flanklength_flankclashFalse_only_half_flanks.csv
UniFungi_5_flanklength_flankclashFalse.csv
UniFungi_5_flanklength_flankclashTrue_only_full_flanks.csv
UniFungi_5_flanklength_flankclashTrue_only_half_flanks.csv
UniFungi_5_flanklength_flankclashTrue.csv
UniFungi_10_flanklength_flankclashFalse_only_full_flanks.csv
UniFungi_10_flanklength_flankclashFalse_only_half_flanks.csv
UniFungi_10_flanklength_flankclashFalse.csv
UniFungi_10_flanklength_flankclashTrue_only_full_flanks.csv
UniFungi_10_flanklength_flankclashTrue_only_half_flanks.csv
UniFungi_10_flanklength_flankclashTrue.csv
UniFungi_20_flanklength_flankclashFalse_only_full_flanks.csv
UniFungi_20_flanklength_flankclashFalse_only_half_flanks.csv
UniFungi_20_flanklength_flankclashFalse.csv
UniFungi_20_flanklength_flankclashTrue_only_full_flanks.csv
UniFungi_20_flanklength_flankclashTrue_only_half_flanks.csv
UniFungi_20_flanklength_flankclashTrue.csv
UniGolgi_5_flanklength_flankclashFalse_only_full_flanks.csv
UniGolgi_5_flanklength_flankclashFalse_only_half_flanks.csv
UniGolgi_5_flanklength_flankclashFalse.csv
UniGolgi_5_flanklength_flankclashTrue_only_full_flanks.csv
UniGolgi_5_flanklength_flankclashTrue_only_half_flanks.csv
UniGolgi_5_flanklength_flankclashTrue.csv
UniGolgi_10_flanklength_flankclashFalse_only_full_flanks.csv
UniGolgi_10_flanklength_flankclashFalse_only_half_flanks.csv
UniGolgi_10_flanklength_flankclashFalse.csv
UniGolgi_10_flanklength_flankclashTrue_only_full_flanks.csv
UniGolgi_10_flanklength_flankclashTrue_only_half_flanks.csv
UniGolgi_10_flanklength_flankclashTrue.csv
UniGolgi_20_flanklength_flankclashFalse_only_full_flanks.csv
UniGolgi_20_flanklength_flankclashFalse_only_half_flanks.csv
UniGolgi_20_flanklength_flankclashFalse.csv
UniGolgi_20_flanklength_flankclashTrue_only_full_flanks.csv
UniGolgi_20_flanklength_flankclashTrue_only_half_flanks.csv
UniGolgi_20_flanklength_flankclashTrue.csv
UniHuman_5_flanklength_flankclashFalse_only_full_flanks.csv
UniHuman_5_flanklength_flankclashFalse_only_half_flanks.csv
UniHuman_5_flanklength_flankclashFalse.csv
UniHuman_5_flanklength_flankclashTrue_only_full_flanks.csv
UniHuman_5_flanklength_flankclashTrue_only_half_flanks.csv
UniHuman_5_flanklength_flankclashTrue.csv
UniHuman_10_flanklength_flankclashFalse_only_full_flanks.csv
UniHuman_10_flanklength_flankclashFalse_only_half_flanks.csv
UniHuman_10_flanklength_flankclashFalse.csv
UniHuman_10_flanklength_flankclashTrue_only_full_flanks.csv
UniHuman_10_flanklength_flankclashTrue_only_half_flanks.csv
UniHuman_10_flanklength_flankclashTrue.csv
UniHuman_20_flanklength_flankclashFalse_only_full_flanks.csv
UniHuman_20_flanklength_flankclashFalse_only_half_flanks.csv
UniHuman_20_flanklength_flankclashFalse.csv
UniHuman_20_flanklength_flankclashTrue_only_full_flanks.csv
UniHuman_20_flanklength_flankclashTrue_only_half_flanks.csv
UniHuman_20_flanklength_flankclashTrue.csv
UniPM_5_flanklength_flankclashFalse_only_full_flanks.csv
UniPM_5_flanklength_flankclashFalse_only_half_flanks.csv
UniPM_5_flanklength_flankclashFalse.csv
UniPM_5_flanklength_flankclashTrue_only_full_flanks.csv
UniPM_5_flanklength_flankclashTrue_only_half_flanks.csv
UniPM_5_flanklength_flankclashTrue.csv
UniPM_10_flanklength_flankclashFalse_only_full_flanks.csv
UniPM_10_flanklength_flankclashFalse_only_half_flanks.csv
UniPM_10_flanklength_flankclashFalse.csv
UniPM_10_flanklength_flankclashTrue_only_full_flanks.csv
UniPM_10_flanklength_flankclashTrue_only_half_flanks.csv
UniPM_10_flanklength_flankclashTrue.csv
UniPM_20_flanklength_flankclashFalse_only_full_flanks.csv
UniPM_20_flanklength_flankclashFalse_only_half_flanks.csv
UniPM_20_flanklength_flankclashFalse.csv
UniPM_20_flanklength_flankclashTrue_only_full_flanks.csv
UniPM_20_flanklength_flankclashTrue_only_half_flanks.csv
UniPM_20_flanklength_flankclashTrue.csv
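
The naming scheme described in the file name key can be decoded programmatically. The sketch below is an illustration only: the regular expression is inferred from the file names listed above rather than taken from the authors' scripts, and parse_dataset_name and its dictionary keys are hypothetical names.

```python
import re

# Pattern inferred from the listed file names (an assumption, not the
# authors' code): <database>_<N>_flanklength_flankclash<True|False>
# followed by an optional _only_<half|full>_flanks suffix.
DATASET_NAME = re.compile(
    r"^(?P<database>[A-Za-z]+)_"
    r"(?P<flank_length>\d+)_flanklength_"
    r"flankclash(?P<flankclash>True|False)"
    r"(?:_only_(?P<flank_filter>half|full)_flanks)?"
    r"\.csv$"
)


def parse_dataset_name(filename):
    """Split a dataset file name into its processing settings."""
    match = DATASET_NAME.match(filename)
    if match is None:
        raise ValueError("Unrecognised dataset file name: %s" % filename)
    return {
        "database": match.group("database"),
        "flank_length": int(match.group("flank_length")),
        "flankclash": match.group("flankclash") == "True",
        # None means no only_half_flanks / only_full_flanks condition.
        "flank_filter": match.group("flank_filter"),
    }


settings = parse_dataset_name(
    "UniPM_20_flanklength_flankclashTrue_only_half_flanks.csv"
)
print(settings["database"], settings["flank_length"],
      settings["flankclash"], settings["flank_filter"])
```

A parser of this kind makes it straightforward to select, for instance, all datasets with a flank length of 20 and no flank overlap from a directory listing.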

Usage.

Features.

The scripts can be used to mine UniProt files and the TopDB FASTA file into csv tables that present the transmembrane domain and its neighbouring residue sequences in an easier-to-handle form.

Additional scripts that were used to analyse the data are also included. However, these are provided as is and may not work out of the box, since they rely on additional modules.

Generating datasets.

  1. These scripts require Python 2.7, Biopython, and NumPy. To install the packages, run sudo easy_install pip; sudo pip install numpy; sudo pip install Biopython and enter your password when prompted. If you encounter errors, the most likely causes are that Python is not installed in its default location or that a package was already installed before you ran these commands.

  2. Run the dataset generator scripts with python topdb_fasta_dataset_generator.py and python Uniprot_dataset_generator.py. These generate the csv dataset files from the database files.
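
If it is more convenient to drive both generators from a single Python call, the steps above could be wrapped roughly as follows. This is a sketch under assumptions: it supposes that the scripts are run from the unpacked scripts folder (the working directory is not stated here) and that the chosen interpreter is Python 2.7 with Biopython and NumPy installed. The names generator_commands and run_generators are hypothetical.

```python
import subprocess

# Script names as given in step 2.
GENERATOR_SCRIPTS = [
    "topdb_fasta_dataset_generator.py",
    "Uniprot_dataset_generator.py",
]


def generator_commands(python_exe="python"):
    """Build one command line per generator script."""
    return [[python_exe, script] for script in GENERATOR_SCRIPTS]


def run_generators(scripts_dir, python_exe="python"):
    """Run each generator inside scripts_dir, stopping at the first failure.

    scripts_dir is assumed to be the folder that holds the scripts and the
    text files they rely on.
    """
    for command in generator_commands(python_exe):
        subprocess.check_call(command, cwd=scripts_dir)


# Example call (both arguments are assumptions about the local setup):
# run_generators("Scripts", python_exe="python2.7")
```

Using subprocess.check_call means a non-zero exit status from either script raises an exception, so a failed TopDB run is not silently followed by the UniProt run.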
