Protein Sequence Analysis Group

Bioinformatics Institute
  Supplementary materials to:
  On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

This is part 4 of a series of articles on the issue of sequence similarity and homology. The three preceding articles are accessible via these links: part 1, part 2 and part 3a, 3b.

Wing-Cheong Wong, Sebastian Maurer-Stroh, Birgit Eisenhaber, Frank Eisenhaber

Full reference
BMC Bioinformatics, 15(1):166, doi:10.1186/1471-2105-15-166

Supplementary to results section: Dissection of sequence alignments accentuates homology evidence in true hits while deemphasizes false hits

Below, we provide the HMMER2 and HMMER3 alignments of the 10 false-positive hits from the case study

  • Transferrin-binding protein (PF01298.13 Lipoprotein_5) to false hits (IF2P_HUMAN, IF2P_MOUSE, IF2P_PONAB, NUCL2_ORYSJ) click here

  • Hepatocyte nuclear factor (PF04814.8 HNF-1_N) to false hits (MLL2_MOUSE, CORTO_DROME, DHKL_DICDI) click here

  • Type II secretion system protein L (PF05134.8 T2SL) to false hit (AMOT_MOUSE) click here

  • Chromatin remodeling factor ISW1a (PF09110.6 HAND) to false hit (NUCL_HUMAN) click here

  • RNA polymerase II elongation factor (PF10390.4 ELL) to false hit (PK4_DICDI) click here

Below, we provide the HMMER2 and HMMER3 alignments of the 3 false-negative hits from the case study

  • ATPase family (PF00004.24 AAA) to true hit (CHLI_PORPU) click here

  • Short chain dehydrogenase (PF00106.20 adh_short) to true hit (HEM1_METKA) click here

  • Formate/Nitrate transporters (PF01226.12 Form_Nir_trans) to true hit (TIP12_MAIZE) click here

Supplementary to results section: Quality score as a proxy to identify the structural segments of domain models for score dissection

Below, we provide the list of high-quality and low-quality segments for 537 SMART and 4771 Pfam domains that have representative PDB/DSSP information

  • List for the 537 SMART domains (67kb) click here

  • List for the 4771 Pfam domains (908kb) click here

Supplementary to results section: The dissection framework validates the seed sequences in domain alignments and systematically identifies the potential false positive and false negative hits in HMMER searches

Below, we provide the domain model alignments of SM00320 (WD40) and PF13894.1 (zf-C2H2_4), for which the common paired hits are fewer than the HMMER2 orphaned hits

  • SMART domain SM00320 (WD40) (290kb) click here

  • Pfam domain PF13894.1 (zf-C2H2_4) (98kb) click here

DissectHMMER Perl program

Below, we provide the program (written as Perl modules) that recomputes the HMMER2/3 log-odds scores.

  • download the program here
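
To illustrate the general idea of score dissection, the following is a minimal sketch and not the DissectHMMER code itself. It assumes that the total log-odds score of an alignment has already been broken down into one contribution per model position, and that each position has been assigned to a high-quality or low-quality segment. The function name `dissect_score`, the segment labels and all numbers are hypothetical, and the sketch ignores details such as state-transition terms that real HMMER scores include.

```python
from typing import Dict, List, Tuple


def dissect_score(
    position_scores: List[float],
    segments: Dict[str, List[Tuple[int, int]]],
) -> Dict[str, float]:
    """Sum per-position scores over named groups of half-open [start, end) ranges."""
    totals: Dict[str, float] = {}
    for name, ranges in segments.items():
        totals[name] = sum(sum(position_scores[s:e]) for s, e in ranges)
    return totals


# Made-up per-position log-odds contributions (in bits) for one alignment.
position_scores = [1.5, 2.0, -0.5, 0.25, 0.25, 3.0, -1.0, 0.5]

# Hypothetical segment assignment covering every position exactly once.
segments = {
    "high_quality": [(0, 2), (5, 6)],
    "low_quality": [(2, 5), (6, 8)],
}

totals = dissect_score(position_scores, segments)
for name, value in totals.items():
    print(f"{name}: {value:.2f} bits")
```

Because the segments in this example partition the positions, the segment-specific sums add up to the total score, so a hit can be inspected for how much of its score comes from each kind of segment.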

All files are compressed with WinRAR.
