Supplementary to:

Materials and Methods
Naturally occurring pandemic 2009 (H1N1) influenza A virus sequences submitted between 30 March 2009 and 31 May 2010 were downloaded from the NCBI Influenza Virus Resource[1], giving a total of 3588 viral strains for analysis. The sequences of each protein were aligned with MAFFT[2], and substitutions at every position of all 10 proteins were identified for all 3588 strains relative to the reference strain A/Texas/04/2009. This strain was chosen because it was one of the first submitted strains with sequence information available for all viral genes and it most closely resembled the other H1N1 strains circulating during the first week of sequence submission.
Phylogenetic analysis was conducted on all strains with full-length nucleotide sequences available for all 8 segments. For each of these strains, the protein-coding nucleotide sequences were concatenated into a single sequence containing the coding regions of all 10 proteins. These concatenated sequences were aligned with MAFFT[2] using the FFT-NS-1 strategy, and CD-HIT[3] was applied with a maximum sequence identity of 99.94% to remove highly similar sequences, reducing the set to 727 non-redundant strains. A maximum likelihood tree was then built with PhyML[4] under the HKY85 substitution model, with branch support assessed by the approximate likelihood ratio test; the remaining parameters, such as the shape of the gamma distribution (0.353), were estimated by the program. The major strain lineages and substitutions discussed in this analysis were identified and marked on the resulting phylogenetic tree using MEGA4[5].
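The substitution-calling and redundancy-reduction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes the sequences are already aligned (as with the MAFFT output), numbers positions on the ungapped reference (here A/Texas/04/2009), and treats gaps and `X` in a query protein as missing information. The redundancy step replaces CD-HIT's word-filtered pairwise alignments with a simple identity computed on the existing alignment, so it only approximates CD-HIT's greedy clustering. The function names `call_substitutions`, `identity` and `greedy_nonredundant` are hypothetical.

```python
from typing import Dict, List

GAP = "-"
MISSING = {GAP, "X"}  # query characters treated as "no sequence information" (protein alignment)


def call_substitutions(ref_aln: str, query_aln: str) -> List[str]:
    """List substitutions (e.g. 'K2E') of an aligned query relative to an aligned
    reference, numbering positions on the ungapped reference sequence."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    subs: List[str] = []
    ref_pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r == GAP:
            continue  # insertion relative to the reference: no reference position
        ref_pos += 1
        if q in MISSING or q == r:
            continue  # unchanged, or no information at this position
        subs.append(f"{r}{ref_pos}{q}")
    return subs


def identity(a: str, b: str) -> float:
    """Identical aligned residues divided by the ungapped length of the shorter
    sequence (an approximation of CD-HIT's default global identity)."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != GAP)
    shorter = min(len(a.replace(GAP, "")), len(b.replace(GAP, "")))
    return matches / shorter if shorter else 0.0


def greedy_nonredundant(seqs: Dict[str, str], threshold: float = 0.9994) -> List[str]:
    """Greedy incremental clustering in the spirit of CD-HIT: visit sequences from
    longest to shortest and keep one as a new representative only if its identity
    to every representative kept so far is below `threshold`."""
    order = sorted(seqs, key=lambda name: len(seqs[name].replace(GAP, "")), reverse=True)
    reps: List[str] = []
    for name in order:
        if all(identity(seqs[name], seqs[rep]) < threshold for rep in reps):
            reps.append(name)
    return reps
```

For example, `call_substitutions("MKAILV", "MEAILV")` returns `["K2E"]`. Visiting longer sequences first means each retained representative is the longest member of its cluster, which is the ordering CD-HIT itself uses.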
To follow the emergence of the substitutions HA-K2E, HA-Q310H, PB2-K340N, HA-D239N and HA-D239G, the number of strains carrying each of these five substitutions was recorded by collection date. A 28-day window was used to estimate the average percentage of strains carrying a given substitution, relative to the total number of strains with sequence information at the position of that substitution. Because the first sample was collected on 30 March 2009 and each data point in the percentage-time graph represents the average over the preceding 28 days, the first data point falls on 26 April 2009. Far fewer sequences were available from February 2010 onwards, and including these data would introduce unreliable fluctuations; data points from February 2010 onwards were therefore excluded from the percentage-time graph analysis.
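The windowed percentage can be sketched as below. This is a minimal version under assumptions not stated in the text: each record is a strain that has sequence information at the substitution's position, the percentage is pooled over all strains in the window, the window is a trailing, inclusive 28-day span dated by its last day, and windows containing no strains are skipped. The function `windowed_percentages` and its `cutoff` parameter are hypothetical names.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


def windowed_percentages(
    records: Iterable[Tuple[date, bool]],
    window_days: int = 28,
    cutoff: Optional[date] = None,
) -> List[Tuple[date, float]]:
    """Percentage of strains carrying a substitution in a trailing window.

    records: (collection_date, carries_substitution) for each strain that has
    sequence information at the substitution's position.
    Each point is dated by the last day of its window; the window covers that day
    and the preceding window_days - 1 days. Points dated on or after `cutoff`
    are dropped, and windows containing no strains are skipped.
    """
    recs = sorted(records)
    if not recs:
        return []
    span = timedelta(days=window_days - 1)
    day = recs[0][0] + span  # first day with a full window behind it
    last = recs[-1][0]
    points: List[Tuple[date, float]] = []
    while day <= last and (cutoff is None or day < cutoff):
        start = day - span
        in_window = [carrier for d, carrier in recs if start <= d <= day]
        if in_window:
            points.append((day, 100.0 * sum(in_window) / len(in_window)))
        day += timedelta(days=1)
    return points
```

With an earliest collection date of 30 March 2009, the first returned point is dated 26 April 2009, matching the text; passing `cutoff=date(2010, 2, 1)` drops the points from February 2010 onwards.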
Mutations were mapped onto the crystal structure of the 2009 H1N1 hemagglutinin (PDB: 3LZG)[6], modeled with a human host-cell receptor analogue (LSTc), and onto a homology model of polymerase basic protein 2 (PB2) built with PDB: 2VQZ[7] as the template. Modeling and visualization of structures were performed with YASARA[8].
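Once a substitution has been located on a structure, its proximity to the receptor analogue can be checked by measuring interatomic distances. The sketch below is an illustration only (the modeling and visualization in this study were done in YASARA): it reads coordinates from the fixed-column ATOM/HETATM records of a PDB-format file, ignores alternate locations and insertion codes, and reports the minimum distance between one residue and a set of ligand residue names. The residue names that make up the receptor analogue in a given model (for example `SIA` for a sialic acid residue, if the model uses that code) are assumptions, as are the function names. Positions numbered on the full-length reference sequence may also need to be converted to the structure's residue numbering first.

```python
import math
from typing import Iterable, List, Tuple

Atom = Tuple[str, str, int, Tuple[float, float, float]]  # chain, resName, resSeq, xyz


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column layout.
    Alternate locations and insertion codes are ignored in this sketch."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        res_name = line[17:20].strip()  # columns 18-20
        chain = line[21]                # column 22
        res_seq = int(line[22:26])      # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))  # columns 31-54
        atoms.append((chain, res_name, res_seq, xyz))
    return atoms


def min_distance_to_ligand(atoms: List[Atom], chain: str, res_seq: int,
                           ligand_names: Iterable[str]) -> float:
    """Smallest distance (in angstroms) between any atom of residue `res_seq` in
    `chain` and any atom whose residue name is in `ligand_names`."""
    ligands = set(ligand_names)
    residue = [xyz for c, n, s, xyz in atoms if c == chain and s == res_seq and n not in ligands]
    ligand = [xyz for c, n, s, xyz in atoms if n in ligands]
    if not residue or not ligand:
        raise ValueError("residue or ligand atoms not found")
    return min(math.dist(p, q) for p in residue for q in ligand)
```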
Sequencing methodology by IAL: Viral RNA was extracted either from clinical samples or from the supernatant of infected MDCK cells using the QIAamp Viral RNA Extraction Kit (QIAGEN, Valencia, CA, US) according to the manufacturer's instructions; for necropsy tissues, the QIAamp Blood Viral RNA Extraction Kit was used instead. The primers used to amplify the complete HA gene, as well as the RT-PCR amplification and sequencing protocols, were those provided by WHO (http://www.who.int/csr/resources/publications/swineflu/sequencing_primers/en/index.html). RT-PCR products were sequenced directly using the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit (PE Applied Biosystems, Foster City, CA, US), and sequences were read on an Applied Biosystems 3130 Genetic Analyzer. Sequences from the original clinical samples and from the corresponding cell-culture isolates showed the same pattern of mutations. The sequences were deposited in GenBank under the following accession numbers: GQ247724; GQ356787; GQ368664-GQ368667; GQ414764-GQ414768; GQ915017-GQ915025. Accession numbers of the sequences discussed in detail are given in the main text.
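The GenBank accession list mixes single accessions and ranges; expanding it into individual accession numbers is a small bookkeeping step, for example before retrieving the records. A minimal sketch, assuming each range shares one prefix and one digit count at both ends (the function name is hypothetical):

```python
import re
from typing import List

_ITEM = re.compile(r"([A-Z]+)(\d+)(?:-([A-Z]+)(\d+))?")


def expand_accessions(spec: str) -> List[str]:
    """Expand a ';'-separated list of accessions and ranges such as
    'GQ368664-GQ368667' into individual accession numbers."""
    out: List[str] = []
    for item in (part.strip() for part in spec.split(";")):
        if not item:
            continue
        m = _ITEM.fullmatch(item)
        if m is None:
            raise ValueError(f"unrecognized accession: {item!r}")
        prefix, first, end_prefix, last = m.groups()
        if last is None:  # single accession
            out.append(item)
            continue
        if end_prefix != prefix or len(last) != len(first):
            raise ValueError(f"unsupported range: {item!r}")
        width = len(first)  # keep zero padding of the numeric part
        out.extend(f"{prefix}{n:0{width}d}" for n in range(int(first), int(last) + 1))
    return out
```

Applied to the accession list above, this yields 20 accession numbers (1 + 1 + 4 + 5 + 9).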
References
1. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, et al. The influenza virus resource at the National Center for Biotechnology Information. J. Virol. 2008 Jan;82(2):596-601.
2. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33(2):511-518.
3. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006 Jul 1;22(13):1658-1659.
4. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 2003 Oct;52(5):696-704.
5. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 2007 Aug;24(8):1596-1599.
6. Xu R, Ekiert DC, Krause JC, Hai R, Crowe JE, Wilson IA. Structural basis of preexisting immunity to the 2009 H1N1 pandemic influenza virus. Science. 2010 Apr 16;328(5976):357-360.
7. Guilligay D, Tarendeau F, Resa-Infante P, Coloma R, Crepin T, Sehr P, et al. The structural basis for cap binding by influenza virus polymerase subunit PB2. Nat. Struct. Mol. Biol. 2008 May;15(5):500-506.
8. Krieger E, Koraimann G, Vriend G. Increasing the precision of comparative models with YASARA NOVA--a self-parameterizing force field. Proteins. 2002 May 15;47(3):393-402.