From bffcf9271f0543c33e1e00289ba9221c119b07cb Mon Sep 17 00:00:00 2001
From: Don Pellegrino
Date: Fri, 15 Jan 2010 18:43:32 +0000
Subject: Initial check-in.

---
diff --git a/README b/README
new file mode 100644
index 0000000..9caedb8
--- a/dev/null
+++ b/README
@@ -0,0 +1,35 @@
+Experiment 007
+Don Pellegrino [don@drexel.edu]
+
+Collection and inventory of influenza data.
+
+INTRODUCTION
+
+The "Influenza Virus Resource" at NCBI
+[http://www.ncbi.nlm.nih.gov/genomes/FLU/] exposes the sequence records and
+their meta-data in a number of different ways. An exploration of the
+phylogenetic properties of the records first requires that the available data
+be collected and inventoried.
+
+Two primary alternatives have been identified for managing the data. The
+first is a relational database; IBM DB2 has been used for this. The use of
+a relational database is limited by the difficulty of sharing the data: each
+vendor uses incompatible import and export routines. Additionally, installing
+an instance of a database management system (DBMS) often requires a large
+amount of effort and may not be practical in hosted environments that do not
+support running user daemons. Finally, proper parallelization of a DBMS
+requires additional system-specific configuration for each machine used.
+
+The second alternative is a container file format such as HDF5. Its
+advantage is that all of the data can be collected into a single file, which
+can then be shared with others. Its disadvantage is that it lacks the robust
+search and SQL operations provided by a DBMS. In addition, the two
+alternatives use fundamentally different storage strategies: the DBMS uses a
+relational model, while the container file format uses a hierarchical
+model.
+
+The "doc/Data Deployments.dia" diagram shows the source systems that
+expose the various records as well as the transform routines that are
+used for aggregation of the data on the local system.
+ + LocalWords: NCBI parallelization HDF SQL Pellegrino phylogenetic DBMS dia diff --git a/data/ProteinNames.txt b/data/ProteinNames.txt new file mode 100644 index 0000000..13ae313 --- a/dev/null +++ b/data/ProteinNames.txt @@ -0,0 +1,275 @@ +>A_PB2 + +MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPER +NEQGQTLWSKMSDAGSDRVMVSPLAVTWWNRNGPVTSTVHYPKVYKTYFDKVERLKHGTFGPVHFRNQVK +IRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELRDCKISPLMVAYMLERE +LVRKTRFLPVAGGTSSIYIEVLHLTQGTCWEQMYTPGGGVRNDDVDQSLIIAARNIVRRAAVSADPLASL +LEMCHSTQIGGTRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSVKKEEEVLTGNLQ +TLKIRVHEGYEEFTMVGKRATAILRKATRRLVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF +VNRANQRLNPMHQLLRHFQKDAKVLFQNWGVEHIDSVMGMIGVLPDMTPSTEMSMRGIRVSKMGVDEYSS +TERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTERLTITYSSSMMWEINGPESVLVNTYQWIIRNWEAV +KIQWSQNPAMLYNKMEFEPFQSLVPKAIRSQYSGFVRTLFQQMRDVLGTFDTTQIIKLLPFAAAPPKQSR +MQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRLTILGKDAGTLIEDPDESTSGVESAVLRGFLIIGKE +DRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN + +>A_PB1 +MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPID +GPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEAVQQTRVDKLTQGRQTYDWTLNRNQPAA +TALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTKKMVTQRTIGKKKQ +RVNKRGYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKA +KLANVVRKMMTNSQDTELSFTITGDNTKWNENQNPRMFLAMITYITKNQPEWFRNILSIAPIMFSNKMAR +LGKGYMFESKRMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTASLSPGMMMGMFNMLSTVLG +VSILNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINKTGTF +EFTSFFYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYR +CHRGDTQIQTRRSFELKKLWDQTQSRAGLLVSDGGPNLYNIRNLHIPEVCLKWELMDENYRGRLCNPLNP +FVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFF +PSSSYRRPIGISSMVEAMVSRARIDARIDFESGRIKKEEFSEIMKICSTIEELRRQK + +>A_PB1-F2 +MEQEQGTPWTQSTEHTNIQRRGSGRQIQKLGHPNSTQLMDHYLRIMNQVDMHKQTVSWRLWPSLKNPTQV +SLRTHALKQWKPFNRQGWTN + +>A_PA + 
+MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIVVELDDPNA +LLKHRFEIIEGRDRTMAWTVVNSICNTTGAGKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKS +ENTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMANRGLWDSFRQSERGEETIEEKFEITGT +MRRLADQSLPPNFSCLENFRAYVDGFEPNGCIEGKLSQMSKEVNAQIEPFLKTTPRPIKLPNGPPCYQRS +KFLLMDALKLSIEDPSHEGEGIPLYDAIKCIKTFFGWKEPYIVKPHEKGINSNYLLSWKQVLSELQDIEN +EEKIPRTKNMKKTSQLKWALGENMAPEKVDFENCRDISDLKQYDSDEPELRSLSSWIQNEFNKACELTDS +VWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCR +TKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQISRP +MFLYVRTNGTSKVKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKDMTKEFFENKSEAWPIGESPKGVEE +GSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWV +LLNASWFNSFLTHALK + +>A_HA +MKTIIALSYILCLVFAQKLPGNDNSTATLCLGHHAVPNGTIVKTITNDQIEVTNATELVQSSSTGEICDS +PHQILDGENCTLIDALLGDPQCDGFQNKKWDLFVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNNES +FNWTGVTQNGTSSACIRRSNNSFFSRLNWLTHLKFKYPALNVTMPNNEKFDKLYIWGVHHPGTDNDQIFL +YAQASGRITVSTKRSQQTVIPNIGSRPRVRNIPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRSGK +SSIMRSDAPIGKCNSECITPNGSIPNDKPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGA +IAGFIENGWEGMVDGWYGFRHQNSEGIGQAADLKSTQAAIDQINGKLNRLIGKTNEKFHQIEKEFSEVEG +RIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFEKTKKQLRENAEDMGNGCFKIYHKCD +NACIGSIRNGTYDHDVYRDEALNNRFQIKGVELKSGYKDWILWISFAISCFLLCVALLGFIMWACQKGNI +RCNICI + +>A_NP +MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSA +FDERRNKYLEEHPSAGKDPKKTGGPIYRRVDGKWMRELVLYDKEEIRRIWRQANNGEDATSGLTHIMIWH +SNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMVMELIRMVKRGINDRNFWRGE +NGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACA +YGPAVSSGYDFEKEGYSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGT +KVSPRGKLSTRGVQIASNENMDNMGSSTLELRSGYWAIRTRSGGNTNQQRASAGQTSVQPTFSVQRNLPF +EKSTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEVSFRGRGVFELSDEKATNPIVPSFDMSNEGSYFFG +DNAEEYDN + +>A_NA +MNPNQKIITIGSVSLTISTICFFMQTAILITTVTLHFKQYEFNSPPNNQVMLCEPTIIERNITEIVYLTN 
+TTIEKEICPKLAEYRNWSKPQCDITGFAPFSKDNSIRLSAGGDIWVTREPYVSCDPDKCYQFALGQGTTL +NNVHSNDTVRDRTPYRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWLHVCITGDDKNATASFIYNGR +LVDSIVSWSKEILRTQESECVCINGTCTVVMTDGSASGKADTKILFIEEGKIVHTSTLSGSAQHVEECSC +YPRYPGVRCVCRDNWKGSNRPIVDINIKDHSIVSSYVCSGLVGDTPRKNDSSSSSHCLDPNNEEGGHGVK +GWAFDDGNDVWMGRTISEKSRLGYETFKVIEGWSNPKSKLQINRQVIVDRGNRSGYSGIFSVEGKSCINR +CFYVELIRGRKEETEVLWTSNSIVVFCGTSGTYGTGSWPDGADINLMPI + +>A_NA +MNTNQRIITIGTICLIVGIISLLLQIGNIILLWMSHSIQTGEKSHPKVCNQSVITYENNTWVNQTYVNIS +NTNIAAGQGVTPIILAGNSSLCPISGWAIYSKDNSIRIGSKGDIFVMREPFISCSHLECRTFFLTQGALL +NDRHSNGTVKDRSPYRTLMSCPIGEAPSPYNSRFESVAWSASACHDGMGWLTIGISGPDNGAVAVLKYNG +IITDTIKSWRNKILRTQESECVCINGSCFTIMTDGPSNGQASYKLFKMEKGKIIRSIELDAPNYHYEECS +CYPDTGKVVCVCRDNWHASNRPWVSFDQNLDYQIGYICSGVFGDNPRSNDGKGNCGPVLSNGANGVKGFS +FRYGNGVWIGRTKSISSRSGFEMIWDPNGWTETDSSFSMKQDIIALTDWSGYSGSFVQHPELTGMNCIRP +CFWVELIRGQPKESTIWTSGSSISFCGVNSGTASWSWPDGADLPFTIDK + +>A_M1 +MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFSGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPS +ERGLQRRRFVQNALNGNGDPNNMDKAVKLYRKLKREITFHGAKEIALSYSAGALASCMGLIYNRMGAVTT +EVAFGLVCATCEQIADSQHRSHRQMVATTNPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEIASQAR +QMVQAMRAIGTHPSSSTGLRDDLLENLQTYQKRMGVQMQRFK + +>A_M2 +PIRNEWGCRCNDSSDPLVVAANIIGILHLILWILDRLFFKCVYRLFKHGLKRGPSTEGVPE +SMREEYRKEQQNAVDADDSHFVSIELE + +>A_NS1 +MDSNTVSSFQVDCFLWHIRKQVVDQELSDAPFLDRLRRDQRSLRGRGNTLGLDIKAATHVGKQIVEKILK +EESDEALKMTMVSTPASRYITDMTIEELSRNWFMLMPKQKVEGPLCIRMDQAIMEKNIMLKANFSVIFDR +LETIVLLRAFTEEGAIVGEISPLPSFPGHTIEDVKNAIGVLIGGLEWNDNTVRVSKNLQRFAWRSSNENG +GPPLTPKQKREMARTARSKV + +>A_NS2 +DILLRMSKMQLGSSSEDLNGMITQFESLKIYRDSLGEAVMRMGDLHLLQNRNGKWREQLG +QKFEEIRWLIEEVRHRLKTTENSFEQITFMQALQLLFEVEQEIRTFSFQLI +>B_PB1 +MNINPYFLFIDVPIQAAISTTFPYTGVPPYSHGTGTGYTIDTVIRTHEYSNKGKQYISDVTGCTMVDPTN +GPLPEDNEPSAYAQLDCVLEALDRMDEEHPGLFQAASQNAMEALMVTTVDKLTQGRQTFDWTVCRNQPAA +TALNTTITSFRLNDLNGADKGGLIPFCQDIIDSLDRPEMTFFSVKNIKKKLPAKNRKGFLIKRIPMKVKD +KITKVEYIKRALSLNTMTKDAERGKLKRRAIATAGIQIRGFVLVVENLAKNICENLEQSGLPVGGNEKKA 
+KLSNAVAKMLSNCPPGGISMTVTGDNTKWNECLNPRIFLAMTERITRDSPIWFRDFCSIAPVLFSNKIAR +LGKGFMITSKTKRLKAQIPCPDLFSIPLERYNEETRAKLKKLKPFFNEEGTASLSPGMMMGMFNMLSTVL +GVAALGIKNIGNKEYLWDGLQSSDDFALFVNAKDEETCMEGINDFYRTCKLLGVNMSKKKSYCNETGMFE +FTSMFYRDGFVSNFAMELPSFGVAGVNESADMAIGMTIIKNNMINNGMGPATAQTAIQLFIADYRYTYKC +HRGDSKVEGKRMKIIKELWENTKGRDGLLVADGGPNIYNLRNLHIPEIVLKYNLMDPEYKGRLLHPQNPF +VGHLSIEGIKEADITPAHGPVKKMDYDAVSGTHSWRTKRNRSILNTDQRNMILEEQCYAKCCNLFEACFN +SASYRKPVGQHSMLEAMAHRLRMDARLDYESGRMSKDDFEKAMAHLGEIGYI + +>B_PB2 + +MTLAKIELLKQLLRDNEAKTVLKQTTVDQYNIIRKFNTSRIEKNPSLRMKWAMCSNFPLALTKGDMANRI +PLEYKGIQLKTNAEDIGTKGQMCSIAAVTWWNTYGPIGDTEGFERVYESFFLRKMRLDNATWGRITFGPV +ERVRKRVLLNPLTKEMPPDEASNVIMEILFPKEAGIPRESTWIHRELIKEKREKLKGTMITPIVLAYMLE +RELVARRRFLPVAGATSAEFIEMLHCLQGENWRQIYHPGGNKLTESRSQSMIVACRKIIRRSIVASNPLE +LAVEIANKTVIDTEPLKSCLAAIDGGDVACDIIRAALGLKIRQRQRFGRLELKRISGRGFKNDEEILIGN +GTIQKIGIWDGEEEFHVRCGECRGILKKSKMKLEKLLINSAKKEDMRDLIILCMVFSQDTRMFQGVRGEI +NFLNRAGQLLSPMYQLQRYFLNRSNDLFDQWGYEESPKASELHGINESMNASDYTLKGVVVTRNVIDDFS +STETEKVSITKNLSLIKRTGEVIMGANDVSELESQAQLMITYDTPKMWEMGTTKELVQNTYQWVLKNLVT +LKAQFLLGKEDMFQWDAFEAFESIIPQKMAGQYSGFARAVLKQMRDQEVMKTDQFIKLLPFCFSPPKLRS +NGEPYQFLKLVLKGGGENFIEVRKGSPLFSYNPQTEVLTICGRMMSLKGKIEDEERNRSMGNAVLAGFLV +SGKYDPDLGDFKTIEELEKLKPGEKANILLYQGKPVKVVKRKRYSALSNDISQGIKRQRMTVESMGWALS + +>B_PA + +MDTFITRNFQTTIIQKAKNTMAEFSEDPELQPAMLFNICVHLEVCYVISDMNFLDEEGKAYTALEGQGKE +QNLRPQYEVIEGMPRTIAWMVQRSLAQEHGIETPKYLADLFDYKTKRFIEVGITKGLADDYFWKKKEKLG +NSMELMIFSYNQDYSLSNESSLDEEGKGRVLSRLTELQAELSLKNLWQVLIGEEDVEKGIDFKLGQTISR +LRDISVPAGFSNFEGMRSYIDNIDPKGAIERNLARMSPLVSVTPKKLTWEDLRPIGPHIYDHELPEVPYN +AFLLMSDELGLANMTEGKSKKPKTLAKECLEKYSTLRDQTDPILIMKSEKANENFLWKLWRDCVNTISNE +ETSNELQKTNYAKWATGDGLTYQKIMKEVAIDDETMCQEEPKIPNKCRVAAWVQTEMNLLSTLTSKRALD +LPEIGPDVAPVEHVGSERRKYFVNEINYCKASTVMMKYVLFHTSLLNESNASMGKYKVIPITNRVVNEKG +ESFDMLYGLAVKGQSHLRGDTDVVTVVTFEFSSTDPRVDSGKWPKYTVFRIGSLFVSGREKSVYLYCRVN +GTNKIQMKWGMEARRCLLQSMQQMEAIVEQESSIQGYDMTKACFKGDRVNSPKTFSIGTQEGKLVKGSFG 
+KALRVIFTKCLMHYVFGNAQLEGFSAESRRLLLLIQALKDRKGPWVFDLEGMYSGIEECISNNPWVIQSA +YWFNEWLGFEKEGSKVLESVDEIMDE + +>B_HA +MKAIIVLLMVVTSNADRICTGITSSNSPHVVKTATQGEVNVTGVIPLTTTPTKSHFANLKGTETRGKLCP +KCLNCTDLDVALGRPKCTGNIPSARVSILHEVRPVTSGCFPIMHDRTKIRQLPNLLRGYEHIRLSTHNVI +NAENAPGGPYKIGTSGSCPNVTNGNGFFATMAWAVPKNDNNKTATNSLTIEVPYICTEGEDQITVWGFHS +DNETQMAKLYGDSKPQKFTSSANGVTTHYVSQIGGFPNQTEDGGLPQSGRIVVDYMVQKSGKTGTITYQR +GILLPQKVWCASGRSKVIKGSLPLIGEADCLHEKYGGLNKSKPYYTGEHAKAIGNCPIWVKTPLKLANGT +KYRPPAKLLKERGFFGAIAGFLEGGWEGMIAGWHGYTSHGAHGVAVAADLKSTQEAINKITKNLNSLSEL +EVKNLQRLSGAMDELHNEILELDEKVDDLRADTISSQIELAVLLSNEGIINSEDEHLLALERKLKKMLGP +SAVEIGNGCFETKHKCNQTCLDRIAAGTFDAGEFSLPTFDSLNITAASLNDDGLDNHTILLYYSTAASSL +AVTLMIAIFVVYMVSRDNVSCSICL + +>B_NP +MSNMDIDGINTGTIDKAPEEITSGTSGTTRPIIRPATLAPPSNKRTRNPSPERATTIGEADVGRKTQKKQ +TPTEIKKSVYNMVVKLGEFYNQMMVKAGLNDDMERNLIQNAHAVERILLAATDDKKTEFQKKKNARDVKE +GKEEIDHNKTGGTFYKMVRDDKTIYFSPIRVTFLKEEVKTMYKTTMGSDGFSGLNHIMIGHSQMNDVCFQ +RSKALKRVGLDPSLISTFAGSTLPRRSGATGVAIKGGGTLVAEAIRFIGRAMADRGLLRDIKAKTAYEKI +LLNLKNKCSAPQQKALVDQVIGSRNPGIADIEDLTLLARSMVVVRPSVASKVVLPISIYAKIPQLGFNVE +EYSMVGYEAMALYNMATPVSILRVGDDAKDKSQLFFMSCFGAAYEDLRVLSALTGTEFKPRSALKCKGFH +VPAKEQVEGMGAALMSIKLQFWAPMTRSGGNEVGGDGGSGQISCSPVFAVERPIALSKQAVRRMLSMNIE +GRDADVKGNLLKMMNDSMAKKTNGNAFIGKKMFQISDKNKTNPVEIPIKQTIPNFFFGRDTAEDYDDLDY +>B_NB +MNNATFNYTNVNPISHIRGSIIITICVSFIIILTIFGYIAKILTNRNNCTNNAIGLCKCIKCSGCEPFCN +KRGDTSSPRTGVDIPAFILPGLNLSESTPN + +>B_NA +MLPSTIQTLTLFLTSGGVLLSLYVSASLSYLLYSDILLKFSPTEITAPTMPLDCANASNVQAVNRSATKG +VTLLLPEPEWTYPRLSCPGSTFQKALLISPHRFGETKGNSAPLIIREPFIACGPNECKHFALTHYAAQPG +GYYNGTRGDRNKLRHLISVKLGKIPTVENSIFHMAAWSGSACHDGKEWTYIGVDGPDNNALLKIKYGEAY +TDTYHSYANKILRTQESACNCIGGNCYLMITDGSASGVSECRFLKIREGRIIKEIFPTGRVKHTEECTCG +FASNKTIECACRDNSYTAKRPFVKLNVETDTAEIRLMCTDTYLDTPRPDDGSITGPCESNGDKGSGGIKG +GFVHQRMASKIGRWYSRTMSKTERMGMGLYVKYDGDPWADSDALAFSGVMVSMKEPGWYSFGFEIKDKKC +DVPCIGIEMVHDGGKETWHSAATAIYCLMGSGQLLWDTVTGVDMAL + +>B_M1 + +MSLFGDTIAYLLSLTEDGEGKAELAEKLHCWFGGKEFDLDSALEWIKNKRCLTDIQKALIGASICFLKPK 
+DQERKRRFITEPLSGMGTTATKKKGLILAERKMRRCVSFHEAFEIAEGHESSALLYCLMVMYLNPGNYSM +QVKLGTLCALCEKQASHSHRAHSRAARSSVPGVRREMQMVSAMNTAKTMNGMGKGEDVQKLAEELQSNIG +VLRSLGASQKNGEGIAKDVMEVLKQSSMGNSALVKKYL + +>B_BM2 + +MLEPFQILSICSFILSALHFMAWTIGHLNQIKRGINMKIRIKGPNKETINREVSILRHSYQKEIQAKETM +KEVLSDNMEVLSDHIIIEGLSAEEIIKMGETVLEIEELH + +>B_NS1 +MANNMTTTQIEVGPGATNATINFEAGILECYERLSWQRALDYPGQDRLNRLKRKLESRIKTHNKSEPESK +RMSLEERKAIGVKMMKVLLFMNPSAGIEGFEPYCMKSSSNSNCTKYNWTDYPSTPGRCLDDIEEEPEDVD +GPTEIVLRDMNNKDARQKIKEEVNTQKEGKFRLTIKRDMRNVLSLRVLVNGTFLKHPNGYKSLSTLHRLN +AYDQSGRLVAKLVATDDLTVEDEEDGHRILNSLFERLNEGHSKPIRAAETAVGVLSQFGQEHRLSPEEGD +N + +>B_NS2 +WRMKKMAIGSSTHSSSVLMKDIQSQFEQLKLRWESYPNLVKSTDYHQKRETIRLVTEEL +YLLSKRIDDNILFHKTVIANSSIIADMVVSLSLLETLYEMKDVVEVYSRQCL + +>C_CM2 +MGRMAMKWLVVIICFSITSQPASACNLKTCLKLFNNTDAVTVHCFNENQGYMLTLASLGLGIITMLYLLV +KIIIELVNGFVLGRWERWCGDIKTTIMPEIDSMEKDIALSRERLDLGEDAPDETDNSPIPFSNDGIFEI +>C_M1 +MAHEILIAETEAFLKNVAPETRTAIISAITGGKSACKSAAKLIKNEHLPLMSGEATTMHIVMRCLYPEIK +PWKKASDMLNKATSSLKKSEGRDIRKQMKAAGDFLGVESMMKMRAFRDDQIMEMVEEVYDHPDDYTPDIR +IGTITAWLRCKNKKSERYRSNVSESGRTALKIHEVRKASTAMNEIAGITGLGEEALSLQRQTESLAILCN +HTFGSNIMRPHLEKAIKGVEGRVGEMGRMAMK +>C_NP +MSDRRQNRKTPDEQRKANALIINENIEAYIAICKEVGLNGDEMLILENGIAIEKAIRICCDGKYQEKREK +KAREAQRADSNFNADSIGIRLVKRAGSGTNITYHAVVELTSRSRIVQILKSHWGNELNRAKIAGKRLGFS +ALFASNLEAIIYQRGRNAARRNGSAELFTLTQGAGIETRYKWIMEKHIGIGVLIADAKGLINGKREGKRG +VDANVKLRAGTTGSPLERAMQGIEKKAFPGPLRALARRVVKANYNDAREALNVIAEASLLLKPQITNKMT +MPWCMWLAARLTLKDEFANFCAYAGRRAFEVFNIAMEKIGICSFQGTIMNDDEIESIEDKAQVLMMACFG +LAYEDFSLVSAMVSHPLKLRNRMKIGNFRVGEKVSTVLSPLLRFTRWAEFAQRFALQANTSREGAQISNS +AVFAVERKITTDVQRVEELLNKVQAHEDEPLQTLYKKVREQISIIGRNKSEIKEFLGSSMYDLNDQEKQN +PINFRSGAHPFFFEFDPDYNPIRVKRPKKPIAKRNSNISRLEEEGMDENSEIGQAKKMKPLDQLTSTSSN +IPGKN +>C_HE +MFFSLLLMLGLTEAEKIKICLQKQVNSSFSLHNGFGGNLYATEEKRMFELVKPKAGASVLNQSTWIGFGD +SRTDKSNSAFPRSADVSAKTADKFRSLSGGSLMLSMFGPPGKVDYLYQGCGKHKVFYEGVNWSPHAAINC +YRKNWTDIKLNFQKNIYELASQSHCMSLVNALDKTIPLQATAGVAKNCNNSFLKNPALYTQEVNPSVEKC 
+GKENLAFFTLPTQFGTYECKLHLVASCYFIYDSKEVYNKRGCDNYFQVIYDSSGKVVGGLDNRVSPYTGN +SGDTPTMQCDMLQLKPGRYSVRSSPRFLLMPERSYCFDMKEKGPVTAVQSIWGKGRESDHAVDQACLSTP +GCMLIQKQKPYIGEADDHHGDQEMRELLSGLDYEARCISQSGWVNETSPFTEEYLLPPKFGRCPLAAKEE +SIPKIPDGLLIPTSGTDTTVTKPKSRIFGIDDLIIGLLFVAIVEAGIGGYLLGSRKVSGGGVTKESAEKG +FEKIGNDIQILRSSTNIAIEKLNDRISHDEQAIRDLTLEIENARSEALLGELGIIRALLVGNISIGLQES +LWELASEITNRAGDLAVEVSPGCWVIDNNICDQSCQNFIFKFNETAPVPTIPPLDTKIDLQSDPFYWGSS +LGLAITAAISLAALVISGIAICRTK +>C_P3 +MSKTFAEIAEAFLEPEAVRIAKEAVEEYGDHERKIIQIGIHFQVCCMFCDEYLSTNGSDRFVLIEGRKRG +TAVSLQNELCKSYDLEPLPFLCDIFDREEKQFVEIGITRKADDSYFQSKFGKLGNSCKIFVFSYDGRLDK +NCEGPMEEQKLRIFSFLATAADFLRKENMFNEIFLPDNEETIIEMKKGKTFLKLRDESVPLPFQTYEQMK +DYCEKFKGNPRELASKVSQMQSNIKLPIKHYEQNKFRQIRLPKGPMAPYTHKFLMEEAWMFTKISDPERS +RAGEILIDFFKKGNLSAIRPKDKPLQGKYPIHYKNLWNQIKAAIADRTMVINENDHSEFLGGIGRASKKI +PEVSLTQDVITTEGLKQSENKLPEPRSFPKWFNAEWMWAIKDSDLTGWVPMAEYPPADNELEDYAEHLNK +TMEGVLQGTNCAREMGKCILTVGALMTECRLFPGKIKVVPIYARSKERKSMQEGLPVPSEMDCLFGICVK +SKSHLNKDDGMYTIITFEFSIREPNLEKHQKYTVFEAGHTTVRMKKGESVIGREVPLYLYCRTTALSKIK +NDWLSKARRCFITTMDTVETICLRESAKAEENLVEKTLNEKQMWIGKKNGELIAQPLREALRVQLVQQFY +FCIYNDSQLEGFCNEQKKILMALEGDKKNKSSFGFNPEGLLEKIEECLINNPMCLFMAQRLNELVIEASK +RGAKFFKID +>C_PB1 +MEINPYLMFLNNDVTSLISTTYPYTGPPPMSHGSSTKYTLETIKRTYDYSRTSVEKTSKVFNIPRRKFCN +CLEDKDELVKPTGNVDISSLLGLAEMMEKRMGEGFFKHCVMEAETEILKMHFSRLTEGRQTYDWTSERNM +PAATALQLTVDAIKETEGPFKGTTMLEYCNKMIEMLDWKEVKFRKVKTMVRREKDKRSGKEIKTKVPVMG +IDSIKHDEFLIRALTINTMAKDGERGKLQRRAIATPGMIVRPFSKIVETVAQKICEKLKESGLPVGGNEK +KAKLKTTVTSLNARMNSDQFAVNITGDNSKWNECQQPEAYLALLAYITKDSSDLMKDLCSVAPVLFCNKF +VKLGQGIRLSNKRKTKEVIIKAEKMGKYKNLMREEYKNLFEPLEKYIQKDVCFLPGGMLMGMFNMLSTVL +GVSTLCYMDEELKAKGCFWTGLQSSDDFVLFAVASNWSNIHWTIRRFNAVCKLIGINMSLEKSYGSLPEL +FEFTSMFFDGEFVSNLAMELPAFTTAGVNEGVDFTAAMSIIKTNMINNSLSPSTALMALRICLQEFRATY +RVHPWDSRVKGGRMKIINEFIKTIENKDGLLIADGGKLMNNISTLHIPEEVLKFEKMDEQYRNRVFNPKN +PFTNFDKTIDIFRAHGPIRVEENEAVVSTHSFRTRANRTLLNTDMRAMMAEEKRYQMVCDMFKSVFESAD +INPPIGAMSIGEAIEEKLLERAKMKRDIGAIEDSEYEEIKDIIRDAKKARIESR +>C_PB2 
+MSFLLTIAKEYKRLCQDAKAAQMMTVGTVSNYTTFKKWTTSRKEKNPSLRMRWAMSSKFPIIANKRMLEE +AQIPKEHNNVALWEDTEDVSKRDHVLASASCINYWNFCGPCVNNSEVIKEVYKSRFGRLERRKEIMWKEL +RFTLVDRQRRRVDTQPVEQRLRTGEIKDLQMWTLFEDEAPLASKFILDNYGLVKEMRSKFANKPLNKEVV +AHMLEKQFNPESRFLPVFGAIRPERMELIHALGGETWIQEANTAGISNVDQRKNDMRAVCRKVCLAANAS +IMNAKSKLVEYIKSTSMRIGETERKLEELILETDDVSPEVTLCKSALGGPLGKTLSFGPMLLKKISGSGV +KVKDTVYIQGVRAVQFEYWSEQEEFYGEYKSATALFSRKERSLEWITIGGGINEDRKRLLAMCMIFCRDG +DYFKDAPATITMADLSTKLGREIPYQYVMMNWIQKSEDNLEALLYSRGIVETNPGKMGSSMGIDGSKRAI +KSLRAVTIQSGKIDMPESKEKIHLELSDNLEAFDSSGRIVATILDLPSDKKVTFQDVSFQHPDLAVLRDE +KTAITKGYEALIKRLGTGDNDIPSLIAKKDYLSLYNLPEVKLMAPLIRPNRKGVYSRVARKLVSTQVTTG +HYSLHELIKVLPFTYFAPKQGMFEGRLFFSNDSFVEPGVNNNVFSWSKADSSKIYCHGIAIRVPLVVGDE +HMDTSLALLEGFSVCENDPRAPMVTRQDLIDVGFGQKVRLFVGQGSVRTFKRTASQRAASSDVNKNVKKI +KMSN +>C_NS1 +MSDKTVKSTNLMAFVATKMLERQEDLDTCTEMQVEKMKTSTKARLRTESSFAPRTWEDAIKDGELLFNGT +ILQAESPTMTPASVEMKGKKFPIDFAPSNIAPIGQNPIYLSPCIPNFDGNVWEATMYHHRGATLTKTMNC +NCFQRTIWCHPNPSRMRLSYAFVLYCRNTKKICGYLIAKQVAGIETGIRKCFRCIKSGFVMATDEISLTI +LQSIKSGAQLDPYWGNETPDIDKTEAYMLSLREAGP +>C_NS2 +EILRRSVD +TSSLNKWPELKQELENVSDALKADSLWLPMKSLSLYSKVSNQEPSSIPIGEMKHQILTRLKLICSRLEKL +DLNLSKAVLGIQNSEDLILIIYNRDVCKNTILMIKSLCNSLI diff --git a/doc/Data Deployments.dia b/doc/Data Deployments.dia new file mode 100644 index 0000000..b8ad4af --- a/dev/null +++ b/doc/Data Deployments.dia Binary files differ -- cgit v0.8.3.1-22-g547a
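The records in data/ProteinNames.txt must be read before they can be inventoried. The file uses a plain FASTA layout. Each record starts with a `>` header such as `>A_PB2`, which is sometimes followed by a blank line, and then continues with one or more sequence lines. Below is a minimal Python sketch of a reader. It assumes that layout holds throughout the file, and `parse_fasta` is a hypothetical helper that is not part of this commit. Because `>A_NA` appears twice in the file, the sketch stores later duplicates under a `#2`, `#3` suffix so they do not overwrite the first record.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record name: sequence} for FASTA-formatted lines.

    Blank lines are ignored, so a header followed by an empty line
    (as with ">A_PB2") parses the same as one without. A repeated
    name (">A_NA" occurs twice) is stored as "A_NA#2", "A_NA#3", ...
    so no record is silently overwritten.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Store the record accumulated so far, if any.
        if name is None:
            return
        key, n = name, 2
        while key in records:
            key = f"{name}#{n}"
            n += 1
        records[key] = "".join(chunks)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            flush()
            name = line[1:].strip()
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    flush()
    return records


# Small excerpt in the same layout as data/ProteinNames.txt
# (the two A_NA entries are truncated for brevity).
sample = """>A_PB1-F2
MEQEQGTPWTQSTEHTNIQRRGSGRQIQKLGHPNSTQLMDHYLRIMNQVDMHKQTVSWRLWPSLKNPTQV
SLRTHALKQWKPFNRQGWTN

>A_NA
MNPNQK
>A_NA
MNTNQR
"""
records = parse_fasta(sample.splitlines())
for key, seq in records.items():
    print(key, len(seq))
```

To read the actual file, pass an open handle, for example `with open("data/ProteinNames.txt") as fh: records = parse_fasta(fh)`.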
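The README contrasts a relational DBMS with a hierarchical container format. Below is a sketch of the relational side. It uses Python's built-in `sqlite3` module as a lightweight stand-in for IBM DB2, which is an assumption made for illustration; this commit does not use SQLite. The `protein` table and its columns are hypothetical. The sketch splits a header such as `A_PB2` at the first underscore into virus type `A` and protein `PB2`, a convention inferred from the names in data/ProteinNames.txt.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def load_inventory(records: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Load (header, sequence) pairs into a hypothetical in-memory table."""
    conn = sqlite3.connect(":memory:")
    # A surrogate integer key lets two records share a name (e.g. two A_NA).
    conn.execute(
        """CREATE TABLE protein (
               id INTEGER PRIMARY KEY,
               virus_type TEXT NOT NULL,
               name TEXT NOT NULL,
               sequence TEXT NOT NULL
           )"""
    )
    rows = []
    for header, sequence in records:
        # "A_PB2" -> ("A", "PB2"); split at the first underscore only.
        virus_type, _, name = header.partition("_")
        rows.append((virus_type, name, sequence))
    conn.executemany(
        "INSERT INTO protein (virus_type, name, sequence) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def count_by_type(conn: sqlite3.Connection) -> Dict[str, int]:
    """Number of protein records per virus type."""
    cur = conn.execute(
        "SELECT virus_type, COUNT(*) FROM protein"
        " GROUP BY virus_type ORDER BY virus_type"
    )
    return dict(cur.fetchall())


def longer_than(conn: sqlite3.Connection, min_len: int) -> List[str]:
    """Headers of records whose sequence exceeds min_len residues."""
    cur = conn.execute(
        "SELECT virus_type || '_' || name FROM protein"
        " WHERE LENGTH(sequence) > ? ORDER BY id",
        (min_len,),
    )
    return [row[0] for row in cur.fetchall()]


# Placeholder sequences (not real data), only to exercise the queries.
conn = load_inventory([
    ("A_PB1-F2", "M" * 90),
    ("A_NA", "M" * 10),
    ("A_NA", "M" * 12),
    ("B_NB", "M" * 100),
])
print(count_by_type(conn))
print(longer_than(conn, 50))
```

Each row gets its own `id`, so the two `A_NA` records coexist without renaming. Queries such as `count_by_type` and `longer_than` are examples of the ad hoc SQL search that, according to the README, a container file format does not provide.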