Data Fields for Analyses

Interesting paths and connections through the data.

Post by donpellegrino » Fri Aug 21, 2009 12:22 pm

The National Center for Biotechnology Information (NCBI) provides nucleotide and protein sequences for influenza as well as a few fields describing the sequences. A collection of this data is available from the "NCBI Influenza Virus Sequence Database" via FTP at ftp://ftp.ncbi.nih.gov/genomes/INFLUENZA/. I have loaded the protein sequences and their descriptive fields into a relational database and created a view to join the sequence name and GI number with the accession number and fields describing the virus. The following fields are available:

Identifier Fields
    NCBI GI Identifier
    NCBI GB Identifier / Accession number
Information about the sequence record.
    Sequence name
    Sequence length
    Genome segment number
    Sequence is full length indicator
Information about the virus.
    Virus name
    Subtype
    Country
    Year
    Host
    Age
    Gender
With this data, many interesting visualizations can be created. For example, a temporal analysis might annotate the base layer of the map with values from the year field.
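As a rough illustration of the view and the year-based grouping, here is a minimal sketch using Python's built-in sqlite3 module. The table names (protein_sequence, virus_record), the view name (sequence_fields), and the column names are hypothetical and only mirror the field list above; the actual database schema is not shown in this post.

```python
import sqlite3


def build_database(conn):
    """Create two hypothetical tables and a view joining them on the GI number."""
    conn.executescript("""
        CREATE TABLE protein_sequence (
            gi            INTEGER PRIMARY KEY,  -- NCBI GI identifier
            sequence_name TEXT,
            sequence      TEXT
        );
        CREATE TABLE virus_record (
            gi         INTEGER REFERENCES protein_sequence(gi),
            accession  TEXT,                    -- GenBank accession number
            virus_name TEXT,
            subtype    TEXT,
            country    TEXT,
            year       INTEGER,
            host       TEXT
        );
        -- The view exposes sequence name and GI next to the descriptive fields.
        CREATE VIEW sequence_fields AS
        SELECT p.gi, v.accession, p.sequence_name,
               LENGTH(p.sequence) AS sequence_length,
               v.virus_name, v.subtype, v.country, v.year, v.host
        FROM protein_sequence AS p
        JOIN virus_record AS v ON v.gi = p.gi;
    """)


def sequences_per_year(conn):
    """Return {year: number of sequences}, skipping rows with no year."""
    rows = conn.execute(
        "SELECT year, COUNT(*) FROM sequence_fields "
        "WHERE year IS NOT NULL GROUP BY year ORDER BY year"
    ).fetchall()
    return dict(rows)
```

The per-year counts returned by sequences_per_year could then be used as the annotation values for a temporal layer.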

Re: Data Fields for Analyses

Post by donpellegrino » Fri Aug 21, 2009 10:40 pm

Using EFetch to collect data from the GenBank protein database for each GI in the "NCBI Influenza Virus Sequence Database", I added a few more fields to the set. These are populated with varying levels of completeness and accuracy.

Identifier Fields
    NCBI GI Identifier
    NCBI GB Identifier / Accession number
    NCBI Accession number and version
    NCBI Locus identifier
    Day record was updated
    Day record was created
    Number of citations listed in GenBank
Information about the sequence record.
    Sequence name
    Sequence length
    Genome segment number
    Sequence is full length indicator
Information about the virus.
    GenBank Organism Feature / Qualifier
    Virus name
    GenBank Strain Feature / Qualifier
    GenBank Serotype Feature / Qualifier
    Subtype
    Country
    Year
    Host
    Age
    Gender
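For anyone who wants to try the EFetch step, the following is a minimal sketch using only the Python standard library. It assumes the public E-utilities efetch.fcgi endpoint with db=protein, rettype=gp and retmode=xml, and it assumes the response uses the GBSeq XML element names shown in the code. The function names are hypothetical, and this is not necessarily how the fields above were collected.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(gi):
    """Build an EFetch URL for one protein record (GenPept flat file as XML)."""
    query = urllib.parse.urlencode(
        {"db": "protein", "id": str(gi), "rettype": "gp", "retmode": "xml"}
    )
    return f"{EFETCH_URL}?{query}"


def fetch_record_xml(gi):
    """Download the XML for one GI. Requires network access."""
    with urllib.request.urlopen(efetch_url(gi)) as response:
        return response.read().decode("utf-8")


def parse_record(xml_text):
    """Extract the additional fields from GBSeq XML; missing values become None."""
    root = ET.fromstring(xml_text)
    seq = root if root.tag == "GBSeq" else root.find(".//GBSeq")
    if seq is None:
        raise ValueError("no GBSeq element found")

    # Organism, strain and serotype are qualifiers on the "source" feature.
    quals = {}
    for feature in seq.iter("GBFeature"):
        if feature.findtext("GBFeature_key") == "source":
            for q in feature.iter("GBQualifier"):
                quals[q.findtext("GBQualifier_name")] = q.findtext("GBQualifier_value")
            break

    return {
        "locus": seq.findtext("GBSeq_locus"),
        "accession_version": seq.findtext("GBSeq_accession-version"),
        "updated": seq.findtext("GBSeq_update-date"),
        "created": seq.findtext("GBSeq_create-date"),
        "citation_count": len(seq.findall("GBSeq_references/GBReference")),
        "organism": quals.get("organism"),
        "strain": quals.get("strain"),
        "serotype": quals.get("serotype"),
    }
```

Calling parse_record(fetch_record_xml(gi)) for each GI would give one dictionary per record. Qualifiers that a record does not carry come back as None, which is one way to keep the uneven completeness visible rather than hiding it.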