-rw-r--r-- README                      35
-rw-r--r-- data/ProteinNames.txt       275
-rw-r--r-- doc/Data Deployments.dia    bin 0 -> 3566 bytes
3 files changed, 310 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..9caedb8
--- a/dev/null
+++ b/README
@@ -0,0 +1,35 @@
+Experiment 007
+Don Pellegrino [don@drexel.edu]
+
+Collection and inventory of influenza data.
+
+INTRODUCTION
+
+The "Influenza Virus Resource" at NCBI
+[http://www.ncbi.nlm.nih.gov/genomes/FLU/] exposes the sequence records and
+their meta-data in a number of different ways. An exploration of the
+phylogenetic properties of the records first requires that the available data
+be collected and inventoried.
+
+Two primary alternatives have been identified for managing the data. The
+first is a relational database; IBM DB2 has been used for this. A relational
+database is limited by the difficulty of sharing the data, since each vendor
+uses incompatible import and export routines. Additionally, installing an
+instance of a database management system (DBMS) often requires a large amount
+of effort and may not be practical in hosted environments that do not support
+running user daemons. Finally, proper parallelization of a DBMS requires
+additional system-specific configuration for each machine used.
+
+An alternative to the DBMS is to use a container file format such as HDF5.
+This has the advantage that all of the data can be collected into a single
+file which can then be shared with others. Its disadvantage is that it lacks
+the robust search and SQL operations provided by a DBMS. In addition, the
+two alternatives use fundamentally different storage strategies: the DBMS
+uses a relational model, while the container file format uses a
+hierarchical model.
+
+The "doc/Data Deployments.dia" diagram shows the source systems that
+expose the various records as well as the transform routines that are
+used for aggregation of the data on the local system.
+
+ LocalWords: NCBI parallelization HDF SQL Pellegrino phylogenetic DBMS dia
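The README contrasts the relational model of a DBMS with the hierarchical model of a container format such as HDF5. The following is a minimal sketch of that difference on a few hypothetical records, assuming Python's standard-library `sqlite3` as a stand-in for DB2 and a nested dict as a stand-in for HDF5 groups (an actual HDF5 file would be written with a library such as h5py). The record fields, the placeholder values, the `/<subtype>/segment_<n>/<accession>` layout, and the helper names `relational_store`, `hierarchical_store`, and `get_path` are illustrative assumptions, not part of this repository.

```python
import sqlite3

# Hypothetical placeholder records; fields and values are illustrative only,
# not actual Influenza Virus Resource data.
RECORDS = [
    {"accession": "ACC0001", "subtype": "H1N1", "segment": 4, "sequence": "ATGAAG"},
    {"accession": "ACC0002", "subtype": "H3N2", "segment": 4, "sequence": "ATGAAA"},
    {"accession": "ACC0003", "subtype": "H1N1", "segment": 6, "sequence": "ATGAAT"},
]


def relational_store(records):
    """Relational model: one table, one row per record (sqlite3 stands in for DB2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence_record ("
        "accession TEXT PRIMARY KEY, subtype TEXT, segment INTEGER, sequence TEXT)"
    )
    # Named placeholders are filled from each record dict.
    conn.executemany(
        "INSERT INTO sequence_record VALUES (:accession, :subtype, :segment, :sequence)",
        records,
    )
    return conn


def hierarchical_store(records):
    """Hierarchical model: nested groups mimicking an HDF5 layout of
    /<subtype>/segment_<n>/<accession> (hypothetical layout)."""
    root = {}
    for r in records:
        group = root.setdefault(r["subtype"], {}).setdefault(f"segment_{r['segment']}", {})
        group[r["accession"]] = r["sequence"]
    return root


def get_path(root, path):
    """Resolve a slash-separated group path such as '/H1N1/segment_4'."""
    node = root
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


if __name__ == "__main__":
    conn = relational_store(RECORDS)
    # Ad hoc filtering on any column is where the DBMS is strongest.
    rows = conn.execute(
        "SELECT accession FROM sequence_record WHERE segment = 4 ORDER BY accession"
    ).fetchall()
    print([a for (a,) in rows])  # ['ACC0001', 'ACC0002']

    tree = hierarchical_store(RECORDS)
    # Navigation that follows the chosen hierarchy is a simple path lookup.
    print(sorted(get_path(tree, "/H1N1/segment_4")))  # ['ACC0001']

    # A query that cuts across the hierarchy must walk every subtype group.
    seg4 = sorted(acc for sub in tree.values() for acc in sub.get("segment_4", {}))
    print(seg4)  # ['ACC0001', 'ACC0002']
```

The trade-off described in the README shows up directly: the SQL query can filter on any column, whereas in the hierarchical layout only lookups that follow the chosen group structure are direct, and a cross-cutting query such as "all segment 4 records regardless of subtype" has to visit every group.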


Copyright © 2009 Don Pellegrino All Rights Reserved.