Data Curation & Information Management
The National Science Foundation recognises the need for foresight in data management. This project uses the principles and technology developed within the NSF Long Term Ecological Research (LTER) community, collaborating in particular with the Moorea Coral Reef LTER project, which shares data of similar types.
Types of data
Coral cover raw data are digital images, whether directly photographed, captured from video, or scanned from film slides. Derived data are coral photo quadrat cover analysis results in tabular form. Juvenile coral density raw data are tabular in situ survey counts.
Data tables in the catalog are described in Ecological Metadata Language (EML) to integration level using the program oXygen. Keywords are selected from the NBII Thesaurus where available. Units are selected from the LTER Unit Registry. Data packages (the data tables combined with their metadata) are registered with the Knowledge Network for Biocomplexity (KNB) under system-wide unique package identifiers. The most recent revision appears by default; previous revisions are archived and can be accessed by specifying the revision number.
Data users must agree to the Data Use Agreement, which specifies proper attribution and permitted uses.
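The revision behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the KNB implementation: it assumes package identifiers of the form scope.identifier.revision, and the function names and the "example-scope" identifiers are hypothetical.

```python
def parse_package_id(package_id):
    """Split an identifier assumed to look like 'scope.identifier.revision'."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return scope, int(identifier), int(revision)


def resolve(package_ids, scope, identifier, revision=None):
    """Return the requested revision, or the most recent one if none is given."""
    revisions = sorted(
        rev
        for sc, ident, rev in map(parse_package_id, package_ids)
        if (sc, ident) == (scope, identifier)
    )
    if not revisions:
        raise KeyError(f"no package {scope}.{identifier}")
    # Revisions are compared as integers, so revision 10 sorts after revision 2.
    chosen = revisions[-1] if revision is None else revision
    if chosen not in revisions:
        raise KeyError(f"no revision {chosen} of {scope}.{identifier}")
    return f"{scope}.{identifier}.{chosen}"


# Hypothetical archive holding three revisions of one package.
ARCHIVE = ["example-scope.7.1", "example-scope.7.2", "example-scope.7.10"]
```

Calling resolve(ARCHIVE, "example-scope", 7) selects the latest revision, while passing revision=1 retrieves the archived first revision.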
Data handling, Quality assurance
Image data are stored on the filesystem, backed up regularly disk-to-disk, and archived on offline storage media. Tabular data, originally stored in Excel files, were reformatted and loaded into normalized tables in a relational database. Data tables in the data catalog are cached on the filesystem from stored queries in the database. The process of modeling the structure inherent in the data and implementing that data model in a relational database ensures a high level of consistency, as any data inconsistent with the expected structure will not load. Taxonomic codes and sample sites are linked to a controlled vocabulary in the database.
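How a relational schema rejects inconsistent data at load time can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The project's actual database engine and schema are not stated here, so the table names, column names, taxon codes, and site identifier below are all hypothetical.

```python
import sqlite3

# Hypothetical schema: cover records must reference a known site and taxon code.
SCHEMA = """
CREATE TABLE taxon (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE site (site_id TEXT PRIMARY KEY);
CREATE TABLE cover (
    site_id TEXT NOT NULL REFERENCES site(site_id),
    taxon_code TEXT NOT NULL REFERENCES taxon(code),
    percent_cover REAL NOT NULL CHECK (percent_cover BETWEEN 0 AND 100)
);
"""


def open_database():
    """Create an in-memory database seeded with a small controlled vocabulary."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?)",
        [("POR", "Porites"), ("ACR", "Acropora")],
    )
    conn.execute("INSERT INTO site VALUES ('SITE_1')")
    conn.commit()
    return conn


def try_load(conn, site_id, taxon_code, percent_cover):
    """Insert one record; return False if it violates the data model."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO cover VALUES (?, ?, ?)",
                (site_id, taxon_code, percent_cover),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch a record with an unknown taxon code, an unknown site, or a cover value outside 0 to 100 fails to load, which mirrors the consistency guarantee described above.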
The Entity-Relationship Diagram below illustrates the data model. Sample site locations and the cover type classification, including phylogenic taxonomy, are stored in self-referential tables that enforce a tree structure across levels of aggregation. Input coral cover analysis data are loaded into a denormalized (“wide-format”) table, then queried into a normalized (“long-format”) table (a process commonly referred to as a “reverse pivot”).
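The reverse pivot and the self-referential classification table can be sketched together as follows. This is an illustration under stated assumptions rather than the project's schema: the column names, cover type codes, and sample values are hypothetical, and SQLite stands in for the unspecified database engine. Note that a self-referencing foreign key guarantees only that each parent exists; ruling out cycles would need an additional check.

```python
import sqlite3

# Hypothetical wide-format input: one row per quadrat, one column per cover type.
WIDE_ROWS = [
    {"site": "A", "quadrat": 1, "POR": 30.0, "ACR": 10.0, "SAND": 60.0},
    {"site": "A", "quadrat": 2, "POR": 20.0, "ACR": 0.0, "SAND": 80.0},
]
ID_COLUMNS = ("site", "quadrat")

# Hypothetical classification tree as (code, parent_code); None marks the root.
COVER_TYPES = [
    ("ALL", None),
    ("CORAL", "ALL"),
    ("POR", "CORAL"),
    ("ACR", "CORAL"),
    ("SAND", "ALL"),
]


def reverse_pivot(wide_rows, id_columns):
    """Turn every non-identifier column into its own (ids..., code, value) row."""
    long_rows = []
    for row in wide_rows:
        ids = tuple(row[c] for c in id_columns)
        for column, value in row.items():
            if column not in id_columns:
                long_rows.append(ids + (column, value))
    return long_rows


def load(conn, long_rows):
    """Create the self-referential and long-format tables, then fill them."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE cover_type (
            code TEXT PRIMARY KEY,
            parent_code TEXT REFERENCES cover_type(code)
        );
        CREATE TABLE cover_long (
            site TEXT NOT NULL,
            quadrat INTEGER NOT NULL,
            code TEXT NOT NULL REFERENCES cover_type(code),
            percent_cover REAL NOT NULL,
            PRIMARY KEY (site, quadrat, code)
        );
    """)
    # Parents are listed before their children so each reference resolves.
    conn.executemany("INSERT INTO cover_type VALUES (?, ?)", COVER_TYPES)
    conn.executemany("INSERT INTO cover_long VALUES (?, ?, ?, ?)", long_rows)


def cover_under(conn, ancestor):
    """Sum cover per quadrat over an ancestor code and all of its descendants."""
    query = """
        WITH RECURSIVE subtree(code) AS (
            SELECT ?
            UNION ALL
            SELECT t.code FROM cover_type t
            JOIN subtree s ON t.parent_code = s.code
        )
        SELECT site, quadrat, SUM(percent_cover)
        FROM cover_long
        WHERE code IN (SELECT code FROM subtree)
        GROUP BY site, quadrat
        ORDER BY site, quadrat
    """
    return conn.execute(query, (ancestor,)).fetchall()
```

Once the data are in long format, aggregating to any level of the tree is a single query: cover_under(conn, "CORAL") sums the two coral codes for each quadrat without the query needing to know which columns the wide table had.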