GigaDB schema update to accommodate the growing variety of data.

This paper is available in a repository.

Abstract

GigaScience (http://www.gigasciencejournal.com) is an online, open-access journal that includes, as part of its publishing activities, the database GigaDB (http://www.gigadb.org). GigaScience is co-published by BGI and BioMed Central to meet the needs of a new generation of biological and biomedical research as it enters the era of "big data." The journal's scope covers studies from the entire spectrum of the life sciences that produce and use large-scale data as the center of their work. Data from these articles are hosted in GigaDB, from where they can be cited to provide a direct link between the study and the data supporting it, as well as access to relevant tools for reproducing or reusing these data. Because of the scope of GigaScience, GigaDB needs to host a wider variety of data types than most biological databases. To make this possible, we have created and launched a new version of GigaDB that uses a fully extensible database schema capable of handling this variety of data types and standards. The schema has three main areas, centered around the Dataset, Sample, and Data/File tables. These are roughly analogous to those used by other common systems for submitting and curating biological data, including the SRA (http://www.ebi.ac.uk/ena/submit/metadata-model) and the ISA infrastructure (http://isatab.sourceforge.net/format.html). The Dataset area includes tables that store information about the overall study design, the authors, and the funding bodies. It also links together all the samples and data associated with a dataset and provides links to external sources. The Sample area of the schema holds the sample metadata and sample relationships, including the relationship of each sample to particular data files. Here we present the schema, and in a poster we show how it is implemented for metadata capture in our Submission Wizard.
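To make the three-area layout concrete, the following is a minimal sketch of how Dataset, Sample, and Data/File tables might relate to one another. It is not the actual GigaDB schema: every table name, column name, and example value below is a hypothetical chosen for illustration, the name/value attribute table is only one assumed way of making sample metadata extensible, and an in-memory SQLite database stands in for whatever database system GigaDB uses.

```python
import sqlite3

# Hypothetical tables illustrating the three areas named in the abstract.
# None of these names or columns are taken from the real GigaDB schema.
SCHEMA = """
CREATE TABLE dataset (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE sample (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    name       TEXT NOT NULL
);
-- Assumed extensibility mechanism: free-form name/value pairs per sample.
CREATE TABLE sample_attribute (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    name      TEXT NOT NULL,
    value     TEXT NOT NULL
);
CREATE TABLE file (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    location   TEXT NOT NULL
);
-- Link table: a file may relate to several samples, or to none.
CREATE TABLE file_sample (
    file_id   INTEGER NOT NULL REFERENCES file(id),
    sample_id INTEGER NOT NULL REFERENCES sample(id)
);
"""


def create_demo_db():
    """Build an in-memory database holding one made-up dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataset (id, title) VALUES (1, 'Example dataset')")
    conn.execute("INSERT INTO sample (id, dataset_id, name) VALUES (1, 1, 'sample_A')")
    conn.execute(
        "INSERT INTO sample_attribute (sample_id, name, value) VALUES (1, 'tissue', 'leaf')"
    )
    conn.execute("INSERT INTO file (id, dataset_id, location) VALUES (1, 1, 'reads_A.fastq')")
    conn.execute("INSERT INTO file (id, dataset_id, location) VALUES (2, 1, 'readme.txt')")
    # Only the reads file is tied to a sample; the readme belongs to the dataset alone.
    conn.execute("INSERT INTO file_sample (file_id, sample_id) VALUES (1, 1)")
    return conn


def files_for_sample(conn, sample_name):
    """Return the locations of all files linked to the named sample."""
    rows = conn.execute(
        """
        SELECT f.location
        FROM file AS f
        JOIN file_sample AS fs ON fs.file_id = f.id
        JOIN sample AS s ON s.id = fs.sample_id
        WHERE s.name = ?
        ORDER BY f.location
        """,
        (sample_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch the dataset row acts as the holder that ties samples and files together, while the link table records which files belong to which samples. A call such as files_for_sample(create_demo_db(), "sample_A") walks from a sample to its data files.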