Primary Databases: Explanation, Categories, Examples, Applications

What are Primary Databases?
GenBank
European Molecular Biology Laboratory (EMBL)
DNA Data Bank of Japan (DDBJ)
Protein Data Bank (PDB)
Gene Expression Omnibus (GEO)
Applications of Primary Databases

The explosion of raw sequence data in modern genomic research requires biological databases that can store and structure this data efficiently. These databases serve as repositories, organizing biological data so that it can be retrieved conveniently.

What are Primary Databases?

Primary databases are a type of biological database containing original and unprocessed biological data.
These databases typically consist of raw sequences, such as nucleotide or protein sequences, or structural information, such as molecular structures.
Several primary sequence databases are widely used in the field of bioinformatics:

GenBank at the National Center for Biotechnology Information (NCBI)
DNA Data Bank of Japan (DDBJ)
European Molecular Biology Laboratory (EMBL)

Other examples of primary databases include:

Protein Data Bank (PDB)
Gene Expression Omnibus (GEO)
ArrayExpress

GenBank

GenBank is a primary biological database managed by the National Center for Biotechnology Information (NCBI).
It is an annotated collection of publicly available sequences, including information about genes, proteins, and other genetic elements.
GenBank is part of the International Nucleotide Sequence Database Collaboration (INSDC), a joint effort among three primary databases: GenBank, DDBJ, and EMBL.
These organizations exchange sequence data with one another daily, so each database holds the same up-to-date set of records.
The GenBank flat file format is used to represent sequence data and annotations in the database.
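As a rough illustration of what a flat file looks like, the sketch below reads a few fields from a single record. The record text is a made-up example rather than a real GenBank entry, and the parser handles only the LOCUS, DEFINITION, ACCESSION, and ORIGIN keywords on single lines. The function name parse_flat_file is hypothetical. For real work, an established parser such as Biopython's SeqIO module is a better choice.

```python
# Minimal sketch: pull a few fields out of one GenBank flat-file record.
# SAMPLE_RECORD is an invented example, not a real database entry.

SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   EXAMPLE01
ORIGIN
        1 atggcgtacg ttagcctagg ctaa
//
"""


def parse_flat_file(text):
    """Return locus name, definition, accession and sequence of one record."""
    record = {"locus": None, "definition": None, "accession": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:])
        elif line.startswith("LOCUS"):
            record["locus"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record


record = parse_flat_file(SAMPLE_RECORD)
print(record["locus"], len(record["sequence"]))
```

Real records also carry multi-line fields and a FEATURES table with the annotations, which this sketch deliberately ignores.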
GenBank accepts mRNA or genomic sequence data with proper source organism information and annotation provided by the submitter.
However, the database does not accept:

Noncontiguous sequences
Primer sequences
Protein sequences without underlying nucleotide submission
Mixed genomic and mRNA sequences
Consensus sequences
Sequences with lengths of less than 200 nucleotides
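Some of the criteria above can be checked before a sequence is submitted. The following is a minimal sketch of such a check, covering only the 200-nucleotide minimum and a test for characters that are not IUPAC nucleotide codes. The function name submission_problems is hypothetical, and this is not an official GenBank validator.

```python
# Hypothetical pre-submission check; not an official GenBank validator.
MIN_LENGTH = 200  # minimum length stated in the list above
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")  # IUPAC nucleotide codes


def submission_problems(sequence):
    """Return a list of human-readable problems found in a sequence."""
    problems = []
    seq = sequence.upper()
    if len(seq) < MIN_LENGTH:
        problems.append(f"too short: {len(seq)} nt, minimum is {MIN_LENGTH}")
    unexpected = sorted(set(seq) - NUCLEOTIDE_CODES)
    if unexpected:
        problems.append("non-nucleotide characters: " + "".join(unexpected))
    return problems


print(submission_problems("ACGT" * 60))  # 240 nt, no problems expected
print(submission_problems("ACGT" * 10))  # 40 nt, flagged as too short
```

The remaining criteria, such as contiguity or mixed genomic and mRNA content, depend on how the sequence was produced and cannot be judged from the characters alone.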

To submit sequences to this database, several tools are available, one web-based and two stand-alone:

BankIt: A web-based submission tool allowing users to submit gene sequences to the GenBank database. It supports the submission of sets of sequences.
Sequin: A stand-alone submission tool used for more complex submissions, such as those containing long sequences, multiple annotations, or gapped sequences. Sequin can be downloaded from the FTP site for use on Mac, PC, and UNIX platforms. Each Sequin file should have fewer than 10,000 sequences to ensure maximum performance.
tbl2asn: A stand-alone tool used for even larger submissions. Like Sequin, tbl2asn can be downloaded from the FTP site. The submitter can work offline to prepare the submission and then submit it using tbl2asn.

European Molecular Biology Laboratory (EMBL)

The EMBL nucleotide sequence database is a collection of nucleotide sequence data maintained by the European Bioinformatics Institute (EMBL-EBI), where it now forms part of the European Nucleotide Archive (ENA).
It is part of the International Nucleotide Sequence Database Collaboration (INSDC), along with the GenBank and DDBJ databases.
EMBL’s main focus is on the storage and distribution of nucleotide and protein sequences, providing tools and resources for researchers to analyze and interpret this data.
Like other primary databases, EMBL collects and archives data from various sources, including scientific publications and direct submissions from researchers.
One of the main features of EMBL is its user-friendly interface, allowing researchers to easily search for and retrieve data.
EMBL offers a range of tools and resources for sequence analysis, including:

Alignment tools
Phylogenetic trees
Protein structure prediction software

EMBL uses a sequence submission tool called Webin:

This tool is web-based and can be accessed through EMBL’s website.
With Webin, researchers can submit single sequences, multiple sequences, or a large number of sequences.

DNA Data Bank of Japan (DDBJ)

DDBJ (DNA Data Bank of Japan) is a primary database that collects and stores nucleotide sequence data, mainly from Japanese researchers, but it also accepts submissions from researchers in other countries and assigns accession numbers to their sequences.
DDBJ is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and regularly exchanges collected data with EMBL and GenBank.
DDBJ's main activities include:

Collecting and exchanging nucleotide sequence data
Managing bioinformatics tools for data submission and retrieval
Developing tools for biological data analysis
Organizing Bioinformatics Training Courses in Japanese to teach people how to analyze biological data

DDBJ uses a web-based tool called the Nucleotide Sequence Submission System (NSSS) for sequence submissions.

The NSSS replaced Sakura in November 2012. Sakura had been used for sequence submission since 1995.
For submissions involving very long or numerous sequences, DDBJ recommends using its Mass Submission System (MSS).

Protein Data Bank (PDB)

PDB (Protein Data Bank) is a global database that stores information about the structure of biological macromolecules.
It is managed by the Research Collaboratory for Structural Bioinformatics (RCSB) and provides many services to help researchers access and analyze the structural data.
PDB collects and archives atomic-level 3D structural models of macromolecules obtained through three commonly used experimental techniques:

Crystallography
Nuclear magnetic resonance spectroscopy (NMR)
Electron microscopy (3DEM)

The database entries are mostly structures of proteins, but also include:

Nucleic acids
Carbohydrates
Theoretical models

In addition to the structural models, PDB archives:

Experimental data
Associated metadata
Other details about the molecules
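To make the idea of an atomic-level structural model concrete, the sketch below parses one ATOM record in the legacy fixed-width PDB text format, where each atom's identity and x, y, z coordinates sit in fixed columns. The sample line is assembled from made-up values, and the Atom class and parse_atom_line function are hypothetical names used only for illustration. Current PDB entries are distributed primarily in the PDBx/mmCIF format, which this sketch does not cover.

```python
from dataclasses import dataclass

# An illustrative ATOM line, assembled field by field so that the
# fixed-width columns are visible. The values are invented.
SAMPLE_LINE = (
    "ATOM  "      # columns 1-6:   record type
    "    1"       # columns 7-11:  atom serial number
    " "           # column 12:     unused
    " N  "        # columns 13-16: atom name
    " "           # column 17:     alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21:     unused
    "A"           # column 22:     chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and unused columns
    "  27.340"    # columns 31-38: x coordinate in angstroms
    "  24.430"    # columns 39-46: y coordinate in angstroms
    "   2.614"    # columns 47-54: z coordinate in angstroms
)


@dataclass
class Atom:
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one fixed-width ATOM record into an Atom."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        residue=line[17:20].strip(),
        chain=line[21],
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


atom = parse_atom_line(SAMPLE_LINE)
print(atom)
```

A full structural model is simply many such records, one per atom, together with the experimental data and metadata listed above.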

Gene Expression Omnibus (GEO)

GEO (Gene Expression Omnibus) is a public database that stores high-throughput gene expression and functional genomics data.
It was created in 2000 as a resource for gene expression studies and has since expanded to include other types of data such as genome methylation and chromatin structure.
The database requires researchers to provide:

Raw data
Processed data
Descriptive metadata

The original submitter-supplied GEO records are of three types:

Platform: Describes the array or sequencer used
Sample: Describes the source and analysis of the sample
Series: Links related Samples and describes a whole study

From these submitter-supplied records, GEO assembles two further, curated record types:

DataSet: A curated collection of comparable Samples that share a common set of array elements
Profile: Consists of expression measurements for a gene across all Samples in a DataSet
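The relationships between these record types can be sketched as a small data model. The class and field names below are hypothetical and much simpler than GEO's actual schema, and the identifiers and expression values are invented. The sketch treats "a common set of array elements" as meaning that all Samples share one Platform, which is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    accession: str    # hypothetical identifier
    description: str  # the array or sequencer used


@dataclass
class Sample:
    accession: str
    platform: Platform
    expression: dict  # gene name -> measured value


@dataclass
class Series:
    accession: str
    title: str
    samples: list = field(default_factory=list)  # related Samples of one study


@dataclass
class DataSet:
    accession: str
    samples: list

    def __post_init__(self):
        # Simplifying assumption: comparable Samples share one Platform.
        platforms = {s.platform.accession for s in self.samples}
        if len(platforms) > 1:
            raise ValueError("Samples in a DataSet must share one Platform")

    def profile(self, gene):
        """Expression of one gene across all Samples in this DataSet."""
        return {s.accession: s.expression[gene] for s in self.samples}


platform = Platform("platform_1", "example expression array")
samples = [
    Sample("sample_1", platform, {"geneA": 5.2, "geneB": 1.1}),
    Sample("sample_2", platform, {"geneA": 7.9, "geneB": 0.8}),
]
series = Series("series_1", "example study", samples)
dataset = DataSet("dataset_1", series.samples)
print(dataset.profile("geneA"))
```

Here a Profile is not stored separately but computed on demand from the DataSet, which mirrors its definition as one gene's measurements across all Samples.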

Applications of Primary Databases

GenBank and EMBL are primary databases that serve as references for genome analysis and comparison.
The primary database PDB is used to retrieve and study experimentally determined structures of proteins and other macromolecules.
Gene Expression Omnibus (GEO), a primary database, contains transcriptome data useful for analyzing differentially expressed genes and understanding gene expression.
KEGG, which is usually classed as a curated (secondary) database rather than a primary one, provides information on metabolic and signaling pathways in various organisms.
