Databases

The BOLD platform contains a set of integrated databases that offer users access to primary BOLD data as well as supplementary information important for their research. All of these databases are publicly accessible on BOLD without a user account; however, private data is masked from searches where applicable.

The Public Data Portal and the BIN database are the primary sources of data on BOLD, while the Publication Database and the Primer Database mainly support users in their research.

This section describes how users can search and access information from these databases, and introduces other important features of the system, including the BOLD ID Engine, the Taxonomy Browser, and the Annotation framework.

BOLD Identification Engine

The library of sequences collected in BOLD is available for facilitating identification of unknown sequences. The BOLD Identification Engine uses all sequences uploaded to BOLD from public and private projects to locate the closest match. To ensure data security, sequences from private records are never exposed.

Batch Identifications

BOLD now provides the ability to submit a batch of query sequences for identification. This service is available for up to 100 sequences at a time for users signed into the system.
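For users preparing a large set of queries, the 100-sequence limit means the set has to be split before submission. The following is a minimal sketch of that splitting step only; the function name `batches` and the example data are illustrative assumptions, and the snippet does not contact BOLD.

```python
from typing import Iterator

# Per-submission limit stated in this handbook section.
MAX_BATCH_SIZE = 100


def batches(sequences: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices holding at most `size` sequences each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(sequences), size):
        yield sequences[start:start + size]


if __name__ == "__main__":
    # 250 placeholder queries (not real barcode sequences).
    queries = ["ACGT"] * 250
    for number, batch in enumerate(batches(queries), start=1):
        print(f"batch {number}: {len(batch)} sequences")
```

Each yielded batch can then be submitted as a separate identification request.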

Email Results

Users can have identification results emailed to them, so that several identification requests can be run in parallel. The email option is next to the Submit button. Upon submitting the ID Engine request, the system will provide an estimated run-time.

Animal Identification (COI)

The BOLD ID Engine accepts sequences from the 5’ region of the mitochondrial gene COI and returns a species-level identification (when possible). BOLD uses the BLAST algorithm to identify single-base indels before aligning the protein translation to a profile Hidden Markov Model of the COI protein. There are four types of databases that can be used to identify COI sequences. The BOLD ID Engine provides historical copies of the COI databases dating back to 2009 for use in replicating results from previous years. The Full-Length COI database is designed for use with short query sequences, as it provides maximum overlap in the barcode region of COI.

Fungal (ITS) and Plant (rbcL & matK) Identification

In the BOLD ID Engine, ITS is the default identification tool for fungal barcodes and rbcL and matK are the defaults for plant barcodes. Both return a species-level identification (when possible). The BLAST algorithm is employed in place of BOLD’s internal identification engine for these sequences. The number of fungal and plant sequences in BOLD is relatively limited compared to the number of animal sequences and thus a successful species match may not be possible. As new sequences are added to the database, the number of successful matches should improve. These databases include many species represented by only one or two specimens, as well as all species with interim taxonomy. Both searches will return a list of the nearest matches but do not provide a probability of placement to a taxon.

Descriptions of the 6 types of identification databases on BOLD
Database Name	Description	Database Size
All Barcode Records	Every COI sequence on BOLD >500bp	>1,390,000 sequences
Species Barcode Records	Every COI sequence >500bp with species level identification	>1,150,000 sequences
Public Barcode Records	Every public COI sequence >500bp	>270,000 sequences
Full-Length Barcode Records	Every COI sequence on BOLD >640bp	>950,000 sequences
Fungal Records	Every ITS sequence on BOLD >100bp	>15,000 sequences
Plant Records	Every rbcL and matK sequence on BOLD >500bp	>95,000 & >70,000 sequences respectively
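The length and content criteria in the table above can be read as a simple set of rules. The sketch below encodes only what the table states; the function name, its parameters, and the idea of classifying a single record this way are assumptions made for illustration, and BOLD may apply further checks that are not listed here.

```python
def databases_for_record(
    marker: str,
    length_bp: int,
    has_species_id: bool = False,
    is_public: bool = False,
) -> list[str]:
    """Return the identification databases whose stated criteria a record meets."""
    marker_key = marker.strip().lower()
    names: list[str] = []
    if marker_key == "coi":
        if length_bp > 500:
            names.append("All Barcode Records")
            if has_species_id:
                names.append("Species Barcode Records")
            if is_public:
                names.append("Public Barcode Records")
        if length_bp > 640:
            names.append("Full-Length Barcode Records")
    elif marker_key == "its":
        if length_bp > 100:
            names.append("Fungal Records")
    elif marker_key in ("rbcl", "matk"):
        if length_bp > 500:
            names.append("Plant Records")
    return names


if __name__ == "__main__":
    print(databases_for_record("COI", 658, has_species_id=True, is_public=True))
    print(databases_for_record("ITS", 150))
```

The thresholds are treated as strict (">500bp" excludes a sequence of exactly 500 bp), following the notation used in the table.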

The results page for a typical animal sequence identification is illustrated below. For each sequence queried, an overview is provided describing the best match, with links to both the taxonomic page and the BIN cluster for the match, as well as a Taxon ID Tree placing the query sequence among 100 of the closest matches. The top matches listed in the table provide links to the public record where available. A map displays the collection locations of all the public records in the top 100 matches. For a batch of sequences queried, each result page is accessible via the accordion tabs in the page.

[Figure: Identification Engine results page for batch identification]

Taxonomy Browser

The Taxonomy Browser is a synthetic database that allows users to examine the progress of DNA barcoding by browsing through the different levels of the taxonomic hierarchy available on BOLD.

Within the Taxonomy Browser, users can select phyla in the Animal, Plant, Fungus, or Protist kingdoms to navigate from phylum to species level. Statistics on the progress of DNA barcoding at each taxon are generated from both public and private data while protecting private user-owned data. To look up a specific taxon directly, use the search function by entering a taxonomic name into the search bar at the top of the Taxonomy Browser or on the BOLD Home page. The features on each taxon page are illustrated and described below.

[Figure: BOLD Taxonomy Browser]

Information available on each page within the BOLD Taxonomy Browser
1. Lineage	Displays the taxon name and the higher taxonomic levels.
2. Search Bar	Enter a taxonomic name to go directly to a page.
3. Sub-Taxonomy	Links to all sub-taxa with number of specimen records for each.
4. Taxon Description	Displays the description of this taxon from the Wikipedia website.
5. Statistics	These statistics are compiled by BOLD for this taxon. A species progress list can be downloaded for each rank that has sub-taxa. The published and released sequences for this taxon can be downloaded from this section.
6. Sample Sources	A graph of the top institutions that provided specimens with their specimen tallies.
7. Imagery	A random selection of the images available for the subtaxa of this taxon. Mousing over an image selects it for higher-resolution display to the right.
8. Image Details	The taxonomic identifier, the sample identifier, license, and attribution are all displayed beneath the selected image.
9. Collection Sites	A map of the collection sites including a list of the top countries.
10. Taxon Occurrence	A map of the occurrence data for this taxon from GBIF.

Publication Database

The Publication Database contains details on publications that are relevant to the barcoding community and are submitted by users of the system. It is accessible without logging into BOLD. This database indexes title, abstract, year, and authors, allowing for broad searches. Expanding a publication from the results list will provide details on the publication, including a link to the article on the journal’s site, as illustrated below. A citation or set of citations can be downloaded from BOLD using the drop-down menu to the right of the search bar.

Bibliographies can be submitted to this database by users, following the Bibliography Submission protocol. When records are associated with a bibliography on BOLD, the article citation will appear everywhere the records appear in BOLD.

[Figure: Publication database showing an example search for an author name]

Primer Database

The Primer Database is a database of all the public primers available in BOLD. It can be accessed without a BOLD account. Using the search bar, users can enter terms that appear in the primer code, submitter, or reference fields. Selecting a primer from the database will provide details on the primer, including primer performance statistics derived from data submitted to BOLD, as illustrated below. A primer or set of selected primers can be downloaded in FASTA format using the Download Selected Primers button to the right of the search bar.
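For readers unfamiliar with FASTA, each entry is a header line beginning with ">" followed by one or more lines of sequence. The sketch below writes a small set of primers in that layout. The primer codes and sequences are made-up placeholders, and using the primer code alone as the header is an assumption; the exact header content of BOLD's download is not described here.

```python
def primers_to_fasta(primers: dict[str, str], width: int = 60) -> str:
    """Format primer code -> sequence pairs as FASTA text."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines: list[str] = []
    for code, sequence in primers.items():
        lines.append(f">{code}")  # header line: ">" plus an identifier
        # Wrap the sequence so no line exceeds `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical primers, for illustration only.
    example = {
        "EXAMPLE-F1": "ACGTACGTACGTACGTACGT",
        "EXAMPLE-R1": "TGCATGCATGCATGCATGCA",
    }
    print(primers_to_fasta(example), end="")
```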

If users have previously registered a primer in BOLD, it will be available in the Primer Database when they are signed in to BOLD, allowing private primers to be edited (i.e., to make them publicly available or to add citation information). New primers must be registered from the User Console before trace files generated using them are submitted to records on BOLD following the Trace Submission protocol.

[Figure: Primer database showing an example search for primers associated with the keyword "bird"]

Public Data Portal

Searching the Public Data Portal

The BOLD Public Data Portal is a database of all of the public records on BOLD, including those in the early data release phase of the iBOL project, where information is still masked. This database can be used to access and download specimen data and sequences.

Public users can search the Public Data Portal using taxonomy, geography (country and state/province), and institution keywords, or by using Sample ID or BOLD Process ID to find individual records.

Any combination of keywords can be entered into the search bar. For example, searching "Lepidoptera Canada" will return all of the Lepidoptera records collected in Canada. Searching "Lepidoptera Canada -Ontario" will return the same results, but with the specimens collected in Ontario omitted.
For further details and examples of using the search functionality, see the search help section that is available by clicking on the help button to the right of the search bar.
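The include and exclude behaviour in the example above can be sketched as follows. This is not BOLD's search parser: the function names are hypothetical, each record is reduced to a flat set of keywords, and quoting or multi-word names are ignored. It only illustrates how a leading "-" turns a keyword into an exclusion.

```python
def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into (included keywords, excluded keywords)."""
    include: list[str] = []
    exclude: list[str] = []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])  # drop the leading "-"
        else:
            include.append(token)
    return include, exclude


def matches(record_terms: set[str], query: str) -> bool:
    """True if the record has every included keyword and no excluded one."""
    include, exclude = parse_query(query)
    terms = {term.lower() for term in record_terms}
    has_all = all(word.lower() in terms for word in include)
    has_excluded = any(word.lower() in terms for word in exclude)
    return has_all and not has_excluded


if __name__ == "__main__":
    query = "Lepidoptera Canada -Ontario"
    print(parse_query(query))
    print(matches({"Lepidoptera", "Canada", "Alberta"}, query))
    print(matches({"Lepidoptera", "Canada", "Ontario"}, query))
```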

The search results will display a list of the public records that match the searched terms, as illustrated below. Toggling to "BINs" next to the search button will convert the list to all BINs available.

[Figure: Public Data Portal results from a search for "Chordata"]

Specimen Record

The record page gives information on the specimen identifier, taxonomy, specimen details, collection data (including collection site), sequence information, specimen image details, and attribution details. The figure below shows the details page for a particular record. A record page will reference a BIN when one is available and provides links to GenBank records.

[Figure: Public Record Page]

Barcode Index Numbers (BINs)

The Barcode Index Number System is an online framework that clusters barcode sequences algorithmically, generating a web page for each cluster. Since clusters show high concordance with species, this system can be used to verify species identifications as well as document diversity when taxonomic information is lacking. This system consists of three parts:

A clustering algorithm employing graph theoretic methods to generate operational taxonomic units (OTUs) and putative species from sequence data without prior taxonomic information.
A curated registry of barcode clusters integrated with an online database of specimen and taxonomic data with support for community annotations.
An Annotation framework that allows researchers to review and critique the taxonomic identifications associated with each BIN and notify data owners of errors.

The BIN framework can greatly expedite the evaluation and annotation of described species and putative new ones while reducing the need to generate interim names, a non-trivial issue in barcoding datasets. The BIN algorithm has been effectively tested on a broad set of taxonomic groups and shows potential for applications in species abundance studies and environmental barcoding. The registry employs modern URI and web service functionality enabling integration with other databases.

BIN Requirements

COI sequences over 500bp will be evaluated for inclusion into BINs if they meet the quality standards. Sequences over 300bp will be considered for membership into an existing BIN, but will not create or split BINs.
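The two length tiers above can be summarised in a few lines. The sketch below models sequence length only; the quality standards mentioned above are not specified in this section and are therefore left out. The enum and function names are illustrative assumptions.

```python
from enum import Enum


class BinEligibility(Enum):
    CAN_FOUND_OR_SPLIT = "evaluated for BIN inclusion; may create or split BINs"
    JOIN_EXISTING_ONLY = "considered for membership in an existing BIN only"
    NOT_CONSIDERED = "too short to be considered"


def bin_eligibility(length_bp: int) -> BinEligibility:
    """Classify a COI sequence by length (quality checks are not modelled)."""
    if length_bp > 500:
        return BinEligibility.CAN_FOUND_OR_SPLIT
    if length_bp > 300:
        return BinEligibility.JOIN_EXISTING_ONLY
    return BinEligibility.NOT_CONSIDERED


if __name__ == "__main__":
    for length in (658, 420, 250):
        print(length, bin_eligibility(length).value)
```

Both thresholds are read as strict ("over 500bp", "over 300bp"), so a sequence of exactly 500 bp falls in the join-only tier.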

BIN Publication

Ratnasingham S, Hebert PDN (2013) A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System. PLoS ONE 8(8): e66213. DOI:10.1371/journal.pone.0066213

BIN pages display aggregated data in several sections described and illustrated below.

[Figure: BIN page example]

BIN page definitions for the illustration above.
1. BIN Details	BIN details include BIN identifiers (URI and DOI), the member count, and distributional information. Nearest neighbour BIN details are also provided, along with the nearest member and the taxonomy of that record.
2. Taxonomy	The taxonomy of the public data is visible for the BIN, with highlighting to indicate taxonomy concordance and discordance. NEW! For each taxon, logged-in users can search the records that they have access to by clicking on the magnifying glass icon.
3. Annotation	Via the Add Tags & Comments button, BIN pages support community vetting through annotation of individual data elements (taxonomy, images, collection sites and attribution). Please see the Annotation section for more details.
4. Distance Distribution	A histogram provides the distribution of distances between sequences within the BIN and against the nearest neighbour sequence.
5. Associated Publications	List of the publications that contain sequences from the BIN.
6. Dendrogram of Sequences	For BINs with 3 - 150 members, a circle tree is displayed which also includes the nearest neighbour. Hovering over taxon names on the circular tree highlights the terminal branch. A PDF version of the tree is available for download for all BINs with more than 2 members.
7. Haplotype Network	The interactive diagram allows for investigation of the haplotypes in the BIN cluster along species and geographical splits. Hovering over a haplotype node in the diagram reveals details on which species or geographical information are grouped. The larger the node, the more sequences in the haplotype. The thicker the line between nodes, the more closely related those two haplotypes are.
8. Collection and Owner Data	A list of the collection countries and the number of specimens collected per country, followed by a list of the owners of the public and private sequences contained within a BIN. NEW! For each country, logged-in users can search the records that they have access to by clicking on the magnifying glass icon.
9. BIN Barcode Compliance	BINs are marked as compliant if they contain at least one sequence that meets Barcode Compliance standards.
10. Specimen Images	Displays all images for records clustered in the BIN, with license information available for each.
11. Sampling Sites	Displays a map of the collection sites based on GPS coordinates.
12. Attribution	Lists the institutions where specimens are deposited and sequenced, along with photographers, collectors, taxonomists, and funding sources. NEW! For each Specimen Depository, logged-in users can search the records that they have access to by clicking on the magnifying glass icon.

Public Annotation on Databases

As the volume of barcode data being generated increases rapidly, the need for routine curation has become apparent. BOLD’s annotation and notification system supports rapid community-based validation of barcode data. Annotation can occur at the project level, at the record level, and on specific data elements including taxonomy, images, and sequences on BIN pages. The Annotation System leverages the large user base and expert knowledge for curation of both private data within collaborative projects and public data through the Public Data Portal. Tagging allows for categorization using custom and controlled tags. Both custom and controlled tags can be used for filters, searches, and workflow management.

Comments and tags applied to data by BOLD users will appear in the Activity Report on the User Console and the Activity Report on the appropriate Project Console. Comments will persist on the data element with the user's full name and a date stamp. Tags can be removed at any time by any user.

Annotation is available wherever the Add Tags and Comments button appears within BOLD. Users must be signed in to BOLD to be able to add tags and comments.

[Figure: Annotation button]

The figure below illustrates the annotation window which allows for comments as well as the option to choose an existing tag or create a new tag.

[Figure: Annotation pop-up window]
