FAANGMine v1.2 Documentation

FAANGMine is a data mining resource that integrates reference genome assemblies for cattle, horse, pig, sheep, chicken, cat, dog and water buffalo with many other biological data sets. Powered by InterMine, this platform provides access to a number of datasets from a variety of sources. It also provides customized bioinformatics tools that researchers can use to create their own custom datasets. FAANGMine is part of FAANGMine.org. The FAANG (Functional Annotation of ANimal Genomes) Consortium is “a coordinated international action to accelerate genome to phenome” and aims to generate comprehensive maps of functional elements in genomes of domesticated animals. FAANGMine will integrate data generated by the FAANG Consortium so that animal researchers, with or without bioinformatics programming skills, can use it in their own research projects.

_images/FAANGMine-banner.png

Main site: http://faangmine.org/faangmine

Link to the available datasets in FAANGMine: http://128.206.116.34:8080/faangmine/dataCategories.do

FAANGMine.org is based upon work supported by the National Science Foundation under Award Number 0054449. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. FAANGMine is developed and hosted at the University of Missouri. If you have comments or if you wish to report a problem, please contact the Database Administrator.

Overview of FAANGMine

This section provides a brief overview of the layout for FAANGMine.


The navigation panel highlights different functionalities of FAANGMine.

Home - The home page for FAANGMine.

MyMine - A portal for account management. When logged in to FAANGMine, users can access their saved templates, most recent queries, and saved lists.

Templates - List of templates that users can select from based on the nature of their query. Each template is a predefined query with a simple form containing a description of what input is expected and the type of output that will be generated.

Lists - Allows users to upload lists of genes on which they can perform enrichment analyses and export the results. Users that log in to FAANGMine can save their lists for future use.

QueryBuilder - A flexible interface that allows users to create their own custom query templates while browsing the FAANGMine data model. Queries can be exported in a variety of formats to share with other users.

Regions - The Genomic Region Search tool where users can enter a series of genomic coordinates, specify flanking regions and fetch all features that fall within the given interval. The result can be exported or saved as a list for further analyses.

Data Sources - Provides a summary of all the data loaded into FAANGMine including their sources, associated publications and links to source sites.

Help - Links to the FAANGMine help docs and tutorials.

API - Describes the InterMine API that allows users to programmatically access FAANGMine.
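The Regions entry above describes entering genomic coordinates, adding flanking regions, and fetching the features in the resulting interval. The following is a minimal Python sketch of that interval logic only; it does not call FAANGMine. The `chromosome:start..end` input format, the `Region` type, and the function names are assumptions made for illustration, and "within the interval" is interpreted here as any overlap, which may differ from what the tool actually does.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A 1-based, inclusive genomic interval (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


# Assumed input format "chromosome:start..end"; the Regions form may accept others.
_REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse one coordinate string into a Region, swapping reversed endpoints."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        start, end = end, start
    return Region(match["chrom"], start, end)


def extend(region: Region, flank: int) -> Region:
    """Widen a region by `flank` bases on each side, never going below position 1."""
    return Region(region.chrom, max(1, region.start - flank), region.end + flank)


def overlaps(region: Region, feature: Region) -> bool:
    """Return True if the feature shares at least one base with the region."""
    return (
        region.chrom == feature.chrom
        and feature.start <= region.end
        and feature.end >= region.start
    )
```

For example, `extend(parse_region("1:1000..2000"), 500)` widens the search interval to positions 500 through 2500 on chromosome 1, so a feature at 2400 to 2600 overlaps the extended interval but not the original one.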

Report Pages

All objects in FAANGMine (e.g., gene, protein, transcript, publication) have report pages that can be viewed after running a query. A report page shows all available information for that object and provides links to related objects. As an example, we revisit the template query. In the list of templates under the Templates tab on the FAANGMine home page, select Gene -> Homologues to retrieve all homologues for a given gene. Enter “GSTM1” into the LOOKUP search box, then click Show Results. In the results table, note that every entry contains a link. You can mouse over any link to bring up a summary of that object. If we hover over the first Gene ID, a summary box for that gene appears.

Report summary box

Summary for gene entry in query results table

Clicking on that same item brings up its report page, which provides comprehensive information for gene GSTM1. The report page header shows the Gene ID and its Biotype, which in this example is protein coding. The tabs in the Quick Links menu bar at the top of the page jump to the corresponding sections. The column on the right side of the report page displays external links to other Mines and databases.

Report page

Report page for protein-coding gene

The content of the report page is divided into categories based on the type of information provided for that particular object. Clicking on links within each category brings up more details about the objects of interest.

Summary

The Summary section near the top of the report provides information on the gene such as its length, chromosome location, and strand information. Users can also get the complete FASTA sequence of the gene by clicking on the FASTA tab.

Report page summary section

Transcripts

The Transcripts section contains information about the gene model, such as transcripts and exons. Links to FASTA files are included where applicable.

Report page transcript section

Proteins

The Proteins section provides information about the protein product of the gene. The comments section gives a brief description about the protein along with the UniProt accession and links to any outside data sets.

Report page protein section

Function

The Function section displays Gene Ontology annotations for a gene. Annotations are divided into three categories:

  • Cellular Component
  • Molecular Function
  • Biological Process

The GO terms are displayed along with the evidence code indicating how the annotations were derived. A results table with Pathway information is also displayed if applicable.

Report page function section

Homology

The Homology section provides information for all homologues. The first portion displays a summarized view of the homologues reported in different organisms. The next portion is a results table with more detailed information about each homologue, including the type of homologue and the dataset from which the information was obtained.

Report page homology section

Interactions

The Interactions section provides interaction information. For GSTM1 there is no interaction information available, but for genes that do have interaction information, a network is displayed showing all interactors for the current gene.

Report page interactions section

Publications

The Publications section displays a table of publications related to the gene with links to full citations.

Report page publication section

Other

This last section provides miscellaneous information that does not fit into any of the above categories. This example lists protein coding annotations and their sources.

Report page other section

Lists

Creating Lists

Users may create and save lists of features, such as gene IDs, transcript IDs, gene symbols, etc. The list tool searches the database for the list items and attempts to convert each identifier to the selected type. Click on the Lists tab from the menu to access the full list upload form. A short version of the form is also in the Quick List box on the home page.

Lists Upload Form

List upload form

Lists Quick List

Quick list from FAANGMine home page

As an example, enter the following comma-separated identifiers into the Lists upload form under the Lists tab. Notice that they do not have to be in the same format. A Summary table is displayed with the results of searching for each of the five identifiers in the list.

CAPN2, ENSCHIG00000014802, BTG1, XDH, 101107826

Leave the Select Type drop-down menu set to Gene and the Organism drop-down set to Any. Click on Create List. Note that you can also upload a list from a .txt file.
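When identifiers come from a spreadsheet or another tool, it can help to tidy them before pasting them into the upload form or saving them to a .txt file. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, identifiers are assumed to be separated by commas or whitespace, and the one-identifier-per-line file layout is an assumption rather than a documented FAANGMine requirement.

```python
import re
from typing import List


def parse_identifiers(raw: str) -> List[str]:
    """Split a comma- or whitespace-separated string into unique identifiers.

    Order of first appearance is preserved; comparison is case-sensitive.
    """
    seen = set()
    identifiers = []
    for token in re.split(r"[,\s]+", raw):
        if token and token not in seen:
            seen.add(token)
            identifiers.append(token)
    return identifiers


def to_upload_text(identifiers: List[str]) -> str:
    """Render identifiers one per line (assumed layout for a .txt upload)."""
    return "\n".join(identifiers) + "\n"


# The mixed-format example list from this section.
example = "CAPN2, ENSCHIG00000014802, BTG1, XDH, 101107826"
cleaned = parse_identifiers(example)
```

Here `cleaned` holds the five identifiers in their original order, and `to_upload_text(cleaned)` produces text that could be written to a file for upload.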

Lists results

List Example: Search results for list of identifiers

The summary table reports the identifiers that had a direct hit without any duplicates. If there are any duplicates, users can add the relevant entries individually by clicking the Add button under the Action column, or add them all at once by choosing the Add all tab. Here we will click Add all. Once the selections have been added, the list can be saved by clicking the Save a list of 66 Genes button at the top of the summary table. Name the list by entering text into the Choose a name for the list box at the top of the results page.

Lists save results

List Example: Saving list of identifiers

After the list is saved, users are presented with a List Analysis page. This page provides users with widgets to perform analyses on gene lists that they have created.

Lists analysis page

List Example: Analysis for gene list

The selection of widgets provided on the List Analysis page depends on the contents of the list. The available widgets for this list example include:

  1. Gene Ontology Enrichment
  2. Publication Enrichment
  3. Pathway Enrichment
  4. Orthologues

Lists widgets

List Example: Displayed widgets for list analysis

Saving Lists

To see your saved lists, click the View tab on the Lists page. If you are not logged in, lists are saved only temporarily, for the duration of your current session; you must be logged in to save your lists permanently. Further analyses of lists can be done with the Actions links at the top of the list. The links become active once lists are selected for analyses. Saved lists may also be accessed from the MyMine menu tab.

Lists widgets

List Example: Saved user lists

MyMine

MyMine serves as a portal where logged-in users may manage their lists, queries, templates, and account details.

To access MyMine, click on the MyMine menu tab. A submenu appears with six options:

Lists - Lists saved by the user when logged in.

History - List of most recently run queries.

Queries - List of saved queries.

Templates - Templates created or marked as “favorite” by the user.

Password - Password reset form.

Account Details - User preferences form.

MyMine Home

Saved lists found under MyMine. Note that currently saved lists can be selected for analyses to contribute to new lists.

API

An API is available for users who would like to programmatically access FAANGMine.

API page

Perl, Python, Ruby, and Java are the languages supported by the InterMine API.

For more detailed information, view the InterMine documentation.
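As a rough illustration of programmatic access, the sketch below builds a query URL using only the Python standard library; it does not use an InterMine client library and does not send any request. Several details are assumptions rather than facts from this documentation: the `/service/query/results` path follows the general InterMine web service convention, the model name `genomic` is the usual InterMine default, and the paths `Gene.primaryIdentifier` and `Gene.symbol` are assumed to exist in the FAANGMine data model. Check the API page and the InterMine documentation before relying on any of them.

```python
import xml.etree.ElementTree as ET
from typing import Sequence
from urllib.parse import urlencode

# Main site URL as given in this documentation.
BASE_URL = "http://faangmine.org/faangmine"
# Assumed endpoint, following the usual InterMine web service layout.
QUERY_ENDPOINT = "/service/query/results"


def build_query_xml(views: Sequence[str], path: str, op: str, value: str) -> str:
    """Build a one-constraint path query as XML (model name is an assumption)."""
    query = ET.Element("query", {"model": "genomic", "view": " ".join(views)})
    ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


def build_results_url(query_xml: str, fmt: str = "tab") -> str:
    """Return a GET URL for the query; nothing is sent over the network."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{BASE_URL}{QUERY_ENDPOINT}?{params}"


# Example: look up the gene symbol used elsewhere in this documentation.
xml_text = build_query_xml(
    views=["Gene.primaryIdentifier", "Gene.symbol"],  # assumed data model paths
    path="Gene.symbol",
    op="=",
    value="GSTM1",
)
url = build_results_url(xml_text)
```

Building the XML with `xml.etree.ElementTree` and the query string with `urlencode` keeps special characters in identifiers correctly escaped. The resulting `url` could then be opened in a browser or fetched with any HTTP client.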

Data Sources

The Data Sources table provides a description of the datasets that are integrated into FAANGMine, along with their download location, version or release, citations wherever applicable, and any additional comments.

Data Sources Table

FAANGMine Data Sources table

How to cite

FAANGMine is a project supported by the National Science Foundation to address the need for a high-performance data mining resource that enables fine-grained querying and integration of heterogeneous FAANG data with existing information, such as functions of known genes and research datasets.

For more generic examples of how to use InterMine, see the tutorials created by FlyMine, which showcase the different features of InterMine.