FAANGMine v1.2 Documentation
FAANGMine is a data mining resource that integrates reference genome assemblies for cattle, horse, pig, sheep, chicken, cat, dog and water buffalo with many other biological data sets. Powered by InterMine, this platform provides access to a number of datasets from a variety of sources. It also provides customized bioinformatics tools that researchers can use to create their own custom datasets. FAANGMine is part of FAANGMine.org. The FAANG (Functional Annotation of ANimal Genomes) Consortium is “a coordinated international action to accelerate genome to phenome” and aims to generate comprehensive maps of functional elements in the genomes of domesticated animals. FAANGMine will integrate data generated by the FAANG Consortium so that animal researchers, with or without bioinformatics programming skills, can use it in their own research projects.
Main site: http://faangmine.org/faangmine
Link to the available datasets in FAANGMine: http://128.206.116.34:8080/faangmine/dataCategories.do
FAANGMine.org is based upon work supported by the National Science Foundation under Award Number 0054449. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. FAANGMine is developed and hosted at the University of Missouri. If you have comments or if you wish to report a problem, please contact the Database Administrator.
Overview of FAANGMine
This section provides a brief overview of the layout for FAANGMine.
The navigation panel highlights different functionalities of FAANGMine.
Home - The home page for FAANGMine
MyMine - MyMine serves as a portal for account management. When logged in to FAANGMine, users can access their saved templates, most recent queries and saved lists.
Templates - List of templates that users can select from based on the nature of their query. Each template is a predefined query with a simple form containing a description of what input is expected and the type of output that will be generated.
Lists - Allows users to upload lists of genes on which they can perform enrichment analyses and export the results. Users who log in to FAANGMine can save their lists for future use.
QueryBuilder - A flexible interface that allows users to create their own custom query templates while browsing the FAANGMine data models. Queries can be exported in a variety of formats to share with other users.
Regions - The Genomic Region Search tool where users can enter a series of genomic coordinates, specify flanking regions and fetch all features that fall within the given interval. The result can be exported or saved as a list for further analyses.
Data Sources - Provides a summary of all the data loaded into FAANGMine including their sources, associated publications and links to source sites.
Help - Links to the FAANGMine help docs and tutorials
API - Describes the InterMine API that allows users to programmatically access FAANGMine.
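The QueryBuilder entry above notes that queries can be exported in several formats. InterMine's native query format is PathQuery XML; the sketch below builds a minimal one with Python's standard library. The model name `genomic` and the example paths (`Gene.primaryIdentifier`, `Gene.symbol`) are assumptions based on typical InterMine data models rather than details taken from FAANGMine, so compare the output against a query exported from the QueryBuilder before relying on it.

```python
import xml.etree.ElementTree as ET


def build_path_query(view, constraints, model="genomic"):
    """Return a PathQuery XML string.

    view: output paths, e.g. ["Gene.primaryIdentifier", "Gene.symbol"]
    constraints: (path, op, value) tuples
    model: data model name ("genomic" is an assumption, not confirmed for FAANGMine)
    """
    query = ET.Element("query", {"model": model, "view": " ".join(view)})
    for i, (path, op, value) in enumerate(constraints):
        # Codes A, B, C... label each constraint so it can be referenced in constraint logic.
        ET.SubElement(
            query,
            "constraint",
            {"path": path, "op": op, "value": value, "code": chr(ord("A") + i)},
        )
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_path_query(
        ["Gene.primaryIdentifier", "Gene.symbol"],
        [("Gene.symbol", "=", "GSTM1")],
    ))
```

Building the XML with `ElementTree` rather than string formatting means quotes and angle brackets in constraint values are escaped automatically.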
Report Pages
All objects in FAANGMine (e.g., gene, protein, transcript, publication) have report pages that can be viewed after running a query. Each report page shows all available information for that object and provides links to related objects. As an example, we can revisit the templates example. In the list of templates under the Templates tab on the FAANGMine home page, select Gene -> Homologues to query FAANGMine for all homologues of a given gene. Enter “GSTM1” into the LOOKUP search box, then click Show Results. In the results table, note that every entry contains a link. You can mouse over any link to bring up a summary of that object; hovering over the first Gene ID, for example, brings up a summary box for that gene.
Clicking on that same item will bring up its report page, which includes comprehensive information for gene GSTM1. The report page header shows the Gene ID and its Biotype, in this example protein coding. The tabs in the Quick Links menu bar at the top of the page take you directly to the corresponding sections of the report. The column on the right side of the report page displays external links to other Mines and databases.
The content of the report page is divided into categories based on the type of information provided for that particular object. Clicking on links within each category brings up more details about the objects of interest.
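The Gene -> Homologues lookup described above can also be run without the web form. InterMine exposes templates through a REST endpoint (`/service/template/results`) that takes numbered `constraintN`, `opN` and `valueN` parameters. The sketch below only builds such a URL. The service root and the template name `Gene_Homologues` are hypothetical placeholders; copy the real values from the template's own page or the API documentation.

```python
from urllib.parse import urlencode

# Hypothetical service root; confirm the real one on the FAANGMine API page.
SERVICE_ROOT = "http://faangmine.org/faangmine/service"


def template_results_url(template_name, path, op, value, fmt="tab", root=SERVICE_ROOT):
    """Build a URL that runs a single-constraint template through the REST API."""
    params = {
        "name": template_name,  # internal template name (hypothetical here)
        "constraint1": path,    # path of the editable constraint
        "op1": op,              # e.g. LOOKUP, as in the Gene -> Homologues form
        "value1": value,
        "format": fmt,          # tab-separated output
    }
    return f"{root}/template/results?{urlencode(params)}"


if __name__ == "__main__":
    print(template_results_url("Gene_Homologues", "Gene", "LOOKUP", "GSTM1"))
```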
Summary
The Summary section near the top of the report provides information on the gene such as its length, chromosome location, and strand information. Users can also get the complete FASTA sequence of the gene by clicking on the FASTA tab.
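Sequences downloaded from the FASTA tab are plain FASTA text. A minimal parser, plus a length check that assumes start and end positions are 1-based and inclusive (an assumption; verify it against a gene whose length you already know), might look like this:

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA text; headers are stored without '>'."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def span_length(start, end):
    """Length of a 1-based, inclusive interval (assumed coordinate convention)."""
    return end - start + 1
```

If the parsed sequence length disagrees with `span_length` for the reported coordinates, the coordinate convention assumed here is probably wrong for that dataset.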
Transcripts
The Transcripts section contains information about the gene model, such as transcripts and exons. Links to FASTA files are included where applicable.
Proteins
The Proteins section provides information about the protein product of the gene. The comments section gives a brief description about the protein along with the UniProt accession and links to any outside data sets.
Function
The Function section displays Gene Ontology annotations for a gene. Annotations are divided into three categories:
- Cellular Component
- Molecular Function
- Biological Process
The GO terms are displayed along with the evidence code indicating how each annotation was derived. A results table with pathway information is also displayed, if applicable.
Homology
The Homology section provides information for all homologues. The first portion displays a summarized view of the homologues reported in different organisms. The next portion is a results table giving more detailed information about each homologue: the homologue itself, the type of homology, and the dataset from which the information was obtained.
Interactions
The Interactions section provides interaction information. GSTM1 has no interaction information available, but for genes that do, a network is displayed showing all interactors of the current gene.
Publications
The Publications section displays a table of publications related to the gene with links to full citations.
Other
This last section provides miscellaneous information that does not fit into any of the above categories. In this example, it lists protein coding annotations and their sources.
Genomic Regions Search
The Genomic Regions Search is a tool to fetch features that are within a given set of genomic coordinates or are within a given number of bases flanking the coordinates.
To begin this type of search, click the Regions tab on the menu bar. A form will appear asking for the search parameters (organism, feature types, genomic coordinates, etc.).
The coordinates must be in one of three formats:
- chromosome_number:start..end
- chromosome_number:start-end
- chromosome_number start end (tab delimited)
Click on the input examples above the text input box (number 4) to view a representative set of coordinates in each format. Click the Genome coordinates help link near the top of the form for more detailed information on the input format requirements.
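When preparing many coordinates outside the form, a small validator for the three formats above can catch mistakes before pasting. This is a hypothetical helper, not part of FAANGMine; its start <= end check is the sketch's own sanity rule, and the form itself may apply different rules.

```python
import re

# One pattern per accepted input format.
_FORMATS = [
    re.compile(r"([^:\s]+):(\d+)\.\.(\d+)"),  # chromosome:start..end
    re.compile(r"([^:\s]+):(\d+)-(\d+)"),     # chromosome:start-end
    re.compile(r"([^\t]+)\t(\d+)\t(\d+)"),    # chromosome<TAB>start<TAB>end
]


def parse_region(line):
    """Return (chromosome, start, end) or raise ValueError for unsupported input."""
    line = line.strip()
    for pattern in _FORMATS:
        m = pattern.fullmatch(line)
        if m:
            chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
            if start > end:
                raise ValueError(f"start is greater than end in {line!r}")
            return chrom, start, end
    raise ValueError(f"unrecognised region format: {line!r}")
```

Note that the third format requires tabs; space-separated coordinates are rejected.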
During a search, regions may be extended on either side of the genomic coordinates using the slider or by entering text in the field to the left of the slide bar. There is also the option to perform a strand-specific region search using the checkbox at the bottom of the form (number 6).
As an example, select B. taurus from the Select Organism drop-down, and ARS-UCD1.2 as the Assembly. Click the box next to Select Feature Types to uncheck all of the boxes, then check the box next to Gene, and enter the following coordinates into the genomic regions search text field:
14:2000000..2800000
Click the search button to conduct the genomic regions search. If there are no overlaps within your search coordinates, the search can be repeated with the region extended, either with the slide bar or by entering a value into the field next to it (e.g., 10k).
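Extending the search widens the interval by the same number of bases on each side. Assuming `10k` means 10,000 bases and that extended starts are clamped at position 1 (neither is stated by the form), the arithmetic for the example region can be sketched as follows. Clamping at the chromosome end would also need the chromosome length, which this sketch does not have.

```python
import re


def parse_flank(text):
    """Convert a flank size such as '500', '10k' or '1M' to a number of bases (assumed units)."""
    m = re.fullmatch(r"\s*(\d+)\s*([kKmM]?)\s*", text)
    if not m:
        raise ValueError(f"bad flank size: {text!r}")
    multiplier = {"": 1, "k": 1_000, "m": 1_000_000}[m.group(2).lower()]
    return int(m.group(1)) * multiplier


def extend_region(chrom, start, end, flank):
    """Widen [start, end] by `flank` bases on both sides, never going below position 1."""
    return chrom, max(1, start - flank), end + flank
```

For the example above, `extend_region("14", 2000000, 2800000, parse_flank("10k"))` returns `("14", 1990000, 2810000)`.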
The search results page presents a list of features present within the searched genomic interval. In this case, the feature type was limited to Gene. The results may be exported as tab-separated or comma-separated values. If they contain genomic features, there is also the option to save the results in GFF3 or BED format. The FASTA sequences of the features may also be downloaded. Links within the features provide detailed reports. To create a list of particular features from the results page, filter by feature type (if applicable), shown in the red box, and click Go.
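When comparing exported files, note that BED and GFF3 use different coordinate conventions: GFF3 positions are 1-based with inclusive ends, while BED positions are 0-based with exclusive ends. A sketch that converts a 1-based, inclusive interval into a BED6 line (the feature name `geneA` in the example is hypothetical):

```python
def to_bed_line(chrom, start, end, name, strand="."):
    """Convert a 1-based, inclusive interval to a BED6 line (0-based, half-open)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # BED start is 0-based, so subtract 1; the exclusive BED end equals the inclusive 1-based end.
    return "\t".join([chrom, str(start - 1), str(end), name, "0", strand])
```

For example, `to_bed_line("14", 2000000, 2800000, "geneA", "+")` returns `"14\t1999999\t2800000\tgeneA\t0\t+"`.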
Lists
Creating Lists
Users may create and save lists of features, such as gene IDs, transcript IDs, gene symbols, etc. The list tool searches the database for the list items and attempts to convert each identifier to the selected type. Click on the Lists tab from the menu to access the full list upload form. A short version of the form is also in the Quick List box on the home page.
As an example, enter the following comma-separated identifiers into the Lists upload form under the Lists tab. Notice that they do not have to be in the same format. A Summary table is displayed with the results of searching for each of the five identifiers in the list.
CAPN2, ENSCHIG00000014802, BTG1, XDH, 101107826
Leave the Select Type drop-down menu set to Gene and the Organism drop-down set to Any. Click on Create List. Note that you can also upload a list from a .txt file.
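Identifiers pasted into the form or read from a .txt file can be separated by commas, spaces, tabs or newlines. The sketch below mimics that splitting so a list can be checked before upload; it assumes those separators, drops exact duplicates, and does not handle quoted identifiers that contain spaces.

```python
import re


def split_identifiers(text):
    """Split pasted list input on commas and whitespace, dropping blanks and repeats."""
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", text):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)  # keep first-occurrence order
    return ids
```

Applied to the example input above, it yields the five identifiers `CAPN2`, `ENSCHIG00000014802`, `BTG1`, `XDH` and `101107826`, in that order.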
The summary table provides information regarding those identifiers that had a direct hit without any duplicates. If there are any duplicates, users can add the relevant entries individually by clicking the Add button in the Action column, or add them all at once with Add all. Here we will click Add all. Once the selections have been added, the list can be saved by clicking the Save a list of 66 Genes button at the top of the summary table. Name the list by entering text into the Choose a name for the list box at the top of the results page.
After the list is saved, users are presented with a List Analysis page. This page provides users with widgets to perform analyses on gene lists that they have created.
The selection of widgets provided on the List Analysis page depends on the contents of the list. The available widgets for this list example include:
- Gene Ontology Enrichment
- Publication Enrichment
- Pathway Enrichment
- Orthologues
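Enrichment widgets such as these typically score each term with a one-sided hypergeometric test: how surprising is it to see k annotated genes in a list of n, given K annotated genes in a background of N? A sketch of that calculation follows, with a plain Bonferroni correction shown for simplicity. The FAANGMine widgets may use a different background population and a different multiple-testing correction, so treat this as an illustration rather than a reproduction of their numbers.

```python
from math import comb


def hypergeometric_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    k: genes in the list annotated with the term
    n: list size
    K: background genes annotated with the term
    N: background size
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    # Sum the upper tail exactly with integer arithmetic, then divide once.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

Exact integer sums via `math.comb` avoid the rounding problems of multiplying many small probabilities, at the cost of some speed for very large backgrounds.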
Saving Lists
To see your saved lists, click the View tab on the Lists page. If not logged in, lists will be saved temporarily during your current session. However, you must be logged in to save your lists permanently. Further analyses of lists can be done with the Actions links at the top of the list. The links become active once lists are selected for analyses. Saved lists may also be accessed from the MyMine menu tab.
MyMine
MyMine serves as a portal where logged-in users may manage their lists, queries, templates, and account details.
To access MyMine, click on the MyMine menu tab. A submenu appears with six options:
Lists - Lists saved by the user when logged in.
History - List of most recently run queries.
Queries - List of saved queries.
Templates - Templates created or marked as “favorite” by the user.
Password - Password reset form.
Account Details - User preferences form.
API
An API is available for users who would like to programmatically access FAANGMine.
The InterMine API provides client libraries for Perl, Python, Ruby, and Java.
For more detailed information, view the InterMine documentation.
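For a quick look at the API without installing a client library, a query can be sent to the InterMine REST endpoint `/service/query/results` as PathQuery XML. This standard-library sketch assumes a hypothetical service root and assumes the JSON response carries its rows under a `results` key and any failure message under `error`; check both against the InterMine documentation. Only `run_query` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical service root; confirm the real one on the FAANGMine API page.
SERVICE_ROOT = "http://faangmine.org/faangmine/service"


def query_results_url(query_xml, root=SERVICE_ROOT):
    """URL that runs a PathQuery (XML) and requests JSON rows."""
    return f"{root}/query/results?{urlencode({'query': query_xml, 'format': 'json'})}"


def rows_from_json(payload):
    """Extract result rows from a JSON response body (assumed layout)."""
    data = json.loads(payload)
    if data.get("error"):
        raise RuntimeError(data["error"])
    return data["results"]


def run_query(query_xml):
    """Fetch and parse rows over the network."""
    with urlopen(query_results_url(query_xml)) as resp:
        return rows_from_json(resp.read().decode("utf-8"))
```

The InterMine Python client (the `intermine` package) wraps these same web services; the version here simply avoids the extra dependency.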
Data Sources
The Data Sources table provides a description of the datasets that are integrated into FAANGMine, along with their download location, version or release, citations wherever applicable, and any additional comments.
How to cite
FAANGMine is a project supported by the National Science Foundation to address the need for a high performance data mining resource that enables fine-grained querying and the integration of heterogeneous FAANG data with existing information, such as functions of known genes and research datasets.
For more general examples of how to use InterMine, see the InterMine tutorials created by FlyMine, which showcase the different features of InterMine.