.. _cookbook:

Gene Ontology Data
====================

How do I find the GO annotations for my set of genes?
----------------------------------------------------------------------

This assumes that you have successfully uploaded or created your set of genes. If you require help with this, please see :ref:`lists`.

To find the GO term annotations for your genes, it is easiest to start with a specially created template search:

1. From the FlyMine homepage, select the ‘Templates’ tab from the navigation bar.
2. From the list of templates, look for one called `Gene → GO terms `_. If you cannot see it immediately, enter ‘GO’ into the filter box to narrow the search.
3. Click on the template name to display the template form. Enter the gene or select the list you wish to run the search with and press ‘Show results’.

Modifications: This template search returns GO term annotations for all three GO ontologies along with other GO attributes, such as evidence code and qualifier. You may wish to further filter your results on some of these attributes, or add additional attributes to the search, such as annotation extension or the GO ‘with’ attribute. The following examples illustrate how to do this:

How do I filter my results to show only GO annotations from the ‘biological process’ ontology?
------------------------------------------------------------------------------------------------------------------------

The easiest way to filter your results is to use the filter functions in the results table. In the ontology term.Namespace column, click on the column summary icon. Select ‘biological process’ and click filter. To return to your original results, either remove the filter by clicking on the green ‘filter’ icon in this column header, or use the UNDO button above the results table:

.. image:: /_images/filterGOterms.png

How do I filter my results to remove those with the IEA (inferred from electronic annotation) evidence code?
------------------------------------------------------------------------------------------------------------------------

You can do this either through the column summary or the ‘filter’ function:

1. Using the column summary: Click on the column summary icon in the code column. Select all evidence codes except IEA and click filter.

   .. image:: /_images/filterEvidence.png

2. Using the filter button: Click on ‘Filters’ above the results table. Scroll down to the GO Evidence Code field and click ‘Choose’. Define your filter and click ‘Add filter’:

   .. image:: /_images/filterEvidence2.png

How do I analyse my gene list for GO term enrichment?
----------------------------------------------------------------------

GO term enrichment is calculated automatically for all gene lists - you simply have to navigate to the :ref:`listanalysispage` for your list. The GO term enrichment for your list will be displayed as a widget (see :ref:`widgets`). For more information on the enrichment calculation see the `InterMine documentation `_.

Other Gene Ontology searches
----------------------------------------------------------------------

FlyMine includes many other Gene Ontology related searches. To see the full range of searches, navigate to the :ref:`templatesearches` page and enter ‘GO’ into the filter box (see :ref:`filtertemplatesearches`).

For example, you can find all genes annotated to a particular GO term using the `GOterm → Genes `_ template. This search is extended by the template `GO term name [and children of this term] → Genes in one specific organism `_, which will return genes annotated to the term given AND genes annotated to any children of this term.
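
The difference between searching on a term alone and searching on a term plus its children can be illustrated with a small, self-contained sketch. This is not FlyMine's implementation: the ontology, the term names (``TERM_A`` etc.), the gene names and both function names are invented for illustration, and the sketch assumes that 'children' means every descendant term.

.. code-block:: python

   from collections import deque

   # Hypothetical toy ontology: parent term -> direct child terms.
   CHILDREN = {
       "TERM_A": ["TERM_B", "TERM_C"],
       "TERM_B": ["TERM_D"],
       "TERM_C": ["TERM_D"],
       "TERM_D": [],
   }

   # Hypothetical annotations: gene -> terms it is directly annotated to.
   ANNOTATIONS = {
       "gene1": {"TERM_A"},
       "gene2": {"TERM_D"},
       "gene3": {"TERM_C"},
       "gene4": {"TERM_X"},
   }


   def term_and_descendants(term, children):
       """Return the term plus every term reachable through child links."""
       seen = {term}
       queue = deque([term])
       while queue:
           current = queue.popleft()
           for child in children.get(current, []):
               if child not in seen:
                   seen.add(child)
                   queue.append(child)
       return seen


   def genes_for_term(term, children, annotations, include_children=True):
       """Return sorted genes annotated to the term, optionally including descendants."""
       terms = term_and_descendants(term, children) if include_children else {term}
       return sorted(gene for gene, gene_terms in annotations.items() if gene_terms & terms)


   # Term plus children, then the term alone.
   print(genes_for_term("TERM_A", CHILDREN, ANNOTATIONS))
   print(genes_for_term("TERM_A", CHILDREN, ANNOTATIONS, include_children=False))

With this toy data the first call also picks up genes annotated only to a descendant term, which is why the extended template usually returns a larger gene set.
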
Various combinations of GO term and data searching are available, for example `Gene → orthologues and GO terms of these orthologues `_, will return the GO terms annotated to the orthologues of your gene(s).

Remember, if you don’t see what you are looking for please :ref:`contact` - we can probably construct the search for you.

Expression Data
====================

How do I find which tissues or developmental stages my gene(s) is/are expressed in?
------------------------------------------------------------------------------------------------------------------------

For a single gene, the quickest way to view the available expression data is to look at the :ref:`reportpages` for that gene. Here you will find graphs showing the expression of the gene across development and the expression in adult fly tissues. In addition, tables showing *in situ* mRNA expression data from BDGP and FlyFish are available.

For a list of genes, widgets (graphs) are available on the :ref:`listanalysispage` showing expression in adult fly tissues (FlyAtlas data) and across development (modENCODE RNA_seq data, BDGP and FlyFish in situ hybridisation).

How do I find genes expressed in a particular tissue or at a certain developmental stage?
------------------------------------------------------------------------------------------------------------------------

A number of template searches are available to analyse the various sources of expression data. Because the expression sources differ widely in nature, it is generally necessary to search each one separately - for example, data for expression over development is available from BDGP, FlyFish, the modENCODE RNA_seq data and from the time course data published by Arbeitman et al. Although a template search is available to analyse the BDGP and FlyFish data together, the development stage ranges they used in their studies do not agree exactly.
The following template searches specifically look for genes expressed at a particular developmental stage(s) or in a particular tissue(s):

* `Tissue [D. melanogaster] → FlyAtlas data `_ (allows optional filtering on FlyAtlas values)
* `Microarray Time Course data from Arbeitman et al (filter on stage and ratio) → Genes (D. melanogaster) `_
* `BDGP → Genes `_ (searches for a specified expression term and developmental stage)
* `Stage [D. melanogaster] → FlyFish + BDGP in situ data `_
* `FlyFish expression term + stage [D. melanogaster] → Genes `_

To see all template searches for expression data, navigate to the :ref:`templatesearches` and filter the list for ‘expression’ using the drop-down filter menu (see :ref:`filtertemplatesearches`).

How do I find genes expressed ONLY in a particular tissue?
----------------------------------------------------------------------

This is a three step process:

1. Find and save the genes expressed in all tissues except the one you are interested in. For this you can use the template `Genes → Expression in a set of tissues `_. From the drop-down list of tissues, select all tissues except the one for which you want to find tissue-specific genes. The operator should be set to **ONE OF**:

   .. image:: /_images/tissuespecific1.png

   Create a list of all the genes returned by your search (if you are unsure how to save a list of genes see :ref:`makealist`):

   .. image:: /_images/tissuespecific2.png

2. Find the genes expressed in the tissue you are interested in: You can use the same template as above - just change the tissue selection to the one you are interested in. Again, save the set of genes returned by the search:

   .. image:: /_images/tissuespecific3.png

3. Find the genes expressed ONLY in the tissue you are interested in: To find the genes that are expressed ONLY in the tissue you are interested in, you need to subtract the first list created above from the second list. To do this, navigate to the lists ‘view’ page.
   The two lists you have just created should be at the top of the list of lists. Select the two lists and then ‘Asymmetric Difference’ (note that you need to use Asymmetric difference rather than subtract - the icons on this page illustrate the difference between these two operations). The Asymmetric difference provides options to subtract the lists either way - you need to select the single tissue list minus the all but one list - then enter a name for your new list. The new list should appear at the top of the lists view page. In this example, as we are using the FlyAtlas data, we can check that we have created a tissue-specific expression set by looking at the FlyAtlas widget on the :ref:`listanalysispage` for the list we have just created.

   .. image:: /_images/tissuespecific4.png

4. Similar searches. The same workflow can of course be applied to finding other unique sets - for example, genes expressed only in a particular developmental stage. The template search `RNA_seq → genes expressed in a set of tissues `_ can be used to create various lists based on the modENCODE RNA_seq data (which includes developmental stage, tissues and other conditions).

Pathways
====================

Where does the pathway data come from?
----------------------------------------------------------------------

In FlyMine we load pathway data from both `Reactome `_ and `KEGG `_. However, note that the KEGG data has not been updated since May 2011 due to the current KEGG licensing requirement.

How do I find which pathways my gene(s) is/are involved in?
----------------------------------------------------------------------

Use the following template search: `Gene → Pathway `_

My gene is in pathway X, how can I find other genes involved in this pathway?
------------------------------------------------------------------------------------------------------------------------

Use the following template search: `Pathway → Genes `_

I am not sure of the exact name of the pathway, how can I find this?
------------------------------------------------------------------------------------------------------------------------

Just start typing in the pathway name box - the type-ahead function will automatically show you matching pathways.

Can I visualize this pathway data in InterMine?
----------------------------------------------------------------------

We do not currently have any pathway visualization within FlyMine. However, linkouts enable you to view the pathways in either Reactome or KEGG.

How do I find out if my genes or lists of genes have any pathways in common?
------------------------------------------------------------------------------------------------------------------------

The underlying data model makes it possible to construct queries which effectively compare two lists for a specified attribute. Such a query is available as a template for comparing the pathways for two genes or two sets of genes. For two genes, the query will return any pathways that are shared by the two genes. Similarly, if two lists are provided, any pathways shared between any two genes in the lists are returned. The template is: `Gene A → Pathways ← Gene B `_

How do I find whether orthologues of my gene(s) share similar pathways or are involved in additional pathways?
------------------------------------------------------------------------------------------------------------------------

For a single gene, use the report page pathways viewer: Each gene report page in FlyMine (and in other MOD-InterMines) includes a table displaying pathways for the orthologues of the gene you are viewing. The table is created by searching the other InterMine databases for orthologous genes and the pathways they are involved in:

.. image:: /_images/pathwayviewer.png

For a list of genes:

1. Send your list of genes to the relevant organism's Mine: see :ref:`listanalysisjumptomine`
2. Once in the other Mine, search the templates for a search that will provide you with the pathway data for your genes.
3. At present we do not have a tool that will directly compare the pathways for your list of genes. However, if you download your pathway list (see :ref:`resultsdownload`) and upload it back into your original mine (see :ref:`listupload`), you can use the :ref:`listsetoperations` to carry out an intersection on the two pathway lists.

Regulatory data
====================

How do I find which transcription factors have been shown to regulate my gene(s)?
------------------------------------------------------------------------------------------------------------------------

The following template will return all types of regulatory region that have been mapped for your gene: `Gene → Regulatory elements `_

For a specific type of regulatory region you can :ref:`filters`. The fifth column of the results (sequence ontology term.name) provides the type of regulatory region.

The following template searches specifically for transcription factor binding sites and gives the factor which binds the site: `Gene → Transcription Factors `_

Can I create a list of other genes known to be regulated by this/these transcription factors?
------------------------------------------------------------------------------------------------------------------------

Yes, use the following template: `TranscriptionFactor → Genes which have binding sites for this factor `_

How do I find the sequence of the binding sites?
----------------------------------------------------------------------

The templates `Gene → Regulatory elements `_ and `Gene → Transcription Factors `_ both include the sequence of the binding site in their results. For other templates you can add the sequence field if necessary (see :ref:`resultsadddata`). You need to look for the class **Sequence** and add the field called **Residues**.

Other regulatory region searches
----------------------------------------------------------------------

FlyMine includes many other templates for searching regulatory data, including searching within a specified chromosomal region and searching within gene flanking regions. To view all regulatory templates, go to the templates page and filter the list for **Regulation**:

.. image:: /_images/regulatorytemplates.png

Orthologues
====================

Do(es) my gene(s) have orthologues in Human, mouse or rat?
----------------------------------------------------------------------

1. Open the following template: `Gene → Orthologues `_
2. Select the gene or gene list you wish to run the search on.
3. Turn on the Organism Short Name constraint and select ONE OF and then select Human, Mouse and Rat (see image below).
4. Run the template.

.. image:: /_images/orthologuetemplate.png

Does my gene(s) have a human orthologue associated with a disease?
------------------------------------------------------------------------------------------------------------------------

In FlyMine we load disease and associated human gene data from OMIM. This data can be searched with orthologous fly genes using the following template: `Gene → Human disease gene and OMIM disease `_

You can also search for genes associated with a specific disease. Two template searches are available, one that just returns Human genes and one that returns the Human genes and their orthologues either in all organisms or a specified organism:

1. Return just the human genes: `Disease → Human Gene `_
2. Return the Human gene and orthologues: `Disease → Human Gene and Orthologues `_

How do I export the sequences of an orthologous pair so that I can do an alignment in Galaxy?
------------------------------------------------------------------------------------------------------------------------

1. Run a template search to bring back the sequence of the gene and orthologous gene: `Gene → Orthologues + Sequence `_
2. Download the sequence to Galaxy.
   You need to do this in two steps - once for the gene sequence and then for the orthologue sequence (you can’t download two sequences at once to Galaxy). At this point, you need to ensure you have only one row of sequence, as all rows will be downloaded. This can be achieved by further filtering your results (see :ref:`filters`).

   * In the download box, select **FASTA sequence** from the format options.
   * Selecting the FASTA option provides a **nodes** option, where you can select which sequence column you are exporting.
   * Under the **output** option you can optionally extend the sequence.
   * Under the **destination** option select **Send to galaxy for analysis** (see :ref:`resultsgalaxy` for more details).

3. In Galaxy you will need to concatenate your two sequence files. You can use the **Concatenate datasets** program under **Text Manipulation** in the menu bar.
4. You can then use the **ClustalW** program in Galaxy (available under **Multiple Alignments** in the menu bar) to align your sequences.

Interactions
====================

Do you load genetic and physical interaction data?
----------------------------------------------------------------------

In FlyMine we load both genetic and physical (protein-protein) interaction data. Genetic and physical interactions are loaded from `BioGrid `_. Physical interactions are also loaded from `IntAct `_. Note that there is significant overlap in the physical interactions loaded from BioGrid and IntAct.

**NOTE:** All protein-protein interactions have been mapped to genes, so whether you are building your own search or using the template searches you will need to start from the Gene class and use gene identifiers rather than protein identifiers.

I am only interested in physical interactions, how do I filter out the genetic interactions?
------------------------------------------------------------------------------------------------------------------------

A. Using a template search
''''''''''''''''''''''''''''''''''''''''

All of the interaction template searches in FlyMine include an optional constraint to show either physical or genetic interactions. Unless noted specifically, all interactions are shown by default:

.. image:: /_images/interactiontypetemplates.png

B. Building your own query
''''''''''''''''''''''''''''''''''''''''

The **Interactions** class references a **Details** class which has a **type** attribute (field). This attribute holds the value **physical** or **genetic**. If you are building your own query (see :ref:`querybuilder`), you can **constrain** this attribute to the required value:

.. image:: /_images/interactiontype.png

.. _interactionexperimentgo:

How do I find the experimental procedure that was used to predict the interactions?
------------------------------------------------------------------------------------------------------------------------

The experimental procedure is stored in the **Interaction Detection Methods** under the attribute **Name**. This attribute has already been added to the results of the interaction templates:

.. image:: /_images/interactiondetectionmethod3.png

However, if you find a template that is missing this attribute, or you wish to add it to your own query, you can add it using the **manage columns** function on the results tables (see :ref:`resultsadddata`):

.. image:: /_images/interactiondetectionmethod2.png

OR, using the query builder:

.. image:: /_images/interactiondetectionmethod1.png

I am only interested in high-confidence interactions, how do I find these?
------------------------------------------------------------------------------------------------------------------------

Currently the only way to filter out high-throughput or lower confidence interactions is to filter on the experimental method - for example, you may want to exclude all interactions from two-hybrid studies. See :ref:`interactionexperimentgo` and :ref:`columnsummaries` and :ref:`filters`.
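
If you prefer to apply the same kind of method filter to downloaded results, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the column keys (``gene``, ``partner``, ``type``, ``method``), the method names and the ``exclude_methods`` function are all hypothetical and are not taken from a FlyMine export.

.. code-block:: python

   # Hypothetical rows, shaped like an exported interaction results table.
   ROWS = [
       {"gene": "geneA", "partner": "geneB", "type": "physical", "method": "two hybrid"},
       {"gene": "geneA", "partner": "geneC", "type": "physical", "method": "coimmunoprecipitation"},
       {"gene": "geneA", "partner": "geneD", "type": "genetic", "method": "phenotypic enhancement"},
       {"gene": "geneA", "partner": "geneE", "type": "physical", "method": "Two Hybrid array"},
   ]


   def exclude_methods(rows, unwanted):
       """Drop rows whose detection method contains any unwanted phrase (case-insensitive)."""
       lowered = [phrase.lower() for phrase in unwanted]
       return [
           row for row in rows
           if not any(phrase in row["method"].lower() for phrase in lowered)
       ]


   # Remove every interaction detected by a two-hybrid style method.
   for row in exclude_methods(ROWS, ["two hybrid"]):
       print(row["gene"], row["partner"], row["method"])

Matching on a phrase rather than an exact name is a design choice for this sketch, so that closely related method names are excluded together; check the actual method names in your results before relying on it.
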
Coming soon: high-confidence interaction sets as assigned by `IntAct `_.

How do I view the interactions for my gene/protein?
----------------------------------------------------------------------

The gene :ref:`reportpages` include an interactions viewer. This allows you to toggle between showing all interactions, physical or genetic interactions, export your interaction network or save the list of participating genes as a list. You can also view the network in results table form, allowing further filtering and exploration (see :ref:`resultstables`).

.. image:: /_images/interactionviewer.png

How do I find orthologues for my interacting genes/proteins?
----------------------------------------------------------------------

The following template search will return the orthologues for interactions for a specified gene or gene list: `Gene→ Interacting Genes + Orthologues `_

You can also search on a genome scale using the following template: `All interacting genes in organism1 → Orthologues in organism2 `_

Do the orthologues for my interacting genes also interact? (interologues).
------------------------------------------------------------------------------------------------------------------------

This search can be carried out on a genome scale using the following template: `Interactions in organism1 → Orthologues that also interact in organism2 (Interologues) `_

Note that this template is dependent on the interaction sets loaded into the InterMine, so for FlyMine can only be run with worm (*C. elegans*) and yeast (*S. cerevisiae*).

How do I find out if my interacting genes/proteins are expressed in the same tissue/dev stage?
------------------------------------------------------------------------------------------------------------------------

1. Run the following template to retrieve your interacting genes: `Gene → Interactions `_
2. Create a list containing all the interacting genes and your original gene.
   (You will need to create a list of the interacting genes and then add your original gene to this list - see :ref:`makealist` for more information on creating lists from results tables).
3. There are now several options to examine the expression of this set of genes:

   A. The list analysis page: The :ref:`listanalysispage` provides several graphical displays of expression data - tissue expression from FlyAtlas and developmental stage expression from BDGP and FlyFish. To see the genes expressed in a particular tissue or stage, simply click on the corresponding bar; from there you can either create a list or view the genes in a results table, where you can check whether your original gene is included in the set. This is a more exploratory approach to looking at the expression, as the graphs themselves do not show which sets include your original gene. For a more detailed analysis use one of the templates described below.

   B. Tissue-expression template: Run the following template with your list of genes: `Gene → FlyAtlas data `_

      * Filter the gene symbol column for your original gene. This will show you the tissues that your original gene is expressed in.
      * Select the tissue you are interested in using the column summary and turn off the gene filter you applied above for your original gene. This will show you all the genes (including your original gene) that are expressed in that tissue. You could make lists of the genes expressed in each tissue for further analysis.

   C. Developmental stage expression: The following templates will allow you to analyse your list for developmental stage expression. The `BDGP `_ and `FlyFish `_ data is *in situ* mRNA expression data, while the microarray time course data is a developmental stage microarray expression study by `Arbeitman et al `_.
      Filter the data as described above to find stages in which your original gene and its interactors are expressed:

      * `Gene → FlyFish and BDGP insitu data `_

      You can examine the FlyFish and BDGP data separately if you wish:

      * `Gene -> BDGP in situ data `_
      * `Gene → FlyFish data `_
      * `Gene → Microarray time course expression data `_

The above workflow is summarised in the following diagram:

.. image:: /_images/InteractionExpressionWorkflow.png
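
The comparison at the heart of this workflow, finding which interactors are expressed in the same tissues as your original gene, can also be sketched in Python once you have exported the results. The sketch below is a minimal illustration, not part of FlyMine: the gene names, tissue names, the ``EXPRESSION`` pairs and the ``coexpressed_by_tissue`` function are all hypothetical, and it assumes each (gene, tissue) pair already represents a gene judged to be expressed in that tissue.

.. code-block:: python

   from collections import defaultdict

   ORIGINAL = "geneA"  # hypothetical original gene

   # Hypothetical (gene, tissue) pairs: the original gene plus its interactors.
   EXPRESSION = [
       ("geneA", "brain"),
       ("geneA", "ovary"),
       ("geneB", "brain"),
       ("geneC", "testis"),
       ("geneD", "ovary"),
       ("geneD", "brain"),
   ]


   def coexpressed_by_tissue(rows, original):
       """Map each tissue expressing the original gene to the other genes found there."""
       by_tissue = defaultdict(set)
       for gene, tissue in rows:
           by_tissue[tissue].add(gene)
       return {
           tissue: sorted(genes - {original})
           for tissue, genes in by_tissue.items()
           if original in genes
       }


   for tissue, partners in coexpressed_by_tissue(EXPRESSION, ORIGINAL).items():
       print(tissue, partners)

The same grouping works for developmental stages if the second element of each pair is a stage rather than a tissue.
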