Integration of GO Annotation
Visualization of Gene Ontology (GO) Using GO Explorer
A tab control is introduced in VisANT so that the GO explorer and the toolbox remain together in the left control panel. The width of the panel can be changed through mouse-dragging to facilitate browsing, while the width of the toolbox remains unchanged.
Note: the GO hierarchy can be saved as an image through the popup menu by right mouse-clicking.
Note: Hierarchy information is retrieved from the Predictome database, which is synchronized with the GO database monthly.
Note: YGR119C in the figure is annotated using only two selected GO branches: molecular function and metabolic process. See Annotate the Gene Functions Using Flexible Schema for details.
Tree's Look & Feel
The tree of the GO hierarchy shown in the GO Explorer looks much more compact in the Windows Look & Feel. A new Look & Feel menu has been added under the View menu to allow users to change the look & feel of VisANT. The following figure shows the GO tree in two different look & feels:
Navigation of GO Hierarchies
Clicking the expansion symbol or double-clicking a tree node will expand or collapse it. A database query is sent to the VisANT server to retrieve the node's descendants:
The number shown in [ ] for each tree node indicates the total number of genes annotated under this GO branch for the current species. Other information, such as the number of genes directly annotated under the term, is shown in a tooltip when the mouse hovers over the tree node.
Note: When the species changes, this information changes accordingly.
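The relationship between the bracketed total and the direct count shown in the tooltip can be sketched as follows. This is an illustrative sketch only, not VisANT's implementation: the toy term IDs, gene names, and the function genes_under_branch are hypothetical, and it assumes each gene is counted once per branch even if it is annotated to several descendant terms.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> list of parent terms (a DAG).
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],  # a term may have several parents
}

# Hypothetical direct annotations: term -> genes annotated directly to it.
DIRECT = {
    "GO:A": {"g1"},
    "GO:B": {"g2"},
    "GO:C": {"g2", "g3"},
    "GO:D": {"g4"},
}


def genes_under_branch(parents, direct):
    """Return term -> set of genes annotated to the term or any descendant."""
    total = defaultdict(set)
    for term, genes in direct.items():
        # Propagate this term's genes upward to every ancestor.
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            total[current] |= genes
            stack.extend(parents.get(current, []))
    return dict(total)


totals = genes_under_branch(PARENTS, DIRECT)
for term in sorted(totals):
    # Mimics the "[total]" label plus the direct count from the tooltip.
    print(f"{term} [{len(totals[term])}] (direct: {len(DIRECT.get(term, ()))})")
```

Using sets keeps a gene from being counted twice when it reaches an ancestor through more than one path.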
Each tree node has a checkbox that allows users to select GO branches. If a GO term appears in multiple places in the tree, selecting one occurrence automatically selects the rest. The same applies to node highlighting. Terms under different categories are highlighted in different colors.
GO-Related Searching
All GO-related functions described below are available only for the following species:
| Arabidopsis thaliana | Bos taurus | Caenorhabditis elegans | Drosophila melanogaster |
| Danio rerio | Gallus gallus | Homo sapiens | Magnaporthe grisea |
| Mus musculus | Oryza sativa | Plasmodium falciparum | Rattus norvegicus |
Note: the number of supported species may increase in the future
Search GO Terms Using Key Words or GO IDs
Enter the key words or GO ID in the search box at the bottom of the GO Explorer. The search results are the paths from the terms containing the key words or GO ID (highlighted) to the root of the ontology. The following figure shows partial results of a search with the key words "virus cell cycle":
Note: A search usually returns hundreds or even thousands of paths. For this reason, the search runs in a separate thread; while it runs, the tree is disabled (you cannot click the tree nodes) and the number of paths being added to the tree is shown in the status bar (bottom of the above figure). You can still work with the network in the meantime. Because of the large number of paths shown in the GO Explorer, VisANT may run out of memory, especially when it is run as an applet. Please reference here for the solutions when VisANT is run as a local application.
Note: the key words are case-sensitive.
Note: when multiple key words are entered, the default search operation is AND.
Note: a GO ID must start with GO:, such as GO:0019385.
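The three search rules in the notes above (case-sensitive key words, AND across multiple key words, and IDs prefixed with GO:) can be sketched as below. This is a minimal sketch, not VisANT's search code: the function names, the placeholder term IDs, and the use of substring matching are assumptions made for illustration.

```python
# Hypothetical placeholder terms: ID -> description (IDs are not real GO IDs).
TERMS = {
    "GO:EXAMPLE1": "modulation by virus of host cell cycle",
    "GO:EXAMPLE2": "response to virus",
    "GO:EXAMPLE3": "Virus replication",
}


def term_matches(term_id, description, query):
    """Apply the documented rules to one term.

    - A query starting with "GO:" is treated as an ID lookup.
    - Otherwise every key word must occur in the description (AND),
      compared case-sensitively. Substring matching is an assumption.
    """
    query = query.strip()
    if query.startswith("GO:"):
        return term_id == query
    keywords = query.split()
    return bool(keywords) and all(word in description for word in keywords)


def search(terms, query):
    """Return the sorted IDs of all terms matching the query."""
    return sorted(tid for tid, desc in terms.items() if term_matches(tid, desc, query))


print(search(TERMS, "virus cell cycle"))
print(search(TERMS, "virus"))
print(search(TERMS, "GO:EXAMPLE2"))
```

Because matching is case-sensitive, the key word "virus" does not match the description "Virus replication" in this sketch.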
By default, a GO term's child terms are NOT shown when the path is displayed unless the term has been queried before. To see all of its child nodes, first collapse and then expand the term of interest in the GO Explorer. The following figure shows the result of this operation for the term "response to virus" (shown at the bottom of the above figure):
Search GO Terms Using Gene Names/IDs
The search box at the bottom of GO Explorer does not support searches using gene/protein names or IDs unless the name is part of the term description, such as in the case of p53. However, VisANT does support an indirect search of GO terms using gene names/IDs as illustrated below:
Make sure Homo sapiens is the current species. Enter the gene names/IDs (e.g. pten) in VisANT's search box in the ToolBox, and click the Search button --> All interactions associated with pten will be shown:
Note: If there is no interaction data for pten, it will still be shown on the screen as long as the gene exists in the Entrez Gene database.
Alternatively, you can add this gene to the network as shown below:
Select the node pten and query its GO annotation through the Nodes menu. Many GO terms are associated with this gene, as shown below:
Note: Beginning with version 3.5, VisANT will automatically resolve the name of user-added nodes when querying GO annotation.
Click the GO Explorer tab, then click the node pten. The hierarchies of the GO terms associated with pten are shown in the GO Explorer; some of them are illustrated below:
Search Genes and/or Their Interactions Using Key Words
VisANT 3.5 supports indirect search of genes and/or their interactions using key words, as illustrated below:
In the GO Explorer search box, enter the key word "phosphatase", and click the Search button; 492 paths are shown in the GO Explorer:
Assume we are interested in genes associated with phosphatase activity; 251 genes are associated with this term in Homo sapiens. Drag the term from the GO Explorer and drop it into the network. A metanode of the GO term containing all 251 genes will be created.
Note: before dragging a term into the network, make sure that the options (click the expansion button near the search button shown below) of the drag&drop operation are set as shown below:
Note: To drag a term, first left-click on it, then drag it to the network
If interested, query the interactions among the 251 genes as illustrated in the above figure, or query all possible interactions of the 251 genes.
Interactive Visualization of GO Hierarchy in a Network
Visualize the Hierarchies of the Node's GO Information
The hierarchies of a given GO term are defined as paths from the term to the root of the ontology. Because terms may have multiple ancestors, a term may have many different paths to reach the root. In VisANT, there are three cases in which a node will have associated GO information:
• A gene/protein node with queried GO annotations
• A gene/protein node, or a metanode of a subnetwork, with GO functions predicted by algorithms such as GOTEA
• A metanode representing a GO term (e.g. dragged from the GO Explorer)
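Since a term may have several parents, the number of paths to the root can grow quickly. The enumeration can be sketched as below; the toy ontology and the function paths_to_root are hypothetical and only illustrate the definition of a hierarchy given above, not how VisANT retrieves paths from its server.

```python
# Hypothetical toy ontology: child term -> list of parent terms (a DAG).
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],  # two parents, hence two paths to the root
}


def paths_to_root(term, parents):
    """Return every path from `term` up to a root, each as a list of terms."""
    term_parents = parents.get(term, [])
    if not term_parents:
        # A term without parents is a root: the path is the term itself.
        return [[term]]
    paths = []
    for parent in term_parents:
        for tail in paths_to_root(parent, parents):
            paths.append([term] + tail)
    return paths


for path in paths_to_root("GO:D", PARENTS):
    print(" -> ".join(path))
```

A node associated with many terms multiplies this work, which is consistent with the visualization being run in a separate thread.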
If a node has GO information, left-clicking it shows the hierarchies of the associated terms in the GO Explorer, provided the Link to network option is selected as shown in the figure above. Because a node may be associated with multiple terms, and each term may have multiple paths, the visualization of the hierarchies takes a little time and is carried out asynchronously in a new thread. Visualizing the GO hierarchies of another node is not allowed until the previous task is completed. In addition, users cannot expand or collapse the tree during the visualization process. The status of the visualization process can be checked in the status bar.
Note: when there are too many expanded GO terms, use Collapse All from the right-click popup menu to clean up the GO tree.
Note: Uncheck the checkbox "Link to the network" if visualization of the GO hierarchy is not needed; this improves the system's performance.
Find the Node of GO Term in the Network
When clicking a term in the GO Explorer, the corresponding metanode in the network will be selected if it exists in the current network. In the following figure, node GO:0016791 is selected when tree node phosphatase activity is selected.
Creating Multi-Scale Networks
A term shown in the GO Explorer can be dragged into the network panel in VisANT to create a metanode that contains the genes annotated directly under it and/or under all its descendants. The above figure shows the conversion of the y2h protein-protein interaction network of Saccharomyces cerevisiae into a network of modules defined by GO terms. Other options for the drag&drop operation, such as creating a metanode containing all genes under the branch, are available in the configuration panel. If the genes are already in the network, the operation groups them into metanodes unless they are already in another metanode; in that case, a duplicate node is created and grouped into the new metanode. The options of the drag&drop operation, in the case of an empty network, are illustrated below:
Once a metanode is created, users can double-click one of its child nodes (such as GO:0019385) to expand it. This multi-scale visualization schema classifies groups of genes and/or terms as biological network modules, thereby reducing a large network to a manageable size and greatly facilitating the analysis of gene-to-gene, term-to-term, and gene-to-term relationships. The schema brings related genes and terms together in one place, facilitating the study of related biological modules using the default aggregation functions of the metagraph to infer the term associations from the network of genes or their products.
Filter the Network Based on GO Annotation
Similar to the procedure for creating a multi-scale network using GO terms, the network can also be filtered by the specific functions defined by GO annotations, using the following steps:
1. Load the network.
2. Find the specific functions in the GO hierarchy using either key words or GO IDs. Assume we want to search for "induction of apoptosis by intracellular signals", as shown below:
3. Make sure that the option for the drag&drop operation is "Metanode of existing genes only", as shown in the above figure.
4. Drag&drop the GO term (GO:0008629 in this example) into the network. A metanode of the GO term will be created, with the genes annotated under the corresponding branch embedded in it, as shown in the above figure.
5. Double-click the metanode to collapse it.
6. Repeat steps 2-5 if additional functions need to be filtered.
7. Select all metanodes using the menu Edit/Select Metanodes/All.
8. Invert the node selection using the menu Edit/Invert Node Selection.
9. Delete the selected nodes using the menu Edit/Delete Selected Nodes.
10. Now only the metanode of the GO term is left in the network. Double-click the metanode to expand it, as shown in the following figure:
11. Ungroup the metanode using the menu MetaGraph/Grouping/Ungroup Selected Nodes. The remaining network is the subnetwork of the original one that is involved in the specific functions.
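The net effect of the filtering steps can be sketched as a set operation: keep the existing nodes annotated under the chosen term, and keep only the edges among them. The node names, edge list, and the function filter_network below are hypothetical; VisANT performs the equivalent through its menus rather than through code.

```python
def filter_network(nodes, edges, annotated_genes):
    """Keep only existing nodes annotated under the chosen GO branch.

    Mirrors the "Metanode of existing genes only" option: genes annotated
    under the term but absent from the network are NOT added.
    """
    keep = set(nodes) & set(annotated_genes)
    kept_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return keep, kept_edges


# Hypothetical toy network and annotation set.
NODES = ["g1", "g2", "g3", "g4"]
EDGES = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
APOPTOSIS_GENES = {"g2", "g3", "g9"}  # g9 is not in the network

sub_nodes, sub_edges = filter_network(NODES, EDGES, APOPTOSIS_GENES)
print(sorted(sub_nodes), sub_edges)
```

Filtering by several functions corresponds to passing the union of their annotated gene sets.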
Annotate the Gene Functions Using Flexible Schema
VisANT provides four basic options to annotate genes using GO annotations. Options 1-3 listed below can also be applied to selected branches only. These options give users great flexibility to test various hypotheses. We use the human cell cycle tumor suppressor gene phosphatase and tensin homolog (PTEN) to illustrate these options:
Note: The GO annotation menus under the MetaGraph menu annotate ALL the genes, including those hidden in collapsed metanodes. Use the Nodes menu for selective annotation.
1. Using Most Specific GO Terms: Genes are annotated with the most specific functional descriptions available in the Entrez Gene database. The table below lists the GO annotation of pten with this option:
lipid transporter activity(GO:0005319)[IEA]
cholesterol transporter activity(GO:0017127)[IEA]
protein homodimerization activity(GO:0042803)[IDA]
metal chelating activity(GO:0046911)[IDA]
protein heterodimerization activity(GO:0046982)[IPI]
tau protein binding(GO:0048156)[IPI]
apolipoprotein E receptor binding(GO:0050749)[IDA,IPI]
response to reactive oxygen species(GO:0000302)[NAS]
negative regulation of endothelial cell proliferation(GO:0001937)[IDA]
triacylglycerol metabolic process(GO:0006641)[IDA,IMP]
cholesterol catabolic process(GO:0006707)[IEA]
cellular calcium ion homeostasis(GO:0006874)[IEA]
induction of apoptosis(GO:0006917)[IDA]
response to oxidative stress(GO:0006979)[IEA]
G-protein coupled receptor protein signaling pathway(GO:0007186)[IDA]
nitric oxide mediated signal transduction(GO:0007263)[IDA]
synaptic transmission, cholinergic(GO:0007271)[TAS]
negative regulation of platelet activation(GO:0010544)[IDA]
regulation of axon extension(GO:0030516)[TAS]
positive regulation of cGMP biosynthetic process(GO:0030828)[IDA]
Cdc42 protein signal transduction(GO:0032488)[IDA]
positive regulation of low-density lipoprotein receptor catabolic process(GO:0032805)[IDA]
chylomicron remnant clearance(GO:0034382)[IMP]
very-low-density lipoprotein particle clearance(GO:0034447)[IMP]
lipoprotein biosynthetic process(GO:0042158)[IEA]
negative regulation of MAP kinase activity(GO:0043407)[IDA]
negative regulation of blood vessel endothelial cell migration(GO:0043537)[IDA]
reverse cholesterol transport(GO:0043691)[IDA]
regulation of neuronal synaptic plasticity(GO:0048168)[TAS]
negative regulation of inflammatory response(GO:0050728)[IC]
positive regulation of nitric-oxide synthase activity(GO:0051000)[IDA]
positive regulation of membrane protein ectodomain proteolysis(GO:0051044)[IDA]
plasma membrane(GO:0005886)[EXP]
very-low-density lipoprotein particle(GO:0034361)[IDA]
low-density lipoprotein particle(GO:0034362)[IDA]
intermediate-density lipoprotein particle(GO:0034363)[IDA]
high-density lipoprotein particle(GO:0034364)[IDA]
2. Using Informative GO Terms: Genes are annotated using GO terms that (i) have more than a user-specified number of genes and (ii) each of whose descendant terms has fewer than the specified number of genes. Let's use 145 as the cutoff (click the button near the Search button of the GO Explorer, enter 145 in the corresponding field, and press the Enter key). The informative GO annotations for PTEN are shown below:
protein homodimerization activity (GO:0042803)
|negative regulation of
steroid metabolic process(GO:0008202)
lipid catabolic process(GO:0016042)
cellular di-, tri-valent inorganic cation homeostasis(GO:0030005)
induction of apoptosis(GO:0006917)
G-protein coupled receptor protein signaling pathway(GO:0007186)
regulation of multicellular organismal process(GO:0051239)
regulation of response to stimulus(GO:0048583)
regulation of cell differentiation(GO:0045595)
regulation of cellular component organization(GO:0051128)
nucleotide metabolic process(GO:0009117)
regulation of cellular protein metabolic process(GO:0032268)
negative regulation of catalytic activity(GO:0043086)
regulation of protein kinase activity(GO:0045859)
3. Using GO Terms with Genes under the Branch > cutoff: A term must have more than a user-specified number of genes. Let's use 300 as the cutoff; here are the results:
|phosphoric ester hydrolase
|regulation of cell
regulation of catalytic activity(GO:0050790)
anatomical structure morphogenesis(GO:0009653)
phosphate metabolic process(GO:0006796)
post-translational protein modification(GO:0043687)
lipid metabolic process(GO:0006629)
positive regulation of cellular process(GO:0048522)
positive regulation of developmental process(GO:0051094)
programmed cell death(GO:0012501)
regulation of programmed cell death(GO:0043067)
regulation of apoptosis(GO:0042981)
nervous system development(GO:0007399)
negative regulation of cellular process(GO:0048523)
regulation of cell proliferation(GO:0042127)
regulation of cellular process(GO:0050794)
regulation of biological process(GO:0050789)
cellular protein metabolic process(GO:0044267)
regulation of gene expression(GO:0010468)
negative regulation of developmental process(GO:0051093)
cellular alcohol metabolic process(GO:0006066)
carbohydrate metabolic process(GO:0005975)
cellular lipid metabolic process(GO:0044255)
macromolecule metabolic process(GO:0043170)
regulation of signal transduction(GO:0009966)
protein kinase cascade(GO:0007243)
4. Using Selected GO Terms Only: Genes are annotated using only the selected GO terms. The following figure shows the selected terms and the resulting annotation for PTEN:
Options 2 and 3 are frequently used when predicting gene functions using functional linkages. Annotations resulting from different options can coexist as node descriptions in VisANT for comparison purposes.
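The two cutoff-based options can be sketched as follows, using the definitions given above. This is a sketch under stated assumptions: the toy term IDs, gene counts, and both function names are hypothetical, the counts stand for the number of genes under each branch, and "descendant" is taken to mean every term below the candidate, not only its direct children.

```python
# Hypothetical toy ontology: parent term -> list of child terms.
CHILDREN = {
    "GO:ROOT": ["GO:A", "GO:D"],
    "GO:A": ["GO:B"],
    "GO:B": ["GO:C"],
}

# Hypothetical number of genes annotated under each branch.
GENE_COUNT = {"GO:ROOT": 1000, "GO:A": 400, "GO:B": 200, "GO:C": 100, "GO:D": 150}


def descendants(term, children):
    """Collect every term reachable below `term`."""
    found, stack = set(), list(children.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def informative_terms(children, gene_count, cutoff):
    """Terms with more than `cutoff` genes whose descendants all have fewer."""
    return sorted(
        term
        for term, count in gene_count.items()
        if count > cutoff
        and all(gene_count.get(d, 0) < cutoff for d in descendants(term, children))
    )


def terms_above_cutoff(gene_count, cutoff):
    """Terms whose branch holds more than `cutoff` genes."""
    return sorted(term for term, count in gene_count.items() if count > cutoff)


print(informative_terms(CHILDREN, GENE_COUNT, 145))
print(terms_above_cutoff(GENE_COUNT, 300))
```

In this toy example the informative terms sit just above the cutoff boundary, while the plain cutoff option also returns the broad terms near the root.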
Predict the Functions of Network Modules
VisANT predicts the functions of network modules based on the functions of their components provided by the Gene Ontology (GO) database. From this perspective, VisANT requires two inputs to predict the function of a module:
• The genes in the metanode must be GO annotated before the analysis. VisANT provides flexible annotation functions to annotate gene functions at different scopes and levels of detail for different types of networks. Please reference Annotate the Gene Functions Using Flexible Schema for more information.
Note: the predicted function of the network may depend on the annotation schema.
• The network must have metanodes. Users can easily create a metanode from selected nodes by pressing CTRL-G or through the menu MetaGraph-->Grouping-->Group Selected Nodes.
Note: By default, VisANT predicts the functions of all non-embedded metanodes if no metanode is selected; otherwise, it predicts functions for all selected metanodes. That is to say, VisANT by default does not predict functions for descendant metanodes unless they are specifically selected.
Note: You can always cancel the analysis by clicking the red button at the bottom-right corner of the status bar, as shown below:
Using the hyper-geometric test to validate the function of network modules: This function allows quick identification of the functions shared by the gene set in a network module. Network topology is not taken into account. The function can be activated through the menu: MetaGraph-->Predict Functions of Metanodes-->Detect Over-represented GO Terms Using Hypergeometric Test-->Start Hypergeometric Test over GO Database.
Note: Although this analysis does not require any configurable parameters, it does use the cutoff of the informative GO term to identify the shared functions of the enriched GO terms. It also uses a cutoff for the number of top terms, as shown below (red circles):
Note: to disable both parameters, simply enter a large number (say 1000) for the number of top terms, and enter 1 as the cutoff on informative terms.
Note: if the number of component genes for a given metanode is small (say, <15), the results may be biased.
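For readers unfamiliar with the hypergeometric test, a minimal sketch of the standard one-sided (over-representation) form is given below. The manual does not state the exact variant VisANT uses, so the choice of the upper tail, the function name, and the example numbers are assumptions.

```python
from math import comb


def hypergeom_pvalue(population, annotated, module_size, hits):
    """Probability of seeing at least `hits` annotated genes in the module.

    population  -- total number of genes considered (N)
    annotated   -- genes in the population annotated with the GO term (K)
    module_size -- number of genes in the module (n)
    hits        -- genes in the module annotated with the GO term (k)
    """
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes, 251 annotated with the term,
# a module of 30 genes of which 5 carry the annotation.
print(hypergeom_pvalue(20000, 251, 30, 5))
```

With small modules only a few values of `hits` are possible, so the p-value moves in large steps, which is one way to read the caution above about modules with fewer than about 15 genes.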
Using GO Term Enrichment Analysis (GOTEA) to predict the function of network modules: To take advantage of network information, a new algorithm has been developed and implemented in VisANT to find over-represented GO terms in user-specified network modules (i.e., metanodes in VisANT). The function is available under the MetaGraph menu. By default, the analysis is performed for all non-embedded metanodes unless specific metanodes are selected. Over-represented GO terms are shown as a quick tip when the mouse passes over the corresponding metanode. Again, GOTEA requires the genes in the modules to be annotated prior to the analysis.
As shown below, there are three options for GOTEA:
Fast GOTEA: This option scans only related GO terms, rather than the whole GO database, to test whether they are statistically enriched in the target modules. It first collects all GO terms annotated for the genes inside the modules, then collects all the terms on their paths to the root.
GOTEA over Selected GO Branches: This option scans only the GO terms under the branches selected in the GO Explorer.
GOTEA over GO Database: This option scans all GO terms that have genes associated with the current species in VisANT.
For a given target GO term, the algorithm first computes the density score of each node based on the path distance (number of links) to other nodes in the same module, and the similarity between its associated GO terms and the target term. The enrichment of target terms is determined by a permutation test over the subset of same number of genes extracted from all known genes annotated in the Entrez Gene database, with appropriate false discovery rate (FDR) cutoff. The advantage of the algorithm over similar algorithms is reflected in the computation of the density score, where the impact of one gene on another is a function of GO term similarity, and the number of links between the genes. GO term similarity is calculated using a fuzzy search rather than a conventional exact match (31). With such a density score, a gene having many neighbors with similar GO terms will have more significant contributions to the enrichment outcome; the algorithm therefore leverages network topology, as well as the GO hierarchy. In addition, metagraphs provide a flexible visual context to perform analysis for hierarchically organized network modules. Details of the algorithm can be found here.
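The permutation step described above can be sketched generically as below. This is not the GOTEA implementation: the density score is replaced by a stand-in that simply counts annotated genes and ignores network topology and term similarity, and all names and data are hypothetical. Only the idea of comparing the module against random gene sets of the same size is taken from the text.

```python
import random


def permutation_pvalue(module_genes, background_genes, score, iterations=2000, seed=0):
    """Empirical p-value of a module score against same-size random gene sets.

    `score` is any function mapping a list of genes to a number; in GOTEA it
    would be built from the density scores, here it is left to the caller.
    """
    rng = random.Random(seed)
    observed = score(module_genes)
    size = len(module_genes)
    at_least_as_high = 0
    for _ in range(iterations):
        sample = rng.sample(background_genes, size)
        if score(sample) >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_high + 1) / (iterations + 1)


# Hypothetical background of 100 genes, 5 of them annotated with the target term.
BACKGROUND = [f"g{i}" for i in range(100)]
ANNOTATED = set(BACKGROUND[:5])


def count_annotated(genes):
    """Stand-in score: number of genes annotated with the target term."""
    return sum(gene in ANNOTATED for gene in genes)


p_enriched = permutation_pvalue(BACKGROUND[:5], BACKGROUND, count_annotated)
p_null = permutation_pvalue(BACKGROUND[50:55], BACKGROUND, count_annotated)
print(p_enriched, p_null)
```

The smallest p-value this procedure can report is 1/(iterations + 1), so more iterations give finer resolution for the subsequent FDR cutoff.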
When completed, the predicted functions for the specified cutoffs are added to the description of the corresponding metanode and can be edited through the Node Properties menu (or by the key combination CTRL-SHIFT-N). In addition, an HTML report of the detailed results, including the p-value and FDR score of all enriched GO terms, is generated and opened as shown below:
The report first lists the informative terms of the enriched GO terms; the integer shown at the right is the number of enriched GO terms that lead to each informative GO term. For example, the number 62 for the informative term "cell cycle" in the above figure indicates that there are 62 enriched GO terms whose informative term is "cell cycle". After the informative terms, the report lists all enriched terms whose FDR is less than the cutoff, together with their p-values and FDRs.
GOTEA uses the following parameters:
alpha: controls the impact of GO term similarity. A bigger alpha means less impact of term similarity.
beta: controls the impact of network topology. A bigger beta means less impact of the topology.
number of iterations: the number of permutation tests. 2000 iterations are usually fine for quick tests; 20000 iterations are recommended to make sure the results are statistically significant. When the number of component genes in a given metanode is small (say, less than 15), the results may be biased and a bigger number of iterations is suggested.
FDR cutoff: the false discovery rate cutoff, a floating-point value used to determine whether a GO term is enriched. All statistically significant results have an FDR less than the cutoff.
number of top terms: the number of top-ranked informative GO terms to report. The informative GO terms are ranked by the number of enriched terms that lead to them, as shown in the above figure of the report. A big number indicates that many enriched GO terms share the informative GO term.
cutoff of informative GO terms: please reference here for more information.
Try out the step-by-step macros of the above example to predict the function of the KEGG cell cycle pathway using both the hyper-geometric test and GOTEA. The macro file can be downloaded here. The macro first loads the KEGG cell cycle pathway and annotates all genes in the pathway. It then uses both the hyper-geometric test and fast GOTEA to predict the function of the pathway; each step is explained in detail in the comments. In this example, both methods should be able to recover the function as a cell cycle pathway.
Note: you may use an appropriate annotation schema to annotate the genes in order to reduce the number of terms that need to be scanned, and therefore speed up GOTEA.
Detect Over-Expressed (Enriched) Network Modules using NMEA
Network module enrichment analysis (NMEA) is designed to find functional modules that inform phenotypic differences. In the current release, such differences are usually in transcriptional activity. That is to say, NMEA tries to find the network modules that behave differently in disease and normal samples. NMEA requires two inputs:
Network modules: In VisANT, modules are represented by metanodes. NMEA is performed for all non-embedded metanodes; i.e. it is not performed for descendant metanodes unless they are specifically selected.
Expression data of samples and controls, e.g. expression data from disease samples and normal samples. To distinguish the samples from the controls, the first non-comment line of the expression data file must indicate which columns are normal (the control); the keyword normal is case-insensitive.
MUT MUT MUT MUT MUT ... MUT Normal Normal Normal Normal Normal ....
The first line "#!Expression addNewNode=false" is required for VisANT to recognize the data type. The parameter addNewNode=false, which is optional, tells VisANT to discard genes that are not present in the current network. A sample expression data file with 22 p53 mutation samples and 17 wild-type samples can be downloaded here. More information regarding expression data and its visualization in VisANT can be found here.
Note: VisANT looks for the keyword normal to identify the controls. We use MUT to indicate the mutated samples, but any label will do because VisANT treats all non-normal columns as (disease) samples.
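How the label line separates controls from samples can be sketched as below. This is a hypothetical reader, not VisANT's parser: it assumes comment lines start with "#", that labels are separated by whitespace, and that the label line carries exactly one label per data column. The function name split_columns is made up for illustration.

```python
def split_columns(lines):
    """Return (sample_columns, control_columns) from the first non-comment line.

    A column is a control when its label equals "normal", ignoring case;
    every other label is treated as a (disease) sample.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment/header lines (assumption)
        labels = stripped.split()
        controls = [i for i, label in enumerate(labels) if label.lower() == "normal"]
        samples = [i for i, label in enumerate(labels) if label.lower() != "normal"]
        return samples, controls
    raise ValueError("no label line found")


# Hypothetical file content following the description above.
EXAMPLE = [
    "#!Expression addNewNode=false",
    "MUT MUT MUT Normal NORMAL normal",
]

sample_cols, control_cols = split_columns(EXAMPLE)
print(sample_cols, control_cols)
```

Because only the keyword normal is significant, replacing MUT with any other label gives the same split.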
When NMEA completes, the nodes in the modules are colored according to their density scores. NMEA uses the same color map as expression data (shown below); users can customize the color map by clicking it:
Note: The default values for alpha and beta are suggested. 2000 iterations are usually fine for quick tests; 20000 iterations are recommended to make sure the results are statistically significant.
Note: when the number of component genes in a given metanode is small (say, less than 15), the results may be biased and a bigger number of iterations is suggested.
Note: the details of the NMEA algorithm can be found here.
Note: The color of the contribution is relative to the genes inside the targeted module and is not comparable between modules.
In addition, a detailed report is generated and loaded into the browser. The density score of each gene is listed; a bigger number indicates a larger contribution (darker green in the above figure) to the overall enrichment of the corresponding network module:
The example shown in the above figure can be tried out using the step-by-step macros: NMEA for cell cycle pathway in P53 mutant data, if VisANT is run as an applet. Otherwise, the macro file can be downloaded here and then opened in VisANT to run the macros. Each step is explained in detail in the comments; more information about the macros can be found here.