goscripts.obo_tools module
@author: Pieter Moris
-
goscripts.obo_tools.assign_depth(node, GOdict, depth=0)
Recursive function to assign depth to node and all of its children.
Starting from the root node of a directed acyclic graph (i.e. the gene ontology hierarchy), and provided that every node stores its child nodes as an attribute, the depth of each node is set. Note: when a node can be reached along more than one path, the minimal depth is assigned. The depth of the root node is set to zero.
Parameters: - node (goTerm object) – The GO node that is the starting point for the recursive assignment, normally one of the root nodes of the three GO ontologies.
- GOdict (dict) – A dictionary of GO objects generated by importOBO(). Keys are of the format GO-0000001 and map to goTerm objects.
- depth (int) – The current depth of the recursion, assigned to each node as its depth in the hierarchy.
Returns: Modifies the goTerm objects’ depth attribute in place.
Return type: None
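The minimal-depth rule described above can be illustrated with a small, self-contained sketch. It is not the module's implementation: the Term class is a hypothetical stand-in for goTerm that carries only the attributes needed here, and assign_depth_sketch is an illustrative name.

```python
class Term:
    """Hypothetical stand-in for goTerm with only the attributes used here."""

    def __init__(self, term_id):
        self.id = term_id
        self.children = set()  # IDs of immediate children
        self.depth = None      # unset until first visited


def assign_depth_sketch(node, go_dict, depth=0):
    """Assign the smallest depth at which each node is reachable from the start node."""
    if node.depth is not None and node.depth <= depth:
        # Already reached along an equal or shorter path: nothing to update below.
        return
    node.depth = depth
    for child_id in node.children:
        assign_depth_sketch(go_dict[child_id], go_dict, depth + 1)


# Tiny DAG: root -> a -> b, plus a direct edge root -> b.
go_dict = {name: Term(name) for name in ("root", "a", "b")}
go_dict["root"].children = {"a", "b"}
go_dict["a"].children = {"b"}

assign_depth_sketch(go_dict["root"], go_dict)
depths = {name: term.depth for name, term in go_dict.items()}
print(depths)
```

Node b is reachable at depth 2 (via a) and at depth 1 (directly from the root); the sketch keeps the smaller value regardless of the order in which the children are visited.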
-
goscripts.obo_tools.buildGOtree(GOdict, root_nodes)
Generates the entire GO tree’s parent structure by walking through the hierarchy of each GO entry.
Performs four main functions: A) assigns all higher-order ancestors (recursive parents) to each GO object; B) assigns immediate children; C) assigns all recursively found children; D) assigns a depth to each node.
Parameters: - GOdict (dict) – A dictionary of GO objects generated by importOBO(). Keys are of the format GO-0000001 and map to goTerm objects.
- root_nodes (list) – A list of nodes that lie at the root of the gene ontologies. These form the starting point for the recursive function that walks through the directed acyclic graph and assigns a depth to each GO term/node.
Returns: The input GO dictionary will be updated in place. Each term object’s parent attributes now trace back over the full tree hierarchy.
Return type: None
-
goscripts.obo_tools.completeChildHierarchy(GOdict)
Generates the entire GO tree’s child structure.
By iterating over the parents of each GO object, each term’s recursive_children attribute will be filled with a set of all recursively found child terms.
NOTE: completeParentsHierarchy() must be run prior to this function!
Parameters: GOdict (dict) – A dictionary of GO objects generated by importOBO(). Keys are of the format GO-0000001 and map to goTerm objects. The recursive_parents attribute of the GO objects must be complete.
Returns: Updates the provided GO dictionary in place so that for each GO term object the “children” attribute points to the immediate children of the GO term and the “recursive_children” attribute traces back over the full GO hierarchy.
Return type: None
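Deriving children from parents amounts to inverting the parent relation. The sketch below shows that idea on plain dictionaries of ID sets instead of goTerm objects; the function name and the dictionary layout are assumptions made for illustration, not the module's code.

```python
def complete_children_sketch(parents, recursive_parents):
    """Invert parent relations into child relations.

    parents: term ID -> set of immediate parent IDs
    recursive_parents: term ID -> set of all ancestor IDs (must already be complete)
    """
    children = {term_id: set() for term_id in parents}
    recursive_children = {term_id: set() for term_id in parents}

    # Every immediate parent gains the term as an immediate child.
    for term_id, direct_parents in parents.items():
        for parent_id in direct_parents:
            children[parent_id].add(term_id)

    # Every ancestor gains the term as a recursive child.
    for term_id, ancestors in recursive_parents.items():
        for ancestor_id in ancestors:
            recursive_children[ancestor_id].add(term_id)

    return children, recursive_children


# Chain root <- a <- b, written as child -> parents.
parents = {"root": set(), "a": {"root"}, "b": {"a"}}
recursive_parents = {"root": set(), "a": {"root"}, "b": {"a", "root"}}

children, recursive_children = complete_children_sketch(parents, recursive_parents)
print(children)
print(recursive_children)
```

The second loop is why the ancestor sets have to be complete beforehand: a term only appears among an ancestor's recursive children if that ancestor is already listed among the term's recursive parents.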
-
goscripts.obo_tools.filterOnNamespace(GOdict, namespace)
Reduces the dictionary of goTerm objects to those belonging to a specific namespace.
Parameters: - GOdict (dict) – A dictionary of GO objects generated by importOBO(). Keys are of the format GO-0000001 and map to goTerm objects.
- namespace (str) – The namespace to restrict the GO dictionary and enrichment test to. E.g. biological_process, cellular_component or molecular_function.
Returns: A filtered dictionary of GO objects all belonging to the namespace.
Return type: dict
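A namespace filter of this kind can be written as a single dictionary comprehension. The sketch below is an assumption about the general approach, not the module's code; SimpleNamespace objects stand in for goTerm instances, and the IDs are written in the standard GO notation rather than the key format quoted above.

```python
from types import SimpleNamespace


def filter_on_namespace_sketch(go_dict, namespace):
    """Return a new dict holding only the terms whose namespace matches."""
    return {
        go_id: term
        for go_id, term in go_dict.items()
        if term.namespace == namespace
    }


# Stand-ins for goTerm objects; only the namespace attribute matters here.
go_dict = {
    "GO:0008150": SimpleNamespace(name="biological_process", namespace="biological_process"),
    "GO:0005575": SimpleNamespace(name="cellular_component", namespace="cellular_component"),
    "GO:0003674": SimpleNamespace(name="molecular_function", namespace="molecular_function"),
}

filtered = filter_on_namespace_sketch(go_dict, "cellular_component")
print(sorted(filtered))
```

The comprehension builds a new dictionary and leaves the input untouched, which matches the documented return type of dict.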
-
class goscripts.obo_tools.goTerm(GOid)
Bases: object
GO term object.
Stores the ID, name and domain of the GO term and contains sets of child and parent term IDs.
Variables: - id (str) – The identifier of the GO term.
- alt_id (str) – Optional tag for an alternative id.
- name (str) – The GO term name.
- namespace (str) – The domain of the GO term (Cellular Component, Molecular Function or Biological Process).
- parents (set of str) – The parent terms of the GO term, as indicated by the is_a relationship.
- children (set of str) – The child terms of the GO term, derived from other GO terms after a complete OBO file is processed initially.
Notes on the choice of data structures:
https://stackoverflow.com/questions/1336791/dictionary-vs-object-which-is-more-efficient-and-why
https://stackoverflow.com/questions/3489071/in-python-when-to-use-a-dictionary-list-or-set
When you want to store some values which you’ll be iterating over, Python’s list constructs are slightly faster. However, if you’ll be storing (unique) values in order to check for their existence, then sets are significantly faster.
-
alt_id
-
children
-
depth
-
goCount = 0
-
id
-
name
-
namespace
-
parents
-
recursive_children
-
recursive_parents
-
goscripts.obo_tools.importOBO(path, ignore_part_of)
Imports an OBO file and generates a dictionary containing a goTerm object for each GO term.
Every entry that is referred to via either “is_a” or “relationship: part_of” is considered a parent of the referring entry.
Parameters: - path (str) – The path to the file.
- ignore_part_of (bool) – A boolean indicating whether or not the “part_of” relationship should be ignored.
Returns: dict of goTerm objects – Keys are of the format GO-0000001 and map to goTerm objects.
Possible improvements: Check for is_obsolete and replaced_by, although the replacement term should be in the OBO file as an entry.
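To make the parent rule concrete, here is a minimal sketch of an OBO reader that treats “is_a” targets, and optionally “relationship: part_of” targets, as parents. It is not the module's parser: it reads from a list of lines rather than a path, stores plain dictionaries instead of goTerm objects, and keeps IDs exactly as they appear in the input. The sample stanzas are hand-written for illustration.

```python
def parse_obo_sketch(lines, ignore_part_of=False):
    """Collect name, namespace and parent IDs for every [Term] stanza."""
    terms = {}
    current = None  # record of the [Term] stanza being read, if any
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"parents": set()} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = line.split(": ", 1)
        value = value.split(" ! ")[0].strip()  # drop the trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            current["parents"].add(value)
        elif tag == "relationship" and not ignore_part_of:
            rel_type, _, target = value.partition(" ")
            if rel_type == "part_of":
                current["parents"].add(target)
    return terms


# Hand-written sample in OBO syntax, for illustration only.
sample = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000015
name: phosphopyruvate hydratase complex
namespace: cellular_component
is_a: GO:1902494 ! catalytic complex
relationship: part_of GO:0005829 ! cytosol

[Typedef]
id: part_of
name: part of
""".splitlines()

with_part_of = parse_obo_sketch(sample)
without_part_of = parse_obo_sketch(sample, ignore_part_of=True)
print(with_part_of["GO:0000015"]["parents"])
print(without_part_of["GO:0000015"]["parents"])
```

With ignore_part_of left at False, the second term has two parents; setting it to True keeps only the “is_a” parent.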
-
goscripts.obo_tools.propagateParents(currentTerm, baseGOid, GOdict, parentSet)
Propagates through the parent hierarchy of a provided GO term to create a set of all higher order parents.
Each term’s recursive_parents attribute will be filled with all recursively found parent terms.
Parameters: - currentTerm (str) – The GO id that is being visited.
- baseGOid (str) – The original GO term id for which the search for its parents was started.
- GOdict (dict) – A dictionary of GO objects generated by importOBO(). Keys are of the format GO-0000001 and map to goTerm objects.
- parentSet (set) – An initially empty set that gets passed through the recursion. It tracks the entire recursive group of parent terms of the original base GO id (i.e. the starting point of the function call).
Returns: Updates the parentSet set in place so that it contains all the (recursive) parents for the baseGOid.
Return type: None
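The recursion described above can be sketched as follows. This is a simplified illustration rather than the module's code: it works on a plain dictionary that maps each ID to its immediate parent IDs, it omits the baseGOid argument, and the function name is hypothetical.

```python
def propagate_parents_sketch(current_id, parents_of, parent_set):
    """Add every ancestor of current_id to parent_set (modified in place)."""
    for parent_id in parents_of.get(current_id, ()):
        if parent_id not in parent_set:
            parent_set.add(parent_id)
            # Continue upwards from the newly found parent.
            propagate_parents_sketch(parent_id, parents_of, parent_set)


# child -> immediate parents; "d" reaches "a" both directly and via "b".
parents_of = {
    "a": set(),
    "b": {"a"},
    "c": {"b"},
    "d": {"a", "b"},
}

ancestors_of_c = set()
propagate_parents_sketch("c", parents_of, ancestors_of_c)

ancestors_of_d = set()
propagate_parents_sketch("d", parents_of, ancestors_of_d)
print(ancestors_of_c, ancestors_of_d)
```

The membership check before recursing keeps shared ancestors from being walked more than once, which matters in a graph where a term can have several parents.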
-
goscripts.obo_tools.set_namespace_root(namespace)
Stores the GO ID for the root of the selected namespace.
Parameters: namespace (str) – A string containing the desired namespace. E.g. biological_process, cellular_component or molecular_function.
Returns: The list of GO IDs of the root terms of the selected namespace.
Return type: list
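A lookup of this kind can be sketched with a fixed mapping from namespace to root term. The three IDs below are the root terms of the Gene Ontology in its standard notation; the function name, the ID format and the error raised for an unknown namespace are assumptions, since the documentation above does not specify them.

```python
# Root terms of the three GO ontologies, in standard GO notation.
NAMESPACE_ROOTS = {
    "biological_process": ["GO:0008150"],
    "cellular_component": ["GO:0005575"],
    "molecular_function": ["GO:0003674"],
}


def namespace_root_sketch(namespace):
    """Return a list with the root GO ID(s) of the given namespace."""
    try:
        # Copy the list so callers cannot modify the shared mapping.
        return list(NAMESPACE_ROOTS[namespace])
    except KeyError:
        # Assumed behaviour: the documented function may handle this differently.
        raise ValueError(f"unknown namespace: {namespace!r}") from None


roots = namespace_root_sketch("molecular_function")
print(roots)
```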