goscripts.gaf_parser module
@author: Pieter Moris
-
goscripts.gaf_parser.cleanGafTerms(gafDict, filteredGOdict)
Removes GO terms that do not belong to the chosen namespace from the gaf dictionary. Also removes genes entirely if none of their associated terms belong to the namespace.
Parameters: - gafDict (dict of str mapping to set) – A dictionary that maps gene Uniprot ACs (str) to a set of GO term IDs. Generated by importGAF().
- filteredGOdict – A filtered dictionary of GO objects all belonging to the same namespace. Generated by obo_tools.importOBO() followed by obo_tools.filterOnNamespace().
Returns: The gaf dictionary after removal of GO terms belonging to different namespaces.
Return type: filteredGafDict
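The filtering described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name clean_gaf_terms_sketch is hypothetical, and it assumes that filteredGOdict is keyed by GO ID, which the description above does not state explicitly.

```python
def clean_gaf_terms_sketch(gaf_dict, filtered_go_dict):
    """Keep only GO IDs found in filtered_go_dict; drop genes left with none."""
    filtered_gaf_dict = {}
    for gene, go_ids in gaf_dict.items():
        # Assumption: filtered_go_dict is keyed by GO ID.
        kept = {go_id for go_id in go_ids if go_id in filtered_go_dict}
        if kept:  # genes without any remaining term are removed entirely
            filtered_gaf_dict[gene] = kept
    return filtered_gaf_dict


# Made-up example data; the GO objects are stand-ins.
gaf = {"P12345": {"GO:0008150", "GO:0003674"}, "Q67890": {"GO:0003674"}}
go_objects = {"GO:0008150": object()}
cleaned = clean_gaf_terms_sketch(gaf, go_objects)
```

Here Q67890 disappears from the result because its only term lies outside the filtered namespace, while P12345 keeps the one term that remains.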
-
goscripts.gaf_parser.createSubsetGafDict(subset, gafDict)
Generates a dictionary that maps the subset’s Uniprot ACs to their GO IDs, based on the provided gene subset and the gaf dictionary.
Parameters: - subset (set of str) – A subset of Uniprot ACs of interest.
- gafDict (dict of str mapping to set) – A dictionary that maps Uniprot ACs (str) to a set of GO IDs. Generated by importGAF().
Returns: A dictionary that maps the subset’s Uniprot ACs to GO IDs.
Return type: dict of str mapping to set
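A minimal sketch of this subsetting step, under stated assumptions: the function name create_subset_gaf_dict_sketch is hypothetical, and ACs in the subset that have no entry in the gaf dictionary are silently skipped here. The actual function may handle missing ACs differently.

```python
def create_subset_gaf_dict_sketch(subset, gaf_dict):
    """Restrict gaf_dict to the Uniprot ACs listed in subset."""
    # Assumption: ACs without annotations in gaf_dict are skipped.
    return {ac: gaf_dict[ac] for ac in subset if ac in gaf_dict}


# Made-up example data.
gaf = {
    "P12345": {"GO:0008150"},
    "Q67890": {"GO:0003674", "GO:0005575"},
}
subset_dict = create_subset_gaf_dict_sketch({"Q67890", "A00001"}, gaf)
```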
-
goscripts.gaf_parser.importGAF(path, geneSet)
Imports a GAF (Gene Association Format) file and generates a dictionary mapping each gene’s Uniprot AC to its GO IDs. If a (background) gene set is provided, only genes present in that set are imported.
Information on the GAF 2.1 format can be found at http://geneontology.org/page/go-annotation-file-gaf-format-21
Parameters: - path (str) – The path to the file.
- geneSet (set) – A set containing the Uniprot ACs of all the genes under consideration (background).
Returns: dict of str mapping to set – A dictionary that maps Uniprot ACs (str) to a set of GO IDs.
Possible improvements:
- Check for is_obsolete and replaced_by, although the replacement term should be present in the OBO file as an entry.
- Check for inclusion in the provided gene set afterwards using:
gafDict = {key: value for key, value in gafDict.items() if key in geneSet}
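The import step can be sketched as below. This is an illustration under assumptions rather than the module's code: the helper parse_gaf_lines_sketch is hypothetical and reads from any iterable of lines instead of a path so that the example is self-contained. It relies on the GAF 2.1 layout, in which lines starting with "!" are comments, fields are tab-separated, column 2 holds the DB Object ID (the Uniprot AC for UniProtKB annotations) and column 5 holds the GO ID.

```python
import io


def parse_gaf_lines_sketch(lines, gene_set=None):
    """Map each DB Object ID (column 2) to the set of its GO IDs (column 5)."""
    gaf_dict = {}
    for line in lines:
        # GAF comment and header lines start with "!"; also skip blank lines.
        if line.startswith("!") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        uniprot_ac, go_id = columns[1], columns[4]
        # Apply the background filter only when a gene set was supplied.
        if gene_set is not None and uniprot_ac not in gene_set:
            continue
        gaf_dict.setdefault(uniprot_ac, set()).add(go_id)
    return gaf_dict


# Made-up annotation rows, truncated to the first five GAF columns for brevity.
rows = [
    ["UniProtKB", "P12345", "GENE1", "", "GO:0008150"],
    ["UniProtKB", "P12345", "GENE1", "", "GO:0003674"],
    ["UniProtKB", "Q67890", "GENE2", "", "GO:0005575"],
]
example = io.StringIO(
    "!gaf-version: 2.1\n" + "".join("\t".join(row) + "\n" for row in rows)
)
parsed = parse_gaf_lines_sketch(example, gene_set={"P12345"})
```

With a real file, the same helper could be given an open file handle. Filtering while reading keeps the dictionary small for large annotation files, whereas the dictionary comprehension listed under the possible improvements would apply the same filter after the full file has been loaded.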