goscripts.gaf_parser module
@author: Pieter Moris
-
goscripts.gaf_parser.cleanGafTerms(gafDict, filteredGOdict) Removes GO terms that do not belong to the chosen namespace from the gaf dictionary. Also removes a gene entirely if none of its associated terms belong to the namespace.
Parameters: - gafDict (dict of str mapping to set) – A dictionary that maps gene Uniprot ACs (str) to a set of GO term IDs. Generated by importGAF().
- filteredGOdict (dict) – A filtered dictionary of GO objects all belonging to the same namespace. Generated by obo_tools.importOBO() followed by obo_tools.filterOnNamespace().
Returns: filteredGafDict – The gaf dictionary after removal of GO terms belonging to other namespaces.
Return type: dict of str mapping to set
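The filtering described for cleanGafTerms() can be illustrated with a minimal sketch. It is not the module's actual implementation: the function name clean_gaf_terms_sketch is hypothetical, the accessions and GO IDs are toy data, and the sketch assumes that filteredGOdict is keyed by GO ID, which the documentation above does not state explicitly.

```python
def clean_gaf_terms_sketch(gaf_dict, filtered_go_dict):
    """Keep only GO IDs found in filtered_go_dict; drop genes left without terms.

    Assumption: filtered_go_dict is keyed by GO ID (e.g. "GO:0008150").
    """
    filtered_gaf_dict = {}
    for uniprot_ac, go_ids in gaf_dict.items():
        # Retain only the terms that belong to the chosen namespace.
        kept = {go_id for go_id in go_ids if go_id in filtered_go_dict}
        # A gene with no remaining terms is removed entirely.
        if kept:
            filtered_gaf_dict[uniprot_ac] = kept
    return filtered_gaf_dict


# Toy input: the values of filtered_go stand in for GO objects.
gaf = {
    "P12345": {"GO:0008150", "GO:0003674"},
    "Q67890": {"GO:0005575"},
}
filtered_go = {"GO:0008150": object()}

cleaned = clean_gaf_terms_sketch(gaf, filtered_go)
print(cleaned)  # {'P12345': {'GO:0008150'}}
```

The sketch builds a new dictionary rather than modifying gafDict in place; whether the real function mutates its argument is not specified here.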
-
goscripts.gaf_parser.createSubsetGafDict(subset, gafDict) Generates a dictionary that maps the subset’s Uniprot ACs to their GO IDs, based on the provided gene subset and the gaf dictionary.
Parameters: - subset (set of str) – A subset of Uniprot ACs of interest.
- gafDict (dict of str mapping to set) – A dictionary that maps Uniprot ACs (str) to a set of GO IDs. Generated by importGAF().
Returns: A dictionary that maps the subset’s Uniprot ACs to GO IDs.
Return type: dict of str mapping to set
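As a rough illustration of the subsetting described for createSubsetGafDict(), the sketch below restricts a gaf dictionary to a set of accessions. The name create_subset_gaf_dict_sketch and the toy data are hypothetical, and skipping accessions that are absent from gafDict is an assumption; the real function may handle that case differently.

```python
def create_subset_gaf_dict_sketch(subset, gaf_dict):
    """Map each Uniprot AC in subset to its set of GO IDs.

    Assumption: accessions missing from gaf_dict are silently skipped.
    """
    return {
        uniprot_ac: gaf_dict[uniprot_ac]
        for uniprot_ac in subset
        if uniprot_ac in gaf_dict
    }


# Toy input data.
gaf = {
    "P12345": {"GO:0008150"},
    "Q67890": {"GO:0005575", "GO:0003674"},
    "O11111": {"GO:0003674"},
}
genes_of_interest = {"Q67890", "X00000"}  # "X00000" has no annotations

subset_gaf = create_subset_gaf_dict_sketch(genes_of_interest, gaf)
```

With this toy input, subset_gaf contains only the entry for "Q67890", because "X00000" does not occur in the gaf dictionary.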
-
goscripts.gaf_parser.importGAF(path, geneSet) Imports a GAF (gene association format) file and generates a dictionary mapping each gene's Uniprot AC to its GO IDs. If a (background) gene set is provided, only genes present in that set are imported.
Information on the GAF 2.1 format can be found at http://geneontology.org/page/go-annotation-file-gaf-format-21
Parameters: - path (str) – The path to the file.
- geneSet (set) – A set containing the Uniprot ACs of all the genes under consideration (background).
Returns: dict of str mapping to set – A dictionary that maps Uniprot ACs (str) to a set of GO IDs.
Possible improvements: - Check for is_obsolete and replaced_by, although the replacement term should be in the OBO file as an entry.
- Check for inclusion in the provided gene set afterwards using:
gafDict = {key: value for key, value in gafDict.items() if key in geneSet}
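The import step can be sketched as follows. This is an illustration under stated assumptions, not the module's code: import_gaf_sketch is a hypothetical name, the gene set is made optional with a default of None, and the parser relies only on the GAF 2.1 conventions that comment lines start with "!", that columns are tab-separated, and that the second and fifth columns hold the DB Object ID and the GO ID. The toy rows are truncated to five columns, whereas a real GAF 2.1 file has 17.

```python
import os
import tempfile
from collections import defaultdict


def import_gaf_sketch(path, gene_set=None):
    """Parse a GAF file into a dict mapping Uniprot ACs to sets of GO IDs.

    Assumption: gene_set=None means "import every gene".
    """
    gaf_dict = defaultdict(set)
    with open(path) as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("!") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            uniprot_ac, go_id = columns[1], columns[4]
            # Filter on the background gene set while reading.
            if gene_set is None or uniprot_ac in gene_set:
                gaf_dict[uniprot_ac].add(go_id)
    return dict(gaf_dict)


# Toy file with made-up, truncated rows (first five columns only).
toy_rows = [
    "!gaf-version: 2.1",
    "UniProtKB\tP12345\tGENE1\t\tGO:0008150",
    "UniProtKB\tP12345\tGENE1\t\tGO:0003674",
    "UniProtKB\tQ67890\tGENE2\t\tGO:0005575",
]
with tempfile.NamedTemporaryFile("w", suffix=".gaf", delete=False) as tmp:
    tmp.write("\n".join(toy_rows) + "\n")
    toy_path = tmp.name

background_only = import_gaf_sketch(toy_path, gene_set={"P12345"})
all_genes = import_gaf_sketch(toy_path)
os.remove(toy_path)
```

Filtering while reading avoids storing annotations for genes outside the background set; the dictionary comprehension listed under possible improvements would instead apply the same filter after the whole file has been imported.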