goscripts.genelist_importer module
@author: Pieter Moris

goscripts.genelist_importer.importGeneList(path)
Imports the interest or background gene set (Uniprot ACs) from a file.
Parameters: path (str) – The path to the file.
Returns: set of str – A set of Uniprot ACs (interest or background set).
Notes: Gene lists should not contain a header. List one gene per line.
Possible improvements: Check the file structure and allow headers, comma-separated lists, etc.
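The file format described above can be read with a few lines of Python. The following is a minimal sketch, not the module's actual implementation: it assumes a plain-text file with one Uniprot AC per line and no header, and it additionally strips surrounding whitespace and skips blank lines (both assumptions). Duplicate entries collapse automatically because the result is a set.

```python
import os
import tempfile


def importGeneList(path):
    """Sketch: read one Uniprot AC per line from `path` and return a set.

    Assumes no header line; whitespace stripping and blank-line skipping
    are assumptions, not documented behaviour.
    """
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


# Usage: write a small gene list (with a blank line and a duplicate) to a
# temporary file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("P12345\nQ67890\n\nP12345\n")
genes = importGeneList(tmp.name)
os.remove(tmp.name)
print(sorted(genes))  # ['P12345', 'Q67890']
```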

goscripts.genelist_importer.isValidSubset(subset, background)
Checks whether the gene subset of interest contains genes that are not present in the background set, and removes any such genes.
Parameters:
- subset (set of str) – A subset of Uniprot ACs of interest.
- background (set of str) – A set of Uniprot ACs to be used as the background.
Returns: A cleaned subset of Uniprot ACs of interest, containing only genes that also occur in the background set.
Return type: set of str
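Removing genes that are absent from the background amounts to a set intersection. The sketch below assumes exactly that; the printed message is hypothetical, and the real function may report removed genes differently or not at all. The input set is left unmodified and a new set is returned.

```python
def isValidSubset(subset, background):
    """Sketch: return the genes of `subset` that also occur in `background`.

    Genes missing from the background are printed (hypothetical message
    format) and dropped from the returned set; `subset` is not mutated.
    """
    extra = subset - background
    if extra:
        print(f"{len(extra)} gene(s) of interest are absent from the background "
              f"and were removed: {', '.join(sorted(extra))}")
    return subset & background


interest = {"P12345", "Q67890", "O11111"}
background = {"P12345", "Q67890", "A00001"}
cleaned = isValidSubset(interest, background)
print(sorted(cleaned))  # ['P12345', 'Q67890']
```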

goscripts.genelist_importer.reportMissingGenes(geneSet, gafDict, indicator)
Finds and reports Uniprot ACs in the provided background or interest gene set that are not present in the gene association file (most likely obsolete entries), and returns a new set from which these missing genes have been removed.
Parameters:
- geneSet (set of str) – A set of Uniprot ACs. Generated by importSubset() or importBackground().
- gafDict (dict of str mapping to set) – A dictionary mapping gene Uniprot ACs (str) to a set of GO IDs. Generated by importGAF().
- indicator (str) – A string signifying whether the set is the background or interest set of genes.
Returns: geneSet – The set after removal of Uniprot ACs not present in the gene association file.
Return type: set of str
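Detecting genes that lack a GAF entry is a membership test against the keys of gafDict. The following is a minimal sketch, assuming gafDict has the structure described above (Uniprot AC to a set of GO IDs); the report wording is hypothetical, and indicator is used here only to label that report.

```python
def reportMissingGenes(geneSet, gafDict, indicator):
    """Sketch: drop genes without a gene association file entry and report them.

    `gafDict` maps Uniprot ACs to sets of GO IDs; `indicator` (e.g.
    "background" or "interest") only labels the printed report, whose
    wording is an assumption.
    """
    missing = {gene for gene in geneSet if gene not in gafDict}
    if missing:
        print(f"{len(missing)} gene(s) in the {indicator} set are absent from the "
              f"gene association file (possibly obsolete): {', '.join(sorted(missing))}")
    # Return a new set so the caller's original set is not modified.
    return geneSet - missing


gaf = {"P12345": {"GO:0008150"}, "Q67890": {"GO:0003674", "GO:0005575"}}
kept = reportMissingGenes({"P12345", "Q67890", "X99999"}, gaf, "background")
print(sorted(kept))  # ['P12345', 'Q67890']
```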