goscripts.genelist_importer module

@author: Pieter Moris

goscripts.genelist_importer.importGeneList(path)

Imports a gene set (either the interest set or the background set) of Uniprot ACs from a file.

Parameters: path (str) – The path to the file.
Returns:

A set of Uniprot ACs (interest or background).

Return type:

set of str

Notes:

Gene lists should not contain a header. One gene per line.

Possible improvements:

Check the file structure and allow headers, comma-separated lists, etc.
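The expected input format (no header, one Uniprot AC per line) can be illustrated with a minimal sketch. The function read_gene_list below is a hypothetical stand-in, not the module's actual implementation; skipping blank lines and stripping whitespace are assumptions made for the example.

```python
import os
import tempfile


def read_gene_list(path):
    """Hypothetical stand-in for importGeneList (not the real implementation).

    Reads one Uniprot AC per line from a header-less text file.
    """
    with open(path) as handle:
        # Assumption: surrounding whitespace is stripped and blank lines are
        # skipped. Duplicate entries collapse because the result is a set.
        return {line.strip() for line in handle if line.strip()}


# Write a small example gene list: one AC per line, no header.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("P12345\nQ9Y6K9\n\nP12345\n")
    example_path = tmp.name

genes = read_gene_list(example_path)
os.remove(example_path)
print(sorted(genes))
```

Because the result is a set, the order of the lines in the file does not matter and repeated entries are counted once.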
goscripts.genelist_importer.isValidSubset(subset, background)

Checks whether the gene subset of interest contains genes that are not present in the background set. Any such additional genes are removed.

Parameters:
  • subset (set of str) – A subset of Uniprot ACs of interest.
  • background (set of str) – A set of Uniprot ACs to be used as the background.
Returns:

A cleaned subset of Uniprot ACs of interest.

Return type:

set of str
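The cleaning step described above amounts to a set intersection. The sketch below uses a hypothetical helper, clean_subset, to show the idea; the printed message and the example accession numbers are illustrative assumptions, not the output of the real function.

```python
def clean_subset(subset, background):
    """Hypothetical stand-in for isValidSubset (not the real implementation)."""
    # Genes in the subset of interest that the background does not contain.
    extra = subset - background
    if extra:
        # Assumption: the exact wording of any report is not known.
        print(f"Removing {len(extra)} gene(s) absent from the background: {sorted(extra)}")
    # Keep only the genes that are also part of the background set.
    return subset & background


background = {"P12345", "Q9Y6K9", "O00255"}
subset = {"P12345", "A0A000"}
cleaned = clean_subset(subset, background)
```

Here "A0A000" is dropped because it does not occur in the background, leaving only "P12345" in the cleaned subset.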

goscripts.genelist_importer.reportMissingGenes(geneSet, gafDict, indicator)

Finds and reports Uniprot ACs in the provided background/interest gene set that are not present in the gene association file (most likely obsolete entries). Also returns a new set from which these missing genes have been removed.

Parameters:
  • geneSet (set of str) – A set of Uniprot ACs. Generated by importSubset() or importBackground().
  • gafDict (dict of str mapping to set) – A dictionary mapping gene Uniprot ACs (str) to a set of GO IDs. Generated by importGAF().
  • indicator (str) – A string signifying whether the set is the background or interest set of genes.
Returns:

geneSet – The set after removal of Uniprot ACs not present in the gene association file.

Return type:

set of str
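As a rough illustration of the filtering described for reportMissingGenes, the sketch below checks each gene against the keys of a GAF-style dictionary. The helper drop_missing, the report text, and the example data are hypothetical and only mirror the documented parameters.

```python
def drop_missing(gene_set, gaf_dict, indicator):
    """Hypothetical stand-in for reportMissingGenes (not the real implementation)."""
    # Genes without an entry in the gene association dictionary.
    missing = {gene for gene in gene_set if gene not in gaf_dict}
    if missing:
        # Assumption: the real function's report format is not known.
        print(f"{len(missing)} {indicator} gene(s) not found in the "
              f"gene association file: {sorted(missing)}")
    # Return a new set; the input set is left unmodified.
    return gene_set - missing


# Example dictionary in the documented shape: Uniprot AC -> set of GO IDs.
gaf_dict = {
    "P12345": {"GO:0008150"},
    "Q9Y6K9": {"GO:0005515", "GO:0005737"},
}
interest = {"P12345", "Z99999"}
kept = drop_missing(interest, gaf_dict, "interest")
```

The indicator string only labels the report, so the same helper can be called once for the interest set and once for the background set.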