ENCODEQT Python Module API

ENCODEQueryTools is written in Python 2.7.

DEPENDENCIES:

  • pandas must be installed for the ENCODEQT module to function correctly.
  • json, urllib, and urllib2 are also imported by the ENCODEQT module (all are part of the Python 2.7 standard library).

METHOD SUMMARY:

class ENCODEQT.ENCODEQT

This class contains all functions to access the ENCODE TF Significance Tool’s JSON-based API.

checkGenesInDB(format='pandas')

Given a single gene list, this function will return a JSON-formatted string or a pandas data frame showing which gene/transcript IDs in the list are in the database.

Parameters: format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
Returns: pandas.DataFrame if format='pandas' or str in JSON format if format='json'.
fuzzyCellLineSearch(cellLine, format='pandas')

Given a partial cell type label, this function performs a fuzzy search to find potential matching labels in the database.

Parameters:
  • cellLine (str) – a string to match against cell type labels in the database.
  • format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
Returns:

pandas.DataFrame if format='pandas' or str in JSON format if format='json'.

fuzzyFactorSearch(factor, format='pandas')

Given a partial transcription factor label, this function performs a fuzzy search to find potential matching labels in the database.

Parameters:
  • factor (str) – a string to match against transcription factor labels in the database.
  • format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
Returns:

pandas.DataFrame if format='pandas' or str in JSON format if format='json'.
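The matching rule applied by the two fuzzy search functions is not described on this page. As a rough mental model only, the sketch below treats a fuzzy search as a case-insensitive substring match over a list of labels. The helper name fuzzy_label_search and the sample labels are illustrative assumptions and are not part of ENCODEQT.

```python
def fuzzy_label_search(partial, labels):
    """Return the labels that contain `partial`, ignoring case.

    Hypothetical stand-in: the real search runs against the database
    and may use a different matching rule.
    """
    needle = partial.strip().lower()
    if not needle:
        return []
    return [label for label in labels if needle in label.lower()]


# Sample cell type labels, chosen for illustration only.
sample_labels = ["HepG2", "K562", "GM12878", "HeLa-S3"]
matches = fuzzy_label_search("he", sample_labels)
print(matches)
```

A label returned by a fuzzy search can then be passed to verifyCellLineInDB or verifyFactorInDB to confirm the exact spelling before it is used in a query.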

getAllFactorGenes(factor, format='pandas')

This function returns all genes/transcripts in our database with a transcription factor target window intersection given current variable values.

Parameters:
  • factor (str) – the transcription factor label to query.
  • format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
Returns: pandas.DataFrame if format='pandas' or str in JSON format if format='json'.
getBackgroundList()

This function returns the value of _backgroundList as an array.

Returns: str[] – an array containing the symbols to use for the background data set. An empty array indicates that the background data set consists of all symbols in the database.
getCellLines()

This function returns the current value of _cellLines.

Returns: str[] – An array of strings containing the cell type labels to use for the current query. For queries pooled across all cell lines, a single-element array containing “all” is used.
getCountsByList(results)

For _queryType=“TFsig”, this function returns the number of genes/transcripts with hits in the database for each foreground symbol list.

Returns: int[] – an array containing the number of genes/transcripts with hits in the database for each foreground symbol list.
getDownstream()

This function returns the current value of _downstream, the downstream pad, in basepairs. All values will be divisible by 500.

Returns: int – the size of the downstream pad in basepairs.
getElementType()

This function returns the current value of _elementType.

Returns: str – the element type to be used for the current query.
getFeatureType()

This function returns the current value of _featureType.

Returns: str – the feature type to be used for the current query.
getListNames()

This function returns the current value of _listNames as an array.

Returns: str[] – an array of labels for each gene/transcript list to be used in the current query.
getOrganism()

This function returns the current value of _organism.

Returns: str – the organism type to be used for the current query.
getPriorQuery()

This function returns the most recent data POSTed to the API as a JSON-formatted string. Only queries of type “TFsig” are stored.

Returns: str – a JSON-formatted string containing the data POSTed to the API for the most recent “TFsig” query.
getSigGenesFromListForFactor(factor, format='pandas', listToUse='List1')

This function returns all genes/transcripts in the user-provided gene list with a transcription factor target window intersection given current variable values.

Parameters:
  • factor (str) – the transcription factor label to query.
  • format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
  • listToUse (str) – For queries that use multiple foreground lists, the label corresponding to the list to use for this query. The default value of ‘List1’ is used for single foreground list queries where the _listNames variable is not set.
Returns:

pandas.DataFrame if format='pandas' or str in JSON format if format='json'.

getSignificanceScores(format='pandas')

This function calculates significantly enriched transcription factors in one or more gene lists using current variable values.

Parameters: format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
Returns: pandas.DataFrame if format='pandas' or str in JSON format if format='json'.
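Because getSignificanceScores relies on the current variable values, a query is normally configured with the setters first. The sketch below shows one possible ordering using only methods and argument values listed on this page. The helper name run_tf_significance and the pad size of 1000 basepairs are assumptions made for illustration, and the required call order is not stated here.

```python
def run_tf_significance(qt, symbols):
    """Configure `qt` for one foreground list and request scores as JSON.

    `qt` is assumed to be an ENCODEQT.ENCODEQT instance, or any object
    that exposes the same documented methods.
    """
    qt.setOrganism("human")
    qt.setElementType("genes")
    qt.setFeatureType("TSS")
    qt.setSymbolType("symbol")
    qt.setCellLines(["all"])      # pool across all cell lines
    qt.setUpstream(1000)          # illustrative pad size (assumption)
    qt.setDownstream(1000)        # illustrative pad size (assumption)
    qt.setSymbolList([symbols])   # array of arrays: one foreground list
    qt.setBackgroundList([])      # empty array: all symbols in the database
    qt.setListNames([])           # empty array: default list labels

    # Documented return codes: 0 means valid, 1 means invalid.
    if qt.validateQuery() != 0:
        raise ValueError("query settings failed validation")
    return qt.getSignificanceScores(format="json")
```

With the real module this might be called as run_tf_significance(ENCODEQT.ENCODEQT(), ["TP53", "MYC"]), assuming the constructor takes no arguments, which this page does not state. The explicit validateQuery call is optional, since the module calls it internally when needed.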
getStatistics(results)

This function extracts the results array returned by all API calls. This array is then used by other functions for further processing. Other functions call this function as needed.

Returns: str[] – The array contained in the results field obtained from the server’s response.
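The idea of pulling the results field out of a server response can be sketched as follows. This is not the module's implementation: the helper name extract_results and the sample response are hypothetical, and only the field name "results" comes from the description above.

```python
import json


def extract_results(response_text):
    """Parse a JSON response and return the array in its "results" field."""
    payload = json.loads(response_text)
    if "results" not in payload:
        raise KeyError("response has no 'results' field")
    return payload["results"]


# Hypothetical response; the real row contents are not documented here.
sample_response = '{"results": ["row1", "row2"]}'
print(extract_results(sample_response))
```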
getSymbolList()

This function returns the current value of _symbolList as an array of arrays.

Returns: str[][] – an array of arrays containing each foreground list of genes/transcripts to be used in the current query.
getSymbolType()

This function returns the current value of _symbolType.

Returns: str – the symbol type to be used for the current query.
getTFsAndCellLines(format='pandas')

This function returns all transcription factor/cell type combinations in the database.

Parameters: format (str) – The format to use for the output. ‘pandas’ will output a Pandas DataFrame object. ‘json’ will output a JSON-formatted string.
Returns: pandas.DataFrame if format='pandas' or str in JSON format if format='json'. For JSON format, two arrays are returned in which the elements correspond (e.g., factor #1 and cell type #1).
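When working with the JSON output, the two corresponding arrays can be zipped into factor/cell type pairs. The sketch below assumes the JSON is an object holding the two arrays under separate keys. The key names "factors" and "cellLines" are hypothetical placeholders, so inspect the actual response and pass the real key names.

```python
import json


def pair_factors_with_cell_lines(json_text, factor_key, cell_key):
    """Zip two parallel arrays from a JSON object into (factor, cell) pairs."""
    data = json.loads(json_text)
    factors = data[factor_key]
    cells = data[cell_key]
    if len(factors) != len(cells):
        raise ValueError("the two arrays differ in length")
    return list(zip(factors, cells))


# Hypothetical payload with placeholder key names.
sample = '{"factors": ["CTCF", "MYC"], "cellLines": ["K562", "HepG2"]}'
pairs = pair_factors_with_cell_lines(sample, "factors", "cellLines")
print(pairs)
```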
getTotalGenesAnalyzed()

For _queryType=“TFsig”, this function returns the total number of genes in the background data set.

Returns: int – the total number of genes used for the background data set in the current query.
getUpstream()

This function returns the current value of _upstream, the upstream pad, in basepairs. All values will be divisible by 500.

Returns: int – the size of the upstream pad in basepairs (the number will be negative, indicating upstream).
setBackgroundList(backgroundList)

This function sets the value of _backgroundList given an array of strings as input.

Parameters: backgroundList (str[]) – an array containing the symbols to use for the background data set. To use the set of all symbols in our database, pass an empty array.
setCellLines(cellLines)

This function sets the value of _cellLines given an array of strings (cell line IDs) as input.

Parameters: cellLines (str[]) – An array of strings containing the cell type labels to use for the current query. For queries pooled across all cell lines, a single-element array containing “all” should be used.
setDownstream(downstream)

This function sets the value of _downstream, the downstream pad. Input should be provided in number of basepairs. Input will be automatically converted to the integer format expected by the API.

Parameters: downstream (int) – The value of the downstream analysis window pad in basepairs. Will be rounded up to the nearest multiple of 500 and must be less than 5000.
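The rounding rule for pad sizes can be illustrated with a small helper. This is a sketch of the stated rule only, not the module's code: the name round_up_pad is hypothetical, the input is assumed to be a non-negative integer, and applying the 5000 limit to the rounded value (rather than to the raw input) is an assumption.

```python
def round_up_pad(basepairs, step=500, limit=5000):
    """Round a non-negative pad size up to the nearest multiple of `step`."""
    if basepairs < 0:
        raise ValueError("pad size must be non-negative")
    # Integer ceiling division, then scale back to basepairs.
    rounded = ((basepairs + step - 1) // step) * step
    if rounded >= limit:
        raise ValueError("pad size must be less than %d" % limit)
    return rounded


for value in (0, 1, 500, 1200):
    print(value, round_up_pad(value))
```

Under these assumptions 1200 becomes 1500, while 500 is already a multiple of 500 and stays unchanged.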
setElementType(elementType)

This function sets the value of _elementType.

Parameters: elementType (str) – the element type to use for the next query (“genes”, “transcripts”, “pseudogenes”, or “pseudotranscripts”).
setFeatureType(featureType)

This function sets the value of _featureType.

Parameters: featureType (str) – The feature type for the query (“TSS”, “TTS”, or “Genic”).
setListNames(listNames)

This function sets the value of _listNames given an array of strings as input.

Parameters: listNames (str[]) – An array of labels to use for each gene/transcript list in the current query. To use the default labels, pass an empty array.
setOrganism(organism)

This function sets the value of _organism.

Parameters: organism (str) – the organism to query (“human” or “mouse”).
setPriorQuery(query)

This function stores the data to be POSTed to the API as a JSON-formatted string. Only queries of type “TFsig” are stored.

Parameters: query (str) – A JSON-formatted string that was POSTed to the server for the most recent “TFsig” query.
setSymbolList(symbolList)

This function sets the value of _symbolList and expects an array of arrays as input.

Parameters: symbolList (str[][]) – An array of arrays containing lists of genes/transcripts to use as the foreground in the current query.
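The relationship between the array of arrays and the list labels can be sketched as below. The helper label_symbol_lists is hypothetical. ‘List1’ is the documented default label for a single list, while extending that pattern to ‘List2’, ‘List3’, and so on for further lists is an assumption.

```python
def label_symbol_lists(symbol_list, list_names):
    """Map each foreground list to its label, falling back to List1, List2, ..."""
    if not list_names:
        # Assumed default naming pattern; only 'List1' is documented.
        list_names = ["List%d" % (i + 1) for i in range(len(symbol_list))]
    if len(list_names) != len(symbol_list):
        raise ValueError("need exactly one label per foreground list")
    return dict(zip(list_names, symbol_list))


# Two foreground lists; the gene symbols are examples only.
symbol_list = [["TP53", "MYC"], ["GATA1"]]
print(label_symbol_lists(symbol_list, ["up", "down"]))
print(label_symbol_lists([["TP53", "MYC"]], []))
```

Note that even a single foreground list is wrapped in an outer array, as in [["TP53", "MYC"]].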
setSymbolType(symbolType)

This function sets the value of _symbolType.

Parameters: symbolType (str) – The symbol type to use for the current query (“symbol”, “ensembl”, “entrez”, or “havana”).
setUpstream(upstream)

This function sets the value of _upstream, the upstream pad. Input should be provided in number of basepairs. Input will be automatically converted to the integer format expected by the API.

Parameters: upstream (int) – The value of the upstream analysis window pad in basepairs. Will be rounded up to the nearest multiple of 500 and must be less than 5000.
showCurrentState()

This function prints the current values of all variables required for queries.

useSampleQuery()

Sets all values to those needed to run Demo #1 on the encodeqt.stanford.edu website. Used for testing purposes.

validateAllGenesQuery()

This function validates that all variables necessary for query type “allGenes” are set to valid values before POSTing the data to the API. It is called by other functions when needed.

Returns: int – the return code (1 if invalid, 0 if valid).
validateCheckGeneInDBQuery()

This function validates that all variables necessary for query type “geneCheck” are set to valid values before POSTing the data to the API. It is called by other functions when needed.

Returns: int – the return code (1 if invalid, 0 if valid).
validateOrganism()

This function validates that _organism is set to a valid value (currently “human” or “mouse”). It is called by other functions when needed.

Returns: int – the return code (1 if invalid, 0 if valid).
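The validators return 0 for valid and 1 for invalid, which is the reverse of Python truthiness, so a bare "if qt.validateOrganism():" would be true for an invalid value. The stand-in below mimics the documented convention to show a safe comparison. The function validate_organism and the constants VALID and INVALID are illustrative names, not part of ENCODEQT.

```python
VALID, INVALID = 0, 1  # documented return codes


def validate_organism(organism):
    """Stand-in that mirrors the documented return-code convention."""
    return VALID if organism in ("human", "mouse") else INVALID


for organism in ("human", "zebrafish"):
    code = validate_organism(organism)
    # Compare against 0 explicitly instead of relying on truthiness.
    status = "valid" if code == VALID else "invalid"
    print(organism, code, status)
```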
validateQuery()

This function validates that all variables required for query type “TFsig” are set to valid values before POSTing the data to the API. It is called by other functions when needed.

Returns: int – the return code (1 if invalid, 0 if valid).
verifyCellLineInDB(cellLine)

Given a cell type label, this function verifies that it is in the database.

Parameters: cellLine (str) – The cell type label to check in the database.
Returns: bool – whether the label exists in the database.
verifyFactorInDB(factor)

Given a transcription factor label, this function verifies that it is in the database.

Parameters: factor (str) – The transcription factor label to check in the database.
Returns: bool – whether the label exists in the database.
