Tutorial

1. INPUT

Daphnia-cisTarget is currently available for Daphnia pulex and Daphnia magna and can be run through the web interface of i-cisTarget.

[Figure: Daphnia-cisTarget input form]

(1) Paste a list of Daphnia magna or Daphnia pulex gene identifiers, or upload a file containing the gene IDs. The supported gene IDs are listed under "Supported input formats" below.

Note:

  • Genomic coordinates (regions, e.g. ChIP peaks) are not supported for Daphnia.
  • Avoid very small gene sets (e.g. fewer than 10 genes): small input sets may result in many false positives.
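A minimal sketch of preparing a gene list before pasting or uploading it, assuming plain text with one gene ID per line. The function name `read_gene_signature` is hypothetical and not part of Daphnia-cisTarget; the `min_genes=10` default simply mirrors the note above.

```python
import warnings

def read_gene_signature(text, min_genes=10):
    """Parse newline-separated gene IDs, dropping blank lines and duplicates."""
    genes, seen = [], set()
    for line in text.splitlines():
        gene_id = line.strip()
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            genes.append(gene_id)
    if len(genes) < min_genes:
        # Small input sets may give many false positives (see note above).
        warnings.warn(f"only {len(genes)} genes; expect many false positives")
    return genes

signature = read_gene_signature("JGI_V11_304793\nJGI_V11_302856\n\nJGI_V11_304793\n")
print(signature)  # ['JGI_V11_304793', 'JGI_V11_302856'], plus a UserWarning
```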

(2) Choose the appropriate species and gene annotation version

(3) Each gene is linked to candidate regulatory regions that form the motif search space. To this end, we assigned to each gene the non-coding regions located in its neighbourhood, including UTRs, introns, and a region upstream of the transcription start site (TSS). This upstream region spans either 300 bp or 5 kb, and the search space to use can be selected as a parameter. If both options are selected, Daphnia-cisTarget runs the motif recovery analysis on both databases and combines the results in one report.
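To make the upstream part of the search space concrete, here is a sketch that computes the upstream interval of a gene from its TSS and strand for the two size options. It assumes 0-based, half-open coordinates and a hypothetical `Gene` record; UTRs and introns, which also belong to the search space, would come from the gene model and are not shown.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    """Hypothetical minimal gene record (0-based, half-open coordinates)."""
    gene_id: str
    chrom: str
    tss: int     # position of the first transcribed base
    strand: str  # "+" or "-"

def upstream_region(gene, size):
    """(chrom, start, end) of the `size` bp immediately upstream of the TSS."""
    if gene.strand == "+":
        # Plus strand: upstream lies at lower coordinates; clip at scaffold start.
        return gene.chrom, max(0, gene.tss - size), gene.tss
    # Minus strand: upstream lies at higher coordinates (scaffold end not clipped).
    return gene.chrom, gene.tss + 1, gene.tss + 1 + size

gene = Gene("gene_A", "scaffold_1", tss=10_000, strand="-")
for size in (300, 5_000):  # the two search-space options
    print(size, upstream_region(gene, size))
```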

(4) If the analysis is run with both search space options (300 bp and 5 kb upstream), the enrichment analysis is by default computed within each database separately. In this case, the AUC distribution and NES are generated for each database separately, and the results are then combined in one report sorted by NES. If you instead choose to analyse the enrichment over all databases, the AUC distribution and NES are calculated across both databases together.
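The difference between the two modes can be sketched as follows, assuming NES = (AUC - µ)/σ with the population standard deviation (the exact estimator used by Daphnia-cisTarget is an assumption here). The function names and the toy AUC values are hypothetical.

```python
from statistics import mean, pstdev

def to_nes(aucs):
    """Normalize a {key: AUC} mapping to NES = (AUC - mu) / sigma."""
    values = list(aucs.values())
    mu, sigma = mean(values), pstdev(values)
    return {key: (a - mu) / sigma for key, a in aucs.items()}

def combined_report(auc_by_db, separately=True):
    """Rows of (database, motif, NES) from all databases, sorted by NES."""
    if separately:
        # Default mode: mu and sigma are computed within each database.
        rows = [(db, motif, s)
                for db, aucs in auc_by_db.items()
                for motif, s in to_nes(aucs).items()]
    else:
        # "Over all databases": one mu and sigma for the pooled AUCs.
        pooled = {(db, motif): a
                  for db, aucs in auc_by_db.items()
                  for motif, a in aucs.items()}
        rows = [(db, motif, s) for (db, motif), s in to_nes(pooled).items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Toy AUCs for the 300 bp and 5 kb databases (hypothetical values).
auc_by_db = {
    "300bp": {"m1": 0.10, "m2": 0.05, "m3": 0.03},
    "5kb": {"m1": 0.08, "m4": 0.02, "m5": 0.02},
}
print(combined_report(auc_by_db)[0][:2])                    # ('5kb', 'm1')
print(combined_report(auc_by_db, separately=False)[0][:2])  # ('300bp', 'm1')
```

With the toy values, the top-ranked motif changes between modes: within its own database, the 5 kb hit stands out more from its background than the 300 bp hit does, even though its raw AUC is lower.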

(5) Optional parameters

Normalized enrichment score threshold
Only motifs with a normalized enrichment score (NES) above this threshold are shown in the report, and the STAMP clustering of similar motifs is performed only for these motifs. Increasing the threshold reduces the time needed for clustering.
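The threshold acts as a plain filter on the NES values. A minimal sketch, with a hypothetical function name and toy scores; the value 3.0 below is only an example, not a documented default.

```python
def motifs_for_report(nes_by_motif, nes_threshold):
    """Motifs with NES above the threshold, highest first (the STAMP input)."""
    kept = [(motif, s) for motif, s in nes_by_motif.items() if s > nes_threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

nes_by_motif = {"motif1": 4.2, "motif2": 2.1, "motif3": 3.5}  # toy scores
print(motifs_for_report(nes_by_motif, nes_threshold=3.0))
# [('motif1', 4.2), ('motif3', 3.5)]
```

Raising `nes_threshold` shrinks the returned list, which is why a higher threshold shortens the clustering step.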

ROC threshold for AUC calculation
For each motif, we generate a receiver operating characteristic (ROC) curve and calculate the area under this curve (AUC). Because we are mainly interested in highly ranked genes, the AUC is calculated only over a fraction of the top-ranked genes. This AUC is then used to compare all motifs and to calculate the NES. The parameter is set to 0.03 by default, which corresponds to the top 990 genes in the D. magna database and the top 927 genes in the D. pulex database. The number of genes covered by this fraction cannot exceed the threshold for visualization (e.g. if the ROC threshold is set to 0.01, the threshold for visualization must be at least 330 for D. magna).
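The relation between the ROC threshold and the visualization threshold can be checked with a few lines of arithmetic. This sketch assumes the gene count is the rounded product of the fraction and the total number of ranked genes; the totals below are back-calculated from the defaults above (990 / 0.03 and 927 / 0.03) and are approximations, not official database sizes.

```python
def rank_cutoff(roc_threshold, n_genes):
    """Top-ranked genes covered by the ROC threshold (rounding is an assumption)."""
    return round(roc_threshold * n_genes)

def validate_thresholds(roc_threshold, n_genes, visualization_threshold):
    """Reject settings where the AUC window is wider than the plotted range."""
    cutoff = rank_cutoff(roc_threshold, n_genes)
    if cutoff > visualization_threshold:
        raise ValueError(
            f"ROC threshold {roc_threshold} covers {cutoff} genes, more than "
            f"the visualization threshold of {visualization_threshold}"
        )
    return cutoff

# Approximate totals inferred from the documented defaults (hypothetical).
N_GENES = {"D. magna": 33_000, "D. pulex": 30_900}

print(rank_cutoff(0.03, N_GENES["D. magna"]))  # 990
print(rank_cutoff(0.03, N_GENES["D. pulex"]))  # 927
print(validate_thresholds(0.01, N_GENES["D. magna"], 330))  # 330
```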

Threshold for visualization
The x-axis cut-off of the recovery curve plot, given as a number of top-ranked genes. If this is set to 20,000, the recovery curve is shown for the top 20,000 genes.

2. RECOVERY ANALYSIS

For each motif, Daphnia-cisTarget ranks all genes ("background") by their motif score and generates a cumulative recovery curve for the input gene set ("foreground"). The Area Under the Curve (AUC) of the foreground genes is calculated for each motif, and the AUCs of all motifs are normalized to a Normalized Enrichment Score, NES = (AUC - µ)/σ, where µ and σ are the mean and standard deviation of the AUCs of all motifs analysed together. Similar enriched motifs are then clustered together using STAMP.
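The recovery analysis can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the Daphnia-cisTarget implementation: each motif is represented by a hypothetical list of gene IDs ordered by motif score, the AUC is summed up to `max_rank` (the ROC threshold expressed in genes) and scaled by `max_rank * len(foreground)`, and σ is the population standard deviation. Because all motifs share the same foreground and cut-off, that scaling does not change the NES. STAMP clustering is not shown.

```python
from statistics import mean, pstdev

def recovery_curve(ranking, foreground, max_rank):
    """Cumulative count of foreground genes within the top 1..max_rank genes."""
    curve, hits = [], 0
    for gene in ranking[:max_rank]:
        if gene in foreground:
            hits += 1
        curve.append(hits)
    return curve

def auc(ranking, foreground, max_rank):
    """Area under the recovery curve up to max_rank, scaled to at most 1."""
    area = sum(recovery_curve(ranking, foreground, max_rank))
    return area / (max_rank * len(foreground))

def nes(auc_by_motif):
    """NES = (AUC - mu) / sigma, with mu and sigma taken over all motifs."""
    values = list(auc_by_motif.values())
    mu, sigma = mean(values), pstdev(values)
    return {motif: (a - mu) / sigma for motif, a in auc_by_motif.items()}

# Toy data: each motif ranks the same six genes differently (hypothetical).
foreground = {"g1", "g2"}
rankings = {
    "motifA": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "motifB": ["g3", "g1", "g4", "g2", "g5", "g6"],
    "motifC": ["g3", "g4", "g5", "g6", "g1", "g2"],
}
aucs = {m: auc(r, foreground, max_rank=3) for m, r in rankings.items()}
scores = nes(aucs)
print(max(scores, key=scores.get))  # motifA
```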

3. OUTPUT

When the analysis is finished, the results appear on the web page, or a link to the results is sent to your e-mail address.

[Figure: Daphnia-cisTarget output form]

(1) Parameters and statistics

Table listing the parameters used in the analysis and statistics, including:

(2) Plot representing the AUC distribution

If the enrichment analysis was run "within each database separately", the distribution for each database is shown in a different color. For each database, the mean (µ) and the standard deviation (σ) of the AUC scores of all its motifs are given. The dotted line indicates the AUC cut-off corresponding to the NES threshold. Motifs with an AUC to the right of this line exceed the NES threshold and are listed in the output.
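Because NES = (AUC - µ)/σ, the dotted line sits at AUC = µ + t·σ for an NES threshold t, so "right of the line" and "NES above the threshold" select the same motifs. A short sketch with toy AUCs, assuming the population standard deviation:

```python
from statistics import mean, pstdev

def auc_cutoff(aucs, nes_threshold):
    """AUC at which NES = (AUC - mu) / sigma equals nes_threshold."""
    return mean(aucs) + nes_threshold * pstdev(aucs)

aucs = [0.02, 0.03, 0.04, 0.05, 0.20]  # toy AUCs, one per motif
cutoff = auc_cutoff(aucs, nes_threshold=1.5)
print([a for a in aucs if a > cutoff])  # [0.2]
```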

(3) Recovery curve of the best motif

If two databases were used, the best motif of each database is shown in a different color. The thick curve indicates the average number of recovered genes across all motifs in the database. The dotted line marks the ROC threshold, i.e. the rank up to which the area under the curve was calculated.
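The thick average curve can be sketched as a point-wise mean of the per-motif recovery curves, assuming every curve covers the same number of ranks. The function name and toy curves are hypothetical.

```python
def mean_recovery_curve(curves):
    """Point-wise mean of per-motif recovery curves of equal length."""
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all recovery curves must have the same length")
    return [sum(points) / len(curves) for points in zip(*curves)]

# Toy recovery curves (recovered foreground genes at ranks 1..3), one per motif.
curves = [[1, 2, 2], [0, 1, 1], [0, 0, 0]]
print(mean_recovery_curve(curves))
```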

(4) Table listing the most enriched motifs for the input gene set, ranked according to NES

Similar motifs, as clustered by STAMP, are marked with the same color. The table includes:

(5) Subsequent analyses can be performed for the selected enriched motifs

Supported input formats

Gene signatures must be supplied as a list of gene identifiers, separated by newline characters. The gene IDs must correspond to the IDs in the following gene catalogues:

Example gene IDs from each catalogue:

D. magna (201104m8 pasaupdate) | D. pulex (Frozen catalogue v1.1) | D. pulex (v2.0 beta3)
mu8AUGepir3s00311g138 | JGI_V11_304793 | hxNCBI_GNO_50334
mu8AUGepir2s00007g44 | JGI_V11_302856 | hxAUG26us90g26t1
mu8AUGapi5s01092g326 | JGI_V11_319285 | hxJGI_V11_242009
mu8AUGepir7p2s01581g65 | JGI_V11_330653 | hxAUG26us39g44t1
mu8AUGapi5_contig23020g252 | JGI_V11_302561 | hxNCBI_GNO_255214
mu8AUGapi5_contig25230g571 | JGI_V11_40590 | hxAUG25s25g53t1
mu8AUGapi5_contig31066g458 | JGI_V11_318354 | hxNCBI_GNO_90674
mu8AUGapi5_contig50573g462 | JGI_V11_318355 | hxAUG26rep1s4g305t1
mu8AUGapi5p1s00024g218 | JGI_V11_323532 | hxNCBI_GNO_59084
mu8AUGapi5p1s00512g32 | JGI_V11_119506 | hxAUG26rep2s1g103t1
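Before submitting, it can help to check that a list does not mix catalogues. The sketch below guesses the catalogue from an ID prefix. The prefixes are inferred only from the example IDs above and the assignment of ID columns to catalogues is an assumption; this is a hypothetical heuristic, not a documented rule, so the conversion files remain the authoritative reference.

```python
# Prefixes inferred only from the example IDs above (hypothetical heuristic).
CATALOGUE_PREFIXES = [
    ("mu8AUG", "D. magna 201104m8 pasaupdate"),
    ("JGI_V11_", "D. pulex Frozen catalogue v1.1"),
    ("hx", "D. pulex v2.0 beta3"),
]

def guess_catalogue(gene_id):
    """Return the catalogue whose example prefix matches gene_id, else None."""
    for prefix, catalogue in CATALOGUE_PREFIXES:
        if gene_id.startswith(prefix):
            return catalogue
    return None

def catalogues_in(gene_ids):
    """Set of guessed catalogues; more than one suggests a mixed input list."""
    return {guess_catalogue(g) for g in gene_ids}

print(catalogues_in(["JGI_V11_304793", "hxJGI_V11_242009"]))
```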

You can use the following files for gene ID conversion between different gene catalogues:

Examples

Study | Species | Input | Reference | Link
Heat shock signature | D. pulex | Genes | Spanier et al., Genome Biol. Evol. (2017) | Report
Genes upregulated after chronic treatment with microcystin-free cyanobacteria | D. magna | Genes | Schwarzenberger et al., BMC Genomics (2014) | Report

Contact

If you have any questions or problems related to Daphnia-cisTarget, please contact us at lcbtools@kuleuven.be.

Cite us