Tutorial
1. INPUT
Daphnia-cisTarget is currently available for Daphnia pulex and Daphnia magna and can be run through the web interface of i-cisTarget.
(1) Paste a list of Daphnia magna or Daphnia pulex gene identifiers, or upload a file with gene IDs. Supported gene IDs are listed under "Supported input formats" below.
Note:
- Genomic coordinates (regions, e.g. ChIP peaks) are not supported for Daphnia.
- Avoid very small gene sets (e.g. fewer than 10 genes), as they may result in many false positives.
(2) Choose the appropriate species and gene annotation version.
(3) Each gene is linked to candidate regulatory regions that form the motif search space. To this end, each gene was assigned the non-coding regions in its neighbourhood, including UTRs, introns, and a region upstream of the transcription start site (TSS). This upstream region spans either 300 bp or 5 kb, and the search space to use can be selected as a parameter. If both options are selected, Daphnia-cisTarget runs the motif recovery analysis on both databases and combines the results in one report.
(4) If the analysis is run with both search space options (300 bp and 5 kb upstream), the enrichment analysis is by default computed within each database separately. In this case, the AUC distribution and NES are computed for each database separately, and the results are then combined in one report and sorted by NES. If you choose to analyse the enrichment over all databases instead, the AUC distribution and NES are calculated across both databases together.
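To make the two upstream options concrete, here is a minimal sketch of how a 300 bp or 5 kb window upstream of a gene's TSS could be derived. The `Interval` type, the 0-based half-open coordinates and the function name are assumptions for illustration; the sketch ignores the UTRs and introns that are also part of the search space and is not the procedure actually used to build the Daphnia databases.

```python
from dataclasses import dataclass

# The two upstream extents (bp) offered as search-space options.
UPSTREAM_OPTIONS = (300, 5000)


@dataclass(frozen=True)
class Interval:
    """Hypothetical half-open genomic interval [start, end) on one scaffold."""
    scaffold: str
    start: int
    end: int


def upstream_region(scaffold: str, tss: int, strand: str, extent: int) -> Interval:
    """Return the `extent` bp directly upstream of a 0-based TSS coordinate."""
    if extent not in UPSTREAM_OPTIONS:
        raise ValueError(f"extent must be one of {UPSTREAM_OPTIONS}")
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the scaffold start.
        return Interval(scaffold, max(0, tss - extent), tss)
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates
        # (no clipping at the scaffold end, since its length is not known here).
        return Interval(scaffold, tss + 1, tss + 1 + extent)
    raise ValueError("strand must be '+' or '-'")
```

For a plus-strand gene with its TSS at position 1000, the 300 bp option yields `Interval("s1", 700, 1000)`.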
(5) Optional parameters
Normalized enrichment score threshold
Only enriched motifs with a normalized enrichment score (NES) above this threshold are shown in the report, and STAMP clustering of similar motifs is performed only for these motifs. Increasing the threshold therefore reduces the time needed for clustering.
ROC threshold for AUC calculation
For each motif, we generate a receiver operating characteristic (ROC) curve and calculate the area under this curve (AUC). Because we are mainly interested in highly ranked genes, the AUC is calculated only over a fraction of the top-ranked genes. This AUC is then used to compare all motifs and to calculate the NES. The parameter is set to 0.03 by default, which corresponds to 990 genes in the D. magna database and 927 genes in the D. pulex database. The number of genes corresponding to this fraction cannot exceed the threshold for visualization (e.g. if the ROC threshold is set to 0.01, the threshold for visualization must be at least 330 for D. magna).
Threshold for visualization
The cut-off for the x-axis of the AUC plot, i.e. the number of top-ranked genes shown. If this is set to 20,000, the recovery curve is visualized for the top 20,000 genes.
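The relation between the ROC threshold, the number of genes used for the AUC, and the threshold for visualization is simple arithmetic. The sketch below assumes ranking sizes of about 33,000 genes for D. magna and 30,900 for D. pulex; these are back-calculated from the defaults quoted above (0.03 × 33,000 = 990; 0.03 × 30,900 = 927) and are not official figures. The function names are hypothetical.

```python
# Approximate ranking sizes, back-calculated from the defaults quoted above
# (0.03 -> 990 genes for D. magna, 927 genes for D. pulex); not official figures.
RANKED_GENES = {"D. magna": 33_000, "D. pulex": 30_900}


def auc_gene_cutoff(roc_fraction: float, species: str) -> int:
    """Number of top-ranked genes over which the AUC is computed."""
    if not 0 < roc_fraction <= 1:
        raise ValueError("ROC threshold must be a fraction in (0, 1]")
    # round() absorbs floating-point noise such as 0.03 * 33000 = 989.999...
    return round(roc_fraction * RANKED_GENES[species])


def check_thresholds(roc_fraction: float, visualization_cutoff: int, species: str) -> None:
    """Raise if the AUC cut-off exceeds the number of genes shown in the plots."""
    n_genes = auc_gene_cutoff(roc_fraction, species)
    if n_genes > visualization_cutoff:
        raise ValueError(
            f"ROC threshold {roc_fraction} covers {n_genes} genes, but the "
            f"threshold for visualization is only {visualization_cutoff}"
        )
```

With these assumed sizes, `check_thresholds(0.01, 330, "D. magna")` passes, while a visualization threshold of 329 raises an error, matching the example in the text.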
2. RECOVERY ANALYSIS
Daphnia-cisTarget generates a cumulative recovery curve for the input gene set ("foreground"), using the motif score ranking of all genes ("background"). The Area Under this recovery Curve (AUC), up to the ROC threshold, is calculated for each motif, and the AUCs of all motifs are normalised to a Normalized Enrichment Score, NES = (AUC-µ)/σ, where µ and σ are the mean and standard deviation of the AUCs of all motifs. In addition, similar enriched motifs are clustered together using STAMP.
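The recovery analysis can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the i-cisTarget implementation: each motif ranking is a list of gene IDs ordered from best to worst score, the AUC is the area under the step-shaped recovery curve up to the AUC cut-off scaled to [0, 1], and σ is the population standard deviation. The `pooled` flag mimics the choice between normalising within each database and over all databases. All function names are hypothetical.

```python
from statistics import mean, pstdev


def recovery_curve(ranking, foreground, cutoff):
    """curve[k] = number of foreground genes among the top k + 1 genes of one ranking."""
    curve, hits = [], 0
    for gene in ranking[:cutoff]:
        hits += gene in foreground  # a True membership test counts as 1
        curve.append(hits)
    return curve


def auc(ranking, foreground, cutoff):
    """Area under the step-shaped recovery curve up to `cutoff`, scaled to [0, 1]."""
    return sum(recovery_curve(ranking, foreground, cutoff)) / (cutoff * len(foreground))


def nes_scores(rankings, foreground, cutoff):
    """NES = (AUC - mu) / sigma, with mu and sigma taken over every motif in `rankings`."""
    aucs = {motif: auc(r, foreground, cutoff) for motif, r in rankings.items()}
    values = list(aucs.values())
    mu, sigma = mean(values), pstdev(values)  # sigma must be > 0
    return {motif: (a - mu) / sigma for motif, a in aucs.items()}


def nes_per_database(databases, foreground, cutoff, pooled=False):
    """NES keyed by (database, motif), sorted from highest to lowest."""
    if pooled:
        # "Over all databases": a single AUC distribution across both databases.
        merged = {(db, m): r for db, ranks in databases.items() for m, r in ranks.items()}
        scores = nes_scores(merged, foreground, cutoff)
    else:
        # Default: one AUC distribution per database, results merged afterwards.
        scores = {(db, m): s for db, ranks in databases.items()
                  for m, s in nes_scores(ranks, foreground, cutoff).items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Because the NES is a z-score, the choice of AUC scaling does not affect it, as long as the same scaling is applied to every motif that shares a distribution.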
3. OUTPUT
When the analysis is finished, the results appear on the web page, or a link to the results is sent to your e-mail address.
(1) Parameters and statistics
Table listing the parameters used in the analysis and statistics, including:
- Total number of motifs (features) for which ranking was considered across all the databases
- Number of enriched motifs (features) for the specific NES threshold
- Total number of ranked genes (regions)
- Type of input query: 'regions' by default for technical reasons
- Fraction of input gene IDs contained in the database. This can be below 1 if some input genes have no candidate regulatory regions assigned.
- Number of i-cisTarget regions in input set: number of input genes
- Normalized enrichment score (NES) threshold
- AUC threshold (fraction/number of genes corresponding to this threshold)
- Recovery curve threshold (number of genes visualized in the AUC plots)
(2) Plot representing the AUC distribution
If the "within each database separately" enrichment analysis was used, the distribution for each database is shown in a different color. The mean (µ) and standard deviation (σ) of the AUC scores of all motifs in the corresponding database are given. The dotted line indicates the AUC cut-off that corresponds to the NES threshold; motifs with an AUC to the right of this line surpass the NES threshold and are listed in the output.
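Since NES = (AUC-µ)/σ, the dotted line in this plot sits at AUC = µ + NES_threshold × σ, and filtering on AUC against that line is equivalent to filtering on NES (for σ > 0). A minimal sketch, assuming a population standard deviation; the function names are hypothetical:

```python
from statistics import mean, pstdev


def auc_cutoff(mu: float, sigma: float, nes_threshold: float) -> float:
    """AUC value at which (AUC - mu) / sigma equals the NES threshold."""
    return mu + nes_threshold * sigma


def enriched_motifs(aucs: dict, nes_threshold: float) -> list:
    """(motif, NES) pairs to the right of the cut-off line, best first."""
    values = list(aucs.values())
    mu, sigma = mean(values), pstdev(values)
    cut = auc_cutoff(mu, sigma, nes_threshold)
    hits = [(motif, (a - mu) / sigma) for motif, a in aucs.items() if a > cut]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Raising the threshold moves the line to the right, so fewer motifs are reported and passed on to STAMP clustering.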
(3) Recovery curve of the best motif
If two databases were used, the best motif of each database is shown in a different color. The thick curve indicates the average number of recovered genes across all motifs in the database, and the dotted line shows the AUC threshold that was used to calculate the area under the curve.
(4) Table listing the most enriched motifs for the input gene set, ranked according to NES
Note that similar motifs are clustered together by the same color (STAMP clustering). The table includes:
- Rank of the motif (#)
- Name of the motif (feature) with a list of possible TFs that might bind it. In brackets: the names of known or predicted D. melanogaster TFs according to the motif2TF database, followed by the homologous Daphnia genes.
- Normalized enrichment score (NES) value
- Logo of the enriched motif
- Recovery curve for the enriched motif (blue). To retrieve an optimal subset of the input gene set as putative target genes, a “leading edge” is determined as the rank position where the difference between the signal (blue curve) and the background (mean recovery curve plus two standard deviations: green curve) is largest. The dotted line indicates this leading edge (x-axis) and the corresponding number of input genes (y-axis).
- A link to the list of candidate target genes. Rank: position of a gene in the ranking of all genes for this particular motif. For technical reasons, the gene ID column is called 'Region ID' and the annotation column 'Associated genes'. The annotation column contains the annotations provided by the Daphnia Genomics Consortium along with the gene symbols of D. melanogaster homologues in brackets.
- A link to the list of input genes that are ranked among the top genes for this motif. The number of top genes is set by the recovery curve threshold (1025 by default).
- Name of the database which contains the enriched feature
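The "leading edge" described in the list above is the rank at which the motif's recovery curve exceeds the background (mean plus two standard deviations of the recovery curves of all motifs) by the largest margin. A minimal sketch, assuming all curves are lists of cumulative hit counts of equal length and a population standard deviation; `leading_edge` is a hypothetical name and ties keep the first (lowest) rank:

```python
from statistics import mean, pstdev


def leading_edge(signal, background_curves):
    """Return (1-based rank, recovered input genes) where signal - (mean + 2*sd) peaks."""
    best_k, best_diff = 0, float("-inf")
    for k, value in enumerate(signal):
        # Recovery values of all motifs at this rank form the background.
        column = [curve[k] for curve in background_curves]
        background = mean(column) + 2 * pstdev(column)
        if value - background > best_diff:
            best_k, best_diff = k, value - background
    # signal[best_k] input genes are recovered within the top best_k + 1 genes.
    return best_k + 1, signal[best_k]
```

The input genes ranked at or above the returned rank are the candidate target genes for that motif.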
(5) Subsequent analyses can be performed for the selected enriched motifs:
- Use candidate target regions as filter and use as input for i-cisTarget analysis again.
  Retrieves the candidate target genes of the selected motifs and uses them as input for a new Daphnia-cisTarget analysis.
- Scan candidate target regions of selected features either for multiple homotypic or heterotypic CRMs
  This is currently not possible for Daphnia-cisTarget.
- Create SIF file for the selected features
  Creates a Simple Interaction File (SIF) listing the selected motifs, their predicted target genes and the gene annotations. You can import the SIF file into Cytoscape to build a gene regulatory network.
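If you want to assemble a similar network file yourself, for example from several target gene lists, the SIF layout that Cytoscape reads is one edge per line: source node, relationship type, target node, separated by tabs. The sketch below writes motif-to-gene edges only; the relationship label `regulates` and both function names are hypothetical, and the gene annotations included in the Daphnia-cisTarget SIF are omitted.

```python
def sif_lines(targets, interaction="regulates"):
    """One tab-separated 'motif, interaction, gene' line per predicted target."""
    return [f"{motif}\t{interaction}\t{gene}"
            for motif, genes in targets.items()
            for gene in genes]


def write_sif(path, targets, interaction="regulates"):
    """Write the edges to `path` for import into Cytoscape."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in sif_lines(targets, interaction):
            handle.write(line + "\n")
```

A gene predicted as a target of two motifs appears in two lines, so it becomes a single node with two incoming edges in the network.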
Supported input formats
Gene signatures must be supplied as a list of gene identifiers, separated by newline characters. The gene IDs must correspond to the IDs in the following gene catalogues:
- Daphnia magna: daphmagna_201104m8.pasaupdate.gtf
- Daphnia pulex: dpulex_jgi060905_JGI_V11.gff or daphnia_genes2010_beta3.gff
Species | Gene catalogue | Example gene ID |
---|---|---|
D. magna | 201104m8 pasaupdate | mu8AUGepir3s00311g138 |
D. pulex | Frozen catalogue v1.1 | JGI_V11_304793 |
D. pulex | v2.0 beta3 | hxNCBI_GNO_50334 |
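Before submitting a list, it can help to check it locally: one ID per line, no duplicates, at least about 10 genes, and IDs that actually occur in the chosen catalogue. A minimal sketch; `parse_gene_list` is a hypothetical helper, and `known_ids` is assumed to be a set of gene IDs you extracted yourself from the catalogue file listed above.

```python
from typing import Optional


def parse_gene_list(text: str, known_ids: Optional[set] = None, min_genes: int = 10):
    """Parse a newline-separated gene list; return (unique genes, warnings)."""
    genes, seen = [], set()
    for line in text.splitlines():
        gene = line.strip()
        if gene and gene not in seen:  # skip blank lines and duplicates
            seen.add(gene)
            genes.append(gene)
    warnings = []
    if len(genes) < min_genes:
        warnings.append(f"only {len(genes)} genes; small sets may give many false positives")
    if known_ids is not None:
        missing = [g for g in genes if g not in known_ids]
        if missing:
            warnings.append(f"{len(missing)} of {len(genes)} IDs not found in the catalogue")
    return genes, warnings
```

IDs reported as missing usually come from a different catalogue and can be translated with the conversion files listed below.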
You can use the following files for gene ID conversion between different gene catalogues:
- Daphnia magna: Gene IDs of genome-modelled gene catalogues 201104m8
  Note: The transcriptome-modelled gene catalogue (Orsini et al., 2016) is NOT integrated in Daphnia-cisTarget, since …
- Daphnia pulex: Gene IDs of v1.1 gene catalogues of JGI (JGI_V11), Ensembl (DAPPUDRAFT) and Gnomon (NCBI_GNO)
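The layout of the conversion files is not described here, so the following sketch assumes a tab-separated table with the source ID in one column and the target ID in another, optional `#` comment lines, and possibly several target IDs per source ID. `load_mapping` and `convert_ids` are hypothetical helpers; adjust the column indices to the actual file.

```python
import csv
from collections import defaultdict


def load_mapping(lines, source_col=0, target_col=1):
    """Build source ID -> set of target IDs from tab-separated lines ('#' lines skipped)."""
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) > max(source_col, target_col) and row[target_col]:
            mapping[row[source_col].strip()].add(row[target_col].strip())
    return mapping


def convert_ids(genes, mapping):
    """Translate gene IDs; return (converted IDs without duplicates, unmapped inputs)."""
    converted, unmapped = [], []
    for gene in genes:
        targets = mapping.get(gene)  # .get() does not add keys to the defaultdict
        if targets:
            converted.extend(sorted(targets))
        else:
            unmapped.append(gene)
    return list(dict.fromkeys(converted)), unmapped
```

With an open file, call `load_mapping(handle)`; unmapped IDs are returned separately so you can decide whether to drop them before running the analysis.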
Examples
Study | Species | Input | Reference | Link |
---|---|---|---|---|
Heat shock signature | D. pulex | Genes | Spanier et al., Genome Biol. Evol. (2017) | Report |
Genes upregulated after chronic treatment with microcystin-free cyanobacteria | D. magna | Genes | Schwarzenberger et al., BMC Genomics (2014) | Report |
Contact
If you have any questions or problems related to Daphnia-cisTarget, please contact us at lcbtools@kuleuven.be.
Cite us
- If you use Daphnia-cisTarget, please cite:
Spanier, K.I., Jansen, M., Decaestecker, E., Hulselmans, G., Becker, D., Colbourne, J.K., Orsini, L., De Meester, L. and Aerts, S. (2017) Conserved Transcription Factors Steer Growth-Related Genomic Programs in Daphnia. Genome Biology and Evolution doi: 10.1093/gbe/evx127
- If you use i-cisTarget, please cite:
Imrichová, H., Hulselmans, G., Kalender Atak, Z., Potier, D. and Aerts, S. (2015) i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Research doi: 10.1093/nar/gkv395
Herrmann, C., Van de Sande, B., Potier, D. and Aerts, S. (2012) i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Research doi: 10.1093/nar/gks543