CNS microRNA Profiles

General Questions

How were these data generated?

A detailed description of the data generation methods can be found in Hoye et al., J Neurosci (2017) and He et al., Neuron (2012). Both publications are available on PubMed. If you use data from this site, please cite the publication from which those data were derived.

What types of data can I download from this site?

Each individual bar graph can be downloaded as either a .png or a .pdf file. Any table generated under the “Compare Cell Types Enrichment” header can be downloaded as a .csv file from the bottom of the table. The complete raw data table can be downloaded under the “Download Data” header.

Why are there two different reports of statistical measurements?

The significance of the difference between any two cell types was calculated by empirical Bayes, with p-values adjusted for multiple comparisons using the Benjamini-Hochberg correction. This approach cannot be used when comparing one cell type to all others, because that is not a pairwise comparison. Instead, the R package pSI was used to identify miRNAs enriched in specific cell populations across a large number of profiles.
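To illustrate what the Benjamini-Hochberg correction does to a set of pairwise p-values, here is a minimal Python sketch of the standard step-up adjustment (the same procedure as R's p.adjust with method = "BH"). It does not reproduce the empirical Bayes step, and the input p-values are hypothetical, not taken from this database.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(q, 3) for q in benjamini_hochberg(raw)])  # [0.04, 0.053, 0.053, 0.2]
```

Adjusted values are never smaller than the raw p-values, which is why a comparison can be nominally significant but lose significance after correction.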

How do I figure out the most highly expressed miRNA by a given cell type?

Under the “Search by Cell Type” tab, compare the cell type of interest to all other cell types. Then click the header of the column that reports expression in the cell type of interest. This sorts the table so that the miRNAs most highly expressed in that cell type are at the top.
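The same ranking can be reproduced offline from a downloaded .csv. The sketch below assumes a table with one miRNA column and one numeric column per cell type; the column names and values are hypothetical placeholders, not the site's actual export format. Because a lower Ct means more starting material, Ct columns are sorted ascending, whereas CPM or fold-change columns are sorted descending.

```python
import csv
import io

# Hypothetical export with placeholder names and Ct values.
exported = """miRNA,CellTypeA,CellTypeB
miR-A,24.1,30.2
miR-B,19.5,27.8
miR-C,22.0,23.4
"""


def top_expressed(csv_text, column, values_are_ct=True):
    """Rank rows by one cell-type column, most highly expressed first.

    Assumes every entry in `column` is numeric (no "not detectable" labels).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Ct: lower = more abundant, so sort ascending; otherwise descending.
    return sorted(rows, key=lambda r: float(r[column]), reverse=not values_are_ct)


ranked = top_expressed(exported, "CellTypeA")
print([r["miRNA"] for r in ranked])  # ['miR-B', 'miR-C', 'miR-A']
```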

What does it mean for a miRNA to be “conserved” and how is this parameter calculated?

A conserved miRNA is one whose sequence is similar across many species. Generally speaking, a more highly conserved miRNA strand is more likely to be stable and therefore loaded into the RISC (RNA-induced silencing complex) to serve as a functional regulator. Although all of the data were generated in mouse models, conservation scores can potentially inform the likelihood of a shared phenotype across multiple organisms. Conservation data were taken from the TargetScan “miR Family” data table. TargetScan defines three levels of conservation: broadly conserved (conserved across most vertebrates, usually to zebrafish), conserved (conserved across most mammals, but usually not beyond placental mammals) and poorly conserved (all other combinations).

Which standard miRNA notation system is used to depict the data? Is this system identical across each data set?

The Hoye 2017 data use the 5p/3p strand notation system, whereas the He 2012 data originally used the star (*) strand notation. Where possible, we converted the He 2012 names to the 5p/3p notation system, using miRBase and miRDB as guides. We hope this improves clarity and specificity by reducing ambiguous miRNA identification. Only the miRNA names were changed; no data values were altered during this reformatting.

Who should I contact with additional questions not addressed in this FAQ section?

Please reach out to either Dr. Joseph Dougherty (jdougherty@wustl.edu) or Dr. Timothy Miller (miller.t@wustl.edu) with questions related to this database.

Questions about Hoye et al. (2017), neurons v glia in spinal cord and brainstem

What exogenous control is used for each bar graph?

Within each bar graph, miRNA expression is normalized to the tissue-wide Ago sample. The tissue-wide Ago expression level therefore serves as the exogenous control and is set to 1.0. All other cell types are displayed as expression relative to this control.
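The FAQ does not give the exact normalization formula. A common way to express a sample relative to a reference from Ct values is 2^-ΔCt, and the sketch below assumes that approach and roughly 100% amplification efficiency (one cycle equals one doubling). The Ct values are hypothetical.

```python
def relative_fold_change(ct_sample, ct_reference):
    """Fold change of a sample relative to a reference, as 2^-(deltaCt).

    Assumes ~100% amplification efficiency (one cycle = one doubling).
    """
    delta_ct = ct_sample - ct_reference
    return 2 ** (-delta_ct)


# Hypothetical Ct values for one miRNA.
ct_tissue_ago = 25.0  # tissue-wide Ago, the reference shown as 1.0
ct_cell_type = 23.0   # a cell-type-specific sample, 2 cycles earlier
print(relative_fold_change(ct_tissue_ago, ct_tissue_ago))  # 1.0
print(relative_fold_change(ct_cell_type, ct_tissue_ago))   # 4.0
```

The reference normalized against itself always gives 1.0, which matches how the tissue-wide Ago bar is displayed.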

What is the difference between Relative Fold Change and Ct?

Relative Fold Change describes the expression of a miRNA relative to an exogenous control. It reflects the relative abundance of a given miRNA, but not its absolute abundance. Ct (cycle threshold) is the amplification cycle at which the signal crosses a detection threshold; it depends on the starting amount of the miRNA, with lower Ct values indicating more starting material. Ct is therefore closer to a measure of absolute abundance, although it is still not a direct measure; it is the closest proxy available in the absence of an FPKM equivalent.

How is log2FC calculated? How does this measure differ from Ct?

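Ct is already on a log2 scale, because each amplification cycle roughly doubles the amount of product. The log2FC (log base 2 of the fold change) is therefore calculated as the difference in Ct between the two cell types, Ct(Cell Type 2) − Ct(Cell Type 1). A positive log2FC means that Cell Type 1 has higher expression of the given miRNA (a lower Ct), and a negative log2FC means that Cell Type 2 has higher expression. Each additional 1.0 log2FC unit corresponds to a further two-fold enrichment; for example, a log2FC of 3 indicates an eight-fold difference.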
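A minimal sketch of this calculation, assuming roughly 100% amplification efficiency so that one Ct unit equals one doubling. The Ct values are hypothetical.

```python
def log2_fold_change(ct_type1, ct_type2):
    """log2FC of Cell Type 1 versus Cell Type 2 from Ct values.

    Ct is already on a log2 scale, so log2FC is a plain difference.
    Subtracting in this order makes a positive value mean Cell Type 1
    has more of the miRNA (it crossed the threshold earlier).
    """
    return ct_type2 - ct_type1


lfc = log2_fold_change(ct_type1=21.0, ct_type2=24.0)
print(lfc, 2 ** lfc)  # 3.0 8.0
```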

Why are data only available for these four cell types? Will additional cell types be added in the future?

Only samples from these four cell types and two tissue types were run on microarrays from the same lot. Microarray results can only be analyzed together if the arrays were run side by side in this fashion, so no additional cell types will be added to the existing dataset. Future assays may be performed, or further datasets incorporated, to fill important gaps in website functionality. The original publication also included data from CNP-Cre (for oligodendrocytes); however, this was a secondary assay performed to verify the enrichment of motor neuron-enriched miRNAs and was not run on microarrays from the same lot.

What is the “not detectable” threshold?

If the Ct exceeded 35, the sample was considered not to express the given miRNA. To account for non-specific background associated with the myc immunoprecipitation, miRNA expression was also measured by an identical myc immunoprecipitation in non-transgenic animals. For a cell type to be considered to express a given miRNA above background, the biological replicate with the highest Ct value (lowest expression) had to be at least 2 Ct lower than the non-transgenic median Ct.
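The two detection rules can be expressed as a short check. This is a sketch, not the study's code: it assumes a cell type counts as expressing the miRNA only if every replicate passes the Ct 35 cutoff, and it reads "2 Ct less" as "at least 2 Ct lower". The replicate values are hypothetical.

```python
from statistics import median

ND_CT = 35.0        # Ct above this: sample treated as not expressing the miRNA
MIN_MARGIN = 2.0    # required margin below the non-transgenic median Ct


def is_detected(cell_type_cts, nontransgenic_cts):
    """Apply both detection rules to one miRNA in one cell type."""
    worst = max(cell_type_cts)  # highest Ct = lowest expression
    if worst > ND_CT:
        return False
    background = median(nontransgenic_cts)
    return worst <= background - MIN_MARGIN


background_cts = [31.0, 30.2, 32.4]  # hypothetical non-transgenic IPs, median 31.0
print(is_detected([26.0, 27.5, 26.8], background_cts))  # True  (27.5 <= 29.0)
print(is_detected([28.0, 29.5], background_cts))        # False (29.5 > 29.0)
```

Using the worst replicate makes the call conservative: one weak replicate is enough to mark the miRNA as not detectable above background.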

Questions about He et al. (2012), neuronal subtypes in neocortex and cerebellum

Why are some miRNA present in this dataset, but not in the neuron v glia dataset and vice versa? Why are the y-axes in different units?

While the neuron v glia dataset was generated using Taqman microarrays, the He et al. dataset was generated using small RNA sequencing. Sequencing can detect miRNAs that had no corresponding probes on the microarray. Conversely, He et al. contains data from different brain regions and only from neuronal cell types, so a miRNA may have no detected reads because it is expressed only in more caudal regions or in non-neuronal cell types. The difference in data generation also explains the difference in units: microarray data are quantified as Ct, similar to qPCR, whereas small RNA sequencing data are quantified here as CPM (counts per million).
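CPM scales each raw read count by the total number of reads in its library, so samples sequenced to different depths become comparable. The sketch below assumes the denominator is the total of the listed miRNA reads; the study may have used a different library-size definition, and the counts are hypothetical.

```python
def counts_per_million(counts):
    """Convert raw read counts for one library to CPM."""
    total = sum(counts.values())
    # Multiply before dividing so these integer examples stay exact.
    return {name: n * 1_000_000 / total for name, n in counts.items()}


# Hypothetical library: 500,000 miRNA reads in total.
library = {"miR-A": 400_000, "miR-B": 99_000, "miR-C": 1_000, "miR-D": 0}
print(counts_per_million(library))
# {'miR-A': 800000.0, 'miR-B': 198000.0, 'miR-C': 2000.0, 'miR-D': 0.0}
```

Even this small example spans 0 to hundreds of thousands of CPM, which is the range the graphs need to accommodate.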

Why do the graphs produced from this dataset have two y-axes? How is the y-axis determined for each miRNA?

Because the data were generated by small RNA sequencing, miRNA abundance is measured in CPM. CPM is a normalized read count and can therefore range from 0 to well into the hundreds of thousands. We therefore display this information primarily on a log scale, to capture as much of this range as possible. However, a log scale compresses differences and can make them appear negligible. For this reason, we also display the CPM for each sample below its corresponding bar and on the secondary y-axis. We also set the y-axis minimum to a value on the same order of magnitude as the read counts, rather than 0.0. These steps help to visualize differences between cell types.
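The FAQ does not state exactly how each minimum is chosen. One hypothetical rule consistent with the description is to use the largest power of ten at or below the smallest nonzero CPM; zero values are skipped because a log axis cannot include 0.

```python
import math


def log_axis_floor(values):
    """Largest power of ten at or below the smallest positive value.

    Hypothetical rule for a log-scale y-axis minimum; zeros are ignored
    because log(0) is undefined.
    """
    smallest = min(v for v in values if v > 0)
    return 10 ** math.floor(math.log10(smallest))


print(log_axis_floor([2000.0, 198000.0, 800000.0]))  # 1000
```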

What is the “not detectable” threshold?

A miRNA was considered to be expressed in a given sample if its CPM reliably exceeded 1. Background data from non-transgenic animals were not available for this dataset.