Merged
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -13,7 +13,7 @@ Description: The SplicingFactory R package uses transcript-level expression
transcript isoform diversity within samples or between conditions.
Additionally, the package analyzes the isoform diversity data, looking for
significant changes between conditions.
RoxygenNote: 7.1.1
RoxygenNote: 7.3.3
Imports: SummarizedExperiment, methods, stats
Suggests:
testthat,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ export(calculate_entropy)
export(calculate_gini)
export(calculate_inverse_simpson)
export(calculate_simpson)
export(calculate_tsallis_entropy)
import(methods)
import(stats)
importFrom(SummarizedExperiment,SummarizedExperiment)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
# SplicingFactory 1.3.2 (dev)

* Added Tsallis entropy as a diversity metric, with full documentation and examples.
* Users can now set the `q` parameter for Tsallis entropy in `calculate_diversity()` and `calculate_method()`.
* Documentation and vignette updated to reflect this new feature.

# SplicingFactory 1.3.1 (dev)

* Citation update
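
The Tsallis entropy metric named in the 1.3.2 notes can be sketched in base R as follows. This is a minimal illustration under the standard unnormalized definition; `tsallis_sketch` is a hypothetical helper, not the package's `calculate_tsallis_entropy()`, whose source is not part of this diff, and the natural logarithm used for the `q = 1` case is an assumption.

```r
# Hypothetical helper for illustration; not the package's calculate_tsallis_entropy().
# Unnormalized Tsallis entropy: S_q = (1 - sum(p^q)) / (q - 1), with the
# Shannon entropy -sum(p * log(p)) as the limit q -> 1.
tsallis_sketch <- function(x, q = 2) {
  stopifnot(is.numeric(x), length(q) == 1, q >= 0)
  total <- sum(x)
  if (total == 0) return(NA_real_)  # no expression: entropy is undefined
  p <- x[x > 0] / total             # frequencies of expressed transcripts
  if (q == 1) {
    -sum(p * log(p))                # Shannon limit (natural log assumed)
  } else {
    (1 - sum(p^q)) / (q - 1)
  }
}

expr <- c(10, 10, 0, 20)            # transcript-level expression for one gene
tsallis_sketch(expr, q = 0)         # number of expressed transcripts minus one
tsallis_sketch(expr, q = 1)         # Shannon entropy
tsallis_sketch(expr, q = 2)         # 1 - sum(p^2)
```

With a scalar `q`, each gene and sample pair yields a single value, which is what lets the result keep the same gene-by-sample shape as the other metrics.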
42 changes: 30 additions & 12 deletions R/calculate_diversity.R
@@ -6,8 +6,8 @@
#' input dataset with transcript-level expression values. The values in
#' \code{x} are grouped into genes based on this vector.
#' @param method Method to use for splicing diversity calculation, including
#' naive entropy (\code{naive}), Laplace entropy (\code{laplace}), Gini index
#' (\code{gini}), Simpson index (\code{simpson}) and inverse Simpson index
#' naive entropy (\code{naive}), Laplace entropy (\code{laplace}), Tsallis entropy (\code{tsallis}),
#' Gini index (\code{gini}), Simpson index (\code{simpson}) and inverse Simpson index
#' (\code{invsimpson}). The default method is Laplace entropy.
#' @param norm If \code{TRUE}, the entropy values are normalized to the number
#' of transcripts for each gene. The normalized entropy values are always
@@ -21,6 +21,11 @@
#' to use for diversity calculations.
#' @param verbose If \code{TRUE}, the function will print additional diagnostic
#' messages, besides the warnings and errors.
#' @param q Tsallis entropy parameter (q ≥ 0). Only used if \code{method = "tsallis"}.
#' Default is 2. Must be a single numeric value.
#' Tsallis entropy generalizes several diversity measures. Under the standard
#' unnormalized definition, (1 - sum(p^q)) / (q - 1), q = 0 gives the number of
#' expressed transcripts minus one (richness - 1), the limit q → 1 gives Shannon
#' entropy, and q = 2 gives 1 - sum(p^2) (the Gini-Simpson index).
#' @return Gene-level splicing diversity values in a \code{SummarizedExperiment}
#' object.
#' @import methods
@@ -35,7 +40,7 @@
#' diversity values for each gene in each sample. These diversity values can be
#' used to investigate the dominance of a specific transcript for a gene,
#' the diversity of transcripts in a gene, and analyze changes in diversity.
#'
#'
#' There are a number of diversity values implemented in the package. These
#' include the following:
#' \itemize{
@@ -44,6 +49,9 @@
#' values mean a more diverse set of transcripts for a gene.
#' \item Laplace entropy: Shannon entropy where the transcript frequencies are
#' replaced by a Bayesian estimate, using Laplace's prior.
#' \item Tsallis entropy: a generalization of Shannon entropy, parameterized by q (q ≥ 0).
#' q = 0 reflects transcript richness (the number of expressed transcripts minus one),
#' the limit q → 1 recovers Shannon entropy, and q = 2 corresponds to the
#' Gini-Simpson index. The default q is 2.
#' \item Gini index: a measure of statistical dispersion originally used in
#' economy. This measurement ranges from 0 (complete equality) to 1
#' (complete inequality). A value of 1 (complete inequality) means a single
@@ -73,7 +81,7 @@
#' # calculating normalized Laplace entropy
#' result <- calculate_diversity(x, gene, method = "laplace", norm = TRUE)
calculate_diversity <- function(x, genes = NULL, method = "laplace", norm = TRUE,
tpm = FALSE, assayno = 1, verbose = FALSE) {
tpm = FALSE, assayno = 1, verbose = FALSE, q = 2) {
if (!(is.matrix(x) || is.data.frame(x) || is.list(x) || is(x, "DGEList") ||
is(x, "RangedSummarizedExperiment") || is(x, "SummarizedExperiment"))) {
stop("Input data type is not supported! Please use `?calculate_diversity`
@@ -143,7 +151,7 @@ calculate_diversity <- function(x, genes = NULL, method = "laplace", norm = TRUE
stop("The number of rows is not equal to the given gene set.", call. = FALSE)
}

if (!(method %in% c("naive", "laplace", "gini", "simpson", "invsimpson"))) {
if (!(method %in% c("naive", "laplace", "tsallis", "gini", "simpson", "invsimpson"))) {
stop("Invalid method. Please use `?calculate_diversity` to see the possible
arguments and details.",
call. = FALSE
@@ -168,18 +176,28 @@ calculate_diversity <- function(x, genes = NULL, method = "laplace", norm = TRUE
have any effect on the calculation.", call. = FALSE)
}

result <- calculate_method(x, genes, method, norm, verbose = verbose)
result <- calculate_method(x, genes, method, norm, verbose = verbose, q = q)

# Prepare assay and row/col data
result_assay <- result[, -1, drop = FALSE]
rownames(result_assay) <- result[, 1]
result_rowData <- data.frame(genes = result[, 1], row.names = result[, 1])
result_colData <- data.frame(samples = colnames(x), row.names = colnames(x))

# For Tsallis with scalar q, columns correspond to samples only
col_ids <- colnames(x)
row_ids <- as.character(result[, 1])
result_colData <- data.frame(samples = col_ids, row.names = col_ids)
colnames(result_assay) <- col_ids
rownames(result_assay) <- row_ids

result_metadata <- list(method = method, norm = norm)
if (method == "tsallis") result_metadata$q <- q

result <- SummarizedExperiment(assays = list(diversity = result_assay),
rowData = result_rowData,
colData = result_colData,
metadata = result_metadata)
result <- SummarizedExperiment(
assays = list(diversity = result_assay),
rowData = result_rowData,
colData = result_colData,
metadata = result_metadata
)

return(result)
}
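
The assay preparation in `calculate_diversity()` depends on `drop = FALSE` to keep a two-dimensional result when there is only one sample column. A small base R sketch of that step follows; the `result` data frame, gene IDs, and values are made up for illustration and only mimic the shape returned by `calculate_method()` (gene IDs in the first column).

```r
# Hypothetical stand-in for the data.frame returned by calculate_method():
# first column holds gene IDs, the remaining columns hold per-sample values.
result <- data.frame(Gene = c("geneA", "geneB"),
                     sample1 = c(0.5, 0.0),
                     check.names = FALSE)

# Without drop = FALSE, a single sample column collapses to a plain vector.
collapsed <- result[, -1]

# With drop = FALSE, the assay stays a genes x samples data.frame.
result_assay <- result[, -1, drop = FALSE]
rownames(result_assay) <- as.character(result[, 1])

# Metadata records q only when the Tsallis method was requested.
method <- "tsallis"
result_metadata <- list(method = method, norm = TRUE)
if (method == "tsallis") result_metadata$q <- 2
```

Storing `q` in the metadata keeps the parameter alongside the values it produced, so downstream code can tell which Tsallis variant was computed.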
37 changes: 34 additions & 3 deletions R/calculate_method.R
@@ -6,13 +6,18 @@
#' input dataset with transcript-level expression values. The values in
#' \code{x} are grouped into genes based on this vector.
#' @param method Method to use for splicing diversity calculation, including
#' naive entropy (\code{naive}), Laplace entropy (\code{laplace}), Gini index
#' (\code{gini}), Simpson index (\code{simpson}) and inverse Simpson index
#' naive entropy (\code{naive}), Laplace entropy (\code{laplace}), Tsallis entropy (\code{tsallis}),
#' Gini index (\code{gini}), Simpson index (\code{simpson}) and inverse Simpson index
#' (\code{invsimpson}). The default method is Laplace entropy.
#' @param norm If \code{TRUE}, the entropy values are normalized to the number
#' of transcripts for each gene. The normalized entropy values are always
#' between 0 and 1. If \code{FALSE}, genes cannot be compared to each other,
#' due to possibly different maximum entropy values.
#' @param q Tsallis entropy parameter (q ≥ 0). Only used if \code{method = "tsallis"}.
#' Default is 2. Must be a single numeric value.
#' Tsallis entropy generalizes several diversity measures. Under the standard
#' unnormalized definition, (1 - sum(p^q)) / (q - 1), q = 0 gives the number of
#' expressed transcripts minus one (richness - 1), the limit q → 1 gives Shannon
#' entropy, and q = 2 gives 1 - sum(p^2) (the Gini-Simpson index).
#' @param verbose If \code{TRUE}, the function will print additional diagnostic
#' messages, besides the warnings and errors.
#' @return Gene-level splicing diversity values in a \code{data.frame}, where
@@ -23,7 +28,8 @@
#' transcript-level expression values, aggregated by the genes defined in the
#' \code{genes} parameter.
#' @import stats
calculate_method <- function(x, genes, method, norm = TRUE, verbose = FALSE) {
calculate_method <- function(x, genes, method, norm = TRUE, verbose = FALSE, q = 2) {

if (method == "naive") {
x <- aggregate(x, by = list(genes), calculate_entropy, norm = norm)
}
@@ -33,6 +39,31 @@ calculate_method <- function(x, genes, method, norm = TRUE, verbose = FALSE) {
pseudocount = 1)
}

if (method == "tsallis") {
# Note: q must be a scalar value (required for statistical testing)
# calculate_tsallis_entropy enforces length(q) == 1
gene_levels <- unique(genes)
coln <- colnames(x)
rown <- gene_levels
tsallis_row <- function(gene) {
idx <- which(genes == gene)
sapply(seq_len(ncol(x)), function(j) {
calculate_tsallis_entropy(x[idx, j], q = q, norm = norm)
})
}
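# vapply() returns a plain vector when x has a single column, and t() of that
# vector would be a 1 x n_genes matrix; build the gene-by-sample matrix explicitly.
result_mat <- matrix(vapply(gene_levels, tsallis_row, FUN.VALUE = numeric(ncol(x))),
nrow = length(gene_levels), ncol = ncol(x), byrow = TRUE)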
colnames(result_mat) <- coln
rownames(result_mat) <- rown
out_df <- data.frame(Gene = rown, result_mat, check.names = FALSE)
if (all(rowSums(!is.na(result_mat)) == 0)) {
out_df <- data.frame(Gene=character(0))
for (nm in coln) out_df[[nm]] <- numeric(0)
x <- out_df
return(x)
}
x <- out_df
}

if (method == "gini") {
x <- aggregate(x, by = list(genes), calculate_gini)
}
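
A per-gene, per-sample Tsallis calculation can also be written so that the result is always a genes x samples matrix, including when there is only one sample. The sketch below is a base R illustration under assumptions: `tsallis_q` and `tsallis_by_gene` are hypothetical names, the entropy is the unnormalized form, and the package's own `calculate_tsallis_entropy()` may differ in normalization and edge-case handling.

```r
# Hypothetical unnormalized Tsallis entropy for one gene in one sample.
tsallis_q <- function(x, q = 2) {
  if (sum(x) == 0) return(NA_real_)
  p <- x[x > 0] / sum(x)
  if (q == 1) -sum(p * log(p)) else (1 - sum(p^q)) / (q - 1)
}

# Hypothetical helper: one row per gene, one column per sample.
tsallis_by_gene <- function(x, genes, q = 2) {
  x <- as.matrix(x)
  gene_levels <- unique(genes)
  rows <- lapply(gene_levels, function(g) {
    # drop = FALSE keeps a matrix even for single-transcript genes
    # or single-sample inputs, so apply() always iterates over columns.
    apply(x[genes == g, , drop = FALSE], 2, tsallis_q, q = q)
  })
  out <- do.call(rbind, rows)
  dimnames(out) <- list(gene_levels, colnames(x))
  out
}

genes <- c("g1", "g1", "g2", "g2")
one_sample <- matrix(c(10, 10, 0, 5), ncol = 1, dimnames = list(NULL, "s1"))
two_samples <- cbind(s1 = c(10, 10, 0, 5), s2 = c(1, 3, 2, 2))

res1 <- tsallis_by_gene(one_sample, genes, q = 2)
res2 <- tsallis_by_gene(two_samples, genes, q = 2)
```

Binding per-gene rows with `rbind` avoids depending on the shape that `vapply()` or `sapply()` happens to return, which differs between one and several sample columns.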