Title: Flexible Inference via Permutations in R
Description: A flexible permutation framework for making inference, such as point estimation, confidence intervals or hypothesis testing, on any kind of data, be it univariate, multivariate, or more complex such as network-valued data, topological data, functional data or density-valued data.
Authors: Alessia Pini [aut], Aymeric Stamm [aut, cre], Simone Vantini [aut], Juliette Chiapello [ctb]
Maintainer: Aymeric Stamm <[email protected]>
License: GPL (>= 3)
Version: 0.3.3.9000
Built: 2024-10-31 05:52:05 UTC
Source: https://github.com/lmjl-alea/flipr
This function carries out a hypothesis test in which the null hypothesis is that the K samples are governed by the same underlying generative probability distribution, against the alternative hypothesis that they are governed by different generative probability distributions.
anova_test( data, memberships, stats = list(stat_anova_f_ip), B = 1000L, M = NULL, alternative = "right_tail", combine_with = "tippett", type = "exact", seed = NULL, ... )
data: A numeric vector, a numeric matrix or a list specifying the pooled data points. Alternatively, it can be a distance matrix stored as an object of class stats::dist.
memberships: An integer vector specifying the original membership of each data point.
stats: A list of functions, produced for instance by use_stat(), specifying the test statistics to be used. Defaults to list(stat_anova_f_ip).
B: The number of sampled permutations. Defaults to 1000L.
M: The total number of possible permutations. Defaults to NULL.
alternative: A single string or a character vector specifying whether the p-value is right-tailed, left-tailed or two-tailed. Choices are "right_tail", "left_tail" and "two_tail". Defaults to "right_tail".
combine_with: A string specifying the combining function to be used to compute the single test statistic value from the set of p-value estimates obtained during the non-parametric combination testing procedure. For now, choices are either "tippett" or "fisher". Defaults to "tippett".
type: A string specifying which formula should be used to compute the p-value. Defaults to "exact"; the usual formulas are sketched in the note after this argument list.
seed: An integer specifying the seed of the random generator, useful for result reproducibility or method comparisons. Defaults to NULL.
...: Extra parameters specific to some statistics.
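As a point of reference, the snippet below sketches the two permutation p-value formulas most commonly encountered in the literature (right-tailed case). It illustrates the usual definitions only, not necessarily the exact code used internally by flipr; all object names are made up for the example.

# Illustrative sketch of the two usual permutation p-value formulas.
set.seed(123)
B <- 1000L
t_perm <- rnorm(B)  # stand-in for statistics computed on permuted memberships
t_obs <- 1.8        # stand-in for the statistic computed on the original memberships
p_exact <- (1 + sum(t_perm >= t_obs)) / (B + 1)  # leads to exact tests
p_estimate <- sum(t_perm >= t_obs) / B           # unbiased estimate of the traditional p-value
c(exact = p_exact, estimate = p_estimate)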
A base::list with four components:
observed: the value of the (possibly combined) test statistic(s) using the original memberships of the data points;
pvalue: the permutation p-value;
null_distribution: the values of the (possibly combined) test statistic(s) using the permuted memberships of the data points;
permutations: the permutations that were effectively sampled to produce the null distribution.
A user-specified test statistic function should have at least two arguments:
the first argument should be either a list of the n pooled data points or a dissimilarity matrix stored as a stats::dist object;
the second argument should be an integer vector specifying the (possibly permuted) membership of each data point.
See the stat_anova_f() function for an example.
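As an illustration, here is a minimal sketch of a custom statistic following this interface. The function name stat_between_ss and the statistic itself (the between-group sum of squares for univariate data) are chosen purely for this example and are not part of flipr.

# Hypothetical statistic compatible with anova_test(): the first argument is
# the list of pooled (univariate) data points, the second argument holds the
# (possibly permuted) memberships. Returns the between-group sum of squares.
stat_between_ss <- function(data, memberships, ...) {
  values <- unlist(data)
  grand_mean <- mean(values)
  group_means <- tapply(values, memberships, mean)
  group_sizes <- tapply(values, memberships, length)
  sum(group_sizes * (group_means - grand_mean)^2)
}

# Usage sketch, mirroring the example below:
# anova_test(
#   data = chickwts$weight,
#   memberships = chickwts$feed,
#   stats = list(stat_between_ss)
# )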
out1 <- anova_test(
  data = dist(chickwts$weight),
  memberships = chickwts$feed,
  stats = list(stat_anova_f_ip)
)
out1$pvalue

out2 <- anova_test(
  data = chickwts$weight,
  memberships = chickwts$feed,
  stats = list(stat_anova_f)
)
out2$pvalue
This is a collection of functions providing test statistics to be used within the permutation scheme for performing (M)ANOVA. These test statistics fall into two categories: traditional statistics that use empirical moments, and inter-point statistics that only rely on pairwise dissimilarities between data points.
stat_anova_f(data, memberships, ...) stat_anova_f_ip(data, memberships, ...)
data: Either a list of the n pooled data points or a dissimilarity matrix stored as a stats::dist object.
memberships: An integer vector specifying the membership of each data point.
...: Extra parameters specific to some statistics.
A numeric value storing the value of the test statistic given the (possibly permuted) memberships specified by memberships.

stat_anova_f() implements the F statistic used in traditional (M)ANOVA.

stat_anova_f_ip() implements a pseudo-F statistic based on inter-point distances only, as described in Shinohara et al. (2020).
Chambers, J. M., Freeny, A. and Heiberger, R. M. (1992). Analysis of variance; designed experiments. Chapter 5 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie. Wadsworth & Brooks/Cole.
Krzanowski, W. J. (1988). Principles of Multivariate Analysis: A User's Perspective. Oxford.
Hand, D. J. and Taylor, C. C. (1987). Multivariate Analysis of Variance and Repeated Measures. Chapman and Hall.
Shinohara, R. T., et al. (2020). Distance-based analysis of variance for brain connectivity. Biometrics, 76(1), 257-269.
npk2 <- npk
npk2$foo <- rnorm(24)
n <- nrow(npk2)
data1 <- purrr::array_tree(npk2$yield, margin = 1)
stat_anova_f(data1, npk2$block)
data2 <- purrr::array_tree(cbind(npk2$yield, npk2$foo), margin = 1)
stat_anova_f(data2, npk2$block)
D <- dist(cbind(npk2$yield, npk2$foo))
stat_anova_f_ip(D, npk2$block)
Biregular grids can be created for any number of parameter objects.
grid_biregular( x, ..., center = NULL, levels = 3, original = TRUE, filter = NULL )
x: A param object from the dials package, a list of param objects, or a parameters set.
...: One or more additional param objects.
center: A numeric vector specifying the point onto which the biregular grid should be centered. Defaults to NULL.
levels: An integer for the number of values of each parameter to use to make the regular grid. Defaults to 3.
original: A logical: should the parameters be in the original units or in the transformed space (if any)? Defaults to TRUE.
filter: A logical: should the parameters be filtered prior to generating the grid? Must be a single expression referencing parameter names that evaluates to a logical vector. Defaults to NULL.
Note that there may be a difference in grids depending on how the function is called. If the call uses the parameter objects directly, the possible ranges come from the objects in dials. For example:

mixture()
## Proportion of Lasso Penalty (quantitative)
## Range: [0, 1]

set.seed(283)
mix_grid_1 <- grid_random(mixture(), size = 1000)
range(mix_grid_1$mixture)
## [1] 0.001490161 0.999741096
However, in some cases, the parsnip and recipes packages override the default ranges for specific models and preprocessing steps. If the grid function uses a parameters object created from a model or recipe, the ranges may have different defaults (specific to those models). Using the example above, the mixture argument is different for glmnet models:

library(parsnip)
library(tune)
# When used with glmnet, the range is [0.05, 1.00]
glmn_mod <- linear_reg(mixture = tune()) %>%
  set_engine("glmnet")
set.seed(283)
mix_grid_2 <- grid_random(extract_parameter_set_dials(glmn_mod), size = 1000)
range(mix_grid_2$mixture)
## [1] 0.05141565 0.99975404
A tibble. There are columns for each parameter and a row for every parameter combination.
grid_biregular(dials::mixture(), center = 0.2)
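To sketch how this extends to several parameters, a grid might be centered over two dials parameters as follows. The specific parameters (mixture and threshold) and the center values are chosen purely for illustration and are not taken from the flipr documentation.

# Hypothetical two-parameter call: a biregular grid centered on
# (mixture = 0.5, threshold = 0.5); both parameters come from dials.
grid_biregular(
  dials::mixture(),
  dials::threshold(),
  center = c(0.5, 0.5)
)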
This function carries out a hypothesis test where the null hypothesis is that the sample is governed by a generative probability distribution that is centered and symmetric, against the alternative hypothesis that it is governed by a probability distribution that is either not centered or not symmetric.
one_sample_test( x, stats = list(stat_max), B = 1000L, M = NULL, alternative = "two_tail", combine_with = "tippett", type = "exact", seed = NULL, ... )
x: A numeric vector, a numeric matrix or a list representing the sample from which the user wants to make inference.
stats: A list of functions, produced for instance by use_stat(), specifying the test statistics to be used. Defaults to list(stat_max).
B: The number of sampled permutations. Defaults to 1000L.
M: The total number of possible permutations. Defaults to NULL.
alternative: A single string or a character vector specifying whether the p-value is right-tailed, left-tailed or two-tailed. Choices are "right_tail", "left_tail" and "two_tail". Defaults to "two_tail".
combine_with: A string specifying the combining function to be used to compute the single test statistic value from the set of p-value estimates obtained during the non-parametric combination testing procedure. For now, choices are either "tippett" or "fisher". Defaults to "tippett".
type: A string specifying which formula should be used to compute the p-value. Defaults to "exact".
seed: An integer specifying the seed of the random generator, useful for result reproducibility or method comparisons. Defaults to NULL.
...: Extra parameters specific to some statistics.
A list with three components: the value of the statistic for the original sample, the p-value of the resulting permutation test, and a numeric vector storing the values of the permuted statistics.
A user-specified test statistic function should have at least two arguments:
the first argument is data, which should be a list of the n observations from the sample;
the second argument is flips, which should be an integer vector giving the signs by which each observation in data should be multiplied.
It is possible to use the use_stat() function with nsamples = 1 to have flipr automatically generate a template file for writing your own test statistics in a way that makes them compatible with the flipr framework.
See the stat_max() function for an example.
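For illustration, here is a minimal sketch of a custom one-sample statistic following this interface. The name stat_flipped_mean and the statistic (the absolute mean of the sign-flipped observations) are chosen purely for this example and are not part of flipr.

# Hypothetical one-sample statistic compatible with one_sample_test():
# `data` is a list of the n observations, `flips` a vector of +1/-1 signs.
stat_flipped_mean <- function(data, flips, ...) {
  values <- unlist(data)
  abs(mean(flips * values))
}

# Usage sketch:
# x <- rnorm(10)
# one_sample_test(x, stats = list(stat_flipped_mean), B = 100L)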
n <- 10L
mu <- 3
sigma <- 1

# Sample under the null distribution
x1 <- rnorm(n = n, mean = 0, sd = sigma)
t1 <- one_sample_test(x1, B = 100L)
t1$pvalue

# Sample under some alternative distribution
x2 <- rnorm(n = n, mean = mu, sd = sigma)
t2 <- one_sample_test(x2, B = 100L)
t2$pvalue
This is a collection of functions providing test statistics to be used within the permutation scheme for performing one-sample testing.
stat_max(data, flips, ...)
data: A list storing the sample from which the user wants to make inference.
flips: A numeric vector of the same length as data, with entries equal to +1 or -1, specifying the signs by which each observation in data should be multiplied.
...: Extra parameters specific to some statistics.
A numeric value evaluating the desired test statistic.
n <- 10
x <- as.list(rnorm(n))
flips <- sample(c(-1, 1), n, replace = TRUE)
stat_max(x, flips)
A plausibility function is...
nparams: An integer specifying the number of parameters to be inferred. Default is 1L.
nperms: An integer specifying the number of permutations to be sampled. Default is 1000L.
nperms_max: An integer specifying the total number of distinct permutations that can be made given the sample sizes.
alternative: A string specifying the type of alternative hypothesis. Choices are "two_tail", "left_tail" and "right_tail". Defaults to "two_tail".
aggregator: A string specifying which function should be used to aggregate test statistic values when non-parametric combination is used (i.e. when multiple test statistics are used). Choices are "tippett" and "fisher" for now. Defaults to "tippett".
pvalue_formula: A string specifying which formula to use for computing the permutation p-value. Choices are either "probability" (default) or "estimator". The former provides p-values that lead to exact hypothesis tests while the latter provides an unbiased estimate of the traditional p-value.
max_conf_level: A numeric value specifying the maximum confidence level that we aim to achieve for the confidence regions. This is used to compute bounds on each parameter of interest in order to fit a Kriging model that approximates the expensive plausibility function on a hypercube. Defaults to 0.99.
point_estimate: A numeric vector providing point estimates for the parameters of interest.
parameters: A list of functions of class param produced via new_quant_param that stores the parameters to be inferred along with important properties such as their name, range, etc. Defaults to NULL.
grid: A tibble storing evaluations of the plausibility function on a regular centered grid of the parameter space. Defaults to NULL.
new(): Create a new plausibility function object.

PlausibilityFunction$new(null_spec, stat_functions, stat_assignments, ..., seed = NULL)

null_spec: A function or an R object coercible into a function (via rlang::as_function()). For one-sample problems, it should transform the x sample (provided as first argument) using the parameters (as second argument) to make its distribution centered symmetric. For two-sample problems, it should transform the y sample (provided as first argument) using the parameters (as second argument) to make it exchangeable with the x sample under a null hypothesis.
stat_functions: A vector or list of functions (or R objects coercible into functions via rlang::as_function()) specifying the whole set of test statistics that should be used.
stat_assignments: A named list of integer vectors specifying which test statistic should be associated with each parameter. The length of this list should match the number of parameters under investigation and is thus used to set it. Each element of the list should be named after the parameter it identifies.
...: Vectors, matrices or lists providing the observed samples.
seed: A numeric value specifying the seed to be used. Defaults to NULL, in which case seed = 1234 is used and the user is informed of this setting.

Returns a new PlausibilityFunction object.
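Below is a minimal sketch of constructing a plausibility function for a one-sample mean problem, complementing the two-sample examples used for the methods that follow. It assumes the one-sample interface described above, with the built-in stat_max statistic; shifting the sample by parameters[1] makes it centered under the null.

# Sketch: plausibility function for the mean of a single sample.
x <- rnorm(10, mean = 1)
null_spec <- function(x, parameters) {purrr::map(x, ~ .x - parameters[1])}
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = list(stat_max),
  stat_assignments = list(mean = 1),
  x
)
pf$set_nperms(50)
pf$get_value(0)  # plausibility (p-value) of the value 0 for the mean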
set_nperms(): Change the value of the nperms field.

PlausibilityFunction$set_nperms(val)

val: New value for the number of permutations to be sampled.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$nperms
pf$set_nperms(10000)
pf$nperms
set_nperms_max(): Change the value of the nperms_max field.

PlausibilityFunction$set_nperms_max(val)

val: New value for the total number of possible distinct permutations.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$nperms_max
pf$set_nperms_max(10000)
pf$nperms_max
set_alternative(): Change the value of the alternative field.

PlausibilityFunction$set_alternative(val)

val: New value for the type of alternative hypothesis.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$alternative
pf$set_alternative("right_tail")
pf$alternative
set_aggregator(): Change the value of the aggregator field.

PlausibilityFunction$set_aggregator(val)

val: New value for the string specifying which function should be used to aggregate test statistic values when non-parametric combination is used (i.e. when multiple test statistics are used).
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$aggregator
pf$set_aggregator("fisher")
pf$aggregator
set_pvalue_formula(): Change the value of the pvalue_formula field.

PlausibilityFunction$set_pvalue_formula(val)

val: New value for the string specifying which formula should be used to compute the permutation p-value.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$pvalue_formula
pf$set_pvalue_formula("estimate")
pf$pvalue_formula
get_value(): Computes an indicator of the plausibility of specific values for the parameters of interest, in the form of a p-value of a hypothesis test against these values.

PlausibilityFunction$get_value(parameters, keep_null_distribution = FALSE, keep_permutations = FALSE, ...)

parameters: A vector whose length should match the nparams field, providing specific values of the parameters of interest for assessment of their plausibility in the form of a p-value of the corresponding hypothesis test.
keep_null_distribution: A boolean specifying whether the empirical permutation null distribution should be returned as well. Defaults to FALSE.
keep_permutations: A boolean specifying whether the list of sampled permutations used to compute the empirical permutation null distribution should be returned as well. Defaults to FALSE.
...: Extra parameters specific to some statistics.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(50)
pf$get_value(2)
set_max_conf_level(): Change the value of the max_conf_level field.

PlausibilityFunction$set_max_conf_level(val)

val: New value for the maximum confidence level that we aim to achieve for the confidence regions.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$max_conf_level
pf$set_max_conf_level(0.999)
pf$max_conf_level
set_point_estimate(): Change the value of the point_estimate field.

PlausibilityFunction$set_point_estimate(point_estimate = NULL, lower_bound = -10, upper_bound = 10, ncores = 1L, estimate = FALSE, overwrite = FALSE)

point_estimate: A numeric vector providing rough point estimates for the parameters under investigation.
lower_bound: A scalar or numeric vector specifying the lower bounds for each parameter under investigation. If it is a scalar, the value is used as lower bound for all parameters. Defaults to -10.
upper_bound: A scalar or numeric vector specifying the upper bounds for each parameter under investigation. If it is a scalar, the value is used as upper bound for all parameters. Defaults to 10.
ncores: An integer specifying the number of cores to use for maximizing the plausibility function to get a point estimate of the parameters. Defaults to 1L.
estimate: A boolean specifying whether the rough point estimate provided by point_estimate should serve as initial point for maximizing the plausibility function (estimate = TRUE) or as final point estimate for the parameters (estimate = FALSE). Defaults to FALSE.
overwrite: A boolean specifying whether to force the computation if it has already been set. Defaults to FALSE.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$point_estimate
pf$set_point_estimate(mean(y) - mean(x))
pf$point_estimate
set_parameter_bounds(): Change the value of the parameters field. Updates the range of the parameters under investigation.

PlausibilityFunction$set_parameter_bounds(point_estimate, conf_level)

point_estimate: A numeric vector providing a point estimate for each parameter under investigation. If no estimator is known by the user, (s)he can resort to the $set_point_estimate() method to get a point estimate by maximizing the plausibility function.
conf_level: A numeric value specifying the confidence level to be used for setting parameter bounds. It should be in (0,1).
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(50)
pf$set_point_estimate(point_estimate = mean(y) - mean(x))
pf$parameters
pf$set_parameter_bounds(point_estimate = pf$point_estimate, conf_level = 0.8)
pf$parameters
set_grid(): Computes a tibble storing a regular centered grid of the parameter space.

PlausibilityFunction$set_grid(parameters, npoints = 20L)

parameters: A list of new_quant_param objects containing information about the parameters under investigation. It should contain the fields point_estimate and range.
npoints: An integer specifying the number of points to discretize each dimension. Defaults to 20L.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(50)
pf$set_point_estimate(mean(y) - mean(x))
pf$set_parameter_bounds(point_estimate = pf$point_estimate, conf_level = 0.8)
pf$set_grid(parameters = pf$parameters, npoints = 2L)
evaluate_grid(): Updates the grid field with a pvalue column storing evaluations of the plausibility function on the regular centered grid of the parameter space.

PlausibilityFunction$evaluate_grid(grid, ncores = 1L)

grid: A tibble storing a grid that spans the space of parameters under investigation.
ncores: An integer specifying the number of cores to run evaluations in parallel. Defaults to 1L.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(50)
pf$set_point_estimate(mean(y) - mean(x))
pf$set_parameter_bounds(point_estimate = pf$point_estimate, conf_level = 0.8)
pf$set_grid(parameters = pf$parameters, npoints = 2L)
pf$evaluate_grid(grid = pf$grid)
clone(): The objects of this class are cloneable with this method.

PlausibilityFunction$clone(deep = FALSE)

deep: Whether to make a deep clone.
This function plots the plausibility function for up to two parameters of interest.
plot_pf(pf, alpha = 0.05, ngrid = 10, ncores = 1, subtitle = "")
pf: A PlausibilityFunction object.
alpha: A numeric value specifying a significance level to contrast the plausibility function against. Defaults to 0.05.
ngrid: An integer specifying the grid size on which the plausibility function will be evaluated. Defaults to 10.
ncores: An integer specifying the number of cores to use for parallelized computations. Defaults to 1.
subtitle: A string specifying a subtitle for the plot. Defaults to "" (no subtitle).
A ggplot object.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
null_spec <- function(y, parameters) {purrr::map(y, ~ .x - parameters[1])}
stat_functions <- list(stat_t)
stat_assignments <- list(mean = 1)
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(50)
pf$set_point_estimate(mean(y) - mean(x))
pf$set_parameter_bounds(point_estimate = pf$point_estimate, conf_level = 0.8)
pf$set_grid(parameters = pf$parameters, npoints = 2L)
pf$evaluate_grid(grid = pf$grid)
plot_pf(pf)
This function carries out a hypothesis test in which the null hypothesis is that the two samples are governed by the same underlying generative probability distribution, against the alternative hypothesis that they are governed by two different generative probability distributions.
two_sample_test( x, y, stats = list(stat_t), B = 1000L, M = NULL, alternative = "two_tail", combine_with = "tippett", type = "exact", seed = NULL, ... )
x: A numeric vector, a numeric matrix or a list representing the first sample. Alternatively, it can be a distance matrix stored as an object of class stats::dist.
y: A numeric vector if x is a numeric vector, a numeric matrix if x is a numeric matrix, or a list if x is a list, representing the second sample.
stats: A list of functions, produced for instance by use_stat(), specifying the test statistics to be used. Defaults to list(stat_t).
B: The number of sampled permutations. Defaults to 1000L.
M: The total number of possible permutations. Defaults to NULL.
alternative: A single string or a character vector specifying whether the p-value is right-tailed, left-tailed or two-tailed. Choices are "right_tail", "left_tail" and "two_tail". Defaults to "two_tail".
combine_with: A string specifying the combining function to be used to compute the single test statistic value from the set of p-value estimates obtained during the non-parametric combination testing procedure. For now, choices are either "tippett" or "fisher". Defaults to "tippett".
type: A string specifying which formula should be used to compute the p-value. Defaults to "exact".
seed: An integer specifying the seed of the random generator, useful for result reproducibility or method comparisons. Defaults to NULL.
...: Extra parameters specific to some statistics.
A list with three components: the value of the statistic for the original two samples, the p-value of the resulting permutation test, and a numeric vector storing the values of the permuted statistics.
A user-specified test statistic function should have at least two arguments:
the first argument is data, which should be a list of the n1 + n2 concatenated observations, with the original n1 observations from the first sample on top and the original n2 observations from the second sample below;
the second argument is perm_data, which should be an integer vector giving the indices in data that are considered to belong to the first sample.
It is possible to use the use_stat() function with nsamples = 2 to have flipr automatically generate a template file for writing your own test statistics in a way that makes them compatible with the flipr framework.
See the stat_t() function for an example.
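As an illustration, here is a minimal sketch of a custom two-sample statistic following this interface. The name stat_median_diff and the statistic (the absolute difference of sample medians) are chosen purely for this example and are not part of flipr.

# Hypothetical two-sample statistic compatible with two_sample_test():
# `data` is the list of n1 + n2 pooled observations and the second argument
# gives the indices currently assigned to the first sample.
stat_median_diff <- function(data, indices1, ...) {
  values <- unlist(data)
  indices2 <- seq_along(values)[-indices1]
  abs(median(values[indices1]) - median(values[indices2]))
}

# Usage sketch:
# x <- rnorm(10)
# y <- rnorm(10, mean = 2)
# two_sample_test(x, y, stats = list(stat_median_diff), B = 100L)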
n <- 10L
mx <- 0
sigma <- 1

# Two different models for the two populations
x <- rnorm(n = n, mean = mx, sd = sigma)
delta <- 10
my <- mx + delta
y <- rnorm(n = n, mean = my, sd = sigma)
t1 <- two_sample_test(x, y)
t1$pvalue

# Same model for the two populations
x <- rnorm(n = n, mean = mx, sd = sigma)
delta <- 0
my <- mx + delta
y <- rnorm(n = n, mean = my, sd = sigma)
t2 <- two_sample_test(x, y)
t2$pvalue
This is a collection of functions providing test statistics to be used within the permutation scheme for performing two-sample testing. These test statistics fall into two categories: traditional statistics that use empirical moments, and inter-point statistics that only rely on pairwise dissimilarities between data points.
stat_welch(data, indices1, ...)
stat_student(data, indices1, ...)
stat_t(data, indices1, ...)
stat_fisher(data, indices1, ...)
stat_f(data, indices1, ...)
stat_mean(data, indices1, ...)
stat_hotelling(data, indices1, ...)
stat_bs(data, indices1, ...)
stat_student_ip(data, indices1, ...)
stat_t_ip(data, indices1, ...)
stat_fisher_ip(data, indices1, ...)
stat_f_ip(data, indices1, ...)
stat_bg_ip(data, indices1, ...)
stat_energy_ip(data, indices1, alpha = 1L, ...)
stat_cq_ip(data, indices1, ...)
stat_mod_ip(data, indices1, ...)
stat_dom_ip(data, indices1, standardize = TRUE, ...)
data: Either a list of the n1 + n2 pooled data points or a dissimilarity matrix stored as a stats::dist object.
indices1: An integer vector specifying the indices in data that are assigned to the first sample.
...: Extra parameters specific to some statistics.
alpha: A scalar value specifying the power to which the dissimilarities should be elevated in the computation of the inter-point energy statistic. Defaults to 1L.
standardize: A boolean specifying whether the distance between medoids in the stat_dom_ip() statistic should be standardized by the pooled corresponding variances. Defaults to TRUE.
A real scalar giving the value of the test statistic for the permutation specified by the integer vector indices1.

stat_hotelling() implements Hotelling's T^2 statistic for multivariate data.
stat_student() or stat_t() implements Student's t statistic (originally assuming equal variances and thus using the pooled empirical variance estimator). See t.test() for details.

stat_welch() implements the Student-Welch statistic, which is essentially a modification of Student's statistic accounting for unequal variances. See t.test() for details.

stat_fisher() or stat_f() implements Fisher's variance ratio statistic. See var.test() for details.

stat_mean() implements a statistic that computes the difference between the means.

stat_bs() implements the statistic proposed by Bai & Saranadasa (1996) for high-dimensional multivariate data.

stat_student_ip() or stat_t_ip() implements a Student-like test statistic based on inter-point distances only, as described in Lovato et al. (2020).

stat_fisher_ip() or stat_f_ip() implements a Fisher-like test statistic based on inter-point distances only, as described in Lovato et al. (2020).

stat_bg_ip() implements the statistic proposed by Biswas & Ghosh (2014).

stat_energy_ip() implements the class of energy-based statistics described in Székely & Rizzo (2013).

stat_cq_ip() implements the statistic proposed by Chen & Qin (2010).

stat_mod_ip() implements a statistic that computes the mean of inter-point distances.

stat_dom_ip() implements a statistic that computes the distance between the medoids of the two samples, possibly standardized by the pooled corresponding variances.
Bai, Z., & Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 311-329.
Lovato, I., Pini, A., Stamm, A., & Vantini, S. (2020). Model-free two-sample test for network-valued data. Computational Statistics & Data Analysis, 144, 106896.
Biswas, M., & Ghosh, A. K. (2014). A nonparametric two-sample test applicable to high dimensional data. Journal of Multivariate Analysis, 123, 160-171.
Székely, G. J., & Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of statistical planning and inference, 143(8), 1249-1272.
Chen, S. X., & Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2), 808-835.
n <- 10L
mx <- 0
sigma <- 1
delta <- 10
my <- mx + delta
x <- rnorm(n = n, mean = mx, sd = sigma)
y <- rnorm(n = n, mean = my, sd = sigma)
D <- dist(c(x, y))
x <- as.list(x)
y <- as.list(y)
stat_welch(c(x, y), 1:n)
stat_t(c(x, y), 1:n)
stat_f(c(x, y), 1:n)
stat_mean(c(x, y), 1:n)
stat_hotelling(c(x, y), 1:n)
stat_bs(c(x, y), 1:n)
stat_t_ip(D, 1:n)
stat_f_ip(D, 1:n)
stat_bg_ip(D, 1:n)
stat_energy_ip(D, 1:n)
stat_cq_ip(D, 1:n)
stat_mod_ip(D, 1:n)
stat_dom_ip(D, 1:n)
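Extra arguments of specific statistics can be supplied directly, as in the self-contained sketch below; the value alpha = 0.5 is chosen purely for illustration.

# Sketch: setting the extra `alpha` argument of the inter-point energy statistic.
x <- rnorm(10)
y <- rnorm(10, mean = 2)
D <- dist(c(x, y))
stat_energy_ip(D, 1:10, alpha = 0.5)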
This function is a helper to automatically generate an .R file populated with a skeleton of a typical test statistic function compatible with flipr.
use_stat(nsamples = 1, stat_name = "mystat")
nsamples: An integer specifying the number of samples to be used. Defaults to 1.
stat_name: A string specifying the name of the test statistic that is being implemented. Defaults to "mystat".
Creates a dedicated .R file with a template of code for the function that implements the test statistic and saves it to the R/ folder of your package.
## Not run:
use_stat()
## End(Not run)
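To give a sense of what a hand-written statistic following the two-sample interface looks like, a skeleton might resemble the sketch below. This is a hypothetical illustration only; the actual file generated by use_stat(nsamples = 2, stat_name = "mystat") may differ in naming and content.

# Hypothetical skeleton of a two-sample test statistic.
stat_mystat <- function(data, indices1, ...) {
  indices2 <- seq_along(data)[-indices1]
  x <- data[indices1]
  y <- data[indices2]
  # Compute and return a single numeric value from x and y, for instance:
  abs(mean(unlist(x)) - mean(unlist(y)))
}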