Package 'GBJ' reference manual

Title:	Generalized Berk-Jones Test for Set-Based Inference in Genetic Association Studies
Description:	Offers the Generalized Berk-Jones (GBJ) test for set-based inference in genetic association studies. The GBJ is designed as an alternative to tests such as Berk-Jones (BJ), Higher Criticism (HC), Generalized Higher Criticism (GHC), Minimum p-value (minP), and Sequence Kernel Association Test (SKAT). All of these other methods (except for SKAT) are also implemented in this package, and we additionally provide an omnibus test (OMNI) which integrates information from each of the tests. The GBJ has been shown to outperform other tests in genetic association studies when signals are correlated and moderately sparse. Please see the vignette for a quickstart guide or Sun and Lin (2017) <arXiv:1710.02469> for more details.
Authors:	Ryan Sun [aut, cre]
Maintainer:	Ryan Sun <[email protected]>
License:	GPL-3
Version:	0.5.4
Built:	2025-03-29 03:46:30 UTC
Source:	https://github.com/cran/GBJ

BJ.R

Description

Calculate the Berk-Jones test statistic and p-value.

Usage

BJ(test_stats, cor_mat = NULL, pairwise_cors = NULL)
BJ(test_stats, cor_mat = NULL, pairwise_cors = NULL)

Arguments

`test_stats`	Vector of test statistics for each factor in the set (i.e. marginal test statistic for each SNP in a gene).
`cor_mat`	d*d matrix of the correlations between all the test statistics in the set, where d is the total number of test statistics in the set. You only need to specify EITHER cor_mat OR pairwise_cors.
`pairwise_cors`	A vector of all d(d-1)/2 pairwise correlations between the test statistics. You only need to specify EITHER cor_mat OR pairwise_cors.

Value

A list with the elements:

`BJ`	The observed Berk-Jones test statistic.
`BJ_pvalue`	The p-value of this observed value, given the size of the set and correlation structure.

Examples

# Should return statistic = 1.243353 and p_value = 0.256618
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
BJ(test_stats=Z_vec, cor_mat=cor_Z)
# Should return statistic = 1.243353 and p_value = 0.256618
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
BJ(test_stats=Z_vec, cor_mat=cor_Z)

Starting with individual-level data on p factors, generate score test statistics for each factor for input into GBJ/GHC/HC/BJ/minP. Also get the correlations between these test statistics. Designed to be used with linear or logistic or log-linear regression null models.

Usage

calc_score_stats(null_model, factor_matrix, link_function, P_mat = NULL)
calc_score_stats(null_model, factor_matrix, link_function, P_mat = NULL)

Arguments

`null_model`	An R regression model fitted using glm(). Do not use lm(), even for linear regression!
`factor_matrix`	An n*p matrix with each factor as one column. There should be no missing data.
`link_function`	Either "linear" or "logit" or "log"
`P_mat`	The projection matrix used in calculation may be passed in to speed up the calculation. See paper for details. Default is null.

Value

A list with the elements:

`test_stats`	The p score test statistics.
`cor_mat`	The p*p matrix giving the pairwise correlation of every two test statistics.

Examples

Y <- rbinom(n=100, size=1, prob=0.5)
null_mod <- glm(Y~1, family=binomial(link="logit"))
factor_mat <- matrix(data=rnorm(n=100*5), nrow=100)
calc_score_stats(null_mod, factor_mat, "logit")
Y <- rbinom(n=100, size=1, prob=0.5)
null_mod <- glm(Y~1, family=binomial(link="logit"))
factor_mat <- matrix(data=rnorm(n=100*5), nrow=100)
calc_score_stats(null_mod, factor_mat, "logit")

estimate_ss_cor.R

Description

Estimate the correlations between GWAS summary statistics using reference panel eigenvectors and reference panel genotypes.

Usage

estimate_ss_cor(ref_pcs, ref_genotypes, link_function)
estimate_ss_cor(ref_pcs, ref_genotypes, link_function)

Arguments

`ref_pcs`	An n*m matrix containing PCs calculated from the reference panel. Here n is the number of subjects in the reference panel and m is roughly the number of PCs used in the original analysis which produced the summary statistics.
`ref_genotypes`	An n*d matrix holding the genotypes from the reference panel, where the d columns correspond to the d SNPs for which we have summary statistics. No missing data allowed.
`link_function`	Either "linear" or "logit" or "log".

Value

A list with the elements:

cor_mat

The d*d matrix giving the pairwise correlation of every two test statistics.

Examples

ref_pcs <- matrix(data=runif(n=1000, min=-0.2, max=0.2), ncol=5)
ref_genotypes <- matrix(data=rbinom(n=2000, size=2, prob=0.3), ncol=10)
estimate_ss_cor(ref_pcs=ref_pcs, ref_genotypes=ref_genotypes, link_function="linear")
ref_pcs <- matrix(data=runif(n=1000, min=-0.2, max=0.2), ncol=5)
ref_genotypes <- matrix(data=rbinom(n=2000, size=2, prob=0.3), ncol=10)
estimate_ss_cor(ref_pcs=ref_pcs, ref_genotypes=ref_genotypes, link_function="linear")

Genotypes at FGFR2 SNPs for subjects from 'GBR' population in the 1000 Genomes Project.

Description

A dataset containing the genotypes (number of minor alleles) for each of 91 subjects from the 'GBR' population in the 1000 Genomes Projects. There are 64 SNPs documented here, all residing in the FGFR2 gene.

Usage

data(FGFR2)
data(FGFR2)

Format

A matrix with 91 rows (one for each subject) and 64 columns (one for each SNP)

Source

https://www.internationalgenome.org/data

GBJ.R

Description

Calculate the Generalized Berk-Jones test statistic and p-value.

Usage

GBJ(test_stats, cor_mat = NULL, pairwise_cors = NULL)
GBJ(test_stats, cor_mat = NULL, pairwise_cors = NULL)

Arguments

`test_stats`	Vector of test statistics for each factor in the set (i.e. marginal test statistic for each SNP in a gene).
`cor_mat`	d*d matrix of the correlations between all the test statistics in the set, where d is the total number of test statistics in the set. You only need to specify EITHER cor_mat OR pairwise_cors.
`pairwise_cors`	A vector of all d(d-1)/2 pairwise correlations between the test statistics. You only need to specify EITHER cor_mat OR pairwise_cors.

Value

A list with the elements:

`GBJ`	The observed Generalized Higher Criticism test statistic.
`GBJ_pvalue`	The p-value of this observed value, given the size of the set and correlation structure.
`err_code`	Sometimes if your p-value is very small (<10^(-12) usually), R/C++ do not have enough precision in their standard routines to calculate the number accurately. In these cases (and very rarely others) we switch to standard Berk-Jones instead (more stable numerically) and let you know with a message here.

Examples

# Should return statistic = 0.9248399 and p_value = 0.2670707
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
GBJ(test_stats=Z_vec, cor_mat=cor_Z)
# Should return statistic = 0.9248399 and p_value = 0.2670707
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
GBJ(test_stats=Z_vec, cor_mat=cor_Z)

GBJ_pvalue.R

Description

Calculate the p-value for the Generalized Berk-Jones (GBJ) statistic.

Usage

GBJ_pvalue(observed_gbj, d, pairwise_cors, times_to_try = 5)
GBJ_pvalue(observed_gbj, d, pairwise_cors, times_to_try = 5)

Arguments

`observed_gbj`	The observed value of the GBJ statistic.
`d`	The number of test statistics in the set.
`pairwise_cors`	A vector of all d(d-1)/2 pairwise correlations between the test statistics, where d is total number of test statistics in the set.
`times_to_try`	Sometimes the numerical root-finder is finnicky, so we have to give it extra chances to try and calculate the p-value if first time is failure. Recommend setting this parameter to 5.

Value

The p-value of the GBJ test.

Examples

GBJ_pvalue(observed_gbj=2, d=5, pairwise_cors=rep(0.2,10))
GBJ_pvalue(observed_gbj=2, d=5, pairwise_cors=rep(0.2,10))

Simulated Principal Components for 'GBR' population in the 1000 Genomes Project.

Description

A dataset containing 5 simulated Principal Components (PCs) for each of 91 subjects from the 'GBR' population in the 1000 Genomes Projects. These would normally be used as covariates in a regression model to control for population stratification.

Usage

data(gbr_pcs)
data(gbr_pcs)

Format

A matrix with 91 rows (one for each subject) and 5 columns (one for each PC)

GHC.R

Description

Calculate the Generalized Higher Criticism test statistic and p-value.

Usage

GHC(test_stats, cor_mat = NULL, pairwise_cors = NULL)
GHC(test_stats, cor_mat = NULL, pairwise_cors = NULL)

Arguments

`test_stats`	Vector of test statistics for each factor in the set (i.e. marginal test statistic for each SNP in a gene).
`cor_mat`	d*d matrix of the correlations between all the test statistics in the set, where d is the total number of test statistics in the set. You only need to specify EITHER cor_mat OR pairwise_cors.
`pairwise_cors`	A vector of all d(d-1)/2 pairwise correlations between the test statistics. You only need to specify EITHER cor_mat OR pairwise_cors.

Value

A list with the elements:

`GHC`	The observed Generalized Higher Criticism test statistic.
`GHC_pvalue`	The p-value of this observed value, given the size of the set and correlation structure.
`err_code`	Sometimes if your p-value is very small (<10^(-12) usually), R/C++ do not have enough precision in their standard routines to calculate the number accurately. In these cases (and very rarely others) we switch to standard Higher Criticism instead (more stable numerically) and let you know with a message here.

Examples

set.seed(100)
Z_vec <- rnorm(5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
GHC(test_stats=Z_vec, cor_mat=cor_Z)
set.seed(100)
Z_vec <- rnorm(5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
GHC(test_stats=Z_vec, cor_mat=cor_Z)

HC.R

Description

Calculate the Higher Criticism test statistic and p-value.

Usage

HC(test_stats, cor_mat = NULL, pairwise_cors = NULL)
HC(test_stats, cor_mat = NULL, pairwise_cors = NULL)

Arguments

`test_stats`	Vector of test statistics for each factor in the set (i.e. marginal test statistic for each SNP in a gene).
`cor_mat`	d*d matrix of the correlations between all the test statistics in the set, where d is the total number of test statistics in the set. You only need to specify EITHER cor_mat OR pairwise_cors.
`pairwise_cors`	A vector of all d(d-1)/2 pairwise correlations between the test statistics. You only need to specify EITHER cor_mat OR pairwise_cors.

Value

A list with the elements:

`HC`	The observed Higher Criticism test statistic.
`HC_pvalue`	The p-value of this observed value, given the size of the set and correlation structure.

Examples

# Should return statistic = 2.067475 and p_value = 0.2755146
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
HC(test_stats=Z_vec, cor_mat=cor_Z)
# Should return statistic = 2.067475 and p_value = 0.2755146
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
HC(test_stats=Z_vec, cor_mat=cor_Z)

minP.R

Description

Given a vector of individual test statistics and their pairwise correlations, calculate the MinimumP (see Conneely and Boehnke, 2007) second-level test statistic and it's p-value.

Usage

minP(test_stats, cor_mat = NULL, pairwise_cors = NULL)
minP(test_stats, cor_mat = NULL, pairwise_cors = NULL)

Arguments

`test_stats`	Vector of test statistics for each factor in the set (i.e. marginal test statistic for each SNP in a gene)
`cor_mat`	d*d matrix of the correlations between all the test statistics in the set, where d is the total number of test statistics in the set. You only need to specify EITHER cor_mat OR pairwise_cors.
`pairwise_cors`	A vector of all d(d-1)/2 pairwise correlations between the test statistics. You only need to specify EITHER cor_mat OR pairwise_cors.

Value

A list with the elements:

`minP`	The observed MinimumP test statistic.
`minP_pvalue`	The p-value of this observed value, given the size of the set and correlation structure.

Examples

# Should return statistic = 0.05918928 and p_value = 0.2525972.
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
minP(test_stats=Z_vec, cor_mat=cor_Z)
# Should return statistic = 0.05918928 and p_value = 0.2525972.
set.seed(100)
Z_vec <- rnorm(5) + rep(1,5)
cor_Z <- matrix(data=0.2, nrow=5, ncol=5)
diag(cor_Z) <- 1
minP(test_stats=Z_vec, cor_mat=cor_Z)

omni_individual.R

Description

Computes the omnibus test statistic combining GBJ, GHC, minP, and SKAT. This version of the function assumes you have the individual factor data (i.e. genotypes) for each subject. If you only have summary statistics, use omni_ss(). You WILL NOT be able to use this function unless you have also loaded the SKAT package (install.packages("SKAT"); library(SKAT)).

Usage

OMNI_individual(null_model, factor_matrix, link_function, num_boots = 100)
OMNI_individual(null_model, factor_matrix, link_function, num_boots = 100)

Arguments

`null_model`	An R regression model fitted using glm(). Do not use lm(), even for linear regression!
`factor_matrix`	An n*d matrix with each factor (i.e. each SNP) as one column. There should be no missing data.
`link_function`	Either "linear" or "logit" or "log".
`num_boots`	Number of bootstrap repetitions to find correlation matrix of set-based statistics.

Value

A list with the elements:

`OMNI`	The observed omnibus test statistic.
`OMNI_pvalue`	The p-value of the OMNI test
`err_code`	Sometimes if your p-value is very small (< 1*10^(-10)), R may run into numerical issues. This message will alert you if such a situation occurs.

Examples

factor_matrix <- matrix(data=rbinom(n=1000, size=2, prob=0.3), ncol=5)
Y <- rnorm(n=200)
null_mod <- glm(Y ~ 1)
OMNI_individual(null_model=null_mod, factor_matrix=factor_matrix,
link_function='linear', num_boots=5)
factor_matrix <- matrix(data=rbinom(n=1000, size=2, prob=0.3), ncol=5)
Y <- rnorm(n=200)
null_mod <- glm(Y ~ 1)
OMNI_individual(null_model=null_mod, factor_matrix=factor_matrix,
link_function='linear', num_boots=5)

omni_ss.R

Description

Computes the omnibus test statistic combining GBJ, GHC, minP, and SKAT. This version of the function assumes you are using GWAS summary statistics. If you individual-level genotype data, use omni_individual().

Usage

OMNI_ss(test_stats, cor_mat, num_boots = 100)
OMNI_ss(test_stats, cor_mat, num_boots = 100)

Arguments

`test_stats`	Vector of test statistics for each factor in the set (i.e. marginal test statistic for each SNP in a gene)
`cor_mat`	d*d matrix of the correlations between all the test statistics in the set, where d is the total number of test statistics in the set.
`num_boots`	Number of bootstrap repetitions to find correlation matrix of set-based statistics.

Value

A list with the elements:

`OMNI`	The observed omnibus test statistic.
`OMNI_pvalue`	The p-value of the OMNI test
`err_code`	Sometimes if your p-value is very small (< 1*10^(-10)), R may run into numerical issues. This message will alert you if such a situation occurs.

Examples

cor_mat <- matrix(data=0.3, nrow=5, ncol=5)
diag(cor_mat) <- 1
test_stats <- as.numeric(mvtnorm::rmvnorm(n=1, sigma=cor_mat))
OMNI_ss(test_stats=test_stats, cor_mat=cor_mat, num_boots=5)
cor_mat <- matrix(data=0.3, nrow=5, ncol=5)
diag(cor_mat) <- 1
test_stats <- as.numeric(mvtnorm::rmvnorm(n=1, sigma=cor_mat))
OMNI_ss(test_stats=test_stats, cor_mat=cor_mat, num_boots=5)

score_stats_only.R

Description

Starting with individual-level data on p factors, generate score test statistics for each factor for input into GBJ/GHC/HC/BJ/minP. DOES NOT get the correlations (assumed known).

Usage

score_stats_only(null_model, factor_matrix, link_function, P_mat = NULL)
score_stats_only(null_model, factor_matrix, link_function, P_mat = NULL)

Arguments

`null_model`	An R regression model fitted using glm(). Do not use lm(), even for linear regression!
`factor_matrix`	An n*d matrix with each factor as one column. There should be no missing data.
`link_function`	Either "linear" or "logit" or "log".
`P_mat`	The projection matrix used in calculation may be passed in to speed up the calculation. See paper for details. Default is null.

Value

The d score test statistics.

Examples

Y <- rbinom(n=100, size=1, prob=0.5)
null_mod <- glm(Y~1, family=binomial(link="logit"))
factor_matrix <- matrix(data=rnorm(n=100*5), nrow=100)
score_stats_only(null_mod, factor_matrix, "logit")
Y <- rbinom(n=100, size=1, prob=0.5)
null_mod <- glm(Y~1, family=binomial(link="logit"))
factor_matrix <- matrix(data=rnorm(n=100*5), nrow=100)
score_stats_only(null_mod, factor_matrix, "logit")

surv.R

Description

Survival (1 minus the CDF) function of standard normal random variable.

Usage

surv(x)
surv(x)

Arguments

`x`	Vector of quantiles

Value

Probability that a standard normal random variable is greater than x.

Examples

surv(0)		# Should return 0.5
surv(0)		# Should return 0.5

Package 'GBJ'

Help Index

BJ.R

Description

Usage

Arguments

Value

Examples

calc_score_stats.R

Description

Usage

Arguments

Value

Examples

estimate_ss_cor.R

Description

Usage

Arguments

Value

Examples

Genotypes at FGFR2 SNPs for subjects from 'GBR' population in the 1000 Genomes Project.

Description

Usage

Format

Source

GBJ.R

Description

Usage

Arguments

Value

Examples

GBJ_pvalue.R

Description

Usage

Arguments

Value

Examples

Simulated Principal Components for 'GBR' population in the 1000 Genomes Project.

Description

Usage

Format

GHC.R

Description

Usage

Arguments

Value

Examples

HC.R

Description

Usage

Arguments

Value

Examples

minP.R

Description

Usage

Arguments

Value

Examples

omni_individual.R

Description

Usage

Arguments

Value

Examples

omni_ss.R

Description

Usage

Arguments

Value

Examples

score_stats_only.R

Description

Usage

Arguments

Value

Examples

surv.R

Description

Usage