SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests

UKBB recently released whole exome sequencing (WES) data1, allowing study of rare variant associations for complex phenotypes. However, best practices remain unclear for rare variant tests in large-scale biobanks. A common practice is to test all rare (MAF ≤ 1%) loss-of-function (LoF) and missense variants, but this approach can lose power if associations are enriched in very rare variants or certain functional annotations. To improve power, researchers can restrict tests to rarer variants, such as variants with MAF ≤ 0.1% or MAF ≤ 0.01%. Another approach is to incorporate functional annotations. To incorporate multiple MAF cutoffs and functional annotations, multiple tests are needed for each gene or region, and results need to be combined using minimum P value or Cauchy combination method2,3.

Currently, SAIGE-GENE4 is the only method developed to conduct variance component set-based tests, such as SKAT5 and SKAT-O6, for unbalanced case–control phenotypes in biobank-scale data. For example, in our evaluation, the most recent set-based test, STAAR2, cannot control for type I error rates in the presence of case–control imbalance (Extended Data Fig. 1). Burden tests (such as implemented in REGENIE2 (ref. 7)) collapse multiple rare variants into a single variant, allowing the use of well-developed single-variant tests. However, Burden tests can have low power compared with SKAT and SKAT-O6. This was confirmed in our simulation studies (Extended Data Fig. 2). In analyzes of UKBB WES data from 160,000 white British individuals (from the release with 200,000 individuals), we found that SAIGE-GENE performed well when testing variants with MAF ≤ 1% (Fig. 1a), but inflation was observed in SKAT and SKAT -O in SAIGE-GENE when restricting to variants with MAF ≤ 0.1% or 0.01% if the case–control ratios were more unbalanced than 1:30 (Fig. 1a and Extended Data Fig. 3). Our type I error simulation studies (Supplementary Note; Methods) also showed the same inflation (Extended Data Fig. 4), suggesting that SKAT and SKAT-O in SAIGE-GENE can suffer from inflated type I error rates when restricted to variants with very low MAF.

Fig. 1: Q–Q plots for Burden, SKAT and SKAT-O for four exemplary binary phenotypes in UKBB WES data using SAIGE-GENE and SAIGE-GENE+.

aSAIGE-GENE. b, SAIGE-GENE+. Burden, SKAT and SKAT-O tests were performed for 18,372 genes with missense and LoF variants with three different maximum MAF cutoffs (1%, 0.1% and 0.01%). Names of genes reaching the exome-wide significance threshold (two-sided P< 2.5 × 10-6) in SAIGE-GENE+ are annotated in the plots.

In addition, computation cost is not low enough to test for multiple variant sets. For example, to test the largest gene (TTN ) with 16,227 variants in the UKBB WES data with three maximum MAF cutoffs (1%, 0.1% and 0.01%) and three annotations (LoF only, LoF+missense and LoF+missense+synonymous), SAIGE-GENE required 164 CPU hours and 65 gigabytes (GB) of memory (Supplementary Table 1).

To address these issues, we propose SAIGE-GENE+. Although SAIGE-GENE uses various approaches to account for case–control imbalance, it cannot fully address the imbalance and sparsity in the data (Fig. 1a and Extended Data Fig. 4a). To reduce the data sparsity due to ultra-rare variants, before testing each variant set, SAIGE-GENE+ collapses variants with MAC ≤ 10 and then tests the collapsed variant together with all other variants with MAC > 10 (Extended Data Fig. 5; Methods ). Collapsing has been commonly used for ultra-rare variants8,9 by assuming those variants have the same direction of effects on phenotypes. We observed that the inflation is substantially reduced and all tests have well controlled type I errors in both simulated (Extended Data Fig. 4b) and UKBB WES analyzes (Fig. 1b) for four exemplary phenotypes with case-control ratios from 1:32 to 1:267. The genomic control inflation factors also became closer to 1 (Extended Data Fig. 3).

Collapsing ultra-rare variants in SAIGE-GENE+ decreases the number of variants (Extended Data Fig. 6), leading to reduced computation time and memory usage (Fig. 2a, Supplementary Table 1 and Extended Data Fig. 7). To further reduce the computational cost, SAIGE-GENE+ extensively uses C++ with sparse matrix libraries, reads genotypes for all genetic markers in a set only once, and conducts multiple association tests corresponding to different MAF cutoffs and annotations (Supplementary Note). The computation time of SAIGE-GENE+ for performing all Burden, SKAT and SKAT-O tests was 1,407 times lower (9,851 min versus 7 min) and the memory usage dropped from 65 GB to 2.1 GB compared with SAIGE-GENE for testing association of the largest gene TTN (16,227 LoF+missense+synonymous variants) with basal metabolic rate (Supplementary Table 1). To perform SKAT-O tests for 18,372 genes in randomly selected 150,000 samples with three MAF cutoffs (1%, 0.1% and 0.01%) and three variant annotations (LoF only, LoF+missense and LoF+missense+synonymous), SAIGE-GENE+ required 78.6 CPU hours (18.8 CPU hours for fitting the null mixed model using a full genetic relationship matrix (GRM) as Step 1 and 59.8 CPU hours for association tests as Step 2) and 4.8 GB memory (4.8 GB for Step 1 and 2 GB for Step 2) (Supplementary Tables 2 and 3 and Extended Data Fig. 8). In addition, when a sparse GRM instead of a full GRM was used in Step 1, the time and memory usage dropped dramatically (<1 min and 0.61 GB) (Supplementary Table 2, Supplementary Note, Extended Data Fig. 9 and Supplementary Figs. 1 and 2) and treating covariates as offset leads to a further decrease in the computation time (Supplementary Table 4). We also compared the computation cost of SAIGE-GENE+ and REGENIE2 (Supplementary Tables 2 and 3, Supplementary Note and Extended Data Fig. 8).

Fig. 2: Performance of SAIGE-GENE+ in UKBB WES data.
Figure 2

a, Computation time and memory of the gene-based tests (Step 2; Methods) in SAIGE-GENE and SAIGE-GENE+ for four genes with different numbers of variants. The SKAT-O tests were conducted with three maximum MAF cutoffs (1%, 0.1% and 0.01%) and three variant annotations (LoF only, LoF+missense and LoF+missense+synonymous) and combined using the Cauchy combination or minimum P value approach. Plots are in the log10 -log10 scale. Details of the numbers and genes are presented in Supplementary Table 1. b, Most significant variant sets across the three different MAF cutoffs (1%, 0.1% and 0.01%) and three functional annotations (LoF (L) only, LoF+missense (M+L) and LoF+missense+synonymous (S+M +L)). Distribution of variant sets with the smallest P values ​​among 551 significant gene–phenotype associations identified by SAIGE-GENE+ in the analyzes of 30 quantitative traits and 141 binary traits in the UKBB WES data.

By collapsing ultra-rare variants, SAIGE-GENE+ can have more significant P values ​​than SAIGE-GENE. We applied both methods to 37 self-reported binary phenotypes in the UKBB WES data. We observed 27 significant gene–phenotype associations in which SKAT-O P values ​​in SAIGE-GENE+ were more significant than SKAT-O P values ​​in SAIGE-GENE (Supplementary Table 5). For example, BRCA2 for breast cancer with MAF ≤ 0.1% had a P value of 7.62 × 10-8 in SAIGE-GENE+ and 1.65 × 10-3 in SAIGE- GENE, and GCK for diabetes with MAF ≤ 0.1% also had a more significant P value (1.22 × 10-13) in SAIGE-GENE+ than in SAIGE-GENE (P = 4.06 × 10-6). More detailed discussion is provided in the Supplementary Note.

We evaluated the power of SAIGE-GENE+ and SAIGE-GENE through simulation studies based on real genotypes of ten genes in the UKBB WES data (Supplementary Table 6, Supplementary Note and Methods). In all scenarios, SAIGE-GENE+ had higher or similar empirical power than SAIGE-GENE (Supplementary Table 7 and Supplementary Fig. 3) with increased median Chi-square statistics (Supplementary Table 8). In line with previous studies6, our results showed that SKAT-O tests can have higher power than Burden tests (Extended Data Fig. 2 and Supplementary Table 8). As expected, Burden test P values ​​were highly concordant in SAIGE-GENE+ and REGENIE2 (Pearson’s correlation R2 = 0.99 for –log10(Pvalue)) (Supplementary Fig. 4). In addition, the simulation results suggested that incorporating multiple functional annotations (LoF, LoF+missense and LoF+missense+synonymous) and maximum MAF cutoffs (0.01%, 0.1% and 1%) can increase power compared with using only a single MAF cutoff (1%) on one set of function annotation (LoF+missense+synonymous) (Supplementary Fig. 5 and Supplementary Table 8).

We applied SAIGE-GENE+ to 18,372 genes in the UKBB WES data with 160,000 individuals of white British ancestry for 30 quantitative and 141 binary traits (Methods). We identified 465 gene–phenotype associations for 27 quantitative traits and 86 for 51 binary traits that were exome-wide significant with P values ​​≤ 2.5 × 10-6(Supplementary Tables 9 and 10), containing both known and potentially new associations (Supplementary Note). We created PheWeb-like server for visualizing these results (see Data availability)10.

The UKBB WES data analysis showed that using lower MAF cutoffs can identify new associations in which the associations are highly enriched in rarer variants. For example, the association between PDCD1LG2 which encodes Programmed Cell Death 1 Ligand, and chronic lymphocytic leukemia became significant in tests restricted to variants with MAF ≤ 0.01% and 0.1% (P = 7.5 × 10-7) compared with tests with all variants with MAF ≤ 1% (P = 5.4 × 10−4 ) (Supplementary Table 11). The underlying reason could be that associations are enriched in the rarer variants, for example, the most significant variant has a MAF 3.4 × 10−4(rs7854303) (see the PheWeb-like visual browser). Using a MAF cutoff ≤ 1% includes many noncausal variants, and thus decreases power. Moreover, including lower MAF cutoffs helped to replicate known associations such as MLH1 for colorectal cancer and CDKN2Afor melanoma (Supplementary Table 11). Due to multiple comparison burden, including lower MAF cutoffs can make marginally significant associations insignificant. For 141 binary phenotypes, 17 out of 92 (18.4%) associations were further identified with lower MAF cutoffs, while 9 (9.8%) became insignificant (Supplementary Fig. 6a and Supplementary Table 11). For 30 quantitative traits, 28 out of 465 (6%) associations were additionally identified, while 53 (11.4%) became insignificant (Supplementary Fig. 6a and Supplementary Table 12), suggesting that restricting association tests to rarer variants has a gain for binary phenotypes. In functional annotation categories, 184 associations were identified by testing LoF only; including LoF+missense sets identified 299 additional associations and including LoF+missense+synonymous sets identified 91 more associations (Supplementary Fig. 6b). These results are consistent with our simulation studies showing that empirical power increased when incorporating multiple functional annotations (Supplementary Fig. 5). We also investigated which variant set among the 551 significant gene–phenotype associations had the smallest Pvalue (Fig. 2b). Interestingly, in variant sets with MAF ≤ 0.01%, those with LoF only generally had the smallest Pvalues, while with MAF ≤ 1%, LoF+missense+synonymous sets tend to have the smallest Pvalues.

In summary, our results demonstrate that incorporating multiple MAF cutoffs and functional annotations in exome-wide set-based association tests can help identify new gene–phenotype associations, and that SAIGE-GENE+ can facilitate this.

Leave a Comment

Your email address will not be published.

%d bloggers like this: