ASTUTE is an R package designed to integrate cancer genomic and transcriptomic data in order to perform genotype-phenotype mapping. It leverages regularized regression with a LASSO penalty to uncover associations between somatic mutations and gene expression profiles.

In its basic implementation, ASTUTE requires two main inputs: (i) a binary matrix where rows are patients (i.e., samples) and columns are mutations, in which each cell is 1 if the corresponding mutation was observed in the sample and 0 otherwise; (ii) a matrix of log2(x+1)-transformed normalized expression values for the same patients.
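To make the expected shape of the two inputs concrete, the following is a minimal sketch that builds toy versions of them in base R. All patient, mutation, and gene names are hypothetical, the values are simulated, and the final check that both matrices list the patients in the same order is an assumption of this sketch rather than a documented requirement of the package.

```r
# Toy data: 5 patients, 3 mutations, 4 genes (all names are hypothetical).
set.seed(1)
patients <- paste0("patient_", 1:5)

# (i) Binary alterations matrix: rows are patients, columns are mutations.
alterations <- matrix(rbinom(5 * 3, size = 1, prob = 0.4),
                      nrow = 5, ncol = 3,
                      dimnames = list(patients, c("mut_A", "mut_B", "mut_C")))

# (ii) Simulated normalized expression values for the same patients,
#      transformed with log2(x + 1).
normalized_counts <- matrix(rexp(5 * 4, rate = 0.01),
                            nrow = 5, ncol = 4,
                            dimnames = list(patients, paste0("gene_", 1:4)))
expression <- log2(normalized_counts + 1)

# Sanity checks (assumed here, not taken from the package documentation):
# same patients in the same order, and strictly 0/1 alteration entries.
stopifnot(identical(rownames(alterations), rownames(expression)))
stopifnot(all(alterations %in% c(0, 1)))
```

Matrices of this form could then be supplied as the alterations and expression arguments shown later in this vignette.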

In this vignette, we give an overview of the package by presenting some of its main functions.

Installing the ASTUTE R package

The ASTUTE package can be installed from GitHub using the R package devtools as follows.

library("devtools")
install_github("ramazzottilab/ASTUTE", ref = 'master')

Changelog

  • 1.0.0 Package released in July 2024.

Using the ASTUTE R package

The package includes an example dataset with alterations and expression data for a set of selected genes from 50 lung adenocarcinoma samples, taken from: Cancer Genome Atlas Research Network. “Comprehensive molecular profiling of lung adenocarcinoma.” Nature 511, no. 7511 (2014): 543.

library("ASTUTE")
data(datasetExample)

ASTUTE performs genotype-phenotype mapping by associating somatic mutations with gene expression profiles.

set.seed(12345)
resExample <- ASTUTE( alterations = datasetExample$alterations, 
                      expression = datasetExample$expression, 
                      regularization = TRUE, 
                      nboot = NA, 
                      num_processes = NA, 
                      verbose = FALSE )
print(names(resExample))
## [1] "input_data"   "inference"    "parameters"   "goodness_fit" "fold_changes"
## [6] "pvalues"      "qvalues"

The output of this analysis is a list of 7 elements: (1) input_data: list providing the input data (i.e., alterations and expression data); (2) inference: results of the inference by bootstrap (i.e., alpha alterations matrix, beta matrix, and intercept estimates); (3) parameters: list with the parameters used for the inference (i.e., regularization TRUE/FALSE and nboot); (4) goodness_fit: goodness of fit estimated as the cosine similarity comparing observations and predictions; (5) fold_changes: log2 fold change estimates; (6) pvalues: p-value estimates; (7) qvalues: p-value estimates corrected for false discovery.
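The goodness-of-fit measure mentioned above is a cosine similarity between observations and predictions. As a rough illustration of what that quantity measures, here is a minimal base R sketch; the helper name cosine_similarity and the numeric vectors are hypothetical, and the package's internal computation may differ in detail.

```r
# Cosine similarity between two numeric vectors of equal length:
# the dot product divided by the product of the Euclidean norms.
cosine_similarity <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sum(observed * predicted) / sqrt(sum(observed^2) * sum(predicted^2))
}

observed  <- c(2.0, 0.5, 3.1, 1.2)   # hypothetical observed expression values
predicted <- c(1.8, 0.7, 2.9, 1.0)   # hypothetical model predictions

fit <- cosine_similarity(observed, predicted)
print(fit)
```

Values close to 1 indicate that predictions point in nearly the same direction as the observations, while a value of 0 indicates orthogonal vectors.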

In the example above, we did not perform bootstrap (nboot = NA), so no p-value or q-value estimates are provided.
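For readers unfamiliar with q-values, the sketch below shows one common way to correct a vector of p-values for false discovery, using the Benjamini-Hochberg method from base R's p.adjust. The p-values are made up, and the choice of the "BH" method is an assumption of this sketch: the vignette does not state which correction ASTUTE applies.

```r
# Hypothetical p-values, e.g. one per mutation-gene association.
pvalues <- c(0.001, 0.02, 0.04, 0.30, 0.75)

# Benjamini-Hochberg false discovery rate correction (assumed method).
qvalues <- p.adjust(pvalues, method = "BH")

print(data.frame(pvalue = pvalues, qvalue = qvalues))
```

Each q-value is at least as large as its p-value, so an association that passes a q-value threshold is a more conservative call than one that only passes the same threshold on the raw p-value.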

Current R Session

## R Under development (unstable) (2025-02-24 r87814)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ASTUTE_1.2.0     BiocStyle_2.35.0
## 
## loaded via a namespace (and not attached):
##  [1] lsa_0.73.3          cli_3.6.4           knitr_1.50         
##  [4] rlang_1.1.5         xfun_0.51           textshaping_1.0.0  
##  [7] jsonlite_1.9.1      SnowballC_0.7.1     htmltools_0.5.8.1  
## [10] ragg_1.3.3          sass_0.4.9          glmnet_4.1-8       
## [13] rmarkdown_2.29      grid_4.5.0          evaluate_1.0.3     
## [16] jquerylib_0.1.4     fastmap_1.2.0       foreach_1.5.2      
## [19] yaml_2.3.10         lifecycle_1.0.4     bookdown_0.42      
## [22] BiocManager_1.30.25 compiler_4.5.0      codetools_0.2-20   
## [25] fs_1.6.5            Rcpp_1.0.14         htmlwidgets_1.6.4  
## [28] lattice_0.22-6      systemfonts_1.2.1   digest_0.6.37      
## [31] R6_2.6.1            parallel_4.5.0      splines_4.5.0      
## [34] shape_1.4.6.1       Matrix_1.7-3        bslib_0.9.0        
## [37] tools_4.5.0         iterators_1.0.14    survival_3.8-3     
## [40] pkgdown_2.1.1.9000  cachem_1.1.0        desc_1.4.3