--- title: "volcano3D: 2x3-way analysis" author: "Myles Lewis, Katriona Goldmann, Elisabetta Sciacca" output: html_document: toc: true toc_float: collapsed: false toc_depth: 3 number_sections: false vignette: > %\VignetteIndexEntry{2x3-way analysis} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include = FALSE, echo = FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.height = 7, fig.width=7, fig.align = "center") library(knitr) library(kableExtra) ``` ## 2x3-way analysis ```{r, echo=FALSE} library(ggplot2) library(ggpubr) library(plotly) library(usethis) ``` The main work flow for using volcano3D is ideal for comparing high dimensional data such as gene expression or other biological data across 3 classes, and this is covered in the main vignette. However, an alternative use of 3-way radial plots and 3d volcano plots is for 2x3-way analysis. In this type of analysis there is a binary factor such as drug response (responders vs non-responders) and a 2nd factor with 3 classes such as a trial with 3 drugs. ## Example This vignette shows analysis from the STRAP trial in rheumatoid arthritis, in which patients were randomised to one of three drugs (etanercept, rituximab, tocilizumab). Clinical response to treatment (a binary outcome) was measured after 16 weeks. Gene expression data from RNA-Sequencing (RNA-Seq) of synovial biopsies from the patients' inflamed joints was performed at baseline. RNA-Seq data is count based and overdispersed, and thus is typically best modelled by a negative binomial distribution. ## DESeq2 pipeline Differential gene expression to compare the synovial gene expression between responders vs non-responders is performed using the Bioconductor package DESeq2. Since there are 3 distinct drugs this becomes a 2x3-factor analysis. The following code shows set up and 2x3-way analysis in DESeq2 followed by generation of a 'volc3d' class results object for plotting and visualisation of the results. ```{r, eval=FALSE} library(volcano3D) # Basic DESeq2 set up library(DESeq2) counts <- matrix(rnbinom(n=3000, mu=100, size=1/0.5), ncol=30) rownames(counts) <- paste0("gene", 1:100) cond <- rep(factor(rep(1:3, each=5), labels = c('A', 'B', 'C')), 2) resp <- factor(rep(1:2, each=15), labels = c('non.responder', 'responder')) metadata <- data.frame(drug = cond, response = resp) # Full dataset object construction dds <- DESeqDataSetFromMatrix(counts, metadata, ~response) # Perform 3x DESeq2 analyses comparing binary response for each drug res <- deseq_2x3(dds, ~response, "drug") ``` The design formula can contain covariates, however the main contrast should be the last term of the formula as is standard in DESeq2 analysis. The function `deseq_2x3()` returns a list of 3 DESeq2 objects containing the response analysis for each of the three drugs. These response vs non-response differential expression comparisons can be quickly visualised with the `easyVolcano()` function from the R package `easylabel`, which is designed for DESeq2 and limma objects and uses an interactive R/shiny interface. ```{r, eval=FALSE} library(easylabel) df <- as.data.frame(res[[1]]) # results for the first drug easyVolcano(df) ``` The `deseq_2x3` output is passed to `deseq_2x3_polar()` to generate a `volc3d` class object for plotting. Thus the 3-way radial plot and 3d volcano plot simplify visualisation of the 2x3-way analysis, replacing 3 volcano plots with a single radial plot or 3d volcano plot. ```{r, eval=FALSE} # Generate polar object obj <- deseq_2x3_polar(res) # 2d plot radial_plotly(obj) # 3d plot volcano3D(obj) ``` ## Real world example The example below using real data from the STRAP trial demonstrates a 3-way radial plot highlighting the relationship between genes significantly increased in responders across each of the three drugs. The pipe to `toWebGL()` is used to speed up plotting by using webGL instead of SVG in plotly scatter plots. Alternatively, a ggplot2 version can be plotted using `radial_ggplot()`. ```{r, eval=FALSE} obj <- deseq_2x3_polar(data1) labs <- c('MS4A1', 'TNXA', 'FLG2', 'MYBPC1') radial_plotly(obj, type=2, label_rows = labs) %>% toWebGL() ``` ```{r radial_2x3_pos, echo = FALSE, message=FALSE, fig.align='center', out.width='70%', out.extra='style="border: 0;"'} knitr::include_graphics("radial_2x3_pos.png") ``` A 3d volcano plot can be generated too. ```{r volc_2x3, echo = FALSE, message=FALSE, fig.align='center', out.width='70%', out.extra='style="border: 0;"'} knitr::include_graphics("volc3d_2x3.png") ``` To show genes which are significantly increased in non-responders for each drug, we reprocess the object using the argument `process = "negative"` as this leads to a different colour scheme for each gene. The polar coordinates are essentially flipped as each axis represents increased expression in non-responders. ```{r, eval=FALSE} obj <- deseq_2x3_polar(data1, process = "negative") labs <- c('MS4A1', 'TNXA', 'FLG2', 'MYBPC1') radial_plotly(obj, type=2, label_rows = labs) %>% toWebGL() ``` ```{r radial_2x3_neg, echo = FALSE, message=FALSE, fig.align='center', out.width='70%', out.extra='style="border: 0;"'} knitr::include_graphics("radial_2x3_neg.png") ``` ## Custom 2x3-way analysis The above workflow is designed for RNA-Seq count data. For other raw data types, the function `polar_coords_2x3()` is used to map attributes to polar coordinates in a 2x3-way analysis. ```{r eval=FALSE} polar_obj <- polar_coords_2x3(vstdata, metadata, "ACR.response.status", "Randomised.Medication") radial_plotly(polar_obj, type=2) volcano3D(polar_obj) ``` In the above code example `vstdata` contains transformed (approximately Gaussian) gene expression data with genes in columns and samples in rows. `metadata` is a dataframe of sample information with samples in rows. `"ACR.response.status"` refers the the binary response column in `metadata`. `"Randomised.Medication"` refers to the 3-way group column in `metadata`. `polar_coords_2x3()` accepts raw data and performs all the calculations needed to generate coordinates, colours etc for plotting radial 3-way plots or 3d volcano plots. This differs from the original `polar_coords()` function in that it is the mean difference between the 2-way response outcome which is used to map coordinates along each axis for each of the 3 groups for unscaled polar coordinates. Scaled polar coordinates are generated using the *t*-statistic for each group comparison. There is no straightforward equivalent of a one-way ANOVA or likelihood ratio test to use for the *z* axis as is used in the simpler 3-way analysis for standard 3-way radial plots/ 3d volcano plots. So instead the *z* axis in a 2x3-way analysis uses the smallest p-value from each of the 3 paired comparisons. This p-value is transformed as usual as -log~10~(p). A table of p-values can be supplied by the user. But if a table of p-values is absent, p-values are automatically calculated by `polar_coords_2x3()`. Each of the 3 groups is subsetted and pairwise comparisons of the binary response factor are performed. Options here include unpaired *t*-test and Wilcoxon test (see `calc_stats_2x3()`). The colour scheme is not as straightforward as for the standard polar plot and volcano3D plot since genes (or attributes) can be significantly up or downregulated in the response comparison for each of the 3 groups. `process = "positive"` means that genes are labelled with colours if a gene is significantly upregulated in the response for that group. This uses the primary colours (RGB) so that if a gene is upregulated in both red and blue group it becomes purple etc with secondary colours. If the gene is upregulated in all 3 groups it is labelled black. Non-significant genes are in grey. With `process = "negative"` genes are coloured when they are significantly downregulated. With `process = "two.sided"` the colour scheme means that both significantly up- and down-regulated genes are coloured with down-regulated genes labelled with inverted colours (i.e. cyan is the inverse of red etc). However, significant upregulation in a group takes precedence. With `process = "negative"` the polar coordinates are also flipped as each axis now represents upregulated expression in non-responders. The end result is a `volc3d` class object for downstream plotting. ## Forest plots Specific genes/variables can be interrogated using a forest plot to investigate differences in binary response between the 3 groups. ```{r, eval=FALSE} forest_ggplot(obj, c("MS4A1", "FLG2", "SFN") ``` ```{r forest, echo = FALSE, message=FALSE, fig.align='center', out.width='70%', out.extra='style="border: 0;"'} knitr::include_graphics("forest.png") ``` The related functions `forest_plot()` and `forest_plotly()` can be used to generate plots in base graphics or plotly. --- ## Citation volcano3D was developed by the bioinformatics team at the [Experimental Medicine & Rheumatology department](https://www.qmul.ac.uk/whri/emr/) and [Centre for Translational Bioinformatics](https://www.qmul.ac.uk/c4tb/) at Queen Mary University London. If you use this package please cite as: ```{r} citation("volcano3D") ``` or using: > Lewis, Myles J., et al. _Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes_. Cell reports 28.9 (2019): 2455-2470.