Similarity-driven multi-view embeddings from high-dimensional biomedical data

Avants, Brian B.; Tustison, Nicholas J.; Stone, James R.

doi:10.1038/s43588-021-00029-8

Article
Published: 22 February 2021

Nature Computational Science volume 1, pages 143–152 (2021)

A Publisher Correction to this article was published on 04 March 2021

This article has been updated

A preprint version of the article is available at arXiv.

Abstract

Diverse, high-dimensional modalities collected in large cohorts present new opportunities for the formulation and testing of integrative scientific hypotheses. Similarity-driven multi-view linear reconstruction (SiMLR) is an algorithm that exploits inter-modality relationships to transform large scientific datasets into smaller, better-powered and more interpretable low-dimensional spaces. SiMLR contributes an objective function for identifying joint signal, regularization based on sparse matrices representing prior within-modality relationships, and an implementation that permits application to joint reduction of large data matrices. We demonstrate that SiMLR outperforms closely related methods on supervised learning problems in simulation data, a multi-omics cancer survival prediction dataset and multiple modality neuroimaging datasets. Taken together, this collection of results shows that SiMLR may be applied to joint signal estimation from disparate modalities and may yield practically useful results in a variety of application domains.

**Fig. 1: An overview of the SiMLR workflow.**

**Fig. 2: SiMLR simulation study results for sensitivity to noise and ability to recover the signal.**

**Fig. 3: Fully supervised brain age prediction and performance comparison with SGCCA for PTBP data.**

Data availability

All visualized plots in the main manuscript are generated from our code capsule, which contains both the specific data sources and software calls necessary to reproduce the figures⁸⁰.

The simulation data are built dynamically in R. The scripts that generate the data are publicly available in our code capsule⁸⁰.

We downloaded evaluation data from the multi-omic cancer benchmark⁴⁷ website at http://acgt.cs.tau.ac.il/multi_omic_benchmark/download.html. Data are available in our code capsule⁸⁰ along with the relevant statistical details and calls needed to reproduce the results reported here. The data are free to use with no restrictions.

The brain age data used here were obtained from PTBP⁸¹. These data were originally downloaded from https://figshare.com/articles/dataset/The_Pediatric_Template_of_Brain_Perfusion_PTBP_/923555. The relevant subset is available in our code capsule⁸⁰. The data are free to use with no restrictions.

Supplementary data used here were obtained from the PING study database (https://chd.ucsd.edu/research/ping-study.html). PING requires a user to register and request data. The review of the request may also require institutional support and justification of data use. We originally gained access to these data in 2013 as part of the PING-in-a-box service, which is now defunct.

Data used here were also obtained from the ADNI database (http://adni.loni.usc.edu). ADNI was launched in 2003 as a public-private partnership, led by M. W. Weiner. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease. For up-to-date information, see http://adni.loni.usc.edu. The investigators within ADNI contributed to the design and implementation of ADNI and/or provided data, but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Authorship_List.pdf. ADNI requires a user to register and request data. The review of the request may also require institutional support and justification of data use. We originally gained access to these data in 2008. The version used in the Supplementary Information was downloaded in August 2020 from LONI.

Code availability

ANTsR is open source and freely available at https://github.com/ANTsX/ANTsR. The development of the code available on GitHub is ongoing. The specific release version of the code and scripts used for the analysis and generation of figures in the main body of this manuscript are available in our code capsule⁸⁰.

Change history

04 March 2021
A Correction to this paper has been published: https://doi.org/10.1038/s43588-021-00049-4

References

1. Cole, J. H., Marioni, R. E., Harris, S. E. & Deary, I. J. Brain age and other bodily ‘ages’: implications for neuropsychiatry. Mol. Psychiatry 24, 266–281 (2019).
2. Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
3. Habeck, C., Stern, Y. & Alzheimer’s Disease Neuroimaging Initiative. Multivariate data analysis for neuroimaging data: overview and application to Alzheimer’s disease. Cell Biochem. Biophys. 58, 53–67 (2010).
4. Shamy, J. L. et al. Volumetric correlates of spatiotemporal working and recognition memory impairment in aged rhesus monkeys. Cereb. Cortex 21, 1559–1573 (2011).
5. McKeown, M. J. et al. Analysis of fMRI data by blind separation into independent spatial components. Hum. Brain Mapp. 6, 160–188 (1998).
6. Calhoun, V. D., Adali, T., Pearlson, G. D. & Pekar, J. J. A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. 14, 140–151 (2001).
7. Calhoun, V. D., Liu, J. & Adali, T. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage 45, S163–S172 (2009).
8. Avants, B. B., Cook, P. A., Ungar, L., Gee, J. C. & Grossman, M. Dementia induces correlated reductions in white matter integrity and cortical thickness: a multivariate neuroimaging study with sparse canonical correlation analysis. Neuroimage 50, 1004–1016 (2010).
9. de Pierrefeu, A. et al. Structured sparse principal components analysis with the TV-elastic net penalty. IEEE Trans. Med. Imaging 37, 396–407 (2018).
10. Du, L. et al. Structured sparse canonical correlation analysis for brain imaging genetics: an improved GraphNet method. Bioinformatics 32, 1544–1551 (2016).
11. Avants, B. et al. Sparse unbiased analysis of anatomical variance in longitudinal imaging. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Jiang, T. et al.) 324–331 (Springer, 2010).
12. Avants, B. B. et al. Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. Neuroimage 84, 698–711 (2014).
13. Du, L. et al. in Brain Informatics and Health (eds Guo, Y. et al.) 275–284 (Springer, 2015).
14. Guigui, N. et al. Network regularization in imaging genetics improves prediction performances and model interpretability on Alzheimer’s disease. In Proc. IEEE 16th International Symposium on Biomedical Imaging. 1403–1406 (IEEE, 2019).
15. Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).
16. Chalise, P. & Fridley, B. L. Integrative clustering of multi-level ‘omic data based on non-negative matrix factorization algorithm. PLoS ONE 12, e0176278 (2017).
17. Dhillon, P. et al. Subject-specific functional parcellation via Prior Based Eigenanatomy. Neuroimage 99, 14–27 (2014).
18. Tikhonov, A. N. On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39, 195–198 (1943).
19. Bell, J. B. Solutions of ill-posed problems. Math. Comput. 32, 1320–1322 (1978).
20. Smilde, A. K., Westerhuis, J. A. & de Jong, S. A framework for sequential multiblock component methods. J. Chemom. 17, 323–337 (2003).
21. Tenenhaus, A. & Tenenhaus, M. Regularized generalized canonical correlation analysis. Psychometrika 76, 257–284 (2011).
22. Tenenhaus, M., Tenenhaus, A. & Groenen, P. J. Regularized generalized canonical correlation analysis: a framework for sequential multiblock component methods. Psychometrika 82, 737–777 (2017).
23. Zhan, Z., Ma, Z. & Peng, W. Biomedical data analysis based on multi-view intact space learning with geodesic similarity preserving. Neural Processing Lett. 49, 1381–1398 (2019).
24. Baltrušaitis, T., Ahuja, C. & Morency, L. P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2018).
25. Kettenring, J. R. Canonical analysis of several sets of variables. Biometrika 58, 433–451 (1971).
26. Tenenhaus, A. et al. Variable selection for generalized canonical correlation analysis. Biostatistics 15, 569–583 (2014).
27. Rohart, F., Gautier, B., Singh, A. & Lê Cao, K.-A. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).
28. Garali, I. et al. A strategy for multimodal data integration: application to biomarkers identification in spinocerebellar ataxia. Brief. Bioinform. 19, 1356–1369 (2017).
29. Gloaguen, A. et al. Multiway generalized canonical correlation analysis. Biostatistics https://doi.org/10.1093/biostatistics/kxaa010 (2020).
30. Hotelling, H. The most predictable criterion. J. Educ. Psychol. 26, 139–142 (1935).
31. Hotelling, H. Relations between two sets of variates. Biometrika 28, 321–377 (1936).
32. Lock, E. F., Hoadley, K. A., Marron, J. S. & Nobel, A. B. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann. Appl. Stat. 7, 523–542 (2013).
33. Yu, Q., Risk, B. B., Zhang, K. & Marron, J. S. JIVE integration of imaging and behavioral data. Neuroimage 152, 38–49 (2017).
34. Ceulemans, E., Wilderjans, T. F., Kiers, H. A. & Timmerman, M. E. MultiLevel simultaneous component analysis: a computational shortcut and software package. Behav. Res. Methods 48, 1008–1020 (2016).
35. Argelaguet, R. et al. Multi-omics factor analysis–a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124 (2018).
36. Carmichael, I. et al. Joint and individual analysis of breast cancer histologic images and genomic covariates. Preprint at https://arxiv.org/abs/1912.00434 (2019).
37. McMillan, C. T. et al. White matter imaging helps dissociate tau from TDP-43 in frontotemporal lobar degeneration. J. Neurol. Neurosurg. Psychiatry 84, 949–955 (2013).
38. McMillan, C. T. et al. Genetic and neuroanatomic associations in sporadic frontotemporal lobar degeneration. Neurobiol. Aging 35, 1473–1482 (2014).
39. Cook, P. A. et al. Relating brain anatomy and cognitive ability using a multivariate multimodal framework. Neuroimage 99, 477–486 (2014).
40. Hyvärinen, A. & Oja, E. Independent component analysis: a tutorial. In Notes for International Joint Conference on Neural Networks (IJCNN, 1999).
41. Hyvärinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Networks 13, 411–430 (2000).
42. Haykin, S. & Chen, Z. The cocktail party problem. Neural Comput. 17, 1875–1902 (2005).
43. Andersen, P. K. & Gill, R. D. Cox’s regression model for counting processes: a large sample study. Ann. Stat. 10, 1100–1120 (1982).
44. Fox, J. & Weisberg, S. An R Companion to Applied Regression 2nd edn (2011).
45. Huang, L. et al. Development and validation of a prognostic model to predict the prognosis of patients who underwent chemotherapy and resection of pancreatic adenocarcinoma: a large international population-based cohort study. BMC Med. 17, 1–16 (2019).
46. Neums, L., Meier, R., Koestler, D. C. & Thompson, J. A. Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data. Pac. Symp. Biocomput. 25, 415–426 (2020).
47. Rappoport, N. & Shamir, R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 46, 10546–10562 (2018).
48. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
49. Yong, W.-S., Hsu, F.-M. & Chen, P.-Y. Profiling genome-wide DNA methylation. Epigenetics Chromatin 9, 1–16 (2016).
50. Ozsolak, F. & Milos, P. M. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12, 87–98 (2011).
51. Witten, D. M., Tibshirani, R. & Hastie, T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515–534 (2009).
52. Barnhart, H. X., Haber, M. & Song, J. Overall concordance correlation coefficient for evaluating agreement among multiple observers. Biometrics 58, 1020–1027 (2002).
53. Avants, B. B. et al. The pediatric template of brain perfusion. Sci. Data 2, 1–17 (2015).
54. Kandel, B. M., Wang, D. J., Detre, J. A., Gee, J. C. & Avants, B. B. Decomposing cerebral blood flow MRI into functional and structural components: a non-local approach based on prediction. Neuroimage 105, 156–170 (2015).
55. Tustison, N. J. et al. Logical circularity in voxel-based analysis: normalization strategy may induce statistical bias. Hum. Brain Mapp. 35, 745–759 (2014).
56. Franke, K. & Gaser, C. Ten years of BrainAGE as a neuroimaging biomarker of brain aging: what insights have we gained? Front. Neurol. 10, 789 (2019).
57. Jernigan, T. L. et al. The pediatric imaging, neurocognition, and genetics (PING) data repository. Neuroimage 124, 1149–1154 (2016).
58. Bro, R., Kjeldahl, K., Smilde, A. K. & Kiers, H. A. Cross-validation of component models: a critical look at current methods. Anal. Bioanal. Chem. 390, 1241–1251 (2008).
59. Bickel, S. & Scheffer, T. Multi-view clustering. In Proc. IEEE International Conference on Data Mining. 19–26 (ICDM, 2004).
60. Wang, Y., Wu, L., Lin, X. & Gao, J. Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 29, 4833–4843 (2018).
61. De Vito, R., Bellio, R., Trippa, L. & Parmigiani, G. Multi-study factor analysis. Biometrics 75, 337–346 (2019).
62. Eddelbuettel, D. & Balamuta, J. J. Extending R with C++: a brief introduction to Rcpp. Am. Stat. 72, 28–36 (2018).
63. Avants, B. B., Johnson, H. J. & Tustison, N. J. Neuroinformatics and the The Insight Toolkit. Front. Neuroinform. 9, 5 (2015).
64. Avants, B. B. et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044 (2011).
65. Muschelli, J. et al. Neuroconductor: an R platform for medical imaging analysis. Biostatistics 20, 218–239 (2019).
66. Zou, H., Hastie, T. & Tibshirani, R. Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006).
67. Shen, H. & Huang, J. Z. Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034 (2008).
68. Jolliffe, I. T., Trendafilov, N. T. & Uddin, M. A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. 12, 531–547 (2003).
69. Lin, C. J. Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19, 2756–2779 (2007).
70. Jain, P., Netrapalli, P. & Sanghavi, S. Low-rank matrix completion using alternating minimization. In Proc. 45th Annual ACM Symposium on Theory of Computing. 665–674 (ACM, 2013).
71. Blumensath, T. & Davies, M. E. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27, 265–274 (2009).
72. Pustina, D., Avants, B., Faseyitan, O. K., Medaglia, J. D. & Coslett, H. B. Improved accuracy of lesion to symptom mapping with multivariate sparse canonical correlations. Neuropsychologia 115, 154–166 (2018).
73. Hanafi, M. PLS path modelling: computation of latent variables with the estimation mode B. Comput. Stat. 22, 275–292 (2007).
74. Tenenhaus, A., Philippe, C. & Frouin, V. Kernel generalized canonical correlation analysis. Comput. Stat. Data Anal. 90, 114–131 (2015).
75. Malkov, Y. A. & Yashunin, D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 824–836 (2018).
76. Hill, W. G. & Robertson, A. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226–231 (1968).
77. Bahmani, S. & Raj, B. A unifying analysis of projected gradient descent for ℓp-constrained least squares. Appl. Comput. Harmon. Anal. 34, 366–378 (2013).
78. Martí, R., Resende, M. G. & Ribeiro, C. C. Multi-start methods for combinatorial optimization. Eur. J. Oper. Res. 226, 1–8 (2013).
79. Jernigan, T. L. et al. The Pediatric Imaging, Neurocognition, and Genetics (PING) Data Repository. NeuroImage 124, 1149–1154 (2016).
80. Avants, B. B., Tustison, N. J. & Stone, J. R. SiMLR in ANTsR: interpretable, similarity-driven multi-view embeddings from high-dimensional biomedical data. Code Ocean https://doi.org/10.24433/CO.3087836.v2 (2021).
81. Avants, B. B., Tustison, N. J. & Wang, D. J. J. The pediatric template of brain perfusion (PTBP). figshare https://doi.org/10.6084/m9.figshare.923555.v20 (2013).

Acknowledgements

This work is supported by a combined grant from Cohen Veterans Bioscience (CVB-461) and the Office of Naval Research (N00014-18-1-2440) as well as the National Institutes of Health (K01-ES025432-01).

Supplementary data used in the preparation of this article were obtained from the PING study database (https://chd.ucsd.edu/research/ping-study.html). The investigators within PING contributed to the design and implementation of the PING database and/or provided data, but did not participate in the analysis or writing of this report. A complete listing of investigators of the PING study can be found at ref. ⁷⁹.

Supplementary data collection and sharing for this project was funded by ADNI (National Institutes of Health Grant U01 AG024904) and the Department of Defense ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica; Biogen; Bristol Myers Squibb; CereSpir; Cogstate; Eisai; Elan Pharmaceuticals; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche and its affiliated company Genentech; Fujirebio; GE Healthcare; IXICO; Janssen Alzheimer Immunotherapy Research & Development; Johnson & Johnson Pharmaceutical Research & Development; Lumosity; Lundbeck; Merck & Co.; Meso Scale Diagnostics; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (https://fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory of Neuro Imaging at the University of Southern California.

Author information

Authors and Affiliations

Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, VA, USA
Brian B. Avants, Nicholas J. Tustison & James R. Stone

Contributions

B.B.A., N.J.T. and J.R.S. made substantial contributions to the conception and design of the work, and the analysis and interpretation of data. B.B.A. and N.J.T. created the software. All authors drafted and revised the manuscript.

Corresponding author

Correspondence to Brian B. Avants.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks Steve Marron, Cathy Philippe and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Fernando Chirigati was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–5, Tables 1–4 and discussion.

About this article

Cite this article

Avants, B.B., Tustison, N.J. & Stone, J.R. Similarity-driven multi-view embeddings from high-dimensional biomedical data. Nat Comput Sci 1, 143–152 (2021). https://doi.org/10.1038/s43588-021-00029-8

Received: 10 June 2020
Accepted: 19 January 2021
Published: 22 February 2021
Issue Date: February 2021
DOI: https://doi.org/10.1038/s43588-021-00029-8

This article is cited by

Multimodal Fusion of Brain Imaging Data: Methods and Applications
- Na Luo
- Weiyang Shi
- Tianzi Jiang
Machine Intelligence Research (2024)

Hypergraph regularized low-rank tensor multi-view subspace clustering via L1 norm constraint
- Guoqing Liu
- Hongwei Ge
- Shuangxi Wang
Applied Intelligence (2023)

Multi-view subspace enhanced representation of manifold regularization and low-rank tensor constraint
- Guoqing Liu
- Hongwei Ge
- Shuangxi Wang
International Journal of Machine Learning and Cybernetics (2023)

Low-rank tensor multi-view subspace clustering via cooperative regularization
- Guoqing Liu
- Hongwei Ge
- Shuangxi Wang
Multimedia Tools and Applications (2023)

Multi-view clustering via dual-norm and HSIC
- Guoqing Liu
- Hongwei Ge
- Shuangxi Wang
Multimedia Tools and Applications (2022)