Andreas Mang :: Publications

Selected Publications

If you are interested in a paper that is not available from this site send me an andreas at math dot uh dot edu. As indicated below, some of the work I contributed to can be found on arXiv. You can find a more complete list of the works I have contributed to on Google Scholar.

An operator-splitting approach for variational optimal control formulations for diffeomorphic shape matching.
Journal of Computational Physics, 493:1124632023, 2023.
with Jiwen He & Robert Azencott
[doi:10.1016/j.jcp.2023.112463 | arXiv:2307.10114v1]

Abstract

We present formulations and numerical algorithms for solving diffeomorphic shape matching problems. We formulate shape matching as a variational problem governed by a dynamical system that models the flow of diffeomorphism $f_t \in \operatorname{diff}(\mathbb{R}^3)$. We overview our contributions in this area, and present an improved, matrix-free implementation of an operator splitting strategy for diffeomorphic shape matching. We showcase results for diffeomorphic shape matching of real clinical cardiac data in $\mathbb{R}^3$ to assess the performance of our methodology.

CLAIRE: Constrained large deformation diffeomorphic image registration on parallel computing architectures.
Journal of Open Source Software, 6(61):3038, 2021.
with M. Brunn, N. Himthani, G. Biros & M. Mehl
[doi:10.21105/joss.03038]

Abstract

CLAIRE (Mang & Biros, 2019) is a computational framework for Constrained LArge deformation diffeomorphic Image REgistration (Mang et al., 2019). It supports highly-optimized, parallel computational kernels for (multi-node) CPU (Gholami et al., 2017; Mang et al., 2019; Mang & Biros, 2016) and (multi-node multi-)GPU architectures (Brunn et al., 2020, 2021). CLAIRE uses MPI for distributed-memory parallelism and can be scaled up to thousands of cores (Mang et al., 2019; Mang & Biros, 2016) and GPU devices (Brunn et al., 2020). The multi-GPU implementation uses device direct communication. The computational kernels are interpolation for semi-Lagrangian time integration, and a mixture of high-order finite difference operators and Fast-Fourier-Transforms (FFTs) for differentiation. CLAIRE uses a Newton-Krylov solver for numerical optimization (Mang & Biros, 2015, 2017). It features various schemes for regularization of the control problem (Mang & Biros, 2016) and different similarity measures. CLAIRE implements different preconditioners for the reduced space Hessian (Brunn et al., 2020; Mang et al., 2019) to optimize computational throughput and enable fast convergence. It uses PETSc (Balay et al., n.d.) for scalable and efficient linear algebra operations and solvers and TAO (Balay et al., n.d.; Munson et al., 2015) for numerical optimization. CLAIRE can be downloaded at https://github.com/andreasmang/claire.

Fast GPU 3D diffeomorphic image registration.
Journal of Parallel and Distributed Computing, 149:149-162, 2021.
with Malte Brunn, Naveen Himthani, George Biros & Miriam Mehl
[doi:10.1016/j.jpdc.2020.11.006 | arXiv:2004.08893]

Abstract

3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss–Newton–Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. Our contributions are new algorithms to significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. We deploy (i) highly-optimized, mixed-precision GPU-kernels for the evaluation of scattered-data interpolation, (ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register 256^3 clinical images in less than 6 s on a single NVIDIA Tesla V100. This amounts to over 20 speed-up over the current version of CLAIRE and over 30 speed-up over existing GPU implementations.

Integrated biophysical modeling and image analysis: Application to neuro-oncology.
Annual Reviews in Biomedical Engineering, 22:309-341, 2020.
with Spyridon Bakas, Shashank Subramanian, Christos Davatzikos & George Biros
[doi:10.1146/annurev-bioeng-062117-121105 | arXiv:2002.09628]

Abstract

Central nervous system (CNS) tumors come with vastly heterogeneous histologic, molecular, and radiographic landscapes, rendering their precise characterization challenging. The rapidly growing fields of biophysical modeling and radiomics have shown promise in better characterizing the molecular, spatial, and temporal heterogeneity of tumors. Integrative analysis of CNS tumors, including clinically acquired multi-parametric magnetic resonance imaging (mpMRI) and the inverse problem of calibrating biophysical models to mpMRI data, assists in identifying macroscopic quantifiable tumor patterns of invasion and proliferation, potentially leading to improved (a) detection/segmentation of tumor subregions and (b) computer-aided diagnostic/prognostic/predictive modeling. This article presents a summary of (a) biophysical growth modeling and simulation, (b) inverse problems for model calibration, (c) these models’ integration with imaging workflows, and (d) their application to clinically relevant studies. We anticipate that such quantitative integrative analysis may even be beneficial in a future revision of the World Health Organization (WHO) classification for CNS tumors, ultimately improving patient survival prospects.

Multi-node multi-GPU diffeomorphic image registration for large-scale imaging problems.
Proc ACM/IEEE Conference on Supercomputing, pp. 523-539, 2020.
with Malte Brunn, Naveen Himthani, George Biros & Miriam Mehl
[doi:10.1109/SC41405.2020.00042 | arXiv:2008.12820]

Abstract

We present a Gauss-Newton-Krylov solver for large deformation diffeomorphic image registration. We extend the publicly available CLAIRE library to multi-node multi-graphics processing unit (GPUs) systems and introduce novel algorithmic modifications that significantly improve performance. Our contributions comprise (i) a new preconditioner for the reduced-space Gauss-Newton Hessian system, (ii) a highly-optimized multi-node multi-GPU implementation exploiting device direct communication for the main computational kernels (interpolation, high-order finite difference operators and Fast-Fourier-Transform), and (iii) a comparison with state-of-the-art CPU and GPU implementations. We solve a 256^3-resolution image registration problem in five seconds on a single NVIDIA Tesla V100, with a performance speedup of 70% compared to the state-of-the-art. In our largest run, we register 2048^3 resolution images (25 B unknowns; approximately 152× larger than the largest problem solved in state-of-the-art GPU implementations) on 64 nodes with 256 GPUs on TACC's Longhorn system.

Image-driven biophysical tumor growth model calibration.
SIAM Journal on Scientific Computing, 42(3):B549-B580, 2020.
with Klaudius Scheufele, Shashank Subramanian, George Biros & Miriam Mehl
[doi:10.1137/19M1275280 | arXiv:1907.07774]

Abstract

We present a novel formulation for the calibration of a biophysical tumor growth model from a single-time snapshot, multiparametric magnetic resonance imaging (MRI) scan of a glioblastoma patient. Tumor growth models are typically nonlinear parabolic partial differential equations (PDEs). Thus, we have to generate a second snapshot to be able to extract significant information from a single patient snapshot. We create this two-snapshot scenario as follows. We use an atlas (an average of several scans of healthy individuals) as a substitute for an earlier, pretumor, MRI scan of the patient. Then, using the patient scan and the atlas, we combine image-registration algorithms and parameter estimation algorithms to achieve a better estimate of the healthy patient scan and the tumor growth parameters that are consistent with the data. Our scheme is based on our recent work (Scheufele et al., Comput. Methods Appl. Mech. Engrg., to appear), but we apply a different and novel scheme where the tumor growth simulation in contrast to the previous work is executed in the patient brain domain and not in the atlas domain yielding more meaningful patient-specific results. As a basis, we use a PDE-constrained optimization framework. We derive a modified Picard-iteration-type solution strategy in which we alternate between registration and tumor parameter estimation in a new way. In addition, we consider an $\ell_1$ sparsity constraint on the initial condition for the tumor and integrate it with the new joint inversion scheme. We solve the subproblems with a reduced space, inexact Gauss--Newton--Krylov/quasi-Newton method. We present results using real brain data with synthetic tumor data that show that the new scheme reconstructs the tumor parameters in a more accurate and reliable way compared to our earlier scheme.

CLAIRE: A distributed-memory solver for constrained large deformation diffeomorphic image registration.
SIAM Journal on Scientific Computing, 41(5):C548-C584, 2019.
with Amir Gholami, Christos Davatzikos & George Biros
[doi:10.1137/18M1207818 | arXiv:1808.04487]

Abstract

With this work we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diifeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss‒Newton‒Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an eifective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We Demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against diiferent flavors of the Demons algorithm for diifeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We Demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5× with a peak performance of up to 17× compared to our former work.

Coupling brain-tumor biophysical models and diffeomorphic image registration.
Computer Methods in Applied Mechanics and Engineering, 347:533-567, 2019.
with Klaudius Scheufele, Amir Gholami, Christos Davatzikos, George Biros & Miriam Mehl
[doi:10.1016/j.cma.2018.12.008 | arXiv:1710.06420]

Abstract

We present SIBIA (Scalable Integrated Biophysics-based Image Analysis), a framework for joint image registration and biophysical inversion and we apply it to analyze MR images of glioblastomas (primary brain tumors). We have two applications in mind. The first one is normal-to-abnormal image registration in the presence of tumor-induced topology differences. The second one is biophysical inversion based on single-time patient data. The underlying optimization problem is highly non-linear and non-convex and has not been solved before with a gradient-based approach.
Given the segmentation of a normal brain MRI and the segmentation of a cancer patient MRI, we determine tumor growth parameters and a registration map so that if we “grow a tumor” (using our tumor model) in the normal brain and then register it to the patient image, then the registration mismatch is as small as possible. This “coupled problem” two-way couples the biophysical inversion and the registration problem. In the image registration step we solve a large-deformation diffeomorphic registration problem parameterized by an Eulerian velocity field. In the biophysical inversion step we estimate parameters in a reaction–diffusion tumor growth model that is formulated as a partial differential equation (PDE). In SIBIA, we couple these two sub-components in an iterative manner. We first presented the components of SIBIA in “Gholami et al., Framework for Scalable Biophysics-based Image Analysis, IEEE/ACM Proceedings of the SC2017,” in which we derived parallel distributed memory algorithms and software modules for the decoupled registration and biophysical inverse problems. In this paper, our contributions are the introduction of a PDE-constrained optimization formulation of the coupled problem, and the derivation of a Picard iterative solution scheme. We perform extensive tests to experimentally assess the performance of our method on synthetic and clinical datasets. We demonstrate the convergence of the SIBIA optimization solver in different usage scenarios. We demonstrate that using SIBIA, we can accurately solve the coupled problem in three dimensions ( resolution) in a few minutes using 11 dual-x86 nodes.

PDE-constrained optimization in medical image analysis.
Optimization and Engineering, 19:765–812, 2018.
with Amir Gholami, Christos Davatzikos & George Biros
[doi:10.1007/s11081-018-9390-9 | arXiv:1803.00058]

Abstract

PDE-constrained optimization problems find many applications in medical image analysis, for example, neuroimaging, cardiovascular imaging, and oncological imaging. We review related literature and give examples on the formulation, discretization, and numerical solution of PDE-constrained optimization problems for medical imaging. We discuss three examples. The first one is image registration. The second one is data assimilation for brain tumor patients, and the third one data assimilation in cardiovascular imaging. The image registration problem is a classical task in medical image analysis and seeks to find pointwise correspondences between two or more images. The data assimilation problems use a PDE-constrained formulation to link a biophysical model to patient-specific data obtained from medical images. The associated optimality systems turn out to be sets of nonlinear, multicomponent PDEs that are challenging to solve in an efficient way.
The ultimate goal of our work is the design of inversion methods that integrate complementary data, and rigorously follow mathematical and physical principles, in an attempt to support clinical decision making. This requires reliable, high-fidelity algorithms with a short time-to-solution. This task is complicated by model and data uncertainties, and by the fact that PDE-constrained optimization problems are ill-posed in nature, and in general yield high-dimensional, severely ill-conditioned systems after discretization. These features make regularization, effective preconditioners, and iterative solvers that, in many cases, have to be implemented on distributed-memory architectures to be practical, a prerequisite. We showcase state-of-the-art techniques in scientific computing to tackle these challenges.

A Lagrangian Gauss-Newton-Krylov solver for intensity- and mass-preserving diffeomorphic image registration.
SIAM Journal on Scientific Computing, 39(5):B860-B885, 2017.
with Lars Ruthotto
[doi:10.1137/17M1114132 | arXiv:1703.04446]

Abstract

We present an efficient solver for diffeomorphic image registration problems in the framework of large deformation diffeomorphic metric mapping (LDDMM). We use an optimal control formulation, in which the velocity field of a hyperbolic PDE needs to be found such that the distance between the final state of the system (the transformed/transported template image) and the observation (the reference image) is minimized. Our solver supports both stationary and nonstationary (i.e., transient or time-dependent) velocity fields. As transformation models, we consider both the transport equation (assuming intensities are preserved during the deformation) and the continuity equation (assuming masses are preserved). We consider the reduced form of the optimal control problem and solve the resulting unconstrained optimization problem using a discretize-then-optimize approach. A key contribution is the elimination of the PDE constraint using a Lagrangian hyperbolic PDE solver. Lagrangian methods rely on the concept of characteristic curves. We approximate these curves using a fourth-order Runge--Kutta method. We also present an efficient algorithm for computing the derivatives of the final state of the system with respect to the velocity field. This allows us to use fast Gauss--Newton-based methods. We present quickly converging iterative linear solvers using spectral preconditioners that render the overall optimization efficient and scalable. Our method is embedded into the image registration framework FAIR and, thus, supports the most commonly used similarity measures and regularization functionals. We demonstrate the potential of our new approach using several synthetic and real-world test problems with up to 14.7 million degrees of freedom.

A semi-Lagrangian two-level preconditioned Newton—Krylov solver for constrained diffeomorphic image registration.
SIAM Journal on Scientific Computing, 39(6):B1064-B1101, 2017.
with George Biros
[doi:10.1137/16M1070475 | arXiv:1604.02153v2]

Abstract

We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation.
We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations.
We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20× speedup for a two dimensional, real world multi-subject medical image registration problem.

A framework for scalable biophysics-based image analysis.
Proc ACM/IEEE Conference on SuperComputing, 19, 2017.
with Amir Gholami, Klaudius Scheufele, Christos Davatzikos, Miriam Mehl & George Biros
[doi:10.1145/3126908.3126930]

Abstract

We present SIBIA (Scalable Integrated Biophysics-based Image Analysis), a framework for coupling biophysical models with medical image analysis. It provides solvers for an image-driven inverse brain tumor growth model and an image registration problem, the combination of which can eventually help in diagnosis and prognosis of brain tumors. The two main computational kernels of SIBIA are a Fast Fourier Transformation (FFT) implemented in the library AccFFT to discretize differential operators, and a cubic interpolation kernel for semi-Lagrangian based advection. We present efficiency and scalability results for the computational kernels, the inverse tumor solver and image registration on two x86 systems, Lonestar 5 at the Texas Advanced Computing Center and Hazel Hen at the Stuttgart High Performance Computing Center. We showcase results that demonstrate that our solver can be used to solve registration problems of unprecedented scale, 40963 resulting in ∼ 200 billion unknowns---a problem size that is 64X larger than the state-of-the-art. For problem sizes of clinical interest, SIBIA is about 8X faster than the state-of-the-art.

Constrained H1 regularization schemes for diffeomorphic image registration.
SIAM Journal on Imaging Sciences, 9(3):1154-1194, 2016.
with George Biros
[doi:10.1137/15M1010919 | arXiv:1503.00757]

Abstract

We propose regularization schemes for deformable registration and efficient algorithms for their numerical approximation. We treat image registration as a variational optimal control problem. The deformation map is parametrized by its velocity. Tikhonov regularization ensures well-posedness. Our scheme augments standard smoothness regularization operators based on H2- and H1-seminorms with a constraint on the divergence of the velocity field, which resembles variational formulations for Stokes incompressible flows. In our formulation, we invert for a stationary velocity field and a mass source map. This allows us to explicitly control the compressibility of the deformation map and by that the determinant of the deformation gradient. We also introduce a new regularization scheme that allows us to control shear. We use a globalized, preconditioned, matrix-free, reduced space (Gauss--)Newton--Krylov scheme for numerical optimization. We exploit variable elimination techniques to reduce the number of unknowns of our system; we only iterate on the reduced space of the velocity field. Our current implementation is limited to the two-dimensional case. The numerical experiments demonstrate that we can control the determinant of the deformation gradient without compromising registration quality. This additional control allows us to avoid oversmoothing of the deformation map. We also demonstrate that we can promote or penalize shear while controlling the determinant of the deformation gradient.

Distributed-memory large deformation diffeomorphic 3D image registration.
Proc ACM/IEEE Conference on SuperComputing, 72:842-853, 2016.
with Amir Gholami & George Biros
[doi:10.1109/SC.2016.71 | arXiv:1608.03630]

Abstract

We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the solution of a highly nonlinear problem that involves pseudo-differential operators, biharmonic operators, and pure advection operators both forward and backward in time. A key issue is the time to solution, which poses the demand for efficient optimization methods as well as an effective utilization of high performance computing resources. To address this problem we use a preconditioned, inexact, Gauss-Newton-Krylov solver. Our algorithm integrates several components: a spectral discretization in space, a semi-Lagrangian formulation in time, analytic adjoints, different regularization functionals (including volume-preserving ones), a spectral preconditioner, a highly optimized distributed Fast Fourier Transform, and a cubic interpolation scheme for the semi-Lagrangian time-stepping. We demonstrate the scalability of our algorithm on images with resolution of up to 10243 on the "Maverick" and "Stampede" systems at the Texas Advanced Computing Center (TACC). The critical problem in the medical imaging application domain is strong scaling, that is, solving registration problems of a moderate size of 2563---a typical resolution for medical images. We are able to solve the registration problem for images of this size in less than five seconds on 64 ×86 nodes of TACC's "Maverick" system.

An inverse problem formulation for parameter estimation of a reaction-diffusion model for low grade gliomas.
Journal of Mathematical Biololgy, 72(1):409-433, 2016.
with Amir Gholami & George Biros
[doi:10.1007/s00285-015-0888-x | arXiv:1408.6221]

Abstract

We present a numerical scheme for solving a parameter estimation problem for a model of low-grade glioma growth. Our goal is to estimate the spatial distribution of tumor concentration, as well as the magnitude of anisotropic tumor diffusion. We use a constrained optimization formulation with a reaction–diffusion model that results in a system of nonlinear partial differential equations. In our formulation, we estimate the parameters using partially observed, noisy tumor concentration data at two different time instances, along with white matter fiber directions derived from diffusion tensor imaging. The optimization problem is solved with a Gauss–Newton reduced space algorithm. We present the formulation and outline the numerical algorithms for solving the resulting equations. We test the method using a synthetic dataset and compute the reconstruction error for different noise levels and detection thresholds for monofocal and multifocal test cases.

An inexact Newton-Krylov algorithm for constrained diffeomorphic image registration.
SIAM Journal on Imaging Sciences, 8(2):1030-1069, 2015.
with George Biros
[doi:10.1137/140984002 | arXiv:1408.6299]

Abstract

We propose numerical algorithms for solving large deformation diffeomorphic image registration problems. We formulate the nonrigid image registration problem as a problem of optimal control. This leads to an infinite-dimensional partial differential equation (PDE) constrained optimization problem.
The PDE constraint consists, in its simplest form, of a hyperbolic transport equation for the evolution of the image intensity. The control variable is the velocity field. Tikhonov regularization on the control ensures well-posedness. We consider standard smoothness regularization based on H1- or H2-seminorms. We augment this regularization scheme with a constraint on the divergence of the velocity field rendering the deformation incompressible and thus ensuring that the determinant of the deformation gradient is equal to one, up to the numerical error.
We use a Fourier pseudospectral discretization in space and a Chebyshev pseudospectral discretization in time. We use a preconditioned, globalized, matrix-free, inexact Newton-Krylov method for numerical optimization. A parameter continuation is designed to estimate an optimal regularization parameter. Regularity is ensured by controlling the geometric properties of the deformation field. Overall, we arrive at a black-box solver. We study spectral properties of the Hessian, grid convergence, numerical accuracy, computational efficiency, and deformation regularity of our scheme. We compare the designed Newton-Krylov methods with a globalized preconditioned gradient descent. We study the influence of a varying number of unknowns in time.
The reported results demonstrate excellent numerical accuracy, guaranteed local deformation regularity, and computational efficiency with an optional control on local mass conservation. The Newton-Krylov methods clearly outperform the Picard method if high accuracy of the inversion is required.