Diffusion for Complete Generation of Collider Events

End-to-end generative modeling for high-dimensional point cloud datasets in physics.

Synopsis

This work makes the first direct comparison between two state-of-the-art score-based generative models, one using images and the other using point clouds, as representations of the same high-granularity calorimeter simulation data. The comparison highlights the challenges of modeling sparse hadronic showers with image-based techniques and the potential of point-cloud methods for future experimental detectors.

Abstract

Score-based generative models are a new class of generative models that have been shown to accurately generate high-dimensional calorimeter datasets. Recent advances in generative models have used images with 3D voxels to represent and model complex calorimeter showers. Point clouds, however, are likely a more natural representation of calorimeter showers, particularly in calorimeters with high granularity. Point clouds preserve all of the information of the original simulation, deal more naturally with sparse datasets, and can be implemented with more compact models and data files. In this work, two state-of-the-art score-based models are trained on the same set of calorimeter simulation and directly compared.

Introduction

Detector simulations are essential tools for data analysis because they connect particle and nuclear physics predictions to measurable quantities. However, the most precise detector simulations (usually based on GEANT 1) are computationally expensive. This is especially true for calorimeters, which are designed to stop most particles and thus require modeling interactions across multiple energy scales. If a fast simulation could be built automatically while retaining the full detector dimensionality, data analysis at existing and developing experiments could be greatly enhanced.

Deep learning (DL) has been used to build automated and high-dimensional fast simulations ('surrogate models') for calorimeters. Starting from Generative Adversarial Networks (GANs) and now including Diffusion Models 2, these methods have rapidly improved. However, nearly all proposed methods for DL-based calorimeter simulations are based on an image format (a fixed grid of pixels) 3. These data are unlike natural images in a number of ways, most notably in their sparsity.

Representing Sparse Data

Since most cells in a high-granularity calorimeter image are empty, a more natural representation of these data may be a point cloud 4. A point cloud is a set of attributes assigned to locations in space; in the calorimeter case, the attribute is the energy and the location is the cell coordinates. A calorimeter point cloud requires far fewer numbers to specify than an image representation, since only cells with non-zero energy are recorded.
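As a minimal illustration of this idea, the sketch below converts a toy 3D calorimeter image into a point cloud by keeping only cells with non-zero energy. The function name `image_to_point_cloud`, the 4 x 4 x 4 grid, and the energy values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

# One hit: (x index, y index, layer index, deposited energy)
Hit = Tuple[int, int, int, float]


def image_to_point_cloud(image: List[List[List[float]]],
                         threshold: float = 0.0) -> List[Hit]:
    """Keep only the cells whose energy exceeds `threshold` (zero suppression)."""
    hits: List[Hit] = []
    for x, plane in enumerate(image):
        for y, row in enumerate(plane):
            for z, energy in enumerate(row):
                if energy > threshold:
                    hits.append((x, y, z, energy))
    return hits


# Toy 4 x 4 x 4 image in which only three cells carry energy (assumed values).
n = 4
image = [[[0.0] * n for _ in range(n)] for _ in range(n)]
image[0][1][2] = 0.5
image[3][3][0] = 1.25
image[2][0][1] = 0.75

cloud = image_to_point_cloud(image)

dense_count = n ** 3           # numbers stored by the image representation
sparse_count = 4 * len(cloud)  # three coordinates plus one energy per hit

print(cloud)
print(dense_count, sparse_count)
```

In this toy case the image stores 64 numbers while the point cloud stores 12. The saving grows as the grid becomes finer and the occupancy stays low.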

For a fair comparison, two diffusion models (one image-based, one point-cloud-based) are trained using the same score-matching strategy on representations of the same parent GEANT simulation. We simulate a high-granularity iron-scintillator calorimeter similar to the forward hadronic calorimeter planned for the ePIC detector at the Electron-Ion Collider.
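This page does not spell out the training objective, so the following is only a generic sketch of denoising score matching, a common way to train score-based models, and not necessarily the exact scheme used in the paper. It assumes a single Gaussian noise level `sigma` and one-dimensional toy data; `dsm_loss`, `exact_score`, and `zero_score` are hypothetical names. For data perturbed as `x_noisy = x + sigma * eps`, the score of the perturbation kernel is `-eps / sigma`, so a good score model should make `sigma * score + eps` small.

```python
import random


def dsm_loss(score_fn, data, sigma, rng):
    """Monte Carlo estimate of the sigma^2-weighted denoising score matching loss."""
    total = 0.0
    for x in data:
        eps = rng.gauss(0.0, 1.0)   # standard normal noise
        x_noisy = x + sigma * eps   # perturbed sample
        # The regression target for the score is -eps / sigma.
        residual = sigma * score_fn(x_noisy, sigma) + eps
        total += residual ** 2
    return total / len(data)


def exact_score(x, sigma):
    # For data ~ N(0, 1), the noisy data follow N(0, 1 + sigma^2),
    # whose score is -x / (1 + sigma^2).
    return -x / (1.0 + sigma ** 2)


def zero_score(x, sigma):
    # A deliberately poor model that ignores its input.
    return 0.0


data_rng = random.Random(0)
data = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
sigma = 1.0

# Use the same noise seed for both models so the comparison is paired.
loss_exact = dsm_loss(exact_score, data, sigma, random.Random(1))
loss_zero = dsm_loss(zero_score, data, sigma, random.Random(1))

print(loss_exact, loss_zero)
```

The analytically correct score yields a lower loss than the trivial model. In a real setup a neural network replaces `exact_score`, and the loss is averaged over many noise levels.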

Results & Advantages

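Both models perform well for most distributions and show very promising classifier performance (AUCs) 5, deviating no more than 10% from the baseline at smaller deposited energies. However, the point cloud model offers several distinct advantages over the image model: it has fewer parameters and samples faster 6, and it learns directly from cell-level hits rather than from cells clustered into coarser voxels 7.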
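To make the classifier metric concrete, the sketch below computes an AUC from classifier scores using the pairwise (Mann-Whitney) definition: the fraction of (GEANT, generated) pairs in which the GEANT shower receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration and are not the paper's classifier outputs.

```python
def roc_auc(scores_geant, scores_generated):
    """AUC as the probability that a GEANT shower outscores a generated one."""
    wins = 0.0
    for s_real in scores_geant:
        for s_gen in scores_generated:
            if s_real > s_gen:
                wins += 1.0
            elif s_real == s_gen:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_geant) * len(scores_generated))


# Toy classifier outputs (assumed values): higher means "looks like GEANT".
scores_geant = [0.9, 0.6, 0.4]
scores_generated = [0.7, 0.3, 0.2]

auc = roc_auc(scores_geant, scores_generated)

# A classifier that cannot tell the samples apart gives 0.5.
auc_indistinguishable = roc_auc([0.5, 0.5], [0.5, 0.5])

print(auc, auc_indistinguishable)
```

An AUC of 0.5 means the classifier cannot separate generated showers from GEANT showers, while 1.0 means perfect separation, so values closer to 0.5 indicate a more faithful generator.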

As calorimeters continue to increase in granularity, the advantages of point clouds, combined with further model optimizations, will likely make point-cloud-based models a clear choice for future detectors 8.
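Reference 7 notes that the image representation clusters cells into 5 x 5 x 5 voxels. A minimal sketch of such a clustering, assuming integer cell indices and simple energy summation within each voxel, is given below. The function name `cluster_into_voxels` and the hit values are hypothetical, and the paper's actual preprocessing may differ.

```python
from collections import defaultdict


def cluster_into_voxels(hits, size=5):
    """Sum hit energies inside cubic voxels spanning `size` cells per axis."""
    voxels = defaultdict(float)
    for x, y, z, energy in hits:
        # Integer division maps each cell index to its voxel index.
        voxels[(x // size, y // size, z // size)] += energy
    return dict(voxels)


# Toy cell-level hits: (x index, y index, layer index, energy), assumed values.
hits = [
    (0, 1, 2, 0.5),
    (4, 4, 4, 0.25),   # falls in the same voxel as the first hit
    (5, 0, 0, 1.0),
    (12, 7, 3, 2.0),
]

voxels = cluster_into_voxels(hits)
total_energy = sum(voxels.values())

print(voxels)
print(total_energy)
```

The total energy is preserved, but the first two hits merge into a single voxel, so their individual positions can no longer be recovered. The point cloud keeps them as separate cell-level entries.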

References

  1. GEANT is the standard framework for the precise simulation of the passage of particles through matter, but requires substantial processing time.
  2. Vinicius Mikuni and Benjamin Nachman. "Score-based generative models for calorimeter shower simulation." Phys. Rev. D, 106(9):092009, 2022.
  3. The recent CaloChallenge community comparison showcased state-of-the-art methods, but predominantly relied on modeling the calorimeters as segmented image grids.
  4. Vinicius Mikuni, Benjamin Nachman, and Mariel Pettee. "Fast Point Cloud Generation with Diffusion Models in High Energy Physics." 2023.
  5. A classifier trained to distinguish between generated and GEANT showers produced an AUC of 0.673 for the image model and 0.726 for the point cloud model.
  6. In full-scale testing, the point cloud model contains roughly 620k parameters versus 2.5 million for the image model, reducing the time to sample 100k events from ~8000 seconds to ~2600 seconds.
  7. The image representation required cells to be clustered into $5 \times 5 \times 5$ voxels due to memory limitations, while the point cloud efficiently learns from the zero-suppressed cell-level hits directly.
  8. R. Abdul Khalek et al. "Science Requirements and Detector Concepts for the Electron-Ion Collider: EIC Yellow Report." Nucl. Phys. A, 1026:122447, 2022.