Statisticsmedium

Below Is an Example R Script Which Simulates Peaks from a Chromatography Spectrum

Question

Below is an example R script which simulates peaks from a chromatography spectrum. Analyze the script, explain the statistical model behind the peak simulation, and describe how to extract peak parameters from the output.

Gaussian peak simulation, detection, and parameter extraction in R.

Peak Model

Gaussian

Each peak is modeled as h · exp(−(x−μ)² / 2σ²) plus noise

1Understanding the Peak Model

Chromatography peaks are modeled as Gaussian functions: f(x)=hexp((xμ)22σ2)f(x) = h \cdot \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) where hh is peak height, μ\mu is retention time (peak center), and σ\sigma is the peak width parameter.
ParameterSymbolPhysical MeaningUnits
HeighthMaximum intensityAU (absorbance)
PositionμRetention timeminutes
WidthσStandard deviationminutes
AreaA = hσ√(2π)Proportional to amountAU·min

2Simulating the Spectrum

The simulated chromatogram is a sum of individual Gaussian peaks plus random noise: S(x)=i=1nhiexp((xμi)22σi2)+ϵ(x)S(x) = \sum_{i=1}^{n} h_i \cdot \exp\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right) + \epsilon(x) where ϵ(x)N(0,σnoise2)\epsilon(x) \sim N(0, \sigma_{noise}^2).
  R Script Structure:

  # Define peak parameters
  peaks <- data.frame(
    mu    = c(2.5, 5.0, 7.3),  # retention times
    h     = c(100, 250, 80),   # heights
    sigma = c(0.3, 0.4, 0.25)  # widths
  )

  # Generate signal + noise
  x <- seq(0, 10, by = 0.01)
  signal <- rowSums(sapply(1:3, function(i)
    peaks$h[i] * exp(-(x-peaks$mu[i])^2 /
      (2*peaks$sigma[i]^2))))
  noise <- rnorm(length(x), 0, 5)
  spectrum <- signal + noise

3Peak Detection Strategy

Peak detection involves three steps: (1) smoothing to reduce noise, (2) finding local maxima via first derivative zero-crossings, (3) filtering by minimum height threshold.

Finding Local Maxima in R

Use `diff()` to compute the first difference (numerical derivative). A sign change from positive to negative in `diff(signal)` indicates a local maximum. The `pracma::findpeaks()` function automates this with adjustable thresholds.

4Extracting Peak Parameters

Once peaks are detected, fit Gaussian curves using nonlinear least squares (`nls()` in R) to extract hh, μ\mu, and σ\sigma. The area under each peak is: A=hσ2πA = h \cdot \sigma \cdot \sqrt{2\pi}
Step 1: Smooth signalMoving average or Savitzky-Golay
Step 2: Find peaksdiff() zero-crossings
Step 3: Fit Gaussiansnls() per peak region
Step 4: Compute areasA = hσ√(2π)
OUTPUTμ, h, σ, A per peak

Resolution

Two peaks are **resolved** if the valley between them drops below 10% of the smaller peak's height. Resolution Rs=(μ2μ1)/(2(σ1+σ2))R_s = (\mu_2 - \mu_1) / (2(\sigma_1 + \sigma_2)). Values Rs>1.5R_s > 1.5 indicate baseline resolution.

Quiz

Test your understanding with these questions.

1
What mathematical function is typically used to model a single chromatography peak?
2
How is the area under a Gaussian peak calculated?