Statisticsmedium
Below Is an Example R Script Which Simulates Peaks from a Chromatography Spectrum
Question
Below is an example R script which simulates peaks from a chromatography spectrum. Analyze the script, explain the statistical model behind the peak simulation, and describe how to extract peak parameters from the output.
Gaussian peak simulation, detection, and parameter extraction in R.
Peak Model
Gaussian
Each peak is modeled as h · exp(−(x−μ)² / 2σ²) plus noise
1Understanding the Peak Model
Chromatography peaks are modeled as Gaussian functions: where is peak height, is retention time (peak center), and is the peak width parameter.
| Parameter | Symbol | Physical Meaning | Units |
|---|---|---|---|
| Height | h | Maximum intensity | AU (absorbance) |
| Position | μ | Retention time | minutes |
| Width | σ | Standard deviation | minutes |
| Area | A = hσ√(2π) | Proportional to amount | AU·min |
2Simulating the Spectrum
The simulated chromatogram is a sum of individual Gaussian peaks plus random noise: where .
R Script Structure: # Define peak parameters peaks <- data.frame( mu = c(2.5, 5.0, 7.3), # retention times h = c(100, 250, 80), # heights sigma = c(0.3, 0.4, 0.25) # widths ) # Generate signal + noise x <- seq(0, 10, by = 0.01) signal <- rowSums(sapply(1:3, function(i) peaks$h[i] * exp(-(x-peaks$mu[i])^2 / (2*peaks$sigma[i]^2)))) noise <- rnorm(length(x), 0, 5) spectrum <- signal + noise
3Peak Detection Strategy
Peak detection involves three steps: (1) smoothing to reduce noise, (2) finding local maxima via first derivative zero-crossings, (3) filtering by minimum height threshold.
Finding Local Maxima in R
Use `diff()` to compute the first difference (numerical derivative). A sign change from positive to negative in `diff(signal)` indicates a local maximum. The `pracma::findpeaks()` function automates this with adjustable thresholds.
4Extracting Peak Parameters
Once peaks are detected, fit Gaussian curves using nonlinear least squares (`nls()` in R) to extract , , and . The area under each peak is:
Step 1: Smooth signalMoving average or Savitzky-Golay
Step 2: Find peaksdiff() zero-crossings
Step 3: Fit Gaussiansnls() per peak region
Step 4: Compute areasA = hσ√(2π)
OUTPUTμ, h, σ, A per peak
Resolution
Two peaks are **resolved** if the valley between them drops below 10% of the smaller peak's height. Resolution . Values indicate baseline resolution.
Quiz
Test your understanding with these questions.
1
What mathematical function is typically used to model a single chromatography peak?
2
How is the area under a Gaussian peak calculated?