In Part 1, we looked at a well-known spectroscopic data set from Martens et al., “Light scattering and light absorbance separated by extended multiplicative signal correction. Application to near-infrared transmission analysis of powder mixtures,” Analytical Chemistry 75(3) (February 1, 2003): 394–404. With just a few simple pre-processing steps, we were able to substantially increase the data’s signal-to-noise ratio.
Here, we show how useful JMP’s multivariate platforms are by applying them to the same data set. Spectral analysis makes heavy use of multivariate statistics, because the analyst iteratively tries new pre-processing procedures and evaluates their effect on the data. First, we focus on unsupervised learning, also known as exploratory data analysis. Next, we show how to build multivariate calibration models using the Functional Data Explorer (FDE). Lastly, we present a more sophisticated pre-processing technique, the extended multiplicative signal correction (EMSC), and demonstrate how it can further improve our multivariate calibration model.
Multivariate Techniques
Principal Component Analysis
First, we show how principal component analysis (PCA) can be used as an exploratory technique. Interestingly, when a PCA is run on the raw spectra with no additional processing, the known subgroups are already easy to see in the score plot (Figure 1). Recall from Part 1 that the samples are mixtures of gluten and starch, and that the colors represent the gluten fraction of each mixture. Although the spectra separate reliably into groups, spectra with the same gluten fraction still show significant variation. This within-group variation will cause problems for multivariate calibration. Once we determine the best pre-processing procedure, it decreases noticeably.
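Outside JMP, the same kind of score plot can be approximated in a few lines. The sketch below is only an illustration, not JMP’s implementation: the spectra are synthetic (two made-up Gaussian bands standing in for gluten and starch, a random multiplicative scatter factor, and a little noise), and the function name `pca_scores` is our own.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """PCA via SVD of the mean-centered data.

    Returns the scores, the loadings, and the fraction of total
    variance explained by each retained component.
    """
    X = np.asarray(spectra, dtype=float)
    centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for the spectra (NOT the Martens et al. data).
rng = np.random.default_rng(0)
wavelengths = np.linspace(850, 1050, 100)
gluten_band = np.exp(-((wavelengths - 920) / 25) ** 2)   # hypothetical band
starch_band = np.exp(-((wavelengths - 990) / 25) ** 2)   # hypothetical band
fractions = np.repeat([0.0, 0.25, 0.5, 0.75, 1.0], 20)   # gluten fraction

# Mixture spectrum times a random scatter factor, plus measurement noise.
pure = np.outer(fractions, gluten_band) + np.outer(1 - fractions, starch_band)
scatter = rng.uniform(0.8, 1.2, size=fractions.size)
spectra = scatter[:, None] * pure + rng.normal(scale=0.01, size=pure.shape)

scores, loadings, explained = pca_scores(spectra, n_components=2)

# Summarize where each gluten fraction lands on the first two components.
for f in np.unique(fractions):
    group = scores[fractions == f]
    print(f, group.mean(axis=0).round(3), group.std(axis=0).round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `fractions`, gives the analogue of the score plot. The per-group standard deviations printed at the end correspond to the within-group variation discussed above, which in this toy example comes from the simulated scatter factor.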
Model Driven Multivariate Control Chart
An outlier analysis is typically carried out to find and remove potentially misleading outliers before a model is fitted. JMP’s PCA platform includes outlier analysis tools. However, the Model Driven Multivariate Control Chart (MDMCC), introduced in JMP 15, is a more capable tool for outlier analysis: it supports more in-depth root cause analysis and makes it easy to compare spectra to see how they differ.
Using T² and SPE (squared prediction error) charts built from the PCA model, MDMCC identifies two spectra (13 and 14) as outliers in this data set (Figure 2). The T² chart shows that these points lie far from the rest of the data within the model plane, and the SPE chart shows that they are also rather far from the model plane itself. In other words, these spectra are 1) different from typical spectra and 2) poorly fitted by the PCA model. Because both criteria are met, these points have the potential to be influential outliers that shift the PCA model plane when it is fitted. If they are in fact bad data, the result could be a poor PCA model. Aside from points 13 and 14, a few more points stand out in the T² chart but not in the SPE chart. These points are less likely to be influential: although they lie farther from the mean spectrum, they are still consistent with the PCA model. Figure 4 provides a helpful depiction of the difference between T² and SPE outliers.
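To make the distinction concrete, here is a minimal sketch of how the two statistics can be computed from a PCA model. It uses the textbook definitions (T² as the variance-scaled squared distance of the scores within the model plane, SPE as the squared residual distance from the plane) and is not a reproduction of MDMCC, which also computes control limits. The data are synthetic, and the function name `t2_and_spe` is our own.

```python
import numpy as np

def t2_and_spe(data, n_components):
    """Hotelling's T^2 and squared prediction error (SPE) per row."""
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    P = Vt[:n_components].T                      # loadings (variables x k)
    scores = centered @ P                        # position in the model plane
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(scores**2 / score_var, axis=1)   # distance within the plane
    residuals = centered - scores @ P.T          # part the model cannot fit
    spe = np.sum(residuals**2, axis=1)           # distance from the plane
    return t2, spe

# Synthetic data: 40 samples near a 2-D plane in 10 variables.
rng = np.random.default_rng(1)
n_samples, n_vars, k = 40, 10, 2
basis, _ = np.linalg.qr(rng.normal(size=(n_vars, 3)))
plane, off_plane = basis[:, :2], basis[:, 2]

latent = rng.normal(size=(n_samples, k))
latent[0] = [6.0, 6.0]    # row 0: extreme, but still inside the plane
latent[1] = [0.0, 0.0]    # row 1: ordinary position within the plane
data = latent @ plane.T + rng.normal(scale=0.02, size=(n_samples, n_vars))
data[1] += 0.5 * off_plane  # row 1: pushed off the plane

t2, spe = t2_and_spe(data, k)
print("largest T2 at row", int(np.argmax(t2)))
print("largest SPE at row", int(np.argmax(spe)))
```

By construction, row 0 is a T²-only outlier (far from the mean but well described by the model), while row 1 is an SPE outlier (close to the mean but poorly fitted). Spectra 13 and 14 in the text are flagged by both charts, which is what makes them candidates for influential outliers.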