2011_06_29_DAILY PAPER REVIEW (AAMIR ALAUD-DIN)
Title and Authors
Title: Practical aspects of PARAFAC modeling of fluorescence excitation-emission data
Authors: C. M. Andersen1,2* and R. Bro1
1The Royal Veterinary and Agricultural University, Department of Food Science, Food Technology, Rolighedsvej 30, DK-1958 Frederiksberg C, Denmark
2Danish Institute of Fisheries Research, Department of Seafood Research, DTU Building 221, DK-2800 Kgs. Lyngby, Denmark
SUMMARY OF PAPER
INTRODUCTION
Problems:
Fluorescence spectroscopy has proved to be a very fruitful technique in many fields, such as chemistry, medicine, and environmental science. Problems associated with the technique include interferences, scatter, and overlapping signals. When fluorescence data are modeled, interference from scattered light influences the model parameters.
Solution:
The autofluorescence recorded in excitation-emission spectra can be analyzed with multilinear models such as PARAFAC, Tucker, and N-PLS. For trilinear data, PARAFAC makes curve resolution possible: it provides concentration profiles of the analytes present in the samples. This paper demonstrates the details of using the PARAFAC model and provides guidelines for handling the problems that arise during PARAFAC modeling.
MATERIALS AND METHODS
Data:
Two different data sets were used. Their specifications are given below.
Fish Data:
Two frozen storage temperatures (-20 °C and -30 °C)
Four storage periods (3, 6, 9, or 12 months)
One chill storage temperature (2 °C)
Five storage periods (0, 3, 7, 14, or 21 days)
105 samples were analyzed spectrofluorimetrically at 22 °C.
Emission wavelength range was 270 - 600 nm at 2 nm intervals.
Excitation wavelength range was 250 - 370 nm at 10 nm intervals.
Data with Known Fluorophores:
27 samples of four fluorophores in varying concentrations.
Emission wavelength range was 200 - 315 nm at 5 nm intervals.
Excitation wavelength range was 250 - 459 nm at 1 nm intervals.
Model and its Description:
The PARAFAC model can be written as
$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$
$$i = 1, \ldots, I; \quad j = 1, \ldots, J; \quad k = 1, \ldots, K$$
where x_ijk is the intensity of the ith sample at the jth variable (emission mode) and the kth variable (excitation mode), a_if, b_jf, and c_kf are the model parameters, and e_ijk is the variation not captured by the model. The PARAFAC components are the signals from the individual fluorophores. If the correct number of components is selected, a_if describes the relative concentration of analyte f in sample i, and the vectors b_f and c_f are its emission and excitation spectra. Inner filter effects at high concentrations, scattering, and quenching disturb the trilinearity of the data. An abundance of missing values and similarity between spectra can lead to uncertain estimates. A PARAFAC model can be validated using fit values, visual assessment of the loadings, residual analysis, the core consistency diagnostic, jack-knifing, and split-half analysis.
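The trilinear model above can be sketched in a few lines. This is a minimal, unconstrained alternating least squares (ALS) fit written with NumPy, assuming a complete array with no missing values. The function names (khatri_rao, reconstruct, parafac_als) are hypothetical and chosen for this illustration; they are not taken from the paper or from any toolbox.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: (p, F) and (q, F) -> (p*q, F)."""
    F = P.shape[1]
    return np.einsum('pf,qf->pqf', P, Q).reshape(-1, F)

def reconstruct(A, B, C):
    """Model part of x_ijk: sum over f of a_if * b_jf * c_kf."""
    return np.einsum('if,jf,kf->ijk', A, B, C)

def parafac_als(X, F, n_iter=1000, tol=1e-10, seed=0):
    """Unconstrained ALS sketch for a complete (I, J, K) array X."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
    # Unfold X along each mode so every update is an ordinary least squares fit.
    X1 = X.reshape(I, J * K)                      # samples x (emission, excitation)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # emission x (samples, excitation)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # excitation x (samples, emission)
    prev = np.inf
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        sse = np.sum((X - reconstruct(A, B, C)) ** 2)
        if abs(prev - sse) <= tol * max(sse, 1e-30):
            break  # relative change in the residual sum of squares is negligible
        prev = sse
    return A, B, C
```

With this layout, the rows of A play the role of the scores (relative concentrations), and the columns of B and C are the estimated emission and excitation profiles, up to scaling and ordering of the components.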
Data Pretreatment:
No emission should occur at wavelengths below the excitation wavelength; otherwise the emitted light would carry more energy than the absorbed light (energy and wavelength are inversely related). The emission intensity below the excitation wavelength is therefore zero. These zeros do not conform to the three-way model, so the problem is handled by setting these values to missing in the analysis.
For emission wavelengths shorter than the excitation wavelength, no fluorescence intensity is physically possible. Nevertheless, part of the measured data shows non-zero values in this region, as shown in figure 1. In this no-fluorescence region the data do not follow the three-way structure, so the PARAFAC model is invalid for this part.
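The pretreatment described above can be sketched as follows. The axes use the fish-data ranges listed earlier; the array layout (samples, emission, excitation) and the function name mask_below_excitation are assumptions made for this illustration.

```python
import numpy as np

def mask_below_excitation(eem, em_wl, ex_wl):
    """Return a copy of eem (samples x emission x excitation) with NaN
    wherever the emission wavelength is shorter than the excitation wavelength."""
    out = np.array(eem, dtype=float, copy=True)
    invalid = em_wl[:, None] < ex_wl[None, :]   # (n_em, n_ex) boolean mask
    out[:, invalid] = np.nan                    # same mask applied to every sample
    return out

# Axes taken from the fish data description.
em_wl = np.arange(270, 601, 2)    # emission 270-600 nm, 2 nm steps
ex_wl = np.arange(250, 371, 10)   # excitation 250-370 nm, 10 nm steps
```

NaN is used here only as a marker for missing values; the fitting routine then has to skip or impute these elements rather than treat them as zeros.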
Figure 1
Figure 2 shows the emission spectra for different excitation wavelengths. The emission data are bilinear: the spectra have the same shape at every excitation wavelength. If zeros are inserted here (circled in figure 2 and enlarged in figure 3), the shape of each emission spectrum changes, because the zeros fall at different emission wavelengths for different excitation wavelengths. Many components are then required to describe the data, since each emission spectrum now looks different. Using zeros here is therefore inadequate for the analysis. The problem arises only for the part of the emission just below the excitation wavelength; in the rest of this region the signal is physically zero. Some molecules show two emission peaks for a single excitation wavelength, so part of the emission may lie below the excitation peak.
Figure 2
Figure 3
Scattering is another problem in the spectra. It is caused by small particles in the samples, which scatter the light on collision. In elastic (Rayleigh) scattering the scattered light has the same wavelength as the excitation light, so the first-order Rayleigh peak appears along the diagonal where emission equals excitation. If energy is lost in the collision (inelastic scattering), the scattered light appears at a longer wavelength than the excitation. A second-order Rayleigh peak also appears at twice the excitation wavelength because of the diffraction grating in the monochromator of the instrument.
There are several ways to deal with these problems. The emission region very close to the excitation wavelength, and the region at twice the excitation wavelength, can be set to missing. Other techniques include inserting zeros and down-weighting the affected elements. The purpose of all these techniques is to minimize the non-trilinear behavior of the affected part of the data. Rayleigh scatter is the most important to handle, and the different ways of minimizing it give similar results.
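Setting the scatter regions to missing can be sketched as below. The band half-width of 10 nm is an arbitrary value chosen for this illustration, not one reported in the paper, and the function name mask_rayleigh and the (samples, emission, excitation) layout are likewise assumptions.

```python
import numpy as np

def mask_rayleigh(eem, em_wl, ex_wl, half_width=10.0):
    """Set first- and second-order Rayleigh bands to NaN.

    eem has shape (samples, emission, excitation). half_width (nm) is a
    hypothetical band width; in practice it would be tuned by inspecting the data.
    """
    out = np.array(eem, dtype=float, copy=True)
    em = em_wl[:, None].astype(float)
    ex = ex_wl[None, :].astype(float)
    first_order = np.abs(em - ex) <= half_width          # emission close to excitation
    second_order = np.abs(em - 2.0 * ex) <= half_width   # emission close to 2 x excitation
    out[:, first_order | second_order] = np.nan
    return out
```

Down-weighting instead of removal would replace the NaN assignment with a weight array that is small inside the bands and one elsewhere; that variant is not shown here.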
RESULTS AND DISCUSSION
The steps and methods for finding a valid PARAFAC model are discussed below. The correct number of components can be found by inspecting the loadings, the variance explained by the model, the number of iterations, and the core consistency diagnostic. Artifacts in the model are corrected with constraints. Jack-knifed score values, leverages, and the distribution of sample residuals are used to identify outliers. The model is validated by split-half analysis.
Initial PARAFAC Modeling of Fish Data
PARAFAC models with one to five components and no constraints were fitted. The visual appearance of the fluorescence EEMs suggests a two-component model. However, other components may have smaller peaks at the same excitation and emission wavelengths as the dominant fluorophores, so models with more than two components should also be considered.
The percentage of explained variance does not change appreciably after the third component, which points to a three-component model. The core consistency diagnostic (the relative sum-of-squared difference between the core computed from the data and the PARAFAC loadings and the ideal superdiagonal core) drops to 37% for the three-component model, which indicates that a two-component model is suitable. At this early stage it is not possible to draw a final conclusion about the number of PARAFAC components.
When the right number of components is chosen, the model converges to the same solution from different starting points. This is not conclusive evidence on its own, but together with the fit values it indicated that three components were feasible. Moreover, the absence of degeneracy and of local minima indicated that a three-component model is adequate.
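The two diagnostics mentioned here can be sketched as follows, assuming complete data and loadings A (samples), B (emission), and C (excitation) with full column rank. The core is the least squares Tucker3 core for the fixed PARAFAC loadings, and it is compared with a superdiagonal array of ones. The function names are hypothetical.

```python
import numpy as np

def explained_variance(X, A, B, C):
    """Percentage of the sum of squares of X described by the model."""
    resid = X - np.einsum('if,jf,kf->ijk', A, B, C)
    return 100.0 * (1.0 - np.sum(resid ** 2) / np.sum(X ** 2))

def core_consistency(X, A, B, C):
    """Core consistency in percent for fixed PARAFAC loadings."""
    F = A.shape[1]
    # Least squares Tucker3 core given the loadings (pseudo-inverse in each mode).
    G = np.einsum('pi,qj,rk,ijk->pqr',
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0          # ideal core: ones on the superdiagonal
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)
```

A value close to 100% means the trilinear model describes the systematic variation well; a sharp drop when one more component is added is taken as a sign that the larger model is overfitted.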
Visual Appearance of Loadings
If the number of components is correct and the data are trilinear, the excitation and emission loadings of a correctly fitted PARAFAC model represent the spectra of the analytes. The emission loadings near 300 nm and a large negative peak in figure 4 for the two- and three-component models show that the model did not identify the analytes correctly. The long, narrow peak in the emission loading of the three-component model may be due to missing data at low emission wavelengths and a small amount of scatter.
Jack-Knife Validation of Loadings
Jack-knifing is a resampling technique used to evaluate the stability of the model. Large standard errors are obtained for component three where the peak is tall and narrow.
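The leave-one-out idea behind jack-knifing can be sketched generically. Here fit_loadings is a hypothetical callable standing in for a PARAFAC fit that returns the loadings of interest; in a real application the components from each refit would also have to be matched in order, sign, and scale before they are compared, which this sketch does not do.

```python
import numpy as np

def jackknife_se(X, fit_loadings):
    """Leave-one-sample-out jack-knife standard errors.

    X has samples along its first axis. fit_loadings(X_subset) must return
    an array of the same shape for every subset.
    """
    n = X.shape[0]
    reps = np.stack([fit_loadings(np.delete(X, i, axis=0)) for i in range(n)])
    centre = reps.mean(axis=0)
    # Standard jack-knife variance: (n - 1) / n times the sum of squared deviations.
    return np.sqrt((n - 1) / n * np.sum((reps - centre) ** 2, axis=0))
```

Large values at particular wavelengths indicate that the loading there depends strongly on which samples are included.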
Explanation of Artifactual Loadings
Jack-knifing quantifies the uncertainty in the model, but it does not explain the cause. To investigate, components one and two are subtracted from the data to isolate the third component. Visual inspection shows that the emission near 300 nm is not easily detectable. The problem is due to the large amount of missing values. Constraints help to solve it.
Applying Constraints
The negative peak in figure 4 indicates that the model is invalid. To cope with this, non-negativity constraints are applied, which remove the negative peak. Such constraints should not be needed when the data conform to the model. The improvement is confirmed by CORCONDIA: its value for the three-component model rises from 37% without constraints to 59% with non-negativity constraints.
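One simple way to keep all loadings non-negative is a multiplicative update of the Lee-Seung type, sketched below. This is a swapped-in technique for illustration and is not claimed to be the algorithm used in the paper; it assumes a complete, non-negative array X, and the function names are hypothetical.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: (p, F) and (q, F) -> (p*q, F)."""
    F = P.shape[1]
    return np.einsum('pf,qf->pqf', P, Q).reshape(-1, F)

def nonneg_parafac(X, F, n_iter=300, seed=0, eps=1e-12):
    """Non-negative trilinear fit by multiplicative updates (X >= 0, no NaN)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive start, so the updates can never produce negative values.
    A = rng.random((I, F)) + 0.1
    B = rng.random((J, F)) + 0.1
    C = rng.random((K, F)) + 0.1
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        Z = khatri_rao(B, C)
        A *= (X1 @ Z) / (A @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, C)
        B *= (X2 @ Z) / (B @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, B)
        C *= (X3 @ Z) / (C @ (Z.T @ Z) + eps)
    return A, B, C
```

Because each factor is only ever multiplied by a non-negative ratio, negative peaks like the one discussed above cannot occur in the estimated loadings. Convergence of this scheme is typically slower than that of least squares updates.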
Data with Known Fluorophores
The data contain four analytes. Based on the explained variance and the number of iterations, PARAFAC modeling suggested four or five components. With too many components the number of iterations increases dramatically, but this is only an indication, because the number of iterations may also increase for highly correlated data. For the five-component model, the noise in the data becomes obvious. After exclusion of this part, inspection of the loadings makes it clear that a four-component model is valid.
Finding Outliers in Fish Data
A three-component model with non-negativity constraints is used to identify outliers with the help of an identity match plot. An outlying sample appears away from the remaining data. Sample 42 was found to be an outlier because it lay apart in the plots of all modes. Other techniques, such as jack-knifing and score values, did not reveal any additional outliers.
Samples 14, 37, and 82 also appeared as outliers, but the loadings with and without these samples were similar, so they were not removed from the data set.
Data with Known Fluorophores
Samples 2, 3, and 4 had high leverage values, but the loadings did not change when they were removed. The high leverage reflects a high concentration of one of the analytes, so the samples were kept in the data set.
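Sample leverages can be computed from the score matrix as the diagonal of the projection matrix A (A'A)^-1 A'. The sketch below assumes a score matrix A with samples in rows and full column rank; the function name sample_leverage is hypothetical.

```python
import numpy as np

def sample_leverage(A):
    """Diagonal of A (A^T A)^-1 A^T for a score matrix A (samples x components)."""
    inv_gram = np.linalg.inv(A.T @ A)
    return np.einsum('if,fg,ig->i', A, inv_gram, A)
```

Each leverage lies between 0 and 1 and the values sum to the number of components, so a sample with an unusually high concentration of one analyte stands out with a value well above the average.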
Split Half Validation
Split-half validation fits the model to different groups of samples. If the loadings are similar across the subsets, the model is validated. The excitation and emission peaks obtained for three components were 290, 330, 330, 360, 330, and 400 nm respectively.
This paper gives a clear picture of how to apply a PARAFAC model and how to solve the associated problems.
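Similarity between loadings from two halves can be quantified with Tucker's congruence coefficient, after matching the components, since PARAFAC does not fix their order or scale. This is one common way to make the comparison and is not necessarily the criterion used in the paper; the function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def tucker_congruence(u, v):
    """Cosine of the angle between two loading vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def match_components(L1, L2):
    """Find the column order of L2 that best matches L1.

    Returns (perm, phis), where perm[f] is the column of L2 paired with
    column f of L1 and phis[f] is the absolute congruence of that pair.
    All permutations are tried, which is fine for a handful of components.
    """
    F = L1.shape[1]
    best_perm, best_phis = None, None
    for perm in permutations(range(F)):
        phis = [abs(tucker_congruence(L1[:, f], L2[:, p])) for f, p in enumerate(perm)]
        if best_phis is None or min(phis) > min(best_phis):
            best_perm, best_phis = perm, phis
    return best_perm, best_phis
```

If every matched pair has a congruence close to 1 in both the emission and the excitation mode, the two halves give essentially the same spectra.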