From 1Hôpital des Quinze-Vingts, Centre Hospitalier National d'Ophtalmologie, Paris, France; 2Laboratoire de Statistique Théorique et Appliquée, Université Pierre et Marie Curie, Paris, France; 3Conservatoire National des Arts et Métiers, Paris, France; and 4Alcon France, SA, Paris, France.
| Abstract |
|---|
METHODS. Visual field data of 437 patients were collected and classified by a glaucoma specialist into seven clinical groups: irregularities of VF (IVF), nasal step (NaS), arcuate scotoma (AC), paracentral scotoma (PCS), blind-spot enlargement (BSE), diffuse deficit (DD), and advanced deficit (AD). The number and content of constituent variable scores were identified by principal components analysis followed by Varimax Rotation and simple clustering, taking spatial distribution homogeneity and visual system anatomy into account. Unidimensionality was checked by a stepwise Cronbach α curve. Clinical predictability of the derived scores was checked by comparing clinical groups (ANOVA).
RESULTS. Patients older than 60 years comprised 53.3% of the sample. The average mean deviation was 9.2 dB and pattern standard deviation was 6.5 dB. Six scores were identified: four peripheral scores (nasal superior, NS; nasal inferior, NI; temporal superior, TS; and temporal inferior, TI) and two paracentral scores (PCSs; superior, PCSS; and inferior, PCSI). Cronbach α was always >0.90. The six scores decreased sequentially from IVF to DD to AD. Scores of AC were lower in NS, NI, and TS; PCSS was less in PCS; BSE scores were less in TS and TI; NaS scores were less in NS and NI.
CONCLUSIONS. Six well-separated, optimal scores were obtained from the Humphrey perimetry matrix. Internal reliability was good. It was possible to discriminate between clinical subgroups. Further analyses, based on longitudinal data, must be performed to confirm these findings.
Mean deviation (MD)5 is the average of all differences between measures and their normal values, weighted by the variance observed in the general population:

$$\mathrm{MD} = \frac{\dfrac{1}{n}\sum_{i=1}^{n} \dfrac{X_i - N_i}{S_{1i}^2}}{\dfrac{1}{n}\sum_{i=1}^{n} \dfrac{1}{S_{1i}^2}}$$

where $X_i$ is the measured threshold, $N_i$ is the normal reference threshold at point $i$, $S_{1i}^2$ is the variance of normal field measurement at point $i$, and $n$ is the number of test points.
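PSD is a normalized distance, standardized with reference to the general population, calculated for each point.

$$\mathrm{PSD} = \sqrt{\left(\frac{1}{n}\sum_{i=1}^{n} S_{1i}^2\right)\left(\frac{1}{n-1}\sum_{i=1}^{n} \frac{(X_i - N_i - \mathrm{MD})^2}{S_{1i}^2}\right)}$$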
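As an illustration of how the two global indices behave, the following minimal sketch computes MD and PSD from per-point thresholds. It assumes the commonly cited variance-weighted definitions (differences from normal weighted by the inverse normal variance for MD; a variance-scaled dispersion around MD for PSD). The function names and the toy numbers are hypothetical, and the perimeter's internal normative values are not reproduced here.

```python
import numpy as np

def mean_deviation(x, normal, var):
    """Variance-weighted mean of (measured - normal) over all test points."""
    x, normal, var = (np.asarray(a, dtype=float) for a in (x, normal, var))
    # The 1/n factors in numerator and denominator cancel.
    return np.sum((x - normal) / var) / np.sum(1.0 / var)

def pattern_standard_deviation(x, normal, var):
    """Variance-scaled dispersion of the deviations around MD."""
    x, normal, var = (np.asarray(a, dtype=float) for a in (x, normal, var))
    md = mean_deviation(x, normal, var)
    n = x.size
    weighted_ss = np.sum((x - normal - md) ** 2 / var) / (n - 1)
    return np.sqrt(var.mean() * weighted_ss)

# Toy example with three points and unit normal variance (assumed values).
measured = [28.0, 26.0, 24.0]
reference = [30.0, 30.0, 30.0]
variance = [1.0, 1.0, 1.0]
md = mean_deviation(measured, reference, variance)
psd = pattern_standard_deviation(measured, reference, variance)
```

When the normal variances are equal, both indices depend only on the set of deviations and not on where they occur in the field.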
Although there is a considerable body of literature about the properties of these indices as they relate to the clinical picture, some weaknesses can be identified a priori: (1) They do not take into account the spatial distribution of the points measured, the proximity of one point to another, and correlations between the points (e.g., switching one measure with another changes neither the MD nor PSD); (2) MD cannot be interpreted without knowing PSD, and vice versa; (3) MD is not a sensitive parameter in early stages of glaucoma; (4) PSD is not a sensitive parameter in late stages of glaucoma; (5) neither measure takes into account anatomic dimensions of the eye or visual system (horizontal threshold of retinal nerve fibers, the vertical threshold of vision cerebral hemispheres, and the retinal artery located centrally in the optic nerve).
Very little research has been performed to find algorithms that would help to identify visual field defects (VFDs) more precisely. Brigatti et al.6 7 used computerized neural networks with some success to identify patients with early glaucomatous visual field loss, yielding sensitivity and specificity both >70%. To achieve this, they had to include information on the automated visual field index and other structural data.
Mandava et al.8 identified 11 clusters by nearest-neighbor cluster analysis performed on Octopus visual fields (Haag-Streit, Köniz, Switzerland). A discriminant analysis was performed on the 11 scores used to classify patients with and without glaucoma. The sensitivity and specificity of this classification were very good (sensitivity and specificity >90%).
Brigatti et al.7 and Mandava et al.8 shared a common objective: to develop a classifying algorithm that would help clinicians to detect new VFDs. Brigatti et al.7 directly included global perimetry indicators, which assumes that MD, PSD, and short-term fluctuation are the optimal information that can be retrieved from this test. However, the model produces estimators that are completely disconnected from clinical reality, and clinicians still must be persuaded by the conclusions of neural network modeling. Mandava et al.8 used a cluster analysis with the sole purpose of optimizing information reduction before running a discriminant analysis. However, the clusters did not respect retinal anatomy very well (for example, they spanned the vertical meridian), and almost nothing was stated about either the construct validity or the internal consistency reliability of these clusters.
Although we agree with the approach developed by Mandava et al.,8 we believe that the scores developed should be demonstrated to be clinically relevant before they are applied in the identification and monitoring of patients with glaucoma by using specific statistical techniques (for example, neural networks or discriminant functions).
The present pilot study was designed to identify scores produced by automated static perimetry that would: (1) be structurally valid, (2) respect the anatomy of the eye and visual system, (3) follow the clinical evolution of glaucoma, and (4) be easily interpreted by physicians. This article addresses only the development and validation of the scores.
| Materials and Methods |
|---|
Both eyes were taken into account when glaucoma was bilateral. To pool data from all eyes, the fields of right eyes were converted by computer into left-eye mirror images. A potential laterality effect on scores was checked by an analysis of variance (ANOVA).
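The mirroring step can be pictured with a short sketch. It assumes, purely for illustration, that each field is stored as a two-dimensional array whose columns run from left to right across the printout and that the eye is labeled "right" or "left"; the function name and storage layout are hypothetical and are not taken from the study database.

```python
import numpy as np

def to_left_eye_layout(grid, eye):
    """Return the field in left-eye orientation (right eyes are mirrored)."""
    grid = np.asarray(grid, dtype=float)
    if eye == "right":
        # Flip columns so nasal and temporal sides line up with left eyes.
        return np.fliplr(grid).copy()
    return grid.copy()

# Toy 2 x 3 grid (assumed values), not a real 24-2 layout.
example = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
mirrored = to_left_eye_layout(example, "right")
unchanged = to_left_eye_layout(example, "left")
```

After this step, a given array position refers to the same anatomic location in every pooled eye.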
All data used had been obtained according to the usual procedure of the glaucoma department. Patients therefore underwent, by default, a Humphrey perimeter 24-2 threshold test, with the FASTPAC algorithm (Carl Zeiss Meditec, Dublin, CA). The main characteristic of FASTPAC is the use of 3-dB steps. The fixation target was centralized on the blind spot used to monitor fixation. The stimulus was size III, white, and on a background of 31.5 asb. Patients were required to have experienced a previous visual field test. Only tests with normal reliability indices were included: fixation losses <20% and false-positive and -negative errors <30%.
On the basis of these data, a glaucoma specialist clinically classified patients into seven groups, according to VFD shape: (1) irregularities of visual field; (2) nasal step; (3) arcuate scotoma; (4) paracentral scotoma; (5) blind spot enlargement; (6) diffuse deficit; and (7) advanced deficit. Visual field irregularities corresponded to slightly depressed thresholds throughout the visual field, which did not amount to specific focal defects. Diffuse deficit corresponded to a generalized loss of sensitivity throughout the whole field. Advanced deficit was defined by multiple scotomas surrounding the fixation point.
Data were entered into the study database with double-entry control. Analysis was conducted on the pattern deviation matrix since: (1) the data are standardized against a normal population; and (2) it reveals localized defects that may be masked by a generalized depression or an elevation of the hill of vision.
Control patients without glaucoma were not included because the pattern deviation matrix is already standardized according to an age- and gender-matched normal population. In addition, the Varimax Rotation, used for optimization of the data, would have been affected by the presence of two groups of subjects (control and patient) with different patterns of correlation.
Figure 1 describes the variable labels used in the analysis. Fifty-two variables were identified from the threshold data, as follows: 14 in the northwest quadrant (NW1-NW14); 12 in the northeast quadrant (NE1-NE11, NE13); 14 in the southwest quadrant (SW1-SW14); and 12 in the southeast quadrant (SE1, SE2, SE4-SE13). This nontraditional terminology is used to facilitate understanding of the Varimax Rotation used and the resultant clusters. MD and PSD were collected as the main parameters from the perimeter test.
The significance level was set at α = 5%. Our sample size was fixed according to empiric rules used in factor analysis; that is, 5 to 10 patients should be included per variable in the analysis. Hence, >400 patients were included for the 52 variables identified. A principal components analysis (PCA) was performed on the 52 variables to retain all meaningful information. The number of factors retained was chosen after examining the plot of eigenvalues. Only factors with eigenvalues >1 were retained. Scores were extracted after Varimax Rotation followed by a cluster analysis with the number of clusters fixed and equal to the number of factors retained; that is, each of the 52 variables was classified into the most correlated rotated factor. An unweighted score was computed for each factor by summing the variables that correlated with it the most. Each variable was included in one score only. This clustering step was followed by a refining process, described later, that took into account the specificity of the data.
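The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes a PCA on the correlation matrix, retention of components with eigenvalues >1, a textbook iterative varimax routine, and assignment of each variable to the rotated factor with the largest absolute loading. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        grad = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

def pca_varimax_clusters(data):
    """PCA on the correlation matrix, eigenvalue >1 rule, varimax, assignment."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    rotated = varimax(loadings)
    # Each variable joins the rotated factor it correlates with most strongly.
    labels = np.argmax(np.abs(rotated), axis=1)
    return eigvals, rotated, labels

# Synthetic data (assumed): 5 variables driven by one latent factor, 3 by another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
data = np.hstack([
    latent[:, [0]] + 0.5 * rng.normal(size=(400, 5)),
    latent[:, [1]] + 0.5 * rng.normal(size=(400, 3)),
])
eigvals, rotated, labels = pca_varimax_clusters(data)
```

An unweighted score for cluster `c` would then be `data[:, labels == c].sum(axis=1)`.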
Correlations between each of the 52 variables and the scores were computed. A variable was regarded as correctly classified when its correlation with a score was higher than that with any other score. When this was not the case, the variable was moved to other scores to maximize its correlation. This iterative process continued until the system stabilized (i.e., no further movement was necessary). Some specific rules were added regarding the spatial homogeneity and clinical coherence of the clusters: (1) To maintain the spatial homogeneity of clusters (geographical continuity: a score had to relate to neighboring measurements), constraints were defined that restricted certain moves; that is, if a measurement was separated spatially from its cluster by another measurement, the former was moved to the nearest cluster. (2) To maintain the clinical coherence of clusters, specific constraints were defined that restricted certain moves; that is, if a variable was in a cluster not compatible with known medical constraints underlying eye function, it was moved to the nearest, clinically meaningful cluster. This rule applied when a cluster crossed the east-west or north-south axes.
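The correlation-driven part of this refinement can be sketched as a simple loop. The sketch below is an assumption-laden illustration: it recomputes unweighted sum scores, correlates every variable with every score (the variable's own contribution to its score is not removed), and reassigns variables until nothing moves. The spatial and clinical constraints described above are deliberately left out, and all names are hypothetical.

```python
import numpy as np

def refine_clusters(data, labels, n_clusters, max_iter=50):
    """Reassign each variable to the score it correlates with most, until stable."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels).copy()
    centered = data - data.mean(axis=0)
    for _ in range(max_iter):
        # Unweighted score per cluster: sum of its member variables.
        scores = np.column_stack(
            [data[:, labels == c].sum(axis=1) for c in range(n_clusters)]
        )
        scores_c = scores - scores.mean(axis=0)
        denom = np.outer(
            np.linalg.norm(centered, axis=0), np.linalg.norm(scores_c, axis=0)
        )
        corr = np.full(denom.shape, -1.0)      # empty clusters never attract
        np.divide(centered.T @ scores_c, denom, out=corr, where=denom > 0)
        new_labels = np.argmax(corr, axis=1)
        if np.array_equal(new_labels, labels):
            break                              # system has stabilized
        labels = new_labels
    return labels

# Synthetic data (assumed): two blocks of four variables, one variable misplaced.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
block_data = np.hstack([
    latent[:, [0]] + 0.5 * rng.normal(size=(400, 4)),
    latent[:, [1]] + 0.5 * rng.normal(size=(400, 4)),
])
start = [0, 0, 0, 1, 1, 1, 1, 1]
refined = refine_clusters(block_data, start, n_clusters=2)
```

In a fuller implementation, each proposed move would also be checked against an adjacency map of the test points before being accepted.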
Assessment of Unidimensionality and Clinical Validity of Scores
A stepwise Cronbach α curve9 10 11 was plotted to check the unidimensionality of the variables yielding the score. This calculation made it possible to verify that a group of items measured the same underlying unidimensional concept (construct validity). The curve should increase monotonically when all items belong to the appropriate score. Otherwise, items should be allocated to another score. However, because the curve is influenced by sample fluctuations, it should be interpreted cautiously, especially when scores contain few items.
Clinical validity can be estimated as the ability of a score to capture clinical relevance.12 The mean of each score was compared across all seven clinical groups by ANOVA.
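A one-way ANOVA of this kind compares between-group and within-group variability of a score. The sketch below computes only the F statistic and its degrees of freedom with NumPy; the P value would come from the F distribution, which is not evaluated here. The function name and the toy groups are hypothetical and are not study data.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy score values for three clinical groups (assumed numbers).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With seven clinical groups, the same function would be called once per score with seven arrays.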
| Results |
|---|
Pupil diameter was documented in 137 patients and, on average, was 4.3 ± 1.1 mm (SD). The average duration of the test was 8.0 minutes. The average MD was 9.2 ± 7.2 dB (SD) and the average PSD was 6.5 ± 3.3 dB (SD).
Principal component analysis identified six factors that explained 61.09% of the total variance. After an abrupt decrease, the plot of the eigenvalues showed a clear break at the sixth eigenvalue, then a plateau, and again a new, but slighter, decrease. The sixth eigenvalue was also the first value less than unity,1 and so the min-eigen criterion also retained six factors.
Figure 2 describes step-by-step how the scores were constructed. Examination of correlations with the six factors retained after Varimax Rotation indicated the first cluster pattern of the original variables. All measurements of the northwest quadrant plus NE3, NE6, and NE11 were correlated and constituted factor 1 (correlations: 0.53-0.85). Factor 2 comprised all measurements of the southwest sector (correlations: 0.59-0.85). Factor 3 included all measurements of the northeast sector (correlations: 0.40-0.75) except NE3, NE6, and NE11, already attracted by factor 1, and NE10, which alone represented factor 6 (correlation: 0.49). Factor 4 comprised measurements of the southeast sector (correlations: 0.54-0.70) except SE5 and SE6, which jointly made factor 5 (correlations: 0.53 and 0.63).
Correlations were calculated again, and we were obliged to move NE6 from score 1 to score 3, NW14 to score 6, and both SE1 and SE2 from score 4 to score 5 (Fig. 2: scores after the second correlation analysis). No additional moves were necessary after the second step.
Cronbach α curves were estimated for each subset to check that the scores were unidimensional. We were obliged to move NE13 and SE4 to another set to obtain unidimensional scores. If NE13 was moved to score 4, we produced a set crossing the north-to-south quadrants. Moreover, the new set (score 4 + NE13) was no longer unidimensional. A similar situation arose when we tried to move SE4 to score 3. NE13 and SE4 were therefore not moved. NE6 and NE7 could, however, be moved to score 6, in accordance with the adopted rules. In this way, spatial homogeneity was respected (no crossing from the north to the south quadrant), as was specificity (each variable correlated more with its own score than with any other score).
The six clusters from which scores were derived are described in Figure 2. Four scores were peripheral (nasal superior, NS; nasal inferior, NI; temporal superior, TS; temporal inferior, TI) and the remaining two central (paracentral superior, PCSS; and paracentral inferior, PCSI). The formulas were, respectively:
[Formulas for the six scores (NS, NI, TS, TI, PCSS, and PCSI): equation images not available.]
For each score, Cronbach α13 was always >0.90 (Fig. 3, the maximum value reported in the curve), demonstrating good internal reliability. Cronbach α curves for peripheral scores increased monotonically, supporting the requirement that all items should contribute to the score. However, this was not the case for central scores, although Cronbach α values were always >0.90. Last, the decrease in the curve was small and limited to a single item.
| Discussion |
|---|
If homogeneous subgroups of measures (i.e., a new set of MD scores) could be identified (ideally related, even empirically, to a clinical glaucoma classification), the follow-up of patients with glaucoma would be more clinically relevant. A vector of scores, each specific to a precise anatomic visual field area, starting with a high score at the onset of disease and decreasing with severity, would dramatically clarify treatment decisions and the follow-up of patients with glaucoma. In this manner, PSD would become an indirect measure of heterogeneity within scores.
A PCA followed by our clustering algorithm allowed us to reach this objective. We identified six scores that explained more than 60% of the observed variance. The scores had orthogonal properties, meaning that their independence was maximized. They also had good construct validity, as demonstrated by Cronbach α curves; that is, with the four peripheral scores at least, switching an item to another score did not improve validity. In the case of the two paracentral scores, some switching did improve reliability. However, the loss of reliability, as measured by the Cronbach α curve, was very low and could be explained by stochastic sampling issues. We therefore preferred to keep scores close to retinal anatomy (proximity of points), instead of maximizing the mathematical properties of our scores. Finally, the scores were easy to calculate.
We used the pattern deviation matrix and performed statistical manipulations to achieve this result. We could have worked on threshold sensitivities or the total deviation matrix. The former would have required an adjustment for age. The latter provides an indirect standardization, based on a population-wise approach, which is better than local data-based adjustments. We used the pattern deviation matrix because it emphasizes localized defects and therefore would increase the correlation between points belonging to the same VFD. This correlation should stabilize the Varimax Rotation.
Although we used a rotation pattern that maximized score independence, we still found high correlations between certain scores. This indirectly, but strongly, supports the view that the MD is a score with a high construct validity. In other words, each item contributes homogeneously to the MD. In contrast, a possible use of MD as a single score would explain a much smaller part of the total variance; hence, much information would be lost.
Our algorithm was successful in producing scores that respected retinal anatomy (good and rapid convergence). Four quadrants, fully separated by the horizontal and vertical axes, were shown to be associated with two central scores. The coexistence of central and peripheral scores could be interpreted as follows: Central scores may be sensitive to blood flow variations in the central retinal artery, whereas peripheral scores may be more sensitive to the effects of intraocular pressure on nerve fibers at the disc junction. Additional data are needed to confirm this hypothesis.
The relationship between the localization of a clinical field defect and its corresponding score demonstrated that our scoring algorithm was clinically relevant and thereby possessed construct validity. Apart from visual field irregularities and diffuse deficit, our six scores assembled valuable data describing the localization of a VFD.
Nasal scores were involved in nasal step and arcuate scotoma, temporal and paracentral scores in blind-spot enlargement, and paracentral scores in paracentral scotoma. Even with advanced deficits, the six scores added information to the MD.
Our six scores differed from the eleven described by Mandava et al.8 because of the algorithm used. We believe that fewer scores for retinal anatomy would be easier both to apply and understand in daily practice. Because our scores demonstrated good construct and external validity, they should assist in patient follow-up, although additional longitudinal data are needed to confirm this. Comparison with the work of Brigatti et al.6 7 is not straightforward, since their main goal was to identify patients with glaucoma. Nonetheless, our scores could be used as entry parameters in a neural network to serve the same purpose.
AGIS (Advanced Glaucoma Intervention Study) scores14 were not calculated for our sample of patients. Therefore, a head-to-head comparison with our six scores was not possible. The AGIS investigators decided that one single clinical score was appropriate to define the severity of glaucomatous VFDs. This assumption was somewhat contradicted by our findings. Our algorithm is also simpler than that used in AGIS and can be managed with a basic calculator. Finally, we demonstrated that our six scores brought additional information to the MD.
Our pilot study has limitations. A larger sample size might have increased the sensitivity of our algorithm to detect different or additional scores. The sample size of some patient groups was rather small, and certain analyses should be interpreted cautiously. All patients came from a single center, which reduces external validity. We defined seven classes of clinical abnormality arbitrarily, but other clinical classifications may be of interest. We also used unweighted combinations of variables, whereas Humphrey algorithms use standardization according to several confounding factors. The use of weighted combinations of variables could improve the sensitivity of our scores. Last, we restricted our algorithm to a linear approach. The use of nonlinear models may make some scores more accurate.
More development work is needed before our six scores can be used. Sensitivity to clinical changes should be explored, and the discrimination of different clinical abnormalities must be verified.
| Appendix 1 |
|---|
Cronbach α and the Stepwise Cronbach α Curve
The parallel model is a simple, mixed, one-way model: $X_{ij} = \mu_j + \alpha_i + \varepsilon_{ij}$, where $X_{ij}$ is the measurement of patient $i$ on variable $j$, $\mu_j$ is a variable fixed (nonrandom) effect, and $\alpha_i$ is a random effect with zero mean and SE $\sigma_\alpha$ corresponding to patient variability. It produces the variance of the true latent measure ($\tau_{ij} = \mu_j + \alpha_i$); and $\varepsilon_{ij}$ is a random effect with zero mean and SE $\sigma_\varepsilon$ corresponding to the additional measurement error. The true measure and the error are uncorrelated: $\mathrm{cov}(\alpha_i, \varepsilon_{ij}) = 0$.
These assumptions are classic in experimental design. This model defines relationships between different kinds of variables: the observed score $X_{ij}$, the true score $\tau_{ij}$, and the error $\varepsilon_{ij}$.
Reliability of an Instrument.
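A measurement instrument gives us readings that we call observed values. The reliability $\rho$ of an instrument is defined as the ratio of the variance of the true measure to that of the observed measure. Under the parallel model, one can show that the reliability of any variable $X_j$ (as an instrument to measure the true value) is given by

$$\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}$$

$\rho$ can be easily interpreted as a correlation coefficient between the true and the observed measure.
When the parallel model is assumed, the reliability of the sum of $k$ variables equals

$$\tilde{\rho}_k = \frac{k\rho}{k\rho + 1 - \rho}$$

This is the Spearman-Brown formula. It is estimated by the Cronbach α coefficient (CAC)15:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} S_j^2}{S_{tot}^2}\right)$$

where $S_j^2$ is the sample variance of variable $j$ and $S_{tot}^2$ is the sample variance of the total score:

$$S_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^2$$

$$S_{tot}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^2, \qquad T_i = \sum_{j=1}^{k} X_{ij}$$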
It is easy to show a direct connection between CAC and the percentage of variance of the first component in PCA, which in factor analysis is often used to assess unidimensionality.16 The PCA is usually based on an analysis of the latent roots of the correlation matrix of $k$ variables $R$, which, under the parallel model, looks as follows:

$$R = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$

The largest latent root of this matrix is $\lambda_1 = (k - 1)\rho + 1$, and the other multiple roots are $\lambda_2 = \lambda_3 = \lambda_4 = \dots = 1 - \rho = (k - \lambda_1)/(k - 1)$. Thus, using the Spearman-Brown formula, we can express the reliability of the sum of variables as:

$$\tilde{\rho}_k = \frac{k}{k-1}\left(1 - \frac{1}{\lambda_1}\right)$$

This expression links the reliability of the sum (estimated by CAC) and the first latent root $\lambda_1$, which in practice is estimated by the corresponding value of the observed correlation matrix and thus the percentage of variance of the first principal component in a PCA. So, CAC is also considered as a measure of unidimensionality. The Spearman-Brown formula indicates a simple relationship between CAC and the number of variables. It is easy to show that the CAC is an increasing function of the number of variables. This formula is obtained under the parallel model.
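The link between the first latent root and the reliability of the sum can be checked numerically. The sketch below builds an equicorrelation matrix with a chosen common correlation (called `rho` here), computes its latent roots with NumPy, and compares the Spearman-Brown value $k\rho/(k\rho + 1 - \rho)$ with $\frac{k}{k-1}(1 - 1/\lambda_1)$. The function name and the chosen values of `k` and `rho` are illustrative assumptions.

```python
import numpy as np

def parallel_model_check(k, rho):
    """Latent roots of a k x k equicorrelation matrix and two reliability forms."""
    corr = np.full((k, k), float(rho))
    np.fill_diagonal(corr, 1.0)
    roots = np.sort(np.linalg.eigvalsh(corr))[::-1]   # largest root first
    spearman_brown = k * rho / (k * rho + 1 - rho)
    from_first_root = k / (k - 1) * (1 - 1 / roots[0])
    return roots, spearman_brown, from_first_root

roots, spearman_brown, from_first_root = parallel_model_check(k=5, rho=0.6)
```

For `k = 5` and `rho = 0.6`, the largest root is $(k - 1)\rho + 1 = 3.4$, the remaining roots equal $1 - \rho = 0.4$, and both expressions give the same reliability.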
A step-by-step curve of CAC can be built to assess the unidimensionality of a set of variables.11 16 The first step uses all variables to compute CAC. Then, at every successive step, one variable is removed from the scale. The removed variable is that which leaves the scale with its maximum CAC value. This procedure is repeated until only two variables remain. If the parallel model is true, increasing the number of variables increases the reliability of the total score, which is estimated by Cronbach α. Thus, a decrease of such a curve after adding a variable would cause us to suspect strongly that the added variable did not constitute a unidimensional set with the other variables.
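The backward procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: Cronbach α is computed from item variances and the variance of the total score, and at each step the variable whose removal leaves the highest α is dropped. The function names and the synthetic data (four parallel items plus one unrelated item) are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach alpha for an (observations x items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def stepwise_alpha_curve(items):
    """Return [(n_items, alpha)] from 2 items up to all items, and removal order."""
    items = np.asarray(items, dtype=float)
    remaining = list(range(items.shape[1]))
    curve = [(len(remaining), cronbach_alpha(items[:, remaining]))]
    removed = []
    while len(remaining) > 2:
        # Try dropping each variable; keep the drop that maximizes alpha.
        candidates = [
            (cronbach_alpha(items[:, [c for c in remaining if c != j]]), j)
            for j in remaining
        ]
        best_alpha, drop = max(candidates)
        remaining.remove(drop)
        removed.append(drop)
        curve.append((len(remaining), best_alpha))
    return curve[::-1], removed

# Synthetic data (assumed): items 0-3 share a latent trait, item 4 is pure noise.
rng = np.random.default_rng(1)
trait = rng.normal(size=(300, 1))
items = np.hstack([trait + 0.5 * rng.normal(size=(300, 4)),
                   rng.normal(size=(300, 1))])
curve, removed = stepwise_alpha_curve(items)
```

Read from left to right, the curve drops when the fifth (unrelated) item is added, which is the pattern that would flag an item as not belonging to the score.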
| Footnotes |
|---|
Submitted for publication October 13, 2004; revised February 18 and April 7, 2005; accepted April 28, 2005.
Disclosure: J.P. Nordmann, None; M. Mesbah, None; G. Berdeaux, Alcon France, SA (E, F).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Gilles Berdeaux, Alcon France, 4, Rue Henri Sainte-Claire Deville, F-92563 Rueil-Malmaison Cedex, France; gilles.berdeaux{at}alconlabs.com.