1 From the Glaucoma Center, Department of Ophthalmology and 2 Department of Family and Preventive Medicine, University of California San Diego, La Jolla.
| Abstract |
|---|
METHODS. One eye of each of 94 subjects was included. Healthy eyes (n = 38) had both normal-appearing optic discs and normal SAP results. Glaucoma by SAP (n = 42) required a repeatable abnormal result (glaucoma hemifield test [GHT] or corrected pattern standard deviation [CPSD] outside normal limits). Glaucoma by disc appearance (n = 51) was based on masked stereoscopic photograph evaluation. Receiver operating characteristic (ROC) curve areas, sensitivities, and specificities were calculated for each instrument separately for each diagnosis.
RESULTS. The largest area under the ROC curve was found for OCT inferior quadrant thickness (0.91 for diagnosis based on SAP, 0.89 for diagnosis based on disc appearance), followed by the FDT number of total deviation plot points of ≤5% (0.88 and 0.87, respectively), SLP linear discriminant function (0.79 and 0.81, respectively), and SWAP PSD (0.78 and 0.76, respectively). For diagnosis based on SAP, the ROC curve area was significantly larger for OCT than for SLP and SWAP. For diagnosis based on disc appearance, the ROC curve area was significantly larger for OCT than for SWAP. For both diagnostic criteria, at specificities of ≥90% and ≥70%, the most sensitive OCT parameter was more sensitive than the most sensitive SWAP and SLP parameters. For diagnosis based on SAP, the most sensitive FDT parameter was more sensitive than the most sensitive SLP parameter at specificities of ≥90% and ≥70% and was more sensitive than the most sensitive SWAP parameter at specificity of ≥70%. For diagnosis based on disc appearance at specificity of ≥90%, the most sensitive FDT parameter was more sensitive than the most sensitive SWAP and SLP parameters. At specificity ≥90%, agreement among instruments for classifying eyes as glaucomatous was poor.
CONCLUSIONS. In general, areas under the ROC curve were largest (although not always significantly so) for OCT parameters, followed by FDT, SLP, and SWAP, regardless of the definition of glaucoma used. The most sensitive OCT and FDT parameters tended to be more sensitive than the most sensitive SWAP and SLP parameters at the specificities investigated, regardless of diagnostic criteria.
| Introduction |
|---|
Several diagnostic techniques to assess the retinal nerve fiber layer (RNFL) and visual function have been introduced recently to aid in the early diagnosis and monitoring of patients with diagnosed or suspected glaucoma. The objective of this study was to compare the diagnostic ability of two methods for quantitatively assessing the RNFL (scanning laser polarimetry [SLP] and optical coherence tomography [OCT]), and two retinal ganglion cell-specific methods for testing visual function (short-wavelength automated perimetry [SWAP] and frequency-doubling technology [FDT] perimetry) in glaucomatous and healthy eyes in a single-sample population. The advantage of examining the diagnostic performance of these instruments in a single population is that population-characteristic-based variables are eliminated, thus allowing direct comparison of results obtained with the different instruments.
There may be biases in evaluating the diagnostic ability of RNFL imaging when using optic disc appearance as a gold standard or in evaluating visual function when using SAP as a gold standard. Our goal was therefore to evaluate the diagnostic precision of RNFL assessment for detecting eyes with glaucomatous visual function defects and to evaluate the diagnostic precision of visual function assessment for detecting eyes with a glaucomatous optic disc appearance. To directly compare results from structural and functional tests in the same sample population, we also evaluated RNFL assessment, with optic disc appearance as the diagnostic criterion, and examined visual function tests, with SAP as the diagnostic criterion.
| Methods |
|---|
All subject eyes had open angles, best corrected acuity of 20/40 or better, sphere within ±5.0 diopters (D), and cylinder within ±3.0 D at time of testing. Subjects had no history of diabetes or other systemic disease and no reported ophthalmic or neurologic surgery or other diseases affecting visual fields or color vision. All visual function tests were reliable (≤25% false positives, false negatives, fixation losses) and all RNFL images obtained were judged to be of acceptable quality by experienced operators.
Healthy eyes in this study (n = 38) had a measured IOP of 22 mm Hg or less with no history of elevated IOP. These eyes had healthy-appearing optic discs, based on masked consensus grading of simultaneous stereoscopic photographs by two expert graders, and SAP results within normal limits. Average ± SD healthy subject age was 58.13 ± 12.18 years.
In determining diagnostic sensitivity for RNFL measurements, glaucoma was diagnosed based on SAP results. The 42 eyes in the group with diagnosis by SAP had repeatable abnormal SAP results (either glaucoma hemifield test [GHT] results or corrected pattern standard deviation [CPSD] outside normal limits). Optic disc appearance and IOP were not used as diagnostic criteria. Eyes in this group had a mean ± SD mean deviation (MD) of -4.0 ± 4.2 dB, indicating primarily early glaucoma. The patients' mean age ± SD was 64.4 ± 11.7 years.
When determining diagnostic sensitivity of visual function tests, glaucoma was diagnosed based on optic disc appearance. These 51 eyes had either focal rim notching, rim thinning, excavation of the rim, or RNFL defects. SAP results and IOP were not used as diagnostic criteria. Eyes in this group had a mean ± SD MD of -3.5 ± 4.0 dB, indicating primarily early glaucoma. The patients' mean age ± SD was 63.2 ± 11.9 years. Diagnosis in 37 eyes overlapped, and these were included in both the diagnosis-by-disc-appearance and diagnosis-by-SAP result groups. Mean SAP MD and mean age were not significantly different between the two glaucoma diagnosis groups (P > 0.05).
There were no significant differences in age (ANOVA, P > 0.05) or ethnic origin among the healthy subjects, patients with glaucoma diagnosed by disc appearance, and those with glaucoma diagnosed by SAP results. Ninety-one percent of subjects were white, 3% were Asian, 3% were Hispanic, and 2% were African American.
To directly compare results and assess bias from RNFL-based techniques and visual function techniques, we also evaluated the diagnostic ability of RNFL tests in healthy subjects and patients with glaucoma diagnosed by disc appearance, and we evaluated the diagnostic ability of visual function tests in healthy subjects and patients with glaucoma diagnosed by SAP results.
Instrumentation
Scanning Laser Polarimetry.
The scanning laser polarimeter (GDx Nerve Fiber Analyzer; Laser Diagnostic Technologies, San Diego, CA) uses confocal scanning diode technology coupled with an integrated polarization modulator to measure retardation of light that has double passed the birefringent fibers of the RNFL. Retardation measurements have been shown to correlate with RNFL thickness measurements.9 Details of this instrument and descriptions of parameters have been provided elsewhere.10,11,12,13,14
We examined 27 parameters automatically provided by the GDx nerve fiber analyzer software (ver. 2.0.01; Laser Diagnostic Technologies). Parameters investigated were: GDx Number (neural network result), average thickness, ellipse average, ellipse modulation, inferior average thickness, inferior integral, inferior maximum-nasal median ratio, inferior maximum thickness, inferior-nasal integral ratio, inferior-nasal mean ratio, inferior ratio, inferior-temporal integral ratio, inferior-temporal mean ratio, maximum modulation, superior average thickness, superior-inferior integral ratio, superior-inferior mean ratio, superior integral, superior maximum thickness, superior-nasal integral ratio, superior-nasal mean ratio, superior-nasal ratio, superior ratio, superior-temporal integral ratio, superior-temporal mean ratio, symmetry, and total polar integral. Finally, we examined the value of a discriminant analysis model (linear discriminant function [LDF]) proposed by Weinreb et al.14 [LDF = -4.442655 - (0.156 · average thickness) + (0.935 · ellipse modulation) + (0.183 · ellipse average)]. All parameters investigated are shown in Tables 1 and 2.
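The linear discriminant function quoted above can be written out as a short sketch. The function name `gdx_ldf` and the input values are hypothetical and chosen only for illustration; the coefficients are those given in the text, and no diagnostic cutoff is implied.

```python
def gdx_ldf(average_thickness, ellipse_modulation, ellipse_average):
    """Linear discriminant function with the coefficients quoted in the text.

    Inputs are the three GDx parameters named in the formula; units are
    whatever the instrument software reports (not specified here).
    """
    return (-4.442655
            - 0.156 * average_thickness
            + 0.935 * ellipse_modulation
            + 0.183 * ellipse_average)

# Hypothetical parameter values for a single eye (illustration only).
example_ldf = gdx_ldf(average_thickness=60.0,
                      ellipse_modulation=2.0,
                      ellipse_average=65.0)
```

Because the function is linear, each parameter contributes independently: a larger ellipse modulation or ellipse average raises the LDF value, and a larger average thickness lowers it.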
Optical Coherence Tomography.
The optical coherence tomograph (OCT 2000; Humphrey-Zeiss Instruments,
Dublin, CA) uses low-coherence interferometry to assess peripapillary
RNFL thickness. This instrument measures RNFL thickness by measuring
the difference in temporal delay of back-scattered light from the RNFL
and a reference mirror. RNFL is differentiated from other retinal
layers using an edge detection algorithm (version A4X1). RNFL thickness
is defined as the number of pixels between its anterior and posterior
boundaries.
OCT parameters investigated in this study were mean RNFL thickness (360° measure), temporal quadrant thickness (316° to 45° unit circle), superior quadrant thickness (46° to 135°), nasal quadrant thickness (136° to 225°), inferior quadrant thickness (226° to 315°), and thickness measures at three superior clock hours (11, 12, and 1 o'clock, with 11 o'clock located superior temporally) and three inferior clock hours (5, 6, and 7 o'clock, with 7 o'clock located inferior temporally). We also developed a modulation parameter (called max-min) calculated by subtracting the RNFL thickness measurement of the thinnest quadrant from the measurement of the thickest quadrant. This parameter is designed to assess the amplitude of the characteristic double-hump pattern of RNFL thickness and is analogous to SLP modulation parameters.
Three circular scans of 3.4-mm diameter centered on the optic disc were obtained for each test eye. This approximate scan diameter was found to be optimal for RNFL analysis in a prototype instrument.15 Mean RNFL thickness values for quadrant and clock-hour measurements were determined from the three scans obtained.
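The quadrant averaging and the max-min parameter described above can be sketched as follows. This is a minimal illustration that assumes max-min is computed from the scan-averaged quadrant values (the text does not state the order of operations); the function names and thickness values are hypothetical.

```python
QUADRANTS = ("temporal", "superior", "nasal", "inferior")

def mean_quadrant_thickness(scans):
    """Average each quadrant's RNFL thickness over the repeated scans."""
    return {q: sum(scan[q] for scan in scans) / len(scans) for q in QUADRANTS}

def max_min(quadrant_means):
    """Thickest-quadrant value minus thinnest-quadrant value."""
    values = [quadrant_means[q] for q in QUADRANTS]
    return max(values) - min(values)

# Hypothetical thickness values (arbitrary units) for three scans of one eye.
scans = [
    {"temporal": 70.0, "superior": 130.0, "nasal": 80.0, "inferior": 140.0},
    {"temporal": 72.0, "superior": 128.0, "nasal": 82.0, "inferior": 138.0},
    {"temporal": 68.0, "superior": 132.0, "nasal": 78.0, "inferior": 142.0},
]
means = mean_quadrant_thickness(scans)
modulation = max_min(means)
```

A flattened double-hump profile, in which the superior and inferior quadrants thin toward the temporal and nasal values, would yield a smaller max-min value.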
Because the circular scan has a set radius, OCT RNFL measures are taken nearer to the disc in subjects with larger discs. We therefore examined the correlation between disc area and RNFL quadrant thickness measures (Pearson's r) and the difference in disc size between glaucomatous and healthy eyes (t-test) and found no significant results (all P > 0.1). Disc area was measured by confocal scanning laser tomograph (Heidelberg Retina Tomograph; Heidelberg Engineering, Heidelberg, Germany).
Short-Wavelength Automated Perimetry.
SWAP is a modification of SAP (Humphrey Field Analyzer II, program 24-2; Humphrey Instruments, San Leandro, CA) in which a 440-nm narrow-band target is presented on a 100-candela (cd)/m² yellow adaptation field to selectively stress short-wavelength cones and small bistratified blue-yellow ganglion cells.16 The same parameters and stimulus programs are used in SWAP as in SAP and have been discussed in detail elsewhere.17
SWAP parameters investigated in this study were MD, PSD, total number of abnormal points in the total deviation and pattern deviation plots (at P ≤ 0.05 or worse and P ≤ 0.01 or worse), number of abnormal points in the superior hemifield of the total and pattern deviation plots (at P ≤ 0.05 or worse and P ≤ 0.01 or worse), and number of abnormal points in the inferior hemifield of the total and pattern deviation plots (at P ≤ 0.05 or worse and P ≤ 0.01 or worse). Abnormality for SWAP parameters was determined by comparison to our normative database of 342 eyes. Only 6 of the 38 healthy eyes used in this study contributed to this database.
Frequency-Doubling Testing.
FDT perimetry (Humphrey Visual Field Instrument, using Welch Allyn FDT, Skaneateles Falls, NY) is based on the frequency-doubling effect, in which a low-spatial-frequency sine-wave grating, undergoing high-temporal-frequency counter-phase flicker, appears to have double its true spatial frequency. This effect has been attributed to processing by a subset of magnocellular ganglion cells with nonlinear response properties,18 although there is evidence to show that at contrast threshold, all magnocellular cells are likely to be responsive to this type of stimuli.19,20
Target stimuli consist of individual 10° square, 0.25 cyc/deg sinusoidal gratings, counter-phasing at 25 Hz. Targets are presented in one of 17 test areas located within the central 20° of the visual field (threshold program C-20; Welch Allyn). This test measures the stimulus contrast detection threshold using a modified binary staircase procedure.
For each stimulus presentation, observers responded with a button-press when the stimulus was detected. Maximum stimulus duration using this instrument is 720 msec. During the first 160 msec, the stimulus contrast is ramped up from 0 to that selected for the presentation. If it is not immediately detected, the stimulus remains at the selected contrast for 400 msec and then is ramped down to 0 during the final 160 msec. The interstimulus interval is randomly selected up to 500 msec and target location is pseudorandomly selected for each presentation.
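The stimulus timing described above (160 msec ramp up, 400 msec plateau, 160 msec ramp down, 720 msec in total) can be sketched as a contrast envelope. The linear ramp shape is an assumption, since the text says only that contrast is "ramped"; the function name `contrast_at` is hypothetical.

```python
RAMP_MS = 160      # duration of each ramp, from the text
PLATEAU_MS = 400   # duration at the selected contrast, from the text
TOTAL_MS = 2 * RAMP_MS + PLATEAU_MS  # maximum stimulus duration

def contrast_at(t_ms, peak):
    """Stimulus contrast at time t_ms after onset, assuming linear ramps."""
    if t_ms < 0 or t_ms > TOTAL_MS:
        return 0.0
    if t_ms < RAMP_MS:
        return peak * t_ms / RAMP_MS            # ramp up from 0 to peak
    if t_ms <= RAMP_MS + PLATEAU_MS:
        return peak                             # hold at selected contrast
    return peak * (TOTAL_MS - t_ms) / RAMP_MS   # ramp down to 0
```

The three segments sum to the stated 720 msec maximum, which applies only when the observer has not already responded.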
FDT parameters investigated in this study were the same as for SWAP. Abnormality for FDT parameters was determined by comparison to the manufacturer's internal normative database (Viewfinder version 1.02, program C-20; Welch Allyn).
Data Analysis
For each measured parameter from each of the instruments, area under the ROC curve for discriminating between glaucomatous and healthy eyes was calculated. Differences between ROC curve areas among instruments were determined by the method of DeLong et al.21 Sensitivity and specificity for detection of glaucomatous eyes were determined by obtaining the highest sensitivity values with a target specificity set at ≥90% and again with a target specificity set at ≥70%. Depending on the goals of the screening program and characteristics of the target population, either of these target specificities may be useful.22 Differences between sensitivities at set specificities among instruments were determined using the McNemar test. ROC curve and sensitivity and specificity analyses were performed twice for each instrument, once using disc appearance as a criterion for a glaucoma classification and once using SAP results. For each instrument, parameter comparisons between glaucomatous and healthy eyes were performed using t-tests with Bonferroni-corrected α.
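Two of the quantities described above, the area under the ROC curve and the highest sensitivity at a target specificity, can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis software used in the study: scores are oriented so that higher values are more abnormal (thickness values would be negated first), the area is the rank-based (Mann-Whitney) estimate, and the function names and data are hypothetical. The DeLong comparison of areas is not covered.

```python
def roc_auc(healthy, glaucoma):
    """Rank-based ROC area: P(glaucoma score > healthy score), ties count 0.5."""
    wins = 0.0
    for g in glaucoma:
        for h in healthy:
            if g > h:
                wins += 1.0
            elif g == h:
                wins += 0.5
    return wins / (len(glaucoma) * len(healthy))

def sensitivity_at_specificity(healthy, glaucoma, target_spec):
    """Highest sensitivity among cutoffs whose specificity meets the target.

    An eye is called abnormal when its score is >= the cutoff.
    Returns a (sensitivity, specificity) pair.
    """
    best = (0.0, 1.0)  # calling no eye abnormal: sensitivity 0, specificity 1
    for cutoff in sorted(set(healthy) | set(glaucoma)):
        spec = sum(h < cutoff for h in healthy) / len(healthy)
        sens = sum(g >= cutoff for g in glaucoma) / len(glaucoma)
        if spec >= target_spec and sens > best[0]:
            best = (sens, spec)
    return best

# Hypothetical scores (higher = more abnormal), for illustration only.
healthy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
glaucoma_scores = [4.5, 8.5, 9.5, 11, 12]

auc = roc_auc(healthy_scores, glaucoma_scores)
at_90 = sensitivity_at_specificity(healthy_scores, glaucoma_scores, 0.90)
at_70 = sensitivity_at_specificity(healthy_scores, glaucoma_scores, 0.70)
```

Because the achievable specificities are discrete, the specificity actually attained can exceed the target, which is why the Results report pairs such as (40%, 92%) for a target of ≥90%.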
| Results |
|---|
When target specificity was set at ≥90%, the parameters with the three highest sensitivities (sensitivity, specificity; Table 1) were superior average thickness (40%, 92%); superior maximum thickness (36%, 92%); and GDx LDF, GDx Number, superior-nasal mean ratio, inferior maximum thickness, ellipse average thickness, and total polar integral (all 33%, 92%). When target specificity was set at ≥70%, the parameters with the three highest sensitivities were GDx LDF and the GDx Number (both 71%, 71%); superior-nasal ratio and superior-temporal integral ratio (both 62%, 71%); and superior ratio, superior integral, and total polar integral (all 57%, 71%).
When SLP parameters in eyes classified as glaucomatous based on SAP results were compared with healthy eyes, 5 of 28 parameters were significantly different (after Bonferroni correction, α = 0.002) between groups in the predicted directions (lower GDx LDF result, higher GDx Number, with less thickness modulation in glaucomatous eyes; Table 2).
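The corrected α reported above is consistent with dividing a familywise level by the number of parameters compared. The sketch below assumes a familywise α of 0.05, which the text does not state explicitly; the count of 28 SLP parameters comes from the text, whereas the counts of 12 (OCT) and 14 (SWAP and FDT) are tallies of the parameter lists in Methods and should be read as assumptions.

```python
def bonferroni_alpha(n_comparisons, family_alpha=0.05):
    """Per-comparison significance level under Bonferroni correction."""
    return family_alpha / n_comparisons

slp_alpha = round(bonferroni_alpha(28), 3)   # 28 SLP parameters (from the text)
oct_alpha = round(bonferroni_alpha(12), 3)   # assumed tally of OCT parameters
swap_alpha = round(bonferroni_alpha(14), 3)  # assumed tally of SWAP/FDT parameters
```

Under these assumptions the rounded values match the thresholds reported for each instrument (0.002 for SLP and 0.004 for OCT, SWAP, and FDT).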
OCT: Glaucoma Diagnosed by SAP
For OCT parameters, areas under the ROC curve for diagnosis based on SAP ranged from 0.91 to 0.66 (Table 3). The three largest areas under the ROC curves (ROC area, SE) were for inferior quadrant thickness (0.91, 0.03), 6 o'clock thickness (0.90, 0.04), and mean thickness (0.89, 0.04).
When target specificity was set at ≥90%, the parameters with the three highest sensitivities (Table 3) were inferior quadrant thickness, thickness at 6 o'clock, and thickness at 7 o'clock (inferior temporal; all 79%, 92%). When target specificity was set at ≥70%, the parameters with the three highest sensitivities were inferior quadrant thickness (88%, 71%), mean thickness and thickness at 6 o'clock (both 86%, 71%), and thickness at 7 o'clock (83%, 71%).
When OCT parameters in eyes classified as glaucomatous based on SAP results were compared with healthy eyes, all RNFL thickness measures were significantly different (after Bonferroni correction, α = 0.004) between groups (thinner RNFL measures in glaucomatous eyes) except for the max-min (modulation) parameter (P = 0.01; Table 4).
|
≤1%, pattern deviation plot points ≤5%, and superior quadrant pattern deviation plot points ≤5% (all 0.74, 0.05); and pattern deviation plot points ≤1%, total deviation plot points ≤5%, and superior quadrant total deviation plot points ≤5% (all 0.73, 0.05).
When target specificity was set at ≥90%, the parameters with the three highest sensitivities were pattern deviation plot points ≤5% (43%, 92%), PSD and superior quadrant total deviation plot points ≤5% (both 41%, 92%), and total deviation plot points ≤1% (39%, 92%). When target specificity was set at ≥70%, the parameters with the three highest sensitivities were total deviation plot points ≤1%, superior quadrant pattern deviation points ≤5%, and superior quadrant total deviation plot points ≤5% (all 73%, 71%), superior quadrant total deviation plot points ≤1% and superior quadrant pattern deviation plot points ≤5% (both 63%, 74%), and PSD and pattern deviation plot points ≤1% (both 61%, 71%).
When SWAP parameters in eyes classified as glaucomatous based on optic disc appearance were compared with healthy eyes, all parameters were significantly different (after Bonferroni correction, α = 0.004), except for inferior total deviation plot points ≤5% and ≤1%, and inferior pattern deviation plot points ≤5% and ≤1% (all P ≥ 0.01). Glaucoma eyes had more negative MDs and larger PSDs and, in general, had more localized defects than healthy eyes (Table 6).
|
≤5% (0.87, 0.04), MD (0.83, 0.04), and both superior hemifield total deviation plot points ≤5% and pattern deviation plot points ≤5% (0.82, 0.05).
When target specificity was set at ≥90%, the parameters with the three highest sensitivities were MD and pattern deviation plot points ≤5% (both 61%, 92%), pattern deviation plot points ≤1% (57%, 92%), and total deviation points ≤5% (51%, 95%). When target specificity was set at ≥70%, the parameters with the three highest sensitivities were MD (80%, 71%), total deviation plot points ≤5% (78%, 84%), and both superior hemifield total deviation plot points ≤5% (78%, 76%) and PSD (78%, 71%).
When FDT parameters in eyes classified as glaucomatous based on optic disc appearance were compared with healthy eyes, all parameters were significantly different (after Bonferroni correction, α = 0.004) between groups (more negative MD, larger PSD, more localized defects in the glaucomatous eyes; Table 8).
We determined ROC curve areas and sensitivities at specificities of ≥90% and ≥70% for SWAP and FDT, using the diagnosis by SAP (Tables 5, 7), and determined ROC curve areas and sensitivities at specificities of ≥90% and ≥70% for SLP and OCT, using the diagnosis by disc appearance (Tables 1, 3). We then compared all four techniques, according to each diagnostic criterion.
Diagnosis by SAP
Using the best parameter from each instrument, the largest area under the ROC curve was found for OCT, followed by FDT, SLP, and SWAP (Fig. 1). For diagnosis based on SAP, we found significant differences in ROC curve area between OCT inferior quadrant thickness (0.91, 0.03) and both GDx LDF (0.79, 0.05) and SWAP PSD (0.78, 0.05; both P ≤ 0.02). No other significant differences were found between parameters with the highest ROC curve areas from other instruments.
At target specificity set at ≥90% for the most sensitive parameter from each instrument, OCT inferior thickness, OCT thickness at 6 o'clock, and OCT thickness at 7 o'clock (all 79%, 92%) were significantly more sensitive than SWAP PSD, SWAP superior total deviation plot points ≤5%, SWAP pattern deviation points ≤1% (all 52%, 92%), and SLP superior average thickness (40%, 92%; all P ≤ 0.01). FDT superior pattern deviation plot points ≤5% (71%, 92%) was more sensitive than SLP inferior average thickness (P < 0.03). At target specificity set at ≥70%, OCT inferior quadrant thickness (88%, 71%) was more sensitive than SWAP total deviation plot points ≤1%, SWAP superior pattern deviation points ≤5%, SWAP superior total deviation points ≤5% (all 76%, 71%), and GDx LDF and GDx Number (71%, 71%; all P ≤ 0.02). FDT PSD (88%, 71%) was more sensitive than SWAP total deviation plot points ≤1%, SWAP superior pattern deviation points ≤5%, SWAP superior total deviation points ≤5% (all 76%, 71%), GDx LDF, and GDx Number (71%, 71%; all P ≤ 0.02).
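The paired comparisons of sensitivity above rely on the McNemar test, which uses only the eyes on which two instruments disagree. The sketch below is an exact (binomial) two-sided version; whether the study used the exact or the chi-square form is not stated, so this choice is an assumption, and the function name and counts are hypothetical.

```python
from math import comb

def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar P value from the discordant-pair counts.

    only_a: glaucomatous eyes flagged by instrument A but not by B.
    only_b: glaucomatous eyes flagged by instrument B but not by A.
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: A alone detects 10 eyes, B alone detects 2 eyes.
p_value = mcnemar_exact(10, 2)
```

Eyes detected by both instruments, or missed by both, do not enter the calculation, which is why two instruments with similar overall sensitivities can still differ significantly.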
At ≥90% specificity for each instrument, agreement was poor among pairs of parameters with the highest ROC curve area for classifying eyes as glaucomatous. The κ statistic ranged from -0.32 between OCT inferior quadrant thickness and FDT total deviation plot points ≤5% to 0.17 between OCT inferior quadrant thickness and SWAP PSD. In Figure 2, Venn diagrams are used to show the number of eyes correctly classified as glaucomatous by the four instruments when diagnosis was based on SAP.
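The agreement statistic reported above can be illustrated with a minimal sketch of Cohen's κ for two binary classifications of the same eyes. The function name and the example classifications are hypothetical.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two lists of 0/1 calls on the same eyes."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    # Agreement expected by chance from each instrument's positive rate.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical calls (1 = classified as glaucomatous) on eight eyes.
instrument_a = [1, 1, 1, 1, 0, 0, 0, 0]
instrument_b = [1, 1, 0, 0, 1, 1, 0, 0]
kappa = cohens_kappa(instrument_a, instrument_b)
```

A κ of 0 corresponds to chance-level agreement, and negative values such as those reported above indicate agreement below the level expected by chance.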
|
0.02). No significant differences were found between parameters with the highest ROC curve areas from other instruments (Fig. 3).
At target specificity set at ≥90% for the most sensitive parameter from each instrument, OCT 6 o'clock thickness was significantly more sensitive (75%, 92%) than SWAP pattern deviation plot points ≤5% (43%, 92%) and SLP superior maximum thickness (39%, 92%; all P ≤ 0.01). FDT MD and FDT pattern deviation plot points ≤5% (both 61%, 92%) were significantly more sensitive than SWAP pattern deviation plot points ≤5% and SLP superior maximum thickness (all P ≤ 0.02). When target specificity was set at ≥70%, OCT inferior quadrant thickness was significantly more sensitive (88%, 71%) than SWAP total deviation points ≤1%, SWAP superior pattern deviation plots ≤5%, SWAP superior total deviation points ≤5%, GDx LDF, and GDx Number (all 73%, 71%; all P ≤ 0.03).
Similar to diagnosis by SAP comparisons, at ≥90% specificity for each instrument, agreement was poor between pairs of parameters with the largest ROC curve areas. The κ ranged from -0.27 between FDT total deviation plot points ≤5% and SWAP PSD to 0.34 between OCT inferior quadrant thickness and SWAP PSD. Because the number of eyes correctly classified as glaucomatous among all four instruments was similar to when glaucoma was diagnosed by SAP results, the data are not shown.
| Discussion |
|---|
≥90% when glaucoma was diagnosed based on disc appearance. This general pattern of results was similar when specificity was set at ≥70%. The 90% and 70% specificities were chosen to represent high and moderate specificities, respectively, not in an attempt to find the best cutoff for any particular parameter. ROC curve area figures (Figs. 1, 3) indicate that at some specificities, SWAP and SLP parameter sensitivities were similar to those of OCT and FDT.
The diagnostic criteria used (SAP or disc appearance) had a minimal effect on the performance of SLP, OCT, SWAP, and FDT parameters in discriminating between glaucomatous and healthy eyes. For the most part, areas under the ROC curves and sensitivities were similar. This finding was probably affected by the fact that 37 of the same patients were included in both diagnostic groups. Analysis of independent patient populations for the two diagnostic groups may have provided different results. However, we expect that the percentage of patients with glaucomatous optic discs in an independent standard perimetry diagnosis group would have been similar and comparable to our group, because in our clinic population (as in many other clinics) few patients with glaucoma have repeatable standard visual field defects without observable optic disc damage.
Despite this, the diagnostic ability (based on ROC curve area and sensitivities and specificities) of most OCT, SWAP, and FDT parameters increased slightly (although not significantly) when glaucoma was defined based on functional criteria. This is not surprising, because it is likely that patients with repeatable visual field defects (and thus more advanced disease) have more RNFL damage compared with those who have not yet developed visual field defects. This increased RNFL damage may be detected using OCT, thus improving glaucoma detection. Although when evaluating structure-based tests it is theoretically best to use functional criteria as a diagnostic gold standard, and when evaluating function-based tests, it is best to use structural (optic disc) criteria, our results suggest that in some populations there may be little practical difference. Further research is needed to determine whether this is the case when using independent diagnostic groups.
In the present study, and inherent in all studies evaluating diagnostic procedures in glaucoma, there exists no perfect gold standard for diagnosis. Although stereophotography and standard visual field testing are the current standards used for glaucoma diagnosis in research, it is possible that newly developed instruments are better at detecting glaucoma. Estimates of sensitivity and specificity and ROC curve areas are reliant on the quality of the gold standard applied. The use of an imperfect gold standard can affect ROC areas in two ways, depending on the relationship between errors in the test and the standard.23 If errors in the test and standard are independent, ROC curve area is underestimated because different "mistakes" are made by both, thus decreasing sensitivity and specificity of test and standard. If errors in the test and the standard are positively dependent, ROC areas are overestimated. The extreme example of this case is when errors in test and standard are perfectly correlated. Because both tests make the same mistakes for each diagnosis, ROC area (and therefore sensitivity and specificity) is 1.0, regardless of the true utility of the test. Therefore, on theoretical grounds, SAP should not be used as the standard for evaluating other function tests, because it is likely that many errors in SAP are correlated with errors in SWAP and FDT. It is less clear whether errors in assessing optic disc appearance using stereophotographs are correlated with errors in SLP or OCT measures.
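The independent-errors case described above can be illustrated with a short calculation at a single cutoff. The sketch assumes that errors in the test and in the reference standard are independent given true disease status; the function name and all numeric values are hypothetical and are not estimates from this study.

```python
def apparent_accuracy(prevalence, test_sens, test_spec, std_sens, std_spec):
    """Apparent (sensitivity, specificity) of a test scored against an
    imperfect standard, assuming independent errors given true status."""
    p = prevalence
    # Probability that the standard calls an eye glaucomatous or healthy.
    std_pos = p * std_sens + (1 - p) * (1 - std_spec)
    std_neg = 1 - std_pos
    # Probability that test and standard are both positive / both negative.
    both_pos = p * test_sens * std_sens + (1 - p) * (1 - test_spec) * (1 - std_spec)
    both_neg = p * (1 - test_sens) * (1 - std_sens) + (1 - p) * test_spec * std_spec
    return both_pos / std_pos, both_neg / std_neg

# Hypothetical test with true sensitivity and specificity of 0.90 each.
vs_perfect = apparent_accuracy(0.5, 0.9, 0.9, 1.0, 1.0)
vs_imperfect = apparent_accuracy(0.5, 0.9, 0.9, 0.8, 0.9)
```

With a perfect standard the apparent values equal the true ones; with the imperfect standard in this example both apparent values fall below 0.90, in line with the underestimation described in the text.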
In general, sensitivities, specificities, and ROC areas in our study are quite similar to those reported by others. For instance, using OCT, we observed ROC curve areas in the 0.85 to 0.90 range and sensitivities (with specificity set at ≥90%) in the 70% to 80% range for the best parameters. Also using OCT, Greaney et al.24 reported a best ROC curve area of 0.90 and a best sensitivity and specificity of 78% and 90%, respectively. For FDT we observed best ROC curve areas in the 0.80 to 0.90 range and sensitivities (with specificity set at ≥90%) in the 60% to 70% range for the best parameters. Similarly, Cello et al.25 reported an ROC curve area of 0.93 and a sensitivity and specificity of 85% and 90%, respectively, for patients with early glaucoma defined by SAP. FDT ROC curve areas and sensitivities in our study may be somewhat underestimated, because we used the C-20 program (Welch Allyn), which omits nasal points and, therefore, nasal-step information. Using SWAP, our best ROC curve areas were slightly less than 0.80 and sensitivities (with specificity set at ≥90%) were in the 40% to 52% range for the best parameters. Sample et al.26 reported a sensitivity of 61% and a specificity of 86% using SWAP test results outside of normal limits as a criterion for diagnostic classification when disc appearance was the gold standard.
In some cases, however, our values were considerably lower than those previously reported. For instance, using SLP, we reported a best ROC curve area of 0.81 and a best sensitivity (with specificity set at ≥90%) of 40%. Others have reported sensitivities and specificities in the 80% to 90% range.12,27 Recently, some researchers have reported improved SLP results (ROC curve areas of 0.90 for discriminating between glaucomatous and healthy eyes, discrimination of ocular hypertensive eyes from healthy eyes) using nonstandard data analysis methods.13,28,29,30 Differences in SLP performance among studies may be due to differences in severity of glaucoma, differences in software, differences in corneal polarization axes,31 or other methodological differences.
Statistically significant differences in variable measures between glaucoma and healthy eyes found in the present study confirm previous findings for all instruments.10,12,13,14,26,27,30,32,33,34,35,36,37,38,39 We found that a limited number of SLP parameters were significantly different between diagnostic groups. We suspected that this may be due in part to the necessary correction of α (α = 0.002) with such a large number of parameters investigated for this instrument. However, if we use the same α (i.e., 0.002) to define statistically significant differences using OCT, SWAP, and FDT, almost all parameter differences between diagnostic groups remain. Similarly, if we relax α for the SLP comparisons to the level used for SWAP and FDT (α = 0.004), few new significant differences appear. Although SAP-measured MD was only 0.5 dB lower in the SAP diagnostic group than in the optic disc diagnostic group, it is interesting that all OCT and SWAP parameters and all but one FDT parameter (inferior total deviation plot points ≤5%) changed in the direction predicted by an increase in glaucoma severity in the SAP diagnostic group (although no changes were statistically significant). This general trend was not observed for more than half of the SLP parameters.
It also is interesting to note that when specificity was set at ≥90% and diagnostic agreement was compared across techniques, we found poor agreement among the best (largest ROC curve area) parameter from each instrument for determining which eyes were glaucomatous (all between-instrument comparisons, κ < 0.35; Fig. 2). This finding indicates that different techniques may detect different characteristics of glaucoma. Poor agreement between SWAP and FDT may be partially related to the different subsets of retinal ganglion cells tested. Possibly, early in glaucoma, some eyes lose more of one cell type than another. This may lead to nonoverlapping groups of patients with glaucoma with different patterns of cell loss.26 Our study did not address this hypothesis.
The primary strength of our study is that instruments were compared in a single population. Therefore, the sensitivity and specificity values per se were less important than their relative values among instruments. Limitations of this study include the small number of subjects. Our inability to find significant differences in ROC curve areas among the best parameters in most cases may be related to sample size. Another limitation is that all tests had to be completed within 1 year. Ideally, this maximum should be shortened to obtain the best cross-sectional comparison of different diagnostic techniques. However, because 75% of patients had tests completed within 3 months and 80% had tests completed within 6 months, it is unlikely that clinically meaningful glaucomatous change occurred within this time frame. It is possible that glaucoma will develop later in some of the healthy eyes included in this study. Therefore, longitudinal study is the only way to truly determine the sensitivity and specificity of these tests.
Another limitation, inherent in any comparable study, is that different diagnostic techniques evaluated in this study are currently at different stages of development. More established techniques (SWAP and SLP) were compared with newer technologies (FDT and OCT). In general, established technologies benefit from robust normative databases and more sophisticated analysis strategies. For instance, a complex parameter such as the GDx Number (a neural network-derived analysis) may be expected to outperform a crude measurement such as OCT-measured RNFL thickness at 6 o'clock (derived from measurements at only eight or nine points). Similarly, a visual field analysis parameter such as PSD is likely to be more thoroughly derived from a standard 24-2 grid (used for SWAP) than from the more crude 16-location FDT grid. In our study, however, the more recently developed technologies generally performed better than the older ones. Because SLP corneal polarization compensation is inadequate in a sizable number of patients,31 implementation of a method that correctly compensates for corneal polarization axis in individual patients will probably improve the diagnostic precision of the instrument.
In conclusion, the largest ROC curve area for OCT (inferior quadrant thickness) was larger than the largest ROC curve area for SLP (LDF) and SWAP (PSD) when diagnosis was based on SAP, and the largest ROC curve area for OCT (inferior quadrant thickness) was larger than the largest ROC curve area for SWAP (PSD) when diagnosis was based on disc appearance. ROC curve areas among other instruments were not significantly different for either diagnostic criterion. Sensitivities were best (although not always significantly so) for OCT and FDT measurements followed by SWAP and SLP. However, the sensitivity and specificity of even the best parameter of the best instrument are probably not sufficient to warrant use as a sole screening method in the general population. In contrast, for screening in situations in which treatment is at a premium (e.g., developing nations), a sensitivity and specificity of 79% and 92% (for several OCT measures, for example) may be acceptable, assuming that the technique is relatively simple and quick. The poor diagnostic agreement found among instruments suggests that different techniques may identify different characteristics of glaucomatous damage.
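The point about screening can be made concrete with a short worked example. The sketch below applies Bayes' rule to the sensitivity (79%) and specificity (92%) quoted above to obtain positive and negative predictive values. The prevalence values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted in the text for several OCT measures.
SENSITIVITY = 0.79
SPECIFICITY = 0.92

# Hypothetical prevalences: a general population versus enriched clinic settings.
results = {}
for prevalence in (0.02, 0.10, 0.30):
    results[prevalence] = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

At the assumed 2% prevalence, most positive results would be false positives, whereas the same test yields a much higher positive predictive value when the assumed prevalence is higher.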
| Footnotes |
|---|
Submitted for publication September 21, 2000; revised April 9, 2001; accepted April 26, 2001.
Commercial relationships policy: N.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Linda M. Zangwill, Glaucoma Center and Diagnostic Imaging Laboratory, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0946. zangwill{at}eyecenter.ucsd.edu
| References |
|---|