From 1 Discoveries in Sight, Devers Eye Institute, Legacy Clinical Research and Technology Center, Portland, Oregon; and the 2 Department of Optometry and Vision Sciences, University of Melbourne, Victoria, Australia.
| Abstract |
|---|
METHODS. Two novel threshold estimation procedures were evaluated: a rapid, efficient binary search technique (REBS) and a maximum-likelihood estimation (ZEST) procedure. A computerized visual field simulation model was developed to determine the accuracy and efficiency of these procedures. This model was constructed using previously derived characteristics of FDT perimetry from both normal observers (n = 506) and those with glaucomatous visual field loss (n = 352). The computer simulation program was used to determine the best parameters for the two new procedures and the effect of variability and response errors on algorithm performance. Comparisons were made to the performance of the modified binary search (MOBS) procedure used in the current commercial implementation of the FDT perimeter.
RESULTS. Both the optimized REBS and ZEST procedures approximately halved the time required for FDT threshold testing without loss of accuracy or reproducibility.
CONCLUSIONS. With suitable parameter choices, comparable performance was achieved using either ZEST or REBS. Simulation results indicate that accurate thresholds can be measured with an optimized ZEST or REBS procedure in approximately half the time required by traditional estimation methods.
| Introduction |
|---|
An advantage of FDT perimetry over conventional perimetry is decreased test time, partly due to fewer test locations. The commercially available FDT perimeter (Welch Allyn Inc., Skaneateles Falls, NY; Humphrey Systems, Dublin, CA) tests 17 locations: 4 in each quadrant and 1 central for the C-20 full-threshold test and 2 additional locations along the nasal meridian (19 total) for the N-30 full-threshold test. Full-threshold testing measures contrast sensitivity using a modified binary search (MOBS) procedure8 with an average test time of approximately 5 minutes. This compares favorably with the average test time of 15 minutes for a conventional 24-2 full-threshold or 8 minutes for a Swedish interactive threshold algorithm (SITA) standard strategy (Humphrey Systems).9
The purpose of this study was to develop new threshold procedures for FDT perimetry, to improve efficiency and maintain or improve accuracy. We explored the utility of maximum-likelihood procedures, specifically ZEST,10 11 12 13 because similar procedures (SITA) have been applied successfully to conventional perimetry.14 15 16 We also optimized the MOBS procedure for use with FDT perimetry to produce a rapid, efficient binary search (REBS).
Computer simulation was used to evaluate ZEST and REBS. Simulation allows thousands of threshold estimates to be collected rapidly and has been used previously to evaluate clinical test algorithms.11 16 17 We used simulation to optimize the input parameters and termination criteria of the algorithms and to evaluate the relative importance of these parameters. We compared performance of these algorithms with the MOBS procedure used in the proprietary FDT perimeter. More than 1000 test procedures were assessed. A limited number of these procedures are presented in this report. The full results are available at
http://www.computing.edu.au/
andrew/Barramundi/fdp.html.
| Methods |
|---|
Using the simulation as just described confers no advantage over mathematical analysis of the test procedures. An advantage of simulation, however, is that sources of measurement error can be introduced and their effects on individual test procedures studied. We incorporated three types of measurement error into the simulation. Threshold variability was simulated by repeated sampling of a Gaussian distribution whose mean was the input threshold. Previous simulation studies have found little difference in performance between Gaussian and empiric distributions.17 18 The SD of the Gaussian was varied as 0, 1, and 2 dB to simulate patient variability.7 Figure 1 shows an example of an input threshold of 14 dB and a Gaussian distribution with an SD of 1 dB. The threshold used to determine a response is 14 dB with probability 0.40, 13 or 15 dB with probability 0.24, and so on. False-positive and false-negative rates were incorporated as a probability that the subject would respond yes or no, respectively, regardless of the stimulus presented. False-positive and -negative rates of 0%, 10%, and 30% were used within the simulation.
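The three error sources described above can be sketched as a simulated observer. This is a minimal sketch, not the authors' simulation program: the function names, the single random draw used to apply the false-positive and false-negative rates, and the convention that a stimulus is seen when its decibel value does not exceed the momentary threshold are all assumptions made for illustration.

```python
import math
import random

def gaussian_weights(mean_db, sd_db, lo=0, hi=20):
    """Discrete Gaussian over whole-dB thresholds, normalised to sum to 1."""
    if sd_db == 0:
        return {mean_db: 1.0}          # ideal observer: no threshold variability
    raw = {t: math.exp(-0.5 * ((t - mean_db) / sd_db) ** 2)
           for t in range(lo, hi + 1)}
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()}

def simulate_response(stimulus_db, input_threshold_db, sd_db,
                      fp_rate, fn_rate, rng):
    """Return True ("yes") or False ("no") for one stimulus presentation."""
    u = rng.random()
    if u < fp_rate:                    # false positive: "yes" regardless of stimulus
        return True
    if u < fp_rate + fn_rate:          # false negative: "no" regardless of stimulus
        return False
    # Otherwise sample a momentary threshold around the input threshold.
    weights = gaussian_weights(input_threshold_db, sd_db)
    levels = list(weights)
    momentary = rng.choices(levels, weights=[weights[t] for t in levels])[0]
    # Assumed convention: higher dB is harder to see.
    return stimulus_db <= momentary

# The "typical" patient of the text: SD of 1 dB, 10% false-response rates.
rng = random.Random(0)
answer = simulate_response(12, 14, 1, 0.10, 0.10, rng)
```

With an input threshold of 14 dB and an SD of 1 dB, `gaussian_weights` gives approximately 0.40 at 14 dB and 0.24 at 13 and 15 dB, in line with the example of Figure 1.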
The top panel in Figure 2a shows an initial pdf that assumes the most likely threshold for the patient is 14 dB (probability 0.13), whereas the patient is very unlikely to have a threshold of 2, 3, 4, 19, or 20 dB (P = 0.001). The mean of this pdf is 12 dB, which is presented to the subject. If the subject responds no, then the pdf is modified to give more probability to lower decibel levels (Fig. 2a, bottom). The mean of this new pdf determines the next stimulus presentation at this location (9 dB in this case). If the subject responds yes, then the pdf is modified to give a pdf as shown in the bottom panel of Figure 2b, with more weight on higher decibel levels. A stimulus of 15 dB, the mean of this new pdf, is presented next.
Once a new pdf is derived, the new mean is calculated, and a stimulus contrast equal to that mean is presented. The process is then repeated, either a fixed number of times or until the SD of the pdf falls below a predetermined value. The subject's threshold is the mean of the final pdf.
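The update cycle just described (present the mean, multiply the pdf by a likelihood aligned to the stimulus, renormalise, repeat) can be sketched as follows. This is an illustrative sketch under stated assumptions: the logistic shape, slope, and 99%/1% end points of `p_yes` are hypothetical stand-ins for the likelihood function of Figure 2, and the safety cap of 50 presentations is not part of the published procedure.

```python
import math

LEVELS = list(range(0, 21))  # whole-number thresholds, 0 to 20 dB

def p_yes(threshold_db, stimulus_db, slope=1.0, floor=0.01, ceiling=0.99):
    """Hypothetical likelihood of a "yes" given a true threshold."""
    z = (threshold_db - stimulus_db) / slope
    return floor + (ceiling - floor) / (1.0 + math.exp(-z))

def pdf_mean(pdf):
    return sum(t * p for t, p in zip(LEVELS, pdf))

def pdf_sd(pdf):
    m = pdf_mean(pdf)
    return math.sqrt(sum(p * (t - m) ** 2 for t, p in zip(LEVELS, pdf)))

def zest_step(pdf, stimulus_db, seen):
    """Multiply the pdf by the likelihood of the response and renormalise."""
    like = [p_yes(t, stimulus_db) for t in LEVELS]
    if not seen:
        like = [1.0 - v for v in like]
    post = [p * v for p, v in zip(pdf, like)]
    total = sum(post)
    return [v / total for v in post]

def zest(respond, prior, stop_sd=None, max_presentations=50):
    """Run ZEST until the pdf SD drops below stop_sd or the cap is reached.

    respond(stimulus_db) must return True for "yes" and False for "no".
    Returns (threshold estimate, number of presentations, final pdf).
    """
    pdf = list(prior)
    n = 0
    while n < max_presentations and (stop_sd is None or pdf_sd(pdf) >= stop_sd):
        stimulus = math.floor(pdf_mean(pdf) + 0.5)  # mean rounded to nearest dB
        pdf = zest_step(pdf, stimulus, respond(stimulus))
        n += 1
    return pdf_mean(pdf), n, pdf

# Usage: an error-free observer with a 14-dB threshold and a uniform prior.
uniform = [1.0 / len(LEVELS)] * len(LEVELS)
estimate, presentations, final_pdf = zest(lambda s: s <= 14, uniform, stop_sd=1.0)
```

Passing `stop_sd=None` with `max_presentations=4` gives the fixed-length variant; passing `stop_sd=1.0` gives the SD-based variant.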
The defining features of a ZEST procedure are the initial pdf, the likelihood function, and the termination rule. As Vingrys and Pianta10 have noted, the initial pdf can be derived from demographic studies and can be biased by an examiner's intuition about the subject's likely thresholds. The likelihood function should reflect the variability inherent in the detection task as well as subjective false error rates. Finally, the termination rule should be chosen with realistic clinical outcomes in mind. One approach is to finish the procedure when the SD (σ) of the pdf declines below a fixed value, which assures a level of confidence about the determined threshold. For example, if σ is chosen to be 1 dB, then there is a 95% chance that the real threshold lies within a ±2-dB range of the measured threshold. One disadvantage of this approach is that unreliable observers may require many presentations. An alternative is to terminate the procedure after a fixed number of presentations, in which case the 95% confidence interval of the final estimate can still be determined from the final pdf.
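As a worked illustration of the termination rule, the SD and a central 95% interval can be read directly from a discrete pdf. The sketch below is an assumption-laden example rather than the authors' code: the helper names are hypothetical, and the final pdf is taken to be a discretised Gaussian centred on 14 dB with σ = 1 dB.

```python
import math

def pdf_mean_sd(levels, pdf):
    """Mean and SD of a discrete pdf defined on the given dB levels."""
    m = sum(t * p for t, p in zip(levels, pdf))
    var = sum(p * (t - m) ** 2 for t, p in zip(levels, pdf))
    return m, math.sqrt(var)

def central_interval(levels, pdf, mass=0.95):
    """Lowest and highest levels bounding the central `mass` of the pdf."""
    tail = (1.0 - mass) / 2.0
    lo, cum = levels[0], 0.0
    for t, p in zip(levels, pdf):
        cum += p
        if cum > tail:           # first level at which the lower tail is exceeded
            lo = t
            break
    hi, cum = levels[-1], 0.0
    for t, p in zip(reversed(levels), reversed(pdf)):
        cum += p
        if cum > tail:           # same scan from the upper end
            hi = t
            break
    return lo, hi

# Assumed example of a final pdf: discretised Gaussian, mean 14 dB, SD 1 dB.
levels = list(range(0, 21))
raw = [math.exp(-0.5 * (t - 14) ** 2) for t in levels]
final_pdf = [v / sum(raw) for v in raw]

mean_db, sd_db = pdf_mean_sd(levels, final_pdf)
interval = central_interval(levels, final_pdf)   # (12, 16), i.e. 14 ± 2 dB
```

The same two helpers apply unchanged to a pdf obtained after a fixed number of presentations, where the SD is not controlled in advance.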
To determine the effect of the starting pdf on FDT perimetry thresholds, four classes of pdf were tested in the simulation: uniform distributions (pdfu); pdfs equal to the normalized histogram of sensitivity from the normal eyes in Table 1 (pdfn); pdfs equal to the normalized histogram of sensitivity from the eyes with glaucomatous visual field loss in Table 1 (pdfg); and pdfs formed by combining the normal and glaucomatous pdfs (pdfc). The combined pdfs added the bottom 5% of pdfg to pdfn.10 Before the addition, the bottom 5% of pdfg was reduced by a weighting factor so that it did not dominate the normal portion of the pdf. Twelve different weighting factors were evaluated. The results presented herein are for a weighting factor of 0.6. Each class of pdf contained a single pdf for each of the 17 visual field locations. There was little difference in the form of the 17 pdfs in each class; however, the means of the peripheral pdfs were generally lower than those for the central five locations. The resultant pdfc pdfs were trimodal (Fig. 2, top panels), unlike the bimodal pdf of Vingrys and Pianta,10 which was derived from advanced glaucomatous deficits.
The following ZEST termination rules were evaluated: stopping after three or four presentations and stopping when the pdf achieved an SD of 0.5, 1, or 2 dB. A single likelihood function was used for all ZEST procedures (Fig. 2, middle panels). The ZEST procedures were performed on discrete functions, one for 0 dB, one for 1 dB, and so on. Similarly, the likelihood function was defined for whole-number thresholds in the range 0 to 20 dB. The mean of the pdf at each stage was rounded to the nearest decibel. This rounded mean was used to determine the alignment point for the likelihood function. Rounding introduces a maximal error of 0.5 dB when producing each new pdf. Given that the 90% confidence interval for retest variability for normal observers for FDT perimetry is approximately 4 dB,7 this rounding is unlikely to be clinically significant.
Modified Binary Search
The optimal binary search technique19 requires selecting the middle number of a range and then adjusting the range according to the response. For example, to find a number between 0 and 100, the first selection would be 50. If 50 is too high, the target number falls in the range 0 to 49, and the next selection would be 25, and so on. Binary search is a special case of a maximum-likelihood procedure where the initial pdf considers all thresholds equally likely, and the likelihood function is equal to 100% for thresholds on one side of the middle and 0% for other thresholds.
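The number-guessing example can be written out directly. This is a minimal sketch with hypothetical names; rounding the midpoint upward is an assumption chosen so that the first two selections are 50 and 25, as in the example.

```python
def binary_search(is_too_high, lo=0, hi=100):
    """Find an unknown integer in [lo, hi].

    is_too_high(guess) must return True when the guess exceeds the target.
    Returns (target, list of guesses made).
    """
    guesses = []
    while lo < hi:
        mid = (lo + hi + 1) // 2      # middle of the range, rounded up
        guesses.append(mid)
        if is_too_high(mid):
            hi = mid - 1              # target lies below the guess
        else:
            lo = mid                  # target is the guess or above it
    return lo, guesses

# Usage: the target is 20, unknown to the search.
target, guesses = binary_search(lambda g: g > 20)
```

Because the range is halved at every step, no target between 0 and 100 needs more than seven selections. The procedure relies on every answer being correct, which is the assumption that fails in perimetry.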
The binary search assumes that the target number is known and does not change. In perimetry, the threshold varies throughout testing, according to the psychometric function, and subjects make mistakes. To use the efficiency of binary search and also allow for errors and variability, Tyrell and Owens8 introduced MOBS. MOBS follows the binary-search strategy but also checks the range end points if two yes or no responses occur in succession. If the end point is not consistent (for example, the subject could not see the intensity at the bottom of the range), then the range is widened to the previous end point. A reversal occurs when the previous response differs from the current response. MOBS terminates when the range reaches a minimum width and a fixed number of reversals have occurred.
Figure 3 shows examples of two MOBS procedures that terminate after two reversals, with a minimum range of 3 dB for a subject with a threshold of 8 dB. In Figure 3a the subject makes no mistakes, whereas in Figure 3b there is a mistake on the first presentation, responding yes to 10 dB. The white rectangle at each stage shows the possible range of thresholds, with the list of numbers to the left and right of the rectangle recording previous lower and upper end points. The subject's response is shown between each rectangle. In Figure 3a, after three presentations of (A) 10 dB, (B) 5 dB, and (C) 7.5 dB, a threshold of 6.25 dB is estimated. In Figure 3b, after two "no" responses in a row to stimuli of intensity (B) 15 dB and (C) 12.5 dB, the lowest end point of the range is presented and checked. In this case the subject responds that he or she cannot see the lowest end point (10 dB), and the previous end point is restored (0 dB). The procedure then continues until the termination criterion is met after a response at presentation (F). Note that after presentation (E), two reversals have been achieved, but the range is still more than 3 dB, and therefore MOBS continues for one more presentation to determine a threshold of 8.75 dB.
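The presentation sequences described for Figure 3 can be reproduced with a short sketch of the MOBS logic. Several details are assumptions made for illustration and are not confirmed by the text: the variable names, the rule that a failed end-point check makes the checked value the opposite boundary, the counting of check responses toward reversals, and the cap on presentations.

```python
def mobs(respond, lo=0.0, hi=20.0, min_range=3.0,
         reversals_needed=2, max_presentations=100):
    """Sketch of a modified binary search over the range [lo, hi] in dB.

    respond(stimulus_db) returns True for "yes" and False for "no".
    Returns (threshold estimate, list of (stimulus, response) pairs).
    """
    floor, ceiling = lo, hi
    lo_stack, hi_stack = [], []   # previous end points, most recent last
    trace = []
    last = None                   # previous response
    run = 0                       # length of the current run of equal responses
    reversals = 0
    while len(trace) < max_presentations:
        checking = run >= 2       # two equal responses in a row: check an end point
        if checking:
            stim = lo if last is False else hi
        else:
            stim = (lo + hi) / 2.0
        seen = respond(stim)
        trace.append((stim, seen))
        if last is not None and seen != last:
            reversals += 1
        if checking:
            if last is False and not seen:
                # Lower end point not seen: restore the previous lower end point.
                hi = stim
                lo = lo_stack.pop() if lo_stack else floor
            elif last is True and seen:
                # Upper end point seen: restore the previous upper end point.
                lo = stim
                hi = hi_stack.pop() if hi_stack else ceiling
            run = 0
        else:
            if seen:
                lo_stack.append(lo)
                lo = stim
            else:
                hi_stack.append(hi)
                hi = stim
            run = run + 1 if seen == last else 1
        last = seen
        if hi - lo <= min_range and reversals >= reversals_needed:
            break
    return (lo + hi) / 2.0, trace

# Usage: replay the responses described for Figure 3b (yes, no, no, no, yes, yes).
scripted = iter([True, False, False, False, True, True])
estimate_3b, trace_3b = mobs(lambda stimulus: next(scripted))
```

Replaying those six responses presents 10, 15, 12.5, 10, 5, and 7.5 dB and returns 8.75 dB, matching the sequence described for Figure 3b.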
Analysis
Performance was compared using the mean and SD of error in
threshold measurement across all test locations and number of
presentations. Error was the difference between the estimated and input
threshold. Results are given for ideal patients (no Gaussian variation
in their input thresholds; 0% false-positive and false-negative
rates), typical patients (Gaussian variation of thresholds with an SD
of 1 dB; false-positive and false-negative rates of 10%), and
unreliable patients (Gaussian variation of threshold with an SD of 2
dB; false-positive and false-negative rates of 30%). The error and
efficiency for procedures intermediate to these fell within the
boundaries of the represented data.
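The summary statistics used for comparison can be computed as below. This is a small sketch with hypothetical names; the use of the sample SD rather than the population SD is an assumption, because the text does not say which was used.

```python
import statistics

def summarise_errors(estimates_db, inputs_db):
    """Mean and SD of (estimated - input) threshold, in dB."""
    errors = [est - true for est, true in zip(estimates_db, inputs_db)]
    return statistics.mean(errors), statistics.stdev(errors)

def mean_presentations(presentation_counts):
    """Average number of presentations per threshold estimate."""
    return statistics.mean(presentation_counts)

# Usage with made-up values: three estimates of a 14-dB input threshold.
mean_error, sd_error = summarise_errors([13, 15, 14], [14, 14, 14])
average_count = mean_presentations([4, 5, 6])
```

A negative mean error indicates that a procedure underestimates the input threshold on average, whereas the SD describes the spread of errors around that mean.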
| Results |
|---|
… σ = 2 dB, the middle four bars terminating with σ = 1 dB. The final two bars represent the best of the procedures, stopping after three and four presentations, respectively. The best procedure was that with the lowest absolute value of mean error ± SD.
Comparing the top panels of Figure 4 shows that between two and three extra presentations are required to reduce the SD of any of the pdfs from 2 to 1 dB. These extra presentations lead to a reduction in the SD of threshold measurement error by approximately 1 dB in ideal patients (bottom panels), and approximately 0.3 dB in unreliable patients. If ZEST is allowed to continue until the pdfs reach an SD of 0.5 dB (not shown in Fig. 4), 6 more presentations are required (total of 12 presentations) in unreliable patients, with only a 0.3 reduction in the spread of errors (SD of 3 dB). In patients with highly variable results, there is little gain in accuracy with continuation of testing beyond three to five presentations. The spread of errors is within reasonable limits (1.5 dB) terminating after three or four presentations (or σ = 1 dB) in ideal and typical patients.
Examining the procedures terminating when σ = 1 dB, the top panels of Figure 4 show that ZEST procedures using pdfn were fastest in normal subjects, and the ZEST based on pdfg was fastest in ideal subjects with glaucoma. A surprising finding was that ZEST using pdfn was the fastest in unreliable subjects with glaucoma. The method based on the uniform pdf class (pdfu, fourth bar) was slower than the other three pdf classes, requiring two to three extra presentations, and showed greater errors than the method using pdfc. The combined pdf was approximately one presentation slower than the best method in each case. If it is known which population the patient is from, then a ZEST procedure using an appropriate pdf can achieve an accurate threshold using three to four presentations. A combined pdf requires approximately five presentations.
The glaucomatous subject data combine performance for locations with both normal and depressed thresholds, because any single glaucoma patient is likely to have some normal visual field locations. To compare the performance of the ZEST procedures on damaged and normal locations, Figure 5 presents the performance of the ZEST procedures plotted against the patient's threshold. The data shown are averaged over all patients in each group. Three pdfs are shown for each patient group: pdfn and pdfc are as described earlier, whereas pdfd is the difference between pdfc and pdfn, thus reflecting a pdf for damaged locations only.
Generally, ZEST with pdfc appears a good compromise between the two extremes, supporting our simulation results over the entire visual field. The number of presentations is consistent across all threshold values, particularly in patients with some erroneous responses. It sacrifices some speed to ZEST with pdfd in areas of damage (0 to 5 dB), and to ZEST with pdfn in the range 8 to 16 dB. Unless it is known in advance that a patient's threshold will fall into one of these two ranges, there is no way of knowing which of these pdfs to use. The error behavior of pdfc is as expected, more variable than pdfn in the normal-threshold range, but less in the damaged range. Similarly, the measurements made with pdfc are more erroneous in damaged locations than those made with pdfd, but provide better estimates than pdfd in other threshold ranges.
Modified Binary Search
Figure 6 shows the performance of the MOBS and REBS procedures. Each panel shows 10 REBS procedures, each with different termination criteria, and MOBS. The number of reversals used in the REBS procedure is indicated by r = 0 for no reversals, r = 2 for two reversals, and r = 4 for four reversals. The stopping range appears as the tick label in the x-axis of the bottom panel. MOBS is labeled M.
Where subjects respond ideally (Figs. 6a, 6d), the mean error is small, with most of the means plus one SD in the range ±1 dB. Given that the clinical difference between these errors is likely to be insignificant, choosing the fastest procedure (0 reversals and a range of 3 dB, leftmost bar) seems appropriate. When unreliability is introduced (Figs. 6b, 6c, 6e, 6f), this binary search procedure has the greatest spread of error (Figs. 6c, 6f, bottom panels), and exceeds that of the methods using four reversals.
Increasing reversals to four, rather than two, results in a small change in accuracy for all patient groups, but requires four to five additional presentations. If large errors (±6 dB) can be tolerated in unreliable patients, then the REBS procedure of choice is one that terminates with 0 reversals and a stopping distance of 3 dB. If more accuracy is required, but speed is necessary, then using two reversals with a stopping distance of either three or unlimited decibels appears most suitable.
Comparing MOBS, REBS, and ZEST
We compared our simulated results of REBS terminating with two reversals and a range of 3 dB, the commercially available MOBS, and ZEST using pdfc and a stopping criterion of σ = 1. Figure 7 compares the number of presentations, the mean error, and the SD of the error in each of the six subject groups.
… 1.0 dB) in all cases but ZEST in unreliable normal patients (-1.96 dB). Figure 7b shows that the new procedures are twice as fast as the existing MOBS, with ZEST being faster than REBS in unreliable patients.
| Discussion |
|---|
A hybrid of the two approaches is possible. ZEST can use an end-point checking scheme similar to that of MOBS and REBS, allowing it to recover from incorrect responses. This may be particularly helpful when a mistake is made early in the procedure, because ZEST has limited ability to recover from early errors.10 Alternatively, REBS can be modified to choose a stimulus other than the midpoint of the current range, guided by a pdf of likely thresholds within the range.
Using a pdf including both normal and glaucomatous thresholds, pdfc, provided the best overall performance of the ZEST procedure. The possibility for further improvements in efficiency at follow-up visits may be realized by choosing the pdf for each location of the visual field on a point-wise basis, depending on the threshold at the last examination. Additional benefits in efficiency may also be possible through the incorporation of neighborhood logic to the thresholding algorithm, in that the threshold at any particular location is not independent of that of its immediate neighbors. We chose not to include such methodology in this study, because the benefits when testing only 17 locations are likely to be small. This may not be the case when greater numbers of locations are tested.
As well as choosing a pdf for ZEST, a likelihood function should be chosen. Although we present data for only a single likelihood function, we included 12 different likelihood functions within the simulation ranging from a relatively flat slope derived by scaling a white-on-white perimetry frequency-of-seeing curve, through to steeper curves such as that finally used. As expected, steeper likelihood functions resulted in decreased test times (for data see
http://www.computing.edu.au/
andrew/Barramundi/fdp.html). Our primary purpose was to develop a fast procedure while admitting a small and clinically acceptable loss in accuracy, and we therefore used steeper curves. Similarly, the end points of the curve were varied from 95%–5% through to 99%–1%, and again the simulation results showed that the faster convergence of the ZEST procedure when using 99%–1% did not come at the cost of a clinically unacceptable loss in accuracy.
Although we have attempted to include all factors within the simulation, there are several limitations that make it possible that human performance may differ from that predicted. First, the pdfs used in this study were generated from the same empiric patient thresholds that were input to the simulation. Consequently, the subjects tested were definitely represented within the pdfs, an ideal situation. The ZEST procedure may not perform as well as reported by the simulation when new subjects are tested. Our pdfs were based on a large sample of eyes (506 normal eyes and 352 glaucomatous eyes) and it seems unlikely that profound differences in performance will result when ZEST is used within a clinical setting.
A further limitation is that throughout the simulation, error rates were set to be independent of threshold size; that is, every input threshold was tested with ideal, typical, and unreliable error conditions. Clinical variability increases with defect depth.7 The variability of FDT perimetry thresholds increases only modestly (20%–30%),20 in contrast to the large changes observed in achromatic perimetry (300%–400%).7 In any patient, responses may vary from typical to unreliable at different locations within the visual field, depending on the deficit depth. In this case, the simulation predicts a different number of presentations and range of errors at these locations. A prediction of performance can be determined for each location from the data shown in Figure 5; however, overall performance will differ from that presented in Figures 3 and 4.
Our results suggest that REBS can reduce the number of presentations by 45%, with an average error of less than 0.5 dB. In ideal and typical patients, ZEST matches the performance of REBS. In unreliable patients, ZEST can achieve even greater efficiency, saving 55% of presentations, but with an increase in average error to approximately 2 dB. The predicted reduced test time per presentation should result in more rapid testing of the established 17-location FDT perimetry pattern. Alternatively, a greater number of test locations may be assessed within an acceptable test duration, thereby improving the spatial resolution of FDT perimetry. Clinical validation of these predictions forms the basis for ongoing study in our laboratory.
| Footnotes |
|---|
Submitted for publication October 19, 2000; revised September 28, 2001; accepted October 18, 2001.
Commercial relationships policy: F, C (CAJ); N (all others).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Chris A. Johnson, Discoveries in Sight, Devers Eye Institute, Legacy Clinical Research and Technology Center, PO Box 3950, Portland, OR 97208-3950; cajohnso{at}discoveriesinsight.org