¹From the Hamilton Glaucoma Center and Visual Function Laboratory, Department of Ophthalmology, and the ²Institute for Neural Computation, University of California at San Diego, La Jolla, California; and the ³Computational Neurobiology Laboratories, Salk Institute, La Jolla, California.
| Abstract |
|---|
METHODS. Two machine learning classifiers (MLCs), quadratic discriminant analysis (QDA) and support vector machines with a Gaussian kernel (SVMg), were trained separately on standard automated perimetry (SAP) data from the Diagnostic Innovations in Glaucoma Study (DIGS). The data were clustered with each of three clustering schemes. The training data set comprised 123 eyes of 123 glaucoma patients with glaucomatous optic neuropathy (GON) and 135 eyes of 135 normal control subjects. Trained classifiers were then applied to an independent data set containing 69 eyes of 69 glaucoma patients with early visual field loss and 83 eyes of 83 normal control subjects. Two control conditions were included: unclustered data and a random assignment of locations to clusters.
RESULTS. For the training data set, areas under the receiver operating characteristic (ROC) curve ranged from 0.85 (SVMg, thresholds clustered by Glaucoma Hemifield Test [GHT] sectors) to 0.92 (QDA, thresholds clustered by Garway-Heath mapping). Clustered data did not significantly improve sensitivity compared with unclustered data, and no single clustering method yielded significantly higher performance in the independent data set. Sensitivities tended to be higher with QDA than with SVMg, regardless of specificity cutoff and clustering method.
CONCLUSIONS. QDA performed better than SVMg with the early glaucoma data set. Clustering may be advantageous when data-dimension reduction is needed, for example, when field results are combined with other high-dimensional data such as structural imaging data, but it is not necessary for visual field data alone.
One approach to optimizing the analysis of visual fields has been to group visual field locations. Mapping the visual field into clusters of related locations has been used to clarify the structure-function relationship21 22 23 24 and to aid in the detection of glaucomatous progression by reducing the effect of long-term variability.25 26 27 Clustering may also serve as a dimension-reduction tool to optimize MLCs and to improve our understanding of regional relationships within the visual field. However, it is not clear whether any one map offers an advantage over the others, or whether dimension reduction with structure-derived clusters compares favorably with mathematical dimension reduction.
In the present study we compared the sensitivity and specificity of two MLCs trained separately on three clustering schemes to determine (1) whether MLC ability to categorize healthy and GON eyes can be optimized by training with clustered data; (2) which MLC, visual field mapping scheme or MLC/map combination achieves the highest performance; and (3) how structure-derived schemes compare to the mathematical dimension-reducing scheme.
| Methods |
|---|
Inclusion Criteria for DIGS
All subjects underwent a complete ophthalmic examination including slit lamp biomicroscopy, intraocular pressure (IOP) measurement, dilated stereoscopic fundus examination, and stereophotography of the optic nerve heads. Simultaneous stereoscopic photographs were obtained in all subjects, and photographs of adequate quality were required for inclusion. All subjects had open angles, best-corrected visual acuity of 20/40 or better, spherical refraction within ±5.0 D, and cylinder correction within ±3.0 D. A family history of glaucoma was allowed.
Exclusion Criteria for DIGS
Subjects were excluded if they had a history of intraocular surgery (except for uncomplicated cataract or glaucoma surgery). We also excluded all subjects with nonglaucomatous secondary causes of elevated IOP (e.g., iridocyclitis, trauma), other intraocular eye disease, other diseases affecting the visual field (e.g., pituitary lesions, demyelinating diseases, HIV+ or AIDS, or diabetic retinopathy), use of medications known to affect visual field sensitivity, or problems other than glaucoma affecting color vision.
Inclusion Criteria for This Report
Participants were required to have at least one reliable standard automated perimetry (SAP) test result in their study eye. Healthy eyes had healthy optic discs and no history of elevated IOP (IOP > 22 mm Hg). Glaucomatous eyes had GON on masked, independent review of stereophotographs. Apart from reliability, visual field results were not used for inclusion or diagnostic purposes.
Participants were allocated to one of two data sets: the initial evaluation data set or the independent data set. Before age matching, the initial evaluation data set included 156 healthy eyes and 189 glaucomatous eyes. These eyes were used in our previous report of the performance of MLCs trained with SAP raw thresholds and age.2 The normal and glaucomatous groups were then age matched by randomly removing participants within 10-year age bins to equalize the number of healthy and GON eyes in each age bin, yielding 135 healthy eyes and 123 glaucomatous eyes. This data set, which included eyes with glaucoma of various severities, was used for training and initial evaluation of the MLCs.
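The age-matching step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the study's procedure: the `(eye_id, age)` tuple format and the function name `age_match` are hypothetical, and the sketch reads "equalize the number of healthy and GON eyes in each age bin" literally, keeping in every 10-year bin as many eyes from each group as the smaller group has there. The exact removal rule used in the study is not given in this section.

```python
import random
from collections import defaultdict


def age_match(healthy, glaucoma, bin_width=10, seed=0):
    """Randomly remove eyes within each age bin so both groups have equal counts.

    healthy, glaucoma: lists of (eye_id, age) tuples (hypothetical format).
    Returns the retained (healthy, glaucoma) lists.
    """
    rng = random.Random(seed)
    bins_h, bins_g = defaultdict(list), defaultdict(list)
    for eye in healthy:
        bins_h[int(eye[1] // bin_width)].append(eye)
    for eye in glaucoma:
        bins_g[int(eye[1] // bin_width)].append(eye)

    kept_h, kept_g = [], []
    for b in sorted(set(bins_h) | set(bins_g)):
        h, g = bins_h.get(b, []), bins_g.get(b, [])
        n = min(len(h), len(g))  # size of the smaller group in this bin
        kept_h.extend(rng.sample(h, n))  # random subset of healthy eyes
        kept_g.extend(rng.sample(g, n))  # random subset of GON eyes
    return kept_h, kept_g
```

Fixing the random seed makes the removal reproducible, which matters when the same matched sets must be reused across several classifier/map combinations.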
A separate group of 95 healthy eyes and 73 glaucomatous eyes met our inclusion criteria for the independent data set. To determine whether machine classifiers trained with clustered data could match or even exceed the performance of existing clinical methods in a more difficult classification task, glaucomatous eyes in the independent data set had to meet the additional criterion of early visual field loss. Early visual field loss was defined with clinically available analyses as follows: (1) the field had to be abnormal (i.e., the pattern standard deviation [PSD] had to be significant at P < 5% or the Glaucoma Hemifield Test [GHT] result had to be outside normal limits), and (2) the loss had to be relatively mild (i.e., the mean deviation [MD] had to be −6 dB or better and the PSD no worse than the 1% probability level). Because age differed significantly between GON and healthy eyes (unpaired t-test; P < 5%), the groups were age matched using the same procedure as described above. The resulting age-matched group of 83 healthy eyes and 69 glaucomatous eyes was used for all analyses.
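The two-part early-loss criterion can be written as a single predicate. This sketch assumes that the PSD probability is encoded as the Statpac flag level (0.05 for "P < 5%", 0.01 for "P < 1%", and so on, or `None` when PSD is within normal limits); that encoding and the function and parameter names are hypothetical.

```python
from typing import Optional


def has_early_field_loss(md_db: float, psd_p: Optional[float], ght_outside_normal: bool) -> bool:
    """Apply the two-part early-loss definition to one visual field.

    md_db: mean deviation in dB.
    psd_p: Statpac probability level flagged for PSD (e.g. 0.05 for "P < 5%"),
           or None if PSD is within normal limits (assumed encoding).
    ght_outside_normal: True if the GHT result is "outside normal limits".
    """
    # (1) Abnormal field: PSD significant at P < 5% or GHT outside normal limits.
    psd_abnormal = psd_p is not None and psd_p <= 0.05
    abnormal = psd_abnormal or ght_outside_normal
    # (2) Mild loss: MD of -6 dB or better and PSD no worse than the 1% level.
    psd_not_worse_than_1pct = psd_p is None or psd_p >= 0.01
    mild = md_db >= -6.0 and psd_not_worse_than_1pct
    return abnormal and mild
```

Under this encoding, a field with an MD of −3.2 dB and PSD flagged at P < 2% qualifies, whereas a field with PSD flagged at P < 0.5% is excluded as too advanced.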
Optic Disc Stereoscopic Photographs
Simultaneous stereoscopic photographs (Simultaneous Stereo Camera TRC SS; Topcon Instrument Corp. of America, Paramus, NJ) were obtained in all patients. The stereophotographs taken closest in time to the visual field test were examined for GON. Each masked stereophotograph was graded independently by two experienced reviewers. GON was diagnosed if both reviewers found evidence of excavation, focal or diffuse rim thinning, or nerve fiber layer defects. In cases of disagreement, a third reviewer adjudicated. Healthy eyes had no evidence of GON.
Visual Fields
SAP visual fields were obtained with the 24-2 program of the Humphrey Visual Field Analyzer (Carl Zeiss Meditec, Inc., Dublin, CA), using a white Goldmann size III (0.43°) stimulus on a 31.5-apostilb background. The full-threshold algorithm was used for all fields. Only reliable visual fields were included (i.e., false-positive responses, false-negative responses, and fixation losses were each 25% or less).
Visual field locations have been grouped anatomically21 28 29 or by statistical clustering.25 26 27 Anatomically derived clusters of field locations are based on localized nerve fiber bundle or wedge defects21 29 and/or prominent nerve fiber bundles.21 28 Garway-Heath et al.21 developed their map from 69 eyes with normal-tension glaucoma that had discrete retinal nerve fiber layer defects and/or prominent nerve fiber bundles (Fig. 1A). Visual field locations whose defects tended to correspond to a particular nerve fiber layer bundle were clustered together. The GHT sectors were derived by superimposing the 30-2 test point pattern of the Humphrey Visual Field Analyzer on photographs of the retinal nerve fiber layer of normal subjects to identify the normal arrangement of retinal nerve fibers (Fig. 1B).30 Test points arranged along nerve fibers were clustered together to derive 10 sectors.
Random assignment of the visual field locations to 10 clusters served as one control condition (random map). A second control condition (unclustered control) used unclustered visual field data; that is, the MLCs were trained and tested with all 52 raw thresholds and age.
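The random-map control and the cluster-averaged classifier inputs can be illustrated together. This is a minimal sketch assuming the 52 locations are indexed 0 to 51 and that the random assignment deals locations evenly so every cluster is non-empty; the study's random assignment may not have balanced cluster sizes, and the function names are hypothetical. The classifier input is the mean threshold of each cluster plus age, so a 10-cluster map yields 11 features instead of 53.

```python
import random
from statistics import mean


def random_map(n_locations=52, n_clusters=10, seed=0):
    """Control map: randomly assign location indices to n_clusters clusters.

    Shuffled locations are dealt round-robin, so all clusters are non-empty
    (52 locations -> eight clusters of 5 and two clusters of 6).
    """
    rng = random.Random(seed)
    locations = list(range(n_locations))
    rng.shuffle(locations)
    clusters = [[] for _ in range(n_clusters)]
    for k, loc in enumerate(locations):
        clusters[k % n_clusters].append(loc)
    return clusters


def cluster_features(thresholds, age, clusters):
    """Classifier input: mean threshold (dB) of each cluster, then age."""
    return [mean(thresholds[i] for i in cluster) for cluster in clusters] + [age]
```

An anatomically derived map such as the GHT sectors would simply be a different list of index lists passed to `cluster_features`.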
Machine Learning Classifiers
We chose two MLCs with different properties that were used in our previous studies1 2 9 36 37 38 39: quadratic discriminant analysis (QDA) and a support vector machine with a Gaussian kernel (SVMg). Each MLC was trained separately with the average cluster thresholds and age from each of the four maps (the three clustering schemes and the random map), using visual fields from the initial evaluation data set. Thus, there were eight classifier/map combinations (two classifiers × four maps) and two classifier/unclustered control combinations.
For the initial evaluation data set, training and test examples were separated with 10-fold cross-validation to minimize bias in estimating the sensitivity and specificity of the machine classifiers. Data from normal and glaucomatous eyes were randomly divided into 10 partitions. One partition was retained as the test set, and the other nine partitions were combined to form the training set. This training-test process was repeated 10 times so that each partition served once as the test set.
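The partitioning scheme corresponds to standard k-fold cross-validation, sketched here in pure Python. The `fit`/`predict` callables are a hypothetical interface standing in for either classifier. The text does not say whether partitions were stratified by diagnosis, so this sketch simply shuffles all eyes together.

```python
import random


def make_partitions(n_examples, k=10, seed=0):
    """Shuffle example indices and deal them into k near-equal partitions."""
    rng = random.Random(seed)
    idx = list(range(n_examples))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return one out-of-fold score per example.

    fit(X_train, y_train) -> model and predict(model, x) -> score form a
    hypothetical interface standing in for either classifier.
    """
    scores = [None] * len(X)
    for test_idx in make_partitions(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            scores[i] = predict(model, X[i])  # scored by a model that never saw it
    return scores
```

A stratified variant, which shuffles healthy and GON eyes separately before dealing them into partitions, keeps the class ratio roughly constant across folds.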
Machine classifiers were generated with custom computer programs (written in MATLAB; The MathWorks Inc., Natick, MA) developed by one of the authors (KC).2 Detailed descriptions of the classifiers can be found in Goldbaum et al.2; a brief description of each is provided in the following sections.
Quadratic Discriminant Analysis.
In discriminant analysis, two classes (GON and healthy eyes) of multivariate observations (e.g., 52 visual field locations plus age) together form a training set. A discriminant rule is learned from these data and can then be used to classify new observations into one of the two classes. QDA is closely related to linear discriminant analysis (LDA) but differs in some of its assumptions about the data and in that its discriminant rule is a quadratic function. QDA fits one normal distribution to each class, and its output combines the class-conditional normal densities, each weighted by its class prior probability (the proportions of healthy and glaucomatous eyes). QDA estimates a separate covariance matrix for each class, whereas LDA assumes that the two classes share the same covariance. This yields a linear discriminant rule for LDA and a quadratic rule for QDA. This type of classifier has previously been shown to be one of the more effective MLCs with standard perimetry data.2 QDA is disadvantaged, however, in high-dimensional space, because it requires estimating many parameters (a mean vector and a full covariance matrix for each class). Reducing the dimensionality of the data from 53 dimensions (52 thresholds plus age) to 11 or fewer might therefore improve the performance of QDA in classifying eyes as healthy or GON on the basis of visual fields.
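The QDA rule, and how LDA differs from it, can be sketched in pure Python. This is an illustrative sketch, not the authors' MATLAB program: each class gets a mean vector, a covariance matrix (with a small ridge term added to the diagonal, an assumption made here for numerical stability), and a log prior equal to the class proportion; a new eye is assigned to the class with the largest log prior plus log normal density. Setting `pooled=True` replaces the per-class covariances with one shared covariance, which yields LDA's linear rule. The hypothetical helper `n_qda_parameters` counts means and covariance entries for two classes: 2968 for 53 inputs versus 154 for 11 inputs, which illustrates why dimension reduction may help QDA.

```python
import math


def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def _cov(rows, mu):
    """Unbiased sample covariance matrix of rows around mean mu."""
    d, n = len(mu), len(rows)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def _cholesky(S):
    """Lower-triangular L with S = L L^T (S must be positive definite)."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def _log_normal(x, mu, L):
    """log N(x; mu, L L^T), using forward substitution to solve L z = x - mu."""
    z = []
    for i in range(len(x)):
        z.append((x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    half_log_det = sum(math.log(L[i][i]) for i in range(len(x)))
    return -0.5 * sum(v * v for v in z) - half_log_det - 0.5 * len(x) * math.log(2 * math.pi)


def fit_da(X, y, pooled=False, ridge=1e-6):
    """QDA (pooled=False) or LDA (pooled=True: one covariance shared by all classes)."""
    classes = sorted(set(y))
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in classes}
    mus = {c: _mean(groups[c]) for c in classes}
    covs = {c: _cov(groups[c], mus[c]) for c in classes}
    if pooled:
        d, dof = len(X[0]), len(X) - len(classes)
        shared = [[sum((len(groups[c]) - 1) * covs[c][a][b] for c in classes) / dof
                   for b in range(d)] for a in range(d)]
        covs = {c: shared for c in classes}
    model = {}
    for c in classes:
        S = [row[:] for row in covs[c]]
        for a in range(len(S)):
            S[a][a] += ridge  # small ridge keeps S positive definite (assumption)
        model[c] = (math.log(len(groups[c]) / len(X)), mus[c], _cholesky(S))
    return model


def predict_da(model, x):
    """Class with the largest log prior + log normal density."""
    return max(model, key=lambda c: model[c][0] + _log_normal(x, model[c][1], model[c][2]))


def n_qda_parameters(d, n_classes=2):
    """Mean (d) plus full covariance (d(d+1)/2) per class."""
    return n_classes * (d + d * (d + 1) // 2)
```

When two classes share a mean but differ in spread, QDA can still separate them through their covariances, whereas LDA, with one shared covariance, falls back on the class priors.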
Support Vector Machine with a Gaussian Kernel.
Support vector machines are a class of supervised learning algorithms.40 41 Unlike QDA, which adapts the shape of the separating surface in the original data space, SVMg maps the data into a higher-dimensional space through a nonlinear Gaussian kernel, with the expectation that the classes will be more easily separated there by a hyperplane. A support vector machine with a multivariate Gaussian kernel (SVMg) is thus designed to cope with data that cannot be easily separated in the original data space.42 Support vectors are the training examples closest to the decision boundary in the projected higher-dimensional space. A linear separating hyperplane is learned by maximizing the distance of the support vectors from the hyperplane (the margin). The linear boundary in projected space corresponds to a nonlinear boundary in data space. SVMs have been used successfully for a variety of classification problems.43 44 45 46 They handle high-dimensional data better than QDA and therefore do not require dimension reduction for maximal performance. However, SVM performance still depends on how the input is represented when the data are low dimensional (e.g., clustered data). If a particular mapping scheme clusters the data more optimally for our data set, we would expect SVMg to separate GON from healthy eyes better with some mapping schemes than with others.
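A Gaussian-kernel SVM can be sketched in pure Python with the simplified sequential minimal optimization (SMO) heuristic, a teaching variant of Platt's SMO rather than the solver used in the study. The values of `C`, `gamma`, and the tolerance are illustrative assumptions, and the function names are hypothetical. The kernel is K(a, b) = exp(−gamma·‖a − b‖²), and only training examples with nonzero weight (the support vectors) enter the decision function.

```python
import math
import random


def gaussian_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def train_svmg(X, y, C=10.0, gamma=1.0, tol=1e-3, max_passes=10, seed=0):
    """Simplified SMO for a soft-margin SVM with a Gaussian kernel; y in {+1, -1}."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0
    rng = random.Random(seed)

    def f(i):  # current decision value for training example i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # Only update multipliers that violate the KKT conditions.
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1  # random partner j != i
                Ej = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if lo == hi or eta >= 0:
                    continue
                new_aj = min(hi, max(lo, aj - y[j] * (Ei - Ej) / eta))
                if abs(new_aj - aj) < 1e-5:
                    continue
                new_ai = ai + y[i] * y[j] * (aj - new_aj)
                b1 = b - Ei - y[i] * (new_ai - ai) * K[i][i] - y[j] * (new_aj - aj) * K[i][j]
                b2 = b - Ej - y[i] * (new_ai - ai) * K[i][j] - y[j] * (new_aj - aj) * K[j][j]
                alpha[i], alpha[j] = new_ai, new_aj
                b = b1 if 0 < new_ai < C else b2 if 0 < new_aj < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    support = [(X[k], y[k], alpha[k]) for k in range(n) if alpha[k] > 1e-8]
    return support, b, gamma


def decision_function(model, x):
    """Positive -> class +1 (e.g. GON), negative -> class -1 (healthy)."""
    support, b, gamma = model
    return sum(a * t * gaussian_kernel(s, x, gamma) for s, t, a in support) + b
```

On the XOR pattern (opposite corners of a square sharing a label), no straight line in the original two-dimensional space separates the classes, but the Gaussian-kernel boundary does: the boundary is linear in the kernel's feature space and nonlinear in the data space.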
Analysis
Student's t-tests were used to compare clinical characteristics between glaucomatous and healthy eyes for the initial evaluation and independent data sets (JMP software; SAS Institute Inc., Cary, NC). P < 0.05 was considered significant.
Sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve were calculated for the classifier/map combinations to evaluate the performance of (1) clustered data versus the two control conditions (random map and unclustered visual field data) for each classifier, (2) clustering methods (i.e., maps) across classifiers, (3) classifiers across clustering methods, and (4) classifier/map combinations versus clinically available analyses. Because the area under the ROC curve does not fully capture the shape of the curve, ROC areas are reported, but the primary analyses were conducted on sensitivity at the high specificities desired in glaucoma diagnostic tests. The specificity cutoffs were derived from the initial evaluation data set and then applied to the independent data set. Specificity was then recalculated in the independent data set to determine how much, if at all, it changed when the cutoff was applied to new data. Sensitivities and specificities were compared statistically with the McNemar test for paired nonparametric data (SPSS software; SPSS Inc., Chicago, IL).
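The cutoff-transfer and comparison steps can be sketched as follows. This is a minimal sketch under stated assumptions: higher classifier scores mean more likely GON, an eye is called abnormal when its score exceeds the cutoff, and the McNemar test is shown in its exact (binomial) two-sided form, which may differ from the variant SPSS reported. The ROC area is computed as the Mann-Whitney probability that a GON eye outscores a healthy eye. All function names are hypothetical.

```python
import math


def cutoff_at_specificity(healthy_scores, target_spec):
    """Threshold giving at least target_spec specificity on the given healthy eyes.

    Assumed convention: higher score = more likely GON; an eye is called
    abnormal when its score is strictly greater than the cutoff.
    """
    s = sorted(healthy_scores)
    k = max(1, math.ceil(target_spec * len(s) - 1e-9))  # healthy eyes at or below the cutoff
    return s[k - 1]


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity at a fixed cutoff; labels: 1 = GON, 0 = healthy."""
    abnormal = [s > cutoff for s in scores]
    tp = sum(a and t == 1 for a, t in zip(abnormal, labels))
    tn = sum(not a and t == 0 for a, t in zip(abnormal, labels))
    return tp / labels.count(1), tn / labels.count(0)


def roc_auc(scores, labels):
    """Area under the ROC curve as P(GON score > healthy score), ties counted 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mcnemar_exact_p(hits_a, hits_b):
    """Two-sided exact McNemar p-value for paired binary outcomes of two classifiers.

    hits_a, hits_b: per-eye booleans, e.g. whether each classifier flagged a GON eye.
    """
    only_a = sum(a and not b for a, b in zip(hits_a, hits_b))
    only_b = sum(b and not a for a, b in zip(hits_a, hits_b))
    n = only_a + only_b  # discordant pairs; concordant pairs carry no information
    if n == 0:
        return 1.0
    # Under H0 the discordant split is Binomial(n, 0.5); double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

In use, `cutoff_at_specificity` would be called on the healthy-eye scores of the initial evaluation set (e.g., with `target_spec=0.96`), and `sens_spec` would then be applied with that fixed cutoff to the independent set, so any drop in the recomputed specificity is visible directly.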
For comparison with a clinical standard, the sensitivity and specificity of MD and PSD were calculated using a cutoff of P < 5% provided by Statpac 2 to define abnormality. The GHT result was considered abnormal if it was outside normal limits.
| Results |
|---|
Classifier/Map Combinations and Clinical Analyses
None of the individual Statpac criteria achieved a specificity of 96%. The GHT Statpac 2 result had a specificity of 89% and a sensitivity of 71%. At the 96% specificity cutoff (selected to approximate more closely the theoretical specificity of the GHT Statpac analysis), specificity did not differ significantly between the GHT Statpac outcome and any of the classifiers. The unclustered-QDA (sensitivity, 80%) showed significantly higher sensitivity than the GHT Statpac result, whereas GHT-SVMg (sensitivity, 44%), random-SVMg (sensitivity, 47%), and vb-ICA-SVMg (sensitivity, 51%) showed significantly lower sensitivity than the GHT Statpac result.
PSD yielded a specificity of 89% and sensitivity of 61%. The unclustered-QDA showed significantly higher sensitivity than the PSD, whereas the GHT-SVMg showed significantly lower sensitivity. No other significant differences were found.
| Discussion |
|---|
Turpin et al.49 report comparable results in their analysis of visual field progression using a support vector machine with a linear kernel (SVMl). SVMl classified eyes as progressing or stable with sensitivity similar to that of pointwise linear regression when the age-corrected thresholds from the 76 locations of the 30-2 pattern of SAP were used as input to the classifiers. This outcome was not improved in any significant way by clustering the data with an anatomically derived map developed by Weber and Ulrich.50
Clustering the data with either statistical or anatomic methods offered no strong advantage. However, clustering visual field locations to reduce data dimensionality did not, in general, adversely affect MLC performance. Thus, in circumstances in which reducing data dimensionality may be helpful, such as when visual field data are combined with high-dimensional data from another diagnostic instrument, clustering visual field locations is a viable option. Moreover, plots of sensitivity versus specificity showed a trend suggesting a potential tradeoff between the two. Some classifiers, for instance the unclustered-QDA classifier, maintained sensitivity but lost specificity when applied to a new data set. The GHT-QDA classifier appeared to have good sensitivity while approaching the desired specificity. Future studies of the diagnostic potential of new classifiers should specifically address the possibility of this tradeoff. A good diagnostic analysis package should be sensitive and should maintain good specificity when applied to the population for which it was developed.
Testing of the initial evaluation group confirmed that sensitivities at high specificities for the two MLCs trained with clustered visual field input were similar to those found in a previous study on the same population for classifiers trained with SAP raw thresholds and age.2 The present study differed from our previous studies in that the classifiers were trained on one data set with GON eyes of mixed disease severity (initial evaluation group) but tested on a separate data set containing eyes with early glaucoma (defined by visual fields), which is a more stringent evaluation of the classifiers. In this case, the analysis of primary interest was the discrimination of healthy eyes from GON eyes with early field loss. As expected, sensitivities were lower for all classifier/map combinations. However, we believe this data set better discriminates among classifier/map combinations because it excludes the most obviously defective cases.
The random control condition was used to determine whether data reduction per se is sufficient to optimize MLC performance. Although sensitivity was not always significantly lower for classifiers trained with the random map, a trend toward lower sensitivity was present. When data reduction is desired, the other maps, which were developed specifically to group related visual field locations, would be most useful.
With respect to current clinical indices, the QDA/GHT combination showed a slight but not statistically significant advantage over the PSD and matched the performance of the GHT. Because of the selection criteria for the patients in the independent data set, we do not know whether the performance of the GHT can be improved on. QDA trained with data clustered by either one of these two maps may increase the ability to detect early visual field loss while maintaining high sensitivity. The GHT Statpac analysis is well documented, is commonly used in clinical practice, and has been valuable in identifying glaucoma.51 A future study comparing classification by QDA combined with GHT clustering against the GHT Statpac analysis in longitudinal data from ocular hypertensive eyes and eyes with suspected glaucoma could help determine whether a QDA/map combination can detect field loss earlier than any of the currently available commercial Statpac analyses.
A limitation of the present study is the lack of data from an independent source of participants who might be of a different socioeconomic and/or racial make-up than the participants in the DIGS. This limitation is true of most MLC studies in this field because of the difficulty of acquiring a large enough data set. However, since our purpose was to compare the relative performance of various combinations of classifiers and mapping schemes, rather than estimate diagnostic performance, the lack of data acquired independently from the test set should not have a large impact on our conclusions.
In summary, the results of the present study do not support the use of one method of clustering visual field locations over the others. There was a marked advantage for one MLC (QDA). In addition, one combination of classifier and data reduction, the QDA with data clustered by the GHT sectors, worked better than other such combinations. Future work is needed to examine the potential for using visual field clustering to optimize MLC performance when field data are combined with other high-dimensional test results, such as imaging data.
| Footnotes |
|---|
Submitted for publication August 1, 2006; revised November 30, 2006, and July 24, 2007; accepted October 24, 2007.
Disclosure: C. Boden, None; K. Chan, None; P.A. Sample, Carl Zeiss Meditec, Inc. (F), Haag Streit (F); J. Hao, None; T.-W. Lee, None; L.M. Zangwill, Carl Zeiss Meditec, Inc. (F), Heidelberg Engineering (F, R); R.N. Weinreb, Carl Zeiss Meditec, Inc. (F), Heidelberg Engineering (F, R); M.H. Goldbaum, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Pamela A. Sample, Department of Ophthalmology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0946; psample{at}glaucoma.ucsd.edu.
| References |
|---|