From the Department of Ophthalmology, Malmö University Hospital, Lund University, Malmö, Sweden.
| Abstract |
|---|
METHODS. The results of SITA Standard visual field tests in 213 healthy subjects, 127 patients with glaucoma, 68 patients with concomitant glaucoma and cataract, and 41 patients with cataract only were included. The five different types of input data were entered into five identically designed artificial neural networks. Network thresholds were adjusted for each network. Receiver operating characteristic (ROC) curves were constructed to display the combinations of sensitivity and specificity.
RESULTS. Input data in the form of Pattern Deviation probability scores gave the best results, with an area of 0.988 under the ROC curve, and were significantly better (P < 0.001) than threshold sensitivities and numerical Total Deviations and Total Deviation probability scores. The second best result was obtained with numerical Pattern Deviations with an area of 0.980.
CONCLUSIONS. The choice of type of data input had important effects on the performance of the neural networks in glaucoma diagnosis. Refined input data, based on Pattern Deviations, resulted in higher sensitivity and specificity than did raw threshold values. Neural networks may have high potential in the production of useful clinical tools for the classification of visual field tests.
The Glaucoma Hemifield Test (GHT),7 included in Statpac, is a rather simple expert system based on differences between the upper and lower hemifields in probability scores calculated from Pattern Deviation probability maps. The GHT was one of the first computerized systems able to classify field test results reliably as normal or abnormal, and it improved the ability of ordinary clinicians to assess visual field test results.8
In the early 1990s, artificial neural networks (ANNs), one of many algorithms in the machine learning classifier concept, were tested as a tool for the interpretation of perimetric results (Goldbaum MH, et al. IOVS 1990;31:ARVO Abstract 2471; Keating D, et al. IOVS 1992;33:ARVO Abstract 1394).9 ANNs were reported to be able to differentiate between glaucoma and normal visual field status at least as well as trained readers.10 Other papers reported that machine learning classifiers discriminate better between normal and glaucomatous fields than do global visual field indices.11 12 Global visual field indices are far from ideal as diagnostic tools, however, because they condense all threshold data into one number, resulting in loss of valuable spatial information, and because they are not particularly sensitive to early localized glaucomatous visual field loss.13 14 15
The performance of ANNs has also been compared with that of other types of field interpretation criteria based on localized loss.11 Disc topography data have also been added to visual field data to improve the diagnostic ability of ANNs.16
We hypothesized that it may be possible to enhance the diagnostic performance of ANNs further by using input data from which the effects of age and media opacities have been eliminated or reduced and in which measured sensitivities have already been compared to the range of age-corrected normal sensitivities and subsequently translated into probabilities. The Statpac program provides two important analyses: (1) Numerical Total Deviations represent the deviation at each tested point of the measured threshold from age-corrected normal values. (2) Numerical Pattern Deviations represent a modification of the Total Deviation results in which a correction has been applied to account for any general elevation or depression of the field caused by media opacities or changes in pupil size. Total and Pattern Deviation probability maps are graphic presentations of the significances of the numerical deviations, relative to the known ranges of normal values at each test point location.
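The relationship between the two Statpac analyses can be illustrated with a minimal sketch. It is not Statpac's implementation: the function names are hypothetical, and the general-height estimate (the value at the 85th percentile of the Total Deviations) is an assumption made for illustration, since the exact correction rule is not described here.

```python
def total_deviation(thresholds_db, age_normal_db):
    """Pointwise deviation (dB) of measured thresholds from age-corrected normal values."""
    return [measured - normal for measured, normal in zip(thresholds_db, age_normal_db)]


def pattern_deviation(total_dev, percentile=0.85):
    """Subtract an estimate of general field elevation or depression from Total Deviation.

    ASSUMPTION: the general height is the value found at the given percentile
    of the sorted Total Deviation values; the real Statpac rule may differ.
    """
    ranked = sorted(total_dev)
    index = min(len(ranked) - 1, int(percentile * len(ranked)))
    general_height = ranked[index]
    return [td - general_height for td in total_dev]


# Hypothetical 10-point field: a uniform 3 dB depression (as from cataract)
# plus one point with an additional 10 dB localized loss.
measured = [27.0] * 9 + [17.0]
normal = [30.0] * 10
td = total_deviation(measured, normal)
pd = pattern_deviation(td)
```

In this toy field the uniform depression is removed from the Pattern Deviations, while the localized loss at the last point remains.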
The purpose of this study was to test our hypothesis by comparing sensitivities and specificities achieved by ANNs for glaucoma diagnosis by using different types of perimetric inputs: numerical threshold values in decibels, and Statpac numerical Total and Pattern Deviations and probabilities.
| Methods |
|---|
Subjects
Because patients with glaucoma often have concomitant cataract, it is desirable that methods designed to recognize glaucomatous visual field loss not be affected by ocular media changes. Therefore, it was necessary to train the neural networks with fields from healthy subjects, patients with cataract, patients with glaucoma, and patients with both glaucoma and cataract. Patients with glaucoma had primary open-angle glaucoma (POAG), including normal tension, exfoliation, and pigment glaucoma. Other types of glaucoma, such as angle-closure, secondary, and congenital forms were not included. Glaucomatous eyes were defined as those having typical glaucomatous changes in the optic disc: notches, thin or absent neural rims or marked vertical optic cup asymmetry, combined with glaucomatous visual field defects. Glaucomatous field defects were those that were compatible with glaucoma and not explained by other disease. The visual field classification was subjective, including all information available on the single-field printouts. However, we also included seemingly normal fields from eyes with pathologic disc topography, if field defects were found in later visual field tests. Patients with macular or retinal changes and neurologic or endocrinological disorders or other conditions likely to cause field defects were excluded, whereas patients with diabetes mellitus without retinopathy were included. No first field test results of any subjects were considered, to avoid patterns caused by lack of perimetric experience.19 20 21 The study was conducted according to the tenets of the Declaration of Helsinki and was approved by the Ethics Committee of Lund University.
Healthy Subjects
Two hundred thirteen test results from 213 subjects were randomly selected from an existing large normal database originally collected to establish normal thresholds and normal limits for the SITA thresholding strategies.22 The mean age of these subjects was 52 years, ranging from 19 to 84. Most fields in the normative database appeared quite normal, although normality was not a criterion for inclusion; Average Mean Deviation (MD) was 0.02 dB, ranging from −6.11 to +3.07 dB (Fig. 2A).
Patients with Glaucoma
The field tests of patients with glaucoma were randomly selected from the directory of fields included in the database in one of our Humphrey Field Analyzers. This database consisted of 11,134 tests of 3,629 patients, almost all assessed by the 30-2 SITA Standard program. The directory was sorted in alphabetic order according to the patients' surnames. Starting with the letter A, one field test was randomly selected from every fifth patient; no first field results were selected, to avoid patterns of learning. The selected patients were then matched to our glaucoma register. Only patients with a diagnosis of glaucoma or suspected glaucoma were eligible, and patient records were retrieved. In this way, 643 SITA Standard 30-2 test results were selected to be evaluated for inclusion. At this point the only information available was that the patient had undergone 30-2 SITA Standard visual field testing at least twice, and that the patient had a diagnosis of suspected glaucoma or glaucoma. After the patient records were retrieved, disc photographs obtained before the selected field test were inspected. Fields of all eyes with glaucomatous disc appearance were deemed usable. A comprehensive description of disc topography was required in patient records lacking disc photographs. A description of lens status was also required. The absence of such a description or a notation of a clear lens or pseudophakic eyes was regarded as glaucoma without cataract, whereas data indicating the presence of any type or stage of cataract classified the eyes as having glaucoma plus cataract. After exclusion of eyes according to these criteria, 127 tests of 127 eyes with glaucoma and 68 tests of 68 eyes with concomitant glaucoma and cataract remained.
The mean age of the 127 patients with glaucoma was 75 years, ranging from 40 to 96. MDs ranged from −31.18 to +0.74 dB (Fig. 2B). The group with both glaucoma and cataract averaged 77 years of age, ranging from 51 to 97, and had MDs ranging from −29.99 to 0.12 dB (Fig. 2D). In some eyes, the selected field test results appeared normal, but the disc appeared suspicious or pathologic, and later field tests, not included in the analysis, showed glaucomatous field loss.
Neural Network Design
Our networks were fully connected feed-forward multilayer perceptrons built using commercial software (Neural Network Toolbox, ver. 4.0 of MATLAB; The MathWorks Inc., Natick, MA). The network architecture, consisting of an input layer, two hidden layers, and an output layer, was the same for the different sets of input data. There were 74 units in the input layer, each unit corresponding to one test point in the 30-2 test point pattern. The number of processing elements in the two hidden layers was 25 and 5. The output layer, one neuron with a logistic transfer function, provided the network's output: glaucoma or normal.
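The architecture described above can be sketched in plain Python as a forward pass through a 74-25-5-1 network. This is an illustration only and not the MATLAB toolbox code used in the study: the function names, the random weight initialization, and the hyperbolic tangent transfer function in the hidden layers are assumptions, as only the logistic output unit is specified.

```python
import math
import random

LAYER_SIZES = [74, 25, 5, 1]  # input layer, two hidden layers, output layer


def init_network(layer_sizes, seed=0):
    """Create small random weights and zero biases for each layer (illustrative)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Propagate one visual field (74 input values) and return the output in (0, 1)."""
    activations = list(inputs)
    for depth, (weights, biases) in enumerate(layers):
        is_output = depth == len(layers) - 1
        # ASSUMPTION: tanh in the hidden layers; only the output unit is stated to be logistic.
        transfer = logistic if is_output else math.tanh
        activations = [
            transfer(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


network = init_network(LAYER_SIZES)
output = forward(network, [0.0] * 74)  # untrained network, all-zero input field
```

With these layer sizes the network has 74 × 25 + 25 × 5 + 5 × 1 = 1,980 connection weights, plus one bias per processing element.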
Network Training
The networks were trained in batch mode by using an optimization of the back propagation algorithm developed by Møller.23 This algorithm has been shown to have a fast convergence rate (i.e., relatively few iterations are needed to achieve a small classification error calculated from the network output). An early stopping technique was applied to terminate the training procedure to prevent overfitting of the data. A glaucomatous field classified with 100% certainty was assigned an output of 1 and a 100% normal field an output of 0. Fields falling between were assigned values between 0 and 1. Outputs close to the endpoints 0 or 1 indicated high confidence in the classification, whereas those close to 0.5 indicated uncertainty of the output. Because the task of the network was to identify glaucomatous field loss, patients with cataract only were included in the normal group and patients with concomitant cataract and glaucoma in the glaucoma group. During network training, classification errors were calculated and used to adjust weights in the neural network. The number of necessary iterations, arbitrarily set to a maximum of 300, was also determined by the size of the classification error. Eighty percent of all fields were used in the training procedure.
Validation
A validation procedure was applied, using half the fields not used in initial training, to prevent overfitting of data. Overfitting of data hampers the network's generalization ability and effective classification of previously unseen data.
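The early stopping idea can be sketched independently of the optimizer. In the minimal example below, `train_step` and `validation_error` are hypothetical callables standing in for one training iteration and for the classification error on the validation fields; the `patience` parameter is an assumption, as the exact stopping rule is not given here. Only the maximum of 300 iterations is taken from the text.

```python
def train_with_early_stopping(train_step, validation_error, max_iterations=300, patience=10):
    """Run training iterations until the validation error stops improving.

    Returns (best_iteration, best_error). In practice the weights saved at
    best_iteration would be restored; that bookkeeping is omitted here.
    """
    best_error = float("inf")
    best_iteration = 0
    for iteration in range(1, max_iterations + 1):
        train_step()                    # one batch update on the training fields
        error = validation_error()      # error on fields held out from training
        if error < best_error:
            best_error, best_iteration = error, iteration
        elif iteration - best_iteration >= patience:
            break                       # ASSUMPTION: stop after `patience` iterations without improvement
    return best_iteration, best_error
```

Stopping on the validation error rather than the training error is what limits overfitting: training ends when further weight updates no longer improve classification of fields the network has not been trained on.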
Evaluation
The performance of the network was evaluated with a 10-fold cross-validation procedure, in which all fields were randomly divided into 10 subgroups each containing 10% of the full data set.12 24 The number of subgroups used in training, early stopping, and test procedures was 8, 1, and 1, respectively. With this procedure, each subgroup was used for training, validation, and evaluation, while ensuring that the network was always trained and evaluated on different sets of visual fields to avoid confounding.
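The 8/1/1 rotation can be sketched as follows. The function name and the choice of the neighboring subgroup as the early stopping set are assumptions made for illustration; the total of 449 fields is the sum of the four subject groups reported in the Abstract (213 + 127 + 68 + 41).

```python
import random


def cross_validation_splits(n_fields, n_folds=10, seed=0):
    """Yield (train, early_stop, test) index lists, one triple per fold."""
    indices = list(range(n_fields))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]  # near-equal subgroups
    for k in range(n_folds):
        stop_k = (k + 1) % n_folds  # ASSUMPTION: the next subgroup is used for early stopping
        test = folds[k]
        early_stop = folds[stop_k]
        train = [i for j, fold in enumerate(folds) if j not in (k, stop_k) for i in fold]
        yield train, early_stop, test


splits = list(cross_validation_splits(449))
```

Each field appears in exactly one test subgroup, so every network output used for evaluation comes from a network that never saw that field during training or early stopping.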
Analyses
Network receiver operating characteristic (ROC) curves25 were produced by adjusting the network threshold. The network threshold, ranging from 0 to 1, was used to define patient classification or diagnosis. For each network threshold, fields with outputs larger than the threshold were classified as glaucomatous, and outputs lower than the network threshold were classified as normal. The areas under the ROC curves, one for each type of input data, were compared by a nonparametric method described by DeLong et al.,26 and the Bonferroni correction was applied to adjust for effects of multiple comparisons on the type I error, that is, falsely rejecting the null hypothesis of no difference between ROC curves.
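The threshold sweep that produces an ROC curve can be sketched as below. The function names are hypothetical, the area is computed with the trapezoidal rule, and the DeLong comparison of areas is not implemented; the example outputs and labels are invented for illustration and are not study data.

```python
def roc_points(outputs, labels):
    """Return (1 - specificity, sensitivity) pairs for a sweep of network thresholds.

    labels: 1 = glaucoma, 0 = normal. A field is classified as glaucomatous
    when its network output is larger than the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sweep from the highest observed output down to below the lowest one.
    thresholds = sorted(set(outputs), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        true_pos = sum(1 for o, y in zip(outputs, labels) if y == 1 and o > t)
        false_pos = sum(1 for o, y in zip(outputs, labels) if y == 0 and o > t)
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: three glaucomatous and three normal fields.
example_outputs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = area_under_curve(roc_points(example_outputs, example_labels))
```

In the invented example, 8 of the 9 possible glaucoma-normal pairs are ranked correctly, which corresponds to an area of 8/9.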
| Results |
|---|
A best threshold for the network was determined by the best combination of sensitivity and specificity, simply defined as the product of the two. Best network thresholds differed for the various types of input data (Table 1).
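The selection of a best threshold as the maximum of sensitivity × specificity can be sketched as follows. The function name and the use of the observed network outputs as candidate thresholds are assumptions for illustration; the example data are invented.

```python
def best_threshold(outputs, labels):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity * specificity.

    labels: 1 = glaucoma, 0 = normal. An output larger than the threshold
    is classified as glaucomatous.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    best_product = -1.0
    # ASSUMPTION: candidate thresholds are the observed network outputs.
    for t in sorted(set(outputs)):
        sensitivity = sum(1 for o, y in zip(outputs, labels) if y == 1 and o > t) / n_pos
        specificity = sum(1 for o, y in zip(outputs, labels) if y == 0 and o <= t) / n_neg
        product = sensitivity * specificity
        if product > best_product:
            best_product = product
            best = (t, sensitivity, specificity)
    return best


# Invented, perfectly separated example: any threshold from 0.4 up to (not including) 0.6 is optimal.
threshold, sensitivity, specificity = best_threshold(
    [0.9, 0.8, 0.6, 0.4, 0.2, 0.1], [1, 1, 1, 0, 0, 0]
)
```

The product penalizes thresholds that favor one measure at the expense of the other: a threshold with 100% sensitivity and 50% specificity scores 0.50, lower than one with 80% for both, which scores 0.64.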
| Discussion |
|---|
The improved results obtained when field data were entered as Pattern Deviations are probably explained by the reduced influence of cataract on Pattern Deviations. Both Pattern Deviation numerical displays and probability maps were designed to reduce the effect of media opacities. The Pattern Deviation networks misclassified only 2 normal eyes with cataract, whereas 13 were misclassified when Total Deviation was used. The network was designed to identify the absence or presence of glaucomatous visual field loss. Thus, we included subjects with cataract in the normal group and patients with concomitant cataract and glaucoma in the glaucoma group. We used this approach because cataract frequently occurs in the age groups where glaucoma is most prevalent.
The normal fields obtained in healthy subjects without cataract were randomly selected from a larger multicenter database used for calculation of Statpac normal values and normal limits for SITA fields. We do not believe that this has biased our results. A large database including data from multiple centers is probably more representative of a normal population than a smaller sample collected at one center only. We did not use the full database; 66% of the records were randomly selected for the purpose of this study. We also included normal fields of patients with media opacities in our set of normal fields. The results, as presented in ROC curves, depended considerably more on the network output than on the Statpac normal limits. Further, our purpose was to compare different inputs derived from the same normal and pathologic fields, and the selection of the normal data would not be expected to bias that comparison, as its effects would be equal for all five types of input.
The five different ANNs correctly classified most fields; but, as expected, normal eyes with substantial cataract were more often classified correctly by the two Pattern Deviation-based ANNs than by the Total Deviation and unprocessed threshold ANNs (Fig. 4). In fields with severe damage, Pattern Deviation-based ANNs did not perform as well as ANNs trained with Total Deviation and threshold sensitivities. This was also anticipated, as the Pattern Deviation concept cannot presently be successfully used in end-stage fields.27 28
Our results suggest that the ability of artificial neural networks to classify visual fields can be further improved if refined input data based on Pattern Deviations are used. Such input data resulted in higher sensitivity and specificity than did raw threshold sensitivity values, probably because of the former's ability to separate field loss caused by glaucoma from that caused by cataract. Further studies including independent visual field data not used for training of the networks are needed to evaluate a more general applicability of ANNs for classification of visual field test results. Neural networks and other machine classifiers seem to have great potential to become useful clinical tools in the diagnosis of glaucomatous visual field loss, and it may be of value to study the performance of a range of types of data inputs with different machine classifiers.
| Acknowledgements |
|---|
| Footnotes |
|---|
Submitted for publication February 10, 2005; revised April 14 and May 12, 2005; accepted July 1, 2005.
Disclosure: B. Bengtsson, None; D. Bizios, None; A. Heijl, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Boel Bengtsson, Department of Ophthalmology, Malmö University Hospital, Lund University, SE-205 02 Malmö, Sweden; boel.bengtsson{at}oftal.mas.lu.se.
| References |
|---|