From the 1Department of Ophthalmology, University of Florence, Florence, Italy; the 2Department of Ophthalmology, University of Udine, Udine, Italy; and the 3Fondazione G. B. Bietti per lo Studio e la Ricerca in Oftalmologia–IRCCS (Istituto Ricerca e Cura a Carattere Scientifico), Rome, Italy.
| Abstract |
|---|
METHODS. Medline and Embase were searched electronically and six major ophthalmic journals from 1998 to 2006 were hand searched. Two reviewers independently assessed trial searches, studied quality with the QUADAS (Quality Assessment of Diagnostic Accuracy Studies) checklist, and extracted data. The target disease was clinically significant macular edema (CSME) according to Early Treatment of Diabetic Retinopathy Study (ETDRS) criteria. A bivariate model was used to obtain summary estimates of sensitivity and specificity and fit a summary receiver operating characteristic (ROC) curve.
RESULTS. Fifteen studies were considered eligible. These studies were of good quality for most items of the QUADAS checklist, but most studies did not report masking of examiners and did not describe how withdrawals and undetermined results were treated. Seven studies included healthy control subjects, which could have artificially enhanced OCT diagnostic performance. All but one study included both eyes of the patients without taking into account the within-subject correlation in statistical analyses. Sensitivity and specificity data could be extracted from only 6 of 15 studies, because appropriate cross tabulations of index and reference tests were not reported by the others. In five of these studies, central retinal thickness cutoffs between 230 and 300 µm were adopted to define abnormal OCT results and considered the central type of CSME only, whereas in one study a complex algorithm accounting for extrafoveal CSME was used. The design of one study was case–control and was excluded from the meta-analysis. The expected operating point on the summary ROC, a pooled estimate of all studies, corresponded to a sensitivity of 0.79 (95% CI: 0.71–0.86), a specificity of 0.88 (95% CI: 0.80–0.93), a positive likelihood ratio of 6.5 (95% CI: 4.0–10.7), and a negative likelihood ratio of 0.24 (95% CI: 0.17–0.32). These values suggest a good overall performance of OCT for diagnosing CSME.
CONCLUSIONS. OCT performs well compared with fundus stereophotography or biomicroscopy to diagnose diabetic macular edema. The quality of reporting of such studies should be improved, and authors should present cross tabulations of index and reference test results. Data adjusted for within-subject correlation should also be provided, although this issue represents a challenge for systematic reviewers.
Approximately 25% of people with diabetes have at least some form of diabetic retinopathy, and the incidence increases with the duration of the diabetes. At 10 years, the prevalence of retinopathy in diabetic patients is 7%; after 25 years, it is more than 90%.3 In developed countries, diabetic eye disease represents the leading cause of blindness in adults under 75 years of age.4 The prevalence of diabetic macular edema also increases with the duration of diabetes: it is 5% within the first 5 years after diagnosis and 15% at 15 years.3
According to the Diabetic Retinopathy Study Group, the risk of severe visual loss at 2 years was 3.2% for eyes with nonproliferative diabetic retinopathy.5 The presence of clinically significant macular edema (CSME) increases the risk of moderate visual loss to approximately 30% to 50%, depending on the level of baseline visual acuity.6
Diabetic macular edema was defined on the basis of stereoscopic fundus photography in ETDRS studies.7 This technique is complicated and difficult to use in a clinical setting and was replaced with contact fundus biomicroscopy, which was found to be in close agreement with stereophotography, particularly for CSME.8 Noncontact fundus biomicroscopy is more commonly used, since sophisticated fundus lenses have been proposed for binocular fundus observation during the past two decades, yet it has been shown to be slightly less sensitive than contact fundus biomicroscopy.9
Optical coherence tomography (OCT) has gained increasing popularity as an objective tool to measure retinal thickness and other aspects associated with macular edema.10 11 12 Standard OCT assessment of diabetic macular edema has been adopted in multicenter trials in patients with diabetic retinopathy by the Diabetic Retinopathy Clinical Research network (DRCR.net).13 An advantage of using OCT is its quantitative assessment, rather than the qualitative evaluation performed with photography or biomicroscopy. In fact, less than optimal diagnostic performances can be expected in a clinical practice when such qualitative methods are used by untrained physicians. It is worth summarizing the large amount of literature published on this topic in a systematic review to investigate whether OCT may perform as well as fundus photography or its surrogates to diagnose CSME, allowing it to become an objective and quantitative alternative to this gold standard.
| Methods |
|---|
A secondary purpose of this review was to investigate potential sources of heterogeneity of the estimates of sensitivity and specificity among diagnostic studies.
Definitions and Inclusion Criteria
Diabetes was defined as taking any glucose-lowering medication. The ETDRS definition of CSME was adopted as the definition of the target disease in this review, because it represents the main indication of focal or grid laser photocoagulation.7 The most common type of CSME is central diabetic macular edema (CDME), defined as retinal thickening within 500 µm of the center of the macula or, alternatively, hard exudates within 500 µm from the center of the macula with thickening of the adjacent retina. The noncentral type of CSME is less common and is defined as a zone of retinal thickening, 1 disc area or larger, any portion of which is located within 1 disc diameter from the center of the macula.
The reference tests considered as a valid gold standard for this review were stereoscopic fundus photography and contact lens or noncontact lens biomicroscopy of the fundus. The index test was OCT, either low-resolution (OCT 2000, the second-generation model) or high-resolution (Stratus OCT, the third-generation model). Studies were included if these definitions of target disease, reference test, and index test were met.
Search Strategy
Since predefined searches may not retrieve all studies in diagnostic reviews,14 we searched Medline (PubMed 1966–September 2006) and Embase (embase.com 2002–October 2006) using the strategy displayed in Table 1. We also hand searched the indexes of the following journals from 1998 to 2006: Ophthalmology, Archives of Ophthalmology, American Journal of Ophthalmology, Investigative Ophthalmology and Visual Science, British Journal of Ophthalmology, and Retina. Finally, we hand searched the references of the articles obtained in full.
The methodologic quality of the included studies was assessed independently by two reviewers (FM, AFD) using the QUADAS (Quality Assessment of Diagnostic Accuracy Studies) checklist.15 16 Disagreement on study quality was resolved by a third author (GV).
Assessment of Heterogeneity, Subgroup Analyses, and Sensitivity Analyses
We planned to evaluate the following sources of heterogeneity if data were available: clinical setting (primary ophthalmic care versus retina specialist), diabetic retinopathy severity, OCT threshold or algorithm used, type of OCT, and type of gold standard. A further source of heterogeneity assessed in this review was the type of study, especially cohort studies versus case–control studies. It is well known that the latter can yield more optimistic estimates of diagnostic performance because of the sharp separation of the measurements of diseased patients from those of healthy control subjects.17 18
Because the ETDRS definition of CSME applies to all types of patients with diabetic retinopathy, we did not exclude specific subtypes of patients in this review. Finally, we did not take into account the role of the biomicroscopic finding of a thickened hyaloid associated with diabetic macular edema on the diagnostic performance of OCT.
Sensitivity analyses were planned with the exclusion of studies of lower methodological quality and the exclusion of case–control studies.
Data Analyses
Descriptive statistics and graphs were produced using MetaDiSc, software presented in an open-source, peer-reviewed, methodologic journal19 which can be downloaded free at http://www.hrc.es/investigacion/metadisc_en.htm. Forest plots of sensitivity, specificity, positive likelihood ratios, and negative likelihood ratios were presented for descriptive purposes (see the Glossary). To pool diagnostic odds ratios (DORs, see the Glossary) we planned to use fixed effects if three or fewer articles were found and random effects if more than three were found. Heterogeneity was assessed using the I² test. Sources of DOR heterogeneity were investigated using multilevel logistic regression as suggested by Siadaty et al.20
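As an illustration of the heterogeneity assessment described above, the following minimal sketch computes the I² statistic (the percentage of variability in the study estimates attributable to between-study heterogeneity) from log DORs. It is not the MetaDiSc implementation: the 2×2 counts are hypothetical, the function names are invented for this example, and adding 0.5 to every cell is an assumed continuity correction.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Return the log diagnostic odds ratio and its approximate variance.

    Assumption: 0.5 is added to every cell so that zero cells do not
    break the logarithm; other conventions exist.
    """
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def i_squared(effects, variances):
    """I-squared (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical (TP, FP, FN, TN) counts, NOT data from the included studies.
hypothetical_studies = [(40, 5, 10, 45), (30, 4, 9, 50), (55, 8, 12, 60)]

pairs = [log_dor_and_variance(*counts) for counts in hypothetical_studies]
effects = [p[0] for p in pairs]
variances = [p[1] for p in pairs]
print(f"I-squared = {i_squared(effects, variances):.1f}%")
```

Values of I² near or above 50% are conventionally read as suggesting heterogeneity, which is the threshold this review refers to in its Results.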
The primary analysis in this review was the pooling of pairs of sensitivity and specificity in a random-effects bivariate model, as proposed by Reitsma et al.,21 obtaining a heterogeneous summary receiver operating characteristic (ROC) curve. The bivariate model uses a random effects approach in the estimation of summary estimates of sensitivity and specificity and their corresponding 95% CIs. Because the bivariate model estimates the strength and the shape of the correlation between sensitivity and specificity, we can either draw a 95% confidence ellipse around their mean values or draw a 95% prediction ellipse for individual values of sensitivity and specificity.21 This model overcomes the statistical shortcomings of Moses summary ROC approach22 and may be a simpler alternative to hierarchical ROC models,23 which cannot be fit with many common statistical packages.
Finally, we present the results of a clinical decision-making model24 that helps the clinician develop a rational, quantitative approach to the use of OCT for diagnosing CSME.
Analyses were conducted with commercial statistical software (Stata ver. 9.2; StataCorp, College Station, TX).
| Results |
|---|
The type of gold standard varied and included stereophotography, contact or noncontact biomicroscopy, and a combination of these methods (Table 3).
Criteria Used to Define Positive versus Negative OCT Results
Data based on different cutoffs of retinal thickness at the central point were extracted for four studies: 300 µm in Brown,44 250 µm in Hee et al.12 and Gaucher et al.,41 and 265 µm in Browning et al.9 Goebel and Franke40 adopted a cutoff of 230 µm of retinal thickness in the central retinal subfield, using the automated macular thickness map protocol built into the OCT software. All of these studies presented data on the central type of CSME—that is, CDME. Sadda et al.39 used the Macular Grid 5 scanning protocol and a complex diagnostic algorithm that accounted for a separate recognition of CDME and the noncentral type of CSME.
Hee et al.,12 Brown et al.,44 and Gaucher et al.41 reported data for more than one cutoff value. We extracted data based on the cutoff points mentioned, since they provided a good number of eyes in each cell of the cross-tabulation.
Methodological Quality of Included Studies: QUADAS Checklist
The methodological quality of the included studies is shown in Table 4, in which the quality issue of each QUADAS item is presented. Items related to sample recruitment and representativeness, inclusion criteria, patient flow, and index and reference test execution were of good quality for all or most studies. However, it was unclear in most studies whether the gold standard (stereophotography or biomicroscopy) was interpreted independently of OCT (index test) and vice versa, with three studies stating that OCT interpretation was unmasked to gold standard results. We think that it is unlikely that this is an important source of bias for OCT interpretation, given that OCT provides an objective quantitative estimate of retinal thickness and the assessment procedure is sufficiently standardized. However, the gold-standard evaluation, which is a subjective one, could have been affected by OCT results if the examiners were unmasked. More important, the report and handling of uninterpretable test results, such as low-quality OCT examination with unreliable thickness estimate, was unclear or poorly reported in most studies.
Another issue was a mixed design in seven studies, which enrolled both patients with diabetic retinopathy (with or without CSME) and healthy control subjects. This is expected to enhance the accuracy of OCT, but it does not correspond to its use in clinical practice, during which ophthalmologists want to rule in or rule out the presence of CSME in cases of persons with diabetic retinopathy who are at risk of having this macular complication. As already stated, research has shown that the risk of overestimating the performance of the index test is high in case–control studies.17 18 The study by Gaucher et al.41 was classified as having a case–control design, since patients were enrolled as two groups with either presence or absence of CSME, the latter including a majority of diabetic subjects with no retinopathy.
We considered an additional statistical issue that is specific to ophthalmology (i.e., how the correlation between eyes of the same patient is treated). All but one study (Goebel and Kretzchmar-Gross45) included both eyes of some patients, but no study presented estimates that took into account the within-patient correlation.
Finally, sample size calculations were not reported in any study, a well-known issue in diagnostic research.52
Results of Analyses
A cross tabulation of dichotomous results with the index and reference test could be obtained for six studies. Raw data and additional study features are available in Table 5.
DORs, Meta-analysis, and Assessment of Heterogeneity.
We initially assessed DOR heterogeneity among the six studies providing data for the meta-analysis. Heterogeneity was high if all six studies were included, since the I² value was 49.5%, which is close to the threshold of 50% and is suggestive of statistical heterogeneity,53 even when the probability only approaches statistical significance (such as in this case; P = 0.078), due to the small number of included studies.
After exclusion of the case–control study by Gaucher et al.,41 as explained in the Methods section, there was no DOR heterogeneity (I² = 0.0%). Therefore, we excluded this study from further analyses.
The pooled DOR estimate using random effects was 27.1 (95% CI: 17.4–42.2), which is consistent with a good performance of OCT. Of note, the results would not have changed had we incorporated data from Ozdek et al.43 in the meta-analysis, despite their using diabetic macular edema and not CSME for disease definition, since its estimates of sensitivity and specificity were identical with those in Hee et al.12
Because only five studies provided data, and no heterogeneity was found, we could not evaluate sources of heterogeneity consistently. We compared only the two studies12 39 that used the OCT 2000 with the three studies that used the Stratus OCT.9 40 45 The same two studies also included healthy control subjects. Their diagnostic performance did not differ from that of the other three studies in a multilevel logistic regression model.24
There was also no power to investigate the asymmetry of the summary ROC curve (i.e., the fact that the performance of OCT—the DOR—may be better at high levels of specificity rather than at high levels of sensitivity).54 55
Assessment of Threshold Effect of Retinal Thickness Cutoff Values.
A threshold effect was suggested by the typical shoulder arm displayed by the studies in the ROC plane (Fig. 2) and supported by the good correlation (0.61) of the logit-transformed sensitivity and specificity,21 although it did not reach statistical significance (P = 0.285), due to the small number of studies included.
Sadda et al.39 did not use thickness and yielded the highest DOR among all by adopting a complex algorithm accounting for the noncentral type of CSME.
Summary ROC Curve.
Figure 2 presents the summary ROC curve based on a bivariate model.21 If there is no heterogeneity between the studies, such as in our meta-analysis, the best summary estimate of test performance will be a single point on the summary ROC curve (the operating point).54 The expected operating point on the summary ROC corresponded to a sensitivity of 0.79 (95% CI: 0.71–0.86), a specificity of 0.88 (95% CI: 0.80–0.93), a positive likelihood ratio of 6.5 (95% CI: 4.0–10.7) and a negative likelihood ratio of 0.24 (95% CI: 0.17–0.32). The pooled DOR was 27.7 (95% CI: 17.0–45.3) with this method.
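The relationship between the summary estimates above can be checked with a short calculation. The sketch below derives likelihood ratios and the DOR directly from the rounded summary sensitivity and specificity using the standard textbook definitions. It is not the bivariate model used in the review, so the results only approximate the reported values (roughly 6.6, 0.24, and 27.6 versus the reported 6.5, 0.24, and 27.7); the function name is an assumption made for this example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from a single sensitivity/specificity pair."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)   # how much a positive test raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative test lowers the odds
    dor = lr_pos / lr_neg                      # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Rounded summary operating point reported in the text.
lr_pos, lr_neg, dor = likelihood_ratios(0.79, 0.88)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

The small discrepancies arise because the inputs are rounded to two decimals and because the review estimated the likelihood ratios and DOR within the model rather than from the point estimates alone.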
A Clinical Decision Model: Applying the Results to Clinical Practice
A decision model can help clinicians place the diagnostic performance of a test in a realistic scenario by integrating its diagnostic power in a question about treatment.24 The current clinical question would be whether a positive or negative OCT result changes the decision to use or not to use photocoagulation on a patient in whom CDME is suspected.
In a diagnostic clinical decision model, we first estimate the treatment threshold24 based on the available literature (i.e., the level of suspicion or probability of CDME, above which the benefits of applying focal or grid photocoagulation outweigh the harm). We computed the treatment threshold by using data from the Early Treatment of Diabetic Retinopathy Study (ETDRS),7 in which grid/focal laser photocoagulation reduced the risk of losing 3 or more lines of visual acuity from approximately 30% to 15% at 3 years. This 15% difference represents the benefit. An estimate of the cost (i.e., the probability of harm when applying treatment to patients, including those incorrectly diagnosed) was not available in the ETDRS or other studies. We arbitrarily used the value of 5% to weigh the loss of the perimacular visual field, the patient's discomfort, and the rare risk of accidental foveal photocoagulation or occurrence of choroidal neovascularization. This method would result in a cost/benefit ratio of 0.33 and a treatment threshold at 25% suspicion of CDME, which may be clinically reasonable for such a safe treatment.
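The arithmetic behind these figures can be sketched as follows. The formula used here, threshold = harm / (harm + benefit), is the usual threshold expression from clinical decision analysis and is assumed rather than quoted from the review; it does reproduce the cost/benefit ratio of 0.33 and the 25% threshold stated above. Function and variable names are illustrative only.

```python
def treatment_threshold(harm, benefit):
    """Probability of disease above which treating is preferable to not treating.

    Assumed formula: harm / (harm + benefit), with both quantities
    expressed as absolute probabilities.
    """
    if harm < 0 or benefit <= 0:
        raise ValueError("harm must be non-negative and benefit must be positive")
    return harm / (harm + benefit)


# Benefit: risk of losing 3 or more lines falls from about 30% to 15% (ETDRS).
benefit = 0.30 - 0.15
# Harm: the arbitrary 5% value chosen in the text.
harm = 0.05

print(f"cost/benefit ratio  = {harm / benefit:.2f}")
print(f"treatment threshold = {treatment_threshold(harm, benefit):.0%}")
```

Because the 5% harm is an arbitrary choice, it can be worth rerunning the calculation with other plausible values to see how strongly the threshold depends on it.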
We then computed the range of pretest probability within which, using likelihood ratios, a positive or a negative test result can change the treatment decision by, respectively, increasing the probability above the treatment threshold or decreasing it below it. The reader can refer to classic evidence-based medicine books to obtain graphical methods or simple formulas for this purpose.56 57 Using the likelihood ratios derived from our meta-analysis, the range of usefulness of OCT for diagnosing CDME would be from 5% to 58%, which covers levels of low to medium pretest probability or suspicion of disease. The fact that this range mostly coincides with the prevalence of CDME—the pretest probability—in the included studies is reassuring both for the possibility of generalizing the results and for the fact that OCT was indeed used by those authors for patients with this level of diagnostic uncertainty. Indeed, it would be incorrect to use OCT when deciding on treatment if very little uncertainty exists, such as when CDME is almost definitely present or absent with biomicroscopy.
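The range of usefulness quoted above can be reproduced with the odds form of Bayes' theorem. The sketch below assumes a treatment threshold of 25% and the pooled likelihood ratios of 6.5 and 0.24; the helper names are invented for this example and the approach is the generic textbook calculation, not necessarily the exact procedure followed in the review.

```python
def to_odds(probability):
    return probability / (1 - probability)


def to_probability(odds):
    return odds / (1 + odds)


def useful_pretest_range(threshold, lr_pos, lr_neg):
    """Pretest probabilities between which the test result can change the decision.

    Lower bound: the pretest probability that a positive result lifts exactly
    to the treatment threshold. Upper bound: the pretest probability that a
    negative result lowers exactly to the threshold.
    """
    threshold_odds = to_odds(threshold)
    lower = to_probability(threshold_odds / lr_pos)
    upper = to_probability(threshold_odds / lr_neg)
    return lower, upper


lower, upper = useful_pretest_range(threshold=0.25, lr_pos=6.5, lr_neg=0.24)
print(f"OCT can change the decision for pretest probabilities "
      f"between {lower:.0%} and {upper:.0%}")
```

Below the lower bound even a positive OCT result leaves the probability of CDME under the threshold, and above the upper bound even a negative result leaves it over the threshold, so in both cases the test cannot alter the treatment decision.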
| Discussion |
|---|
We found a relatively large number of studies (n = 15) that assessed the performance of OCT for diagnosing CSME in diabetic patients, comparing it with well-established gold standard tests such as stereoscopic fundus photography or fundus biomicroscopy. However, few studies (n = 6) presented sufficient data to extract sensitivity and specificity estimates based on one or more cutoffs of OCT retinal thickness at the central point or central area or using other OCT criteria. In three studies, mean retinal thickness and its SD were presented, and in three more studies these values were given for subgroups of retinopathy severity or healthy control subjects. Such studies cannot be used for meta-analysis.
The results of this systematic review suggest that OCT is a useful tool for diagnosing suspected CDME. The central pooled estimate of our meta-analysis yielded positive and negative likelihood ratios of 6.5 and 0.24, respectively. These values are close to the corresponding values of 5 and 0.2, which have been regarded as convincing evidence, but they do not reach the levels of 10 and 0.1, which have been regarded as strong evidence.54 60
Decision Criteria to Use with OCT
We found that the five studies used to obtain the summary ROC curve provided results that suggested a threshold effect. Unfortunately, not all studies used the same OCT diagnostic criteria, nor did they present data for more than one retinal thickness cutoff. Furthermore, a threshold effect can also be due to patient characteristics, rather than to the cutoff value of macular thickness in each study (implicit threshold effect).54
Values of 250 to 265 µm were adopted for three of five studies in our meta-analysis, and a thickness of 250 µm has also been used as an OCT inclusion criterion in most studies on CSME by the DRCR.net.13 In our review, the study adopting a cutoff of 300 µm obtained the highest specificity. Evidence-Based Medicine57 advises the use of the acronyms SpPIn and SnNOut to remind readers that a valid test should be calibrated at high specificity (positive result) when the purpose is to rule in disease and at high sensitivity (negative result) to rule out disease.55 56 57 We suggest that, in cases with clinical uncertainty about the existence of CDME, macular photocoagulation should not be used if central retinal thickness is below 250 µm, whereas a value of 300 µm or more strongly indicates the need for treatment. The pretest probability of CDME should always be taken into account (e.g., one should not delay photocoagulation if a thickness slightly below 250 µm is found in a patient with very high clinical suspicion of CSME), especially if the noncentral type has been identified. In fact, it may be thought that using central retinal thickness results in a poor performance in detecting the second ETDRS definition of CSME, which includes a large patch of retinal edema that does not involve the central part of the macula. The algorithm used by Sadda et al.39 was of interest and deserves further study. The other studies in our meta-analysis did not report data on OCT performance for subgroups of types of CSME, since they were based on data on the central type only (i.e., CDME). In the discussion, the authors of one study44 observed that it is not known whether results would have been similar if macular edema not involving the fovea had been evaluated. They also stated that other studies11 45 demonstrate a strong correlation between extrafoveal and foveal thickness measurements, suggesting that results would be similar.
Validity and Reproducibility of OCT and the Gold Standard
Validity and reproducibility are key issues in therapeutic studies, as in diagnostic studies. A study is valid insofar as the results represent an unbiased estimate of the underlying truth. Reproducibility is the ability of a measure to yield the same result when reapplied to stable patients.54 In diagnostic accuracy studies, an imperfect gold standard may limit the performance of the index test. Fundus biomicroscopy may be less reliable when performed by ophthalmologists with diverse training, compared with the high reproducibility of OCT.58 59 The coefficient of reproducibility of OCT in eyes with diabetic macular edema was shown to be less than 10% with the fast macular thickness mapping protocol of Stratus OCT.61 Clinicians should consider the reliability indexes provided by the Fast Retinal Thickness Map of the Stratus OCT software and average data from semiautomatic measurements in cases with low reliability. The strength of the signal and the SD displayed in the OCT map can help for this purpose.
Risk of Bias in This Review
A few issues regarding the quality of the studies included in this review may be of concern. Although the methodological quality of the eligible studies was good overall for most QUADAS items, the masking of index and reference test was not reported in most studies. Uninterpretable results and withdrawals were also poorly or not reported (four of five studies with data used for meta-analysis). Nonetheless, the number of such patients was negligible, and so this is not expected to represent an important source of bias.
The fact that some studies included healthy control subjects was also of concern (two of five studies with data used for meta-analysis). Their inclusion is expected to enhance the accuracy of OCT, but it does not reproduce its clinical use, in which ophthalmologists want to rule in or rule out CSME in patients with diabetic retinopathy who are at risk of having this macular complication.
A specific quality issue not covered by the QUADAS checklist is the lack of statistical adjustment for the correlation between the eyes of the same patient in all studies. This deficit is expected to yield overly optimistic standard errors (i.e., too small) and therefore confidence intervals that are too narrow. To the best of our knowledge, methods for obtaining sensitivity and specificity values or ROC curves that take into account correlated data are available,62 but how these should be treated by a systematic reviewer has not been standardized in specific guidelines.
Implications for Practice
This review suggests that OCT can be used to diagnose CSME, particularly its central type or CDME, and to decide on laser photocoagulation in patients with intermediate suspicion of disease. The strength of our conclusion is limited by the fact that data could be extracted from only a fraction of the published literature because of limitations in the reporting of the included papers. Furthermore, the precision of our estimate is probably worse than that reported, because none of the included studies took into account the correlation between eyes of the same patient.
Implications for Research
Several suggestions for further research can be drawn from this review. Most of them refer to the quality of reporting and highlight the need to disseminate the STARD (Standards for Reporting of Diagnostic Accuracy) initiative63 among authors of diagnostic studies in ophthalmology.
The inclusion criteria should be made explicit after framing the clinical question of interest. Including patients with any level of diabetic macular edema was common among the included studies and seems a sensible choice when OCT is used as the last objective diagnostic assessment before deciding on the indication to administer photocoagulation. The inclusion of healthy control subjects or diabetics without retinopathy should be avoided. A case–control design should also be avoided, as previously suggested.17 18
The availability of visual acuity and other clinical data to graders of fundus photographs should be disclosed by the authors.64 Although it may be intuitively reasonable to allow clinical information in a diagnostic study, in fact this information was shown to either enhance or impair the interpretation of a test.64
The reporting of results should include sensitivity and specificity data for more than one cutoff value of OCT retinal thickness, and an ROC curve should be plotted. Furthermore, we suggest that a cross tabulation of the data also be displayed with 25-µm intervals for thicknesses between 200 and 350 µm. The diagnostic performance of OCT for detecting the central and the noncentral types of CSME should be reported separately in subgroup analyses. More complex but less common OCT decision algorithms should also be reported separately and compared to the performance of the thickness criterion.
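The kind of multi-cutoff reporting suggested above could be produced along the following lines. This is only a sketch: the thickness values and reference-standard labels are hypothetical, the function name is invented, and treating a thickness equal to or greater than the cutoff as a positive OCT result is an assumption.

```python
def accuracy_by_cutoff(thickness_um, has_csme, cutoffs):
    """Cross tabulate OCT results against the reference standard at each cutoff.

    Returns one tuple per cutoff: (cutoff, TP, FP, FN, TN, sensitivity, specificity).
    Assumption: thickness >= cutoff counts as a positive OCT result.
    """
    rows = []
    for cutoff in cutoffs:
        tp = sum(1 for t, d in zip(thickness_um, has_csme) if d and t >= cutoff)
        fn = sum(1 for t, d in zip(thickness_um, has_csme) if d and t < cutoff)
        fp = sum(1 for t, d in zip(thickness_um, has_csme) if not d and t >= cutoff)
        tn = sum(1 for t, d in zip(thickness_um, has_csme) if not d and t < cutoff)
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        rows.append((cutoff, tp, fp, fn, tn, sensitivity, specificity))
    return rows


# Hypothetical central retinal thickness (µm) and reference-standard diagnosis.
thickness = [210, 240, 255, 270, 310, 330, 220, 260, 290, 350]
csme = [False, False, False, True, True, True, False, False, True, True]

# 25-µm steps between 200 and 350 µm, as suggested in the text.
for cutoff, tp, fp, fn, tn, sens, spec in accuracy_by_cutoff(thickness, csme, range(200, 351, 25)):
    print(f"{cutoff} µm: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Plotting sensitivity against 1 − specificity for these rows gives the study-level ROC curve, and publishing the underlying counts lets reviewers pool studies at a common cutoff.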
Finally, the correlation between eyes of the same patient should be taken into account in estimates of sensitivity, specificity, and DOR. However, this process will pose particular problems to systematic reviewers, which could be solved by providing them with individual patient data or making datasets available as additional files on the journal's Web site.
Our review has summarized the performance of OCT in cross-sectional diagnostic accuracy studies adopting the current, although evolving, methodology of diagnostic systematic reviews with the purpose of investigating whether OCT is as valid as stereophotography and biomicroscopy for detecting CSME. However, such diagnostic accuracy studies are only one of the steps in the evaluation of diagnostic technologies, which was categorized into six levels from studies that address technical feasibility (level 1) to those that address societal impact (level 6).65 After diagnostic accuracy has been established with respect to traditional gold standards, longitudinal studies of outcome impact are needed to enable incorporation of a new diagnostic test into prognostic and therapeutic clinical pathways. Similar studies by the DRCR.net will help assess whether OCT can replace fundus biomicroscopy as a new objective and quantitative gold standard for diagnosing diabetic macular edema.13
| Footnotes |
|---|
Submitted for publication December 12, 2006; revised March 4 and May 25, 2007; accepted September 6, 2007.
Disclosure: G. Virgili, None; F. Menchini, None; A.F. Dimastrogiovanni, None; E. Rapizzi, None; U. Menchini, None; F. Bandello, None; R. Gortana Chiodini, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Gianni Virgili, Department of Oto-Neuro-Ophthalmological Surgical Sciences, University of Florence, Italy; Viale Morgagni 85, Florence 50134, Italy; gianni.virgili{at}unifi.it.