1From the National Health and Medical Research Council (NH&MRC) Centre for Clinical Eye Research, Department of Ophthalmology, Flinders Medical Centre and Flinders University, Bedford Park, Australia; and the 2Department of Optometry, University of Bradford, Bradford, United Kingdom.
| Abstract |
|---|
METHODS. The questionnaire was developed and validated using conventional methods and Rasch analysis to assure content validity, repeatability, construct validity, and low respondent burden. Item identification and selection (647 items) were performed with an extensive literature review, professional advice, and lay focus groups. Item reduction used focus groups and data obtained from 161 subjects completing a 90-item pilot questionnaire. Validity and reliability, from data of 128 additional subjects, were assessed using Rasch analysis, intraclass correlation coefficient, and Bland-Altman limits of agreement.
RESULTS. A 28-item CLIQ Questionnaire was developed and shown to have good validity and reliability by Rasch analysis statistics: real person separation, 2.02; model person separation, 2.17; reliability, 0.80; root mean square measurement error, 2.73; mean square ± SD infit, 1.01 ± 0.18; outfit, 1.01 ± 0.19. The items (mean score, 49.8 ± 4.9) were well targeted to the subjects (mean score, 51.2 ± 6.2) with a mean difference of 1.35 (scale range, 0 to 100) units. Test-retest intraclass correlation coefficient (0.86) and coefficient of repeatability (±8.00 units) demonstrated good repeatability.
CONCLUSIONS. Rasch analysis and standard psychometric analyses demonstrated that the 28-item CLIQ Questionnaire is a valid and reliable measure of QoL in contact lens wearers. A scoring algorithm is provided for CLIQ Questionnaire users to convert raw scores into the Rasch analysis-derived linear person measures.
Changes in QoL of patients wearing contact lenses have been reported with conventionally validated questionnaires.10 11 12 13 14 15 16 17 However, several of these questionnaires are restricted to dry eye symptoms10 13 14 and another to psychological issues.15 In addition, the Refractive Status and Vision Profile (RSVP) and National Eye Institute Visual Function Questionnaire (NEI-VFQ) have been shown to be insensitive to QoL matters relevant to people who wear contact lenses.11 12 The National Eye Institute Refractive Error Quality of Life (NEI-RQL) questionnaire, although not developed specifically for contact lens wearers, has been reported to discriminate between different modes of contact lens wear.16 17 In other studies that report QoL issues related to contact lens wearers, informal, nonvalidated questionnaires were used.18 19 20 21 22 23 24
Considerations in selecting a QoL instrument should include its reliability and validity. Many currently available refractive-error-related QoL instruments, including the RSVP, NEI-VFQ, and NEI-RQL, use traditional Likert scoring,25 in which patients' response scores for a selected set of items are summed to derive the overall score. Likert scoring assumes that the value of each item represents equal difficulty and therefore scores them equally. In addition, the ordinal integer response scale used for each item assumes uniform changes between response categories. For example, in a Likert-scaled vision disability instrument such as the Activities of Daily Vision Scale (ADVS),26 a response of "a little difficulty" (score of 4) is used to represent twice the level of ability as "extreme difficulty" (score of 2), which is similarly two times as good as "unable to perform the activity due to vision" (score of 1) for all items. This appears illogical, and Rasch analysis has been used to confirm that specific response-category calibrations are essential for providing a linear scale.27 Similarly, Likert scales assume that all items are of equal difficulty. For example, with the ADVS instrument an answer of "a little difficulty" to the question regarding visual difficulties driving at night scores the same as "a little difficulty" with driving during the day. Again, this assumption is illogical.
Rasch analysis has been used to confirm that subjects report that driving at night is a more difficult task than driving during the day, and Rasch analysis can provide an appropriate weighting for each item.27 Uncorrected, these problems cause discontinuities in the scale and nonlinear measurement, as occurs, for instance, with the NEI-RQL and the RSVP.28 We have developed a refractive-error-related questionnaire using Rasch analysis to overcome these problems and have created a truly linear measure of refractive-error-related QoL: the Quality of Life Impact of Refractive Correction Questionnaire (QIRC).8 29 30 QIRC was developed for spectacle wearers, contact lens wearers, and refractive surgery patients. Although QIRC may be a good instrument to measure QoL for contact lens wearers, we hypothesize that a questionnaire developed specifically for contact lens wearers may need slightly different content to be more sensitive to some issues specific to contact lens wear.
The purpose of this study was to develop and validate a questionnaire, using Rasch analysis, for the measurement of the impact of contact lenses on QoL: The Contact Lens Impact on Quality of Life (CLIQ) questionnaire. The questionnaire was targeted at adults needing refractive correction who did not have other ophthalmic problems, but was confined to the prepresbyopic population. Most contact lens wearers are prepresbyopes,31 and presbyopes are likely to encounter different problems than prepresbyopes, related to the use of multifocal contact lenses, monovision (a contact lens for distance vision in one eye and near vision in the other) or the need for reading glasses in addition to distance vision contact lenses.
| Methods |
|---|
Domain and Item Identification and Selection
Domains and items thought likely to be influenced by refractive correction were collected from six sources: a search of the general QoL literature,35 36 37 38 39 a search of the vision-related QoL literature,40 41 42 including that relating to refractive error correction,4 43 44 45 46 47 retrospective analysis of case records at the University of Bradford Eye Clinic, invited responses from 63 practitioners and allied health workers in the fields of optometry, ophthalmology, contact lens practice and psychology, and focus groups (lay people and professionals in the fields mentioned); 647 items were identified. These could be categorized into domains of well-being (n = 192; of these 108 were associated with psychological well-being and 84 with social well-being), functional vision (n = 176), symptoms (n = 97), convenience issues (n = 85), economic issues (n = 54), cognitive issues (n = 24), and health concerns (n = 19). Question format was kept as regular as possible, but different content areas required different question syntax. Two styles of questions were chosen: severity assessment (e.g., How much difficulty do you have... ?) and incidence (e.g., During the past month, how often have you experienced... ?). A five-category response scale was chosen as it has been shown to be more useful and easier to complete compared with four- and seven-category response scales and a visual analogue scale.48 Suitably spaced response labels were selected from the research literature.49 Each different question structure and response scale was allocated to an Andrich rating scale model in the Rasch analysis.
The original 647 items were reduced to 115 by a professional focus group. In this process, many items were merged for having similar content (e.g., reading small print, reading medicine bottles). Others were discarded for not being relevant to most people (e.g., ability to cross-stitch). These 115 items were formatted into a self-administration questionnaire for further discussion by lay focus groups. They recommended discarding a further 25 items and numerous minor rewordings of instructions and questions to assist comprehension. Advice was also taken from the lay focus groups on the wording of the instructions. The 90 items for the pilot questionnaire were distributed among the domains of convenience issues (n = 20), functional vision (n = 19), well-being (n = 16), symptoms (n = 16), health concerns (n = 8), economic issues (n = 8), and cognitive issues (n = 3).
The 90-Item Pilot Questionnaire
The study was designed so that the final questionnaire would be relevant to the population of the United Kingdom. Thus, it was administered in 15 centers throughout the country, chosen to provide data from rural and urban areas and with a good geographical spread. Subjects were chosen on a consecutive-patient basis under the constraints of time inherent in a commercial practice. The success of recruiting a representative population was established through comparison of demographic information regarding gender, ethnicity, and socioeconomic classification against U.K. national data.50 Inclusion criteria were age between 16 and 35 years (adult prepresbyopic age) and the use of contact lenses. Exclusion criteria were previous ocular surgery, eye disease, neurologic disease, systemic disease, or medication that may alter visual function, and inability to read or understand the questionnaire. Informed consent was obtained from all subjects after the nature of the study had been fully explained. The tenets of the Declaration of Helsinki were observed, and the study gained approval from the university ethics committee. The 90-item pilot questionnaire was completed by self-administration by 161 subjects across optometry and contact lens practice settings. Five questionnaires were discarded due to the absence of demographic data or greater than 33% missing item responses. Rasch analysis was used to identify unusual response patterns. The Rasch model fit statistics infit and outfit mean square were used to monitor the compatibility of the data with the model. Nineteen subjects gave poor Rasch fit statistics (both outfit and infit mean square >1.40), indicating that their responses were very different from those of most subjects, and so the authors reviewed their questionnaires. Twelve of these were retained, as they appeared to provide reliable responses in a pattern different from that of the majority.
The seven questionnaires that were discarded each contained the same category response for (typically) the last three pages of the questionnaire. These pages contained questions with reversed scales (typically a good QoL would score 1, but for some questions a good QoL scored 5), so a respondent who marked 1 for all questions on a page was probably responding without reading the questions and therefore provided unreliable data. This left a final n of 149 (mean age, 26.6 ± 4.7 years). Rasch analysis was then used for item reduction.
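The screening rules described above can be sketched in code. This is only an illustration, not the authors' procedure: the function names are hypothetical, the infit and outfit mean squares are assumed to come from separate Rasch software, and in the study a flagged questionnaire was reviewed by the authors rather than discarded automatically.

```python
from typing import Optional, Sequence

# Thresholds taken from the text.
MAX_MISSING_FRACTION = 0.33  # discard if more than 33% of item responses are missing
MISFIT_THRESHOLD = 1.40      # flag if infit AND outfit mean square both exceed 1.40


def too_many_missing(responses: Sequence[Optional[int]]) -> bool:
    """True when more than 33% of item responses are missing (None)."""
    missing = sum(1 for r in responses if r is None)
    return missing / len(responses) > MAX_MISSING_FRACTION


def flag_for_review(infit_mnsq: float, outfit_mnsq: float) -> bool:
    """True when both person fit statistics exceed the misfit threshold.

    A flag only marks the questionnaire for manual review.
    """
    return infit_mnsq > MISFIT_THRESHOLD and outfit_mnsq > MISFIT_THRESHOLD


def same_answer_throughout(page_responses: Sequence[Optional[int]]) -> bool:
    """True when every answered item on a page carries the same category.

    Hypothetical helper: on pages with reversed scales this pattern suggests
    the respondent may not have read the questions.
    """
    answered = [r for r in page_responses if r is not None]
    return len(answered) > 1 and len(set(answered)) == 1
```

Keeping the three checks separate mirrors the text, in which missing data led to direct exclusion whereas misfit and uniform responding only prompted a review.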
Assessment of the Validity and Reliability of the 28-Item CLIQ
The 28-item CLIQ Questionnaire (Table 1) was again administered in several settings across the United Kingdom. Questions 21 to 28 regarding feelings of well-being were asked in relation to the subject's refractive correction, in that the foreword to the questions included the following text: "We are now interested in the effect that your contact lenses have had on the way you have been feeling. The effect on your feelings may be obvious (e.g., you may feel that you look better in your contact lenses) or it may be indirect (e.g., you may feel more confident wearing contact lenses because you feel that you look better)." One hundred forty-two questionnaires were returned. Seven questionnaires were discarded due to absent demographic data or greater than 33% missing item responses. Rasch outfit statistics identified 21 possible inconsistent responders and, after review, 14 were retained. This left 128 questionnaires. The validity of the CLIQ data was assessed using Rasch analysis.
| Results |
|---|
Figure 1 shows a subject QoL against item difficulty map established by Rasch analysis for the original 90-item CLIQ Questionnaire. Subjects (each # on the left represents three subjects) appear in ascending order of QoL from the bottom of the figure to the top. Items appear on the right represented by item numbers, with a decimal representing the response scale boundary. With a five-category scale, there are four boundaries between categories, so that each item is represented in the figure by four points (Fig. 2). Boundaries occur at the point on the scale where the response most likely to be selected changes from one category to the next, appearing in ascending order of severity of impact of contact lenses on QoL, from the bottom of the figure to the top. In Figure 1, both subjects and items appear along the same scale, which is a linear transformation of the Rasch logit scale to fit a 0 to 100 scale (Winsteps Umean = 50.39, Uscale = 6.76). For this sample, many items, especially visual functioning items (items 1 to 19), have little impact on the QoL of the contact lens wearers. This result is shown as the subjects located higher and item numbers located lower in the Rasch map and illustrates inadequate targeting of item severity to subject QoL issues related to contact lens refractive error correction. If the items were well targeted to the subjects, the means of the two distributions, denoted in Figure 1 by M, would be close to each other. We attempted to improve targeting of items to subjects through response scale reduction and item reduction.
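The linear transformation from logits to the 0 to 100 scale can be sketched as follows. This assumes the usual meaning of the two Winsteps settings quoted above (Uscale as the number of rescaled units per logit, Umean as the rescaled value given to the logit origin); the function names are hypothetical.

```python
def logit_to_user_scale(logit: float, umean: float, uscale: float) -> float:
    """Rescale a Rasch measure in logits to user units: umean + uscale * logit."""
    return umean + uscale * logit


def user_scale_to_logit(measure: float, umean: float, uscale: float) -> float:
    """Inverse of logit_to_user_scale."""
    return (measure - umean) / uscale


# Settings reported in the text for the 90-item pilot analysis.
PILOT_UMEAN = 50.39
PILOT_USCALE = 6.76

# A subject located one logit above the origin of the logit scale.
rescaled = logit_to_user_scale(1.0, PILOT_UMEAN, PILOT_USCALE)
```

Because the transformation is linear, distances between subjects and items are preserved up to the constant factor Uscale, so the rescaled map keeps the interval properties of the logit scale.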
Although the combined-response category improved the difference between item and subject mean values, there were still several items providing relatively little information. Rasch analysis was then used to remove poorly fitting items from the questionnaire, which were removed one at a time, as item removal changes fit statistics. This improved the fit of some items that initially had high infit-outfit values and reduced the mean difference between item difficulty and subject QoL. The criteria used for item removal were8 27:
The item with the highest number of candidate criteria, ordered by priority, was removed first. If removal of an item with high or low infit-outfit values considerably decreased person separation (<2.0), that item was retained.54 Person separation is an indicator of the ability (precision) of the instrument to differentiate between different persons' QoL. Person separation is expressed as the ratio of the adjusted SD to the root mean square error. A person separation of 2.0 or more is indicative that subjects are significantly different in QoL across the measurement distribution.53 This iterative process finally resulted in a 28-item questionnaire with a real person separation of 2.04 and mean difference of 2.34 units (Fig. 3). Reducing the number of items further led to decreased person separation. This again was fit to a 0 to 100 scale (Winsteps Umean = 48.66, Uscale = 9.01).
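The definition of person separation given above can be written as a short calculation. The sketch below adds two standard Rasch relations that the text does not spell out and that are therefore assumptions here: the adjusted SD is taken as the observed SD with the error variance removed, and reliability is obtained from separation G as G²/(1 + G²). Function names are hypothetical.

```python
import math


def adjusted_sd(observed_sd: float, rmse: float) -> float:
    """Observed SD of person measures with measurement error variance removed."""
    return math.sqrt(max(observed_sd ** 2 - rmse ** 2, 0.0))


def person_separation(adj_sd: float, rmse: float) -> float:
    """Ratio of the adjusted SD to the root mean square error, as defined in the text."""
    return adj_sd / rmse


def separation_to_reliability(separation: float) -> float:
    """Standard Rasch relation between separation G and reliability: G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


def meets_retention_threshold(separation: float, threshold: float = 2.0) -> bool:
    """Threshold of 2.0 from the text: item removal should not push separation below it."""
    return separation >= threshold
```

Under this relation, the separation values reported for the validation sample (2.02 real, 2.17 model) correspond to reliabilities of about 0.80 and 0.82, which agrees with the reported reliability figures.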
Rasch analysis provided valid model statistics (real person separation, 2.02; real reliability, 0.80; model person separation, 2.17; model reliability, 0.82; root mean square measurement error, 2.73; mean square ± SD infit, 1.01 ± 0.18; outfit, 1.01 ± 0.19) and fit statistics showed that all items fit within a range of infit from 0.62 to 1.36 and of outfit from 0.63 to 1.37. Thus, the variance within items extends from 38% (for infit) and 37% (for outfit) less than the expected to 36% (for infit) and 37% (for outfit) more than the expected. These figures are larger than those in Table 1, as these data are from another sample of subjects. These findings suggest all items measure a unitary concept, without redundancy. A sample (n = 45) of subjects was retested, and reproducibility was found to be good: intraclass correlation coefficient = 0.86 and coefficient of repeatability = ±8.00 units.55 56
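For readers who want to reproduce the repeatability statistic on their own test-retest data, a minimal sketch follows. It assumes the common Bland-Altman definition of the coefficient of repeatability as 1.96 times the SD of the test-retest differences; the text does not state the exact formula used, and the function names and sample scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def _differences(test: Sequence[float], retest: Sequence[float]) -> list:
    """Paired retest-minus-test differences, with basic input checks."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long series with at least two subjects")
    return [second - first for first, second in zip(test, retest)]


def coefficient_of_repeatability(test: Sequence[float], retest: Sequence[float]) -> float:
    """1.96 x SD of the paired differences (assumed definition)."""
    return 1.96 * stdev(_differences(test, retest))


def limits_of_agreement(test: Sequence[float], retest: Sequence[float]) -> Tuple[float, float]:
    """Bland-Altman 95% limits of agreement: mean difference +/- 1.96 x SD."""
    diffs = _differences(test, retest)
    half_width = 1.96 * stdev(diffs)
    return mean(diffs) - half_width, mean(diffs) + half_width


# Hypothetical person measures on the 0 to 100 scale for four subjects.
first_visit = [50.0, 52.0, 48.0, 55.0]
second_visit = [51.0, 50.0, 49.0, 57.0]
cor = coefficient_of_repeatability(first_visit, second_visit)
```

The coefficient is expressed in the same units as the person measures, which is why the reported value of ±8.00 can be read directly against the 0 to 100 scale.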
Scoring of the CLIQ Questionnaire
Other investigators wishing to use the CLIQ Questionnaire can use our validation data to convert raw scores into Rasch person measures. Each question has a five-category response scale, but items 1 to 20 are collapsed to three categories and items 21 to 28 are collapsed to four categories. Also, items 1 to 20 (lower score is better) have polarity opposite that of items 21 to 28 (higher scores are better), so items 1 to 20 are reversed in polarity to give an overall higher score for better QoL. Therefore, for response categories (1, 2, 3, 4, 5), assign scores (5, 4, 3, 3, 3) to items 1 to 20 and scores (2, 2, 3, 4, 5) to items 21 to 28. The average of these 28 item scores gives the CLIQ raw score. The raw score is related to the CLIQ Rasch person measure as illustrated in Figure 4. The relationship is double asymptotic, because the average raw rating has a floor and a ceiling. The relationship can be described by the double-asymptotic nonlinear regression57:

$$\text{CLIQ}_{\text{person measure}} = 34.41 \times \log\left(\frac{\text{CLIQ}_{\text{raw score}}}{5 - \text{CLIQ}_{\text{raw score}}}\right) + 26.69$$
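The scoring steps can be sketched in code as follows. This is an illustration under stated assumptions rather than the authors' implementation: the argument of the logarithm is read as raw score divided by (5 minus raw score), the base of the logarithm is not stated in the text and base 10 is used here only as a default that can be changed, and all 28 items are assumed to be answered because the text gives no rule for missing responses. Function and variable names are hypothetical.

```python
import math
from typing import Callable, Sequence

# Recoding taken from the text: response category (1..5) -> item score.
RECODE_ITEMS_1_TO_20 = {1: 5, 2: 4, 3: 3, 4: 3, 5: 3}
RECODE_ITEMS_21_TO_28 = {1: 2, 2: 2, 3: 3, 4: 4, 5: 5}


def cliq_raw_score(responses: Sequence[int]) -> float:
    """Average recoded score over the 28 items (responses given in item order)."""
    if len(responses) != 28:
        raise ValueError("expected responses to all 28 items")
    scores = []
    for item_number, category in enumerate(responses, start=1):
        table = RECODE_ITEMS_1_TO_20 if item_number <= 20 else RECODE_ITEMS_21_TO_28
        scores.append(table[category])  # KeyError if category is not 1..5
    return sum(scores) / len(scores)


def cliq_person_measure(
    raw_score: float,
    log: Callable[[float], float] = math.log10,  # ASSUMPTION: base not stated in the text
) -> float:
    """Convert a raw score to a person measure with the published regression."""
    if not 0 < raw_score < 5:
        raise ValueError("raw score must lie strictly between 0 and 5")
    return 34.41 * log(raw_score / (5 - raw_score)) + 26.69


# Example: a respondent who chose the middle category for every item.
raw = cliq_raw_score([3] * 28)
measure = cliq_person_measure(raw)
```

A raw score of exactly 5 (the best possible response to every item) has no finite person measure under this equation, so the sketch raises an error rather than returning a value; how such extreme scores should be handled is not covered in the text.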
| Discussion |
|---|
According to the assessments suggested by de Boer et al.,58 the psychometric properties of CLIQ were shown to be of high quality. Item selection included a thorough literature review and the input of patients, clinicians, and focus groups. Item reduction was of high quality, as items that measured something different to the overall scale or redundant items were removed using infit and outfit Rasch statistics and items with ceiling and floor effects were also removed. Items were reduced from 647 to 28. Respondent burden is chiefly driven by the number of items, and so reducing the number of items while retaining good measurement properties is a key issue in questionnaire development.54 The respondent burden on the patient was low, with an average completion time for the 28-item CLIQ being less than 11 minutes in nearly all cases.
The item reduction phase highlights an advantage of Rasch analysis beyond the obvious importance that it provides a quantitative score that is a valid linear measurement. We started with items in seven domains and used five rating scale models for these domains. However, only the well-being rating scale model performed differently from the other four models, and so a two-rating scale analysis could be performed. This method improves the assessment of internal consistency that is attainable using Rasch fit statistics, and allowed us to reduce items while maintaining internal consistency and relevance to the population. The pilot questionnaire, although influenced by suggestions from lay people, was principally clinician-driven and contained many questions relating to functional vision (21%). However, Rasch analysis clearly indicated that subjects with corrected refractive error wearing contact lenses have relatively few problems with functional vision (Fig. 1), so that many of these questions were not required, with only two items being retained in the 28-item CLIQ. If they had been left in, CLIQ would have targeted the population poorly. This may be why other vision-related QoL instruments that principally contain functional vision items, such as the RSVP (23% items in the functional vision domain), lack sensitivity in subjects with different modes of contact lens wear.12 The results reported in the present study show that subjects at large have few problems with visual function that are not corrected, and issues such as symptoms, convenience, cost, health concerns, and appearance determine the influence of contact lens refractive error correction on QoL (Table 1). This demonstrates the importance of using Rasch analysis in the development of a questionnaire to ensure good internal consistency and targeting of items to people.
Rasch analysis could be applied to existing questionnaires to gain its benefits in terms of scoring, but this approach may expose inadequacies in a conventionally developed questionnaire in terms of internal consistency and item targeting.27 28 This is indeed the case for one refractive error correction related QoL questionnaire, the RSVP,28 but whether any problems exist with the NEI-RQL remains to be tested.
Removal of poorly fitting data was a thorough procedure. We deliberately worded several questions to allow reversal of the scale direction to catch careless responders. Rasch analysis provides a powerful test for inconsistent data through the outfit statistic. Thus, we were able to reduce the influence of poor data on the dataset.
As hypothesized, the content of CLIQ is different from that of QIRC, which, by contrast, was developed to suit spectacle wearers, contact lens wearers, and people undergoing refractive surgery.29 Sixteen of the 20 items in QIRC are also included in CLIQ.8 Therefore, there are 12 items included in CLIQ that are not found in QIRC. This difference in content is due to developing CLIQ on contact lens patients only and thereby including only content relevant to that population. Although QIRC can be used on contact lens patients, we contend that CLIQ has better content validity for contact lens patients, and should be used in preference to QIRC (or any other questionnaire not solely developed for contact lens wearers) wherever possible. CLIQ should suit all modes of contact lens wear, as the test populations did not exclude any lens types. Because the populations were unselected, they were dominated by soft disposable lens wearers, although all lens types were included. Differences in QoL between different types of contact lenses remain to be demonstrated. This is a key purpose for developing the CLIQ Questionnaire.
The limitations of CLIQ include that it has been developed only for the prepresbyopic population, although this represents most contact lens wearers.31 Nevertheless, CLIQ could still be used in presbyopes, provided that results are interpreted with the understanding that presbyopia-specific issues are not addressed. Another limitation is that CLIQ has a small sample population for test-retest reliability analysis.3 Additional testing is needed to assess the remaining psychometric properties required to assess vision-related QoL instruments fully, such as construct validity, responsiveness, and interpretability, as suggested by de Boer et al.58
In conclusion, we present the CLIQ Questionnaire (available online at http://www.iovs.org/cgi/content/full/47/7/2789/DC1). This is a 28-item questionnaire reporting a single-valued score of QoL in contact lens wearers. It has several advantages over existing instruments: Rasch analysis has demonstrated that all items measure a single content area, and the questionnaire is scaled using Rasch analysis to be a truly linear measurement of QoL in which items are weighted for their impact on QoL.
| Acknowledgements |
|---|
| Footnotes |
|---|
Submitted for publication July 19, 2005; revised December 6, 2005; accepted May 5, 2006.
Disclosure: K. Pesudovs, None; E. Garamendi, None; D.B. Elliott, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: David B. Elliott, Department of Optometry, University of Bradford, Richmond Road, Bradford, West Yorkshire BD7 1DP, UK; d.elliott1{at}bradford.ac.uk.
| References |
|---|