Pub Date: 2013-03-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Test Reliability; Graduate Students; Medical Students; Vocational Evaluation; Video Technology; Generalizability Theory; Interrater Reliability
Abstract:
Reliability estimations of workplace-based assessments with the mini-CEX are typically based on real-life data. Estimations are based on the assumption of local independence: the object of the measurement should not be influenced by the measurement itself and samples should be completely independent. This is difficult to achieve. Furthermore, the variance caused by the case/patient or by the assessor is completely confounded, so it is unknown how much each of these factors contributes to the noise in the measurement. The aim of this study was to use a controlled setup that overcomes these difficulties and to estimate the reproducibility of the mini-CEX. Three encounters were videotaped from each of 21 residents. The patients were the same for all residents. Each encounter was assessed by 3 assessors who assessed all encounters for all residents. This yielded a fully crossed (all random) two-facet generalizability design. Just over a quarter of the total variance (28%) was universe score variance. The largest source of variance was the general error term (34%), followed by the main effect of assessors (18%). Generalizability coefficients indicated that a sample of approximately 9 encounters was needed assuming a single different assessor per encounter and different cases per encounter (the usual situation in real practice), 4 encounters when 2 raters were used, and 3 encounters when 3 raters were used. Unexplained general error and the leniency/stringency of assessors are the major causes of unreliability in the mini-CEX. Rater training might help to optimize reliability.
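As a rough illustration of the decision-study logic this abstract describes, the sketch below computes the standard relative generalizability coefficient for a fully crossed persons x cases x assessors random design. The function name and the variance components in the usage loop are hypothetical placeholders, not values from the study, which reports only three of its components. The sketch also does not model the nested "different assessor per encounter" scenario mentioned in the abstract.

```python
def g_coefficient(var_p, var_pc, var_pa, var_pca_e, n_cases, n_assessors):
    """Relative G coefficient for a fully crossed p x c x a random design.

    var_p      -- universe score (person) variance
    var_pc     -- person-by-case interaction variance
    var_pa     -- person-by-assessor interaction variance
    var_pca_e  -- residual (three-way interaction confounded with error)
    """
    # Relative error variance: every interaction involving persons,
    # divided by the number of conditions it is averaged over.
    rel_error = (
        var_pc / n_cases
        + var_pa / n_assessors
        + var_pca_e / (n_cases * n_assessors)
    )
    return var_p / (var_p + rel_error)


# Hypothetical components (assumed for illustration, NOT the study's data).
components = dict(var_p=0.30, var_pc=0.10, var_pa=0.10, var_pca_e=0.50)

if __name__ == "__main__":
    # Vary the number of cases with a single assessor per case.
    for n_cases in (1, 3, 9):
        coef = g_coefficient(n_cases=n_cases, n_assessors=1, **components)
        print(n_cases, round(coef, 3))
```

Increasing either the number of cases or the number of assessors shrinks the relative error term, which is why the required number of encounters drops as more raters are used.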
Pub Date: 2013-00-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Factor Analysis; Validity; Physical Activities; Measures (Individuals); Anxiety; Psychometrics; College Students; Factor Structure; Test Reliability; Self Efficacy; Human Body; Fear; Multivariate Analysis; Reliability; Prediction; Exercise
Abstract:
This study examined the psychometric properties of the Self-Presentational Efficacy Scale (SPES) developed by Gammage, Hall, and Martin Ginis (2004). University students (196 men and 269 women) completed the SPES and measures of social physique anxiety, fear of negative evaluation, and physical activity. Participants also completed the SPES a second time. A series of multivariate data analyses were conducted to examine the SPES's factor structure. Confirmatory factor analysis indicated a 3-factor model, with each factor representing a distinct latent variable. Acceptable internal consistency and test-retest reliability were found. Evidence of concurrent validity with respect to sex and exercise status was demonstrated. Convergent validity was also shown, as relationships to exercise participation and self-presentational anxiety were found. Future research should assess the reliability and validity of the SPES in other samples and the relative and unique contribution of the three factors in predicting exercise-related outcomes. (Contains 4 tables and 1 figure.)
Pub Date: 2013-02-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Science Teachers; Biology; Teacher Characteristics; Knowledge Base for Teaching; Pedagogical Content Knowledge; Measures (Individuals); Test Construction; Test Validity; Test Reliability; Item Response Theory
Abstract:
Research on teachers' professionalism and professional development has increased in the last two decades. A main focus of this line of research has been the cognitive component of teacher professionalism, i.e., professional knowledge. Most of the previous studies on teacher knowledge--such as the Learning Mathematics for Teaching (LMT) (Hill et al. 2004), the Professional Competence of Teachers, Cognitively Activating Instruction, and Development of Students' Mathematical Literacy (COACTIV) (Baumert et al. 2010), and the Mathematics Teaching in the 21st Century (MT21) (Schmidt et al. 2007) studies--have been conducted in the field of mathematics teachers' pedagogical content knowledge (PCK) and content knowledge (CK). There have been few comparable studies conducted with science teachers, especially biology teachers. To fill the gap, this study examines the development and use of instruments to measure biology teachers' CK and PCK. In particular, this study describes a method to develop reliable, objective, and valid instruments measuring teachers' CK and PCK in four steps by the use of empirical data of students. Additionally, the study explores whether CK and PCK might be measured as separate knowledge categories by using a paper-and-pencil test. This paper presents a theoretical model that guides test development and provides steps to develop and validate the instruments. Details are also provided regarding the computation of the Rasch scale score measures for 158 biology teachers. The results indicate that the instruments measured teachers' CK and PCK in an objective, valid, and reliable way. This suggests that the new instruments can be used in combination with classroom observations to examine teaching quality and further its relation to student learning.
Pub Date: 2013-01-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Psychometrics; Test Reliability; Test Validity; Life Satisfaction; Well Being; Measures (Individuals); High School Students; Serbocroatian; Social Indicators; Foreign Countries
Abstract:
The main purpose of this study was to evaluate the psychometric properties of the Serbian version of the Multidimensional Students' Life Satisfaction Scale (MSLSS). The research was carried out on a sample of 408 high school students (250 females, 158 males) with a mean age of 16.6. The Serbian version of the MSLSS demonstrated good psychometric properties. The internal consistency coefficients (Cronbach's alpha) for the MSLSS domain and total scores were adequate. Support for the validity of the MSLSS was provided by the pattern of correlations with various positive and negative indicators of well-being. However, it has been suggested that shortening the scale from 40 items to 25 items could provide a more accurate measure of adolescents' life satisfaction for future research.
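For readers unfamiliar with the internal consistency coefficient named in this abstract, the following is a minimal sketch of Cronbach's alpha using the textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy score matrix are illustrative assumptions and are unrelated to the MSLSS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores -- list of rows, one per respondent, each a list of k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_var_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Toy data (assumed): three respondents answering two items.
    toy_scores = [[1, 2], [2, 1], [3, 3]]
    print(cronbach_alpha(toy_scores))
```

Alpha approaches 1 as items covary more strongly, which is why it is commonly reported separately for each domain score and for the total score.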
Pub Date: 2013-01-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Quality of Life; Measures (Individuals); Life Satisfaction; Adolescents; Psychometrics; Item Response Theory; Goodness of Fit; Test Reliability; Test Validity; Test Bias
Abstract:
A scale measuring quality of life (QOL) is important in adolescent research. Using the graded response model (GRM), this study evaluates the psychometric properties of the satisfaction ratings of the Quality of Life Profile Adolescent Version (QOLPAV). Data for 1,392 adolescents were used to check IRT assumptions such as unidimensionality and local item dependence (LID). The goodness of fit of the GRM to the data and the item characteristic curves were evaluated. The reliability and validity analyses included item/test information, Cronbach's alpha, and convergent and discriminant validity. Differential item functioning (DIF) procedures were also performed to detect item bias. The results provide evidence that the items sufficiently measured one single dimension. Few pairs of questions were flagged as LID due to content or wording similarity. Five items did not fit the GRM, and 4 were low in item discrimination. The findings also suggest that the assessment had appropriate reliability and validity. The DIF impact on the assessment score was considered minor. Because QOLPAV includes a respondent's perceived importance of various life aspects, a short form that only considers important life aspects in the overall QOL estimation for each respondent becomes feasible within the framework of IRT. Future studies focusing on the development of a QOL overall index using the items from QOLPAV are recommended.
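The graded response model mentioned in this abstract can be sketched in a few lines. The example below uses the standard logistic form, in which the probability of responding in category k or higher is 1 / (1 + exp(-a(theta - b_k))) and each category probability is the difference between adjacent cumulative probabilities. The function name and parameter values are hypothetical and are not estimates from the QOLPAV data.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination
    thresholds -- increasing list of category thresholds b_1 < ... < b_m
    Returns m + 1 probabilities, one per ordered response category.
    """
    # Cumulative probabilities P(X >= k), bounded by 1 below and 0 above.
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Each category probability is the drop between adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Hypothetical item with four response categories (assumed parameters).
    probs = grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.0, 0.0, 1.5])
    print([round(p, 3) for p in probs])
```

An item with a low discrimination parameter `a` yields flat category curves, which is the sense in which the abstract describes some items as low in item discrimination.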
Pub Date: 2013-01-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Foreign Countries; Life Satisfaction; Well Being; Economic Climate; Human Capital; Social Capital; Surveys; Gender Differences; Statistical Analysis; Factor Analysis; Regression (Statistics); Test Reliability; Test Validity; Questionnaires
Abstract:
Research on subjective well-being (SWB) for western nations has been growing for the last 30 years. So far there has not been any study of SWB in Greece. This study is the first attempt to quantify SWB in Greece, which is in a state of deep economic and values crisis. For this purpose the Personal Well-being Index (PWI) developed by Cummins et al. (2002) and used by the International Well-being Group has been applied. Additionally, this study attempts to answer two research questions: (i) what is the effect of the economic crisis on the PWI and, as a consequence, on the homeostasis hypothesis? (ii) is there any indication of an association between different types of capital (built, natural, human, and social capital) and domains of life satisfaction? A cross-sectional survey of 1,216 participants included sociodemographic variables, questions relating to dimensions or domains of personal well-being, and questions pertaining to built, human, natural and social capital. Based on cross-sectional data, statistical analyses were performed for the whole sample and for men and women to account for gender differences. Descriptive, correlation, factor and regression statistical techniques were used. Regression models were used to determine which types of capital variables had a statistically significant association with each domain of life satisfaction. The statistical results of this study demonstrate the reliability and validity of the Greek adaptation of the Cummins questionnaire. Significant differences are found between men and women in personal well-being index score. The results support the hypothesis that the economic crisis has an impact on personal well-being. It is, therefore, possible that such an impact affects the state of homeostasis. This suggests that other mechanisms such as homeorhesis may be applicable in explaining the behavior of the state of the personal well-being index. Different types of capital and domains of life satisfaction are found to be positively related. These findings must be considered in light of cross-sectional limitations. This study evaluated the psychometric characteristics of the Greek version of the Cummins questionnaire. The PWI results are not within the range of normative data for western nations. This is an interesting and important result: it shows that the economic crisis matters significantly for personal well-being. The statistical results of this study offer indicative support for the role of types of capital in domains of life satisfaction. Since we are using cross-sectional data, no causal inferences can be drawn.
Pub Date: 2013-01-00
Pub Type(s): Journal Articles; Reports - Evaluative
Peer Reviewed: Yes
Descriptors:
Religious Factors; Well Being; Prosocial Behavior; Measures (Individuals); Factor Structure; Test Reliability; Test Validity; Predictive Validity
Abstract:
Numerous studies suggest spirituality and subjective well-being (SWB) are positively associated. However, critics argue that popular spirituality instruments--including the Daily Spiritual Experiences Scale (DSES)--contain items that conflate religiosity/spirituality (R/S), prosociality and SWB. Advocates of the DSES retort that, despite this concern, the available evidence confirms a single underlying factor. The current paper evaluates the DSES's development, factor structure, reliability and convergent and predictive validity using a community sample. Despite the full DSES scale's excellent internal reliability, two related factors--theism and civility--are identified. Both scales are reliable and converge meaningfully with related R/S measures. As expected, given previous findings, the full DSES scale predicts higher SWB yet the two subscales display divergent associations. This finding offers new insights into the DSES and raises questions about the claimed belief-as-benefit effect.
Author(s): Zhang, Jinming
Source: Psychometrika, v78 n1 p37-58 Jan 2013
Pub Date: 2013-01-00
Pub Type(s): Journal Articles; Reports - Research
Peer Reviewed: Yes
Descriptors:
Adaptive Testing; Simulation; Computer Assisted Testing; Test Reliability; Item Response Theory; Psychometrics; Test Items; Measurement Techniques; Test Construction; Data Analysis
Abstract:
In some popular test designs (including computerized adaptive testing and multistage testing), many item pairs are not administered to any test takers, which may result in some complications during dimensionality analyses. In this paper, a modified DETECT index is proposed in order to perform dimensionality analyses for response data from such designs. It is proven in this paper that under certain conditions, the modified DETECT can successfully find the dimensionality-based partition of items. Furthermore, the modified DETECT index is decomposed into two parts, which can serve as indices of the reliability of results from the DETECT procedure when response data are judged to be multidimensional. A simulation study shows that the modified DETECT can successfully recover the dimensional structure of response data under reasonable specifications. Finally, the modified DETECT procedure is applied to real response data from two-stage tests to demonstrate how to utilize these indices and interpret their values in dimensionality analyses.
Pub Date: 2013-00-00
Pub Type(s): Reports - Descriptive
Peer Reviewed:
Descriptors:
College Entrance Examinations; Predictive Validity; Test Reliability; Test Validity; Business Education; Predictive Measurement; Information Skills; Skill Analysis
Abstract:
Business schools seek students who can evaluate, synthesize and extract the important information and sort out the noise from very large volumes of data. With the launch of the Integrated Reasoning section in June, the GMAT exam started measuring these skills, which are essential for learning in today's programs, are expected of those who intend to work in business, and are of critical importance to the businesses they may create or join in the future. In the first six months of Integrated Reasoning, more than 105,000 exams have been administered. While it will take more time to establish predictive validity for individual programs--that is, to state precisely to what extent the section adds to the already high ability of the GMAT exam to predict test takers' potential for success in the classroom--some preliminary analysis has been conducted to see whether the test is showing any bias toward or against any subgroups of test takers, and how test takers who score similarly on the Quantitative and Verbal sections perform on the new section. This paper presents an overview of what GMAT IR scores mean to schools 6 months after launch.
Full-Text Availability: ERIC Full Text (176K)
Author(s): Camilli, Gregory
Source: Educational Research and Evaluation, v19 n2-3 p104-120 2013
Pub Date: 2013-00-00
Pub Type(s): Journal Articles; Reports - Evaluative
Peer Reviewed: Yes
Descriptors:
Alternative Assessment; Test Bias; Test Content; Test Format; Test Items; Test Length; Test Norms; Test Reliability; Test Validity; Testing Problems; Robustness (Statistics); Group Testing; Individual Testing; Evidence; Prediction; Item Analysis; Educational Research; Evaluation Methods; Evaluation Research; Evaluation Problems; Student Evaluation; Performance Factors; Barriers
Abstract:
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative approaches to test fairness, counterfactual reasoning is useful to clarify a potential charge of unfairness: Is it plausible to believe that with an alternative assessment (test or item) or under different test conditions an individual or groups of individuals may have fared better? Beyond comparative questions, fairness can also be framed by moral and ethical choices. A number of ongoing issues are evaluated with respect to these topics including accommodations, differential item functioning (DIF), differential prediction and selection, employment testing, test validation, and classroom assessment.