Pub Date:
2013-07-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
Test Items; Test Content; Test Bias; Individual Differences; Accuracy; Identification; Test Construction
Abstract:
In organizational and educational practices, sensitivity reviews are commonly advocated techniques for reducing test bias and enhancing fairness. In the present paper, results from two studies are reported which investigate how effective individuals are at detecting problematic test content and the influence such content has on important testing outcomes. In Study 1, signal detection analyses are used to examine the role of individual differences in the identification of insensitive test items, while Study 2 investigates the extent to which insensitivity differentially influences item performance and reactions. Results revealed small but significant differences in the overall accuracy and response tendencies of student test reviewers on the basis of demographics and key individual-difference variables. Contrary to predictions, however, problematic items did not exhibit differential item functioning across sex, nor did their presence engender negative test-taker reactions. Implications and suggestions for future research and sensitivity review practices are discussed. (Contains 6 tables and 2 figures.)
Pub Date:
2013-03-00
Pub Type(s):
Journal Articles; Reports - Descriptive
Peer Reviewed:
Yes
Descriptors:
Public Agencies; State Government; Financial Support; State Aid; Smoking; Health Promotion; Health Programs; Program Evaluation; Training; Role; Evaluators; Technical Writing; Reports; Information Utilization; Attitudes; Stakeholders; Accountability; Program Effectiveness; Test Construction; Scoring
Abstract:
Nearly all private, government and non-governmental organizations that receive government funding to run social or health promotion programs in the United States are required to conduct program evaluations and to report findings to the funding agency. Reports are usually due at the end of a funding cycle and they may or may not have an influence on the continuation of program funding. The final evaluation report (FER), as the end-of-funding-cycle report is often called, generally relates the intervention and evaluation results of the funding period and has a dual purpose. It is considered an element of accountability and should give the program and its stakeholders direction for the future. All too often, though, this is not the case. Evaluators have voiced myriad concerns about the many issues related to reports and their usage. In their study of a random sample of American Evaluation Association members, Torres et al. (1997) found that evaluators are generally discontent about reporting and about the fact that their reports are often misused or not used at all. Evaluation reports could be a valuable instrument for moving projects forward if stakeholders and project staff would make good use of evaluation findings. The Tobacco Control Evaluation Center (TCEC) (2006) at the University of California at Davis developed scoring measures for final report writing for over 100 local tobacco control projects in California but found 2007 reports lacking in quality. In 2010, it conducted a training campaign in the hope that the projects themselves, the funding government agency and TCEC might make better use of the reports. The response to the training call was overwhelming, and a comparison of scores from 2007 and 2010 showed that participating agencies made statistically significant improvements but non-participants did not. Results relating to the mode of training were inconclusive. The pre- and post-score comparison proved to be a valuable measuring tool, and the 1-day face-to-face training was a useful training mode. (Contains 1 table.)
Pub Date:
2013-03-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
School Personnel; Reading Fluency; Emergent Literacy; Psychometrics; Equated Scores; Grade 2; Oral Reading; Elementary School Students; Factor Analysis; Measurement; Scores; Comparative Analysis; Error of Measurement; True Scores; Test Construction; Curriculum Based Assessment; Literacy; Validity; Reading Instruction; Reading Programs
Abstract:
Lack of psychometric equivalence of oral reading fluency (ORF) passages used within a grade for screening and progress monitoring has recently become an issue, with calls for the use of equating methods to ensure equivalence. To investigate the nature of the nonequivalence and to guide the choice of equating method to correct for nonequivalence, the authors fit linear and nonlinear confirmatory factor analytic measurement models to Dynamic Indicators of Basic Early Literacy Skills (DIBELS) second-grade ORF passages routinely used for spring testing. They found evidence of nonlinear relations among passage scores that indicated equipercentile equating would be the best choice of equating method compared with mean or linear equating. The standard error of equating (SEE) with a sample of 600 participants was acceptable and less than two correct words per minute for equated scores from 0 to 150, which covers 95% of scores and the useful range. Consistent with the small SEE, the equating table also successfully removed all form differences in an independent sample of second graders. Given the widespread adoption of DIBELS in thousands of schools serving millions of students, equating all passages within a grade would substantially improve the quality of the tool and dramatically lower the assessment burden on school personnel. (Contains 5 tables and 5 figures.)
Pub Date:
2013-02-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
Science Teachers; Biology; Teacher Characteristics; Knowledge Base for Teaching; Pedagogical Content Knowledge; Measures (Individuals); Test Construction; Test Validity; Test Reliability; Item Response Theory
Abstract:
Research on teachers' professionalism and professional development has increased in the last two decades. A main focus of this line of research has been the cognitive component of teacher professionalism, i.e., professional knowledge. Most of the previous studies on teacher knowledge--such as the Learning Mathematics for Teaching (LMT) (Hill et al. 2004), the Professional Competence of Teachers, Cognitively Activating Instruction, and Development of Students' Mathematical Literacy (COACTIV) (Baumert et al. 2010), and the Mathematics Teaching in the 21st Century (MT21) (Schmidt et al. 2007) studies--have been conducted in the field of mathematics teachers' pedagogical content knowledge (PCK) and content knowledge (CK). There have been few comparable studies conducted with science teachers, especially biology teachers. To fill the gap, this study examines the development and use of instruments to measure biology teachers' CK and PCK. In particular, this study describes a four-step method, using empirical data from students, to develop reliable, objective, and valid instruments measuring teachers' CK and PCK. Additionally, the study explores whether CK and PCK might be measured as separate knowledge categories by using a paper-and-pencil test. This paper presents a theoretical model that guides test development and provides steps to develop and validate the instruments. Details are also provided regarding the computation of the Rasch scale score measures for 158 biology teachers. The results indicate that the instruments measured teachers' CK and PCK in an objective, valid, and reliable way. This suggests that the new instruments can be used in combination with classroom observations to examine teaching quality and, further, its relation to student learning.
Author(s):
Zhang, Jinming
Source:
Psychometrika, v78 n1 p37-58 Jan 2013
Pub Date:
2013-01-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
Adaptive Testing; Simulation; Computer Assisted Testing; Test Reliability; Item Response Theory; Psychometrics; Test Items; Measurement Techniques; Test Construction; Data Analysis
Abstract:
In some popular test designs (including computerized adaptive testing and multistage testing), many item pairs are not administered to any test takers, which may result in some complications during dimensionality analyses. In this paper, a modified DETECT index is proposed in order to perform dimensionality analyses for response data from such designs. It is proven in this paper that under certain conditions, the modified DETECT can successfully find the dimensionality-based partition of items. Furthermore, the modified DETECT index is decomposed into two parts, which can serve as indices of the reliability of results from the DETECT procedure when response data are judged to be multidimensional. A simulation study shows that the modified DETECT can successfully recover the dimensional structure of response data under reasonable specifications. Finally, the modified DETECT procedure is applied to real response data from two-stage tests to demonstrate how to utilize these indices and interpret their values in dimensionality analyses.
Author(s):
Mislevy, Robert J.; Haertel, Geneva; Cheng, Britte H.; Ructtinger, Liliana; DeBarger, Angela; Murray, Elizabeth; Rose, David; Gravel, Jenna; Colker, Alexis M.; Rutstein, Daisy; Vendlinski, Terry
Source:
Educational Research and Evaluation, v19 n2-3 p121-140 2013
Pub Date:
2013-00-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
Testing Accommodations; Access to Education; Testing; Psychometrics; Test Bias; Standardized Tests; Construct Validity; Test Construction; Test Reliability; Test Validity; Test Theory; Educational Principles; Inferences; Measurement Objectives; Measurement Techniques; Evaluation Methods; Evaluation Problems; Evaluation Research; Student Evaluation; Educational Research; Performance Factors
Abstract:
Standardizing aspects of assessments has long been recognized as a tactic to help make evaluations of examinees fair. It reduces variation in irrelevant aspects of testing procedures that could advantage some examinees and disadvantage others. However, recent attention to making assessment accessible to a more diverse population of students highlights situations in which making tests identical for all examinees can make a testing procedure less fair: Equivalent surface conditions may not provide equivalent evidence about examinees. Although testing accommodations are by now standard practice in most large-scale testing programmes, for the most part these practices lie outside formal educational measurement theory. This article builds on recent research in universal design for learning (UDL), assessment design, and psychometrics to lay out the rationale for inference that is conditional on matching examinees with principled variations of an assessment so as to reduce construct-irrelevant demands. The present focus is assessment for special populations, but it is argued that the principles apply more broadly. (Contains 3 tables, 2 figures, and 2 notes.)
Author(s):
Herman, Joan; Linn, Robert
Source:
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
Pub Date:
2013-01-00
Pub Type(s):
Reports - Research
Peer Reviewed:
Descriptors:
Consortia; Student Evaluation; Educational Testing; Academic Standards; State Standards; Evidence; Test Construction; Summative Evaluation; Test Content; Test Items; Critical Thinking; Problem Solving
Abstract:
Two consortia, the Smarter Balanced Assessment Consortium (Smarter Balanced) and the Partnership for Assessment of Readiness for College and Careers (PARCC), are currently developing comprehensive, technology-based assessment systems to measure students' attainment of the Common Core State Standards (CCSS). The consequences of the consortia assessments, slated for full operation in the 2014/15 school year, will be significant. The assessments themselves and their results will send powerful signals to schools about the meaning of the CCSS and what students know and are able to do. If history is a guide, educators will align curriculum and teaching to what is tested, and what is not assessed largely will be ignored. Those interested in promoting students' deeper learning and development of 21st century skills thus have a large stake in trying to assure that consortium assessments represent these goals. Funded by the William and Flora Hewlett Foundation, UCLA's National Center for Research on Evaluation, Standards, and Student Testing (CRESST) is monitoring the extent to which the two consortia's assessment development efforts are likely to produce tests that measure and support goals for deeper learning. This report summarizes CRESST findings thus far, describing the evidence-centered design framework guiding assessment development for both Smarter Balanced and PARCC as well as each consortium's plans for system development and validation. This report also provides an initial evaluation of the status of deeper learning represented in both consortia's plans. Study results indicate that PARCC and Smarter Balanced summative assessments are likely to represent important goals for deeper learning, particularly those related to mastering and being able to apply core academic content and cognitive strategies related to complex thinking, communication, and problem solving. At the same time, the report points to the technical, fiscal, and political challenges that the consortia face in bringing their plans to fruition. (Contains 5 tables, 5 figures and 1 footnote.)
Full-Text Availability Options:
ERIC
Full Text (1851K)
Pub Date:
2013-01-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
Foreign Countries; Crime; Developing Nations; Voting; Test Construction; Census Figures; Social Indicators; Public Policy; Measurement; Sociometric Techniques; Statistical Analysis
Abstract:
Given limited resource availability in a developing nation like India, which faces a high incidence of crime, it is important to optimize the resources spent on combating crime by channelling them in the proper direction. This requires an understanding of the actual and overall level of crime across India. Our paper provides a complete understanding of the various indicators of violent crime and the determinants of these crimes in India using district-level data for three census years, namely, 1981, 1991 and 2001. We construct three alternative crime-burden indices. By including a variable such as voter turnout in state elections at the district level, we document a significant impact of public awareness on reducing and combating crime. The constructed crime-burden index shows that states located in northern parts of India have a higher incidence of crime than states in the south. We also find that our estimated crime-burden indices tend to report, in general, a higher level of crime burden than the average-based index. This suggests that controlling for factors beyond population is essential when constructing the aggregate crime-burden index for any country. Although our work is limited to Indian data, we believe that the approach can be easily applied to various other countries.