Author(s):
Brennan, Robert L.
Source:
Journal of Educational Measurement, v50 n1 p74-83 Spr 2013
Pub Date:
2013-00-00
Pub Type(s):
Journal Articles; Opinion Papers
Peer Reviewed:
Yes
Descriptors:
Validity; Test Interpretation; Test Use; Scores; Inferences; Scoring; Generalization; Test Results
Abstract:
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation and/or use of test scores (IUA); and (2) provide an evaluation of the claims (validity argument). Kane discusses four types of inferences that provide a scaffolding for addressing these two arguments: scoring, generalization, extrapolation, and decision rules. Decision rules, in particular, are closely related to consequences, which loom large in the argument-based approach to validation. The present commentary on Kane's paper attempts to simplify some of his discussions, while expanding others. The author suggests that Kane's argument-based approach to validation offers by far the best current basis for optimism about improvements in validation. (Contains 7 notes.)

Pub Date:
2013-04-00
Pub Type(s):
Journal Articles; Reports - Evaluative
Peer Reviewed:
Yes
Descriptors:
Memory; Personality Traits; Semantics; Scoring; Cognitive Style; Personality; Metacognition; Task Analysis; Self Efficacy; Scores; Measures (Individuals); Correlation; Decision Making
Abstract:
In learning contexts, people need to make realistic confidence judgments about their memory performance. The present study investigated whether second-order judgments of first-order confidence judgments could help people improve their confidence judgments of semantic memory information. Furthermore, we assessed whether different personality and cognitive style constructs help explain differences in this ability. Participants answered 40 general knowledge questions and rated how confident they were that they had answered each question correctly. They were then asked to adjust the confidence judgments they believed to be most unrealistic, thus making second-order judgments of their first-order judgments. As a group, the participants did not increase the realism of their confidence judgments, but they did significantly increase their confidence for correct items. Furthermore, participants scoring high on an openness composite were more likely to display higher confidence after both the first- and second-order judgments. Moreover, participants scoring high on the openness and the extraversion composites were more likely to display higher levels of overconfidence after both the first- and second-order judgments. In general, however, personality and cognitive style factors showed only a weak relationship with the ability to modify the most unrealistic confidence judgments. Finally, the results showed no evidence that personality and cognitive style supported first- and second-order judgments differently.

Pub Date:
2013-03-00
Pub Type(s):
Journal Articles; Reports - Descriptive
Peer Reviewed:
Yes
Descriptors:
Public Agencies; State Government; Financial Support; State Aid; Smoking; Health Promotion; Health Programs; Program Evaluation; Training; Role; Evaluators; Technical Writing; Reports; Information Utilization; Attitudes; Stakeholders; Accountability; Program Effectiveness; Test Construction; Scoring
Abstract:
Nearly all private, government and non-governmental organizations that receive government funding to run social or health promotion programs in the United States are required to conduct program evaluations and to report findings to the funding agency. Reports are usually due at the end of a funding cycle and they may or may not have an influence on the continuation of program funding. The final evaluation report (FER), as the end-of-funding-cycle report is often called, generally relates the intervention and evaluation results of the funding period and has a dual purpose. It is considered an element of accountability and should give the program and its stakeholders direction for the future. All too often though, this is not the case. Evaluators have voiced myriad concerns about the many issues related to reports and their usage. In their study of a random sample of American Evaluation Association members, Torres et al. (1997) found that evaluators are generally discontent about reporting and about the fact that their reports are often misused or not used at all. Evaluation reports could be a valuable instrument for moving projects forward if stakeholders and project staff would make good use of evaluation findings. The Tobacco Control Evaluation Center (TCEC) (2006) at the University of California at Davis developed scoring measures for final report writing for over 100 local tobacco control projects in California but found 2007 reports lacking in quality. In 2010, it conducted a training campaign in the hope that the projects themselves, the funding government agency and TCEC may make better use of the reports. The response to the training call was overwhelming, and comparing scores from 2007 and 2010, participating agencies made statistically significant improvements but non-participants did not. Results relating to the mode of training were inconclusive. The pre- and post-score comparison proved to be a valuable measuring tool, and the 1-day face-to-face training was a useful training mode. (Contains 1 table.)

Author(s):
Deane, Paul
Source:
Assessing Writing, v18 n1 p7-24 Jan 2013
Pub Date:
2013-01-00
Pub Type(s):
Journal Articles; Reports - Evaluative
Peer Reviewed:
Yes
Descriptors:
Scoring; Essays; Text Structure; Writing (Composition); Evaluation Criteria; Persuasive Discourse; Definitions; Evaluation Problems; Criticism; Writing Evaluation; Essay Tests; Computer Assisted Testing; Validity; Measurement; Psychometrics; Writing Skills
Abstract:
This paper examines the construct measured by automated essay scoring (AES) systems. AES systems measure features of the text structure, linguistic structure, and conventional print form of essays; as such, the systems primarily measure text production skills. In the current state of the art, AES systems provide little direct evidence about such matters as strength of argumentation or rhetorical effectiveness. However, since there is a relationship between ease of text production and ability to mobilize cognitive resources to address rhetorical and conceptual problems, AES systems have strong correlations with overall performance and can effectively distinguish students in a position to apply a broader writing construct from those for whom text production constitutes a significant barrier to achievement. The paper begins by defining writing as a construct and then turns to the e-rater scoring engine as an example of AES state-of-the-art construct measurement. Common criticisms of AES are defined and explicated--fundamental objections to the construct measured, methods used to measure the construct, and technical inadequacies--and a direction for future research is identified through a socio-cognitive approach to AES. (Contains 4 figures.)

Author(s):
Condon, William
Source:
Assessing Writing, v18 n1 p100-108 Jan 2013
Pub Date:
2013-01-00
Pub Type(s):
Journal Articles; Reports - Evaluative
Peer Reviewed:
Yes
Descriptors:
Measurement; Psychometrics; Evaluation Methods; Educational Testing; Writing Tests; Measures (Individuals); Writing Evaluation; Scoring; Writing (Composition); Essays; Negative Attitudes; Vendors; Essay Tests; Computer Assisted Testing; Internet; Validity; Comparative Analysis
Abstract:
Automated Essay Scoring (AES) has garnered a great deal of attention from the rhetoric and composition/writing studies community since the Educational Testing Service began using e-rater® and the "Criterion"® Online Writing Evaluation Service as products in scoring writing tests, and most of the responses have been negative. While the criticisms leveled at AES are reasonable, the more important, underlying issues relate to the aspects of the writing construct of the tests AES can rate. Because these tests underrepresent the construct as it is understood by the writing community, such tests should not be used in writing assessment, whether for admissions, placement, formative, or achievement testing. Instead of continuing the traditional, large-scale, commercial testing enterprise associated with AES, we should look to well-established, institutionally contextualized forms of assessment as models that yield fuller, richer information about the student's control of the writing construct. Such tests would be more valid, as reliable, and far fairer to the test-takers, whose stakes are often quite high. (Contains 1 figure.)

Pub Date:
2013-01-00
Pub Type(s):
Journal Articles; Reports - Evaluative
Peer Reviewed:
Yes
Descriptors:
Educational Testing; Guidelines; Scoring; Psychometrics; Evaluation Criteria; Program Descriptions; Vendors; Writing Evaluation; Essay Tests; Computer Assisted Testing; Program Evaluation; Evaluation Methods; Measurement
Abstract:
In this paper, we provide an overview of psychometric procedures and guidelines Educational Testing Service (ETS) uses to evaluate automated essay scoring for operational use. We briefly describe the e-rater system, the procedures and criteria used to evaluate e-rater, implications for a range of potential uses of e-rater, and directions for future research. The description of e-rater includes a summary of characteristics of writing covered by e-rater, variations in modeling techniques available, and the regression-based model building procedure. The evaluation procedures cover multiple criteria, including association with human scores, distributional differences, subgroup differences and association with external variables of interest. Expected levels of performance for each evaluation are provided. We conclude that the "a priori" establishment of performance expectations and the evaluation of performance of e-rater against these expectations help to ensure that automated scoring provides a positive contribution to the large-scale assessment of writing. We call for continuing transparency in the design of automated scoring systems and clear and consistent expectations of performance of automated scoring before using such systems operationally. (Contains 1 figure and 1 table.)

Pub Date:
2013-01-00
Pub Type(s):
Journal Articles; Reports - Research
Peer Reviewed:
Yes
Descriptors:
Writing Evaluation; Scoring; Writing Instruction; Essays; Essay Tests; Computer Assisted Testing; Statistical Analysis; At Risk Students; College Freshmen; Ethnic Groups; Computer Software Evaluation; Automation; Research Universities; Student Placement
Abstract:
This study investigated the use of automated essay scoring (AES) to identify at-risk students enrolled in a first-year university writing course. An application of AES, the "Criterion"® Online Writing Evaluation Service, was evaluated through a methodology focusing on construct modelling, response processes, disaggregation, extrapolation, generalization, and consequence. Based on the results of our two-year study with students (N = 1,482) at a public technological research university in the United States, we found that "Criterion" offered a defined writing construct congruent with established models, achieved acceptance among students and instructors, showed no statistically significant differences between ethnicity groups of sufficient sample size, correlated at acceptable levels with other writing measures, performed in a stable fashion, and enabled instructors to identify at-risk students to increase their course success. (Contains 5 tables.)