Introduction
When we evaluate teaching, we ordinarily want to assess
both what the teaching hopes to help people learn and whether it succeeds in that intent. In short, we want to know
what kind of sustained influence the teaching and the course have on the way students think and act. Does the teaching help
and encourage students to learn something we regard as important and appropriate? Does it have a sustained, substantial, and
positive influence on the way students think and act? (We might also want to know whether the instructor has assessed students'
learning accurately and fairly.) How can an evaluator know any of this? Data to answer these
questions might come from a variety of sources, including syllabi and other course material or an instructor's reflective
essay on the intellectual definition of the course. Part of the data might come from student ratings and comments. Committees,
chairs, deans, the Provost, and the President must evaluate using the data from student ratings and comments and from other
sources. Notice the distinction between sources of information about teaching and the process of evaluation.

I. Student Ratings and the Evaluation of Teaching
What can student ratings tell us that will help us find out what the teaching hopes to accomplish and whether it has been successful?
The literature on student ratings is voluminous. One summary in September 1995 found more than 1,500 articles and books on the subject.1 The extensive research covered in those
works has found that student ratings and comments can provide valid and reliable information that can help an evaluator determine
the effectiveness of a teacher. Indeed, the research has discovered that student ratings can correlate well with external
measures of student learning and with instructor self-ratings when the latter are collected independent of personnel decisions.
It has also found that student ratings are statistically reliable (i.e., they have internal stability and are consistent
over time), are more statistically reliable than colleague ratings, and are not easily or automatically manipulated by
grades. In fact, intellectually challenging classes average higher ratings than do easier courses with light workloads.2 Most important, student ratings can, as one observer
put it, "report the extent to which the students have been reached [educationally]."3 Yet student ratings
have their limitations. We will say more later about those limits, but first, let us consider how student ratings can help. Research has found that certain questions produce
the most reliable results. The following types of questions have a strong track record:4
Using a six-point scale (1 = lowest; 6 = highest):
- Provide an overall rating of the instructor.
- Give an overall rating of the course.
- Estimate how much you learned in the course.
Two additional questions
find favor with many evaluators because they also solicit information about the student's perception of the results of
instruction (in essence, asking "Did the course reach you educationally?") and are, therefore, highly recommended.
Again, on a six-point scale:
- Rate the effectiveness of the instructor in stimulating your interest in the subject.5
- Rate the effectiveness of this course in challenging you intellectually.6
A form should also collect some demographic information on students:7
- My classification is: a) Graduate b) Senior c) Junior d) Sophomore e) Freshman
- My major is:
- I took this course to satisfy: a) Major or minor field requirements b) Other specific degree requirements c) Elective credits required for a degree d) Non-degree requirements e) General interest
- Before taking this course my interest in the subject was: a) Very low b) Low c) Average d) High e) Very high
A consideration of these factors can help control for
possible sources of bias in student ratings. Research has found, for example, that prior student interest in the subject does influence the outcome of student ratings of effectiveness.8
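As a rough illustration of how the demographic items above might be used, the following Python sketch groups overall instructor ratings (on the 1 to 6 scale) by each student's self-reported prior interest (a through e, as on the form) and reports the mean rating for each group. The responses and the function name `mean_rating_by_group` are hypothetical, invented for illustration. A simple stratified comparison like this is only a first look; it is not the multiple-regression or path-analysis methods used in the research cited here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses, invented for illustration (not real data):
# (overall instructor rating on the 1-6 scale, prior interest level "a"-"e").
responses = [
    (5, "d"), (6, "e"), (4, "c"), (5, "c"),
    (3, "a"), (4, "b"), (6, "d"), (5, "e"),
]

def mean_rating_by_group(rows):
    """Return the mean 1-6 rating for each prior-interest level present."""
    groups = defaultdict(list)
    for rating, interest in rows:
        # Reject values that cannot occur on the six-point scale.
        if not 1 <= rating <= 6:
            raise ValueError(f"rating {rating} is outside the six-point scale")
        groups[interest].append(rating)
    # Sort by level so the output reads from "very low" (a) to "very high" (e).
    return {level: mean(vals) for level, vals in sorted(groups.items())}

print(mean_rating_by_group(responses))
```

If the group means rise steadily from "a" to "e", that is the kind of pattern the research on prior interest would lead an evaluator to expect, and it is a reason to read a class's overall ratings in light of its students' starting interest.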
Finally, a form can usefully solicit open-ended responses if evaluators are willing to read all of the student comments
on a given class. Reading only a few responses invites distortion in the evaluator's judgment. Because student ratings and
student comments are virtually identical in character, evaluators are likely to make fewer mistakes by using the ratings alone than by combining them with a reading of only a few of those comments
or with summaries of them.
End Notes
1. William E. Cashin, "Student Ratings of Teaching: The Research Revisited." IDEA Paper, No. 32, September 1995 [Center for Faculty Evaluation and Development, Kansas State University, Manhattan, KS].
2. See, for example, Peter A. Cohen, "Student Ratings of Instruction and Student Achievement: A Meta-analysis of Multisection Validity Studies." Review of Educational Research 51 (Fall 1981): 281-309; Judith D. Aubrecht, "Are Student Ratings of Teacher Effectiveness Valid?" IDEA Paper, No. 2, November 1979 [Center for Faculty Evaluation and Development, Kansas State University, Manhattan, KS]; Robert T. Blackburn and Mary Jo Clark, "An Assessment of Faculty Performance: Some Correlates Between Administrator, Colleague, Student and Self-ratings." Sociology of Education 48 (Spring 1975): 242-256; Larry Braskamp and Darrel Caulley, "Student Ratings and Instructor Self-ratings, and Their Relationship to Student Achievement." American Educational Research Journal 16 (Summer 1979): 295-306; Frank Costin, William Greenough, and Robert Menges, "Student Ratings of College Teaching: Reliability, Validity, and Usefulness." Review of Educational Research 41 (December 1971): 511-535; Frank Costin, "Do Student Ratings of College Teachers Predict Student Achievement?" Teaching of Psychology 5 (April 1978): 86-88.
3. Kenton Machina, "Evaluating Student Evaluations." Academe 73 (May-June 1987): 19-20.
4. P. C. Abrami, "How Should We Use Student Ratings to Evaluate Teaching?" Research in Higher Education 30 (1989): 221-227; J. A. Centra, Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness. San Francisco: Jossey-Bass, 1993; Larry A. Braskamp and John C. Ory, Assessing Faculty Work: Enhancing Individual and Institutional Performance. San Francisco: Jossey-Bass, 1994; William E. Cashin and Richard G. Downey, "Using Global Student Ratings for Summative Evaluation." Journal of Educational Psychology 84 (1992): 563-572; William Cashin, R. G. Downey, and G. R. Sixbury, "Global and Specific Ratings of Teaching Effectiveness and Their Relation to Course Objectives: Reply to Marsh." Journal of Educational Psychology 86 (1994): 649-657; Kenneth A. Feldman, "Instructional Effectiveness of College Teachers as Judged by Teachers Themselves, Current and Former Students, Colleagues, Administrators and External (Neutral) Observers." Research in Higher Education 30 (1989): 583-645.
5. This question is important if the teaching is supposed to stimulate continued student interest and possible learning in the course, one of the ways the course might have a sustained, substantial, and positive influence on the way students think and act.
6. The language of "rating" is used throughout to emphasize the notion that students offer ratings, not evaluations.
7. See, for example, Herbert W. Marsh, "Students' Evaluations of University Teaching: Dimensionality, Reliability, Validity, Potential Biases, and Utility." Journal of Educational Psychology 76 (No. 5, 1984): 707-754; Kenneth A. Feldman, "Course Characteristics and College Students' Ratings of Their Teachers: What We Know and What We Don't." Research in Higher Education 9 (No. 3, 1978): 199-242.
8. Students who take courses to satisfy general interest or as a major elective tend to give higher ratings; students who take courses to satisfy a major requirement or a general education requirement tend to give lower ratings. See, for example, Herbert W. Marsh and M. Dunkin, "Students' Evaluations of University Teaching: A Multidimensional Perspective." In J. C. Smart, editor, Higher Education: Handbook of Theory and Research, Volume 8. New York: Agathon, 1992: 143-233.
9. Peter A. Cohen, "Effectiveness of Student-Rating Feedback for Improving College Instruction: A Meta-Analysis of Findings." Research in Higher Education 13 (1980): 321-341.
10. See, for example, George Howard and Scott Maxwell, "Correlation Between Student Satisfaction and Grades: A Case of Mistaken Causation." Journal of Educational Psychology 72 (December 1980): 810-820.
11. See, for example, H. W. Marsh, "The Influence of Student, Course, and Instructor Characteristics in the Evaluations of University Teaching." American Educational Research Journal 17 (Summer 1980): 219-237.
12. Kenton Machina, "Evaluating Student Evaluations," 20.
13. The literature on the correlations between grades and student ratings is long and complex. As noted in the text earlier, student ratings tend to be slightly higher if students expect to receive higher grades. But this does not necessarily mean that grade leniency accounts for the differences that have been noticed. Some research suggests that the differences arise because students give higher ratings if (1) they are highly motivated and (2) they are learning more and can thus expect to get higher grades. See, for example, Howard and Maxwell, "Correlation Between Student Satisfaction and Grades: A Case of Mistaken Causation," 810-820; and George Howard and Scott Maxwell, "Do Grades Contaminate Student Evaluations of Instruction?" Research in Higher Education 16 (1982): 175-188. The best way to determine whether a course is leniently graded is probably through a review of course materials and of the methods and practices used to evaluate students. Lenient grading, however, does not necessarily mean less learning. Because different faculty members assign letter grades by different standards, the only way to determine levels of learning is to look in detail at actual student performances (the papers they write, the types of questions they can answer, the problems they can solve, the performances they give) and the way those performances change over time; mere class grade point averages cannot provide that information.
14. See, for example, Larry A. Braskamp, Dale C. Brandenburg, and John C. Ory, Evaluating Teaching Effectiveness: A Practical Guide. Beverly Hills: Sage Publications, 1984; Kenneth A. Feldman, "The Significance of Circumstances for College Students' Ratings of Their Teachers and Courses." Research in Higher Education 10 (No. 2, 1979): 149-172.
15. See, for example, Robert M. Kaplan, "Reflections on the Doctor Fox Paradigm." Journal of Medical Education 49 (March 1974): 310-312; Donald H. Naftulin and John E. Ware, Jr., "The Dr. Fox Lecture: A Paradigm of Educational Seduction." Journal of Medical Education 48 (July 1973): 630-635; H. W. Marsh, "Experimental Manipulations of University Student Motivation and Their Effects on Examination Performance." British Journal of Educational Psychology 54 (June 1984): 206-213.
16. Naftulin and Ware, "The Dr. Fox Lecture," 630-635; Braskamp et al., Evaluating Teaching Effectiveness; Feldman, "The Significance of Circumstances for College Students' Ratings of Their Teachers and Courses," 149-172.
17. Special Note: The research has found a very high positive correlation between student comments and ratings on the kinds of results questions recommended here, suggesting that student comments will not provide the evaluator with any evidence for a judgment that is not available from the ratings. See, for example, John C. Ory, Larry Braskamp, and D. M. Pieper, "Congruency of Student Evaluative Information Collected by Three Methods." Journal of Educational Psychology 72 (1980): 181-185.
II. Student Ratings for Formative (Self-Improvement) Purposes
The questions noted
above can provide an evaluator with extremely valuable information with which to make a judgment about the quality of teaching.
There are some additional questions that might be used to help professors improve their teaching. Such questions ask about
student perceptions of particular methods of teaching: Did the instructor communicate well; was the instructor available and
willing to provide assistance outside of the classroom; was the course carefully planned and well organized; and so forth?
Research has found that feedback collected in the first half of the term can help instructors improve the ratings
they receive at the end of the term, and can greatly improve those end-of-term ratings when consultation accompanies the feedback.9 The Research Academy for University Learning currently offers a service
called a "Student Small Group Analysis" designed to help instructors collect detailed feedback from their students
during the term and receive consultation on the results.
Yet questions about methods should not be used
for summative evaluations (personnel decisions) because they ask about processes of achieving good teaching while the questions
in section one concentrate on assessing the results. In other words, one might get high marks on "how much students learned"
and low marks on, say, being "readily available and willing to provide assistance outside the classroom" or on "the
course was carefully planned and well organized." We might then argue that this person, nevertheless, did excellent teaching
(helped and encouraged students to learn), but did so while ignoring some conventional wisdom on how best to teach. Conversely,
one might get high marks on all of the process questions, and still fail miserably as a teacher (not help students learn anything
worth learning as defined by the curriculum and the school or in ways that make a sustained and substantial difference). Process
questions do not tell us anything that result questions cannot, except perhaps that a person used this-or-that process, but
they can be misleading, potentially punishing those who achieve good results with unorthodox methods, or who teach in fields
in which some conventional methods are not appropriate.

III. Limitations of Student Ratings
Two objections to the use of student ratings
for summative purposes often emerge. One objection argues that teachers can buy higher ratings with higher grades for the
students, thus corrupting the evaluation of both students and faculty. Yet considerable research on the subject has found
that students do not automatically give higher ratings to classes in which they receive the highest grades.10 Indeed, the highest
marks often go to the most challenging courses. Furthermore, researchers using multiple regression analysis and path analysis
to study the influence of various factors on the outcome of student ratings have found that expected grades account for only
a tiny share (2.6 percent) of the variation in ratings. Other factors account for more:11 prior subject interest, 5.1 percent; workload/difficulty, 3.6 percent (notice
that, despite popular conceptions to the contrary, the latter factor is positively related to student ratings; that is, more
difficult well-taught classes receive higher marks).
Perhaps, as one observer put it, "what
matters is what faculty think, not what is true. . . . If faculty believe, no matter how erroneously, that lowering of standards
will produce higher student [ratings]. . ., then faculty will live out that belief. They will lower standards and have guilty
consciences, or they will hold the line on standards and feel victimized or virtuous--all on the basis of what they believe
to be the connection between [ratings]. . . and standards."12 Thus, some understanding of the research on student ratings is essential. NYU faculty members have a long history of using
student ratings and understand that intellectually challenging courses graded with high standards will produce the best results.
The university can help new faculty members develop that same appreciation. A second objection
arises from the belief that some students will use the rating process as an opportunity to punish teachers, presumably over
some low grade or other factor unrelated to teaching quality. While there may be isolated instances of such behavior, the
extensive research on student ratings has found that such cases are largely spurious or so infrequent that they do not corrupt
the process. Indeed, research has found that student ratings tend to be higher when the directions say the results will be
used for personnel decisions than when the form indicates that the results will go only to the instructor. This is
certainly not the pattern of students determined to "punish" the instructor. [Would student ratings go up at MSU
if more students became convinced that the ratings really do matter in personnel decisions?] Furthermore, student ratings
do not automatically go down with lower grades or up with higher ones;13 they have both high internal consistency and high rater stability over time.14 Yet it is possible to ask questions that tend to corrupt the process. Perhaps the least reliable
question is one that asks little more than "How much did you enjoy the class?" The early Dr. Fox experiments demonstrated
that if surveys ask some form of that query, student responses may or may not tell us much about the quality of teaching (such
a question can be phrased in all sorts of ways, of course). While students usually do enjoy courses that are the most intellectually
challenging and meaningful, they can also report that they enjoy a particular class that contributes little to their learning.15
Yet--and this is extremely important--when surveys ask those same students to assess their learning in that particular class
or to provide an overall rating of the instructor and course or to assess its intellectual contributions, the students, as
a group, are able to distinguish "fluff" from substance. Equally important, what is true of
the whole is not necessarily true of every part. Exceptions exist for all of the generalizations noted here. Student ratings
can be wrong,16 although they may err more on the side of too much praise than too little. Students are not always well equipped to judge the course as an intellectual product, to determine whether it is appropriate
to the curriculum or sufficiently rigorous. We must use other sources of information, along with the results of student ratings,
to clarify the picture. Student ratings can provide valuable information, but they cannot always tell evaluators everything
needed to make valid, reliable assessments of teaching effectiveness. Evaluators should use student rating
data along with information from other sources to evaluate teaching. The AAHE Peer Review Project has developed some highly
effective ways to collect such additional information. That project treats teaching as a serious intellectual enterprise and
courses as important intellectual creations, and it emphasizes ways to assess both the nature of the course and its intellectual
influence on students. As the Peer Review Project recognizes, instructors can provide valuable information to evaluators,
not simply to say how good or bad they have been but to make a case. That case should be an argument (with supporting evidence)
(1) that the instructor tried to help students achieve certain intellectual (physical or emotional) goals and (2) that the
effort had a certain success (or failure). If good teaching helps and encourages students to learn something worthwhile and
in a way that makes more than a passing difference in the way students think and act, what evidence can the instructor offer
about the value of the content (the learning objectives) and about the success or failure of efforts to help students achieve
those objectives? Can the instructor offer evidence that the effort to help students learn was somehow worthy even
if students did not learn? Evaluators must decide whether the objectives are valuable and the effort
to help and encourage students to learn is sufficiently successful (or commendable despite its failures). Student ratings,
evidence of students' work, information about assignments, and so forth can provide evidence to support claims from the
instructor and help evaluators make judgments.
To make the best use of the data from student
rating forms, evaluators need to understand and apply the major findings on what factors will influence ratings (e.g., level
of the course, student motivation, etc.), what differences the influences will make, and what factors will not influence ratings
significantly (e.g., time of day when the class is taught, the student's grade point average, etc.). They must consider
carefully what the student ratings will and will not tell us about the results of the teaching (about whether it actually
helped and encouraged students to learn).
IV. A Summary and Comments on Formative Evaluations
A successful evaluation system should help teachers
improve, not just provide evidence for summative judgments. While the five results questions and the four demographic questions
discussed in section one should tell evaluators everything they need to know from students about the success of the teaching,
such questions may not tell instructors how they can enhance their teaching. Professors may need to ask additional "formative"
questions that will help them identify specific strengths and weaknesses. Comments from students can also help instructors
improve. Each instructor should have the opportunity to add one or more formative questions, the results
of which would go only to the instructor. A central office could maintain a bank of such formative questions from which each
instructor could choose, or an instructor could write his or her own questions. Instructors could ask about specific assignments
or particular lectures or discussions. Every form could also contain some standard open-ended questions, the results of which
could go only to the instructor:
- What are the primary teaching strengths of the instructor?
- What are the primary weaknesses of the instruction? Can you offer suggestions for improvement?
- Did the course help you learn? Why or why not? If so, what did it help you learn?
Finally, each form could contain at least one question, the results of which for each class could be reported to
individual instructors while departmental and/or school averages on this question could be reported to chairs, deans, and
the Provost:
- On average, I spent the following number of hours per week on this course: (a) 0-3; (b) 4-7; (c) 8-11; (d) 12-15; (e) 16-19; (f) 20+
The results of such a question might be valuable
in, for example, a study of how students spend their time, but should not be used in evaluating teaching because it measures
process rather than learning results.
Copyright(c) by Kenneth R. Bain
Used by Permission