"Improvements in the evaluation and improvement of teaching must rest on carefully considered views
of what counts as excellence in university pedagogy . . . Departments, schools
or programs should generate their own statements of mission and, hence, of teaching excellence in their field. No single statement
will adequately cover all fields, nor should anyone presume to prescribe a universal set of standards."
Toward Great Excellence in Teaching at Stanford

While in the twentieth century academic eminence often came solely from reputations in research and
published scholarship,(1) it seems likely, given important changes that are already taking place among the great institutions
of higher learning, that such a standing in the twenty-first century will come only if the university excels in both research
and teaching and recognizes how these two enterprises can complement each other. Rather than thinking in terms of the
traditional dichotomy of research and teaching, a separation that often paralyzed higher education in the twentieth century,
we can begin to think of ourselves as a learning community concerned with the learning of both faculty (research) and
students (teaching) and the ways in which the learning of one can benefit the other.
Yet a university
can achieve greater eminence in neither teaching (the cultivation of student learning) nor research (faculty learning) unless
it has a system to evaluate each enterprise. While methods to evaluate the latter are long established in the academic community,
methods to evaluate teaching are not, in part because many academics have been convinced that there is no good way to
evaluate teaching. Accordingly, teaching excellence has perhaps not been as visibly promoted and rewarded among the faculty
as the enterprises of research and traditional scholarship have been. Every university must rectify this situation if it is
to achieve high standards. The changing expectations of great universities will allow nothing else.
We propose in what follows a concrete and systematic set of guidelines to identify, recognize, and reward outstanding teaching
within a university. Drawing upon various sources, including material developed in the American Association for Higher
Education’s Peer Review Project, we sketch out here the practices and insights that must play a role in any comprehensive
evaluation program, designing a set of flexible guidelines and suggestions to evaluate teaching for summative purposes.
We envision this report serving as a template for the evaluation process, from which administrators in individual schools
and departments can formulate procedures that are specific to their needs and expectations. We have included in what
follows some of the research from the theoretical literature on evaluating teaching, but we have striven primarily to offer
a very practical and useful set of guidelines and suggestions.
Before laying out that plan,
one final and extremely important introductory note: the first step in implementing a serious and meaningful program of teaching
evaluation is to conceptualize teaching as a form of scholarship, something most university faculty members already do.
Scholars have long believed, as physicist Robert Oppenheimer said in 1954, “It is proper to the role of the scientist
that he not merely find the truth. . .but that he teach, that he try to bring the most honest and most intelligible account
of new knowledge to all who will try to learn.” But it was Ernest Boyer’s 1990 report Scholarship
Reconsidered that took these ancient commitments of the scholar to teaching and carried the idea an additional step.
He argued that teaching is not merely a logical outcome of scholarship but is most properly thought of as a form of scholarship,
along with the scholarships of discovery [what we normally call research], integration, and application. Teaching as scholarship
implies that we recognize that the creation and manifestation of a course are challenging, creative, and consequential intellectual
tasks and that every course we craft is a lens into our field and our personal conception of our disciplines or inter-disciplines.
As Russell Edgerton, Pat Hutchings, and Kathleen Quinlan wrote in their discussion of the scholarship of teaching, “At
bottom, the concept entails a view that teaching, like other scholarly activities . . . relies on a base of expertise, a ‘scholarly
knowing’ that needs to and can be identified, made public, and evaluated; a scholarship that faculty themselves must
be responsible for monitoring.” Lee Shulman, Boyer’s successor as President of the Carnegie Foundation for
the Advancement of Teaching, argued that teaching is the highest form of scholarship because it, unlike any of the others,
necessarily entails all of the others. “Indeed,” Boyer wrote, “as Aristotle said, ‘Teaching is the highest
form of understanding.’”
If we recognize the creation and manifestation of
a course as a piece of intellectual work as challenging, creative, and consequential as a piece
of published scholarship, then we will begin to recognize teaching as a practice that demands both serious evaluation and
meaningful rewards. If we recognize, as Boyer wrote, that “knowledge is acquired through research, through
synthesis, through practice, and through teaching,” we can see that teaching is properly thought of as a form
of scholarship that deserves meaningful recognition, evaluation, and reward. To think in these terms means that
we must expect the scholarly qualities of our teaching to meet the highest standards and that, in our search for a method
to evaluate teaching, we must find ways to identify and assess the intellectual or artistic qualities of our
teaching.
What Kind of Evaluation and for What Purpose?
Evaluation of teaching might be done in a variety of circumstances and for several different reasons.
We might evaluate to help someone improve their teaching (what the literature often calls "formative" evaluations),
or we might do so to help make judgements about hiring, retention, merit pay, awards, or promotions (what the literature often
calls "summative" evaluations). Faculty members and departments should already be fostering formative evaluations.
Many schools have already implemented regular mechanisms and means through which faculty members can obtain feedback on, and
work towards improving, their own teaching.
Our concern in this report, however, is primarily
with the evaluation of teaching for summative purposes.(2) Summative evaluations of teaching can play a substantial role in
decisions made 1) to hire someone to the faculty; 2) to retain, promote, or tenure a faculty member; or 3) to offer merit
pay raises to faculty members.
Overview: Evaluation Questions
There are two fundamental and simple questions that we should ask before we begin considering how to evaluate teaching:
1) What questions do we want to answer?
In other words, what are
we evaluating? Are we interested in the teacher's ability to motivate students? To have a sustained and substantial influence
on a student's intellectual development? To foster a sense of community in the classroom? To prepare students for external
exams of some sort? All of these elements? The evaluation process must begin with an answer (or answers) to this question,
which defines the terms and objectives of that process.
Our response to this question, based
both on practical experience in the classroom and extensive research in the learning sciences, runs like this: "Does
the teacher help and encourage students to learn something worth learning in a way that makes a sustained, substantial and
positive difference in the way they think, act, or feel, without doing the students any harm?" This answer breaks down
into four essential components:
A. Is the teacher's material worth learning? Has the teacher
identified questions worth answering? Reasoning skills worth obtaining? Abilities worth developing? Do the teacher's course
objectives and course materials offer a window, however wide or narrow, into the fundamental problems and questions of that
teacher's discipline? Will the learning objectives, if achieved by students, prepare them adequately for additional study?
It is here, more than with the questions to follow, that we can begin to define and assess the
scholarship of teaching, to apply scholarly standards to the evaluation of teaching. Are the information, ideas and practices
that students are expected to learn both appropriate to the level of the course and important within the discipline, either
as gateways to ongoing learning of the field or as important parts of the field on their own? Does what is taught show that
the professor has made the choices of learning objectives with an adequate understanding of the existing scholarship in the
field? Do the content and the professor's teaching of that content reflect an adequate understanding of the existing
scholarship in the field?
B. Are the students learning what the course is supposedly teaching?
Do exams, projects, and classroom performance reflect students' mastery of the course material? Have the students gained
or honed the intellectual, physical, or emotional skills the teacher's course objectives promised to them? Has the course
had a sustained and substantial influence on how they think, act, or feel?
C. Are the teacher's
strategies effective in helping and encouraging the students to learn? Has the teacher successfully motivated their interest
in the course? Do his or her teaching strategies and techniques actually facilitate student learning? Or are students learning
in spite of the teacher?
D. Has the teacher avoided doing harm to students? Or has the teacher
done harm to the students by simply fostering short-term learning with threats and intimidation tactics in the classroom?
Has the instructor inappropriately discouraged rather than stimulated additional interest in the field? Has he or she conducted
the class in an ethically appropriate manner? Has the teacher handled student diversity with sensitivity, respected students'
right to dissent and disagree, and allotted to the course sufficient time and energy to teach the material effectively?(3)
Perhaps most important, has the instructor evaluated student learning fairly and accurately? Has
that evaluation been based on the students' ability to achieve the stated learning objectives? Or has it been based on
some other standard, in essence redefining the objectives (perhaps along lines that might be deemed inappropriate or inadequate)?
These questions are probably essential ones for evaluation in any imaginable discipline and teaching
situation, but individual administrators certainly might have additional questions that they would like to see answered as
part of the evaluation process.(4)
There are certain questions, however, that we would insist
need not be answered. Any questions that try to evaluate aspects of the teacher's delivery skills would be inappropriate
for an evaluation process that is interested in assessing a teacher's ability to help and encourage student learning.
Teachers need not be entertainers or performers to be effective; indeed, many outstanding teachers foster student learning
most effectively by remaining off-stage as much as possible. While performance can certainly affect student learning, it is
not the performance per se that should be under review but the success or failure of that performance in helping and encouraging
students to learn. In short, the evaluation should not judge how the instructor helps and encourages students to learn something
worth learning (as long as no harm is done) but whether such help and encouragement exist.
This
leads to a second set of issues that the evaluation process should not address: the worth and quality of various teaching
methods and strategies. The evaluation process will lose credibility if it implicitly or explicitly endorses any one teaching
method as the most effective tool for enhancing student learning. The research on teaching and learning continually insists
that no teaching method trumps all others. As Braskamp and Ory have written, "no single instructional strategy is always
superior to any other . . . . Faculty who lecture are not necessarily better teachers than faculty members who use discussion
techniques" (Ory, 18). Hence faculty members should not necessarily be rewarded or punished for using either traditional
or experimental instructional techniques. The evaluation should focus upon the ability of the teacher to help and encourage
students to learn, whatever methods the teacher uses.(5)
We can, however, imagine a situation in
which a department or school or a university might want to reward someone for conducting a carefully studied experiment that
might inform and benefit colleagues even if the results did not produce better learning. Indeed, the evaluation system we
outline here can and should identify and reward efforts to test methods of university teaching, enterprises that contribute
to a better understanding of what and how we should teach.
2) What will count as evidence in answering those questions?
What materials will give you satisfactory answers to your evaluative
questions? While answers to this question will vary, depending upon your initial evaluative questions, what we can say with
absolute certainty is that no one source of data--either self-evaluations, or student ratings and comments, or the observations
of a peer--will provide you with sufficient evidence for a summative judgment. Any evaluative process must rely on multiple
sources of data, which are then compiled and interpreted by an evaluator or evaluative committee. Student remarks and ratings
on rating forms, in other words, are not evaluations; they are one set of data that an evaluation process should take into
consideration. The same can be said for self-evaluations, and the results of peer or administrative observations. In short,
different sources of data, or evidence, will be appropriate for different evaluative questions. In the next section, we consider
the kinds of evidence that might be appropriate for the four parts of the evaluative question we posed above.
Once you have determined both the questions you wish to answer and the evidence which will provide you with answers to these
questions, a final step remains: selecting the evaluators, and considering how you will train them to make informed and reflective
decisions. It seems evident that more than one individual should evaluate individual teachers, and that evaluation committees
should be composed of both administrators and peers. Precisely how those evaluators should be selected is a complex question
that requires ongoing discussion.
But it also seems evident that evaluators should have some
means of communicating with each other about, and reaching a loose consensus on, the standards of evaluation. That process may
take the form of a brief training program, sponsored by experts in the evaluation of teaching, or it may simply take the form
of group discussions prior to the evaluative process. Each member of the evaluative committee could spell out the standards
he or she intends to apply, and then the group could work together to compile standards acceptable to the entire committee.
Evaluation will be most successful, effective, and valid if some minimum degree of consensus on standards is achieved prior
to the evaluation process.
Finally, as Larry A. Braskamp, Dale C. Brandenburg, and John C. Ory
have argued in Evaluating Teaching Effectiveness: A Practical Guide, "we should regard [evaluation] as a form of argument"
(8). Evaluation, like research, is not simply about numbers. It involves presenting a case: advancing an argument about the
effectiveness of a teacher, presenting evidence to support that argument, and submitting that argument to the judgment of
one's peers. We have no numerical scale to determine the ultimate worth of a particular piece of research; the same applies
to teaching. But we do evaluate research, relying upon our own self-scrutiny, as well as the careful and experienced judgment
of peers, publishers, and the reactions of our reading audiences. Teaching is no different: we can evaluate teaching, and
we can do so by relying upon our own self-examinations, as well as the careful and experienced judgment of our peers, administrators,
and our students.
1. Research reflects what Ernest Boyer called the scholarship of discovery, which in Boyer's terms and in the
terms we use later in this report represents only one aspect of a broad definition of scholarship. See Ernest L. Boyer, Scholarship
Reconsidered: Priorities of the Professoriate. The Carnegie Foundation for the Advancement of Teaching, Princeton: 1990.
2. As Pat Hutchings, director of the AAHE Teaching Initiative, has argued, evaluation done for
formative purposes may eventually feed into evaluation for summative decisions. Material that faculty members gather for their
own self-improvement could easily become part of the portfolios which they submit to chairs and deans for summative evaluations.
An obvious caution here is that material gathered for formative purposes should not be required for summative evaluations;
requiring--or even the threat of requiring--faculty members to submit information gathered for self-improvement for summative
decisions would potentially stifle their willingness to pursue self-improvement. See Hutchings, "Peer Evaluation and
the Review of Teaching" (14-5).
3. While we assume that it is harmful to discourage
or "kill" students' interest in a subject, we must be careful to distinguish such intellectually damaging outcomes
from careful and sensitive efforts to guide students into certain fields and careers and away from others because of their
abilities, and to distinguish between the destruction of curiosity and interest and the necessary and proper use of high standards
that may keep some students from pursuing study in a particular area because they have not demonstrated sufficient ability
or preparation.
4. It should be noted here that students sometimes fail to learn for
reasons beyond the teacher's control. Students may have some exceptionally strong bias against the subject matter,
or may be experiencing personal problems, or some external political or social issue on or off campus may disrupt the learning
process, etc. Isolated instances of this should not harm a teacher's evaluation, but if it happens consistently with one
instructor it is likely indicative of some problem with that instructor's methods.
5. Here we encounter one
of the important distinctions that might be made between questions that may be valuable for a formative process and ones that
work best for summative evaluations. If an instructor is trying to identify some changes in teaching behavior that could improve
efforts to help students learn, it may be useful to ask about specific performances. But a failure to do well with a particular
type of teaching performance does not necessarily indicate that the teacher has not helped and encouraged students to learn
something worth learning and appropriate to the course and discipline. Therefore, the results of questions about means should
not be used to judge the quality of teaching.
6. See James Lang and Ken Bain, "The
Teaching Portfolio," The Teaching Professor (December 1997): 1.
7. It is the conception of
the teaching portfolio as a "container" that has created documents with no consistency across faculty members or
disciplines, leaving administrators and promotion and tenure committees understandably frustrated and reluctant to rely heavily
on them to evaluate teaching. At Northwestern University's 1993 Focus on Teaching Conference, then-Provost David Cohen
explained that the material he receives from faculty members on their teaching is not very well-defined or consistent across
schools or departments. As a result, "while the dossiers are enormously improved with respect to the level of attention
to teaching, the documentation is not working very well" (Proceedings, 7). He implicitly calls for the kind of guidelines
we are setting out here.
8. Research has also found that we must be extremely careful what
we ask students because they will answer truthfully. Thus, if we ask them a form of the question "did you enjoy the course,"
they may provide accurate responses, but their answers may say little about how much they learned.
9. We realize that such a model will not work for all faculty members because some people may teach only one course or have
such limited experience outside one course that it would not make sense to divide matters in the fashion we have suggested.
In other instances, a faculty member's entire teaching may consist of participation in a series of courses rather than
responsibility for any one course. Departments and schools must adjust the expectations accordingly.
10. Encouraging the use of the statement of teaching philosophy will encourage faculty members to become what Donald A. Schon
has called "reflective practitioners." A reflective practitioner continually examines and reexamines his or her
practices, modifying or adjusting them in the light of new information and experiences. The reflective practitioner represents
practical artistry at its best, and can handle skillfully what Schon describes as those "indeterminate zones of practice"
(36) which we so often encounter in our classrooms: difficult and unexpected situations which can become, in the hands of
a thoughtful and experienced teacher, moments of intensive teaching and learning. Peter Seldin, in the AAHE Bulletin, also
recommends highly such uses of faculty self-assessment: "The trend toward wider and more structured information gathering
is reflected in the growing popularity of self-assessment. Many academics--administrators and faculty alike--are convinced
that self-evaluation provides useful insights into course and instructional objectives as well as classroom competency"
(12).
11. See, for example, Herbert W. Marsh and M. Dunkin, "Students' Evaluations of University
Teaching: A Multidimensional Perspective," in J. C. Smart, editor, Higher Education: Handbook of Theory and Research,
Volume 8. New York: Agathon, 1992: 143-233; and H. W. Marsh, "The Influence of Student, Course, and Instructor Characteristics
in the Evaluations of University Teaching," American Educational Research Journal 17 (Summer, 1980): 219-237.
12. See,
for example, Howard and Maxwell, "Correlation Between Student Satisfaction and Grades: A Case of Mistaken Causation":
810-820; and George Howard and Scott Maxwell, "Do Grades Contaminate Student Evaluations of Instruction?" Research
in Higher Education 16 (1982): 175-188.
13. The Carnegie Foundation for the Advancement of Teaching
offers the following additional guidelines first made by researcher John Centra: "examining several sets of evaluation
results for each professor for patterns or trends; making sure that a sufficient number of students evaluate each course;
considering course characteristics and comparative data when interpreting results; relying primarily on global or summary
items (rather than questions about specific aspects of the scholar's teaching) for purposes of personnel decision [the
core questions are such global questions]; and not overestimating the importance of small differences in scores." John A. Centra,
Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness. Jossey-Bass, San Francisco: 1993:
89-90. Cited in Charles E. Glassick, Mary Taylor Huber, and Gene I. Maeroff, Scholarship Assessed: Evaluation of the Professoriate.
An Ernest L. Boyer Project of the Carnegie Foundation for the Advancement of Teaching. Jossey-Bass, San Francisco: 1997: 47.
Evaluating Teaching: Blueprints
How can we use these ideas to carry out a specific evaluation process? We expect an evaluation
to answer the question: Does the teacher help and encourage students to learn something worth learning without doing the students
any harm? What will count as evidence to answer that question? How will that evidence be collected and presented? Will we
require different kinds and levels of evidence depending on the purpose of the evaluation?
We
suggested above that evaluation involves an argument. What we describe here about the nature of the argument may remind some
of the much discussed "teaching portfolio," but we deliberately avoid using the term "portfolio" because
too often it has been thought of as a kind of container into which a faculty member simply pours all of the products and descriptions
of his or her teaching.(6) We have in mind a case, complete with supporting evidence, that the faculty member would make about
her or his efforts to help and encourage students to learn something worth learning (without doing them harm).(7) That case
would, in fact, be a series of arguments, with supporting evidence, that answer each of the questions that the department
and school have decided are important. For example, such a case might provide an answer to the following questions: What have
you tried to help and encourage students to learn? Why are those learning objectives worth achieving for the course you are
teaching? What strategies did you use? Were those strategies effective in helping students to learn? Why or why not? What
did your students learn as a result of your teaching? [If they are not learning what you want them to learn, why not?] Did
you stimulate their interest in the subject?
Those arguments would require careful and rigorous
thought on the part of the teacher. Rather than simply gathering material--student ratings, syllabus, etc.--and sending it
to the evaluator, the faculty member under review would offer synthetic and carefully organized arguments. Thus, the burden
of establishing connections with the evidence and offering coherence throughout would fall on the teacher under review rather
than on the evaluator. If the teacher merely submits a container of documents that lack coherence, then the argument for teaching
effectiveness has simply not been made.
This approach has the positive benefit of allowing teachers
to assume control over what aspects of their teaching are subject to evaluation. Properly conceived and overseen, the case
(or argument) on teaching quality can also help ensure that all faculty members are subjected to the same high standards.
If we require each faculty member to submit some evidence in each of the four categories we discuss here--though the precise
nature of that evidence will be the teacher's decision--then we will ensure both flexibility and a certain degree of continuity
in the evaluation process.
Because faculty members must build a case and evaluators must decide
whether the case has demonstrated the existence of good teaching, we turn again to the question of what constitutes evidence
for each major type of question we have proposed.
A. Is the material worth learning and appropriate to the course and curriculum?
There are really two types of evidence here: evidence that comes
from the teacher and evidence that comes from an outside review:
1. Evidence from the teacher:
A key piece of evidence is a statement from the teacher of what he or she has helped students to do intellectually,
physically, or emotionally. That statement, in turn, might be supported with references to a lesson plan, lecture, PBL session,
clerkship, syllabus, specific assignments or patterns of assignments, or assessments that reflect the learning objectives. Perhaps
the most important evidence the teacher can provide, however, is not simply a statement that certain objectives have been pursued but
evidence from examinations, assignments, problem sets, and so forth demonstrating that students were ultimately evaluated
for their ability to meet those objectives rather than on the basis of other considerations; after all, the basis on which
students are actually graded reveals the true nature of a course's learning objectives.
2. Evidence from an outside review: The individuals
most qualified to judge the quality and adequacy of the learning objectives would be the instructor's peers: other members
of his or her discipline, preferably even within that instructor's area of specialty. Just as each discipline has its
own protocols and criteria for evaluating research, so will each have its own standards for evaluating learning objectives.
The faculty member would work with the department to solicit from another expert in the field an initial judgement of the
learning objectives that the faculty member has defined. This is especially important for the case on a single course, but
can also be done for the general case on teaching quality. The outside reviewer would look at the case offered by the faculty
member, including the evidence reflected in the way students are evaluated, to offer a review. That review of objectives then
becomes part of the evidence that other reviewers will see. Thus, the first level of evaluation actually develops additional
evidence that other evaluators will consider. Once that evidence has been created, the departments would work with the faculty
member under review to select both internal and external reviewers who will receive the completed cases and make evaluations
of the teaching. The evaluations from each level of review should become part of the growing portfolio that other, subsequent,
reviewers will see.
B. Are the students learning the material?
As noted earlier, the best kind of evidence of student learning comes from examples of student work. Faculty members must
consider carefully what makes them think they have helped students learn. What evidence best illustrates the level of student
learning? If students have not progressed in their ability to do whatever they are trying to do intellectually, physically,
or emotionally, what does their lack of progress suggest, if anything, about the quality of teaching? We recognize that special
circumstances beyond the control of the instructor may keep students from learning. The instructor must simply explain
the levels of student learning, connect those explanations to the evidence, and provide examples of that evidence. One important factor
we must keep in mind: this question calls for evidence not just of superior student achievements but of improvements in student
performances; that is, evidence that the instructor has made a difference.
There is, however,
an additional category of evidence available to every instructor. There are questions that one can ask students on a rating
form that can provide important evidence about the level of student learning. Considerable research suggests that students
are actually quite accurate judges of their own learning, if we ask them the right questions.(8)
Given good questions, students obviously can provide prima facie evidence about the degree of their own intellectual stimulation
or the decline or growth in their interest in a subject. Thus, the student responses to the question "Rate the effectiveness
of the teacher in challenging you intellectually," and the question "Rate the effectiveness of the instructor in
stimulating your interest in the subject" provide important evidence about student "learning" in the broadest
sense. The evidence from the question "Estimate how much you learned" is a little trickier. We would not regard
student self-reports of their learning as sufficient evidence to evaluate their learning because of the possibility of self-serving
responses. But research has found that if, on a form to rate teaching, we ask students to estimate how much they have learned,
their responses usually have a high positive correlation with independent measures of their learning. Thus, while examples
of student work remain the best evidence of their learning, the response to the "Estimate learning" question can
usually provide a strong indication of the level of overall student learning in a class and, thus, they can help answer an
important question about teaching.
The class "average" responses to each of these
questions can reflect the levels of student "achievements," but we want to offer a word of caution that is so important
that we will repeat it in other sections. Averages can emerge from a variety of distributions of ratings. They may come from
ratings clustered fairly close to the mean, or from a combination of high and low ratings. Each
distribution might suggest something quite different about the success of the teaching. In the former case, the instructor
might be only marginally successful in reaching everyone, while in the latter, the instructor may be highly successful in helping
many students but fail completely with others. If evaluators look only at averages, they may fail, for example, to recognize
the qualities of a course that is highly successful in helping some students achieve spectacular results yet suffers the wrath
of a disgruntled few. Rather than asking only about the averages, both faculty members and evaluators should look at individual
ratings and ask: Has the teaching been highly successful in reaching anyone? What percentage of the class? How many students,
if any, reported that they learned a great deal? Or were challenged intellectually? How many did not? Why not?
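To make the point about averages concrete, here is a minimal Python sketch. It assumes a hypothetical 1-5 rating scale (5 meaning, say, "learned a great deal") and uses invented ratings for two classes; the function name `summarize` and its thresholds are illustrative only and are not part of any rating form described here. Both classes have the same average, yet the share of students giving the top rating, and the share giving low ratings, differ sharply.

```python
from collections import Counter
from statistics import mean


def summarize(ratings, top=5, low=2):
    """Summarize one rating question for a class, beyond the average.

    Assumes an integer scale on which `top` is the highest rating and any
    rating at or below `low` suggests the teaching did not reach that student.
    """
    n = len(ratings)
    counts = Counter(ratings)
    return {
        "mean": mean(ratings),
        "distribution": dict(sorted(counts.items())),
        "share_top": counts[top] / n,
        "share_low": sum(1 for r in ratings if r <= low) / n,
    }


# Invented ratings for two hypothetical classes with the same average.
clustered = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]
polarized = [5, 5, 5, 5, 5, 5, 2, 1, 1, 1]

for name, ratings in [("clustered", clustered), ("polarized", polarized)]:
    print(name, summarize(ratings))
```

With these invented data, both classes average 3.5, but no student in the clustered class gives the top rating, while in the polarized class 60 percent do and 40 percent give a rating of 2 or lower. The average alone hides both facts.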
Indeed, throughout the evaluation process, professors and their evaluators should focus on the quality of the learning objectives
and the efforts to help students achieve them rather than on numbers. What does the teaching contribute to student learning?
Does the instructor set ambitious and creative learning objectives that make important contributions to the thinking about
student learning within the discipline? What are the nature and qualities of the learning objectives? Do those objectives
reflect the highest scientific and scholarly standards? Is there any reason to believe that the instructor helps any of the
students achieve that highest quality of work? What quality of work do most students achieve?
C. Is the teacher effective in helping and encouraging students to learn?
Again, important evidence
will come from student responses to the right rating questions. When students respond to questions like "Provide
an overall rating of the instruction," they are indicating how well the instruction reached them educationally and how well
the methods of instruction and the design of the course helped and encouraged them to learn. As already noted, when they respond
to the questions on interest and intellectual challenge, they are indicating how well the instructor and course stimulated their interest in the subject
and challenged them intellectually.
The instructor can also provide important evidence with
a thoughtful self-analysis that explains what he or she has done to promote the specific intellectual, physical, or emotional
abilities that are the goals of the course and why there is reason to believe that those efforts have been successful or unsuccessful.
We are reluctant to recommend peer observations as evidence that the instructor has effectively
helped students learn. Research on the use of such peer observations has found that observers tend to give high marks to colleagues who provide
the same kind of help the observer would offer and lower marks to colleagues who do it differently. Furthermore, observing
only one or two sessions of a course can provide a very distorted picture of what goes on in that classroom on a daily basis
or leave observers with false impressions about the way the instructor understands and explains key concepts. An instructor
might, for example, help students learn complex ideas by exposing them first to simple explanations and then gradually, over several
sessions, unfolding for them the complexity of the concepts. An observer watching only the first iteration of the
idea might believe that the teacher is leaving students with overly simplified notions that distort basic principles when,
in fact, the instructor may have employed a strategy that helped students learn the complexities quite effectively. An
instructor could also, of course, have a bad day precisely when colleagues show up to observe. Most important, we are interested
not in the specific methods the teacher uses but in whether he or she helps and encourages students to learn at an appropriate
level.
A teacher might, however, present, as part of a larger body of evidence about a particular
course (see above), a videotape of one or two class sessions, along with both a written analysis from the teacher and a
review from a colleague. That way the instructor can pick the sessions that best represent his or her efforts to help and
encourage students to learn and that best capture what he or she is trying to teach. Other observers, the students, are in the
class on a regular basis and can provide a broader report on how well the class is going (see the discussion of the use of student
ratings).
D. Is the teacher fostering learning without harming students?
Evidence about possible harms might come from a variety of sources. If the course or instructor discourages interest in the
subject, students will indicate as much in their rating of effectiveness in stimulating interest. If the instructor evaluates
students unfairly by basing the evaluation on abilities different from the stated objectives, that should be apparent from
an examination of evaluation procedures (see above) and, no doubt, reflected in the ratings the students give the instructor.
We are not suggesting that instructors must provide evidence of an absence of harm, but that evaluators should be sensitive to evidence
of harm that does emerge, always insisting on substantial evidence. While we believe that it is necessary to address this
question, we suspect that it will rarely be an issue that needs extensive exposition.
Summary:
We are suggesting that the faculty member think about teaching (in a single session or an entire
course) as a serious intellectual act, a kind of scholarship, and that he or she develop a case, complete with evidence, exploring
the intellectual meaning and qualities of that scholarship. Each faculty member under review for tenure or promotion could
present two cases about his or her teaching: one on a single course and another on teaching in general.(9)
Each case would consist of (1) a narrative (usually 3-5 typed pages)--a statement of teaching philosophy--that would
(a) define what students should be expected to do intellectually, physically, or emotionally, and why
they should develop those abilities;
(b) explain and assess the efforts that have been made
to help students learn, with references to specific aspects of the course (assignments, activities, issues, etc.) that have
been designed to foster and/or assess the learning that is supposed to take place; and
(c) explore
the learning that has taken place (what kind of sustained influence is the teaching likely to have on the way students think,
act, or feel), with references to what the students' work, their responses to the student ratings, or other evidence reflect
about the influence of the course on them.
It would also consist of (2) the evidence referenced
in the narrative (student ratings, syllabus, examples of student work, assignment sheets, videotapes of class sessions with
commentary, etc.).(10)
If we envision the presentation of materials about teaching quality as
an argument, we can conceive of the evaluation of teaching as the evaluation of an argument, and the case becomes the pedagogical
equivalent of the scholarly paper, a way to capture the scholarship of teaching. Hence, as with the traditional scholarly paper, while
the general protocols for conducting the argument should be the subject of university consensus, the final form and content
of the argument should remain the choice of the individual teacher. This conception of the case allows individual freedom
in determining the data of evaluation but still requires careful and rigorous thought on the part of the teacher being evaluated.
He or she must make the argument, complete with evidence, inferences, and conclusions. If, as we noted above, the case lacks
coherence, or is submitted merely as a container of documents on teaching, then the argument for teaching effectiveness has
simply not been made.
We have outlined here a procedure that should work well for most faculty
members, but departments, schools, and the university must decide who will review these cases. As we have already noted, even
in the process of compiling the case and the evidence, the faculty member would work with the department to solicit from another
expert in the field an initial judgment of the learning objectives that the faculty member has defined. This is especially
important for the case on a single course, but it can also be done for the general case on teaching quality. That review of objectives
then becomes part of the evidence that other reviewers will see. Once that evidence has been assembled, the department should
work with the faculty member under review to select both internal and external reviewers who will receive the completed cases
and evaluate the teaching. The evaluations from each level of review should become part of the growing portfolio
that subsequent reviewers will see.
There are some central points that we should repeat
for the sake of emphasis and clarity:
1. The information (ratings and comments) collected from
student rating forms provides evidence that evaluators can use to make judgments about the quality of teaching, provided students
are asked the five core questions. The ratings are not, by themselves, evaluations.
2. Evaluators should
look at the distribution of responses rather than the averages and should keep clearly in mind what each response might suggest
about the success of teaching for the student who offered it. [As we noted earlier, averages can emerge from a variety
of distributions of ratings. They may come from ratings clustered fairly close to the mean. They may come from
a combination of high and low ratings. Each distribution might suggest something quite different about the success of
the teaching. In the former case, the instructor might be only marginally successful in reaching everyone, while in the latter,
the instructor may be highly successful in helping most students but fail completely with others.] Rather than asking only
about the averages, both faculty members and evaluators should look at individual ratings and ask: Has the teaching been highly
successful in reaching anyone? What percentage of the class? How many students, if any, reported that they learned a great
deal? Or were challenged intellectually? How many did not? Why not?
3. Each of the five core
questions provides different kinds of information to the evaluator and must be read with care.
4. Some external factors beyond the control of the instructor can influence the way students respond to certain questions.
These factors should be taken into consideration when using the information to make evaluations. Students who take courses
out of general interest or as a major elective tend to give slightly higher ratings; students who take courses to satisfy
a major requirement or a general education requirement tend to give slightly lower ratings. Prior student interest in the
subject can account for as much as 5.1 percent of a rating. Thus, senior courses filled with students who report high "interest
before taking [the] class" and/or students who are not required to take the class should expect slightly higher ratings
than introductory classes filled with students with low prior interest and/or students who are required to take the
class. Demographic questions provide information on prior interest and similar factors.(11) Evaluators should therefore compare
only faculty members who teach classes with similar demographics on these issues.
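The comparison rule in point 4 can be sketched as a grouping step: each course is compared only with other courses taught in the same context. This minimal Python sketch assumes two hypothetical context variables (whether the course is required, and whether students report low or high prior interest) and invented ratings; the function `peer_comparisons` and the course records are illustrative only. For brevity it compares averages, though the same grouping could be applied to the rating distributions recommended in point 2.

```python
from collections import defaultdict
from statistics import mean


def peer_comparisons(courses):
    """Compare each course's average rating with courses taught in a similar context.

    Each course is a dict with hypothetical keys: 'course' (an identifier),
    'required' (bool), 'prior_interest' ('low' or 'high'), and 'ratings'
    (a list of integer ratings).
    """
    # Group courses by the contextual factors that tend to shift ratings.
    groups = defaultdict(list)
    for c in courses:
        groups[(c["required"], c["prior_interest"])].append(c)

    results = {}
    for context, members in groups.items():
        # Pool every rating in the peer group to get the group's average.
        group_mean = mean(r for c in members for r in c["ratings"])
        for c in members:
            # Positive means above peers in the same context; negative means below.
            results[c["course"]] = (context, mean(c["ratings"]) - group_mean)
    return results


courses = [
    {"course": "INTRO-A", "required": True, "prior_interest": "low", "ratings": [3, 4, 3, 4]},
    {"course": "INTRO-B", "required": True, "prior_interest": "low", "ratings": [4, 4, 4, 3]},
    {"course": "SENIOR-A", "required": False, "prior_interest": "high", "ratings": [5, 4, 5, 4]},
    {"course": "SENIOR-B", "required": False, "prior_interest": "high", "ratings": [4, 4, 5, 4]},
]

for course, (context, diff) in peer_comparisons(courses).items():
    print(course, context, f"{diff:+.3f}")
```

With these invented numbers, SENIOR-B has a higher raw average (4.25) than INTRO-B (3.75), yet INTRO-B sits above its peer group and SENIOR-B below its own. Comparing raw averages across different contexts would hide that reversal.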
5. The literature
on the correlations between grades and student ratings is long and complex. Student ratings tend to be slightly higher if
students expect to receive higher grades. But this does not necessarily mean that grade leniency accounts for the differences
that have been noticed. Research has found that students, in general, tend to give higher ratings to courses they regard as
intellectually challenging and helpful in meeting those challenges and lower ratings to courses that are easy and in which
they do not learn much. Furthermore, students give higher ratings if (1) they are highly motivated and (2) they are learning
more and can thus expect to get higher grades.(12)
6. The best way to determine if a course
is leniently graded is through a review of course materials and of the methods and practices used to evaluate students. Lenient grading,
however, does not necessarily mean less learning. Because faculty members apply different standards when they assign
letter grades, the only way to determine levels of learning is to look in detail at actual student performances
(the papers they write, the types of questions they can answer, the problems they can solve, the performances they give) and
the way those performances change over time; class grade point averages alone cannot provide that information.
7. As with all evaluations, the evaluators should keep in mind the questions they are attempting to answer. We believe that
student ratings on the core questions can provide evidence to help answer the key question: Is the teacher effective in helping
and encouraging students to learn?(13)
Implementing the Program
In devising these recommendations, we have also considered how they might be implemented. To implement this program, departments,
schools, and universities must first identify evaluators, provide them with training, and begin the discussion about the standards
of teaching quality that will be expected of the faculty. Many disciplines have a long history of conversations about what
courses should attempt to help students do intellectually, physically, or emotionally; others have not, but
all departments will have to engage in that conversation as this evaluation process emerges. It is a conversation that will
arise every time an evaluator makes a decision about the learning objectives that a course or instructor tries to help students
to meet. In many disciplines the expectations are well established and fairly exact; in others, they are more general. Some
disciplines resist any attempt to spell out a list of what students should be taught, and rightly so, but all disciplines
have standards of scholarship--intellectual standards--that they can apply to this conversation, in much the same way that
they have always applied those scholarly standards to questions about the quality of research and published scholarship that
faculty members have produced.
Developed by Kenneth R. Bain and James Lang