Slide 1
SULEYMAN DEMIREL UNIVERSITY
FACULTY OF EDUCATION AND HUMANITIES
Assessment
Lecture 3: RELIABILITY (Part 2)
How to make tests more reliable
Slide 2
Take enough samples of behaviour
Other things being equal, the more items you have on a test, the more reliable that test will be.
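The slide does not name a formula for this relationship. One standard way to estimate it is the Spearman-Brown prediction formula, which assumes the added items are comparable in quality to the existing ones. The sketch below is illustrative only; the function name and the numbers are assumptions, not taken from the lecture.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is made `length_factor` times longer.

    Assumes the new items behave like the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Illustrative numbers only: a test with reliability 0.60, doubled in length.
predicted = spearman_brown(0.60, 2)
print(round(predicted, 2))  # 0.75
```

Under these assumptions, doubling the test raises the predicted reliability from 0.60 to 0.75, and each further increase in length brings a smaller gain.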
Slide 3
The more independent the items are, the more reliable the test will be.
Slide 4
Do not allow candidates too much freedom
Slide 5
Candidates should not be given a choice. The range of possible answers should be restricted.
Compare the following tasks:
Sample A
Write a composition on tourism.
Slide 6
Sample B
Write a composition on tourism in this country.
Slide 7
Sample C
Write a composition on how we might develop the tourist industry in this country.
Slide 8
Sample D
Discuss the following measures intended to increase the number of foreign tourists coming to this country:
More/better advertising and/or information (where? what form should it take?)
Improve facilities (hotels, transportation, communication, etc.)
Training of personnel (guides, hotel managers, etc.)
The successive tasks impose more and more control over what is written. The fourth task is likely to be a much more reliable indicator of writing ability than the first.
Slide 10
Provide clear and explicit instructions
This applies to both written and oral instructions.
Do NOT rely on the students’ powers of telepathy.
Read spoken instructions from a prepared text.
Slide 11
Tests must be well laid out and perfectly legible
Make sure your test:
- is not badly typed (or handwritten)
- doesn’t have too much text in too small a space
Slide 12
Candidates should be familiar with format and testing techniques
Slide 13
Provide uniform (the same) and non-distracting conditions of administration
timing
acoustic conditions
no distracting sounds or movements
Slide 14
Use items that permit scoring which is as objective as possible
multiple choice items
the open-ended item which has a unique, possibly one-word, correct response
Slide 15
Make comparisons between candidates as direct as possible
Do not allow candidates to choose a topic.
For a composition task, give ONE topic instead of a choice of six.
Slide 16
Provide a detailed scoring key
Specify acceptable answers.
Assign points for partially correct responses.
For example,
A. Paraphrase the sentence using the given word.
1. The last time I saw David was last Monday. HAVEN’T
I ______________________ Monday.
Keys: haven’t seen David / since
Student A: I haven’t seen David since Monday. (1 point)
Student B: I haven’t seen David from Monday. (0.5 points)
Student C: I didn’t see David from Monday. (0 points)
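The scoring key above can be sketched in code. This is a minimal illustration, assuming each of the two key elements ("haven't seen David" and "since") is worth half a point; the names KEY_PARTS, POINTS_PER_PART and score_response are hypothetical, and plain substring matching is a crude stand-in for a human scorer's judgement.

```python
# Hypothetical key for item 1: each required element is worth half a point.
KEY_PARTS = ["haven't seen david", "since"]
POINTS_PER_PART = 0.5


def score_response(response: str) -> float:
    """Award partial credit for each key element found in the response."""
    # Normalise case and curly apostrophes so typed variants compare equal.
    text = response.lower().replace("\u2019", "'")
    return sum(POINTS_PER_PART for part in KEY_PARTS if part in text)


responses = {
    "Student A": "I haven't seen David since Monday.",
    "Student B": "I haven't seen David from Monday.",
    "Student C": "I didn't see David from Monday.",
}
for student, response in responses.items():
    print(student, score_response(response))
```

Writing the key down this explicitly means two scorers applying it to the same response should arrive at the same mark.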
Slide 17
Train scorers
The scoring of compositions should not be assigned to anyone who has not learned to score accurately.
After each administration, the performance of scorers should be analyzed.
Individuals whose scoring deviates markedly from the norm should not be used again.
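The lecture does not say how deviation from the norm should be measured. One possible sketch, assuming every scorer marked the same set of scripts, is to compare each scorer with the average mark per script and flag anyone whose mean absolute difference exceeds a chosen threshold. The data, the threshold and the function names below are invented for illustration.

```python
from statistics import mean


def mean_deviation_from_consensus(scores_by_scorer):
    """Map each scorer to their mean absolute difference from the per-script average.

    scores_by_scorer: {scorer: [score for script 0, script 1, ...]}
    """
    n_scripts = len(next(iter(scores_by_scorer.values())))
    consensus = [
        mean(scores[i] for scores in scores_by_scorer.values())
        for i in range(n_scripts)
    ]
    return {
        scorer: mean(abs(score - avg) for score, avg in zip(scores, consensus))
        for scorer, scores in scores_by_scorer.items()
    }


def flag_scorers(scores_by_scorer, threshold):
    """Return scorers whose mean deviation exceeds the (assumed) threshold."""
    deviations = mean_deviation_from_consensus(scores_by_scorer)
    return sorted(s for s, d in deviations.items() if d > threshold)


# Made-up marks for four scripts, each scored by three scorers.
scores = {
    "scorer_1": [12, 15, 9, 14],
    "scorer_2": [13, 14, 10, 14],
    "scorer_3": [18, 19, 15, 19],
}
print(flag_scorers(scores, threshold=2.0))  # ['scorer_3']
```

Note that the per-script average here includes the scorer being checked, so a very lenient or severe scorer pulls the average toward their own marks; with few scorers a median or a leave-one-out average may be a better reference point.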
Slide 18
Agree acceptable responses and appropriate scores before scoring
A sample of scripts should be taken immediately after the administration of the test.
Only when all scorers are agreed on the scores to be given to these should real scoring begin.
Slide 19
Identify candidates by number, not name
A scorer may be influenced by:
the name
a photograph
gender or nationality
Slide 20
Employ multiple, independent scoring
Writing and speaking tests should be scored by at least two independent scorers.
Neither scorer should know how the other has scored a test paper.
Scores should be recorded on separate score sheets and passed to a third, senior, colleague, who compares the two sets of scores and investigates discrepancies (differences).
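The comparison step carried out by the senior colleague can be sketched as follows. This is only an illustration: the candidate numbers, the marks and the tolerance of 2 points are assumptions, and find_discrepancies is a hypothetical helper name.

```python
def find_discrepancies(sheet_a, sheet_b, tolerance):
    """Return candidates whose two independent scores differ by more than `tolerance`.

    sheet_a, sheet_b: {candidate_number: score}, one sheet per scorer.
    """
    flagged = {}
    for candidate in sorted(sheet_a):
        if candidate not in sheet_b:
            continue  # a missing second score needs separate follow-up
        a, b = sheet_a[candidate], sheet_b[candidate]
        if abs(a - b) > tolerance:
            flagged[candidate] = (a, b)
    return flagged


# Made-up score sheets, keyed by candidate number rather than name.
sheet_a = {101: 14, 102: 9, 103: 17}
sheet_b = {101: 15, 102: 13, 103: 17}
print(find_discrepancies(sheet_a, sheet_b, tolerance=2))  # {102: (9, 13)}
```

Only candidate 102 would be passed on for investigation; the other two pairs of scores fall within the assumed tolerance.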
Slide 21
Reliability and Validity
To be valid, a test must provide consistently (always) accurate measurements. It must therefore be reliable.
A reliable test, however, may not be valid at all.
In our efforts to make tests reliable, we must be wary of reducing their validity.
There will always be some tension between reliability and validity. The tester has to balance gains in one against losses in the other.
Slide 22
Activities:
Look at your own institutional tests. Using the list of points in the chapter, say in what ways you could improve their reliability.
What examples can you think of where there would be a tension between reliability and validity? In cases that you know, do you think the right balance has been struck?