Scientific Literacy, STS and Assessment: Lessons from TIMSS

Anthony W. Bartley and
An earlier version of this paper was presented at the NSTA National Convention, Orlando, Florida, in April 2000.

Scientific literacy has been a significant emphasis in many of the recent science curriculum documents in North America (American Association for the Advancement of Science [AAAS], 1993; National Research Council, 1996; Council of Ministers of Education, Canada [CMEC], 1997). Evolving definitions of scientific literacy (DeBoer, 2000) show important structural differences. Instead of defining scientific literacy as merely the ability to recall scientific facts (termed “functional scientific literacy” by Bybee, 1997), some have moved to examine the influence of science on personally relevant issues (Kolstø, 2001; Roth, 2003). This is consistent with a curriculum emphasis known as Science/Technology/Society (“STS”; Roberts, 1982) that focuses upon the social, political, and economic dimensions of science (Hughes, 2000). In this paper, we argue that assessments should embrace STS to support this broad definition of scientific literacy. We provide a specific example of an STS assessment item and discuss some issues encountered in its development and interpretation when it was used in the 1995 Third International Mathematics and Science Study (TIMSS).

Recent moves towards performance assessment have opened new possibilities for assessing some of the “process skills” in science education (albeit some 30 years after they began to appear in curricula). However, in the last 20 years, more goals for developing a truly scientifically literate population have been added to science curricula. These focus not on science knowledge alone, nor even on science process skills, but on the application of scientific knowledge to real issues in society, issues that frequently include technology. Inclusion of STS in science curricula presents new challenges, especially in designing effective and appropriate STS assessments. Orpwood (2001) argues that little progress can be made towards implementing an STS emphasis in science curricula unless and until valid assessment tools are developed. This paper examines one of the science items used in TIMSS that was designed to measure STS achievement. We use this example to raise questions about what STS education should comprise, what we want our students to achieve in the area of STS, and what sorts of STS assessment are needed in the future.

TIMSS in Brief

TIMSS is a large-scale international assessment of achievement in mathematics and science. It was first administered in 1995 but has since moved to a four-year cycle (1999, 2003, and 2007). The TIMSS tests were developed for use with three student populations: Population 1 (9-year-olds), Population 2 (13-year-olds), and Population 3 (students in their final year of secondary school).
Over 40 countries chose to participate in Population 2. Some, such as Canada, provided sub-samples at the provincial level,[5] which enabled provinces to examine their students’ achievement. The frameworks for describing the mathematics and science curriculum used in TIMSS were developed at the beginning of the study (Robitaille, Schmidt, Raizen, McKnight, Britton, & Nicol, 1993). Based upon curriculum statements from participating countries, the TIMSS framework was used to prepare and describe the test blueprints (Schmidt, McKnight, Valverde, Houang, & Wiley, 1997). Many possible items were submitted at the beginning of TIMSS; more were written specifically for the study. Selections of these were compiled and field-tested in most of the 45 participating countries. At the same time, national coordinators were invited to review the items for concurrence with their own curricula and for general suitability. Following the field trials, the TIMSS mathematics and science coordinators prepared final versions of the TIMSS tests, which were reviewed by a committee of subject-matter specialists and by the TIMSS national coordinators. The final selection process took into account both the psychometric properties of the items, based on the field tests, and the results of the item reviews.

STS in TIMSS

While STS appeared as a component in the TIMSS curriculum framework, it was not emphasized. When the test blueprint (based on the analysis of national curricula) was finalized, STS was subsumed into a category called “the nature of science,” which itself constituted such a small aspect of the overall test that it was initially reported in combination with “environmental issues.” One barrier to the visibility of STS was that many national curricula paid it scant attention and many national coordinators regarded some STS items as inappropriate. In an international study like TIMSS, it is difficult to insist that items of significance to a small number of countries – in this case, mostly the English-speaking countries – should be retained when a majority of other countries do not support their use. Many items that could be regarded as STS items are culturally or nationally specific. Items about issues such as methods of energy production, effects of radiation, protection of wildlife, or methods for combating pollution are difficult to write in such a way that students from a wide variety of countries find them equally meaningful, let alone equally difficult. Even when “good” questions are prepared, it is hard to find international consensus on “good” answers, which are required if the scoring scheme is to be reliable.

What is an STS Item?

It is helpful to try to characterize the variety of items that might be considered “STS items.” To do this, we use a framework developed by Glen Aikenhead (1994; Keeves & Aikenhead, 1995) to describe a variety of STS curricula and the corresponding degree to which STS features in assessment. Aikenhead’s framework has already proven useful in analyzing various science curricula (see Table 1).
Table 1. Range of Integration of STS and Science Content (Keeves & Aikenhead, 1995)

For example, Jasper (1998) used this framework to analyze what Roberts (1998) described as the “curriculum policy images” held by Alberta teachers – their interpretations of STS in a mandated provincial curriculum. Aikenhead’s framework, useful in the analysis of curriculum materials and policy documents, might also be useful for comparing assessments. It could be applied to an examination, to the range of assessments used by a teacher in the course of teaching, or to a collection of assessment items. For example, selected TIMSS science literacy items (retrieved from http://TIMSS.bc.edu/TIMSS1995i/TIMSSPDF/CitemMSL.pdf) are categorized by the authors into Aikenhead’s levels (see Table 2 below).
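One way to operationalize such a categorization is to tally items by level, as in the minimal sketch below. This is our illustration, not part of TIMSS or of Aikenhead’s framework; the item identifiers and level assignments are invented for the example.

```python
# A minimal sketch of profiling an item pool against Aikenhead's levels.
# The item identifiers and level assignments are hypothetical examples,
# not actual TIMSS classifications.
from collections import Counter

# Hypothetical mapping: item identifier -> Aikenhead level (with level 0
# available for items that have no STS aspect at all).
item_levels = {
    "item_A": 1,
    "item_B": 1,
    "item_C": 0,
    "item_D": 2,
    "bridge_b": 7,  # e.g., part (b) of the "Bridge Question"
}

profile = Counter(item_levels.values())
total = len(item_levels)

for level in sorted(profile):
    share = 100 * profile[level] / total
    print(f"Level {level}: {profile[level]} item(s) ({share:.0f}%)")
```

A tally of this kind, computed over a full item pool, would yield the kind of level profile discussed below.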
Table 2. Categorization of Selected TIMSS Items Using the Aikenhead Framework

If such a profile were constructed for the overall TIMSS item pool, we predict that over 95% of the items would fall into level 1 of the framework (or even level 0, if this were added for items with no STS aspects). The frustrations of some TIMSS critics, such as Fensham (1998), who remarked that he found many of the items to be “boring,” may perhaps be explained by this lack of STS content. A recent international assessment – the Organization for Economic Co-operation and Development (OECD) sponsored Programme for International Student Assessment (PISA) – addresses this issue. Its focus is “literacy,” interpreted as students’ ability to cope in real-life situations. The PISA item pool is not publicly available but, based on the program’s assessment framework (OECD, 1999), PISA should have a distinctly higher STS emphasis than TIMSS. We turn now from general considerations of STS assessment to a specific account of one TIMSS item, which we call the “Bridge Question.”

The Case of the Bridge Question

The “Bridge Question”[1] was used for TIMSS Population 2 and for the Population 3 Mathematics and Science Literacy study. Our analysis begins with a discussion of the structure of the question’s STS content and moves on to a description of the TIMSS coding scheme, together with comparative data from different jurisdictions, including Ontario and Newfoundland.
Figure 1. The Bridge Question

The two parts of this question differ in their classification on Aikenhead’s schema. Part (a), “why must steel bridges be painted?”, is level 1 or 2. We classified part (b), on the consequences of using the new paint, as level 7 or 8. We begin our discussion with part (a); however, since the STS content of part (b) is much greater, we shall give part (b) more attention.

Part (a) Why Paint Steel Bridges?

The TIMSS coding scheme for part (a) is shown in Figure 2. Coding schemes for extended-response questions were designed to capture both the “correctness” of student responses (represented by the first digit of the scoring code: 1 for correct, 7 for incorrect) and the variety of student responses (represented by the second digit, with 9 representing “other”).[2]
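To make the structure of these codes concrete, the minimal sketch below (our illustration, not the TIMSS scoring software) splits a two-digit code into its correctness and response-type facets. The function name and output format are our own assumptions; the digit meanings are those described above.

```python
# A minimal sketch of interpreting two-digit TIMSS-style scoring codes:
# the first digit carries correctness (1 = correct, 7 = incorrect) and
# the second digit distinguishes the type of response (9 = "other").
def interpret_code(code: str) -> dict:
    correctness = {"1": "correct", "7": "incorrect"}.get(code[0], "unknown")
    response_type = "other" if code[1] == "9" else f"response category {code[1]}"
    return {"code": code, "correctness": correctness, "response_type": response_type}

# Codes from the part (a) scheme discussed here:
for code in ["10", "19", "70", "79"]:
    print(interpret_code(code))
```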
Figure 2. Codes for reason to paint

Overall, 59%[3] of 13 year-olds (Population 2) gave the correct response coded “10,” relating to rusting or corrosion. Together with the 4% giving other correct responses (code “19”), 63% of students internationally responded correctly. By country, correct responses ranged from 85% (England) to 16% (South Africa). The “incorrect” responses are interesting. In Japan and Korea, 25% of the students mentioned only esthetics (code “70”), much higher than the international average of 5.5% for this response. We cannot explain this, as Korean students performed well on part (b). England, Japan, and Korea did not participate in the Population 3 literacy test.

Part (b) What Are the Consequences of Using the New Paint?

The TIMSS coding scheme for part (b) of the “Bridge Question” is shown in Figure 3. Coding guidelines are shown as notes following the figure.
Notes: Each of the two consequences must be coded separately. The same code can be used twice. However, if the consequences described are essentially the same, the second should be coded as 79.
Figure 3. Coding for consequences of using new paint

The three specified correct responses represent distinct consequences of using the new paint. Code “10” focuses upon increased profits for companies or communities as a consequence of less frequent painting: an economics-driven view of the problem. Codes “11” and “12” focus upon the impact upon the workers. In code “11,” the effects are couched in terms of improvements in the workers’ lives, a social perspective that is generally seen as positive. Responses classified as code “12” also take account of the workers’ perspective and represent both economic and social viewpoints, but with negative consequences for the workers: lower salaries, more layoffs, and unemployment. Code “19” was used for other, locally determined “correct” responses, for example, responses relating to employment insurance payments. We now move on to a discussion of how students from around the world responded to these questions.

Population 2 (13 year-olds) Key Results[4]
Population 3 (End of High School, Science Literacy) Key Results
We now move to a discussion of some distinctly Canadian results for this question.

Discussion

Earlier in this paper we stated that STS content is at the core of part (b) of the “Bridge Question.” The TIMSS coding categories emphasize profit (code “10”), painters working less (code “11”), and unemployment or lower salaries (code “12”). These concerns do not conform to what has traditionally been called science content. However, the impact of a change in technology upon society is very much part of an STS perspective on science content. Our first discussion theme is the nature of the differences between the responses from Population 2 and Population 3. In general, older students were more successful in answering the “Bridge Question.” However, the emphasis of older students’ responses was also different (see Table 3).
Table 3. Frequency of Correct Responses to Part (b) in Population 2 and Population 3 (Aggregate International Results)

Overall, older students gave more correct answers, but the figures demonstrate a change in emphasis. Older students appear more likely to consider economic impacts, as demonstrated by the increased frequency of responses coded “10” and “12” (e.g., code “10” responses increased from 26% to 50%). This brings us to our second theme. The economic and social context in which TIMSS was written, and increasing awareness of economic issues among older students, may be partly responsible for responses coded “10” and “12.” Particularly interesting are Canadian responses predicting negative impacts. As Canadians, we have chosen to examine the apparently anomalous results within our own country. Of all Population 2 groups around the world, only students in the provinces of Newfoundland and Ontario identified negative consequences for workers more frequently than other impacts of using the new paint. Let us examine these results in the broader economic and social context of the time. In Newfoundland, for the four years prior to TIMSS, the unemployment rate was over 25%. This is attributed to the decline and effective closure of the cod fishery and the subsequent expansion of a crab fishery (see Table 4). Reductions in income and changes in working practices leading to increased unemployment are phenomena all too familiar to Newfoundlanders and were part of the life experience of the students who participated in TIMSS.
Table 4. Values of Selected Fisheries in Newfoundland (Source: Department of Fisheries and Oceans, Canada)

Ontario has been Canada’s most affluent province. However, during the years immediately preceding TIMSS, there were major declines in the manufacturing sector, followed by a period of government restraint. According to the Ontario Ministry of Finance, unemployment rates doubled from 5% in 1989 to 10% in 1994, the year before TIMSS, with a large reduction[6] in the number of people employed in manufacturing industries. At the time of the assessment, there was intense public debate about the impact of North American free trade upon (un)employment in Ontario and Canada. We speculate that there were, in both Newfoundland and Ontario, some parallels to the bridge scenario. Students aware of negative consequences for workers associated with the introduction of new technologies might be more likely to give code “12” responses. Finally, it was surprising to many, including the authors, that results from Ontario and Newfoundland should be anomalous while the vast majority of students around the world believed new technologies would have mainly benign social impacts. Internationally, 87% of all responses were coded “10” or “11.” Although some may view such “benign” responses as somewhat naïve, it is interesting that students from English-speaking countries (with the exception of New Zealand) were least likely to give responses coded “10” or “11.” These are the same countries where STS has been emphasized in recent years.

Reflections on STS Assessment

Several issues emerged from the results of the TIMSS “Bridge Question,” raising further questions for science educators.

What should be the scope of STS in science education? If STS means that students should learn science in a broad social context, we would argue that this broader context should include social, economic, technological, political, and environmental aspects. The TIMSS results suggest, and many STS materials confirm, that STS often refers to the social and environmental impacts of science and technology (these being most familiar to teachers), while technological and economic impacts are given less emphasis.

What counts as STS assessment? Science educators must match STS curricula with appropriate assessment. Just as performance tasks with hands-on investigations are now used to assess the inquiry skills that are part of science education, so new forms of assessment, with tasks and questions drawn from the real world, need to be incorporated into STS assessments at classroom, local, national, and international levels. Part of the challenge in developing such assessments is the variety of contexts in which students live and the variations in “right answer” that frequently are apparent in real-world situations. At stake here is not so much the definition of a “right answer” but rather how to enable students (and teachers) to consider feasible solutions to real problems. This variability raises reliability concerns that have made some psychometricians uncomfortable; more work in this area would be worthwhile.

How do students’ contexts affect their responses? Responses to items like the “Bridge Question” show that students draw on their personal experience and social context. This is appropriate but presents problems for scoring. In TIMSS, local scoring teams were instructed to use their own judgement concerning what “made sense” in their national context.
For example, in a country with provisions for guaranteed employment, consequences for the painters might be very different from those in a country with no such provisions. Such differences need to be taken into account by all involved in test development and use (Bartley, 1995).

Conclusions

What should science educators do next? In these days of rigorous assessment demanded by policymakers and parents, failure to address the needs of STS assessment will result in the continued marginalization of STS. If we believe that a scientifically literate person is one who not only knows science and knows how to do science but also can relate science to technology and the world outside of school, then to improve science literacy we must include STS in both our instruction and our assessment. We must convince those with more limited views about what counts as science education that an STS emphasis is indeed important and should be included in all science assessments. We need to develop more examples of STS assessment. And we need to conduct further discussion, research, and development in this area.

References

AAAS (1993). Benchmarks for science literacy. Retrieved October 31, 2005, from http://www.project2061.org/tools/benchol/bolintro.htm

Aikenhead, G. S. (1994). What is STS teaching? In J. Solomon & G. Aikenhead (Eds.), STS education: International perspectives on reform. New York: Teachers College Press.

Bartley, A. W. (1995). A framework for validation enquiry of performance assessments in science. Unpublished doctoral dissertation, University of British Columbia, Vancouver, BC.

Bybee, R. (1997). Achieving scientific literacy: From purposes to practices. Portsmouth, NH: Heinemann.

Council of Ministers of Education, Canada [CMEC] (1997). Common framework of science learning outcomes K to 12: Pan-Canadian protocol for collaboration on school curriculum: For use by curriculum developers. Toronto, ON: CMEC.

DeBoer, G. E. (2000). Scientific literacy: Another look at its historical and contemporary meanings and its relationship to science education reform. Journal of Research in Science Teaching, 37(6), 582-601.

Fensham, P. (1998). Insights from TIMSS for Australian science education. Paper presented at the annual meeting of the National Association for Research in Science Teaching, San Diego, CA.

Garden, R. A., & Orpwood, G. (1996). Development of TIMSS achievement tests. In M. Martin & D. Kelly (Eds.), Third international mathematics and science study, technical report, volume 1: Design and development. Chestnut Hill, MA: Boston College.

Hughes, G. (2000). Marginalization of socioscientific material in Science–Technology–Society science curricula: Some implications for gender inclusivity and curriculum reform. Journal of Research in Science Teaching, 37(5), 426-440.

Jasper, W. G. (1998). Detecting biology teachers’ images of teaching about Science, Technology, and Society. Unpublished master’s thesis, University of Calgary, Canada.

Keeves, J. P., & Aikenhead, G. S. (1995). Science education in a changing world. In B. J. Fraser & H. J. Walberg (Eds.), Improving science education. Chicago: NSSE.

Kolstø, S. D. (2001). Science education for citizenship: Thoughtful decision-making on science-related social issues. Unpublished doctoral dissertation, University of Oslo, Norway. Retrieved March 20, 2005, from http://www.uib.no/people/pprsk/Dankert/Articles/DokContent.htm

Lie, S., Taylor, A., & Harmon, M. (1996). Scoring techniques and criteria. In M. Martin & D. Kelly (Eds.), Third international mathematics and science study, technical report, volume 1: Design and development. Chestnut Hill, MA: Boston College.

Miller, J. D. (1998). The measurement of civic scientific literacy. Public Understanding of Science, 7, 1-21.

National Research Council (1996). National science education standards. Washington, DC: National Academy Press.

OECD (1999). Measuring student knowledge and skills: A new framework for assessment. Paris: OECD.

Orpwood, G. (2001). The role of assessment in science curriculum reform. Assessment in Education, 8(2), 135-151.

Orpwood, G., & Garden, R. A. (1998). Assessing mathematics and science literacy. Vancouver, BC: Pacific Educational Press.

Petroski, H. (1995). Engineers of dreams: Great bridge builders and the spanning of America. New York: Random House.

Roberts, D. (1982). Developing the concept of “curriculum emphases” in science education. Science Education, 66, 243-260.

Roberts, D. (1998). Developing the concept of “curriculum policy image” in science education. Paper presented at the annual meeting of the National Association for Research in Science Teaching, San Diego, CA.

Robitaille, D. F., Schmidt, W. H., Raizen, S., McKnight, C., Britton, E., & Nicol, C. (1993). Curriculum frameworks for mathematics and science. Vancouver, BC: Pacific Educational Press.

Roth, W.-M. (2003). Scientific literacy as an emergent feature of collective human praxis. Journal of Curriculum Studies, 35(1), 9-23. Retrieved March 17, 2005, from http://faculty.ed.uiuc.edu/westbury/JCS/Vol35/ROTH.htm

Schmidt, W., McKnight, C. C., Valverde, G. A., Houang, R. T., & Wiley, D. E. (1997). Many visions, many aims, volume 2: A cross-national investigation of curricular intentions in science. Dordrecht, NL: Kluwer.
Notes

1. The original version of this question was written for TIMSS by one of the authors (AB) and derives from a real-world context. The Forth Bridge in Scotland once kept twenty-four painters employed on a continuous twelve-year cycle to keep the entire structure covered with five coats of paint. The vastness of the endeavour was so well known that “painting the Forth Bridge” became a metaphor (in the UK) for an endless task (Petroski, 1995, p. 381).

2. For more information about the TIMSS coding system, see Lie, Taylor, and Harmon (1996).

3. Data presented in this paper are taken directly from the appropriate TIMSS reports, available at http://timss.bc.edu/timss1995i/TIMSSPublications.html. Averages presented here are means of all students responding to a question or part of a question.

4. In describing the results, we combine the percentages of students citing any given reason as their first or second choice, since duplicates were not given credit and no priority between reasons was called for. All TIMSS data are available from the TIMSS International Study Center, Boston College.

5. The Canadian provinces of British Columbia, Alberta, Ontario, New Brunswick (English-speaking schools), and Newfoundland elected to sample at a level that enabled provincial-level comparisons.

6. Numbers employed in manufacturing in Ontario: 1,022,000 (1989) and 901,000 (1994). Source: Ontario Ministry of Finance.