
Toward a Research and Development Strategy for Computer-Assisted Language Learning


John L. D. Clark
Defense Language Institute


Abstract:
This article describes and recommends an approach to CALL-related research and development activities in which the CALL portion of the learning process is not addressed in isolation but as part of a total instructional system which also includes the live teacher, textbooks and other print materials, outside-of-class learning opportunities, and numerous other non-technological components. In order to determine the optimum instructional strategy (including CALL) to be used in a given language learning situation, the instructional developer must first assemble detailed information about the intended outcome goals of the instruction, as well as about the language background, language learning aptitude, and other input characteristics of the students to be taught. Only after these crucial initial steps have been taken does it become possible to meaningfully and effectively consider the appropriate instructional strategies to be used in developing the intended outcome performance abilities on the students' part. Eight major procedural steps are discussed for developing, refining, and evaluating the efficiency and effectiveness of language teaching programs based on the input-process-output model described.

KEYWORDS: CALL, CALL research, instructional system design, instructional program development

The purpose of this article is to briefly discuss some of the major considerations that appear to be at issue in planning and carrying out optimally useful instructional development projects and associated research studies within the context of computer-assisted language learning (CALL). At issue under this rubric are those activities whose purpose is to develop and validate instructional procedures that are intended to have immediate, or at least near-term, practical application in language teaching programs. "Purer" types of CALL-related research, in which the primary goal is to extend the investigation of computer-assisted technology into new, cutting-edge areas of inquiry, independently of any particular near-term operational benefit, quite properly follow other rationales and procedures which are outside the intended scope of the present article.
It would be useful to begin by suggesting that the terms "research in CALL," "research on CALL," or any other characterizations that imply as their primary focus the scrutiny of the technology or of the impact of the technology per se, may inappropriately limit both the nature and scope of the research questions addressed under such an aegis and, by the same token, the informational usefulness of the obtained results. With the possible rare exception of a language learning program delivered in its entirety by computer, real-life instructional programs characteristically make use of computer technology as only one of several different types of delivery media, which also include the human teacher; textbooks and other print materials; "lower-order" technology such as linear audiotape or videotape; and, not infrequently, contact opportunities with a variety of native speakers either in the classroom or (in the case of study abroad programs) in actual in-country communicative settings. In all of these instances, the CALL component does not—or at least, should not—operate autonomously, but in close and complex interaction with each of the other components, in a joint and carefully integrated attempt to produce the intended learning results. For this reason, attempts to isolate the CALL portion for separate, individual development and study would risk not adequately capturing and appropriately reflecting the instructional contribution of the other, non-CALL portions of the overall learning program. This, in turn, could have potentially serious effects on the appropriateness and overall efficiency of the instruction itself, as well as on the degree of confidence with which any associated research results could be extrapolated to other learning situations.
To more vividly characterize the issue under discussion, one might imagine that classroom chalk has only recently been invented, with the pedagogical world agog at the virtually unlimited capabilities of the new technology. In an attempt to study and to mine these previously unavailable resources, a variety of research-oriented activities are undertaken. Large-scale comparative studies of "chalk-mediated" vs. "traditional" teaching procedures are carried out; at a more detailed level, experiments are conducted on the differential effectiveness of colored rather than white chalk, and of left-slanting rather than right-slanting chalkboard presentations; and students themselves are extensively queried about their cognitive and affective reactions to "chalk-assisted instruction" vis-à-vis the more conventional procedures.
Although somewhat fanciful, the example does serve to emphasize that if undue investigatory attention is paid to the chalk (or CALL) technology in and of itself, this orientation may tend to ignore or obscure the much more critical issue of whether the total instructional system to which the student is being exposed—including but by no means limited to the technological portion as such—is delivering suitable, timely, and effective instruction. Viewed from this broader perspective, studies focused on determining whether "CALL works," or "works better than" other types of instruction, would appear to miss the opportunity to address the much more significant question of the proper design and implementation of well-reasoned and logically developed total systems of instruction, within which the CALL component per se would be crafted to function as an important and carefully integrated component.
If, for purposes of further discussion, the desirability and potential utility of adopting a "total-system" approach to CALL-related instructional development and research can be accepted, the immediate next question is that of identifying an appropriate system model. The so-called "input-process-output" paradigm, widely used in manufacturing and other technical/scientific areas, would merit strong consideration. This model can be schematically represented as follows:
INPUT ---> PROCESS ---> OUTPUT
A simple example using this model is that of automobile manufacturing, in which the intended output is a vehicle having specified target characteristics with respect to load capacity, road clearance, gasoline mileage, and other factors; the input elements are the necessary quantities and qualities of sheet metal, engine components, trim, etc.; and the process is the assembly line, whose task is to carry out a series of operations on the input materials so as to produce a vehicle satisfying the originally specified output requirements.
Although second language learning is certainly a considerably more complex and less highly defined (or definable) operation than automobile production, an input-process-output perspective on this undertaking does appear both relevant and productive. As an example, a particular language program might have as its intended output goal the development of reading comprehension to a point at which students completing the program will be able to accomplish basic "survival" reading tasks (e.g., understand street signs, read menus, etc.) within settings normally encountered by non-native in-country travelers. The student input in this instance might be specified as adult native speakers of English with no prior study of the target language. Once provided the intended output goal or goals of the instruction, together with relevant information on the characteristics of the input students, the instructional developer is in a position to carefully consider and delineate the specific learning process that would appear to have the best likelihood of efficiently and effectively inculcating the desired performance capabilities on the students' part.
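To make the shape of this model concrete, the following brief sketch (in Python; all names, fields, and values are purely illustrative, not part of any existing system) records a program's input and output specifications first, leaving the process to be selected last, mirroring the development sequence recommended here:

from dataclasses import dataclass, field

@dataclass
class OutputGoal:
    # Intended end-of-program performance, stated in testable terms.
    skill: str       # e.g., "reading comprehension"
    criterion: str   # e.g., "survival tasks: street signs, menus"
    measure: str     # how accomplishment is to be assessed

@dataclass
class InputProfile:
    # Entry characteristics of the students to be taught.
    native_language: str
    prior_study: str   # e.g., "none"
    age_group: str     # e.g., "adult"

@dataclass
class InstructionalSystem:
    inputs: InputProfile
    outputs: list                                # list of OutputGoal
    process: list = field(default_factory=list)  # media/procedures; chosen only
                                                 # after inputs and outputs are fixed

survival_reading = InstructionalSystem(
    inputs=InputProfile("English", "none", "adult"),
    outputs=[OutputGoal("reading comprehension",
                        "survival tasks: street signs, menus",
                        "criterion-referenced reading test")])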
From this single and deliberately quite simple example (most "real-life" language programs and teaching contexts being considerably more complex and challenging), it can readily be seen that the instructional processes most suited to a particular language learning program are strongly and inevitably dependent both on the established output goals of the program and on the initial entry characteristics of the students to be taught. By the same token, the nature and magnitude of the contribution made by CALL within a particular learning program would be expected to vary as a function of these same considerations; and any attempt to develop, discuss, or conduct and report research on CALL-using instructional programs without taking input and output characteristics formally into account would be to risk developing inappropriate/inefficient teaching programs and incorrectly and misleadingly reporting and interpreting the results of research based on these programs.
Based on the two touchstone concepts discussed briefly above—the need to view CALL as only one of several possible instructional media, one that operates in concert with the live instructor and other media to provide the total instructional system; and the input-process-output model as a useful and powerful means of conceptualizing and describing the essential components of language instructional programs in general—it is possible to suggest, at least in broad outline form, the major elements of a reasonably systematic approach to, and potential implementation strategy for, operationally-oriented language program development incorporating CALL technology, as well as for related research activities. The proposed approach involves carrying out the following types of activities, generally in the order shown, but with simultaneous development possible and desirable in several instances.
(1) Carefully define the input-output characteristics at issue in the program development or research project.
(2) Develop or refine testing instruments fully adequate to the task of measuring the intended outputs.
(3) Develop standardized and uniform means of measuring and reporting instructionally relevant characteristics of the input students.
(4) Gather state-of-the-art information on the instructional strategies best suited to particular components of the language learning process.
(5) Select the media/procedures most suited to the effective delivery of instruction based on the identified strategies.
(6) Design instructional programs, embodying CALL components as appropriate, based on information obtained in (4) and (5).
(7) Conduct "microresearch" as necessary to develop and refine the instructional program.
(8) Conduct "macroresearch" as necessary to validate the efficiency and effectiveness of the overall instructional program.
Each of these component activities is discussed in greater detail below.
1. Carefully define the input-output characteristics at issue in the program development or research project.
A high degree of clarity and definition in the intended performance goals to be met by the student on completion of the instructional program is vital to both the proper design of the program and the evaluation of program results. At a minimum, the developer/researcher should specify the skill areas (listening comprehension, reading comprehension, oral production, and/or writing) in which the student would be expected to demonstrate accomplishment, as well as specify the testing procedure or other means by which this accomplishment is to be measured. For example, the output goal definition for a comprehensive, multi-year program of instruction might specify that the students are to develop an ACTFL/ILR "level 4" proficiency in reading, together with "level 3" competence in both listening comprehension and speaking. Armed with these end-of-program goals, the instructional developer would then be in a position to select and sequence those teaching procedures that he or she considered most suited to effectively and efficiently developing the specified outcome proficiencies on the students' part. On the basis of the same procedure, program effectiveness research studies would have available pre-specified and clearly understood assessment criteria by which the overall program results would most appropriately be evaluated.
Student input characteristics—often omitted from or only cursorily addressed both in program design efforts and in research studies and reports—include such factors as the students' age; aptitude for foreign language study; prior familiarity with and affective orientation to various types of teaching procedures; purposes and motivation for language study; and, of course, entry-level knowledge of or proficiency in the language, which substantially determines the type and degree of didactic challenge that the instructional program will face in bringing the students up to the intended output level of performance. It would be necessary for the program designer to have as much information as possible about these and other relevant student characteristics in order to properly tailor the program to the intended audience and to avoid, on the one hand, instructional "overkill" and, on the other, instruction that is too sparse or abstruse for the particular students involved. Research studies on program effectiveness would also need to take student characteristics into account in order to properly evaluate the observed results and make informed statements about other student groups and learning contexts to which the results might legitimately be extrapolated.
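As a small illustration of what taking input characteristics formally into account might look like in practice, the record sketched below (Python; every field name is hypothetical and merely suggestive of the measures discussed in this section) bundles the learner variables a developer or researcher would want to capture and report:

from dataclasses import dataclass
from typing import Optional

@dataclass
class LearnerProfile:
    # Instructionally relevant input characteristics; all fields illustrative.
    age: int
    aptitude_mlat: Optional[int]        # Modern Language Aptitude Test score, if given
    motivation_index: Optional[float]   # e.g., from a Gardner-type questionnaire
    entry_proficiency: str              # e.g., an ILR rating such as "0" or "0+"
    methods_familiarity: str            # prior exposure/reaction to teaching procedures
    purpose_of_study: str               # e.g., "travel", "graduate reading requirement"

def group_summary(students):
    # Group-level input data reported alongside outcomes, so that findings
    # can be extrapolated responsibly to comparable student populations.
    ages = [s.age for s in students]
    return {"n": len(students), "mean_age": sum(ages) / len(ages)}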
Although clarity in input-output specification is of great importance to individual program development and research activities, there is a larger sense in which these terms and the considerations underlying them could be helpful to the language teaching community. Specifically, they could be used as foci for large-scale discussion of the intended outcomes of second language instruction in the United States and of the types of individuals or groups for which this instruction would best be provided. Of course, given the enormous diversity which characterizes the second language education field, it is not likely that a uniform national policy addressing the goals of language instruction and the types of students to be accommodated within the instructional program could or would be developed within the foreseeable future. However, across the total instructional spectrum, there may be several areas within which groups of instructional developers or development researchers might voluntarily agree to concentrate their efforts, with regard to both the envisioned outcome goals and the types of students to be served. For example, the teaching of general/global foreign language proficiency to adult learners is an interest shared by the Defense Language Institute and several other government and private agencies. The learning by adults of English as a second language is a pedagogical concern of many TESOL-affiliated organizations; and those involved in planning and developing foreign language programs at the elementary school level might find a natural collaborative framework in the recently established Advocates for Language Learning (ALL) organization. Without further elaborating on this issue, it could be suggested that to the extent that developers or researchers within particular formal or informal organizations could agree on the most critical input-student/output-goal combinations within their areas of interest, and focus joint development and research attention on these particular combinations, this approach would provide for considerably greater synergism of effort by all concerned than would otherwise be the case.
2. Develop or refine testing instruments fully adequate to the task of measuring the intended outputs.
The language testing field is fortunate in having available to it the concept of general proficiency and the associated testing procedures originally developed by the Foreign Service Institute and subsequently expanded on and disseminated within the government setting by the Interagency Language Roundtable (ILR) and in the private sector by Educational Testing Service (ETS), the American Council on the Teaching of Foreign Languages (ACTFL), and others. (For development history and current status of this testing approach, see Liskin-Gasparro 1984; Clark and Clifford 1987.) Although certain aspects of the "proficiency" movement and "proficiency-based testing" are undergoing lively debate in the professional journals and other forums (Lantolf and Frawley 1985; Bachman and Savignon 1986; Jarvis 1986; Kramsch 1986; Bachman 1987), the overall positive effect on the language teaching enterprise of being able to cast instructional goals in terms of the development of real-life communicative abilities in realistic language-use settings has been extremely great. By the same token, the proficiency movement has provided highly workable, if not conceptually and operationally perfect, instruments and procedures for measuring functional ability in the target language. Moreover, from an evaluative research perspective, these instruments and procedures have the extremely important advantage of reporting the obtained results in terms of externally established performance criteria that are totally independent of any particular internal features or characteristics of the instructional program per se. As such, they can serve a genuine "honest broker" function either in the qualitative evaluation of a given learning program or in the comparative evaluation of two or more competing programs, provided, of course, that the intended output goals of each are considered to be properly expressed in "proficiency" terms.
Notwithstanding the major strengths of "proficiency" testing of the ILR type, the generalized nature of the ILR proficiency scale and, in particular, its very broad scope, which spans the full range of ability from no functional proficiency in the language to that of an educated native speaker, make it insensitive to relatively small increments of student performance change.
Although the more detailed break-out of the lower levels of the ILR scale into the Novice, Novice-Plus, etc. categories established by ACTFL does offer the possibility of documenting increased performance over time frames of a single school term or semester, neither the original ILR scale nor the ACTFL adaptation is capable of measuring unit-to-unit or other short-term performance changes, as would be needed in materials-development research or other highly focused, diagnostically-oriented studies. Fortunately, a highly viable alternative appears to be at least potentially available in the form of so-called "prochievement" tests—a term coined by the author to characterize a testing approach that combines elements of both proficiency and achievement testing: proficiency testing in that the student is asked to function linguistically within "real-life" communicative contexts, and achievement testing in that the corpus of material with which the student is asked to operate does not go beyond that to which the student has been formally exposed in the course of instruction. The prochievement testing concept has not been extensively discussed in the measurement literature to date, despite its fairly widespread application in communicatively-oriented classrooms over the past several years (Omaggio 1983). However, the recently established National Foreign Language Center has expressed interest in further promoting both theoretical and practically-oriented discussion of prochievement testing issues, and is planning to host a series of discussion meetings with interested individuals and institutions over the next several months (Walton 1988).
The widespread use of prochievement-oriented testing procedures within the context of language program development and evaluation would be expected to satisfy two important pedagogical and measurement concerns. First, it would continue to place needed emphasis on functional performance in realistic language-use settings; and second, it would permit more finely-grained measurement of student accomplishment over relatively small instructional time spans than is possible using the broad ILR (or ACTFL) scale and testing procedures. However, a considerable amount of both conceptual fine-tuning and practical development work will be needed before the prochievement testing approach is adequately realized in the form of detailed test development guidelines and procedures that can be effectively used by language program developers and researchers on a widespread basis. The challenge is one which the profession should readily and enthusiastically accept in the interest of making available assessment tools that will considerably facilitate both program development and associated evaluation research.
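One defining property of a prochievement item, that its material not exceed the corpus to which students have been formally exposed, lends itself to a simple automated check. The toy sketch below (Python; the word list and tokenizer are invented stand-ins, not an established procedure) flags candidate-item vocabulary falling outside the taught corpus:

import re

def untaught_words(passage: str, taught_vocab: set) -> list:
    # Return words in a candidate prochievement item that fall outside
    # the formally taught corpus; an empty list means the item qualifies.
    tokens = re.findall(r"[a-z']+", passage.lower())
    return sorted({t for t in tokens if t not in taught_vocab})

taught = {"please", "bring", "the", "menu", "a", "table", "is", "on"}
print(untaught_words("Please bring the menu.", taught))        # [] -> usable item
print(untaught_words("The sommelier recommends it.", taught))  # flags untaught words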
3. Develop standardized and uniform means of measuring and reporting instructionally relevant characteristics of the input students.
The identification and measurement of learner-specific variables affecting the facility and overall degree of success with which individuals are able to learn a second or foreign language are areas of inquiry in which considerable further work remains to be done. Important and still valid research on the components of "language learning aptitude" was conducted by John B. Carroll and his associate, Stanley Sapon, in the early 1950s, with this activity culminating in the well-known and extensively validated Modern Language Aptitude Test (MLAT). (See Carroll 1981 for a detailed history of development of this and other language aptitude measures.) In follow-up to this initial work, the Carroll-developed "model of school learning" (Carroll 1963) attempted to elucidate the relationship between "aptitude" for a task such as foreign/second language learning and a number of other types of both learner-specific and external variables, including motivation, opportunity to learn, general intelligence, and overall quality of instruction, each of which was considered to be functionally related to the efficiency (speed) of learning. According to the Carroll model, students with relatively low "aptitude" for a given learning task, but who are both highly motivated and supplied with instructional materials of high quality, should be able to learn a given body of material as rapidly as students of higher aptitude but with lower motivation levels and/or poorer quality instructional materials. An immediate implication of the Carroll model for program development and research is that in order to adequately account for and quantify the variety of learning-affecting variables brought to the learning situation by the students themselves, suitable measures of "motivation" and other learner-specific variables will need to be routinely administered in addition to the more traditional language aptitude measures.
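The functional relationships in Carroll's model are often summarized, in paraphrase, as follows (a compact restatement, not Carroll's own notation):

    degree of learning = f( time actually spent / time needed )

where the time needed for learning is governed chiefly by aptitude, ability to understand instruction (general intelligence), and quality of instruction, and the time actually spent is governed by opportunity to learn and by perseverance (motivation). On this account, high motivation or high-quality materials can compensate for lower aptitude by narrowing the gap between time needed and time actually spent.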
Fortunately, the profession appears to be at the point of having many of the requisite instruments either fully available or well under development. The work of Gardner and others in the areas of student motivation for language learning and attitudes toward language study (Gardner and Lambert 1972; Gardner 1983) has provided both the conceptual and empirical underpinning and the associated questionnaire-based instruments to permit the effective measurement of these two learner variables. Within the past few years, a considerable amount of research effort has also been concentrated on identifying and operationally defining differences in learning styles or strategies that the learner brings to the language learning task. The work of Witkin and Goodenough (1977) in identifying so-called "field dependent" and "field independent" learners has had research spinoffs in foreign language teaching contexts; and more recently, Rebecca Oxford and her associates have developed quite extensive taxonomies of differential learning strategies and procedures used by language students in addressing various language learning tasks (Oxford 1986).
The major implication of all of the foregoing for language program design and research, especially for programs having a CALL component, is the crucial need to capture, by means of suitable test instruments, questionnaires, or other data gathering procedures, as much quantifiable information as possible about learner-specific variables anticipated to bear either positively or negatively on the instructional outcomes, and to report this information as part and parcel of the developmental activity or research study at issue. If these types of information are not routinely obtained and reported, the ability to validly extrapolate findings from the development activity or research project per se to other instructional contexts is correspondingly reduced. As is the case with the output measures previously discussed, to the extent that instructional developers/researchers can collaborate on, and pool for joint use, questionnaires, test items, and other data gathering procedures addressing critical learner variables, the efficiency and "inter-operability" of the development and research activities involved will be enhanced.
4. Gather state-of-the-art information on the instructional strategies best suited to particular components of the language learning process.
It may be suggested that the procedure best suited to identifying the most appropriate instructional media and procedures to produce given output goals is not, in general, that of large-scale comparative research. Indeed, the number of empirical studies that would have to be carried out to test the relative efficiency/effectiveness of specified media or combinations of media by comparison to other possible configurations would quickly reach enormous and, for all practical purposes, unmanageable proportions. Rather, a process of logical argumentation and theoretical modeling would appear much more viable and ultimately more fruitful. This approach would involve identifying, for each of the several instructional components of a particular language program, those language learning principles and strategies that would be considered to have the most firm and current theoretical and/or experiential support as representing logical and effective approaches to meeting the instructional requirements of those program components. For example, large amounts of authentic listening comprehension input at the Krashen "i+1" level would generally be viewed as a potentially effective approach to the development of student proficiency in this skill area—perhaps preceded by some amount of instruction in targeted listening and in the use of advance organizers. Development of reading lexicon by providing the student numerous opportunities to work with new material within full and diverse contexts (as opposed to memorizing lists of lexical "equivalents") would be another example of generally agreed-upon "good pedagogical practice" insofar as this particular component of the total instructional program is concerned.
Some concern might be raised to the effect that a logical modeling approach is not sufficiently scientific, and that there might in fact be a "better way" to accomplish a given instructional task, if one could only be given the opportunity to research the issue in depth. The major point to be made here is that it is—and will always be—impossible to conduct all of the controlled studies that would be required to fully "prove" the appropriateness and validity of the pedagogical approach taken within each and every area of a learning program as complex and as sophisticated as those involved in foreign/second language instruction. As a matter of practical reality, it would appear necessary to decide upon at least the broad (component-level) parameters of the instructional program through non-experimental means, using, nonetheless, the best professional information and wisdom currently available in making these decisions. In this respect, an earlier study by Hayes, Lambert, and Tucker (1967) is of considerable interest and potential value as a means for determining the degree of professional consensus existing on a variety of pedagogical issues. In this study, the authors asked a panel of experienced teachers to independently judge the relative instructional merit of each of a large number of classroom practices. Data reported from this investigation included, for each of the evaluated practices, both a total "score" for that practice and a measure of the extent of agreement among the judges as to the rating assigned.
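A minimal sketch of the kind of tabulation reported in that study, a mean "score" for each practice together with a dispersion-based index of judge agreement, might look as follows (Python; the practices and ratings are invented for illustration):

from statistics import mean, stdev

# Each classroom practice is rated for instructional merit by a panel of judges.
ratings = {
    "large amounts of authentic listening input": [5, 5, 4, 5, 4],
    "memorizing lists of lexical equivalents":    [2, 1, 3, 2, 2],
}

for practice, scores in ratings.items():
    # Mean rating approximates the practice's overall "score"; a small
    # standard deviation indicates high agreement among the judges.
    print(f"{practice}: score={mean(scores):.1f}, agreement (sd)={stdev(scores):.2f}")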
Procedures generally similar to those described above could readily be used to obtain the collective judgments of qualified individuals on each of a variety of specified teaching/learning principles or "instructional propositions," these judgments to include informed consideration of both the intended language performance outputs and the posited entry characteristics of the input students. This information would subsequently be used (assuming a reasonably high degree of inter-participant agreement) to provide principled theoretical and procedural frameworks around which specified programs of instruction would be developed. To take this approach a step further, a Delphi procedure could also be used, in which the original evaluators and/or other qualified groups would engage in iterative paper-based (or electronic network-based?) dialogues explicitly addressing program design issues for one or a number of specified input-output combinations, using the initial proposition-rating results as a basic point of departure.
5. Select the media/procedures most suited to the effective delivery of instruction based on the identified strategies.
It is probable that the particular media and procedures most appropriate to delivering the type(s) of instruction at issue within a given program component would be fairly readily apparent, once a good degree of specificity had been reached on the nature of the instructional activities themselves. For example, on the assumption that one major instructional component of the previously-mentioned "level 4" reading and "level 3" listening comprehension/speaking program would involve providing the student opportunities to engage in interactive discourse in authentic communicative settings, it should be quite evident that the human instructor would provide the medium par excellence for this aspect of the total program. For listening comprehension, a hypothetical previously-endorsed instructional proposition might be that the student should be exposed to a variety of different speakers of both genders and having diverse vocal characteristics. For this application (barring the possibility of student access to a large number of live native speakers), random-access audio or videodisc might be the medium of choice. Similar decisions would be arrived at for the other components of the learning program by carefully analyzing the characteristics and capabilities of various candidate media against the presentational and other pedagogical requirements dictated, or at least strongly implied, by the nature of the previously-specified learning activities. In all instances, it would be clearly understood that the particular characteristics of the intended instructional activity would drive the selection of the implementing medium, not the other way around.
6. Design instructional programs, embodying CALL components as appropriate, based on information obtained in (4) and (5).
Relatively few generalizations can be made or preliminary guidance suggested for the program development activity per se, since the scope and nature of the relevant procedures would necessarily vary considerably depending on both the instructional goals of the program and the student input characteristics. For example, two different programs, such as (1) a reading-only program for doctoral candidates and (2) a social-situation listening comprehension and speaking program preparatory to travel or residence abroad, would—after separately following, as diligently as possible, the pedagogical and media-use dictates of the appreciably differing output goals and student clienteles at issue in each instance—be expected to exhibit quite different formal characteristics with respect to both the instructional media used and the nature and sequencing of the instructional activities. It might be suggested, as an aid in later program evaluation as well as for the information and use of other instructional developers in the same or closely related areas, that a conscientious and detailed "audit trail" be maintained throughout the development process, documenting the major procedural decisions and the rationales behind them.
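A lightweight realization of such an "audit trail" might be as simple as an append-only log of dated design decisions. The sketch below (Python; the file format and field names are invented, offered only as one possibility) illustrates the idea:

import json
import datetime

def log_decision(trail_path: str, decision: str, rationale: str) -> None:
    # Append a dated program-design decision and its rationale to a
    # development audit trail kept as one JSON object per line.
    entry = {"date": datetime.date.today().isoformat(),
             "decision": decision,
             "rationale": rationale}
    with open(trail_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_decision("audit_trail.jsonl",
             "deliver multi-voice listening input via random-access audio",
             "endorsed proposition: expose students to diverse speakers")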
7. Conduct "microresearch" as necessary to develop and refine the instructional program.
As a quite new sub-area within the total armamentarium of language teaching media and procedures, CALL can lay claim at present to only a relatively limited amount of developed knowledge and practical implementation experience by comparison to other more longstanding and "traditional" instructional approaches. Thus, there appears to be a considerable need—and an associated major opportunity—for instructional developers and others working within a CALL context to carry out and report the results of a large number of what might usefully be termed "microstudies," each addressing a relatively small and highly focused practical-application question in the use of this technology. The series of studies recently carried out by Robinson et al. (1985) at the Center for Language and Crosscultural Skills (CLCCS) in San Francisco is exemplary of such an approach. Using quite rigorous experimental and statistical techniques, Robinson and her associates investigated each of several alternative CALL-mediated approaches to particular instructional tasks. These included the teaching of structure, e.g., by providing an integrated, authentic context for the structure, and by allowing for meaningful practice of the structure; and the handling of error feedback and correction, e.g., via implicit or explicit correction, and by providing the opportunity to re-attempt missed items at spaced intervals. "Microstudies" of the type conducted at CLCCS have numerous advantages over larger-scale research projects insofar as practical instructional development is concerned in that they (1) focus on a small and relatively easily defined/objectified pedagogical issue or area; (2) involve comparatively little expense or administrative burden (again, by contrast to broad "program-comparison" or other major studies); and (3) provide informational results at a level of detail and specificity that allows these results to be applied immediately and appropriately in other applications that share the same or closely similar instructional goals. The recent experiment by Pederson (1986) investigating the relative contribution to student reading comprehension of passage availability and unavailability during computerized question-answering exercises is another example of a "microstudy" that is directly applicable to instructional program development.
The Robinson et al., Pederson, and other similar studies involve the use of true experimental procedures, in which students are allocated to experimental and control groups which, respectively, do and do not undergo the particular instructional treatment at issue. Although this procedure does incorporate a high degree of empirical rigor, alternative approaches may also usefully be considered in conducting microstudies addressing CALL-related (or other instructional-design) questions. One such approach is the collection and analysis of student-provided feedback about various aspects of the instructional process, including both questionnaire-based data and information gathered through observation of and discussion with students about the ways in which they address their learning tasks. These and other ethnographically-oriented research techniques, while frequently employed in other subject areas, have not been widely applied in connection with language teaching/learning studies, notwithstanding strong suggestions by Lett (1983) and Pederson (1987) that such an approach could have great informational potential. Significant data gathering and analysis possibilities also exist for those CALL programs which automatically provide a record of individual students' responses in working through the program. Items of information such as response latencies, frequency of recourse to help screens, frequency of selection of branching choices, etc. are all potentially relevant to the increasingly informed initial design and subsequent fine-tuning of the CALL portion of the instructional program.
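The kind of response record mentioned above could be kept automatically by the CALL program itself. The following sketch (Python; every field and method name is hypothetical) indicates the sort of per-item data (latencies, help-screen use, branching choices) that might be captured for later analysis:

from dataclasses import dataclass, field

@dataclass
class ResponseRecord:
    item_id: str
    latency_sec: float         # time from item presentation to student response
    correct: bool
    help_screens_viewed: int   # recourse to help during this item
    branch_taken: str          # which branching choice the student selected

@dataclass
class SessionLog:
    student_id: str
    records: list = field(default_factory=list)

    def mean_latency(self) -> float:
        # Average response latency across the session, one of the indices
        # suggested in the text for fine-tuning the CALL component.
        if not self.records:
            return 0.0
        return sum(r.latency_sec for r in self.records) / len(self.records)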
8. Conduct "macroresearch" as necessary to validate the efficiency and effectiveness of the overall instructional program.
Large-scale research studies, in which entire programs of instruction are evaluated with respect to their final learning outcomes (either individually or by comparison to the outcomes of other competing programs), were characterized in preceding sections as relatively ill-suited to addressing practical questions of instructional design with sufficient specificity and detail to provide useful procedural guidance to the program developer. Notwithstanding the generally low level of diagnostic feedback afforded by these broad-scope studies, they do play a very proper and significant role in documenting, for the information of administrative bodies, funding sources, and other interested audiences, the total instructional yield of the program. The question at issue, then, is not so much whether "macro" studies should or should not be conducted, but rather, the particular form which such studies would most usefully take. In this regard, it would not be overstating the case to suggest that formal comparative studies, i.e., studies pairing an experimental learning program against a "traditional" program, or two or more experimental programs against one another, run an almost unacceptably high risk of producing final results of questionable validity and interpretability, as a result of inadvertent contamination of instructional methods, the influence of unforeseen conditions or events that affect student performance differentially across comparison groups, and a variety of other factors to which large-scale studies are almost invariably susceptible. In the language field, such large and ambitious research projects as the "Keating report" on the effects of language laboratory use (Keating 1963) and the later "Pennsylvania study" of language laboratory effectiveness (Smith and Baranyi 1968; Smith and Berger 1968) have uniformly fallen prey to a number of procedural lapses and other technical problems that have severely reduced the interpretability of the reported results (Clark 1969; Otto 1969; Valette 1969).
A second factor that reduces the informational utility of large-scale studies focused on the comparison of two or more competing methodologies is the relativistic nature of the obtained results. In the absence of agreed-upon, study-independent criterion measures of student performance, comparative studies may show "better" results for a particular instructional program by comparison to some other program without, however, addressing the question of either program's practical yield in terms of the development of functional language competence. As a result, two methods might show differential results on a comparative basis (e.g., "Method B" students scoring some number of points higher on the course final examination than "Method A" students), but produce essentially equivalent outcomes when viewed from the perspective of an externally defined and validated scale of functional competency such as that provided by the ILR or ACTFL guidelines and scoring procedure.
Although the ACTFL/ILR or similar scale could indeed be used in a comparative study context, it may be suggested that the availability of such a scale would tend to obviate the need for formal methods-comparison studies per se in that—to the extent that program developers and researchers in the field would be willing to adopt the ACTFL/ILR scale and testing procedure as a common standard for reporting the results of their own individual studies—a "single-method" research approach would be highly viable from both conceptual and technical perspectives.
Using this "single-method" procedure, the researcher would, first, carefully characterize the composition of the student input group, using, wherever possible, widely available and standardized measures (e.g., the MLAT as a measure of "language aptitude"; Oxford's Strategy Inventory for Language Learning (SILL) as an indicator of student learning styles and strategies, etc.). Second, the program of instruction would be described as extensively as possible, including, in particular, such quantifiable aspects as the total number of hours of exposure to or use of each of the major instructional media and procedures (classroom instruction, self-study of text materials, CALL, interaction opportunities with native informants, etc.). Third and finally, competency results for individual students and for the participant group as a whole would be reported on the ACTFL/ILR or other commonly accepted scale, together with any other ethnographically-based information on student reactions to the instructional process, suggestions for program improvement, and so forth. Summative appraisal of the instructional quality and results of the program would be based on reviewers' accumulated knowledge of the proficiency development and other outcomes of earlier programs having generally similar input/process parameters with respect to student backgrounds and other learner characteristics, instructional contact hours, etc. Again, detailed quantification of these and other relevant characteristics of the input students and of the instructional program, making use of standardized and "field-accepted" measurement instruments and documentation procedures to the greatest extent possible, would be needed to make the "single-method" results maximally informational and useful to other program developers and researchers.
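The three reporting steps just outlined could be captured in a simple study-report structure such as the one sketched below (Python; the field names and example values are illustrative only, not a prescribed reporting format):

from dataclasses import dataclass

@dataclass
class SingleMethodReport:
    input_measures: dict   # step 1: standardized input data,
                           #   e.g., {"MLAT_mean": 112, "SILL_profile": "..."}
    contact_hours: dict    # step 2: quantified process description,
                           #   e.g., {"classroom": 120, "CALL": 40, "self-study": 60}
    outcomes: dict         # step 3: results on an externally validated scale,
                           #   e.g., {"ILR_reading_median": "1+"}
    notes: str = ""        # ethnographic observations, improvement suggestions, etc.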
As previously indicated, the preceding outline of a possible broad-scale, several-front approach to the conceptualization, planning, instrumentation, conduct, and reporting of language program development activities and associated research projects involving CALL as a component is intended only to suggest a possible "macro" framework for further discussion in these areas. More detailed explication of both the conceptual and practical issues involved will be necessary—further work that might most profitably be undertaken within the context of designing and conducting a prototype implementation project based generally on the principles described briefly in this paper. The author and others at the Defense Language Institute would certainly be interested in establishing a dialogue with others interested in taking further discussion/development steps in this regard.
References
Bachman, Lyle F. 1987. "Problems in Examining the Validity of the ACTFL Oral Proficiency Interview." In Proceedings of the Symposium on the Evaluation of Foreign Language Proficiency. Bloomington, IN: Committee for Research and Development in Language Instruction, Indiana University.
Bachman, Lyle F. and Sandra Savignon. 1986. "The Evaluation of Communicative Language Proficiency: A Critique of the ACTFL Oral Interview," Modern Language Journal 70(4): 380-390.
Carroll, John B. 1963. "A Model of School Learning," Teachers College Record 64: 723-733.
Carroll, John B. 1981. "Twenty-five Years of Research on Foreign Language Aptitude." In Individual Differences and Universals in Language Learning Aptitude, Karl C. Diller (Ed.), 83-118. Rowley, MA: Newbury House.
Clark, John L. D. 1969. "The Pennsylvania Project and the 'Audio-Lingual vs. Traditional' Question," Modern Language Journal 53(6): 388-396.
Clark, John L. D. and Ray T. Clifford. 1987. "The FSI/ILR/ACTFL Proficiency Scales and Testing Techniques: Development, Current Status, and Needed Research." In Proceedings of the Symposium on the Evaluation of Foreign Language Proficiency, Albert Valdman (Ed.), 1-18. Bloomington, IN: Committee for Research and Development in Language Instruction, Indiana University.
Gardner, Robert C. 1983. "Learning Another Language: A True Social Psychological Experiment," Journal of Language and Social Psychology 2(2-4): 219-239.
Gardner, Robert C. and Wallace E. Lambert. 1972. Attitudes and Motivation in Second-Language Learning. Rowley, MA: Newbury House.
Hayes, Alfred S., Wallace E. Lambert, and G. Richard Tucker. 1967. "Evaluation of Foreign Language Teaching," Foreign Language Annals 1(1): 22-44.
Jarvis, Gilbert A. 1986. "Proficiency vs. Achievement: Reflections on the Proficiency Movement," ADFL Bulletin 18(1): 9-21.
Keating, Raymond F. 1963. A Study of the Effectiveness of Language Laboratories. New York: Institute of Administrative Research, Teachers College, Columbia University.
Kramsch, Claire J. 1986. "Proficiency vs. Achievement: Reflections on the Proficiency Movement," ADFL Bulletin 18(1): 22-24.
Lantolf, James P. and William Frawley. 1985. "Oral Proficiency Testing: A Critical Analysis," Modern Language Journal 69(4): 337-345.
Lett, John A., Jr. 1983. "Research: What, Why, and for Whom?" In Practical Applications of Research in Foreign Language Teaching, Charles J. James (Ed.), 9-49. Lincolnwood, IL: National Textbook Co.
Liskin-Gasparro, Judith E. 1984. "The ACTFL Proficiency Guidelines: A Historical Perspective." In Teaching for Proficiency, The Organizing Principle, Theodore V. Higgs (Ed.), 11-42. Lincolnwood, IL: National Textbook Co.
Omaggio, Alice C. 1983. Proficiency-Oriented Classroom Testing. Washington, DC: Center for Applied Linguistics.
Otto, Frank. 1969. "The Teacher in the Pennsylvania Project," Modern Language Journal 53(6): 411-420.
Oxford, Rebecca L. 1986. Second Language Learning Strategies: A Practical Taxonomy and a Research Synthesis with Implications for Instruction and Measurement. Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. [Technical Report.]
Pederson, Kathleen M. 1986. "An Experiment in Computer-Assisted Second Language Reading," Modern Language Journal 70(1): 36-41.
Pederson, Kathleen M. 1987. "Research on CALL." In Modern Media in Foreign Language Education: Theory and Implementation, Wm. Flint Smith (Ed.), 99-131. Lincolnwood, IL: National Textbook Co.
Robinson, Gail, John Underwood, Wilga Rivers, Jose Hernandez, Carollyn Rudesill and Clare M. Ensenat. 1985. Computer-Assisted Instruction in Foreign Language Education: A Comparison of the Effectiveness of Different Methodologies and Different Forms of Error Correction. San Francisco: Center for Language and Crosscultural Skills.
Smith, Phillip D., Jr. and Helmut A. Baranyi. 1968. A Comparison Study of the Effectiveness of the Traditional and Audiolingual Approaches to Foreign Language Instruction Utilizing Laboratory Equipment. Final Report, Project No. 7-0133, Grant No. OEC-1-7-070133-0445. Washington, DC: U.S. Department of Health, Education, and Welfare, Office of Education.
Smith, Phillip D., Jr. and Emanuel Berger. 1968. An Assessment of Three Foreign Language Teaching Strategies Utilizing Three Language Laboratory Systems. Final Report, Project No. 5-0683, Grant No. OE-7-48-9013-272. Washington, DC: U.S. Department of Health, Education, and Welfare, Office of Education.
Valette, Rebecca M. 1969. "The Pennsylvania Project, Its Conclusions and Its Implications." Modern Language Journal 53(6): 396-410.
Walton, A. Ronald. 1988. Personal Communication, January 28, 1988.
Witkin, H. A. and D. R. Goodenough. 1977. "Field Dependence and Interpersonal Behavior," Psychological Bulletin 84: 661-689.
Author's Biodata
John L. D. Clark is Dean of Program Evaluation, Research, and Testing at the Defense Language Institute in Monterey, California. He was previously Senior Examiner in Foreign Languages at Educational Testing Service and Director of the Foreign Language Education Division of the Center for Applied Linguistics. Dr. Clark received an M.A. in Romance Languages from the University of North Carolina and an Ed.D. in Research in Instruction from Harvard University. His major current interests include foreign language 

Retrieved from calico.org

