Pascual Pérez-Paredes & Belén Díez Bedmar (2009). Language corpora and the language classroom.
    Materiales de formación del profesorado de lengua extranjera (Inglés) Murcia: Consejería de Educación y Cultura.



Language corpora and the language classroom

1. Introduction

These days, language corpora are used more and more often by language teachers, researchers
and students. Computers have become widely available in homes and schools, corpora can be
searched on the Internet for free, and corpus resources have improved both the quality of, and
access to, the methods of corpus linguistics in applied fields such as foreign language
teaching. Compiling your own ad-hoc corpus, or a corpus of your own students' language, is
easier today than ever before, and free resources abound.




The most important application of corpora in language classrooms is called Data-driven
learning. Corpus Linguistics (CL) and Data-driven learning (DDL) are two terms that have
caught the attention of teachers in foreign language teaching (FLT) and researchers alike for a
decade now. This is so because the assumptions behind CL and DDL are of enormous
importance to language researchers and FL teachers. In a very recent publication, O'Keeffe,
McCarthy and Carter (2007:21) state the following about the application of language corpora
in FLT:

        As well as providing an empirical basis for checking our intuitions about language,
        corpora have also brought to light features about language which had eluded our
        intuition […] In terms of what we actually teach, numerous studies have shown us that
        the language presented in textbooks is frequently still based on intuitions about how
        we use language, rather than actual evidence of use.

It seems that language corpora can help us challenge what appears undisputed in
prescriptive or intuition-led textbooks and other reference materials.




In the following paragraphs, we will offer a brief account of the implications of CL and DDL
for mainstream FLT. In particular, we aim to present useful insights into how using language
corpora can help our teaching.

Most of the resources presented in this chapter are freely available on the Internet.








2. Corpus linguistics and Data-driven learning in a nutshell


2.1. Data in FLT: preliminary issues

Data-driven learning is a language learning approach that is “basically developed through
self-conscious activities instead of being imparted through conceptual knowledge” (Pérez
Basanta and Rodríguez Martín: 146-7). In DDL, learners become active researchers: they
see language from a different perspective and discover facts about language and communication
that might otherwise remain unseen.




In DDL, reading concordance lines is common practice. Take the word important, a basic
adjective that learners use every day at school. The following screenshot from the
Collins WordbanksOnline English corpus1 shows fifty random uses of the word in a
10-million-word corpus of spoken British English:




1
    http://www.collins.co.uk/Corpus/CorpusSearch.aspx






Figure 1. Sample concordances of important in the Collins WordbanksOnline English corpus.

In a way, DDL promotes vertical rather than horizontal reading, as learners are invited
to look at the accumulated frequency and co-occurrence of lexical items. In Figure 1, learners
could note the following:

The words to the left of important: more, most, quite, awfully, very, etc.
The words to the right of important: to + infinitive, factor, thing, point, etc.
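This kind of vertical reading can be reproduced on any plain text with a few lines of code. The following is a minimal Python sketch of a keyword-in-context (KWIC) concordancer, assuming plain-text input and a fixed character window on each side of the node; the function names (kwic, sort_left, sort_right, show) are our own illustrative choices and are not part of WordbanksOnline or any other tool mentioned in this chapter. Sorting the lines by the word immediately to the left or right of the node is what brings patterns such as very important or important to together.

```python
import re

def kwic(text, node, width=30):
    """Return (left, node, right) concordance lines for each whole-word hit of `node`.

    The context windows are measured in characters, so a word at the outer
    edge of a window may be cut in half.
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def first_word(s):
    """First word of a context string, lower-cased ('' if none)."""
    words = re.findall(r"[\w']+", s.lower())
    return words[0] if words else ""

def last_word(s):
    """Last word of a context string, lower-cased ('' if none)."""
    words = re.findall(r"[\w']+", s.lower())
    return words[-1] if words else ""

def sort_left(lines):
    """Group lines by the word immediately to the left of the node."""
    return sorted(lines, key=lambda line: last_word(line[0]))

def sort_right(lines):
    """Group lines by the word immediately to the right of the node."""
    return sorted(lines, key=lambda line: first_word(line[2]))

def show(lines, width=30):
    """Print lines with the node aligned in a central column."""
    for left, node, right in lines:
        print(f"{left:>{width}}  {node}  {right}")

# Invented sample sentences, not corpus data.
sample = ("It is very important to keep calm. That was the most important "
          "thing for us. It is quite important, I think. A really important factor.")
show(sort_left(kwic(sample, "important")))
```

Sorted by left context, the toy sample lines up the intensifiers (most, quite, really, very) in one column, which is exactly what learners are asked to notice in Figure 1.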




Concordance lines are also useful for noticing language behaviour that goes beyond two
adjacent words. Take the word sure as an example. The Cambridge Advanced Learner's
Dictionary2 offers eight entries for the word. You can find the entries and examples below:

1: certain; without any doubt:
"What's wrong with him?" "I'm not really sure."

2
    http://dictionary.cambridge.org/




I'm sure (that) I left my keys on the table.
I feel absolutely sure (that) you've made the right decision.
It now seems sure (that) the election will result in another victory for the government.
Simon isn't sure whether/if he'll be able to come to the party or not.
Is there anything you're not sure of/about?
There is only one sure way (= one way that can be trusted) of finding out the truth.
See also cocksure.

2 be sure of/about sb to have confidence in and trust someone:
Henry has only been working for us for a short while, and we're not really sure about him yet.
You can always be sure of Kay.

3 be sure of yourself to be very or too confident:
She's become much more sure of herself since she got a job.

4 be sure of sth be confident that something is true:
He said that he wasn't completely sure of his facts.

5 be sure of getting/winning sth to be certain to get or win something:
We arrived early, to be sure of getting a good seat.
A majority of Congress members wanted to put off an election until they could be sure of
winning it.

6 be sure to to be certain to:
She's sure to win.
I want to go somewhere where we're sure to have good weather.

7 make sure (that) to look and/or take action to be certain that something happens, is true, etc:
Make sure you lock the door behind you when you go out.

8 If you have a sure knowledge or understanding of something, you know or understand it
very well:
I don't think he has a very sure understanding of the situation.

Isolated from any context, sure is usually taught as highly assertive, that is, as a way of
expressing certainty, as in I'm sure I was there. Of course, there is nothing wrong with this: as
you have read above, this is the usual mainstream use of the word. However, if we search for
sure in a corpus, in this case the SACODEYL English corpus of European young people, a
new pattern emerges clearly: I'm not sure + what / if / whether.
See Figure 2:








Figure 2. sure in the SACODEYL English corpus.

It appears that I'm not sure is a powerful pattern for hedging or expressing a tentative opinion,
as in I'm not sure if I'd like to live there. It can also be followed by a canonical Subject + Verb +
Complement clause that indicates contrast or opinion, as in I'm not sure. I've always wanted to
be... or I'm not sure. I find art relaxing because…
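A teacher who wants to check this pattern in a text collection of their own can count what immediately follows the phrase. The sketch below assumes plain-text input and a simple regular-expression tokenizer that keeps sentence-final punctuation as a separate token; following_words is a hypothetical helper, not a SACODEYL feature, and the sample sentences are adapted from the examples above rather than taken from the corpus.

```python
import re
from collections import Counter

def following_words(text, phrase):
    """Count the token that immediately follows each occurrence of `phrase`.

    Sentence-final punctuation is kept as a token, so a count for "." means
    the phrase closed the sentence (as in "I'm not sure. I've always...").
    """
    def norm(s):
        return s.lower().replace("\u2019", "'")  # curly to straight apostrophe
    tokens = re.findall(r"[a-z']+|[.?!]", norm(text))
    target = norm(phrase).split()
    k = len(target)
    counts = Counter()
    for i in range(len(tokens) - k):
        if tokens[i:i + k] == target:
            counts[tokens[i + k]] += 1
    return counts

# Invented sample based on the examples in the text.
sample = ("I'm not sure if I'd like to live there. I'm not sure whether he'll come. "
          "I'm not sure what to say. I'm not sure. I've always wanted to be a vet. "
          "I'm sure I left my keys on the table.")
print(following_words(sample, "I'm not sure").most_common())
```

Note that the last sentence (I'm sure I left my keys...) is not counted, because the search matches the whole three-word phrase and not sure alone.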




As you can see, when we examine the different contexts in which a node (that is, the word you
are looking up) is found, we can clearly see patterns of use that are not always found in
textbooks or dictionaries.




Corpus linguists often discuss this phenomenon and try to account for it by looking at
language as a field of lexico-grammatical interplay rather than one where meaning is created
by words used in isolation (e.g. sure).

Bernardini (2004:16) highlights the fact that in DDL there is a “shift of emphasis from
deductive to inductive learning routines” which has a great impact on the agents of FLT. This
is summarised in Table 1:

FLT agents                    Shift
Teachers                      Become coordinators of research and facilitators
Learners                      Learn how to learn through exercises that involve the observation
                              and interpretation of patterns of use
Pedagogic grammars            Are now informed by enough evidence and stimuli for the learner to
                              arrive at developmentally-appropriate generalisations
Table 1. Shift of emphasis in DDL-FLT (Bernardini 2004: 16-7).




DDL then is about using data to promote richer language learning experiences. The
definition needs clarification, though. D in DDL stands for data, in other words, for language
data:




However, in the CL literature these data are understood in a markedly computational sense. In
the following paragraphs, we will explore the implications for language teachers and try to
dispel the obscurity that the term may carry.


2.1.1. Our English teaching is mediated by language data

We may not have reflected on the issue before, but when we choose a textbook we are
opting for a particular set of language data to be used in our classroom.




In all probability, you face a situation where the Education Authorities have set an official
curriculum that you are bound to abide by. In a similar way, as a member of a large
institution, you are required to follow certain general methodological guidelines. Leaving
organizational aspects aside, however, teachers have the chance to reflect on their teaching
and choose the materials that best suit their learners. What choices can you make in terms of
the contents of your teaching? What are the main ingredients of your teaching? Do you stick
to a textbook? If so, to what extent do you or your Department consider the language it
contains? Have you examined the language used in your textbook?

This is a fundamental issue that deserves our attention. EFL teachers, like most professionals in
other teaching areas, rely on solvent, reliable publishing houses that make an effort to mediate
between learners and their teachers. In this process, the teacher, or group of teachers in a
school, has the opportunity to first review and then select the textbooks that will later be used.
If we use language corpora as a complement to our teaching, we will be broadening the scope
of the language that we present to our students and, certainly, enriching their learning
environment (Aston 1997).

But, before we move on to dealing with the ways in which we can use language corpora, let
us consider briefly the very basics of corpus linguistics.






2.2. Introducing Corpus Linguistics

Corpus linguistics (CL) makes use of data to gain insight into how language works. A well-
known definition for corpus is the following:

         Any collection of more than one text can be called a corpus, (corpus being Latin for
         "body", hence a corpus is any body of text). But the term "corpus" when used in the
         context of modern linguistics tends most frequently to have more specific connotations
         than this simple definition3.


This definition is well rooted in the linguistic tradition, and thus the connotations that
McEnery and Wilson bring up are concerned with the role of a corpus in a research-oriented
paradigm. These connotations are

        representativeness,
        size,
        machine-readable form and
        standard reference.

If linguists claim that using a corpus is a convenient way to research language use and
behaviour, they have to make sure that their tool, that is, their language corpus, and their
methodology are geared towards maximizing the representativeness of the language
samples included in the corpus. McEnery and Wilson have put it this way:

         We are therefore interested in creating a corpus which is maximally representative of
         the variety under examination, that is, which provides us with an as accurate a picture
         as possible of the tendencies of that variety, as well as their proportions. What we are
         looking for is a broad range of authors and genres which, when taken together, may be
         considered to "average out" and provide a reasonably accurate picture of the entire
         language population in which we are interested4.

An example of all this is the British National Corpus (BNC). The BNC claims to be
representative of the English language used in the UK in the late 1980s; its size (100 million
words) is big enough to include most communicative genres and text types; it is of course
electronic and, as a consequence of all this, it has become a standard reference for British
English. The BNC is introduced on its website as follows:

        The British National Corpus (BNC) is a 100 million word collection of samples of
        written and spoken language from a wide range of sources, designed to represent a
        wide cross-section of British English from the later part of the 20th century, both
        spoken and written. The latest edition is the BNC XML Edition, released in 2007.



3
  McEnery and Wilson. Corpus Linguistics. Available at
http://bowlandfiles.lancs.ac.uk/monkey/ihe/linguistics/corpus2/2fra1.htm
4
  Idem.




          The written part of the BNC (90%) includes, for example, extracts from regional and
          national newspapers, specialist periodicals and journals for all ages and interests,
          academic books and popular fiction, published and unpublished letters and
          memoranda, school and university essays, among many other kinds of text. The
          spoken part (10%) consists of orthographic transcriptions of unscripted informal
          conversations (recorded by volunteers selected from different age, region and social
          classes in a demographically balanced way) and spoken language collected in different
          contexts, ranging from formal business or government meetings to radio shows and
          phone-ins5.

The BNC can be searched free of charge at http://www.natcorp.ox.ac.uk/. The results are
limited to 50 hits, but this is enough to get a clear idea of what we are looking at:




                                            Figure 3. The BNC website.


However, using corpora is not the ultimate, one and only solution to linguistic inquiry and
research. This is not the place to revisit the old controversy between Noam Chomsky and
Charles Fillmore, two influential linguists of the second half of the 20th century. The former
openly criticized the use of language corpora, which he did not see as a reliable way to
render the complexity and vastness of language. Chomsky believed that the rules governing a
language could be scrutinized through introspection; actual performance was
considered, by contrast, something that could not be apprehended. Fillmore criticised
armchair linguists who do not use real, that is, attested, language data and instead
rely on their own intuition and idiolect to develop complex theories of language.




5
    From http://www.natcorp.ox.ac.uk/corpus/index.xml




Incidentally, Fillmore similarly criticises corpus linguists who waste their time on design
issues, but that's a different story. The point here is that there has traditionally been a
controversy between introspection and data examination as valid tools for linguistic analysis.
Corpus Linguistics has now gained the interest of many researchers who believe that data need
to be collected before we can jump to conclusions about language use. In this sense, CL
methodology is empirical and data-driven.

Corpus-based research can then be characterised by two main features (Conrad 1999:3-4):

   1. The use of a principled collection of naturally-occurring texts, that is, a corpus, such as
      the BNC discussed above.

   2. The use of computers for language analyses. Depending on the items being analysed,
      these can be automatic or may need human interaction.


Corpus-based studies include both quantitative analyses and functional interpretations of
language use. The following table offers the basics of CL:








Term               Explanation
Chunks             Groups of words that cluster together in n-number of words, e.g. 2, 3, 4, 5, etc.
                   These are not necessarily phrases (e.g. Noun Phrases) or clauses, but rather
                   words that combine in a statistically significant way. I don't know,
                   what I really mean or a couple of are good examples of chunks.
Collocates         Words that occur frequently in contiguity or almost in contiguity. To
                   determine whether a collocate is significant, the software package performs
                   statistical analyses.
Concordance lines  Lines of text which show a node in the middle. The node is the word or string
                   of words that is being searched in a corpus.
Concordancer       The software that generates concordance lines.
Corpus             A principled collection of texts. This collection should follow strict design
                   guidelines if the corpus is to represent a language or a register.
Wordlist           The list of words that are found in a corpus or in a particular text. This list
                   usually shows the frequency of occurrence and, possibly, other statistical
                   indexes.
Table 2. The basics of CL.
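To make the wordlist and collocate entries in Table 2 more concrete, here is a minimal Python sketch of both. The collocate score used is pointwise mutual information (MI) in a common simplified form that ignores the size of the window; this choice is an assumption for illustration, since concordancing packages typically offer several association measures (MI, t-score, log-likelihood) and differ in how they compute them. The function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(tokens):
    """Frequency list: (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def collocates(tokens, node, span=2, min_freq=2):
    """Rank words found within `span` tokens either side of `node` by MI score.

    MI = log2(f(node, c) * N / (f(node) * f(c))), a simplified form that
    ignores the window size. `node` must be lower-case, like the tokens.
    """
    freq = Counter(tokens)
    n_tokens = len(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            co.update(w for w in window if w != node)
    scores = {
        c: math.log2(f_nc * n_tokens / (freq[node] * freq[c]))
        for c, f_nc in co.items()
        if f_nc >= min_freq  # skip rare co-occurrences, which inflate MI
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented sample text, not corpus data.
tokens = tokenize("I don't know. I don't think so. You know what I mean. I know.")
print(wordlist(tokens)[:3])
print(collocates(tokens, "know"))
```

On a text this small the scores mean little; the min_freq threshold is only a crude guard against rare words receiving inflated MI values, which is one reason real tools are run on large corpora.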

All these terms are usually found in descriptive accounts of English and have a very
interesting potential in language learning. For example, chunks are strings of n words that
cluster together in a systematic way. Linguists such as Lewis (1993) or Nattinger and
DeCarrico (1992) have stressed that lexis takes priority over grammar in discourse:

           Lexis is central in creating meaning, grammar plays a subservient managerial role. If
           you accept this principle then the logical implication is that we should spend more
           time helping learners develop their stock of phrases, and less time on grammatical
           structures6.

Corpora are useful in revealing that the language speakers use relies heavily on chunking, that
is, the repetition of strings of words. O'Keeffe, McCarthy and Carter (2007:60) highlight that
“language is available for use in ready-made chunks to a far greater extent than could ever be
accommodated by a theory of language which rested upon the primacy of syntax”. Let us look at
some real instances of chunking in English. These authors used the CANCODE corpus7, a
5-million-word corpus of spoken British English, to generate the most frequent chunks of n
words. These are the chunks ranked first and second:

                        Top 1 chunk                 Top 2 chunk
3-word chunks           I don't know                a lot of
4-word chunks           You know what I             know what I mean
5-word chunks           you know what I mean        at the end of the
6-word chunks           do you know what I mean     at the end of the day

and these are the chunks ranked 15th and 19th (chosen at random):

                        Top 15 chunk                Top 19 chunk
3-word chunks           I think it's                you know the
4-word chunks           or something like that      that sort of thing
5-word chunks           I don't know what it        an hour and a half
6-word chunks           and at the end of the       if you see what I mean (top 16)

6
    Islam and Timmis: http://www.teachingenglish.org.uk/think/methodology/lexical_approach1.shtml
7
    http://www.cambridge.org/elt/corpus/cancode.htm

O'Keeffe, McCarthy and Carter (2007:71) state that despite being syntactic fragments, these
chunks perform a very important pragmatic function beyond the word level and, significantly,
many have a discourse marking function (I mean, you know, you know what I mean, at the
end of the day, if you see what I mean,...).
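Lists like the CANCODE ones above are produced by counting every sequence of n words (n-grams) in a corpus. The sketch below shows the basic idea in Python under simple assumptions: lower-cased text, a regular-expression tokenizer, and chunks that do not cross sentence-final punctuation. It is not the procedure O'Keeffe, McCarthy and Carter followed, and ngrams and top_chunks are hypothetical names.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """All contiguous sequences of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_chunks(text, n, k=10):
    """The k most frequent n-word chunks, not crossing . ? or ! boundaries."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    counts = Counter()
    for segment in re.split(r"[.?!]+", text):
        tokens = re.findall(r"[a-z']+", segment)
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Invented sample utterances, not CANCODE data.
sample = ("You know what I mean? I don't know. At the end of the day, "
          "you know what I mean. I don't know what it is.")
for n in (3, 4):
    print(n, top_chunks(sample, n, k=4))
```

Even in this tiny sample, discourse-marking chunks such as you know what I and know what I mean are the ones that recur, which is the kind of pattern the CANCODE figures show on a much larger scale.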




In the same way, a corpus can be used to generate collocates, frequency lists and, as seen,
concordance lines. There are software packages that can handle this; WordSmith 5.0 is probably
one of the most complete suites available8. Interesting non-commercial applications
include:

Generate concordance lines for every word in a text:
Text-based concordances: http://www.lextutor.ca/concordancers/text_concord/

Generate chunks for a text:
N-Gram phrase extractor: http://www.lextutor.ca/tuples/eng/

Search principled corpora:
Online concordancer: http://www.lextutor.ca/concordancers/concord_e.html

Generate concordance lines, frequency lists, etc.:
Turbo Lingo: http://www.staff.amu.edu.pl/~sipkadan/lingo.htm




8
    http://www.lexically.net/wordsmith/






Figure 4. Online concordancer.






2.3. How can we make use of Corpus Linguistics? Indirect
approaches
Following Geoffrey Leech, Römer (2008) distinguishes between indirect and direct
applications of CL in the field of language teaching. Indirect approaches to corpora provide
access to corpus-informed insights into the nature of language. Those who consume this
information are typically, although not exclusively, researchers and language material writers
and designers. Direct approaches, by contrast, are typically used by teachers and learners
themselves. The following figure summarises this dichotomy:




  Figure 5. Indirect and direct applications of CL in the language classroom (Römer 2008).



Direct approaches focus on hands-on learning activities and the generation of
classroom material. These hands-on experiences can be either guided or unguided by
the teacher, so most teachers are likely to find tasks that suit their
students' needs and contexts.

Indirect approaches to using corpora in the language classroom have occupied the agenda of
applied linguists for over a quarter of a century now. These approaches have benefited from
linguistic research into the nature of language and offer a fresh non-normative view of
naturally occurring language. One of the main contributions of these studies is that corpus
data very often question our perceptions of how language works. Good examples of this are
Biber (1988) and, particularly useful in the context of FLT, Biber et al. (1999):








            Figure 6. Longman Grammar of Spoken and Written English (LGSWE).


The authors of the LGSWE claim that this work “describes the actual use of grammatical
features in different varieties of English: mainly conversation, fiction, newspaper language,
and academic prose […] The LGSWE adopts a corpus-based approach, which means that the
grammatical descriptions are based on the patterns of structure and use found in a large
collection of spoken and written texts, stored electronically, and searchable by computer”
(Biber et al. 1999: 4). So the idea here is that a well-designed corpus can help us learn
more about how language works. This is useful for both native and non-native speakers, as
even native speakers cannot rely on intuition alone to determine how language works across
every single register and communicative domain.




Let us look at one syntactic construction to illustrate the usefulness of corpora in the
language classroom. Existential clauses contain, in most cases, be as the verb and there as the
subject: There is no coffee here is a typical example, with the locative here closing the clause.
However, there can also introduce other verbs: seem, appear, suppose and used to are good
examples. When should we use one rather than another, given how close their meanings are? In
the LGSWE we find corpus-based information showing that the frequency of these verbs after
existential there depends on the textual and domain features of the communicative event.




Thus there exist/exists is very frequent in academic texts, while it is rare in
conversation, fiction and news language. There come/comes, on the contrary, is infrequent in
academic language, conversation and news, but very often found in fiction and creative
language use. Figure 7 illustrates this point:








Figure 7. Verbs other than be in existential constructions. Biber et al. (1999).

When these and similar verbs are followed by to be, we discover interesting facts. There
seem/seems to be occurs across all four registers, while there used to be is atypical
and not frequent at all in fiction, news or academic language:




Figure 8. To be after some verbs in some existential constructions. Biber et al. (1999).


In these examples we can note the interplay between grammatical categories and register.
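Comparisons across registers like these only make sense when raw counts are normalised, because sub-corpora differ in size; grammars such as the LGSWE typically report frequencies per million words. The sketch below shows that normalisation together with a naive search for there followed by a given verb form. The toy "registers" are invented sentences, not LGSWE data, and there_rate is a hypothetical helper. Note that a regular expression cannot separate existential from locative there, so in real work the hits would need checking by hand.

```python
import re

def per_million(count, n_words):
    """Normalise a raw frequency to occurrences per million words."""
    return count * 1_000_000 / n_words

def there_rate(texts, forms):
    """Per-million-word rate of 'there' immediately followed by one of `forms`.

    This cannot tell existential there from locative there; hits should be
    checked by hand before drawing conclusions.
    """
    alternatives = "|".join(re.escape(f) for f in forms)
    pattern = re.compile(r"\bthere\s+(?:" + alternatives + r")\b", re.IGNORECASE)
    hits = sum(len(pattern.findall(t)) for t in texts)
    n_words = sum(len(re.findall(r"[A-Za-z']+", t)) for t in texts)
    return per_million(hits, n_words)

# Toy 'registers' invented for illustration only (not corpus data).
registers = {
    "academic": ["There exists a solution to this problem.",
                 "There exist several methods."],
    "conversation": ["There's no coffee left.", "Is there any milk?"],
}
for name, texts in registers.items():
    print(name, round(there_rate(texts, ("exist", "exists")), 1))
```

With only a handful of words per register the per-million figures are meaningless in absolute terms; the point of the sketch is the normalisation step, which lets counts from sub-corpora of different sizes be compared.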







3. Direct approaches

As stated, direct approaches lend themselves more readily to immediate, straightforward classroom
applications. In some schools, it might be convenient to use a computer room, while
in others teachers will prefer to develop materials that can be printed and distributed. The
nature of the lesson will determine what kind of interaction we expect from our students.


3.1. Some tips

If you want your learners to plunge into using a corpus, our suggestion is to follow a
carefully-planned route:




1. Select a small group of learners. Using technology is cumbersome at times, and computers
tend to crash in multimedia LANs shared by many users. If your LAN restricts IPs or
domains, make sure beforehand that the sites you plan to use are available.

2. Avoid meta-language, such as linguistics, node or principled corpus. It is language, real
language, that your learners will be more interested in.

3. Before getting your students to use a concordancer or a similar tool, distribute activities
where they can get used to reading vertically rather than horizontally. Make sure they get used
to interpreting the context and making hypotheses about contexts of use and prosody, that is,
whether a word is used in a derogatory or a positive way.

4. Decide well beforehand what you want your students to look up. Examples or
activities that are over the top easily discourage students.

5. Try to put interesting questions to your students. Motivate them and make them become
interested in turning themselves into researchers or, better, detectives.

6. Select carefully the corpus you want to use. You may consider building your own corpus.



3.2. Activities: using SACODEYL

A corpus is an excellent tool to discover language behaviour and to learn more about
collocations and patterning. In teaching contexts, principled corpora may not suit
your students' level, especially if they are very young. We recommend that you build your
own collection of texts suited to your students' needs. However, using
SACODEYL is a more straightforward option if you want to use multimedia corpora of teen
talk:







By using a corpus as a tool to find out about language, learners are given the chance to
strengthen the inductive skills they need to learn about language, which is highly instrumental
for further learning. Sinclair (2004:288) is definitely optimistic about the unmediated use of
reference corpora in the language classroom:

        ...both teacher and student can make use of a corpus right away, with only a modest
        few hours orientation; there is no need to wait for the new textbooks and reference
        books. Only fairly simple queries can be handled at this stage, but the results can be
        illuminating and very helpful. For this, you will need a computer of normal
        performance, a corpus and some query software. Will the corpus be 100% reliable,
        comprehensive and representative? Of course not, but do your present books match
        these targets? Or your reference grammars and dictionaries? Or any native speaker
        models? Or any combination of these? Of course not.

Despite Sinclair's optimism, the teaching context in secondary education is still far from
meeting many of the requirements above: good reference corpora are commercial, and
search tools are difficult to handle9. Mauranen (2004:1999) has voiced her concern about the
actual use of innovation in classrooms:

        No teaching method can become an important innovation, whatever its potential, if it
        does not make its way to the normal classroom where teachers and students can use it as
        part of their everyday routines, with not too much extra hassle.

Fortunately, there are now a few instances of pedagogical corpora whose focus is more on
learning than on linguistic research and which happen to be free to use. SACODEYL is one of
these pedagogically-motivated corpora. ELISA, its predecessor and inspiration, is another
interesting effort:

        ELISA is a collection of video-based interviews with native speakers of different
        varieties of English (e.g. US, England, Scotland, Ireland, Australia) and from different
        walks of life. They talk about their professional career. All interviews follow a general
        pattern, covering a similar range of topics, e.g. what the speakers do, their
        educational background, how they started their career or business, the type of projects
        they are involved in, their daily routines and future plans. While some of the speakers
        engage in unusual professions (e.g. a tour guide at Ayers Rock, a guitar teacher, a
        travel journalist and an arts therapist) and thus make for the attraction of the materials,
        they all describe issues of general interest in professional contexts. The corpus
        currently contains 25 interviews of 5 to 15 minutes. The transcripts amount to about
        60,000 words in total10.




9
   Guy Aston and Lou Burnard's The BNC Handbook: Exploring the British National Corpus with SARA
(1998, Edinburgh Textbooks in Empirical Linguistics) is an excellent reference book for fully exploiting SARA.
10
   http://www.uni-tuebingen.de/elisa/html/elisa_index.html




SACODEYL offers young learners the language and the voices of their peers. As in ELISA,
SACODEYL kids talk about their daily routines, about themselves, their schools, their
hometowns, their leisure time activities and hobbies, films, books, sports and many other
topics.

The SACODEYL corpus has been annotated with a view to pedagogical applications. This
makes SACODEYL a very interesting complementary material in mainstream teaching, where
teachers and students can find a familiar range of language and communicative contexts. The
following figure illustrates this:




Figure 9. SACODEYL search categories.

These categories resemble the language and communication-oriented methodology of
mainstream language teaching. Learners and teachers using SACODEYL may want to
navigate the English corpus in exactly the same way as they navigate the contents of their
textbook. In SACODEYL, every interview has been split into sections, that is, convenient
teaching and learning stretches of language which have a pedagogical value. Each section has
been annotated by experienced teachers, who have assigned it a full array of categories and
subcategories. Because the corpus is annotated in this way, it can be searched accordingly:








                          Figure 10. SACODEYL search categories in detail.

Users can also browse interviews:








Figure 11. Browse area for SACODEYL English corpus.

Users can also browse sections within interviews and search for sections that meet the criteria they set:




Figure 12. Browse area for SACODEYL section search.
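Conceptually, a section-level search of this kind filters annotated stretches of language by their categories. The Python sketch below is a hypothetical toy model, not SACODEYL's actual implementation: the `Section` class, the `find_sections` function and the category labels ("Home", "School", "Articles") are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One annotated stretch of an interview (hypothetical toy model)."""
    interview: str
    text: str
    categories: set = field(default_factory=set)


def find_sections(sections, required):
    """Return the sections annotated with every category in `required`."""
    required = set(required)
    # `<=` on sets is the subset test: all required categories must be present.
    return [s for s in sections if required <= s.categories]


if __name__ == "__main__":
    corpus = [
        Section("Rachel", "It's a semi-detached and it's got a garage ...", {"Home", "Articles"}),
        Section("Sam", "It's our first year of GCSEs ...", {"School", "Articles"}),
    ]
    for s in find_sections(corpus, {"Articles", "Home"}):
        print(s.interview, "|", s.text)
```

Requiring several categories at once narrows the results, much as combining criteria does in a section search.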


Let us consider some activities for the language classroom. We assume that your learners are
secondary school students of English, so we will use the SACODEYL English corpus, a small
corpus of teenage talk contributed by some 25 interviewees from the Reading area in the UK.
Here is a selection of activities that illustrate the type of


3.2.1. Activities focused on communication and attention to form

Tell your students to search for [Reading]. You may want to introduce them to the area and
neighbouring cities, all of them widely known. Ask them to read the concordance lines and
get them to classify (A) words on the left, (B) words on the right and (C) contexts of use:




Figure 13. Simple SACODEYL word search.

The following screen shows the number of hits and displays the concordance lines:








Figure 14. SACODEYL Search tool.
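Concordancers such as the one in Figure 14 present each hit as a keyword-in-context (KWIC) line, with the node word aligned in a central column so that learners can read vertically. A minimal Python sketch of the idea follows; the function name `kwic`, the fixed-width character context and the sample sentence are assumptions for illustration, not how SACODEYL builds its display.

```python
import re


def kwic(text, node, width=30):
    """Return keyword-in-context (KWIC) lines for every occurrence of `node`.

    Matching is case-insensitive and respects word boundaries, so a search
    for "Reading" does not match "spreading" or "readings".
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side, line breaks flattened.
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so every node sits in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "I live in Reading. What I like about Reading is the festival."
    for line in kwic(sample, "Reading"):
        print(line)
```

Aligning the node in one column is what makes vertical reading possible: the words to its left and right line up and repeated patterns become visible.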

You may want to guide your students in their search. Providing tables to fill in is usually very
productive as this keeps students focused on the task, which becomes more convergent:

A   Write here the most frequent words or punctuation to the left of Reading
    (like, feel, tell) about  |  (live, be) (here) in  |  the (centre, outskirts) of

B   Write here the most frequent words or punctuation to the right of Reading
    as a place  |  . / ?  |  festival

C   Guess: What is it talked about?
    Context 1: Reactions to / opinions on your hometown / where you live
    Context 2: Staying in Reading or leaving Reading / Travelling
    Context 3: Reading festival

Table 3. Fill-in table.
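Teachers who want to check the expected answers for parts A and B before the lesson could count the tokens immediately to the left and right of the node. The Python sketch below does this; the tokenizer (a simple regular expression that keeps words and the punctuation marks . ? ! ,) and the `neighbour_counts` name are assumptions, and real concordancers tokenize more carefully.

```python
import re
from collections import Counter


def neighbour_counts(text, node, span=1):
    """Count tokens found `span` positions to the left and right of `node`."""
    # Words (allowing one internal apostrophe, as in it's) and . ? ! , as tokens.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.?!,]", text)
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token.lower() != node.lower():
            continue
        if i - span >= 0:
            left[tokens[i - span].lower()] += 1
        if i + span < len(tokens):
            right[tokens[i + span].lower()] += 1
    return left, right


if __name__ == "__main__":
    sample = "What do you like about Reading? I live in Reading. Reading festival is huge."
    left, right = neighbour_counts(sample, "Reading")
    print("Left:", left.most_common())
    print("Right:", right.most_common())
```

With `span=2`, the same function would surface the verb in patterns such as like about Reading, which corresponds to the first column of part A.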

In A and B students are invited to observe the surrounding context of a word and note the
accumulation of certain instances to the left or to the right of the node. In C, students are
invited to make hypotheses about what is being talked about. If you wish, you can explore the uses
of like about / feel about / tell about or [Murcia / Cartagena as a place] or, from a more
communicative perspective, how to express opinions about your city or the place where you
live. If you tell your students to search for [like about], they will be given instances where
kids use it in a real context, embedded in the flow of speech. More importantly, your
students will be presented with an opportunity to disambiguate other uses of [like about]:




Figure 15. SACODEYL Search tool.


In the case highlighted above, [like about] is used as a hedge, a very common feature of
spoken English. This is a convenient way to combine communication-oriented teaching and
form-focused instruction. This range of activities focuses on analysing the context of use
of a given word [Reading], both linguistically and communicatively.








In a unit where music and concerts are presented, you may want to ask your students to find
out about [Reading Festival]. This is what they may find11:




Figure 16. SACODEYL Search tool.


From here, students can go to the interview section where the speaker talks about it:




11
     At the time of writing, the corpus search facility was under construction, so search results may vary.






Figure 17. SACODEYL Search tool: section level.

and read and listen to what this speaker says about it:




Figure 18. SACODEYL English corpus: section level.

It is interesting to see how the online (real-time) nature of spoken discourse affects the way we put things
while speaking. In this very short extract, your students can find the following, among others:

-Native correction: [gonna to]
-Unfinished sentences: [been so, but]
-Contractions not frequently used by Spanish speakers: [it'll be]
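To pre-select transcript extracts that are rich in contractions before class, a teacher could list candidate forms with a regular expression. The Python sketch below is a rough shortcut under stated assumptions: it only recognizes the endings 'll, 's, 've, 're, 'd, 'm and n't, accepts straight or curly apostrophes, and will also match possessive 's (e.g. Rachel's), so its output still needs a teacher's eye. The `find_contractions` name is illustrative.

```python
import re

# A word followed by a common contracted ending; straight (') or curly (’) apostrophes.
CONTRACTION = re.compile(r"\b\w+(?:['’](?:ll|s|ve|re|d|m)|n['’]t)\b", re.IGNORECASE)


def find_contractions(transcript):
    """Return the contracted forms in a transcript, in order of appearance."""
    return CONTRACTION.findall(transcript)


if __name__ == "__main__":
    print(find_contractions("I don't know, it'll be fine and we've finished."))
```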






As Bernardini (2004: 17) puts it, “concordancing in particular may prove unique in
the acquisition and restructuring of competence [...] Language learning may be viewed as an
inductive process in which meaning and form come to be associated”.



3.2.2. Activities focused on attention to form and communication

Römer (2008: 19) has pointed out that concordance lines can be used by teachers to “create
DDL exercises tailored to their learners' proficiency level and their particular learning needs”.
A case in point is the use of articles, which will be dealt with later in section 4 from a different
angle.

Let us search for sections in SACODEYL English corpus that have been annotated as being
representative of this particular linguistic feature:




Figure 19. SACODEYL English corpus: category search on section level.

From these results you may want to select stretches of language that can be given to students for
evaluation and analysis, or simply used as materials to improve their mastery of
the form. The following extracts are interesting for different reasons. (A) is very
convenient for observing the use of the indefinite article:

(A)






Interviewer: So, what kind of house do you live in? Can you describe what kind of
house you live in?
Rachel: It's a semi-detached and it's got a garage and a big garden and it's quite big. It's got
quite a lot of rooms but I have to share my room with my sister.


You could present this in a cloze format:

Interviewer: So, what kind of ...house do you live in? Can you describe what kind of
...house you live in?
Rachel: It's ... semi-detached and it's got ...garage and ... big garden and it's quite big. It's
got quite ... lot of rooms but I have to share my room with my sister.

In B, we can notice the presence of the zero article:

(B)

Interviewer V: You say you've got a lot of work this year why is that?
Sam: It's our first year of GCSEs so you've got course work and it's like
writing essays for different subjects. And recently we've been doing English we
did a we did a we did course work on a book Hard Times by Charles Dickens. Which
was a bit boring but, but we've finished that now so it's alright.

You could present this in a cloze format:

Interviewer V: You say you've got a lot of work this year why is that?
Sam: It's our first year of GCSEs so you've got ...course work and it's like
writing ...essays for ...different subjects. And recently we've been doing ...English we
did a we did a we did ...course work on ... book Hard Times by Charles Dickens. Which
was a bit boring but, but we've finished that now so it's alright.
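Cloze texts like (A) and (B) can be drafted automatically and then edited by hand. In the Python sketch below, every a, an and the is replaced by a gap; because the zero article has no written form, the sketch assumes the teacher first types a hypothetical marker Ø wherever a zero-article gap is wanted. The `make_cloze` name and the Ø convention are illustrative choices, not part of SACODEYL.

```python
import re

# a / an / the as whole words, or the hand-inserted zero-article marker Ø.
ARTICLE = re.compile(r"\b(?:a|an|the)\b|Ø", re.IGNORECASE)


def make_cloze(text, gap="..."):
    """Replace each article (and each Ø marker) with a gap.

    Returns the gapped text and the answer key in order of appearance.
    """
    answers = [m.group(0) for m in ARTICLE.finditer(text)]
    return ARTICLE.sub(gap, text), answers


if __name__ == "__main__":
    extract = "It's a semi-detached and it's got a garage and a big garden."
    cloze, key = make_cloze(extract)
    print(cloze)  # It's ... semi-detached and it's got ... garage and ... big garden.
    print(key)    # ['a', 'a', 'a']
```

Applied to (B), the sketch would also blank the a in a lot of work and a bit boring, as well as the false starts in we did a we did a, so the draft still needs editing before it reaches students.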

In fact, (B) can easily be expanded into an interesting source of pragmatic information,
including sentence restructuring [did a we did a we did], sentence relatives to express evaluation
[Which was a bit boring] and conclusion [so it's alright].


Barlow (1996) sees in activities like these a potential for teachers to enrich the learning
environment and students‟ knowledge of language.

For a thorough account of concordance-based DDL, we suggest reading a practical book on
the issue (Tribble and Jones 1990):








Figure 20. Concordances in the classroom, by Chris Tribble and Glyn Jones. Longman 1990.







4. Indirect approaches: Learner corpora in the EFL classroom


4.1. Definition

Among the many types of corpora which can be compiled, analysed and used (see McEnery,
Xiao and Tono, 2006, for an overview), Computer Learner Corpora (CLC) stand out as one of
the most powerful pedagogic tools for the EFL or ESL classroom. As recently defined, they
are

'[…] electronic collections of foreign or second language learner texts collected on the basis
of strict design criteria.' (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007: 254)

In other words, a learner corpus is compiled when the oral or written texts produced by your
students of English are collected with strict design criteria, put in electronic format, and then
stored on your hard drive, memory stick, etc., so that you can analyse them with
programmes like WordSmith Tools, already mentioned:




Figure 21. From oral or written texts to a computer learner corpus.

Thanks to the spread of computers and of freely available software to carry out analyses,
Learner Corpus Research (LCR) has been a fruitful field since the second half of the 1990s.




From that moment onwards, the growing number of publications, either in edited volumes (cf.
Granger, 1998; Granger, Hung and Petch-Tyson, 2002; Gilquin, Papp and Díez-Bedmar, in
press, etc.) or in international journals (cf. Corpora, Applied Linguistics, English Corpus
Studies, Journal of English for Academic Purposes, ReCALL, etc.), shows the potential of this
type of research and represents a first step towards awareness of the possibilities that CLC
can offer for Second Language Acquisition and for the TEFL or TESOL classroom.







4.2. Types of CLC

Given the importance of CLC-based results, the number of CLC has mushroomed since the
second half of the 1990s. The research questions pursued by various researchers or research
teams have fostered different types of CLC, which are frequently classified according to four
related variables, namely the mode of the language in the learner corpus, its size, the type of
intervention (i.e. when the CLC-based results will be applied in the design of materials, the
sequencing of the curriculum, etc.), and the type of annotation in the corpus.


 Mode                       Written
                            Spoken
                            Multimedia
 Size                       Big (commercial or some research teams)
                            Small (research)
 Type of Intervention12     Delayed Human Intervention
                            Early Human Intervention
 Type of annotation13       Raw
                            POS-tagged
                            Semantically tagged
                            Error-tagged
Table 4. Main variables considered for the classification of learner corpora.



4.3. Methodologies used with CLC

Compiling students' production is not a new practice for teachers of English as a
second or foreign language, as it has always been used to create remedial exercises, test
students' command of the foreign language, etc. However, the methodology used to analyse
students' production has changed over time, as researchers and teachers have
focused their attention on different aspects (the students' L1, the target language, etc.) and
technology has made it possible to compile CLC, i.e. learners' real data in electronic format.




Table 5 shows the three main methodologies used before the arrival of CLC. The first one,
Contrastive Analysis, in its strong form, did not consider the students' production, but the
similarities and differences between the students' L1 and their target language (i.e. Spanish
and English), in order to predict the difficulties that students would have. The weaknesses
found in this methodology led researchers to shift their attention to Error Analysis, whose
theoretical principles and methodological issues were set out in a series of articles in the
1960s and 1970s (reprinted in Corder, 1981). Especially outstanding was the paper 'The
significance of learners' errors' (included in Corder, 1981), which argued that errors were
crucial to researchers, teachers and students, since they could all learn from them and apply
that knowledge to their research, teaching practice or learning process. Thus, the steps for
conducting an EA were followed by many teachers and researchers, and the results were published,
on some occasions, as dictionaries and lists of common errors.

12
     This distinction was made by Sinclair (2001: vii).
13
     For the types of annotation, refer to McEnery and Wilson

However, Error Analysis only considered errors and dismissed the learners' correct use of the
foreign or second language. This led Selinker to his Interlanguage Analysis (IA) (Selinker,
1972), which examined the students' entire production, i.e. errors and non-errors alike. In this
way, it was possible to obtain a better description of the students‟ use of the foreign language
when performing a task at a specific point in time in their language learning process: their
interlanguage.


         Methodology                    Focus of interest                   Publications
Pre CLC  Contrastive Analysis (CA)      Comparison of the students' L1      Lado (1957)
                                        and their TL
         Error Analysis (EA)            Students' real errors               Corder (1981)
         Interlanguage Analysis (IA)    The students' whole production,     Selinker (1972)
                                        errors and non-errors
Table 5. Methodologies used to describe the students' language before CLC.


Although not in a systematic way, teachers of English as a foreign or second language
frequently analyse their students' production following any of these methodologies or a
combination of them.




For instance, an Error Analysis is conducted when a teacher corrects a batch of essays and
uses a code system, i.e. an error taxonomy,14 to make the students aware of the type of error
made. Thus, 'sp' may stand for a spelling error, 'wo' for word order, 'prep' for a problem
with a preposition, etc. After marking all the essays and skimming his or her annotation, the
teacher realises that the most frequent error in the set of essays has to do with a
certain aspect of the foreign language (be it prepositions, articles, verb tenses, etc.). If the
correct instances of that aspect are considered together with the incorrect ones, an
Interlanguage Analysis is conducted. Finally, if the students' L1 is compared to their TL
either before or after analysing their production in an attempt to explain the causes of the
students' errors, a CA in its strong or weak version, respectively, is completed.

14
  For an overview of various error taxonomies, refer to Dulay, Burt and Krashen (1982: 146-197) or James
(1998: 102-117).

The manual analysis of the students' errors, following a CA, EA or IA methodology, is a
time- and effort-consuming task which a teacher can only carry out with a limited number of
essays, as it is necessary to go through the essays, look for the errors, highlight, classify and count
them, make sure all the errors are being considered, look for the correct uses of the aspect of
the language being analysed, compare the use of that aspect in the L1 and the
FL, etc. Fortunately, these processes have been sped up by improvements in
technology and, consequently, by the advent of CLC, whose electronic format is among their
main advantages (Nesselhauf, 2004: 139-40), since it makes both compiling and analysing
learner data easier.

To avoid falling prey to the temptation to collect huge, disorganized amounts of data, as is the
case with corpora in general (see section 2.2. above), strict design criteria must be observed
when compiling a learner corpus. Special attention needs to be given to the principles of
authenticity and representativeness, and every effort should be made to control variability,
so that aspects are not compared across a non-homogeneous learner corpus. Thus, if the
teacher aims to represent students' in-class argumentative writing at intermediate level,
pieces of writing which belong to other genres, which are written by students at other
proficiency levels, or which are written at home (and presumably with access to reference
materials) should not be included in that corpus, since the results would be biased. Just
consider, from your own experience, how the type and number of errors differ between an
argumentative essay written in class (without dictionaries, online resources, etc.) and one
written at home or, likewise, the types of errors that you expect in descriptive writing as
compared to narrative writing.
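If every text is stored with a few metadata fields, the design criteria above can be applied mechanically before any analysis. The Python sketch below assumes hypothetical fields (genre, level and setting) recorded for each essay; the `Essay` class, the `select_subcorpus` function and the field values are illustrative, not a standard learner-corpus format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Essay:
    """A learner text plus the metadata needed to apply design criteria."""
    student_id: str
    genre: str    # e.g. "argumentative", "narrative", "descriptive"
    level: str    # e.g. "intermediate"
    setting: str  # "class" (no reference materials) or "home"
    text: str


def select_subcorpus(essays, genre, level, setting="class"):
    """Keep only the essays that match every design criterion."""
    wanted = (genre, level, setting)
    return [e for e in essays if (e.genre, e.level, e.setting) == wanted]


if __name__ == "__main__":
    essays = [
        Essay("s1", "argumentative", "intermediate", "class", "..."),
        Essay("s2", "argumentative", "intermediate", "home", "..."),
        Essay("s3", "narrative", "intermediate", "class", "..."),
    ]
    print([e.student_id for e in select_subcorpus(essays, "argumentative", "intermediate")])
```

For the in-class intermediate argumentative corpus described above, the home essay and the narrative are left out, so any later comparison is not biased by genre or writing conditions.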

Drawing on the methodologies of the pre-CLC era, the analysis of students' use of
language, as represented in a learner corpus, is nowadays being made in a systematic and
scientific way following Computer-aided Error Analysis (CEA), Contrastive Interlanguage
Analysis (CIA) or the Integrated Contrastive Method (ICM):


     Methodology                        Focus of interest                 Publications
CLC  Computer-aided Error Analysis      Students' real errors, as         (Dagneaux, Dennes
     (CEA)                              attested in a CLC                 and Granger, 1998)
     Contrastive Interlanguage          Comparison of                     (Granger, 1996)
     Analysis (CIA)                       NS vs. NNS production
                                          NNS vs. NNS production
     Integrated Contrastive Method      CA                                (Granger, 1996;
     (ICM)                              CIA                               Gilquin, 2000/2001)
Table 6. Methodologies used in the description of the learners' production of the foreign
language.


The first one, CEA, is a 'new type of EA' (Dagneaux, Dennes and Granger, 1998: 165). In
other words, it is a computerized version of EA, which allows quicker error annotation and
easy retrieval of the erroneous instances of students' use of the foreign language. There are
two ways to conduct such an analysis, depending on whether the learner corpus is error-tagged
or not, i.e. whether a code system to highlight the errors has been used or not.

If it is not, an intuitive search for an error-prone aspect is undertaken. This is the case when
the teacher feels that the central articles the and a(n) pose problems for his or her students. By
means of a learner corpus and retrieval tools, s/he can read the retrieved concordance lines to see
how those articles are used and decide which uses are incorrect, thus conducting an EA.




However, a raw learner corpus, i.e. one without error annotation, will not allow the researcher
to retrieve instances of the (mis-)use of the zero article: since the zero article has no written
form, there is nothing for a search tool to find. To do so, the learner corpus needs to be error-tagged.

There are two types of error-tagged learner corpora:

                                    Fully error-tagged and
                                    Partially error-tagged

In the former, a comprehensive error taxonomy has been used to highlight all the possible
errors in a learner corpus. Although few learner corpora are fully error-tagged, for practical
reasons of time and money, the results that such EAs yield provide a bird's-eye view
of the students' problems when using the foreign language at a specific moment in their
language acquisition process. As an example, Figure 22 shows the percentage of errors in
forty-three aspects of the foreign language (as represented in the error tags on the horizontal
axis) in the written production of first-year university students at the beginning of
the academic year (Díez-Bedmar, 2005):




Figure 22. EA of first-year university students at the beginning of the academic year (Díez-Bedmar, 2005).
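A profile like the one in Figure 22 comes down to counting how often each error tag occurs and expressing each count as a percentage of all tagged errors. The Python sketch below assumes a hypothetical inline format in which every error is marked by an upper-case code in parentheses, such as (GA) or (FS), followed by the learner's form and a correction in $...$; real error-tagging systems differ, and the regular expression would also pick up any other upper-case code in parentheses.

```python
import re
from collections import Counter

# Hypothetical tag format: one to four capital letters in parentheses, e.g. (GA).
TAG = re.compile(r"\(([A-Z]{1,4})\)")


def error_profile(tagged_texts):
    """Return {tag: percentage of all tagged errors}, most frequent tag first."""
    counts = Counter()
    for text in tagged_texts:
        counts.update(TAG.findall(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100 * n / total for tag, n in counts.most_common()}


if __name__ == "__main__":
    essays = ["I have (GA) 0 $a$ dog and a (FS) freind $friend$.", "(GA) The $0$ life is hard."]
    for tag, pct in error_profile(essays).items():
        print(f"{tag}: {pct:.1f}%")
```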






A partially error-tagged learner corpus only highlights a specific type of error, which is of
interest to the teacher or the researcher. Returning to the case of the central articles, a partially
error-tagged learner corpus makes it possible to easily retrieve, quantify and analyse the
errors made with the articles the and a(n) (as was the case with a raw learner corpus), but
also those errors involving the zero article (Ø). Notice in the following concordance lines the
incorrect uses of a(n), followed by the erroneous uses of the zero article and then the erroneous
uses of the, all error-tagged (GA).
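The grouping in Figure 23 (a(n) errors, then zero-article errors, then errors with the) can be reproduced from any article-tagged text. The Python sketch below assumes a hypothetical format, "(GA) form $correction$", in which form is what the learner wrote and 0 stands for the zero article; the format, the `article_errors` name and the fixed character context are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical tagging: "(GA) <learner's form> $<correction>$", 0 = zero article.
GA = re.compile(r"\(GA\)\s+(\S+)\s+\$([^$]*)\$")


def article_errors(text, width=25):
    """Group KWIC-style lines for GA-tagged errors by the learner's form."""
    groups = defaultdict(list)
    for m in GA.finditer(text):
        form = m.group(1).lower()
        key = "a(n)" if form in ("a", "an") else form  # "the", "0", ...
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        groups[key].append(f"{left:>{width}} {m.group(0)} {right}")
    # Order the groups as in the figure: a(n), zero article, the, then anything else.
    order = {"a(n)": 0, "0": 1, "the": 2}
    return dict(sorted(groups.items(), key=lambda kv: order.get(kv[0], 3)))


if __name__ == "__main__":
    sample = "I love (GA) the $0$ nature. She is (GA) a $0$ good news. We need (GA) 0 $a$ car."
    for form, lines in article_errors(sample).items():
        print(form)
        for line in lines:
            print("   ", line)
```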




Figure 23. Article errors as retrieved from a partially error-tagged learner corpus using
WordSmith Tools.


The second methodology used with CLC, the Contrastive Interlanguage Analysis, allows the
researcher to compare the students‟ production with:

1   the production by native speakers of English
2   the production by other groups of learners of English with a different L1

On the one hand, if your students' production is compared to that of native students of
English (at the same level and under the same external variables), it is possible to see
how (dis)similar both productions are when an aspect of the foreign language is studied. As a
result, instances of misuse, but also of underuse or overuse, are revealed, and conclusions can
be drawn such as the overuse of the prepositions between, inside and according to by Spanish
university students when compared to native students of English (Martínez Osés
and Neff, 2001: 144). On the other hand, you may be interested in comparing how various
groups of students of English (at the same proficiency level and under the same external
variables) struggle with the same aspect of the foreign language, as Kaszubski (2001) did
when comparing the use of the lemma be by Spanish, Polish and Belgian-French students.
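Over- and underuse of this kind is usually established by normalising raw frequencies to a common base and then testing whether the difference could be due to chance. The Python sketch below uses the log-likelihood (G²) statistic, a common choice in corpus comparison; the `compare_frequency` name, the base of 10,000 words and the example counts are assumptions for illustration, not figures from Martínez Osés and Neff (2001).

```python
import math


def compare_frequency(freq_learner, size_learner, freq_native, size_native, per=10_000):
    """Compare one item's frequency in a learner corpus and a native control corpus.

    Returns normalised frequencies (per `per` words), the log-likelihood (G2)
    statistic and a label; G2 above 3.84 corresponds to p < 0.05 (1 d.f.).
    """
    total_words = size_learner + size_native
    total_hits = freq_learner + freq_native
    # Expected frequencies if the item were equally common in both corpora.
    expected_l = size_learner * total_hits / total_words
    expected_n = size_native * total_hits / total_words
    g2 = 0.0
    for observed, expected in ((freq_learner, expected_l), (freq_native, expected_n)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    norm_l = freq_learner / size_learner * per
    norm_n = freq_native / size_native * per
    label = "overuse" if norm_l > norm_n else "underuse" if norm_l < norm_n else "same"
    return norm_l, norm_n, g2, label


if __name__ == "__main__":
    # Invented counts: 60 hits in a 100,000-word learner corpus vs 30 in a native one.
    print(compare_frequency(60, 100_000, 30, 100_000))
```

With these invented counts the learners use the item twice as often per 10,000 words (6.0 vs 3.0), and G² is about 10.2, above 3.84, so the overuse would be treated as significant at p < 0.05.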




Finally, the Integrated Contrastive Method combines a CIA and a corpus-based CA. Therefore,
three different corpora are used, namely the learner corpus, the control corpus and a corpus
which contains the production by native speakers in the L1. As with CA in the
pre-CLC era, there are two ways of conducting an ICM. In the first, the corpus-based CA is
conducted in order to see the main differences between the two native languages considered,
and the problems posed by such differences are then checked in the learner corpus. In the
second, the problems in a learner corpus, as revealed by a CIA, may lead to a corpus-based
analysis of the two native languages in an attempt to find the causes of such errors.


4.4. The application of CLC in the TEFL classroom

This section explores the potential of CLC in both the indirect and the direct approach.
The first subsection deals with the indirect approach, that is, using the results of CLC
analyses (following the methodologies described in 4.3) to improve teaching materials,
curricula, etc., whereas the second focuses on the direct approach, which provides
hands-on experience in working with CLC.


4.4.1. The indirect approach

Although CLC-based descriptions of students' interlanguage are still limited and only
provide '[…] patchy knowledge of the different stages of interlanguage development'
(Gilquin et al., 2007: 322), their results are progressively being incorporated into
teaching materials.

Among the materials that have benefited most from CLC findings are dictionaries of
common errors, such as the Longman Dictionary of Common Errors (Turton and Heaton,
1987) and the Cambridge series Common Mistakes at… (Tayfoor, 2004; Driscoll, 2005; etc.),
which highlight and explain errors that are frequent in learner corpora.








Figure 24. CLC-informed materials focused on common errors.

Monolingual learners' dictionaries have also been informed by CLC. The first was the Longman
Essential Activator (LEA), which drew on the Longman Learner's Corpus (LLC). It was followed
by others, such as the Cambridge International Dictionary of English, based on the
error-tagged Cambridge Learners' Corpus (Nicholls, 2003), and the second edition of the
Macmillan English Dictionary for Advanced Learners, based on a CIA of the International
Corpus of Learner English (ICLE) and a corpus of native speakers' academic writing.




Figure 25. CLC-informed monolingual dictionaries of English.


The CLC-based information in these dictionaries is typically provided in 'help boxes', which
are familiar to any learner of English as a foreign or second language. However, new
ways of presenting information from CLC are being devised, as is the case with the graphs in
the Macmillan English Dictionary for Advanced Learners, which show the results of the CIAs
conducted on problems of frequency, register confusion, etc. The dictionary also suggests
alternatives to students' typical errors (exemplified from the control corpus) and includes
extended writing sections on twelve rhetorical or organizational functions that are
particularly prominent in academic writing (cf. Gilquin, Granger and Paquot, 2007,
pp. IW1-IW29).








Figure 26. CLC-based results as provided in the Macmillan English Dictionary for Advanced
Learners (MED2).


Recent grammars also include information from learner corpora, as is the case with Carter and
McCarthy's (2006) Cambridge Grammar of English and the online Chemnitz Internet
Grammar of English.




Figure 27. CLC-informed grammars of English.


Finally, CLC may inform CALL programmes, such as WordPilot (Milton, 1998), or be
integrated into them, so that teachers and students can, if they wish, have direct
access to the real data, as in the EXample eXtractor Engine for LAnguage Teaching
(eXXelant) (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007).




Although syllabuses, textbooks and writing courses have begun to draw on native-speaker
corpus data in their recent editions (cf. the Touchstone Student's Book series), the
information provided by CLC can complement and improve such materials so that they
meet students' real needs.






4.4.2. Designing remedial exercises from a learner corpus

Analysing a learner corpus and designing CLC-based remedial exercises to meet your
students' real needs is not a difficult task. To help you analyse the data in a learner
corpus, this section explores two ways of approaching a small raw learner corpus. The first
deals with the students' use of vocabulary, and the second with the lexico-grammatical
patterns of the verbs 'say' and 'tell'.




The learner corpus used here consists of the handwritten production of 16 first-year
university students (17,765 words in total), who wrote descriptive texts in class within a
60-minute time limit and without access to any reference materials. The software used
for the analysis is WordSmith Tools version 4.0.

4.4.2.1. Exploring vocabulary usage: word lists and concordances

WordSmith Tools allows the teacher or researcher to create word lists, run concordances
and explore keywords, as can be seen in Figure 28. Here, however, we will focus on word
lists and concordances for an exploratory analysis of the adjectives used by a group of
learners.








Figure 28. WordSmith Tools 4.0.


As the term itself indicates, a word list is a list of the words in your learner
corpus (the term was reviewed in Table 2 above). The list may be ordered by frequency,
from the word with the highest number of occurrences down to those that appear only
once, or the other way round.
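For readers who want to see what a frequency-ordered word list involves outside WordSmith Tools, here is a minimal Python sketch. It is not how WordSmith works internally: it assumes the learner texts have already been typed up as plain text, and its simple tokenisation rule (lower-cased runs of letters, keeping an internal apostrophe) is an assumption that will not match WordSmith's settings exactly. The function name `word_list` is hypothetical.

```python
import re
from collections import Counter


def word_list(text, descending=True):
    """Return (word, frequency) pairs ordered by frequency, ties alphabetically."""
    # Simplified tokenisation: lower-case runs of letters with an optional
    # internal apostrophe (e.g. "don't"); numbers and punctuation are ignored.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    return sorted(
        counts.items(),
        # Most frequent first, or words occurring once first when ascending.
        key=lambda item: (-item[1] if descending else item[1], item[0]),
    )


sample = "The city is big. The people are nice and the food is good."
for word, freq in word_list(sample)[:3]:
    print(word, freq)
```

Ties are broken alphabetically so that the list is stable from one run to the next, which makes it easier to compare lists from different classes.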

As can be seen in Figure 29 below, a word list of the adjectives used in the learner corpus
was obtained by removing from the full list all the words that do not belong to this open
word class. The list shows that the adjectives these students used most were 'good',
'important' and 'different'. This finding may not surprise an experienced teacher, but the
co-text in which these adjectives are used may reveal interesting and unexpected gaps in
the learners' vocabulary.
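The filtering step can be sketched in the same spirit. In the study described here, non-adjectives were removed from the WordSmith list by hand; the sketch below instead assumes a teacher-compiled set of adjectives (the `ADJECTIVES` set and the `adjective_list` function are hypothetical) and keeps only the words in it, most frequent first.

```python
# Hypothetical, teacher-compiled set of adjectives to keep.
ADJECTIVES = {"good", "important", "different", "big", "nice", "interesting"}


def adjective_list(frequencies, adjectives=ADJECTIVES):
    """Keep only the listed adjectives from a {word: frequency} mapping."""
    kept = [(word, freq) for word, freq in frequencies.items() if word in adjectives]
    # Most frequent first; alphabetical order breaks ties.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


freqs = {"the": 50, "is": 30, "good": 12, "important": 9, "different": 7, "people": 6}
print(adjective_list(freqs))
```

A fixed set cannot tell word-class homographs apart (for example 'present' as an adjective or a verb), so a concordance check remains necessary before drawing conclusions.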

In order to explore such co-texts, the next step is to run a concordance of any of these words.
For this example, 'important' was selected. As can be seen in Figure 30 below, a concordance
displays every occurrence of the searched word in its context, with the word centred and
highlighted in blue. The searched word is known as the node or 'Key Word In Context' (KWIC).
Concordance lines are not read in the traditional way (that is, from left to right, as seen
above); instead, we focus on the first word to the left or to the right of the KWIC. This
shows the type of pre-modification the students use with the adjective under consideration
(first word to the left of the KWIC) and which elements they describe as 'important'. As
already reported (cf. Granger and Tribble, 1998 or Osborne, 2004, among others), students
over-rely on this adjective at the expense of others such as 'crucial', 'outstanding', 'main'
or 'valuable' in the appropriate contexts. Therefore, a very easy
exercise to create from the students' own compositions is to remove the KWIC and leave a
blank, so that they have to think of a better alternative that fits the linguistic
contexts they have created.
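A concordance and its gapped version can be produced with a few lines of code. The sketch below is a simplified stand-in for WordSmith's Concord, not a reproduction of it: it assumes plain-text input, uses a fixed character window on each side of the node (the `width` parameter is an assumption), matches whole words only, and replaces the node with a fixed-length blank to produce the gap-fill lines. The function names are hypothetical.

```python
import re


def concordance(text, node, width=30):
    """Return (left, node, right) tuples for every whole-word match of node."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = r"\b" + re.escape(node) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()].strip()
        right = text[match.end():match.end() + width].strip()
        lines.append((left, match.group(0), right))
    return lines


def kwic_rows(lines, width=30, gap=False):
    """Format concordance lines with the node centred; optionally blank it out."""
    rows = []
    for left, node, right in lines:
        middle = "__________" if gap else node  # fixed length hides the word's length
        rows.append(f"{left:>{width}} {middle} {right}")
    return rows


essay = "It is very important to study. The most important thing is time."
hits = concordance(essay, "important", width=10)
for row in kwic_rows(hits, width=10, gap=True):
    print(row)
answer_key = [node for _, node, _ in hits]
```

With a wide window, other occurrences of the node can appear in the context of a gapped line, so a teacher may want to check the rows before printing the worksheet.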




Figures 29 and 30. WordSmith Tools: Running a concordance and hiding the KWIC.






Figure 32 presents a screenshot of such a worksheet, which you can paste into a Word
document and use in class. The main strength of this exercise is that it is based on your
students' own errors and therefore caters for their very specific needs. Furthermore,
students are likely to feel more motivated to do this exercise, since they may recognise
their own sentences and want to learn how to improve them.




Figure 31. Concord utility.








Figure 32. Worksheet in a .doc document.


4.4.2.2. Exploring lexico-grammatical patterns: 'say' and 'tell'

The verbs 'say' and 'tell' are reported to pose difficulties for students at various
levels because of their different lexico-grammatical patterns. However, it is worth
checking whether your own students actually make these mistakes and, if so, which uses
are the most problematic.




To do so, the first step is to run a concordance of the verb 'say' and sort the lines by
the first word to the right of the KWIC, as shown in Figures 33 to 35.
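Sorting by the first word to the right (often called R1 sorting) can be sketched as follows. This is an illustrative sketch, not WordSmith's implementation: it assumes plain-text input, treats the forms in the hypothetical `SAY_FORMS` tuple as the lemma 'say', keeps punctuation as separate tokens (so a line ending in 'said.' sorts under '.'), and shows five tokens of context on each side.

```python
import re

SAY_FORMS = ("say", "says", "said", "saying")  # hypothetical lemma definition


def tokenize(text):
    """Split into words (with internal apostrophes) and punctuation marks."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,!?;:]", text)


def concordance_sorted_by_r1(text, forms=SAY_FORMS, span=5):
    """Concordance lines for the given forms, sorted by the first token to the right."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() in forms:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            r1 = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
            lines.append((r1, i, left, token, right))
    lines.sort()  # by R1 first, then by position in the text
    return [(left, node, right) for _, _, left, node, right in lines]


texts = "She said that he was late. They say me the truth. He said to me that it was fine."
for left, node, right in concordance_sorted_by_r1(texts):
    print(f"{left:>30} [{node}] {right}")
```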








Figures 33 to 35. Running a concordance and sorting the lines by the first element to the
right of the KWIC.


Sorting the lines in this way shows how the students complement the verb 'say' in the
different contexts and co-texts they have created themselves. Checking these uses also
makes it possible to notice cases where 'tell' would have been preferred to 'say', or where
another wording would have been more native-like.
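Once the lines are sorted, a quick tally of the R1 items for 'say' and 'tell' can point to suspicious patterns. The heuristic below is our own assumption, not a method described in this text: it counts what follows each form and flags a form of 'say' directly followed by an object pronoun (e.g. *'said me'), a position where 'tell' would usually be expected. It will produce false positives such as 'I would say you are right' or 'say her name', so every flagged line still needs the teacher's judgement. All set and function names are hypothetical.

```python
import re
from collections import Counter

SAY_FORMS = {"say", "says", "said", "saying"}
TELL_FORMS = {"tell", "tells", "told", "telling"}
OBJECT_PRONOUNS = {"me", "you", "him", "her", "us", "them"}


def tokenize(text):
    """Words (with internal apostrophes) and punctuation marks as separate tokens."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,!?;:]", text)


def r1_profile(text, forms):
    """Count the tokens that immediately follow any of the given verb forms."""
    tokens = [t.lower() for t in tokenize(text)]
    return Counter(nxt for tok, nxt in zip(tokens, tokens[1:]) if tok in forms)


def flag_say_plus_pronoun(text, context=3):
    """Candidate misuses: a form of 'say' immediately followed by an object pronoun."""
    tokens = tokenize(text)
    flagged = []
    for i in range(len(tokens) - 1):
        if tokens[i].lower() in SAY_FORMS and tokens[i + 1].lower() in OBJECT_PRONOUNS:
            flagged.append(" ".join(tokens[max(0, i - context):i + context + 1]))
    return flagged


essays = "He said me the answer. She told me the answer. They say that it is true. Tell us the story."
print(r1_profile(essays, SAY_FORMS))
print(r1_profile(essays, TELL_FORMS))
print(flag_say_plus_pronoun(essays))
```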

To show students authentic native-speaker examples of these problematic verbs, 'say'
and 'tell', we can use the freely available version of the British National Corpus (BNC) or the
Collins Wordbanks Online English Corpus as control corpora, and show students some
examples in KWIC format to encourage them to analyse the lexico-grammatical patterns used
(with the teacher's help if necessary). To do so, we only have to query those corpora
(Figures 36 and 37), select the examples that show the various ways of complementing
the verbs and, finally, create a Word document for students to work with.

Once students have been given real input and have reflected on the various
lexico-grammatical patterns, an exercise based on their own written production, that is,
on the learner corpus compiled, can be created. As with 'important' above, we can simply
remove the KWIC (the verbs 'say' or 'tell' in this case) from the concordance lines to
create a remedial exercise.








Figures 36 and 37. Concordances of the verbs 'say' and 'tell' in two native corpora.




As can be seen, creating materials that meet our students' real needs is neither a
difficult nor a time-consuming task. EFL teachers' experience is highly valuable here:
their intuitions about their students' problems are worth checking and exploring in the
learner corpus they have compiled. Once the remedial exercises have been created, the
worksheets can be kept on paper or shared on a virtual platform, so that students with
the same problems, in our school or elsewhere, can benefit from this work and improve
their use of the foreign language.







References

Barlow, M. (1996). Corpora for Theory and Practice. International Journal of Corpus Linguistics 1(1): 1-37.

Bernardini, S. (2004). In the classroom: Corpora in the classroom: An overview and some reflections on future developments. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 15-36. Amsterdam: John Benjamins.

Carter, R. and McCarthy, M. (2006). Cambridge Grammar of English. Cambridge: Cambridge University Press.

Corder, S. P. (1981). Error Analysis and Interlanguage. Oxford: Oxford University Press.

Dagneaux, E., Dennes, S. and Granger, S. (1998). Computer-aided error analysis. System 26: 163-174.

Díez-Bedmar, M. B. (2005). Struggling with English at university level: error-patterns and problematic areas of first-year students' interlanguage. In P. Danielsson and M. Wagenmakers (eds.), The Corpus Linguistics Conference Series. Retrieved 16 September 2007, from <http://www.corpus.bham.ac.uk/PCLC/>

Driscoll, L. (2005). Common Mistakes at PET… and How to Avoid Them. Cambridge: Cambridge University Press.

Dulay, H., Burt, M. and Krashen, S. (1982). Language Two. Oxford: Oxford University Press.

Gilquin, G. (2000/2001). The integrated contrastive model. Spicing up your data. Languages in Contrast 3(1): 95-123.

Gilquin, G., Granger, S. and Paquot, M. (2007). Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes 6: 319-335.

Gilquin, G., Papp, Sz. and Díez-Bedmar, M. B. (eds.) (in press). Linking up Contrastive and Learner Corpus Research. Amsterdam and Atlanta: Rodopi.

Granger, S. (1996). From CA to CIA and back: an integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg and M. Johansson (eds.), Languages in Contrast. Text-Based Cross-Linguistic Studies, 37-51. Lund: Lund University Press.

Granger, S. (ed.) (1998). Learner English on Computer. London and New York: Addison Wesley Longman.

Granger, S. and Tribble, C. (1998). Learner corpus data in the foreign language classroom: form-focused instruction and data-driven learning. In S. Granger (ed.), Learner English on Computer, 199-209. London and New York: Addison Wesley Longman.

Granger, S., Hung, J. and Petch-Tyson, S. (eds.) (2002). Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam and Philadelphia: John Benjamins.

Granger, S., Kraif, O., Ponton, C., Antoniadis, G. and Zampa, V. (2007). Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness. ReCALL 19(3): 252-268.

James, C. (1998). Errors in Language Learning and Use. Exploring Error Analysis. London and New York: Longman.

Kaszubski, P. (2001). Tracing idiomaticity in learner language – the case of BE. In P. Rayson, A. Wilson, T. McEnery, A. Hardie and S. Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference (29 March-2 April), 312-322. Lancaster: University Centre for Computer Corpus Research on Language.

Lado, R. (1957). Linguistics Across Cultures. Ann Arbor: University of Michigan Press.

Lewis, M. (1993). The Lexical Approach. Language Teaching Publications.

Martínez Osés, F. and Neff Van Aertselaer, J. (2001). Corpus analysis of prepositional patterns in native and non-native university writing. In C. Muñoz, M. L. Celaya, M. Fernández-Villanueva, T. Navés, O. Strunk and E. Tragant (eds.), Trabajos en Lingüística Aplicada, 139-147. Barcelona: Univerbook.

Mauranen, A. (2004). Spoken corpus for an ordinary learner. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 89-105. Amsterdam: John Benjamins.

McEnery, T., Xiao, R. and Tono, Y. (2006). Corpus-based Language Studies. An Advanced Resource Book. London: Routledge.

Milton, J. (1998). Exploiting L1 and Interlanguage Corpora in the Design of an Electronic Language Learning and Production Environment. In S. Granger (ed.), Learner English on Computer, 186-198. London and New York: Addison Wesley Longman.

Nattinger, J. R. and DeCarrico, J. S. (1992). Lexical Phrases and Language Teaching. Oxford: Oxford University Press.

Nesselhauf, N. (2004). How learner corpus analysis can contribute to language teaching: A study of support verb constructions. In G. Aston, S. Bernardini and D. Stewart (eds.), Corpora and Language Learners, 109-124. Amsterdam and Philadelphia: John Benjamins.

O'Keeffe, A., McCarthy, M. and Carter, R. (2007). From Corpus to Classroom. Cambridge: Cambridge University Press.

Osborne, J. (2004). Top-down and Bottom-up Approaches to Corpora in Language Teaching. In U. Connor and T. A. Upton (eds.), Applied Corpus Linguistics. A Multidimensional Perspective, 251-265. Amsterdam and New York: Rodopi.

Römer, U. (2008). Corpora and language teaching.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10: 209-231.

Sinclair, J. (2001). Preface. In M. Ghadessy, A. Henry and R. L. Roseberry (eds.), Small Corpus Studies and ELT. Theory and Practice, vii-xv. Amsterdam and Philadelphia: John Benjamins.

Sinclair, J. (2004). New evidence, new priorities, new attitudes. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 271-299. Amsterdam: John Benjamins.

Tayfoor, S. (2004). Common Mistakes at First Certificate… and How to Avoid Them. Cambridge: Cambridge University Press.

Tribble, C. and Jones, G. (1990). Concordances in the Classroom. London: Longman.

Turton, N. D. and Heaton, J. B. (1987). Longman Dictionary of Common Errors. Harlow: Longman.




Education as a multilingual and multicultural spaceEducation as a multilingual and multicultural space
Education as a multilingual and multicultural spacePascual Pérez-Paredes
 
Higher Education as a multilingual and multicultural space
Higher Education as a multilingual and multicultural spaceHigher Education as a multilingual and multicultural space
Higher Education as a multilingual and multicultural spacePascual Pérez-Paredes
 
English-medium instruction as a transformation policy
English-medium instruction as a transformation policyEnglish-medium instruction as a transformation policy
English-medium instruction as a transformation policyPascual Pérez-Paredes
 
European Commission Erasmus – Facts, Figures & Trends.
European Commission Erasmus – Facts, Figures & Trends.European Commission Erasmus – Facts, Figures & Trends.
European Commission Erasmus – Facts, Figures & Trends.Pascual Pérez-Paredes
 
Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...Pascual Pérez-Paredes
 
Aesla 2011 getting_things_done_pascual_pérez-paredes
Aesla 2011 getting_things_done_pascual_pérez-paredesAesla 2011 getting_things_done_pascual_pérez-paredes
Aesla 2011 getting_things_done_pascual_pérez-paredesPascual Pérez-Paredes
 
Rannsókn á lestrarvenjum og notkun bókmennta
Rannsókn á lestrarvenjum og notkun bókmenntaRannsókn á lestrarvenjum og notkun bókmennta
Rannsókn á lestrarvenjum og notkun bókmenntaPascual Pérez-Paredes
 
Involvement in personal narratives-ma of learner language
Involvement in personal narratives-ma of learner languageInvolvement in personal narratives-ma of learner language
Involvement in personal narratives-ma of learner languagePascual Pérez-Paredes
 
Jornada lectura lit. infantil September 28, 2011
Jornada lectura lit. infantil September 28, 2011Jornada lectura lit. infantil September 28, 2011
Jornada lectura lit. infantil September 28, 2011Pascual Pérez-Paredes
 
Teaching and learning children litarature in europa ni̇han
Teaching and learning children litarature in europa ni̇hanTeaching and learning children litarature in europa ni̇han
Teaching and learning children litarature in europa ni̇hanPascual Pérez-Paredes
 
UK Comenius project dissemination event
UK Comenius project dissemination eventUK Comenius project dissemination event
UK Comenius project dissemination eventPascual Pérez-Paredes
 

More from Pascual Pérez-Paredes (20)

TELL-OP App - How it works
TELL-OP  App - How it worksTELL-OP  App - How it works
TELL-OP App - How it works
 
TELL-OP App
TELL-OP  App TELL-OP  App
TELL-OP App
 
Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"
 
A contrastive analysis of native and non-native speaker interviews
A contrastive analysis of native and non-native speaker interviewsA contrastive analysis of native and non-native speaker interviews
A contrastive analysis of native and non-native speaker interviews
 
Education as a multilingual and multicultural space
Education as a multilingual and multicultural spaceEducation as a multilingual and multicultural space
Education as a multilingual and multicultural space
 
Higher Education as a multilingual and multicultural space
Higher Education as a multilingual and multicultural spaceHigher Education as a multilingual and multicultural space
Higher Education as a multilingual and multicultural space
 
English-medium instruction as a transformation policy
English-medium instruction as a transformation policyEnglish-medium instruction as a transformation policy
English-medium instruction as a transformation policy
 
European Commission Erasmus – Facts, Figures & Trends.
European Commission Erasmus – Facts, Figures & Trends.European Commission Erasmus – Facts, Figures & Trends.
European Commission Erasmus – Facts, Figures & Trends.
 
Escribir ciencia en inglés
Escribir ciencia en inglésEscribir ciencia en inglés
Escribir ciencia en inglés
 
Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...
 
Using pedagogic corpora in ELT
Using pedagogic corpora in ELTUsing pedagogic corpora in ELT
Using pedagogic corpora in ELT
 
Aesla 2011 getting_things_done_pascual_pérez-paredes
Aesla 2011 getting_things_done_pascual_pérez-paredesAesla 2011 getting_things_done_pascual_pérez-paredes
Aesla 2011 getting_things_done_pascual_pérez-paredes
 
Los blogs en el área de humanidades
Los blogs en el área de humanidadesLos blogs en el área de humanidades
Los blogs en el área de humanidades
 
Kynnig á degi íslenskrar tungu
Kynnig á degi íslenskrar tunguKynnig á degi íslenskrar tungu
Kynnig á degi íslenskrar tungu
 
Rannsókn á lestrarvenjum og notkun bókmennta
Rannsókn á lestrarvenjum og notkun bókmenntaRannsókn á lestrarvenjum og notkun bókmennta
Rannsókn á lestrarvenjum og notkun bókmennta
 
Involvement in personal narratives-ma of learner language
Involvement in personal narratives-ma of learner languageInvolvement in personal narratives-ma of learner language
Involvement in personal narratives-ma of learner language
 
Jornada lectura lit. infantil September 28, 2011
Jornada lectura lit. infantil September 28, 2011Jornada lectura lit. infantil September 28, 2011
Jornada lectura lit. infantil September 28, 2011
 
Teaching and learning children litarature in europa ni̇han
Teaching and learning children litarature in europa ni̇hanTeaching and learning children litarature in europa ni̇han
Teaching and learning children litarature in europa ni̇han
 
UK Comenius project dissemination event
UK Comenius project dissemination eventUK Comenius project dissemination event
UK Comenius project dissemination event
 
Specialist genres
Specialist genresSpecialist genres
Specialist genres
 

Recently uploaded

AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxiammrhaywood
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptxSandy Millin
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICESayali Powar
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxKatherine Villaluna
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfYu Kanazawa / Osaka University
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.raviapr7
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
Human-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesHuman-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesMohammad Hassany
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxKatherine Villaluna
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17Celine George
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfMohonDas
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17Celine George
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17Celine George
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRATanmoy Mishra
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.EnglishCEIPdeSigeiro
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17Celine George
 

Recently uploaded (20)

AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptx
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Human-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesHuman-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming Classes
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdf
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17
 

Language corpora and the language classroom.

Pascual Pérez-Paredes & Belén Díez Bedmar (2009). Language corpora and the language classroom. Materiales de formación del profesorado de lengua extranjera (Inglés) Murcia: Consejería de Educación y Cultura.

Language corpora and the language classroom

1. Introduction

These days, language corpora are used more and more often by language teachers, researchers and students. Computers have become widely available in homes and schools, corpora can be searched on the Internet for free, and corpus resources have improved both the quality of, and the access to, the methods of corpus linguistics in applied fields such as foreign language teaching. Compiling your own ad-hoc corpus, or a corpus of your own students' language, is easier today than ever before, and free resources abound.

The most important application of corpora in language classrooms is called Data-driven learning. Corpus Linguistics (CL) and Data-driven learning (DDL) are two terms that have caught the attention of teachers in foreign language teaching (FLT) and researchers alike for a decade now, because the assumptions behind CL and DDL are of enormous importance to language researchers and FL teachers. In a recent publication, O'Keeffe, McCarthy and Carter (2007:21) state the following about the application of language corpora in FLT:

        As well as providing an empirical basis for checking our intuitions about language, corpora have also brought to light features about language which had eluded our intuition […] In terms of what we actually teach, numerous studies have shown us that the language presented in textbooks is frequently still based on intuitions about how we use language, rather than actual evidence of use.

It seems that language corpora can help us question what appears undisputed in prescriptive or intuition-led textbooks and other reference materials.

In the following paragraphs, we will offer a brief account of the implications of CL and DDL for mainstream FLT. In particular, we aim to present useful insights into how language corpora can help our teaching. Most of the resources presented in this chapter are freely available on the Internet.
2. Corpus linguistics and Data-driven learning in a nutshell

2.1. Data in FLT: preliminary issues

Data-driven learning is a language learning approach that is "basically developed through self-conscious activities instead of being imparted through conceptual knowledge" (Pérez Basanta, C. and Rodríguez Martín: 146-7). In DDL, learners become active researchers: they see language from a different perspective and discover facts about language and communication that might otherwise remain unseen.

In DDL, reading concordance lines is common practice. Take the word important, a basic adjective that learners use every day at school. The following screenshot from the Collins WordbanksOnline English corpus1 shows fifty random uses of the word in a 10-million-word corpus of spoken British English (Figure 1).

1 http://www.collins.co.uk/Corpus/CorpusSearch.aspx
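For teachers who compile their own ad-hoc corpus, a display like the one in Figure 1 can be approximated in a few lines of code. The following is a minimal sketch in Python, not the Collins WordbanksOnline software: the function names concordance and neighbours, the context width and the sample sentences are all illustrative assumptions. It prints key-word-in-context (KWIC) lines with the node aligned in the middle, and tallies the words immediately to the left and right of the node, which is what vertical reading asks learners to notice.

```python
import re
from collections import Counter


def concordance(text, node, width=30):
    """Return KWIC lines for every occurrence of `node` in `text`.

    Each line shows up to `width` characters of context on each side,
    with the node aligned in the same column (a sketch, not a real tool).
    """
    # \b word boundaries so that "important" does not match "unimportant"
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so every node starts in the same column
        lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines


def neighbours(text, node):
    """Count the words immediately to the left and right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left, right


# Invented sentences for illustration only; they are not corpus data.
sample = ("It is very important to read every day. That was the most "
          "important point. Quite important, really, and an important factor.")

for line in concordance(sample, "important", width=25):
    print(line)

left, right = neighbours(sample, "important")
print("Left:", left.most_common())
print("Right:", right.most_common())
```

Because the left context is right-aligned, every node falls in the same column, which is what makes vertical reading possible. A full concordancer would normally also let you sort the lines by the words to the left or right of the node.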
Figure 1. Sample concordances of important in the Collins WordbanksOnline English corpus.

In a way, DDL promotes vertical rather than horizontal reading, as learners are invited to look at the accumulated frequency and co-occurrence of lexical items. In Figure 1, learners could note the following:

The words to the left of important: more, most, quite, awfully, very, etc.
The words to the right of important: to + infinitive, factor, thing, point, etc.

However, concordance lines are also useful for noticing language behaviour that goes beyond two words appearing side by side. Take the word sure as an instance. The Cambridge Advanced Learner's Dictionary2 offers eight entries for the word. You can find the entries and examples below:

1 certain; without any doubt: "What's wrong with him?" "I'm not really sure." I'm sure (that) I left my keys on the table. I feel absolutely sure (that) you've made the right decision. It now seems sure (that) the election will result in another victory for the government. Simon isn't sure whether/if he'll be able to come to the party or not. Is there anything you're not sure of/about? There is only one sure way (= one way that can be trusted) of finding out the truth. See also cocksure.
2 be sure of/about sb to have confidence in and trust someone: Henry has only been working for us for a short while, and we're not really sure about him yet. You can always be sure of Kay.
3 be sure of yourself to be very or too confident: She's become much more sure of herself since she got a job.
4 be sure of sth be confident that something is true: He said that he wasn't completely sure of his facts.
5 be sure of getting/winning sth to be certain to get or win something: We arrived early, to be sure of getting a good seat. A majority of Congress members wanted to put off an election until they could be sure of winning it.
6 be sure to to be certain to: She's sure to win. I want to go somewhere where we're sure to have good weather.
7 make sure (that) to look and/or take action to be certain that something happens, is true, etc: Make sure you lock the door behind you when you go out.
8 If you have a sure knowledge or understanding of something, you know or understand it very well: I don't think he has a very sure understanding of the situation.

2 http://dictionary.cambridge.org/

Isolated from any context, sure is usually taught as being highly assertive, that is, it is taught to express certainty, as in I'm sure I was there. Of course, there is nothing wrong with this: as you have read above, this is the usual, mainstream use of the word.
However, if we search for sure in a corpus, in this case the SACODEYL English corpus of European young people, we find that a new pattern emerges clearly: I'm not sure + what / if / whether (Figure 2).
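The same kind of search can be narrowed from a single word to a pattern. As a minimal sketch, assuming the learner texts are available as plain strings (the sample sentences below are invented, not SACODEYL data, and the name not_sure_followers is hypothetical), a regular expression can find I'm not sure followed by what, if or whether and count which of the three follows:

```python
import re
from collections import Counter

# "I'm not sure" (straight or curly apostrophe) or "I am not sure",
# followed by what / if / whether; the follower is captured in group 1.
NOT_SURE = re.compile(r"\bI(?:'m|’m| am) not sure (what|if|whether)\b",
                      re.IGNORECASE)


def not_sure_followers(texts):
    """Count which word (what, if, whether) follows "I'm not sure"."""
    counts = Counter()
    for text in texts:
        for match in NOT_SURE.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


# Invented learner-style sentences, used only to illustrate the search.
samples = [
    "I'm not sure if I'd like to live there.",
    "I am not sure what I want to study.",
    "I’m not sure whether he will come.",
    "I'm sure I left my keys on the table.",
    "I'm not sure if it matters.",
]
print(not_sure_followers(samples))
```

The assertive use (I'm sure I left my keys...) is not matched, and the sketch deliberately ignores the other use discussed below, where I'm not sure is followed by a full stop and a new clause.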
Figure 2. sure in the SACODEYL English corpus.

It appears that I'm not sure is a powerful pattern for hedging or expressing a tentative opinion, as in I'm not sure if I'd like to live there. It can also be followed by a canonical Subject + Verb + Complement clause to indicate contrast or opinion, as in I'm not sure. I've always wanted to be... or I'm not sure. I find art relaxing because…

As you can see, when we examine the different contexts in which a node (that is, the word you are looking up) is found, we can clearly see patterns of use that are not always found in textbooks or dictionaries. Corpus linguists often discuss this phenomenon and try to account for it by looking at language as a field of lexico-grammatical interplay rather than one where meaning is created by words in isolation (e.g. sure).

Bernardini (2004:16) highlights the fact that in DDL there is a "shift of emphasis from deductive to inductive learning routines", which has a great impact on the agents of FLT. This is summarised in Table 1:

FLT agents | Shift
Teachers | Become coordinators of research and facilitators
Learners | Learn how to learn through exercises that involve the observation and interpretation of patterns of use
Pedagogic grammars | Are now informed by enough evidence and stimuli for the learner to arrive at developmentally-appropriate generalisations

Table 1. Shift of emphasis in DDL-FLT (Bernardini 2004: 16-7).

DDL, then, is about using data to promote richer language learning experiences. The definition needs clarification, though. The D in DDL stands for data, that is, language data. However, in the CL literature these data have a markedly computational reading. In the following paragraphs we will explore the implications for language teachers and try to dispel the obscurity that surrounds the term.

2.1.1. Our English teaching is mediated by language data

We may not have reflected on the issue before, but when we decide on a textbook we are opting for a particular set of language data to be used in our classroom. In all probability, you face a situation where the Education Authorities have set an official curriculum that you are bound to abide by. Similarly, as a member of a large institution, you are required to follow certain general methodological guidelines. Leaving organizational aspects aside, however, teachers have the chance to reflect on their teaching and choose the materials that best suit their learners. What choices can you make in terms of the contents of your teaching? What are the main ingredients of your teaching? Do you stick to a textbook? If so, to what extent do you or your Department consider the language in it? Have you examined the language used in your textbook? This is a fundamental issue that deserves our attention. EFL teachers, like most professionals in other teaching areas, rely on reputable, reliable publishing houses that make an effort to mediate between learners and their teachers.
In this process, the teacher, or group of teachers in a school, has the opportunity first to review and then to select the textbooks that will later be used. If we use language corpora as a complement to our teaching, we will be broadening the scope of the language that we present to our students and, certainly, enriching their learning environment (Aston 1997). But before we move on to the ways in which we can use language corpora, let us briefly consider the very basics of corpus linguistics.
2.2. Introducing Corpus Linguistics

Corpus linguistics (CL) makes use of data to gain insight into how language works. A well-known definition of corpus is the following:

        Any collection of more than one text can be called a corpus, (corpus being Latin for "body", hence a corpus is any body of text). But the term "corpus" when used in the context of modern linguistics tends most frequently to have more specific connotations than this simple definition3.

This definition is well rooted in the linguistic tradition, and thus the connotations that McEnery and Wilson bring up are concerned with the role of a corpus in a research-oriented paradigm. These connotations are representativeness, size, machine-readable form and standard reference.

If linguists claim that using a corpus is a convenient way to research language use and behaviour, they have to make sure that their tool, that is, their language corpus, and their methodology are geared towards maximizing the representativeness of the language samples included in the corpus. McEnery and Wilson put it this way:

        We are therefore interested in creating a corpus which is maximally representative of the variety under examination, that is, which provides us with an as accurate a picture as possible of the tendencies of that variety, as well as their proportions. What we are looking for is a broad range of authors and genres which, when taken together, may be considered to "average out" and provide a reasonably accurate picture of the entire language population in which we are interested4.

An example of all this is the British National Corpus (BNC).
The BNC claims to be representative of the English language used in the UK in the late 1980s; its size (100 million words) is big enough to include most communication genres and text types; it is, of course, electronic; and, as a consequence of all this, it has become a standard reference for British English. The BNC is introduced on its website as follows:

        The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written. The latest edition is the BNC XML Edition, released in 2007.

3 McEnery and Wilson. Corpus Linguistics. Available at http://bowlandfiles.lancs.ac.uk/monkey/ihe/linguistics/corpus2/2fra1.htm
4 Idem.
        The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age, region and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins5.

The BNC can be searched free of charge from http://www.natcorp.ox.ac.uk/. The results are limited to 50 hits, but this is enough to get a clear idea of what we are looking into:

Figure 3. The BNC website.

However, using corpora is not the ultimate, one and only solution to linguistic inquiry and research. This is not the place to revisit the old controversy between Noam Chomsky and Charles Fillmore, two influential linguists of the second half of the 20th century. The former openly criticised the use of language corpora, which he did not see as a reliable way to render the complexity and vastness of language. Chomsky believed that the rules governing a language could be scrutinised through introspection; actual performance, by contrast, was considered something that could not be apprehended. Fillmore criticised armchair linguists who do not use real, that is, attested, language data and instead rely on their own intuition and idiolect to develop complex theories of language.

5 From http://www.natcorp.ox.ac.uk/corpus/index.xml
Incidentally, Fillmore similarly criticised corpus linguists who waste their time on design issues, but that's a different story. The point here is that there has traditionally been a controversy between introspection and data examination as valid tools for linguistic analysis. Corpus Linguistics has now attracted the interest of many researchers who believe that data need to be collected before we can jump to conclusions about language use. In this sense, CL methodology is empirical and data-driven. Corpus-based research can then be characterised by two main features (Conrad 1999:3-4):

1. The use of a principled collection of naturally-occurring texts, that is, a corpus, such as the BNC discussed above.
2. The use of computers for language analyses. Depending on the items being analysed, these can be automatic or may need human interaction.

Corpus-based studies include both quantitative analyses and functional interpretations of language use. The following table offers the basics of CL:
Term                 Explanation
Chunks               Groups of words that cluster together in n-number of words, i.e. 2, 3, 4, 5, etc. These are not necessarily phrases (e.g. noun phrases) or clauses, but rather words that combine in a statistically significant way. I don't know, what I really mean or a couple of are good examples of chunks.
Collocates           Words that occur frequently in contiguity or almost in contiguity. To determine whether a collocate is significant, the software package performs statistical analyses.
Concordance lines    Lines of text which show a node in the middle. The node is the word or string of words that is being searched in a corpus.
Concordancer         The software that generates concordance lines.
Corpus               A principled collection of texts. This collection should follow strict design guidelines if the corpus is to represent a language or a register.
Wordlist             The list of words that are found in a corpus or in a particular text. This list usually shows the frequency of occurrence and, possibly, other statistical indexes.

Table 2. The basics of CL.

All these terms are common in descriptive accounts of English and have considerable potential in language learning. For example, chunks are strings of n words that cluster together in a systematic way. Linguists such as Lewis (1993) or Nattinger and DeCarrico (1992) have stressed that lexis takes primacy over grammar in discourse:

        Lexis is central in creating meaning, grammar plays a subservient managerial role. If you accept this principle then the logical implication is that we should spend more time helping learners develop their stock of phrases, and less time on grammatical structures6.
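To make the notions of wordlist and collocate in Table 2 more concrete, here is a minimal Python sketch of what a concordancing package does behind the scenes. It is only an illustration under our own assumptions, not how WordSmith or any other tool actually works: the tokenizer is a crude regular expression, the default window of two words on each side is an arbitrary choice, and significance is approximated with a simple mutual information (MI) score, which is just one of several statistics used for collocation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(tokens):
    """Frequency list: (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def collocates(tokens, node, span=2):
    """Score the words found within `span` tokens to the left or right of `node`.

    Simple mutual information (our assumption, not a tool's exact formula):
    MI = log2(f(node, c) * N / (f(node) * f(c))), where N is the corpus size.
    """
    freq = Counter(tokens)
    n = len(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            co.update(tokens[max(0, i - span):i])      # left context
            co.update(tokens[i + 1:i + 1 + span])      # right context
    scores = {c: math.log2(f_nc * n / (freq[node] * freq[c])) for c, f_nc in co.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = tokenize("I don't know. I really don't know what I mean. You know what I mean.")
print(wordlist(tokens)[:3])  # [('i', 4), ('know', 3), ("don't", 2)]
print(collocates(tokens, "know", span=1)[:3])
```

With a real corpus, the same two functions would be run over thousands of texts; the tiny sample here only shows the mechanics.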
Corpora are useful in revealing that the language speakers use relies heavily on chunking, that is, the repetition of strings of words. O'Keeffe, McCarthy and Carter (2007:60) highlight that “language is available for use in ready-made chunks to a far greater extent than could ever be accommodated by a theory of language which rested upon the primacy of syntax”. Let us look at some real instances of chunking in English. These authors used the CANCODE corpus7, a 5-million-word corpus of spoken British English, to generate the most frequent chunks of n words. These are the results for the top 1 and top 2 chunks:

Chunk length     Top 1 chunk                  Top 2 chunk
3-word chunks    I don't know                 a lot of
4-word chunks    you know what I              I know what I mean
5-word chunks    you know what I mean         at the end of the
6-word chunks    do you know what I mean      at the end of the day

and these are the results for the top 15 and top 19 chunks (chosen at random):

Chunk length     Top 15 chunk                 Top 19 chunk
3-word chunks    I think it's                 you know the
4-word chunks    or something like that       that sort of thing
5-word chunks    I don't know what it         an hour and a half
6-word chunks    and at the end of the        if you see what I mean (top 16)

O'Keeffe, McCarthy and Carter (2007:71) state that despite being syntactic fragments, these chunks perform a very important pragmatic function beyond the word level and, significantly, many have a discourse-marking function (I mean, you know, you know what I mean, at the end of the day, if you see what I mean, ...). In the same way, a corpus can be used to generate collocates, frequency lists and, as seen, concordance lines. Several software packages can handle these tasks. WordSmith (version 5.0)8 is probably one of the most complete suites available. Interesting non-commercial applications include:

- Generate concordance lines for every word in a text: Text-based concordances, http://www.lextutor.ca/concordancers/text_concord/
- Generate chunks for a text: N-Gram phrase extractor, http://www.lextutor.ca/tuples/eng/
- Search principled corpora: Online concordancer, http://www.lextutor.ca/concordancers/concord_e.html
- Generate concordance lines, frequency lists, etc.: Turbo Lingo, http://www.staff.amu.edu.pl/~sipkadan/lingo.htm

6 Islam and Timmis: http://www.teachingenglish.org.uk/think/methodology/lexical_approach1.shtml
7 http://www.cambridge.org/elt/corpus/cancode.htm
8 http://www.lexically.net/wordsmith/
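The N-Gram phrase extractor listed above generates chunks for a text. Its core idea can be sketched in a few lines of Python, assuming plain-text input: slide a window of n words along the text and count every sequence. This sketch is our own simplification; it ranks chunks by raw frequency only (no significance test, as in the CANCODE frequency lists rather than the statistical sense of Table 2) and ignores punctuation and sentence boundaries, so a chunk may straddle two sentences.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous sequence of n tokens as a space-separated string."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def top_chunks(text, n, k=5):
    """Return the k most frequent n-word chunks (raw counts) in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(ngrams(tokens, n)).most_common(k)


sample = "You know what I mean, I don't know, you know what I mean"
print(top_chunks(sample, 3, k=3))
# [('you know what', 2), ('know what i', 2), ('what i mean', 2)]
```

Running the same function with n = 4, 5 or 6 over a large spoken corpus is, in essence, how lists like the CANCODE tables above can be produced.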
Figure 4. Online concordancer.
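Figure 4 shows an online concordancer. The key-word-in-context (KWIC) display that such tools produce is easy to approximate. The Python sketch below is our own simplified version, not the code of any of the tools listed above; whole-word, case-insensitive matching and a fixed number of characters of context on each side are our assumptions.

```python
import re


def concordance(text, node, width=30):
    """Return KWIC lines with the node in a fixed column: `width` characters
    of left context (right-aligned), the node itself, then `width` characters
    of right context. Matching is case-insensitive and on whole words only."""
    text = " ".join(text.split())  # flatten line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} {m.group(0)} {right}")
    return lines


sample = "I live in Reading. Reading is quite big. I really like Reading Festival."
for line in concordance(sample, "reading", width=20):
    print(line)
```

Because every node starts in the same column, learners can read the lines vertically, which is exactly the skill the tips in section 3.1 recommend practising.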
2.3. How can we make use of Corpus Linguistics? Indirect approaches

Following Geoffrey Leech, Römer (2008) distinguishes between indirect and direct applications of CL in the field of language teaching. Indirect approaches to corpora provide access to corpus-informed insights into the nature of language. Those who consume this information are typically, although not exclusively, researchers and language material writers and designers. In direct approaches, by contrast, the typical users are teachers and learners themselves. The following figure summarises this dichotomy:

Figure 5. Indirect and direct applications of CL in the language classroom (Römer 2008).

Direct approaches focus on hands-on learning activities and the generation of classroom material. These hands-on experiences can be either guided or unguided by the teacher, so most teachers are likely to find tasks that suit their students' needs and contexts.

Indirect approaches to using corpora in the language classroom have occupied the agenda of applied linguists for over a quarter of a century now. These approaches have benefited from linguistic research into the nature of language and offer a fresh, non-normative view of naturally occurring language. One of the main contributions of these studies is that corpus data very often challenge our perceptions of how language works. A good example of this is Biber (1988) and, particularly useful in the context of FLT, Biber et al. (1999):
Figure 6. Longman Grammar of Spoken and Written English (LGSWE).

The authors of the LGSWE claim that this work “describes the actual use of grammatical features in different varieties of English: mainly conversation, fiction, newspaper language, and academic prose […] The LGSWE adopts a corpus-based approach, which means that the grammatical descriptions are based on the patterns of structure and use found in a large collection of spoken and written texts, stored electronically, and searchable by computer” (Biber et al. 1999: 4). So the idea here is that a well-designed corpus can help us learn more about how language works. This is useful for both native and non-native speakers, as even native speakers cannot rely on pure intuition to determine how language works across every single register and communicative domain.

Let us look at one syntactic construction to illustrate the usefulness of corpora in the language classroom. Existential clauses contain, in most cases, be as the verb and there as the subject: There is no coffee is a good example; in this sentence there does not refer to a place, unlike the locative adverb there. However, existential there can also introduce other verbs: seem, appear, be supposed to and used to are good examples. How do we know when to use one or another, given that their meanings are so close? The LGSWE offers corpus-based information showing that the frequency of these verbs after existential there depends on the textual and domain features of the communicative event. Thus there exist/exists is very frequent in academic texts, while it is rare in conversation, fiction and news language. There come/comes, on the contrary, is infrequent in academic language, conversation and news, but very often found in fiction texts and creative language use. Figure 7 illustrates this point:
Figure 7. Verbs other than be in existential constructions. Biber et al. (1999).

When these and similar verbs are followed by to be, we discover interesting facts. There seem/seems to be occurs across all four registers, while there used to be is rare in fiction, news and academic language:

Figure 8. To be after some verbs in existential constructions. Biber et al. (1999).

In these examples we can note the interplay between grammatical categories and register.
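Register comparisons such as those in Figures 7 and 8 only make sense if raw counts are normalised, because the registers of a corpus differ in size. A common convention is to report occurrences per million words. The Python sketch below shows the arithmetic on two invented toy "registers"; the regular expression for there seem(s) to be and the crude word count are our own simplifying assumptions, and the numbers it prints say nothing about real English.

```python
import re


def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def rate_by_register(texts_by_register, pattern=r"\bthere\s+seems?\s+to\s+be\b"):
    """Count matches of `pattern` in each register and normalise per million words."""
    rates = {}
    for register, text in texts_by_register.items():
        n_words = len(re.findall(r"[A-Za-z']+", text))
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        rates[register] = per_million(hits, n_words) if n_words else 0.0
    return rates


toy = {  # invented sentences, for illustration only
    "conversation": "There seems to be a problem. Well, there seem to be two.",
    "academic": "There exist several approaches.",
}
print(per_million(20, 4_000_000))  # 5.0
print(rate_by_register(toy))
```

The point of the design is that 20 hits in a 4-million-word register and 5 hits in a 1-million-word register describe the same rate (5 per million words), which raw counts would hide.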
3. Direct approaches

As stated, direct approaches lend themselves more readily to immediate, straightforward classroom applications. In some schools it might be convenient to use a computer room, while in others teachers will prefer to develop materials that can be printed and later distributed. The nature of the lesson will determine what kind of interaction we expect from our students.

3.1. Some tips

If you want your learners to plunge into using a corpus, we suggest following a carefully planned route:

1. Select a small group of learners. Using technology is cumbersome at times, and computers tend to crash on multimedia LANs that are shared by many users. If your LAN restricts IPs or domains, make sure beforehand that the sites you plan to use are available.
2. Avoid metalanguage such as linguistics, node or principled corpus. It is language, real language, that your learners will be most interested in.
3. Before getting your students to use a concordancer or a similar tool, distribute activities where they can get used to reading vertically rather than horizontally. Make sure they get used to interpreting the context and making hypotheses about contexts of use and semantic prosody, that is, whether the word is used in a derogatory or a positive way.
4. Decide well beforehand what you want your students to look up. Examples or activities that are too demanding easily discourage students.
5. Try to ask your students interesting questions. Motivate them and get them interested in turning themselves into researchers or, better, detectives.
6. Select the corpus you want to use carefully. You may consider building your own corpus.

3.2. Activities: using SACODEYL

A corpus is an excellent tool for discovering language behaviour and learning more about collocations and patterning. In teaching contexts, principled corpora may not suit your students' level, especially if they are very young. We recommend building your own collection of texts suited to your students' needs. However, using SACODEYL is a more straightforward option if you want to use a multimedia corpus of teen talk:
By using a corpus as a tool to find out about language, learners are given the chance to develop their inductive skills, which is highly instrumental for further learning. Sinclair (2004:288) is definitely optimistic about the unmediated use of reference corpora in the language classroom:

        ...both teacher and student can make use of a corpus right away, with only a modest few hours orientation; there is no need to wait for the new textbooks and reference books. Only fairly simple queries can be handled at this stage, but the results can be illuminating and very helpful. For this, you will need a computer of normal performance, a corpus and some query software. Will the corpus be 100% reliable, comprehensive and representative? Of course not, but do your present books match these targets? Or your reference grammars and dictionaries? Or any native speaker models? Or any combination of these? Of course not.

Despite Sinclair's statement, the teaching context in secondary education is still far from meeting many of the requirements above: good reference corpora are commercial, and search tools are difficult to handle9. Mauranen (2004:1999) has voiced her concern about the actual use of innovation in classrooms:

        No teaching method can become an important innovation, whatever its potential, if it does not make its way to the normal classroom where teachers and students can use it as part of their everyday routines, with not too much extra hassle.

Fortunately, there are now a few pedagogical corpora whose focus is more on learning than on linguistic research and which happen to be free to use. SACODEYL is one of these pedagogically motivated corpora.
ELISA, its predecessor and inspiration, is another interesting effort:

        ELISA is a collection of video-based interviews with native speakers of different varieties of English (e.g. US, England, Scotland, Ireland, Australia) and from different walks of life. They talk about their professional career. All interviews follow a general pattern, covering a similar range of topics, e.g. what the speakers do, their educational background, how they started their career or business, the type of projects they are involved in, their daily routines and future plans. While some of the speakers engage in unusual professions (e.g. a tour guide at Ayers Rock, a guitar teacher, a travel journalist and an arts therapist) and thus make for the attraction of the materials, they all describe issues of general interest in professional contexts. The corpus currently contains 25 interviews of 5 to 15 minutes. The transcripts amount to about 60,000 words in total10.

9 Guy Aston and Lou Burnard published The BNC Handbook: Exploring the British National Corpus with SARA (Edinburgh Textbooks in Empirical Linguistics) in 1998, an excellent reference book for exploiting SARA fully.
10 http://www.uni-tuebingen.de/elisa/html/elisa_index.html
SACODEYL offers young learners the language and the voices of their peers. As in ELISA, the teenagers in SACODEYL talk about their daily routines, about themselves, their schools, their hometowns, their leisure activities and hobbies, films, books, sports and many other topics. The SACODEYL corpus has been annotated with a view to pedagogical applications. This makes SACODEYL a very interesting complementary material in mainstream teaching, where teachers and students can find a familiar range of language and communicative contexts. The following figure illustrates this:

Figure 9. SACODEYL search categories.

These categories resemble the language and the communication-oriented methodology of mainstream language teaching. Learners and teachers using SACODEYL may want to navigate the English corpus in exactly the same way as they navigate the contents of their textbook. In SACODEYL, every interview has been split into sections, that is, convenient stretches of language for teaching and learning which have pedagogical value. Each section has been annotated by experienced teachers, who have assigned it a full array of categories and subcategories. Once annotated, the corpus can be searched accordingly:
Figure 10. SACODEYL search categories in detail.

Users can also browse interviews:
Figure 11. Browse area for SACODEYL English corpus.

They can also browse the sections within interviews and search for sections that meet the criteria they set:

Figure 12. Browse area for SACODEYL section search.

Let us consider some activities for the language classroom. We assume that your learners are Secondary School students of English, so we will use the SACODEYL English corpus, a small corpus of teenage talk contributed by some 25 interviewees from the Reading area in the UK. Here is a selection of illustrative activities.

3.2.1. Activities focused on communication and attention to form

Tell your students to search for [Reading]. You may want to introduce them to the area and neighbouring cities, all of them widely known. Ask them to read the concordance lines and get them to classify (A) words on the left, (B) words on the right and (C) contexts of use:

Figure 13. Simple SACODEYL word search.

The following screen shows the number of hits and displays the concordance lines:
Figure 14. SACODEYL Search tool.

You may want to guide your students in their search. Providing tables to fill in is usually very productive, as this keeps students focused on the task, which becomes more convergent:

A. Write here the most frequent words or punctuation to the left of Reading:
   (like, feel, tell) about
   (live, be) (here) in
   the (centre, outskirts) of
B. Write here the most frequent words or punctuation to the right of Reading:
   as a place
   . / ?
   festival
C. Guess: What is being talked about?
   Context 1: Reactions to / opinions on your hometown / where you live
   Context 2: Staying in Reading or leaving Reading / Travelling
   Context 3: Reading festival

Table 3. Fill-in table.

In A and B, students are invited to observe the surrounding context of a word and note the accumulation of certain items to the left or to the right of the node. In C, students are invited to make hypotheses about what is being talked about. If desired, you can explore uses of like about / feel about / tell about or [Murcia / Cartagena as a place] or, from a more communicative perspective, expressing opinions about your city or the place where you live. If you tell your students to search for [like about], they will be given instances where teenagers use it in a real context, embedded in the flow of speech. More importantly, your students will be presented with an opportunity to disambiguate other uses of [like about]:

Figure 15. SACODEYL Search tool.

In the case highlighted above, [like about] is used as a hedge, a very common feature of spoken English. This is a convenient way to combine communication-oriented teaching and form-focused instruction. This range of activities focuses on analysing the context of use of a given word [Reading], both linguistically and communicatively.
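Columns A and B of Table 3 ask learners to tally what appears immediately to the left and to the right of the node. If you want to prepare such a table in advance from your own transcripts, a short Python sketch like the following can do the tallying. It assumes plain-text input and a one-word node, and it keeps punctuation marks as tokens because Table 3 counts them too; the sample sentences are invented.

```python
import re
from collections import Counter


def neighbours(text, node):
    """Count the token immediately left and right of each occurrence of a
    one-word `node` (case-insensitive). Punctuation marks count as tokens."""
    tokens = re.findall(r"[\w']+|[.,!?;:]", text.lower())
    node = node.lower()
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left, right


sample = "What do you like about Reading? I live in Reading. We go to Reading festival."
left, right = neighbours(sample, "Reading")
print(left.most_common())   # [('about', 1), ('in', 1), ('to', 1)]
print(right.most_common())  # [('?', 1), ('.', 1), ('festival', 1)]
```

The teacher keeps the interpretive part (column C) for the learners; the script only saves the counting.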
In a unit where music and concerts are presented, you may want to ask your students to find out about [Reading Festival]. This is what they may find11:

Figure 16. SACODEYL Search tool.

From here, students can go to the interview section where the speaker talks about it:

11 At the time of writing, the corpus search facility was under construction, so search results may vary.
Figure 17. SACODEYL Search tool: section level.

They can then read and listen to what this speaker says about it:

Figure 18. SACODEYL English corpus: section level.

It is interesting to see how the online nature of spoken discourse affects the way we put things while speaking. In this very short extract, your students can find the following, among other features:

- Native correction: [gonna to]
- Unfinished sentences: [been so, but]
- Contractions not frequently used by Spanish speakers: [it'll be]
As Bernardini (2004: 17) puts it, “concordancing in particular may prove unique in the acquisition and restructuring of competence [...] Language learning may be viewed as an inductive process in which meaning and form come to be associated”.

3.2.2. Activities focused on attention to form and communication

Römer (2008: 19) has pointed out that concordance lines can be used by teachers to “create DDL exercises tailored to their learners' proficiency level and their particular learning needs”. A case in point is the use of articles, which will be dealt with later, in section 4, from a different angle. Let us search for sections in the SACODEYL English corpus that have been annotated as representative of this particular linguistic feature:

Figure 19. SACODEYL English corpus: category search on section level.

From these results you may want to select stretches of language that can be given to students for evaluation and analysis, or simply used as materials to improve their mastery of the form. The following extracts are interesting for different reasons. Extract A is very convenient for looking at the use of the indefinite article:

(A)
Interviewer: So, what kind of house do you live in? Can you describe what kind of house you live in?
Rachel: It's a semi-detached and it's got a garage and a big garden and it's quite big. It's got quite a lot of rooms but I have to share my room with my sister.

You could present this in a cloze format:

Interviewer: So, what kind of ...house do you live in? Can you describe what kind of ...house you live in?
Rachel: It's ... semi-detached and it's got ...garage and ... big garden and it's quite big. It's got quite ... lot of rooms but I have to share my room with my sister.

In B, we can notice the presence of the zero article:

(B)

Interviewer V: You say you've got a lot of work this year why is that?
Sam: It's our first year of GCSEs so you've got course work and it's like writing essays for different subjects. And recently we've been doing English we did a we did a we did course work on a book Hard Times by Charles Dickens. Which was a bit boring but, but we've finished that now so it's alright.

You could present this in a cloze format:

Interviewer V: You say you've got a lot of work this year why is that?
Sam: It's our first year of GCSEs so you've got ...course work and it's like writing ...essays for ...different subjects. And recently we've been doing ...English we did a we did a we did ...course work on ... book Hard Times by Charles Dickens. Which was a bit boring but, but we've finished that now so it's alright.

In fact, (B) can easily be exploited as a source of pragmatic information, including sentence restructuring [we did a we did a we did], sentential relatives to express evaluation [Which was a bit boring] and conclusion [so it's alright].
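If you prepare many gapped texts like these, the mechanical part of the job (blanking out overt articles and keeping an answer key) can be automated. The Python sketch below is one way to do it, under our own assumptions: it only gaps a, an and the, so zero-article gaps such as ...course work above still have to be added by hand, and false starts such as we did a we did a would also be gapped and may need editing.

```python
import re

# Whole-word, case-insensitive match for the three English articles.
ARTICLE = re.compile(r"\b(a|an|the)\b", re.IGNORECASE)


def article_cloze(text, gap="..."):
    """Replace every overt article with `gap`; return the gapped text and
    the removed articles (the answer key) in order of appearance."""
    answers = []

    def blank(match):
        answers.append(match.group(0))
        return gap

    return ARTICLE.sub(blank, text), answers


gapped, key = article_cloze("It's a semi-detached and it's got a garage and a big garden.")
print(gapped)  # It's ... semi-detached and it's got ... garage and ... big garden.
print(key)     # ['a', 'a', 'a']
```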
Barlow (1996) sees in activities like these a potential for teachers to enrich the learning environment and students' knowledge of language. For a thorough account of concordance-based DDL, we suggest a practical book on the issue (Tribble and Jones 1990):
Figure 20. Concordances in the classroom, by Chris Tribble and Glyn Jones. Longman 1990.
4. Indirect approaches: Learner corpora in the EFL classroom

4.1. Definition

Among the many types of corpora which can be compiled, analysed and used (see McEnery, Xiao and Tono, 2006, for an overview), Computer Learner Corpora (CLC) stand out as one of the most powerful pedagogic tools for the EFL or ESL classroom. As recently defined, they are '[…] electronic collections of foreign or second language learner texts collected on the basis of strict design criteria.' (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007: 254). In other words, a learner corpus is compiled when the oral or written texts produced by your students of English are collected following strict design criteria, put into electronic format and then stored on your hard drive, memory stick, etc., so that you can analyse them with programs like WordSmith Tools, already mentioned:

Figure 21. From oral or written texts to a computer learner corpus.

Thanks to the availability of computers and of freely available software to carry out analyses, Learner Corpus Research (LCR) has been a fruitful field since the second half of the 1990s. Since then, the growing number of publications, whether in edited volumes (cf. Granger, 1998; Granger, Hung and Petch-Tyson, 2002; Gilquin, Papp and Díez-Bedmar, in press, etc.) or in international journals (cf. Corpora, Applied Linguistics, English Corpus Studies, Journal of English for Academic Purposes, ReCALL, etc.), shows the potential of this type of research and constitutes a first step towards awareness of the possibilities that CLC can offer for Second Language Acquisition and for the TEFL or TESOL classroom.
4.2. Types of CLC

Due to the importance of CLC-based results, the number of CLC has mushroomed since the second half of the 1990s. The research questions pursued by various researchers or research teams have fostered different types of CLC, which are frequently classified according to four related variables, namely the mode of the language in the learner corpus, its size, the type of intervention (i.e. when the CLC-based results will be applied in the design of materials, the sequencing of the curriculum, etc.), and the type of annotation in the corpus.

Variable                  Values
Mode                      Written; Spoken; Multimedia
Size                      Big (commercial or some research teams); Small (research)
Type of intervention12    Delayed human intervention; Early human intervention
Type of annotation13      Raw; POS-tagged; Semantically-tagged; Error-tagged

Table 4. Main variables considered for the classification of learner corpora.

4.3. Methodologies used with CLC

Compiling students' production is not a new practice for teachers of English as a second or foreign language, as it has always been used to create remedial exercises, test learners' command of the foreign language, etc. However, the methodology used to analyse the students' production has changed over time, as researchers and teachers have focused their attention on different aspects (the students' L1, the target language, etc.) and technology has made it possible to compile CLC, i.e. learners' real data in electronic format. Table 5 shows the three main methodologies used before the arrival of CLC. The first one, Contrastive Analysis, in its strong form, did not consider the students' production, but the similarities and differences between the students' L1 and their target language (i.e. Spanish and English), in order to predict the difficulties that students would have. The weaknesses found in this methodology led researchers to shift their attention to Error Analysis, whose theoretical principles and methodological issues were set out in a series of articles in the 1960s and 1970s (reprinted in Corder, 1981). Especially outstanding was the paper 'The significance of learner's errors' (included in Corder, 1981), which argued that errors were crucial to researchers, teachers and students, since they could all learn from them and apply that knowledge to their research, teaching practice or learning process. Thus, the steps for conducting an EA were followed by many teachers and researchers, and the results were published, on some occasions, as dictionaries and lists of common errors.

However, Error Analysis only considered errors and dismissed the learners' correct use of the foreign or second language. This led Selinker to propose Interlanguage Analysis (IA) (Selinker, 1972), which examined the students' entire production, i.e. errors and non-errors alike. In this way, it was possible to obtain a better description of the students' use of the foreign language when performing a task at a specific point in their language learning process: their interlanguage.

Pre-CLC methodology             Focus of interest                                         Publications
Contrastive Analysis (CA)       Comparison of the students' L1 and their TL               Lado (1957)
Error Analysis (EA)             Students' real errors                                     Corder (1981)
Interlanguage Analysis (IA)     The students' whole production, errors and non-errors     Selinker (1972)

Table 5. Methodologies used to describe the students' language before CLC.

12 This distinction was made by Sinclair (2001, vii).
13 For the types of annotation, refer to McEnery and Wilson
Although not always systematically, teachers of English as a foreign or second language frequently analyse their students' production following one of these methodologies or a combination of them. For instance, an Error Analysis is conducted when a teacher corrects a batch of essays and uses a code system, i.e. an error taxonomy (for an overview of various error taxonomies, see Dulay, Burt and Krashen, 1982: 146-197, or James, 1998: 102-117), to make the students aware of the type of error made. Thus, 'sp' may stand for a spelling error, 'wo' for word order, 'prep' for a problem with a preposition, etc. After marking all the essays and skimming his or her annotation, the teacher may find that the most frequent error in the batch has to do with a certain aspect of the foreign language (be it prepositions, articles, verb tenses, etc.). If the correct instances of that aspect are considered together with the incorrect ones, an Interlanguage Analysis is conducted. However, if the students' L1 is compared to their TL
either before or after analysing their production in an attempt to explain the causes of the students' errors, a CA in its strong or weak version, respectively, is completed.

The manual analysis of the students' errors, following a CA, EA or IA methodology, is a time- and effort-consuming task which a teacher can only carry out with a limited number of essays: it is necessary to go through the essays, look for the errors, highlight, classify and count them, make sure all the errors are being considered, look for the correct uses of the aspect of the language being analysed, compare the use of that aspect in the L1 and the FL, etc. Fortunately, these processes have been sped up thanks to improvements in technology and, consequently, the advent of CLC, whose electronic format is among their main advantages (Nesselhauf, 2004: 139-40), since it makes both their compilation and their analysis easier.

To avoid falling prey to the temptation to collect huge, disorganized amounts of data, as is the case with corpora in general (see section 2.2 above), strict design criteria must be observed when compiling a learner corpus. Special attention needs to be given to the principles of authenticity and representativeness, and every effort should be made to control variability, so that aspects are not compared across a non-homogeneous learner corpus. Thus, if the teacher aims to represent students' in-class argumentative writing at intermediate level, pieces of writing which belong to other genres, which are written by students at other proficiency levels, or which are written at home (and presumably with access to reference materials) should not be included in that corpus, since the results would be biased.
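The design criteria above become easier to respect when each text is stored with a few metadata fields. The following Python sketch assumes hypothetical fields (genre, level, setting) recorded by the teacher for every essay; it is not a feature of any particular corpus tool, only an illustration of keeping a learner corpus homogeneous before any analysis is run.

```python
from dataclasses import dataclass


@dataclass
class Essay:
    # Hypothetical metadata recorded by the teacher for each learner text.
    student_id: str
    genre: str    # e.g. "argumentative", "descriptive", "narrative"
    level: str    # e.g. "intermediate"
    setting: str  # "in-class" or "home"
    text: str


def select_homogeneous(essays, genre, level, setting="in-class"):
    """Return only the essays that match every design criterion, so that
    genre, proficiency level and writing conditions do not vary."""
    return [
        e for e in essays
        if e.genre == genre and e.level == level and e.setting == setting
    ]


if __name__ == "__main__":
    essays = [
        Essay("s01", "argumentative", "intermediate", "in-class", "..."),
        Essay("s02", "argumentative", "intermediate", "home", "..."),
        Essay("s03", "narrative", "intermediate", "in-class", "..."),
    ]
    corpus = select_homogeneous(essays, "argumentative", "intermediate")
    print([e.student_id for e in corpus])  # ['s01']
```

Only the in-class argumentative essay survives: the essay written at home and the narrative one are excluded, exactly the two cases the paragraph warns against.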
Just consider, from your own experience, the difference in the type and number of errors between an argumentative essay written by a student in class (without the use of dictionaries, online resources, etc.) and one written at home or, likewise, the type of errors that you would expect in descriptive writing as compared to narrative writing.

Drawing on the methodologies of the pre-CLC era, the analysis of students' use of language, as represented in a learner corpus, is nowadays carried out in a systematic and scientific way following Computer-aided Error Analysis (CEA), Contrastive Interlanguage Analysis (CIA) or the Integrated Contrastive Method (ICM):

Methodology (CLC)                           Focus of interest                               Publications
Computer-aided Error Analysis (CEA)         Students' real errors, as attested in a CLC     (Dagneaux, Dennes and Granger, 1998)
Contrastive Interlanguage Analysis (CIA)    Comparison of NS vs. NNS production             (Granger, 1996)
                                            Comparison of NNS vs. NNS production
Integrated Contrastive Method (ICM)         CA + CIA                                        (Granger, 1996; Gilquin, 2000/2001)
Table 6. Methodologies used in the description of the learners' production of the foreign language.

The first one, CEA, is a 'new type of EA' (Dagneaux, Dennes and Granger, 1998: 165). In other words, it is a computerized version of EA, which allows quicker error annotation and easy retrieval of the erroneous instances of students' use of the foreign language. There are
two ways to conduct such an analysis, depending on whether or not the learner corpus is error-tagged, i.e. whether a code system has been used to highlight the errors. If it is not, an intuitive search for an error-prone aspect is undertaken. This is the case when the teacher feels that the central articles 'the' and 'a(n)' pose problems to his or her students. By means of a learner corpus and retrieval tools, s/he can read the uses of those articles in the concordance lines retrieved and decide which ones are incorrect, thus conducting an EA. However, a raw learner corpus, i.e. one without error annotation, will not allow the researcher to retrieve instances of (mis-)use of the zero article, since the zero article has no word form that could be searched for automatically. To do so, the learner corpus needs to be error-tagged. There are two types of error-tagged learner corpora:

- Fully error-tagged and
- Partially error-tagged

In the former, a comprehensive error taxonomy has been used to highlight all the possible errors in a learner corpus. Although few learner corpora are fully error-tagged, for practical reasons of time and money, the results which such EAs yield provide a bird's-eye view of the students' problems when using the foreign language at a specific moment in their language acquisition process. As an example, Figure 22 shows the percentage of errors in forty-three aspects of the foreign language (represented by the error tags on the horizontal axis) found in the written production of first-year university students at the beginning of the academic year (Díez-Bedmar, 2005):

Figure 22. EA of first-year University students when beginning the academic year (Díez-Bedmar, 2005).
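Percentages like those in Figure 22 come from counting every error tag in a fully error-tagged corpus and dividing by the total number of tagged errors. The Python sketch below assumes a hypothetical tag format: an upper-case code in parentheses placed before the erroneous word, such as '(GA)' for an article error. The codes 'GVT' and 'FS' in the example are illustrative only; real error-tagging systems and editors use their own conventions.

```python
import re
from collections import Counter

# Assumed tag format: an upper-case code of 2-4 letters in parentheses,
# placed before the erroneous word, e.g. "(GA) the" for an article error.
TAG_PATTERN = re.compile(r"\(([A-Z]{2,4})\)")


def error_percentages(tagged_texts):
    """Count every error tag in a fully error-tagged corpus and return
    {tag: percentage of all tagged errors}, most frequent tag first."""
    counts = Counter()
    for text in tagged_texts:
        counts.update(TAG_PATTERN.findall(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100 * n / total for tag, n in counts.most_common()}


if __name__ == "__main__":
    corpus = [
        "I have (GA) a information about (GA) the Spain.",
        "She (GVT) go to school (FS) yesterdai.",
    ]
    for tag, pct in error_percentages(corpus).items():
        print(f"{tag}: {pct:.1f}%")
```

With this toy corpus, article errors (GA) account for half of the four tagged errors. The same function run over a whole error-tagged learner corpus gives the kind of distribution plotted in Figure 22.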
A partially error-tagged learner corpus only highlights a specific type of error which is of interest to the teacher or the researcher. Returning to the case of the central articles, a partially error-tagged learner corpus makes it possible to retrieve, quantify and analyse not only the errors made with the articles 'the' and 'a(n)' (as was the case with a raw learner corpus), but also those involving the zero article (Ø). Notice in the following concordance lines the incorrect uses of the central articles, error-tagged (GA): first those of 'a(n)', then erroneous uses of the zero article, and finally erroneous uses of 'the'.

Figure 23. Article errors as retrieved from a partially error-tagged learner corpus using WordSmith Tools.

The second methodology used with CLC, Contrastive Interlanguage Analysis, allows the researcher to compare the students' production with:

1. the production of native speakers of English
2. the production of other groups of learners of English with a different L1

On the one hand, if your students' production is compared to that of native students of English (at the same level and under the same external variables), it is possible to see how (dis-)similar both productions are when an aspect of the foreign language is studied. As a result, instances of misuse, but also of underuse and overuse, are revealed, and conclusions can be drawn such as the overuse of the prepositions 'between', 'inside' and 'according to' by Spanish university students when compared to native-speaker students (Martínez Osés and Neff, 2001: 144). On the other hand, you may be interested in comparing how various
groups of students of English (at the same proficiency level and under the same external variables) struggle with the same aspect of the foreign language, as Kaszubski (2001) did when comparing the use of the lemma 'be' by Spanish, Polish and Belgian-French students.

Finally, the Integrated Contrastive Model combines a CIA and a corpus-based CA. Therefore, three different corpora are used: the learner corpus, the control corpus and a corpus which contains the production of native speakers in the L1. As was the case with CA in the pre-CLC era, there are two ways of conducting an ICM. In the first, a corpus-based CA is carried out to identify the main differences between the two native languages considered, and the problems posed by such differences are then looked for in the learner corpus. In the second, the problems in a learner corpus, as revealed by a CIA, may lead to a corpus-based analysis of the two native languages in an attempt to find the causes of such errors.

4.4. The application of CLC in the TEFL classroom

The potential of CLC in the direct and indirect approaches will be explored in this section. The first subsection deals with the indirect approach, that is, using the results from the analysis of CLC (following the methodologies described in 4.3) to improve teaching materials, curricula, etc., whereas the second focuses on the direct approach, which provides hands-on experience in working with CLC.

4.4.1. The indirect approach

Although CLC-based descriptions of the students' interlanguage are still limited and only provide '[…] patchy knowledge of the different stages of interlanguage development' (Gilquin et al., 2007: 322), the results obtained are progressively being introduced into teaching materials.
Among the materials which have benefited most from CLC-based results are the dictionaries of common errors, such as the Longman Dictionary of Common Errors (Turton and Heaton, 1987) and the Cambridge series Common Mistakes at… (Tayfoor, 2004; Driscoll, 2005; etc.), in which frequent errors in learner corpora are highlighted and explained.
Figure 24. CLC-informed materials focused on common errors.

Likewise, dictionaries have also been CLC-informed. The first was the Longman Essential Activator (LEA), which made use of the information in the Longman Learners' Corpus (LLC). It was followed by others such as the Cambridge International Dictionary of English, based on the error-tagged Cambridge Learner Corpus (Nicholls, 2003), and the second edition of the Macmillan English Dictionary for Advanced Learners, based on a CIA of the International Corpus of Learner English (ICLE) and a corpus of native speakers' academic writing.

Figure 25. CLC-informed monolingual dictionaries of English.

The CLC-based information in these dictionaries is typically provided in 'help boxes', which are quite familiar to any learner of English as a foreign or second language. However, new ways of presenting information from CLC are being devised, as is the case with the graphs in the Macmillan English Dictionary for Advanced Learners, which show the results of the CIAs conducted on problems of frequency, register confusion, etc. Similarly, alternatives to the students' typical errors are suggested (as exemplified from the control corpus), and extended writing sections on twelve rhetorical or organizational functions which are particularly prominent in academic writing are included (cf. Gilquin, Granger and Paquot, 2007, pp. IW1-IW29).
Figure 26. CLC-based results as provided in the Macmillan English Dictionary for Advanced Learners (MED2).

Recent grammars also include information from learner corpora, as is the case with Carter and McCarthy's (2006) Cambridge Grammar of English or the online Chemnitz Internet Grammar of English.

Figure 27. CLC-informed grammars of English.

Finally, CLC may inform CALL programmes, such as WordPilot (Milton, 1998), or be integrated into CALL programmes, so that teachers and students, if deemed appropriate, have direct access to the real data, as in the EXample eXtractor Engine for LAnguage Teaching (eXXelant) (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007). Although syllabuses, textbooks and writing courses are now beginning to take native corpus data into account in their recent editions (cf. the Touchstone Student's Book series), there is no doubt that the information provided by CLC can complement and improve such materials to meet the students' real needs.
4.4.2. Designing remedial exercises from a learner corpus

Analysing a learner corpus and designing CLC-based remedial exercises to meet your students' real needs is not a difficult task. To help you analyse the data in a learner corpus, this section explores two ways of approaching a small raw learner corpus. The first deals with the students' use of vocabulary, and the second with the lexico-grammatical patterns of the verbs 'say' and 'tell'. The learner corpus used is composed of the handwritten production of 16 first-year university students (amounting to 17,765 words), who wrote descriptive texts in class without access to any reference materials and with a time limit of 60 minutes. The software used for this purpose is WordSmith Tools version 4.0.

4.4.2.1. Exploring vocabulary usage: wordlists and Concord

This piece of software allows the teacher or researcher to create a wordlist, to run concordances and to explore keywords, as can be seen in the following figure. However, we will focus on the use of wordlists and concordances for an exploratory analysis of the adjectives used by a group of learners.
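For teachers without access to WordSmith Tools, a frequency word list of the kind described in this section can be approximated in a few lines of Python. This is a minimal sketch, not WordSmith's own algorithm: the tokeniser is a simplifying assumption (lower-cased ASCII words, internal apostrophes and hyphens kept, curly apostrophes not handled), and the set of adjectives used to filter the list is a hypothetical stand-in for the manual step of removing words that are not adjectives.

```python
import re
from collections import Counter


def word_list(texts):
    """Frequency word list: (word, count) pairs, most frequent first,
    with ties listed alphabetically."""
    counts = Counter()
    for text in texts:
        # Simplified tokeniser: lower-cased runs of ASCII letters, allowing
        # internal apostrophes or hyphens ("don't", "well-known").
        counts.update(re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    essays = ["It is important. It is a good idea.",
              "Good food is important and good."]
    # Hypothetical hand-made filter standing in for removing non-adjectives.
    adjectives = {"good", "important", "different"}
    print([(w, n) for w, n in word_list(essays) if w in adjectives])
    # [('good', 3), ('important', 2)]
```

Sorting on the negated count gives the descending order described in the text; reversing the key would list the words that appear only once first.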
Figure 28. WordSmith Tools 4.0.

As the term itself indicates, a word list is a list of the words in your learner corpus (the term was reviewed in Table 2 above). Such a list may be ordered by frequency, from the word with the highest number of occurrences to those which appear only once, or the other way round. As can be seen in Figure 29 below, a word list of the adjectives that students used in the learner corpus was obtained after removing from the list the words which did not belong to this open word class. As a result, it was possible to see that the adjectives most used by those students were 'good', 'important' and 'different'. This finding may not surprise an experienced teacher, but the co-text in which these adjectives are used may reveal interesting and unexpected gaps in the learners' vocabulary. In order to explore such co-texts, the next step is to run a concordance of any of these words. For this example, 'important' was selected. As can be seen in Figure 30 below, running a concordance produces lines with the search word in the middle, highlighted in blue. This search word is known as the node, or 'Key Word In Context' (KWIC), and the lines obtained (i.e. concordance lines) are not read in the traditional way (that is, everything from left to right, as already seen above); instead, we focus on the first word to the left or to the right of the KWIC. Thus, we are able to see the type of pre-modification the students use with the adjective under consideration (first word to the left of the KWIC), and which elements are qualified as 'important'. As already reported (cf.
Granger and Tribble, 1998, or Osborne, 2004, among others), students rely heavily on this adjective, to the detriment of others such as 'crucial', 'outstanding', 'main', 'valuable', etc., in the appropriate contexts. Therefore, a very easy
exercise to create from the students' real words in their compositions is to remove the KWIC and leave a blank, so that students have to think of a better alternative that fits the linguistic contexts they have created.

Figures 29 and 30. WordSmith Tools: Running a concordance and hiding the KWIC.
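The concordance-and-blank exercise described above can also be sketched in Python, which may help to see what a concordancer does under the hood. This is an illustration under stated assumptions, not WordSmith's Concord: contexts are cut by a fixed number of characters (the hypothetical `width` parameter), matching is whole-word and case-insensitive, and the blank string is arbitrary.

```python
import re


def concordance(text, node, width=30):
    """Return KWIC lines as (left context, node, right context) tuples
    for every whole-word, case-insensitive occurrence of node."""
    text = " ".join(text.split())  # collapse line breaks and extra spaces
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def gap_fill(lines, width=30, blank="________"):
    """Worksheet lines with the node hidden behind a blank; the left
    context is right-aligned so that all the gaps line up."""
    return [f"{left:>{width}} {blank} {right}" for left, _node, right in lines]


if __name__ == "__main__":
    essay = ("My family is very important for me. The most important "
             "thing in my town is the river.")
    for row in gap_fill(concordance(essay, "important")):
        print(row)
```

The same `width` should be passed to both functions so that the alignment holds. Printing the rows into a word-processor document gives a worksheet like the one described in the text.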
Figure 32 presents a screenshot of such a worksheet, which you can paste into a Word document and use in class. The strongest aspect of this exercise is that it is based on your students' own errors and therefore caters for their very specific needs. Furthermore, students are more likely to feel motivated to do this exercise, since they may recognise their own sentences and be willing to learn how to improve them.

Figure 31. Concord utility.
Figure 32. Worksheet in a .doc document.

4.4.2.2. Exploring lexico-grammatical patterns: 'say' and 'tell'

The verbs 'say' and 'tell' are reported to pose difficulties to students at various levels because of their different lexico-grammatical patterns. However, it is worth exploring whether your students actually make those mistakes and, if so, which uses are the most problematic. To do so, the first step is to run a concordance of the verb 'say' and sort the lines by the first word to the right of the KWIC, as shown in Figures 33 to 35.
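Sorting concordance lines by the first word to the right of the node (often called an R1 sort) is what groups 'said me', 'said that' and 'said to' together. The Python sketch below illustrates the idea. It assumes that the teacher supplies the inflected forms to search for (here 'say', 'says', 'said', 'saying'), since it does no lemmatisation, and it cuts contexts by a hypothetical character `width`.

```python
import re


def r1_sorted_concordance(text, forms, width=30):
    """KWIC lines for any of the given word forms, sorted by the first
    word to the right of the node (R1) and then by the node itself.
    Each line is returned as (r1, left, node, right)."""
    text = " ".join(text.split())
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        words = re.findall(r"[A-Za-z']+", right)
        r1 = words[0].lower() if words else ""  # first word to the right
        lines.append((r1, left, m.group(0), right))
    lines.sort(key=lambda line: (line[0], line[2].lower()))
    return lines


if __name__ == "__main__":
    learner_text = ("They said to us that they were tired. He said me the "
                    "truth. She says that it is late.")
    for r1, left, node, right in r1_sorted_concordance(
            learner_text, ["say", "says", "said", "saying"]):
        print(f"{left:>30} [{node}] {right}")
```

After sorting, the line with 'said me' comes first, which makes the likely 'say'/'tell' confusion easy to spot among the other complementation patterns.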
Figures 33 to 35. Running a concordance and sorting the lines by the first element to the right of the KWIC.

By doing so, it is possible to see how the students complement the verb 'say' in different contexts and co-texts that they have created themselves. In checking those uses, it is also possible to notice uses of the verb 'say' where 'tell' would have been preferred, or where another wording would have been more native-like. In order to show students real native examples of the use of those problematic verbs, i.e. 'say' and 'tell', we can use the freely available version of the British National Corpus (BNC) or the
Collins Wordbanks Online English Corpus as control corpora, and show students some examples in KWIC format to foster their analysis of the lexico-grammatical patterns used (with the help of the teacher if necessary). To do so, we only have to query those corpora (Figures 36 and 37), select the examples which show the various ways of complementing the verbs and, finally, create a Word document for students to work with.

Once real input has been provided to students and they have reflected on the various lexico-grammatical patterns, an exercise based on their own written production, that is, on the learner corpus compiled, can be created. As was the case with the example of 'important' above, we can easily remove the KWIC (the verbs 'say' or 'tell' in this case) from the concordance lines and create a remedial exercise.
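The remedial exercise just described, in which the learners' own 'say'/'tell' forms are blanked out, can be sketched as follows. Two assumptions are worth stating. First, the list of inflected forms is chosen by hand, because the sketch does no lemmatisation. Second, the learner's original verb is kept only as a reference for the teacher, since it may itself be the error that the exercise targets: it is not an answer key.

```python
import re

# Inflected forms treated as "the verbs say and tell" (an assumption;
# extend the list if your learners use other forms).
SAY_TELL = ["say", "says", "said", "saying", "tell", "tells", "told", "telling"]


def say_tell_exercise(sentences, forms=SAY_TELL, blank="_____"):
    """Blank out every form of 'say'/'tell' in the learners' sentences.

    Returns (items, originals): the gapped sentences and, for each one,
    the learner's original verb form(s), which the teacher can check
    against the native examples before handing out the worksheet."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    items, originals = [], []
    for sentence in sentences:
        found = pattern.findall(sentence)
        if found:  # skip sentences with no say/tell to practise
            items.append(pattern.sub(blank, sentence))
            originals.append(found)
    return items, originals


if __name__ == "__main__":
    learner_sentences = ["He said me the truth.",
                         "I like music.",
                         "She told that she was ill."]
    items, originals = say_tell_exercise(learner_sentences)
    for number, item in enumerate(items, start=1):
        print(f"{number}. {item}")
    print(originals)  # [['said'], ['told']]
```

Both learner sentences in the example ('said me', 'told that') would become items on which students decide between 'say' and 'tell' after studying the native concordances.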
Figures 36 and 37. Concordances of the verbs 'say' and 'tell' in two native corpora.

As can be seen, creating materials which meet our students' real needs is not a particularly difficult or time-consuming task. EFL teachers' experience is highly valuable here, since their intuitions regarding their students' problems are worth checking and exploring in the learner corpus they have compiled. Once the remedial exercises have been created, the worksheets can be stored on paper or distributed through a virtual platform, so that students with the same problems, in our school or in another, may benefit from our work and improve their use of the foreign language.
References

Barlow, M. (1996). Corpora for Theory and Practice. International Journal of Corpus Linguistics 1(1): 1-37.
Bernardini, S. (2004). Corpora in the classroom: An overview and some reflections on future developments. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 15-36. Amsterdam: John Benjamins.
Carter, R. and McCarthy, M. (2006). Cambridge Grammar of English. Cambridge: Cambridge University Press.
Corder, S. P. (1981). Error Analysis and Interlanguage. Oxford: Oxford University Press.
Dagneaux, E., Dennes, S. and Granger, S. (1998). Computer-aided error analysis. System 26: 163-174.
Díez-Bedmar, M. B. (2005). Struggling with English at university level: error-patterns and problematic areas of first-year students' interlanguage. In P. Danielsson and M. Wagenmakers (eds.), The Corpus Linguistics Conference Series. Retrieved 16 September 2007, from <http://www.corpus.bham.ac.uk/PCLC/>
Driscoll, L. (2005). Common Mistakes at PET… and How to Avoid Them. Cambridge: Cambridge University Press.
Dulay, H., Burt, M. and Krashen, S. (1982). Language Two. Oxford: Oxford University Press.
Gilquin, G. (2000/2001). The integrated contrastive model. Spicing up your data. Languages in Contrast 3(1): 95-123.
Gilquin, G., Granger, S. and Paquot, M. (2007). Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes 6: 319-335.
Gilquin, G., Papp, Sz. and Díez-Bedmar, M. B. (eds.) (in press). Linking up Contrastive and Learner Corpus Research. Amsterdam and Atlanta: Rodopi.
Granger, S. (1996). From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg and M. Johansson (eds.), Languages in Contrast. Text-Based Cross-Linguistic Studies, 37-51. Lund: Lund University Press.
Granger, S. (ed.) (1998). Learner English on Computer. London and New York: Addison Wesley Longman.
Granger, S. and Tribble, C. (1998). Learner corpus data in the foreign language classroom: Form-focused instruction and data-driven learning. In S. Granger (ed.), Learner English on Computer, 199-209. London and New York: Addison Wesley Longman.
Granger, S., Hung, J. and Petch-Tyson, S. (eds.) (2002). Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam and Philadelphia: John Benjamins.
Granger, S., Kraif, O., Ponton, C., Antoniadis, G. and Zampa, V. (2007). Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness. ReCALL 19(3): 252-268.
James, C. (1998). Errors in Language Learning and Use. Exploring Error Analysis. London and New York: Longman.
Kaszubski, P. (2001). Tracing idiomaticity in learner language: The case of BE. In P. Rayson, A. Wilson, T. McEnery, A. Hardie and S. Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference (29 March-2 April), 312-322. Lancaster: University Centre for Computer Corpus Research on Language.
Lado, R. (1957). Linguistics Across Cultures. Ann Arbor, Michigan: University of Michigan Press.
Lewis, M. (1993). The Lexical Approach. Language Teaching Publications.
Martínez Osés, F. and Neff Van Aertselaer, J. (2001). Corpus analysis of prepositional patterns in native and non-native university writing. In C. Muñoz, M. L. Celaya, M. Fernández-Villanueva, T. Navés, O. Strunk and E. Tragant (eds.), Trabajos en Lingüística Aplicada, 139-147. Barcelona: Univerbook.
Mauranen, A. (2004). Spoken corpus for an ordinary learner. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 89-105. Amsterdam: John Benjamins.
McEnery, T., Xiao, R. and Tono, Y. (2006). Corpus-based Language Studies. An Advanced Resource Book. London: Routledge.
Milton, J. (1998). Exploiting L1 and Interlanguage Corpora in the Design of an Electronic Language Learning and Production Environment. In S. Granger (ed.), Learner English on Computer, 186-198. London and New York: Addison Wesley Longman.
Nattinger, J. R. and DeCarrico, J. S. (1992). Lexical Phrases and Language Teaching. Oxford: Oxford University Press.
Nesselhauf, N. (2004). How learner corpus analysis can contribute to language teaching: A study of support verb constructions. In G. Aston, S. Bernardini and D. Stewart (eds.), Corpora and Language Learners, 109-124. Amsterdam and Philadelphia: John Benjamins.
O'Keeffe, A., McCarthy, M. and Carter, R. (2007). From Corpus to Classroom. Cambridge: Cambridge University Press.
Osborne, J. (2004). Top-down and Bottom-up Approaches to Corpora in Language Teaching. In U. Connor and T. A. Upton (eds.), Applied Corpus Linguistics. A Multidimensional Perspective, 251-265. Amsterdam and New York: Rodopi.
Römer, U. (2008). Corpora and language teaching.
Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10: 209-231.
Sinclair, J. (2001). Preface. In M. Ghadessy, A. Henry and R. L. Roseberry (eds.), Small Corpus Studies and ELT. Theory and Practice, vii-xv. Amsterdam and Philadelphia: John Benjamins.
Sinclair, J. (2004). New evidence, new priorities, new attitudes. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 271-299. Amsterdam: John Benjamins.
Tayfoor, S. (2004). Common Mistakes at First Certificate… and How to Avoid Them. Cambridge: Cambridge University Press.
Tribble, C. and Jones, G. (1990). Concordances in the Classroom. London: Longman.
Turton, N. D. and Heaton, J. B. (1987). Longman Dictionary of Common Errors. Harlow: Longman.