In this project, we aim to take an important step towards enabling the use of corpora (electronically stored and searchable text databases) to study writing as a process. Corpus-based research into the writing process is crucial as it provides, for example, new empirically grounded insights into the role of feedback, drafting and revision, which in turn will facilitate new pedagogical developments in the teaching of writing and therefore also support students’ ability to write and critically analyse text.
Writing plays an important role at different levels in higher education. For researchers and PhD students, the pressure to publish seems to be increasing steadily. Both students and universities are evaluated on the basis of written products such as Bachelor’s and Master’s theses. Many students are expected to be confident users of both written and spoken English in their working lives, as English is the main international corporate language. At the same time, students enter higher education with very different experiences of writing and with different levels of proficiency in English. This puts increasing pressure on universities to support high levels of written performance, both academically and professionally. It has therefore become more important to understand how writing can be taught at university and how issues at various levels can be addressed.
In the 1990s, language corpora grew to become a central tool in language research. Some of the great advantages of corpora are that research can be based on large data sets and that data can be made available to a wide range of users. However, the influence of corpora in the study of writing has been limited, partly because corpora typically contain only one version of a text.
We will compile, systematise, and annotate a corpus of writing as a process (MUCH), which will be made searchable and will include an interface that makes it possible to view differences between versions and to connect changes to comments made by peers and instructors. What sets the corpus apart from other learner corpus projects is, first and foremost, its focus on writing as a process. This will be realised primarily by the inclusion of several drafts of each paper, students’ self-reflective papers, and teacher and peer feedback. The student papers in the corpus will be written in English and will range from undergraduate to PhD level. The corpus will hence allow comparisons across proficiency levels, writing tasks and language backgrounds. These features make the MUCH corpus attractive to a wide user base.
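The version comparison and comment linking described above could be sketched roughly as follows. This is a minimal illustration, not the MUCH implementation: it assumes drafts are plain strings tokenised on whitespace, that each comment is anchored to a token range in the earlier draft, and the names `Comment`, `Change`, `diff_drafts` and `link_comments`, as well as the role labels, are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Comment:
    author_role: str  # hypothetical labels, e.g. "peer" or "instructor"
    text: str
    start: int        # index of the first commented token in the earlier draft
    end: int          # index one past the last commented token


@dataclass
class Change:
    op: str           # "replace", "delete" or "insert" (difflib opcode tags)
    old: List[str]    # tokens removed from the earlier draft
    new: List[str]    # tokens added in the later draft
    start: int        # token range of the change in the earlier draft
    end: int


def diff_drafts(earlier: str, later: str) -> List[Change]:
    """Return the token-level changes between two drafts."""
    a, b = earlier.split(), later.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        Change(op, a[i1:i2], b[j1:j2], i1, i2)
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def link_comments(
    changes: List[Change], comments: List[Comment]
) -> List[Tuple[Change, List[Comment]]]:
    """Pair each change with the comments whose anchored range overlaps it."""
    linked = []
    for change in changes:
        # An insertion has an empty range in the earlier draft, so it is
        # treated here as occupying one token position.
        change_end = max(change.end, change.start + 1)
        hits = [c for c in comments
                if c.start < change_end and change.start < c.end]
        linked.append((change, hits))
    return linked


if __name__ == "__main__":
    draft_1 = "The results was significant and shows a clear trend"
    draft_2 = "The results were significant and show a clear trend"
    feedback = [Comment("instructor", "Check subject-verb agreement", 1, 3)]
    for change, hits in link_comments(diff_drafts(draft_1, draft_2), feedback):
        print(change.op, change.old, "->", change.new,
              [c.text for c in hits])
```

Comparing at token level rather than character level keeps the reported changes close to the units that feedback usually targets; a real corpus interface would likely also need sentence alignment and more robust comment anchoring.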
The main contribution of this project lies in making a corpus available to researchers, instructors and students. The design of the corpus will make it of interest to researchers from several disciplinary backgrounds, including writing research, linguistics, pedagogy and corpus design. The availability of the corpus is of the utmost importance, as data in writing research are still often available only locally. Another key issue is to design a tool that allows for comparisons between first language and second language data. A great deal of research on written feedback from teachers and students is based on studies carried out in first language contexts. The MUCH corpus will collect data produced in second language higher education contexts and hence enable important research on feedback in such contexts.
The current project requires expertise in the teaching of academic writing, writing research, corpus compilation and the development of software for language studies. To cover this range of expertise, we have gathered an excellent team of researchers from Chalmers University of Technology (Chalmers: grant administrator), Malmö University (MAH), the Swedish Language Bank at the University of Gothenburg (Språkbanken, GU) and the University of Southern Denmark (SDU). There is also collaboration with The Language Archive (TLA) at the Max Planck Institute for Psycholinguistics (MPI).