Designing a Grammar and Writing Achievement Test for the Community Language Program (CLP)


When & Where
Sep 2016 - Dec 2016
Teachers College, Columbia University


Makiko Habu

My role
Learner research, test design, data analyses

Research Methodology
A quantitative, non-experimental, one-group, post-test-only research study

Problem Statement
The quality of tests is often overlooked when teachers focus only on test scores.

Possible Solution
Through an intensive literature review, established test development practices, and multiple data analysis methods, language assessment items can be designed and iterated to assess language learners more accurately.


Part 1: Overview

Studies on language assessment attend both to how the target language works in theory and to how it is used in practice. By examining test-takers’ responses, they also offer a key to understanding language teaching and learning.

One might have some concerns about language assessment:

To provide test-takers with a valid and fair test, teachers and testing practitioners must make appropriate assessment decisions in academic content areas. Conducting assessments helps teachers identify learners’ strengths and weaknesses, plan lessons, and deliver appropriate instruction. With this in mind, the motivation for this study is to investigate and describe the process of test development and then to analyze an achievement test that we administered in an ESL (English as a Second Language) class, in order to understand how such a test can be designed most effectively based on current research.

The study was conducted at the Community Language Program (CLP) at Teachers College, an English as a Second Language program that offers beginning to advanced learners lessons that develop the four English skills (listening, speaking, reading, and writing) together. The learners’ primary goal is to use these four skills in the various social contexts of everyday life in New York City. The CLP therefore adopts a communicative teaching approach with a theme-based syllabus: each lesson focuses on a specific topic, and all classroom activities during a unit are tied to its theme to contextualize the elements of the target language.

This study focuses on an achievement test administered to CLP students at the end of a unit. An achievement test serves several purposes. First, it gives learners valuable feedback, helping them see clearly how much they have learned and what they may need to spend more time on. Second, its results help instructors gauge teaching effectiveness and student learning. For example, a grammar point might have been stressed multiple times in class, yet the test report may not match the assumption that students have learned it. Therefore, it is important for test designers to collect data from instructor feedback, a student needs analysis, and self-evaluation in order to create more effective tests.


Part 2: Define 

Who is our target group?
– Meet with the students to learn about their backgrounds and learning goals.

What do we want to test the students on?
– Discuss with their instructor what she thinks and expects.

What did the students learn in this unit?
– Observe their class and study their textbook and learning materials to familiarize ourselves with the unit they are studying.

Community Language Program @ Teachers College, Columbia University


Part 3: Ideate

Hypothesis: there is a relationship between the students’ writing ability and grammar skills. 



Part 4: Prototype & Build

For this project, each group was asked to choose two of the following five components to build a test: reading, listening, speaking, writing, and grammar. One of the two had to be assessed with multiple-choice tasks and the other with a constructed-response task. Our team chose grammar for the multiple-choice task and writing for the constructed-response task. We planned to design 20 multiple-choice questions and one writing prompt.

Before we started putting the grammar questions together, Makiko and I took a training course on constructing multiple-choice questions, where I learned about EMPATHY in testing.


“Include items based on what you most emphasized in the class - not the minor points or details mentioned in passing.”

“Order the items according to perceived difficulty. The easy items go first and the harder ones last.”

“Do NOT test language based on prescriptive grammar rules, which are pretty much ignored in spoken English (e.g., With whom did you go? vs. Who did you go with?).”

“The content should be age-appropriate (e.g., most teens do NOT know how to rent an apartment) and should be as bias-free as possible. Some biases include: cultural biases (e.g., most EFL students are not familiar with double-decker buses or the NYC subway station); gender biases (e.g., do not stereotype men and women in roles); topical bias (avoid questions requiring knowledge of a topic that is not part of the test, e.g., helicopter parts).”


Part 5: Prototype (Our Test!)



Part 6: Administer & Testing

Several notes: 
1. We forgot to add a section for test-takers to write their names.
2. We did not provide a space for test-takers to mark their answers to the multiple-choice questions (they could only circle them within the questions).

Data do not give up their secrets easily. They must be tortured to confess.

– Jeff Hopper, Bell Labs

Part 7: Data Analyses (with SPSS)

A. Results for the Grammar Task (multiple-choice questions)
1. Descriptive Statistics
2. Internal-consistency Reliability and Standard Error of Measurement
3. Item Analyses (item difficulty, discrimination, alpha, and keep-or-discard decisions)
4. Distractor Analysis
5. Evidence of Construct Validity within the Grammar Section (magnitude of the correlation coefficient, generalizability of the correlation coefficient)
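For steps A.1 and A.2, the statistics we obtained from SPSS can be illustrated with a minimal sketch in plain Python. It assumes each test-taker’s responses are stored as a list of 0/1 item scores (the data below are hypothetical, not our test results); for dichotomous items Cronbach’s alpha is equivalent to KR-20, and the SEM is the standard deviation of total scores multiplied by the square root of one minus the reliability.

```python
from statistics import mean, stdev, variance

def describe(totals):
    """Descriptive statistics for a list of total scores."""
    return {"n": len(totals), "mean": mean(totals), "sd": stdev(totals),
            "min": min(totals), "max": max(totals)}

def cronbach_alpha(scores):
    """Cronbach's alpha for a test-taker x item matrix (equals KR-20 for 0/1 items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # variance of each item across test-takers
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        # e.g. every test-taker got every item right: reliability is undefined
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def standard_error_of_measurement(scores):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in scores]
    return stdev(totals) * (1 - cronbach_alpha(scores)) ** 0.5

# Hypothetical data: 4 test-takers x 3 items (1 = correct, 0 = incorrect)
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(describe([sum(r) for r in responses]))
print(round(cronbach_alpha(responses), 3))  # 0.75
print(round(standard_error_of_measurement(responses), 3))
```

The SEM is reported in raw-score points, which makes it easy to read as a band around each test-taker’s observed total.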

B. Results for the Writing Task
1. Descriptive Statistics (overall view of how the grades are spread out)
2. Internal-consistency Reliability and Standard Error of Measurement (how are the three different components of writing ability related to each other?)
3. Inter-rater Reliability (How did Makiko’s and my grading differ?)
4. Evidence of Construct Validity within the Writing Section (magnitude of the correlation coefficient, generalizability of the correlation coefficient)

C. Other Evidence of Validity
1. Relationship between the Two Sections of the Test
2. Relationship between a Background Variable and Test Performance [we decided to study whether the test-takers’ English learning goals are related to their writing ability, as posed in our last research question.]


Part 8: Iteration

Changes to Test Items (Content) Design

  1. Eight items out of 18 were excluded from the SPSS item analysis because all test-takers answered them correctly, so those items could not discriminate between high- and low-ability test-takers.

  2. Based on the item difficulty (p-values) and discrimination (D-index) values in Table 4, we suggest that six items (1, 7, 8, 15, 19, 20) be revised and one item (9) be removed to increase the reliability of the exam.
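The item statistics behind these decisions can be sketched as follows. Item difficulty (p) is the proportion of test-takers who answered an item correctly, and the D-index is the proportion correct in an upper-scoring group minus that in a lower-scoring group. The 27% group size and the D ≥ 0.30 cut-off below are hypothetical choices for illustration, not necessarily the criteria we applied; an item everyone answers correctly gets p = 1.0 and D = 0, which is why such items cannot discriminate.

```python
def item_analysis(scores, group_fraction=0.27):
    """Return (item number, p-value, D-index) for each item.

    scores: one row per test-taker, one 0/1 entry per item.
    D = proportion correct in the upper group - proportion correct in the lower group.
    """
    n = len(scores)
    totals = [sum(row) for row in scores]
    # rank test-takers from highest to lowest total (ties keep input order)
    ranked = sorted(range(n), key=lambda j: totals[j], reverse=True)
    g = max(1, round(n * group_fraction))  # size of each extreme group
    upper, lower = ranked[:g], ranked[-g:]
    results = []
    for i in range(len(scores[0])):
        p = sum(row[i] for row in scores) / n
        p_upper = sum(scores[j][i] for j in upper) / g
        p_lower = sum(scores[j][i] for j in lower) / g
        results.append((i + 1, p, p_upper - p_lower))
    return results

def decision(p, d, min_d=0.30):
    """Hypothetical rule: no-variance items cannot discriminate; low-D items are reviewed."""
    if p in (0.0, 1.0):
        return "no variance"
    return "keep" if d >= min_d else "review"

# Hypothetical responses: 4 test-takers x 3 items
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
for item, p, d in item_analysis(responses):
    print(item, p, d, decision(p, d))
```

In practice, items flagged for review are then inspected by hand (for example, through the distractor analysis) before deciding whether to revise or discard them.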

discarded question/item

Changes to Test Layout Design

A better layout would also support test administration and data analysis! We forgot to include an answer sheet at the beginning or end of the test, which is necessary for efficient grading, and a name block is needed to remind test-takers to write their names.


Part 9: Thoughts

Thinking back on the 15-week project and the process we went through, I'd like to introduce Paul Souza’s (1996) design model, which I feel speaks to Makiko’s and my experience.

I also find Paul Souza’s (1996) design model highly relatable.


Last but not least, I invite you to read our complete paper!