The development of the EPT has been grounded in studies stemming from years of analysis and evaluation. Since 1997, more than 16 studies have been published on various measurement-related research agendas, such as the development of new test formats (e.g., Cho, 2001; Lee, H., 2004; Lee, Y., 2005), rater analysis (e.g., Craig, 2001; Rouillon, 2008; Jang, 2010), and test analysis (e.g., Chi, 2007). Brief summaries of selected papers are provided below.
This study investigates the characteristics of argumentation in essay performances on a second language (L2) writing test. Inspired by Toulmin's (2003) model of argumentation and research on evaluating the soundness of arguments (Means & Voss, 1996; Schwarz, Neuman, Gil, & Ilya, 2003), we examined argument structure and quality in 150 argumentative essays of different levels on an integrated English-as-a-second-language (ESL) writing placement test at a US university. The essays were coded using Toulmin's elements, scored on a Likert scale in terms of argument relevance and acceptability, and compared via multivariate analysis of variance. Results show that both argument structure and quality are associated with differences in writing proficiency. However, in terms of argument structure, the presence of individual Toulmin elements alone does not predict proficiency level. Instead, it is the structural complexity, i.e., the co-occurrence of multiple structural elements, that reflects the proficiency differences. The findings of this study help disentangle the nature of argumentation across proficiency levels, offering both insights into L2 writing development and validity evidence for L2 writing assessments. Additionally, this study has methodological implications for L2 writing assessment research, suggesting the added value of incorporating fine-grained measures of argumentation in scale development and validation.
Kim, Bowles, Yan, & Chung (2018) examined the comparability of the on-campus and online versions, focusing on essay quality and examinee preference among 26 examinees who took both versions within a week, in counterbalanced order. Essay quality was measured in terms of linguistic features (complexity, accuracy, fluency) and rhetorical features (integration of sources, progression of ideas, argument effectiveness). No meaningful differences in essay quality were observed between the two versions, although online essays were slightly longer. Post-test questionnaire responses revealed that a majority of test takers preferred the online version for its convenience. The paper also discussed the advantages and disadvantages of including peer review in writing placement tests and concluded with recommendations for evaluating comparability as part of standard quality control practice in local tests.
At the University of Illinois at Urbana-Champaign (UIUC), the English Placement Test (EPT) is the institutional placement test used to place students into appropriate English as a second language (ESL) writing and/or pronunciation service courses. The EPT is used to assess the English ability of newly admitted international undergraduate and graduate students as validated against the English language demands of our campus. According to Davidson & Cho (2001), the current format of the EPT has maintained its quality and displayed evidence of validity through the use of detailed test specifications that align with the goals of the UIUC ESL writing and pronunciation service courses. UIUC offers these courses to international students who are accepted on limited status based on their scores on standardized English proficiency tests (TOEFL or IELTS) and/or other relevant information in each student's admission dossier. Students accepted on limited admission status are required to take the EPT before the start of instruction.
Kokhan (2013) conducted a study based on EPT writing score data (2006–2010) and concluded that TOEFL scores do not accurately predict ESL placement. Using a multinomial logistic regression model, the study revealed that students with high TOEFL scores may be placed into lower levels of the ESL writing class rather than being exempted or placed into higher levels, while students with low TOEFL scores could have a higher chance of exemption. The findings may justify the use of institutional placement tests that are better designed to meet the criteria of a specific writing program.
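The multinomial logistic regression used in this line of work can be sketched as follows. The coefficients below are purely hypothetical, chosen only to illustrate the model form (softmax probabilities over placement outcomes, with exemption as the reference category); they are not the values estimated by Kokhan (2013), and the three outcome labels are our simplification.

```python
import math

# Hypothetical coefficients for a 3-level placement outcome
# (exempt / higher-level ESL / lower-level ESL), with "exempt" as the
# reference category (linear predictor fixed at 0). Illustrative only.
COEF = {
    "higher_esl": (10.0, -0.10),   # (intercept, slope on TOEFL total)
    "lower_esl":  (30.0, -0.34),
}

def placement_probs(toefl_score):
    """Multinomial-logit probabilities: P(class) = exp(Xb) / sum_k exp(Xb_k)."""
    linpred = {"exempt": 0.0}
    for level, (b0, b1) in COEF.items():
        linpred[level] = b0 + b1 * toefl_score
    denom = sum(math.exp(v) for v in linpred.values())
    return {level: math.exp(v) / denom for level, v in linpred.items()}

for score in (80, 95, 110):
    probs = placement_probs(score)
    best = max(probs, key=probs.get)
    print(score, best, round(probs[best], 3))
```

Fitting such a model to real score data would estimate the intercepts and slopes from observed placements; the point here is only that the predicted most-likely outcome need not track TOEFL scores in a simple one-to-one way.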
Lee and Anderson (2007) studied topic generality with a diverse group of graduate students who came to UIUC from different academic backgrounds. They suggest that departmental affiliation is not associated with the likelihood of receiving the same score on each topic. This finding supports the validity of the EPT by suggesting that departmental affiliation does not matter across topics, as students scored similarly even when they came from different departments. The paper also suggests that EPT administrators can select a single topic out of three for all EPT administrations without considering test takers' departmental affiliation. The researchers also identified a clear relationship between test takers' general language skills and their writing ability, as the EPT is an integrated, workshop-based writing test. However, the study expresses reservations about drawing conclusions regarding writers' general language ability and their performance on specific tasks of the test, and it calls for further research comparing language competence with performance on specific EPT tasks.
Davidson & Cho (2001) give an overview of three eras of EPT development formats and how these formats reflect English for academic purposes (EAP) testing under a similar mandate: the first era was based on the structuralist model of foreign language teaching, the second on psychometric concerns, and the third on psychometric concerns as well as classroom content. The paper also provides a historical overview of the language assessment tradition at UIUC.
There are also several conference presentations on the EPT, which reflect ongoing and recent research projects related to the test. Abstracts of selected presentations are provided below.
The COVID-19 pandemic created incalculable problems for educators in every field, and it specifically posed challenges for language assessment in terms of security and administration (e.g., Green & Lung, 2021; Ockey, 2021). Large- and small-scale tests alike were thrust into the at-home, remotely administered assessment arena whether they were ready or not. In the case of one large Midwestern university, the English Placement Test team was able to rapidly modify and improve a fledgling online test administered periodically to have it become the primary mode of test administration during the pandemic. This technology demonstration will provide an overview of the administrative procedures involved in hosting and proctoring the online test, including selection of a testing platform, translation of paper-based materials to online forms, and use of video conferencing for test administration. Since the official test is large-scale and utilizes many university resources, not every technological feature described will apply to all testing situations. Thus, we also introduce alternative modes of remote proctoring and platforming that are suitable for smaller-scale testing operations, including the use of synchronous document editing and video conferencing. Reflecting on the transition, we realize the importance of digitizing test materials with available technology tools and the necessity of promoting a platform or system that can accommodate both online and face-to-face testing. This experience not only demonstrates the possibility of transitioning test formats in unprecedented times but also provides implications for test security and administration as we move forward.
In validation research for second language (L2) integrated writing assessment, analysis of performance characteristics has largely focused on source use (e.g., Plakans & Gebril, 2013; Weigle & Parker, 2012), lexico-grammatical complexity and accuracy (e.g., Biber & Gray, 2013; Gebril & Plakans, 2016), and discourse organization (e.g., Gebril & Plakans, 2009; Plakans & Gebril, 2017), but less frequently on the construct of argumentation. This study focuses on the characteristics of argumentation as a source of validity evidence for L2 integrated writing assessment. By adapting Toulmin's (2003) model of argumentation and referencing criteria for evaluating the soundness of arguments (Means & Voss, 1996; Schwarz, Neuman, Gil, & Ilya, 2003), this study examines argument structure and quality in 150 argumentative essays of different proficiency levels on an integrated ESL writing placement test at a North American university. The essays were coded using Toulmin's elements, scored on a 3-point Likert scale in terms of argument relevance and acceptability, and compared via multivariate analysis of variance (MANOVA). Results indicate that the absence or presence of individual Toulmin elements alone does not predict proficiency level. Instead, it is the co-occurrence of multiple structural elements (i.e., overall argument structure) that reflects the difference. Additionally, argument quality is a distinguishing factor of essay performances across proficiency levels. The results also show that overall argument structure and quality of argument are two different constructs. The findings of this study suggest the added value of incorporating fine-grained measures of argumentation in evaluating the validity evidence for L2 writing assessment and offer implications for test development and writing pedagogy.
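The coding scheme described above can be sketched in miniature: each essay is coded for the presence or absence of Toulmin's elements, and structural complexity is then the co-occurrence of multiple elements in one essay. The essays and the exact operationalization of complexity below are our illustrative assumptions, not the study's data or definitions.

```python
from statistics import mean

# Toulmin's argument elements used as the coding categories.
TOULMIN_ELEMENTS = {"claim", "data", "warrant", "backing", "qualifier", "rebuttal"}

# Hypothetical coded essays at three proficiency levels. Note that every
# essay contains "claim" and "data" (individual elements alone do not
# separate levels), but co-occurrence of elements increases with level.
essays_by_level = {
    "low":  [{"claim", "data"}, {"claim", "data"}],
    "mid":  [{"claim", "data", "warrant"}, {"claim", "data", "rebuttal"}],
    "high": [{"claim", "data", "warrant", "rebuttal"},
             {"claim", "data", "warrant", "backing", "qualifier"}],
}

def structural_complexity(essay):
    """Number of distinct Toulmin elements co-occurring in one essay."""
    return len(essay & TOULMIN_ELEMENTS)

for level, essays in essays_by_level.items():
    print(level, mean(structural_complexity(e) for e in essays))
```

In the study itself, such per-essay measures (together with the 3-point quality ratings) were the dependent variables compared across proficiency levels via MANOVA.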
In language testing, rater studies tend to examine rating performance at a single time point. Research on the effect of rater training either adopts a pre- and post-training design or compares the rating performance of novice and experienced raters. While those studies provide insights into the end results of rater training, little is known about how rater performance changes during the process of rater training. This study examined how rater performance develops during a semester-long rater training and certification program for a post-admission English as a second language (ESL) writing placement test at a large US university.
The certification program aims to align raters to a newly developed rating scale that provides both placement recommendations and diagnostic information regarding students' writing proficiency level and skill profile. The training process employed an iterative, three-stage approach, consisting of face-to-face group meetings, individual rating exercises, and scale re-calibration based on rater performance and feedback. Using many-facet Rasch modeling (Linacre, 1989, 2006), we analyzed the rating quality of 17 novice raters across four rounds of rating exercises. Rating quality was operationalized in terms of rater severity and consistency, rater consensus, and raters' use of the rating scale. These measurement estimates of rater reliability were compared across time and between certified and uncertified raters.
At the start of the training program, all raters were inconsistent, varied widely in severity, and achieved low exact score agreement. Over time, certified raters improved on multiple indices of rating quality and became increasingly indistinguishable from one another in their application of the rating scale. However, rater performance did not improve in a linear fashion but instead followed a U-shaped developmental pattern. In contrast, uncertified raters' performance remained inconsistent across rounds. Findings of this study have implications for the effectiveness of rater training and developmental patterns of rating behavior over time.
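Two of the rating-quality indices mentioned above can be illustrated with a toy ratings matrix. The ratings, the deviation-based severity measure, and the pairwise exact-agreement measure below are our simplified assumptions for illustration; the study itself estimated these facets jointly with many-facet Rasch measurement rather than with raw-score deviations.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale by three raters on four essays.
ratings = {
    "rater_a": [3, 4, 2, 5],
    "rater_b": [3, 4, 3, 5],
    "rater_c": [2, 3, 2, 4],
}

# Mean score each essay received across raters.
essay_means = [mean(scores) for scores in zip(*ratings.values())]

def severity(rater):
    """Mean deviation from the essay means; negative = more severe."""
    return mean(s - m for s, m in zip(ratings[rater], essay_means))

def exact_agreement(r1, r2):
    """Proportion of essays on which two raters gave the identical score."""
    pairs = list(zip(ratings[r1], ratings[r2]))
    return sum(a == b for a, b in pairs) / len(pairs)

for r in ratings:
    print(r, round(severity(r), 3))
print("a-b agreement:", exact_agreement("rater_a", "rater_b"))
```

Tracking such indices across successive rounds of rating exercises is what makes a non-linear (e.g., U-shaped) developmental pattern visible.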
This study investigates the accuracy and interpretability of a newly developed automatic essay scoring (AES) system using corpus linguistics, computational linguistics, and machine learning methods for an English writing placement test at a large Midwestern university. To ensure interpretability and theoretical soundness, the program extracted 21 linguistic features from student essays based on the literature on commercial AES systems and second language writing development. These features span the domains of linguistic complexity, accuracy, and fluency (e.g., type-token ratio, number of words, ratio of grammatical errors) as well as academic style measures (e.g., ratio of nouns, ratio of verbs). Factor analysis was then used to reduce the features to a smaller number of interpretable linguistic dimensions. Five resulting factors were chosen to build a Naïve Bayes model to predict the placement decisions of 71 benchmark essays on the test. The resulting model achieved a prediction accuracy of over 80% using 5-fold cross-validation. A further analysis of the parameters in the machine learning model provides insights into the representative features of essays and the writing proficiency of test takers at different placement levels. Specifically, higher-level students tend to use a more academic and formal register and make far fewer grammatical errors. Interestingly, lower-level students produce longer essays, but higher-level students tend to produce longer sentences. Overall, the AES system demonstrates a high level of accuracy and interpretability, suggesting its potential to complement human ratings and to be used to evaluate the reliability and validity evidence of the English writing placement test. Possible ways to improve the AES system will also be discussed.
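Two of the fluency and complexity features named above can be sketched as follows. The tokenization and feature definitions here are our simplified assumptions for illustration; the actual AES system's operationalizations of its 21 features may differ.

```python
import re

def extract_features(essay_text):
    """Sketch of two simple essay features: word count and type-token ratio.

    Tokens are approximated as runs of letters/apostrophes, lowercased;
    type-token ratio is the number of distinct tokens over total tokens.
    """
    tokens = re.findall(r"[A-Za-z']+", essay_text.lower())
    n_words = len(tokens)
    type_token_ratio = len(set(tokens)) / n_words if n_words else 0.0
    return {"n_words": n_words, "type_token_ratio": type_token_ratio}

print(extract_features("The test measures the writing of the student."))
```

In the pipeline described above, vectors of such features would feed into factor analysis and then a Naïve Bayes classifier trained on the benchmark essays' placement decisions.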
The UIUC English Placement Test (EPT) is administered to incoming international undergraduate and graduate students who do not meet the campus English proficiency standard. The test is administered once students arrive on campus, prior to the start of each semester, and is anchored to test specifications that have evolved over several years to meet changing mandates and agendas. Over the years, the test format has changed in order to improve consistency and accuracy. In 2008, the undergraduate population demographics changed drastically due to a sudden increase in the number of international students. This surge of international undergraduate students brought new challenges to test administrators and test score users, which consequently led to a review of the validity of the test as well as to the development of a new test delivery mode and administration methods. The new mandates arising from the surge, mainly from undergraduate advisors and admission officers, barely existed in the past. Before the surge, the test functioned as a peripheral assessment tool providing additional support to the international undergraduate population; since the surge, it has played a more crucial role in the undergraduate academic setting. Because of the test's increasing influence on this population, test administrators face the demand of developing a pre-arrival delivery format: an online version of the English Placement Test. The goal of the pre-arrival online test is to have newly admitted students take the test during the summer prior to their enrollment at the university. This external mandate motivated a research project to investigate ways to create online assessment tools so that students can take the EPT before coming to campus.
In this study, we investigated validity issues of the EPT by examining layers of test specification structures based on Fulcher and Davidson (2009). We also looked into the history and policy changes of the test as reflected in previous test specifications and test revisions. Moreover, we closely examined the needs and advantages of adopting online assessment tools for international students through modified use of online instructional tools. Findings from the test specification and policy review show that the external mandate was a crucial factor behind the various format changes of the test. The current policy also directs teachers and advisors to use test scores differently than in the previous test setting. Online assessment tools have merit for test administration, but they pose a possible threat to test validity due to technical difficulties and issues related to test security.
This presentation will discuss the development of a web-based placement test for ESL at the University of Illinois. The web-based placement test will be based on the current format of the exam, which is a half-day, process-writing-based workshop model developed by Cho (2001). In the current format, students are assessed on two main skills: source-based academic writing and pronunciation. In order to maintain the overall model of the current offline test, the test administrators are seeking ways to use Moodle to develop a model for the web-based placement test. The presentation will focus on four major topics: 1) background of the research, 2) the web-based test development procedures and its model, 3) potential benefits and challenges facing the web-based placement test in terms of the undergraduate advising community, test security, and other administrative details, and 4) various research agendas regarding the web-based test development, such as reliability and validity research on the current and future models of the placement test. The new format is under development in hopes of bringing three major benefits: 1) students will be able to receive appropriate advising prior to their arrival on campus, 2) advisors and departments will be able to plan courses based on more accurate information, and 3) test administrators will be able to reduce labor compared to offline testing.
The English Placement Test (EPT) at the University of Illinois at Urbana-Champaign (UIUC) is used to assess international students, both undergraduate and graduate, who were admitted on limited status based on the campus-wide English proficiency requirements. Based on the results of the EPT, students are placed into or exempted from ESL writing and/or pronunciation courses. However, one problem that has emerged recently, due to the large increase in the number of international undergraduate admissions, is the score reporting system of the current on-campus testing. The current test is conducted one week prior to the beginning of instruction, which conflicts with departmental freshman orientation and registration. Students who take the EPT once they arrive on campus may not have their results ready for proper class registration, which can delay the registration process. In order to meet the University mandates and advisors' needs, a web-delivered EPT is in development. As a supplementary method to the web-delivered EPT, we would like to consider self-assessment as a tool that can be incorporated into the ESL class placement decision. In order to determine the effectiveness, predictive value, and discriminating power of self-assessment, we administered an online self-assessment survey to undergraduate and graduate students who had taken the English Placement Test. The preliminary results of the survey will be reported with suggestions on how to implement self-assessment in placement decisions.