Development of Arabic Reading Skills Test Items Based on Common European Framework of Reference for Languages Theory

Safira Aina Najiyah; Ihwan Mahmudi; Muhammad Ismail; Latif Fatus Sa’diyah

doi:10.32332/an-nabighoh.v28i1.47-70

Authors

Safira Aina Najiyah University of Darussalam Gontor
Ihwan Mahmudi University of Darussalam Gontor
Muhammad Ismail University of Darussalam Gontor https://orcid.org/0000-0003-1097-8304
Latif Fatus Sa’diyah Al-Azhar University

Keywords:

Arabic Test, CEFR, Reading Skills, Online Test

Abstract

Introduction: The lack of standardized assessment instruments aligned with the Common European Framework of Reference for Languages (CEFR) remains a significant challenge in evaluating students’ Arabic reading proficiency within the Indonesian educational context. In particular, there is a critical need for instruments that bridge local curricula with the CEFR.

Research Objectives: This study aims to develop valid and reliable Arabic reading skills test items based on the CEFR proficiency descriptors.

Methodology: To address this gap, this research employed a Research and Development approach using the 4D model, which consists of the define, design, develop, and disseminate stages. The study implemented only the first three stages: defining needs, designing the test blueprint, and developing test items. The participants were 259 fifth-grade students of Pondok Modern Darussalam Gontor Putri 1, with the sample size determined using the Slovin formula. The initial blueprint comprised 119 test items distributed across the six CEFR proficiency levels, from A1 to C2. The items were constructed in multiple-choice, true-false, and matching formats. To ensure the quality of the instrument, expert validation and limited field trials were conducted. The data were analyzed to determine content validity, empirical validity, reliability, item difficulty level, and discrimination power.

Results: Of the 119 items, 72 met the validity criteria and were suitable for use. The reliability analysis yielded a coefficient of 0.99, indicating a very high level of reliability. Most of the valid items were categorized as easy, and several items required revision to improve their discrimination power so that they better differentiate between high- and low-performing students.

Unique Contribution: This study contributes practically by providing Arabic language teachers in Indonesia with a standardized, CEFR-based assessment tool that can support accurate measurement of students’ reading competencies and enhance the effectiveness of Arabic reading instruction.

Conclusion: Overall, the findings demonstrate that the developed instrument is valid, reliable, and appropriate for assessing students’ reading proficiency based on CEFR standards.

Recommendations: Further research is recommended to implement the disseminate stage and expand the use of this instrument in broader contexts.
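The item-level statistics named in the abstract (item difficulty, discrimination power, and reliability) can be illustrated with a minimal classical test theory sketch. The abstract does not state which formulas the authors used, so the choices below are assumptions: difficulty as the proportion of correct answers, discrimination as the upper-group minus lower-group difference with a 27% split, and reliability as KR-20. The `responses` matrix is invented toy data for illustration only, not data from the study.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Proportion of test takers who answered each item correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]


def discrimination_index(responses, fraction=0.27):
    """Upper-group minus lower-group proportion correct for each item.

    The 27% split is a common convention, assumed here for illustration.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [
        (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        for j in range(n_items)
    ]


def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomously scored (0/1) items."""
    n_items = len(responses[0])
    p = item_difficulty(responses)
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical toy data: 6 students (rows) by 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print("difficulty:", item_difficulty(responses))
    print("discrimination:", discrimination_index(responses))
    print("KR-20:", kr20(responses))
```

In this kind of analysis, an item with a high proportion correct is read as easy, and an item with a discrimination value near zero does little to separate high- and low-performing students, which is the sort of item the abstract flags for revision.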

Author Biographies

Safira Aina Najiyah, University of Darussalam Gontor

Safira Aina Najiyah is a postgraduate student in the Arabic Language Education program at the University of Darussalam Gontor (UNIDA Gontor), Indonesia. She earned her Bachelor’s degree in Arabic Language Education from the same university, where she is currently pursuing her Master’s degree. Her academic interests include Arabic language teaching, language testing, and error analysis. She is also involved in research on the development of evaluation instruments and language pedagogy in Islamic educational contexts. She can be contacted via email at safiraaina30@gmail.com.

Ihwan Mahmudi, University of Darussalam Gontor

Ihwan Mahmudi is a lecturer at the University of Darussalam Gontor, Indonesia, and currently serves as the Dean of the Faculty of Tarbiyah. He earned his Bachelor’s degree from ISID (now UNIDA Gontor) and both his Master’s and Doctoral degrees in Educational Research and Evaluation from Universitas Negeri Jakarta. His academic interests include educational evaluation, test development, and assessment in learning. He has supervised numerous research projects and actively contributes to academic innovation and institutional quality enhancement at UNIDA Gontor. He can be contacted at ihwanm@unida.gontor.ac.id.

Muhammad Ismail, University of Darussalam Gontor

Muhammad Ismail is a lecturer in Arabic Language for Non-Native Speakers at the University of Darussalam Gontor, Indonesia. He earned a Bachelor’s degree in Arabic Language Education (2007–2011) and a Master’s degree in the same field (2011–2014) from UNIDA Gontor. He completed his Doctoral degree at the University of the Holy Quran and Islamic Sciences, Sudan (2018–2022). His expertise spans Arabic testing, translation, research methodology, and teaching strategy. He has taught Arabic since 2013 and is also a certified book editor. He is the developer of the Alikhtibar Learning Management System and has led projects such as the Arabic Online Course and Arabic Adaptive Test. He can be contacted via email at ismail@unida.gontor.ac.id.

Latif Fatus Sa’diyah, Al-Azhar University

Latif Fatus Sa’diyah is a student at Al-Azhar University, Cairo, Egypt. She actively participates in various student and social organizations in both academic and non-academic fields. Her academic interests include Arabic studies, Islamic education, and cross-cultural communication. She can be contacted via email at latiffatussadiyah20@gmail.com.
