Posted on 2025-07-15, 07:59. Authored by Linda William, Javen Lai.
<p dir="ltr">1. Introduction: Assessment is a key component of the learning process, allowing lecturers to measure student progress and adapt instructional strategies accordingly. However, creating and maintaining high-quality assessment questions is a significant challenge, particularly when ensuring that questions are diverse, contextually relevant, and accurately weighted in scoring. Traditional methods require substantial manual effort, often taking weeks to develop and refine assessment items. Additionally, the difficulty of individual questions is not systematically analysed, leading to grading inconsistencies where easy and difficult questions contribute equally to final scores. To address these limitations, this study proposes an AI-driven Assessment Question Management Tool that leverages Artificial Intelligence (AI) and Machine Learning (ML) to automate question generation and analyse question difficulty. </p><p dir="ltr">2. Methods: The tool consists of two primary components: (1) Automated Question Generation and (2) Question Difficulty Analysis. These components work together to facilitate efficient question creation and adaptive scoring, improving assessment quality and grading fairness. The first component automates the generation of text-based and image-based questions across multiple formats, including multiple-choice, true/false, ordering, and short-answer. The tool allows lecturers to specify the desired number of questions, difficulty range, and question type. For text-based questions, the tool uses the Generative AI models GPT-4.0 and GPT-4.0-mini to generate content aligned with given subtopics; for image-based questions in subjects requiring visual representations, it uses DALL-E 3. The second component focuses on evaluating the difficulty of existing questions and recommending dynamic scoring adjustments. Traditional grading approaches do not differentiate between easy and difficult questions, which can lead to grading imbalances. 
The tool addresses this issue by analysing student response data and applying ML models to estimate difficulty levels. Difficulty estimation is based on (1) correct response rate (the percentage of students answering correctly), (2) average response time (how long students take to answer), and (3) error variance (the variation in student performance across attempts). </p><p dir="ltr">3. Experimental Results: The tool was tested on academic subjects including Mathematics, Geography, and general knowledge domains. The GPT-4.0-mini model demonstrated faster generation times, making it suitable for real-time applications, while GPT-4.0 produced higher-quality questions for complex subtopics. However, DALL-E 3 exhibited limitations in mathematical diagram generation, often distorting proportions or misinterpreting certain symbols. The tool's predicted difficulty levels were validated against key performance indicators such as student accuracy rates and response times. The results indicate that the AI-based difficulty estimates align well with expected patterns, leading to fairer, more data-driven grading. </p><p dir="ltr">4. Conclusion: The findings of this study demonstrate that AI-driven assessment tools can effectively reduce lecturers' manual workload, improve assessment scalability, and ensure fairer grading practices. The integration of Generative AI for question creation and ML for difficulty-based scoring provides a systematic and adaptive approach to assessment design.</p>
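<p dir="ltr">To illustrate how the three difficulty signals and the dynamic scoring adjustment described in the Methods section could fit together, the following is a minimal Python sketch. The data structure, function names, and the linear weights combining the signals are illustrative assumptions, not the paper's trained ML model; the sketch only shows the shape of the computation, in which a low correct response rate, a long average response time, and a high error variance all push the estimated difficulty up, and harder questions then receive a larger share of the total marks.</p>

```python
from dataclasses import dataclass


@dataclass
class QuestionStats:
    """Aggregated student response data for one question (hypothetical structure)."""
    correct_rate: float       # fraction of students answering correctly, in [0, 1]
    avg_response_time: float  # mean time taken to answer, in seconds
    error_variance: float     # variance of performance across attempts, in [0, 1]


def estimate_difficulty(stats: QuestionStats, max_time: float = 120.0) -> float:
    """Combine the three signals into a difficulty estimate in [0, 1].

    The 0.5/0.3/0.2 weights are illustrative assumptions standing in for
    the paper's ML-based estimator.
    """
    # Cap response time at max_time so the time factor stays in [0, 1].
    time_factor = min(stats.avg_response_time / max_time, 1.0)
    return round(
        0.5 * (1.0 - stats.correct_rate)   # fewer correct answers -> harder
        + 0.3 * time_factor                # longer answering time -> harder
        + 0.2 * stats.error_variance,      # more erratic performance -> harder
        3,
    )


def dynamic_weights(difficulties: list[float]) -> list[float]:
    """Scale per-question weights so harder questions carry more marks.

    Weights sum to 1.0, so the adjustment redistributes a fixed total score.
    """
    total = sum(difficulties)
    return [d / total for d in difficulties]
```

<p dir="ltr">For example, a question that 90% of students answer correctly in about 30 seconds with low error variance receives a low difficulty estimate, and <code>dynamic_weights</code> would then assign it a correspondingly small share of the assessment's total marks.</p>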