Assessment shapes the experiences of students and influences their behaviour more than the teaching they receive (Bloxham & Boyd, 2007). Its influence grows when it takes the form of standardised or high-stakes examinations; such examinations are administered worldwide, and one of their essential purposes is to direct the process of learning and to evaluate the effectiveness and extent of knowledge gained.
Almost all academic institutions are engaged with assessment in one way or another. Decisions based on assessments have wide-ranging and often long-lasting consequences for stakeholders; it is therefore of utmost importance to ensure the quality of assessments and the results they generate.
The purpose of this article is to take readers through the assessment and quality assurance cycles, drawing on the experience of the Aga Khan University Examination Board (AKU-EB). It also shares examples of international best practices and quality indicators so that any other educational institution involved in assessment, be it an individual school, a network of schools, an assessment body, an examination board or a policymaker, can adapt these practices and align its own with these high standards.
This article will first explain the assessment cycle, followed by a description of internationally recognised quality indicators of assessment. It will conclude by elaborating how the assessment and quality assurance cycles can be intricately woven together to meet international standards of assessment quality.
The Assessment Cycle
The assessment cycle consists of three phases, as shown in Figure 1.
Phase I: Question and Exam Development
Examination questions, along with their marking schemes, should be developed by subject experts with at least a few years' experience of teaching and assessing similar content for the same level of students. One or more peer subject experts should ideally review these questions and marking schemes for content before they undergo multiple further stages of review by a multi-disciplinary panel of experts. The purpose of these reviews is to critically analyse questions from various aspects, such as content, cognitive level, difficulty, relevance and language. Once approved and accepted, questions should be securely stored in a question database; it is suggested to save them according to their characteristics, such as student learning outcome (SLO), difficulty and cognitive level, for easy retrieval during examination paper development.
During examination paper development, reviewed and approved questions are selected according to the blueprint of the examination, also known as exam specifications. The examination paper should also go through multiple reviews for content representation, overall difficulty, cognitive level, relevance, language, and time requirement before it is finalised.
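As an illustration of blueprint-driven paper assembly, the sketch below selects questions from a tagged question bank to fill each cell of an exam specification. All names and tags (`question_bank`, `blueprint`, the SLO labels) are hypothetical illustrations, not AKU-EB's actual schema.

```python
import random

# Hypothetical question bank: each entry tagged with SLO, cognitive level and
# difficulty, as suggested for the question database above.
question_bank = [
    {"id": 1, "slo": "SLO-1", "level": "knowledge",   "difficulty": "easy"},
    {"id": 2, "slo": "SLO-1", "level": "application", "difficulty": "moderate"},
    {"id": 3, "slo": "SLO-2", "level": "application", "difficulty": "moderate"},
    {"id": 4, "slo": "SLO-2", "level": "reasoning",   "difficulty": "hard"},
    {"id": 5, "slo": "SLO-3", "level": "knowledge",   "difficulty": "easy"},
]

# Hypothetical blueprint: how many questions of each (SLO, cognitive level)
# combination the examination paper requires.
blueprint = {
    ("SLO-1", "application"): 1,
    ("SLO-2", "reasoning"): 1,
    ("SLO-3", "knowledge"): 1,
}

def assemble_paper(bank, spec):
    """Pick questions matching each blueprint cell; flag cells the bank cannot fill."""
    paper, gaps = [], []
    for (slo, level), count in spec.items():
        matches = [q for q in bank if q["slo"] == slo and q["level"] == level]
        if len(matches) < count:
            gaps.append((slo, level))  # content-validity gap: bank lacks coverage
        paper.extend(random.sample(matches, min(count, len(matches))))
    return paper, gaps

paper, gaps = assemble_paper(question_bank, blueprint)
```

Reporting the unfilled cells, rather than silently substituting other questions, keeps the paper honest to the blueprint and flags where the question bank needs further development.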
Phase II: Marking
Markers, also known as assessors, should be carefully selected from a pool of subject experts based on their teaching and assessment/marking experience with similar content and the same level of students. These markers should be oriented to the purpose of the examination and trained in the best practices of assessment/scoring for fair and accurate marking of student scripts.
Before marking begins for the entire student cohort, marking schemes should be tested to confirm that they are aligned with the questions/tasks, comprehensive, well-structured and clear. During this phase, known as seeding, at least two experienced markers review the marking scheme for its alignment with the questions, its content and its comprehensiveness. After this review, the markers independently score the answer scripts of a small, random and representative sample of students selected on the basis of gender, region and student performance. This is done to evaluate whether the marking schemes enable markers to award the same scores to the same students. During this process, marking schemes are revised for greater clarity, structure or comprehensiveness, as required, until the two markers award similar scores to the selected student scripts. The reviewed and standardised marking schemes are then used to mark the scripts of the entire student cohort.
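The seeding check described above, two markers independently scoring the same sample until their scores agree, can be quantified with a simple agreement statistic. The sketch below computes the Pearson correlation between two markers' scores; the data and the 0.8 threshold mentioned in the comment are illustrative assumptions, not AKU-EB figures.

```python
from statistics import mean

# Hypothetical seeding data: two experienced markers independently score the
# same representative sample of ten student scripts (marks out of 20).
marker_a = [14, 18, 9, 20, 12, 16, 7, 19, 11, 15]
marker_b = [13, 18, 10, 19, 12, 17, 8, 20, 11, 14]

def pearson(x, y):
    """Pearson correlation between two markers' scores: one simple agreement check."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sd_x = sum((a - mx) ** 2 for a in x) ** 0.5
    sd_y = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# A high correlation (e.g. above an assumed 0.8 threshold) suggests the marking
# scheme lets both markers award similar scores; a low value signals that the
# scheme needs further revision before full-cohort marking proceeds.
agreement = pearson(marker_a, marker_b)
```

Correlation captures only consistency of rank order; in practice one would also inspect the absolute score differences, since two markers can correlate highly while one is systematically harsher than the other.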
Phase III: Review of Results
The third phase of the assessment cycle involves a comprehensive review of results before they are announced. Student scores should be reviewed using statistical analysis to ensure the results are error-free, accurate and acceptable. The resulting analysis also helps identify strengths and any deficiencies or gaps in the curriculum, teaching/learning and assessment. These findings must be shared with relevant stakeholders, such as curriculum developers, teachers, students, exam developers, academic leaders and policy developers, to facilitate further improvement (Zuberi et al., 2018). Such information sharing enhances the transparency, accountability and credibility of assessments.
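One common form of this statistical review is classical item analysis. The sketch below computes, for a hypothetical matrix of dichotomous responses, each item's facility (proportion correct) and a simple upper-lower discrimination index; the data and the top-and-bottom-third grouping are assumptions chosen for illustration.

```python
# Hypothetical responses (1 = correct, 0 = incorrect): rows are six students,
# columns are four examination questions.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def item_analysis(matrix):
    """Facility (proportion correct) and an upper-lower discrimination index per item."""
    # Rank students by total score, then take the top and bottom thirds.
    order = sorted(range(len(matrix)), key=lambda s: sum(matrix[s]))
    n_group = max(1, len(matrix) // 3)
    low, high = order[:n_group], order[-n_group:]
    stats = []
    for item in range(len(matrix[0])):
        facility = sum(row[item] for row in matrix) / len(matrix)
        # Discrimination: how much better the strongest students do than the weakest.
        disc = (sum(matrix[s][item] for s in high)
                - sum(matrix[s][item] for s in low)) / n_group
        stats.append({"item": item + 1,
                      "facility": round(facility, 2),
                      "discrimination": disc})
    return stats

item_stats = item_analysis(responses)
```

Items with very low facility or near-zero (or negative) discrimination are exactly the ones a results-review panel would flag for a possible error in the question or marking scheme.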
Figure 1. The Assessment Cycle
Quality Indicators of Assessment Cycle
The three internationally recognised quality indicators of assessment are elaborated as follows:
Quality Indicator 1: Validity
Validity is defined as how well a test measures what it is supposed to measure (Downing, 2013). In simple terms, validity evaluates the extent to which the examination, and the marking that follows, assess what was intended to be assessed. The degree of alignment takes into account both the depth and breadth of the expected learning outcomes and the domain, that is, knowledge, skill or attitude (Yousuf, 2018).
Example 1: If the purpose of the examination is to assess the application of Newton’s laws of motion, assigning marks to paragraph structure and spelling would compromise the validity of the scores.
Reason: Assessment would not be aligned with the learning outcome, i.e. the application of scientific knowledge versus language skills.
Example 2: If the purpose of the examination is to assess mathematical reasoning skills, and the test comprises simple arithmetic tasks, such an assessment would be considered invalid.
Reason: Assessment would not be aligned to the cognitive level of the learning outcome, mathematical reasoning versus basic arithmetic skills.
Example 3: If the purpose of the examination is to assess the skill of using a microscope effectively and appropriately, evaluating such an objective using a paper-and-pencil test would be invalid.
Reason: Assessment would not be aligned to the domain of the learning outcome, psychomotor skill versus knowledge.
Quality Indicator 2: Reliability
Reliability is the degree to which an assessment produces reproducible or consistent results (Phelan & Wren, 2006; Downing, 2014). In simpler terms, it is the precision of assessment scores and is the inverse of measurement error: the greater the error in the scores, the lower the reliability. Reliability is ensured through the good construction of examination questions, objective and structured marking schemes, and marker training and standardisation, among other measures (Yousuf, 2018).
Example 1: If the language used in the question is unclear, error is introduced that can compromise the reliability of scores.
Reason: Questions with unclear language can be misinterpreted by students, introducing error that compromises the reliability of scores.
Example 2: During marking, if the marker is not trained or the marking scheme is not explicitly spelt out, the reliability of scores will be compromised.
Reason: During scoring by an untrained marker or in the absence of a (clearly developed) marking scheme, the marker may award scores based on subjectivity/ individual biases, hence compromising the reliability of scores.
Quality Indicator 3: Fairness
Fairness is the consideration of a learner’s needs and characteristics, and any reasonable adjustments that need to be applied to take account of them (Smith, 2016). In simpler terms, it ensures that the examination provides all students an equal opportunity to perform according to their knowledge and skills.
Example: If the question carries a contextual clue that is not understood by a particular sub-group of the student population, then the test would not be fair.
Reason: Some students could not score on the test not because they were unprepared, but because they were unaware of the context used in the question; for example, using ‘fur coat’ or ‘shopping mall’ as contextual clues for students residing in warm or rural areas, respectively.
Bringing it all together: Interwoven Assessment and Quality Assurance Cycles
The following section will discuss how each of these quality indicators is embedded in each phase of the assessment cycle to ensure quality.
Phase I: Question and Exam Development
During question and examination paper development and review, the validity of assessment decisions is enhanced through:
- Aligning each question to an SLO of the syllabus derived from the defined curriculum, which ensures the content validity of the questions.
- Ensuring each question is within the depth (cognitive level) and the breadth (content and context) of the SLO.
- Verifying the appropriateness of the difficulty level and relevance of the questions for the level of students.
- Developing a comprehensive marking scheme, such as rubrics or checking hints, with each question that gives credit to all valid possible answers.
- Using an examination blueprint (exam specifications), which ensures the content validity of the examination paper.
During question development and review, reliability is ensured by reviewing the construction and language of questions. Moreover, the structure and objectivity of the marking schemes enhance the reliability of scores.
To enhance fairness during question development and multi-disciplinary review, it should be ensured that the language of the questions is accessible to all students and free of jargon. Further, the content of the questions should not be culturally or politically biased, and the context should be familiar to all learners.
Phase II: Marking
During seeding, the inclusion of all possible answers for which the student should gain credit enhances the validity of the scores. Furthermore, the content expertise, experience and training of the markers ensure their understanding and effective use of the marking schemes.
The reliability of scores during seeding and marking should be enhanced through the following strategies:
- Training of markers on best practices for scoring.
- Enhancing the clarity and standardisation of marking schemes during seeding, as required.
- Testing the objectivity of marking schemes during seeding by ensuring the two independent markers give similar scores to a selected representative sample of student scripts.
Fairness can be enhanced by ensuring student anonymity throughout the seeding and marking processes. This prevents bias related to region, gender, ethnicity, institution or any other personal identity from affecting the marking process.
Phase III: Results Review
Validity during post-exam analysis is ensured through a review of the results. The scores should be reviewed by a group of experts comprising subject specialists/teachers and assessment experts/coordinators, who discuss student performance on each question to identify any error in the question or marking scheme that has unfairly affected scores. The objective of this activity is to ensure that students are not penalised for any such errors or omissions.
During the results review, reliability should be measured using internationally recognised indicators such as Cronbach’s alpha and the standard error of measurement. Identifying and rectifying any errors or omissions in the content or language of the questions, the marking schemes or the scoring process will enhance the reliability of scores.
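Both indicators can be computed directly from an item-by-student score matrix. The sketch below implements Cronbach's alpha and the standard error of measurement using Python's standard library; the score matrix is a made-up illustration, not real cohort data.

```python
from statistics import pvariance, pstdev

# Hypothetical item scores: rows are five students, columns are four items.
# A real analysis would use the full cohort's marked scripts.
scores = [
    [2, 3, 1, 4],
    [1, 2, 1, 3],
    [3, 3, 2, 4],
    [0, 1, 0, 2],
    [2, 2, 1, 3],
]

def cronbach_alpha(matrix):
    """Cronbach's alpha: (k / (k-1)) * (1 - sum of item variances / total-score variance)."""
    k = len(matrix[0])
    item_vars = [pvariance([row[i] for row in matrix]) for i in range(k)]
    total_var = pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def standard_error_of_measurement(matrix):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in matrix]
    return pstdev(totals) * (1 - cronbach_alpha(matrix)) ** 0.5

alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(scores)
```

The SEM expresses reliability on the score scale itself: an observed score plus or minus roughly one SEM gives a band within which a student's true score is likely to lie, which is often easier for stakeholders to interpret than the alpha coefficient alone.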
Fairness can be enhanced through openness, information sharing and accountability. Providing stakeholders with relevant information regarding the examinations, their quality and student performance makes assessments transparent and fair. Moreover, such information should help stakeholders identify their strengths and areas for further improvement.
The ‘coupled’ cycles of assessment and quality assurance provide a road map for all kinds of educational bodies dealing with assessment to ensure that the decisions they make based on assessments are valid, reliable and fair. These processes should enable all those involved in assessments to achieve excellence (Zuberi et al., 2018). Information from the results review must reach students, teachers, markers, curriculum and examination developers, academic leadership and policymakers, facilitating quality improvement. Assessment processes and their quality assurance must ensure the validity, reliability and fairness of students’ scores.
About the Author:
Dr Naveed Yousuf is Associate Director, Assessment & Research and Assistant Professor, Educational Development at the Examination Board and Faculty of Health Sciences, Aga Khan University. His PhD thesis focused on psychometrics and standard setting in assessment. Dr Yousuf is responsible for the psychometric quality of national-level assessments and for psychometric research projects at AKU. He also serves as thesis supervisor and faculty member for the Master of Health Professions Education Programme. He is an Editorial Board Member for Education in Medicine Journal.
Ms Munira Mohammad is currently completing her MPhil in Education; she holds a Master of Arts in International Relations from the University of Karachi and an MBA in Human Resource Management from PAF-Karachi Institute of Economics and Technology. At AKU-EB, her core responsibilities include developing examination papers and developing, revising and reviewing examination syllabi for Pakistan Studies, Geography, Civics, History and Geography of Pakistan, Pakistan Culture, Education, Sociology, Ethics, Fine Arts and other humanities subjects. Based on her extensive experience, Munira also conducts and facilitates training for teachers and educators.
Rabia Nisar completed her MSc in Applied Mathematics from the University of Karachi and is currently completing an MS in Computational Mathematics at NED University of Engineering and Technology. At AKU-EB, her core responsibility as a member of the Assessment department is to ensure quality by implementing international best practices in assessment. She contributes to examination development and syllabus revision for Mathematics and the Sciences. Based on her extensive experience, she also conducts and facilitates teachers’ training.
Dr Shehzad Jeeva was appointed as the Director, Aga Khan University Examination Board in October 2014 and also holds a joint appointment as an Assistant Professor of the Faculty of Arts and Sciences, AKU. He has transformed the institution from a startup to a robust organisation through his enthusiastic leadership. He has also re-defined the institution’s vision and mission to make AKU-EB a model of excellence and innovation in education for Pakistan and for developing countries. Under his leadership, the Examination Board has been supporting the Government of Balochistan through a UNICEF/EU-funded project to develop capacity in assessment. He has also provided consultancy to the Asian Development Bank and designed a project for the Government of Sindh to improve secondary education in Sindh. Dr Jeeva has been appointed as a member of the Sindh Curriculum Council, a member of the Inter Board Chairmen Committee (IBCC), chair of Group-BCC and chair of several government sub-committees. He is also a founding member of the International Association for Educational Assessment’s Recognition Committee (UK), which develops international standards for examinations across the world. Dr Jeeva received several scholarships, including the Aga Khan Foundation Scholarship, to complete his PhD in Chemistry at the University of Cambridge, where he received the Toby Jackman Prize for the most outstanding PhD thesis in any subject.