
After attending two days of the Artificial Intelligence in Measurement and Education Conference (AIME-Con), my primary takeaway is cautious optimism, tempered by a commitment to responsible use of AI in education. A recurring message ran from the main stage through the paper sessions: pair our excitement about the potential of AI in assessment with rigorous, evidence-based validation, so that we are truly using AI in ways that help students learn and that measure that learning effectively.
As someone dedicated to learning innovation at WGU Labs, I left the conference with three critical lessons defining the future of AI-enhanced assessment.
The Power of Purpose: Asking Why Before How
The most refreshing aspect of AIME-Con was the consensus that we should ask why before reaching for AI, rather than using it everywhere we can. That framing immediately shifted the focus from technology to pedagogy and ethics.
- The Cost of AI: Liberty Munson, Director of Psychometrics at Microsoft Worldwide Learning, opened her day-one keynote with a stark reminder of the enormous energy costs of large language models. That pushed me to question the necessity of every AI application. I even deleted my AI notetaker app, recognizing that the act of taking notes actually enhances my own learning, a vital insight about shortcuts to pass on to students.
- Fairness is Personalization: My notes strongly reflected the idea that fairness is not about standardization but about personalization. This validates the work on just-in-time AI tutors (like those highlighted by Dr. Ken Koedinger), which provide formative feedback and adaptive practice, embedding assessment in the learning process rather than treating it as a separate, time-consuming event.
The New Data: Measuring Reasoning and Authentic Practice
I was excited to see research moving beyond simple multiple-choice scoring to capture complex student thinking and authentic practice through simulations.
- Conversation as Assessment: Various projects, including those at Khan Academy, are pioneering conversation-based assessment. The purpose is clear: use dialogue with the AI to capture reasoning, reveal student misconceptions, and enable real-time scaffolding. There were great discussions about the challenge many students face in explaining their thinking, as well as the importance of helping teachers learn to elicit those conversations. There is still much to learn about how to use AI effectively to capture those conversations, assess the learning they reveal, and give students meaningful feedback. For now, substantial human engagement is still required to gather, analyze, and interpret these conversations in ways that can drive future innovations in teaching and learning.
- Leaning into Simulations: Other institutions, such as the University of Maryland and researchers from the University of Virginia (utilizing Mursion), are also embracing authentic AI simulations to offer ongoing learning opportunities. These projects track specific "teacher moves" or workplace skills, producing a rich, measurable dataset that goes far beyond traditional question types. Using AI for real-world simulations has interested me ever since I first got my hands on ChatGPT (it's an area I've dabbled in; see my work on educator simulations here), so I really enjoyed learning what others are doing, and I look forward to contributing to this research in the near future.
- The Context Gap: Human Understanding vs. AI "Personalization": The challenge lies in moving from "personalized output" to contextualized understanding. My notes highlighted that conversational models, in their default settings, still have flaws: they tend to ask leading questions or produce output that is mistaken for genuine, personalized feedback. More critically, AI models have an inherent bias against saying "I don't know," which compromises their honesty and accuracy. We discussed how this AI-driven personalization is fundamentally limited because it lacks the human element of teaching and learning. As attendees noted, AI cannot know whether a student is hungry, tired, stressed, or collaborating while completing an assessment, context a teacher in a classroom can use to truly understand and meet a student's needs. This is the crucial gap we must address to achieve authentic personalization.
The Evidence Mandate: Validation is the True Rigor
This was the most critical takeaway: the need to slow down and ensure our efforts are meaningful and validated.
- Flaws at the Extremes: Researchers sought to automate the scoring of challenging professional skills but found that their initial AI models did not align with human assessments. Notably, the AI often refused to assign scores at the extremes (very high or very low).
- Pilot Studies are Essential: Most of the paper sessions I attended focused on relatively small pilot studies. This data is invaluable: it forces us to return to the human experts, validate AI scoring and rubric models, and compare them against human standards to maintain objectivity and accuracy. Researchers are patient people; everyone was happy to share their data even when the results did not support their hypotheses. We learn as much from failure as from success.
Taking Action: WGU Labs Pilots and the Future of Assessment
Where I aligned most strongly with the conference was on its vision for the future of assessment. I firmly believe that the ultimate goal of these AI developments can and should be to replace the end-of-course, point-in-time, standardized assessments that are the norm today. New, authentic assessments give us new data; we must use that data to transform measurement.
I am grateful that WGU Labs embraces this mindset. We are committed to pairing an enthusiasm for play with the expertise to question everything. That is why I am thrilled about the AI pilots we have scheduled with our Student Insights Council in the weeks ahead. We will experiment directly with authentic and adaptive AI assessment to ensure our models are effective, reliable, and fair before scaling them. I'm eager to see, and share, what we learn.


