Introductory Programming Assessment Must Accommodate Copilot-like Assistants

GitHub Copilot and other machine-learning-based programming assistants will fundamentally change how we assess competency in introductory programming courses. Computer science educators can no longer rely on formulaic assignments and an honor code to ensure original work that demonstrates programmatic ability. Copilot-like assistants have blurred the line between the programmer’s work and machine-generated code (largely modifications of pattern-matched work from their training sets). While they provide a productivity boost for professional programmers and advanced students, they may act as a crutch for introductory students, who can come to rely on these tools in lieu of developing a strong understanding of their own.

There are certain standard problems that we are accustomed to assigning because completing them demonstrates the ability to implement fundamental simple algorithms. A prior strategy to reduce plagiarism has been to provide scaffolding code or to put a spin on a problem that makes it unique. Unfortunately, Copilot-like assistants are almost as capable in these scenarios as they are at writing generic simple algorithms. In my own preliminary testing on a scaffolded assignment of my own creation for an introductory class, one I believe to be unique, Copilot was able to contextualize the comments and write most of the smaller functions accurately with just a little assistance from me.
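To make concrete what such scaffolding looks like, here is a minimal, hypothetical example (the function name and task are my invention, not from any real assignment): an instructor provides a comment and a signature, and a Copilot-like assistant will typically produce a body much like the one shown from that context alone.

```python
# Scaffolding an instructor might hand out: a descriptive comment plus an
# empty function signature. An AI assistant can usually complete the body
# from this context, which is exactly the assessment problem described above.

def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    # The kind of "smaller function" an assistant fills in readily:
    return sum(1 for ch in text.lower() if ch in "aeiou")

print(count_vowels("Introductory Programming"))  # counts 7 vowels
```

The point is not this particular function but how little unique context the assistant needs: a one-line comment is enough to defeat scaffolding as a plagiarism deterrent.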

What can we do about this? How can we assess competency in this environment? Over the past few decades, computer science education, like many other fields, has been moving away from exams and toward project-based learning. This has been a positive trend for a host of well-researched reasons spelled out in the literature. Unfortunately, I think this trend will need to be at least partially reversed for introductory courses. Students must demonstrate the ability to write and comprehend fundamental algorithmic code without the assistance of an AI. We could try banning these tools, but bans rarely work well. Instead, we can assess knowledge “live”: how much do you know in this moment, without anyone’s (or anything’s) assistance? That is exactly what an exam evaluates.

Of course, exams have well-documented downsides, including, but not limited to, the fact that a significant number of bright students who do fine with project-based learning do poorly on them. I am not suggesting we return to a world where exams make up the majority of the grade in a course. Instead, I am suggesting that exams and exam-like evaluation will need to be a greater percentage of the mix. We can be creative. An oral presentation can in some instances demonstrate knowledge as well as an exam. A project coded live in class on machines that do not have AI assistants enabled can serve as a pseudo-exam.

We do not need to return to the dark ages. But we must acknowledge that these new tools mean that how we evaluate introductory programming knowledge has to change. I have had graduates obtain their first jobs in the insurance, defense, and healthcare industries, building programs that have life-and-death consequences. They need a firm grasp of the fundamentals. What happens when Copilot makes a fundamental mistake and the person using it does not have the skill to realize it?

About Me

I teach Computer Science to college students, develop software, podcast, and write books about programming, including the Classic Computer Science Problems series. I'm the publisher of the hyperlocal newsletter BTV Daily.



©2012-2023 David Kopec. As an Amazon Associate I earn from qualifying purchases.
