Fri 22 Mar 2024 14:10 - 14:35 at Meeting Rooms B115-116 - LLMs, Debugging, and Detection Chair(s): John Edwards

The emergence of publicly accessible large language models (LLMs) such as ChatGPT poses unprecedented risks of new forms of plagiarism and cheating, in which students use LLMs to solve exercises for them. Detecting this behavior will be necessary in introductory computer science (CS1) courses, and educators should be well-equipped with detection tools when the need arises. However, ChatGPT generates code non-deterministically, so traditional similarity detectors might not suffice to detect AI-created code. In this work, we explore the affordances of machine learning (ML) models for the detection task. We used an openly available dataset of student programs for CS1 assignments, had ChatGPT generate code for the same assignments, and then evaluated the performance of both traditional machine learning models and Abstract Syntax Tree-based (AST-based) deep learning models in distinguishing ChatGPT-generated code from student code submissions. Our results suggest that both traditional machine learning models and AST-based deep learning models are effective in identifying ChatGPT-generated code, with accuracy above 90%. Since the deployment of such models requires ML knowledge and resources that are not always accessible to instructors, we also explore the patterns detected by deep learning models that indicate possible ChatGPT code signatures, which instructors could use to detect LLM-based cheating manually. We also explore whether explicitly asking ChatGPT to impersonate a novice programmer affects the code it produces. We further discuss potential applications of our proposed models for enhancing introductory computer science instruction.
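To make the "traditional machine learning" framing concrete, here is a minimal sketch of one way such a detector could work; it is not the paper's actual pipeline, and the stylistic features (comment density, average line length, average identifier length) and the nearest-centroid classifier are illustrative assumptions only:

```python
# Illustrative sketch (NOT the authors' models): extract a few hypothetical
# stylistic features from code and classify via nearest centroid.

def features(code: str):
    """Return (comment ratio, avg line length, avg identifier length)."""
    lines = [l for l in code.splitlines() if l.strip()]
    n = len(lines) or 1
    comment_ratio = sum(l.strip().startswith("#") for l in lines) / n
    avg_line_len = sum(len(l) for l in lines) / n
    tokens = code.replace("(", " ").replace(")", " ").split()
    idents = [t for t in tokens if t.isidentifier()]
    avg_ident_len = sum(map(len, idents)) / (len(idents) or 1)
    return (comment_ratio, avg_line_len, avg_ident_len)

def centroid(vectors):
    """Mean feature vector for one class of training examples."""
    return tuple(sum(v[i] for v in vectors) / len(vectors)
                 for i in range(len(vectors[0])))

def classify(code, student_centroid, llm_centroid):
    """Label code by whichever class centroid is closer in feature space."""
    f = features(code)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(f, c))
    return "llm" if dist(llm_centroid) < dist(student_centroid) else "student"
```

Intuition for the feature choice: LLM output often carries more comments and longer descriptive identifiers than typical novice code, so even crude surface features can separate toy examples; the paper's 90%+ accuracy comes from far richer models (including AST-based deep learning), not from a sketch like this.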

Fri 22 Mar

Displayed time zone: Pacific Time (US & Canada)

13:45 - 15:00
LLMs, Debugging, and Detection (Papers) at Meeting Rooms B115-116
Chair(s): John Edwards Utah State University
13:45
25m
Talk
Can Language Models Employ the Socratic Method? Experiments with Code Debugging
Papers
Erfan Al-Hossami UNC Charlotte, Razvan Bunescu UNC Charlotte, Justin Smith UNC Charlotte, Ryan Teehan New York University
DOI
14:10
25m
Talk
Detecting ChatGPT-Generated Code Submissions in a CS1 Course Using Machine Learning Models
Papers
Muntasir Hoq North Carolina State University, Yang Shi North Carolina State University, Juho Leinonen Aalto University, Damilola Babalola North Carolina State University, Collin Lynch North Carolina State University, Thomas Price North Carolina State University, Bita Akram North Carolina State University
DOI
14:35
25m
Talk
Towards Comprehensive Metrics for Programming Cheat Detection
Papers
Frank Vahid UC Riverside / zyBooks, Ashley Pang UC Riverside, Benjamin Denzler University of California, Riverside
DOI