Humans Need Training Data Too: Teaching with GPT-4

04 Apr, 2023

Machine Learning and Human Learning

Machine learning is all about feeding tons of data (the training dataset) into models until they can perform some task. Then, we evaluate how well the model performs its task. For a machine vision system, this might mean scoring the model based on how often it correctly identifies traffic lights in some test dataset.

To date, human learning is all about giving a relatively small corpus of data (the textbook) to humans until they can perform some task. Then, we evaluate how well the human performs its task. For a computer science course, this might mean scoring the human based on how often it correctly answers multiple-choice questions about code on an exam.

Training Data for Humans

When I was in school, I found that the best way to score highly on exams was to work through as many realistic practice problems as possible, to the point that when the test came, my responses were mostly automatic. In short, I found that the way to really beat the system was to emulate the machine: get a huge training set! My success in finding exercises depended on the course, the professor, and how many practice questions I could find online.

As a teacher, I want to help my students study for exams in the same way that I did, with copious amounts of practice problems. [1] Finding times tables worksheets for a third grader isn’t so bad, but as topics become more advanced, practice problems become more and more scarce.

Now, with GPT-4, teachers no longer have to hoard precious practice problems handed down by Pearson and the College Board. We can generate (and proofread) relevant problems at scale. Our job now becomes tailoring these problems to the students we have, based on where we see gaps in their knowledge. With an abundant set of machine-generated questions and exercises in hand, teachers can focus their energy on what they’re uniquely good at: keeping students motivated and building their enthusiasm for the topic.

Sample problems: good luck!

(These are for the AP Computer Science A exam; the code is Java.)

Given the following code segment, what will be the value of the variable 'sum'?

int sum = 0;

for (int i = 1; i <= 20; i++) {
    if (i % 3 == 0 || i % 5 == 0) {
        sum += i;
    }
}

A) 98

B) 102

C) 107

D) 117

E) 123

Given the following code segment, what will be the contents of the ArrayList numbers?

ArrayList<Integer> numbers = new ArrayList<>();

numbers.add(1);
numbers.add(2);
numbers.add(3);
numbers.add(4);
numbers.add(5);

numbers.set(1, 6);
numbers.add(2, 7);
numbers.remove(4);

A) [1, 6, 7, 3, 4, 5]

B) [1, 6, 7, 3, 4]

C) [1, 6, 7, 2, 3, 4]

D) [1, 6, 7, 3, 5]

E) [1, 6, 2, 7, 3, 4]
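One way to proofread machine-generated problems like these is simply to run them. The sketch below wraps each code segment in a method so its result can be compared against the answer choices. The class name `ProblemCheck` and the method names `sumProblem` and `listProblem` are made up for illustration, and the list is assumed to be declared as `ArrayList<Integer>`. Try the problems by hand before running it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking the two sample problems by running them.
public class ProblemCheck {

    // Problem 1: the code segment, wrapped in a method that returns 'sum'.
    static int sumProblem() {
        int sum = 0;
        for (int i = 1; i <= 20; i++) {
            if (i % 3 == 0 || i % 5 == 0) {
                sum += i;
            }
        }
        return sum;
    }

    // Problem 2: the code segment, wrapped in a method that returns the list.
    static List<Integer> listProblem() {
        ArrayList<Integer> numbers = new ArrayList<>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        numbers.add(4);
        numbers.add(5);

        numbers.set(1, 6);  // replaces the element at index 1
        numbers.add(2, 7);  // inserts 7 at index 2, shifting later elements right
        numbers.remove(4);  // int argument: removes the element at index 4, not the value 4
        return numbers;
    }

    public static void main(String[] args) {
        // Compare the printed results against the answer choices.
        System.out.println("sum = " + sumProblem());
        System.out.println("numbers = " + listProblem());
    }
}
```

If the printed result matches none of the answer choices, or more than one, the generated problem needs fixing before it goes to students.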

Conclusion

It’s impossible to predict how GPT-4 (and GPT-5 and GPT-6) will change teaching and learning, but using GPT-4 to make exercises is an easy win in the short term.

-----------

[1] What I actually want is for kids to have fun coding, but I’m not in charge of this whole system!