grading_programs_machine_learning

Idea: extract features from a data dependency graph in which each node is annotated with the control structure the dependency occurs in (inside a loop, inside a nested loop, from a parent scope, etc.).
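A minimal sketch of this idea for Python source, assuming a much cruder graph than the paper's: an edge u -> v is recorded whenever variable u is read on the right-hand side of an assignment to v, and each edge is labeled with the stack of enclosing control structures (`loop`, `if`) at that point. The names `DependencyFeatureExtractor` and `extract_features` and the label format are hypothetical, not the authors' feature set.

```python
import ast
import builtins
from collections import Counter

# Builtins are excluded so calls like len() or ord() are not treated as data.
# (Side effect: a user variable that shadows a builtin name is also ignored.)
BUILTIN_NAMES = set(dir(builtins))


class DependencyFeatureExtractor(ast.NodeVisitor):
    """Record assignment-level data dependencies, labeled by enclosing control structure."""

    def __init__(self):
        self.context = []  # stack of enclosing control structures, outermost first
        self.edges = []    # (source_var, target_var, context_label)

    def _label(self):
        return "/".join(self.context) or "top"

    def _record(self, target, value, extra_sources=()):
        # Every non-builtin variable read in `value` becomes an edge into each assigned name.
        sources = {n.id for n in ast.walk(value)
                   if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
                   and n.id not in BUILTIN_NAMES}
        sources.update(extra_sources)
        for t in ast.walk(target):
            if isinstance(t, ast.Name) and isinstance(t.ctx, ast.Store):
                for s in sorted(sources):
                    self.edges.append((s, t.id, self._label()))

    def _visit_block(self, node, kind):
        self.context.append(kind)
        self.generic_visit(node)
        self.context.pop()

    def visit_For(self, node):
        self._record(node.target, node.iter)  # loop variable depends on the iterable
        self._visit_block(node, "loop")

    def visit_While(self, node):
        self._visit_block(node, "loop")

    def visit_If(self, node):
        self._visit_block(node, "if")

    def visit_Assign(self, node):
        for target in node.targets:
            self._record(target, node.value)
        self.generic_visit(node)

    def visit_AugAssign(self, node):
        # `x += e` reads x as well as e
        extra = {node.target.id} if isinstance(node.target, ast.Name) else set()
        self._record(node.target, node.value, extra)
        self.generic_visit(node)


def extract_features(source):
    """Return a bag of context labels, one count per dependency edge."""
    extractor = DependencyFeatureExtractor()
    extractor.visit(ast.parse(source))
    return Counter(label for _, _, label in extractor.edges)


if __name__ == "__main__":
    SAMPLE = """
def encrypt(s):
    out = ''
    for i in range(len(s)):
        c = ord(s[i]) + i
        out += chr(c)
    return out
"""
    print(extract_features(SAMPLE))  # Counter({'loop': 4, 'top': 1})
```

The resulting bag of labels could serve as a feature vector for a learned grader; nested structures simply produce longer labels such as `loop/loop/if`.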

Two expert graders each graded about 90 instances of each of two problems (“encrypt” a string by adding a position-dependent number to each character; sort a collection and return every other element in sorted order) on a 5-point scale: 5 = correct, passes tests, and uses “correct” abstractions and data structures; 3 = significant errors in data structures and/or control flow; 1 = gibberish. When the experts disagreed, they discussed the instance until they reached agreement.
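For concreteness, reference solutions to the two problems under one plausible reading of each prompt. The description does not say what the position-dependent number is or which elements count as "every other"; this sketch assumes the shift is the 0-based character index and that the first, third, fifth (and so on) sorted elements are returned. The function names are hypothetical.

```python
def encrypt(text: str) -> str:
    """Shift each character's code point by its 0-based position (assumed shift rule)."""
    return "".join(chr(ord(ch) + i) for i, ch in enumerate(text))


def every_other_sorted(items):
    """Return the 1st, 3rd, 5th, ... elements of the sorted input (assumed to start at index 0)."""
    return sorted(items)[::2]


if __name__ == "__main__":
    print(encrypt("abc"))                       # ace
    print(every_other_sorted([5, 1, 4, 2, 3]))  # [1, 3, 5]
```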

SVM and ridge regression models were trained on 2/3 of the data and evaluated on the remaining 1/3. (Coefficients, penalties, etc. were chosen empirically to minimize RMS error under 3-fold cross-validation.) The best results were selected for presentation.
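A dependency-free sketch of this protocol for the ridge-regression half, assuming a random 2/3 : 1/3 split and a small grid of candidate penalties, with the penalty picked by mean RMS error over 3 folds of the training portion. It solves the closed form (X^T X + lambda I) w = X^T y directly and has no separate intercept (callers add a constant column, which is then also penalized). The SVM half would follow the same split-and-select pattern with a different model; all function names are hypothetical.

```python
import math
import random


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination; lam > 0 keeps it nonsingular."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for col in range(d):  # forward elimination with partial pivoting
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def rms(pred, y):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def choose_lambda(X, y, lambdas, k=3):
    """Pick the penalty with the lowest mean RMS error over k interleaved folds."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    best = None
    for lam in lambdas:
        errs = []
        for fold in folds:
            held = set(fold)
            tr = [i for i in range(len(X)) if i not in held]
            w = ridge_fit([X[i] for i in tr], [y[i] for i in tr], lam)
            errs.append(rms(predict(w, [X[i] for i in fold]), [y[i] for i in fold]))
        score = sum(errs) / k
        if best is None or score < best[0]:
            best = (score, lam)
    return best[1]


def train_and_evaluate(X, y, lambdas, seed=0):
    """Train on 2/3 (penalty chosen by CV on that part), report RMS on the held-out 1/3."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = 2 * len(idx) // 3
    tr, te = idx[:cut], idx[cut:]
    Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
    lam = choose_lambda(Xtr, ytr, lambdas)
    w = ridge_fit(Xtr, ytr, lam)
    return lam, rms(predict(w, [X[i] for i in te]), [y[i] for i in te])
```

Note that reporting only the best configuration on the held-out third, as the paper did, still risks an optimistic estimate if many model families are compared on that same third.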

In the final confusion matrix (predicted scores on ~30 examples), about half of the predictions agreed with the raters, and most of the rest were off by one category.
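Both numbers (exact agreement and off-by-one) can be read straight off a confusion matrix. A small sketch of that computation; the function name is hypothetical and the scores in the usage lines are made up, not the paper's data.

```python
from collections import Counter


def agreement_summary(human, predicted, levels=range(1, 6)):
    """Confusion matrix (rows = human score, columns = predicted) plus agreement rates."""
    counts = Counter(zip(human, predicted))
    matrix = [[counts[(h, p)] for p in levels] for h in levels]
    n = len(human)
    exact = sum(h == p for h, p in zip(human, predicted)) / n
    within_one = sum(abs(h - p) <= 1 for h, p in zip(human, predicted)) / n
    return matrix, exact, within_one


if __name__ == "__main__":
    human = [5, 5, 3, 1, 4, 2]       # made-up example scores
    predicted = [5, 4, 3, 3, 4, 2]
    matrix, exact, within_one = agreement_summary(human, predicted)
    print(f"exact={exact:.2f} within_one={within_one:.2f}")  # exact=0.67 within_one=0.83
```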

They also tried one-class modeling (as opposed to supervised learning) and got correlations between 0.5 and 0.7.
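The note does not say which correlation was computed or against what. Assuming it is Pearson's r between model scores and human scores, the statistic is the following; the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sx * sy)
```

Unlike the confusion-matrix view, r rewards getting the ordering of programs right even when the absolute scores are shifted.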

Limitations:

  • The features seem arbitrarily constructed and suffer from a powerset problem (loop in a loop? parent scope from an outer loop vs. parent scope from the enclosing function? etc.)
  • I have my doubts about the test set: only 6.6% of CS *seniors* got the questions 100% correct, yet the questions are pretty simple.
  • They are vague about how much correctness contributed to their grader's accuracy. Including correctness simplifies the problem greatly.
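The powerset concern can be made concrete. If each dependency is keyed by the full stack of enclosing structures, the number of possible keys grows exponentially with nesting depth: with k kinds of context and depth at most d there are k^0 + k^1 + ... + k^d keys. The context kinds used below are hypothetical examples taken from the bullet, and the function name is illustrative.

```python
from itertools import product


def context_labels(kinds, max_depth):
    """All possible enclosing-structure stacks up to max_depth, plus 'top' for no context."""
    labels = ["top"]
    for depth in range(1, max_depth + 1):
        labels.extend("/".join(stack) for stack in product(kinds, repeat=depth))
    return labels


if __name__ == "__main__":
    print(len(context_labels(("loop", "if"), 3)))                  # 15
    print(len(context_labels(("loop", "if", "parent_scope"), 3)))  # 40
```

With ~90 graded programs per problem, most of these feature keys will be rare or never observed, which is one reason hand-picked groupings of them can look arbitrary.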
grading_programs_machine_learning.txt · Last modified: 2018/02/28 17:02 (external edit)