Autograding - beyond multiple choice

What kinds of autograding are possible beyond the usual short-answer/multiple-choice questions? Here are some existing examples:

Computer graphics: comparing students' images to reference solution (Prof. Ravi Ramamoorthi, CS 184x)

One aspect of grading computer graphics assignments is comparing images produced by the student's program against a reference image. Since almost no two graphics systems or programs produce exactly the same results, a simple pixel-by-pixel comparison gives too many false negatives. Therefore, we do some further processing involving thresholding to count the erroneous or “hot” pixels and check whether that count falls below a threshold. We also watermark all images to prevent simple copying of solutions between students. In our local classes, we go one step further: students' source code is compiled and run in simulation against the reference solution. Using code prevents most forms of cheating and provides a more consistent comparison on similar hardware. We have not yet implemented code grading in the online class, but we do currently ask students to submit source code, and we are beginning to batch-grade it to determine the feasibility of code-based comparisons in future iterations of the course.
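The hot-pixel idea can be sketched roughly as follows. This is only an illustration, not the CS 184x grader: it assumes images are equal-sized lists of rows of (R, G, B) tuples, that a pixel is “hot” when any channel differs from the reference by more than a tolerance, and that the function names and both threshold values are hypothetical.

```python
def count_hot_pixels(student, reference, pixel_tolerance=16):
    """Count pixels whose largest per-channel difference exceeds pixel_tolerance.

    Both images are lists of rows; each pixel is an (R, G, B) tuple of 0-255 ints.
    The tolerance of 16 is an illustrative value, not one taken from the course.
    """
    if len(student) != len(reference):
        raise ValueError("images differ in height")
    hot = 0
    for row_s, row_r in zip(student, reference):
        if len(row_s) != len(row_r):
            raise ValueError("images differ in width")
        for px_s, px_r in zip(row_s, row_r):
            if max(abs(a - b) for a, b in zip(px_s, px_r)) > pixel_tolerance:
                hot += 1
    return hot


def images_match(student, reference, pixel_tolerance=16, max_hot_fraction=0.01):
    """Accept the student image if at most max_hot_fraction of its pixels are hot."""
    total = sum(len(row) for row in reference)
    hot = count_hot_pixels(student, reference, pixel_tolerance)
    return hot <= max_hot_fraction * total
```

Small rendering differences (a few intensity levels per channel) then pass, while an image with many badly wrong pixels fails.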

Embedded systems: automatically generating exercises and solutions (Prof. Sanjit Seshia, EE 149x)

Prof. Seshia and colleagues developed a template-based approach to classifying problems in a recent textbook by Lee and Seshia, and outlined approaches to problem and solution generation based on mutation and satisfiability solving. Learn more about it here.

Software engineering: using unit testing, mutation testing, and remote integration testing to test complete programming projects (Prof Armando Fox, CS 169.1x)

We use a mixture of well-known testing techniques and repurpose existing professional-grade testing tools to evaluate student homework. Projects range from writing a few tens of lines of code to deploying a complete application in the cloud, and may involve either writing new code or enhancing existing code. Students get more detailed feedback than they have historically gotten from TAs. Some articles about the course and autograding are here.

Stop reading now if you are not a programmer

There are two categories of machine-gradable exercises/questions that can be incorporated into an EdX MOOC.

One category is questions that rely on the autograding already “built in” to the edX platform, which supports various types of short-answer questions, questions involving algebraic formula responses, and even open-ended grading of short essays. The EdX 101 self-service training course explains how to use these.

The other category is questions in which students submit a piece of work that is graded by an external autograder (or “external grader” in edX parlance). This document explains how to create such an autograder. You will need familiarity with edX Studio and with raw XML editing, advanced programming skills including HTTP/REST network programming, and somewhere to run one or more instances of the autograder, such as Amazon EC2 or a Berkeley server that is publicly reachable over HTTPS.

The edX documentation sometimes refers to externally-graded problems as “code response problems”. Look at the “Using an External Grader” explanation on the edX Studio docs page for an overview. Some details and code samples are below.

Overall strategy

Here is the developer-facing documentation for creating external graders.

Your autograder pulls a student submission from a queue (the submission includes an opaque student identifier), grades it, and posts back a result to the queue. The submission also includes a grader_payload element whose content is passed to the autograder; it can be used, for example, to pass the autograder information about which assignment this is.

1. Identify which queue you'll use, or have one created

Ask edX to create one or more named “queues” from which your autograder will pull. When you configure the “submit your work here” page in edX Studio, you will specify the name of the queue to which the assignment will go. Our recommendation is to MINIMIZE the number of queues you need: if you have a single autograder “engine” and use the info in grader_payload to determine what grading script to run, you can have just a single queue per course. (This is what we do in CS 169, the first Berkeley course to use this API.)
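A single-queue setup like this might dispatch on grader_payload roughly as sketched below. The two grading functions and the second payload string are hypothetical placeholders; only the payload value assign-0-part-1 appears elsewhere in this document.

```python
def grade_assign_0_part_1(submission: str) -> int:
    """Hypothetical grader: full marks for any non-empty submission."""
    return 100 if submission.strip() else 0


def grade_assign_0_part_2(submission: str) -> int:
    """Hypothetical grader: one point per non-blank line, capped at 100."""
    lines = [ln for ln in submission.splitlines() if ln.strip()]
    return min(len(lines), 100)


# One registry for the whole course: the queue stays the same and the
# grader_payload string selects which grading script runs.
GRADERS = {
    "assign-0-part-1": grade_assign_0_part_1,
    "assign-0-part-2": grade_assign_0_part_2,
}


def dispatch(grader_payload: str, submission: str) -> int:
    """Look up the grading function named by grader_payload and run it."""
    key = grader_payload.strip()
    try:
        grader = GRADERS[key]
    except KeyError:
        raise ValueError(f"no grading script registered for payload {key!r}") from None
    return grader(submission)
```

Adding a new assignment then means registering one more entry rather than asking for another queue.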

Here is the list of existing queues for BerkeleyX classes.

2. Code the autograder engine

The autograder engine should make the following assumptions:

  • it will be handed a blob of content representing the student's submission
  • it will be handed the textual content of the <grader_payload> field, if any
  • it must eventually generate and post back a JSON object as follows:
{ "correct": "True",  "score": "85",  "feedback":  "Good work!"  }

'correct' - should be the string “True” or “False”; since that makes little sense for nontrivial assignments, we set it to “True” unless the score is identically zero. A true value means the student will see a green checkmark, false means they'll see a red X.

'score' - number of points achieved, as an integer converted to a string, e.g. “85”. It should be consistent with the points displayed in Studio for that assignment.

'feedback' - a string of free text giving the student feedback, such as what they got wrong. The length limit is unclear, but be sure the string is properly JSON-escaped.
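One way to produce this object is to let a JSON library handle the escaping. The sketch below follows the conventions described above (string values, and “True” unless the score is zero); the function name build_result is hypothetical.

```python
import json


def build_result(score: int, feedback: str) -> str:
    """Serialize a grading result in the shape described above.

    'correct' is "True" unless the score is exactly zero, 'score' is the
    integer converted to a string, and json.dumps escapes the feedback text.
    """
    result = {
        "correct": "True" if score != 0 else "False",
        "score": str(score),
        "feedback": feedback,
    }
    return json.dumps(result)
```

Feedback containing quotes or newlines, which is common in test-runner output, is escaped automatically this way.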

3. Create submission page in EdX Studio

Create a Studio problem page that includes the following markup (do this by creating a new Problem, then clicking Edit to bring up the raw editor):

    HTML markup with instructions for students.  You can use most regular HTML tags.
    <coderesponse queuename="BerkeleyX-cs169x-p1">     <!-- queue name should match the name of queue you asked edX to create in step 1 above -->
      <filesubmission points="100"/>                   <!-- points should be consistent with 'score' property of JSON postback in step 2 above -->
      <grader_payload>assign-0-part-1</grader_payload> <!-- arbitrary string passed to autograder; could be a JSON object; we recommend
                                                            using it to identify the assignment and/or problem being submitted -->
    </coderesponse>

Autograder protocol: establishing a secure session to the queue server

You will need to get values for the following parameters, which are referred to in the rest of this description. Ask Armando or another edX liaison for them:

  • qauthuser, qauthpass: username and password for Django authentication
  • quser, qpass: username and password for HTTPS BasicAuth (for historical reasons, you need both username/password sets)
  • quri: URI of the queue server
  • qname: the name of the queue edX configured for you, from above

Set up a POST request to quri/xqueue/login/ with the Authorization: header encoding quser and qpass. Your language's HTTP library probably has a utility function to do this for you; if not, do it manually by setting the value of the Authorization: header to the word Basic, followed by a space, followed by the Base64 encoding of quser:qpass.
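If you do need to build the header by hand, a minimal sketch looks like this (the function name is hypothetical; the encoding follows the description above):

```python
import base64


def basic_auth_header(quser: str, qpass: str) -> str:
    """Return the Authorization header value: 'Basic ' + Base64 of 'quser:qpass'."""
    token = base64.b64encode(f"{quser}:{qpass}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

The result is used as the value of the Authorization: header on the login request.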

The body of the POST request should consist of the following HTTP form data (Note that these are DIFFERENT from the username/password in the previous step!)

username:  qauthuser
password:  qauthpass

On successful response, be sure to capture the value of the first cookie in the Set-cookie: response header (that is, everything up to but not including the first semicolon); you'll need it later.
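The login request and the cookie handling can be sketched with Python's standard library as follows. The function names are hypothetical, the request is only constructed (sending it, for example with urllib.request.urlopen, is left to the caller), and auth_header is assumed to hold the Basic value for quser and qpass described above.

```python
import urllib.parse
import urllib.request


def build_login_request(quri, auth_header, qauthuser, qauthpass):
    """Build (but do not send) the POST to quri/xqueue/login/.

    auth_header carries the BasicAuth credentials (quser/qpass); the form
    body carries the separate Django credentials (qauthuser/qauthpass).
    """
    body = urllib.parse.urlencode(
        {"username": qauthuser, "password": qauthpass}
    ).encode("ascii")
    return urllib.request.Request(
        quri.rstrip("/") + "/xqueue/login/",
        data=body,
        headers={"Authorization": auth_header},
        method="POST",
    )


def session_cookie(set_cookie_value):
    """Keep everything up to, but not including, the first semicolon."""
    return set_cookie_value.split(";", 1)[0].strip()
```

After sending the request, pass the Set-cookie: header value from the response to session_cookie and keep the result for later requests.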

You now have a session and can retrieve one or more submissions.

Autograder protocol: retrieving and grading a submission

Do an HTTPS GET to quri/xqueue/get_submission/ with the following properties:

  • Headers must include Cookie: with the value of the session cookie you received when you established the session (last step of “Autograder protocol: establishing a secure session to the queue server” above)
  • Content must include an encoded form with field queue_name: qname
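A sketch of constructing this request is shown below. It assumes the encoded form field is sent as a URL query parameter, which is the usual way to attach form fields to a GET; if the queue server expects the field elsewhere, adjust accordingly. The function name is hypothetical and the request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_get_submission_request(quri, qname, cookie):
    """Build (but do not send) the GET to quri/xqueue/get_submission/.

    Assumption: queue_name travels in the query string. The session cookie
    captured at login is sent back in the Cookie: header.
    """
    query = urllib.parse.urlencode({"queue_name": qname})
    url = quri.rstrip("/") + "/xqueue/get_submission/?" + query
    return urllib.request.Request(url, headers={"Cookie": cookie}, method="GET")
```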

If the response status code is 3xx, 4xx, or 5xx, it probably means this submission was already served from a queue with a different name. That is bad, because it means you have two autograders trying to pull from different queues when only one of those queues is supposed to be associated with this assignment.

Otherwise, check for successful response (HTTP status 2xx), and if so, grab the response body. The response body is a JSON object from which you need to grab the following fields:

  'content':   {
      'xpackage':  {
           'xqueue_header':  "secret key you should grab for postback",
           'xqueue_files':  {    // list of filenames submitted by student, and where to download each:
               'foobar1.c': '',
               'foobar2.c': ''
           },
           'xqueue_body': {
               'grader_payload': "contents of the <grader_payload> XML element on Studio submission page",
               'student_info': {
                  'anonymous_student_id':  "obfuscated identifier for this student",
                  'submission_time':  "2013-09-17T06:44Z"   // UTC datetime when student submitted the work
               }
           }
      }
  }

Retrieve each of the xqueue_files by doing an HTTPS GET to each file's URI. Normally, we ask the student to submit just one file per assignment, so we expect to find only a single key-value pair in the xqueue_files object.
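Extracting these fields might look like the sketch below. It assumes the response body is valid JSON with the nesting shown above, and that the nested values are JSON objects rather than JSON-encoded strings; if your queue server returns strings at any level, add a json.loads call there. The function name and the keys of the returned dictionary are hypothetical.

```python
import json


def parse_submission(response_body: str) -> dict:
    """Pull out the fields needed for grading and postback.

    Assumes the nesting content -> xpackage -> (xqueue_header,
    xqueue_files, xqueue_body) described in the text.
    """
    package = json.loads(response_body)["content"]["xpackage"]
    body = package["xqueue_body"]
    student = body["student_info"]
    return {
        "header": package["xqueue_header"],              # needed for postback
        "files": dict(package.get("xqueue_files", {})),  # filename -> download URI
        "grader_payload": body.get("grader_payload", ""),
        "student_id": student["anonymous_student_id"],
        "submitted_at": student["submission_time"],
    }
```

Each URI in the files mapping can then be fetched with an ordinary HTTPS GET.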

Autograder protocol: posting results to queue

Call your autograder engine to do the grading. As explained above, it will need to supply values for correct, score, and feedback.

End-to-end testing

Best practices for external autograders

  • They should be stateless, or if they rely on persistent state, it should be stored reliably external to the autograder process. This is so that multiple autograder instances can be spun up to handle a spike in homework submissions.
  • The implementation language is up to you. The API used to retrieve submitted assignments and post back grades is a RESTful HTTP API.
  • For now, you are responsible for running these yourself, either on Berkeley's servers or on a public cloud. In the future edX will be able to provision them. Either way, we strongly suggest you package your autograder as a complete Amazon virtual machine image (AMI), rather than assuming any specific OS environment at deployment time. You can prepare and debug machine images using VirtualBox and then convert the resulting image to an AMI, or you can develop on Amazon directly.
  • Be very careful about sandboxing. If you're running student-submitted (i.e. untrusted) code, do so in a tightly controlled interpreted environment if at all possible.
  • For both security and robustness, we recommend the autograder fork a new process for each assignment to be graded, and protect the forked process with a watchdog timer or similar mechanism. The overall autograder “outer loop” (that forks each child) should itself be protected with a watchdog timer to make sure it's making progress. Hung autograders will cause you pain and bad press.
  • If you don't practice test-driven development, now would be a really great time to start. Having an autograder fail under duress is a very unpleasant experience.
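The per-submission child process with a watchdog can be sketched using Python's subprocess module, whose timeout argument kills the child when the limit expires. This is a minimal illustration with hypothetical function names; it provides a time limit only and is not a sandbox, so untrusted code still needs separate isolation.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_seconds):
    """Run one grading job in a child process and kill it if it hangs.

    subprocess.run kills the child and raises TimeoutExpired when the
    timeout elapses; that is reported here instead of propagated.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {"timed_out": False, "returncode": done.returncode, "stdout": done.stdout}


def run_python_script(path, timeout_seconds):
    """Run a (hypothetical) grading script with the current interpreter."""
    return run_with_watchdog([sys.executable, path], timeout_seconds)
```

A timed-out job can then be turned into a zero score with feedback explaining that the submission ran too long, and the outer loop moves on to the next submission.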

Some example cloud-based autograder code you can look at:

autograding.txt · Last modified: 2018/02/28 17:02 (external edit)