So you want to do research on MOOC data

The rest of this page focuses on pragmatics of doing MOOC-related research.

Generally speaking, you must:

  • Be a Berkeley Principal Investigator, or a Berkeley student working with a Berkeley PI, or a Visiting Scholar or Visiting Industrial Fellow/Researcher working with a Berkeley PI
  • Have completed Human Subjects training. Berkeley-affiliated researchers can do this online: follow the instructions for "Online Training" on this page.
  • (Coming Soon) Once you have completed the training, the Social Sciences Data Lab (D-Lab), which curates the data, can provide access and instructions on how to work with the data formats.

What data is available and how is it formatted?

Importing edX data is a bit tricky; we have developed some scripts to help (these will become obsolete when D-Lab curation handover is complete).

In general, an edX MOOC produces three different data sources:

  • MySQL databases store info about students, their grades, and earned certificates.
  • The clickstream log (or event log) records an event for every server-side or client-side (typically JavaScript) action. Each event is a JSON object. There is a folder for each server and, under it, a file for each calendar day, so collecting all events for a given day requires reading that day's file in every server folder. The number of events is very large, so most applications pre-filter the log with a script or a simple tool such as grep.
  • MongoDB databases store the edX forum data (discussion boards): the full text of posts, replies, and so on. Importing them into Mongo is straightforward using mongoimport.
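
As a rough illustration of the per-server, per-day layout of the clickstream log, the sketch below gathers one calendar day's events from every server folder and applies a grep-style substring pre-filter before parsing. The file naming (`<log_root>/<server>/<day>.log`), the one-JSON-object-per-line format, and the function name `events_for_day` are assumptions made for this example, not a description of the actual data dump; adjust them to match the files you receive.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def events_for_day(log_root: str, day: str, needle: Optional[str] = None) -> Iterator[dict]:
    """Yield every event logged on `day` across all server folders.

    Assumed (hypothetical) layout: <log_root>/<server>/<day>.log, with one
    JSON object per line. Change the file name pattern to fit your data.
    """
    for server_dir in sorted(Path(log_root).iterdir()):
        if not server_dir.is_dir():
            continue
        log_file = server_dir / f"{day}.log"
        if not log_file.exists():
            continue  # this server has no file for that day
        with log_file.open(encoding="utf-8") as handle:
            for line in handle:
                # Cheap grep-style pre-filter: skip lines before parsing JSON.
                if needle is not None and needle not in line:
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank, truncated, or malformed lines
```

For example, `events_for_day("logs", "2014-03-01", needle="play_video")` would lazily yield only the events whose raw line mentions `play_video`, where both the folder name and the search string are placeholders. Because the function is a generator, a full day of events never has to fit in memory at once.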

To learn more about the data:

There is also a variety of user-contributed tools, for data management and other purposes, on the edX Tools wiki.

Using MOOCs to test interventions and new pedagogy

This section is under construction, but you will almost certainly need to file an IRB protocol. Watch this space for details and example protocols.

MOOC Research Venues and Sources

Below are possible sources of MOOC research and publishing venues, as well as other sources of potentially useful information.


Conference | Submit Date | Notify Date | Conference Date
SIGCSE (ACM Special Interest Group on Computer Science Education) | Paper: Sept 6, 2013; Poster: Oct 28, 2013 | | March 5-8, 2014
Learning at Scale | Paper: Oct 6, 2014; Work-in-progress: Dec 15, 2014 | Paper: Dec 8, 2014; Work-in-progress: Jan 27, 2015 | March 15-16, 2015
LAK (Learning Analytics & Knowledge) | Paper: Oct 14, 2014; Poster: ?? | Paper: Dec 9, 2014; Poster: Dec 9, 2014 | March 16-20, 2015
SIGCHI (ACM Special Interest Group on Computer Human Interaction); SIGCHI education; Autodesk Paper Forager for CHI and UIST | Paper: Sept 22, 2014; Work-in-progress: Jan 5, 2015 | Paper: Dec 15, 2014; Work-in-progress: Jan 26, 2015 | April 18-23, 2015
Intelligent Tutoring Systems | Paper: Jan 26, 2014 (abstract: Dec 15) | Paper: ?? | June 5-9, 2014
CSCL (International Conference on Computer-Supported Collaborative Learning) | Paper: Nov 10, 2014; Poster: Nov 10, 2014 | Paper: ??; Poster: ?? | June 7-11, 2015
ITiCSE (Innovation and Technology in Computer Science Education) | Paper: Jan 23, 2014; Poster: March 16, 2014 | | June 23-25, 2014
EDM (Educational Data Mining) | Paper: Feb 19, 2015; Poster: April 9, 2015 | Paper: April 2, 2015; Poster: April 19, 2015 | June 26-29, 2015
AIED (Artificial Intelligence in Education) (biennial) | Paper: Jan 28, 2013; Poster: Jan 28, 2013 | Paper: March 17, 2013; Poster: ?? | July 9-13, 2013
ICER (International Computing Education Research) | Paper: April 20, 2015; Work-in-progress: June 15, 2015 | Paper: June 1, 2015; Work-in-progress: ?? | Aug 10-12, 2015
KDD (Knowledge Discovery and Data Mining) 2015; Data Mining for Educational Assessment and Feedback (ASSESS 2014) | Paper: Feb 21, 2014 | Paper: May 12, 2014 | Aug 10-13, 2015
NIPS (Neural Information Processing Systems Foundation); NIPS Workshop Data Driven Education (2013) | Paper: June 6, 2014 | Paper: ?? | Dec 8-11, 2014





research-intro.txt · Last modified: 2018/02/28 17:02 (external edit)