Observing group learning/group tasks

  • When predicting performance, less of a bad thing is better than more of a good thing: Stemler (Wesleyan U), Aggarwal, Nithyanand, Bhatt (AspiringMinds, India), Proc. AERA 2014 (short paper).
    • Goal: identify employees with socially-destructive (“dark”) traits that may harm their job performance or their organization's reputation. (“In education, for example, some researchers have suggested that removing the 5% of teachers at the bottom of the distribution would do far more to enhance the productivity and reputation of the educational workforce than would raising the selection bar for the top performers (Hanushek, 2009)”). They administer a Situational Judgment Test (SJT) (“What would you do in the following scenario…”) to existing salespeople & managers whose performance records are known. Choosing the “worst” answers predicts 11% of the variance in actual performance, vs. only 5% for choosing the “best” answers. That is, less of a bad thing is better than more of a good thing.
  • (from Marti) Improving Teamwork Using Real-Time Language Feedback, Tausczik and Pennebaker, CHI 2013. Interested in the question of team dynamics (as opposed to learning). To study it, however, they use a large intro psych class and require the students to participate in the study for class credit. The related work section is very interesting, as is their characterization of the work on small group dynamics. They have the audacity to try automated suggestions of feedback messages based on what people are saying. I don’t buy this part at all, but there is a significant main effect that if people aren't talking much and you insert feedback, they talk more. I do believe that one.
  • The emergence of GitHub as a collaborative platform for education. Zagalsky, Feliciano, et al., CSCW 2015. Qualitative interview-based analysis of using GitHub to manage courses, keep tabs on student activity (both within groups and solo work), version/improve course material, and handle a subset of tasks typically delegated to an LMS.
    • - Current course asset formats thwart merging: hard to use merge/fork/pull to selectively import self-contained modules or sets of changes from others' courses. (Armando has long bitched about this to edX, but the other LMSs aren't any better.)
    • - Faculty are resistant to adopting it if they don't already know GitHub, since it has a steeper learning curve than its predecessors.
    • - Even experienced users admit that it can be tricky to undo/back out certain kinds of mistakes, even though it's always possible (it's vanishingly rare that data is actually lost for good).
    • - The GitHub Education site actually contains an instructor-facing user guide, but none of the 15 interviewees (including me) knew about it.
    • + Convenience of student submission: e.g., for exams & assignments students can just push to GitHub, and the last push before the deadline is the one that's graded (see the sketch after this list).
    • + Distributed version control allows repo copies to travel on USB drives, laptops, …, and avoids red tape usually associated with campus-IT-based centralized version control.
    • + Some instructors version course materials by fork-and-pull each semester, so can compare course diffs across offerings.
    • + Students like making projects available publicly on GH as part of their resume/portfolio.
    • + Transparency of activity: instructors can see which students are doing what, and quickly see who's slacking and what they're (not) doing relative to peers.
    • + Some instructors use GitHub news feeds (“Johnny Q pushed to Homework1/Problem1 repo 2 minutes ago, with 100 changed lines”) to catch problems early.
    • + Encourages participation: students can use fork-and-pull to suggest improvements to course materials; pull request tends to generate discussion; merged pulls result in extra credit/bonus points
    • + Some instructors encourage students to use Issues feature when they have trouble, and tag instructor or TA in issue/comment if they get stuck.
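
A minimal sketch of the “last push before the deadline is graded” workflow mentioned above, using the GitHub REST API commits endpoint via Python's requests library. The org name, repo-naming scheme, token, and deadline are placeholders for illustration; a real script would also handle pagination and errors.

  # Sketch: find each student's last commit before a deadline.
  # Org, repo naming scheme, token, and deadline are hypothetical placeholders.
  import requests

  GITHUB_API = "https://api.github.com"
  TOKEN = "..."                        # placeholder personal access token
  DEADLINE = "2016-07-15T23:59:59Z"    # ISO 8601, as the commits API expects

  def last_commit_before_deadline(owner, repo):
      """Return (sha, timestamp) of the newest commit at or before DEADLINE, or None."""
      resp = requests.get(
          f"{GITHUB_API}/repos/{owner}/{repo}/commits",
          params={"until": DEADLINE, "per_page": 1},   # API returns newest first
          headers={"Authorization": f"token {TOKEN}"},
      )
      resp.raise_for_status()
      commits = resp.json()
      if not commits:
          return None
      c = commits[0]
      return c["sha"], c["commit"]["committer"]["date"]

  if __name__ == "__main__":
      for student in ["alice", "bob"]:   # hypothetical usernames and repo scheme
          print(student, last_commit_before_deadline("cs-course", f"hw1-{student}"))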

Measuring behaviors of student software project teams

Metrics in Agile Project Courses (Alperowitz et al., ICSE 2016, short paper) BibTeX, PDF.

The authors teach a course with ~100 students formed into project teams, similar to CS169. They describe metrics provided to coaches and instructors to help them keep track of how teams are doing, but they haven't rigorously analyzed how these metrics predict team success. Interestingly, the metrics are NOT shown to the students in the teams, for fear that the students will “game” the metrics myopically (e.g. if students learn that instructors are tracking the number of Git commits, they'll start gratuitously committing to increase their count, even though the instructors aren't using that metric in isolation.) Metrics they focused on:

  • Average life of a pull request, from open to merge (see the sketch after this list). “Optimal” is <24 hours “based on our past experience with the course”, but too-short lifetimes may be bad because they indicate the code wasn't reviewed thoroughly before merging.
  • Percent of merge requests with at least one comment or other activity (eg associated task). This is similar to our metric of pull-request activity in ProjectScope.
  • Mean time between a CI (continuous integration) failure and first successful build on the main development branch.
  • Number of deploys this iteration. (In their course, “deployment” == “customer downloads latest app”, but for SaaS we could just look at Heroku deploy history.)
  • All indicators are tracked week-to-week (or sprint-to-sprint or iteration-to-iteration), so instructors see trends as well as snapshots.
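
A minimal sketch of the first metric above (average open-to-merge lifetime of pull requests), again against the GitHub REST API via requests; the repo name and token are placeholders, and a real version would paginate and restrict to a single sprint's date window.

  # Sketch: average open-to-merge lifetime of merged pull requests (hypothetical repo).
  from datetime import datetime
  import requests

  GITHUB_API = "https://api.github.com"
  TOKEN = "..."                       # placeholder personal access token
  REPO = "cs-course/team-project"     # placeholder owner/repo

  def average_pr_lifetime_hours():
      resp = requests.get(
          f"{GITHUB_API}/repos/{REPO}/pulls",
          params={"state": "closed", "per_page": 100},
          headers={"Authorization": f"token {TOKEN}"},
      )
      resp.raise_for_status()
      fmt = "%Y-%m-%dT%H:%M:%SZ"
      lifetimes = []
      for pr in resp.json():
          if pr["merged_at"] is None:          # skip PRs closed without merging
              continue
          opened = datetime.strptime(pr["created_at"], fmt)
          merged = datetime.strptime(pr["merged_at"], fmt)
          lifetimes.append((merged - opened).total_seconds() / 3600.0)
      return sum(lifetimes) / len(lifetimes) if lifetimes else None

  if __name__ == "__main__":
      print("Average PR lifetime (hours):", average_pr_lifetime_hours())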

How surveys, tutors, and software help to assess Scrum adoption in a classroom software engineering project (Bruegge et al., same team as “Metrics in Agile Project Courses” but based at the Hasso Plattner Institute), ICSE 2016 full paper.

The authors used a combination of anonymous surveys, unobtrusive monitoring of scrum meetings, and a software tool called ScrumLint to monitor whether students in project teams were following the XP/Scrum process correctly; 5 project teams were tracked over 4 sprints (iterations).

The toolchain is GitHub with pull requests, Travis CI with auto-deploy to Heroku, GitHub Issues with Waffle.io for backlog tracking, and a Slack group with a channel per team.

  • Each iteration started and ended with surveys to ask each student whether it was clear what the Product Owner wanted in the user stories, whether the estimates of required time were realistic, etc. Surveys were voluntary and anonymous (not pseudonymous). Main lesson: “satisfaction with Scrum as a process” (one of the survey questions) correlates with knowing how to create and work effectively with high-quality user stories (r=0.26) and high-quality tests (r=0.24).
  • Tutors passively “sat in” (in person) on scrum meetings and privately rated teams on how “dedicated” they were to following scrum practices and how much actual progress they were making, on a scale of 1 to 10. This scales poorly and has the usual problems of inter-rater inconsistency; plus, students may change their behavior if an “authority figure” like a tutor is present at their meeting. Main lesson: tutors' assessments over time of how well teams followed Scrum did not particularly track survey responses over time, so surveys alone aren't enough.

The most interesting part for me was their tool ScrumLint, which analyzes GitHub commits and issues relative to 10 “conformance metrics” developed for their course, including, e.g., “percentage of commits in the last 30 minutes before the sprint-end deadline”. Each conformance metric has a “rating function” that computes a score between 0 (practice wasn't followed at all) and 100 (practice was followed very well); each team can see their scores on all metrics. (A sketch of one such rating function appears after the list below.)

  • They used Zazworka et al.'s model of "process nonconformance" to define a process-conformance template, consisting of: process name; process goal/focus; process description; the collected data used to determine conformance; syntactic violations (such as temporal patterns in the data that clearly indicate a violation of the process); and semantic violations (in which thresholds on the collected data indicate low quality even if not an outright violation). They also define a conformance workflow: gather data and detect process violations; understand the violation's context; if the violation is a true violation, fix it; if not, determine why you got a false positive, and either improve the detection metric or change the definition of process conformance.
  • From a list of 35 Agile practices extracted from large-ish scientific computing projects using Agile (see next paper, below), they used and/or modified 10 for ScrumLint to detect.
  • ScrumLint found violations in every team's processes; highest overall score (over all 5 teams, all 4 iterations) was 87.5, with a mean around 75 (eyeballing the graph) and a low of 65 (1 team, 1 sprint).
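
The paper's actual rating functions aren't reproduced here; the sketch below shows what one might look like for the “percentage of commits in the last 30 minutes before the sprint-end deadline” metric. The 0-100 mapping and thresholds are invented for illustration, not taken from ScrumLint.

  # Sketch of a ScrumLint-style rating function (thresholds are invented, not the paper's).
  # Input: commit timestamps for one team and the sprint-end deadline.
  # Output: a conformance score from 0 (practice not followed) to 100 (followed well).
  from datetime import datetime, timedelta

  def last_minute_commit_score(commit_times, sprint_end):
      """Penalize teams that cram commits into the last 30 minutes of the sprint."""
      in_sprint = [t for t in commit_times if t <= sprint_end]
      if not in_sprint:
          return 0.0        # no commits at all: practice clearly not followed
      window_start = sprint_end - timedelta(minutes=30)
      last_minute = sum(1 for t in in_sprint if t >= window_start)
      pct = 100.0 * last_minute / len(in_sprint)
      # Linear rating: 0% last-minute commits -> 100, 50% or more -> 0 (invented mapping).
      return max(0.0, 100.0 - 2.0 * pct)

  # Example usage with made-up timestamps:
  sprint_end = datetime(2016, 7, 15, 18, 0)
  commits = [datetime(2016, 7, 14, 10, 5), datetime(2016, 7, 15, 17, 45),
             datetime(2016, 7, 15, 17, 55)]
  print(last_minute_commit_score(commits, sprint_end))  # 0.0: two of three commits are last-minute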

Caveats:

  • They're measuring whether students are following a given process, as surfaced by specific metrics. They don't (yet) correlate ScrumLint measurements with project “success” as determined by tutors (I'm a bit surprised they didn't try this, since they had the tutors' evaluations available).
  • “There is little research on what constitutes Agile best practices” (presumably meaning which practices predict project success, and what the metrics of success are). One earlier (2004) effort is HackyStat, which instruments students' IDEs to get each student's ActiveTime, MostActiveFile, FileSize, FileComplexity (based on Chidamber-Kemerer complexity metrics for Java), and UnitTestCoverage, and compares these metrics across students within the same project team. Interestingly, “all of these measures are highly uncorrelated with each other”, i.e. no joy in using these to find patterns that predict success or failure. Also, this is embedded in individuals' IDEs, so it's hard to get info on “group” activities of the kind you get naturally from GitHub, Slack, etc.

Related work:

  • Since even Kent Beck (creator of XP) doesn't give formal definitions for “agile process conformance”, others have defined informal "Agile maturity levels" and a useful team self-assessment for measuring them. (We've made this self-assessment available as a Google Forms document.)
  • Are developers complying with the process?: an XP study (the Zazworka et al. reference), Proc. Empirical Software Engineering and Measurement (ESEM) 2010, proposes a way to codify conformance templates and violation detection for monitoring conformance with a process. Here is an example conformance template for the practice “collective code ownership”. (Note that this card actually covers two distinct metrics, Pair Switching and Truck Factor; in practice it might be better to define a conformance template for each one, but it depends on what you want to report.)
Example conformance template:
  Process name: Collective code ownership via promiscuous pairing
  Process focus: Code is collectively owned; high Truck Factor
  Process description: Pair switching: subjects switch pairs for each new story and/or between iterations
  Collected data (automatic/implicit): Commit logs include names of programmers and story card number
  Collected data (manual): Self-assessment surveys asking developers about their pairing behaviors
  Process violations (syntactic): (1) same developer pair works together on two consecutive story cards; (2) same developer pair works together in two consecutive iterations
  Process violations (semantic): (1) Truck Factor too low
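
To make the syntactic violations above concrete, here is a hedged sketch of automatically detecting “same developer pair on two consecutive story cards”; the (story card, pair) representation of the commit log is invented, since in practice the pair and card number would have to be parsed out of commit messages or metadata.

  # Sketch: detect the syntactic violation "same pair on two consecutive story cards."
  # The commit-log representation below is invented for illustration.

  def consecutive_pair_violations(story_log):
      """story_log: list of (story_card_id, frozenset_of_two_developers), in story order.
      Returns the list of (card_a, card_b) where the same pair worked back-to-back."""
      violations = []
      for (card_a, pair_a), (card_b, pair_b) in zip(story_log, story_log[1:]):
          if pair_a == pair_b:
              violations.append((card_a, card_b))
      return violations

  # Example usage with made-up data:
  log = [("S-101", frozenset({"alice", "bob"})),
         ("S-102", frozenset({"alice", "bob"})),   # same pair again -> violation
         ("S-103", frozenset({"alice", "carol"}))]
  print(consecutive_pair_violations(log))          # [('S-101', 'S-102')]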

(The Truck Factor, also called Truck Number or Bus Factor or Bus Number, is the percentage of people on the project who would have to be run over by a truck or bus to doom the project. 100% is optimal but unrealistic; 20% is bad (a 5-person project where 1 person could doom the project by leaving). There are various ways to estimate it automatically, such as noting what fraction of developers have nontrivially edited a particular file, class, etc., and finding the “most vulnerable” files or developers.)
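
A crude sketch of one such automatic estimate: treat a file as “vulnerable” if a single developer made nearly all of the commits touching it, parsing git log --name-only output. The single-owner heuristic and the 90% threshold are assumptions for illustration, not taken from any of the papers above.

  # Sketch: crude truck-factor-style estimate from git history (heuristic and threshold
  # are invented for illustration, not taken from the papers above).
  import subprocess
  from collections import defaultdict

  def file_author_counts(repo_path):
      """Map each file to a {author: number_of_commits_touching_it} dict."""
      out = subprocess.run(
          ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:@%an"],
          capture_output=True, text=True, check=True).stdout
      counts = defaultdict(lambda: defaultdict(int))
      author = None
      for line in out.splitlines():
          if line.startswith("@"):
              author = line[1:]            # "@" marks the author line in our format
          elif line.strip() and author is not None:
              counts[line.strip()][author] += 1
      return counts

  def vulnerable_files(repo_path, threshold=0.9):
      """Files where one author made >= threshold of the commits touching them."""
      vulnerable = []
      for path, by_author in file_author_counts(repo_path).items():
          total = sum(by_author.values())
          top_author, top = max(by_author.items(), key=lambda kv: kv[1])
          if top / total >= threshold:
              vulnerable.append((path, top_author))
      return vulnerable

  if __name__ == "__main__":
      for path, owner in vulnerable_files("."):
          print(f"{path}: effectively owned by {owner}")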

What Do We Know about Scientific Software Development's Agile Practices?, Sletholt et al., IEEE Computing in Science and Engineering, March/April 2012

From 5 scientific-computing projects whose use of Agile was documented in a scholarly way, the authors extract 35 Agile practices.
