Tuesday, September 5, 2017

Improving gender & ethnic diversity in CS: what can faculty do?

Recently, my colleagues Dave Patterson, John Hennessy and Maria Klawe, all storied contributors to both academic and professional computing, wrote an excellent essay enumerating all the things that were incorrect (and there were many) in Google engineer James Damore's "anti-diversity post." For those who didn't hear about said post, which initially circulated internally at Google but eventually found its way onto the public Internet, Damore argued that the reason there are so few women in computing is that biological differences make women less fit than men for the technical and leadership activities associated with computing, and that companies that strive for gender parity are therefore misguided.

I can’t improve on my colleagues’ extensive rebuttal of what was a poorly-argued piece costumed in the garb of science, but as I just gave a phone interview to the Daily Cal (UC Berkeley’s student-run newspaper) on this topic, I thought I’d summarize my comments in this post too.

While Berkeley (and most of our peer institutions) make ongoing and vigorous efforts through a panoply of programs designed to improve the representation of women and underrepresented minorities (URMs) in STEM fields, I believe such programs can only work if they are driven from the relationships that structure students’ day-to-day lives on campus and collectively form an institutional culture. Some of those relationships are among students themselves—as peers in a course, between student and TA, or as part of a student-run club or academic group. Other relationships are between students and faculty—as course instructors, research advisors, or sponsors/liaisons to student-run activities.

In my faculty role, I try to do two things: lead by example in my interactions with students, and listen to them.

Leading by example. There’s quite a bit we can do in our courses and research projects, such as:

    • Invite women and URM guest speakers, both to talk about their experience of being part of an underrepresented demographic in tech and just to give interesting technical talks.
    • If we supervise GSIs, or are the faculty sponsor of a student group, DeCal course, or other student-centric activity, we can make it a point when engaging with those students to bring up unconscious bias and the importance of improving diversity in our field. Especially for GSIs, their behavior, language, and demeanor are important because of their position of authority.
    • We can make students aware of gender/ethnic diversity results that are directly relevant to the course if possible. In my software engineering project course, I encourage project teams to make an effort to include women, as team research has found the presence of women on a team to be a strong predictor of team success. Indeed, a study of GitHub code repositories found that women's contributions to code bases were more likely to be accepted than men's, unless you could tell from their profile that they were women! I also often introduce bits of computing history (it’s my hobby) that are relevant to the lecture topics; most students aren’t aware that the “masculinization” of computing didn’t really start in earnest until the 1960s, and that until then virtually all programmers were women. In CS375, a required GSI orientation and training that I’ve co-taught, we do an in-class activity based on Google’s unconscious-bias training that directly demonstrates our proneness to unconscious bias.


Listening. As faculty, we can decide to allocate time to participate in programs designed to support women and URMs entering the field. One example is the Berkeley CS Scholars program, of which I'm a faculty sponsor. When I do get to interact with women or underrepresented minorities, I listen to what they have to say about their experience so that I know what to fix. Was there a difficult interaction with a professor? With a GSI? With a fellow student? Unless we ourselves have had the experiences that many women or minorities have had in computing (and given the problem statement, most of us haven’t), this is the only way to reliably understand what to work on.

Department- and University-supported diversity programs are great, and leaders like Patterson, Hennessy and Klawe have to stand up and point out the problems in posts like Damore's. That is a necessary but not sufficient part of fixing the problem. The “mentality of a field” is based on the values embodied in the relationships among people in that field, and as faculty it’s part of our responsibility to bring those values to every student relationship in which we participate.




Thursday, August 24, 2017

Our techno-fetishism problem

A frequent meme, both in the Software Engineering class I teach and during my open office hours (when any student can come for advice or discussion on anything), is what appears to be a premature fear of lack-of-scalability when developing Internet services/Web apps. This concern is often hard to disentangle from techno-fetishism: the often irrational desire to use the latest rockstar tech, without really understanding whether it solves a problem you have, but providing the flimsy rationalization that “we need to use it for {scalability|dependability|having a good user experience|responsiveness}.”

There are various manifestations of this meme, including:
  • "We want to use {Mongo | Cassandra | your favorite NoSQL DB} because relational databases don't scale."
  • "We want to write our server in Node.js because synchronous/blocking/task-parallel appservers don't scale." (Weaker version: replace "don't scale" with "require more servers".)
  • "Our app is going to be huge, and we'd like your advice on how to achieve high scalability and 24x7 dependability across our soon-to-be-worldwide user base."
These thoughts are well-intentioned; many of these students are old enough to remember how MySpace crashed and burned in part because of its inability to get these things right, leaving a vacuum into which Facebook immediately stepped, and the rest is history, etc.

But since that time, the tools ecosystem has come a long way, computers have gotten way faster, and server cycles and storage have dropped in price by one or more orders of magnitude.

My challenge is therefore to help these students understand two things clearly:
  1. No technology will help you if your fundamental design prevents scaling (your app is poorly factored and/or cannot be divided into smaller services that can be scaled separately).
  2. If your app is well-factored, these days you can get surprisingly far with conventional RDBMS-backed stacks on commodity hardware. 
Ozan Onay makes this point well in his post "You are not Google," which incidentally includes quotes from my own Berkeley colleague Joe Hellerstein. As Onay points out:
"As of 2016, Stack Exchange [the forum system of which StackOverflow is just one part] served 200 million requests per day, backed by just four SQL servers: a primary for Stack Overflow, a primary for everything else, and two replicas."
In other words, the highest-traffic interactive Web app used by developers all over the world uses the most mundane of technologies—most SQL servers support this kind of master/slave replication by simply tweaking a config file, no special app programming required. Similarly, as of a couple of years ago, all of Pivotal Tracker was hosted on a single RDBMS instance.
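
To put that number in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: the 200 million requests per day comes from the quote above, the function name is my own, and the result is a daily average, so real peak load would be higher by some factor the quote doesn't give.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(requests_per_day: int) -> float:
    """Convert a daily request count into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figure quoted above for Stack Exchange as of 2016.
    rate = average_requests_per_second(200_000_000)
    print(f"about {rate:.0f} requests/second on average")
```

That works out to roughly 2,300 requests per second averaged over the day, handled by a handful of conventional SQL servers.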

A more sarcastic view of the same meme is put forth in one of my favorite xtranormal movies on Node.js. Go ahead, watch it. It’s only a few minutes long. You’ll thank me someday.

The basic message is the same: Most new technologies, languages, frameworks, and so on evolved to fill a specific technical need, and especially if they evolved in a large company, they probably fill a technical need that you don't have, and that you may or may not ever have.

There is a subtext to the message too: it is relatively rare that a new tool or framework embodies a fundamentally new idea about software engineering. The event-loop framework used by Node.js has been around since the beginning of time; indeed, the original Mac OS required all apps to be written this way (in part because the original Mac used a microprocessor that didn’t support virtual memory, making true preemptive multitasking impossible anyway), which was super painful. OS designers created threads to make this pain go away; threading systems do exactly what programmers are expected to do in Node. And modern threading systems running on modern hardware are damn fast.
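
A small sketch may make the comparison concrete. I'm using Python's asyncio as a stand-in for Node's event loop (an assumption for illustration, not a claim that the two are identical), and the handler names and the 50 ms sleep that stands in for a database call are made up. In the threaded version the handler is plain sequential code and the OS decides when to switch; in the event-loop version the programmer marks every yield point by hand with `await`.

```python
import asyncio
import threading
import time


def handle_request_blocking(request_id: int, results: dict) -> None:
    # Ordinary sequential code. While this thread sleeps (standing in for a
    # blocking I/O call), the OS scheduler runs other threads for us.
    time.sleep(0.05)
    results[request_id] = f"response {request_id}"


async def handle_request_evented(request_id: int, results: dict) -> None:
    # Same work, but the programmer must explicitly hand control back to the
    # event loop with `await` at every point where the handler would block.
    await asyncio.sleep(0.05)
    results[request_id] = f"response {request_id}"


def serve_with_threads(n: int) -> dict:
    results: dict = {}
    threads = [
        threading.Thread(target=handle_request_blocking, args=(i, results))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def serve_with_event_loop(n: int) -> dict:
    results: dict = {}

    async def main() -> None:
        await asyncio.gather(
            *(handle_request_evented(i, results) for i in range(n))
        )

    asyncio.run(main())
    return results


if __name__ == "__main__":
    print(serve_with_threads(5) == serve_with_event_loop(5))
```

Both versions overlap the waiting time and produce the same responses; the difference is who does the bookkeeping about when to switch tasks.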

Finally, and most importantly, no tool or framework can save you from a bad design. Some frameworks try to help you by effectively legislating some design decisions out of existence (e.g. Rails all but mandates a Model-View-Controller architecture), but there's still plenty of room to shoot yourself in the foot design-wise.

That will be your real obstacle to scaling and dependability—not the fact that you didn't go with Node.js or write your front end using React. So you can still wear that Node t-shirt, but please consider wearing it ironically.

Friday, June 16, 2017

Teach yourself to code using online tutorials…but which ones?

There’s an almost embarrassing plethora of “teach yourself to code” tutorials online, including reference sites like W3schools, interactive “try it in a browser” sites like Codecademy and CodeHunt, beginner-friendly step-by-step activities such as those featured on Code.org, and many more. And that’s not even counting the dozens of online courses, both free (edX, Coursera) and paid (Udemy, Lynda), devoted to some aspect or other of learning to code.

But which of these are actually likely to be effective in producing learning? Researchers who study how students learn have developed a rich body of well-tested ideas and recommendations around evidence-based teaching and learning—that is, the kind where you can unambiguously measure whether learning is taking place—but which if any of the online teach-yourself-to-code resources actually follow these principles?

My colleague Prof. Andrew Ko at the University of Washington and his student Ada Kim have produced a very nice paper addressing this question, which they presented at SIGCSE 2017, the largest conference focused on computer science education. Kim and Ko studied 30 free and paid highly popular online resources for teaching yourself beginning coding skills.

They have numerous findings, but the one that struck me most was that while many of these sites cover most of the same material (with varying degrees of beginner-friendly organization and quality of presentation), most focus on how to understand and practice a particular concept in coding, but provide little practice on identifying when and why to apply that concept in real programming situations. While Kim and Ko stop short of making “hard” recommendations, they find that the tutorials from Code.org, and online games such as Gidget, Lightbot, and CodeHunt, do the best job at incorporating elements of evidence-based teaching that have been shown to promote learning:

  • Provide immediate and personalized feedback to the learner
  • Organize concepts into a hierarchy with clear goals for learning each concept
  • Opportunity for mastery learning, i.e. to practice a concept repeatedly until understanding is complete
  • Promote meta-cognition: knowing when and why to use a concept, not just how to use it
With everyone trying to make a buck (or just get visibility/sell ad space) claiming to teach beginners to code, a critical look at these resources through the lens of Computer Science Education Research is a welcome breath of fresh air. Read their 6-page paper here.

Thursday, March 2, 2017

The 7 (or so) habits of highly successful projects

Each time we offer the course at UC Berkeley based on the Engineering Software as a Service curriculum, students complete a substantial open-ended course project in teams of 4 to 6. Each team works with a nonprofit or campus business unit to solve a specific business problem using SaaS; the student organization Blueprint helps us recruit customers.

Besides creating screencasts demonstrating their projects, students also participate in a poster session in which we (instructors) ask them to reflect on the experience of doing the project and how the techniques taught in the class helped (or didn't help) their success.

Virtually all project teams praised BDD and Cucumber as a way of driving the development of the app and reaching agreement with the customer, and all agreed on the importance of the proper use of version control. Beyond that, there was more variation in which techniques different teams used (or wished in retrospect that they had used). Below is the most frequently-heard advice our students would give to future students doing similar projects, in approximate order of popularity (that is, items earliest in the list were independently reported by the largest number of project teams).
  1. Reuse, don't reinvent. Before coding something that is likely to be a feature used by other SaaS apps (file upload, capability management, and so on), take the time to search for Ruby gems or JavaScript libraries you can use or adapt. Even two hours of searching is less time than it takes to design, code and test it yourself.

  2. Start from a good object-oriented design and schema. Take time to think about the key entity types (models), the relationships among them, and how to capture those relationships in a schema using associations and foreign keys. A good design reduces the likelihood of a painful refactoring due to a schema change.

  3. Weekly meetings are not enough.  Especially with the larger 6-person teams we used in the Fall 2013 course offering (237 students forming 40 teams), a 15-minute daily standup meeting helped tremendously in keeping everyone on track, preventing conflicting or redundant work, and informally sharing knowledge among team members who ran into problems. Teams that met only once a week and supplemented it with online chat or social-networking groups wished they had met more often.

  4. Commit to TDD early.  Teams that relied heavily on TDD found its greatest value in regression testing: regression bugs were spotted immediately and could be fixed quickly. Teams that didn't commit to TDD had problems with regression when adding features or refactoring. Teams that used TDD also noted that it helped them organize not only their code, but their thoughts on how it would be used ("the code you wish you had").

  5. Use a branch per feature. Fine-grained commits and branch-per-feature were essential in preventing conflicts and keeping the master branch clean and deployment-ready.

  6. Avoid silly mistakes by programming in pairs.  Not everyone paired, but those who did found that it led to higher quality code and avoided silly mistakes that might have taken extra time to debug otherwise.

  7. Divide work by stories, not by layers.  Teams in which one or a pair of developers owned a story had far fewer coordination problems and merge conflicts than teams that stratified by layer (front-end developer, back-end developer, JavaScript specialist, and so on) and also found that all team members understood the overall app structure better and were therefore more confident when making changes or adding features.
There you have it—the seven habits of highly successful projects, distilled from student self-reflections from approximately sixty projects over two offerings of the course.  We hope you find them helpful!
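
As a small illustration of habit 2, here is a sketch of capturing a one-to-many relationship with a foreign key. The course projects used Rails associations; this uses Python's built-in sqlite3 module instead so that it is self-contained, and the `customers` and `projects` tables and all function names are hypothetical examples, not taken from any student project.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Each project belongs to exactly one customer.
        CREATE TABLE projects (
            id          INTEGER PRIMARY KEY,
            title       TEXT NOT NULL,
            customer_id INTEGER NOT NULL REFERENCES customers(id)
        );
        """
    )


def add_customer(conn: sqlite3.Connection, name: str) -> int:
    cur = conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_project(conn: sqlite3.Connection, title: str, customer_id: int) -> int:
    cur = conn.execute(
        "INSERT INTO projects (title, customer_id) VALUES (?, ?)",
        (title, customer_id),
    )
    return cur.lastrowid


def projects_for(conn: sqlite3.Connection, customer_name: str) -> list:
    # Follow the association from customer to its projects via the foreign key.
    rows = conn.execute(
        """
        SELECT projects.title
        FROM projects JOIN customers ON projects.customer_id = customers.id
        WHERE customers.name = ?
        ORDER BY projects.id
        """,
        (customer_name,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    cid = add_customer(conn, "Example Nonprofit")
    add_project(conn, "Volunteer scheduler", cid)
    print(projects_for(conn, "Example Nonprofit"))
```

With the relationship written into the schema, the database itself rejects a project that points at a nonexistent customer, which is one kind of mistake you no longer have to catch in application code.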