Friday, October 6, 2017

Making CS/IT education more immersive

This month's Communications of the ACM includes an article by Thomas A. Limoncelli on Four Ways to Make CS and IT More ImmersiveThe author laments that by and large, CS students aren't learning best practices in the classroom, and offers four specific pieces of advice for serving students better.

1. Use best-of-breed Dev/Ops tools from the start: the normal way for students to work on coding tasks should be to use Git, submit via commit/push, and provide pointers to CI reports.

In our ESaaS course and MOOC, students do just this. Git is used from day one, and student projects (in the campus version of the course, and a coming-soon Part 3 MOOC) are required to include coverage, CodeClimate, and CI badges on their repo front pages. Homework submission is often via deployment to Heroku.

2. Homework programs, even Hello World, should generate Web pages, not text. This requires understanding minimal bits of SaaS architecture, languages, and moving parts.

ESaaS’s first assignment is indeed a Hello World for Ruby, but the very next one has students manipulating a simple SaaS app built using Sinatra. The goal is to get students “thinking SaaS” early on.

3. Curricula should start with a working system that shows best practices, not just build from low-level to higher-level abstractions.

ESaaS’s second assignment has students examining code for a simple Sinatra app that follows good coding practices, includes integration and unit tests (before students have been introduced to creating or reading code for such tests), and is activated by deploying to the public cloud.

4. Focus on reading, understanding, and adding value to an existing system—not just on greenfield development. The former is much more common than the latter in the software engineering profession.

ESaaS indeed includes a 2-part homework assignment on enhancing legacy code, but more exciting, an increasing fraction of all projects undertaken in Berkeley’s version of the course (10/11 projects in summer 2017 offering of the course; 13/20 in Fall 2017 offering) are existing, functioning legacy systems wanting additional features.

Our course materials are freely available on edX as the 2-part (soon to be 3-part) MOOC sequence Agile Development Using Ruby on Rails, and plenty of instructor materials are available for teachers wanting to use this content or the accompanying textbook in a SPOC.

Tuesday, September 5, 2017

Improving gender & ethnic diversity in CS: what can faculty do?

Recently, my colleagues Dave Patterson, John Hennessy and Maria Klawe, all storied contributors to both academic and professional computing, wrote an excellent essay enumerating all the things that were incorrect (and there were many) in Google engineer James Damore's "anti-diversity post." For those who didn't hear about said post, which initially circulated internally at Google but eventually found its way onto the public Internet, Damore argued that the reason there are so few women in computing is because biological differences make women less fit than men for the technical and leadership activities associated with computing, and that companies who strive for gender parity are therefore misguided.

I can’t improve on my colleagues’ extensive rebuttal of what was a poorly-argued piece costumed in the garb of science, but as I just gave a phone interview to the Daily Cal (UC Berkeley’s student-run newspaper) on this topic, I thought I’d summarize my comments in this post too.

While Berkeley (and most of our peer institutions) make ongoing and vigorous efforts through a panoply of programs designed to improve the representation of women and underrepresented minorities (URMs) in STEM fields, I believe such programs can only work if they are driven from the relationships that structure students’ day-to-day lives on campus and collectively form an institutional culture. Some of those relationships are among students themselves—as peers in a course, between student and TA, or as part of a student-run club or academic group. Other relationships are between students and faculty—as course instructors, research advisors, or sponsors/liaisons to student-run activities.

In my faculty role, I try to do two things: lead by example in my interactions with students, and listen to them.

Leading by example. There’s quite a bit we can do in our courses and research projects, such as:

    • Invite women and URM guest speakers, both to talk about their experience of being part of an underrepresented demographic in tech and just to give interesting technical talks.
    • If we supervise GSIs, or are the faculty sponsor of a student group, DeCal course, or other student-centric activity, we can make it a point when engaging with those students to bring up unconscious bias and the importance of improving diversity in our field. Especially for GSIs, their behavior, language, and demeanor are important because of their position of authority.
    • We can make students aware of gender/ethnic diversity results that are directly relevant to course if possible. In my software engineering project course, I encourage project teams to make an effort to include women, as team research has found the presence of women on a team to be a strong predictor of team success. Indeed, a study of GitHub code repositories found that women's contributions to code bases were more likely to be accepted, unless you could tell from their profile that they were women!  I also often introduce bits of computing history (it’s my hobby) that are relevant to the lecture topics; most students aren’t aware that the “masculinization” of computing didn’t really start in earnest until the 1960s, and that until then virtually all programmers were women. In CS375,  a required GSI orientation and training that I’ve co-taught, we do an in-class activity based on Google’s unconscious-bias training that directly demonstrates our proneness to unconscious bias. 

Listening.  As faculty, we can decide to allocate time to participate in programs designed to support women and URMs entering the field. One example is the Berkeley CS Scholars program, of which I'm a faculty sponsor. When I do get to interact with women or underrepresented minorities, I listen to what they have to say about their experience so that you know what to fix. Was there a difficult interaction with a professor? with a GSI? with a fellow student? Unless we ourselves have had the experiences that many women or minorities have had in computing (and given the problem statement, most of us haven’t), this is the only way to reliably understand what to work on.

Department- and University-supported diversity programs are great, and leaders like Patterson, Hennessy and Klawe have to stand up and point out the problems in posts like Damore's. That is a necessary but not sufficient part of fixing the problem. The “mentality of a field” is based on the values embodied in the relationships among people in that field, and as faculty it’s part of our responsibility to bring those values to every student relationship in which we participate.

Thursday, August 24, 2017

Our techno-fetishism problem

A frequent meme in both the Software Engineering class I teach and during my open office hours (when any student can come for advice or discussion on anything) is what appears to be a premature fear of lack-of-scalability when developing Internet services/Web apps. This concern is often hard to disentangle from techno-fetishism—the often irrational desire to use the latest rockstar tech, without really understanding whether it solves a problem you have, but providing the flimsy rationalization that “we need to use it for {scalability | dependability | having a good user experience | responsiveness}.”

There are various manifestations of this meme, including:
  • "We want to use {Mongo | Cassandra | your favorite NoSQL DB} because relational databases don't scale."
  • "We want to write our server in Node.js because synchronous/blocking/task-parallel appservers don't scale." (Weaker version: replace "don't scale" with "require more servers".)
  • "Our app is going to be huge, and we'd like your advice on how to achieve high scalability and 24x7 dependability across our soon-to-be-worldwide user base."
These thoughts are well-intentioned; many of these students are old enough to remember how MySpace crashed and burned in part because of its inability to get these things right, leaving a vacuum into which Facebook immediately stepped, and the rest is history, etc.

But since that time, the tools ecosystem has come a long way, computers have gotten way faster, and server cycles and storage have dropped in price by one or more orders of magnitude.

My challenge is therefore to help these students understand two things clearly:
  1. No technology will help you if your fundamental design prevents scaling (your app is poorly factored and/or cannot be divided into smaller services that can be scaled separately).
  2. If your app is well-factored, these days you can get surprisingly far with conventional RDBMS-backed stacks on commodity hardware. 
Ozan Onay makes this point well in his post "You are not Google," which incidentally includes quotes from my Berkeley colleague Prof. Joe Hellerstein. As Onay points out:
"As of 2016, Stack Exchange [the forum system of which StackOverflow is just one part] served 200 million requests per day, backed by just four SQL servers: a primary for Stack Overflow, a primary for everything else, and two replicas."
In other words, the highest-traffic interactive Web app used by developers all over the world uses the most mundane (in many developers’ view) of technologies—most SQL servers support this kind of master/slave replication by simply tweaking a config file, no special app programming required. Similarly, as of a couple of years ago, all of Pivotal Tracker was hosted on a single RDBMS instance.

A more sarcastic view of the same meme is put forth in one of my favorite xtranormal movies on Node.js. Go ahead, watch it. It’s only a few minutes long. You’ll thank me someday.

The basic message is the same: Most new technologies, languages, frameworks, and so on evolved to fill a specific technical need, and especially if they evolved in a large company, they probably fill a technical need that you don't have, and that you may never have.

There is a subtext to the message too: it is relatively rare that a new tool or framework embodies a fundamentally new idea about software engineering. The event-loop framework used by Node.js has been around since the beginning of time; indeed, the original Mac OS required all apps to be written this way (in part because the original Mac used a microprocessor that didn’t support virtual memory, making true preemptive multitasking impossible), which was super painful. OS designers created threads to make this pain go away; threading systems do exactly what programmers are expected to do in Node. And modern threading systems running on modern hardware are damn fast.

Finally, and most importantly, no tool or framework can save you from a bad design. Some frameworks try to help you by effectively legislating some design decisions out of existence (e.g. Rails all but mandates a Model-View-Controller architecture), but there's still plenty of room to shoot yourself in the foot design-wise.

That will be your real obstacle to scaling and dependability—not the fact that you didn't go with Node.js or write your front end using React. So you can still wear that Node t-shirt, but please consider wearing it ironically.

Friday, June 16, 2017

Teach yourself to code using online tutorials…but which ones?

There’s an almost embarrassing plethora of “teach yourself to code” tutorials online—including reference sites like W3schools, interactive “try it in a browser” sites like CodeCademy and CodeHunt, beginner-friendly step-by-step activities such as those featured on, and many more. And that’s not even counting the dozens of online courses, both free (edX, Coursera) and paid (Udemy, Lynda)—devoted to some aspect or other of learning to code.

But which of these are actually likely to be effective in producing learning? Researchers who study how students learn have developed a rich body of well-tested ideas and recommendations around evidence-based teaching and learning—that is, the kind where you can unambiguously measure whether learning is taking place—but which if any of the online teach-yourself-to-code resources actually follow these principles?

My colleague Prof. Andrew Ko at the University of Washington and his student Ada Kim have produced a very nice paper addressing this question, which they presented at SIGCSE 2017, the largest conference focused on computer science education. Kim and Ko studied 30 free and paid highly popular online resources for teaching yourself beginning coding skills.

They have numerous findings, but the one that struck me most was that while many of these sites cover most of the same material (with varying degrees of beginner-friendly organization and quality of presentation), most focus on how to understand and practice a particular concept in coding, but provide little practice on identifying when and why to apply that concept in real programming situations. While Kim and Ko stop short of making “hard” recommendations, they find that the tutorials from, and online games such as Gidget, Lightbot, and CodeHunt, do the best job at incorporating elements of evidence-based teaching that have been shown to promote learning:

  • Provide immediate and personalized feedback to the learner
  • Organize concepts into a hierarchy with clear goal-directed for learning each concept
  • Opportunity for mastery learning, i.e. to practice a concept repeatedly until understanding is complete
  • Promote meta-cognition—knowing how and when to use a concept, not just how to use it
With everyone trying to make a buck (or just get visibility/sell ad space) claiming to teach beginners to code, a critical look at these resources through the lens of Computer Science Education Research is a welcome breath of fresh air. Read their 6-page paper here.

Thursday, March 2, 2017

The 7 (or so) habits of highly successful projects

Each time we offer at UC Berkeley based on the Engineering Software as a Service curriculum, students complete a substantial open-ended course project in teams of 4 to 6.  Each team works with a nonprofit or campus business unit to solve a specific business problem using SaaS; the student organization Blueprint helps us recruit customers.

Besides creating screencasts demonstrating their projects, students also participate in a poster session in which we (instructors) ask them to reflect on the experience of doing the project and how the techniques taught in the class helped (or didn't help) their success.

Virtually all project teams praised BDD and Cucumber as a way of driving the development of the app and reaching agreement with the customer, and all agreed on the importance of the proper use of version control. Beyond that, there was more variation in which techniques different teams used (or wished in retrospect that they had used). Below is the most frequently-heard advice our students would give to future students doing similar projects, in approximate order of popularity (that is, items earliest in the list were independently reported by the largest number of project teams).
  1. Reuse, don't reinvent. Before coding something that is likely to be a feature used by other SaaS apps (file upload, capability management, and so on), take the time to search for Ruby gems or JavaScript libraries you can use or adapt. Even two hours of searching is less time than it takes to design, code and test it yourself.

  2. Start from a good object-oriented design and schema. Taking time to think about the key entity types (models), relationships among them, and how to capture those relationships in a schema using associations and foreign keys. A good design reduces the likelihood of a painful refactoring due to a schema change.

  3. Weekly meetings are not enough.  Especially with the larger 6-person teams we used in the Fall 2013 course offering (237 students forming 40 teams), a 15-minute daily standup meeting helped tremendously in keeping everyone on track, preventing conflicting or redundant work, and informally sharing knowledge among team members who ran into problems. Teams that met only once a week and supplemented it with online chat or social-networking groups wished they had met more often.

  4. Commit to TDD early.  Teams that relied heavily on TDD found its greatest value in regression testing: regression bugs were spotted immediately and could be fixed quickly. Teams that didn't commit to TDD had problems with regression when adding features or refactoring. Teams that used TDD also noted that it helped them organize not only their code, but their thoughts on how it would be used ("the code you wish you had").

  5. Use a branch per feature. Fine-grained commits and branch-per-feature were essential in preventing conflicts and keeping the master branch clean and deployment-ready.

  6. Avoid silly mistakes by programming in pairs.  Not everyone paired, but those who did found that it led to higher quality code and avoided silly mistakes that might have taken extra time to debug otherwise.

  7. Divide work by stories, not by layers.  Teams in which one or a pair of developers owned a story had far fewer coordination problems and merge conflicts than teams that stratified by layer (front-end developer, back-end developer, JavaScript specialist, and so on) and also found that all team members understood the overall app structure better and were therefore more confident when making changes or adding features.
There you have it—the seven habits of highly successful projects, distilled from student self-reflections from approximately sixty projects over two offerings of the course.  We hope you find them helpful!

Monday, December 19, 2016

“Towards a new hacker ethic”

I just read the transcript (and saw slide deck) of a great talk, Programming is Forgetting: Toward a New Hacker Ethic by Allison Parrish at the 2016 Open Hardware Summit.
The whole talk is worth watching/reading, but the part I liked best is Parrish’s reformulation of the “hacker ethic” laid out in Steven Levy’s book Hackers in the 1980s. Levy set out to chronicle the cultural history of hacking (the good kind, i.e. building and tinkering with computing and technology systems, as opposed to maliciously subverting them) and he claimed to summarize the “hacker ethic” in the following four points:
  • Access to computers should be unlimited and total
  • All information should be free
  • Mistrust authority; promote decentralization [of control and content]
  • Hackers should be judged by their hacking [skills], not “bogus” criteria such as degrees, age, race, or position
The gist of Parrish’s talk is that while the above principles are noble, the anecdotes Levy recounts of hacker behavior are often inconsistent with the above points, including a noteworthy anecdote in which some hackers “exercised their ethic” in a way that dismissively interfered with the work of Margaret Hamilton, who would go on to invent the term “software engineering” and to be the lead architect of the guidance computer software for the Apollo missions that landed the first humans on the moon.

For me the best part of the talk was Parrish’s proposed reformulation of the “hacker ethic” in a way that takes the form of questions that creators of technological systems should ask themselves as they think about deploying those systems. I doubt I can improve on her phrasing, so I’ll quote the talk transcript directly:
“…my ethic instead takes the form of questions that every hacker should ask themselves while they’re making programs and machines. So here they are.
Instead of saying access to computers should be unlimited and total, we should ask “Who gets to use what I make? Who am I leaving out? How does what I make facilitate or hinder access?”
Instead of saying all information should be free, we could ask “What data am I using? Whose labor produced it and what biases and assumptions are built into it? Why choose this particular phenomenon for digitization or transcription? And what do the data leave out?” 
Instead of saying mistrust authority, promote decentralization, we should ask “What systems of authority am I enacting through what I make? What systems of support do I rely on? How does what I make support other people?” 
And instead of saying hackers should be judged by their hacking, not bogus criteria such as degrees, age, race, or position, we should ask “What kind of community am I assuming? What community do I invite through what I make? How are my own personal values reflected in what I make?”
A few weeks ago a Medium post went viral in which developer Bill Sourour discussed “The code I’m still ashamed of”. At the direction of his manager, he had written code for a website that was posing as a general-information website where you could take a quiz to determine what prescription drugs were recommended for your particular symptoms and condition. In fact, though, the website was effectively an advertisement for a specific drug, and no matter what your responses to the quiz, the recommendation would always be the same—you needed this company’s drug. (A young woman later killed herself due to depression attributable in part to consuming the drug.) Business Insider reported that Sourour’s post had triggered a storm of “confessions” on Reddit from other engineers who were ashamed of having done similar things under duress, and includes some pointed comments from software thought leader “Uncle Bob” Martin such as “We are killing people.” He warns us that the Volkswagen emissions-cheating scandal was probably just the tip of the iceberg, and that even though in this case the CEO was ultimately held accountable (which doesn’t always happen), “it was software developers who wrote that code. It was us. Some programmers wrote cheating code. Do you think they knew? I think they probably knew.”

Uncle Bob goes on to lament that coding bootcamps rarely include any required material on software ethics, and I'm beginning to fear we don't do enough at Berkeley either. In my work as a college instructor, I do have to deal with breaches of academic integrity of various sorts, from straight-ahead plagiarism to students paying freelancers to do their homework to students presenting false documentation about medical emergencies to avoid taking an exam. Disturbingly often, when these students are confronted with evidence of their actions, their only remorse seems to be that they were caught, and I find myself wondering whether they are the software writers who will go on to insert “cheat code” into a future consumer product. We do have a required Engineering Ethics course and there is a software engineering ethics code endorsed by the Association for Computing Machinery, but I worry that our ethical training doesn't have sharp enough teeth. As Uncle Bob wrote, “We [software developers] rule the world, we just don’t know it yet.” We’d better start acting like it. Self-reflection questions like those proposed by Parrish would be a good place to start.

Tuesday, December 13, 2016

Getting a good software internship in industry or academia

I hire summer students all the time to work on software projects related to my research and teaching at Berkeley, and I have students coming to me frequently asking for advice on either getting a good internship or selecting among various offers. Leaving aside the (very important) nontechnical aspects of your interview or job choice, here's my technical advice from the point of view of what I look for in a software hire (and, not coincidentally, what I teach in Berkeley's software engineering course):

Have a story for testing. Testing and debugging software is much harder than writing it. The question you should be prepared to answer is not "Do you test?" but rather "How do you test?" Do you have tests that can be run programmatically? Do you have any idea how much of your code is covered by tests?

Demonstrate that you know the ecosystem. The developer ecosystem, roughly centered around GitHub, now includes continuous integration (Travis, Jenkins), project tracking (Pivotal Tracker), code quality measurement (CodeClimate), and more. How do you use these tools?

Have you worked in a team? Software development is rarely a solo endeavor anymore; most interesting software is built by teams. What practices does your team use to organize its activities (scrum, standups)? To manage simultaneous development (branches, fork-and-pull)? To communicate and stay on same page about the project (Slack, Basecamp)?

Be a full-stack developer.  What can I do with a front-end developer who can't write the back end? That's like someone who can build the front of my house but not the side walls or stairs. What I need is a house, even if a very simple one. Similarly, even a back-end developer must be able to get my app out to the end user in some way, even if the user interface is modest and simple.

Get good at picking up new languages and frameworks. This is more relevant for full-time jobs, but I'd rather hire someone who can learn new things fast than someone who's an expert on whatever framework I happen to be using right now, since new frameworks and languages come along all the time. How would you demonstrate that you can learn things fast?

Understand Unix. The single most influential set of ideas in modern server and client operating systems derives from the Unix programming model. The ability to quickly put together simple tools to do a job is vital to developer productivity. If I ask you "Given a set of large text files, count approximately how many distinct email addresses they contain," and your first question is "In what language," we're done.  One plausible answer is the shell command:

    grep -E -o '\b\w+@\w+\b'  filenames | sort | uniq | wc -l

Do your own projects.  If the only software you write is in course projects, I'd wonder if you really love coding enough. Imagine trying to compete in a tennis tournament if the only time you had spent on the court was during your actual tennis lessons. Class projects just don't give you enough mileage to get really good at software; doing your own projects (extracurriculars, pro-bono work, hackathons, personal projects, contributing to an open-source project, whatever) is essential. Be prepared to answer "What is the coolest personal software project you've worked on and what made it exciting/challenging/a great learning experience?"

Show me your code.  As my colleague and GitHub engineer Jesse Toth has succinctly put it, “Your code is your résumé.” A résumé without evidence of coding skill is minimally useful. If you can’t make your GitHub public repos part of your code portfolio or add me as a view-only collaborator on some repos, at least send me a link to a tarball of code you’ve written.