Friday, June 16, 2017

Teach yourself to code using online tutorials…but which ones?

There’s an almost embarrassing plethora of “teach yourself to code” tutorials online—including reference sites like W3schools, interactive “try it in a browser” sites like CodeCademy and CodeHunt, beginner-friendly step-by-step activities featured on various sites, and many more. And that’s not even counting the dozens of online courses, both free (edX, Coursera) and paid (Udemy, Lynda), devoted to some aspect or other of learning to code.

But which of these are actually likely to be effective in producing learning? Researchers who study how students learn have developed a rich body of well-tested ideas and recommendations around evidence-based teaching and learning—that is, the kind where you can unambiguously measure whether learning is taking place—but which if any of the online teach-yourself-to-code resources actually follow these principles?

My colleague Prof. Andrew Ko at the University of Washington and his student Ada Kim have produced a very nice paper addressing this question, which they presented at SIGCSE 2017, the largest conference focused on computer science education. Kim and Ko studied 30 highly popular free and paid online resources for teaching yourself beginning coding skills.

They have numerous findings, but the one that struck me most was this: while many of these sites cover much of the same material (with varying degrees of beginner-friendly organization and quality of presentation), most focus on how to understand and practice a particular coding concept, but provide little practice in identifying when and why to apply that concept in real programming situations. While Kim and Ko stop short of making “hard” recommendations, they find that certain tutorials, and online games such as Gidget, Lightbot, and CodeHunt, do the best job of incorporating elements of evidence-based teaching that have been shown to promote learning:

  • Provide immediate and personalized feedback to the learner
  • Organize concepts into a hierarchy, with clear goals for learning each concept
  • Provide opportunities for mastery learning, i.e. practicing a concept repeatedly until understanding is complete
  • Promote meta-cognition—knowing how and when to use a concept, not just how to use it
With everyone who claims to teach beginners to code trying to make a buck (or just get visibility and sell ad space), a critical look at these resources through the lens of Computer Science Education Research is a welcome breath of fresh air. Read their 6-page paper here.

Thursday, March 2, 2017

The 7 (or so) habits of highly successful projects

Each time we offer our software engineering course at UC Berkeley based on the Engineering Software as a Service curriculum, students complete a substantial open-ended course project in teams of 4 to 6.  Each team works with a nonprofit or campus business unit to solve a specific business problem using SaaS; the student organization Blueprint helps us recruit customers.

Besides creating screencasts demonstrating their projects, students also participate in a poster session in which we (instructors) ask them to reflect on the experience of doing the project and how the techniques taught in the class helped (or didn't help) their success.

Virtually all project teams praised BDD and Cucumber as a way of driving the development of the app and reaching agreement with the customer, and all agreed on the importance of the proper use of version control. Beyond that, there was more variation in which techniques different teams used (or wished in retrospect that they had used). Below is the most frequently-heard advice our students would give to future students doing similar projects, in approximate order of popularity (that is, items earliest in the list were independently reported by the largest number of project teams).
  1. Reuse, don't reinvent. Before coding something that is likely to be a feature used by other SaaS apps (file upload, capability management, and so on), take the time to search for Ruby gems or JavaScript libraries you can use or adapt. Even two hours of searching is less time than it takes to design, code and test it yourself.

  2. Start from a good object-oriented design and schema. Take time to think about the key entity types (models), the relationships among them, and how to capture those relationships in a schema using associations and foreign keys. A good design reduces the likelihood of a painful refactoring due to a schema change.

  3. Weekly meetings are not enough.  Especially with the larger 6-person teams we used in the Fall 2013 course offering (237 students forming 40 teams), a 15-minute daily standup meeting helped tremendously in keeping everyone on track, preventing conflicting or redundant work, and informally sharing knowledge among team members who ran into problems. Teams that met only once a week and supplemented it with online chat or social-networking groups wished they had met more often.

  4. Commit to TDD early.  Teams that relied heavily on TDD found its greatest value in regression testing: regression bugs were spotted immediately and could be fixed quickly. Teams that didn't commit to TDD had problems with regression when adding features or refactoring. Teams that used TDD also noted that it helped them organize not only their code, but their thoughts on how it would be used ("the code you wish you had").

  5. Use a branch per feature. Fine-grained commits and branch-per-feature were essential in preventing conflicts and keeping the master branch clean and deployment-ready.

  6. Avoid silly mistakes by programming in pairs.  Not everyone paired, but those who did found that it led to higher-quality code and caught mistakes that might otherwise have taken extra time to debug.

  7. Divide work by stories, not by layers.  Teams in which one or a pair of developers owned a story had far fewer coordination problems and merge conflicts than teams that stratified by layer (front-end developer, back-end developer, JavaScript specialist, and so on) and also found that all team members understood the overall app structure better and were therefore more confident when making changes or adding features.
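As a concrete illustration of the regression payoff in habit 4, here is a minimal sketch using Ruby's bundled Minitest; the `Cart` class and its discount rule are hypothetical stand-ins, not code from any student project:

```ruby
# A minimal regression test with Ruby's bundled Minitest.
# Cart and its discount rule are hypothetical stand-ins for app code.
require "minitest/autorun"

class Cart
  def initialize(prices)
    @prices = prices
  end

  # Hypothetical business rule: orders of $100 or more get 10% off.
  def total
    subtotal = @prices.sum
    subtotal >= 100 ? subtotal * 0.9 : subtotal
  end
end

class CartTest < Minitest::Test
  def test_no_discount_below_threshold
    assert_equal 99, Cart.new([50, 49]).total
  end

  def test_discount_applies_at_threshold
    assert_in_delta 90.0, Cart.new([60, 40]).total
  end
end
```

Running `ruby cart_test.rb` executes every test, so a later change that silently breaks the discount rule is caught on the very next run instead of weeks later during a demo.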
There you have it—the seven habits of highly successful projects, distilled from student self-reflections from approximately sixty projects over two offerings of the course.  We hope you find them helpful!

Monday, December 19, 2016

“Towards a new hacker ethic”

I just read the transcript (and saw the slide deck) of a great talk, Programming is Forgetting: Toward a New Hacker Ethic, by Allison Parrish at the 2016 Open Hardware Summit.
The whole talk is worth watching/reading, but the part I liked best is Parrish’s reformulation of the “hacker ethic” laid out in Steven Levy’s book Hackers in the 1980s. Levy set out to chronicle the cultural history of hacking (the good kind, i.e. building and tinkering with computing and technology systems, as opposed to maliciously subverting them) and he claimed to summarize the “hacker ethic” in the following four points:
  • Access to computers should be unlimited and total
  • All information should be free
  • Mistrust authority; promote decentralization [of control and content]
  • Hackers should be judged by their hacking [skills], not “bogus” criteria such as degrees, age, race, or position
The gist of Parrish’s talk is that while the above principles are noble, the anecdotes Levy recounts of hacker behavior are often inconsistent with them. In one noteworthy anecdote, some hackers “exercised their ethic” in a way that dismissively interfered with the work of Margaret Hamilton, who would go on to coin the term “software engineering” and to be the lead architect of the guidance computer software for the Apollo missions that landed the first humans on the moon.

For me the best part of the talk was Parrish’s proposed reformulation of the “hacker ethic” in a way that takes the form of questions that creators of technological systems should ask themselves as they think about deploying those systems. I doubt I can improve on her phrasing, so I’ll quote the talk transcript directly:
“…my ethic instead takes the form of questions that every hacker should ask themselves while they’re making programs and machines. So here they are.
Instead of saying access to computers should be unlimited and total, we should ask “Who gets to use what I make? Who am I leaving out? How does what I make facilitate or hinder access?”
Instead of saying all information should be free, we could ask “What data am I using? Whose labor produced it and what biases and assumptions are built into it? Why choose this particular phenomenon for digitization or transcription? And what do the data leave out?” 
Instead of saying mistrust authority, promote decentralization, we should ask “What systems of authority am I enacting through what I make? What systems of support do I rely on? How does what I make support other people?” 
And instead of saying hackers should be judged by their hacking, not bogus criteria such as degrees, age, race, or position, we should ask “What kind of community am I assuming? What community do I invite through what I make? How are my own personal values reflected in what I make?”
A few weeks ago a Medium post went viral in which developer Bill Sourour discussed “The code I’m still ashamed of”. At the direction of his manager, he had written code for a website that was posing as a general-information website where you could take a quiz to determine what prescription drugs were recommended for your particular symptoms and condition. In fact, though, the website was effectively an advertisement for a specific drug, and no matter what your responses to the quiz, the recommendation would always be the same—you needed this company’s drug. (A young woman later killed herself due to depression attributable in part to consuming the drug.)

Business Insider reported that Sourour’s post had triggered a storm of “confessions” on Reddit from other engineers who were ashamed of having done similar things under duress, and includes some pointed comments from software thought leader “Uncle Bob” Martin such as “We are killing people.” He warns us that the Volkswagen emissions-cheating scandal was probably just the tip of the iceberg, and that even though in this case the CEO was ultimately held accountable (which doesn’t always happen), “it was software developers who wrote that code. It was us. Some programmers wrote cheating code. Do you think they knew? I think they probably knew.”

Uncle Bob goes on to lament that coding bootcamps rarely include any required material on software ethics, and I'm beginning to fear we don't do enough at Berkeley either. In my work as a college instructor, I do have to deal with breaches of academic integrity of various sorts, from straight-ahead plagiarism to students paying freelancers to do their homework to students presenting false documentation about medical emergencies to avoid taking an exam. Disturbingly often, when these students are confronted with evidence of their actions, their only remorse seems to be that they were caught, and I find myself wondering whether they are the software writers who will go on to insert “cheat code” into a future consumer product. We do have a required Engineering Ethics course and there is a software engineering ethics code endorsed by the Association for Computing Machinery, but I worry that our ethical training doesn't have sharp enough teeth. As Uncle Bob wrote, “We [software developers] rule the world, we just don’t know it yet.” We’d better start acting like it. Self-reflection questions like those proposed by Parrish would be a good place to start.

Tuesday, December 13, 2016

Getting a good software internship in industry or academia

I hire summer students all the time to work on software projects related to my research and teaching at Berkeley, and I have students coming to me frequently asking for advice on either getting a good internship or selecting among various offers. Leaving aside the (very important) nontechnical aspects of your interview or job choice, here's my technical advice from the point of view of what I look for in a software hire (and, not coincidentally, what I teach in Berkeley's software engineering course):

Have a story for testing. Testing and debugging software is much harder than writing it. The question you should be prepared to answer is not "Do you test?" but rather "How do you test?" Do you have tests that can be run programmatically? Do you have any idea how much of your code is covered by tests?

Demonstrate that you know the ecosystem. The developer ecosystem, roughly centered around GitHub, now includes continuous integration (Travis, Jenkins), project tracking (Pivotal Tracker), code quality measurement (CodeClimate), and more. How do you use these tools?
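For instance, wiring a Ruby project into Travis CI takes only a short `.travis.yml` at the root of the repo; the Ruby version and rake task below are illustrative, since the details vary per project:

```yaml
# Illustrative Travis CI config for a Ruby project;
# the version and test task will differ per project.
language: ruby
rvm:
  - 2.3.1
cache: bundler
script:
  - bundle exec rake test
```

With this in place, the test suite runs automatically on every push and pull request—exactly the kind of workflow fluency these questions are probing for.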

Have you worked in a team? Software development is rarely a solo endeavor anymore; most interesting software is built by teams. What practices does your team use to organize its activities (scrum, standups)? To manage simultaneous development (branches, fork-and-pull)? To communicate and stay on the same page about the project (Slack, Basecamp)?

Be a full-stack developer.  What can I do with a front-end developer who can't write the back end? That's like someone who can build the front of my house but not the side walls or stairs. What I need is a house, even if a very simple one. Similarly, even a back-end developer must be able to get my app out to the end user in some way, even if the user interface is modest and simple.

Get good at picking up new languages and frameworks. This is more relevant for full-time jobs, but I'd rather hire someone who can learn new things fast than someone who's an expert on whatever framework I happen to be using right now, since new frameworks and languages come along all the time. How would you demonstrate that you can learn things fast?

Understand Unix. The single most influential set of ideas in modern server and client operating systems derives from the Unix programming model. The ability to quickly put together simple tools to do a job is vital to developer productivity. If I ask you "Given a set of large text files, count approximately how many distinct email addresses they contain," and your first question is "In what language," we're done.  One plausible answer is the shell command:

    grep -E -o -h '\b\w+@\w+\b' filenames | sort | uniq | wc -l

Do your own projects.  If the only software you write is in course projects, I'd wonder if you really love coding enough. Imagine trying to compete in a tennis tournament if the only time you had spent on the court was during your actual tennis lessons. Class projects just don't give you enough mileage to get really good at software; doing your own projects (extracurriculars, pro-bono work, hackathons, personal projects, contributing to an open-source project, whatever) is essential. Be prepared to answer "What is the coolest personal software project you've worked on and what made it exciting/challenging/a great learning experience?"

Show me your code.  As my colleague and GitHub engineer Jesse Toth has succinctly put it, “Your code is your résumé.” A résumé without evidence of coding skill is minimally useful. If you can’t make your GitHub public repos part of your code portfolio or add me as a view-only collaborator on some repos, at least send me a link to a tarball of code you’ve written.

Sunday, December 11, 2016

"Keeping the lights on" for ESaaS-built pro-bono software deployments

Berkeley's software engineering course, which I developed with Prof. Dave Patterson, has something important in common with the nonprofit AgileVentures, founded by Prof. Sam Joseph (who also is the lead facilitator for our edX MOOC on Agile Development).

Both organizations allow developers-in-training (students in the case of Berkeley; professionals in the case of AV) to work on open source pro bono software projects in a mentored setting, usually for nonprofits and NGOs. Indeed, since 2012, Berkeley students have delivered/deployed over 100 such projects, many still in production use by the original customer.

A perennial problem we've had, though, is what to do when each offering of the course ends. How can these nontechnical customers arrange for basic maintenance of their apps if they don't have any IT staff who can do this? Even if the customer wants future students to continue working on the software, it needs to be kept running until the next course offering.

This semester (Fall 2016) we're trying something new. AgileVentures is introducing a "Nonprofit Basic Support" membership tier that gives nonprofits basic support for maintaining these SaaS apps. For a very low monthly fee, an AgileVentures developer will be the maintenance contact for the app, ensure its "lights are kept on" (restart when needed, etc.), and advise the customer if the app needs other attention, for example, if the app's resource needs require it to be moved to a paid or higher hosting tier.

The goal is to "keep the lights on" either until the next team of students or AV developers further enhances the app, or until the customer decides to take over maintenance of the app themselves (or move it to another contractor).

Of course, a few customers don't need this service; they may already have in-house staff, or the app may be one that was already in use and for which our student developers just provided new features. But for the majority of customers who are nontechnical and may not even be able to afford in-house IT for maintaining these apps, we look forward to seeing how this experiment works out!

Thursday, September 29, 2016

Agile DevOps

If you're surprised at how frequently in this blog I mention articles from Communications of the ACM ("CACM"), you're missing out. Especially if you're a student, membership is inexpensive and the flagship monthly magazine tends to be full of really good stuff relevant to both practice and research.

Today I'm blogging about an article in the July 2016 issue (yes, I'm behind in my reading) talking about the "small batches principle" for Dev/Ops tasks. The article is written by a Dev/Ops engineer at Stack Overflow, the now-legendary site where all programmers go to give and receive coding help and which was co-founded by Joel Spolsky, one of my software heroes and author of the excellent Joel On Software blog and books.

This article talks about various in-house tasks associated with deploying software, such as pre-release testing and hot-patching, building and running a monitoring system, and other tasks that this company (and many others) historically did once in a great while. The month during which a new release was being tested and deployed became known as "hell month" because of the magnitude and pain of the tasks.

The article describes how Stack Overflow has migrated to a model of doing smaller chunks of work incrementally to avoid having to do very large chunks every few months; how they moved to a "minimum viable product" mentality (what is the simplest product you can build that will enable you to get feedback from the customer to validate or reject the features and product direction); how they adopted a "What can get done by Friday"-driven mentality, so that there would always be some new work on which their customers (in this case, the development and test engineers) could comment; and so on.

Essentially, they moved to an Agile model of Dev/Ops: do work in small batches so each batch minimizes the risk of going off in the wrong direction; deploy incremental new changes frequently to stimulate customer feedback; and thereby avoid painful "hell months" (possibly analogous to "merges from hell" on long-lived development branches).

Agile applies to a lot of things, but this article does a nice job of mapping the concepts to the Dev/Ops world from the pure development world.

Tuesday, September 20, 2016

Flipped classroom? No thanks, I'd rather you lecture at me

As an early adopter and enthusiast of MOOCs, I've followed the "flipped classrooms" and "lecture is dead" conversations with some interest. Indeed, why would students attend lecture—and why would professors repeat last semester's lectures—when high-quality versions are available online for viewing anytime?

This semester, I'm once again teaching Software Engineering at Berkeley to about 180 undergraduates. (Past enrollments have been as high as 240, but we capped it lower this semester to make projects more manageable.) In the past, Dave Patterson and I have team-taught this class and we've developed a fairly well-polished set of lectures and other materials that are available as a MOOC on edX, Agile Development Using Ruby on Rails, and which we also use as a SPOC (Small Private Online Course).

Last spring, by the end of the semester our lecture attendance had plummeted to about 15% of enrollment. We surveyed the students anonymously to ask how we could make lecture time more valuable. A majority of students suggested that since they could watch pre-recorded lectures, why not use lecture time to do more live coding demos and work through concrete examples? In other words, a classic "flipped lecture" scenario.

So this semester I dove in feet-first to do just that. Less than 20% of my lecture time has been spent covering material already in the recorded lectures; instead, we made those available to students from the get-go. The rest has consisted of live demos showing the concepts in action, and activities involving extensive peer learning, of which I'm a huge fan. I've even started archiving the demo "scripts" and starter code for the benefit of other instructors. The "contract" was that we would list which videos (and/or corresponding book sections) students should read or watch before lecture, and then during the lecture time (90 minutes twice a week) the students would be well prepared to understand the demo and ask good questions as it went along.

I told the students I would also sprinkle "micro-quizzes" into the lectures to spot-check that they were indeed reading/viewing the preparatory materials. The micro-quiz questions were intended to be simple recall questions that you'd have no trouble answering if you skimmed the preparatory material for the main ideas. (We use iClicker devices that are registered to the students, so we can map both responses and participation during lecture to individual students.)

Today in lecture, a bit more than 4 weeks into the course, I've officially declared the flipped experiment a failure. (Well, not a failure. A negative result is interesting. But it's not the result I expected.)

Since we made the pre-recorded lectures available within an edX course using the edX platform itself, and students have to log in with their Berkeley credentials to access the course, we can see using edX Insights (part of the platform's built-in analytics) how many people are watching the videos.

According to the edX platform's analytics, the typical video is partially viewed by about 20 people. Only 45 people have ever watched any video to completion. Remember, this is in a class of 180 enrolled students, whose previous cohort specifically requested this format.

Maybe people are watching the videos after lecture rather than before? If they were, you'd expect video viewing numbers to be higher for videos corresponding to previous weeks, but they're not.

Maybe people are reading the book instead? If they were, the performance on the microquizzes should be nearly unimodal—they are simple recall questions that you cannot help but get right if you even skim the video or the book—but in fact the answer distributions are in some cases nearly uniform.

Maybe people already know the material from (e.g.) an internship? One or two students did approach me after today's lecture to tell me that. I believe them, and I know there are some students like this. But we also gave a diagnostic quiz at the beginning of the semester, and based on its results, very few people match this description.

Maybe students don't have time to watch videos or read the book before lecture? This is a 4-unit course, which means students should nominally be spending 12 hours per week total on it, of which lecture and section together account for only 4. The reading assignments, which can be done instead of watching the videos, average out to 15-20 pages twice a week, or 30-40 pages per week. Speaking as a faculty member leading an upper-division course in the world's best CS department at the world's best public university, I don't believe 30-40 pages of reading per week per course is onerous. Also, in past offerings of the course, we've polled students towards the end of the course to ask how many hours they are actually spending per week. The average has consistently been 8-10 hours, even towards the end where the big projects come due. So by the students' own self-reporting, there's 2-4 hours available to either do the readings or watch the videos.

As you might imagine, planning and executing live demos requires a couple of hours of preparation per hour of lecture to come up with a demo "script", stage code that can be copy-pasted so students don't have to watch me type every keystroke, ensure the examples run in such a way as to illustrate the correct points or techniques, remember what to emphasize in peer-learning questions at each step of the demo, and so on. But it's discouraging to do this if at most 1/9 of the students are doing the basic prep work that will enable them to get substantive value out of the live demo examples.

So at the end of lecture today I informed the students that I wouldn't use this format any longer—those who wanted to work through examples could come to office hours, which have been quiet so far—and I asked them to vote using the clickers on how future lectures should be conducted:

(A) Deliver traditional lectures that closely match the pre-recorded ones
(B) Cancel one lecture each week, replacing it with open office hours (in addition to existing office hours)
(C) Cancel both lectures each week, replacing both with open office hours (ditto)
(D) I have another suggestion, which I am posting right now on Piazza with tag #lecture
(E) I don’t care/I’ll probably ignore lecture regardless of format

Below are the results of that poll. The people have spoken. Less work for me, but disappointing on many levels, and I feel bad for the 20-30 people who do prepare and who have been getting a lot of value out of the more involved examples in "lecture" that go beyond the book or videos. And it doesn't seem like a great use of my time to do a live performance of material that's already recorded and on which I'm unlikely to improve.

(Note that the number of poll respondents adds up to 121, consistent with previous lectures this semester. So even in steady state at the beginning of the semester, 1/3 of the students aren't showing up, even though participation in peer-instruction activities using the clickers counts for part of the grade.)

(Update: I posted a link to this article on the Berkeley CS Facebook page, and many students have commented. One particularly interesting comment comes from a student who took my class previously: “When I took your class, as someone who rarely, if ever, went to lecture or prepared, the microquizzes helped me convince myself to do that. They weren't hard, so getting them wrong was just embarrassing to me. I was probably the best student I ever was until you stopped doing them my semester for reasons I can't recall and after that the whole thing fell apart for me.” So maybe I should just stick to my guns on the "read before lecture" part and make the microquizzes worth more than the ~10% they're currently worth...)

A year or so ago, I approached the campus Committee on Curriculum and Instruction to ask whether they'd be likely to approve a "lab" version of this course, in which there is no live lecture (only pre-recorded), the lecture hours are replaced with open lab/office hours, and all the other course elements (teaching assistants, small sections, office hours for teaching staff, etc.) are preserved. They said that would likely be fine. It's sounding like a better and better deal to me given that the majority of students want me to spend my time doing something nearly identical to what I've done in past semesters. And if previous semesters are any indication—and this has been very consistent since I started teaching this course in 2012—lecture attendance will fall off about halfway through. (I won't be surprised if the rationale given is that the lectures are available to watch online, even though the data we're gathering this semester shows that's clearly not happening.)

In an ideal universe, maybe we'd have 2 versions of the course, one tailored to people who do the prep work and want a fast-paced in-depth experience in lecture and another for people who prefer to be lectured to in the traditional manner. But we don't live in that ideal universe because of finite resource constraints. And one could argue that we already have that second version of the course—it's on edX and is the highest-grossing and one of the most popular courses on the platform, and it's free.

Instructors: What has your experience been flipping a large class?

Students: What would you do?