Thursday, December 28, 2017

Funding agencies, are you listening? How to build an unsuccessful research center

In 2014, my colleague David Patterson published a CACM Viewpoint titled How to Build a Bad Research Center, offering tongue-in-cheek advice on how to build a research center that's unlikely to produce breakthrough work. Dave used the recent wrap-up of the Berkeley Parallel Computing Lab (ParLab) as a case study in building a successful center, one that followed a pattern shared by the many successful centers for which Dave had been a PI or co-PI.

About a year ago (I'm behind in my reading), Jesper Larsson Träff, a parallel-computing researcher at TU Wien, wrote an eerily similar Viewpoint, (Mis)managing Parallel Computing Research Through EU Project Funding. His point: the bureaucracy and heavyweight project-management style required for multinational EU Projects (solicited regularly by the European Commission), including the reliance on artificial "deliverables" that seldom correlate with actual research breakthroughs, thwart efforts to do groundbreaking work.

Reading Träff's litany of bad research-management practices, one could almost believe the European Commission had read the "Eight Commandments for a Bad Center" in Dave's article but failed to realize they were supposed to be satirical. Both Dave's and Jesper's short pieces are well worth reading in their entirety; if you're pressed for time, I hope these side-by-side representative quotes will convince you to make room for them:

Patterson: "Bad Commandment #2: Expanse is measured geographically, not intellectually. For example, in the U.S. the ideal is having investigators from 50 institutions in all 50 states, as this would make a funding agency look good to the U.S. Senate."

Träff: "…the enforced geographical and thematic spread among consortium members in reality means that time and personnel per group is often too small to pursue serious and realistic research."

Patterson: "Bad Commandment 8. Thou shalt honor thy paper publishers. Researchers of research measure productivity by the number of papers and the citations to them. Thus, to ensure center success, you must write, write, write and cite, cite, cite."

Träff: "As part of the dissemination activities it is important for all consortium members to show regular presence at various associated, often self-initiated workshops and conferences, including meetings and workshops of other projects; high publication output is encouraged … The primary purpose of many of these activities is to meet the dissemination plans [deliverables], and has led to a proliferation of workshops presenting and publishing project-relevant trivialities. Apart from the time this consumes, the apparent authority of a workshop masquerading as a scientific event at a well-established conference may reinforce docility and low standards in the naive Ph.D., and appall and deter the more observant one."

Patterson: "A key to the success of our centers has been feedback from outsiders. Twice a year we hold three-day retreats with everyone in the center plus dozens of guests from other institutions [including industrial partners]. Having long discussions with outsiders often reshapes the views of the students and the faculty, as our guests bring fresh perspectives… Researchers desperately need insightful and constructive criticism, but rarely get it. During retreats, we are not allowed to argue with feedback when it is given; instead, we go over it carefully on our own afterward."  Note: In the Berkeley research center model, industrial affiliates pay a fee to participate in the project.

Träff: "…in many cases industrial involvement makes a lot of sense. However, it seems confusing at best to (ab)use scientific research projects to subsidize European (SME) industry. Can't this be done much more effectively by direct means? In any case, it would be more transparent and less ambiguous if industrial participation in EU projects was not directly funded through the projects. Strong projects would be interesting enough that industry would want to participate out of its own volition, which in particular should be the case for large businesses."  Note: In the EU Project research center model, public funds are used to pay private companies to participate in the project.

Träff closes his article with some suggestions for moving towards a "radically better" funding model for European high-performance computing research. Here too, unsurprisingly, Träff and Patterson find themselves in agreement:

Träff: "[P]roposals and projects to a larger extent [should] be driven by individual groups with a clear vision and consortium members selected by their specific project-relevant expertise. It would make long-term scientific, and perhaps even commercial sense to make it possible and easy to extend projects that produce actual, great results or valuable prototypes…more possibilities for travel grants and lightweight working groups to foster contacts between European research groups, and more EU scholarships for Ph.D. students doing their Ph.D. at different European universities would also be welcome additions."

Patterson: "After examining 62 NSF-funded centers in computer science, the researchers found that multiple disciplines increase chances of research success, while research done in multiple institutions—especially when covering a large expanse—decreases them: 'The multi-university projects we studied were less successful, on average, than projects located at a single university. ... Projects with many disciplines involved excelled when they were carried out within one university  (J. Cummings and S. Kiesler, Collaborative research across disciplinary and organizational boundaries, Social Studies of Science 35, 5 (2005), 703–722.)"

While the two pieces don't overlap 100%, there are great lessons to be learned from reading both.

The question is whether the responsible funding agencies are reading them.

Tuesday, December 19, 2017

"The Death of Big Software"

The current edition of our ESaaS textbook makes the case that as early as 2007, Jeff Bezos was pushing Amazon towards an "API only" integration strategy, in which all services would have to be designed with externalizable APIs and all inter-service communication (even inside the company) would have to occur over those APIs. We pointed out that this service-decomposition approach to designing a large service—i.e., composing it from many microservices—was one of the first highly visible successes of "service-oriented architecture" (SOA), a term that by 2011 had acquired a reputation as a content-free buzzphrase bandied about by technomarketeer wannabes. Indeed, I've recently reported on other CACM pieces extolling the benefits of microservices-based architecture and how such an organization enables a more agile approach to Dev/Ops.

As we reported in ESaaS, enormous monolithic projects (the opposite of SOA) fail too often, are too complex to manage, and so on. Since small agile teams (Bezos calls them "2-pizza teams") work best on modest-sized projects, that's all the more reason to decompose a very large service into its component parts and put a small agile team on each part. But a Viewpoint (op-ed) article in the December 2017 CACM goes further, flatly asserting that "big" monolithic software systems are dead. Beyond the execution problems, big monolithic systems, with their implied vendor lock-in and very long development cycles, are anathema to a nimble business. And the advent of inexpensive cloud computing removed an important deployment obstacle to "going SOA," since a multi-component app could be deployed and accessed from anywhere, even if the components came from different vendors.

All of which is to say that the time is more than ripe for the Second Edition of ESaaS (probably coming sometime next year) to take an "API first" approach to service design: rather than thinking in terms of an app and its features, the architecture design phase should proceed in terms of resources (in the REST sense) and the operations you want to expose on them. Expect a thoroughly revamped discussion of SaaS architecture, including a précis of the last 5-8 years of evolution: from the early stumbling blocks of SOAP-based SOA, through the growth of simple HTTP/XML/JSON-based RESTful services, to today's "API first" microservices design approach. SwaggerHub, here we come!
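To make resource-first thinking concrete, here's a minimal sketch expressed as Rails routes. The movie/review domain and the api/v1 namespacing are my own illustration, not an excerpt from the book:

    # "API first": enumerate the resources (in the REST sense) and the
    # operations you want to expose on them, before worrying about features.
    # Resource names below are purely illustrative.
    Rails.application.routes.draw do
      namespace :api do
        namespace :v1 do
          resources :movies, only: [:index, :show, :create, :update, :destroy] do
            resources :reviews, only: [:index, :create]  # nested resource
          end
        end
      end
    end

Start from a routing table like this one, and the app's "features" become whatever clients build atop the exposed operations, which is the Bezos mandate in miniature.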



Berkeley SaaS course Demo Day Competition winners!

In the Berkeley campus version of the ESaaS course (Computer Science 169, Software Engineering), which typically enrolls anywhere from 100 to 240 students working in teams of six, we used to have an end-of-semester poster session where every team would have a poster about their project. While it made for a lively event, it was hard for me (the instructor) to spend more than a few minutes at each poster. Of course, there are many other project deliverables, so the grade didn't really depend very heavily on the poster, but I still felt that it was a hurried and stressful process that could stand to be improved.

Last summer, Cal alumna Carina Boo, who had previously served as a TA for this course and is now a full-time software engineer at Google, was hired to guest-teach the course. It was very successful, and one of her innovations was to substitute an optional Demo Day Competition for the poster session. Participation would be completely voluntary and carry no extra-credit points; each team would get up to a 10-minute slot, both to discuss the challenges and lessons learned from working with their customer, and to present technical challenges in their app that would impress a panel of three judges, all full-time software engineers as well as former students and TAs of this course, among them Kevin Casey (now at Facebook) and Tony Lee (now at Splunk).

I adopted this format again this semester, and it worked so well that I expect to keep it in future semesters. Nine of the 21 teams volunteered to give 10-minute presentations, and some of their customers showed up as well to endorse the team's work. The presentations, and the work they represented, were uniformly excellent, and I'm pleased to congratulate the winners here:

1st place: Berkeley MarketPlace, a Berkeley-exclusive C2C (consumer-to-consumer) buyer and seller marketplace (also won the Best UI/UX subcategory). Congratulations to Jiazhen Chen, Shuyin Deng, Amy Ge, Yunyi Huang, Ashton Teng, Jack Ye.

2nd place: ESaaS Engagements, an app to track ESaaS customers across different institutions. Congratulations to Sungwon Cho, Joe Deatrick, Kim Hoang, Dave Kwon, Kyriaki Papadimitriou, Julie Soohoo.

Runner-up: Berkeley Student Learning Center, an app to help schedule tutoring sessions. Congratulations to Jennifer Be, Haggai Kaunda, Maiki Rainton, Nah Seok, Salvador Villegas, Alex Yang.

Most Technically Challenging: a tie between two projects that the judges considered to have surmounted unusually tough technical challenges:
  • iC-uC, a lab app to help optometry students model eye geometry and vision for the UC Berkeley Optometry School. The app had to integrate a Matlab terminal and graphs with a Java wrapper and a Rails front-end.
  • Audience1st, a ticketing and back-office app used by various community theaters in the Bay Area. Among the features added was the online migration of the entire authentication system to OmniAuth, without disturbing the existing login experience and without access to customers' current passwords.



Friday, October 6, 2017

Making CS/IT education more immersive

This month's Communications of the ACM includes an article by Thomas A. Limoncelli on Four Ways to Make CS and IT More Immersive. The author laments that, by and large, CS students aren't learning best practices in the classroom, and offers four specific pieces of advice for serving students better.


1. Use best-of-breed Dev/Ops tools from the start: the normal way for students to work on coding tasks should be to use Git, submit via commit/push, and provide pointers to CI reports.

In our ESaaS course and MOOC, students do just this. Git is used from day one, and student projects (in the campus version of the course, and a coming-soon Part 3 MOOC) are required to include coverage, CodeClimate, and CI badges on their repo front pages. Homework submission is often via deployment to Heroku.

2. Homework programs, even Hello World, should generate Web pages, not text. This requires understanding minimal bits of SaaS architecture, languages, and moving parts.

ESaaS's first assignment is indeed a Hello World in Ruby, but the very next one has students manipulating a simple SaaS app built using Sinatra (a sketch of such an app, tests included, appears after this list). The goal is to get students "thinking SaaS" early on.

3. Curricula should start with a working system that shows best practices, not just build from low-level to higher-level abstractions.

ESaaS's second assignment has students examining the code of a simple Sinatra app that follows good coding practices, includes integration and unit tests (before students have been introduced to creating or even reading such tests), and is brought up by deploying it to the public cloud.

4. Focus on reading, understanding, and adding value to an existing system—not just on greenfield development. The former is much more common than the latter in the software engineering profession.

ESaaS indeed includes a 2-part homework assignment on enhancing legacy code, but more exciting is that an increasing fraction of the projects undertaken in Berkeley's version of the course (10 of 11 projects in the Summer 2017 offering; 13 of 20 in Fall 2017) are existing, functioning legacy systems that need additional features.
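Items 2 and 3 above both come down to starting students with a small but complete working SaaS app. As a hedged illustration, here is roughly what such a Sinatra starter might look like, tests included; this is my own sketch, not the actual ESaaS assignment code:

    # hello_app.rb -- "Hello World" as a Web page rather than terminal text.
    require 'sinatra'

    get '/' do
      "<h1>Hello, #{params['name'] || 'World'}!</h1>"
    end

    # hello_app_spec.rb -- a unit test using RSpec and Rack::Test, so even
    # the starter app models "a working system that includes tests."
    require 'rack/test'
    require 'rspec'
    require_relative 'hello_app'

    describe 'the hello app' do
      include Rack::Test::Methods
      def app; Sinatra::Application; end

      it 'greets the named visitor' do
        get '/', 'name' => 'Berkeley'
        expect(last_response.body).to include('Hello, Berkeley!')
      end
    end

Run ruby hello_app.rb to serve the page locally, or rspec hello_app_spec.rb to run the test; the same two files deploy to Heroku with the addition of a Gemfile and a short config.ru.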

Our course materials are freely available on edX as the 2-part (soon to be 3-part) MOOC sequence Agile Development Using Ruby on Rails, and plenty of instructor materials are available for teachers wanting to use this content or the accompanying textbook in a SPOC (Small Private Online Course).


Tuesday, September 5, 2017

Improving gender & ethnic diversity in CS: what can faculty do?

Recently, my colleagues Dave Patterson, John Hennessy, and Maria Klawe, all storied contributors to both academic and professional computing, wrote an excellent essay enumerating all the things that were incorrect (and there were many) in Google engineer James Damore's "anti-diversity post." For those who didn't hear about said post, which initially circulated internally at Google but eventually found its way onto the public Internet: Damore argued that the reason there are so few women in computing is that biological differences make women less fit than men for the technical and leadership activities associated with computing, and that companies that strive for gender parity are therefore misguided.

I can’t improve on my colleagues’ extensive rebuttal of what was a poorly-argued piece costumed in the garb of science, but as I just gave a phone interview to the Daily Cal (UC Berkeley’s student-run newspaper) on this topic, I thought I’d summarize my comments in this post too.

While Berkeley (and most of our peer institutions) makes ongoing and vigorous efforts through a panoply of programs designed to improve the representation of women and underrepresented minorities (URMs) in STEM fields, I believe such programs can only work if they are driven by the relationships that structure students' day-to-day lives on campus and that collectively form an institutional culture. Some of those relationships are among students themselves—as peers in a course, between student and TA, or as part of a student-run club or academic group. Other relationships are between students and faculty—as course instructors, research advisors, or sponsors/liaisons to student-run activities.

In my faculty role, I try to do two things: lead by example in my interactions with students, and listen to them.

Leading by example. There’s quite a bit we can do in our courses and research projects, such as:

    • Invite women and URM guest speakers, both to talk about their experience of being part of an underrepresented demographic in tech and just to give interesting technical talks.
    • If we supervise GSIs, or are the faculty sponsor of a student group, DeCal course, or other student-centric activity, we can make it a point when engaging with those students to bring up unconscious bias and the importance of improving diversity in our field. Especially for GSIs, their behavior, language, and demeanor are important because of their position of authority.
    • We can make students aware of research results on gender/ethnic diversity that are directly relevant to course topics, when possible. In my software engineering project course, I encourage project teams to make an effort to include women, since research on teams has found the presence of women to be a strong predictor of a team's success. Indeed, a study of GitHub code repositories found that women's contributions to code bases were more likely to be accepted, unless you could tell from their profile that they were women! I also often introduce bits of computing history (it's my hobby) that are relevant to the lecture topics; most students aren't aware that the "masculinization" of computing didn't start in earnest until the 1960s, and that until then virtually all programmers were women. In CS375, a required GSI orientation and training course that I've co-taught, we do an in-class activity, based on Google's unconscious-bias training, that directly demonstrates our proneness to unconscious bias.


Listening. As faculty, we can decide to allocate time to participate in programs designed to support women and URMs entering the field. One example is the Berkeley CS Scholars program, of which I'm a faculty sponsor. When I do get to interact with women or underrepresented minorities, I listen to what they have to say about their experience, so that I know what to fix. Was there a difficult interaction with a professor? With a GSI? With a fellow student? Unless we ourselves have had the experiences that many women or minorities have had in computing (and given the problem statement, most of us haven't), this is the only way to reliably understand what to work on.

Department- and University-supported diversity programs are great, and leaders like Patterson, Hennessy and Klawe have to stand up and point out the problems in posts like Damore's. That is a necessary but not sufficient part of fixing the problem. The “mentality of a field” is based on the values embodied in the relationships among people in that field, and as faculty it’s part of our responsibility to bring those values to every student relationship in which we participate.




Thursday, August 24, 2017

Our techno-fetishism problem

A frequent meme, both in the Software Engineering class I teach and during my open office hours (when any student can come for advice or discussion on anything), is a premature fear of lack of scalability when developing Internet services and Web apps. This concern is often hard to disentangle from techno-fetishism: the irrational desire to use the latest rockstar tech without really understanding whether it solves a problem you actually have, propped up by the flimsy rationalization that "we need to use it for {scalability | dependability | having a good user experience | responsiveness}."

There are various manifestations of this meme, including:
  • "We want to use {Mongo | Cassandra | your favorite NoSQL DB} because relational databases don't scale."
  • "We want to write our server in Node.js because synchronous/blocking/task-parallel appservers don't scale." (Weaker version: replace "don't scale" with "require more servers".)
  • "Our app is going to be huge, and we'd like your advice on how to achieve high scalability and 24x7 dependability across our soon-to-be-worldwide user base."
These concerns are well-intentioned; many of these students are old enough to remember how MySpace crashed and burned in part because of its inability to get these things right, leaving a vacuum into which Facebook immediately stepped. The rest is history.

But since that time, the tools ecosystem has come a long way, computers have gotten way faster, and server cycles and storage have dropped in price by one or more orders of magnitude.

My challenge is therefore to help these students understand two things clearly:
  1. No technology will help you if your fundamental design prevents scaling (your app is poorly factored and/or cannot be divided into smaller services that can be scaled separately).
  2. If your app is well-factored, these days you can get surprisingly far with conventional RDBMS-backed stacks on commodity hardware. 
Ozan Onay makes this point well in his post "You are not Google," which incidentally includes quotes from my Berkeley colleague Prof. Joe Hellerstein. As Onay points out:
"As of 2016, Stack Exchange [the forum system of which StackOverflow is just one part] served 200 million requests per day, backed by just four SQL servers: a primary for Stack Overflow, a primary for everything else, and two replicas."
In other words, the highest-traffic interactive Web app used by developers all over the world uses the most mundane (in many developers’ view) of technologies—most SQL servers support this kind of master/slave replication by simply tweaking a config file, no special app programming required. Similarly, as of a couple of years ago, all of Pivotal Tracker was hosted on a single RDBMS instance.

A more sarcastic view of the same meme is put forth in one of my favorite xtranormal movies on Node.js. Go ahead, watch it. It’s only a few minutes long. You’ll thank me someday.

The basic message is the same: Most new technologies, languages, frameworks, and so on evolved to fill a specific technical need, and especially if they evolved in a large company, they probably fill a technical need that you don't have, and that you may never have.

There is a subtext to the message too: it is relatively rare that a new tool or framework embodies a fundamentally new idea about software engineering. The event-loop framework used by Node.js has been around since the beginning of time; indeed, the original Mac OS required all apps to be written this way (in part because the original Mac's microprocessor lacked the memory-management hardware needed to protect concurrently running apps from one another), and it was super painful. OS designers created threads to make this pain go away: a threading system does automatically exactly what programmers are expected to do by hand in Node. And modern threading systems running on modern hardware are damn fast.
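To illustrate (my own sketch, not from any of the pieces cited here): a threaded TCP echo server in Ruby. Every blocking call that a Node programmer would restructure into callbacks simply lives on its own thread, and the OS scheduler does the interleaving:

    require 'socket'

    # Threaded echo server: one thread per connection. A blocking read in
    # one thread doesn't stall the others; the OS scheduler interleaves
    # them, doing the same job Node's event loop pushes onto callbacks.
    server = TCPServer.new(3000)
    loop do
      client = server.accept            # blocks until a client connects
      Thread.new(client) do |conn|
        while (line = conn.gets)        # blocks only this thread
          conn.puts(line)               # echo the line back
        end
        conn.close
      end
    end

Each thread spends nearly all of its life blocked in gets, which is exactly where a Node program would park a callback.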

Finally, and most importantly, no tool or framework can save you from a bad design. Some frameworks try to help you by effectively legislating some design decisions out of existence (e.g. Rails all but mandates a Model-View-Controller architecture), but there's still plenty of room to shoot yourself in the foot design-wise.

That will be your real obstacle to scaling and dependability—not the fact that you didn't go with Node.js or write your front end using React. So you can still wear that Node t-shirt, but please consider wearing it ironically.

Friday, June 16, 2017

Teach yourself to code using online tutorials…but which ones?

There's an almost embarrassing plethora of "teach yourself to code" tutorials online, including reference sites like W3Schools, interactive "try it in a browser" sites like Codecademy and CodeHunt, beginner-friendly step-by-step activities such as those featured on Code.org, and many more. And that's not even counting the dozens of online courses, both free (edX, Coursera) and paid (Udemy, Lynda), devoted to some aspect or other of learning to code.

But which of these are actually likely to be effective in producing learning? Researchers who study how students learn have developed a rich body of well-tested ideas and recommendations around evidence-based teaching and learning—that is, the kind where you can unambiguously measure whether learning is taking place—but which if any of the online teach-yourself-to-code resources actually follow these principles?

My colleague Prof. Andrew Ko at the University of Washington and his student Ada Kim have produced a very nice paper addressing this question, which they presented at SIGCSE 2017, the largest conference focused on computer science education. Kim and Ko studied 30 highly popular online resources, free and paid, for teaching yourself beginning coding skills.

They have numerous findings, but the one that struck me most was this: while many of these sites cover much the same material (with varying degrees of beginner-friendly organization and quality of presentation), most focus on how to understand and practice a particular coding concept, but provide little practice in identifying when and why to apply that concept in real programming situations. While Kim and Ko stop short of making "hard" recommendations, they find that the tutorials from Code.org, and online games such as Gidget, Lightbot, and CodeHunt, do the best job of incorporating elements of evidence-based teaching that have been shown to promote learning:

  • Provide immediate and personalized feedback to the learner
  • Organize concepts into a hierarchy, with clear goals for learning each concept
  • Provide opportunities for mastery learning, i.e. practicing a concept repeatedly until understanding is complete
  • Promote meta-cognition: knowing when and why to use a concept, not just how to use it

With everyone who claims to teach beginners to code trying to make a buck (or just get visibility and sell ad space), a critical look at these resources through the lens of computer science education research is a welcome breath of fresh air. Read their 6-page paper here.

Thursday, March 2, 2017

The 7 (or so) habits of highly successful projects

Each time we offer the UC Berkeley course based on the Engineering Software as a Service curriculum, students complete a substantial open-ended course project in teams of 4 to 6. Each team works with a nonprofit or campus business unit to solve a specific business problem using SaaS; the student organization Blueprint helps us recruit customers.

Besides creating screencasts demonstrating their projects, students also participate in a poster session in which we (instructors) ask them to reflect on the experience of doing the project and how the techniques taught in the class helped (or didn't help) their success.

Virtually all project teams praised BDD and Cucumber as a way of driving the development of the app and reaching agreement with the customer, and all agreed on the importance of the proper use of version control. Beyond that, there was more variation in which techniques different teams used (or wished in retrospect that they had used). Below is the most frequently-heard advice our students would give to future students doing similar projects, in approximate order of popularity (that is, items earliest in the list were independently reported by the largest number of project teams).
  1. Reuse, don't reinvent. Before coding something that is likely to be a feature used by other SaaS apps (file upload, capability management, and so on), take the time to search for Ruby gems or JavaScript libraries you can use or adapt. Even two hours of searching is less time than it takes to design, code and test it yourself.

  2. Start from a good object-oriented design and schema. Take the time to think about the key entity types (models), the relationships among them, and how to capture those relationships in a schema using associations and foreign keys. A good design reduces the likelihood of a painful refactoring due to a schema change. (A minimal sketch of this idea follows the list.)

  3. Weekly meetings are not enough.  Especially with the larger 6-person teams we used in the Fall 2013 course offering (237 students forming 40 teams), a 15-minute daily standup meeting helped tremendously in keeping everyone on track, preventing conflicting or redundant work, and informally sharing knowledge among team members who ran into problems. Teams that met only once a week and supplemented it with online chat or social-networking groups wished they had met more often.

  4. Commit to TDD early.  Teams that relied heavily on TDD found its greatest value in regression testing: regression bugs were spotted immediately and could be fixed quickly. Teams that didn't commit to TDD had problems with regression when adding features or refactoring. Teams that used TDD also noted that it helped them organize not only their code, but their thoughts on how it would be used ("the code you wish you had").

  5. Use a branch per feature. Fine-grained commits and branch-per-feature were essential in preventing conflicts and keeping the master branch clean and deployment-ready.

  6. Avoid silly mistakes by programming in pairs.  Not everyone paired, but those who did found that it led to higher quality code and avoided silly mistakes that might have taken extra time to debug otherwise.

  7. Divide work by stories, not by layers.  Teams in which one or a pair of developers owned a story had far fewer coordination problems and merge conflicts than teams that stratified by layer (front-end developer, back-end developer, JavaScript specialist, and so on) and also found that all team members understood the overall app structure better and were therefore more confident when making changes or adding features.
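To make habit #2 concrete, here's a minimal sketch, assuming a Rails 4-era app like those used in the course; the Customer/Ticket domain is invented for illustration:

    # Associations express the relationship in the object-oriented design...
    class Customer < ActiveRecord::Base
      has_many :tickets
    end

    class Ticket < ActiveRecord::Base
      belongs_to :customer
    end

    # ...and a migration captures it in the schema with a foreign key, so
    # the relationship lives in the database, not just in the code.
    class CreateTickets < ActiveRecord::Migration
      def change
        create_table :tickets do |t|
          t.string     :status
          t.references :customer, index: true, foreign_key: true
          t.timestamps
        end
      end
    end

Settling this much before writing feature code is exactly the up-front thinking habit #2 recommends: changing how Customer relates to Ticket after both models have accreted features means a new migration plus refactoring of every touch point.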
There you have it—the seven habits of highly successful projects, distilled from student self-reflections from approximately sixty projects over two offerings of the course.  We hope you find them helpful!