Monday, May 23, 2016

If you can only speak TDD, maybe you'll think TDD?

I just finished reading Through the Language Glass, an interesting popular-press exposition of the relationship between spoken language and perception, and in particular of the "weak version" of the Sapir-Whorf hypothesis, according to which a speaker's use of language influences their mental processing.
An example is given of the Guugu Yimithirr aboriginal language, in which all location-specifying speech acts use only absolute compass directions (NSEW) as opposed to relative directions. For example, if we were holding hands and standing next to each other, instead of saying "I am standing to the right of Armando", you'd say "I am standing to the north of Armando" (or whatever direction it happened to be). And if we then rotated clockwise together and faced a different direction, you'd then have to say "I'm now standing to the east of Armando." G-Y speakers have to think constantly about their orientation relative to the world, since communication about place would be impossible otherwise. Experiments are described whose provocative finding is that this makes some tasks harder for G-Y speakers, for example solving spatial puzzles in which the relative spatial relationships between objects are unchanged but their absolute orientation is changed.
What does this have to do with Agile+SaaS? I've been looking for ways to more strongly encourage the students in the Berkeley course to really commit to TDD. (Many do, of course, but some still write tests "after the fact" only because they know we will be grading them on test coverage.) I wonder if one could devise a lab exercise in which students develop some functionality via pair programming, but are only allowed to speak in terms of test results when discussing with each other. This would be an interactive (vs. autograded/solo) exercise, and the goal would be to get students to use a vocabulary that effectively limits their communication to describing things in terms of test results. E.g., "For this next line, the effect I want is that given a valid URL, it should retrieve a valid XML document." I know many of us already (strive to) maintain this mindset, but is anyone up for designing/scaffolding an exercise that would help us teach it to TDD initiates?
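As a rough illustration of what that sentence might look like once a pair writes it down, here is a minimal Python sketch using the standard unittest module. The function name fetch_xml, the example URL, and the injected opener are hypothetical names invented for this sketch, and the fake opener stands in for the network so the tests run offline. In a real TDD session the tests would be written first and would fail until fetch_xml existed.

```python
import unittest
import xml.etree.ElementTree as ET


def fetch_xml(url, opener):
    """Hypothetical function under test: fetch `url` and parse the body as XML.

    `opener` is any callable that takes a URL and returns the response body
    as a string; injecting it keeps the tests independent of the network.
    """
    return ET.fromstring(opener(url))


class FetchXmlTest(unittest.TestCase):
    def test_valid_url_yields_valid_xml_document(self):
        # "Given a valid URL, it should retrieve a valid XML document."
        def fake_opener(url):
            return "<root><message>ok</message></root>"

        doc = fetch_xml("http://example.com/feed.xml", fake_opener)

        self.assertEqual(doc.tag, "root")
        self.assertEqual(doc.findtext("message"), "ok")

    def test_malformed_body_is_reported_as_a_parse_error(self):
        # "Given a body that is not XML, it should raise a parse error."
        with self.assertRaises(ET.ParseError):
            fetch_xml("http://example.com/feed.xml", lambda url: "not xml")
```

Saved as a module, this could be run with python -m unittest. Each test name is itself a sentence a pair could say out loud to each other.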

Tuesday, May 17, 2016

Amazon Echo is a fun way to practice "API life skills"

Last week was my birthday, and I got an Amazon Echo. It's a remarkably cool device, but of course the coolest thing is you can write your own apps for it. (Technically they are "Alexa apps" and not "Echo apps": Alexa is the back end that runs the apps, and the Echo is a specific device that acts as a client, but there are also mobile client apps.) Amazon has a nice development environment in which you essentially write a simple "grammar" for the phrases you want your app to understand, and some "event handlers" that get called when particular phrases are recognized. You can write the event handlers in Node.js, Python, or Java. (No Ruby, but hey, I'm ecumenical and Python is just fine.)
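To give a feel for the "event handler" half of that description, here is a minimal Python sketch in the style of an AWS Lambda function. The intent name GetDeparturesIntent, the helper names, and the reply text are all hypothetical. The request and response field names reflect my understanding of the Alexa JSON format and should be checked against Amazon's documentation before you rely on them.

```python
def build_speech_response(text, end_session=True):
    """Wrap plain text in the JSON shape an Alexa app is expected to return."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_get_departures(intent):
    # Hypothetical handler: a real one would call a transit API here.
    return build_speech_response("The next train leaves in 5 minutes.")


# Maps intent names (declared in the app's "grammar") to handler functions.
INTENT_HANDLERS = {
    "GetDeparturesIntent": handle_get_departures,
}


def lambda_handler(event, context):
    """Entry point: dispatch on the request type and the intent name."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_speech_response("Ask me about departures.", end_session=False)
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        handler = INTENT_HANDLERS.get(intent["name"])
        if handler is not None:
            return handler(intent)
    return build_speech_response("Sorry, I did not understand that.")
```

The dispatch table keeps each handler small, and a handler can be exercised by passing it a hand-built dictionary, with no device involved.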
Students of Part 2 of ESaaS know that I have strong feelings about the use of Node.js for server-side code. (This post does a great job of summarizing my views, as does this mildly NSFW xtranormal video.) Nonetheless, Amazon's developer documentation starter examples are mostly in Node, so I decided to give it a fair shake.
I decided to make an Alexa app that gives me upcoming departure information for BART, the Bay Area's rapid transit system, which I use daily to get to work. They have a RESTful XML API in which simple HTTP GETs return an XML data structure you can parse.
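For anyone who hasn't parsed XML from an API before, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is made up for illustration and is not BART's actual schema; the element names (station, etd, destination, estimate, minutes) and the function name parse_departures are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up response body; a real API's element names may differ.
SAMPLE_XML = """
<root>
  <station>
    <name>Downtown Berkeley</name>
    <etd>
      <destination>Millbrae</destination>
      <estimate><minutes>4</minutes></estimate>
      <estimate><minutes>19</minutes></estimate>
    </etd>
    <etd>
      <destination>Richmond</destination>
      <estimate><minutes>7</minutes></estimate>
    </etd>
  </station>
</root>
"""


def parse_departures(xml_text):
    """Return {destination: [minutes, ...]} from a departures document."""
    root = ET.fromstring(xml_text.strip())
    departures = {}
    for etd in root.iter("etd"):
        destination = etd.findtext("destination")
        # Collect every <minutes> value nested under this destination.
        minutes = [int(m.text) for m in etd.iter("minutes")]
        departures[destination] = minutes
    return departures


print(parse_departures(SAMPLE_XML))
```

A real feed may use non-numeric values in places, so production code would need to handle a failed int() conversion.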

What does the code look like in Node.js to do a simple GET? It looks like this:

    http.get(endpoint + queryString, function (res) {
        var noaaResponseString = '';
        console.log('Status Code: ' + res.statusCode);

        if (res.statusCode != 200) {
            tideResponseCallback(new Error("Non 200 Response"));
        }

        res.on('data', function (data) {
            noaaResponseString += data;
        });

        res.on('end', function () {
            var noaaResponseObject = JSON.parse(noaaResponseString);

            if (noaaResponseObject.error) {
                console.log("NOAA error: " + noaaResponseObject.error.message);
                tideResponseCallback(new Error(noaaResponseObject.error.message));
            } else {
                var highTide = findHighTide(noaaResponseObject);
                tideResponseCallback(null, highTide);
            }
        });
    }).on('error', function (e) {
        console.log("Communications error: " + e.message);
        tideResponseCallback(new Error(e.message));
    });


As a less-experienced JavaScript programmer, I had to stare at it for a while to understand its flow, and then I realized: my God, all it does is fetch a JSON data structure using an HTTP GET call. Here's the code I wrote in Python to do the same thing (using BART's API):

    # Python 2; needs `import urllib2` at the top of the module.
    try:
        self.body = urllib2.urlopen(url).read()
    except urllib2.URLError:
        self.error = 'The BART website did not respond'

Does no one but me see what is wrong with this picture? What is app code doing responding to individual HTTP-level events? This clearly violates the "A" in the SOFA principle I teach in ESaaS: a function should stick to a single level of abstraction. I'm no more impressed with Node than I was at the beginning of the exercise.
But I digress. The point is that writing Alexa apps is a great way to build up your API-fu, since the best Alexa apps will probably receive concise speech commands, interact with several back-end APIs, and present concise speech output. And you can write and test the apps without having a device, using Amazon's web interface to test how the app responds to specific phrases. Plus the apps can be pretty fun to use, and the device makes me feel like I live in a home of the vaguely near future.
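One way to read the single-level-of-abstraction complaint about the Node example is sketched below in Python 3 (the snippet above uses Python 2's urllib2). The names fetch_json and next_high_tide and the JSON field names are hypothetical, chosen only to mirror the tide example. The injectable opener and fetch parameters are my own addition so the functions can be exercised without a network.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen):
    """Low level: HTTP and decoding details live here and nowhere else."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def next_high_tide(url, fetch=fetch_json):
    """App level: talks only about tides, never about status codes or chunks."""
    data = fetch(url)
    if "error" in data:
        raise RuntimeError(data["error"])
    return data["high_tide"]
```

The app-level function never sees a status code or a data chunk. It either gets a parsed structure back or an exception, which is all the original callback needed to know.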
So try out writing some Alexa apps—you can even deploy them free on Lambda, a "managed AWS" service in which you deploy handlers rather than full VMs (essentially).
But for the love of all that is good…write them in Python.

Looking under the hood in cloud datacenters

There's a great article in the May 2016 issue of Communications of the ACM authored by a group of Googlers, including former Berkeley PhD student David Oppenheimer (now the tech lead on Kubernetes) and my PhD advisor Eric Brewer (Berkeley professor on leave to be Google's VP Infrastructure), on the evolution of cloud-based management systems at Google. They were doing "containerization" of cloud servers before almost anyone else, and contributed to the Linux containerization code. The article distills over 10 years of designing and working with different approaches to containerizing software to share the hardware in Google's datacenters, and some lessons learned and still-open problems. A couple of key messages:

  • The "unit of virtualization" is no longer the virtual machine but the application, since modern systems like Kubernetes and Docker virtualize the app container and the files and resources on which it depends, but actually hide some details of the underlying OS, instead relying on OS-independent APIs. Apps are therefore less sensitive to OS-specific APIs and easier to port across containers.
  • As a result of having both resource containers and filesystem-virtualization containers map to an app (or small collection of apps that make up a service), troubleshooting and monitoring naturally have affinity to apps rather than (virtual) machines, simplifying devops.
  • A "label" scheme in which a given instance of an app has one or more labels simplifies deployment and devops as well. For example, if one instance of a service is misbehaving, that instance can (e.g.) have the "production" label removed from it, so that monitoring/load balancing services will exclude it from their operations but leave it running so it can be debugged in situ.
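The label idea in the last bullet can be sketched in a few lines of Python. This is a toy model, not the Kubernetes API: the Instance class, the select function, and the modeling of labels as key/value pairs (with "production" as the value of an env key) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    labels: dict = field(default_factory=dict)


def select(instances, **selector):
    """Return the instances whose labels match every key/value in `selector`."""
    return [
        inst for inst in instances
        if all(inst.labels.get(k) == v for k, v in selector.items())
    ]


instances = [
    Instance("web-1", {"app": "web", "env": "production"}),
    Instance("web-2", {"app": "web", "env": "production"}),
]

# Take the misbehaving instance out of rotation by dropping one label.
# It keeps running, so it can still be inspected where it is.
del instances[1].labels["env"]

in_rotation = select(instances, app="web", env="production")
under_debug = [i for i in select(instances, app="web") if i not in in_rotation]
```

A load balancer that queries with the production selector no longer sees web-2, while a debugging tool that queries only by app still does.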
The article also points to still-open problems, including (surprise!) configuration (every configuration DSL eventually becomes Turing-complete, and they're all mutually incompatible), dependency management (related to configuration, but with complications of its own: e.g., what if service X depends on Y, but Y must first go through a registration or authentication step before it can be deployed?), and more.

Overall a really interesting read that suggests, ironically, that the programmer-facing view of cloud datacenters is actually a "back to the 1990s" view of managing a collection of individual apps with APIs (remember ASPs?) rather than managing a collection of either virtual or physical machine images.