Dropbox dives into CoffeeScript

Uncategorized / 118 Comments Posted by Dan Wheeler, Ziga Mahkovec, Chris Varenhorst on September 13, 2012

During July’s Hackweek, the three of us rewrote Dropbox’s full browser-side codebase to use CoffeeScript instead of JavaScript, and we’ve been really happy with how it’s been going so far. This is a controversial subject, so we thought we’d start by explaining why.



By the way, the JavaScript has a scoping bug, did you catch it??

We’ve heard many arguments against CoffeeScript. Before diving in, we were most concerned about these two:

  1. That it adds extra bloat to iterative development, because each tweak requires recompilation. In our case, we avoided this problem entirely by instrumenting our server code: whenever someone reloads a Dropbox page running on their development server, it compare mtimes between .coffee files and compiled .js equivalents. Anything needing an update gets compiled. Compilation is imperceptibly fast thanks to jashkenas and team. This means we didn’t need to change our workflow whatsoever, didn’t need to learn a new tool, or run any new background process (no coffee --watch). We just write CoffeeScript, reload the page, loop.
  2. That debugging compiled js is annoying. It’s not, and the main reason is CoffeeScript is just JavaScript: it’s designed to be easy to debug, in part by leaving JavaScript semantics alone. We’ve heard many arguments for and against debuggability, and in the end, we convinced ourselves that it’s easy only after jumping in and trying it. We converted and debugged about 23,000 lines of JavaScript into CoffeeScript in one week without many issues. We took time to test the change carefully, then slowly rolled it out to users. One week after Hackweek had ended, it was fully launched.

Probably the most misleading argument we hear against CoffeeScript goes something like this: If you like Python or Ruby, go for CoffeeScript — it’s really just a matter of syntactic preference. This argument frustrates us, because it doesn’t consider history. Stick with us for a minute:

  • April 1995: Brendan Eich, a SICP enthusiast, joins Netscape with the promise of bringing Scheme to the browser.
  • He’s assigned to other projects in the first few months that he joins. Java launches in the meantime and explodes in popularity.
  • Later in ’95: Scheme is off the table. Upper management tasks Eich with creating a language that is to Java as VBScript is to C++, meant for amateurs doing simple tasks, the idea being that self-respecting pros would be busy cranking out Java applets. In Eich’s words:

    JS had to “look like Java” only less so, be Java’s dumb kid brother or boy-hostage sidekick. Plus, I had to be done in ten days or something worse than JS would have happened.

    Imagine Bruce Campbell Brendan Eich as he battled sleep deprivation to get a prototype out in 10 days, all the while baking his favorite concepts from Scheme and Self into a language that, on the surface, looked completely unrelated. LiveScript is born. It launches with Netscape Navigator 2.0 in September ’95.

  • December ’95: For reasons that are probably marketing-related and definitely ill-conceived, Netscape changes the name from LiveScript to JavaScript in version 2.0B3.
  • August ’96: Microsoft launches IE 3.0, the first version to include JavaScript support. Microsoft calls their version “JScript” (presumably for legal reasons).
  • November ’96: ECMA (Now Ecma) begins standardization. Netscape and Microsoft argue over the name. The result is an even worse name. Quoting Eich, ECMAScript “was always an unwanted trade name that sounds like a skin disease.”

Especially considering the strange, difficult and rushed circumstances of its origin, JavaScript did many things well: first class functions and objects, prototypes, dynamic typing, object literal syntax, closures, and more. But is it any surprise that it got a bunch of things wrong too? Just considering syntax, things like: obscuring prototypical OOP through confusingly classical syntax, the var keyword (forgot var? congrats, you’ve got a global!), automatic type coercion and == vs ===, automatic semicolon insertion woes, the arguments object (which acts like an array except when it doesn’t), and so on. Before any of these problems could be changed, JavaScript was already built into competing browsers and solidified by an international standards committee. The really bad news is, because browsers evolve slowly, browser-interpreted languages evolve slowly. Introducing new iteration constructs, adding default arguments, slices, splats, multiline strings, and so on is really difficult. Such efforts take years, and require cooperation among large corporations and standards bodies.

Our point is to forget CoffeeScript’s influences for a minute, because it fixes so many of these syntactic problems and at least partially breaks free of JavaScript’s slow evolution; even if you don’t care for significant whitespace, we recommend CoffeeScript for so many other reasons. Disclaimer: we love Python, and it’s Dropbox’s primary language, so we’re probably biased.

An interesting argument against CoffeeScript from Ryan Florence, that seemed plausible to us on first impression but didn’t hold up after we thought more about it, is the idea that (a) human beings process images and symbols faster than words, so (b) verbally readable code isn’t necessarily quicker to comprehend. Florence uses this to argue that (c) while CoffeeScript may be faster to read, JavaScript is probably faster to comprehend. We’d expect cognitive science provides plenty of evidence in support of (a), including the excellent circle example cited by Florence. (b) is easily proven by counterexample. Making the leap to (c) is where we ended up disagreeing:

  • For the most part CoffeeScript isn’t trading symbols for words — it’s dropping symbols. Highly repetitive symbols like , ; {} (). We believe such symbols mostly add syntactic noise that makes code harder to read.
  • CoffeeScript introduces new symbols! For example, (a,b,c) -> ... instead of function (a,b,c) {...}. Along with being shorter to type, we think this extra notation makes code easier to comprehend, similar to how math is often better explained through notation instead of words.
  • Consider one example where CoffeeScript does in fact swap a symbol for a word: || vs or. Is || really analogous to the circle in Florence’s example, with or being the verbal description of that circle? This needs the attention of a cognitive scientist, but our hunch is || functions more linguistically than it does symbolically to most readers, acting as a stand-in for the word or. So in this case we expect something more like the reverse of the circle example: we think || and or are about equally readable, but would give slight benefit to CoffeeScript’s or, as it replaces a stand-in for or with or itself. Humans are good at mapping meanings to symbols, but there’s nothing particularly or-esque about ||, so we suspect it adds a small amount of extra work to comprehend.

On to some code samples.

Before and after shots

JavaScript CoffeeScript

if (files) {
  BrowseDrag._update_status_position(e, files);
} else if (e.dataTransfer &&
           e.dataTransfer.types &&
           e.dataTransfer.types.contains('Files')) {

if files
  @_update_status_position e, files
else if e.dataTransfer?.types?.contains 'Files'
  CrossDomainUploader.show_drop_indicators e

this.originalStyle = {};
['top', 'left', 'width', 'height'].each(function (k) {
  this.originalStyle[k] = this.element.style[k];

@originalStyle = {}
for k in ['top', 'left', 'width', 'height']
  @originalStyle[k] = @element.style[k]

Sharing = {
  init: function (sf_info) {
    [sf_info.current, sf_info.past].each(function (list) {
      list.each(function (info) {

Sharing =
  init: (sf_info) ->
    for list in [sf_info.current, sf_info.past]
      for info in list
        @_decode_sort_key info


We’ll let this comparison speak for itself. We consider it our strongest argument in favor of CoffeeScript.


JavaScript CoffeeScript
Lines of code 23437 18417
Tokens 75334 66058
Characters 865613 659930


In the process of converting, we shaved off more than 5000 lines of code, a 21% reduction. Granted, many of those lines looked like this:


Regardless, fewer lines is beneficial for simple reasons — being able to fit more code into a single editor screen, for example.

Measuring reduction in code complexity is of course much harder, but we think the stats above, especially token count, are a good first-order approximation. Much more to say on that subject.

In production, we compile and concatenate all of our CoffeeScript source into a single JavaScript file, minify it, and serve it to browsers with gzip compression. The size of the compressed bundle didn’t change significantly pre- and post-coffee transformation, so our users shouldn’t notice anything different. The site performs and behaves as before.


Rewriting over 23,000 lines of code in one (hack)week was a big undertaking. To significantly hasten the process and avoid bugs, we used js2coffee, a JavaScript to CoffeeScript compiler, to do all of the repetitive conversion tasks for us (things like converting JS blocks to CS blocks, or JS functions to CS functions). We’d start converting a new JS file by first compiling it individually to CS, then manually editing each line as we saw fit, improving style along the way, and making it more idiomatic. One example: the compiler isn’t smart enough to convert a JS three-clause for into a CS for/in. Instead it outputs a CS while with i++ at the end. We switched each of those to simpler loops. Another example: using string interpolation instead of concatenation in places where it made sense.

To make sure we didn’t break the site, we used a few different approaches to test:

  1. Jasmine for unit testing.
  2. We built a fuzz tester with Selenium. It takes a random walk across the website looking for exceptions. Give it enough time, and it theoretically should catch ‘em all ;)
  3. Tons of manual testing.

Going Forward

Dropbox now writes all new browser-side code in CoffeeScript, and we’ve been loving it. We’ve already written several thousand new lines of coffee since launching in July. Some of the things we’re looking to improve in the future:

  • Browser support for CoffeeScript source maps, so we can link JavaScript exceptions directly to the source code, and debug CoffeeScript live.
  • Native CoffeeScript support in browsers, so that during development, we can avoid the compilation to JavaScript altogether.


To Brendan Eich and Jeremy Ashkenas for creating two fantastic languages.

Some love for JavaScript applications

Uncategorized / 34 Comments Posted by Victor Costan on August 31, 2012

During our last hack week, Aakanksha Sarda and I set out to build a library that helps JavaScript developers use the Dropbox API. My main goal was to take “static” Web applications to the next level. However, the JavaScript library can be useful for any application that runs a significant part of its logic in the browser, as well as for node.js server-side code.
If you prefer getting your hands dirty right away, go ahead and register for a Dropbox API key, set up an application in your Dropbox, borrow sample code, and read the library documentation.

Motivation: static Web applications

Thanks to recent improvements in browser support and VM performance, I often find myself writing small and medium applications completely in JavaScript, whenever I can get away with it. JavaScript runs on users’ browsers, so all the application’s files are static, and can be served by any plain old file server such as nginx, pretty much any Web hosting service, and your humble Dropbox.

My interest in static Web apps is greatly influenced by how easy they are to deploy. For example, the deployment process for my Dropbox-hosted applications is the cp command (copy if you’re on Windows). Dropbox syncs the files to its servers, and even lets me revert a bad deployment. I’ve been using this beautiful, simple paradigm for all my applications that don’t need to store per-user data.

However, up until now, having to handle per-user data has been a different story. I needed a database to store the data and an application server to talk to the database server. My applications then needed accounts and authentication to ensure that users wouldn’t overwrite each others’ data. Finally, deploying a new version of the application involved pulling the updated code from a git repository, migrating the database schema, and restarting the application server. I was afraid I’d make a mistake, so I ended up writing complex scripts to handle the deployment.

To my despair, the code for having small applications store per-user data was much longer than the actual application code. Deploying the apps was a problem on its own, and left me yearning for the minimalistic “copy to Dropbox” approach. Fortunately, today’s announcement fixes everything!

Demo: “powered by Dropbox” Web applications

My hack week project, dropbox.js, makes it easy to use Dropbox for storing per-user data. For example, let’s consider Checkbox, a To Do manager hosted entirely in this Dropbox folder. While Checkbox won’t be winning design or usability awards any time soon, it is fully functional! It can store your To Do list right in your Dropbox, in less than 70 lines of HTML and less than 300 lines of commented CoffeeScript (which compiles into less than 350 lines of JavaScript).

Let’s skim the Checkbox source code. The action happens in the app’s CoffeeScript code. The Checkbox class is the application’s view and controller, so it renders the UI and handles DOM events. The Tasks and Task classes implement the data model and the Dropbox integration.

Checkbox uses the “App folder” Dropbox access level, so Dropbox automatically creates a directory for my app data in my users’ Dropboxes. The data model design favors ease of development and debugging. Each task is stored as a file whose name is the task’s description. Tasks are grouped under two folders, active and done. Operations on tasks cleanly map to Dropbox file operations in Dropbox. For example:

  • to initialize the task database, create two folders, active and done (implemented by the mkdir calls in the load method in the Tasks class)
  • to create a task named “Buy milk”, create the empty file “active/Buy milk” (implemented by addTask in the Tasks class)
  • to mark a task as completed, move its file from the active folder into the done folder (implemented by setTaskDone in the Tasks class)
  • to remove a task, delete its file from the user’s Dropbox (implemented by removeTask in the Tasks class)
  • to list all the user’s tasks, read the contents of the active and done folders (implemented by the readdir calls in the load method in the Tasks class)

Most importantly, each Dropbox operation takes up one line of code in the Tasks implementation. Thanks to dropbox.js, the Checkbox source code is all about the application’s logic, and does not get distracted with infrastructure issues.

In the end, dropbox.js makes it easy for me to store my users’ data in a system whose features and reliability go beyond those of the storage backing many enterprise-grade applications. Checkbox-managed To Do lists are transmitted securely, stored redundantly, and backed up. My own To Do lists are in my Dropbox, so I debugged my application with my file manager, and deployed it using cp. I have never SSHed into a server, and I haven’t issued a single SQL query.

Want in on the action? This little JavaScript application can help you run small Web applications out of your Dropbox!

dropbox.js: designed to make developers happy

Hopefully, I’ve convinced you to try out dropbox.js for your next prototype or hobby project. In fact, I won’t be offended if you stop reading now and start using it! However, I think you’ll also find it interesting to read more about the goals, tough decisions, and story behind the library’s design. Let’s start with the goals:

dropbox.js stays out of your way

I think that infrastructure libraries should silently support your development efforts, and stay out of your way. Every bit of brainpower lost on figuring out low-level issues is not spent on making your application awesome. So the overarching goal for dropbox.js is to have you spend as little time as possible thinking about it.

To that end, the libraries aims to follow the principle of least surprise, and to make the common use cases especially easy to implement. For example, the interfaces of the methods for reading and writing Dropbox files heavily borrow from fs.readFile and fs.writeFile in node.js, as you can see in the code below.

  1. // Inefficient way of copying a file.
  2. client.readFile(“source.txt”, function(error, data) {
  3.     client.writeFile(“destination.txt”, data, function (error) {
  4.         console.log(“Done copying.”);
  5.     });
  6. });

Dropbox has more features than your average filesystem, such as revision history, and these advanced features can be accessed by passing an options object. For example, the same readFile method can be used to retrieve an old revision of a file.

  1. client.readFile(“source.txt”, { versionTag: “0400000a” },
  2.             function(error, data) {
  3.     console.log(“Done copying.”);
  4. });

However, passing an options object is not mandatory, and the default options reflect the common use case, such as reading the most recent revision of a file. Compare and contrast with the Win32 CreateFile function, which takes 7 parameters.

Last but not least, the library’s interface acknowledges both experienced JavaScript programmers and experienced Dropbox API users. For example, listing a folder’s contents can be done by calling a readdir method, whose interface matches fs.readdir, or by calling a metadata method, whose interface is closer to the /metadata REST API.

dropbox.js is easy to hack

I really hope that dropbox.js works out of the box for you, and helps you integrate with Dropbox quickly and painlessly. At the same time, I realize that a small library can’t possibly cover all the use cases, and some of you will have to extend or modify the library.

The library exposes some of its internals on purpose, so you can break the abstractions when you need to. For example, methods that make AJAX calls, such as readFile and writeFile, return the XmlHttpRequest object used for the AJAX call. If you want to implement progress bars for Dropbox operations, you can set up listeners for the relevant XmlHttpRequest events. To further encourage extension, the library’s internal methods are fully documented using JSDoc, just like the public APIs.

Asides from source code, dropbox.js includes an automated build script, and a mostly-automated test suite with very solid coverage. If you need to modify the library, the README file will help you use the test suite to verify the correctness of your changes and incorporate your changes in a minified library build or an npm package. The entire library is hosted on GitHub, so you can easily submit your patches and not get stuck maintaining a fork.

Tough decisions

Perhaps the most delicate aspect of integrating with a Web service is the user authentication piece. Dropbox authentication uses a four-step process that bounces the user between the application’s Web page and the Dropbox servers, so applications will most likely need to customize the process. dropbox.js defines an authentication driver interface for the custom code, and also includes three implementations that will get you going through the prototype stage of your application. For example, the code below covers both library initialization and authentication.

  1. var appKey = { key: “api-key”, secret: “api-secret”, sandbox: true };
  2. var client = new Dropbox.Client(appKey);
  3. client.authDriver(new Dropbox.Drivers.Redirect());
  4. client.authenticate(function(error, data) {
  5.     if (error) { return showError(error); }
  6.     doSomethingUseful(client);  // The user is now authenticated.
  7. });

While writing the JavaScript library, I struggled a lot figuring out how much I can diverge from the Dropbox REST API, in the name of offering a more intuitive API. While I didn’t want to create a RequestProcessorFactoryFactory-like monster, I also wanted to hide some aspects of the REST API that I found confusing.  For example, the JavaScript library wraps the /metadata output, and uses the stat name for the concept of metadata, to match existing filesystem APIs. The rev parameter that shows up in many API calls is renamed to revisionTag in dropbox.js, to avoid the misconception that its value would be a sequential integers, like Subversion revisions. I look forward to seeing how this decision plays out, and to learning from it.

When I started working on the JavaScript library, I envisioned that the Dropbox object would implement a high-level API, and developers would only call into Dropbox.Client if they would need a low-level API. I had high hopes for the high-level API. It was supposed to provide File and Directory classes, automatically cache Dropbox data in IndexedDb, and automatically kick off the authentication process on token expiration. Unfortunately, by the end of the hack week, the low-level API was barely completed, so the current dropbox.js release ships an empty Dropbox object, and the code samples use Dropbox.Client. Ship early, ship often :)

The story of dropbox.js

dropbox.js was designed and developed during the most recent Dropbox hack week, a hackathon where we put our regular work on hold for 5 days and get to work on our crazy ideas and pet projects.

I got the idea to write a JavaScript client for the Dropbox API when I brainstormed for hack week projects, and I realized that all my ideas would be best prototyped as JavaScript applications. At first, I had many doubts about the idea. Would other people use this? Would I be able to make a decent API? Fortunately, I shared my idea with a few colleagues, who encouraged me to commit to the idea and post it to the company’s internal list of hack week projects.

After posting the project, I received an amazing (and humbling) amount of support from fellow Dropboxers. Before hack week, Chris Varenhorst added CORS headers to the Dropbox API responses, which allow dropbox.js to work in the browser.

During hack week, Aakanksha Sarda fleshed out the code for all the file operations, figured out how to deal with binary files (e.g., images), got the automated test suite running in the browser, and wrote Dropstagram, a Dropbox-powered photo-editing application, using WebGL shaders. A few other Dropboxers used dropbox.js and gave us great feedback and a sense of urgency. Rich Chan built a true Internet Terminal running on JavaScript and Dropbox and a stunning visualizer for the revision history of any text file in your Dropbox. Franklin Ta prototyped a Google Chrome extension that lets you download and upload files straight into / from your Dropbox. David Goldstein worked on a secret project that will make Dropbox apps even more awesome, and used dropbox.js to write a browser-based .zip unpacker for your Dropbox files.

Hack week provided an amazing backdrop for the development of dropbox.js. Graham Abbott set up a pod for everyone working with dropbox.js, so it was very easy for us to collaborate and exchange ideas and feedback. Jon Ying sprinkled some design magic on the applications that we developed. The kitchen staff stayed up and prepared hot food for us every night of the week. Dropboxers (including Drew and Arash) came by our pod, looked at our demos, and cheered us on.

After hack week, Chris Varenhorst, Dima Ryazanov, and Brian Smith helped me work through some last-minute technical difficulties. Jon Ying gave a makeover to the Checkbox sample app. Albert Ni, Alex Allain, Glara Ahn and Jon Ying helped make this blog post happen.

Go forth and hack!

I hope you enjoyed reading about the dropbox.js. Above all, I really hope that you will go download the library, and build something amazing with it!

I expect that “powered by Dropbox” Web applications will become a great tool for learning Web programming and building class projects. I’ll walk the walk myself this coming winter, when I’ll teach Web Programming at MIT.

I look forward to learning about the projects that come out of the next Dropbox hack week, and I’ll be counting how many of them use dropbox.js :)

Plop: Low-overhead profiling for Python

Uncategorized / 9 Comments Posted by Ben Darnell on July 10, 2012

It’s almost time for another Hack Week at Dropbox, and with that in mind I’d like to present one of the projects from our last Hack Week.

A profiler is an indispensable tool for optimizing programs.  Without a profiler, it’s hard to tell which parts of the code are consuming enough time to be worth looking at.  Python comes with a profiler called cProfile, but enabling it slows things down so much that it’s usually only used in development or simulated scenarios, which may differ from real-world usage.

At our last hack week, I set out to build a profiler that would be usable on live servers without impacting our users.  The result, Plop (Python Low Overhead Profiler) is now available on Github.

How it works

Plop is a sampling profiler, similar to Google’s gperftools.  Every 10 milliseconds, a timer is fired which causes the program to record its current stack trace.  After 30 seconds, the collected samples are aggregated and saved.  There is a web-based viewer for the resulting call graphs (using d3.js)

Here’s a sample profile, from a simple Tornado-based web server. Click on the image for a full-size interactive view (it’s big, so you’ll need to either scroll around a lot or use your browser’s zoom controls).

Each bubble is a function; they’re color-coded by filename and you can mouse over them for more details. The size of the bubble represents the amount of time spent in that function.  The thickness of the lines connecting the bubbles represents how frequently that function call appears on the stack.  The disconnected bubbles around the edge are actually common utility functions that are called from many places throughout the code – they’re so common that drawing the connections to every function that calls them would make the graph unreadable.

Since hack week, we’ve integrated Plop into our servers and run it regularly.  Every time we push new code to the site, a script collects a profile for the new version.  This has proven not to be disruptive, with less than 2% CPU overhead while the profile is being collected. It’s still a work in progress (especially the viewer), but has already proven useful in identifying performance regressions.

zxcvbn: realistic password strength estimation

Uncategorized / 107 Comments Posted by Dan Wheeler on April 10, 2012

Over the last few months, I’ve seen a password strength meter on almost every signup form I’ve encountered. Password strength meters are on fire.

Here’s a question: does a meter actually help people secure their accounts? It’s less important than other areas of web security, a short sample of which include:

  • Preventing online cracking with throttling or CAPTCHAs.
  • Preventing offline cracking by selecting a suitably slow hash function with user-unique salts.
  • Securing said password hashes.

With that disclaimer — yes. I’m convinced these meters have the potential to help. According to Mark Burnett’s 2006 book, Perfect Passwords: Selection, Protection, Authentication, which counted frequencies from a few million passwords over a variety of leaks, one in nine people had a password in this top 500 list. These passwords include some real stumpers: password1, compaq, 7777777, merlin, rosebud. Burnett ran a more recent study last year, looking at 6 million passwords, and found an insane 99.8% occur in the top 10,000 list, with 91% in the top 1,000. The methodology and bias is an important qualifier — for example, since these passwords mostly come from cracked hashes, the list is biased towards crackable passwords to begin with.

These are only the really easy-to-guess passwords. For the rest, I’d wager a large percentage are still predictable enough to be susceptible to a modest online attack. So I do think these meters could help, by encouraging stronger password decisions through direct feedback. But right now, with a few closed-source exceptions, I believe they mostly hurt. Here’s why.

Strength is best measured as entropy, in bits: it’s the number of times a space of possible passwords can be cut in half. A naive strength estimation goes like this:

# n: password length
# c: password cardinality: the size of the symbol space
#    (26 for lowercase letters only, 62 for a mix of lower+upper+numbers)
entropy = n * lg(c) # base 2 log

This brute-force analysis is accurate for people who choose random sequences of letters, numbers and symbols. But with few exceptions (shoutout to 1Password / KeePass), people of course choose patterns — dictionary words, spatial patterns like qwerty, asdf or zxcvbn, repeats like aaaaaaa, sequences like abcdef or 654321, or some combination of the above. For passwords with uppercase letters, odds are it’s the first letter that’s uppercase. Numbers and symbols are often predictable as well: l33t speak (3 for e, 0 for o, @ or 4 for a), years, dates, zip codes, and so on.

As a result, simplistic strength estimation gives bad advice. Without checking for common patterns, the practice of encouraging numbers and symbols means encouraging passwords that might only be slightly harder for a computer to crack, and yet frustratingly harder for a human to remember. xkcd nailed it:

As an independent Dropbox hackweek project, I thought it’d be fun to build an open source estimator that catches common patterns, and as a corollary, doesn’t penalize sufficiently complex passphrases like correcthorsebatterystaple. It’s now live on dropbox.com/register and available for use on github. Try the demo to experiment and see several example estimations.

The table below compares zxcvbn to other meters. The point isn’t to dismiss the others — password policy is highly subjective — rather, it’s to give a better picture of how zxcvbn is different.

qwER43@! Tr0ub4dour&3 correcthorsebatterystaple
Dropbox (old)
Bank of America (password not allowed) (password not allowed) (password not allowed)
eBay (password not allowed)

A few notes:

  • I took these screenshots on April 3rd, 2012. I needed to crop the bar from the gmail signup form to make it fit in the table, making the difference in relative width more pronounced than on the form itself.
  • zxcvbn considers correcthorsebatterystaple the strongest password of the 3. The rest either consider it the weakest or disallow it. (Twitter gives about the same score for each, but if you squint, the scores are slightly different.)
  • zxcvbn considers qwER43@! weak because it’s a short QWERTY pattern. It adds extra entropy for each turn and shifted character.
  • The PayPal meter considers qwER43@! weak but aaAA11!! strong. Speculation, but that might be because it detects spatial patterns too.
  • Bank of America doesn’t allow passwords over 20 characters, disallowing correcthorsebatterystaple. Passwords can contain some symbols, but not & or !, disallowing the other two passwords. eBay doesn’t allow passwords over 20 characters either.
  • Few of these meters appear to use the naive estimation I opened with; otherwise correcthorsebatterystaple would have a high rating from its long length. Dropbox used to add points for each unique lowercase letter, uppercase letter, number, and symbol, up to a certain cap for each group. This mostly has the same only-works-for-brute-force problem, although it also checked against a common passwords dictionary. I don’t know the details behind the other meters, but a scoring checklist is another common approach (which also doesn’t check for many patterns).
  • I picked Troubadour to be the base word of the second column, not Troubador as occurs in xkcd, which is an uncommon spelling.


zxcvbn has no dependencies and works on ie7+/opera/ff/safari/chrome. The best way to add it to your registration page is:

<script type="text/javascript" src="zxcvbn-async.js">

zxcvbn-async.js is a measly 350 bytes. On window.load, after your page loads and renders, it’ll load zxcvbn.js, a fat 680k (320k gzipped), most of which is a dictionary. I haven’t found the script size to be an issue; since a password is usually not the first thing a user enters on a signup form, there’s plenty of time to load. Here’s a comprehensive rundown of crossbrowser asynchronous script loading.

zxcvbn adds a single function to the global namespace:

zxcvbn(password, user_inputs)

It takes one required argument, a password, and returns a result object. The result includes a few properties:

result.entropy            # bits
result.crack_time         # estimation of actual crack time, in seconds.
result.crack_time_display # same crack time, as a friendlier string:
                          # "instant", "6 minutes", "centuries", etc.
result.score              # 0, 1, 2, 3 or 4 if crack time is less than
                          # 10**2, 10**4, 10**6, 10**8, Infinity.
                          # (helpful for implementing a strength bar.)
result.match_sequence     # the detected patterns used to calculate entropy.
result.calculation_time   # how long it took to calculate an answer,
                          # in milliseconds. usually only a few ms.

The optional user_inputs argument is an array of strings that zxcvbn will add to its internal dictionary. This can be whatever list of strings you like, but it’s meant for user inputs from other fields of the form, like name and email. That way a password that includes the user’s personal info can be heavily penalized. This list is also good for site-specific vocabulary. For example, ours includes dropbox.

zxcvbn is written in CoffeeScript. zxcvbn.js and zxcvbn-async.js are unreadably closure-compiled, but if you’d like to extend zxcvbn and send me a pull request, the README has development setup info.

The rest of this post details zxcvbn’s design.

The model

zxcvbn consists of three stages: match, score, then search.

  • match enumerates all the (possibly overlapping) patterns it can detect. Currently zxcvbn matches against several dictionaries (English words, names and surnames, Burnett’s 10,000 common passwords), spatial keyboard patterns (QWERTY, Dvorak, and keypad patterns), repeats (aaa), sequences (123, gfedcba), years from 1900 to 2019, and dates (3-13-1997, 13.3.1997, 1331997). For all dictionaries, match recognizes uppercasing and common l33t substitutions.
  • score calculates an entropy for each matched pattern, independent of the rest of the password, assuming the attacker knows the pattern. A simple example: rrrrr. In this case, the attacker needs to iterate over all repeats from length 1 to 5 that start with a lowercase letter:

    entropy = lg(26*5) # about 7 bits
  • search is where Occam’s razor comes in. Given the full set of possibly overlapping matches, search finds the simplest (lowest entropy) non-overlapping sequence. For example, if the password is damnation, that could be analyzed as two words, dam and nation, or as one. It’s important that it be analyzed as one, because an attacker trying dictionary words will crack it as one word long before two. (As an aside, overlapping patterns are also the primary agent behind accidentally tragic domain name registrations, like childrens-laughter.com but without the hyphen.)

Search is the crux of the model. I’ll start there and work backwards.

Minimum entropy search

zxcvbn calculates a password’s entropy to be the sum of its constituent patterns. Any gaps between matched patterns are treated as brute-force “patterns” that also contribute to the total entropy. For example:

entropy("stockwell4$eR123698745") == surname_entropy("stockwell") +
                                     bruteforce_entropy("4$eR") +

That a password’s entropy is the sum of its parts is a big assumption. However, it’s a conservative assumption. By disregarding the “configuration entropy” — the entropy from the number and arrangement of the pieces — zxcvbn is purposely underestimating, by giving a password’s structure away for free: It assumes attackers already know the structure (for example, surname-bruteforce-keypad), and from there, it calculates how many guesses they’d need to iterate through. This is a significant underestimation for complex structures. Considering correcthorsebatterystaple, word-word-word-word, an attacker running a program like L0phtCrack or John the Ripper would typically try many simpler structures first, such as word, word-number, or word-word, before reaching word-word-word-word. I’m OK with this for three reasons:

  • It’s difficult to formulate a sound model for structural entropy; statistically, I don’t happen to know what structures people choose most, so I’d rather do the safe thing and underestimate.
  • For a complex structure, the sum of the pieces alone is often sufficient to give an “excellent” rating. For example, even knowing the word-word-word-word structure of correcthorsebatterystaple, an attacker would need to spend centuries cracking it.
  • Most people don’t have complex password structures. Disregarding structure only underestimates by a few bits in the common case.

With this assumption out of the way, here’s an efficient dynamic programming algorithm in CoffeeScript for finding the minimum non-overlapping match sequence. It runs in O(n·m) time for a length-n password with m (possibly overlapping) candidate matches.

# matches: the password's full array of candidate matches.
# each match has a start index (match.i) and an end index (match.j) into
# the password, inclusive.
minimum_entropy_match_sequence = (password, matches) ->
  # e.g. 26 for lowercase-only
  bruteforce_cardinality = calc_bruteforce_cardinality password
  up_to_k = []      # minimum entropy up to k.
  backpointers = [] # for the optimal sequence of matches up to k,
                    # holds the final match (match.j == k).
                    # null means the sequence ends w/ a brute-force char
  for k in [0...password.length]
    # starting scenario to try to beat:
    # adding a brute-force character to the minimum entropy sequence at k-1.
    up_to_k[k] = (up_to_k[k-1] or 0) + lg bruteforce_cardinality
    backpointers[k] = null
    for match in matches when match.j == k
      [i, j] = [match.i, match.j]
      # see if minimum entropy up to i-1 + entropy of this match is less
      # than the current minimum at j.
      candidate_entropy = (up_to_k[i-1] or 0) + calc_entropy(match)
      if candidate_entropy < up_to_k[j]
        up_to_k[j] = candidate_entropy
        backpointers[j] = match

  # walk backwards and decode the best sequence
  match_sequence = []
  k = password.length - 1
  while k >= 0
    match = backpointers[k]
    if match
      match_sequence.push match
      k = match.i - 1
      k -= 1

  # fill in the blanks between pattern matches with bruteforce "matches"
  # that way the match sequence fully covers the password:
  # match1.j == match2.i - 1 for every adjacent match1, match2.
  make_bruteforce_match = (i, j) ->
    pattern: 'bruteforce'
    i: i
    j: j
    token: password[i..j]
    entropy: lg Math.pow(bruteforce_cardinality, j - i + 1)
    cardinality: bruteforce_cardinality
  k = 0
  match_sequence_copy = []
  for match in match_sequence # fill gaps in the middle
    [i, j] = [match.i, match.j]
    if i - k > 0
      match_sequence_copy.push make_bruteforce_match(k, i - 1)
    k = j + 1
    match_sequence_copy.push match
  if k < password.length # fill gap at the end
    match_sequence_copy.push make_bruteforce_match(k, password.length - 1)
  match_sequence = match_sequence_copy

  # or 0 corner case is for an empty password ''
  min_entropy = up_to_k[password.length - 1] or 0
  crack_time = entropy_to_crack_time min_entropy

  # final result object
  password: password
  entropy: round_to_x_digits min_entropy, 3
  match_sequence: match_sequence
  crack_time: round_to_x_digits crack_time, 3
  crack_time_display: display_time crack_time
  score: crack_time_to_score crack_time

backpointers[j] holds the match in this sequence that ends at password position j, or null if the sequence doesn’t include such a match. Typical of dynamic programming, constructing the optimal sequence requires starting at the end and working backwards.

Especially because this is running browser-side as the user types, efficiency does matter. To get something up and running I started with the simpler O(2m) approach of calculating the sum for every possible non-overlapping subset, and it slowed down quickly. Currently all together, zxcvbn takes no more than a few milliseconds for most passwords. To give a rough ballpark: running Chrome on a 2.4 GHz Intel Xeon, correcthorsebatterystaple took about 3ms on average. coRrecth0rseba++ery9/23/2007staple$ took about 12ms on average.

Threat model: entropy to crack time

Entropy isn’t intuitive: How do I know if 28 bits is strong or weak? In other words, how should I go from entropy to actual estimated crack time? This requires more assumptions in the form of a threat model. Let’s assume:

  • Passwords are stored as salted hashes, with a different random salt per user, making rainbow attacks infeasible.
  • An attacker manages to steal every hash and salt. The attacker is now guessing passwords offline at max rate.
  • The attacker has several CPUs at their disposal.

Here’s some back-of-the-envelope numbers:

# for a hash function like bcrypt/scrypt/PBKDF2, 10ms is a safe lower bound
# for one guess. usually a guess would take longer -- this assumes fast
# hardware and a small work factor. adjust for your site accordingly if you
# use another hash function, possibly by several orders of magnitude!
SINGLE_GUESS = .010 # seconds
NUM_ATTACKERS = 100 # number of cores guessing in parallel.


entropy_to_crack_time = (entropy) ->
  .5 * Math.pow(2, entropy) * SECONDS_PER_GUESS

I added a .5 term because we’re measuring the average crack time, not the time to try the full space.

This math is perhaps overly safe. Large-scale hash theft is a rare catastrophe, and unless you’re being specifically targeted, it’s unlikely an attacker would dedicate 100 cores to your single password. Normally an attacker has to guess online and deal with network latency, throttling, and CAPTCHAs.

Entropy calculation

Up next is how zxcvbn calculates the entropy of each constituent pattern. calc_entropy() is the entry point. It’s a simple dispatch:

calc_entropy = (match) ->
  return match.entropy if match.entropy?
  entropy_func = switch match.pattern
    when 'repeat'     then repeat_entropy
    when 'sequence'   then sequence_entropy
    when 'digits'     then digits_entropy
    when 'year'       then year_entropy
    when 'date'       then date_entropy
    when 'spatial'    then spatial_entropy
    when 'dictionary' then dictionary_entropy
  match.entropy = entropy_func match

I gave an outline earlier for how repeat_entropy works. You can see the full scoring code on github, but I’ll describe two other scoring functions here to give a taste: spatial_entropy and dictionary_entropy.

Consider the spatial pattern qwertyhnm. It starts at q, its length is 9, and it has 3 turns: the initial turn moving right, then down-right, then right. To parameterize:

s # number of possible starting characters.
  # 47 for QWERTY/Dvorak, 15 for pc keypad, 16 for mac keypad.
L # password length. L >= 2
t # number of turns. t <= L - 1
  # for example, a length-3 password can have at most 2 turns, like "qaw".
d # average "degree" of each key -- the number of adjacent keys.
  # about 4.6 for QWERTY/Dvorak. (g has 6 neighbors, tilda only has 1.)

The space of total possibilities is then all possible spatial patterns of length L or less with t turns or less:

\lg \left ( \sum_{i=2}^{L} \sum_{j=1}^{\min{t,i - 1}} {{i - 1}\choose{j - 1}} s d^{j} \right )

(i – 1) choose (j – 1) counts the possible configurations of turn points for a length-i spatial pattern with j turns. The -1 is added to both terms because the first turn always occurs on the first letter. At each of j turns, there’s d possible directions to go, for a total of dj possibilities per configuration. An attacker would need to try each starting character too, hence the s. This math is only a rough approximation. For example, many of the alternatives counted in the equation aren’t actually possible on a keyboard: for a length-5 pattern with 1 turn, “start at q moving left” gets counted, but isn’t actually possible.

CoffeeScript allows natural expression of the above:

lg = (n) -> Math.log(n) / Math.log(2)

nPk = (n, k) ->
  return 0 if k > n
  result = 1
  result *= m for m in [n-k+1..n]

nCk = (n, k) ->
  return 1 if k == 0
  k_fact = 1
  k_fact *= m for m in [1..k]
  nPk(n, k) / k_fact

spatial_entropy = (match) ->
  if match.graph in ['qwerty', 'dvorak']
  possibilities = 0
  L = match.token.length
  t = match.turns
  # estimate num patterns w/ length L or less and t turns or less.
  for i in [2..L]
    possible_turns = Math.min(t, i - 1)
    for j in [1..possible_turns]
      possibilities += nCk(i - 1, j - 1) * s * Math.pow(d, j)
  entropy = lg possibilities
  # add extra entropy for shifted keys. (% instead of 5, A instead of a.)
  # math is similar to extra entropy from uppercase letters in dictionary
  # matches, see the next snippet below.
  if match.shifted_count
    S = match.shifted_count
    U = match.token.length - match.shifted_count # unshifted count
    possibilities = 0
    possibilities += nCk(S + U, i) for i in [0..Math.min(S, U)]
    entropy += lg possibilities

On to dictionary entropy:

dictionary_entropy = (match) ->
  entropy = lg match.rank
  entropy += extra_uppercasing_entropy match
  entropy += extra_l33t_entropy match

The first line is the most important: The match has an associated frequency rank, where words like the and good have low rank, and words like photojournalist and maelstrom have high rank. This lets zxcvbn scale the calculation to an appropriate dictionary size on the fly, because if a password contains only common words, a cracker can succeed with a smaller dictionary. This is one reason why xkcd and zxcvbn slightly disagree on entropy for correcthorsebatterystaple (45.2 bits vs 44). The xkcd example used a fixed dictionary size of 211 (about 2k words), whereas zxcvbn is adaptive. Adaptive sizing is also the reason zxcvbn.js includes entire dictionaries instead of a space-efficient Bloom filter — rank is needed in addition to a membership test.

I’ll explain how frequency ranks are derived in the data section at the end. Uppercasing entropy looks like this:

extra_uppercase_entropy = (match) ->
  word = match.token
  return 0 if word.match ALL_LOWER
  # a capitalized word is the most common capitalization scheme,
  # so it only doubles the search space (uncapitalized + capitalized):
  # 1 extra bit of entropy.
  # allcaps and end-capitalized are common enough too,
  # underestimate as 1 extra bit to be safe.
    return 1 if word.match regex
  # otherwise calculate the number of ways to capitalize
  # U+L uppercase+lowercase letters with U uppercase letters or less.
  # or, if there's more uppercase than lower (for e.g. PASSwORD), the number
  # of ways to lowercase U+L letters with L lowercase letters or less.
  U = (chr for chr in word.split('') when chr.match /[A-Z]/).length
  L = (chr for chr in word.split('') when chr.match /[a-z]/).length
  possibilities = 0
  possibilities += nCk(U + L, i) for i in [0..Math.min(U, L)]
  lg possibilities

So, 1 extra bit for first-letter-uppercase and other common capitalizations. If the uppercasing doesn’t fit these common molds, it adds:

\lg \left ( \sum_{i=1}^{min(U, L)} {{U + L}\choose{i}} \right )

The math for l33t substitution is similar, but with variables that count substituted and unsubstituted characters instead of uppers and lowers.

Pattern matching

So far I covered pattern entropy, but not how zxcvbn finds patterns in the first place. Dictionary match is straightforward: check every substring of the password to see if it’s in the dictionary:

dictionary_match = (password, ranked_dict) ->
  result = []
  len = password.length
  password_lower = password.toLowerCase()
  for i in [0...len]
    for j in [i...len]
      if password_lower[i..j] of ranked_dict
        word = password_lower[i..j]
        rank = ranked_dict[word]
          pattern: 'dictionary'
          i: i
          j: j
          token: password[i..j]
          matched_word: word
          rank: rank

ranked_dict maps from a word to its frequency rank. It’s like an array of words, ordered by high-frequency-first, but with index and value flipped. l33t substitutions are detected in a separate matcher that uses dictionary_match as a primitive. Spatial patterns like bvcxz are matched with an adjacency graph approach that counts turns and shifts along the way. Dates and years are matched with regexes. Hit matching.coffee on github to read more.


As mentioned earlier, the 10k password list is from Burnett, released in 2011.

Frequency-ranked names and surnames come from the freely available 2000 US Census. To help zxcvbn not crash ie7, I cut off the surname dictionary, which has a long tail, at the 80th percentile (meaning 80% of Americans have one of the surnames in the list). Common first names include the 90th percentile.

The 40k frequency list of English words comes from a project on Wiktionary, which counted about 29M words across US television and movies. My hunch is that of all the lists I could find online, television and movie scripts will capture popular usage (and hence likely words used in passwords) better than other sources of English, but this is an untested hypothesis. The list is a bit dated; for example, Frasier is the 824th most common word.


At first glance, building a good estimator looks about as hard as building a good cracker. This is true in a tautological sort of way if the goal is accuracy, because “ideal entropy” — entropy according to a perfect model — would measure exactly how many guesses a given cracker (with a smart operator behind it) would need to take. The goal isn’t accuracy, though. The goal is to give sound password advice. And this actually makes the job a bit easier: I can take the liberty of underestimating entropy, for example, with the only downside of encouraging passwords that are stronger than they need to be, which is frustrating but not dangerous.

Good estimation is still difficult, and the main reason is there’s so many different patterns a person might use. zxcvbn doesn’t catch words without their first letter, words without vowels, misspelled words, n-grams, zipcodes from populous areas, disconnected spatial patterns like qzwxec, and many more. Obscure patterns (like Catalan numbers) aren’t important to catch, but for each common pattern that zxcvbn misses and a cracker might know about, zxcvbn overestimates entropy, and that’s the worst kind of bug. Possible improvements:

  • zxcvbn currently only supports English words, with a frequency list skewed toward American usage and spelling. Names and surnames, coming from the US census, are also skewed. Of the many keyboard layouts in the world, zxcvbn recognizes but a few. Better country-specific datasets, with an option to choose which to download, would be a big improvement.
  • As this study by Joseph Bonneau attests, people frequently choose common phrases in addition to common words. zxcvbn would be better if it recognized “Harry Potter” as a common phrase, rather than a semi-common name and surname. Google’s n-gram corpus fits in a terabyte, and even a good bigram list is impractical to download browser-side, so this functionality would require server-side evaluation and infrastructure cost. Server-side evaluation would also allow a much larger single-word dictionary, such as Google’s unigram set.
  • It’d be better if zxcvbn tolerated misspellings of a word up to a certain edit distance. That would bring in many word-based patterns, like skip-the-first-letter. It’s hard because word segmentation gets tricky, especially with the added complexity of recognizing l33t substitutions.

Even with these shortcomings, I believe zxcvbn succeeds in giving better password advice in a world where bad password decisions are widespread. I hope you find it useful. Please fork on github and have fun!

Big thanks to Chris Varenhorst, Gautam Jayaraman, Ben Darnell, Alicia Chen, Todd Eisenberger, Kannan Goundan, Chris Beckmann, Rian Hunter, Brian Smith, Martin Baker, Ivan Kirigin, Julie Tung, Tido the Great, Ramsey Homsany, Bart Volkmer and Sarah Niyogi for helping review this post.

Hilary Mason Speaks at Dropbox

Uncategorized / 6 Comments Posted by Martin on February 23, 2012

We host a monthly tech talk series we call “Droptalks“. In the past, we’ve hosted Steve Souders, Guido van Rossum, Greg Papadopoulos, and Amit Singh.

A couple weeks ago, we were lucky to have Hilary Mason in town. Hilary is the Chief Scientist of bit.ly, the world-famous URL shortener. Bit.ly may seem like a simple service, however, when done at such a large scale there is much more behind the scenes. There’s also a lot of neat data to play with.

Hilary spoke about some of the challenges and lessons from her work trying to derive meaningful uses from the mass of data that flows through bit.ly. She spoke about the history of bit.ly, some of the philosophy of analyzing time-series data, and some cool engineering tricks. She even gave demos of three internal tools at bit.ly that will be released as products in the next few months (really cool stuff!).

The slides are available.

For those interested in learning more about Analytics and Data Science, Hilary suggested a few introductory books:

Using the Dropbox API from Haskell

Uncategorized / 12 Comments Posted by Rian on January 02, 2012

I love Haskell. My first encounter with Haskell started out about eight years ago. Like many people in those days, when I was in high school I spent a lot of time playing around with code on my computer. Reading and understanding open source projects was a main source of knowledge and inspiration for me when I was learning how to program. When I came upon the bzip2 homepage and consequently Julian Seward’s homepage I found a short note about Haskell and how it was a super fun and interesting language to write a compiler for. Haskell? What’s that?

After reading more about Haskell, functional programming, lazy evaluation, and type inference and seeing the elegance of the various code samples, I was hooked. I spent the next couple of weeks going through “Yet Another Haskell Tutorial” and I remember it being incredibly difficult yet incredibly rewarding. After I wrote my first fold over a recursive algebraic datatype, I felt like I was finally starting to speak Haskell. I felt like I had massively improved as a programmer.

While I’ve been at Dropbox, Python has been my main language of computational expression. Even though it was a bit rocky at first, I’ve grown to really love Python and the culture of fast iteration and duck typing. Like a good Pythonista, I’m of the opinion that types are training wheels, but that’s really only until you use a language with a real type system. In C# or Java, types can get in your way and even force you to write overly verbose code or follow silly “design patterns.” In Haskell, types help you soar to higher computational ground. They encourage you to model your data in coherent, concise, and elegant ways that feel right. They aren’t annoying.

More people should use Haskell. The steep learning curve forces you to understand what you are doing at a deeper level and you will be a better programmer because of it. To help that happen, this post will be in a semi-literate programming style and I’ll be describing a Dropbox API app written in Haskell. This won’t be like a normal tutorial so you’ll probably have to do a bit of supplemental reading and practice afterward. The goal is to give you a flavor of what a real program in Haskell looks like.

This post assumes no previous knowledge with Haskell but it does assume moderate programming ability in another language, e.g. C, C++, Ruby, Python, Java, or Lisp. Since this post does not assume previous Haskell experience the beginning will be more of a Haskell tutorial and core concepts will be sectioned off to facilitate the learning process. This post is a published version of a Literate Haskell file. Code lines prefixed with the “>” character are actually part of the final program. This makes it so that it’s possible to simply copy & paste the text here into your favorite editor and run it, just make sure you save the file with a “.lhs” extension. If you ever get tired of reading you can get the real source at the GitHub repo.

The Haskell implementation we’ll be using is The Haskell Platform, it’s a full stack of Haskell tools prepackaged to work out of the box on all three major desktop operating systems. It’s based on GHC, The Glasgow Haskell Compiler. GHC is an advanced optimizing compiler focused on producing efficient code.

Image Search for Dropbox

Years ago Robert Love of Linux kernel hacker fame wrote a FUSE file system that made it so that user-created folders were populated with Beagle search results using the folder name as the search query. It was called beaglefs. The point was to demonstrate the power of user-space file systems, notably the power of having so much more library code available to you than in kernel-space.

We can do a similar thing with the Dropbox API. We’re going to write a hypothetical Dropbox API app that populates user-created folders with Creative Commons licensed images found by performing a web image search using the folder name as the search term. Using Dropbox, all the user has to do to perform an image search is simply create a folder.

Let’s get started!


> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE QuasiQuotes #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE DeriveDataTypeable #-}

Despite being more than two decades old, Haskell is still evolving. You can tell your Haskell compiler to allow newer language features using this syntax, this is called the LANGUAGE Pragma. Please don’t worry about what these exact language extensions do just yet, you can read more in the GHC docs.

Import Declarations

> import Yesod

This is an import declaration. Declarations like these inform the Haskell module system that I am going to use definitions from this other module. A module in Haskell is a collection of values (functions are values in Haskell too!), datatypes, type synonyms, type classes, etc.

Here I am telling the module system to bring in all definitions from the module Yesod in the namespace of this module. If I wanted to I could also access those definitions prefixed with “Yesod.” similar to Java.

Yesod is a fully-featured modern web-framework for Haskell. We’ll be using it to create the web interface of our API app.

> import Text.XML.HXT.Core hiding (when)

This is a another import declaration. This is just like the Yesod import except we use the “hiding” syntax to tell the Haskell module system to not import the “when” definition into this module’s namespace. The module we are importing is the main module from HXT, the XML processing library that we use to parse out the search results from an image search result page. More details on this much later.

> import Control.Applicative ((<$>), (<*>))
> import Control.Concurrent (forkIO, threadDelay, newChan,
>                            writeChan, readChan, Chan, isEmptyChan)
> import Control.Monad (forM_, when)
> import Control.Monad.Base (MonadBase)
> import Control.Monad.Trans.Control (MonadBaseControl)
> import Data.Maybe (fromJust, mapMaybe, isNothing)
> import Data.Text.Encoding (decodeUtf8With)
> import Data.Text.Encoding.Error (ignore)
> import Data.Typeable (Typeable)
> import Network.HTTP.Enumerator (simpleHttp)
> import System.FilePath.Posix (takeFileName, combine)

These import declarations are slightly different. Instead of bringing in all names from the modules I am only bringing in specific names. This is very similar to the “from module import name” statement in Python.

> import qualified Control.Exception.Lifted as C
> import qualified Data.ByteString as B
> import qualified Data.ByteString.Lazy as L
> import qualified Data.List as DL
> import qualified Data.Map as Map
> import qualified Data.Text as T
> import qualified Data.URLEncoded as URL
> import qualified Dropbox as DB
> import qualified Network.URI as URI

Remember how I said that I could also access names by prefixing them with “Yesod.” earlier? Adding qualified to the import declaration makes it so you must refer to names in other modules using the module prefix. The “as C” part in the first line makes it so that I can do C.try instead of Control.Exception.Lifted.try.

Algebraic Datatypes

> data ImageSearch = ImageSearch (Chan (DB.AccessToken, String)) DB.Config

This is an algebraic datatype declaration. Nevermind what algebraic means for the moment, this is the basic way to define new types in Haskell. Even though it’s very short this little code actually does a couple of things:

  1. It defines a type called ImageSearch. This is the immediate text after “data”.
  2. It defines a function called ImageSearch that takes two arguments, the first of type Chan (DB.AccessToken, String), the second of type DB.Config, and it returns a value of type ImageSearch, ignore these complex types for now they aren’t important. This function is called a constructor in Haskell.

Constructors play a big role in Haskell. Wherever a name can be bound to a value in Haskell, you can also use contructor pattern matching, or deconstruction, to extract out the data contained within that value. Here’s a function to get out the channel component of our ImageSeach type:

imageSearchChan (ImageSearch chan config) = chan

imageSearchChan takes in an ImageSearch argument and return the channel wrapped inside of it. You’ll see deconstruction a lot more later.

So far we’ve defined a type called ImageSearch and we’ve also defined a function called ImageSearch. This is okay because in Haskell type names and value names live in different namespaces.

Maybe Type

The Maybe type is one algebraic datatype that you’ll see a lot in Haskell code. It’s often used to denote an error value from a function or an optional argument to a function.

data Maybe a = Nothing | Just a

Unlike ImageSearch, the Maybe type has not one but two constructors: Nothing and Just. You can use either constructor to create a value in the Maybe type. A value of Nothing usually denotes an error in the Maybe type. A Just value indicates success.

Another difference from our ImageSearch type is that Maybe isn’t a concrete type on its own; it has to wrap some other type. In Haskell, a “higher-order” type like this is called a type constructor; it takes a concrete type and returns a new concrete type. For example, Maybe Int or Maybe String are two concrete types created by the Maybe type constructor. This is similar to generics, like List<T>, in Java or C#. We use the type variablea” in the declaration to show that the Maybe type constructor can be applied to any concrete type.

It’s important to note that type constructors, like Maybe, are different from data constructors, like Nothing or Just.

Type Signatures & Currying

> toStrict :: L.ByteString -> B.ByteString
> toStrict = B.concat . L.toChunks

This is a function definition in Haskell. I’ll explain the definition in the next section but first I wanted to talk about the use of “::” since it keeps coming up. This is how we explicitly tell the Haskell compiler what type we expect a value to be. Even though a Haskell compiler is good enough to infer the types of all values in most cases, it’s considered good practice to at least explicitly specify the types of top-level definitions.

The arrow notation in the type signature denotes a function. toStrict is a function that takes an L.ByteString value and returns a B.ByteString value.

One cool thing about Haskell is that functions can only take one argument. I know it doesn’t sound cool but it’s actually really cool. How do you specify a function that takes two arguments you ask? Well that’s a function that takes a single argument and returns another function that takes another argument. This reformulation of multiple-argument functions is called currying. Currying is cool because it allows us to easily do partial application with functions, you’ll see examples of this later.

Here’s the type signature for a function that takes two arguments of any two types and returns a value in the second type:

twoArgFunc :: a -> b -> b

We use the type variables “a” and “b” to indicate that twoArgFunc is polymorphic, i.e. we can apply twoArgFunc to any two values of any two respective types.

With currying in mind, it shouldn’t be too hard to figure out that the arrow is right-associative, i.e. “a -> b -> b” actually means “a -> (b -> b)”. That would make sense, twoArgFunc is a function that takes an argument and returns another function that takes an argument and returns the final value. Say that over and over again until you understand it.

What if we grouped the arrow the other way?

weirdFunc :: (a -> b) -> b

In this case weirdFunc is a function that takes another function as a its sole argument and returns a value. This is much different from twoArgFunc which instead returns a second function after it accepts its first argument. Passing functions to functions like this is a common idiom in Haskell and it’s one of the strengths of a language where functions are ordinary values just like integers and strings.


The definition of toStrict makes use of the function composition operator, “.”, but Haskell functions don’t have to be defined this way. Here’s a function defined using a variable name:

square x = x * x

What about two arguments?

plus x y = x + y

We can define a function that adds five to its argument by making using of currying:

addFive = plus 5

We curried in the 5 argument to the plus function and just as we said, it returns a new function that takes an argument and adds a five to it:

addFive 2 == 7

Another way to define functions is to use a lambda abstraction. You can write a function as an expression by using the backslash and arrow symbols. Lambda abstractions are also known as anonymous functions. Here’s another way to define plus:

plus = \x y -> x + y

All functions in Haskell are pure. This means that functions in Haskell cannot do anything that causes side-effects like changing a global variable or performing IO. More on this later.


Okay now that you’re cool with functions let’s get back to toStrict. Here’s another way to define it using a lambda:

toStrict = \x -> B.concat (L.toChunks x)

Instead, we define toStrict using the function composition operator, “.”. The function composition operator takes two functions and returns a new function that passes its argument to the right function and the result of that is passed to the left function and the result of that is returned. Here’s one possible definition:

f . g = \x -> f (g x)

Yes this is a real way to define an operator in Haskell! The function composition operator comes standard in Haskell but even if it didn’t you’d still be able to define it yourself. In Haskell, operators and functions are actually two different syntaxes for the same thing. The form above is the infix form but you can also use an operator in the prefix form, like a normal function. Here’s another way to define “.”:

(.) f g = \x -> f (g x)

In this definition of the function composition operator we use prefix notation. It is only necessary to surround an operator with parentheses to transform it into its prefix form. Switching between infix and prefix notation works for any operator, e.g. you can add two numbers with (+) 4 5 in Haskell.

The ability to switch between prefix and infix notation isn’t limited to operators, you can do it with functions too by surrounding the function name with backticks, “`”:

div 1 2 == 1 `div` 2

This is useful for code like: "Dropbox" `isInfixOf` "The Dropbox API is sick!"

One last thing about operators; Haskell has special syntax to curry in arguments to operators in infix form. For example, (+5) is a function that takes in a number and adds five to that number. These are called sections. To further illustrate, all of the following functions do the same thing:

  • (+5)
  • (5+)
  • \x -> x + 5
  • (+) 5
  • plus 5
  • addFive

With operator currying it’s important to recognize that the side you curry the argument in matters. For example, (.g) and (g.) behave differently.

The “$” Operator

Next to the “.” operator there is another function-oriented operator that you’ll see often in Haskell code. This is the function application operator and it’s defined like this:

f $ x = f x

Weird right? Why does Haskell have this?

In Haskell, normal function application has a higher precedence than any other operator and it’s left-associative:

f g h j x == (((f g) h) j) x

Conversely, “$” has the lowest precedence of all operators and it’s right-associative:

f $ g $ h $ j $ x == f (g (h (j x)))

Using “$” can make Haskell code more readable as an alternative to using parentheses. It has other uses too, more on that later.


> listToPair :: [a] -> (a, a)
> listToPair [x, y] = (x, y)
> listToPair _ = error "called listToPair on list that isn't two elements"

listToPair is a little function I use to convert a two-element list to a two-element tuple, usually called a pair.

A list in Haskell is a higher order type that represents an ordered collection of same-typed values. A list type is denoted using brackets, e.g. “[a]” is the polymorphic list of any inner type and “[Int]” is a concrete type that denotes a list of Int values. Unlike vectors or arrays in other languages, you can’t index into a Haskell list in constant time. It is more akin to the traditional Linked List data structure.

You can construct a list in a number of ways:

  • The empty list: []
  • List literal: [1, 2, 3]
  • Adding to the front of a list: 1 : [2]

The “:” operator in Haskell constructs a new list that starts with the left argument and continues with the right argument:

1 : [2, 3, 4] == [1, 2, 3, 4]

One last thing about the list type in Haskell, it’s not that special. We can define our own list type very simply:

data MyList a = Nil | Cons a (MyList a)

-- using "#" as my personal ":" operator
x # xs = Cons x xs

myHead :: MyList a -> a
myHead (Cons x xs) = x
myHead Nil = error "empty list"

Yep, a list is just a recursive algebraic datatype.

foldr and map

In the Lisp tradition, lists are a fundamental data structure in Haskell. They provide an elegant model for solving problems that deal with multiple values at once. While lists are still fresh in your mind let’s go over two functions that are essential to know when manipulating lists.

The first function is foldr. foldr means “fold from the right” and it’s used to build a aggregate value by visiting all the elements in a list. It’s arguments are an aggregating function, a starting value, and a list of values. The aggregating function accepts two argu—Screw it, let’s just define it using recursion:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = f x $ foldr f z xs

Notice how we used the “:” operator to deconstruct the input list into its head and tail components. Like I said, you can use foldr to aggregate things in a list, e.g. adding all the numbers in a list:

foldr (+) 0 [1..5] == 15

[1..5]” is syntactic sugar for all integers from 1 to 5, inclusive.

Another less useful thing you can do with foldr is copy a list:

foldr (:) [] ["foo", "bar", "baz"] == ["foo", "bar", "baz"]

foldr’s brother is foldl; it means “fold from the left.”

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

foldl collects values in the list from the left while foldr starts from the right. Notice how the type signature of the aggregating function is reversed, this should help as a sort of mnemonic when using foldr and foldl. Though it may not seem like it, the direction of the fold matters a lot. As an exercise try copying a list by using foldl instead of foldr.

map is another common list operation. map is awesome! It takes a function and a list and returns a new list with the user-supplied function applied to each value in the original list. Recursion is kind of clunky so let’s define it using foldr:

map :: (a -> b) -> [a] -> [b]
map f l = foldr ((:) . f) [] l

Even though foldr is more primitive than map I find myself using map much more often. Here’s how you would map a list of strings to a list of their lengths:

map length ["a", "list", "of", "strings"] == [1, 4, 2, 7]

Remember “$”, the function application operator? We can also use map to apply the same argument to a list of functions:

map ($5) [square, (+5), div 100] == [25, 10, 20]

We curried in the 5 value into right side the “$” operator. That creates a function that takes a function and then returns the application of that function to the value 5. Using map we then apply that to every function in the list. This is why having an “$” operator in a functional language is a good idea but it’s also why map is awesome!

A Quick Note About Tuples

I talked about lists but I kind of ignored tuples. Like lists, tuples are a way to group values together in Haskell. Unlike lists, with tuples you can store values of different types in a single tuple.

intAndString :: (Int, String)
intAndString = (07734, "world")

A more subtle difference from lists is that tuples of different lengths are of different types. For instance, writing a function that returns the first element of a three-element tuple is easy:

fst3 :: (a, b, c) -> a
fst3 (x, _, _) = x

Unfortunately, there is no general way to define a “first” function for tuples of any length without writing a function for each tuple type. Haskell does at least provide a fst function for two-element tuples.

Final note, see how we ignored the second and third elements of the tuple deconstruction by using “_”? This is a common way to avoid assigning names in Haskell.

Template Haskell

> $(mkYesod "ImageSearch" [parseRoutes|
>                         / HomeR GET
>                         /dbredirect DbRedirectR GET
>                         |])

Yesod makes heavy use of Template Haskell. Template Haskell allows you to do compile-time metaprogramming in Haskell, essentially writing code that writes code at compile-time. It’s similar to C++ templates but it’s a lot more like Lisp macros. Template Haskell is a pretty exotic feature that is rarely used in Haskell but Yesod makes use of it to minimize the amount of code you have to write to get a website up and running.

The mkYesod function here generates all the boilerplate code necessary for connecting the HTTP routes to user-defined handler functions. In our app we have two HTTP routes:

  1. / -> HomeR
  2. /dbredirect -> DbRedirectR

The first route connects to a resource called HomeR. The second route, located at /dbredirect, connects to a resource called DbRedirectR. We’ll define these resources later.

Type Classes

This part of the app brings us to one of the most powerful parts of Haskell’s type system, type classes.

Type classes specify a collection of functions that can be applied to multiple types. This is similar to interfaces in C# and Java or duck typing in dynamically-typed languages like Python and Ruby. An instance declaration actually defines the functions of a type class for specific type. Formally, type classes are an extension to the Hindley-Milner type system that Haskell implements to allow for ad-hoc polymorphism, i.e. an advanced form of function overloading.

Note that the word instance used in this context is very different from the meaning in object-oriented languages. In an object-oriented language instances are more akin to Haskell values.

Section 6.3 of the Haskell 2010 Report has a graphic of the standard Haskell type classes. Many core functions in the standard library are actually part of some type class, e.g. (==), the equals operator, is part of the Eq type class. For fun, let’s make a new type class called Binary. It defines functions that convert data between the instance type and a byte format:

import qualified Data.ByteString as B

class Binary a where
  toBinary :: a -> B.ByteString
  fromBinary :: B.ByteString -> a

You can imagine I might use this class when serializing Haskell data over a byte-oriented transmission medium, for example a file or a BSD socket. Here’s an example instance for the String type:

import Data.Text as T
import Data.Text.Encoding as E
import Data.ByteString as B

instance Binary String where
  toBinary = E.encodeUtf8 . T.pack
  fromBinary = T.unpack . E.decodeUtf8

For this instance toBinary is defined as a composition of E.encodeUtf8 and T.pack. Notice the use of T.pack, you’ll see it a lot. T.pack converts a String value into a Text value. E.encodeUtf8 converts the resulting Text value into a ByteString value. fromBinary does the inverse conversion, it converts a ByteString value into a String value.

Let’s define another instance:

import Control.Arrow ((***))
import Data.Int (Int32)
import Data.Bits (shiftR, shiftL, (.&.))

-- stores in network order, big endian
instance Binary Int32 where
  toBinary x = B.pack $ map (fromIntegral . (0xff.&.) . shiftR x . (*8))
               $ reverse [0..3]
  fromBinary x = sum $ map (uncurry shiftL . (fromIntegral *** (*8)))
                 $ zip (B.unpack x) (reverse [0..3])

fromBinary in this instance may look kind of gnarly but you should know what it does; it converts a ByteString value to an Int32 value. Understanding it is left as an exercise for the reader.

Now getting Haskell data into a byte format is as easy as calling toBinary. An important distinction between type classes and the interfaces of C# and Java is that it’s very easy to add new functionality to existing types. In this example, the creators of the both the String and Int32 types didn’t need any foreknowledge of the Binary type class. With interfaces, it would have been necessary to specify the implementation of toBinary and fromBinary at the time those types were defined.

It’s also possible to define functions that depend on their arguments or return values being part of a certain type class:

packWithLengthHeader :: Binary a => a -> B.ByteString
packWithLengthHeader x = B.append (toBinary $ B.length bin) bin
  where bin = toBinary x

Here packWithLengthHeader requires that its input type “a” be a part of the Binary type class, this is specified using the “Binary a =>context in the type signature. A subtle point in the definition of this function is that it requires the Int type to be a part of the Binary type class as well (the return value of B.length).

> instance Yesod ImageSearch where
>   approot _ = T.pack "http://localhost:3000"

Yesod requires you to declare a couple of instances for your app type. Most of the definitions in the Yesod type class are optional and have reasonable defaults but it does require you to define approot, the root URL location of your app. This is necessary for Yesod to be able generate URLs.

> instance RenderMessage ImageSearch FormMessage where
>   renderMessage _ _ = defaultFormMessage

Here’s another instance declaration. The RenderMessage type class in this case is actually based on two types, ImageSearch and FormMessage. It defines a function called renderMessage which takes two arguments and returns defaultFormMessage.

Exceptions & Automatic Type Class Derivation

> data EitherException = EitherException String
>     deriving (Show, Typeable)
> instance C.Exception EitherException

There are multiple ways to specify errors in Haskell. In purely functional contexts it’s not uncommon to see the use of either the Maybe or Either types. In monads based on the IO monad, I usually like to use Haskell exceptions. What’s a monad you say? It’s complicated. Just kidding :) I’ll get to them later.

For now, we’re defining a new exception type. It’s the same algebraic datatype declaration you saw earlier for our ImageSearch type except now there’s this “deriving” thing. A Haskell compiler can automatically derive instance declarations for some of the standard type classes. For EitherException we automatically derive instances for the type classes Show and Typeable. As a note, the ability to automatically derive instances for the Typeable type class was enabled by the LANGUAGE pragma DeriveDataTypeable above.

The last line declares EitherException to be an instance of C.Exception. It might be weird that we didn’t define any functions for this instance. This is because type classes sometimes provide default implementations for the functions in the class. The C.Exception type class actually provides default implementations for types that are part of the Typeable type class.


> exceptOnFailure :: MonadBase IO m => m (Either String v) -> m v
> exceptOnFailure = (>>=either (C.throwIO . EitherException) return)

exceptOnFailure is a function that takes a monad that wraps an Either type and returns that same monad except now wrapping the right side of the Either type. This makes sense to me very clearly but I know, o patient reader, that this must look like gibberish to you.

First let’s talk about monads. Monads are types within the Monad type class. The Monad type class specifies two functions (or an operator and a function):

class Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  return :: a -> m a

The actual Monad type class definition is slightly more complicated but for simplicity’s sake this will do.

The first operator “>>=” is called bind. What does it do? Well it depends on your monad instance! What we can say for sure is that it takes takes a monad of type “a”, a function that maps from “a” to a monad of type “b”, and returns a monad of type “b”.

The second function, return, takes a value and wraps it in the monad. What it means to be wrapped in the monad, again, depends on the instance. Please note, the return function isn’t like the return statement in other languages, i.e. it doesn’t short-circuit execution of a monad bind sequence.

If that sounds abstract that’s because it is! Monads are a sort of computational framework, many different types of computation are monadic, i.e. they fit the form imposed by bind. Why are monads important? Perhaps the most important reason they exist in Haskell is that they provide a purely functional way to perform (or specify how to perform) IO. Monads are much more than just a way to do IO, however, their general applicability extends to many things.

I won’t dwell on monads too much in this post but for the purposes of your immediate understanding, it suffices to explain the IO monad and do notation. Just as I’m not dwelling on monads, you shouldn’t either. It takes a long time to really understand and master what’s going on. The more you program in Haskell, the more it’ll make sense.

So why can’t we do IO in Haskell without the IO monad? Haskell is a purely functional language, that means that functions in Haskell mimic their mathematical counterparts. A pure function is a mathematical entity that consistently maps values from one domain to another. You expect cos(0) to always evaluate to 1, if it ever evaluated to something else something would be very wrong.

In a purely functional language how would you define getChar?

getChar :: Char
getChar = ...

It takes no arguments so how can it deterministically return what the user is submitting? The answer is it can’t. You can’t do this in a purely functional language.

So what are our options? The answer is to generate a set of actions to take as IO is occurring, in a purely functional manner. This is what the IO monad is and this is why values in the IO monad are called IO Actions. It’s a form of metaprogramming. Here’s an IO action that prints “yes” if a user types “y” and “no” otherwise:

-- type signatures
print :: Show a => a -> IO ()
getChar :: IO Char

main :: IO ()
main = getChar >>= (\x -> print $ if x == 'y' then "yes" else "no")

Why does this work? Notice how the “output” of getChar isn’t tied to its own value, instead the bind operation gets the output value for us. We’re using monads here to build and model sequential and stateful computation, in a purely functional way!

You can imagine that writing real programs in the IO monad could get ugly if you used “>>=” and lambdas everywhere so that’s why Haskell has some syntactic sugar for writing out complex monadic values. This is called do notation. Here’s the same IO action from above written in do notation.

main = do
  x <- getChar
  print $ if x == 'y'
          then "yes"
          else "no"

In do notation each monad is bound using bind in order and values are pulled out using the left-pointing arrow “<-”.

We’re coming to the close of yet another Haskell monad explanation but before we finish I really want to emphasize that monads and the IO monad in particular aren’t that special. Here’s my very own implementation of the IO monad:

data MyIO a = PrimIO a | forall b. CompIO (MyIO b) (b -> (MyIO a))

myGetChar :: MyIO Char
myGetChar = PrimIO 'a'

myPrint :: Show a => a -> MyIO ()
myPrint s = PrimIO ()

instance Monad MyIO where
  m >>= f = CompIO m f
  return x = PrimIO x

runMyIO :: MyIO m -> m
runMyIO (PrimIO x) = x
runMyIO (CompIO m f) = runMyIO (f (runMyIO m))

Of course in the real IO monad, getChar isn’t hard-coded to return the same thing each time and print actually prints something on your terminal. IO actions are run by your Haskell runtime which is usually written in a language where you can actually call a non-pure getChar function, like C.

Either Type

Now, back to exceptOnFailure. Let’s look at it again:

exceptOnFailure :: MonadBase IO m => m (Either String v) -> m v
exceptOnFailure = (>>=either (C.throwIO . EitherException) return)

Is it still confusing? :)

Remember how I said earlier that the Either datatype was used to denote errors in Haskell? The definition of Either looks like this:

data Either a b = Left a | Right b

either :: (a -> c) -> (b -> c) -> Either a b -> c
either f _ (Left x) = f x
either _ g (Right x) = g x

You can use the either function to return different values depending on the Either datatype passed in. By convention the Left constructor is used to denote an error value.

For exceptOnFailure, first we create a function that takes an Either value and if it’s a failure we throw an exception using C.throwIO otherwise we call return to rewrap the success value. Then we curry in that function to the right side of the “>>=” operator.

Some Utility Functions

> myAuthStart :: DB.Config -> Maybe DB.URL -> IO (DB.RequestToken, DB.URL)
> myAuthStart config murl = exceptOnFailure $ DB.withManager $ \mgr ->
>   DB.authStart mgr config murl
> myAuthFinish :: DB.Config -> DB.RequestToken -> IO (DB.AccessToken, String)
> myAuthFinish config rt = exceptOnFailure $ DB.withManager $ \mgr ->
>   DB.authFinish mgr config rt
> myMetadata :: DB.Session -> DB.Path -> Maybe Integer
>               -> IO (DB.Meta, Maybe DB.FolderContents)
> myMetadata s p m = exceptOnFailure $ DB.withManager $ \mgr ->
>   DB.getMetadataWithChildren mgr s p m

Here I’ve defined a couple of convenience functions for using the Dropbox SDK. This is just so I don’t have to use DB.withManager every time I call these functions. Also all of the vanilla Dropbox SDK functions return an Either value in the IO monad so we make use of exceptOnFailure to automatically throw an exception for us if something goes wrong.

> tryAll :: MonadBaseControl IO m => m a -> m (Either C.SomeException a)
> tryAll = C.try

C.try is the normal way to catch exceptions in Haskell. Unfortunately it uses ad-hoc polymorphism within the Exception type class to determine which exception it catches. Since tryAll has an explicit type signature, it’s bound to the instance of C.try that catches C.SomeException.

> dbRequestTokenSessionKey :: T.Text
> dbRequestTokenSessionKey = T.pack "db_request_token"

This is a constant I use for the name of my session key that stores the OAuth request token when authenticating the user to my API app but more on that later.

The Dropbox API Authentication Process

The Dropbox API uses OAuth to grant apps access to Dropbox user accounts. To access any of the HTTP endpoints of the Dropbox API an app must provide an access token as part of the HTTP request. Access tokens are revokable long-lived per-user per-app tokens that are granted to an app at the request of a user.

Acquiring an access token is a three step process:

  1. The app must obtain a unauthenticated request token via the Dropbox API.
  2. The app redirects the user to the Dropbox website. Once the user is at the Dropbox website the user can either approve or deny the request to authenticate the request token for the app.
  3. The Dropbox website redirects the user back to the app’s website. The app can then exchange the authenticated request token for an access token.

A request token actually consists of two components, a key and a secret. Only the key component should be exposed in plaintext. To ensure proper security, the secret should only ever be known to the Dropbox servers and the API app attempting authentication. To exchange an authenticated request token for an access token, the app must also provide the original secret of the request token. This prevents third-parties from hijacking authenticated request tokens.

An access token is long-lived but at any point in time can become invalid. When an access token becomes invalid it is the responsibility of the API app to go through the authentication process again. This allows Dropbox and its users to revoke access tokens at will.

Initiating the Authentication Process

> getHomeR :: Handler RepHtml
> getHomeR = do

Users can enable our app for their Dropbox account using the web. Yesod is the web framework we are using to implement web pages in Haskell. In Yesod all HTTP routes are values in the Handler monad. The convention is that the handler name is the combination of the HTTP method (GET in this case) and the name of the resource. getHomeR is the handler for the GET method on the “HomeR” resource which is located at root HTTP path, “/”. Handlers are connected to HTTP routes served by the web server via the use of Template Haskell above.

The Handler monad is essentially a decorated IO monad so don’t worry about what it is that much. You should use it just like you would use the IO monad.

>   ImageSearch _ dbConfig <- getYesod

getYesod retrieves the app’s value. In our app this value has the ImageSearch type that we defined at the very beginning. Here we’re deconstructing the ImageSearch value and extracting only the config component (while ignoring the channel component). The config value stores some information about our app, like app key and locale, that is used by the Dropbox SDK.

>   myRender <- getUrlRender

getUrlRender gets the URL render function for your app. It turns a resource value for your app into a Text URL.

>   (requestToken, authUrl) <- liftIO
>                              $ myAuthStart dbConfig
>                              $ Just $ T.unpack $ myRender DbRedirectR

Here we call the DB.authStart function (by way of myAuthStart). This function performs the first step of the Dropbox API authentication process. We pass in the URL, DbRedirectR, that we want the user to be redirected to after they authenticate and we get back our new unauthenticated request token and the Dropbox URL where the user can authenticate it. Note that T.unpack converts a Text value into a String value.

The myAuthStart function is in the IO monad so we make use of liftIO to execute myAuthStart in the Handler monad. Since Handler is a wrapper around the IO monad, the liftIO function “lifts” the IO action into the higher monad.

>   let DB.RequestToken ky scrt = requestToken

Do notation allows you to bind names using “let” in a do block. This is for when you need to assign a name to a value that isn’t coming from a monad. Here we’re deconstructing the request token from myAuthStart to get the key and the secret.

>   setSession dbRequestTokenSessionKey $ T.pack $ ky ++ "|" ++ scrt

A session in Yesod is a set of key-value pairs that is preserved per browsing session with our web site. It’s implemented using encryption on top of HTTP cookies. We store the key and the secret of the request token in the session using setSession. We’ll need the secret to finish the authentication process after the user authenticates our app.

setSession expects two Text values so we use T.pack to turn the second argument from a String type into a Text type.

>   redirectText RedirectTemporary $ T.pack authUrl

Finally we redirect the user to the URL given to us from myAuthStart, authUrl. Dropbox will ask the user if they want to allow our app to have access to their account. After they respond, Dropbox will authenticate the request token and then redirect the user to the URL passed to the call to myAuthStart above.

Applicative Functors

> getDropboxQueryArgs :: Handler (T.Text, T.Text)
> getDropboxQueryArgs = runInputGet $ (,)
>                       <$> ireq textField (T.pack "oauth_token")
>                       <*> ireq textField (T.pack "uid")

When Dropbox redirects the user back to our site it passes along some query args in the GET request: “oauth_token” and “uid”. Yesod provides a convenient way, using runInputGet, to extract those in the handler.

The (,) operator is a special prefix-only operator that creates pairs for us:

(,) x y = (x, y)

You might be wondering what “<$>” and “<*>” are. Relax, these are regular operators. They are used for applicative functors. Applicative functors are kind of like monads except not as powerful. I’m going to do something horrible here and define “<$>” and “<*>” in monad terms:

(<*>) :: Monad m => m (a -> b) -> m a -> m b
mf <*> m = do
  f <- mf
  x <- m
  return $ f x

(<$>) :: Monad m => (a -> b) -> m a -> m b
f <$> m = return f <*> m

If you pretend that bind, “>>=”, doesn’t exist and you only have “<*>” and return defined for your type, then your type isn’t a monad, it’s an applicative functor. The only exception is that return is instead called pure in the applicative functor type class:

class Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

The actual Applicative type class definition is slightly more complicated but for simplicity’s sake this is good enough.

Now let’s put it all together, in our definition of getDropboxQueryArgs we apply (,) in the applicative functor, then pass the resulting value to runInputGet. runInputGet then runs the applicative functor in the Handler monad.

I know it sounds crazy, I know it does, but luckily you don’t have to fully understand what’s going on behind the scenes to understand how it’s supposed to behave. Keep writing and reading Haskell and eventually it’ll make a lot of sense. Trust me, if I can understand this stuff you can too.

The Authentication Result Handler

> getDbRedirectR :: Handler RepHtml
> getDbRedirectR = do

This is the handler for the location of the redirect in the app authentication process. After the user has given our app access to their account they are redirected here.

>   mtoken <- lookupSession dbRequestTokenSessionKey

Remember how before we redirected the user to the Dropbox authentication URL we first set a key-value pair in the Yesod session using setSession? After the user is redirected, we use the lookupSession function to get the token back out. lookupSession returns a Maybe value so that if a key-value pair does not exist in the current session it can return Nothing, otherwise it will return the value wrapped in the Just constructor.

>   let noCookieResponse = defaultLayout [whamlet|
> cookies seem to be disable for your browser! to use this
> app you have to enable cookies.|]
>   when (isNothing mtoken) $ noCookieResponse >>= sendResponse

when is a nice function courtesy of Control.Monad that runs the second argument, a value in some monad, only if the first argument is true. It’s defined like this:

  when :: Monad m => Bool -> m () -> m ()
  when p m = if p then m else return ()

If mtoken is bound to a Nothing value we’ll return an error message to the user asking them to enable cookies and discontinue normal execution by using sendResponse. After the isNothing check we are guaranteed that mtoken is bound to a Just value.

>   let rt@(sessionTokenKey:_) = T.splitOn (T.pack "|")
>                                $ fromJust mtoken

We extract the token from mtoken using fromJust and pass that along to T.splitOn. T.splitOn will split a Text value into a list of Text values using the input argument (“|” in this case) as the delimiter. Then we use the “@” syntax to simultaneously bind the result to the rt name and deconstruct the first element of the result into the sessionTokenKey name.

>   (getTokenKey, _) <- getDropboxQueryArgs
>   when (getTokenKey /= sessionTokenKey)
>     $ invalidArgs [T.pack "oauth_token"]

To get the request token key that the user authenticated at the Dropbox website we use getDropboxQueryArgs. Checking this key against the request token key that we stored in the session helps prevent request forgery. If the keys don’t match we stop execution of this handler by calling invalidArgs. “/=” is the not equals operator, like “!=” in other languages.

We do this verification because we want to prevent other sites from successfully coercing a user into invoking this handler. We only want the Dropbox website to invoke this handler.

>   let requestToken = uncurry DB.RequestToken
>                      $ listToPair $ map T.unpack rt

Here’s listToPair in action! Since it returns a tuple we use this nifty built-in function called uncurry:

uncurry f (x, y) = f x y

Using uncurry and listToPair, we pass the DB.RequestToken constructor the request token key and secret that we stored in the session. Since rt contains two Text values we use “map T.unpack” to convert them into two String values.

>   ImageSearch chan dbConfig <- getYesod
>   accessToken <- liftIO $ myAuthFinish dbConfig requestToken

Now we can finish up the authentication process and get our access token. We pass in the authenticated request token to DB.authFinish (by way of myAuthFinish) and if everything is successful we obtain the access token.

>   liftIO $ writeChan chan accessToken

In this app we make use of Concurrent Haskell. This allows us to create multiple independent threads of control in the IO monad. There are logically two main concurrently running threads in this app, the web server thread where all of our handler code is run, and the thread that is updating the Dropbox accounts of the users of our app with the relevant search results. We’ll talk about our use of threads a lot more later.

Channels are a mechanism for typed inter-thread communication in Haskell. We use the channel component of our app value to send over the access token so that we can begin the updating process for this user’s Dropbox account.

>   defaultLayout [whamlet|okay you are hooked up!|]

Finally, we send a success message to the user’s browser indicating they have linked their account to our app.


We’ve gone over monads and we’ve gone over their weaker counterparts, applicative functors. Arrows are another pattern you’ll likely see used in Haskell code. They commonly serve as a general framework for representing and combining operations that convert input to output, kind of like filters. Arrows, like monads and applicative functors, are actually types and arrow types are instances of the Arrow type class:

class Arrow a where
  arr :: (b -> c) -> a b c
  (>>>) :: a b c -> a c d -> a b d
  first :: a b c -> a (b, d) (c, d)

The actual Arrow type class definition is slightly more complicated but for simplicity’s sake this will do just fine :)

Given my analogy to filters you might think that regular Haskell functions resemble arrows and you’d be right. Arrows are generalizations of regular Haskell functions and functions are, in fact, defined as instances of the Arrow type class:

instance Arrow (->) where
  arr f = f
  f >>> g = g . f
  first f (x, y) = (f x, y)

To understand this instance declaration it’s important to note that the arrow symbol, “->”, is an operator in the Haskell type language. Just like normal operators, operators in the type language also have prefix forms, e.g. “(->) a a” is the same type as “a -> a.”

In “instance Arrow (->)”, we turned the “->” operator into its prefix form and used that to define the Arrow instance for functions. In Haskell, the “->” operator in the type language is actually a type constructor, i.e. when you apply it to two types it creates a new type, a function of those two types.

> selectImageUrls :: ArrowXml a => a XmlTree String
> selectImageUrls = deep (isElem
>                         >>>
>                         hasName "a"
>                         >>>
>                         hasAttr "href"
>                         >>>
>                         getAttrValue "href"
>                         >>>
>                         isA isImgUrl
>                        )
>   where isImgUrl = DL.isInfixOf "imgurl="

It’s common to use the Arrow type class as a basis for combinator libraries; libraries that allow you to combine domain-specific functions in intuitive ways to create new functions. HXT is one such combinator library, it’s a library for manipulating XML documents. In our app we define an arrow, selectImageUrls, that takes an XML document, denoted by the XmlTree type, and extracts all of the links that contain the string “imgurl=”. These are links in the image search result page that contain the URLs to the found image files.

Thread Architecture

Above I wrote that there were two main threads of execution in this app: the thread that served out HTTP requests for our web front-end and the thread that implemented the image search functionality in our user’s Dropboxes. For the latter part, there are actually many threads. There is one thread that is listening on the Haskell channel for new users linking our app to their accounts from the web server thread. There is a thread per user account that polls the user’s Dropbox account every 30 seconds and waits for new folders for it to populate with images. There is also a thread per new folder per user account that is responsible for populating a specific folder with the images found during the image search.

In other languages like C, C++, Java, or Python this unbounded use of threads wouldn’t be very efficient since threads in those languages often map 1:1 to kernel-level threads. Normally you can’t depend on kernel-level threads scaling into the tens of thousands. In Haskell (or at least in modern versions of GHC) threads are relatively cheap and the runtime does a good job of distributing many Haskell threads across a bounded number of kernel-level threads, usually one per CPU.

Image Uploading Thread

> handleFolder :: DB.Session -> String -> IO ()
> handleFolder session filePath = do

handleFolder is the thread that is responsible for populating a specific folder in a user’s Dropbox with the image search results.

>   let searchTerm = takeFileName filePath

We use the file name portion of the folder path as the search term.

>       src__ = "http://www.yourfavoritesearchengine.com/"
>       src_ = src__ ++ "search?tbm=isch&tbs=sur:f&"
>       src  = src_ ++ (URL.export $ URL.importList [("q", searchTerm)])

src is the generated full image search URL. We use the URLEncoded library to generate a properly escaped URL query string.

>   imageSearchResponse_ <- simpleHttp src
>   let imageSearchResponse = T.unpack
>                             $ decodeUtf8With ignore
>                             $ toStrict imageSearchResponse_

Here we fetch the image search result page using simpleHttp. simpleHttp returns a lazy ByteString type but our XML library requires a String type so we have to convert between the two using a combination of T.unpack, decodeUtf8With, and toStrict.

>   images <- runX ( readString [ withParseHTML yes
>                               , withWarnings no
>                               ] imageSearchResponse
>                    >>>
>                    selectImageUrls
>                  )

Here we make use of the HXT XML processing library to parse out all the relevant image search URLs from the HTML document returned from the search query. Notice the use of the selectImageUrls arrow defined earlier. images is of type [String].

>   forM_ images $ \url -> tryAll $ do

forM_ executes a monad for each element in its list argument. The second argument is a function that takes an element from the input list and returns the corresponding monadic value.

Using forM_ we’re performing an IO action for each image URL we parsed out of the result page to ultimately upload that image into the user’s Dropbox. We wrap the monad expression in a tryAll to prevent an exception in the processing of any single element from stopping the entire process.

>     urlenc <- URL.importURI $ fromJust $ URI.parseURIReference url
>     let imgUrl = fromJust $ URL.lookup ("imgurl" :: String) urlenc
>         imgUrlURI = fromJust $ URI.parseURIReference imgUrl
>         imgName = takeFileName $ URI.uriPath imgUrlURI
>         dropboxImgPath = combine filePath imgName

Each of the URLs that were parsed out of the HTML contain the source URLs of the images in an embedded query arg, “imgurl”. In this code snippet we extract the actual source URL of the image, imgUrl, from the “imgurl” query arg and we generate the path into the user’s Dropbox where we want to place the image, dropboxImgPath.

>     image <- simpleHttp imgUrl

simpleHttp performs an HTTP request to the location of the source image and returns the response body in a lazy ByteString.

>     DB.withManager $ \mgr ->
>       DB.addFile mgr session dropboxImgPath
>       $ DB.bsRequestBody $ toStrict image

This is the call to the Dropbox SDK that allows us to upload a file. We upload the file data, image, to the path we generated earlier, dropboxImgPath. If a file already exists at that path, DB.addFile won’t overwrite it.

New Folder Detection Thread

> handleUser :: Chan DB.AccessToken
>               -> DB.Config
>               -> DB.AccessToken -> [String] -> IO ()
> handleUser chan dbConfig accessToken_ foldersExplored = do

handleUser is the thread that runs for each user that is linked to our API app. It monitors the user’s Dropbox for new folders that we should populate with search results. It polls the user’s Dropbox every 30 seconds and loops forever.

While this thread is running it’s possible for the handleNewUsers thread to send us a new access token to use through the channel given by the chan argument.

>   accessToken <- let getCurrentAccessToken x = do
>                        emp <- isEmptyChan chan
>                        if emp
>                          then return x
>                          else readChan chan >>= getCurrentAccessToken
>                  in getCurrentAccessToken accessToken_

I use the “let ... in ...” syntax to privately define the getCurrentAccessToken function. This function repeatedly polls the access token channel using isEmptyChan until it’s empty at which point it returns the last access token that was pulled off the channel.

>   efolders <- tryAll $ do

We wrap all the IO actions in this run of handleUser just in case a transient exception occurs.

>     let session = DB.Session dbConfig accessToken

session is the name bound to the DB.Session value that the Dropbox SDK interface needs to upload file data into a user’s Dropbox.

>     metadata <- myMetadata session "/" Nothing

We make use of myMetadata to get a collection of all the children inside the root of our API app sandbox, “/”.

>     let (_, Just (DB.FolderContents _ children)) = metadata
>         folders = mapMaybe (\(DB.Meta base extra) ->
>                              case extra of
>                                DB.Folder -> Just $ DB.metaPath base
>                                _ -> Nothing) children
>         newFolders = folders DL.\\ foldersExplored

This snippet of code extracts out the list of new Dropbox paths that we should be populate with image search results.

mapMaybe is a combination of map and filter. Any element that the input function returns Nothing for is filtered out of the returned list. Elements that the function returns a Just value for are included in the output list without the Just wrapper. We use it here to return all the paths in the “/” folder that are folders and we exclude children that are plain files.

The “DL.\\” operator returns all the elements in the first list operand that aren’t included in the second list operand, it’s like a set difference operation.

>     forM_ newFolders (forkIO . handleFolder session)

Here we use the forM_ function again. This time we spawn off a new thread using forkIO for each new folder we found in the app’s sandbox folder.

>     return folders

We need to give our parent IO action access to the new list of paths in the sandbox folder so it can keep track of what paths are new.

>   threadDelay $ 30 * 1000 * 1000

threadDelay is like sleep() in other languages; It pauses execution for 30 seconds.

>   let curFolders = either (const []) id efolders

If an error occurred while polling the user’s account for new folders we bind an empty list to curFolders, otherwise we bind the current list of folders to curFolders.

>   handleUser chan dbConfig accessToken
>     $ curFolders `DL.union` foldersExplored

After sleeping for 30 seconds we loop by recursing. This is the common way in Haskell to loop in a monad. Before we recurse here we update the total lists of folders we’ve ever seen so that we don’t attempt to update them again.

New App User Thread

> handleNewUsers :: ImageSearch
>                   -> Map.Map String (Chan DB.AccessToken)
>                   -> IO ()
> handleNewUsers app_@(ImageSearch chan dbConfig) map_ = do

handleNewUsers is the thread that is listening for newly linked users to our API app via the channel and spawns off a handleUser thread for each new user.

We make use of the “@” syntax again to simultaneously bind the ImageSearch argument to the app_ name and deconstruct it into its chan and dbConfig components. The map_ argument keeps a mapping from user ID to the channel of the thread that is handling that user ID. We need that so we can update the access token a thread is using if it is revoked.

>   (accessToken, userId) <- readChan chan

readChan gets a value off the channel shared between this thread and the web server thread. Each value is a tuple that contains a user ID and an access token for that user ID.

>   newMap <- case Map.lookup userId map_ of
>     Just atChan -> do
>       writeChan atChan accessToken
>       return map_
>     Nothing     -> do
>       nChan <- newChan
>       _ <- forkIO $ handleUser nChan dbConfig accessToken []
>       return $ Map.insert userId nChan map_

We look up the user ID we were given in our map of thread channels. If we have a thread handling the account of user ID we got, we send it the new access token by writing to its channel. If we don’t have a thread handling this specific user account then we create a new channel, spawn off a new handleUser thread, and update our channel map.

>   handleNewUsers app_ newMap

Finally we loop with the new map.

Main Program Entry Point

> main :: IO ()
> main = do

So it’s been a long and arduous path but finally we arrive at that main IO action. The main IO action kicks off execution for every Haskell program just like in C/C++ and Java.

>   let defaultAppKey = undefined :: String
>       defaultAppSecret = undefined :: String
>       defaultAppType = undefined :: DB.AccessType

Here are the default credentials for the app. We use the Haskell value undefined, otherwise known as _|_. This is a polymorphic constant that you can use anywhere in Haskell, it can be of any type. An exception will be thrown if an undefined value is ever evaluated in your Haskell program. To get this app to work you will need to supply your own values for these constants.

In theory these should be parsed out of the command line or a configuration file but for the purposes of this demo app we define them inline here.

>   chan <- newChan

Create the inter-thread communication channel using newChan.

>   dbConfig <- DB.mkConfig
>               DB.localeEn
>               defaultAppKey
>               defaultAppSecret
>               defaultAppType
>   let app_ = ImageSearch chan dbConfig

Create our application specific ImageSearch value. It contains both the channel and a DB.Config value.

>   _ <- forkIO $ handleNewUsers app_ Map.empty

Kick off the handleNewUsers thread that accepts new users to our app.

>   warpDebug 3000 app_

And finally, call warpDebug which kicks off our Yesod web interface.


That’s it. That’s our Dropbox API app in Haskell. If you were a newcomer to Haskell this would be a healthy time to have tons of questions. Actually if I’ve done my job right you should be very curious to know more about Haskell :) Head on over to HaskellWiki and start your journey. If you want a nice friendly book to help you get more formally acquainted I can recommend both “Real World Haskell” and “Learn You a Haskell for Great Good”. One piece of advice for your new Haskell journey: don’t sweat the monads.

As for our API app, it’s actually not finished yet. One huge thing missing is that it doesn’t remember which users linked to our app and what folders we’ve populated across restarts, we’d need to store that data in some kind of persistent database to fix that.

Another pain point is the user has to wait 30 seconds in the worst case for their folders to be populated with images. This is because each of our handleUser threads poll the Dropbox API every 30 seconds. While this is bad from a user experience perspective it’s also bad from an engineering perspective. This will cause the load we induce on the Dropbox API to increase linearly with the number of users using our app, we’d instead like it to increase linearly with the number of active users using our app. Currently there’s no way to get around this issue but we’re working on it!

Other minor improvements include picking a better algorithm to decide which folders we populate with photos, better HTML for our web interface, and streaming uploads to the Dropbox API. I’m sure there are more.

As an exercise, consider fixing some of these problems, remember you can fork this project on GitHub. By the way our Haskell SDK is also on GitHub, you may need to fork that too.

If you have any questions, feel free to reach out. My email is rian+dropbox+com. Have fun!

Many thanks to Kannan Goundan, Brian Smith, Dan Wheeler, ChenLi Wang, Martin Baker, Tony Grue, Ramsey Homsany, Bart Volkmer, Jie Tang, Chris Varenhorst, and Scott Loganbill for their help in reviewing this post. Also special thanks to Michael Snoyman for creating Yesod and accepting my patches. Lastly, a huge thanks to all those who have researched and pushed Haskell forward throughout the years.

A Python Optimization Anecdote

Uncategorized / 49 Comments Posted by Pavel Panchekha on October 24, 2011

Hi! I’m Pavel and I interned at Dropbox over the past summer. One of my biggest projects during this internship was optimizing Python for dynamic page generation on the website. By the end of the summer, I optimized many of dropbox.com’s pages to render 5 times faster. This came with a fair share of challenges though, which I’d like to write about today:

The Problem
Dropbox is a large website with lots of dynamically generated pages. The more pages that are dynamically generated from user input, the bigger the risk becomes for Cross-site scripting attacks. To prevent this, we must ensure to escape all dynamically generated text before it gets sent to the browser. Note that we also need to escape things differently in different string contexts such JavaScript code, HTML text, and HTML attributes. This is because an abusive user-generated string placed in JavaScript may not be abusive if placed in HTML and vice-versa.

All user-generated strings on the Dropbox website are passed through a special escaping function we’ve written that takes context into account. The problem is that having defined our own escaping function we need to ensure that it’s as fast as possible since basically all user input is passing through this function. In this article we’ll focus on one particular escaping context, HTML text.

A quick disclaimer, before you accuse us of reinventing the wheel and ignoring both the xml.sax.saxutils.escape and cgi.escape functions, it doesn’t escape quite enough characters. This is especially true if you consider the possibility for confusing the browser into believing your page is UTF-7, at which point the equal sign has to be escaped. So we have to write our own.

First steps
The original function looked like this:

  1. WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  3. def escape_engine(s, on_whitelist, on_ascii, on_unicode):
  4.     pieces = []
  5.     for char in s:
  6.         if char in WHITELIST:
  7.             pieces.append(on_whitelist(char))
  8.         elif ord(char) < 256:
  9.             pieces.append(on_ascii(char))
  10.         else:
  11.             pieces.append(on_unicode(char))
  12.     return "".join(pieces)
  14. def html1(s):
  15.     """
  16.    Escapes a string for inclusion into tag contents in HTML.
  18.    Do *not* use for including a string into an HTML attribute; use
  19.    escape.attr() for that.
  20.    """
  22.     # Ensure we deal with strings
  23.     s = unicode(s)
  25.     escaped = escape_engine(s,
  26.         on_whitelist = lambda c: c,
  27.         on_ascii = lambda c: "&#%02x;" % ord(c),
  28.         on_unicode = lambda c: "&#%02x;" % ord(c),
  29.     )
  31.     return UNSAFE_bless(escaped) # Now it’s escaped, so should be safe

Applying it to some test templates gives (times in seconds; the digits (obviously) aren’t all significant):

: html1 14.1678471565

Not very fast. (I blame the intern who wrote it…) Of course, the code wasn’t optimized for speed, but for readability. But like I said, given that this is actually a non-negligible part of our render time, it could use some optimization.

The first fact of Python optimization is that function calls are slow. html1 is awful in this regard: it calls a function per character! Worse yet, the common case is calling a function that does nothing at all! So the first step is to inline. This also lets us join the on_ascii and on_unicode cases (originally separated because we also use the above for escaping JavaScript, where we do make a distinction between Unicode and ASCII literals).

  1. def html2(s):
  2.     """
  3.    Escapes a string for inclusion into tag contents in HTML.
  5.    Do *not* use for including a string into an HTML attribute; use
  6.    escape.attr() for that.
  7.    """
  9.     # Ensure we deal with strings
  10.     s = unicode(s)
  12.     pieces = []
  13.     for char in s:
  14.         if char in WHITELIST:
  15.             pieces.append(char)
  16.         else:
  17.             pieces.append("&#%02x;" % ord(char))
  19.     return UNSAFE_bless("".join(pieces)) # Now it’s escaped, so should be safe

This has a pretty good 15% or so improvement:

: html2 12.8864150047

Implicit Loops
Now, the preceding code sample maybe isn’t too pretty, but the improvement is nothing worth sneering at. There’s more we can do though. The second fact of Python optimization is that the loop overhead can also be pretty significant. This leads to our second attempt, which gets rid of the loop in favor of a generator expression. Fortunately, switching to generator expressions makes the resulting function as readable as the original.

  1. def html3(s):
  2.     """
  3.    Escapes a string for inclusion into tag contents in HTML.
  5.    Do *not* use for including a string into an HTML attribute; use
  6.    escape.attr() for that.
  7.    """
  9.     # Ensure we deal with strings
  10.     s = unicode(s)
  12.     escaped = "".join(
  13.         c if c in WHITELIST
  14.         else "&#%02x;" % ord(c)
  15.         for c in s)
  17.     return UNSAFE_bless(escaped) # Now it’s escaped, so should be safe

What are the timings with this new version?

: html3 13.4748219418

Hmm. The readability improvements are nice, but the speed dropped. Maybe we can get both speed and reability with some tweaks?

More OptimizationsWe already picked out a neat 10% improvement and still have a wonderfully readable function. But we can go faster…

Generators?? Really?
The problem with the generator expressions used above is that Python actually constructs a generator. Let’s look at the bytecode:

  2. >>> import dis
  3. >>> dis.dis(optimization_story.html3)
  4.  78          12 LOAD_CONST               1 ()
  5.              15 LOAD_ATTR                1 (join)
  7.  79          18 LOAD_CONST               2 (<code object <genexpr> at 0x1005dddc8, file "./optimization_story.py", line 79>)
  8.              21 MAKE_FUNCTION            0
  10.  81          24 LOAD_FAST                0 (s)
  11.              27 GET_ITER            
  12.              28 CALL_FUNCTION            1
  13.              31 CALL_FUNCTION            1
  14.              34 STORE_FAST               1 (escaped)

Luckily, we can avoid making the generator by simply using a list comprehension instead of a generator expression:

  1. def html4(s):
  2.     #…
  3.     escaped = "".join([
  4.         c if c in WHITELIST
  5.         else "&#%02x;" % ord(c)
  6.         for c in s])
  7.     #…

This brings us back to faster than html2.

: html4 11.2531888485

Looks like the generator expression is actually slower than the list comprehension in this case, a good explanation is probably that our sample set of strings are probably too small to reap the benefits of using a generator. As expected, our disassembly now looks a lot more friendly:

  1.  97          12 LOAD_CONST               1 ()
  2.              15 LOAD_ATTR                1 (join)
  4.  98          18 BUILD_LIST               0
  5.              21 DUP_TOP            
  6.              22 STORE_FAST               1 (_[1])
  8. 100          25 LOAD_FAST                0 (s)
  9.              28 GET_ITER            
  10.         >>   29 FOR_ITER                43 (to 75)
  11.              32 STORE_FAST               2 (c)
  12.              35 LOAD_FAST                1 (_[1])
  13.              38 LOAD_FAST                2 (c)
  14.              41 LOAD_GLOBAL              2 (WHITELIST)
  15.              44 COMPARE_OP               6 (in)
  16.              47 JUMP_IF_FALSE            7 (to 57)
  17.              50 POP_TOP            
  18.              51 LOAD_FAST                2 (c)
  19.              54 JUMP_FORWARD            14 (to 71)
  20.         >>   57 POP_TOP            
  21.              58 LOAD_CONST               2 (‘&#%02x;’)
  22.              61 LOAD_GLOBAL              3 (ord)
  23.              64 LOAD_FAST                2 (c)
  24.              67 CALL_FUNCTION            1
  25.              70 BINARY_MODULO      
  26.         >>   71 LIST_APPEND        
  27.              72 JUMP_ABSOLUTE           29
  28.         >>   75 DELETE_FAST              1 (_[1])
  29.              78 CALL_FUNCTION            1
  30.              81 STORE_FAST               3 (escaped)

Sets?? Really?
But this is still slower than it should be. One low-hanging fruit still stands out: why are we doing a linear search on the whitelist?

  1. WHITELIST2 = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

The timings bear out the guess that sets are faster than strings:

: html5 8.6205868721

The question remains, Can we go faster?

Deeper Python Optimization
If we’re going to optimize this function, we might as well extract all the performance we possibly can. One peculiarity of modern Python is that it has two LOAD instructions: LOAD_GLOBAL and LOAD_FAST. Due to a implementation detail loading a local variable is faster than loading a global variable. Consequently, you want to avoid global variables in tight loops. The above disassembly pointed to two such globals: ord and WHITELIST.

  1. def html6(s):
  2.     # …
  3.     lord = ord; lWHITELIST2 = WHITELIST2
  4.     escaped = "".join([
  5.         c if c in lWHITELIST2
  6.         else "&#%02x;" % lord(c)
  7.         for c in s])
  8.     #…

We don’t expect this to buy us much, but why not?

: html6 7.87281298637

In absolute measurement a second is not much of an improvment but as a percentage it’s still a significant of time.

String Interpolation

String interpolation is another thing Python isn’t very fast at. Let’s see if removing that helps.

  1. def html7(s):
  2.     # …
  3.     lstr = str; lord = ord; lWHITELIST2 = WHITELIST2
  4.     escaped = "".join([
  5.         c if c in lWHITELIST2
  6.         else "&#" + lstr(lord(c)) + ";"
  7.         for c in s])
  8.     # …

: html7 5.58323383331

That’s a huge boost from saving the string interpolation!

Every Problem Can be Solved With a Big Enough Hash Table
Any web programmer knows that performance problems can be universally solved with caching. (Heh heh, if only…) In any case, if the gain from removing string interpolation was so great, maybe we can just cache the results of that characters to escaped form: function, and not do string concatenation either. We’ll set up a cache of characters to HTML escapes:


Then we read and write our cache as necessary:

  1. def html8(s):
  2.     # …
  3.     escaped = "".join([
  4.         c if c in lWHITELIST2
  5.         else CACHE_HTML_ESCAPES.get(c) or CACHE_HTML_ESCAPES.setdefault(c, "&#" + lstr(lord(c)) + ";")
  6.         for c in s])
  7.     # …

Since our web servers are long-running processes, the cache should eventually capture all of the characters people are using; Python’s dicts are then fast enough to give us a speed boost over string concatenation and =str(ord(c))=.

: html8 4.5724029541

Another big boost from avoid what seems like a trivial computation.

Premature Optimization…
If we’re going down the setdefault branch so rarely, and since that’s the only place we’re using str and ord, maybe it’s not worth making those local variables?

  1. def html9(s):
  2.     # …
  4.     escaped = "".join([
  5.         c if c in lWHITELIST2
  6.         else CACHE_HTML_ESCAPES.get(c) or CACHE_HTML_ESCAPES.setdefault(c, "&#" + str(ord(c)) + ";")
  7.         for c in s])

We don’t expect much of a benefit here, but maybe a percent change or so…

: html9 4.565928936

Wait, Why a Python Loop At All?
Wait, why are we using a Python loop at all? Python loops are slow… Instead, maybe we can use the C code in Python’s re module? We can use regexes to find anything that isn’t in our whitelist, and do the replacement. That way, the “do-nothing” operation on whitelisted characters becomes much cheaper.

  1. import re
  2. RE_FIND_NON_WHITELISTED = re.compile("([^" + "".join(WHITELIST2) + "])")

Unfortunately, there’s no way to get the entire matched range from a Python match object. So we’re forced to call the group method– and function calls are slow!

  1. def htmlA(s):
  2.     # …
  3.     escaped = RE_FIND_NON_WHITELISTED.sub(
  4.         lambda m: CACHE_HTML_ESCAPES.get(m.group(1)) \
  5.                or CACHE_HTML_ESCAPES.setdefault(m.group(1), "&#" + str(ord(m.group(1)))) + ";",
  6.         s)
  7.     # …

What are our results?

: htmlA 4.54020690918

Hmm, this isn’t that great, actually. We did save a percentage point, but this is a lot less readable (at least, in my opinion) than html9, and the gains aren’t worth it.

Until, that is, we try it on whitelisted-only text:

: html9 3.84376811981
: htmlA 0.796116113663

Whoosh! html9 got a bit faster, by virtue of avoiding hitting the cache at all, but the regex-based solution *rocked*, since it could do all of the skipping of characters in C, not Python.

In fact, the regex-based solution is slower for punctuation, since it has to do multiple function calls instead of just one function call and dictionary lookup. And for short strings, the overhead to initiating the regex search is worse. Let’s try a test on punctuation-heavy, short text snippets:

: html9 1.59476995468
: htmlA 3.44844794273

Measure Twice, Cut Once
So we have one function that’s faster for English alphanumeric text and one that’s faster for everything else. But we can’t just shaft our non-US users, and we don’t want to settle for a function that’s five times slower for the common case! So we have a few options.

The first is simply to expand our whitelist — most file names have a dot in them, and spaces, dashes, and similar are popular. At signs appear in emails. And so on. Of course, one has to be careful not to permit any XSS vector while doing so; but a conservative expansion by adding -|!,. _ to our whitelist should be safe. Of course, this helps both versions, so it’s not really a fix.

Another fix presents itself, though: can we figure out which version of html to use quickly?

  1. def htmlB(s):
  2.     # …
  4.     non_whitelisted = RE_FIND_NON_WHITELISTED.findall(s)
  5.     if len(non_whitelisted) > .6 * len(s):
  6.         escaped = RE_FIND_NON_WHITELISTED.sub(
  7.             lambda m: CACHE_HTML_ESCAPES.get(m.group(1)) \
  8.                 or CACHE_HTML_ESCAPES.setdefault(m.group(1), "&#" + str(ord(m.group(1)))) + ";",
  9.             s)
  10.     else:
  11.         lWHITELIST2 = WHITELIST2
  12.         escaped = "".join([
  13.                 c if c in lWHITELIST2
  14.                 else CACHE_HTML_ESCAPES.get(c) or CACHE_HTML_ESCAPES.setdefault(c, "&#" + str(ord(c)) + ";")
  15.                 for c in s])
  17.     # …

Why .6, you ask? Well, the exact constant isn’t too important (the function *works* whichever way the test goes, it’s just an optimization), but some testing showed it to be the approximate break-even point of the two methods.

With hope and trepidation, we run the tester…

: htmlB 5.16241598129
: html9 3.97228693962
: htmlA 3.95208191872


As we hoped, the result is fast on whitelisted-only characters…

: htmlB 1.24477005005
: html9 3.41327309608
: htmlA 0.696345090866

And is passable on punctuation:

: htmlB 5.97420597076
: html9 3.61161899567
: htmlA 8.88924694061

But the overhead is pretty saddening…

We can improve things a bit by testing just a few characters for punctuation/unicode.

  1. def htmlC(s):
  2.     # …
  3.     non_whitelisted = RE_FIND_NON_WHITELISTED.findall(s[:10])
  4.     if len(whitelisted) < 6:
  5.     # …

We use =6= instead of =.6 * min(10, len(s))= because if the string is shorter than 10 characters, either alternative is going to be fast.

This leads to a marked improvement. For alphanumeric strings:

: htmlC 0.836707115173
: html9 3.34154415131
: htmlA 0.701889276505

For unicode-heavy strings:

: htmlC 3.70150613785
: html9 2.82831597328
: htmlA 6.77485609055

This is now really looking like an option for production use. But checking is still quite a bit of overhead still– in the case where the user has just about the break-even balance of English and international characters, we’re looking at a 20% premium. Can we do better?

The User is Always Right
Well, so checking which to use has very high overhead. What to do? Well, we’ve got to think about when each case will come up: how might we predict that lots of non-English-alphanumerics are going to come up? What users are likely to store files with international names, or use an international name? (Hopefully, people who use lots of punctuation in both are few…)

Well luckily, Dropbox added translations a bit back, so we already have all of the infrastructure in place to detect what locale a user is from. So a quick optimization is to switch to the html9 version (the list-comprehension-based one) for international users, and use the fast htmlA version (regexes) for US-based users, and also users from countries with mostly-Latin alphabets. The code here is mostly tied to Dropbox internals, so I’m not going to show it, but I’m sure you get the point.

This final optimization removes the check overhead while giving us most of the benefit. Success!

What We Didn’t Do
Now there are some ways to optimize the above even further. The obvious approach is to rewrite the entire thing in C. This lets us squeeze the last bits of performance out of the above, but it would also make things much harder to maintain.

Similarly, using PyPy or a similar JIT would probably help on the inner loop of the pure-Python version, likely making it as fast as the regex-based approach.

The first, most basic conclusion is that the basic facts of Python optimization inline functions, use implicit loops, move computation into C if possible are perfectly valid. Another fact: Python dictionaries are *fast*. The WHITELIST set and the CACHE_HTML_ESCAPES dict both rely on the superb hash table implementation for their performance gains.

Other “common wisdom”, like using locals instead of globals, yields relatively little gain.

Optimizing inner loops in a high-level loops requires lots of trial and error. It was absolutely amazing that moving to string concatenation from string interpolation gave such a huge boost to performance.

Finally, measurement is key. It was measurement that told us that HTML escaping was slow to begin with; and without measuring performance, we would never have guessed that string interpolation was so slow.

Thanks for reading!

Dennis Ritchie

Uncategorized / 4 Comments Posted by Rian on October 14, 2011

Many people spend their lives endlessly searching for the perfect answer to “why?” It stops them early in their tracks and holds them back from reaching their true potential.

Then there are those who are confronted with the question “why?” and laugh directly in its face. They interest themselves solely with the “how?” For these people any “why?” will do as long as there is sufficient “how?” associated with it. Dennis Ritchie was one of those people. A fiercely gifted programmer, he relished in the “how?”

If you were to ask any common person in these modern times “Why Unix?” Well it would be obvious. Unix because of file systems, because of the internet, because of Windows, because of Mac OS X, because of Linux and software freedom. What about “Why C?” Well that would be obvious too. C because of C++, because of Java, because of Perl, because of Python, the list goes on.

But if you were to ask someone “Why Unix?” or “Why C?” in 1972 you’d have a hard time finding an answer, even from Dr. Ritchie himself. The only true answer you’d be able to find would be “For Space Travel” or maybe “For Fun” and little else.

People like Dennis Ritchie made it respectful for those who were more interested in the “how?” rather than the “why?” to follow their own path. A programmer could program for the sake of the craft and the art without feeling like she was wasting her time or potential. Sometimes you just want to make something and that should be okay. Programs became works of art rather than slaves to science or utility.

And because they were works of art it takes an artist to understand the elegance, the universality, and the eternality of his accomplishments. Unix will be here forever and as much as people would like to say otherwise, the file system will be here forever. I’m sorry to say it but those who don’t yet understand why the file system is the ultimate abstraction still have much to learn.

As time has progressed from then to now the works of Dennis Ritchie have taught us that the “why?” doesn’t always have to dictate the “how?” and sometimes the “how?” can dictate the “why?”

Translating Dropbox

Uncategorized / 53 Comments Posted by Dan Wheeler on April 18, 2011

¡Hola, mundo! Welcome to our new engineering blog. We’d like to start with a post on i18n, because aside from being exceedingly fresh in our minds right now, hearing our take on it might be useful to other startups. Going global can be an intimidating prospect, especially if you’re like me and illiterate in all the new languages. I’ll focus today on assembling a translation team because it’s one of the trickier challenges.

Professional and community translation: we use both

Back in March 2008, Facebook’s community famously translated the site into French in just 24 hours. Community translation, done right, often offers the highest quality because users know your product inside-out and naturally use the right tone and formality level. Community translation also makes it possible to go from a handful of languages to over 100.

While intriguing, we’re holding off on community-led translation for now and going with a hybrid approach instead. The main reason is the substantial engineering investment. We’d need to:

  • Build a translation mode that allows volunteers to translate text in-context, as it appears on the screen. This is especially hard for everything not on the website: our Windows, Mac, and Linux desktop apps, mobile apps, emails, etc.
  • Determine, at all times, which text needs translation, and which needs to be re-reviewed. This is tricky because if a translated string is reused, but in a different context, it still needs to be reviewed.
  • Build a voting mechanism to select the best translations, including an ability to flag offensive translations.
  • Prevent large-scale voter conspiracy so that incidents like these don’t happen.
  • Import/Export translations from/to a bunch of different formats (gettext PO, iOS and Android XML, property lists, static HTML, etc.) Groan.

So it requires a big initial effort. Facebook had a staff of about 500 in March 2008, for example, whereas we have 50. By contrast, professional translators are incredibly convenient and barrier-free. We can send them the latest version of 12 files in 12 different formats, they’ll determine the difference from the last version, and send us back the translations, ready to go.

Here’s how our approach works: we hire a firm to translate everything beforehand, then ask our generous userbase to review and send us their corrections, iteratively. Our firm reads through every suggestion per English string, one string at a time, and makes the most popular correction. Before sending a feedback round to translators, we quickly skim it on our own first through a special admin page, to do three things:

  • Fix bugs, such as untranslated text or incorrectly formatted numbers.
  • Evaluate our translators: are people reporting typos, grammar mistakes, and gross mistranslations? Or is all the feedback subjective?
  • Confirm that previously reported corrections get incorporated: if our translators don’t fix a problem, and it continues to get reported, our admin page will throw a tantrum.

Here it is in action. Say I’d like to report a dubious French translation. First I click the green tab to the left of the screen:

Then I type or paste a few letters of the translation I’d like to improve. It pops up a list of matches:

After selecting the translation, the original English text shows up underneath. I type in my improved translation and the reason I like it better. Done.

This feature also exists on experimental builds of our desktop app, released on the forums.

Programming details

Grouping suggestions by English string is key. It offers good usability: people can type just a few letters, autocomplete the translation in question, and view it alongside the original English string. Further, it organizes everything for easy translator review. One issue complicates things: strings with placeholder variables. For example:

Hello, %(first_name)s!

We want users to be able to autocomplete text as it appears on the page: “¡Hola, Dan!”, not “¡Hola, %(first_name)s”, and at the same time, internally group all instances of this placeholder string together. You might think it would be as easy as wrapping the gettext _() function, but consider typical gettext code:

greeting = _('Hello, %(first_name)s') % {'first_name' : user.first_name}

That is, placeholder substitution happens outside the _() translation call. Our solution: on the server, we subclass python’s unicode builtin type, overriding __mod__ to remember its filled-in placeholders. We then wrap the gettext _() function to do two things:

  • Return this subclassed string type.
  • Keep a response-wide list of all such returned strings.

At the very end of the response, we take the list of translated strings and serialize into JSON, ready to go for browser-side javascript autocompleter code. For each string, this list includes the placeholder form “¡Hola, %(fname)s” (for bookkeeping), and filled-in form “¡Hola, Dan” (for autocompletion). The reason we keep autocompletion entirely browser-side is that it makes the experience feel instant. Another nice result is the autocompletion menu is always narrowed down to text that appears on the page — less choices to wade through.

One last complicating detail: AJAX. Some of our AJAX requests, for example in the events feed, contain translated display text. We want users to be able to select text no matter where it came from. To solve, we pack a list of new translations within each AJAX request, similar to the list from the initial request.

More reading

  • Excellent background on Facebook’s system. They use professionals too, more than most people realize.
  • Check out Smartling if you’re interested in community translation, especially if your content is web-only. They’re making community translation easier on multiple levels.

i18n bonus round

For anyone about to do similar work, I thought I’d give a quick summary of some of the other challenges we needed to cover for today’s launch. Let us know in the comments if you’d like to hear more about any of these topics; we’d be happy to expand in future posts.

We encountered some typical i18n problems:

  • Translating text inside Python/Javascript/Objective-C/Java, templates, database tables (where our emails live), and static docs like our terms of service. Keeping translations synced with evolving English. Covering string substitutions and plurals.
  • Translating Dropbox-specific terminology ahead of time. Everyone loves a good translation blunder. One of my favorites: “Jolly Green Giant,” the mascot for Green Giant frozen vegetables, being translated into Arabic as “Intimidating Green Ogre.” Here’s another disaster. We carefully translated Dropbox terminology before everything else — phrases like “selective sync,” “Dropbox guru,” and “Packrat” — because these are the hardest to get right.
  • Supporting both the perverse American date format 4/22/1970 and much more logical German 22.04.1970. Correctly formatting times, numbers, percents, and currency. Thanks babel!
  • Formatting people’s names correctly. For example, Japanese names are family-name-first, with a space in between if the name is ASCII (Takahashi Yukihiro), and no space for East Asian characters (高橋幸宏). Similarly, for our signup form, we put family name first for Japanese.
  • Displaying translated country lists, correctly sorted, in our address form. We use IP geolocation to guess the right country and put it at the top of the list.
  • Translating images with embedded text, overdubbing videos.
  • Fixing one million layout problems resulting from “overfitting” layouts to fixed English text. Watching long translations completely destroy one’s fragile, inelastic layouts.
  • Making sure various locale settings — user preference, cookie, web request, Accept-Language header, desktop app settings — all work together. For example, if a user changes their locale from the desktop app, and then clicks a web link from within the app, we make sure the website matches the new locale. On the other hand, it would be confusing, and difficult to undo, if changing the website’s locale affected the desktop app.

And a few not-so-typical:

  • For the desktop app, overriding a class in Python’s gettext module to pack translations inside python bytecode (instead of outside the executable, for example /usr/share/locale on Linux). This eases cross-platform concerns and keeps our app stand-alone. In the future, as we add more languages, we might have the server beam down text to the client.
  • Building a lightweight gettext-like lib in javascript for browser-side translations.
  • Extending the parser for Cheetah, our templating language, to allow English string extraction.
  • Word wrapping Japanese on the desktop app. The algorithm is slightly different and has a couple of hard to detect special cases, more info here.
  • Collation. It is a hard lesson in life that for most of the world, alphabetical != ASCIIbetical. For example, in German, ä is sorted as if it is two letters, ae, while in Swedish, it comes after z. For the most part, sorting a list of strings is as easy as calling a library. Thanks ICU and PyICU language bindings! Unfortunately, we don’t have ICU in browser-side javascript. Supporting locale-sensitive sorting and resorting on the website’s file browser, with dynamic insertions for new files and directories, and making it fast, was an interesting challenge. Similarly, we had to manually implement Japanese collation, with its multiple scripts, for iPhone and iPad, and build our own Japanese browser widget to match Apple’s — something missing from the iOS SDK.

This is a totally new world for us, so if you think of any improvements we can make or areas we missed, we’d love to hear about it!