X-Team Blog - The Most-Loved Company for Engineers

Xplorers: A deep-dive into mkdirp

Written by Josh Johnston | Feb 17, 2015 5:00:00 AM

Greetings, adventurer! In today’s expedition we’re going to pick a popular Node.js library and dive into its depths to see what we can find.

If all goes to plan, we’ll come away not only understanding the internal workings of a commonly used module, but also having discovered some interesting patterns and techniques along the way.

Meet mkdirp

Get ready to dig into substack’s excellent mkdirp, a handy library for recursively creating directories (just like mkdir -p in Linux). If you’re like me, you’ve probably used this library many more times than you actually realize: it is a dependency of about 1000 other modules in npm! So understanding modules like this will naturally help us understand the many modules built on top of them.

If we look in index.js we’re dealing with just under 100 lines. That feels pretty achievable. Let’s go.

Aliasing the export

The first thing that stands out here is on line 4: module.exports = mkdirP.mkdirp = mkdirP.mkdirP = mkdirP;

At first glance it looks a bit odd, but it’s quite easy to understand if we read it from right to left:

  • The rightmost mkdirP is the main function that we’re exporting. This is defined on line 6.
  • Next comes mkdirP.mkdirP = mkdirP: what we’re doing here is adding a new property to the mkdirP function (remember that in JavaScript functions are objects). This new property is an alias back to the original function. Why would we do this? I can’t speak for the author’s intention here, but it could be to allow different forms of require. For example, the alias means we can get the same function in 2 different ways:
  1. require('path/to/module')
  2. require('path/to/module').mkdirP

Maybe someone would prefer the second form for being more explicit.

  • Finally there is one more alias: mkdirP.mkdirp = mkdirP. This alias gives us the same function but with an all-lowercase name. With both aliases in place, the function itself is what gets assigned to module.exports.
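To see the right-to-left reading in action, here is a minimal sketch. The function body is a made-up stand-in and exported is a hypothetical variable playing the role of module.exports; only the shape of the assignment chain mirrors the line we just read.

```javascript
// Hypothetical stand-in for the real mkdirP function.
function mkdirP(p) {
    return 'would create ' + p;
}

// Assignment is right-associative, so this runs right to left:
//   1. mkdirP.mkdirP = mkdirP
//   2. mkdirP.mkdirp = mkdirP
//   3. exported = mkdirP
// `exported` is an assumption standing in for module.exports.
var exported = mkdirP.mkdirp = mkdirP.mkdirP = mkdirP;

console.log(exported === mkdirP);        // true
console.log(exported.mkdirP === mkdirP); // true
console.log(exported.mkdirp === mkdirP); // true
```

All three names point at one and the same function object, so it makes no difference which one a caller picks.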

So, first hill climbed. Onward to the function itself.

Function arguments and alternate signatures

The mkdirP function starts out with a technique that, while widely used, can be confusing at first glance:

(Line 7)

if (typeof opts === 'function') {
    f = opts;
    opts = {};
}

You might read this literally as “if the options object is actually a function, set function f to be the options object and then erase the options object.” And reading it that way you would be rightly confused.

But once you’ve seen it a few times you’ll be able to see between the lines and recognise it as a technique for optional function arguments. You could picture it as 2 signatures for the same function:

  • function mkdirP (p, opts, f)
  • function mkdirP (p, f)

With that in mind we can ditch the confusing literal reading from before, and learn to read this as “if the second argument is a function, the caller must be using the mkdirP (p, f) signature. Treat the second argument as f and set opts to its default value (an empty object)”.
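If you’d like to try the pattern in isolation, here is a small sketch. The describe function, its mode option and the paths are all invented for illustration; only the typeof check is borrowed from the snippet above.

```javascript
// Hypothetical function with two call signatures:
//   describe(p, opts, f)
//   describe(p, f)
function describe(p, opts, f) {
    if (typeof opts === 'function') {
        // Caller used describe(p, f): move the callback into f
        // and fall back to default (empty) options.
        f = opts;
        opts = {};
    }
    var mode = opts.mode === undefined ? 'default' : opts.mode;
    f(null, p + ' (mode: ' + mode + ')');
}

var results = [];
function collect(er, text) { results.push(text); }

describe('/tmp/a', { mode: '0755' }, collect); // three-argument form
describe('/tmp/b', collect);                   // two-argument form

console.log(results[0]); // /tmp/a (mode: 0755)
console.log(results[1]); // /tmp/b (mode: default)
```

The caller never has to pass a placeholder like null or {} just to reach the callback, which is the whole appeal of the technique.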

Further on you’ll see more code dealing with alternate signatures, like in lines 11 to 13.

Dependency injection

The next interesting thing we’ll talk about is on line 16: var xfs = opts.fs || fs;

This allows us to substitute a custom fs object in place of the core module. As long as we provide an object with equivalent functions for mkdir, mkdirSync, stat and statSync it will go ahead as normal. The most obvious use for this would be for testing (so that a real filesystem is not needed) and we can see mock-fs being used in some of the test-cases. But there’s no reason why you couldn’t also use this for some kind of virtual filesystem.
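Here is a rough sketch of what that substitution can look like. Note that makeDir and fakeFs are hypothetical names made up for this example, and the fake only implements the single method the sketch needs; it is not how mkdirp’s own tests are written.

```javascript
var fs = require('fs');

// Hypothetical helper that accepts an optional replacement for fs,
// using the same `opts.fs || fs` fallback as the line quoted above.
function makeDir(p, opts, cb) {
    var xfs = opts.fs || fs;
    xfs.mkdir(p, cb);
}

// A fake fs that records what it was asked to do instead of
// touching the disk.
var calls = [];
var fakeFs = {
    mkdir: function (p, cb) {
        calls.push(p);
        cb(null);
    }
};

var error;
makeDir('/not/a/real/path', { fs: fakeFs }, function (er) {
    error = er;
});

console.log(calls.length); // 1
console.log(calls[0]);     // /not/a/real/path
console.log(error);        // null
```

Because the fake is passed in per call, nothing global is patched and the real fs module stays untouched for everyone else.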

Bitwise operations

On line 19 we see one of the less-common parts of the JavaScript language: bitwise operations.

mode = 0777 & (~process.umask());

You may well write JavaScript for years and never need to use syntax like this. But it’s good to be aware of it for the sake of understanding what’s going on. To break it down:

  • we start with a number representing “full access” (0777; the 0 at the start tells JavaScript this is an octal number).
  • process.umask() gives us the process’s file mode creation mask: the permission bits that should be switched off on anything the process creates. It normally comes from the environment of whichever user executed the node process.
  • ~ is the bitwise NOT operator, which flips every bit of the value.
  • & is the bitwise AND operator, which keeps only the bits that are set in both numbers.

So putting that all together: we get the process’s permission mask, invert it, and AND it with a full set of permissions. The value we end up with after applying the mask is what will be set on any directories we create.
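A worked example makes the arithmetic easier to follow. This sketch assumes a umask of 022, which is a common default but still an assumption; a real program would call process.umask() instead. It also writes the octal numbers as 0o777 and 0o022, the ES2015 spelling that works in strict mode too.

```javascript
// Assumed example value; a real program would use process.umask().
var umask = 0o022;          // octal 022: no write access for group and others
var full = 0o777;           // rwxrwxrwx

var inverted = ~umask;      // every bit flipped: masked-out bits are now 0
var mode = full & inverted; // keep only the bits set in both numbers

console.log(mode.toString(8)); // 755
console.log(mode.toString(2)); // 111101101
```

Reading the binary output in groups of three gives 111 101 101, that is rwx for the owner and r-x for group and others.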

If you want to learn more on this topic, check out File system permissions and umask in node.js by fellow X-Teamer Kamil Ogórek.

Making the directory

Finally we get to the part of the function that actually makes the directory:

(Line 26)

xfs.mkdir(p, mode, function (er) {
    if (!er) {
        made = made || p;
        return cb(null, made);
    }

This reads as “Create a directory, and if there’s no error, we can return here (via callback)”. The second argument to the callback, made, holds the path of the first directory that was actually created: it keeps any value handed down from an earlier recursive call, and otherwise falls back to p.

Line 31 continues with error handling: switch (er.code) {

Note that in this module we actually expect errors to occur. The ENOENT error means that mkdir failed to create the directory because a parent directory didn’t exist. We handle that by recursively calling mkdirP again, but this time attempting to create the parent directory (path.dirname(p)):

(Line 32)

case 'ENOENT':
    mkdirP(path.dirname(p), opts, function (er, made) {
        if (er) cb(er, made);
        else mkdirP(p, opts, cb, made);
    });
    break;

This recursion will continue stepping up to the parent until it successfully creates a directory, at which time it steps back down to create the child directory we originally asked for.
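To watch the stepping up and back down without touching a real disk, here is a simplified sketch. It is not the library’s code: createFakeFs and mkdirSketch are hypothetical names, the in-memory fake only knows about ENOENT, and it calls back synchronously to keep the example short.

```javascript
var path = require('path');

// Tiny in-memory stand-in for fs (hypothetical, illustration only).
// mkdir fails with ENOENT when the parent directory is missing.
// The root '/' must be listed in `existing`, or the recursion in
// mkdirSketch would never find a directory it can build on.
function createFakeFs(existing) {
    var dirs = {};
    existing.forEach(function (d) { dirs[d] = true; });
    return {
        dirs: dirs,
        mkdir: function (p, cb) {
            if (!dirs[path.posix.dirname(p)]) {
                var er = new Error('no such file or directory: ' + p);
                er.code = 'ENOENT';
                return cb(er);
            }
            dirs[p] = true;
            cb(null);
        }
    };
}

// Simplified sketch of the recursion described above.
function mkdirSketch(p, xfs, cb, made) {
    xfs.mkdir(p, function (er) {
        if (!er) return cb(null, made || p);
        if (er.code !== 'ENOENT') return cb(er, made);
        // Parent is missing: create it first, then retry p.
        mkdirSketch(path.posix.dirname(p), xfs, function (er2, made2) {
            if (er2) cb(er2, made2);
            else mkdirSketch(p, xfs, cb, made2);
        }, made);
    });
}

var fakeFs = createFakeFs(['/']);
var firstMade;
mkdirSketch('/a/b/c', fakeFs, function (er, made) {
    firstMade = made;
});

console.log(firstMade);                           // /a
console.log(Object.keys(fakeFs.dirs).join(' '));  // / /a /a/b /a/b/c
```

The attempts go /a/b/c, /a/b, /a on the way up, then /a/b and /a/b/c again on the way back down, and made reports /a because that was the first directory that actually got created.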

Homework

We only got halfway through the module, but the rest of the code mostly deals with a synchronous version of the same thing (using fs.mkdirSync instead of fs.mkdir). Take a look and see if you can recognise the same patterns that we saw earlier!

Another great way to gain more insight into a 3rd-party module is to read some of the history. Pick a spike on the contributors graph and see what you can learn from commits made in that time. You can also learn more about the design goals and particular challenges the authors faced by reading over past and present issues and pull requests. And of course, if you get stumped on the meaning of a certain part of the code you can always ask (:

Wrapping up

Thus concludes our adventure into the land of mkdirp! I wonder how many times I’ve depended on this code without having understood in depth how it works? My hope is that by now, you also can see that this need no longer be the case. Nearly everything in software development is a matter of making a tradeoff. So the more we understand the libraries we use, the more informed our decisions will be when it comes time to trade.