Building a Debugger: Code Analysis

A crash course in writing your own Babel plugin.

Nanda Syahrasyad

Last updated May 15, 2021

A couple of months ago, I shipped Playground, a web-based JavaScript debugger that lets you write JS code and attach breakpoints using the debugger statement. It was the first personal project that I saw through to completion, and it was an incredible opportunity for me to dip my toes into web technologies I'd never used before.

I learned a ton, and in the next little bit, I want to talk about how exactly it works. In the process, we're going to build our own mini-debugger using Babel's plugin APIs. Let's get started!

Architecture

The app consists of a React UI and a pipeline with two parts: the code transpiler and the code runner. When a user types in code in the editor on the left, the code is sent into the pipeline and gets processed by both the transpiler and the runner. After the code is processed, the result (a list of variable values) gets sent back to the React app to display.

Within the pipeline, the transpiler transforms the code into code that is "debuggable". Then, the runner evaluates that code and passes the result back to the React app.

But how do you transform code? What does "debuggable" even mean? Let's find out by rebuilding it ourselves.

Taking a Peek

The sole responsibility of the transpiler is to modify the code so that the rest of the app can know what's going on inside the function as it's running. For debuggers, knowing "what's going on" inside a function typically means knowing the values of local variables at any given time. Now the question becomes: how do we get those values out of a function?

If you're doing it manually, you can update the implementation of your function to save the data to some external variable:

const variables = []
function sum(arr) {
  let sum = 0
  for (const num of arr) {
    variables.push({ num, arr, sum })
    sum += num
  }
  return sum
}

Then, when you call the function, you can read that variable to know what's going on inside:

sum([1, 2, 3])
console.log(variables) // [{ num: 1, arr: [1, 2, 3], sum: 0 }, ...]

This works totally fine for small, one-off functions, but it doesn't scale too well. When you have large functions and lots of internal variables, you have to scan through each one and add them yourself. In terms of Playground, it also doesn't make sense to make the user do this themselves: there's too much friction!

This brings us to our problem:

Problem 1: Where to put it?

Our first problem is a simple one – if we're generating these save calls automatically, where in the function should we put them?

I messed around with ways to automatically inject these calls, but ultimately I felt it made more sense for the users to specify where they want to debug. Of course, this indicator should be easy to write, otherwise we're back to the original problem. In the end, I landed on using the debugger keyword as the indicator.

Put simply, the transpiler turns every debugger statement into a function call that saves the variables in scope at that point.

Problem 2: Automatic Transformation

So how exactly would we write something to transform the code automatically?

Since code is ultimately a very long string, we'll use strings as a starting point. Then, we can use a plain string replacement (or a regex) to swap every instance of debugger for our save call:

code.replaceAll('debugger', 'variables.push(/* stuff */)')

This is a promising start, but this approach really breaks down when you start needing information from the code. What are the variables in scope right now? What are the names of those variables? Are those variables declared? Can we safely use them?

Can you think of a way to answer these questions using solely regex operations? (I can't). Clearly, we need a better way to represent code — ideally, one that lets us easily answer questions like these.
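To see how quickly the string approach goes wrong, here's a small sketch. The naiveTransform helper and the sample source are made up for illustration; the point is only that a blind replacement can't tell a debugger statement apart from the same letters inside an identifier or a string.

```javascript
// Naive approach: treat the program as text and swap the keyword blindly.
function naiveTransform(code) {
  return code.replaceAll('debugger', '_variables.push({})')
}

const source = [
  'const debuggerEnabled = true',
  'console.log("attach a debugger here")',
  'debugger',
].join('\n')

console.log(naiveTransform(source))
// const _variables.push({})Enabled = true
// console.log("attach a _variables.push({}) here")
// _variables.push({})
```

Only the last line is what we wanted. The first one isn't even valid JavaScript anymore.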

Abstract Syntax Trees

By far the most common way of representing code uses something called an abstract syntax tree, or AST for short. In an AST, source code is broken apart and grouped into "nodes" based on the structure that they represent.

...That probably sounds a bit cryptic, so let's look at an example. Here, I have the sum function from before:

const variables = []
function sum(arr) {
  let sum = 0
  for (const num of arr) {
    variables.push({ num, arr, sum })
    sum += num
  }
  return sum
}
sum([1, 2, 3])

ASTs always start with the entire source code as the root node of the tree.

Next, we break down this source code into smaller nodes based on the code structure it represents. Here, our code consists of three parts: declaring the variable at the top, defining a function in the middle, and calling the function at the end. Each of these parts becomes a child node of the root.

This process gets repeated until you can't break down the code any further.

      By representing the code in this way, we have access to a lot more information about the code and its semantic meaning — making it much easier to manipulate it in the way you want (like to write a debugger, perhaps?).
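As a rough picture of that tree, here's a hand-written and heavily abridged version of it as a plain object. The node type names follow the ones Babel uses, but this is a sketch: the function body and the array elements are left out, and a real AST carries many more properties.

```javascript
// Hand-written, abridged AST for the snippet above (not parser output).
const ast = {
  type: 'Program',
  body: [
    {
      // const variables = []
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'variables' },
          init: { type: 'ArrayExpression', elements: [] },
        },
      ],
    },
    {
      // function sum(arr) { ... }  (body statements omitted here)
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'sum' },
      params: [{ type: 'Identifier', name: 'arr' }],
      body: { type: 'BlockStatement', body: [] },
    },
    {
      // sum([1, 2, 3])  (arguments omitted here)
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'sum' },
        arguments: [],
      },
    },
  ],
}

console.log(ast.body.map((node) => node.type))
// → [ 'VariableDeclaration', 'FunctionDeclaration', 'ExpressionStatement' ]
```

Questions like "what is the name of this function?" now become property lookups (ast.body[1].id.name) instead of string searches.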

      But our original problem still holds — don't we have to manipulate the code string to generate this tree in the first place?

      The Babel APIs

      To solve this problem, we're going to take advantage of the Babel library. In particular, we're going to let Babel do the code parsing for us so that we can focus on manipulating the AST.

      Fundamentally, Babel is a JavaScript compiler — it's a program that takes in JS code and outputs JS code, potentially modifying it in the process. It does this by parsing the code into an AST, manipulating that AST through plugins, and finally converting the AST back into code.

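[Diagram: Babel's pipeline. The input var a = 10 is parsed into an AST (Parse), the AST is traversed while each plugin transforms it (Traverse), and code is generated from the result (Generate), producing let a = 10.]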

The AST that we saw in the previous section was an AST generated by Babel but with a bunch of properties cut out. In truth, Babel's ASTs are a lot more expressive. For example, the 'raw' AST for a function call also records where each node starts and ends in the source.
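Here's roughly what those extra properties look like for the call sum([1, 2, 3]) parsed on its own. This object is written by hand and abridged (the loc objects are only shown on the outer node), so treat the exact set of fields as an approximation that can differ between Babel versions.

```javascript
const source = 'sum([1, 2, 3])'

// Hand-written approximation of the node for the call above.
const callExpression = {
  type: 'CallExpression',
  start: 0, // character offsets into the source string
  end: 14,
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 14 } },
  callee: { type: 'Identifier', start: 0, end: 3, name: 'sum' },
  arguments: [
    {
      type: 'ArrayExpression',
      start: 4,
      end: 13,
      elements: [
        { type: 'NumericLiteral', start: 5, end: 6, value: 1 },
        { type: 'NumericLiteral', start: 8, end: 9, value: 2 },
        { type: 'NumericLiteral', start: 11, end: 12, value: 3 },
      ],
    },
  ],
}

// The offsets let us recover the exact text each node came from.
const textOf = (node) => source.slice(node.start, node.end)

console.log(textOf(callExpression.callee)) // → sum
console.log(textOf(callExpression.arguments[0])) // → [1, 2, 3]
```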

          Plugins

          Out of the box, Babel doesn't apply any transformations to your code — you would need to provide it a series of plugins for it to do anything. A Babel plugin is a function that modifies an AST using something called a visitor. It looks something like this:

export default function (babelInstance) {
  return {
    visitor: {
      Identifier(node) {
        // do stuff with the Identifier node
      },
      VariableDeclaration(node) {
        // do stuff with the VariableDeclaration node
      },
      /* do more stuff with more nodes */
    },
  }
}

          Most, if not all, of the logic of modifying the AST lies in this visitor object — we're going to focus the rest of this post on talking about what it is and how it works.

          Be My Guest

          To transform an AST, you have to traverse through the tree, modifying each node one at a time as you visit them. A visitor is used to describe how different node types are modified as you visit each one. Here's how it works.

          A visitor is an ordinary object with node types as keys and handler functions as values. The idea is simple — if the current node type matches one of the types we defined in the object, then call the handler function to modify the node.

          For example, here's a visitor that has handlers for the Identifier node type and the VariableDeclaration node type:

const visitor = {
  Identifier: function (node) {
    // do stuff with the Identifier node
  },
  VariableDeclaration: function (node) {
    // do stuff with the VariableDeclaration node
  },
  /* do more stuff with more nodes */
}

          Whenever the traversal algorithm reaches a node, the algorithm checks in this visitor object for what to do. If there's a match, great — we call the handler function with the current node. Otherwise, we'll move on to the next node.
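To make that lookup concrete, here's a toy traversal written from scratch. It isn't Babel's implementation (the traverse function and the tiny AST are assumptions for this sketch), but it follows the same idea: walk the tree depth-first and, at every node, check the visitor for a handler that matches the node's type.

```javascript
// Toy depth-first traversal that consults the visitor at every node.
function traverse(node, visitor) {
  if (node === null || typeof node !== 'object') return
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitor))
    return
  }
  if (typeof node.type === 'string') {
    const handler = visitor[node.type]
    if (handler) handler(node) // a match: call the handler with this node
  }
  // Match or not, keep walking into the node's children.
  for (const key of Object.keys(node)) traverse(node[key], visitor)
}

// Hand-written AST for: let sum = 0
const ast = {
  type: 'VariableDeclaration',
  kind: 'let',
  declarations: [
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'sum' },
      init: { type: 'NumericLiteral', value: 0 },
    },
  ],
}

const seen = []
traverse(ast, {
  Identifier(node) {
    seen.push(`Identifier ${node.name}`)
  },
  VariableDeclaration(node) {
    seen.push(`VariableDeclaration ${node.kind}`)
  },
})

console.log(seen) // → [ 'VariableDeclaration let', 'Identifier sum' ]
```

The VariableDeclarator and NumericLiteral nodes are visited too, but since the visitor has no handler for them, the traversal just moves on.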

This method of separating the traversal algorithm from the transformation process isn't unique to Babel. In fact, it's a classic design pattern called the visitor pattern. By separating these concerns, plugin authors can focus solely on how they want to modify the tree; they never have to worry about how the tree traversal algorithm works.

            Paths

            Admittedly, I lied a bit in the last section. A visitor in a Babel plugin doesn't take in the current node, but rather the current path. A path is a wrapper around an AST node with methods that let you figure out things like:

            • The parent of the current node,
            • The siblings (if any) of the current node,
            • Which variables are in scope at the current node

Paths also provide various methods to replace, remove, and insert nodes. For more on paths (and Babel's APIs in general), I highly recommend Jamie Kyle's Babel Plugin Handbook.
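To get a feel for why a wrapper is useful, here's a hand-rolled stand-in for a path. The property names (node, parentPath, container, key) and replaceWith are borrowed from Babel's path API, but makePath itself is a made-up helper for this sketch, and it leaves out scope tracking entirely.

```javascript
// Toy path: the node plus enough context to navigate and edit the tree.
function makePath(node, parentPath, container, key) {
  return {
    node,
    parentPath, // path of the parent node (null at the root)
    container, // the array or object that holds this node
    key, // where this node lives inside the container
    // Swap this node for another one, in place.
    replaceWith(newNode) {
      this.container[this.key] = newNode
      this.node = newNode
    },
  }
}

// Hand-written AST for: { debugger; return }
const block = {
  type: 'BlockStatement',
  body: [{ type: 'DebuggerStatement' }, { type: 'ReturnStatement', argument: null }],
}

const blockPath = makePath(block, null, null, null)
const debuggerPath = makePath(block.body[0], blockPath, block.body, 0)

debuggerPath.replaceWith({ type: 'EmptyStatement' })

console.log(debuggerPath.parentPath.node.type) // → BlockStatement
console.log(block.body.map((node) => node.type)) // → [ 'EmptyStatement', 'ReturnStatement' ]
```

A bare node can't do either of these things by itself: it has no reference to its parent and no idea where it sits inside it.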

            Building the Visitor

Now that we know a bit about visitors and ASTs, let's try to piece together our own Babel plugin to debug our code. To recap, we want to replace every debugger statement with code that records the values of all variables in scope.

            Try it for Yourself!

            Before I go over the solution, see if you can build it yourself! Again, Jamie Kyle's Babel Plugin Handbook is an invaluable resource to help you with the APIs that are available to you. In particular, I want to highlight the path.scope.getAllBindings() method which returns an object of all variables currently in scope.

              Solution

Let's work through this together. The first thing we have to do is figure out which node type(s) we should target in the visitor. Thankfully, there's a dedicated node type for the debugger statement we're interested in: DebuggerStatement.

So we can target that in our visitor:

export default ({ types: t }) => {
  return {
    visitor: {
      DebuggerStatement(path) {
        /* todo */
      },
    },
  }
}

                Our next step is to get all the variables in scope. As I mentioned in the hint, Babel makes this really easy using the getAllBindings method. This method returns an object where the keys are the names of the variables in scope, and the values contain metadata about the variables.

                We only need the names of the variables so let's extract that using Object.keys:

DebuggerStatement(path) {
  const variables = Object.keys(path.scope.getAllBindings())
}

And with that, we're almost done! All that's left is to create the _variables.push() call. There are two ways we can do this: using builders, or using the replaceWithSourceString method.

                A builder is a helper function used to create AST nodes. When we use builders to create the function call, we essentially create a "mini AST" that we use to replace the debugger statement.

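The mini AST is a call expression: its callee is the member expression _variables.push, and its single argument is an object expression with one property per variable.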

                  Our plugin has access to the complete suite of builder functions through the types property passed in as an argument:

export default ({ types: t }) => {
  /* plugin code */
}

                  Using builders is the recommended way to create new code because it's fast and resilient — after all, dealing with objects is a lot easier than dealing with strings, as we've talked about earlier.

                  The main con is that it could get pretty verbose. The little AST we're building has eight different nodes at a minimum, so we would have to call the builder functions at least eight different times to build our AST.

                  Because of this, and because performance doesn't matter too much here, we're going to use the replaceWithSourceString method instead. This method lets you pass in a code snippet as a string rather than the AST nodes themselves. This means all we have to do is:

DebuggerStatement(path) {
  const variables = Object.keys(path.scope.getAllBindings())
  path.replaceWithSourceString(`_variables.push({ ${variables.join()} })`)
}

                  Here, we use the join method to convert the list of variables to a comma-separated string.

                  Putting everything together, we get the final result:
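Assembled from the pieces above, the whole plugin could look like the sketch below. To keep it runnable without installing Babel, it's written as a plain function (in a real plugin file it would be the default export) and exercised with a hand-made fakePath object, which is purely an assumption for this demonstration.

```javascript
// The plugin: replace each debugger statement with a _variables.push() call.
function debuggerPlugin() {
  return {
    visitor: {
      DebuggerStatement(path) {
        const variables = Object.keys(path.scope.getAllBindings())
        path.replaceWithSourceString(`_variables.push({ ${variables.join()} })`)
      },
    },
  }
}

// Stand-in for the path Babel would pass in; it only mimics the two
// members the visitor touches and records the replacement string.
const fakePath = {
  scope: { getAllBindings: () => ({ arr: {}, sum: {}, num: {} }) },
  replaceWithSourceString(code) {
    this.replacement = code
  },
}

debuggerPlugin().visitor.DebuggerStatement(fakePath)
console.log(fakePath.replacement) // → _variables.push({ arr,sum,num })
```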

                  If you're curious, here's what it looks like if we solely used builder functions instead:

                  This is also the version that's in the Playground source code since, at the time I wrote the code, I wasn't aware of the replaceWithSourceString method!
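For comparison, a builder-only visitor might be sketched as follows. This is an illustration written for this post's example rather than a copy of the Playground source. The builders used here (identifier, memberExpression, callExpression, objectExpression, objectProperty, expressionStatement) come from the types object Babel hands to the plugin; the function name is made up, and in a real plugin file the function would be the default export.

```javascript
function debuggerPluginWithBuilders({ types: t }) {
  return {
    visitor: {
      DebuggerStatement(path) {
        const names = Object.keys(path.scope.getAllBindings())

        // One shorthand property per variable: { arr, sum, ... }
        const properties = names.map((name) =>
          t.objectProperty(t.identifier(name), t.identifier(name), false, true)
        )

        // _variables.push({ ... })
        const call = t.callExpression(
          t.memberExpression(t.identifier('_variables'), t.identifier('push')),
          [t.objectExpression(properties)]
        )

        // A debugger statement is a statement, so wrap the call in one too.
        path.replaceWith(t.expressionStatement(call))
      },
    },
  }
}
```

It's noticeably longer than the one-line template string, but every piece is a real node, so nothing has to be re-parsed.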

                  Edge Cases

A big part of building your own Babel plugin is handling edge cases. After all, there are so many ways of writing code that you're bound to miss something on your first try!

                  In the visitor we just wrote, we missed one pretty big edge case. Can you spot it? Here's a hint: what happens if you add a debugger statement before you declare a variable?

When a debugger statement sits above a declaration such as let b, our output code refers to b before it's been declared! If we try to run this code, we're going to get a runtime error (a ReferenceError, for let and const declarations). Unfortunately, we can't ask Babel to help us here since it doesn't have any APIs to detect whether a variable was declared or not.
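The failure is easy to reproduce without Babel. In the sketch below (the function and variable names are made up for illustration), the _variables.push line is what the plugin would emit for a debugger statement placed above the declaration of b.

```javascript
const _variables = []

function example() {
  const a = 1
  let errorName = null
  try {
    // Emitted in place of `debugger`: b is in scope, but not initialized yet.
    _variables.push({ a, b })
  } catch (error) {
    errorName = error.name
  }
  let b = 2
  return { errorName, total: a + b }
}

console.log(example()) // → { errorName: 'ReferenceError', total: 3 }
```

The try/catch is only there so the sketch can report the error; the real output has no such guard, so the whole program would stop at that line.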

Solving this specific edge case was a bit tricky and is ultimately beyond the scope of this post. However, Playground is open source if you'd like to take a deeper look at how it's done.

                  Applications

                  Learning about how ASTs work and making my own Babel plugin was incredibly insightful because, ultimately, it's what powers a lot of the tools that we web developers use.

Take two of the most popular linters and formatters, for example: ESLint and Prettier. What's cool about these tools is that they're implemented using the same principles of visitors and ASTs that we've worked on throughout this post! For example, ESLint lets you write your own plugin through its rules API, just like Babel. If you take a look at the source code for a rule, you'll find that rules are essentially visitors themselves, just like the one we just wrote!
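To show the resemblance, here's a minimal sketch of an ESLint-style rule that flags debugger statements. The rule object follows ESLint's documented shape (a meta object plus a create function that returns a visitor), but the rule itself and the fakeContext used to exercise it are written just for this example; ESLint already ships its own no-debugger rule.

```javascript
const noDebuggerRule = {
  meta: {
    type: 'problem',
    docs: { description: 'disallow debugger statements' },
    schema: [],
  },
  // create() returns a visitor: node types as keys, handlers as values.
  create(context) {
    return {
      DebuggerStatement(node) {
        context.report({ node, message: 'Unexpected debugger statement.' })
      },
    }
  },
}

// Stand-in for the context ESLint would provide; it just collects reports.
const reports = []
const fakeContext = { report: (descriptor) => reports.push(descriptor) }

noDebuggerRule.create(fakeContext).DebuggerStatement({ type: 'DebuggerStatement' })
console.log(reports.length) // → 1
```

One difference from Babel: ESLint handlers receive the node itself rather than a path, and they report problems instead of rewriting the tree in place.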

I think the most important use case of ASTs is one that we, as programmers, probably use on a daily basis without thinking: compilers and interpreters. These tools take human-readable code like JavaScript and turn it (through ASTs!) into lower-level instructions, such as bytecode or machine code, that our computers can execute.

                  Conclusion

                  And that's that for the transpiler! I had a lot of fun building the debugger and talking about it here. In the future, I hope to talk more about the second part of the debugger project, the code runner.

                  For now, thanks for reading!
