Hamel Husain | Literate Programming With Jupyter Notebooks and Quarto | RStudio (2022)

Oct 24, 2022
16:51

Transcript

This transcript was generated automatically and may contain errors.

I'm going to be talking about exploratory programming in Quarto with nbdev.

Just a little bit about myself. I've been a data scientist for about 15 years. For about half of that I've been building machine learning infrastructure and tools for data scientists and software engineers. I've worked at places like DataRobot, Airbnb, and GitHub. Somewhere along the way I met somebody named Jeremy Howard. You might know him from his popular course on machine learning. It's called fast.ai. He also has a library by the same name.

Most recently I've been working on a project from fast.ai called nbdev. It's something that Jeremy used as kind of a secret weapon for his personal productivity, and something that I find completely fascinating. It's really something very profound that I want to share with you today.

What is nbdev?

So in this talk I'm going to be introducing nbdev. nbdev and some associated tools kind of compose a dialect of Python, and these tools allow me and other developers to be very productive. This dialect focuses on Python's dynamic properties and on REPL-driven development, so that you can build software with both REPL-driven development and software engineering in mind.

So the reason I'm saying that is I really want you to keep an open mind to new workflows, new ideas, and new conventions, ways of doing things that you may not necessarily be used to seeing in Python. And all of this is supercharged by Quarto, which I'll talk about as well.

So just to set the foundation, I think a lot of people can agree that in scientific computing, but also in software engineering in general, exploratory programming is very critical to our workflows. At least in the Python community, this is kind of a source of tension. This is a screenshot from Joel Grus's I Don't Like Notebooks talk, if you've seen it.

And Joel Grus also spends a lot of time in the IPython shell, and he goes back and forth between VS Code and IPython. The debate over what tools to use, and whether to use notebooks or not, is very indicative of this tension that people feel in their tooling. They want to use exploratory programming, but they encounter a lot of friction when they have to go back and forth. You have to live in almost two worlds: the exploratory world, and then the static world of your text editor, which is really painful.

RStudio, I feel like, knew this from the beginning. I started off as an R programmer, and I feel like some of this knowledge was baked into RStudio. So I don't think anything I'm saying is controversial for this audience.

So other than REPL-driven development and exploratory programming, we also have amazing tools to document code. Things like R Markdown and Jupyter notebooks, where we can weave together code and prose. However, the interesting observation is that, at least for Python, this is not being leveraged to its full potential for documenting code. At least in Python land, a lot of people are, you know, copying and pasting code into Markdown. In this case, what I'm showing here on the screen is scikit-learn documentation, and we have code being copied and pasted into docstrings, and outputs of code in docstrings, which is not ideal. You're copying and pasting code into a kind of static environment, and that's very painful.

So the question is, is there something we can do about this? Can we have the best of both worlds? And the answer is that nbdev is something like that.

So nbdev brings everything together into one interface. Exploratory programming, documentation, even things like testing, continuous integration, and also Python packaging. So I was so excited when I experienced nbdev. I've been using it for almost three years. And, you know, I felt like it was kind of this iPhone moment to me where, like, all of a sudden this technology was all bundled into one thing and it was very productive.

Live coding demo

So just to give you an idea of what nbdev is, let me do just a little bit of live coding, which I try not to do at talks, but we will see how it goes. So what I'm going to do here is, I don't know if you can see this, let me zoom in a little bit. On the right-hand side is a notebook. And what I'm going to do is I'm going to just author a very simple Python module that's a playing card, like a deck of cards playing card. And the idea is, I'm going to show you just a little bit about nbdev, just to whet your appetite.

So at the top of the notebook you have this directive. It's very much like a Quarto directive, as you can see. And this directive, default_exp, tells nbdev what Python module this notebook is going to be compiled to. The corresponding source code is on the left. You see this is card.py, which corresponds to card. And then we have some prose in the notebook, and then we have a snippet of code. This export directive just signifies that this is source code.

So I'm going to go ahead and kind of do this fast version of kind of pretending like I'm authoring code. So I'm going to create this card class, and I'm going to export this code. Just for brevity purposes, I kind of created this code beforehand, so you don't have to sit here and watch me code, because that's not the point of the presentation. But basically just imagine you're writing some code in a notebook. Some of it is source code.

So I create this, I have this code here. Actually this here might be something I want to, and you might want to have some prose. So for example, you can create a card like this. Very simple. And then you might want to show the representation of that card. So this is a three of diamonds. It's kind of small. I don't know whether you can see it or not. And then you can go into other things such as, you know, let's just write some more prose here.
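The exact code from the demo is not captured in the transcript. As a rough sketch of what such a notebook might contain, the cells could look something like the following, shown here as one Python listing with cell boundaries marked by comments. The default_exp and export directives are the ones named in the talk; the body of the Card class, the SUITS and RANKS lists, and the index-based constructor are assumptions made for illustration.

```python
# --- cell 1: names the module this notebook exports to ---
#| default_exp card
# (In a real notebook a directive sits on the first line of its own cell.)

# --- cell 2: source code, exported to the module ---
#| export
SUITS = ["clubs", "diamonds", "hearts", "spades"]  # assumed ordering
RANKS = [None, "A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]

class Card:
    "A playing card, identified by indices into SUITS and RANKS (assumed design)"
    def __init__(self, suit: int, rank: int):
        self.suit, self.rank = suit, rank

    def __repr__(self):
        return f"{RANKS[self.rank]} of {SUITS[self.suit]}"

# --- cell 3: an example cell, not marked for export ---
c = Card(suit=1, rank=3)
print(c)  # prints: 3 of diamonds
```

Because the directives are ordinary comments, the listing also runs as plain Python outside a notebook.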

And then I might want to do something like create another card and maybe, you know, compare it. So let's just say this should be bigger than that, so it's true. And then, you know, I might want this to be a test. So I might want to put a test here and say, hey, this should be true. And then actually maybe here I might want another test, but I might want to hide this test. So I might want to say, okay, I want to make sure that this, you know, this is represented the right way. And I might want to go ahead and hide that. I'm going to put the hide directive here, okay?
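A minimal sketch of the comparison and the two tests described here, again with cell boundaries marked by comments. The transcript does not show the real code, so the ordering rule (suit first, then rank) and the use of plain assert statements are assumptions; hide is the directive mentioned in the talk.

```python
from functools import total_ordering

SUITS = ["clubs", "diamonds", "hearts", "spades"]  # assumed ordering
RANKS = [None, "A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]

@total_ordering  # derives >, >=, <= from __eq__ and __lt__
class Card:
    "Playing card that compares by suit first, then rank (assumed rule)"
    def __init__(self, suit: int, rank: int):
        self.suit, self.rank = suit, rank

    def _key(self):
        return (self.suit, self.rank)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __repr__(self):
        return f"{RANKS[self.rank]} of {SUITS[self.suit]}"

# --- cell: a visible test, which also serves as an example in the docs ---
assert Card(suit=2, rank=3) > Card(suit=1, rank=3)

# --- cell: a test that runs but is left out of the rendered docs ---
#| hide
assert repr(Card(suit=1, rank=3)) == "3 of diamonds"
```

If either assertion were changed to something false, running the cell would raise an AssertionError, which is the kind of failure a notebook test run reports.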

And that's kind of very simple. Now what I'm going to do is run the export, nbdev_export, and what you see is that the source code has been generated in the Python module from the notebook. So you can think of the notebook as syntactic sugar for source code. But then also at the same time, what you have is tests. So I can run nbdev_test here in the console, and you'll see it will go through and test all the notebooks. Now, nothing failed, but let's say something does fail, so I'm going to make that an equal sign. And you'll see a failure message, and it will show you, hey, in this notebook there's a failure, okay?
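To make the "syntactic sugar" idea concrete, here is a toy illustration of the export step. It is not nbdev's implementation: toy_export is a hypothetical function, and the notebook is modeled as a plain list of cell source strings. It only shows the principle that cells marked for export are collected into a module named by the default_exp directive, while examples and tests stay behind in the notebook.

```python
def toy_export(cells):
    "Hypothetical helper: return (module_name, source) from cells marked `#| export`"
    module_name, exported = None, []
    for src in cells:
        lines = src.strip().splitlines()
        if not lines:
            continue
        first = lines[0].strip()
        if first.startswith("#| default_exp"):
            module_name = first.split()[-1]        # e.g. "card"
        elif first == "#| export":
            exported.append("\n".join(lines[1:]))  # drop the directive line
    return module_name, "\n\n".join(exported) + "\n"

# A notebook modeled as a list of cell sources (assumed content).
cells = [
    "#| default_exp card",
    "#| export\nclass Card:\n    def __init__(self, suit, rank):\n"
    "        self.suit, self.rank = suit, rank",
    "Card(1, 3)  # example only, not exported",
    "#| hide\nassert Card(1, 3).rank == 3",
]

name, source = toy_export(cells)
print(f"{name}.py")
print(source)
```

Only the class definition ends up in the generated source; the example cell and the hidden test do not.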

And then the next thing I want to show you is the documentation. So let me make this screen a little bit bigger so you can kind of see it. And so the documentation looks like this. This is generated by Quarto as well. You have this very simple documentation that shows you the example, shows you the test, hides cells that you might not want to see. And so that's the gist of it. That's kind of whetting your appetite, just scratching the surface of what nbdev is for.

What you get with nbdev

So kind of to summarize, let's go back to the slides. So when you create this notebook, that's when the magic starts. That's just the beginning. So at that point where you create this code, the first thing you get is Python modules, which you saw. You can actually do a two-way sync between them. So you can make edits in the plain text version and it syncs backwards for quick edits. And you can navigate with your plain text editor if you like things like ctags and things like that.

You can develop Python modules from notebooks very easily. So you can import things, and it works very similarly, semantically, to how you would author modules in general. The documentation gets rendered with Quarto. The README can even be generated from a notebook, which is really cool. And then tests are also done for you. So you can write unit tests in your notebook, and nbdev comes kind of prepackaged with everything you need to do CI. So when you create an nbdev project on GitHub, it comes with everything you need already prebundled, the CI is already turned on, your tests are already running. And similarly, with the Quarto documentation, the machinery to host this on GitHub Pages is already turned on. So all you do is create your project, you get a live documentation site, and you get CI if that's what you choose.

And then you also get Python packaging. nbdev takes care of a lot of the boilerplate involved with Python packaging, and it can be really significant. So things like PyPI and conda, it takes care of that as well.

So the thing that's really powerful is usually when you're done writing code, what do you do? And sometimes you might create documentation, maybe another effort. You might figure out, okay, how am I going to package this? And that's a different effort. And then similarly, you might think about, okay, I need to set up my CI, I need to do all of these things, and also maybe clean up your tests. But all of this comes for free. And the thing is, it's all in one context. You're not switching context all the time for all these different things. You're writing software this way.

And I want to say that we've written all kinds of different software with nbdev. It's not just scientific computing libraries like fast.ai, but it's also terminal user interfaces, API clients. There's a Python client for the GitHub API called ghapi. It's also written in nbdev. We've written all kinds of other infrastructure. So a wide range of software has been built with this. So it is quite robust.

And as I mentioned, you can think of notebooks kind of as syntactic sugar for source code. So this is a comment from Eric who works at Lyft. He uses this in production at Lyft, and he says, I can write docs, test, and code all in one place. And he also says, I get a lot of skeptical looks when I say the source code is notebooks, but that's just syntactic sugar for raw source code. So I mean, he really, really understands kind of nbdev in that way.

And really, so another thing, if you want to kind of experience nbdev and try it, which I encourage everybody to do, one useful mindset is to realize it's a dialect of Python. This is not just programming in notebooks. There are also a lot of extensions to the Python programming language that complement nbdev. There's a library called fastcore, which is also, not surprisingly, built in nbdev, and which makes it easier to program in a notebook. For example, it makes sure that tab completion works even better, and it allows you to split up code between cells where you might not usually be able to, and all kinds of other things, which I won't get into in this talk because fastcore is a completely different talk by itself, and it's very interesting even on its own.
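The talk does not show how fastcore does this, so the following is only a plain-Python sketch of the general idea of splitting a class across notebook cells. The add_method_to helper is a hypothetical stand-in written for this example, not fastcore's API or implementation, and the is_face method and its rank numbering are likewise assumptions.

```python
def add_method_to(cls):
    "Hypothetical helper: register the decorated function as a method of `cls`"
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator

# --- cell 1: the class starts small ---
class Card:
    def __init__(self, suit, rank):
        self.suit, self.rank = suit, rank

# --- cell 2: prose and examples can sit between the pieces ---
c = Card(1, 3)

# --- cell 3: a later cell adds behaviour to the existing class ---
@add_method_to(Card)
def is_face(self):
    "True for jack, queen, or king (ranks 11 to 13 in this sketch)"
    return self.rank >= 11

print(c.is_face())            # prints: False
print(Card(0, 12).is_face())  # prints: True
```

Because the method is attached to the class itself, instances created in earlier cells pick it up as well, which is what makes this style workable in a notebook.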

What Quarto enables

So you might be wondering, what does Quarto enable? Where does Quarto fit into this? So it's actually really exciting. Before Quarto, we had a kind of fragmented system. You know, we had a separate project for blogs, and a separate project for books, and a separate project for documentation. And then we were also trying to figure out, well, what if you want to document your existing code base that's not written in nbdev? Can you use nbdev? Because nbdev has a lot of nice sugar for documenting existing code bases as well. And so now we have a common framework for all of that. You have, you know, a common set of tools. No matter what kind of format you want to produce, you can do that because of Quarto.

It's also a very stable infrastructure, like, you know, we're very impressed by JJ and his support on this project. He's been amazing in kind of adding features, squashing bugs, you name it, it's been special. And we can also offload a lot of work. There's a lot of commonality between nbdev and Quarto, like a lot of directives that Quarto has. We don't have to do that anymore because Quarto is doing it. The notebook filters are great. You know, JJ added notebook filters, you know, in part for nbdev. And that makes it very customizable. You can do anything, essentially, with notebooks in that situation.

And then also there's a flexibility to use any static site generator. So, people want to, you know, use their own static site generator, like Docusaurus or something like that. And you can do that with Quarto because Quarto is kind of like this Pandoc super processor where you can mutate anything and create anything, which is amazing.

So last thing I want to say is give nbdev a try. This is nbdev's website. Let me, I just want to show you the website. The website is, the homepage for the website is actually made in a notebook. This is a Quarto page. So this is really cool. It's amazing. And this, if you go behind and look at this, this is a QMD file, surprisingly enough. And yeah, this page is generated in a notebook, which I think is super cool.

So yeah, please get in touch with me if this is interesting to you. You know, I'd really like to talk to anyone who's interested in this, but yeah, thank you.