Resources

Marco Gorelli: Narwhals, ecosystem glue, and the value of boring work

video
Dec 16, 2025
55:00

Transcript

This transcript was generated automatically and may contain errors.

Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning, digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field.

On this episode, we sit down with Marco Gorelli, Celtic folk shredder, Narwhals mastermind, and software engineer at Quansight Labs. Welcome to The Test Set, where we explore the people behind the data. I'm Michael Chow, a principal software engineer at Posit, and I'm joined by my co-host, Wes McKinney, creator of Pandas, co-creator of Apache Arrow, and principal architect at Posit. And I think we're super pumped to have Marco with us, who is a software engineer at Quansight, creator of Narwhals, and also a maintainer or former maintainer of Pandas, so doing a lot of DataFrame stuff. Marco, thanks so much for coming on to The Test Set. I'm a giant fan of your work and just the rowdy party that is the Narwhals Discord. So, so happy to have you on.

Yeah, thank you so much for having me. It's a pleasure to be here. In fact, Michael, I think you might have been the first person, you and Rich, I think you might have been the first people who I ever demoed Narwhals to.

Oh, wow. Yeah, I remember that, it's almost like a piece of open source lore, I remember having an early chat about it. But that's actually crazy, I didn't know that it was that early in the Narwhals process.

We met up to talk about something else, which was about Great Tables. And then I think I just ended the call by saying, hey, check out this thing that I just started over the weekend. And sometimes your little weekend projects end up running away from you a bit and spiraling out of control. And that's what's happened here.

What is Narwhals?

Yeah, I love that. I mean, it's, it's been really cool to see and maybe for some background context, I guess you're also a contributor to Polars. So you've kind of spanned a lot of these data frame libraries, Polars, Pandas, Narwhals.

I mean, I'm embarrassed to say that I've actually never used Narwhals myself. I know what it is, but there are probably plenty of people listening who've never heard of Narwhals. They're like, Narwhals? What is that? So maybe you can explain the project and how it came about, and we can talk more about it.

Yeah, yeah, sure. I mean, I think it's very likely that you have used it, but accidentally, in the sense that it's, it's intended as a compatibility layer between data frame libraries. And it's not something that end users tend to use directly. Rather, it's something that people tend to use as a transitive dependency. So they tend to use it because some library that they're using is actually using Narwhals under the hood in order to be able to handle multiple kinds of data frame inputs.

As an example, if you've been using Plotly since version six, then Narwhals is a required dependency, and it's used to do all of the data frame operations. This allows Plotly to accept, say, a Polars data frame and keep all the computation native to Polars until it has to serialize, without having to convert to Pandas or depend on Pandas. And conversely, Pandas users can keep passing Pandas data frames to Plotly without needing to take on Polars as a dependency. You can meet users where they are, and the Pandas, Polars, and PyArrow user bases can all just enjoy using their native tools with Plotly and other data science tools, without even necessarily having to know that Narwhals exists.

I was just looking and I saw that it has 43 million monthly downloads, so it's definitely gotten some uptake. But to your point, I guess it's being picked up as a transitive dependency in a number of projects. I'm familiar with the problem because in the early days of Pandas, people would want to accept Pandas data frames in different libraries. I know that Scikit-learn was one of those projects where people were like, I really just want to pass a Pandas data frame into this project. But there was this challenge of, well, we can't require Pandas as a hard dependency of Scikit-learn. But then what if people want to pass other types of tabular data structures? And so, yeah, it seems like it's solving that problem and the evidence is in the uptake of the project, which is great.

Cheers, thank you. It's interesting that you mentioned Scikit-learn, actually. I just got pinged today on an issue in the repository where they're talking about using Narwhals because they do currently accept both Pandas and Polars, but they've got their own hand-rolled compatibility code, which I think they're getting a little bit tired of maintaining. There's been some issues reported, and it's the kind of situation where if they can outsource the work to a project where the only thing that we're concerned about is handling compatibility between different data frame libraries, then that can leave them to focus on their own competitive advantage.

If I'm understanding, you're saying a lot of libraries, they need to do a little bit of data frame-like stuff. They want to take a tabular structure, they might want to choose some columns or filter a little bit of data. So before Narwhals, or a lot of these kind of compatibility tools, they were sort of stuck with being like, oh, well, I want to do a little bit of data frame stuff, so I sort of need to choose a data frame to make a dependency and kind of build on top of that. Yes, that's exactly right. Whereas Narwhals allows you to write your data frame operations against the abstract idea of a data frame, and then whatever the user passes in will use the user's library.

Data types and the pain of interoperability

Now, my expectation when starting it was that the pain points we'd be addressing would be things like, I don't know, does group by maintain order? Or in the unique function, if you've got null values, do they count as one unique value, do you get one unique value per null, or do you not count null values at all?

And in the early days, I realized that actually the biggest pain points had to do with data types. In Pandas, if you want an int64, there are currently three different ways of expressing an int64 column. If you want a string column, there are even more. And unless you're actively, very closely following developments in the Pandas GitHub repository, it's very difficult to stay on top of it all.
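A rough sketch of the int64 situation in recent Pandas versions (the PyArrow-backed variant is left commented out, since it additionally requires pyarrow to be installed):

```python
import pandas as pd

# Three ways of expressing "an int64 column" in recent Pandas versions:
numpy_backed = pd.Series([1, 2], dtype="int64")  # classic NumPy-backed dtype
nullable = pd.Series([1, None], dtype="Int64")   # nullable extension dtype
# arrow_backed = pd.Series([1, 2], dtype="int64[pyarrow]")  # PyArrow-backed

print(numpy_backed.dtype, nullable.dtype)  # int64 Int64
```

Three spellings, three different null-handling and casting behaviors, yet all "an int64 column" from the end user's point of view.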

I think we might have had a discussion in Great Tables about this as well, where one of Pandas' string data types was displayed in one way, but if you used the PyArrow-backed string type, it was displayed in a different way. But arguably, from an end-user perspective, it's just a string column.

Yeah. Yeah, totally. And I should say, Great Tables is a tool I maintain that helps display tables. We don't use Narwhals yet, because I think we released before Narwhals did. But, just to maybe hype you up, we're ready to switch to Narwhals for a lot of the reasons you said, to take some of the burden off of wrangling different types of data frame inputs, so people can just bring whatever data frame they want and get really consistent output.

Exactly. I mean, in the Narwhals documentation, there's a "why" section where I try to answer: why would we need something like this anyway? And I give some examples of simple data frame operations which superficially look like they should be the same between libraries, but actually behave wildly differently. Now, I don't know if we'll get later to the topic of using AI tools for development, and I don't, for a second, want to discount the value that AI tools can bring in certain contexts. But for the development of Narwhals, they've been incredibly frustrating, because they always spit out things that look plausible but are actually missing some very key details. For example, if you ask an AI, what's the SQL equivalent of the Polars n_unique function? It'll tell you count distinct, except it's not actually equivalent, because Polars counts null values in n_unique, but count distinct doesn't. So you'll need to do count distinct and then compensate for whether null values were present or not.
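To illustrate that n_unique example with a toy table (using SQLite here just as a stand-in SQL engine, the same null semantics apply in standard SQL):

```python
import sqlite3

# Column values: 1, 2, 2, NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,), (None,)])

# COUNT(DISTINCT ...) ignores NULLs entirely:
(count_distinct,) = con.execute("SELECT COUNT(DISTINCT x) FROM t").fetchone()
print(count_distinct)  # 2

# Polars' n_unique counts null as one unique value, so to match its
# semantics we compensate for whether any nulls were present:
(has_null,) = con.execute("SELECT COUNT(*) > 0 FROM t WHERE x IS NULL").fetchone()
n_unique = count_distinct + (1 if has_null else 0)
print(n_unique)  # 3 -- what Polars would report
```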

I mean, the irony is that LLMs are notoriously bad at counting.

LLMs, training data, and keeping up with fast-moving APIs

Yeah, well, there was a period of time, I think it's gotten better now, but there was a moment where there was a gap between present reality and the training cutoff date, especially for the earlier versions of ChatGPT or Claude. I think it's gotten a lot better, firstly because the AI labs have done a better job of keeping their models closer to the present day, slurping up all of the data on GitHub and across the internet so the training data stays current whenever they release a model. Plus, libraries like Polars have stabilized a little more; they aren't changing and refining nearly as quickly. So it's a combination of the models getting better, the training data getting better, and Polars stabilizing. Now I'm using coding agents with Polars, like I've been using Polars to build a personal finance side project. And yeah, it stubs its toes and does things wrong maybe 10 or 20% of the time, but it's a lot more effective. I feel a lot more comfortable using Polars with Claude than I did a year ago.

Funnily enough, the first time I met Ritchie Vink, the creator of Polars, was at EuroSciPy, I think in 2023. And I remember at the time we had a little discussion and we were saying, ah, should group by actually be group underscore by? Because that's generally the philosophy that Polars follows in its API, but group by was just a single word, I think just for compatibility, because he'd taken it from what Pandas was doing. So he said, yeah, sure. Let's change it. Seems like a minor thing. What's the worst that can happen? And then GPT got super popular, and its cutoff date was just before that deprecation was introduced. It was phenomenally bad timing. I'm relieved that this didn't kill the project. But I think it was a good idea to change it anyway.

Expression systems and the pandas API

We did the same thing with group by in Ibis, which maybe has a slightly similar type of vibe to Narwhals, but different goals. But partially when designing the API of Ibis, there was a goal of deliberately placing distance between the perceived sins of the Pandas API, making things a little more normalized and consistent. But it's interesting because it's nice that now we have this whole idea of having an expression system and being able to express complex aggregations or complex data frame operations with a lazy expression system. Going back in time, we really didn't have that. But now it's nice that we have the type system of Arrow and modern databases with nested types to provide some structure for like, okay, these are all the things that expression systems need to support.

And then Polars designed its expression system to be able to do all of the fancy things you can do inside Polars aggregations, and Narwhals has adopted that. It seems like there was an effort to maybe push some of that upstream into Pandas. I heard rumblings of pd.col. I always wanted to have an expression system within Pandas, but it was one of those things that even today, my eyes bleed when I look at people trying to do complex aggregations in Pandas, and they're passing lambda functions. I'm like, no, that destroys the performance. The nature of open source, we're never quite where we want to be, but we're always working and trying to make progress.

Yeah, and I think pd.col is out. I think they released pd.col, is that right? Marco, were you responsible for that? Was that your doing?

I opened the pull request, yeah. Maybe just to clarify, it's not as nice as I would like it to be. It can only be used in places where Pandas already accepts callables. For example, loc, getitem, and assign, those are the big ones. Using it in group by would require some other upstream changes in Pandas, which I haven't yet had the capacity to make. But it's interesting that you bring up how central the concept of expressions became, and I totally agree. I think it's just so important to be able to express the abstract idea of an operation that you want to do without any of the objects in the expression having to be tied to any particular data frame.
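A toy sketch (hypothetical, not pandas' actual implementation) of why a column expression slots naturally into APIs that already accept callables: `col("x")` evaluates lazily, it's just a function of whatever frame it's eventually applied to.

```python
class Col:
    """A minimal lazy column expression: holds a name, evaluates on demand."""

    def __init__(self, name):
        self.name = name

    def __call__(self, df):
        # Called later by any API that accepts callables (loc, assign, ...).
        return df[self.name]


def col(name):
    return Col(name)


# A plain dict-of-lists "frame" stands in for a real DataFrame here.
frame = {"x": [1, 2, 3]}
expr = col("x")        # no frame involved yet -- the expression is abstract
print(expr(frame))     # [1, 2, 3] -- evaluated only when a frame is supplied
```

This is the core of why expressions decouple "what operation to do" from "which data frame to do it on".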

When Narwhals started, it was somewhat in response to a previous effort at trying to standardize data frame APIs, where one of the major points of contention that I had with the other participants was on the concept of expressions. Other people wanted to keep the API much more tied to data frames and series, those were the two main objects they wanted, whereas I wanted expressions. That was one of the major points that we just were not able to get agreement on, unfortunately. But I'm glad that in the years that have since passed, the concept of expressions has become really popular, and users certainly seem to enjoy it. You mentioned Ibis having expressions, and in fact, I think Ibis took expressions from siuba, which was a project started by Michael. So we've really come full circle here in this chat.

It is funny, because I do think that Ibis adopting lazy expressions is actually what really made me feel good about Ibis. Ibis is so powerful. I'll just summarize again, I know it's come up: it's basically a data frame API that can fire off to, say, a SQL backend or a Pandas data frame. I think they have support for Polars and things like DuckDB, so it's a nice way to use a single API against a lot of backends. These types of tools are so critical, I think. Just to be able to think in one data frame API and fire off to a lot of places is such a relief.

I started on Ibis in 2015, and initially, around that time, there was another project that was being developed at Anaconda, formerly Continuum Analytics, called Blaze. It was a similar expression system trying to tie together NumPy and Pandas-like operations, as well as potentially... This is the same era where Dask was created by Matthew Rocklin. I was working with a lot of SQL engines, and so Ibis was essentially an effort to try to reconcile the relational algebra of SQL databases with DataFrame operations.

So this is where I say the vibe is similar to Narwhals, but the objective, the end goal, is different. Rather than being a portability layer that enables libraries to write code once and target different DataFrame libraries, Ibis' goal was to have a DataFrame-like API that could express all of the concepts that are found in modern SQL. So if you're using Postgres, it has these fairly normal concepts from relational algebra and databases that don't really exist in Pandas, for example, ideas like correlated and uncorrelated subqueries, anti-joins, and semi-joins. Things that do exist in dplyr, for the record, but that were normal in the database world, where not that much database technology had trickled through to the Python ecosystem.

So it's interesting, but it's nice that these projects have influenced each other and have... Again, that's the magic of open source that people have good ideas and every project should curate everybody's best ideas and use them to be better and to make improvements and not be stuck in the way that things are and be willing to give credit where credit's due. And so I'm very excited about where things are and the general trajectory of things. So compared with 10 years ago, it feels like a world of difference. So I have to give credit to everyone and all the work that they've done to get to where we are today.

Marco's origin story and motivation

Yeah, totally. Yeah, sure. So how did I get interested in this space? What made me want to spend a big chunk of my time and now, fortunately, some part of my job tackling this problem? Perhaps just quickly before I answer this, since Ibis has come up a few times, just to clarify, it's not in competition with Narwhals at all. In fact, Narwhals supports Ibis as an input. So you can choose to use Ibis with a Narwhals front end if you wish to.

What got me interested in the space originally: I was a huge fan of Polars. I really liked using Polars. I really liked contributing to Polars. But one of the most frequent complaints I heard from prospective users was that when they went to use their favorite data science tools, they either had to convert everything to Pandas, or the tool would claim to support Polars but under the hood was just converting to Pandas. There are some cases where that's okay and you really don't see the effect of it, but there are cases where the performance penalty can be quite large, like converting a string column to object dtype in Pandas can be quite expensive. And there are some data types that don't exactly match.

And perhaps the one I felt most strongly about in terms of missed opportunities was that Polars has both a lazy and an eager API, whereas Pandas is purely eager. So having to convert everything to Pandas, when some operations could in principle have been done lazily, really felt like a missed opportunity. It felt like it should have been possible to do much better. And I could see that some libraries were doing what you were doing in Great Tables, Michael, where you had your own hand-rolled compatibility layer. Scikit-learn was doing something similar. hvPlot had its compatibility layers for cuDF and Dask and others.

There were some other libraries doing similar things. So I just figured, well, let's try to make something reusable. And we needed to choose a common API. Originally, I thought about using the Pandas API, but I found it a bit frustrating to try to transpile Pandas to Polars, whereas doing it the other way around just worked beautifully. The Polars API is stricter than Pandas'. And arguably, when it comes to transpilation, it's easier to start from users writing something in a stricter API and then transpile that to the less strict one. Also, Pandas has the concept of an index that Polars doesn't have. So if the user never has to write index code directly, that's one less thing to worry about.

So I just tried this. And you can see in the early commit messages in Narwhals, before I released it, when I was just writing it by myself as a little one-person experiment, the commit messages start saying things like, ooh, it works! Getting there! Three exclamation marks. I was really excited at first. At some point, I published it online and started requiring pull requests and reviews and all of that. But for better or for worse, those early commit messages that I was just writing to myself are still there in the project history. And it's actually quite nice to see the excitement I was feeling at the beginning, when I realized that translating Polars syntax to Pandas was fairly satisfying.

So really, that was the original motivation, just trying to solve this ecosystem problem where we had all of these really nice data frames out there, but frustratingly, the data science ecosystem was locked into Pandas. The first time I presented Narwhals, after having shown it on a call to Michael, was at PyCon Lithuania in 2023. And I ended the presentation by saying that my hope for the next year was that we would see the data science ecosystem become less locked into Pandas, and that maybe we would also see Duolingo add a Lithuanian course. And only one of these things happened. I think there's still no way to learn Lithuanian on Duolingo, but hopefully they'll address that as well.

Building the Narwhals community

I'm super curious to hear about, to go back to Wes's question, how you got into some of this. And also, I know Narwhals has this really active Discord community. I'd be really curious to hear some of the outreach you did. How did you get involved, and what did you do early on to build out the Narwhals community? Because it's a happening place. It's a pretty sweet crew you've got.

Sure, yeah. So you're asking how I got started with the project. I've covered the motivation, which was about not having data science tools all locked into Pandas, so let me cover how it physically started. I just put it out there. I remember sending it to some newsletters, and to my surprise, people started checking out the repository, giving it stars. And I got a bit of a signal that, okay, maybe it can be useful. Maybe people like the idea. And the original name for the project was Polars API Compat. The idea was that it would just make other APIs compatible with the Polars API. And then I figured, wait a second, I really want this to become popular. That's not going to work in the present day and age, is it? We need an entertaining name. People want to be entertained. And all popular data science projects seem to be named after animals.

So I looked on the Wikipedia list of animals for names that had not yet been taken, and I found Puffin. Seemed like the perfect name, but unfortunately, there was already a Python package called Puffin. And then I found Narwhals, which immediately reminded me of that viral Mr Weebl song about narwhals. And I thought, I can't call it this. People will just think of the meme. But actually, that ended up being its biggest strength. You see the project, you already feel entertained, and you already get a bit of a sense of fun.

And that's something we've tried to keep throughout the contributing process. When people submit pull requests, I don't just approve them. I give them a little narwhal gif. And I love how other people who I then added as maintainers also took up the practice and give each other celebratory narwhal gifs on their successful pull request reviews.

It's honestly really sweet. Yeah, super sweet to see gifs flying, I feel like, in a repo.

Yeah. We were thinking about making an official list of Narwhals-approved gifs to use in pull request reviews. On the rest of the community, I feel like I need to give a shout-out to Inessa Pawson, who is from OpenTeams, which is like a sister company of Quansight Labs. She was one of the first people to really believe in the Narwhals project, and she encouraged me to start a community call and a Discord, a way for contributors to interact in a low-pressure environment. Because if you've got a GitHub page, in theory, anyone can communicate with the project. In practice, a lot of people don't really feel comfortable just asking a casual question by opening an issue. They feel like it's a very serious place, and that GitHub issues aren't really an appropriate place to banter or to just have a casual chat about what you're doing with the project. Nor is it an appropriate place to showcase things that you might have built using the project. But if you've got a Discord with several channels, then people are more than welcome to do that. So I really need to give her credit for having encouraged me to do this early on. I did not think that there would be any interest. I thought it would just be a dead Discord server, but no. Contributors started hanging out there, helping each other out, sharing things.

The best thing that's happened, as far as I'm concerned, is the study group. So I think weekly or bi-weekly, some regular contributors meet for what they call the Narwhals study group, in which they just share learnings that they've made while contributing to Narwhals and they help each other out with making sense of lots of topics. They've had a lot of discussions around the type system in Narwhals. In Narwhals, we take typing very seriously. There's a lot of typing shenanigans going on, but if you want compatibility between APIs, then I think that's a fairly important thing to do.

The group really has gone beyond Narwhals in the sense that it usually starts off with some topic that they might have encountered while contributing to Narwhals. For example, what's a protocol? What does it mean for a type var to be covariant? And then they explore this deeply, try to understand this, try to make analogies.
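For anyone curious, here's a minimal sketch of those two study-group topics; the class names are invented for illustration. A Protocol is structural typing (any class with the matching methods conforms, no inheritance needed), and a covariant TypeVar means a `Box[Dog]` may be used where a `Box[Animal]` is expected, which is safe because the box only produces its contents.

```python
from typing import Generic, Protocol, TypeVar


class SupportsToNative(Protocol):
    """Anything with a to_native() method conforms -- no subclassing needed."""

    def to_native(self) -> object: ...


T_co = TypeVar("T_co", covariant=True)


class Box(Generic[T_co]):
    """Read-only container: only produces T_co, which is why covariance is safe."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item


class Wrapper:
    def to_native(self) -> object:
        return {"a": [1, 2]}


w: SupportsToNative = Wrapper()  # type-checks structurally, no inheritance
print(w.to_native())
```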

Proactive outreach and dogfooding

I feel like there's a period of time where you were almost showing up in any place that made any mention of Narwhals. It was like a way to summon you. Is that true?

Yeah, there is some truth to this. It goes back to when I published my first open source project during the initial COVID outbreak. It was called nbQA, a little quality assurance tool for Jupyter Notebooks. At the time, my assumption was that if you published a tool and it had a bug, somebody would open an issue and tell you about it, and then you'd fix the bug, and then it would be better for everyone else. And that's when I realized, no, most people, if they try a tool and find a bug, they uninstall it and move on to something else without even reporting the bug. But GitHub makes it easy to do a site-wide search for mentions of keywords. So just by searching around for mentions of the name, I started seeing issues that people were running into that they weren't even reporting. I just figured, ah, this is so valuable. This is so useful. I can address these things. And so when I started Narwhals, I was already used to proactively looking for issues that people might be encountering, rather than relying on them to report things.

So yes, I would just do GitHub searches for mentions of Narwhals. Fortunately, Narwhals is a fairly unique name. So most mentions that came up were exactly of this project. And if I saw projects saying, oh, yeah, we should maybe consider looking into this Narwhals project, I was just so keen on getting the ball rolling that I would show them how it could be done, or in some cases, even open a pull request to them showing, hey, here's a proof of concept of how you can Narwhalify your code base. And that was so valuable. You just realize so many things about your tool once you actually start trying to use it to solve a problem. The dogfooding process is just so valuable. For anyone wanting to start a tool, that's probably the best thing you can do to make it usable. Just try using it yourself for an extended period of time.

Social capital and trust in open source

I think people, sometimes people underestimate how important the social part of open source is, as opposed to purely the technical part. Maybe a nice example of why this is so important. The first major library to start using Narwhals was the Altair visualization library. And originally, I would have thought there's no way that they would trust a small project like Narwhals or consider taking it on as a required dependency. But I remember in the GitHub discussion about it, that one of the Altair maintainers had said that he'd seen a talk that I'd done in person at PyCon Germany. He liked the talk I'd given. He liked how it would come across to the audience and all of that. And so, because he already knew me, because I'd already earned a bit of social capital in their world, they felt that they could trust this project that I'd put out. So, talking at conferences, putting yourself out there, generally trying to be a good member of the community, it can really pay off later if you want people to trust something that you've built.

AI tools and the challenge of discovering new projects

Well, one question I have for Marco: I'd be interested in what it's been like, because Narwhals is a newer project, and many people are hearing about it for the first time in the last year or two. I feel like AI is definitely shaping the way that people learn about new open source projects and how they interact with them. We spoke earlier in the podcast about the challenges of LLMs keeping up with the latest API changes in projects, but how people will discover and start using a totally new piece of software is a bit of an open question. I'm concerned that maybe people won't be motivated to use anything new that their LLM assistants don't already know about. You're kind of hopeful that eventually the AI labs will index your project's documentation, figure out how to use the project on their own, and then start offering suggestions to users. But essentially it creates this chicken-and-egg problem: the models are dependent on training data to understand how to best use these projects, but if people won't use the projects and create the training data, then how will the LLMs ever bootstrap to a point where they become experts in how to use your new open source project?

Yeah, that's absolutely been at the top of my mind. When I started the project, I thought it was fairly important to make something that felt familiar to people. That's why I chose the top-level API I did: I decided it would just be a strict subset of the Polars API. So whatever indexing work LLMs have to do to familiarize themselves with the Polars API should automatically transfer to Narwhals. They only need to learn a few extra things, like from_native and to_native, for how to get in and out of the library.

Narwhals plugins and a composable ecosystem

Seeing as composability has been brought up, as well as DataFusion: something that we've just released this week is Narwhals Daft. This is a plugin for Narwhals, and it's the first plugin that we've released. It's intended to be a reference for how to write Narwhals plugins. And what I'm envisioning here is really something like a composable ecosystem. In Narwhals, we have some protocols which define what methods some expression classes, some data frame classes, and some namespaces need to implement. As long as libraries implement those and hook things up correctly using the plugin architecture, they can become Narwhals-compatible for free. So for any code base written using Narwhals, like Plotly, Altair, scikit-lego, et cetera, it means you can pass in a Daft data frame as long as you've got the plugin installed. And it also means that people can make plugins for their own systems. I'm hoping that either we or somebody else can make a DataFusion plugin. Somebody could make a pure Python dictionary plugin. Somebody could make a plugin for Bodo. Lots of other tools are coming out, and it's not really feasible for us to maintain all of them in a library that is meant to stay lightweight. But what we can do is define a plugin mechanism, define some protocols that people can follow, and then hopefully the ecosystem can become composable from people following them.
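The general shape of a protocol-plus-registration plugin mechanism can be sketched in a few lines. This is a hypothetical illustration of the idea, not Narwhals' actual internals; all names here are invented.

```python
from typing import Protocol


class CompliantFrame(Protocol):
    """What the core library requires of any frame a plugin provides."""

    def num_rows(self) -> int: ...


_PLUGINS = []  # (matches, adapt) pairs registered by installed plugins


def register_plugin(matches, adapt):
    _PLUGINS.append((matches, adapt))


def from_native(native) -> CompliantFrame:
    # Dispatch to whichever installed plugin recognizes the input type.
    for matches, adapt in _PLUGINS:
        if matches(native):
            return adapt(native)
    raise TypeError(f"no plugin recognizes {type(native).__name__}")


# A tiny "plugin" for plain dict-of-lists frames:
class DictFrame:
    def __init__(self, data):
        self._data = data

    def num_rows(self) -> int:
        cols = list(self._data.values())
        return len(cols[0]) if cols else 0


register_plugin(lambda obj: isinstance(obj, dict), DictFrame)

print(from_native({"x": [1, 2, 3]}).num_rows())  # 3
```

The core library never imports any backend; each plugin brings its own recognizer and adapter, which is what keeps the core lightweight while letting the ecosystem grow around it.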

Just for people listening, how would you define a plugin? Like what is a plugin?

By this I mean an extra library that you can install, an optional add-on. By default, if you install Narwhals, it'll support Pandas, Polars, PyArrow, Modin, cuDF, Dask, Ibis, SQLFrame, PySpark, DuckDB. Hope I've not forgotten anyone. It'll support those inputs. And if you install a plugin, like the Narwhals Daft plugin, it'll also accept Daft. When we have the Narwhals DataFusion plugin, if you install that, it'll also support DataFusion, and so on. So I think like this, we should be able to really empower the community to make their own little plugins that compose with everything else.

The value of boring problems

Overall, I'm really excited about people solving quote-unquote boring problems, be it enabling tools to compose better with each other, or static typing in Python, which just keeps getting better with each new Python release. When it first started to come out, people were often skeptical and feelings were mixed, but with recent Python releases it's getting really solid, much better, so I'm very excited about that. I also like the advent of tools like DuckDB, which even its creators describe as a quote-unquote boring project. And I hope we can have more of this. Even, as Wes was talking about, file formats. I think there's just so much we can solve to make the experience of working with technology more efficient and more pleasurable.

So there's one piece of advice somebody once gave me. Unfortunately, I can't remember who, but it was that if there's something you find interesting that other people find boring, then you've just found your competitive advantage. So to anyone listening, if you happen to stumble on a problem that you find interesting and other people just don't seem very excited about it, well, maybe don't give up on it. Maybe try to go really deep on it and you might just discover something that you can make an impact on and that can be very rewarding.

If there's something you find interesting that other people find boring, then you've just found your competitive advantage.

Celtic folk and closing thoughts

Yeah, that's incredible. I feel like that's such helpful advice. Maybe one thing I want to close out on, which I feel we can't leave out, is a personal tidbit: I think you mentioned that for fun, as a Marco fact, you like to jam out in little Celtic folk sessions. Could you take us on a tour of that, Marco?

Sure. Well, in fact, just after this call, I'm going to head to the Irish pub in Cardiff; there's an Irish-themed session this evening. So we're going to play a mix of tunes, some Irish traditional music, some pub crowd-pleasers. And yeah, that's my favourite thing to do outside of work, just learning songs, mostly on guitar, which I've played for the last 20 years or so. I think it's a good way to disconnect from work and escape into a different world. And it's a very fun way to meet people who you otherwise wouldn't get the chance to meet through your work.

Oh yeah, yeah. So at Quansight, the company I work for, there are quite a few people who play instruments. The company founder, Travis Oliphant, is an incredible singer. So when we have our meetups, we always make sure to reserve some time in the agenda for jam sessions, and for me, those are the highlights of the trips. I know a lot of other people feel similarly. Music really has this power to bring people together and connect them.

Usually I'm just so focused and excited by the music that I'm too busy playing it. I'm also not particularly good at dancing, so I wouldn't be the person to lead that. Well, Marco, I feel so honored to have you, and to have Wes, on a podcast. I feel like life with data frames and analyzing data in Python has been made so much better by you focusing on this problem of how to connect things and how to interchange. And I do think your emphasis on boring problems pays off: we're all better off today because you're thinking about how every data frame thing can talk to every other data frame thing. So, yeah, so excited to have you on today and to hear a bit about your process and the things you're working on.

It's been an absolute pleasure. Thank you for being encouraging about the project when I first showed it to you. I think it makes such a difference when the first person you show something to is encouraging. And for anyone listening, if people show you something, maybe it doesn't hurt to say something positive and encourage them to take it further.

Yeah. Yeah, that's huge. Well, thanks. Yeah. Thanks so much for coming on and really excited to see what you do with Narwhals over the next year. Thanks. It's been an absolute blast. And now, if you'll excuse me, I need to go and tune up. This is it. Celtic folk. Let's go. Thank you, Marco. All right. Thanks. Thank you.

The Test Set is a production of Posit PBC, an open source and enterprise tooling data science software company. This episode was produced in collaboration with creative studio AGI. For more episodes, visit thetestset.co or find us on your favorite podcast platform.