Resources

Rebecca Barter: Persistent learning, tool building, and ‘Will code even exist?’

video
Jan 27, 2026
59:06

Transcript

This transcript was generated automatically and may contain errors.

Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning, digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field. On this episode, we sit down with Rebecca Barter, Senior Data Scientist at Arine and Adjunct Assistant Professor at the University of Utah, who has really interesting perspectives on learning and teaching and demystifying AI.

Hey, everyone. Welcome to The Test Set, where we dig into the people behind the data. I'm joined here with Rebecca Barter, who's a senior data scientist at Arine and an adjunct professor at the University of Utah. And I'm joined by my co-hosts, Wes McKinney, who's a Principal Architect at Posit, and Hadley Wickham, who's Chief Scientist at Posit. Rebecca, we're so happy to have you.

You mentioned a lot in prepping about AI and teaching and kind of your path. But one thing that stood out to me was you said, you're not the kind of person who can work on something you don't care about. And I feel like that is so relatable. I thought maybe you could just kick us off with a little bit of that.

Absolutely. Yeah, no, I get bored easily, I think is really what happens. So if I'm given a bunch of tasks or like working on a kind of field that I'm like, there's no point to doing any of this. It's just like mindless tasks. I start just staring out the window thinking about what I'm doing with my life. So I'm very much like, if I wake up in the morning and I have tasks where I'm like, yeah, I actually care about completing these, I feel like there's some kind of impact, whether it's to other people, to the company that I work for, or to a field as a whole, I'm going to be so much more engaged. I'm going to actively have fun. Whereas if I wake up and I'm like, all right, let's just kind of get butt in seat and do the thing I got to do. And it doesn't really matter if I do it or not, but I have to do it. I'm going to probably start crying at some point. That's just, it's not me. Some people can.

No, that's so real. And you're, as I understand it, you're in Tahoe, so you live in the mountains. So if you're not motivated, also you have all these beautiful mountains.

Exactly. And I've definitely done that. So I was, before I was living in Tahoe, I was living in Salt Lake City and it was the same thing. So there was like mountains right there. So like, you know, anytime I start feeling that feeling of like, I don't want to be doing this, I find some kind of excuse to just like go run away in the mountains and stop doing the task.

Wes, how do you feel about, can you grind through things you don't want to do?

So sometimes you have to do work that you'd rather not, but it's for the greater good or like the project just is not going to be able to move forward without it. So I've definitely found myself like dealing with, like thinking back on the early days of Pandas, it was definitely everything having to do with Windows was like the thing that I didn't want to do. And so there was a dark period where I had like my Windows virtual machine was like the blessed place where I would create the installer packages for Pandas. This was before we had wheels. This was before, you know, Conda and UV, it was like dark times creating these .exe files to install Pandas. And at no point did that ever spark joy. This was purely, you know, work that needed to get done for the greater good of the project.

Yeah, that's, I could see Windows, no shade on Windows, but yeah, sometimes Windows is being the demotivator.

JJ used to say that, like, I think it was some company, he was like saying, like, if you want people to work on Windows, particularly Windows installers, like those people just need like an automatic bonus because it's just like such, it's like such important work, but like no one wants to do it. And it's painful.

I think that that comes around in a lot of aspects as well. I feel like whenever I'm teaching a workshop, it's always the people on Windows who have like installation issues. And like, I'm like, I have no idea how to help you because I don't even know how to use a Windows machine like normally.

Work at Arine and the tool-building role

Yeah, that's real. Well, I hope that you're working on a lot of things you do want to do right now. And I think I'm, yeah, I'm so curious to dig into a bit about your like AI work and what you're doing as a data scientist now. I figured what one approach that might be nice is maybe you could tell us a little bit about your work at Arine now. And then I'd love to circle back and maybe talk about kind of your path to getting there a little bit.

Absolutely. Yeah. So, I mean, I think related to your previous question as well of like, you know, I like to do things that I'm passionate about really is kind of what it's about is the thing that's really got me going in terms of data science has been like applications in healthcare and like the kind of patient focused approach to analysis and trying to like help improve patient experiences. And so that's really a big part of what's driven me to my current position, which is at Arine, which is a kind of medication management startup where we're really trying to like, make sure people are taking like correct medications and they're, you know, not, they're adhering to their medications and like all of stuff like that, where it's just like really kind of focusing on the patient experience and trying to make sure like we're improving health outcomes. So like that for me is something that I can wake up in the morning and be like, yeah, stoked to do that, you know?

So yeah, I think it's like, like for, for me, I'm very much drawn to, to these kinds of healthcare applications in general, but it's also like, I'm lucky that I get to work with a lot of really wonderful, intelligent people who are kind and, you know, like just the joy to, to, to work with as well, which makes a big difference.

Yeah, that's awesome. What's the team you're on like? Like, what's the, like, how many people do you work with and what's the setup like?

Yeah, we're a team of like, I think maybe 10 to 12 people. We're kind of split between like health outcomes and data science. And I'm, I kind of do both. I do a lot of like tool building, like, so, you know, if there's a lot of kind of tech debt and like things that have been built that are really kind of ad hoc and inefficient, I do a lot of like, kind of working with whoever wrote that original code to like, be like, let's create this, just turn this into a pipeline where it's like, you just run a script. It maybe does something with like Quarto even down the end and like creates this nice report. And like, you don't have to like, like, like, you know, run this and then run this and then change this config and then change that config and then run that and then run that. And then, oh, you want something else? Okay, let's do it all again. You know, like, like, like I, like the people that I work with are a big combination of like engineers, like people with stats background, people who are more data kind of science background. Like it's a pretty diverse team. And it's, it's really cool. Like everyone's really easy to work with and really friendly.
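
To make the idea concrete, here is a minimal sketch in R of the kind of "just run one script" pipeline being described: one entry point that does the data prep and then renders a parameterized Quarto report at the end, instead of hand-editing configs and running steps one by one. All file, column, and function names here are hypothetical, not Arine's actual code.

```r
# A minimal sketch (all file, column, and parameter names are hypothetical)
# of a single-entry-point pipeline that ends in a rendered Quarto report.

library(quarto)

run_pipeline <- function(customer_id) {
  # Read and lightly clean the customer's data (the column name is made up)
  raw     <- read.csv(file.path("data", paste0(customer_id, ".csv")))
  cleaned <- subset(raw, !is.na(claim_id))
  write.csv(cleaned, file.path("data", paste0(customer_id, "-clean.csv")),
            row.names = FALSE)

  # Render the Quarto report, passing the customer ID through as a parameter
  quarto::quarto_render(
    input          = "report.qmd",
    execute_params = list(customer_id = customer_id)
  )
}

# run_pipeline("customer_123")
```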

Yeah. Oh, cool. Yeah. I love the, like the tooler role where you get to kind of roll in and help like spruce things up and like put some glitter on it. How is it, how do you find, like, how do you choose what to beef up or like which tools, like who needs or what needs a little bit of like tool love?

Yeah. Usually there's a, like, it's pretty clear, like when we've got inefficient pipelines, you know, like, I think it's also, you know, a nature of, of being a startup where a lot of things are built kind of ad hoc and, and for individual customers. And then it's like, oh, we need to kind of scale this up to be more general and more applicable. So, you know, it's a lot of the time, it's like, there's a lot of things that we do repeatedly for customers that like, we need, it's like very clear. It's like, this could be improved. So like, like it's pretty easy to point towards, okay, that particular pipeline is really inefficient and could very much be like benefited from having someone create kind of a overall compute pipeline that makes it a lot easier to go from point A to point B, you know, rather than going point A, A.1, A.2, A.3, and then eventually you get to point B and you realize a lot of things that you could have done differently along the way.

Moving between R, Python, and other tools

Yeah, yeah, for sure. And I, I think you mentioned before this that you, I think you, I've seen you've done a lot of R, work in R, but that you might be doing a little more like Python and stuff now. I'm curious the kind of mix of tools you're using in your work now.

Yeah. So for a very long time, like I learned R when I was like 18, you know, I'm not going to tell you how old I am now, but I'm older than 18. So I learned R before the tidyverse existed, before Hadley had created ggplot, you know, all the good stuff. And that was like very much like, you know, like that was what I was comfortable with. Like, I was like, yeah, I get it. You know, I'm, I'm R, R like, I felt like I was R queen, you know, I'm like, I know how to do R things. And then Hadley came along and like ruined everything. Because now he's created all these new tools that are way better that I had to go and learn. And I just, I remember someone being like, like, like I was fortunately for me, I was like, I was in grad school by this point. And someone was like, okay, we're going to use this new thing called ggplot and dplyr and this like project or whatever. And I was like, yeah, can't I just do what I've always done? And it was like, no. So that, you know, forced me to learn. And then I realized, like in that experience, I also very much realized like, oh, things do kind of keep changing. You can't kind of just like learn the one thing and then, you know, say, oh, I've mastered it. You know, like it's, you never really master it because what you've mastered today is going to be really irrelevant in like these days, really irrelevant in like three weeks.

But like, you know, back then it was like a couple of years. And so now it's kind of like, so like through, through kind of like grad school, I did a lot of R and like, you know, I feel like I got pretty good at R. I like, you know, mostly like a lot of the stuff that Hadley and co have built, you know, and that was like my day to day toolbox, you know, and like in, in academia, I was in a statistics department, like people are using R that's kind of like the normal thing.

And then kind of gradually as I started trying to like intentionally expose myself to, to projects where I would be forced to learn new tools. I kind of like started, like someone was like, Oh, Hey, I've got this project that like, I realized would end up, you know, I'd have to learn some NLP. I'd have to use Python. And I'm like, the stakes were low, you know? So I was like, this is perfect. Cause like, this will be a really good opportunity for me to like learn like these new tools where it's like, there's no particular like deadlines. Like they don't have any expectations. Like, like they don't know anything about data. So like, it's like a really good opportunity for me to like take the time and be like, okay, that's, I'm going to take on that project. Cause it means I'm going to learn something new.

And this is like, for example, when I was at the University of Utah full time, you know, I found some collaborators in health informatics and health communications that were like, they really wanted help with like a lot of this kind of text data that they had. And so I specifically like, I was like, yeah, I'll help with that and learned a lot of new stuff there. And now I've kind of transitioned to industry. I feel like, and especially with like the advent of AI, like I feel like I've gotten really good at picking up new tools. And I think like, that's one of my main skills is like, like, sure. I'm really good at R and like, I'm decent at Python now, but like, I wouldn't say like, those are the things that are going to propel me in the future. The thing that's really going to propel me in the future is like, you show me a new tool. I'll like play with it and like learn how to use it really quickly. Mostly because like, as long as you can get to like a basic level of competence, like AI can kind of help you get the rest of the way these days, like pretty quickly. So like in terms of the specific tools that I'm using, like yesterday I used like Python, I used R, I used Quarto, I used Streamlit, which is like Python. It's kind of like shiny, but not really like shiny Python kind of, you know, I stared and got very confused by an AWS step function. Like I'm now, like I use SQL, like every day I'm just like exposed to like such a broad array of tools that some of which I'd never even heard of like six months ago, you know?

AI tools and the challenge of keeping up

Yeah. Yeah. That's cool. You're able to just pick up new tools and kind of move into them faster. And I almost wonder, it's interesting you brought up ggplot2 also as being the one someone might dread learning because I think doesn't, I think ggplot kind of warns people, like I think it gives people a warning, like, hey, it doesn't say you're going to have a bad time, but I think it's like, you're going to need to put some work in before you have like a good time.

I distinctly remember when I first started using ggplot2, I was trying to make just like a simple scatterplot and I just kept getting an error and I did not, I could not figure out what I was doing wrong. I'm sure now thinking back, I was just like missing the aes() function, but like I was getting so frustrated and like, it was just like, like now I look at it and I'm like, well, it's so obvious and it's so clear. And it's just like, I think it's just like when you aren't used to learning new things as well, like, especially like when it's a big change in syntax, it's like, you kind of look at, you look at the code and you can't see what's wrong with it, you know? And it's like, it's also like a different era of like getting help. Like there obviously weren't chatbots, like there wasn't even Stack Overflow. There was like the R-help mailing list, which was like pretty, pretty unfriendly.
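
For readers who haven't hit this one: the beginner mistake she's describing is passing column names to a geom directly instead of mapping them inside aes(). A small reconstruction with the built-in mtcars data (an illustration, not her actual code):

```r
library(ggplot2)

# The kind of error described above: column names handed to the geom directly.
# ggplot2 then looks for standalone objects called `wt` and `mpg` and fails
# with "object 'wt' not found".
# ggplot(mtcars) + geom_point(x = wt, y = mpg)

# The fix: map the columns to aesthetics inside aes()
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```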

Yes. I mean, I mean, a thing that's a thing that's been keeping me up at night lately, thinking about is, is, you know, I feel like the whole model for how people discover and start using new open source technologies, like new R libraries, new Python libraries is going to have to change because like our LLM coding agents and assistants are all really good at all of the projects that exist now, which have rich bodies of training data available on GitHub and Stack Overflow and all over the internet. But if you build something new, almost by definition, there's not going to be any training data available. And so essentially we're going to have to build things in such a way that we can like point the agents at the project's documentation or like create, create the projects in such a way such that this new thing can be presented up to our, our, you know, AI co-pilots so that they can figure out how to take advantage of something new. Because otherwise we're going to end up like locked in the present moment of like, nobody uses anything new because their LLMs don't know how to use it.

No, I, I think about that a lot actually. And I encountered that, like, cause like I mentioned, I've been using Quarto, like specifically what I've been doing with Quarto is building like a quite sophisticated Quarto dashboard, like report with like all the tabs and like, you know, cards and all this kind of stuff. And like, they're like, I'm using Gemini cause that's what we use at our company. And like, it does not know anything about Quarto dashboards. And so like my default now, when I'm looking up something and I'm like, oh, how do I do this thing? Is like, I type it into Gemini or GPT or like whatever I'm using. Or like I use the inbuilt kind of like thing in my IDE, like, like that's my go-to now. And it's very clear to me, like when I encounter something that's new, that like, it kind of gives you a lot of very confident advice. Like it, it doesn't tell you, Hey, I don't actually know much about this cause it's kind of new, like, or there's not a lot of documentation. And it, it very much like gives you a lot of confident advice and you end up trying to spend a lot of time debugging the terrible code that it gives you because it's just made it up. It doesn't like tell you that it made it up.

And then like, at the same time, like I used to write a lot of blogs, like tutorials and stuff like that on a lot of R stuff and like, you know, coding stuff. And like, there's not nearly as much incentive like to do that anymore as well. Cause it's like, well, people aren't Googling to be like, how do I use the purrr package? They're going to AI and they're being like, how do I use the purrr package? And it's going to do a much better job than I ever did. You know? So it's like, things are shifting a lot in terms of how we learn things and how we troubleshoot for sure. And I don't know how that's going to look in terms of new stuff that's coming out that really doesn't have a whole lot of documentation and, and people asking good questions and giving good answers on Stack Overflow and stuff like that. Like, who knows?

The posit::conf talk: AI hype, help, or hindrance

Yeah, I know you gave the talk "AI: hype, help, or hindrance." Maybe you could explain a little bit of that. So that was at posit::conf.

Yeah, that was a really fun talk to give and to prepare. Actually, it was kind of funny. So I think when I started preparing or like when I proposed the talk, I was very much like, yeah, I know how to use like all these like AI agents and, and whatever IDE I was using VS Code a lot. Cause I wanted to use Copilot. That was kind of the main option when I started using it. And then I came around to like the point where I was like, okay, I have to actually like write this talk and like everything had changed. Like literally, like there was like no resemblance between like how I was using it when I started using it to like how I was using it when it actually like, or like how I could have been using it. Cause like, it's another one of those things where it's really easy. You start using it and then you're like, okay, I get it. And then you stop paying attention. You know, you're like, I get it. I don't need to like see all these new things that are happening. But basically like so many, like, you know, things have been added. There was now like an agent, you know, and I kind of focused that particular talk a lot on like the Positron assistant, but it's the same, you know, I'm using like Claude in like PyCharm a lot, like in my current job. And like, you know, VS Code has equivalent stuff as well. Like there's like an agent, there's like an editor, there's like a chat, there's like code completion. There's like all this stuff that is now a part of like these AI tools for coding that like, I really like, wasn't like keeping up.

And they're like, I feel like I do this kind of way where it's like, I'll like learn something and then I'll be like, yeah, I get it. And then I'll suddenly realize for some reason, oh wait, it's moved on. And then I'll like have another rapid period of like learning and then I'll like stagnate again. And then I'll be like, oh wait, what? It can do what now? So like, like that kind of thing is like kind of, it's hard keeping up I think with that. Like I struggle with that for sure. But I think my conclusion, like really from this talk is like, I've been like using like these tools for quite a while now. I probably started when Copilot was pretty new. And like for the most part, it was like hype, help, or hindrance. Like, well, it's kind of hype. Like it's not really going to do everything for you. Like you have to really be able to tell it what you want to do, what you want it to do. It can't really read between the lines if you give it a very vague kind of prompt. So like, it's not, I don't think it's going to take our jobs, but like, is it like, it's like hindrance, like a little bit, like it gives you wrong code. Like there's like security issues. Like sometimes it just sends you in circles. Like there's a lot of kind of like things that are not great about it, but like mostly it's just really helpful.

Like honestly, like all of these tools. So like, this is like Positron Assistant, GitHub Copilot, Claude Code that are embedded inside of your IDE. Like whether it's Positron or VS Code or PyCharm or a little bit RStudio, that really just has code completion. But you know, like these tools as a whole, like I think I said, like in my talk, it's like for some projects, like they don't really help me much at all. Like if I'm, if it's a pure R project, like I'm so good at R that like, I don't, I don't need, like, I don't have to think that hard, you know? Whereas like, if it's another project where I'm using like Python or like, and you know, writing a bunch of like really complex CTEs in SQL that like, I'm not that good at, you know? Like it means that instead of spending all my time Googling how to like fix the thing that I did or figure out what's the function that I need, it helps me just like, I know exactly what I want to do. Cause like I have that base competence and it just really kind of helps me like go from that, like level of like, I am kind of competent in this, but I'm not an expert to, I can just produce the thing as if I was an expert. Like it really covers that gap really well. But so I think my conclusion was like hype, help, hindrance, all three, but mostly help.

Yeah. Was your, did you get any, like in maybe like feedback or response, like did you have a sense for like the audience, how people felt kind of overall? Where were they on the like help, hype, hinder?

Yeah. This is not a continuum. Yeah. I had a lot of people talk to me after, after this talk. And for the most part, it was actually a lot of people who felt like they didn't know how to get started with these things. So they weren't even at the point of, of saying whether it was hype, help, or hindrance. Like they were aware of these things existing, but just like literally didn't know, like how, like, how do you use them? Like what, what are they? You know? So like, I think like it was kind of interesting because preparing for that talk, you know, I was talking to several people during like the speaker coaching where they're like, I think you can assume everyone in the room is going to be familiar with these things. And I was like pushing back a lot being like, I don't think so. Like, like I think there's a lot of people like, just because like, you know, me as someone who's like very enthusiastic about like learning new things, even though it's hard to kind of keep up, like, and like, I think it was a very biased room, like this kind of preparation kind of session where it was like, everyone in that room like is excited about learning new things and is kind of like, you know, whether you're fearful or excited, like you're engaged enough to like go and, and put time into learning these kind of how to use these new tools. But honestly, I think the vast majority of people aren't like that. Like, and that's fine.

Like, it's not like a big deal. It's hard. Like I distinctly remember this feeling of like, I'd learned base R. I was like, I'm good. You know, this was the skill that I needed. I needed to learn R. And then it's like, why, why should I learn something new when I can already do everything I need to do just fine? Like, how is it going to help me? And like, I think the vast, yeah, the vast majority of people who kind of came up and chatted with me afterwards was people just being like, thank you for just showing me how I can use these things. And like, I kind of want to give it a go now. Like, definitely there's a lot of people who like, I think the more experienced you are, the more likely you're going to lean on the hindrance side. And sometimes the newer you are as well, cause like, if you lean on it too heavily, but you don't actually know anything about the language, it's also kind of a hindrance because you can't debug. You don't know how to prompt it well enough to, to be able to make it like do the right thing. So it's, it's kind of a, a spectrum. Like it's not, it's not great for everyone at every stage, I think.

Yeah. I mean, I was going to say like my, my hot take on the matter, like I've been using Claude Code, like coding agents, pretty extensively, like, this year. And, but I, I think like so-called vibe coding is way, is way over hyped in the sense like that, that, you know, that, and there's a lot of people like AI boosters going around saying that, that, that soon, soon, soon trademark, there's going to be the coding agents are going to allow somebody without coding skills or data science skills to replace, you know, a senior or expert person in those fields. But you know, as a user of these tools every day, like I, I simply don't see it. Like, I feel like, I feel like being in the loop and reviewing the work that's being produced, it's helping me go faster. It's doing the typing for me. But, but like, I, I feel like if I weren't in the loop, like reviewing the work and giving feedback and catching the mistakes and whatnot, which just sounds like exactly what you're saying that, that yeah, without having somebody having the experience in the loop to be able to judge the output, you could end up, you know, creating a morass that's very, you know, difficult to escape, you know, quite quickly.

And so I can imagine in sometime in the next year, we're going to enter some kind of a trough of disillusionment where, you know, a big wave of business users try vibe coding and end up disappointed and conclude that AI sucks. And, you know, was overhyped and oversold. And even though, you know, they feel like they have a gun at their head saying, you know, use AI or else the initial, a lot of the, you know, attempts to eliminate data scientists or eliminate coders are going to fail as a result.

Oh yeah. There's definitely a lot of people who are like not good at using the AI who think it's just like a waste of time and like way overhyped. I don't know. I watched Hadley's keynote at posit::conf though, so I'm convinced.

I was interested, I read a good kind of criticism of AI by Cory Doctorow. And I think the thing that he kind of pointed out, like the real kind of risk right now is not that like AI will replace software engineers, which I don't know, like the more I use it and even as the better it gets, I'm like less convinced of that. The real risk though is like CEOs believe that it's going to replace software engineers and hence, you know, start firing them. It's just, yeah, there's clearly like, like I love the tool and how empowering it is, even for me as someone who's like a pretty good programmer, but just like all of the, like the stuff around like concentration of wealth and like the, I don't know, a lot of the people involved and the people who stand to make the most money from this is, it's like pretty, it's pretty icky.

It is so interesting. I find it really hard sometimes with like a very AI assistant heavy project, like in the early stages too, I find it really hard to like evaluate, is this even going to pay off? Like I just feel like sometimes it gets into states where I'm like, I don't know if I'm just generating a lot of garbage, like how can I kind of like, will I be able to reign this thing in? And I also don't know what the moves are either, even it, it feels like very managerial almost, but also embracing so much production that it's like trying to control like a fire hose sometimes.

But I do think to that point of like, will software engineers be replaced? Do CEOs think software engineers will be replaced? It is such a funny time of like, what is the hard challenges that will kind of like linger? Well, I think for me, I think what's interesting is that, that, you know, coding agents have revealed something that we already knew deep down, which is that probably 80% of the work that we do as programmers is not that special.

It's a lot of like configuring, you know, configuring YAML files and writing, you know, grody bash scripts and doing things that are a little bit, you know, essential, but like not, not special. And so, but where the actual, you know, value is in that 20% that, that, that is more special that requires judgment or requires like synthesizing your experience or your background in a certain, in a certain domain, like for example, data science, like requires a lot more human judgment is a lot more subtle than software engineering, which is, you know, software engineering is often implementing something based on a specification and then showing compliance with a, with a specification. Whereas like data science is, can often be as much art as, as science of knowing, like, you know, what are the right tools to employ? And then even interpreting the results may be, you know, may be subjective.

AI in day-to-day data science work

Yep. Rebecca, I'm curious, like at your work, what do you see with AI? Cause it sounds like you're, I mean, you've done a lot of really great work talking about it and kind of helping people like ease into it. Like, what have you seen in terms of people picking it up and trying it out?

Yeah, I think there's big spectrum, right? Like I know people who are like, have no interest in trying it out and using it and incorporating it into their day to day. And like, maybe they don't, if they're only ever doing stuff that's like the same as what they've already done and they're really good at that, like maybe they don't need it, you know, like maybe it doesn't help them that much. But then I also see a lot of people who are like really diving deep into it. You know, like there's a lot of people, you know, at my company and outside, like who, they're like building all these like agents to like do all this stuff, like just to like take as much away from the human as they possibly can, but like in a good way for them, like they're very excited about it.

I think other people look at that and they're like, you know, they get that fear. I mean, the first time I ever used it, I definitely had that kind of fear of like, I remember I just asked it, I just said like, write some code to like do, I don't remember what I asked, it was like some kind of R code I asked it to write for me. And it just like did it really well. It was like early days of ChatGPT and I had this like sinking feeling of like, oh no. Yeah, thank you and how dare you.

Yeah, like everything I've like, I got so good at this thing that like I really didn't need to get that good at anymore, you know, I just needed to get like okay at. And like I think a lot of people are, I feel like people, there's so many dimensions and like I've spoken to so many people who are like, some people are really excited about it, some people really hate it, some people think it's stupid and useless and like some people are like very anti, some people are very pro, you know, like it's the entire spectrum, I think. And I don't even know what the distribution is because like I'm probably biased, you know, because like I initially was, I was never anti kind of AI, like it definitely, I think there's a lot of ethical questions and concerns that I have for sure. But like in terms of a tool, like once I started getting better at like using it, like it just increased my efficiency so much that like I now spend, I don't spend that much time doing it, but like, you know, when I'm talking to people about it, I'm usually trying to like convince them how useful it is because a lot of people are skeptical.

And like I understand because like I think initially I was kind of like, okay, cool, it can write that code, but like I could also write that code, you know, but it can do so much more. I know like preparing for this, you mentioned, I think you mentioned like critical thinking skills being a really big piece for a data scientist. I'm curious how that, that kind of plays into this, like the infusion of AI into your work.

Yeah, definitely. So it's kind of interesting because I think I use AI a lot more to do tool building than I do for actual analysis. So like, I think when I'm doing actual like data analysis where I have to like do some explorations and maybe I'll be like, how do I write this complicated SQL thing that I need to write? And like AI will do a better job than me. Like I might use it to like help me like come up with my SQL query. But like, like overall, like other than just like maybe sometimes outsourcing code that I can't immediately like without trying think of how to write, like I don't use AI that much for exploratory stuff. That is still very much just me thinking and me being like, okay, like, okay. I see that there's this many claims. Is that normal? Like how many are there supposed to be? And then I'll go look at like another data set and compare. And like, I'll be like, oh, that seems low. Like, you know, and then I'll have to think about like, how do I need to, I need to communicate this to like the right people. How do I want to, based on who it is I'm trying to communicate it to, I need to think about, okay, do I just need to like write a sentence? Do I need to like create a table? Do I need to give them the code? Like, like none of that, AI can't do any of that. And that is like, like, cause my job, I do a lot of tool building, but I also do a lot of analytics. So it's kind of like split 50, 50. The tool building stuff, like AI all day, the analytics stuff, like a little bit of AI, like, like I said, like to kind of help just like speed up the code writing when it's like, it doesn't really matter. I just need the query to be written. And I can't like, it's like, how many CTEs do I need? Like, you know, what's the, how do I lateral flatten? Like, I can't remember any of that stuff. Like, so like for that kind of stuff, it is helpful, but like, like for tool building, I think it's a lot more useful whereas for things where you do need to do a lot more critical thinking and thinking about how to communicate things. Like that's all still, I think me as a human needing to do that.

It's interesting to hear about data analysis that like this activity, when you're in like tool building mode versus like data analysis mode, you're using AI a lot more heavily for the tools. Yeah. I'm curious if you have had like similar experiences with like working on tools versus analyzing data. Like, do you use AI to analyze data ever?

I don't do that much data analysis, but it does seem kind of like fundamentally less useful, at least right now, just because there is that loop. Like you do, like you don't, like you don't know what you're doing. Like sometimes you don't know what you're doing in software engineering, but most of the time you've got like some tasks, you kind of know what you need to do. Whereas a lot of the time in data science, you're just like, well, what, like what's going on here? And there's like a much more of a loop of like, do something, look at the results, do something else, look at the results. And yeah. And bringing in all of your other context about like, like what, what does this variable mean? And like remembering, oh, actually, you know, there was this problem back in 2020 and you can't trust the data before that. And like all the, bringing all of that context, like, like, yes, it would be awesome if that was all written down somewhere. And like, probably we should be striving towards that just so people in our org know, but also so agents in our org can know, but it feels like we're a long way away from that. It's just the kind of tacit knowledge and just like the kind of EDA loop of like, you know, try something, ask a question with a plot, look at the answer, generate another plot and do that as rapidly and many times as you can.

I mean, I've definitely, yeah, I mean, I've definitely used it for like a lot of mundane, like, you know, mundane, mundane data wrangling and, and, and like, you know, one-off simple, like simple queries. Like as an example, like in my home here, I have, I have some like little apps that I've, I've built with Claude Code that I run on machines on my, on my home network that I can access over, over Tailscale, which is super convenient. But then they have SQLite databases, some of them, and every now and then I'll have a question about like, what's going on with this, something looks wrong in this app. And yes, I could SSH into the machine and I could run, start sqlite3 and I could write the SQL query, but, you know, using an agent, I don't have to, and everything is very simple, but rather than something taking me 20 minutes of fiddling, it's something that can be done in, you know, 60 seconds or two minutes. And so that seems like a good, like a good application. But, you know, if I were doing something more, you know, something more advanced or requiring like discretion or judgment, I would be less comfortable.
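
For what it's worth, the kind of one-off query being described looks roughly like this in R terms (the database file, table, and column names are made up for illustration); the point is that even this small amount of boilerplate is what the agent now handles.

```r
library(DBI)
library(RSQLite)

# Hypothetical one-off question against an app's SQLite database
# (the file, table, and column names are invented for illustration).
con <- dbConnect(RSQLite::SQLite(), "app-data.sqlite")

dbGetQuery(con, "
  SELECT status, COUNT(*) AS n
  FROM jobs
  WHERE created_at >= date('now', '-7 days')
  GROUP BY status
  ORDER BY n DESC
")

dbDisconnect(con)
```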

I mean, I think what you said earlier about like, you know, even as software engineers, like 80% of what we do is just not, like, it's not that magical. It's not that, it's the same for data scientists. Like a lot of it's just doing pretty like routine stuff and having something to help you like do that faster is super, super empowering. But that last 20%, like if you completely take the human out of the loop, that starts to get like pretty scary and pretty, pretty dangerous.

Yeah. Like the AI can definitely help kind of take away a lot of those mundane tasks, you know, from us, but then like hopefully give us more time to do a lot more of the stuff that is less mundane, the thinking, you know, whether it's critical or not thinking.

Teaching, explaining, and the data scientist's communication skills

Yeah, I think I really appreciate just all of this like thought on AI and teaching. And I'll say, I am impressed that you have like held onto your accent for pronouncing data.

And my accent, I think is at this point, it's increasingly American though. Like there's like, I think you still sound like a solid Kiwi. But I don't say data. I say, I don't say data anymore. I say data. Yeah. But it's data. I mean, I, when, when I was teaching R a lot and maybe you get the same thing where I'll just be like, ah, and people are like, yeah, that's why I say R now. That's because Australians and New Zealanders have a non-rhotic R, which is said like the vowel sound. It's very confusing.

I like to think that you're, are you saying whenever you chat with like an Australian or New Zealand, you're like clocking their use of data. You're like. Hadley clearly is. Yeah. Just when I'm in America and someone says data, I'm like, oh, that's not an American.

I think there's some words, like I do put R's in places now where I don't think they belong. The letter R and the sound R, but I like, there's certain words that I hold onto with dear life. So for example, like I'm not wearing a sweater right now. I'm wearing a jumper and I will hold onto that for the rest of my life. Even though every time someone's like a jumper, isn't that like a jumpsuit? And I'm like, no. I love that. I think it's good to keep people on their toes. You know, like you got to have like a couple of things.

Yeah. Rebecca, thanks so much for coming on. I really appreciate all your thoughts on AI and also your role building out tooling. I feel like it's so such an interesting role for data scientists to take in orgs. So thank you so much for coming on. Thank you so much for having me. It was good to chat with you all. Yeah. Thanks Rebecca. Thank you. Thanks for coming.

The Test Set is a production of Posit PBC, an open source and enterprise tooling data science software company. This episode was produced in collaboration with creative studio, Agi. For more episodes, visit thetestset.co or find us on your favorite podcast platform.