Resources

James Wade - Posit Academy in the Age of Generative AI - Lessons from the Frontlines

video
Oct 31, 2024
20:26

Transcript

This transcript was generated automatically and may contain errors.

Well, good afternoon, everybody. I'm really excited to be here today to talk about generative AI and Posit Academy, and potentially the impacts of generative AI on coding education overall.

An alternative title I could have given this talk is How I Learned I Was Wrong About Generative AI.

To explain what I mean by that, I first want to tell you about my experience with Academy at Dow, where I work. We've had around 200 researchers participate in the program. To give you an idea of who these people are, they're typically chemists, chemical engineers, maybe a physicist: somebody with a strong technical background, often with a PhD in that subject.

The learners themselves are more or less evenly split between R and Python. What's more, they don't really identify as data scientists. They are part of a growing internal community of citizen data scientists. You might have your own definition in your head, but for us it means somebody who's interested in applying data science concepts to their work but doesn't see data science as the core function of their role.

Initial reactions to AI coding assistants

Now, when I started playing with large language models and generative AI capabilities, I got really excited about them. I have a number of examples of some capabilities that I imagine you might be familiar with that can really turbocharge your own experience for how you can write code. And I thought that the same was going to apply to the Academy participants. We'd train them up. They would go off and create amazing things.

But when I asked them, they told me they didn't really like these coding assistants. Over half of the learners rated them two out of five stars. Now, I don't know about you, but if you're looking on Yelp or Google Maps for a restaurant around here for dinner tonight, you're probably not going to pick the two-star ones. Neither will I.

So what gives? Why aren't these learners finding the AI coding assistants to be helpful?

Threshold concepts and learning

To answer that question, I want to ask another question, and I'll also introduce you to somebody named Andy Matuschak. He's an independent researcher who explores tools for thought and is a skeptic of, but extremely interested in, how AI can augment education. He has a fantastic lecture that I encourage you all to go watch called How Might We Learn, where he asks a very important question about learning.

What were the most rewarding, high-growth periods of your life? I'd like for all of you to take a second and think about what might this be for each of you.

When he asked people this question, two themes emerged. Number one, learning wasn't the point; rather, there was some other objective they had to learn something to accomplish. Number two, the learning really worked.

How many of you have seen a new package come out, or a new capability you wanted to learn, started to dive in, and hit a wall? Of course, these high-growth, rewarding experiences should feel like the opposite of that.

So let's go back to these coding assistants. What's missing? Why aren't they providing this rewarding, high-growth experience? The answer for me is threshold concepts.

Now, if you're like me maybe six months ago, you have no idea what threshold concepts are. So let's define them. There are a number of characteristics, and I'm certainly no expert in this, but some of the key points are: number one, these are ideas that, once you understand them, transform your perception of and approach to a discipline or topic.

Number two, these threshold concepts must be encountered; you can't be told them. When you think about threshold concepts, think about struggle, and about embracing the struggle.

They have some other important characteristics that can guide us and maybe give us a hint at why coding assistants don't allow us to master them. Number one, they're troublesome: they might be conceptually difficult, and they might challenge our existing ways of thinking about a subject. Number two, they're liminal: we might not learn in a linear fashion; we have to take some steps back as we try to take steps forward, and we get confused along the way. Number three, they're transformative: we're able to recognize new patterns, have new revelations, and reach new understanding. And lastly, they're irreversible: once you learn these concepts, you can't unlearn them.

Now, I want to take you back to my own rewarding, high-growth period, which is also the reason I wasn't at posit::conf last year. It's these two. On the left is the troublemaker, Henry, and on the right is the sweet child, Penelope. They taught me a lot.

Number one, they taught me that asking for help means that you care, that self-care is not selfish, and that incomplete to-do lists are okay.

Now, a couple of years ago, if I were sitting in this audience and I heard these, I'd be like, okay, these are trite cliches, whatever. But to some extent, that's the point. I didn't experience this. I didn't go through the struggle, and so these concepts wouldn't have meant much to me until I experienced them.

Threshold concepts in data science

So let's bring this back to data science and to Academy. Part of Academy is to impart the skills to use all of the packages whose hex stickers I'm sure you recognize on the right here. We can learn threshold concepts like tidy data enables efficient analysis, modular code enhances reusability and clarity, or visualization is a tool for exploration and communication.

Now, if you've been coding in R for a while, these probably resonate with you, but it might be hard to explain to a beginner why these concepts are so transformational.
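To make the first of those concepts concrete, here's a toy sketch, with invented data, of what tidying buys you. An Academy learner would typically do this with tidyr's pivot_longer(); the pure-Python version below only illustrates the idea that one row per observation turns later questions into simple filters.

```python
# A "wide" table: one column per year (toy, invented data).
wide = [
    {"country": "US", "2022": 1.9, "2023": 2.5},
    {"country": "DE", "2022": 1.8, "2023": -0.3},
]

# Tidy (long) form: one row per observation (country, year, value),
# analogous to tidyr::pivot_longer().
tidy = [
    {"country": row["country"], "year": year, "growth": row[year]}
    for row in wide
    for year in ("2022", "2023")
]

# "All 2023 values" is now a single filter, no matter how many
# year columns the wide table had.
print([r["growth"] for r in tidy if r["year"] == "2023"])  # [2.5, -0.3]
```

In the wide layout, the same question would require touching every year column by name; in the tidy layout it's one condition, which is the efficiency the threshold concept points at.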

Beyond threshold concepts, there are a couple of other things ChatGPT, or your favorite AI coding assistant, won't give you: learning dispositions.

One of these is a sense of the possible, or agency, and this really comes through the mentorship that's part of Academy. You leave understanding that you can tackle a problem; you have new capabilities and a persistence you didn't have before.

Another learning disposition is flexibility. You can expand the palette of possibilities, the types of problems you're willing to take on. You're not as constrained as you might have been before.

How to use AI coding assistants in Academy

Okay, so let's zoom out a bit. You've heard me talk about all these things that these AI coding assistants won't do for you. They don't give you these threshold concepts. They don't give you these learning dispositions. Should we throw them away? I would certainly answer no, but it's important to think of how we're going to use these.

Let's go through some analogies. Many of you may have heard a phrase popularized by Steve Jobs, I think around 1990: the computer as a bicycle for the mind. The analogy is that we're walking along, and computers allow us to move much faster. But there's an important implication here: yes, with these bicycles we can go faster, but we still have to steer. What's more, we have to know where we're going, and we might have to fall a few times.

To keep butchering the analogy, maybe we also need to wear a helmet. But we can all agree that the bicycle is not an invention that's going away, and it's probably something we should pay attention to.

For a different analogy: does anybody in the audience play the cello? I do not, but let's say you're going to compose a masterpiece concerto. You could ask some of these new AI capabilities to write that concerto for you, and it would produce one that probably sounds pretty good, maybe passable, one that a naive listener like myself might even enjoy. But it's not the concerto that you would write.

Much in the same way, these coding capabilities aren't going to produce what we are going to create. If I'm going to make a new Shiny app, part of the creation process is the building of the app itself: using it to discover what it is I actually want to create. In that sense, writing concertos and coding are contingent disciplines.

Let's go back to Academy. I have here a very oversimplified curriculum of the foundations course, and I would challenge you to find anything in it that you would take out. The curriculum is 10 weeks long, and it really pushes learners to the limits of their capabilities. I'm not willing to take out things like summarizing data, or certainly writing functions. So what do we do?

We have 10 weeks and we've really settled upon that as an appropriate length of time for this. How do we introduce these AI coding assistants? My suggestion here is to introduce them gradually and intentionally.

For example, in week three you've just been exposed to the pipe and you can start to summarize your own data. Maybe now it's time to use these AI coding assistants for things like explain this piece of code that I don't understand, or explain this piece of code from a peer who did something really cool. You could also start to ask it to expand your current abilities, maybe by just a little bit, with requests like show me how to group by two columns.
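As a sketch of what an assistant might hand back for that last request, here's a pure-Python version of grouping by two columns and averaging. The data and column names are invented; in the R track this would be dplyr's group_by(site, polymer) followed by summarise().

```python
from collections import defaultdict

# Toy records a learner might be summarizing in week three:
# one row per measurement, with two grouping columns.
rows = [
    {"site": "A", "polymer": "PE", "yield_pct": 91.0},
    {"site": "A", "polymer": "PE", "yield_pct": 89.0},
    {"site": "A", "polymer": "PP", "yield_pct": 84.0},
    {"site": "B", "polymer": "PE", "yield_pct": 95.0},
]

# Group by the (site, polymer) pair, then average within each group.
groups = defaultdict(list)
for row in rows:
    groups[(row["site"], row["polymer"])].append(row["yield_pct"])

means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(means)  # {('A', 'PE'): 90.0, ('A', 'PP'): 84.0, ('B', 'PE'): 95.0}
```

The point of such a request is that it stretches the learner only slightly: they already understand single-column summaries, and the assistant shows how the same idea generalizes to a pair of keys.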

Let's fast-forward a few weeks. You're starting to feel more at ease in a Markdown notebook, plotting is becoming comfortable, something you're excited to show off to your colleagues, and you've started to get annoyed with some spaghetti code. Now it's time to write some of this into functions. That can be tricky, and you probably also want to document some of it as well.

Well, as many of you probably know, these coding assistants are fantastic when it comes to writing documentation. They can also help you fix some inefficiencies in your code. I've also found it's quite helpful to ask them to impersonate some of your favorite programmers. In fact, if I ask a coding assistant to write in the style of Hadley, it usually does a pretty good job. As an aside, if you ask it to write in the style of Max Kuhn, it ends up going back about a decade and heavily favoring caret, but that will come with time.
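As a hypothetical sketch of that refactor-and-document step: a small function pulled out of repeated notebook code, with the kind of docstring an assistant can help draft. The function name and behavior are invented for illustration; in R this would be a function plus roxygen2 comments.

```python
def normalize(values):
    """Rescale a list of numbers to the 0-1 range.

    Refactored out of copy-pasted notebook code so the same
    logic isn't repeated for every column.

    Args:
        values: a list of numbers containing at least two
            distinct values.

    Returns:
        A new list with the minimum mapped to 0.0 and the
        maximum mapped to 1.0.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([10, 15, 20]))  # [0.0, 0.5, 1.0]
```

Writing the docstring is exactly the kind of chore the assistant handles well, while deciding what to extract into a function remains the learner's threshold to cross.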

So let's fast-forward to the end. You're almost done with your curriculum, and it's time to showcase some of the great creations that have come about. Now's the time to really get on that bike, so to speak, and go a bit faster. Maybe it's time to bring some of these code assistants inside your IDE, or to pay for a GitHub Copilot license, or to go beyond the curriculum you've already been exposed to. Because these models have been trained on tremendous amounts of code, asking them about general ways people solve problems works well. Ask questions like: what packages should I explore next?

Threshold concepts for generative AI

So how can we change this experience for these learners? I think what we've covered is part of the answer, but to close, I want to go back to threshold concepts. Now, I'll admit that threshold concepts are supposed to be established ideas within a field, and that's certainly not true of generative AI capabilities overall. We're far from having clear, established ideas in this space, but that doesn't mean we can't try.

I'd love to hear from all of you what you think the threshold concepts are for these coding assistants. Some that come to mind for me: drive faster, but don't forget to steer, coming back to that bicycle analogy again. Or maybe prompting matters: learning to use these capabilities is a skill. But I'm sure there's much more. What comes to mind for you? Are these right? Do they resonate?

Establishing these threshold concepts I think will be critical in successful adoption of these tools, which will keep us relevant as coders, but also make sure we're preparing these new learners for the future.

I want to finish with a huge thank-you to all of the people who made our Academy program possible. Tony Sokolov is in the audience here; he's done a tremendous amount of work to make sure we identify the right people to participate in this program. This has been a tremendous collaboration with Posit that gets me excited to come to work, and, of course, to come here to the conference as well. Thank you all for your time, and I'm happy to answer any questions if you have them.

Q&A

Thank you so much, James. I actually have a question. We've been listening to several talks about how generative AI can augment your learning experience or your coding experience, just like you talked about. When you're a little more advanced with these concepts, at what point do you think it's reasonable or okay to use generative AI tools to learn new concepts?

Yeah, I think you can start quite early on, actually, especially if you can learn from examples. If it's something nobody has done before, that can certainly be a challenge, because you're going to be pushing the constraints of what the model has seen. But if it's a new concept for you, but not a new concept for the community, it's likely you'll be starting from a pretty good place.

The intuition I'd like to give, which I thought about including in this talk, is Clippy, right? You have this useful assistant on your desktop. Keep in mind that it has lots of context it can provide for you. Of course it can be wrong, but lots of my colleagues, other than Tony, are wrong as well, and I still work with them and still learn from them.

So we actually have some viewers in the virtual audience who don't know what Posit Academy is. Could you give a very short rundown? Absolutely. Posit Academy is an offering, unsurprisingly, from Posit: a 10-week, somewhat intensive apprenticeship program that takes you from little to no coding experience in either R or Python and gives you a curated curriculum to build the skills most relevant to your role. It includes the things we talked about, like summarizing data and creating visualizations, and there's a lot of customization that can be applied to that. Probably one of the more important parts of the whole program, aside from the wonderful mentors you get to work with from Posit, is that it's project-based. You get to apply what you're learning to a dataset that's as close as possible to real data relevant to your work.

So we have another question that's similar to what I asked before. You gave examples like "what packages should I explore next" and "suggest ways to improve my code." What do you see as the dangers of learners relying on an AI assistant for next steps?

Yeah, so you can certainly end up going in directions the field has moved beyond. I gave the example of Max Kuhn with caret, right? He would certainly recommend that you use tidymodels these days. So that's the sort of thing: maybe the documentation isn't fully up to date, and it's possible the code is going to be wrong. But it seems similar to looking at a Stack Overflow post that nobody has had a chance to upvote yet. The risks are different; they're now isolated to your own experience with these agents.

But the same rules apply, right? You want to be skeptical of code you're going to execute. These agents are getting better and better, though, and there's an incentive, especially for the commercial ones, to align with things that aren't going to get you in trouble.

And even from my own experience, I remember when some of the AI coding assistants weren't even able to access the internet, and things have improved incredibly since. So, good news; hopefully that trajectory continues.

I think we have time for one more question here. So, would Posit Academy benefit someone who already has a lot of coding experience but lacks confidence with coding?

That perfectly describes some of our initial cohorts: people who were very enthusiastic but didn't identify as coders or data scientists. They ended up being star students in the curriculum. There are also pluses and minuses to having people with this advanced practice. You can inspire cohort-mates who maybe don't have the same level of experience, but sometimes that can be intimidating. So you need to be mindful of the discrepancy within an individual cohort.

It doesn't have to be perfectly matched. But yeah, it certainly can give people a level of certification that maybe you don't otherwise get. If you're wondering whether you need that certification: Academy is great, but just getting a certificate is not the reason to do it. And for any of you who are feeling self-conscious about that, you're doing great. We all feel that way.

Thank you so much. Thank you so much. And we really appreciate your talk today.