
Hands-on Session: GenAI to Enhance Your Statistical Programming - Phil Bowsher & Cole Arendt

video
Oct 12, 2024
38:47


Transcript

This transcript was generated automatically and may contain errors.

Hey, this is Phil and Cole. We're excited to do a live session for you today. Hey Cole, how's it going? I think we are live. So let me go ahead and share my screen, and we're going to talk through a couple things today, set the context, and then jump in as fast as we can. We only have 30 minutes today, so we're going to go pretty fast, try to give everyone a chance to play with some of these new and exciting tools that are in the space. So let me go ahead and share my screen.

So back in 2023, so about a year ago, I went to Cole and said, hey, there's a conference coming up. Why don't we do a talk together and focus on all the exciting ways that GenAI is impacting the statistical programming and the pharmaceutical space. And so we got together and we wrote a paper. I'll put it in the chat box. It's called AI Exploration and Innovation for the Clinical Data Scientist. This was presented at the PHUSE conference, I think back in the February, March timeframe. And you can find the paper here, and I'll put this in the chat so everybody can get to it.

And what we did in this paper is we broke down all of the tools that people were using at that time to aid programmers. And there's a lot of interest right now in how we can support programmers, especially those coming from commercial software, and aid them in the transition over to open source, especially R, using Python and tools like that. And so that was the perspective of the paper. But we decided as we were writing it that there were two new and exciting areas that we jumped on. One was the idea of, can you use local models in addition to the public LLMs? And then also, if you do that, or even if you build and have access to other public models, how do you have an interface into that? And so we have some sections in here around creating Shiny interfaces as well as Streamlit interfaces for doing that.

And so over the course of, you know, four or five months this year, Cole and I gave a couple different talks about these topics. And what they kept converging on was the power of Shiny for interfacing into LLMs. And so, Cole, I know this is something we've seen a lot with groups that we work with and with people in the public. There was also Joe Cheng's talk at the Posit Conference a couple weeks ago, and Winston spoke about this. So why don't you take the attendees through this diagram that you put together and explain it a bit?

Yeah, this is my crazy brainchild. But the thing that I thought was really interesting is that OpenAI and ChatGPT went crazy viral, to the point that people who know nothing about software and technology and statistics were talking about it. And I think the thing that they really nailed is this little triangle that I developed here. It wasn't just that they created a cool model; that's the bottom right corner. They did create a cool model, right? They've made a lot of good progress there. But the thing that they really nailed is they created an interface that people could build on top of, and that's the top of the triangle. They created an API that folks can build against when they're building their own software tools. It's consistent, it's structured, and it's useful if I wanted to build some app that used this model.

And so that, I think, is part of what made them enormously successful, and I've done this before: you build a tool that uses their model through that API. And if they improve the model, all I do is click a button and my tool gets better. I don't have to do any rework. So that extensibility and that platform kind of structure was really big. But then the other thing that they really nailed, and this was part of the virality of it, was the user experience: this chat app where you could talk to the model and be like, wow, this thing's really smart. That was, I think, really innovative, and it was what kind of made the power of the model click for everyone.


And so these three things all kind of work together really nicely. And this is kind of the worldview that we think about a lot when we think about data products, that you want to think about how your users are interacting with what you've built. A lot of times they don't care, you know, how you built the model or how you tuned it. They want to have an interface that makes sense and provides value to them. But then the other thing is to build these interfaces so that you don't have to rebuild your UX every time the model changes. And so that's kind of the API interface and more kind of structured interactions. So anyways, all three of these go into a good data science ecosystem, in our opinion.
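The structured interface described here is, at the wire level, just JSON over HTTP. As a stdlib-only sketch (not code from the talk; the endpoint path follows OpenAI's chat-completions convention, and the model name and URLs below are placeholders), talking to any OpenAI-compatible backend looks like this:

```python
# Sketch: the "API corner" of the triangle. Any backend that speaks the
# OpenAI chat-completions wire format -- OpenAI itself, a proxy, or a
# local model server -- works with the same client code; only the base
# URL changes. Model name and URLs are placeholders.
import json
import urllib.request

def build_chat_request(base_url: str, prompt: str,
                       model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build the POST request for a single chat turn."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(base_url: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request shape is fixed, an app built against this interface keeps working when the model behind the URL improves, which is the extensibility point made above.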

Shiny for Python and LLM interfaces

And I think, as the talks evolved, what Cole and I did was focus on the landscape for GenAI and the open source ecosystem. Then we migrated that over into a hands-on workshop. And the last talk that we gave, in June, was about how you use the tools that are on my screen for building Shiny applications. We made videos of almost all of these, so we're going to leave those to you. And if there's some time in our 30-minute session today, we'll try to show you how to get up and running with some of these. But what we want to focus on today is this UI space: how do we create custom interfaces into LLMs, into models?

And so there's a lot of great ways to do this. There's Shiny for R, which has been around for many, many years. It's really changed the game and it's having an awesome impact on the pharmaceutical space; if you've seen Teal or other tools like that, it's really, really great. Oftentimes there's also interfacing with Python, and you can use an R package called reticulate to help with the interoperability of Python and R working together. But what's exciting and what we want to talk to you about today is Shiny for Python. Shiny for Python came out, I think, about a year or two ago at our conference. And so what we want to talk about today is how you get up and running creating a Shiny for Python application, especially in this realm of LLMs and interfacing with custom LLMs.

Maybe it's your own backend that you've built using the cloud and LLMs. Maybe you have local models that you're wanting to interface with. Regardless of what it is, Shiny is a great UI experience, and you can do that both in R and in Python. And that's what we're going to look at today. So, Cole, with that, why don't we transition over to your repository and show them how they can get up and running with it and play with it today using some of the new tools at their disposal.

Getting started with GitHub Codespaces

All right. So there's a GitHub repo that we want to start with, and in order to do the hands-on part of this workshop (again, 30 minutes is a pretty quick turnaround), you are going to need a GitHub account. The URL is github.com, slash my name, Cole Arendt (sorry, the last name is a little different), slash shiny dash chat.

And there are two examples here that we'll get started on. But as most of you probably know, you need a place to work, and that's a kind of pernicious problem with demos and workshops and stuff like that. And so what we're using today is Codespaces. So you see this button here at the top of the repo for opening GitHub Codespaces. If you click on that, it will use your GitHub account and say, hey, do you want to open this in a Codespace? I've already done this, so I have a Codespace that I can resume; you probably will have to create a new one. The default settings are perfect. And this is a GitHub product; there's a free tier, and then I'm sure there's paid stuff too. But basically the idea is it allows you to use VS Code directly in your browser.

And so that's what I'm going to do. I'm going to resume the Codespace. It's actually loaded right over here, but I'm just going to reopen it. And you should see something very much like this when you get there. Yours is going to take a little bit longer, unfortunately, because, again, I had the foresight of doing this. It takes like three to four minutes, and basically what happens is GitHub allocates you a URL. It'll look something like this, some prefix of names with some gobbledygook. And this is your private interaction environment.

There is some shared stuff that you can do. And then what it's going to do is it's going to provision you a place where you're running this environment. It's going to clone the repository and all of its code. It's also going to do some work for you on installing the Python packages that we're using, as well as the VS Code extensions. So if you're not familiar with VS Code, there's plenty to learn here. But we added some Python extensions, the Shiny extensions, which are fairly new. And then we also added something to provision the Python packages. So it should take just a few minutes.

I don't really have any method of feedback to know if anybody's running into trouble, so let me use my instincts here. Maybe give everyone a minute or two, especially if they're logging into GitHub or if they're new to GitHub. So a couple of quick questions and things to point out here, Cole. VS Code is this extensible IDE, and probably a lot of people on today's session are used to either using some type of commercial IDE or RStudio. Maybe explain a little bit: why are we using this for Shiny for Python?

Yes. So VS Code is an IDE written by Microsoft. There is a licensed version of it from Microsoft, and then there's an open source version. And so VS Code is a really great IDE. It's good for Python. It's common in the Python ecosystem. And so that's kind of the primary reason. RStudio does have a bunch of Python features, but it is definitely not a Python-first kind of ecosystem. But the other thing is that VS Code is something that can be built on top of. And so that's where you have all of these extensions.

So to answer the question there in the chat: you can install extensions if you want. There are actually some cool features inside of GitHub where you can install a bunch of your default extensions for your user and stuff like that. But VS Code is sort of bare without extensions, and so you add all these extensions. And that's something we at Posit are working towards: trying to make an IDE that has all of the goodness of VS Code without the kind of DIY, build-your-own-IDE experience. And so that's Positron.

So I don't know if you wanted to chat about that a little bit, Phil. Yeah, I think the key thing is that when I go into RStudio, it's kind of what I need it to be, and when I go into VS Code, I kind of need to make it what I want it to be. We're working on that with Positron. It's a new IDE from Posit. So I do think, probably going forward, there'll be an IDE like this that we'll use, and I'll put that in the chat box as well. Currently, it's supported as a desktop version. And it will be a very similar experience, but whereas VS Code is built mainly for software development and software engineers, we're taking that and building this for data scientists so that they can have that multilingual experience.

Exploring Shiny for Python examples

So the other thing, this is where we wanted to transition to the LLM world. The interface that everybody's familiar with is the chat interface, and so that's what we're going to do. And this is, again, one of the huge benefits of open source, right? I'm totally just stealing a package that one of my teammates built: Winston Chang, if y'all know him.

So, yeah. So this chatstream package is a user interface for building a chat application, and it's what everybody's used to. The idea is you can do it yourself; what you have to do is just wire up a back end. And so that's the piece here. You might be like, wow, can we really do something meaningful in 20 lines of code? Yes, you can. But we need something else first, and the first thing is that we need an OpenAI URL.

So you should run this line of code in the terminal if you're following along with us. Basically, the idea is, remember, we talked about that little triangle. There's an API that the user interface we're about to build can talk to, and that has a model behind it: it has OpenAI behind it, ChatGPT. But I removed all the authentication; I'm paying for your API requests for the next 15 minutes. And so if you do that, you should get a little .env file here. That's going to get loaded up, and we're going to use this magic URL to use ChatGPT.
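For reference, the .env file mentioned here is just KEY=VALUE lines, loaded so the app can read the URL from the environment. Real projects typically use the python-dotenv package; the following is a stdlib-only stand-in (the variable name OPENAI_URL matches the demo, but the value is a placeholder, not the demo's magic URL):

```python
# Sketch: loading a .env file into the process environment so the app
# can pick up OPENAI_URL. A stand-in for python-dotenv, not the repo's
# actual loading code; the URL value is a placeholder.
import os

def parse_dotenv(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

env = parse_dotenv("# written by the setup script\nOPENAI_URL=https://llm.example.com/v1\n")
os.environ.update(env)  # the Shiny app can now read os.environ["OPENAI_URL"]
```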

So the idea is, once you have that .env file and you go to run your Shiny application, you should get a chat interface with your model. And remember, I made the choices about where the API is, where the model is, what model we're using. I made all those choices for you. But the whole idea here is that you're in control of the interface. And so if you were doing this in your organization, you could swap out what model is being used. You could swap out where the API is, whether it's hosted inside your organization or outside it.

So I got it. I just got no CSS. So you can see the model is working. I just don't have any CSS loaded. So it's not as pretty as it should be. That's amazing. It was working fantastically earlier. But you can see I'm interacting here. I'm talking to a model.

So yeah, this is the idea, right? I wrote code, and it pulled in this package that builds a chat interface with a model. Where is that model? It's behind an API, an API that I'm hosting that has ChatGPT behind it. So if you want to do this after the next 15 minutes, you can go into the readme of this repo, and it talks about setting up a direct connection, like talking directly to OpenAI and their hosting of the model. In order to do that, you have to have an API key and an account and all that shenanigans.

So anyway, so that's the case for this. But this is still only in my local development environment. I can use this thing, but nobody else can. And so what we wanted to show you also is how to deploy this thing so that anybody can use it.

Yeah, I think the idea here is that you're at a pharma, and you have built some really cool stuff on the back end. Maybe you're using tools at AWS, or you have your own vector database, or maybe you have a local model, or something from Anthropic, or maybe you're using ChatGPT. And you want to do what ChatGPT did, what OpenAI did: you want to build a simple interface for employees at your company to use the power that you have on the back end. And Shiny for Python is a great interface and tool to do that. So what we want to show you today is that, without much sophistication, you can build an interface into these back ends that you have inside your company, or in your partnerships or relationships with these larger GenAI companies.


Deploying to Posit Connect Cloud

So if you go to connect.posit.cloud, there is a little bit of a login dance you have to go through. If you haven't been before, you will have to go through that, and you have to go through GitHub. So again, you're using your GitHub account. And the reason we do that is because of how you publish. This is what I was saying: we don't necessarily need a publishing extension; we can publish directly from Connect Cloud.

And what you do is you go in here and say, hey, I want to deploy a Shiny application. And then you choose your repository, and you can see I can deploy any of the different entry point files inside of the repository. So I want to deploy my chat application. And you can choose your version of Python, all that jazz. I'm also going to set my OPENAI_URL here with my magic URL.

So one thing I want to point out is I just put the link to the publishing tool that we're going to use in the chat box. So if you'd like to try it out, the quickest way since we've got about eight minutes to play with this is to use your GitHub credentials. So you can log in with the GitHub credentials. That's right, Cole.

And then, I think this is really cool. Once you've got that page, all you have to do is point it to Cole's repository, make sure it's pointed to the main branch and the app.py file, and then set the configuration variable: that OPENAI_URL that we set in the terminal, we're going to set here in this interface. And then, Cole, without further ado, go ahead and kick it off.

Yeah. Yes, we can publish, and now we're sharing this out with the world. And I really love how stinking fast this thing is. Anybody who has sat around watching dplyr compile will just absolutely love how fast this is, because it's done already, which is insane. And so now this is a URL that anybody can use.

So cool. Why don't you see how it does at creating a Shiny for Python app.

Yeah, so I will. I will confess, I played with this earlier today. And it struggled. It kept creating Streamlit apps and Dash apps. Yeah, see, it wants Dash. But if it makes you feel any better, I also found some other funny things. And, oh, wow, it got it. Good job. But check this out. It's not very confident, though. It was convinced it was Saturday, the last time I was talking to it.

So I think this connects back to the beginning, where at the very beginning of the workshop, we showed five different ways to create Shiny apps with GenAI. One of those, and probably the most popular one, is Copilot. And I'll put a link in the chat box on how to set that up inside of RStudio. Originally, we were going to show that, but it's pretty straightforward: you just go into the global options and enable it, and then you have a GenAI programming assistant right inside of the RStudio environment. There's another tool called chattr that lets you use OpenAI. But, you know, with what we mentioned about Positron earlier, there'll probably be some nice extensions for that too.

Wrapping up and key takeaways

I really want to highlight this. There are a couple of things, I think, you could take away from this. There are probably people in the audience who are like, well, it's not like we did that much, right? And you can't do that much in 30 minutes. But the key is that what we're trying to show you is building blocks that you can then use to build something awesome. The idea is you bring the awesome to that equation, right? But these building blocks are really good tools.

And somebody made a comment about, you know, can this be used with such and such model? And the thing I want to highlight is, again, this is just using a structured interface that OpenAI made popular. Any model that uses that interface, or any random API that somebody like me wrote that happens to use that interface, which is what this is, it's a random thing that I cooked up one night, will work. You just swap out that URL and point it somewhere else. It's like, hey, this is my other API that is a front end, a wrapper around Bedrock or whatever other model you want to use. And so the idea is to identify these building blocks and then piece them together to build something really cool that makes an impact on your company, the world, whatever. You've got to bring the ideas, though, because these are just building blocks that you can then put together.
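The "just swap out that URL" point can be made concrete by treating the endpoint purely as configuration. A small hypothetical helper (the function name and all URLs are illustrative, not from the talk):

```python
# Sketch: the endpoint as configuration. The UI never hardcodes a
# vendor; deployment chooses the backend by setting OPENAI_URL.
# All URLs here are placeholders.
import os

def resolve_chat_endpoint(base_url: str) -> str:
    """Normalize a base URL into its chat-completions endpoint."""
    base = base_url.strip().rstrip("/")
    if not base.startswith(("http://", "https://")):
        raise ValueError(f"not an http(s) URL: {base_url!r}")
    return base + "/chat/completions"

# Swapping OpenAI for a Bedrock wrapper (or a local model server)
# is a one-variable change, not a UI rewrite:
endpoint = resolve_chat_endpoint(
    os.environ.get("OPENAI_URL", "https://api.openai.com/v1")
)
```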


I think that where we'd point you next is that Joe and Winston gave some amazing talks at the Posit conference that we had a couple weeks ago. I don't know if those videos are made available yet, but I think the plan is to make them freely available on YouTube and definitely watch those. I've been interacting with pharmas a lot the last couple weeks and they have been starting to use those concepts and tools, using Shiny for doing data analysis and also using Shiny to help you build Shiny apps. And so there's some really great talks.

But hopefully today highlighted how you can get going creating Shiny for Python apps and then also how you can use that in the space that you have around Gen AI and LLMs, which seems to be a popular topic. So I think we have about two or three minutes for questions. And if anybody wants to post those in the chat, I can ask those out loud for Cole and myself, and then we will pass it to the first speaker for today.

Yeah, so somebody, I think, is asking about the cloud hosting. I know that connect.posit.cloud is pretty much brand new, Cole. It's free, I think, right? There's a free tier, and maybe paid tiers also. Do you know much about that yet? I don't. Yeah, I'm pretty sure it's free right now. Same thing with where we were developing earlier; this has a free tier as well. It's hosted. The thing is, with most of these services, you're using somebody else's computer, basically. If anybody's ever seen that sticker: there is no cloud, it's just somebody else's computer. So a lot of times they'll let you use their computer up to a point. When you start trying to train a thousand models and that kind of stuff, they're like, hey, pony up, it's time to pay. So anyway, there's definitely a free tier.

Connect Cloud right now is free because it's in alpha. But yeah, lots of good tools out there, lots of things that you can download for free onto your desktop. You do want to be careful with hosted services, because you don't want to rack up a big bill or anything. I think the thing here, Cole, is we wanted to show you what it's like to publish, but most people inside their company are going to have strict controls around publishing and how they do that. So you probably want to build these Shiny for Python apps in a managed space where you also have a publishing environment or platform that you use. But hopefully this gives you an idea of how you can ship things off onto the web, whether it's internal or external.

And somebody asked, Cole, how do I start the app after I created the .env? Yeah. So when you want to run the app, once you have the .env file, there's this helpful little run button, and that is made possible via the Shiny extension. So if you don't have that extension installed, you want to install it, and it'll recognize that, hey, this is a Shiny app, and you can run it. But if you notice what happens here, it's just a Python command, right? python -m shiny run, and then a bunch of arguments. So you're using Python to run the Shiny app, but the little run button makes it a lot easier.

I've got two quick comments here. So someone asked about how secure is the private data? Because a lot of times, the idea with LLMs is working with your own PDFs, your own documents. And that's why Cole and I originally explored the local models, but some organizations now have relationships with the big tech companies where they're doing these types of things. So typically, inside your company is where you're going to have a lot of this software in managed VPCs, virtual private clouds. So the security is just going to be inherited based on the IT team and how they set things up. And usually, you just take Posit software there or wherever you're managing things in-house.

So the next talk is starting; I think Cole and I can hang out while everybody transitions over. The talk that Cole and I gave at PHUSE sparked a lot of this and was originally the idea for the conference, and the next speaker, from Roche, is going to talk about what Roche is doing internally to create chatbots to help people with programming. So I'd encourage you to leave this session and jump over to the next one, where we will be kicking things off for the conference today. And thank you so much for coming. Hopefully, this gave you some cool ideas on how you can use Shiny. And we will see you in October for R/Pharma coming up. So thanks a lot. Thanks, y'all. All right. See you later.