Resources

Getting Started with LLMs in R and Python: Tools and Best Practices - Jan. 8, 2026

video
Jan 8, 2026
55:45

Transcript

This transcript was generated automatically and may contain errors.

It's great to be here talking to you all, and hopefully we can chat a bit more interactively after I talk for a bit. So today I'm going to talk about getting started with LLMs in R and Python. I'll discuss various tools for doing that and then some best practices.

If you're already familiar with working with these APIs from R or Python, the beginning will be a bit of an intro, but hopefully towards the end there will be some more information that's relevant to you. And if you haven't done anything like this in R or Python before, this should hopefully be a good intro. But like I said, please ask questions when we have time. I'm happy to answer anything.

Great. So to start, I'm going to talk about how to get started with ellmer and chatlas, two packages for working with LLMs from R and Python. Then I'm going to talk about how you might add new knowledge to LLMs with system prompts, and talk about tool calling, which is a way to give LLMs new abilities. I'll briefly mention some AI tools for doing data analysis, and then we'll also talk about some privacy and security concerns at the end.

Great. So you might already know this, but most LLMs are accessible through an HTTP API. So this is just like a normal API: if you've ever queried an API to get any kind of data, it's pretty much the same idea. And this means that you can interact with LLMs from languages like R and Python. So instead of using something like ChatGPT in the browser, you can interact with an LLM from R or Python or any programming language.

And there are a variety of packages for doing this. Posit has two that make it pretty easy to interact with LLM APIs from R and Python. For R, it's called ellmer, and for Python, it's called chatlas. And you don't really need to know much about how LLMs work to get started, but there is just a bit of setup, and this might differ depending on your specific organization and the models you have access to. But generally, you need an API key or some way to authenticate to these models. If it's an API key, you're going to add that to your .Renviron file if you're using R, or a .env file or something similar if you're using Python, and then install the relevant package. That's pretty much all you need to do to start interacting with an LLM from R or Python.
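To make that setup step concrete, here's a minimal Python sketch that checks the key is configured before you create a chat object. The helper name and error message are our own invention; ANTHROPIC_API_KEY is the variable Anthropic-backed chats conventionally read, but check your provider's documentation for the exact name.

```python
import os

def require_api_key(name: str = "ANTHROPIC_API_KEY") -> str:
    """Fail fast with a helpful message if the key isn't configured."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to .Renviron (R) or a .env file "
            "(Python) before creating a chat object."
        )
    return key
```

Failing early like this gives a much clearer error than letting the first chat request die with an authentication failure.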

Getting started with ellmer and chatlas

Okay. So let's see what the code actually looks like. I'll show you R right now, and I'll show you Python in a second. There's really only three lines of code that you need to start chatting with an LLM from R. First is just loading the library, and then on line three, we have sort of the core capability in ellmer, which is this chat function that sets up a connection to an LLM. Here I'm setting it up to an Anthropic LLM. This is going to choose a recent Claude model by default. So you create that chat object. This is going to control our communication back and forth to the LLM. And then to send a request to it, we call the chat method on the chat object. And I'm going to just send it something. I asked it to summarize Picasso's style in five words.

Let's see what it says before we move to the Python code. So it gives us back a response as a string. With just a few lines of code, we've already set up this connection to the LLM, and it's given us back a response. This is what it looks like for Python. This is the chatlas code. You'll notice that it looks pretty similar to the R code. The ellmer and chatlas APIs are very similar.

What if we want to continue the conversation? Usually you don't want to just send one request to the LLM. You want to have a back and forth conversation. So now we call that same chat method on the chat object. We're sending another request off to the LLM. Let's say now Georgia O'Keeffe. It remembers what we had said previously in the conversation, and so it gives us back a similarly formatted response: five words describing her style. So importantly, every time you call chat on that chat object, the thing that's controlling the communication back and forth to the model, it is going to continue the conversation. It has the entire context of your previous conversation. And so you can do things like say now Georgia O'Keeffe, and it knows the previous query that you sent and is able to adapt its response. Again, the Python code looks very similar. It's a very similar idea.

If you want to switch providers or models, these packages make it very easy to do so. For ellmer in R, you can pass the chat function a string with a model provider. And if you want to specify a specific model, you can use the provider-slash-model syntax. So that's the second example here, where it says openai/gpt-5: we're requesting that a connection be created to GPT-5, which is provided by OpenAI. In chatlas, you just use a specific function for each model provider, but it's a similar idea. So this means that with these packages, you can connect to a wide variety of models. And there are also functions for connecting to local models. These are models that you would download to your laptop, or that someone has downloaded to some server, and then you want to interact with that.
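To show the provider-slash-model convention concretely, here's a small Python sketch of how such a string splits into its two parts. The function is purely illustrative; it isn't part of ellmer or chatlas, which handle this internally.

```python
def parse_model_spec(spec: str):
    """Split 'provider' or 'provider/model' into (provider, model).

    Mirrors the string convention described above; when no model is
    given, the packages fall back to a provider-chosen default.
    """
    provider, sep, model = spec.partition("/")
    return provider, (model if sep else None)
```

So "openai/gpt-5" names both the provider and the model, while a bare "anthropic" leaves the model choice to the package's default.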

Advice for getting started

Okay. So this is our sort of normal advice to people who are getting started working with LLMs. This might not apply to you depending on what kind of data you work with and the constraints of your organization, but I thought it would be useful to go over anyway. First, we generally recommend that if you're starting to play around with these things, or build something in R or Python that interacts with an LLM, you start with one of the frontier models, meaning the latest models from these large AI labs. These tend to be the most advanced models available. The reason we say this is not loyalty to these companies, but that if you want to know what LLMs might possibly do, or what you might be able to build with them, it's a good idea to start with something very powerful and then work your way down to something smaller if your use case allows it.

Local models, again, are models that you download to a computer so you're not doing any querying over the internet. They're very enticing. They sound like they solve a lot of problems. Unfortunately, even the best ones right now are not quite at the level of the paid models. So you may need to use a local model, or you may want to. I would just note that if you're just experimenting and not working with any proprietary or protected data, you might want to start by seeing what the frontier models can do and then work your way towards cheaper, smaller, or local models. The other thing is, if you're new to working with LLMs or with these APIs, it's often good to get a grasp of how these things work first and think about security and data privacy later. So don't try them out first on a project where you're really worried about data leaking or data privacy. We'll talk about privacy and security at the end as well. Pick a project where you're not overly concerned about that, just so you can get a handle on how the APIs work, and then you can move on to something more realistic.

Adding knowledge with system prompts

Okay, so I showed you a basic chat, but let's move on to a slightly more advanced concept. LLMs can do a lot, but they don't know everything. They're trained at a particular time, and they have a set of knowledge that is contained within the model. But you might have additional information that you want the model to know: newer information, knowledge that only you or your organization possesses, real-time information, all of that. So one way to handle this is by adding knowledge to the model. In this way, you can give it context about a specific domain, or data that it does not have.

Okay, so to do this, to tell the model something it doesn't already know, our recommendation is to start with system prompt customization. This is relatively easy, effective, and pretty fast. A system prompt is essentially a set of instructions from you to the model that controls its behavior. The system prompt will shape every response that the model gives in a particular chat or conversation. So you can put instructions or background knowledge in the system prompt, and in this way you can exert quite a bit of control over how the model behaves. You might have heard of things like fine-tuning or RAG. Those are useful, but they're techniques that take a bit of finessing. So our recommendation is to start with system prompt customization, and if that doesn't fulfill all your needs, you can move on to other techniques. Let's take a look at what this looks like.

So for ellmer and chatlas, to provide a system prompt, you're just providing a string to the chat function. So here I'm telling it to behave as if it's a data analyst who answers very briefly. It looks similar for chatlas. And again, this system prompt will persist across the entire conversation.

In real life, you'll often have system prompts that are very long. So our recommendation is generally to store them in markdown files. You can put your entire prompt in one markdown file, or store it across various files, whatever works best for your organization.

So here's one example. This bit of code at the top is just constructing a system prompt. One advantage of storing prompts in markdown files is that you can reuse documents you already have. Maybe you have a clinical trial protocol that lives in a document that you're using for other things as well, and now you can provide that to the LLM in the system prompt. So I'm just reading in the file and then pasting it together with a bit of text, and this forms a system prompt that contains the entire protocol plus the instruction to answer questions about that protocol. And again, we pass that to the system prompt argument. So now this LLM has knowledge of our specific clinical trial protocol that it did not have before, which means we can ask it questions like, can we enroll a patient with particular characteristics? And it will give us back an answer informed by that system prompt. So even though this is essentially just a text file, it can be really powerful: you can really control the behavior of LLMs just with text, just with these system prompts, as well as give them new information.
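In Python, that read-then-paste pattern might look like the sketch below. The file name, function name, and instruction wording are all hypothetical; the returned string is the kind of thing you would pass as the system prompt when creating a chat.

```python
from pathlib import Path

def build_protocol_prompt(protocol_path: str) -> str:
    """Wrap a protocol document in an instruction to form a system prompt."""
    protocol = Path(protocol_path).read_text(encoding="utf-8")
    return (
        "You are an assistant that answers questions about the clinical "
        "trial protocol below. Answer briefly.\n\n" + protocol
    )
```

Because the protocol lives in its own file, the same document can keep serving its other purposes while also feeding the prompt.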

Okay. A few tips and tricks for working with system prompts. First, just like if you were explaining to another human, you really want to clearly explain what you want the model to do. The more explicit you can be, the better. Examples are also very useful: if you have a specific way you'd like the model to answer, provide examples of that. You can also add structure, as well as use LLMs themselves to help you draft your prompts. Claude, for example, has a prompt generator that can help you write a system prompt.

Tool calling

Okay. So we just talked about how to give LLMs new knowledge. What about giving them new abilities? This is where tool calling comes in. Tools let LLMs add new skills to their tool set. This lets them take action on the world, like affect your environment, make changes to files, run code, that kind of thing, and also gain access to real-time information. But before I talk about tools, we're going to talk a little bit about how LLMs actually work and clarify what they can and can't do. On their own, can LLMs access the internet, run code, send an email, do any of this stuff?

Let's see. So what if you just ask the LLM, from ellmer, what's the weather like in Seattle? Implied is like right now, not four years ago or whenever the model was trained. So if I run this code in R, it tells me: I don't have access to real-time weather data, so I can't tell you the current weather conditions in Seattle, Washington. This is because LLMs don't have access, just in their core technology, to real-time data. They were trained at a particular time on a particular set of information, and that's all they have access to, unless you've given them access in another way. Okay. What if we ask it to affect the world in some way? For example, what if we ask it to write data to a CSV file? Again, it basically says it can't. It says it's not able to create or write files on a computer; it can only provide code and text responses. So they don't have the ability to access up-to-date information or the ability to affect the world. And this might seem a little off to you, because you have probably seen an LLM access real-time information or affect the world in some way, or have heard that they can basically give you access to real-time weather data.

And the reason is that you can hook LLMs up to tools to give them these abilities: the ability to access real-time information or affect the world in some way. Tools are essentially extra capabilities that you give the LLM. At its core, a tool is essentially a function and some metadata. If you've ever written an R or Python package, it's kind of like writing a function in a package and then documenting it. That's all you really need to give the LLM a new ability: a function, and then metadata telling it how to use that function.

So let's see what it looks like. I'm just going to go over this example in R, but it works very similarly in Python, and I'll give you the link to the documentation for Python after this. Okay, so let's go through this. First, we need to define the tool. This is the tool function from ellmer, and it allows us to define a tool that we're going to hand to the LLM to give it a new ability. Again, I said a tool is a function plus documentation. So this line is the function: we're giving it a bit of code that it's going to request to be run. In this case, it's a function that queries a weather API for a particular latitude and longitude. And then we give it the metadata: a name for the tool, a description, and the arguments. Again, this is essentially like a function and its documentation; we're telling the LLM how to use the tool. After that, we register the tool. This lets the LLM know it exists and that it can request to use it. Okay, so once we run this code, we can ask the LLM, what's the weather in Seattle? And it actually gives us a real response. You'll notice that in the output, it shows that it did a tool call: it requested that the get_weather tool be run for a particular latitude and longitude that the model chose. It received a bit of data back with the current weather in Seattle, and then it used that data to inform its response back to the user.

Okay, so importantly, with tool calls, the LLM itself is not running code. It's not that it took a bit of our code and ran it on OpenAI or Anthropic servers, or in the cloud, or somewhere else. This code is running on your laptop, or wherever you've set up this conversation, because LLMs cannot run code by themselves. They essentially just take in text and output text. But they can request that code be run, and that's what's happening here. The LLM controls when the tool is called: it decides when in the conversation it is appropriate to find out the weather for a particular location. And it controls how the tool is called, which mostly means choosing the right arguments. You might have noticed in the previous slides that we didn't need to tell the model what latitude and longitude to use for Seattle. It knows that information, so it can choose the correct latitude and longitude to pass to the tool. It's controlling how the tool is called, but the code is still running on your laptop, or on whatever computer you've set up this chat. And again, tool calling in Python works very similarly. Here's the link to the documentation for chatlas tool calling.
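That division of labor can be sketched in Python without any model at all: a tool is a plain function plus metadata, and the host program, not the LLM, executes it when the model asks. The names, metadata shape, and fake weather data below are illustrative, not the real ellmer or chatlas API.

```python
def get_weather(latitude: float, longitude: float) -> dict:
    """Stand-in for a real weather API call; this runs on *your* machine."""
    return {"latitude": latitude, "longitude": longitude, "temp_c": 12.0}

# The metadata is the "documentation" half of the tool: it tells the
# model what the tool does and what arguments it accepts.
TOOLS = {
    "get_weather": {
        "function": get_weather,
        "description": "Get the current weather for a latitude/longitude.",
        "arguments": {"latitude": "number", "longitude": "number"},
    }
}

def handle_tool_call(name: str, args: dict) -> dict:
    """Run a tool the model *requested*. The model only chooses the tool
    name and arguments; the code itself executes here, on the host."""
    return TOOLS[name]["function"](**args)
```

In a real chat loop, the result of handle_tool_call would be sent back to the model as text, which it then uses to compose its answer.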

LLM performance and evaluation

Okay. At this point or maybe previously or all the time, you might feel kind of apprehensive about LLMs. Maybe kind of like this cat encountering a Roomba. And maybe a little bit curious as well. You might have heard that LLMs can do cool things or very powerful things, but they also hallucinate, that they have privacy and security concerns, that they're wrong a lot, that they're untrustworthy, all of these things. So I'm going to talk a little bit about what LLMs are actually good for and maybe what you should avoid.

And the first thing is that LLMs are jagged. LLM performance is in some ways very unpredictable. You might think that LLM performance looks a bit like this plot, where really easy tasks are very easy for the model: it performs great at them. But then as tasks get harder, model performance goes down, and really hard tasks are nearly impossible for LLMs. But that's not really how it works. Instead, the plot probably looks a bit more like this. There are some easy tasks that models are terrible at, some easy tasks that models are great at, hard tasks that they're great at, and hard tasks that they're bad at. For example, coding is often very difficult, and models can be very good at coding. However, they are very bad at counting. You might have seen the example that they don't know how many R's are in strawberry. If you give them an R vector, they're actually very bad at counting the number of items in that vector. And so because of that, I generally encourage you to think empirically instead of theoretically about LLMs. It's okay to kind of treat them as black boxes where you don't really know how their internals work. Instead, try it out and, through experimentation, figure out if they're going to be good at a given task instead of trying to theorize from your armchair. Because you might be surprised both ways. There are things that you'd think they could almost surely not do that they can clearly do today, but there are also things, like counting, that you'd think surely they can do today that it turns out they're terrible at. In general, I'd say experimentation is much more useful when thinking about LLMs than intuition about how the model actually works.

To help you do that, there are R and Python packages that can help you build evaluations for LLMs. These are essentially experiments that you can run to assess how well an LLM does a particular task. For R, there's an entire dedicated package called vitals, and there's also built-in evaluation support in chatlas for Python. Evaluations can help you compare models, compare prompts, ensure your tools are working as you want, all of this. It's very useful. Along with my coworker Simon Couch, I've been writing a series of blog posts looking at how well various LLMs generate R code using the vitals package. And I made this little app, which you can see the link to here, where you can compare how well various modern LLMs do on a particular R code generation task. I'm mostly giving this to you as an example of vitals; you can see the code behind the app if you click on that link. Okay, so again, my recommendation is to not just guess whether a model is good enough, but to test it on your actual use case and measure how well it does. And vitals and chatlas can help you do this.
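The core idea behind an evaluation can be shown with a toy Python harness: run the model over known cases and score the results. Real frameworks like vitals add graded scoring, logging, and model-based judges; everything below, including the function name, is a simplified stand-in.

```python
def run_eval(model_fn, cases) -> float:
    """Return the fraction of (prompt, expected) pairs the model gets right.

    model_fn is any callable from prompt to answer -- in practice it
    would wrap a real chat call to whatever model you're testing.
    """
    passed = sum(1 for prompt, expected in cases if model_fn(prompt) == expected)
    return passed / len(cases)
```

Swapping in a different model or a different system prompt and re-running the same cases gives you a direct, empirical comparison instead of a guess.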

AI tools for data analysis

Okay, what about AI tools for actually doing data analysis? So far, we've been talking about packages that let you build things that use LLMs from R and Python, but what about tools for actually carrying out data analysis? Posit has built several AI-powered tools for doing data science, and I'm just going to briefly introduce them here. The first is Positron Assistant. Positron, if you haven't heard of it, is Posit's next-generation data science IDE. If you've used RStudio, it's a similar idea in that it's built for working with data, but it's designed for both R and Python. Positron Assistant is built specifically for data science workflows: it can generate code, refactor code, debug, and answer questions for you. So it's a coding assistant, versus DataBot, the other agent built into Positron, which is specifically for doing exploratory data analysis. DataBot can help you explore and visualize your data, doing in minutes what might have previously taken you hours. And it's designed for really short feedback loops. It doesn't work autonomously for 10 minutes and then come back to you with a summary of your data. Instead, it's more interactive: it does one thing, gives you the results and plots, and asks you what you want to do next. You send a request off, it comes back. So it's a little bit more collaborative. And again, this is available in Positron. DataBot and Positron Assistant are really not designed to replace your judgment or your domain expertise, but just to accelerate your workflow.

Privacy and security

Okay. At this point, you might be wondering, are these tools and similar ones safe to use? Positron Assistant and DataBot and tools like them can execute arbitrary code and see your files. This is where they get their ability to do powerful things like explore your data, help you write code, all of that. But this means that they can see your data, they can see your files, and they can run code that queries that data, moves your files around, all of that. And they can incorporate the information they gain into queries sent to the LLM. So this means your data is going somewhere, and it is being viewed. But let's talk about exactly how that works for a bit.

Okay. So now I'm going to briefly talk about privacy and security concerns. I know this is especially relevant if you're working with any kind of protected data: health data, HIPAA data, any of that. First, I just want to briefly clarify the difference between a model provider and an LLM, because this turns out to be important when talking about privacy and security. The model provider is the company that hosts your model, the one you might be paying or where you're getting your model from. This is something like Anthropic or OpenAI, or if you're using a local model, it might be something like Hugging Face. And this is in contrast to the LLM itself, the actual model that generates the responses to your queries.

Okay. This is important because LLMs themselves, the actual models, are stateless. They don't have any memory; they don't remember your prior requests. If I send some data over to an LLM, it is not storing that anywhere. It takes in that input, generates a response, and sends that response back to me, but it does not store that information anywhere, because of how the LLM itself works. They're stateless. You might be saying: I know that's not true, you can carry on a conversation with them and they will remember my prior requests. And this is because each time you send a request to the LLM, it gets the whole conversation. So if you've had 15 back-and-forths, at the 16th request it's sending all of those previous turns to the LLM. This happens each time: it's sending the entire content of the conversation. And you have to do that because of the statelessness of the LLM.
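That full-history mechanic is easy to demonstrate in Python with no network at all. The class below mimics what a chat object does client-side; fake_model is a stand-in for the HTTP request to the provider, and the whole thing is a teaching sketch, not any package's real implementation.

```python
class ChatHistoryDemo:
    """Shows why chat feels stateful even though the model is stateless:
    the client stores every turn and resends the whole list each time."""

    def __init__(self, fake_model):
        self.fake_model = fake_model  # stand-in for the real API call
        self.history = []

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        # The *entire* conversation goes over the wire on every request.
        reply = self.fake_model(list(self.history))
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

Each call hands the model one more user turn plus everything that came before, which is exactly why the second request above carries three messages, not one.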

Okay. So this is good news for data privacy. However, your model provider may log and store your requests and use them for other purposes. This is why I've tried to clarify the difference between these two. To make it a little more explicit, I have a diagram. If we're talking about a tool like DataBot or Positron Assistant, it can see your data, it can see your R or Python session, it can see all your variables. Posit itself does not store your data anywhere when you're using our AI coding agents or any other time you're using a tool like Positron. But then DataBot or Positron Assistant sends a query that might incorporate that information to your model provider. And the model provider sends that query to the LLM, but then might also do other things with it: it could use it for future model training, it might share it with third parties, and it might store your data. Those three things are in dotted lines because they're maybes; they aren't necessarily givens. And this is why it's very important for you or your organization to develop some kind of trust with whatever model provider you're using. So Posit doesn't store your data, and the LLM itself doesn't store your data, but your provider might, depending on your or your organization's agreement. The good news is that zero data retention agreements, HIPAA-compliant agreements, or other arrangements are relatively common for organizations to work out with model providers. So you can have an arrangement where they absolutely do not store your data, share it in any way, or use it for future training. But these are generally not defaults. If you just sign up for a free ChatGPT account, they probably are doing something with your data, which is why it's important for your organization to have a specific agreement with the model provider. If you want to learn more, we have a blog post on this that goes into a bit more detail.

Okay, so I know I talked about a lot today. There are even more packages that we at Posit have developed for working with LLMs from R and Python. Here are a few of them, if you're curious. We talked about vitals, ellmer, and chatlas. There are also things like shinychat, which lets you build AI-powered Shiny apps, ragnar for doing RAG from R, and many others. I've also put a link to some of them here.

Just to help you out, if you want to click through these links, I'll put a link to my slides in a little bit. Along with my coworker, Simon, I also write a bi-weekly, meaning every other week, newsletter where we cover external AI news as well as internal Posit AI news. This is the link to all the editions that we've published so far.

Okay, so thanks. I'm looking forward to questions if you have them. This is the link to my slides, and here are some links where you can get in touch with me. Thanks.

Q&A

Awesome. Sarah, thank you so much. We have a couple questions that have stacked up in the chat, so let me go through those first. There was a question from Tyler in Snohomish. Tyler, do you want to unmute and ask?

Yeah, happy to. Thanks again for the great presentation, Sarah. I was just wondering, when you were using the vitals package, were you able to assess any kind of noticeable difference in LLM performance for code generation in R versus Python? Yeah, I think, so that screenshot of the app I showed, that was just for an R task. We also tried to look at how well they generate Python code, and if you look through those blog posts, you'll see one that's specifically on Python, where it looks like the models are performing much better than they do for R. But I don't think you can really compare R and Python performance directly in that way, for a couple of reasons. I'd say, in general now, the LLMs are very good at generating R code as well as Python code. Maybe six months ago or so, people often felt like they were better at Python. I think there's just more Python information on the internet in general, so the models, when they were trained, had more Python knowledge, but now they are very good at both. If you're trying to choose between them because you want help from an LLM or some kind of coding agent, I think it's pretty comparable at this point, although it might depend on your specific model, because some of them are better at different tasks or different languages. That makes sense. Thanks so much.

Next question from Tina Lai in New York City. Tina, do you want to ask your question on LLMs?

Sure. I have a pretty basic question. Having not worked with that many LLMs, my question was really, do all LLMs work the way you're describing, where they don't store anything and every prompt takes in your whole conversation and sends it again, or are there newer types of models that aren't doing that kind of processing? Yeah. My understanding is that the core LLM, the kind of model the term refers to, is always a stateless model. You can add state with add-on modules, or by tweaking how the model works in some ways, but the core technology is a stateless model. The way it generates your response is that it just takes the information in and then generates a response, and it's not storing it anywhere. There are ways to augment the model to give it memory, but the core design of an LLM is that it is stateless. I think this is kind of confusing because now when we talk about LLMs, you're usually not just using a pure LLM that you downloaded to your computer. You're using ChatGPT, or a model that is provided by Anthropic or something, and the model providers have memory and are possibly storing your information. If you've had a bunch of conversations with ChatGPT, you might notice that it remembers things across conversations, and that's all because of something that OpenAI is doing. They are storing that information somewhere; it's not the LLM itself. Yeah, I hope that answers your question.

Yeah. I guess it kind of leads to a follow-up: you were saying OpenAI might remember previous conversations. Is that something that we will empirically discover, or is that something they've actively announced in a transparent way? I think they say that they do that. If your organization is working out a contract with OpenAI or Anthropic, you can have explicit terms where they will not store your data in any way. And even if you're a normal consumer using ChatGPT in the browser, you can opt out of them using your data for training or remembering across conversations. I would say, in general, be wary of what they're doing with your data. It will typically be up to your organization's security team or IT department, if you have one, to work out a specific agreement so that it is safe to use these models with protected data. Yeah, I think my take is to just assume that the data are being stored or used somewhere unless somebody very clearly tells you that they are not. Yeah.

Victoria from Northwest Tribal Epicenter, did you have a question you wanted to ask? Oh, yeah. Thank you so much. Hi, Sarah. That was really super helpful. As people on this call may recall, my main issue is protecting tribal data sovereignty and tribal ownership of data. And so my question was, are there examples where model providers have entered into those zero data retention agreements that you mentioned? And if so, are there some examples, or is there an example template that you might have access to that we could review if I reached out to you online? Yeah. So this definitely happens. I think zero data retention agreements are relatively common for any organization that has a contract with these companies, and then you might have an even more restrictive HIPAA-compliant or similar agreement. As far as templates, I don't really know. If your organization has a contact at whatever model provider you are trying to use, they will know how these work, or other organizations that do similar work to you may have recommendations. Where I work, this is usually something that our IT team, security team, and the executives work out with whatever model provider we're using, and then it's a contract that you've agreed to. I'm sure templates exist, or you can see an example of a specific agreement, and if you talk to someone at your desired model provider company, they'll probably have more information for you. Great. Thank you, Sarah. That's really super helpful.

Great. Anyone else want to just pop your hand up or unmute to ask your questions, please? I have a question. Can you hear me okay? Thank you for the presentation. That really makes me wonder what's possible to develop in R or Python using these LLMs. It really helped expand my mind. I'm curious, if somebody went out there on their own computer and tried to design their own, I don't even know if you'd call it an LLM. My understanding is that LLMs are trained using supercomputers and a huge amount of computing power. Can an individual compete with these big AI labs in trying to develop an LLM themselves, do you think? I'm just curious about your opinion. I think compete, no. It would be almost impossible at this point for an individual to make an LLM that is as good as these state-of-the-art models. The stores of data that they use to train these models are massive, they're using a massive amount of compute, and then they have lots of engineers working to tweak the model to make it good. But if you're just curious about training LLMs, you can do that on your own. I'll try to find the link in a second, but I saw something where you can train your own LLM for around $100, which seems like a lot, but relative to how much it costs to train these big models it's probably quite small. And supposedly, you can do it on your laptop. This would mostly be for if you're curious about how it would work to build your own, but it probably wouldn't be that useful for something you wanted to use professionally. And then the other thing I would say is that there are local models.
These are models with open weights, meaning that all the weights, which are essentially the model's parameters, are open. So you can download that model to your laptop or to some computer and then run it locally, so you're not sending anything over the internet. You're not sending it to OpenAI servers