Workflow Demo Live Q&A - December 18th!

Dec 19, 2024

31:45

Transcript

This transcript was generated automatically and may contain errors.

Awesome. Well, as a reminder, we do host these Workflow Demos the last Wednesday of every month. I know that's not today, but next week with the holiday, we moved it a week earlier. But all of the sessions are recorded, and we do have over 20 different workflows now. So I'll share the link to the full playlist in the YouTube chat in just a minute here, as well as the calendar link if you want to add the monthly recurring event to your calendar.

But I also want to add that I know many people who join these workflows are current customers, but if you are new to Posit Team and are curious about learning more, want to try it out for free, feel free to let us know here in the chat. You can also book time with our team, and we'll share those links in the chat in just a second too.

But I just wanted to go around and quickly do some introductions here. So I'm Rachel. I lead Customer Marketing here at Posit. I'm based in the Boston area, and I also host our Data Science Hangout that we have every Thursday.

Isabella, do you want to go next?

Sure. Hi, everyone. My name is Isabella Velasquez. I work on the Developer Relations Team here at Posit, and I'm really excited to be here. Pass it to Sarah.

Hi. Yeah. Hi, again. I'm Sarah. I'm also on the Developer Relations Team at Posit.

All right. And I'm Ryan. Many of you folks probably recognize me at this point from a few of these demos. I'm a Data Science Advisor here at Posit, and that falls underneath our Solutions Engineering Team. I delivered a few of these sessions in the past, but really I'm just kind of here to help out with random questions you all may have about the workflow, but also just Posit Team in general.

Overview of Posit Team

Awesome. Well, I saw a few questions were about Posit Team and about the specific products, like there were questions about Connect or Workbench. And Ryan, I thought it might be helpful if we could do a quick overview for everybody, as that might help in the Q&A session too.

Absolutely. Can you all see my screen okay? I just pulled it over.

Yep.

All right. Perfect. So it'd probably just be good to give you a quick primer on what Posit Team is. And I'm going to describe it from the perspective of a data scientist, since I'm going to assume many of you here today are data enthusiasts working with data, potentially reading and writing to databases. So Posit Team is a bundled offering of our three professional tools. And the first tool I want to touch upon is Posit Workbench. Posit Workbench is a server-based implementation of the RStudio IDE, as well as JupyterLab, Jupyter Notebook, and VS Code. And, really excitingly, we just added preview support for Positron. But this is going to be your development environment. So when you're writing code, creating Shiny applications, or reading and writing data from a database, you're going to be doing it within Posit Workbench.

And I know there was a question about, is there like a cloud-based version of Posit Workbench we can all access? Our professional tools, these are tools that sit on top of servers, Linux servers, that your team manages. That could be an on-prem server, it could be a server running in AWS or Azure or some other cloud environment. So these are tools that you would install on servers that you maintain within your group. But once you've created something like a Shiny application, for example, you're going to then publish it to Posit Connect. So Posit Connect is our professional publishing platform, which allows you to easily and securely share content with whoever you want.

Now, I know there was also a question about authentication and how do you securely share those insights. So both Posit Workbench and Posit Connect will typically be tied into whatever authentication your team uses. So in order to access Workbench, you will need to log in. Same for Posit Connect. And then when you go to share content with someone, you do have the ability to share it with other folks that are within your team's authentication system. And depending on your license type, you can share things like interactive applications more broadly with the entire world.

And then just to round it out, our third professional tool, Posit Package Manager, does exactly as its name implies. It helps manage, organize, centralize, deliver all those amazing open source packages your team may be using in both R and Python. And so this is a centralized repository, so to speak, that can feed both Posit Workbench and Posit Connect in your environment.

Using dplyr and dbplyr with databases

Awesome. Yeah, thank you. Thank you so much, Ryan. I realized that answers some of these questions that were coming in, like, how do we publish Shiny? Or do we need to download Posit Workbench? So if anybody has follow-up questions on those, if those were ones you submitted, feel free to add more into the YouTube chat as well. But jumping over to some other questions, one was, do I need to use dbplyr, or can I use dplyr directly? Sarah, do you want to get started with this one?

Great. Yeah. So I guess it depends on what you're doing in your app and why you're asking. In the R version of the app that I showed, if you look at the code, I don't think I actually load dbplyr explicitly, because you're using dplyr, but it's the same kind of concept where you're using the dbplyr interface. You're using all those familiar-looking dplyr verbs, but interacting with a database table, and it's doing that lazy evaluation where it's not running the entire query unless you ask for it.

So if you want to interact with the database in that way, where you only want to run the full query when you need to or you force it to, then you will want to use dbplyr. But if you just want to grab your data from the database and turn it into something that's stored locally, you can use collect() and then use dplyr.

I think we link to the first database blog post, Rachel. That one, I think, explains how collect() works in a little more detail than this Shiny blog post and the demo: on the R side, you're taking your database table and pulling it all into R, and then you can use dplyr directly. But at a high level, whether you're using dplyr or dbplyr, the experience is pretty similar, because you can still use dplyr verbs to interact with your database table, except in some specific cases. I hope that answers the question. Feel free to ask follow-ups in the chat.

Awesome. Yeah. And great call-out to the blog post, too, because I meant to say that in the beginning. There are some great posts that Sarah has put out the last few months, so I want to make sure to share them here right now in the chat with you. But Isabella, you had also mentioned a few other blog posts that might be useful. Do you want to speak to those, too?

Yeah, absolutely. This was related to one of the questions asking about speed and the advantages of using one tool over the other. And the community has put out several really great posts. Rachel, I'll just share them with you, if you don't mind sharing them, please. One is by Stephen Turner that talks about dbplyr, dplyr, and base R. Another one is by Art Steinmetz, which talks about the different kinds of wrappers around tidyverse. So we've been talking about dbplyr, and there's dplyr, dtplyr, and things like that. So there are definitely a lot of great resources, so that if speed is very important to your workflow, you can compare and then decide which package is the right one for you.

User authentication in Shiny apps

Yeah. So, I briefly talked about this in the demo. One way to manage access to your Shiny app, if you're making something like we showed today, is to have the users authenticate through the app. I think there was also a question in the chat asking to talk a little bit more about adding the key in Posit Connect. In that first method I showed, we were adding one key to Posit Connect. So when you go and look at the app, it's using that key or password, or whatever we have put in Connect, to connect to the database.
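A minimal sketch of the single-key approach, assuming the key is exposed to the deployed app as environment variables. The variable names `DB_USER` and `DB_PASSWORD` and the helper `load_db_credentials` are hypothetical and are not taken from the demo app. The app reads the values at startup and hands them to whatever connection code it uses, so the secret never appears in the source.

```python
import os


def load_db_credentials(env=None):
    """Return database credentials read from environment variables.

    DB_USER and DB_PASSWORD are hypothetical names; use whatever names
    were configured for the deployed app.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DB_USER", "DB_PASSWORD") if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a confusing login error.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"user": env["DB_USER"], "password": env["DB_PASSWORD"]}


# Example with an explicit mapping, so the sketch runs without real secrets.
creds = load_db_credentials({"DB_USER": "report_app", "DB_PASSWORD": "example-only"})
print(creds["user"])
```

Because every visitor shares the same stored key, this pattern controls how the app reaches the database, not who each individual user is.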

So, if you want each user to be able to authenticate, you have a couple of options. One is to set something up yourself, where you have something in your Shiny app that allows people to enter their credentials and then you use those credentials to authenticate. The other one, which I talked a little bit about, is using the OAuth integration. This is now supported in Connect. Maybe we can link to that documentation in the chat, or I can grab it after this. But with the OAuth integration, you don't have to write any of that code yourself to figure out the credentials. You just need to write some code that passes them along to your connection code.

And that's nice because you can use OAuth for something like Snowflake or Databricks if you're using those for your database and then let OAuth sort of handle things on the security side.

Collaborative Shiny app development

That's a really good question. That is not something that I know a lot about because I tend to make Shiny apps by myself. But do either of you have anything, or any resources we can point them to?

I would say the first thing I would mention is that you should probably leverage some type of version control for this, and all that collaborative work should be done in a repository. It's never really a good idea for someone to build a Shiny application, then copy and paste it into a Slack channel and send it to a colleague that way. With version control it's much better, much more organized. You can approve changes. You can work on multiple branches. And another advantage with Posit Connect is you can actually publish a piece of content to Connect directly from a GitHub repository or some other type of Git repository, which can be really helpful for collaborative work.

One thing that also could be good to know about is shinylive. shinylive is nice, though you might not want to build your entire app there. But if you are working on an app together and just want to show people its current functionality, or you have a bug and you want help with it, you can paste a bit of code into shinylive and then easily share that with the other people you're working with or with someone you're asking for help. And I can grab a link to shinylive if people aren't familiar with it. But it's a way to run Shiny applications in the browser.

So many great resources here. I'm excited to share these on LinkedIn right after as well. Isabella, do you want to comment on the Mastering Shiny one you shared too?

Yeah, absolutely. That's such a great question. Recently, a friend was asking about R packages and a similar sort of setup, like how do I work collaboratively? And the Mastering Shiny book does have a section on best practices, with a specific part on software development, like Ryan mentioned, covering the idea of version control and things like that. So definitely check it out. And I hope that's helpful.

Concurrent writes and database persistence

Somebody asked, if I deploy this type of app to Posit Connect, and multiple people use it simultaneously, can they both write to the database if they aren't submitting simultaneously?

Yeah, so this will depend on your database and how you have things set up. DuckDB generally doesn't allow you to write concurrently to the same connection, and it's not really set up to do that, in my understanding. With the example app, you all can open it up and write and do whatever you want with it simultaneously, because it's spinning up a new database every time you load the app; it's using an in-memory database. I did that partly so it wasn't confusing when many people were writing to one place: you don't know that other people are using the app, so suddenly things have changed. But it's also because of this concurrency issue. Like I said, that's just not really what DuckDB is designed for.

But other databases are designed for that, and will have different considerations. So my general answer is it depends, and it'll depend on how you have things set up. But you might need to think about what you want to happen if two people are writing to the database at the same time. Do you want that to be able to happen? If someone writes a change, do you want your app to automatically update? How do you want to manage that?

Yeah, we'll have to look into the resources. There are a lot of questions on DuckDB specifically as well. And I don't want to forget to say this, but I mentioned in the beginning that I also host our Data Science Hangout. Every Thursday, we have a leader from the community who joins to answer questions from you all. And actually, tomorrow is the last hangout of 2024, and Hannes, the co-founder of DuckDB Labs, is going to be our featured leader. So if there are any follow-up questions about DuckDB or anything you want to ask Hannes, I'll share the link to the Data Science Hangout right now.

Using Ibis in the Python app

So the question I have is, was it necessary to use Ibis in the Python app? Or could you have connected to DuckDB in other ways?

Yeah, so it's not necessary to use Ibis. If you don't want to use Ibis for some reason, you don't have to. That's just one approach. In this case we were using Ibis to connect to the database, and then we also used Ibis to write the code that interacts with that table.

And so one other option is the Python API to DuckDB. If you're using DuckDB and just didn't want to use Ibis, you can use the Python API directly. And there are other options that will do a very similar thing, allowing you to create a connection to a database and then manipulate that data. I think Ibis is just a good choice. Ibis is nice because it has so many backends that you can use, so you can connect to a variety of databases. As I showed in that first blog post that we linked at some point, the introduction to connecting to databases or something like that, the code can look very similar as you're moving between things like DuckDB, Snowflake, Databricks, or whatever database you're using. So it's nice that you can reuse the code and you don't have to switch too much. Some of the things in the DuckDB API are specific to DuckDB and the way they do things, but it also has a pretty familiar-looking API. So that is also a good option.

Persistent storage and wrapping up

The next question was: in your example, the database was bundled with the app, but when deployed to Posit Connect, the storage is not persistent. Is that true?

Yeah, that's correct. It's not persistent in that when you refresh the app or someone else loads it, they are not going to see the changes that you made to the flag column. And again, this is partially by design for the purposes of the example, because if you were making this in real life for your organization, obviously you would want people to be able to change that data and then for other people to see those changes. But in this case, with all of you loading it, it doesn't really make sense for you to see each other's changes. And it's also nice because it's not dependent on anything external. So you can clone this repository and run the app yourself without worrying about connecting to something that we've set up. So in that way, the storage is not persistent by design.

But if you have persistent storage set up somewhere else, my hope is that you can apply the code to that situation just by editing how you connect to the database.
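One way to read "editing how you connect" is to make the connection target configurable. In this sketch the environment variable `APP_DB_PATH` and the helper `database_target` are hypothetical: when the variable is unset the app falls back to an in-memory database, and when it points at a file on persistent storage, changes would be expected to survive a reload.

```python
import os


def database_target(env=None):
    """Return the database location the app should connect to.

    APP_DB_PATH is a hypothetical variable name. ":memory:" is the
    in-memory target accepted by duckdb.connect().
    """
    env = os.environ if env is None else env
    return env.get("APP_DB_PATH") or ":memory:"


# With the duckdb package installed, the rest of the app could stay the same:
#   con = duckdb.connect(database_target())
print(database_target({}))
print(database_target({"APP_DB_PATH": "/data/app.duckdb"}))
```

Pointing at a shared file does not by itself settle the concurrent-write question raised earlier; that still depends on the database being used.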

Awesome. Well, I know this workflow example is just one example of ways that you could do this, but teams have so many different databases or tools internal to their organization. And so whether that's Databricks or Snowflake or using DuckDB, we hope to make examples that can be relevant to you specifically. So if there are other things that you'd like to see, feel free to let us know in the chat.

Somebody did give us a suggestion for the one that's going to be covered in January. So we're going to do a deeper dive on Vetiver and model cards. So thank you so much to Chris, who had asked for that. But thank you all for taking the time to join us today. And a huge thank you to you, Sarah, for the great demo as well. Thank you, Ryan and Isabella, for jumping in here, too. Always fun to have this whole crew here for the Q&A.

One more very quick note. I have so many exciting things to share with you all. Registration for posit::conf is now open. So if you want to join us at the conference next year in Atlanta, you can register to do so. And this is the lowest price that you'll ever see for the conference. So I just put that in the chat as well. So many links that we've shared with you. I know we keep saying that over and over again, but I'm going to go right now and take all those links and put them in the YouTube description, too, so you have them there in one easy spot. Thank you all so much. Have a great rest of the day. Bye, all.

Featured software