Workflow Demo Live Q&A - Nov 29th
Transcript
This transcript was generated automatically and may contain errors.
Hey everybody, thanks for jumping over here to the Q&A. We're going to give people a few minutes here to make the transition over and I'll pull Gordon over here with us. Thanks so much, Gordon. We're going to give people another, let's say, minute to jump over.
But I guess we can start with some introductions here as you're jumping over. Thank you all so much for joining us today on this Wednesday. I'm Rachel Dempsey. I lead customer marketing here at Posit and host a few different community events here. So always love to see so many of you. Thanks again for joining us.
Gordon, I know you introduced yourself originally for the demo, but do you want to introduce yourself again here too? Sure. I'm Gordon Shotwell. I'm a software engineer on the Shiny team at Posit, mostly working on Shiny for Python. And Ryan. Yep. Hey everybody. My name is Ryan Johnson. I'm a data science advisor here at Posit. I've hosted a few of these workflows in the past, probably will in the future, but really great to see everyone again. And thanks again, Gordon, for a great demo.
And thank you, Ryan, for kicking these off as well and being the one to start these. I think we've had about seven so far now, but thank you all for being here in the Q&A room. As a reminder, if you want to ask any questions anonymously, you can use the Slido link, and I'll show this on the screen right now: it's pos.it/demo-questions.
I saw quite a few questions coming in already during the demo. I usually save this part for the end, but I want to be better about saying it up front. If you liked what you saw in the demo and want to do a trial of Workbench or Connect, you can also use the link that I'll show on the screen here to book time to chat with our team and they can send you an evaluation environment. They'd love to learn more about your use case. And a lot of times I just find that some people don't know that their companies might already have these solutions. So if you want more information on that or to connect with others within your company, feel free to reach out to me directly on LinkedIn, but I'm going to share a link in the chat for you to use too.
Python vs R for APIs
So one of the first questions I see over on Slido is: what are the benefits of preparing these APIs in Python versus R? I would say there's one main reason, which is whether you have Python users versus R users, or whether you're using a Python library versus an R library. For most data science applications, that's enough, basically. If you're working with Python users, then having something all in Python is really handy. That's one of the main reasons to use Shiny for Python versus Shiny for R: they do mostly the same stuff, but if you're working with Python users, it's helpful.
The one area where that's not quite true is at the API level. And the main reason for that is that Plumber is a wonderful tool; it's a great way of deploying R code as an API. But the usage of Plumber, for the most part, is pretty small scale, and so it hasn't been battle-tested at very large or very fast APIs, whereas Python tools like FastAPI were developed around very high-performance production APIs. And so you get a lot of the benefits of those tools just by using them.
So an example of that is the way FastAPI does data validation. You specify: this is the object type I want to accept, and I want to give you a 422 error if you're provided a JSON that's malformed, or something like that. That's both very smooth from a developer perspective and also incredibly technically impressive in how fast it is. So you're able to deploy those APIs, have them be very high traffic, and still that type checking happens extremely quickly. But I would say for most data scientists, there's no particular difference in terms of what you're likely to do with it. It's mostly: are you a Python user or an R user?
Posit Connect vs hosting web apps internally
So I see one of the questions is: what is the difference between hosting in Posit Connect and deploying web apps internally? I'm assuming by hosting them internally you mean hosting them on a web server that you control. I think there's a really substantial difference in terms of compliance and security and getting those things approved. I've worked in data science in pretty highly regulated environments, and I think most of the time, when you're trying to spin up a new web server, a new web service, an API, or something like that, there's a lot of work you need to do to make sure that thing is secure and compliant, and that everybody at your company agrees with you that it's secure, compliant, and necessary. And that can take months and months to get off the ground.
And so the thing I love about Posit Connect, both when I was a user of it before I joined Posit, and now as somebody who works here, is that it gives you basically a pre-approved space for deploying all of these different things. That makes it better both because you can do things faster, and because it's a lot more secure: if there is a security change at your organization, like you're switching identity providers, or you're going from password to SAML or something like that, that happens in one place, as opposed to across 100 different web applications that you're hosting. So it really simplifies things. And I think with data science and these rapid prototyping experiences, it's a great tool for that, because it puts it all in one secure box.
Posit solutions vs SageMaker
One was: we mainly use AWS for our infrastructure; what is the benefit of using Posit solutions for an end-to-end workflow in production in comparison to SageMaker? Yeah, so this is a really hard question. I would recommend talking to somebody on the sales team to figure out your particular use case and whether Posit solutions are helpful, duplicative, or not necessary.
One thing I have experienced is around cost and the way that SageMaker or other managed cloud solutions charge you, which is mostly a markup on compute. So if you're using SageMaker with some EC2 instance with a particular amount of RAM and CPUs, you're usually charged, I think, 20 to 30% above the AWS rate for that. And that's charging for the convenience of having those things managed for you. Whereas Posit solutions all run on a licensing model: you're charged by the number of users, but you can run it on whatever compute you want. If you have your own server, you can run it there, or on your own cloud.
So basically, on the compute side of things, you're just paying the basic AWS fee. And that fee is also capped, which is nice. If you have one server that's running Connect, then no matter what your data scientists throw at that server, they're not going to suddenly explode your costs, and you don't have to worry about cost caps and things like that. It's basically: we have the server, we budgeted for it, you can do whatever you want there. If you outgrow that server, you have to come and have a budgeting conversation with whoever runs your infrastructure.
One big advantage of AWS SageMaker is scaling: scaling things up is a lot easier on that system. But again, this is really context dependent, whether these things matter to you or not, so I'd recommend talking to sales. And if it's helpful to add, we do have lots of customers who run on AWS today, and with regard to SageMaker, we have customers running Posit Workbench through SageMaker as well.
Managing Python versions on Connect
Question from Tyrone on YouTube: how do you manage different Python or Quarto versions (major and minor) on Connect? Is it inadvisable to host multiple versions? Would it affect publishing? Yeah, you should definitely host multiple versions. Both Connect and Workbench let you, the administrator, specify, the same way you can with R, these are all the different Python versions available up there.
The one big difference, I think, between R and Python in terms of multiple versions is that with Python it's really important that the versions match exactly, down to the minor version, whereas with R you can kind of get away with not quite matching exactly, because the R package ecosystem is a little more backwards compatible than the Python one. So what I have done is basically to have an upgrade path for your team in terms of Python, where you say: these are the Python versions we're supporting, everybody needs to use one of those versions when they're deploying, and you have those same versions installed on Connect.
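On the Connect side, that looks roughly like listing each installed interpreter in the server configuration; the file location and paths below are assumptions, not from the demo:

```ini
; e.g. /etc/rstudio-connect/rstudio-connect.gcfg (paths are illustrative)
[Python]
Enabled = true
; One line per supported interpreter; publishers' Python versions
; are matched against this list at deploy time.
Executable = /opt/python/3.10.13/bin/python3
Executable = /opt/python/3.11.7/bin/python3
```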
One other really crucial thing for Python is to have people use virtual environments when they're publishing, because when you publish something to Connect for Python, it checks the environment you're running in, and if that's a global Python that you've installed every package in the world into, it's going to try its best to install all of those, because it has no way of really knowing what you need, and there can be lots of conflicts or just slow deploys from that. So I would say the number one thing is making sure people use virtual environments, and the second is getting some sort of upgrade path for your whole team to make sure they're using a supported version of Python.
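As a sketch of that publishing workflow with the rsconnect-python CLI (the server URL, key variable, and package list are hypothetical):

```shell
# Work in a scratch directory for this sketch
cd "$(mktemp -d)"

# One virtual environment per project, so a deploy only captures
# the packages this app actually uses
python3 -m venv .venv
. .venv/bin/activate

# Then install just this project's dependencies, e.g.
#   pip install fastapi rsconnect-python
# and deploy from inside the venv so Connect sees exactly these packages:
#   rsconnect deploy fastapi . --server https://connect.example.com --api-key "$CONNECT_API_KEY"
```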
Pushing custom packages to Posit Package Manager
One was: how was the custom Python package pushed to Posit Package Manager? Oh, this one I actually didn't push to Posit Package Manager, because it's a public package, so I could just install it from GitHub. For pushing things to Package Manager, there are a number of different ways. The one that I think is probably the best is Git-backed deployment: I think Package Manager has a Git source (it's been a long time since I've done this), and then whenever you push something to the repository, Package Manager will pull and build it. In this case, since the package is so small and public, it was just easier to have it on GitHub.
Alternatives to PyLance on Posit Workbench
I see Michael asked a question on YouTube as well: any recommendations for alternatives to Pylance? Not having Pylance on Posit Workbench VS Code is a huge pain, to the point that I often develop on my personal machine and then upload to Workbench. Yeah. So basically, Microsoft, which owns VS Code, has a number of extensions that are usable only on the closed source version of VS Code, and Workbench runs the open source version. But there are a number of open source type checkers that are similar; I think the one most people use is Pyright. That's probably the one I most often hear about, and Shinylive, the WebAssembly-hosted Shiny for Python, uses Pyright for type checking. It works really well.
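For instance, Pyright can be installed from PyPI (the `pyright` package wraps the Node distribution) and configured with a `pyrightconfig.json` at the project root; the values here are just an illustration:

```json
{
  "include": ["src"],
  "pythonVersion": "3.11",
  "typeCheckingMode": "basic"
}
```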
Constraints for production use
Another question from Slido was: you mentioned the setup is best for proofs of concept; what are the constraints of this setup if we wanted to use it for serving many models in production? Yeah, so there are a few. One of them, and I mentioned this in the project readme but don't think I quite emphasized it enough in the video, is that endpoint authorization pattern. It works great to protect against mistakes or people doing the wrong thing, but I wouldn't say it's reliable against a malicious user. So if you were to publish this to the world, you'd want to, first of all, take out endpoints like the model upload endpoint and have those hosted on some other API that's controlled at the top level through something like Posit Connect or another gateway.
I would say there's nothing technical about that API that would keep you from serving it. And this is, again, one of the advantages of FastAPI: the technology is strong enough, and commonly used, to serve very large, intensive production workloads. The main thing I would think about is: are you the person to deploy and host a production API? Is your team able to do that? If yes, then that's great; you can do it and it will work fine. But building an API that can scale the way you need it to, and be run cost-effectively, is often a different set of skills, ones you don't necessarily have on your team or personally.
The one last thing is, I would be worried, both for licensing reasons and other reasons, about using the same environment to serve internal data products and external ones: both in terms of people making a mistake and publishing the wrong thing to the public API when they meant to publish to the private one, with something getting out there, and in terms of running into performance problems if you have a big spike in usage on the external one. That's not always the case, but something I tend to favor personally is having separate environments. You can still definitely use Connect for this; we have people who use Connect for serving large public Shiny apps or APIs, and that works fine.
Implementing the workflow without Connect
One was: how would someone implement this workflow without Connect? Yeah, so one of the things I really love about Connect is that the applications you deploy to Connect are not special Connect versions of those applications. This FastAPI app or this Shiny app, they're just vanilla FastAPI and Shiny applications, and that means you can host them wherever you want to. You can do the same workflow with multiple servers that you're managing, or a single server running several processes.
The advantage of that, of course, is that you don't have to pay for the software; that's obvious. The disadvantage, I think, is that there is a cost associated with rolling your own stuff: you're now responsible for the security, patching, and deployment workflows, and that can be pretty expensive. It's just not something you necessarily pay for right away. A good example is the same one as before, where your compliance or security group upgrades the requirements on what you need to do. I had an experience where my company was going through this big process of moving to a different encryption algorithm because of some of the people we were trying to sell to, and this was an enormous task for all the services we had rolled ourselves, but when we checked on Connect, Connect had already implemented that in a previous version, so it was just a flag.
Docker and Kubernetes for API management
One more question from Slido was: Docker and Kubernetes were mentioned in the presentation; could you talk in a little more detail about how these technologies help with the management of the API or app? Yeah, so there are two pieces of it. Docker is basically a way of capturing more of the environment. When you do a Connect deployment, Connect captures your Python environment, but all the Connect applications run on the Connect server, so if that has a particular system library installed, they're going to use that system library. Docker lets you customize those things. It makes your deployment a little more work, because you have to deal with the Dockerfile, and Docker is a little more irritating of a development experience, but the benefit is that more of those things, like your Linux version or your libcurl version, are locked down for a particular application.
And then Kubernetes is a way of scaling Docker instances horizontally: running many Docker instances on one or many servers. So I would say they're growth patterns from this setup. And I think Vetiver is a tool that does a great job of helping you get those basic API Docker containers written for you. But oftentimes, if you have an API, it's pretty easy to put it into a Docker container, because there are lots of published patterns for "how do I run FastAPI in Docker", and you can just copy that, point it at your API, and it works fine.
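As a sketch of that copy-a-pattern approach (file names and versions here are assumptions, not from the demo):

```dockerfile
# Hypothetical Dockerfile for a FastAPI app defined as `app` in app.py
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# uvicorn serves the FastAPI app object on port 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```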
Docker is a really good solution, I think, for passing things on to a DevOps group. If you're a data scientist trying to get something deployed in production or on another system, you can say: look, I have this version running on Connect, you can go test it, try it out, see how it works, and I also have a Dockerfile for you, so you just need to take this and put it on your system. Most things have some way of running and exposing a Docker container, so it's a really nice way of doing people's work for them a little bit, and it's not that hard to do. But if you don't have to use Docker, I wouldn't. My advice would be: if you can get something working and deployed without it, start with that, and worry about Docker when you actually have Docker problems.
Okay, well, I think we've gotten to all of the questions. I'll double check if there are any that I missed and try to put answers with the recording as well. But I just want to say thank you all so much for joining us here today and for asking all the great questions. I want to remind everybody that we have these events on the last Wednesday of every month, and you can add them to your calendar using the link shown on the screen: pos.it/team-demo.
And Ryan, I was wondering if you wanted to give us a little preview of something you're thinking about for the next one. Yeah, we're always open to suggestions on what would be most helpful for the community in terms of your Posit Team workflows. But one of the things I think is very underutilized within Posit Team is the ability to integrate continuous integration and Git-backed deployment with Posit Connect. So I think those are some features I'd like to cover, hopefully next month, or at least very soon.
And I did just want to add: if you liked what you saw and are thinking about how you could deploy APIs or models or Shiny apps or Dash or anything you're building with R or Python, and want to try Connect, we do offer free evaluations of Connect. So I'll share this here with you in the chat, as well as on the screen. But thanks again for hanging out with us. So nice to see you all, and have a great rest of the day. Really appreciate it, Gordon. Thank you.
