Veerle van Leemput | Analytic Health | Optimizing Shiny for enterprise-grade apps
Transcript
This transcript was generated automatically and may contain errors.
Well hi everybody, let's get started then. Thank you so much for joining us today. Welcome to the RStudio Enterprise Community Meetup. I'm Rachel, if I haven't had a chance to meet you yet. I'm calling in from Boston, actually at the RStudio office today. But if you've just joined now, feel free to introduce yourselves through the chat window and say hello, maybe where you're calling in from.
I'd like to let everybody know up front, you can use the more button in the zoom bar below to turn on live transcription if you wish. We'll go through some short introductions of the meetup group and we'll have Veerle here presenting on her experience optimizing Shiny for enterprise-grade apps. And we'll also have lots of time for questions. Veerle said we can also ask questions during the middle of the presentation too, but we'll have lots of time at the end as well.
So for anybody who's joining this group for the first time, this is a friendly and open meetup environment, and I'm sure everybody's heard this spiel a bunch before, but it's for teams to share the work they're doing within their organizations, teach lessons learned, network with each other, and really just allow ourselves to learn from each other. Thank you all for making this a welcoming community. We really want to create a space where everybody can participate and we can hear from everyone. I always love to reiterate that we really love to hear from everyone no matter your experience or the industry that you work in as well.
Just a quick reminder that this session will be recorded and shared to YouTube for anyone who missed it or if you want to go back and check it out again. So for questions you can also use the Slido link which I'll put in the chat in just a moment. So with that you can ask questions anonymously too if you don't want to be part of the recording or you can put your name there and we can call on you to jump in the conversation which is always fun too.
But with all of that, thank you all so much for joining. I'm so excited to turn it over to you, Veerle, and I will mute myself here and turn it over to you.
Okay, hi, welcome everybody, nice to see you all. I'm just closing down this presentation and moving on to mine.
Introduction and agenda
All right good to see you all. Today we're going to talk about optimizing Shiny for enterprise-grade applications and what that exactly will be. So what are we going to talk about today? So first I'll give a little bit of an introduction so that you know who's talking to you. Also a bit about the company so that you have a little bit of background on the kind of applications that we're making. Then I'm going to dig into the content and we'll start with what happens before you start coding. Then we're going to the coding part so how do you write code for an enterprise-grade application. Then we'll talk about infrastructure, some tips and tricks on how to optimize Shiny. Then we go over how to test your application and the last part will be about monitoring your status of the application and usage.
So first I'll start by introducing myself. My name is Veerle. I'm a data scientist. I graduated as a data scientist in 2015 already in the Netherlands. That's also where I'm calling in from today. I did a master's of data science and after that I started working as a data scientist at some companies. Started off with an energy company, then ended up in the pharmaceutical industry at Dr. Reddy's. At a certain point in time I was like, I want to start my own business. I have so many ideas and I want to do things my own way. So I founded Hypebright, which was a data science consultancy firm. During all my nice assignments there I met my current business partner Greg Mills, with whom I now run Analytic Health, and I'm managing director and head of data science there. I'm an R programmer. I love programming in R. I build Shiny applications and besides that I'm also doing a bit of JavaScript, building applications in Vue and Node.
About Analytic Health
So a bit more background about the company Analytic Health. Who are we? At Analytic Health we develop tools and applications for the UK healthcare sector. We try to accelerate innovation in healthcare by giving companies the tools they need to access data and to get information from that data. And in order to do that we have two applications. The first one is Pharmly Analytics and the second one is Pharmly Cloud Data. Pharmly Analytics is a Shiny application, as is Pharmly Cloud Data, and with Pharmly Cloud Data we are also giving access to a plumber API. So we do lots of stuff in R. We are a core team of three people.
What is an enterprise-grade app?
And to start talking about enterprise-grade apps I want to take you into what that exactly is. Because an enterprise-grade app can mean many things for many of us and most of the people who hear enterprise-grade think about oh this is an app serving thousands and thousands of users with lots of servers behind it. They think big. And while that might be true for an enterprise-grade app it's not the only thing that defines an enterprise-grade app. So this is my view on what it actually means and what it means to us.
So for example the applications that we are giving access to are subscription-based and they are for the B2B market. That means that people pay for them. It's not an open source app. Which also means that, obviously, if you are offering your app through subscriptions, you need to have some kind of access control and extra data security. Sometimes people or companies also provide their own data to us, which means that we need to have a high level of data security. And because people and businesses rely upon our applications to make decisions on a daily basis, we need to have high availability and speed, and there needs to be data validation.
And obviously if you have this information this information needs to be correct. So there are high data validation demands. On top of that what I think distinguishes like a hobby shiny app from an enterprise-grade app is an outstanding user experience. Which means that you need to put in considerable thought on how the app behaves, what it looks like and what the overall feeling is of the application. And an extra thing that comes with an enterprise-grade app is support. Meaning that if people have questions they can come to you immediately and that you provide this high level of support for your application.
So again, enterprise-grade in my opinion has nothing to do with the number of users. It can be thousands of users, can be tens of users, can be 50, 100, doesn't really matter. What qualifies for me as an enterprise-grade app is that you put in considerable thought on the availability of the app and the experience when people are using your application. And in fact I also want to say that an enterprise Shiny app shouldn't actually feel like a Shiny app. We sometimes get that comment from people who are seeing our app for the first time and are Shiny developers. They say, oh, this doesn't look like a Shiny app.
But like I love shiny and I'm here to promote shiny definitely. But people shouldn't see with what software the app was made. It needs to look like a good app, good functioning app, high quality. So I see that as a compliment. So the software behind it shouldn't really matter and it's the outcome that matters in this case.
Pharmly Cloud Data overview
So today I'm giving you kind of a sneak peek on how we manage things and, as I said, we have two applications and I will take you along with one of our applications, which is Pharmly Cloud Data. The application is kind of the data marketplace for the UK. You go into the app, you choose the data set that you're interested in, you search for the active ingredients or the medicine brands that you're looking for, and you can download that data. It's not a complicated app, meaning that there are no machine learning models running and no complicated graphs or something, but the app is all about giving you an easy user interface to access all the data that you need to make your decisions.
UI first: user experience before coding
So let's first start not by going into the code immediately but first with a very important and overlooked thing: the UI-first principle, or user interface and user experience, because one does not simply succeed without a plan. And often, especially if there are a lot of data scientists here today, we think, whoa, I need to build an application, let's code. But what we often forget is that we first need to think very carefully about what the app is being used for, how people are going to use it, what it needs to look like, where the buttons need to be, etc. This is often an overlooked part of building an application, especially if you're coming from an analyst background or a data scientist background, and this is not only true for a Shiny application by the way but for any application you develop.
So put some considerable thought into how the application is going to work. Going back to my own experience, at the beginning we were also exactly like that: we need to build this feature, we're going to build it, and we immediately went into the code. Well, that does lead to problems, because as soon as you deploy your application and your users start seeing the feature, then it's like, oh yeah, maybe it isn't working out as we thought in the beginning. So these are all things you want to know upfront, and UX designers can help you with that perfectly because they know exactly how things should work, and then you can see it before you actually start.
And obviously there are Shiny ways to make this kind of a prototype, and one Shiny way I want to highlight is the shinipsum package. With the shinipsum package you can make this prototype in Shiny yourself, and while this does mean that you kind of start with the code, it is a bit different, because here you can put placeholders in your user interface, meaning that you can very easily set up an app and just say, okay, I want to have here a table or a plot and some text and a button, and then you can play around with the user interface first before you actually start writing the server code. So obviously the shinipsum package makes a reference to the lorem ipsum dummy text. It's very easy to use and I would definitely recommend it if you're looking for a quick way to prototype your applications.
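A minimal sketch of what such a placeholder prototype might look like with shinipsum (the lorem-ipsum prototyping package mentioned above). The layout, input names and labels are illustrative assumptions, not taken from the real app; only the `random_*()` helpers come from the package.

```r
library(shiny)
library(shinipsum)

# UI-first prototype: the layout is real, the content is placeholder.
ui <- fluidPage(
  titlePanel("Prototype: dataset explorer"),
  sidebarLayout(
    sidebarPanel(
      selectInput("dataset", "Dataset", choices = c("Dataset A", "Dataset B")),
      actionButton("go", "Request data")
    ),
    mainPanel(
      textOutput("intro"),
      plotOutput("trend"),
      DT::DTOutput("preview")
    )
  )
)

server <- function(input, output, session) {
  # Every output is filled with random content from shinipsum, so no real
  # server logic is needed while the layout is being reviewed.
  output$intro   <- renderText(random_text(nwords = 40))
  output$trend   <- renderPlot(random_ggplot())
  output$preview <- DT::renderDT(random_DT(nrow = 10, ncol = 5))
}

# shinyApp(ui, server)  # uncomment to launch the prototype
```

Once the layout is agreed, each `random_*()` call can be swapped for the real server code one output at a time.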
Writing maintainable code
So once you've done that you kind of know what your app is going to be about, what it's going to look like, what it should do, you go into code and that's the interesting part. But to build enterprise-grade applications you also need to put in some considerable thought into how you're going to manage that code and how you're going to build your application to scale it. And this is something that we have learned over the years as well. We're a startup company, we basically started with an app and then that grew and that built and more customers came on it. So we learned over the years that it's very important to if you have this massive app that can do a lot of stuff and that grows over time that your code is maintainable and that you can work with it.
So how do you exactly make maintainable code because that's kind of a magic thing. These are principles that are, well, not necessarily for Shiny but for any app that you're developing in any software. It's important that you write code which is easy to understand and debug, meaning that you think about coding style, commenting your code, keep it stupid simple, the KISS principle. So that is always easy to understand. You can choose to write really complex code but that honestly never helped us because if you developed something very complex with a very complex piece of code and then look at it a couple of months later, you're like, I don't know what was actually happening here, some magic.
When we're talking about maintainable code, we're also looking for code that is easy to modify and enhance, meaning don't repeat yourself, the DRY principle. If you need to make a change to your code, because let's say in our app we're going to add another data set, then obviously we don't want to do everything again. We don't want to repeat our code. If we need to make changes, they need to happen fast, because we want to have new features out fast. So build your code in such a way that you can easily modify it.
Also separation is key. And I remember when I started working as a data scientist at Dr. Reddy's and had to build my first Shiny app. And I was super proud of the app because it had like, I don't know, 3000 lines of code in one script, which was app.R. And I was like, whoa, this is a really cool app, so many lines of code. Obviously, I didn't know what I know now. Because separation, splitting your code up into small, easily consumable parts, is essential. That app that I built in the early days, 3000 lines of code, was not maintainable at all. If I needed to find something, it could take me minutes or maybe half an hour to actually search through the code and make it all work.
So what we do now at Analytic Health is we separate basically every piece of functionality into separate scripts, and also into separate directories, meaning that if we need to find something, or a specific piece that does this or that, we can easily find it. Also, we need to have code that's easy to test. And that's where automated unit tests come into place. If you have separated your code into these chunks, it's also easier to test only that script for the desired outcome.
Organizing Shiny apps: Golem, modules, and functions
Alright, so maintainable code, that's one thing. Then how are you going to organize your Shiny app? And how are we doing that? One way you can do it is Golem and the almighty modules. Everybody talks about modules, especially when you are talking about building enterprise apps, or building apps for production, or as soon as you come into the territory of more advanced Shiny apps, then there's no way to go around modules. And Golem is a package that actually helps you to build your Shiny application as a package. So Golem is basically a framework for building your production-grade Shiny application. It helps you to structure your code and to organize it, and to basically be able to make your Shiny app as an R package.
If you have any experience with building R packages, it seems very familiar to you. Because all the other things that you're used to of making functions in your package are also here as well. So it's super easy to document super easy to test. So coming back to how to build maintainable code, make sure that you can test it. Golem kind of facilitates that. And obviously, because if you can package it, you can also easily share it. So that's also a great advantage. And it makes use of Shiny modules.
So you hear a lot about modules. But there's also another way that you can manage or make maintainable code. And that is functionalizing. I don't know if this is an actual word. But what I mean with functionalizing your R code is basically that you build your code up as functions. And this helps you not to repeat yourself. Because if you're making functions out of several server scripts, or UI parts, then you can easily use those functions in other parts of the application.
So coming back to my app, Pharmly Cloud Data, or our app, I should say, sorry, Pharmly Cloud Data. What we do there is we provide access to different data sets. So let's say we have 10 data sets. And for those 10 data sets, you can request active ingredients and brands, and then you can download data. Every tab that's in that app looks kind of the same. So what we did here, instead of using modules, for a variety of reasons, is we chose to functionalize the code, meaning that if we now need to make an adjustment to, let's say, the download button, we can very easily do so by just changing that function.
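A rough sketch of what functionalizing repeated tabs could look like. The helper names (`dataset_tab_ui`, `dataset_tab_server`), the dataset keys and the `fetch_data` argument are hypothetical and only illustrate the idea; they are not the actual app code. Note that the dataset key is pasted into every ID by hand, because plain functions do not namespace IDs for you.

```r
library(shiny)

# Hypothetical helper: builds the controls for one dataset tab.
# IDs are prefixed with the dataset key so they stay unique across tabs.
dataset_tab_ui <- function(dataset, label) {
  tabPanel(
    title = label,
    selectizeInput(paste0(dataset, "_ingredients"), "Active ingredients",
                   choices = NULL, multiple = TRUE),
    downloadButton(paste0(dataset, "_download"), "Download data")
  )
}

# Hypothetical server-side counterpart: wires up the download for one tab.
# fetch_data is assumed to be a function that returns a data frame.
dataset_tab_server <- function(dataset, fetch_data, input, output) {
  output[[paste0(dataset, "_download")]] <- downloadHandler(
    filename = function() paste0(dataset, "-", Sys.Date(), ".csv"),
    content = function(file) {
      selected <- input[[paste0(dataset, "_ingredients")]]
      utils::write.csv(fetch_data(selected), file, row.names = FALSE)
    }
  )
}

# One call per dataset instead of one copy of the code per dataset.
datasets <- c(prescribing = "Prescribing", tariff = "Drug tariff")
tabs <- unname(Map(dataset_tab_ui, names(datasets), datasets))
ui <- do.call(navbarPage, c(list(title = "Data marketplace"), tabs))
```

A change to the download button now happens in one place, inside `dataset_tab_ui()`, and every tab picks it up.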
So what functions don't do is address the namespaces problem, because that is what modules do. So when you're using functions, you still need to make sure yourself that none of the input and output names or IDs have a conflict with each other. Because in a shiny app, you can't use the same IDs for inputs, or the same IDs for outputs, because then it will just not work. So that is one downside of using functions. And yeah, as I said, you can use modules, because they add this one extra level of abstraction, and taking care of the namespaces problem. But functions will already get you, well, good, as a good start.
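For comparison, a minimal sketch of the same kind of download control written as a Shiny module, where `NS()` and `moduleServer()` take care of the namespacing. The module names and the `fetch_data` argument are again hypothetical.

```r
library(shiny)

# Module UI: ns() prefixes every ID with the module's own id.
download_ui <- function(id) {
  ns <- NS(id)
  tagList(
    selectizeInput(ns("ingredients"), "Active ingredients",
                   choices = NULL, multiple = TRUE),
    downloadButton(ns("download"), "Download data")
  )
}

# Module server: inside moduleServer() the plain IDs can be used,
# so two instances of the module never clash.
download_server <- function(id, fetch_data) {
  moduleServer(id, function(input, output, session) {
    output$download <- downloadHandler(
      filename = function() paste0(id, "-", Sys.Date(), ".csv"),
      content = function(file) {
        utils::write.csv(fetch_data(input$ingredients), file,
                         row.names = FALSE)
      }
    )
  })
}

# Two instances with different ids get different, conflict-free input IDs.
ui <- fluidPage(download_ui("prescribing"), download_ui("tariff"))
```

The trade-off is the extra layer of abstraction: the IDs are safe by construction, but data has to be passed in and out of the module explicitly.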
Then another thing, another framework that I want to highlight today, and if you know me already, then you might have seen this. But it's about a new package, Rhino. Rhino is developed by Appsilon. And it's also a framework that promises to help you build enterprise-level Shiny applications. So it is kind of similar to Golem, but something different. What Rhino does, and I think it was only released a couple of weeks ago, is it helps you to have this kind of framework around building your Shiny app, helping you with all kinds of best practices and giving you development tools. So it provides you with the basic structure of a project, it provides you with a folder structure and files, kind of the same as when you're using Golem. And it also provides tools to work with JavaScript and Sass, for example.
So this one uses box to modularize code. What box does is it basically makes R files importable as a kind of module, meaning that you can just write an R script and then import it using box. And then it works kind of the same as a module. So it's a bit different from how Shiny modules work. And it's definitely an interesting concept. So it's actually also a little bit similar to chopping up your big app.R script into multiple R scripts and then importing them in another way. box kind of takes care of that in a more clever way, so that you don't repeat yourself. Definitely worth a look.
Infrastructure and deployment
And the other thing now is, okay, you've built your code, you have nice code, nice structure, it's maintainable, you have built the structure of your app using either Golem, functions, Rhino, whatever there is. Then it's always a question: how are you going to bring that to the users who are going to use it? So a little bit about infrastructure and how we do that. Obviously, there are thousands of ways you can do it. And I'm not saying that the way we did it is the ultimate way to do it. But for us, it works very well.
So these are my experiences, and our setup. And this is our infrastructure. So I've kind of simplified it here compared with how it is in the real world, obviously. But our broad infrastructure, looking beyond Shiny, looks like this. First, we get our data somewhere. We get data from the NHS, the National Health Service in the UK. Then we do some stuff in R to transform it, clean it, validate it, etc. Then we send it somewhere to be stored. And we use pins and databases for that. If you're not familiar with pins, definitely read up on it. It's just a nice way to store any R objects.
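A small sketch of storing and retrieving an R object with the pins package. It uses `board_temp()`, a throwaway local board, purely as a stand-in; in a setup like the one described, the board would presumably be backed by the deployment server instead. The pin name and the data are made up for illustration.

```r
library(pins)

# board_temp() is a temporary local board, used here only as a stand-in
# for whatever board the real pipeline writes to.
board <- board_temp()

# Hypothetical cleaned data coming out of the transform/validate step.
cleaned <- data.frame(
  ingredient = c("ingredient_a", "ingredient_b"),
  items = c(120L, 45L)
)

# The ETL step writes the object once...
pin_write(board, cleaned, name = "cleaned_dataset", type = "rds")

# ...and an app or API reads it back by name when it needs it.
restored <- pin_read(board, "cleaned_dataset")
```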
Once we have our data somewhere, then we build tools around the data. As I said, we have APIs using plumber, we have Shiny applications, but we also do R Markdown reports and email reports using blastula. And then we deploy it. And we use RStudio Connect for that. So our APIs and our Shiny apps are deployed on RStudio Connect and are accessed by our customers outside of the organisation. You don't hear a lot that people actually do that with RStudio Connect. But you can really nicely do that because you can also add your own branding to Connect, meaning that it's also perfectly usable for outside customers if they want to access your application in a safe way.
Because as I said, it's important for us that there is some access control, so kind of a login mechanism. And RStudio Connect really takes care of that very well. So then we basically have our ETL, our extract transform load, we have our apps, we have it deployed. And then obviously around that is monitoring, we need to make sure that every step of the process is being taken care of. Version control, meaning that we need to have, for example, our API, customers are directly able to access data via the API and can use the API in their own pipelines, dashboards, databases, etc. So it's very important to have a version control on there. Because if you release a new version and have breaking changes, what if the pipeline of the customer depends on that.
And if we zoom in a bit more on our infrastructure, when it comes to our applications on Connect, we have a Shiny app, the one I showed you, Pharmly Cloud Data. And actually, the server part of the Shiny app is not that interesting. Because what the Shiny app does is it's making requests to the plumber API. So basically, our back end is written as an API in plumber. This is not any different from, for example, writing Vue applications or applications in JavaScript, because there you often have front end and back end completely separated. Shiny brings them together. But in this case, we chose to also separate them here.
So we have an API that fetches the data from a database, in our case, databases in Azure. And the plumber API then does some processing to retrieve the correct data, whatever was requested, and sends it back to the application. The application server part is then only about, okay, make a request, and we're done with it. The other thing that the Shiny application does is making use of pins. So as I briefly mentioned, pins are a way to store your objects. You could kind of compare it to a database. It's not the same, but it's a place where you store R-related stuff.
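A minimal sketch of this front end / back end split, assuming a single endpoint. The `/items` route, the `medicines` data frame (standing in for the database query), the port and the use of httr2 on the Shiny side are all assumptions for illustration, not the actual API.

```r
library(plumber)

# Hypothetical data standing in for the database behind the API.
medicines <- data.frame(
  ingredient = c("ingredient_a", "ingredient_a", "ingredient_b"),
  month = c("2022-01", "2022-02", "2022-01"),
  items = c(120L, 95L, 45L)
)

# Back end: the handler does the retrieval and returns a data frame,
# which plumber serialises to JSON by default.
get_items <- function(ingredient = "") {
  medicines[medicines$ingredient == ingredient, , drop = FALSE]
}

api <- pr_get(pr(), "/items", get_items)
# pr_run(api, port = 8000)  # uncomment to serve the API locally

# Front end: the Shiny server part only needs to make the request.
fetch_items <- function(ingredient, base_url = "http://127.0.0.1:8000") {
  req <- httr2::request(base_url)
  req <- httr2::req_url_path_append(req, "items")
  req <- httr2::req_url_query(req, ingredient = ingredient)
  httr2::resp_body_json(httr2::req_perform(req), simplifyVector = TRUE)
}
```

Inside the app, something like `fetch_items(input$ingredient)` would then be all the server logic a table or download needs, while the heavy lifting runs in the API's own R processes.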
So it is deployed on RStudio Connect. We only have one server available that serves RStudio Connect. It's a good server and fits our needs. Obviously, if you have a very large audience of, let's say, thousands of users, you might want to look into adding multiple servers and then load balancing across these servers, which RStudio Connect is also capable of, but we just have one. And what we do to make sure that our apps are accessible and fast is we always have a couple of R processes running at minimum to serve these apps. For example, for the plumber API, we have five processes always running, and for the Shiny app also five, meaning that there are 10 R processes always running to serve our applications, so that people can quickly access the application and do whatever they want there.
And I can also give you that as the biggest tip: we've seen a lot of benefits with using a plumber API instead of doing everything in that one session in the Shiny application itself. So we separated it, and that makes it, one, more maintainable and, two, faster as well, because we can spread our load between what the app is doing on the front end versus what is happening in the back end. And we can also do those things in parallel.
As I said, this is just one way how we do it. There are obviously a bunch of alternatives. But just so you know, there are multiple solutions to achieve deployment of your application for an enterprise. One thing that you can do, if your app grows bigger and has more traffic, is add load balancing and high availability using RStudio Connect, meaning that you configure RStudio Connect to run across multiple servers. And then there's also Shiny Server open source, which is a fully open source solution that you can install on one of your servers if you want and serve your application from there.
Optimizing Shiny: process configuration
So we've written the code, and we've made it maintainable, we have built a structure, and we have thought about user experience. And we've deployed it somewhere. So now it's time to look in a bit more detail into how we can optimize Shiny, to make sure that everybody who wants to access the application at a given point in time can also do so. And the first thing I'm going to talk about with you is process configuration. I quickly touched upon this when I said that we always have five R processes running for our Shiny app and for our plumber API. Because, as you might know, a Shiny app or a plumber API requires an R process, meaning that R needs to be up and running, and ready to do stuff.
And you can configure these kinds of settings. The settings that I mention here, maximum processes, minimum processes, maximum connections per process, and load factor, you can set yourself in most cases. So in shinyapps.io, which is not an enterprise solution, but if you're still there, these settings are available, and they are also there in RStudio Connect. And it's just very important, also if you are deploying your app with any other cloud service provider, to understand how this works and how you are redirecting your users to an R process.
Because a Shiny app requires an R process. Okay, so then there is an R process behind the Shiny app. And then there's also a user. So a user can basically hop on to that R process. And you can say, okay, I want to have only one user per R process, or I want to have multiple, and that is going to determine the scalability of your application. So things that you can tweak, for example, are the maximum number of processes that you are allowing to run for your Shiny app.
The minimum number of processes means you have, as in our case, a minimum number of processes running for your Shiny app at any point in time. So if you set this higher, let's say 10 or 15, it means that you also consume more resources. Because all these processes are running, they consume memory on your server. So the higher you set this, the more resource-consuming it will be. So you need to think about that. In our case, we chose five minimum processes. Because if we look at the number of users that are coming to the app at the same time and are doing stuff, this is more than enough.
But if you have the minimum number of processes set, it means that there's always an R process running for your application, meaning that the startup time will decrease drastically. Because, and that's kind of the downside of building a Shiny app, the R process needs to be started up. So besides loading your app, it also needs to load an R session. So together, that can take a lot of time, and people don't like to wait. And especially if you are building apps for a larger audience, apps that people pay for, people have high expectations. People can't wait 40 seconds for the app to load; it needs to be 10 seconds max.
Then you have the maximum connections per process, which means how many users you are letting on to each R process. If you set this very low, for example to one as we have in our case, there's only one user allowed per process. We do that because otherwise users have to wait for each other. If one user is pushing a button to do a calculation, then for the other user the session is basically occupied. And we all know that when an R session is occupied, you can give it an assignment or a calculation, but R won't do anything until the previous one is executed.
The last one is load factor. That's a value between zero and one and the lower means that you spin up many new processes. So you can also if you have the maximum number of connections per process set to five, for example, five users can simultaneously access your R session. If you set it very low, it means that as soon as these people start hitting buttons and doing calculations and asking things from the R session, that they get redirected to another process or that the new process is being spun up to kind of handle that load. Like playing around with these settings can drastically change the performance of your app.
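As a back-of-the-envelope illustration of how these settings interact, here is a simplified sketch. The helper `process_capacity()` is hypothetical, and the maximum of 20 processes and the 300 MB per idle R process are assumed numbers; only the minimum of five processes and one connection per process come from the setup described above. It deliberately ignores load factor, which affects when new processes are started rather than the upper bound.

```r
# Hypothetical helper: summarises what a process configuration implies.
process_capacity <- function(min_processes, max_processes,
                             max_conns_per_process, mb_per_process) {
  list(
    # Upper bound on simultaneous sessions the app can serve.
    max_concurrent_users = max_processes * max_conns_per_process,
    # Memory held by the always-on processes, even with no users.
    idle_memory_mb = min_processes * mb_per_process
  )
}

capacity <- process_capacity(
  min_processes = 5,          # from the talk
  max_processes = 20,         # assumed
  max_conns_per_process = 1,  # from the talk
  mb_per_process = 300        # assumed
)
```

With these assumed numbers the app tops out at 20 simultaneous users and keeps 1500 MB reserved when idle, which is the trade-off between startup speed and resource use described above.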
Smart coding: cache, background processes, and Feather
So that's one thing that you can optimize. But most of the gain really is to be made in the app itself. So how are you going to make your app fast? How are you going to make your app enjoyable? Because that is part of building enterprise-grade apps. They need to work great. People expect a lot from them.
So smart coding. How are we going to speed up our Shiny application? One way we can do that is by using cache. And cache is an interesting way of kind of reducing the time it takes to get output. So imagine that people can hit a button in your Shiny app. And once they hit that button, there will be a very lengthy calculation running. I don't know. Maybe you have some kind of machine learning model or something. But they choose one input. And the calculation runs and it gets an output. It's obviously a shame if then a second user comes in, chooses the exact same input, does the calculation again, and gets the exact same output again. It's kind of a waste of time. Why should you do that calculation twice?
So this is where cache comes in. Because what you also can do is say, okay, well, the first user requested a result for this input. Input A, we call it. We do that lengthy calculation. Yes, it takes some time. But for the next ones, the next users, we're not going to do that. No. Instead, we're going to immediately use the results that were being cached from that calculation. So cache is basically a way of storing your output somewhere, in this case, on your server or wherever you choose to store it. And then retrieving it from storage instead of doing the calculation.
So especially if you have a large audience for your app, it can save a lot of time. And especially if a large audience is doing many of the same things. And not only if you have a large audience, even if you have just one user using your app, but it's constantly switching between input A, B, C, A, B, C, B again, C again. It's kind of requesting constantly that same calculation because he's just shifting inputs and he forgets kind of what the outcome of A was, so goes back again. So even for single users, it can be very beneficial to just store these results.
There are two ways you can implement cache in Shiny. The first is the memoise package, which helps you to store functions and the results of those functions as cache. And you have Shiny's bindCache function. I have an example of both of them on my GitHub page, and I will share that afterwards.
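A minimal sketch of the first option, using the memoise package. The function `slow_summary()` and its half-second sleep are stand-ins for a lengthy calculation, and the `calls` counter exists only to show that the second request for the same input never reaches the function.

```r
library(memoise)

calls <- 0

# Stand-in for a lengthy calculation.
slow_summary <- function(input_value) {
  calls <<- calls + 1
  Sys.sleep(0.5)
  toupper(input_value)
}

# memoise() wraps the function: results are stored per set of arguments.
cached_summary <- memoise(slow_summary)

first  <- cached_summary("a")  # computed, takes about half a second
second <- cached_summary("a")  # same input, served from the cache
third  <- cached_summary("b")  # new input, computed again
```

By default the cache lives in the memory of the current R process, so it is not shared between processes unless a different cache backend is supplied.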
So cache sounds nice, but a couple of disclaimers. It's obviously only useful when your output does not change, or does not change often. So let's say that your output changes, I don't know, because there's a random seed in there or whatever. If it changes every time, then there's no point. There also is not that much point if the data behind it changes like every, I don't know, six hours or every day. It is only useful when it doesn't change regularly, so that there is actually a point to storing your output. Then the downside of using cache is that it requires space, because obviously output needs to be stored somewhere, and that requires some space. And you need to think about sensitive data as well, because if you are sharing your cached results with your other users, you need to think about, okay, is user A allowed to see whatever user C did? But cache can be a game changer in the performance of your app.
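A short sketch of the second option, Shiny's `bindCache()`, which also shows one way to handle the sensitive-data concern. The inputs, the sleep and the returned data frame are illustrative assumptions. Setting `cache = "session"` keeps cached results private to one user's session, whereas the default `"app"` scope shares them across all sessions in the same R process.

```r
library(shiny)

ui <- fluidPage(
  selectInput("ingredient", "Active ingredient", choices = c("A", "B", "C")),
  tableOutput("summary")
)

server <- function(input, output, session) {
  summary_data <- bindCache(
    reactive({
      Sys.sleep(2)  # stand-in for a lengthy calculation
      data.frame(ingredient = input$ingredient,
                 requested_at = format(Sys.time()))
    }),
    # Cache key: a result is reused whenever the same ingredient is selected.
    input$ingredient,
    # "session" keeps results per user; "app" would share them between users.
    cache = "session"
  )

  output$summary <- renderTable(summary_data())
}

# shinyApp(ui, server)  # uncomment to try switching between A, B and C
```

Switching back to an ingredient that was already requested returns the stored row, including its original timestamp, without waiting again.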
Another smart thing you can do is using background processes, and I touched upon it a bit by saying that if you have multiple users per process, they have to wait for each other because the session is occupied. Well, that sometimes can happen, and we don't want that, because we don't want to have parallel execution. No, we want to have asynchronous execution of code, especially if you're developing an app for multiple users. You want them to be able to do things at the same time and not be affected. And in fact, you also don't want a user to have to wait while something is being calculated, so that he or she cannot do anything in the app.
So let's say a user hits a button to render a graph, and then in the background, some data is being retrieved from the database, and then calculations are being done, and then the graph is being rendered. Well, maybe the user doesn't want to wait those, I don't know, 30 seconds for the graph to show up, but maybe in between, the user wants to look at another table and maybe hit another button for a lighter calculation or another calculation. You don't want people to wait. So this is where callr comes into play, because with callr, you can spin up a background R process. And this is really cool, because your Shiny app is running on one R process, and then you can use callr to set up an additional R process, do the calculations there, then fetch the result, and put it back into the current R process. So this is especially useful if you have queries to databases, which take a very long time.
And this is actually the exact same thing that we do with our API, in fact, and our Shiny app, because we actually let another R process do the work, retrieve the data from the database, and then kind of send the results back. That's a similar principle to callr, only here you do it in the Shiny app at the same time. And this is also useful, for example, when you are starting up your application. So on startup, most often a bunch of data has to be loaded, and you want to display graphs and tables and whatever. It might be useful if you just say, okay, I'll just give it a little boost, and on startup, I'll spin up multiple processes for my application. And it works really nice. callr shuts itself down when the process is done, and it's just a really nice and clean way to kind of put some extra resources to your Shiny app.
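The pattern described here could be sketched roughly as follows. This is an assumption about how one might wire it up, not the speaker's code: `slow_query()`, the two-second delay, and the 500 ms polling interval are all hypothetical.

```r
library(shiny)
library(callr)

# Hypothetical slow job, e.g. a long database query. It runs in a fresh
# R process, so it cannot rely on objects from the Shiny session.
slow_query <- function(n) {
  Sys.sleep(2)
  data.frame(x = seq_len(n), y = seq_len(n)^2)
}

ui <- fluidPage(
  actionButton("run", "Run slow query"),
  tableOutput("result")
)

server <- function(input, output, session) {
  job <- reactiveVal(NULL)     # handle to the background process
  result <- reactiveVal(NULL)  # data fetched back from that process

  observeEvent(input$run, {
    result(NULL)
    # r_bg() returns immediately; the work happens in another R process.
    job(r_bg(slow_query, args = list(n = 5)))
  })

  # Poll the background process without blocking the app.
  observe({
    p <- job()
    req(p)
    if (p$is_alive()) {
      invalidateLater(500, session)
    } else {
      result(p$get_result())
      job(NULL)
    }
  })

  output$result <- renderTable(result())
}

# Only launch the app when run interactively.
if (interactive()) shinyApp(ui, server)
```

While the background process is running, the main R process stays free, so other buttons and outputs in the app keep responding.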
Because as I said, our app is about retrieving data from the API, displaying it, and making it available to download. So in our case, it means the faster our API is, the faster our Shiny app is. Even though we are pulling in, I don't know, 100,000 lines of data whenever somebody makes a request, we want that to happen in a split second. What we want to avoid is people having to wait even 10 seconds for their data, because one of the selling points of our app is that it's super fast. So for us, it's very important that the API is fast, so the Shiny app can be fast.
And one of the real game changers for our app and our API was that we started using Feather instead of JSON. A plumber API, by default, returns JSON, but you can also let it default to other data formats, for example, Feather. And we found that Feather is almost two and a half times faster in comparison to JSON, and the file size is reduced as well. Because we have to send data from the API to the app, the more data we have to send in terms of size, the longer it's going to take. So by implementing Feather and sending the data over as a smaller Feather file instead of a JSON file, we saved a lot of time. And in some cases, we were actually able to reduce API response time by as much as 50%.
And those are massive numbers. We're very happy that we've used Feather, and you can use Feather in any situation. So let's say that you are reading in a file from disk, for example. Try to convert it to a Feather file, and then read it in as a Feather file. And most often you are way faster, even faster than functions like fread for reading a CSV. This is a really cool way to speed up your Shiny application.
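One way to check this on your own data is to write the same data frame as Feather and as JSON and compare size and read time. This is a sketch only: it uses the arrow and jsonlite packages, the example data frame is made up, and no particular speed-up is implied since the numbers depend on the data.

```r
library(arrow)
library(jsonlite)

# Hypothetical data set of 100,000 rows.
df <- data.frame(
  id = seq_len(1e5),
  value = seq_len(1e5) / 7,
  group = rep(c("a", "b"), times = 5e4)
)

feather_path <- tempfile(fileext = ".feather")
json_path <- tempfile(fileext = ".json")

write_feather(df, feather_path)
write_json(df, json_path)

# Compare file sizes in bytes.
sizes <- c(feather = file.size(feather_path), json = file.size(json_path))
print(sizes)

# Compare read times.
feather_time <- system.time(from_feather <- read_feather(feather_path))
json_time <- system.time(from_json <- fromJSON(json_path))
print(rbind(feather = feather_time, json = json_time)[, "elapsed"])
```

For the API side, recent plumber versions, as far as I know, ship a Feather serializer that can be selected with a `#* @serializer feather` annotation on an endpoint. Treat that as something to verify against the plumber documentation for your version.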
Testing your application
All right. So I hope these kind of tips helped you to kind of have some ideas about how you can improve your Shiny application, and how you can make your app faster, and how we did it. So this was kind of the stuff that I wanted to talk about when it comes to coding. Now I want to talk a bit about testing, because it's obviously very important that if you have an app for like users, enterprise users, customers who are relying on a daily basis on the information that you have in the application, you obviously want to test it. You want to test to make sure that it can handle the load that you're giving it. And you want to be able to test if it's working as expected anyway.
Is your app ready for the load that it's supposed to handle? So we need to test for load. And testing for load we can do with a package called shinyloadtest. And shinyloadtest comes with a command line tool called shinycannon. And because I find shinycannon a lot cooler than shinyloadtest, I put that as the headline. And this basically can answer the question for you, how many users can your Shiny app support? And it's really cool because the only thing that you need to do is record a session in your app like you're a normal user. And then you let shinycannon take that recording and replay it in parallel, meaning that it can simulate the situation in which multiple users are accessing your app at the same time.
And then after that, you can analyze the results. shinyloadtest has a bunch of convenient functions in there that allow you to analyze all the files that it generates. And then you can build nice looking reports. And these reports can, for example, tell you the time it takes to load the HTML home page or the time it takes to grab your CSS or JavaScript files or the starting time of the session. So how long it actually takes before Shiny connects through SockJS. Also the calculation time of, for example, any button that you press or any graph that you try to render or table that you're going to display. And you also get a lot of details about individual events. And this makes sure that you know, OK, my app can safely handle this load. And then you need to see, OK, does it actually fit my expectation of how many people are going to visit the app? Or do I need to change something? It's also great to find out what is taking the most time and where you can get some efficiency gains.
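The record, replay, analyze workflow could look roughly like this. The app URL, file names, and the choice of 50 workers for five minutes are placeholders, and the shinycannon step runs in a terminal rather than in R.

```r
library(shinyloadtest)

# Placeholder values; replace with your own deployment.
app_url <- "http://localhost:3838/myapp"
recording <- "recording.log"
run_dir <- "run_50_workers"

if (interactive()) {
  # Step 1: record one session while clicking through the app as a user.
  record_session(app_url, output_file = recording)

  # Step 2: replay the recording with shinycannon from a terminal, e.g.:
  #   shinycannon recording.log http://localhost:3838/myapp \
  #     --workers 50 --loaded-duration-minutes 5 \
  #     --output-dir run_50_workers

  # Step 3: load the run and build the HTML report.
  runs <- load_runs(`50 workers` = run_dir)
  shinyloadtest_report(runs, "load_test_report.html")
}
```

Running several shinycannon runs with different worker counts and passing them all to `load_runs()` makes it possible to compare how the app behaves as the load grows.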
So we test for load. Another thing that we want to test is outcome. And testing for outcome means that we want to be sure that when somebody pushes a button, the outcome is what we expected. And we can test for outcome with the package called shinytest2. It's actually the follow-up of the package called shinytest. No surprises there. And this package you can use to create and run automated tests. And under the hood, it uses testthat. So what you do here in this package is also, just like with shinyloadtest, you record the session. And then you replay it.
But in this case, shinytest2 will translate your actions into actual code. It's really cool. It's like magic. It will write actual unit tests for you using testthat. Everybody talks about how unit tests are a pain, and they take a lot of effort, and there's so much to do, and I don't have time to write them and always forget it. This makes it super easy. I mean, it doesn't get easier than that. You just play around in the app, you click a few things, and the code is there for you.
This package uses chromote to launch a Chrome browser, a headless one. And it's a really cool way to make sure that your app is doing what it's supposed to do. The thing that's up to you, and it's your responsibility, is that you need to update those tests whenever you are making a deployment. If you don't update your tests, then it's pretty useless. But it's a nice way to be confident about releasing your application with a new update. Because unfortunately, and we all know that, when you change something in one part of the app, another thing might break. And this kind of prevents that. So this is really key to make sure that you don't ship anything to production before it's really good.
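A test of the kind the recorder generates could look something like the sketch below. The tiny counter app and its input and output names are invented for illustration, and the test is written by hand rather than recorded. Running it requires a local Chrome installation for chromote.

```r
library(shiny)
library(shinytest2)
library(testthat)

# Hypothetical app: a button and a text output showing the click count.
counter_app <- shinyApp(
  ui = fluidPage(
    actionButton("add", "Add one"),
    textOutput("clicks")
  ),
  server = function(input, output, session) {
    output$clicks <- renderText(paste("Clicks:", input$add))
  }
)

test_that("clicking the button updates the output", {
  # AppDriver starts the app and drives it in a headless Chrome browser.
  app <- AppDriver$new(counter_app, name = "counter")
  app$click("add")
  expect_equal(app$get_value(output = "clicks"), "Clicks: 1")
  app$stop()
})
```

Recorded tests typically use snapshot expectations such as `app$expect_values()` instead of a hand-written `expect_equal()`, which is why they need to be re-approved whenever the expected behaviour of the app changes.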
Monitoring app performance and usage
So I'm moving on to the last part I want to tell you something about how to monitor app performance and app usage. Because we've walked through the whole pipeline now, from like designing, UX designing, to code, to optimizing Shiny, to deploying it, to testing your application, like very high level. But then there's also the thing about, okay, how are you going to monitor your app? And how are you going to know how many people are actually using your app?
So how we do that is with something I'm still very proud of every day, our status reports. We have created status reports with blastula and RStudio Connect, meaning that every morning, or whenever we wish, we get a status report in our mailbox, looking like this. I even check it on my phone, it's like awesome. And it just tells me exactly, okay, this is all good, or this is not good, or pay attention to this, these apps are running, these URLs are accessible, this is the status of our services.
And while there's a bunch of stuff here that isn't necessarily related to Shiny, but more about our servers, it is important for us to know our servers are up and running. Because our apps run on those servers. And we want to make sure that every morning, everything is running as intended. So what we do every day is we check the disk space, because unfortunately, it happened once, and I'm sure I'm not the only one, that the disk space was 100% used. And what happens then is that your server won't do anything anymore. So your app is not working. And that's not cool. And also things like, is the nginx configuration on, are the URLs accessible, and stuff like that. It's a great way to monitor what's going on in your pipeline and in your infrastructure. And it's just nice to get it as an email so that you're always up to date.
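A status email of this kind could be composed with blastula along these lines. The check names and results below are hard-coded placeholders; in a real report they would come from actual disk, service, and URL checks, and this is not the speaker's implementation.

```r
library(blastula)

# Hypothetical check results.
checks <- data.frame(
  check = c("Disk usage below 90%", "nginx running", "App URL reachable"),
  ok = c(TRUE, TRUE, FALSE)
)

# One markdown bullet per check.
status_lines <- paste0(
  "- ", ifelse(checks$ok, "OK", "ATTENTION"), ": ", checks$check
)

body_text <- paste(
  c("## Daily status report", "", status_lines),
  collapse = "\n"
)

# compose_email() builds the message; it does not send anything yet.
email <- compose_email(body = md(body_text))
```

Sending is a separate step. Outside Connect, `smtp_send()` can deliver the message. For a scheduled R Markdown report on RStudio Connect, blastula's `attach_connect_email()` is, as far as I know, the usual way to hand the message over to Connect.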
And I want to come back to shinycannon, because shinycannon is not only useful for building these load tests and seeing how many users your Shiny app can support. It's also a great way to check if your app is still behaving the same as yesterday. Because with shinyloadtest, you recorded something, a recording of you using the app. And with the command line tool, so shinycannon, you can run this as a single recording. So you can use shinycannon to do these load tests with a number of workers and a load duration, simulating 50 users at the same time. But you can also use it to say, okay, I have a recording here, play it, and then give me the results. And is it the same as expected? Then yes, thumbs up. If not, give a warning, in our case, in one of our status reports. So this is also a great way to automatically check if your app is actually running and doing what it is supposed to do.
And then a little bit about usage. If you're building your application for your customers, you need to find out if they're actually using it and how much. What we did, obviously a very Shiny way to do that, is we built our own app, which we call the console app. And in the console app, we can do a bunch of things. And one of those things is checking usage of our apps, meaning that we can see how many times people logged in, who logged in, how long they were there for, what they did, et cetera. This allows us to build a better product, but it also allows us to check in with customers. Like, is everything still okay? Can I help you with something? That kind of stuff.
And then a very quick note about that you can also do this with RStudio Connect. You can use the API from RStudio Connect if you're using Connect to deploy your apps. And you can get all these kinds of stats as well. Super easy to see who did log in, who interacted with your content, which content is being viewed most, at which times of day, and in which weeks, et cetera. So very important if you're building apps for an enterprise level.
And then I want to have a closing note with, can you build enterprise-grade apps with Shiny? Yes, you definitely can. And I hope that today I showed you a little bit of how we manage things. And I hope I gave you some tips and tricks to make enterprise-grade apps, or to optimize your existing apps, or just taught you some general stuff about Shiny. And then now it's time for questions.
Q&A
Amazing. I think there are some. There are lots of questions. Thank you so much Veerle. Thank you all so much for the great questions too. I see there's one that was upvoted. That is, what's the benefit of using pins rather than pulling the data from your database in Azure?
Well, what we sometimes found is that pins are actually quicker than doing it in our database. So we use pins to kind of store information that's subsetted. And while you obviously can retrieve subsetted information from your database, a database needs to run a query, needs to find the correct subset, can take a couple of seconds. And if the pin already contains all the information you need, pre-processed, whatever, it's much quicker to do that. Besides that, the data comes in the correct format already. So whether you stored it as a data table or as a table with kind of the column formatting you want, so this column needs to be an IDate or this column needs to be a character, whatever, that's all preserved. So it's a really clean way to, in an R way, quickly retrieve information.
Are there package license considerations you had to think about when monetizing Shiny applications? Yes, definitely. Not all R packages are available for free, and you need to have a license. Think about highcharter, for example, that requires an enterprise license. You need to be aware of that.
Someone else asked, do you find Rhino easier than golem for handling reactives, for example passing them between modules? Interesting question. And I have to say, because it's a super new package, I didn't have the time to go fully into it yet. I just played around with the toy example. So I can't really answer that question at this point in time. I hope it will be easier, because I do see some advantages of using box for modules compared to Shiny modules, because they just work very differently. So, yeah, I'm very curious to explore it myself, too.
Do you use promises to scale the application? No, we didn't use promises. We are using callr instead, because promises still wasn't providing us with a good solution to do whatever we want. We found callr easier to use and also cleaner. It cleans itself up pretty easily, while sometimes with promises, some stuff lingers around and keeps lingering around. And we didn't find those issues with callr. So that's why we use that instead. But it works the same as a promise. callr is nothing more or less, except for a couple of benefits or maybe disadvantages, compared to promises.

