Resources

Capacity Planning for Microsoft Azure Data Centers | Using R & RStudio Connect

video
Feb 1, 2022
1:13:29


Transcript

This transcript was generated automatically and may contain errors.

Everybody, welcome to the RStudio Enterprise Community Meetup. I'm Rachel Dempsey. I'm calling in from Boston today. I'll be your host for today's meetup. This is my first time trying out YouTube Live. So I'm joined by my colleague, Tom Mock hanging out behind the scenes as well. So thank you so much, Tom. If you've just joined now, feel free to introduce yourselves through the chat window and say hello. Maybe put in where you're calling in from. For today's meetup, we are joined by Paul Chang, Senior Data and Applied Scientist at Microsoft. So Paul is going to share their data science workflow using R, RStudio Connect, and Azure for Capacity Planning of Microsoft Data Centers. And so after Paul's presentation, we'll have lots of time for Q&A as well. So you can put your questions in the YouTube chat, or you can use the Slido link as well for anonymous questions.

But that Slido link is just rstd.io meetup questions. For anyone who's joining for the first time, welcome to RStudio Enterprise Community Meetup. This is a friendly and open meetup environment for teams to share the work they're doing within their organizations, teach lessons learned, network with each other, and really just allow us all to learn from each other. So while we usually meet on Zoom, we wanted to give YouTube Live a try here. Thank you all so much for making this a welcoming community. We really want to create a space where everybody can participate and we can hear from everyone.

But with that, thank you all so much for joining us, and I would love to turn it over to our speaker for today's meetup, Paul Chang. Hey, Paul. Hey, great. Thanks, Rachel. Thanks, Tom. And thanks, everybody, for joining in on this presentation. So today, I will be talking about capacity planning for Azure data centers. And this is joint work with our fantastic team, Sarah, Ivy, Yajun, Ritchie, Sajay, and Serge.

Overview of Azure and capacity planning

Okay, so the agenda for this presentation. So I will describe the capacity planning process for Azure data centers. I'll then describe how we use RStudio Connect to generate these capacity plans, the value that we get from RStudio Connect. And then I'll do a demo involving a simplified version of a Shiny dashboard that we built.

Okay, so before we get into the planning process, let's get to know Azure a little bit. So Azure is Microsoft's cloud platform. We have 200 products and cloud services. We serve over a billion customers worldwide. 20 million companies are on Azure. And we're in 60 plus regions around the world. We have over 90 regulatory compliance offerings, the most of any cloud provider. 95% of Fortune 500 companies are on Azure. And Microsoft spends a billion dollars in cybersecurity every year.

Now, in order to enable these cloud services, we have built a global infrastructure that consists of 165,000 miles of fiber optic cable, over 200 data centers, and a lot of land, a lot of land.

Key features of our Azure data centers: high availability, low latency (super important for gamers), and scalability. And by using Azure, we ensure that our customers have access to the latest cloud technologies. Your data is safe and secure. So if you architect your workload in a certain way, where your workload consists of multiple Azure services, those Azure services can communicate with one another over the Microsoft network. So that IP traffic never enters the public internet, and so your data stays safe and secure.

Okay, so now let's talk about capacity planning a little bit. So we are a group of economists and data scientists, and we work with this fantastic group of program managers to produce the long-range capacity plan for Azure data centers. Now, some characteristics of our plan. So we do look at a one- to seven-year time horizon, and the data center build-out process is characterized by somewhat long lead times. So typically, roughly around the two-year mark, but in certain instances, that lead time could be a lot longer than two years. We invest billions of dollars in our data centers and our global infrastructure, and our capacity plan is for the 60-plus regions that we are in right now, with the hope of expanding into a lot more regions in the coming years.

Workflow and demand inputs

Okay, in terms of our workflow summary, these are roughly the main entities that we interact with. So we're capacity planning in the center there. We ingest data from hardware and region design. We ingest demand from various partners to generate our demand forecast models and our capacity plans. We also collaborate closely with finance to iterate on the capacity plans, and we also collaborate with our supply team closely as well. So we give demand signals to our supply team. They, in turn, do the hard work of figuring out where the supply is, where the most cost-effective supply is, and they give that information back to us, which in turn affects or influences where we shape demand.

Okay, so now let's double-click on the demand bucket first. So here are some main categories of demand inputs that we consider. There's this bucket that we call organic data, so it consists of economic data and historical Azure revenue that we use to feed into our econometric and data science models. On top of that, we do consider capacity requests from various Microsoft engineering groups, be it Office 365 or Xbox. As well, we also consider capacity requests from our large customers. So we actually have a large team of people who talk with governments and large enterprises, you know, Fortune 500 companies, and these customers, when they onboard onto Azure or they make a new capacity request, those capacity asks are usually pretty large. And so those types of large capacity requests are not necessarily forecastable from the historical data. So for that reason, we really need to consider demand from large customers as a separate bucket.

Okay, in terms of hardware and region design, so Azure services, right, so various Azure services actually depend on core Azure services. Some Azure services have special server requirements, special hardware requirements, so we need to take account of those considerations. The hardware roadmap, that's another big input for us. So yeah, I don't know if Moore's Law is still valid or not, but I think it is still our thinking that as time goes on, CPUs should become more powerful and energy efficient. And so if CPUs and GPUs, they get better and better, the thinking is perhaps we might need fewer servers and maybe even fewer buildings to serve the same demand.

So we ingest all that information from our partners to produce the capacity plan. These are roughly four of the main pillars that support our work, that allow us to produce the capacity plan. So the first is revenue forecasts. So we have a revenue forecast model that employs a couple of well-understood models that we stitch together, including a model of technology diffusion inspired by Diego Comin. We have a hierarchical time series model that allows us to distribute a worldwide revenue forecast down to the geo and region level. We have a gravity model of trade. The hardware roadmap is another big pillar that our team works on.
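
As a rough illustration of that top-down idea, here is a minimal sketch in base R. This is not the team's actual model (which combines a technology-diffusion model with a hierarchical time series model); the regions and numbers are made up.

```r
# Minimal sketch of top-down disaggregation: distribute a worldwide revenue
# forecast to regions in proportion to each region's recent revenue share.
# Regions and numbers are hypothetical.
worldwide_forecast <- c("FY23" = 120, "FY24" = 150, "FY25" = 185)

recent_regional_revenue <- c(region_a = 40, region_b = 30, region_c = 10)
region_share <- recent_regional_revenue / sum(recent_regional_revenue)

# One row per region, one column per fiscal year
regional_forecast <- outer(region_share, worldwide_forecast)
regional_forecast
```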

Capacity principles are also important, right? So we need to account for the discrete sizes of various data centers due to their design, and we also need to build in safety stock to account for supply lead time and volatility on both the supply and demand sides. And the fourth is demand shaping, which I'll talk more about in the demo later in this presentation.
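
For the safety-stock piece, a hedged sketch of a textbook rule (not necessarily the formula the team uses) looks like this:

```r
# Sketch of a textbook safety-stock rule: z is a service-level factor,
# sigma_demand the standard deviation of monthly demand, lead time in months.
safety_stock <- function(z, sigma_demand, lead_time_months) {
  z * sigma_demand * sqrt(lead_time_months)
}

# e.g. z = 2 (~97.7% service level), sd of 5 units/month, 24-month lead time
safety_stock(z = 2, sigma_demand = 5, lead_time_months = 24)
```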

Key considerations: accuracy, timeliness, and explainability

So what are the top considerations for our team in doing this capacity planning? I think the one closest to the top is accuracy, right? So like many other data science teams, we care very much about accuracy. It's how we build credibility, so it's super important that our models are accurate. But that's not the only consideration we need in order for our team to be successful. So I think data science teams and models tend to over-index on accuracy, but in order to make our models a success, we need other factors as well.

So the other big considerations are timeliness, right? So we need to produce capacity plans on time, every month. Delays in publishing this plan can be costly, right? So we don't get new capacity unless it is signaled in this plan. And if we don't get new capacity in a timely manner, there's a risk of lost demand and other delays. So it's very important for us that we produce plans on time consistently.

Business strategy, so we need to be agile to incorporate new ideas, thoughts, intelligence from our stakeholders and leaders. So we need to override inputs in case new information comes in, or if there are occasionally data quality issues with some of our input data. So if we do detect data quality issues, we need to be able to override those inputs in a non-hacky way, in a robust non-hacky way.

And the last one, which in some ways might be the most important consideration, is explainability. So we gain trust only when we can explain the drivers behind the capacity plan in a clear and concise manner. Shiny apps help a lot with this, by the way. So we love Shiny apps, we love Shiny dashboards. I think it's only when our explanations are clear and concise that we are able to also be transparent with our internal stakeholders and partners. So I really think explainability is one of these key things that enables us to be transparent. And when we are able to be transparent and open with our partners about our model assumptions, about the input data that we see, then we start building trust. So being transparent enables us to build trust with our stakeholders. And that trust leads to more questions. So being able to build that trust leads to us having good conversations with our internal stakeholders, and that leads to more collaboration.

So there's this entire chain of goodness that starts from being explainable. Explainability leads to transparency, leads to trust, leads to collaboration. So I think this is a big one for me. I'm very passionate about explainability.

So we gain trust only when we can explain the drivers behind the capacity plan in a clear and concise manner.

Before and after adopting RStudio Connect

Okay, so with that said, let me talk about how we use RStudio Connect to help us address these considerations and help us do our work.

Okay, so let me paint a picture of our work life before adopting RStudio Connect. So truthfully, it wasn't bad. It wasn't bad at all. It was not that great either, but it wasn't that bad. So before using RStudio Connect, models were developed by data scientists using R and C#, and the data scientists would check that code in. And they would be deployed by engineers in data pipelines using Databricks. It's important to note that most of our data is mid-sized. So we typically deal with data that's, you know, at most maybe 100 gigabytes roughly, right? So our team doesn't really deal with petabytes of data. Other teams deal with petabytes. We don't directly deal with big data.

And so the pipelines would run in Databricks. It would take an hour for the pipeline to run. It would dump the data in some database, and then people would extract that data and, you know, create static tables, static graphs, and paste them into PowerPoint presentations. And then they would present these presentations to our stakeholders. Our stakeholders would have feedback for us, and then our data scientists would have to either create new models or modify existing models, and then the cycle would continue.

So what were the pain points before we started using RStudio Connect? I think the big one for me was just slow. It was just slow, slow, slow. Slow end-to-end development time, right? So that cycle that I just talked about, you know, was like a couple of days between when the data scientists would modify some model and when we'd get stakeholder feedback. And remember, we have to publish this capacity plan on a monthly basis, right? There's 20 working days in a month, so three days is a long time within that 20-day time span.

So Databricks is great for big data, right? So Databricks and also Azure Synapse, these are actually really great platforms for dealing with big data, for getting insights from big data. But for our case, where we're only working with a couple hundred gigabytes of data, it's actually a lot faster for us to run our models on a laptop or on a desktop machine, right? So really, for our case, it should not take an hour for the pipeline to run end-to-end.

Now, having said that, even in the current workflow, we still depend on big data. I think some of our upstream teams, they probably use some of these big data platforms to produce our input data. So I still think Databricks is great. I think Azure Synapse is great. But for our particular scenario, it was, you know, we could get things done faster using something else.

Hard to do what-if scenarios. So if somebody wanted to change an input parameter from one to two, you know, yeah, sure, change the input parameter from one to two, hit enter, and it'd take an hour for the results to come out. So that big lag between changing an input and getting model results, you know, that makes it hard to do what-if scenarios.

And then visualization of results. So I will say that, you know, during this time period before we onboarded to RStudio Connect, our data scientists were already experimenting with creating R Shiny apps, right? So we were already testing that out, and we were already sort of using it internally in our team. But there was a big requirement, right? There was a big desire for us to be able to share those Shiny apps with our internal partners and stakeholders, right? So at that time, we didn't have a means of deploying Shiny apps. And so I think it was really that need to be able to deploy Shiny apps that prompted us to explore options, right? And so I think the great option that we landed on was RStudio Connect.

So before I get to this slide, a big shout out to Amit Gandhi, who is no longer with our team. He's over at Airbnb now. I think he looked at this situation before, and he was really, really instrumental in driving our team to adopt RStudio Connect. So I think he's the single person most responsible for shifting our team towards this new model of development.

So after we adopted RStudio Connect, what is the model for how we work? In this new world, data scientists own the entire model development lifecycle. So data scientists write the data ingestion jobs, they develop the models, they deploy models on RStudio Connect, and they write Shiny apps to visualize results. We also use some R Markdown documents. At the same time, our engineers remain vital, indispensable. We could not do our work without engineers. They are responsible for creating and maintaining the infrastructure for running RStudio Connect and other Azure services. They are responsible for creating the architecture. They also help shepherd the R packages that data scientists write.

Advantages. So as I said before, RStudio Connect empowers our data scientists to directly communicate insights to stakeholders. They own the development lifecycle end-to-end. The end-to-end development time is now quick, right? It can be as little as a couple of hours from model development, to deploying the app on RStudio Connect, to getting stakeholder feedback. Easy collaboration between the data scientists. So we love the pins package, by the way. So we share data sets amongst ourselves using pins. We write most of our code in R packages, and we can share code between ourselves via R packages easily. And now it's easy to do what-if scenarios, right? So we're able to deploy plumber APIs, and we're able to write Shiny apps that talk to those plumber APIs. And so in doing so, we're able to create dynamic apps, right, where people can fiddle around with input parameters and do sensitivity analyses in essentially real time, right? So it no longer takes an hour for people to explore, you know, what happens if I change this parameter or that parameter. It's now real time, right, due to this combination of Shiny apps and plumber APIs.
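
A minimal sketch of that pins workflow, assuming an RStudio Connect board; the pin name and the data set here are hypothetical:

```r
library(pins)

# Authenticates via the CONNECT_SERVER / CONNECT_API_KEY environment variables
board <- board_connect()

# Publisher side: share the latest demand inputs with the team
demand_inputs <- data.frame(region = c("region_a", "region_b"), units = c(10, 7))
pin_write(board, demand_inputs, name = "demand-inputs")

# Consumer side: any teammate or scheduled job reads the same pin
demand_inputs <- pin_read(board, "demand-inputs")
```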

So the return on investment on RStudio Connect is clear to us. So we have been consistently on time in publishing the capacity plan every month. No delays. As I said before, any delays are potentially costly. So really being on time, that's huge. Transparency and explainability. So I'm a big fan of Shiny apps, Shiny dashboards. It's super quick to put together a dashboard and deploy it on RStudio Connect. And even if some of the algorithms we develop are black boxes, if people can interact with the models via the Shiny apps, and if it kind of looks right, then that capability alone is powerful, right? So sometimes it's really hard to explain the guts of an algorithm. If, on the other hand, it kind of looks right, if it performs as people expect, you know, in a lot of cases, that alone might be sufficient, right? That alone might go a long way towards explainability and building that trust that I talked about. And finally, I think our data scientists are just happier. I think they're just happier with this new world.

Data flow and infrastructure

In terms of the data flow, let me just touch on this briefly and describe some of the things we do to ingest data, and also some of the products that we create using the RStudio Connect platform. So we do run RStudio Connect on a Linux D16 VM. We consume data sources from our internal partners via APIs, Cosmos DB, Data Explorer, which is also known as Kusto, and also some SQL databases. We have a mechanism for incorporating that human intelligence via CSV files that we store on Azure Blob Storage, okay? So it is a robust mechanism that we can use to override any input data source as appropriate. Our team stores data using the pins package and the AzureStor R package, right? So we store our data on Azure Blob Storage. We deploy plumber APIs, and the plumber APIs power a number of downstream applications as well. I love plumber APIs, right? With a few lines of code, you can just go and deploy an API on RStudio Connect.
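
A hedged sketch of what that CSV override mechanism could look like with AzureStor and dplyr; the storage account, container, file, and column names are all hypothetical, and the toy `inputs` table stands in for whatever comes from the upstream APIs, Kusto, or SQL sources.

```r
library(AzureStor)
library(dplyr)

# Toy upstream input; in practice this would come from an API, Kusto, or SQL
inputs <- data.frame(region = c("region_a", "region_a"),
                     month  = c("2022-06", "2022-07"),
                     demand = c(30.5, 32.0))

# Read the analyst-maintained override file from Azure Blob Storage
endp <- storage_endpoint("https://myaccount.blob.core.windows.net",
                         key = Sys.getenv("BLOB_KEY"))
cont <- storage_container(endp, "capacity-overrides")
tmp  <- tempfile(fileext = ".csv")
storage_download(cont, src = "demand_overrides.csv", dest = tmp)
overrides <- read.csv(tmp)  # hypothetical columns: region, month, demand

# Apply overrides wherever a matching region + month row exists
inputs <- inputs %>%
  left_join(overrides, by = c("region", "month"), suffix = c("", "_override")) %>%
  mutate(demand = coalesce(demand_override, demand)) %>%
  select(-demand_override)
```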

And of course we use a whole bunch of Microsoft and Azure services. So not only do we plan for Azure data centers, we are also users of Azure services ourselves. So one special shout out to Excel, by the way. So our team, I mean, we love RStudio Connect, we love dplyr and PyD, but we still remain big fans of Excel. So my philosophy on this and other topics has been, it's not an either-or situation, it's an and situation, right? It's like, we don't have to pick between RStudio and Excel, we can do both, right? We can use both products successfully in our process. And by the way, there are people who work almost entirely in Excel, but they have a worksheet where they can pull the latest capacity plan, and they do that via the plumber API.
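
As an illustration of how an Excel worksheet (or any other client) could pull the latest plan, a plumber endpoint might look roughly like this. The route, pin name, and fields are hypothetical, not the team's real API.

```r
# plumber.R
library(plumber)
library(pins)

board <- board_connect()

#* Return the latest published capacity plan, optionally filtered by region
#* @param region Region name, e.g. "region_a" (optional)
#* @get /capacity-plan
function(region = NA) {
  plan <- pin_read(board, "latest-capacity-plan")
  if (!is.na(region)) plan <- plan[plan$region == region, ]
  plan
}
```

Locally this can be run with plumber::pr("plumber.R") and pr_run(); once published to RStudio Connect like any other content, Excel's Power Query (or anything else that speaks HTTP) can call the deployed URL.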

Availability zones and demand shaping

Okay, so before I get to the demo itself, let me give a brief introduction to availability zones. So availability zones are physically and logically separated data centers with their own independent power source and cooling. Okay, so the basic idea is that if one of these zones goes down, right, perhaps due to some natural disaster or, you know, maybe somebody accidentally hits the power off switch. So if any of these things happen, then those workloads can fail over to another availability zone, and that workload can continue to function. So it's this failover capability that gives our customers' workloads that high availability, right? It's how we can guarantee these high SLAs for our customers.

Now, in terms of demand, we actually think of demand as split into three buckets. So there is a portion of demand that we think of as being pinned to a particular zone. So this workload must happen in this physical location, right? So there is that type of demand. There's a second class of demand where it's just, you know, it just has to be replicated across all three zones, mainly for that resiliency that I was talking about. And the last type of demand is this discretionary demand. So there's this third type of demand where we are basically free to allocate that demand into any or all of these availability zones as we see fit.

Okay, so now, so we can place this demand, we can allocate this demand however we want. And so then the question is, well, how should we allocate this demand, right? So there are a number of policies that we might want to optimize for, right? So one might be we want to allocate demand in such a way that we try to maximize and equalize the amount of excess supply that we have in any of the three zones, right? So that's perhaps one policy objective. Another policy objective might be we want to push demand to zones where we can get the most cost-effective supply, right? So, and there could be others as well, right? There are other heuristics, other policy decisions or policies that we might want to consider when we're trying to allocate this discretionary demand.
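
For a feel of what such a policy could look like, here is a hedged sketch of one simple heuristic (not the production algorithm): place discretionary demand, one unit at a time, in whichever zone currently has the most headroom, which tends to equalize excess supply.

```r
# Greedy allocation sketch: repeatedly put one unit of discretionary demand
# into the availability zone with the most remaining headroom.
allocate_discretionary <- function(supply, pinned_demand, discretionary) {
  allocation <- setNames(numeric(length(supply)), names(supply))
  headroom <- supply - pinned_demand
  for (i in seq_len(discretionary)) {
    zone <- names(which.max(headroom))
    allocation[zone] <- allocation[zone] + 1
    headroom[zone] <- headroom[zone] - 1
  }
  allocation
}

allocate_discretionary(
  supply        = c(az1 = 100, az2 = 80, az3 = 90),
  pinned_demand = c(az1 = 60,  az2 = 50, az3 = 40),
  discretionary = 45
)
```

Swapping the which.max criterion for a cost-based one would give the "most cost-effective supply" policy instead.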

Okay, so some background on the demo that you're about to see. So I had implemented a new algorithm for distributing this discretionary demand. I spent hours debugging the code and debugging the code and debugging the code. And at the end of that, I was finally able to create a first version of the capacity plan with this new algorithm. And so I shared the results with our program managers and other partners. And the feedback was, you know, there was some feedback that said, hey, look, this doesn't look right, right? This result doesn't look right. And so there were some genuine bugs that were found. And so I went and fixed those bugs. But there was other feedback that was like, hey, you know, I'm not quite sure what the algorithm is doing in this particular region. I would get questions like, hey, can you actually explain what this algorithm is actually doing? I'm not quite sure I understand, right?

And so it was apparent to me after getting that feedback that I really, really needed some type of visualization to build trust in the algorithm that I just developed, right? To give people a sense of what the algorithm was doing without actually having to delve too deep into the details. So given that, I spent a couple hours putting together an interactive demo that would allow people to adjust some model parameters and to get a feel for the algorithm, right? And so if people can even just get a feel for the algorithm, then I think that would build trust and confidence in the algorithm, even if I don't delve too deeply into the nitty-gritty details of the algorithm.

Demo: interactive Shiny dashboard

Okay, so now let me show the demo of that tool that I just talked about.

Okay, so let's see here. So the tool that I was just talking about: I used a Shiny dashboard to build this tool that tries to explain this new algorithm for allocating discretionary demand. We are looking at supply and demand graphs for a fictional region here, right? So this particular region is Hala. I know Tom is a big fan of Captain Marvel, and I'm sure that Microsoft would want to build data centers on Hala if it could.

But anyways, on the top we see regional supply demand, right? So this is supply and demand that's aggregated on the region level. The black line here is demand and on the bottom here you see the demand and supply distributed amongst the three availability zones. So AZ-1, AZ-2, AZ-3, and you're also looking at a base case scenario here. We also have info boxes, right, at the bottom. So there's this risk index, the supply risk index that is color-coded to be either green, yellow, or red, as well as a demand forecast value for the end of fiscal year 2024.

Okay, and so on the left side here and the right side there are sliders that allow users to adjust model parameters. The left sliders allow users to vary the safety stock that we hold in each availability zone. The right sliders, they limit the maximum amount of discretionary demand that we can allocate to a particular availability zone. And before I start playing around with the input parameters, I want to stress again that all computations are live. So as I vary the sliders, calls are being made to a plumber API in the backend and the model is actually re-computing the results you see, right.

So for instance, I might want to increase the safety stock in AZ-2, right. So if I do that, then you see that there's this new dashed line that pops up. This corresponds to the what-if scenario. You can see the effect of changing this input parameter on the output results. I can also, for instance, limit the discretionary demand that goes to AZ-1. Okay, so if I limit the demand going to AZ-1, you see that the demand line in AZ-1 goes down, but of course that demand should be redistributed to other AZs as well.
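
A stripped-down sketch of that pattern: sliders in a Shiny app call a plumber API, and the what-if line is redrawn on every change. The endpoint URL, parameter names, and response fields are hypothetical.

```r
library(shiny)
library(httr)

ui <- fluidPage(
  sliderInput("safety_az2", "Safety stock, AZ-2", min = 0, max = 20, value = 5),
  sliderInput("cap_az1", "Max discretionary demand, AZ-1", min = 0, max = 100, value = 100),
  plotOutput("whatif")
)

server <- function(input, output, session) {
  scenario <- reactive({
    # Call the (hypothetical) plumber endpoint with the current slider values
    resp <- GET(
      "https://connect.example.com/capacity-api/allocate",
      query = list(safety_az2 = input$safety_az2, cap_az1 = input$cap_az1)
    )
    # Expect columns: month (numeric), demand, whatif_demand
    jsonlite::fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  })
  output$whatif <- renderPlot({
    df <- scenario()
    plot(df$month, df$demand, type = "l", xlab = "Month", ylab = "Demand")
    lines(df$month, df$whatif_demand, lty = 2)  # dashed what-if line
  })
}

shinyApp(ui, server)
```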

So it was through this tool that people got a sense of what the algorithm was doing. And so, yeah, it seems to be doing the right thing, right, even though I didn't exactly explain the nitty-gritty details of the algorithm.

One more minor success story associated with this tool. So I'm going to switch to the grand city of Minas Tirith here. So here's another example that is somewhat similar to a real-world case. So the program manager came to me and he was looking at results, looking similar to this, and he was concerned about this supply gap in AZ-2. Right, so it was, you know, so yes, the algorithm is doing its thing, it's distributing demand, but still, you know, there's this little gap here, right? Maybe the algorithm isn't doing the right thing. So at that time, it was really too late in the publication process for me to make any changes to the code. So I wasn't going to make any changes to the algorithm at this point. So what we did instead was, hey, you know, let's, can we try to adjust the input parameters a little bit to see if we can make things better, right? So it was through this interactive tool where we played around with parameters that we eventually settled on a solution that looked like this, where we said, hey, what if we just limit the amount of discretionary demand that goes to AZ-2? And so you see that when we did that, all of a sudden, the supply risk index turns green. It turns green in this case.

So that was, you know, it was a minor success story, but, you know, I felt very happy about that, right? Not only was this tool able to inform, give people a sense of what the algorithm was doing, but we even used it to do a little planning, right? We even used it to actually implement the change in the input parameter, and that became part of our official plan of record.

Another tool that we built, it's this, you know, long-range capacity plan comparer. So we built this tool to compare the month-over-month changes between different versions of the capacity plan, right? So for this synthetic data set, we have a capacity plan for December 2021 and an updated plan for January 2022. So, you know, people can click execute, get results.

So again, it's just talking to a plumber API in the background. And so, yeah, this table, by the way, is a reactable table. And so people can quickly see that, hey, you know, on the regional level, the latest capacity plan is asking for, you know, 30.5 units of data center capacity in June 2022. That goes up to about 60 in June 2025. And you can see the deltas, right? The month-over-month changes between the latest version of the capacity plan and the previous month's. And people can also drill down a little bit too, right? Like, hey, what are the contributing factors to this regional forecast, right? The organic component, the networking component, the engineering component. Now the real version of this tool actually has a lot more tables and features. I had to strip out most of the features for this presentation.
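
A minimal sketch of a month-over-month comparison table with reactable; the numbers are synthetic, echoing the demo, and the column names are made up.

```r
library(reactable)

plan_compare <- data.frame(
  month    = c("Jun 2022", "Jun 2023", "Jun 2024", "Jun 2025"),
  dec_2021 = c(28.0, 38.5, 49.0, 58.0),
  jan_2022 = c(30.5, 40.0, 50.5, 60.0)
)
plan_compare$delta <- plan_compare$jan_2022 - plan_compare$dec_2021

reactable(
  plan_compare,
  columns = list(
    month    = colDef(name = "Month"),
    dec_2021 = colDef(name = "Dec 2021 plan", format = colFormat(digits = 1)),
    jan_2022 = colDef(name = "Jan 2022 plan", format = colFormat(digits = 1)),
    delta    = colDef(name = "Delta", format = colFormat(digits = 1))
  )
)
```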

Okay. But, you know, for instance, people can also click on the engineering table, right? And, hey, you know, what are the components of these engineering capacity asks, right? So you can see, hey, different engineering groups have different capacity requests.

Getting started with RStudio Connect on Azure

Okay. Here's a, you know, suggestion, right? So if you are interested in trying out RStudio Connect on Azure, how would you go about doing that? Well, I recommend people create an Azure account if they have not already done so. You know, I personally love it. I love Azure. I love using Azure products. And if I remember correctly, I think there are some trial credits that people can get to try it out. Highly recommend Azure services. For the purposes of trying out RStudio Connect on Azure, you can provision a Linux virtual machine. We ourselves settled on Ubuntu 18. I think you can also use Ubuntu 20. Install a trial version of RStudio Connect on that VM. If your organization is adopting one of these hybrid cloud strategies, you might also consider provisioning an ExpressRoute instance. So ExpressRoute allows for secure communications between your cloud VM running RStudio Connect and your on-prem network. And finally, we do use Azure Active Directory for user authentication in RStudio Connect.

Okay. So, and then finally, I just want to really, really thank all our fantastic internal partners at Microsoft and our program manager teammates, our teammates in finance and supply. We just have a fantastic team. I'm a very, very happy person here at Microsoft. And a special thank you to Rachel, Tom, and Mitch for helping us onboard to RStudio Connect. They were tremendously helpful in guiding us through the process, giving us tips on how to deploy Shiny. Tremendously, tremendously helpful. And so, with that, I will toss it back to you, Rachel.

Q&A session

Awesome. Thank you so much, Paul. It's really amazing to see how you showed that it can be one hour from model development to stakeholder feedback in that example. I know there's a lot of questions coming in. And depending on where you're watching from, if you're on LinkedIn, when you comment there, the questions will get to us as well. Or you can put comments into Slido or the YouTube chat as well. And Tom is in the background helping me with that. But I see, Paul, that Neil had asked a question, which was: how are you measuring accuracy?

Oh, that's a big topic. One of our program managers, he's passionate about accuracy. I will say that our team has actually been at this for almost five years now. So, we are in a relatively rare position where we did some forecasting back in 2017, and we're actually able to see whether those predictions panned out in the years 2021 and 2022. So, I don't think anything beats actual real-world confirmation.

Great. Thank you. Hey, Monica, I see you had asked a question on YouTube as well. Can you give an example, Paul, of what a shiny dashboard with explainable metrics would look like?

So, we actually did create another tool that gave some summary statistics of our data center capacity plans. So, these are, I mean, a lot of these metrics are familiar to our internal stakeholders, right? So, they actually have metrics that they are comfortable with. And so, we actually tried to create metrics that speak to them. We love Shiny dashboards. I'm actually a big fan of Shiny dashboards. So, you know, those boxes. I think that was great.

Cool. Thank you, Paul. I'm going to read over a question from Slido as well. So, you can ask anonymous questions there too. One of the questions was, are multiple models used for the various forecasts or is a single most accurate model chosen for each variable?

So, we do have – we have considered multiple models. I think my description of how we're doing the demand forecasting, right, is this model of technology diffusion inspired by Diego Comin coupled with this hierarchical time series model. I think that is what we're using right now. I mean, if the question is about, like, do we do, like, you know, random forests or some sort of ensemble, our team doesn't currently use that. But I do know that we do have a partner team that does exactly that, that uses an ensemble approach.

Thanks, Paul. I'm just going to keep throwing questions at you because there's a lot of great ones here. One was, was forecast modeling a data analyst-driven initiative or was this primarily through management?

Oh, that's a good question. I think in some sense it's a bit of both, right. I mean, there's a lot of history to that. I mean, certainly within Microsoft, Microsoft understands the value of data science and, you know, analyzing data, right. So, Microsoft has many, many other initiatives, right, you know, involving AI and vision and so forth. So, Microsoft has already bought into this idea of, hey, you're developing forecasts using data science. At the same time, though, there is a little bit of a bottom-up feel to this as well. I mean, I think our data scientists are tremendously empowered to actually come up with their own ideas, right. It's like, hey, you know, I think we have a better forecast, or maybe we can do things differently. So, I think on our team, we really empower our data scientists to drive other initiatives beyond what management tells us.

That's great. I am going back over to some of the questions from LinkedIn, too. We're jumping all over the place here, but Ethan asked, I saw that Power BI was utilized as well. Were there any reasons why you ported this over to Shiny? Was it just more convenient to use Connect as a platform?

So, before I answer that, I will say that we still use Power BI dashboards as well, right. So, I think I said this before, my philosophy is that it's not necessarily an either-or situation, right. So, it's many, you know, many tools. So, we do have a fantastic Power BI platform as well, and we do use Power BI. But having said that, I think our data scientists are most proficient in R, and so it's just easier to code up Shiny apps. It's a familiar language, it's R-based, there are lots of great packages, and you can test it very quickly on your own desktop machine before deploying it to RStudio Connect. So, at least for our team, it is way more convenient to use RStudio Connect and to deploy Shiny apps. At the same time, there's a definite role for Power BI as well.

Definitely. A lot of other anonymous questions coming in, too, and one was, when publicizing forecasting and planning, how is feedback managed? Are analysts given leeway to consider model feedback, or is it more centrally managed?

I think there's a mix there, right. So, it's not like there's a central team that sort of takes all the feedback and then tells us what to do. It's not like that at all. Analysts are definitely given leeway. We definitely have a lot of control over sort of what things we do. We are still very time-constrained, by the way. So, we're actually quite busy. We can't do it all. So, we do have to be a little bit selective in some cases, right. It's like, hey, let's go after the big-ticket items, the high-priority items.

The other aspect to this is that alongside our use of RStudio Connect (I actually didn't mention this), we also developed a business process that goes along with that, right. So, as I said before, I really like this idea where explainability leads to transparency, leads to collaboration. I really think that that is true. And so, in our case, that really did happen. We actually did get a lot more collaboration from our partners. So, every month, we have these forums, right, with multiple partners, where our entire team of program managers and data scientists sits alongside our stakeholders. And it's in that forum that we consider, hey, here's the latest plan, you know, what are the latest changes, what are the input drivers. So, it's all very open. We have these fantastic forums where we're, you know, getting feedback and collaborating, sometimes in real time. So, it's just really great.

Explainability leads to transparency, leads to collaboration. I really think that that is true. And so, in our case, that really did happen. We actually did get a lot more collaboration from our partners.

I love that. I'd love to learn more about that, too. How did you start those forums?

I think it was mainly an organic process, I guess. So, I mean, as I said, in the world before, we did have, you know, presentations to partners and such. At this point in time, the scheduling is much more expected, right. So, we have a monthly process, and we have, you know, dates when certain forums happen, when certain deliverables happen. So, from my perspective, I think that kind of grew organically, I guess.

Nice. I see a great question on YouTube that came in from Darren, and I'm really curious about this, too. What is your biggest future feature request for RStudio Connect or the packages that you're using?

So, again, I love RStudio Connect. I do. So, I myself use RStudio Connect heavily, but I also use SQL and C# and Visual Studio heavily as well. And with some of the debugging features in Visual Studio, I think it's easier to debug code in Visual Studio than it is in RStudio. So, that's my feedback. I think, you know, I would be very happy if there were future improvements to RStudio to make the debugging experience a little bit more like Visual Studio.

Going back to, I know, the question about Power BI and RStudio Connect, I see Thomas had a comment as well that said, great presentation. We have embedded Power BI dashboards within our Shiny apps and use Azure Functions for some plumber APIs. Is that an architecture you've considered, too?

I didn't even know that was possible. I'll have to try that out. So, you take the Power BI and you sort of embed it in the Shiny app? Wow, that could be a whole different thing.

I see there's, let's see, some other questions from LinkedIn. Paolo asks, what's the main reason to use a REST API when rerunning the model?

I think, from an engineering perspective, you want to decouple the graphical user interface from the underlying model and code. So, APIs really provide a, you know, language-independent way of decoupling back-end code from the GUI. So, I think that's very good design.
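
That decoupling means any client that can speak HTTP can reuse the same model endpoint. For example, a plain R script (or Excel's Power Query, or a Shiny app) could call a hypothetical deployed endpoint like this; the URL and parameters are made up.

```r
library(httr)

# Hypothetical plumber endpoint deployed on RStudio Connect
resp <- GET(
  "https://connect.example.com/capacity-api/capacity-plan",
  query = list(region = "region_a"),
  add_headers(Authorization = paste("Key", Sys.getenv("CONNECT_API_KEY")))
)
plan <- jsonlite::fromJSON(content(resp, as = "text", encoding = "UTF-8"))
head(plan)
```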

Thanks, Paul. One of the anonymous questions that came through is, who are the users of your platform, like teams and titles, for example, and how many concurrent users can your app handle at one time?

So, at this point, to be honest, we are still running on one Linux VM. So, it is highly available, but it is just one Linux VM. We should expand that out and use Azure scale sets to have multiple VMs. I think our license right now, it's a couple hundred users, right? So, I think we can handle a couple hundred users maximum at this point. But, yeah, to really scale out, we need to stand up more Linux VMs and put those in an Azure scale set.

Thanks, Paul. I realize an hour went by really quickly and just want to double-check that you have a few more minutes to answer a few other questions. Okay, great. I see Heath has said, I'm impressed that you train and deploy models in hours rather than days. What is the pipeline that you're using for modeling? Is that tidymodels?

So, training and deploying. So, I think there are certain ways of running your data pipeline where you do the majority of training at the front of the process, if that makes sense, right? So, think of transfer learning, right? In the transfer learning scenario, the models are almost already pre-trained, and you're simply replacing the last layers of that model to adapt it to whatever situation you're looking at. Now, our situation is not quite like that, but, as I said, we're not working with petabytes of data either. We are working with hundreds of gigabytes of data. So, that's another reason why it's faster for us to run the models on our local machines, right? If anything takes days, we're back to Azure Synapse, right? Then we're not running these things on our local machines anymore.

Thanks, Paul. Someone asked, are you all using linear programming to do these zone modeling functions for the discretionary demand piece?

The short answer is no. So, I've actually worked on a couple of other projects involving these types of constraint satisfaction problems, where, you know, you're trying to allocate demand or allocate, you know, a number of seats or whatever in some problem. And to be honest with you, I've actually never used linear programming in practice myself. I've often used these types of, quote-unquote, traditional AI techniques to try to do the optimization. In this particular case, it's actually simpler than that. There was just a bunch of heuristics that we employed in the algorithm. And so, I'm pretty sure that the heuristics that we employed are not necessarily optimal. There is a case to be made for balancing between optimality and explainability, by the way, right? So, if you try to optimize your algorithm too much, in general, you might have a harder time explaining it. So, the short answer is no, we don't use linear programming.

Thanks, Paul. There was another question about the dashboard itself. So, for external and internal users of the dashboard, is there a need to consider other layouts to accommodate the needs of different stakeholders?

Oh, that's an awesome question. So, we do have a bunch of tentative projects within Microsoft, right? So, these are projects where only certain people within Microsoft should have access to that information. So, in that case, we actually came up with a simple solution, right? So, we do have access to other databases within Microsoft that say, hey, this set of users has access to this information, right? And so, we basically put in if-statements in the Shiny app, right, that say: if the user is in this group, then go ahead and enable this conditional panel to show that data.
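
A hedged sketch of that pattern: on RStudio Connect, the authenticated username is available to Shiny as session$user. The allowed-user list and panel contents here are hypothetical, and a server-side renderUI is used as a stand-in for the conditional panel described above.

```r
library(shiny)

allowed_users <- c("alice", "bob")   # in practice, looked up from a database

ui <- fluidPage(
  uiOutput("restricted")
)

server <- function(input, output, session) {
  output$restricted <- renderUI({
    # session$user is set by RStudio Connect for the logged-in viewer
    if (!is.null(session$user) && session$user %in% allowed_users) {
      tagList(h3("Confidential project demand"), tableOutput("confidential_tbl"))
    } else {
      p("You do not have access to this section.")
    }
  })
  output$confidential_tbl <- renderTable(data.frame(project = "X", units = 12))
}

shinyApp(ui, server)
```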

Yeah, that's a great question. Thank you. One other question that came on YouTube, jumping over there, was from Robert. It asks, how do you tackle troubleshooting the application itself on Azure when problems come up?

Troubleshooting on Azure itself. So, I have to be honest, and this is the truth: we haven't had many problems with Azure itself. So, I think our main focus when we did have to debug was actually, you know, our logic, right? So, our code, our Shiny apps, you know, sometimes our C# code. So, you know, a lot of the debugging and troubleshooting is on that front. The Azure services we use have actually been pretty stable and highly available. So, we haven't had the opportunity or the necessity to have to troubleshoot.

Thanks, Paul. One other question about Azure. On Slido, someone said, the Azure VM sizing seems quite huge. Is that for RStudio Connect or R processes only, or are there other processes running on the VM?

So, our Linux VM is totally dedicated to running RStudio Connect. So, yes, it is a little big. Yes