Ralph Asher & Laura Darby Rose | R in Supply Chain Management | RStudio
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the RStudio Enterprise Community Meetup. I'm Rachel Dempsey. I'll be your host for today's R in Supply Chain Meetup. I'm joined by my colleague and co-host for today, Curtis Kephart, if you want to say hello as well, Curtis.
And we're also joined by our two awesome speakers for today, Laura Darby Rose and Ralph Asher. If you just joined now, feel free to introduce yourselves through the chat window as well and say hello, maybe include where you're calling in from. But just to go through a brief agenda while we wait for a few others from the waiting room, we'll go through some introductions of the meetup, have two awesome presentations, Intro to Supply Chain Design with Ralph Asher and Forecasting Demand with R with Laura Rose.
We'd like to also use today's meetup to kick off a Supply Chain Working Group, too. And Curtis Kephart from our team will give us some more information on that and how to join if this is something that's interesting to you. And then lastly, we'll open it up for additional audience Q&A and just open discussion among everyone. I did mute everybody upon entry right now, but towards the end, we will unmute. But if you do have questions during any of the presentations, we do have a Slido link that you can use. So you can ask anonymously or include your name there as well.
And we'll share that in the Zoom window right now. That will really help us to organize questions, too, and you can upvote people's questions. And just a quick note that the recording will be shared up to the RStudio YouTube as well. So if you want to share it with anyone or go back and listen later, that will be shared up there. One other housekeeping note, if you want to turn on live transcription during the meeting, you can do so as well. And that's offered in the Zoom bar below.
For anyone who's joining for the first time today, I'd love to welcome you to this meetup group. This is a friendly and open meetup environment for teams to share the work that they're doing within their organizations, teach lessons learned, network with others, and really just allow us all to learn from each other. So thank you all for making this a welcoming community. I really want to create a space where everybody can participate and we can hear from everyone. I want to reiterate that we love to hear from everyone, no matter your level of experience in the field as well.
If you ever have suggestions or general feedback or want to speak at a future meetup, I'll share a few links in the chat window right after this as well. But with all that, I would love to introduce our first speaker today, Ralph Asher. Ralph is the founder of Data-Driven Supply Chain. And prior to founding Data-Driven Supply Chain, Ralph worked as an operations research scientist in corporate supply chain functions at Target and General Mills. And so with that, I will turn it over to you, Ralph.
Introduction to supply chain network design
All right. Thank you, Rachel, for the introduction. Hello to everyone who's on this webinar, wherever you may be. Just reading through the chat, sounds like we have a global presence already, and that's great. I really appreciate the opportunity to speak with you today. As mentioned, my name is Ralph Asher, and today I'll be talking about supply chain network design. What is it, first off, and how can we use open source languages like R and Python to design supply chains?
So a little about myself. Like many on this call, I have a math and science background. I have a bachelor's in physics and a master's in operations research. I started my career after college in the military. I was on active duty in the Marine Corps for several years, and I've been serving as a reserve Marine in a part-time capacity for about 10 years since then. After the military, I worked in supply chain design and simulation for eight years at General Mills and Target here in beautiful, cold Minneapolis, Minnesota. I left the corporate world last summer to found my own consulting firm. In addition to running my company and serving in the Marine Reserve, I'm also an adjunct at the University of Minnesota Business School, where I teach statistics. I've been an R user for 10 years, since 2012, and I've found the language to be an increasingly versatile tool for my work.
So supply chains, from the back office to the front page. The COVID-19 pandemic has really caused unprecedented disruption to global supply chains. Manufacturing shutdowns as well as labor shortages and a dramatic shift in consumer spending from services to goods have caused massive disruptions to the previously invisible flow of materials. This disruption can be seen in the container ships waiting off the coast of Southern California, awaiting a dock in Los Angeles and Long Beach. It can be seen in the reconsideration of reshoring American manufacturing capacity or the delays in building and repairing homes due to supply shortages. It can be seen in the inflation that you see at the store shelves and at the gas pump. And even Santa Claus can't fix these problems. And he has the most advanced logistics operation known to mankind, or Elfkind.
Instead, the task of ensuring supply chains continue to function, so we literally have food on our plate and fuel in our cars, that falls to the legions of hardworking supply chain management professionals. And as a data scientist, I'm really quite proud to be able to use my quantitative and technical skills to help these efforts.
So you may be asking yourself, how can analytics be used in the management of supply chains? Depending upon your background, this slide may be old news to you, but I find it's useful to discuss supply chain analytics in the framework of descriptive, predictive, and prescriptive analytics. Now, descriptive analytics, the biggest and most important part of the pyramid, includes business intelligence and diagnostic analytics of the supply chain. How many of my trucks arrived on time? How productive were my manufacturing plants and my warehouses? How are these metrics trending? And are any of them worrisome? Predictive analytics is next. How do we predict what's going to happen next in our supply chain? That includes things like sales forecasting, as we'll be discussing next, as well as forecasting costs for your supply chain inputs and some types of supply chain simulation. Finally, the top of the pyramid, which by volume should occupy the least amount of time, is prescriptive analytics. What should I do with my supply chain? This is my wheelhouse, and it includes things like supply chain network design, discussed today, as well as inventory optimization and other types of supply chain simulation.
Now, a key point for this audience, especially, is that open source technologies are really changing the landscape in supply chain management and supply chain analytics. When you look at each of these capabilities, you can mentally pick out which R or Python packages you could use for them. This really is a sea change from when I started in the supply chain analytics field about a decade ago, when the toolkit consisted of Excel for data analysis, SPSS for statistics, and commercial visualization software like Tableau.
So, what is supply chain network design? Put plainly, it's the strategic analysis of the locations and activities of the various elements of a firm's supply chain. For example, if I'm a manufacturer, where should I locate warehouses for my raw materials, the stuff I use to make other stuff, or my semi-finished goods, which are items that are only partway through the manufacturing process? Where should I put my manufacturing facilities, my plants? Where should I put my distribution centers, my warehouses for finished goods? And if I'm a retailer, how should I use my stores as a link in my supply chain? For example, many leading retailers, including my former employer, Target, use their stores as strategic assets for shipping e-commerce orders. Supply chain network design seeks to answer these big questions, all with the intent to understand the impacts on profitability, cost, service, and other key metrics, and with the goal to minimize or maximize one or two key metrics, usually cost or profitability.
Today, I'll be focusing on a relatively simple example of supply chain network design, a distribution network with two echelons, the distribution centers, or warehouses, and customers that are serviced by those warehouses. In designing a supply chain, we ask ourselves, given a customer base of demand aggregated to a city level, one, where should we open distribution centers to satisfy that demand? And two, what customers should be serviced by or aligned to each distribution center? While most real life supply chain design problems are much, much more complex than this example, I hope it will be informative on how practitioners approach these kinds of problems and how we can use open source technology for them.
Optimization modeling for supply chain design
So how do we do this? How can we answer these questions? We can answer them with optimization modeling. Optimization modeling is a mathematical technique that allows a practitioner to optimize a set of interconnected decisions where the decisions are constrained by real world limitations. Optimization solvers take the decision options formulated as a mathematical model and return the set of decisions that optimizes, minimizes, or maximizes an equation that includes the decisions as inputs. This equation is formally known as an objective function.
So in our case, the decisions we need to make are one, how many distribution centers should open and where should we locate them? And two, what customers should align to each distribution center? You can see that these decisions, they're interconnected because I can't align a customer to a distribution center in a certain city unless I also open a distribution center in that city. And the limitations on our decisions or our constraints, they're one, that all customer demand must be met. We may have a minimum or maximum number of distribution centers. We may really only realistically open distribution centers in certain locations. We may have a maximum volume per distribution center. And for certain reasons, we may put a maximum distance between a customer and its servicing distribution center.
So we'll shortly transition to the Shiny app I built for the webinar. But first I'll introduce this scenario. You are a supply chain data scientist at Heartland Widgets Incorporated. Your company is the market leader in widget sales across an eight-state area of the American Midwest. The pandemic-induced shift in consumer demand from services to goods, as well as stay-at-home orders, has led to a boom in demand for your company's flagship product, the at-home widget. Faced with record demand, the Director of Supply Chain Network Planning has asked you to, quote, reimagine the company's distribution network. You are to recommend where to open Heartland's distribution centers, as well as the customer-to-distribution-center alignment, to achieve a minimal cost.
Heartland Widgets has customers in 117 Midwestern cities, spread from the Dakotas in the west, all the way to my home state of Indiana in the east. Each city's demand is proportional to its population. So, for example, Chicago, Illinois, which is a big city, you may have heard of it, has more demand than Fargo, North Dakota. Heartland is considering 20 cities for potential distribution center sites. Because each distribution center will cost $1 million to open, it may not be cost effective to open distribution centers in all 20 cities, but we're keeping our options open.
So, let's get a little bit more technical here. The decisions we need to make for Heartland Widgets can be represented by variables. When we ask, what customer should we align to which distribution center, this is equivalent to asking a set of yes-no decisions that must be made in concert. For each customer city I, should I service it by a distribution center in city J, yes or no? In making our decisions, we're limited to having exactly one yes per customer city. It has to be serviced by exactly one distribution center. Relatedly, we need to make decisions about how many distribution centers to open and where to locate them. This is a series of if-then decisions related to our alignment decisions. If at least one customer is aligned to a distribution center in, for example, Omaha, Nebraska, then we must open a distribution center in Omaha.
Mathematical optimization and, more specifically, the techniques of linear and integer programming are ideal for making these kinds of interrelated decisions. When we use mathematical optimization, including for supply chain design, we're trying to minimize or maximize some equation that is a function of our decisions. In this example, Heartland Widgets wants to design the supply chain with the lowest total cost. The supply chain costs in this example are, one, the opening costs for distribution centers, which are priced at $1 million each, two, the handling costs at the distribution centers, which are one cent per unit, per widget, and three, the transportation costs from distribution centers to customers, which are five cents per unit-mile.
Now, a unit mile is moving one unit one mile. So, for example, if you have two widgets, two units, and you move them 100 miles, that's two times 100, 200 unit miles. At five cents each, that's $10 to move two units 100 miles. You add up these three components, cost components, and that leads to the total supply chain costs that we're trying to minimize.
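The unit-mile arithmetic above is easy to check in R. This is just an illustrative sketch using the rates quoted in the talk; the variable and function names are my own, not the app's:

```r
# Cost rates quoted in the talk
open_cost_per_dc   <- 1e6    # $1,000,000 to open each distribution center
handling_per_unit  <- 0.01   # $0.01 per widget handled
rate_per_unit_mile <- 0.05   # $0.05 to move one unit one mile

# Two widgets moved 100 miles = 200 unit-miles = $10 of transportation
units <- 2
miles <- 100
unit_miles     <- units * miles                    # 200 unit-miles
transport_cost <- unit_miles * rate_per_unit_mile  # 200 * $0.05 = $10

# The three components sum to the total cost we're minimizing
total_cost <- function(n_dcs, total_units, total_unit_miles) {
  n_dcs * open_cost_per_dc +
    total_units * handling_per_unit +
    total_unit_miles * rate_per_unit_mile
}
```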
Now, one of the tenets of supply chain design is that these cost components can be inversely related to each other. For example, the more distribution centers you have, the closer you are, on average, to your customers. And if you're closer to your customers, on average, you're going to spend less on getting the product to them on that transportation cost because it's just closer. However, the more distribution centers you have, the more you spend to open them in the first place. So, there's an inverse relationship between the distribution center opening cost and the transportation cost to get the product to your customer. Optimization modeling helps us figure out these relationships and find the optimal solution that takes into account the relationships between the costs.
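The interconnected decisions and constraints described above form a classic facility-location mixed-integer program. Here is a minimal sketch with the OMPR package, using tiny made-up data in place of the app's 117 customers and 20 candidate sites; the solver shown is GLPK, though any ROI plugin (including SYMPHONY, used in the talk) would work:

```r
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)  # swap in ROI.plugin.symphony to match the talk

n <- 4   # customer cities (toy size; the app uses 117)
m <- 3   # candidate DC cities (the app uses 20)
set.seed(42)
demand    <- sample(100:500, n)                   # units demanded per customer city
miles     <- matrix(runif(n * m, 50, 400), n, m)  # customer-to-DC distances
open_cost <- rep(1e6, m)                          # $1M to open each DC
transport <- 0.05 * miles * demand                # $0.05 per unit-mile * miles * units
handling  <- 0.01 * demand                        # $0.01 per unit handled

model <- MIPModel() |>
  add_variable(x[i, j], i = 1:n, j = 1:m, type = "binary") |>  # serve customer i from DC j?
  add_variable(y[j], j = 1:m, type = "binary") |>              # open a DC in city j?
  set_objective(
    sum_expr(open_cost[j] * y[j], j = 1:m) +
      sum_expr((transport[i, j] + handling[i]) * x[i, j], i = 1:n, j = 1:m),
    sense = "min") |>
  add_constraint(sum_expr(x[i, j], j = 1:m) == 1, i = 1:n) |>  # exactly one DC per customer
  add_constraint(x[i, j] <= y[j], i = 1:n, j = 1:m)            # only align to open DCs

result <- solve_model(model, with_ROI(solver = "glpk"))
get_solution(result, y[j])                         # which candidate cities get a DC
subset(get_solution(result, x[i, j]), value > 0.9) # the customer-to-DC alignment
```

The `x[i, j] <= y[j]` constraint is the if-then linkage described earlier: a customer can only be aligned to city j if a distribution center is actually opened there.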
R packages and the Shiny app demo
All right. So, because this is an R Meetup and I'm going to demo a Shiny app, I want to go over the packages I used. This is a tidyverse-centric solution for data prep, but I also use DT (DataTables) and vroom. shinydashboard is used for visualization, and I use rhandsontable to allow for interactive adjustment of input tables. Leaflet is used as the mapping solution, and the core of the optimization modeling rests with the OMPR package and the open-source SYMPHONY optimization solver. As an aside, the COIN-OR Foundation, the organization that maintains SYMPHONY and other open-source optimization tools like PuLP and Pyomo for you Python users, needs financial help, so please consider donating via that GitHub link.
All right. From here, I'm going to work from the app, but I have slides for backup in case of technical difficulties. I host my app on shinyapps.io. You'll see the address, but to avoid having technical issues, I'd ask that, you know, nobody try to load it while I'm giving the demo.
All right. So, this is a pretty sparse app because it's a fairly simple example. And first up is uploading our data. So, we really only have two sets of input data, our customer information and our distribution center information. All right. So, now I'm going to inspect this uploaded data and talk a little bit through it. First, our customer data. So, our customers are, as mentioned, 117 Midwestern cities. And for each of these, we have city name, state, latitude and longitude, and their demand, which is proportional to their population. And then we have this column called maximum miles to distribution center.
Finally, we have this column called include. And this is a nice thing about using our hands-on table because you can actually dynamically update the table that you use later on in your app. So, everything has a checkmark here. If I unchecked any of these, they would not be used in the optimization model. So, I'm going to display the customer map. You see these 117 cities across the Midwest.
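The "include" checkbox pattern described here, where editing the table dynamically changes what the optimization sees, can be sketched with rhandsontable in a Shiny app. The data, input IDs, and reactive names below are hypothetical stand-ins, not the app's actual code:

```r
library(shiny)
library(rhandsontable)

# Toy customer table; the logical "include" column renders as checkboxes
customers <- data.frame(city    = c("Chicago", "Fargo", "Omaha"),
                        demand  = c(9000, 400, 1500),
                        include = TRUE)

ui <- fluidPage(rHandsontableOutput("cust_tbl"))

server <- function(input, output, session) {
  output$cust_tbl <- renderRHandsontable(rhandsontable(customers))

  # Read the (possibly edited) table back into R before running the model
  active_customers <- reactive({
    req(input$cust_tbl)
    df <- hot_to_r(input$cust_tbl)
    df[df$include, ]   # unchecked rows drop out of the optimization
  })
}

# shinyApp(ui, server)  # launch interactively
```

`hot_to_r()` is the key piece: it converts the browser-side edits back into a data frame, so unchecking a row really does remove that city from the model input.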
Distribution centers, similarly: we have 20 candidate cities across the Midwest. And for each potential distribution center, we have a latitude, a longitude, and a capacity, which is the maximum volume that can be aligned to it. Each of them costs $1 million to open, and the handling cost is $0.01 per unit, as mentioned earlier. And this "allow" checkmark essentially says, do I want to consider this location?
So, finally, we have some inputs here before we run the optimization. So, I mentioned earlier that you can put a constraint around your minimum and maximum number of distribution centers. And so, I'm going to start off by saying I'm going to have a minimum of one – I need a warehouse somewhere – and a maximum of 20. Because there's only 20 options here, having 20 as your maximum is kind of like saying I'm willing to take any of them.
All right. So, when I click Run Optimization, what's happening now is my app is taking the customer information, the distribution center information, and the maximum and minimum number of distribution centers, and forming an optimization model on the back end, on the Shiny app server. It then finds the optimal solution, the minimal cost, and returns the outputs.
All right. So, let's look and see what those optimal solutions are. So, to service all of our customer demand for Heartland widgets, the optimal lowest cost solution is four distribution centers, one in Omaha, Nebraska, one here in Minneapolis, one in Chicago, and one in my old hometown, Indianapolis. You can see the cities that are serviced by each distribution center are represented by their color, you know, red for Omaha and blue for Indianapolis, green for Chicago, yellow for Minneapolis. And they're kind of geographically separated, as we would kind of expect, because you generally want to have your customers closer to your distribution center.
This table is an alignment table, which is just the tabular version of the map that we're seeing above. The farthest distance from a customer to its servicing distribution center is 415 miles, and that's from Rapid City, South Dakota, over here, all the way to Omaha. Take note of that, because we'll actually come back to that in a little bit.
So, what is the optimal cost here? The minimal total supply chain cost in this scenario is $7.5 million, over here. That includes $4 million in distribution center opening costs ($1 million per distribution center, and four of them), $13,000 for handling at the distribution centers, and about $3.5 million in transportation costs. So, our total of $4 million plus $13,000 plus $3.5 million equals $7.5 million, and that is our absolute lowest total cost.
Understanding constraints and trade-offs
Now, I'm going to make a small detour before we get into another scenario to explain more what we mean by constraints in supply chain network design. In optimization modeling, constraints are mathematical limitations on what set of decisions you can make that correspond to real life limitations. The more constraints you have in your model and in real life, the worse your objective function becomes. Optimization modeling, including supply chain network design, is an exercise in understanding the trade-offs involved in making constraints more or less restrictive or tighter, which is the term of art. In designing a supply chain, the tighter the constraints, the higher the total cost becomes. Supply chain design is partly an art around understanding the additional expenses associated with tighter constraints.
Next, we'll talk about what a couple of those constraints may look like. So, first, we see that the lowest total supply chain cost occurs when we have four distribution centers in these cities up here. So, what would happen if we actually force our model to have exactly three distribution centers? How will that change our cost? How will that affect what our network looks like? This is actually quite straightforward to see. All we need to do is go back to our minimum and maximum number of distribution center constraints. So, I'm going to change the minimum to three and the maximum to three.
And now we see that instead of four distribution centers, we have the required three. And it's the same thing, except now Indianapolis is gone. So, let's look at these outputs. Our DC opening costs are now $3 million because we only have three distribution centers and they're $1 million apiece. Our handling cost is the same because it's the same for all distribution centers and we have the same amount of volume. But our transportation cost, which was $3.5 million before, is now $4.6 million. So, while our distribution center opening costs went down $1 million, our transportation cost went up $1.1 million. And so, our total cost went from $7.5 million to $7.6 million.
Next, we'll see what happens when we enforce a maximum distance constraint between our distribution centers and our customers. This is often the case with food supply chains because food is perishable and you don't want your food outside of a climate controlled warehouse for too long before it gets to a store. And so, oftentimes when designing food supply chains, you would put in a distance constraint.
So, what would happen if we did force a 400-mile constraint? We still have four distribution centers as the optimal setting, but instead of a warehouse in Omaha, Nebraska, we would have it in Sioux Falls, South Dakota. Why? Because that is the distribution center that allows Rapid City, South Dakota, over here, which is a very remote customer, to be within 400 miles. And the impact on the cost structure is that instead of $3.5 million in transportation costs, we now have $4 million in transportation costs. So, by adding that additional constraint of 400 miles from your distribution center to your customers, your total supply chain cost rose from $7.5 million to $8 million.
Let's say we actually wanted to see the overall total cost at different number of distribution centers. I went and ran this model, forcing it to have exactly one distribution center, exactly two distribution centers, three distribution centers, all the way to nine. I then recorded the cost components between the handling cost, the opening cost, and the transportation cost. The handling cost in green is flat, no matter how many distribution centers you have, because it's the same volume, just split in different ways. This blue line is our distribution center opening cost, which is linear, because with the more distribution centers, you're spending another million dollars. Your transportation cost in red goes down, because as I mentioned, the more distribution centers you have, the closer you are on average to your customers, so your transportation cost goes down.
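The curve described here comes from re-solving the model with the minimum and maximum DC counts pinned to the same value k and recording the cost components each time. The sketch below uses stylized numbers (flat handling, $1M per DC, transport decaying roughly with 1/k) purely to illustrate the shape of the trade-off; these are not the app's actual outputs:

```r
# Stylized cost components by number of distribution centers (illustration only)
k         <- 1:9
opening   <- 1e6 * k          # linear: another $1M per DC opened
handling  <- rep(13e3, 9)     # flat: same total volume, just split differently
transport <- 14e6 / k         # falls as DCs sit closer to customers, on average
total     <- opening + handling + transport

data.frame(k, opening, handling, transport, total)
k[which.min(total)]           # the cost-minimizing DC count (4 with these numbers)
```

With these made-up rates the minimum lands at four DCs, echoing the talk's result, and the totals for nearby k values are close together, which is exactly the "flat bottom of the parabola" point made in the next paragraph.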
Your total supply chain cost, which is this purple line, is kind of a parabola. The minimal cost occurs at four distribution centers, as we saw before, but between, for example, two and five, there's not a tremendous amount of difference. If I was doing an actual study for a client, I'd be presenting something like this and saying, actually, while four distribution centers is your minimal cost network, between two and five, there's not much of a difference in cost, and there's pros and cons of each of those.
All right, well, that's most of the content, folks, so I really appreciate your time today. I just wanted to wrap up by saying and emphasizing that supply chain management really was a behind-the-scenes corporate function until the beginning of the COVID pandemic, but now it's front-page news. And analytics powered by open-source programming languages, it really is changing the nature of this field. Supply chain network design, it's kind of an obscure part of supply chain analytics, but I find it to be a powerful tool in devising and evaluating the organization's supply chain strategy, and hopefully today you just got a taste of what that might look like.
And thank you for listening. If you'd like to know more, please reach out.
Q&A with Ralph Asher
Awesome. Thank you so much, Ralph. And I just want to reiterate to everyone, I see a bunch of questions coming in through the Zoom chat as well. If you could use the Slido, that would be super helpful, so we can kind of help see the questions versus comments in the chat.
But Ralph, I'd love to ask a few questions that came in on Slido now, and then we can hold a few others until the end as well. But one I see was from Jerry. That is, how large of a problem can the open-source solvers handle from your experience? How often do you encounter models that would require a commercial tool?
Yeah, I mean, I can speak in generalities. I found that it's more just a question of what size of machine you put it on. I've run some pretty large optimization models on open-source solvers. If you have the right hardware, you're either paying for it in your time or paying for it in your dollars, and so if you're willing to wait an hour longer than you would otherwise, it's well worth it. Both in my consulting and in my corporate work, I've not encountered a problem so large that an open-source solver couldn't handle it, like it just wouldn't compute.
Yeah, yeah. And so you can kind of shape your objective function to reflect whatever combined metric you may care about. So in this case, the objective function that you saw was purely a cost function. But I've in the past done things like, well, here's my cost. And here's, for example, a customer service metric. That is also a function of my decision variables. And you can weight them. Now, when you see the output, it's a number, it doesn't really reflect like, you know, dollar bills on the table or anything like that. But it can kind of, and the more heavily you weight cost versus customer success or customer satisfaction, that combined metric can lean towards one way or the other.
Forecasting demand with R — Laura Darby Rose
But with that, I'd love to turn over to our second speaker, Laura Darby Rose. Laura is Manager of Demand at Mallinckrodt Pharmaceuticals, where she's responsible for statistical forecasting, forecast visualization, and forecast accuracy measurement. With that, I'll turn it over to you, Laura.
Thank you, Rachel. Okay, so I'm going to talk about our experience at Mallinckrodt with forecasting demand with R. So we went through a project about a year ago to replace a software-as-a-service product, I will not name that company, my boss has advised me not to name it, with an open-source R solution.
So a little bit about my background. I'm a demand manager, as Rachel mentioned. I work in the supply chain department. And in Mallinckrodt, we have a couple of different divisions. So I work for the specialty generics division. And we make basically generics that you might be prescribed when you go to the doctor or your nurse practitioner or whoever you might go to see. So a lot of pain management, acetaminophen, other things as well. So my main job responsibilities are stat forecasting, and then measuring forecast accuracy.
And I wanted to add a caveat here that I only had about five months of experience regularly programming in R when I started the project. And I share that with you all to hopefully encourage people who are just getting started in R or Python and, you know, thinking about doing a project that will really involve R or Python. If you know enough to write a script, you probably know enough to do a project that will really impact your company.
Okay, so how the project began. So imagine it's spring 2020. Actually, you don't have to imagine, right? It's probably all still fresh in our minds. And you're staying home, being a good citizen, and you're a little bored. So what did I do? I created a little Shiny app, a simple Shiny app for time series forecasting. And shout out to the R-Ladies organization; I actually got the inspiration for the app from an R-Ladies St. Louis meetup. So, you know, I'm working remotely and I want to show my manager that, hey, I'm being productive. So in one of our calls, I showed him the app and he was impressed with it. And he thought, well, maybe we can replace the SaaS that we were currently using for stat forecasting with this app or something like it.
So the SaaS that we were using cost about $200,000 per year, which for my company is not small change. We were definitely in cost-savings mode. It was also generating negative demand forecasts. What do you even do with that, right? We're not going to sell a negative number. The IT support for this company could not figure out what was going on. So it was costing us a lot of time and effort to modify these forecasts.
So after my manager's initial approval, I collected some demand data. We chose about 30 SKUs that we thought were representative of the business. And I did a rolling forecast accuracy analysis. We wanted to have an idea, before we proceeded further, of what kind of accuracy we could expect if we switched to the R forecast package, and specifically the ets function, which does exponential smoothing, commonly used in univariate supply chain forecasting, and also the ARIMA function.
So the ets function was actually comparable to what was used, or what we had enabled, in the SaaS, where various exponential smoothing models were selected by the demand forecaster. So I ran this rolling forecast accuracy analysis. And to give you an idea of how perhaps not efficient I was at the time, and I'm sure I'll be saying the same thing about myself in a couple of years, I had three for loops that I used to do this. And anybody who's programmed in R for a while knows that's generally not the best strategy. But it did work. And we were satisfied that, hey, just setting the ets algorithm to best fit, minimizing the AICc, was comparable to what we were getting in terms of forecast accuracy from our SaaS.
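A rolling-origin forecast accuracy analysis with the forecast package's ets function might look roughly like this. It uses a built-in series for illustration rather than the actual SKU demand data, and the horizon and origins are arbitrary choices:

```r
library(forecast)

y <- AirPassengers
h <- 3                                   # forecast horizon at each origin
origins <- seq(100, length(y) - h, by = h)

errs <- sapply(origins, function(o) {
  train <- subset(y, end = o)            # history up to the rolling origin
  test  <- subset(y, start = o + 1, end = o + h)
  fit   <- ets(train, model = "ZZZ", ic = "aicc")  # auto-select by AICc ("best fit")
  fc    <- forecast(fit, h = h)
  mean(abs(fc$mean - test) / test)       # MAPE-style error for this origin
})

mean(errs)   # average out-of-sample error across all rolling origins
```

The same loop with `auto.arima(train)` in place of `ets(...)` gives the ARIMA comparison, which is essentially the evaluation described in this section, just without the three nested for loops.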
Also, the benefit was, hey, we didn't need manual intervention from the demand analyst or manager. So we made the decision to move forward with this project. And then we had to get the buy-in from the IT group at Mallinckrodt. It wasn't difficult to get buy-in for a cost-savings project. They were also frustrated with the lack of support that we were getting from the SaaS company, and with the time we were all spending on essentially useless tasks.
Setting up the enterprise R environment
So up until then, I had written some other scripts that I used for various data wrangling tasks, the kind of stuff you do in supply chain analysis, right, or demand planning, which is my specific area of expertise. And we didn't want to just have me running scripts on RStudio Desktop, as useful as the free version of RStudio is. We wanted more enterprise-capable options. So we looked into RStudio Server Pro, and I believe the most recent version of that is called Workbench. Then we also contracted with Lander Analytics, and they helped with server setup and maintenance and then application installation. So it is a Linux server, for those of you who might be more Windows people. And at Mallinckrodt, we're more of a Windows shop, is what I've been told by our IT group, so definitely needed a little bit more guidance in setting that up. And we had both a production and dev server that were supported with them.
Okay, so we're moving forward in this project; we made the decision to go forward. The IT analyst who was responsible for maintaining the data used for the SaaS forecasting needed to collaborate with me to replace the data management. So not the forecasting, but those of you who might have used a SaaS at your company for forecasting or other things know that it can sometimes perform functions beyond the main one you purchased it for. So we needed to build in-house solutions.
So we had tables for both cleansed history and non-cleansed history. Cleansed meaning removing outliers; sometimes you might decide not to include all history for consideration in the stat forecast model, because there might have been a structural change, something like that. So we wanted to move away from Excel spreadsheets. There is definitely a time and a place for spreadsheets, but we just didn't feel that was in the best interest of the business. So we used the DBI package, which probably many of you are familiar with, and then the odbc package. And just using some basic SQL, we got those set up, and it was very easy to connect to the AS400 system and pull in the tables that I would need to estimate models and calculate forecasts.
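A minimal sketch of that DBI pattern; here an in-memory SQLite database stands in for the AS400 connection (which in production would be something like `dbConnect(odbc::odbc(), dsn = "...")`), and the table and column names are hypothetical:

```r
library(DBI)

# Stand-in for the AS400; in production this would be an odbc connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical cleansed-history table
dbWriteTable(con, "cleansed_hist",
             data.frame(item   = c("SKU1", "SKU2"),
                        period = c("2021-01", "2021-01"),
                        qty    = c(120, 85)))

# Basic SQL pull, as described above
hist <- dbGetQuery(con, "SELECT item, period, qty FROM cleansed_hist")
dbDisconnect(con)
```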
We also changed our outlier cleansing process, as I alluded to in the second point, because we had previously cleansed the history directly in the SaaS application. So we just moved to a pretty simple process with an Excel upload: if you needed to remove an outlier, you'd upload an Excel file to our AS400 system. We're actually changing that process now, and I'll get to that later. But it worked in the initial stages, and sometimes you have to go with that.
And then we also needed a table to store the stat forecasts generated from the R script. So as I said before, no Excel files involved; we were going to write that to a table. And then we also needed another table for storing historical forecasts, and that is to measure forecast accuracy.
Okay, so we were able to use RStudio Server Pro for other uses besides stat forecasting. As I alluded to earlier in my presentation, I had written other R scripts to do a lot of data analysis and wrangling tasks, the kind of things you don't want to spend a lot of time on if you can avoid it. Measuring weighted MAPE, which is what we use to measure our forecast accuracy. Checking the commercial forecast upload versus the download. I am responsible for loading the commercial partners' forecast to the ERP system, and depending on how things go, we might have phantom forecasts hanging out in the ERP system.
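As an aside, a volume-weighted MAPE can be computed in a few lines of base R. This is one common definition and an assumption on my part, since the talk doesn't spell out the exact weighting used:

```r
# Volume-weighted MAPE: weighting each SKU's absolute percent error by
# its actual volume simplifies to total absolute error over total actuals,
# so large SKUs dominate the metric
weighted_mape <- function(actual, forecast) {
  sum(abs(actual - forecast)) / sum(actual)
}

weighted_mape(actual = c(100, 400), forecast = c(90, 380))
# 0.06, i.e. a 6% weighted error
```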
Also, last year, well, I said last year, I guess it's almost two years ago now, we had a lot of orders for acetaminophen with the start of the pandemic, and our product director needed some help in understanding how to allocate orders versus forecasts. So it was very simple to set these jobs up to run on the server using cronR, which is, as you would expect from the name, an R package that interfaces with cron.
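A sketch of scheduling a job with cronR; the script here is a throwaway placeholder, and the frequency, id, and description are illustrative:

```r
library(cronR)

# Placeholder script; in practice this would be the forecasting
# script sitting on the server
demo_script <- tempfile(fileext = ".R")
writeLines('cat("forecast run\\n")', demo_script)

# Build the shell command cron will execute
cmd <- cron_rscript(demo_script)

# Register a monthly job; guarded so sourcing this sketch does not
# actually touch the server's crontab
if (interactive()) {
  cron_add(cmd, frequency = "monthly", id = "stat_forecast",
           description = "Monthly statistical forecast run")
}
```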
So the scripts save a lot of time. With all these jobs set up, we could compile a lot of data reports with no intervention, or maybe just a quick check from the demand managers. And the real test was when I was out on short-term disability this past summer. The jobs ran well, and I saved my coworker a lot of time in having to cover my responsibilities as well as his own.
My coworker left the company in November, so I've since been taking over his former duties. And I've been writing various R scripts to hopefully improve some of the processes. Scheduling these jobs on the RStudio Server Pro interface has really made it possible for me to do my job as well as his previous duties.
Project outcomes and future improvements
Okay, to summarize the project, you're probably wondering how long all this took. It took about 11 months. And I will say this: you could probably do this kind of project, where you totally replace the software you've been using, in less time. We happened to have a lot of time based on when the contract with the SaaS was going to be up, so we had several months in early 2021 where we were running both concurrently.
There are definitely improvements to be made to the forecasting processes, and I can tell you that as well as anybody at the company; that's what I deal with day in, day out. But the initial goal of replacing the SaaS with a solution at least as effective was completed, so we consider the initial stage to be a success. We measure on lag-three forecast accuracy, which, the way we count it, is essentially more like lag four. Accuracy has been about the same, depending on the month a little bit worse, a few percentage points. But we did meet our goal for 2021. So my boss is happy, and I'm happy.
And the cost savings is definitely worth a slight trade-off in accuracy. We are still using pretty simple models, like ARIMA and exponential smoothing. So we hope that as we introduce new models, we will improve accuracy over time.
And then I alluded to this earlier in the presentation. I started out with the forecast package in summer 2020. And then I found out, I guess it was probably old news, but it was new news to me at the time, about the tidyverts suite of packages, which is kind of the replacement for the forecast package. So I switched the code I had written over to that, and it took a little while to make sure I understood everything. But it's a great suite of packages that includes fable, feasts, tsibble, and tsibbledata, I think. I have found those to be really useful for automatic forecasting, if you have a lot of SKUs and you don't have time to go in and tweak parameters, figure out the best ARIMA model or the best exponential smoothing model, for instance, for univariate forecasting. The forecasts these models produce for supply chain are stable; they don't require a lot of intervention, and you're not going to get anything unbelievable or crazy.
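A hedged sketch of the automatic fitting fable enables; the data here is simulated, and in practice the tsibble would carry one key per SKU so `model()` fits every series in a single call:

```r
library(fable)
library(tsibble)

# Simulated monthly demand; real input would come from the AS400 tables
set.seed(1)
demand <- tsibble(
  month = yearmonth("2018 Jan") + 0:47,
  qty   = 100 + rnorm(48, sd = 10),
  index = month
)

# Fit ETS and ARIMA automatically, then forecast three months ahead
fc <- demand |>
  model(ets = ETS(qty), arima = ARIMA(qty)) |>
  forecast(h = "3 months")
```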
Okay, so I mentioned the Shiny app earlier. The impetus of the project started with a simple time series forecasting app. We made the decision to run the forecast script as a scheduled monthly job instead of an app. Part of the reason for that was it seemed simpler at the time, a time savings. Also, it worked out because we weren't really ready to go with RStudio Connect at the time, which we would have needed to host the app in an enterprise-capable way.
After that, I developed another Shiny app, kind of based on the first one, which uses a number of different packages: the tidyverts packages, of course, and DT, which I think Ralph mentioned in his app, to edit history and change parameters. It basically interfaces with our AS400 system via SQL, so through the app you can interactively edit the data in our company's systems. And this allows me, especially for those high-volume, really important SKUs, to go in and pick out the specifications I think are optimal.
So, future state. It's always good to be thinking about continuous improvement, right? I'm working on another Shiny app to visualize our generics data, that's our finished dosage data, with a pivot table, probably using the pivottabler package; I'm not sure yet, I might look into a few other packages. And then we'd look at either monthly or weekly doses or bottles, depending on what the user wants. I currently use a very slow tool on a legacy system I will not name. We use it for creating visualizations for the demand review slide deck, where we look at our customer demand, compare that to our forecast, see what's changed, look at our financials, that kind of thing.
And then another project we started last month is looking at the commercial team's forecast. They currently don't use much statistical forecasting, but they are interested in looking at their end-purchaser demand and how we can roll that up under distributor customers. Mallinckrodt is a manufacturer, so we sell to some big-time distributor customers you may have heard of in the news. And there are certain customers that have pricing contracts. Those distributor customers then sell to retail pharmacies and the like. So we're hoping to get a better picture of our demand, to tie that out and build a better forecast for them in that way.
And then in terms of model improvement, those of you who are familiar with the tidyverts packages probably know that you can do hierarchical forecasting as well. We explored that a little in the initial stages of the project, when we were trying to understand how much we wanted to experiment with different kinds of models. My boss thought it wasn't a good idea at the time, but now that we've completed the initial stage, we'll probably look into it a little more. And then also machine learning methods for time series. We would need to do that on weekly data; we don't have enough monthly data to really make it worth it. But modeltime is a good package, so that is something we are probably looking into exploring this year.
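For reference, a sketch of the hierarchical route in the tidyverts packages; the two-SKU hierarchy and all names here are hypothetical, and real data would have many SKUs, possibly nested under product families or divisions:

```r
library(fable)
library(tsibble)

# Hypothetical two-SKU hierarchy with simulated monthly demand
set.seed(1)
demand <- tsibble(
  month = rep(yearmonth("2018 Jan") + 0:35, times = 2),
  sku   = rep(c("SKU1", "SKU2"), each = 36),
  qty   = 100 + rnorm(72, sd = 10),
  index = month, key = sku
)

fc <- demand |>
  aggregate_key(sku, qty = sum(qty)) |>    # total plus per-SKU series
  model(ets = ETS(qty)) |>
  reconcile(coherent = min_trace(ets)) |>  # reconciled, coherent forecasts
  forecast(h = "3 months")
```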
So to summarize. A lot of what's on this slide is common sense, but I think it's worth reiterating. If you're interested in doing a project where you replace a SaaS with an R solution, start sooner rather than later. Something we learned, sometimes the hard way, is that the SaaS will perform other functions besides forecasting, so you need to come up with solutions for data storage and management. Those solutions might not be R-specific, but they need to be compatible with whatever you're doing in R for forecasting. Also, project management 101, I guess: make sure you have an estimate of the time involved on your part, as well as that of your collaborators, because you don't want to come up to the end of the project and find out they don't have enough time to give you to complete it.
And then, I wish somebody had told me this ahead of time: learning some project management skills is helpful. Probably a lot of us on this call have good technical skills, but when I was trying to map out this process, which is my subsequent point, I really had no idea what I was doing. It got done, but it could have been a lot more efficient.
Okay, and I feel like I might be a little bit of a broken record here, but you don't want to realize you haven't accounted for something you previously relied on the SaaS to do when you've almost completed the project. And as you'd probably expect, we learned that the hard way. My coworker used the SaaS for something, and I did not know exactly what he was doing. I think there was a miscommunication between him and perhaps an IT analyst, not to throw blame, but we realized, oh, we never accounted for this, and now we're practically finished with the project. We needed to make sure we had something that would allow him to do what he did with the SaaS. So you don't want to repeat our experience there. And then I'll finish, hopefully not on too pessimistic a note, with a paraphrase of Murphy's Law: if something can go wrong, it will. So plan for all the contingencies. And since we're all in supply chain planning, that shouldn't be too hard.
Q&A with Laura Darby Rose
Yeah, I can help with reading this. But thank you so much, Laura. That was an awesome presentation. Really appreciate it. But I did see one question that just came through quickly on the chat was, if you could please repeat the name of the package that you used, which allows the user to edit data interactively.
Oh, sure. Sure. DT. And Ralph mentioned this in his presentation: rhandsontable is another good one. And there are other ones out there. I was familiar with DT, and I had already gone down that path, so I was just like, I'm sticking with it. If I were building it from scratch, maybe I'd use rhandsontable. But yeah, there are several good packages out there.
Demand data. So we have two divisions. Yeah, invoice sales, and it's by request date. So it's not shipment; it's supposed to be a representation of what our true demand is, not taking into account any supply issues. We sell business to business, so we supply a lot of the pharmaceutical companies you've probably heard of in the news.
So R and Shiny is a clear winner for me. I have used a commonly, I can't name names, but a commonly used data visualization tool, the click-and-drag kind of thing, and I just don't find it suits me. I like to know what's going on behind the scenes, and when I'm writing that Shiny code, I know how things fit together. If I were using Tableau or another tool, and I know there are beautiful visualizations that can be done in Tableau, I wouldn't have the same sense of control. So that's my personal preference, and I'm not saying it's the only right one.
Thanks, Laura. And I'm going to try and also pull people into the discussion and allow people to unmute themselves. I know this is dangerous with 300 people. But Eduardo, I saw you had a great question that you put into the Slido as well, and I was wondering if you'd want to read it out loud and add some context. I can start by reading the question, too. It was: Laura, did you have to deal with uploading the forecast back to the ERP? If that was the case, were there any guidelines you had to follow with IT?
So you're talking about the stat forecast, I assume? I know I mentioned doing a check of an upload, so I'm assuming you mean the stat forecast, because that's what the models estimate and calculate. The nice thing about designing a system on your own, and again, maybe I'm a control freak, but I hope not, is that I had a chance to work with the IT analyst. He and I have a very good working relationship, so I could say, okay, this is the kind of table we want to build; I need this field, this field, and this field. So we had a custom solution built. I think we had to have permissions set so that you could write to the table with that account, which is a pretty simple fix. So, no, they were basically like, what do you need from IT? And I said, here's what I want, and they said okay.
Okay, yeah, I kind of did allude to that, I guess. This is something the IT team handles, though I guess I could theoretically do it in R. At the end of the month, before the table that receives the forecast is cleared, that forecast, all 67,000 lines of it now or whatever (we forecast three years out; nothing magic about three years, that's just how we do it), is copied to another table where I think we store 13 months of historical forecasts. And then when I measure statistical MAPE, the script pulls from that historical table and filters. I use some of the DBI interfaces, and in the parameter argument you can say, I want to pull in based on this criterion, so you pull in whatever might be your lag-three forecast or lag-one. That's how we measure it. But yeah, we do track our stat forecast accuracy pretty carefully.
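A sketch of that parameterised pull, again with an in-memory SQLite table standing in for the AS400 and hypothetical table and column names:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical historical-forecast table (13 months in the real system)
dbWriteTable(con, "hist_forecast",
             data.frame(item = c("SKU1", "SKU1", "SKU1"),
                        lag  = c(1L, 2L, 3L),
                        qty  = c(95, 102, 110)))

# Pull only the lag-3 forecast via DBI's params argument
lag3 <- dbGetQuery(con,
                   "SELECT item, qty FROM hist_forecast WHERE lag = ?",
                   params = list(3L))
dbDisconnect(con)
```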
Open discussion and working group
Great. Thank you so much, Laura. And I see there are a ton of questions that have come in here too. So a few of them don't say if it's for Ralph or Laura either or both. So I
