Scaling a shiny app to 100K monthly users | Tan Ho | Data Science Hangout

Nov 5, 2025
54:10

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12pm US Eastern Time. If you are not joining us live, you're missing out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

Can't wait to see you there. I am so excited to introduce today our featured leader, Tan Ho, Engineering Manager at Teamworks Intelligent Soccer, and you might also know Tan as a person who volunteers frequently at DSLC, Data Science Learning Community, and also is a huge open source contributor. So Tan, I would love it if you could introduce yourself, tell us a little bit about you and something you like to do for fun.

Yeah. Hey, everybody. Wow, there's a lot of people in here. My name is Tan. I am based in Ottawa, Canada. I've been writing R since 2019. My background is in business: I studied commerce as an undergrad, then I went into my family business, which was property management, and basically hated that job. Along the way, I started doing hobby stuff in fantasy football, building spreadsheets, doing sports analytics. That kind of snowballed: as I got more and more skilled at spreadsheets, the demand for doing some programming to support it increased, and eventually that led me to start working on programming and fantasy football packages and so on. Then I lost my job in property management and found a new job as a construction data analyst. That has since snowballed into a position with Zelus, which was acquired by a company called Teamworks. And now I work professionally in sports analytics. So that's the quick rundown. And you asked about something fun I like to do. This time of year is definitely prime pumpkin carving season, and I am starting to figure out what pumpkins I want to carve this year. Generally I try to carve at least one; I think this year I want to try for two. I haven't really figured out what they are going to be yet, but I have a couple of ideas. So I'll definitely take some suggestions if you have any.

Background: from Excel to Shiny

Okay, let's dig in just a little bit more on that background, because I want everybody to understand just how amazing it is that you went from a previous life, shall we say, doing something you hated, to data science. You built something in Excel with Power Query and stuff, right? Not even in R. You did not know R, you did not know Shiny. You built it, and you got such a following on Reddit and so many people using your thing that you were like, we need to scale this up, right? How amazing is it that that's how you learned R? Tell us a snippet of that.

Yeah, it's really crazy because I wasn't a programmer by trade, right? I went to business school, I was good at Excel, but I wasn't really sure what I wanted to do with myself aside from that. So I started trying to connect to APIs through Power Query, which is the Power BI querying and data manipulation language. That was kind of my gateway drug into some of the programming stuff. Because at this point I had, I want to say, over 20 fantasy leagues, and trying to analyze all of them basically just needed scale, right? When you have one league, you can just do it manually. But when you get to 20, you're like, okay, now you're a programmer. So I put that all together, put it on Reddit, and people started asking me, hey, can you make me a sheet just like this? It works fine the first one or two times, but Power Query at that time was basically just on Excel for Windows. So it was kind of challenging because there wasn't a Mac version of this, and there wasn't one on mobile or whatever, right?

And so I built enough of a following. One of my friends, Joe Sydlowski, who I also happened to meet on Reddit through a fantasy league, basically just offered to write me a Shiny app. He was learning R in school, so he had learned R and a little bit of Shiny at the time, and he wrote the very first version of the Shiny app. He basically built a front end that hooked into my backend, my spreadsheets and data pulls and so on, and put it on the internet. And it basically exploded. He was a grad student at the time. I was working as a property manager and just doing this as a hobby, still mostly with Excel. So he couldn't keep up with the feature requests. And I was like, well, dude, this thing's broken, you need to do this, it needs to do that. I just became that really pesky project manager. So one day he just gave up and was like, just go do it yourself. Right? And I was like, okay, sure. Where do I go? What do I do? So my very first experience with R was basically bug fixing and editing the Shiny app. It didn't use tidyverse; it just used Shiny, and it was this awful base R thing. He'll forgive me for saying this. But I basically taught myself with this app, because I needed it to do this and that, and I taught myself Shiny first. Then I taught myself tidyverse, and then eventually rewrote the whole app to use tidyverse and built the data pipeline that went with it. That was our first big app. So my gateway drug to R was a Shiny app that did fantasy football, and I started expanding from there.

Scaling to 200K monthly users

Wait, wait, wait. Important point. How many users did you have? Because I have worked on apps professionally that I've had like seven users if I'm lucky.

Yeah. I mean, it got hugged to death basically immediately when we put it up, and that's Joe's first version.

How many thousands of users, Tan?

At its peak, I think it hit 200,000 monthly users on the Shiny app itself, which was spread over, I think, three app instances at that time. That's Google Analytics monthly users. When people talk about Shiny not scaling, I laugh, because I have firsthand experience running an app out of my garage, essentially, with hundreds of thousands of users, and pictures to prove it and everything. And to this day, it's at about a hundred thousand users.

There were a lot of faces in our grid of people who were like, oh my gosh, that's amazing, and I've got my little, you know, seven users. Libby, I've been there. Yeah. I mean, I think that getting people to use something you built at a business, like a dashboard or something, is really, really hard. I'm going to put a link in the chat here. Everybody go save this talk for later so you can watch it and get a more in-depth view of what Tan's talking about, this Shiny app that was serving a hundred thousand monthly users.

Open source work and professional skills

But my question is this. Obviously I've been a huge fan of the open source work you've done, especially around sports analytics and your contributions to the nflverse. Even though I'm not primarily a football fan, I love the tooling around it. I'm just curious, especially for those who may be wondering this out loud, so to speak: how did the techniques and skills you learned in crafting a lot of the tooling behind nflverse translate when you got a new role developing these sophisticated R tools or programs? What techniques, tools, and ideas that you learned from your open source work have directly benefited your professional day-to-day?

Yeah, it's interesting, right? I work mostly on internal tools. And I think a lot of people will share this experience as R production engineers: your user base, especially with Shiny apps, but even just pipelines and stuff, is mostly internal. But all of the things that I learned working on DynastyProcess I use every day, right? Learning to run your own server, how do you scale things, and obviously performance. I write fast code because I had to, right? I had to write code that serves that many people. But I think the other thing that it really taught me was the interface part, getting people to adopt what you build. The parts about the packages are very interesting because I'm on a team now of essentially 10 R coders. And what nflverse and open source packages have taught me is to really think about the way you design that interface, trying to allow people to go through and do whatever it is they need to do. Building the tooling, and coming at it as a tools process, is part of that, right? Because the temptation, if you only write analysis for yourself, is that you write what you need right then. And almost by default now I will ask, okay, what are the building blocks that make sense for this general process? So I come at it a little more procedurally, a little more like, what's the interface going to be, whether that's an app or a package? What are the user stories? What are the expectations?

That's the main takeaway across the board. Obviously the process of designing packages also gave me the branding that helped me get the job and all that kind of stuff. But as a whole, I would point back to that main idea: think through the user interface story and how people are going to work with your work.

Working out loud and building a personal brand

I think also it's really important to point out just how much Tan worked out loud. You started learning R in 2019, I think, and then during the pandemic you started Twitch streaming your coding, which means that you were working out loud very early, probably before you felt like you were a confident coder. Do you feel like that had a big part to play in your confidence level and in building a personal brand and being recognized?

Definitely. The streaming has paid off in a whole bunch of ways, but the number one way is this ability to talk through things when you get stuck, or pair program through them. When you run into a bug, there's no panic. Especially if something's broken, it's often quickest to just think through what the problem is and solve that thing. The streaming for sure was a big part of that. I also do it asynchronously in forum questions, where I try to think through other people's questions and help build those mental models and so on. The branding part, for sure. The part with a hundred thousand users, I think that's a different set of people, right? Those are people who are mostly interested in fantasy football. But the packages work, working out loud, building public tools, that was a lot of what built towards the public coding reputation. And trying to be approachable, trying to pay all of that forward, is one of those big motivators that has just had a ton of benefits every time you turn around.

Defining "code in production"

And one of them is a great question. The part about code in production is hard for data scientists sometimes because it's hard to visualize what production is, right? The way I like to think about production is: you have an analysis, and it becomes something in production when someone uses it to make a decision. If you put that analysis into someone's hands and they're looking at it and doing something based on what they're reading from you, that becomes production. Even if you just made a plot to be pretty or to communicate something, the moment someone reads it and thinks that they need to go do something, that is the start of the production pipeline, if you will. And then it's like, okay, they made that decision once, but they will probably want to do that repeatedly. So then you start thinking about reproducibility, pipelines, and consistency. But I think in general, everything that influences somebody to make a decision from your analysis is production.

And so I want to call it moving the goalpost: away from this mythical idea that code in production is an app, it's a report, it's a... no. The data science part is in production when someone reads it and makes a decision from it. So hold yourself to production-level standards if someone is going to do something with your code. If you're just writing it for fun... My very first package, pseudo package essentially, was a dad joke API app or whatever, right? That's not production. No one was going to make a decision from that. That would be something I would consider a toy. But if someone was going to do something with my data viz, with my report, with my analysis, that's production. Anytime you get into something like that, that's when you need to hold yourself to whatever your standard of production is. And those standards will grow, and your competencies will grow, as you work through that process. So my definition of production quality might be different from yours, or from any other person's, but when to apply that standard is the thing I think people should do more of: anytime someone makes a decision, you should apply it. That's when you need to put it in a package. That's when you need to write functions. That's when you need to document. That's when you need to leave notes for yourself, right? All of those things. Once someone makes a decision with it, that's production. So hold yourself to production standards at that point.

Shiny resources and package development

Yeah. And this is for everyone. So, yeah, those resources definitely didn't exist. Mastering Shiny is an awesome resource. I'm a person who learns by projects and by helping people answer questions. Those are the two main ways I think things through. I'm very case study, situational type stuff. I'm bad at textbooks. But I know there are a bunch of good resources. Mastering Shiny is probably the book that gets you from zero to 80 these days, and from there you start getting more advanced at things. There's a great book by David Granjon, who's basically the god of Shiny UX and UI, called, I think, Outstanding User Interfaces with Shiny. I have read that and I peruse it every now and then. I think that's a super useful way to think about those things.

On the backend stuff, my thing about the Shiny backend is that people are overcomplicating it in most cases. I think your backend should be less complicated: move that stuff into a package is my main thesis there. So I would encourage people to go to R Packages next, actually, in terms of putting Shiny apps in prod. Those are kind of the two places that I go to most. I advise working on projects, learning something, figuring out what you need at that moment. And then also just going around and looking at other apps and other websites, right? If you put yourself into that user brain, that'll teach you a lot of design type stuff. People will sometimes ask, how do you become a good app designer? I'm a terrible one, but the idea is that there are a lot of people working on really good stuff. How do you emulate this point, click, information kind of thing? So those are the main things I think about for sure.

Yeah. And I think that, um, volunteering can also help. Like if you have a buddy who's working on a Shiny app volunteer to work on it with them, you might learn things. Um, like I didn't learn modularization in Shiny apps until I was working on one for the first time. And I was like, I guess I'm going to learn modules now because I have to do them. I think that, um, the working out loud part can be with buddies. It doesn't have to be alone. You can, you can do it with friends.

And you mentioned going into packages next as a bit of advice. I know that Travis had asked in the chat, what's the trigger point where you decide to bundle several packages or products into one thing? Like, nflverse is now a lot of things, kind of like tidyverse is a lot of things. How big did it get before you decided to put everything in one?

I start with everything as a package now. This will sometimes seem like dogma, where you'll see people tell you, just start everything as a package. But there are a couple of compelling reasons for me why everything as a package is just the right way to think about things. One of them is that there's a bunch of resourcing built around packages, right? A package is a collection of functions to me. So as soon as you have more than one function, you have a package, in my opinion, and you can just set that up right away. The definition of a package as just a collection of functions also means that's when you can start thinking about, how do I test this? How do I document this as a function? How does someone use this code? Right. People think it needs to meet the whole CRAN bar of standards, which I don't agree with. A package can be just a collection of functions with documentation and maybe tests, and that's about it. So when should it be a package? From the start is really my answer.

Scaling Shiny: performance and architecture

Yeah. So the most popular app is essentially a trade calculator. As a brief explanation of fantasy football: you get players on your team and you get points from them doing things on the field. Each person in a league will have players. And the most successful app is basically just, I want to trade this player for that player and this other player, is this fair? So the individual requests were quite quick, right? Some of the math behind it I could pre-calculate and all of that stuff. Each user session is quite short, and their interactions make for very small request turnaround times. Some thoughts around that: move as much of the logic out of the app as possible. If you can pre-compute it, great. If you can cache common requests, that's great. If you can simplify or reduce the interface so that users only really get to play with a limited set of things, then you do a little less, but you do that thing really well. And the other part is speeding up the actual query. We wrote it in data.table, for instance, and that scales well.
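As an illustration of the pre-compute and cache advice, here is a minimal sketch. It is a Python analogue, not the speaker's R/data.table code: the player names, the values, the 5% fairness tolerance, and the `evaluate_trade` and `side_value` helpers are all hypothetical, chosen only to show that a request can reduce to table lookups plus a cached sum.

```python
from functools import lru_cache

# Hypothetical player values. In a real app these would be pre-computed
# offline by a data pipeline and loaded once at startup.
PLAYER_VALUES = {
    "Player A": 9500,
    "Player B": 6200,
    "Player C": 3100,
    "Player D": 450,
}


@lru_cache(maxsize=10_000)
def side_value(players: tuple[str, ...]) -> int:
    """Total value of one side of a trade, cached per unique combination."""
    return sum(PLAYER_VALUES[name] for name in players)


def evaluate_trade(give: list[str], receive: list[str], tolerance: float = 0.05) -> dict:
    """Compare both sides of a trade using only lookups and cached sums."""
    # Sort so the same players in a different order share one cache entry.
    give_total = side_value(tuple(sorted(give)))
    receive_total = side_value(tuple(sorted(receive)))
    larger = max(give_total, receive_total)
    gap = abs(give_total - receive_total) / larger if larger else 0.0
    return {"give": give_total, "receive": receive_total, "fair": gap <= tolerance}


if __name__ == "__main__":
    print(evaluate_trade(["Player A"], ["Player B", "Player C"]))
```

The request handler does no modelling at all; anything expensive happens before the user arrives, which is one way to keep each round trip small.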

And so the reputation about Shiny and scaling is definitely partially about how long the session is. Here the average session is like 30 seconds or a minute or whatever. They get the answer and then move on. If the app is not doing any thinking, it can serve a lot of users really well, right? When everybody's idle or roughly idle, it's not that huge a pain to have everybody opening the app at once. What matters is how many simultaneous calculations you can have at once. That's where you get into promises, or horizontal scaling and pushing everybody to different instances of that app. But as a general thing, the reason why it's so successful is that it's small and does a very short calculation that returns a thing. That's the main performance consideration: making each user's round trip very, very small.
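The point about simultaneous calculations can be sketched as follows. This is a Python `asyncio` analogue and not Shiny's promises API: `slow_calculation` is a hypothetical stand-in for blocking work, simulated with `time.sleep`, and the worker counts are arbitrary. It only shows that total time depends on how many calculations can run at once, not on how many sessions are open.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_calculation(x: int) -> int:
    """Hypothetical blocking computation, simulated with a short sleep."""
    time.sleep(0.1)
    return x * 2


async def handle_request(pool: ThreadPoolExecutor, x: int) -> int:
    # Hand the blocking work to the pool so the event loop stays free
    # to accept other sessions while this one waits.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, slow_calculation, x)


async def serve(n_requests: int, n_workers: int) -> tuple[list[int], float]:
    """Run n_requests at once on n_workers; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = await asyncio.gather(
            *(handle_request(pool, i) for i in range(n_requests))
        )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 4):
        _, elapsed = asyncio.run(serve(8, workers))
        print(f"{workers} worker(s): {elapsed:.2f} s for 8 simultaneous requests")
```

With one worker the eight requests queue behind each other; with four they finish in roughly a quarter of the time. Threads help here only because the simulated work sleeps; genuinely CPU-bound work would need separate processes or separate app instances, which is the horizontal scaling case.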

Writing fast code and benchmarking

Yeah. In general, when you're interested in code timings, you either approach it from the inside, like a single function, or you approach it as code performance in the bigger system. The technique I use the most is the bench package, to benchmark that specific set of things. That's good too because it also gives you a nice estimate of memory usage, and it will run multiple iterations of the code rather than just once. So the bench package is my go-to for timing a single function. And then for bigger systems, I actually just use log messages with timings most of the time. It's hard to get a sense of where something is slow, so you stick a bunch of log messages in. I'm excited about the new OpenTelemetry stuff that Barret is working on, but that's a similar thing where you're basically logging code block timing. Those are the two things that I use the most. But in general, I don't really optimize something until I see that I need to. That's where I start with logging, and I would be keen to try OpenTelemetry for that kind of stuff. Once I get into that, I can identify where I should spend time to make things go faster.
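The two approaches described here, benchmarking one function over repeated runs and logging how long code blocks take in a larger system, might look like this in Python. This is a sketch using the standard library's `timeit` and `logging`, not the R bench package, and unlike bench it does not report memory. The `log_timing` and `benchmark` helpers and the two summing functions are hypothetical examples.

```python
import logging
import time
import timeit
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("timing")


@contextmanager
def log_timing(label: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.4f s", label, time.perf_counter() - start)


def sum_with_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n: int) -> int:
    return sum(range(n))


def benchmark(func, n: int, repeat: int = 5, number: int = 20) -> float:
    """Best per-call time in seconds across several repeated runs."""
    runs = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    # Single-function view: compare two implementations over many iterations.
    for func in (sum_with_loop, sum_with_builtin):
        print(f"{func.__name__}: {benchmark(func, 100_000):.6f} s per call")
    # System view: wrap a block and read the timings from the log.
    with log_timing("summarise"):
        sum_with_builtin(1_000_000)
```

Starting with the logged timings and only benchmarking the blocks that turn out to be slow matches the advice not to optimize until there is evidence it is needed.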

Development process: use case first

Yeah. Generally use case first. I'm the kind of person who, for Shiny apps especially, will wireframe them out on tldraw or Lucidchart or something like that first, Figma now, and then hook things into it after. But even for packages, I want to have a rough sense of how I think people will use it, and then start with those building blocks first. Some people do test-driven development. That's sort of the same idea: you write the test first because the test is the use case, right? I know my friend Jon Harmon would say something about vignette-driven development, which is a similar idea: what's the use case? Draw the story and then build the functions that support that use case. That is definitely how I would think about it, because sometimes you can start too generic, at too high a level or too low a level. So I would definitely start with use cases and user stories.
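The idea that the test is the use case can be sketched with Python's built-in `unittest`. The user story, the `rank_roster` function, and the roster data are hypothetical; the point is only the order of work, with the story written down as tests first and the function added afterwards to satisfy them.

```python
import unittest


class TestRosterStory(unittest.TestCase):
    """User story: 'show me the most valuable players on my roster'."""

    def test_returns_top_players_in_order(self):
        roster = {"Player A": 90.0, "Player B": 40.0, "Player C": 75.0}
        self.assertEqual(rank_roster(roster, top_n=2), ["Player A", "Player C"])

    def test_small_roster_returns_everyone(self):
        self.assertEqual(rank_roster({"Player A": 1.0}, top_n=3), ["Player A"])


# Written after the tests above, to make the story pass.
def rank_roster(players: dict[str, float], top_n: int = 3) -> list[str]:
    """Return the names of the top_n highest-valued players, best first."""
    ranked = sorted(players, key=players.get, reverse=True)
    return ranked[:top_n]
```

Saved as a file, this can be run with `python -m unittest <file>`. The tests also serve as a short record of how the function is expected to be used.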

Soft skills and mental models

Yeah, that's an interesting question. It was almost like, how did you grow as a person through this, Tan? Yeah, it's interesting. I don't think I'm that different from a traits perspective. The things that have helped me the most... I mean, part of this is ADHD-ness, but it's this curiosity and wanting to understand the whole thing. I want to go deep on basically everything. And what that actually is, as I learn more about programming, is this idea that mental models are everything. Really fleshing those out takes a lot of persistence, curiosity, and willingness to help, right? A huge part of building out that skillset is just lurking in the DSLC forums or help channels and trying to get through everything just to test my own mental model, right? With answering questions, being helpful, and paying it forward, some of it is that I'm motivated by helping people. But I'll admit to being really selfish about some of these things, in that it helps me really test my depth of knowledge. If I can understand the problem and explain that mental model, it really strengthens my own learning in a really good way.

And so, very curious, very driven to understand something to a lot of depth. Those are some of the things that I think are the most useful from a self-teaching perspective: go through, build mental models, understand the relationships between things and why things work the way they work. It's actually one of the things that concerns me the most about LLMs: people are less likely to develop strong mental models if part of their mental model becomes, use an LLM to understand why this thing is, then just ask the LLM and come back. Right. That's actually one of my biggest concerns, the loss of the critical thinking and curiosity parts. But, you know, patience, I think that's definitely one. And lack of shame, right? We've talked about learning in public and live streaming. Those things are also useful, perhaps not specifically for the self-taught part, but those soft skills are great more for the advancement side of things, or for calmly approaching bugs. But also having faith in yourself that you can solve basically everything. You've solved a lot of hard bugs before, and you can solve anything that you try to.

Shipping at 80% and knowing when to stop

This is also from Jon, who introduced me to this video. One golden rule for this sort of thing is: always ship a project when you think it's about 80% done, and then take a break from that project. There is this thing where people always want to make it perfect, but when it's 80% as good as you think it could be, based on where you are now and all your skills, ship it. Then, if it's really, really something you want to come back to, you'll find the time. After you take that break, you'll come back to it, and users will drive you back to it. But always, always ship it, especially at the 80% mark, because that last 20% is going to take 80% more work, essentially. Right. So the 80/20 rule of shipping things is: always ship it at 80%, and then don't go back to it for a long time. That's been super helpful to also stop me from going overboard on how deep to go.

Handing off to non-technical teams

In general, are you talking about handing an app off to an entirely non-technical team? Because I think that's a slightly different situation from handing it off to technical users who are not as technical as you, right? Those are actually two different things almost. If you're handing it to a non-technical team, you basically want to give them something like a Shiny app, or only the Quarto output or the Markdown output, and not the app's code. Right. To me, that means you need to have it deployed somewhere, and you should hand it off there and have a feedback form, a Google Form link, to collect feedback on the thing, and a way for them to communicate thoughts around that prototype. They're going to run into bugs, so logging is more important in that case, because they're probably not going to be able to reproduce it. They're not going to hand you a reprex. They're going to be like, you know, I did this thing and it broke. So when you hand it off to an entirely non-technical team, my line of thought is that the polish required from you is a little higher, and you're going to need to find ways to collect their feedback that are non-code oriented.

As far as less technical teammates who are still coding, I think that's where setting production standards becomes a bit of a thing: being good at templating, being good at having tests, and, you know, I Dockerize everything. If you have those things structured and as many guardrails as you can set up, that doesn't mean they need to understand those things in order to work with the code. But I think having more guardrails and supporting their development as technical coders becomes more important than the polished layers of the app.
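To make the logging point concrete, here is a small Python sketch of one possible approach, assuming a deployed app where each user action passes through a wrapper. The `run_logged` helper, the `split_evenly` example, and the short error reference are all hypothetical. The aim is that when a non-technical user says "I did this thing and it broke", the log already holds the inputs and traceback needed to reproduce it.

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("app")


def run_logged(action, inputs: dict, session_id: str):
    """Run one user action; on failure, log what is needed to reproduce it.

    Returns (result, error_id); exactly one of the two is None.
    """
    try:
        result = action(**inputs)
    except Exception:
        error_id = uuid.uuid4().hex[:8]
        # logger.exception records the full traceback alongside the message.
        logger.exception(
            "error_id=%s session=%s action=%s inputs=%s",
            error_id, session_id, action.__name__, json.dumps(inputs, default=str),
        )
        return None, error_id
    logger.info("session=%s action=%s ok", session_id, action.__name__)
    return result, None


def split_evenly(total: float, parts: int) -> float:
    """Hypothetical calculation that fails when parts is zero."""
    return total / parts


if __name__ == "__main__":
    print(run_logged(split_evenly, {"total": 10, "parts": 2}, session_id="demo"))
    _, error_id = run_logged(split_evenly, {"total": 10, "parts": 0}, session_id="demo")
    print(f"Something went wrong. Please quote reference {error_id} in the feedback form.")
```

Showing the user a short reference they can paste into the feedback form gives a non-code way to connect their report to the logged inputs.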

Career advice: make friends, not a network

You know, the one that we haven't already touched on yet, I'd say, is making friends. Make friends in your communities, make nerdy friends, talk to people, talk about the things you're interested in, talk with people doing the work that you're interested in, understand how they think. They don't have to be people who are massively further ahead of you. I think making friends with your peers who are also in this space, being open, sharing your work, sharing memes, those connections matter more. And when I say making friends, I don't say networking for a reason, right? A few really good friends are going to be worth way more than a LinkedIn-button acquaintance. So to me, make friends in the spaces you care about. They will tell you to go fix the dang app yourself, and do those things for you that you can't force in a more overt networking way. If you think about networking as go make nerdy friends, having a few really good nerdy friends will take you so much further in that space than any other way.

I agree so much that I gave a talk called why you should stop networking and start making friends. And if you are not on the Data Science Discord server that I help run for Posit, you should definitely get there. Everybody, I cannot wait to see you next week. We have Dudi Roy, head of clinical data science capability management at Boehringer Ingelheim Pharmaceuticals. That's going to be a great time. I really, really hope that you will consider working out loud. Consider making some friends. Consider sharing projects, sharing code, sharing your mess and being brave. I'm very proud of all of you for being an amazing community.