Scaling a Shiny app to 100K monthly users | Tan Ho | Data Science Hangout

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12pm US Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

Can't wait to see you there. I am so excited to introduce today our featured leader, Tan Ho, Engineering Manager at Teamworks Intelligent Soccer, and you might also know Tan as a person who volunteers frequently at DSLC, Data Science Learning Community, and also is a huge open source contributor. So Tan, I would love it if you could introduce yourself, tell us a little bit about you and something you like to do for fun.

Yeah. Hey, everybody. Wow, there's a lot of people in here. My name is Tan. I am based in Ottawa, Canada. I've been writing R since 2019. And my background is in business. So I studied commerce as an undergrad. Then I went into my family business, and the family business was property management, and I basically hated that job. So along the way, I started doing, you know, hobby stuff in fantasy football, and, you know, building spreadsheets, doing sports analytics. And eventually that kind of snowballed as I got more and more skilled at spreadsheets, you know, the demand for, you know, doing some programming to support it kind of increased. And eventually that led me to starting work on, you know, programming and fantasy football packages and so on. And then along the way, basically, I lost my job in property management, and then proceeded to find a new job as a construction data analyst. And that has since snowballed into a position with Zelus, which was acquired by a company called Teamworks. And now I work professionally in software analytics. So that's the quick rundown. And you asked about something fun I like to do. Definitely, this time of year is prime pumpkin carving season. And I am starting to kind of figure out what pumpkins I want to carve this year. Generally, I try to carve at least one, I think this year, I want to try for two. And I haven't really figured out what those things are going to be yet. But I have a couple ideas. So I'll definitely take some suggestions if you have any.

Background: from Excel to Shiny

Okay, let's dig in just a little bit more with that background because I want everybody to understand just how amazing it is that you went from somewhere that you hated, doing something that you didn't like to do in a previous life, shall we say, to data science, and you built something in Excel with like, Power Query and stuff, right? Not even in R, you did not know R, you did not know Shiny, you built it and you got so much of a following on Reddit and so many people using your thing that you're like, we need to scale this up, right? How amazing is it that that's how you learned R? Tell us a snippet of that.

Yeah, it's really crazy because like, I wasn't a programmer by trade, right? So I went to business school, I was good at Excel, but you know, I wasn't really sure what I wanted to do with myself aside from that. And so I started, you know, essentially like trying to connect to APIs through Power Query. So that's the Power BI, like querying, like data manipulation language. And that was kind of my gateway drug into like some of the programming stuff. Because, you know, at this point, I had like, I want to say like over 20 fantasy leagues and to try to analyze all of them basically just needed scale, right? And so when you've got one league, you can just, you know, do it manually. But when you get to 20, you're like, okay, now, now you're a programmer. And so I put that all together, put it on Reddit, and people started asking me like, hey, can you make me a sheet just like this? And it works fine the first one, two times, but Power Query at that time was basically just on Excel for Windows. And so it was kind of challenging because, you know, there wasn't a Mac version of this, or, you know, they didn't have one on mobile or whatever, right?

And so I built enough of a following. And so one of my friends, Joe Sydlowski, who I also happened to meet on Reddit through a fantasy league, basically just offered to write me a Shiny app. He was learning R in school. And so he learned R and a little bit of Shiny at the time. And he wrote the very first version of the Shiny app. And so at this point, he basically built a front end that hooked into my backend, like spreadsheets and data pulls and so on, and put it on the internet. And it basically exploded. And he was a grad student at the time. I was, you know, working as a property manager, just doing stuff as a hobby, but still doing this, mostly with Excel. And so he couldn't keep up with essentially feature requests. And I was like, well, dude, like, this thing's broken, you know, you need to do this, it needs to do that. And I just became that really pesky project manager. And so one day he just gave up and was like, just go do it yourself. Right? And it's like, okay, sure. Where do I go? What do I do? And so my very first experience with R was basically like bug fixing and editing the Shiny app, and it didn't use tidyverse. It didn't use, you know, it just used Shiny. And it was like this awful base R thing. And he'll forgive me for saying this, because we're both, you know, whatever. But I basically like taught myself with this app, I needed to do this and taught myself Shiny first. And then I taught myself tidyverse. And then basically rewrote the whole app eventually to use tidyverse and like to build like this data pipeline that went with it. And that was our first big app. And so my gateway drug to R was a Shiny app that did fantasy football. And then after that, I started expanding from there.

Scaling to 200K monthly users

Wait, wait, wait. Important point. How many users did you have? Because I have worked on apps professionally that I've had like seven users if I'm lucky.

Yeah. I mean, I think it hit, it got hugged to death basically immediately when we put it in. And that's like Joe's first version. How many thousands of users, Tan? And so at its peak, I think it hit 200,000 monthly users on the Shiny app itself, which was spread over, I think, three app instances at that time. And so that's like with Google Analytics monthly users. But when people talk about Shiny not scaling, I laugh, because I have firsthand experience running an app out of my garage, essentially, uh, with hundreds of thousands of users and like pictures to prove it and everything. And to this day, it's like a hundred thousand users.

There's a lot of like faces in our grid of people that was like, Oh my gosh, that's amazing. And I got a little, you know, seven users. Libby, I've been there. Yeah. I mean, I think that getting people to use something that you built at a business that's like a dashboard or something is really, really hard. I'm going to put a link in the chat here. Everybody go save this talk for later so you can watch it, um, and get a more in-depth view of what Tan's talking about this Shiny app that was serving a hundred thousand monthly users.

I think in general, everything that influences somebody to make a decision from your analysis is production.

Shiny resources and package development

Yeah. And this is for everyone. So, yeah, I definitely, those resources didn't exist. Mastering Shiny is an awesome resource. Um, I think I'm a person who learns by projects and helping people like answer questions. Those are kind of the two main things that I like think through. I'm very like, so very like case study situational type stuff. I'm bad at textbooks. I am, um, not very good at those kinds of things, but, um, I know there are a bunch of good resources. I think, you know, Mastering Shiny is probably the book that gets you from zero to 80 these days. Um, and then from there, you basically start getting more advanced at things. There's a great book by David Granjon, who's basically god of R, uh, of Shiny UX and UI, um, called, I think it's Outstanding User Interfaces. Um, and I have read that, I peruse it every now and then. I think that's like a super useful way to think about those things.

Um, the backend stuff, I think like, actually, you know, there's probably my, my thing about the Shiny backend is that actually people are overcomplicating it in most cases. Um, I think that your backend should be less complicated, move that stuff into a package is my like main thesis there. So I would encourage people to go into R Packages next, actually, um, in terms of like production, like, you know, putting Shiny apps in prod. Um, those are kind of the two places that I go to most. I think, you know, I advise working on projects, learning something, um, figuring out what you need at that moment. And then also just like going around and looking at other apps and other websites, right? Like, I think like if you put yourself into that user brain, I think that like, that's, that'll teach you a lot of like, you know, design type stuff. Like people will sometimes ask like, how do you be a good like app designer? I'm a terrible one, but like, I think like the ideas are like, there are a lot of people working on really good stuff. How do you emulate this point, click, information, that kind of thing. So those are kind of the main things I think about for sure.

Yeah. And I think that, um, volunteering can also help. Like if you have a buddy who's working on a Shiny app volunteer to work on it with them, you might learn things. Um, like I didn't learn modularization in Shiny apps until I was working on one for the first time. And I was like, I guess I'm going to learn modules now because I have to do them. I think that, um, the working out loud part can be with buddies. It doesn't have to be alone. You can, you can do it with friends.

And you mentioned going into packages next as a sort of bit of advice. And I know that Travis had asked in the chat, like, well, what's the trigger point where you decide to bundle several packages or products into one thing? Like nflverse is now a lot of things, kind of like tidyverse is a lot of things. Um, how big did it get before you decided to put everything in one?

I start with everything as a package. Now I find that like, it's a little bit like everyone is, I think this, this will sometimes seem like dogma where I think like, you'll see people tell you like, just start everything as a package. Um, but there's a couple compelling reasons for me that like everything is a package is just the right way to think about things. Um, one of them is that you can like, there's like a bunch of resourcing built around packages, right? Packages, a package is a collection of functions to me. And so as soon as you have more than one function, you have a package in my opinion. And so you can just put that right away. Um, the, the definition of package being just, it's a collection of functions also means that like, that's when you can start thinking about how do I test this? How do I document this as a function? How does someone use this code? Right. Um, people think about like, it needs to meet all of the CRAN bar of like standards, um, which I don't agree with. I think like a package can be just, you know, a collection of functions with documentation and maybe tests and that's about it. But like, when should it be a package? From the start is really my answer.

Scaling Shiny: performance and architecture

Yeah. And so the most popular app is essentially a trade calculator. And so, um, as a brief explanation of fantasy football, you basically, you know, you get players on your team and you get points from them doing things on the field. And so each person in a league will have players. And, um, a lot of what the, well, the very, the most successful app is basically just, I want to trade this player for that player and this other player, is this fair? And so the individual requests were quite quick, right? Like some of the math stuff behind it, you know, I could pre-calculate and all of that stuff. And so the initial, like each user session is quite short and, you know, their interactions make for very small request turnaround times, but some like some thoughts around that essentially are like moving as much of the logic out of the app as possible. If you can pre-compute it, great. If you can, you know, um, cache common requests, that's great. Um, if you can, um, you know, simplify the interface or reduce the interface that serves the thing so that like they only really get to play with like a limited set of things, then like you do a little less, but you do that thing really well. And then the other part is like, if you can speed up the actual query, so we wrote it in data.table, for instance, um, that scales quickly.
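To make the pre-compute and cache idea concrete, here is a minimal sketch. It is written in Python purely as an illustration (the app discussed in the talk is an R Shiny app using data.table), and every name and number in it is hypothetical: `PLAYER_VALUES`, `side_value`, and `evaluate_trade` are invented for this example and are not taken from the actual trade calculator.

```python
from functools import lru_cache

# Hypothetical pre-computed values (made-up numbers). The assumption is that
# a job outside the app builds this table, and the app only loads it once.
PLAYER_VALUES = {"Player A": 50.0, "Player B": 30.0, "Player C": 25.0}


@lru_cache(maxsize=1024)
def side_value(players: tuple[str, ...]) -> float:
    """Total value of one side of a trade, cached for repeated requests."""
    return sum(PLAYER_VALUES[name] for name in players)


def evaluate_trade(give: list[str], get: list[str]) -> float:
    """Return value received minus value given; near zero reads as fair."""
    # Sort and convert to tuples so equivalent requests share a cache entry.
    return side_value(tuple(sorted(get))) - side_value(tuple(sorted(give)))
```

With this shape, each request is a couple of dictionary lookups and a subtraction, so the per-user round trip stays small. Calling `evaluate_trade(["Player A"], ["Player B", "Player C"])` compares the summed value of each side, and repeating the same request is served from the cache.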

And so Shiny is really good at, like, it's easy to, it's, it's the reputation about scaling definitely is partially like how long is the session. So the average session is like 30 seconds or a minute or whatever. They get the answer and then move on kind of thing. Um, but like once, if the app is not doing any thinking, it can serve a lot of users really well, right? Like it's, it's not when it's, when everybody's idle or roughly idle, it's not that huge a pain to have everybody opening the app at once. It's how many simultaneous calculations can you have at once? And so then, you know, that's where you get into promises or, you know, horizontal scaling and like, you know, pushing everybody to different instances of that app. Um, but as a general thing, like, yeah, it's the reason why it's so successful is because it's small and does a very short calculation that returns a thing. Um, and that's like the main, like performance consideration is like making each user's like round trip very, very small.

Writing fast code and benchmarking

Yeah. In general, when you're interested in code timings, you either approach it from inside, like, is it like a single function or you approach it from like the code performance in the bigger system. And so, um, the tech, the techniques I use the most is like bench, the bench package to benchmark something to specifically like that set of things. Um, and that's good too, because it also gives you a nice, um, like sort of estimate of like memory usage. Um, and it like, will like run multiple iterations of the code rather than just once. So the bench package is my go-to for like timing a single function. Um, and then actually like, I just use log messages most of the time with time stuff for like bigger systems. Um, I think that like, it's hard to get a sense of where something is slow. So, um, if you stick a bunch of log messages in, I'm excited about the new OpenTelemetry stuff that Barret is working on. Um, but you know, that's a similar thing where you're basically logging like code block timing. Um, those are kind of the two things that I'm most, I use the most. Um, but in general, like I don't really optimize something until I see that I need to. And so that's where like, I start with logging and, um, would be keen to try OpenTelemetry for sure for that kind of stuff. Uh, and then once I get into that, I can sort of identify where I should get, where I should spend time to make things go faster.
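The two approaches described here, repeated benchmarking of a single function and log messages that time code blocks in a bigger system, might look roughly like the sketch below. It is in Python as a stand-in: the talk refers to R's bench package, and `timeit` plus `logging` are substitutes chosen for this illustration, not equivalents of it (for example, this sketch does not estimate memory usage). The helper names `log_timing` and `best_time` and the logger name are hypothetical.

```python
import logging
import time
import timeit
from contextlib import contextmanager

logger = logging.getLogger("app.timing")  # hypothetical logger name


@contextmanager
def log_timing(label: str):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1f ms", label, elapsed_ms)


def best_time(func, repeats: int = 5, number: int = 100) -> float:
    """Run func over several rounds and return the best per-call seconds."""
    runs = timeit.repeat(func, repeat=repeats, number=number)
    return min(runs) / number
```

A block in the larger system would be wrapped as `with log_timing("load data"): ...`, and the resulting log lines show where time is going before any optimization is attempted. `best_time` is for the narrower question of how fast one function is across many iterations.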

Development process: use case first

Yeah. Generally use case first. I'm the kind of person who will, for Shiny apps especially, I will wireframe them out on tldraw or Lucidchart or something like that first. Figma, um, now, um, and then hook things into it after, but even for packages, like, you know, I want to, I have a rough sense of like how I think people will use it. And then, um, you know, start with those building blocks first. Um, people, some people are, um, test driven development. That's sort of the same idea. You write the test first because the test is the use case, right? Like you can do that. I know my friend, Jon Harmon, would say something about vignette driven development, which is the same idea. It's a similar idea about like, what's the use case, like draw the story and then build the functions that support that use case. Um, definitely that is how I would think about it because sometimes you can start at too generic, you can start at too high a level or too low a level. Um, so I would definitely start, uh, with use cases and user stories.

Soft skills and mental models

Yeah, that's an interesting question. It was almost like, how did you grow as a person through this, Tan? Yeah, it's interesting. And I think like, that's, I think, um, I don't think I'm that different from a traits perspective. I think the things that have helped me the most, um, I, I mean, part of this is like ADHD-ness, but like this, like curiosity and wanting to understand the whole thing. Like I want to, like, I want to go deep on basically everything. And I think like, what that actually is, as I kind of learn more about programming is this idea that, um, mental models are everything. Um, and really fleshing those out takes a lot of persistence, curiosity, you know, willing to help, right? Like a huge part of like building out that skillset is just lurking the DSLC forums and, um, or, you know, help channels and trying to get through everything just to kind of test my own mental model, right? Like some of like part of the, like, you know, things around, like answering questions and, you know, being helpful and paying it forward. Some of it is like being motivated by helping people. Um, but I'll admit to being really selfish about some of these things in that it helps me really like test depth of knowledge. And, you know, really, if I can like understand the problem and explain that mental model, um, it really kind of strengthens your own learnings in a really good way.

And so, you know, very curious, um, very, you know, driven to understand something to a lot of depth. Those are kind of like some of my, some of my, some of the things that I think are the most useful from a self-teaching perspective is just like go through, build mental models, understand the relationships between things, why things work the way they work. Um, and it's actually one of the things about LLMs that concerns me the most, is that like people are less likely to develop strong mental models if they, you know, part of their mental model becomes use LLM to understand why this thing is, then just ask LLM and then come back. Right. That's one of my actually biggest concerns about that and the loss of the critical thinking, curiosity parts. Um, but, you know, patience, you know, I think that that's definitely a, you know, and like lack of shame, right? Like we've talked about the like learning in public and live streaming. Um, those things I think are also useful, um, not specifically about the self-taught like part perhaps, but, um, you know, I think that those skills and soft skills are great more for an advancement side of things perhaps, or, um, you know, calmly approaching bugs. Sure. Um, but also having faith in yourself that you can solve basically everything. You've solved a lot of hard bugs before, and you can solve anything that you try to.

Shipping at 80% and knowing when to stop

Um, so this is also from Jon, um, who introduced me to this talk, uh, this video, but, um, one golden rule for like this sort of thing is always ship a project when you think it's like 80% done and then take a break from that project. So like, there is like this thing that people always want to make it perfect to do this thing, but when it's 80% of as good as you think it could be based on where you are now and all your skills, ship it. And then, you know, if it's really, really, really, you know, something you want to come back to, you'll find the time after you take that break and like, you'll come back to it and like users will drive you back to this thing, but always, always, always, always ship it, especially at the 80% mark, because that last 20% is going to take 80% more work essentially. Right. So like the 80, 20 rule of like shipping things is like always ship it at 80%. And then don't go back to it for a long time, essentially. Um, and that's been super helpful to also stop me from going overboard on how deep to go on things.

Handing off to non-technical teams

In general, I think that if you're handing a software, like, are you talking about handing an app off to an entirely non-technical team? Because I think that's a slightly different situation than handing it off to technical users who are not as technical as you, right? So I think that's actually two different things almost, because if you're handing it to a non-technical team, you want, um, basically you want to give them something like a Shiny app or like a public, like you only want to give them the Quarto output or the markdown output and not the actual app, um, or not the app's code. Right. And so to me, that means you need to have it deployed somewhere and you should hand it off there and, you know, have a feedback form, a Google slide, a Google form link to collect feedback on the thing, um, and have a way for them to communicate thoughts around that prototype. Um, they're going to run into bugs. And so logging is more important in that case, because they're probably not going to be able to like reproduce it. Like, they're not going to hand you a reprex. They're going to be like, you know, I did this thing and it broke. Um, and so to me, like when you hand it off entirely non-technical, that's sort of my like lines of thought is that the polish required from you is a little higher and you're going to need to guard against, like, you're going to need to find ways to collect their feedback that are non-code oriented. Um, as far as like less technical teammates who are still coding, um, I think that's where like setting production standards, um, becomes a little bit of a thing and, you know, being good at templating, being good at having tests and, um, you know, I dockerize everything. And so if you have those things structured and, you know, as many guardrails as you can set up, um, you know, that doesn't mean that they need to understand those things in order to like work with the code. But, you know, I think having more guardrails and, you know, supporting their development as technical coders becomes more important than, um, more so than, you know, the polished layers of like the app kind of thing.

Career advice: make friends, not a network

Um, you know, I think the one that, the one that we haven't already touched on, uh, yet, I'd say is making friends. I think make friends in your communities, um, make nerdy friends, talk to people, talk about the things you're interested in, talk with, you know, people doing the work that you're interested in, understand, like, sort of how they think, make friends. You don't have to be friends with, you know, people who are like massively further ahead of you, but I think like making friends with, you know, your peers who are looking in this space, being, you know, open and, you know, sharing your work and, you know, sharing memes, those connections matter more. And like, when I say making friends, I don't say networking for a reason, right? Like a few really good friends are going to be worth way more than, you know, a LinkedIn button acquaintance kind of thing. So to me, like make friends in the spaces you care about, they, you know, will tell you to, you know, go fix the dang app yourself and, you know, do those things for you that like, you can't force, I think, in like a network, in like a more, you know, overt networking way. Like, I think if you think about networking as go make nerdy friends, having a few really good nerdy friends will take you so much further in that space than in any other way.

I agree so much that I gave a talk called why you should stop networking and start making friends. And I really hope that if you are not on the data science discord server that Posit runs, that I help run for Posit, you should definitely get there. Everybody, I cannot wait to see you next week. We have Dudi Roy, head of clinical data science capability management at Boehringer Ingelheim Pharmaceuticals. That's going to be a great time. I really, really hope that you will consider working out loud. Consider making some friends. Consider sharing projects, sharing code, sharing your mess and being brave. I'm very proud of all of you for being an amazing community.

Featured software

Shiny