Data viz, Shiny app design, & technical career paths | Kiegan Rice | Data Science Hangout

Sep 3, 2025

56:07

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12pm US Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

Can't wait to see you there. Now I would love to go ahead and introduce our featured leader today, Kiegan Rice, Senior Statistician at NORC at the University of Chicago. Kiegan, thank you so much for joining us. I would love it if you could introduce yourself. Tell us a little bit about you and something you like to do for fun.

Yeah, hi. Thanks for having me. It's really good to see everybody. I see some familiar faces both on and off camera. So I'm Kiegan Rice. I'm a Senior Statistician at NORC. And if you don't know what NORC is, it is a nonprofit, nonpartisan research institute. And we do a lot of social science type research, a lot of public policy type research, and do a lot of work with federal, state, and local governments and foundations and things like that.

So I, in my role as a Senior Statistician, actually, weirdly enough, mostly do data visualization and science communication work. I don't do a lot of stat modeling anymore. So I like to joke that I am a statistician. I don't, you know, get the chance to do a lot of deep modeling anymore. But I lead the development of interactive data explorers and interactive applications of all sorts for various clients with different purposes. And I also research data visualization design when I have time. I spend a lot of time talking to both technical and non-technical groups of people and bridging that gap a little bit. And I work with a lot of really wonderful, brilliant, and kind colleagues who research all sorts of things, education, healthcare, public health, economics, public safety. And I feel like I'm probably forgetting one or three, but a lot of different things. I get to kind of play in everybody's backyard a bit.

What about something that you like to do outside of work for fun?

I am a big reader. So, and also a board gamer, though I haven't been able to board game much recently. But I read a lot of sci-fi and fantasy, though I definitely had a lull for a few years there. So I'm just kind of getting back into it. So if anyone has recommendations, let me know.

Introduction to Shiny

I wanted to sort of kick off by asking you just some details around your introduction to Shiny. So like, did you learn Shiny in undergrad or in grad school? Did you think that this is what you would be doing when you were learning statistics?

That's a good question. No, I didn't really think this is what I would be doing. So I first started learning it when I was in grad school. So a little bit of background. I have a PhD in statistics. And again, this is why I joke that I'm a statistician, because people sometimes forget that, actually, that I went to grad school for five years and did statistics, because I mostly just visualize things now. But I started playing around with Shiny in grad school and looking at some historical data and some maps. And I made this really, really terrible, looking back on it now, Shiny app that I called user-friendly at the time. And that was like my master's thesis. And it's really funny to think back on how not user-friendly at all that user-friendly Data Explorer was.

But I played around with it a little bit. And then I just kind of didn't think about Shiny for about three years, really. I mostly worked non-interactively and did a lot of data analysis and managed a big research project in the rest of my PhD. And then when I joined NORC, I had focused a lot on reproducibility in graduate school. And they were like, oh, so you can code. You should help us on this Shiny project we're working on. And I totally picked it back up right away. And then the rest is sort of history, as they say. I got stuck in this, like, I just want to develop Shiny all the time hole. And now it's been almost five years. And that's pretty much what I do all day.

Data visualization tips

What are your top tips for data visualization design that you've accrued over the years through your work and so forth?

That is a great question. So, I could probably talk about this forever. So, I will try to keep it relatively brief. But I would say one of the biggest tips that I have is really think about who your audience is. So, we as data people tend to get really stuck in our own use of visualizations when we're analyzing data, when we're thinking about what data viz can be used for. We use it a lot in exploratory data analysis, or at least I do. And I think it's really easy to think that, oh, well, I understand this chart. So, other people will also understand this chart. Whereas actually, most of the time, if we're working with data people, we have much more familiarity and expertise and ability to read charts right off the bat and understand them than most people do.

And so, I think one of my biggest tips that we've actually found in research that we've done, but also we just apply practically all the time, is that you really have to think about who you're visualizing data for. And if that's not a very statsy or data-heavy audience, you might have to actually kind of pull back on how complex of a visual you're using or really think carefully about your labels and how you're kind of describing the chart and how you're laying it out.

Putting too much data in one visual can actually just be really overwhelming for some people, and they might not even engage with the chart if they look at it and think, I'm really going to have to spend some time to figure out what this is saying.

So, that's one of my biggest things, is just really trying to put yourself in the shoes of others. And a great trick for this is asking someone who does not visualize data regularly to take a look at your chart and ask if they understand it. I do this all the time with my friends. I will send them something and be like, does this make sense? And a lot of times they say, no, what are you talking about? And then I fix it. So, that's a really handy approach too, is just to lean on people who maybe aren't as in the data space when you're thinking about visualizing.

Communicating statistical results to non-statisticians

What are the challenges that you've faced when you're communicating statistical results to non-statisticians, and how have you overcome those?

Yeah, another great question. There are a lot of challenges. I think in particular with statistical results, people don't necessarily understand what something like significance means, or if we're thinking about a statistical model, it's not just what any individual coefficient is that matters. It's really, you know, the model as a whole. And that's something that's really, really hard to communicate. So again, it'll be a little bit of a repeat, but one of the things I do when we're trying to communicate like something from an actual model or a test or some sort of finding is run it past people that aren't statisticians.

The other thing I've also tried to do is just not go too overboard with communicating significance, right? So there's, I know there's like a million different discussions that have been had in the past about practical versus statistical significance. And specifically what I do, a lot of it is very public facing, like public policy type research. And so for that, there is a difference between, you know, when we're showing some sort of summaries of something we tested in a survey, you know, people's opinions about something. Does it really matter if percentages are, if we've like really tested that they're significantly different across groups, or do we just really want to show, you know, what those percentages are and some uncertainty around those estimates, whether or not they might be statistically significantly different.

So I think it's kind of a two pronged, like thinking about whether the actual stat pieces are really, really important to communicate to your audience. And then if they are checking with people who like don't speak stat fluently and seeing how we can maybe like update the language to make it a little bit more clear. But p-values, for example, are notoriously very difficult to communicate to people. And I don't think we've really come up with a great solution, but I think just trying to kind of wordsmith the language and even like asking our communications people, like, can we iterate on what a better way to say this is, that's still faithful to what it actually means statistically, but isn't, you know, our standard boilerplate interpretation of a p-value.
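The idea above of showing percentages together with some uncertainty, rather than leading with a significance test, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the calculation is a plain normal-approximation (Wald) margin of error that assumes a simple random sample. Real survey estimates usually need weights and design effects that this sketch ignores.

```python
import math


def proportion_with_margin(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (percentage, margin of error in percentage points).

    Assumption: simple random sample and a normal-approximation (Wald)
    interval; z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return 100 * p, 100 * z * standard_error


def describe(group: str, successes: int, n: int) -> str:
    """Build a plain-language label that reports the estimate and its uncertainty."""
    pct, margin = proportion_with_margin(successes, n)
    return f"{group}: {pct:.0f}% (plus or minus {margin:.1f} points)"


# Hypothetical survey counts, made up for illustration only.
print(describe("Group A", 50, 100))
```

A label like this puts the estimate and its uncertainty in front of the reader without asking them to interpret a p-value.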

Bridging the gap between business questions and analysis

Yeah, okay, she's giving me this. Just to add on just a little bit, I also find that leadership asks for, give me a table of A versus B. They don't want a table of A versus B. They want to answer a question. They have a business question. They don't need the table. That bridge is the hardest one I've found: figuring out how to get them to ask me their business question, so I can use my math brain to help them solve the question, rather than just put the data in a way for them that's maybe not the most useful, or build a model that's maybe not exactly correct for their question.

Yeah, okay, this is a very amazing question, and I, again, could probably talk about this forever, but this is one of the key parts of, like, my day-to-day job, is figuring out how to force people to write down what their research and business questions are, because people do sometimes think that they know what they want, like, when we're building an application, sometimes a client will literally tell us, well, we want a table that looks like this, this, and this, and then once we actually have a more in-depth conversation about what the goals of the tool actually are and what their users might want to do, we really distill it down to, like, what are the research questions.

And I think, you know, parallel with what are your business questions, right? What is the question you're trying to answer, and we just sort of force people to document their research questions, and then from there, you can think about how do we actually help people answer that question. One of my colleagues has a really great way of breaking this down. She says, you know, if you had to write out what this conclusion is in a sentence, like, can you write it out for me, like, what you want to be able to say, and that kind of helps you back up to, okay, what is the question that you're answering? So, like, in your case, you know, if they want to see a specific table of A versus B, like, asking, okay, what is the conclusion you actually want to draw from seeing that table, and then backing that up to what's the question you're asking of that table, and then is, like, is the table really what you want to see, or do you actually want to see this other thing that directly answers your question?

Imposter syndrome and career identity

So, my question was, you know, as having the title of a data scientist, do you experience any kind of, like, cognitive dissonance or imposter syndrome because you're not really doing stats anymore?

Yes, great question, and yes, I totally experience that all the time. It's really funny when I just, like, look at my title, sometimes I'm like, it just doesn't seem right. I should just be, like, the pictures wizard, or whatever, you know, like, it's just different, and my skill set is different. I do think there's a really funny example, actually, of a colleague of mine. This was a few months ago. We were in an initial call with a colleague of ours about building a dashboard, and they needed someone to do some statistical work on the front end, and they said, oh, well, Kiegan's here. Maybe she could do that, and one of my colleagues who works with me on data visualization, just kind of off the top of his head, was like, oh, if you need someone to do stats, like, we'll ask someone else. That's not what Kiegan does.

And he did not mean anything negative by it at all. He was correct in that, like, if we needed someone to do that, we'd probably pull someone else in because most of the time that's not what I'm focused on and what I prioritize. But of course, then I was like, well, I actually do have a PhD in statistics, so, like, I could do it, but I'm not going to. And he, you know, he profusely apologized, and again, he didn't do anything wrong in this situation, but it was kind of funny. Like, you get into those weird, like, situations where people's assumptions of what you can do and what you actually do maybe don't always align and differ across people.

How data visualization is evolving

How have you seen data viz evolving and any good books to recommend?

Yeah, this is a good question. So, I have seen it evolving in several ways. I think I've seen one arm of data viz that gets into these really, really complex, like very integrated and detailed visualizations that are just things we literally couldn't visualize before, right? Because technology, visualization technology has gotten better, and there are entire sections of conferences that explore fancy new visualization formats that just weren't computationally possible before. And so, that's one area that I think is really, really cool. I'm blanking on any examples at the moment, but just being able to visualize more data and in different ways and get creative with how we actually visualize it.

And that's kind of one approach, but that, kind of like I mentioned before, is really for the data people, right? Because the average person isn't going to necessarily understand those more complex or detailed visualizations or new formats they've never seen before. The other thing I would say is, like the other kind of angle is for more simple visualizations, I've seen a bit of a change in how people present the data. So, there's a really interesting movement towards changing how we think about labeling our data visualizations and explaining them to people. And a couple of trends I really love are, you know, thinking about how we define legends and titles in creative ways to kind of help the user understand what they're seeing. Or even, and we're seeing this more in journalism, like data journalism, actually explaining, interpreting a given point or piece of a graph for the user. So, like an annotation on the chart itself that explains to you what a specific data point means. And that kind of helps center the user a little bit in, here's what this actually means for this one point. Okay, now zoom out and look at all these other points in this chart. A great example is a scatter plot. If you interpret what one dot on the scatter plot means, that makes it easier for people to understand what all the other dots mean.

In terms of book recommendations, there are a ton. And one thing I'll say as a data researcher who like studies graphical perception and understanding of charts, is that like almost every book has a ton of recommendations that are different from other books. And almost all books that are out there are heavily based on practitioner opinion. And I think part of being a data visualization person is that you have to kind of form your own style and your own opinions a little bit. So, I wouldn't recommend any book over any other book really.

Working with federal clients and public data explorers

How do you interact with your clients? A lot of them are death by tables. And how do you sort of get them to modernize, to use visualization that is not death by tables?

Yes. This is definitely a very common problem in this space where we work with a lot of federal clients. And partially because federal agencies have a lot of existing standards around how they report out data and things like that. I do think it's a challenge. I don't know if there's like a perfect strategy that I've found by any means. But I think one, this is going to sound silly, but showing them, visualizing it rather than just telling them can sometimes be really effective. So, even just sometimes showing them an alternative, even if they didn't necessarily ask for it, can help get them a little bit on board with, oh, I really see the pattern now. And I also think there's been a bit of a shift towards more like public facing visualizations in the last couple of years.

Because I think there's been some tools that have come out and data explorers that have come out that have really shown people the benefits of visualizing the data. But I think it's just almost like sometimes you have to show them the difference of like, here's what this looks like in a table. Here's how powerful this pattern or this finding could come through if you actually visualize it in this way and let people explore it that way.

What's the biggest mistake that you've seen in like a public data explorer design? And it takes some like really nice data, but makes it like completely unusable or just unworkable in terms of visualization. And how would you have fixed that if there's anything that just sticks in your head?

This is a good question. I'm trying to think. I think the main thing that frustrates me as a user of public facing data explorers is lack of clear communication of what's going on. So it's sort of like, there's a million drop downs, a million options you have to select in order to pick your data. And it's not necessarily clear what you've actually selected. Maybe that's filtering it. Maybe that's how you want to group it or aggregate the data or something like that. And then you just get a chart, and if it's interactive, the hover-overs or tooltips might not really explain what's actually happening in the chart. The title might be very dry, for lack of a better word. It might just be kind of like, here's what you queried. And the axis labels might not be super strong.

Like, I think this is a very common problem in government data portals and data explorers, that there's a lack of context and a lack of helping people actually understand what they're seeing. It's just kind of like, okay, great. You queried the data. Here's a bar chart. And it doesn't really help you actually understand the bar chart. And sometimes, if it's a percentage, it's not clear what the denominator is in the bar chart or what the numerator actually is. It's just showing you some value. And I think that's a big miss that I see a lot, lack of clear labeling. Like, if I have to really think about, how did this percentage get calculated? That's bad. Because I think about data all day long. And we want to make sure that if people are accessing federal data, they don't have to think about data all day long to understand what they're seeing on a page.

And I would say, what we actually do is try to focus more on curating a specific view. The approach that we try to take is focusing each view on a specific style of research question, or a specific type of data. And also, I talked about this a little bit in my PositConf talk last year, but we also try to make these what we call breadcrumb sentences. So the title on the page or on the chart actually interprets the chart for you a little bit. So when you're plugging in, you know, an example on our live crime tracker site is, if you want to look at crimes on a specific day, and a specific crime type, you know, we're filling in that sentence at the top. The sentence at the top says aggravated assaults on July 1st, and we've kind of highlighted the selections that you made within that sentence, but it's very clear, like, what the context of the chart you're looking at actually is based on what you've selected. And then also, you know, we try really hard to make hover-overs and tooltips on all of our interactives that basically, like, interpret the data for you, right? So if you hover over a specific bar in that chart, it'll say, you know, there were 27 aggravated assaults in Chicago, Illinois on July 1st. This is a rate of however many per 100,000 residents.
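As a rough illustration of the breadcrumb sentence and interpreted tooltip ideas described above, here is a minimal Python sketch. The function names, the sentence templates, and the population figure are all hypothetical and are not taken from NORC's live crime tracker. Only the example count of 27 aggravated assaults on July 1st comes from the discussion.

```python
def breadcrumb_sentence(crime_type: str, city: str, date_label: str) -> str:
    """Build a chart title that restates the user's selections as a sentence."""
    return f"{crime_type} in {city} on {date_label}"


def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


def tooltip_text(count: int, crime_type: str, city: str,
                 date_label: str, population: int) -> str:
    """Build hover text that interprets one bar in plain language,
    stating both the count and what the rate is relative to."""
    rate = rate_per_100k(count, population)
    return (
        f"There were {count} {crime_type.lower()} in {city} on {date_label}. "
        f"This is a rate of {rate:.1f} per 100,000 residents."
    )


# The population value below is a made-up round number for illustration.
print(breadcrumb_sentence("Aggravated assaults", "Chicago, Illinois", "July 1st"))
print(tooltip_text(27, "Aggravated assaults", "Chicago, Illinois", "July 1st", 2_700_000))
```

Generating the title and the tooltip from the same selections keeps the context visible and spells out the denominator, so the reader does not have to work out how a rate was calculated.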

Shiny deployment and interactive tools

My current data viz decision tree is Quarto for static things without a super high degree of complexity necessary and then JavaScript otherwise. Is there a middle ground for Shiny and what's the deployment situation like? Can I use sites with free tiers like Netlify or Vercel with Shiny apps?

I would say all of our Shiny apps that involve pretty complex interaction, we host on either a Posit Connect server or on a Shiny Server. And I am not aware of other options besides Shinylive, which I think is another option that's come out to kind of run Shiny applications in the browser. I know we've actually run into IT security issues with trying to use Shinylive. So take that as you will if you work in an organization that has careful IT policies. But that's another option where you can actually embed a Shiny app into a website. I'm gonna probably say this wrong, but I think it essentially installs all the dependencies for the user in their own browser. So the first time they go to use it, it takes a while to load. But then after that, it's basically like a Shiny app just running in the browser, which is actually very, very cool.

Deepsha Menghani did a talk on crosstalk in 2023, I think, in Chicago. That's a great example of something where you can have things talk to each other. And when you change one thing, something else can change. But you don't necessarily need a Shiny server in the background doing stuff.

Getting started with interactive data viz

What advice do you have for someone who's just getting started with interactive data viz? So, there are lots of options out there. Shiny, Tableau, Power BI, others?

Yeah. This is a good question in the sense of, I think there are a lot of different ways you can approach it, and none of them are wrong, right? Data viz is a very multifaceted process. So, that means there's a lot of different tools you can use, even within the R ecosystem. I saw some different people popping packages in the chat earlier, like, you can use plotly, you can use ggiraph, you can use reactable, you can use gt. There's all sorts of different choices you can make, even once you're in a specific framework, like R. Then, like you mentioned, Libby, there's also Tableau and Power BI. There's also pure JavaScript, things like D3 and Observable.

And I think the real answer is, like, there's not really a wrong choice. We actually use a lot of different approaches, really, depending on the project. And for us, because we're, like, very client-focused, it depends a lot on our clients' needs. But I will say, like, me and the people that I tend to work with have a very heavy, Shiny bias, as you will not be surprised to learn, because it's the framework we're most comfortable with, and we actually think it's really powerful for a lot of reasons when we're working with data of various complexity. It's just often a really strong approach to build something for a reasonable amount of money, and that's really well-designed and works well.

So, in terms of picking a stack, there's not really a wrong choice. I think the biggest advice that I would give is just to start playing around with stuff. If I were to go back and look at all of the charts that I made in grad school, and even in my first couple of years at NORC, I'd be like, wow, I did not know what I was doing, or like, wow, that's really ugly. But it's not. It's just that I was still developing my own skills and my own style and my own understanding of visualization. And I think it's easy to look at all these amazing visuals out there. I used to look at the TidyTuesday feed on, like, Stats Twitter and be like, who are these people? Like, they're amazing. They're coming up with these beautiful visualizations in, like, an hour.

But I just, I remember thinking, like, I will never be as good at that as these people, but really it was just that I hadn't played around with it enough, you know, and now, oh, let's see, 12 years into working in R, people come to me all the time with ggplot questions, and they're like, you're faster than Googling it, and I'm like, okay, well, that's great. You could have just Googled it, but like, you know, I'm happy to answer the question, but I never would have, like, thought that that's where I would be when I started, so I think just give yourself some grace to just start small and, like, learn whatever system you want to start with and just play around with lots of different data. TidyTuesday is a great, great thing for that. It just gives you lots of opportunities to try visualizing in different ways and getting creative, even if it's just recreating what someone else made. Like, that's really good practice, actually, is, like, can I exactly recreate what someone else already did, and it takes away some of the stress of having to make the decisions for yourself and helps you learn a little bit more how to actually just do it.

Attending PositConf for the first time

I'm not going to say don't be nervous because you'll probably be nervous no matter what if you're giving a talk the first time. So, I gave a talk for the first time in 2022 with a colleague of mine. We gave a joint presentation, and I think we were both so nervous. He told me afterwards, I think I just skipped part of what I was supposed to say. I gave another one in 2024, and actually the room was a lot bigger, and there were a lot more people there, and I felt less nervous. The first time might just be nerve-wracking, but I would just say this, as evidenced by the chat, this community and the PositConf community is one of the most supportive communities out there, and you're not out there presenting in front of people who are going to be judging you. You're out there presenting in front of people who are excited to hear what you have to say and really just want to get to know you and get to know the cool thoughts that you're going to share, or the cool example, or the cool project, or whatever you're sharing. It's a really cool opportunity to just show people a little bit of who you are and what you're interested in and what gets you excited.

Data lineage and trust in visualizations

Especially with the current climate and distrust of data sources and fake data and all that, how difficult is it and how important is it to describe or show the lineage of the data so that people can trust your visualizations?

Yeah, this is a phenomenal question, and I would say it is both very important and very difficult, but it is totally worth it. So, this is something we think about all the time, the transparency, and in my spare time when I'm not thinking about visualizations, I'm a reproducibility person as well, and so I'm also very focused on that. That whole traceability back to the original data source is of the utmost importance. I think it can be really challenging because you don't want to inundate the user with too much information, but one of the things we try to do is give people the option to explore the data source information.

There's two sites I'll mention. One is our live crime tracker, which I talked about a little bit, and I talked about a lot in my PositConf talk last year. When someone's looking at a specific city or a specific crime type, at the bottom of the site, we have all these details about that specific city's data, or if it's a specific crime type, we print out what the actual definition of that crime type is, but we also have a whole methodology page where people can go explore how we actually grab the data and actually, for a specific city, pull up the link to their data portal that we're using so they can go look for the data themselves. And that one's a little bit unique in that we're pulling data from like 55 different sources.

Then, for things like the Medicare Current Beneficiary Survey, so this is a project that NORC works on that's a survey of Medicare beneficiaries. We built this chart book website that's an interactive data explorer for all of the survey stats for that survey in a given year, and anytime someone is looking at a specific variable, we literally print out the measure construction from the survey underneath, and the definition of every single variable that's involved in the visualization on the page. This goes back to Stephanie's question earlier about federal clients. They're very specific about all of those details being present, and so that one can be maybe a little more overwhelming, but you can toggle between which details you want to see on that page as well, and so you don't have to look at all of them all at once, but I would say it's extremely important to do it, and it can be difficult to integrate it without overwhelming people, but it's still worth doing.

Well, thank you so much for being here, and everybody have a wonderful weekend, and next week, Dony Unardi is going to be with us, Principal Data Scientist at Genentech. I hope that you will come join me and talk to him. I love Dony. He's so much fun, and it's going to be an amazing time, as always. And reminder, if you enjoyed this and you would like to have friends come back with you, send them to us at pos.it/dsh. We would love to see them too, and we cannot wait to see you in the PositConf Discord server, which is opening up soon. The best way to get on that is to go register for PositConf. You don't have to register in person or virtual specifically to get on the Discord server. Either one works. You're going to get an invite either way in your attendee portal. Thank you so much for being here. We'll see you next week, everybody. Bye. Bye, everyone.
