Data Science Hangout | Joe Gibson, de Beaumont Foundation | Collaboration Across a Team
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the Data Science Hangout. If you're joining for the first time, it's great to meet you. I'm Rachel, I'm the host of the Hangout. If this is your first Hangout, this is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and what's going on in the world of data science. We want this to be a space where everybody can participate and we can hear from everyone. So there are three ways you can ask questions. You can always jump in live and just raise your hand on Zoom, which seems to be the easiest way. You could also put questions in the Zoom chat, and just put a little star there if you want me to ask the question out loud, if you're maybe in a coffee shop or something. And then we also have a Slido link where you can ask anonymous questions.
Just want to reiterate, we love to hear from everyone, no matter your level of experience or the industry that you work in. I'm so happy to be joined by my co-host for today, Joe Gibson. Joe is a senior project director at the de Beaumont Foundation and former director of epidemiology at the Marion County Public Health Department. And Joe, I'd love to have you jump in here and introduce yourself and maybe share a little bit about your role at the foundation.
Sure. So I've been at de Beaumont just since December, so it's not that long. We're working on a really interesting project to develop health-centered local data ecosystems. Those already exist all over the place. I mean, there's already health data at any location that gets used by different people for different purposes, but we're trying to develop those so that they're more focused on equity and anti-racism, with a lot of community engagement to guide the use of that data, especially engagement by communities that are generally marginalized. So we're trying to focus the use of that data toward identifying and addressing structural racism, to bring the data more to bear on those issues. So anyway, that's what I'm doing at de Beaumont, and I'm kind of an informatics advisor there as well. But I spent 18 years as Indianapolis's epidemiology program director within the Marion County Public Health Department and have been very involved nationally in trying to improve public health informatics, essentially. A lot of that is partnership with healthcare informatics. We've got the Regenstrief Institute here in Indianapolis, which is a real leader in terms of clinical informatics.
Growing the team at Marion County Public Health Department
Thanks, Joe, and thanks for all that you're doing in this space, too. I know I actually was introduced to you by Jennifer, who comes to the Hangout quite a bit, and she had let me know how you had grown the team from one to 17 epidemiologists back when you worked together at the Public Health Department, and I'd love to just kind of hear a little bit about how you did that and your journey in doing so.
A lot of the credit goes to our director. She really values data, and I came in, you know, I'm just a data geek. There were four people in the group, and I just started organizing our data sets into a central area, trying to link data together, and trying to amass more data and organize it so we could be very responsive to any sort of data request that came through from the director or from our community partners. Public health only works through partnerships, so there are all sorts of people in the community who are concerned with health and are trying to get grants to improve health: clinics, you know, a hunger network, all sorts of groups like that. So we tried to set up systems so we could really quickly respond to their needs. We were able to use mainly SAS, and then we got into R, to organize the data sets and present the data out to the community, and then partner with a lot of other local data-oriented organizations where we could give them data and they could create interfaces to it as well.
Just over time, just by being valuable, by pulling in more grants, the group slowly grew. I was there 18 years, so the growth from about four to about 20 was over 18 years. About half of that was through grants and about half of that was through general fund from the health department because we were valuable.
Hiring, skills assessment, and code culture
What was that process like of getting people on board to use SAS and then R?
It was interesting. It's always so hard when you hire somebody to really have a good assessment of their skills, and we were constantly trying to figure out how to assess skills as we hired people. At first it was around SAS, so we had a sort of quiz about people's experience and skills in SAS, and we'd ask them to do very specific tasks in SAS. I remember somebody who did really well on the quiz but turned out not to really have a mind for doing the programming, and other folks who didn't do especially well on those quizzes but just grokked the programming and were able to do amazing things.
So never really cracked that nut, but we always had a lot of support within the group. We tried to create a really open environment where people were expected to ask for help from other people. Working on your own was not a good thing. We had to balance that with some people just wanting to learn by discovering things themselves, but I really tried to push people to ask questions. And we set up systems where everybody's code got reviewed by somebody else in the group. In part it was a quality check, but it probably provided more value by helping people see different approaches to coding. So if somebody took one approach and somebody else comes in, they go, oh, you know, I'm learning something from your code, or they're looking at code that could be improved and they're teaching somebody how to improve it. So as a validation process for any work product being produced, the code had to be audited by somebody else. We called it a code audit.
That was a really good learning tool within the group to bring people along in their programming. And then we tried to have an environment where we'd encourage people to innovate and take chances and create things. You know, it was okay to make mistakes, but it wasn't okay not to try things.
Working virtually and communication norms
There are good and bad things about it, but on net, I think we actually were more effective when we went virtual. We used Webex Teams for, you know, chat, but everybody's got some sort of chat tool. And that actually increased a lot of the interactions that we had with people, because it was so easy to quickly send a message to somebody else about something. So, on net, it was kind of a good thing. There are still a lot of challenges in establishing relationships, especially when you hire new people and it's all virtual, to really make somebody feel like they're part of a team. You know, we couldn't have lunches together, which was always the highlight of the day.
Yeah, we had a couple of discussions in our staff meetings around that. What will be our protocol? What's acceptable? What's not acceptable? To try to encourage a reasonable amount of communication without people feeling imposed upon. So, we set some standards there.
Adopting open source tools and transitioning from SAS to R
Not really. We, we had a lot of autonomy in terms of what tools we decided to use. So yeah, IT really didn't challenge that much.
I might dig into this a bit deeper, because I was just on a call yesterday with three people who were having struggles with convincing IT or just, like, bringing open source tools to IT. How did you introduce it to them?
We just asked for it. I mean, we only got very heavily into R during the COVID pandemic, just early on in that, because the visualization interface was just a lot easier for us to work with than in SAS, and to expand upon. But we'd started dabbling in R like five or seven years ago. And by that point, our IT group, which is a great group, really a lot of forward-thinking people in it, was pretty open already to open source, you know. So there really weren't a lot of questions. We said, this is something we need. They said, okay, set it up. They weren't happy. They wanted to stay on Windows systems. So, you know, we didn't get to use some of the Linux stuff, which might have made some things easier, because they just wanted one OS to maintain.
Rachel, I can chat about that, but I don't want to monopolize anything. We're going through it constantly and we're in our, yeah. So I feel the pain of where the question has come from. I've had it in many places and we're doing it now. Tomorrow I have an architectural review board: why are you considering using R for pipelines into GCP? Just those kinds of questions, because IT doesn't understand, right? Like, how is a stats language a thing to do stuff with data pipelines? There's a lot of education.
Yeah, I was just going to add at, at my last company, this was like a fairly regular conversation because there was a cohort of kind of your classical statisticians that were just very comfortable doing their work in SAS. And we got around that. I mean, they still use SAS to a certain extent, but having like very tailored walkthroughs of things you can do with R for like very specific teams. So like, Hey, let's look at your workflow right now and let's see how we could do this in R. That always seemed to really impress, especially when you kind of wrapped it around R Markdown or something to like, you know, very quickly and cleanly present your data. So, yeah, I think just having these one-on-one kind of smaller trainings with these groups of SAS users, like here, here's how you do it in SAS. Let me show you how you could do it in R. I think that goes a long way.
We were trying to get people into R. We did some of that stuff as well. It was just spotty whether people would pick up on it. Each of our epis is pretty independent in their programming. I mean, we have those code audits and stuff, but in terms of their assignments, they can choose what they want to use for what they're doing. And it was hard to get people to dig in to a new system. Because, you know, it's easy to use what you know. But once we did, some people really took to it.
Managing mixed-language teams and code archiving
Yeah, so I just want to ask: within your team, you mentioned that people use both SAS and R. Do you have situations where some people are really into SAS and some people are really into R? How do you manage that, considering you're still code reviewing each other? I'm speaking from my experience. My colleagues and I use not SAS, but SPSS. We have people who have 10, 20 years of experience doing SPSS, and it's difficult to teach an old dog new tricks. So it's difficult to get people to switch over to new stuff.
Yeah, and that would come down to who you choose to do your code review. So we had a system where you kind of rotate through everybody. But if it's going to be some complicated piece of R, then there's just a limited number of people who are going to be able to do it. Same with SAS. I mean, we did have people use SAS, but we didn't have a whole lot of people who were super experts in SAS. So it just came down to who was going to be doing the code review there.
Again, though, it was a great opportunity for discovery and for people to see what was possible. So I think, again, about COVID. We created a dashboard, and I would brag about how great the functionality was and how it looked, but we weren't expert enough to actually create a dashboard that would load quickly. So it takes like 30 seconds to load, and then it's amazing. But when people saw what was possible, it would get them excited. And then if you could show them how simple it was to add a new graph or something, that could get them engaged as well.
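As a rough illustration of the rotation described here, the following is a minimal Python sketch, not the group's actual system. It assumes a hypothetical roster that records which languages each person can review; the names, the roster layout, and the `assign_reviewers` function are all invented for the example.

```python
def assign_reviewers(programs, team):
    """Rotate through the team, skipping the author and anyone who
    cannot review the program's language.

    programs: list of dicts with "name", "author", "language" keys (hypothetical layout)
    team:     dict mapping a person's name to the set of languages they can review
    """
    order = list(team)          # fixed rotation order
    next_idx = 0                # where the rotation resumes
    assignments = {}
    for prog in programs:
        for step in range(len(order)):
            candidate = order[(next_idx + step) % len(order)]
            if candidate != prog["author"] and prog["language"] in team[candidate]:
                assignments[prog["name"]] = candidate
                next_idx = (next_idx + step + 1) % len(order)
                break
        else:
            # Nobody else on the team knows this language
            assignments[prog["name"]] = None
    return assignments


# Hypothetical roster and work queue
team = {"ana": {"SAS", "R"}, "ben": {"SAS"}, "cy": {"R"}}
programs = [
    {"name": "dashboard.R", "author": "ana", "language": "R"},
    {"name": "rates.sas", "author": "ben", "language": "SAS"},
    {"name": "map.R", "author": "cy", "language": "R"},
    {"name": "import.sps", "author": "ana", "language": "SPSS"},
]
assignments = assign_reviewers(programs, team)
```

In this sketch a program in a language nobody else knows is left unassigned (`None`), which is one way to surface the coverage gap the question raises.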
So we stayed very organized in our code. Any work product we produced had to have code. It was archived. It went into a repository and it got changed. So we had a lot of controls so we could always do what I call a data audit. Any work product that we produced over the 18 years I was there, you could take that work product, there'd be an identifier on it, and you could track that back to the code that produced that data thing. So if there's a problem with it, we can figure out what the problem was. Or if you wanted to reproduce it, you could find the source code and you could update that and use that code again.
So we built this code library that had a foundation and sort of evolving code for different functions that we often did. By the time I left there, there were thousands of programs in this library, but they're tracked in a database and you could search it and find useful code. And it didn't really matter if you used SAS or R. It was just to get the work done. I mean, I really wanted the group to move toward R. I was kind of hoping long-term we could get off of SAS and maybe even get to just having a Python/R environment, just to get to free programs that we could share with other public health agencies, and sort of move the whole public health informatics world forward by being able to share code more freely and not be limited by some of the costs involved with SAS. But I didn't care if people used R or SAS. And we created code that would allow people to call R from SAS or SAS from R, to go back and forth between the two. And we had a lot of model code for using the two of them together, to take advantage of some of R's graphics and some of the SAS data management. But we didn't have rules about it. And we didn't have an FDA breathing down our necks to say you have to conform to this or that. We were free to do whatever we decided the standards would be.
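To make the idea of a searchable program catalog concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The table layout, the `WP…` identifiers, the file names, and the `build_catalog` and `search_catalog` helpers are assumptions made for illustration; the conversation does not say what database or schema the group used.

```python
import sqlite3


def build_catalog(rows):
    """Create an in-memory catalog of programs (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE programs ("
        "work_id TEXT, path TEXT, language TEXT, description TEXT)"
    )
    con.executemany("INSERT INTO programs VALUES (?, ?, ?, ?)", rows)
    return con


def search_catalog(con, keyword, language=None):
    """Find programs whose description contains the keyword,
    optionally restricted to one language."""
    sql = "SELECT work_id, path FROM programs WHERE description LIKE ?"
    params = [f"%{keyword}%"]
    if language is not None:
        sql += " AND language = ?"
        params.append(language)
    sql += " ORDER BY work_id"
    return con.execute(sql, params).fetchall()


# Hypothetical catalog entries
catalog = build_catalog([
    ("WP0001", "WP0001_asthma_rates.sas", "SAS", "Asthma ED visit rates by zip code"),
    ("WP0002", "WP0002_asthma_map.R", "R", "Map of asthma visits"),
    ("WP0003", "WP0003_flu_counts.R", "R", "Weekly flu counts"),
])
all_asthma = search_catalog(catalog, "asthma")
r_asthma = search_catalog(catalog, "asthma", language="R")
```

Because the search is by description rather than by language, SAS and R programs sit side by side in the results, which matches the point that it did not matter which of the two someone used.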
Improving public health through communities of practice
I know something you're especially passionate about is improving the public health system. And I'm curious if there's ways that we as a community can come together to help do that or what ideas you have for doing this.
There's some exciting stuff going on in public health informatics around R, around just open source data sharing. They're kind of one-off initiatives for the most part. But there are folks who have GitHub sites where they're sharing their code. There are some communities of practice. There's one around, they're called syndromic surveillance, but essentially looking at emergency department data. There's a national system and there's a lot of local systems that pull in emergency department data and look at it for emergencies that are occurring or suddenly there's a bunch of this disease or that disease. We're tracking COVID. There's all sorts of very useful things you can do in emergency department data to get a more real-time picture of what's going on in your community. And there's a community of practice around that. And the community of practice has a group within it that's doing a lot of sharing of R code for analyzing this data and some kind of standards for it.
How this community might help is just participating in communities of practice. They work because people engage in them. I mean, you guys have one right here. So being engaged in a community of practice, supporting it, getting very involved and helping to move things forward, lead things, take initiative and really get engaged. I've got to say, I am where I am now in my own career because of communities of practice. I started in public health back when people wrote with chisels and stones, but I evolved to where I am now, which is very involved nationally in a lot of stuff, because of communities of practice. It's sort of a do-ocracy: the people who do come into power. You get engaged, you see what has to happen, you try to move it forward, you work with other people to make things happen, and you build things. A lot of them fail and some of them don't work, but you create a network that helps improve things.
I've been involved in trying to start communities of practice that just fizzled and others that didn't. The big one I think of that fizzled did so because it was sort of attached to a research initiative that the CDC had. Suddenly the funding got pulled from it. We were sort of the practice appendage that was attached to this research-funded project, and that went away, so the appendage just sort of faded. And that'll happen.
It's got to be around something that people care enough to spend time doing. There's got to be some immediate payoff to people. It's got to be something people care about but you've got to have enough, you've got to have a couple people who are spending a lot of their time thinking about the community and how to make the community work and how to engage the community. I mean, I think Rachel, you probably have this role within this group of how do I do something that these people care about and bring them together so they get value out of the time so they're willing to come next week to do this again and then relationships start to happen because people start to meet and talk to each other and side projects start to build out of that and some of those fade and some of those are successful. There's value that comes out of that and it can grow but you really have to have the gardener. You have to have the person cultivating it or the several people cultivating it really to create enough of a payoff kind of socially and value in terms of work that it moves forward and that only happens because people take a risk of investing their time in it. Somebody's got to gamble and put in that effort.
Building a code archive database
Yeah, and this is as user-friendly as we've gotten. It still requires that each time that you create a work product that you log it into this database. So that's the trick, making sure that people log it. So when you start a work project, you go to the database, it creates a new ID for that work project. You put in a title, you put in just some basic information about the purpose of it. It's not, you know, there's only like five fields you have to fill in. And then you hit a button that creates a folder for it. It kind of does some background setup to make things easier for you. And then you can store your stuff within the structure we've got on our network that allows people to find things easily and is based on that ID. And then that ID, you're responsible for that ID on any work product or within the name of the files that you create associated with this.
I'm not being really clear about that, but essentially the core of it is there's an ID that's generated every time you do a work product. That ID is the first thing in all the file names associated with that product, for the most part. Sometimes you have to put it at the end because you want other people who aren't techies to be able to read it. And it goes into a folder that's in a structure that makes it easy to find all that information. That's kind of the core of our system.
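The workflow described here (log the work product, get an ID, create a folder, prefix file names with the ID) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `WP00001`-style ID format, the two logged fields, the folder layout, and the function names are hypothetical, and the real system logged about five fields in a database whose details are not given.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_log(db_path=":memory:"):
    """Open (or create) the work-product log (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS work_products ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title TEXT NOT NULL, purpose TEXT NOT NULL)"
    )
    return con


def register_work_product(con, root, title, purpose):
    """Log a new work product, then create its folder under root.

    Returns the generated ID and the folder path."""
    cur = con.execute(
        "INSERT INTO work_products (title, purpose) VALUES (?, ?)",
        (title, purpose),
    )
    con.commit()
    work_id = f"WP{cur.lastrowid:05d}"   # assumed ID format
    folder = Path(root) / work_id
    folder.mkdir(parents=True, exist_ok=True)
    return work_id, folder


def file_name(work_id, label, ext):
    """Build a file name that starts with the work-product ID."""
    return f"{work_id}_{label}.{ext}"


# Example run against a temporary folder standing in for the network share
demo_root = tempfile.mkdtemp()
con = open_log()
work_id, folder = register_work_product(
    con, demo_root, "Asthma ED visits", "Quarterly report for a community partner"
)
program_name = file_name(work_id, "asthma_rates", "sas")
```

Generating the ID from the database row keeps IDs unique without anyone having to pick one, and putting it first in the file name means the folder listing and the log sort the same way.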
Documentation, code standards, and handling coverage gaps
Going back to, I know we were talking a lot about using different languages and how you manage people using different languages. But another question was, how do you handle the situation when something breaks and there isn't coverage across a specific language? Like if somebody's out on PTO or something.
We have enough people who know SAS and R that we figured it out. Part of that goes back to the documentation because there should be a record that explains what this program, well, you get the work product, you can track it back to this ID, to the program. We have a standard header that goes in each program that's supposed to explain enough about that program that people should be able to work their way through it well enough. I really encourage, as a light way to put it, that people put comments in the program so that other people can read it. You should, in my group, if you write a program, you should think about writing a program to somebody else in the group. You're not writing a program to get something done. You're writing a program, yes, to get something done, but to get it done in a way where somebody else in the group can pick that up and understand what you're doing. So there's got to be comments in your program.
And then we've got enough people who know SAS and R that we can figure it out for the most part. It's a risk. It's a challenge, but it hasn't been too bad because we do require some standards and documentation. And again, that's a pain in the neck. It takes time and investment and people, you got to enforce people doing that. But the code audit helps with that as well. Part of the code audit is making sure that the program's clear enough and commented enough so people can understand it.
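One part of a code audit like this, checking that a program starts with the standard header, can be automated. The Python sketch below is an assumption-laden illustration: the required fields ("Work ID", "Author", "Purpose") and the `audit_header` function are hypothetical, since the conversation does not list what the group's header contained, and the check only looks at leading comment lines that begin with `#`, `*`, or `/*`.

```python
REQUIRED_FIELDS = ("Work ID", "Author", "Purpose")  # hypothetical header fields


def audit_header(source, comment_prefixes=("#", "*", "/*")):
    """Return the required header fields missing from the leading comment block.

    The header is taken to be the run of comment lines at the top of the
    file, ending at the first non-blank line that is not a comment."""
    header_lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                      # allow blank lines inside the header
        if stripped.startswith(comment_prefixes):
            header_lines.append(stripped)
        else:
            break                         # first line of real code ends the header
    header = "\n".join(header_lines).lower()
    return [field for field in REQUIRED_FIELDS if f"{field.lower()}:" not in header]


# An R program with a complete header, and one missing most of it
good_program = (
    "# Work ID: WP00001\n"
    "# Author: J. Doe\n"
    "# Purpose: Quarterly asthma rates\n"
    "library(dplyr)\n"
)
bad_program = "# Author: J. Doe\nx <- 1\n# Purpose: stated too late\n"

missing_good = audit_header(good_program)
missing_bad = audit_header(bad_program)
```

A check like this only confirms the header is present. Whether the comments actually explain the program well enough for a colleague to pick it up still needs a human reviewer.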
RStudio snippets and code sharing
I see, Mike, you had asked, has anyone developed R snippets and distributed them across the team?
We've got, yeah, we've got, in addition to having sort of the library of all our code, we've got some folders that have handy code snippets and we just label them for, you know, here's a good way to do this. Here's a good way to do that. You know, so people can go in there and find useful approaches. And then the code itself is searchable.
Sorry, are those snippets actually within the RStudio IDE or are they in a separate document? They're separate. We just have, you know, a LAN file system with folders on it. And one of the folders is useful code and macros, something like that. And everybody knows you go there to look for some tips.
I mean, I don't know if people are familiar with the snippets within the RStudio IDE, but they're super cool because you can set up template frameworks where, as you pull in the snippet, it jumps you to, you know, what's the next thing that you need to tailor for your own situation? It's essentially like a function, but it's more of a kind of open code.
Those are super helpful, Mike. I didn't even know they existed until like six months ago. And I was shocked when I found out about it, but super helpful. I have all kinds of snippets for myself and my team right now. Yeah, they live in a specific place within your kind of, I think it's within app data or something. So it's not terribly easy to share them between people. But if you're in an organization where you can roll out an installation or, you know, configurations like that, then they're a really helpful thing to know about.
So if you're in the RStudio IDE, when you start typing a function name or something, you can use Tab to auto-complete. Basically, each little template of code has its own little name. So if you hit Tab, one of the things it will auto-suggest is either a function or a snippet. And then if you choose the snippet, say it's a ggplot framework, it will ask, you know, what's your data? What are the aesthetics? And it kind of just guides you through filling in the right information. So, as I say, it's almost like a function, except that it's just helping you. It's a framework for writing your code.
Hiring for initiative and the STAR interview method
Well, so when I hire, I'm not necessarily hiring a data scientist. I'm hiring an epidemiologist. And for an epidemiologist, well, for any worker, I want somebody who's going to take initiative, who's going to be able to work independently or get to a point where they can work independently. There's not one thing. They're going to have to have good content knowledge. And yeah, I would love to be able to have x-ray vision where I can see that they're kind of like you, Jen, where they've got this amazing data brain and they take to this stuff and they just sort of start to intuitively understand data structures and how to analyze stuff. But that's not necessarily essential. It's really more about people who are going to take initiative, be able to work independently, and have the right knowledge, at least. I love people with experience, but usually we end up just hiring people out of school. So it's at least having a basis of knowledge. And, well, it's public health. So they've got to have the love for public health. They've got to be invested in what we're trying to get done. They've got to have the mission.
We use a process called targeted selection that I picked up years ago. I used to work for Eli Lilly, the pharma company. And they gave me this training and this interview process. But, you know, you lay out dimensions that are important, and then you have several different teams at different sessions ask about those dimensions. You have several people asking about each dimension in different, separately. So you can combine, you can discuss afterwards. We make sure we ask for examples. We try to really nail people down to saying, tell me a specific situation where you've been challenged with this and what did you do and what was the result? As opposed to people saying, I'm really, I really like doing this. I'm really good at that. You know, just abstract talking and trying to get people down to talking about really a lot of specifics.
And it's hard, but I think probably the biggest tool I've used for seeing what people really have is asking them for specific examples. It's called the STAR format: you tell me what situation or task you had, what you did about it, and what the result was. That illustrates your ability in whatever I'm interested in, taking initiative or your investment in the mission of public health, whatever that might be.
Partnering with communities and the move to de Beaumont Foundation
Based on a previous conversation, we talked a bit about how technical skills are important, but a lot of it is also learning how to partner with people. I was wondering if you could chat a little bit about that.
I've worked with people who have amazing technical ability and just super great data brains, but at least in my field, in epidemiology, it's all about producing information, and it's about producing relevant information. So you've got to understand the customer. You've got to understand their situation, and the better you understand where they're coming from and what they face day to day, the easier it is to add value to what they're doing by bringing data to bear. I really see my group in Marion County as a support group, as a service group. We're there to serve other people. Other people know what their problems are. We need to draw those problems out and understand them, kind of put them into our data language, and then be able to apply data to those problems to get back to their issues. So that's the partnership. That's the understanding. The closer you are to the client, the better.
There are two things going on. One is just the whole issue of equity and racism, and this being just a really cool project to try to help public health move along the pathway of addressing racism and equity more squarely. There's a lot of interest in integrating more equity-based practices into how we use data and what we do with data. Within public health, there's a lot of discussion of that, but we're still trying to figure out what it is. And this is an opportunity to put a lot of focus on figuring out what that might look like. I've had a tremendous amount of learning over the last several months, working with people who are much more in the racial equity work space and advocacy area, to figure out what it might mean. I always knew we needed to be closer to the community than we were at Marion County. And again, this is getting back to understanding the customer, understanding what the needs are out there, and being relevant. The closer we are to them, the more relevant we're going to be and the more valuable what we produce is going to be. And it's just been underscored in the last several months how important it is to make the affected communities, the marginalized communities, partners in figuring out what the questions are and what factors you need to pull together in the information, the data, to bring to the surface issues like structural racism and how that's manifesting, so that you can put a spotlight on that to try to bring about change. And have them help frame the results in a way that's going to make that change move, because they can frame the results in a really salient, impactful way, much more than I'm going to be able to, as an old white guy working in an epi department looking at data, without their engagement.
So that was a lot of it, you know, just this opportunity to be involved in something that I think is really, really important. And it's a platform where I can be a lot more involved in some of the national discussions that I was involved with at Marion County, but now that's more central to what I do.
Yeah, there's a lot we can do. So much of what I'm recognizing more and more is that the things that perpetuate racism and structural racism are that we don't talk about it, we ignore it, we don't label it. And that's not okay, because that allows it to continue. If we really want to change what's happening, we need to talk about white privilege. And again, I'm in public health, so with a lot of the differences we see in statistics, rather than just presenting, okay, there's this difference in terms of health outcomes, frame that in the context of: we've got 400 years of history where this group has been oppressed in many ways. And not just until the Civil War. There's redlining, there's the implementation of, like, the World War II veterans benefits and such. There's a lot of stuff that just really left out these communities. When Social Security was implemented, it excluded domestic workers and agricultural workers, who are white and Black and all different colors, but most of them were different colors than white. And so the net effect of that was to oppress certain groups much more than other groups. So it's bringing that to the surface when we present results, framing it in more context where people can see the effects of the structural racism, to push us toward thinking about how to address those issues.
It's not just a woman lives in a house that has a lot of asthma triggers. It's, there's a lot of dust, there might be cockroaches, whatever. We go and we can intervene to improve that house or move to a different house and that helps her. But we need to get back to why do we have people living in houses like that? We need to change the policies that are allowing these inequities to occur. And so it's trying to push back to more root causes of the issues that we're trying to deal with.
Yeah, I just wanted to say, such a great conversation that you're having, Joe. I love to hear that kind of looking at the source of things. One of the many things that we analyze in R is our fish contaminant data. Penobscot Nation has sustenance fishing rights reserved in treaties. And as you can imagine, many sources of pollution do not allow that to happen. Even when fish dams get taken out and fish come back up from the ocean, recent studies are finding they're even more contaminated than non-sea-run fish. So those conversations about what is causing these problems are so critical. So I just really wanted to commend you for having that level of conversation.
Yeah, and I'm no expert in it. I'm just learning on this. And then again, nobody is. I mean, you got to give yourself some space and allow yourself to make mistakes, but we got to try in this area and figure it out. Exactly, exactly. I think we have to get comfortable with being uncomfortable, because it's not easy stuff and it's not going to be a simple conversation. But I commend you for having it. It's really, really uncomfortable sometimes.
Yeah, I mean, the thing is, you got to speak up. You got to give yourself a little forgiveness for not always speaking up. But you got to take some risks and you got to push it. I don't have the answers. It's not an easy thing. And there's not a way to do it safely, really. I mean, there's always risk involved. So I don't know. I guess when I look at the folks who are really doing this a lot, they've got a large group around them of allies. Maybe that's the wrong word in that context. But these are people who share the mission and can support each other as they go forward, because it's not easy work. And you need people who are supporting you.
But thank you so much, Joe. If people want to connect with you or have other questions, what's the best way to connect?
No, it's email. I'm an old guy. Email is probably the best way. So I guess you can share my email. If people could try to make clear in the subject line what it's about, like maybe that it's connected to the RStudio thing or something, then I can sort of segregate it from my other email and make sense of it.
I'll also share some of the links that were mentioned in the chat, and tips, on our LinkedIn group for the Hangout. Every week after the Hangout, I try to share every link that was mentioned and a few tips from the featured leader. But thank you so much for joining us, Joe, and sharing your experience with us. Sure, happy to do it. I'm glad you guys are out there. It's good to see groups getting together and learning from each other. Thank you all for all the great questions, too. Have a great rest of the day, everyone.
