Brad Weiner @ CU Boulder | Using data to influence institutional decisions | Data Science Hangout

Jun 20, 2023
1:00:31

Transcript

This transcript was generated automatically and may contain errors.

Welcome to the Data Science Hangout, and hope everybody's having a great week. I'm Rachel. I lead our pro community at Posit. And if you have not been here before, this is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across different industries.

So we're here every Thursday at the same time, same place. So if you are somewhere in the future watching this recording on YouTube, you can always add it to your calendar with the details included below. Together we're all dedicated to making this a welcoming environment for everyone. And we all love to hear from everybody, no matter your level of experience or area of work.

It's also totally okay to just listen in here if you don't want to join in the conversation, that's okay too. But there's three ways you can jump in and ask questions or provide your own perspective. So you can always jump in by raising your hand here on Zoom and I'll be on the lookout. You can put questions in the Zoom chat. And if you do want me to read it out instead of you, maybe you're in a cafe or your dog's barking or something, just put a little star next to your question in the Zoom chat. And then we also have a Slido link, which my colleagues will be sharing, oh, Tyler just shared it right now, which they shared in the chat so you can ask questions anonymously too.

With all that, thank you all so much for joining us here on this Thursday. I'm so excited to have Brad Weiner here with us as my co-host today. Brad is the Chief Data Officer at the University of Colorado, Boulder. And Brad, to get us going here, I'd love to have you just kind of jump in and introduce yourself, share a little bit about your role and also something you like to do outside of work too.

Brad's background and journey into data

So thanks so much, Rachel. I wanted to thank Rachel and everybody from the Posit team for being here. And then of course, it's easy to thank the community developers, but also just thanks to everyone from the Posit team for helping us do cool stuff in our jobs and in our lives. So it's super fun to get to hang out with everyone.

My name is Brad Weiner. I am the Chief Data Officer here at the University of Colorado, Boulder. And I got into this role in a fairly elliptical way. I, like many of you, was not a technologist to start. I was actually a creative writing major in college. And I will die on this hill that data science and technical work is inherently creative work. But over time, I started working for universities in a lot of different roles. And at some point along the way, someone was like, hey, we need someone to do some reporting and we need someone to do this. And I've always liked computers. I've always liked working with them. So I'm like, yeah, sure, I'll give that a shot, and thought it was unbelievably fun and interesting and a really cool puzzle to solve.

And over time, got more and more and more engaged in the data side of what colleges and universities do. There's also a bump in the road where I was in graduate school and I took a statistics course. By the way, hello to all my Minnesota pals out there, but I was at the University of Minnesota. And I had a professor who taught his statistics classes using R. I mean, that was in about 2009 that I was first introduced to the whole concept of sort of open source data science tools and just totally fell in love with it. Just thought it was the most fun, most interesting thing that could possibly happen.

And so that led to the opportunity to work on a data science team out in startup land. And then finally to move here to CU Boulder to lead a data science team here and then eventually become the chief data officer. So that's like a little tiny bit of the journey.

Overview of the university as a data-rich environment

I like to kind of give a quick overview of higher ed in general. This is like maybe sort of boring for people, but I imagine many of you have gone to a university somewhere around the world and you've had some post-secondary education. And universities are, they seem like they're like this one thing, right? But what a university is, is it's just like this collection of thousands of tiny little units and there's this sort of famous joke that's like, we're hundreds or thousands of little micro units that are all unified by a common disagreement over parking or a common disagreement over the temperatures in buildings or a common disagreement over our football team or whatever.

So let me just give you a quick sense of that. The president of the University of California in the 60s was a guy named Clark Kerr, and he referred to the modern university as a multiversity. So let me just quickly kind of tell you why we're here. So the University of Colorado is a research university. We have three missions. Our three missions are to engage in research and to create new knowledge out in the world, to teach and pass that knowledge on to students, and then finally to engage with the public in what's called the public engagement mission. And that's translating your research efforts into things that can be broadly valuable. I mean, that really aligns very closely with what open source tools do.

But we also do a lot more. We produce art and creative works. We act as a venue for all kinds of things. We're a community space. We have a football team. We have museums. We have a bowling alley. We provide healthcare and patient treatment. We, at least at CU Boulder, put a lot of stuff into outer space. We engage in economic development with the community here in Boulder and the state of Colorado and internationally. We, at least at CU, care for and feed and sustain a live buffalo named Ralphie who runs around on the football field. So the number of things that a single university is engaged in is quite limitless.

And so why I like to lead with this whole spiel is that each one of those results in a very like almost limitless number of interesting kind of data science questions and problems. So that's sort of the intro. And so I definitely want to take most of the time to answer questions for you. But just if you can imagine all of that complexity, every single one of those components can land as a data question. And so why I am here and why our teams are here is to try and think through ways to make the student experience better, to make the faculty experience better, to make the staff experience better, to make us work more effectively, work more efficiently, to make it so that the people of Colorado have the best possible university that they can have.

So with that, I'm loving this chat, by the way. I love that intro too. But we forgot one thing. What do you like to do outside of work as well?

What do I like to do outside of work? So recently I decided I needed to spend less time in front of the screen. And so I started taking piano lessons, which I encourage anyone, especially if you're sort of later in life or midlife, to just try something totally new. I walk into these piano lessons and the person before me is an eight-year-old girl who's amazing. And I'm like barely making it through the treble staff. So I've really been enjoying learning to play piano and getting back into music.

Influencing institutional decisions with data

Patrick, I see you have your hand raised here. Do you want to jump in with your question?

Thanks, Rachel. Hi, everybody. And thanks, Brad, for being here. I appreciate your time. First, I just want to say I love universities too and like their weirdness and flaws and all the kind of strange things about them. But your enthusiasm for it and the way you articulated that just now was awesome. So thank you very much. I really enjoyed that and appreciate how into it you are.

My question is specific to the university system. Do you have any examples of where you have really influenced an institutional-level decision, or maybe just one for a school or department, where you all have really shaped something? Like a real impact story from your data science work for the university.

Yeah, that's a great question. And so Patrick, I'm guessing you have experience in academia or in higher ed. You seem to know the space pretty well. Making the data work is the easy part. Getting people to actually use the data and sort of inform policies is much trickier. And I also always like to point to the fact that it really does take a village. We need all kinds of people to be invested and interested in a project.

But there is one that we've sort of been pointing to lately, which was just released publicly, kind of quietly, because it happened during finals. One of the things that we've thought a lot about is ways to make it so that students have as easy of an experience as possible. Because we really, really want students to be successful. Not just because that's like our core mission, but we're measured on that. All universities are measured on that in a lot of ways. And students want to go to a place where they're going to be successful.

So we're constantly looking for what I would consider, and I know this is kind of gross terminology for people in higher ed, but building a college degree really is a supply chain problem. You know, you need to have all of these constituent parts, and they need to land at the right time, and there's tons of dependencies, and those dependencies need to be built out years in advance. So if we don't have enough chemistry labs, that student doesn't get to take chemistry. And if not enough students take chemistry, they don't have the required courses, and they can't graduate. So we're constantly looking for these things.

And one of the things that we have found is that students found the prices of their textbooks to be something of a barrier. And one of the things that we actually found, we're not only trying to graduate students, but we're trying to graduate everybody as equitably as possible. If the experience is one way for some students and one way for another group of students, that is not considered success. And so what we were finding and what we had heard and what we learned is that students were potentially choosing which courses to take based on the cost of the textbooks. And there's also a little bit of this mode of understanding now that the students of today are used to subscription services. Nobody ever thinks like, oh, how many songs did I listen to on Spotify? Am I getting my money's worth? They're just used to Netflix or these models where you just pay one price and you get everything. And it doesn't matter how much you use it or how little you use it.

So Patrick, this is kind of the long way around, but to answer your question, we just recently deployed, and you can read about it publicly, but we're one of the first handful, maybe a dozen or 15 schools, that have gone in on this kind of bookstore equity access model, which is basically students pay one price. And that gives them sort of financial stability, the predictability in their budgets, their ability to budget on books, but also the ability to be really flexible in what they take. And by paying that one price, they get what we call day one digital access. And I mean, think about your college days, right? You go and you take a class. The professor gives the old like, look to your left, look to your right, one of you is going to fail. And you're like, I'm out of here. And so then what you have to do is you have to go back to the bookstore, you have to return that book.

And so we have found, and there's evidence, that having immediate digital access to your books is actually a student success lever. And so for a variety of reasons, we wanted to implement this program. And one of the things that we were asked is, what price should the bundle be? And if any of you have worked in pricing, it's a really complicated problem. If the price is too high, nobody participates and the whole thing kind of collapses on itself. And if it's too low, you don't cover your costs. And so we had a whole team of people who were sort of thinking through that problem. And Rachel was very clear. She's like, you do not need to shill for Posit. But we are happy to say that we used our super sweet Posit Workbench setup to simulate a whole bunch of different options. The data science team did that. And we were able to help guide them on what that price was. And it got deployed out. And so next year, from what I understand, all undergraduates at CU Boulder will pay $279 a semester and get all of their books on day one. And if they drop a class, then those books go away and the new ones come in and they just have access to their books.

Patrick, I think it'll be valuable to follow up with me a year from now and see how well we did. Because this is a known unknown. We're going to have to see where it landed.
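
The transcript does not describe the team's actual model, so the following is only a minimal sketch of the kind of pricing simulation discussed above, written in Python for illustration. Everything in it is an assumption: the lognormal spread of per-semester book spending, the rule that a student opts out when the bundle costs more than they would otherwise spend, and the `UNIT_COST` and `FIXED_COST` figures are all hypothetical placeholders, not CU Boulder data.

```python
import random

UNIT_COST = 180.0      # hypothetical content cost per participating student
FIXED_COST = 50_000.0  # hypothetical program overhead per semester


def simulate_bundle(price, n_students=2_000, n_runs=50, seed=0):
    """Return (mean participation rate, mean net revenue) for one bundle price."""
    rng = random.Random(seed)
    rates, nets = [], []
    for _ in range(n_runs):
        # Hypothetical: each student's usual per-semester book spend.
        spends = [rng.lognormvariate(5.5, 0.6) for _ in range(n_students)]
        # Hypothetical opt-out rule: a student stays in only if the bundle
        # costs no more than what they would otherwise spend.
        participants = sum(1 for s in spends if s >= price)
        rates.append(participants / n_students)
        nets.append(participants * (price - UNIT_COST) - FIXED_COST)
    return sum(rates) / n_runs, sum(nets) / n_runs


if __name__ == "__main__":
    # Sweep a few candidate prices and compare participation and net revenue.
    for price in (150, 200, 250, 300, 400):
        rate, net = simulate_bundle(price)
        print(f"price=${price}: participation={rate:.0%}, net=${net:,.0f}")
```

Even a toy model like this shows the tension described above: a low price fails to cover costs, while a high price drives participation down until the fixed overhead is never recovered.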

Data visualization tools in higher ed

This is something like multiple organizations I've been part of, I've been trying to figure out like what to use for data viz. So I see you guys are doing like a lot of Tableau dashboards and Tableau public. So how did you think that through as an organization, like what data viz tools to use?

So that is a great question. Data viz is eternally complicated because the technology is always moving. And then I think it's also fair to point out that higher ed tends to be a fairly conservative industry and it tends to buy things in like big ways. So hey, we want to spin up a Posit server and just test it out on 10 users is usually a little bit different than like, what is the best way that we can like use our tremendous buying power to have everybody in the same tool? And so it's always kind of striking a balance, but a lot of universities have gone in pretty deep on, I don't know if I'm allowed to say it, but on Tableau.

Tableau is kind of an industry standard across what's called institutional research. So universities have all these compliance regulations, they need to report on a bunch of stuff. And so Tableau is just like pretty commonly used and has been for about the last 10 years. But one of the things that we're constantly thinking through are ways to leverage more modern technology. And then one of the things that's a really interesting problem space is that Tableau doesn't necessarily provide semantic HTML underneath as part of its API calls. And what that means is that it's actually a little bit difficult to access with screen readers. And of course, we're a public university, we put out a bunch of public data. And one of the things we want to make sure we do is have as accessible of visualizations and accessible access to our information as possible.

And so Tableau provides a ton of value for a lot of reasons, and people really like being able to explore the data. But we are also looking at other ways to potentially visualize information and put out data. There's also just people who want to use open source tools. And so there's always the conversation of like, how big can we scale things like Flask or Streamlit or Shiny? And I know that those are amazing options, and I love them. But we haven't gotten to a point of like, sort of like institution-wide buying. So we pick a tool in the way that we pick a lot of things, which is through a process called satisficing, which is like, well, the university up the road is doing it, seems good enough. It'll be fine. We'll go with that. And then we're mostly happy with what we get until something better comes along.

And all of the open source questions I totally agree with. And we just haven't gotten there yet. So those of you who work in like your cool startup with 50 people, and you're like, oh, well, we were in Streamlit for three weeks, and it didn't work for this. And now we're going to pivot over to Shiny or whatever. It's just a lot harder for us to move the sort of gravity of such a big organization. And I will say Tableau has a lot of advantages, it has a lot of functionality, although it does not have a spell check.

Textbook pricing outcomes and student success

I see Lisa had a follow up to your story that you shared about the book pricing. Lisa, do you want to jump in?

I'm lunching right now, which is rather early for Minnesota, but so it is. Yeah, I was just curious if you've shared out, and maybe it's a little too early, any of the findings about the books, because I know, having been in academia, that, you know, at least Macalester was sort of thinking about that problem as well.

Yeah, we're still pretty new to it. And higher ed, like seemingly has, like never has a mythical first mover. But there are a couple of schools that have gone in on this, including University of California at Davis. And so they're the ones who have sort of the most information about it. I would be happy to share and, you know, talk with the people who actually built this out. But right now we are launching it for this fall. And so it'll be a little while before we have any like real outcome data. But of course, the really exciting part will be, can we measure whether or not that particular intervention had an impact on student success? And so probably a year or two years from now, we'll have some like real data on what it all meant. Right now, we were just trying to get the thing out the door and put a price on it. But all the cool stuff is yet to happen.

Where the data office sits organizationally

Brad, I'm really curious where your role lives within the institution. Like is it, are you strongly connected to IT? Are you in administration? And I've got to think that where you kind of live organizationally has got impact on like how work comes to you, how you get to decide what to do, how you're prioritized, all those kinds of things. Just if you could kind of expand on that, that'd be great.

Yeah, that's a great question, Alan. I work in the sort of administrative side. But I have a colleague who's a chief data officer, and he will basically say that if you run or work with a data organization, and I would probably apply this to any of your organizations, although I know my industry best, but if you are a centralized data team, and you're not serving everybody, you're failing. And maybe that's a little bit harsh, because it might not be set up that way everywhere. But for us, we want to support academic units, we want to support academic leadership, we want to support colleges and schools and programs, just as much as we want to support the auxiliary units, parking and the bookstore and finance and budget and enrollment and all of those things.

So our role in the most big picture way is to support students, faculty and staff, and to engage in those three missions as much as we possibly can. And parsing out where the boundaries are is really, really tough. So people on my team, for instance, we have a whole team that does surveys. And those surveys help understand how students are engaging on campus, what their sense of belonging is, what type of activities they're participating in. But one of the things we do is we run what are called the faculty course questionnaires for all of campus. So every time you get to the end of semester, and you go, you know, this professor was great, or this professor wasn't so great, all of those data are sort of publicly available and come through this data group.

And that provides us opportunities to really talk about student success through the lens of teaching and learning and through the lens of how students are engaging in the classroom. And then, of course, there's a whole discipline of kind of what's known as learning analytics. So if you Google that, there's all kinds of papers that are out there. But you know, the degree to which students in different classes perform on different types of exercises or different types of, you know, in different types of learning environments, also matters.

So I, a bunch of years ago, I had the opportunity to work with a kind of multi-institutional team from around the Big Ten. And we were trying to understand whether or not there was any difference in academic performance based on a student's self-reported gender identity in introductory STEM classes. And it was really fascinating, because one, it was like an amazing data engineering problem. And you're like, well, we have a five-hour chemistry class, and they have a three-hour chemistry class. Are those the same? So like, harmonizing data across a bunch of universities was like, insanely complicated. But it delved really deeply into, you know, the unit of analysis of like the individual course. And then, of course, you can go even lower, like, what about the section? What about the instructor? And we are here to support efforts like that as well.

Which by the way, is why this job is so epically fun, because for three weeks, you're talking about bookstore prices, and then you're trying to figure out like, why courses that have just a midterm and a final are different from courses that have a variety of assessments along the way. So I don't know, that was kind of a long answer. But the gist is like, we live where we live organizationally, but we have touch points across the university. And if we didn't, we wouldn't have as good of an ability to answer the really hard questions where those two things land in exactly the same spot. You know, a student might not be successful because they might have basic needs problems, they might need additional financial aid, they might not have a sense of belonging, they might not be engaged in the community. They might also just really not be that good at math and need help with that. And all of that is part of the kind of interconnected sort of supply chain.

The rise of the chief data officer in higher ed

I think when we were talking a bit earlier, Brad, you mentioned that there aren't that many schools today that have a chief data officer. And I was wondering, for schools who are maybe in that spot of starting out with this role of chief data officer, if you have any tips to share with them?

Yeah, I get this question pretty often. And so I'm gonna get my numbers a little bit wrong, but there's roughly 4,000 or 5,000 institutions in the United States that receive Title IV funding. Maybe it's actually probably a little bit higher, but that's everything from like, the California Barber College to Stanford. And basically Title IV funding means, you know, that they're eligible for federal student loans, or, well, that's probably it. They're probably in the, you know, in the Higher Education Act, which Congress never reauthorizes. So we're always under like 20-year-old rules. But there's thousands and thousands and thousands of universities, and I'll just get on my soapbox for four seconds to say, remember that number the next time you read in the New York Times or the Wall Street Journal or the Washington Post that it's so hard to get into college. It's not. We have so many choices, and most schools admit most students. So I always like to just tell people like, the national media does not paint a very good picture of the higher ed landscape at all.

Off the soapbox. Of those, from what I understand, there's probably 50 or so chief data officers, or fewer, with that title. There is a very traditional function in higher ed called institutional research, which sort of reports out all the numbers to the federal government, and they typically do this kind of work. So I'm not trying to pick on you, Lisa, but there at Macalester in St. Paul, they actually have a really amazing institutional research office that's run by really cool people. But if you've worked in higher ed, you've at some point bumped into your kind of IR group. If you want to know anything about a university, these data are publicly available. And so there's someone who submits all of those data, and it's usually IR.

But what's happening more and more is institutions are realizing that their data is this core institutional asset, and that the more value that can be gotten out of those data, the better the institution can do, and the better that they can support students. There's also a growing sort of sense that with the regulatory framework and with compliance frameworks and with the need for data governance and with the need for sort of more standardized policies around data, that, oh, look, we have an IPEDS keyholder, look at that, cool, nice to see you, George. But the conversation around how do we broadly manage our data is something that a chief data officer should be responsible for. And I wouldn't say it's like a weekly thing, but I'd say that probably like once a month, some university reaches out to me on LinkedIn and is like, should we have a chief data officer? What does that mean? What does the sort of construct look like in higher ed? And just like it was a growing trend in industry probably over the last 10 years, it's now sort of catching up in higher ed.

Prioritizing projects and getting people to use data

This has been really fascinating to learn about the setting of university and the kind of problems that you have. I'm a data scientist at Microsoft, and usually in a business setting, I know the challenge that you spoke of, someone actually taking the action on the analysis you're doing, is a really big challenge. And so here, a lot of times we don't pick up analysis that could be interesting because we don't have the identified stakeholders that would actually take the action or be willing to take the action, but it's also chicken and egg. Like if they don't see the analysis, they don't know what possible action they could take. So how do you overcome that challenge? How do you pick out the projects? How do you prioritize them, considering you mentioned such a vast variety of projects you're able to take?

Yeah, I would say carefully and imperfectly, you know, and I think it's one of the coolest things, and I have to give a hat tip to Posit and to Rachel. One of the coolest things about getting in a room like this is realizing that we all have the same challenges. You know, I always had this idea like, oh, you work at Amazon, you work at Microsoft, like they've solved everything. And I've worked in higher ed, right? Like this place, it's been here since 1876. It's not going anywhere soon, but it's certainly decentralized and compartmentalized and has those same type of organizational challenges.

And I would say that mostly we take on projects based on what will ostensibly have the largest potential impact for students or for what we're trying to do. There's also a little bit of a conversation that I've thought deeply about, which is like, what is within your span of control? And sometimes it's like, you know, and this might come across as the easy road out, but it's like, what are things that we could directly impact or influence?

And then there are also times where it is the decision is made in ways that I refer to as like, you're bringing water to the desert, where occasionally there are people that have nothing, they are just totally out in the open. And so on that like dimension of like, what's easy for us versus what's valuable to them is like such in the sweet spot, because the simplest, most basic thing, it's like someone's just been asking forever, like, all I need is this one thing and to be able to get at it routinely. And you're like, hold my beer, let's focus the sprint, let's focus the cycles on that one thing. And those tend to be the situations where people come back and they're like, thank you so, so much, we just needed this one dashboard.

So I would say like, you know, if there's the thing that's easy for you and high value to someone else, that's worthwhile. If there's something where you really feel like it's totally within your galaxy to be like, hey, we had this finding, that finding suggests we shouldn't do this again. You're the person who's directly responsible for that. Don't do that again. And like, that's right within your sort of frame of reference or within your span of control.

And then, of course, as someone who just in case you can't tell, I'm kind of a wacky guy. I really value fun and joy and being a little bit weird, which I think is okay. And that one tiny corner of your whole portfolio should just be that random thing that seems like it's going to be a ton of fun to work on because those are the kinds of things that keep people engaged. And a lot of times it isn't like fun in some whimsical way, but it's just new, like this bookstore thing. That was hard work. It was a complicated problem, but it was probably like five weeks of work. It was time boxed and it was that one thing. And then we passed it on and we're like, oh, that was cool. We got to run tens of thousands of simulations on student opt-out behavior. Now back to our regularly scheduled program and we kind of can focus on the core stuff again.

And you did ask another question about like, how do you convince people to use the data? And that is, I think, you know, it just takes time and it takes trust and it takes effort. And it's also a recognition that it is not a linear path of like, we're getting better and better and better, and every single time there's more trust. It's like you go from step one to step two, and that's great. And step two to step three, and that's great. And then people start coming to you more. And then you totally botch something and you're back to step one because the thing went sideways. And so I like to think that organizations are more resilient than we think they are. And that mistakes are sort of less damaging than we think they are in most cases.

And so people shouldn't be too hard on themselves. If you're like, we found this thing and then people go, oh, that's cool. Right on. We're still going to do this other thing. And you're like, but you might not want to do that because the data say this. And I will say that like higher ed is especially prone to folklore and to, you know, and to people believing that their experience is sort of outweighing data. That's changing a lot. But one of the most fun parts about working in higher ed, and this is one of the things that I was talking to Rachel about, is that if you work in some industries, finding the inefficiencies is really hard because all of the inefficiencies have already been found. And in higher ed, they're kind of like all over the place. So it's pretty cool.

Testing the campus visit weather myth

But to give you guys a quick story about folklore, has anyone ever heard the myth that students choose which college they're going to go to based on the weather on the day that they visit campus? Anyone ever been down that one? Like, I really wanted to go to that school, but it was rainy and terrible. There's no way I'm going there. So it's maybe a little bit of an admissions joke, but there is some mythology around that. I used to work at the University of Minnesota. Minnesota has famously great weather. And I tested that assumption and it turns out to be false.

And that included pulling hour-by-hour weather data for every single day, going back years and years and years, and matching that up with when students visited. But to make it even more interesting, I know I'm talking to a bunch of data people, so you're all gonna be like, how did you do that? I actually calculated not the weather on campus, but I calculated the delta of the weather on campus from where they came. Because I'm imagining someone from Southern California being like, wow, this is a shocking difference. And I modeled that like in every possible way. It was full on p-hacking data torture and I couldn't find anything to support that assumption. So the next time you hear that one, college admissions people, I tested it in truly one of the worst climates or the most extreme climates in America. And it's not true.
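
The transcript does not say which models were actually fit, so here is only a minimal Python sketch of the "delta" idea described above, under loudly stated assumptions. It computes the temperature gap between campus and a visitor's home on the visit day, then uses a permutation test (a technique swapped in here for illustration, not confirmed as the speaker's method) to ask whether that gap differs between students who enrolled and those who did not. The function names and the shape of the inputs are hypothetical.

```python
import random
from statistics import mean


def temperature_delta(campus_temp_f, home_temp_f):
    """Weather 'shock' on the visit day: campus temperature minus home temperature."""
    return campus_temp_f - home_temp_f


def permutation_p_value(deltas, enrolled, n_perm=2_000, seed=0):
    """Two-sided permutation test for a gap in mean delta between
    visitors who enrolled and visitors who did not.

    Assumes both groups are non-empty (hypothetical input format:
    parallel lists of deltas and True/False enrollment flags).
    """
    rng = random.Random(seed)

    def gap(labels):
        yes = [d for d, e in zip(deltas, labels) if e]
        no = [d for d, e in zip(deltas, labels) if not e]
        return abs(mean(yes) - mean(no))

    observed = gap(enrolled)
    labels = list(enrolled)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any real link between weather and enrollment.
        rng.shuffle(labels)
        if gap(labels) >= observed:
            hits += 1
    return hits / n_perm
```

A large p-value from a test like this would be consistent with the finding reported above, that the weather delta shows no detectable relationship with the enrollment decision.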


Strategic communications and enrollment science

Brad, do you have any experiences or stories in using data science techniques to attempt to measure the effectiveness of CU's strategic communication efforts? Particularly in terms of things like recruiting effectiveness or public sentiment to university outreach efforts.

Yeah, that's a great question. We have a lot of connections with what at universities is called enrollment management. And they ask questions like that all the time. But that is sort of a really common set of questions that are asked, which is like, really, what is the effectiveness of anything? What is the effectiveness of a virtual visit versus an on-campus visit? What is the effectiveness of an email versus a paper mailing? And those are the types of things that are routinely asked. And if you work in an organization that does sort of marketing work or does communications work, those are also, I think, some of the highest value conversations to have in data science land. Because so often what we do is very tough to append a specific cost or a specific sort of ROI to.

And so there are things like that where you're like, look, you know, we sent out this mailer. Did it work or did it not? We sent out this email communication. Did it work or did it not? Or what was the effect of that? Or what was the effect of that on different students? And, you know, email is basically free. So it's sort of an arcane and not very important research question. But there are questions of like, you know, this whole combination of communications, what do those do? And if they are found to have limited impact, then it's easy or tends to be easy to make that argument of like, you really should pivot these dollars to something else.
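One simple way to frame the "did it work or did it not" question is a two-proportion z-test comparing enrollment rates between students who received a mailer and students who did not. The sketch below is an assumption about how such a comparison could be set up, not a description of CU's method; the function name and the counts in the usage example are hypothetical, and the test only supports a causal reading if students were assigned to the two groups at random.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in two proportions."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: enrolled students out of those who did / did not get the mailer.
z, p = two_proportion_z_test(success_a=120, n_a=1000, success_b=100, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value would fit the "limited impact" case described above, which is when the argument for moving those dollars elsewhere gets easier to make.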

And enrollment management is becoming increasingly scientific. I mean, it really is. There's a guy, his name is Brock Tibert. He's going to be mad at me that I mentioned his name. But he thinks of it as like a science, enrollment science, you know, because what you have are millions of high school graduates. And you need to land a class that is big enough to, you know, it needs to be big enough to support your operations. It needs to be the right size. But you also need to have that across different majors and different colleges and different programs. And you need balance across all these different things. So if you Google the concept of enrollment management or enrollment science, you'll find a bazillion papers out there on how this happens.

And it starts with like millions and millions and millions and millions and millions of people whose name gets entered into a database, basically. And this is also a really fascinating conversation now because COVID had a dramatic effect on this entire industry. One of the big things was that many, many, many more schools are now no longer requiring the SAT or the ACT. And in so doing, the SAT and ACT were the primary source of a lot of these names. And so even like getting the kind of marketing apparatus going around student enrollment is a different story than it was even three to five years ago.

Open source tools and Posit Connect in higher ed

I'm also in the kind of higher ed data analytics and IR world over at Georgetown. I resonate with a lot of the comments about Tableau and this space and Power BI and everything around it, Looker, you name it. But I think I have kind of found a team of folks who are really curious and see an enormous amount of potential in the open source space in the orbit of R and Python and that sort of thing. And I was curious whether you have a Connect server up and going, whether you're kind of working with it in a production fashion, or whether you've thought about it or were headed in that direction, since it isn't as common in IR as I might have thought.

We have, and I think that it is certainly a direction we'd be interested in pursuing, going into open source visualization tools. We do have kind of an interoperable, multilingual data science team. And by the way, this is only on the administrative side. There are of course probably hundreds of academic units out there. Scott, as you know, like who knows if the physics department has like a massive server and they're deploying everything. Like who knows, right? Very possible. Actually, to give you a sense of how decentralized higher ed is, it was easier for me to ask Rachel which other people on CU's campus were using Posit than to find out myself. So like that's the world that we're in.

So I honestly think that as powerful and as great as those tools are, we still haven't exactly figured out how to make that happen. And some of that is kind of tradition and sort of common use of existing tools. And then some of that is the sort of deployment question, which is like one of the biggest complaints you get when you're working in a bigger organization. It's like, please don't give me something else to log into. My bookmark bar is already like 900 things.

But Scott, I would think that if you have the appetite to build it, and then of course there's like the cost structure around it, you know, where, you know, deploying something in Posit isn't free, but it's definitely cheaper than other options. But I would love to connect and sort of hear what your journey is like. We do have a couple of people who have built stuff in Shiny, but then, you know, we go, well, some end user wants to touch that and interact with it. And that end user is almost certainly not going to have the skills to even like fire up an IDE and like run the framework. So we're probably in early days on that. But I think what's probably a more likely path is that we would at least to start, and this is all like just me brainstorming in front of 127 people on a high wire, would be that we would use some of the HTML, like the semantic HTML frameworks that come from open source tools. So we would deploy using something like flexdashboard or something like, something like, oh my gosh, why is my brain falling out of my ear? Highcharts, which is free to use for academic institutions, to just like deploy HTML web pages and put those out there. I think that would be kind of our like gateway into a full deployment of something like Shiny.

Building a campus data community of practice

Brad, one question, actually, one of my colleagues had asked this earlier, but they mentioned they saw something online that you host a community of practice event and was just curious to learn like how that came together and how you publicized it.

Yeah, that's a great question. So there was an online community of practice, there was sort of a Zoom meeting when I first joined CU and what we find, and I think this gets back a little bit to Deepa's question, which is like, how do you get people to sort of be in the experience? And the College of Engineering has analysts and all these places have analysts and they all have people who are working on similar problems. And so the better we are able to coordinate everybody and the better we're able to sort of work with everybody, the better we do. So we built out this data community of practice and it was a Zoom thing and we invited some guests and I haven't looked through the whole list here, but I may have hit some of you up to be guests at some point. I know one person in particular, but anyway, we'd have people come in and talk about data governance or master data management or we'd have people come in and talk about different things.

And the idea sort of surfaced of like, we're all kind of tired of staring at each other on Zoom a little bit. This is amazing because you're all distributed all over the world, but we have a need to build this community. And so we decided to have an event and we actually jokingly, I mean, I name things so stupidly. The biggest problem in computer science is naming things. It's why I'm never going to be a computer scientist because I'm terrible at naming things. But we straight up went with like IRL, like we're going to have a data community of practice conference in real life. And let's all get into the room together and let's focus on a couple of things. One, we wanted to do some actual technical training and provide that value to our campus. So we had one training session that was in R in the tidyverse and we had another one that was in Tableau. And it was just a chance to sort of take some real experts in that and lift a bunch of boats all at once.

And then we had some sort of broader conversations, including a keynote speaker who spoke about the need to think about data in more equitable ways. We are constantly having this conversation about these important conversations around diversity and equity and around racism and anti-racism. And believe it or not, if you haven't spent time thinking about it, the way you collect data, the way you capture data, the way you count people matters. And so we brought in someone who is an expert in this. His name is Carson Byrd. We brought him in from the University of Michigan to really talk to everybody and to have like a thought provoking conversation around the ways that data may or may not reflect the communities that we're trying to represent.

And then we also gave people the opportunity to do some lightning talks, which were absolutely amazing. We invited people from around campus to basically say, for 15 minutes, you have the opportunity to talk about a problem that is vexing you. And I cannot tell you enough how cool it was. And I would recommend it to any of your organizations to not only stand up a community of practice and just to get people together and say like, hey, what are you working on? What problems do you have? But to occasionally do your best to try and get everybody together in a meaningful way. Because people walked out of that, in my view, not only technically better, but with the ability to sit down and eat lunch with that person who sits across campus and is working on the same thing.

Rapid-fire questions

Tsering asked, have you done any analysis of the chances of a student successfully finishing a course or failing out? Yeah, that's a really, really common type of analysis that happens in higher ed is trying to estimate a student's likelihood. But again, we're not here to filter people out or to make people feel... There isn't some sort of channel for that. What it allows us to do is it allows us to focus our resources, like any predictive model, on the students that might need additional help. So if we can estimate that a student might not be successful in a particular class, it isn't like, don't take that class. It's what support can we give you so that you are successful?

And I think you answered this one in the chat, but is there a likelihood of colleges to move to AWS or Azure completely? Yeah, I mean, hang out in my email for a day and you'll see that higher ed, I mean, it's a massive industry. There are thousands of universities. And yes, there are institutions that have fully gone into a variety of cloud environments for a variety of reasons. So no doubt.

And I think maybe one last question I missed was you talked about synthetic data sets. Do you have any packages you could recommend or approaches? I would need to get back to you. I think that we're building those in Python and my Pythonic chops are a little bit weaker. So I could find out what we're using. And yes, I forgot to drop my favorite fun fact, which is that IPython was a side project of a grad student at CU Boulder in the early 2000s. So Fernando Pérez is a physicist and realized there wasn't an interactive way to touch Python. And so there's some lab around here in which IPython and subsequently Jupyter were born. So depending on your view of CU, of Jupyter notebooks, either you're welcome or I'm sorry.

Well, thank you so much, Brad. I think that we got through most, if not all of the questions, but if there's anything you missed and anybody wants to send it my way, I can share it with Brad as well. But it was a pleasure getting to learn from you and hear about all the different projects you're working on across the university. This was awesome. Thank you very much, everyone. Thanks for your time. And thanks, Rachel and Posit, for having me. And if anyone has questions, you can find me in all the normal spaces.