From Chemical Engineer to Dow Data Science Leader | Michael Hausinger | Data Science Hangout

Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout, everybody. If we haven't had a chance to meet before, I'm Rachel, I lead Customer Marketing at Posit. Posit builds enterprise solutions and open source tools for people who do data science with R and Python.

And I'm joined by my lovely co-host here, Libby.

Hey, I am a Community Manager. I work with Posit to help foster our Hangout community here. I'm also a mentor in Posit Academy, where I help professionals do more with data by learning how to use both R and Python.

We're so happy to have you all joining us today. The Hangout is our open space to hear what's going on in the world of data across all different industries, chat about data science leadership, and connect with others who are facing similar things as you. We get together here every Thursday at the same time, same place, except when it's a holiday.

But if you're watching the recording and want to join us in the future, there will be details below in the YouTube description where you can learn how to add it to your own calendar and join us live.

But you can always reach out to me directly on LinkedIn as well. I love getting to connect with you all.

At the Hangout, we love hearing from you. It doesn't matter what your years of experience are or your title or your industry or what language you use or don't use. We really encourage you to talk in the chat, connect with each other, add your LinkedIn profile, introduce yourself, what you do, what your role is. If you have any roles that you're hiring for or if you're looking for a role, please, please talk about that so that we can connect people to each other.

And as a reminder, this is a community-driven discussion with our featured leader here. So if you don't ask questions, we don't have a conversation to have. So there are three ways to ask questions, which you are all here to do. You can raise your hand on Zoom, and we can just call on you so you can jump in live. You can put questions into the Zoom chat. We will find those and make sure that we ask them to Michael. We will ask them for you if you put a little asterisk next to them. But if you don't, we will just call on you and you can ask that question live.

And then there's also, if you feel more comfortable asking questions anonymously, there's a Slido link that Isabella will put in the chat. You can ask your questions there and we will also see them and ask them for you.

With all that, we're so excited to be joined by our featured leader and co-host today, Michael Hausinger, a scientist in research and development at Dow. And Michael, I'd love to have you kick us off with introducing yourself and a little bit about your role today, but also something you like to do for fun.

But the fact that you get something where it's not even like there's multiple correct answers, there's an infinite number of correct questions to even try to answer was really valuable in my perspective.

Career advice: find the intersection of enjoyment and value

So when people ask me about career advice, the first thing I'll say is I don't really know how to advise somebody to end up in a role like mine because I wasn't trying to get into a role like mine and it didn't really exist at the time. But what I did do is that I kept track of things that I enjoy doing. And if you can find a match between things that you enjoy doing and are good at and things that other people find valuable but have no interest in doing, you're probably in a really good spot.

So I enjoy a lot of this data manipulation type stuff. I enjoy putting together complicated graphics and so forth. There's a lot of people who really want absolutely nothing to do with some sort of computer-related programming tools and would find it extremely tedious to try to put together a database of information or working through all the bugs of the formatting of text or even a graphic or designing the table that makes the graphic and all that sort of stuff. And a lot of people would find that extremely tedious and not so interesting.

But I felt a lot of pull from people who wanted the finished product, wanted the finished system where they could search through information or make a clean graphic or whatever it is, do their lab data analysis more efficiently, whatever it is. And I enjoyed the process of making those sorts of tools. And so it's worked out well as a nice fit together between those two things.

Collaborating across teams and building prototypes

So yeah, there's a lot of different types of questions that you might get asked, and they will vary in how easy they are to answer and how easy it is to build some sort of Shiny app or whatever it is. And one of the things that I've learned is that it really helps if you have the ability to sit down with somebody and have them hand you the start of their data, whatever it is. And if they can just sketch out, well, this is what I want it to look like, and maybe a little bit of what the math is in between, then you can start putting together a first version.

And if I have a first version, I always like to just hand a Shiny app to somebody and say, try this, let me watch. Because the first thing that they're going to go try to click on tells you a lot about what they actually want to be doing. So if I hand somebody a webpage and it takes them three tries to click on the right thing, okay, I need to either label things differently or design things differently or whatever.

And a lot of times it's not too much effort to redesign the display of something. It's the calculations and the import and so forth that are the hard part. It's the data cleaning and the import and the math that takes more effort. And then if you just tweak the display so that your graph switches the axes or switches from a bar plot to a scatter plot or whatever it is, that makes life a lot easier and gives the real answer that somebody wanted. So if you can get a quick prototype out there and let them look at it, that goes a very long way.

Understanding the data before you start

So the most helpful thing to me is having a good understanding of what format the underlying data is actually in. If somebody says, well, I want to take all this information and I want to put it in a more structured database, or I want to make a graph out of it or whatever, I want to be able to see a representative set of the files that everything's sitting in right now so that I can tell, okay, how consistent is the naming of your files? How consistent is the naming of what's going to end up in a column? Is it a bunch of paragraphs of text that have pieces of information in them? That's going to take a lot of work if we're actually going to turn that into tables.

Or maybe it's all sitting in some sort of consistent analytical output that's a spreadsheet but is very non-tidy, to use programming terminology here. If it always comes out the same way, you can probably do something with it. And so just that first glance of seeing what the data looks like is huge.
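To make "non-tidy but consistent" concrete, here is a minimal sketch in Python that reshapes a wide table (one column per measurement) into tidy rows with one observation each. The function name `wide_to_tidy`, the column names, and the values are all made up for illustration and do not come from any real instrument.

```python
def wide_to_tidy(rows, id_column):
    """Turn wide rows (one column per measurement) into one row per observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row instead
            tidy.append(
                {id_column: row[id_column], "measurement": column, "value": value}
            )
    return tidy


# Made-up example data: one row per sample, one column per measurement.
wide = [
    {"sample": "A", "temp_C": 150, "conversion_pct": 42},
    {"sample": "B", "temp_C": 175, "conversion_pct": 57},
]

for record in wide_to_tidy(wide, "sample"):
    print(record)
```

Because the wide layout is the same every time, one small function like this can be reused for every file that comes out of the same source.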

I will also say that understanding the value is also huge, at least in terms of how much time it takes for somebody to do whatever they're doing right now and how many people would benefit from a new tool. If you know that the same tool is going to be useful for a couple dozen people, that's a lot better than if you're designing it specifically for the one person. But if it's one person who spends an hour a day doing the exact same data cleanup or whatever it is, it's probably worth it to speed that process up.
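A rough back-of-the-envelope version of that comparison, as a sketch only. The one figure taken from the discussion is the single person spending an hour a day; the 24-person group, the five minutes saved per person, and the 250 working days per year are assumptions picked purely for illustration, and `hours_saved_per_year` is a hypothetical helper.

```python
def hours_saved_per_year(people, hours_per_day_each, working_days=250):
    """Estimate yearly hours a tool saves, assuming it removes the task entirely."""
    return people * hours_per_day_each * working_days


# One person who spends an hour a day on the same cleanup (from the discussion).
one_heavy_user = hours_saved_per_year(people=1, hours_per_day_each=1.0)

# Assumed for illustration: two dozen people who each save five minutes a day.
many_light_users = hours_saved_per_year(people=24, hours_per_day_each=5 / 60)

print(one_heavy_user, many_light_users)
```

Under these assumptions both cases come out in the hundreds of hours per year, which is why either one can be worth the effort. The estimate only covers time saved on existing tasks, not questions that would otherwise never have been attempted.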

Visualization training and binary file formats

So, on the visualization thing, I will say one of my favorite Shiny apps that I've ever put together was something that I used in this visualizations training. The concept there was that I liked running a training in the past where, in person, I could just hand out a folder to everyone in the room and say, open this in a minute. I'm going to put a question up on the board about the thing that's in your folder. It's got one sheet of paper. It's got some data in there. Answer this question, I don't know, which catalyst is best at having high conversion under these circumstances for the data that's in your folder? And just raise your hand when you know the answer.

And what I didn't tell them is that there were about a dozen different possibilities. Some people got bar graphs. Some people got scatter plots. Some people got a table. Some people got whatever. And so, some people would just raise their hand in five seconds because they got an immediate answer from their scatter plot or something. Some people would still be staring at their text 30 seconds later going, why do I feel dumb? Everyone else is raising their hand and I can't answer this question. What am I missing? And so, you just got unlucky with a data format that wasn't optimized for this situation.

And so, I put together a Shiny app that does the same thing. It's just a web link. You click on the link and it's got a random number generated in the background that says which of these dozen possible data displays is going to load when you load the page. And so, I could run some of these visualization trainings online now and just drop a link in the chat, and it looks very unsuspicious for everybody to be clicking on the same link. I know that they're clicking on the same link. That was a lot of fun.
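The idea behind that app can be sketched in a few lines of Python, independent of any web framework: each page load draws a random number and uses it to pick one display. This is not the actual app. The display names in `DISPLAYS` and the function `pick_display` are hypothetical.

```python
import random

# Hypothetical display variants; the training described above used about a dozen.
DISPLAYS = ["bar graph", "scatter plot", "table", "paragraph of text"]


def pick_display(rng=random):
    """Choose which display of the same data a visitor sees on this page load."""
    # randrange draws an index uniformly, so every variant is equally likely.
    index = rng.randrange(len(DISPLAYS))
    return DISPLAYS[index]
```

Calling `pick_display()` once per page load means everyone opens the same link but ends up looking at a different presentation of the same data.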

In terms of the binary file stuff, so a lot of different analytical instruments have their own system of how things get categorized or get saved. And there's a lot of stuff out there where a particular file format is just a wrapper around some other file format. So, a .xlsx file is actually a zip file. You can change it to a .zip manually on your computer and unzip the thing and find a bunch of XML pages and stuff in there, which is always fun to find the pages and the graphics and so forth in all these Microsoft file formats.
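As a minimal sketch of the point that an .xlsx file is a zip file, Python's standard `zipfile` module can list the parts inside a workbook without renaming it first. The function name `list_xlsx_parts` is hypothetical, chosen only for illustration.

```python
import zipfile


def list_xlsx_parts(path):
    """Return the names of the files packed inside an .xlsx workbook."""
    # An .xlsx file is an ordinary zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling `list_xlsx_parts` on a path to a real workbook typically shows entries such as `[Content_Types].xml` and `xl/workbook.xml`, which are the XML files mentioned above.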

And there's a handful of file formats out there that we use regularly that we wanted to dig into. A lot of those actually have information online where somebody has already published a parser, at least to some extent. And sometimes it's even a public domain file format where they've described all that information. But if somebody has just published a parser online, it's not always complete. Maybe there were some things that they missed when they were putting things together. And so the ability to dig around in there has been useful. It's nice to start with Excel files if you have those, or CSVs or whatever else. But sometimes you can't do that, and it goes a long way to be able to make some tweaks to other structured file formats coming in.

Internal vs. external tools and navigating the Microsoft ecosystem

It has been much easier for me because I'm making things that are for internal use so far. We have a sizable organization that is in charge of our Dow.com site, and everything gets more complicated if it needs to be external facing, because branding is very important to a company like ours: it has to have the right formatting and the right color scheme and so forth to fit with everything else at Dow.

When you're just putting together a smaller type of tool for something internal, you don't have to worry about that to the same extent, which is nice. Now, on the other side, though, I will say we have a number of things that we're working on pushing towards external facing tools. The ability to put together a Shiny app or a Dash app or a Streamlit app or whatever it may be and say, this is what I want it to look like, this is the functionality I want, et cetera, I think is really valuable to somebody who's working on the true web interface for the company. They can say, okay, well, maybe I'm going to write that in a different suite of tools, but I can visibly see all the functionality that I want to incorporate and that the people using it find important. I think that speeds up the process and requires fewer iterations of the back and forth.

Having a prototype, even if it's an internal prototype, goes a very long way on that sort of communication.

Quantifying the value of learning R

Yeah, quantifying value of any sort of digital tool has been a major struggle for all of us because a lot of times it's in time saved doing a task. And then the real question is, would you have even bothered to spend that time doing that task if you didn't have the programming tool?

So one of the biggest compliments, I think, that I can give R is that I have run a number of calculations that we just wouldn't have attempted if I didn't have a programming tool. And so I can answer questions where we just would have said, we can't do that.

Now, what number can I put on the value of those? That's really hard. And time saved also gets messy when you're communicating to stakeholders. If you say, well, I saved this group of, whatever, a dozen people two hours a week each, they say, oh, that's great. Does that mean that we can have one fewer person in our group? Well, that's not really the answer that I'm looking for.

But if you can save some of the tedious steps, that's a good thing. If you can answer questions that you just would not have been able to answer in the past, because you would have given up first, that's even more value. But putting a number on it is not something that I've found a good solution to.

Overcoming imposter syndrome and building confidence

So fortunately, I was definitely not the first person learning R at Dow. And there were some other people, like Tony, and James, who I know is here, and others who are very friendly and open to consult on things. And one of the things that I think all of us have learned is that if you've got a group of people who are all working in the same space, and you've got a new person who asks a question, a more experienced person might be able to answer that question, but might not have necessarily known that answer directly. It's like, I'm 90% sure that the answer is this. Let's try it. Yep, it worked. That's cool. I had never thought about it that way.

So I've learned a ton of stuff from the other broader Dow community in this area. But I think I've been able to share quite a few things that they've learned from me as well, because I'm asking slightly different questions than they had asked in the past. So once you start to realize, like if you have more experienced people who are very willing to say, I think it's this, but I've never tried it before, that quickly gets you out of the imposter area, because you know that this is not a stupid question. I'm doing something that's kind of new and unusual, and that happens very quickly.

The other thing that I've encountered is that there's a huge amount of information online for all these programming languages where you can ask a question in some way, and it might give you an answer. That goes a long way. Convincing people that they're going to get somewhere by Googling it is huge.

So with one of the recent Posit Academy groups that I was working with, one of my favorite moments in that was somebody had a question where they wanted to make a plot on a log scale instead of a linear scale, and I said, I want you to share your screen. We're all here. We've got five minutes left in class. I want you to just type something in your search engine in the terminology that you would use for figuring this out, and open up a page and see if it gives you the right command or the right function to use here. Let's go. Let's try it.

And she Googled something, and she found a ggplot function, scale x log or logarithmic or something like that, that accomplished what she wanted to do, and it was there. It was new to everybody in the group. I had seen it, but it was new to everyone else in the group, and it was, I'm going to make you Google this, and you're going to find it, and you're going to get that answer. And the confidence that you can do that goes a very long way as well. Even if it breaks, it wasted 30 seconds crashing once, and I can go Google this again and try something else.

It's a lot easier to run tests on code, especially if you're just doing something like modifying the graph, than it is to run a new reaction in lab that may take you hours or days or whatever it is. And the difference in time there is something that takes a lot of scientists some practice to get used to.

What makes Academy successful at Dow

So I think that, in my experience, Dow has always had a good willingness to let people learn new skills, and a desire for people to learn new skills. With Academy itself, I think one of the things that we've learned is that it goes a long way to make sure, before somebody even starts going through Academy, that we've been in touch with their management chain and got explicit approval that yes, we will set aside time for so-and-so to have enough available bandwidth to work on all the lessons and things in Academy.

And then the other thing is, the faster that you can work with your own data and get some benefit out of it, the better. So it's more motivating to you, and it's more motivating to your management. If three weeks into Academy, you say, I just took this Excel file that I get on a daily basis out of some instrument, and I turned it into this graph that I have to make every time, and it was faster. All of a sudden, you're like, okay, so you set aside this time for me to work on this class, and I'm still going to need that time. But now one of the other things that I was doing on a daily basis, I don't have to do anymore. People quickly start to see the value there, and it's exciting for both the leadership and the students. And that's a really valuable milestone when people hit that.

So we've had good support, but making sure that we're up front with the communication and up front with the approvals and so forth is really important to make sure that people don't get pulled away into some other responsibilities, whether that's a different training program or a different job responsibility that pops up or whatever.

What's next: broadening awareness of what's possible

I think that the next big thing is getting more people aware of what's possible with the programming tools, not necessarily teaching them all the programming tools. If people can come in and say, here's this concept that I have, I think it might be easy for somebody who knows R or somebody who knows Python or whatever it is, and actually be mostly correct on that assessment, even if they don't know the programming stuff, that's a huge step forward.

So trying to put out more broadly useful examples of these things and advertising them to the organization and saying, here's a couple examples of something that we've been doing recently, you might like some of these, but let us know if you have other ideas of something that might fall into the same realm that might look similar, that sort of stuff, come talk to us and it might be really fast to accomplish.

I think that's a big goal for everyone in the data science world, to have the right communication with the non-programmers out there who have a lot of great ideas. The faster you can filter those down to the ones that are good for computers and the ones that are not as friendly for computers to work on, the easier your job becomes.

Well, thank you so much, Michael, for joining us today and sharing your experience. I'm so glad to hear how this experience with Academy has gone for you, and I'm just so happy to have so many people from the Dow team here today as well.
