
Data Science Hangout | Tiger Tang, CARFAX | Quantifying the Hours Saved

Sep 13, 2022
1:02:30


Transcript

This transcript was generated automatically and may contain errors.

Hi friends, welcome to the Data Science Hangout. If you're joining for the first time, very nice to meet you. I'm Rachel. So the Data Science Hangout is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and what's going on in the world of data science. The sessions are recorded and shared to YouTube as well as the RStudio Data Science Hangout site. So you always can go back and rewatch older sessions or find helpful resources there too. We also have a LinkedIn group for the Hangout too. So if you ever want to continue the discussion with somebody or ask other questions there, feel free to use that as well.

I know a lot of times it's me posting in there, so feel free to start your own posts in that group too. Last week, Marcos shared an idea of starting a show-and-tell or show-and-share thread there, so you could highlight a side project you're working on, or maybe a cool tutorial or article you found. At the Hangouts, we're dedicated to creating a welcoming environment for everybody, so we love when everyone can participate and we can hear from everybody. There are three ways you can ask questions today. If you haven't been here before, I'll walk you through this. You can jump in and raise your hand on Zoom. You can put questions in the Zoom chat, and feel free to put a little star next to your question if you want me to read it out loud. Maybe you're in a loud environment. And lastly, we also have a Slido link where you can ask questions anonymously. My colleague Tyler just shared that in the chat right now.

Just want to reiterate, we love to hear from everybody. So no matter your level of experience or the area of work as well. But with that, I am so excited to be joined by my co-host for today, Tiger Tang, Manager of Data Science at CARFAX. And Tiger, I'd love to have you introduce yourself and maybe tell us a little bit about your role, the company, and also something you like to do in your free time outside of work.

Oh sure, thanks Rachel. So my name is Tiger and then I'm really glad to be a part of the Data Science Hangout. And then I've been following and watching a couple of videos in the past. They've been also very helpful to me. And then so I'm currently a Data Science Manager at CARFAX. We are a company focusing on providing the data insights for consumers to better own, shop, and then purchase their vehicle, all that. And then oh great, I saw somebody use CARFAX before. So my role is basically building a data science team. And then that's mostly focused on NLP because we receive a lot of car information where we have to use a lot of the NLP techniques to handle and then to make it displayable on our product. And then we also do some of the forecasting. And then that's my main team's focus.

And we do a lot of analytics here and there. For this team, so far this is our second year. It went from almost zero or one to now, I think, around seven to eight people. That number fluctuates because of the number of interns that we have. In my free time, I like to collect shells, so seashells. If I go to a different beach, I will check out the sand. If you see me keeping my head down for a couple of hours, that's me just trying to find shark teeth, all that. Usually nothing can be found, but I just enjoy looking for things.

It reminds me of a little bit like data mining, right? But usually I think with R, I can find a little bit more things, but just me walking by the sand and then by the ocean, usually the sifter is not as good as the R tools I use at work.

NLP use cases at CARFAX

I know somebody had reached out to me earlier this week and said that they'd love to learn a little bit more about NLP use cases. And since you focus on this area, I'd love to learn a little bit more about how you use NLP.

Oh, sure. So one example is that if you go to a service shop to service your car, you will get an invoice saying these are the services they performed. We will receive pieces of information similar to that invoice layout. But the data content could be written by the technician, or it could be standardized language for that specific location or management system. And when we get that, there could really be a million different ways to express the same service.

So this is where we try to make good sense of that data and understand, hey, Rachel, for your car service, let's just say last Friday, what was done there. Once we have that information, we can display it on the product. So I think most of our cases are about making sense of the human-written data or that type of car-related information. And I will say that the techniques you would use can really differ. If we're just doing a prototype and the initial proof of concept, you can start by building very simple and straightforward, let's just say, logistic models. There are books on that: Tidy Modeling with R, and I think there's also the one with Emil, Supervised Machine Learning for Text Analysis in R, on text modeling, all that. Those are very great resources if people are interested in, let's just say, applying NLP-related stuff in their own work.

But I guess the most important thing to me is to find a great... oh, somebody already shared the link. Thanks, Libby. I think establishing the business case first is very important, but the model development can really differ based on the timeline and the resources you have.
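As a rough illustration of the "very simple and straightforward logistic model" starting point mentioned above, here is a minimal sketch in plain Python. It is not CARFAX's method: the invoice lines, the labels, and every function name are invented for illustration, and a real prototype would more likely use the R tooling from the books mentioned in the talk.

```python
import math
from collections import Counter

# Hypothetical invoice lines and labels, invented for illustration only.
TRAIN = [
    ("oil and filter changed", "oil_change"),
    ("replaced engine oil", "oil_change"),
    ("lube oil filter service", "oil_change"),
    ("rotated all four tires", "tire_rotation"),
    ("tire rotation performed", "tire_rotation"),
    ("rotate tires and check pressure", "tire_rotation"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary; unseen words are ignored."""
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, positive_label, epochs=200, lr=0.5):
    """Fit a binary logistic regression with plain stochastic gradient descent."""
    vocab = sorted({w for text, _ in examples for w in tokenize(text)})
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = featurize(text, vocab)
            y = 1.0 if label == positive_label else 0.0
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return vocab, weights, bias


def predict_proba(text, model):
    """Probability that the text describes the positive label."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


model = train(TRAIN, positive_label="oil_change")
print(round(predict_proba("changed oil", model), 3))
print(round(predict_proba("tires rotated", model), 3))
```

The point of a baseline like this is speed: it shows whether differently worded lines can be mapped to the same service before any time goes into heavier models.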

Selling data science to the business

Tiger, I know you gave, I already just posted about this on LinkedIn because I loved your RStudio conference talk as well. But I'd love to hear a little bit more about your experience of kind of selling data science to the business and getting the team on board.

Oh yeah, that is a great question. One of the things I really love about what's been going on in recent years in data science is this: take the time back to 10 years ago, and people were still trying to come up with a better definition of data science. But now people with different backgrounds will have heard of data science here and there. So I think that makes our job easier when selling the concept. You don't have to bring people up to speed on a specific background and say, hey, if you haven't heard, data science is a great help.

Now we get to focus on saying that if we can utilize specific data science tools or techniques, these are the specific business values we can bring. So you don't have to sell the concept of data science. You just need to sell, let's just say, project by project. That's what I did in the past. Once I realized there were multiple ways for us to develop, let's just say, processes and reporting tools to help business people and account representatives better manage their accounts and be more effective at their jobs, I would use that specific business case and share it with upper management and say, hey guys, I recently identified this idea that, as an example, will help us speed up our roadmap by two years. Do you want to hear about that?

And one of the things I learned is that at that stage, you don't really need to mention any of the tools you use. I am a big fan of R, a big fan of Shiny, and I want to talk about them wherever I go. But in several of those experiences, when I got so focused on the tools and the techniques, it turned out that's not everybody's focus. That's probably people's focus when you share the same background, when you are talking to a fellow data scientist, so they're going to get excited too. But in this case, I just said, hey, this is the value. This is how much time you can save. This is how much business value we can accumulate in a much shorter time. And if you allow me, I can present you a roadmap, right? If you give me one month to do the pilot, I can even show you some of the initial business value.

So I never tried to persuade people by saying, hey, data science is valuable. I think showing it works better: you know, give me a week, give me two weeks, I'm going to show you a proof of concept right there. And then they will buy into the idea themselves. That's what has worked out really well for me. I also have very understanding leadership, so once they saw that, I think the buy-in process went very smoothly.


That's great. Thank you. And I see Mark, you're commenting in the chat. Can I have you share that as well?

Yeah, I just said that, Tiger, I have actually used some of the ideas you talked about in your conference talk to get a project greenlit, basically starting from the value problem and working backwards to what I'm interested in, which is how to do it. But, you know, telling the stakeholder, here's why you would actually be interested in this, going in that direction is something I'm not always the best at. So yeah, glad that that methodology worked out.

Great. Awesome to hear.

Communicating value to stakeholders

Do you have any tips for how we can all get better at doing that, or communicating that to the business? I know it's like putting it in terms of the money and return on investment. But do you have other tips for communication?

Oh, yeah. One of my former bosses did this exercise a lot of times with me. Sometimes I want to get into, let's just say, the technical details. And he would always say to me, hey, Tiger, how about this? Explain it to me like I'm a five-year-old. So that's where I ask myself, if this is a difficult concept, can I use a very good analogy there to make people understand?

So as an example, say I want to explore a different NLP method, and this will need, let's just say, additional resources. When I say, hey, this is a different method I want to try, how do I show the additional value, given that this is an exploration phase? One of the analogies I came up with, and I'm still not sure if it's a good analogy, so I'm going to try it and you guys can let me know, goes like this. When you have the raw data, it's like you have, let's just say, a ribeye steak, and this whole modeling process is like cooking. So once you have, let's just say, a logistic model, deep learning, and the transformer models, these are just different ways of cooking the same steak. You can pan fry, or you can use an air fryer, which is the newer way. But we really need to try different ways to identify the best way to cook this piece of steak. And the hyperparameter tuning is just us trying to change the cooking time or, let's just say, use the oven to change the cooking temperature, right?

So that's where suddenly all of these big words from, let's just say, machine learning methodology turn into something that most people can relate to, right? Whenever I said that, people were like, yeah, yeah, I think I know what you're saying. So, but I just don't know. Yeah, I think analogy is definitely a great way to get everybody there.

And then another thing is that I always try to hold back the urge to share any technical breakthrough that I really want to share with everybody. One example is that, I think around four years ago, I was at the RStudio conference and I heard RStudio folks talk about future plus promises, so that you can make your apps async, asynchronous, and handle multiple processes at the same time. I got super excited because I have several apps that handle a lot of requests that would otherwise have been, let's just say, hundreds of ticket requests to our department.

And this is where I was saying, oh, we need to invest more time to do this, but I didn't really have a good way to explain it. I was saying, we can make the app asynchronous; R, you know, can only handle single-threaded jobs. People were lost. And I was like, how come? I think everybody had great feedback when Joe Cheng was talking about it. How come I can't get that across? Later on, I realized that unless this is degrading people's app experience, they should never need to know R is single-threaded, because it doesn't really matter to them. So in that case, I sensed that nobody in the room was understanding what I was saying. Then I said, you know what, if we do this, even if we have 10 users running those different tasks at the same time, their experience is going to be perfect. But right now, if you have them doing that, everybody has to wait. And then the business side said, yeah, let's do it.
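The "everybody has to wait" point can be illustrated with a small, language-neutral sketch. This is Python rather than R, and it is not the Shiny, future, or promises API from the talk; the task duration, the user count, and all function names are assumptions chosen only to show the difference between serving requests one at a time and handing each to a background worker.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TASK_SECONDS = 0.05  # hypothetical duration of one slow request
USERS = 10           # "10 users running those different tasks at the same time"


def slow_request(user_id):
    """Stand-in for a long-running query or report (hypothetical)."""
    time.sleep(TASK_SECONDS)
    return f"result for user {user_id}"


def serve_one_at_a_time(user_ids):
    """Single worker: each user waits for everyone ahead of them in the queue."""
    start = time.perf_counter()
    results = [slow_request(u) for u in user_ids]
    return results, time.perf_counter() - start


def serve_concurrently(user_ids):
    """Hand each request to a background worker so none blocks the others."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(user_ids)) as pool:
        results = list(pool.map(slow_request, user_ids))
    return results, time.perf_counter() - start


users = list(range(1, USERS + 1))
blocking_results, blocking_time = serve_one_at_a_time(users)
async_results, async_time = serve_concurrently(users)
print(f"one at a time: {blocking_time:.2f}s, concurrent: {async_time:.2f}s")
```

Both runs return the same results; only the total wait differs, which is the user-facing framing that landed with the business side.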

Of course, once promises was out, maybe one or two months after the talk I heard, I implemented it right away, and I was still proud of that. I still tell my team members, hey, you know what, once I shared this, RStudio folks told us we were one of the first groups that implemented it. So I still feel proud. I just don't share that with my stakeholders, because I know fellow data scientists are the ones who can get it. So I would say, use analogy wherever you can, and of course, maybe run it by your colleagues first. And the other one is, hold back the urge to share a technical breakthrough with your stakeholders or upper management, whoever doesn't have the data science background that you do. Maybe share that across teams, something like that.

Business analyst vs. data scientist roles

But one was, what's the difference between a business analyst and a data scientist or data science manager in your eyes?

Oh, well, I can only offer my understanding there. Business analysts, in my company, are more focused on, let's just say, building that bridge between the business side and the analytics side, translating the business needs into the development criteria that developers and, let's just say, data scientists need to follow. So I think business analysts need to have more of a business mindset than an analytical mindset. It's a mixture, maybe 80-20, something like that. But if you're talking about, let's just say, data analysts, they probably need to balance that a little bit more, like 50-50. And for my role, specifically at my company, I'm trying to develop this team, and I have, let's just say, data analysts, data scientists, and other functions related to how my company utilizes data science.

Niall, I see you asked a question in the Zoom chat. Do you want to jump in and ask it?

Yeah, sure. So you talked earlier, Tiger, about bringing the ideas to the stakeholders, which leads me to assume that a lot of your projects are created within the data science team: you know, you're spitballing, brainstorming, coming up with ideas, and then sort of selling them, which is great when it works. I'm curious about your balance between those projects and projects that come from elsewhere within the organization saying, hey, we really want to do this and we think you guys have the skills to help, and how you balance those. And then also the success rate of each, I guess, is what I'm really interested in. Are they more likely to succeed if they come in with that outside buy-in at the start, or if they come in with your knowledge of data science?

Gotcha, gotcha. Oh, that's a great question. Well, this is something I'm still working on. I'm preparing a presentation on, what is that, evaluating some of those business-related data science ideas that come from them instead of from my team. It immediately makes me think about, let's just say, two things. The first one is how possible it is to achieve in a realistic timeline. A lot of ideas come from everywhere, but sometimes we may not really have the data to help us come up with, let's just say, a reliable recommendation. So whenever we receive this type of request, I always ask for, let's just say, one to two weeks. I wouldn't really call that a proof of concept. It's more of an initial investigation to see how feasible it is. Do we have all the data points? Do we think we can provide a viable solution to the business problem they are talking about?

And the next thing is, if there is great potential, I would try to see if we can put this on our roadmap. This involves prioritization: how important is this compared to all the other projects the team is currently focusing on? So to me, that's almost a two-step process. If everything works well, we validate the idea, we think we can offer some viable solutions, and we T-shirt size it. Then we send it to stakeholders to discuss and say, this one is going to take maybe half a year, but we currently have a full roadmap. Do you think we can do that maybe next year? If not, you guys can decide which one is more important.

But yeah, I'm totally with you on the success rate. I never thought about it that way, or I never really measured it, I should say. What turned out to be the case is that for some of the ideas the business folks mentioned maybe two years back, we now suddenly realize, oh, we have the data and the tools to make them happen. This is where we're going to pitch again to the stakeholders and upper management and say, now this is possible. Two years ago, maybe not.

What was that process for going back to projects that were proposed like two years ago?

Oh, yeah. Well, we don't really have a specific process. It's almost like keeping track of all the ideas that we wanted to do but didn't get a chance to, either because of resources, because we didn't know the technology to make them happen, or even because we didn't have the data points. From time to time, people will come up with similar ideas. If we revisit them at a later time, maybe once a year for all the ideas from the past two years, I think sometimes we can find some valuable information there or even, let's just say, an updated or enhanced idea, like a sub-idea out of those.

Oh, can you say that again, Frank?

Yeah, do you use software to track that, or does everyone just keep it in a document and you get together for the annual idea hash?

Oh, yeah. I wish we had a... Well, we are trying to find better software, but usually it's just a document. We also use the roadmap tools, which save all the ideas that we put on as blocks. So once in a while, we will look at all the ideas, maybe on a block from two years ago. So far, we've been able to catch them that way. But I guess if we were able to set a reminder that says, let's revisit this in two years, that would be great too.

Education and background in data science

Tiger, I looked at your background and you have an education in business school. Is that a path you'd recommend to people interested in data or more tech fields?

Oh, so for the education? Oh, yeah. Well, that wasn't really planned, to be honest. At that time, I think graduate schools were still trying to define the data science related majors, so mine just happened to be under the business school. Of course, we had some business school courses, which turned out to be very helpful. But I also heard a lot of great talks in the past on the Data Science Hangout. I don't think there is a specific requirement for people to have that type of background. I heard a lot of people talking about podcasts, and I watched Frank's episode, where he also mentioned a bunch of great recommendations. So with those, we don't really have to, let's just say, go to a specific business school to get that information. I think now it's all available online for free. So I think it's a mindset that we all need to acquire, and an important direction we need to go in, but you don't really need to go to that specific school, I would say.

Large language models and NLP exploration

Frank, I see you had asked a question in the chat. Do you want to jump in for that? I would love to. Tiger, I'm really curious. Are you or your team exploring using in any way, shape or form some of the newer large language models like GPT-3 that are coming out? Like you do NLP work. There's some awesome advances, which pairs really well with the idea of revisiting projects that weren't possible before. And if not, do those hit your radar? And are you hopeful that you will be able to use that in the future?

Oh, yeah, yeah. We are. But this is something that I checked with Mark. Because this is a Thursday event, right? And there's another Tuesday event where people share what's going on with their work, all that. So I was all prepared to talk about, oh, this is how we do NLP at this company. And then later on, I was told that these are things I'm not allowed to share. So at a high level, we are; the details I cannot share. But yeah, I'm super excited about those and about exploring those things. At a very high level, I'm very excited, and this is something that my team is exploring.

Right. You've got to be probably as excited as I am that it's September 1st and OpenAI is bringing down the cost of hitting their API.

Yeah. Right on.

Relationship with IT

There is an anonymous question now that was, what is your relationship with IT like at your organization?

Oh, yeah. Overall, I have a very good relationship with the IT department. I think this is one of the amazing things about working at CARFAX: in my experience, they're usually extremely helpful. I know that while you're doing a lot of exploration, sometimes you run things on your computer and you'll get a lot of questions, right? And sometimes it's about installing things. Maybe because I have a very good relationship with them, whenever I get called out, it's not that serious. But if you're at a much bigger organization where you do not get to know the whole team, then I think it would be nice to at least run some of the things you're going to do by them. Sometimes you even need stronger hardware. Maybe it's because we're a mid-sized company, but knowing them personally actually smooths my experience.

So if we are in an organization, a huge company where we don't know IT yet and we haven't built relationships with them, what do you think is the best way to go about that?

Oh, then I would say, let's build a relationship. Just connect with them and say, hey, here's something that I need to do, and a lot of those things will need admin rights. Then I tell them exactly what I'm going to do. Sometimes people are going to tell you, hey, nobody can have admin rights. Then you say, can we have one day, right? Or two hours, where you can even just look at my screen, or you can help me there, up to you. When you are asking for smaller things, people are less likely to say no. My original ask was, oh, I need admin rights to make this work, and the answer was no. Then I was like, can we have one day? And I think the response was much softer.

Training and R adoption

But on that note, do you do training for people who are just interested in getting started learning?

Yes. Yes, I do. Before I came to the U.S. for the, let's just say, data analytics or data science degree, I was a teacher in China. I really love the feeling of sharing and helping people succeed along the way, so I really enjoy teaching. After we went through the initial R training at our organization, I began to develop materials for folks who might not be able to capture all the learnings in, let's just say, a compressed two to three days. So we have some default training, and later on, when there are new things, we just do a sharing session. Right now, my team is also preparing some recurring sharing sessions around the company.

Yeah, and also, well, after participating in the RStudio conference, of course, we're going to do a show and tell: this is what you learned. If we can build a regular cadence, similar to the Data Science Hangout but within the company, I think it's a great growing atmosphere that we can build. Each of us only has the energy to focus on a small part, which is the R packages that we're interested in. But if we can gather that together, it's almost like four or five other people are all helping us gather that information. So our learning speed should be much faster.

That's great. And do some people join those groups or those sessions who are just starting to scratch the surface of data science or just starting to express interest?

Oh, yeah. That is always a hard part because people have different needs, right? The training is open to business people who are interested in analytics, new R users, data scientists, senior data analysts, all that. So at the beginning of the training, when I send out the invite, I will say, these are the expectations. Before this training, you're expected to complete X, Y, Z. And after the training, you will be expected to master X, Y, Z. So if you already know the answer to, let's say, these next two questions, no need to come to the training or the sharing session. We will have, let's just say, the beginner to intermediate session one time, the intermediate session one time, and then the advanced session, where maybe only a few people will show up. But having set up the expectation, and of course always recording it, will be helpful for people later on.

Quantifying hours saved through automation

Lisa, you just asked a great question in the chat. Do you want to jump in?

Sure. I'm just curious, and I'm sorry, I did jump in a little bit late, but I did go to your talk at RStudio conference, which was awesome. So I'm just wondering, what types of tools, like analytics tools, were available to you on your first day? And then, in your talk and so far today, you've talked maybe slightly more generically about how to get buy-in from whoever the people are you need buy-in from, but was there a specific project? I know you probably can't talk about details, but maybe the themes of the type of project it was? Was it more like machine learning type stuff? Was it more like, I don't know, Shiny, reproducible reporting? What sort of area helped you most convince people of the tools you needed?

Yeah, that's, yeah, yeah, it does. Okay, Lisa, it's a great question. I will first answer the later part of the question. What really worked out for my project is that it started with a Shiny app, because a Shiny app actually has a shorter development runway. And this is something I can still keep very shareable: let's just say we receive a lot of business requests, and to handle a business request you need an analyst to look into databases, to do a lot of web requests, to compile those, let's just say, in R, in Excel, all that, and then share the finding for a specific, let's just say, query.

But as we got more and more requests from them, we realized that it was taking a lot of the analysts' time, while the logic we followed to get that information was very similar. So we decided to build a Shiny app and have all of these things handled by Shiny. Before, the business folks would create, let's just say, a request that goes through our internal system and gets assigned to a business analyst or data analyst. It would wait in the queue, right, for maybe a day or so, maybe a couple of hours, and then the analyst would spend, let's just say, 30 minutes working on it. Now it's a click and run for the requester, and this whole workflow is gone, right, replaced by Shiny.

So the initial app, I think the initial serious app that I developed, does exactly that. This allows me to say that with those three Shiny apps, or the first series of Shiny apps, we were able to handle... I think yesterday I did the count again, and each week, or each month, we're still handling 560 requests that would have been ticket requests coming to this department. The modeling stuff is something where you need a longer runway to prove the value, but for developing those apps, it's much shorter. And the business need that you handle is something that would work for maybe the majority of corporations: we are just saving time. We're making business requests handled faster than before, and here's the value.
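The hours-saved framing can be made concrete with a back-of-the-envelope calculation. The sketch below uses only the two figures mentioned in the talk, 560 requests per period and roughly 30 minutes of analyst time per request. The speaker was unsure whether the period is a week or a month, so that stays open here, and the function name is hypothetical.

```python
REQUESTS_PER_PERIOD = 560  # from the talk; period is "each week, or each month"
MINUTES_PER_REQUEST = 30   # analyst handling time quoted as "let's just say, 30 minutes"


def hours_saved(requests, minutes_each):
    """Analyst hours no longer spent handling requests manually."""
    return requests * minutes_each / 60


saved = hours_saved(REQUESTS_PER_PERIOD, MINUTES_PER_REQUEST)
print(f"{saved:.0f} analyst-hours saved per period")
```

Under those assumptions the apps free up 280 analyst-hours per period. The estimate is conservative in one respect: it ignores the time requesters previously spent waiting in the ticket queue.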

And then back to your first question. Luckily, I joined this company, Carfax, as an intern, and then I moved to business analyst, then data scientist, and then to trying to develop a team. When I started, I think what was available to me was Excel and SAS. So maybe some of you saw the same thing at that time, right? Now we're all trying to move to different things.

So what were those steps then, from going from Excel and SAS to being able to use other tools? Oh, for me, I was just so excited about this new stuff, so I kind of forced myself to learn things very quickly. But we have a whole group, right? At that time, I was just one of the team members. And I saw that there was a lot of potential for people to do the same, but it's difficult to talk to your, let's just say, colleagues side by side and say, hey, there's a much better way to do it. I don't know, it's not really a good way to try to stimulate change, right? Especially when you're on the same team.

So little by little, what I found was: what if I can just try to automate and streamline this process, and then just show them? Oh, you know what, Rachel, you handed this process off to me two months ago, and it used to take, I think, half a day, let's say, each week. Now I just need to click and run. Then it's easy for you to see the difference. And it was like, oh, I know that you have a similar task there. If you like, we can work together to get that started. So gradually, that's how I was able to identify that we could make the streamlining into a project, and that is how I got the work automation project.

So instead of trying to do all the training to get people to adopt all the new things, you make that whole mission a project. Then people are like, yeah, this is a business initiative we're doing. We're not asked to learn new things. We're just asked to work on this business initiative. Throughout the process, people, let's just say, voluntarily fall into the trap, a learning trap that I set up for them, right? They get the excitement of getting things working by the click of a button, and then gradually they want to know what's going on behind that. And whenever there are error messages, they will come to me and they'll say, oh, yeah, whenever this one shows up, it means that one of the environments wasn't set up correctly. So we're not asking them to learn it from zero to one. We're just saying, I'm going to hand it off to you at two, you know, halfway there, right? And gradually, you are going to go back and then go forward and see the direct value there.

So it sounds very similar to how we get to the stakeholders: I'm just going to show it to you. I don't need to be correct to persuade you. I just want to be helpful. And I think making it a business initiative makes people forget about "I'm asked to learn new things" and all that, right? So we kind of get past that. So, yeah, in a way, I think this is almost like a hacky way of getting that done.

Measuring time savings

Another question from earlier was, how did you actually quantify the number of hours that you saved the organization with automation and data science? Yeah, that's a great one. I would say that at the very beginning, you want to have a clear definition of what type of time you're counting. When you're working on an analysis, pulling data, let's just say, from different databases, that will take time. If you're doing it step by step, it's also taking up your time to wait for those queries. Those are what I would call the processing time. Regardless of whether it's two days or one hour, I don't count those. I only count the actual time that people have to be focused to interact with things. And that's just my definition.

But once you identify that it's going to be the manual hours that people spend, then before you apply the streamlining or automation, you can ask the current report owner or process owner, maybe sometimes it's yourself, to estimate the number of hours spent on a daily, weekly, or monthly basis first. And then you time it again after you streamline it. So what I used to have is a table of things. These are the tasks that we applied automation to. This is the before time. For each month, if it's a daily task, you use that daily time times 20, or whatever number of days you need to work on that. And then this is the afterwards time. So it's a very simple table that you can calculate. Then each month you can say, hey, I applied automation to the five tasks listed. This is the total time saving. Before, we needed maybe 50 hours. Now we need two hours. In total, we saved 48.
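
The before-and-after table described here can be sketched in a few lines of Python. This is only an illustration, not the tracker used at CARFAX: the task names, the per-run minutes, and the `Task` and `summarize` names are all hypothetical, chosen so that the totals land on the 50-hour and 2-hour figures from the discussion. Following the definition given earlier, the minutes are hands-on time only, and time spent waiting on queries is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One automated task. Minutes are hands-on time only; query wait time is excluded."""

    name: str
    runs_per_month: int  # e.g. 20 for a daily task, 4 for weekly, 1 for monthly
    before_minutes: int  # estimated manual minutes per run before automation
    after_minutes: int   # timed manual minutes per run after automation

    @property
    def monthly_before(self) -> int:
        return self.runs_per_month * self.before_minutes

    @property
    def monthly_after(self) -> int:
        return self.runs_per_month * self.after_minutes


def summarize(tasks: list[Task]) -> dict[str, float]:
    """Total the monthly before/after minutes and report them in hours."""
    before = sum(t.monthly_before for t in tasks)
    after = sum(t.monthly_after for t in tasks)
    return {
        "before_hours": before / 60,
        "after_hours": after / 60,
        "saved_hours": (before - after) / 60,
    }


# Hypothetical tasks and timings, for illustration only.
TASKS = [
    Task("daily sales report", 20, 60, 3),
    Task("weekly data refresh", 4, 240, 6),
    Task("weekly partner extract", 4, 120, 6),
    Task("monthly summary deck", 1, 240, 6),
    Task("monthly data audit", 1, 120, 6),
]

if __name__ == "__main__":
    for task in TASKS:
        saved = (task.monthly_before - task.monthly_after) / 60
        print(f"{task.name:<24} saved {saved:5.1f} h/month")
    print(summarize(TASKS))
```

With these made-up inputs, `summarize(TASKS)` reports 50 hours before, 2 hours after, and 48 hours saved per month. Storing whole minutes rather than fractional hours keeps the arithmetic exact.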

Onboarding new team members

Libby, I love the question you just put into the chat. Would you want to jump in here? I was going to ask about onboarding. So onboarding in data science teams can be a little bit of a challenge for some people. And I've been through that as well as a data science employee. And I was wondering if you'd found any kind of best practices that make onboarding a more positive experience for everybody, including team members that are existing, right? That can be a burden for a new person coming in and taking other people's time to train them and get them up and going on stuff. Did you have anything to share?

Oh, yeah. I think that's a great question. It's still something that I'm trying to improve, so maybe it's not a good recommendation. But what we do now is this: the way I think about onboarding is that it's almost like you're offering a specific training to get people prepared for the work, the working culture, let's just say the institutional knowledge, all of that, and setting them up for success with those tools. In general, we would have, let's just say, a guideline, almost like a training guide. You have the agenda for, hey, this is what we want you to go through for the first, let's just say, one month. A lot of times we use, let's just say, videos. And then we will have in-person training online with different folks that touches on different subjects.

But specifically for data science or data analytics roles, what I have found helpful, or I think one of the keys to getting them up to speed, is an analytical project. Those types of things are usually deeply connected with business needs. So usually I would start with the high-level business goals and make sure the business goal clicks first. Let's just say this is for a specific project we're working on. Then I want to first go over how this report, how this process, will help us or help another team to get, let's just say, a more accurate result or a more effective approach to, let's just say, make a better product. And then I will go over the high-level process of the individual steps we will need to go through.

Usually the tasks that we first hand off are the ones that everybody can click and run. So we are trying to minimize their learning gap and not intimidate people with, oh, there are so many things I need to learn. No, we're going to give you that quick success feeling first. It's almost like, you know, getting people trained up with R, right? I want you to feel the power of click and run. This is great. I helped solve the business problem that Tiger just mentioned to me an hour ago. And then lastly, I would go over the difficult parts that we face in this process, the ones that are maybe tricky. For example, if this is a Shiny app we're supporting, then I would go over some of the less common things that we use or the more complicated setup that we have. For example, if this analyst is not familiar with, let's just say, future and promises, then it's a good idea for us to be prepared to share that, maybe as a last step of handing off that project to them.

So typically for onboarding, that's what I would follow: don't intimidate them. Just start with why we're doing things. Then we offer, let's just say, the quick success feeling, so everybody feels great about this process. Lastly, we go over the harder ones. So later on, let's just say in the next iteration, people will feel, oh, I know exactly why I'm doing this. I know how to do it. So that's the second step. And lastly, in case there are any difficult questions, I can reach out to Tiger, or I can, you know, figure it out based on the information Tiger or other team members shared. So I would say that's my high-level idea for onboarding.

Thank you, Tiger. That actually sounds a lot like getting executive buy-in or business partner buy-in: offer quick wins, build trust, and then move into things that are more difficult or more complex. I think that's fantastic. I've never heard it described that way. I appreciate it. Yeah, I guess I kind of feel like in the end, we're just working with people, right? Right. Yeah. Either way, it's people. Yeah, we're trying to make the whole experience smoother. And I guess it all comes down to us to bear all of these, let's say, ideas in our own minds, so the people working around us are, let's just say, receiving a good experience there.

Love that advice. I see, Brittany, you have your hand raised too. Do you want to jump in? Yeah, when I heard onboarding, my mind literally started, like, flying off in fireworks. At least at my company, over the years we have tried to refine our onboarding process constantly. And one thing that we found is making sure that your onboarding is really, really well documented, and then knowing who is responsible for what in terms of onboarding. So whenever we onboard a new team member, they typically have a mentor that will be their mentor for at least the first six months, probably the first year that they're with the company. And then they have their team leader, and there are specific tasks that we've actually assigned out to the mentors and the team leaders. And one of the things you brought up too is making sure that the new team member has a focus and has ownership, something that they can really, like, run with and provide value with. And we've actually used Microsoft Planner to, like, literally lay out almost an Asana-type board where there are tasks assigned to each person and, like, all of the nitty-gritty things, from HR onboarding all the way through reflection after the first 90 days. It's all kind of laid out there. So I think it's also good, even if it's as simple as just Post-it notes on a whiteboard, to really understand all of the different things that someone has to do when they onboard and then make that plain and available for them.

So you actually make sure there are no gaps or things that you miss when you're onboarding people, where maybe person A got, you know, XYZ when they onboarded, but they missed ABC, and things like that. So we just found that having something like that set up, and then just making copies for each new person we onboard, was really, really helpful. I'm curious, for both of you and whoever wants to jump in: Colin from RStudio and I were chatting about this with onboarding. How do you also ensure that you share the team culture or the company culture right from the start, when people are going through, like, so many checklists of these are the things that I have to go through?

Yeah, I think just now I heard great advice, and I really like that, I think everybody prefers a visual type of layout, and that's something I'm definitely going to check out. From my side, we share that task: we work together with the HR team to make work fun, and that's one of the things that we promote. The HR team will organize, let's just say, since most people are online now, online events, to try to have a different experience outside of the Zoom-meeting type of work. And I guess the other one is that company culture is something where I tried a lot of different ways to say, oh, this is how I experienced it, and I want you to feel maybe similarly. But then I realized that I have spent several years in this company. So later on I realized it's something where maybe we'll just let people experience it from time to time and give them more time. So we will share it at a high level, but in the end I will always say that it's something you will have to experience first-hand yourself.

Maintaining tools while continuing new development

Allen, I'm sorry, I realized I missed your question earlier. Do you want to jump in and ask that one? Yeah, and no worries there. This is a classic, like, I have to ask a question and then go to my next, like, far less interesting meeting at the top of the hour. But I'm curious, Tiger, if your team ends up in a situation where you build these really, like, interesting, developed, and exciting tools that provide lots of value for folks and then end up needing to sort of support and own it for the long term. And if you're faced with the dilemma of how do we, you know, provide that support while continuing to give, like, meaningful, interesting, new work to the folks on the team, for me that's a dilemma of figuring out, you know, how do we maintain stuff and also continue to develop. And so I wonder if you've got a good solution there.

Yeah, Allen, that's a great question. So I, well, luckily, I think, yeah, we still need to do that as well. So it's the same issue that we're facing here. So luckily, I guess the maintenance work is probably, of the