Greg Shick @ Charles Schwab | Data Science Hangout

video
Apr 17, 2024
1:00:05

Transcript

This transcript was generated automatically and may contain errors.

Hi everybody, welcome back to the Data Science Hangout. I'm Rachel. If we haven't met yet, I lead customer marketing at Posit. So I'm excited to have you joining us here today. If it is your first time, let us know in the chat. We'd love to say hello and welcome you in. The Hangout is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, and connect with others facing similar things as you. And we get together here every Thursday, unless it's a holiday, at the same time, same place.

So if you're watching this as a recording on YouTube in the future and want to join us live, there will be details below for adding it to your calendar. Just make sure it adds it for 12 p.m. Eastern Time. At the Hangout, we're all dedicated to keeping this a friendly and welcoming space for everyone, and we love to hear from you, no matter your years of experience, title, industry, or the languages you work in. You can use the Hangout however it suits you: if you just want to listen in, you can do that, or you can be part of the party happening in the Zoom chat and share resources and links with each other.

There are also three ways you can jump in here to ask questions or share your own perspective. You could raise your hand on Zoom, and I will call on you to jump in. You could also put questions in the Zoom chat with a little asterisk or star next to them if you want me to read them instead; maybe you're in a coffee shop or something. And third, we have a Slido link where you can ask questions anonymously.

A few notes I like to share while we have everybody here. I want to make sure everybody knows posit::conf is coming up in August in Seattle. We'd love to see you all there, in person or virtually. And this is something new, but I was curious whether anybody joining the Hangout is in London, because our team will actually be there next week for a Databricks event. If you're in the area, I'd love to make sure I send you the invite.

Well, I am so excited to be joined by my co-host today, Greg Schick, Director of Analytics at Charles Schwab. Greg will tell you he's a long-time listener, first-time caller, so I'm really excited to have you joining us. Greg, I'd love to have you introduce yourself, share a little bit about your role, and get us started.

Yeah, absolutely. Thank you, Rachel. Yeah, long-time listener, first-time caller; I've been joining the Hangouts for quite a while, and it's kind of surreal now to be the featured speaker, so I appreciate the opportunity. Before I go into my background, I'd love to highlight a couple of things I love about the Data Science Hangout. Number one is the chat. My whole goal for today is to really get the chat going and riled up.

Greg's background and career path

So a little bit about me. I graduated from the University of Denver with a degree in statistics and mathematics. While I was there, I was a pretty typical college student: I got to my senior year and thought, oh, I actually have to go get a job. So, talking to my advisor, I narrowed in on actuarial science, got an internship, and got a job straight out of college working as an actuary in consulting. My advisor had told me that actuaries were highly paid and low stress, and I still tease her to this day that I think she got those backwards.

It was a good intro career for me. I learned a lot about programming, presentations, and business. We were in the consulting space, not the insurance space, so I was working with customers and trying to understand their needs. I spent about three years doing that. After that, I spent a little over a year in HR analytics, which was really interesting because you were looking at data you could actually tie to specific people. Then I moved to DirecTV, where I did field operations analytics.

That was a really great career move for me. I was about five years out of school at that point and really hadn't found what I wanted to do yet. At DirecTV, I had a great boss who hired me and really helped me find my passion for data and analytics. I worked there for just about six years and was part of the AT&T merger, too: we were acquired by AT&T. Then I ended up coming to Schwab back in 2017.

Charles Schwab, for those who aren't familiar with us, is a full-service financial firm. We offer investing, banking, advice for those who are interested in it, and all kinds of financial products. I've been at Schwab for the last six and a half years or so, and one of the cool things about my time here is that I've probably had five or six different jobs. I actually started in the marketing analytics space, working on clients who were new to the firm and on referrals, and moved naturally into what we call journey analytics: how customers go from point A to point B.

After that, I moved into more of a role focusing on prospect conversion. This was on the call center team specifically, supporting the call centers and using natural language processing on large amounts of text to determine why customers were calling us, what their needs were, and how we could help them. Most recently, I've been in more of the product analytics and risk space, and I just picked up support for our treasury team, which manages cash and margin balances for us.

I have one more thing to add to that, but I already saw a question about why that boss was a great boss, and I want to tackle it because he was a great boss. He now works for Toyota. He hasn't been my boss in something like 10 years, and we still keep in contact and are still good friends. He really got me, and he had a great way of getting us excited about the impact we could make. At DirecTV, when we were supporting field operations, we would launch new products, and our team would be responsible for measuring their effectiveness: not true A/B testing, but comparing them to similar products and things like that.

And he did such a great job of connecting the work we were doing to the value that we were creating for clients and we were creating for the firm. And so it really made me excited to come to work every day. I think that was probably one of the first times in my career that I was excited to come to work. And after I would go home, I would still be thinking about problems at work.

And the one last thing I'll mention about my role at Schwab is that I also still work with the University of Denver, with my old department, which has since been rebranded as the Business Information and Analytics Department. I'm hoping we have some students from DU on with us today. We do regular capstone projects with them, and it's really cool to be able to connect real-world data to projects at the undergrad and graduate level. I also help run the data science at Schwab user group.

Oh, that's awesome. You're a big community person. One thing I forgot to ask you: what do you like to do in your free time?

Oh yeah, that's a great question. As Rachel knows, I'm based in Colorado, so I should be a skier or a snowboarder. I'm actually a fifth-generation native, and my great-great-grandfather came to Colorado specifically to compete in ski jumping championships. And I have never skied or snowboarded in my life. I'm a big fan of just being outside: hiking, anything, and paddleboarding is something I picked up recently. And building, too. I've been really into woodworking and 3D printing, mostly since the pandemic. My two sons and I got a 3D printer and built it ourselves, and now we're printing things from the printer we built.

Mentors and career development

Can you discuss any mentors or people you've come across who helped shape your career journey?

Yeah, absolutely. I've been really fortunate to have had a mentor in probably every role I've been in. One of my early mentors was when I worked for Mercer in actuarial consulting, fresh out of school. I was bright-eyed and knew a lot of the technical things I needed to know, but not the business side: relationship building, networking, and offering to help people even when I didn't necessarily get anything back, just volunteering and things like that.

He gave me really tangible suggestions for building my career. At that time, one of the basic ones was learning mail merge between Word and Excel. That sounds so outdated now, but the idea that, without really programming, you could create something that scales to one iteration or 10 or a thousand or a hundred thousand was really interesting. Another was building my technical coding skills.

And even more recently at Schwab, I've had some great mentors who've talked to me about personal brand: when people are in a room and I'm not there and they say things about me, what are they saying? Is it positive? Is it negative? Do they even know who I am? That's probably the first battle. So it's about getting that personal brand out there and being aware of what your reputation is at a firm.

Measuring and prioritizing data projects

Yeah, absolutely. I can probably speak to that best from Schwab, because in the past two or three years we've taken a lot of learnings from Gartner and from other folks in the industry who research these things. Measuring the value of data science, BI, or analytics projects, really anything, is tough. For a lot of them, we have an idea after the fact of what they might be worth, but it's hard to get that up front.

So about two or three years ago, we went through a pretty robust process at Schwab to understand what the expected value of a project could be. We got really down into the details and asked: Is this satisfying a regulatory need? Is it satisfying a business need? Is it an efficiency play? Are we just making life easier for people? And how can we quantify those things? Efficiency, I think, was generally one of the easier ones. If I can save somebody 15 minutes a day by automating a process or creating a dashboard they can go to, we can turn that into dollars pretty easily.

One thing we've had a lot of success with is providing ranges and estimates, too: if this is fully successful, it might be worth X, and if it's partially successful, it might be worth Y. That gives you a kind of threshold. And really, at the end of the day, I don't think the exact dollar values matter as much as how they help us prioritize. Once we have that set of expectations, we can think about which of these projects is our actual top priority and stack rank them a little more easily.
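
To make the efficiency arithmetic and the range-based ranking concrete, here is a minimal Python sketch. The hourly cost, the 250 workdays, the project names and dollar figures, and the rule of ranking on the conservative end of each range are all illustrative assumptions, not Schwab's actual numbers or method.

```python
from dataclasses import dataclass

# Illustrative assumptions, not Schwab figures.
HOURLY_COST = 60.0       # fully loaded cost of one hour of an employee's time
WORKDAYS_PER_YEAR = 250


@dataclass
class Project:
    name: str
    full_value: float     # estimated annual value if fully successful ("X")
    partial_value: float  # estimated annual value if partially successful ("Y")


def efficiency_value(minutes_saved_per_day: float, num_people: int,
                     hourly_cost: float = HOURLY_COST,
                     workdays: int = WORKDAYS_PER_YEAR) -> float:
    """Turn daily minutes saved into an annual dollar estimate."""
    return minutes_saved_per_day * hourly_cost / 60 * workdays * num_people


def stack_rank(projects: list[Project]) -> list[Project]:
    """Rank by the conservative (partial-success) estimate, then the full one."""
    return sorted(projects, key=lambda p: (p.partial_value, p.full_value),
                  reverse=True)


# Hypothetical backlog: the ops dashboard saves 20 people 15 minutes a day if
# fully adopted, or 5 minutes a day if only partially adopted.
projects = [
    Project("ops dashboard", efficiency_value(15, 20), efficiency_value(5, 20)),
    Project("data quality fix", 50_000, 40_000),
    Project("new model", 200_000, 10_000),
]

for p in stack_rank(projects):
    print(f"{p.name}: ${p.partial_value:,.0f} to ${p.full_value:,.0f}")
```

Ranking on the partial-success number favors safer bets; ranking on the midpoint of each range would put the new model first instead. Which rule to use is a judgment call, which fits the point that the ranges matter more for prioritizing than as exact dollar figures.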

Working across different analytical domains

Yeah, absolutely. I think my move to Schwab was probably the only time in my career that I made a purposeful move for a domain. Up to that point, six years ago, I had seen lots of different areas and ways to use analytics and data science, and the two that stuck out to me the most were definitely marketing analytics and biostats. Before that, I don't know that I was as purposeful.

So the approach I started taking at Schwab is to get the widest breadth of experience I can. I think as data practitioners, if we have the skill sets, we can apply them to any data problem. I use the analogy of building a house: if you're hiring somebody to build your house, you don't care which tools they're going to use. You just care whether they're going to do a good job and meet your expectations. So the more experience you get building different types of things and approaching unusual situations, the better, especially the ones where, when you start out, you have no idea what you're doing. Those are probably the most fun ones.

Explaining ML and AI, and getting buy-in

That's a great question. If I can caveat the question as explaining ML and AI realistically, outside of the hype: we've seen a lot of those terms. Frequently, when I hear about some of these projects, I don't think of them as ML or AI; I think of them as automation, or maybe regex. This can be solved with that; this is a report; things like that.

And your question about buy-in is a really good one, too, because we try to balance a reactive and a proactive approach. On the reactive side, we're listening to each business and reacting to what they tell us are their top priorities and needs. On the proactive side, we're partnered closely enough with each organization at Schwab that we can go to them and say, hey, you haven't asked us for this, but we were thinking about your business and we think this could be important.

With those, demos can really help. Having something tangible matters, because talking about these terms before you've built anything, or before you have anything concrete to show, is really difficult for folks who aren't practitioners and who aren't in data every day.

For each team we support (right now I'm supporting three separate organizations within Schwab), we map out a data and analytics roadmap. We try to keep it somewhere between six and 18 months out, laying out the things that are most important to them. Some of those projects are very logistical and tactical: we don't have this data, and we need it. That's the foundational layer of data quality fixes or data enhancements. Then we also talk about what's in their business plan: what are their objectives for the year, and how does data affect them? And finally, there's that last layer I talked about: what are we not doing that we should be doing?

Human qualities in hiring

Yeah, that's a great question. Probably the biggest thing is just kindness: is this somebody I'd want to work with? I think that's a very underrated skill set. When I think about my first mentor at Mercer, coming out of school I had zero focus on being a good person to work with. I really had a statistician-type mindset: I'm just going to crunch numbers, provide the results, and sit in some far-off corner doing that.

So now I really try to probe and ask questions, like the one Rachel asked: What do you do in your free time? What types of projects do you like working on? What are the things you hate doing? I'm trying to understand more about that person and whether they'd be somebody I would enjoy coming to work with every day.

Connecting data work to business value

Yeah, absolutely. One of my former managers posted this on LinkedIn a few weeks back, and it really connected with me: the single most important thing a data scientist can do is understand how their business, or their company, makes money. So start with that. Schwab is a financial services firm, and big picture, we make money a few different ways. We earn interest revenue on assets that we hold, and we can earn money when you make certain types of trades, although most of our trades are now $0; they're free.

If we just focus on the advice side: how many clients are we giving advice to? What does that advice turn into? A lot of those are in-person conversations between our clients and their financial consultants, so it's hard to quantify the value of a conversation or the value of having a physical presence. But those are the types of problems we try to tackle: understanding how our advice turns into value for an individual client. Are you better off long term with the advice we give you than you would be otherwise?

NLP and call center analytics

Yeah, absolutely. That project was a couple of years ago now, so I'm trying to remember the exact methods, but it was definitely an NLP project. It's probably also a good illustration of why anybody in data should understand the full pipeline of their data: how it gets created, how it gets manipulated before it's stored, how it's stored, and the output it turns into. We did use LDA and a couple of other methods, but I think one of the biggest ones we ended up using was actually TF-IDF, comparing subsets of calls to broad topics of calls.
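
For readers who haven't used TF-IDF this way, here is a minimal pure-Python sketch of one way to compare a subset of calls against the whole call population: pool the subset's words, weight each word's frequency in the subset by its inverse document frequency across all calls, and keep the highest-scoring terms. The example transcripts and the whitespace tokenizer are made up for illustration; this is not Schwab's pipeline, and in practice a library such as scikit-learn's `TfidfVectorizer` would usually handle the weighting.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def distinctive_terms(subset: list[str], all_calls: list[str], k: int = 3) -> list[str]:
    """Top-k terms by TF (within the pooled subset) x IDF (across all calls).

    Assumes every call in `subset` also appears in `all_calls`, so every
    subset term has a nonzero document frequency.
    """
    doc_freq = Counter()
    for call in all_calls:
        doc_freq.update(set(tokenize(call)))

    term_freq = Counter()
    for call in subset:
        term_freq.update(tokenize(call))
    total = sum(term_freq.values())

    n_calls = len(all_calls)
    scores = {term: (count / total) * math.log(n_calls / doc_freq[term])
              for term, count in term_freq.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


all_calls = [
    "reset my password please",
    "password reset not working",
    "transfer money to my bank",
    "wire transfer status",
    "question about my transfer fee",
]
password_calls = all_calls[:2]
print(distinctive_terms(password_calls, all_calls))
```

The word "my" appears in the password subset too, but because it is common across all calls its IDF is low, which keeps it out of the top terms.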

That one was really interesting because it opened my eyes to how often people call us and to some topics of conversation I was surprised to learn about. The reason I brought up the pipeline is that after we had already started analyzing this specific team's subset of calls, we found out that not all calls were being transcribed by our platform. Only a subset of agents had transcription. It turned out to be a licensing thing. It wasn't anything nefarious; there was just a licensing challenge behind it.

I think one of the biggest things we still try to understand is sentiment. There was a really interesting presentation in our data science at Schwab group, and I know Amazon has done something similar, where they look at the volume of the voice, not just the topic of the call. If you're talking louder, you might be more upset than if you're talking quieter. So there are all kinds of interesting things you can do there.
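
The loudness idea can be sketched very simply: split a call's audio into fixed-size windows, compute each window's root-mean-square (RMS) amplitude, and flag windows that are much louder than the call's own median. This is a hedged illustration only. It assumes the audio has already been decoded to float samples in [-1, 1], the window size and 2x threshold are arbitrary choices, it is not the method from the Schwab presentation or Amazon's, and loudness by itself is only a weak proxy for sentiment.

```python
import math
from statistics import median


def window_rms(samples: list[float], window: int) -> list[float]:
    """RMS amplitude of consecutive, non-overlapping windows of samples."""
    levels = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels


def loud_windows(samples: list[float], window: int, ratio: float = 2.0) -> list[int]:
    """Indices of windows louder than `ratio` times the call's median RMS."""
    levels = window_rms(samples, window)
    baseline = median(levels)
    return [i for i, level in enumerate(levels) if level > ratio * baseline]


# Toy signal: quiet speech, a loud stretch in the third window, then quiet again.
samples = [0.1, -0.1] * 4 + [0.5, -0.5] * 2 + [0.1, -0.1] * 2
print(loud_windows(samples, window=4))
```

Comparing each window to the same call's median, rather than to a fixed level, is a design choice meant to keep quiet and loud phone lines from being judged on the same scale.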

Probably one of my favorite stories: as we were building some of those models, we were obviously very in the weeds with the data, and I kept seeing this phrase in the transcribed text: "this is a man from Dallas." I thought, that's really weird; I don't understand why this keeps coming up. Then we went and listened to the actual recording of the call, and it was saying, "This is Amanda in Dallas," so that's who you were talking to. Obviously, that individual record wasn't going to affect our sentiment analysis very much. But getting into the weeds and really doing descriptive analysis on your data before you jump into any modeling is super important.
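
One cheap form of that descriptive analysis is simply counting the most frequent phrases across transcripts, which is how an oddity like "a man from Dallas" tends to surface. A minimal Python sketch, with made-up transcripts and a trigram length chosen only for illustration:

```python
from collections import Counter


def top_ngrams(transcripts: list[str], n: int = 3, k: int = 5) -> list[tuple[str, int]]:
    """Most common n-word phrases across all transcripts."""
    counts = Counter()
    for text in transcripts:
        tokens = text.lower().split()
        # Zipping n staggered copies of the token list yields the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), count) for gram, count in counts.most_common(k)]


transcripts = [
    "hi this is a man from dallas how can i help",
    "this is a man from dallas what can i do for you",
    "thanks for calling how can i help",
]
for phrase, count in top_ngrams(transcripts, k=6):
    print(count, phrase)
```

A phrase that repeats across many otherwise unrelated calls is worth checking against the original audio before it feeds into any model.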

Customer journey analytics

Yeah, absolutely. This is work I started three years ago when I was on the data science team, and that team has since taken it to entirely new frontiers and expanded it past my wildest imagination. I'd say one of the hardest things at the start is defining what a journey is. What is a client journey? We have to differentiate between somebody who goes on the website and is maybe just poking around, or logged in because they were bored and are just reading articles, and somebody who is really trying to do something purposeful and definitive.

Once we decide you're on a purposeful journey, say, to use the example from the beginning, moving money from one place to another, what is the start of that journey? Is it when you initiated the transfer? Is it when you logged in? And how is it completed? So we try to measure every client interaction we can, whether that's on the web, on a call, or during an in-person branch visit.

Then we try to programmatically or systematically assign each journey to a category: this is what the client was trying to do, and this is what they did along the way. That's hard, because we've also seen that clients can change their minds. They log in to make a transfer, then they see an interesting article and go read it, then they think, oh, maybe I should open an IRA, so they start exploring that page, and then they go back to, oh yeah, I was here to make a transfer. I know I do this all the time, maybe not on Schwab.com, but probably on Amazon.

To your question about measurement, a lot of it relates to the things we have tangible data on: how long you spent on a specific page, or whether you were calling at the same time you were on the website. Where we can measure that, we want to understand it simply so we can make that journey easier. We want to be easy to do business with. If you're an e-commerce company and you want people to buy stuff from you, you should have that button very visible all the time. You shouldn't make it hard for people to figure out how to buy your products.

Tech stack and tools at Schwab

In terms of tech stack and capabilities, I'd say, for better or for worse, we have a little bit of everything. When I started at Schwab six years ago, I was primarily a SAS programmer, and pretty quickly after I joined, I heard that SAS was going away and I'd better learn R or Python. I took that very seriously and really focused my personal development on it. Six and a half years later, we still have SAS, and we still use it for a lot of different things. Between R and Python, I would say Python might have a bit of an edge on the data science team, purely from a model production standpoint, but we still have quite a few R developers.

On one of the new teams I support, we even have a lot of R-to-Python conversion back and forth. That's been interesting; the reticulate package has been super helpful for avoiding a full conversion and just coding in both languages. Other than that, we have Tableau, Alteryx, pretty much everything you can think of. So a lot of the challenge comes from determining our platform strategy: where is the right use case for each tool? I think we have a consensus that if you're building a dashboard, it should probably be in Tableau. But for other things, like building a model or doing an analysis, I don't think there's a consensus. It's really whatever gets the right answer.

And to be quite honest, I've been trying to figure that out myself. I know that one of our teams has Workbench for model development, and another team that does model development has both Workbench and Connect. From my understanding today, it's pretty limited to that team, so I'm trying to bribe the leader of that team with lots of coffee and chocolate to let us expand it.

The data science at Schwab user group

Yeah, absolutely. The growth of our data science at Schwab user group has been really interesting. It originally started as an R user group: a few of us were new to Schwab at the same time, we all wanted to learn R, and we were each doing it on our own. It started with a group of probably four or five of us, literally presenting back to each other every month. It was the same four people over and over and over. Now our group has about 600 members across Schwab, and our monthly attendance is somewhere around a third of that, about 200.

We've expanded to all types of programming languages and skill sets. We love having people come in who have just written their very first "hello world" script, and we also have people doing very advanced neural networks, topic modeling, and all kinds of other things.

Yeah, we've gone through a couple of different iterations over the years. Back when we started, it was all on Webex, with just an Outlook invite. Now we have better tools like Microsoft Teams and Confluence from Atlassian, so we can post invites, do trivia each month, those kinds of things. The Teams space is our main channel, and it has also become our de facto support system for Python and R installation, even basic things like downloading packages, because we're highly regulated and have some specific situations we have to work around.

And for me personally, moving away from that small group of the four or five of us answering all the questions to the community answering their own questions has been probably the most rewarding part of it, right? There's people that I don't even know that are answering questions and it's awesome.

Oh, I think this one's too easy for this group, but I'll throw it out there: every R version release is nicknamed after a famous comic strip. What is the comic strip? And somebody said the answer to every R trivia question is Hadley Wickham, which is awesome.

Feature engineering and customer journey complexity

Yeah, that's a fantastic question. If I had to pick a single favorite type of chart, it would probably be a Sankey chart. When we first started building user journeys, we had a bunch that looked like an infinite loop: a Sankey chart that just looped, because the data would show customers going A to B to C, back to A, and then repeating that. And I thought, well, that's not very useful. That's not really a journey.
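
One common way to keep a journey Sankey from collapsing into a loop, sketched here in Python as an assumption rather than what the Schwab team actually did, is to label each node with its step number, so revisiting page A at step 3 becomes a different node from A at step 1 and the graph stays acyclic. The page names and the four-step cap are illustrative.

```python
from collections import Counter


def sankey_links(journeys: list[list[str]], max_steps: int = 4) -> Counter:
    """Count transitions between step-indexed nodes such as '1:A' -> '2:B'.

    Prefixing each page with its position in the journey means a customer who
    goes A -> B -> C -> A produces 1:A -> 2:B -> 3:C -> 4:A, not a cycle.
    """
    links = Counter()
    for path in journeys:
        steps = path[:max_steps]
        for i in range(len(steps) - 1):
            links[(f"{i + 1}:{steps[i]}", f"{i + 2}:{steps[i + 1]}")] += 1
    return links


journeys = [["A", "B", "C", "A"], ["A", "B", "A"], ["A", "C"]]
for (source, target), count in sorted(sankey_links(journeys).items()):
    print(f"{source} -> {target}: {count}")
```

The resulting (source, target, count) triples are the general shape Sankey plotting tools take once node labels are mapped to indices.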

So in terms of feature transformations, one thing we honed in on was creating a new kind of identifier for the journey itself. We had started from the customer perspective: this is what customer A did, and this is the order they did it in. Instead, we shifted our thinking to journeys: how many customers did X, Y, and Z, even if not in that order?
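
Here is a minimal Python sketch of that shift from customer sequences to journey identifiers. It assumes, purely for illustration, that events arrive as (customer, minute, touchpoint) tuples and that a gap of more than 30 minutes starts a new journey; the journey signature is a frozenset, so it ignores both order and repeats. None of these choices come from Schwab's actual implementation.

```python
from collections import Counter, defaultdict

# Hypothetical rule: more than this many minutes of inactivity starts a new journey.
GAP_MINUTES = 30


def split_journeys(events, gap=GAP_MINUTES):
    """Group (customer, minute, touchpoint) events into per-customer journeys.

    Returns {(customer, journey_number): [touchpoints in time order]}.
    """
    by_customer = defaultdict(list)
    for customer, minute, touchpoint in events:
        by_customer[customer].append((minute, touchpoint))

    journeys = {}
    for customer, rows in by_customer.items():
        rows.sort()
        journey_no, last_minute = 0, None
        for minute, touchpoint in rows:
            if last_minute is not None and minute - last_minute > gap:
                journey_no += 1
            journeys.setdefault((customer, journey_no), []).append(touchpoint)
            last_minute = minute
    return journeys


def customers_per_journey(journeys):
    """Count distinct customers per order-independent journey signature."""
    customers = defaultdict(set)
    for (customer, _), touchpoints in journeys.items():
        customers[frozenset(touchpoints)].add(customer)
    return Counter({signature: len(ids) for signature, ids in customers.items()})


events = [
    ("c1", 0, "login"), ("c1", 2, "transfer"), ("c1", 5, "call"),
    ("c2", 0, "call"), ("c2", 3, "login"), ("c2", 4, "transfer"),
    ("c2", 120, "login"), ("c2", 121, "article"),
]
journeys = split_journeys(events)
for signature, n in customers_per_journey(journeys).most_common():
    print(sorted(signature), n)
```

Customers c1 and c2 took the same steps in different orders, so they share one journey signature, while c2's later login-and-article visit counts as a separate journey.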

So we tried to transform the data that way, and we also tried not to limit our thinking about which touch points we should measure. It may not all be calls or web data; there might be other data systems outside of those that would help us enhance the feature set. A lot of it was almost detective work: finding what data was available and how we could use it to enhance our understanding of what a client was trying to do.

And the comment about signal is really key there. A lot of journeys don't have the signal needed to make any kind of meaningful determination, so then we have to decide how to handle that case. Is it worth trying to measure? Is it physically impossible, or are we just overthinking it?

As for the question about time, we also have an approach where we really try to time box things. A common time box might be 90 days, or a full quarter. I think time boxing is really important because you can waste a lot of time spinning your wheels trying to find an answer that isn't possible to get to, or that isn't possible to derive value from. Maybe you can get the answer, but you can't actually get value from it.

We're about out of time here, but thank you all so much for the great questions, and a huge thank you to you, Greg, for joining us today. I did just want to let people know that we've been experimenting with spinoff Hangout topics for things that come up quite a few times. Data stewardship was something we talked about with Jamie Warner and Dan at Biogen, so we have an event coming up on May 15th. They'll share a bit about their experience with data stewardship at the individual level and then open it up to a group discussion about questions like: Is managing data part of your job as a data scientist? What challenges are you facing? So I just wanted to put a call out for that. Thank you all so much for taking the time to spend the day with us. I hope you all have a great rest of the day.