Resources

Peter Spangler, Marketing Analytics Leader | Data Science Hangout

video
Jul 8, 2024
58:50


Transcript

This transcript was generated automatically and may contain errors.

Hi, everybody. Welcome to the Data Science Hangout. Nice to see you all again this week. Thank you to Randy for covering for me last week when I was out. I'm Rachel Dempsey. I lead Customer Marketing at Posit. Posit's the open source data science company building tools for the individual, team, and enterprise. Thank you so much for spending time with us today. If you're new to the Hangout, this is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, and connect with others facing similar things as you.

And so we get together here every Thursday at the same time, same place. So if you happen to be watching this as a recording in the future, and you want to join us live, there'll be details to add it to your calendar below. And I know that people really enjoy connecting with other attendees here. I see you all saying hello in the chat already. But if you are interested in connecting with others, I want to encourage you to say hello in the chat and introduce yourself. Maybe include your role or where you're based, something you do for fun, or whether you're going to the Posit conference in August too.

But we're all dedicated to keeping this the friendly and welcoming space that you all have made it. We love hearing from you, no matter your years of experience, titles, industry, or languages that you work in. If you are hiring, feel free to share your roles in the chat as well. It's great to see all the open roles there to be able to share them with everyone here. It's also 100% okay if you just want to listen in for the day, or maybe you prefer to hang out mostly in the Zoom chat. But if you do want to jump in and ask questions or provide your own perspective, you can raise your hand on Zoom, and I will call on you.

And with all that, thank you again for joining us today. I'm so excited to be joined by my co-host Peter Spangler, who has led marketing analytics at Lyft, Nextdoor, and Toast. And Peter, I'd love to have you kick us off with introducing yourself and sharing a little bit about your background.

Yeah, absolutely. My background primarily has been working in technology companies. I got started working at an Alibaba startup, which was primarily focused on bringing their marketplace to the United States. And so I got my first taste of marketing analytics working in that environment. I moved on through different SaaS companies like Citrix and Toast. And then I built a marketing science team really from the ground up at Nextdoor, which is a very localized social media platform, where we were really focused on growth and customer engagement, as well as marketing to SMBs.

And so a lot of my time was really trying to understand, is our marketing effective? Is it efficient? And what are the kinds of metrics and methodologies that we can use to tell that story back to the business and do it in a way that's compelling? And so a lot of what I work on now are projects that are really mapping business solutions to a scientific approach.

Getting into data and discovering R

So I had a really, I would say, probably ordinary experience in college. I was studying statistics and political science. And at that time, political science had become primarily a quantitative field focused on understanding the relationships between voting behavior and election results. And a lot of that mapped to the types of inferential statistics and causal models that we have today. I think the best example is when President Obama was running; that was really the first time that you saw a lot more investment in digital marketing, essentially, as a way of influencing voting behavior.

And so when I started, I was using the tools that universities use, which are essentially paid statistical tools like SPSS. Once I left that academic environment and was going into my career, I realized I'm not going to have access to SPSS. IBM had purchased that product by that time, and it was on the order of $4,000 or $5,000 for a license. And so I discovered R. And R and RStudio just gave me essentially a key to open up my career, a key to a bigger world, and a way of helping to accomplish the goals I was trying to accomplish in my own career while having a community to support me.


And so that's also led to a colleague and me starting up a data consulting business of our own called Paradigm Data Group. A lot of what we do is spend time supporting communities that don't get a lot of attention. So, for example, in my own local community, we built a Shiny app for our university, and that Shiny app delivers essentially inferential data about how we are delivering food to the needy in our community: where we can do better, how many people we're serving, and our progress over time. And last year, we helped train the L.A. County Public Health Department in RStudio, the languages we use, and the packages.

Building a marketing analytics team from scratch

I would recommend, I think, if I were to go into any technology or non-technology company and there was not a team, or there was a beginning team, the first thing I would do is have a meeting with every executive or the most senior leaders I could at that organization and ask them what questions they're trying to answer. I think that one of the ways you can be derailed or prevented from being successful in any data role, especially in a marketing analytics role, is to not capture those core questions. So the first thing I would do is go in and ask, you know, what are the core questions you're trying to answer?

I would then understand what infrastructure we have available. Are we capturing data that's able to measure the things we need, the metrics we need, in order to answer those questions? If not, what are the lightest weight tools that we can spin up? And then I would start answering the lowest hanging fruit first. In my experience, building trust is the quickest way to be successful. And that really means doing good work, but it also means communicating your work and being someone who is engaging, someone who's available. And so I would really start out with those first three steps. What are the questions we're trying to answer? How are we going to be able to answer them given the technology and infrastructure we have in our data? And then what is the lightest question I can answer first that gets me started building trust?

Training teams on R and connecting to databases

Yeah, absolutely. So that is part of what I do when I'm not at work, through Paradigm Data Group; like I said, one example is training folks at the LA County Public Health Department. And that example is a good one because it's a timeline, right? The LA County Public Health Department is the largest of its kind in America, and probably in the world. And as COVID started ramping up during the pandemic in 2020, there started to be a greater and greater need for reproducibility: analysis that could be put into analytical frameworks that were built in code, where the results could be shared across domains and across other tools.

I always start with the tidyverse because it's like how we construct sentences, right? It's verbs, subjects, and objects. And that can get you off to a very fast start. And I think it also has the benefit of building trust, right? You're delivering something to a stakeholder that answers a question. And it gets you started in being able to work one-on-one with someone who maybe has more experience and is more comfortable in the language.

I think the best way to handle... So I've been writing SQL for maybe 15 years. And that's going to be an essential skill in any technology company, right? Because every database is a cloud database these days. So you're accessing it through something like Snowflake or Databricks. And in that case, you're going to be able to connect through either the IDE on your desktop or through the platform tools that Posit offers. So Posit offers tools where, if you would like to, you could simply write SQL inside of a browser window, right? That browser window is essentially connected to an instance in the cloud. And then you're pulling down data through the connection that you build inside of the IDE.
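The workflow Peter describes, writing SQL against a connection and pulling the results down locally, can be sketched in miniature. This is a Python illustration only (not the R or Posit tooling he uses), with an in-memory SQLite database standing in for a cloud warehouse like Snowflake or Databricks; the table and numbers are hypothetical.

```python
import sqlite3

# Stand-in for a cloud warehouse connection; an in-memory SQLite database
# keeps the sketch self-contained. In practice the connection string would
# point at a Snowflake or Databricks instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_clicks (channel TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO ad_clicks VALUES (?, ?)",
    [("search", 120), ("social", 85), ("search", 40)],
)

# Write SQL against the connection and pull the results down locally,
# just as you would from an IDE connected to an instance in the cloud.
rows = conn.execute(
    "SELECT channel, SUM(clicks) FROM ad_clicks GROUP BY channel ORDER BY channel"
).fetchall()
print(rows)  # [('search', 160), ('social', 85)]
```

The same pattern holds whether the SQL runs in a desktop IDE or a browser window: the query executes remotely, and only the result set comes back over the connection.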

I would suggest that if it's a security question, that you really look at the resources available through Posit. This is something that is very, very common, but I can tell you it's easily overcome. When I worked at Lyft, for example, we were advocating to use Posit's platform tools and it was not a challenge.

Prioritizing work and recommending strategy changes

Let's start with how you manage the day to day. The way that I work primarily is by prioritizing our workflows, meaning which projects we're going to execute on and resource, based on size and sensitivity. So for example, I have a very simple framework where I measure the size of a project: what is its contribution to the organization, and how are we defining that contribution? And it is a sort of arithmetic way of doing it. It's quantitative, but it also needs to ladder up to one of the strategic goals of the organization.

I would first focus on prioritizing based upon the influence, the effect, the impact of the project itself. And make sure that you're working with your stakeholders. If you're reporting into a director or a manager, make sure that you, you know, spend time and take the opportunity with that manager to sit down and say, hey, I'd like to be able to more effectively manage my work. Maybe I feel overloaded. I don't have time to do the quality of work that I would like to. And here's a framework I'm thinking about using with my partners to size projects.

When it comes to changing strategy, that typically comes from an analysis, meaning some data product, some data science product you've created. Oftentimes in marketing, what we really care about is are we getting close to our goal or plan for the organization? Meaning like if we say we're going to deliver a thousand customers this month, are we delivering, are we pacing to that? If you're not pacing to that, which will occur from time to time, you need to understand why.

And so you might create an analysis that surfaces that maybe a channel is underperforming. Maybe something has changed in your acquisition funnel. And you need to understand how this change is contributing, say, negatively to your goal. If that's the case, then you need to first identify the problem, define it, express how it relates to one of the strategic goals of the organization, and communicate that to your team. Marketing has a bit of a challenge compared to most product analytics roles, in that the marketing funnels themselves can change on a regular basis depending upon tactics that change.

Causal frameworks and ROI in marketing

So one partner for all marketing teams, no matter where you are, is going to be the finance team, because that's who holds the budget for marketing. And the job of finance, like the job of any good finance team, is to be skeptical. You know, if a marketer says, I need a thousand dollars to acquire a customer, they're going to say, why do you need a thousand dollars? And you should have observational data, right? You should take an empirical approach to this. What is the empirical evidence that suggests you need a thousand dollars to acquire a customer?

I think that depending upon where you are with the finance team, and what your organization is like, they will be more or less skeptical. I think one of the ways marketers often work is they create exactly that example I gave. They create a CPA, a cost per acquisition number. It's based upon a set of assumptions, and that cost per acquisition number is largely arbitrary. It's something that they usually think they can achieve, but it's not based upon efficiency, meaning how low could it possibly be?

And that's when you start having to have more causal frameworks, meaning incrementality studies. And those come in a variety of flavors, whether it's a randomized controlled experiment, whether it's a geomapping, geomatching experiment. But at the end of all those experiments, there should be some metric that you can validate internally on your own data.
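At the end of an incrementality study, the validated metric is often as simple as the difference between treatment and control. A minimal Python sketch of that readout, with entirely hypothetical numbers, assuming a randomized holdout design:

```python
# Toy incrementality readout from a randomized experiment: the incremental
# effect is the treatment conversion rate minus the control (holdout) rate.
def incremental_lift(treat_conv, treat_n, ctrl_conv, ctrl_n):
    treat_rate = treat_conv / treat_n
    ctrl_rate = ctrl_conv / ctrl_n
    return treat_rate - ctrl_rate

# Hypothetical numbers: 300 conversions among 10,000 exposed users,
# 200 among 10,000 held-out users.
lift = incremental_lift(300, 10_000, 200, 10_000)
print(f"incremental lift: {lift:.1%}")  # incremental lift: 1.0%

# Conversions attributable to the spend, per 10,000 users: the revenue
# we would not have gotten had we not spent the money.
print(round(lift * 10_000))  # 100
```

Geo-matching designs get to the same kind of number by comparing treated markets against matched control markets rather than individual users.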

I think that one of the challenges for all marketing teams is being able to make those causal statements, meaning that if we spend another hundred thousand dollars, a million dollars, ten million dollars, we will know that this marginal difference is incremental, meaning we would not get this revenue if we didn't spend this money. Those are the most challenging questions to answer, and I think that is where oftentimes marketing teams and finance teams can get disconnected. There's a lot of value in marketers, or marketing analytics folks, spending time communicating these concepts. Evangelize for measurement, but spend a lot of time communicating the concepts of causality and how you're going to define success.


Recommender systems and experiment design

Yes, absolutely. So I'll give you an example. At Nextdoor, you know, one of the things we were always trying to accomplish was increasing engagement on the platform, meaning getting people to log in, to connect with each other, to communicate with each other. And in our case, one of the objectives we had was creating more connections between users. And in that case, we developed our own algorithm, which was essentially a distance measure; think of something like a cosine distance. We developed an algorithm based upon a distance measurement, and then features about the users themselves, in order to recommend who they should connect to.

And that, so that plays a very critical role in also how we deliver ads, right? We want to make sure that the ads themselves are relevant to the person receiving them. And we know certain things about users, we have features about them, and we can measure similarity to the success of an ad with somebody like that. So, for example, if an ad is successful with me, and there's someone who has a very similar profile to me based upon some metric that we can create from an algorithm, like a distance metric, we might serve that other person the same ad.
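The idea of "a very similar profile based upon some metric" can be made concrete with cosine similarity over user feature vectors. This Python sketch is an illustration of the general technique, not Nextdoor's actual algorithm; the feature vectors are made up.

```python
import math

# Cosine similarity between user feature vectors: 1.0 means the vectors
# point the same way, 0.0 means they have nothing in common on these
# features. (Cosine distance is just 1 minus this value.)
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical feature vectors (e.g., engagement, tenure, interest scores).
me = [0.9, 0.2, 0.5]
similar_user = [0.8, 0.3, 0.4]
different_user = [0.1, 0.9, 0.0]

# If an ad worked on me, serve it to whoever is nearest to me.
print(cosine_similarity(me, similar_user) > cosine_similarity(me, different_user))  # True
```

In an ad-serving setting, the same comparison ranks candidate users by proximity to users for whom an ad already succeeded.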

You need to have a metric. All experiments really require two things from the outset, at least. You need a metric that's sensitive enough to change during the observational period, so you need to know how sensitive that metric is. Then you also need a metric that you know represents getting closer to your goal. So, for example, if we wanted to understand whether we were increasing engagement on the platform, and we say engagement is defined as the percentage of days in the month that you're on Nextdoor, we can't wait a month to run that experiment.

Right? So we need some metric that correlates with that. And we know one of the things that correlates with that is your connections, the number of connections you have. And so we chose connections as our experiment metric, meaning the objective function of that experiment. And then we would be able to identify, you know, if we ran this experiment on, you know, X number of users, we would expect hypothetically based on effect size, say 10% increase or 15% marginal increase, this many new connections.

And we'd then be able to run a power analysis, a sensitivity analysis, to determine if that was going to be sufficient to be confident in the result. So it's about making sure that you're selecting the right thing to measure, which can sink an experiment, right? Because if you say, oh, we need six months to run this, you're probably going to get some pushback. And then you're going to have to have a metric that you've ensured is correlated with something up-funnel from the final objective that you're interested in.
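The power analysis step can be sketched with the standard normal-approximation formula for comparing two proportions. This Python example is illustrative only; the baseline rate, target lift, and significance/power choices are hypothetical, not Nextdoor's actual numbers.

```python
import math

# Rough per-arm sample size for detecting a relative lift in a proportion
# metric (e.g., share of users forming a new connection), using the
# normal-approximation formula with alpha = 0.05 and power = 0.80.
Z_ALPHA = 1.96   # two-sided 5% significance
Z_BETA = 0.84    # 80% power

def sample_size_per_arm(p_base, lift):
    p_treat = p_base * (1 + lift)
    p_bar = (p_base + p_treat) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * math.sqrt(p_base * (1 - p_base)
                                      + p_treat * (1 - p_treat))) ** 2
    return math.ceil(numerator / (p_base - p_treat) ** 2)

# Hypothetical: 5% baseline connection rate, hoping to detect a 10% relative lift.
print(sample_size_per_arm(0.05, 0.10))
```

The practical takeaway matches the transcript: if the required sample size implies months of runtime, pick a more sensitive up-funnel metric, or accept detecting only larger effects.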

Surprising insights about user behavior

I think that one of the things I learned early on that struck me was that people understand value based upon something they can map to themselves. So, for example, when I worked at Lyft, we had this hypothesis that we'd be able to identify the likelihood of acquiring a driver based upon the features of that driver that map to another driver on our platform. Another way of saying that is: people that are like each other will behave similarly.

We know, talk to any statistician, that everyone is different, but they're also very similar. And so one of the things we found is that, depending upon where someone lived, the neighborhood they lived in, something very nuanced, or depending upon whether they knew someone who was also driving for Lyft, that made them wildly more likely to join our platform and become a driver. And so I think the real insight there is to try to think about how people can be similar as much as they are different, and that people understand value based upon something that they can relate to.

So for example, if you think about asking someone to do something in your marketing funnel, like say make a purchase, if you present them with something that appears to be very low value, like if you were buying Zoom and it was 99 cents, that's probably going to harm your conversion rate, because it doesn't look like it has very much value. Anything that's a dollar doesn't have any value. Start valuing your products the way that people would perceive that value. And especially when you tell that story, tell the story about how people extract value from the product, so that when you present them with a value on purchasing, it makes sense.

Geo analysis and customer data platforms

Yeah, so in a lot of marketing, you have households. And the way you break up households could be by geography, could be by CBSAs, core-based statistical areas like metro and micropolitan areas. But you might have a city that sits on a state boundary where you have very different activity just by virtue of being in two different states. Designated market areas, DMAs, are Nielsen's regions defined by what people watch on TV or radio, and areas defined like that. I would recommend for anyone who's not part of it, especially if you're thinking about doing any type of geo analysis, joining the Census Bureau's live channel. It is full of loads of resources, many of them in R, because most of the geo work is done with libraries in R.

Yes, Jared, to your point, I do oftentimes break those down, like cities or households, into census tracts, because it creates homogeneity, right? So you want to make sure when you're doing any kind of analysis that you understand the variation within the units that you're looking at and within the subjects. So I think of census tracts, or even DMAs, as being a really great opportunity to run studies. For example, you can look at the package created by Kim Larsen, who I think was the first head of data science for Citrix, and who's a friend of mine, called MarketMatching. That's a way of using those geo locations in order to test hypotheses about any number of things, including your marketing.

One of the examples I would give with why that is so important, those geo locations, is when you're doing paid marketing. So, for example, on Facebook, the way those algorithms work is that they're going to capture the cheapest click. And one of the most influential variables in that model is where's the greatest amount of capital, meaning just where's the greatest amount of money. And so your marketing will primarily go into the 20 or 30 largest cities in the country. And if you're trying to understand over time how the saturation of customers in those cities could be influencing the success of your marketing, meaning how much does it cost to acquire a customer, how valuable is your average customer, you have to start being able to break down your acquisition based upon those locations.
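Breaking acquisition down by location boils down to computing spend-weighted metrics like CPA per geo unit rather than in aggregate. A minimal Python sketch with entirely hypothetical cities and figures (the transcript's point would apply equally to census tracts or DMAs):

```python
# Hypothetical spend and acquisitions broken down by metro area, to show
# why aggregate CPA can hide saturation in the largest cities.
records = [
    {"city": "New York", "spend": 50_000, "customers": 400},
    {"city": "New York", "spend": 30_000, "customers": 220},
    {"city": "Boise",    "spend": 5_000,  "customers": 80},
]

# Roll spend and acquisitions up per city.
totals = {}
for r in records:
    t = totals.setdefault(r["city"], {"spend": 0, "customers": 0})
    t["spend"] += r["spend"]
    t["customers"] += r["customers"]

# Cost per acquisition by geo: a saturated large market shows a higher CPA
# than a smaller, less-saturated one, which the blended number would mask.
for city, t in sorted(totals.items()):
    print(f"{city}: CPA = ${t['spend'] / t['customers']:.2f}")
```

Tracking these per-geo CPAs over time is what surfaces saturation effects as paid channels keep funneling spend into the largest cities.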

Sales forecasting with sparse data

But I will start with this. There are two types of forecasts, basically, when it comes to marketing work. One is a forecast based upon inputs. I don't know what kind of data you have, but you might have things like impressions or website traffic, things like that. That is something you could use to build a forecast from. I don't think I would start by doing it at the specific zip code level; I think I would start by doing it at the aggregate level. And then, if you have the same amount of data for every single zip code, you could test it on some zip codes to see how close you get to the actual values.

Another way I would do this, and I think I probably have an example I can share with you offline, is with a structural model, meaning a time series model where you're just fitting a line. Right, you're fitting a predicted line to the actual data. Matt Dancho has some really good resources on this that are free and publicly available. I would also check out Prophet, which is Facebook's forecast model, because it does take inputs, and there's a great getting-started guide in R. If you don't need an explanation of why you're getting the results you're getting, meaning you don't need to know, do we need more impressions, do we need more website visits, and you just need to know if you're right, I would think about using a time series model.
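"Fitting a line to the actual data" is exactly ordinary least squares on a time index. This Python sketch shows that simplest possible structural forecast; the monthly sales figures are made up, and in practice you would reach for a real library (Prophet, or the R time-series tooling mentioned above) rather than hand-rolled OLS.

```python
# Minimal structural time-series sketch: fit a straight line y = a + b*t
# to a short monthly series and project the next period.
def fit_line(ys):
    """Ordinary least squares for y = a + b*t with t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
         / sum((t - t_mean) ** 2 for t in ts))
    a = y_mean - b * t_mean
    return a, b

sales = [100, 110, 125, 130, 145]   # hypothetical monthly totals
a, b = fit_line(sales)

# One-step-ahead forecast: evaluate the fitted line at the next time index.
forecast_next = a + b * len(sales)
print(round(forecast_next, 1))  # 155.0
```

This is the "don't need to know why" option: it says nothing about impressions or website visits as drivers, only whether the trend extrapolates accurately.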

Okay, I think we have covered most everything. Sorry if there were any questions I missed, but thank you so much, everybody, for all the great questions, and thank you so much, Peter, for joining us today. This has been great.