Running unified attribution at scale | Martin Stein @ Conversion Logix | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Welcome back to the Data Science Hangout, everybody. If we haven't had the chance to meet before, I'm Rachel, I lead customer marketing at Posit. Posit is the company formerly called RStudio. I like to add that in here now. We build enterprise solutions and open source tools for people who do data science with R and Python, and I'm joined by my co-host here, Libby.
Hey, everybody. I'm Libby, and speaking of R and Python, I am a Posit Academy mentor, so I help people learn R and Python to do more stuff with data in their everyday jobs. And of course, I also work with Rachel here to facilitate our amazing, beautiful community.
We're so happy to have you joining us here today. If it's your first time, the Hangout is our open space to hear what's going on in the world of data across all different industries, chat about data science leadership, and connect with others who are facing similar things as you. We get together here every Thursday at the same time, same place.
Thank you so much to those who have helped make this the friendly and welcoming space that it is today, and we are so proud of that. We're all dedicated to keeping it that way, so if you ever have feedback about your experience that you'd like to share with me, and honestly, good or bad, or maybe suggestions for topics to dive deeper on, I'm going to share a Google Form in the chat with you right now.
But you can always reach out to me directly on LinkedIn as well. Absolutely, and reach out to each other on LinkedIn. We would love everybody to connect with each other. And we love hearing from you. It doesn't matter your years of experience, what industry you're in, what your job title is, what languages you use, we want you to know that you belong here, and we want to hear what you think, and we want to hear your questions.
So I encourage you to introduce yourself in the chat: put your name, your role, where you're from, a link to where somebody can find you, maybe your website, and definitely use the chat. It's fully your space for sharing resources and getting together. There are three ways for you to jump in and actually ask questions today, or just provide some feedback or perspective: you can raise your hand on Zoom and we can call on you to jump in, or you can put your question in the Zoom chat and we will grab it from there.
And then there's also a Slido link where you can ask questions anonymously.
And because I have you all here, I just want to do a quick announcement. I let people know yesterday about this month's workflow demo, but wanted to share here too: on October 30th, which is a Wednesday, Ryan Johnson is going to share how you can save time with dynamic and professional PDFs powered by Typst, Shiny, and Posit. We hold those workflow demos once a month, on the last Wednesday of every month.
With all that, thank you so much for spending time with us today. I'm so excited to be joined by our featured leader and co-host, Martin Stein, Chief Analytics Officer at Conversion Logix. I've known Martin as a customer all the way back to 2018, I believe, then as a colleague, and now as a Hangout featured leader. So Martin, I'd love to have you introduce yourself, share a little bit about your role today, and also something you like to do outside of work.
Hey, Rachel, and hey, Libby, hey, community, it's wonderful to be here. So yeah, I think it was at least 2018, maybe even a little bit earlier. I was probably one of the first commercial Connect customers; I think that might go back to 2016 or so, so quite a while.
So my background is I work for an agency called Conversion Logix, a tech-enabled agency in the space of apartment rentals and senior living. The company provides marketing, media services, and advertising, and of course there's a clear need for data science to optimize all of that, and that's what we're doing.
As for my background: I started my first software company at the age of 18, then studied political science and sociology with statistics, got all into SPSS, and since I was a coder it didn't take too long to start with R. I've been using R probably since 2008 or 2009, so for a long, long time.
Unified attribution and the product launch
So what we set out to do is understand the real value of all of those marketing channels and get attribution in there that's better than the industry standard today, which is usually last-touch attribution and last-touch attribution models. We developed this with the help of the open source community as well; there are packages like ChannelAttribution and ChannelAttribution Pro, which are fantastic, and other packages out there. We built a solution that allows us to do probabilistic attribution, using Markov chains on one side and a reward model inspired by Shapley values on the other side.
I think those really lead to outcomes, and that's the most important part, that are not just last-touch driven, where your marketing dollars all tend to go into one channel. Instead, you get a much, much better overview of where your marketing dollars actually work. And then as the next step, once you know what really works and where you spend your money, you have a much better strategy going forward and you can optimize from there: your campaign settings, the spend, and the pacing.
So that's where we started at Conversion Logix. And this week is actually our launch week: we're just bringing our product to market, and I'm super proud. We did this in a super short amount of time, I think we started in February, and used a whole lot of R and Python. But frankly, without Connect, I don't think we would have been able to do this so fast.
Daily struggles in data science
Yeah, I think all the people who work in the data science capacity or in a data capacity, sometimes it's just like data engineers that come out of data science or the other way around. And then you see people wearing different hats, machine learning engineers, and so on.
So I think all of us who deal with data and produce outcomes, whether it's models or just ETL and more structured data, share the number one issue. About 30%, and here I'm quoting an MLOps community survey from this week that I just read, say they have an issue with finding data when they do their work. I mean, that is a huge, huge issue.
And your data is often locked up in systems. For us as data scientists, it's like, yeah, get me some data and then I start modeling. But that is usually not how it works when you build something that down the road has to run every day, has to be refreshed every day with data, and where you have data validation tools in there. We should talk about this too, because they're really key; that's something we discovered, that we need data validation tools. So I think that's number one, the biggest struggle: access to data.
I think people know how to model. I think people know how to use tidymodels or a Python environment to do that, so that's not really the big struggle. The second part is really data inconsistency. When data is not the way you expect it to be, whatever you build starts choking, and then you have to build code around it. And then, beyond the 20% of the effort you put in to come out with an EDA and a model and something that makes sense, you've got to take care of all of those data issues.
So those are probably the two top issues that affect pretty much all of us in data science and machine learning, and specifically data engineering. And then there are a lot of downstream issues: how do we put things in production? How do we make it repeatable? How do we show and share the value that we create? And we should talk about this too, because I think that is another big issue.
And the third bucket, the last bucket I would add, is teams. How do teams work with each other? Very often you see a lot of issues. Sometimes I wonder, because I'm in those teams, is it me? Then I realize that, no, it's not just me, it's everywhere. Wherever you go, teams sometimes have different goals.
So you have a business team and a data team or data science team, and they might not be aligned on the goal they have to reach. Sometimes they're not clear about who has what responsibilities. Is it your responsibility to take care of data? Is it your responsibility to make sure that what you put out is validated? Should a data scientist go into some aspects of infrastructure? That's a real, real big issue, because who else would do it for you if you're on a small team?
So I think those are the three buckets: getting clear, correct data and making sure you can repeat that same data approach for your ETL; then modeling, which I'm not talking too much about, and once your model is done, putting things into production and making that reliable; and last but not least, how you work with teams. Those, to me, are the three biggest issues.
Understanding last touch attribution
Yeah, you kept mentioning last touch, and I think I know what that means. I think it means like the credit goes to the person who had the last touch or the interaction that was the last touch. But I was wondering if you could explain a little bit about that and why that's a problem that even needs to be solved.
Last touch, if you think about a customer journey, so what do we do when we search for something? So let me give you an example out of my business background here in apartment rentals. So if you're moving from one location to another, you're looking for an apartment, usually what happens is that you potentially go on a search site or you type it into Google, or you go into something we call an internet listing service, like, you know, some that are there for apartment rentals, or you have social media, you see something on TikTok.
What happens is there are so many different marketing channels, and that's what we call them, a marketing channel or a medium. When you as a user click on one of those pieces of information (it could also be organic, a blog post) and then as a next step say, hey, look, I'm actually really interested in learning more about this, the people who put that information out want to know: where did you come from?
I mean, I'm sure when you go into a store, people will ask you, hey, how did you learn about us? That's literally the last-touch question, right? You say, well, a friend told me, so that's the last step. But maybe you had heard about them before the friend, or you had noticed them, but you never really thought about it.
So in marketing, we're really interested in knowing the whole customer journey. How did you become aware of our brand? Why did you come and see this? Was there a specific reason? We would like to know. And the reason is that we don't just want to know that you're walking into, let's say, an apartment building's front leasing desk and, when somebody asks, you say, well, because a friend told me. Ideally you could say, hey, look, those are the five ways I learned about you over the last four weeks.
That would be incredible, but nobody does that. So we have to make sure we can measure it. What we do is basically take the data that we can get out of advertising systems and individual systems like your website, combine that data, and construct that journey. Then we see not only the last touch, the last person who told you or gave you the information to come here, which is what we usually refer to as source/medium, but we know the entire journey.
And so why is that entire journey so important? That is because at the end of the day, that's what brought you here. And between each step in this entire journey, there's a probability that we as marketers, that we have affected you. And we want to know this because it's our marketing dollars going into that.
So let's put it this way. If there was a YouTube campaign and you might have seen this and you didn't click on anything at this point, but you just watched it. And then later on, you walk into that apartment building and visit, we would like to know if you've seen that. So that brings another real huge problem in marketing, which is first party data and third party data and data that we might have or might not have. We're dealing right now with a world that is without cookies. So we get less of the data that tracks you, and we have to deal more with data that is just aggregated.
So what is this aggregated data that I'm talking about? It's like how many clicks and impressions did a campaign have, a YouTube campaign? We don't know who clicked this potentially until they go to your website. But if you just watch it on YouTube and you don't click anything, we just know impressions and clicks. So that could have been influential.
And so the problem we have to solve is to bring this aggregated data together with the individual data that we have, in an anonymous form, from our website, and then build an attribution model out of it. So we build the customer journey, then we use the unified attribution model approach, where we take a reward model on one side and combine it with the customer journeys that come out of Markov chains, and then fuse this together. That's basically how it works.
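The difference between last-touch credit and journey-based credit can be sketched in a few lines. This is a deliberately simplified removal-effect heuristic in the spirit of Markov-chain attribution, not Conversion Logix's production model; the channel names, journeys, and conversion counts are invented:

```python
from collections import Counter

def last_touch(journeys):
    """Last-touch: all credit goes to the final channel before conversion."""
    credit = Counter()
    for path, conversions in journeys:
        credit[path[-1]] += conversions
    return dict(credit)

def removal_effect(journeys):
    """Simplified removal-effect attribution (Markov-chain inspired):
    a channel's credit is proportional to the conversions that would be
    lost if every journey touching that channel stopped converting."""
    total = sum(c for _, c in journeys)
    channels = {ch for path, _ in journeys for ch in path}
    effects = {ch: sum(c for path, c in journeys if ch in path) / total
               for ch in channels}
    norm = sum(effects.values())
    # Scale so the credited conversions add back up to the observed total
    return {ch: total * e / norm for ch, e in effects.items()}

journeys = [
    (["paid_search", "social", "listing"], 120),
    (["social", "listing"], 80),
    (["listing"], 100),
]
print(last_touch(journeys))      # listing gets all 300 conversions
print(removal_effect(journeys))  # credit spread across the journey
```

Under last touch the listing service soaks up every conversion; under the removal heuristic, paid search and social get credit for the journeys they participated in, which is the budget-allocation difference the attribution model is after.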
So to answer your question, if you understand the whole journey, you know how to spend the money as a marketer wisely, and you're more efficient with your marketing spend, and you have less wasted spend. That's the problem we're trying to solve here.
Building the business case internally
So in the past, and even currently, what happens is I have an idea, I propose a project, I pitch it maybe to my manager, and my manager, bless him, is like, hey, I'm behind you. I want to help you pitch this project. However, I need you to help build the business case for it, right? And I can think on those terms, but I'm used to the nitty-gritty and thinking about the data problem, right, not the business problem. And so I wondered, from your experience, do you handle any of that? What's your methodology? If not, who do you hand that off to?
So I think that is a question so many of us face. When somebody comes to us with a clear set of problems to solve and a clear set of goals, and it's already vetted, and you just have to get to it, that is the easier case to deal with, because you know: okay, let me know what the problem is, where the data is, and by when you need something, right?
But then, to get to your point, there's the case where you have to go out and say, hey, look, I'm discovering these issues, and I want to make a case that we have to solve something. I'll give you an example. All companies deal with a customer churn problem to a certain degree. You do really good work, so you might sit in your company and discover, hey, look, we're losing too many customers, and I would like to help address this.
So, that's a classical situation where you, as a data scientist, as in an organization or in a team, have an idea about what to do. Classically, to me, that has been the situation throughout my career. I have been a data scientist and a chief product officer, so I always cover both areas, the problem discovery and the problem solution, and then the technical implementation with my teams and myself.
So, the way to do that is really get very clear about what's the impact that this problem that you want to solve has. If the impact is quantifiable and the impact is easy to understand for your stakeholders, then I think the first thing that I would do is, like, you know, kick off a little bit of a research project for you. It doesn't have to be big, about just getting your stakeholders, the audience that potentially makes your business decision, if you get the time and resources to solve that issue, to get them actually understand what the problem is.
So, that is the hardest part because you might go into the technical details and how to solve it, but it's not about the how at this point. You purely have to focus on why it matters and why it matters to your organization or your team. And so, that's usually very high level and you don't spend a lot of time in 20 different cases. You just bring up the one, two, three biggest problems. And then for the stakeholders, they have to make a trade-off decision. Do we do this or do we do something else? Do you understand that everything they do is a trade-off decision?
The best method to do this is to have conversations with the business owners. That's what I do. I sit down and listen and talk to them very frequently. Try to understand what the issues are. And that's the same what we do as a marketing company with our customers. Maybe today it's not the attribution piece. Maybe today it's like they have a competitor across the street. And then we would like to understand what it is. And once we understand this and contextualize the problem, that's really the most important thing. And then help them make a trade-off decision, second most important thing. Then you have basically the first step done.
Pitching to external clients
So out of my experience, I deal a lot with investors too, and I think one of the toughest situations is when you go to a venture capitalist and make a case for getting $10 million, right? That's a whole lot of money for them to spend, so your case has to be really, really, really good.
So I think to me, it really breaks down, Jared, into three aspects that I follow. Number one is I always open my conversations with framing the situation about who we are, who I am, who the organization is, so they know where I'm coming from, right? So they know, oh, yeah, this person comes out of this direction. So everybody else you're speaking with can contextualize you and know what to ask you and make sure that you're on the same page.
The second piece, and this usually comes very first in an investor presentation, when you go through your startup pitch, is the same as with a client: you present yourself and say, we are Conversion Logix, this is what we do, that's what people love about us, and these are the problems we solve. You've got to break it down. And it's the same thing in a data science environment.
The next step is to give the other side a chance to weigh in. So the main way I would describe that approach is consultative: it's consulting, not selling. It's really you being the doctor who listens to somebody's issue, and then trying to understand through very good, careful listening.
When they have shared their information, then you have to aggregate it. You have to reduce complexity. If you don't reduce complexity in that conversation, if you create more complexity, you've lost it. It's as simple as that. If you just go into one area and blow it up technically, saying, oh, we could do this, people will not know what you mean. They have no idea. You've got to reduce complexity.
You've said what you do. You've listened to what their problem is. And then you've got to bring this together and say: oh, I see, you have an issue here. For example, in our case, your apartment occupancy rates are below 95% for some buildings in a highly competitive environment, let's say Chicago, and we can help with that. Then you go forward: confirm that this is the problem, reduce complexity, focus on one or two things.
And then, classically, that's where trust comes in, because somebody is listening to you. It's really a decision-theory challenge here: they've got to learn to understand that you're talking about the right thing, and then they've got to give you credit, that you have credibility. After you've locked in on "here's the problem you said you have, and we can help you with it," now you bring credibility; now is the time, not before. At this point you say, hey, look, this is what we have done for others, and people understand: oh, same problem, and you have done this. Then, at the very end, you can talk about how to go about it. But that's usually only 10% of the conversation.
A/B testing and causal inference
So I think we see a change, maybe not a big change, toward causal analysis and causal inference. And classically, we did a lot of A/B testing; our organizations did a lot of A/B testing.
And I do think that A/B testing is a classic method, as long as you don't violate it with the peeking problem (we all know what we're talking about in A/B testing) and you do it right. A/B testing in general is still common practice, but I think you need data scientists or very, very knowledgeable data analysts to get it right. I'm not a big fan of those super automated A/B testing tools where you put something forward, you look at what's happening, and you peek the heck out of it. This is not the way to do it.
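For the fixed-horizon case, where you decide the sample size up front and test exactly once with no peeking, the arithmetic is just a two-proportion z-test. A stdlib-only sketch with made-up conversion numbers:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test. Only valid if the sample size
    was fixed in advance; peeking repeatedly inflates the false-positive
    rate, which is exactly the 'peeking problem'."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Hypothetical campaign: 4.0% vs 5.2% conversion on 5,000 visitors each
z, p = two_proportion_ztest(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Automated tools that recompute this every hour and stop as soon as p dips below 0.05 are doing sequential testing without the sequential correction, which is why the peeking warning matters.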
Causal analysis and causal inference, I feel, is a really, really interesting approach. I personally think it allows us to do a whole lot more: understanding, without an A/B test, what has happened over a time series, over a certain amount of time. We're just now at Conversion Logix getting into causal inference and causal analysis. Over the last two to three years it has become a mainstay, and I think most agencies today should probably have a plan for how to leverage causal inference and causal analysis. You can do it in R; you can do it in Python.
One of my favorites, I have it right here and I'm going to share it with you, is the book by Martin Huber, Causal Analysis. You can see it here. That's one of the really, really good ones that I recommend; Martin lays out a really good theoretical context for how to conduct causal analysis. So that is my recommended read for people who have not gotten into this.
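To make the contrast with A/B testing concrete, here is the classic 2x2 difference-in-differences estimator, one of the simplest causal-inference tools for observational time series. This is a minimal sketch assuming parallel trends between the two groups, with invented weekly lead counts:

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 difference-in-differences: the treated group's change
    minus the control group's change estimates the treatment effect,
    assuming both groups would have trended in parallel otherwise."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical: weekly leads before/after a campaign change at one
# property (treated) vs a comparable property with no change (control)
effect = diff_in_diff(treat_pre=40, treat_post=55, ctrl_pre=42, ctrl_post=45)
print(effect)  # 12 extra leads per week attributed to the change
```

The control group's change (45 - 42 = 3) absorbs the seasonal drift that an A/B test would have randomized away, which is the core idea behind the more elaborate time-series methods in this space.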
Customer lifetime value
So yeah, Peter Fader is a Wharton professor, for those who don't know him. He has published a couple of really, really great books as well. He puts forward his own method and principles about what metrics to follow in an organization.
He's really straightforward about what values to take and how to calculate those things, and then puts this in a context that makes sense. Other people out there, not that Peter Fader is a VC, but VCs out there, have their own systems for how you calculate customer lifetime value and payback periods and so on. So now we're going into a more financial modeling part, right?
When you create those values or KPIs for an organization or a team, you need to understand the context: what is that going to tell you? What comes next? How do we put this in context? And customer lifetime value, for those hearing the term for the first time, is really an approach to understanding the value of a customer you have acquired for your organization. That tells you what to do: how to sell, how to market, how to retain that customer, and how to get new customers, because there's potentially a limited lifetime for each one.
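As a hedged illustration of what an FP&A team might be computing in a spreadsheet, here is the simple textbook contractual-CLV formula with an infinite horizon. This is a generic teaching formula, not necessarily the exact method Peter Fader advocates; the margin, retention, and discount numbers are invented:

```python
def simple_clv(margin, retention, discount):
    """Infinite-horizon contractual CLV:
    CLV = margin * retention / (1 + discount - retention),
    i.e. the discounted sum of future per-period margins, where each
    period the customer survives with probability `retention`."""
    return margin * retention / (1 + discount - retention)

# Hypothetical: $500 annual margin, 80% yearly retention, 10% discount rate
print(round(simple_clv(margin=500, retention=0.8, discount=0.1), 2))  # ~1333.33
```

A calculation like this is exactly the kind of thing you can wrap in a small Shiny app so the finance team can move sliders instead of editing spreadsheet cells.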
So here's the tip I would give you if I were in your organization: I would sit down with my finance team. If you have an FP&A team, a financial planning and analysis team, they are all over this. If you have FP&A in your organization, you sit down with them as a data scientist and ask: what are your challenges?
So you have that finance team, and they have a sub-team called FP&A, financial planning and analysis, and they do CLV calculations. And you say, hey, look, I know you might do this in Excel or Google Sheets; I can potentially help you here. You want them to explain their approach to you. Do they follow a Peter Fader approach or something else? Then you can follow up on the actual metrics, make sure you have the data, and help them.
And to me, the best kind of help is showing something of value. Classically, what I do is put forward a Shiny app that I share on Connect, you know, just model something very quickly. And usually what happens is people get excited when they see that.
Don't do it alone because you go down potentially a rabbit hole of just, like, metrics. What I'm saying is, like, there's a reality in your organization. If you have a big organization like 200 people, most likely you have somebody in finance who actually does the FP&A role. And then you talk to that person and then you say, hey, look, what's your biggest challenge? And that's basically then just listening.
And if you have something like Connect where we as data scientists can show value just, like, immediately, this is the best part of it because you just put something together and your job is to really then get this person excited and help them to do more with their time. So the whole company benefits from your effort and your finance person's effort as well.
Getting started with AI
Best question ever because what we're doing here is the best way to do it. Join a community. That's literally the answer. That is the answer. Honestly, I could not give you a better answer to that. I'm a member of multiple communities. I'm a member of the Machine Learning Hangout in Seattle, ML Ops community, the Data Science Hangout that Rachel started many, many years ago, which is fantastic.
So to me, literally what you should be, if you have time to join those meetings, go and participate in those in communities. Go on to the Slack channels. If you have a local meetup, I mean, go to the local meetup. I mean, nothing beats meeting people, explaining what you're doing, and understanding what they're doing.
So it really comes down to two things. Thing number one is understanding that you're not the only one dealing with an issue, and everybody is learning. We're all continuously learning; you're not the only one. I would bet that out of the 120 or 130 people here, there would be only a handful saying, oh, I know everything, I don't need to learn anymore. So that's why people go there.
I remember one example: when generative AI came around, I was in Seattle, and a ton of people said, let's go to those meetups. People didn't really know what RAGs were, retrieval-augmented generation, and all this stuff. That's when people said, this is what I do. Some brought example applications they had running on their machines, and then they said, here's my repository, and you can go there and look at the code. So it's really that exchange that keeps you going and helps with learning.
Sharing Shiny apps and infrastructure
Yeah, I just wasn't sure how – I haven't used Shiny in a long time, but I wasn't sure how people were sharing it. I remember when I published it, it's on my local host URL, but if I don't have Posit at this particular company, I forget how to share it independent of me. So it's like, hey, here's a tool for finance use, upload Excel, et cetera. I just don't know about the modern stack.
Now, I think there are a couple of solutions for that issue. Posit has a hosted environment. I don't know if it's still free; there was a free tier in the past. Was it shinyapps.io, I think, or something like this? Shinyapps.io, and now there's Connect Cloud as well.
Yeah, so I would go hosted first, and set up my own account there. Then from your IDE: you can use RStudio. I don't know if Positron does it now, I haven't tested Positron on that, but in RStudio, when you publish, you can connect to any target, and the hosted service that Posit offers is such a target. That's, to me, the easiest way, and you don't need to deal with infrastructure, which is really what your whole question is about: how do I manage infrastructure when I just want to share something? I'm not a DevOps engineer, I'm a data scientist.
So there are two stages to infrastructure. Let's put it this way: if you have time, or you have a DevOps engineer around you, you can go look into Docker-based solutions and spin things up on that side. There's definitely a way to do that, and you can host it on Cloud Run in Google Cloud or wherever else you want.
If you're working with business data, you'll run into authentication issues unless you wire up something like Firebase and other services on Google's side, which I've built for authenticating people. But that takes so much time out of you, quite frankly, that you'd rather go make the case and say, hey, team, let's get Connect. That's what I did in my organization: we have Connect running on GCP, and it does what it needs to do.
First, you go to Posit's public service and create an account, and then you have authentication, all of that stuff taken care of. You don't need to do anything. Infrastructure check. And once you grow and have shown this and people like this, you can run your own Connect environment on GCP, and it's super easy to set up. I mean, honestly, I can do it. So it's not that difficult to get all of this stuff going. So that's typically the way that I would suggest.
If you don't want to go the Shiny app route, then there's an easier way: you can use Quarto, including interactive Quarto documents, which is a really cool way to showcase something. That just renders to a document; you don't need to host anything. (We can also get to Shinylive and webR in a second.) Quarto is a really good markdown approach to creating a document with some basic interactivity in there, and I think that's probably the easiest way. So I would do Quarto first. If you're on to developing an app with more interactivity, more app than markdown, then you go to Posit and put it on their servers. And if you're really serious about it and it has funded business cases, then go and get Connect.
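To show what the Quarto route looks like in practice, here is a minimal `.qmd` sketch: the Python cell runs once at render time and the output is a plain HTML file you can email or drop anywhere, no server required. The file name, title, and occupancy numbers are all hypothetical:

````
---
title: "Occupancy report"
format: html
---

```{python}
# Executed once when you run `quarto render report.qmd`;
# the resulting HTML is a static file, nothing to host.
rates = {"Building A": 0.93, "Building B": 0.97}
for name, r in rates.items():
    print(f"{name}: {r:.0%} occupied")
```
````

For real interactivity without a server, the same document can be upgraded with Shinylive or webR cells that run in the reader's browser, which is the path mentioned above.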
Use cases with Connect, pins, and Vetiver
When we started this session, I quoted the number one issue on the data side. Remember the three buckets: data (having data consistently good and knowing where it is), that's use case number one; putting stuff into production, use case number two; and then teams on the other side. Let's set the teams aside and go to use case one, which is: where's my data? The typical use case for us is pins, pins with Connect. I think there's no better solution.
There are probably other solutions out there, but pins is great. You get a repository that understands where your data sets are. It's versioned, and it can be centralized on Connect; you can use it on local machines as well, you just connect to the pins board. That's a typical way we use Connect when we share data: you have controls over who can and cannot see it, and you get versioning with it. To me, that removes 30% of the first problem we talked about. It's just worth it out of the box.
The second use case: once you've done your data work, you've run your tidymodels or your scikit-learn stuff, and you have a Shiny app or Streamlit app or whatever you want to run there, it's really about sharing, and it's the production case. Production cases have two sides, and let's be very clear: not everything you do is an MLOps case where you have to build an enterprise application.
There's a super great framework called vetiver, from Julia Silge, at least I think Julia wrote the R version, and the Posit team. That's another use case: you can store vetiver models on Connect. That's the use case we use, so we understand what happens to the models there. So I use the whole bandwidth, from the very beginning to the very end, and then share things.
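vetiver itself packages a model together with a prototype of its training inputs, so a deployment can validate unseen data before predicting. As a toy illustration of that idea (this is not the vetiver API; every name and number below is invented), you could sketch it like this:

```python
class MiniVetiver:
    """Toy sketch of the vetiver idea: bundle a model with a prototype
    of its training inputs so serving can validate unseen data."""

    def __init__(self, predict_fn, prototype):
        self.predict_fn = predict_fn
        self.prototype = prototype  # field name -> expected type

    def predict(self, row):
        # Reject inputs that don't match what the model was trained on.
        for field, typ in self.prototype.items():
            if field not in row:
                raise ValueError(f"missing field: {field}")
            if not isinstance(row[field], typ):
                raise TypeError(f"{field} should be {typ.__name__}")
        return self.predict_fn(row)


# Hypothetical lead-scoring model: weight clicks and conversions.
model = MiniVetiver(
    lambda r: 0.5 * r["clicks"] + 40.0 * r["conversions"],
    {"clicks": int, "conversions": int},
)
print(model.predict({"clicks": 100, "conversions": 2}))  # 130.0
```

The real vetiver packages (R and Python) additionally version the model on a board such as Connect and generate a serving API, which is what makes the hosted-model workflow described above work.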
Measuring channel performance
That's a really tricky question, because in marketing it's usually not about the success of one channel but the success of your campaign. So you have a campaign that uses multiple channels, right? Classically, what we've done as marketers is look at our impression share, our clicks and click-through rates, and, if you have some kind of conversion measurement, you look at that too. That's the classic thing we do: we look at each channel separately. Sometimes we don't understand how they connect together, and that's what we're working on here at ConversionLogix: how do they support each other?
So now, how do you do that? That's the key question. You can use platforms that aggregate it for you, like TapClicks and others. They go out to your AdWords account, your Meta account, you name it, and put the data into their system. There are a ton of those services out there. Me personally, I'd like to have everything in BigQuery; that's where I put it.
Now, with GA4 (Google Analytics 4), you get a BigQuery export for free. When you go into your GA4 account, you can set it up so that up to 1 million events a day get copied into a BigQuery project. So what is BigQuery? BigQuery is Google's data warehouse. It uses a nested data structure, not just flat tables like we classically know from CSV files, so it's very efficient for storage, and you query it with SQL.
But you don't need to worry about that, because we have dplyr, which can do all of that cool stuff, so we don't need to learn SQL. So what we do here is literally get the data in there, then pull it out and analyze it. Now, how do I look at this? First of all, I compare. Once I connect dplyr to BigQuery, pull it all in, and do my typical modeling, I compare all the stats across all of the channels: impressions, clicks, click-throughs, and conversions. That's number one.
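The speaker does this from R against BigQuery. As a self-contained stand-in (SQLite in place of BigQuery, with made-up numbers), the "pull events out with SQL, then compare impressions, clicks, and conversions per channel" step might look like:

```python
import sqlite3

# Toy stand-in for querying exported event data with SQL.
# In practice these would be GA4 events exported to BigQuery.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (channel TEXT, event_name TEXT, n INTEGER)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("search", "impression", 10000), ("search", "click", 320), ("search", "conversion", 12),
    ("social", "impression", 25000), ("social", "click", 300), ("social", "conversion", 5),
])

# Pivot event counts into one row of stats per channel.
rows = con.execute("""
    SELECT channel,
           SUM(CASE WHEN event_name = 'impression' THEN n END) AS impressions,
           SUM(CASE WHEN event_name = 'click' THEN n END) AS clicks,
           SUM(CASE WHEN event_name = 'conversion' THEN n END) AS conversions
    FROM events GROUP BY channel ORDER BY channel
""").fetchall()

for channel, impressions, clicks, conversions in rows:
    print(channel,
          f"CTR={clicks / impressions:.2%}",
          f"CVR={conversions / clicks:.2%}")
```

The point of the comparison is visible even in toy numbers: one channel can win on click-through rate while another wins on conversion rate, which is why you line the channels up side by side.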
Model monitoring with Vetiver
So vetiver, to me, is really this: once you build your model and put it out, you can host it, you have versioning, and with Connect the vetiver model and its data are there and versioned. The most important part about vetiver, to me, is comparing the known data that went into my model with how the model reacts to unseen data. That's literally what you want to compare, and that's the general use case for vetiver. You could say it's a classical MLOps use case.
What you want to see is: if your model produces inferences, produces predictions, on data it hasn't seen, is it behaving correctly, or are there signs of drift (data drift, model drift, and so on)? That basically means things are happening in the data that your model was not trained for, and therefore your model isn't doing what you want it to do. You need to know when that happens.
So we aggregate the data that comes out of the predictions, out of the inference, and look at metrics that basically say, well, this has a different distribution now, things are changing. Once we detect that, most likely something has changed, and as a data scientist, at that point, you've got to go back and say, let's take new data and retrain the model.
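A minimal sketch of that kind of drift check, comparing incoming data against the training data with a crude summary statistic (the numbers are invented, and this is not vetiver's actual metric set):

```python
import statistics


def drift_score(train, new):
    """Crude drift signal: shift of the new mean, measured in
    training standard deviations (not a formal statistical test)."""
    mu = statistics.mean(train)
    sigma = statistics.stdev(train)
    return abs(statistics.mean(new) - mu) / sigma


# Hypothetical daily conversion counts.
train = [20, 22, 19, 21, 20, 23, 21]   # what the model was trained on
stable = [21, 20, 22]                  # new data that looks familiar
shifted = [5, 6, 4]                    # a behavior change, like early 2020

print(drift_score(train, stable) < 1.0)   # True: no alarm
print(drift_score(train, shifted) > 1.0)  # True: flag for retraining
```

Real monitoring would track several metrics over time (distributional distances, prediction accuracy against delayed ground truth) and alert on thresholds, but the shape is the same: compare known data to unseen data and act when they diverge.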
One example. It's hard to even remember pre-COVID times now, but here it is: pre-COVID, we collected data, and then we got into COVID in 2020 and all of a sudden the environment changed. And I can tell you, because I was doing data science and machine learning at the time, most of our models failed at that point. They just didn't work anymore, because behavior completely changed. In marketing it was really traumatic; everything was different.
Here's where vetiver comes into play. When you have your models hosted and vetiver detects that something is changing (the model doesn't detect this itself; the MLOps systems around it do), that's your first hint to go back to the modeling part, take the new data, compare, and start understanding what is different and why.
I know we're getting close to the top of the hour, but that is my number one use case for anything MLOps: you want an accurate model. And there's a legal side to it, too. You want to make sure your model is compliant. If you're in a regulated market, like real estate marketing with the Fair Housing Act, or other markets with their own regulations, you've got to make sure your model is compliant, and these are some of the ways you can do that.
Career advice
Don't think that because you don't know anything yet, you cannot get there. You can get there. Absolutely. Believe in yourself. We all have questions; it's really about finding who can help you. Find a mentor. That's really the one thing. People can take you by the hand from where you are at the moment, and then you pay it back to the community. So my number one is: don't doubt yourself. If you don't know, ask somebody. Get some help, and pay it back.
You know, give interns a chance, if you have the chance, to work in your organization. Take a data intern, a data science intern, somebody, and give them a chance to work on real problems. That's how they learn, and that's how we continue to be this beautiful community that we are.
Next week, Marco Gorelli is going to be joining us as the featured leader. Marco is a core dev of pandas and Polars, so it might be another fun one to share with your team. But thank you all so much. Have a great rest of the day.
