Resources

Standardizing a safety model with tidymodels, Posit Team & Databricks at Suffolk Construction

video
Mar 26, 2025
29:24

Transcript

This transcript was generated automatically and may contain errors.

Hi everyone, my name is Max Patterson and I'm a data scientist at Suffolk Construction. Today, I'm excited to share with you how we leverage the Posit suite of tools to streamline and productionize our machine learning workflows. Specifically, I'll be detailing the workflow for one of our models that focuses on enhancing project safety.

I'll be covering several key areas in today's presentation. First, I'll introduce Suffolk as a company, giving you a better understanding of our perspective. Next, I'll detail the purpose and significance of our safety model, explaining why it's crucial for our operations. Following that, I'll walk you through the key information about the model itself and the objectives we aim to achieve through its implementation. We will then discuss the critical events in the model's workflow and how we utilize the Posit suite to accomplish these tasks efficiently.

Additionally, I'll demonstrate some of the outputs from this process, including a Shiny dashboard and an automated email reporting system. Finally, I'll touch on data security and how Posit enables us to implement row-level permissioning within our deployed applications, ensuring that our data remains secure and accessible only to authorized users.

About Suffolk Construction

So, who is Suffolk? Suffolk Construction is a leading general contractor headquartered in Roxbury, Massachusetts, just south of Boston. We have a strong presence across the United States with offices in New York, Florida, Texas, California, and several other states. Our work spans multiple sectors, including commercial, gaming, aviation, healthcare, life sciences, education, and mission critical. With a workforce of approximately 3,000 employees, Suffolk is recognized as one of the 25 largest general contractors in the United States.

The safety model and why it matters

As a general contractor, safety is of paramount importance to us. Our commitment to safety is reflected in our jobsite signs that remind everyone that the most important thing they do each day is to go home to their families. To ensure the highest standards of safety, we track various key performance indicators related to safety performance. One of the industry standard metrics we use is the Total Recordable Incident Rate, TRIR, which measures the number of incidents as a function of the total number of hours worked on the jobsite. At Suffolk, we monitor this metric at the project, regional, and company levels.

While TRIR is an excellent metric for assessing historical performance, it is inherently a lagging indicator. This limitation inspired us to develop our safety model, which aims to assess project risk proactively rather than retrospectively. By providing forward-looking risk assessments, our project teams can better prepare for potential risks and implement strategies to mitigate incidents before they occur.

The first generation of our safety model was developed several years ago by a third-party vendor. However, a couple of years ago, we decided to bring the model in-house and replicate the third-party model as closely as possible. Recently, with the availability of newer data streams over an extended period, we revamped the model to include additional features that enhance risk assessment. Based on feedback from our teams, we determined that distributing project risk assessments on a weekly basis would be most effective. So, we run the model on a weekly cadence, incorporating data related to staffing, trade partners, who are our subcontractors, observations, which are inspections conducted by our staff, incident history, project schedule details, and core project information.

To disseminate the model's findings, we developed a Shiny app for visualizing the results, and we use Quarto to send automated email reports highlighting key projects.

Model workflow overview

The workflow for our safety model is fairly standard and can be divided into two distinct processes, model training and model inference. Both processes utilize an ODBC connection to Databricks, which is our data warehouse. We employ the Posit suite of tools to train and deploy the model. Once deployed, we can make API calls to the model for inference. We perform batch inference every week using a script that gathers data from our Databricks tables, creates the model input, makes a request to the model's API endpoint, and writes the predictions back to Databricks in a table. Once this process is complete, our Shiny app displays the most recent predictions, allowing end users to access the data. Additionally, any projects identified as high risk are highlighted in an emailed report, which includes insights from the model explaining why the project is considered high risk.

The Posit suite offers a comprehensive set of tools that allow us to productionize our model workflow. The tidymodels library enables us to train multiple models and select the best one based on hyperparameter tuning. The DBI library allows us to establish an ODBC connection to Databricks, enabling us to read data and write predictions back to the database. This connection also supports row-level permissioning in Databricks, ensuring that data access is restricted based on predefined permissions, thereby relieving data scientists from managing data access. Vetiver allows us to version our models easily and deploy them to Posit Connect, where they can be served as API endpoints. Quarto helps us schedule the model workflow and send out key results via email. Finally, we use a Shiny app to present the model results in a user-friendly format for our stakeholders, with the app also deployed to Posit Connect.

Model training with tidymodels

In this section, I will walk you through the script we use to train our model. While I have a separate script dedicated to generating the training data, I will not be showcasing that script or the actual data set used for this public demonstration. However, this provides an excellent opportunity to highlight the utility of the pins library, which allows us to save a versioned data set and access it from a separate script. This is precisely what I am doing here in this cell.
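A minimal sketch of that pins step might look like this; the board path and pin name are placeholders for illustration, not the actual ones used at Suffolk:

```r
library(pins)

# Hypothetical board path and pin name -- placeholders for illustration
board <- board_folder("data/pins")
training_data <- pin_read(board, "safety_training_data")
```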

One of the standout features of the tidymodels library is its comprehensive suite of tools for model training. I will guide you through some of the key features we utilize. Initially, we generate our train-test split to evaluate the model's performance on an independent data set that the model has not encountered before.
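The train-test split could be sketched as follows, assuming the outcome column is named HadIncident (as mentioned later in the talk):

```r
library(tidymodels)

set.seed(123)
# Stratify on the outcome so both splits preserve the (rare) incident rate
data_split <- initial_split(training_data, prop = 0.8, strata = HadIncident)
train_data <- training(data_split)
test_data  <- testing(data_split)
```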

Here, we introduce the concept of a model recipe, which I believe is an unsung hero of the tidymodels process. Incorporating a model recipe is advantageous because it enables data transformations within the model workflow itself, thereby minimizing the preprocessing required before feeding data into the model when it is served as an API.

In our script, we flag the project number and week columns as IDs, ensuring that the columns are included in the model workflow but not used as features. The HadIncident column, which indicates whether an incident occurred, is assigned the outcome role, signaling to the model that this is the target variable we aim to predict. We also handle missing values by removing any rows with NA values for numeric features, and perform one-hot encoding for nominal features. It's important to note that unless the one_hot parameter is set to TRUE, the encoding will create N-1 features for N values in a nominal column.

Additionally, we create interaction terms between specific features and perform downsampling to address the imbalance in our training data set. This imbalance, characterized by a higher number of project weeks without incidents compared to those with incidents, is obviously favorable from a business and safety perspective, because it means that incidents are relatively rare. However, without downsampling, the model would suffer from poor recall, struggling to correctly identify project weeks with incidents in the testing set.
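Pulling the recipe steps above together, a sketch might look like this. The ID and feature column names (ProjectNumber, Week, StaffCount, TradePartnerCount) are assumptions for illustration; step_downsample() comes from the themis extension package:

```r
library(tidymodels)
library(themis)  # provides step_downsample()

safety_rec <- recipe(HadIncident ~ ., data = train_data) |>
  # Keep the IDs in the data, but out of the feature set
  update_role(ProjectNumber, Week, new_role = "ID") |>
  # Drop rows with missing values in numeric features
  step_naomit(all_numeric_predictors()) |>
  # one_hot = TRUE yields N columns for N levels (the default is N - 1)
  step_dummy(all_nominal_predictors(), one_hot = TRUE) |>
  # Hypothetical interaction term between two assumed features
  step_interact(~ StaffCount:TradePartnerCount) |>
  # Rebalance the rare-incident training data
  step_downsample(HadIncident)
```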

We specify two models for training, a random forest classifier and an XGBoost classifier. To optimize the models, we use the tune() function to mark hyperparameters for tuning, which we define later in the script. A notable feature of tidymodels workflows is the ability to train multiple models using multiple recipes within the same process by using the workflow_set() function. This allows us to test the performance of various recipes on each specified model and select the best one. Although we are using a single recipe in this demonstration, it's worth mentioning the flexibility this feature provides.
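The two model specifications and the workflow set could be sketched as follows; the particular tuned hyperparameters shown are typical choices, not necessarily the exact ones used:

```r
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) |>
  set_engine("ranger") |>
  set_mode("classification")

xgb_spec <- boost_tree(tree_depth = tune(), learn_rate = tune(), trees = 500) |>
  set_engine("xgboost") |>
  set_mode("classification")

# One recipe, two models; more recipes could be added to the preproc list
safety_wfs <- workflow_set(
  preproc = list(safety = safety_rec),
  models  = list(random_forest = rf_spec, xgboost = xgb_spec)
)
```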

We define our k-fold cross-validation strategy for training and specify the range of hyperparameters to test. For each hyperparameter, we test five different values evenly distributed across the specified range. During the training process, we track performance metrics such as precision, accuracy, recall, and area under the curve.

The workflow_map() function integrates all the components of our script, including the workflow set, cross-validation specification, hyperparameter tuning grid, and performance metrics. Running this function initiates the training process.
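Those pieces come together roughly like this. Note that passing grid = 5 asks tune for five candidate values per parameter; to guarantee values evenly spaced across each range, an explicit grid_regular() grid could be supplied instead:

```r
set.seed(123)
folds <- vfold_cv(train_data, v = 5, strata = HadIncident)

# Metrics tracked during tuning
safety_metrics <- metric_set(precision, accuracy, recall, roc_auc)

results <- safety_wfs |>
  workflow_map(
    "tune_grid",
    resamples = folds,
    grid      = 5,
    metrics   = safety_metrics,
    verbose   = TRUE
  )
```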

Based on our previous evaluations, we know that the random forest model outperforms the XGBoost model, so we will skip the step of comparing their performance here and proceed with the random forest. Examining the training metrics is how you would normally assess each model's performance; Julia Silge has published numerous insightful blog posts on iteratively tuning model hyperparameters, which I highly recommend for those starting from scratch.

In this cell, I extract the best random forest model based on my choice of metric, in this case the area under the curve. To finalize the model workflow, we combine the chosen recipe, model, and training data into a single fitted workflow.
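Selecting and finalizing the winning workflow might look like this; the workflow set ID "safety_random_forest" follows tidymodels' preproc-name_model-name convention for the assumed names above:

```r
best_params <- results |>
  extract_workflow_set_result("safety_random_forest") |>
  select_best(metric = "roc_auc")

final_rf <- results |>
  extract_workflow("safety_random_forest") |>
  finalize_workflow(best_params) |>
  fit(data = train_data)
```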

Saving and deploying the model with vetiver

Once satisfied with the model, we save it locally using the vetiver library. This involves creating a model board, which serves as a subdirectory, and converting the model into a vetiver model object. tidymodels workflow objects are natively convertible to vetiver model objects, and additional model types can be explored if desired. Finally, we save the model object as a pin within the model board.
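A sketch of that local save step, with a placeholder board path and model name:

```r
library(pins)
library(vetiver)

# Convert the fitted tidymodels workflow into a vetiver model object
v <- vetiver_model(final_rf, model_name = "safety_model")

# A local, versioned board backed by a subdirectory
model_board <- board_folder("models", versioned = TRUE)
vetiver_pin_write(model_board, v)
```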

Although this demonstration saves the model pin locally, we have a separate script for deploying the model and serving it as an API, which I will demonstrate next. So now we have our model saved locally, and we want to save it to Posit Connect, and then serve it as an API. As you can see by the size of this script, once you have a model object saved, it really doesn't take much work to deploy it when using vetiver. Similar to the last script, we have to instantiate our model board directory, which we do in this cell.

Next, we use vetiver_pin_read() to load our model object. We versioned our model within the vetiver pin board, and reading it in like this loads the most recent version of the model. Next, we must create a pin board on Posit Connect. It's essentially the same process as creating a local board, except this board points at Posit Connect rather than a local directory. Note that we also make it a versioned model board, so that we can deploy multiple model versions to the same location, in case we want to retrain the model and deploy a new version later on. Then, we just use the same vetiver_pin_write() function as before, only this time it's writing the model to Connect. On Posit Connect, it saves our vetiver model object as a pinned list, and you can see that here.
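That step might be sketched like this; board_connect() picks up Posit Connect credentials from environment variables or the rsconnect configuration:

```r
library(pins)
library(vetiver)

# Read the latest version from the local board...
local_board <- board_folder("models", versioned = TRUE)
v <- vetiver_pin_read(local_board, "safety_model")

# ...then pin it to a versioned board on Posit Connect
connect_board <- board_connect(versioned = TRUE)
vetiver_pin_write(connect_board, v)
```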

Now, all that's left to do is deploy it as an API, which is this small snippet of code here. Using the Connect board we already created, and adding in the model name as a function input, it knows which object to serve as an endpoint. I did add one additional argument, which is predict_args. Our model is a binary classifier, but we'd really like to see the confidence that the model has in the prediction. So, what we prefer is not the binary prediction, but rather the probability score, which is what we're defining here.
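The deployment snippet could be as small as this; the "user/safety_model" pin name is a placeholder:

```r
# Serve the pinned model as a Plumber API on Posit Connect.
# predict_args requests class probabilities rather than the bare
# binary prediction.
vetiver_deploy_rsconnect(
  connect_board,
  "user/safety_model",
  predict_args = list(type = "prob")
)
```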

And that's it. So, here we are in Posit Connect, and we can see the API endpoint that was created, and now we can call it for predictions.

Running batch inference

Okay. So, now we've trained our model and served it as an API. Now we can run our predictions on new data. We have a separate script to do this, which is the script I'm showing here. This script does three primary things. First, it generates the predictions. Second, it writes those predictions back to our data warehouse. And third, it sends out an email with the information we specify when the script runs.

I'm going to start showing what we have in this script at this point, because the portions of the script above are querying our tables. So, just assume that everything above is what we do to generate the data frame that we feed into the API, which is this predictions data frame. It has the same format as the data frame we built to train the model. The next step is to call our API endpoint. We store the endpoint and our API key as environment variables, which we load into the script. Then we just make a simple API call to the endpoint. Remember that this model has a recipe associated with it. So, the endpoint makes the necessary preprocessing transformations for us before it makes the predictions. All that's left is to pull the predictions from the model after the request, which we do here.
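The endpoint call might be sketched like this; the environment variable names and the predictions_data object are placeholders:

```r
library(vetiver)

# Endpoint URL and API key are assumed to live in environment variables
endpoint <- vetiver_endpoint(Sys.getenv("SAFETY_MODEL_ENDPOINT"))

predictions <- predict(
  endpoint,
  predictions_data,  # same columns as the training frame
  httr::add_headers(Authorization = paste("Key", Sys.getenv("CONNECT_API_KEY")))
)
```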

As part of the back and forth with our stakeholders, one thing of value for them is to see which features are causing an elevated risk. We use SHAP scores to evaluate feature importance. If you want to do this, tidymodels requires a slight workaround to get SHAP scores on non-training data. First, we load the model and model recipe from Posit Connect.

Next, we need to retrieve our training data, which the fastshap explain() function needs when generating the scores. We store the training data as metadata when we deploy the model, and that allows us to easily gather the training data when producing SHAP scores.

Finally, we need to apply the recipe to the new predictions data, which is what we're doing here with prep_data. So, we have our predictions, we have our model, and we have our training data. Now, we're ready to use the fastshap library to get our SHAP scores. We feed this information into the explain() function. The first parameter is our model object. X is our training data after it has been processed through the model recipe. pred_wrapper is the function we use to extract predictions once the model makes its prediction, which I'm creating here. nsim is the number of simulations run to estimate the SHAP score. Per the documentation, the higher the number, the better. For us, 50 gives us solid estimates within a reasonable runtime. And finally, newdata is the prepped predictions data we're feeding in.
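Putting those pieces together, the explain() call might look like this. The object names (rf_fit, baked_train, prep_data) and the .pred_Yes probability column are assumptions for illustration:

```r
library(fastshap)

# pred_wrapper extracts the positive-class probability from a prediction
pred_fun <- function(object, newdata) {
  predict(object, new_data = newdata, type = "prob")$.pred_Yes
}

shap_values <- explain(
  rf_fit,                      # the model extracted from the workflow
  X            = baked_train,  # training data after the recipe is applied
  pred_wrapper = pred_fun,
  nsim         = 50,           # more simulations -> better estimates
  newdata      = prep_data     # new predictions data, also baked
)
```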

This portion of our code performs the write-back to Databricks so that we can store our predictions and the SHAP scores for our features. Slightly above, we do some transformations to the data so that the feature names are in plain English, but it's not really important to show. So, just take my word for it that this new predictions data frame contains the latest predictions from this model run, and this risk reasons data frame contains the feature importances. Assuming we have data to publish, we create an insert statement using the values in the data frame. For each row, we create a string containing the four column values we need: project number, week, the probability of risk, and our model version. We store the model version so that we can eventually compare the model performance by version over time. We do this same string creation with the columns in our risk reasons data frame. Finally, we use DBI's dbExecute() function to write the data back to Databricks using our ODBC connection to the database.
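A hedged sketch of that write-back; the DSN, table, and column names are placeholders, and in production a parameterized query (e.g., with dbBind()) would be safer than string-building:

```r
library(DBI)

con <- dbConnect(odbc::odbc(), dsn = "databricks")  # assumed DSN

# One VALUES tuple per prediction row
values <- paste0(
  "('", new_predictions$ProjectNumber, "', '",
  new_predictions$Week, "', ",
  new_predictions$risk_probability, ", '",
  model_version, "')",
  collapse = ", "
)

dbExecute(con, paste0(
  "INSERT INTO safety.predictions ",
  "(project_number, week, risk_probability, model_version) VALUES ",
  values
))
```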

Automated email reporting with Quarto

Now that our data is written back, it's time to do the final part, which is create the email output. To create an email section in your Quarto document, you need to add the following containers. First is the email container, which holds everything you want included in the email. Second is optional, but if you would like to include a custom subject line, you would need to add a subject container.
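In a Quarto document, those containers are fenced divs; a minimal skeleton looks like this (the subject line is a placeholder):

```markdown
::: {.email}

::: {.subject}
Weekly Safety Model Report
:::

Body content for the email -- the automated message and the
formatted data frames -- goes here.

:::
```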

As you can see, I already have one built out. Here, I have an automated message about the safety model, and then I have two formatted data frames that I want to display. The first is a table of high-risk projects with their top features that contribute to their high-risk score. The second is a data frame of projects that were in the previous week's list of predictions but not in this week's list, which is usually because data required by the model isn't available.

Finally, we are ready to publish. We're currently in the process of adding Git-backed deployment to our process, but for simplicity's sake, I can show you how to publish via the UI within RStudio.

As you can see, we now have the published script on Posit Connect, and the content that shows up is exactly the content in the email we send out. If we head over to the Settings tab and hit Schedule, we can specify when the script is supposed to run. Let's set it up to run at 8 a.m. on Sundays. Additionally, you can send it via email to others in your organization by choosing this. I can send to all collaborators and viewers of this document, and if I wanted to add anyone else specifically to this document, I can do so via email.

So there you go. Now we have the model inference workflow deployed to Posit, which includes actions to write predictions back to Databricks and send out key information to your stakeholders.

The Shiny dashboard

Finally, we've reached the real end product of our safety model workflow, which is our dashboard. At a high level, the objective of the dashboard is to provide key insights into risk across the project portfolio and help project teams see their risk in a few different ways, which I'll get into. I'll point out the basic elements of the app in the code here, and then we can look at the finished product. First off, the page has a sidebar layout, which we specify here. We have two drop-down selections that a user can choose: the region they want to look at, which could be a specific region or all projects, and the week they want to inspect. We also have a download button in case users want to get the predictions data for a particular week.

Then we have a row of value boxes to show some high-level facts. The number of projects in the dataset, the number of high-risk projects, the percent of all projects being high-risk, the number of projects having incidents that week, and the percentage of those incidents being on projects we flagged as high-risk. Next, on the second row of the dashboard, we have a data table and a navigation panel with a few different tabs displaying project predictions data. The data table is just a selectable list of the projects available for a user to choose. Based on the projects the user picks, this updates the data shown in the navigation panel.

There are three tabs in the navigation panel that show predictions data. First, the risk rating for projects over time, with the view showing their score relative to our high-risk threshold. Second, the risk rating for projects relative to the distribution of all predictions we've made on that project. And third, the top features driving a project's risk rating for that week. The idea for this section is that we want this product to be seen as a helpful tool for projects rather than a comparison tool between projects.
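A minimal bslib/Shiny sketch of the layout just described; all IDs, labels, and choices are placeholders, and the server wiring is omitted:

```r
library(shiny)
library(bslib)

ui <- page_sidebar(
  title   = "Safety Model Dashboard",
  sidebar = sidebar(
    selectInput("region", "Region", choices = c("All", "Northeast")),
    selectInput("week", "Week", choices = NULL),  # populated server-side
    downloadButton("export", "Download predictions")
  ),
  # Row of high-level value boxes
  layout_columns(
    value_box(title = "Projects", value = textOutput("n_projects")),
    value_box(title = "High risk", value = textOutput("n_high_risk"))
  ),
  # Tabs mirroring the three prediction views
  navset_card_tab(
    nav_panel("Risk over time", plotOutput("risk_history")),
    nav_panel("Risk distribution", plotOutput("risk_dist")),
    nav_panel("Top features", plotOutput("top_features"))
  )
)
```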

One last thing I'll call out before we run the app is our ODBC connection. We have built an internal R package, which we have called Excavator. In it, we try to standardize and simplify routine processes that all of our team members do. One example is our database connection function. In it, we aim to help introduce OAuth in our Shiny applications. By getting a user's credentials in Posit Connect based on their session in the app, we can generate a temporary access token to Databricks based on those credentials. The benefit of this is that we no longer have to be in charge of permissions. Our data governance team can handle that.

Here is the rendered app. As you can see, it's a pretty standard dashboard, but it gets to the point as far as what our safety team needs to be aware of, both at the portfolio level and at the project level. Like I said, on the left-hand side, we have a sidebar with dropdowns to select the project region and the week you want to view predictions for, as well as a button to export those predictions to Excel. Then on the top row, you can see the need-to-know facts as far as how many projects there are, how many of those projects are risky, and how are we trending on incidents for that week.

Then below, we have the project level view. I blurred out the project names just for privacy purposes, but I'll select a project here and you can see the data that's available to view. First, we have a historical plot of this project's risk over time, which gives a nice view of how this project's risk has journeyed throughout its lifecycle.

Then in this next plot, we can see how the week's risk compares to all of its historical predictions. As you can see, this project has had high historical risk, but in terms of this point in the project, it's actually relatively low risk.

This view is important because some projects are just naturally more risky than others due to project characteristics. Because of this, it's helpful to provide more context about a risk rating rather than simply giving a black box prediction.

Finally, we have a list of the top features that are contributing to a heightened risk rating, which is what we generated via the SHAP function. Now, let's see how OAuth fits in.

Row-level permissions with OAuth

As an example, I made a simple row-level permission in Databricks for our safety model table. As you can see, I only want the user to access projects in the northeast region where the office does not equal New York. After setting up these permissions, we'll go back to our app and rerun it. As you can see, when I open the app, I only have northeast projects available to me.

While this was a simple example, it demonstrates the value of OAuth: it lets your organization set up data permissions in one location and keep them consistent across everything you deploy.

Key takeaways

So, in closing, I'll mention a couple of key takeaways from today's presentation. First, data is key to what we do at Suffolk, with safety being one of many areas. Second, the Posit suite of tools helps us build workflows that benefit the business by standardizing our process and helping us surface key findings to our stakeholders via apps, emails, and other tools. And finally, Posit Connect's OAuth capabilities allow us to maintain the data governance standards we created in our data warehouse.

So, with that, I want to thank you all for watching today. I hope you all found today's presentation helpful, and I look forward to hearing from you all in the Q&A section.