Data-level permissions using Posit Connect (with Databricks, Snowflake, OAuth)
Transcript
This transcript was generated automatically and may contain errors.
Hi everybody, my name is Zach Verham. I'm an engineer on the Connect team, and I'm really excited to give this workflow demo today. We're going to be talking about a new feature in Connect that we're really excited about called OAuth integrations, which allows published apps on Connect to interact with third-party resources using the app viewer's OAuth credentials.
Over the course of this workflow demo, we're going to talk about what this feature is, why we're so excited about it, and the new usage patterns in Connect that it opens up. We're going to talk about how to set up fine-grained access controls in Databricks Unity Catalog. And then we're going to show an application published to Connect that uses those fine-grained access controls when querying data, so each user gets a unique, personalized view of that data.
Near the end, we'll talk a little bit about the developer experience and how to begin using this feature in your own applications. It's important to call out that we're going to show this feature interacting with Databricks, but the same pattern will also work with Snowflake. And we'll have time at the end, after the demo, for Q&A.
What is Posit Connect and the old approach
So really briefly, for those of you who are unfamiliar: Posit Connect is a platform for publishing and sharing various data science artifacts, whether those are static content like a Quarto document or interactive applications like a Shiny app. This feature specifically works with interactive content, like a Shiny, Streamlit, or Dash app, and it's designed to make it easier to interact with the data you might have in your Databricks or Snowflake environment.
It's important to call out how customers and users would try to implement this type of interaction previously: you would set environment variables containing service-account-like credentials for resources like Databricks or Snowflake. The problem this flow runs into pretty quickly is that the deployment has only one identity when it interacts with the external resource. So enforcing fine-grained access control that respects how different users or user personas should interact with that external resource would be really, really difficult.
Some of the workarounds people would use were to deploy the same content item multiple times, either with different business logic running different queries against the dataset, or with different service-account credentials so each copy interacts with the external resource as a different identity. But this means you're deploying the same content over and over. And it becomes the app's responsibility to ensure that data is accessed securely and that each user's permissions are respected. So this moves the locus of secure access out of the data source and into the applications that are querying that data.
How OAuth integrations solve the problem
So here's what this new feature supports. In Databricks Unity Catalog, for example, you can set up fine-grained row- and column-based access control, really robust access policies on top of your data sources. It's now possible to deploy content to Connect that transparently relies on what is set up on top of the data in that one centralized location. So you can deploy a single application to Connect that serves many users and many different data access patterns, with the app transparently relying on the security policy at the data source itself.
So in this workflow demo, we're going to set up a data source with fine-grained access controls in Databricks Unity Catalog. Then we're going to show how those permissions can be delegated to Connect using this new OAuth integrations feature.
Setting up access controls in Databricks
So let's go ahead and jump over to Databricks. This is our Databricks workspace, and we're going to use this example Lending Club dataset: a table that contains a lot of information about various mortgage loans obtained through the Lending Club platform. Looking through it, we can notice a couple of columns that seem like they might contain sensitive, perhaps personally identifiable, information.
In this case, this column has employment information about the loanee, and there's information about their annual income that maybe we don't want everybody in our organization to be able to access. Similarly, this data spans the United States, so maybe there are certain people in our organization who should only access certain subsets of the data. It's possible to set these restrictions in Databricks itself using column masks and row filters.
We're going to look at one of each. If I jump over here, we can see that we have a mask function already set up. What it's doing is checking who the current user querying the data is. The mask says that if the current user is the demo Databricks user, then instead of returning the data itself, we mask it out with a star: we don't actually return that data, we return this star string instead.
We also have a row filter that does a similar check against the current user. If it's the demo Databricks user account, then we check the zip code column of the loan table to see whether it starts with a nine or an eight, which is checking whether it's a zip code on the western side of the United States.
So we have our mask and row filter functions set up, and we can now apply them to our Lending Club dataset with some really simple SQL commands. We alter the columns where we think the data might be sensitive and set the demo mask on them. And we alter the table to set a row filter that looks at the zip code column and potentially filters down to just the western region, depending on the current user. The end result of these filters is that our demo Databricks user cannot see any columns that would be considered personally identifiable, and can only see the subset of the data related to loans from the western region of the United States.
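As a rough sketch, the mask, the row filter, and the statements that apply them look something like the following Unity Catalog SQL. The table, column, and user names here are assumptions for illustration, not the exact ones from the demo:

```sql
-- Hypothetical names: adjust the table, columns, and user to match your
-- workspace. Column mask: the demo user sees '*' instead of the real value.
CREATE OR REPLACE FUNCTION demo_mask(col STRING)
RETURNS STRING
RETURN CASE
  WHEN current_user() = 'demo-databricks-user@example.com' THEN '*'
  ELSE col
END;

-- Row filter: the demo user only sees western zip codes (leading 8 or 9);
-- everyone else sees every row.
CREATE OR REPLACE FUNCTION western_region_filter(zip STRING)
RETURNS BOOLEAN
RETURN current_user() <> 'demo-databricks-user@example.com'
    OR zip LIKE '8%' OR zip LIKE '9%';

-- Apply the mask to a sensitive column and the filter to the table.
-- (A numeric column like annual income would need its own mask function
-- whose return type matches the column type.)
ALTER TABLE lending_club ALTER COLUMN emp_title SET MASK demo_mask;
ALTER TABLE lending_club SET ROW FILTER western_region_filter ON (zip_code);
```

Because the mask and filter call current_user(), the policy travels with the table: every query, from any client, gets the view appropriate to whoever is asking.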
Publishing the app to Connect
So we have our data, and we have our access controls set up in Databricks itself. Now we want to create and publish some content to Connect that uses these fine-grained credentials. I already have this app deployed to Connect, so I'm going to jump over to it.
I'm actually going to hit the content directly, so you can see what the login flow looks like for a viewer coming in to interact with the app for the first time. If I go to this content URL, I'm automatically redirected through an OAuth login flow against Azure, logging in as myself. So I'll hit that URL, get taken to an Azure login, pick my account, and go through the login flow. The end result is that I'm taken back to Connect, and Connect now has my OAuth access token. The content is then able to request that access token from Connect and use it to query data from Databricks.
You can see here that the app is aware of who I am. It's querying that same dataset we just looked at in Unity Catalog and doing some aggregations and visualizations with it. We know from our column mask and row filter that I have no restrictions on my ability to access this data, so I'm able to see everything: the columns we said are personally identifiable, every region, the whole dataset.
Developer experience and local development
This looks very similar to how you might set up this sort of interaction with Databricks using service-account-like credentials, which was the pattern most people used previously. And a quick aside here: I have this same application, with the same source code, running locally, and we can see that it behaves exactly the same way. We put a lot of thought into the developer experience for this feature. We really wanted to make sure that the app runs the same way on your local laptop while you're building it as it does when it's published to Connect, so the same code works in both environments.
So it's the same source code running both on Connect and locally, and I'm able to query the data as myself and get the full dataset from Unity Catalog. I want to jump over really quickly to show how that's working and how you can begin to build these apps for yourself.
We have cookbook examples, being published later this week, that show how to set up integrations with Databricks and Snowflake, and it's really easy to get going. Note that this is not the exact same code as the deployed app: it's a constrained, simplified example that just shows how, in a few lines of code, you can get this working in your own applications.
Ultimately, all you need is the session token associated with the viewer, which you get out of a header passed into your app. The way of getting that header varies depending on the framework you're using. This is how you do it in Shiny. I'm actually going to jump over and show how to do it in Streamlit, because that's what's running on the Connect server I'm showing, but you can see it's really similar: there's a slight change in the syntax for getting the header, but the code ultimately looks very much the same.
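As a minimal sketch of this first step: Connect passes the viewer's session token as a request header (Posit-Connect-User-Session-Token in current documentation), and each framework exposes request headers slightly differently. The helper below is an illustrative stand-in, not the cookbook's exact code, and the framework one-liners in the comments are assumptions about current framework APIs:

```python
# Sketch: reading the viewer's session token from the incoming request.
# Framework-specific ways to get the headers mapping (assumed APIs):
#   Shiny for Python: session.http_conn.headers
#   Streamlit:        st.context.headers
SESSION_TOKEN_HEADER = "Posit-Connect-User-Session-Token"

def get_user_session_token(headers):
    """Return the viewer's session token, or None when running locally
    (where Connect isn't injecting the header)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(SESSION_TOKEN_HEADER.lower())
```

The None case matters: it's how the app can tell it is running on a developer's laptop rather than behind Connect, which is what the local fallback below keys off of.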
This Posit credentials strategy comes out of the Posit SDK. You can see it accepts the session token identifying the user who's interacting with the content, and it also accepts a local strategy. That lets us determine whether the app is running locally on your laptop or up on a Connect server. If it's running locally, we fall back to the Databricks CLI's native way of authenticating with your Databricks workspace. That means the code can run, and produce the personalized view of the data based on who I am, whether it's running locally or on Connect.
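The local-versus-Connect fallback described here can be modeled with a small standard-library sketch. In the real app this role is played by the Posit SDK's credentials strategy wrapping a Databricks SDK config; the class names below are simplified, illustrative stand-ins, not the SDK's actual classes:

```python
# Stdlib model of the credential-strategy fallback (illustrative names).

class ViewerOAuthStrategy:
    """On Connect: exchange the viewer's session token for a Databricks
    OAuth token, so queries run as the viewer."""
    def __init__(self, user_session_token):
        self.user_session_token = user_session_token

    def auth_type(self):
        return "viewer-oauth"

class DatabricksCliStrategy:
    """Locally: fall back to the developer's own Databricks CLI login."""
    def auth_type(self):
        return "databricks-cli"

def choose_strategy(user_session_token):
    # The session token header is only present when Connect is proxying
    # the request; locally it is None, so fall back to CLI auth.
    if user_session_token is not None:
        return ViewerOAuthStrategy(user_session_token)
    return DatabricksCliStrategy()
```

This is why the same source code behaves identically on a laptop and on Connect: the only thing that changes is which strategy gets selected at startup.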
Once we build this credentials strategy, we pass it into a Databricks SDK config, and once we have that, we can make queries against the API, including very specific queries that identify the user who's actually making the request. Similarly, it's easy to pass this into a SQL connector so we can run queries against Unity Catalog itself, and those queries will respect the row- and column-level access control we set up previously.
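To make that end result concrete, here's a small standard-library simulation of what the Unity Catalog policies from earlier do to a query's results for two different users. The user names, columns, and rules are illustrative, mirroring the mask and filter described above; in reality Databricks applies all of this server-side, and the app never sees the hidden data at all:

```python
# Illustrative stand-ins for the restricted account and masked columns.
RESTRICTED_USER = "demo-databricks-user@example.com"
MASKED_COLUMNS = {"emp_title", "annual_inc"}

def personalized_view(rows, current_user):
    """Simulate the server-side row filter + column mask for one user."""
    result = []
    for row in rows:
        # Row filter: the restricted user only sees western zips (8x/9x).
        if current_user == RESTRICTED_USER and row["zip_code"][0] not in "89":
            continue
        # Column mask: star out sensitive columns for the restricted user.
        visible = dict(row)
        if current_user == RESTRICTED_USER:
            for col in MASKED_COLUMNS & visible.keys():
                visible[col] = "*"
        result.append(visible)
    return result

loans = [
    {"zip_code": "98101", "emp_title": "Engineer", "annual_inc": 120000},
    {"zip_code": "10001", "emp_title": "Teacher", "annual_inc": 60000},
]
```

An unrestricted user gets both rows untouched; the restricted user gets only the western row, with the sensitive columns replaced by stars, which is exactly the side-by-side difference shown in the next section of the demo.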
Seeing the feature in action with different users
So I've shown how this works as me, with data that looks pretty much the same as what you would see from a service account. But where you can really see how powerful this feature is is if I jump over to an incognito window where I'm logged in as a different user. Here I'm logged in as that demo Databricks user we applied the masks and filters for in Unity Catalog, and we can see that this user is not seeing the same data I'm seeing. Specifically, this user only sees the West region, and they see the column mask we applied, starring out the personally identifiable data.
And you can really see the power of this if I put them side by side: the two views of the data are materially different, and the aggregations differ because the users are receiving different data. But we can see that this is the same application, the same code running on Connect, presenting different views of the data in Databricks depending on the user who is logged in and interacting with the app.
Supported integrations and what's next
So this is, I think, very exciting. It's going to open up a lot of new usage patterns on Connect and a lot of new ways of interacting with third-party resources like Databricks and Snowflake. One last thing I want to show: if I hop over to the Connect server and look at the integrations that are available, these are integrations that your Connect admin would set up on the server for all published content to potentially use. We currently have native support for Databricks and Snowflake, we also support a plain Azure OAuth integration, and we also support a custom integration.
With the custom integration, we support integrating with any external OAuth app and using this flow to do viewer-based access against that third-party resource. So anything that implements OAuth, we can potentially communicate with, and people can build apps against it. That's really, really exciting. We're going to continue to add to this list of natively supported integrations, it's something we want to keep expanding, and we're really excited to see how people use this feature and the kinds of things it opens up.
So thank you so much for your time. And I think we're going to jump over to Q&A now.
