snowflakelookerheader-final

Game changer: Use Snowflake Data Sharing and Looker to deliver a branded end-to-end data sharing experience

There is a lot to love about the Snowflake Data Sharing feature as a means for sharing data with third parties. It’s secure and easy, it allows consumers to bring their own compute power to crunch on the shared data, and most importantly the data is live.  You can filter the data that you share using secure views so that each consumer gets a custom subset of data intended specifically for them.

At the data warehouse layer, by allowing the sharing of tables, views, and schema, Snowflake has hit a home run. There are so many reasons why companies may need to share data, and Snowflake Data Sharing is flexible enough to handle most all of them. If you’ve ever needed to share a collection of database tables before, you can see how powerful this feature can be.

But what if a company wants to share more than just the raw data?  What if they want to share insights on the data they are sharing? After all, sharing business intelligence and trends is way more powerful (and potentially profitable) than just sharing raw data. Chances are companies will want to share both data and insights at the same time.  In other words, when talking about data sharing, we need to consider how we can build on top of Snowflake Data Sharing to include sharing of live dashboards and visualizations for that data in a unified flow.

Spread the Compute Around But Keep Your Data Secure

Let’s say you’ve written a data application on top of Snowflake that embeds some truly powerful and insightful dashboards as well as a means for exploring the data using custom filters. You decide that this data and corresponding dashboards are worth sharing with partners and clients and hope that you’ll be able to turn this into a profit generator for your company.

It would certainly simplify the architecture if all users logged into the same application and shared the same underlying warehouse to view the shared dashboards.  You could allow all of your data consumers to mooch off of your Snowflake account and data warehouse from your app. However, with Snowflake Data Sharing and Looker, it is possible to “bring your own warehouse” while also having a shared architecture for your applications and dashboards.  

There are three main considerations for why Bring Your Own Warehouse (BYOW) is a better approach for your shared data application:

1) You want to share the raw data too –  In addition to sharing dashboards with insights, you want to share the underlying data as well. You want your partners or customers to get your insights and visualizations, but you also want them to be able to query and access the data directly and join it with other data in their own environment.  For example, maybe you are selling detailed historical weather data, and customers want to be able to join this data with data about their business to analyze patterns related to weather.

2) Cost – While your goal is to share data and dashboards, you also want to share the cost of the data warehousing.  It can be expensive, especially when running complicated queries to generate your dashboards. And if your application allows for exploring and filtering your data in user-specific ways, you can’t rely on a common data caching layer across users.  With Snowflake (and others) providing a pay-as-you-go cost structure, the more you explore the data, the more it will cost you. By using Snowflake Data Sharing, consumers bring their own Snowflake account (and therefore pay for their own warehouse costs) while using the data you’ve shared.  They will get charged for only as much compute as they use and you don’t have to worry about that expense. At the same time, you’ve written your application in a way that it can be used with any shared copy of your data (more on that below).

3) Securing Data – Many times you want to share a strict subset of data with individual consumers.  For example, you may want to share insights about your data that relates to one partner but not show them data relating to a different partner.  Snowflake Data Sharing makes this easy, and we would argue that this is the correct level for this security to be enforced. Otherwise, your data application would have to enforce the role based filtering as a client.

The Flow

What is needed is a powerful data platform that sits on top of Snowflake that can make this possible as well as a data sharing tool that leverages the features of both.  In our experience, Looker is uniquely positioned as a data platform that can leverage the Snowflake Data Sharing feature because of its ability to define reusable models with LookML and to parameterize both queries and connections.  

Here is how the stack might look if you wanted to create a shared data application while also leveraging BYOW using Snowflake Data Sharing:

To understand how Looker + Snowflake create a powerful data sharing platform, it’s important to first consider what the ideal flow would look like.

A Data Provider:

  1. Identifies a set of tables or views they intend to share.
  2. Builds a collection of dashboards and visualizations in a parameterized way on top of those tables using a data platform such as Looker.
  3. Creates a “share” using Snowflake Data Sharing for those tables and views and shares them with another Snowflake account
  4. Shares the dashboards and visualizations using a platform such as White Label Data which can manage the collection of user parameters and Snowflake account information necessary to oversee the end-to-end flow.

A Data Consumer:

  1. Creates a database in Snowflake using the share that is now available to them
  2. Receives an email invitation from the White Label Data application and follows the steps to create a password to view the data as well as enter their specific Snowflake account information
  3. Views the dashboards and visualizations in the White Label Data application, which embeds the Looker dashboards that now use the specified Snowflake connection information

Looker As the Ultimate Data Platform for Snowflake Data Sharing

It’s a testament to Looker’s architecture and powerful features that it is able to integrate with Snowflake’s Data Sharing feature in such a seamless way.  In short, the goal is to be able to create a set of dashboards and visualizations for a given dataset, and then to be able to reuse (i.e. share those dashboards) with other consumers who also have access to that dataset in their own Snowflake account (because you’ve shared it with them using Snowflake Data Sharing).  

If you look at many other data analytics platforms (we won’t name names), you’ll notice that the connection details and the specific table and schema names are stored with the visualizations and dashboards.  In other words, if you wanted to create a bar chart, you choose your connection, select your tables, and chart the data. From that point on, that visualization is tied to that specific connection and table. You would have to redo the chart from scratch if you wanted to use a different connection or table (say on a different Snowflake account), even if you knew it used the same schema.  Not so with Looker. Here’s how:

1) Parameterized Connections

You can create a connection in Looker for Snowflake, and then have the hostname, port number, username, password, database name, and schema name be user attributes that vary with each user.  Looker also supports having the warehouse name be a user attribute and passed as an extra field in the JDBC connection. When a user authenticates, the user attributes are set to their designated values. So while these dashboards are using the same connection name (as specified in the LookML model), they will actually be connecting to different data warehouses by user because the specified connection is parameterized.

Looker originally intended this feature to be used for different development stages, say a development warehouse, a staging warehouse, and production.  But it works equally well to point at Customer A’s Snowflake account, Customer B’s Snowflake account, and so on. With Snowflake Data Sharing, each consumer of shared data has the same table names but the connection details, database name and schema names can vary.  Using a parameterized connection, you can use the same LookML model files across consumers and therefore share the dashboards that are built on top of that model.

2) User Attributes in LookML to Segment Caching

Parameterized Connections get you 90% of the way there, but there is one other consideration that is critical to making this function correctly.  In order to securely use Looker’s query caching to reduce load and expense on your data warehouse, you need to ensure that the cache is not being shared across the different Snowflake accounts that are sharing the connection.

By default Looker uses this approach to determine whether to read from a cache or make a new query: When a new query is written, the cache is checked to see if the exact same query was previously run. All fields, filters, and parameters must be the same, including things such as the row limits. If the query is not found, then Looker runs the query against the database to get fresh database results (and those results are then cached).

Without an additional deliberate step to make queries different across different Snowflake targets, the queries will appear to be identical because it’s the exact same SQL that is generated.  Further, because Snowflake Data Sharing allows for using row-level filters in secure views such that each consumer may have access to a subset of data for the same schema, it’s important that the cache is not shared across consumers. After all, a cached query from one Snowflake account could have different data than another Snowflake account.  

In order to ensure that the cache is not being shared across users who are not sharing the same Snowflake attributes for their connection, we need to add some magic into the Looker model to make that happen.  Fortunately, Looker makes this really easy in LookML. You can simply add something like the following to each of your explores:

sql_always_where: '{{ _user_attributes['snowflake_host'] }}' = '{{ _user_attributes['snowflake_host'] }}' ;;

This adds a where clause that always evaluates to true, but provides uniqueness in each query made in the model.  It effectively segments all queries by the value of the user attribute “snowflake_host”. You could segment based on different attributes that make sense in your use case.  However, this ensures that the cached data you are reading is from a query to your Snowflake host and not someone else’s Snowflake host.

Putting it All Together with White Label Data Sharing

Once you have built your dashboards on top of your dataset using Looker and parameterized your Snowflake connections as specified above, you still need to create a data application that can be used as a vehicle for sharing your insights.  In order to write a shared data application that can also take advantage of shared dashboards across different Snowflake accounts, you need an application platform that is capable of supporting Bring Your Own Warehouse using Snowflake Data Sharing.  

White Label Data by Hashpath is a platform that is capable of orchestrating the steps necessary to configure multiple Snowflake accounts as well as the Looker user attributes required to share LookML models across shared Snowflake data.  And it can all be done without writing any code.

Step 1: Spin up a White Label Data application in minutes

White Label Data allows you to embed your Looker dashboards into an application in minutes without writing any code, requiring only simple JSON configuration setup.  Once embedded, the White Label Data application enables you to share dashboards with your own users without requiring them to login to Looker directly.

Although you are sharing your company’s data, it still belongs to your company and should be branded as such. White Label Data makes this possible by letting you fully brand your custom application before sharing. You can select your color scheme, navigation icons, and logo as well as other advanced customizations.  This is explained in more detail here.

Step 2: Invite users to view your dashboards with Bring Your Own Warehouse

Once you have at least one dashboard available in White Label Data, you can invite others to view it using the Invite Form that is provided by White Label Data.  Note this assumes that you have already shared the dataset with this new consumer using the Snowflake Data Sharing feature directly from Snowflake.

After you submit this form, the invited user will receive an automated email from you inviting them to view the data you’ve shared using the dashboards you’ve also shared. This is done using a one-time link that lets them begin the sign-up process.

Step 3: Consumer configures their Snowflake connection on their first login to the White Label Data application

After clicking on the link in the email that they received, the invited user (a.k.a. the consumer), follows the steps to create a password for accessing the data as well as to configure the specific Snowflake connection.  

In this example, the consumer has already received a “share” of the data via Snowflake Data Sharing.  As part of receiving that shared data, the consumer is required to create a database in Snowflake that is a read-only pointer to that shared data store.  As specified by Snowflake, this database has certain restrictions such as it cannot be modified, shared again, or copied. Otherwise, it has all the normal properties of a database and can be queried directly from the consumer’s account.  In this case, the consumer has called their database CONSUMER1_DATABASE, and they created a warehouse called CONSUMER1_WAREHOUSE in Snowflake.

Step 4: You’re Done!

After completing the connection setup, the new consumer is directed to the main navigation page of the White Label Data application.  From there they can view the dashboards that the data provider has created, using their own Snowflake account.

If you would like more information on how to use White Label Data with Looker and Snowflake, or to discuss data sharing in general, please drop us a line at hello@hashpath.com

Share this post

Share on facebook
Share on google
Share on twitter
Share on linkedin
Share on pinterest
Share on print
Share on email

Get OUR latest Data Ideas

Hashpath is a Product+Analytics consultancy helping clients build powerful internal and customer-facing data applications.