Optimizing advertisement campaign outcomes with the Snowflake Data Cloud & Refract

Reading Time: 4 minutes

Global advertisers use the concept of “Sales Lift” to estimate the increase in sales resulting from a specific advertisement and to assess the effectiveness of their advertising efforts. This approach enables advertisers to make informed decisions regarding the allocation of advertising budgets, the selection of ad types, and the targeting of their ads to maximize impact. However, it is crucial to account for confounding variables and their impact on the success or failure of these ad campaigns.
Confounding variables – demographic factors like age or income – have the potential to influence both advertising exposure and sales figures. Failing to effectively monitor these variables may lead advertisers to incorrectly attribute an increase in sales to the advertisement.

In this use case, we will demonstrate how the Snowflake platform along with Refract can be used in this scenario for data analysis and model training. By understanding how certain confounding variables can predict whether a user will purchase a product, advertisers can make more accurate predictions.

Refract by Fosfor is built to enable new-age data-to-decision teams; it accelerates and automates the entire Data Science, Artificial Intelligence (AI), and Machine Learning (ML) life cycle, from data discovery to modeling, deployment, and monitoring. Refract addresses the needs and challenges of different user personas by fostering collaboration and making data consumption for decision-making simple and effortless.

Additionally, the Refract platform is employed for hosting and post-production monitoring of the model. To help business users contextualize the insights, we have developed a Streamlit app that allows them to grasp the intricacies of the data and appreciate the benefits. We have kept the solution simple by not predicting the sales lift itself, but rather predicting whether the user is going to make the purchase given some confounding parameters. We have also ensured that the solution is visually intuitive, so that even non-technical users can understand the data.

Note: This solution is inspired by and built on top of another article written by our strategic technology partner, Snowflake. The article offers an excellent explanation of the dataset that was built and showcases some of the fantastic Snowflake features used in the solution.

Business problem:

Most businesses look to calculate what the sales lift would be, or try to understand the probability of sales conversion for a particular user after consuming a targeted advertisement. To keep the problem statement simple, we used only a few of the confounding parameters (age band, marital status, whether they saw the ad, etc.) to predict if the user would make the purchase. A few other “non-confounding” parameters were also used to make the model more robust, such as the channel and form of advertisement, and other demographic data (like employment details). Perhaps even a market mix model can be used in a future version of this solution.

Refract by Fosfor:

In this example, data manipulation and modeling have been executed on Snowflake to preserve data security, while Refract provides an integrated platform to track and consume the model’s performance through a Streamlit app. This app provides a direct UI to consume the model in a what-if type of analysis and to understand the data using dynamic visualizations.

Key benefits of the proposed solution:

  • Powerful, flexible data science pipelines built through Snowpark for Python
  • Quick visualizations of campaign facts & figures through Streamlit (quicker turnaround for everyday visualizations)
  • Interactive storytelling with quick app-style deployment of notebooks

The image below represents the solution workflow and illustrates how the Streamlit app is connected and integrated on Refract.

Image 1: Refract solution workflow

The data:

  • To understand and predict user purchase patterns, we have mapped user demographics such as marital status, age band, type of household, etc.
  • We also have details on whether they are already a customer of the brand
  • We tracked whether the user has watched the ad
  • We also mapped which users purchased the product, given the other confounding variables

The process:

Step 1: We used Refract’s integrated Jupyter notebooks to connect to Snowflake and start a session. As the Snowpark API requires Python 3.8, it helps that Refract gives users the option to freely choose their desired version of Python. Refract also has pre-created templates catering to a wide range of use cases.
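As a minimal sketch, starting a Snowpark session from a notebook looks roughly like this. All connection values are placeholders to be replaced with your own account details, and the Snowpark import is kept inside the function so the sketch loads even where `snowflake-snowpark-python` is not installed:

```python
# Placeholder connection details - replace with your own account values.
CONNECTION_PARAMETERS = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "role": "<role>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

def create_session():
    # Local import: the sketch stays importable without Snowpark installed
    from snowflake.snowpark import Session

    # Builds a Snowpark session against the configured Snowflake account
    return Session.builder.configs(CONNECTION_PARAMETERS).create()
```

In practice you would load the credentials from a secrets store rather than hard-coding them in the notebook.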

Step 2: We created a data pre-processing pipeline for simple manipulations like missing-value imputation, data scaling, and one-hot encoding, followed by a model training pipeline consisting of a Random Forest algorithm and a grid search to find the best parameters. This entire code is written in a separate .py file as Python functions, which is then sent to the Snowflake stage you will be using. The training function is then registered as a stored procedure using Snowpark.

Using Snowpark’s file upload API (session.file.put), you can send any file to your Snowflake stage.
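As an illustrative sketch of the training function that goes into the .py file, the pre-processing and training pipeline described in step 2 could be assembled with scikit-learn along these lines. The column names and grid values here are assumptions, not the exact ones from our solution:

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature columns - substitute the real ones from your table
NUM_COLS = ["income"]
CAT_COLS = ["age_band", "marital_status", "saw_ad"]

def build_pipeline() -> GridSearchCV:
    # Pre-processing: impute missing values, scale numerics, one-hot
    # encode categoricals
    pre = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), NUM_COLS),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), CAT_COLS),
    ])
    model = Pipeline([("pre", pre),
                      ("rf", RandomForestClassifier(random_state=0))])
    # Grid search over a small, illustrative parameter grid
    grid = {"rf__n_estimators": [50, 100], "rf__max_depth": [3, None]}
    return GridSearchCV(model, grid, cv=3, scoring="accuracy")
```

Inside the stored procedure, the fitted `best_estimator_` would then be serialized and written back to the stage for later scoring.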

Image 2: Code for registering the model training pipelines as a SPROC (stored procedure) in Snowflake.

The above piece of code registers the model training pipeline as a stored procedure in Snowflake, where the training will happen. Do remember to mention the Python modules you will be using in your training pipeline, and if your pipeline has multiple functions, specify the main function as the handler of the SPROC.
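A sketch of such a registration call, using Snowpark’s `register_from_file`; the stage path, file name, procedure name, and argument types are illustrative assumptions:

```python
# Python modules the training pipeline needs inside Snowflake
PACKAGES = ["snowflake-snowpark-python", "scikit-learn", "pandas"]

def register_training_sproc(session):
    # Local import so the sketch loads without Snowpark installed
    from snowflake.snowpark.types import StringType

    return session.sproc.register_from_file(
        file_path="@ml_stage/train.py",        # staged training code
        func_name="train_model",               # handler (main) function in the file
        name="TRAIN_PURCHASE_MODEL",
        return_type=StringType(),
        input_types=[StringType(), StringType()],  # e.g. table name, excluded columns
        packages=PACKAGES,
        is_permanent=True,
        stage_location="@ml_stage",
        replace=True,
    )
```

Because the code lives on a stage, the return and input types must be declared explicitly rather than inferred from type hints.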

Image 3: Code for triggering the stored procedure

Using the above code, you can trigger the training stored procedure while specifying the variables that you don’t want to use in your training.
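Triggering the stored procedure might look like the following sketch; the procedure name, table name, and excluded columns are illustrative assumptions:

```python
# Columns to leave out of training (assumed names: the ID and the label)
EXCLUDE_COLS = ["USER_ID", "PURCHASE_FLAG"]

def run_training(session, table: str = "AD_CAMPAIGN_DATA") -> str:
    # session.call runs the SPROC inside Snowflake's own compute,
    # passing the source table and the comma-separated exclusion list
    return session.call("TRAIN_PURCHASE_MODEL", table, ",".join(EXCLUDE_COLS))
```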

Step 3: Like step 2, you can create a separate file defining your prediction functions, send that file to your Snowflake stage, and then register the functions in Snowflake using the below code.

Image 4: Code for registering the model prediction function as a UDF (User Defined Function) in Snowflake.

Here, we register the prediction file as a Snowpark UDF, following the same pattern of imports and handler specification.
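A sketch of the UDF registration, mirroring the SPROC registration in step 2; the names, feature count, and types are illustrative assumptions:

```python
# Packages the prediction function needs inside Snowflake
UDF_PACKAGES = ["snowflake-snowpark-python", "scikit-learn", "pandas"]

def register_prediction_udf(session, n_features: int = 8):
    # Local import so the sketch loads without Snowpark installed
    from snowflake.snowpark.types import IntegerType, StringType

    return session.udf.register_from_file(
        file_path="@ml_stage/predict.py",      # staged prediction code
        func_name="predict_purchase",          # handler function in the file
        name="PREDICT_PURCHASE",
        return_type=IntegerType(),             # 1 = purchase, 0 = no purchase
        input_types=[StringType()] * n_features,  # one entry per feature column
        packages=UDF_PACKAGES,
        is_permanent=True,
        stage_location="@ml_stage",
        replace=True,
    )
```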

Image 5: Code for triggering the UDF

We can call the UDF to trigger predictions on an entire Snowflake table (sdf), and we can write the predictions back to a Snowflake table.
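Scoring a whole table with the registered UDF could be sketched as follows; the table, column, and output names are illustrative assumptions:

```python
# Columns that are not model features (assumed names)
EXCLUDED = ("USER_ID", "PURCHASE_FLAG")

def feature_columns(all_columns):
    # Keep every column except the ID and the label
    return [c for c in all_columns if c not in EXCLUDED]

def score_table(session, table="AD_CAMPAIGN_DATA", out_table="AD_PREDICTIONS"):
    # Local import so the sketch loads without Snowpark installed
    from snowflake.snowpark.functions import call_udf, col

    sdf = session.table(table)
    cols = feature_columns(sdf.columns)
    # Apply the UDF row-wise across the whole table inside Snowflake
    scored = sdf.with_column(
        "PREDICTION", call_udf("PREDICT_PURCHASE", *[col(c) for c in cols])
    )
    # Persist the predictions back into a Snowflake table
    scored.write.mode("overwrite").save_as_table(out_table)
    return scored
```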

Step 4: Once the model training and predictions are done, we brought the model back to Refract, along with all the necessary files for scoring, and registered it in Refract for monitoring and consumption.

Refract ML is a module that makes the registration of models seamless – just provide the model file and some metadata around it, and you will be able to use Refract’s one-click-deployment and post-deployment monitoring features.

Registering models on Refract comes with a lot of benefits:

  • You get the visual representation of all the build-time metrics
  • Your model’s decision-making is explained
  • Automatic API creation for basic consumption of the model
  • Model monitoring can be scheduled to look out for any possible data drift
  • Multiple versions can be deployed and compared
  • Option to choose the compute power while deploying the model, giving you better control on resource utilization

Step 5: Once the model is deployed on Refract, you will get a link to the API where you can provide the input payload and get predictions. For small-scale testing, though, you can directly consume the model in Refract itself.
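Calling such a scoring API from a client might look like this sketch; the URL and payload shape are hypothetical placeholders, since the real endpoint and schema are provided by Refract on deployment:

```python
import json
from urllib import request

# Hypothetical endpoint - Refract supplies the real URL after deployment
SCORING_URL = "https://<your-refract-host>/model/<model-id>/score"

def build_request(payload: dict) -> request.Request:
    # Wrap the feature values as a JSON body (assumed payload shape)
    body = json.dumps({"payload": payload}).encode("utf-8")
    return request.Request(
        SCORING_URL, data=body, headers={"Content-Type": "application/json"}
    )

def predict(payload: dict) -> dict:
    # POST the payload and decode the JSON prediction response
    with request.urlopen(build_request(payload)) as resp:
        return json.load(resp)
```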

Campaign performance measurement through a digital application

We have built and deployed a Streamlit-based app, which offers users a technical and visual understanding of the data, so that anyone can understand the relationships between the variables and the drivers of users’ purchase behavior.

Image 6: Data Profile tab – shows snippets of the actual data and the technical data profile.

The Data Profile tab of the app gives you a snippet of the data so that you can understand the data you are using. Below the data, you get the technical profile of all the variables present in the data – the distributions, cardinality, missing values, correlations, etc. This type of data profiling enables you to better understand the data and decide what kind of pre-processing is needed for better modeling.
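As a minimal sketch of the kind of summary such a tab computes (the app itself may profile the data differently), a per-column technical profile can be built with pandas:

```python
import pandas as pd

def technical_profile(df: pd.DataFrame) -> pd.DataFrame:
    # Per-column dtype, cardinality, and missing-value counts - the kind
    # of summary shown on the Data Profile tab
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "cardinality": df.nunique(),
        "missing": df.isna().sum(),
    })
```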

Image 7: Know your Data tab – Visual representation of the data in form of graphs to better show the relationships between the variables.

The Know your Data tab provides a visual representation of all the variables, with the best graph representation possible for each variable’s data type.
In the above charts, looking at the variables’ distribution – particularly the age band distribution – it is evident that either the data under consideration was a little biased or the study was focused on middle-aged people.

This app also enables the user to consume the trained model in a what-if analysis type of fashion, to predict the user’s purchase given a certain set of confounding parameters.

Image 8: Model and Predictions tab – it provides information about the model performance and facilitates a what-if analysis setup for the user.

On the left-hand side of the UI (as seen in the image above), you can change the given parameters and see their effect on the predicted purchase for a user. Along with this, we get the important features, the confusion matrix, and model metrics such as accuracy, precision, and recall. Having all this information in a single place enables users to better understand the data and the model, to come up with probable purchase scenarios, and to understand the relationships between the confounding variables.


This solution was built upon very distinctive ad campaign measurement data. Information such as the type and channel of advertisement can also be used to better understand the effects of advertising channels in these scenarios. These confounding-variable models, along with sales optimization algorithms (MMM models), can therefore make a real difference in estimating sales lift.
Want to learn more? Contact our team at Refract by Fosfor to set up your free consultation today!


Ayush Kumar Singh

Specialist – Data Scientist | Fosfor (Refract) | LTIMindtree.

Ayush Kumar Singh has 6+ years of experience in executing data-driven solutions. He is proficient in machine learning and deep learning and is adept at identifying patterns and extracting valuable insights. He has a remarkable track record of delivering end-to-end Data Science projects.
