
Making your Snowflake pipeline robust with Fosfor Spectra

Reading Time: 4 minutes

Q: “How can I avoid constantly jumping between the Snowflake UI and the Spectra UI to know which transformations would be apt for the data pipeline I am trying to configure on Fosfor Spectra?”

Q: “How do I ensure that the pipelines I am developing do not affect my production databases, while still being able to see the results of my pipeline execution without writing to an actual table?”

Q: “I am done with testing my pipeline. Now, I want to have a robust, production-ready version of my flow. How can I achieve this?”

When creating Snowflake data pipelines on Fosfor Spectra, these common questions come to mind. In this article, let's find out how Spectra helps its users construct their Snowflake pipelines, make them robust, and ensure that they are production-ready.

We will cover the following topics in this article:

  • Flow configuration and what nodes we will use.
  • How to see sample data of tables in Fosfor Spectra.
  • Using the peek node to validate the output of our flow.
  • Saving a production-ready version with an actual writer node.

Before we start, if you are not familiar with Spectra and/or Snowflake, I would urge you to get acquainted with the basics of both before reading on.

Now that we are familiar with the context, let’s begin.

Flow configuration and what nodes we will use

Consider a straightforward use case with two tables – Customer and Nation. I have chosen these tables from Snowflake’s sample data share SNOWFLAKE_SAMPLE_DATA and copied them into my own Database and Schema. 
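If you would like to reproduce this setup, the copy boils down to a couple of CREATE TABLE ... AS SELECT statements run in Snowflake. The sketch below assumes the TPCH_SF1 schema of the sample share and a target database and schema named MY_DB.MY_SCHEMA; substitute your own object names.

-- Copy the sample tables into a working area so the pipeline never touches the shared data.
CREATE OR REPLACE TABLE MY_DB.MY_SCHEMA.CUSTOMER AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER;

CREATE OR REPLACE TABLE MY_DB.MY_SCHEMA.NATION AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION;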

Here's what the structure of the Customer and Nation tables looks like:
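Both tables come from Snowflake's TPC-H sample data, so their columns follow the standard TPC-H layout. The sketch below is for reference only and the types are approximate; the columns that matter for this flow are C_CUSTKEY, C_NATIONKEY, N_NATIONKEY, and N_NAME.

-- Reference sketch of the two tables (standard TPC-H layout; no need to run this,
-- the copies created above already have this structure).
CREATE TABLE CUSTOMER (
    C_CUSTKEY    NUMBER(38,0),   -- customer key, counted per nation later
    C_NAME       VARCHAR(25),
    C_ADDRESS    VARCHAR(40),
    C_NATIONKEY  NUMBER(38,0),   -- join key to NATION
    C_PHONE      VARCHAR(15),
    C_ACCTBAL    NUMBER(12,2),
    C_MKTSEGMENT VARCHAR(10),
    C_COMMENT    VARCHAR(117)
);

CREATE TABLE NATION (
    N_NATIONKEY  NUMBER(38,0),   -- join key from CUSTOMER
    N_NAME       VARCHAR(25),    -- grouping column
    N_REGIONKEY  NUMBER(38,0),
    N_COMMENT    VARCHAR(152)
);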


To achieve our use case, we will need the following nodes on Spectra:

Two Snowflake readers

In this case, one reader points to the Customer table and the other to the Nation table. Configuring a reader node and checking its sample data from within Spectra is an effortless process. All you have to do is drag a Snowflake reader node onto the canvas; it can be found in the left panel under the READER section.

Once it is on the canvas, the process is as simple as selecting your connection, database, schema, and table, and then clicking the Validate button.

How to see sample data of tables in Spectra

Awesome, we have configured our readers, and now you may be wondering, “How do I know what data is in these tables that I have configured my readers with? Do I have to go to my Snowflake UI to find out?” 

The answer is a very reassuring “NO.” This brings me to Spectra's Sample Data feature. After validating the reader node as seen in the previous step, all you have to do is click on the Data tab in the bottom panel and give it a few seconds.

This will fetch a sample of the data stored in your Snowflake table within the Spectra UI itself.
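Conceptually, this preview is just a small, limited query against the reader's table, something along the lines of the sketch below (the exact query and sample size Spectra uses are internal to the product):

-- Roughly what the Data tab shows: a small preview of the reader's table.
SELECT *
FROM MY_DB.MY_SCHEMA.CUSTOMER
LIMIT 100;   -- illustrative sample size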

Join node

We will add a simple inner join to join the Customer and the Nation table on the NATIONKEY columns (JOIN ON C_NATIONKEY = N_NATIONKEY).

Additionally, from the INPUT section of our Join node, let's drag the C_CUSTKEY column under the first Snowflake reader (corresponding to the Customer table) and the N_NAME column under the second Snowflake reader (corresponding to the Nation table) into the OUTPUT section of our Join node.
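In plain SQL, the Join node as configured above corresponds to something like this (database and schema names as assumed earlier):

-- Inner join on the nation key, keeping only the two columns dragged into
-- the OUTPUT section of the Join node.
SELECT c.C_CUSTKEY,
       n.N_NAME
FROM MY_DB.MY_SCHEMA.CUSTOMER c
INNER JOIN MY_DB.MY_SCHEMA.NATION n
  ON c.C_NATIONKEY = n.N_NATIONKEY;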

Group by node

Let's add a Group by node and group the data on N_NAME (the nation name column), and add an aggregation to count the number of customers in each nation. We use COUNT(C_CUSTKEY) for this in the Aggregation sub-section under the PROCESS section of our Group by node.
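Taken together, the Join and Group by nodes express a transformation roughly equivalent to the query below (the CUSTOMER_COUNT alias is mine; Spectra names the aggregated column according to its own configuration):

-- Number of customers per nation: join, then group by nation name.
SELECT n.N_NAME,
       COUNT(c.C_CUSTKEY) AS CUSTOMER_COUNT
FROM MY_DB.MY_SCHEMA.CUSTOMER c
INNER JOIN MY_DB.MY_SCHEMA.NATION n
  ON c.C_NATIONKEY = n.N_NATIONKEY
GROUP BY n.N_NAME;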

Peek node 

Finally, we will add a Peek node onto the canvas to test the pipeline we created. As the name suggests, the Peek node is a process node that provides a quick glance at the data that would potentially be written by an actual writer at a particular point in the flow; in this case, after the Group by process node that we configured.

This gives us the flexibility to check the data being produced at a particular pipeline stage without writing the data to an actual target table.

Using the peek node to validate the output of our flow

As with the sample data on the reader node, Spectra gives you an option to view the peek node's sample data.

The flow must first be executed successfully to get the sample data at any peek node. The data can then be viewed on the flow's latest run history.

To do this, simply submit the flow to your configured Snowflake Warehouse for execution.

Once it is successfully executed, you should be able to see a successful entry in the run config tab as below.


Clicking on this entry will take you to the execution instance of this flow. Double-click on the Peek node in the canvas (or right-click > Configure). This will enable the Data section in the bottom panel, and clicking on the Data tab will then bring up your sample data.

There we go! You have successfully configured a Snowflake pipeline on Spectra and tested its output without creating an actual table with the writer node.

Saving a production-ready version with an actual writer node

You may now be presented with this thought, “But I need to convert this proof of concept into an actual production pipeline… How do I do that?”

Well, the answer is simple. Remove the peek node and drag a Snowflake writer in its place. A few clicks to configure and run the flow, and voila: in no time, you will have written the output of your pipeline to an actual table.
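Expressed directly in Snowflake, the writer step amounts to materializing the same result into a target table, for example (the target table name is illustrative):

-- Equivalent of the Snowflake writer: persist the pipeline output to a table.
CREATE OR REPLACE TABLE MY_DB.MY_SCHEMA.CUSTOMERS_PER_NATION AS
SELECT n.N_NAME,
       COUNT(c.C_CUSTKEY) AS CUSTOMER_COUNT
FROM MY_DB.MY_SCHEMA.CUSTOMER c
INNER JOIN MY_DB.MY_SCHEMA.NATION n
  ON c.C_NATIONKEY = n.N_NATIONKEY
GROUP BY n.N_NAME;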

Save a version of the flow to keep it safe from unwanted edits

At this point, we have successfully configured our Snowflake pipeline by leveraging some beneficial features of Fosfor Spectra. Now, we must also ensure that the production version of our pipeline is stable and no changes are made by mistake. In short, make it robust.

Behold! Spectra’s flow versioning feature

Yes, that is right! Spectra allows you to save versions of your flow that you can easily fall back to whenever you wish. This lets you keep the production version of your flow intact while fearlessly making edits. If you ever want to go back, rest assured: the saved version is kept safe and sound, just a few clicks away.

To save a version of your flow, find the Save as Version button on the top right corner of the screen, right beside the Save button. Enter a description for your version and click on Publish. 

You may save multiple versions of a flow and view or check out any one of those you have saved.

To view or edit your published flows, simply click on the Versions tab in the bottom panel, and you will see all the versions you have published. You can then edit or even publish an older version of the flow.

Author

Ruvesh V.

Senior Product Engineer, Spectra by Fosfor

Ruvesh is a senior product engineer at Spectra. He has over 2.5 years of experience in designing and developing robust data-driven applications with technologies like Spring Boot, React.js, Apache Spark, Snowflake, Docker, and Kubernetes. For the past two years, he has contributed to the end-to-end development of many major features on the Spectra product roadmap and helped automate the product's packaging activities. Additionally, he has experience in domains like e-commerce and Android application development, and he loves to explore new technologies and experiment with different tech stacks.
