Snowflake + Fosfor Spectra user-defined functions and expressions

Reading Time: 2 minutes

In this age of data, it is growing prominently with each passing day. As the data volume increases, there is a need for something that can store and process such an amount of data more efficiently. The most popular choice to cater to this data need is Snowflake.

Fosfor Spectra is a modern enterprise DataOps platform covering major data connectors and a wide range of transformation capabilities. Spectra’s push-down optimization is the best solutions and services when the source and target of the data is Snowflake. Snowflake is a cloud-based data warehouse or Relational Database Management System (RDBMS).

In general, all RDBMS provides SQL functions to cover common operations like concatenation, count etc. Such functions are known as in-built or system-defined functions.

User-defined function

RDBMS allows its users to perform operations that are not available through the system-defined functions by allowing them to create user-defined functions. User-defined functions are usually written for the requirement of their users. Any complex logic can be implemented through user-defined functions. In other words, user-defined functions provide a mechanism for enhancing the RDBMS server by adding more functions to it.

Expression

Expressions are formulas written within SQL queries to perform some arithmetic, logical or string operations. Simple expressions can be columns, values, or functions. Complex expressions can be formed by joining two or more simple expressions with the help of arithmetic or logical operators.

Importance of UDF (User-defined function) in data integration pipeline

With UDF, Data scientists can:

  • Carry out complex transformation logic as per users’ requirements.
  • Create a wrapper for the system-defined functions to avoid syntax-related issues during data migration.
  • Parameterize UDFs to use the interface at multiple places with different inputs.

Importance of expressions in data integration pipeline

UDFs and expressions are two different things. UDFs can contain multiple statements, but an expression is a single statement. When it is evaluated, it generates a single value.

Expressions in data integrations ensure that software users can:

  • Create complex formulas with a name.
  • Save the cost of creating UDFs in target environments.
  • Use portable expressions.
  • Use portable expressions.

Snowflake user-defined functions and expressions

Snowflake currently supports the following languages for creating UDFs:

  • SQL
  • JavaScript
  • Java

These may return scalar or tabular results. For example, one can use Snowflake’s UDF to check if a given year is a leap year:

create or replace function IS_LEAP_YEAR(year INTEGER)

returns BOOLEAN

as

$$

select year % 4 == 0 AND year % 400 != 0 AND year % 100 != 0; $$ ;

Snowflake expressions:

Snowflake SQL follows ANSI SQL standard. Snowflake SQL expressions are a combination of:

  • Columns
  • Values
  • System-defined functions
  • User-defined functions
  • Operators adhering to the rules provided by ANSI SQL

Some examples of expressions include:

  • Expressions to check whether year is leap or not
    year % 4 == 0 AND year % 400 != 0 AND year % 100 != 0
  • Expression to calculate gross salary:
    (basic_salary + hra + (earned_points * bonus)) – income_tax

Snowflake UDFs in Fosfor Spectra:

Spectra provides some predefined Snowflake UDFs that can be created during the Snowflake warehouse or spark cluster initialization in Spectra. Some UDFs are mentioned below.

# UDF Description Run Engine
1 SOUNDEX(V String) Implements Soundex Algorithm for lookup operations Snowflake, SPARK
2 LEVENSHTEIN(STR1 String, STR2 String, FLAG Boolean) Implements Levenshtein Algorithm for lookup operations Snowflake, SPARK
3 CONVERT(COL float, FROM_DEMO STRING, TO_DEMO STRING) Covert column value to kilometre, centimetre etc. Snowflake, SPARK
4 UNZIP(binaryArray) Unzip binary content SPARK
5 DECODE(binaryArray, charset) Decode content SPARK

Snowflake expressions in Spectra:

Spectra provides classic UI (User Interface) to create and maintain domain-specific expressions. Expressions can be written using ANSI SQL format. The user is allowed to validate and test expressions before using them. Users will be able to see the list of components that will be affected when the expression is modified.

Predefined expressions

Fosfor Spectra provides some predefined expressions. The user is allowed to create UDFs as per their domain knowledge. The following are some predefined expressions available in Spectra for the manufacturing industry.

# Expression Name Domain Sub-Domain Expression
1 INVENTORY_DAYS_OF_SUPPLY Manufacturing Supply Chain Inventory_on_Hand/Average_Daily_Usage_of_Inventory
2 INVENTORY_TURNOVER Manufacturing Supply Chain Cost_of_Goods_Sold / ((Opening_Stock – Closing_Stock) / 2)
3 GROSS_MARGIN_RETURN_ON_INVESTMENT Manufacturing Supply Chain (Gross_Profit) / ((Opening_Stock – Closing_Stock) / 2) * 100
4 DAYS_SALES_OUTSTANDING Manufacturing Supply Chain (Account_receivables / Sales) * Days_in_Period
4 FILL_RATE Manufacturing Supply Chain ((Total_Number _Of_Items – Number_of_Shipped_Items) /Total_Number_of_Items) * 100

What do Snowflake and Spectra offer stakeholders for their UDFs and expressions?

1. Cost optimization

Usually, an additional cost is required to create UDFs in Snowflake. Spectra will help to avoid this cost. Furthermore, expressions will be stored in Spectra’s backend database while no additional charge is needed to validate the expressions.

2. Easy Maintainance

Because Fosfor Spectra provides and creates the UI for all expressions-related operations, the system is easy to maintain.

3. No limit on the number of expressions

Users can create many expressions as they want.

4. Reusability

Once an expression is created, users can employ it at multiple places like data pipelines, parameters schedules etc.

5. Portable expressions and user-defined functions

Software developers can quickly transfer their data from one platform to another.

6. Avoidance of syntax errors during data migration

With Snowflake and Fosfor Spectra, the same functionality can be available in multiple RDBMS, with a highly restricted likelihood of syntax errors.

7. Concurrent expressions creation

To help identify issues at an early stage, the duo’s concurrent expression creation discovers defects/bugs before the delivery to the client. This guarantees the quality of the data pipelines.

8. Impact predictability for changes before expression modification

With Snowflake and Fosfor Spectra, the user will be able to see data pipeline simulations of changes before expression modification.

Author

Prashant Sutar

Technical Lead, Spectra by Fosfor

Prashant is a Technical Lead at Spectra. He has over 6.5 years of experience in designing, implementing, and installing software solutions. He has also developed high-quality software designs and architectures. He has hands-on experience with technologies like Java, Spring Boot, Apache Spark, Snowflake, Google BigQuery, MySQL, PostgreSQL, Elasticsearch, Docker, Kubernetes, AWS, Fluentd, JavaCC and Hashicorp Vault.

For the past few years, he has contributed to the end-to-end development of many significant features on the Spectra product roadmap and contributed to the product's automation of product packaging activities. Additionally, he has experience in domains like MDM. He loves to explore new technologies and experiment with different tech stacks. He is a regular contributor to stackoverflow.com, having more than 1500 reputations. He is also a contributor to the popular open-source project JSQL parser.

Latest Blogs

See how your peers leverage Fosfor + Snowflake to create the value they want consistently.

Making your Snowflake pipeline robust with Fosfor Spectra

"How can I avoid constantly jumping between Snowflake UI and Spectra UI to know what transformations would be apt for this data pipeline I am trying to configure on Fosfor Spectra?" 

Read more

Market share and aisle share: Category analytics for CPG Industry

Consumer Packaged Goods (CPG) companies, rarely have visibility of their market, competitors, or selling patterns because they depend on retailers as their end-customer. The presence of retailers in the process of selling CPG products to end consumers creates the need for multiple varied data access for CPG companies.

Read more

Overall equipment effectiveness: is it still relevant in the industry 4.0 era?

Industry 4.0 builds upon the enterprise-wide automation stack (that characterized Industry 3.0), focusing on integrating more (and newer) technologies, like IIoT and data science, into the production environment. The goal is to blur the chasm between the physical (Operational Technology) and the digital (Information Technology) world - creating a fully connected, integrated, data-driven, and autonomous digital factory.

Read more