Large Language Models

Reading Time: 4 minutes

Refract: An Overview

Refract is a DSML platform that helps multiple personas, such as Data Scientists, ML Engineers, Model Quality Controllers, MLOps Engineers, and Model Governance Officers, work seamlessly together on any AI use case. Refract accelerates each stage of the ML lifecycle, including data preparation, model development, model deployment, scoring, and monitoring on Snowflake data.

Large Language Models (LLMs)

Large Language Models are advanced artificial intelligence models that have been trained on vast amounts of text data to understand and generate human-like text. These models are designed to process natural language input, enabling them to understand and generate response text in a way that is contextually relevant and coherent.

LLMs, such as GPT-3 (Generative Pre-trained Transformer 3) developed by OpenAI, have achieved significant advancements in natural language processing and have been applied to a wide range of tasks, including text completion, language translation, question-answering, and more. They can understand and generate text in various languages and can be fine-tuned for specific applications and domains.

The OpenAI API is a cloud-based service provided by OpenAI: you make requests to OpenAI's servers to access the language models and receive responses; you do not host the API infrastructure or models on your own servers. The API can be integrated into your applications or services by making HTTP requests to the OpenAI servers.

Limitations of Cloud-hosted private LLMs

Let us discuss a couple of limitations of such cloud-hosted LLMs.

  • Data privacy and security: The user may need to send input data (which can be sensitive) to the cloud servers where the LLM is hosted for processing, and this may directly violate enterprise compliance with relevant data protection regulations.
  • Lack of transparency: Due to the intricate nature of such models, it can be difficult to understand how they handle data and use information for decision-making. This opacity can lead to safety and privacy concerns.

Solution: Self-hosted, open-source LLMs

Several of the LLMs released in recent years have come out of the open-source community. Usually, these open-source models are released with pre-trained weights, obtained by training on generic data such as Wikipedia's crowd-sourced content. These models can either be deployed directly for consumption, or they can be fine-tuned on custom data before deployment.

As the title of this blog suggests, these open-source models can be fine-tuned or trained on Refract and deployed in Snowflake.

In this blog, I will provide a clear guide on how to fine-tune a T5 model on Refract for text summarization. In this example, we use PyTorch and Hugging Face Transformers to fine-tune the model on a news dataset. After fine-tuning, the model can generate summaries of long news articles.

This blog will also illustrate how to deploy the model as a Python UDF (User Defined Function) on Snowflake.

Step-by-Step Guide:

Step 1: Create a project on Refract and attach your GitHub repository to it. This will help in managing all your code.

Step 2: Open the project and launch a template for writing the code. Refract offers multiple pre-designed templates, including one specifically for Snowflake. This template comes with all the necessary dependencies for Snowflake and is flexible enough to run on various compute options, including GPU, CPU, or memory-optimized instances. Once you've made your choice, you can launch the corresponding template and get started with your Jupyter Notebook right away. Since we are fine-tuning an LLM, we will opt for the GPU-enabled Snowflake template.

Step 3: Import libraries and set the PyTorch device to GPU. Fine-tuning is a compute-intensive operation; this step enables Refract's GPU to accelerate fine-tuning by parallelizing tasks.
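A minimal version of this setup might look like the following (the exact library set is an assumption based on the stack named above):

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Prefer the template's GPU when available; fall back to CPU otherwise
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```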

Step 4: Use the following class to initialize the dataset. This will help in tokenizing and preparing the data for fine-tuning. The data should have two columns: one is the document of arbitrary length, and the other is its shorter version or summary.
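The original notebook cell is not reproduced here, but a dataset class along these lines would do the job (the column names, the "summarize:" task prefix, and the sequence lengths are illustrative assumptions):

```python
import torch
from torch.utils.data import Dataset

class SummaryDataset(Dataset):
    """Tokenizes (document, summary) pairs for T5 fine-tuning."""

    def __init__(self, dataframe, tokenizer, source_len=512, summary_len=150,
                 text_col="text", summary_col="summary"):
        self.data = dataframe
        self.tokenizer = tokenizer
        self.source_len = source_len
        self.summary_len = summary_len
        self.text_col = text_col
        self.summary_col = summary_col

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # T5 is text-to-text, so a task prefix tells it which task to perform
        text = "summarize: " + str(self.data.iloc[index][self.text_col])
        summary = str(self.data.iloc[index][self.summary_col])

        source = self.tokenizer(text, max_length=self.source_len,
                                padding="max_length", truncation=True,
                                return_tensors="pt")
        target = self.tokenizer(summary, max_length=self.summary_len,
                                padding="max_length", truncation=True,
                                return_tensors="pt")
        return {
            "input_ids": source["input_ids"].squeeze(0),
            "attention_mask": source["attention_mask"].squeeze(0),
            "labels": target["input_ids"].squeeze(0),
        }
```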

Step 5: Add the following two functions for training and evaluation.
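The two functions might be sketched as follows (hyperparameters such as the beam width are illustrative):

```python
import torch

def train_epoch(model, device, loader, optimizer):
    """One pass over the training data using teacher forcing."""
    model.train()
    total_loss = 0.0
    for batch in loader:
        labels = batch["labels"].to(device).clone()
        # Padding positions in the labels must be ignored by the loss
        labels[labels == model.config.pad_token_id] = -100
        outputs = model(input_ids=batch["input_ids"].to(device),
                        attention_mask=batch["attention_mask"].to(device),
                        labels=labels)
        optimizer.zero_grad()
        outputs.loss.backward()
        optimizer.step()
        total_loss += outputs.loss.item()
    return total_loss / max(len(loader), 1)

def evaluate(model, device, loader, tokenizer, max_len=150):
    """Generate summaries for the validation set and return text pairs."""
    model.eval()
    predictions, references = [], []
    with torch.no_grad():
        for batch in loader:
            generated = model.generate(
                input_ids=batch["input_ids"].to(device),
                attention_mask=batch["attention_mask"].to(device),
                max_length=max_len, num_beams=2, early_stopping=True)
            predictions += tokenizer.batch_decode(generated,
                                                  skip_special_tokens=True)
            references += tokenizer.batch_decode(batch["labels"],
                                                 skip_special_tokens=True)
    return predictions, references
```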

Step 6: Load the T5 model. T5, an encoder-decoder model, operates by transforming all NLP problems into a text-to-text format. During training, it utilizes a technique called teacher forcing, which necessitates an input sequence and a corresponding target sequence. The input sequence is provided to the model using input_ids, while the target sequence is modified by shifting it to the right. This involves adding a start-sequence token at the beginning and feeding it to the decoder via decoder_input_ids. In the teacher-forcing approach, the target sequence is extended with an EOS token and serves as the label. The start-sequence token is represented by the PAD token. T5 offers the flexibility of being trained or fine-tuned in both supervised and unsupervised manners. In our case we will fine-tune the model in a supervised manner.

Step 7: Prepare training and validation data loaders.
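One way to prepare the loaders, sketched as a helper so it works with any map-style dataset (the split fraction and batch size are assumptions):

```python
import torch
from torch.utils.data import DataLoader, random_split

def make_loaders(dataset, batch_size=8, val_frac=0.1, seed=42):
    """Split a summarization dataset and wrap it in train/validation loaders."""
    val_size = int(val_frac * len(dataset))
    train_set, val_set = random_split(
        dataset, [len(dataset) - val_size, val_size],
        generator=torch.Generator().manual_seed(seed))
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader
```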

Step 8: Start Training for N number of epochs. You can try starting with N=2 or N=3.
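A sketch of the epoch driver; the training function from Step 5 is passed in as an argument so the snippet stays self-contained, and the names and learning rate here are illustrative:

```python
import torch

def fit(model, device, train_loader, train_epoch, epochs=2, lr=1e-4):
    """Run the Step-5 training function for a few epochs (try N=2 or N=3 first)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(1, epochs + 1):
        avg_loss = train_epoch(model, device, train_loader, optimizer)
        print(f"epoch {epoch}: mean training loss {avg_loss:.4f}")
```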

Step 9: Add the model to Refract's registry to be able to track it as it moves from one stage to another in the ML lifecycle. Refract comes with a Python SDK (Software Development Kit) which can help with registering the model. The user needs to provide a score function and the model to the register_model function. The score function contains the logic to make predictions using the model. Importing the refractml library, writing the score function, and registering the model are covered in the following three cells.
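The score function could look like the sketch below. The request payload shape and the register_model arguments are assumptions for illustration; the exact signature is defined by the Refract SDK documentation.

```python
def score(model, request):
    """Prediction logic for the registered model.

    Assumes `model` is a (tokenizer, t5_model) pair and `request` is a dict
    carrying the document under a "text" key (both shapes are assumptions).
    """
    tokenizer, t5 = model
    inputs = tokenizer("summarize: " + request["text"],
                       return_tensors="pt", truncation=True, max_length=512)
    output_ids = t5.generate(inputs["input_ids"], max_length=150)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Registering with the Refract SDK; argument names below are illustrative:
# from refractml import register_model
# register_model(model=(tokenizer, model), score_func=score,
#                name="t5-news-summarizer")
```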

Save the fine-tuned LLM to Refract's persistent storage. The saved model can be downloaded at any time from Refract and used for scoring. It has two components, the tokenizer and the model; we will save both.
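Both components can be written out with save_pretrained, wrapped here as a helper; the persistent-storage path is left as a placeholder:

```python
def save_artifacts(model, tokenizer, save_dir):
    """Persist the tokenizer and model so they can be downloaded for scoring."""
    tokenizer_path = f"{save_dir}/tokenizer"
    model_path = f"{save_dir}/model"
    tokenizer.save_pretrained(tokenizer_path)
    model.save_pretrained(model_path)
    return tokenizer_path, model_path

# e.g. save_artifacts(model, tokenizer, "<refract-persistent-storage-path>/t5-news")
```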

Step 10: Deploy the fine-tuned LLM on Snowflake. We will use a Snowpark Python UDF (User Defined Function) for deploying the model. The UDF contains the logic to load the model and tokenizer into memory and then generate the response from the input. At this point, the fine-tuned model and tokenizer that were saved to Refract's persistent storage in Step 9 will be uploaded to a stage location. The response will be returned from the UDF.

First, a Snowflake session must be created.

Step 11: Create a stage location on Snowflake and write the model and tokenizer files and configurations to it. In this example, we have created a stage location named T5_P_TOKENIZER for storing the tokenizer and associated files, and T5_P_MODEL for storing the model and the configs of the tokenizer.

Below is an example of adding one of the config files to the Snowflake stage. Similarly, all other files should be put in their respective stages.
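The same pattern can be wrapped in a small helper that creates the stage and uploads every file in a local directory (the local directory paths are assumptions):

```python
from pathlib import Path

def stage_directory(session, local_dir: str, stage_name: str) -> None:
    """Create the stage if needed, then upload each file in local_dir to it."""
    session.sql(f"CREATE STAGE IF NOT EXISTS {stage_name}").collect()
    for path in Path(local_dir).glob("*"):
        if path.is_file():
            # auto_compress=False keeps original file names for from_pretrained
            session.file.put(str(path), f"@{stage_name}",
                             auto_compress=False, overwrite=True)

# e.g. stage_directory(session, "<save-dir>/tokenizer", "T5_P_TOKENIZER")
#      stage_directory(session, "<save-dir>/model", "T5_P_MODEL")
```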

Step 12: Write the UDF logic. For the UDF to be able to access the model and tokenizer, we need to bind the files to the UDF. Session.add_import registers a staged file as an import of the UDF. In our case, we will call add_import for all the files staged in the previous step.
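A sketch of the UDF body and the import binding. The file names follow the standard Hugging Face layout and should match whatever you actually staged; the sketch also assumes all imports extract into the same directory at runtime.

```python
import sys

def summarize(text: str) -> str:
    """UDF logic: load the staged model and tokenizer, then generate a summary."""
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    # Files registered via add_import are extracted to this directory at runtime
    import_dir = sys._xoptions["snowflake_import_directory"]
    tokenizer = T5Tokenizer.from_pretrained(import_dir)
    model = T5ForConditionalGeneration.from_pretrained(import_dir)

    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(inputs["input_ids"], max_length=150,
                                num_beams=2, early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Bind every staged artifact to the UDF (file names are illustrative)
STAGED_FILES = {
    "T5_P_MODEL": ["pytorch_model.bin", "config.json"],
    "T5_P_TOKENIZER": ["spiece.model", "tokenizer_config.json",
                       "special_tokens_map.json"],
}

def bind_artifacts(session, staged_files=None):
    """Call add_import once per staged file so the UDF can read it."""
    for stage, names in (staged_files or STAGED_FILES).items():
        for name in names:
            session.add_import(f"@{stage}/{name}")
```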

Step 13: Let us create a stage location called LLM and initialize the UDF with it. The Python packages on which the UDF depends can be declared using the packages parameter, which takes a list of packages with versions as input. In our case we need ‘sentencepiece==0.1.95’, ‘snowflake-snowpark-python==1.0.0’, and ‘transformers==4.24.0’.

With this, the T5 language model is deployed successfully in Snowflake and is ready for consumption.

Step 14: Now that the model is deployed, we can consume it using the following code.
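Invocation via SQL can be wrapped like this; "summarize_news" is an illustrative UDF name, and the quoting is naive and for illustration only:

```python
def get_summary(session, document: str) -> str:
    """Invoke the deployed UDF on a single document."""
    safe = document.replace("'", "''")  # naive escaping, illustration only
    rows = session.sql(f"SELECT summarize_news('{safe}') AS SUMMARY").collect()
    return rows[0]["SUMMARY"]
```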

Here, the input text would be the document for which the summary must be generated, and the output would be the summary given by the LLM after processing the document.


In conclusion, cloud-hosted, private Large Language Models (LLMs) have certain limitations that need to be considered. These include potential concerns regarding data privacy and security when sending data to cloud servers for processing, as well as a lack of transparency in how these models handle data and make decisions. This lack of transparency can lead to safety and privacy issues, especially when companies choose not to reveal their proprietary code and data.

Considering self-hosted, open-source LLMs as a possible solution can help overcome these limitations. These models can be fine-tuned for a custom use case using Refract, and then hosted within a controlled environment like the Snowflake Data Cloud. This way, the deployed LLMs can be used to discover and answer prompts securely. In this blog, we have explored a step-by-step guide on how to fine-tune a T5 model on Refract for text summarization, and how to deploy the model as a Python-based User Defined Function (UDF) on Snowflake.

By leveraging self-hosted, open-source LLMs, organizations can have greater control over their data, ensure compliance with privacy regulations, and enhance transparency in how models handle information, enabling more secure and customizable language processing capabilities.

Click here to learn more about how the combination of Refract and Snowflake can help you get more value from your data with less effort.


Tushar Madheshia

Data Scientist- Refract by Fosfor

Tushar Madheshia works as a Data Scientist in the product engineering team at Fosfor. He is passionate about developing AI/ML solutions for complex business problems. Tushar designed and pioneered the MLOps module for Refract in the Fosfor suite, aiding efficient deployment and monitoring of models at scale. Currently, he is integrating Refract with Snowflake for seamless ML experimentation, deployment, and monitoring within Snowflake, without data egress.
