5 min read

Prompt Engineering: The New Era of AI

Trends in the data science domain change every day. New tools, techniques, libraries, and algorithms emerge constantly, keeping the field at the bleeding edge. The methods used to solve tasks in Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP) are evolving just as quickly.

What is next in Artificial Intelligence (AI)? This is a question every data science aspirant should ask themselves. In the early years of the AI revolution, rule-based systems rooted in symbolic AI solved ML/DL and NLP problems such as text classification, natural language text extraction, passage summarization, and data mining. As time passed, statistical models replaced symbolic AI-based solutions. In the statistical approach, we train a machine learning or deep learning model on large amounts of data and then use it for prediction. Realistically, though, obtaining that much labeled, clean training data isn’t easy. This significant limitation of statistical modeling is what a new learning method, prompt engineering, helps overcome.

Prompt-based machine learning, aka prompt engineering, has already opened many possibilities in ML. Lots of research is taking place in this area globally. Let’s dive into prompt engineering and why it is becoming more popular in data science.

What is Prompt Engineering?

Prompt engineering is the process of designing and creating prompts: the input text that guides an AI model toward performing a specific task.

Prompt engineering is a relatively new concept in artificial intelligence, particularly in NLP. In prompt engineering, the task description is embedded in the input itself. The technique typically works by converting one or more tasks into a prompt-based dataset and training a language model with what has been called “prompt-based learning,” or simply “prompt learning.”

This technique became popular in 2020 with GPT-3, the third generation of the Generative Pretrained Transformer. AI solution development has never been simple, but with GPT-3, you only need a meaningful training prompt written in plain English.

The first thing we must learn while working with GPT-3 is how to design the required prompts for our use case. Just as the quality of training data drives prediction quality in statistical models, here the quality of the input prompt drives the quality of the output. A good prompt elicits good task performance in the desired context. Writing the best prompt is often a matter of trial and error.

Prompt-based learning is built on Large Language Models (LLMs) that directly model the probability of text. In contrast, traditional supervised learning trains a model to take an input x and predict an output y, i.e., P(y|x). To use a language model for prediction tasks, the initial input x is converted, using a template, into a textual prompt with empty slots (x’). The language model then probabilistically fills the slots to obtain a final string (x^), from which the final output y can be deduced. This architecture is effective and appealing for several reasons: for example, it allows us to pre-train the language model on enormous volumes of unlabeled text and then conduct few-shot, one-shot, or even zero-shot learning simply by specifying a new prompting function.
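
To make the x → x’ → x^ → y pipeline concrete, here is a minimal Python sketch. The template, the answer-to-label mapping (the “verbalizer”), and the fill_slot stand-in for the language model are all illustrative assumptions, not part of the original article:

```python
# A sketch of prompt-based prediction: input x -> templated prompt x' ->
# slot-filled string x^ -> final label y. All names here are illustrative.

def apply_template(x: str) -> str:
    # x -> x': embed the raw input in a cloze-style template with an empty slot [Z].
    return f"Review: {x} Overall, the movie was [Z]."

# Verbalizer: maps the word the LM puts into the slot back to a task label y.
VERBALIZER = {"great": "positive", "terrible": "negative"}

def predict(x: str, fill_slot) -> str:
    x_prime = apply_template(x)               # x'
    answer = fill_slot(x_prime)               # x^: the LM's most probable slot filler
    return VERBALIZER.get(answer, "unknown")  # y

# `fill_slot` stands in for any pretrained LM that scores candidate slot fillers;
# a masked language model (see below) is one natural choice.
print(predict("I loved every minute of it.", lambda p: "great"))  # -> positive
```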

Pretrain – Prompt – Predict Paradigm

As you now know, prompt engineering has evolved along with pretrained LLMs. Every prompt we issue is ultimately served by a language model behind the scenes, so a powerful, well-trained language model should be identified before we start prompt engineering. The first step, then, is selecting the pretrained model.

Pretrain

When selecting a pretrained model, it is important to consider the pretraining objective. A language model can be pretrained in multiple ways depending on the context.

Pretraining via next token prediction

Next-token prediction simply predicts the next word given all the previous words in context. It is one of the most straightforward and easy-to-understand pretraining strategies. The pretraining objective matters for prompting because it determines the types of prompts we can give the model and how answers can be incorporated into those prompts.
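
As a quick illustration, here is what next-token prediction looks like with an off-the-shelf causal model. The Hugging Face transformers library and the GPT-2 checkpoint are our assumed choices for this sketch; the article itself does not prescribe them:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small causal (next-token) language model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Given all the previous words, ask the model for the most likely next token.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)

next_token_id = logits[0, -1].argmax().item() # highest-probability next token
print(tokenizer.decode(next_token_id))        # likely " Paris"
```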

Pretraining via masked token prediction

Instead of predicting the next token from all the previous tokens, masked token prediction predicts masked tokens anywhere within the input, given all the surrounding context. The classic BERT language model was trained with this strategy.
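
A minimal sketch of masked token prediction, again using transformers, with a standard BERT checkpoint as our assumed choice:

```python
from transformers import pipeline

# fill-mask predicts the masked token from the context on both sides of it.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("Paris is the capital of [MASK]."):
    print(f'{pred["token_str"]:>10}  {pred["score"]:.3f}')  # top candidates with scores
```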

Training an entailment model

More than a pretraining strategy, entailment helps in prompted classification tasks. Entailment means that, given two statements, we want to determine whether they imply one another, contradict one another, or are neutral, meaning they have no bearing on one another.
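
Entailment models power so-called zero-shot classification pipelines: each candidate label is turned into a hypothesis such as “This example is about sports.” and scored for entailment against the input. A sketch, with the NLI checkpoint being our assumption:

```python
from transformers import pipeline

# An NLI (entailment) model repurposed for prompted classification.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Greg Barclay has been re-elected as chairman of the International Cricket Council.",
    candidate_labels=["sports", "politics", "finance"],
)
print(result["labels"][0])  # the label whose hypothesis is most strongly entailed
```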

Prompt and Prediction

Once we select the pretrained language model, the next step is designing the appropriate prompt for our desired task. Understanding what the pretrained language model knows about the world and how to get the model to use that knowledge to produce beneficial and contextual results is the key to generating successful prompts.

Here, in the form of a training prompt, we provide the model with just enough information to enable it to recognize the patterns and complete the task at hand. We don’t want to overwhelm the model’s natural intelligence by providing all the information simultaneously.

Given that the prompt describes the task, picking the right prompt significantly impacts both the accuracy of the output and the very task the model ends up performing.

As a general guideline, when developing a helpful prompt, we should first try to elicit the required response from the model in the zero-shot learning paradigm, meaning the task should be completed without any fine-tuning or examples. If the model’s response falls short of your expectations, give the model one specific example along with the instruction; this is the one-shot learning paradigm. If the responses still do not satisfy your needs, provide the model with a few additional examples along with the request; this is the few-shot learning paradigm.

For simplicity, the standard flow for training prompt design should look like this: Zero-Shot → One-Shot → Few-Shot

Let us check how we can write prompts for the GPT-3 language model to perform our custom tasks.
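
The examples below can all be sent through a small helper like the one sketched here. It uses the legacy (pre-1.0) openai Python SDK; the SDK version, the text-davinci-003 model, and the parameter values are our assumptions, since the article does not pin any of them down:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; set your own key

def complete(prompt: str, max_tokens: int = 100) -> str:
    """Send a prompt to GPT-3 and return the generated text."""
    response = openai.Completion.create(
        model="text-davinci-003",   # assumed GPT-3 engine
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0,              # deterministic output for reproducible demos
    )
    return response["choices"][0]["text"].strip()
```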

Prompt with Zero-Shot

Consider the domain identification problem in NLP. With statistical modeling, we are used to solving it by training a machine learning or deep learning model on extensive training data, and such modeling and training are not feasible without significant time and proper technical knowledge.

We can identify the domains of some famous personalities using prompts in GPT-3 with zero-shot learning. Since the setup and background of GPT-3 are beyond the scope of this topic, we will move directly to a general understanding of prompts.

Prompt:
The following is a list of celebrities and the domains they fall into:
Leonardo DiCaprio, Elon Musk, Ishan Kishan, Kamala Harris, Andrey Kurkov

Response: (shown as a screenshot in the original post; the model lists a domain for each name)

Looks amazing, right? Without being given any prior examples, the pretrained model detects each person’s domain based on the prompt alone.
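
For reference, this is the same prompt sent through the complete() helper sketched earlier (the output in the comment is illustrative):

```python
prompt = (
    "The following is a list of celebrities and the domains they fall into:\n"
    "Leonardo DiCaprio, Elon Musk, Ishan Kishan, Kamala Harris, Andrey Kurkov"
)
print(complete(prompt))  # e.g. acting, business, cricket, politics, literature
```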

Prompt with One-Shot

Consider the question-answering problem, one of the fundamental problems in NLP; others include word sense disambiguation, named entity identification, and anaphora and cataphora resolution. In traditional machine learning, training a question-answering model from scratch and ensuring its performance is very complex.

Question-answering in machine learning is complex because it requires the model to understand natural language and context, reason, and retrieve relevant information from a large amount of data. Furthermore, natural language is complex and ambiguous, with multiple meanings and interpretations for the same words or phrases. This makes it challenging to accurately understand the question’s intent and provide an appropriate answer. The model must handle various forms of language, such as idioms, colloquialisms, and slang.

Context is another important factor in question-answering. The meaning of a word or phrase can vary depending on the context in which it is used. For example, the word “bat” can refer to a flying mammal or a piece of sports equipment, depending on the context. Thus, the model needs to be able to understand the broader context of a question to provide a relevant answer.

Let us see how we can resolve this problem with prompt engineering.

In the prompt below, the worked example and the actual input are distinguished with a separator.

Prompt:
Context: The World Health Organization is a specialized agency of the United Nations responsible for international public health. Headquartered in Geneva, Switzerland, it has six regional offices and 150 field offices worldwide. The WHO was established on 7 April 1948.

Question: When was the WHO established?

Answer: 7 April 1948

Context: Greg Barclay has been unanimously re-elected as chairman of the International Cricket Council (ICC) for a second two-year term. The former New Zealand Cricket chair was unopposed following the withdrawal of Zimbabwe’s Tavengwa Mukuhlani from the process, and the ICC Board reaffirmed its full support to Barclay to continue at the helm.

Question: Who is the chairman of ICC?

Answer:
Response: (shown as a screenshot in the original post; the expected answer is “Greg Barclay”)
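
The same pattern through the assumed complete() helper, with the contexts abbreviated here for brevity:

```python
one_shot_prompt = (
    "Context: The World Health Organization is a specialized agency of the "
    "United Nations responsible for international public health. "
    "The WHO was established on 7 April 1948.\n"
    "Question: When was the WHO established?\n"
    "Answer: 7 April 1948\n"
    "\n"
    "Context: Greg Barclay has been unanimously re-elected as chairman of the "
    "International Cricket Council (ICC) for a second two-year term.\n"
    "Question: Who is the chairman of ICC?\n"
    "Answer:"
)
print(complete(one_shot_prompt))  # expected: Greg Barclay
```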

Prompt with Few-Shot

Consider a text pattern extraction challenge. Let’s see how prompts with few-shot learning in GPT-3 can solve the problem.

Prompt:
Extract the ISO number from the Context:
Context: ISO 9001 is probably the most well-recognized ISO number in the world
Example ISO numbers:
ISO 187288|
ISO 972012|
ISO 569295|

ISO number:
Response: (shown as a screenshot in the original post; the expected extraction is “ISO 9001”)

Prompt engineering is ushering in a new era in Artificial Intelligence, and a great deal of research is happening in this domain today. Many prompt-based tools and libraries beyond GPT-3 have already been developed.

Some of them include:

  • OpenAI Playground
  • GPTTools
  • LangChain
  • ThoughtSource
  • EveryPrompt
  • DUST
  • Dyno
  • Metaprompt
  • Prompts.ai
  • Lexica
  • Scale SpellBook
  • Interactive Composition Explorer
  • LearnGPT
  • AI Test Kitchen
  • betterprompt
  • Prompt Engine
  • PromptSource
  • sharegpt
  • DreamStudio
  • PromptInject

Prompt-based machine learning has already started to help us solve several bottleneck issues. Prompts work in both the NLP and image processing domains. AI aspirants worldwide look forward to this field’s latest research results.

So, why wait? Let’s prompt!

Author

Pradeep T

AI Engineer - Data Science in Fosfor (Lumin), LTI

Pradeep has 3+ years of progressive experience in executing data-driven solutions. He is proficient in machine learning and statistical modeling algorithms/techniques for solving data science problems. His expertise lies in unstructured data processing and Large Language Model (LLM) based applications. Pradeep holds a Master's degree in Computational Linguistics from APJ Abdul Kalam Technological University and a Bachelor's in Computer Science from Cochin University of Science and Technology. He actively contributes to open-source projects and authors blog posts on artificial intelligence. He is a passionate data scientist who believes in making incremental progress toward daily growth.
