Unfolding the universe of possibilities..

Whispers from the digital wind, hang tight..

A Good Description Is All You Need

How to use few shot learning to improve text classification performance

Photo by Patrick Tomasso on Unsplash.

I have been using large language models (LLMs) for a while now, both for personal projects and as part of my day-to-day work. Like many people, I am excited by the powerful capabilities of these models. Yet it’s important to know that while these models are very strong, one can always improve them for various tasks.

And NO, I am not going to write about fine-tuning LLMs, which can be costly and often requires a good GPU-powered device. In fact, I’m going to show you a very simple method of improving your model using few-shot learning.

Few-shot learning is a machine learning technique where models are trained to solve new tasks using only a few examples, often just 1–5 examples per class. There are a number of key points to few-shot learning:

Learning to generalize from small data: Few-shot learning methods aim to learn models that can generalize well from a small number of examples, in contrast to traditional deep learning methods that require thousands or millions of examples.Transfer learning: Few-shot learning methods leverage knowledge gained from solving previous tasks and transfer that knowledge to help learn new tasks faster and from less data. This transfer learning capability is key.Learning similarity metrics: Some few-shot learning techniques focus on learning a similarity metric between examples. This allows comparing new examples to existing labeled examples to make predictions.

But how can one use few-shot learning in a classification problem to improve model performance? Let’s walk through an example.

Data and Prep

I have started my analysis by obtaining data from HuggingFace. The dataset is called financial-reports-sec (This dataset has Apache License 2.0 and permits for commercial use), and according to the dataset authors, it contains the annual reports of U.S. public companies filing with the SEC EDGAR system from 1993–2020. Each annual report (10-K filing) is divided into 20 sections.

Two relevant attributes of this data are useful for the current task:

Sentence: Excerpts from the 10-K filing reportsSection: Labels denoting the section of the 10-K filing that the sentence belongs to

I have focused on three sections:

Business (Item 1): Describes the company’s business, including subsidiaries, markets, recent events, competition, regulations, and labor. Denoted by 0 in the data.Risk Factors (Item 1A): Discusses risks that could impact the company, such as external factors, potential failures, and other disclosures to warn investors. Denoted by 1.Properties (Item 2): Details significant physical property assets. Does not include intellectual or intangible assets. Denoted by 3.

For each label, I sampled 10 examples without replacement. The data is structured as follows:

Off the shelf prediction

Once the data is ready, all I have to do is to make a classifier function that takes the sentence from the dataframe and predicts the label.

Role = ”’
You are expert in SEC 10-K forms.
You will be presented by a text and you need to classify the text into either ‘Item 1’, ‘Item 1A’ or ‘Item 2’.
The text only belongs to one of the mentioned categories so only return one category.
”’
def sec_classifier(text):

response = openai.ChatCompletion.create(
model=’gpt-4′,
messages=[
{
“role”: “system”,
“content”: Role},
{
“role”: “user”,
“content”: text}],
temperature=0,
max_tokens=256,
top_p=1,
frequency_penalty=0,
presence_penalty=0)

return response[‘choices’][0][‘message’][‘content’]

I’m using GPT-4 here since it’s OpenAI’s most capable model so far. I’ve also set the temperature to 0 just to make sure the model does not go off track. The really fun part is how I define the Role — that’s where I get to guide the model on what I want it to do. The Role tells it to stay focused and deliver the kind of output I’m looking for. Defining a clear role for the model helps it generate relevant, high-quality responses. The prompt in this function is:

You are expert in SEC 10-K forms.
You will be presented by a text and you need to classify the text into either ‘Item 1’, ‘Item 1A’ or ‘Item 2’.
The text only belongs to one of the mentioned categories so only return one category.

After applying the classification function across all data rows, I generated a classification report to evaluate model performance. The macro average F1 score was 0.62, indicating reasonably strong predictive capabilities for this multi-class problem. Since the number of examples was balanced across all 3 classes, the macro and weighted averages converged to the same value. This baseline score reflects the out-of-the-box accuracy of the pretrained model prior to any additional tuning or optimization.

precision recall f1-score support

Item 1 0.47 0.80 0.59 10
Item 1A 0.80 0.80 0.80 10
Item 2 1.00 0.30 0.46 10

accuracy 0.63 30
macro avg 0.76 0.63 0.62 30
weighted avg 0.76 0.63 0.62 30

Description is all you need (few-shot prediction)

As mentioned, few-shot learning is all about generalising the model with a few good examples. To that end, I’ve modified my class by describing what Item 1, Item 1A and Item2 are (based on Wikipedia):

Role_fewshot = ”’
You are expert in SEC 10-K forms.
You will be presented by a text and you need to classify the text into either ‘Item 1’, ‘Item 1A’ or ‘Item 2’.
The text only belongs to one of the mentioned categories so only return one category.
In your classification take the following definitions into account:

Item 1 (i.e. Business) describes the business of the company: who and what the company does, what subsidiaries it owns, and what markets it operates in.
It may also include recent events, competition, regulations, and labor issues. (Some industries are heavily regulated, have complex labor requirements, which have significant effects on the business.)
Other topics in this section may include special operating costs, seasonal factors, or insurance matters.

Item 1A (i.e. Risk Factors) is the section where the company lays anything that could go wrong, likely external effects, possible future failures to meet obligations, and other risks disclosed to adequately warn investors and potential investors.

Item 2 (i.e. Properties) is the section that lays out the significant properties, physical assets, of the company. This only includes physical types of property, not intellectual or intangible property.

Note: Only state the Item.
”’
def sec_classifier_fewshot(text):

response = openai.ChatCompletion.create(
model=’gpt-4′,
messages=[
{
“role”: “system”,
“content”: Role_fewshot},
{
“role”: “user”,
“content”: text}],
temperature=0,
max_tokens=256,
top_p=1,
frequency_penalty=0,
presence_penalty=0)

return response[‘choices’][0][‘message’][‘content’]

The prompt now reads:

You are expert in SEC 10-K forms.
You will be presented by a text and you need to classify the text into either ‘Item 1’, ‘Item 1A’ or ‘Item 2’.
The text only belongs to one of the mentioned categories so only return one category.
In your classification take the following definitions into account:Item 1 (i.e. Business) describes the business of the company: who and what the company does, what subsidiaries it owns, and what markets it operates in.
It may also include recent events, competition, regulations, and labor issues. (Some industries are heavily regulated, have complex labor requirements, which have significant effects on the business.)
Other topics in this section may include special operating costs, seasonal factors, or insurance matters.
Item 1A (i.e. Risk Factors) is the section where the company lays anything that could go wrong, likely external effects, possible future failures to meet obligations, and other risks disclosed to adequately warn investors and potential investors.Item 2 (i.e. Properties) is the section that lays out the significant properties, physical assets, of the company. This only includes physical types of property, not intellectual or intangible property.

If we run this on the texts we get the following performance:

precision recall f1-score support

Item 1 0.70 0.70 0.70 10
Item 1A 0.78 0.70 0.74 10
Item 2 0.91 1.00 0.95 10

accuracy 0.80 30
macro avg 0.80 0.80 0.80 30
weighted avg 0.80 0.80 0.80 30

The macro average F1 is now 0.80, that is 29% improvement in our prediction, only by providing a good description of each class.

Finally you can see the full dataset:

In fact the examples I provided gives the model concrete instances to learn from. Examples allow the model to infer patterns and features, by looking at multiple examples, the model can start to notice commonalities and differences that characterise the overall concept being learned. This helps the model form a more robust representation. Furthermore, providing examples essentially acts as a weak form of supervision, guiding the model towards the desired behaviour in lieu of large labeled datasets.

In the few-shot function, concrete examples help point the model to the types of information and patterns it should pay attention to. In summary, concrete examples are important for few-shot learning as they provide anchor points for the model to build an initial representation of a novel concept, which can then be refined over the few examples provided. The inductive learning from specific instances helps models develop nuanced representations of abstract concepts.

If you’ve enjoyed reading this and want to keep in touch, you can find me on my LinkedIn or via my webpage: iliateimouri.com

Note: All images, unless otherwise noted, are by the author.

A Good Description Is All You Need was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Leave a Comment