What's real and what isn't with ChatGPT?
There's a huge amount of hype surrounding ChatGPT and I've heard all kinds of "game changing" stories around it. But what's real and what's not?
In this blog post, I'm going to show you one of the real things ChatGPT can do: extract meaning from text. I'll show you how well it performs, discuss some of its shortcomings, and highlight important considerations for using it in business. I'm going to do it with real code and real data.
We're going to use ChatGPT to extract meaning from news articles, specifically, two articles on the Women's World Cup.
D J Shin, CC BY-SA 3.0, via Wikimedia Commons. I, for one, welcome our new robot overlords...
The Women's World Cup
At the time of writing, the Women's World Cup is in full swing and England have just beaten China 6-1. There were plenty of news stories about it, so I took just two and tried to extract structured, factual data from the articles.
The two articles I used were a match report from the BBC and one from The Guardian.
Here is the data I wanted to pull out of the text:
- The sport being played
- The competition
- The names of the teams
- Who won
- The score
- The attendance
I wanted it in a structured format, in this case, JSON.
Obviously, you could read the articles and extract the information yourself, but the value of ChatGPT lies in doing this at scale: scanning thousands or millions of articles for key pieces of information. Until now, this kind of work has been done by paying people in the developing world to read articles and extract data by hand. ChatGPT offers the prospect of slashing the cost of that work and making it widely available.
Let's see it in action.
Getting started
This example is all in Python and I'm assuming you have a good grasp of the language.
Install the OpenAI library:
pip install openai
Register for OpenAI and get an API key. At the time of writing, you get $5 in free credits and this tutorial won't consume much of that $5.
You'll need to set your API key in your code. To get going, we'll just paste it into our Python file:
import openai
openai.api_key = "YOUR_KEY"
Note that OpenAI will rescind any keys it finds on the public internet, and hard-coding the key like this is sloppy from a security point of view. Only do it to get started.
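A safer approach is to keep the key out of your source code entirely and read it from an environment variable. Here's a minimal sketch (it assumes you've exported the key as OPENAI_API_KEY, which is a common convention, not a requirement):
import os
import openai

# Read the key from the environment rather than hard-coding it in the file
openai.api_key = os.environ["OPENAI_API_KEY"]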
Some ChatGPT basics
We're going to focus on just one part of ChatGPT, the ChatCompletion API. Because there's some complexity here, I'm going to go through some of the background before diving into the code.
To control the certainty of its answers, ChatGPT has a concept of "temperature". This parameter sets how deterministic the answer is: the lower the value, the more consistent and predictable the output. That consistency comes at the price of creativity, so for some applications you might want a higher temperature (a chatbot, for example). The API accepts temperatures from 0 to 2, and we'll use 0 for this example because we want highly reliable analysis.
There are several ChatGPT models, each with a different pricing structure. As you might expect, the larger and more recent models are more expensive, so for this tutorial I'm going to use an older, cheaper model, "gpt-3.5-turbo", which works well enough to show what ChatGPT can do.
ChatGPT works on a model of "roles" and "messages". Roles are the actors in a chat: for a chatbot there's a "user" role (the human entering text), an "assistant" role (the chat response), and a "system" role that steers the assistant's behavior. Messages are the text from the user or the assistant, or a "briefing" for the system. A chatbot needs multiple messages, but to extract meaning from text we only need one. To analyze the World Cup articles, we only need the user role.
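As a rough sketch (the message text here is invented purely for illustration), the difference between a chatbot conversation and a single extraction request looks like this:
# A chatbot conversation uses several roles and messages
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the match?"},
    {"role": "assistant", "content": "England beat China 6-1."},
    {"role": "user", "content": "Who scored?"},
]

# Extracting meaning from a piece of text needs just one user message
extraction_messages = [
    {"role": "user", "content": "In the following text ..."},
]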
To get an answer, we need to pose a question or give ChatGPT an instruction on what to do. That's part of the "content" we set in the messages parameter. The content must contain the text we want to analyze and instructions on what we want returned. This is a bigger topic and I'm going to dive into it next.
Prompt engineering part 1
Setting the prompt correctly is the core of working with ChatGPT, and it's a bit of an art, which is why it's called prompt engineering. You have to write your prompt very carefully to get the results you expect.
Oddly, ChatGPT doesn't separate the text from the query; they're all bundled together in the same prompt. This means you have to clearly tell ChatGPT what you want to analyze and how you want it analyzed.
Let's start with a simple example: imagine you want to know how many times the letter "e" occurs in the text "The kind old elephant." Here's how you might write the prompt:
f"""In the following text, how often does the letter e occur:
"The kind old elephant"
"""
This gives us the correct answer (3). We'll come back to this prompt later because it shows some of the pitfalls of working with ChatGPT. In general, we need to be crystal clear about the text we want the system to analyze.
Let's say we want the result in JSON; here's how we might write the prompt:
f"""
In the following text, how often does the letter e occur, write your answer as JSON:
"The kind old elephant"
"""
Which gives us {"e": 3}
We can ask more complex questions about a piece of text, but we need to lay out the query very carefully and distinguish between the text and the questions. Here's an example.
prompt = f"""
In the text indicated by three back ticks answer the \
following questions and output your answer as JSON \
using the key names indicated by the word "key_name" \
1) how often does the letter e occur key_name = "letter" \
2) what animal is referred to key_name = "animal" \
```The kind old elephant```
"""
Using ChatGPT
Let's put what we've learned together and build a ChatGPT query to ask questions about the Women's World Cup. Here's the code using the BBC article.
world = """
Lauren James produced a sensational individual
performance as England entertained to sweep aside
China and book their place in the last 16 of the
Women's World Cup as group winners.
It was a display worthy of their status as European
champions and James once again lit the stage alight
in Adelaide with two sensational goals and three assists.
The 13,497 in attendance were treated to a masterclass
from Chelsea's James, who announced her arrival at the
World Cup with the match-winner against Denmark on Friday.
She helped England get off to the perfect start when
she teed up Alessia Russo for the opener, and
later slipped the ball through to Lauren Hemp to
coolly place it into the bottom corner.
It was largely one-way traffic as England dominated
and overwhelmed, James striking it first time into
the corner from the edge of the box to make it 3-0
before another stunning finish was ruled out by video
assistant referee (VAR) for offside in the build-up.
China knew they were heading out of the tournament
unless they responded, so they came out with more
aggression in the second half, unnerving England
slightly when Shuang Wang scored from the penalty
spot after VAR picked up a handball by defender
Lucy Bronze.
But James was not done yet - she volleyed Jess Carter's
deep cross past helpless goalkeeper Yu Zhu for
England's fourth before substitute Chloe Kelly and
striker Rachel Daly joined the party.
England, who had quietly gone about their business
in the group stages, will have raised eyebrows with
this performance before their last-16 match against
Nigeria on Monday, which will be shown live on
BBC One at 08:30 BST.
China are out of the competition after Denmark beat
Haiti to finish in second place in Group D.
England prove worth without Walsh
Manager Sarina Wiegman kept everyone guessing when
she named her starting XI, with England fans
anxiously waiting to see how they would set up
without injured midfielder Keira Walsh.
Wiegman's response was to unleash England's attacking
talent on a China side who struggled to match them
in physicality, intensity and sharpness.
James oozed magic and unpredictability, Hemp used her
pace to test China's defence and captain Millie Bright
was ferocious in her tackling, winning the ball back
on countless occasions.
After nudging past Haiti and Denmark with fairly
underwhelming 1-0 wins, England were keen to impose
themselves from the start. Although China had chances
in the second half, they were always second best.
Goalkeeper Mary Earps will be disappointed not to keep
a clean sheet, but she made two smart saves to deny
Chen Qiaozhu.
While England are yet to meet a side ranked inside
the world's top 10 at the tournament, this will help
quieten doubts that they might struggle without the
instrumental Walsh.
"We're really growing into the tournament now," said
captain Bright. "We got a lot of criticism in the first
two games but we were not concerned at all.
"It's unbelievable to be in the same team as
[the youngsters]. It feels ridiculous and I'm quite
proud. Players feeling like they can express themselves
on the pitch is what we want."
James given standing ovation
The name on everyone's lips following England's win
over Denmark was 'Lauren James', and those leaving
Adelaide on Tuesday evening will struggle to forget
her performance against China any time soon.
She punished China for the space they allowed her on
the edge of the box in the first half and could have
had a hat-trick were it not for the intervention of VAR.
Greeted on the touchline by a grinning Wiegman,
James was substituted with time to spare in the second
half and went off to a standing ovation from large
sections of the stadium.
"She's special - a very special player for us and
for women's football in general," said Kelly. "She's
a special talent and the future is bright."
She became only the third player on record (since 2011)
to be directly involved in five goals in a Women's
World Cup game.
With competition for attacking places in England's
starting XI extremely high, James has proven she is
far too good to leave out of the side and is quickly
becoming a star at this tournament at the age of 21.
"""
prompt = f"""
In the text indicated by three back ticks answer the \
following questions and output your answer as JSON \
using the key names indicated by the word "key_name" \
1) What sport was being played? key_name="sport" \
2) What competition was it? key_name="competition" \
3) What teams were playing? key_name = "teams" \
4) Which team won? key_name = "winner" \
5) What was the final score? key_name = "score" \
6) How many people attended the match? key_name = "attendance" \
```{world}```
"""
model = "gpt-3.5-turbo"
messages = [{"role": "user", "content": prompt}]
response = (openai
.ChatCompletion
.create(model=model,
messages=messages,
temperature=0)
)
print(response.choices[0].message["content"])
Here are the results this code produces:
{
"sport": "Football",
"competition": "Women's World Cup",
"teams": "England and China",
"winner": "England",
"score": "England 5 - China 1",
"attendance": 13497
}
This is mostly right, but not quite. The score was actually 6-1. Even worse, the results are very sensitive to the text layout; changing line breaks changes the score.
I ran the same query with the Guardian article instead, and here's what I got:
{
"sport": "football",
"competition": "World Cup",
"teams": "England and China",
"winner": "England",
"score": "6-1",
"attendance": null
}
With a better prompt, it might be possible to improve consistency and remove some of the formatting differences between results. By analyzing multiple articles on the same event, it may be possible to increase accuracy still further.
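To use results like these programmatically, say across thousands of articles, you'd parse the model's reply into a Python dictionary. Here's a minimal sketch, reusing the response object from above and assuming the model really did return valid JSON:
import json

# The reply arrives as a JSON string, so parse it into a dict
content = response.choices[0].message["content"]
result = json.loads(content)  # raises ValueError if the model strays from JSON
print(result["winner"], result["attendance"])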
Hallucinations
Sometimes ChatGPT gets it badly wrong and supplies answers that are simply false. We've already seen a little of that with its analysis of the World Cup game: it inferred a score of 5-1 when it should have been 6-1. But ChatGPT can get it wrong in much worse ways.
I ran the queries above with text from the BBC and The Guardian. But what happens if I run the query with no text at all? That just means building the same prompt with an empty string in place of the article:
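world = ""  # no article at all; the prompt and the API call stay exactly the same
Here's what I get when there's no text at all to analyze: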
{
"sport": "football",
"competition": "World Cup",
"teams": ["France", "Croatia"],
"winner": "France",
"score": "4-2",
"attendance": "80,000"
}
Which is completely made up, hence the term hallucination.
Prompt engineering part 2
Let's go back to my elephant example from earlier and write it this way:
prompt = f"""
In the following text, "The kind old elephant",
how often does the letter e occur
"""
model="gpt-3.5-turbo"
messages = [{"role": "user", "content": prompt}]
response = (openai
.ChatCompletion
.create(model=model,
messages=messages,
temperature=0)
)
print(response.choices[0].message["content"])
Here's what the code returns:
In the phrase "The kind old elephant," the letter "e" occurs 4 times.
Which is clearly wrong.
In this case, the problem is the placement of the text to be analyzed. Moving the text to the end of the prompt and being more explicit about what should be returned helps. Even simply adding the phrase "Give your answer as JSON" to the prompt fixes the issue.
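For example, a reworded prompt along these lines, which puts the text at the end and asks for JSON, should fix the problem (this is a sketch of the kind of change described above, not a guaranteed recipe):
prompt = f"""
In the following text, how often does the letter e occur. \
Give your answer as JSON. \
"The kind old elephant"
"""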
This is why the precise form of the prompt you use is critical and why it may take several iterations to get it right.
What does all this mean?
The promise of ChatGPT
ChatGPT really can analyze text and extract information from it, and that is huge and transformative for business. Here are just a few of the things that become possible:
- Press clippings automation.
- Extraction of information from bills of lading.
- Automated analysis of SEC filings.
- Automated analysis of company formation documents.
- Entity extraction.
We haven't even touched on some of the many other things ChatGPT can do, for example:
- Language translation.
- Summarization.
- Report writing.
How to deliver on that promise
As I've shown in this blog post, the art is in prompt engineering. To get it right, you need to invest a good deal of time in crafting your prompts and then test them on a wide range of inputs. The good news is, this isn't rocket science.
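In practice, testing a prompt on a wide range of inputs just means running the same prompt over many articles and inspecting the results. Here's a minimal sketch; the articles list and the build_prompt helper are hypothetical stand-ins for your own data and prompt template:
import json
import openai

def extract_facts(article_text, build_prompt, model="gpt-3.5-turbo"):
    """Run the extraction prompt over one article and return the parsed JSON."""
    messages = [{"role": "user", "content": build_prompt(article_text)}]
    response = openai.ChatCompletion.create(
        model=model, messages=messages, temperature=0
    )
    return json.loads(response.choices[0].message["content"])

# articles is a hypothetical list of article strings
# results = [extract_facts(text, build_prompt) for text in articles]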
The skills you need
The biggest change ChatGPT introduces is to the skill level required. Previously, this kind of analysis called for a good grasp of the theory and the underlying libraries, and it took a lot of effort to build a system to analyze text. Not any more: the skill level has dropped precipitously. Where you once needed a Ph.D., now you don't. It's all about formulating a good prompt, and that's something a good analyst can do really well.
The bottom line
ChatGPT, and LLMs in general, are transformative. Any business that relies on information must know how to use them.