Get Started with ChatGPT API: A Step-by-Step Guide for Python Programmers (updated for OpenAI SDK version 1.1.1+)

Update: This article has been updated in November 2023 to include changes in the OpenAI SDK version 1.1.1+ and new features announced during OpenAI DevDay 2023.

I will show how to use the ChatGPT API, do proper prompt engineering, and make it interactive. At the end of the article, I will also show you how to limit the cost of API calls and use the API parameters to get better results.

If you are looking for a guide regarding the function calling API, check out my “Use OpenAI API Function Calling to Build a Chatbot for Slack with Access to a REST API” article.

Basic ChatGPT API Usage

Before we start, install the openai dependency (this article assumes SDK version 1.1.1 or newer):
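
pip install --upgrade openai

Then import it and create a client. You will also need to specify the API key: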

from openai import OpenAI
API_KEY = "..."

client = OpenAI(
    api_key=API_KEY
)
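
If you prefer not to hard-code the key, you can read it from an environment variable instead. This is a minimal sketch of a common pattern; the SDK also falls back to the OPENAI_API_KEY variable on its own when no key is passed:

import os
from openai import OpenAI

# read the key from the environment instead of hard-coding it in the source
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])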

OpenAI has created a new API method that works slightly differently than the methods available earlier. In the case of ChatGPT, we don’t send a single text prompt to the model. Instead, we send a list of messages. Each message has a role and content. The role can be either user, assistant, or system. The content is the message itself. The API uses the entire chat history we send to generate the next message. It returns only that next message, so we must keep the history of messages ourselves if we want to implement a longer interaction.

Example:

query = [
    {"role": "system", "content": "You are a MySQL database. Return responses in the same format as MySQL."},
    {"role": "user", "content": "insert into users(name, email) values ('John', 'john@galt.example');"},
    {"role": "user", "content": "select count(*) from users"}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query
)

In the result variable, we get a completion object whose text representation looks like this:

ChatCompletion(id='...', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='1', role='assistant', function_call=None, tool_calls=None))], created=1699334796, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=1, prompt_tokens=52, total_tokens=53))

We can get the response text from the choices attribute by selecting the first (and only) result and reading the content of its message:

result.choices[0].message.content

How to write messages for ChatGPT API

The input format of the chat completions API differs from the older OpenAI completion methods. First of all, we specify a list of messages. Each message is a single chat interaction: either your message or the model’s response. We can distinguish between them using the role argument. You send messages denoted with the role user. Messages denoted with the role assistant are responses from the model.

Of course, we don’t have to provide actual model responses. It’s fine to write a message with the role assistant ourselves and pass it as an example response. In fact, you can use such messages for in-context learning, as in the sketch below.
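
For example, here is a minimal few-shot sketch (the task and messages are made up for illustration) where a hand-written assistant message shows the model the kind of answer we expect:

few_shot_query = [
    {"role": "system", "content": "You classify the sentiment of product reviews as positive or negative."},
    # a hand-written example: a user message paired with the assistant answer we want the model to imitate
    {"role": "user", "content": "The battery died after two days."},
    {"role": "assistant", "content": "negative"},
    # the actual review we want classified
    {"role": "user", "content": "Great sound quality and very comfortable."}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=few_shot_query
)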

Additionally, we can use the system role to specify the context. The system message describes the situation in which the conversation takes place. It can describe the task, the data, or any other relevant information.

We can send system messages at any time. It’s useful to send more than one when you want to change the context in the middle of the conversation. For example, we start by asking ChatGPT to act as a MySQL database, but later we switch to a role in which the chatbot explains given SQL commands in German:

query = [
    {"role": "system", "content": "You are a MySQL database. Return responses in the same format as MySQL."},
    {"role": "user", "content": "insert into users(name, email) values ('John', 'john@galt.com');"},
    {"role": "system", "content": "You are an AI assistant. Explain what the given query does. Return the response in German."},
    {"role": "user", "content": "select count(*) from users"}
]

client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query
)

We can see that the model adheres to the most recent system message:

ChatCompletion(
  id='...',
  choices=[Choice(
    finish_reason='stop',
    index=0,
    message=ChatCompletionMessage(content='Die angegebene Abfrage zählt die Anzahl der Datensätze in der Tabelle "users".',
    role='assistant',
    function_call=None,
    tool_calls=None
  ))],
  created=1699334999,
  model='gpt-3.5-turbo-0613',
  object='chat.completion',
  system_fingerprint=None,
  usage=CompletionUsage(completion_tokens=22, prompt_tokens=75, total_tokens=97)
)

How to make ChatGPT API interactive

To make an interactive conversation like in the ChatGPT web interface, we need to store the history of messages written by the user and generated by ChatGPT. Additionally, we need functions that pass the model’s response to the user and get the user’s message. In the example below, I use the input() function to get the user’s message and the print() function to show the model’s response.

def talk_with(persona, tell_user, ask_user):
    message_history = []
    while True:
        user_input = ask_user()
        # an empty message ends the conversation
        if user_input == "":
            return message_history

        message_history.append({"role": "user", "content": user_input})
        # the system message always goes first, followed by the entire conversation so far
        query = [{"role": "system", "content": persona}]
        query.extend(message_history)
        result = client.chat.completions.create(
          model="gpt-3.5-turbo",
          messages=query
        )
        gpt_message = result.choices[0].message
        # store the model's response so it becomes part of the next request
        message_history.append({"role": gpt_message.role, "content": gpt_message.content})
        tell_user("GPT: " + gpt_message.content)

To use the function, we can call it like this:

talk_with(
    persona="""You are a helpful cooking expert. You answer question by providing a short explanation and a list of easy to follow steps. You list ingredients, tools, and instructions.""",
    tell_user=print,
    ask_user=input
)

How to limit the cost of API calls

When you use the older OpenAI completion API, the number of tokens in the response is limited to 16 by default, and you have to increase the max_tokens parameter to get longer responses. That’s not the case with the ChatGPT API. By default, it doesn’t limit the number of output tokens. Right now, the output is limited only by the model itself: gpt-3.5-turbo can handle 4096 tokens, so the default maximal output length is 4096 minus the number of tokens in the prompt.

To control the cost of API calls, you can explicitly set the max_tokens parameter. Remember that the content the model generates doesn’t depend on the parameter. The model will not try to be more succinct when you set max_tokens to a low value. Instead, the OpenAI backend will cut the response, possibly in the middle of a sentence, when it runs out of tokens.

query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to find and hire great programmers?"}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query,
  max_tokens=30
)

result.choices[0]

The response gets truncated after 30 tokens:

Choice(
  finish_reason='length',
  index=0,
  message=ChatCompletionMessage(content="Oh, it's a piece of cake! Just wave a magic wand and they'll come flocking to you. Alternatively, you could try a more",
  role='assistant',
  function_call=None,
  tool_calls=None)
)
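
To detect the truncation in code, we can check the finish_reason of the choice and, optionally, the reported token usage (both fields appear in the responses shown above):

choice = result.choices[0]
if choice.finish_reason == "length":
    # the model ran out of tokens before finishing the answer
    print("Truncated after", result.usage.completion_tokens, "tokens")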

ChatGPT parameters explained

In this tutorial, I will focus on the most useful ChatGPT parameters. If you are interested in the full list, subscribe to the newsletter and get notified when I write an article about it.

Change the number of ChatGPT responses

We can generate more than one response for a given text. It’s useful when you want to explore alternatives or when you want to generate messages for an A/B test in a single API call.

When we set the n parameter to the number of responses we want, we will get a corresponding number of elements in the choices list in the response:

query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to make a nation thrive?"}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query,
  n=2
)

Now, the response contains two elements, and we can refer to them by index:

result.choices[1]
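
We can also iterate over all of the generated responses:

for choice in result.choices:
    print(choice.index, choice.message.content)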

Making the responses more or less predictable

You may get a slightly different response whenever you send the same message to ChatGPT. However, you can make the answers almost deterministic by setting the temperature parameter to 0.0 or (if you are more adventurous but still want predictability) to a low value between 0.0 and 0.5.

query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to build an AI chatbot?"}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query,
  temperature=0.0
)

result.choices[0]

If I send the same request again, it will return the same response:

Choice(
  finish_reason='stop',
  index=0,
  message=ChatCompletionMessage(content="Oh, building an AI chatbot? Piece of cake! Just sprinkle some fairy dust, wave a magic wand, and voila! You've got yourself a fully functional AI chatbot. Easy peasy lemon squeezy!",
  role='assistant',
  function_call=None,
  tool_calls=None)
)

When setting the temperature to 0 is not enough and we still get non-deterministic responses, we can also set the seed parameter to a numeric value:

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query,
  temperature=0.0,
  seed=1337
)

However, OpenAI says that determinism isn’t guaranteed even when we use a fixed seed and set the temperature to 0.0. After all, the underlying backend may change. We can use the system_fingerprint field returned in the response to check whether OpenAI modified the backend between requests.
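
The fingerprint is part of the ChatCompletion object we saw earlier, so we can read it directly from the result:

# if the fingerprint changes between two requests, the backend changed,
# and the responses may differ even with the same seed and temperature
print(result.system_fingerprint)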

To make results less predictable, we can set the temperature parameter to a value between 1.0 and 2.0. The model will still follow the instructions and try to answer your question, but it will be more creative, and multiple people asking about the same topic won’t get the same answer:

query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to make a nation thrive?"}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query,
  temperature=1.2
)

result.choices[0].message.content

This is the same request as in the example where I showed you how to get multiple responses at once, but with a higher temperature. We get a completely different response, although, of course, in the same style:

Oh, just a casual question about how to create a utopia. No biggie. Well, if you want to ensure that a nation thrives, I suppose the first step would be to protect individual rights and incentivize innovation and productivity. But, you know, that's just me.

Getting JSON from the AI

Up until November 2023, the only way to reliably get JSON as the AI response was to write a prompt instructing it to create such a response or use a result-validating library such as Guardrails.

Now, we can enable the JSON mode by setting the response_format parameter to {"type": "json_object"}. Note that the messages must also instruct the model to produce JSON (the word "JSON" has to appear somewhere in the prompt), or the API rejects the request.

result = client.chat.completions.create(
  model="gpt-3.5-turbo-1106",
  messages=query,
  response_format={"type": "json_object"}
)

Of course, it isn’t perfect. When we exceed the token limit while generating the response, we will get a partial JSON that cannot be parsed. However, it’s still better than nothing. Personally, I’m going to use it in addition to the Guardrails library and detailed instructions in the prompt, not as a replacement.
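
Here is a minimal sketch (the prompt and field names are made up for illustration) that mentions JSON in the system message and parses the response:

import json

json_query = [
    {"role": "system", "content": "You extract contact data from text and always reply with a JSON object with the keys 'name' and 'email'."},
    {"role": "user", "content": "John Galt, john@galt.example"}
]

result = client.chat.completions.create(
  model="gpt-3.5-turbo-1106",
  messages=json_query,
  response_format={"type": "json_object"}
)

# with the JSON mode enabled, the content should be a parsable JSON document
data = json.loads(result.choices[0].message.content)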

How to request or ban certain words

When we want the answer always to include a word (or we want to ban words from the answer), we can do it in the prompt by describing what we want to see or what we don’t want to see. However, there is a better way that doesn’t require us to spend money on tokens.

We can use the logit_bias parameter to affect the probability of producing a particular token while generating the response. The parameter accepts a dictionary with the token ids as keys and the logit bias as values. The logit bias is a number between -100 and 100. The higher the number, the more likely the token will be produced. The lower the number, the less likely the token will be produced.

The keys of the dictionary are token ids, not words. You can get those token ids using the tiktoken library or the OpenAI web tokenizer. The web tokenizer is useful for ad-hoc queries and testing or when you want to hard-code a constant value for desired/banned tokens in your code.
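
If you prefer to compute the ids in code, here is a minimal sketch with tiktoken (assuming the library is installed; keep in mind that different models use different encodings, so the ids may not match what an older tokenizer shows):

import tiktoken

# get the encoding used by the model we are going to call
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

# a word with and without a leading space maps to different token ids
print(encoding.encode("cat"))
print(encoding.encode(" cat"))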

Let’s say I want to generate a response in a conversation. The model is supposed to ask for cat food, but we won’t mention it in the prompt:

query = [
    {"role": "system", "content": "You pretend to be a client who wants to buy food for a pet. You want two cans of food. Say what pet you have."},
    {"role": "user", "content": "Hi, how can I help you?"}
]

Instead, we will open the web tokenizer and get token ids for the words we want. If I type the word cat in the tokenizer, I get the token id 9246. However, that’s not enough. OpenAI uses a different token id for the same word when the word occurs in the middle of a sentence. In this case, a space is prepended to the word, so when I generate tokens for the string cat cat, I get two values: one for “cat”: 9246, and one for “ cat”: 3797.

We want to increase the likelihood of seeing “ cat” in the generated response, so we create a dictionary with the token id as the key and 20 as the value, and later pass it as the logit_bias parameter.

logit_bias = {
    3797: 20
}

Now, we can send the request to the API:

result = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=query,
  logit_bias=logit_bias
)

result.choices[0].message.content

The model asks for cat food in the response:

Hello, I'm looking to purchase some food for my pet. I have a cat and I would like to buy two cans of cat food. Can you assist me with that?

These parameters only modify the probabilities. They won’t make an improbable thing happen. If you try to force ChatGPT to use a word that makes no sense in the context, it won’t do it.

Also, ridiculously high values (but within the supported range) tend to break the probabilities, and the API call fails or returns gibberish:

When I use the same input with bias 100:

logit_bias = {
    3797: 100
}

In the response, I get _sh repeated until the end of the token window:

_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_sh_...

During some tests, I was getting a failure from the API:

APIError: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at...

If you want to ban a word from the response, set the bias parameter to a negative value (between -1 and -100). Of course, if the word is the only thing that makes sense in the context, the model may still use it, generate nonsense, or just crash.
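
For example, reusing the token id from the example above, a strongly negative bias discourages the model from using the word cat:

# strongly discourage the " cat" token in the generated response
logit_bias = {
    3797: -100
}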


Do you need help building your own AI-powered tools for your business?
You can hire me!
