July 19, 2023

How to Meter OpenAI API and ChatGPT Usage

Implementing OpenAI API and ChatGPT token usage metering

AI is spreading through the industry like wildfire, and new and existing products are adopting various artificial intelligence capabilities. Integrating AI into your product costs money: you either pay vendors like OpenAI for API usage or cover expensive computing resources yourself. To stay profitable, businesses adopting AI must charge their customers for usage. This is why a growing number of products are adopting usage-based pricing models, at least partially, for these new AI-powered features. The trend is likely to accelerate. Today, some businesses don’t charge their customers for AI capabilities in order to stay competitive, but because this hurts their long-term profitability, it’s expected that eventually everyone will have a usage-based pricing component that passes AI costs on to customers transparently.

To attribute AI usage to your users, it is crucial to implement accurate metering that can handle scale and report usage for any given period, both to power usage-based billing and to drive product features like in-app usage reporting. In this article, we will explore how to meter OpenAI API usage, enabling you to bill for this usage and gain valuable insights into your customers’ usage patterns, which you can use, for example, to optimize your pricing strategies.

#1. Collecting Token Usage

When you integrate AI solutions into your products, the charges are usually consumption-based, measured in units such as the number of tokens, API calls, images generated, or GPU time used. OpenAI, for instance, charges for language models like ChatGPT based on token usage. You can think of tokens as pieces of words, and their cost varies depending on the AI model you're using (for example, GPT-3.5 or GPT-4).

import { Configuration, OpenAIApi } from 'openai';

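// Credentials are read from the environment; the variable names are an assumption.
const configuration = new Configuration({
  organization: process.env.OPENAI_ORGANIZATION,
  apiKey: process.env.OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);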

const { data } = await openai.createChatCompletion({
  model: 'gpt-3.5-turbo',
  messages: [
    {
      role: 'user',
      content: 'Hello world',
    },
  ],
});

console.log(data.usage);
// { prompt_tokens: 10, completion_tokens: 11, total_tokens: 21 }

console.log(data.id);
// chatcmpl-7TZHnpx8mbiVeCs02oITlyVOHyOuA

console.log(new Date(data.created * 1000));
// 2023-01-01T00:00:01.000Z

To measure how many tokens an API call uses, we can look at the OpenAI API response that conveniently returns the token usage in the data.usage.total_tokens field, so we don’t have to tokenize the prompt and response manually.
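As a minimal sketch of this step, the fields we care about can be pulled out of the response body into a flat usage record. The helper name `toUsageRecord`, the record's field names, and the sample response below are assumptions for illustration and are not part of the OpenAI SDK:

```javascript
// Hypothetical helper (not part of the OpenAI SDK): flattens a chat
// completion response body into a usage record for a given user.
function toUsageRecord(completion, userId) {
  const usage = completion.usage ?? {};
  return {
    id: completion.id,
    userId,
    model: completion.model,
    promptTokens: usage.prompt_tokens ?? 0,
    completionTokens: usage.completion_tokens ?? 0,
    totalTokens: usage.total_tokens ?? 0,
    // `created` is a Unix timestamp in seconds.
    time: new Date(completion.created * 1000).toISOString(),
  };
}

// Sample response body shaped like the output shown above (values are illustrative).
const sampleCompletion = {
  id: 'chatcmpl-7TZHnpx8mbiVeCs02oITlyVOHyOuA',
  created: 1672531201,
  model: 'gpt-3.5-turbo',
  usage: { prompt_tokens: 10, completion_tokens: 11, total_tokens: 21 },
};

const usageRecord = toUsageRecord(sampleCompletion, '1');
console.log(usageRecord.totalTokens, usageRecord.time);
```

Keeping the response `id` on the record is a deliberate choice: it gives every usage record a natural unique key for later deduplication.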

Now that we know how many tokens a single API call uses, let’s meter usage across multiple calls.

#2. Metering Token Usage

In a production application, many customers can simultaneously execute a large volume of OpenAI API calls on various server instances. We need an efficient solution to accurately collect and attribute usage from these sources to specific users, and aggregate it for billing and analytics purposes.

This is basically a multi-tenancy chargeback use case where we want to attribute a shared resource, in this case, OpenAI usage, to our customers. You can use the same idea to attribute other shared resources to your customers, like storage, network transfer, and compute.

Let’s discuss our options for metering usage:

#Things That Don’t Scale: Database Writes

One approach on a smaller scale is to write every usage record to a database like MongoDB or PostgreSQL, including the timestamp, total tokens used, and user ID, for example:

INSERT INTO usage (
  id,
  user_id,
  tokens,
  date
)
VALUES (
  'chatcmpl-7TZHnpx8mbiVeCs02oITlyVOHyOuA',
  1,
  21,
  '2023-01-01 00:00:01'
);

Using SQL queries makes it convenient to retrieve the usage data for a specific time period, such as:

SELECT
  user_id,
  SUM(tokens)
FROM
  usage
WHERE
  date >= '2023-01-01'
  AND date < '2023-02-01'
GROUP BY
  user_id;

However, this approach can become expensive at scale. Heavy write loads on a database impact costs, increase failover times, enlarge backup sizes, and potentially lead to replication lag. So, what alternatives do we have when we need to track a larger volume of usage?

#Things That Are Not Accurate: Metrics Systems

While metrics systems may seem appealing for tracking usage data at scale because they offer collectors and time-series capabilities, they often lack the consistency guarantees required for use cases like billing. Usage data can be lost during transport or double-counted during retries. Therefore, we need a solution that provides stronger guarantees and ensures idempotency.
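To make the idempotency requirement concrete, here is a small in-memory sketch: each event carries a unique ID, and an event that arrives twice (for example, after a retry) is counted only once. The `IdempotentMeter` class is hypothetical and exists only for illustration; a production system would need durable storage for the seen IDs.

```javascript
// Hypothetical in-memory meter: ignores events whose ID it has already seen,
// so retried deliveries do not double-count usage.
class IdempotentMeter {
  constructor() {
    this.seenIds = new Set();
    this.totals = new Map(); // userId -> total tokens
  }

  // Returns true if the event was counted, false if it was a duplicate.
  record({ id, userId, tokens }) {
    if (this.seenIds.has(id)) return false;
    this.seenIds.add(id);
    this.totals.set(userId, (this.totals.get(userId) ?? 0) + tokens);
    return true;
  }

  total(userId) {
    return this.totals.get(userId) ?? 0;
  }
}

const meter = new IdempotentMeter();
const usageEvent = { id: 'chatcmpl-7TZHnpx8mbiVeCs02oITlyVOHyOuA', userId: '1', tokens: 21 };

const firstAttempt = meter.record(usageEvent);
const retriedAttempt = meter.record(usageEvent); // same ID, e.g. a network retry
console.log(firstAttempt, retriedAttempt, meter.total('1'));
```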

#Things That Scale: Aggregating and Stream Processing

One common alternative for scalable usage metering is pre-aggregating usage and writing it in batches to reduce database load. This is usually done in a separate system that uses a message queue to aggregate usage across processes and minimize data loss on failure. A common solution is to use a message queue like Kafka to ingest usage events and continuously aggregate them with a stream processor like Kafka Streams, Flink, or ksqlDB. This approach only requires writing aggregated values to long-term storage, significantly reducing write frequency and, thus, costs. For example, if an application sends millions of usage events per second, we can aggregate them into coarser per-minute tumbling windows. The message queue also acts as a buffer: during heavy traffic spikes, our data processing may lag behind, but we never lose usage and, more importantly, we charge customers accurately.
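The per-minute tumbling window idea can be sketched in plain JavaScript. This is only an in-memory illustration of the windowing logic, not Kafka Streams, Flink, or ksqlDB code; the function name `aggregateByMinute` and the event shape are assumptions:

```javascript
const WINDOW_MS = 60_000; // one-minute tumbling windows

// Hypothetical helper: sums tokens per user per one-minute window.
// Each event falls into exactly one window, identified by the window start.
function aggregateByMinute(events) {
  const windows = new Map();
  for (const { userId, tokens, time } of events) {
    const timestamp = Date.parse(time);
    const windowStart = Math.floor(timestamp / WINDOW_MS) * WINDOW_MS;
    const key = `${userId}|${windowStart}`;
    const row = windows.get(key) ?? {
      userId,
      windowStart: new Date(windowStart).toISOString(),
      tokens: 0,
    };
    row.tokens += tokens;
    windows.set(key, row);
  }
  return [...windows.values()];
}

// Four raw events collapse into three aggregated rows.
const aggregated = aggregateByMinute([
  { userId: '1', tokens: 21, time: '2023-01-01T00:00:01Z' },
  { userId: '1', tokens: 9, time: '2023-01-01T00:00:30Z' },
  { userId: '2', tokens: 7, time: '2023-01-01T00:00:10Z' },
  { userId: '1', tokens: 5, time: '2023-01-01T00:01:05Z' },
]);
console.log(aggregated);
```

Only the aggregated rows need to reach long-term storage, which is where the reduction in write volume comes from.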

Despite its advantages, managing a stream processing system can be challenging, especially for companies without in-house streaming engineering expertise. Can we have this out of the box? Yes, see the next section.

#3. Using OpenMeter

As highlighted in the previous section, engineers often face challenges balancing cost, accuracy, and complexity when implementing usage metering solutions. I encountered these trade-offs firsthand while working in Stripe’s database org, where I had to collect database usage for financial analysis. This eventually led to the decision to build OpenMeter, an open-source project that helps engineers meter and attribute AI and compute usage for billing, chargeback, and analytics use cases. In the background, OpenMeter leverages Kafka and stream processing, as mentioned above, so you don’t have to worry about accuracy and scale.

Let’s revisit our previous OpenAI usage metering example, this time with OpenMeter’s Node.js SDK:

import { OpenMeter } from '@openmeter/sdk';

const openmeter = new OpenMeter({ baseUrl: 'http://localhost:8888' });

await openmeter.ingestEvents({
  specversion: '1.0',
  id: data.id,
  source: 'openai',
  type: 'openai',
  subject: '1',
  time: new Date(data.created * 1000).toISOString(),
  data: {
    // Token count reported by the OpenAI response from the first example
    tokens: data.usage.total_tokens,
    model: 'gpt-3.5-turbo',
  },
});
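The event payload above can also be produced by a small mapping function, which keeps the OpenAI-specific field names in one place. This is a sketch under assumptions: `toUsageEvent` is a hypothetical helper, not part of the OpenMeter SDK, and the sample response values are illustrative.

```javascript
// Hypothetical helper (not part of the OpenMeter SDK): maps an OpenAI chat
// completion response body to the event shape used in the example above.
function toUsageEvent(completion, subject) {
  return {
    specversion: '1.0',
    id: completion.id, // reusing the OpenAI response ID keeps retries idempotent
    source: 'openai',
    type: 'openai',
    subject,
    time: new Date(completion.created * 1000).toISOString(),
    data: {
      tokens: completion.usage.total_tokens,
      model: completion.model,
    },
  };
}

// Illustrative response body; in the application this would be `data`
// returned by the OpenAI call.
const usageEventPayload = toUsageEvent(
  {
    id: 'chatcmpl-7TZHnpx8mbiVeCs02oITlyVOHyOuA',
    created: 1672531201,
    model: 'gpt-3.5-turbo',
    usage: { prompt_tokens: 10, completion_tokens: 11, total_tokens: 21 },
  },
  '1',
);
console.log(usageEventPayload);
```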

OpenMeter uses the CloudEvents format to describe usage events and deduplicates them by their id and source fields. Finally, let’s retrieve the hourly usage for business use cases like billing or real-time analytics:

const { data } = await openmeter.getValuesByMeterId(
  { meterId: 'm1' },
  {
    subject: 'my-subject',
    windowSize: 'HOURLY',
    from: new Date('2023-01-01'),
    to: new Date('2023-02-01'),
  },
);

Learn more about integration for LangChain in our documentation.

#Summary

In this article, we explored how the rise of AI features in products drives the adoption of usage-based pricing and how companies need to stay within margins and pass the cost of API usage like ChatGPT to their users.

We recognized the importance of accurately attributing usage to users and the challenges of implementing a scalable solution. For example, database writes can be expensive, and metrics systems are not accurate enough for usage metering use cases like billing. Finally, we looked at the open-source usage metering solution OpenMeter and how it can help implement accurate, real-time metering faster. By leveraging OpenMeter, companies can efficiently attribute usage to users, enabling effective billing and analytics processes.

Check out our GitHub: https://github.com/openmeterio/openmeter

Peter Marton (@slashdotpeter)