At OpenMeter, we're privileged to work alongside exceptional teams, helping them monetize their innovative products and features. Our collaborations have painted a clear picture of what leading companies seek to meter. This article explores the most popular metering use cases we've encountered.
LLMs Run on Tokens
The launch of ChatGPT by OpenAI has set a precedent: charging based on token usage for generative AI, a model driven by both customer value and cost. Token count not only reflects the amount of information processed by foundation models but also correlates with operating costs. Given the high per-token cost across various LLMs, token-based billing provides a margin safety net when utilizing APIs from OpenAI, Anthropic, or other LLM providers.
Recommendations for Token Usage Metering
Considering the variation in charges across different prompt types and models, we suggest grouping token usage by prompt type (input, output, and system) and model version (gpt-3.5, gpt-4).
Example meter definition with OpenMeter:
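A sketch of what such a meter could look like in OpenMeter's YAML configuration. The `prompt` event type and the `$.tokens`, `$.type`, and `$.model` payload fields are illustrative assumptions, not a prescribed schema:

```yaml
# Illustrative meter: sums token counts, grouped by prompt type and model.
# Event type and payload field names are assumptions for this example.
meters:
  - slug: tokens_total
    description: AI token usage
    eventType: prompt
    aggregation: SUM
    valueProperty: $.tokens
    groupBy:
      type: $.type    # input, output, or system
      model: $.model  # e.g. gpt-3.5, gpt-4
```

With this grouping, you can price input, output, and system tokens differently per model version without defining a separate meter for each combination.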
GPUs Power AI
GPUs power AI workloads and model training everywhere, with demand ultimately contributing to the 2023 GPU shortage. OpenAI reportedly used 10,000 Nvidia GPUs to train its models. It is no surprise that GPU time is expensive and must be metered accurately. We see a growing number of OpenMeter users metering GPU time at one-second granularity for monetization and cost-control use cases.
Recommendations for Metering GPU Time
We advocate for heartbeat-style metering, where the process periodically reports its status, simplifying implementation and avoiding the complexities of start-stop log analysis, which is error-prone and especially complex for long-running processes that overlap with billing period changes. You can read more about execution time metering in our previous blog post.
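To illustrate the heartbeat approach, each periodic report could be a CloudEvent carrying the GPU seconds elapsed since the previous report. The event type and the `seconds` and `gpu_type` data fields are assumptions for this sketch:

```json
{
  "specversion": "1.0",
  "type": "gpu-time",
  "id": "c7b7f2d0-0000-0000-0000-000000000000",
  "source": "training-job",
  "subject": "customer-1",
  "time": "2024-01-01T00:01:00Z",
  "data": {
    "seconds": 60,
    "gpu_type": "a100"
  }
}
```

Because each heartbeat is an independent, additive delta, a crashed process simply stops reporting, and a billing-period boundary falls cleanly between two events rather than splitting a long start-stop interval.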
Example meter definition with OpenMeter:
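A sketch of a matching meter in OpenMeter's YAML configuration, summing the reported seconds per heartbeat. The `gpu-time` event type and the `$.seconds` and `$.gpu_type` payload fields are illustrative assumptions:

```yaml
# Illustrative meter: sums GPU seconds from heartbeat events,
# grouped by GPU type. Field names are assumptions for this example.
meters:
  - slug: gpu_seconds
    description: GPU execution time
    eventType: gpu-time
    aggregation: SUM
    valueProperty: $.seconds
    groupBy:
      gpu_type: $.gpu_type  # e.g. a100, h100
```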
Multi-Tenancy and Cloud Cost
With cloud costs significantly impacting COGS, companies prioritize metering their priciest resources. This includes attributing compute, storage, and network usage within multi-tenant setups to respective consumers and teams. Examples include metering Kubernetes pod runtime, database storage, and ingested data volume.
Recommendations to meter multi-tenant resources
Given the complexity of modern systems, we recommend focusing on the one or two consumption metrics that best align with costs, rather than trying to meter every aspect of a distributed system. For example, you might meter storage and query usage for a database instead of metering every component involved in running it. This is efficient because the costs of most components move together anyway: backup cost, for instance, correlates with storage cost.
Example meter definition with OpenMeter:
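A sketch of a database storage meter in OpenMeter's YAML configuration. The `storage` event type and the `$.gb_hours` payload field are illustrative assumptions; the tenant would typically be carried in the event's subject:

```yaml
# Illustrative meter: sums periodically reported storage consumption.
# Event type and payload field name are assumptions for this example.
meters:
  - slug: storage_gb_hours
    description: Database storage consumption
    eventType: storage
    aggregation: SUM
    valueProperty: $.gb_hours
```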
APIs Are Here to Stay
Billing based on API usage remains a popular pricing model. In serverless architectures, this extends to measuring the duration of API calls.
Recommendations to meter API calls
We recommend masking path parameters in your metering strategy to manage dataset cardinality and keep it human-readable. For example, it is much easier to search for /products/:product_id instead of having 10,000 different endpoint paths due to various product IDs.
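Path masking can be done with a simple substitution step before events are emitted. The sketch below is a hypothetical helper, assuming routes like /products/{id} and /users/{id}; real applications would usually read the route template from their router instead of maintaining regexes by hand:

```python
import re

def mask_path(path: str) -> str:
    """Collapse raw URL paths into route templates so the metering
    dataset's cardinality stays bounded and human-readable."""
    # Replace variable segments with named placeholders.
    path = re.sub(r"/products/[^/]+", "/products/:product_id", path)
    path = re.sub(r"/users/[^/]+", "/users/:user_id", path)
    return path

print(mask_path("/products/8f14e45f/reviews"))  # /products/:product_id/reviews
```

The masked path then becomes a low-cardinality groupBy dimension on an API-call meter, so 10,000 product IDs roll up into a single /products/:product_id row.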
Summary
In 2024, we see a rising need to meter AI resources such as LLMs and GPUs. Companies adopting AI features need tighter cost control, which drives the adoption of more granular usage-based pricing models. By implementing accurate metering, companies can ensure fair billing, optimize costs, and gain insight into customer behavior.