Low Latency Usage Limit Enforcement

Entitlements and Usage Balances on the Edge

Peter Marton (@slashdotpeter)
To control costs, AI and cloud companies must be able to enforce usage limits inside the infrastructure that handles their online traffic. Because the check sits in the request path, it must add very little latency, or the user experience suffers.

Today, we are excited to announce the private beta of OpenMeter's Edge Access, which provides low-latency usage limit enforcement.

Our globally distributed data plane offers quick access to balance information from multiple regions and enables usage limit enforcement for online traffic. If you are interested in testing Edge Access, please contact us.

Who is this for?

Companies that need to enforce usage limits and entitlements with low latency:

  • Enforcing usage limits as part of customer interactions, such as LLM prompts
  • Enforcing API request limits defined in pricing

Getting Started

Edge Access is currently in private beta.

How it Works

OpenMeter maintains a globally distributed dataset that holds up-to-date information about customer balances and the value of your entitlements. This dataset is updated every time a new usage event is processed or the state of an entitlement changes. Edge Access is designed for quick and frequent reads, so you can check balances and access as part of your online traffic. Requests from your application are routed to the closest region to provide low-latency reads (<=25ms). Although reads are quick, the metered balance is eventually consistent: it can take several seconds for the balance to reflect the latest usage events you sent.
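
Because the balance can lag behind recent usage by a few seconds, a caller may want to account locally for usage it has just admitted. The following is a minimal sketch of that idea, not OpenMeter's API: `EdgeBalance`, `UsageGate`, `fetch_balance`, and `lag_seconds` are all hypothetical names, and the balance lookup is injected as a plain function so that any client can be plugged in.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class EdgeBalance:
    """Hypothetical shape of a balance read; not an actual OpenMeter type."""
    has_access: bool
    balance: float


@dataclass
class UsageGate:
    """Admits requests against an eventually consistent balance.

    Usage admitted within the last `lag_seconds` is assumed not to be
    reflected in the edge balance yet, so it is subtracted locally.
    """
    fetch_balance: Callable[[str, str], EdgeBalance]  # assumed lookup function
    lag_seconds: float = 5.0                          # assumed propagation delay
    clock: Callable[[], float] = time.monotonic
    _pending: Dict[Tuple[str, str], List[Tuple[float, float]]] = field(
        default_factory=dict
    )

    def allow(self, customer: str, feature: str, cost: float) -> bool:
        now = self.clock()
        key = (customer, feature)
        # Forget local records old enough that the edge balance should include them.
        recent = [
            (ts, c) for ts, c in self._pending.get(key, [])
            if now - ts < self.lag_seconds
        ]
        edge = self.fetch_balance(customer, feature)
        pending_total = sum(c for _, c in recent)
        allowed = bool(edge.has_access and edge.balance - pending_total >= cost)
        if allowed:
            # Remember this usage until the edge balance catches up.
            recent.append((now, cost))
        self._pending[key] = recent
        return allowed
```

In use, `fetch_balance` would wrap whatever call returns the customer's balance from the nearest region, and `allow` would run before serving each request. The sketch is deliberately conservative: if the edge balance updates sooner than `lag_seconds`, recent usage is briefly counted twice, which can reject a request that would have fit but never admits one that exceeds the limit on a single instance.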