
Nvidia Run:ai

OpenMeter can integrate with Nvidia's Run:ai to collect allocated and utilized resources for your AI/ML workloads, including GPUs, CPUs, and memory. This is useful for companies that run GPU workloads on Run:ai and want to bill and invoice their customers based on the consumption of allocated and utilized resources.

How it works

You can install the OpenMeter Collector as a Kubernetes pod in your Run:ai cluster to collect metrics automatically. The collector periodically scrapes metrics from your Run:ai platform and emits them as CloudEvents to your OpenMeter instance, allowing you to track usage and billing for your Run:ai workloads.

Once you have the usage data ingested into OpenMeter, you can use it to set up prices and billing for your customers based on their usage.

Example

Let's say you want to charge your customers $0.20 per GPU minute and $0.05 per CPU core minute. The OpenMeter Collector will emit the following events every 30 seconds from your Run:ai workloads to OpenMeter Cloud:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "specversion": "1.0",
  "type": "workload",
  "source": "run_ai",
  "time": "2025-01-01T00:00:00Z",
  "subject": "my-customer-id",
  "data": {
    "name": "my-runai-workload",
    "namespace": "my-runai-benchmark-test",
    "phase": "Running",
    "project": "my-project-id",
    "department": "my-department-id",
    // Workload running for a minute
    "workload_minutes": 1.0,
    // 96 CPU cores for a minute (m5a.24xlarge)
    "cpu_limit_core_minutes": 96,
    "cpu_request_core_minutes": 96,
    "cpu_usage_core_minutes": 80,
    // 384 GB of CPU memory for a minute (m5a.24xlarge)
    "cpu_memory_limit_gigabyte_minutes": 384,
    "cpu_memory_request_gigabyte_minutes": 384,
    "cpu_memory_usage_gigabyte_minutes": 178,
    // 1 GPU for a minute
    "gpu_allocation_minutes": 1,
    "gpu_usage_minutes": 1,
    // 40 GB of GPU memory for a minute
    "gpu_memory_request_gigabyte_minutes": 40,
    "gpu_memory_usage_gigabyte_minutes": 27
  }
}
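
With the prices above, and assuming you bill on allocated rather than utilized resources and apply the CPU price per core, this one-minute event is worth 1 × $0.20 + 96 × $0.05 = $5.00.
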
Normalized Usage

Note how the collector normalizes the collected metrics to minutes (configurable), making it easy to set per-second, per-minute, or per-hour pricing, similar to how AWS EC2 pricing works.

See the OpenMeter Billing docs to set up prices and billing for your customers.
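
If you run OpenMeter yourself, you can then define a meter that sums the normalized usage per customer. Below is a minimal sketch using the self-hosted configuration format; the slug and groupBy keys are illustrative, and in OpenMeter Cloud you would create the meter via the UI or API instead:

meters:
  # Sum the normalized GPU minutes per subject (customer)
  - slug: gpu_minutes
    description: Allocated GPU minutes collected from Run:ai
    eventType: workload
    aggregation: SUM
    valueProperty: $.gpu_allocation_minutes
    groupBy:
      project: $.project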

Run:ai Metrics

The OpenMeter Collector supports the following Run:ai metrics:

Pod Metrics

| Metric Name | Description |
|---|---|
| GPU_UTILIZATION_PER_GPU | GPU utilization percentage per individual GPU |
| GPU_UTILIZATION | Overall GPU utilization percentage for the pod |
| GPU_MEMORY_USAGE_BYTES_PER_GPU | GPU memory usage in bytes per individual GPU |
| GPU_MEMORY_USAGE_BYTES | Total GPU memory usage in bytes for the pod |
| CPU_USAGE_CORES | Number of CPU cores currently being used |
| CPU_MEMORY_USAGE_BYTES | Amount of CPU memory currently being used in bytes |
| GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU | Graphics engine utilization percentage per GPU |
| GPU_SM_ACTIVITY_PER_GPU | Streaming Multiprocessor (SM) activity percentage per GPU |
| GPU_SM_OCCUPANCY_PER_GPU | SM occupancy percentage per GPU |
| GPU_TENSOR_ACTIVITY_PER_GPU | Tensor core utilization percentage per GPU |
| GPU_FP64_ENGINE_ACTIVITY_PER_GPU | FP64 (double precision) engine activity percentage per GPU |
| GPU_FP32_ENGINE_ACTIVITY_PER_GPU | FP32 (single precision) engine activity percentage per GPU |
| GPU_FP16_ENGINE_ACTIVITY_PER_GPU | FP16 (half precision) engine activity percentage per GPU |
| GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU | Memory bandwidth utilization percentage per GPU |
| GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU | NVLink transmitted bandwidth per GPU |
| GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU | NVLink received bandwidth per GPU |
| GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU | PCIe transmitted bandwidth per GPU |
| GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU | PCIe received bandwidth per GPU |
| GPU_SWAP_MEMORY_BYTES_PER_GPU | Amount of GPU memory swapped to system memory per GPU |

Workload Metrics

| Metric Name | Description |
|---|---|
| GPU_UTILIZATION | Overall GPU utilization percentage across all GPUs in the workload |
| GPU_MEMORY_USAGE_BYTES | Total GPU memory usage in bytes across all GPUs |
| GPU_MEMORY_REQUEST_BYTES | Requested GPU memory in bytes for the workload |
| CPU_USAGE_CORES | Number of CPU cores currently being used |
| CPU_REQUEST_CORES | Number of CPU cores requested for the workload |
| CPU_LIMIT_CORES | Maximum number of CPU cores allowed for the workload |
| CPU_MEMORY_USAGE_BYTES | Amount of CPU memory currently being used in bytes |
| CPU_MEMORY_REQUEST_BYTES | Requested CPU memory in bytes for the workload |
| CPU_MEMORY_LIMIT_BYTES | Maximum CPU memory allowed in bytes for the workload |
| POD_COUNT | Total number of pods in the workload |
| RUNNING_POD_COUNT | Number of currently running pods in the workload |
| GPU_ALLOCATION | Number of GPUs allocated to the workload |

Getting Started

First, create a new YAML file for the collector configuration. You will have to use the run_ai Redpanda Connect input:

input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    schedule: '*/30 * * * * *'
    metrics_offset: '30s'
    resource_type: 'workload'
    metrics:
      - CPU_LIMIT_CORES
      - CPU_MEMORY_LIMIT_BYTES
      - CPU_MEMORY_REQUEST_BYTES
      - CPU_MEMORY_USAGE_BYTES
      - CPU_REQUEST_CORES
      - CPU_USAGE_CORES
      - GPU_ALLOCATION
      - GPU_MEMORY_REQUEST_BYTES
      - GPU_MEMORY_USAGE_BYTES
      - GPU_UTILIZATION
      - POD_COUNT
      - RUNNING_POD_COUNT
    http:
      timeout: 30s
      retry_count: 1
      retry_wait_time: 100ms
      retry_max_wait_time: 1s

The above section will tell Redpanda Connect how to collect metrics from your Run:ai platform.

Configuration Options

| Option | Description | Default | Required |
|---|---|---|---|
| url | Run:ai base URL | - | Yes |
| app_id | Run:ai app ID | - | Yes |
| app_secret | Run:ai app secret | - | Yes |
| resource_type | Run:ai resource to collect metrics from (workload or pod) | workload | No |
| metrics | List of Run:ai metrics to collect | All available | No |
| schedule | Cron expression for the scrape interval | */30 * * * * * | No |
| metrics_offset | Time offset for queries to account for delays in metric availability | 0s | No |
| http | HTTP client configuration | - | No |

The collector supports all the metrics available for both workloads and pods; visit the Run:ai API docs for more information.

Next, you need to configure the mapping from Run:ai metrics to CloudEvents using Bloblang:

pipeline:
  processors:
    - mapping: |
        # Scrape interval in seconds (parse_duration returns nanoseconds)
        let duration_seconds = (meta("scrape_interval").parse_duration() / 1000 / 1000 / 1000).round().int64()
        # Normalize resources to minutes over the scrape interval
        let gpu_allocation_minutes = this.allocatedResources.gpu.number(0) * $duration_seconds / 60
        let cpu_limit_core_minutes = this.metrics.values.CPU_LIMIT_CORES.number(0) * $duration_seconds / 60
        # Add metrics as needed...

        root = {
          "id": uuid_v4(),
          "specversion": "1.0",
          "type": meta("resource_type"),
          "source": "run_ai",
          "time": now(),
          "subject": this.name,
          "data": {
            "tenant": this.tenantId,
            "project": this.projectId,
            "department": this.departmentId,
            "cluster": this.clusterId,
            "type": this.type,
            "gpuAllocationMinutes": $gpu_allocation_minutes,
            "cpuLimitCoreMinutes": $cpu_limit_core_minutes
          }
        }
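
With the default 30-second schedule, the mapping above produces an event like the one below. The values are illustrative: 1 allocated GPU and a 96-core CPU limit normalize to 0.5 GPU minutes and 48 CPU core minutes per scrape:

{
  "id": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
  "specversion": "1.0",
  "type": "workload",
  "source": "run_ai",
  "time": "2025-01-01T00:00:30Z",
  "subject": "my-runai-workload",
  "data": {
    "tenant": "my-tenant-id",
    "project": "my-project-id",
    "department": "my-department-id",
    "cluster": "my-cluster-id",
    "type": "Training",
    "gpuAllocationMinutes": 0.5,
    "cpuLimitCoreMinutes": 48
  }
}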

Finally, you need to configure the OpenMeter output:

output:
  label: 'openmeter'
  http_client:
    url: '${OPENMETER_URL:https://openmeter.cloud}/api/v1/events'
    verb: POST
    headers:
      # Optional: API key for OpenMeter Cloud
      Authorization: 'Bearer ${OPENMETER_TOKEN:}'
      Content-Type: 'application/json'
    # Batch settings for efficient API usage
    batching:
      # Send up to 100 events in a single request
      count: 100
      # Or send after 1 second, whichever comes first
      period: 1s
      processors:
        # Convert batch to JSON array format
        - archive:
            format: json_array
    dump_request_log_level: DEBUG

Read more about configuring Redpanda Connect in the OpenMeter Collector guide.

Scheduling

The collector runs on a schedule defined by the schedule parameter using cron syntax. It supports:

  • Standard cron expressions (e.g., */30 * * * * * for every 30 seconds)
  • Duration syntax with the @every prefix (e.g., @every 30s), as shown below
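
For example, the following two schedules are equivalent and scrape every 30 seconds (a minimal sketch; other input options omitted):

input:
  run_ai:
    # Duration syntax; equivalent to the cron expression '*/30 * * * * *'
    schedule: '@every 30s'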

Resource Types

The collector can collect metrics from two different resource types:

  • workload - Collects metrics at the workload level, which represents a group of pods
  • pod - Collects metrics at the individual pod level, as shown in the example below
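
For example, to collect metrics at the pod level instead of the default workload level (a minimal sketch; other input options omitted):

input:
  run_ai:
    resource_type: 'pod'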

Installation

Check out the OpenMeter Collector guide for installation instructions.