Nvidia Run:ai
OpenMeter integrates with Nvidia's Run:ai to collect allocated and utilized resources for your AI/ML workloads, including GPUs, CPUs, and memory. This is useful for companies that run GPU workloads on Run:ai and want to bill and invoice their customers based on the resources they allocate and use.
How it works
You install the OpenMeter Collector as a Kubernetes pod in your Run:ai cluster. The collector periodically scrapes metrics from the Run:ai platform and emits them as CloudEvents to your OpenMeter instance, so you can track usage and billing for your Run:ai workloads.
Once the usage data is ingested into OpenMeter, you can use it to set up pricing and billing for your customers based on their usage.
Example
Let's say you want to charge your customers $0.20 per GPU minute and $0.05 per CPU minute. The OpenMeter Collector emits events like the following from your Run:ai workloads to OpenMeter Cloud every 30 seconds (configurable):
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "specversion": "1.0",
  "type": "workload",
  "source": "run_ai",
  "time": "2025-01-01T00:00:00Z",
  "subject": "my-customer-id",
  "data": {
    "name": "my-runai-workload",
    "namespace": "my-runai-benchmark-test",
    "phase": "Running",
    "project": "my-project-id",
    "department": "my-department-id",
    // Workload running for a minute
    "workload_minutes": 1.0,
    // 96 CPU cores for a minute (m5a.24xlarge)
    "cpu_limit_core_minutes": 96,
    "cpu_request_core_minutes": 96,
    "cpu_usage_core_minutes": 80,
    // 384 GB of CPU memory for a minute (m5a.24xlarge)
    "cpu_memory_limit_gigabyte_minutes": 384,
    "cpu_memory_request_gigabyte_minutes": 384,
    "cpu_memory_usage_gigabyte_minutes": 178,
    // 1 GPU for a minute
    "gpu_allocation_minutes": 1,
    "gpu_usage_minutes": 1,
    // 40 GB of GPU memory for a minute
    "gpu_memory_request_gigabyte_minutes": 40,
    "gpu_memory_usage_gigabyte_minutes": 27
  }
}
Note how the collector normalizes the collected metrics to minutes (configurable), making it easy to set per-second, per-minute, or per-hour pricing, similar to how AWS EC2 pricing works.
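To continue the pricing example: if you bill on allocated resources (an illustrative choice; you could equally bill on the usage fields) and treat a CPU minute as one core running for one minute, the event above represents 1 × $0.20 + 96 × $0.05 = $5.00 of usage. OpenMeter sums these values across all events in the billing period.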
See the OpenMeter Billing docs to set up pricing and billing for your customers.
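As a starting point, here is a minimal sketch of how meters for the event above could be defined in a self-hosted OpenMeter configuration (in OpenMeter Cloud you would create the equivalent meters through the UI or API); the slugs, descriptions, and group-by keys are illustrative:

meters:
  # Sum of allocated GPU minutes per customer (the event subject), grouped by Run:ai project
  - slug: gpu_allocation_minutes
    description: Allocated GPU minutes
    eventType: workload
    aggregation: SUM
    valueProperty: $.gpu_allocation_minutes
    groupBy:
      project: $.project
  # Sum of requested CPU core minutes per customer
  - slug: cpu_request_core_minutes
    description: Requested CPU core minutes
    eventType: workload
    aggregation: SUM
    valueProperty: $.cpu_request_core_minutes
    groupBy:
      project: $.project

With meters like these in place, you can attach the per-minute prices from the example to each meter in OpenMeter Billing.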
Run:ai Metrics
The OpenMeter Collector supports the following Run:ai metrics:
Pod Metrics
Metric Name | Description |
---|---|
GPU_UTILIZATION_PER_GPU | GPU utilization percentage per individual GPU |
GPU_UTILIZATION | Overall GPU utilization percentage for the pod |
GPU_MEMORY_USAGE_BYTES_PER_GPU | GPU memory usage in bytes per individual GPU |
GPU_MEMORY_USAGE_BYTES | Total GPU memory usage in bytes for the pod |
CPU_USAGE_CORES | Number of CPU cores currently being used |
CPU_MEMORY_USAGE_BYTES | Amount of CPU memory currently being used in bytes |
GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU | Graphics engine utilization percentage per GPU |
GPU_SM_ACTIVITY_PER_GPU | Streaming Multiprocessor (SM) activity percentage per GPU |
GPU_SM_OCCUPANCY_PER_GPU | SM occupancy percentage per GPU |
GPU_TENSOR_ACTIVITY_PER_GPU | Tensor core utilization percentage per GPU |
GPU_FP64_ENGINE_ACTIVITY_PER_GPU | FP64 (double precision) engine activity percentage per GPU |
GPU_FP32_ENGINE_ACTIVITY_PER_GPU | FP32 (single precision) engine activity percentage per GPU |
GPU_FP16_ENGINE_ACTIVITY_PER_GPU | FP16 (half precision) engine activity percentage per GPU |
GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU | Memory bandwidth utilization percentage per GPU |
GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU | NVLink transmitted bandwidth per GPU |
GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU | NVLink received bandwidth per GPU |
GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU | PCIe transmitted bandwidth per GPU |
GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU | PCIe received bandwidth per GPU |
GPU_SWAP_MEMORY_BYTES_PER_GPU | Amount of GPU memory swapped to system memory per GPU |
Workload Metrics
Metric Name | Description |
---|---|
GPU_UTILIZATION | Overall GPU utilization percentage across all GPUs in the workload |
GPU_MEMORY_USAGE_BYTES | Total GPU memory usage in bytes across all GPUs |
GPU_MEMORY_REQUEST_BYTES | Requested GPU memory in bytes for the workload |
CPU_USAGE_CORES | Number of CPU cores currently being used |
CPU_REQUEST_CORES | Number of CPU cores requested for the workload |
CPU_LIMIT_CORES | Maximum number of CPU cores allowed for the workload |
CPU_MEMORY_USAGE_BYTES | Amount of CPU memory currently being used in bytes |
CPU_MEMORY_REQUEST_BYTES | Requested CPU memory in bytes for the workload |
CPU_MEMORY_LIMIT_BYTES | Maximum CPU memory allowed in bytes for the workload |
POD_COUNT | Total number of pods in the workload |
RUNNING_POD_COUNT | Number of currently running pods in the workload |
GPU_ALLOCATION | Number of GPUs allocated to the workload |
Getting Started
First, create a new YAML file for the collector configuration. You will have to use the run_ai Redpanda Connect input:
input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    schedule: '*/30 * * * * *'
    metrics_offset: '30s'
    resource_type: 'workload'
    metrics:
      - CPU_LIMIT_CORES
      - CPU_MEMORY_LIMIT_BYTES
      - CPU_MEMORY_REQUEST_BYTES
      - CPU_MEMORY_USAGE_BYTES
      - CPU_REQUEST_CORES
      - CPU_USAGE_CORES
      - GPU_ALLOCATION
      - GPU_MEMORY_REQUEST_BYTES
      - GPU_MEMORY_USAGE_BYTES
      - GPU_UTILIZATION
      - POD_COUNT
      - RUNNING_POD_COUNT
    http:
      timeout: 30s
      retry_count: 1
      retry_wait_time: 100ms
      retry_max_wait_time: 1s
The input section above tells Redpanda Connect how to collect metrics from your Run:ai platform.
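The ${RUNAI_URL:}, ${RUNAI_APP_ID:}, and ${RUNAI_APP_SECRET:} placeholders are resolved from environment variables. One way to provide them is through the collector pod's environment, for example from a Kubernetes Secret; the variable names match the config above, while the Secret name below is hypothetical:

# Illustrative snippet from the collector container's pod spec
env:
  - name: RUNAI_URL
    value: 'https://myorg.run.ai'   # your Run:ai base URL
  - name: RUNAI_APP_ID
    valueFrom:
      secretKeyRef:
        name: runai-credentials     # hypothetical Secret holding the app credentials
        key: app_id
  - name: RUNAI_APP_SECRET
    valueFrom:
      secretKeyRef:
        name: runai-credentials
        key: app_secret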
Configuration Options
Option | Description | Default | Required |
---|---|---|---|
url | Run:ai base URL | - | Yes |
app_id | Run:ai app ID | - | Yes |
app_secret | Run:ai app secret | - | Yes |
resource_type | Run:ai resource to collect metrics from (workload or pod) | workload | No |
metrics | List of Run:ai metrics to collect | All available | No |
schedule | Cron expression for the scrape interval | */30 * * * * * | No |
metrics_offset | Time offset for queries to account for delays in metric availability | 0s | No |
http | HTTP client configuration | - | No |
The collector supports all available metrics for both workloads and pods; see the Run:ai API docs for more information.
Next, configure the mapping from Run:ai metrics to CloudEvents using Bloblang:
pipeline:
  processors:
    - mapping: |
        let duration_seconds = (meta("scrape_interval").parse_duration() / 1000 / 1000 / 1000).round().int64()
        let gpu_allocation_minutes = this.allocatedResources.gpu.number(0) * $duration_seconds / 60
        let cpu_limit_core_minutes = this.metrics.values.CPU_LIMIT_CORES.number(0) * $duration_seconds / 60
        # Add metrics as needed...
        root = {
          "id": uuid_v4(),
          "specversion": "1.0",
          "type": meta("resource_type"),
          "source": "run_ai",
          "time": now(),
          "subject": this.name,
          "data": {
            "tenant": this.tenantId,
            "project": this.projectId,
            "department": this.departmentId,
            "cluster": this.clusterId,
            "type": this.type,
            "gpuAllocationMinutes": $gpu_allocation_minutes,
            "cpuLimitCoreMinutes": $cpu_limit_core_minutes
          }
        }
Finally, you need to configure the OpenMeter output:
# Send processed events to OpenMeter
output:
  label: 'openmeter'
  drop_on:
    error: false
    error_patterns:
      - Bad Request
    output:
      http_client:
        url: '${OPENMETER_URL:https://openmeter.cloud}/api/v1/events'
        verb: POST
        headers:
          Authorization: 'Bearer ${OPENMETER_TOKEN:}'
          Content-Type: 'application/json'
        timeout: 30s
        retry_period: 15s
        retries: 3
        max_retry_backoff: 1m
        # Maximum number of concurrent requests
        max_in_flight: 64
        batch_as_multipart: false
        drop_on:
          - 400
        # Batch settings for efficient API usage
        batching:
          # Send up to 100 events in a single request
          count: 100
          # Or send after 1 second, whichever comes first
          period: 1s
          processors:
            # Track metrics on sent events
            - metric:
                type: counter
                name: openmeter_events_sent
                value: 1
            # Convert batch to JSON array format
            - archive:
                format: json_array
        dump_request_log_level: DEBUG
Read more about configuring Redpanda Connect in the OpenMeter Collector guide.
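The input, pipeline, and output sections above belong in a single Redpanda Connect configuration file for the collector. Schematically (the filename is illustrative):

# collector-config.yaml
input:
  run_ai:
    # ... Run:ai input configuration from Getting Started ...
pipeline:
  processors:
    # ... Bloblang mapping to CloudEvents ...
output:
  label: 'openmeter'
  # ... OpenMeter output configuration ...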
Scheduling
The collector runs on a schedule defined by the schedule parameter using cron syntax. It supports:
- Standard cron expressions (e.g., */30 * * * * * for every 30 seconds)
- Duration syntax with the @every prefix (e.g., @every 30s; see the snippet below)
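For instance, the 30-second schedule from the Getting Started configuration can equivalently be written as:

input:
  run_ai:
    # Scrape every 30 seconds; equivalent to '*/30 * * * * *'
    schedule: '@every 30s'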
Resource Types
The collector can collect metrics from two different resource types:
- workload - Collects metrics at the workload level, which represents a group of pods
- pod - Collects metrics at the individual pod level (see the example below)
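To collect at the pod level instead, switch resource_type and pick metrics from the Pod Metrics table above; the selection below is illustrative:

input:
  run_ai:
    resource_type: 'pod'
    metrics:
      - GPU_UTILIZATION_PER_GPU
      - GPU_MEMORY_USAGE_BYTES_PER_GPU
      - CPU_USAGE_CORES
      - CPU_MEMORY_USAGE_BYTES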
Installation
Check out the OpenMeter Collector guide for installation instructions.