October 2, 2023

How Finance and Engineering Team Up for Cloud Efficiency

Bridging the Gap Between Finance and Engineering to Reduce Cloud Cost

Major tech companies spend a substantial amount on the cloud, so it's no surprise that there's a growing emphasis on cloud savings. Usually, this begins with reducing waste, purchasing discounted reserved instances, and upgrading to the newest VM families. However, if you only focus on the cost of resources without considering the business context, it's hard to see how cloud spending relates to your revenue, products, and customers. You may end up optimizing the most expensive cloud resources while overlooking product features that are costly to run or that generate little to no revenue. The key is to foster collaboration between engineering and finance, which we will explore in this blog post.

Before founding OpenMeter, I was employed at Stripe, where the finance and engineering teams joined forces to understand cloud spending better. This data-driven strategy required high-quality usage data from various infrastructure components, inspiring me to simplify and standardize metering for engineers.

#Bridging the Gap Between Finance and Engineering

As mentioned earlier, traditional efficiency measures often target the cost of individual resources rather than examining cloud spending from a product and business standpoint. While this approach is effective for quick wins, correlating spending with business outcomes yields more sustainable results in the long run.

Let's illustrate this with a simple example. ACME, an observability company, incurs its highest cloud costs from its database, putting its target margins at risk. Initial cost-saving efforts might focus on right-sizing database nodes, upgrading to the latest instance families, enabling backup compression, and so on. These actions are a good starting point because they can produce immediate savings.

Eventually, though, it becomes crucial to understand what drives costs, which means translating cloud costs into business-relevant terms. An effective approach is to compute the cost per transaction, turning cloud costs into a rate aligned with business operations. For instance, if ACME bills customers based on the number of analytics events collected, it needs to determine what it costs to collect, process, and store a single event throughout its architecture. This includes proxy costs, bandwidth, microservice communication, database inserts, backups, long-term storage, and so on. Breaking down transaction costs this way might reveal that storing events indefinitely isn't viable for ACME: it would need to bill customers for long-term storage or introduce data retention terms in its contracts. The insight is that while the database is a major expense, the cause is not merely cloud waste but a lack of alignment between engineering and financial goals.
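To make the cost-per-transaction idea concrete, here is a minimal sketch in Python. Every figure in it (the stage names, the monthly costs, the event volume) is a made-up assumption for the fictional ACME, not data from a real system; the point is only the arithmetic of dividing each stage's cost by the number of events.

```python
# Hypothetical monthly cost (USD) of each stage an event passes through.
monthly_cost_usd = {
    "proxy": 1_200.0,
    "bandwidth": 800.0,
    "microservices": 2_500.0,
    "database_inserts": 4_000.0,
    "backups": 500.0,
    "long_term_storage": 3_000.0,
}
events_per_month = 400_000_000  # hypothetical event volume


def cost_per_event(costs, events):
    """Return the per-event cost of each stage and the total per-event cost."""
    if events <= 0:
        raise ValueError("events must be positive")
    per_stage = {stage: cost / events for stage, cost in costs.items()}
    return per_stage, sum(costs.values()) / events


per_stage, total = cost_per_event(monthly_cost_usd, events_per_month)

# Print the most expensive stages first, expressed per one million events.
for stage, unit_cost in sorted(per_stage.items(), key=lambda kv: -kv[1]):
    print(f"{stage:<18} ${unit_cost * 1_000_000:,.2f} per 1M events")
print(f"{'total':<18} ${total * 1_000_000:,.2f} per 1M events")
```

Expressing the result per million events keeps the numbers readable; comparing that figure with the price charged per million events gives a rough view of margin.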

By calculating the cost per transaction across all your services and normalizing costs around usage, you can uncover surprising insights into what's driving your cloud spending. You can also compare transaction costs between services and teams to guide future architectural decisions. For example, you can investigate why it costs 5x more to process or store an event for one team than for another, identify what they do differently, and determine whether any unnecessary technical complexity can be eliminated.
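A comparison like this can be sketched in a few lines of Python. The team names and numbers below are invented for illustration; the sketch simply normalizes each team's cost by its usage and reports how many times more expensive each one is than the cheapest.

```python
# Hypothetical monthly storage cost (USD) and events stored per team.
teams = {
    "logs": {"cost_usd": 5_000.0, "events": 500_000_000},
    "traces": {"cost_usd": 10_000.0, "events": 200_000_000},
}


def unit_cost_ratios(teams):
    """Return {team: (cost per event, multiple of the cheapest team's cost)}."""
    unit = {name: t["cost_usd"] / t["events"] for name, t in teams.items()}
    baseline = min(unit.values())
    return {name: (cost, cost / baseline) for name, cost in unit.items()}


ratios = unit_cost_ratios(teams)
for name, (cost, multiple) in ratios.items():
    print(f"{name:<8} ${cost * 1_000_000:,.2f} per 1M events ({multiple:.1f}x cheapest)")
```

A large multiple is a prompt for a conversation rather than a verdict: the more expensive team may have stricter durability or latency requirements.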

#Infrastructure Rate Cards

In the previous section, we explored how calculating cost-per-transaction can provide a business-oriented view of cloud costs. Now, let’s dive into how we can break down transaction costs per service using rate cards. A rate card sets the price for a unit that an internal service or team offers. Rate cards can be as broad or detailed as needed and can be built upon one another, much like how the layers of our architecture interact. For instance, a database service might have an internal rate card of $0.30 per GB per month, because it in turn purchases instances from the compute team on a vCPU/second rate card and EBS volumes from AWS.

To establish a rate card, you need to normalize cost by dividing all downstream costs of the service by the sum of all usage (OpenMeter can help with this). For example, all cloud costs associated with database storage are divided by all bytes stored in the database. Rate cards may never be 100% accurate (for instance, you'll need to distribute database node cost between the query and storage rate cards), but they are useful as long as they align with cost drivers.
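As a rough sketch of this normalization in Python, the example below derives a storage and a query rate card for a database service. The costs, usage totals, and the 50/50 split of node cost between storage and queries are all assumptions chosen for illustration; a real split should follow whatever actually drives the node cost.

```python
def rate(total_cost_usd, total_usage):
    """Rate card price: downstream cost divided by total usage."""
    if total_usage <= 0:
        raise ValueError("total_usage must be positive")
    return total_cost_usd / total_usage


# Hypothetical monthly figures for a database service.
node_cost_usd = 6_000.0      # compute nodes, shared by storage and queries
ebs_cost_usd = 3_000.0       # volumes, attributed entirely to storage
storage_share = 0.5          # assumed share of node cost allocated to storage
stored_gb_months = 20_000.0  # total usage behind the storage rate card
queries = 150_000_000        # total usage behind the query rate card

storage_rate = rate(ebs_cost_usd + node_cost_usd * storage_share, stored_gb_months)
query_rate = rate(node_cost_usd * (1 - storage_share), queries)

print(f"storage: ${storage_rate:.2f} per GB-month")
print(f"queries: ${query_rate * 1_000_000:.2f} per 1M queries")
```

Because the allocation share is a judgment call, it helps to record it next to the rate card so that finance and engineering can revisit it together.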

Rate cards may fluctuate as cost or usage changes over time. It's advisable to keep rate cards fixed for a year and recalculate them only when significant architectural changes occur. When actual cost diverges from what the fixed rate cards predict, you can define a special overflow bucket to account for the positive or negative difference. This practice lets you compare old and new cards to assess whether the team is reducing its unit price over time, and it helps teams forecast, budget, and design for the future based on stable rate cards.
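One way to picture the overflow bucket is the following Python sketch. It charges usage at a fixed rate and books the difference from the actual cloud bill into an overflow column. The rate, months, usage, and costs are hypothetical, and the sign convention (positive means the fixed rate recovered less than the actual cost) is an assumption of this sketch.

```python
FIXED_STORAGE_RATE_USD = 0.25  # per GB-month, frozen for the year (hypothetical)

# (month, GB-months used, actual cloud cost in USD); all values hypothetical.
months = [
    ("2023-07", 20_000, 5_000.0),
    ("2023-08", 22_000, 5_300.0),
    ("2023-09", 25_000, 6_650.0),
]


def overflow_report(fixed_rate, months):
    """Charge usage at the fixed rate and book the remainder as overflow."""
    rows = []
    for month, usage, actual_cost in months:
        charged = fixed_rate * usage
        rows.append({
            "month": month,
            "charged": charged,
            "overflow": actual_cost - charged,  # positive: rate under-recovers
            "effective_rate": actual_cost / usage,
        })
    return rows


report = overflow_report(FIXED_STORAGE_RATE_USD, months)
for row in report:
    print(
        f"{row['month']}  charged ${row['charged']:,.2f}  "
        f"overflow ${row['overflow']:+,.2f}  "
        f"effective ${row['effective_rate']:.4f}/GB-month"
    )
total_overflow = sum(row["overflow"] for row in report)
print(f"overflow bucket total: ${total_overflow:+,.2f}")
```

The effective rate column shows what the rate card would have been if recalculated each month, which is the figure to compare against the fixed card when it comes up for review.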

#Cloud Unit Economics

Defining cost-per-transaction and service-specific rate cards can make budgeting cloud cost, and even planning future architectural changes, business-centric, with finance taking the lead. For instance, onboarding a new customer with 100M monthly events is no problem, because we know how much additional cloud cost it should drive across our services and teams. When designing a new architecture and assessing its long-term business feasibility, we can calculate its cost based on the rate cards of its underlying dependencies and the expected traffic ramp-up. This allows leaders to make long-term decisions about the product and its supporting technology. Working with rate cards bridges the gap between finance and engineering. We move beyond talking about abstract shared cloud resources like EC2 or EBS to discussing metrics closely tied to customers, revenue, and growth.
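The new-customer scenario can be sketched as a small forecast in Python. The rate cards, the average event size, and the retention period below are assumptions made up for this example; the storage line is a steady-state estimate, reached once the retention window has filled.

```python
# Hypothetical internal rate cards (USD).
RATE_CARDS = {
    "ingest_per_million_events": 2.00,
    "processing_per_million_events": 5.00,
    "storage_per_gb_month": 0.30,
}
AVG_EVENT_BYTES = 1_000  # assumed average stored size of one event


def forecast_monthly_cost(events_per_month, retained_months, rates=RATE_CARDS):
    """Estimate the steady-state monthly cost a customer adds, per service."""
    millions = events_per_month / 1_000_000
    # Once retention has filled up, this many GB are held at any time.
    stored_gb = events_per_month * AVG_EVENT_BYTES * retained_months / 1e9
    return {
        "ingest": millions * rates["ingest_per_million_events"],
        "processing": millions * rates["processing_per_million_events"],
        "storage": stored_gb * rates["storage_per_gb_month"],
    }


forecast = forecast_monthly_cost(events_per_month=100_000_000, retained_months=3)
for service, cost in forecast.items():
    print(f"{service:<11} ${cost:,.2f} per month")
print(f"{'total':<11} ${sum(forecast.values()):,.2f} per month")
```

Changing `retained_months` shows directly how a retention term in the contract affects the cost of serving the customer.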

This practice is called Cloud Unit Economics: a system of profit maximization built on objective measurements of how well your organization is performing.

#The Future of Cloud Cost Management

By adopting a business-centric approach to cloud cost management, organizations prepare for a future where finance and engineering teams are highly aligned and speak the same language. Expressing cloud cost as cost-per-transaction and rate cards bridges the gap between non-technical teams and engineering, placing cloud costs in the perspective of customers, revenue, and growth. This paradigm shift alters how products are priced, features are evaluated, and future architectural decisions are made.

As more organizations discover the advantages of collaboration between finance and engineering teams, there’s a growing focus on the FinOps Foundation, which is dedicated to advancing people who practice cloud financial management through best practices.

Peter Marton (@slashdotpeter)