Cache hit ratio is one of the most quoted caching metrics, but on its own it can hide as much as it reveals. This guide explains what cache hit ratio actually measures, how to estimate it in a repeatable way, which assumptions matter most, and how to interpret it alongside latency, origin load, and cost so your team can make better decisions instead of chasing a single percentage.
Overview
If you run a CDN, reverse proxy, API cache, or application-level cache, you will eventually be asked a simple question: what is our cache hit ratio? It sounds straightforward. A cache either serves the request or it does not. Count the hits, divide by total requests, and you have your number.
That basic formula is useful, but it is also incomplete. A high hit ratio may still produce poor user experience if the misses are concentrated on your most expensive or slowest requests. A lower hit ratio may be acceptable if the cache is offloading the heaviest bytes, reducing origin CPU load, or cutting tail latency where it matters most.
In practical terms, cache hit ratio should be treated as a directional metric, not a finish line. It helps answer questions like:
- Is the cache doing meaningful work?
- Did a config change improve or reduce cacheability?
- Are purge patterns or cache-control headers causing more misses than expected?
- Is origin traffic dropping as cache hits rise?
- Are we saving cost, latency, or infrastructure capacity in the places that matter?
For technical teams, the more useful goal is not “maximize hit ratio at all costs.” The goal is to improve effective cache performance: serving more valuable traffic from cache while keeping correctness, freshness, and operational simplicity intact.
It also helps to separate the layers involved. Browser cache hit ratio, CDN hit ratio, reverse proxy hit ratio, and application cache hit ratio are different measurements. Each layer sees different traffic and follows different rules. A team can have excellent browser caching for static assets while seeing poor CDN caching for HTML, or strong CDN offload but weak database query caching inside the app.
If your stack includes multiple layers, define the scope before discussing the number. “Our cache hit ratio is 92%” means very little unless you also know which cache, which routes, and which time window.
How to estimate
The simplest request-based cache hit ratio formula is:
Cache hit ratio = cache hits / total cacheable requests
That last phrase matters. Using total incoming requests as the denominator often creates misleading results, especially when the traffic mix includes intentionally uncacheable traffic such as authenticated pages, admin routes, or one-off API requests.
A more practical estimation workflow looks like this:
- Define the cache layer. Pick one layer at a time: browser, CDN, reverse proxy, app cache, or database cache.
- Define the traffic scope. Group by asset class, route type, hostname, environment, or service. Do not mix static assets and personalized HTML if you want a usable number.
- Decide on the denominator. Use total requests only if almost everything is designed to be cacheable. Otherwise, use cacheable requests.
- Measure hits, misses, and bypasses separately. A bypass is not the same as a miss. If a request was never eligible for cache, treat it as a separate bucket.
- Calculate companion metrics. Pair hit ratio with origin request rate, origin bytes served, median and tail latency, and error rate.
- Compare over time. A single snapshot is weak. Trend the metric by day, deploy window, purge event, or traffic segment.
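As a rough illustration of the denominator and bypass points above, the sketch below counts outcomes from a list of cache status labels and reports the ratio two ways. The labels HIT, MISS, and BYPASS and the sample data are assumptions for the example; real platforms use their own status names.

```python
from collections import Counter


def summarize(statuses):
    """Count cache outcomes and compute hit ratio with two denominators."""
    counts = Counter(s.upper() for s in statuses)
    hits = counts["HIT"]
    misses = counts["MISS"]
    bypasses = counts["BYPASS"]
    eligible = hits + misses           # bypasses were never eligible for cache
    total = hits + misses + bypasses   # every incoming request
    return {
        "hits": hits,
        "misses": misses,
        "bypasses": bypasses,
        "hit_ratio_eligible": hits / eligible if eligible else 0.0,
        "hit_ratio_all": hits / total if total else 0.0,
    }


# Hypothetical sample: 6 hits, 2 misses, 2 bypasses.
sample = ["HIT"] * 6 + ["MISS"] * 2 + ["BYPASS"] * 2
summary = summarize(sample)
print(summary)
```

With this sample, the ratio over eligible requests is 0.75, while the ratio over all requests is 0.6. Neither is wrong, but they answer different questions, which is why the denominator should be stated alongside the number.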
For teams that want a more decision-oriented model, it helps to calculate three related ratios instead of one:
- Request hit ratio: percentage of eligible requests served from cache.
- Byte hit ratio: percentage of response bytes served from cache.
- Origin offload ratio: share of requests or bytes that never reach origin because the cache served them.
These often move in different directions. For example, a site may cache many small static files and achieve a high request hit ratio while still sending large image variants or API responses to origin, limiting byte offload. The reverse can also happen: a small number of large objects may create a strong byte hit ratio even if request hit ratio is moderate.
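The divergence between these ratios is easy to see with a small calculation. The sketch below is one possible way to compute them, assuming you already have request and byte totals for hits, misses, and bypasses. The function name, the exact offload definition (hits divided by all traffic, including bypasses), and the numbers are illustrative assumptions.

```python
def cache_ratios(hit_reqs, miss_reqs, bypass_reqs,
                 hit_bytes, miss_bytes, bypass_bytes):
    """Return request hit ratio, byte hit ratio, and origin offload ratios."""
    eligible_reqs = hit_reqs + miss_reqs
    eligible_bytes = hit_bytes + miss_bytes
    all_reqs = eligible_reqs + bypass_reqs
    all_bytes = eligible_bytes + bypass_bytes

    def safe_div(a, b):
        return a / b if b else 0.0

    return {
        # Share of eligible requests served from cache.
        "request_hit_ratio": safe_div(hit_reqs, eligible_reqs),
        # Share of eligible response bytes served from cache.
        "byte_hit_ratio": safe_div(hit_bytes, eligible_bytes),
        # Share of ALL traffic that never reached origin (assumed definition).
        "origin_request_offload": safe_div(hit_reqs, all_reqs),
        "origin_byte_offload": safe_div(hit_bytes, all_bytes),
    }


# Hypothetical site: many small cached files, fewer large uncached responses.
example = cache_ratios(
    hit_reqs=9_000, miss_reqs=1_000, bypass_reqs=0,
    hit_bytes=9_000 * 20_000,     # 20 KB per cached asset
    miss_bytes=1_000 * 500_000,   # 500 KB per missed image or API response
    bypass_bytes=0,
)
print(example)
```

In this made-up case the request hit ratio is 0.9, but the byte hit ratio is well under 0.3, because the misses carry most of the bytes.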
Here is a simple estimation framework you can use in a spreadsheet:
Step 1: Segment traffic
Create rows for groups such as versioned static assets, images, public HTML, anonymous API GET responses, authenticated API responses, and admin routes.
Step 2: For each segment, estimate:
- Total requests in the time window
- Cacheable percentage
- Expected hit ratio among cacheable requests
- Average response size
- Average origin processing cost or latency if missed
Step 3: Compute:
- Cacheable requests = total requests × cacheable percentage
- Hits = cacheable requests × hit ratio
- Misses = cacheable requests − hits
- Bytes offloaded = hits × average response size
- Origin requests remaining = non-cacheable requests + misses
This approach is more useful than a headline hit ratio because it shows what your cache is actually doing for infrastructure and user experience.
It also creates a repeatable baseline for planning. If you improve caching headers on public HTML, or introduce a stale-while-revalidate policy, or reduce unnecessary purges, you can re-run the same model and estimate impact before deploying.
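The spreadsheet steps above translate directly into a few lines of Python. This is a minimal sketch, not a planning tool: the SegmentEstimate fields mirror the Step 2 inputs, and the segment values are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SegmentEstimate:
    name: str
    total_requests: int
    cacheable_pct: float       # 0.0 to 1.0
    hit_ratio: float           # expected ratio among cacheable requests
    avg_response_bytes: int


def model(segment):
    """Apply the Step 3 formulas to one traffic segment."""
    cacheable = segment.total_requests * segment.cacheable_pct
    hits = cacheable * segment.hit_ratio
    misses = cacheable - hits
    non_cacheable = segment.total_requests - cacheable
    return {
        "cacheable_requests": cacheable,
        "hits": hits,
        "misses": misses,
        "bytes_offloaded": hits * segment.avg_response_bytes,
        "origin_requests": non_cacheable + misses,
    }


# Hypothetical baseline for public HTML.
baseline = SegmentEstimate("public HTML", 100_000, 0.8, 0.5, 40_000)
before = model(baseline)

# Re-run the same model with an assumed improvement in hit ratio.
after = model(replace(baseline, hit_ratio=0.75))

print(before["origin_requests"], after["origin_requests"])
```

Under these assumed inputs, raising the segment's hit ratio from 0.5 to 0.75 cuts estimated origin requests from about 60,000 to about 40,000. The real value is in rerunning the same function per segment after each proposed change.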
For readers working on edge and proxy layers, related implementation details often matter as much as the formula itself. If your hit ratio looks worse than expected, issues may come from cache-control mistakes, purge design, or route-specific rules rather than traffic quality. In practice, articles like Common Cache-Control Header Mistakes and How to Fix Them, Cloudflare Cache Rules Explained with Practical Examples, and Reverse Proxy Caching Explained for Beginners are useful follow-on references when the metric tells you something is wrong but not why.
Inputs and assumptions
A cache hit ratio estimate is only as good as its assumptions. The most common reason teams misread the metric is that they do not document what went into the number.
These are the key inputs worth writing down.
1. Cache eligibility
Not every request should be cacheable. Personalized content, authenticated API traffic, write operations, preview modes, and admin paths often bypass cache by design. If these are mixed into the denominator, the reported hit ratio will appear artificially low.
Useful questions:
- Which methods are considered cacheable?
- Are query strings normalized or treated as unique variants?
- Do cookies disable caching for routes that could otherwise be public?
- Are status codes like 301, 404, or 200 cached differently?
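One way to make those answers explicit is to write the eligibility rules down as code, so the denominator is reproducible. The sketch below is hypothetical throughout: the method set, status set, cookie name, and /admin prefix are placeholder assumptions, not recommendations for any particular platform.

```python
# All rule values below are illustrative assumptions.
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301, 404}
SESSION_COOKIES = {"session_id"}      # hypothetical cookie name
EXCLUDED_PREFIXES = ("/admin",)       # hypothetical admin path


def is_cache_eligible(method, status, cookie_names, path):
    """Return True if a request belongs in the cacheable denominator."""
    if method.upper() not in CACHEABLE_METHODS:
        return False
    if status not in CACHEABLE_STATUSES:
        return False
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    # A session cookie marks the response as personalized in this sketch.
    if SESSION_COOKIES & set(cookie_names):
        return False
    return True


print(is_cache_eligible("GET", 200, [], "/products/42"))
print(is_cache_eligible("GET", 200, ["session_id"], "/products/42"))
print(is_cache_eligible("POST", 200, [], "/cart"))
```

Keeping a function like this next to the dashboard query makes it clear which requests were counted as cacheable when someone quotes the ratio.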
2. Time to live and revalidation policy
TTL has a direct effect on hit ratio. Short TTLs reduce staleness risk but create more misses. Long TTLs raise hit ratio but require stronger versioning or purge discipline. Revalidation behavior also matters: conditional requests can reduce transfer size or origin work even if they do not count as pure cache hits in your reporting.
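To get a feel for how TTL interacts with request rate, a simplified model can help. The sketch below assumes a single object, requests arriving at random at a steady average rate (a Poisson process), a TTL that starts when the object is fetched on a miss, and no evictions or purges. Under those assumptions each cycle has one miss followed by, on average, rate × TTL hits. Real traffic will deviate from this.

```python
def expected_hit_ratio(requests_per_second, ttl_seconds):
    """Expected hit ratio for one object under the simplified TTL model.

    Each cache cycle is one miss plus (rate * ttl) expected hits,
    so the ratio is rate * ttl / (1 + rate * ttl).
    """
    expected_hits_per_cycle = requests_per_second * ttl_seconds
    return expected_hits_per_cycle / (1 + expected_hits_per_cycle)


# Hypothetical objects: one popular, one rarely requested.
popular_short_ttl = expected_hit_ratio(0.1, 60)     # roughly 0.86
popular_long_ttl = expected_hit_ratio(0.1, 600)     # roughly 0.98
rare_short_ttl = expected_hit_ratio(0.001, 60)      # roughly 0.06

print(popular_short_ttl, popular_long_ttl, rare_short_ttl)
```

The model suggests why a longer TTL helps popular objects only modestly once they are already warm, while rarely requested objects stay mostly cold unless the TTL is long relative to the gap between their requests.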
For versioned files, immutable caching often pushes hit ratio much higher over time because the content behind each versioned URL never changes, so it is safe to cache aggressively. See Immutable Caching for Versioned Assets: When max-age=31536000 Makes Sense for the mechanics behind that pattern.
3. Purge frequency
Frequent full-cache purges can crush hit ratio, especially on busy sites with large asset sets or globally distributed traffic. If your workflow purges everything on every deploy, your metric may reflect operational choices more than content characteristics.
Targeted invalidation usually gives a better long-term balance. If purge behavior is part of your environment, it deserves its own annotation in dashboards and reports. The trade-offs are covered well in CDN Cache Purge Strategies: Full Purge vs Tag Purge vs URL Purge.
4. Traffic distribution
Hit ratio tends to improve when traffic is concentrated on a smaller set of objects. It tends to fall when requests are spread across many low-reuse URLs. This is why the same configuration can produce very different outcomes for:
- A marketing site with shared static assets
- An ecommerce catalog with many filter combinations
- A public API with highly variable query parameters
- A dashboard product with mostly authenticated traffic
Traffic spikes can improve or worsen the metric depending on what is popular during the spike. A viral image-heavy page may drive excellent cache reuse. A sudden crawl across rarely visited URLs may do the opposite.
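The effect of traffic concentration can be demonstrated with a small simulation. The sketch below replays two synthetic request streams through a toy LRU cache: one skewed toward a few popular URLs and one spread evenly. The cache size, URL count, weighting scheme, and seed are all arbitrary assumptions, and real caches use more elaborate eviction policies.

```python
import random
from collections import OrderedDict


def simulate_lru(requests, capacity):
    """Replay requests through a fixed-size LRU cache and return hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests) if requests else 0.0


rng = random.Random(42)                    # fixed seed for repeatability
urls = list(range(1000))                   # 1000 distinct hypothetical URLs

# Skewed popularity: weight falls off as 1 / rank.
skewed_weights = [1 / (rank + 1) for rank in range(len(urls))]
concentrated = rng.choices(urls, weights=skewed_weights, k=20_000)

# Evenly spread traffic across the same URLs.
spread = rng.choices(urls, k=20_000)

concentrated_ratio = simulate_lru(concentrated, capacity=100)
spread_ratio = simulate_lru(spread, capacity=100)
print(concentrated_ratio, spread_ratio)
```

With the same cache size and the same URL set, the skewed stream produces a much higher hit ratio than the evenly spread one, which is the pattern described above for marketing sites versus long-tail catalogs or crawls.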
5. Variant explosion
Headers, cookies, languages, device-specific transformations, image resizing, and query parameters can create many cache variants for what appears to be the same resource. More variants generally mean fewer hits per variant unless traffic is large enough to warm each one.
This is a common reason teams see disappointing CDN hit ratio even when the content is nominally public. Normalizing URLs, ignoring unnecessary query parameters, and reducing cookie influence can often improve the metric more than simply increasing TTL.
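As an illustration of URL normalization, the sketch below builds a cache key by dropping a set of ignored query parameters and sorting the rest. The ignored parameter list is an assumption for the example. In practice it would come from your own analysis of which parameters actually change the response, and the normalization itself usually lives in CDN or proxy configuration rather than application code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change the response.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def normalize_cache_key(url):
    """Return a canonical form of the URL for use as a cache key."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create new variants
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


key_a = normalize_cache_key("https://Example.com/p?b=2&a=1&utm_source=news")
key_b = normalize_cache_key("https://example.com/p?a=1&b=2&fbclid=abc")
print(key_a)
print(key_a == key_b)
```

Both example URLs collapse to the same key, so a cache using it would store one variant instead of two.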
6. Measurement semantics
Different platforms count hits differently. Some include stale responses served during revalidation. Some count memory and disk cache together. Some separate edge hits from shield hits. Some report request hit ratio but not byte hit ratio. Before comparing environments or vendors, make sure the definitions match.
A safe rule is to create your own internal glossary with explicit meanings for hit, miss, bypass, revalidated, stale served, and expired served. That reduces confusion during incident review and capacity planning.
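A glossary like that can also be encoded so the counting policy is explicit. The sketch below is one hypothetical way to do it: the Outcome names follow the glossary terms above, while the counts and the choice of which outcomes count as hits are invented to show how much the reported ratio can move.

```python
from enum import Enum


class Outcome(Enum):
    HIT = "hit"
    MISS = "miss"
    BYPASS = "bypass"
    REVALIDATED = "revalidated"
    STALE_SERVED = "stale_served"
    EXPIRED_SERVED = "expired_served"


def hit_ratio(counts, counted_as_hit, excluded=frozenset({Outcome.BYPASS})):
    """Compute hit ratio under an explicit counting policy."""
    eligible = sum(n for outcome, n in counts.items() if outcome not in excluded)
    hits = sum(n for outcome, n in counts.items() if outcome in counted_as_hit)
    return hits / eligible if eligible else 0.0


# Hypothetical counts for one reporting window.
counts = {
    Outcome.HIT: 700,
    Outcome.MISS: 150,
    Outcome.BYPASS: 100,
    Outcome.REVALIDATED: 50,
    Outcome.STALE_SERVED: 100,
}

strict = hit_ratio(counts, {Outcome.HIT})
lenient = hit_ratio(
    counts, {Outcome.HIT, Outcome.REVALIDATED, Outcome.STALE_SERVED}
)
print(strict, lenient)
```

The same traffic reports 0.7 under the strict policy and 0.85 under the lenient one, which is the kind of gap that appears when two platforms define a hit differently.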
Worked examples
The goal of these examples is not to present universal benchmarks. It is to show how the same hit ratio can imply very different outcomes.
Example 1: Strong request hit ratio, modest operational gain
Imagine a site where versioned CSS, JavaScript, fonts, and icons are heavily cached. These assets generate many requests, are small to moderate in size, and are reused often. Public HTML, however, has a short TTL and is re-fetched from origin frequently.
The request hit ratio may look excellent because the browser or CDN serves most asset requests from cache. But origin CPU may still be under pressure if the expensive HTML or API responses miss the cache often. In this case, the hit ratio sounds healthy while the real bottleneck remains unchanged.
This is why it helps to combine request hit ratio with origin response time and origin request volume. If the cache serves many cheap requests but misses the expensive ones, your overall performance gains may be smaller than the headline number suggests.
Example 2: Moderate hit ratio, strong byte offload
Now consider an image-heavy content site. The total request hit ratio is only moderate because HTML and some edge cases are not cached for long. But image variants are reused heavily, account for most transferred bytes, and are served from cache efficiently.
In this setup, byte hit ratio and bandwidth savings may be excellent even though request hit ratio is unremarkable. If your infrastructure cost is sensitive to egress or image processing load, this could be a very successful outcome.
Teams working through asset optimization often see this pattern with media and fonts. Related guidance is covered in Image Caching Best Practices for Modern Web Performance and Font Caching Best Practices for Faster Core Web Vitals.
Example 3: Rising hit ratio, worse freshness risk
Suppose a team increases TTLs across many routes and sees the hit ratio improve quickly. At first glance, this looks like a clear win. But if purge workflows are weak or asset versioning is inconsistent, users may receive stale pages or mixed asset versions after deploys.
Here the metric improved, but correctness degraded. This is a classic case where hit ratio misleads because it rewards retention without measuring freshness quality. A good dashboard should include stale content incidents, purge latency, and deploy rollback patterns alongside the cache graph.
Example 4: Falling hit ratio after a release, but system health improves
Consider a release that intentionally excludes personalized pages from cache because the previous rules leaked too many variants or caused correctness issues. Hit ratio drops. However, support tickets fall, debugging becomes easier, and the remaining cacheable paths become more predictable and easier to optimize.
A lower ratio is not automatically a regression. Sometimes removing bad caching is progress. The useful question is whether the cache is now aligned with route behavior, freshness requirements, and operational risk.
Example 5: Same ratio, different cost impact
Two services both report a 75% cache hit ratio. Service A serves tiny JSON responses and lightweight HTML pages. Service B serves CPU-heavy rendered documents and expensive search results. Even with the same ratio, the cost and capacity value of hits in Service B may be much higher.
This is why weighted models are often more useful than a single percentage. If you assign rough cost or latency weights to route groups, you can estimate where a hit is worth the most. In practice, this helps prioritize optimization work far better than trying to lift every route evenly.
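A weighted model of this kind can be very small. The sketch below assumes each route group has a request count, a hit ratio, and a rough origin cost per request. The cost unit (milliseconds of origin CPU) and all numbers are hypothetical stand-ins for whatever weight your team chooses.

```python
def weighted_value(route_groups):
    """Estimate origin cost saved by hits and still spent on misses."""
    results = {}
    for group in route_groups:
        hits = group["requests"] * group["hit_ratio"]
        misses = group["requests"] - hits
        cost = group["origin_cost_ms"]
        results[group["name"]] = {
            "cost_saved_ms": hits * cost,        # value the cache already delivers
            "cost_remaining_ms": misses * cost,  # value still on the table
        }
    return results


# Hypothetical services with the same 75% hit ratio but different origin costs.
services = [
    {"name": "service_a", "requests": 10_000, "hit_ratio": 0.75, "origin_cost_ms": 2},
    {"name": "service_b", "requests": 10_000, "hit_ratio": 0.75, "origin_cost_ms": 200},
]

weighted = weighted_value(services)
for name, values in weighted.items():
    print(name, values)
```

With identical ratios, the hypothetical Service B saves and still spends 100 times more origin time than Service A, so each additional point of hit ratio there is worth far more.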
When to recalculate
Cache hit ratio is not a set-and-forget metric. It should be revisited whenever the underlying inputs change, especially in environments with active deployment workflows, evolving traffic patterns, or changing infrastructure costs.
Recalculate or review your estimate when any of the following happen:
- You change cache-control headers. Even small TTL or directive changes can alter hit and revalidation behavior.
- You change purge strategy. Moving from full purge to tag or URL-based purge can materially improve cache retention.
- You launch new routes or products. A new authenticated area, image pipeline, or API feature changes the traffic mix.
- You alter URL structure or query parameter behavior. Cache key changes can create or remove variants.
- You adopt a new CDN, proxy, or edge rule set. Vendor semantics and default behaviors vary.
- Traffic composition shifts. Seasonal content, campaigns, crawlers, or geographic expansion can all change reuse patterns.
- Performance or cost becomes a business priority. If latency budgets tighten or origin costs rise, cache efficiency deserves a fresh look.
A practical review loop for most teams is:
- Pick a consistent reporting window.
- Break results down by route or asset class.
- Track request hit ratio, byte hit ratio, and origin offload together.
- Annotate deploys, purges, and rule changes on the timeline.
- Investigate large shifts instead of celebrating or worrying about the top-line percentage alone.
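The last step of that loop can be partly automated. The sketch below compares per-segment hit ratios between two reporting windows and flags segments whose ratio moved by more than a threshold. The segment names, ratios, and the 5-point threshold are assumptions. A flagged segment is a prompt to check deploy and purge annotations, not a verdict.

```python
def flag_shifts(previous, current, threshold=0.05):
    """Return segments whose hit ratio changed by at least `threshold`."""
    flagged = {}
    for segment, ratio in current.items():
        if segment not in previous:
            continue  # new segments need a baseline before comparison
        delta = ratio - previous[segment]
        if abs(delta) >= threshold:
            flagged[segment] = delta
    return flagged


# Hypothetical per-segment hit ratios for two consecutive windows.
last_week = {"static": 0.97, "html": 0.60, "api": 0.40}
this_week = {"static": 0.96, "html": 0.42, "api": 0.41, "images": 0.88}

shifts = flag_shifts(last_week, this_week)
print(shifts)
```

Here only the html segment is flagged, with a drop of roughly 18 points, even though a site-wide average could easily have masked the change.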
If you want an action-oriented checklist, start here:
- Define which requests are intentionally cacheable.
- Separate hits, misses, and bypasses in monitoring.
- Measure by segment, not only site-wide totals.
- Pair the ratio with latency, origin load, and bytes served.
- Reduce unnecessary variants from cookies, headers, and query strings.
- Prefer targeted invalidation where possible.
- Review caching rules after deploy workflow changes.
- Use the metric to guide decisions, not to declare victory.
The most useful cache hit ratio is the one your team can explain. If you know what is included, what is excluded, and what outcome the cache is meant to improve, the metric becomes a practical operating tool. Without that context, it is just a percentage that sounds better than it works.
For teams refining their stack over time, it is worth revisiting the surrounding implementation details regularly. Browser and edge caching interact with asset versioning, origin headers, reverse proxies, and service worker behavior. If you need to tune the system behind the metric, relevant next reads include Apache Cache Headers Guide for Static Assets and HTML, Varnish Cache Configuration Patterns for APIs and Content Sites, and Service Worker Caching Strategies Compared: Cache First, Network First, and Stale While Revalidate.