Stale-While-Revalidate vs Stale-If-Error

A practical comparison of stale-while-revalidate and stale-if-error, with scenario-based guidance for faster and more resilient caching.

The stale-while-revalidate and stale-if-error directives are often grouped together because both allow cached content to outlive its normal freshness window. But they solve different problems. One is mainly about keeping responses fast while a cache refresh happens in the background; the other is about keeping responses available when something upstream breaks. If you treat them as interchangeable, you can end up serving stale data at the wrong time, missing resilience gains, or masking origin problems longer than intended. This guide compares the two directives in practical terms, shows where each one fits, and gives scenario-based guidance you can use when tuning browser, CDN, reverse proxy, or origin cache behavior.

Overview

This section gives you the short version: what each directive does, where they overlap, and why the distinction matters in production.

Both directives are Cache-Control extensions, defined in RFC 5861, that can be set on cacheable responses. Both come into play only after a response has become stale, and that shared starting point is what causes confusion.

Here is the simplest way to separate them:

stale-while-revalidate says: “If this response is stale, a cache may still serve it for a limited time while it fetches a newer version in the background.”
stale-if-error says: “If this response is stale and the attempt to get a fresh version fails with an error, a cache may still serve the stale version for a limited time instead of returning the error.”

In other words:

stale-while-revalidate is primarily a latency and smoothness tool.
stale-if-error is primarily an availability and resilience tool.

The difference matters because your users experience them differently. With stale-while-revalidate, users usually see a fast response even as the cache updates behind the scenes. With stale-if-error, users may still get content during an outage or transient backend failure instead of seeing a 5xx error or timeout.
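
For concreteness, both directives travel in the same Cache-Control header as the base freshness lifetime. The sketch below parses such a header in Python. The header value, the numbers in it, and the function name parse_cache_control are illustrative assumptions, and the parser deliberately ignores quoted-string edge cases.

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[int, bool]]:
    """Parse a Cache-Control value into {directive: seconds or True}.

    Simplified sketch: quoted arguments containing commas are not handled.
    """
    directives: Dict[str, Union[int, bool]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        name = name.strip().lower()
        if not name:
            continue
        arg = arg.strip().strip('"')
        # Numeric arguments (delta-seconds) become ints; flags become True.
        directives[name] = int(arg) if sep and arg.isdecimal() else True
    return directives


# Illustrative header: fresh for 10 minutes, then two different stale windows.
header = "public, max-age=600, stale-while-revalidate=30, stale-if-error=86400"
policy = parse_cache_control(header)
print(policy["max-age"], policy["stale-while-revalidate"], policy["stale-if-error"])
```

The two stale windows are independent values, which is why they can be tuned separately for latency smoothing and for failure fallback.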

Neither directive replaces a good TTL strategy. You still need to decide how long content is fresh under max-age or related controls, then decide what should happen once freshness has expired. If you need a stronger foundation for that part, see TTL Tuning Guide: How to Choose Cache Expiration Times by Content Type.

A helpful mental model is a three-phase lifecycle:

Fresh: serve normally.
Stale but acceptable: maybe serve stale under controlled rules.
Too old or too risky: require a fresh response or fail.

These stale directives shape phase two. They should not be added casually to everything. The right question is not “Can I serve stale?” but “Under what conditions is stale preferable to waiting or failing?”
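
To put numbers on the three phases, the sketch below computes the response age at which each one ends for a given policy. It assumes, as RFC 5861 describes, that both stale windows are counted from the moment the response becomes stale. The function name and example values are illustrative.

```python
def phase_boundaries(max_age: int, swr: int = 0, sie: int = 0) -> dict:
    """Return the response age (in seconds) at which each phase ends."""
    return {
        "fresh_until": max_age,
        # Both stale windows start counting when freshness ends.
        "serve_stale_while_revalidating_until": max_age + swr,
        "serve_stale_on_error_until": max_age + sie,
    }


# Example: max-age=600, stale-while-revalidate=30, stale-if-error=86400
bounds = phase_boundaries(600, swr=30, sie=86400)
for name, age in bounds.items():
    print(f"{name}: {age}s")
```

With these values the response is fresh for 10 minutes, may be served stale during a background refresh for 30 more seconds, and may be used as an error fallback until it is 87,000 seconds old (just over 24 hours).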

How to compare options

This section gives you a framework for choosing between the directives, or combining them, without reducing the decision to a one-line rule.

When comparing stale-while-revalidate vs stale-if-error, focus on five variables.

1. Tolerance for outdated content

Start with business impact, not cache mechanics. Some content can be a few minutes old with little downside. Other content becomes misleading quickly.

High tolerance: marketing pages, blog posts, documentation, avatars, category pages with slow-changing metadata.
Medium tolerance: product listings, dashboards with non-critical summaries, search pages where exact freshness is helpful but not mandatory.
Low tolerance: account balances, inventory counts, pricing under active change, permission-dependent responses, personalized data.

If stale content is acceptable briefly during refresh, stale-while-revalidate may be a fit. If stale content is acceptable only as a fallback during failures, stale-if-error may be safer. If stale content is not acceptable at all, neither directive should be used broadly.

2. Cost of waiting

Ask what happens when a stale object needs revalidation or refetching.

If waiting adds visible latency and hurts user experience, stale-while-revalidate can help hide that delay.
If waiting is acceptable but serving an error would be much worse, stale-if-error may be the better protection.

This is especially relevant at the CDN or reverse proxy layer, where background refresh can keep an expiring popular object from triggering a burst of simultaneous origin requests and can smooth traffic spikes.

3. Failure modes at the origin

Think about how your origin actually fails.

Does it have brief deployment-related 500s?
Does it occasionally time out under load?
Does a dependency like a database or third-party API fail intermittently?

If your main operational risk is intermittent origin failure, stale-if-error is directly aligned with that risk. It allows a cache to keep serving known-good content instead of amplifying an outage.

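If your origin is usually reliable but slow to regenerate responses once they expire, stale-while-revalidate often yields more value. Note that it only helps when a stale copy already exists; a cold cache miss still waits on the origin.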

4. Personalization and cache scope

Not every response should be cached the same way. Public shared caches, private browser caches, and application-level caches can behave differently. Before enabling stale serving, confirm:

the response is safe to cache in the relevant layer,
cache keys correctly separate variants,
authorization or cookie-based responses are not being mixed accidentally.

Many cache bugs are not caused by the stale directives themselves, but by incorrect assumptions about what is cacheable. If you want a broader checklist, read Common Cache-Control Header Mistakes and How to Fix Them.

5. Recovery expectations

Finally, decide how long stale content should remain eligible.

A short stale window can absorb brief disruptions without hiding longer incidents. A very long stale window may protect uptime numbers while quietly serving outdated information for too long. This is where teams often overcorrect: after an outage, they increase stale allowances aggressively, then later discover users were seeing old content well past the acceptable limit.

Use stale windows as a controlled buffer, not as a substitute for fixing origin reliability or purge discipline.

Feature-by-feature breakdown

This section compares the directives directly so you can see where they diverge in behavior and tradeoffs.

Primary purpose

stale-while-revalidate: improve perceived performance by serving a stale response immediately while revalidation or refetch happens asynchronously.
stale-if-error: improve resilience by serving stale content when revalidation or refetch fails.

If you had to choose a single label for each, use speed for stale-while-revalidate and continuity for stale-if-error.

What the user sees

With stale-while-revalidate, the user typically gets a fast response and may never notice the object was stale.
With stale-if-error, the user gets a fallback response instead of an error page, timeout, or empty payload when the origin has trouble.

Both outcomes can be equally valuable to users, but the triggering event differs: routine refresh versus failure.

Trigger condition

stale-while-revalidate is triggered when the object is stale but still within the allowed stale-while-revalidate window.
stale-if-error is triggered when the object is stale, a fresh fetch is attempted, and that fetch results in an error condition covered by the cache implementation. RFC 5861 defines an error as any situation that would result in a 500, 502, 503, or 504 response.

This distinction is easy to miss. A stale object does not automatically use stale-if-error. There has to be a refresh failure scenario.
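
The trigger difference can be sketched as a single decision function. This is a simplified model, not any particular cache's implementation: OriginError, serve, and the two origin callables are hypothetical names, and a real cache would run the stale-while-revalidate refresh asynchronously rather than only reporting it.

```python
from typing import Callable, Tuple


class OriginError(Exception):
    """Stands in for a 500, 502, 503, or 504 from the origin."""


def serve(age: int, max_age: int, swr: int, sie: int,
          cached: str, fetch: Callable[[], str]) -> Tuple[str, str]:
    """Return (body, reason) for a request that hits a cached entry."""
    if age < max_age:
        return cached, "fresh"
    staleness = age - max_age
    if staleness <= swr:
        # Serve stale now; a real cache would also start a background refresh.
        return cached, "stale-while-revalidate"
    try:
        return fetch(), "refetched"
    except OriginError:
        # Only reached when the refresh attempt actually failed.
        if staleness <= sie:
            return cached, "stale-if-error"
        raise


def healthy_origin() -> str:
    return "new"


def failing_origin() -> str:
    raise OriginError("502 Bad Gateway")


# Policy: max-age=600, stale-while-revalidate=30, stale-if-error=86400
print(serve(610, 600, 30, 86400, "old", failing_origin))  # inside the SWR window
print(serve(700, 600, 30, 86400, "old", healthy_origin))  # past SWR, origin healthy
print(serve(700, 600, 30, 86400, "old", failing_origin))  # past SWR, origin failing
```

Note that the stale-if-error branch sits inside the exception handler: without a failed fetch, it is never consulted.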

Operational benefits

stale-while-revalidate can reduce latency spikes, soften cache-expiry thundering herds, and keep traffic flowing smoothly during refresh cycles.
stale-if-error can reduce incident blast radius, improve service continuity during origin problems, and give operators time to recover without exposing every backend failure directly to users.

For teams focused on both performance and uptime, using both can make sense on selected routes.
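
The thundering-herd benefit usually depends on the cache collapsing concurrent refreshes of the same object into one origin request. Whether and how a given CDN or proxy does this is implementation-specific; the sketch below only illustrates the idea with a hypothetical SingleFlightRefresher class and a simulated slow origin.

```python
import threading
from typing import Callable, List, Optional


class SingleFlightRefresher:
    """Allow at most one background refresh at a time for one cache entry."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self.value: Optional[str] = None

    def trigger(self) -> bool:
        """Start a refresh unless one is already running; True if started."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
            return True

    def _run(self) -> None:
        self.value = self._fetch()

    def wait(self) -> None:
        """Block until the current refresh, if any, has finished."""
        thread = self._thread
        if thread is not None:
            thread.join()


release = threading.Event()
origin_calls: List[int] = []


def slow_origin() -> str:
    origin_calls.append(1)
    release.wait(timeout=5)  # simulate a slow origin until released
    return "v2"


refresher = SingleFlightRefresher(slow_origin)
# Five requests arrive for the same stale entry while the refresh is running.
started = [refresher.trigger() for _ in range(5)]
release.set()
refresher.wait()
print(started, len(origin_calls), refresher.value)
```

Only the first request starts a refresh; the other four would be answered from the stale copy while that single origin call completes.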

Main risks

stale-while-revalidate risk: users may keep seeing older content longer than expected, because the request that triggers a refresh still receives the stale copy, and overly generous stale windows extend that exposure.
stale-if-error risk: stale content can mask real origin failures, making incidents less visible until data drift becomes obvious.

This is why monitoring matters. Do not treat successful stale fallback as equivalent to a healthy origin.
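
One way to keep stale fallback visible is to track it as its own metric. The sketch assumes each response is logged with a cache outcome label; the labels, the sample data, and the 5% alert threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable


def stale_if_error_ratio(outcomes: Iterable[str]) -> float:
    """Fraction of responses that were served as stale-if-error fallbacks."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return counts["stale-if-error"] / total if total else 0.0


# Hypothetical outcome labels taken from cache logs for one time window.
window = ["fresh"] * 90 + ["stale-while-revalidate"] * 6 + ["stale-if-error"] * 4
ratio = stale_if_error_ratio(window)
ALERT_THRESHOLD = 0.05  # assumed value; tune to your own incident tolerance
print(f"stale-if-error ratio: {ratio:.2%}, alert: {ratio > ALERT_THRESHOLD}")
```

Tracking this ratio alongside origin error rates makes it harder for a quiet fallback to pass as a healthy backend.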

Best content fit

Good candidates for stale-while-revalidate:

public HTML that changes periodically but not every second,
documentation pages,
CMS-driven landing pages,
cached API responses with soft freshness needs,
derived or aggregated content that is expensive to regenerate.

Good candidates for stale-if-error:

public pages where availability matters more than minute-level freshness during incidents,
read-heavy APIs that can tolerate temporary fallback data,
status pages or informational endpoints that should degrade gracefully,
assets or edge-cached content where serving an older copy is better than failing.

Poor candidates for either, unless narrowly designed:

highly personalized responses,
security-sensitive data,
rapidly changing transactional data,
anything where stale values create legal, financial, or operational risk.

Relationship to other caching choices

Neither directive should be tuned in isolation. Their usefulness depends on the rest of your strategy:

Base freshness policy: A sensible max-age or similar baseline is still required.
Purge capability: If you can purge precisely, you can often allow more aggressive stale handling with less risk. See CDN Cache Purge Strategies: Full Purge vs Tag Purge vs URL Purge.
Versioned static assets: For truly immutable files, these stale directives may matter less than long-lived immutable caching. See Immutable Caching for Versioned Assets: When max-age=31536000 Makes Sense.
Proxy or CDN behavior: Actual support and interpretation can vary by layer, so test in your stack rather than assuming identical behavior everywhere. Articles like Cloudflare Cache Rules Explained with Practical Examples and Varnish Cache Configuration Patterns for APIs and Content Sites are useful next reads.

Best fit by scenario

This section turns the comparison into practical choices. Real systems usually need different rules by route, response type, and failure tolerance.

Scenario 1: Marketing pages on a CMS

Best fit: stale-while-revalidate, often with stale-if-error as a backup.

Why: users value fast navigation, and content usually does not require second-by-second accuracy. If a page was updated a few minutes ago but a stale version is briefly served during refresh, the downside is usually limited. During an origin outage, showing the last good version is often better than returning a 502.

Practical approach: use a moderate freshness period, a short to moderate stale-while-revalidate window, and a bounded stale-if-error window. Pair this with reliable purge hooks from the CMS when content is published.
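
As a rough illustration of that approach, the sketch below builds a Cache-Control value from a small policy object. The StalePolicy class and the specific numbers are assumptions chosen for the example, not recommended defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalePolicy:
    """Freshness and stale windows for one class of responses, in seconds."""
    max_age: int
    stale_while_revalidate: int = 0
    stale_if_error: int = 0

    def header_value(self) -> str:
        parts = ["public", f"max-age={self.max_age}"]
        # Omit a directive entirely when its window is zero.
        if self.stale_while_revalidate > 0:
            parts.append(f"stale-while-revalidate={self.stale_while_revalidate}")
        if self.stale_if_error > 0:
            parts.append(f"stale-if-error={self.stale_if_error}")
        return ", ".join(parts)


# Assumed values: 5 min fresh, 1 min refresh window, 1 h error fallback.
marketing_page = StalePolicy(max_age=300, stale_while_revalidate=60, stale_if_error=3600)
print("Cache-Control:", marketing_page.header_value())
```

Keeping the three numbers in one named object also gives you a natural place to record why each window was chosen.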

Scenario 2: News homepage or frequently updated editorial front page

Best fit: careful stale-while-revalidate; limited stale-if-error only if older content is acceptable during incidents.

Why: the homepage benefits from fast responses, but freshness matters more than on a static landing page. You may accept very brief staleness during background refresh, but serving a much older page during a prolonged outage could be misleading.

Practical approach: keep the base TTL shorter, keep stale windows narrower, and ensure purge behavior is fast and selective.

Scenario 3: Product catalog or search results

Best fit: depends on how often data changes and what users expect.

If the main goal is responsiveness and the data changes in batches, stale-while-revalidate may be helpful. If backend dependencies are fragile and it is acceptable to show slightly older results during failures, stale-if-error adds resilience.

Use caution when inventory, pricing, or availability are embedded. If stale data could create false expectations, keep windows tight or separate cache policies by field or endpoint.

Scenario 4: Public API with expensive aggregation

Best fit: often both directives, but only for non-sensitive endpoints.

Why: expensive aggregation endpoints can become hotspots exactly when traffic increases. stale-while-revalidate can smooth load after expiration. stale-if-error can keep downstream clients functional during temporary backend issues.

Practical approach: document freshness expectations for API consumers. “Eventually consistent within a bounded window” is manageable when stated clearly; hidden staleness is not.

Scenario 5: User account dashboard

Best fit: usually neither at the shared-cache level.

Why: personalized data, permissions, and freshness expectations make stale serving risky. Some subcomponents may be safe to cache privately, but broad use of stale directives on mixed personalized pages often causes confusion or correctness issues.

Practical approach: split cacheable fragments from user-specific data where possible. Treat public and private cache behavior separately.

Scenario 6: Static assets and images

Best fit: usually solve with long-lived versioned caching first; stale directives are secondary.

For assets with content hashing in filenames, immutable caching is usually the cleaner answer. For images or media that are updated in place, stale fallback can still help, but purge and versioning discipline matter more. For related guidance, see Image Caching Best Practices for Modern Web Performance.

Scenario 7: Service worker discussions

Best fit: understand the naming overlap but keep the concepts separate.

Developers often mix up HTTP stale-while-revalidate with the service worker strategy of a similar name. They are related in spirit, but they live at different layers and are configured differently. If your stack uses both edge caching and service workers, document which layer owns freshness and fallback behavior to avoid accidental double-staleness. For that comparison, see Service Worker Caching Strategies Compared: Cache First, Network First, and Stale While Revalidate.

When to revisit

This section is the practical maintenance checklist. Cache policy is never fully set-and-forget, especially when stale behavior is involved.

Revisit your stale-while-revalidate and stale-if-error settings when any of the following changes:

Content volatility changes. A route that was updated weekly may now change hourly.
User expectations change. As a feature becomes more transactional, old data may become less acceptable.
Origin reliability changes. If outages become more frequent, stale-if-error may hide symptoms without fixing the cause.
Purge capabilities improve. Better purge precision can justify more aggressive stale use on public content.
New cache layers are added. A CDN, reverse proxy, browser cache policy, or service worker can interact in unexpected ways.
Incidents reveal surprises. If an outage served content that was too old, or a refresh caused user-visible slowness, your assumptions need updating.

A simple review workflow works well:

List your key response types by business importance and freshness sensitivity.
Define acceptable stale windows separately for performance smoothing and failure fallback.
Test success paths and failure paths, not just normal cache hits.
Monitor stale serves, backend errors, and user-facing latency together.
Adjust route by route rather than applying one global policy.
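
Part of this workflow can be automated with a small audit that checks each route's stale windows against agreed ceilings. Everything in the sketch is hypothetical: the routes, the policy values, and the two ceilings stand in for whatever your own review produces.

```python
from typing import Dict, List

# Hypothetical per-route policies, in seconds.
ROUTE_POLICIES: Dict[str, Dict[str, int]] = {
    "/blog/*": {"max_age": 300, "swr": 60, "sie": 3600},
    "/": {"max_age": 30, "swr": 10, "sie": 300},
    "/api/report": {"max_age": 120, "swr": 30, "sie": 172800},
    "/account/*": {"max_age": 0, "swr": 0, "sie": 0},
}

# Assumed ceilings, defined separately for each purpose.
MAX_SWR = 300      # performance smoothing
MAX_SIE = 86400    # failure fallback


def audit(policies: Dict[str, Dict[str, int]]) -> List[str]:
    """Return a finding for every stale window that exceeds its ceiling."""
    findings: List[str] = []
    for route, p in sorted(policies.items()):
        if p["swr"] > MAX_SWR:
            findings.append(
                f"{route}: stale-while-revalidate {p['swr']}s exceeds {MAX_SWR}s")
        if p["sie"] > MAX_SIE:
            findings.append(
                f"{route}: stale-if-error {p['sie']}s exceeds {MAX_SIE}s")
    return findings


for finding in audit(ROUTE_POLICIES):
    print(finding)
```

Running a check like this in CI is one way to catch a stale window that was widened during an incident and never narrowed again.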

Two final recommendations help keep the topic practical:

Document intent next to configuration. If a directive exists to absorb deploy-time 502s, say so. Future maintainers should know why the window exists.
Treat stale fallback as a controlled degradation mode. It is a feature, not proof that the origin is healthy.

If you want one concise rule to leave with, use this: choose stale-while-revalidate when you want expired content to hide refresh latency, and choose stale-if-error when you want expired content to hide backend failure. Use both only when your route can tolerate both kinds of fallback, and keep the allowed windows short enough that “helpful stale” does not become “silently wrong.”

As your traffic patterns, cache layers, and publishing workflows evolve, revisit these settings the same way you revisit TTLs, purge rules, and hit ratio assumptions. A cache policy that was reasonable six months ago may be too conservative, too risky, or simply mismatched to the system you run today. For a broader measurement mindset, Cache Hit Ratio: What It Means, How to Measure It, and When It Misleads is a useful companion read.
