How Grab Optimizes Image Caching on Android with Time-Aware LRU
===============================================================
InfoQ · @Sergio De Simone
One Sentence Summary
Grab improved Android image caching by extending Glide's LRU to a time-aware TLRU design that reclaimed significant storage while keeping hit ratio impact within a controlled range.
Summary
The article explains how Grab addressed two opposite failure modes in fixed-size LRU image caches: cache churn for heavy users and stale long-lived entries for light users. Instead of rewriting cache infrastructure, the team forked Glide's DiskLruCache and added time-based expiration, last-access persistence, and migration compatibility for existing journals. Their TLRU policy combines TTL, minimum cache threshold, and maximum cache size to balance retention and eviction. The rollout was guided by explicit success criteria, including a hit-ratio drop cap of no more than 3 percentage points. Reported results showed substantial storage reduction across users, with large aggregate savings and no major server-cost regression, making this a practical mobile performance optimization case.
Main Points
* 1. Pure size-based LRU produced both over-eviction and under-eviction in real usage.
The team observed rapid churn for active users and months-long stale retention for low-activity users when cache size never hit the cap.
* 2. TLRU was implemented by extending a proven cache core rather than replacing it.
By building on Glide DiskLruCache, the solution inherited mature crash recovery and thread-safety behavior while adding time-aware policy logic.
* 3. Migration and rollback compatibility were treated as first-class constraints.
Grab assigned baseline timestamps during migration and preserved bidirectional journal compatibility to support safe fallback.
* 4. The rollout used measurable SLO-like guardrails.
A bounded hit-ratio degradation threshold enabled storage gains without unacceptable request amplification.
Metadata
AI Score
88
Website infoq.com
Published At Yesterday
Length 540 words (about 3 min)
To improve image cache management in their Android app, Grab engineers transitioned from a Least Recently Used (LRU) cache to a Time-Aware Least Recently Used (TLRU) cache, enabling them to reclaim storage more effectively without degrading user experience or increasing server costs.
The Grab Android app used Glide as its primary image loading framework, including an LRU cache to store images locally in order to reduce network calls, improve load times, and lower server costs. However, analytics showed that using a 100 MB LRU cache had significant shortcomings: it often filled up quickly for many users, leading to performance degradation, while in other cases images remained cached for months if the cache never exceeded the size limit, thus wasting storage.
To address these limitations, they decided to extend LRU with time-based expiration. TLRU uses three parameters: _Time To Live_ (TTL), which determines when a cache entry is considered expired; a _minimum cache size threshold_, which ensures essential images remain cached even after they expire, as long as the cache is underpopulated; and a _maximum cache size_, which enforces the upper storage limit.
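The interaction of the three parameters can be sketched as a single eviction decision. This is a minimal illustration, not Grab's actual code; the class name, field names, and threshold values are assumptions.

```java
// Hypothetical TLRU policy parameters; names and values are illustrative,
// not Grab's actual configuration.
final class TlruPolicy {
    final long ttlMillis;       // entry expires this long after its last access
    final long minCacheBytes;   // below this size, retain even expired entries
    final long maxCacheBytes;   // hard upper bound (plain size-based LRU behavior)

    TlruPolicy(long ttlMillis, long minCacheBytes, long maxCacheBytes) {
        this.ttlMillis = ttlMillis;
        this.minCacheBytes = minCacheBytes;
        this.maxCacheBytes = maxCacheBytes;
    }

    /** Decide whether the least-recently-accessed entry should be evicted. */
    boolean shouldEvict(long cacheSizeBytes, long lruLastAccessMillis, long nowMillis) {
        if (cacheSizeBytes > maxCacheBytes) return true;    // over the cap: evict regardless of age
        if (cacheSizeBytes <= minCacheBytes) return false;  // underpopulated: keep even expired images
        return nowMillis - lruLastAccessMillis > ttlMillis; // otherwise evict only expired entries
    }
}
```

The ordering of the checks is the point: the maximum size acts as a hard cap, the minimum threshold overrides expiration, and TTL only matters in between.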
Instead of implementing TLRU from scratch, Grab engineers chose to fork Glide and extend its DiskLruCache implementation, leveraging its "mature, battle-tested foundation":
> This implementation is widely adopted across the Android ecosystem and handles complex edge cases like crash recovery, thread safety, and performance optimization that would require substantial effort to replicate.
DiskLruCache needed to be extended along three dimensions: adding support for tracking last-access times, implementing time-based eviction logic, and providing a migration mechanism for existing user caches.
Last-access times were required to sort cache entries by their most recent access and had to be persisted across app restarts. The time-based eviction logic ran on each cache access, checking whether the least recently accessed entries had expired and removing them if so. For existing cache migration, the main challenge was assigning last-access timestamps to LRU entries. Since filesystem APIs did not provide a reliable source, Grab engineers decided to assign the migration timestamp to all entries:
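The per-access eviction pass described above can be sketched as a scan from the least-recently-accessed end of the entry order. This is a simplification under assumed names, not Glide's or Grab's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a per-access eviction pass: entries are kept in last-access
// order, and on each cache access the oldest entries are removed while they
// are expired. The Entry type and field names are illustrative assumptions.
final class TlruEvictionPass {
    static final class Entry {
        final String key;
        final long lastAccessMillis;
        Entry(String key, long lastAccessMillis) {
            this.key = key;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    /** Remove expired entries from the least-recently-accessed end of the queue. */
    static int evictExpired(Deque<Entry> lruOrder, long ttlMillis, long nowMillis) {
        int evicted = 0;
        // The head of the deque is the least recently accessed entry, so the
        // scan can stop at the first non-expired entry it encounters.
        while (!lruOrder.isEmpty()
                && nowMillis - lruOrder.peekFirst().lastAccessMillis > ttlMillis) {
            lruOrder.pollFirst();
            evicted++;
        }
        return evicted;
    }
}
```

Because entries are already sorted by last access, the pass stops at the first fresh entry, keeping the per-access cost low.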
> This approach preserves all cached content and establishes a consistent baseline, although it necessitates waiting one TTL period to realize the full benefits of eviction. We also ensured bidirectional compatibility - the original LRU implementation can read TLRU journal files by ignoring timestamp suffixes, enabling safe rollbacks if needed.
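The bidirectional journal compatibility in the quote above can be illustrated with a parser that tolerates a trailing timestamp token. The line format shown is a simplification of DiskLruCache's journal, and the timestamp-suffix layout is an assumption about Grab's extension, not its documented format.

```java
// Sketch of journal compatibility: a TLRU journal line appends a timestamp
// after the entry size, while a legacy LRU line carries none. A reader that
// ignores unknown trailing tokens can consume either format; missing
// timestamps fall back to a baseline (e.g. the migration time).
final class JournalLine {
    final String key;
    final long sizeBytes;
    final long lastAccessMillis;

    JournalLine(String key, long sizeBytes, long lastAccessMillis) {
        this.key = key;
        this.sizeBytes = sizeBytes;
        this.lastAccessMillis = lastAccessMillis;
    }

    static JournalLine parse(String line, long fallbackMillis) {
        String[] parts = line.trim().split(" ");
        // parts: STATE key size [timestamp]  (simplified layout)
        String key = parts[1];
        long size = Long.parseLong(parts[2]);
        // Legacy LRU lines have no timestamp; assign the baseline instead.
        long ts = parts.length >= 4 ? Long.parseLong(parts[3]) : fallbackMillis;
        return new JournalLine(key, size, ts);
    }
}
```

The same tolerance works in reverse: an LRU reader that only consumes the first three tokens can read TLRU journals unchanged, which is what makes rollback safe.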
Another challenge was finding optimal configuration values, which Grab engineers approached through controlled experiments.
> Our success criteria is for a cache hit ratio decrease of no more than 3 percentage points (pp) during the transition to TLRU. For instance, a decrease from 59% to 56% hit ratio would result in 7% increase in server requests. This threshold balances storage optimization with acceptable performance impact.
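The arithmetic behind the 3 pp guardrail is worth making explicit: a hit-ratio drop from 59% to 56% raises the miss ratio from 41% to 44%, and 0.44 / 0.41 ≈ 1.073, i.e. roughly 7% more server requests. A small helper illustrates the calculation; the class and method names are mine, not Grab's.

```java
// Relative increase in origin requests caused by a cache hit-ratio drop.
// Every miss becomes a server request, so the amplification is the ratio
// of the new miss rate to the old one.
final class RequestAmplification {
    static double increase(double oldHitRatio, double newHitRatio) {
        double oldMiss = 1.0 - oldHitRatio;
        double newMiss = 1.0 - newHitRatio;
        return newMiss / oldMiss - 1.0; // e.g. (0.44 / 0.41) - 1 ≈ 0.073
    }
}
```

This is why a fixed percentage-point cap still needs context: the same 3 pp drop amplifies requests more when the baseline hit ratio is higher.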
Using this approach, 95% of the app users saw a 50MB reduction in cache size, with the top 5% seeing even larger savings. Based on these results, Grab engineers estimated that they could reclaim terabytes of storage across devices while maintaining the cache hit ratio within acceptable limits and without increasing server costs.
The original post provides more detail on LRU behavior and the TLRU implementation than can be covered here; it is worth reading in full.
Key Quotes
* Our success criteria is for a cache hit ratio decrease of no more than 3 percentage points (pp) during the transition to TLRU.
* Using this approach, 95% of the app users saw a 50MB reduction in cache size.
* This implementation is widely adopted across the Android ecosystem and handles complex edge cases.
Tags
Android
Caching
TLRU
Glide
Mobile Performance