
How Grab Optimizes Image Caching on Android with Time-Aware LRU

📅 2026-03-15 04:00 · Sergio De Simone · Software Programming · 10 min read · 11,671 chars · Score: 88
Tags: Android · Caching · TLRU · Glide · Mobile Performance

How Grab Optimizes Image Caching on Android with Time-Aware LRU
===============================================================

InfoQ · @Sergio De Simone

One Sentence Summary

Grab improved Android image caching by extending Glide's LRU to a time-aware TLRU design that reclaimed significant storage while keeping hit ratio impact within a controlled range.

Summary

The article explains how Grab addressed two opposite failure modes in fixed-size LRU image caches: cache churn for heavy users and stale long-lived entries for light users. Instead of rewriting cache infrastructure, the team forked Glide's DiskLruCache and added time-based expiration, last-access persistence, and migration compatibility for existing journals. Their TLRU policy combines TTL, minimum cache threshold, and maximum cache size to balance retention and eviction. The rollout was guided by explicit success criteria, including a hit-ratio drop cap of no more than 3 percentage points. Reported results showed substantial storage reduction across users, with large aggregate savings and no major server-cost regression, making this a practical mobile performance optimization case.

Main Points

* 1. Pure size-based LRU produced both over-eviction and under-eviction in real usage. The team observed rapid churn for active users and months-long stale retention for low-activity users when cache size never hit the cap.
* 2. TLRU was implemented by extending a proven cache core rather than replacing it. By building on Glide's DiskLruCache, the solution inherited mature crash recovery and thread-safety behavior while adding time-aware policy logic.
* 3. Migration and rollback compatibility were treated as first-class constraints. Grab assigned baseline timestamps during migration and preserved bidirectional journal compatibility to support safe fallback.
* 4. The rollout used measurable SLO-like guardrails. A bounded hit-ratio degradation threshold enabled storage gains without unacceptable request amplification.

Metadata

AI Score

88

Website infoq.com

Published At 2026-03-15

Length 540 words (about 3 min)



To improve image cache management in their Android app, Grab engineers transitioned from a Least Recently Used (LRU) cache to a Time-Aware Least Recently Used (TLRU) cache, enabling them to reclaim storage more effectively without degrading user experience or increasing server costs.

The Grab Android app used Glide as its primary image loading framework, including an LRU cache to store images locally in order to reduce network calls, improve load times, and lower server costs. However, analytics showed that using a 100 MB LRU cache had significant shortcomings: it often filled up quickly for many users, leading to performance degradation, while in other cases images remained cached for months if the cache never exceeded the size limit, thus wasting storage.

To address these limitations, they decided to extend LRU with time-based expiration. TLRU uses three parameters: _Time To Live_ (TTL), which determines when a cache entry is considered expired; a _minimum cache size threshold_, below which entries are retained even after they expire, so that essential images remain cached when the cache is underpopulated; and a _maximum cache size_, which enforces the upper storage limit.
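Taken together, the three parameters yield the eviction rule sketched below. This is a minimal in-memory illustration of the policy as described, not Grab's actual code: the class and method names (`TlruCache`, `evict`) are invented, and the real implementation applies this logic to Glide's on-disk journal.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal in-memory sketch of a TLRU policy (illustrative, not Grab's fork).
// Re-putting an existing key is not handled, to keep the sketch short.
class TlruCache<K> {
    private final long ttlMillis;      // Time To Live: entries older than this are expired
    private final long minSizeBytes;   // below this size, even expired entries are kept
    private final long maxSizeBytes;   // hard upper storage limit
    private long currentSizeBytes = 0;

    // accessOrder = true: iteration visits the least recently accessed entry first
    private final LinkedHashMap<K, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);

    private static class Entry {
        final long sizeBytes;
        final long lastAccessMillis;
        Entry(long sizeBytes, long lastAccessMillis) {
            this.sizeBytes = sizeBytes;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    TlruCache(long ttlMillis, long minSizeBytes, long maxSizeBytes) {
        this.ttlMillis = ttlMillis;
        this.minSizeBytes = minSizeBytes;
        this.maxSizeBytes = maxSizeBytes;
    }

    void put(K key, long sizeBytes, long nowMillis) {
        entries.put(key, new Entry(sizeBytes, nowMillis));
        currentSizeBytes += sizeBytes;
        evict(nowMillis);
    }

    /** Run on each access: drop LRU entries that are over the size cap or expired. */
    void evict(long nowMillis) {
        Iterator<Map.Entry<K, Entry>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Entry e = it.next().getValue();
            boolean overMax = currentSizeBytes > maxSizeBytes;
            // The minimum threshold keeps expired entries while the cache is underpopulated.
            boolean expired = nowMillis - e.lastAccessMillis > ttlMillis
                    && currentSizeBytes > minSizeBytes;
            if (overMax || expired) {
                currentSizeBytes -= e.sizeBytes;
                it.remove();
            } else {
                break; // remaining entries were accessed more recently
            }
        }
    }

    long size() { return currentSizeBytes; }
    boolean contains(K key) { return entries.containsKey(key); }
}
```

With a 1-second TTL, a 150-byte minimum, and a 200-byte maximum, inserting a third 100-byte entry evicts the least recently used one on size alone, while a much later access evicts expired entries only until the minimum threshold is reached.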

Instead of implementing TLRU from scratch, Grab engineers chose to fork Glide and extend its DiskLruCache implementation, leveraging its "mature, battle-tested foundation":

> This implementation is widely adopted across the Android ecosystem and handles complex edge cases like crash recovery, thread safety, and performance optimization that would require substantial effort to replicate.

DiskLruCache needed to be extended along three dimensions: adding support for tracking last-access time, implementing time-based eviction logic, and providing a migration mechanism for existing user caches.

Last-access times were required to sort cache entries by their most recent access and had to be persisted across app restarts. The time-based eviction logic ran on each cache access, checking whether the least recently accessed entry had expired before removing it. For existing cache migration, the main challenge was assigning last-access timestamps to LRU entries. Since filesystem APIs did not provide a reliable source, Grab engineers decided to assign the migration timestamp to all entries:

> This approach preserves all cached content and establishes a consistent baseline, although it necessitates waiting one TTL period to realize the full benefits of eviction. We also ensured bidirectional compatibility - the original LRU implementation can read TLRU journal files by ignoring timestamp suffixes, enabling safe rollbacks if needed.
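That compatibility scheme can be sketched as follows. The article does not show the journal format of Grab's fork, so this is an assumption: Glide's DiskLruCache records committed entries in journal lines such as `CLEAN <key> <size>`, and the sketch assumes the TLRU fork appends a last-access timestamp as one extra trailing token, which the original LRU reader simply ignores and which the TLRU reader defaults to the migration time when it is absent.

```java
import java.util.Optional;

// Sketch of bidirectional journal compatibility; the exact format is an
// assumption for illustration, not taken from Grab's actual fork.
class JournalEntry {
    final String key;
    final long sizeBytes;
    final long lastAccessMillis;

    JournalEntry(String key, long sizeBytes, long lastAccessMillis) {
        this.key = key;
        this.sizeBytes = sizeBytes;
        this.lastAccessMillis = lastAccessMillis;
    }

    /**
     * Parse a DiskLruCache-style "CLEAN <key> <size>" journal line.
     * A TLRU journal appends a last-access timestamp as a trailing token;
     * when it is missing (a pre-migration LRU journal), the migration time
     * is assigned as a consistent baseline, as the article describes.
     * An LRU reader would run the same parse and ignore the extra token.
     */
    static Optional<JournalEntry> parseClean(String line, long migrationTimeMillis) {
        String[] tokens = line.trim().split(" ");
        if (tokens.length < 3 || !tokens[0].equals("CLEAN")) {
            return Optional.empty();
        }
        long size = Long.parseLong(tokens[2]);
        long lastAccess = tokens.length >= 4
                ? Long.parseLong(tokens[3])   // TLRU journal: timestamp present
                : migrationTimeMillis;        // legacy LRU journal: assign baseline
        return Optional.of(new JournalEntry(tokens[1], size, lastAccess));
    }
}
```

Because the timestamp rides along as an extra token rather than a format change, rolling back to the plain LRU build leaves the journal readable, which is the safety property the quote emphasizes.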

Another challenge was finding optimal configuration values, which the team approached through controlled experiments:

> Our success criteria is for a cache hit ratio decrease of no more than 3 percentage points (pp) during the transition to TLRU. For instance, a decrease from 59% to 56% hit ratio would result in 7% increase in server requests. This threshold balances storage optimization with acceptable performance impact.
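The arithmetic behind the quoted example: server requests scale with the miss rate, so a hit-ratio drop from 59% to 56% raises misses from 41% to 44%, and 0.44 / 0.41 ≈ 1.073, roughly the 7% increase cited. A one-line helper makes the relationship explicit:

```java
// Fractional increase in server requests when the cache hit ratio drops:
// every miss becomes a network request, so the ratio of miss rates is what matters.
class HitRatioMath {
    static double requestIncrease(double hitBefore, double hitAfter) {
        return (1.0 - hitAfter) / (1.0 - hitBefore) - 1.0;
    }
}
```

For instance, `requestIncrease(0.59, 0.56)` evaluates to about 0.073, matching the roughly 7% figure in the quote.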

Using this approach, 95% of the app users saw a 50MB reduction in cache size, with the top 5% seeing even larger savings. Based on these results, Grab engineers estimated that they could reclaim terabytes of storage across devices while maintaining the cache hit ratio within acceptable limits and without increasing server costs.

The original post provides far more detail on LRU behavior and the TLRU implementation than can be covered here; read it for the full picture.



Tags

Android

Caching

TLRU

Glide

Mobile Performance



View original → Published: 2026-03-15 04:00:00 · Collected: 2026-03-15 08:01:00
