Title: Grab Engineering in Practice: Upgrading LRU to TLRU Saves 50MB+ of Android Image Cache | BestBlogs.dev
URL Source: https://www.bestblogs.dev/article/2c79a8c1
Published Time: 2026-03-20 08:07:00
InfoQ 中文 @InfoQ 中文
One Sentence Summary
By upgrading Glide's LRU cache to a time-aware TLRU, Grab engineers shrank the on-device image cache by about 50 MB for 95% of Android users while keeping the cache hit rate within an acceptable range.
Summary
This article details Grab's engineering work on optimizing Android image caching. Because the LRU algorithm built into Glide fails to clear cache entries that sit idle for long periods, Grab introduced a TLRU (Time-Aware Least Recently Used) mechanism. It combines a TTL (Time-to-Live), a minimum capacity threshold, and a maximum capacity limit to achieve finer-grained eviction. By forking and extending Glide's DiskLruCache, Grab engineers solved the core challenges of tracking access times, implementing the eviction logic, and migrating legacy caches smoothly. Experimental results show that, while keeping the drop in cache hit rate within 3 percentage points, the optimization significantly reduced the storage footprint on user devices, balancing performance against cost.
Main Points
* 1. The traditional LRU algorithm forces a trade-off between wasted storage and degraded performance in mobile image caching.
LRU considers only access order, so some images stay in the cache for months taking up space, while a full cache suffers performance fluctuations from frequent evictions.
* 2. TLRU introduces a time dimension (TTL) to reclaim cache space more intelligently.
By combining TTL, minimum capacity, and maximum capacity parameters, it promptly clears expired "zombie" cache entries without affecting the core user experience.
* 3. Extending a mature framework (Glide) is an efficient, low-risk implementation path.
By forking and modifying DiskLruCache, Grab reused its existing crash recovery and thread-safety mechanisms and focused on last-access time tracking and bidirectionally compatible migration.
* 4. Clear, quantitative success criteria are key to evaluating performance optimizations.
Grab set a hard limit of at most a 3 percentage point drop in cache hit rate, balancing storage reclamation gains against increased server bandwidth costs.
Metadata
AI Score
83
Website mp.weixin.qq.com
Published At 2026-03-20
Length 1514 words (about 7 min)
InfoQ, 2026-03-20 16:07, Jiangsu
How Grab optimized Android image caching with a time-aware LRU.
Author | Sergio De Simone
Translator | 田橙
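To improve image cache management in its Android app, Grab's engineers upgraded the existing Least Recently Used (LRU) cache to a Time-Aware Least Recently Used (TLRU) cache. The change let them reclaim storage more efficiently without degrading the user experience or increasing server costs.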
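Grab's Android app uses Glide as its primary image-loading framework. Glide ships with a built-in LRU cache that stores images locally to reduce network requests, speed up loading, and lower server costs. However, data analysis revealed clear problems with the 100 MB LRU cache: for many users the cache filled up quickly, which hurt performance, while in other cases, when the cache never reached its size limit, images could stay cached for months and waste storage.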
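To make the second failure mode concrete, here is a minimal in-memory sketch of a size-bounded LRU, assuming entry sizes are plain byte counts. It is not Glide's DiskLruCache, which stores files on disk and tracks them in a journal, and the class name `SizeBoundedLru` is hypothetical. The property it illustrates is that eviction happens only when the total size exceeds the limit, so an entry that is never touched again survives indefinitely as long as the cache stays under its cap.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, illustration-only LRU bounded by total byte size (not Glide's API).
class SizeBoundedLru {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration runs from least to most recently accessed entry.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);

    SizeBoundedLru(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns true on a hit; LinkedHashMap.get() also moves the key to the most-recent end.
    boolean get(String key) {
        return sizes.get(key) != null;
    }

    void put(String key, long bytes) {
        Long old = sizes.put(key, bytes);
        if (old != null) {
            currentBytes -= old;
        }
        currentBytes += bytes;
        trimToSize();
    }

    // Evicts least recently used entries, but only while the cache is over its size limit.
    // Age plays no role: a stale entry is kept as long as the total stays under maxBytes.
    private void trimToSize() {
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            currentBytes -= it.next().getValue();
            it.remove();
        }
    }

    boolean contains(String key) {
        return sizes.containsKey(key); // does not count as an access
    }

    long sizeBytes() {
        return currentBytes;
    }
}
```

For example, with a 100-byte limit, a 10-byte entry stored once next to two 40-byte entries is never evicted, however long it sits unused, until a later `put` pushes the total past 100.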
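To work around these limitations, Grab's engineers decided to add time-based expiration on top of LRU. TLRU is controlled by three parameters: a Time To Live (TTL), which determines when a cache entry is considered expired; a minimum cache size threshold, which keeps important images in the cache even after their entries have expired, as long as the cache is below that size; and a maximum cache size, which caps the cache's storage.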
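The interaction of the three parameters can be written as a single decision about the least recently used entry. This is one plausible reading of the description, not Grab's code: it assumes the maximum size is enforced exactly as in plain LRU, and that an expired entry is evicted only while the total size is above the minimum threshold. `TlruPolicy` and `shouldEvictEldest` are hypothetical names.

```java
import java.time.Duration;

// Hypothetical TLRU eviction rule for the least recently used entry (not Grab's or Glide's API).
final class TlruPolicy {
    final Duration ttl;   // entries idle for longer than this count as expired
    final long minBytes;  // floor: at or below this total size, expired entries are kept
    final long maxBytes;  // ceiling: above this total size, evict regardless of age

    TlruPolicy(Duration ttl, long minBytes, long maxBytes) {
        if (minBytes > maxBytes) {
            throw new IllegalArgumentException("minBytes must not exceed maxBytes");
        }
        this.ttl = ttl;
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    boolean shouldEvictEldest(long lastAccessMillis, long nowMillis, long totalBytes) {
        if (totalBytes > maxBytes) {
            return true; // plain LRU behavior: make room no matter how fresh the entry is
        }
        boolean expired = nowMillis - lastAccessMillis > ttl.toMillis();
        return expired && totalBytes > minBytes; // expire idle entries only while above the floor
    }
}
```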
Rather than writing TLRU from scratch, Grab's engineers forked Glide and extended its DiskLruCache implementation to take advantage of its "mature, battle-tested infrastructure".
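> This implementation is widely used across the Android ecosystem and already handles many tricky edge cases, such as crash recovery, thread safety, and performance optimization. Building these capabilities from scratch would have required significant additional engineering effort.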
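To support TLRU, DiskLruCache had to be extended in three ways: tracking each entry's last-access time, implementing time-based eviction logic, and providing a migration path for existing users' caches.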
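Last-access times are recorded so that cache entries can be ordered by how recently they were accessed, and this information must survive app restarts. The time-based eviction logic runs on every cache access: it checks whether the least recently accessed entry has expired and, if so, removes it.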
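The bookkeeping described here can be sketched in memory. This is a hypothetical illustration, not the forked DiskLruCache: it keeps a last-access timestamp per entry in an access-ordered `LinkedHashMap`, runs the eviction check on every `get` and `put`, and injects the clock as a `LongSupplier` so tests can control time. It leaves out persisting the timestamps across restarts, which the real cache must do. Because entries are ordered by access and timestamps come from a non-decreasing clock, the scan can stop at the first entry it is allowed to keep.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory TLRU index: sizes in bytes, times in milliseconds.
class TlruIndex {
    private static final class Entry {
        final long bytes;
        long lastAccessMillis;

        Entry(long bytes, long lastAccessMillis) {
            this.bytes = bytes;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final long ttlMillis;
    private final long minBytes;
    private final long maxBytes;
    private final LongSupplier clock;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long totalBytes = 0;

    TlruIndex(long ttlMillis, long minBytes, long maxBytes, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
        this.clock = clock;
    }

    // Runs eviction first, then looks the key up and refreshes its last-access time on a hit.
    boolean get(String key) {
        trim();
        Entry e = entries.get(key);
        if (e == null) {
            return false;
        }
        e.lastAccessMillis = clock.getAsLong();
        return true;
    }

    void put(String key, long bytes) {
        Entry old = entries.put(key, new Entry(bytes, clock.getAsLong()));
        if (old != null) {
            totalBytes -= old.bytes;
        }
        totalBytes += bytes;
        trim();
    }

    // Evicts from the LRU end while the cache is over maxBytes, or while the eldest entry
    // is older than the TTL and the cache is still above the minBytes floor.
    private void trim() {
        long now = clock.getAsLong();
        Iterator<Entry> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry eldest = it.next();
            boolean overMax = totalBytes > maxBytes;
            boolean expired = now - eldest.lastAccessMillis > ttlMillis;
            if (!overMax && !(expired && totalBytes > minBytes)) {
                break; // later entries were accessed no earlier than this one, so they are kept too
            }
            totalBytes -= eldest.bytes;
            it.remove();
        }
    }

    boolean contains(String key) {
        return entries.containsKey(key);
    }

    long sizeBytes() {
        return totalBytes;
    }
}
```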
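For existing caches, the main migration challenge was assigning last-access timestamps to entries created by the original LRU cache. Because file system APIs do not provide a reliable time source, Grab's engineers ultimately decided to give every existing cache entry the same migration timestamp.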
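> This approach preserves all existing cached content and establishes a consistent time baseline. However, it also means that the eviction mechanism only takes full effect after one complete TTL period has elapsed. They also ensured bidirectional compatibility: the original LRU implementation can read TLRU journal files by ignoring the timestamp suffix, which allows a safe rollback if needed.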
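The migration and rollback rules can be sketched as a journal parser. The record format here is an assumption for illustration: the op names `CLEAN`, `READ`, and `REMOVE` mirror those in DiskLruCache's journal, but the ` t=<millis>` timestamp suffix and the `JournalMigration` helper are hypothetical, not the format Grab shipped. Legacy records without a suffix all receive the same migration timestamp, and a timestamp-unaware reader can recover the legacy line by stripping the suffix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical journal handling. Assumed record shapes:
//   legacy LRU: "<OP> <key> [fields...]"
//   TLRU:       "<OP> <key> [fields...] t=<lastAccessMillis>"
final class JournalMigration {
    static final String TS_PREFIX = "t=";

    private JournalMigration() {}

    // Last-access time of a record; legacy records fall back to the shared migration timestamp.
    static long lastAccessOf(String line, long migrationMillis) {
        String[] parts = line.trim().split(" ");
        String last = parts[parts.length - 1];
        if (parts.length > 2 && last.startsWith(TS_PREFIX)) {
            return Long.parseLong(last.substring(TS_PREFIX.length()));
        }
        return migrationMillis;
    }

    // Rollback path: a reader that knows nothing about timestamps drops the suffix first.
    static String stripTimestamp(String line) {
        int i = line.lastIndexOf(" " + TS_PREFIX);
        return i >= 0 ? line.substring(0, i) : line;
    }

    // Replays journal lines into key -> last-access time; later records override earlier ones.
    static Map<String, Long> replay(Iterable<String> lines, long migrationMillis) {
        Map<String, Long> lastAccess = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split(" ");
            if (parts.length < 2) {
                continue; // header or blank line
            }
            switch (parts[0]) {
                case "CLEAN":
                case "READ":
                    lastAccess.put(parts[1], lastAccessOf(line, migrationMillis));
                    break;
                case "REMOVE":
                    lastAccess.remove(parts[1]);
                    break;
                default:
                    break; // e.g. DIRTY (a write in progress): no access time recorded
            }
        }
        return lastAccess;
    }
}
```

Because every legacy entry shares one baseline, no migrated entry can look expired until a full TTL has passed since the migration, which is why the eviction only reaches full effect after one TTL period.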
Another challenge was finding the best configuration parameters, which required controlled experiments.
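> Grab set a clear success criterion for this: when moving from LRU to TLRU, the cache hit rate must not drop by more than 3 percentage points (pp). For example, a drop in hit rate from 59% to 56% would mean roughly 7% more server requests. This threshold strikes a balance between storage savings and performance impact.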
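The 7% figure follows from the miss rates, assuming each cache miss turns into one server request and the number of image loads stays the same:

$$
\frac{\text{server requests with TLRU}}{\text{server requests with LRU}}
= \frac{1 - 0.56}{1 - 0.59}
= \frac{0.44}{0.41}
\approx 1.073
$$

So a 3 pp drop from a 59% hit rate raises the miss rate from 41% to 44%, which is roughly 7.3% more requests reaching the server.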
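With this approach, 95% of the app's users saw their cache size shrink by about 50 MB, and the 5% with the largest caches saw even larger savings. Based on these results, Grab's engineers estimate that they can reclaim terabytes of storage across users' devices while keeping the cache hit rate within an acceptable range and without increasing server costs.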
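The original article goes into far more technical depth on how LRU behaves and how TLRU was implemented than this piece can cover; readers who want the full implementation details should read it.
Link to the original English article: https://www.infoq.com/news/2026/03/grab-tlru-image-cache/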
Key Quotes
* TLRU is controlled by three parameters: first, TTL, used to determine when a cache entry expires; second, the minimum cache capacity threshold; and finally, the maximum cache capacity.
* Grab engineers did not write TLRU from scratch; instead, they chose to fork the Glide project and extend its DiskLruCache implementation to leverage its mature infrastructure.
* During the migration from LRU to TLRU, the cache hit rate was not allowed to drop by more than 3 percentage points (pp).
* 95% of app users saw their cache size reduced by approximately 50 MB, while the 5% of users with the largest cache sizes achieved even more significant savings.
Tags
Android Development
Cache Optimization
LRU
TLRU
Glide