Qujing Technology Launches ATaaS: A World-Leading High-Performance AI Token Production Service Platform

QbitAI (量子位) @十三

One Sentence Summary

Qujing Technology has launched the ATaaS platform, which aims to solve the problem of low resource utilization in intelligent computing clusters through heterogeneous inference reconstruction, SLO simulation, and extreme elastic scheduling technologies, transforming computing infrastructure into a high-performance 'Token Factory'.

Summary

Qujing Technology has officially launched ATaaS (Approaching.AI Token as a Service), a high-performance AI Token production service platform. To address industry pain points such as the mismatch between computing investment and Token output and severe waste of hardware resources, the platform introduces four core technical modules. By restructuring model computation logic (e.g., CPU+GPU heterogeneous separation), simulating SLOs at the operator level, and scaling elastically across 10,000-GPU clusters, ATaaS aims to improve hardware utilization significantly and reduce operating costs. Its core philosophy is to evolve traditional data centers into efficiency-oriented 'Token Factories,' providing system-level optimization to improve the quality and efficiency of domestic computing power.

Main Points

* 1. The industry pain point lies in the imbalance between computing power investment and Token output.

Traditional architectures rely too heavily on GPUs, leading to idle CPU and memory. Furthermore, the imbalance between software/hardware iteration and coarse-grained allocation results in over 50% of computing resources being wasted.

* 2. Restructuring heterogeneous inference logic through 'Liuhe' technology.

Deeply integrates CPU+GPU and domestic computing power to achieve intelligent task offloading, while expanding KV Cache by a hundredfold, reducing GPU computing overhead by up to 90%.

* 3. Introducing operator-level SLO simulation to achieve precise scheduling.

Uses virtual-physical isomorphism technology to pre-plan computing resources and isolate priorities based on business needs, increasing overall hardware utilization by several times.

* 4. Infrastructure is evolving from 'Data Centers' to 'Token Factories'.

The focus of industry competition has shifted from pure computing scale to Token production efficiency (e.g., TTFT, TPS), emphasizing value output per unit of energy consumption.

Metadata

AI Score

Website qbitai.com

Published At Today

Length 1893 words (about 8 min)

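The AI industry is shifting from competing on model capability to competing on large-scale application. Applications have expanded from single-turn question answering to multi-agent collaboration, long-chain reasoning, and compound task execution, which is driving rapid growth in Token demand. Meanwhile, equipment and energy costs across compute procurement, deployment, and operation keep climbing, so the mismatch between compute investment and actual Token output is becoming more pronounced. System-level optimization of Token inference efficiency has become a key link in sustaining the industry's growth.

Against this backdrop, Qujing Technology, a globally leading provider of high-performance AI Token production services, recently released its new-generation AI inference platform: the Qujing ATaaS high-performance AI Token production service platform (Approaching.AI Token as a Service). The platform targets an industry predicament in which large hardware investments are hard to convert into high-quality Token capacity while resource waste and idle cost remain severe.

To address these challenges, ATaaS relies on four self-developed core technology modules to build end-to-end capabilities covering heterogeneous integration, intelligent scheduling, and elastic scaling. It packages compute and energy into tiered, high-performance Token services customized for specific application scenarios, offering a benchmark Chinese solution for improving the quality and efficiency of domestic compute, breaking down heterogeneous compute silos, and cutting costs at scale.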

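1. Divergent hardware load: over-reliance on GPUs leaves CPUs, memory, and other resources idle

The traditional Token generation pipeline depends heavily on GPUs, while expensive resources such as CPUs, large-capacity memory, cluster SSDs, and high-speed InfiniBand (IB) interconnects run at under 10% utilization over the long term. Hardware utilization across the full system stays below 20%, so much of the large fixed cost of an intelligent computing cluster is spent on idle capacity.

2. Unbalanced software and hardware iteration: chips update quickly while the supporting software ecosystem lags

Although nominal hardware compute keeps rising, the software layer is under-optimized in communication, memory access, and operator fusion, and distributed parallel strategies such as PD/PP/CP/DP have limited stability in complex combinations. As a result, more than 80% of theoretical compute is hard to use fully.

3. Misaligned compute configuration: scheduling detached from fine-grained business SLOs, with coarse, blind allocation causing redundancy and waste

Cluster compute configuration today struggles to match heterogeneous resources such as CPUs, GPUs, and memory precisely to the differing latency, throughput, and stability requirements of each inference workload. Uniform deployment and coarse quotas remain common, leaving more than 50% of compute resources silently wasted.

4. Unbalanced architectural evolution: open-source modules abound, but stitched-together integration cannot support production at scale

The open-source ecosystem offers a rich set of modules for large-model inference, but in large cluster scenarios, stitching components together does not solve system-level coordination. Native architectures have limited awareness of key model-state parameters such as KV Cache and sequence length, which easily leads to load imbalance. Combined with communication congestion and service fluctuation, systems often suffer performance degradation and rising operational complexity as they scale, and cannot support large-scale, high-performance Token production.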
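The load-imbalance point in item 4 can be pictured with a small example: a balancer that only counts requests treats a 300-token chat and a 60,000-token context as equal, while one that tracks resident KV Cache tokens does not. The sketch below is an illustration only; the names `Replica`, `pick_by_requests`, and `pick_by_kv_tokens` and all numbers are assumptions, and nothing here describes how ATaaS itself schedules work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """A hypothetical inference replica and its in-flight sequences."""
    name: str
    # Number of tokens each in-flight request currently holds in KV Cache.
    seq_lens: List[int] = field(default_factory=list)

    @property
    def num_requests(self) -> int:
        return len(self.seq_lens)

    @property
    def kv_tokens(self) -> int:
        return sum(self.seq_lens)


def pick_by_requests(replicas: List[Replica]) -> Replica:
    """Model-state-unaware policy: fewest in-flight requests wins."""
    return min(replicas, key=lambda r: r.num_requests)


def pick_by_kv_tokens(replicas: List[Replica]) -> Replica:
    """KV-aware policy: fewest resident KV Cache tokens wins."""
    return min(replicas, key=lambda r: r.kv_tokens)


replicas = [
    Replica("a", [60_000]),         # one very long context
    Replica("b", [500, 800, 300]),  # three short chats
]

print(pick_by_requests(replicas).name)   # a: fewest requests, but most memory pressure
print(pick_by_kv_tokens(replicas).name)  # b: 1,600 resident tokens vs 60,000
```

The request-count policy keeps sending work to the replica already holding the long context, which is the kind of imbalance the paragraph attributes to architectures that do not see KV Cache or sequence length.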

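At its core, Qujing Technology's high-performance Token offering reshapes the efficiency curve relating compute, electricity, and Token output. ATaaS is not a simple resource supply platform but an efficiency amplifier: it uses software to unlock several times the current Token production capacity.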

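Liuhe (六合): Heterogeneous Inference 2.0 | World-first technology for restructuring large-model computation logic

Liuhe deeply integrates CPU+GPU computing with heterogeneous PD (Prefill/Decode) separation across domestic and non-domestic accelerators, restructures model computation logic, and routes work based on operator and task characteristics: CPUs take low-compute-density tasks, domestic accelerator cards handle compute-dense Prefill, and large-VRAM GPUs handle memory-access-heavy Decode. Overall operating cost for 10,000-GPU-scale intelligent computing clusters drops by more than 20%.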
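The three-way split described above can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions, not Liuhe's actual logic: the `Phase` and `Pool` enums, the `Task` fields, the `CPU_MAX_INTENSITY` threshold, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"
    DECODE = "decode"
    AUX = "aux"  # everything else, e.g. tokenization or sampling


class Pool(Enum):
    CPU = "cpu"
    DOMESTIC_ACCELERATOR = "domestic_accelerator"
    LARGE_VRAM_GPU = "large_vram_gpu"


@dataclass(frozen=True)
class Task:
    name: str
    phase: Phase
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes read from and written to memory

    @property
    def intensity(self) -> float:
        """Compute density as FLOPs per byte moved."""
        return self.flops / self.bytes_moved


# Hypothetical cut-off; a real value would depend on the hardware.
CPU_MAX_INTENSITY = 1.0


def route(task: Task) -> Pool:
    """Map a task to a hardware pool following the split in the text."""
    if task.phase is Phase.PREFILL:
        return Pool.DOMESTIC_ACCELERATOR  # compute-dense Prefill
    if task.phase is Phase.DECODE:
        return Pool.LARGE_VRAM_GPU        # memory-access-heavy Decode
    if task.intensity <= CPU_MAX_INTENSITY:
        return Pool.CPU                   # low-compute-density work
    return Pool.LARGE_VRAM_GPU            # assumed fallback for dense aux work


tasks = [
    Task("sampling", Phase.AUX, flops=1e6, bytes_moved=4e6),
    Task("prefill-4k", Phase.PREFILL, flops=5e13, bytes_moved=2e11),
    Task("decode-step", Phase.DECODE, flops=3e10, bytes_moved=2e10),
]
for t in tasks:
    print(t.name, "->", route(t).value)
```

A production system would presumably decide per operator and account for transfer cost between pools; the sketch only shows the shape of the decision.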

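Through architectural restructuring, KV Cache storage that originally depended on expensive VRAM is expanded by a hundred to a thousand times, forming a nearly unlimited cache pool. Cache hit rates reach up to 90%, directly cutting up to 90% of GPU compute overhead.

Shuangyi (双仪): Virtual-Physical Isomorphism | World-first operator-level SLO simulation

Using fine-grained operator-level simulation, Shuangyi projects the throughput, latency, and memory-access behavior of the full large-model Token generation pipeline, enabling intelligent pre-planning and dynamic tuning of compute resources. Around tiered business SLO requirements, it precisely partitions heterogeneous compute quotas and isolates resource priorities, raising overall hardware utilization of 10,000-GPU-scale clusters by up to several times.

Wanxiang (万象): Extreme Elasticity | Clearing the last barrier to mass production at scale

Backed by systematic engineering capabilities, Wanxiang brings up trillion-parameter models in 7 seconds with dynamic configuration changes, provides elastic scheduling of very large-scale EP (expert parallelism) across hundreds of nodes, and supports intelligent disaster-recovery reconstruction and load balancing, giving the platform native support for high-performance horizontal scaling at the 10,000-GPU level. Early in deployment, it helped an online company's AI business double the throughput of a 1,000-GPU cluster.

From 'Data Centers' to 'Token Factories'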

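The release of Qujing ATaaS, the high-performance AI Token production service platform (Approaching.AI Token as a Service), reflects a further shift in the focus of AI infrastructure development. Industry attention is moving from competition on raw compute scale to a comprehensive measure of Token production efficiency, including key metrics such as Token response latency (TTFT), Token throughput (TPS), and resource utilization efficiency.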
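For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from request traces: TTFT as the delay from submission to the first output token, and TPS as output tokens per second of wall-clock time across all requests. The `RequestTrace` type, the function names, and the timestamps are illustrative assumptions; the article does not say how ATaaS measures either metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record, all times in seconds."""
    sent_at: float            # when the request was submitted
    token_times: List[float]  # arrival time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token for one request."""
    return trace.token_times[0] - trace.sent_at


def tps(traces: List[RequestTrace]) -> float:
    """Aggregate output tokens per second over the measured window."""
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    total_tokens = sum(len(t.token_times) for t in traces)
    return total_tokens / (end - start)


traces = [
    RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.8]),
    RequestTrace(sent_at=0.2, token_times=[0.6, 0.8, 1.0, 1.2]),
]

for i, trace in enumerate(traces):
    print(f"request {i}: TTFT = {ttft(trace):.2f} s")
print(f"aggregate TPS = {tps(traces):.2f} tokens/s")
```

Dividing the same token count by measured energy instead of elapsed time would give a tokens-per-joule figure, which is closer to the "value output per unit of energy" framing used in the summary.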

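This view matches current industry consensus. With Jensen Huang announcing NVIDIA's $1 trillion demand forecast and Token supply shortages becoming the norm, compute infrastructure is evolving from 'data centers' into 'Token factories'.

As a high-performance AI Token production service provider, Qujing Technology draws on its team's long-term experience and inference optimization capability. The significance of ATaaS lies not only in extending the technical boundary of inference infrastructure, but also in offering new thinking and an industry benchmark for building and operating AI infrastructure: by improving compute scheduling efficiency, optimizing the inference process, and strengthening resource coordination, each unit of compute and energy invested can be converted several times over into more stable, more measurable Token value.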

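Reposted from: Qujing Technology

This article is reposted by QbitAI with authorization; the views expressed are solely those of the original author.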

Key Quotes

* ATaaS is not just a simple resource supply platform, but an efficiency amplifier, leveraging software to unlock several times the current Token production capacity.
* Computing infrastructure is evolving from 'Data Centers' to 'Token Factories'.
* Through architectural restructuring, the KV Cache storage space, which originally relied on expensive VRAM, is expanded by a hundred to a thousand times, forming a nearly infinite cache pool resource.
