
Tether AI Runs Billion-Parameter Models on Smartphones with QVAC Fabric

📅 2026-03-17 21:59 · 0xSammy · Artificial Intelligence · 3 min read · 2784 words · Score: 94
Tether AI · QVAC Fabric · BitNet · LoRA · On-device AI
📌 One-line summary: Tether AI's newly released QVAC Fabric framework uses BitNet LoRA techniques to enable training and inference of billion-parameter AI models directly on consumer GPUs and smartphones, paving the way for fully private, offline personal AI.

📝 Detailed summary: This post highlights the major breakthrough in Tether AI's release of QVAC Fabric, which integrates the world's first cross-platform BitNet LoRA framework. The innovation allows billion-parameter AI models to be trained and run efficiently on consumer GPUs and smartphones, including flagship devices such as the iPhone 16. The tweet quoted by Tether's CTO explains BitNet's 1-bit architecture and LoRA's par…

Title: Tether AI Enables Billion-Parameter Models on Smartphones...

URL Source: https://www.bestblogs.dev/status/2033906190451687723

Published Time: 2026-03-17 13:59:58

Tether AI breakthrough: the Tether AI team has just released a new version of QVAC Fabric that includes the world's first cross-platform BitNet LoRA framework, enabling billion-parameter AI training and inference on consumer GPUs and smartphones.

Background

Microsoft's BitNet uses a 1-bit architecture to dramatically compress models.

Traditional LLMs operate on full-precision computation, where weights are stored as high-precision floating-point numbers. BitNet's innovation is to shrink these weights into a tiny ternary range of only -1, 0, and 1, significantly reducing memory usage and computation.
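To make the idea concrete, here is an illustrative absmean-style ternary quantization in the spirit of BitNet b1.58; this is a sketch for intuition only, not Tether's or Microsoft's actual implementation.

```python
def quantize_ternary(weights):
    # Scale by the mean absolute value, then round each weight into {-1, 0, 1}.
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

q, s = quantize_ternary([0.8, -0.05, -1.2, 0.3])
# Each ternary weight costs ~1.58 bits to store instead of 16 or 32.
```

Large weights saturate at -1 or 1 and small ones collapse to 0; the per-tensor scale preserves the overall magnitude at matmul time.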

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that reduces the number of trainable parameters by up to ninety-nine percent.
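A back-of-the-envelope check makes the "up to ninety-nine percent" figure plausible. The layer size and rank below are illustrative numbers chosen by us, not taken from the announcement: a full d_in × d_out weight update is replaced by a rank-r pair of adapter matrices A (r × d_in) and B (d_out × r).

```python
def lora_savings(d_in, d_out, rank):
    full = d_in * d_out             # trainable params for a full weight update
    lora = rank * (d_in + d_out)    # trainable params for the LoRA adapter
    return full, lora, 1 - lora / full

full, lora, saved = lora_savings(4096, 4096, 8)
# A 4096x4096 layer at rank 8: ~16.8M -> ~65.5K trainable params, ~99.6% fewer.
```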

Together they slash memory and compute requirements. Yet BitNet has mostly been limited to CPU and CUDA (NVIDIA) backends, and lacked support for LoRA fine-tuning.

Enter QVAC Fabric: the unlock

Today, with QVAC Fabric LLM, BitNet LoRA fine-tuning and inference work cross-platform for the first time, across GPU vendors and operating systems, using Vulkan and Metal backends.

That means support for AMD, Intel, and Apple (Metal) GPUs, as well as mobile GPUs.
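The Vulkan/Metal split above could be dispatched roughly like this. The function and backend names here are hypothetical assumptions for illustration, not QVAC Fabric's actual API.

```python
import platform

def pick_gpu_backend():
    # Apple platforms expose GPUs through Metal; most other desktop and
    # Android GPUs (AMD, Intel, NVIDIA, mobile) are reachable through Vulkan.
    if platform.system() == "Darwin":
        return "metal"
    return "vulkan"

backend = pick_gpu_backend()
```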

And for the first time ever, BitNet inference runs efficiently on smartphones using mobile GPUs.

On flagship devices, GPU inference is 2 to 11 times faster than CPU inference while using up to 90% less memory than full-precision models.
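The memory figure is easy to sanity-check. The arithmetic below is ours, not the announcement's: storing 1B weights at ~1.58 bits each versus 16-bit full precision lands almost exactly at a 90% reduction.

```python
def model_weight_gb(n_params, bits_per_weight):
    # Weight storage only; activations and KV cache are not counted here.
    return n_params * bits_per_weight / 8 / 1e9

fp16 = model_weight_gb(1_000_000_000, 16)       # 2.0 GB
ternary = model_weight_gb(1_000_000_000, 1.58)  # ~0.2 GB
reduction = 1 - ternary / fp16                  # ~0.90, matching the claim
```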

The biggest unlock: QVAC Fabric LLM support for BitNet LoRA fine-tuning on heterogeneous GPUs. Our team demonstrated this by fine-tuning models of up to 3.8 billion parameters on flagship phones such as the Pixel 9, S25, and iPhone 16, and models of up to 13 billion parameters on the iPhone 16.
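Conceptually, combining the two techniques means a frozen ternary base layer plus a small trainable full-precision adapter, so fine-tuning only ever updates the tiny A and B matrices. The pure-Python sketch below illustrates that idea with hypothetical shapes; it is not QVAC Fabric's code.

```python
def bitnet_lora_forward(x, q_weight, scale, lora_a, lora_b):
    # Frozen base: ternary {-1,0,1} matmul, rescaled by the per-layer scale.
    base = [scale * sum(q * xi for q, xi in zip(row, x)) for row in q_weight]
    # Trainable adapter: x -> A (rank x d_in) -> B (d_out x rank).
    hidden = [sum(a * xi for a, xi in zip(row, x)) for row in lora_a]
    delta = [sum(b * h for b, h in zip(row, hidden)) for row in lora_b]
    return [base_i + d for base_i, d in zip(base, delta)]

y = bitnet_lora_forward(
    x=[1.0, 2.0],
    q_weight=[[1, -1], [0, 1]], scale=0.5,        # frozen ternary weights
    lora_a=[[0.1, 0.0]], lora_b=[[1.0], [0.0]],   # rank-1 adapter
)
```

Because the base weights stay quantized and frozen, on-device fine-tuning only needs gradients and optimizer state for the adapter, which is what makes training feasible on phone-class memory budgets.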

GitHub repositories:

github.com/tetherto/qvac-… : general QVAC Fabric codebase

github.com/tetherto/qvac-… : QVAC Fabric's BitNet knowledge base, architecture docs, and pre-built binaries

What does it mean?

What used to require dedicated GPUs now runs on consumer hardware.

This breakthrough is the first real-world signal of a local private AI that can truly serve the people.

And this is just the beginning.

In the coming months and years, Tether will continue to invest significant resources and capital into researching and developing open-source intelligence that can scale and evolve on local devices, providing maximum utility and privacy to its users.

The era of Stable Intelligence has just begun.

Free as in freedom.

Published: 2026-03-17 21:59:58 · Indexed: 2026-03-18 00:00:42
