Real Lobster Farming! 3 Steps to Let Your Lobster Evolve While It Chats: Reinforcement Learning with No GPU and No Dataset ===================================
量子位 (QbitAI) @闻乐 (Wen Le)
One Sentence Summary
MetaClaw is an online reinforcement learning system designed for AI Agents that enables automatic skill evolution without local computing power through the SkillRL framework and cloud training.
Summary
This article introduces the MetaClaw project, led by UNC Assistant Professor Huaxiu Yao, which aims to lower the barrier to continuous learning for AI Agents. MetaClaw intercepts daily conversations between users and agents, converts them into training data, and optimizes them in the background using online reinforcement learning. At its core is the SkillRL framework, which combines 'skill injection' and 'skill evolution' mechanisms, allowing AI to learn from mistakes and automatically generate new skills. Additionally, the system offloads LoRA training tasks to the Tinker cloud platform, decoupling training from deployment and enabling developers to achieve agent self-iteration without local GPUs.
Main Points
* 1. The core mechanism is the self-developed SkillRL (Skill-enhanced Reinforcement Learning) framework.
The framework achieves immediate performance optimization through 'skill injection' and uses a 'skill evolution' mechanism to let the AI analyze causes from failed interaction trajectories, automatically generating and storing new skills in a library for continuous capability growth.
* 2. An asynchronous architecture and dual-learning mode are used to achieve complete decoupling of training and serving.
The system intercepts interactions in the background for reward modeling and training without affecting real-time frontend responses. Users can choose between lightweight reinforcement learning based on implicit feedback or online policy distillation based on high-quality text feedback.
* 3. Local computing power barriers are eliminated through a cloud platform, enabling low-cost continuous learning.
MetaClaw offloads complex LoRA training tasks to the Tinker cloud platform. Developers can complete model hot-swapping via simple SDK calls, eliminating the need to maintain expensive GPU clusters or professional engineering teams.
Metadata
AI Score
78
Website qbitai.com
Published At Today
Length 1403 words (about 6 min)
> Wen Le (闻乐), reporting from Aofeisi
> 量子位 (QbitAI) | WeChat official account: QbitAI
Having OpenClaw do your work is no longer enough. Now programmers are finding ways to make the lobster grow stronger on its own.

Note: this isn't a single-point improvement on one task. This time, someone has wrapped the entire agent in an online reinforcement learning system called MetaClaw.

No GPU cluster to maintain, no dataset, no manual fine-tuning: the AI gets smarter simply by chatting with you.

This new learning mode turns the everyday conversations between user and AI directly into training data, and the whole learning loop runs in the background without interrupting normal use.

You chat with the AI as usual, while MetaClaw quietly intercepts OpenClaw's interactions, scores each dialogue round, and optimizes the AI's decision policy through online fine-tuning.

It also learns from its mistakes: if one of the AI's responses flops, MetaClaw automatically digs through the full interaction trajectory, analyzes what went wrong, and generates a new skill to store in its skill library.

The next time a similar pitfall comes up, the relevant skill is retrieved and injected into the system prompt, and that class of mistake is gone for good.

The model is built on Kimi-2.5 as its base, with Qwen3-4B offered as a lightweight alternative that can run on low-end hardware.
The core mechanism is the self-developed SkillRL (skill-enhanced reinforcement learning) framework. In plain terms, it is a one-two combination of skill injection plus skill evolution.

* Skill injection: relevant skill instructions are precisely matched in every dialogue round, so the AI improves its performance on the spot without waiting for training to finish;
* Skill evolution: the AI goes from passively receiving instructions to actively generating skills; the skill library grows richer with use, and capability rises along with it.
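The article does not specify how skills are matched or stored. As a minimal sketch of what per-round skill injection might look like, assuming a simple keyword-overlap retriever and a hypothetical `Skill`/`SkillLibrary` schema (none of these names come from the project):

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """One entry in the skill library (hypothetical schema)."""
    name: str
    trigger_keywords: set
    instruction: str

@dataclass
class SkillLibrary:
    skills: list = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)

    def retrieve(self, user_message: str, top_k: int = 2) -> list:
        # Score each skill by keyword overlap with the incoming message,
        # then return the top-k matches with a nonzero score.
        words = set(user_message.lower().split())
        scored = [(len(s.trigger_keywords & words), s) for s in self.skills]
        scored = [(n, s) for n, s in scored if n > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [s for _, s in scored[:top_k]]

def inject(system_prompt: str, skills: list) -> str:
    # Append the matched skill instructions to this round's system prompt.
    if not skills:
        return system_prompt
    block = "\n".join(f"- {s.name}: {s.instruction}" for s in skills)
    return f"{system_prompt}\n\nRelevant skills:\n{block}"

lib = SkillLibrary()
lib.add(Skill("git-rebase", {"rebase", "git"},
              "Prefer interactive rebase; never force-push shared branches."))
prompt = inject("You are a coding agent.",
                lib.retrieve("how do I rebase my git branch?"))
```

A real system would likely use embedding similarity rather than keyword overlap, but the shape of the loop (retrieve by relevance, inject into the prompt before the model call) stays the same.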
The most attractive part, though, is the design choice of not depending on a local GPU cluster and not having to maintain one yourself.

MetaClaw offloads all training tasks to the Tinker cloud platform, completely separating training from deployment.

As long as your device can get online, you can run the whole system, with no compute to worry about and no dedicated engineering team needed to keep it alive.

This drops the barrier to continuous AI learning to the floor: even ordinary people can now raise a lobster that evolves.
Beyond that, MetaClaw's detail work shows a real understanding of developer pain points. An asynchronous architecture plus a dual learning mode fully decouples serving, reward modeling, and training: the AI responds to users in real time while scoring and optimization run in the background, so "working" and "studying" never get in each other's way.

There is also a choice of learning modes. For a lightweight setup, use reinforcement learning that optimizes from implicit user feedback; for deeper improvement, use online policy distillation combined with high-quality text feedback.
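A minimal sketch of that serving/training decoupling, using a thread and a queue as stand-ins for MetaClaw's actual background pipeline (the reward logic, function names, and echo model here are invented purely for illustration):

```python
import queue
import threading

# The request path only enqueues the finished interaction; a background
# worker scores it and accumulates training samples, so user-facing
# latency is unaffected by reward modeling or training.
interactions: queue.Queue = queue.Queue()
training_samples: list = []

def handle_request(user_msg: str) -> str:
    reply = f"echo: {user_msg}"                           # stand-in for the real model call
    interactions.put({"user": user_msg, "reply": reply})  # non-blocking hand-off
    return reply                                          # respond immediately

def reward_worker() -> None:
    while True:
        item = interactions.get()
        if item is None:        # shutdown sentinel
            break
        # Stand-in reward model: a real system would score the full round.
        item["reward"] = 1.0 if item["reply"] else 0.0
        training_samples.append(item)

worker = threading.Thread(target=reward_worker, daemon=True)
worker.start()
for msg in ["hi", "train yourself"]:
    handle_request(msg)
interactions.put(None)   # flush and stop the worker
worker.join()
```

The key property is that `handle_request` never waits on scoring or training; the queue is the only coupling between the two halves.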
In short: train it however you want.
Getting started is dead simple: just three steps. Step one, install the dependencies. The first group is the usual service and LLM libraries, needed for running the API, sending requests, and connecting to the model.

The tinker and tinker-cookbook packages that follow are the key piece: they are the SDK for cloud-side LoRA training.
```
pip install fastapi uvicorn httpx openai transformers
pip install tinker tinker-cookbook
```

Step two, run the configuration script to point the OpenClaw gateway at MetaClaw's proxy; the recommended model is Kimi-2.5.
```
bash openclaw_model_kimi.sh
```

Step three, set the Tinker API key and run the training script directly.
```
export TINKER_API_KEY="xxx"
cd /path/to/metaclaw
python examples/run_conversation_rl.py
```
Done. From then on, just chat with the agent as usual; MetaClaw automatically collects dialogue rounds, scores them, and trains the model.

Each time enough samples accumulate into a batch, the weights are hot-swapped, with no human intervention at any point.
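As a rough illustration of this batch-then-swap loop (the batch size, the names, and the version-bump stand-in for weight hot-swapping are all assumptions, not MetaClaw's real API):

```python
# Scored rounds accumulate in a buffer; once a full batch is ready, a
# training step runs and the serving weights are "hot-swapped" (modeled
# here as a simple version bump).
BATCH_SIZE = 4
buffer: list = []
model_version = 0

def train_step(batch: list) -> int:
    # Stand-in for a cloud-side LoRA update followed by an adapter swap.
    global model_version
    model_version += 1
    return model_version

def record_round(user: str, reply: str, reward: float) -> None:
    buffer.append({"user": user, "reply": reply, "reward": reward})
    if len(buffer) >= BATCH_SIZE:
        batch, buffer[:] = buffer[:], []   # drain the buffer atomically
        train_step(batch)                  # then swap in the new weights

for i in range(10):
    record_round(f"q{i}", f"a{i}", 1.0)
```

After ten rounds with a batch size of four, two training steps have fired and two unscheduled samples remain waiting for the next batch.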
To enable skill injection, just set the following in the configuration:

```
config = MetaClawConfig(use_skills=True)
```
To turn on skill evolution, set (using GPT-5.2 as an example):

```
config = MetaClawConfig(
    use_skills=True,
    enable_skill_evolution=True,
    azure_openai_deployment="gpt-5.2",
)
```
Then configure the keys:

```
export AZURE_OPENAI_API_KEY="xxx"
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
```
All configuration options are centralized in MetaClawConfig, including model selection, LoRA parameters, batch size, training steps, loss function type, and more, all in one place.
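Only `use_skills`, `enable_skill_evolution`, and `azure_openai_deployment` appear in the article's snippets; the rest of this dataclass is a speculative reconstruction of what such a centralized config might contain, with invented field names and defaults:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative guess at a centralized config object; field names other than
# the three shown in the article are NOT the project's actual API.
@dataclass
class MetaClawConfig:
    base_model: str = "Kimi-2.5"      # or the lightweight "Qwen3-4B"
    lora_rank: int = 16               # LoRA adapter parameters
    lora_alpha: int = 32
    batch_size: int = 8               # scored rounds per training batch
    train_steps: int = 100
    loss_type: str = "rl"             # e.g. RL vs. distillation objective
    use_skills: bool = False
    enable_skill_evolution: bool = False
    azure_openai_deployment: Optional[str] = None

cfg = MetaClawConfig(use_skills=True, enable_skill_evolution=True)
```

Keeping every knob in one dataclass makes it easy to diff configurations across runs and to pass a single object through the serving and training halves of the system.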
Well, well, well. Now this really is lobster farming (doge).
The MetaClaw project is led by Huaxiu Yao (姚骅修), an alumnus of the University of Electronic Science and Technology of China and now an assistant professor in the Computer Science department at UNC. He was previously a postdoc at the Stanford AI Lab, focusing on agents and embodied AI.
Project page: https://github.com/aiming-lab/MetaClaw

Reference links:
[1]https://x.com/BoWang87/status/2031094971630235941
[2]https://x.com/HuaxiuYaoML/status/2031069599651729905
(End)
Key Quotes
* This new learning model turns daily conversations between users and AI directly into training data, with the entire learning loop completed in the background.
* The core mechanism is the self-developed SkillRL framework: simply put, it's a combination of skill injection and skill evolution.
* MetaClaw offloads all training tasks to the Tinker cloud platform, completely separating training from deployment.
* This move lowers the barrier to continuous AI learning to the floor, allowing even ordinary people to raise lobsters that can evolve.
Tags
Reinforcement Learning
AI Agent
MetaClaw
Online Fine-tuning
SkillRL