Generative-Media-Skills: An Open-Source Multimodal Toolkit That Gives AI Agents Professional Audio and Video Generation

📅 2026-03-14 15:30 · GitHubDaily · Artificial Intelligence · 4 min read · 3948 characters · Score: 82
Tags: Generative-Media-Skills, AI Agent, MCP Protocol, Multimodal, Open-source Project

### GitHubDaily

@GitHub_Daily

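When coding with Cursor or Claude, generating a demo image or processing a video stream usually means switching to another tool.

To solve this, I found Generative-Media-Skills on GitHub, an open-source multimodal toolkit.

It is a high-performance command-line solution designed for AI agents that brings professional audio and video generation directly into the local development environment.

It connects to more than a hundred mainstream AI models, including Midjourney, Flux, Kling, and Veo3, so image editing and video generation can be done from the keyboard.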

GitHub: github.com/SamurAIGPT/Gen…

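It also ships with a library of expert skills that packages professional logic, such as film-director camera movement, UI design, and vector logo drawing, into ready-to-use scripts.

Local files can be uploaded and processed in one step, and once generation finishes the system player can be launched automatically for a preview.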

It supports the MCP server protocol: running a single command exposes 19 structured media tools to Claude Desktop.
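As a rough illustration of what registering an MCP server with Claude Desktop involves, the sketch below builds an entry under the `mcpServers` key, which is the structure Claude Desktop reads from `claude_desktop_config.json`. The tweet does not state the project's actual launch command, so the server name, command, and arguments here are hypothetical placeholders, not the toolkit's real values.

```python
import json

# Hypothetical placeholders: the source does not give the real command,
# so substitute whatever the project's README specifies.
SERVER_NAME = "generative-media-skills"
SERVER_COMMAND = "your-mcp-server-command"
SERVER_ARGS = []


def add_mcp_server(config, name, command, args):
    """Return a copy of a Claude Desktop config with one MCP server entry added."""
    updated = dict(config)
    # Copy the existing server map so the caller's dict is not mutated.
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    config = add_mcp_server({}, SERVER_NAME, SERVER_COMMAND, SERVER_ARGS)
    # Print the JSON that would be merged into claude_desktop_config.json.
    print(json.dumps(config, indent=2))
```

Existing entries under `mcpServers` are preserved, so the helper can be applied to a config that already lists other servers.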

It is a good fit for anyone who wants a local AI assistant with professional-grade image and audio processing capabilities.


Mar 14, 2026, 7:30 AM

0 Replies

0 Retweets

18 Likes

2,716 Views

One Sentence Summary

Generative-Media-Skills is an open-source multimodal toolkit supporting the MCP protocol, designed to equip AI assistants like Cursor and Claude with professional image and video generation and processing capabilities.

Summary

This tweet introduces Generative-Media-Skills, an open-source project aimed at developers who, while working in Cursor or Claude, have to keep switching tools to handle media files. The toolkit connects to more than a hundred mainstream AI models, including Midjourney, Flux, Kling, and Veo3, and is offered as a high-performance command-line solution. Key highlights include: support for the MCP server protocol (exposing 19 structured media tools to Claude Desktop), a built-in expert skill library (packaging professional logic such as cinematic camera movement, UI design, and vector logo drawing), and local file processing with automatic preview. Together these give developers local infrastructure for building multimodal AI agents.

AI Score: 82

Influence Score: 8

Published At: Today

Language: Chinese



Published: 2026-03-14 15:30:10 · Collected: 2026-03-14 18:00:48
