Generative-Media-Skills: An Open-Source Multimodal Toolkit for Integrating Professional Audio and Video Generation into AI Agents
=================================================================================================================================

### GitHubDaily
@GitHub_Daily
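When coding with Cursor or Claude, I still have to switch to other tools whenever I need to generate a demo image or process a video stream.
To solve this, I found Generative-Media-Skills on GitHub, an open-source multimodal toolkit.
It is a high-performance command-line solution designed for AI Agents that brings professional audio and video generation directly into the local development environment.
It connects to more than a hundred mainstream AI models, including Midjourney, Flux, Kling, and Veo3, so image editing and video generation can be done from the keyboard.
GitHub: github.com/SamurAIGPT/Gen…
It also ships with a library of expert skills that packages professional logic, such as film-director camera movement, UI design, and vector logo drawing, into ready-to-use scripts.
It supports one-step upload and processing of local files, and once generation finishes it can automatically open the system player for a preview.
It supports the MCP server protocol: running a single command exposes 19 structured media tools to Claude Desktop.
A good fit for anyone who wants their local AI assistant to have professional-grade image and audio processing capabilities.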
Mar 14, 2026, 7:30 AM
0 Replies
0 Retweets
18 Likes
2,716 Views
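The tweet mentions that finished output can be previewed automatically in the system player. How the toolkit does this is not stated; the following is only a minimal sketch of the general technique, assuming the usual per-platform launchers (`start` on Windows, `open` on macOS, `xdg-open` on Linux desktops). The function names `preview_command` and `preview` are hypothetical and not taken from the project.

```python
import subprocess
import sys


def preview_command(path: str, platform: str = sys.platform) -> list[str]:
    """Return the command that opens `path` with the OS default application.

    Hypothetical helper for illustration; not part of Generative-Media-Skills.
    """
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        # macOS ships the `open` launcher.
        return ["open", path]
    # Most Linux desktops provide `xdg-open`.
    return ["xdg-open", path]


def preview(path: str) -> None:
    """Launch the default player without blocking the caller."""
    subprocess.Popen(preview_command(path))
```

Keeping the command construction separate from the launch makes the platform logic easy to check without actually opening a player.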
One Sentence Summary
Generative-Media-Skills is an open-source multimodal toolkit supporting the MCP protocol, designed to equip AI assistants like Cursor and Claude with professional image and video generation and processing capabilities.
Summary
This tweet introduces Generative-Media-Skills, an open-source project that addresses a common friction point: developers working in Cursor or Claude have to switch tools whenever they need to generate or process media. The toolkit is a high-performance command-line solution that connects to more than a hundred mainstream AI models, including Midjourney, Flux, Kling, and Veo3. Key highlights include: support for the MCP server protocol (a single command exposes 19 structured media tools to Claude Desktop), a built-in expert skill library (packaging professional logic such as cinematic camera movement, UI design, and vector logo drawing as ready-to-use scripts), and local file upload and processing with automatic preview in the system player. Together, these give locally run multimodal AI Agents a media generation and processing layer.
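For readers unfamiliar with how an MCP server is registered with Claude Desktop, the sketch below builds the kind of `mcpServers` entry that Claude Desktop reads from its `claude_desktop_config.json` file. The tweet does not give the actual launch command, so the server name, command, and argument here are placeholders that must be replaced with the values from the project's own documentation.

```python
import json


def mcp_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Build a Claude Desktop config fragment registering one MCP server."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Placeholder values: the real command is not given in the tweet.
config = mcp_server_entry(
    name="generative-media",        # hypothetical server label
    command="your-mcp-command",     # placeholder, not the project's CLI
    args=["--placeholder-arg"],     # placeholder argument
)

# Print the fragment so it can be merged into claude_desktop_config.json.
print(json.dumps(config, indent=2))
```

Once such an entry is in place and Claude Desktop is restarted, the tools the server advertises (19, according to the tweet) should appear in the client.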
AI Score
82
Influence Score 8
Published At Today
Language
Chinese
Tags
Generative-Media-Skills
AI Agent
MCP Protocol
Multimodal
Open-source Project