Multimodal Agents
AI agents that process text, images, audio, and video · 35 agents
by Meta AI
Meta's open-source multimodal models. Llama 4 Scout (17B active, 16 experts, 10M context) and Maverick (17B active, 128 experts, 1M context). Released April ...
by Google DeepMind
Google DeepMind's multimodal AI assistant. Gemini 2.5 Pro with native thinking, 1M token context, and tight integration across Google Workspace, Android, and Search.
by Hugging Face
Hugging Face's open-source chat UI for any model. Access Llama 4, DeepSeek, Mistral, Gemma, and 100+ open-weight models. Free, no API key required.
by Alibaba Cloud / Tongyi
Alibaba's flagship open-source LLM. 235B MoE (22B active). Multilingual, strong on coding and math. Qwen3-Coder variant matches Claude Code on HumanEval.
by uditgoenka
Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat ...
by AlexAnys
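🇨🇳 Chinese guide to OpenClaw AI agent use cases and examples | 46 real-world scenarios | China-specific cases plus overseas cases adapted for China | Office automation · content creation · operations · AI assistants · knowledge management | Beginner-friendly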
by upstash
Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors
by CopilotKit
The Frontend Stack for Agents & Generative UI. React + Angular. Makers of the AG-UI Protocol
by covibes
Your autonomous engineering team in a CLI. Point Zeroshot at an issue, walk away, and return to production-grade code. Supports Claude Code, OpenAI Codex, Op...
by qingchencloud
🦞 OpenClaw visual management panel with a built-in AI assistant (tool calling + image recognition + multimodal + i18n(11)), one-click install
by UfoMiao
Zero-Config Code Flow for Claude Code & Codex
by iOfficeAI
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you l...
by nexu-io
The simplest desktop client for OpenClaw 🦞 — bridge your Agent to WeChat, Feishu, Slack & Discord in one click. Works with Claude Code, Codex & any LLM. BYO...
by n8n-io
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
by microsoft
A programming framework for agentic AI
by lintsinghua
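《御舆: Decoding the Agent Harness》, a 420,000-character breakdown of the harness skeleton and nervous system of AI agents. An in-depth analysis of the Claude Code architecture, in 15 chapters running from the conversation loop to building your own Agent Harness. Read online: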
by OpenAI
OpenAI's second-generation video model. Cinema-quality 1080p video up to 60 seconds from text, image, or video. Physics simulation, precise camera control.
by Julius AI
AI data analyst. Upload CSV, Excel, SQL databases — get visualizations, insights, and statistical analysis in plain English. No coding required.
by Obviously AI
No-code predictive AI. Connect any data source, train an ML model in 2 minutes, and get predictions on churn, sales, fraud, and pricing without any code.
by openai
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K ou...
by can1357
⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
Qwen2.5 Coder 7B Instruct, a text-generation model on Hugging Face. 2,522,695 downloads, 678 likes.
by openai
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reaso...
by qwen
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-ex...
by qwen
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mix...
by qwen
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9...
by adongwanai
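https://adongwanai.github.io/AgentGuide | AI agent development guide | LangGraph in practice | Advanced RAG | Career switch to LLMs | LLM interviews | Algorithm engineer | Interview question bank | Reinforcement learning | Data synthesis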
by anthropic
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative...
by qwen
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-expe...
by qwen
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-e...
by AntonOsika
CLI platform to experiment with codegen. Precursor to: https://lovable.dev
by laihenyi
Convert NotebookLM PDFs to PPTX with separated background images and editable text layers using Gemini AI
by qwen
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed ...
by kwaipilot
KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS inte...
by wanshuiyin
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment auto...