A self-contained multimodal AI agents lab built using MongoDB, Gemini and LangGraph.
☆66 · Sep 23, 2025 · Updated 5 months ago
Alternatives and similar repositories for multimodal-agents-lab
Users who are interested in multimodal-agents-lab are comparing it to the repositories listed below
- Building a multi-agent RAG system with advanced RAG methods ☆12 · Jan 12, 2025 · Updated last year
- A simple exam generator and grader written in Python with OpenCV ☆14 · Jan 14, 2026 · Updated last month
- Surrogate Modeling of the Aerodynamic Performance for Transonic Regime ☆13 · Feb 12, 2024 · Updated 2 years ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection ☆25 · May 31, 2025 · Updated 9 months ago
- Mouse PET-CT image segmentation ☆12 · Feb 19, 2023 · Updated 3 years ago
- Agent PR Replay takes merged PRs from any repository, reverse-engineers the task prompt, runs Claude Code against it, and compares what t… ☆30 · Jan 1, 2026 · Updated 2 months ago
- A browser-based CadQuery server ☆12 · Feb 18, 2025 · Updated last year
- Generate a 3D BIM Model from 2D CAD Drawings ☆12 · Nov 23, 2022 · Updated 3 years ago
- ☆13 · Feb 5, 2025 · Updated last year
- ☆14 · Jun 3, 2024 · Updated last year
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆12 · Jan 29, 2024 · Updated 2 years ago
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval And Synthesis For SLMs ☆55 · Oct 7, 2025 · Updated 4 months ago
- Initially a fork of the GitHub repository for the paper "Informer" accepted by AAAI 2021. Heavily modified since then. ☆15 · Apr 7, 2023 · Updated 2 years ago
- Project page of "VAD v2: LLM-Like Probabilistic Modeling in End-to-End Autonomous Driving" ☆11 · Mar 8, 2024 · Updated last year
- Closed-loop evaluation for end-to-end VLM autonomous driving agent ☆25 · Mar 8, 2025 · Updated 11 months ago
- ☆25 · Jan 28, 2026 · Updated last month
- ☆19 · Aug 8, 2024 · Updated last year
- ☆25 · Jul 29, 2025 · Updated 7 months ago
- SDXL API provides a seamless interface for image generation and retrieval using Stable Diffusion XL integrated with Cloudflare AI Workers… ☆13 · Feb 29, 2024 · Updated 2 years ago
- GPT 4 Vision + TTS multimodal capabilities demo ☆17 · Nov 15, 2023 · Updated 2 years ago
- ☆18 · May 14, 2024 · Updated last year
- LinkedIn Page ☆25 · Apr 8, 2024 · Updated last year
- Save a png or jpeg and option to save prompt/workflow in a text or json file for each image in Comfy + Workflow loading ☆24 · Aug 14, 2023 · Updated 2 years ago
- The autoware diffusion planner package ☆33 · Jul 24, 2025 · Updated 7 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 8 months ago
- Large Multimodal Model ☆15 · Apr 8, 2024 · Updated last year
- CDQA: Chinese Dynamic Question Answering Benchmark ☆17 · Dec 13, 2024 · Updated last year
- ☆17 · Sep 9, 2024 · Updated last year
- Using multiple LLMs for ensemble Forecasting ☆16 · Jan 17, 2024 · Updated 2 years ago
- Creates an Azure AI Studio hub, project and required dependent resources including Azure Open AI Service, Cognitive Search and more. ☆32 · Oct 2, 2024 · Updated last year
- Ongoing research training transformer models at scale ☆18 · Jul 27, 2023 · Updated 2 years ago
- This is the official code base of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/ ☆39 · Sep 3, 2025 · Updated 5 months ago
- Using Llama-3.2 in ComfyUI ☆19 · Sep 26, 2024 · Updated last year
- FastAPI Server Implementation for Bilibili Index TTS ☆25 · Apr 13, 2025 · Updated 10 months ago
- ComfyUI custom nodes to apply various image processing techniques ☆24 · Mar 30, 2025 · Updated 11 months ago
- Python Coding Agent is a Crew AI Agent built using OpenAI's GPT-3.5 and Groq, designed to help with coding tasks such as generating code… ☆23 · Jun 8, 2024 · Updated last year
- ☆23 · Jan 8, 2024 · Updated 2 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- ☆19 · Dec 6, 2023 · Updated 2 years ago