om-ai-lab / OmAgent
A Streamlined Multimodal Agent Framework for Smart Hardware and More
☆1,116 · Updated this week
Related projects
Alternatives and complementary repositories for OmAgent
- Real-time and accurate open-vocabulary end-to-end object detection ☆1,525 · Updated 2 months ago
- The code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts" ☆768 · Updated 2 months ago
- [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,260 · Updated last month
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. ☆1,498 · Updated this week
- LLM-And-More is a professional, plug-and-play LLM trainer and application builder that guides you through the complete LLM workflow from… ☆457 · Updated 4 months ago
- Multilingual Corpus of Web Fiction ☆215 · Updated 4 months ago
- Next-Generation Interactive Intelligent Programming Assistant ☆1,045 · Updated 3 weeks ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆132 · Updated 2 months ago
- This is the official reproduction of FancyVideo. ☆785 · Updated last week
- The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://a… ☆353 · Updated last month
- The easiest and laziest way to build multi-agent LLM applications. ☆1,016 · Updated this week
- [CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models ☆1,821 · Updated this week
- An MBTI Exploration of Large Language Models ☆471 · Updated 9 months ago
- Code for the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators" ☆225 · Updated 3 months ago
- PyTorch implementation of AudioLCM (ACM-MM'24): efficient, high-quality text-to-audio generation with a latent consistency model. ☆1,129 · Updated 2 weeks ago
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models ☆639 · Updated last month
- [ICLR 2024] Official codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists" ☆520 · Updated 6 months ago
- Unofficial implementation of ReplaceAnything: https://aigcdesigngroup.github.io/replace-anything/ ☆528 · Updated 5 months ago
- Improve Llama-2's proficiency in comprehension, generation, and translation of Chinese. ☆531 · Updated 7 months ago
- ⭐ Dynamically generate stats SVGs from your GitHub, LeetCode, Steam, and more in #Cyberpunk style :) ☆924 · Updated this week
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆269 · Updated 6 months ago
- The "virtual_human_stream" project is a real-time digital human system supporting audio-video dialogue. It integrates models like ernerf,… ☆583 · Updated last week
- A tutorial based on MetaGPT to quickly help you understand the concepts of agents and multi-agent systems and get started with coding development. Based… ☆1,357 · Updated 5 months ago
- One-stop data intelligence agent, providing insights from all mainstream data formats in a single dialogue box, including documents, data… ☆439 · Updated this week
- csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and also run model inference, fine-tuning, and App… ☆509 · Updated this week
- Tiny3D is a next-generation 3D AI service production system. ☆601 · Updated last year
- Accelerate your Stable Diffusion inference with the library's universal C/C++ framework design, powered by ONNXRuntime & across platforms… ☆629 · Updated 2 months ago
- The official repository of the paper "(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long… ☆578 · Updated 4 months ago
- An AI agent powered by LLMs that streamlines the entire process of data analysis. 🚀 ☆349 · Updated 3 months ago
- A powerful baseline for image classification, face recognition and image retrieval with PyTorch ☆547 · Updated this week