XinyuSun / mlc-chatbot
Python interface for the MLC Chat CLI
☆15 Updated last year
Related projects
Alternatives and complementary repositories for mlc-chatbot
- Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters ☆85 Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models ☆77 Updated 9 months ago
- ☆83 Updated last year
- ☆131 Updated 10 months ago
- Official Code of IdealGPT ☆32 Updated last year
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆65 Updated 4 months ago
- The official repository of "Video assistant towards large language model makes everything easy" ☆210 Updated 8 months ago
- ☆45 Updated last year
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆233 Updated last week
- A curated list of the papers, repositories, tutorials, and anything related to the large language models for tools ☆64 Updated last year
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆102 Updated last year
- Open LLaMA Eyes to See the World ☆175 Updated last year
- GPT-4V in Wonderland: LMMs as Smartphone Agents ☆128 Updated 3 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆80 Updated 3 months ago
- ☆65 Updated last year
- paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/ ☆229 Updated last year
- Towards Large Multimodal Models as Visual Foundation Agents ☆114 Updated 2 weeks ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆264 Updated last week
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆255 Updated 8 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆125 Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆98 Updated 3 weeks ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆188 Updated 2 months ago
- Official GitHub repo of G-LLaVA ☆121 Updated 5 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆196 Updated 3 months ago
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆164 Updated 2 months ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆243 Updated 6 months ago
- The model, data and code for the visual GUI Agent SeeClick ☆219 Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆137 Updated this week
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training ☆164 Updated last year
- SVIT: Scaling up Visual Instruction Tuning ☆163 Updated 4 months ago