kyegomez / AnyMAL
The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model"
☆22Updated last month
Alternatives and similar repositories for AnyMAL:
Users who are interested in AnyMAL are comparing it to the repositories listed below
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability☆38Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 5 months ago
- ☆13Updated 2 years ago
- Survey of small language models☆14Updated 7 months ago
- ☆35Updated last week
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆15Updated 3 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- ☆19Updated last year
- Large Multimodal Model☆14Updated 11 months ago
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model☆17Updated 10 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆40Updated 11 months ago
- ☆30Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆41Updated last week
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated 11 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆33Updated 8 months ago
- ☆65Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 8 months ago
- 🎮Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automati…☆17Updated 4 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆48Updated 2 months ago
- FuseAI Project☆83Updated last month
- This is the official repo for ByteVideoLLM/Dynamic-VLM☆20Updated 2 months ago
- Implementation of PaLM2-VAdapter from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Updated 3 months ago
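- [WIP@Oct 13] 质衡 benchmark (Q-Bench in Chinese), including Chinese-language low-level visual question answering and low-level visual description datasets, as well as image quality assessment under Chinese prompts. We will release Q-Bench in more languages in the futu…☆20Updated last year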
- Touchstone: Evaluating Vision-Language Models by Language Models☆82Updated last year
- The official repo of continuous speculative decoding☆24Updated 3 months ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆14Updated 2 years ago
- ☆28Updated last year
- ☆66Updated 2 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Updated 11 months ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆18Updated 2 years ago