MetabrainAGI / Awaker

☆29

Alternatives and similar repositories for Awaker:

Users who are interested in Awaker are comparing it to the repositories listed below.

SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆40 Updated 7 months ago
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆49 Updated 3 weeks ago
shulin16 / MMInA
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
☆41 Updated last week
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28 Updated 10 months ago
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆41 Updated 6 months ago
dongyh20 / Insight-V
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆131 Updated 2 months ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆33 Updated 7 months ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆89 Updated last month
FudanNLPLAB / MouSi
☆73 Updated 11 months ago
beichenzbc / BoostStep
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆28 Updated 3 weeks ago
Vchitect / LiteGen
A lightweight, highly efficient training framework for accelerating diffusion tasks.
☆46 Updated 5 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆41 Updated last month
xmu-xiaoma666 / Multimodal-Open-O1
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆29 Updated 4 months ago
mengcye / LAION-SG
☆47 Updated 2 months ago
measure-infinity / mulan-code
☆40 Updated 7 months ago
will-singularity / Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆23 Updated last year
hyc2026 / StoryTeller
☆68 Updated 2 months ago
si0wang / VisVM
☆33 Updated last month
SihuiJi / FashionComposer
☆21 Updated last month
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆74 Updated 3 months ago
mini-sora / MiniSora-DiT
minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora
☆40 Updated 10 months ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆114 Updated 3 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆58 Updated 3 months ago
Gabesarch / ICAL
☆32 Updated 3 weeks ago
pixeli99 / MixLN
Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi…
☆16 Updated last month
viiika / HumanEdit
Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
☆22 Updated 2 months ago
Owen718 / LongPrompt-LLamaGen
This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…
☆30 Updated 3 months ago
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆67 Updated 2 months ago
Roblox / SmoothCache
Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT) based GenAI models with error-guided caching.
☆38 Updated 3 weeks ago