nengelmann / Fuyu-8B---ExplorationLinks

Exploration of the multi modal fuyu-8b model of Adept. 🤓 🔍

☆27

Alternatives and similar repositories for Fuyu-8B---Exploration

Users that are interested in Fuyu-8B---Exploration are comparing it to the libraries listed below

Sorting:

lucasjinreal / ImageTokenizer
imagetokenizer is a python package, helps you encoder visuals and generate visuals token ids from codebook, supports both image and video…
☆37Updated last year
360CVGroup / 360VL
Our 2nd-gen LMM
☆34Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43Updated last year
xhan77 / in-context-alignment
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
☆35Updated 2 years ago
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆92Updated last year
FudanNLPLAB / MouSi
☆75Updated last year
MonolithFoundation / Bumblebee
A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.
☆38Updated last year
XieZilongAI / E2E-AFG
An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation
☆15Updated last year
kyegomez / TinyGPTV
Simple Implementation of TinyGPTV in super simple Zeta lego blocks
☆15Updated last year
RhapsodyAILab / MiniCPM-V-Embedding
☆29Updated last year
xverse-ai / XVERSE-MoE-A36B
XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.
☆38Updated last year
saxenarohit / MovieSum
☆15Updated last year
princeton-nlp / ELIZA-Transformer
[NAACL 2025] Representing Rule-based Chatbots with Transformers
☆22Updated 9 months ago
neulab / MultiUI
Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding
☆53Updated 11 months ago
huizhang0110 / catvision
A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…
☆14Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆47Updated 8 months ago
vedaldi / micro_llama
A tiny, didactical implementation of LLAMA 3
☆42Updated 11 months ago
Letian2003 / MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…
☆32Updated 5 months ago
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96Updated 11 months ago
roboflow / cvevals
Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…
☆37Updated 2 years ago
Ucas-HaoranWei / Vary-family
☆57Updated last year
DCDmllm / HyperLLaVA
Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28Updated last year
LAION-AI / riverbed
Tools for content datamining and NLP at scale
☆44Updated last year
Tencent-Hunyuan / Tencent-Hunyuan-7B-0124
☆28Updated 3 months ago
opendatalab / image-downloader
☆28Updated last year
autodistill / autodistill-efficient-yolo-world
EfficientSAM + YOLO World base model for use with Autodistill.
☆10Updated last year
FreedomIntelligence / FastLLM
Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆41Updated last year
NVlabs / STL
Official Pytorch Implementation of Self-emerging Token Labeling
☆35Updated last year
will-singularity / Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model
☆22Updated 2 years ago
kq-chen / qwen-vl-utils
helper functions for processing and integrating visual language information with Qwen-VL Series Model
☆16Updated last year