pufanyi / syphus

Syphus: Automatic Instruction-Response Generation Pipeline

☆14

Alternatives and similar repositories for syphus

Users who are interested in syphus are comparing it to the libraries listed below.

EvolvingLMMs-Lab / VideoMMMU
☆49 Updated 2 months ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆35 Updated 4 months ago
ypwang61 / StoryEval
[CVPR2025] A benchmark for evaluating video generative models in generating short stories
☆15 Updated last month
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆75 Updated 3 months ago
Luodian / GenBench
Benchmarking and Analyzing Generative Data for Visual Recognition
☆26 Updated last year
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆45 Updated last year
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆29 Updated 3 months ago
penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆15 Updated last month
zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆22 Updated 10 months ago
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆39 Updated 3 weeks ago
Share14 / ShareGemini
☆30 Updated 11 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆37 Updated last year
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆18 Updated 4 months ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 Updated 9 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆52 Updated 3 weeks ago
drx-code / EquivariantModeling
Official PyTorch implementation of the paper "Equivariant Image Modeling" (https://arxiv.org/abs/2503.18948)
☆33 Updated 2 months ago
yangjie-cv / WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆24 Updated 2 weeks ago
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆70 Updated last week
chenshuang-zhang / imagenet_d
[CVPR 2024 Highlight] ImageNet-D
☆43 Updated 8 months ago
TencentARC / MindOmni
☆66 Updated last week
neu-vi / FleVRS
FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024
☆21 Updated 6 months ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56 Updated last year
Pepper-lll / LMforImageGeneration
Codebase for the paper "Elucidating the design space of language models for image generation"
☆45 Updated 7 months ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆64 Updated last month
pipilurj / bootstrapped-preference-optimization-BPO
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆55 Updated 10 months ago
ziplab / SN-Netv2
[ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".
☆27 Updated last year
sled-group / moh
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
☆29 Updated 7 months ago
si0wang / VisVM
☆44 Updated 5 months ago
mshukor / ima-lmms
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆19 Updated 8 months ago
Gen-Verse / HermesFlow
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆63 Updated 4 months ago