pufanyi / syphus
Syphus: Automatic Instruction-Response Generation Pipeline
☆14 · Updated last year
Alternatives and similar repositories for syphus:
Users who are interested in syphus are comparing it to the libraries listed below.
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆18 · Updated last week
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Updated last year
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆30 · Updated 6 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆57 · Updated last month
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆28 · Updated 6 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆40 · Updated last month
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) ☆31 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆29 · Updated 2 weeks ago
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆26 · Updated last month
- Official Repository of Personalized Visual Instruct Tuning ☆26 · Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators ☆23 · Updated this week
- Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ☆21 · Updated last month
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆31 · Updated 2 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆33 · Updated this week
- Diffusion Powers Video Tokenizer for Comprehension and Generation ☆32 · Updated last week
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆54 · Updated 6 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆48 · Updated 4 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated 8 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆33 · Updated 6 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆25 · Updated last month
- (ICLR 2024, CVPR 2024) SparseFormer ☆64 · Updated last month
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆51 · Updated last month
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆32 · Updated 3 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆57 · Updated 2 months ago