FoundationVision / Liquid
Liquid: Language Models are Scalable Multi-modal Generators
☆23 · Updated this week
Alternatives and similar repositories for Liquid:
Users who are interested in Liquid are comparing it to the libraries listed below:
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" · ☆41 · Updated 2 months ago
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient · ☆73 · Updated 2 weeks ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" · ☆30 · Updated 6 months ago
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model · ☆26 · Updated 3 weeks ago
- (no description) · ☆32 · Updated last month
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models · ☆54 · Updated 6 months ago
- (no description) · ☆37 · Updated last year
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis · ☆84 · Updated 5 months ago
- Open implementation of "RandAR" · ☆37 · Updated last week
- FQGAN: Factorized Visual Tokenization and Generation · ☆36 · Updated 2 weeks ago
- (no description) · ☆58 · Updated last year
- ICCV2023-Diffusion-Papers · ☆109 · Updated last year
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models · ☆57 · Updated 4 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation · ☆32 · Updated last week
- Code for the paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM · ☆39 · Updated 2 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation · ☆78 · Updated 8 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. · ☆57 · Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" · ☆78 · Updated 2 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection · ☆29 · Updated 2 weeks ago
- VisualGPTScore for visio-linguistic reasoning · ☆26 · Updated last year
- A collection of awesome papers on the alignment of diffusion models. · ☆61 · Updated this week
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… · ☆31 · Updated 2 weeks ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" · ☆29 · Updated 2 weeks ago
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation · ☆26 · Updated last week
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" · ☆38 · Updated 6 months ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation · ☆51 · Updated 3 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. · ☆44 · Updated 2 months ago
- A collection of awesome autoregressive visual generation models · ☆51 · Updated this week