Sleepychord / cogdata
A light-weight data management system for large-scale pretraining
☆20 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for cogdata
- Official GitHub repo of G-LLaVA ☆121 · Updated 5 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆35 · Updated 7 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆48 · Updated 8 months ago
- LVBench: An Extreme Long Video Understanding Benchmark ☆59 · Updated 2 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆111 · Updated last month
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆50 · Updated 5 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆77 · Updated 9 months ago
- The official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆72 · Updated 8 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation ☆48 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆194 · Updated 8 months ago
- Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ☆17 · Updated 2 weeks ago
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B ☆20 · Updated 8 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆26 · Updated 3 weeks ago
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆48 · Updated last week
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24). ☆30 · Updated 6 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 · Updated 2 months ago
- Official repository of MMDU dataset ☆74 · Updated last month
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆77 · Updated 7 months ago
- Keras implementation of Finite Scalar Quantization ☆63 · Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆96 · Updated 3 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆197 · Updated 7 months ago
- DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆56 · Updated 3 weeks ago
- The HD-VG-130M Dataset ☆108 · Updated 7 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆45 · Updated 2 months ago