Sleepychord / cogdata
A light-weight data management system for large-scale pretraining
☆20 Updated 9 months ago
Alternatives and similar repositories for cogdata:
Users who are interested in cogdata are comparing it to the libraries listed below.
- ☆138 Updated last month
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B ☆22 Updated last year
- ☆11 Updated 7 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆44 Updated this week
- Keras implementation of Finite Scalar Quantization ☆70 Updated last year
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆78 Updated last month
- The official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆73 Updated 3 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆85 Updated 6 months ago
- Official repository of the MMDU dataset ☆86 Updated 5 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆52 Updated last year
- Source code for the EMNLP 2022 long paper "Parameter-Efficient Tuning Makes a Good Classification Head" ☆14 Updated 2 years ago
- Official GitHub repo of G-LLaVA ☆130 Updated 3 weeks ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆204 Updated last year
- Simple script to compute CLIP-based scores given a DALL-E trained model. ☆30 Updated 3 years ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆30 Updated 3 months ago
- ☆17 Updated last year
- SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training ☆177 Updated last month
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆216 Updated 11 months ago
- OFA-Compress is a unified framework which provides OFA model finetuning, distillation and inference capabilities in Huggingface version, … ☆27 Updated 2 years ago
- Official implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆68 Updated 9 months ago
- LVBench: An Extreme Long Video Understanding Benchmark ☆84 Updated 6 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆54 Updated 6 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆44 Updated 3 months ago
- ☆133 Updated last year
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pretraining dataset) ☆179 Updated last year
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google ☆49 Updated 7 months ago
- ☆141 Updated 4 months ago
- ☆100 Updated 8 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆82 Updated last year