Sleepychord / cogdata
A light-weight data management system for large-scale pretraining
☆20Updated last month
Alternatives and similar repositories for cogdata
Users who are interested in cogdata are comparing it to the repositories listed below.
- ☆11Updated 10 months ago
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆83Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆211Updated last year
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆74Updated 7 months ago
- ☆152Updated 5 months ago
- TaiSu(太素)--a large-scale Chinese multimodal dataset(亿级大规模中文视觉语言预训练数据集)☆188Updated last year
- OFA-Compress is a unified framework which provides OFA model finetuning, distillation and inference capabilities in Huggingface version, …☆28Updated 2 years ago
- Official repository of MMDU dataset☆92Updated 9 months ago
- ☆150Updated 8 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L…☆54Updated last year
- Official github repo of G-LLaVA☆143Updated 4 months ago
- ☆21Updated last year
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆227Updated last year
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- ☆133Updated last year
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆71Updated last year
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark☆92Updated 10 months ago
- The HD-VG-130M Dataset☆118Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆44Updated last year
- Documentation for the WenLan API, which is used to obtain image and text features.☆37Updated 2 years ago
- ☆17Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆47Updated 3 months ago
- An in-context conditioning version of MUSE with pre-trained checkpoints.☆112Updated 2 years ago
- RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t…☆136Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …☆102Updated 6 months ago
- WuDaoMM: a data project☆74Updated 3 years ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆55Updated 10 months ago
- ☆92Updated last year
- ☆37Updated last month