Sleepychord / cogdata
A light-weight data management system for large-scale pretraining
☆20Updated 11 months ago
Alternatives and similar repositories for cogdata:
Users who are interested in cogdata are comparing it to the libraries listed below:
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆209Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆82Updated last year
- Official repository of MMDU dataset☆89Updated 6 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆70Updated 11 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆42Updated 10 months ago
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Updated last year
- Official github repo of G-LLaVA☆137Updated 2 months ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L…☆53Updated last year
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈☆42Updated last week
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆44Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆223Updated last year
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆39Updated 4 months ago
- OFA-Compress is a unified framework which provides OFA model finetuning, distillation and inference capabilities in Huggingface version, …☆27Updated 2 years ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆47Updated last month
- An in-context conditioning version of MUSE with pre-trained checkpoints.☆111Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆54Updated 8 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆69Updated 2 months ago
- RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t …☆126Updated 10 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)☆51Updated 5 months ago
- The HD-VG-130M Dataset☆117Updated last year
- Official repo for StableLLAVA☆95Updated last year
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]☆88Updated 2 months ago