Survey on Data-centric Large Language Models
☆93Jul 8, 2024Updated last year
Alternatives and similar repositories for Data-centric_multimodal_LLM
Users that are interested in Data-centric_multimodal_LLM are comparing it to the libraries listed below.
- A list of papers about data quality in Large Language Models (LLMs)☆27Dec 14, 2023Updated 2 years ago
- This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.☆47Aug 22, 2025Updated 7 months ago
- Dataflow-MM, multi-media operators for Dataflow. We aim to prepare data for Multimodal Large Language Models.☆32Mar 10, 2026Updated 2 weeks ago
- DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizin…☆117Updated this week
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.☆25Oct 7, 2025Updated 5 months ago
- Unified Codebase for Advanced World Models.☆261Updated this week
- ☆111Sep 11, 2025Updated 6 months ago
- WisdoMentor - Series: An LLM for undergraduates | 博导智言 (helps undergraduates study)☆13May 9, 2024Updated last year
- A Chinese version of Llama3, obtained from Llama3 through further CPT, SFT, and ORPO☆16Apr 24, 2024Updated last year
- ☆14Mar 15, 2024Updated 2 years ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆89Sep 23, 2025Updated 6 months ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)☆35Jul 16, 2025Updated 8 months ago
- An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture sear…☆64Nov 11, 2025Updated 4 months ago
- [ICLR 2026] Empowering Small VLMs to Think with Dynamic Memorization and Exploration☆16Mar 18, 2026Updated last week
- Easy Data Preparation with latest LLMs-based Operators and Pipelines.☆3,084Mar 17, 2026Updated last week
- ☆15Oct 4, 2024Updated last year
- PGRAG☆52Jul 16, 2024Updated last year
- [ICLR 2025 Spotlight] The official implementation of the paper “LOKI:A Comprehensive Synthetic Data Detection Benchmark using Large Multi…☆176Feb 7, 2026Updated last month
- ☆23Jan 16, 2024Updated 2 years ago
- Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries☆35Nov 19, 2025Updated 4 months ago
- A project designed to build and render a full Minecraft crafting tree.☆10Aug 10, 2021Updated 4 years ago
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding☆36Jan 16, 2026Updated 2 months ago
- Code and data from the paper 'Human Feedback is not Gold Standard'☆20Mar 6, 2026Updated 3 weeks ago
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".☆12Dec 27, 2023Updated 2 years ago
- (CVPR 2026) Official repository for Scone (Subject-driven COmposition and DistinctioN Enhancement) model, supporting subject composition …☆28Jan 14, 2026Updated 2 months ago
- Official code for ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?"☆31Apr 4, 2024Updated last year
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision☆28May 26, 2025Updated 10 months ago
- Code for our ICML'24 paper on multimodal dataset distillation☆43Oct 11, 2024Updated last year
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning☆35Aug 28, 2025Updated 6 months ago
- Official repo for EscapeCraft (a 3D environment for room escape) and benchmark MM-Escape. This work was accepted by ICCV 2025.☆36Jul 7, 2025Updated 8 months ago
- ☆15May 30, 2025Updated 9 months ago
- ☆47Dec 30, 2024Updated last year
- Large Visual Language Model(LVLM), Large Language Model(LLM), Multimodal Large Language Model(MLLM), Alignment, Agent, AI System, Survey☆21Jul 27, 2025Updated 7 months ago
- Awesome AI tools☆12Apr 21, 2023Updated 2 years ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval☆22Jun 23, 2025Updated 9 months ago
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More☆25Feb 25, 2025Updated last year
- ☆26Jul 10, 2025Updated 8 months ago
- [CVPR2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM)☆28Oct 9, 2024Updated last year
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature.☆39Feb 13, 2025Updated last year