ZhentingWang / DUMP
☆19 Updated 3 weeks ago
Alternatives and similar repositories for DUMP
Users who are interested in DUMP are comparing it to the libraries listed below.
- ☆18 Updated 9 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆36 Updated last week
- ☆15 Updated last month
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆27 Updated 3 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆44 Updated last month
- A Sober Look at Language Model Reasoning ☆63 Updated last week
- What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆31 Updated last month
- ☆19 Updated 3 months ago
- ☆22 Updated 2 months ago
- ☆14 Updated 3 months ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 Updated 11 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated last month
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 Updated 2 weeks ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 Updated 7 months ago
- This is the official repo for Towards Uncertainty-Aware Language Agent. ☆25 Updated 9 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆85 Updated 7 months ago
- ☆32 Updated 5 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆25 Updated 6 months ago
- Exploration of automated dataset selection approaches at large scales. ☆42 Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆68 Updated last year
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆10 Updated 4 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆104 Updated 2 months ago
- ☆15 Updated last month
- Unsupervised GRPO ☆24 Updated this week
- Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆15 Updated 2 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆16 Updated last month
- ☆16 Updated 10 months ago
- ☆19 Updated 10 months ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆23 Updated 6 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆22 Updated 3 months ago