tum-ai / number-token-loss
A regression-alike loss to improve numerical reasoning in language models - ICML 2025
☆26Updated 2 months ago
Alternatives and similar repositories for number-token-loss
Users who are interested in number-token-loss are comparing it to the libraries listed below
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind☆68Updated last year
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆49Updated 6 months ago
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models☆43Updated 7 months ago
- A collection of AWESOME language modeling techniques on tabular data applications.☆32Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆78Updated 2 years ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts"☆66Updated 2 years ago
- ☆144Updated last year
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning☆32Updated last year
- Tree prompting: easy-to-use scikit-learn interface for improved prompting.☆40Updated 2 years ago
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023)☆34Updated 2 years ago
- ☆50Updated 9 months ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning☆20Updated last month
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆60Updated 11 months ago
- Model Stock: All we need is just a few fine-tuned models☆126Updated 3 months ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆19Updated 2 years ago
- Recycling diverse models☆46Updated 2 years ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆31Updated last year
- ☆29Updated 2 years ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation☆45Updated 3 weeks ago
- Exploration of automated dataset selection approaches at large scales.☆48Updated 8 months ago
- ☆15Updated 11 months ago
- Implementation of Infini-Transformer in PyTorch☆113Updated 10 months ago
- Holistic evaluation of multimodal foundation models☆47Updated last year
- Code for "Democratizing Reasoning Ability: Tailored Learning from Large Language Model", EMNLP 2023☆36Updated last year
- [EMNLP 2025] Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards☆46Updated 2 months ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆61Updated last year
- MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants☆39Updated last month
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆31Updated 2 years ago
- ☆48Updated 8 months ago
- ☆33Updated 10 months ago