tum-ai / number-token-loss
A regression-like loss to improve numerical reasoning in language models - ICML 2025
☆26 · Updated 2 months ago
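The idea behind a number token loss can be sketched in a few lines. The following is an illustrative NTL-MSE-style sketch, not the repository's actual implementation: it assumes the model exposes logits over the ten digit tokens "0".."9" at each numeric position, takes the expected digit value under the model's distribution, and penalizes its squared distance to the true digit. The function and vocabulary layout here are assumptions for illustration.

```python
import numpy as np

# Numeric value associated with each of the ten digit tokens "0".."9".
DIGIT_VALUES = np.arange(10, dtype=np.float64)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def number_token_loss(digit_logits, target_digits):
    """Mean squared error between the expected digit value under the
    model's distribution and the true digit value.

    digit_logits:  (seq_len, 10) logits over the ten digit tokens
    target_digits: (seq_len,) true digit (0-9) at each numeric position
    """
    probs = softmax(digit_logits)        # (seq_len, 10)
    expected = probs @ DIGIT_VALUES      # expected numeric value per position
    return float(np.mean((expected - target_digits) ** 2))

# A confident, correct prediction of "3" incurs near-zero loss;
# confidently predicting "8" instead is penalized by the squared
# numeric distance, roughly (8 - 3)^2 = 25.
logits = np.full((1, 10), -10.0)
logits[0, 3] = 10.0                      # nearly all mass on token "3"
loss_good = number_token_loss(logits, np.array([3]))
```

Unlike plain cross-entropy, which treats "8" and "4" as equally wrong answers for "3", this term grades mistakes by numeric distance, which is the motivation for a regression-like loss over number tokens.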
Alternatives and similar repositories for number-token-loss
Users interested in number-token-loss are comparing it to the repositories listed below.
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆49 · Updated 5 months ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆68 · Updated last year
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models ☆42 · Updated 6 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆65 · Updated 2 years ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆31 · Updated last year
- ☆25 · Updated last year
- A collection of AWESOME language modeling techniques on tabular data applications. ☆32 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆44 · Updated last week
- ☆142 · Updated last year
- Model Stock: All we need is just a few fine-tuned models ☆125 · Updated 2 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆40 · Updated 2 years ago
- ☆50 · Updated 9 months ago
- MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants ☆37 · Updated last month
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated 2 years ago
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆34 · Updated 2 years ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning ☆20 · Updated last month
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 9 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆54 · Updated 5 months ago
- ☆53 · Updated 9 months ago
- ☆50 · Updated 8 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆108 · Updated last week
- ☆28 · Updated 8 months ago
- ☆41 · Updated last year
- ☆15 · Updated 10 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆35 · Updated 6 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆39 · Updated 3 years ago
- Parameter-Efficient Fine-Tuning for Foundation Models ☆94 · Updated 6 months ago
- State Space Models ☆70 · Updated last year
- Geometric-Mean Policy Optimization ☆86 · Updated last week
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models (https://arxiv.org/pdf/2411.02433) ☆107 · Updated 10 months ago