tum-ai / number-token-loss
A regression-like loss to improve numerical reasoning in language models - ICML 2025
☆27 · Updated 4 months ago
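To make the description concrete: the regression-style idea is to treat the model's predicted distribution over digit tokens as a prediction of a number and regress its expected value toward the ground-truth digit. Below is a minimal PyTorch sketch of that idea under stated assumptions, not the repository's actual API; the function name `number_token_loss` and the arguments `digit_token_ids` / `digit_values` are illustrative.

```python
# Minimal sketch of a regression-style number-token loss (MSE variant).
# Assumes the tokenizer has single tokens for the digits "0".."9".
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, digit_token_ids, digit_values):
    """logits: (batch, seq, vocab) raw model outputs
    targets: (batch, seq) ground-truth token ids
    digit_token_ids: (10,) vocabulary ids of the tokens "0".."9"
    digit_values: (10,) float tensor [0., 1., ..., 9.]"""
    # Positions whose ground-truth token is a digit.
    is_digit = torch.isin(targets, digit_token_ids)
    if not is_digit.any():
        return logits.new_zeros(())

    # Softmax restricted to the ten digit tokens: (batch, seq, 10).
    digit_probs = F.softmax(logits[..., digit_token_ids], dim=-1)

    # Expected numeric value under the predicted distribution: (batch, seq).
    expected = (digit_probs * digit_values).sum(dim=-1)

    # Map each digit target id to its numeric value; clamp guards
    # against negative ignore indices such as -100.
    lookup = logits.new_zeros(int(targets.max()) + 1)
    lookup[digit_token_ids] = digit_values
    true_value = lookup[targets.clamp(min=0)]

    # MSE over digit positions only, meant to be added on top of the
    # usual cross-entropy so non-numeric tokens train exactly as before.
    return F.mse_loss(expected[is_digit], true_value[is_digit])
```

Because the expectation is differentiable in the digit probabilities, a prediction of "7" when the target is "8" is penalized less than a prediction of "1", which plain cross-entropy cannot distinguish.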
Alternatives and similar repositories for number-token-loss
Users interested in number-token-loss are comparing it to the libraries listed below.
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆50 · Updated 7 months ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆72 · Updated last year
- [ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models ☆47 · Updated last week
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆67 · Updated 2 years ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆34 · Updated 2 years ago
- ☆50 · Updated 11 months ago
- ☆18 · Updated 10 months ago
- ☆145 · Updated last year
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆40 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Updated 2 years ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆62 · Updated last year
- ☆51 · Updated 10 months ago
- Code for "Merging Text Transformers from Different Initializations" ☆20 · Updated 10 months ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning ☆20 · Updated 3 months ago
- A collection of AWESOME language modeling techniques on tabular data applications. ☆32 · Updated last year
- Recycling diverse models ☆46 · Updated 2 years ago
- ☆15 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆56 · Updated 7 months ago
- Model Stock: All we need is just a few fine-tuned models ☆128 · Updated 4 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆39 · Updated 3 years ago
- ☆203 · Updated last year
- An automated data pipeline scaling RL to pretraining levels ☆71 · Updated 2 months ago
- ☆81 · Updated last month
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆58 · Updated last year
- ☆25 · Updated last year
- ☆17 · Updated 4 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆81 · Updated 2 years ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆57 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 11 months ago
- Code for the paper "ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?" ☆31 · Updated 6 months ago