tum-ai / number-token-loss
A regression-like loss to improve numerical reasoning in language models (ICML 2025)
☆25 · Updated last month
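The one-line description names the technique but not its mechanics. As a rough illustration of what a regression-style loss over number tokens can look like, here is a minimal PyTorch sketch: it adds a squared-error penalty on the expected numeric value of the predicted number tokens on top of standard cross-entropy. The function name, signature, and the 0.3 weight are assumptions for illustration, not the repository's actual API.

```python
# Hypothetical sketch of a regression-style "number token loss".
# All names and the 0.3 weight are illustrative assumptions, not the
# tum-ai/number-token-loss API.
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, number_token_ids, number_values):
    """logits: (B, T, V) raw outputs; targets: (B, T) gold token ids;
    number_token_ids: (K,) vocab ids of number tokens;
    number_values: (K,) float numeric value of each of those tokens."""
    # Standard cross-entropy over the full vocabulary.
    ce = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

    # Positions whose gold token is a number token.
    is_number = torch.isin(targets, number_token_ids)
    if not is_number.any():
        return ce

    # Softmax restricted to number tokens -> expected numeric value.
    probs = logits[..., number_token_ids].softmax(dim=-1)   # (B, T, K)
    expected = (probs * number_values).sum(dim=-1)          # (B, T)

    # Look up the gold numeric value for each target token id.
    lookup = torch.zeros(logits.size(-1), dtype=number_values.dtype,
                         device=logits.device)
    lookup[number_token_ids] = number_values
    gold = lookup[targets]

    # Squared error on number positions only, added to cross-entropy.
    mse = ((expected - gold) ** 2)[is_number].mean()
    return ce + 0.3 * mse
```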
Alternatives and similar repositories for number-token-loss
Users interested in number-token-loss are comparing it to the repositories listed below.
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆48 · Updated 4 months ago
- A collection of AWESOME language-modeling techniques for tabular data applications. ☆32 · Updated 11 months ago
- ☆142 · Updated last year
- ☆16 · Updated 7 months ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024). ☆43 · Updated last year
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆68 · Updated last year
- ☆48 · Updated 7 months ago
- ☆15 · Updated 10 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆53 · Updated 4 months ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆61 · Updated 11 months ago
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models ☆42 · Updated 5 months ago
- ☆48 · Updated 7 months ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning ☆20 · Updated last week
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆34 · Updated 5 months ago
- ☆18 · Updated 2 months ago
- Recycling diverse models ☆45 · Updated 2 years ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆106 · Updated last week
- ☆19 · Updated 6 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆57 · Updated last year
- Multimodal Graph Learning: how to encode multiple multimodal neighbors with their relations into LLMs ☆65 · Updated last year
- ☆40 · Updated 4 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆83 · Updated 11 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆41 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆55 · Updated 2 years ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆55 · Updated 8 months ago
- ☆16 · Updated 2 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated 2 years ago
- Parameter-Efficient Fine-Tuning for Foundation Models ☆93 · Updated 6 months ago
- ☆48 · Updated 8 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆121 · Updated last year