tum-ai / number-token-loss

A regression-alike loss to improve numerical reasoning in language models - ICML 2025

☆27

Alternatives and similar repositories for number-token-loss

Users who are interested in number-token-loss are comparing it to the libraries listed below.

UCSC-VLAA / m1
[ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
☆46 Updated 7 months ago
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆49 Updated 6 months ago
lanxiang1017 / Language-Modeling-on-Tabular-Data-Survey
A collection of AWESOME language modeling techniques on tabular data applications.
☆32 Updated last year
lucidrains / AMIE-pytorch
Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind
☆71 Updated last year
bwconrad / soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆66 Updated 2 years ago
uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆33 Updated last year
Weixin-Liang / Mixture-of-Mamba
☆50 Updated 10 months ago
facebookresearch / ModelRatatouille
Recycling diverse models
☆46 Updated 2 years ago
UCSC-VLAA / o1_medical
☆48 Updated 9 months ago
waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆55 Updated 6 months ago
facebookresearch / iclmlp
Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"
☆19 Updated 2 years ago
csinva / tree-prompt
Tree prompting: easy-to-use scikit-learn interface for improved prompting.
☆40 Updated 2 years ago
vis-nlp / ChartInstruct
☆25 Updated last year
mistralai / mistral-evals
☆78 Updated 2 weeks ago
lucidrains / tableformer-pytorch
Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch
☆39 Updated 3 years ago
stanfordmlgroup / ManyICL
☆145 Updated last year
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆78 Updated 2 years ago
yang-ze-kang / AutoMMLab
☆18 Updated 9 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113 Updated 11 months ago
krafton-ai / mambaformer-icl
MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248
☆56 Updated last year
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆41 Updated 7 months ago
tianyu-z / VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
☆31 Updated 9 months ago
mlfoundations / dataset2metadata
☆27 Updated last year
kyegomez / VortexFusion
Transformers + Mambas + LSTMS All in One Model
☆14 Updated this week
seharanul17 / synthetic-tabular-LLM
☆15 Updated last year
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆20 Updated 2 months ago
CMU-MultiComp-Lab / mmml-tutorial
☆30 Updated 2 years ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆111 Updated this week
THUDM / Awesome-Parameter-Efficient-Fine-Tuning-for-Foundation-Models
Parameter-Efficient Fine-Tuning for Foundation Models
☆101 Updated 8 months ago
Hritikbansal / medmax
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
☆39 Updated 2 months ago