tum-ai / number-token-loss
A regression-like loss to improve numerical reasoning in language models
☆18 · Updated last month
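The listing does not spell out how the loss works, but a minimal sketch of what an NTL-style objective can look like follows, assuming a PyTorch setting, a hypothetical helper `number_token_loss`, and an MSE variant in which the expected value of the model's digit distribution is regressed onto the true digit. This is an illustration under those assumptions, not the tum-ai implementation.

```python
# Hypothetical NTL-style sketch (not the tum-ai implementation): at positions
# whose target token is a digit, compare the expected digit value under the
# model's distribution with the true digit via MSE, giving a regression-like
# training signal instead of a purely categorical one.
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, digit_token_ids):
    """
    logits:          (batch, seq, vocab) raw model outputs
    targets:         (batch, seq) ground-truth token ids
    digit_token_ids: (10,) vocabulary ids of the tokens "0".."9" (assumed)
    """
    # Numeric value of each digit token: 0.0 .. 9.0
    digit_values = torch.arange(10, dtype=logits.dtype, device=logits.device)

    # Distribution restricted to the ten digit tokens, renormalized.
    digit_probs = F.softmax(logits[..., digit_token_ids], dim=-1)  # (B, S, 10)

    # Expected numeric value predicted at every position.
    expected_value = (digit_probs * digit_values).sum(-1)          # (B, S)

    # Mask: positions whose target is one of the digit tokens.
    is_digit = torch.isin(targets, digit_token_ids)
    if not is_digit.any():
        return logits.new_zeros(())

    # True numeric value at digit positions: index of the target id
    # within digit_token_ids (valid because the ids are unique).
    true_value = (targets.unsqueeze(-1) == digit_token_ids).float().argmax(-1)

    return F.mse_loss(expected_value[is_digit],
                      true_value[is_digit].to(logits.dtype))
```

In practice a term like this would be added to the standard cross-entropy loss with a weighting coefficient, so non-numeric tokens continue to be trained as usual.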
Alternatives and similar repositories for number-token-loss
Users interested in number-token-loss are comparing it to the libraries listed below.
- Recycling diverse models ☆45 · Updated 2 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆39 · Updated 3 years ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆65 · Updated 10 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 · Updated last month
- ☆14 · Updated 7 months ago
- Official repo for the paper "Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models" at N… ☆8 · Updated 5 months ago
- ☆16 · Updated 2 years ago
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆33 · Updated last year
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts" ☆19 · Updated 2 years ago
- Official implementation for Sparse MetA-Tuning (SMAT) ☆16 · Updated last month
- Official PyTorch Implementation of Self-emerging Token Labeling ☆34 · Updated last year
- Code accompanying the paper "A contrastive rule for meta-learning" ☆12 · Updated 8 months ago
- Library for the Test-based Calibration Error (TCE) metric to quantify the degree of classifier calibration ☆13 · Updated last year
- Position Prediction as an Effective Pretraining Strategy ☆8 · Updated 2 years ago
- Code for "Merging Text Transformers from Different Initializations" ☆20 · Updated 5 months ago
- ☆29 · Updated last year
- ☆51 · Updated last year
- Applies ROME and MEMIT on Mamba-S4 models ☆14 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆40 · Updated 9 months ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh… ☆11 · Updated 2 years ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training ☆16 · Updated 9 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆29 · Updated 8 months ago
- The code for "Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling" ☆11 · Updated 2 years ago
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits" ☆13 · Updated 9 months ago
- ☆37 · Updated 11 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆31 · Updated last year
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆59 · Updated 7 months ago
- [NeurIPS'24] Official PyTorch implementation for paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling" ☆22 · Updated 4 months ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models" ☆12 · Updated last year
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models" ☆18 · Updated 6 months ago