arshadshk / Position-Prediction-Pretraining
Position Prediction as an Effective Pretraining Strategy
☆8Updated 2 years ago
Alternatives and similar repositories for Position-Prediction-Pretraining
Users who are interested in Position-Prediction-Pretraining are comparing it to the repositories listed below
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆16Updated last year
- HGRN2: Gated Linear RNNs with State Expansion☆54Updated 9 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"☆37Updated last year
- ☆45Updated last year
- Official code for the paper "Attention as a Hypernetwork"☆36Updated 11 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆26Updated 7 months ago
- Official Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"☆14Updated 9 months ago
- ☆19Updated 10 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)☆34Updated last year
- Self-Supervised Alignment with Mutual Information☆19Updated last year
- ☆14Updated 3 years ago
- This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.☆27Updated last year
- ☆23Updated 8 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆54Updated 11 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…☆16Updated last month
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…☆64Updated last year
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases"☆16Updated 2 years ago
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits"☆13Updated 8 months ago
- Official repository for Fourier model that can generate periodic signals☆10Updated 3 years ago
- ☆31Updated 7 months ago
- Curse-of-memory phenomenon of RNNs in sequence modelling☆19Updated 3 weeks ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch☆39Updated 3 years ago
- ☆16Updated 10 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆43Updated last year
- Code for T-MARS data filtering☆35Updated last year
- ☆51Updated 11 months ago
- Companion repository to "Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models"☆13Updated 2 years ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization☆29Updated 8 months ago
- ☆16Updated 2 years ago
- Domain Adaptation and Adapters☆16Updated 2 years ago