arshadshk / Position-Prediction-Pretraining
Position Prediction as an Effective Pretraining Strategy
☆8Updated 2 years ago
Alternatives and similar repositories for Position-Prediction-Pretraining:
Users who are interested in Position-Prediction-Pretraining are comparing it to the repositories listed below.
- Official code for the paper "Attention as a Hypernetwork"☆25Updated 9 months ago
- ☆18Updated 8 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆16Updated last year
- ☆31Updated 5 months ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"☆12Updated 9 months ago
- Curse-of-memory phenomenon of RNNs in sequence modelling☆19Updated last week
- This is the public GitHub repository for our paper "Transformer with a Mixture of Gaussian Keys"☆26Updated 2 years ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning☆25Updated last year
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆20Updated last year
- Code for T-MARS data filtering☆35Updated last year
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)☆34Updated last year
- Structured Pruning Adapters in PyTorch☆16Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆24Updated 4 months ago
- [ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation”(https://arxiv.org/abs/2305.…☆38Updated last year
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits"☆13Updated 5 months ago
- Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"☆12Updated 7 months ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization☆29Updated 6 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients☆26Updated 6 months ago
- ☆14Updated 4 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆53Updated 7 months ago
- ☆15Updated 8 months ago
- This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.☆27Updated 10 months ago
- Self-Supervised Alignment with Mutual Information☆16Updated 10 months ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization☆29Updated 2 years ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Updated 2 years ago
- ☆16Updated 2 years ago
- Official implementation of Matrix Variational Masked Autoencoder (M-MAE) for ICML paper "Information Flow in Self-Supervised Learning" (h…☆14Updated 6 months ago
- Latest Weight Averaging (NeurIPS HITY 2022)☆29Updated last year
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision"☆36Updated last year
- [Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.☆14Updated last year