arshadshk / Position-Prediction-Pretraining
Position Prediction as an Effective Pretraining Strategy
☆8Updated 2 years ago
Alternatives and similar repositories for Position-Prediction-Pretraining:
Users that are interested in Position-Prediction-Pretraining are comparing it to the libraries listed below
- Official code for the paper "Attention as a Hypernetwork"☆28Updated 10 months ago
- ☆14Updated 2 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆16Updated last year
- HGRN2: Gated Linear RNNs with State Expansion☆54Updated 8 months ago
- [Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.☆14Updated last year
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits"☆13Updated 6 months ago
- ☆18Updated 9 months ago
- Official repository for Fourier model that can generate periodic signals☆10Updated 3 years ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆20Updated last year
- Structured Pruning Adapters in PyTorch☆17Updated last year
- Self-Contrastive Learning: Single-viewed Supervised Contrastive Framework using Sub-network (AAAI 2023)☆20Updated last year
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization☆29Updated 7 months ago
- ☆31Updated 6 months ago
- Official PyTorch implementation of "Energy-Based Contrastive Learning of Visual Representations", NeurIPS 2022 Oral Paper☆10Updated 2 years ago
- ☆10Updated 7 months ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Updated 2 years ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Updated last year
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)☆34Updated last year
- ☆11Updated last year
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch☆37Updated 3 years ago
- ☆16Updated 6 months ago
- Official implementation of the paper "Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Perform…☆20Updated 2 years ago
- [ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation”(https://arxiv.org/abs/2305.…☆38Updated last year
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Updated last year
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning☆28Updated last year
- Curse-of-memory phenomenon of RNNs in sequence modelling☆19Updated 2 weeks ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers☆64Updated 3 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆51Updated 10 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆25Updated 5 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients☆26Updated 7 months ago