arshadshk / Position-Prediction-Pretraining
Position Prediction as an Effective Pretraining Strategy
☆8 Updated last year
Related projects
Alternatives and complementary repositories for Position-Prediction-Pretraining
- ☆13 Updated 2 years ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization ☆27 Updated 2 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last year
- ☆13 Updated last year
- Structured Pruning Adapters in PyTorch ☆15 Updated last year
- Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach" ☆11 Updated 2 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 Updated last year
- Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023] ☆31 Updated 2 months ago
- ☆25 Updated last month
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh… ☆11 Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆44 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆41 Updated 10 months ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models" ☆11 Updated 5 months ago
- ☆44 Updated last year
- Code for T-MARS data filtering ☆35 Updated last year
- ☆51 Updated 5 months ago
- Official PyTorch implementation for NeurIPS'24 paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling" ☆11 Updated 2 weeks ago
- ☆15 Updated 4 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆35 Updated 2 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆36 Updated 2 years ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024). ☆32 Updated 3 months ago
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models ☆15 Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆23 Updated 5 months ago
- Official PyTorch implementation of "Energy-Based Contrastive Learning of Visual Representations", NeurIPS 2022 Oral Paper ☆9 Updated 2 years ago
- Codebase for the SIMAT dataset and evaluation ☆38 Updated 2 years ago
- Gradient-based Hyperparameter Optimization Over Long Horizons ☆12 Updated 3 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆95 Updated last year
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆45 Updated last month
- Repository for Skill Set Optimization ☆12 Updated 3 months ago