arshadshk / Position-Prediction-Pretraining
Position Prediction as an Effective Pretraining Strategy
☆8Updated last year
Related projects:
- ☆13Updated 2 years ago
- Official Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"☆10Updated 3 weeks ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).☆29Updated last month
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Updated last year
- Structured Pruning Adapters in PyTorch☆15Updated last year
- ☆13Updated last year
- ☆15Updated 2 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆15Updated 10 months ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"☆11Updated 3 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch☆35Updated 2 years ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization☆27Updated last week
- ☆22Updated 10 months ago
- This is the official repo for Towards Uncertainty-Aware Language Agent.☆15Updated last month
- ☆20Updated last year
- ☆11Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆42Updated last year
- ☆21Updated 7 months ago
- Official code for the paper "Attention as a Hypernetwork"☆20Updated 2 months ago
- ☆44Updated 11 months ago
- Model Stock: All we need is just a few fine-tuned models☆75Updated 5 months ago
- Code for "Merging Text Transformers from Different Initializations"☆18Updated last month
- The official Pytorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT …☆28Updated 6 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆33Updated 3 months ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings☆16Updated last year
- [NeurIPS2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆28Updated last year
- Domain Adaptation and Adapters☆16Updated last year
- [ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation”(https://arxiv.org/abs/2305.…☆38Updated last year
- Official PyTorch implementation of "Energy-Based Contrastive Learning of Visual Representations", NeurIPS 2022 Oral Paper☆9Updated last year
- ☆11Updated 11 months ago
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…☆11Updated 2 months ago