arshadshk / Position-Prediction-Pretraining

Position Prediction as an Effective Pretraining Strategy

☆8

Alternatives and similar repositories for Position-Prediction-Pretraining

Users interested in Position-Prediction-Pretraining are comparing it to the libraries listed below.


jason9693 / FROZEN
☆14 · Updated 3 years ago
DaehanKim / EasyRLHF
EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets
☆9 · Updated last year
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16 · Updated last month
lucidrains / tableformer-pytorch
Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch
☆39 · Updated 3 years ago
gregorbachmann / scaling_mlps
☆51 · Updated last year
naver-ai / model-stock
Model Stock: All we need is just a few fine-tuned models
☆119 · Updated 10 months ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆44 · Updated last year
kaistAI / GAP
[ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization
☆29 · Updated 10 months ago
facebookresearch / iclmlp
Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"
☆19 · Updated 2 years ago
zhjohnchan / bert-clip-synesthesia
[Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.
☆14 · Updated 2 years ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆34 · Updated 2 years ago
alinlab / HOMER
Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).
☆43 · Updated last year
LukasHedegaard / structured-pruning-adapters
Structured Pruning Adapters in PyTorch
☆18 · Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40 · Updated last year
thunlp / DPT
☆13 · Updated 3 years ago
locuslab / T-MARS
Code for T-MARS data filtering
☆35 · Updated last year
formll / resolving-scaling-law-discrepancies
☆20 · Updated last year
lucidrains / discrete-key-value-bottleneck-pytorch
Implementation of Discrete Key / Value Bottleneck, in Pytorch
☆88 · Updated 2 years ago
facebookresearch / SIMAT
Codebase for the SIMAT dataset and evaluation
☆38 · Updated 3 years ago
jiyounglee-0523 / FourierDecoder
Official repository for Fourier model that can generate periodic signals
☆10 · Updated 3 years ago
annosubmission / GRC-Cache
☆16 · Updated 2 years ago
JeanKaddour / LAWA
Latest Weight Averaging (NeurIPS HITY 2022)
☆31 · Updated 2 years ago
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆33 · Updated 2 years ago
snu-mllab / Neural-Relation-Graph
Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23)
☆15 · Updated last year
BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆31 · Updated 2 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆119 · Updated 4 years ago
sanyalsunny111 / Early_Weight_Avg
[COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training
☆17 · Updated 9 months ago
MikeWangWZHL / Zemi
Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings
☆16 · Updated 2 years ago
LooperXX / ManagerTower
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
☆11 · Updated 7 months ago
wang-kee / LiNeS
Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"
☆30 · Updated 9 months ago