DRSY / EMO

[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)

☆126

Alternatives and similar repositories for EMO

Users who are interested in EMO are comparing it to the repositories listed below.

Yuanhy1997 / SeqDiffuSeq
Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation [NAACL 2024]
☆97 Updated 2 years ago
thu-coai / DA-Transformer
Official Implementation for the ICML2022 paper "Directed Acyclic Transformer for Non-Autoregressive Machine Translation"
☆132 Updated 2 years ago
morningmoni / UniPELT
Code for paper "UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning", ACL 2022
☆63 Updated 3 years ago
thu-coai / TaiLr
ICLR2023 - Tailoring Language Generation Models under Total Variation Distance
☆21 Updated 2 years ago
IndexFziQ / Diffusion4NLP-Papers
A paper list about diffusion models for natural language processing.
☆182 Updated 2 years ago
thunlp / MoEfication
☆142 Updated last year
SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆112 Updated 3 years ago
ChaosCodes / ProPETL
One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning
☆40 Updated 2 years ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56 Updated 2 years ago
RUCAIBox / Awesome-Text-Diffusion-Models
[IJCAI'23] The official Github page of the paper "Diffusion Models for Non-autoregressive Text Generation: A Survey".
☆60 Updated last year
BenfengXu / KNNPrompting
Released code for our ICLR23 paper.
☆66 Updated 2 years ago
benzakenelad / BitFit
Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
☆142 Updated 3 years ago
yegcjs / DINOISER
☆25 Updated 4 months ago
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆166 Updated last year
microsoft / AdaMix
This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…
☆135 Updated 2 years ago
LitterBrother-Xiao / Overview-of-Non-autoregressive-Applications
☆188 Updated last year
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆83 Updated last year
xhan77 / ssd-lm
Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
☆75 Updated 3 years ago
JetRunner / SuperICL
Code for "Small Models are Valuable Plug-ins for Large Language Models"
☆131 Updated 2 years ago
yegcjs / mixinglaws
☆108 Updated 4 months ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆76 Updated last year
ZHZisZZ / weak-to-strong-search
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
☆63 Updated 11 months ago
lancopku / label-words-are-anchors
Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
☆165 Updated last year
Alsace08 / OOD-Math-Reasoning
[NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"
☆27 Updated last year
TianduoWang / DiffAug
[EMNLP 2022] Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536
☆40 Updated 3 years ago
cmnfriend / O-LoRA
☆190 Updated last year
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆79 Updated 2 years ago
xinghaow99 / DenoSent
[AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning
☆15 Updated last year
xiami2019 / CLAIF
[Findings of ACL'2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback
☆40 Updated 2 years ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆30 Updated last year