daochenzha / dreamshardLinks

[NeurIPS 2022] DreamShard: Generalizable Embedding Table Placement for Recommender Systems

☆29

Alternatives and similar repositories for dreamshard

Users that are interested in dreamshard are comparing it to the libraries listed below

Sorting:

daochenzha / autoshard
[KDD 2022] AutoShard: Automated Embedding Table Sharding for Recommender Systems
☆22Updated 2 years ago
aniquetahir / JORA
JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
☆36Updated last year
mingkai-zheng / GENIUS
Can GPT-4 Perform Neural Architecture Search?
☆87Updated 2 years ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
microsoft / AutoMoE
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
☆47Updated 3 years ago
mazumder-lab / COMET
Code for COMET: Cardinality Constrained Mixture of Experts with Trees and Local Search
☆11Updated 2 years ago
guanyilin428 / Dynamic-Speculative-Planning
☆31Updated 2 months ago
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10Updated 2 years ago
Doraemonzzz / hgru2-pytorch
☆23Updated last year
nilesh2797 / ELIAS
Official codebase for NeurIPS 2022 paper End-to-end Learning to Index and Search in Large Output Spaces
☆12Updated 2 years ago
IdoAmos / not-from-scratch
☆33Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆41Updated last year
wolfecameron / GIST
Repository for "GIST: Distributed training for large-scale graph convolutional networks"
☆15Updated 2 years ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87Updated last year
ctlllll / understanding_llm_benchmarks
Understanding the correlation between different LLM benchmarks
☆29Updated last year
abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆26Updated last year
dinobby / MAGDi
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…
☆37Updated last year
facebookresearch / macta
MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection
☆46Updated 2 years ago
google-deepmind / spectral_ssm
☆34Updated last year
Cornell-RL / drpo
Dateset Reset Policy Optimization
☆31Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49Updated 2 years ago
krafton-ai / mambaformer-icl
MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248
☆57Updated last year
aoiang / LaMOO
☆29Updated 3 years ago
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32Updated last year
EdinburghNLP / torch-adaptive-imle
☆35Updated 11 months ago
annosubmission / GRC-Cache
☆16Updated 2 years ago
gregorbachmann / scaling_mlps
☆52Updated last year
facebookresearch / NeuralMemory
A Data Source for Reasoning Embodied Agents
☆19Updated 2 years ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆36Updated last year
bailuding / rails
Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral)
☆51Updated 6 months ago