mingkai-zheng / GENIUSLinks

Can GPT-4 Perform Neural Architecture Search?

☆87

Alternatives and similar repositories for GENIUS

Users that are interested in GENIUS are comparing it to the libraries listed below

Sorting:

microsoft / AutoMoE
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
☆47Updated 3 years ago
ctlllll / reward_collapse
☆27Updated 2 years ago
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16Updated 7 months ago
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49Updated 2 years ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81Updated 2 years ago
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆23Updated last year
katiekang1998 / reasoning_generalization
☆33Updated 10 months ago
abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆26Updated last year
facebookresearch / ModelRatatouille
Recycling diverse models
☆46Updated 2 years ago
aniquetahir / JORA
JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
☆36Updated last year
ylsung / vl-merging
PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"
☆37Updated 2 years ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55Updated 9 months ago
kyegomez / EAOT
The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers"
☆19Updated last year
LAION-AI / Conditional-Pretraining-of-Large-Language-Models
☆37Updated 2 years ago
aszala / EnvGen
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
☆38Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56Updated 2 years ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
duterscmy / CD-MoE
Official PyTorch implementation of CD-MOE
☆12Updated 7 months ago
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
boone891214 / MEST
[NeurIPS‘2021] "MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge", Geng Yuan, Xiaolong Ma, Yanzhi Wang et al…
☆18Updated 3 years ago
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆28Updated 11 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
zhangir-azerbayev / MetaMath
☆11Updated 2 years ago
Infini-AI-Lab / gsm_infinite
☆55Updated 5 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆101Updated last year
AwesomeLemon / ENCAS
Neural Architecture Search + Cascades | Best Paper @ GECCO 2022
☆15Updated 2 years ago
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆78Updated 2 years ago