SusCom-Lab / ZeroMerge

☆11

Alternatives and similar repositories for ZeroMerge:

Users who are interested in ZeroMerge are comparing it to the repositories listed below.

Scientific-Computing-Lab / MPI-rigen
MPI Code Generation through Domain-Specific Language Models
☆13 Updated 4 months ago
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16 Updated last year
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆33 Updated last year
Zyphra / zcookbook
Training hybrid models for dummies.
☆20 Updated 2 months ago
iantbutler01 / ditty
A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem.
☆16 Updated 5 months ago
NolanoOrg / SpectraSuite
☆46 Updated 8 months ago
yynil / RWKVInside
☆32 Updated last week
kyegomez / SelfExtend
Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" in PyTorch and Zeta
☆13 Updated 4 months ago
Blackzxy / LoGAH
☆21 Updated 6 months ago
katzurik / Knowledge_Navigator
☆19 Updated 3 weeks ago
kyegomez / OpenStrawberry
An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO
☆28 Updated 3 weeks ago
kyegomez / MobileVLM
Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …
☆15 Updated last year
UmerHA / triton_util
Make Triton easier
☆47 Updated 9 months ago
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆22 Updated 2 months ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆23 Updated last year
catid / lllm
Latent Large Language Models
☆17 Updated 7 months ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆19 Updated 2 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45 Updated 8 months ago
cyzus / thoughtsculpt
☆13 Updated 3 months ago
VITA-Group / ChainCoder
[ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …
☆40 Updated last year
thunlp / SparsingLaw
The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆18 Updated 4 months ago
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆16 Updated this week
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆19 Updated 3 months ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆42 Updated 6 months ago
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆26 Updated 3 weeks ago
kyleliang919 / Online-Subspace-Descent
This repo is based on https://github.com/jiaweizzhao/GaLore
☆26 Updated 6 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆26 Updated 5 months ago
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆21 Updated 4 months ago
JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆32 Updated 5 months ago
facebookresearch / coocmap
Code for the paper "Accessing higher dimensions for unsupervised word translation"
☆21 Updated last year