IBM / ModuleFormerLinks

ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.

☆223

Alternatives and similar repositories for ModuleFormer

Users that are interested in ModuleFormer are comparing it to the libraries listed below

Sorting:

kernelmachine / cbtm
Code repository for the c-BTM paper
☆107Updated last year
HazyResearch / TART
TART: A plug-and-play Transformer module for task-agnostic reasoning
☆200Updated 2 years ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆117Updated last year
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128Updated 2 years ago
imoneoi / multipack_sampler
Multipack distributed sampler for fast padding-free training of LLMs
☆198Updated 11 months ago
hydrallm / llama-moe-v1
☆95Updated 2 years ago
LLM360 / amber-train
Pre-training code for Amber 7B LLM
☆166Updated last year
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆90Updated last year
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated 11 months ago
SALT-NLP / demonstrated-feedback
☆124Updated 9 months ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆106Updated 7 months ago
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆267Updated last year
normster / llm_rules
RuLES: a benchmark for evaluating rule-following in language models
☆227Updated 5 months ago
google / sycophancy-intervention
Scripts for generating synthetic finetuning data for reducing sycophancy.
☆113Updated last year
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆91Updated last year
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆133Updated last year
facebookresearch / Shepherd
This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆219Updated last year
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆140Updated 8 months ago
FasterDecoding / BitDelta
☆199Updated 7 months ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86Updated last year
declare-lab / flacuna
Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al…
☆111Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88Updated 9 months ago
huggingface / datablations
Scaling Data-Constrained Language Models
☆338Updated 3 weeks ago
lz1oceani / verify_cot
☆134Updated last year
AblateIt / finetune-study
Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes.
☆82Updated last year
CarperAI / autocrit
A repository for transformer critique learning and generation
☆90Updated last year
sabetAI / BLoRA
batched loras
☆344Updated last year
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆231Updated 8 months ago
GeneZC / MiniMA
Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models"
☆100Updated last year