sanderwood / bgptLinks

Beyond Language Models: Byte Models are Digital World Simulators

☆331

Alternatives and similar repositories for bgpt

Users that are interested in bgpt are comparing it to the libraries listed below

Sorting:

dingo-actual / infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…
☆294Updated last year
XuezheMax / megalodon
Reference implementation of Megalodon 7B model
☆525Updated 6 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202Updated last year
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆196Updated last year
llm-random / llm-random
☆205Updated last week
catid / dora
Implementation of DoRA
☆307Updated last year
sshh12 / multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
☆187Updated last year
pratyushasharma / laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
☆389Updated last year
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆360Updated 11 months ago
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆155Updated last year
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆157Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆278Updated 2 years ago
keeeeenw / MicroLlama
Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget
☆162Updated 3 months ago
FasterDecoding / BitDelta
☆204Updated 11 months ago
microsoft / GRIN-MoE
GRadient-INformed MoE
☆264Updated last year
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆407Updated last year
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173Updated 10 months ago
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆561Updated 11 months ago
jondurbin / bagel
A bagel, with everything.
☆325Updated last year
OpenNLPLab / TransnormerLLM
Official implementation of TransNormerLLM: A Faster and Better LLM
☆248Updated last year
IBM / ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…
☆225Updated 2 months ago
EricLBuehler / xlora
X-LoRA: Mixture of LoRA Experts
☆252Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆115Updated 9 months ago
RWKV / RWKV-infctx-trainer
RWKV infctx trainer, for training arbitary context sizes, to 10k and beyond!
☆147Updated last year
LLM360 / amber-train
Pre-training code for Amber 7B LLM
☆169Updated last year
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
TencentARC / LLaMA-Pro
[ACL 2024] Progressive LLaMA with Block Expansion.
☆513Updated last year
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆232Updated last month
bigcode-project / selfcodealign
[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation
☆321Updated 9 months ago