sanderwood / bgpt
Beyond Language Models: Byte Models are Digital World Simulators
☆309 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for bgpt
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" ☆280 · Updated 6 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆647 · Updated last month
- ☆184 · Updated last month
- ☆470 · Updated 2 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 · Updated 4 months ago
- RWKV in nanoGPT style ☆177 · Updated 5 months ago
- ☆175 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆262 · Updated last year
- Embed arbitrary modalities (images, audio, documents, etc.) into large language models. ☆176 · Updated 7 months ago
- Implementation of DoRA ☆283 · Updated 5 months ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆370 · Updated 4 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆177 · Updated last month
- [ICML 2024] CLLMs: Consistency Large Language Models ☆353 · Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆460 · Updated this week
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆307 · Updated 7 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆315 · Updated 3 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation ☆268 · Updated 2 weeks ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last month
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆199 · Updated 6 months ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆594 · Updated 3 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆537 · Updated 6 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆613 · Updated 5 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (official code) ☆135 · Updated last month
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆229 · Updated 3 weeks ago
- ☆287 · Updated 2 months ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory" ☆305 · Updated 7 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆372 · Updated last month
- ☆451 · Updated 3 weeks ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆358 · Updated last month
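The premise of bgpt, the repository these projects relate to, is that any digital file can be modeled as a plain byte sequence over a fixed 256-symbol vocabulary, with next-byte prediction playing the role of next-token prediction. A minimal sketch of that framing (the helper names and the context-window size are illustrative, not from the bgpt codebase):

```python
# Illustrative sketch of byte-level modeling as in bGPT: raw bytes become
# token IDs (vocabulary size 256), and training pairs are built for
# next-byte prediction. Not bGPT's actual API; function names are made up.

def bytes_to_tokens(data: bytes) -> list[int]:
    """Map raw bytes to integer token IDs in [0, 255]."""
    return list(data)

def next_byte_pairs(tokens: list[int], context: int = 4):
    """Yield (context_window, next_token) training pairs for
    next-byte prediction, the byte-level analogue of next-token LM."""
    for i in range(len(tokens) - context):
        yield tokens[i:i + context], tokens[i + context]

tokens = bytes_to_tokens(b"hello")              # [104, 101, 108, 108, 111]
pairs = list(next_byte_pairs(tokens, context=2))
# first pair: ([104, 101], 108) -- given "he", predict "l"
```

Because the vocabulary is just the 256 possible byte values, the same pipeline applies unchanged to text, images, audio, or executables, which is what lets a byte model act as a general "digital world" simulator.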