thunlp / Delta-CoMe
Delta-CoMe can achieve near-lossless 1-bit compression; the work has been accepted by NeurIPS 2024
☆57Updated 5 months ago
Alternatives and similar repositories for Delta-CoMe:
Users who are interested in Delta-CoMe are comparing it to the libraries listed below
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆131Updated 10 months ago
- ☆94Updated 4 months ago
- GLM Series Edge Models☆134Updated last month
- SUS-Chat: Instruction tuning done right☆48Updated last year
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs☆158Updated last week
- FuseAI Project☆85Updated 2 months ago
- A demo built on Megrez-3B-Instruct, integrating a web search tool to enhance the model's question-and-answer capabilities.☆37Updated 4 months ago
- Imitate OpenAI with Local Models☆88Updated 7 months ago
- ☆29Updated 7 months ago
- ☆137Updated last month
- ☆46Updated 10 months ago
- Mixture-of-Experts (MoE) Language Model☆186Updated 7 months ago
- It's an open-source LLM based on an MoE structure.☆58Updated 9 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆249Updated 4 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…☆56Updated 11 months ago
- Copy the MLP of llama3 8 times as 8 experts, create a router with random initialization, and add load balancing loss to construct an 8x8b Mo…☆26Updated 9 months ago
- ☆107Updated 5 months ago
- ☆38Updated 11 months ago
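- The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all you need. Experiments support the claim that, for strong reasoning ability, the procedural content of the think process is the core of AGI/ASI.☆42Updated 2 months ago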
- Light local website for displaying performances from different chat models.☆86Updated last year
- code for Scaling Laws of RoPE-based Extrapolation☆73Updated last year
- zero: zero-training LLM parameter tuning☆31Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆134Updated 4 months ago
- ☆81Updated last year
- ☆225Updated 11 months ago
- OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next…☆131Updated last week
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆130Updated 8 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆71Updated 3 weeks ago
- ☆51Updated 7 months ago
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation☆86Updated last month