UbiquitousLearning / SLM_Survey
☆83 Updated 4 months ago
Alternatives and similar repositories for SLM_Survey:
Users who are interested in SLM_Survey are comparing it to the repositories listed below
- FuseAI Project ☆83 Updated 3 weeks ago
- ☆37 Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 Updated 2 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆35 Updated 9 months ago
- ☆75 Updated last month
- ☆64 Updated 2 weeks ago
- Survey of Small Language Models from Penn State, ... ☆156 Updated last month
- ☆77 Updated 2 weeks ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆114 Updated 8 months ago
- ☆125 Updated last year
- ☆47 Updated 5 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆82 Updated 8 months ago
- Exploring Model Kinship for Merging Large Language Models ☆23 Updated 3 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆47 Updated this week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆107 Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 Updated 8 months ago
- ☆59 Updated 2 weeks ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆44 Updated 4 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆53 Updated last week
- A curated list of the role of small models in the LLM era ☆91 Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆42 Updated 7 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆197 Updated this week
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆107 Updated 9 months ago
- Self-host LLMs with LMDeploy and BentoML ☆17 Updated last month
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆160 Updated 2 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆38 Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆140 Updated 5 months ago
- A pipeline for LLM knowledge distillation ☆91 Updated 3 weeks ago