HKUDS / SepLLM
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
☆60Updated 2 months ago
Alternatives and similar repositories for SepLLM:
Users who are interested in SepLLM are comparing it to the libraries listed below:
- ☆142Updated 6 months ago
- ☆78Updated this week
- ☆16Updated 2 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆58Updated 3 weeks ago
- ☆20Updated 8 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆70Updated 2 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆99Updated 4 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆48Updated 4 months ago
- AnchorAttention: Improved attention for LLMs long-context training☆205Updated last month
- "EasyRec: Simple yet Effective Language Model for Recommendation"☆111Updated 2 weeks ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆33Updated last month
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models☆39Updated this week
- [preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆42Updated 2 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"☆74Updated 9 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆60Updated last week
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."☆46Updated 4 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆38Updated 3 months ago
- The code of RouterDC☆53Updated 2 weeks ago
- Repository for the paper: 500xCompressor: Generalized Prompt Compression for Large Language Models☆30Updated 6 months ago
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models☆73Updated 4 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆50Updated 4 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆27Updated 11 months ago
- ☆22Updated 3 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆36Updated 5 months ago
- ☆18Updated 4 months ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 3 months ago