imagination-research / sot
[ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
☆146 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for sot
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆113 · Updated 5 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆293 · Updated 11 months ago
- ☆184 · Updated last month
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆199 · Updated 6 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆135 · Updated 5 months ago
- ☆122 · Updated 10 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆130 · Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆129 · Updated last month
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆79 · Updated last week
- Expert Specialized Fine-Tuning ☆148 · Updated 2 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆173 · Updated last week
- Spherically merge PyTorch/HF-format language models with minimal feature loss. ☆112 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆212 · Updated last year
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆238 · Updated 4 months ago
- Open Implementations of LLM Analyses ☆94 · Updated last month
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆262 · Updated last year
- RepoQA: Evaluating Long-Context Code Understanding ☆100 · Updated 3 weeks ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆307 · Updated 7 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆115 · Updated 2 weeks ago
- An Analytical Evaluation Board of Multi-turn LLM Agents ☆250 · Updated 6 months ago
- Benchmark baseline for retrieval QA applications ☆95 · Updated 7 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆147 · Updated this week
- ☆116 · Updated 5 months ago
- REST: Retrieval-Based Speculative Decoding (NAACL 2024) ☆176 · Updated this week
- ☆199 · Updated 5 months ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆144 · Updated 9 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆91 · Updated 4 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆216 · Updated 7 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆174 · Updated 4 months ago