imagination-research / sot
[ICLR 2024] Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
☆138Updated 6 months ago
Related projects:
- The official repo for "LLoCo: Learning Long Contexts Offline"☆104Updated 3 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024)☆195Updated 4 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"☆264Updated 9 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…☆218Updated 5 months ago
- Expert Specialized Fine-Tuning☆129Updated last month
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.☆290Updated 5 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆123Updated 6 months ago
- Spherical merge of PyTorch/HF-format language models with minimal feature loss.☆107Updated last year
- Experiments on speculative sampling with Llama models☆114Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆258Updated 10 months ago
- The scripts for MMLU-Pro☆84Updated this week
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.☆68Updated 2 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding☆55Updated this week
- Benchmark baseline for retrieval QA applications☆90Updated 5 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration☆163Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users☆182Updated last month
- ReLM is a Regular Expression engine for Language Models☆100Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024☆158Updated 4 months ago
- PB-LLM: Partially Binarized Large Language Models☆143Updated 10 months ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem…☆274Updated 5 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆123Updated 3 months ago
- Explorations into some recent techniques surrounding speculative decoding☆190Updated 11 months ago
- Evaluation and analysis code for LLM360☆75Updated 3 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆158Updated 2 months ago
- Repository for organizing datasets and papers used in Open LLM.☆86Updated last year