MooreThreads / TurboRAG
☆31 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for TurboRAG
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆70 · Updated last week
- ☆22 · Updated 3 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆73 · Updated 3 weeks ago
- Code and datasets for the paper "Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…" ☆23 · Updated last month
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆76 · Updated last month
- A pipeline for LLM knowledge distillation ☆77 · Updated 3 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆58 · Updated this week
- Data preparation code for the CrystalCoder 7B LLM ☆42 · Updated 6 months ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆105 · Updated 3 weeks ago
- Official implementation for "Extending LLMs’ Context Window with 100 Samples" ☆73 · Updated 9 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆29 · Updated 6 months ago
- Layer-Condensed KV cache with 10× larger batch size, fewer parameters, and less computation. Dramatic speed-up with better task performance… ☆137 · Updated this week
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆110 · Updated 4 months ago
- KV cache compression for high-throughput LLM inference ☆82 · Updated last week
- ☆43 · Updated 3 months ago
- ☆64 · Updated last month
- Open Implementations of LLM Analyses ☆94 · Updated last month
- Evaluation tools for Retrieval-Augmented Generation (RAG) methods. ☆127 · Updated 3 weeks ago
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ☆144 · Updated 8 months ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆124 · Updated 3 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore" ☆128 · Updated this week
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆114 · Updated this week
- Codebase accompanying the "Summary of a Haystack" paper ☆72 · Updated last month
- Experimental code for StructuredRAG: Structured Outputs in Retrieval-Augmented Generation ☆93 · Updated this week
- ☆29 · Updated 4 months ago
- Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆86 · Updated 3 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 · Updated 8 months ago
- ☆62 · Updated last week
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models ☆22 · Updated 8 months ago
- LLM Serving Performance Evaluation Harness ☆55 · Updated 2 months ago