MooreThreads / TurboRAG
☆34 · Updated this week
Related projects
Alternatives and complementary repositories for TurboRAG
- ☆22 · Updated 4 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' · ☆74 · Updated 10 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆79 · Updated last week
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) · ☆77 · Updated last month
- Open Implementations of LLM Analyses · ☆94 · Updated last month
- Data preparation code for CrystalCoder 7B LLM · ☆42 · Updated 6 months ago
- Code and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… · ☆23 · Updated last month
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark · ☆106 · Updated last month
- Modular and structured prompt caching for low-latency LLM inference · ☆69 · Updated 2 weeks ago
- ☆36 · Updated this week
- Evaluation tools for Retrieval-augmented Generation (RAG) methods. · ☆136 · Updated this week
- A pipeline for LLM knowledge distillation · ☆78 · Updated 3 months ago
- Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang. · ☆128 · Updated this week
- [NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables". · ☆116 · Updated 6 months ago
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. · ☆74 · Updated last month
- Layer-Condensed KV cache with 10× larger batch size, fewer params, and less computation. Dramatic speed-up with better task performance… · ☆139 · Updated this week
- Repo hosting code and materials related to speeding up LLMs' inference using token merging. · ☆29 · Updated 6 months ago
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" · ☆193 · Updated last month
- LLM Serving Performance Evaluation Harness · ☆57 · Updated 2 months ago
- ☆43 · Updated 4 months ago
- ☆222 · Updated 4 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". · ☆130 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆168 · Updated 2 weeks ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" · ☆46 · Updated last month
- ☆130 · Updated 3 months ago
- Expert Specialized Fine-Tuning · ☆148 · Updated 2 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆135 · Updated 5 months ago
- Experiments with inference on llama · ☆105 · Updated 5 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" · ☆360 · Updated last month
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning · ☆59 · Updated 3 months ago