manuelescobar-dev / LLM-Tools
Open-source calculator for LLM system requirements.
☆142 · Updated 3 months ago
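As a rough illustration of the kind of estimate such a calculator produces, here is a minimal sketch of the common weights-times-precision VRAM formula with a fixed overhead factor for activations and KV cache. The function name, parameters, and the 20% overhead are illustrative assumptions, not taken from the LLM-Tools repository:

```python
def estimate_vram_gb(params_b: float, bytes_per_param: int = 2, overhead: float = 1.2) -> float:
    """Rough serving-memory estimate for an LLM.

    params_b:         model size in billions of parameters
    bytes_per_param:  2 for fp16/bf16, 1 for int8, etc.
    overhead:         multiplier covering activations / KV cache (assumed 1.2)
    """
    bytes_total = params_b * 1e9 * bytes_per_param * overhead
    return bytes_total / 1024**3  # convert bytes to GiB

# A 7B model in fp16 needs roughly 15-16 GiB under these assumptions
print(round(estimate_vram_gb(7), 1))
```

Real calculators refine this with context length, batch size, and per-layer KV-cache terms, but the weights term above usually dominates.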
Alternatives and similar repositories for LLM-Tools:
Users interested in LLM-Tools are comparing it to the repositories listed below:
- Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower exe…☆199 · Updated this week
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆238 · Updated last year
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".☆227 · Updated 7 months ago
- Comparison of Language Model Inference Engines☆208 · Updated 3 months ago
- A throughput-oriented high-performance serving framework for LLMs☆782 · Updated 6 months ago
- Materials for learning SGLang☆355 · Updated last week
- Serverless LLM Serving for Everyone.☆441 · Updated this week
- ☆73 · Updated 4 months ago
- Simple extension on vLLM to help you speed up reasoning models without training.☆137 · Updated 3 weeks ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem…☆345 · Updated 11 months ago
- [NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLMs' inference, approximates attention via dynamic sparse calculation, which r…☆945 · Updated last month
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.☆211 · Updated 7 months ago
- Production-ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLa…☆384 · Updated this week
- A light and full OpenAI API deployment for production with vLLM, supporting /v1/embeddings with all embedding models.☆41 · Updated 8 months ago
- Efficient LLM Inference over Long Sequences☆365 · Updated last month
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆129 · Updated 8 months ago
- InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencie…☆368 · Updated last week
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718☆313 · Updated 6 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration☆201 · Updated 4 months ago
- Advanced Quantization Algorithm for LLMs/VLMs.☆413 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization☆351 · Updated 6 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)☆177 · Updated 2 weeks ago
- Awesome list for LLM quantization☆187 · Updated 3 months ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation☆293 · Updated 5 months ago
- A large-scale simulation framework for LLM inference☆355 · Updated 4 months ago
- Modular and structured prompt caching for low-latency LLM inference☆89 · Updated 4 months ago
- ☆142 · Updated last month
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation☆160 · Updated last year
- Evaluation tools for Retrieval-augmented Generation (RAG) methods.☆151 · Updated 4 months ago
- LLM Serving Performance Evaluation Harness☆73 · Updated last month