Preemo-Inc / text-generation-inference
☆200 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for text-generation-inference
- experiments with inference on llama ☆105 · Updated 5 months ago
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes. ☆81 · Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆134 · Updated 3 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆232 · Updated 5 months ago
- ☆94 · Updated last month
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆167 · Updated 2 weeks ago
- ☆130 · Updated this week
- ☆112 · Updated this week
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆246 · Updated 2 weeks ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆145 · Updated last year
- Track OpenAI compatible requests to a dataset ☆57 · Updated this week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆238 · Updated 4 months ago
- Low-Rank adapter extraction for fine-tuned transformers model ☆162 · Updated 6 months ago
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆183 · Updated 11 months ago
- Google TPU optimizations for transformers models ☆75 · Updated this week
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆312 · Updated 2 weeks ago
- ☆91 · Updated last year
- ☆123 · Updated this week
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆222 · Updated 3 weeks ago
- Tune MPTs ☆84 · Updated last year
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ☆155 · Updated last year
- Let's build better datasets, together! ☆206 · Updated this week
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated last month
- Late Interaction Models Training & Retrieval ☆166 · Updated this week
- A bagel, with everything. ☆312 · Updated 7 months ago
- ☆64 · Updated 5 months ago
- Just a bunch of benchmark logs for different LLMs ☆116 · Updated 3 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆203 · Updated 6 months ago
- Merge Transformers language models by use of gradient parameters. ☆201 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆253 · Updated last month