samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆29 · Updated 6 months ago
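For readers unfamiliar with the name, SLERP stands for spherical linear interpolation, the operation typically used to merge two vectors (here, token representations) while preserving their angular relationship. The sketch below is a generic, illustrative implementation of the standard SLERP formula, not code from this repository; the function name, vector dimension, and fallback threshold are assumptions for the example.

```python
# Minimal, generic SLERP sketch (NOT this repository's code).
# slerp(v0, v1, t) = sin((1 - t) * omega) / sin(omega) * v0
#                  + sin(t * omega)       / sin(omega) * v1,
# where omega is the angle between v0 and v1.
import numpy as np

def slerp(v0: np.ndarray, v1: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two vectors; t=0 -> v0, t=1 -> v1."""
    v0_unit = v0 / (np.linalg.norm(v0) + eps)
    v1_unit = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_unit, v1_unit), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two vectors
    if omega < eps:                 # nearly parallel: plain linear interpolation is fine
        return (1.0 - t) * v0 + t * v1
    sin_omega = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / sin_omega) * v0 + (np.sin(t * omega) / sin_omega) * v1

# Hypothetical usage: merge two token hidden states into a single vector.
a = np.random.randn(4096)
b = np.random.randn(4096)
merged = slerp(a, b, t=0.5)
```

Merging adjacent token representations this way is one plausible route to shortening the sequence a model must process, which is the inference-speedup angle the repository description points at.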
Related projects
Alternatives and complementary repositories for llm_slerp_generation
- A toolkit for fine-tuning, running inference with, and evaluating GreenBitAI's LLMs. ☆72 · Updated 3 weeks ago
- Set of scripts to finetune LLMs ☆36 · Updated 7 months ago
- ☆38 · Updated this week
- ☆52 · Updated 5 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 3 months ago
- ☆64 · Updated 5 months ago
- ☆116 · Updated 2 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆49 · Updated 7 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆28 · Updated 6 months ago
- Simple examples using Argilla tools to build AI ☆38 · Updated this week
- ☆91 · Updated last month
- ☆20 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- An implementation of Self-Extend, which expands the context window via grouped attention ☆118 · Updated 10 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 5 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 5 months ago
- Code for training & evaluating Contextual Document Embedding models ☆92 · Updated this week
- Data preparation code for CrystalCoder 7B LLM ☆42 · Updated 6 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆96 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆38 · Updated 5 months ago
- Spherically merge PyTorch/HF-format language models with minimal feature loss. ☆111 · Updated last year
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆70 · Updated this week
- QuIP quantization ☆46 · Updated 7 months ago
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes. ☆81 · Updated last year
- High-level library for batched embedding generation, blazingly fast web-based RAG, and quantized index processing ⚡ ☆59 · Updated this week
- An introduction to LLM Sampling ☆18 · Updated this week
- Data preparation code for Amber 7B LLM ☆82 · Updated 6 months ago
- Entropix-style sampling + GUI ☆25 · Updated last week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 · Updated last month
- A pipeline for LLM knowledge distillation ☆77 · Updated 3 months ago