cornstarch-org / Cornstarch
☆64Updated this week
Alternatives and similar repositories for Cornstarch:
Users who are interested in Cornstarch are comparing it to the libraries listed below:
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆277Updated last month
- minimal GRPO implementation from scratch☆65Updated 2 weeks ago
- Simple extension on vLLM to help you speed up reasoning model without training.☆139Updated 3 weeks ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System☆111Updated 9 months ago
- Train, tune, and infer Bamba model☆87Updated 2 months ago
- Cray-LM unified training and inference stack.☆21Updated 2 months ago
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.☆127Updated this week
- Code for studying the super weight in LLM☆94Updated 3 months ago
- Curie: Automated and Rigorous Scientific Experimentation with AI Agents☆54Updated this week
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆90Updated 3 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆169Updated last week
- The official repo for "LLoCo: Learning Long Contexts Offline"☆116Updated 9 months ago
- ☆158Updated last month
- PyTorch library for Active Fine-Tuning☆62Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆168Updated 2 months ago
- Set of scripts to finetune LLMs☆37Updated last year
- A collection of all available inference solutions for the LLMs☆82Updated 3 weeks ago
- ☆43Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆311Updated 3 months ago
- A minimal implementation of vLLM.☆37Updated 8 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆196Updated this week
- ☆41Updated 11 months ago
- ☆107Updated last week
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya☆107Updated last month
- making the official triton tutorials actually comprehensible☆21Updated last week
- Repo hosting codes and materials related to speeding LLMs' inference using token merging.☆35Updated 11 months ago
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).☆139Updated 2 months ago
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs☆38Updated 5 months ago
- ☆46Updated 4 months ago
- LLM KV cache compression made easy☆444Updated last week