spcl / CheckEmbedLinks
Official Implementation of "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks"
☆22 · Updated 5 months ago
Alternatives and similar repositories for CheckEmbed
Users who are interested in CheckEmbed are comparing it to the repositories listed below.
- ☆40 · Updated 7 months ago
- Compression for Foundation Models ☆34 · Updated 4 months ago
- An extension of the GaLore paper that performs Natural Gradient Descent in a low-rank subspace ☆18 · Updated last year
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆27 · Updated 11 months ago
- ☆34 · Updated 10 months ago
- Cascade Speculative Drafting ☆32 · Updated last year
- LLM-Inference-Bench ☆56 · Updated 4 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- [NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing" ☆77 · Updated 5 months ago
- Train, tune, and run inference with the Bamba model ☆137 · Updated 6 months ago
- ☆19 · Updated 8 months ago
- AskIt: A unified programming interface for LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2) ☆79 · Updated 11 months ago
- ☆63 · Updated 2 weeks ago
- Source code for Activated LoRA ☆23 · Updated 3 weeks ago
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ☆182 · Updated last year
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen ☆18 · Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Updated last year
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆68 · Updated last year
- ☆39 · Updated last year
- Sparsity support for PyTorch ☆37 · Updated 8 months ago
- ☆53 · Updated last year
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated last year
- Code for the paper "Analog Foundation Models" ☆27 · Updated 2 months ago
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression." ☆17 · Updated last year
- Repo hosting code and materials for speeding up LLM inference via token merging. ☆37 · Updated 2 months ago
- Estimating hardware and cloud costs of LLMs and transformer projects ☆20 · Updated this week
- Source code for "BenchPress: A Deep Active Benchmark Generator", PACT 2022 ☆21 · Updated 2 years ago
- ☆78 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated last month