JosephJeesungSuh / subpop
Official repository for "Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions"
☆16 · Updated last month
Alternatives and similar repositories for subpop:
Users interested in subpop are comparing it to the libraries listed below.
- ☆37 · Updated 5 months ago
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM ☆57 · Updated 10 months ago
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs". ☆13 · Updated 6 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆35 · Updated 11 months ago
- ☆45 · Updated 9 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 3 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 · Updated 9 months ago
- [EMNLP 2024 Main] Virtual Personas for Language Models via an Anthology of Backstories ☆27 · Updated 4 months ago
- Cascade Speculative Drafting ☆29 · Updated 11 months ago
- The official implementation of Cross-Task Experience Sharing (COPS) ☆21 · Updated 5 months ago
- Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆22 · Updated last week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆111 · Updated 3 months ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ☆28 · Updated 4 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆148 · Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆104 · Updated 5 months ago
- Work in progress. ☆50 · Updated 2 weeks ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR). ☆64 · Updated 2 weeks ago
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- ☆50 · Updated 5 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆107 · Updated last month
- Compression for Foundation Models ☆27 · Updated this week
- ☆43 · Updated last year
- ☆113 · Updated last week
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆60 · Updated last year
- ☆195 · Updated 3 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆85 · Updated last week
- Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆35 · Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆166 · Updated 3 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆158 · Updated 8 months ago