Extract full next-token probabilities via language model APIs
☆248 · Feb 23, 2024 · Updated 2 years ago
Alternatives and similar repositories for openlogprobs
Users who are interested in openlogprobs compare it to the libraries listed below
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles". ☆24 · Mar 25, 2025 · Updated 11 months ago
- utilities for decoding deep representations (like sentence embeddings) back to text ☆1,069 · Dec 27, 2025 · Updated 2 months ago
- Generate textbook-quality synthetic LLM pretraining data ☆509 · Oct 19, 2023 · Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆27 · Nov 30, 2024 · Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 · Dec 28, 2024 · Updated last year
- ☆60 · Mar 8, 2022 · Updated 3 years ago
- What would you do with 1000 H100s... ☆1,154 · Jan 10, 2024 · Updated 2 years ago
- Just a bunch of benchmark logs for different LLMs ☆119 · Jul 28, 2024 · Updated last year
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆39 · Apr 11, 2024 · Updated last year
- Scaling Data-Constrained Language Models ☆342 · Jun 28, 2025 · Updated 8 months ago
- batched loras ☆350 · Sep 6, 2023 · Updated 2 years ago
- ☆50 · Mar 14, 2024 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,915 · Updated this week
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆37 · Jun 10, 2024 · Updated last year
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆44 · Jan 18, 2024 · Updated 2 years ago
- Entropy Based Sampling and Parallel CoT Decoding ☆3,434 · Nov 13, 2024 · Updated last year
- ☆198 · Feb 9, 2024 · Updated 2 years ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆487 · Mar 19, 2024 · Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆695 · Jan 26, 2026 · Updated last month
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆116 · Jun 13, 2024 · Updated last year
- Minimalistic large language model 3D-parallelism training ☆2,579 · Feb 19, 2026 · Updated 2 weeks ago
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks ☆12 · Sep 1, 2023 · Updated 2 years ago
- Bayesian scaling laws for in-context learning. ☆15 · Mar 12, 2025 · Updated 11 months ago
- ☆20 · Nov 4, 2025 · Updated 4 months ago
- ☆20 · Feb 11, 2024 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Jan 17, 2024 · Updated 2 years ago
- Generative Representational Instruction Tuning ☆687 · Jun 25, 2025 · Updated 8 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆161 · Apr 3, 2024 · Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆137 · Mar 14, 2024 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- Indranet Explorer, a simulated browser ☆16 · Nov 12, 2024 · Updated last year
- Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆117 · Mar 16, 2024 · Updated last year
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022) ☆76 · Jul 16, 2022 · Updated 3 years ago
- A domain-specific probabilistic programming language for modeling and inference with language models ☆142 · Apr 29, 2025 · Updated 10 months ago
- ☆75 · Dec 12, 2025 · Updated 2 months ago
- Annotated version of the Mamba paper ☆497 · Feb 27, 2024 · Updated 2 years ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆224 · Dec 16, 2025 · Updated 2 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 · Oct 19, 2024 · Updated last year
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆242 · Nov 3, 2023 · Updated 2 years ago