justinchiu/openlogprobs

Extract full next-token probabilities via language model APIs

☆248

Alternatives and similar repositories for openlogprobs

Users who are interested in openlogprobs compare it to the repositories listed below.

pietrolesci / memorisation-profiles
This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles".
☆24 · Mar 25, 2025 · Updated 11 months ago
vec2text / vec2text
utilities for decoding deep representations (like sentence embeddings) back to text
☆1,069 · Dec 27, 2025 · Updated 2 months ago
VikParuchuri / textbook_quality
Generate textbook-quality synthetic LLM pretraining data
☆509 · Oct 19, 2023 · Updated 2 years ago
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆27 · Nov 30, 2024 · Updated last year
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆562 · Dec 28, 2024 · Updated last year
cgarciae / einop
☆60 · Mar 8, 2022 · Updated 3 years ago
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,154 · Jan 10, 2024 · Updated 2 years ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 · Jul 28, 2024 · Updated last year
hamelsmu / ft-drift
Check for data drift between two OpenAI multi-turn chat jsonl files.
☆39 · Apr 11, 2024 · Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆342 · Jun 28, 2025 · Updated 8 months ago
sabetAI / BLoRA
batched loras
☆350 · Sep 6, 2023 · Updated 2 years ago
euclaise / supertrainer2000
☆50 · Mar 14, 2024 · Updated last year
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,915 · Updated this week
Princeton-SysML / kNNLM_privacy
Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888
☆37 · Jun 10, 2024 · Updated last year
rwightman / genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te…
☆44 · Jan 18, 2024 · Updated 2 years ago
xjdr-alt / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆3,434 · Nov 13, 2024 · Updated last year
Preemo-Inc / text-generation-inference
☆198 · Feb 9, 2024 · Updated 2 years ago
FranxYao / Long-Context-Data-Engineering
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
☆487 · Mar 19, 2024 · Updated last year
marin-community / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆695 · Jan 26, 2026 · Updated last month
ethz-spylab / rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
☆116 · Jun 13, 2024 · Updated last year
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,579 · Feb 19, 2026 · Updated 2 weeks ago
THUNLP-MT / PGRA
Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks
☆12 · Sep 1, 2023 · Updated 2 years ago
aryamanarora / bayesian-laws-icl
Bayesian scaling laws for in-context learning.
☆15 · Mar 12, 2025 · Updated 11 months ago
formll / resolving-scaling-law-discrepancies
☆20 · Nov 4, 2025 · Updated 4 months ago
SolidShen / RIPPLE_official
☆20 · Feb 11, 2024 · Updated 2 years ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Jan 17, 2024 · Updated 2 years ago
ContextualAI / gritlm
Generative Representational Instruction Tuning
☆687 · Jun 25, 2025 · Updated 8 months ago
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆161 · Apr 3, 2024 · Updated last year
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers
☆137 · Mar 14, 2024 · Updated last year
dayal-kalra / low-memory-adam
☆15 · Mar 2, 2025 · Updated last year
cosmicoptima / indranet-explorer
Indranet Explorer, a simulated browser
☆16 · Nov 12, 2024 · Updated last year
jxiw / BiGS
Official repository of Pretraining Without Attention (BiGS). BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆117 · Mar 16, 2024 · Updated last year
neulab / retomaton
PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)
☆76 · Jul 16, 2022 · Updated 3 years ago
probcomp / LLaMPPL
A domain-specific probabilistic programming language for modeling and inference with language models
☆142 · Apr 29, 2025 · Updated 10 months ago
datologyai / luxical
☆75 · Dec 12, 2025 · Updated 2 months ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆497 · Feb 27, 2024 · Updated 2 years ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆224 · Dec 16, 2025 · Updated 2 months ago
TIGER-AI-Lab / StructLM
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
☆76 · Oct 19, 2024 · Updated last year
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆242 · Nov 3, 2023 · Updated 2 years ago