☆104 · Jan 24, 2026 · Updated 2 months ago
Alternatives and similar repositories for infini-gram
Users that are interested in infini-gram are comparing it to the libraries listed below.
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) ☆33 · Jun 19, 2024 · Updated last year
- ☆54 · Sep 26, 2025 · Updated 6 months ago
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- 🖋 Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024 ☆21 · Mar 29, 2026 · Updated 3 weeks ago
- Code for the paper "Closing the Curious Case of Neural Text Degeneration" ☆12 · Apr 9, 2025 · Updated last year
- The geometry of multilingual language model representations (EMNLP 2022). ☆22 · Oct 21, 2022 · Updated 3 years ago
- AllenNLP integration for Shiba: Japanese CANINE model ☆12 · Jun 26, 2021 · Updated 4 years ago
- A python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- ChatGPT Participates in a Computer Science Exam (2023) ☆31 · Mar 21, 2023 · Updated 3 years ago
- Awesome List of Sources of Japanese Censored Words ☆19 · Sep 11, 2022 · Updated 3 years ago
- ☆33 · Feb 11, 2025 · Updated last year
- Code for COLING 2020 Paper ☆13 · Feb 3, 2026 · Updated 2 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆29 · Apr 17, 2024 · Updated 2 years ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆23 · Apr 30, 2025 · Updated 11 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆86 · Jan 12, 2025 · Updated last year
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Oct 1, 2025 · Updated 6 months ago
- ☆19 · Feb 19, 2024 · Updated 2 years ago
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… ☆10 · Nov 4, 2022 · Updated 3 years ago
- Hugging Face and Pyserini interoperability ☆19 · May 18, 2023 · Updated 2 years ago
- This repository contains an extension of fairseq for pixel / visual representations of text for machine translation. ☆37 · Feb 2, 2024 · Updated 2 years ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆227 · Nov 16, 2024 · Updated last year
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆59 · Jan 12, 2023 · Updated 3 years ago
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model ☆18 · Sep 15, 2025 · Updated 7 months ago
- A framework for adversarial attacks against token classification models ☆33 · Nov 6, 2021 · Updated 4 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 · Jun 7, 2025 · Updated 10 months ago
- Implementation for robust ViT and scaled attention ☆21 · Apr 4, 2025 · Updated last year
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ☆66 · Jul 6, 2025 · Updated 9 months ago
- Basic semantic search for a tweet archive ☆59 · Feb 19, 2025 · Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023. ☆20 · Dec 25, 2023 · Updated 2 years ago
- A framework for graph-based dependency parsing. ☆19 · Feb 9, 2022 · Updated 4 years ago
- Code for paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability" ☆15 · Jun 13, 2023 · Updated 2 years ago
- [NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- 🚀 A demonstration of hyperparameter optimization using Optuna for models implemented with AllenNLP. ☆16 · Nov 28, 2020 · Updated 5 years ago
- ☆21 · Jan 15, 2024 · Updated 2 years ago
- Code for NeurIPS 2024 Paper - Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass ☆21 · Aug 22, 2024 · Updated last year
- ☆45 · Feb 11, 2026 · Updated 2 months ago
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning ☆14 · Jun 3, 2025 · Updated 10 months ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated 2 years ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆284 · Jul 11, 2024 · Updated last year