FlexiTokens
☆18 Dec 27, 2025 Updated 2 months ago
Alternatives and similar repositories for flexitokens
Users that are interested in flexitokens are comparing it to the libraries listed below
- ☆21 Jul 21, 2025 Updated 8 months ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training. ☆27 Nov 25, 2024 Updated last year
- ANE accelerated embedding models! ☆20 Dec 11, 2024 Updated last year
- User-friendly viewer for Parquet files ☆10 Mar 7, 2026 Updated 2 weeks ago
- Create a Robust CDN for your Django Project Static Files in this section. This repo is the reference code for the Django + S3 + Cloudfron… ☆11 Sep 8, 2021 Updated 4 years ago
- Python Module implementing SRP ☆12 Jul 29, 2022 Updated 3 years ago
- Fork of Flame repo for training of some new stuff in development ☆19 Updated this week
- Comparison of existing spell checking tools ☆11 Mar 28, 2023 Updated 2 years ago
- Use the React CDN as well as Babel to make a Standalone React app without running `npx create-react-app` ☆12 Mar 22, 2019 Updated 6 years ago
- ☆44 Feb 11, 2026 Updated last month
- Reference code for the AWS S3 section in the Dive into AWS Course. ☆15 Dec 8, 2022 Updated 3 years ago
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet… ☆18 Aug 28, 2024 Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 May 15, 2025 Updated 10 months ago
- Use Django with Docker and Deploy to Heroku. ☆14 Sep 21, 2019 Updated 6 years ago
- benchmarks for LLM tokenizers ☆17 Feb 27, 2026 Updated 3 weeks ago
- DImensionality REduction in JAX ☆26 Nov 21, 2025 Updated 4 months ago
- An MCP tool server that provides stateful, TUI-compatible terminal sessions. ☆14 Feb 3, 2025 Updated last year
- Declaratively set your DNS records with dnsmill, powered by libdns. ☆12 Nov 26, 2025 Updated 3 months ago
- Pre-train Static Word Embeddings ☆95 Sep 9, 2025 Updated 6 months ago
- Automatically exported from code.google.com/p/transducersaurus ☆11 Apr 1, 2015 Updated 10 years ago
- Ukrainian ELECTRA model ☆12 Mar 11, 2023 Updated 3 years ago
- A frontend for your PDS ☆23 Oct 20, 2025 Updated 5 months ago
- ☆69 Mar 17, 2022 Updated 4 years ago
- An implementation of the Pair Adjacent Violators algorithm for isotonic regression in Rust ☆11 Oct 18, 2025 Updated 5 months ago
- copied from http://readable.sourceforge.net ☆10 Mar 13, 2016 Updated 10 years ago
- Viewer for text datasets in formats like HuggingFace, JSONL, etc. ☆15 Feb 25, 2025 Updated last year
- An R package for analyzing linguistic alignment between partners in conversation transcripts ☆14 Jan 30, 2026 Updated last month
- The PyTorch implementation of paper "KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation" ☆15 Jul 4, 2025 Updated 8 months ago
- get grabby with file trees ☆13 Mar 27, 2024 Updated last year
- ☆11 Nov 17, 2018 Updated 7 years ago
- Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … ☆69 Jan 7, 2026 Updated 2 months ago
- A curated list of papers, tools, and resources on Multi-Token Prediction (MTP) and related techniques in Large Language Models (LLMs), Sp… ☆57 Feb 7, 2026 Updated last month
- Code for SaGe subword tokenizer (EACL 2023) ☆28 Nov 30, 2024 Updated last year
- 🚀🤗 A collection of templates for Hugging Face Spaces ☆35 Oct 9, 2023 Updated 2 years ago
- zero shot NER fine tuning ☆14 Mar 17, 2025 Updated last year
- CONFSEC's ComputeNode component of the OpenPCC standard ☆18 Dec 15, 2025 Updated 3 months ago
- Plug-and-play document AI with zero-shot models. ☆125 Feb 16, 2026 Updated last month
- Notebooks and other course materials for Emory QTM 340 (Fall 2022) ☆12 Dec 13, 2022 Updated 3 years ago
- Nanoloop source files for the album "Prime 16" ☆11 Mar 7, 2026 Updated 2 weeks ago