Next-generation Punkt sentence boundary detection with zero dependencies
☆30 · Nov 18, 2025 · Updated 6 months ago
Alternatives and similar repositories for nupunkt
Users who are interested in nupunkt are comparing it to the libraries listed below.
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Aug 15, 2024 · Updated last year
- Combining encoder-based language models ☆11 · Nov 11, 2021 · Updated 4 years ago
- A collection of outcomes and discoveries from our legal AI research projects ☆28 · May 21, 2026 · Updated last week
- ☆28 · Dec 20, 2021 · Updated 4 years ago
- KL3M training data collection and preprocessing ☆22 · Apr 14, 2025 · Updated last year
- Legal Matter Standard Specification (LMSS) library for Python ☆17 · Nov 14, 2023 · Updated 2 years ago
- 🔢 Work with static vector models ☆39 · Apr 21, 2025 · Updated last year
- A Reddit bot that finds original publish dates on linked articles. ☆10 · Nov 30, 2024 · Updated last year
- ☆20 · Jun 11, 2021 · Updated 4 years ago
- Evaluate language models using multiple choice items ☆13 · Mar 6, 2026 · Updated 2 months ago
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as database. ☆20 · Jul 5, 2024 · Updated last year
- ChatGPT with access to the internet ☆25 · Jun 16, 2023 · Updated 2 years ago
- This is a prototype of a Python module for simple modification of document files. ➡️ The project has moved to: https://gitlab.opencode.de… ☆19 · Mar 20, 2026 · Updated 2 months ago
- Searching in-memory corpus with Corpus Query Language (CQL) ☆19 · Dec 2, 2024 · Updated last year
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆18 · Jun 24, 2024 · Updated last year
- noslegal taxonomy facets and release notes ☆43 · Aug 13, 2025 · Updated 9 months ago
- Nearly Inference Free Embeddings: make your RAG queries 500x faster ☆77 · Apr 27, 2026 · Updated last month
- ☆15 · Mar 11, 2024 · Updated 2 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 · Dec 18, 2022 · Updated 3 years ago
- Alternative robots parser module for Python ☆22 · Apr 8, 2026 · Updated last month
- A low-code microservices platform designed for legal engineers. Given a document, Gremlin will apply a series of Python scripts to it and… ☆32 · May 25, 2022 · Updated 4 years ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆59 · May 3, 2024 · Updated 2 years ago
- The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB). ☆38 · Feb 24, 2026 · Updated 3 months ago
- Benchmarks for LLM tokenizers ☆18 · Mar 25, 2026 · Updated 2 months ago
- Plug-and-play document AI with zero-shot models. ☆125 · May 11, 2026 · Updated 2 weeks ago
- Automatically exported from code.google.com/p/transducersaurus ☆11 · Apr 1, 2015 · Updated 11 years ago
- NER model for 10K and 10Q SEC filings ☆14 · Jun 18, 2020 · Updated 5 years ago
- EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem… ☆21 · Mar 24, 2025 · Updated last year
- Jazz Structure Dataset ☆36 · Jul 11, 2024 · Updated last year
- Download client for legal opinions ☆13 · Jan 26, 2025 · Updated last year
- Code and data used to build a training dataset for dragnet models ☆10 · Nov 29, 2020 · Updated 5 years ago
- Create dynamic web scraper in Objective-C or Ruby! ☆24 · Mar 28, 2015 · Updated 11 years ago
- Code and data for the paper "Soft Gazetteers for Low-resource Named Entity Recognition" ☆19 · Nov 3, 2020 · Updated 5 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ➡️ The project has moved to: https://gitlab.opencode… ☆21 · Mar 20, 2026 · Updated 2 months ago
- Apify's reusable GitHub workflows ☆15 · May 21, 2026 · Updated last week
- Tools for formatting large language model prompts. ☆13 · Dec 19, 2023 · Updated 2 years ago
- WIP. A directed graph editor with React, Redux and D3.js ☆11 · Oct 3, 2017 · Updated 8 years ago
- FlexiTokens ☆23 · Dec 27, 2025 · Updated 5 months ago