✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models
☆36 · updated Oct 1, 2025
Alternatives and similar repositories for wtpsplit-lite
Users who are interested in wtpsplit-lite are comparing it to the libraries listed below.
- suffix array construction and searching algorithms for in-memory binary data. · ☆12 · updated Sep 10, 2022
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models · ☆11 · updated Jan 19, 2024
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. · ☆16 · updated Sep 25, 2024
- German Language Understanding Evaluation Benchmark @NAACL24 · ☆22 · updated Dec 11, 2025
- PathPiece tokenizer · ☆13 · updated Nov 10, 2024
- 🐳 Python GPU adds a minimal install of CUDA and cuDNN on top of the official python:3.x-slim base image · ☆20 · updated Dec 20, 2024
- The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data. · ☆22 · updated Nov 10, 2024
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" · ☆22 · updated Feb 14, 2024
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… · ☆28 · updated Apr 17, 2024
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning · ☆30 · updated Jan 25, 2023
- Small python package to measure OCR quality and other related metrics. · ☆27 · updated Feb 19, 2024
- (no description) · ☆12 · updated Sep 9, 2022
- This is a custom project for WGU; the original project repo is https://github.com/udacity/nd0821-c2-build-model-workflow-starter · ☆12 · updated Feb 1, 2026
- HLTB Proxy API · ☆13 · updated Oct 14, 2025
- (no description) · ☆12 · updated Oct 23, 2020
- Code for the ACL 2022 paper "Contextual Representation Learning beyond Masked Language Modeling" · ☆33 · updated Oct 23, 2022
- (no description) · ☆10 · updated Oct 2, 2024
- Join us to create the first predictive augmentative communication platform for speech-impaired children! · ☆11 · updated Aug 9, 2023
- Rust course · ☆10 · updated May 24, 2025
- LLM Building Blocks for Python Course · ☆15 · updated Nov 17, 2025
- Code that accompanies online course about using ChatGPT for data science · ☆15 · updated May 9, 2023
- Linear Attention for Efficient Bidirectional Sequence Modeling · ☆15 · updated May 13, 2025
- A collection of post-quantum cryptographic algorithms (and emerging standards) implemented in Rust. · ☆16 · updated Jul 18, 2025
- TSDG: An efficient index graph for graph-based nearest neighbor search · ☆10 · updated Jul 14, 2022
- An implementation of faster-rcnn for people detection in python · ☆10 · updated Apr 30, 2019
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation · ☆48 · updated Oct 20, 2025
- Code for the paper "Query-Key Normalization for Transformers" · ☆52 · updated Mar 6, 2021
- (no description) · ☆14 · updated Mar 5, 2024
- 0-Shot Tokenizer Transplant · ☆14 · updated May 16, 2025
- (no description) · ☆10 · updated Oct 15, 2020
- Seminar: intro to deep learning with tensorflow · ☆13 · updated Jun 27, 2017
- An icon-based speech communicator for Disabled Children - Android App - Alternative communication (AAC) · ☆10 · updated Dec 23, 2021
- Quantum Machine Learning · ☆10 · updated Jan 19, 2023
- An experiment, a playground, a sandbox, a toy: LLMs judging code. · ☆10 · updated Jan 28, 2025
- A library for language transfer methods and algorithms. · ☆16 · updated Feb 6, 2026
- The source code for Fullstack Svelte Course · ☆12 · updated Jul 22, 2021
- Collection of descriptions of concepts, procedures, and simple XSLT files for text processing, e.g. simplify InDesign documents (.idml) to… · ☆12 · updated Jan 9, 2020
- Solutions to the complete set of assignment problems which I did while taking the Computational Physics course by Prof. Manish Jain at IIS… · ☆11 · updated May 30, 2021
- Simple and clean Python implementation of TextRank as per the seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… · ☆11 · updated Jan 26, 2021