stanfordnlp/string2string

String-to-String Algorithms for Natural Language Processing

☆563

Alternatives and similar repositories for string2string

Users who are interested in string2string are comparing it to the libraries listed below.

shrutirij / ocr-post-correction
☆141 • Mar 5, 2024 • Updated 2 years ago
timvieira / vocrf
Variable-order CRFs with structure learning
☆17 • Aug 1, 2024 • Updated last year
bdusell / nondeterministic-stack-rnn
Code for the paper "The Surprising Computational Power of Nondeterministic Stack RNNs" (DuSell and Chiang, 2023)
☆20 • Mar 21, 2024 • Updated 2 years ago
taylorai / galactic
Data cleaning and curation for unstructured text
☆329 • Aug 6, 2024 • Updated last year
HLasse / TextDescriptives
A Python library for calculating a large variety of metrics from text
☆366 • May 5, 2026 • Updated 2 months ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,777 • May 26, 2026 • Updated 2 months ago
norakassner / mlama
☆25 • Jan 22, 2024 • Updated 2 years ago
inseq-team / inseq
Interpretability for sequence generation models 🐛 🔍
☆471 • Apr 25, 2026 • Updated 3 months ago
lyutyuh / structured-span-selector
A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti…
☆21 • Jul 11, 2022 • Updated 4 years ago
salesforce / xgen
Salesforce open-source LLMs with 8k sequence length.
☆727 • Jun 2, 2026 • Updated last month
hscells / pybool_ir
Toolkit for domain-specific information retrieval experimentation
☆19 • May 18, 2026 • Updated 2 months ago
ahmetustun / udapter
UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi…
☆31 • Dec 5, 2022 • Updated 3 years ago
xlhex / dpe
☆22 • Oct 26, 2020 • Updated 5 years ago
ahmetustun / hyperx
☆21 • Dec 5, 2022 • Updated 3 years ago
opinionscience / BERTransfer
A BERT-based application for reusable text classification at scale
☆37 • Jul 23, 2023 • Updated 3 years ago
tlkh / t2t-tuner
Convenient Text-to-Text Training for Transformers
☆18 • Dec 10, 2021 • Updated 4 years ago
cisnlp / mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
☆11 • Jan 19, 2024 • Updated 2 years ago
dlab-berkeley / DIGHUM101-2020
☆20 • Feb 9, 2021 • Updated 5 years ago
dbamman / litbank
Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.
☆378 • Dec 8, 2022 • Updated 3 years ago
lingjzhu / idiolect
Code for Learning idiolectal style variation in online register
☆10 • May 18, 2023 • Updated 3 years ago
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,060 • Updated this week
webis-de / small-text
Active Learning for Text Classification in Python
☆648 • May 24, 2026 • Updated 2 months ago
Pleias / marginalia
☆67 • Mar 4, 2024 • Updated 2 years ago
MaartenGr / BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
☆7,765 • May 13, 2026 • Updated 2 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,237 • Jul 22, 2026 • Updated last week
cisnlp / simalign
[EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆398 • Nov 7, 2023 • Updated 2 years ago
ddangelov / Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
☆3,101 • Nov 14, 2024 • Updated last year
MaartenGr / KeyBERT
Minimal keyword extraction with BERT
☆4,209 • May 13, 2026 • Updated 2 months ago
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,952 • Updated this week
iesl / s-diora
☆12 • Jan 29, 2021 • Updated 5 years ago
facebookresearch / belebele
Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
☆341 • Dec 18, 2024 • Updated last year
suzgunmirac / marnns
MARNNs Can Learn Generalized Dyck Languages
☆12 • Nov 11, 2019 • Updated 6 years ago
emorynlp / seq2seq-corenlp
☆13 • Feb 7, 2023 • Updated 3 years ago
harvardnlp / pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
☆1,133 • Apr 20, 2022 • Updated 4 years ago
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,255 • Oct 16, 2025 • Updated 9 months ago
mixedbread-ai / baguetter
Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…
☆211 • Aug 31, 2024 • Updated last year
abertsch72 / unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
☆1,062 • Mar 7, 2024 • Updated 2 years ago
stephantul / unitoken
Tokenization across languages. Useful as preprocessing for subword tokenization.
☆19 • Feb 7, 2023 • Updated 3 years ago
kabirkhan / recon
Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality …
☆104 • Feb 26, 2024 • Updated 2 years ago