fedelopez77 / langdetect
Language detection software
☆67Updated 8 years ago
Alternatives and similar repositories for langdetect
Users that are interested in langdetect are comparing it to the libraries listed below
- 80x faster and 95% accurate language identification with Fasttext☆164Updated 2 years ago
- ☆59Updated last year
- ☆55Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆64Updated last year
- Efficient few-shot learning with cross-encoders.☆61Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆186Updated 2 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated 2 years ago
- ☆61Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆75Updated last year
- Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023)☆136Updated last year
- Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric …☆79Updated 2 years ago
- multimodal document analysis☆166Updated 2 months ago
- Universal text classifier for generative models☆24Updated last year
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models☆86Updated 2 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal …☆32Updated 4 years ago
- A robust web archive analytics toolkit☆127Updated 3 months ago
- A Python Search Engine for Humans 🥸☆243Updated last month
- Small python package to measure OCR quality and other related metrics.☆26Updated last year
- Pretraining Efficiently on S2ORC!☆179Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆102Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆49Updated 2 years ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆191Updated 6 months ago
- Model implementation for the contextual embeddings project☆40Updated 7 months ago
- Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.☆256Updated 3 years ago
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆68Updated last year
- ☆83Updated 2 months ago
- Python API for https://vespa.ai, the open big data serving engine☆157Updated last week
- Seed Machine Translation Data☆33Updated last year
- SummScreen: A Dataset for Abstractive Screenplay Summarization (ACL 2022)☆41Updated 3 years ago
- The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an …☆33Updated 2 years ago