fedelopez77 / langdetect
Language detection software
☆56 · Updated 7 years ago
Alternatives and similar repositories for langdetect
Users interested in langdetect are comparing it to the libraries listed below.
- 80x faster and 95% accurate language identification with Fasttext ☆162 · Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆148 · Updated 2 months ago
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ☆59 · Updated last year
- Efficient few-shot learning with cross-encoders. ☆56 · Updated last year
- ☆52 · Updated last year
- Python API for https://vespa.ai, the open big data serving engine ☆137 · Updated this week
- ☆57 · Updated 11 months ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆186 · Updated last month
- ☆62 · Updated last year
- Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023) ☆129 · Updated 11 months ago
- multimodal document analysis ☆164 · Updated last year
- Model implementation for the contextual embeddings project ☆35 · Updated 2 months ago
- Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆252 · Updated 2 years ago
- ☆154 · Updated last year
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 · Updated 4 years ago
- A sentence segmentation library with wide language support optimized for speed and utility. ☆66 · Updated 2 months ago
- A Multilingual Replicable Instruction-Following Model ☆94 · Updated 2 years ago
- Truly flash implementation of DeBERTa's disentangled attention mechanism. ☆63 · Updated 2 weeks ago
- A file utility for accessing both local and remote files through a unified interface. ☆44 · Updated 3 months ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆110 · Updated last year
- This project studies the performance and robustness of language models and task-adaptation methods. ☆151 · Updated last year
- BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages ☆227 · Updated last year
- Source code and data for Like a Good Nearest Neighbor ☆30 · Updated 7 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 · Updated 10 months ago
- Pretraining Efficiently on S2ORC! ☆165 · Updated 10 months ago
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆110 · Updated 2 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆94 · Updated 8 months ago
- Seed Machine Translation Data ☆33 · Updated 9 months ago
- ☆22 · Updated 5 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆21 · Updated last month