fedelopez77 / langdetect
A language detection software
☆65 Updated 8 years ago
Alternatives and similar repositories for langdetect
Users who are interested in langdetect are comparing it to the libraries listed below.
- 80x faster and 95% accurate language identification with Fasttext ☆163 Updated last year
- ☆59 Updated last year
- Efficient few-shot learning with cross-encoders. ☆60 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆64 Updated last year
- multimodal document analysis ☆166 Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆49 Updated 2 years ago
- Model implementation for the contextual embeddings project ☆37 Updated 6 months ago
- Pretraining Efficiently on S2ORC! ☆178 Updated last year
- Domain-Specific Text Generation for Machine Translation (with LLMs) - scripts and config files for the paper ☆18 Updated 2 years ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆181 Updated last month
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 Updated 3 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 4 years ago
- Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023) ☆134 Updated last year
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆67 Updated 3 months ago
- ☆54 Updated last year
- Seed Machine Translation Data ☆33 Updated last year
- A robust web archive analytics toolkit ☆126 Updated 2 months ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆111 Updated last year
- Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric … ☆79 Updated 2 years ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆189 Updated 5 months ago
- ☆62 Updated last year
- A guide to structured generation using constrained decoding ☆13 Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆27 Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 Updated 2 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 Updated 6 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆111 Updated 2 years ago
- A Multilingual Replicable Instruction-Following Model ☆95 Updated 2 years ago
- Easy modernBERT fine-tuning and multi-task learning ☆63 Updated 5 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆256 Updated 3 years ago