fedelopez77 / langdetect
Language detection software
☆67Updated 8 years ago
Alternatives and similar repositories for langdetect
Users who are interested in langdetect are comparing it to the libraries listed below.
- 80x faster and 95% accurate language identification with Fasttext☆164Updated 2 years ago
- ☆59Updated last year
- Efficient few-shot learning with cross-encoders.☆62Updated last year
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal …☆32Updated 4 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al…☆111Updated 2 years ago
- Code and data releases for the paper -- DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory☆59Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆102Updated last year
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.☆64Updated last year
- ☆58Updated last year
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper.☆12Updated 2 years ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆75Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated 2 years ago
- ☆61Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆69Updated 2 months ago
- multimodal document analysis☆166Updated 2 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).☆82Updated last year
- Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023)☆137Updated last year
- Universal text classifier for generative models☆24Updated last year
- A lightweight Python library for constructing, processing, and visualizing constituent trees.☆68Updated last week
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆22Updated 7 months ago
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models☆86Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models.☆111Updated last year
- A massively multilingual modern encoder language model☆126Updated 3 weeks ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆192Updated 7 months ago
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…☆29Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆49Updated 2 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line☆145Updated 3 months ago
- ☆41Updated last year
- BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages☆229Updated 2 years ago
- ☆44Updated last year