commoncrawl/language-detection-cld2

Natural language detection, Java bindings for CLD2

☆17

Alternatives and similar repositories for language-detection-cld2

Users who are interested in language-detection-cld2 are comparing it to the libraries listed below.

commoncrawl / ia-web-commons
Web archiving utility library
☆11 · Jun 19, 2026 · Updated last month
hopsparser / hopsparser
A neural dependency parser that does its best
☆17 · Mar 6, 2026 · Updated 4 months ago
lfoppiano / material-parsers
Material parsers and other tools and scripts, initially developed for Grobid Superconductor
☆14 · Feb 21, 2025 · Updated last year
mbanon / fastspell
Targeted language identifier, based on FastText and Hunspell.
☆38 · Sep 4, 2025 · Updated 10 months ago
sujitpal / polydlot
My attempt to learn more than one Deep Learning framework
☆15 · Apr 7, 2019 · Updated 7 years ago
com3dian / Grobidmonkey
The grobidmonkey package is an open-source package designed for post-processing GROBID outputs.
☆12 · Mar 27, 2024 · Updated 2 years ago
kermitt2 / arxiv_harvester
Poor man's simple harvester for arXiv resources
☆14 · Jul 14, 2023 · Updated 3 years ago
laurentromary / stdfSpec
Specification of a stand-off element for the TEI guidelines
☆12 · Apr 29, 2021 · Updated 5 years ago
italia / daf-kylo
Kylo integration with PDND (previously DAF).
☆19 · Nov 16, 2022 · Updated 3 years ago
viz-rs / radix-tree
A radix tree implementation
☆15 · Sep 22, 2022 · Updated 3 years ago
kermitt2 / biblio-glutton-extension
A browser extension providing Open Access bibliographical services
☆18 · Dec 9, 2022 · Updated 3 years ago
CederGroupHub / MaterialParser
Utility to compile a string of chemical terms into a data structure with chemical formula and composition
☆13 · Sep 17, 2021 · Updated 4 years ago
allenai / bff
☆39 · Apr 17, 2024 · Updated 2 years ago
softcite / softcite_kb
A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources
☆18 · May 14, 2023 · Updated 3 years ago
sinaahmadi / PersoArabicLID
PALI: Language identification for Perso-Arabic Scripts
☆11 · Jul 11, 2023 · Updated 3 years ago
point85 / caliper
Caliper is a project for managing units of measure and the conversions between them.
☆17 · Feb 17, 2026 · Updated 5 months ago
asajadi / wikisim
Concept Representation (Embedding) and Semantic Relatedness
☆15 · Jul 3, 2019 · Updated 7 years ago
quantumbeam / materials-concept-learning
☆15 · Dec 18, 2023 · Updated 2 years ago
BangLab-UdeM-Mila / NLP4MatSci-ACL23
This repository contains the dataset and code for our ACL'23 publication: "MatSci-NLP: Evaluating Scientific Language Models on Materials…
☆17 · Nov 21, 2023 · Updated 2 years ago
vthib / tlsh
Rust port of TLSH
☆14 · Oct 12, 2025 · Updated 9 months ago
neuged / webanno_tsv
A small Python library to parse and write TSV files generated by the WebAnno software.
☆11 · Apr 14, 2025 · Updated last year
josephdviviano / whatsinthebox
Analysis of public NLP corpora
☆11 · Feb 9, 2023 · Updated 3 years ago
carlotorniai / COVID-19-Italy
Repository of data related to the spread of COVID-19 in Italy.
☆20 · Mar 7, 2020 · Updated 6 years ago
scala / sbt-scala-module
sbt plugin for Scala modules.
☆13 · Updated this week
lfoppiano / SuperMat
Superconductors material dataset
☆28 · Dec 5, 2023 · Updated 2 years ago
opencitations / cec
Citation Extraction and Classifier
☆16 · Apr 18, 2026 · Updated 3 months ago
cjcourt / cdesnowball
ChemDataExtractor toolkit updated to include semi-supervised quaternary relationship extraction
☆13 · Feb 8, 2021 · Updated 5 years ago
kermitt2 / grisp
Knowledge Base stuff
☆23 · Mar 1, 2026 · Updated 4 months ago
pravega / flink-tools
A collection of Flink applications for working with Pravega streams
☆12 · Dec 20, 2022 · Updated 3 years ago
ericevenchick / CANtool
A CAN bus tool
☆11 · Jul 13, 2015 · Updated 11 years ago
GaryMcD / rustacean_gpt
Meet Rustacean GPT, an experimental project transforming OpenAI's GPT into a helpful, autonomous software engineer to support senior deve…
☆14 · May 10, 2023 · Updated 3 years ago
oscar-project / goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
☆86 · Apr 21, 2021 · Updated 5 years ago
lfoppiano / grobid-superconductors
Grobid module for superconductor material and properties extraction
☆23 · May 17, 2025 · Updated last year
shayne-longpre / a-pretrainers-guide
☆71 · May 22, 2023 · Updated 3 years ago
CederGroupHub / LimeSoup
LimeSoup is a package to parse HTML or XML papers from different publishers.
☆20 · Jan 4, 2021 · Updated 5 years ago
ntedgi / cld3-kotlin
Bindings to Google's Compact Language Detector 3 for JVM-based languages
☆21 · Jun 2, 2024 · Updated 2 years ago
johneiser / HackDeli
A scripted library of hacking techniques.
☆18 · Jul 18, 2018 · Updated 8 years ago
amacneil / git-banish-large-files
A git pre-receive hook to prevent large files from being committed to your repository
☆25 · Aug 6, 2017 · Updated 8 years ago
YoannDupont / WiNER-fr
WiNER-fr is a free named entity corpus using French Wikinews texts.
☆17 · Feb 12, 2021 · Updated 5 years ago