GlotWeb: Web Indexing for Minority Languages (WWW 2026)
★17 · Feb 27, 2026 · Updated last month
Alternatives and similar repositories for GlotWeb
Users who are interested in GlotWeb are comparing it to the libraries listed below.
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ★11 · Jan 19, 2024 · Updated 2 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs ★14 · Jan 1, 2025 · Updated last year
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ★11 · Apr 6, 2025 · Updated last year
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ★20 · Apr 6, 2025 · Updated last year
- Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024 ★21 · Mar 29, 2026 · Updated 2 weeks ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ★27 · Feb 16, 2026 · Updated 2 months ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ★12 · Nov 9, 2021 · Updated 4 years ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ★106 · Updated this week
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ★197 · Mar 27, 2026 · Updated 2 weeks ago
- Evaluate language models using multiple choice items ★13 · Mar 6, 2026 · Updated last month
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ★18 · Nov 4, 2025 · Updated 5 months ago
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ★18 · Nov 26, 2023 · Updated 2 years ago
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ★38 · Feb 5, 2026 · Updated 2 months ago
- A Python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer). ★15 · Jun 4, 2024 · Updated last year
- A simple neural truecaser written in PyTorch and AllenNLP. ★33 · Jun 17, 2024 · Updated last year
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ★75 · Apr 1, 2025 · Updated last year
- ★11 · Aug 13, 2024 · Updated last year
- ParCourE - Parallel Corpus Explorer ★12 · Dec 27, 2021 · Updated 4 years ago
- A fast Python implementation of the SimHash algorithm. ★27 · Oct 27, 2021 · Updated 4 years ago
- Data Collection System For NLP/Speech Recognition ★25 · Apr 20, 2021 · Updated 4 years ago
- Work with static vector models ★38 · Apr 21, 2025 · Updated 11 months ago
- A Directory of Online Newspaper Sources for 70+ Languages ★31 · Apr 15, 2021 · Updated 5 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ★64 · Jul 29, 2024 · Updated last year
- A new Turkish Dependency Treebank in UD style ★15 · Aug 17, 2020 · Updated 5 years ago
- ★44 · Feb 11, 2026 · Updated 2 months ago
- Translation of query languages to serialized KoralQuery protocol ★14 · Mar 30, 2026 · Updated 2 weeks ago
- ★12 · Oct 31, 2025 · Updated 5 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ★58 · Feb 3, 2026 · Updated 2 months ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ★12 · Feb 27, 2023 · Updated 3 years ago
- Collection of Common Machine Translation Tools ★11 · Jul 26, 2022 · Updated 3 years ago
- The Flutter MotionPhotos Package to detect and extract the video content from the motion photos by https://ente.io ★18 · Nov 22, 2024 · Updated last year
- Code and data used to build a training dataset for dragnet models ★10 · Nov 29, 2020 · Updated 5 years ago
- ★15 · Jan 10, 2022 · Updated 4 years ago
- Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition" ★32 · Jun 20, 2023 · Updated 2 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ★30 · Apr 2, 2022 · Updated 4 years ago
- Basis of FragDenStaat.de's "Koalitionstracker" ★15 · Jul 14, 2025 · Updated 9 months ago
- Python library for converting between BioNLP formats ★22 · Apr 20, 2023 · Updated 2 years ago
- Terminal tool that converts file encodings to UTF-8 ★10 · Oct 5, 2019 · Updated 6 years ago
- Small string compression using the smaz compression algorithm. Fast, because it's in C. Supports Python 3+ ★13 · Oct 18, 2025 · Updated 5 months ago