GlotCC Dataset and Pipeline -- NeurIPS 2024
☆20 · Apr 6, 2025 · Updated last year
Alternatives and similar repositories for GlotCC
Users that are interested in GlotCC are comparing it to the libraries listed below.
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Feb 27, 2026 · Updated last month
- (no description) ☆10 · Oct 2, 2024 · Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages" ☆15 · Oct 4, 2024 · Updated last year
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 5 months ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs ☆47 · Sep 19, 2025 · Updated 6 months ago
- Gather pagegraph data from all over the internet ☆30 · Updated this week
- A RAG that can scale ☆11 · May 28, 2024 · Updated last year
- Klexikon: A German Dataset for Joint Summarization and Simplification ☆17 · Oct 5, 2022 · Updated 3 years ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 6 months ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆27 · Feb 16, 2026 · Updated 2 months ago
- YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings ☆13 · May 22, 2025 · Updated 10 months ago
- (no description) ☆17 · Jan 5, 2023 · Updated 3 years ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ☆35 · Sep 20, 2025 · Updated 6 months ago
- Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs ☆13 · Feb 13, 2024 · Updated 2 years ago
- (no description) ☆44 · Feb 11, 2026 · Updated 2 months ago
- (no description) ☆23 · Aug 13, 2018 · Updated 7 years ago
- Sparse Embedding Compression for Scalable Retrieval in Recommender Systems ☆35 · Nov 21, 2025 · Updated 4 months ago
- Semantically Search Emojis From the Command Line! ☆13 · Nov 26, 2023 · Updated 2 years ago
- A missing piece of the Python multitask (both threads and processes) API: An extension that supports stateful worker pools & size-aware i… ☆29 · Mar 8, 2026 · Updated last month
- (no description) ☆25 · Apr 28, 2020 · Updated 5 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆23 · Jun 30, 2025 · Updated 9 months ago
- A Python module for retrieving script types of writing systems including alphabets, abjads, abugidas, syllabaries, logographs, featurals … ☆15 · Jul 19, 2024 · Updated last year
- Open-source Human Feedback Library ☆11 · Oct 25, 2023 · Updated 2 years ago
- Fine-tune OpenAI models for text classification, question answering, and more ☆17 · May 1, 2023 · Updated 2 years ago
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies ☆21 · Oct 24, 2022 · Updated 3 years ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆35 · Apr 1, 2025 · Updated last year
- TyDiP Multilingual Politeness dataset and code ☆12 · Oct 15, 2023 · Updated 2 years ago
- (no description) ☆43 · May 27, 2025 · Updated 10 months ago
- Tool for sentiment analysis annotation ☆13 · Mar 26, 2025 · Updated last year
- Featurize words into orthographic and phonological vectors. ☆42 · May 20, 2023 · Updated 2 years ago
- LVAS-Agent Code Base ☆22 · Apr 15, 2025 · Updated last year
- A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically-rich languages (Czech, … ☆34 · Apr 5, 2019 · Updated 7 years ago
- Keyphrase Extraction Prototypes ☆15 · Nov 24, 2016 · Updated 9 years ago
- Python library to use Pleias-RAG models ☆71 · May 1, 2025 · Updated 11 months ago
- John Langford's original release of Vowpal Wabbit -- a fast online learning algorithm ☆16 · Jul 25, 2017 · Updated 8 years ago
- Cluster paraphrases by word sense ☆12 · Jan 3, 2019 · Updated 7 years ago