GlotCC Dataset and Pipeline -- NeurIPS 2024
★20 · Apr 6, 2025 · Updated 11 months ago
Alternatives and similar repositories for GlotCC
Users that are interested in GlotCC are comparing it to the libraries listed below.
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ★11 · Apr 6, 2025 · Updated 11 months ago
- GlotWeb: Web Indexing for Minority Languages (WWW 2026) ★17 · Feb 27, 2026 · Updated last month
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ★18 · Nov 26, 2023 · Updated 2 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ★12 · Nov 9, 2021 · Updated 4 years ago
- Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages" ★15 · Oct 4, 2024 · Updated last year
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ★18 · Nov 4, 2025 · Updated 4 months ago
- The SETimes.HR+ Croatian dependency treebank ★16 · Dec 27, 2016 · Updated 9 years ago
- A RAG that can scale 🧑🏻‍💻 ★11 · May 28, 2024 · Updated last year
- Klexikon: A German Dataset for Joint Summarization and Simplification ★17 · Oct 5, 2022 · Updated 3 years ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ★37 · Oct 16, 2025 · Updated 5 months ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ★27 · Feb 16, 2026 · Updated last month
- YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings ★13 · May 22, 2025 · Updated 10 months ago
- ★17 · Jan 5, 2023 · Updated 3 years ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ★35 · Sep 20, 2025 · Updated 6 months ago
- ★44 · Feb 11, 2026 · Updated last month
- ★23 · Aug 13, 2018 · Updated 7 years ago
- Sparse Embedding Compression for Scalable Retrieval in Recommender Systems ★35 · Nov 21, 2025 · Updated 4 months ago
- Semantically Search Emojis From the Command Line! ★13 · Nov 26, 2023 · Updated 2 years ago
- A missing piece of the Python multitask (both threads and processes) API: An extension that supports stateful worker pools & size-aware i… ★29 · Mar 8, 2026 · Updated 2 weeks ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ★22 · Jun 30, 2025 · Updated 8 months ago
- A Python module for retrieving script types of writing systems including alphabets, abjads, abugidas, syllabaries, logographs, featurals … ★15 · Jul 19, 2024 · Updated last year
- Open-source Human Feedback Library ★11 · Oct 25, 2023 · Updated 2 years ago
- Fine-tune OpenAI models for text classification, question answering, and more ★17 · May 1, 2023 · Updated 2 years ago
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies ★21 · Oct 24, 2022 · Updated 3 years ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ★34 · Apr 1, 2025 · Updated 11 months ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ★32 · Feb 7, 2025 · Updated last year
- Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition" ★32 · Jun 20, 2023 · Updated 2 years ago
- TyDiP Multilingual Politeness dataset and code ★12 · Oct 15, 2023 · Updated 2 years ago
- Tool for sentiment analysis annotation ★13 · Mar 26, 2025 · Updated last year
- Featurize words into orthographic and phonological vectors. ★42 · May 20, 2023 · Updated 2 years ago
- LVAS-Agent Code Base ★22 · Apr 15, 2025 · Updated 11 months ago
- Finite-state script normalization and processing utilities ★46 · Mar 9, 2026 · Updated 2 weeks ago
- Python library to use Pleias-RAG models ★68 · May 1, 2025 · Updated 10 months ago
- Code and data for the WSDM '19 paper "Crosslingual Document Embedding as Reduced-Rank Ridge Regression (Cr5)" ★30 · Aug 17, 2019 · Updated 6 years ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ★34 · Jun 13, 2025 · Updated 9 months ago
- Keyphrase Extraction Prototypes ★15 · Nov 24, 2016 · Updated 9 years ago
- A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically-rich languages (Czech, … ★34 · Apr 5, 2019 · Updated 6 years ago
- Rhythm analysis toolkit in Python ★13 · Sep 29, 2023 · Updated 2 years ago
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine ★26 · Dec 20, 2024 · Updated last year