cisnlp / GlotCC
GlotCC Dataset and Pipeline -- NeurIPS 2024
☆20 · Apr 6, 2025 · Updated 10 months ago
Alternatives and similar repositories for GlotCC
Users who are interested in GlotCC are comparing it to the libraries listed below.
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 10 months ago
- GlotWeb: Web Indexing for Low-Resource Languages -- under construction. ☆17 · Aug 13, 2025 · Updated 6 months ago
- ☆10 · Oct 2, 2024 · Updated last year
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs ☆47 · Sep 19, 2025 · Updated 4 months ago
- Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs ☆13 · Feb 13, 2024 · Updated 2 years ago
- The SETimes.HR+ Croatian dependency treebank ☆16 · Dec 27, 2016 · Updated 9 years ago
- Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages" ☆15 · Oct 4, 2024 · Updated last year
- Klexikon: A German Dataset for Joint Summarization and Simplification ☆17 · Oct 5, 2022 · Updated 3 years ago
- A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆27 · Updated this week
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine ☆25 · Dec 20, 2024 · Updated last year
- ☆40 · May 27, 2025 · Updated 8 months ago
- ☆44 · Updated this week
- ☆24 · Aug 13, 2018 · Updated 7 years ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆34 · Apr 1, 2025 · Updated 10 months ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆35 · Jun 13, 2025 · Updated 8 months ago
- cbReader - A simple web-based comic book reader (CBZ/CBR) ☆10 · May 21, 2018 · Updated 7 years ago
- TyDiP Multilingual Politeness dataset and code ☆12 · Oct 15, 2023 · Updated 2 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Apr 2, 2022 · Updated 3 years ago
- Code and data for the WSDM '19 paper "Crosslingual Document Embedding as Reduced-Rank Ridge Regression (Cr5)" ☆30 · Aug 17, 2019 · Updated 6 years ago
- Finite-state script normalization and processing utilities ☆46 · Jan 14, 2026 · Updated last month
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ☆33 · Sep 20, 2025 · Updated 4 months ago
- A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically-rich languages (Czech, … ☆34 · Apr 5, 2019 · Updated 6 years ago
- ☆26 · Feb 6, 2026 · Updated last week
- ☆72 · Jan 29, 2026 · Updated 2 weeks ago
- Cannabis strain information ☆10 · Feb 20, 2016 · Updated 9 years ago
- ☆14 · Dec 5, 2025 · Updated 2 months ago
- Featurize words into orthographic and phonological vectors. ☆41 · May 20, 2023 · Updated 2 years ago
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs ☆14 · Jan 1, 2025 · Updated last year
- A hubot script to perform Service Now API record lookups. ☆10 · Apr 17, 2023 · Updated 2 years ago
- ☆12 · Jun 5, 2019 · Updated 6 years ago
- A benchmark dataset designed to support the development and evaluation of large language models (LLMs) for conversational mental health a… ☆17 · Feb 24, 2025 · Updated 11 months ago
- Open-source Human Feedback Library ☆11 · Oct 25, 2023 · Updated 2 years ago
- Linear Attention for Efficient Bidirectional Sequence Modeling ☆15 · May 13, 2025 · Updated 9 months ago
- A semi print-in-place hand for human-like manipulation, designed to be built by anyone. ☆17 · Jan 5, 2026 · Updated last month
- Experimental framework taking inspiration from biological systems, combining compression-based architectures, group theory, and symmetry … ☆14 · Nov 13, 2025 · Updated 3 months ago
- Crawler based on a modified browser to detect online tracking. ☆11 · Jul 19, 2023 · Updated 2 years ago