marcopoli / LLaMAntino-3-ANITA
The 🌟ANITA project🌟 *(Advanced Natural-based interaction for the ITAlian language)* aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.
☆13 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for LLaMAntino-3-ANITA
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ☆46 · Updated 2 years ago
- ☆29 · Updated 9 months ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆42 · Updated last year
- ☆37 · Updated this week
- Shared code for training sentence embeddings with Flax / JAX ☆27 · Updated 3 years ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆91 · Updated this week
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 · Updated 5 months ago
- A Python library aimed at dissecting and augmenting NER training data. ☆56 · Updated last year
- A Python package for running inference with Hugging Face language and vision-language checkpoints, wrapping many convenient features. ☆25 · Updated 2 months ago
- Retrieval-Augmented Generation battle! ☆43 · Updated last week
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 · Updated last year
- A Python package for benchmarking interpretability techniques on Transformers. ☆212 · Updated last month
- Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022) ☆104 · Updated 2 years ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆44 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆92 · Updated last year
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021. ☆20 · Updated last year
- ☆26 · Updated last month
- Knowledge pills on Neural Search ☆25 · Updated last year
- ☆65 · Updated last year
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆72 · Updated 2 years ago
- Benchmarking library for RAG ☆123 · Updated this week
- Tools for managing datasets for governance and training. ☆78 · Updated 3 weeks ago
- A collection of Italian benchmarks for LLM evaluation ☆22 · Updated 3 weeks ago
- ☆45 · Updated 2 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 · Updated last year
- ☆95 · Updated last year
- Creating time-indexed datasets with clusters of texts as inputs and timeseries as targets. ☆16 · Updated 3 weeks ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆183 · Updated last month
- ☆11 · Updated 3 years ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆149 · Updated 4 months ago