Nkluge-correa / TeenyTinyLlama
A pair of tiny foundational models trained in Brazilian Portuguese. 🦙🦙
☆34 · Updated 2 months ago
Alternatives and similar repositories for TeenyTinyLlama:
Users interested in TeenyTinyLlama also compare it to the libraries listed below.
- Code for training and evaluating T5 on Portuguese data. ☆86 · Updated 2 years ago
- A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models. ☆67 · Updated 3 weeks ago
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆75 · Updated last year
- Natively pre-trained open-source Portuguese language models. ☆58 · Updated last month
- FaQuAD reading comprehension dataset and related code to reproduce experiments from Sayama et al. (BRACIS 2019). ☆8 · Updated 2 years ago
- Fine-tuning Stanford Alpaca (LLaMA) with Brazilian Portuguese data ☆39 · Updated last year
- Portuguese translation of the GLUE benchmark and Scitail dataset ☆31 · Updated 2 years ago
- Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam. ☆45 · Updated 3 months ago
- Pre-train Static Word Embeddings ☆51 · Updated 3 weeks ago
- Efficient few-shot learning with cross-encoders. ☆50 · Updated last year
- Pretrained segmenter models for Portuguese legislative text. ☆14 · Updated 5 months ago
- Official Implementation of the 'When XGBoost Outperforms GPT-4 on Text Classification: A Case Study' NAACL-W 2024 paper ☆13 · Updated 3 months ago
- 👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆23 · Updated last year
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆62 · Updated last month
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆124 · Updated 4 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Updated last year
- This repository contains an easy, intuitive approach to using SetFit in combination with spaCy. ☆78 · Updated last year
- Using short models to classify long texts ☆21 · Updated 2 years ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆72 · Updated 2 years ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- Nadir: Cutting-edge PyTorch optimizers for simplicity & composability! 🔥🚀💻 ☆14 · Updated 9 months ago
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆166 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆73 · Updated 5 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 · Updated 5 months ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago