bigscience-workshop / model_card
☆24Updated 2 years ago
Related projects:
- Hugging Face and Pyserini interoperability☆17Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆26Updated last year
- One-stop shop for all things carp☆58Updated 2 years ago
- A library for squeakily cleaning and filtering language datasets.☆45Updated last year
- Developing tools to automatically analyze datasets☆68Updated 10 months ago
- Using short models to classify long texts☆20Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆66Updated last year
- ☆27Updated last year
- ☆19Updated last year
- Scripts to convert datasets from various sources to Hugging Face Datasets.☆57Updated last year
- Implementation of https://arxiv.org/pdf/2312.09299☆19Updated 2 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆33Updated last year
- Local emulator for Hugging Face Inference Endpoints custom handlers☆24Updated last year
- Supervised instruction finetuning for LLMs with the HF Trainer and DeepSpeed☆32Updated last year
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati…☆33Updated last year
- Embedding Recycling for Language models☆38Updated last year
- URL downloader supporting checkpointing and continuous checksumming.☆19Updated 9 months ago
- Code for our paper "Resources and Evaluations for Multi-Distribution Dense Information Retrieval"☆14Updated 8 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread☆18Updated 5 months ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."☆60Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated last year
- BPE modification that removes intermediate tokens during tokenizer training.☆13Updated last week
- Training and Inference Notebooks for the RedPajama (OpenLlama) models☆18Updated last year
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine☆31Updated 2 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆31Updated 3 months ago
- Tools for merging pretrained large language models.☆19Updated 3 months ago
- ChatGPT Participates in a Computer Science Exam (2023)☆31Updated last year
- Code for the paper "Accessing higher dimensions for unsupervised word translation"☆19Updated last year