DataResponsibly / MirrorDataGenerator
MirrorDataGenerator is a Python tool that generates synthetic data based on user-specified causal relations among the features in the data. It focuses on how features relate to demographic attributes (e.g., gender, race, disability status), which are considered sensitive information in certain domains (e.g., employment, housing).
☆18 · Updated 2 years ago
Related projects:
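The idea in the description can be illustrated with a minimal sketch: sample each feature from its causal parents in topological order, so that a sensitive attribute propagates into downstream features. This is not MirrorDataGenerator's API. The helper `sample_causal_data`, the feature names `group`, `score`, and `selected`, and every coefficient are hypothetical and chosen only for illustration. The sketch assumes NumPy.

```python
import numpy as np


# Hypothetical causal specification (NOT MirrorDataGenerator's actual API):
# each node maps to (list of parent names, function that builds the column).
# Nodes must be listed in topological order so parents are generated first.
def sample_causal_data(spec, n, seed=0):
    """Sample n rows from a DAG of features, visiting nodes in dict order."""
    rng = np.random.default_rng(seed)
    data = {}
    for name, (parents, fn) in spec.items():
        missing = [p for p in parents if p not in data]
        if missing:
            raise ValueError(f"{name} depends on ungenerated parents {missing}")
        # Pass parent columns positionally; rng and n as keywords.
        data[name] = fn(*(data[p] for p in parents), rng=rng, n=n)
    return data


spec = {
    # Sensitive attribute: binary group membership with an assumed 50/50 split.
    "group": ([], lambda rng, n: rng.binomial(1, 0.5, size=n)),
    # Feature causally influenced by the sensitive attribute.
    "score": (["group"],
              lambda g, rng, n: 60 + 5 * g + rng.normal(0, 10, size=n)),
    # Outcome that depends on the feature and, directly, on the sensitive attribute.
    "selected": (["score", "group"],
                 lambda s, g, rng, n: (s + 3 * g + rng.normal(0, 5, size=n) > 65).astype(int)),
}

data = sample_causal_data(spec, n=10_000)
for g in (0, 1):
    mask = data["group"] == g
    print(g, round(data["score"][mask].mean(), 1), round(data["selected"][mask].mean(), 3))
```

Because `group` feeds into `score` and also directly into `selected`, the per-group averages printed at the end differ by construction. Setting the direct `3 * g` term to zero is one way to see how much of the gap in `selected` flows only through `score`.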
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆17 · Updated 6 months ago
- Streamlit app for recommending eval functions using prompt diffs ☆24 · Updated 8 months ago
- This repository implements DSPy programs for tasks in Indian languages ☆11 · Updated 8 months ago
- Search through Facebook Research's PyTorch BigGraph Wikidata dataset with the Weaviate vector search engine ☆31 · Updated 2 years ago
- Stuff related to scraping the Code Review StackExchange ☆11 · Updated last year
- Ludwig benchmark ☆19 · Updated 2 years ago
- Telemetry for applications that use LLM tools. ☆24 · Updated last year
- Code for paper: "Privately generating tabular data using language models". ☆14 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 · Updated last week
- ☆29 · Updated last year
- Causal Agent based on Large Language Model ☆25 · Updated last month
- ChatBot App built using LangChain and Lightning AI ☆17 · Updated last year
- Tools to make language models a bit easier to use ☆22 · Updated last week
- Examples and guides to using Nomic Atlas ☆27 · Updated last week
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆26 · Updated 7 months ago
- Tool to take your ML model from local to production with one line of code. ☆23 · Updated 8 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆12 · Updated 6 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 · Updated 5 months ago
- A logical, reasonably standardized, but flexible project structure for conducting ML research 🍪 ☆14 · Updated last month
- Awesome Orchest projects, both official and submitted by the community. ☆25 · Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" using PyTorch and Zeta ☆13 · Updated last week
- Fastlaw's purpose is to replace generic word embeddings for supervised machine learning NLP tasks on legal texts. ☆36 · Updated 5 years ago
- Writing Blog Posts with Generative Feedback Loops! ☆41 · Updated 6 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Updated 5 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆26 · Updated last year
- Feste is a free and open-source framework allowing scalable composition of NLP tasks using a graph execution model that is optimized and … ☆40 · Updated last year
- Entity resolution, also known as Data Matching or Record linkage is the task of finding a data set that refer to the same or similar real… ☆20 · Updated last month
- Ranking of fine-tuned HF models as base models. ☆35 · Updated last year
- ☆11 · Updated 10 months ago
- Drift detection module for machine learning pipelines. ☆20 · Updated last year