UKGovernmentBEIS / as-evaluation-standard
A repository that holds templates, examples, and tests to help external parties submit tasks to AISI that conform with the Autonomous Systems Team's Task Standard
☆11Updated 6 months ago
Alternatives and similar repositories for as-evaluation-standard
Users who are interested in as-evaluation-standard are comparing it to the libraries listed below.
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"☆15Updated 8 months ago
- Package of Pathways-on-Cloud utilities☆17Updated last week
- ☆16Updated 4 months ago
- Playing around with various jailbreaking techniques ahead of the Gray Swan AI Ultimate Jailbreaking Competition☆14Updated 11 months ago
- ☆12Updated 2 weeks ago
- A bot that provides YouTube video chapters on Twitter (a.k.a. X)☆12Updated 7 months ago
- ☆17Updated 10 months ago
- ☆11Updated 2 weeks ago
- A collection of Google Colab notebooks documenting a cruise from Buenos Aires to Antarctica and back through Chile, aboard the Holland Am…☆10Updated 7 months ago
- UNLP 2025 Shared Task on Detecting Social Media Manipulation☆21Updated last month
- A tool to build a graph from a codebase☆14Updated 6 months ago
- MLHub is a collection of impactful machine learning projects designed for learners and enthusiasts in the field of data science. Our goa…☆13Updated last week
- AI agent to handle and respond to customer emails with an internal knowledge base.☆12Updated 4 months ago
- Intelligent tour management system using multiple specialized agents to handle availability queries, cancellations, and reviews through a…☆11Updated 9 months ago
- ☆21Updated 5 months ago
- CodeRepoQA dataset☆11Updated 6 months ago
- Demo tutorial on how to program in Python an autonomous bot that plays the GeoGuessr game, using different Vision LLMs with LangChain☆11Updated 10 months ago
- ☆44Updated 2 weeks ago
- 🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.…☆12Updated 6 months ago
- Official repo of dataset-decomposition paper [NeurIPS 2024]☆19Updated 7 months ago
- Context-Informed Machine Translation of Manga using Multimodal Large Language Models☆11Updated 9 months ago
- Measuring the situational awareness of language models☆38Updated last year
- 2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models (WWW 2025)☆10Updated 4 months ago
- Instagram Automation Tool is a framework that automates various Instagram tasks, including file-based operations and web automation (via …☆16Updated 4 months ago
- Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs (EMNLP 2024)☆14Updated 9 months ago
- ☆31Updated this week
- Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban☆15Updated 2 months ago
- Simulation-based Digital Twin for Production and Logistics Material Flows☆15Updated this week
- ☆106Updated 6 months ago
- Common tools for data processing☆18Updated 2 weeks ago