catalina17/XFlow

Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)

☆31

Alternatives and similar repositories for XFlow

Users who are interested in XFlow are comparing it to the libraries listed below.

itaigat / removing-bias-in-multi-modal-classifiers
☆34 • Jan 5, 2021 • Updated 5 years ago
Hunter-P / 2020-KDD-Cup-Multimodalities-Recall
A lightweight solution that everyone can understand
☆15 • Jul 10, 2020 • Updated 6 years ago
juntang-zhuang / explain_invertible
repo for "Decision explanation and feature importance for invertible networks"
☆14 • Nov 13, 2019 • Updated 6 years ago
fmenat / multiviewRS-models
List of deep learning models proposed for remote sensing (RS) multi-view data
☆16 • May 5, 2026 • Updated 2 months ago
narVidhai / Speech-Transcription-Benchmarking
Example Python scripts to evaluate various ASR methods
☆11 • Dec 22, 2021 • Updated 4 years ago
smallflyingpig / learning-to-fool-the-speaker-recognition
code for paper "learning to fool the speaker recognition"
☆10 • Jun 12, 2020 • Updated 6 years ago
ms-dot-k / Multi-head-Visual-Audio-Memory
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI 2022)
☆27 • Mar 9, 2024 • Updated 2 years ago
ZhilZheng / Lr-LiVAE
TensorFlow implementation of Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions (CVPR 2019)
☆31 • Nov 5, 2019 • Updated 6 years ago
GussailRaat / EMNLP-18-MMMU-BA
Contextual Inter-modal Attention for Multi-modal Sentiment Analysis
☆11 • Feb 24, 2021 • Updated 5 years ago
ms-dot-k / Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV 2021)
☆22 • Apr 11, 2022 • Updated 4 years ago
CQU-BITS / MCN-main
The implementation of Multiplication Convolutional Networks (MCN) in PyTorch.
☆22 • May 17, 2024 • Updated 2 years ago
pandayuanyu / HCFusion
Official code for Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention
☆20 • Jul 16, 2025 • Updated last year
Li-Sanze / ID-Card
Given the front and back of an ID card, recognize all of the text on the card
☆10 • Sep 4, 2019 • Updated 6 years ago
akashe / Multimodal-action-recognition
Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.
☆73 • Jun 7, 2021 • Updated 5 years ago
fanyix / SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
☆14 • Aug 11, 2020 • Updated 5 years ago
TiagoBras / audio-clip-extractor
This utility allows one to cut multiple clips from a single or multiple audio files.
☆19 • Apr 13, 2026 • Updated 3 months ago
woosual / multiModalityFusionForClassification
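Multimodal data fusion: to perform multimodal data fusion, a multi-input network is first trained for classification using the VGG16 network and the CIFAR-10 dataset. Building on VGG16, the first three feature-extraction layers serve as separate feature extractors for the different inputs, the features are concatenated at an intermediate layer, the subsequent convolutional layers extract the fused features, and fully connected layers are added at the end. With minor modifications, the network can simultaneously extract…
☆103 • Sep 25, 2020 • Updated 5 years ago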
yuyq96 / kaldifeat
A light-weight Python library for computing Kaldi-style acoustic features based on NumPy
☆14 • Aug 17, 2020 • Updated 5 years ago
agsarthak / Goal-oriented-Dialogue-Systems
Applying deep reinforcement learning to dialogue generation (i.e., a chatbot)
☆13 • Apr 30, 2017 • Updated 9 years ago
Wojtab / minigpt-4-pipeline
☆16 • Jun 6, 2023 • Updated 3 years ago
duncanplee / CARBayesST
Spatio-Temporal Generalised Linear Mixed Models For Areal Unit Data
☆14 • Jan 16, 2023 • Updated 3 years ago
NoManNayeem / Langchain_CrewAI_Gemini-AI_Agents
Langchain_CrewAI_Gemini - A Gemini AI-powered AI Agent (Multi-Agent) Project.
☆14 • Mar 24, 2024 • Updated 2 years ago
otoolej / time_frequency_tracks
Methods to extract tracks from time-frequency distributions; tracks can represent instantaneous frequency (IF) laws
☆10 • May 11, 2016 • Updated 10 years ago
SSahuDS / Lipreading-Using-Mutimodal-Speech-Recognition
Multimodal Speech Recognition for phoneme level prediction using Audio-Visual data from TCDTIMIT dataset implementing RNNs with LSTMs for…
☆15 • Jul 27, 2023 • Updated 2 years ago
arthurhero / deep_fill_2_pytorch
PyTorch implementation of deep fill v2 (original by Jiayu et al.)
☆10 • Jun 26, 2019 • Updated 7 years ago
WisleyWang / DC-AI-LipReading
☆11 • May 31, 2020 • Updated 6 years ago
SenticNet / multimodal-fusion
Attention-based multimodal fusion for sentiment analysis
☆13 • Aug 14, 2018 • Updated 7 years ago
luanshiyinyang / ChineseOCR
End-to-end Chinese scene text recognition.
☆12 • Jun 27, 2022 • Updated 4 years ago
aravindhm / deconvnet_analysis
Code for "Salient Deconvolutional Networks, Aravindh Mahendran, Andrea Vedaldi, ECCV 2016"
☆12 • Sep 28, 2016 • Updated 9 years ago
LeeYongHyeok / DCM_vgg_transformer
Dual cross modality attention audio-visual speech recognition model based on vgg transformer with hybrid CTC/attention architecture using…
☆14 • Jul 2, 2020 • Updated 6 years ago
tyiannak / AUROS
A ROS framework for Audio Analysis
☆12 • Apr 5, 2017 • Updated 9 years ago
AI-NERC-NUPT / MEDM
Entropy Minimization vs. Diversity Maximization for Domain Adaptation
☆15 • Feb 9, 2020 • Updated 6 years ago
atultiwari / LLaVA-Med
Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.
☆10 • Nov 29, 2023 • Updated 2 years ago
AI-Research-BD / Keyword-MLP
Official PyTorch implementation of "Attention-Free Keyword Spotting", Mashrur M. Morshed & Ahmad Omar Ahsan, PML4DC @ ICLR 2022.
☆15 • Nov 5, 2022 • Updated 3 years ago
eliasgoldsztejn95 / PTDRL
Hospital simulator with pedestrians and robot
☆15 • Oct 20, 2024 • Updated last year
luanshiyinyang / MLP
A backpropagation (BP) neural network written from scratch in NumPy, comparing the effects of training techniques such as Dropout and Batch Normalization.
☆10 • Dec 19, 2019 • Updated 6 years ago
yuweiwan / ASR-HMM-DNN
Speech recognition based on deep neural network/hidden Markov model
☆10 • Jun 3, 2020 • Updated 6 years ago
Akella17 / speaker-embedding
A deep neural network for finding text-independent speaker embedding written in TensorFlow and tensorpack
☆10 • Feb 19, 2018 • Updated 8 years ago
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆288 • Dec 3, 2024 • Updated last year