RIKEN Center for Advanced Intelligence Project Multimodal Visual Intelligence Team

Team Director: Takayuki Okatani (D.Eng.)

Research Summary

Although AI systems centered on large language models can describe the scenes and events captured in images and video to some extent, they still lack a deep understanding of the real world. We are researching and developing AI that integrates visual information with a variety of other modalities to achieve true real-world comprehension, aiming to solve practical challenges such as bridge and road inspection as well as autonomous driving and driver assistance.

Main Research Fields

Informatics

Related Research Fields

Engineering
Complex Systems
Computer Vision

Keywords

AI and Robotic Technologies for Infrastructure Inspection and Management
Real-World Applications of Multimodal AI

Selected Publications

1. Charoenpitaks, Korawat, Van-Quang Nguyen, Masanori Suganuma, Masahiro Takahashi, Ryoma Niihara, and Takayuki Okatani.:
"Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction"
IEEE Transactions on Intelligent Vehicles, Early Access, 1-11 (2024)
2. Yamane, Tatsuro, Pang-jo Chun, Ji Dang, and Takayuki Okatani.:
"Deep learning-based bridge damage cause estimation from multiple images using visual question answering"
Structure and Infrastructure Engineering 1-14 (2024)
3. Kunlamai, Thannarot, Tatsuro Yamane, Masanori Suganuma, Pang‐Jo Chun, and Takayuki Okatani.:
"Improving visual question answering for bridge inspection by pre‐training with external data of image–text pairs"
Computer‐Aided Civil and Infrastructure Engineering 39, no. 3, 345-361 (2024)
4. Zhang, Jie, Masanori Suganuma, and Takayuki Okatani.:
"Contextual affinity distillation for image anomaly detection"
In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 149-158 (2024)
5. Lu, Xiangyong, Masanori Suganuma, and Takayuki Okatani.:
"SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers"
In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 1123-1133 (2024)
6. Wang, Zhijie, Masanori Suganuma, and Takayuki Okatani.:
"Rethinking unsupervised domain adaptation for semantic segmentation"
Pattern Recognition Letters 186, 119-125 (2024)
7. Aota, Toshimichi, Lloyd Teh Tzer Tong, and Takayuki Okatani.:
"Zero-shot versus many-shot: Unsupervised texture anomaly detection"
In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 5564-5572 (2023)
8. Zhu, Yanjie, Hidehiko Sekiya, Takayuki Okatani, Masayuki Tai, and Shogo Morichika.:
"B-CNN: a deep learning method for accelerometer-based fatigue cracks monitoring system"
Journal of Civil Structural Health Monitoring 13, no. 4, 947-959 (2023)
9. Wang, Zhijie, Xing Liu, Masanori Suganuma, and Takayuki Okatani.:
"Unsupervised domain adaptation for semantic segmentation via cross-region alignment"
Computer Vision and Image Understanding 234 (2023)
10. Nguyen, Van-Quang, Masanori Suganuma, and Takayuki Okatani.:
"GRIT: Faster and better image captioning transformer using dual visual features"
In European Conference on Computer Vision, 167-184 (2022)

Lab Members

Principal investigator

Takayuki Okatani: Team Director

Core members

Quang Van Nguyen: Postdoctoral Researcher
Zhijie Wang: Postdoctoral Researcher
Hidehiko Sekiya: Visiting Scientist
Pang-jo Chun: Visiting Scientist

Contact Information

Aramaki Aza Aoba, Aoba-ku,
980-8579 Sendai, Miyagi,
Japan
Email: takayuki.okatani@riken.jp