RIKEN Center for Advanced Intelligence Project Multimodal Visual Intelligence Team
Team Director: Takayuki Okatani (D.Eng.)
Research Summary

Although AI systems centered on large language models can describe the scenes and events captured in images and video to some extent, they still lack a deep understanding of the real world. We are researching and developing AI that integrates visual information with a variety of other modalities to achieve true real-world comprehension, aiming to solve practical challenges such as bridge and road inspection as well as autonomous driving and driver assistance.
Main Research Fields
- Informatics
Related Research Fields
- Engineering
- Complex Systems
- Computer Vision
Keywords
- AI and Robotic Technologies for Infrastructure Inspection and Management
- Real-World Applications of Multimodal AI
Selected Publications
- 1.
Charoenpitaks, Korawat, Van-Quang Nguyen, Masanori Suganuma, Masahiro Takahashi, Ryoma Niihara, and Takayuki Okatani:
"Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction"
IEEE Transactions on Intelligent Vehicles, Early Access, 1-11 (2024)
- 2.
Yamane, Tatsuro, Pang-jo Chun, Ji Dang, and Takayuki Okatani:
"Deep learning-based bridge damage cause estimation from multiple images using visual question answering"
Structure and Infrastructure Engineering 1-14 (2024)
- 3.
Kunlamai, Thannarot, Tatsuro Yamane, Masanori Suganuma, Pang-Jo Chun, and Takayuki Okatani:
"Improving visual question answering for bridge inspection by pre-training with external data of image-text pairs"
Computer-Aided Civil and Infrastructure Engineering 39, no. 3, 345-361 (2024)
- 4.
Zhang, Jie, Masanori Suganuma, and Takayuki Okatani:
"Contextual affinity distillation for image anomaly detection"
In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 149-158 (2024)
- 5.
Lu, Xiangyong, Masanori Suganuma, and Takayuki Okatani:
"SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers"
In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 1123-1133 (2024)
- 6.
Wang, Zhijie, Masanori Suganuma, and Takayuki Okatani:
"Rethinking unsupervised domain adaptation for semantic segmentation"
Pattern Recognition Letters 186, 119-125 (2024)
- 7.
Aota, Toshimichi, Lloyd Teh Tzer Tong, and Takayuki Okatani:
"Zero-shot versus many-shot: Unsupervised texture anomaly detection"
In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 5564-5572 (2023)
- 8.
Zhu, Yanjie, Hidehiko Sekiya, Takayuki Okatani, Masayuki Tai, and Shogo Morichika:
"B-CNN: a deep learning method for accelerometer-based fatigue cracks monitoring system"
Journal of Civil Structural Health Monitoring 13, no. 4, 947-959 (2023)
- 9.
Wang, Zhijie, Xing Liu, Masanori Suganuma, and Takayuki Okatani:
"Unsupervised domain adaptation for semantic segmentation via cross-region alignment"
Computer Vision and Image Understanding 234 (2023)
- 10.
Nguyen, Van-Quang, Masanori Suganuma, and Takayuki Okatani:
"Grit: Faster and better image captioning transformer using dual visual features"
In European Conference on Computer Vision, 167-184 (2022)
Lab Members
Principal investigator
- Takayuki Okatani, Team Director
Core members
- Quang Van Nguyen, Postdoctoral Researcher
- Zhijie Wang, Postdoctoral Researcher
- Hidehiko Sekiya, Visiting Scientist
- Pang-jo Chun, Visiting Scientist
Contact Information
Aramaki Aza Aoba, Aoba-ku,
980-8579 Sendai, Miyagi,
Japan
Email: takayuki.okatani@riken.jp