
RIKEN Center for Advanced Intelligence Project Sound Scene Understanding Team

Team Leader: Kazuyoshi Yoshii (Ph.D.)

Research Summary

Kazuyoshi Yoshii (Ph.D.)

The Sound Scene Understanding Team develops analysis techniques for various kinds of audio signals, including speech, music, and environmental sounds. Our approach is to formulate physically or theoretically reasonable probabilistic generative models that reflect the characteristics of the target signals, and to solve the resulting inverse problem. We tackle real-world problems by integrating Bayesian learning with deep learning.

Research Subjects:

  • Statistical Audio Signal Processing (Source Separation/Localization, Speech Enhancement)
  • Bayesian Learning (Hierarchical Bayes, Nonparametric Bayes)
  • Music Information Processing (Source Separation, Automatic Music Transcription)

Main Research Fields

  • Computer Science

Related Research Fields

  • Engineering
  • Mathematics

Selected Publications

  • 1. Yoshiaki Sumura, Diego Di Carlo, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii.:
    "Joint Audio Source Localization and Separation With Distributed Microphone Arrays Based on Spatially-Regularized Multichannel NMF."
    IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 145–149, September 2024.
  • 2. Liam Kelley, Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii.:
    "RIR-in-a-Box: Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation."
    Annual Conference of the International Speech Communication Association (Interspeech), pp. 3255–3259, September 2024.
  • 3. Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii.:
    "Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Direction."
    IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pp. 740–744, April 2024.
  • 4. Aditya Arie Nugraha, Diego Di Carlo, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii.:
    "Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning."
    IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1–5, October 2023.
  • 5. Yoshiaki Bando, Yoshiki Masuyama, Aditya Arie Nugraha, Kazuyoshi Yoshii.:
    "Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation."
    European Signal Processing Conference (EUSIPCO), pp. 51–55, September 2023.
  • 6. Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii.:
    "Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments."
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9266–9273, October 2022.
  • 7. Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii.:
    "Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments."
    Annual Conference of the International Speech Communication Association (Interspeech), pp. 2918–2922, September 2022.
  • 8. Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii.:
    "DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF."
    IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 1–5, September 2022.
  • 9. Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii, Tatsuya Kawahara.:
    "Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation."
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 30, pp. 2368–2382, 2022.
  • 10. Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii.:
    "Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation."
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 30, pp. 1734–1748, 2022.

Lab Members

Principal investigator

Kazuyoshi Yoshii
Team Leader

Core members

Aditya Arie Nugraha
Research Scientist
Diego Di Carlo
Postdoctoral Researcher
Yoshiaki Bando
Visiting Scientist
Hidetoshi Shimodaira
Visiting Scientist
Makoto Yamada
Visiting Scientist
Mathieu Francois Gustave Fontaine
Visiting Scientist
Momose Oyama
Research Part-time Worker I
Yoto Fujita
Research Part-time Worker I
Ryosuke Ono
Research Part-time Worker II
Ryunosuke Nihei
Research Part-time Worker II

Contact Information

Yoshida-honmachi, Sakyo, Kyoto 606-8501
Email: kazuyoshi.yoshii@riken.jp
