RIKEN Center for Advanced Intelligence Project Sound Scene Understanding Team
Team Leader: Kazuyoshi Yoshii (Ph.D.)
Research Summary

The Sound Scene Understanding team develops analysis techniques for various kinds of audio signals, including speech, music, and environmental sounds. Our approach is to formulate physically or theoretically reasonable probabilistic generative models that reflect the characteristics of the target signals, and then to solve the corresponding inverse problem. We tackle real-world problems by integrating Bayesian learning with deep learning.
Main Research Fields
- Computer Science
Related Research Fields
- Engineering
- Mathematics
Research Subjects
- Statistical Audio Signal Processing (Source Separation/Localization, Speech Enhancement)
- Bayesian Learning (Hierarchical Bayes, Nonparametric Bayes)
- Music Information Processing (Source Separation, Automatic Music Transcription)
Selected Publications
Papers marked with an asterisk (*) are based on research conducted outside RIKEN.
- 1. Yoshii, K., Nakamura, E., Itoyama, K., & Goto, M.:
"Infinite Probabilistic Latent Component Analysis For Audio Source Separation"
IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
- 2. Liutkus, A., & Yoshii, K.:
"A Diagonal Plus Low-Rank Covariance Model For Computationally Efficient Source Separation"
IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
- 3. *Wake, M., Bando, Y., Mimura, M., Itoyama, K., Yoshii, K., & Kawahara, T.:
"Semi-Blind Speech Enhancement Based On Recurrent Neural Network For Source Separation And Dereverberation"
IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
- 4. *Mimura, M., Bando, Y., Shimada, K., Sakai, S., Yoshii, K., & Kawahara, T.:
"Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition"
Annual Conference of the International Speech Communication Association (Interspeech), 2017.
- 5. *Nishikimi, R., Nakamura, E., Goto, M., Itoyama, K., & Yoshii, K.:
"Scale- and Rhythm-Aware Musical Note Estimation for Vocal F0 Trajectories Based on a Semi-Tatum-Synchronous Hierarchical Hidden Semi-Markov Model"
International Society for Music Information Retrieval Conference (ISMIR), 2017.
- 6. *Tsushima, H., Nakamura, E., Itoyama, K., & Yoshii, K.:
"Function- and Rhythm-Aware Melody Harmonization Based on Tree-Structured Parsing and Split-Merge Sampling of Chord Sequences"
International Society for Music Information Retrieval Conference (ISMIR), 2017.
- 7. *Itakura, K., Bando, Y., Nakamura, E., Itoyama, K., Yoshii, K., & Kawahara, T.:
"Bayesian Multichannel Nonnegative Matrix Factorization for Audio Source Separation and Localization"
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 551–555, 2017.
- 8. *Yoshii, K., Tomioka, R., Mochihashi, D., & Goto, M.:
"Infinite Positive Semidefinite Tensor Factorization for Source Separation of Mixture Signals"
International Conference on Machine Learning (ICML), pp. 576–584, 2013.
- 9. *Yoshii, K., & Goto, M.:
"A Nonparametric Bayesian Multipitch Analyzer Based on Infinite Latent Harmonic Allocation"
IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 3, pp. 717–730, 2012.
Lab Members
Principal investigator
- Kazuyoshi Yoshii, Team Leader
Core members
- Aditya Arie Nugraha, Research Scientist
- Diego Di Carlo, Postdoctoral Researcher
- Liam Kelley, Intern
- Yoshiaki Bando, Visiting Scientist
- Hidetoshi Shimodaira, Visiting Scientist
- Makoto Yamada, Visiting Scientist
- Yihua Zhu, Research Part-time Worker I
- Yoshiaki Sumura, Research Part-time Worker I
- Momose Oyama, Research Part-time Worker I
- Yoto Fujita, Research Part-time Worker I
Contact Information
Yoshida-honmachi, Sakyo, Kyoto 606-8501
Email: kazuyoshi.yoshii [at] riken.jp