RIKEN Center for Advanced Intelligence Project Sound Scene Understanding Team

Team Leader: Kazuyoshi Yoshii (Ph.D.)

Research Summary

The Sound Scene Understanding team develops analysis techniques for various kinds of audio signals, including speech, music, and environmental sounds. Our approach is to formulate physically or theoretically reasonable probabilistic generative models that reflect the characteristics of the target signals, and then to solve the corresponding inverse problem. We tackle real-world problems by integrating Bayesian learning with deep learning.

Main Research Fields

  • Computer Science

Related Research Fields

  • Engineering
  • Mathematics

Research Subjects

  • Statistical Audio Signal Processing (Source Separation/Localization, Speech Enhancement)
  • Bayesian Learning (Hierarchical Bayes, Nonparametric Bayes)
  • Music Information Processing (Source Separation, Automatic Music Transcription)

Selected Publications

Papers with an asterisk (*) are based on research conducted outside of RIKEN.

  • 1. Yoshii, K., Nakamura, E., Itoyama, K., & Goto, M.:
    "Infinite Probabilistic Latent Component Analysis For Audio Source Separation"
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
  • 2. Liutkus, A., & Yoshii, K.:
    "A Diagonal Plus Low-Rank Covariance Model For Computationally Efficient Source Separation"
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
  • 3. *Wake, M., Bando, Y., Mimura, M., Itoyama, K., Yoshii, K., & Kawahara, T.:
    "Semi-Blind Speech Enhancement Based On Recurrent Neural Network For Source Separation And Dereverberation"
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
  • 4. *Mimura, M., Bando, Y., Shimada, K., Sakai, S., Yoshii, K., & Kawahara, T.:
    "Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition"
    Annual Conference of the International Speech Communication Association (Interspeech), 2017.
  • 5. *Nishikimi, R., Nakamura, E., Goto, M., Itoyama, K., & Yoshii, K.:
    "Scale- and Rhythm-Aware Musical Note Estimation for Vocal F0 Trajectories Based on a Semi-Tatum-Synchronous Hierarchical Hidden Semi-Markov Model"
    International Society for Music Information Retrieval Conference (ISMIR), 2017.
  • 6. *Tsushima, H., Nakamura, E., Itoyama, K., & Yoshii, K.:
    "Function- and Rhythm-Aware Melody Harmonization Based on Tree-Structured Parsing and Split-Merge Sampling of Chord Sequences"
    International Society for Music Information Retrieval Conference (ISMIR), 2017.
  • 7. *Itakura, K., Bando, Y., Nakamura, E., Itoyama, K., Yoshii, K., & Kawahara, T.:
    "Bayesian Multichannel Nonnegative Matrix Factorization for Audio Source Separation and Localization"
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 551–555, 2017.
  • 8. *Yoshii, K., Tomioka, R., Mochihashi, D., & Goto, M.:
    "Infinite Positive Semidefinite Tensor Factorization for Source Separation of Mixture Signals"
    International Conference on Machine Learning (ICML), pp. 576–584, 2013.
  • 9. *Yoshii, K., & Goto, M.:
    "A Nonparametric Bayesian Multipitch Analyzer Based on Infinite Latent Harmonic Allocation"
    IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 3, pp. 717–730, 2012.

Lab Members

Principal investigator

Kazuyoshi Yoshii
Team Leader

Core members

Aditya Arie Nugraha
Research Scientist
Diego Di Carlo
Postdoctoral Researcher
Yoshiaki Bando
Visiting Scientist
Hidetoshi Shimodaira
Visiting Scientist
Makoto Yamada
Visiting Scientist
Yihua Zhu
Research Part-time Worker I
Yoshiaki Sumura
Research Part-time Worker I
Momose Oyama
Research Part-time Worker I
Yoto Fujita
Research Part-time Worker I

Contact Information

Yoshida-honmachi, Sakyo, Kyoto 606-8501
Email: kazuyoshi.yoshii [at] riken.jp
