Centers & Labs

Home > Research > Centers & Labs > RIKEN Center for Advanced Intelligence Project > Goal-Oriented Technology Research Group >

RIKEN Center for Advanced Intelligence Project

Sound Scene Understanding Team

Team Leader: Kazuyoshi Yoshii (Ph.D.)

The Sound Scene Understanding Team develops analysis techniques for various kinds of audio signals, including speech, music, and environmental sounds. Our approach is to formulate physically or theoretically grounded probabilistic generative models that reflect the characteristics of the target signals, and then to solve the corresponding inverse problem. We tackle real-world problems by integrating Bayesian learning with deep learning.
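As a minimal illustration of this generative-model view (a sketch for readers, not code from the team), consider nonnegative matrix factorization of a magnitude spectrogram: the model assumes the observation X is approximately W @ H, and "solving the inverse problem" means inferring W and H from X. All names below are hypothetical; the updates are the classic multiplicative rules for KL-divergence NMF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observed" magnitude spectrogram generated from a low-rank
# nonnegative model: X ≈ W @ H (the generative assumption).
F, T, K = 64, 100, 2          # frequency bins, time frames, components
W_true = rng.gamma(2.0, size=(F, K))
H_true = rng.gamma(2.0, size=(K, T))
X = W_true @ H_true

# Inverse problem: infer W and H from X alone via multiplicative
# updates that monotonically decrease the KL divergence.
W = rng.random((F, K)) + 1e-3
H = rng.random((K, T)) + 1e-3
for _ in range(200):
    V = W @ H
    H *= (W.T @ (X / V)) / W.sum(axis=0, keepdims=True).T
    V = W @ H
    W *= ((X / V) @ H.T) / H.sum(axis=1, keepdims=True).T

print(np.abs(X - W @ H).mean())  # reconstruction error, small after fitting
```

In the team's research this basic scheme is extended with Bayesian priors (hierarchical or nonparametric) and combined with deep-learning components, rather than used in this plain form.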

Main Research Field

Computer Science

Related Research Fields

Engineering / Mathematics

Research Subjects

  • Statistical Audio Signal Processing (Source Separation/Localization, Speech Enhancement)
  • Bayesian Learning (Hierarchical Bayes, Nonparametric Bayes)
  • Music Information Processing (Source Separation, Automatic Music Transcription)
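To make the source-separation subject concrete, here is a hedged toy sketch (hypothetical setup, not the team's published method): after fitting an NMF model to a mixture spectrogram, each source estimate is obtained by Wiener-style soft masking, so the component estimates partition the mixture exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mixture of two rank-1 "sources" in the magnitude domain.
F, T = 32, 80
S1 = rng.gamma(2.0, size=(F, 1)) @ rng.gamma(2.0, size=(1, T))
S2 = rng.gamma(2.0, size=(F, 1)) @ rng.gamma(2.0, size=(1, T))
X = S1 + S2

# Fit a 2-component KL-NMF to the mixture (multiplicative updates).
K = 2
W = rng.random((F, K)) + 1e-3
H = rng.random((K, T)) + 1e-3
for _ in range(300):
    V = W @ H
    H *= (W.T @ (X / V)) / W.sum(axis=0, keepdims=True).T
    V = W @ H
    W *= ((X / V) @ H.T) / H.sum(axis=1, keepdims=True).T

# Separate by Wiener-style masking: each component's share of the model
# spectrogram V is used as a soft mask on the observed mixture X.
V = W @ H
estimates = [np.outer(W[:, k], H[k]) / V * X for k in range(K)]

print(np.allclose(sum(estimates), X))  # masks sum to 1, so estimates sum to X
```

The masks sum to one at every time-frequency bin by construction, which is why the estimated sources add back up to the mixture; multichannel and Bayesian variants (as in the publications below) replace this simple model with richer spatial and prior structure.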

Selected Publications

Papers marked with an asterisk (*) are based on research conducted outside of RIKEN.
  1. Yoshii, K., Nakamura, E., Itoyama, K., & Goto, M.:
    "Infinite Probabilistic Latent Component Analysis For Audio Source Separation"
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
  2. Liutkus, A., & Yoshii, K.:
    "A Diagonal Plus Low-Rank Covariance Model For Computationally Efficient Source Separation"
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
  3. *Wake, M., Bando, Y., Mimura, M., Itoyama, K., Yoshii, K., & Kawahara, T.:
    "Semi-Blind Speech Enhancement Based On Recurrent Neural Network For Source Separation And Dereverberation"
    IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
  4. *Mimura, M., Bando, Y., Shimada, K., Sakai, S., Yoshii, K., & Kawahara, T.:
    "Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition"
    Annual Conference of the International Speech Communication Association (Interspeech), 2017.
  5. *Nishikimi, R., Nakamura, E., Goto, M., Itoyama, K., & Yoshii, K.:
    "Scale- and Rhythm-Aware Musical Note Estimation for Vocal F0 Trajectories Based on a Semi-Tatum-Synchronous Hierarchical Hidden Semi-Markov Model"
    International Society for Music Information Retrieval Conference (ISMIR), 2017.
  6. *Tsushima, H., Nakamura, E., Itoyama, K., & Yoshii, K.:
    "Function- and Rhythm-Aware Melody Harmonization Based on Tree-Structured Parsing and Split-Merge Sampling of Chord Sequences"
    International Society for Music Information Retrieval Conference (ISMIR), 2017.
  7. *Itakura, K., Bando, Y., Nakamura, E., Itoyama, K., Yoshii, K., & Kawahara, T.:
    "Bayesian Multichannel Nonnegative Matrix Factorization for Audio Source Separation and Localization"
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 551–555, 2017.
  8. *Yoshii K., Tomioka, R., Mochihashi, D., & Goto M.:
    "Infinite Positive Semidefinite Tensor Factorization for Source Separation of Mixture Signals"
    International Conference on Machine Learning (ICML), pp. 576–584, 2013.
  9. *Yoshii, K., & Goto, M.:
    "A Nonparametric Bayesian Multipitch Analyzer Based on Infinite Latent Harmonic Allocation"
    IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 3, pp. 717–730, 2012.

Contact information

Yoshida-honmachi, Sakyo, Kyoto 606-8501
Tel: +81-(0)75-753-5386
Fax: +81-(0)75-753-5977

Email: yoshii [at] kuis.kyoto-u.ac.jp

Related links
