Yasuhiro MINAMI

Department of Computer and Network Engineering, Professor
Cluster I (Informatics and Computer Engineering), Professor
Artificial Intelligence eXploration Research Center, Professor
  • Profile:
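    Graduated from the Department of Electrical Engineering, Faculty of Science and Technology, Keio University in 1986 and completed the doctoral program at the same university's graduate school in 1991. Joined NTT the same year. Visiting researcher at MIT, 1999-2000. Currently a professor at the University of Electro-Communications. Doctor of Engineering. Engaged in research on speech recognition, spoken dialogue processing, and intelligent information processing. Received the Awaya Kiyoshi Science Promotion Award of the Acoustical Society of Japan in 1993, this society's Paper Award in 2003, the Telecom System Technology Award in 2005, the Outstanding Paper Award for the Information Processing Society of Japan 45th Anniversary Commemorative Papers in 2006, and the Telecom System Technology Award in 2008. Member of the IEEE, the Acoustical Society of Japan, the Institute of Electronics, Information and Communication Engineers, and the Information Processing Society of Japan.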

Degree

  • Doctor of Engineering, Keio University

Field Of Study

  • Informatics, Learning support systems
  • Informatics, Intelligent robotics
  • Informatics, Intelligent informatics
  • Informatics, Human interfaces and interactions
  • Humanities & social sciences, Cognitive sciences

Career

  • 01 Jul. 2014
    The University of Electro-Communications, Graduate School of Information Systems
  • 01 Jul. 2013 - 30 Jun. 2014
    Nippon Telegraph and Telephone Corporation, Communication Science Laboratories, Group Leader
  • 01 Mar. 2002 - 30 Jun. 2013
    Nippon Telegraph and Telephone Corporation, Communication Science Laboratories, Senior Researcher
  • 01 Mar. 1996 - 28 Feb. 2002
    Nippon Telegraph and Telephone Corporation, Human Interface Laboratories, Senior Researcher
  • 01 Apr. 1991 - 28 Feb. 1996
    Nippon Telegraph and Telephone Corporation, Human Interface Laboratories

Educational Background

  • 01 Apr. 1988 - 01 Mar. 1991
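    Keio University, Graduate School of Science and Technology, Department of Electrical Engineering
  • 01 Apr. 1986 - 01 Mar. 1988
    Keio University, Graduate School of Science and Technology, Department of Electrical Engineering
  • 01 Apr. 1982 - 01 Mar. 1986
    Keio University, Faculty of Science and Technology, Department of Electrical Engineering
  • 31 Mar. 1981
    Tokyo Metropolitan Mita High School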

Member History

  • 01 Apr. 2015
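    Information Processing Society of Japan, Special Interest Group on Spoken Language Processing (SLP)
  • Aug. 2013
    Secretary, Study Group on Infant Language Development, Society
  • 2011 - 2013
    Councilor, Kansai Chapter, Acoustical Society of Japan
  • 2013
    Committee Member, Kansai Branch, Information Processing Society of Japan, Society
  • 2009 - 2012
    Secretary, Kansai Branch, Information Processing Society of Japan
  • 2011
    Delegate, Acoustical Society of Japan, Society
  • 2011
    Councilor, Acoustical Society of Japan, Society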

Award

  • Mar. 2023
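    The Association for Natural Language Processing
    Outstanding Paper Award, 29th Annual Meeting of the Association for Natural Language Processing, 藤田守太; Yasuhiro Minami
  • Mar. 2022
    The Association for Natural Language Processing
    Analysis of Naming in the Process of Building Common Ground in Dialogue
    Committee Special Award, 28th Annual Meeting of the Association for Natural Language Processing, 齋藤 結; 光田航; 東中竜一; Yasuhiro Minami
    Japan society
  • Mar. 2020
    The Association for Natural Language Processing
    Realizing a High-Performance English Solver for the National Center Test
    Outstanding Paper Award, 26th Annual Meeting of the Association for Natural Language Processing
    Japan society
  • Mar. 2018
    The Association for Natural Language Processing
    Modeling Infant Vocabulary Acquisition with DRQN
    Young Researcher Encouragement Award
  • Mar. 2017
    The Association for Natural Language Processing
    The "Can a Robot Get into the University of Tokyo?" Project: Error Analysis on the Yozemi Center Mock Exam Tasks
    Paper Award, The Association for Natural Language Processing, Takuya Matsuzaki; Hikaru Yokono; Yusuke Miyao; Ai Kawazoe; Yoshinobu Kano; Hayato Kanoh; Satoshi Sato; Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami; Noriko Arai
  • 2014
    Outstanding Paper Award, 2014 Annual Conference of the Japanese Society for Artificial Intelligence (awarded only to the first author, Meguro)
  • 2013
    Outstanding Paper Award, 18th Annual Meeting of the Association for Natural Language Processing
  • 2012
    Outstanding Poster Presentation Award, The Japanese Society of Baby Science
  • 2012
    Commendation from the Director of the NTT Intellectual Property Center
  • 2011
    Outstanding Paper Award, 2011 Annual Conference of the Japanese Society for Artificial Intelligence (awarded only to the first author, Dohsaka)
  • 2010
    Outstanding Paper Award, 2010 Annual Conference of the Japanese Society for Artificial Intelligence (awarded only to the first author, Dohsaka)
  • 2008
    Telecom System Technology Award, Yasuhiro Minami
  • 2008
    COLING Best paper finalist
  • 2007
    NTT Technical Review Feature Article Award
  • 2007
    Commendation from the Director of NTT Communication Science Laboratories, Special Award
  • 2006
    Outstanding Paper Award, Information Processing Society of Japan 45th Anniversary Commemorative Papers "Toward Information Science and Technology 50 Years from Now", Yasuhiro Minami
  • 2005
    Telecom System Technology Award, Yasuhiro Minami
  • 2005
    Commendation from the Director of the NTT Science and Core Technology Laboratory Group, Research and Development Award
  • 2005
    Commendation from the Director of NTT Communication Science Laboratories
  • 2004
    Paper Award, The Institute of Electronics, Information and Communication Engineers, Yasuhiro Minami
  • 1996
    Commendation from the General Manager of the NTT Research and Development Headquarters
  • 1993
    Awaya Kiyoshi Science Promotion Award, Acoustical Society of Japan, Yasuhiro Minami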

Paper

  • Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers
    Keita Kobayashi; Kohei Koyama; Hiromi Narimatsu; Yasuhiro Minami
    13th Edition of the Language Resources and Evaluation Conference, to appear, 14 Jun. 2022, Peer-reviewed
    International conference proceedings, English
  • Probabilistic model using HDP producing vocabularies of Japanese children
    Yasuhiro Minami; Tessei Kobayashi
    Lead, Conference on Interdisciplinary Advances in Statistical Learning, 96-96, Jun. 2022, Peer-reviewed
  • Characteristic of language development mechanism of children in the residential care institution
    Yuka Sakamoto; Yuko Okumura; Yasuhiro Minami; Ryoko Mugitani; Kayoko Ito; Tessei Kobayashi
    International Congress of Psychology, ICP2020, 19 Jul. 2021, Peer-reviewed
    International conference proceedings, English
  • Predicting New Words for Young Japanese Children using Large-scaled Japanese Child Vocabulary Development Database
    Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi; Yuka Sakamoto
    International Congress of Psychology, ICP2020, 19 Jul. 2021, Peer-reviewed
    International conference proceedings, English
  • Using mobile phone data to estimate the relationship between population flow and influenza infection pathways
    Qiushi Chen; Michiko Tsubaki; Yasuhiro Minami; Kazutoshi Fujibayashi; Tetsuro Yumoto; Junzo Kamei; Yuka Yamada; Hidenori Kominato; Hideki Oono; Toshio Naito
    MDPI, International Journal of Environmental Research and Public Health, 18, 14, 2021, Peer-reviewed, This study aimed to analyze population flow using global positioning system (GPS) location data and evaluate influenza infection pathways by determining the relationship between population flow and the number of drugs sold at pharmacies. Neural collective graphical models (NCGMs; Iwata and Shimizu 2019) were applied for 25 cell areas, each measuring 10 km × 10 km, in Osaka, Kyoto, Nara, and Hyogo prefectures to estimate population flow. An NCGM uses a neural network to incorporate the spatiotemporal dependency issue and reduce the estimated parameters. The prescription peaks between several cells with high population flow showed a high correlation with a delay of one to two days or with a seven-day time lag. It was observed that in one cell little of the population flowed to the outside area on weekdays. This observation may have been due to geographical features and undeveloped transportation networks. The number of prescriptions for anti-influenza drugs in that cell remained low during the observation period. The present results indicate that influenza did not spread to areas with undeveloped traffic networks, and the peak number of drug prescriptions arrived with a time lag of several days in areas with a high amount of area-to-area movement due to commuting.
    Scientific journal, English
  • Task Definition and Integration For Scientific-Document Writing Support
    H. Narimatsu; K. Koyama; K. Dohsaka; R. Higashinaka; Y. Minami; H. Taira
    Online: Association for Computational Linguistics, to appear, 18-26, 2021, Peer-reviewed
    International conference proceedings, English
  • Properties of early vocabulary development in Japanese-English bilingual children
    Yuka Sakamoto; Yuko Okumura; Tessei Kobayashi; Yasuhiro Minami
    BCCCD 2020 (Budapest CEU Conference on Cognitive Development), PB-038, 20 Jan. 2020, Peer-reviewed
    International conference proceedings, English
  • Vocabulary Size As Explanatory Variable for Japanese-Speaking Children’s Vocabulary Development
    Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi
    ICPS, to appear, 07 Mar. 2019, Peer-reviewed
    International conference proceedings, English
  • Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
    Yasuhiro Minami; Tessei Kobayashi; Yuko Okumura
    LREC 2018, P26, 10 May 2018, Peer-reviewed
    International conference proceedings, English
  • Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development
    Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi
    LREC 2018, P55, 10 May 2018, Peer-reviewed
    International conference proceedings, English
  • Acquisition of infant-directed speech words in Japanese-speaking children: Analysis using large-scale vocabulary-checklist data
    Yuko OKUMURA; Tessei KOBAYASHI; Yasuhiro MINAMI; Yusuke MORIYAMA
    Interdisciplinary Advances in Statistical Learning, to appear, 28 Jun. 2017, Peer-reviewed
    International conference proceedings, English
  • Word acquisition correlation in Japanese-speaking children using large-scale infant vocabulary development database
    Yasuhiro Minami; Yusuke Moriyama; Tessei Kobayashi; Yuko Okumura
    Interdisciplinary Advances in Statistical Learning, to appear, 28 Jun. 2017, Peer-reviewed
    International conference proceedings, English
  • Acquisition of mental state language in Japanese-speaking children: Analysis using large-scale vocabulary-checklist data
    Yuko OKUMURA; Tessei KOBAYASHI; Yasuhiro MINAMI
    WILD, to appear, 14 Jun. 2017, Peer-reviewed
    International conference proceedings, English
  • Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion
    Toru Nakashika; Yasuhiro Minami
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, SPRINGER INTERNATIONAL PUBLISHING AG, 1-10, Jun. 2017, Peer-reviewed, In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data-pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the data used for the training is limited to the pre-defined sentences, (2) the trained model is only applied to the speaker pair used in the training, and (3) a mismatch in alignment may occur. Although it is generally preferable in VC to not use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize the non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed the conventional non-parallel approach regarding objective and subjective criteria.
    Scientific journal, English
  • Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
    Toru Nakashika; Tetsuya Takiguchi; Yasuhiro Minami
    IEEE Transactions on Audio, Speech and Language Processing, 24, 11, 2045, Oct. 2016, Peer-reviewed
    Scientific journal, English
  • Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
    Toru Nakashika; Yasuhiro Minami
    Proceedings of the 17th Conference of the International Speech Communication Association (Interspeech 2016), 1487-1491, Sep. 2016, Peer-reviewed
    International conference proceedings, English
  • 3WRBM-Based Speech Factor Modeling for Arbitrary-Source and Non-Parallel Voice Conversion
    Toru Nakashika; Yasuhiro Minami
    Interspeech 2016, 1487-1491, Sep. 2016, Peer-reviewed
    International conference proceedings, English
  • Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
    Toru Nakashika; Tetsuya Takiguchi; Yasuhiro Minami
    IEEE/ACM Transactions on Audio, Speech and Language Processing, 23, 3, 1-14, Aug. 2016, Peer-reviewed
    Scientific journal, English
  • 3WRBM-Based Speech Factor Modeling for Arbitrary-Source and Non-Parallel Voice Conversion
    Toru Nakashika; Yasuhiro Minami
    EUSIPCO, 607-611, Aug. 2016, Peer-reviewed
    International conference proceedings, English
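  • The "Can a Robot Get into the University of Tokyo?" Project: Error Analysis on the Yozemi Center Mock Exam Tasks
    Takuya Matsuzaki; Hikaru Yokono; Yusuke Miyao; Ai Kawazoe; Yoshinobu Kano; Hayato Kanoh; Satoshi Sato; Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami; Noriko Arai
    Journal of Natural Language Processing, 23, 1, Jan. 2016, Peer-reviewed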
    Scientific journal, Japanese
  • SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION
    Toru Nakashika; Yasuhiro Minami
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, IEEE, 5530-5534, 2016, Peer-reviewed, In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only speaker-specific information in source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data-pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the data used for the training is limited to the pre-defined sentences, (2) the trained model is only applied to the speaker pair used in the training, and (3) a mismatch in alignment may occur. Although it is thus preferable in VC not to use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize the non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by mixing the two. Our experimental results showed that our approach fell short of the popular conventional GMM-based method that used parallel data, but outperformed the conventional non-parallel approach.
    International conference proceedings, English
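  • A Method for Estimating the Target Age of Texts for Young Children
    Sanae Fujita; Tessei Kobayashi; Yasuhiro Minami; Hiroaki Sugiyama
    Cognitive Studies (認知科学), 22, 4, 1-17, Dec. 2015, Peer-reviewed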
    Scientific journal, Japanese
  • Fluctuating Development of Common Nouns and Predicates in Early Lexical Development: Evidence from Analysis of Large sample Vocabulary Checklist Data in Japanese children
    Tessei Kobayashi; Yasuhiro Minami; Yuko Okumura
    ECDP, to appear, 08 Sep. 2015, Peer-reviewed
    International conference proceedings, English
  • Taking the English exam for the "can a robot get into the University of Tokyo?" project
    Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami
    NTT Technical Review, 13, 7, 01 Jul. 2015, NTT and its research partners are participating in the "Can a robot get into the University of Tokyo?" project run by the National Institute of Informatics, which involves tackling English exams. The artificial intelligence system we developed took a mock test in 2014 and achieved a better-than-human-average score for the first time. This was a notable achievement since English exams require English knowledge and also common sense knowledge that humans take for granted but that computers do not necessarily possess. In this article, we describe how our artificial intelligence system takes on English exams.
    Scientific journal
  • Gender variability of child word-comprehension and -production days
    Yasuhiro Minami; Tessei Kobayashi
    WILD, to be determined, 10 Jun. 2015, Peer-reviewed
    International conference proceedings, English
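  • Generating Response Sentences Using Dependency Relations and Examples for User Utterances on Arbitrary Topics
    Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
    Transactions of the Japanese Society for Artificial Intelligence, The Japanese Society for Artificial Intelligence, 30, 1, 183-194, Jan. 2015, Peer-reviewed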
    Scientific journal, Japanese
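  • Methods for Answering English Questions in the "Can a Robot Get into the University of Tokyo?" Project
    Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami
    NTT Gijutsu Journal (NTT技術ジャーナル), The Telecommunications Association, 27, 4, 63-66, 2015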
    Research institution, Japanese
  • Effects of Conversational Agents on Activation of Communication in Thought-Evoking Multi-Party Dialogues
    Kohji Dohsaka; Ryota Asai; Ryuichiro Higashinaka; Yasuhiro Minami; Eisaku Maeda
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E97D, 8, 2147-2156, Aug. 2014, Peer-reviewed, This paper presents an experimental study that analyzes how conversational agents activate human communication in thought-evoking multi-party dialogues between multi-users and multi-agents. A thought-evoking dialogue is a kind of interaction in which agents act to provoke user thinking, and it has the potential to activate multi-party interactions. This paper focuses on quiz-style multi-party dialogues between two users and two agents as an example of thought-evoking multi-party dialogues. The experimental results revealed that the presence of a peer agent significantly improved user satisfaction and increased the number of user utterances in quiz-style multi-party dialogues. We also found that agents' empathic expressions significantly improved user satisfaction, improved user ratings of the peer agent, and increased the number of user utterances. Our findings should be useful for activating multi-party communications in various applications such as pedagogical agents and community facilitators.
    Scientific journal, English
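  • Correlation Between Word Length and the Timing and Duration of Infants' Vocabulary Acquisition
    Yasuhiro Minami; Tessei Kobayashi
    The Phonetic Society of Japan (音声学会), 17, 3, 44-53, Mar. 2014, Peer-reviewed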
    Scientific journal, Japanese
  • Large-scale collection and analysis of personal question-answer pairs for conversational agents
    Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 8637, 420-433, 2014, Peer-reviewed, In conversation, a speaker sometimes asks questions that relate to another speaker's detailed personality, such as his/her favorite foods and sports. This behavior also appears in conversations with conversational agents; therefore, agents should be developed that can respond to such questions. In previous agents, this was achieved by creating question-answer pairs defined by hand. However, when a small number of persons create the pairs, we cannot know what types of questions are frequently asked. This makes it difficult to know whether the created questions cover frequently asked questions; therefore, such essential question-answer pairs for conversational agents are possibly overlooked. This study analyzes a large number of question-answer pairs for six personae created by many question-generators, with one answer-generator for each persona. The proposed approach allows many questioners to create questions for various personae, enabling us to investigate the types of questions that are frequently asked. A comparison with questions appearing in conversations between humans shows that 50.2% of the questions were contained in our question-answer pairs and the coverage rate was almost saturated with the 20 recruited question-generators.
    International conference proceedings, English
  • OPEN-DOMAIN UTTERANCE GENERATION USING PHRASE PAIRS BASED ON DEPENDENCY RELATIONS
    Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, IEEE, PM2.201, 60-65, 2014, Peer-reviewed, The development of open-domain conversational systems remains difficult since user utterances are too widely varied for such systems to respond appropriately. To address this issue, previous research has retrieved sentences from the web as system utterances by shallow sentence matching with user utterances. However, since the retrieved sentences include the inherent contexts of the document in which the sentences originally appeared, the retrieved sentences have the possibility of containing information that is irrelevant to user utterances. We propose combining two strongly related semantic units (phrase pairs with dependency relations) to create a system utterance. Here, the first semantic unit is the one found in the user utterance and the second semantic unit is the one that has a dependency relation with the first one in a large text corpus. This way, we can guarantee that the generated utterance is related to the input user utterance. Our experiments, which examine the appropriateness of response sentences, show that our proposed method significantly outperforms other retrieval and rule-based approaches.
    International conference proceedings, English
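  • A Vocabulary Search System Adapted to Infant Development
    Yasuhiro Minami; Tessei Kobayashi
    IEICE Transactions on Information and Systems (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J96-D, 10, 2612-2624, Oct. 2013, Peer-reviewed, In this paper, we attempted to build a vocabulary search system to support basic research on infant development and the creation of content for infants. Using large-scale cross-sectional data collected and analyzed with the vocabulary checklist method, the system makes it possible to examine, easily and with high accuracy, when infants learning Japanese acquire which words and to what extent, in terms of both comprehension and production.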
    Scientific journal, Japanese
  • Open-Domain Utterance Generation for Conversational Dialogue Systems Using Web-Scale Dependency Structures
    H. Sugiyama; T. Meguro; R. Higashinaka; Y. Minami
    SIGdial, 22-24, Aug. 2013, Peer-reviewed
    International conference proceedings, English
  • Vocabulary Spurt and Noun Acquisition: Evidence from Longitudinal Data in Japanese-Speaking Children
    T. Kobayashi; Y. Minami; H. Sugiyama
    CLS, Poster, Jun. 2013, Peer-reviewed
    International conference proceedings, English
  • Influence of Predominance in Noun Learning Examined by Period from Comprehending to Producing Words: A Cross-Linguistic Statistical Investigation Using CDI
    Y. Minami; T. Kobayashi
    WILD, Poster, Jun. 2013, Peer-reviewed
    International conference proceedings, English
  • Cross-Linguistic Universality of Word Acquisition Ages in Comprehension and Production
    Y. Minami; T. Kobayashi
    WILD, Poster, Jun. 2013, Peer-reviewed
    International conference proceedings, English
  • Individual Variation of Word Acquisition Age: A Comparison of Japanese- and English-Speaking Infants
    H. Sugiyama; T. Kobayashi; Y. Minami
    WILD, Poster, Jun. 2013, Peer-reviewed
    International conference proceedings, English
  • Word-Class Composition in First 20 Words Predicts Later Word Acquisition Rate
    T. Kobayashi; Y. Minami; H. Sugiyama
    SRCD Biennial Meeting, 3-044 (157), Apr. 2013, Peer-reviewed
    International conference proceedings, English
  • Further Examination of "A New Perspective on the Vocabulary Spurt"
    Tessei Kobayashi; Yasuhiro Minami; Hiroaki Sugiyama
    Baby Science (ベビーサイエンス), 12, 55-58, Mar. 2013, Peer-reviewed
    Scientific journal, Japanese
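  • A New Perspective on the Vocabulary Spurt: Longitudinal Data Analysis of Early Vocabulary Development in Japanese-Learning Children
    Tessei Kobayashi; Yasuhiro Minami; Hiroaki Sugiyama
    Baby Science (ベビーサイエンス), 12, 34-49, Mar. 2013, Peer-reviewed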
    Scientific journal, Japanese
  • Learning to control listening-oriented dialogue using partially observable Markov decision processes
    Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
    ACM Transactions on Speech and Language Processing, 10, 4, 761-769, 2013, Peer-reviewed, Our aim is to build listening agents that attentively listen to their users and satisfy their desire to speak and have themselves heard. This article investigates how to automatically create a dialogue control component of such a listening agent. We collected a large number of listening-oriented dialogues with their user satisfaction ratings and used them to create a dialogue control component that satisfies users by means of Partially Observable Markov Decision Processes (POMDPs). Using a hybrid dialog controller where high-level dialog acts are chosen with a statistical policy and low-level slot values are populated by a wizard, we evaluated our dialogue control method in a Wizard-of-Oz experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This article is the first to verify, by using human users, the usefulness of POMDP-based dialogue control for improving user satisfaction in non-task-oriented dialogue systems.
    Scientific journal, English
  • Differences between Noun and Verb Learning Periods from Comprehension to Production in Early Language Development
    Y. Minami; T. Kobayashi
    BCCCD, 174-174, Jan. 2013, Peer-reviewed
    International conference proceedings, English
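  • Analysis of Listening-Oriented Dialogue and Construction of a Dialogue Control Module Based on the Analysis
    Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
    IPSJ Journal (情報処理学会論文誌), 53, 12, 2787-2801, Dec. 2012, Peer-reviewed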
    Scientific journal, Japanese
  • Vocabulary Spurt and Word-Class Composition: Further Evidence for a Model of Plateaus and Linearity in Early Vocabulary Growth
    T. Kobayashi; Y. Minami; H. Sugiyama
    AMLaP, Poster, Sep. 2012, Peer-reviewed
    International conference proceedings, English
  • Plateaus and Linearity of Early Vocabulary Growth
    Y. Minami; T. Kobayashi; H. Sugiyama
    ISSBD, P3.73, Jul. 2012, Peer-reviewed
    International conference proceedings, English
  • Prediction of Vocabulary Growth Using Local Linearity
    H. Sugiyama; Y. Minami; T. Kobayashi
    ISSBD, P3.67, Jul. 2012, Peer-reviewed
    International conference proceedings, English
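  • Analysis of Listening-Oriented Dialogue and Construction of a Dialogue Control Module Based on the Analysis
    Toyomi Meguro; Ryuichiro Higashinaka; Kohji Dohsaka; Yasuhiro Minami
    IPSJ Journal (情報処理学会論文誌), 52, 11, 2012, Peer-reviewed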
    Scientific journal, Japanese
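  • Estimating Users' Latent Information Demands for Systems That Lead Information-Providing Dialogues
    Hiroaki Sugiyama; Yasuhiro Minami
    IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, 95-A, 1, 79-84, Jan. 2012, Peer-reviewed, This study proposes a new policy for deciding when to present information, based on estimating the user's latent information demands, for systems that present information to users. With this policy, a system can suppress premature information presentation and proactively present information without annoying the user. To verify the contribution of multimodal information to this policy, we first conduct a human-human interaction experiment and analyze how the accuracy of humans' estimates of information demands changes as the available modalities change. The analysis shows that humans rely on the flow of the dialogue when multimodal information is unavailable and on multimodal information when it is available. Based on this result, we propose a model that realizes human-like estimation of information demands and demonstrate its effectiveness through a word-association quiz dialogue experiment designed to elicit users' latent information demands.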
    Scientific journal, Japanese
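  • POMDP Dialogue Control Based on Action Prediction Probabilities Using Trigrams of Dialogue Act Type Sequences
    Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka; Toyomi Meguro; Akira Mori; Eisaku Maeda
    IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, 95-A, 1, 2-15, Jan. 2012, Peer-reviewed, We have been modeling dialogue control with POMDPs for non-task-oriented dialogue. POMDP-based dialogue control generates dialogue sequences that earn large rewards in the short term, but it is not necessarily suited to generating a relatively long, natural dialogue flow. We therefore introduced into the POMDP a new reward that realizes a trade-off between the reward defined in the POMDP and a reward for selecting actions with high prediction probability. In this paper, we use trigram probabilities of dialogue act type sequences as the action prediction probabilities and incorporate them into POMDP-based dialogue control. The proposed method thus realizes dialogue control through a trade-off between the reward defined in the POMDP and the reward based on the trigram action prediction probabilities. It enables dialogue control that considers the two objectives simultaneously, which conventional trigram-based dialogue control could not achieve. It also retains the robustness to recognition errors that characterizes POMDPs. We formulate the proposed method, train the model on real dialogue act type sequence data, and evaluate the method in simulation experiments. To simulate recognition errors in this evaluation, we implemented a dialogue act type recognizer that converts dialogue sentences into dialogue act type sequences and used the recognition tendencies it produced. The experiments confirmed the effectiveness of the proposed method and showed that it is more robust to dialogue act type recognition errors than dialogue control based on trigram probabilities alone.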
    Scientific journal, Japanese
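  • Estimating Difficulty/Ease and Interest/Boredom from Users' Nonverbal Behavior in Dialogues with Anthropomorphic Agents
    Kazuaki Nakamura; Koh Kakusho; Tetsuo Shoji; Michihiko Minoh; Minako Sawaki; Yasuhiro Minami; Eisaku Maeda
    IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, 95-A, 1, 85-96, Jan. 2012, Peer-reviewed, This study targets spoken dialogue between a user and an anthropomorphic agent and aims to estimate, from the user's nonverbal behavior (gaze, facial expression, posture, and hand gestures), whether the user found the dialogue content difficult ("difficulty/ease") and whether the user was interested in it ("interest/boredom"). In spoken dialogue, it generally takes a certain amount of time for the content to be determined and conveyed, so mental states such as difficulty/ease toward that content are also defined over a time interval. Within such an interval, however, situational changes such as speaker/listener turn-taking and shifts in dialogue context occur frequently, and the relationship between mental state and nonverbal behavior at each moment varies widely with the role each situation plays in the dialogue as a whole. It is therefore difficult to estimate difficulty/ease or interest/boredom, which are defined per interval, using moment-by-moment nonverbal behavior as features. We therefore propose estimating difficulty/ease and interest/boredom using features defined per interval rather than per moment, specifically the frequency with which each type of nonverbal behavior occurs. An experiment conducted to confirm the effectiveness of the proposed method yielded an estimation accuracy of about 72%.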
    Scientific journal, Japanese
  • Preference-learning based Inverse Reinforcement Learning for Dialog Control
    Hiroaki Sugiyama; Toyomi Meguro; Yasuhiro Minami
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, ISCA-INT SPEECH COMMUNICATION ASSOC, Mon.P1d.03, 222-225, 2012, Peer-reviewed, Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem that it requires a reward function that is difficult to set appropriately. To set the appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL) that estimates a reward function from dialog sequences and their pairwise preferences, which are calculated with annotated ratings to the sequences. Inverse reinforcement learning finds a reward function, with which a system generates similar sequences to the training ones. This indicates that current IRL supposes that the sequences are equally appropriate for a given task; thus, it cannot utilize the ratings. In contrast, our PIRL can utilize pairwise preferences of the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons between competitive algorithms that have been widely used to realize the dialog control. Our experiments show that our PIRL outperforms the other algorithms and has a potential to be an evaluation simulator of dialog control.
    International conference proceedings, English
  • Multiple Vocabulary Spurts in Japanese Children
    Y. Minami; H. Sugiyama; T. Kobayashi
    IASCL, Poster, Jul. 2011, Peer-reviewed
    International conference proceedings, English
  • Analysis of Vocabulary Spurt from Prediction Performance Evaluation
    H. Sugiyama; T. Kobayashi; Y. Minami
    SRCD, Poster, Mar. 2011, Peer-reviewed
    International conference proceedings, English
  • Dialogue Control by POMDP Using Dialogue Data Statistics
    Yasuhiro Minami; Akira Mori; Toyomi Meguro; Ryuichiro Higashinaka; Kohji Dohsaka; Eisaku Maeda
    Spoken Dialogue Systems Technology and Design, Springer, 163-186, 2011, Peer-reviewed
    Scientific journal, English
  • Information Provision-timing Control for Informational Assistance Robot
    Hiroaki Sugiyama; Yasuhiro Minami
    PROCEEDINGS OF THE 6TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTIONS (HRI 2011), IEEE, 259-260, 2011, Peer-reviewed, This paper proposes an HMM-based user's information demand estimation model for autonomous informational assistance robots to avoid providing information prematurely. The model estimates the user's implicit information demands by predicting a user's next information request using user's head movements. Through a word-association quiz-dialog experiment, our model demonstrated superior prediction performance over the usual HMM-based classifier.
    International conference proceedings, English
  • Unsupervised Clustering of Utterances using Non-parametric Bayesian Methods
    Ryuichiro Higashinaka; Noriaki Kawamae; Kugatsu Sadamitsu; Yasuhiro Minami; Toyomi Meguro; Kohji Dohsaka; Hirohito Inagaki
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, ISCA-INT SPEECH COMMUNICATION ASSOC, 2092-2095, 2011, Peer-reviewed, Unsupervised clustering of utterances can be useful for the modeling of dialogue acts for dialogue applications. Previously, the Chinese restaurant process (CRP), a non-parametric Bayesian method, has been introduced and has shown promising results for the clustering of utterances in dialogue. This paper newly introduces the infinite HMM, which is also a non-parametric Bayesian method, and verifies its effectiveness. Experimental results in two dialogue domains show that the infinite HMM, which takes into account the sequence of utterances in its clustering process, significantly outperforms the CRP. Although the infinite HMM outperformed other methods, we also found that clustering complex dialogue data, such as human-human conversations, is still hard when compared to human-machine dialogues.
    International conference proceedings, English
  • Evaluation of Listening-oriented Dialogue Control Rules based on the Analysis of HMMs
    Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, ISCA-INT SPEECH COMMUNICATION ASSOC, 816-+, 2011, Peer-reviewed, We have been working on listening-oriented dialogues for the purpose of building listening agents. In our previous work [1], we trained hidden Markov models (HMMs) from listening-oriented dialogues (LoDs) between humans, and by analyzing them, discovered a distinguishing dialogue flow of LoD. For example, listeners suppress their information giving and self-disclosure, and instead, increase acknowledgments and questions to elicit speakers' utterances. As an initial step for building listening agents, we decided to create dialogue control rules based on our analysis of the HMMs. We built our rule-based system and compared it with three other systems by a Wizard of Oz (WoZ) experiment. As a result, we found that our rule-based system achieved as much user satisfaction as human listeners.
    International conference proceedings, English
  • Building a conversational model from two-tweets
    Ryuichiro Higashinaka; Noriaki Kawamae; Kugatsu Sadamitsu; Yasuhiro Minami; Toyomi Meguro; Kohji Dohsaka; Hirohito Inagaki
    2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings, 330-335, 2011, Peer-reviewed, The current problem in building a conversational model from Twitter data is the scarcity of long conversations. According to our statistics, more than 90% of conversations in Twitter are composed of just two tweets. Previous work has utilized only conversations lasting longer than three tweets for dialogue modeling so that more than a single interaction can be successfully modeled. This paper verifies, by experiment, that two-tweet exchanges alone can lead to conversational models that are comparable to those made from longer-tweet conversations. This finding leverages the value of Twitter as a dialogue corpus and opens the possibility of better conversational modeling using Twitter data. © 2011 IEEE.
    International conference proceedings, English
  • Wizard of Oz evaluation of listening-oriented dialogue control using POMDP
    Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
    2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings, 318-323, 2011, Peer-reviewed, We have been working on dialogue control for listening agents. In our previous study [1], we proposed a dialogue control method that maximizes user satisfaction using partially observable Markov decision processes (POMDPs) and evaluated it by a dialogue simulation. We found that it significantly outperforms other stochastic dialogue control methods. However, this result does not necessarily mean that our method works as well in real dialogues with human users. Therefore, in this paper, we evaluate our dialogue control method by a Wizard of Oz (WoZ) experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This paper is the first to show the usefulness of POMDP-based dialogue control using human users when the target function is to maximize user satisfaction. © 2011 IEEE.
    International conference proceedings, English
  • 環境知能を実現する統計的対話処理の研究 (特集 20 周年を迎えたコミュニケーション科学)
    南泰浩; 目黒豊美
    NTT技術ジャーナル, 電気通信協会, 23, 9, 10-13, 2011, Peer-reviewed
    Research institution, Japanese
  • Statistical Dialogue Processing for Ambient Intelligence
    Y. Minami; T. Meguro
    NTT Technical Review, 9, 11, 2011, Peer-reviewed
    Research institution, English
  • User-Adaptive Coordination of Agent Communicative Behavior in Spoken Dialogue
    K. Dohsaka; A. Kanemoto; R. Higashinaka; Y. Minami; E. Maeda
    Sigdial, 人工知能学会, 24, 314-321, Sep. 2010, Peer-reviewed
    International conference proceedings, English
  • Modeling User Satisfaction Transitions in Dialogues from Overall Ratings
    R. Higashinaka; Y. Minami; K. Dohsaka; T. Meguro
    Sigdial, 18-27, Sep. 2010, Peer-reviewed
    International conference proceedings, English
  • Learning to Model Domain-Specific Utterance Sequences for Extractive Summarization of Contact Center Dialogues
    R. Higashinaka; Y. Minami; H. Nishikawa; K. Dohsaka; T. Meguro; S. Takahashi; G. Kikui
    COLING, 400-408, Aug. 2010, Peer-reviewed
    International conference proceedings, English
  • FAST SIMILARITY SEARCH ON A LARGE SPEECH DATA SET WITH NEIGHBORHOOD GRAPH INDEXING
    Kazuo Aoyama; Shinji Watanabe; Hiroshi Sawada; Yasuhiro Minami; Naonori Ueda; Kazumi Saito
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, IEEE, 5358-5361, 2010, Peer-reviewed, This paper presents a novel graph-based approach to the problem of quickly finding a speech model acoustically similar to a query model from a large set of speech models. Each speech model in the set is represented by a Gaussian mixture model (GMM), and the dissimilarity from one GMM to another is measured with the Kullback-Leibler divergence (KLD). Conventional pruning techniques based on the triangle inequality for fast similarity search are not available because the model space with a KLD is not a metric space. We propose a search method that is characterized by an index of a degree-reduced nearest neighbor (DRNN) graph. The search method can efficiently find the most similar (closest) GMM to a query by exploring the DRNN graph in a best-first manner. Experimental evaluations on utterance GMM search tasks reveal a significantly low computational cost of the proposed method.
    International conference proceedings, English
  • Issues in Predicting User Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and Prediction Models
    Ryuichiro Higashinaka; Yasuhiro Minami; Kohji Dohsaka; Toyomi Meguro
    SPOKEN DIALOGUE SYSTEMS FOR AMBIENT ENVIRONMENTS, SPRINGER-VERLAG BERLIN, 6392, 48-60, 2010, Peer-reviewed, This paper addresses three important issues in automatic prediction of user satisfaction transitions in dialogues. The first issue concerns the individual differences in user satisfaction ratings and how they affect the possibility of creating a user-independent prediction model. The second issue concerns how to determine appropriate evaluation criteria for predicting user satisfaction transitions. The third issue concerns how to train suitable prediction models. We present our findings for these issues on the basis of the experimental results using dialogue data in two domains.
    International conference proceedings, English
  • Improving HMM-based extractive summarization for multi-domain contact center dialogues
    Ryuichiro Higashinaka; Yasuhiro Minami; Hitoshi Nishikawa; Kohji Dohsaka; Toyomi Meguro; Satoshi Kobashikawa; Hirokazu Masataki; Osamu Yoshioka; Satoshi Takahashi; Genichiro Kikui
    2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, 61-66, 2010, Peer-reviewed, This paper reports the improvements we made to our previously proposed hidden Markov model (HMM) based summarization method for multi-domain contact center dialogues. Since the method relied on Viterbi decoding for selecting utterances to include in a summary, it had the inability to control compression rates. We enhance our method by using the forward-backward algorithm together with integer linear programming (ILP) to enable the control of compression rates, realizing summaries that contain as many domain-related utterances and as many important words as possible within a predefined character length. Using call transcripts as input, we verify the effectiveness of our enhancement. ©2010 IEEE.
    International conference proceedings, English
  • Trigram dialogue control using POMDPs
    Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka; Toyomi Meguro; Eisaku Maeda
    2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, 336-341, 2010, Peer-reviewed, This paper proposes hybrid dialogue control of both trigram and POMDP dialogue controls by extending our proposed method that uses two approaches: automatically acquiring POMDP structures and rewards for target dialogues through Dynamic Bayesian Networks (DBNs) with a large amount of dialogue data and reflecting action predictive probabilities into the POMDP structures. In this extension, we modify the action predictive probabilities to treat trigram dialogue controls. Experimental results show that the proposed method can treat a trigram dialogue control with robustness for erroneous conditions and can simultaneously maximize trigram probability and the dialogue evaluations obtained from users. ©2010 IEEE.
    International conference proceedings, English
  • Effects of Personality Traits on Listening-Oriented Dialogue
    T. Meguro; R. Higashinaka; K. Dohsaka; Y. Minami; H. Isozaki
    IWSDS, 104-107, Dec. 2009, Peer-reviewed
    International conference proceedings, English
  • Dialogue Control Algorithm for Ambient Intelligence Based on Partially Observable Markov Decision Processes
    Y. Minami; A. Mori; T. Meguro; R. Higashinaka; K. Dohsaka; E. Maeda
    IWSDS, 254-263, Dec. 2009, Peer-reviewed
    International conference proceedings, English
  • Transdisciplinary Approach for Constructing Ambient Intelligence Environments
    E. Maeda; Y. Minami; K. Dohsaka; A. Mori
    Ami, 9-12, Nov. 2009, Peer-reviewed
    International conference proceedings, English
  • Effects of Conversational Agents on Human Communication in Thought-Evoking Multi-Party Dialogues
    K. Dohsaka; R. Asai; R. Higashinaka; Y. Minami; E. Maeda
    Sigdial, 219-224, Sep. 2009, Peer-reviewed
    International conference proceedings, English
  • Analysis of Listening-Oriented Dialogue for Building Listening Agents
    T. Meguro; R. Higashinaka; K. Dohsaka; Y. Minami; H. Isozaki
    Sigdial, 124-127, Sep. 2009, Peer-reviewed
    International conference proceedings, English
  • Switching acausal filters for speech modeling
    Yasuhiro Minami; Hirokazu Kameoka
    Machine Learning for Signal Processing XIX - Proceedings of the 2009 IEEE Signal Processing Society Workshop, MLSP 2009, 1-6, 2009, Peer-reviewed, This paper shows a unified model of dynamical systems in speech processing that includes speech recognition and pitch modeling. For this purpose, we propose the use of switching acausal filters (SAFs), which exchange multiple acausal filters. These filters are defined by identical linear dynamical systems that exchange the roles of observation value and system input. This paper describes the formulation of recognition, training, and feature generation methods for SAFs, which can be applied to several previously proposed speech models. As an example, we show that an HMM with dynamic features and our F0 control method can be modeled by the proposed formulation. An HMM synthesis method can also be modeled using the formulations. From these results, we demonstrate the unification capability of SAFs. © 2009 IEEE.
    International conference proceedings, English
  • まっしゅるーむの世界 : 環境知能の実現
    南 泰浩; 堂坂 浩二; 澤木 美奈子; 森 啓; 前田 英作
    ヒューマンインタフェース学会誌 = Journal of Human Interface Society : human interface, ヒューマンインタフェース学会, 10, 2, 109-114, May 2008, Peer-reviewed
    Scientific journal
  • "WHO IS THIS" QUIZ DIALOGUE SYSTEM AND USERS' EVALUATION
    M. Sawaki; Y. Minami; R. Higashinaka; K. Dohsaka; E. Maeda
    2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, IEEE, 149-152, 2008, Peer-reviewed, In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's actions on users. This paper describes the "Who is this" quiz dialogue system and its evaluation by users. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed toy (or CG equivalent). Quizzes are automatically generated from Wikipedia articles, rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of the person network and the relationship between user attributes and interest level.
    International conference proceedings, English
  • Quizmaster Mushrooms: “Who Is This” Quiz Dialogue System
    M. Sawaki; Y. Minami; R. Higashinaka; K. Dohsaka; T. Yamada; T. Matsubayashi; H. Isozaki; E. Maeda
    ICMI demo-session, demo-session, Nov. 2007, Peer-reviewed
    International conference proceedings, English
  • Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
    Takaaki Hori; Chiori Hori; Yasuhiro Minami; Atsushi Nakamura
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 15, 4, 1352-1365, May 2007, Peer-reviewed, This paper proposes a novel one-pass search algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous-speech recognition. In the standard search method with on-the-fly composition, two or more WFSTs are composed during decoding, and a Viterbi search is performed based on the composed search space. With this new method, a Viterbi search is performed based on the first of the two WFSTs. The second WFST is only used to rescore the hypotheses generated during the search. Since this rescoring is very efficient, the total amount of computation required by the new method is almost the same as when using only the first WFST. In a 65k-word vocabulary spontaneous lecture speech transcription task, our proposed method significantly outperformed the standard search method. Furthermore, our method was faster than decoding with a single fully composed and optimized WFST, where our method used only 38% of the memory required for decoding with the single WFST. Finally, we have achieved high-accuracy one-pass real-time speech recognition with an extremely large vocabulary of 1.8 million words.
    Scientific journal, English
  • The World of Mushrooms: Human-Computer Interaction Prototype Systems for Ambient Intelligence
    Yasuhiro Minami; Minako Sawaki; Kohji Dohsaka; Ryuichiro Higashinaka; Kentaro Ishizuka; Hideki Isozaki; Tatsushi Matsubayashi; Masato Miyoshi; Atsushi Nakamura; Takanobu Oba; Hiroshi Sawada; Takeshi Yamada; Eisaku Maeda
    ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, ASSOC COMPUTING MACHINERY, 366-373, 2007, Peer-reviewed, Our new research project called "ambient intelligence" concentrates on the creation of new lifestyles through research on communication science and intelligence integration. It is premised on the creation of such virtual communication partners as fairies and goblins that can be constantly at our side. We call these virtual communication partners mushrooms.
    To show the essence of ambient intelligence, we developed two multimodal prototype systems: mushrooms that watch, listen, and answer questions and a Quizmaster Mushroom. These two systems work in real time using speech, sound, dialogue, and vision technologies.
    We performed preliminary experiments with the Quizmaster Mushroom. The results showed that the system can transmit knowledge to users while they are playing the quizzes.
    Furthermore, through the two mushrooms, we found policies for design effects in multimodal interface and integration.
    International conference proceedings, English
  • Mixture Gaussian HMM-trajectory method using likelihood compensation
    Yasuhiro Minami
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, IEEE, 296-299, 2007, Peer-reviewed, We propose a new speech recognition method (HMM-trajectory method) that generates a speech trajectory from HMMs by maximizing their likelihood while accounting for the relationship between the MFCCs and dynamic MFCCs. One major advantage of this method is that this relationship, ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper improves the recognition performance of the HMM-trajectory method for dealing with mixture Gaussian distributions. While the HMM-trajectory method chooses the Gaussian distribution sequence of the HMM states by selecting the best Gaussian distribution in the state during Viterbi decoding and calculating HMM trajectory likelihood along with the sequence, the proposed method compensates for HMM trajectory likelihood using ordinary HMM likelihood. In speaker-independent speech recognition experiments, the proposed method reduced the error rate by about 10% for the task compared with HMMs, proving its effectiveness for Gaussian mixture components.
    International conference proceedings, English
  • コミュニケーション環境の未来に向けた研究最前線 まっしゅるーむの世界-知能統合の実現に向けて
    南泰浩; 前田英作; 堂坂浩二; 近藤公久; 森啓
    NTT技術ジャーナル, 電気通信協会, 19, 6, 19-21, 2007, Peer-reviewed
    Research institution, Japanese
  • まっしゅるーむの世界――知能統合の実現に向けて
    南泰浩; 前田英作; 堂坂浩二; 近藤公久; 森啓
    NTT技術ジャーナル, 19, 6, 19-22, 2007, Peer-reviewed
    Research institution, Japanese
  • Dynamic assignment of Gaussian components in modelling speech spectra
    Parham Zolfaghari; Hiroko Kato; Yasuhiro Minami; Atsushi Nakamura; Shigeru Katagiri; Roy Patterson
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, SPRINGER, 45, 1-2, 7-19, Nov. 2006, Peer-reviewed, In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract where Gaussian distributions are used to model spectral frequency regions. A mixture of Gaussians (MoG) based parametrisation scheme is used for modelling a smoothed representation of the spectra. This smoothing procedure removes all signal periodicity from the spectra allowing highly natural analysis, manipulation and synthesis of speech. The goal of this parametrisation scheme is to ease the correspondence between the resonant characteristics of the vocal tract and the parametric distributions and to model the spectrum with an appropriate number of parameters. Previously, a maximum likelihood (ML) approach to this parametrisation scheme was introduced. However, this approach has inherent local optima problems. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we propose a new scheme whereby starting with a large number of distributions in the mixture, we systematically reduce their number and re-approximate the densities in the mixture based on a distance criterion. The Kullback-Leibler (KL) distance was found to allow optimal MoG solutions to the spectra. Furthermore, a fitness measure based on KL information is used to provide a figure for estimating the model order in representing formant-like features. The proposed model is subjectively evaluated and is shown to reduce the number of Gaussians with an appreciable loss in the quality of the re-synthesised speech.
    Scientific journal, English
  • 「妖精・妖怪の復権: 新しい「環境知能」像の提案」
    前田英作; 南泰浩; 堂坂浩二
    情報処理, 47, 6, 624-640, Jun. 2006, Peer-reviewed
    Scientific journal, Japanese
  • Speech feature extraction method using subband-based periodicity and nonperiodicity decomposition
    Kentaro Ishizuka; Tomohiro Nakatani; Yasuhiro Minami; Noboru Miyazaki
    Journal of the Acoustical Society of America, 120, 1, 443-452, 2006, Peer-reviewed, This paper proposes a speech feature extraction method that utilizes periodicity and nonperiodicity for robust automatic speech recognition. The method was motivated by the auditory comb filtering hypothesis proposed in speech perception research. The method divides input signals into subband signals, which it then decomposes into their periodic and nonperiodic components using comb filters independently designed in each subband. Both features are used as feature parameters. This representation exploits the robustness of periodicity measurements as regards noise while preserving the overall speech information content. In addition, periodicity is estimated independently in each subband, providing robustness as regards noise spectrum bias. The framework is similar to that of a previous study [Jackson et al., Proc. of Eurospeech. (2003), pp. 2321-2324], which is based on cascade processing motivated by speech production. However, the proposed method differs in its design philosophy, which is based on parallel distributed processing motivated by speech perception. Continuous digit speech recognition experiments in the presence of noise confirmed that the proposed method performs better than conventional methods when the noise in the training and test data sets differs. © 2006 Acoustical Society of America.
    Scientific journal, English
  • 「環境知能シンポジウム2006-知性の森が織りなす未来」開催報告
    堂坂浩二; 南泰浩; 森啓; 近藤公久
    NTT技術ジャーナル, 電気通信協会, 18, 12, 72-76, 2006, Peer-reviewed
    Research institution, Japanese
  • 「環境知能」プロジェクトの進展
    南泰浩; 堂坂浩二; 森啓; 前田英作
    NTT技術ジャーナル, 電気通信協会, 18, 9, 60-64, 2006, Peer-reviewed
    Research institution, Japanese
  • Report on “Ambient Intelligence Symposium 2006 - the Future: A Tapestry Woven from Threads of Intelligence”
    K. Dohsaka; Y. Minami; A. Mori; T. Kondo
    NTT Technical Review, 4, 12, 64-69, 2006, Peer-reviewed
    Research institution, English
  • Step Towards Ambient Intelligence
    E. Maeda; Y. Minami
    NTT Technical Review, 4, 1, 50-55, 2006, Peer-reviewed
    Research institution, English
  • The World of Mushrooms -a Transdisciplinary Approach to Human-Computer Interaction with Ambient Intelligence
    E. Maeda; Y. Minami; M. Miyoshi; M. Sawaki; H. Sawada; A. Nakamura; J. Yamato; T. Yamada; R. Higashinaka
    NTT Technical Review, 4, 12, 17-25, 2006, Peer-reviewed
    Research institution, English
  • Selection of shared-state hidden Markov model structure using Bayesian criterion
    S Watanabe; Y Minami; A Nakamura; N Ueda
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E88D, 1, 1-9, Jan. 2005, Peer-reviewed, A Shared-State Hidden Markov Model (SS-HMM) has been widely used as an acoustic model in speech recognition. In this paper, we propose a method for constructing SS-HMMs within a practical Bayesian framework. Our method derives the Bayesian model selection criterion for the SS-HMM based on the variational Bayesian approach. The appropriate phonetic decision tree structure of the SS-HMM is found by using the Bayesian criterion. Unlike the conventional asymptotic criteria, this criterion is applicable even in the case of an insufficient amount of training data. The experimental results on isolated word recognition demonstrate that the proposed method does not require the tuning parameter that must be tuned according to the amount of training data, and is useful for selecting the appropriate SS-HMM structure for practical use.
    Scientific journal, English
  • 「環境知能」の実現に向けて
    前田英作; 南泰浩
    NTT技術ジャーナル, 電気通信協会, 17, 11, 52-55, 2005, Peer-reviewed
    Research institution, Japanese
  • Fast on-the-Fly Composition for Weighted Finite-State Transducers in 1.8 Million-Word Vocabulary Continuous Speech Recognition
    T. Hori; C. Hori; Y. Minami
    ICSLP, I, 289-292, Oct. 2004, Peer-reviewed
    International conference proceedings, English
  • Improvement in Robustness of Speech Feature Extraction Method Using Sub-Band Based Periodicity and Aperiodicity Decomposition
    K. Ishizuka; N. Miyazaki; T. Nakatani; Y. Minami
    ICSLP, 937-940, Oct. 2004, Peer-reviewed
    International conference proceedings, English
  • A Theoretical Analysis of Speech Recognition Based on Feature Trajectory Models
    Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
    ICSLP, I, 549-552, Oct. 2004, Peer-reviewed
    International conference proceedings, English
  • Variational Bayesian estimation and clustering for speech recognition
    S Watanabe; Y Minami; A Nakamura; N Ueda
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 12, 4, 365-381, Jul. 2004, Peer-reviewed, In this paper, we propose variational Bayesian estimation and clustering for speech recognition (VBEC), which is based on the variational Bayesian (VB) approach. VBEC is a total Bayesian framework: all speech recognition procedures (acoustic modeling and speech classification) are based on VB posterior distribution, unlike the maximum likelihood (ML) approach based on ML parameters. The total Bayesian framework generates two major Bayesian advantages over the ML approach for the mitigation of over-training effects, as it can select an appropriate model structure without any data set size condition, and can classify categories robustly using a predictive posterior distribution. By using these advantages, VBEC: 1) allows the automatic construction of acoustic models along two separate dimensions, namely, clustering triphone hidden Markov model states and determining the number of Gaussians and 2) enables robust speech classification, based on Bayesian predictive classification using VB posterior distributions. The capabilities of the VBEC functions were confirmed in large vocabulary continuous speech recognition experiments for read and spontaneous speech tasks. The experiments confirmed that VBEC automatically constructed accurate acoustic models and robustly classified speech, i.e., totally mitigated the over-training effects with high word accuracies due to the VBEC functions.
    Scientific journal, English
  • Recognition Method with Parametric Trajectory Synthesized Using HMMs
    Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
    SWIM, 776-786, Jan. 2004, Peer-reviewed
    International conference proceedings, English
  • Model selection for mixture of Gaussian based spectral modelling
    P Zolfaghari; H Kato; Y Minami; A Nakamura; S Katagiri
    MACHINE LEARNING FOR SIGNAL PROCESSING XIV, IEEE, 325-334, 2004, Peer-reviewed, In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract. We propose a mixture of Gaussians (MoG) spectral modelling scheme which enables model selection with a goal of easing the correspondence between the resonant characteristics of the vocal tract and the parametric Gaussians and representing a spectrum with an appropriate number of parameters. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we systematically reduce the number of Gaussians and re-approximate the densities in the MoG spectral model. The Kullback-Leibler (KL) distance between the densities in the mixture was found to allow optimal ML-MoG solutions to the spectra. A fitness measure based on KL information provides a figure for estimating the model order in representing formant-like features. The mixture model was fitted to a normalised smooth spectrum obtained by filtering the short-time Fourier transform in time and frequency by a pitch adaptive Gaussian filter. This results in the removal of all source information from the spectra. By subjectively evaluating the quality of the analysed and synthesised speech using this parametrisation scheme, we show considerable improvement over ML using this Gaussian reduction scheme, specifically when using a lower number of Gaussians in the mixture.
    International conference proceedings, English
  • Speech Summarization Using Weighted Finite-State Transducers
    T. Hori; C. Hori; Y. Minami
    Eurospeech, 2817-2820, Sep. 2003, Peer-reviewed
    International conference proceedings, English
  • ベイズ的基準を用いた状態共有型 HMM 構造の選択
    渡部晋治; 南泰浩; 中村篤; 上田修功
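    電子情報通信学会論文誌D, The Institute of Electronics, Information and Communication Engineers, J86-DII, 6, 776-786, Jun. 2003, Peer-reviewed, In shared-state HMMs, which are widely used as acoustic models for speech recognition, it is important to determine the state-sharing structure appropriately. Conventionally, the state-sharing structure, including the total number of states, has been selected on the basis of the maximum likelihood criterion. However, because the likelihood increases monotonically with the total number of states, a threshold has to be set experimentally. Moreover, model selection based on the minimum description length (MDL) criterion or the Bayesian information criterion (BIC), which were introduced to address this problem, is derived using asymptotic theory, so appropriate model selection is difficult when the amount of training data is small. This paper proposes a method that clusters HMM states using a Bayesian criterion that does not assume asymptotic conditions, based on the variational Bayesian method proposed as a deterministic Bayesian computation technique, and that adaptively selects the state-sharing structure and the total number of states according to the training data. The effectiveness of the proposed method was demonstrated through speaker-independent isolated word recognition experiments.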
    Scientific journal, Japanese
  • Paraphrasing Spontaneous Speech Using Weighted Finite-State Transducers
    T. Hori; D. Willett; Y. Minami
    SSPR2003, 219-222, Apr. 2003, Peer-reviewed
    International conference proceedings, English
  • Bayesian Acoustic Modeling for Spontaneous Speech Recognition
    S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
    SSPR, 47-50, Apr. 2003, Peer-reviewed
    International conference proceedings, English
  • Language model adaptation using WFST-based speaking-style translation
    T Hori; D Willett; Y Minami
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 228-231, 2003, Peer-reviewed, This paper describes a new approach to language model adaptation for speech recognition based on the statistical framework of speech translation. The main idea of this approach is to compose a weighted finite-state transducer (WFST) that translates sentence styles from in-domain to out-of-domain. It enables the integration of language models of different styles of speaking or dialects and even of different vocabularies. The WFST is built by combining in-domain and out-of-domain models through the translation, while each model and the translation itself is expressed as a WFST. We apply this technique to building language models for spontaneous speech recognition using large written-style corpora. We conducted experiments on a 20k-word Japanese spontaneous speech recognition task. With a small in-domain corpus, a 2.9% absolute improvement in word error rate is achieved over the in-domain model.
    International conference proceedings, English
  • Recognition method with parametric trajectory generated from mixture distribution HMMs
    Y Minami; E McDermott; A Nakamura; S Katagiri
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, I, 124-127, 2003, Peer-reviewed, We have proposed a new speech recognition technique that generates a speech trajectory from HMMs by maximizing the likelihood of the trajectory, while accounting for the relation between the cepstrum and the dynamic cepstrum coefficients. This method has the major advantage that the relation, which is ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper describes an extension of the method for dealing with HMMs whose distributions are mixture Gaussian distributions. The method chooses the sequence of Gaussian distributions by selecting the best Gaussian distribution in the state during Viterbi decoding. Speaker-independent speech recognition experiments were carried out. The proposed method obtained an 18.2% reduction in error rate for the task, proving that the proposed method is effective even for Gaussian mixture HMMs.
    International conference proceedings, English
  • Application of variational Bayesian estimation and clustering to acoustic model adaptation
    S Watanabe; Y Minami; A Nakamura; N Ueda
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 1, 568-571, 2003, Peer-reviewed, In this paper, we apply Variational Bayesian Estimation and Clustering for speech recognition (VBEC) to acoustic model adaptation. VBEC can estimate parameter posteriors even when a model includes hidden variables, by using the Variational Bayesian approach. In addition, VBEC can select an appropriate model structure in clustering triphone states, according to the amount of available adaptation data. Unlike a conventional Bayesian method such as Maximum A Posteriori (MAP), VBEC is useful even in the case of small amounts of data, because the amount of data per Gaussian increases due to the model structure selection, and over-training is suppressed. We conduct an off-line supervised adaptation experiment on isolated word recognition, and show the advantage of the proposed method over the conventional method, especially when dealing with small amounts of adaptation data.
    International conference proceedings, English
  • Pervasive unsupervised adaptation for lecture speech transcription
    D Willett; T Niesler; E McDermott; Y Minami; S Katagiri
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 2, 292-295, 2003, Peer-reviewed, Unsupervised adaptation has evolved as a popular approach for tuning the acoustic models of speaker-independent speech recognition systems to specific speakers, speaker groups or channel conditions while making use of only untranscribed data. This study focuses on procedures for unsupervised adaptation of other probabilistic models that are involved in state-of-the-art speech recognizers and on the joint adaptation of multiple knowledge sources. In particular, we outline and evaluate approaches for adapting both the language model and the pronunciation model (lexicon) without supervision. Initial experiments on off-line lecture speech transcription achieved small but promising word error rate improvements with each approach applied separately. The experimental results on the joint application of acoustic, language and pronunciation model adaptation indicate that the individually achievable performance improvements are additive.
    International conference proceedings, English
  • コミュニケーションの壁を克服するための音声・音響処理技術 次世代の音声認識技術
    中村篤; 南泰浩; マクダーモット・エリック
    NTT技術ジャーナル, 電気通信協会, 15, 12, 13-18, 2003, Peer-reviewed
    Research institution, Japanese
  • Application of Variational Bayesian Approach to Speech Recognition
    S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
    NIPS, MIT Press, NIPS'02, Dec. 2002, Peer-reviewed
    International conference proceedings, English
  • Evaluation of a Speech Recognition/Generation Method Based on HMM and Straight
    T. Irino; Y. Minami; T. Nakatani; M. Tsuzaki; H. Tagawa
    ICSLP, 2545-2548, Sep. 2002, Peer-reviewed
    International conference proceedings, English
  • Constructing Shared-State Hidden Markov Models Based on a Bayesian Approach
    S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
    ICSLP, 4, 2669-2672, Sep. 2002, Peer-reviewed
    International conference proceedings, English
  • A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series
    Y Minami; E McDermott; A Nakamura; S Katagiri
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, IEEE, 1, 957-960, 2002, Peer-reviewed, Parametric trajectory models have been proposed to exploit this time-dependency. However, parametric trajectory modeling methods are unable to take advantage of efficient HMM training and recognition methods. We have proposed a new speech recognition technique that generates a speech trajectory using an HMM-based speech synthesis method. This method generates an acoustic trajectory by maximizing the likelihood of the trajectory while taking into account the relation between the cepstrum, delta-cepstrum, and delta-delta cepstrum. In this paper, we extend our method to a general formulation including a variance training procedure. Speaker-independent speech recognition experiments show that the proposed method is effective for speech recognition.
    International conference proceedings, English
  • A Recognition Method Using Synthesis-Based Scoring That Incorporates Direct Relations between Static and Dynamic Feature Vector Time Series
    Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
    Workshop for Consistent & Reliable Acoustic Cues for Sound Analysis, Poster, Sep. 2001, Peer-reviewed
    International conference proceedings, English
  • Mokusei: A Telephone-Based Japanese Conversational System in the Weather Domain
    M. Nakano; Y. Minami; S. Seneff; T. J. Hazen; D. S. Cyphers; J. Glass; J. Polifroni; V. Zue
    Eurospeech, 1331-1334, Sep. 2001, Peer-reviewed
    International conference proceedings, English
  • Time and Memory Efficient Viterbi Decoding for LVCSR Using a Precompiled Search Network
    D. Willett; E. McDermott; Y. Minami; S. Katagiri
    Eurospeech, 847-890, Sep. 2001, Peer-reviewed
    International conference proceedings, English
  • From Jupiter to Mokusei: Multilingual Conversational System in the Weather Domain
    V. Zue; S. Seneff; J. Polifroni; M. Nakano; Y. Minami; T. J. Hazen; J. Glass
    Workshop on Multi-Lingual Speech Communication, 1-6, Apr. 2000, Peer-reviewed
    International conference proceedings, English
  • Mokusei: A Japanese Spoken Dialogue System in the Weather Domain
    S. Seneff; J. Glass; T. J. Hazen; Y. Minami; J. Polifroni; V. Zue
    NTT R&D, 電気通信協会, 49, 7, 376-382, 2000, Peer-reviewed
    Research institution, English
  • Compensation of speaker directivity in speech recognition using HMM composition
    F. Giron; Y. Minami; M. Tanaka; K. Furuya
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1, 253-256, 01 Dec. 1998, Peer-reviewed, In hands-free speech recognition the speaker should be able to move freely in front of the speech acquisition device. However, the speech signal is then subject to variations due to the continuous change of position in the acoustic space. This paper focuses on the role of speaker head rotations as compared with static situations in anechoic conditions. The effect of speaker directivity in speech recognition performance degradation is demonstrated and a compensation method based on HMM composition is proposed to increase the performance. © 1998 IEEE.
  • Towards practical use for speaker recognition technology
    MATSUI Tomoko; YOSHIOKA Osamu; MINAMI Yasuhiro
    ITE Technical Report, The Institute of Image Information and Television Engineers, 22, 45, 43-48, 14 Sep. 1998, Recently, network services such as banking, electronic commerce, database access services, information services over the Internet and telephone networks have become popular, and user verification technology that is essential to those services has become important. Voice verification technology is one type of user verification technology, and demand for this should be anticipated in an environment where services such as those over a telephone network can use only voice as the means of communication. This paper describes practical speaker recognition technology for constructing voice verification systems. Moreover, this paper introduces new software developer kits for speaker recognition, reports on speaker recognition experiments using telephone speech that we recently conducted, and shows that the recognition performance is seriously affected by text and handset conditions that are the same/different for training and testing.
    Japanese
  • An HMM adaptation method for noise and distortion by maximizing likelihood
    Y Minami; S Furui
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, 81, 8, 1-9, Aug. 1998, Peer-reviewed, This paper describes a new HMM synthesis method in which HMM adapts to additive noise and multiplicative distortion. The conventional HMM synthesis method can only be applied to additive noise. In the method described here, the likelihood of the synthesized HMM to adapt to speech is maximized, so that multiplicative distortion is eliminated when the method is applied. Within the framework of this method, adaptation to variations in the SN ratio, considered a problem in conventional HMM synthesis, can be formulated as part of the adaptation to multiplicative distortion. As a result of evaluating speech recognition rates using our method, we have confirmed that the method is effective for improving the recognition rate of speech that contains additive noise and multiplicative distortion. (C) 1998 Scripta Technica.
    Scientific journal, English
  • Compensation of Speaker Directivity in Speech Recognition Using HMM Composition
    F. Giron; Y. Minami; M. Tanaka; K. Furuya
    ICASSP, vol.1, 12-15, May 1998, Peer-reviewed
    International conference proceedings, English
  • Connected Digit Recognition in Spontaneous Speech
    E. Bauche; B. Gajic; Y. Minami; T. Matsuoka; S. Furui
    Eurospeech, 923-926, Sep. 1997, Peer-reviewed
    International conference proceedings, English
  • 尤度最大化による雑音とひずみへの HMM 適応化手法
    南泰浩; 古井貞煕
    電子情報通信学会論文誌A, J80-A, 7, 1179-1186, Jul. 1997, Peer-reviewed
    Scientific journal, Japanese
  • An efficient search method for large-vocabulary continuous-speech recognition
    K Hanazawa; Y Minami; S Furui
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V, I E E E, COMPUTER SOC PRESS, 1787-1790, 1997, Peer-reviewed, This paper proposes an efficient method for large-vocabulary continuous-speech recognition, using a compact data structure and an efficient search algorithm. We introduce a very compact data structure, DAWG, as a lexicon to reduce the search space. We also propose a search algorithm to obtain the N-best hypotheses using the DAWG structure. This search algorithm is composed of two phases: "forward search" and "traceback". Forward search, which basically uses the time-synchronous Viterbi algorithm, merges candidates and stores the information about them in DAWG structures to create phoneme graphs. Traceback traces the phoneme graphs to obtain the N-best hypotheses. An evaluation of this method's performance using a speech-recognition-based telephone-directory-assistance system having a 4000-word vocabulary confirmed that our strategy improves speech recognition in terms of time and recognition rate.
    International conference proceedings, English
  • Adaptation method based on HMM composition and EM algorithm
    Y Minami; S Furui
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, IEEE, 327-330, 1996, Peer-reviewed
    International conference proceedings, English
  • Improved extended HMM composition by incorporating power variance
    Y Minami; S Furui
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, IEEE, 1109-1112, 1996, Peer-reviewed, This paper describes a way of improving extended HMM composition that can precisely adapt HMMs to both noisy and distorted speech. To do this, we incorporate the variance of power into extended HMM composition using quantization to approximate the Gaussian distribution of the 0th order cepstrum. Consequently, a distribution of noisy speech is approximated in the linear spectral domain as a mixture of log normal distributions.
    This method is evaluated by a four-digit recognition experiment when the number of digits is known. Two types of noise, computer room noise and car noise, are used and noisy and distorted speech data is made by adding these types of noise to speech data recorded using a boundary microphone. Results show that the proposed method improves recognition rates for noisy and distorted speech compared with our previous method.
    International conference proceedings, English
  • AN HMM STATE DURATION CONTROL ALGORITHM APPLIED TO LARGE-VOCABULARY SPONTANEOUS SPEECH RECOGNITION
    S TAKAHASHI; Y MINAMI; K SHIKANO
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E78D, 6, 648-653, Jun. 1995, Peer-reviewed, Although Hidden Markov Modeling (HMM) is widely and successfully used in many speech recognition applications, duration control for HMMs is still an important issue in improving recognition accuracy since an HMM places no constraints on duration. To compensate for this defect, some duration control algorithms that employ precise duration models have been proposed. However, they suffer from greatly increased computational complexity. This paper proposes a new state duration control algorithm for limiting both the maximum and the minimum state durations. The algorithm is for the HMM trellis likelihood calculation, not for the Viterbi calculation. The amount of computation required by this algorithm is only order one (O(1)) for the maximum state duration n; that is, the computation amount is independent of the maximum state duration, while many conventional duration control algorithms require computation in the amount of order n or order n^2. Thus, the algorithm can drastically reduce the computation needed for duration control. The algorithm uses the property that the trellis likelihood calculation is a summation of many path likelihoods. At each frame, the path likelihood that exceeds the maximum likelihood is subtracted, and the path likelihood that satisfies the minimum likelihood is added to the forward probability. By iterating this procedure, the algorithm calculates the trellis likelihood efficiently. The algorithm was evaluated using a large-vocabulary speaker-independent spontaneous speech recognition system for telephone directory assistance. The average reduction in error rate for sentence understanding was about 7% when using context-independent HMMs, and 3% when using context-dependent HMMs. We could confirm the improvement by using the proposed state duration control algorithm even though the maximum and the minimum state durations were not optimized for the task (speaker-independent duration settings obtained from a different task were used).
    Scientific journal, English
  • A SPEECH DIALOGUE SYSTEM WITH MULTIMODAL INTERFACE FOR TELEPHONE DIRECTORY ASSISTANCE
    O YOSHIOKA; Y MINAMI; K SHIKANO
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E78D, 6, 616-621, Jun. 1995, Peer-reviewed, This paper describes a multimodal dialogue system employing speech input. This system uses three input methods (through a speech recognizer, a mouse, and a keyboard) and two output methods (through a display and using sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. This system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users utter to computers. Speech input is used in a multimodal environment. The collecting of dialogue data between computers and users is also carried out. Twenty telephone-number retrieval tasks are used to evaluate this system. In the experiments, all the users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0% and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.
    Scientific journal, English
  • ACOUSTIC AND LANGUAGE PROCESSING TECHNOLOGY FOR SPEECH RECOGNITION
    T MATSUOKA; Y MINAMI
    NTT REVIEW, NTT CORP, 7, 2, 30-39, Mar. 1995, Peer-reviewed, This paper describes acoustic and language processing technology for automatic speech recognition. Speech recognition systems usually consist of acoustic and language processing modules. The acoustic processing extracts feature parameter vectors from the speech utterance and performs pattern recognition by comparing the vector sequence and pre-defined acoustic models. The most likely model is then chosen as the recognition result. The language processing helps recognition by narrowing down the number of candidates or by selecting the most linguistically matching hypothesis from those produced by the acoustic processing.
    Scientific journal, English
  • A MAXIMUM-LIKELIHOOD PROCEDURE FOR A UNIVERSAL ADAPTATION METHOD BASED ON HMM COMPOSITION
    Y MINAMI; S FURUI
    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5, IEEE, 129-132, 1995, Peer-reviewed
    International conference proceedings, English
  • UNIVERSAL ADAPTATION METHOD BASED ON HMM COMPOSITION
    Y MINAMI; S FURUI
    ICA 95 - PROCEEDINGS OF THE 15TH INTERNATIONAL CONGRESS ON ACOUSTICS, VOL III, SINTEF, 105-108, 1995, Peer-reviewed
    International conference proceedings, English
  • LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION ALGORITHM APPLIED TO A MULTIMODAL TELEPHONE DIRECTORY ASSISTANCE SYSTEM
    Y MINAMI; K SHIKANO; S TAKAHASHI; T YAMADA; O YOSHIOKA; S FURUI
    SPEECH COMMUNICATION, ELSEVIER SCIENCE BV, 15, 3-4, 301-310, Dec. 1994, Peer-reviewed, This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm.
    Scientific journal, English
  • PHONEME HMM EVALUATION ALGORITHM WITHOUT PHONEME LABELING APPLIED TO CONTINUOUS SPEECH HMM EVALUATION
    Y MINAMI; T MATSUOKA; K SHIKANO
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, 77, 11, 13-21, Nov. 1994, Peer-reviewed, Phoneme hidden Markov models (HMMs) are generally evaluated in terms of the phoneme recognition rate by using speech data extracted based on phoneme labels. This paper proposes an evaluation method that does not use phoneme labels for extraction. Consequently, phoneme HMMs can be evaluated even if a speech database without phoneme labeling is used.
    In this study, concatenation training of the phoneme HMMs is executed using a large-scale speaker-independent continuous-speech database. Evaluating the HMM phoneme recognition rate as a function of the number of training speakers with the proposed evaluation method demonstrates its effectiveness.
    Scientific journal, English
  • Multimodal Telephone Directory Assistance System and Its Evaluation
    Y. Minami; O. Yoshioka; K. Shikano; S. Furui
    International Workshop on Human Interface Technology, 7-14, Sep. 1994, Peer-reviewed
    International conference proceedings, English
  • An HMM Duration Control Algorithm with a Low Computation Cost
    S. Takahashi; Y. Minami; K. Shikano
    ICSLP, 267-270, Sep. 1994, Peer-reviewed
    International conference proceedings, English
  • A Multi-Modal Dialogue System for Telephone Directory Assistance
    O. Yoshioka; Y. Minami; K. Shikano
    ICASSP, 887-890, Sep. 1994, Peer-reviewed
    International conference proceedings, English
  • SPEECH RECOGNITION USING PHONEME HMM CONSTRAINED BY FRAME CORRELATION
    S TAKAHASHI; T MATSUOKA; Y MINAMI; K SHIKANO
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, 77, 6, 58-69, Jun. 1994, Peer-reviewed, One of the problems with the hidden Markov model (HMM) in performing speech recognition is that the local transition information of the feature vectors is not incorporated into the mechanism of the model and the model is not constrained by transitions of the feature vectors. Thus, the output probability distribution never changes during recognition. Furthermore, all transitions between the vectors that have high probabilities are allowed even if those transitions did not appear in the training data.
    This paper proposes a bigram-constrained HMM that uses correlations between two frames to constrain the feature distributions of a speaker-independent HMM to the region most appropriate for the speaker. Since the output probability of the bigram-constrained HMM is a conditional probability restricted by the feature vector of the previous frame, the output probability changes dynamically at each frame depending on the feature vector of the previous frame. Constraining the feature distribution makes it possible to reduce the overlapping of feature distributions between different phonemes which improves recognition performance.
    Previously, we proposed the discrete bigram-constrained HMM which is based on the combination of a discrete speaker-independent HMM and the VQ-code bigram. We showed that it performed better than conventional speaker-independent HMMs. In this paper, the strategy is extended to the tied-mixture bigram-constrained HMM and the continuous bigram-constrained HMM to obtain better recognition performance. These three types of HMMs are formulated and evaluated by phoneme recognition in continuous speech.
    Scientific journal, English
  • フレーム間相関を利用した音韻 HMM による音声認識
    高橋敏; 松岡達雄; 南泰浩; 鹿野清宏
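    電子情報通信学会論文誌A, 電子情報通信学会, J77-A, 2, 153-161, Feb. 1994, Peer-reviewed, One problem with current HMMs is that the output probability distribution is always constant within each state, so the transition information of the phoneme features is not reflected in the mechanism of the model. Moreover, because transitions of the feature vectors are unconstrained, transitions between feature vectors that each have a high output probability are assigned a high output probability even if they were never observed in the training data. This paper proposes a bigram-constrained HMM that constrains transitions using the correlation between two frames of feature vectors, restricting the broad feature distributions of a speaker-independent HMM to the range appropriate for the input speaker. Because the output probability of the bigram-constrained HMM is expressed as a probability conditioned on the feature vector of the previous frame, the output probability distribution changes dynamically at each frame. Constraining the distributions also reduces the overlap of feature distributions between different phonemes, which improves the recognition rate. We previously proposed a discrete bigram-constrained HMM, which constrains transitions using VQ-code bigrams on top of a discrete speaker-independent HMM, and showed that it outperforms conventional HMMs. In this paper, to obtain higher recognition performance, we extend the approach to semi-continuous and continuous bigram-constrained HMMs. In an evaluation by phoneme recognition in continuous speech, when inter-frame correlation information from the input speaker's speech was used, the semi-continuous bigram-constrained HMM improved the average phoneme recognition rate from 65.4% to 74.8%, and the continuous bigram-constrained HMM improved it from 64.8% to 74.5%. When general inter-frame correlation information extracted from many speakers was used, the continuous bigram-constrained HMM improved the rate from 64.8% to 67.5%.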
    Scientific journal, Japanese
  • 音韻ラベルを用いない HMM 評価法とそれを用いた連続音声認識用 HMM の評価
    南泰浩; 松岡達雄; 鹿野清宏
    電子情報通信学会論文誌A, J77-A, 2, 267-273, Feb. 1994, Peer-reviewed
    Scientific journal, Japanese
  • 番号案内を対象とした大語い連続音声認識アルゴリズム
    南泰浩; 山田智一; 鹿野清宏; 松岡達雄
    電子情報通信学会論文誌A, J77-A, 2, 190-197, Feb. 1994, Peer-reviewed
    Scientific journal, Japanese
  • A very large vocabulary continuous speech recognition algorithm for telephone directory assistance
    Yasuhiro Minami; Tomokazu Yamada; Kiyohiro Shikano; Tatsuo Matsuoka
    Electronics and Communications in Japan (Part III: Fundamental Electronic Science), 77, 11, 1-12, 1994, Peer-reviewed, This paper proposes a speech recognition algorithm for large vocabulary continuous speech. The proposed algorithm is based on the hidden Markov model (HMM)‐LR algorithm using a generalized predictive LR parser and phoneme HMMs. The following three techniques are applied to improve recognition performance and reduce processing time. The forward and the backward likelihood are used to accurately determine the likelihood in the beam search. To reduce the trellis computation in HMM speech recognition and for efficient search, only the speech frames in which the predicted phoneme seems to exist are used by the window for phoneme matching. For efficient search, adjusting identical phoneme sequences are merged by checking the stack and the state of the LR parser. The algorithm was applied to a telephone directory assistance task involving more than 70,000 subscribers. A recognition experiment for continuous word utterance was done. The sentence recognition rate was 85 percent for speaker‐dependent speech recognition; the sentence recognition rate was 71 percent for speaker‐independent speech recognition. The sentence understanding rate was 59 percent for speaker‐dependent speech recognition with spontaneous utterances. Copyright © 1994 Wiley Periodicals, Inc., A Wiley Company
    Scientific journal, English
  • Large-vocabulary continuous speech recognition algorithm applied to a multi-modal telephone directory assistance system
    Yasuhiro Minami; Kiyohiro Shikano; Satoshi Takahashi; Tomokazu Yamada; Osamu Yoshioka; Sadaoki Furui
    Speech Communication, 15, 3-4, 301-310, 1994, Peer-reviewed, This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm. © 1994.
    Scientific journal, English
  • SEARCH ALGORITHM THAT MERGES CANDIDATES IN MEANING LEVEL FOR VERY LARGE VOCABULARY SPONTANEOUS SPEECH RECOGNITION
    Y MINAMI; K SHIKANO; S TAKAHASHI; T YAMADA
    ICASSP-94 PROCEEDINGS, VOL 2, IEEE, 141-144, 1994, Peer-reviewed
    International conference proceedings, English
  • Language Processing for Speech Recognition
    T. Matsuoka; Y. Minami
    NTT R & D, 43, 10, 91-100, 1994, Peer-reviewed
    Research institution, English
  • Acoustic Processing for Speech Recognition
    Y. Minami; T. Matsuoka
    NTT R & D, 43, 10, 81-90, 1994, Peer-reviewed
    Research institution, English
  • Large-Vocabulary Continuous Speech Recognition Algorithm for Telephone Directory Assistance
    K. Shikano; Y. Minami; S. Takahashi; T. Yamada
    IEEE Workshop on Automatic Speech Recognition, 14-15, Dec. 1993, Peer-reviewed
    International conference proceedings, English
  • Large Vocabulary Continuous Speech Recognition System for Telephone Directory Assistance
    Y. Minami; K. Shikano; S. Takahashi; T. Yamada; O. Yoshioka
    International Symposium on Spoken Dialogue, 169-172, Nov. 1993, Peer-reviewed
    International conference proceedings, English
  • Multi-Modal Telephone Directory Assistance System Based on Large-Vocabulary Continuous Speech Recognition Algorithm
    K. Shikano; Y. Minami; O. Yoshioka; S. Takahashi; T. Yamada
    International Workshop on Knowledge Structure for Understanding Speech and Language, 1, Nov. 1993, Peer-reviewed
    International conference proceedings, English
  • Recognition of Noisy Speech by Composition of Hidden Markov Models
    F. Martin; K. Shikano; Y. Minami
    Eurospeech, 1031-1034, Sep. 1993, Peer-reviewed
    International conference proceedings, English
  • PHONEME HMMS CONSTRAINED BY FRAME CORRELATIONS
    S TAKAHASHI; T MATSUOKA; Y MINAMI; K SHIKANO
    ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5, I E E E, B219-B222, 1993, Peer-reviewed
    International conference proceedings, English
  • Phoneme HMM Evaluation Algorithm without Phoneme Labeling
    Y. Minami; T. Matsuoka; K. Shikano
    ICSLP, 1535-1538, Oct. 1992, Peer-reviewed
    International conference proceedings, English
  • Very Large Vocabulary Continuous Speech Recognition for Telephone Directory Assistance
    Y. Minami; K. Shikano; T. Yamada; T. Matsuoka
    IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, VII.1, 2129-2132, Oct. 1992, Peer-reviewed
    International conference proceedings, English
  • RECENT TOPICS IN SPEECH RECOGNITION RESEARCH AT NTT LABORATORIES
    S FURUI; K SHIKANO; S MATSUNAGA; T MATSUOKA; S TAKAHASHI; T YAMADA
    SPEECH AND NATURAL LANGUAGE, MORGAN KAUFMANN PUB INC, 162-167, 1992, Peer-reviewed
    International conference proceedings, English
  • CONNECTIONIST APPROACHES TO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
    H SAWAI; Y MINAMI; M MIYATAKE; A WAIBEL; K SHIKANO
    IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, IEICE-INST ELECTRON INFO COMMUN ENG, 74, 7, 1834-1844, Jul. 1991, Peer-reviewed, This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system integrating speech recognition and language processing. The speech recognition part consists of Large Phonemic Time-Delay Neural Networks (TDNNs) which can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([ʃ]), /h/, /z/, /ch/ ([tʃ]), /ts/, /r/, /w/, /y/ ([j]) and 5 vowels /a/, /i/, /u/, /e/, /o/ and a double consonant /Q/ or silence) by simply scanning the input speech without any specific segmentation techniques. On the other hand, the language processing part is made up of a predictive LR parser in which the LR parser is guided by the LR parsing table automatically generated from context-free grammar rules, and proceeds left-to-right without backtracking. Time alignment between the predicted phonemes and a sequence of the TDNN phoneme outputs is carried out by the DTW matching method. We call this 'hybrid' integrated recognition system the 'TDNN-LR' method. We report that large-vocabulary isolated word and continuous speech recognition using the TDNN-LR method provided excellent speaker-dependent recognition performance, where incremental training using a small number of training tokens is found to be very effective for adaptation of speaking rate. Furthermore, we report some new achievements as extensions of the TDNN-LR method: (1) two proposed NN architectures provide robust phoneme recognition performance on variations of speaking manner, (2) a speaker-adaptation technique can be realized using a NN mapping function between input and standard speakers and (3) new architectures proposed for speaker-independent recognition provide performance that nearly matches speaker-dependent recognition performance.
    Scientific journal, English
  • On the Robustness of HMM and ANN Speech Recognition Algorithms
    Y. Minami; T. Hanazawa; H. Iwamida; E. McDermott; K. Shikano; S. Katagiri; M. Nakagawa
    ICSLP, 1345-1348, Nov. 1990, Peer-reviewed
    International conference proceedings, English
  • Trigramモデルを用いた複数候補を求めるフレーム同期型 HMM 連続音声認識
    南泰浩; 中川正雄
    電子情報通信学会論文誌D, 電子情報通信学会情報・システムソサイエティ, J73-D-II, 9, 1383-1392, Sep. 1990, Peer-reviewed
    Scientific journal, Japanese
  • 時間遅れ神経回路による音韻スポッティング法と予測LRパーザを用いた大語い単語音声認識
    南泰浩; 沢井秀文; 宮武正典
    電子情報通信学会論文誌D, 電子情報通信学会情報・システムソサイエティ, J73-D-II, 6, 788-795, Jun. 1990, Peer-reviewed
    Scientific journal, Japanese
  • INTEGRATED TRAINING FOR SPOTTING JAPANESE PHONEMES USING LARGE PHONEMIC TIME-DELAY NEURAL NETWORKS
    M MIYATAKE; H SAWAI; Y MINAMI; K SHIKANO
    ICASSP 90, VOLS 1-5, I E E E, 449-452, 1990, Peer-reviewed
    International conference proceedings, English
  • VARIABLE BIT RATE PARCOR.VQ HYBRID VOCODER
    K MIZUI; S WAKABAYASHI; M SATOH; Y MINAMI; M NAKAGAWA
    DALLAS GLOBECOM 89, VOLS 1-3, I E E E, 1885-1889, 1989, Peer-reviewed
    International conference proceedings, English

MISC

  • センター試験を対象とした高性能な英語ソルバーの実現
    杉山弘晃; 成松宏美; 菊井玄一郎; 東中竜一郎; 堂坂浩二; 平博順; 南泰浩; 大和淳司
    2020, 言語処理学会年次大会発表論文集(Web), 26th, 2188-4420, 202002257856951573
  • Solving the opinion summarization problem in English in the "Can a Robot Get into the University of Tokyo?" project
    東中竜一郎; 杉山弘晃; 成松宏美; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 喜多智也; 南泰浩; 風間健流; 大和淳司
    2018, 人工知能学会全国大会論文集(CD-ROM), 32nd, ROMBUNNO.2C1.02, Japanese, 1347-9881, 201802262605015084
  • Why Does It Matter Whether or Not AI is able to Pass University Entrance Examinations? : 1. Technical Challenges Revealed by Solving English Problems
    東中竜一郎; 杉山弘晃; 堂坂浩二; 南泰浩; 成松宏美; 磯崎秀樹; 菊井玄一郎; 平博順; 大和淳司
    15 Jun. 2017, 情報処理, 58, 7, 600-602, Japanese, Introduction scientific journal, 170000148666, AN00116625
  • Current status and future challenges for the English subject in the "Can a Robot Get into the University of Tokyo?" project
    東中竜一郎; 杉山弘晃; 成松宏美; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩; 大和淳司
    2017, 人工知能学会全国大会論文集(CD-ROM), 31st, ROMBUNNO.2H2‐1, Japanese, 1347-9881, 201802268089508328
  • センター試験における英語問題の回答手法
    東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
    2015, 言語処理学会年次大会発表論文集(Web), 21st, 2188-4420, 201502234485199950
  • Gender effect on infant word acquisition order
    MINAMI Yasuhiro; KOBAYASHI Tessei
    Tomasello pointed out that word acquisition ages are strongly affected by the social environment. As a fundamental investigation of one such factor, this report examines whether word acquisition ages are affected by gender. To this end, we calculate word comprehension ages and word production ages from a database collected with the MacArthur-Bates Communicative Development Inventories. A gender-dependent correlation analysis of word production ages and word ages showed cross-gender universality. Moreover, we found a gender effect after removing this universality., The Institute of Electronics, Information and Communication Engineers, 29 May 2014, Technical report of IEICE. HIP, 114, 68, 61-66, Japanese, 0913-5685, 110009903784, AN10487237
  • Gender effect on infant word acquisition order
    MINAMI Yasuhiro; KOBAYASHI Tessei
    Tomasello pointed out that word acquisition ages are strongly affected by the social environment. As a fundamental investigation of one such factor, this report examines whether word acquisition ages are affected by gender. To this end, we calculate word comprehension ages and word production ages from a database collected with the MacArthur-Bates Communicative Development Inventories. A gender-dependent correlation analysis of word production ages and word ages showed cross-gender universality. Moreover, we found a gender effect after removing this universality., The Institute of Electronics, Information and Communication Engineers, 29 May 2014, Technical report of IEICE. HCS, 114, 67, 61-66, Japanese, 0913-5685, 110009903742, AN10487226
  • 絵本を基にした対象年齢推定方法の検討
    藤田早苗; 小林哲生; 平博順; 南泰浩; 田中貴秋
    2014, 人工知能学会全国大会論文集(CD-ROM), 28th, 1347-9881, 201402211375582913
  • 幼児の言語発達研究<最前線>
    小林哲生; 南泰浩
    2014, ヒューマンインターフェース学会誌, 16, 2, 29-34, Japanese, Peer-reviewed, Invited, Introduction scientific journal
  • Dialogue act tagging for microblog utterances using semantic category patterns
    目黒豊美; 東中竜一郎; 杉山弘晃; 南泰浩
    In this paper, we propose dialogue act tagging for utterances in microblogs. The dialogue act estimator is built by using support vector machines (SVMs). To cope with the variety of words and expressions in microblogs, the feature vector uses N-grams of characters and words. In addition, the word N-gram features are abstracted into semantic categories by using a thesaurus. In our experiment, the proposed model outperformed naive baselines based on word N-grams., Information Processing Society of Japan (IPSJ), 18 Oct. 2013, IPSJ SIG Notes, 2013, 1, 1-6, Japanese, 110009613935, AN10442647
  • Investigation of infant vocabulary learning characteristic using comprehension-to-production (C2P) indexes for early development words
    MINAMI Yasuhiro; KOBAYASHI Tessei; SUGIYAMA Hiroaki
    Gentner insisted that noun learning predominates over verb learning in early vocabulary development. Until now, many studies examining the predominance of nouns in the part-of-speech distribution have supported this assumption. To explain this predominance, Gentner et al. and Maguire et al. proposed the concept that a vocabulary is a continuum in an abstract representation space, which is strongly related to the learning difficulty of words. However, no research has clearly shown the relation between learning difficulty and this abstract representation space. In this paper, we propose a direct index that connects the difficulty to the abstract representation space of words using the CDI. Furthermore, we report the distribution of word categories in early vocabulary development on this index space., The Institute of Electronics, Information and Communication Engineers, 22 Feb. 2013, Technical report of IEICE. Thought and language, 112, 442, 37-42, Japanese, 0913-5685, 110009728695, AN10449078
  • 対話処理における強化学習
    南泰浩; 目黒豊美
    計測自動制御学会, 2013, 計測と制御, 52, 10, 916-921, Japanese, Peer-reviewed, Invited, Introduction scientific journal, 0453-4662, 40019836182, AN00072406
  • Estimating the vocabulary spurt onset using two piece linear regression
    南 泰浩; 小林 哲生; 杉山 弘晃
    日本音響学会聴覚研究委員会, 08 Mar. 2012, 聴覚研究会資料, 42, 2, 155-160, Japanese, 1346-1109, 40019248769, AN00227138
  • Dialog Control via Preference-learning based Inverse Reinforcement Learning
    杉山 弘晃; 目黒 豊美; 南 泰浩
    人工知能学会, 2012, 人工知能学会全国大会論文集, 26, 1-4, Japanese, 1347-9881, 40020270054, AA11578981
  • Wizard of Oz experiment of listening-oriented dialogue control using POMDPs
    目黒 豊美; 南 泰浩; 東中 竜一郎
    人工知能学会, 2012, 人工知能学会全国大会論文集, 26, 1-4, Japanese, 1347-9881, 40020270069, AA11578981
  • 語彙爆発の新しい視点 : 日本語学習児の初期語彙発達に関する縦断データ解析
    小林 哲生; 南 泰浩; 杉山 弘晃
    日本赤ちゃん学会, 2012, ベビーサイエンス, 12, 40-64, Japanese, 40019763996, AA11903404
  • 統計的手法による音声対話制御
    南泰浩
    2012, 情報処理学会誌, 53, 10, 1088-1094, Japanese, Peer-reviewed, Invited, Introduction scientific journal, 20001036089
  • POMDP dialogue control using action durations
    南 泰浩; 目黒 豊美; 東中 竜一郎; 堂坂 浩二; 前田 英作
    This paper proposes a dialogue control method using action durations. We previously proposed a method combining an ordinary POMDP-based method with a probability-based method and extended it to handle trigram dialogue control. When applied to less task-oriented dialogues, the method over-generates actions that have high probabilities. To avoid this problem, we introduce duration control into our POMDP action generation process. The experimental results show that the proposed method can generate action sequences whose probability is similar to that of the training data while increasing the entropy of the actions. This increase means that the action generation gives new information and avoids over-generating the same actions. This confirms that our method generates appropriate action sequences., 情報処理学会, Apr. 2011, 情報処理学会研究報告, 2010, 6, 1-8, Japanese, 2186-2583, 110008583618
  • POMDP Dialogue Control using Action Duration
    南 泰浩; 目黒 豊美; 東中 竜一郎
    人工知能学会, 2011, 人工知能学会全国大会論文集, 25, 1-4, Japanese, 1347-9881, 40020269460, AA11578981
  • 部分観測マルコフ決定過程に基づく対話制御
    南泰浩
    2011, 音響学会誌, 67, 10, 482-487, Japanese, Peer-reviewed, Invited, Introduction scientific journal
  • 人ロボット共生におけるコミュニケーション戦略の生成
    前田英作; 南泰浩; 堂坂浩二
    2011, 日本ロボット学会誌, 29, 10, 887-890, Japanese, Peer-reviewed, Invited, Introduction scientific journal
  • On a dynamically changing learning strategy for interactive visual scene understanding
    KIMURA Akisato; MINAMI Yasuhiro; SAKANO Hitoshi; MAEDA Eisaku; SUGIYAMA Hiroaki
    We humans believe that we can easily and naturally understand and verbalize most visual scenes we are given. On the other hand, as is widely known, visual scene understanding by computers is still far from its ultimate goal, despite its long history and significance. However, even in humans, it would be natural that almost all the abilities for visual scene understanding have to be acquired (albeit unintentionally) during developmental processes, except for basic sensory organs and a few fundamental functions. In this report, we discuss a novel approach to realizing sophisticated visual scene understanding so that computers can acquire the ability naturally. Most of the discussion is directed to its learning strategy, which should be communicative, dynamically changed according to the system's own knowledge, and long-tailed. The framework poses many new and challenging problems not only for the multimedia research community but also for other related communities such as HCI, computer vision, machine learning and cognitive science., The Institute of Electronics, Information and Communication Engineers, 02 Dec. 2010, Technical report of IEICE. PRMU, 110, 330, 53-54, Japanese, 110008675751
  • Thought-evoking multi-party dialogue system: CAMP
    堂坂 浩二; 南 泰浩
    人工知能学会, 28 Oct. 2010, 言語・音声理解と対話処理研究会, 60, 35-38, Japanese, 0918-5682, 40017365379, AN10432166
  • Action planning for interactive visual scene understanding based on knowledge confidence defined on latent spaces
    SEKHON Gurbachan; KIMURA Akisato; MINAMI Yasuhiro; SAKANO Hitoshi; MAEDA Eisaku
    This report proposes a method for action planning in a system of interactive visual scene understanding through the use of system knowledge and its confidence. The knowledge confidence is defined as the combination of the following two properties on the latent space of a topic model connecting image features and text labels: 1) Similarity between an input sample and training samples on the latent space, and 2) the overall associability between each text label as determined by the content of the training samples. We evaluate the proposed method in the context of annotation accuracy and effort for providing answers from users. The experimental results with PASCAL VOC2008 dataset indicate that our proposed method achieved comparable or better annotation accuracy with less effort compared with strategies of 1) always asking the name of objects and 2) generating random questions., The Institute of Electronics, Information and Communication Engineers, 29 Aug. 2010, Technical report of IEICE. PRMU, 110, 187, 201-208, English, 0913-5685, 110008107179
  • Dialogue control by POMDP using dialogue data statistics
    Minami Yasuhiro; Mori Akira; Meguro Toyomi; Higashinaka Ryuichiro; Dohsaka Kohji; Maeda Eisaku
    The Institute of Electronics, Information and Communication Engineers, 21 Dec. 2009, IEICE technical report, 109, 355, 83-88, Japanese, 0913-5685, 110008002098
  • Analyzing the Characteristics of Listening-oriented Dialogue for Building Listening Agents
    MEGURO TOYOMI; HIGASHINAKA RYUICHIRO; DOHSAKA KOHJI; MINAMI YASUHIRO; ISOZAKI HIDEKI
    Our aim is to build listening agents that can attentively listen to the user and satisfy his/her desire to speak and have himself/herself heard. This paper investigates the characteristics of such listening-oriented dialogues so that such a listening process can be achieved by automated dialogue systems. We collected both listening-oriented dialogues and casual conversation, and analyzed them by comparing the frequency of dialogue acts, as well as the dialogue flows using Hidden Markov Models (HMMs). The analysis revealed that listening-oriented dialogues and casual conversation have characteristically different dialogue flows and that it is important for listening agents to self-disclose before asking questions and to utter more questions and acknowledgments than in casual conversation. We also investigated the effects of personality traits on listening-oriented dialogue. We found that a dialogue becomes characteristically different depending on the personality traits of speakers and listeners., 21 Sep. 2009, 研究報告自然言語処理(NL), 2009, 10, 1-6, Japanese, 0919-6072, 110008003241, AN10115061
  • Controlling thought-evoking dialogue using POMDP
    MINAMI Yasuhiro; SAWAKI Minako; HIGASHINAKA Ryuichiro; DOHSAKA Kohji
    We are researching thought-evoking dialogue systems where conversation agents appropriately affect users and evoke their voluntary thoughts to motivate human communication. This paper proposes a thought-evoking quiz dialogue system using a Partially Observable Markov Decision Process (POMDP) that can handle uncertain information such as paralinguistic information. As uncertain information, we employ the user's level of difficulty in handling quiz hints. Another person detects this difficulty level by observing the user's facial and voice information. The system controls the user's difficulty levels (easy, neutral, and difficult) for the hints by skipping hints based on the POMDP policy that was learned by reinforcement learning. This paper evaluates the proposed system in simulation experiments., Information Processing Society of Japan (IPSJ), 02 Dec. 2008, IPSJ SIG Notes, 2008, 123, 97-102, Japanese, 0919-6072, 110007114728, AN10442647
  • Controlling thought-evoking dialogue using POMDP
    MINAMI Yasuhiro; SAWAKI Minako; HIGASHINAKA Ryuichiro; DOHSAKA Kohji
    We are researching thought-evoking dialogue systems where conversation agents appropriately affect users and evoke their voluntary thoughts to motivate human communication. This paper proposes a thought-evoking quiz dialogue system using a Partially Observable Markov Decision Process (POMDP) that can handle uncertain information such as paralinguistic information. As uncertain information, we employ the user's level of difficulty in handling quiz hints. Another person detects this difficulty level by observing the user's facial and voice information. The system controls the user's difficulty levels (easy, neutral, and difficult) for the hints by skipping hints based on the POMDP policy that was learned by reinforcement learning. This paper evaluates the proposed system in simulation experiments., The Institute of Electronics, Information and Communication Engineers, 02 Dec. 2008, IEICE technical report, 108, 337, 97-102, Japanese, 0913-5685, 110007114428, AN10091225
  • まっしゅるーむの世界 -環境知能の実現-
    南泰浩; 堂坂浩二; 澤木美奈子; 森啓; 前田英作
    2008, ヒューマンインタフェース学会誌, 10, 2, 5-10, Japanese, Peer-reviewed, Invited, Introduction scientific journal
  • クイズ対話システムの構築と音声認識性能による評価
    南泰浩; 東中竜一郎; 澤木美奈子; 堂坂浩二; 山田武士; 松林達史; 磯崎秀樹; 前田英作
    2007, 日本音響学会研究発表会講演論文集(CD-ROM), 2007, 1880-7658, 200902257906453498
  • Evaluation of the SOLON Speech Recognition System : 2006 Benchmark using the Corpus of Spontaneous Japanese
    NAKAMURA Atsushi; OBA Takanobu; WATANABE Shinji; ISHIZUKA Kentaro; FUJIMOTO Masakiyo; HORI Takaaki; MCDERMOTT Erik; MINAMI Yasuhiro
    This article describes results from the latest benchmark tests of our speech recognition system 'SOLON' using the Corpus of Spontaneous Japanese (CSJ). Improvements in recognition accuracy obtained with several techniques, including prior voice-activity detection, speaking-rate dependent analysis, corrective language modeling, discriminative training of full-covariance parameters, unsupervised model adaptation, and their combinations, are reported., The Institute of Electronics, Information and Communication Engineers, 15 Dec. 2006, IEICE technical report, 106, 444, 73-78, Japanese, 0913-5685, 110006163063, AN10013221
  • A Transdisciplinary Approach to Human-Computer Interaction with Kankyo Chinou : towards new "intellect" in future "environments"
    MAEDA Eisaku; MINAMI Yasuhiro; DOHSAKA KOHJI; MORI Akira; KONDOH Tadahisa
    A research project on "ambient intelligence" (Kankyo Chinou in Japanese) was launched two years ago by NTT Communication Science Laboratories that targeted new lifestyles made possible by communication science. Research activities on "ambient intelligence" should bridge the boundaries between technological fields and thus cover the entire field of communication science, rather than be limited to specific fields. Besides performing the basic R&D, we are striving to get this concept established in a comprehensive and strategic way. This article introduces achievements made in this project and the details of the developed demonstration systems., 一般社団法人電子情報通信学会, 12 Oct. 2006, IEICE technical report, 106, 298, 51-56, Japanese, 0913-5685, 110004851875
  • A Transdisciplinary Approach to Human-Computer Interaction with Kankyo Chinou : towards new "intellect" in future "environments"
    MAEDA Eisaku; MINAMI Yasuhiro; DOHSAKA KOHJI; MORI Akira; KONDOH Tadahisa
    A research project on "ambient intelligence" (Kankyo Chinou in Japanese) was launched two years ago by NTT Communication Science Laboratories that targeted new lifestyles made possible by communication science. Research activities on "ambient intelligence" should bridge the boundaries between technological fields and thus cover the entire field of communication science, rather than be limited to specific fields. Besides performing the basic R&D, we are striving to get this concept established in a comprehensive and strategic way. This article introduces achievements made in this project and the details of the developed demonstration systems., 一般社団法人電子情報通信学会, 12 Oct. 2006, IEICE technical report, 106, 300, 69-74, Japanese, 0913-5685, 110004852058
  • A Benchmark Evaluation of Speech Recognizer SOLON using The Corpus of Spontaneous Japanese (Ver. 1.0)
    NAKAMURA Atsushi; OBA Takanobu; WATANABE Shinji; ISHIZUKA Kentaro; HORI Takaaki; SCHUSTER Mike; MCDERMOTT Erik; MINAMI Yasuhiro
    SOLON is a speech recognition testbed system that has been developed at NTT Communication Science Laboratories. This paper reports results from the latest benchmark evaluation of SOLON using the Corpus of Spontaneous Japanese (CSJ). The effectiveness of several techniques, including minimum classification error training and full-covariance modeling, is demonstrated through experiments. Results of recognition error analysis and additional evaluations are also described., The Institute of Electronics, Information and Communication Engineers, 22 Dec. 2005, IEICE technical report, 105, 494, 7-12, Japanese, 0913-5685, 110003488505, AN10091225
  • A Benchmark Evaluation of Speech Recognizer SOLON using The Corpus of Spontaneous Japanese (Ver. 1.0)
    NAKAMURA Atsushi; OBA Takanobu; WATANABE Shinji; ISHIZUKA Kentaro; HORI Takaaki; SCHUSTER Mike; MCDERMOTT Erik; MINAMI Yasuhiro
    SOLON is a speech recognition testbed system that has been developed at NTT Communication Science Laboratories. This paper reports results from the latest benchmark evaluation of SOLON using the Corpus of Spontaneous Japanese (CSJ). The effectiveness of several techniques, including minimum classification error training and full-covariance modeling, is demonstrated through experiments. Results of recognition error analysis and additional evaluations are also described., Information Processing Society of Japan (IPSJ), 22 Dec. 2005, IPSJ SIG Notes, 2005, 127, 97-102, Japanese, 0919-6072, 110003494733, AN10442647
  • Applications of the Bayesian network to audio signal recognition
    Kashino Kunio; Minami Yasuhiro
    Acoustical Society of Japan, 2005, THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN, 61, 12, 714-719, Japanese, Peer-reviewed, Invited, Introduction scientific journal, 0369-4232, 110004019698, AN00186234
  • Speech recognition method based on trajectories generated by Kalman filters
    MINAMI Yasuhiro
    22 Dec. 2004, 情報処理学会研究報告. SLP, 音声言語情報処理, 54, 49-54, English, 0919-6072, 10014062518, AN10442647
  • 音声生成モデルを考慮した音声認識
    南泰浩
    2003, 日本音響学会誌, 59, 11, Japanese, Peer-reviewed, Invited, Introduction scientific journal
  • A Recognition Method with Parametric Trajectory Synthesized Using Direct Relations Between Static and Dynamic Feature Vector Time Series
    MINAMI Yasuhiro; MCDERMOTT Erik; NAKAMURA Atsushi; KATAGIRI Shigeru
    18 Mar. 2002, 日本音響学会研究発表会講演論文集, 2002, 1, 83-84, Japanese, 1340-3168, 10018033127, AN00351181
  • LANGUAGE MODEL SYNCHRONIZATION FOR IMPROVED BEAM-SEARCH PERFORMANCE IN LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
    WILLETT Daniel; MCDERMOTT Erik; MINAMI Yasuhiro; KATAGIRI Shigeru
    01 Oct. 2001, 日本音響学会研究発表会講演論文集, 2001, 2, 99-100, English, 1340-3168, 10007458257, AN00351181
  • An efficient search method for continuous speech recognition using network structure.
    HANAZAWA Ken; MINAMI Yasuhiro; FURUI Sadaoki
    01 Mar. 1997, 日本音響学会研究発表会講演論文集, 1997, 1, 51-52, Japanese, 1340-3168, 10002742165, AN00351181
  • Digit string recognition in spontaneous speech
    BAUCHE Etienne; MINAMI Yasuhiro; GAJIC Bojana; MATSUOKA Tatsuo; FURUI Sadaoki
    01 Mar. 1997, 日本音響学会研究発表会講演論文集, 1997, 1, 169-170, Japanese, 1340-3168, 10002742502, AN00351181
  • Extended HMM composition incorporating power variance.
    MINAMI Yasuhiro; FURUI Sadaoki
    01 Sep. 1996, 日本音響学会研究発表会講演論文集, 1996, 2, 141-142, Japanese, 1340-3168, 10002739836, AN00351181
  • 雑音と歪みを含んだ音声へのHMM適応化手法の評価
    南泰浩
    1996, 日本音響学会講演論文集, 85-86, 10004085867, AN00351181
  • 雑音と歪みを含んだ音声へのHMM適応化手法の評価
    南泰浩
    1996, 音講論集, 2, 10004086017, AN00351181
  • HMM adaptation method using maximum likelihood estimation
    MINAMI Yasuhiro; FURUI Sadaoki
    01 Sep. 1995, 日本音響学会研究発表会講演論文集, 1995, 2, 1-2, Japanese, 1340-3168, 10002734243, AN00351181
  • Adaptation method using maximum likelihood procedure based on HMM composition
    MINAMI Yasuhiro; FURUI Sadaoki
    This paper proposes an adaptation method for universal noise (combination of additive noise and multiplicative distortion) based on the HMM composition (compensation) technique. Although the original HMM composition can be applied only to additive noise, our new method can also estimate multiplicative distortion by maximizing the likelihood value. The signal-to-noise ratio is automatically estimated as part of the estimation of multiplicative distortion. Phoneme recognition experiments show that this method improves recognition accuracy for noisy and distorted speech., The Institute of Electronics, Information and Communication Engineers, 22 Jun. 1995, IEICE technical report. Speech, 95, 122, 45-50, Japanese, 110003296453, AN10013221
  • A maximum likelihood procedure for an HMM adaptation method
    MINAMI Yasuhiro; FURUI Sadaoki
    01 Mar. 1995, 日本音響学会研究発表会講演論文集, 1995, 1, 61-62, Japanese, 1340-3168, 10002731969, AN00351181
  • 自由発声音声認識における意味を考慮した2段 LR パーザの検討
    南泰浩
    1993, 音講論, 69-70, 10006730761, AN00351181
  • Comparisons between Continuous Speech Recognition Systems (ATREUS) at ATR.
    山口耕市; 嵯峨山茂樹; 服部浩明; 小森康弘; 沢井秀文; 花沢利行; 中村哲; 甲斐充彦; 南泰浩
    Oct. 1992, 日本音響学会研究発表会講演論文集, 1992, Autumn Pt 1, 181-182, Japanese, 1340-3168, 200902051842663488
  • 不特定話者連続音声データベースを用いたHMMの連結学習
    南泰浩
    1992, 音講論集, 9-10, 10006764155, AN00351181
  • TDNN音韻スポッティングと拡張LRパーザを用いた文節音声認識
    南泰浩
    1989, 平1秋音響講論集, 3, 10006754381

Books and other publications

  • 人工知能プロジェクト「ロボットは東大に入れるか」: 第三次AIブームの到達点と限界
    Scholarly book, Japanese, Joint work, 東京大学出版会, 28 Sep. 2018
  • これからの強化学習
    Scholarly book, Japanese, Joint work, 森北出版, 27 Oct. 2016
  • Predicting User Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and Prediction Models
    R. Higashinaka; Y. Minami; K. Dohsaka; T. Meguro
    English, Joint work, 2010
  • Dialogue Control by POMDP Using Dialogue Data Statistics
    Y. Minami; A. Mori; T. Meguro; R. Higashinaka; K. Dohsaka; E. Maeda
    English, Joint work, 2010
  • 環境知能のすすめ -情報化社会の新しいパラダイム-
    外村佳伸; 前田英作; 竹内郁雄; 東浩紀; 石黒浩; 下條信輔; 堂坂浩二; 南泰浩; 中島秀之; 輿水大和
    Japanese, Joint work, 2008
  • 音声認識の基礎(上)
    古井貞煕; 鹿野清宏; 嵯峨山茂樹; 松岡達雄; 南泰浩; 松井知子; 高橋敏; 山田智一; 吉岡理
    Japanese, Joint translation, 1995
  • FM7 解析マニュアル フェーズiii
    菊地寿; 蓑原辰夫; 南泰浩
    Japanese, Joint work, 1984

Lectures, oral presentations, etc.

  • 論文執筆支援を目的とした引用要否判定タスクのドメイン間比較
    小山康平; 小林恵大; 成松宏美; 南泰浩
    言語処理学会第29回年次大会
    16 Mar. 2023
  • 人間の多次元的な心的表象に基づく幼児語彙獲得モデルの構築
    藤田守太; 南泰浩
    言語処理学会第29回年次大会
    16 Mar. 2023
  • 知識グラフと Wikipedia を用いた雑談対話モデルの構築
    郭恩孚; 南泰浩
    言語処理学会第29回年次大会
    14 Mar. 2023
  • 共通基盤の構築における名付けの有用性の分析
    齋藤結; 光田航; 東中竜一郎; 南泰浩
    言語処理学会第29回年次大会
    15 Feb. 2023
  • 話題継続とペルソナを考慮した雑談対話システムの構築
    佐藤明智; 南泰浩; 金子俊太; 谷口伊織; 郭
    言語・音声理解と対話処理研究会
    Dec. 2022
  • 対話での共通基盤構築過程における名付けの分析
    齋藤結; 光田航; 東中竜一郎; 南泰浩
    Oral presentation, Japanese, 言語処理学会第28回年次大会, Domestic conference
    15 Mar. 2022
  • 引用要否判定タスクにおけるモデルの性能評価とデータの妥当性分析
    小山康平; 小林恵大; 成松宏美; 南泰浩
    Oral presentation, Japanese, 言語処理学会第28回年次大会, Domestic conference
    15 Mar. 2022
  • 学術論文PDFからの関連研究章と引用情報の抽出による論文執筆支援のためのデータセット構築
    小林恵大; 小山康平; 成松宏美; 南泰浩
    Oral presentation, Japanese, 言語処理学会第28回年次大会, Domestic conference
    15 Mar. 2022
  • 固有名詞に注目したTransformerによる雑談対話モデルの構築
    郭恩孚; 南泰浩
    Oral presentation, Japanese, 言語処理学会第28回年次大会, Domestic conference
    15 Mar. 2022
  • BERT による引用要否判定とエラー分析
    堂坂浩二; 成松宏美; 小山康平; 東中竜一郎; 南泰浩; 田盛大悟; 平
    人工知能学会全国大会
    Jun. 2021
  • 相互排他性を考慮した深層強化学習による幼児語彙獲得モデル
    藤田守太; 南泰浩; 田口真輝
    Oral presentation, Japanese, 言語処理学会第27回年次大会(NLP2021), Domestic conference
    17 Jan. 2021
  • 学術論文における関連研究の執筆支援のための被引用論文の推定
    小山康平; 南泰浩; 成松宏美; 堂坂浩二; 東中竜一郎; 田盛大悟; 平博順
    Oral presentation, Japanese, 言語処理学会第27回年次大会(NLP2021), Domestic conference
    17 Jan. 2021
  • 学術論文における関連研究の執筆支援のためのタスク設計およびデータ構築
    成松宏美; 小山康平; 堂坂浩二; 田盛大悟; 東中竜一郎; 南泰浩; 平博順
    Oral presentation, Japanese, 言語処理学会第27回年次大会(NLP2021), Domestic conference
    17 Jan. 2021
  • ニューラルネットワーク強化学習を用いた幼児語彙獲得のモデル化
    Oral presentation, Japanese, ヒューマンコミュニケーション基礎研究会, Domestic conference
    26 Jan. 2020
  • 乳児院入所児における言語発達の特徴-語彙数・語彙獲得順序・品詞カテゴリからの分析
    坂本有香; 奥村優子; 南泰浩; 麦谷綾子; 伊藤嘉余子; 小林哲生
    Oral presentation, Japanese, ヒューマンコミュニケーション基礎研究会, Domestic conference
    25 Jan. 2020
  • 幼児の語彙発達における地域差の分析
    坂本有香; 南泰浩; 曹 妍; 奥村優子; 小林哲生
    Oral presentation, Japanese, 赤ちゃん学会 第19 回学術集会, Domestic conference
    06 Jul. 2019
  • 多言語コーパスを用いた幼児語彙獲得時期での男女間相関の特性
    藤田浩貴; 南泰浩; 小林哲生; 奥村優子
    Oral presentation, Japanese, 言語処理学会第24回年次大会, Domestic conference
    Mar. 2018
  • 幼児の簡易語彙能力チェックリスト作成における幼児分類の効率化
    塚田元春; 南泰浩; 小林哲生; 奥村優子
    Oral presentation, Japanese, 言語処理学会第24回年次大会, Domestic conference
    Mar. 2018
  • DRQNによる幼児の語彙獲得のモデル化
    野口輝; 南泰浩
    Oral presentation, Japanese, 言語処理学会第24回年次大会, Domestic conference
    Mar. 2018
  • ニューラルネットワークと強化学習による幼児の語彙獲得のモデル化
    野口 輝; 南 泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
    Jan. 2018
  • 幼児の言語発達における共通ボキャブラリー指数の提案
    曹 妍; 南 泰浩; 奥村優子; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
    Jan. 2018
  • 多言語における幼児語彙獲得時期の男女間相関の比較
    藤田浩貴; 南 泰浩; 小林哲生; 奥村優子
    Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
    Jan. 2018
  • 幼児の能力推定のための簡易語彙チェックリストの提案
    森山佑亮; 南泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
    2017
  • マルチターン対話における次発話予測での効果的な特徴量の統合手法およびその分析
    玉木竜二; 南泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告 (言語理解とコミュニケーション), Domestic conference
    2017
  • 大規模幼児語彙発達データによる語彙獲得現象の分析
    森山佑亮; 南泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
    2017
  • 乳幼児の語理解・発話日齢に与える母親の教育年数の影響
    森山佑亮; 南泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
    2016
  • 語彙チェックリストアプリによる幼児語彙発達データ収集の試み
    小林哲生; 奥村優子; 南 泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告, Domestic conference
    Jan. 2016
  • 言語発達遅延児における語彙成長記録アプリ活用の試み
    阿久津由紀子; 小林哲生; 小形哲也; 渡辺佐和; 齋藤貴美子; 南泰浩
    Oral presentation, Japanese, 日本言語聴覚学会, Domestic conference
    2016
  • Three-way restricted boltzmann machine による音声モデリングに基づく話者・音素の同時認識
    中鹿亘; 南泰浩
    Oral presentation, Japanese, 研究報告音楽情報科学 (MUS)
    2016
  • 乳幼児の語理解・発話日齢に与える母親の教育年数の影響
    森山佑亮; 南 泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告, Domestic conference
    Jan. 2016
  • 幼児語彙習得順序における言語共通性と依存性について
    南泰浩; 小林哲生
    Poster presentation, Japanese, 日本音響学会秋季講演論文集, Domestic conference
    17 Mar. 2015
  • センター試験における英語問題の回答手法
    東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
    Poster presentation, Japanese, 言語処理学会第21回年次大会, Domestic conference
    17 Mar. 2015
  • 本幼児の語彙習得順序に関する性別依存性について
    南泰浩; 小林哲生
    Poster presentation, Japanese, 電子情報通信学会技術研究報告HCS
    30 Jan. 2015
  • 日本語習得児における語彙カテゴリ構成の発達的変遷
    小林哲生; 南泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告HCS
    2014
  • 幼児語彙習得順序における性別の影響について
    南泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告HCS
    2014
  • 1-2歳児における語彙カテゴリ構成の発達的変遷:大規模横断データを用いた検討
    小林哲生; 南泰浩
    Invited oral presentation, Japanese, 日本教育心理学会第56回総会(JAEP56)
    2014
  • 絵本を基にした対象年齢推定方法の検討
    藤田早苗; 小林哲生; 平博順; 南泰浩; 田中貴秋
    Invited oral presentation, Japanese, 第28回人工知能学会全国大会
    2014
  • 幼児コンテンツ制作支援のための語彙検索システムの提案とその評価
    小林哲生; 南泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告HIP
    2013
  • 幼児早期出現語理解-発話指標による幼児語彙学習特徴の検証
    南泰浩; 小林哲生; 杉山弘晃
    Oral presentation, Japanese, 電子情報通信学会技術研究報告TL
    2013
  • 単語の発話音韻長と幼児の語彙獲得期間との関係
    南泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2013
  • 語彙の身体性が獲得時期の個人差に与える影響
    杉山弘晃; 小林哲生; 南泰浩
    Invited oral presentation, Japanese, 赤ちゃん学会 第13回学術集会
    2013
  • 幼児コンテンツ制作支援のための語彙検索システム 語の習得月齢・習得率の指定による該当語の選択
    小林哲生; 南泰浩
    Invited oral presentation, Japanese, 第13回学術集会
    2013
  • 語の学習では本当に幼児は名詞を早く獲得する?―語の理解・発話日齢の推定による名詞優位性の言語間比較―
    南泰浩; 小林哲生; 杉山弘晃
    Invited oral presentation, Japanese, 赤ちゃん学会 第13回学術集会
    2013
  • 幼児早期出現語の理解-発話指標による名詞学習の優位性の検証
    南泰浩; 小林哲生
    Invited oral presentation, Japanese, 言語処理学会第19回年次大会
    2013
  • 折れ線近似による語彙爆発開始時期の推定
    南泰浩; 小林哲生; 杉山弘晃
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2012
  • 初期語彙発達の急増期における統計的性質と特徴量抽出
    南泰浩; 小林哲生
    Oral presentation, Japanese, 電子情報通信学会技術研究報告TL
    2012
  • POMDP を用いた聞き役対話制御部の Wizard of Oz 実験による評価
    目黒豊美; 南泰浩; 東中竜一郎; 堂坂浩二
    Invited oral presentation, Japanese, 人工知能学会全国大会(第26回)
    2012
  • 2ツイートを用いた対話モデルの構築
    東中竜一郎; 川前徳章; 貞光九月; 南泰浩; 目黒豊美; 堂坂浩二; 稲垣博人
    Invited oral presentation, Japanese, 言語処理学会第18回年次大会
    2012
  • 順序学習に基づく逆強化学習による対話制御
    杉山弘晃; 目黒豊美; 南泰浩
    Invited oral presentation, Japanese, 人工知能学会全国大会(第26回)
    2012
  • 語彙学習速度の線形性を利用した語彙学習日齢の予測
    杉山弘晃; 小林哲生; 南泰浩
    Invited oral presentation, Japanese, 赤ちゃん学会第 12 回学術集会
    2012
  • 幼児コンテンツ作成のための発達に即した語彙検索システムの作成
    小林哲生; 南泰浩
    Invited oral presentation, Japanese, 教育心理学会総会
    2012
  • 縦断および横断データを用いた幼児早期出現語の獲得月齢の特定
    小林哲生; 南泰浩; 永田昌明
    Invited oral presentation, Japanese, 言語処理学会第18回年次大会
    2012
  • 幼児の語彙学習速度と語彙カテゴリー構成
    小林哲生; 南泰浩; 杉山弘晃
    Invited oral presentation, Japanese, 赤ちゃん学会第 12 回学術集会
    2012
  • 線形関数とプラトー割り込みによる語彙発達モデルの検証―幼児の語彙発達におけるポアソン過程性の検証―
    南泰浩; 小林哲生; 杉山弘晃
    Invited oral presentation, Japanese, 赤ちゃん学会第 12 回学術集会
    2012
  • カルマンフィルタを用いた語彙発達におけるプラトー時期の推定
    南泰浩; 小林哲生; 杉山弘晃
    Invited oral presentation, Japanese, 音響学会秋季
    2012
  • 線形関数とプラトー割込による幼児語彙発達のモデル化
    南泰浩; 小林哲生; 杉山弘晃
    Invited oral presentation, Japanese, 言語処理学会第18回年次大会
    2012
  • POMDP を用いた聞き役対話システムの対話制御
    目黒豊美; 東中竜一郎; 南泰浩; 堂坂浩二
    Invited oral presentation, Japanese, 言語処理学会第17回年次大会
    2012
  • アクション継続長制御を利用する POMDP 対話制御
    南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
    Oral presentation, Japanese, 情報処理学会研究報告HCI
    2011
  • 共通状態と連結学習を用いた HMM によるコールセンタ対話の要約
    東中竜一郎; 南泰浩; 西川仁; 堂坂浩二; 目黒豊美; 小橋川哲; 政瀧浩和; 吉岡理; 高橋敏; 菊井玄一郎
    Invited oral presentation, Japanese, 言語処理学会第17回年次大会
    2011
  • アクション継続長制御を用いた POMDP による対話制御
    南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
    Invited oral presentation, Japanese, 人工知能学会全国大会論文集
    2011
  • ユーザ支援システムのための人の行動タイミング決定方策の分析
    杉山弘晃; 南泰浩
    Invited oral presentation, Japanese, 第28回日本ロボット学会学術講演会
    2011
  • 思考喚起型多人数対話システム--キャンプ
    堂坂浩二; 南泰浩
    Oral presentation, Japanese, 人工知能学会 言語・音声理解と対話処理研究会
    2010
  • POMDP による Trigram 対話制御
    南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 森啓; 前田英作
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2010
  • 保有知識の確信度に基づく対話型映像認識理解システムの質問生成戦略
    セクホン・ガーバチャン; 木村昭悟; 南泰浩; 坂野鋭; 前田英作
    Oral presentation, Japanese, 電子情報通信学会技術研究報告IBISML
    2010
  • 音声対話におけるエージェント発話行動の適応的調整
    堂坂浩二; 金本淳志; 東中竜一郎; 南泰浩; 前田英作
    Invited oral presentation, Japanese, 人工知能学会全国大会
    2010
  • 対話データを用いた POMDP による統計的対話制御手法の解析
    南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 前田英作
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2010
  • 統計的モデルを用いた POMDP による対話制御
    南泰浩; 目黒豊美; 東中竜一郎; 森啓; 堂坂浩二; 前田英作
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2010
  • 聞き役対話システムの構築を目的とした聞き役対話の分析
    目黒豊美; 東中竜一郎; 堂坂浩二; 南泰浩; 磯崎秀樹
    Oral presentation, Japanese, 情報処理学会研究報告NL
    2009
  • 対話データの統計量を用いた POMDP による対話制御
    南泰浩; 森啓; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
    Oral presentation, Japanese, 情報処理学会研究報告SLP
    2009
  • 音声認識システム SOLON における日本語講演音声への教師なし適応に関する評価
    大庭隆伸; 渡部晋治; 石塚健太郎; 藤本雅清; 堀貴明; マックダーモット・エリック; 南泰浩; 中村篤
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2009
  • POMDP を利用した思考喚起型対話の制御
    南泰浩; 澤木美奈子; 東中竜一郎; 堂坂浩二
    Oral presentation, Japanese, 情報処理学会研究報告SLP
    2008
  • クイズ対話システムの構築と音声認識性能による評価
    南泰浩; 東中竜一郎; 澤木美奈子; 堂坂浩二; 山田武士; 松林達史; 磯崎秀樹; 前田英作
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2007
  • カルマンフィルタに基づく音声認識手法における混合ガウス分布モデルの検討
    南泰浩
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2007
  • カルマンフィルタを用いた音声認識
    南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2007
  • 環境知能の実現に向けた分野横断型研究の試み
    前田英作; 南泰浩; 堂坂浩二; 森啓; 近藤公久
    Oral presentation, Japanese, 電子情報通信学会技術研究報告PRMU
    2006
  • 音声認識システム SOLON の日本語話し言葉コーパスによる評価(2006年版)
    中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
    Oral presentation, Japanese, 情報処理学会研究報告SLP
    2006
  • 音声認識システム SOLON の日本語話し言葉コーパス(公開版ver1.0)による評価
    中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2006
  • カルマンフィルタによる音声認識のための特徴量トラジェクトリ生成法
    南泰浩; マックダーモット・エリック; 中村篤
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2006
  • ベイズ的基準を用いた状態共有型 HMM 構造の選択
    渡部晋治; 南泰浩; 中村篤; 上田修功
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2005
  • 変分ベイズを用いた音声認識
    渡部晋治; 南泰浩; 中村篤; 上田修功
    Oral presentation, Japanese, 第8回情報論的学習理論ワークショップ予稿集
    2005
  • 音声認識システム SOLON の日本語話し言葉コーパス(公開版ver1.0)による評価
    中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
    Oral presentation, Japanese, 情報処理学会研究報告SLP
    2005
  • 音声特徴抽出法 SPADE における歪補正法の効果
    石塚健太郎; 宮崎昇; 中谷智広; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2005
  • カルマンフィルタにより生成されたトラジェクトリに基づく音声認識
    南泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2004
  • 音声認識でのダイナミクスの表現
    南泰浩
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2004
  • 帯域内での周期性・非周期性を表す音声特徴抽出法 SPADE の提案と AURORA-2J を用いた耐雑音性評価
    石塚健太郎; 宮崎昇; 中谷智広; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2004
  • 音声認識システム SOLON の日本語話し言葉コーパスにおける評価
    渡部晋治; 堀貴明; マクダーモット・エリック; 南泰浩; 中村篤
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2004
  • WFST の高速 on-the-fly 合成による超大語彙連続音声認識
    堀貴明; 堀智織; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2004
  • 有限状態トランスデューサ型デコーダの性能改善
    堀貴明; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2004
  • 特徴量トラジェクトリによる音声認識手法の理論的考察
    南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2004
  • 変分ベイズ法の音響モデル適応への応用
    渡部晋治; 南泰浩; 中村篤; 上田修功
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2004
  • 有限状態トランスデューサによる音声要約法の評価
    堀貴明; 堀智織; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2003
  • 有限状態トランスデューサによる音声認識・文整形・要約処理の統合
    堀貴明; 堀智織; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2003
  • 変分ベイズ法の音声認識への適用
    渡部晋治; 南泰浩; 中村篤; 上田修功
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2003
  • ベイズ的アプローチに基づく状態共有型 HMM 構造の学習
    渡部晋治; 南泰浩; 中村篤; 上田修功
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2002
  • 実対話音声を用いた有限状態トランスデューサ型認識デコーダの評価
    奈木野豪秀; ヴィレット・ダニエル; 南泰浩; 中村篤; マクダーモット・エリック; 宮崎昇; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    2002
  • セグメントモデルに基づく音声認識
    南泰浩
    Oral presentation, Japanese, 情報処理学会音声言語情報処理研究会SIG-SLP
    2002
  • 有限状態トランスデューサによる音声認識と文整形処理の統合
    堀貴明; ヴィレット・ダニエル; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2002
  • 混合分布型 HMM を用いたトラジェクトリパラメータ生成による音声認識手法の評価
    南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2002
  • 静的特徴量と動的特徴量の関係を用いたトラジェクトリパラメータ生成による音声認識手法
    南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2002
  • バイノーラル音源分離の音声認識による評価
    中谷智広; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2002
  • On-Line Transducer Composition for Memory-Efficient Search in LVCSR
    ヴィレット・ダニエル; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    2002
  • Pervasive Unsupervised Adaptation for Off-Line Lecture Speech Transcription
    ウィレット・ダニエル; ニスラー・トーマス; マクダーモット・エリック; 南泰浩; 片桐滋
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2002
  • Language Model Synchronization for Improved Beam-Search Performance in Large Vocabulary Continuous Speech Recognition
    ヴィレット・ダニエル; マックダーモット・エリック; 南泰浩; 片桐滋
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    2002
  • A Time-Synchronous Viterbi-Decoder for Arbitrary Speech Recognition Tasks Defined by Finite State Transducers
    ヴィレット・ダニエル; マックダーモット・エリック; 南泰浩; 中村篤; 片桐滋
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    Oct. 2001
  • 連続音声認識のためのネットワーク構造をもちいた効率的探索手法
    花沢健; 南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    Mar. 2001
  • 話者認識技術の実用化に向けて
    松井知子; 吉岡理; 南泰浩
    Oral presentation, Japanese, 映像情報メディア学会技術報告マルチメディア情報処理研究会
    1998
  • パワーの分散を考慮した拡張 HMM 合成法
    南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会講演論文集
    1997
  • 自由発声中の連続数字音声認識
    ボッシュ・エティエン; 南泰浩; ガジク・ボヤナ; 松岡達雄; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1997
  • Evaluation of Speech Recognition Performance Degradation for a Moving Speaker in Anechoic Conditions
    ジロン・フランク; 田中雅史; 古家賢一; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1997
  • 雑音と歪みを含んだ音声への HMM 適応化手法の評価
    南泰浩; 高木幸一; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1997
  • 尤度最大化原理による HMM 適応化法
    南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    Mar. 1996
  • パワーの分散を考慮した拡張 HMM 合成法
    南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1996
  • 雑音と歪みを含んだ音声への HMM 適応化手法の評価
    南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1996
  • HMM 合成に基づく尤度最大化適応法
    南泰浩; 古井貞煕
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    1995
  • 最尤推定法を用いた HMM 適応化法
    南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1995
  • 電話音声認識のための音響モデルの回線特性への適応化
    松岡達雄; グロ・ピエールエマニエル; 南泰浩; 古井貞煕
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1995
  • 電話番号案内を対象としたマルチモーダル対話システムの作成と音声入力の評価
    吉岡理; 南泰浩; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    1994
  • 電話番号案内を対象としたマルチモーダル対話システムにおける音声入力の評価
    吉岡理; 南泰浩; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1994
  • HMM トレリス計算における状態継続時間制限アルゴリズム
    高橋敏; 南泰浩; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1994
  • HMM トレリス計算における状態継続時間制限アルゴリズム
    高橋敏; 南泰浩; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    Oct. 1993
  • Improving Phoneme HMMs for Large-Vocabulary Spontaneous Speech Recognition
    高橋敏; 南泰浩; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    1993
  • ATR における連続音声認識システム ATREUS の諸方式と性能
    永井明人; 山口耕市; 鷹見淳一; 大倉計美; 小坂哲夫; 福沢圭二; 加藤喜永; S. Harald; 村上仁一; 杉山雅英; 嵯峨山茂樹; 保坂順子; 森元逞; 北研二; 服部浩明; 小森康弘; 沢井秀文; 花沢利行; 中村哲; 甲斐充彦; 南泰浩; 川端豪; 鹿野清宏; 榑松明
    Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
    1993
  • 電話番号案内を対象としたマルチモーダル対話システムの作成
    吉岡理; 南泰浩; 山田智一; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1993
  • 自由発声を対象とした不特定話者大語彙連続音声認識法
    南泰浩; 鹿野清宏; 高橋敏; 山田智一
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1993
  • 音韻環境依存 HMM と候補のマージを用いた不特定話者大語彙連続音声認識
    南泰浩; 高橋敏; 鹿野清宏; 山田智一
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1993
  • 自由発声音声認識における意味を考慮した2段 LR パーザの検討
    南泰浩; 山田智一; 吉岡理; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1993
  • HMM の合成による雑音下の大語彙連続音声認識
    南泰浩; マルタン・フランク; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1993
  • エルゴディック雑音 HMM と音韻 HMM の合成による雑音重畳音声の認識
    マルタン・フランク; 鹿野清宏; 南泰浩
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1993
  • 番号案内を対象とした自由発声の認識の試み
    鹿野清宏; 南泰浩; 山田智一
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1993
  • 音韻認識における HMM のラベルなし評価法
    南泰浩; 松岡達雄; 鹿野清宏
    Oral presentation, Japanese, 連続音声認識シンポジウムSPREC
    1992
  • フレーム間相関を用いた音韻 HMM
    高橋敏; 南泰浩; 松岡達雄; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
    1992
  • 不特定話者連続音声データベースによる連結学習 HMM の評価
    南泰浩; 松岡達雄; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
    1992
  • 番号案内を対象とした大語彙連続音声認識アルゴリズム
    南泰浩; 山田智一; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
    1992
  • Recognition of Noisy Speech by Composition of Hidden Markov Models
    マルタン・フランク; 鹿野清宏; 南泰浩; 岡部洋一
    Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
    1992
  • フレーム間相関を用いた連続型音韻 HMM
    高橋敏; 南泰浩; 松岡達雄; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1992
  • 音響学会連続音声データベースによる各種不特定話者 HMM の評価
    南泰浩; 高橋敏; 松岡達雄; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1992
  • 不特定話者連続音声データベースを用いた HMM の連結学習
    南泰浩; 松岡達雄; 鹿野清宏
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1992
  • Recognition of Noisy Speech by Using the Composition of Hidden Markov Models
    マルタン・フランク; 鹿野清宏; 南泰浩; 岡部洋一
    Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
    1992
  • セパレートベクトル量子化を用いた HMM 音声認識の耐雑音性に対する検討
    片岡淳; 南泰浩; 中川正雄
    Invited oral presentation, Japanese, 電子情報通信学会春季全国大会講演論文集
    1992
  • TDNN 音韻スポッティングと予測 LR パーザを用いた大語彙単語音声認識
    南泰浩; 沢井秀文; 宮武正典; 鹿野清宏
    Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
    1990
  • 有声音部の定常性を考慮したフレームレート選択型 PARCOR ボコーダ
    佐藤正俊; 南泰浩; 水井潔; 中川正雄
    Invited oral presentation, Japanese, 電子情報通信学会春季全国大会講演論文集
    1990
  • トリグラムモデルを用いた連続単語音声認識における自動単語分類
    神田直之; 南泰浩; 中川正雄
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1990
  • セパレートベクトル量子化を用いた HMM 音声認識の耐雑音性に関する検討
    片岡淳; 南泰浩; 中川正雄
    Oral presentation, Japanese, 第12回情報理論とその応用シンポジューム
    1989
  • 可変ビットレート ADPCM・PARCOR 混成音楽符号化方式
    岩元直久; 南泰浩; 中川正雄
    Oral presentation, Japanese, 第12回情報理論とその応用シンポジューム
    1989
  • ベクトル量子化を用いる可変フレームレート PARCOR ボコーダ
    佐藤正俊; 南泰浩; 水井潔; 中川正雄
    Oral presentation, Japanese, 第12回情報理論とその応用シンポジューム
    1989
  • HMM 連続音声認識の高速化
    南泰浩; 中川正雄
    Invited oral presentation, Japanese, 日本音響学会春季講演論文集
    1988
  • 構文解析とプロダクションシステムを付加した連続音声認識
    南泰浩; 中川正雄
    Invited oral presentation, Japanese, 電子情報通信学会春季全国大会講演論文集
    1988
  • ADF とプロダクションシステムによる異常信号の検出・除去
    南泰浩; 中川正雄
    Invited oral presentation, Japanese, 電子情報通信学会秋季全国大会講演論文集
    1988
  • ADF とプロダクションシステムによる異常信号の検出・除去
    南泰浩; 中川正雄
    Invited oral presentation, Japanese, 電子情報通信学会秋季全国大会講演論文集
    Mar. 1987
  • 複数の応答生成モデルを用いた音声雑談対話システムの構築とその対話選択方式の検討
    佐藤明智; 南泰浩; 郭恩孚
    人工知能学会全国大会

Courses

  • ヒューマンインターフェース
    The University of Electro-Communications
  • ヒューマンインターフェース
    電気通信大学
  • インターンシップ2海外長期
    The University of Electro-Communications
  • インターンシップ2海外長期
    電気通信大学
  • インターンシップ2海外
    The University of Electro-Communications
  • インターンシップ2海外
    電気通信大学
  • インターンシップ2(長期)
    The University of Electro-Communications
  • インターンシップ1海外長期
    The University of Electro-Communications
  • インターンシップ1(海外)
    The University of Electro-Communications
  • インターンシップ1(海外)
    電気通信大学
  • インターンシップ1(長期)
    The University of Electro-Communications
  • 認知インタラクションデザイン学
    京都工芸繊維大学
  • インターンシップ2
    The University of Electro-Communications
  • インターンシップ2
    電気通信大学
  • インターンシップ1
    The University of Electro-Communications
  • 情報システム基礎学合同輪講
    The University of Electro-Communications
  • 情報システム基礎学合同輪講
    電気通信大学
  • 情報システム基礎論1
    The University of Electro-Communications
  • Foundations of Information Systems 1
    The University of Electro-Communications
  • 情報システム基礎論1
    電気通信大学
  • 応用情報学特論第4
    岐阜大学
  • ネットワーク技術と高度情報化社会
    大阪大学

Affiliated academic society

  • 日本音響学会
  • IEEE
  • 情報処理学会
  • 電子情報通信学会
  • 言語処理学会

Research Themes

  • 幼児語彙発達大規模データの収集と工学的な解析に基づく語彙発達過程の解明
    南 泰浩
    日本学術振興会, 科学研究費助成事業 基盤研究(B), 電気通信大学, 基盤研究(B), 23H00623
    Apr. 2023 - Mar. 2027
  • 大規模データ処理による網羅的データを用いた言語発達機構の解析とその応用
    南泰浩
    Principal investigator
    01 Apr. 2017 - 31 Mar. 2020
  • 人とロボットの共生による協創社会の創成「人ロボット共生学」 ロボットのコミュニケーション戦略の生成
    01 Oct. 2009 - 31 Mar. 2013

Industrial Property Rights

  • 語彙発達指標推定装置、語彙発達指標推定方法、プログラム
    Patent right, 南 泰浩, 小林 哲生, 特許7213509, Date registered: 19 Jan. 2023
  • 能力推定装置,語選択装置,これらの方法及びプログラム
    Patent right, 南泰浩, 森山佑亮, 小林哲生, 特願2017-138791, Date applied: 18 Jul. 2017, 特許6850218, Date issued: 31 Mar. 2021
  • 幼児単語探索装置とその方法とプログラム
    Patent right, 2012119556, Date applied: 2012, 5806642, Date issued: 11 Sep. 2015
  • 語彙学習速度予測パラメータ生成装置と語彙学習速度予測装置とそれらの方法とプログラム
    Patent right, 2012119555, Date applied: 2012, 5785905, Date issued: 31 Jul. 2015
  • 難易度学習装置、難易度推定モデル学習装置、難易度推定装置、方法、及びプログラム
    Patent right, 特願2015-031004, Date applied: 19 Feb. 2015
  • 難易度推定モデル学習装置、難易度推定装置、方法、及びプログラム
    Patent right, 特願2015-031000, Date applied: 19 Feb. 2015
  • 難易度推定式学習装置、難易度推定装置、方法、及びプログラム
    Patent right, 特願2015-030997, Date applied: 19 Feb. 2015
  • 単語提示装置、計算装置、これらの方法及びプログラム
    Patent right, 特願2014-256876, Date applied: 19 Dec. 2014
  • 単語提示装置、方法及びプログラム
    Patent right, 特願2014-255495, Date applied: 17 Dec. 2014
  • 発話候補作成装置とその方法とプログラム
    Patent right, 2013035865, Date applied: 2013
  • 幼児語彙理解難易度評価装置と幼児語彙検索装置と幼児語彙分類装置と,それらの方法とプログラム
    Patent right, 2013024274, Date applied: 2013
  • 報酬関数推定装置,報酬関数推定方法,およびプログラム
    Patent right, 2012096453, Date applied: 2012
  • 理解語月齢テーブル生成装置,対象年齢推定装置,方法,及びプログラム
    Patent right, 2012128334, Date applied: 2012
  • 語彙学習関数推定装置,語彙学習関数推定方法及びそのプログラム
    Patent right, 2012192939, Date applied: 2012
  • 語彙学習関数推定装置,語彙学習関数推定方法及びそのプログラム
    Patent right, 2012192938, Date applied: 2012
  • 特徴検出装置,特徴検出方法及びそのプログラム
    Patent right, 2012192937, Date applied: 2012
  • 語彙学習曲線パラメータ推定装置,方法,及びプログラム
    Patent right, 2012029951, Date applied: 2012
  • 語彙学習曲線パラメータ推定装置,方法,及びプログラム
    Patent right, 2012029950, Date applied: 2012
  • 幼児単語探索装置とその方法とプログラム
    Patent right, 2012119556, Date applied: 2012
  • 対話学習装置,要約装置,対話学習方法,要約方法,プログラム
    Patent right, 2010179330, Date applied: 2011, 5346327
  • 語彙学習速度推定装置,方法,及びプログラム
    Patent right, 2012029949, Date applied: 2011
  • 語彙学習曲線パラメータ推定装置,方法,及びプログラム
    Patent right, 2012029950, Date applied: 2011
  • コミュニケーションエージェントの動作制御装置,コミュニケーションエージェントの動作制御方法,及びそのプログラム
    Patent right, 2011139777, Date applied: 2011
  • 対話モデル構築装置
    Patent right, 2011110989, Date applied: 2011
  • 文脈依存性推定装置,発話クラスタリング装置,方法,及びプログラム
    Patent right, 2011184054, Date applied: 2011
  • 行動タイミング決定装置,行動タイミング決定方法,およびそのプログラム
    Patent right, 2011035826, Date applied: 2011
  • 語彙爆発時期推定装置,方法,及びプログラム
    Patent right, 2011066456, Date applied: 2011
  • 語彙爆発時期推定装置,方法,及びプログラム
    Patent right, 2011060851, Date applied: 2011
  • 対話評価装置,方法及びプログラム
    Patent right, 2011110989, Date applied: 2011
  • 行動制御装置,行動制御方法及び行動制御プログラム
    Patent right, 2011050493, Date applied: 2011
  • 行動タイミング決定方法,およびそのプログラム
    Patent right, 2010203895, Date applied: 2010, 5361832
  • 行動制御装置,行動制御方法及び行動制御プログラム
    Patent right, 2010272627, Date applied: 2010, 5427163
  • 要約装置,要約作成方法及びプログラム
    Patent right, 2010271397, Date applied: 2010
  • 対話学習装置,対話分析装置,対話学習方法,対話分析方法
    Patent right, 2010126882, Date applied: 2010
  • 対話型映像認識理解における動的学習戦略に関する試み
    Patent right, 2011017057, Date applied: 2010
  • 多人数思考喚起型対話装置,多人数思考喚起型対話方法,多人数思考喚起型対話プログラム並びにそのプログラムを記録したコンピュータ読み取り可能な記録媒体
    Patent right, 2010186237, Date applied: 2010
  • 対話からの性格特徴判定装置
    Patent right, 2009215267, Date applied: 2009, 5281527
  • 聞き役対話識別装置
    Patent right, 2009192875, Date applied: 2009, 5150583
  • 多人数思考喚起型対話装置,多人数思考喚起型対話方法,多人数思考喚起型対話プログラム並びにそのプログラムを記録したコンピュータ読み取り可能な記録媒体
    Patent right, 2009028605, Date applied: 2009, 5218514
  • 行動制御学習方法,行動制御学習装置,行動制御学習プログラム
    Patent right, 2009199376, Date applied: 2009, 5361615
  • 音声信号モデル化方法,信号認識装置及び方法,パラメータ学習装置及び方法,特徴量生成装置及び方法並びにプログラム
    Patent right, 200949901, Date applied: 2009
  • 能力推定装置、方法及びプログラム
    Patent right, 特願2020-193982
  • 語選択装置、方法及びプログラム
    Patent right, 特願2020-193983
  • 語彙発達指標推定装置、語彙発達指標推定方法、プログラム
    Patent right, 特願2019-006697