Search Details | The University of Electro-Communications

Name

Author

Position

Affiliation

Research Areas

Yasuhiro MINAMI

Department of Computer and Network Engineering, Professor
Cluster I (Informatics and Computer Engineering), Professor
Artificial Intelligence eXploration Research Center, Professor

Profile:
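Graduated from the Department of Electrical Engineering, Faculty of Science and Technology, Keio University, in 1986, and completed the doctoral program at the same university's graduate school in 1991. Joined NTT the same year. Visiting researcher at MIT in 1999-2000. Currently a professor at the University of Electro-Communications. Doctor of Engineering. Engaged in research on speech recognition, spoken dialogue processing, and intelligent information processing. Received the Awaya Kiyoshi Academic Encouragement Award of the Acoustical Society of Japan in 1993, this society's Paper Award (本会論文賞) in 2003, the Telecom System Technology Award in 2005, the Excellent Paper Award for the 45th Anniversary Commemorative Papers of the Information Processing Society of Japan in 2006, and the Telecom System Technology Award in 2008. Member of IEEE, the Acoustical Society of Japan, the Institute of Electronics, Information and Communication Engineers, and the Information Processing Society of Japan.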

Researcher Information

Degree

Doctor of Engineering, Keio University

Field Of Study

Informatics, Learning support systems

Informatics, Intelligent robotics

Informatics, Intelligent informatics

Informatics, Human interfaces and interactions

Humanities & social sciences, Cognitive sciences

Career

01 Jul. 2014
The University of Electro-Communications, Graduate School of Information Systems

01 Jul. 2013 - 30 Jun. 2014
Nippon Telegraph and Telephone Corporation, Communication Science Laboratories, Group Leader

01 Mar. 2002 - 30 Jun. 2013
Nippon Telegraph and Telephone Corporation, Communication Science Laboratories, Senior Researcher

01 Mar. 1996 - 28 Feb. 2002
Nippon Telegraph and Telephone Corporation, Human Interface Laboratories, Senior Researcher

01 Apr. 1991 - 28 Feb. 1996
Nippon Telegraph and Telephone Corporation, Human Interface Laboratories

Educational Background

01 Apr. 1988 - 01 Mar. 1991
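Keio University, Graduate School of Science and Technology, Electrical Engineering Program

01 Apr. 1986 - 01 Mar. 1988
Keio University, Graduate School of Science and Technology, Electrical Engineering Program

01 Apr. 1982 - 01 Mar. 1986
Keio University, Faculty of Science and Technology, Department of Electrical Engineering

31 Mar. 1981
Tokyo Metropolitan Mita High School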

Member History

01 Apr. 2015
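Information Processing Society of Japan, Special Interest Group on Spoken Language Processing (SLP)

Aug. 2013
Secretary, Infant Language Development Research Group (幼児言語発達研究会), Society

2011 - 2013
Councilor, Kansai Chapter, Acoustical Society of Japan

2013
Committee Member, Kansai Chapter, Information Processing Society of Japan, Society

2009 - 2012
Secretary, Kansai Chapter, Information Processing Society of Japan

2011
Delegate, Acoustical Society of Japan, Society

2011
Councilor, Acoustical Society of Japan, Society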

Research Activity Information

Award

Mar. 2023
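The Association for Natural Language Processing
Outstanding Paper Award, 29th Annual Meeting of the Association for Natural Language Processing, 藤田守太; Yasuhiro Minami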

Mar. 2022
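The Association for Natural Language Processing
Analysis of Naming in the Process of Building Common Ground in Dialogue
Committee Special Award, 28th Annual Meeting of the Association for Natural Language Processing, 齋藤結; 光田航; 東中竜一; Yasuhiro Minami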
Japan society

Mar. 2020
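The Association for Natural Language Processing
Realizing a High-Performance English Solver for the National Center Test
Outstanding Paper Award, 26th Annual Meeting of the Association for Natural Language Processing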
Japan society

Mar. 2018
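The Association for Natural Language Processing
Modeling Infant Vocabulary Acquisition with DRQN
Young Researcher Encouragement Award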

Mar. 2017
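The Association for Natural Language Processing
The "Can a Robot Get into the University of Tokyo?" Project: Error Analysis on the Yozemi Center Mock Exam Tasks
Paper Award of the Association for Natural Language Processing, 松崎拓也; 横野光; 宮尾祐介; 川添愛; 狩野芳伸; 加納隼人; 佐藤理史; Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami; 新井紀子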

2014
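Outstanding Paper Award, 2014 Annual Conference of the Japanese Society for Artificial Intelligence (awarded only to the first author, Meguro)

2013
Outstanding Paper Award, 18th Annual Meeting of the Association for Natural Language Processing

2012
Outstanding Poster Presentation Award, Japanese Society of Baby Science

2012
Commendation from the Director of the NTT Intellectual Property Center

2011
Outstanding Paper Award, 2011 Annual Conference of the Japanese Society for Artificial Intelligence (awarded only to the first author, Dohsaka)

2010
Outstanding Paper Award, 2010 Annual Conference of the Japanese Society for Artificial Intelligence (awarded only to the first author, Dohsaka)

2008
Telecom System Technology Award, Yasuhiro Minami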

2008
COLING Best paper finalist

2007
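NTT Technical Review Feature Article Award

2007
Special Award, Commendation from the Director of NTT Communication Science Laboratories

2006
Excellent Paper Award, IPSJ 45th Anniversary Commemorative Papers "Toward Information Science and Technology 50 Years from Now", Yasuhiro Minami

2005
Telecom System Technology Award, Yasuhiro Minami

2005
R&D Award, Commendation from the Director of NTT Science and Core Technology Laboratory Group

2005
Commendation from the Director of NTT Communication Science Laboratories

2004
IEICE Paper Award, Yasuhiro Minami

1996
Commendation from the Head of the NTT Research and Development Headquarters

1993
Awaya Kiyoshi Academic Encouragement Award, Acoustical Society of Japan, Yasuhiro Minami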

Paper

Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers
Keita Kobayashi; Kohei Koyama; Hiromi Narimatsu; Yasuhiro Minami
13th Edition of the Language Resources and Evaluation Conference, to appear, 14 Jun. 2022, Peer-reviewed
International conference proceedings, English

Probabilistic model using HDP producing vocabularies of Japanese children
Yasuhiro Minami; Tessei Kobayashi
Lead, Conference on Interdisciplinary Advances in Statistical Learning, 96-96, Jun. 2022, Peer-reviewed

Characteristic of language development mechanism of children in the residential care institution
Yuka Sakamoto; Yuko Okumura; Yasuhiro Minami; Ryoko Mugitani; Kayoko Ito; Tessei Kobayashi
International Congress of Psychology, ICP2020, 19 Jul. 2021, Peer-reviewed
International conference proceedings, English

Predicting New Words for Young Japanese Children using Large-scaled Japanese Child Vocabulary Development Database
Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi; Yuka Sakamoto
International Congress of Psychology, ICP2020, 19 Jul. 2021, Peer-reviewed
International conference proceedings, English

Using mobile phone data to estimate the relationship between population flow and influenza infection pathways
Qiushi Chen; Michiko Tsubaki; Yasuhiro Minami; Kazutoshi Fujibayashi; Tetsuro Yumoto; Junzo Kamei; Yuka Yamada; Hidenori Kominato; Hideki Oono; Toshio Naito
MDPI, International Journal of Environmental Research and Public Health, 18, 14, 2021, Peer-reviewed, This study aimed to analyze population flow using global positioning system (GPS) location data and evaluate influenza infection pathways by determining the relationship between population flow and the number of drugs sold at pharmacies. Neural collective graphical models (NCGMs; Iwata and Shimizu 2019) were applied for 25 cell areas, each measuring 10 × 10 km², in Osaka, Kyoto, Nara, and Hyogo prefectures to estimate population flow. An NCGM uses a neural network to incorporate the spatiotemporal dependency issue and reduce the estimated parameters. The prescription peaks between several cells with high population flow showed a high correlation with a delay of one to two days or with a seven-day time lag. It was observed that little of the population flows from one cell to the outside area on weekdays. This observation may have been due to geographical features and undeveloped transportation networks. The number of prescriptions for anti-influenza drugs in that cell remained low during the observation period. The present results indicate that influenza did not spread to areas with undeveloped traffic networks, and the peak number of drug prescriptions arrived with a time lag of several days in areas with a high amount of area-to-area movement due to commuting.
Scientific journal, English

Task Definition and Integration For Scientific-Document Writing Support
H. Narimatsu; K. Koyama; K. Dohsaka; R. Higashinaka; Y. Minami; H. Taira
Online: Association for Computational Linguistics, to appear, 18-26, 2021, Peer-reviewed
International conference proceedings, English

Properties of early vocabulary development in Japanese-English bilingual children
Yuka Sakamoto; Yuko Okumura; Tessei Kobayashi; Yasuhiro Minami
BCCCD 2020 (Budapest CEU Conference on Cognitive Development), PB-038, 20 Jan. 2020, Peer-reviewed
International conference proceedings, English

Vocabulary Size As Explanatory Variable for Japanese-Speaking Children’s Vocabulary Development
Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi
ICPS, to appear, 07 Mar. 2019, Peer-reviewed
International conference proceedings, English

Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
Yasuhiro Minami; Tessei Kobayashi; Yuko Okumura
LREC 2018, P26, 10 May 2018, Peer-reviewed
International conference proceedings, English

Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development
Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi
LREC 2018, P55, 10 May 2018, Peer-reviewed
International conference proceedings, English

Acquisition of infant-directed speech words in Japanese-speaking children: Analysis using large-scale vocabulary-checklist data
Yuko OKUMURA; Tessei KOBAYASHI; Yasuhiro MINAMI; Yusuke MORIYAMA
Interdisciplinary Advances in Statistical Learning, to appear, 28 Jun. 2017, Peer-reviewed
International conference proceedings, English

Word acquisition correlation in Japanese-speaking children using large-scale infant vocabulary development database
Yasuhiro Minami; Yusuke Moriyama; Tessei Kobayashi; Yuko Okumura
Interdisciplinary Advances in Statistical Learning, to appear, 28 Jun. 2017, Peer-reviewed
International conference proceedings, English

Acquisition of mental state language in Japanese-speaking children: Analysis using large-scale vocabulary-checklist data
Yuko OKUMURA; Tessei KOBAYASHI; Yasuhiro MINAMI
WILD, to appear, 14 Jun. 2017, Peer-reviewed
International conference proceedings, English

Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion
Toru Nakashika; Yasuhiro Minami
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, SPRINGER INTERNATIONAL PUBLISHING AG, 1-10, Jun. 2017, Peer-reviewed, In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data, that is, pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the data used for the training is limited to the pre-defined sentences, (2) the trained model is only applied to the speaker pair used in the training, and (3) a mismatch in alignment may occur. Although it is generally preferable in VC to not use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize the non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed the conventional non-parallel approach regarding objective and subjective criteria.
Scientific journal, English

Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
Toru Nakashika; Tetsuya Takiguchi; Yasuhiro Minami
IEEE Transactions on Audio, Speech and Language Processing, 24, 11, 2045, Oct. 2016, Peer-reviewed
Scientific journal, English

Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
Toru Nakashika; Yasuhiro Minami
Proceedings of the 17th Conference of the International Speech Communication Association (Interspeech 2016), 1487-1491, Sep. 2016, Peer-reviewed
International conference proceedings, English

3WRBM-Based Speech Factor Modeling for Arbitrary-Source and Non-Parallel Voice Conversion
Toru Nakashika; Yasuhiro Minami
Interspeech 2016, 1487-1491, Sep. 2016, Peer-reviewed
International conference proceedings, English

Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
Toru Nakashika; Tetsuya Takiguchi; Yasuhiro Minami
IEEE/ACM Transactions on Audio, Speech and Language Processing, 23, 3, 1-14, Aug. 2016, Peer-reviewed
Scientific journal, English

3WRBM-Based Speech Factor Modeling for Arbitrary-Source and Non-Parallel Voice Conversion
Toru Nakashika; Yasuhiro Minami
EUSIPCO, 607-611, Aug. 2016, Peer-reviewed
International conference proceedings, English

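The "Can a Robot Get into the University of Tokyo?" Project: Error Analysis on the Yozemi Center Mock Exam Tasks
松崎拓也; 横野光; 宮尾祐介; 川添愛; 狩野芳伸; 加納隼人; 佐藤理史; Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami; 新井紀子
Journal of Natural Language Processing (自然言語処理), 23, 1, Jan. 2016, Peer-reviewed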
Scientific journal, Japanese

SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION
Toru Nakashika; Yasuhiro Minami
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, IEEE, 5530-5534, 2016, Peer-reviewed, In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only speaker-specific information in source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data, that is, pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: 1) the data used for the training is limited to the pre-defined sentences, 2) the trained model is only applied to the speaker pair used in the training, and 3) a mismatch in alignment may occur. Although it is thus fairly preferable in VC not to use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize the non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by mixing the two. Our experimental results showed that our approach unfortunately fell short of the popular conventional GMM-based method that used parallel data, but outperformed the conventional non-parallel approach.
International conference proceedings, English

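A Method for Estimating the Target Age of Texts for Young Children
藤田早苗; Tessei Kobayashi; Yasuhiro Minami; Hiroaki Sugiyama
Cognitive Studies (認知科学), 22, 4, 1-17, Dec. 2015, Peer-reviewed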
Scientific journal, Japanese

Fluctuating Development of Common Nouns and Predicates in Early Lexical Development: Evidence from Analysis of Large sample Vocabulary Checklist Data in Japanese children
Tessei Kobayashi; Yasuhiro Minami; Yuko Okumura
ECDP, to appear, 08 Sep. 2015, Peer-reviewed
International conference proceedings, English

Taking the English exam for the "can a robot get into the University of Tokyo?" project
Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami
NTT Technical Review, 13, 7, 01 Jul. 2015, NTT and its research partners are participating in the "Can a robot get into the University of Tokyo?" project run by the National Institute of Informatics, which involves tackling English exams. The artificial intelligence system we developed took a mock test in 2014 and achieved a better-than-human-average score for the first time. This was a notable achievement since English exams require English knowledge and also common sense knowledge that humans take for granted but that computers do not necessarily possess. In this article, we describe how our artificial intelligence system takes on English exams.
Scientific journal

Gender variability of child word-comprehension and -production days
Yasuhiro Minami; Tessei Kobayashi
WILD, TBD, 10 Jun. 2015, Peer-reviewed
International conference proceedings, English

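Response Generation for User Utterances on Arbitrary Topics Using Dependency Relations and Examples
Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
Transactions of the Japanese Society for Artificial Intelligence, The Japanese Society for Artificial Intelligence, 30, 1, 183-194, Jan. 2015, Peer-reviewed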
Scientific journal, Japanese

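Methods for Answering English Questions in the "Can a Robot Get into the University of Tokyo?" Project
Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami
NTT Technical Journal (NTT技術ジャーナル), The Telecommunications Association, 27, 4, 63-66, 2015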
Research institution, Japanese

Effects of Conversational Agents on Activation of Communication in Thought-Evoking Multi-Party Dialogues
Kohji Dohsaka; Ryota Asai; Ryuichiro Higashinaka; Yasuhiro Minami; Eisaku Maeda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E97D, 8, 2147-2156, Aug. 2014, Peer-reviewed, This paper presents an experimental study that analyzes how conversational agents activate human communication in thought-evoking multi-party dialogues between multi-users and multi-agents. A thought-evoking dialogue is a kind of interaction in which agents act to provoke user thinking, and it has the potential to activate multi-party interactions. This paper focuses on quiz-style multi-party dialogues between two users and two agents as an example of thought-evoking multi-party dialogues. The experimental results revealed that the presence of a peer agent significantly improved user satisfaction and increased the number of user utterances in quiz-style multi-party dialogues. We also found that agents' empathic expressions significantly improved user satisfaction, improved user ratings of the peer agent, and increased the number of user utterances. Our findings should be useful for activating multi-party communications in various applications such as pedagogical agents and community facilitators.
Scientific journal, English

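Correlation between Word Length and the Timing and Duration of Children's Vocabulary Acquisition
Yasuhiro Minami; Tessei Kobayashi
Phonetic Society of Japan, 17, 3, 44-53, Mar. 2014, Peer-reviewed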
Scientific journal, Japanese

Large-scale collection and analysis of personal question-answer pairs for conversational agents
Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 8637, 420-433, 2014, Peer-reviewed, In conversation, a speaker sometimes asks questions that relate to another speaker's detailed personality, such as his/her favorite foods and sports. This behavior also appears in conversations with conversational agents; therefore, agents should be developed that can respond to such questions. In previous agents, this was achieved by creating question-answer pairs defined by hand. However, when a small number of persons create the pairs, we cannot know what types of questions are frequently asked. This makes it difficult to know whether the created questions cover frequently asked questions; therefore, such essential question-answer pairs for conversational agents are possibly overlooked. This study analyzes a large number of question-answer pairs for six personae created by many question-generators, with one answer-generator for each persona. The proposed approach allows many questioners to create questions for various personae, enabling us to investigate the types of questions that are frequently asked. A comparison with questions appearing in conversations between humans shows that 50.2% of the questions were contained in our question-answer pairs and the coverage rate was almost saturated with the 20 recruited question-generators.
International conference proceedings, English

OPEN-DOMAIN UTTERANCE GENERATION USING PHRASE PAIRS BASED ON DEPENDENCY RELATIONS
Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, IEEE, PM2.201, 60-65, 2014, Peer-reviewed, The development of open-domain conversational systems remains difficult since user utterances are too widely varied for such systems to respond appropriately. To address this issue, previous research has retrieved sentences from the web as system utterances by shallow sentence matching with user utterances. However, since the retrieved sentences include the inherent contexts of the document in which the sentences originally appeared, the retrieved sentences may contain information that is irrelevant to user utterances. We propose combining two strongly related semantic units (phrase pairs with dependency relations) to create a system utterance. Here, the first semantic unit is the one found in the user utterance and the second semantic unit is the one that has a dependency relation with the first one in a large text corpus. This way, we can guarantee that the generated utterance is related to the input user utterance. Our experiments, which examine the appropriateness of response sentences, show that our proposed method significantly outperforms other retrieval and rule-based approaches.
International conference proceedings, English

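A Vocabulary Search System Adapted to Child Development
Yasuhiro Minami; Tessei Kobayashi
IEICE Transactions (電子情報通信学会論文誌), The Institute of Electronics, Information and Communication Engineers, J96-D, 10, 2612-2624, Oct. 2013, Peer-reviewed, In this paper, we attempted to build a vocabulary search system to support basic research on child development and the creation of content for young children. Using large-scale cross-sectional data collected and analyzed with the vocabulary checklist method, the system makes it possible to examine, easily and with high accuracy, when, which words, and to what extent children learning Japanese acquire words in terms of comprehension and production.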
Scientific journal, Japanese

Open-Domain Utterance Generation for Conversational Dialogue Systems Using Web-Scale Dependency Structures
H. Sugiyama; T. Meguro; R. Higashinaka; Y. Minami
SIGdial, 22-24, Aug. 2013, Peer-reviewed
International conference proceedings, English

Vocabulary Spurt and Noun Acquisition: Evidence from Longitudinal Data in Japanese-Speaking Children
T. Kobayashi; Y. Minami; H. Sugiyama
CLS, Poster, Jun. 2013, Peer-reviewed
International conference proceedings, English

Influence of Predominance in Noun Learning Examined by Period from Comprehending to Producing Words: A Cross-Linguistic Statistical Investigation Using CDI
Y. Minami; T. Kobayashi
WILD, Poster, Jun. 2013, Peer-reviewed
International conference proceedings, English

Cross-Linguistic Universality of Word Acquisition Ages in Comprehension and Production
Y. Minami; T. Kobayashi
WILD, Poster, Jun. 2013, Peer-reviewed
International conference proceedings, English

Individual Variation of Word Acquisition Age: A Comparison of Japanese- and English-Speaking Infants
H. Sugiyama; T. Kobayashi; Y. Minami
WILD, Poster, Jun. 2013, Peer-reviewed
International conference proceedings, English

Word-Class Composition in First 20 Words Predicts Later Word Acquisition Rate
T. Kobayashi; Y. Minami; H. Sugiyama
SRCD Biennial Meeting, 3-044 (157), Apr. 2013, Peer-reviewed
International conference proceedings, English

Further Examination of "A New Perspective on the Vocabulary Spurt"
Tessei Kobayashi; Yasuhiro Minami; Hiroaki Sugiyama
Baby Science (ベビーサイエンス), 12, 55-58, Mar. 2013, Peer-reviewed
Scientific journal, Japanese

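A New Perspective on the Vocabulary Spurt: Longitudinal Data Analysis of Early Vocabulary Development in Japanese-Learning Children
Tessei Kobayashi; Yasuhiro Minami; Hiroaki Sugiyama
Baby Science (ベビーサイエンス), 12, 34-49, Mar. 2013, Peer-reviewed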
Scientific journal, Japanese

Learning to control listening-oriented dialogue using partially observable markov decision processes
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
ACM Transactions on Speech and Language Processing, 10, 4, 761-769, 2013, Peer-reviewed, Our aim is to build listening agents that attentively listen to their users and satisfy their desire to speak and have themselves heard. This article investigates how to automatically create a dialogue control component of such a listening agent. We collected a large number of listening-oriented dialogues with their user satisfaction ratings and used them to create a dialogue control component that satisfies users by means of Partially Observable Markov Decision Processes (POMDPs). Using a hybrid dialog controller where high-level dialog acts are chosen with a statistical policy and low-level slot values are populated by a wizard, we evaluated our dialogue control method in a Wizard-of-Oz experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This article is the first to verify, by using human users, the usefulness of POMDP-based dialogue control for improving user satisfaction in non-task-oriented dialogue systems.
Scientific journal, English

Differences between Noun and Verb Learning Periods from Comprehension to Production in Early Language Development
Y. Minami; T. Kobayashi
BCCCD, 174-174, Jan. 2013, Peer-reviewed
International conference proceedings, English

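Analysis of Listening-Oriented Dialogue and Construction of a Dialogue Controller Based on the Analysis
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
IPSJ Journal (情報処理学会論文誌), 53, 12, 2787-2801, Dec. 2012, Peer-reviewed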
Scientific journal, Japanese

Vocabulary Spurt and Word-Class Composition: Further Evidence for a Model of Plateaus and Linearity in Early Vocabulary Growth
T. Kobayashi; Y. Minami; H. Sugiyama
AMLaP, Poster, Sep. 2012, Peer-reviewed
International conference proceedings, English

Plateaus and Linearity of Early Vocabulary Growth
Y. Minami; T. Kobayashi; H. Sugiyama
ISSBD, P3.73, Jul. 2012, Peer-reviewed
International conference proceedings, English

Prediction of Vocabulary Growth Using Local Linearity
H. Sugiyama; Y. Minami; T. Kobayashi
ISSBD, P3.67, Jul. 2012, Peer-reviewed
International conference proceedings, English

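Analysis of Listening-Oriented Dialogue and Construction of a Dialogue Controller Based on the Analysis
Toyomi Meguro; Ryuichiro Higashinaka; Kohji Dohsaka; Yasuhiro Minami
IPSJ Journal (情報処理学会論文誌), 52, 11, 2012, Peer-reviewed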
Scientific journal, Japanese

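Estimating Users' Latent Information Demands for Systems That Lead Information-Providing Dialogues
Hiroaki Sugiyama; Yasuhiro Minami
IEICE Transactions A (電子情報通信学会論文誌A), The Institute of Electronics, Information and Communication Engineers, 95-A, 1, 79-84, Jan. 2012, Peer-reviewed, This study proposes a new policy for deciding when to present information, based on estimating the user's latent information demands, for systems that present information to users. With this policy, a system can suppress premature information presentation and proactively present information without annoying the user. To verify the contribution of multimodal information to this policy, we first conduct human-human interaction experiments and analyze how the accuracy of human estimation of information demands changes as the available modalities change. The analysis shows that people rely on the flow of the dialogue when multimodal information is unavailable and use multimodal information when it is available. Based on this result, we propose a model that realizes human-like estimation of information demands and demonstrate its effectiveness through word-association quiz dialogue experiments designed to bring out users' latent information demands.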
Scientific journal, Japanese

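POMDP Dialogue Control Based on Action Prediction Probabilities from Dialogue-Act-Type Sequence Trigrams
Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka; Toyomi Meguro; Akira Mori; Eisaku Maeda
IEICE Transactions A (電子情報通信学会論文誌A), The Institute of Electronics, Information and Communication Engineers, 95-A, 1, 2-15, Jan. 2012, Peer-reviewed, We have been modeling dialogue control with POMDPs for non-task-oriented dialogue. POMDP-based dialogue control generates dialogue sequences that obtain large rewards in the short term, but it is not necessarily suited to generating relatively long, natural dialogue flows. We therefore introduced into the POMDP a new reward that realizes a trade-off between the reward defined in the POMDP and a reward for selecting actions with high predictive probability. In this paper, we use trigram probabilities of dialogue-act-type sequences as the action prediction probabilities and incorporate them into POMDP-based dialogue control. The proposed method thus realizes dialogue control through a trade-off between the reward defined in the POMDP and the reward based on trigram action prediction probabilities. It enables dialogue control that considers both objectives simultaneously, which conventional trigram-based dialogue control could not achieve. It also retains robustness to recognition errors, a characteristic of POMDPs. We formulate the proposed method, train the model on real dialogue-act-type sequence data, and evaluate the method in simulation experiments. To simulate recognition errors in this evaluation, we implemented a dialogue-act-type recognizer that converts dialogue sentences into dialogue-act-type sequences and used the resulting recognition tendencies. The experiments confirmed the effectiveness of the proposed method and showed that it is more robust to dialogue-act-type recognition errors than dialogue control based on trigram probabilities alone.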
Scientific journal, Japanese

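Estimating Difficulty/Ease and Interest/Boredom from Users' Nonverbal Behavior in Dialogues with Anthropomorphic Agents
中村和晃; 角所考; 正司哲朗; 美濃導彦; 澤木美奈子; Yasuhiro Minami; Eisaku Maeda
IEICE Transactions A (電子情報通信学会論文誌A), The Institute of Electronics, Information and Communication Engineers, 95-A, 1, 85-96, Jan. 2012, Peer-reviewed, This study targets spoken dialogue between a user and an anthropomorphic agent and aims to estimate, from the user's nonverbal behavior (gaze, facial expression, posture, and hand gestures), whether the user found the dialogue content difficult ("difficulty/ease") and whether the user was interested in it ("interest/boredom"). In spoken dialogue, a certain amount of time generally elapses before the content is determined and conveyed, so mental states such as difficulty/ease toward that content are also defined over a time interval. Within such an interval, however, situational changes such as speaker/listener turn-taking and shifts in dialogue context occur frequently, and the relationship between the mental state and nonverbal behavior at each moment varies widely according to the role each situation plays in the dialogue as a whole. It is therefore difficult to estimate difficulty/ease or interest/boredom, which are defined per time interval, using moment-by-moment nonverbal behavior as features. We instead propose estimating difficulty/ease and interest/boredom using features defined per time interval rather than per moment, specifically the frequency with which each type of nonverbal behavior is expressed. An experiment to verify the effectiveness of the proposed method yielded an estimation accuracy of about 72%.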
Scientific journal, Japanese

Preference-learning based Inverse Reinforcement Learning for Dialog Control
Hiroaki Sugiyama; Toyomi Meguro; Yasuhiro Minami
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, ISCA-INT SPEECH COMMUNICATION ASSOC, Mon.P1d.03, 222-225, 2012, Peer-reviewed, Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set the appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL) that estimates a reward function from dialog sequences and their pairwise preferences, which are calculated from annotated ratings of the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This indicates that current IRL supposes that the sequences are equally appropriate for a given task; thus, it cannot utilize the ratings. In contrast, our PIRL can utilize pairwise preferences of the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons between competitive algorithms that have been widely used to realize the dialog control. Our experiments show that our PIRL outperforms the other algorithms and has the potential to be an evaluation simulator of dialog control.
International conference proceedings, English

Multiple Vocabulary Spurts in Japanese Children
Y. Minami; H. Sugiyama; T. Kobayashi
IASCL, Poster, Jul. 2011, Peer-reviewed
International conference proceedings, English

Analysis of Vocabulary Spurt from Prediction Performance Evaluation
H. Sugiyama; T. Kobayashi; Y. Minami
SRCD, Poster, Mar. 2011, Peer-reviewed
International conference proceedings, English

Dialogue Control by POMDP Using Dialogue Data Statistics
Yasuhiro Minami; Akira Mori; Toyomi Meguro; Ryuichiro Higashinaka; Kohji Dohsaka; Eisaku Maeda
Spoken Dialogue Systems Technology and Design, Springer, 163-186, 2011, Peer-reviewed
Scientific journal, English

Information Provision-timing Control for Informational Assistance Robot
Hiroaki Sugiyama; Yasuhiro Minami
PROCEEDINGS OF THE 6TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTIONS (HRI 2011), IEEE, 259-260, 2011, Peer-reviewed, This paper proposes an HMM-based model for estimating a user's information demands, so that autonomous informational assistance robots can avoid providing information prematurely. The model estimates the user's implicit information demands by predicting the user's next information request from the user's head movements. Through a word-association quiz-dialog experiment, our model demonstrated superior prediction performance over the usual HMM-based classifier.
International conference proceedings, English

Unsupervised Clustering of Utterances using Non-parametric Bayesian Methods
Ryuichiro Higashinaka; Noriaki Kawamae; Kugatsu Sadamitsu; Yasuhiro Minami; Toyomi Meguro; Kohji Dohsaka; Hirohito Inagaki
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, ISCA-INT SPEECH COMMUNICATION ASSOC, 2092-2095, 2011, Peer-reviewed, Unsupervised clustering of utterances can be useful for the modeling of dialogue acts for dialogue applications. Previously, the Chinese restaurant process (CRP), a non-parametric Bayesian method, has been introduced and has shown promising results for the clustering of utterances in dialogue. This paper newly introduces the infinite HMM, which is also a non-parametric Bayesian method, and verifies its effectiveness. Experimental results in two dialogue domains show that the infinite HMM, which takes into account the sequence of utterances in its clustering process, significantly outperforms the CRP. Although the infinite HMM outperformed other methods, we also found that clustering complex dialogue data, such as human-human conversations, is still hard when compared to human-machine dialogues.
International conference proceedings, English

Evaluation of Listening-oriented Dialogue Control Rules based on the Analysis of HMMs
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, ISCA-INT SPEECH COMMUNICATION ASSOC, 816-+, 2011, Peer-reviewed, We have been working on listening-oriented dialogues for the purpose of building listening agents. In our previous work [1], we trained hidden Markov models (HMMs) from listening-oriented dialogues (LoDs) between humans, and by analyzing them, discovered a distinguishing dialogue flow of LoD. For example, listeners suppress their information giving and self-disclosure, and instead, increase acknowledgments and questions to elicit speakers' utterances. As an initial step for building listening agents, we decided to create dialogue control rules based on our analysis of the HMMs. We built our rule-based system and compared it with three other systems by a Wizard of Oz (WoZ) experiment. As a result, we found that our rule-based system achieved as much user satisfaction as human listeners.
International conference proceedings, English

Building a conversational model from two-tweets
Ryuichiro Higashinaka; Noriaki Kawamae; Kugatsu Sadamitsu; Yasuhiro Minami; Toyomi Meguro; Kohji Dohsaka; Hirohito Inagaki
2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings, 330-335, 2011, Peer-reviewed, The current problem in building a conversational model from Twitter data is the scarcity of long conversations. According to our statistics, more than 90% of conversations in Twitter are composed of just two tweets. Previous work has utilized only conversations lasting longer than three tweets for dialogue modeling so that more than a single interaction can be successfully modeled. This paper verifies, by experiment, that two-tweet exchanges alone can lead to conversational models that are comparable to those made from longer-tweet conversations. This finding leverages the value of Twitter as a dialogue corpus and opens the possibility of better conversational modeling using Twitter data. © 2011 IEEE.
International conference proceedings, English
DOI URL

Wizard of Oz evaluation of listening-oriented dialogue control using POMDP
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings, 318-323, 2011, Peer-reviewed, We have been working on dialogue control for listening agents. In our previous study [1], we proposed a dialogue control method that maximizes user satisfaction using partially observable Markov decision processes (POMDPs) and evaluated it by a dialogue simulation. We found that it significantly outperforms other stochastic dialogue control methods. However, this result does not necessarily mean that our method works as well in real dialogues with human users. Therefore, in this paper, we evaluate our dialogue control method by a Wizard of Oz (WoZ) experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This paper is the first to show the usefulness of POMDP-based dialogue control using human users when the target function is to maximize user satisfaction. © 2011 IEEE.
International conference proceedings, English
DOI URL

Research on Statistical Dialogue Processing for Realizing Ambient Intelligence (Special Feature: Communication Science at Its 20th Anniversary)
南泰浩; 目黒豊美
NTT技術ジャーナル, 電気通信協会, 23, 9, 10-13, 2011, Peer-reviewed
Research institution, Japanese
URL

Statistical Dialogue Processing for Ambient Intelligence
Y. Minami; T. Meguro
NTT Technical Review, 9, 11, 2011, Peer-reviewed
Research institution, English

User-Adaptive Coordination of Agent Communicative Behavior in Spoken Dialogue
K. Dohsaka; A. Kanemoto; R. Higashinaka; Y. Minami; E. Maeda
Sigdial, 人工知能学会, 24, 314-321, Sep. 2010, Peer-reviewed
International conference proceedings, English
URL

Modeling User Satisfaction Transitions in Dialogues from Overall Ratings
R. Higashinaka; Y. Minami; K. Dohsaka; T. Meguro
Sigdial, 18-27, Sep. 2010, Peer-reviewed
International conference proceedings, English

Learning to Model Domain-Specific Utterance Sequences for Extractive Summarization of Contact Center Dialogues
R. Higashinaka; Y. Minami; H. Nishikawa; K. Dohsaka; T. Meguro; S. Takahashi; G. Kikui
COLING, 400-408, Aug. 2010, Peer-reviewed
International conference proceedings, English
URL
URL 2

FAST SIMILARITY SEARCH ON A LARGE SPEECH DATA SET WITH NEIGHBORHOOD GRAPH INDEXING
Kazuo Aoyama; Shinji Watanabe; Hiroshi Sawada; Yasuhiro Minami; Naonori Ueda; Kazumi Saito
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, IEEE, 5358-5361, 2010, Peer-reviewed, This paper presents a novel graph-based approach to the problem of quickly finding a speech model acoustically similar to a query model in a large set of speech models. Each speech model in the set is represented by a Gaussian mixture model (GMM), and the dissimilarity from one GMM to another is measured with the Kullback-Leibler divergence (KLD). Conventional pruning techniques based on the triangle inequality for fast similarity search are not available because the model space with a KLD is not a metric space. We propose a search method that is characterized by an index of a degree-reduced nearest neighbor (DRNN) graph. The search method can efficiently find the most similar (closest) GMM to a query, exploring the DRNN graph in a best-first manner. Experimental evaluations on utterance GMM search tasks reveal a significantly low computational cost of the proposed method.
International conference proceedings, English

Issues in Predicting User Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and Prediction Models
Ryuichiro Higashinaka; Yasuhiro Minami; Kohji Dohsaka; Toyomi Meguro
SPOKEN DIALOGUE SYSTEMS FOR AMBIENT ENVIRONMENTS, SPRINGER-VERLAG BERLIN, 6392, 48-60, 2010, Peer-reviewed, This paper addresses three important issues in automatic prediction of user satisfaction transitions in dialogues. The first issue concerns the individual differences in user satisfaction ratings and how they affect the possibility of creating a user-independent prediction model. The second issue concerns how to determine appropriate evaluation criteria for predicting user satisfaction transitions. The third issue concerns how to train suitable prediction models. We present our findings for these issues on the basis of the experimental results using dialogue data in two domains.
International conference proceedings, English

Improving HMM-based extractive summarization for multi-domain contact center dialogues
Ryuichiro Higashinaka; Yasuhiro Minami; Hitoshi Nishikawa; Kohji Dohsaka; Toyomi Meguro; Satoshi Kobashikawa; Hirokazu Masataki; Osamu Yoshioka; Satoshi Takahashi; Genichiro Kikui
2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, 61-66, 2010, Peer-reviewed, This paper reports the improvements we made to our previously proposed hidden Markov model (HMM) based summarization method for multi-domain contact center dialogues. Since the method relied on Viterbi decoding for selecting utterances to include in a summary, it could not control compression rates. We enhance our method by using the forward-backward algorithm together with integer linear programming (ILP) to enable the control of compression rates, realizing summaries that contain as many domain-related utterances and as many important words as possible within a predefined character length. Using call transcripts as input, we verify the effectiveness of our enhancement. ©2010 IEEE.
International conference proceedings, English
DOI URL

Trigram dialogue control using POMDPs
Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka; Toyomi Meguro; Eisaku Maeda
2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, 336-341, 2010, Peer-reviewed, This paper proposes hybrid dialogue control of both trigram and POMDP dialogue controls by extending our proposed method that uses two approaches: automatically acquiring POMDP structures and rewards for target dialogues through Dynamic Bayesian Networks (DBNs) with a large amount of dialogue data and reflecting action predictive probabilities into the POMDP structures. In this extension, we modify the action predictive probabilities to treat trigram dialogue controls. Experimental results show that the proposed method can treat a trigram dialogue control with robustness for erroneous conditions and can simultaneously maximize trigram probability and the dialogue evaluations obtained from users. ©2010 IEEE.
International conference proceedings, English
DOI URL

Effects of Personality Traits on Listening-Oriented Dialogue
T. Meguro; R. Higashinaka; K. Dohsaka; Y. Minami; H. Isozaki
IWSDS, 104-107, Dec. 2009, Peer-reviewed
International conference proceedings, English

Dialogue Control Algorithm for Ambient Intelligence Based on Partially Observable Markov Decision Processes
Y. Minami; A. Mori; T. Meguro; R. Higashinaka; K. Dohsaka; E. Maeda
IWSDS, 254-263, Dec. 2009, Peer-reviewed
International conference proceedings, English

Transdisciplinary Approach for Constructing Ambient Intelligence Environments
E. Maeda; Y. Minami; K. Dohsaka; A. Mori
Ami, 9-12, Nov. 2009, Peer-reviewed
International conference proceedings, English

Effects of Conversational Agents on Human Communication in Thought-Evoking Multi-Party Dialogues
K. Dohsaka; R. Asai; R. Higashinaka; Y. Minami; E. Maeda
Sigdial, 219-224, Sep. 2009, Peer-reviewed
International conference proceedings, English

Analysis of Listening-Oriented Dialogue for Building Listening Agents
T. Meguro; R. Higashinaka; K. Dohsaka; Y. Minami; H. Isozaki
Sigdial, 124-127, Sep. 2009, Peer-reviewed
International conference proceedings, English

Switching acausal filters for speech modeling
Yasuhiro Minami; Hirokazu Kameoka
Machine Learning for Signal Processing XIX - Proceedings of the 2009 IEEE Signal Processing Society Workshop, MLSP 2009, 1-6, 2009, Peer-reviewed, This paper shows a unified model of dynamical systems in speech processing that includes speech recognition and pitch modeling. For this purpose, we propose the use of switching acausal filters (SAFs), which exchange multiple acausal filters. These filters are defined by identical linear dynamical systems that exchange the roles of observation value and system input. This paper describes the formulation of recognition, training, and feature generation methods for SAFs, which can be applied to several previously proposed speech models. As an example, we show that an HMM with dynamic features and our F0 control method can be modeled by the proposed formulation. An HMM synthesis method can also be modeled using the formulations. From these results, we demonstrate the unification capability of SAFs. © 2009 IEEE.
International conference proceedings, English
DOI URL

The World of Mushrooms: Realizing Ambient Intelligence
南泰浩; 堂坂浩二; 澤木美奈子; 森啓; 前田英作
ヒューマンインタフェース学会誌 = Journal of Human Interface Society : human interface, ヒューマンインタフェース学会, 10, 2, 109-114, May 2008, Peer-reviewed
Scientific journal

"WHO IS THIS" QUIZ DIALOGUE SYSTEM AND USERS' EVALUATION
M. Sawaki; Y. Minami; R. Higashinaka; K. Dohsaka; E. Maeda
2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, IEEE, 149-152, 2008, Peer-reviewed, In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's action on users. This paper describes the "Who is this" quiz dialogue system and its users' evaluation. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed-toy (or CG equivalent). Quizzes are automatically generated from Wikipedia articles, rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of person network and the relationship of user attribute and interest level.
International conference proceedings, English

Quizmaster Mushrooms: “Who Is This” Quiz Dialogue System
M. Sawaki; Y. Minami; R. Higashinaka; K. Dohsaka; T. Yamada; T. Matsubayashi; H. Isozaki; E. Maeda
ICMI demo-session, demo-session, Nov. 2007, Peer-reviewed
International conference proceedings, English

Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
Takaaki Hori; Chiori Hori; Yasuhiro Minami; Atsushi Nakamura
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 15, 4, 1352-1365, May 2007, Peer-reviewed, This paper proposes a novel one-pass search algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous-speech recognition. In the standard search method with on-the-fly composition, two or more WFSTs are composed during decoding, and a Viterbi search is performed based on the composed search space. With this new method, a Viterbi search is performed based on the first of the two WFSTs. The second WFST is only used to rescore the hypotheses generated during the search. Since this rescoring is very efficient, the total amount of computation required by the new method is almost the same as when using only the first WFST. In a 65k-word vocabulary spontaneous lecture speech transcription task, our proposed method significantly outperformed the standard search method. Furthermore, our method was faster than decoding with a single fully composed and optimized WFST, where our method used only 38% of the memory required for decoding with the single WFST. Finally, we have achieved high-accuracy one-pass real-time speech recognition with an extremely large vocabulary of 1.8 million words.
Scientific journal, English
DOI URL

The World of Mushrooms: Human-Computer Interaction Prototype Systems for Ambient Intelligence
Yasuhiro Minami; Minako Sawaki; Kohji Dohsaka; Ryuichiro Higashinaka; Kentaro Ishizuka; Hideki Isozaki; Tatsushi Matsubayashi; Masato Miyoshi; Atsushi Nakamura; Takanobu Oba; Hiroshi Sawada; Takeshi Yamada; Eisaku Maeda
ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, ASSOC COMPUTING MACHINERY, 366-373, 2007, Peer-reviewed, Our new research project called "ambient intelligence" concentrates on the creation of new lifestyles through research on communication science and intelligence integration. It is premised on the creation of such virtual communication partners as fairies and goblins that can be constantly at our side. We call these virtual communication partners mushrooms.
To show the essence of ambient intelligence, we developed two multimodal prototype systems: mushrooms that watch, listen, and answer questions and a Quizmaster Mushroom. These two systems work in real time using speech, sound, dialogue, and vision technologies.
We performed preliminary experiments with the Quizmaster Mushroom. The results showed that the system can transmit knowledge to users while they are playing the quizzes.
Furthermore, through the two mushrooms, we found policies for design effects in multimodal interface and integration.
International conference proceedings, English

Mixture Gaussian HMM-trajectory method using likelihood compensation
Yasuhiro Minami
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, IEEE, 296-299, 2007, Peer-reviewed, We propose a new speech recognition method (HMM-trajectory method) that generates a speech trajectory from HMMs by maximizing their likelihood while accounting for the relationship between the MFCCs and dynamic MFCCs. One major advantage of this method is that this relationship, ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper improves the recognition performance of the HMM-trajectory method for dealing with mixture Gaussian distributions. While the HMM-trajectory method chooses the Gaussian distribution sequence of the HMM states by selecting the best Gaussian distribution in the state during Viterbi decoding and calculating HMM trajectory likelihood along with the sequence, the proposed method compensates for HMM trajectory likelihood using ordinary HMM likelihood. In speaker-independent speech recognition experiments, the proposed method reduced the error rate by about 10% for the task compared with HMMs, proving its effectiveness for Gaussian mixture components.
International conference proceedings, English

Research Frontiers Toward the Future of Communication Environments: The World of Mushrooms - Toward the Realization of Intelligence Integration
南泰浩; 前田英作; 堂坂浩二; 近藤公久; 森啓
NTT技術ジャーナル, 電気通信協会, 19, 6, 19-21, 2007, Peer-reviewed
Research institution, Japanese
URL

The World of Mushrooms: Toward the Realization of Intelligence Integration
南泰浩; 前田英作; 堂坂浩二; 近藤公久; 森啓
NTT技術ジャーナル, 19, 6, 19-22, 2007, Peer-reviewed
Research institution, Japanese

Dynamic assignment of Gaussian components in modelling speech spectra
Parham Zolfaghari; Hiroko Kato; Yasuhiro Minami; Atsushi Nakamura; Shigeru Katagiri; Roy Patterson
JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, SPRINGER, 45, 1-2, 7-19, Nov. 2006, Peer-reviewed, In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract where Gaussian distributions are used to model spectral frequency regions. A mixture of Gaussians (MoG) based parametrisation scheme is used for modelling a smoothed representation of the spectra. This smoothing procedure removes all signal periodicity from the spectra allowing highly natural analysis, manipulation and synthesis of speech. The goal of this parametrisation scheme is to ease the correspondence between the resonant characteristics of the vocal tract and the parametric distributions and modelling the spectrum with an appropriate number of parameters. Previously, a maximum likelihood (ML) approach to this parametrisation scheme was introduced. However, this approach has inherent local optima problems. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we propose a new scheme whereby starting with a large number of distributions in the mixture, we systematically reduce their number and re-approximate the densities in the mixture based on a distance criterion. The Kullback-Leibler (KL) distance was found to allow optimal MoG solutions to the spectra. Furthermore, a fitness measure based on KL information is used to provide a figure for estimating the model order in representing formant-like features. The proposed model is subjectively evaluated and is shown to reduce the number of Gaussians with an appreciable loss in the quality of the re-synthesised speech.
Scientific journal, English
DOI URL

The Restoration of Fairies and Goblins: A Proposal for a New Vision of "Ambient Intelligence"
前田英作; 南泰浩; 堂坂浩二
情報処理, 47, 6, 624-640, Jun. 2006, Peer-reviewed
Scientific journal, Japanese

Speech feature extraction method using subband-based periodicity and nonperiodicity decomposition
Kentaro Ishizuka; Tomohiro Nakatani; Yasuhiro Minami; Noboru Miyazaki
Journal of the Acoustical Society of America, 120, 1, 443-452, 2006, Peer-reviewed, This paper proposes a speech feature extraction method that utilizes periodicity and nonperiodicity for robust automatic speech recognition. The method was motivated by the auditory comb filtering hypothesis proposed in speech perception research. The method divides input signals into subband signals, which it then decomposes into their periodic and nonperiodic components using comb filters independently designed in each subband. Both features are used as feature parameters. This representation exploits the robustness of periodicity measurements as regards noise while preserving the overall speech information content. In addition, periodicity is estimated independently in each subband, providing robustness as regards noise spectrum bias. The framework is similar to that of a previous study [Jackson et al., Proc. of Eurospeech. (2003), pp. 2321-2324], which is based on cascade processing motivated by speech production. However, the proposed method differs in its design philosophy, which is based on parallel distributed processing motivated by speech perception. Continuous digit speech recognition experiments in the presence of noise confirmed that the proposed method performs better than conventional methods when the noise in the training and test data sets differs. © 2006 Acoustical Society of America.
Scientific journal, English
DOI URL

Report on "Ambient Intelligence Symposium 2006 - the Future: A Tapestry Woven from Threads of Intelligence"
堂坂浩二; 南泰浩; 森啓; 近藤公久
NTT技術ジャーナル, 電気通信協会, 18, 12, 72-76, 2006, Peer-reviewed
Research institution, Japanese
URL

Progress of the "Ambient Intelligence" Project
南泰浩; 堂坂浩二; 森啓; 前田英作
NTT技術ジャーナル, 電気通信協会, 18, 9, 60-64, 2006, Peer-reviewed
Research institution, Japanese
URL

Report on “Ambient Intelligence Symposium 2006 - the Future: A Tapestry Woven from Threads of Intelligence”
K. Dohsaka; Y. Minami; A. Mori; T. Kondo
NTT Technical Review, 4, 12, 64-69, 2006, Peer-reviewed
Research institution, English

Step Towards Ambient Intelligence
E. Maeda; Y. Minami
NTT Technical Review, 4, 1, 50-55, 2006, Peer-reviewed
Research institution, English

The World of Mushrooms -a Transdisciplinary Approach to Human-Computer Interaction with Ambient Intelligence
E. Maeda; Y. Minami; M. Miyoshi; M. Sawaki; H. Sawada; A. Nakamura; J. Yamato; T. Yamada; R. Higashinaka
NTT Technical Review, 4, 12, 17-25, 2006, Peer-reviewed
Research institution, English

Selection of shared-state hidden Markov model structure using Bayesian criterion
S Watanabe; Y Minami; A Nakamura; N Ueda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E88D, 1, 1-9, Jan. 2005, Peer-reviewed, A Shared-State Hidden Markov Model (SS-HMM) has been widely used as an acoustic model in speech recognition. In this paper, we propose a method for constructing SS-HMMs within a practical Bayesian framework. Our method derives the Bayesian model selection criterion for the SS-HMM based on the variational Bayesian approach. The appropriate phonetic decision tree structure of the SS-HMM is found by using the Bayesian criterion. Unlike the conventional asymptotic criteria, this criterion is applicable even in the case of an insufficient amount of training data. The experimental results on isolated word recognition demonstrate that the proposed method does not require the tuning parameter that must be tuned according to the amount of training data, and is useful for selecting the appropriate SS-HMM structure for practical use.
Scientific journal, English

Toward the Realization of "Ambient Intelligence"
前田英作; 南泰浩
NTT技術ジャーナル, 電気通信協会, 17, 11, 52-55, 2005, Peer-reviewed
Research institution, Japanese
URL

Fast on-the-Fly Composition for Weighted Finite-State Transducers in 1.8 Million-Word Vocabulary Continuous Speech Recognition
T. Hori; C. Hori; Y. Minami
ICSLP, I, 289-292, Oct. 2004, Peer-reviewed
International conference proceedings, English

Improvement in Robustness of Speech Feature Extraction Method Using Sub-Band Based Periodicity and Aperiodicity Decomposition
K. Ishizuka; N. Miyazaki; T. Nakatani; Y. Minami
ICSLP, 937-940, Oct. 2004, Peer-reviewed
International conference proceedings, English

A Theoretical Analysis of Speech Recognition Based on Feature Trajectory Models
Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
ICSLP, I, 549-552, Oct. 2004, Peer-reviewed
International conference proceedings, English

Variational Bayesian estimation and clustering for speech recognition
S Watanabe; Y Minami; A Nakamura; N Ueda
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 12, 4, 365-381, Jul. 2004, Peer-reviewed, In this paper, we propose variational Bayesian estimation and clustering for speech recognition (VBEC), which is based on the variational Bayesian (VB) approach. VBEC is a total Bayesian framework: all speech recognition procedures (acoustic modeling and speech classification) are based on VB posterior distribution, unlike the maximum likelihood (ML) approach based on ML parameters. The total Bayesian framework generates two major Bayesian advantages over the ML approach for the mitigation of over-training effects, as it can select an appropriate model structure without any data set size condition, and can classify categories robustly using a predictive posterior distribution. By using these advantages, VBEC: 1) allows the automatic construction of acoustic models along two separate dimensions, namely, clustering triphone hidden Markov model states and determining the number of Gaussians and 2) enables robust speech classification, based on Bayesian predictive classification using VB posterior distributions. The capabilities of the VBEC functions were confirmed in large vocabulary continuous speech recognition experiments for read and spontaneous speech tasks. The experiments confirmed that VBEC automatically constructed accurate acoustic models and robustly classified speech, i.e., totally mitigated the over-training effects with high word accuracies due to the VBEC functions.
Scientific journal, English
DOI URL

Recognition Method with Parametric Trajectory Synthesized Using HMMs
Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
SWIM, 776-786, Jan. 2004, Peer-reviewed
International conference proceedings, English

Model selection for mixture of Gaussian based spectral modelling
P Zolfaghari; H Kato; Y Minami; A Nakamura; S Katagiri
MACHINE LEARNING FOR SIGNAL PROCESSING XIV, IEEE, 325-334, 2004, Peer-reviewed, In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract. We propose a mixture of Gaussians (MoG) spectral modelling scheme which enables model selection with a goal of easing the correspondence between the resonant characteristics of the vocal tract and the parametric Gaussians and representing a spectrum with an appropriate number of parameters. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we systematically reduce the number of Gaussians and re-approximate the densities in the MoG spectral model. The Kullback-Leibler (KL) distance between the densities in the mixture was found to allow optimal ML-MoG solutions to the spectra. A fitness measure based on KL information provides a figure for estimating the model order in representing formant-like features. The mixture model was fitted to a normalised smooth spectrum obtained by filtering the short-time Fourier transform in time and frequency by a pitch adaptive Gaussian filter. This results in the removal of all source information from the spectra. By subjectively evaluating the quality of the analysed and synthesised speech using this parametrisation scheme, we show considerable improvement over ML using this Gaussian reduction scheme specifically when using a lower number of Gaussians in the mixture.
International conference proceedings, English

Speech Summarization Using Weighted Finite-State Transducers
T. Hori; C. Hori; Y. Minami
Eurospeech, 2817-2820, Sep. 2003, Peer-reviewed
International conference proceedings, English

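Selection of Shared-State HMM Structure Using a Bayesian Criterion
渡部晋治; 南泰浩; 中村篤; 上田修功
電子情報通信学会論文誌D, The Institute of Electronics, Information and Communication Engineers, J86-DII, 6, 776-786, Jun. 2003, Peer-reviewed, In shared-state HMMs, which are widely used as acoustic models for speech recognition, it is important to determine the state-sharing structure appropriately. Conventionally, the state-sharing structure, including the total number of states, has been selected on the basis of the maximum likelihood criterion. However, because the likelihood increases monotonically with the total number of states, a threshold must be set experimentally. Model selection based on the minimum description length (MDL) criterion or the Bayesian information criterion (BIC), introduced to address this problem, is derived using asymptotic theory, so appropriate model selection is difficult when training data are scarce. This paper proposes a method that clusters HMM states using a Bayesian criterion that does not assume asymptotics, based on the variational Bayesian method proposed as a deterministic Bayesian computation technique, and adaptively selects the state-sharing structure and the total number of states according to the training data. The effectiveness of the proposed method is demonstrated through speaker-independent isolated word recognition experiments.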
Scientific journal, Japanese
URL

Paraphrasing Spontaneous Speech Using Weighted Finite-State Transducers
T. Hori; D. Willett; Y. Minami
SSPR2003, 219-222, Apr. 2003, Peer-reviewed
International conference proceedings, English

Bayesian Acoustic Modeling for Spontaneous Speech Recognition
S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
SSPR, 47-50, Apr. 2003, Peer-reviewed
International conference proceedings, English

Language model adaptation using WFST-based speaking-style translation
T Hori; D Willett; Y Minami
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 228-231, 2003, Peer-reviewed, This paper describes a new approach to language model adaptation for speech recognition based on the statistical framework of speech translation. The main idea of this approach is to compose a weighted finite-state transducer (WFST) that translates sentence styles from in-domain to out-of-domain. It enables the integration of language models of different styles of speaking or dialects and even of different vocabularies. The WFST is built by combining in-domain and out-of-domain models through the translation, while each model and the translation itself are expressed as a WFST. We apply this technique to building language models for spontaneous speech recognition using large written-style corpora. We conducted experiments on a 20k-word Japanese spontaneous speech recognition task. With a small in-domain corpus, a 2.9% absolute improvement in word error rate is achieved over the in-domain model.
International conference proceedings, English

Recognition method with parametric trajectory generated from mixture distribution HMMs
Y Minami; E McDermott; A Nakamura; S Katagiri
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, I, 124-127, 2003, Peer-reviewed, We have proposed a new speech recognition technique that generates a speech trajectory from HMMs by maximizing the likelihood of the trajectory, while accounting for the relation between the cepstrum and the dynamic cepstrum coefficients. This method has the major advantage that the relation, which is ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper describes an extension of the method for dealing with HMMs whose distributions are mixture Gaussian distributions. The method chooses the sequence of Gaussian distributions by selecting the best Gaussian distribution in the state during Viterbi decoding. Speaker-independent speech recognition experiments were carried out. The proposed method obtained an 18.2% reduction in error rate for the task, proving that the proposed method is effective even for Gaussian mixture HMMs.
International conference proceedings, English

Application of variational Bayesian estimation and clustering to acoustic model adaptation
S Watanabe; Y Minami; A Nakamura; N Ueda
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 1, 568-571, 2003, Peer-reviewed, In this paper, we apply Variational Bayesian Estimation and Clustering for speech recognition (VBEC) to acoustic model adaptation. VBEC can estimate parameter posteriors even when a model includes hidden variables, by using the Variational Bayesian approach. In addition, VBEC can select an appropriate model structure in clustering triphone states, according to the amount of available adaptation data. Unlike a conventional Bayesian method such as Maximum A Posteriori (MAP), VBEC is useful even in the case of small amounts of data, because the amount of data per Gaussian increases due to the model structure selection, and over-training is suppressed. We conduct an off-line supervised adaptation experiment on isolated word recognition, and show the advantage of the proposed method over the conventional method, especially when dealing with small amounts of adaptation data.
International conference proceedings, English

Pervasive unsupervised adaptation for lecture speech transcription
D Willett; T Niesler; E McDermott; Y Minami; S Katagiri
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 2, 292-295, 2003, Peer-reviewed, Unsupervised adaptation has evolved as a popular approach for tuning the acoustic models of speaker-independent speech recognition systems to specific speakers, speaker groups or channel conditions while making use of only untranscribed data. This study focuses on procedures for unsupervised adaptation of other probabilistic models that are involved in state-of-the-art speech recognizers and on the joint adaptation of multiple knowledge sources. In particular, we outline and evaluate approaches for adapting both the language model and the pronunciation model (lexicon) without supervision. Initial experiments on off-line lecture speech transcription achieved small but promising word error rate improvements with each approach applied separately. The experimental results on the joint application of acoustic, language and pronunciation model adaptation indicate that the individually achievable performance improvements are additive.
International conference proceedings, English

コミュニケーションの壁を克服するための音声･音響処理技術次世代の音声認識技術
中村篤; 南泰浩; マクダーモット・エリック
NTT技術ジャーナル, 電気通信協会, 15, 12, 13-18, 2003, Peer-reviewed
Research institution, Japanese

Application of Variational Bayesian Approach to Speech Recognition
S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
NIPS, MIT Press, NIPS'02, Dec. 2002, Peer-reviewed
International conference proceedings, English

Evaluation of a Speech Recognition/Generation Method Based on HMM and STRAIGHT
T. Irino; Y. Minami; T. Nakatani; M. Tsuzaki; H. Tagawa
ICSLP, 2545-2548, Sep. 2002, Peer-reviewed
International conference proceedings, English

Constructing Shared-State Hidden Markov Models Based on a Bayesian Approach
S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
ICSLP, 4, 2669-2672, Sep. 2002, Peer-reviewed
International conference proceedings, English

A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series
Y Minami; E McDermott; A Nakamura; S Katagiri
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, IEEE, 1, 957-960, 2002, Peer-reviewed, Parametric trajectory models have been proposed to exploit this time-dependency. However, parametric trajectory modeling methods are unable to take advantage of efficient HMM training and recognition methods. We have proposed a new speech recognition technique that generates a speech trajectory using an HMM-based speech synthesis method. This method generates an acoustic trajectory by maximizing the likelihood of the trajectory while taking into account the relation between the cepstrum, delta-cepstrum, and delta-delta cepstrum. In this paper, we extend our method to a general formulation including a variance training procedure. Speaker-independent speech recognition experiments show that the proposed method is effective for speech recognition.
International conference proceedings, English

A Recognition Method Using Synthesis-Based Scoring That Incorporates Direct Relations between Static and Dynamic Feature Vector Time Series
Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
Workshop for Consistent & Reliable Acoustic Cues for Sound Analysis, Poster, Sep. 2001, Peer-reviewed
International conference proceedings, English

Mokusei: A Telephone-Based Japanese Conversational System in the Weather Domain
M. Nakano; Y. Minami; S. Seneff; T. J. Hazen; D. S. Cyphers; J. Glass; J. Polifroni; V. Zue
Eurospeech, 1331-1334, Sep. 2001, Peer-reviewed
International conference proceedings, English

Time and Memory Efficient Viterbi Decoding for LVCSR Using a Precompiled Search Network
D. Willett; E. McDermott; Y. Minami; S. Katagiri
Eurospeech, 847-890, Sep. 2001, Peer-reviewed
International conference proceedings, English

From Jupiter to Mokusei: Multilingual Conversational System in the Weather Domain
V. Zue; S. Seneff; J. Polifroni; M. Nakano; Y. Minami; T. J. Hazen; J. Glass
Workshop on Multi-Lingual Speech Communication, 1-6, Apr. 2000, Peer-reviewed
International conference proceedings, English

Mokusei: A Japanese Spoken Dialogue System in the Weather Domain
S. Seneff; J. Glass; T. J. Hazen; Y. Minami; J. Polifroni; V. Zue
NTT R&D, 電気通信協会, 49, 7, 376-382, 2000, Peer-reviewed
Research institution, English

Compensation of speaker directivity in speech recognition using HMM composition
F. Giron; Y. Minami; M. Tanaka; K. Furuya
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1, 253-256, 01 Dec. 1998, Peer-reviewed, In hands-free speech recognition the speaker should be able to move freely in front of the speech acquisition device. However, the speech signal is then subjected to variations due to the continuous change of position in the acoustic space. This paper focuses on the role of speaker head rotations as compared with static situations in anechoic conditions. The effect of speaker directivity in speech recognition performance degradation is demonstrated and a compensation method based on HMM composition is proposed to increase the performance. © 1998 IEEE.

Towards practical use for speaker recognition technology
MATSUI Tomoko; YOSHIOKA Osamu; MINAMI Yasuhiro
ITE Technical Report, The Institute of Image Information and Television Engineers, 22, 45, 43-48, 14 Sep. 1998, Recently, network services such as banking, electronic commerce, database access services, information services over the Internet and telephone networks have become popular, and user verification technology that is essential to those services has become important. Voice verification technology is one type of user verification technology, and demand for this should be anticipated in an environment where services such as those over a telephone network can use only voice as the means of communication. This paper describes practical speaker recognition technology for constructing voice verification systems. Moreover, this paper introduces new software developer kits for speaker recognition, reports on speaker recognition experiments using telephone speech that we recently conducted, and shows that the recognition performance is seriously affected by text and handset conditions that are the same/different for training and testing.
Japanese

An HMM adaptation method for noise and distortion by maximizing likelihood
Y Minami; S Furui
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, 81, 8, 1-9, Aug. 1998, Peer-reviewed, This paper describes a new HMM synthesis method in which HMM adapts to additive noise and multiplicative distortion. The conventional HMM synthesis method can only be applied to additive noise. In the method described here, the likelihood of the synthesized HMM to adapt to speech is maximized, so that multiplicative distortion is eliminated when the method is applied. Within the framework of this method, adaptation to variations in the SN ratio, considered a problem in conventional HMM synthesis, can be formulated as part of the adaptation to multiplicative distortion. As a result of evaluating speech recognition rates using our method, we have confirmed that the method is effective for improving the recognition rate of speech that contains additive noise and multiplicative distortion. (C) 1998 Scripta Technica.
Scientific journal, English

Compensation of Speaker Directivity in Speech Recognition Using HMM Composition
F. Giron; Y. Minami; M. Tanaka; K. Furuya
ICASSP, vol.1, 12-15, May 1998, Peer-reviewed
International conference proceedings, English

Connected Digit Recognition in Spontaneous Speech
E. Bauche; B. Gajic; Y. Minami; T. Matsuoka; S. Furui
Eurospeech, 923-926, Sep. 1997, Peer-reviewed
International conference proceedings, English

尤度最大化による雑音とひずみへの HMM 適応化手法
南泰浩; 古井貞煕
電子情報通信学会論文誌A, J80-A, 7, 1179-1186, Jul. 1997, Peer-reviewed
Scientific journal, Japanese

An efficient search method for large-vocabulary continuous-speech recognition
K Hanazawa; Y Minami; S Furui
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V, I E E E, COMPUTER SOC PRESS, 1787-1790, 1997, Peer-reviewed, This paper proposes an efficient method for large-vocabulary continuous-speech recognition, using a compact data structure and an efficient search algorithm. We introduce a very compact data structure DAWG as a lexicon to reduce the search space. We also propose a search algorithm to obtain the N-best hypotheses using the DAWG structure. This search algorithm is composed of two phases: ''forward search'' and ''traceback''. Forward search, which basically uses the time-synchronous Viterbi algorithm, merges candidates and stores the information about them in DAWG structures to create phoneme graphs. Traceback traces the phoneme graphs to obtain the N-best hypotheses. An evaluation of this method's performance using a speech-recognition-based telephone-directory-assistance system having a 4000-word vocabulary confirmed that our strategy improves speech recognition in terms of time and recognition rate.
International conference proceedings, English

Adaptation method based on HMM composition and EM algorithm
Y Minami; S Furui
1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, IEEE, 327-330, 1996, Peer-reviewed
International conference proceedings, English

Improved extended HMM composition by incorporating power variance
Y Minami; S Furui
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, IEEE, 1109-1112, 1996, Peer-reviewed, This paper describes a way of improving extended HMM composition that can precisely adapt HMMs to both noisy and distorted speech. To do this, we incorporate the variance of power into extended HMM composition using quantization to approximate the Gaussian distribution of the 0th order cepstrum. Consequently, a distribution of noisy speech is approximated in the linear spectral domain as a mixture of log normal distributions.
This method is evaluated by a four-digit recognition experiment when the number of digits is known. Two types of noise, computer room noise and car noise, are used and noisy and distorted speech data is made by adding these types of noise to speech data recorded using a boundary microphone. Results show that the proposed method improves recognition rates for noisy and distorted speech compared with our previous method.
International conference proceedings, English

AN HMM STATE DURATION CONTROL ALGORITHM APPLIED TO LARGE-VOCABULARY SPONTANEOUS SPEECH RECOGNITION
S TAKAHASHI; Y MINAMI; K SHIKANO
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E78D, 6, 648-653, Jun. 1995, Peer-reviewed, Although Hidden Markov Modeling (HMM) is widely and successfully used in many speech recognition applications, duration control for HMMs is still an important issue in improving recognition accuracy since an HMM places no constraints on duration. To compensate for this defect, some duration control algorithms that employ precise duration models have been proposed. However, they suffer from greatly increased computational complexity. This paper proposes a new state duration control algorithm for limiting both the maximum and the minimum state durations. The algorithm is for the HMM trellis likelihood calculation, not for the Viterbi calculation. The amount of computation required by this algorithm is only order one (O(1)) for the maximum state duration n; that is, the computation amount is independent of the maximum state duration, while many conventional duration control algorithms require computation in the amount of order n or order n^2. Thus, the algorithm can drastically reduce the computation needed for duration control. The algorithm uses the property that the trellis likelihood calculation is a summation of many path likelihoods. At each frame, the path likelihood that exceeds the maximum likelihood is subtracted, and the path likelihood that satisfies the minimum likelihood is added to the forward probability. By iterating this procedure, the algorithm calculates the trellis likelihood efficiently. The algorithm was evaluated using a large-vocabulary speaker-independent spontaneous speech recognition system for telephone directory assistance. The average reduction in error rate for sentence understanding was about 7% when using context-independent HMMs, and 3% when using context-dependent HMMs.
We could confirm the improvement by using the proposed state duration control algorithm even though the maximum and the minimum state durations were not optimized for the task (speaker-independent duration settings obtained from a different task were used).
Scientific journal, English

A SPEECH DIALOGUE SYSTEM WITH MULTIMODAL INTERFACE FOR TELEPHONE DIRECTORY ASSISTANCE
O YOSHIOKA; Y MINAMI; K SHIKANO
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E78D, 6, 616-621, Jun. 1995, Peer-reviewed, This paper describes a multimodal dialogue system employing speech input. This system uses three input methods (through a speech recognizer, a mouse, and a keyboard) and two output methods (through a display and using sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. This system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users utter to computers. Speech input is used in a multimodal environment. The collecting of dialogue data between computers and users is also carried out. Twenty telephone-number retrieval tasks are used to evaluate this system. In the experiments, all the users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0% and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.
Scientific journal, English

ACOUSTIC AND LANGUAGE PROCESSING TECHNOLOGY FOR SPEECH RECOGNITION
T MATSUOKA; Y MINAMI
NTT REVIEW, NTT CORP, 7, 2, 30-39, Mar. 1995, Peer-reviewed, This paper describes acoustic and language processing technology for automatic speech recognition. Speech recognition systems usually consist of acoustic and language processing modules. The acoustic processing extracts feature parameter vectors from the speech utterance and performs pattern recognition by comparing the vector sequence and pre-defined acoustic models. The most likely model is then chosen as the recognition result. The language processing helps recognition by narrowing down the number of candidates or selects the most linguistically matching hypothesis from those produced by the acoustic processing.
Scientific journal, English

A MAXIMUM-LIKELIHOOD PROCEDURE FOR A UNIVERSAL ADAPTATION METHOD BASED ON HMM COMPOSITION
Y MINAMI; S FURUI
1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5, IEEE, 129-132, 1995, Peer-reviewed
International conference proceedings, English

UNIVERSAL ADAPTATION METHOD BASED ON HMM COMPOSITION
Y MINAMI; S FURUI
ICA 95 - PROCEEDINGS OF THE 15TH INTERNATIONAL CONGRESS ON ACOUSTICS, VOL III, SINTEF, 105-108, 1995, Peer-reviewed
International conference proceedings, English

LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION ALGORITHM APPLIED TO A MULTIMODAL TELEPHONE DIRECTORY ASSISTANCE SYSTEM
Y MINAMI; K SHIKANO; S TAKAHASHI; T YAMADA; O YOSHIOKA; S FURUI
SPEECH COMMUNICATION, ELSEVIER SCIENCE BV, 15, 3-4, 301-310, Dec. 1994, Peer-reviewed, This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm.
Scientific journal, English

PHONEME HMM EVALUATION ALGORITHM WITHOUT PHONEME LABELING APPLIED TO CONTINUOUS SPEECH HMM EVALUATION
Y MINAMI; T MATSUOKA; K SHIKANO
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, 77, 11, 13-21, Nov. 1994, Peer-reviewed, Phoneme Hidden Markov Models (HMMs) are generally evaluated in terms of the phoneme recognition rate by using speech data extracted based on phoneme labels. This paper proposes an evaluation method that does not use phoneme labels for extraction. Consequently, phoneme HMMs can be evaluated even if a speech database without phoneme labeling is used.
In this study, concatenation training of the phoneme HMMs is executed using a large-scale speaker-independent continuous-speech database. Evaluating the HMM phoneme recognition rate as a function of the number of training speakers with the proposed evaluation method demonstrates its effectiveness.
Scientific journal, English

Multimodal Telephone Directory Assistance System and Its Evaluation
Y. Minami; O. Yoshioka; K. Shikano; S. Furui
International Workshop on Human Interface Technology, 7-14, Sep. 1994, Peer-reviewed
International conference proceedings, English

An HMM Duration Control Algorithm with a Low Computation Cost
S. Takahashi; Y. Minami; K. Shikano
ICSLP, 267-270, Sep. 1994, Peer-reviewed
International conference proceedings, English

A Multi-Modal Dialogue System for Telephone Directory Assistance
O. Yoshioka; Y. Minami; K. Shikano
ICASSP, 887-890, Sep. 1994, Peer-reviewed
International conference proceedings, English

SPEECH RECOGNITION USING PHONEME HMM CONSTRAINED BY FRAME CORRELATION
S TAKAHASHI; T MATSUOKA; Y MINAMI; K SHIKANO
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, 77, 6, 58-69, Jun. 1994, Peer-reviewed, One of the problems with the hidden Markov model (HMM) in performing speech recognition is that the local transition information of the feature vectors is not incorporated into the mechanism of the model and the model is not constrained by transitions of the feature vectors. Thus, the output probability distribution never changes during recognition. Furthermore, all transitions between the vectors that have high probabilities are allowed even if those transitions did not appear in the training data.
This paper proposes a bigram-constrained HMM that uses correlations between two frames to constrain the feature distributions of a speaker-independent HMM to the region most appropriate for the speaker. Since the output probability of the bigram-constrained HMM is a conditional probability restricted by the feature vector of the previous frame, the output probability changes dynamically at each frame depending on the feature vector of the previous frame. Constraining the feature distribution makes it possible to reduce the overlapping of feature distributions between different phonemes which improves recognition performance.
Previously, we proposed the discrete bigram-constrained HMM which is based on the combination of a discrete speaker-independent HMM and the VQ-code bigram. We showed that it performed better than conventional speaker-independent HMMs. In this paper, the strategy is extended to the tied-mixture bigram-constrained HMM and the continuous bigram-constrained HMM to obtain better recognition performance. These three types of HMMs are formulated and evaluated by phoneme recognition in continuous speech.
Scientific journal, English

フレーム間相関を利用した音韻 HMM による音声認識
高橋敏; 松岡達雄; 南泰浩; 鹿野清宏
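電子情報通信学会論文誌A, 電子情報通信学会, J77-A, 2, 153-161, Feb. 1994, Peer-reviewed, One problem with current HMMs is that the output probability distribution is always fixed within each state, so that transition information of the phonemic features is not reflected in the mechanism of the model. Moreover, because transitions of the feature vectors are unconstrained, transitions between feature vectors are given high output probabilities even when they were never observed in the training data. This paper proposes a bigram-constrained HMM that constrains transitions using the correlation between two frames of feature vectors, restricting the broad feature distribution of a speaker-independent HMM to the region suited to the input speaker. Because the output probability of the bigram-constrained HMM is expressed as a probability conditioned on the feature vector of the previous frame, the output probability distribution changes dynamically at each frame. Constraining the distribution also reduces the overlap of feature distributions between different phonemes, which improves the recognition rate. We previously proposed a discrete bigram-constrained HMM, which constrains transitions using VQ-code bigrams on the basis of a discrete speaker-independent HMM, and showed that it outperforms conventional HMMs. In this paper, to obtain still higher recognition performance, we extend the approach to a semi-continuous bigram-constrained HMM and a continuous bigram-constrained HMM. In an evaluation by phoneme recognition in continuous speech, using frame-correlation information from the input speaker's speech improved the average phoneme recognition rate from 65.4% to 74.8% with the semi-continuous bigram-constrained HMM and from 64.8% to 74.5% with the continuous bigram-constrained HMM. Using general frame-correlation information extracted from many speakers, the continuous bigram-constrained HMM improved the rate from 64.8% to 67.5%.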
Scientific journal, Japanese

音韻ラベルを用いない HMM 評価法とそれを用いた連続音声認識用 HMM の評価
南泰浩; 松岡達雄; 鹿野清宏
電子情報通信学会論文誌A, J77-A, 2, 267-273, Feb. 1994, Peer-reviewed
Scientific journal, Japanese

番号案内を対象とした大語い連続音声認識アルゴリズム
南泰浩; 山田智一; 鹿野清宏; 松岡達雄
電子情報通信学会論文誌A, J77-A, 2, 190-197, Feb. 1994, Peer-reviewed
Scientific journal, Japanese

A very large vocabulary continuous speech recognition algorithm for telephone directory assistance
Yasuhiro Minami; Tomokazu Yamada; Kiyohiro Shikano; Tatsuo Matsuoka
Electronics and Communications in Japan (Part III: Fundamental Electronic Science), 77, 11, 1-12, 1994, Peer-reviewed, This paper proposes a speech recognition algorithm for large vocabulary continuous speech. The proposed algorithm is based on the hidden Markov model (HMM)‐LR algorithm using a generalized predictive LR parser and phoneme HMMs. The following three techniques are applied to improve recognition performance and reduce processing time. The forward and the backward likelihood are used to accurately determine the likelihood in the beam search. To reduce the trellis computation in HMM speech recognition and for efficient search, only the speech frames in which the predicted phoneme seems to exist are used by the window for phoneme matching. For efficient search, adjusting identical phoneme sequences are merged by checking the stack and the state of the LR parser. The algorithm was applied to a telephone directory assistance task involving more than 70,000 subscribers. A recognition experiment for continuous word utterance was done. The sentence recognition rate was 85 percent for speaker‐dependent speech recognition; the sentence recognition rate was 71 percent for speaker‐independent speech recognition. The sentence understanding rate was 59 percent for speaker‐dependent speech recognition with spontaneous utterances. Copyright © 1994 Wiley Periodicals, Inc., A Wiley Company
Scientific journal, English

Large-vocabulary continuous speech recognition algorithm applied to a multi-modal telephone directory assistance system
Yasuhiro Minami; Kiyohiro Shikano; Satoshi Takahashi; Tomokazu Yamada; Osamu Yoshioka; Sadaoki Furui
Speech Communication, 15, 3-4, 301-310, 1994, Peer-reviewed, This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm. © 1994.
Scientific journal, English

SEARCH ALGORITHM THAT MERGES CANDIDATES IN MEANING LEVEL FOR VERY LARGE VOCABULARY SPONTANEOUS SPEECH RECOGNITION
Y MINAMI; K SHIKANO; S TAKAHASHI; T YAMADA
ICASSP-94 PROCEEDINGS, VOL 2, IEEE, 141-144, 1994, Peer-reviewed
International conference proceedings, English

Language Processing for Speech Recognition
T. Matsuoka; Y. Minami
NTT R & D, 43, 10, 91-100, 1994, Peer-reviewed
Research institution, English

Acoustic Processing for Speech Recognition
Y. Minami; T. Matsuoka
NTT R & D, 43, 10, 81-90, 1994, Peer-reviewed
Research institution, English

Large-Vocabulary Continuous Speech Recognition Algorithm for Telephone Directory Assistance
K. Shikano; Y. Minami; S. Takahashi; T. Yamada
IEEE Workshop on Automatic Speech Recognition, 14-15, Dec. 1993, Peer-reviewed
International conference proceedings, English

Large Vocabulary Continuous Speech Recognition System for Telephone Directory Assistance
Y. Minami; K. Shikano; S. Takahashi; T. Yamada; O. Yoshioka
International Symposium on Spoken Dialogue, 169-172, Nov. 1993, Peer-reviewed
International conference proceedings, English

Multi-Modal Telephone Directory Assistance System Based on Large-Vocabulary Continuous Speech Recognition Algorithm
K. Shikano; Y. Minami; O. Yoshioka; S. Takahashi; T. Yamada
International Workshop on Knowledge Structure for Understanding Speech and Language, 1, Nov. 1993, Peer-reviewed
International conference proceedings, English

Recognition of Noisy Speech by Composition of Hidden Markov Models
F. Martin; K. Shikano; Y. Minami
Eurospeech, 1031-1034, Sep. 1993, Peer-reviewed
International conference proceedings, English

PHONEME HMMS CONSTRAINED BY FRAME CORRELATIONS
S TAKAHASHI; T MATSUOKA; Y MINAMI; K SHIKANO
ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5, I E E E, B219-B222, 1993, Peer-reviewed
International conference proceedings, English

Phoneme HMM Evaluation Algorithm without Phoneme Labeling
Y. Minami; T. Matsuoka; K. Shikano
ICSLP, 1535-1538, Oct. 1992, Peer-reviewed
International conference proceedings, English

Very Large Vocabulary Continuous Speech Recognition for Telephone Directory Assistance
Y. Minami; K. Shikano; T. Yamada; T. Matsuoka
IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, VII.1, 2129-2132, Oct. 1992, Peer-reviewed
International conference proceedings, English

RECENT TOPICS IN SPEECH RECOGNITION RESEARCH AT NTT LABORATORIES
S FURUI; K SHIKANO; S MATSUNAGA; T MATSUOKA; S TAKAHASHI; T YAMADA
SPEECH AND NATURAL LANGUAGE, MORGAN KAUFMANN PUB INC, 162-167, 1992, Peer-reviewed
International conference proceedings, English

CONNECTIONIST APPROACHES TO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
H SAWAI; Y MINAMI; M MIYATAKE; A WAIBEL; K SHIKANO
IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, IEICE-INST ELECTRON INFO COMMUN ENG, 74, 7, 1834-1844, Jul. 1991, Peer-reviewed, This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system integrating speech recognition and language processing. The speech recognition part consists of Large Phonemic Time-Delay Neural Networks (TDNNs) which can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([ʃ]), /h/, /z/, /ch/ ([tʃ]), /ts/, /r/, /w/, /y/ ([j]) and 5 vowels /a/, /i/, /u/, /e/, /o/ and a double consonant /Q/ or silence) by simply scanning among input speech without any specific segmentation techniques. On the other hand, the language processing part is made up of a predictive LR parser in which the LR parser is guided by the LR parsing table automatically generated from context-free grammar rules, and proceeds left-to-right without backtracking. Time alignment between the predicted phonemes and a sequence of the TDNN phoneme outputs is carried out by the DTW matching method. We call this 'hybrid' integrated recognition system the 'TDNN-LR' method. We report that large-vocabulary isolated word and continuous speech recognition using the TDNN-LR method provided excellent speaker-dependent recognition performance, where incremental training using a small number of training tokens is found to be very effective for adaptation of speaking rate. Furthermore, we report some new achievements as extensions of the TDNN-LR method: (1) two proposed NN architectures provide robust phoneme recognition performance on variations of speaking manner, (2) a speaker-adaptation technique can be realized using a NN mapping function between input and standard speakers and (3) new architectures proposed for speaker-independent recognition provide performance that nearly matches speaker-dependent recognition performance.
Scientific journal, English

On the Robustness of HMM and ANN Speech Recognition Algorithms
Y. Minami; T. Hanazawa; H. Iwamida; E. McDermott; K. Shikano; S. Katagiri; M. Nakagawa
ICSLP, 1345-1348, Nov. 1990, Peer-reviewed
International conference proceedings, English

Trigramモデルを用いた複数候補を求めるフレーム同期型 HMM 連続音声認識
南泰浩; 中川正雄
電子情報通信学会論文誌D, 電子情報通信学会情報・システムソサイエティ, J73-D-II, 9, 1383-1392, Sep. 1990, Peer-reviewed
Scientific journal, Japanese

時間遅れ神経回路による音韻スポッティング法と予測LRパーザを用いた大語い単語音声認識
南泰浩; 沢井秀文; 宮武正典
電子情報通信学会論文誌D, 電子情報通信学会情報・システムソサイエティ, J73-D-II, 6, 788-795, Jun. 1990, Peer-reviewed
Scientific journal, Japanese

INTEGRATED TRAINING FOR SPOTTING JAPANESE PHONEMES USING LARGE PHONEMIC TIME-DELAY NEURAL NETWORKS
M MIYATAKE; H SAWAI; Y MINAMI; K SHIKANO
ICASSP 90, VOLS 1-5, I E E E, 449-452, 1990, Peer-reviewed
International conference proceedings, English

VARIABLE BIT RATE PARCOR.VQ HYBRID VOCODER
K MIZUI; S WAKABAYASHI; M SATOH; Y MINAMI; M NAKAGAWA
DALLAS GLOBECOM 89, VOLS 1-3, I E E E, 1885-1889, 1989, Peer-reviewed
International conference proceedings, English

MISC

センター試験を対象とした高性能な英語ソルバーの実現
杉山弘晃; 成松宏美; 菊井玄一郎; 東中竜一郎; 堂坂浩二; 平博順; 南泰浩; 大和淳司
2020, 言語処理学会年次大会発表論文集(Web), 26th, 2188-4420, 202002257856951573

Solving the opinion summarization problem in English in the "Can a Robot Get into the University of Tokyo?" project
東中竜一郎; 杉山弘晃; 成松宏美; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 喜多智也; 南泰浩; 風間健流; 大和淳司
2018, 人工知能学会全国大会論文集(CD-ROM), 32nd, ROMBUNNO.2C1.02, Japanese, 1347-9881, 201802262605015084

Why Does It Matter Whether or Not AI is able to Pass University Entrance Examinations? : 1. Technical Challenges Revealed by Solving English Problems
東中竜一郎; 杉山弘晃; 堂坂浩二; 南泰浩; 成松宏美; 磯崎秀樹; 菊井玄一郎; 平博順; 大和淳司
15 Jun. 2017, 情報処理, 58, 7, 600-602, Japanese, Introduction scientific journal, 170000148666, AN00116625

Current status and future challenges for the English subject in the "Can a Robot Get into the University of Tokyo?" project
東中竜一郎; 杉山弘晃; 成松宏美; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩; 大和淳司
2017, 人工知能学会全国大会論文集(CD-ROM), 31st, ROMBUNNO.2H2‐1, Japanese, 1347-9881, 201802268089508328

センター試験における英語問題の回答手法
東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
2015, 言語処理学会年次大会発表論文集(Web), 21st, 2188-4420, 201502234485199950

Gender effect on infant word acquisition order
MINAMI Yasuhiro; KOBAYASHI Tessei
Tomasello pointed out that word acquisition ages are strongly affected by the social environment. In this report, as a fundamental investigation of gender effects, we examine whether word acquisition ages are affected by gender. To do so, we calculate word comprehension ages and word production ages from a database collected with the MacArthur-Bates Communicative Development Inventories. The gender-dependent correlation analysis of word production ages and word ages showed cross-gender universality. Moreover, we found a gender effect by removing the universality., The Institute of Electronics, Information and Communication Engineers, 29 May 2014, Technical report of IEICE. HIP, 114, 68, 61-66, Japanese, 0913-5685, 110009903784, AN10487237

Gender effect on infant word acquisition order
MINAMI Yasuhiro; KOBAYASHI Tessei
Tomasello pointed out that word acquisition ages are strongly affected by the social environment. In this report, as a fundamental investigation of gender effects, we examine whether word acquisition ages are affected by gender. To do so, we calculate word comprehension ages and word production ages from a database collected with the MacArthur-Bates Communicative Development Inventories. The gender-dependent correlation analysis of word production ages and word ages showed cross-gender universality. Moreover, we found a gender effect by removing the universality., The Institute of Electronics, Information and Communication Engineers, 29 May 2014, Technical report of IEICE. HCS, 114, 67, 61-66, Japanese, 0913-5685, 110009903742, AN10487226
URL

絵本を基にした対象年齢推定方法の検討
藤田早苗; 小林哲生; 平博順; 南泰浩; 田中貴秋
2014, 人工知能学会全国大会論文集(CD-ROM), 28th, 1347-9881, 201402211375582913

幼児の言語発達研究＜最前線＞
小林哲生; 南泰浩
2014, ヒューマンインタフェース学会誌, 16, 2, 29-34, Japanese, Peer-reviewed, Invited, Introduction scientific journal

Dialogue act tagging for microblog utterances using semantic category patterns
目黒豊美; 東中竜一郎; 杉山弘晃; 南泰浩
In this paper, we propose dialogue act tagging for utterances in microblogs. The dialogue act estimator is built using support vector machines (SVMs). To cope with the variety of words and expressions in microblogs, the feature vector uses N-grams of characters and words. In addition, the word N-gram features are abstracted into semantic categories by using a thesaurus. In our experiment, the proposed model outperformed naive baselines based on word N-grams., Information Processing Society of Japan (IPSJ), 18 Oct. 2013, IPSJ SIG Notes, 2013, 1, 1-6, Japanese, 110009613935, AN10442647

Investigation of infant vocabulary learning characteristic using comprehension-to-production (C2P) indexes for early development words
MINAMI Yasuhiro; KOBAYASHI Tessei; SUGIYAMA Hiroaki
Gentner argued that noun learning predominates over verb learning in early vocabulary development. Many studies examining the predominance of nouns in the part-of-speech distribution have supported this assumption. To explain this predominance, Gentner et al. and Maguire et al. proposed that vocabulary forms a continuum in an abstract representation space that is strongly related to the learning difficulty of words. However, no study has clearly shown the relation between learning difficulty and this abstract representation space. In this paper, we propose a direct index that connects learning difficulty to the abstract representation space of words using the CDI. Furthermore, we report the distribution of word categories in early vocabulary development on this index space., The Institute of Electronics, Information and Communication Engineers, 22 Feb. 2013, Technical report of IEICE. Thought and language, 112, 442, 37-42, Japanese, 0913-5685, 110009728695, AN10449078

対話処理における強化学習
南泰浩; 目黒豊美
計測自動制御学会, 2013, 計測と制御, 52, 10, 916-921, Japanese, Peer-reviewed, Invited, Introduction scientific journal, 0453-4662, 40019836182, AN00072406

Estimating the vocabulary spurt onset using two piece linear regression
南泰浩; 小林哲生; 杉山弘晃
日本音響学会聴覚研究委員会, 08 Mar. 2012, 聴覚研究会資料, 42, 2, 155-160, Japanese, 1346-1109, 40019248769, AN00227138

Dialog Control via Preference-learning based Inverse Reinforcement Learning
杉山弘晃; 目黒豊美; 南泰浩
人工知能学会, 2012, 人工知能学会全国大会論文集, 26, 1-4, Japanese, 1347-9881, 40020270054, AA11578981

Wizard of Oz experiment of listening-oriented dialogue control using POMDPs
目黒豊美; 南泰浩; 東中竜一郎
人工知能学会, 2012, 人工知能学会全国大会論文集, 26, 1-4, Japanese, 1347-9881, 40020270069, AA11578981

語彙爆発の新しい視点 : 日本語学習児の初期語彙発達に関する縦断データ解析
小林哲生; 南泰浩; 杉山弘晃
日本赤ちゃん学会, 2012, ベビーサイエンス, 12, 40-64, Japanese, 40019763996, AA11903404

統計的手法による音声対話制御
南泰浩
2012, 情報処理学会誌, 53, 10, 1088-1094, Japanese, Peer-reviewed, Invited, Introduction scientific journal, 20001036089

POMDP dialogue control using action durations
南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
This paper proposes a dialogue control method using action durations. We previously proposed a method combining an ordinary POMDP-based method with a probability-based method and extended it to handle trigram dialogue control. When this method is applied to less task-oriented dialogues, it over-generates actions that have high probabilities. To avoid this problem, we introduce duration control into our POMDP action generation process. The experimental results show that the proposed method can generate action sequences whose probability is similar to that of the training data while increasing the entropy of the actions. This increase means that the generated actions convey new information and that over-generation of the same actions is avoided. This confirms that our method generates appropriate action sequences., 情報処理学会, Apr. 2011, 情報処理学会研究報告, 2010, 6, 1-8, Japanese, 2186-2583, 110008583618

POMDP Dialogue Control using Action Duration
南泰浩; 目黒豊美; 東中竜一郎
人工知能学会, 2011, 人工知能学会全国大会論文集, 25, 1-4, Japanese, 1347-9881, 40020269460, AA11578981

部分観測マルコフ決定過程に基づく対話制御
南泰浩
2011, 音響学会誌, 67, 10, 482-487, Japanese, Peer-reviewed, Invited, Introduction scientific journal

人ロボット共生におけるコミュニケーション戦略の生成
前田英作; 南泰浩; 堂坂浩二
2011, 日本ロボット学会誌, 29, 10, 887-890, Japanese, Peer-reviewed, Invited, Introduction scientific journal

On a dynamically changing learning strategy for interactive visual scene understanding
KIMURA Akisato; MINAMI Yasuhiro; SAKANO Hitoshi; MAEDA Eisaku; SUGIYAMA Hiroaki
We humans believe that we can easily and naturally understand and verbalize most given visual scenes. On the other hand, as is widely known, the problem of visual scene understanding remains far from solved, despite its long history and significance. However, even in humans, almost all the abilities for visual scene understanding have to be acquired (albeit unintentionally) during development, except for basic sensory organs and a few fundamental functions. In this report, we discuss a novel approach to realizing sophisticated visual scene understanding in which computers acquire the ability naturally. Most of the discussion is directed to the learning strategy, which should be communicative, dynamically changed according to the system's own knowledge, and long-tailed. The framework poses many new and challenging problems not only for the multimedia research community but also for related communities such as HCI, computer vision, machine learning and cognitive science., The Institute of Electronics, Information and Communication Engineers, 02 Dec. 2010, Technical report of IEICE. PRMU, 110, 330, 53-54, Japanese, 110008675751

Thought-evoking multi-party dialogue system: CAMP
堂坂浩二; 南泰浩
人工知能学会, 28 Oct. 2010, 言語・音声理解と対話処理研究会, 60, 35-38, Japanese, 0918-5682, 40017365379, AN10432166

Action planning for interactive visual scene understanding based on knowledge confidence defined on latent spaces
SEKHON Gurbachan; KIMURA Akisato; MINAMI Yasuhiro; SAKANO Hitoshi; MAEDA Eisaku
This report proposes a method for action planning in a system of interactive visual scene understanding through the use of system knowledge and its confidence. The knowledge confidence is defined as the combination of the following two properties on the latent space of a topic model connecting image features and text labels: 1) Similarity between an input sample and training samples on the latent space, and 2) the overall associability between each text label as determined by the content of the training samples. We evaluate the proposed method in the context of annotation accuracy and effort for providing answers from users. The experimental results with PASCAL VOC2008 dataset indicate that our proposed method achieved comparable or better annotation accuracy with less effort compared with strategies of 1) always asking the name of objects and 2) generating random questions., The Institute of Electronics, Information and Communication Engineers, 29 Aug. 2010, Technical report of IEICE. PRMU, 110, 187, 201-208, English, 0913-5685, 110008107179

Dialogue control by POMDP using dialogue data statistics
Minami Yasuhiro; Mori Akira; Meguro Toyomi; Higashinaka Ryuichiro; Dohsaka Kohji; Maeda Eisaku
The Institute of Electronics, Information and Communication Engineers, 21 Dec. 2009, IEICE technical report, 109, 355, 83-88, Japanese, 0913-5685, 110008002098

Analyzing the Characteristics of Listening-oriented Dialogue for Building Listening Agents
MEGURO TOYOMI; HIGASHINAKA RYUICHIRO; DOHSAKA KOHJI; MINAMI YASUHIRO; ISOZAKI HIDEKI
Our aim is to build listening agents that can attentively listen to the user and satisfy his/her desire to speak and be heard. This paper investigates the characteristics of such listening-oriented dialogues so that the listening process can be achieved by automated dialogue systems. We collected both listening-oriented dialogues and casual conversations, and analyzed them by comparing the frequencies of dialogue acts as well as the dialogue flows using Hidden Markov Models (HMMs). The analysis revealed that listening-oriented dialogues and casual conversations have characteristically different dialogue flows, and that it is important for listening agents to self-disclose before asking questions and to utter more questions and acknowledgments than in casual conversation. We also investigated the effects of personality traits on listening-oriented dialogue and found that a dialogue becomes characteristically different depending on the personality traits of the speakers and listeners., 21 Sep. 2009, 研究報告自然言語処理（NL）, 2009, 10, 1-6, Japanese, 0919-6072, 110008003241, AN10115061

Controlling thought-evoking dialogue using POMDP
MINAMI Yasuhiro; SAWAKI Minako; HIGASHINAKA Ryuichiro; DOHSAKA Kohji
We are researching thought-evoking dialogue systems in which conversational agents appropriately influence users and evoke their voluntary thoughts to motivate human communication. This paper proposes a thought-evoking quiz dialogue system using a Partially Observable Markov Decision Process (POMDP) that can handle uncertain information such as paralinguistic information. As the uncertain information, we employ the user's level of difficulty in handling quiz hints. Another person detects this difficulty level by observing the user's facial and voice information. The system controls the user's difficulty levels (easy, neutral, and difficult) for the hints by skipping hints based on a POMDP policy learned by reinforcement learning. This paper evaluates the proposed system in simulation experiments., Information Processing Society of Japan (IPSJ), 02 Dec. 2008, IPSJ SIG Notes, 2008, 123, 97-102, Japanese, 0919-6072, 110007114728, AN10442647

Controlling thought-evoking dialogue using POMDP
MINAMI Yasuhiro; SAWAKI Minako; HIGASHINAKA Ryuichiro; DOHSAKA Kohji
We are researching thought-evoking dialogue systems in which conversational agents appropriately influence users and evoke their voluntary thoughts to motivate human communication. This paper proposes a thought-evoking quiz dialogue system using a Partially Observable Markov Decision Process (POMDP) that can handle uncertain information such as paralinguistic information. As the uncertain information, we employ the user's level of difficulty in handling quiz hints. Another person detects this difficulty level by observing the user's facial and voice information. The system controls the user's difficulty levels (easy, neutral, and difficult) for the hints by skipping hints based on a POMDP policy learned by reinforcement learning. This paper evaluates the proposed system in simulation experiments., The Institute of Electronics, Information and Communication Engineers, 02 Dec. 2008, IEICE technical report, 108, 337, 97-102, Japanese, 0913-5685, 110007114428, AN10091225

まっしゅるーむの世界－環境知能の実現－
南泰浩; 堂坂浩二; 澤木美奈子; 森啓; 前田英作
2008, ヒューマンインタフェース学会誌, 10, 2, 5-10, Japanese, Peer-reviewed, Invited, Introduction scientific journal

クイズ対話システムの構築と音声認識性能による評価
南泰浩; 東中竜一郎; 澤木美奈子; 堂坂浩二; 山田武士; 松林達史; 磯崎秀樹; 前田英作
2007, 日本音響学会研究発表会講演論文集(CD-ROM), 2007, 1880-7658, 200902257906453498

Evaluation of the SOLON Speech Recognition System : 2006 Benchmark using the Corpus of Spontaneous Japanese
NAKAMURA Atsushi; OBA Takanobu; WATANABE Shinji; ISHIZUKA Kentaro; FUJIMOTO Masakiyo; HORI Takaaki; MCDERMOTT Erik; MINAMI Yasuhiro
This article describes results from the latest benchmark tests of our speech recognition system 'SOLON' using the Corpus of Spontaneous Japanese (CSJ). Improvements in recognition accuracy obtained with several techniques, including prior voice-activity detection, speaking-rate dependent analysis, corrective language modeling, discriminative training of full-covariance parameters, unsupervised model adaptation, and their combinations, are reported., The Institute of Electronics, Information and Communication Engineers, 15 Dec. 2006, IEICE technical report, 106, 444, 73-78, Japanese, 0913-5685, 110006163063, AN10013221

A Transdisciplinary Approach to Human-Computer Interaction with Kankyo Chinou : towards new "intellect" in future "environments"
MAEDA Eisaku; MINAMI Yasuhiro; DOHSAKA KOHJI; MORI Akira; KONDOH Tadahisa
A research project on "ambient intelligence" (Kankyo Chinou in Japanese), targeting new lifestyles made possible by communication science, was launched two years ago by NTT Communication Science Laboratories. Research activities on "ambient intelligence" should bridge the boundaries between technological fields and thus cover the entire field of communication science, rather than be limited to specific fields. Besides performing the basic R&D, we are striving to establish this concept in a comprehensive and strategic way. This article introduces the achievements of this project and the details of the demonstration systems developed., The Institute of Electronics, Information and Communication Engineers, 12 Oct. 2006, IEICE technical report, 106, 298, 51-56, Japanese, 0913-5685, 110004851875

A Transdisciplinary Approach to Human-Computer Interaction with Kankyo Chinou : towards new "intellect" in future "environments"
MAEDA Eisaku; MINAMI Yasuhiro; DOHSAKA KOHJI; MORI Akira; KONDOH Tadahisa
A research project on "ambient intelligence" (Kankyo Chinou in Japanese), targeting new lifestyles made possible by communication science, was launched two years ago by NTT Communication Science Laboratories. Research activities on "ambient intelligence" should bridge the boundaries between technological fields and thus cover the entire field of communication science, rather than be limited to specific fields. Besides performing the basic R&D, we are striving to establish this concept in a comprehensive and strategic way. This article introduces the achievements of this project and the details of the demonstration systems developed., The Institute of Electronics, Information and Communication Engineers, 12 Oct. 2006, IEICE technical report, 106, 300, 69-74, Japanese, 0913-5685, 110004852058

A Benchmark Evaluation of Speech Recognizer SOLON using The Corpus of Spontaneous Japanese (Ver. 1.0)
NAKAMURA Atsushi; OBA Takanobu; WATANABE Shinji; ISHIZUKA Kentaro; HORI Takaaki; SCHUSTER Mike; MCDERMOTT Erik; MINAMI Yasuhiro
SOLON is a speech recognition testbed system developed at NTT Communication Science Laboratories. This paper reports results from the latest benchmark evaluation of SOLON using the Corpus of Spontaneous Japanese (CSJ). The effectiveness of several techniques, including minimum classification error training and full-covariance modeling, is demonstrated through experiments. Results of recognition error analysis and additional evaluations are also described., The Institute of Electronics, Information and Communication Engineers, 22 Dec. 2005, IEICE technical report, 105, 494, 7-12, Japanese, 0913-5685, 110003488505, AN10091225

A Benchmark Evaluation of Speech Recognizer SOLON using The Corpus of Spontaneous Japanese (Ver. 1.0)
NAKAMURA Atsushi; OBA Takanobu; WATANABE Shinji; ISHIZUKA Kentaro; HORI Takaaki; SCHUSTER Mike; MCDERMOTT Erik; MINAMI Yasuhiro
SOLON is a speech recognition testbed system developed at NTT Communication Science Laboratories. This paper reports results from the latest benchmark evaluation of SOLON using the Corpus of Spontaneous Japanese (CSJ). The effectiveness of several techniques, including minimum classification error training and full-covariance modeling, is demonstrated through experiments. Results of recognition error analysis and additional evaluations are also described., Information Processing Society of Japan (IPSJ), 22 Dec. 2005, IPSJ SIG Notes, 2005, 127, 97-102, Japanese, 0919-6072, 110003494733, AN10442647

Applications of the Bayesian network to audio signal recognition
Kashino Kunio; Minami Yasuhiro
Acoustical Society of Japan, 2005, THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN, 61, 12, 714-719, Japanese, Peer-reviewed, Invited, Introduction scientific journal, 0369-4232, 110004019698, AN00186234

Speech recognition method based on trajectories generated by Kalman filters
MINAMI Yasuhiro
22 Dec. 2004, 情報処理学会研究報告. SLP, 音声言語情報処理, 54, 49-54, English, 0919-6072, 10014062518, AN10442647

音声生成モデルを考慮した音声認識
南泰浩
2003, 日本音響学会誌, 59, 11, Japanese, Peer-reviewed, Invited, Introduction scientific journal

A Recognition Method with Parametric Trajectory Synthesized Using Direct Relations Between Static and Dynamic Feature Vector Time Series
MINAMI Yasuhiro; MCDERMOTT Erik; NAKAMURA Atsushi; KATAGIRI Shigeru
18 Mar. 2002, 日本音響学会研究発表会講演論文集, 2002, 1, 83-84, Japanese, 1340-3168, 10018033127, AN00351181

LANGUAGE MODEL SYNCHRONIZATION FOR IMPROVED BEAM-SEARCH PERFORMANCE IN LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
WILLETT Daniel; MCDERMOTT Erik; MINAMI Yasuhiro; KATAGIRI Shigeru
01 Oct. 2001, 日本音響学会研究発表会講演論文集, 2001, 2, 99-100, English, 1340-3168, 10007458257, AN00351181

An efficient search method for continuous speech recognition using network structure.
HANAZAWA Ken; MINAMI Yasuhiro; FURUI Sadaoki
01 Mar. 1997, 日本音響学会研究発表会講演論文集, 1997, 1, 51-52, Japanese, 1340-3168, 10002742165, AN00351181

Digit string recognition in spontaneous speech
BAUCHE Etienne; MINAMI Yasuhiro; GAJIC Bojana; MATSUOKA Tatsuo; FURUI Sadaoki
01 Mar. 1997, 日本音響学会研究発表会講演論文集, 1997, 1, 169-170, Japanese, 1340-3168, 10002742502, AN00351181

Extended HMM composition incorporating power variance.
MINAMI Yasuhiro; FURUI Sadaoki
01 Sep. 1996, 日本音響学会研究発表会講演論文集, 1996, 2, 141-142, Japanese, 1340-3168, 10002739836, AN00351181

雑音と歪みを含んだ音声へのHMM適応化手法の評価
南泰浩
1996, 日本音響学会講演論文集, 85-86, 10004085867, AN00351181

雑音と歪みを含んだ音声へのHMM適応化手法の評価
南泰浩
1996, 音講論集, 2, 10004086017, AN00351181

HMM adaptation method using maximum likelihood estimation
MINAMI Yasuhiro; FURUI Sadaoki
01 Sep. 1995, 日本音響学会研究発表会講演論文集, 1995, 2, 1-2, Japanese, 1340-3168, 10002734243, AN00351181

Adaptation method using maximum likelihood procedure based on HMM composition
MINAMI Yasuhiro; FURUI Sadaoki
This paper proposes an adaptation method for universal noise (combination of additive noise and multiplicative distortion) based on the HMM composition (compensation) technique. Although the original HMM composition can be applied only to additive noise, our new method can also estimate multiplicative distortion by maximizing the likelihood value. The signal-to-noise ratio is automatically estimated as part of the estimation of multiplicative distortion. Phoneme recognition experiments show that this method improves recognition accuracy for noisy and distorted speech., The Institute of Electronics, Information and Communication Engineers, 22 Jun. 1995, IEICE technical report. Speech, 95, 122, 45-50, Japanese, 110003296453, AN10013221

A maximum likelihood procedure for an HMM adaptation method
MINAMI Yasuhiro; FURUI Sadaoki
01 Mar. 1995, 日本音響学会研究発表会講演論文集, 1995, 1, 61-62, Japanese, 1340-3168, 10002731969, AN00351181

自由発声音声認識における意味を考慮した2段 LR パーザの検討
南泰浩
1993, 音講論, 69-70, 10006730761, AN00351181

Comparisons between Continuous Speech Recognition Systems (ATREUS) at ATR.
山口耕市; 嵯峨山茂樹; 服部浩明; 小森康弘; 沢井秀文; 花沢利行; 中村哲; 甲斐充彦; 南泰浩
Oct. 1992, 日本音響学会研究発表会講演論文集, 1992, Autumn Pt 1, 181-182, Japanese, 1340-3168, 200902051842663488

不特定話者連続音声データベースを用いたHMMの連結学習
南泰浩
1992, 音講論集, 9-10, 10006764155, AN00351181

TDNN音韻スポッティングと拡張LRパーザを用いた文節音声認識
南泰浩
1989, 平1秋音響講論集, 3, 10006754381

Books and other publications

人工知能プロジェクト「ロボットは東大に入れるか」: 第三次AIブームの到達点と限界
Scholarly book, Japanese, Joint work, 東京大学出版会, 28 Sep. 2018

これからの強化学習
Scholarly book, Japanese, Joint work, 森北出版, 27 Oct. 2016

Predicting User Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and Prediction Models
R. Higashinaka; Y. Minami; K. Dohsaka; T. Meguro
English, Joint work, 2010

Dialogue Control by POMDP Using Dialogue Data Statistics
Y. Minami; A. Mori; T. Meguro; R. Higashinaka; K. Dohsaka; E. Maeda
English, Joint work, 2010

環境知能のすすめ -情報化社会の新しいパラダイム-
外村佳伸; 前田英作; 竹内郁雄; 東浩紀; 石黒浩; 下條信輔; 堂坂浩二; 南泰浩; 中島秀之; 輿水大和
Japanese, Joint work, 2008

音声認識の基礎（上）
古井貞煕; 鹿野清宏; 嵯峨山茂樹; 松岡達雄; 南泰浩; 松井知子; 高橋敏; 山田智一; 吉岡理
Japanese, Joint translation, 1995

FM7 解析マニュアルフェーズiii
菊地寿; 蓑原辰夫; 南泰浩
Japanese, Joint work, 1984

Lectures, oral presentations, etc.

論文執筆支援を目的とした引用要否判定タスクのドメイン間比較
小山康平; 小林恵大; 成松宏美; 南泰浩
言語処理学会第２９回年次大会
16 Mar. 2023

人間の多次元的な心的表象に基づく幼児語彙獲得モデルの構築
藤田守太; 南泰浩
言語処理学会第２９回年次大会
16 Mar. 2023

知識グラフと Wikipedia を用いた雑談対話モデルの構築
郭恩孚; 南泰浩
言語処理学会第２９回年次大会
14 Mar. 2023

共通基盤の構築における名付けの有用性の分析
齋藤結; 光田航; 東中竜一郎; 南泰浩
言語処理学会第２９回年次大会
15 Feb. 2023

話題継続とペルソナを考慮した雑談対話システムの構築
佐藤明智; 南泰浩; 金子俊太; 谷口伊織; 郭
言語・音声理解と対話処理研究会
Dec. 2022

対話での共通基盤構築過程における名付けの分析
齋藤結; 光田航; 東中竜一郎; 南泰浩
Oral presentation, Japanese, 言語処理学会第２８回年次大会, Domestic conference
15 Mar. 2022

引用要否判定タスクにおけるモデルの性能評価とデータの妥当性分析
小山康平; 小林恵大; 成松宏美; 南泰浩
Oral presentation, Japanese, 言語処理学会第２８回年次大会, Domestic conference
15 Mar. 2022

学術論文PDFからの関連研究章と引用情報の抽出による論文執筆支援のためのデータセット構築
小林恵大; 小山康平; 成松宏美; 南泰浩
Oral presentation, Japanese, 言語処理学会第２８回年次大会, Domestic conference
15 Mar. 2022

固有名詞に注目したTransformerによる雑談対話モデルの構築
郭恩孚; 南泰浩
Oral presentation, Japanese, 言語処理学会第２８回年次大会, Domestic conference
15 Mar. 2022

BERT による引用要否判定とエラー分析
堂坂浩二; 成松宏美; 小山康平; 東中竜一郎; 南泰浩; 田盛大悟; 平
人工知能学会全国大会
Jun. 2021

相互排他性を考慮した深層強化学習による幼児語彙獲得モデル
藤田守太; 南泰浩; 田口真輝
Oral presentation, Japanese, 言語処理学会第27回年次大会(NLP2021), Domestic conference
17 Jan. 2021

学術論文における関連研究の執筆支援のための被引用論文の推定
小山康平; 南泰浩; 成松宏美; 堂坂浩二; 東中竜一郎; 田盛大悟; 平博順
Oral presentation, Japanese, 言語処理学会第27回年次大会(NLP2021), Domestic conference
17 Jan. 2021

学術論文における関連研究の執筆支援のためのタスク設計およびデータ構築
成松宏美; 小山康平; 堂坂浩二; 田盛大悟; 東中竜一郎; 南泰浩; 平博順
Oral presentation, Japanese, 言語処理学会第27回年次大会(NLP2021), Domestic conference
17 Jan. 2021

ニューラルネットワーク強化学習を用いた幼児語彙獲得のモデル化
Oral presentation, Japanese, ヒューマンコミュニケーション基礎研究会, Domestic conference
26 Jan. 2020

乳児院入所児における言語発達の特徴-語彙数・語彙獲得順序・品詞カテゴリからの分析
坂本有香; 奥村優子; 南泰浩; 麦谷綾子; 伊藤嘉余子; 小林哲生
Oral presentation, Japanese, ヒューマンコミュニケーション基礎研究会, Domestic conference
25 Jan. 2020

幼児の語彙発達における地域差の分析
坂本有香; 南泰浩; 曹妍; 奥村優子; 小林哲生
Oral presentation, Japanese, 赤ちゃん学会第19回学術集会, Domestic conference
06 Jul. 2019

多言語コーパスを用いた幼児語彙獲得時期での男女間相関の特性
藤田浩貴; 南泰浩; 小林哲生; 奥村優子
Oral presentation, Japanese, 言語処理学会第24回年次大会, Domestic conference
Mar. 2018

幼児の簡易語彙能力チェックリスト作成における幼児分類の効率化
塚田元春; 南泰浩; 小林哲生; 奥村優子
Oral presentation, Japanese, 言語処理学会第24回年次大会, Domestic conference
Mar. 2018

DRQNによる幼児の語彙獲得のモデル化
野口輝; 南泰浩
Oral presentation, Japanese, 言語処理学会第24回年次大会, Domestic conference
Mar. 2018

ニューラルネットワークと強化学習による幼児の語彙獲得のモデル化
野口輝; 南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
Jan. 2018

幼児の言語発達における共通ボキャブラリー指数の提案
曹妍; 南泰浩; 奥村優子; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
Jan. 2018

多言語における幼児語彙獲得時期の男女間相関の比較
藤田浩貴; 南泰浩; 小林哲生; 奥村優子
Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
Jan. 2018

幼児の能力推定のための簡易語彙チェックリストの提案
森山佑亮; 南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
2017

マルチターン対話における次発話予測での効果的な特徴量の統合手法およびその分析
玉木竜二; 南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告 (言語理解とコミュニケーション), Domestic conference
2017

大規模幼児語彙発達データによる語彙獲得現象の分析
森山佑亮; 南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
2017

乳幼児の語理解・発話日齢に与える母親の教育年数の影響
森山佑亮; 南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告(ヒューマンコミュニケーション基礎), Domestic conference
2016

語彙チェックリストアプリによる幼児語彙発達データ収集の試み
小林哲生; 奥村優子; 南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告, Domestic conference
Jan. 2016

言語発達遅延児における語彙成長記録アプリ活用の試み
阿久津由紀子; 小林哲生; 小形哲也; 渡辺佐和; 齋藤貴美子; 南泰浩
Oral presentation, Japanese, 日本言語聴覚学会, Domestic conference
2016

Three-way restricted Boltzmann machine による音声モデリングに基づく話者・音素の同時認識
中鹿亘; 南泰浩
Oral presentation, Japanese, 研究報告音楽情報科学 (MUS)
2016

乳幼児の語理解・発話日齢に与える母親の教育年数の影響
森山佑亮; 南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告, Domestic conference
Jan. 2016

幼児語彙習得順序における言語共通性と依存性について
南泰浩; 小林哲生
Poster presentation, Japanese, 日本音響学会秋季講演論文集, Domestic conference
17 Mar. 2015

センター試験における英語問題の回答手法
東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
Poster presentation, Japanese, 言語処理学会第２１回年次大会, Domestic conference
17 Mar. 2015

本幼児の語彙習得順序に関する性別依存性について
南泰浩; 小林哲生
Poster presentation, Japanese, 電子情報通信学会技術研究報告HCS
30 Jan. 2015

日本語習得児における語彙カテゴリ構成の発達的変遷
小林哲生; 南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告HCS
2014

幼児語彙習得順序における性別の影響について
南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告HCS
2014

1-2歳児における語彙カテゴリ構成の発達的変遷：大規模横断データを用いた検討
小林哲生; 南泰浩
Invited oral presentation, Japanese, 日本教育心理学会第56回総会(JAEP56)
2014

絵本を基にした対象年齢推定方法の検討
藤田早苗; 小林哲生; 平博順; 南泰浩; 田中貴秋
Invited oral presentation, Japanese, 第28回人工知能学会全国大会
2014

幼児コンテンツ制作支援のための語彙検索システムの提案とその評価
小林哲生; 南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告HIP
2013

幼児早期出現語理解-発話指標による幼児語彙学習特徴の検証
南泰浩; 小林哲生; 杉山弘晃
Oral presentation, Japanese, 電子情報通信学会技術研究報告TL
2013

単語の発話音韻長と幼児の語彙獲得期間との関係
南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2013

語彙の身体性が獲得時期の個人差に与える影響
杉山弘晃; 小林哲生; 南泰浩
Invited oral presentation, Japanese, 赤ちゃん学会第13回学術集会
2013

幼児コンテンツ制作支援のための語彙検索システム語の習得月齢・習得率の指定による該当語の選択
小林哲生; 南泰浩
Invited oral presentation, Japanese, 第13回学術集会
2013

語の学習では本当に幼児は名詞を早く獲得する？―語の理解・発話日齢の推定による名詞優位性の言語間比較―
南泰浩; 小林哲生; 杉山弘晃
Invited oral presentation, Japanese, 赤ちゃん学会第13回学術集会
2013

幼児早期出現語の理解-発話指標による名詞学習の優位性の検証
南泰浩; 小林哲生
Invited oral presentation, Japanese, 言語処理学会第19回年次大会
2013

折れ線近似による語彙爆発開始時期の推定
南泰浩; 小林哲生; 杉山弘晃
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2012

初期語彙発達の急増期における統計的性質と特徴量抽出
南泰浩; 小林哲生
Oral presentation, Japanese, 電子情報通信学会技術研究報告TL
2012

POMDP を用いた聞き役対話制御部の Wizard of Oz 実験による評価
目黒豊美; 南泰浩; 東中竜一郎; 堂坂浩二
Invited oral presentation, Japanese, 人工知能学会全国大会（第26回）
2012

２ツイートを用いた対話モデルの構築
東中竜一郎; 川前徳章; 貞光九月; 南泰浩; 目黒豊美; 堂坂浩二; 稲垣博人
Invited oral presentation, Japanese, 言語処理学会第18回年次大会
2012

順序学習に基づく逆強化学習による対話制御
杉山弘晃; 目黒豊美; 南泰浩
Invited oral presentation, Japanese, 人工知能学会全国大会（第26回）
2012

語彙学習速度の線形性を利用した語彙学習日齢の予測
杉山弘晃; 小林哲生; 南泰浩
Invited oral presentation, Japanese, 赤ちゃん学会第 12 回学術集会
2012

幼児コンテンツ作成のための発達に即した語彙検索システムの作成
小林哲生; 南泰浩
Invited oral presentation, Japanese, 教育心理学会総会
2012

縦断および横断データを用いた幼児早期出現語の獲得月齢の特定
小林哲生; 南泰浩; 永田昌明
Invited oral presentation, Japanese, 言語処理学会第18回年次大会
2012

幼児の語彙学習速度と語彙カテゴリー構成
小林哲生; 南泰浩; 杉山弘晃
Invited oral presentation, Japanese, 赤ちゃん学会第 12 回学術集会
2012

線形関数とプラトー割り込みによる語彙発達モデルの検証―幼児の語彙発達におけるポアソン過程性の検証―
南泰浩; 小林哲生; 杉山弘晃
Invited oral presentation, Japanese, 赤ちゃん学会第 12 回学術集会
2012

カルマンフィルタを用いた語彙発達におけるプラトー時期の推定
南泰浩; 小林哲生; 杉山弘晃
Invited oral presentation, Japanese, 音響学会秋季
2012

線形関数とプラトー割込による幼児語彙発達のモデル化
南泰浩; 小林哲生; 杉山弘晃
Invited oral presentation, Japanese, 言語処理学会第18回年次大会
2012

POMDP を用いた聞き役対話システムの対話制御
目黒豊美; 東中竜一郎; 南泰浩; 堂坂浩二
Invited oral presentation, Japanese, 言語処理学会第17回年次大会
2012

アクション継続長制御を利用する POMDP 対話制御
南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
Oral presentation, Japanese, 情報処理学会研究報告HCI
2011

共通状態と連結学習を用いた HMM によるコールセンタ対話の要約
東中竜一郎; 南泰浩; 西川仁; 堂坂浩二; 目黒豊美; 小橋川哲; 政瀧浩和; 吉岡理; 高橋敏; 菊井玄一郎
Invited oral presentation, Japanese, 言語処理学会第17回年次大会
2011

アクション継続長制御を用いた POMDP による対話制御
南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
Invited oral presentation, Japanese, 人工知能学会全国大会論文集
2011

ユーザ支援システムのための人の行動タイミング決定方策の分析
杉山弘晃; 南泰浩
Invited oral presentation, Japanese, 第28回日本ロボット学会学術講演会
2011

思考喚起型多人数対話システム--キャンプ
堂坂浩二; 南泰浩
Oral presentation, Japanese, 人工知能学会言語・音声理解と対話処理研究会
2010

POMDP による Trigram 対話制御
南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 森啓; 前田英作
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2010

保有知識の確信度に基づく対話型映像認識理解システムの質問生成戦略
セクホン・ガーバチャン; 木村昭悟; 南泰浩; 坂野鋭; 前田英作
Oral presentation, Japanese, 電子情報通信学会技術研究報告IBISML
2010

音声対話におけるエージェント発話行動の適応的調整
堂坂浩二; 金本淳志; 東中竜一郎; 南泰浩; 前田英作
Invited oral presentation, Japanese, 人工知能学会全国大会
2010

対話データを用いた POMDP による統計的対話制御手法の解析
南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 前田英作
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2010

統計的モデルを用いた POMDP による対話制御
南泰浩; 目黒豊美; 東中竜一郎; 森啓; 堂坂浩二; 前田英作
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2010

聞き役対話システムの構築を目的とした聞き役対話の分析
目黒豊美; 東中竜一郎; 堂坂浩二; 南泰浩; 磯崎秀樹
Oral presentation, Japanese, 情報処理学会研究報告NL
2009

対話データの統計量を用いた POMDP による対話制御
南泰浩; 森啓; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
Oral presentation, Japanese, 情報処理学会研究報告SLP
2009

音声認識システム SOLON における日本語講演音声への教師なし適応に関する評価
大庭隆伸; 渡部晋治; 石塚健太郎; 藤本雅清; 堀貴明; マックダーモット・エリック; 南泰浩; 中村篤
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2009

POMDP を利用した思考喚起型対話の制御
南泰浩; 澤木美奈子; 東中竜一郎; 堂坂浩二
Oral presentation, Japanese, 情報処理学会研究報告SLP
2008

クイズ対話システムの構築と音声認識性能による評価
南泰浩; 東中竜一郎; 澤木美奈子; 堂坂浩二; 山田武士; 松林達史; 磯崎秀樹; 前田英作
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2007

カルマンフィルタに基づく音声認識手法における混合ガウス分布モデルの検討
南泰浩
Invited oral presentation, Japanese, 日本音響学会秋講演論文集
2007

カルマンフィルタを用いた音声認識
南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2007

環境知能の実現に向けた分野横断型研究の試み
前田英作; 南泰浩; 堂坂浩二; 森啓; 近藤公久
Oral presentation, Japanese, 電子情報通信学会技術研究報告PRMU
2006

音声認識システム SOLON の日本語話し言葉コーパスによる評価(2006年版)
中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
Oral presentation, Japanese, 情報処理学会研究報告SLP
2006

音声認識システム SOLON の日本語話し言葉コーパス（公開版ver1.0）による評価
中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2006

カルマンフィルタによる音声認識のための特徴量トラジェクトリ生成法
南泰浩; マックダーモット・エリック; 中村篤
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2006

ベイズ的基準を用いた状態共有型 HMM 構造の選択
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2005

変分ベイズを用いた音声認識
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation, Japanese, 第8回情報論的学習理論ワークショップ予稿集
2005

音声認識システム SOLON の日本語話し言葉コーパス（公開版ver1.0）による評価
中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
Oral presentation, Japanese, 情報処理学会研究報告SLP
2005

音声特徴抽出法 SPADE における歪補正法の効果
石塚健太郎; 宮崎昇; 中谷智広; 南泰浩
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2005

カルマンフィルタにより生成されたトラジェクトリに基づく音声認識
南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2004

音声認識でのダイナミクスの表現
南泰浩
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2004

帯域内での周期性・非周期性を表す音声特徴抽出法 SPADE の提案と AURORA-2J を用いた耐雑音性評価
石塚健太郎; 宮崎昇; 中谷智広; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2004

音声認識システム SOLON の日本語話し言葉コーパスにおける評価
渡部晋治; 堀貴明; マクダーモット・エリック; 南泰浩; 中村篤
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2004

WFST の高速on-the-Fly合成による超大語彙連続音声認識
堀貴明; 堀智織; 南泰浩
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2004

有限状態トランスデューサ型デコーダの性能改善
堀貴明; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2004

特徴量トラジェクトリによる音声認識手法の理論的考察
南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2004

変分ベイズ法の音響モデル適応への応用
渡部晋治; 南泰浩; 中村篤; 上田修功
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2004

有限状態トランスデューサによる音声要約法の評価
堀貴明; 堀智織; 南泰浩
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2003

有限状態トランスデューサによる音声認識・文整形・要約処理の統合
堀貴明; 堀智織; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2003

変分ベイズ法の音声認識への適用
渡部晋治; 南泰浩; 中村篤; 上田修功
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2003

ベイズ的アプローチに基づく状態共有型 HMM 構造の学習
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2002

実対話音声を用いた有限状態トランスデューサ型認識デコーダの評価
奈木野豪秀; ヴィレット・ダニエル; 南泰浩; 中村篤; マクダーモット・エリック; 宮崎昇; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
2002

セグメントモデルに基づく音声認識
南泰浩
Oral presentation, Japanese, 情報処理学会音声言語情報処理研究会SIG-SLP
2002

有限状態トランスデューサによる音声認識と文整形処理の統合
堀貴明; ヴィレット・ダニエル; 南泰浩
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2002

混合分布型 HMM を用いたトラジェクトリパラメータ生成による音声認識手法の評価
南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2002

静的特徴量と動的特徴量の関係を用いたトラジェクトリパラメータ生成による音声認識手法
南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2002

バイノーラル音源分離の音声認識による評価
中谷智広; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2002

On-Line Transducer Composition for Memory-Efficient Search in LVCSR
ヴィレット・ダニエル; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
2002

Pervasive Unsupervised Adaptation for Off-Line Lecture Speech Transcription
ウィレット・ダニエル; ニスラー・トーマス; マクダーモット・エリック; 南泰浩; 片桐滋
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2002

Language Model Synchronization for Improved Beam-Search Performance in Large Vocabulary Continuous Speech Recognition
ヴィレット・ダニエル; マックダーモット・エリック; 南泰浩; 片桐滋
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
2002

A Time-Synchronous Viterbi-Decoder for Arbitrary Speech Recognition Tasks Defined by Finite State Transducers
ヴィレット・ダニエル; マックダーモット・エリック; 南泰浩; 中村篤; 片桐滋
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
Oct. 2001

連続音声認識のためのネットワーク構造をもちいた効率的探索手法
花沢健; 南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
Mar. 2001

話者認識技術の実用化に向けて
松井知子; 吉岡理; 南泰浩
Oral presentation, Japanese, 映像情報メディア学会技術報告マルチメディア情報処理研究会
1998

パワーの分散を考慮した拡張 HMM 合成法
南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会講演論文集
1997

自由発声中の連続数字音声認識
ボッシュ・エティエン; 南泰浩; ガジク・ボヤナ; 松岡達雄; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1997

Evaluation of Speech Recognition Performance Degradation for a Moving Speaker in Anechoic Conditions
ジロン・フランク; 田中雅史; 古家賢一; 南泰浩
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1997

雑音と歪みを含んだ音声への HMM 適応化手法の評価
南泰浩; 高木幸一; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1997

尤度最大化原理による HMM 適応化法
南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
Mar. 1996

パワーの分散を考慮した拡張 HMM 合成法
南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1996

雑音と歪みを含んだ音声への HMM 適応化手法の評価
南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1996

HMM 合成に基づく尤度最大化適応法
南泰浩; 古井貞煕
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
1995

最尤推定法を用いた HMM 適応化法
南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1995

電話音声認識のための音響モデルの回線特性への適応化
松岡達雄; グロ・ピエールエマニエル; 南泰浩; 古井貞煕
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1995

電話番号案内を対象としたマルチモーダル対話システムの作成と音声入力の評価
吉岡理; 南泰浩; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
1994

電話番号案内を対象としたマルチモーダル対話システムにおける音声入力の評価
吉岡理; 南泰浩; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1994

HMM トレリス計算における状態継続時間制限アルゴリズム
高橋敏; 南泰浩; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1994

HMM トレリス計算における状態継続時間制限アルゴリズム
高橋敏; 南泰浩; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
Oct. 1993

Improving Phoneme HMMs for Large-Vocabulary Spontaneous Speech Recognition
高橋敏; 南泰浩; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
1993

ATR における連続音声認識システム ATREUS の諸方式と性能
永井明人; 山口耕市; 鷹見淳一; 大倉計美; 小坂哲夫; 福沢圭二; 加藤喜永; S. Harald; 村上仁一; 杉山雅英; 嵯峨山茂樹; 保坂順子; 森元逞; 北研二; 服部浩明; 小森康弘; 沢井秀文; 花沢利行; 中村哲; 甲斐充彦; 南泰浩; 川端豪; 鹿野清宏; 榑松明
Oral presentation, Japanese, 電子情報通信学会技術研究報告SP
1993

電話番号案内を対象としたマルチモーダル対話システムの作成
吉岡理; 南泰浩; 山田智一; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1993

自由発声を対象とした不特定話者大語彙連続音声認識法
南泰浩; 鹿野清宏; 高橋敏; 山田智一
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1993

音韻環境依存 HMM と候補のマージを用いた不特定話者大語彙連続音声認識
南泰浩; 高橋敏; 鹿野清宏; 山田智一
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1993

自由発声音声認識における意味を考慮した2段 LR パーザの検討
南泰浩; 山田智一; 吉岡理; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1993

HMM の合成による雑音下の大語彙連続音声認識
南泰浩; フランクマルタン; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1993

エルゴディック雑音 HMM と音韻 HMM の合成による雑音重畳音声の認識
マルタン・フランク; 鹿野清宏; 南泰浩
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1993

番号案内を対象とした自由発声の認識の試み
鹿野清宏; 南泰浩; 山田智一
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1993

音韻認識における HMM のラベルなし評価法
南泰浩; 松岡達雄; 鹿野清宏
Oral presentation, Japanese, 連続音声認識シンポジウムSPREC
1992

フレーム間相関を用いた音韻 HMM
高橋敏; 南泰浩; 松岡達雄; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
1992

不特定話者連続音声データベースによる連結学習 HMM の評価
南泰浩; 松岡達雄; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
1992

番号案内を対象とした大語彙連続音声認識アルゴリズム
南泰浩; 山田智一; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
1992

Recognition of Noisy Speech by Composition of Hidden Markov Models
マルタン・フランク; 鹿野清宏; 南泰浩; 岡部洋一
Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
1992

フレーム間相関を用いた連続型音韻 HMM
高橋敏; 南泰浩; 松岡達雄; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1992

音響学会連続音声データベースによる各種不特定話者 HMM の評価
南泰浩; 高橋敏; 松岡達雄; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1992

不特定話者連続音声データベースを用いた HMM の連結学習
南泰浩; 松岡達雄; 鹿野清宏
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1992

Recognition of Noisy Speech by Using the Composition of Hidden Markov Models
マルタン・フランク; 鹿野清宏; 南泰浩; 岡部洋一
Invited oral presentation, Japanese, 日本音響学会秋季講演論文集
1992

セパレートベクトル量子化を用いた HMM 音声認識の耐雑音性に対する検討
片岡淳; 南泰浩; 中川正雄
Invited oral presentation, Japanese, 電子情報通信学会春季全国大会講演論文集
1992

TDNN 音韻スポッティングと予測 LR パーザを用いた大語彙単語音声認識
南泰浩; 沢井秀文; 宮武正典; 鹿野清宏
Oral presentation, Japanese, 電子情報通信学会技術研究報告 SP
1990

有声音部の定常性を考慮したフレームレート選択型 PARCOR ボコーダ
佐藤正俊; 南泰浩; 水井潔; 中川正雄
Invited oral presentation, Japanese, 電子情報通信学会春季全国大会講演論文集
1990

トリグラムモデルを用いた連続単語音声認識における自動単語分類
神田直之; 南泰浩; 中川正雄
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1990

セパレートベクトル量子化を用いた HMM 音声認識の耐雑音性に関する検討
片岡淳; 南泰浩; 中川正雄
Oral presentation, Japanese, 第１２回情報理論とその応用シンポジューム
1989

可変ビットレート ADPCM・PARCOR 混成音楽符号化方式
岩元直久; 南泰浩; 中川正雄
Oral presentation, Japanese, 第１２回情報理論とその応用シンポジューム
1989

ベクトル量子化を用いる可変フレームレート PARCOR ボコーダ
佐藤正俊; 南泰浩; 水井潔; 中川正雄
Oral presentation, Japanese, 第１２回情報理論とその応用シンポジューム
1989

HMM 連続音声認識の高速化
南泰浩; 中川正雄
Invited oral presentation, Japanese, 日本音響学会春季講演論文集
1988

構文解析とプロダクションシステムを付加した連続音声認識
南泰浩; 中川正雄
Invited oral presentation, Japanese, 電子情報通信学会春季全国大会講演論文集
1988

ADF とプロダクションシステムによる異常信号の検出・除去
南泰浩; 中川正雄
Invited oral presentation, Japanese, 電子情報通信学会秋季全国大会講演論文集
1988

ADF とプロダクションシステムによる異常信号の検出・除去
南泰浩; 中川正雄
Invited oral presentation, Japanese, 電子情報通信学会秋季全国大会講演論文集
Mar. 1987

複数の応答生成モデルを用いた音声雑談対話システムの構築とその対話選択方式の検討
佐藤明智; 南泰浩; 郭恩孚
人工知能学会全国大会

Courses

ヒューマンインターフェース
The University of Electro-Communications

ヒューマンインターフェース
電気通信大学

インターンシップ２海外長期
The University of Electro-Communications

インターンシップ２海外長期
電気通信大学

インターンシップ２海外
The University of Electro-Communications

インターンシップ２海外
電気通信大学

インターンシップ２（長期）
The University of Electro-Communications

インターンシップ２（長期）
The University of Electro-Communications

インターンシップ１海外長期
The University of Electro-Communications

インターンシップ１海外長期
The University of Electro-Communications

インターンシップ１（海外）
The University of Electro-Communications

インターンシップ１（海外）
電気通信大学

インターンシップ１（長期）
The University of Electro-Communications

インターンシップ１（長期）
The University of Electro-Communications

認知インタラクションデザイン学
京都工芸繊維大学

認知インタラクションデザイン学
京都工芸繊維大学

インターンシップ２
The University of Electro-Communications

インターンシップ２
電気通信大学

インターンシップ１
The University of Electro-Communications

インターンシップ１
The University of Electro-Communications

情報システム基礎学合同輪講
The University of Electro-Communications

情報システム基礎学合同輪講
電気通信大学

情報システム基礎論１
The University of Electro-Communications

Foundations of Information Systems 1
The University of Electro-Communications

情報システム基礎論１
電気通信大学

応用情報学特論第4
岐阜大学

応用情報学特論第4
岐阜大学

ネットワーク技術と高度情報化社会
大阪大学

ネットワーク技術と高度情報化社会
大阪大学

Affiliated academic society

日本音響学会

IEEE

情報処理学会

電子情報通信学会

言語処理学会

Research Themes

幼児語彙発達大規模データの収集と工学的な解析に基づく語彙発達過程の解明
南泰浩
日本学術振興会, 科学研究費助成事業基盤研究(B), 電気通信大学, 基盤研究(B), 23H00623
Apr. 2023 - Mar. 2027

大規模データ処理による網羅的データを用いた言語発達機構の解析とその応用
南泰浩
Principal investigator
01 Apr. 2017 - 31 Mar. 2020

人とロボットの共生による協創社会の創成「人ロボット共生学」ロボットのコミュニケーション戦略の生成
01 Oct. 2009 - 31 Mar. 2013

Industrial Property Rights

語彙発達指標推定装置、語彙発達指標推定方法、プログラム
Patent right, 南泰浩, 小林哲生, 特許7213509, Date registered: 19 Jan. 2023

能力推定装置，語選択装置，これらの方法及びプログラム
Patent right, 南泰浩, 森山佑亮, 小林哲生, 特願2017-138791, Date applied: 18 Jul. 2017, 特許6850218, Date issued: 31 Mar. 2021

幼児単語探索装置とその方法とプログラム
Patent right, 2012119556, Date applied: 2012, 5806642, Date issued: 11 Sep. 2015

語彙学習速度予測パラメータ生成装置と語彙学習速度予測装置とそれらの方法とプログラム
Patent right, 2012119555, Date applied: 2012, 5785905, Date issued: 31 Jul. 2015

難易度学習装置、難易度推定モデル学習装置、難易度推定装置、方法、及びプログラム
Patent right, 特願2015-031004, Date applied: 19 Feb. 2015

難易度推定モデル学習装置、難易度推定装置、方法、及びプログラム
Patent right, 特願2015-031000, Date applied: 19 Feb. 2015

難易度推定式学習装置、難易度推定装置、方法、及びプログラム
Patent right, 特願2015-030997, Date applied: 19 Feb. 2015

単語提示装置、計算装置、これらの方法及びプログラム
Patent right, 特願2014-256876, Date applied: 19 Dec. 2014

単語提示装置、方法及びプログラム
Patent right, 特願2014-255495, Date applied: 17 Dec. 2014

発話候補作成装置とその方法とプログラム
Patent right, 2013035865, Date applied: 2013

幼児語彙理解難易度評価装置と幼児語彙検索装置と幼児語彙分類装置と，それらの方法とプログラム
Patent right, 2013024274, Date applied: 2013

報酬関数推定装置，報酬関数推定方法，およびプログラム
Patent right, 2012096453, Date applied: 2012

理解語月齢テーブル生成装置，対象年齢推定装置，方法，及びプログラム
Patent right, 2012128334, Date applied: 2012

語彙学習関数推定装置，語彙学習関数推定方法及びそのプログラム
Patent right, 2012192939, Date applied: 2012

語彙学習関数推定装置，語彙学習関数推定方法及びそのプログラム
Patent right, 2012192938, Date applied: 2012

特徴検出装置，特徴検出方法及びそのプログラム
Patent right, 2012192937, Date applied: 2012

語彙学習曲線パラメータ推定装置，方法，及びプログラム
Patent right, 2012029951, Date applied: 2012

語彙学習曲線パラメータ推定装置，方法，及びプログラム
Patent right, 2012029950, Date applied: 2012

対話学習装置，要約装置，対話学習方法，要約方法，プログラム
Patent right, 2010179330, Date applied: 2011, 5346327

語彙学習速度推定装置，方法，及びプログラム
Patent right, 2012029949, Date applied: 2011

語彙学習曲線パラメータ推定装置，方法，及びプログラム
Patent right, 2012029950, Date applied: 2011

コミュニケーションエージェントの動作制御装置，コミュニケーションエージェントの動作制御方法，及びそのプログラム
Patent right, 2011139777, Date applied: 2011

対話モデル構築装置
Patent right, 2011110989, Date applied: 2011

文脈依存性推定装置，発話クラスタリング装置，方法，及びプログラム
Patent right, 2011184054, Date applied: 2011

行動タイミング決定装置，行動タイミング決定方法，およびそのプログラム
Patent right, 2011035826, Date applied: 2011

語彙爆発時期推定装置，方法，及びプログラム
Patent right, 2011066456, Date applied: 2011

語彙爆発時期推定装置，方法，及びプログラム
Patent right, 2011060851, Date applied: 2011

対話評価装置，方法及びプログラム
Patent right, 2011110989, Date applied: 2011

行動制御装置，行動制御方法及び行動制御プログラム
Patent right, 2011050493, Date applied: 2011

行動タイミング決定方法，およびそのプログラム
Patent right, 2010203895, Date applied: 2010, 5361832

行動制御装置，行動制御方法及び行動制御プログラム
Patent right, 2010272627, Date applied: 2010, 5427163

要約装置，要約作成方法及びプログラム
Patent right, 2010271397, Date applied: 2010

対話学習装置，対話分析装置，対話学習方法，対話分析方法
Patent right, 2010126882, Date applied: 2010

対話型映像認識理解における動的学習戦略に関する試み
Patent right, 2011017057, Date applied: 2010

多人数思考喚起型対話装置，多人数思考喚起型対話方法，多人数思考喚起型対話プログラム並びにそのプログラムを記録したコンピュータ読み取り可能な記録媒体
Patent right, 2010186237, Date applied: 2010

対話からの性格特徴判定装置
Patent right, 2009215267, Date applied: 2009, 5281527

聞き役対話識別装置
Patent right, 2009192875, Date applied: 2009, 5150583

多人数思考喚起型対話装置，多人数思考喚起型対話方法，多人数思考喚起型対話プログラム並びにそのプログラムを記録したコンピュータ読み取り可能な記録媒体
Patent right, 2009028605, Date applied: 2009, 5218514

行動制御学習方法，行動制御学習装置，行動制御学習プログラム
Patent right, 2009199376, Date applied: 2009, 5361615

音声信号モデル化方法，信号認識装置及び方法，パラメータ学習装置及び方法，特徴量生成装置及び方法並びにプログラム
Patent right, 200949901, Date applied: 2009

能力推定装置、方法及びプログラム
Patent right, 特願2020-193982

語選択装置、方法及びプログラム
Patent right, 特願2020-193983

語彙発達指標推定装置、語彙発達指標推定方法、プログラム
Patent right, 特願2019-006697