Search Details | The University of Electro-Communications

南 泰浩 (Yasuhiro Minami)

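Department of Computer and Network Engineering	Professor
Cluster I (Informatics)	Professor
Artificial Intelligence eXploration Research Center	Professor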

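Profile:
Graduated from the Department of Electrical Engineering, Faculty of Science and Technology, Keio University, in 1986, and completed the doctoral program at the same university's graduate school in 1991. Joined NTT the same year. Visiting researcher at MIT, 1999-2000. Currently a professor at the University of Electro-Communications. Dr. Eng. Engaged in research on speech recognition, spoken dialogue processing, and intelligent information processing. Received the Acoustical Society of Japan Awaya Kiyoshi Science Promotion Award in 1993, the IEICE Best Paper Award in 2003, the Telecom System Technology Award in 2005, the Outstanding Paper Award for the IPSJ 45th Anniversary Commemorative Papers in 2006, and the Telecom System Technology Award again in 2008. Member of IEEE, the Acoustical Society of Japan, IEICE, and IPSJ.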

Researcher Information

Degree

Doctor of Engineering, Keio University

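Research Fields

Informatics, Learning support systems

Informatics, Intelligent robotics

Informatics, Intelligent informatics

Informatics, Human interfaces and interaction

Humanities and social sciences, Cognitive science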

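Career

2014-07-01
Graduate School of Information Systems, The University of Electro-Communications

2013-07-01 - 2014-06-30
Nippon Telegraph and Telephone Corporation, Communication Science Laboratories, Group Leader, Communication Environment Research Group

2002-03-01 - 2013-06-30
Nippon Telegraph and Telephone Corporation, Communication Science Laboratories, Senior Research Scientist

1996-03-01 - 2002-02-28
Nippon Telegraph and Telephone Corporation, Human Interface Laboratories, Senior Research Scientist

1991-04-01 - 1996-02-28
Nippon Telegraph and Telephone Corporation, Human Interface Laboratories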

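Education

1988-04-01 - 1991-03-01
Keio University, Graduate School of Science and Technology, Electrical Engineering

1986-04-01 - 1988-03-01
Keio University, Graduate School of Science and Technology, Electrical Engineering

1982-04-01 - 1986-03-01
Keio University, Faculty of Science and Technology, Department of Electrical Engineering

1981-03-31
Tokyo Metropolitan Mita High School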

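Committee Memberships

2015-04-01
IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)

2013-08
Secretary, Infant Language Development Research Group, Academic society

2011 - 2013
Councilor, Kansai Chapter, Acoustical Society of Japan

2013
Committee member, Kansai Branch, Information Processing Society of Japan, Academic society

2009 - 2012
Secretary, Kansai Branch, Information Processing Society of Japan

2011
Delegate, Acoustical Society of Japan, Academic society

2011
Councilor, Acoustical Society of Japan, Academic society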

Research Activities

Awards

Award date: 2023-12
14th Dialogue System Symposium
Rissociation
Excellence Award, 6th Dialogue System Live Competition, 松浦直樹; 大沼飛宇多; 中山朝陽; 佐藤明智; 南泰浩

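Award date: 2023-03
The Association for Natural Language Processing
Excellence Award, 29th Annual Meeting of the Association for Natural Language Processing, 藤田守太; 南泰浩

Award date: 2022-03
The Association for Natural Language Processing
An Analysis of Naming in the Process of Building Common Ground in Dialogue
Committee Special Award, 28th Annual Meeting of the Association for Natural Language Processing, 齋藤結; 光田航; 東中竜一; 南泰浩
Award from a domestic academic society, conference, symposium, etc.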

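Award date: 2020-03
The Association for Natural Language Processing
Realizing a High-Performance English Solver for the National Center Test
Excellence Award, 26th Annual Meeting of the Association for Natural Language Processing
Award from a domestic academic society, conference, symposium, etc.

Award date: 2018-03
The Association for Natural Language Processing
Modeling Infant Vocabulary Acquisition with DRQN
Young Researcher Award

Award date: 2017-03
The Association for Natural Language Processing
The "Can a Robot Get into the University of Tokyo?" Project: Error Analysis on the Yozemi National Center Mock Exam Task
Best Paper Award of the Association for Natural Language Processing, 松崎拓也; 横野光; 宮尾祐介; 川添愛; 狩野芳伸; 加納隼人; 佐藤理史; 東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩; 新井紀子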

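Award date: 2014
Japanese Society for Artificial Intelligence (JSAI) 2014 Annual Conference Excellence Award (awarded only to the first author, Meguro)

Award date: 2013
Excellence Award, 18th Annual Meeting of the Association for Natural Language Processing

Award date: 2012
Japan Society of Baby Science Excellent Poster Presentation Award

Award date: 2012
NTT Intellectual Property Center Director's Award

Award date: 2011
JSAI 2011 Annual Conference Excellence Award (awarded only to the first author, Dohsaka)

Award date: 2010
JSAI 2010 Annual Conference Excellence Award (awarded only to the first author, Dohsaka)

Award date: 2008
Telecom System Technology Award, 南泰浩

Award date: 2008
COLING Best Paper Finalist

Award date: 2007
NTT Technical Review Special Feature Paper Award

Award date: 2007
NTT Communication Science Laboratories Director's Award, Special Prize

Award date: 2006
Outstanding Paper Award, IPSJ 45th Anniversary Commemorative Papers "Toward Information Science and Technology 50 Years from Now", 南泰浩

Award date: 2005
Telecom System Technology Award, 南泰浩

Award date: 2005
NTT Advanced Technology Research Laboratories Director's Award (R&D Award)

Award date: 2005
NTT Communication Science Laboratories Director's Award

Award date: 2004
IEICE Best Paper Award, 南泰浩

Award date: 1996
NTT Research and Development Headquarters Director's Award

Award date: 1993
Acoustical Society of Japan Awaya Kiyoshi Science Promotion Award, 南泰浩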

Papers

CIF-RNNT: Streaming ASR Via Acoustic Word Embeddings with Continuous Integrate-and-Fire and RNN-Transducers
Wen Shen Teo; Yasuhiro Minami
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, published 2024-04-14
Research paper (international conference proceedings), English

Prediction of BPSD Using Environmental and Vital Sensor Data
Hyuta Onuma; Naoya Tokiwa; Junichi Shibata; Toshikazu Suzuki; Takehiko Kashiwagi; Tatsuya Moe; Kaito Kamura; Tatsunoshin Shinmi; Shunichi Tano; Yasuhiro Minami
2024 IEEE First International Conference on Artificial Intelligence for Medicine, Health and Care (AIMHC), IEEE, published 2024-02-05
Research paper (international conference proceedings), English

Model of infant vocabulary acquisition through mental state modeling and reinforcement learning
Shuta Fujita; Yasuhiro Minami
AMLaP, p. 77, published 2023-08

Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers
Keita Kobayashi; Kohei Koyama; Hiromi Narimatsu; Yasuhiro Minami
13th Edition of the Language Resources and Evaluation Conference (LREC 2022), to appear, published 2022-06-14, peer-reviewed
Research paper (international conference proceedings), English

Probabilistic model using HDP producing vocabularies of Japanese children
Yasuhiro Minami; Tessei Kobayashi
First author, Conference on Interdisciplinary Advances in Statistical Learning, p. 96, published 2022-06, peer-reviewed

Characteristic of language development mechanism of children in the residential care institution
Yuka Sakamoto; Yuko Okumura; Yasuhiro Minami; Ryoko Mugitani; Kayoko Ito; Tessei Kobayashi
International Congress of Psychology, ICP2020, published 2021-07-19, peer-reviewed
Research paper (international conference proceedings), English

Predicting New Words for Young Japanese Children using Large-scaled Japanese Child Vocabulary Development Database
Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi; Yuka Sakamoto
International Congress of Psychology, ICP2020, published 2021-07-19, peer-reviewed
Research paper (international conference proceedings), English

Using mobile phone data to estimate the relationship between population flow and influenza infection pathways
Qiushi Chen; Michiko Tsubaki; Yasuhiro Minami; Kazutoshi Fujibayashi; Tetsuro Yumoto; Junzo Kamei; Yuka Yamada; Hidenori Kominato; Hideki Oono; Toshio Naito
MDPI, International Journal of Environmental Research and Public Health, Vol. 18, No. 14, published 2021, peer-reviewed, international journal, This study aimed to analyze population flow using global positioning system (GPS) location data and evaluate influenza infection pathways by determining the relationship between population flow and the number of drugs sold at pharmacies. Neural collective graphical models (NCGMs; Iwata and Shimizu 2019) were applied for 25 cell areas, each measuring 10 km × 10 km, in Osaka, Kyoto, Nara, and Hyogo prefectures to estimate population flow. An NCGM uses a neural network to address spatiotemporal dependency and reduce the number of estimated parameters. The prescription peaks between several cells with high population flow showed a high correlation with a delay of one to two days or with a seven-day time lag. It was observed that little population flowed from one cell to the outside area on weekdays. This observation may have been due to geographical features and undeveloped transportation networks. The number of prescriptions for anti-influenza drugs in that cell remained low during the observation period. The present results indicate that influenza did not spread to areas with undeveloped traffic networks, and the peak number of drug prescriptions arrived with a time lag of several days in areas with a high amount of area-to-area movement due to commuting.
Research paper (academic journal), English

Task Definition and Integration For Scientific-Document Writing Support
H. Narimatsu; K. Koyama; K. Dohsaka; R. Higashinaka; Y. Minami; H. Taira
Online: Association for Computational Linguistics, to appear, pp. 18-26, published 2021, peer-reviewed
Research paper (international conference proceedings), English

Properties of early vocabulary development in Japanese-English bilingual children
Yuka Sakamoto; Yuko Okumura; Tessei Kobayashi; Yasuhiro Minami
BCCCD 2020 (Budapest CEU Conference on Cognitive Development), PB-038, published 2020-01-20, peer-reviewed
Research paper (international conference proceedings), English

Vocabulary Size As Explanatory Variable for Japanese-Speaking Children’s Vocabulary Development
Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi
ICPS, to appear, published 2019-03-07, peer-reviewed
Research paper (international conference proceedings), English

Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
Yasuhiro Minami; Tessei Kobayashi; Yuko Okumura
LREC 2018, P26, published 2018-05-10, peer-reviewed
Research paper (international conference proceedings), English

Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development
Yan Cao; Yasuhiro Minami; Yuko Okumura; Tessei Kobayashi
LREC 2018, P55, published 2018-05-10, peer-reviewed
Research paper (international conference proceedings), English

Acquisition of infant-directed speech words in Japanese-speaking children: Analysis using large-scale vocabulary-checklist data
Yuko Okumura; Tessei Kobayashi; Yasuhiro Minami; Yusuke Moriyama
Interdisciplinary Advances in Statistical Learning, to appear, published 2017-06-28, peer-reviewed
Research paper (international conference proceedings), English

Word acquisition correlation in Japanese-speaking children using large-scale infant vocabulary development database
Yasuhiro Minami; Yusuke Moriyama; Tessei Kobayashi; Yuko Okumura
Interdisciplinary Advances in Statistical Learning, to appear, published 2017-06-28, peer-reviewed
Research paper (international conference proceedings), English

Acquisition of mental state language in Japanese-speaking children: Analysis using large-scale vocabulary-checklist data
Yuko Okumura; Tessei Kobayashi; Yasuhiro Minami
WILD, to appear, published 2017-06-14, peer-reviewed
Research paper (international conference proceedings), English

Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion
Toru Nakashika; Yasuhiro Minami
EURASIP Journal on Audio, Speech, and Music Processing, Springer International Publishing AG, pp. 1-10, published 2017-06, peer-reviewed, In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data-pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the data used for the training is limited to the pre-defined sentences, (2) the trained model is only applied to the speaker pair used in the training, and (3) a mismatch in alignment may occur. Although it is generally preferable in VC to not use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize the non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed the conventional non-parallel approach regarding objective and subjective criteria.
Research paper (academic journal), English

Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
Toru Nakashika; Tetsuya Takiguchi; Yasuhiro Minami
IEEE Transactions on Audio, Speech and Language Processing, Vol. 24, No. 11, p. 2045, published 2016-10, peer-reviewed
Research paper (academic journal), English

Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
Toru Nakashika; Yasuhiro Minami
Proceedings of the 17th Conference of the International Speech Communication Association (Interspeech 2016), pp. 1487-1491, published 2016-09, peer-reviewed
Research paper (international conference proceedings), English

3WRBM-Based Speech Factor Modeling for Arbitrary-Source and Non-Parallel Voice Conversion
Toru Nakashika; Yasuhiro Minami
Interspeech 2016, pp. 1487-1491, published 2016-09, peer-reviewed
Research paper (international conference proceedings), English

Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
Toru Nakashika; Tetsuya Takiguchi; Yasuhiro Minami
IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 23, No. 3, pp. 1-14, published 2016-08, peer-reviewed
Research paper (academic journal), English

3WRBM-Based Speech Factor Modeling for Arbitrary-Source and Non-Parallel Voice Conversion
Toru Nakashika; Yasuhiro Minami
EUSIPCO, pp. 607-611, published 2016-08, peer-reviewed
Research paper (international conference proceedings), English

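The "Can a Robot Get into the University of Tokyo?" Project: Error Analysis on the Yozemi National Center Mock Exam Task
松崎拓也; 横野光; 宮尾祐介; 川添愛; 狩野芳伸; 加納隼人; 佐藤理史; 東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩; 新井紀子
Journal of Natural Language Processing, Vol. 23, No. 1, published 2016-01, peer-reviewed
Research paper (academic journal), Japanese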

SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION
Toru Nakashika; Yasuhiro Minami
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings, IEEE, pp. 5530-5534, published 2016, peer-reviewed, In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only speaker-specific information in source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data-pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems; 1) the data used for the training is limited to the pre-defined sentences, 2) the trained model is only applied to the speaker pair used in the training, and 3) mismatch in alignment may happen. Although it is, thus, fairly preferable in VC not to use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize the non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by mixing the two. Our experimental results showed that our approach unfortunately fell short of the popular conventional GMM-based method that used parallel data, but outperformed the conventional non-parallel approach.
Research paper (international conference proceedings), English

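A Method for Estimating the Target Age of Texts for Young Children
藤田早苗; 小林哲生; 南泰浩; 杉山弘晃
Cognitive Studies, Vol. 22, No. 4, pp. 1-17, published 2015-12, peer-reviewed
Research paper (academic journal), Japanese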

Fluctuating Development of Common Nouns and Predicates in Early Lexical Development: Evidence from Analysis of Large-Sample Vocabulary Checklist Data in Japanese Children
Tessei Kobayashi; Yasuhiro Minami; Yuko Okumura
ECDP, to appear, published 2015-09-08, peer-reviewed
Research paper (international conference proceedings), English

Taking the English exam for the "can a robot get into the University of Tokyo?" project
Ryuichiro Higashinaka; Hiroaki Sugiyama; Hideki Isozaki; Genichiro Kikui; Kohji Dohsaka; Hirotoshi Taira; Yasuhiro Minami
NTT Technical Review, Vol. 13, No. 7, published 2015-07-01, NTT and its research partners are participating in the "Can a robot get into the University of Tokyo?" project run by the National Institute of Informatics, which involves tackling English exams. The artificial intelligence system we developed took a mock test in 2014 and achieved a better-than-human-average score for the first time. This was a notable achievement since English exams require English knowledge and also common sense knowledge that humans take for granted but that computers do not necessarily possess. In this article, we describe how our artificial intelligence system takes on English exams.
Research paper (academic journal)

Gender variability of child word-comprehension and -production days
Yasuhiro Minami; Tessei Kobayashi
WILD, volume TBD, published 2015-06-10, peer-reviewed
Research paper (international conference proceedings), English

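Generating Responses to User Utterances on Arbitrary Topics Using Dependency Relations and Examples
杉山弘晃; 目黒豊美; 東中竜一郎; 南泰浩
Transactions of the Japanese Society for Artificial Intelligence, Japanese Society for Artificial Intelligence, Vol. 30, No. 1, pp. 183-194, published 2015-01, peer-reviewed
Research paper (academic journal), Japanese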

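Methods for Answering English Questions in "Can a Robot Get into the University of Tokyo?"
東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
NTT Technical Journal, The Telecommunications Association, Vol. 27, No. 4, pp. 63-66, published 2015
Research paper (bulletin of a university or research institution), Japanese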

Effects of Conversational Agents on Activation of Communication in Thought-Evoking Multi-Party Dialogues
Kohji Dohsaka; Ryota Asai; Ryuichiro Higashinaka; Yasuhiro Minami; Eisaku Maeda
IEICE Transactions on Information and Systems, The Institute of Electronics, Information and Communication Engineers, Vol. E97-D, No. 8, pp. 2147-2156, published 2014-08, peer-reviewed, This paper presents an experimental study that analyzes how conversational agents activate human communication in thought-evoking multi-party dialogues between multi-users and multi-agents. A thought-evoking dialogue is a kind of interaction in which agents act to provoke user thinking, and it has the potential to activate multi-party interactions. This paper focuses on quiz-style multi-party dialogues between two users and two agents as an example of thought-evoking multi-party dialogues. The experimental results revealed that the presence of a peer agent significantly improved user satisfaction and increased the number of user utterances in quiz-style multi-party dialogues. We also found that agents' empathic expressions significantly improved user satisfaction, improved user ratings of the peer agent, and increased the number of user utterances. Our findings should be useful for activating multi-party communications in various applications such as pedagogical agents and community facilitators.
Research paper (academic journal), English

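Correlation between Word Length and the Timing and Duration of Vocabulary Acquisition in Young Children
南泰浩; 小林哲生
Phonetic Society of Japan, Vol. 17, No. 3, pp. 44-53, published 2014-03, peer-reviewed
Research paper (academic journal), Japanese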

Large-scale collection and analysis of personal question-answer pairs for conversational agents
Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, Vol. 8637, pp. 420-433, published 2014, peer-reviewed, In conversation, a speaker sometimes asks questions that relate to another speaker's detailed personality, such as his/her favorite foods and sports. This behavior also appears in conversations with conversational agents; therefore, agents should be developed that can respond to such questions. In previous agents, this was achieved by creating question-answer pairs defined by hand. However, when a small number of persons create the pairs, we cannot know what types of questions are frequently asked. This makes it difficult to know whether the created questions cover frequently asked questions; therefore, such essential question-answer pairs for conversational agents are possibly overlooked. This study analyzes a large number of question-answer pairs for six personae created by many question-generators, with one answer-generator for each persona. The proposed approach allows many questioners to create questions for various personae, enabling us to investigate the types of questions that are frequently asked. A comparison with questions appearing in conversations between humans shows that 50.2% of the questions were contained in our question-answer pairs and the coverage rate was almost saturated with the 20 recruited question-generators.
Research paper (international conference proceedings), English

OPEN-DOMAIN UTTERANCE GENERATION USING PHRASE PAIRS BASED ON DEPENDENCY RELATIONS
Hiroaki Sugiyama; Toyomi Meguro; Ryuichiro Higashinaka; Yasuhiro Minami
2014 IEEE Workshop on Spoken Language Technology (SLT 2014), IEEE, PM2.201, pp. 60-65, published 2014, peer-reviewed, The development of open-domain conversational systems remains difficult since user utterances vary too widely for such systems to respond appropriately. To address this issue, previous research has retrieved sentences from the web as system utterances by shallow sentence matching with user utterances. However, since the retrieved sentences include the inherent contexts of the document in which the sentences originally appeared, the retrieved sentences have the possibility of containing information that is irrelevant to user utterances. We propose combining two strongly related semantic units (phrase pairs with dependency relations) to create a system utterance. Here, the first semantic unit is the one found in the user utterance and the second semantic unit is the one that has a dependency relation with the first one in a large text corpus. This way, we can guarantee that the generated utterance is related to the input user utterance. Our experiments, which examine the appropriateness of response sentences, show that our proposed method significantly outperforms other retrieval and rule-based approaches.
Research paper (international conference proceedings), English

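A Vocabulary Search System Adapted to Young Children's Development
南泰浩; 小林哲生
IEICE Transactions (Japanese edition), The Institute of Electronics, Information and Communication Engineers, Vol. J96-D, No. 10, pp. 2612-2624, published 2013-10, peer-reviewed, In this paper, we attempted to build a vocabulary search system to support basic research on child development and the creation of content for young children. Using large-scale cross-sectional data collected and analyzed with the vocabulary checklist method, the system lets users look up, easily and with high accuracy, when young children learning Japanese acquire which words, and to what extent, in terms of both comprehension and production.
Research paper (academic journal), Japanese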

Open-Domain Utterance Generation for Conversational Dialogue Systems Using Web-Scale Dependency Structures
H. Sugiyama; T. Meguro; R. Higashinaka; Y. Minami
SIGDIAL, pp. 22-24, published 2013-08, peer-reviewed
Research paper (international conference proceedings), English

Vocabulary Spurt and Noun Acquisition: Evidence from Longitudinal Data in Japanese-Speaking Children
T. Kobayashi; Y. Minami; H. Sugiyama
CLS, Poster, published 2013-06, peer-reviewed
Research paper (international conference proceedings), English

Influence of Predominance in Noun Learning Examined by Period from Comprehending to Producing Words: A Cross-Linguistic Statistical Investigation Using CDI
Y. Minami; T. Kobayashi
WILD, Poster, published 2013-06, peer-reviewed
Research paper (international conference proceedings), English

Cross-Linguistic Universality of Word Acquisition Ages in Comprehension and Production
Y. Minami; T. Kobayashi
WILD, Poster, published 2013-06, peer-reviewed
Research paper (international conference proceedings), English

Individual Variation of Word Acquisition Age: A Comparison of Japanese- and English-Speaking Infants
H. Sugiyama; T. Kobayashi; Y. Minami
WILD, Poster, published 2013-06, peer-reviewed
Research paper (international conference proceedings), English

Word-Class Composition in First 20 Words Predicts Later Word Acquisition Rate
T. Kobayashi; Y. Minami; H. Sugiyama
SRCD Biennial Meeting, 3-044 (157), published 2013-04, peer-reviewed
Research paper (international conference proceedings), English

Further Verification of "A New Perspective on the Vocabulary Spurt"
小林哲生; 南泰浩; 杉山弘晃
Baby Science, Vol. 12, pp. 55-58, published 2013-03, peer-reviewed
Research paper (academic journal), Japanese

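A New Perspective on the Vocabulary Spurt: Longitudinal Data Analysis of Early Vocabulary Development in Japanese-Learning Children
小林哲生; 南泰浩; 杉山弘晃
Baby Science, Vol. 12, pp. 34-49, published 2013-03, peer-reviewed
Research paper (academic journal), Japanese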

Learning to control listening-oriented dialogue using partially observable markov decision processes
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
ACM Transactions on Speech and Language Processing, Vol. 10, No. 4, pp. 761-769, published 2013, peer-reviewed, Our aim is to build listening agents that attentively listen to their users and satisfy their desire to speak and have themselves heard. This article investigates how to automatically create a dialogue control component of such a listening agent. We collected a large number of listening-oriented dialogues with their user satisfaction ratings and used them to create a dialogue control component that satisfies users by means of Partially Observable Markov Decision Processes (POMDPs). Using a hybrid dialog controller where high-level dialog acts are chosen with a statistical policy and low-level slot values are populated by a wizard, we evaluated our dialogue control method in a Wizard-of-Oz experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This article is the first to verify, by using human users, the usefulness of POMDP-based dialogue control for improving user satisfaction in non-task-oriented dialogue systems.
Research paper (academic journal), English

Differences between Noun and Verb Learning Periods from Comprehension to Production in Early Language Development
Y. Minami; T. Kobayashi
BCCCD, 掲載ページ 174-174, 出版日 2013年01月, 査読付
研究論文（国際会議プロシーディングス）, 英語

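Analysis of Listening-Oriented Dialogue and Construction of a Dialogue Controller Based on the Analysis
目黒豊美; 南泰浩; 東中竜一郎; 堂坂浩二
IPSJ Journal, Vol. 53, No. 12, pp. 2787-2801, published 2012-12, peer-reviewed
Research paper (academic journal), Japanese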

Vocabulary Spurt and Word-Class Composition: Further Evidence for a Model of Plateaus and Linearity in Early Vocabulary Growth
T. Kobayashi; Y. Minami; H. Sugiyama
AMLaP, Poster, published 2012-09, peer-reviewed
Research paper (international conference proceedings), English

Plateaus and Linearity of Early Vocabulary Growth
Y. Minami; T. Kobayashi; H. Sugiyama
ISSBD, P3.73, published 2012-07, peer-reviewed
Research paper (international conference proceedings), English

Prediction of Vocabulary Growth Using Local Linearity
H. Sugiyama; Y. Minami; T. Kobayashi
ISSBD, P3.67, published 2012-07, peer-reviewed
Research paper (international conference proceedings), English

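Analysis of Listening-Oriented Dialogue and Construction of a Dialogue Controller Based on the Analysis
目黒豊美; 東中竜一郎; 堂坂浩二; 南泰浩
IPSJ Journal, Vol. 52, No. 11, published 2012, peer-reviewed
Research paper (academic journal), Japanese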

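Estimating Users' Latent Information Demands for Systems That Lead Information-Providing Dialogues
杉山弘晃; 南泰浩
IEICE Transactions A (Japanese edition), The Institute of Electronics, Information and Communication Engineers, Vol. 95-A, No. 1, pp. 79-84, published 2012-01, peer-reviewed, In this study, we propose a new policy for deciding when a system should present information to users, based on estimating the users' latent information demands. With this policy, the system suppresses premature information presentation and can present information proactively without making users feel annoyed. To examine the contribution of multimodal information in this policy, we first conducted human-human interaction experiments and analyzed how the accuracy with which people estimate information demands changes when the available modalities change. The analysis showed that people rely on the flow of the dialogue when multimodal information is unavailable and use multimodal information when it is available. Based on this result, we propose a model that realizes human-like information-demand estimation and demonstrate its effectiveness through word-association quiz dialogue experiments designed to bring users' latent information demands to the surface.
Research paper (academic journal), Japanese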

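POMDP Dialogue Control Based on Action Prediction Probabilities from Dialogue-Act-Type Trigrams
南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 森啓; 前田英作
IEICE Transactions A (Japanese edition), The Institute of Electronics, Information and Communication Engineers, Vol. 95-A, No. 1, pp. 2-15, published 2012-01, peer-reviewed, We have so far modeled dialogue control with POMDPs for non-task-oriented dialogue. POMDP-based dialogue control generates dialogue sequences that earn many rewards in the short term, but it is not necessarily suited to generating relatively long, natural dialogue flows. We therefore introduced into the POMDP a new reward that realizes a trade-off between the reward defined in the POMDP and a reward for selecting actions with high predicted probability. In this paper, we use trigram probabilities of dialogue-act-type sequences as these action prediction probabilities and incorporate them into POMDP-based dialogue control. The proposed method thus realizes dialogue control through a trade-off between the POMDP-defined reward and a reward based on trigram action prediction probabilities. It enables dialogue control that considers both objectives simultaneously, which conventional trigram-based dialogue control could not achieve. The proposed method also retains the robustness to recognition errors that is characteristic of POMDPs. In this paper, we formulate the proposed method, train the model on actual dialogue-act-type sequence data, and evaluate it through simulation experiments. To simulate recognition errors in this evaluation, we implemented dialogue-act-type recognition, which converts dialogue sentences into dialogue-act-type sequences, and used the recognition tendencies it produced. The experiments confirmed the effectiveness of the proposed method and showed that it is also more robust to dialogue-act-type recognition errors than dialogue control based on trigram probabilities alone.
Research paper (academic journal), Japanese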

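Estimation of Difficulty/Ease and Interest/Boredom from Users' Nonverbal Behavior in Dialogues with an Anthropomorphic Agent
中村和晃; 角所考; 正司哲朗; 美濃導彦; 澤木美奈子; 南泰浩; 前田英作
IEICE Transactions A (Japanese edition), The Institute of Electronics, Information and Communication Engineers, Vol. 95-A, No. 1, pp. 85-96, published 2012-01, peer-reviewed, This study aims to estimate, for spoken dialogue between a user and an anthropomorphic agent, whether the user found the dialogue content difficult ("difficult/easy") and whether the user was interested in it ("interest/boredom"), from the user's nonverbal behavior (gaze, facial expression, posture, and hand gestures). In spoken dialogue, a certain amount of time generally passes before the content of the dialogue is determined and conveyed, so mental states such as difficulty/ease toward that content are also defined over a time interval. Within such an interval, however, situational changes such as speaker/listener turn changes and shifts in dialogue context occur frequently, and the relationship between the momentary mental state and nonverbal behavior varies with the role each situation plays in the dialogue as a whole. It is therefore difficult to estimate difficulty/ease and interest/boredom, which are defined per time interval, using the nonverbal behavior at each instant as features. This study instead proposes estimating difficulty/ease and interest/boredom using quantities defined per time interval rather than per instant (specifically, the occurrence frequencies of various nonverbal behaviors) as features. Experiments conducted to verify the effectiveness of the proposed method yielded an estimation accuracy of about 72%.
Research paper (academic journal), Japanese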

Preference-learning based Inverse Reinforcement Learning for Dialog Control
Hiroaki Sugiyama; Toyomi Meguro; Yasuhiro Minami
13th Annual Conference of the International Speech Communication Association 2012 (INTERSPEECH 2012), Vols. 1-3, International Speech Communication Association (ISCA), Mon.P1d.03, pp. 222-225, published 2012, peer-reviewed, Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has the open problem that it requires a reward function that is difficult to set appropriately. To set the appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL) that estimates a reward function from dialog sequences and their pairwise preferences, which are calculated from annotated ratings of the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This indicates that current IRL supposes that the sequences are equally appropriate for a given task; thus, it cannot utilize the ratings. In contrast, our PIRL can utilize pairwise preferences of the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competing algorithms that have been widely used to realize dialog control. Our experiments show that our PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator of dialog control.
Research paper (international conference proceedings), English

Multiple Vocabulary Spurts in Japanese Children
Y. Minami; H. Sugiyama; T. Kobayashi
IASCL, Poster, published 2011-07, peer-reviewed
Research paper (international conference proceedings), English

Analysis of Vocabulary Spurt from Prediction Performance Evaluation
H. Sugiyama; T. Kobayashi; Y. Minami
SRCD, Poster, published 2011-03, peer-reviewed
Research paper (international conference proceedings), English

Dialogue Control by POMDP Using Dialogue Data Statistics
Yasuhiro Minami; Akira Mori; Toyomi Meguro; Ryuichiro Higashinaka; Kohji Dohsaka; Eisaku Maeda
Spoken Dialogue Systems Technology and Design, Springer, pp. 163-186, published 2011, peer-reviewed
Research paper (academic journal), English

Information Provision-timing Control for Informational Assistance Robot
Hiroaki Sugiyama; Yasuhiro Minami
Proceedings of the 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2011), IEEE, pp. 259-260, published 2011, peer-reviewed, This paper proposes an HMM-based user information-demand estimation model for autonomous informational assistance robots to avoid providing information prematurely. The model estimates the user's implicit information demands by predicting the user's next information request from the user's head movements. Through a word-association quiz-dialog experiment, our model demonstrated superior prediction performance over the usual HMM-based classifier.
Research paper (international conference proceedings), English

Unsupervised Clustering of Utterances using Non-parametric Bayesian Methods
Ryuichiro Higashinaka; Noriaki Kawamae; Kugatsu Sadamitsu; Yasuhiro Minami; Toyomi Meguro; Kohji Dohsaka; Hirohito Inagaki
12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols. 1-5, International Speech Communication Association (ISCA), pp. 2092-2095, published 2011, peer-reviewed, Unsupervised clustering of utterances can be useful for the modeling of dialogue acts for dialogue applications. Previously, the Chinese restaurant process (CRP), a non-parametric Bayesian method, has been introduced and has shown promising results for the clustering of utterances in dialogue. This paper newly introduces the infinite HMM, which is also a non-parametric Bayesian method, and verifies its effectiveness. Experimental results in two dialogue domains show that the infinite HMM, which takes into account the sequence of utterances in its clustering process, significantly outperforms the CRP. Although the infinite HMM outperformed other methods, we also found that clustering complex dialogue data, such as human-human conversations, is still hard when compared to human-machine dialogues.
Research paper (international conference proceedings), English

Evaluation of Listening-oriented Dialogue Control Rules based on the Analysis of HMMs
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols. 1-5, International Speech Communication Association (ISCA), pp. 816 ff., published 2011, peer-reviewed, We have been working on listening-oriented dialogues for the purpose of building listening agents. In our previous work [1], we trained hidden Markov models (HMMs) from listening-oriented dialogues (LoDs) between humans, and by analyzing them, discovered a distinguishing dialogue flow of LoD. For example, listeners suppress their information giving and self-disclosure, and instead, increase acknowledgments and questions to elicit speakers' utterances. As an initial step for building listening agents, we decided to create dialogue control rules based on our analysis of the HMMs. We built our rule-based system and compared it with three other systems by a Wizard of Oz (WoZ) experiment. As a result, we found that our rule-based system achieved as much user satisfaction as human listeners.
Research paper (international conference proceedings), English

Building a conversational model from two-tweets
Ryuichiro Higashinaka; Noriaki Kawamae; Kugatsu Sadamitsu; Yasuhiro Minami; Toyomi Meguro; Kohji Dohsaka; Hirohito Inagaki
2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings, 掲載ページ 330-335, 出版日 2011年, 査読付, The current problem in building a conversational model from Twitter data is the scarcity of long conversations. According to our statistics, more than 90% of conversations in Twitter are composed of just two tweets. Previous work has utilized only conversations lasting longer than three tweets for dialogue modeling so that more than a single interaction can be successfully modeled. This paper verifies, by experiment, that two-tweet exchanges alone can lead to conversational models that are comparable to those made from longer-tweet conversations. This finding leverages the value of Twitter as a dialogue corpus and opens the possibility of better conversational modeling using Twitter data. © 2011 IEEE.
研究論文（国際会議プロシーディングス）, 英語

Wizard of Oz evaluation of listening-oriented dialogue control using POMDP
Toyomi Meguro; Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka
2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings, 掲載ページ 318-323, 出版日 2011年, 査読付, We have been working on dialogue control for listening agents. In our previous study [1], we proposed a dialogue control method that maximizes user satisfaction using partially observable Markov decision processes (POMDPs) and evaluated it by a dialogue simulation. We found that it significantly outperforms other stochastic dialogue control methods. However, this result does not necessarily mean that our method works as well in real dialogues with human users. Therefore, in this paper, we evaluate our dialogue control method by a Wizard of Oz (WoZ) experiment. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This paper is the first to show the usefulness of POMDP-based dialogue control using human users when the target function is to maximize user satisfaction. © 2011 IEEE.
研究論文（国際会議プロシーディングス）, 英語
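The POMDP-based controller evaluated above keeps a belief, which is a probability distribution over hidden dialogue states, and revises it after every system action and user observation. As a minimal sketch, assuming small discrete state and observation sets and made-up toy probabilities (none of the numbers come from the paper), the textbook belief update b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s) can be written in Python as follows. It does not model the paper's satisfaction-based rewards or its policy optimization.

```python
import numpy as np


def belief_update(belief, trans, obs_prob, action, obs):
    """Standard discrete POMDP belief update.

    belief:       shape (S,), current P(s)
    trans[a]:     shape (S, S), trans[a][s, s2] = P(s2 | s, a)
    obs_prob[a]:  shape (S, O), obs_prob[a][s2, o] = P(o | s2, a)
    Returns the posterior belief over the next states s2.
    """
    predicted = belief @ trans[action]               # sum_s P(s2 | s, a) * b(s)
    unnormalized = obs_prob[action][:, obs] * predicted
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total


# Hypothetical toy model: 2 hidden user states, 1 system action, 2 observations.
trans = np.array([[[0.9, 0.1],
                   [0.2, 0.8]]])
obs_prob = np.array([[[0.7, 0.3],
                      [0.1, 0.9]]])
b = belief_update(np.array([0.5, 0.5]), trans, obs_prob, action=0, obs=0)
print(b)  # approx [0.8953, 0.1047]
```

In a full controller the updated belief would be fed to a policy that picks the next system action; that part is outside this sketch.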

環境知能を実現する統計的対話処理の研究 (特集 20 周年を迎えたコミュニケーション科学)
南泰浩; 目黒豊美
NTT技術ジャーナル, 電気通信協会, 23巻, 9号, 掲載ページ 10-13, 出版日 2011年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

Statistical Dialogue Processing for Ambient Intelligence
Y. Minami; T. Meguro
NTT Technical Review, 9巻, 11号, 出版日 2011年, 査読付
研究論文（大学，研究機関等紀要）, 英語

User-Adaptive Coordination of Agent Communicative Behavior in Spoken Dialogue
K. Dohsaka; A. Kanemoto; R. Higashinaka; Y. Minami; E. Maeda
Sigdial, 人工知能学会, 24巻, 掲載ページ 314-321, 出版日 2010年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Modeling User Satisfaction Transitions in Dialogues from Overall Ratings
R. Higashinaka; Y. Minami; K. Dohsaka; T. Meguro
Sigdial, 掲載ページ 18-27, 出版日 2010年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Learning to Model Domain-Specific Utterance Sequences for Extractive Summarization of Contact Center Dialogues
R. Higashinaka; Y. Minami; H. Nishikawa; K. Dohsaka; T. Meguro; S. Takahashi; G. Kikui
COLING, 掲載ページ 400-408, 出版日 2010年08月, 査読付
研究論文（国際会議プロシーディングス）, 英語

FAST SIMILARITY SEARCH ON A LARGE SPEECH DATA SET WITH NEIGHBORHOOD GRAPH INDEXING
Kazuo Aoyama; Shinji Watanabe; Hiroshi Sawada; Yasuhiro Minami; Naonori Ueda; Kazumi Saito
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, IEEE, 掲載ページ 5358-5361, 出版日 2010年, 査読付, This paper presents a novel graph-based approach for solving a problem of fast finding a speech model acoustically similar to a query model from a large set of speech models. Each speech model in the set is represented by a Gaussian mixture model and dissimilarity from a GMM to another is measured with a Kullback-Leibler divergence (KLD). Conventional pruning techniques based on the triangle inequality for fast similarity search are not available because the model space with a KLD is not a metric space. We propose a search method that is characterized by an index of a degree-reduced nearest neighbor (DRNN) graph. The search method can efficiently find the most similar (closest) GMM to a query, exploring the DRNN graph with a best-first manner. Experimental evaluations on utterance GMM search tasks reveal a significantly low computational cost of the proposed method.
研究論文（国際会議プロシーディングス）, 英語
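The abstract's key observation is that the Kullback-Leibler divergence is not a metric, so triangle-inequality pruning cannot be used. A small Python check with single 1-D Gaussians (a simplification: the paper compares GMMs, for which the KLD has no closed form) shows both the asymmetry and a triangle-inequality violation. The DRNN graph index and the best-first search themselves are not sketched here.

```python
import math


def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for 1-D Gaussians p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


# Asymmetry: KL(p || q) != KL(q || p)
print(kl_gauss(0.0, 1.0, 0.0, 4.0))  # approx 0.318
print(kl_gauss(0.0, 4.0, 0.0, 1.0))  # approx 0.807

# The triangle inequality fails for p = N(0,1), r = N(1,1), q = N(2,1):
direct = kl_gauss(0.0, 1.0, 2.0, 1.0)                                 # 2.0
via_r = kl_gauss(0.0, 1.0, 1.0, 1.0) + kl_gauss(1.0, 1.0, 2.0, 1.0)   # 0.5 + 0.5
print(direct > via_r)  # True
```

Because a detour through an intermediate model can look "shorter" than the direct comparison, bounds that metric-space indexes rely on do not hold, which is why a graph-based index is attractive here.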

Issues in Predicting User Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and Prediction Models
Ryuichiro Higashinaka; Yasuhiro Minami; Kohji Dohsaka; Toyomi Meguro
SPOKEN DIALOGUE SYSTEMS FOR AMBIENT ENVIRONMENTS, SPRINGER-VERLAG BERLIN, 6392巻, 掲載ページ 48-60, 出版日 2010年, 査読付, This paper addresses three important issues in automatic prediction of user satisfaction transitions in dialogues. The first issue concerns the individual differences in user satisfaction ratings and how they affect the possibility of creating a user-independent prediction model. The second issue concerns how to determine appropriate evaluation criteria for predicting user satisfaction transitions. The third issue concerns how to train suitable prediction models. We present our findings for these issues on the basis of the experimental results using dialogue data in two domains.
研究論文（国際会議プロシーディングス）, 英語

Improving HMM-based extractive summarization for multi-domain contact center dialogues
Ryuichiro Higashinaka; Yasuhiro Minami; Hitoshi Nishikawa; Kohji Dohsaka; Toyomi Meguro; Satoshi Kobashikawa; Hirokazu Masataki; Osamu Yoshioka; Satoshi Takahashi; Genichiro Kikui
2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, 掲載ページ 61-66, 出版日 2010年, 査読付, This paper reports the improvements we made to our previously proposed hidden Markov model (HMM) based summarization method for multi-domain contact center dialogues. Since the method relied on Viterbi decoding for selecting utterances to include in a summary, it had the inability to control compression rates. We enhance our method by using the forward-backward algorithm together with integer linear programming (ILP) to enable the control of compression rates, realizing summaries that contain as many domain-related utterances and as many important words as possible within a predefined character length. Using call transcripts as input, we verify the effectiveness of our enhancement. ©2010 IEEE.
研究論文（国際会議プロシーディングス）, 英語
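The enhanced method chooses which utterances to include so that the summary fits a predefined character length. A minimal sketch of just that budgeted selection step, assuming each utterance already has a relevance score (for example, a forward-backward posterior) and an integer character length: the 0/1 choice that an ILP would encode is solved here exactly by dynamic programming instead of an ILP solver, and the paper's additional important-word coverage term is omitted. The scores and lengths are hypothetical.

```python
def select_utterances(scores, lengths, budget):
    """Choose utterance indices that maximize total score with total length <= budget.

    Exact 0/1 knapsack via dynamic programming; lengths must be positive integers.
    """
    # best[c] = (best total score, chosen indices) using at most c characters
    best = [(0.0, ())] * (budget + 1)
    for i, (score, length) in enumerate(zip(scores, lengths)):
        # Iterate capacities downward so each utterance is used at most once.
        for c in range(budget, length - 1, -1):
            candidate = best[c - length][0] + score
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - length][1] + (i,))
    return list(best[budget][1])


scores = [0.9, 0.5, 0.6, 0.2]   # hypothetical per-utterance relevance
lengths = [30, 20, 25, 10]      # hypothetical character counts
print(select_utterances(scores, lengths, budget=50))  # [0, 1]
```

Dynamic programming is exact for a single length constraint; an ILP formulation becomes preferable once extra terms such as word coverage are added to the objective.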

Trigram dialogue control using POMDPs
Yasuhiro Minami; Ryuichiro Higashinaka; Kohji Dohsaka; Toyomi Meguro; Eisaku Maeda
2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, 掲載ページ 336-341, 出版日 2010年, 査読付, This paper proposes hybrid dialogue control of both trigram and POMDP dialogue controls by extending our proposed method that uses two approaches: automatically acquiring POMDP structures and rewards for target dialogues through Dynamic Bayesian Networks (DBNs) with a large amount of dialogue data and reflecting action predictive probabilities into the POMDP structures. In this extension, we modify the action predictive probabilities to treat trigram dialogue controls. Experimental results show that the proposed method can treat a trigram dialogue control with robustness for erroneous conditions and can simultaneously maximize trigram probability and the dialogue evaluations obtained from users. ©2010 IEEE.
研究論文（国際会議プロシーディングス）, 英語

Effects of Personality Traits on Listening-Oriented Dialogue
T. Meguro; R. Higashinaka; K. Dohsaka; Y. Minami; H. Isozaki
IWSDS, 掲載ページ 104-107, 出版日 2009年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Dialogue Control Algorithm for Ambient Intelligence Based on Partially Observable Markov Decision Processes
Y. Minami; A. Mori; T. Meguro; R. Higashinaka; K. Dohsaka; E. Maeda
IWSDS, 掲載ページ 254-263, 出版日 2009年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Transdisciplinary Approach for Constructing Ambient Intelligence Environments
E. Maeda; Y. Minami; K. Dohsaka; A. Mori
Ami, 掲載ページ 9-12, 出版日 2009年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Effects of Conversational Agents on Human Communication in Thought-Evoking Multi-Party Dialogues
K. Dohsaka; R. Asai; R. Higashinaka; Y. Minami; E. Maeda
Sigdial, 掲載ページ 219-224, 出版日 2009年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Analysis of Listening-Oriented Dialogue for Building Listening Agents
T. Meguro; R. Higashinaka; K. Dohsaka; Y. Minami; H. Isozaki
Sigdial, 掲載ページ 124-127, 出版日 2009年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Switching acausal filters for speech modeling
Yasuhiro Minami; Hirokazu Kameoka
Machine Learning for Signal Processing XIX - Proceedings of the 2009 IEEE Signal Processing Society Workshop, MLSP 2009, 掲載ページ 1-6, 出版日 2009年, 査読付, This paper shows a unified model of dynamical systems in speech processing that includes speech recognition and pitch modeling. For this purpose, we propose the use of switching acausal filters (SAFs), which exchange multiple acausal filters. These filters are defined by identical linear dynamical systems that exchange the roles of observation value and system input. This paper describes the formulation of recognition, training, and feature generation methods for SAFs, which can be applied to several previously proposed speech models. As an example, we show that an HMM with dynamic features and our F0 control method can be modeled by the proposed formulation. An HMM synthesis method can also be modeled using the formulations. From these results, we demonstrate the unification capability of SAFs. © 2009 IEEE.
研究論文（国際会議プロシーディングス）, 英語

まっしゅるーむの世界 : 環境知能の実現
南泰浩; 堂坂浩二; 澤木美奈子; 森啓; 前田英作
ヒューマンインタフェース学会誌 = Journal of Human Interface Society : human interface, ヒューマンインタフェース学会, 10巻, 2号, 掲載ページ 109-114, 出版日 2008年05月, 査読付
研究論文（学術雑誌）

"WHO IS THIS" QUIZ DIALOGUE SYSTEM AND USERS' EVALUATION
M. Sawaki; Y. Minami; R. Higashinaka; K. Dohsaka; E. Maeda
2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, IEEE, 掲載ページ 149-152, 出版日 2008年, 査読付, In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's action on users. This paper describes "Who is this" quiz dialogue system and its users' evaluation. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed-toy (or CG equivalent). Quizzes are automatically generated from Wikipedia articles, rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of person network and the relationship of user attribute and interest level.
研究論文（国際会議プロシーディングス）, 英語

Quizmaster Mushrooms: “Who Is This” Quiz Dialogue System
M. Sawaki; Y. Minami; R. Higashinaka; K. Dohsaka; T. Yamada; T. Matsubayashi; H. Isozaki; E. Maeda
ICMI demo-session, demo-session巻, 出版日 2007年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
Takaaki Hori; Chiori Hori; Yasuhiro Minami; Atsushi Nakamura
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 15巻, 4号, 掲載ページ 1352-1365, 出版日 2007年05月, 査読付, This paper proposes a novel one-pass search algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous-speech recognition. In the standard search method with on-the-fly composition, two or more WFSTs are composed during decoding, and a Viterbi search is performed based on the composed search space. With this new method, a Viterbi search is performed based on the first of the two WFSTs. The second WFST is only used to rescore the hypotheses generated during the search. Since this rescoring is very efficient, the total amount of computation required by the new method is almost the same as when using only the first WFST. In a 65k-word vocabulary spontaneous lecture speech transcription task, our proposed method significantly outperformed the standard search method. Furthermore, our method was faster than decoding with a single fully composed and optimized WFST while using only 38% of the memory required for decoding with the single WFST. Finally, we have achieved high-accuracy one-pass real-time speech recognition with an extremely large vocabulary of 1.8 million words.
研究論文（学術雑誌）, 英語

The World of Mushrooms: Human-Computer Interaction Prototype Systems for Ambient Intelligence
Yasuhiro Minami; Minako Sawaki; Kohji Dohsaka; Ryuichiro Higashinaka; Kentaro Ishizuka; Hideki Isozaki; Tatsushi Matsubayashi; Masato Miyoshi; Atsushi Nakamura; Takanobu Oba; Hiroshi Sawada; Takeshi Yamada; Eisaku Maeda
ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, ASSOC COMPUTING MACHINERY, 掲載ページ 366-373, 出版日 2007年, 査読付, Our new research project called "ambient intelligence" concentrates on the creation of new lifestyles through research on communication science and intelligence integration. It is premised on the creation of such virtual communication partners as fairies and goblins that can be constantly at our side. We call these virtual communication partners mushrooms.
To show the essence of ambient intelligence, we developed two multimodal prototype systems: mushrooms that watch, listen, and answer questions, and a Quizmaster Mushroom. These two systems work in real time using speech, sound, dialogue, and vision technologies.
We performed preliminary experiments with the Quizmaster Mushroom. The results showed that the system can transmit knowledge to users while they are playing the quizzes.
Furthermore, through the two Mushrooms, we found policies for design effects in multimodal interface and integration.
研究論文（国際会議プロシーディングス）, 英語

Mixture Gaussian HMM-trajectory method using likelihood compensation
Yasuhiro Minami
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, IEEE, 掲載ページ 296-299, 出版日 2007年, 査読付, We propose a new speech recognition method (HMM-trajectory method) that generates a speech trajectory from HMMs by maximizing their likelihood while accounting for the relationship between the MFCCs and dynamic MFCCs. One major advantage of this method is that this relationship, ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper improves the recognition performance of the HMM-trajectory method for dealing with mixture Gaussian distributions. While the HMM-trajectory method chooses the Gaussian distribution sequence of the HMM states by selecting the best Gaussian distribution in the state during Viterbi decoding and calculating HMM trajectory likelihood along with the sequence, the proposed method compensates for HMM trajectory likelihood using ordinary HMM likelihood. In speaker-independent speech recognition experiments, the proposed method reduced the error rate by about 10% for the task compared with HMMs, proving its effectiveness for Gaussian mixture components.
研究論文（国際会議プロシーディングス）, 英語

コミュニケーション環境の未来に向けた研究最前線まっしゅるーむの世界-知能統合の実現に向けて
南泰浩; 前田英作; 堂坂浩二; 近藤公久; 森啓
NTT技術ジャーナル, 電気通信協会, 19巻, 6号, 掲載ページ 19-21, 出版日 2007年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

まっしゅるーむの世界――知能統合の実現に向けて
南泰浩; 前田英作; 堂坂浩二; 近藤公久; 森啓
NTT技術ジャーナル, 19巻, 6号, 掲載ページ 19-22, 出版日 2007年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

Dynamic assignment of Gaussian components in modelling speech spectra
Parham Zolfaghari; Hiroko Kato; Yasuhiro Minami; Atsushi Nakamura; Shigeru Katagiri; Roy Patterson
JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, SPRINGER, 45巻, 1-2号, 掲載ページ 7-19, 出版日 2006年11月, 査読付, In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract where Gaussian distributions are used to model spectral frequency regions. A mixture of Gaussians (MoG) based parametrisation scheme is used for modelling a smoothed representation of the spectra. This smoothing procedure removes all signal periodicity from the spectra, allowing highly natural analysis, manipulation and synthesis of speech. The goal of this parametrisation scheme is to ease the correspondence between the resonant characteristics of the vocal tract and the parametric distributions, and to model the spectrum with an appropriate number of parameters. Previously, a maximum likelihood (ML) approach to this parametrisation scheme was introduced. However, this approach has inherent local optima problems. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we propose a new scheme whereby, starting with a large number of distributions in the mixture, we systematically reduce their number and re-approximate the densities in the mixture based on a distance criterion. The Kullback-Leibler (KL) distance was found to allow optimal MoG solutions to the spectra. Furthermore, a fitness measure based on KL information is used to provide a figure for estimating the model order in representing formant-like features. The proposed model is subjectively evaluated and is shown to reduce the number of Gaussians with an appreciable loss in the quality of the re-synthesised speech.
研究論文（学術雑誌）, 英語
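The reduction scheme starts with many Gaussians and repeatedly merges them under a KL-based criterion. A minimal sketch, assuming 1-D weighted Gaussians (for example over frequency) and using moment-matched pairwise merging with Runnalls' upper bound on the KL divergence as the merge cost. This is a well-known reduction rule swapped in for illustration, not necessarily the distance criterion used in the paper, and the component values are hypothetical.

```python
import math
from itertools import combinations


def merge(c1, c2):
    """Moment-matched merge of two weighted 1-D Gaussians given as (weight, mean, var)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def merge_cost(c1, c2):
    """Runnalls' upper bound on the KL divergence introduced by merging c1 and c2."""
    w, _, v = merge(c1, c2)
    return 0.5 * (w * math.log(v) - c1[0] * math.log(c1[2]) - c2[0] * math.log(c2[2]))


def reduce_mixture(components, target):
    """Greedily merge the cheapest pair until only `target` components remain."""
    if target < 1:
        raise ValueError("target must be at least 1")
    comps = list(components)
    while len(comps) > target:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda ij: merge_cost(comps[ij[0]], comps[ij[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


mixture = [(0.25, 0.0, 1.0), (0.25, 0.1, 1.0), (0.5, 10.0, 1.0)]  # hypothetical
print(reduce_mixture(mixture, target=2))  # the two nearby components are merged
```

Moment matching keeps the total weight and the overall mean of the mixture unchanged, so each merge only trades shape detail for fewer parameters.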

妖精・妖怪の復権: 新しい「環境知能」像の提案
前田英作; 南泰浩; 堂坂浩二
情報処理, 47巻, 6号, 掲載ページ 624-640, 出版日 2006年06月, 査読付
研究論文（学術雑誌）, 日本語

Speech feature extraction method using subband-based periodicity and nonperiodicity decomposition
Kentaro Ishizuka; Tomohiro Nakatani; Yasuhiro Minami; Noboru Miyazaki
Journal of the Acoustical Society of America, 120巻, 1号, 掲載ページ 443-452, 出版日 2006年, 査読付, This paper proposes a speech feature extraction method that utilizes periodicity and nonperiodicity for robust automatic speech recognition. The method was motivated by the auditory comb filtering hypothesis proposed in speech perception research. The method divides input signals into subband signals, which it then decomposes into their periodic and nonperiodic components using comb filters independently designed in each subband. Both features are used as feature parameters. This representation exploits the robustness of periodicity measurements as regards noise while preserving the overall speech information content. In addition, periodicity is estimated independently in each subband, providing robustness as regards noise spectrum bias. The framework is similar to that of a previous study [Jackson et al., Proc. of Eurospeech. (2003), pp. 2321-2324], which is based on cascade processing motivated by speech production. However, the proposed method differs in its design philosophy, which is based on parallel distributed processing motivated by speech perception. Continuous digit speech recognition experiments in the presence of noise confirmed that the proposed method performs better than conventional methods when the noise in the training and test data sets differs. © 2006 Acoustical Society of America.
研究論文（学術雑誌）, 英語
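The method splits each subband signal into periodic and nonperiodic components with a comb filter. A minimal single-band sketch, assuming the period is already known as an integer number of samples, uses the simple comb filter (x[n] + x[n - T]) / 2 and treats the residual as the nonperiodic part. The paper's subband filterbank and per-subband comb-filter design are not reproduced here.

```python
import numpy as np


def comb_decompose(x, period):
    """Split x into periodic and nonperiodic parts with a two-tap FIR comb filter.

    periodic[n]    = (x[n] + x[n - period]) / 2, with x[n - period] = 0 for n < period
    nonperiodic[n] = x[n] - periodic[n]
    """
    if period < 1:
        raise ValueError("period must be a positive number of samples")
    x = np.asarray(x, dtype=float)
    delayed = np.zeros_like(x)
    delayed[period:] = x[:-period]
    periodic = 0.5 * (x + delayed)
    return periodic, x - periodic


n = np.arange(64)
tone = np.sin(2 * np.pi * n / 8)                      # exactly periodic, period 8 samples
noise = np.random.default_rng(0).standard_normal(64)  # no periodic structure
p_tone, a_tone = comb_decompose(tone, 8)
p_noise, a_noise = comb_decompose(noise, 8)
# After the first period the tone is almost entirely "periodic", while the
# noise leaves a large nonperiodic residual.
print(np.abs(a_tone[8:]).max(), np.abs(a_noise[8:]).max())
```

Keeping both outputs as features, as the abstract describes, means noise that lands in the nonperiodic stream does not have to corrupt the periodic one.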

「環境知能シンポジウム2006－知性の森が織りなす未来」開催報告
堂坂浩二; 南泰浩; 森啓; 近藤公久
NTT技術ジャーナル, 電気通信協会, 18巻, 12号, 掲載ページ 72-76, 出版日 2006年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

「環境知能」プロジェクトの進展
南泰浩; 堂坂浩二; 森啓; 前田英作
NTT技術ジャーナル, 電気通信協会, 18巻, 9号, 掲載ページ 60-64, 出版日 2006年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

Report on “Ambient Intelligence Symposium 2006 - the Future: A Tapestry Woven from Threads of Intelligence”
K. Dohsaka; Y. Minami; A. Mori; T. Kondo
NTT Technical Review, 4巻, 12号, 掲載ページ 64-69, 出版日 2006年, 査読付
研究論文（大学，研究機関等紀要）, 英語

Step Towards Ambient Intelligence
E. Maeda; Y. Minami
NTT Technical Review, 4巻, 1号, 掲載ページ 50-55, 出版日 2006年, 査読付
研究論文（大学，研究機関等紀要）, 英語

The World of Mushrooms: A Transdisciplinary Approach to Human-Computer Interaction with Ambient Intelligence
E. Maeda; Y. Minami; M. Miyoshi; M. Sawaki; H. Sawada; A. Nakamura; J. Yamato; T. Yamada; R. Higashinaka
NTT Technical Review, 4巻, 12号, 掲載ページ 17-25, 出版日 2006年, 査読付
研究論文（大学，研究機関等紀要）, 英語

Selection of shared-state hidden Markov model structure using Bayesian criterion
S Watanabe; Y Minami; A Nakamura; N Ueda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E88D巻, 1号, 掲載ページ 1-9, 出版日 2005年01月, 査読付, A Shared-State Hidden Markov Model (SS-HMM) has been widely used as an acoustic model in speech recognition. In this paper, we propose a method for constructing SS-HMMs within a practical Bayesian framework. Our method derives the Bayesian model selection criterion for the SS-HMM based on the variational Bayesian approach. The appropriate phonetic decision tree structure of the SS-HMM is found by using the Bayesian criterion. Unlike the conventional asymptotic criteria, this criterion is applicable even in the case of an insufficient amount of training data. The experimental results on isolated word recognition demonstrate that the proposed method does not require the tuning parameter that must be tuned according to the amount of training data, and is useful for selecting the appropriate SS-HMM structure for practical use.
研究論文（学術雑誌）, 英語

「環境知能」の実現に向けて
前田英作; 南泰浩
NTT技術ジャーナル, 電気通信協会, 17巻, 11号, 掲載ページ 52-55, 出版日 2005年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

Fast on-the-Fly Composition for Weighted Finite-State Transducers in 1.8 Million-Word Vocabulary Continuous Speech Recognition
T. Hori; C. Hori; Y. Minami
ICSLP, I巻, 掲載ページ 289-292, 出版日 2004年10月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Improvement in Robustness of Speech Feature Extraction Method Using Sub-Band Based Periodicity and Aperiodicity Decomposition
K. Ishizuka; N. Miyazaki; T. Nakatani; Y. Minami
ICSLP, 掲載ページ 937-940, 出版日 2004年10月, 査読付
研究論文（国際会議プロシーディングス）, 英語

A Theoretical Analysis of Speech Recognition Based on Feature Trajectory Models
Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
ICSLP, I巻, 掲載ページ 549-552, 出版日 2004年10月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Variational Bayesian estimation and clustering for speech recognition
S Watanabe; Y Minami; A Nakamura; N Ueda
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 12巻, 4号, 掲載ページ 365-381, 出版日 2004年07月, 査読付, In this paper, we propose variational Bayesian estimation and clustering for speech recognition (VBEC), which is based on the variational Bayesian (VB) approach. VBEC is a total Bayesian framework: all speech recognition procedures (acoustic modeling and speech classification) are based on VB posterior distribution, unlike the maximum likelihood (ML) approach based on ML parameters. The total Bayesian framework generates two major Bayesian advantages over the ML approach for the mitigation of over-training effects, as it can select an appropriate model structure without any data set size condition, and can classify categories robustly using a predictive posterior distribution. By using these advantages, VBEC: 1) allows the automatic construction of acoustic models along two separate dimensions, namely, clustering triphone hidden Markov model states and determining the number of Gaussians and 2) enables robust speech classification, based on Bayesian predictive classification using VB posterior distributions. The capabilities of the VBEC functions were confirmed in large vocabulary continuous speech recognition experiments for read and spontaneous speech tasks. The experiments confirmed that VBEC automatically constructed accurate acoustic models and robustly classified speech, i.e., totally mitigated the over-training effects with high word accuracies due to the VBEC functions.
研究論文（学術雑誌）, 英語

Recognition Method with Parametric Trajectory Synthesized Using HMMs
Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
SWIM, 掲載ページ 776-786, 出版日 2004年01月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Model selection for mixture of Gaussian based spectral modelling
P Zolfaghari; H Kato; Y Minami; A Nakamura; S Katagiri
MACHINE LEARNING FOR SIGNAL PROCESSING XIV, IEEE, 掲載ページ 325-334, 出版日 2004年, 査読付, In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract. We propose a mixtures of Gaussians (MoG) spectral modelling scheme which enables model selection with a goal of easing the correspondence between the resonant characteristics of the vocal tract and the parametric Gaussians and representing a spectrum with an appropriate number of parameters. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we systematically reduce the number of Gaussians and re-approximate the densities in the MoG spectral model. The Kullback-Leibler (KL) distance between the densities in the mixture was found to allow optimal ML-MoG solutions to the spectra. A fitness measure based on KL information provides a figure for estimating the model order in representing formant-like features. The mixture model was fitted to a normalised smooth spectrum obtained by filtering the short-time Fourier transform in time and frequency by a pitch adaptive Gaussian filter. This results in the removal of all source information from the spectra. By subjectively evaluating the quality of the analysed and synthesised speech using this parametrisation scheme, we show considerable improvement over ML using this Gaussian reduction scheme, specifically when using a lower number of Gaussians in the mixture.
研究論文（国際会議プロシーディングス）, 英語

Speech Summarization Using Weighted Finite-State Transducers
T. Hori; C. Hori; Y. Minami
Eurospeech, 掲載ページ 2817-2820, 出版日 2003年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

ベイズ的基準を用いた状態共有型 HMM 構造の選択
渡部晋治; 南泰浩; 中村篤; 上田修功
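電子情報通信学会論文誌D, 一般社団法人電子情報通信学会, J86-DII巻, 6号, 掲載ページ 776-786, 出版日 2003年06月, 査読付, For shared-state HMMs, which are widely used as acoustic models for speech recognition, it is important to determine the state-sharing structure appropriately. Conventionally, the state-sharing structure, including the total number of states, has been selected on the basis of the maximum likelihood criterion. However, because the likelihood increases monotonically as the total number of states grows, a threshold has to be set experimentally. Model selection based on the minimum description length (MDL) criterion or the Bayesian information criterion (BIC), introduced to address this problem, is derived from asymptotic theory, so it is difficult to select an appropriate model when training data are scarce. This paper proposes a method that clusters HMM states with a Bayesian criterion that does not assume asymptotic conditions, based on the variational Bayesian method proposed as a deterministic Bayesian computation technique, and adaptively selects the state-sharing structure and the total number of states according to the training data. Speaker-independent isolated word recognition experiments demonstrated the effectiveness of the proposed method.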
研究論文（学術雑誌）, 日本語

Paraphrasing Spontaneous Speech Using Weighted Finite-State Transducers
T. Hori; D. Willett; Y. Minami
SSPR2003, 掲載ページ 219-222, 出版日 2003年04月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Bayesian Acoustic Modeling for Spontaneous Speech Recognition
S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
SSPR, 掲載ページ 47-50, 出版日 2003年04月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Language model adaptation using WFST-based speaking-style translation
T Hori; D Willett; Y Minami
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 掲載ページ 228-231, 出版日 2003年, 査読付, This paper describes a new approach to language model adaptation for speech recognition based on the statistical framework of speech translation. The main idea of this approach is to compose a weighted finite-state transducer (WFST) that translates sentence styles from in-domain to out-of-domain. It enables the integration of language models of different styles of speaking or dialects and even of different vocabularies. The WFST is built by combining in-domain and out-of-domain models through the translation, while each model and the translation itself are expressed as WFSTs. We apply this technique to building language models for spontaneous speech recognition using large written-style corpora. We conducted experiments on a 20k-word Japanese spontaneous speech recognition task. With a small in-domain corpus, a 2.9% absolute improvement in word error rate is achieved over the in-domain model.
研究論文（国際会議プロシーディングス）, 英語

Recognition method with parametric trajectory generated from mixture distribution HMMs
Y Minami; E McDermott; A Nakamura; S Katagiri
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, I巻, 掲載ページ 124-127, 出版日 2003年, 査読付, We have proposed a new speech recognition technique that generates a speech trajectory from HMMs by maximizing the likelihood of the trajectory, while accounting for the relation between the cepstrum and the dynamic cepstrum coefficients. This method has the major advantage that the relation, which is ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper describes an extension of the method for dealing with HMMs whose distributions are mixture Gaussian distributions. The method chooses the sequence of Gaussian distributions by selecting the best Gaussian distribution in the state during Viterbi decoding. Speaker-independent speech recognition experiments were carried out. The proposed method obtained an 18.2% reduction in error rate for the task, proving that the proposed method is effective even for Gaussian mixture HMMs.
研究論文（国際会議プロシーディングス）, 英語
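The trajectory that this method scores comes from the maximum-likelihood parameter generation used in HMM-based speech synthesis: once a Gaussian sequence has been chosen, the static trajectory c that maximizes the likelihood of the stacked static and delta features [c; Δc] solves the linear system (WᵀU⁻¹W) c = WᵀU⁻¹m. A minimal 1-D sketch, assuming diagonal variances, a simple delta window 0.5 (c[t+1] - c[t-1]) with edge frames clamped, and hypothetical per-frame statistics; it covers only trajectory generation, not the paper's recognition scoring or the Viterbi selection of Gaussians.

```python
import numpy as np


def delta_matrix(T):
    """Delta window 0.5 * (c[t+1] - c[t-1]); indices outside [0, T) are clamped to the edge."""
    D = np.zeros((T, T))
    for t in range(T):
        D[t, min(t + 1, T - 1)] += 0.5
        D[t, max(t - 1, 0)] -= 0.5
    return D


def generate_trajectory(mean_static, var_static, mean_delta, var_delta):
    """ML 1-D trajectory c maximizing N([c; Dc]; m, diag(u)).

    Solves (W^T U^-1 W) c = W^T U^-1 m with W = [I; D].
    """
    T = len(mean_static)
    W = np.vstack([np.eye(T), delta_matrix(T)])              # (2T, T)
    m = np.concatenate([mean_static, mean_delta])            # (2T,)
    prec = 1.0 / np.concatenate([var_static, var_delta])     # diagonal of U^-1
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * m)
    return np.linalg.solve(A, b)


# Hypothetical per-frame statistics from a chosen Gaussian sequence:
# a step in the static means, with delta means saying "no change".
ms = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
c = generate_trajectory(ms, np.ones(6), np.zeros(6), np.full(6, 0.1))
print(np.round(c, 3))  # a smoothed version of the step
```

When the delta means are consistent with the static means, the solution reproduces the static means exactly; otherwise the delta constraints pull the trajectory toward smoother shapes, which is the static/dynamic relationship the abstract says conventional recognition ignores.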

Application of variational Bayesian estimation and clustering to acoustic model adaptation
S Watanabe; Y Minami; A Nakamura; N Ueda
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 1巻, 掲載ページ 568-571, 出版日 2003年, 査読付, In this paper, we apply Variational Bayesian Estimation and Clustering for speech recognition (VBEC) to an acoustic model adaptation. VBEC can estimate parameter posteriors even when a model includes hidden variables, by using Variational Bayesian approach. In addition, VBEC can select an appropriate model structure in clustering triphone states, according to the amount of available adaptation data. Unlike a conventional Bayesian method such as Maximum A Posteriori (MAP), VBEC is useful even in the case of small amounts of data, because the amount of data per Gaussian increases due to the model structure selection, and over-training is suppressed. We conduct an off-line supervised adaptation experiment on isolated word recognition, and show the advantage of the proposed method over the conventional method, especially when dealing with small amounts of adaptation data.
研究論文（国際会議プロシーディングス）, 英語

Pervasive unsupervised adaptation for lecture speech transcription
D Willett; T Niesler; E McDermott; Y Minami; S Katagiri
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, IEEE, 2巻, 掲載ページ 292-295, 出版日 2003年, 査読付, Unsupervised adaptation has evolved as a popular approach for tuning the acoustic models of speaker-independent speech recognition systems to specific speakers, speaker groups or channel conditions while making use of only untranscribed data. This study focuses on procedures for unsupervised adaptation of other probabilistic models that are involved in state-of-the-art speech recognizers and on the joint adaptation of multiple knowledge sources. In particular, we outline and evaluate approaches for adapting both the language model and the pronunciation model (lexicon) without supervision. Initial experiments on off-line lecture speech transcription achieved small but promising word error rate improvements with each approach applied separately. The experimental results on the joint application of acoustic, language and pronunciation model adaptation indicate that the individually achievable performance improvements are additive.
研究論文（国際会議プロシーディングス）, 英語

コミュニケーションの壁を克服するための音声・音響処理技術次世代の音声認識技術
中村篤; 南泰浩; マクダーモット・エリック
NTT技術ジャーナル, 電気通信協会, 15巻, 12号, 掲載ページ 13-18, 出版日 2003年, 査読付
研究論文（大学，研究機関等紀要）, 日本語

Application of Variational Bayesian Approach to Speech Recognition
S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
NIPS, MIT Press, NIPS'02巻, 出版日 2002年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Evaluation of a Speech Recognition/Generation Method Based on HMM and STRAIGHT
T. Irino; Y. Minami; T. Nakatani; M. Tsuzaki; H. Tagawa
ICSLP, 掲載ページ 2545-2548, 出版日 2002年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Constructing Shared-State Hidden Markov Models Based on a Bayesian Approach
S. Watanabe; Y. Minami; A. Nakamura; N. Ueda
ICSLP, 4巻, 掲載ページ 2669-2672, 出版日 2002年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series
Y Minami; E McDermott; A Nakamura; S Katagiri
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, IEEE, 1巻, 掲載ページ 957-960, 出版日 2002年, 査読付, Parametric trajectory models have been proposed to exploit this time-dependency. However, parametric trajectory modeling methods are unable to take advantage of efficient HMM training and recognition methods. We have proposed a new speech recognition technique that generates a speech trajectory using an HMM-based speech synthesis method. This method generates an acoustic trajectory by maximizing the likelihood of the trajectory while taking into account the relation between the cepstrum, delta-cepstrum, and delta-delta cepstrum. In this paper, we extend our method to a general formulation including variance training procedure. Speaker independent speech recognition experiments show that the proposed method is effective for speech recognition.
研究論文（国際会議プロシーディングス）, 英語

A Recognition Method Using Synthesis-Based Scoring That Incorporates Direct Relations between Static and Dynamic Feature Vector Time Series
Y. Minami; E. McDermott; A. Nakamura; S. Katagiri
Workshop for Consistent & Reliable Acoustic Cues for Sound Analysis, vol. Poster, published September 2001, peer-reviewed
Research paper (international conference proceedings), English

Mokusei: A Telephone-Based Japanese Conversational System in the Weather Domain
M. Nakano; Y. Minami; S. Seneff; T. J. Hazen; D. S. Cyphers; J. Glass; J. Polifroni; V. Zue
Eurospeech, pp. 1331-1334, published September 2001, peer-reviewed
Research paper (international conference proceedings), English

Time and Memory Efficient Viterbi Decoding for LVCSR Using a Precompiled Search Network
D. Willett; E. McDermott; Y. Minami; S. Katagiri
Eurospeech, pp. 847-890, published September 2001, peer-reviewed
Research paper (international conference proceedings), English

From Jupiter to Mokusei: Multilingual Conversational System in the Weather Domain
V. Zue; S. Seneff; J. Polifroni; M. Nakano; Y. Minami; T. J. Hazen; J. Glass
Workshop on Multi-Lingual Speech Communication, pp. 1-6, published April 2000, peer-reviewed
Research paper (international conference proceedings), English

Mokusei: A Japanese Spoken Dialogue System in the Weather Domain
S. Seneff; J. Glass; T. J. Hazen; Y. Minami; J. Polifroni; V. Zue
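NTT R&D, The Telecommunications Association, vol. 49, no. 7, pp. 376-382, published 2000, peer-reviewed
Research paper (bulletin of university, research institution, etc.), English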

Compensation of speaker directivity in speech recognition using HMM composition
F. Giron; Y. Minami; M. Tanaka; K. Furuya
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 1, pp. 253-256, published December 1, 1998, peer-reviewed, In hands-free speech recognition the speaker should be able to move freely in front of the speech acquisition device. However, the speech signal is then subject to variations due to the continuous change of position in the acoustic space. This paper focuses on the role of speaker head rotations as compared with static situations in anechoic conditions. The effect of speaker directivity on speech recognition performance degradation is demonstrated, and a compensation method based on HMM composition is proposed to increase performance. © 1998 IEEE.

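Toward the Practical Use of Speaker Recognition Technology
松井知子; 吉岡理; 南泰浩
ITE Technical Report, The Institute of Image Information and Television Engineers, vol. 22, no. 45, pp. 43-48, published September 14, 1998, In recent years, network-based services such as Internet and telephone banking, electronic commerce, and members-only information services have begun to spread, and interest in the personal authentication technology that these services require has grown. Among such technologies, voice-based personal authentication systems are expected to be in demand particularly in environments where only the voice is available, such as telephone-based services. This article explains the speaker recognition technology needed to build voice-based personal authentication systems. It also introduces several software development kits for speaker recognition that have recently begun to attract attention, and describes speaker recognition experiments the authors conducted on telephone speech. These experiments show that whether the utterance content and the telephone handset are the same or different between training and recognition has a large effect on recognition performance.
Japanese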

An HMM adaptation method for noise and distortion by maximizing likelihood
Y Minami; S Furui
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, vol. 81, no. 8, pp. 1-9, published August 1998, peer-reviewed, This paper describes a new HMM synthesis method in which the HMM adapts to additive noise and multiplicative distortion. The conventional HMM synthesis method can only be applied to additive noise. In the method described here, the likelihood of the synthesized HMM for the adaptation speech is maximized, so that multiplicative distortion is eliminated when the method is applied. Within the framework of this method, adaptation to variations in the SN ratio, considered a problem in conventional HMM synthesis, can be formulated as part of the adaptation to multiplicative distortion. Evaluating speech recognition rates with our method confirmed that it is effective for improving the recognition rate of speech that contains additive noise and multiplicative distortion. (C) 1998 Scripta Technica.
Research paper (academic journal), English

Compensation of Speaker Directivity in Speech Recognition Using HMM Composition
F. Giron; Y. Minami; M. Tanaka; K. Furuya
ICASSP, vol. 1, pp. 12-15, published May 1998, peer-reviewed
Research paper (international conference proceedings), English

Connected Digit Recognition in Spontaneous Speech
E. Bauche; B. Gajic; Y. Minami; T. Matsuoka; S. Furui
Eurospeech, pp. 923-926, published September 1997, peer-reviewed
Research paper (international conference proceedings), English

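An HMM Adaptation Method for Noise and Distortion by Likelihood Maximization
南泰浩; 古井貞煕
Transactions of the IEICE A, vol. J80-A, no. 7, pp. 1179-1186, published July 1997, peer-reviewed
Research paper (academic journal), Japanese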

An efficient search method for large-vocabulary continuous-speech recognition
K Hanazawa; Y Minami; S Furui
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V, I E E E, COMPUTER SOC PRESS, pp. 1787-1790, published 1997, peer-reviewed, This paper proposes an efficient method for large-vocabulary continuous-speech recognition, using a compact data structure and an efficient search algorithm. We introduce a very compact data structure, the DAWG (directed acyclic word graph), as a lexicon to reduce the search space. We also propose a search algorithm that obtains the N-best hypotheses using the DAWG structure. This search algorithm is composed of two phases: "forward search" and "traceback". Forward search, which basically uses the time-synchronous Viterbi algorithm, merges candidates and stores the information about them in DAWG structures to create phoneme graphs. Traceback traces the phoneme graphs to obtain the N-best hypotheses. An evaluation of this method's performance using a speech-recognition-based telephone directory assistance system with a 4000-word vocabulary confirmed that our strategy improves speech recognition in terms of both time and recognition rate.
Research paper (international conference proceedings), English
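The DAWG lexicon mentioned above shares both common prefixes and common suffixes of words, which is why it is more compact than a plain trie. As a rough illustration only, the sketch below builds a DAWG from a sorted word list using the well-known incremental construction of Daciuk et al. This is a swapped-in standard algorithm, not necessarily the construction used in the paper, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class Node:
    """A DAWG state: labeled outgoing edges and a word-final flag."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False

    def signature(self) -> Tuple:
        # Children are already canonical when this is called, so their ids
        # identify equivalent sub-automata.
        return (self.final, tuple(sorted((c, id(n)) for c, n in self.edges.items())))


class DAWG:
    """Minimal acyclic automaton built incrementally from sorted words."""

    def __init__(self) -> None:
        self.root = Node()
        self.prev_word = ""
        self.unchecked: List[Tuple[Node, str, Node]] = []  # path of prev_word
        self.registry: Dict[Tuple, Node] = {}              # signature -> canonical node

    def insert(self, word: str) -> None:
        if self.prev_word and word <= self.prev_word:
            raise ValueError("words must be inserted in strictly increasing order")
        common = 0
        for x, y in zip(word, self.prev_word):
            if x != y:
                break
            common += 1
        self._minimize(common)
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            child = Node()
            node.edges[ch] = child
            self.unchecked.append((node, ch, child))
            node = child
        node.final = True
        self.prev_word = word

    def finish(self) -> None:
        self._minimize(0)

    def _minimize(self, down_to: int) -> None:
        # Merge the unchecked suffix of the previous word, bottom-up.
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            sig = child.signature()
            if sig in self.registry:
                parent.edges[ch] = self.registry[sig]
            else:
                self.registry[sig] = child

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def node_count(self) -> int:
        seen, stack = set(), [self.root]
        while stack:
            n = stack.pop()
            if id(n) not in seen:
                seen.add(id(n))
                stack.extend(n.edges.values())
        return len(seen)
```

For the words "tap", "taps", "top", "tops", a trie needs 8 nodes. The DAWG shares the "p"/"ps" suffixes after "ta" and "to", so it needs only 5.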

Adaptation method based on HMM composition and EM algorithm
Y Minami; S Furui
1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, IEEE, pp. 327-330, published 1996, peer-reviewed
Research paper (international conference proceedings), English

Improved extended HMM composition by incorporating power variance
Y Minami; S Furui
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, IEEE, pp. 1109-1112, published 1996, peer-reviewed, This paper describes a way of improving extended HMM composition that can precisely adapt HMMs to both noisy and distorted speech. To do this, we incorporate the variance of power into extended HMM composition, using quantization to approximate the Gaussian distribution of the 0th-order cepstrum. Consequently, a distribution of noisy speech is approximated in the linear spectral domain as a mixture of log-normal distributions.
This method is evaluated by a four-digit recognition experiment in which the number of digits is known. Two types of noise, computer room noise and car noise, are used, and noisy and distorted speech data are made by adding these types of noise to speech data recorded using a boundary microphone. Results show that the proposed method improves recognition rates for noisy and distorted speech compared with our previous method.
Research paper (international conference proceedings), English
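HMM composition of the kind described above approximates the sum of speech and noise in the linear spectral domain with log-normal distributions. As an illustration only, the sketch below shows the standard log-normal approximation used in parallel model combination (PMC) for a single log-spectral channel. It assumes that speech and noise are independent and introduces a hypothetical scalar noise gain. It does not reproduce the paper's quantization-based treatment of power variance or the cepstrum/log-spectrum transforms.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LogGaussian:
    """One channel of a diagonal Gaussian in the log-spectral domain."""
    mean: float
    var: float


def to_linear(g: LogGaussian) -> Tuple[float, float]:
    """Mean and variance of exp(X) for X ~ N(mean, var) (log-normal moments)."""
    m = math.exp(g.mean + g.var / 2.0)
    v = m * m * (math.exp(g.var) - 1.0)
    return m, v


def to_log(m: float, v: float) -> LogGaussian:
    """Moment matching back to the log domain for linear mean m and variance v."""
    var = math.log(v / (m * m) + 1.0)
    return LogGaussian(math.log(m) - var / 2.0, var)


def combine(speech: LogGaussian, noise: LogGaussian, gain: float = 1.0) -> LogGaussian:
    """Add speech and gain-scaled noise in the linear domain (independence
    assumed), then approximate the sum by a single log-normal."""
    ms, vs = to_linear(speech)
    mn, vn = to_linear(noise)
    return to_log(ms + gain * mn, vs + gain * gain * vn)


if __name__ == "__main__":
    clean = LogGaussian(mean=1.0, var=0.3)
    noise = LogGaussian(mean=0.5, var=0.2)
    print(combine(clean, noise))
```

Setting `gain=0.0` returns the clean-speech parameters unchanged, which is a convenient sanity check. Adding noise raises the log-domain mean of the channel.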

AN HMM STATE DURATION CONTROL ALGORITHM APPLIED TO LARGE-VOCABULARY SPONTANEOUS SPEECH RECOGNITION
S TAKAHASHI; Y MINAMI; K SHIKANO
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, vol. E78D, no. 6, pp. 648-653, published June 1995, peer-reviewed, Although Hidden Markov Modeling (HMM) is widely and successfully used in many speech recognition applications, duration control for HMMs is still an important issue in improving recognition accuracy, since an HMM places no constraints on duration. To compensate for this defect, some duration control algorithms that employ precise duration models have been proposed. However, they suffer from greatly increased computational complexity. This paper proposes a new state duration control algorithm for limiting both the maximum and the minimum state durations. The algorithm is for the HMM trellis likelihood calculation, not for the Viterbi calculation. The amount of computation required by this algorithm is only of order one (O(1)) in the maximum state duration n; that is, the computation amount is independent of the maximum state duration, while many conventional duration control algorithms require computation of order n or order n^2. Thus, the algorithm can drastically reduce the computation needed for duration control. The algorithm uses the property that the trellis likelihood calculation is a summation of many path likelihoods. At each frame, the likelihood of the path that exceeds the maximum duration is subtracted, and the likelihood of the path that satisfies the minimum duration is added to the forward probability. By iterating this procedure, the algorithm calculates the trellis likelihood efficiently. The algorithm was evaluated using a large-vocabulary speaker-independent spontaneous speech recognition system for telephone directory assistance. The average reduction in error rate for sentence understanding was about 7% when using context-independent HMMs, and 3% when using context-dependent HMMs.
The improvement from the proposed state duration control algorithm was confirmed even though the maximum and minimum state durations were not optimized for the task (speaker-independent duration settings obtained from a different task were used).
Research paper (academic journal), English
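The O(1) update summarized above (subtract the path that exceeds the maximum duration, add the path that just satisfies the minimum duration) can be illustrated for a single HMM state. The sketch below works in a simplified setting that the paper does not state: one state with self-loop probability `a`, per-frame output probabilities `b[t]`, and a hypothetical sequence `e[t]` of forward mass entering the state at frame t. For each frame it computes the forward mass of paths whose current duration lies in [d_min, d_max], using prefix products of `b` so that each frame costs constant work. It is checked against a brute-force sum and is not the paper's implementation.

```python
from typing import List


def constrained_exit_mass(e: List[float], b: List[float], a: float,
                          d_min: int, d_max: int) -> List[float]:
    """Per-frame forward mass of paths whose duration in the state is within
    [d_min, d_max], updated in O(1) per frame (single-state sketch)."""
    n = len(b)
    # q[t + 1] = b[0] * ... * b[t], so the product b[s..t] is q[t + 1] / q[s].
    q = [1.0]
    for bk in b:
        q.append(q[-1] * bk)

    def path_mass(s: int, t: int) -> float:
        """Mass at frame t of the single path that entered the state at frame s."""
        if s < 0 or s > t:
            return 0.0
        return e[s] * a ** (t - s) * q[t + 1] / q[s]

    g: List[float] = []
    prev = 0.0
    for t in range(n):
        # Extend last frame's admissible paths by one self-loop, add the path
        # that has just reached d_min frames, subtract the one now at d_max + 1.
        cur = a * b[t] * prev + path_mass(t - d_min + 1, t) - path_mass(t - d_max, t)
        g.append(cur)
        prev = cur
    return g


def brute_force_exit_mass(e: List[float], b: List[float], a: float,
                          d_min: int, d_max: int) -> List[float]:
    """Reference version that sums every admissible path explicitly."""
    out = []
    for t in range(len(b)):
        total = 0.0
        for s in range(t + 1):
            if d_min <= t - s + 1 <= d_max:
                prod = 1.0
                for k in range(s, t + 1):
                    prod *= b[k]
                total += e[s] * a ** (t - s) * prod
        out.append(total)
    return out
```

The per-frame cost of `constrained_exit_mass` does not grow with `d_max`, whereas the brute-force version re-sums up to `d_max` paths at every frame.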

A SPEECH DIALOGUE SYSTEM WITH MULTIMODAL INTERFACE FOR TELEPHONE DIRECTORY ASSISTANCE
O YOSHIOKA; Y MINAMI; K SHIKANO
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, vol. E78D, no. 6, pp. 616-621, published June 1995, peer-reviewed, This paper describes a multimodal dialogue system employing speech input. This system uses three input methods (through a speech recognizer, a mouse, and a keyboard) and two output methods (through a display and using sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. This system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users utter to computers. Speech input is used in a multimodal environment. Dialogue data between computers and users are also collected. Twenty telephone-number retrieval tasks are used to evaluate this system. In the experiments, all the users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0% and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.
Research paper (academic journal), English

ACOUSTIC AND LANGUAGE PROCESSING TECHNOLOGY FOR SPEECH RECOGNITION
T MATSUOKA; Y MINAMI
NTT REVIEW, NTT CORP, vol. 7, no. 2, pp. 30-39, published March 1995, peer-reviewed, This paper describes acoustic and language processing technology for automatic speech recognition. Speech recognition systems usually consist of acoustic and language processing modules. The acoustic processing extracts feature parameter vectors from the speech utterance and performs pattern recognition by comparing the vector sequence with pre-defined acoustic models. The most likely model is then chosen as the recognition result. The language processing helps recognition by narrowing down the number of candidates or by selecting the most linguistically matching hypothesis from those produced by the acoustic processing.
Research paper (academic journal), English

A MAXIMUM-LIKELIHOOD PROCEDURE FOR A UNIVERSAL ADAPTATION METHOD BASED ON HMM COMPOSITION
Y MINAMI; S FURUI
1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5, IEEE, pp. 129-132, published 1995, peer-reviewed
Research paper (international conference proceedings), English

UNIVERSAL ADAPTATION METHOD BASED ON HMM COMPOSITION
Y MINAMI; S FURUI
ICA 95 - PROCEEDINGS OF THE 15TH INTERNATIONAL CONGRESS ON ACOUSTICS, VOL III, SINTEF, pp. 105-108, published 1995, peer-reviewed
Research paper (international conference proceedings), English

LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION ALGORITHM APPLIED TO A MULTIMODAL TELEPHONE DIRECTORY ASSISTANCE SYSTEM
Y MINAMI; K SHIKANO; S TAKAHASHI; T YAMADA; O YOSHIOKA; S FURUI
SPEECH COMMUNICATION, ELSEVIER SCIENCE BV, vol. 15, no. 3-4, pp. 301-310, published December 1994, peer-reviewed, This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm.
Research paper (academic journal), English

PHONEME HMM EVALUATION ALGORITHM WITHOUT PHONEME LABELING APPLIED TO CONTINUOUS SPEECH HMM EVALUATION
Y MINAMI; T MATSUOKA; K SHIKANO
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, vol. 77, no. 11, pp. 13-21, published November 1994, peer-reviewed, Phoneme hidden Markov models (HMMs) are generally evaluated in terms of the phoneme recognition rate, using speech data extracted based on phoneme labels. This paper proposes an evaluation method that does not use phoneme labels for extraction. Consequently, phoneme HMMs can be evaluated even if a speech database without phoneme labeling is used.
In this study, concatenation training of the phoneme HMMs is executed using a large-scale speaker-independent continuous-speech database. Evaluating the HMM phoneme recognition rate as a function of the number of training speakers with the proposed evaluation method demonstrates its effectiveness.
Research paper (academic journal), English

Multimodal Telephone Directory Assistance System and Its Evaluation
Y. Minami; O. Yoshioka; K. Shikano; S. Furui
International Workshop on Human Interface Technology, pp. 7-14, published September 1994, peer-reviewed
Research paper (international conference proceedings), English

An HMM Duration Control Algorithm with a Low Computation Cost
S. Takahashi; Y. Minami; K. Shikano
ICSLP, pp. 267-270, published September 1994, peer-reviewed
Research paper (international conference proceedings), English

A Multi-Modal Dialogue System for Telephone Directory Assistance
O. Yoshioka; Y. Minami; K. Shikano
ICASSP, pp. 887-890, published September 1994, peer-reviewed
Research paper (international conference proceedings), English

SPEECH RECOGNITION USING PHONEME HMM CONSTRAINED BY FRAME CORRELATION
S TAKAHASHI; T MATSUOKA; Y MINAMI; K SHIKANO
ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA-JOHN WILEY & SONS, vol. 77, no. 6, pp. 58-69, published June 1994, peer-reviewed, One of the problems with the hidden Markov model (HMM) in performing speech recognition is that the local transition information of the feature vectors is not incorporated into the mechanism of the model and the model is not constrained by transitions of the feature vectors. Thus, the output probability distribution never changes during recognition. Furthermore, all transitions between the vectors that have high probabilities are allowed even if those transitions did not appear in the training data.
This paper proposes a bigram-constrained HMM that uses correlations between two frames to constrain the feature distributions of a speaker-independent HMM to the region most appropriate for the speaker. Since the output probability of the bigram-constrained HMM is a conditional probability restricted by the feature vector of the previous frame, the output probability changes dynamically at each frame depending on the feature vector of the previous frame. Constraining the feature distribution makes it possible to reduce the overlap of feature distributions between different phonemes, which improves recognition performance.
Previously, we proposed the discrete bigram-constrained HMM, which is based on the combination of a discrete speaker-independent HMM and the VQ-code bigram, and showed that it performed better than conventional speaker-independent HMMs. In this paper, the strategy is extended to the tied-mixture bigram-constrained HMM and the continuous bigram-constrained HMM to obtain better recognition performance. These three types of HMMs are formulated and evaluated by phoneme recognition in continuous speech.
Research paper (academic journal), English
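For the discrete case described above, the idea of an output probability that depends on the previous frame's VQ code can be sketched in a few lines. The combination rule used here is an assumption for illustration: the state's discrete output distribution is multiplied by a VQ-code bigram and the result is renormalized. The abstract does not give the exact formula. The function name and toy numbers are hypothetical.

```python
from typing import List


def bigram_constrained_output(state_probs: List[float],
                              code_bigram: List[List[float]],
                              prev_code: int) -> List[float]:
    """Return an output distribution over VQ codes for one HMM state,
    conditioned on the VQ code observed at the previous frame.

    state_probs[k]    : discrete output probability b_j(k) of the state
    code_bigram[i][k] : P(code k at frame t | code i at frame t-1)
    The product is renormalized to sum to one (an assumed combination rule).
    """
    weights = [b * p for b, p in zip(state_probs, code_bigram[prev_code])]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("state and bigram assign zero mass to every code")
    return [w / total for w in weights]


if __name__ == "__main__":
    b_j = [0.5, 0.3, 0.2]
    bigram = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    # The same state yields a different distribution for each previous code.
    for prev in range(3):
        print(prev, [round(p, 3) for p in bigram_constrained_output(b_j, bigram, prev)])
```

Because the distribution is recomputed from the previous code at every frame, the output probability of a state is no longer fixed during recognition. This is the property the abstract emphasizes.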

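Speech Recognition Using Phoneme HMMs Exploiting Inter-Frame Correlation
高橋敏; 松岡達雄; 南泰浩; 鹿野清宏
Transactions of the IEICE A, IEICE, vol. J77-A, no. 2, pp. 153-161, published February 1994, peer-reviewed, One problem with current HMMs is that the output probability distribution is constant within each state, so transition information of phonetic features is not reflected in the model's mechanism. Moreover, because transitions between feature vectors are unconstrained, transitions between feature vectors that each have high output probabilities are given high output probabilities even if those transitions were never observed in the training data. This paper proposes the bigram-constrained HMM. It constrains transitions using the correlation of feature vectors between two frames, and thereby restricts the broad feature distributions of a speaker-independent HMM to the range appropriate for the input speaker. Because the output probability of a bigram-constrained HMM is a probability conditioned on the feature vector at the previous time step, the output probability distribution changes dynamically at each time step. Constraining the distribution also reduces the overlap of feature distributions between different phonemes, which improves the recognition rate. We previously proposed the discrete bigram-constrained HMM, which constrains transitions with a bigram of VQ codes on top of a discrete speaker-independent HMM, and showed that it outperforms conventional HMMs. In this paper, to obtain even higher recognition performance, we extend the method to the semi-continuous bigram-constrained HMM and the continuous bigram-constrained HMM. We evaluated them by phoneme recognition in continuous speech. When inter-frame correlation information from the input speaker's own speech was used, the average phoneme recognition rate improved from 65.4% to 74.8% with the semi-continuous bigram-constrained HMM and from 64.8% to 74.5% with the continuous bigram-constrained HMM. When general inter-frame correlation information extracted from many speakers was used, the continuous bigram-constrained HMM improved the rate from 64.8% to 67.5%.
Research paper (academic journal), Japanese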

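An HMM Evaluation Method without Phoneme Labels and Its Application to Evaluating HMMs for Continuous Speech Recognition
南泰浩; 松岡達雄; 鹿野清宏
Transactions of the IEICE A, vol. J77-A, no. 2, pp. 267-273, published February 1994, peer-reviewed
Research paper (academic journal), Japanese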

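A Large-Vocabulary Continuous Speech Recognition Algorithm for Telephone Directory Assistance
南泰浩; 山田智一; 鹿野清宏; 松岡達雄
Transactions of the IEICE A, vol. J77-A, no. 2, pp. 190-197, published February 1994, peer-reviewed
Research paper (academic journal), Japanese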

A very large vocabulary continuous speech recognition algorithm for telephone directory assistance
Yasuhiro Minami; Tomokazu Yamada; Kiyohiro Shikano; Tatsuo Matsuoka
Electronics and Communications in Japan (Part III: Fundamental Electronic Science), vol. 77, no. 11, pp. 1-12, published 1994, peer-reviewed, This paper proposes a speech recognition algorithm for large vocabulary continuous speech. The proposed algorithm is based on the hidden Markov model (HMM)‐LR algorithm using a generalized predictive LR parser and phoneme HMMs. The following three techniques are applied to improve recognition performance and reduce processing time. The forward and the backward likelihood are used to accurately determine the likelihood in the beam search. To reduce the trellis computation in HMM speech recognition and for efficient search, only the speech frames in which the predicted phoneme seems to exist are used by the window for phoneme matching. For efficient search, adjusting identical phoneme sequences are merged by checking the stack and the state of the LR parser. The algorithm was applied to a telephone directory assistance task involving more than 70,000 subscribers. A recognition experiment for continuous word utterances was done. The sentence recognition rate was 85 percent for speaker‐dependent speech recognition and 71 percent for speaker‐independent speech recognition. The sentence understanding rate was 59 percent for speaker‐dependent speech recognition with spontaneous utterances. Copyright © 1994 Wiley Periodicals, Inc., A Wiley Company
Research paper (academic journal), English

Large-vocabulary continuous speech recognition algorithm applied to a multi-modal telephone directory assistance system
Yasuhiro Minami; Kiyohiro Shikano; Satoshi Takahashi; Tomokazu Yamada; Osamu Yoshioka; Sadaoki Furui
Speech Communication, vol. 15, no. 3-4, pp. 301-310, published 1994, peer-reviewed, This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likelihood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm. © 1994.
Research paper (academic journal), English

SEARCH ALGORITHM THAT MERGES CANDIDATES IN MEANING LEVEL FOR VERY LARGE VOCABULARY SPONTANEOUS SPEECH RECOGNITION
Y MINAMI; K SHIKANO; S TAKAHASHI; T YAMADA
ICASSP-94 PROCEEDINGS, VOL 2, IEEE, pp. 141-144, published 1994, peer-reviewed
Research paper (international conference proceedings), English

Language Processing for Speech Recognition
T. Matsuoka; Y. Minami
NTT R & D, vol. 43, no. 10, pp. 91-100, published 1994, peer-reviewed
Research paper (bulletin of university, research institution, etc.), English

Acoustic Processing for Speech Recognition
Y. Minami; T. Matsuoka
NTT R & D, vol. 43, no. 10, pp. 81-90, published 1994, peer-reviewed
Research paper (bulletin of university, research institution, etc.), English

Large-Vocabulary Continuous Speech Recognition Algorithm for Telephone Directory Assistance
K. Shikano; Y. Minami; S. Takahashi; T. Yamada
IEEE Workshop on Automatic Speech Recognition, pp. 14-15, published December 1993, peer-reviewed
Research paper (international conference proceedings), English

Large Vocabulary Continuous Speech Recognition System for Telephone Directory Assistance
Y. Minami; K. Shikano; S. Takahashi; T. Yamada; O. Yoshioka
International Symposium on Spoken Dialogue, pp. 169-172, published November 1993, peer-reviewed
Research paper (international conference proceedings), English

Multi-Modal Telephone Directory Assistance System Based on Large-Vocabulary Continuous Speech Recognition Algorithm
K. Shikano; Y. Minami; O. Yoshioka; S. Takahashi; T. Yamada
International Workshop on Knowledge Structure for Understanding Speech and Language, vol. 1, published November 1993, peer-reviewed
Research paper (international conference proceedings), English

Recognition of Noisy Speech by Composition of Hidden Markov Models
F. Martin; K. Shikano; Y. Minami
Eurospeech, pp. 1031-1034, published September 1993, peer-reviewed
Research paper (international conference proceedings), English

PHONEME HMMS CONSTRAINED BY FRAME CORRELATIONS
S TAKAHASHI; T MATSUOKA; Y MINAMI; K SHIKANO
ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5, I E E E, pp. B219-B222, published 1993, peer-reviewed
Research paper (international conference proceedings), English

Phoneme HMM Evaluation Algorithm without Phoneme Labeling
Y. Minami; T. Matsuoka; K. Shikano
ICSLP, pp. 1535-1538, published October 1992, peer-reviewed
Research paper (international conference proceedings), English

Very Large Vocabulary Continuous Speech Recognition for Telephone Directory Assistance
Y. Minami; K. Shikano; T. Yamada; T. Matsuoka
IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, vol. VII.1, pp. 2129-2132, published October 1992, peer-reviewed
Research paper (international conference proceedings), English

RECENT TOPICS IN SPEECH RECOGNITION RESEARCH AT NTT LABORATORIES
S FURUI; K SHIKANO; S MATSUNAGA; T MATSUOKA; S TAKAHASHI; T YAMADA
SPEECH AND NATURAL LANGUAGE, MORGAN KAUFMANN PUB INC, pp. 162-167, published 1992, peer-reviewed
Research paper (international conference proceedings), English

CONNECTIONIST APPROACHES TO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
H SAWAI; Y MINAMI; M MIYATAKE; A WAIBEL; K SHIKANO
IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, IEICE-INST ELECTRON INFO COMMUN ENG, vol. 74, no. 7, pp. 1834-1844, published July 1991, peer-reviewed, This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system integrating speech recognition and language processing. The speech recognition part consists of Large Phonemic Time-Delay Neural Networks (TDNNs) which can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([ʃ]), /h/, /z/, /ch/ ([tʃ]), /ts/, /r/, /w/, /y/ ([j]) and 5 vowels /a/, /i/, /u/, /e/, /o/ and a double consonant /Q/ or silence) by simply scanning the input speech without any specific segmentation techniques. The language processing part, on the other hand, is made up of a predictive LR parser in which the LR parser is guided by an LR parsing table automatically generated from context-free grammar rules, and proceeds left-to-right without backtracking. Time alignment between the predicted phonemes and a sequence of the TDNN phoneme outputs is carried out by the DTW matching method. We call this "hybrid" integrated recognition system the "TDNN-LR" method. We report that large-vocabulary isolated word and continuous speech recognition using the TDNN-LR method provided excellent speaker-dependent recognition performance, where incremental training using a small number of training tokens is found to be very effective for adaptation of speaking rate. Furthermore, we report some new achievements as extensions of the TDNN-LR method: (1) two proposed NN architectures provide robust phoneme recognition performance on variations of speaking manner, (2) a speaker-adaptation technique can be realized using a NN mapping function between input and standard speakers and (3) new architectures proposed for speaker-independent recognition provide performance that nearly matches speaker-dependent recognition performance.
Research paper (academic journal), English
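The DTW-based time alignment between the predicted phoneme sequence and the TDNN outputs, described above, can be sketched as a monotonic dynamic-programming segmentation. This is a minimal generic DTW/Viterbi-style alignment, not the paper's exact matching procedure. It assumes a hypothetical score matrix `scores[t][k]` (for example, a log output of the network for the k-th predicted phoneme at frame t), and it requires every phoneme to occupy at least one consecutive frame in order.

```python
import math
from typing import List, Tuple


def align_phonemes(scores: List[List[float]]) -> Tuple[float, List[int]]:
    """Best monotonic alignment of K predicted phonemes to T frames.

    scores[t][k] is the score for assigning frame t to the k-th phoneme.
    Each frame goes to exactly one phoneme, phonemes appear in order, and
    every phoneme gets at least one frame. Returns the total score and the
    phoneme index assigned to each frame.
    """
    T, K = len(scores), len(scores[0])
    if T < K:
        raise ValueError("need at least one frame per phoneme")
    NEG = -math.inf
    D = [[NEG] * K for _ in range(T)]   # best cumulative score ending at (t, k)
    back = [[0] * K for _ in range(T)]  # phoneme index used at frame t - 1
    D[0][0] = scores[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = D[t - 1][k]
            advance = D[t - 1][k - 1] if k > 0 else NEG
            if stay >= advance:
                D[t][k], back[t][k] = stay + scores[t][k], k
            else:
                D[t][k], back[t][k] = advance + scores[t][k], k - 1
    # Trace back from the last phoneme at the last frame.
    path = [K - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return D[T - 1][K - 1], path


if __name__ == "__main__":
    frame_scores = [
        [0.0, -5.0, -5.0],
        [0.0, -5.0, -5.0],
        [-5.0, 0.0, -5.0],
        [-5.0, -5.0, 0.0],
        [-5.0, -5.0, 0.0],
    ]
    print(align_phonemes(frame_scores))  # → (0.0, [0, 0, 1, 2, 2])
```

In a TDNN-LR setting, the total score of such an alignment would be one way to rank the phoneme sequences proposed by the LR parser. How the paper combines the two is not specified here.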

On the Robustness of HMM and ANN Speech Recognition Algorithms
Y. Minami; T. Hanazawa; H. Iwamida; E. McDermott; K. Shikano; S. Katagiri; M. Nakagawa
ICSLP, pp. 1345-1348, published November 1990, peer-reviewed
Research paper (international conference proceedings), English

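Frame-Synchronous HMM Continuous Speech Recognition Producing Multiple Candidates Using a Trigram Model
南泰浩; 中川正雄
Transactions of the IEICE D, IEICE Information and Systems Society, vol. J73-D-II, no. 9, pp. 1383-1392, published September 1990, peer-reviewed
Research paper (academic journal), Japanese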

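Large-Vocabulary Word Recognition Using Phoneme Spotting with Time-Delay Neural Networks and a Predictive LR Parser
南泰浩; 沢井秀文; 宮武正典
Transactions of the IEICE D, IEICE Information and Systems Society, vol. J73-D-II, no. 6, pp. 788-795, published June 1990, peer-reviewed
Research paper (academic journal), Japanese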

INTEGRATED TRAINING FOR SPOTTING JAPANESE PHONEMES USING LARGE PHONEMIC TIME-DELAY NEURAL NETWORKS
M MIYATAKE; H SAWAI; Y MINAMI; K SHIKANO
ICASSP 90, VOLS 1-5, I E E E, pp. 449-452, published 1990, peer-reviewed
Research paper (international conference proceedings), English

VARIABLE BIT RATE PARCOR.VQ HYBRID VOCODER
K MIZUI; S WAKABAYASHI; M SATOH; Y MINAMI; M NAKAGAWA
DALLAS GLOBECOM 89, VOLS 1-3, I E E E, pp. 1885-1889, published 1989, peer-reviewed
Research paper (international conference proceedings), English

MISC

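Analysis of BPSD Onset Timing for Improving BPSD Prediction Performance Using Environmental and Vital-Sign Sensor Data
新見, 龍之慎; 常盤, 直也; 柴田, 純一; 鈴木, 利一; 柏木, 岳彦; 馬上, 竜也; 嘉村, 魁人; 大沼, 飛宇多; 田野, 俊一; 南, 泰浩
Behavioral and psychological symptoms of dementia (BPSD) not only place a heavy burden on caregivers but also affect the patients' own quality of life. If BPSD could be predicted in advance so that the symptoms could be addressed, the burden on caregivers could be reduced. In previous work, we performed machine-learning-based BPSD prediction using environmental and vital-sign sensor data collected from multiple care facilities. However, the average precision of the PR curve remains low. In this study, we therefore analyzed the data with the aim of improving the accuracy of BPSD prediction. The analysis confirmed that specific symptoms follow a 24-hour cycle. This result indicates the potential for predicting BPSD onset with machine-learning methods., IEICE, published March 8, 2024, IEICE Technical Report: MICT2023-79, vol. 123, no. 446, pp. 12-16, Japanese, 2432-6380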

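Overview, Findings, and Future Prospects of Solving the Problems of Elderly People with Dementia Using the UEC Co-Creative Evolutionary Smart Society Model
田野, 俊一; 岡山, 義光; 横川, 慎二; 南, 泰浩
This paper first explains the "Co-Creative Evolutionary Smart Society" of the University of Electro-Communications (UEC), the basic concept underlying a new approach to solving social problems. It then outlines the "Tokyo BPSD Project", which applies this concept to the problem of elderly people with dementia (BPSD), and finally reports the findings obtained so far and future prospects., IEICE, published March 6, 2024, IEICE Technical Report: LOIS2023-58, vol. 123, no. 429, pp. 56-61, Japanese, 2432-6380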

The Tokyo Approach: Solving the Problems of Elderly People with Dementia from Multiple Angles Using AI and IoT
中島円; 本井ゆみ子; 蒲原千尋; 小峯一城; 南泰浩; 韓浩; 遠山修; 岩切のり子; 池田和博; 羽田野政治; 西浦孝典; 池田充; 三澤純子; 松岡伸輔; 横川慎二; 岡山義光; 田野俊一
published 2023, Dementia Japan, vol. 37, no. 4, 1342-646X, 202302250079970435

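Suppressing Language-Model Hallucination in Dialogue State Tracking for Task-Oriented Dialogue
佐藤明智; 南泰浩
published 2023, JSAI SIG Technical Reports on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), vol. 99, 0918-5682, 202402249444565085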

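A Situation-Track Dialogue System Incorporating GPT-4-Based Emotion and Dialogue-Act Analysis
松浦直樹; 中山朝陽; 大沼飛宇多; 佐藤明智; 南泰浩
published 2023, JSAI SIG Technical Reports on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), vol. 99, 0918-5682, 202402289423028367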

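Evaluation of an Academic Paper Recommendation Method Based on Aspects of Abstracts
小林恵大; 小山康平; 成松宏美; 南泰浩
To support surveying and writing academic papers, paper recommendation, which suggests a list of papers to read, has been actively studied. In this study, we propose a method based on classifying abstracts, with the aim of presenting not only a list of papers to read but also the reasons for each recommendation. Conventional methods judge the similarity between the query paper and the set of candidate papers by overall closeness, which makes it difficult to present the grounds for a recommendation. For practical support of researchers, it is desirable to also indicate which aspects, such as the purpose or the approach, are similar. In this paper, we therefore classify each sentence of an abstract by aspect, such as background, method, and results. We analyze whether papers in citing/cited relationships can be classified by aspect, and we propose and evaluate a paper recommendation method based on per-aspect similarity. The analysis and evaluation showed that the method based on per-aspect similarity is effective and that combining it with conventional methods may improve performance., The Japanese Society for Artificial Intelligence, published 2023, Proceedings of the Annual Conference of JSAI, vol. JSAI2023, pp. 4Xin105-4Xin105, Japanese, 2758-7347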

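Realizing a High-Performance English Solver for the National Center Test for University Admissions
杉山弘晃; 成松宏美; 菊井玄一郎; 東中竜一郎; 堂坂浩二; 平博順; 南泰浩; 大和淳司
published 2020, Proceedings of the Annual Meeting of the Association for Natural Language Processing (Web), vol. 26th, 2188-4420, 202002257856951573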

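Solving English Questions on Grasping the Gist of Opinions in the "Can a Robot Get into the University of Tokyo?" Project
東中竜一郎; 杉山弘晃; 成松宏美; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 喜多智也; 南泰浩; 風間健流; 大和淳司
published 2018, Proceedings of the Annual Conference of JSAI (CD-ROM), vol. 32nd, pp. ROMBUNNO.2C1.02, Japanese, 1347-9881, 201802262605015084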

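The Endeavor "Can a Robot Get into the University of Tokyo?": 1. Technical Challenges Revealed by Tackling English Questions
東中竜一郎; 杉山弘晃; 堂坂浩二; 南泰浩; 成松宏美; 磯崎秀樹; 菊井玄一郎; 平博順; 大和淳司
published June 15, 2017, IPSJ Magazine (Joho Shori), vol. 58, no. 7, pp. 600-602, Japanese, Article, review, commentary, editorial, etc. (academic journal), 170000148666, AN00116625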

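Achievements and Future Challenges for the English Subject in the "Can a Robot Get into the University of Tokyo?" Project
東中竜一郎; 杉山弘晃; 成松宏美; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩; 大和淳司
published 2017, Proceedings of the Annual Conference of JSAI (CD-ROM), vol. 31st, pp. ROMBUNNO.2H2‐1, Japanese, 1347-9881, 201802268089508328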

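Methods for Answering English Questions in the National Center Test for University Admissions
東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
published 2015, Proceedings of the Annual Meeting of the Association for Natural Language Processing (Web), vol. 21st, 2188-4420, 201502234485199950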

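On the Effect of Gender on the Order of Vocabulary Acquisition in Young Children (Human Information Processing)
南泰浩; 小林哲生
Tomasello points out that differences in when words are acquired stem from the social environment. In this report, as a basic investigation of such social influences, we examine the effect of gender on the timing of vocabulary acquisition. To this end, we derived the age in days at which each word is comprehended and the age in days at which it is produced from cross-sectional data collected with the MacArthur-Bates Communicative Development Inventories. Computing correlations of comprehension age and production age across multiple languages showed that these measures have components common to both genders. Furthermore, an analysis that removed the gender-common component confirmed that these measures are also affected by gender., IEICE, published May 29, 2014, IEICE Technical Report, vol. 114, no. 68, pp. 61-66, Japanese, 0913-5685, 110009903784, AN10487237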

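On the Effect of Gender on the Order of Vocabulary Acquisition in Young Children (Human Communication Science)
南泰浩; 小林哲生
Tomasello points out that differences in when words are acquired stem from the social environment. In this report, as a basic investigation of such social influences, we examine the effect of gender on the timing of vocabulary acquisition. To this end, we derived the age in days at which each word is comprehended and the age in days at which it is produced from cross-sectional data collected with the MacArthur-Bates Communicative Development Inventories. Computing correlations of comprehension age and production age across multiple languages showed that these measures have components common to both genders. Furthermore, an analysis that removed the gender-common component confirmed that these measures are also affected by gender., IEICE, published May 29, 2014, IEICE Technical Report, vol. 114, no. 67, pp. 61-66, Japanese, 0913-5685, 110009903742, AN10487226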

A Study of Methods for Estimating the Target Age of Picture Books
藤田早苗; 小林哲生; 平博順; 南泰浩; 田中貴秋
published 2014, Proceedings of the Annual Conference of JSAI (CD-ROM), vol. 28th, 1347-9881, 201402211375582913

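The Front Line of Research on Language Development in Young Children
小林哲生; 南泰浩
published 2014, Journal of the Human Interface Society, vol. 16, no. 2, pp. 29-34, Japanese, peer-reviewed, invited, Article, review, commentary, editorial, etc. (academic journal)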

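Automatic Dialogue-Act Tagging of Microblog Posts Using Semantic Attribute Patterns
目黒豊美; 東中竜一郎; 杉山弘晃; 南泰浩
Microblogs such as Twitter contain a wide variety of information, and much research has been done on classifying it. If dialogue acts could be assigned to microblog posts, it might become possible to analyze exchanges between users or to apply the results to utterance generation in dialogue systems. In this paper, we propose a method that assigns dialogue acts to microblog posts using supervised learning. Compared with dialogue data, microblog data contain a great variety of topics and vocabulary as well as much informal, ungrammatical Japanese. To cope with this diverse vocabulary and informal Japanese, we propose using word N-grams abstracted with a thesaurus and character N-grams as features. Evaluation experiments showed that the proposed method achieved higher accuracy than a baseline using bag-of-N-grams. In this paper, we propose dialogue act tagging for utterances in microblogs. The dialogue act estimator is built by using support vector machines (SVMs). To cope with the variety of words and expressions in microblogs, the feature vector uses N-grams of characters and words. In addition, the word N-gram features are abstracted into semantic categories by using a thesaurus. In our experiment, the proposed model outperformed naive baselines based on word N-grams., Information Processing Society of Japan, published October 18, 2013, IPSJ SIG Technical Report: Spoken Language Processing (SLP), vol. 2013, no. 1, pp. 1-6, Japanese, 110009613935, AN10442647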
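The feature design described in this entry combines character N-grams with word N-grams abstracted into semantic categories through a thesaurus. It can be sketched as a bag-of-features extractor. The tiny thesaurus, the category labels, the `C:`/`W:` feature prefixes, and the function names are all hypothetical. Words are joined without spaces for the character N-grams, which mirrors unsegmented Japanese text. The resulting counts would then be passed to an SVM classifier, which is not shown.

```python
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int) -> List[str]:
    """All contiguous character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def abstract_words(words: List[str], thesaurus: Dict[str, str]) -> List[str]:
    """Replace each word by its semantic category when the thesaurus has it."""
    return [thesaurus.get(w, w) for w in words]


def extract_features(words: List[str], thesaurus: Dict[str, str],
                     max_n: int = 2) -> Counter:
    """Bag of character n-grams (prefix C:) and thesaurus-abstracted
    word n-grams (prefix W:) for n = 1..max_n."""
    feats: Counter = Counter()
    text = "".join(words)
    abstracted = abstract_words(words, thesaurus)
    for n in range(1, max_n + 1):
        feats.update("C:" + g for g in char_ngrams(text, n))
        for i in range(len(abstracted) - n + 1):
            feats["W:" + " ".join(abstracted[i:i + n])] += 1
    return feats


if __name__ == "__main__":
    thesaurus = {"ramen": "<food>", "sushi": "<food>"}
    a = extract_features(["i", "ate", "ramen"], thesaurus)
    b = extract_features(["i", "ate", "sushi"], thesaurus)
    # Both posts share the abstracted feature despite different surface words.
    print(a["W:ate <food>"], b["W:ate <food>"])
```

The abstraction lets posts that mention different but related words share word-level features, while the character N-grams still capture informal spellings that a thesaurus would miss.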

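Examining Characteristics of Vocabulary Learning in Young Children with a Comprehension-Production Index for Early-Appearing Words (Thinking and Language)
南泰浩; 小林哲生; 杉山弘晃
For the early-appearing words of young children, Gentner and colleagues have argued that nouns are learned more easily than other words. Many results supporting this hypothesis have been reported, based on the predominance of nouns in the part-of-speech distribution of early-appearing words at a given point in time. To explain why certain words are learned more easily, Gentner, Maguire, and others have proposed that words lie on a continuum in an abstract representation space and that this space influences how easy or difficult a word is to learn. However, no study has so far clearly shown the connection between the difficulty of vocabulary learning and this abstract representation space. In this report, we propose an index, derived from the results of a cross-sectional vocabulary checklist, that directly links the difficulty of vocabulary learning to the abstract representation space. We further report how the categories of early-appearing vocabulary are distributed on this index., IEICE, published February 22, 2013, IEICE Technical Report, vol. 112, no. 442, pp. 37-42, Japanese, 0913-5685, 110009728695, AN10449078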

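Reinforcement Learning in Dialogue Processing
南泰浩; 目黒豊美
The Society of Instrument and Control Engineers, published 2013, Journal of SICE (Keisoku to Seigyo), vol. 52, no. 10, pp. 916-921, Japanese, peer-reviewed, invited, Article, review, commentary, editorial, etc. (academic journal), 0453-4662, 40019836182, AN00072406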

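Estimating the Onset of the Vocabulary Spurt by Piecewise-Linear Approximation
南泰浩; 小林哲生; 杉山弘晃
ASJ Technical Committee on Psychological and Physiological Acoustics, published March 8, 2012, Proceedings of the Hearing Research Meeting, vol. 42, no. 2, pp. 155-160, Japanese, 1346-1109, 40019248769, AN00227138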

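Dialogue Control by Inverse Reinforcement Learning Based on Rank Learning (26th Annual Conference of JSAI: Culture, Science, Technology, and the Future) -- (Organized Session "OS-18 Intelligent Dialogue Systems")
杉山弘晃; 目黒豊美; 南泰浩
The Japanese Society for Artificial Intelligence, published 2012, Proceedings of the Annual Conference of JSAI, vol. 26, pp. 1-4, Japanese, 1347-9881, 40020270054, AA11578981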
URL

POMDP を用いた聞き役対話制御部のWizard of Oz 実験による評価 (人工知能学会全国大会(第26回)文化,科学技術と未来) -- (オーガナイズドセッション「OS-18 知的対話システム」)
目黒豊美; 南泰浩; 東中竜一郎
人工知能学会, 出版日 2012年, 人工知能学会全国大会論文集, 26巻, 掲載ページ 1-4, 日本語, 1347-9881, 40020270069, AA11578981
URL

語彙爆発の新しい視点 : 日本語学習児の初期語彙発達に関する縦断データ解析
小林哲生; 南泰浩; 杉山弘晃
日本赤ちゃん学会, 出版日 2012年, ベビーサイエンス, 12巻, 掲載ページ 40-64, 日本語, 40019763996, AA11903404
URL

Statistical Approaches to Spoken Dialogue Control
南泰浩
Published 2012, IPSJ Magazine, Vol. 53, No. 10, pp. 1088-1094, Japanese, peer-reviewed, invited, article/review/commentary (academic journal), 20001036089

POMDP Dialogue Control Using Action Duration Control (Human-Computer Interaction (HCI) Vol.2011-HCI-142)
南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
This paper proposes a POMDP-based dialogue control method that uses action duration control. We previously proposed a method that combines ordinary POMDP-based control with probability-based control, and we extended it to trigram-based statistical dialogue control. That method can learn dialogue tasks automatically. However, experiments showed that when it is applied to less task-oriented dialogues, it over-generates actions with high probabilities. To solve this problem, we add to POMDP-based dialogue control a mechanism that generates actions according to a probability distribution over action durations. Experimental results confirmed two effects. The proposed method generates action sequences whose trigram probability stays close to that of the training data, and it also increases the entropy of the actions. In other words, action generation becomes less biased, provides new information, and avoids over-generating the same actions., IPSJ, Published 2011-04, IPSJ SIG Technical Report, Vol. 2010, No. 6, pp. 1-8, Japanese, 2186-2583, 110008583618
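
The duration idea in this abstract can be illustrated with a small generator. Instead of repeatedly emitting the single most probable action, it holds each action for a duration drawn from a per-action distribution. This is a simplified sketch under stated assumptions: the POMDP policy is replaced by a fixed, hypothetical next-action table, and the action names and probabilities are invented. The `entropy` helper computes the empirical action entropy, which is the quantity the abstract reports as increasing.

```python
import math
import random
from collections import Counter


def sample(dist, rng):
    """Draw a key from a {value: probability} dict whose probabilities sum to 1."""
    r = rng.random()
    acc = 0.0
    last = None
    for value, p in dist.items():
        acc += p
        last = value
        if r < acc:
            return value
    return last  # guard against floating-point round-off


def generate_with_durations(next_action, durations, start, length, rng):
    """Hold each action for a sampled duration (>= 1), then sample the next action."""
    seq, action = [], start
    while len(seq) < length:
        d = sample(durations[action], rng)
        seq.extend([action] * d)
        action = sample(next_action[action], rng)
    return seq[:length]


def entropy(seq):
    """Shannon entropy (in bits) of the empirical action distribution."""
    counts = Counter(seq)
    total = len(seq)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical actions, transitions, and duration distributions.
    next_action = {"question": {"ack": 0.7, "self_disclose": 0.3},
                   "ack": {"question": 0.6, "self_disclose": 0.4},
                   "self_disclose": {"question": 1.0}}
    durations = {"question": {1: 0.8, 2: 0.2},
                 "ack": {1: 0.5, 2: 0.5},
                 "self_disclose": {1: 0.6, 2: 0.3, 3: 0.1}}
    seq = generate_with_durations(next_action, durations, "question", 20, rng)
    print(seq, round(entropy(seq), 3))
```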

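POMDP-Based Dialogue Control Using Action Duration Control
南泰浩; 目黒豊美; 東中竜一郎
JSAI, Published 2011, Proceedings of the Annual Conference of JSAI, Vol. 25, pp. 1-4, Japanese, 1347-9881, 40020269460, AA11578981

Dialogue Control Based on Partially Observable Markov Decision Processes
南泰浩
Published 2011, Journal of the Acoustical Society of Japan, Vol. 67, No. 10, pp. 482-487, Japanese, peer-reviewed, invited, article/review/commentary (academic journal)

Generating Communication Strategies for Human-Robot Symbiosis
前田英作; 南泰浩; 堂坂浩二
Published 2011, Journal of the Robotics Society of Japan, Vol. 29, No. 10, pp. 887-890, Japanese, peer-reviewed, invited, article/review/commentary (academic journal)

An Approach to Dynamic Learning Strategies in Interactive Visual Scene Recognition and Understanding (Theme Session: Frontiers and Grand Challenges of PRMU)
木村昭悟; 南泰浩; 坂野鋭; 前田英作; 杉山弘晃
Humans can understand and verbalize what they see without conscious effort. Visual scene recognition and understanding, which delegates these tasks to computers, has been one of the most important problems in pattern recognition since its early days, yet it has not been fundamentally solved. Humans are unlikely to be born with these abilities, however; most of them are presumably acquired during development. This report builds on two sources: last year's report, which discussed a framework for visual scene recognition and understanding based on a cognitive-developmental approach, and recent trends in related research. On that basis, it discusses how strategies for acquiring the abilities and knowledge needed for visual scene understanding should be designed. Two aspects are discussed in more concrete terms. The first is that building the strategy is a form of reinforcement learning in which humans are necessarily part of the system. The second is that the strategy's basic structure changes dynamically as it adapts not only to humans but also to the computer itself., IEICE, Published 2010-12-02, IEICE Technical Report, PRMU (Pattern Recognition and Media Understanding), Vol. 110, No. 330, pp. 53-54, Japanese, 110008675751

A Thought-Evoking Multi-Party Dialogue System: Camp
堂坂浩二; 南泰浩
JSAI, Published 2010-10-28, JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), Vol. 60, pp. 35-38, Japanese, 0918-5682, 40017365379, AN10432166

A Question-Generation Strategy for an Interactive Visual Scene Understanding System Based on the Confidence of Its Knowledge (Theme Session: Machine Learning and Optimization for Computer Vision and Pattern Recognition; General)
Sekhon Gurbachan; 木村昭悟; 南泰浩; 坂野鋭; 前田英作
This report proposes a method for action planning in an interactive visual scene understanding system that uses the system's knowledge and its confidence in that knowledge. Knowledge confidence is defined as a combination of two properties on the latent space of a topic model connecting image features and text labels: 1) the similarity between an input sample and the training samples on the latent space, and 2) the overall associability of each text label as determined by the content of the training samples. We evaluate the proposed method in terms of annotation accuracy and the effort users need to provide answers. Experimental results on the PASCAL VOC2008 dataset indicate that the proposed method achieved comparable or better annotation accuracy with less user effort than strategies that 1) always ask for the names of objects or 2) generate random questions., IEICE, Published 2010-08-29, IEICE Technical Report, PRMU (Pattern Recognition and Media Understanding), Vol. 110, No. 187, pp. 201-208, English, 0913-5685, 110008107179

POMDP-Based Dialogue Control Using Dialogue Data Statistics (Natural Language Understanding and Models of Communication)
南泰浩; 森啓; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
IEICE, Published 2009-12-21, IEICE Technical Report, Vol. 109, No. 355, pp. 83-88, Japanese, 0913-5685, 110008002098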

Analysis of Listening-Oriented Dialogues for Building Listening Agents
目黒豊美; 東中竜一郎; 堂坂浩二; 南泰浩; 磯崎秀樹
Our aim is to build listening agents that attentively listen to users and satisfy their desire to speak and be heard. This paper investigates the characteristics of listening-oriented dialogues so that automated dialogue systems can reproduce this kind of listening. We collected both listening-oriented dialogues and casual conversations between humans. We then compared the frequencies of dialogue acts in the two dialogue types and analyzed the dialogue flows using Hidden Markov Models (HMMs). The analysis revealed that listening-oriented dialogues and casual conversations have characteristically different dialogue flows. In listening-oriented dialogues, listeners self-disclose before asking questions, and they produce more questions and acknowledgments than in casual conversation. We also investigated how the personality traits of speakers and listeners affect listening-oriented dialogue, and found that dialogues differ considerably depending on those traits., Published 2009-09-21, IPSJ SIG Technical Report on Natural Language Processing (NL), Vol. 2009, No. 10, pp. 1-6, Japanese, 0919-6072, 110008003241, AN10115061
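
The analysis above models dialogue flow with HMMs whose outputs are dialogue acts. A minimal sketch of how such a discrete HMM scores a dialogue-act sequence is the standard forward algorithm below. The hidden states, act labels, and all probabilities are hypothetical and are not parameters from the paper. One way to compare dialogue types would be to score the same act sequence under a model trained on listening-oriented dialogues and under one trained on casual conversation.

```python
from itertools import product


def forward_likelihood(obs, start, trans, emit):
    """P(obs) for a discrete HMM, computed with the forward algorithm."""
    states = list(start)
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s].get(o, 0.0)
            for s in states
        }
    return sum(alpha.values())


def brute_force_likelihood(obs, start, trans, emit):
    """Sum over every hidden-state path (exponential cost; for checking only)."""
    total = 0.0
    for path in product(list(start), repeat=len(obs)):
        p = start[path[0]] * emit[path[0]].get(obs[0], 0.0)
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emit[cur].get(o, 0.0)
        total += p
    return total


if __name__ == "__main__":
    # Hypothetical two-state model of dialogue flow over dialogue-act labels.
    start = {"listening": 0.6, "disclosing": 0.4}
    trans = {"listening": {"listening": 0.7, "disclosing": 0.3},
             "disclosing": {"listening": 0.5, "disclosing": 0.5}}
    emit = {"listening": {"question": 0.4, "ack": 0.5, "self_disclosure": 0.1},
            "disclosing": {"question": 0.2, "ack": 0.1, "self_disclosure": 0.7}}
    acts = ["self_disclosure", "question", "ack"]
    print(forward_likelihood(acts, start, trans, emit))
```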

Controlling Thought-Evoking Dialogue Using POMDPs
南泰浩; 澤木美奈子; 東中竜一郎; 堂坂浩二
We aim to realize thought-evoking dialogue systems, in which conversational agents act appropriately for the situation in order to evoke users' voluntary thinking and raise their motivation to communicate. This paper proposes a control method for a quiz dialogue system, which is one example of a thought-evoking dialogue system. The method uses a Partially Observable Markov Decision Process (POMDP) to handle uncertain information, such as paralinguistic information, appropriately. As the uncertain information, we use the level of difficulty the user feels toward quiz hints, as judged by another person from the user's facial expressions and voice. A POMDP policy is learned from experimental data by reinforcement learning. Following that policy, the system skips hints where appropriate and thereby controls the user's mental state toward the hints (easy, neutral, or difficult). We evaluated the effectiveness of the proposed method through simulation experiments., IPSJ, Published 2008-12-02, IPSJ SIG Technical Report on Spoken Language Processing (SLP), Vol. 2008, No. 123, pp. 97-102, Japanese, 0919-6072, 110007114728, AN10442647
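
A POMDP controller of this kind maintains a belief, which is a probability distribution over the hidden user state (easy, neutral, difficult). It updates that belief after each system action and observation. The sketch below shows only the generic belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). It does not show how the policy is learned by reinforcement learning. The action `skip_hint`, the observation labels, and every probability are hypothetical.

```python
STATES = ("easy", "neutral", "difficult")


def belief_update(belief, action, observation, trans, obs_model):
    """POMDP belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    unnormalized = {}
    for s_next in STATES:
        predicted = sum(trans[action][s][s_next] * belief[s] for s in STATES)
        unnormalized[s_next] = obs_model[action][s_next][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical transition model for a single action.
    trans = {"skip_hint": {
        "easy": {"easy": 0.5, "neutral": 0.4, "difficult": 0.1},
        "neutral": {"easy": 0.1, "neutral": 0.6, "difficult": 0.3},
        "difficult": {"easy": 0.0, "neutral": 0.2, "difficult": 0.8}}}
    # Hypothetical noisy observation of difficulty (e.g., judged from face and voice).
    obs_model = {"skip_hint": {
        "easy": {"looks_easy": 0.7, "looks_hard": 0.3},
        "neutral": {"looks_easy": 0.5, "looks_hard": 0.5},
        "difficult": {"looks_easy": 0.2, "looks_hard": 0.8}}}
    b = {s: 1 / 3 for s in STATES}
    print(belief_update(b, "skip_hint", "looks_hard", trans, obs_model))
```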

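Controlling Thought-Evoking Dialogue Using POMDPs
南泰浩; 澤木美奈子; 東中竜一郎; 堂坂浩二
We aim to realize thought-evoking dialogue systems, in which conversational agents act appropriately for the situation in order to evoke users' voluntary thinking and raise their motivation to converse. This paper proposes a control method for a quiz dialogue system, which is one example of a thought-evoking dialogue system. The method uses a POMDP to handle uncertain information appropriately. As the uncertain information, we use the level of difficulty the user feels, as judged by a person from the user's facial expressions and voice. A POMDP policy is learned from experimental data by reinforcement learning. Following that policy, the system skips hints where appropriate and thereby controls the user's mental state toward the hints (easy, neutral, or difficult). We evaluated the effectiveness of the proposed method through simulation., IEICE, Published 2008-12-02, IEICE Technical Report, NLC (Natural Language Understanding and Models of Communication), Vol. 108, No. 337, pp. 97-102, Japanese, 0913-5685, 110007114428, AN10091225

The World of Mushroom: Realizing Ambient Intelligence
南泰浩; 堂坂浩二; 澤木美奈子; 森啓; 前田英作
Published 2008, Journal of the Human Interface Society, Vol. 10, No. 2, pp. 5-10, Japanese, peer-reviewed, invited, article/review/commentary (academic journal)

Building a Quiz Dialogue System and Evaluating It by Speech Recognition Performance
南泰浩; 東中竜一郎; 澤木美奈子; 堂坂浩二; 山田武士; 松林達史; 磯崎秀樹; 前田英作
Published 2007, Proceedings of the ASJ Meeting (CD-ROM), Vol. 2007, 1880-7658, 200902257906453498

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese (2006 Edition)
中村篤; 大庭隆伸; 渡部晋治; 石塚健太郎; 藤本雅清; 堀貴明; マクダーモットエリック; 南泰浩
NTT Communication Science Laboratories is researching speech recognition of natural spontaneous speech in real environments. This report presents benchmark evaluation results on the Corpus of Spontaneous Japanese (CSJ) for the speech recognition software 'SOLON', which is being developed as a testbed for that research. Experiments show the effects of several techniques and of their combinations: prior detection of speech segments, speaking-rate-dependent speech analysis, error-corrective training of the language model, discriminative training of full-covariance models, and unsupervised speaker adaptation., IEICE, Published 2006-12-15, IEICE Technical Report, SP (Speech), Vol. 106, No. 444, pp. 73-78, Japanese, 0913-5685, 110006163063, AN10013221

An Attempt at Cross-Disciplinary Research toward Ambient Intelligence: New "Intelligence" in a New "Environment"
前田英作; 南泰浩; 堂坂浩二; 森啓; 近藤公久
Since 2005, NTT Communication Science Laboratories has been conducting a research project on the theme of "ambient intelligence." The project aims to organically integrate information-processing technologies for communication, including speech processing, acoustic processing, language processing, dialogue, visual information processing, search, learning, and networking. It also envisions proposing the new lifestyles that such integration would make possible. This paper introduces the aims of the project and its progress to date., IEICE, Published 2006-10-12, IEICE Technical Report, NLC (Natural Language Understanding and Models of Communication), Vol. 106, No. 298, pp. 51-56, Japanese, 0913-5685, 110004851875

An Attempt at Cross-Disciplinary Research toward Ambient Intelligence: New "Intelligence" in a New "Environment"
前田英作; 南泰浩; 堂坂浩二; 森啓; 近藤公久
Since 2005, NTT Communication Science Laboratories has been conducting a research project on the theme of "ambient intelligence." The project aims to organically integrate information-processing technologies for communication, including speech processing, acoustic processing, language processing, dialogue, visual information processing, search, learning, and networking. It also envisions proposing the new lifestyles that such integration would make possible. This paper introduces the aims of the project and its progress to date., IEICE, Published 2006-10-12, IEICE Technical Report, PRMU (Pattern Recognition and Media Understanding), Vol. 106, No. 300, pp. 69-74, Japanese, 0913-5685, 110004852058

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese (Public Release Ver. 1.0)
中村篤; 大庭隆伸; 渡部晋治; 石塚健太郎; 堀貴明; シュスターマイク; マクダーモットエリック; 南泰浩
This report presents benchmark evaluation results for SOLON on public release Ver. 1.0 of the Corpus of Spontaneous Japanese (CSJ). SOLON is a testbed for speech recognition research being developed at NTT Communication Science Laboratories. We are researching speech recognition of natural spontaneous speech in real environments and regard the CSJ as one of the most important benchmark test materials. Experiments clarify the effects of several techniques, including minimum classification error training and full-covariance models, on large-vocabulary spontaneous speech recognition. We also present additional evaluations that take into account interjections and differences in the linguistic units used to compute accuracy., IEICE, Published 2005-12-22, IEICE Technical Report, NLC (Natural Language Understanding and Models of Communication), Vol. 105, No. 494, pp. 7-12, Japanese, 0913-5685, 110003488505, AN10091225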

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese (Public Release Ver. 1.0)
中村篤; 大庭隆伸; 渡部晋治; 石塚健太郎; 堀貴明; マイク・シュスター; エリック・マクダーモット; 南泰浩
SOLON is a speech recognition testbed system being developed at NTT Communication Science Laboratories. This paper reports the results of the latest benchmark evaluation of SOLON on public release Ver. 1.0 of the Corpus of Spontaneous Japanese (CSJ). We are researching speech recognition of natural spontaneous speech in real environments and regard the CSJ as one of the most important benchmark test materials. Experiments clarify the effectiveness of several techniques, including minimum classification error training and full-covariance modeling, on large-vocabulary spontaneous speech recognition. We also describe the results of a recognition error analysis, along with additional evaluations that take into account interjections and differences in the linguistic units used to compute accuracy., IPSJ, Published 2005-12-22, IPSJ SIG Technical Report on Spoken Language Processing (SLP), Vol. 2005, No. 127, pp. 97-102, Japanese, 0919-6072, 110003494733, AN10442647
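
The SOLON abstracts note that reported accuracy depends on the linguistic unit used for scoring. The common accuracy measure Acc = (N - S - D - I) / N can be computed from a minimum-edit (Levenshtein) alignment over any token unit. As a result, the same hypothesis scored by words and by characters generally yields different numbers. The sketch below illustrates this. The example strings are hypothetical, and this is not SOLON's scoring tool.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (total_edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, d, n = cost[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (t + miss, s + miss, d, n)
            t, s, d, n = cost[i - 1][j]
            up = (t + 1, s, d + 1, n)          # reference token deleted
            t, s, d, n = cost[i][j - 1]
            left = (t + 1, s, d, n + 1)        # extra token inserted
            cost[i][j] = min(diag, up, left)   # minimal total edits first
    _, s, d, n = cost[-1][-1]
    return s, d, n


def accuracy(ref, hyp):
    """Acc = (N - S - D - I) / N, where N is the number of reference tokens."""
    s, d, i = edit_counts(ref, hyp)
    return (len(ref) - s - d - i) / len(ref)


if __name__ == "__main__":
    ref, hyp = "the cat sat", "the cat sit"
    print("word:", accuracy(ref.split(), hyp.split()))
    print("char:", accuracy(list(ref), list(hyp)))
```

In this example, "sat" recognized as "sit" gives a word accuracy of 2/3 but a character accuracy of 10/11. Acc can become negative when the hypothesis contains many insertions.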

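Application of Bayesian Networks to Acoustic Recognition
柏野邦夫; 南泰浩
Acoustical Society of Japan, Published 2005, Journal of the Acoustical Society of Japan, Vol. 61, No. 12, pp. 714-719, Japanese, peer-reviewed, invited, article/review/commentary (academic journal), 0369-4232, 110004019698, AN00186234

Speech Recognition Based on Trajectories Generated by a Kalman Filter
南泰浩
Published 2004-12-22, IPSJ SIG Technical Report, SLP (Spoken Language Processing), Vol. 54, pp. 49-54, English, 0919-6072, 10014062518, AN10442647

Speech Recognition Incorporating a Speech Production Model
南泰浩
Published 2003, Journal of the Acoustical Society of Japan, Vol. 59, No. 11, Japanese, peer-reviewed, invited, article/review/commentary (academic journal)

A Speech Recognition Method Using Trajectory Parameter Generation Based on the Relationship between Static and Dynamic Feature Time Series
南泰浩; マクダーモットエリック; 中村篤; 片桐滋
Published 2002-03-18, Proceedings of the ASJ Meeting, Vol. 2002, No. 1, pp. 83-84, Japanese, 1340-3168, 10018033127, AN00351181

Language Model Synchronization for Improved Beam-Search Performance in Large-Vocabulary Continuous Speech Recognition
ヴィレットダニエル; マクダーモットエリック; 南泰浩; 片桐滋
Published 2001-10-01, Proceedings of the ASJ Meeting, Vol. 2001, No. 2, pp. 99-100, English, 1340-3168, 10007458257, AN00351181

An Efficient Search Method Using a Network Structure for Continuous Speech Recognition
花沢健; 南泰浩; 古井貞熙
Published 1997-03-01, Proceedings of the ASJ Meeting, Vol. 1997, No. 1, pp. 51-52, Japanese, 1340-3168, 10002742165, AN00351181

Recognition of Connected Digits in Spontaneous Speech
BAUCHE Etienne; 南泰浩; GAJIC Bojana; 松岡達雄; 古井貞熙
Published 1997-03-01, Proceedings of the ASJ Meeting, Vol. 1997, No. 1, pp. 169-170, Japanese, 1340-3168, 10002742502, AN00351181

An Extended HMM Composition Method Accounting for Power Variance
南泰浩; 古井貞熙
Published 1996-09-01, Proceedings of the ASJ Meeting, Vol. 1996, No. 2, pp. 141-142, Japanese, 1340-3168, 10002739836, AN00351181

Evaluation of HMM Adaptation Methods for Speech Containing Noise and Distortion
南泰浩
Published 1996, Proceedings of the ASJ Meeting, pp. 85-86, 10004085867, AN00351181

Evaluation of HMM Adaptation Methods for Speech Containing Noise and Distortion
南泰浩
Published 1996, Proc. ASJ Meeting, Vol. 2, 10004086017, AN00351181

HMM Adaptation Using Maximum Likelihood Estimation
南泰浩; 古井貞煕
Published 1995-09-01, Proceedings of the ASJ Meeting, Vol. 1995, No. 2, pp. 1-2, Japanese, 1340-3168, 10002734243, AN00351181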

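Likelihood-Maximizing Adaptation Based on HMM Composition
南泰浩; 古井貞煕
This paper describes a new HMM composition method that adapts HMMs to additive noise and multiplicative distortion. Conventional HMM composition could adapt only to additive noise. The proposed method estimates the multiplicative distortion by maximizing the likelihood of the composed HMMs for the adaptation speech, and then adapts to that distortion. Within this framework, adaptation to the S/N ratio, which was a problem in conventional HMM composition, can also be formulated as part of the adaptation to multiplicative distortion. Phoneme recognition experiments confirmed that the method improves recognition rates for speech containing both additive noise and multiplicative distortion., IEICE, Published 1995-06-22, IEICE Technical Report, SP (Speech), Vol. 95, No. 122, pp. 45-50, Japanese, 110003296453, AN10013221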
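
HMM composition combines a clean-speech model and a noise model in the linear spectral domain. A common simplification, the log-add approximation, combines Gaussian means as mu_y = log(exp(mu_s + h) + g * exp(mu_n)). Here the log-domain bias h stands for multiplicative (channel) distortion, and g is a noise gain. The sketch below makes several simplifying assumptions: a single log-spectral channel, scalar Gaussians with a fixed variance, and a grid search in place of the paper's estimation procedure. It only illustrates the idea of choosing h to maximize the likelihood of adaptation data and is not the authors' algorithm.

```python
import math


def compose_mean(clean_mu, noise_mu, h, g=1.0):
    """Log-add approximation: add the distorted clean speech (log bias h) and the
    gain-scaled noise in the linear spectral domain, then return to the log domain."""
    return math.log(math.exp(clean_mu + h) + g * math.exp(noise_mu))


def gaussian_loglik(x, mu, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def estimate_distortion(frames, clean_mu, noise_mu, var, grid, g=1.0):
    """Return the grid value of h that maximizes the likelihood of the adaptation frames."""
    def total_loglik(h):
        mu = compose_mean(clean_mu, noise_mu, h, g)
        return sum(gaussian_loglik(x, mu, var) for x in frames)
    return max(grid, key=total_loglik)


if __name__ == "__main__":
    grid = [i * 0.1 - 2.0 for i in range(41)]      # candidate h values in [-2, 2]
    true_h = grid[25]
    frames = [compose_mean(1.0, 0.0, true_h)] * 5  # synthetic, noise-free frames
    print(estimate_distortion(frames, 1.0, 0.0, 0.5, grid), true_h)
```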

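HMM Adaptation Based on the Maximum Likelihood Principle
南泰浩; 古井貞熙
Published 1995-03-01, Proceedings of the ASJ Meeting, Vol. 1995, No. 1, pp. 61-62, Japanese, 1340-3168, 10002731969, AN00351181

A Study of a Two-Stage LR Parser Considering Semantics for Spontaneous Speech Recognition
南泰浩
Published 1993, Proc. ASJ Meeting, pp. 69-70, 10006730761, AN00351181

ATREUS: Comparison of Continuous Speech Recognition Methods in ATS
山口耕市; 嵯峨山茂樹; 服部浩明; 小森康弘; 沢井秀文; 花沢利行; 中村哲; 甲斐充彦; 南泰浩
Published 1992-10, Proceedings of the ASJ Meeting, Vol. 1992, No. Autumn Pt 1, pp. 181-182, Japanese, 1340-3168, 200902051842663488

Concatenated Training of HMMs Using a Speaker-Independent Continuous Speech Database
南泰浩
Published 1992, Proc. ASJ Meeting, pp. 9-10, 10006764155, AN00351181

Phrase Speech Recognition Using TDNN Phoneme Spotting and an Extended LR Parser
南泰浩
Published 1989, Proc. ASJ Autumn Meeting 1989, Vol. 3, 10006754381

Books and Other Publications

The AI Project "Can a Robot Get into the University of Tokyo?": Achievements and Limits of the Third AI Boom
Academic book, Japanese, co-authored, University of Tokyo Press, Published 2018-09-28

Reinforcement Learning from Now On
Academic book, Japanese, co-authored, Morikita Publishing, Published 2016-10-27

Predicting User Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and Prediction Models
R. Higashinaka; Y. Minami; K. Dohsaka; T. Meguro
English, co-authored, Published 2010

Dialogue Control by POMDP Using Dialogue Data Statistics
Y. Minami; A. Mori; T. Meguro; R. Higashinaka; K. Dohsaka; E. Maeda
English, co-authored, Published 2010

Recommendations for Ambient Intelligence: A New Paradigm for the Information Society
外村佳伸; 前田英作; 竹内郁雄; 東浩紀; 石黒浩; 下條信輔; 堂坂浩二; 南泰浩; 中島秀之; 輿水大和
Japanese, co-authored, Published 2008

Fundamentals of Speech Recognition (Vol. 1)
古井貞煕; 鹿野清宏; 嵯峨山茂樹; 松岡達雄; 南泰浩; 松井知子; 高橋敏; 山田智一; 吉岡理
Japanese, co-translated, Published 1995

FM7 Analysis Manual Phase III
菊地寿; 蓑原辰夫; 南泰浩
Japanese, co-authored, Published 1984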

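Presentations and Talks

Construction and Analysis of a System That Performs a Collaborative Figure-Arrangement Task
齋藤結; 東中竜一郎; 南泰浩
The 30th Annual Meeting of the Association for Natural Language Processing
Presented 2024-03-13

Suppressing Schema-Based Hallucination of Language Models in Dialogue State Tracking
佐藤明智; 南泰浩
The 30th Annual Meeting of the Association for Natural Language Processing
Presented 2024-03-12

Academic Paper Recommendation Based on Aspects of Abstracts for Presenting Recommendation Reasons
小林恵大; QI YANG; 成松宏美; 南泰浩
The 30th Annual Meeting of the Association for Natural Language Processing
Presented 2024-03

Cross-Domain Comparison of the Citation-Worthiness Judgment Task for Supporting Paper Writing
小山康平; 小林恵大; 成松宏美; 南泰浩
The 29th Annual Meeting of the Association for Natural Language Processing
Presented 2023-03-16

Building an Infant Vocabulary Acquisition Model Based on Multidimensional Human Mental Representations
藤田守太; 南泰浩
The 29th Annual Meeting of the Association for Natural Language Processing
Presented 2023-03-16

Building a Chit-Chat Dialogue Model Using a Knowledge Graph and Wikipedia
郭恩孚; 南泰浩
The 29th Annual Meeting of the Association for Natural Language Processing
Presented 2023-03-14

Analysis of the Usefulness of Naming in Building Common Ground
齋藤結; 光田航; 東中竜一郎; 南泰浩
The 29th Annual Meeting of the Association for Natural Language Processing
Presented 2023-02-15

Building a Chit-Chat Dialogue System That Considers Topic Continuation and Persona
佐藤明智; 南泰浩; 金子俊太; 谷口伊織; 郭
JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)
Presented 2022-12

Analysis of Naming in the Process of Building Common Ground in Dialogue
齋藤結; 光田航; 東中竜一郎; 南泰浩
Oral presentation (general), Japanese, The 28th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2022-03-15

Performance Evaluation of Models and Analysis of Data Validity in the Citation-Worthiness Judgment Task
小山康平; 小林恵大; 成松宏美; 南泰浩
Oral presentation (general), Japanese, The 28th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2022-03-15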

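Building a Dataset for Paper-Writing Support by Extracting Related-Work Sections and Citation Information from Academic Paper PDFs
小林恵大; 小山康平; 成松宏美; 南泰浩
Oral presentation (general), Japanese, The 28th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2022-03-15

Building a Transformer-Based Chit-Chat Dialogue Model Focusing on Proper Nouns
郭恩孚; 南泰浩
Oral presentation (general), Japanese, The 28th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2022-03-15

Citation-Worthiness Judgment with BERT and Error Analysis
堂坂浩二; 成松宏美; 小山康平; 東中竜一郎; 南泰浩; 田盛大悟; 平
Annual Conference of the Japanese Society for Artificial Intelligence
Presented 2021-06

An Infant Vocabulary Acquisition Model Using Deep Reinforcement Learning That Accounts for Mutual Exclusivity
藤田守太; 南泰浩; 田口真輝
Oral presentation (general), Japanese, The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021), domestic conference
Presented 2021-01-17

Estimating Cited Papers to Support Writing Related-Work Sections of Academic Papers
小山康平; 南泰浩; 成松宏美; 堂坂浩二; 東中竜一郎; 田盛大悟; 平博順
Oral presentation (general), Japanese, The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021), domestic conference
Presented 2021-01-17

Task Design and Data Construction for Supporting the Writing of Related-Work Sections in Academic Papers
成松宏美; 小山康平; 堂坂浩二; 田盛大悟; 東中竜一郎; 南泰浩; 平博順
Oral presentation (general), Japanese, The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021), domestic conference
Presented 2021-01-17

Modeling Infant Vocabulary Acquisition Using Neural-Network Reinforcement Learning
Oral presentation (general), Japanese, IEICE Technical Committee on Human Communication Science (HCS), domestic conference
Presented 2020-01-26

Characteristics of Language Development in Children in Infant Homes: Analysis of Vocabulary Size, Order of Word Acquisition, and Part-of-Speech Categories
坂本有香; 奥村優子; 南泰浩; 麦谷綾子; 伊藤嘉余子; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Committee on Human Communication Science (HCS), domestic conference
Presented 2020-01-25

Analysis of Regional Differences in Infant Vocabulary Development
坂本有香; 南泰浩; 曹妍; 奥村優子; 小林哲生
Oral presentation (general), Japanese, The 19th Annual Meeting of the Japan Society of Baby Science, domestic conference
Presented 2019-07-06

Characteristics of the Correlation between Boys and Girls in Infant Word-Acquisition Ages Using Multilingual Corpora
藤田浩貴; 南泰浩; 小林哲生; 奥村優子
Oral presentation (general), Japanese, The 24th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2018-03

Improving the Efficiency of Infant Classification for Creating a Short Checklist of Infant Vocabulary Ability
塚田元春; 南泰浩; 小林哲生; 奥村優子
Oral presentation (general), Japanese, The 24th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2018-03

Modeling Infant Vocabulary Acquisition with DRQN
野口輝; 南泰浩
Oral presentation (general), Japanese, The 24th Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2018-03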

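Modeling Infant Vocabulary Acquisition with Neural Networks and Reinforcement Learning
野口輝; 南泰浩
Oral presentation (general), Japanese, IEICE Technical Report (Human Communication Science), domestic conference
Presented 2018-01

A Common Vocabulary Index for Infant Language Development
曹妍; 南泰浩; 奥村優子; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report (Human Communication Science), domestic conference
Presented 2018-01

Comparison of Correlations between Boys and Girls in Infant Word-Acquisition Ages across Languages
藤田浩貴; 南泰浩; 小林哲生; 奥村優子
Oral presentation (general), Japanese, IEICE Technical Report (Human Communication Science), domestic conference
Presented 2018-01

A Short Vocabulary Checklist for Estimating Infants' Abilities
森山佑亮; 南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report (Human Communication Science), domestic conference
Presented 2017

An Effective Feature Integration Method for Next-Utterance Prediction in Multi-Turn Dialogue, and Its Analysis
玉木竜二; 南泰浩
Oral presentation (general), Japanese, IEICE Technical Report (Natural Language Understanding and Models of Communication), domestic conference
Presented 2017

Analysis of Vocabulary Acquisition Phenomena Using Large-Scale Infant Vocabulary Development Data
森山佑亮; 南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report (Human Communication Science), domestic conference
Presented 2017

Effect of Mothers' Years of Education on Infants' Ages (in Days) of Word Comprehension and Production
森山佑亮; 南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report (Human Communication Science), domestic conference
Presented 2016

An Attempt to Collect Infant Vocabulary Development Data with a Vocabulary Checklist App
小林哲生; 奥村優子; 南泰浩
Oral presentation (general), Japanese, IEICE Technical Report, domestic conference
Presented 2016-01

An Attempt to Use a Vocabulary-Growth Recording App with Children with Delayed Language Development
阿久津由紀子; 小林哲生; 小形哲也; 渡辺佐和; 齋藤貴美子; 南泰浩
Oral presentation (general), Japanese, Japan Speech-Language-Hearing Conference, domestic conference
Presented 2016

Simultaneous Speaker and Phoneme Recognition Based on Speech Modeling with a Three-Way Restricted Boltzmann Machine
中鹿亘; 南泰浩
Oral presentation (general), Japanese, IPSJ SIG Technical Report on Music and Computer (MUS)
Presented 2016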

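Effect of Mothers' Years of Education on Infants' Ages (in Days) of Word Comprehension and Production
森山佑亮; 南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report, domestic conference
Presented 2016-01

Cross-Linguistic Commonality and Language Dependence in the Order of Infant Vocabulary Acquisition
南泰浩; 小林哲生
Poster presentation, Japanese, Proceedings of the ASJ Autumn Meeting, domestic conference
Presented 2015-03-17

Methods for Answering English Questions in the National Center Test
東中竜一郎; 杉山弘晃; 磯崎秀樹; 菊井玄一郎; 堂坂浩二; 平博順; 南泰浩
Poster presentation, Japanese, The 21st Annual Meeting of the Association for Natural Language Processing, domestic conference
Presented 2015-03-17

On the Gender Dependence of the Order of Vocabulary Acquisition in Infants
南泰浩; 小林哲生
Poster presentation, Japanese, IEICE Technical Report HCS
Presented 2015-01-30

Developmental Changes in the Composition of Vocabulary Categories in Children Acquiring Japanese
小林哲生; 南泰浩
Oral presentation (general), Japanese, IEICE Technical Report HCS
Presented 2014

On the Effect of Gender on the Order of Infant Vocabulary Acquisition
南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report HCS
Presented 2014

Developmental Changes in Vocabulary Category Composition in 1- to 2-Year-Olds: A Study Using Large-Scale Cross-Sectional Data
小林哲生; 南泰浩
Oral presentation (invited/special), Japanese, The 56th Annual Meeting of the Japanese Association of Educational Psychology (JAEP56)
Presented 2014

A Study of Methods for Estimating Target Age Based on Picture Books
藤田早苗; 小林哲生; 平博順; 南泰浩; 田中貴秋
Oral presentation (invited/special), Japanese, The 28th Annual Conference of the Japanese Society for Artificial Intelligence
Presented 2014

Proposal and Evaluation of a Vocabulary Search System to Support Creating Content for Young Children
小林哲生; 南泰浩
Oral presentation (general), Japanese, IEICE Technical Report HIP
Presented 2013

Verifying Infants' Vocabulary-Learning Characteristics with a Comprehension-Production Index for Early-Appearing Words
南泰浩; 小林哲生; 杉山弘晃
Oral presentation (general), Japanese, IEICE Technical Report TL
Presented 2013

Relationship between the Spoken Phonological Length of Words and Infants' Vocabulary Acquisition Period
南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2013

Effect of Word Embodiment on Individual Differences in Age of Acquisition
杉山弘晃; 小林哲生; 南泰浩
Oral presentation (invited/special), Japanese, The 13th Annual Meeting of the Japan Society of Baby Science
Presented 2013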

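A Vocabulary Search System to Support Creating Content for Young Children: Selecting Words by Specifying Age of Acquisition and Acquisition Rate
小林哲生; 南泰浩
Oral presentation (invited/special), Japanese, The 13th Annual Meeting
Presented 2013

Do Infants Really Acquire Nouns Earlier in Word Learning? A Cross-Linguistic Comparison of the Noun Advantage by Estimating Ages (in Days) of Word Comprehension and Production
南泰浩; 小林哲生; 杉山弘晃
Oral presentation (invited/special), Japanese, The 13th Annual Meeting of the Japan Society of Baby Science
Presented 2013

Verifying the Noun-Learning Advantage with a Comprehension-Production Index for Infants' Early-Appearing Words
南泰浩; 小林哲生
Oral presentation (invited/special), Japanese, The 19th Annual Meeting of the Association for Natural Language Processing
Presented 2013

Estimating the Onset of the Vocabulary Spurt by Piecewise Linear Approximation
南泰浩; 小林哲生; 杉山弘晃
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2012

Statistical Properties and Feature Extraction in the Rapid-Growth Period of Early Vocabulary Development
南泰浩; 小林哲生
Oral presentation (general), Japanese, IEICE Technical Report TL
Presented 2012

Evaluating a POMDP-Based Dialogue Controller for a Listening Agent through Wizard-of-Oz Experiments
目黒豊美; 南泰浩; 東中竜一郎; 堂坂浩二
Oral presentation (invited/special), Japanese, The 26th Annual Conference of the Japanese Society for Artificial Intelligence
Presented 2012

Building a Dialogue Model Using Pairs of Tweets
東中竜一郎; 川前徳章; 貞光九月; 南泰浩; 目黒豊美; 堂坂浩二; 稲垣博人
Oral presentation (invited/special), Japanese, The 18th Annual Meeting of the Association for Natural Language Processing
Presented 2012

Dialogue Control by Inverse Reinforcement Learning Based on Ranking Learning
杉山弘晃; 目黒豊美; 南泰浩
Oral presentation (invited/special), Japanese, The 26th Annual Conference of the Japanese Society for Artificial Intelligence
Presented 2012

Predicting Ages (in Days) of Word Learning Using the Linearity of Vocabulary Learning Speed
杉山弘晃; 小林哲生; 南泰浩
Oral presentation (invited/special), Japanese, The 12th Annual Meeting of the Japan Society of Baby Science
Presented 2012

Creating a Development-Appropriate Vocabulary Search System for Producing Content for Young Children
小林哲生; 南泰浩
Oral presentation (invited/special), Japanese, Annual Meeting of the Japanese Association of Educational Psychology
Presented 2012

Identifying Ages (in Months) of Acquisition of Infants' Early-Appearing Words Using Longitudinal and Cross-Sectional Data
小林哲生; 南泰浩; 永田昌明
Oral presentation (invited/special), Japanese, The 18th Annual Meeting of the Association for Natural Language Processing
Presented 2012

Infants' Vocabulary Learning Speed and Vocabulary Category Composition
小林哲生; 南泰浩; 杉山弘晃
Oral presentation (invited/special), Japanese, The 12th Annual Meeting of the Japan Society of Baby Science
Presented 2012

Verifying a Vocabulary Development Model Based on a Linear Function with Plateau Interruptions: Testing the Poisson-Process Property of Infant Vocabulary Development
南泰浩; 小林哲生; 杉山弘晃
Oral presentation (invited/special), Japanese, The 12th Annual Meeting of the Japan Society of Baby Science
Presented 2012

Estimating Plateau Periods in Vocabulary Development Using a Kalman Filter
南泰浩; 小林哲生; 杉山弘晃
Oral presentation (invited/special), Japanese, ASJ Autumn Meeting
Presented 2012

Modeling Infant Vocabulary Development with a Linear Function and Plateau Interruptions
南泰浩; 小林哲生; 杉山弘晃
Oral presentation (invited/special), Japanese, The 18th Annual Meeting of the Association for Natural Language Processing
Presented 2012

Dialogue Control for a Listening-Agent Dialogue System Using POMDPs
目黒豊美; 東中竜一郎; 南泰浩; 堂坂浩二
Oral presentation (invited/special), Japanese, The 17th Annual Meeting of the Association for Natural Language Processing
Presented 2012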

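POMDP Dialogue Control Using Action Duration Control
南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
Oral presentation (general), Japanese, IPSJ SIG Technical Report HCI
Presented 2011

Summarizing Call-Center Dialogues with HMMs Using Shared States and Concatenated Training
東中竜一郎; 南泰浩; 西川仁; 堂坂浩二; 目黒豊美; 小橋川哲; 政瀧浩和; 吉岡理; 高橋敏; 菊井玄一郎
Oral presentation (invited/special), Japanese, The 17th Annual Meeting of the Association for Natural Language Processing
Presented 2011

POMDP-Based Dialogue Control Using Action Duration Control
南泰浩; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
Oral presentation (invited/special), Japanese, Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence
Presented 2011

Analysis of Human Policies for Deciding Action Timing for User Support Systems
杉山弘晃; 南泰浩
Oral presentation (invited/special), Japanese, The 28th Annual Conference of the Robotics Society of Japan
Presented 2011

A Thought-Evoking Multi-Party Dialogue System: Camp
堂坂浩二; 南泰浩
Oral presentation (general), Japanese, JSAI SIG on Spoken Language Understanding and Dialogue Processing
Presented 2010

Trigram Dialogue Control Using POMDPs
南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 森啓; 前田英作
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2010

A Question-Generation Strategy for an Interactive Visual Scene Understanding System Based on the Confidence of Its Knowledge
セクホン・ガーバチャン; 木村昭悟; 南泰浩; 坂野鋭; 前田英作
Oral presentation (general), Japanese, IEICE Technical Report IBISML
Presented 2010

Adaptive Adjustment of Agent Utterance Behavior in Spoken Dialogue
堂坂浩二; 金本淳志; 東中竜一郎; 南泰浩; 前田英作
Oral presentation (invited/special), Japanese, Annual Conference of the Japanese Society for Artificial Intelligence
Presented 2010

Analysis of a Statistical POMDP-Based Dialogue Control Method Using Dialogue Data
南泰浩; 東中竜一郎; 堂坂浩二; 目黒豊美; 前田英作
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2010

POMDP-Based Dialogue Control Using Statistical Models
南泰浩; 目黒豊美; 東中竜一郎; 森啓; 堂坂浩二; 前田英作
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2010

Analysis of Listening-Oriented Dialogues for Building Listening Agents
目黒豊美; 東中竜一郎; 堂坂浩二; 南泰浩; 磯崎秀樹
Oral presentation (general), Japanese, IPSJ SIG Technical Report NL
Presented 2009

POMDP-Based Dialogue Control Using Dialogue Data Statistics
南泰浩; 森啓; 目黒豊美; 東中竜一郎; 堂坂浩二; 前田英作
Oral presentation (general), Japanese, IPSJ SIG Technical Report SLP
Presented 2009

Evaluation of Unsupervised Adaptation to Japanese Lecture Speech in the Speech Recognition System SOLON
大庭隆伸; 渡部晋治; 石塚健太郎; 藤本雅清; 堀貴明; マックダーモット・エリック; 南泰浩; 中村篤
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2009

Controlling Thought-Evoking Dialogue Using POMDPs
南泰浩; 澤木美奈子; 東中竜一郎; 堂坂浩二
Oral presentation (general), Japanese, IPSJ SIG Technical Report SLP
Presented 2008

Building a Quiz Dialogue System and Evaluating It by Speech Recognition Performance
南泰浩; 東中竜一郎; 澤木美奈子; 堂坂浩二; 山田武士; 松林達史; 磯崎秀樹; 前田英作
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2007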

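A Study of Gaussian Mixture Models in a Kalman-Filter-Based Speech Recognition Method
南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2007

Speech Recognition Using a Kalman Filter
南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2007

An Attempt at Cross-Disciplinary Research toward Ambient Intelligence
前田英作; 南泰浩; 堂坂浩二; 森啓; 近藤公久
Oral presentation (general), Japanese, IEICE Technical Report PRMU
Presented 2006

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese (2006 Edition)
中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
Oral presentation (general), Japanese, IPSJ SIG Technical Report SLP
Presented 2006

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese (Public Release Ver. 1.0)
中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2006

A Feature Trajectory Generation Method for Speech Recognition Using a Kalman Filter
南泰浩; マックダーモット・エリック; 中村篤
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2006

Selecting State-Sharing HMM Structures Using a Bayesian Criterion
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2005

Speech Recognition Using Variational Bayes
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation (general), Japanese, Proceedings of the 8th Workshop on Information-Based Induction Sciences (IBIS)
Presented 2005

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese (Public Release Ver. 1.0)
中村篤; 大庭隆伸; 石塚健太郎; 渡部晋治; 堀貴明; シュスター・マイク; マックダーモット・エリック; 南泰浩
Oral presentation (general), Japanese, IPSJ SIG Technical Report SLP
Presented 2005

Effect of Distortion Compensation in the Speech Feature Extraction Method Spade
石塚健太郎; 宮崎昇; 中谷智広; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2005

Speech Recognition Based on Trajectories Generated by a Kalman Filter
南泰浩
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2004

Representing Dynamics in Speech Recognition
南泰浩
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2004

Proposal of Spade, a Speech Feature Extraction Method Representing In-Band Periodicity and Aperiodicity, and Evaluation of Its Noise Robustness Using AURORA-2J
石塚健太郎; 宮崎昇; 中谷智広; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2004

Evaluation of the Speech Recognition System SOLON on the Corpus of Spontaneous Japanese
渡部晋治; 堀貴明; マクダーモット・エリック; 南泰浩; 中村篤
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2004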

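Ultra-Large-Vocabulary Continuous Speech Recognition by Fast On-the-Fly Composition of WFSTs
堀貴明; 堀智織; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2004

Performance Improvement of a Finite-State-Transducer-Based Decoder
堀貴明; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2004

A Theoretical Study of a Speech Recognition Method Using Feature Trajectories
南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2004

Application of Variational Bayes to Acoustic Model Adaptation
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2004

Evaluation of a Speech Summarization Method Using Finite-State Transducers
堀貴明; 堀智織; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2003

Integrating Speech Recognition, Sentence Formatting, and Summarization with Finite-State Transducers
堀貴明; 堀智織; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2003

Application of Variational Bayes to Speech Recognition
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2003

Learning State-Sharing HMM Structures Based on a Bayesian Approach
渡部晋治; 南泰浩; 中村篤; 上田修功
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2002

Evaluation of a Finite-State-Transducer-Based Recognition Decoder Using Real Dialogue Speech
奈木野豪秀; ヴィレット・ダニエル; 南泰浩; 中村篤; マクダーモット・エリック; 宮崎昇; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 2002

Speech Recognition Based on Segment Models
南泰浩
Oral presentation (general), Japanese, IPSJ SIG on Spoken Language Processing (SIG-SLP)
Presented 2002

Integrating Speech Recognition and Sentence Formatting with Finite-State Transducers
堀貴明; ヴィレット・ダニエル; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2002

Evaluation of a Speech Recognition Method Using Trajectory Parameter Generation with Mixture-Density HMMs
南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2002

A Speech Recognition Method Using Trajectory Parameter Generation Based on the Relationship between Static and Dynamic Features
南泰浩; マクダーモット・エリック; 中村篤; 片桐滋
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2002

Evaluation of Binaural Sound Source Separation by Speech Recognition
中谷智広; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2002

On-Line Transducer Composition for Memory-Efficient Search in LVCSR
ヴィレット・ダニエル; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2002

Pervasive Unsupervised Adaptation for Off-Line Lecture Speech Transcription
ウィレット・ダニエル; ニスラー・トーマス; マクダーモット・エリック; 南泰浩; 片桐滋
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2002

Language Model Synchronization for Improved Beam-Search Performance in Large Vocabulary Continuous Speech Recognition
ヴィレット・ダニエル; マックダーモット・エリック; 南泰浩; 片桐滋
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 2002

A Time-Synchronous Viterbi-Decoder for Arbitrary Speech Recognition Tasks Defined by Finite State Transducers
ヴィレット・ダニエル; マックダーモット・エリック; 南泰浩; 中村篤; 片桐滋
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2001-10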

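An Efficient Search Method Using a Network Structure for Continuous Speech Recognition
花沢健; 南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 2001-03

Toward Practical Use of Speaker Recognition Technology
松井知子; 吉岡理; 南泰浩
Oral presentation (general), Japanese, ITE Technical Report, Technical Group on Multimedia Information Processing
Presented 1998

An Extended HMM Composition Method Accounting for Power Variance
南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Meeting
Presented 1997

Recognition of Connected Digits in Spontaneous Speech
ボッシュ・エティエン; 南泰浩; ガジク・ボヤナ; 松岡達雄; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1997

Evaluation of Speech Recognition Performance Degradation for a Moving Speaker in Anechoic Conditions
ジロン・フランク; 田中雅史; 古家賢一; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1997

Evaluation of HMM Adaptation Methods for Speech Containing Noise and Distortion
南泰浩; 高木幸一; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1997

HMM Adaptation Based on the Maximum Likelihood Principle
南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1996-03

An Extended HMM Composition Method Accounting for Power Variance
南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1996

Evaluation of HMM Adaptation Methods for Speech Containing Noise and Distortion
南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1996

Likelihood-Maximizing Adaptation Based on HMM Composition
南泰浩; 古井貞煕
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1995

HMM Adaptation Using Maximum Likelihood Estimation
南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1995

Adaptation of Acoustic Models to Line Characteristics for Telephone Speech Recognition
松岡達雄; グロ・ピエールエマニエル; 南泰浩; 古井貞煕
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1995

Building a Multimodal Dialogue System for Telephone Directory Assistance and Evaluating Its Speech Input
吉岡理; 南泰浩; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1994

Evaluation of Speech Input in a Multimodal Dialogue System for Telephone Directory Assistance
吉岡理; 南泰浩; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1994

A State-Duration Constraint Algorithm for HMM Trellis Computation
高橋敏; 南泰浩; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1994

A State-Duration Constraint Algorithm for HMM Trellis Computation
高橋敏; 南泰浩; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1993-10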

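Improving Phoneme HMMs for Large-Vocabulary Spontaneous Speech Recognition
高橋敏; 南泰浩; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1993

Methods and Performance of ATREUS, the Continuous Speech Recognition System at ATR
永井明人; 山口耕市; 鷹見淳一; 大倉計美; 小坂哲夫; 福沢圭二; 加藤喜永; S. Harald; 村上仁一; 杉山雅英; 嵯峨山茂樹; 保坂順子; 森元逞; 北研二; 服部浩明; 小森康弘; 沢井秀文; 花沢利行; 中村哲; 甲斐充彦; 南泰浩; 川端豪; 鹿野清宏; 榑松明
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1993

Building a Multimodal Dialogue System for Telephone Directory Assistance
吉岡理; 南泰浩; 山田智一; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1993

A Speaker-Independent Large-Vocabulary Continuous Speech Recognition Method for Spontaneous Speech
南泰浩; 鹿野清宏; 高橋敏; 山田智一
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1993

Speaker-Independent Large-Vocabulary Continuous Speech Recognition Using Context-Dependent Phoneme HMMs and Candidate Merging
南泰浩; 高橋敏; 鹿野清宏; 山田智一
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1993

A Study of a Two-Stage LR Parser Considering Semantics for Spontaneous Speech Recognition
南泰浩; 山田智一; 吉岡理; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1993

Large-Vocabulary Continuous Speech Recognition in Noise by HMM Composition
南泰浩; フランクマルタン; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1993

Recognition of Noisy Speech by Composing an Ergodic Noise HMM with Phoneme HMMs
マルタン・フランク; 鹿野清宏; 南泰浩
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1993

An Attempt at Recognizing Spontaneous Speech for Directory Assistance
鹿野清宏; 南泰浩; 山田智一
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1993

A Label-Free Evaluation Method for HMMs in Phoneme Recognition
南泰浩; 松岡達雄; 鹿野清宏
Oral presentation (general), Japanese, Continuous Speech Recognition Symposium (SPREC)
Presented 1992

Phoneme HMMs Using Inter-Frame Correlation
高橋敏; 南泰浩; 松岡達雄; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1992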

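Evaluation of Concatenated-Training HMMs Using a Speaker-Independent Continuous Speech Database
南泰浩; 松岡達雄; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1992

A Large-Vocabulary Continuous Speech Recognition Algorithm for Directory Assistance
南泰浩; 山田智一; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1992

Recognition of Noisy Speech by Composition of Hidden Markov Models
マルタン・フランク; 鹿野清宏; 南泰浩; 岡部洋一
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1992

Continuous-Density Phoneme HMMs Using Inter-Frame Correlation
高橋敏; 南泰浩; 松岡達雄; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1992

Evaluation of Various Speaker-Independent HMMs on the ASJ Continuous Speech Database
南泰浩; 高橋敏; 松岡達雄; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1992

Concatenated Training of HMMs Using a Speaker-Independent Continuous Speech Database
南泰浩; 松岡達雄; 鹿野清宏
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1992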

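Recognition of Noisy Speech by Using the Composition of Hidden Markov Models
マルタン・フランク; 鹿野清宏; 南泰浩; 岡部洋一
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Autumn Meeting
Presented 1992

A Study of the Noise Robustness of HMM Speech Recognition Using Separate Vector Quantization
片岡淳; 南泰浩; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the IEICE Spring National Convention
Presented 1992

Large-Vocabulary Word Recognition Using TDNN Phoneme Spotting and a Predictive LR Parser
南泰浩; 沢井秀文; 宮武正典; 鹿野清宏
Oral presentation (general), Japanese, IEICE Technical Report SP
Presented 1990

A Frame-Rate-Selective PARCOR Vocoder Exploiting the Stationarity of Voiced Segments
佐藤正俊; 南泰浩; 水井潔; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the IEICE Spring National Convention
Presented 1990

Automatic Word Classification in Continuous Word Recognition Using a Trigram Model
神田直之; 南泰浩; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1990

A Study of the Noise Robustness of HMM Speech Recognition Using Separate Vector Quantization
片岡淳; 南泰浩; 中川正雄
Oral presentation (general), Japanese, The 12th Symposium on Information Theory and Its Applications
Presented 1989

A Variable-Bit-Rate Hybrid ADPCM/PARCOR Music Coding Scheme
岩元直久; 南泰浩; 中川正雄
Oral presentation (general), Japanese, The 12th Symposium on Information Theory and Its Applications
Presented 1989

A Variable-Frame-Rate PARCOR Vocoder Using Vector Quantization
佐藤正俊; 南泰浩; 水井潔; 中川正雄
Oral presentation (general), Japanese, The 12th Symposium on Information Theory and Its Applications
Presented 1989

Speeding Up HMM-Based Continuous Speech Recognition
南泰浩; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the ASJ Spring Meeting
Presented 1988

Continuous Speech Recognition with Added Syntactic Parsing and a Production System
南泰浩; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the IEICE Spring National Convention
Presented 1988

Detection and Removal of Abnormal Signals Using an ADF and a Production System
南泰浩; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the IEICE Autumn National Convention
Presented 1988

Detection and Removal of Abnormal Signals Using an ADF and a Production System
南泰浩; 中川正雄
Oral presentation (invited/special), Japanese, Proceedings of the IEICE Autumn National Convention
Presented 1987-03

Building a Spoken Chit-Chat Dialogue System Using Multiple Response Generation Models and a Study of Its Dialogue Selection Method
佐藤明智; 南泰浩; 郭恩孚
Annual Conference of the Japanese Society for Artificial Intelligence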

Courses Taught

Human Interface
The University of Electro-Communications

Internship 2 (Overseas, Long-Term)
The University of Electro-Communications

Internship 2 (Overseas)
The University of Electro-Communications

Internship 2 (Long-Term)
The University of Electro-Communications

Internship 1 (Overseas, Long-Term)
The University of Electro-Communications

Internship 1 (Overseas)
The University of Electro-Communications

Internship 1 (Long-Term)
The University of Electro-Communications

Cognitive Interaction Design
Kyoto Institute of Technology

Internship 2
The University of Electro-Communications

Internship 1
The University of Electro-Communications

Joint Seminar on Fundamentals of Information Systems
The University of Electro-Communications

Foundations of Information Systems 1
The University of Electro-Communications

Advanced Topics in Applied Informatics 4
Gifu University

Network Technology and the Advanced Information Society
Osaka University

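Professional Memberships

Acoustical Society of Japan

IEEE

Information Processing Society of Japan

The Institute of Electronics, Information and Communication Engineers

The Association for Natural Language Processing

Joint Research and Competitive Funding Projects

Elucidating the Vocabulary Development Process through the Collection and Engineering Analysis of Large-Scale Infant Vocabulary Development Data
南泰浩
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research: Grant-in-Aid for Scientific Research (B), The University of Electro-Communications, Grant-in-Aid for Scientific Research (B), 23H00623
Research period 2023-04 - 2027-03

Analysis of Language Development Mechanisms Using Comprehensive Data Obtained by Large-Scale Data Processing, and Its Applications
南泰浩
Principal investigator
Research period 2017-04-01 - 2020-03-31

Creating a Co-Creative Society through Human-Robot Symbiosis ("Human-Robot Symbiosis Studies"): Generating Communication Strategies for Robots
Research period 2009-10-01 - 2013-03-31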

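Industrial Property Rights

Vocabulary Development Index Estimation Device, Vocabulary Development Index Estimation Method, and Program
Patent, 南泰浩, 小林哲生, Patent No. 7213509, Registered: 2023-01-19

Ability Estimation Device, Word Selection Device, Methods Thereof, and Program
Patent, 南泰浩, 森山佑亮, 小林哲生, Application No. 2017-138791, Filed: 2017-07-18, Patent No. 6850218, Issued: 2021-03-31

Infant Word Search Device, Method, and Program
Patent, 2012119556, Filed: 2012, 5806642, Issued: 2015-09-11

Vocabulary Learning Speed Prediction Parameter Generation Device, Vocabulary Learning Speed Prediction Device, Methods Thereof, and Program
Patent, 2012119555, Filed: 2012, 5785905, Issued: 2015-07-31

Difficulty Learning Device, Difficulty Estimation Model Learning Device, Difficulty Estimation Device, Method, and Program
Patent, Application No. 2015-031004, Filed: 2015-02-19

Difficulty Estimation Model Learning Device, Difficulty Estimation Device, Method, and Program
Patent, Application No. 2015-031000, Filed: 2015-02-19

Difficulty Estimation Formula Learning Device, Difficulty Estimation Device, Method, and Program
Patent, Application No. 2015-030997, Filed: 2015-02-19

Word Presentation Device, Computation Device, Methods Thereof, and Program
Patent, Application No. 2014-256876, Filed: 2014-12-19

Word Presentation Device, Method, and Program
Patent, Application No. 2014-255495, Filed: 2014-12-17

Utterance Candidate Generation Device, Method, and Program
Patent, 2013035865, Filed: 2013

Infant Vocabulary Comprehension Difficulty Evaluation Device, Infant Vocabulary Search Device, Infant Vocabulary Classification Device, Methods Thereof, and Program
Patent, 2013024274, Filed: 2013

Reward Function Estimation Device, Reward Function Estimation Method, and Program
Patent, 2012096453, Filed: 2012

Comprehended-Word Age (in Months) Table Generation Device, Target Age Estimation Device, Method, and Program
Patent, 2012128334, Filed: 2012

Vocabulary Learning Function Estimation Device, Vocabulary Learning Function Estimation Method, and Program Thereof
Patent, 2012192939, Filed: 2012

Vocabulary Learning Function Estimation Device, Vocabulary Learning Function Estimation Method, and Program Thereof
Patent, 2012192938, Filed: 2012

Feature Detection Device, Feature Detection Method, and Program Thereof
Patent, 2012192937, Filed: 2012

Vocabulary Learning Curve Parameter Estimation Device, Method, and Program
Patent, 2012029951, Filed: 2012

Vocabulary Learning Curve Parameter Estimation Device, Method, and Program
Patent, 2012029950, Filed: 2012

Infant Word Search Device, Method, and Program
Patent, 2012119556, Filed: 2012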

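Dialogue Learning Device, Summarization Device, Dialogue Learning Method, Summarization Method, and Program
Patent, 2010179330, Filed: 2011, 5346327

Vocabulary Learning Speed Estimation Device, Method, and Program
Patent, 2012029949, Filed: 2011

Vocabulary Learning Curve Parameter Estimation Device, Method, and Program
Patent, 2012029950, Filed: 2011

Motion Control Device for a Communication Agent, Motion Control Method for a Communication Agent, and Program Thereof
Patent, 2011139777, Filed: 2011

Dialogue Model Construction Device
Patent, 2011110989, Filed: 2011

Context Dependency Estimation Device, Utterance Clustering Device, Method, and Program
Patent, 2011184054, Filed: 2011

Action Timing Determination Device, Action Timing Determination Method, and Program Thereof
Patent, 2011035826, Filed: 2011

Vocabulary Spurt Timing Estimation Device, Method, and Program
Patent, 2011066456, Filed: 2011

Vocabulary Spurt Timing Estimation Device, Method, and Program
Patent, 2011060851, Filed: 2011

Dialogue Evaluation Device, Method, and Program
Patent, 2011110989, Filed: 2011

Behavior Control Device, Behavior Control Method, and Behavior Control Program
Patent, 2011050493, Filed: 2011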

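Action Timing Determination Method and Program Thereof
Patent, 2010203895, Filed: 2010, 5361832

Behavior Control Device, Behavior Control Method, and Behavior Control Program
Patent, 2010272627, Filed: 2010, 5427163

Summarization Device, Summary Creation Method, and Program
Patent, 2010271397, Filed: 2010

Dialogue Learning Device, Dialogue Analysis Device, Dialogue Learning Method, and Dialogue Analysis Method
Patent, 2010126882, Filed: 2010

An Attempt at Dynamic Learning Strategies for Interactive Visual Scene Recognition and Understanding
Patent, 2011017057, Filed: 2010

Multi-Party Thought-Evoking Dialogue Device, Multi-Party Thought-Evoking Dialogue Method, Multi-Party Thought-Evoking Dialogue Program, and Computer-Readable Recording Medium Storing the Program
Patent, 2010186237, Filed: 2010

Device for Determining Personality Traits from Dialogue
Patent, 2009215267, Filed: 2009, 5281527

Listening-Oriented Dialogue Identification Device
Patent, 2009192875, Filed: 2009, 5150583

Multi-Party Thought-Evoking Dialogue Device, Multi-Party Thought-Evoking Dialogue Method, Multi-Party Thought-Evoking Dialogue Program, and Computer-Readable Recording Medium Storing the Program
Patent, 2009028605, Filed: 2009, 5218514

Behavior Control Learning Method, Behavior Control Learning Device, and Behavior Control Learning Program
Patent, 2009199376, Filed: 2009, 5361615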

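Speech Signal Modeling Method, Signal Recognition Device and Method, Parameter Learning Device and Method, Feature Generation Device and Method, and Program
Patent, 200949901, Filed: 2009

Ability Estimation Device, Method, and Program
Patent, Application No. 2020-193982

Word Selection Device, Method, and Program
Patent, Application No. 2020-193983

Vocabulary Development Index Estimation Device, Vocabulary Development Index Estimation Method, and Program
Patent, Application No. 2019-006697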
