TAKAGI KAZUYUKI

Department of Informatics, Assistant Professor
Cluster I (Informatics and Computer Engineering), Assistant Professor
  • Profile:
    1992--1996 Topic structure of spoken dialogue and surface prosodic features
    1997--2000 Automatic recognition of broadcast news speech
    2000--2004 Use of prosodic information in syntactic analysis of sentence and automatic speech recognition
    2004--2008 Robust speech recognition under noisy environments
    2008--2010 Clustering and typifying of multi-lingual speech by acoustic analysis
    2011-- Automatic language identification with spoken utterance
    2017--2019 Automatic detection of water leakage sound with underwater sound analysis

Degree

  • Doctor of Engineering, University of Tsukuba

Research Keyword

  • text-to-speech
  • automatic speech recognition
  • spoken language identification

Field Of Study

  • Informatics, Human interfaces and interactions
  • Informatics, Intelligent informatics
  • Informatics, Perceptual information processing

Career

  • 01 Apr. 2016
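    Graduate School of Informatics and Engineering, School of Informatics and Engineering, Assistant Professor
  • 01 Apr. 2010
    The University of Electro-Communications, Graduate School of Informatics and Engineering / Faculty of Informatics and Engineering, Assistant Professor
  • 01 Apr. 2007 - 31 Mar. 2010
    The University of Electro-Communications, Faculty of Electro-Communications, Department of Computer Science, Assistant Professor
  • Apr. 1995
    The University of Electro-Communications, Faculty of Electro-Communications, Department of Computer Science, Assistant Researcher
  • Apr. 1989 - Mar. 1992
    IBM Japan, Software Development Engineer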

Educational Background

  • 01 Apr. 1992 - 31 Mar. 1995
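    University of Tsukuba, Graduate School, Division of Engineering, Major in Electronics and Information Engineering
  • 01 Apr. 1987 - 31 Mar. 1989
    University of Tsukuba, Graduate School, Division of Science and Engineering, Major in Science and Engineering
  • 01 Apr. 1983 - 31 Mar. 1987
    University of Tsukuba, Third Cluster of College, College of Information Sciences, Major in Information Science
  • 01 Apr. 1979 - 31 Mar. 1982
    Tokyo Metropolitan Musashigaoka High School, General Course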

Paper

  • Development of a Capsule Type Leak Detection Device for Pipeline
    Asano Isamu; Mori Mitsuhiro; Takagi Kazuyuki; Haneda Yoichi; Kawakami Akihiko; Kawabe Syohei
    Water, Land and Environmental Engineering, The Japanese Society of Irrigation, Drainage and Rural Engineering, 86, 6, 31-36, Jun. 2018, Peer-reviewed
    Scientific journal, Japanese
  • Referential reconstruction in complex frequency domain for word recognition under noisy environments
    Takehiro Ihara; Kazuyuki Takagi; Kazuhiko Ozeki
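    The Journal of the Acoustical Society of Japan, The Acoustical Society of Japan (ASJ), 64, 9, 533-544, Sep. 2008, Peer-reviewed, This paper describes a method for restoring the original speech from a single-channel signal in which noise is superimposed on speech, in order to improve automatic speech recognition performance. The authors have previously proposed a method that prepares a small speech database in advance, extracts from the database the frames that are similar to the input frame under a certain measure, and obtains the output with reference to the extracted frames. This paper reports further improvements to the similarity measure and the output method. The two key improvements are retaining the phase information after the short-time Fourier transform as it is, and applying a binary mask to it. In word recognition experiments conducted for performance evaluation using instrumental music noise and environmental noise, word accuracy improved at low SNRs.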
    Scientific journal, Japanese
  • The use of overlapped sub-bands in multi-band, multi-SNR, multi-path recognition of noisy word utterances
    Yutaka Tsuboi; Takehiro Ihara; Kazuyuki Takagi; Kazuhiko Ozeki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E91D, 6, 1774-1782, Jun. 2008, Peer-reviewed, A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
    Scientific journal, English
  • Dependency analysis of spontaneous monologue speech using pause and F0 information: a preliminary study
    Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of International Conference of Speech Prosody 2006, PS5-20, May 2006, Peer-reviewed
    International conference proceedings, English
  • Japanese dependency structure analysis using information about multiple pauses and F-0
    MR Lu; K Takagi; K Ozeki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E89D, 1, 298-304, Jan. 2006, Peer-reviewed, Syntax and prosody are closely related to each other. This paper is concerned with the problem of exploiting pause information for recovering dependency structures of read Japanese sentences. Our parser can handle both symbolic information such as dependency rules and numerical information such as the probability of dependency distance of a phrase in a unified way as linguistic information. In our past work, the post-phrase pause that immediately succeeds a phrase in question was employed as prosodic information. In this paper, we employed two kinds of pauses in addition to the post-phrase pause: the post-post-phrase pause that immediately succeeds the phrase that follows a phrase in question, and the pre-phrase pause that immediately precedes a phrase in question. By combining the three kinds of pause information linearly with the optimal combination weights that were determined experimentally, the parsing accuracy was improved compared to the case where only the post-phrase pause was used as in our previous work. Linear combination of pause and fundamental frequency information yielded further improvement of parsing accuracy.
    Scientific journal, English
  • Sentence compression using statistical information about dependency path length
    Kiwamu Yamagata; Satoshi Fukutomi; Kazuyuki Takagi; Kazuhiko Ozeki
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 4188, 127-134, 2006, Peer-reviewed, This paper is concerned with the use of statistical information about dependency path length for sentence compression. The sentence compression method employed here requires a quantity called inter-phrase dependency strength. In the training process, original sentences are parsed, and the number of tokens is counted for each pair of phrases, connected with each other by a dependency path of certain length, that survive as a modifier-modified phrase pair in the corresponding compressed sentence in the training corpus. These statistics are exploited to estimate the inter-phrase dependency strength required in the sentence compression process. Results of subjective evaluation show that the present method outperforms the conventional one of the same framework where the distribution of dependency distance is used to estimate the inter-phrase dependency strength.
    Scientific journal, English
  • Automatic adjustment of subband likelihood recombination weights for improving noise-robustness of a multi-SNR multi-band speaker identification system
    K Yoshida; K Takagi; K Ozeki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E87D, 11, 2453-2459, Nov. 2004, Peer-reviewed, This paper is concerned with improving noise-robustness of a multi-SNR multi-band speaker identification system by introducing automatic adjustment of subband likelihood recombination weights. The adjustment is performed on the basis of subband power calculated from the noise observed just before the speech starts in the input signal. To evaluate the noise-robustness of this system, text-independent speaker identification experiments were conducted on speech data corrupted with noises recorded in five environments: "bus," "car," "office," "lobby," and "restaurant". It was found that the present method reduces the identification error by 15.9% compared with the multi-SNR multi-band method with equal recombination weights at 0 dB SNR. The performance of the present method was compared with a clean fullband method in which speaker model training is performed on clean speech data, and spectral subtraction is applied to the input signal in the speaker identification stage. When the clean fullband method without spectral subtraction is taken as a baseline, the multi-SNR multi-band method with automatic adjustment of recombination weights attained 56.8% error reduction on average, while the average error reduction rate of the clean fullband method with spectral subtraction was 11.4% at 0 dB SNR.
    Scientific journal, English
  • Dependency analysis of read Japanese sentences using pause and F0 information: a speaker independent case
    Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of ICSLP2004 (8th International Conference on Spoken Language Processing), 3021-3024, Oct. 2004, Peer-reviewed
    International conference proceedings, English
  • Improved model training and automatic weight adjustment for multi-SNR multi-band speaker identification system
    Kenichi Yoshida; Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of ICSLP2004 (8th International Conference on Spoken Language Processing), 3, 1749-1752, Oct. 2004
    International conference proceedings, English
  • Dependency analysis of read Japanese sentences using pause information: a speaker independent case
    Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of International Conference of Speech Prosody 2004, 595-598, Mar. 2004, Peer-reviewed
    International conference proceedings, English
  • Recovery of Japanese dependency structure using multiple pause information
    Lu Meirong; Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of International Conference of Speech Prosody 2004, 513-516, Mar. 2004, Peer-reviewed
    International conference proceedings, English
  • A neural network approach to dependency analysis of Japanese sentences using prosodic information
    Kazuyuki Takagi; Mamiko Okimoto; Yasuo Ogawa; Kazuhiko Ozeki
    Proceedings of EUROSPEECH2003, 3177-3180, Sep. 2003, Peer-reviewed
    International conference proceedings, English
  • The use of multiple pause information in dependency analysis of spoken Japanese sentences
    Lu Meirong; Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of EUROSPEECH2003, 3173-3176, Sep. 2003, Peer-reviewed
    International conference proceedings, English
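  • Speaker identification under noisy environments using a multi-SNR subband model
    Kenichi Yoshida; Kazuyuki Takagi; Kazuhiko Ozeki
    The Journal of the Acoustical Society of Japan, 59, 1, 3-12, Jan. 2003, Peer-reviewed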
    Scientific journal, Japanese
  • Combination of pause and F0 information in dependency analysis of Japanese sentences
    Kazuyuki Takagi; Hajime Kubota; Kazuhiko Ozeki
    Proc. of Interspeech 2002, 2, 1173-1176, Sep. 2002, Peer-reviewed
    International conference proceedings, English
  • Evaluation of a Japanese sentence compression method based on phrase significance and inter-phrase dependency
    Rei Oguro; Hiromi Sekiya; Yuhei Morooka; Kazuyuki Takagi; Kazuhiko Ozeki
    Lecture Notes in Artificial Intelligence, subseries of Lecture Notes in Computer Science, 2448, 27-32, Sep. 2002, Peer-reviewed
    Scientific journal, English
  • Effectiveness of word string language models on noisy broadcast news speech recognition
    K Takagi; R Oguro; K Ozeki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E85D, 7, 1130-1137, Jul. 2002, Peer-reviewed, Experiments were conducted to examine an approach from the language modeling side to improving noisy speech recognition performance. By adopting appropriate word strings as new units of processing, speech recognition performance was improved by acoustic effects as well as by test-set perplexity reduction. Three kinds of word string language models were evaluated, whose additional lexical entries were selected based on combinations of part of speech information, word length, occurrence frequency, and log likelihood ratio of the hypotheses about the bigram frequency. All of the three word string models reduced errors in broadcast news speech recognition, and also lowered test-set perplexity. The word string model based on log likelihood ratio exhibited the best improvement for noisy speech recognition, by which deletion errors were reduced by 26%, substitution errors by 9.3%, and insertion errors by 13%, in the experiments using the speaker-dependent, noise-adapted triphone. Effectiveness of word string models on error reduction was more prominent for noisy speech than for studio-clean speech.
    Scientific journal, English
  • Recovery of Japanese dependency structure using prosodic information
    Kazuhiko Ozeki; Kazuyuki Takagi; Hajime Kubota
    Proceedings for 2001 2nd Plenary Meeting and Symposium on Prosody and Speech Processing, 169-174, Jan. 2002
    International conference proceedings, English
  • Effectiveness of prosodic features in dependency analysis of read Japanese sentences
    Yukiyoshi Hirose; Kazuhiko Ozeki; Kazuyuki Takagi
    Journal of Natural Language Processing, The Association for Natural Language Processing, 8, 4, 71-89, Oct. 2001, Peer-reviewed, Prosody contains information that is lost when utterances are transcribed into letters or characters. Such information may be useful for syntactic analysis of spoken sentences. In our previous work, we took up 12 prosodic features, and made a statistical model to represent the relationship between those features and dependency distances. Then, using a dependency analyzer that incorporates the model, we have shown that prosodic information is in fact effective for dependency analysis of read Japanese sentences. In the present work, we employed 24 features including new ones, and conducted an extensive search for effective ones. Also, the statistical model was modified to better fit the actual distributions of the feature values. As a result, in open experiments using the ATR 503-sentence database, the correct parsing rate was improved by 21.2% with the use of the prosodic features. This figure is 4.0 points higher than the improvement in the previous experiment of our group. Among the features, the duration of pause was definitely effective in both the open and the closed experiments, while the effectiveness of other features related to the pitch, the power, and the speaking rate, when used together with the duration of pause, was not clear in the open experiments.
    Scientific journal, Japanese
  • The use of prosody in Japanese dependency structure analysis
    Kazuhiko Ozeki; Kazuyuki Takagi; Hajime Kubota
    Proceedings of ISCA Tutorial and Research Workshop on Speech Recognition and Understanding, Red Bank, 123-126, Oct. 2001
    International conference proceedings, English
  • Pause information for dependency analysis of read Japanese sentences
    Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of EUROSPEECH 2001 (Proceedings of 7th European Conference on Speech Communication and Technology), 2, 1041-1044, Sep. 2001, Peer-reviewed
    International conference proceedings, English
  • A multi-SNR subband model for speaker identification under noisy environments
    Ken'ichi Yoshida; Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of EUROSPEECH 2001 (Proceedings of 7th European Conference on Speech Communication and Technology), 4, 2849-2852, Sep. 2001, Peer-reviewed
    International conference proceedings, English
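  • An algorithm for Japanese sentence compaction based on phrase importance and inter-phrase dependency consistency
    Rei Oguro; Kazuhiko Ozeki; Yujie Zhang; Kazuyuki Takagi
    Journal of Natural Language Processing, 8, 3, 3-18, Jul. 2001, Peer-reviewed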
    Japanese
  • Effects of word string language models on noisy broadcast news speech recognition
    Kazuyuki Takagi; Rei Oguro; Kazuhiko Ozeki
    Proceedings of ICSLP2000 (International Conference on Spoken Language Processing), 1, 154-157, Oct. 2000, Peer-reviewed
    International conference proceedings, English
  • Effectiveness of prosodic features in syntactic analysis of read Japanese sentences
    Yukiyoshi Hirose; Kazuhiko Ozeki; Kazuyuki Takagi
    Proceedings of ICSLP2000 (International Conference on Spoken Language Processing), 1, 215-218, Oct. 2000, Peer-reviewed
    International conference proceedings, English
  • An efficient algorithm for Japanese sentence compaction based on phrase importance and inter-phrase dependency
    R Oguro; K Ozeki; YJ Zhang; K Takagi
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 1902, 103-108, 2000, Peer-reviewed, This paper describes an efficient algorithm for Japanese sentence compaction. First, a measure of grammatical goodness of phrase sequences is defined on the basis of a Japanese dependency grammar. Also a measure of topical importance of phrase sequences is given. Then the problem of sentence compaction is formulated as an optimisation problem of selecting a subsequence of phrases from the original sentence that maximises the sum of the grammatical goodness and the topical importance. A recurrence equation is derived by using the principle of dynamic programming, which is then translated into an algorithm to solve the problem. The algorithm is of polynomial-time with respect to the original sentence length. Finally, an example of sentence compaction is presented.
    Scientific journal, English
  • Speaker identification using subband HMMs
    Kenichi Yoshida; Kazuyuki Takagi; Kazuhiko Ozeki
    Proceedings of EUROSPEECH 99 (Proceedings of 6th European Conference on Speech Communication and Technology), 2, 1019-1022, Sep. 1999, Peer-reviewed
    International conference proceedings, English
  • Performance comparison of recognition systems: a Bayesian approach
    Kazuhiko Ozeki; Yoshiyasu Ishigami; Kazuyuki Takagi
    The Journal of the Acoustical Society of Japan (E), Acoustical Society of Japan, 20, 3, 171-179, May 1999, Peer-reviewed, This paper describes a Bayesian approach to performance comparison of recognition systems. Unlike a conventional statistical test, this method makes no decision whether there is a significant difference between the true recognition rate of System A and that of System B. Instead, it gives the probability of the event that the true recognition rate of A is higher than that of B given their recognition results. The probability is referred to as the superiority of A to B. This is similar to a numerical weather forecast, in which what is predicted is the probability of having a certain amount of rain, not a prospect of being sunny or rainy. The superiority is exemplified in various cases for the manner of inputting test data and observing the recognition results, and then its sensitivity to the difference between the respective sample recognition rates of A and B is investigated. All the results support that this method has natural properties which conform to our intuition. The relationship between the superiority in this method and the level of significance in statistical tests is also discussed.
    Scientific journal, English
  • Performance evaluation of word phrase and noun category language models for broadcast news speech recognition
    Kazuyuki Takagi; Rei Oguro; Kenji Hashimoto; Kazuhiko Ozeki
    Proceedings of the 5th International Conference on Spoken Language Processing, 6, 2507-2510, Dec. 1998, Peer-reviewed
    International conference proceedings, English
  • Segmentation of spoken dialogue by interjections, disfluent utterances and pauses
    K Takagi; S Itahashi
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, I E E E, 2, 697-700, 1996, Peer-reviewed, This paper attempts to segment spontaneous speech of human-to-human spoken dialogues into a relatively large unit of speech, that is, a sub-phrasal unit segmented by interjections, disfluent utterances and pauses. A spontaneous speech model incorporating prosody was developed, in which three kinds of speech segment models and the transition probabilities among them were specified. The segmentation experiments showed that 87.6% of the segment boundaries were located correctly within 50 msec and 81.2% within 30 msec, a 10.1-point increase in performance compared with the initial model without prosodic information.
    International conference proceedings, English
  • Effectiveness of pause information in the content word detection of spoken dialogues
    Kazuyuki Takagi; Shuichi Itahashi
    Proceedings of EUROSPEECH '95, 1, 19-22, Sep. 1995, Peer-reviewed
    International conference proceedings, English
  • TEMPORAL CHARACTERISTICS OF UTTERANCE UNITS AND TOPIC STRUCTURE OF SPOKEN DIALOGS
    K TAKAGI; S ITAHASHI
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRON INFO COMMUN ENG, E78D, 3, 269-276, Mar. 1995, Peer-reviewed, There are various difficulties in processing spoken dialogs because of acoustic, phonetic, and grammatical ill-formedness, and because of interactions among participants. This paper describes temporal characteristics of utterances in human-human task-oriented dialogs and interactions between the participants, analyzed in relation to the topic structure of the dialog. We analyzed 12 task-oriented simulated dialogs of the ASJ continuous speech corpus, conducted by 13 different participants, with a total length of 66 minutes. Speech data was segmented into utterance units, each of which is a speech interval segmented by pauses. There were 3876 utterance units, and 38.9% of them were interjections, fillers, false starts and chiming utterances. Each dialog consisted of 6 to 15 topic segments, in each of which participants exchange specific information of the task. Eighty-six out of 119 new topic segments started with interjectory utterances and filled pauses. It was found that the durations of turn-taking interjections and fillers including the preceding silent pause were significantly longer at topic boundaries than at other positions. The results indicate that the duration of interjection words and filled pauses is a sign of a topic shift in spoken dialogs. In natural conversations, participants' speaking modes change dynamically as the conversation develops. Response time of both client and agent role speakers became shorter as the dialog proceeded. This indicates that interactions between the participants become active as the dialog proceeds. Speech rate was also affected by the dialog structure. It was fast in the initiating and terminating parts, where most utterances are of fixed expressions, and slow in topic segments of the body part of the dialog, where both client and agent participants stalled to speak in order to retrieve task knowledge. The results can be utilized in man-machine dialog systems, e.g., in order to detect topic shifts of a dialog, and to make the speech interface of dialog systems more natural to a human participant.
    Scientific journal, English
  • Annotating Illocutionary Force Types and Phonological Features into a Spontaneous Dialogue Corpus : An Experimental Study
    Kazuyo Tanaka; Kanae Kinibuchi; Naoko Houra; Kazuyuki Takagi; Shuichi Itahashi; Katsunobu Itoh; Satoru Hayamizu
    Proceedings of ICSLP94 (International Conference on Spoken Language Processing), 3, 1831-1834, Sep. 1994, Peer-reviewed
    International conference proceedings, English
  • Prosodic pattern of utterance units in Japanese spoken dialogs
    Kazuyuki Takagi; Shuichi Itahashi
    Proceedings of ICSLP'94 (The 3rd International Conference on Spoken Language Processing), 1, 143-146, Sep. 1994, Peer-reviewed
    International conference proceedings, English
  • Characteristics of utterance units and temporal structure of spoken dialog
    Kazuyuki Takagi; Naoko Houra; Shuichi Itahashi
    Proceedings of ISSD93 (International Symposium on Spoken Dialogue), 287-290, Nov. 1993, Peer-reviewed
    International conference proceedings, English
  • Formant Frequency Extraction by Moment Calculation of Speech Spectrum
    Kazuyuki Takagi; Shuichi Itahashi
    The Journal of the Acoustical Society of Japan (E), Acoustical Society of Japan, 12, 1, 47-50, Jan. 1991, Peer-reviewed
    Scientific journal, English
  • Automatic formant frequency extraction by moment calculation of speech spectrum
    Shuichi Itahashi; Kazuyuki Takagi
    Proceedings of EUROSPEECH'89, 2, 207-210, Sep. 1989, Peer-reviewed
    International conference proceedings, English
  • Formant frequency estimation by moment calculation of speech spectrum
    Kazuyuki Takagi; Shuichi Itahashi
    Proceedings of 2nd Joint Meeting of ASA and ASJ, Journal of the Acoustical Society of America, 84-J6, 22, Nov. 1988, Peer-reviewed
    International conference proceedings, English

MISC

  • Automatic Language Identification Based on Posterior Probability on Articulatory Classes
    Takumi Hirata; Kazuyuki Takagi
    Extraction of features from input speech that are effective in distinguishing the language is a key issue for language identification systems. We use posterior probabilities on articulatory classes as features for language identification. Posterior probability on each articulatory class is calculated by GMMs. Each GMM is trained with MFCC data of speech segments labeled with the phonemes or acoustic events that correspond to the articulatory class. The posterior probability values of the articulatory classes are concatenated to form an articulatory-feature-class-posterior-probability (AFCPP) vector at each analysis frame. These vectors are then quantized to yield a VQ code sequence, which is used as the training data for an n-gram language model. Language identification is performed by selecting the n-gram model that yields the highest likelihood for the AFCPP vector sequence of the input utterance. A language identification experiment between Japanese and English by the present method showed an identification rate of 97.1%., Information Processing Society of Japan (IPSJ), 08 Dec. 2014, IPSJ SIG Notes, 2014, 28, 1-5, Japanese, 110009850972, AN10442647
  • A study on language identification using non-negative matrix factorization as an extractor of phonotactic information
    OGATA Tsuyoshi; TAKAGI Kazuyuki
    Language identification is the technique of identifying the language being spoken by an unknown speaker. In this paper, phonotactic information was used as the feature for language identification. In order to obtain phonotactic information, it is required to extract the phoneme sequence from speech data. A template-based non-negative matrix factorization was applied for this purpose. The extracted phoneme sequence was then analyzed to yield n-gram models which may reflect the order in which the phoneme-like categories of speech occur in the language. Language identification was carried out by a support vector machine with the n-gram as the feature vector. It is shown that the identification performance changes with the number of spectrum templates and the order of n-gram, and that the best performance of 98.6% was obtained when the number of spectrum templates was 13 and the order of n-gram was 3., The Institute of Electronics, Information and Communication Engineers, 19 Dec. 2011, IEICE technical report. Speech, 111, 365, 45-48, Japanese, 0913-5685, 110009466803, AN10013221

Lectures, oral presentations, etc.

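  • Automatic detection of water leakage sound in a water pipeline from underwater acoustic data --- an analysis of field recording data ---
    Kazuyuki Takagi; Keisuke Ishikawa; Yoichi Haneda; Isamu Asano; Mitsuhiro Mori; Akihiko Kawakami; Syohei Kawabe
    Poster presentation, Japanese, Acoustical Society of Japan 2019 Spring Meeting, Acoustical Society of Japan, The University of Electro-Communications, For the purpose of leak diagnosis of pipelines in agricultural water-use facilities, this study examines acoustic data recorded in an on-site flow test and discusses the characteristics of the observed acoustic events and leakage sounds, and their automatic detection, in comparison with data from an experimental channel.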
    06 Mar. 2019
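  • Automatic detection of water leakage sound in a water pipeline from underwater acoustic data - a subband acoustic model approach -
    Kazuyuki Takagi; Keisuke Ishikawa; Yoichi Haneda; Isamu Asano; Mitsuhiro Mori; Akihiko Kawakami; Syohei Kawabe
    Poster presentation, Japanese, Acoustical Society of Japan 2018 Autumn Meeting, Acoustical Society of Japan, Oita University, Dannoharu Campus, For the purpose of leak diagnosis of pipelines in agricultural water-use facilities, we attempted to detect leakage sounds automatically using multi-band acoustic models of leakage and non-leakage sounds, trained by machine learning on acoustic data recorded in an experimental channel.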
    12 Sep. 2018
  • Automatic detection of water leakage sound in a water pipeline from underwater acoustic data
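    Kazuyuki Takagi; Yoichi Haneda; Isamu Asano; Mitsuhiro Mori; Akihiko Kawakami; Syohei Kawabe
    Poster presentation, Japanese, Acoustical Society of Japan 2018 Spring Meeting, Acoustical Society of Japan, Nippon Institute of Technology, Miyashiro Campus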
    13 Mar. 2018
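  • A study of continuous language models in language identification based on articulatory class posterior probabilities
    Keisuke Ishikawa; Kazuyuki Takagi
    Poster presentation, Japanese, Acoustical Society of Japan 2018 Spring Meeting, Acoustical Society of Japan, Nippon Institute of Technology, Miyashiro Campus, Domestic conference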
    13 Mar. 2018
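  • Language identification using articulatory class posterior probabilities: improved feature extraction using linear discriminant analysis
    Keisuke Ishikawa; Kazuyuki Takagi
    Poster presentation, Japanese, Acoustical Society of Japan 2017 Autumn Meeting, Acoustical Society of Japan, Ehime University, Our laboratory has proposed a language identification method that focuses on articulation. In previous work, the recognition rate of articulatory class extraction was about 60 to 90%. We therefore applied LDA to raise the accuracy of articulatory class extraction, with the aim of improving the language identification rate., Domestic conference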
    26 Sep. 2017
  • Using local feature and group delay spectrum in language identification based on posterior probability of articulatory classes
    Risa Koizumi; Kazuyuki Takagi
    Poster presentation, Japanese, Acoustical Society of Japan 2016 Spring Meeting, Acoustical Society of Japan, Toin University of Yokohama, In most current speech processing techniques, MFCC obtained from the amplitude spectrum and delta-MFCC calculated as the time derivative of MFCC are widely used as acoustic features. However, these features consider neither the frequency derivative of the amplitude spectrum nor the phase information of the speech waveform. Local feature and group delay spectrum are among the features claimed by previous works to possess such information useful for speech processing. We therefore examine their effectiveness on speech recognition performance. We conducted phoneme recognition experiments using speaker-dependent phoneme HMMs trained with local feature, group delay spectrum, and MFCC in same speaker, same gender, and different gender conditions. We obtained the highest recognition rate with local feature, while the other features showed better performance for some phonemes. Likelihood combination of local feature, group delay spectrum, and MFCC HMMs yielded a better phoneme recognition rate than the case in which each HMM was used alone. Results suggest that recognition performance degradation can be alleviated by a combination of local feature, group delay spectrum, and MFCC., Domestic conference
    11 Mar. 2016
  • Automatic Language Identification Based on Posterior Probability on Articulatory Classes: Language-independent articulatory feature extractor and codebook size
    Takumi Hirata; Kazuyuki Takagi
    Poster presentation, Japanese, IEICE Technical Report, March 2015 Speech Research Meeting, IEICE and Acoustical Society of Japan, Minami no Churahana Hotel Miyahira, We set articulatory classes based on articulatory features, and we use posterior probabilities on articulatory classes for language identification. Posterior probability on each articulatory class is calculated by an articulatory feature extractor based on GMMs. The posterior probability values of the articulatory classes are concatenated to form a vector at each analysis frame. These vectors are then quantized to yield a VQ code sequence, which is used as the training data for an n-gram language model. The results of a language identification experiment between Japanese and English showed a change in identification performance with codebook size. The method that uses a language-dependent articulatory feature extractor showed an identification rate of 98.1% when the codebook size was 64, and the method that uses a language-independent articulatory feature extractor showed an identification rate of 95.6% when the codebook size was 256., Domestic conference
    03 Mar. 2015
  • Effectiveness of local feature, group delay spectrum, MFCC and their combination on phoneme recognition performance
    Risa Koizumi; Kazuyuki Takagi
    Poster presentation, Japanese, IEICE Technical Report, March 2015 Speech Research Meeting, IEICE and Acoustical Society of Japan, Minami no Churahana Hotel Miyahira, In most current speech processing techniques, MFCC obtained from the amplitude spectrum and delta-MFCC calculated as the time derivative of MFCC are widely used as acoustic features. However, these features consider neither the frequency derivative of the amplitude spectrum nor the phase information of the speech waveform. Local feature and group delay spectrum are among the features claimed by previous works to possess such information useful for speech processing. We therefore examine their effectiveness on speech recognition performance. We conducted phoneme recognition experiments using speaker-dependent phoneme HMMs trained with local feature, group delay spectrum, and MFCC in same speaker, same gender, and different gender conditions. We obtained the highest recognition rate with local feature, while the other features showed better performance for some phonemes. Likelihood combination of local feature, group delay spectrum, and MFCC HMMs yielded a better phoneme recognition rate than the case in which each HMM was used alone. Results suggest that recognition performance degradation can be alleviated by a combination of local feature, group delay spectrum, and MFCC., Domestic conference
    03 Mar. 2015
  • Automatic Language Identification Based on Posterior Probability on Articulatory Classes
    Takumi Hirata; Kazuyuki Takagi
    Poster presentation, Japanese, 電子情報通信学会技術報告,第16回音声言語シンポジウム, 電子情報通信学会、日本音響学会, 東京工業大学すずかけ台キャンパス, Extracting features from input speech that are effective in distinguishing the language is a key issue for a language identification system. We use posterior probabilities on articulatory classes as features for language identification. The posterior probability on each articulatory class is calculated by GMMs. Each GMM is trained with MFCC data of speech segments labeled with the phonemes or acoustic events that correspond to the articulatory class. The posterior probability values of the articulatory classes are concatenated to form an articulatory-feature-class-posterior-probability (AFCPP) vector at each analysis frame. These vectors are then quantized to yield a VQ code sequence, which is used as the training data for an n-gram language model. Language identification is performed by selecting the n-gram model that yields the highest likelihood for the AFCPP vector sequence of the input utterance. A language identification experiment between Japanese and English with the present method showed an identification rate of 97.1%., Domestic conference
    16 Dec. 2014
  • A study on language identification between Japanese and English by non-negative matrix factorization with common spectral templates
    Takahiro Ishii; Tsuyoshi Ogata; Kazuyuki Takagi
    Oral presentation, Japanese, Proceedings of 2013 Autumn Meeting Acoustical Society of Japan
    Sep. 2013
  • A study on language identification with non-negative matrix factorization as an extractor of phonotactic information
    Tsuyoshi Ogata; Kazuyuki Takagi
    Oral presentation, Japanese, Proceedings of 2013 Spring Meeting Acoustical Society of Japan
    Mar. 2013
  • A study on language identification with non-negative matrix factorization as an extractor of phonotactic information
    Tsuyoshi Ogata; Kazuyuki Takagi
    Oral presentation, Japanese, Proceedings of 2012 Spring Meeting Acoustical Society of Japan
    Mar. 2012
  • A study on language identification using non-negative matrix factorization as an extractor of phonotactic information
    Tsuyoshi Ogata; Kazuyuki Takagi
    Oral presentation, Japanese, IEICE Technical Report
    Dec. 2011
  • Dictionary selection by prosodic information for isolated word recognition in noisy environment
    Hidetaka Ogura; Kazuyuki Takagi; Toshinobu Yoshida
    Oral presentation, Japanese, 日本音響学会講演論文集,日本音響学会2010年春季研究発表会
    Mar. 2010
  • A study on recombination weights for multi-band multi-SNR multi-path isolated word recognition
    Yuuichi Tsuchiya; Kazuyuki Takagi; Toshinobu Yoshida
    Oral presentation, Japanese, 日本音響学会講演論文集,日本音響学会2009年春季研究発表会
    Mar. 2009
  • A study on GMM-based language identification by multi-SNR multi-band method
    Kazuyuki Takagi; Shunsuke Kaku
    Oral presentation, Japanese, 日本音響学会講演論文集,日本音響学会2009年春季研究発表会
    Mar. 2009
  • 韻律情報を用いた話し言葉コーパスの係り受け解析の試み
    高木 一幸; 尾関 和彦
    Oral presentation, Japanese, 日本音響学会,日本音響学会2006年春季研究発表会
    Mar. 2006
  • 概念距離と係り受けを利用した要約文の文節対応付け
    福冨諭; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 言語処理学会,言語処理学会第12回年次大会
    Mar. 2006
  • 係り受け経路長を利用した新聞記事の自動簡約
    山形究; 福冨諭; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 言語処理学会,言語処理学会第12回年次大会
    Mar. 2006
  • マルチSNR・マルチバンド音声認識のためのHMM学習用雑音に関する検討
    坪井 豊; 高木 一幸; 尾関 和彦
    Oral presentation, Japanese, 日本音響学会,日本音響学会2005年秋季研究発表会
    Sep. 2005
  • 参照再構成法を用いた時間周波数マスクによる音声と音楽の分離
    井原健紘; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 電子情報通信学会総合大会
    Mar. 2005
  • 概念距離と係り受けを利用した要約文の文節対応付け
    福冨諭; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 情報処理学会第67回全国大会講演論文集
    Mar. 2005
  • マルチSNR・マルチバンド法を用いた話者識別における様々な学習雑音に対する性能評価
    吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2005
  • 話し言葉コーパスの係り受け解析を目的とした韻律の分析
    高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2005
  • 係り受け解析における韻律情報有効性の多数話者による評価
    高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2004
  • 着目文節の前後のポーズ情報を利用した係り受け解析
    呂美蓉; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2004
  • 1/f雑音を用いたマルチSNR部分帯域法による雑音下話者識別
    吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2004
  • マルチパス方式を用いた雑音環境下での単語音声認識―アクセント情報の利用―
    小野寺栄; 吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2004
  • 雑音下話者識別におけるマルチSNR部分帯域法とスペクトルサブトラクション法の性能比較
    吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Sep. 2003
  • マルチSNR部分帯域モデルを用いた話者識別システムの耐雑音性能改善
    吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2003
  • 日本語読み上げ文の係り受け解析における複数ポーズ情報の利用
    呂美蓉; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2003
  • 韻律情報を用いた日本語読み上げ文の係り受け解析におけるニューラルネットワークの利用
    沖本真美子; 小川善生; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2003
  • サポートベクターマシンによる日本語長文の短文分割
    根岸知弘; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 言語処理学会第9回年次大会発表論文集
    Mar. 2003
  • 係り受け整合度と文節重要度を用いた自動簡約文の主観評価
    諸岡祐平; 小黒玲; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 言語処理学会第9回年次大会発表論文集
    Mar. 2003
  • The use of prosody for disambiguating Japanese dependency structure
    Kazuyuki Takagi; Mamiko Okimoto; Yoshio Ogawa; Kazuhiko Ozeki
    Others, English, Proceedings for 2002 2nd Plenary Meeting and Symposium on Prosody and Speech Processing
    Feb. 2003
  • 文節間係り受け整合度と文節重要度を用いて自動簡約した日本語文の主観評価
    諸岡祐平; 小黒玲; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 情報処理学会研究報告自然言語処理
    Jan. 2003
  • 韻律を利用した係り受け解析におけるポーズ・基本周波数情報の結合法の検討
    久保田新; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 2002
  • 係り受け解析におけるポーズ・ピッチの利用法の検討
    久保田新; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集 I, 2-2-8
    Oct. 2001
  • 文節重要度と係り受け整合度に基づいた文簡約実験
    小黒玲; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 電子情報通信学会技術研究報告, NLC 2001-3
    May 2001
  • 雑音重畳音声の認識における連語言語モデルの比較
    高木一幸; 小黒玲; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集Ⅰ,2-33-18
    Mar. 2001
  • フレーム単位で最適SNR部分帯域モデルを選択する話者認識
    吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集Ⅰ,3-P-8
    Mar. 2001
  • 係り受け解析における韻律情報の利用
    高木一幸; 尾関和彦
    Others, Japanese, 文部省科学研究費補助金特定領域研究(B)「韻律に着目した音声言語処理の高度化」研究成果報告書(平成12年度)(領域代表者 広瀬啓吉)
    Jan. 2001
  • マルチSNR部分帯域GMMを用いた雑音環境下での話者認識
    吉田健一; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 電子情報通信学会技術研究報告,DSP2000-97,SP2000-63
    Sep. 2000
  • 文節重要度と係り受け整合度に基づく文要約アルゴリズム
    小黒玲; 尾関和彦; 張玉潔; 高木一幸
    Oral presentation, Japanese, 言語処理学会第6回年次大会発表論文集
    Mar. 2000
  • 種々の音響条件におけるニュース音声認識についての考察
    高木一幸; 小黒玲; 林真由美; 八木澄江; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集Ⅰ,1-Q-10
    Mar. 2000
  • 複数の単語bigramモデルを線形結合した言語モデルの検討
    小黒玲; 高木一幸; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集Ⅰ,3-Q-20
    Oct. 1999
  • 単語クラスタリングに基づく言語モデルを用いたニュース音声認識
    橋本顕示; 高木一幸; 小黒 玲; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集I,1-1-26
    Mar. 1999
  • ニュース音声認識のための言語モデルの比較
    小黒玲; 高木一幸; 橋本顕示; 尾関和彦
    Oral presentation, Japanese, 日本音響学会講演論文集,1-6-22
    Mar. 1998
  • Performance evaluation of language models for broadcast news speech recognition
    高木一幸; 小黒玲; 橋本顕示; 尾関和彦
    Oral presentation, Japanese, Technical Report of IEICE(SP)
    Dec. 1997
  • 対話音声の発話単位への自動区分の検討
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 情報処理学会研究報告
    Feb. 1997
  • 間投詞・非流暢発話と休止による対話音声区分化の検討
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1996
  • 対話音声中の自立語の検出におけるポーズ情報導入の効果
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1995
  • 対話音声コーパスの発話文タイプ・形態素情報の付与
    田中和世; 杵淵香奈恵; 保浦直子; 高木一幸; 小栗直樹; 板橋秀一; 伊藤克亘; 速水悟
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1994
  • 音声対話における発話系列の韻律パタン
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1994
  • 対話における話題展開と発話単位の性質
    高木一幸; 保浦直子; 板橋秀一
    Oral presentation, Japanese, Spontaneous Speechの分析・理解・生成
    Jul. 1993
  • 対話音声中の発話単位の時間関係
    高木一幸; 保浦直子; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1993
  • 模擬対話音声における各種区分の持続時間の性質
    高木一幸; 保浦直子; 板橋秀一
    Oral presentation, Japanese, 電子情報通信学会技術研究報告
    Dec. 1992
  • 対話音声中の各種音形・韻律単位の性質
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Oct. 1992
  • スペクトルのモーメント計算によるホルマント周波数推定
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1989
  • スペクトルモーメントを利用した音声のホルマント周波数推定法
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 電子情報通信学会技術研究報告
    Oct. 1988
  • スペクトルモーメントによる母音ホルマント周波数の推定
    高木一幸; 板橋秀一
    Oral presentation, Japanese, 日本音響学会講演論文集
    Mar. 1988

Courses

  • メディア情報学実験(音声認識)
    The University of Electro-Communications
  • マルチメディア処理
    The University of Electro-Communications
  • 情報領域演習第二C演習(アセンブラプログラミング)
    The University of Electro-Communications
  • マルチメディア処理
    The University of Electro-Communications
  • マルチメディア処理
    電気通信大学
  • メディア情報学実験(音声認識)
    The University of Electro-Communications
  • メディア情報学実験(音声認識)
    電気通信大学
  • 情報領域演習第二C演習
    The University of Electro-Communications
  • 情報領域演習第二C演習
    電気通信大学
  • 情報領域演習第二C演習(アセンブラプログラミング)
    The University of Electro-Communications
  • 情報領域演習第二C演習(アセンブラプログラミング)
    電気通信大学
  • 音声音響情報処理
    The University of Electro-Communications
  • 大学院輪講第一(II)
    The University of Electro-Communications
  • メディア情報学実験
    The University of Electro-Communications
  • 大学院輪講第一(I)
    The University of Electro-Communications
  • 大学院輪講第一(II)
    The University of Electro-Communications
  • 大学院輪講第一(II)
    電気通信大学
  • 大学院輪講第一(I)
    The University of Electro-Communications
  • 大学院輪講第一(I)
    電気通信大学
  • メディア情報学実験
    The University of Electro-Communications
  • Media Science and Engineering Laboratory
    The University of Electro-Communications
  • 音声音響情報処理
    The University of Electro-Communications
  • 音声音響情報処理
    電気通信大学
  • Physics Laboratory
    The University of Electro-Communications
  • 基礎科学実験A
    電気通信大学
  • メディア情報学実験
    電気通信大学

Affiliated academic society

  • Information Processing Society of Japan
  • The Acoustical Society of Japan
  • The Institute of Electronics, Information and Communication Engineers

Research Themes

  • 水中音波生成・解析を利用した漏水探索ロボットの位置特定および漏水データの特性分析に関する研究
    田野俊一
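    Japan Science and Technology Agency (JST), "SIP", Research grant, Listen to the data collected in a pipeline built for simulated experiments and label the events, preparing data for pattern training and evaluation. - Based on these data, analyze the acoustic characteristics of leak locations and design a signal processing method suited to pattern recognition. - Using acoustic features processed under appropriate conditions, perform machine learning that applies statistical speech recognition techniques, and examine the feasibility of automatically detecting leak locations as distinct from other locations.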
    01 Apr. 2017 - 31 Mar. 2019
  • High Compression-Rate Automatic Summarization of Newspaper Articles Based on Combined Use of Significant Sentence Extraction and Sentence Compression
    OZEKI Kazuhiko; TAKAGI Kazuyuki
    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), 1. In this work, we use a corpus that contains pairs of newspaper articles and corresponding hand-made short summaries. This corpus provides information about how humans make short summaries. To obtain such information effectively, phrase alignment is necessary between the original sentence and its summary. We developed a phrase aligner that makes use of conceptual distance and inter-phrase dependency. 2. Before the research period started, we were using the inter-phrase dependency strength estimated from the distribution of dependency distance in the set of original sentences. This method, however, misses the relationship between the original sentence and its summary. In this work, we estimated the inter-phrase dependency strength from the relative frequency of phrase pairs that exist in the original sentence with a certain dependency path length and retain a modifier-modified relation in the corresponding summary. The result of a subjective evaluation experiment showed significant improvement in the quality of compressed sentences. 3. In phrase-extraction-type sentence compression, which is employed in this research, phrases that are not in a modifier-modified relation in the original sentence sometimes appear to have such a relation in the compressed sentence. This phenomenon may degrade the readability of compressed sentences. We worked out a method that modifies the phrase ending of the modifier phrase to improve the readability of compressed sentences. The result of a subjective evaluation experiment showed the effectiveness of the method. 4. We reformulated our sentence compression method in a probabilistic framework. In calculating the probability that a compressed sentence is generated from an original sentence, quantities similar to phrase significance and inter-phrase dependency appear, which can be estimated from a training corpus. It was shown that this probabilistic approach attains performance comparable to that of our former, heuristic approach., 16500077
    2004 - 2006
  • 音声認識・理解における韻律情報の利用
    尾関 和彦; 峯松 信明; 山下 洋一; 吉田 利信; 高木 一幸; 荒木 雅弘; 新美 康永
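    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research on Priority Areas, 1. Clarifying the role of prosody in speech perception and applying it to speech recognition (1) We implemented detection of phrase-initial accent nuclei and hypothesis search control based on it. Word accent changes with the surrounding context, but when a nucleus is present at the head of a phrase, the word is always of accent type 1. Based on this rule, we computed the posterior probability that the word is of type 1 from the phrase-initial F0 information and introduced a prosodic score. We implemented this module in the continuous speech recognition system Julius and demonstrated its effectiveness in large-vocabulary continuous speech recognition. (2) Centering on the analysis of local speaking rate, we modeled the temporal structure of speech in three layers: a statistical model that determines bunsetsu (phrase) durations within a sentence, a model that controls mora durations within a bunsetsu, and a model that controls consonant durations within a mora. We also conducted perceptual experiments on temporal structure for each model and examined the temporal constraints. 2. Use of prosodic information in syntactic and semantic analysis of utterances (1) In addition to the previously used pause immediately after the phrase of interest and the pause immediately after the phrase that follows it, we confirmed that using the pause immediately before the phrase of interest improves the accuracy of dependency analysis. Adding F0 information to these pause features improved the accuracy further. (2) Dependency analysis experiments under speaker-independent conditions, using speech data from many speakers, showed among other findings that the models of pause duration and F0 features can be simpler than before, and that pause duration is better normalized by the mean syllable duration. We also used a large corpus to create new dependency rules with high coverage of the evaluation sentences. 3. Speech recognition and understanding system integrating phonological and prosodic information We developed a method that uses accent information to narrow down input completion candidates in a dictation system. We also implemented a predictive speech input system that integrates accent recognition, dictation, and input completion, and verified the effectiveness of using accent information. 4. Automatic summarization of lecture speech using prosodic features To generate summaries of lecture speech automatically by important-sentence extraction, we studied methods that use prosodic information to determine sentence units and sentence importance. For utterance-unit boundaries delimited by pauses, we trained a decision tree that decides whether each boundary should be a sentence boundary and obtained a classification rate of 94%. In determining sentence importance, we showed that prosodic information has a larger effect when the linguistic information contains continuous speech recognition errors than when correct linguistic information is used., 12132203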
    2000 - 2003
  • Advanced Dependency Structure Analysis Using Minimum Total Penalty Method
    OZEKI Kazuhiko; TAKAGI Kazuyuki
    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), 1. Development of Sentence Compression Algorithm The sentence compression problem was formulated as a problem of selecting an optimal subsequence of phrases from a given sentence. Then, based on our dependency analysis technique, an efficient algorithm was developed to solve the problem. 2. Estimation of inter-phrase dependency strength and phrase significance Inter-phrase dependency strength was estimated from about 34,000 sentences in the Kyoto University Corpus. It is based on the statistical frequency of inter-phrase dependency distance, and was estimated for each modifying phrase class and modified phrase class. Also, a sentence compression experiment was conducted in which human subjects compressed 200 sentences. The result was analyzed statistically and the remaining rate for each phrase class was calculated. Based on the result, the phrase significance value for each phrase class was estimated. 3. Subjective Evaluation of Compressed Sentences A subjective evaluation experiment was performed on sentences automatically compressed by the above algorithm together with the estimated inter-phrase dependency and phrase significance. This experiment used 200 test sentences, which are different from the sentences in 2. Five subjects evaluated the quality of the compressed sentences from the following points of view: (a) total impression, (b) retention of information, and (c) grammatical correctness. For comparison, the same kind of evaluation experiment was done for sentences compressed by humans, and also by a random method. It was found that the quality of sentences compressed by the proposed method lies between those of human compression and random compression. 4. Segmentation of Long Sentences Because long sentences are difficult to analyze syntactically, it is desirable to segment long sentences into shorter ones. In this work, a support vector machine (SVM) technique was applied to the problem. Vectors consisting of surface attribute values of relevant phrases were input to the SVM, and segmentation points were automatically estimated. As a result, a precision of 77% and a recall of 84% were obtained. The correct sentence segmentation rate was 72%., 12680372
    2000 - 2002
  • Spoken Language Processing by Minimum Total Penalty Dependency Analysis Method
    OZEKI Kazuhiko; ZHANG Yujie
    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), Results of this research project can be classified into 1. theoretical basis, 2. bunsetsu segmentation and segmentation of long sentences, 3. use of prosodic information, and 4. sentence compaction, all related to the minimum total penalty method. 1. Dependency analysis was investigated from the viewpoint of a "minimum cost segmentation problem". It was shown that various dependency analysis algorithms can be derived by changing the definition of the cost. It was also made clear that the minimum total penalty method allows the use of a wide range of numerical information as linguistic knowledge. 2. It is necessary to segment a sentence into bunsetsu phrases prior to dependency analysis. In this work, a decision tree method was applied to this problem, giving higher segmentation accuracy than conventional methods. The decision tree technique was also applied to segmentation of long sentences, which is pre-processing for dependency analysis. It was demonstrated that a set of segmentation rules was automatically acquired by this method. 3. To find out what syntactic information is contained in prosodic features, a statistical model was created that represents the relationship between prosodic features and inter-phrase dependency distance. The model was then incorporated into the minimum total penalty parser to measure the effectiveness of prosodic information for dependency analysis. The duration of pause was found to be very effective. Further investigation is necessary to make use of prosodic features related to pitch, power, and speaking rate. 4. An efficient sentence compaction algorithm was developed for applications such as generation of on-line TV closed captions. This algorithm selects an optimal bunsetsu subsequence from an original sentence that maximizes the sum of bunsetsu importance scores and inter-phrase dependency scores. Future work includes investigation of better definitions of the bunsetsu importance score and the inter-phrase dependency score. It is also necessary to evaluate the quality of shortened sentences using a large amount of test data., 09680356
    1997 - 1999