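Researcher Details | The University of Electro-Communications

Keiji Yanai (柳井 啓司)

Department of Informatics	Professor
Cluster I (Informatics and Computer Engineering)	Professor
Artificial Intelligence eXploration Research Center	Professor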

Researcher Information

Degrees

Master of Engineering, The University of Tokyo

Doctor of Engineering, The University of Tokyo

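Research Keywords

Image generation and transformation with deep learning

Deep learning

Multimedia video processing

Web image mining

Generic object recognition

Image and video recognition

Research Fields

Informatics, Perceptual information processing

Informatics, Databases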

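Career

Since April 2015
The University of Electro-Communications, Graduate School of Informatics and Engineering, Department of Informatics, Professor

April 2010 - March 2015
The University of Electro-Communications, Graduate School of Informatics and Engineering, Department of Informatics, Associate Professor

April 2006 - March 2010
The University of Electro-Communications, Department of Computer Science, Associate Professor

October 1997 - March 2006
The University of Electro-Communications, Department of Computer Science, Research Associate

November 2003 - September 2004
University of Arizona (USA), Department of Computer Science, Visiting Researcher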

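Education

April 1997 - September 1997
The University of Tokyo, Graduate School of Engineering, Department of Information Engineering, doctoral program

April 1995 - March 1997
The University of Tokyo, Graduate School of Engineering, Department of Information Engineering, master's program

March 1995
The University of Tokyo, Faculty of Engineering, Department of Mathematical Engineering and Information Physics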

Research Activities

Awards

Award date: December 2023
ACM Multimedia Asia
VQ-VDM: Video Diffusion Models with 3D VQGAN
Best Poster Award, Ryota Kaji;Keiji Yanai

Award date: July 2023
International Conference on Machine Vision and Applications
QAHOI: Query-based Anchors for Human-Object Interaction Detection
Best Paper Award at MVA 2023, Junwen Chen;Keiji Yanai

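Award date: July 2023
Meeting on Image Recognition and Understanding (MIRU2023)
Font Style Transfer Using CLIP and a Differentiable Renderer
MIRU Interactive Presentation Award, Kota Izumi;Keiji Yanai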

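Award date: July 2023
Meeting on Image Recognition and Understanding (MIRU2023)
StableSeg: Zero-Shot Region Segmentation with Stable Diffusion
MIRU Excellence Award, Yuma Honbu;Rento Yamaguchi;Keiji Yanai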

Award date: May 2023
Zero-Shot Image Region Segmentation with Stable Diffusion
PRMU Research Encouragement Award, Yuma Honbu

Award date: March 2023
Zero-Shot Image Region Segmentation with Stable Diffusion
PRMU Monthly Best Presentation Award, Yuma Honbu

Award date: October 2022
MADiMa Best Paper Award, Shu Naritomi;Keiji Yanai
Award from an international society, conference, or symposium

Award date: August 2021
CEA Best Paper Award, Kaimu Okamoto;Kento Adachi;Keiji Yanai
Award from an international society, conference, or symposium

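Award date: August 2020
Meeting on Image Recognition and Understanding (MIRU2020)
Multimodal Recipe Retrieval and Image Generation by Disentangling Semantics and Shape
MIRU Excellence Award
Award from a Japanese domestic society, conference, or symposium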

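Award date: March 2020
IEICE (Institute of Electronics, Information and Communication Engineers)
Estimating Sunspot Numbers from Solar Images with Deep Learning
IEICE Young Researcher's Award, 樋口陽光;Takumi Ege;Keiji Yanai
Award from a Japanese domestic society, conference, or symposium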

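Award date: August 2019
Meeting on Image Recognition and Understanding (MIRU)
Weakly Supervised Region Segmentation Using Inference of Changed Regions via Self-Supervised Learning
MIRU Student Encouragement Award, Wataru Shimoda
Award from a Japanese domestic society, conference, or symposium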

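Award date: June 2018
IEICE (Institute of Electronics, Information and Communication Engineers)
ISS Distinguished Reviewer Award
Award from a Japanese domestic society, conference, or symposium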

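Award date: March 2018
Forum on Data Engineering and Information Management (DEIM2018)
Attribute Manipulation of Food Photos with Conditional GAN
Student Presentation Award, Shu Naritomi
Award from a Japanese domestic society, conference, or symposium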

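Award date: March 2018
Forum on Data Engineering and Information Management (DEIM2019)
AR DeepCalorieCam: An AR-Display Food Calorie Estimation System
Excellent Interactive Award, Ryosuke Tanno;Takumi Ege;Keiji Yanai
Award from a Japanese domestic society, conference, or symposium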

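Award date: August 2017
Meeting on Image Recognition and Understanding (MIRU)
Synthesis of Multiple Fine-Grained Category Images Using Conditional GAN
MIRU Student Encouragement Award
Award from a Japanese domestic society, conference, or symposium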

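Award date: August 2017
Meeting on Image Recognition and Understanding (MIRU)
Unseen Style Transfer Network
MIRU Interactive Presentation Award, Ryosuke Tanno;Keiji Yanai
Award from a Japanese domestic society, conference, or symposium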

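Award date: August 2017
Meeting on Image Recognition and Understanding (MIRU)
Real-Time iOS Apps for Image Transformation, Object Detection, and Region Segmentation via an Efficient Mobile Implementation of ConvDeconvNet
MIRU Demo Presentation Award, Ryosuke Tanno;Yuki Izumi;Keiji Yanai
Award from a Japanese domestic society, conference, or symposium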

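Award date: May 2017
IPSJ SIG on Computer Vision and Image Media (CVIM)
Calorie Estimation from Food Images Using Recipe Information
Undergraduate Thesis Session Excellence Award, Takumi Ege
Award from a Japanese domestic society, conference, or symposium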

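Award date: March 2017
Forum on Data Engineering and Information Management (DEIM2017)
Multi Style Transfer: Real-Time Style Transfer on Mobile Devices by Arbitrarily Weighted Blending of Multiple Styles
Student Presentation Award, Ryosuke Tanno
Award from a Japanese domestic society, conference, or symposium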

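Award date: March 2017
Forum on Data Engineering and Information Management (DEIM2017)
Calorie Estimation from Food Images with a Multi-task CNN
Student Presentation Award, Takumi Ege
Award from a Japanese domestic society, conference, or symposium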

Award date: January 2017
International MultiMedia Modeling Conference (MMM)
DeepStyleCam: A Real-Time Style Transfer App on iOS
MMM Best Demo Award, Ryosuke Tanno;Keiji Yanai
Award from an international society, conference, or symposium

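Award date: August 2016
Meeting on Image Recognition and Understanding (MIRU)
Style Image Retrieval Using CNN-Based Style Vector
MIRU Student Encouragement Award, Shin Matsuo
Award from a Japanese domestic society, conference, or symposium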

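Award date: August 2016
Meeting on Image Recognition and Understanding (MIRU)
Weakly Supervised Region Segmentation Using CNN Forward and Backward Propagation Values and CRF
MIRU Student Encouragement Award, Wataru Shimoda
Award from a Japanese domestic society, conference, or symposium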

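Award date: March 2016
Forum on Data Engineering and Information Management (DEIM2016)
Implementation and Comparative Analysis of Deep-Learning Image Recognition Systems on Mobile Operating Systems
Excellent Interactive Award and Student Presentation Award, Ryosuke Tanno
Award from a Japanese domestic society, conference, or symposium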

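Award date: July 2015
Meeting on Image Recognition and Understanding (MIRU)
A System to Support Choosing Delicious-Looking Compositions When Photographing Food
MIRU Interactive Presentation Award, Takao Kakimori;Makoto Okabe;Keiji Yanai;Rikio Onai
Award from a Japanese domestic society, conference, or symposium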

Award date: January 2014
International MultiMedia Modeling Conference (MMM), Ireland
Best Demo Award, Yoshiyuki Kawano;Keiji Yanai
Award from an international society, conference, or symposium

Papers

Training-Free Region Prediction with Stable Diffusion.
Yuma Honbu; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM), pp. 17-31, published January 2024
Research paper (international conference proceedings)

Contextual Associated Triplet Queries for Panoptic Scene Graph Generation.
Jingbin Xu; Junwen Chen; Keiji Yanai
Proc. of ACM Multimedia Asia, pp. 100-5, published December 2023
Research paper (international conference proceedings)

Mask-based Food Image Synthesis with Cross-Modal Recipe Embeddings.
Zhongtao Chen; Yuma Honbu; Keiji Yanai
Proc. of ACM Multimedia Asia, pp. 46-7, published December 2023
Research paper (international conference proceedings)

Proceedings of the 8th International Workshop on Multimedia Assisted Dietary Management, MADiMa 2023, Ottawa, ON, Canada, 29 October 2023
ACM MM Workshop on Multimedia Assisted Dietary Management, ACM, published November 2023
Research paper (international conference proceedings)

MADiMa '23: 8th International Workshop on Multimedia Assisted Dietary Management.
Stavroula G. Mougiakakou; Keiji Yanai; Dario Allegra
ACM Multimedia, pp. 9726-9727, published November 2023
Research paper (international conference proceedings)

HowToEat: Exploring Human Object Interaction and Eating Action in Eating Scenarios.
Yingcheng Wang; Junwen Chen; Keiji Yanai
ACM MM Workshop on Multimedia Assisted Dietary Management, pp. 71-78, published November 2023
Research paper (international conference proceedings)

Focusing on what to decode and what to train: Efficient Training with HOI Split Decoders and Specific Target Guided DeNoising.
Junwen Chen; Yingcheng Wang; Keiji Yanai
CoRR, vol. abs/2307.02291, published July 2023
Research paper (academic journal)

QAHOI: Query-Based Anchors for Human-Object Interaction Detection.
Junwen Chen; Keiji Yanai
Proc. of the International Workshop on Machine Vision and Applications (MVA), pp. 1-5, published July 2023
Research paper (international conference proceedings)

CalorieCam360: Simultaneous Eating Action Recognition of Multiple People Using an Omnidirectional Camera.
Kento Terauchi; Keiji Yanai
Proceedings of the 2023 ACM International Conference on Multimedia Retrieval (ICMR), ACM, pp. 644-648, published June 2023
Research paper (international conference proceedings)

Patent Image Retrieval Using Cross-entropy-based Metric Learning
Kotaro Higuchi; Yuma Honbu; Keiji Yanai
Proc. of International Workshop on Frontiers of Computer Vision (IW-FCV), published February 2023, peer-reviewed

VQ-VDM: Video Diffusion Models with 3D VQGAN.
Ryota Kaji; Keiji Yanai
MMAsia, pp. 90-5, published 2023
Research paper (international conference proceedings)

Virtual Try-On Considering Temporal Consistency for Videoconferencing.
Daiki Shimizu; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM), Springer, pp. 758-763, published January 2023
Research paper (international conference proceedings)

Transformer-Based Cross-Modal Recipe Embeddings with Large Batch Training.
Jing Yang; Junwen Chen; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM), Springer, pp. 471-482, published January 2023
Research paper (international conference proceedings)

Zero-shot Font Style Transfer with a Differentiable Renderer
Kota Izumi; Keiji Yanai
Proc. of ACM Multimedia Asia, published December 2022, peer-reviewed
Research paper (international conference proceedings), English

Parallel Queries for Human-Object Interaction Detection
Junwen Chen; Keiji Yanai
Proc. of ACM Multimedia Asia, published December 2022, peer-reviewed
Research paper (international conference proceedings), English

SetMealAsYouLike: Sketch-based Set Meal Image Synthesis with Plate Annotations
Yuma Honbu; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2022, peer-reviewed
Research paper (international conference proceedings), English

DepthGrillCam: A Mobile Application for Real-time Eating Action Recording Using RGB-D Images
Kento Adachi; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2022, peer-reviewed
Research paper (international conference proceedings), English

Text-based Image Editing for Food Images with CLIP
Kohei Yamamoto; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2022, peer-reviewed
Research paper (international conference proceedings), English

Real Scale 3D Reconstruction of a Dish and a Plate using Implicit Function and a Single RGB-D Image
Shu Naritomi; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2022, peer-reviewed
Research paper (international conference proceedings), English

Continual Learning in Vision Transformer
Mana Takeda; Keiji Yanai
Proc. of IEEE International Conference on Image Processing (ICIP), published October 2022, peer-reviewed
Research paper (international conference proceedings), English

FASSD-Net: Fast and Accurate Real-Time Semantic Segmentation for Embedded Systems
L. Rosas-Arias; G. Benitez-Garcia; J. Portillo-Portillo; J. Olivares-Mercado; G. Sanchez-Perez; K. Yanai
IEEE Transactions on Intelligent Transportation Systems, IEEE, vol. 23, no. 9, pp. 14349-14360, published September 2022, peer-reviewed
Research paper (academic journal), English

Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval
Gibran Benitez-Garcia; Hiroki Takahashi; Keiji Yanai
Sensors, MDPI, vol. 22, no. 19, pp. 7317:1-7317:17, published September 2022, peer-reviewed
Research paper (academic journal), English

StyleGAN-based CLIP-guided Image Shape Manipulation
Yuchen Qian; Kohei Yamamoto; Keiji Yanai
Proc. of International Conference on Content-based Multimedia Indexing (CBMI), published September 2022, peer-reviewed
Research paper (international conference proceedings), English

Unseen Food Segmentation
Yuma Honbu; Keiji Yanai
Proc. of ACM International Conference on Multimedia Retrieval (ICMR), published June 2022, peer-reviewed
Research paper (international conference proceedings), English

Ketchup As You Like: Drawing Editor for Foods
Shu Naritomi; Gibran Benitez-Garcia; Keiji Yanai
Proc. of IEEE Artificial Intelligence and Virtual Reality (IEEE AIVR) (demo paper), published November 2021, peer-reviewed
Research paper (international conference proceedings), English

Pose Sequence Generation with a GCN and an Initial Pose Generator
Kento Terauchi; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), published November 2021, peer-reviewed
Research paper (international conference proceedings), English

Few-Shot and Zero-Shot Semantic Segmentation for Food Images
Yuma Honbu; Keiji Yanai
Proc. of ICMR WS on Multimedia for Cooking and Eating Activities (CEA), published November 2021, peer-reviewed
Research paper (international conference proceedings), English

Region-Based Food Calorie Estimation for Multiple-Dish Meals
Kaimu Okamoto; Kento Adachi; Keiji Yanai
Proc. of ICMR WS on Multimedia for Cooking and Eating Activities (CEA), published November 2021, peer-reviewed
Research paper (international conference proceedings), English

Ketchup GAN: A New Dataset for Realistic Synthesis of Letters on Food
Gibran Benitez-Garcia; Keiji Yanai
Proc. of ICMR WS on Multimedia Artworks Analysis and Attractiveness Computing (MMArt), published November 2021, peer-reviewed
Research paper (international conference proceedings), English

3D Mesh Reconstruction of Foods from a Single Image
Shu Naritomi; Keiji Yanai
Proc. of ACM Multimedia WS on AIxFood, published October 2021, peer-reviewed
Research paper (international conference proceedings), English

Cross-Modal Recipe Embeddings by Disentangling Recipe Contents and Dish Styles
Yu Sugiyama; Keiji Yanai
Proc. of ACM Multimedia, published October 2021, peer-reviewed
Research paper (international conference proceedings), English

Pop'n Food: 3D Food Model Estimation System from a Single Image
Shu Naritomi; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval, published September 2021, peer-reviewed
Research paper (international conference proceedings), English

A Study on Persistence of GAN-Based Vision-Induced Gustatory Manipulation
Kizashi Nakano; Daichi Horita; Norihiko Kawai; Naoya Isoyama; Nobuchika Sakata; Kiyoshi Kiyokawa; Keiji Yanai; Takuji Narumi
Electronics (Switzerland), vol. 10, no. 10, published 2 May 2021
Abstract: Vision-induced gustatory manipulation interfaces can help people with dietary restrictions feel as if they are eating what they want by modulating the appearance of the alternative foods they are eating in reality. However, it is still unclear whether vision-induced gustatory change persists beyond a single bite, how the sensation changes over time, and how it varies among individuals from different cultural backgrounds. The present paper reports on a user study conducted to answer these questions using a generative adversarial network (GAN)-based real-time image-to-image translation system. In the user study, 16 participants were presented somen noodles or steamed rice through a video see-through head mounted display (HMD) both in two conditions; without or with visual modulation (somen noodles and steamed rice were translated into ramen noodles and curry and rice, respectively), and brought food to the mouth and tasted it five times with an interval of two minutes. The results of the experiments revealed that vision-induced gustatory manipulation is persistent in many participants. Their persistent gustatory changes are divided into three groups: those in which the intensity of the gustatory change gradually increased, those in which it gradually decreased, and those in which it did not fluctuate, each with about the same number of participants. Although the generalizability is limited due to the small population, it was also found that non-Japanese and male participants tended to perceive stronger gustatory manipulation compared to Japanese and female participants. We believe that our study deepens our understanding and insight into vision-induced gustatory manipulation and encourages further investigation.
Research paper (academic journal)

Multi-Style Transfer Generative Adversarial Network for Text Images
Yuan Honghui; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval, published March 2021, peer-reviewed
Research paper (international conference proceedings), English

Hungry Networks: 3D Mesh Reconstruction of a Dish and a Plate from a Single Dish Image for Estimating Food Volume
Shu Naritomi; Keiji Yanai
Proc. of ACM Multimedia Asia, published February 2021, peer-reviewed
Research paper (international conference proceedings), English

Training of Multiple and Mixed Tasks With A Single Network Using Feature Modulation
Mana Takeda; Gibran Benitez-Garcia; Keiji Yanai
Proc. of ICPR Workshop on Deep Learning for Pattern Recognition, published January 2021, peer-reviewed
Research paper (international conference proceedings), English

Rescue Dog Action Recognition by Integrating Ego-centric Video, Sound and Sensor Information
Yuta Ide; Tsuyohito Araki; Ryunosuke Hamada; Kazunori Ohno; Keiji Yanai
Proc. of ICPR Workshop on Applications of Egocentric Vision, published January 2021, peer-reviewed
Research paper (international conference proceedings), English

UEC-FoodPix Complete: A Large-scale Food Image Segmentation Dataset
Kaimu Okamoto; Keiji Yanai
Proc. of ICPR Workshop on Multimedia Assisted Dietary Management, published January 2021, peer-reviewed
Research paper (international conference proceedings), English

Mask-Based Style-Controlled Image Synthesis Using a Mask Style Encoder
Jaehyeong Cho; Wataru Shimoda; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition (ICPR), published January 2021, peer-reviewed
Research paper (international conference proceedings), English

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition
Gibran Benitez-Garcia; Jesus Olivares-Mercado; Gabriel Sanchez-Perez; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition (ICPR), published January 2021, peer-reviewed
Research paper (international conference proceedings), English

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions
Leonel Rosas-Arias; Gibran Benitez-Garcia; Jose Portillo-Portillo; Gabriel Sanchez-Perez; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition (ICPR), published January 2021, peer-reviewed
Research paper (international conference proceedings), English

Predicting Plate Regions for Weakly-supervised Food Image Segmentation
Wataru Shimoda; Keiji Yanai
Proc. of IEEE International Conference on Multimedia and Expo (ICME), published July 2020, peer-reviewed
Research paper (international conference proceedings), English

Iconify: Converting Photographs into Icons
Takuro Karamatsu; Gibran Benitez-Garcia; Keiji Yanai; Seiichi Uchida
Proc. of ACM ICMR Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM), pp. 7-12, published June 2020, peer-reviewed
Research paper (international conference proceedings), English

Style Image Retrieval for Improving Material Translation Using Neural Style Transfer
Gibran Benitez-Garcia; Wataru Shimoda; Keiji Yanai
Proc. of ACM ICMR Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM), published June 2020, peer-reviewed
Research paper (international conference proceedings), English

CalorieCaptorGlass: Food Calorie Estimation based on Actual Size using HoloLens and Deep Learning
Shu Naritomi; Keiji Yanai
Proc. of IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR) Demo Track, published March 2020, peer-reviewed
Research paper (international conference proceedings), English

Weakly Supervised Semantic Segmentation Using Distinct Class Specific Saliency Maps
Wataru Shimoda; Keiji Yanai
Computer Vision and Image Understanding, Elsevier, vol. 191, published February 2020, peer-reviewed
Research paper (academic journal), English

SSA-GAN: End-to-End Time-Lapse Generation with Spatial Self-Attention
Daichi Horita; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), published November 2019, peer-reviewed
Research paper (international conference proceedings), English

Pre-trained and Shared Encoder in Cycle-Consistent Adversarial Networks to Improve Image Quality
Runtong Zhang; Yuchen Wu; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), published November 2019, peer-reviewed
Research paper (international conference proceedings), English

Continual Learning of An Image Transformation Network Using Task-dependent Weight Selection Masks
Asato Matsumoto; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), published November 2019, peer-reviewed
Research paper (international conference proceedings), English

Attention Guided Unsupervised Image-to-Image Translation with Progressively Growing Strategy
Yuchen Wu; Runtong Zhang; Keiji Yanai
Proc. of ACPR Workshop on Advances and Applications on Generative Deep Learning Models (AAGM), published November 2019, peer-reviewed
Research paper (international conference proceedings), English

Self-supervised Difference Detection for Weakly-supervised Semantic Segmentation
Wataru Shimoda; Keiji Yanai
Proc. of IEEE/CVF International Conference on Computer Vision (ICCV), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

DepthCalorieCam: A Mobile Application for Volume-Based Food Calorie Estimation using Depth Cameras
Yoshikazu Ando; Takumi Ege; Jaehyeong Cho; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

A New Large-scale Food Image Segmentation Dataset and Its Application to Food Calorie Estimation Based on Grains of Rice
Takumi Ege; Wataru Shimoda; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

Unseen Food Creation by Mixing Existing Food Images with Conditional StyleGAN
Daichi Horita; Wataru Shimoda; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

Ramen as You Like: Sketch-based Food Image Generation and Editing
Jaehyeong Cho; Wataru Shimoda; Keiji Yanai
Proc. of ACM Multimedia (demo paper), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

Zero-Annotation Plate Segmentation Using a Food Category Classifier and a Food/Non-Food Classifier
Wataru Shimoda; Keiji Yanai
Proc. of ICCV Workshop on Multi-Discipline Approach for Learning Concepts (MDALC), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

Dog-Centric Activity Recognition by Integrating Appearance, Motion and Sound
Tsuyohito Araki; Ryunosuke Hamada; Kazunori Ohno; Keiji Yanai
Proc. of ICCV Workshop on Egocentric Perception, Interaction and Computing (EPIC), published October 2019, peer-reviewed
Research paper (international conference proceedings), English

Analyzing Regional Food Trends with Geo-tagged Twitter Food Photos
Kaimu Okamoto; Keiji Yanai
Proc. of International Conference on Content-Based Multimedia Indexing (CBMI), published September 2019, peer-reviewed
Research paper (international conference proceedings), English

Large-scale Twitter Food Photo Mining and Its Applications
Keiji Yanai; Kaimu Okamoto; Tetsuya Nagano; Daichi Horita
Proc. of International Conference on Multimedia Big Data (BIGMM), published September 2019, peer-reviewed, invited
Research paper (international conference proceedings), English

Self-supervised Difference Detection for Refinement CRF and Seek Interpolation
Wataru Shimoda; Keiji Yanai
Proc. of CVPR WS on Learning from Imperfect Data, published June 2019, peer-reviewed
Research paper (international conference proceedings), English

Enchanting your noodles: A gustatory manipulation interface by using GAN-based real-time food-to-food translation
Kizashi Nakano; Kiyoshi Kiyokawa; Daichi Horita; Keiji Yanai; Nobuchika Sakata; Takuji Narumi
26th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2019 - Proceedings, pp. 1339-1340, published March 2019
Abstract: In this demonstration, we present a novel gustatory manipulation interface which utilizes the cross-modal effect of vision on taste elicited with real-time food appearance modulation using a generative adversarial network (GAN). Unlike existing systems which only change color or texture pattern of a particular type of food in an inflexible manner, our system changes the appearance of food into multiple types of food in real-time flexibly, dynamically and interactively in accordance with the deformation of the food that the user is actually eating by using GAN-based image-to-image translation. Our system can turn somen noodles into ramen noodles or fried noodles, or steamed rice into curry and rice or fried rice. Users of our demonstration system will taste what is visually presented to some extent rather than what they are actually eating.
Research paper (international conference proceedings)

A Large-scale Analysis of Regional Tendency of Twitter Photos Using Only Image Features
Tetsuya Nagano; Takumi Ege; Wataru Shimoda; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR), published March 2019, peer-reviewed
Research paper (international conference proceedings), English

Image-Based Estimation of Real Food Size for Accurate Food Calorie Estimation
Takumi Ege; Yoshikazu Ando; Ryosuke Tanno; Wataru Shimoda; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR), published March 2019, peer-reviewed
Research paper (international conference proceedings), English

DeepTaste: Augmented Reality Gustatory Manipulation with GAN-Based Real-Time Food-to-Food Translation.
Kizashi Nakano; Daichi Horita; Nobuchika Sakata; Kiyoshi Kiyokawa; Keiji Yanai; Takuji Narumi
Proc. of IEEE International Symposium on Mixed and Augmented Reality (ISMAR), IEEE, pp. 212-223, published 2019, peer-reviewed
Research paper (international conference proceedings), English

Webly-Supervised Food Detection with Foodness Proposal
Wataru Shimoda; Keiji Yanai
IEICE Transactions on Information and Systems, IEICE, published 2019, peer-reviewed
Research paper (academic journal), English

Simultaneous Estimation of Dish Locations and Calories with Multi-task Learning
Takumi Ege; Keiji Yanai
IEICE Transactions on Information and Systems, IEICE, published 2019, peer-reviewed
Research paper (academic journal), English

FoodChangeLens: CNN-based Food Transformation on HoloLens
Shu Naritomi; Ryosuke Tanno; Takumi Ege; Keiji Yanai
Proc. of International Workshop on Interface and Experience Design with AI for VR/AR (DAIVAR 2018), published December 2018, peer-reviewed
Research paper (international conference proceedings), English

Word-Conditioned Image Style Transfer
Yu Sugiyama; Keiji Yanai
Proc. of ACCV Workshop on AI Aesthetics in Art and Media, published December 2018, peer-reviewed
Research paper (international conference proceedings), English

Font Style Transfer Using Neural Style Transfer and Unsupervised Cross-domain Transfer
Atsushi Narusawa; Wataru Shimoda; Keiji Yanai
Proc. of ACCV Workshop on AI Aesthetics in Art and Media, published December 2018, peer-reviewed
Research paper (international conference proceedings), English

Real-Time Image Classification and Transformation Apps on iOS by "Chainer2MPSNNGraph"
Yuki Izumi; Daichi Horita; Ryosuke Tanno; Keiji Yanai
Proc. of NIPS WS on Machine Learning on the Phone and other Consumer Devices (MLPCD), published December 2018, peer-reviewed
Research paper (international conference proceedings), English

Continual Learning for an Encoder-Decoder CNN Using "Piggyback"
Asato Matsumoto; Keiji Yanai
Proc. of NIPS Continual Learning Workshop, published December 2018, peer-reviewed
Research paper (international conference proceedings), English

CNN-based photo transformation for improving attractiveness of ramen photos
Daichi Horita; Jaehyeong Cho; Takumi Ege; Keiji Yanai
Proc. of ACM Symposium on Virtual Reality Software and Technology (VRST), published November 2018, peer-reviewed
Research paper (international conference proceedings), English

AR DeepCalorieCam V2: Food Calorie Estimation with CNN and AR-based Actual Size Estimation
Ryosuke Tanno; Takumi Ege; Keiji Yanai
Proc. of ACM Symposium on Virtual Reality Software and Technology (VRST), published November 2018, peer-reviewed
Research paper (international conference proceedings), English

Magical Rice Bowl: Real-time Food Category Changer
Ryosuke Tanno; Daichi Horita; Wataru Shimoda; Keiji Yanai
Proc. of ACM Multimedia, published October 2018, peer-reviewed
Research paper (international conference proceedings), English

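Food Image Calorie Estimation by Image Retrieval Based on CNN Feature Learning
Takumi Ege; Wataru Shimoda; Keiji Yanai
IEICE Transactions on Information and Systems D (Japanese Edition), vol. J101, no. 8, pp. 1099-1109, published August 2018, peer-reviewed
Research paper (academic journal), Japanese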

Multi-task Learning of Dish Detection and Calorie Estimation
Takumi Ege; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), pp. 53-58, published July 2018, peer-reviewed
Research paper (international conference proceedings), English

Food Category Transfer with Conditional Cycle GAN and a Large-scale Food Image Dataset
Daichi Horita; Ryosuke Tanno; Wataru Shimoda; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), pp. 67-70, published July 2018, peer-reviewed
Research paper (international conference proceedings), English

Food Image Generation using A Large Amount of Food Images with Conditional GAN: RamenGAN and RecipeGAN
Yoshifumi Ito; Wataru Shimoda; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), pp. 71-74, published July 2018, peer-reviewed
Research paper (international conference proceedings), English

Image-based food calorie estimation using recipe information
Takumi Ege; Keiji Yanai
IEICE Transactions on Information and Systems, Institute of Electronics, Information and Communication Engineers (IEICE), vol. E101D, no. 5, pp. 1333-1341, published 1 May 2018, peer-reviewed
Abstract: Recently, mobile applications for recording everyday meals draw much attention for self dietary. However, most of the applications return food calorie values simply associated with the estimated food categories, or need for users to indicate the rough amount of foods manually. In fact, it has not been achieved to estimate food calorie from a food photo with practical accuracy, and it remains an unsolved problem. Then, in this paper, we propose estimating food calorie from a food photo by simultaneous learning of food calories, categories, ingredients and cooking directions using deep learning. Since there exists a strong correlation between food calories and food categories, ingredients and cooking directions information in general, we expect that simultaneous training of them brings performance boosting compared to independent single training. To this end, we use a multi-task CNN. In addition, in this research, we construct two kinds of datasets that is a dataset of calorie-annotated recipe collected from Japanese recipe sites on the Web and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs, and compared them. As a result, a multi-task CNN achieved the better performance on both food category estimation and food calorie estimation than single-task CNNs. For the Japanese recipe dataset, by introducing a multi-task CNN, 0.039 were improved on the correlation coefficient, while for the American recipe dataset, 0.090 were raised compared to the result by the single-task CNN. In addition, we showed that the proposed multi-task CNN based method outperformed search-based methods proposed before.
Research paper (international conference proceedings), English

AR DeepCalorieCam: An iOS App for Food Calorie Estimation with Augmented Reality
Ryosuke Tanno; Takumi Ege; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, vol. 10705, pp. 352-356, published 2018, peer-reviewed
Abstract: A food photo generally includes several kinds of food dishes. In order to recognize multiple dishes in a food photo, we need to detect each dish in a food image. Meanwhile, in recent years, the accuracy of object detection has improved drastically by the appearance of Convolutional Neural Network (CNN). In this demo, we present two automatic calorie estimation apps, DeepCalorieCam and AR DeepCalorieCam, running on iOS. DeepCalorieCam can estimate food calories by detecting dishes from the video stream captured from the built-in camera of an iPhone. We use YOLOv2, [1] which is the state-of-the-art object detector using CNN, as a dish detector to detect each dish in a food image, and the food calorie of each detected dish is estimated by image-based food calorie estimation, [2, 3]. AR DeepCalorieCam is a combination of calorie estimation and augmented reality (AR) which is an AR version of DeepCalorieCam.
Research paper (international conference proceedings), English

An Integration of Bottom-up and Top-Down Salient Cues on RGB-D Data: Saliency from Objectness vs. Non-Objectness
Nevrez Imamoglu; Wataru Shimoda; Chi Zhang; Yuming Fang; Asako Kanezaki; Keiji Yanai; Yoshifumi Nishida
Signal, Image and Video Processing, Springer, no. 2, pp. 307-314, published 2018, peer-reviewed
Research paper (academic journal), English

Predicting Segmentation "Easiness" from the Consistency for Weakly-Supervised Segmentation
Wataru Shimoda; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), published November 2017, peer-reviewed
Research paper (international conference proceedings), English

Estimating Food Calories for Multiple-dish Food Photos
Takumi Ege; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), published November 2017, peer-reviewed
Research paper (international conference proceedings), English

Scene Text Eraser
Toshiki Nakamura; Anna Zhu; Keiji Yanai; Seiichi Uchida
Proc. of the International Conference on Document Analysis and Recognition (ICDAR), published November 2017, peer-reviewed
Research paper (international conference proceedings), English

Neural Font Style Transfer
Gantugs Atarsaikhan; Brian Kenji Iwana; Atsushi Narusawa; Keiji Yanai; Seiichi Uchida
Proc. of ICDAR Workshop on Machine Learning, published November 2017, peer-reviewed
Research paper (international conference proceedings), English

Image-Based Food Calorie Estimation Using Knowledge on Food Categories, Ingredients and Cooking Directions
Takumi Ege; Keiji Yanai
Proc. of ACM Multimedia Thematic Workshops on Understanding, published October 2017, peer-reviewed
Research paper (international conference proceedings), English

Comparison of Two Approaches for Direct Food Calorie Estimation
Takumi Ege; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), published September 2017, peer-reviewed
Research paper (international conference proceedings), English

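Style Retrieval of Painting Images Using Neural Style Vectors
Shin Matsuo; Keiji Yanai
IEICE Transactions on Information and Systems D (Japanese Edition), vol. J100-D, no. 8, pp. 742-749, published 1 August 2017, peer-reviewed
Research paper (academic journal), Japanese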

Partial Style Transfer Using Weakly-Supervised Semantic Segmentation
Shin Matsuo; Wataru Shimoda; Keiji Yanai
Proc. ICME Workshop on Multimedia Artworks Analysis (MMArt), published July 2017, peer-reviewed
Research paper (international conference proceedings), English

Learning Food Image Similarity for Food Image Retrieval
Wataru Shimoda; Keiji Yanai
Proceedings - 2017 IEEE 3rd International Conference on Multimedia Big Data, BigMM 2017, Institute of Electrical and Electronics Engineers Inc., pp. 165-168, published 30 June 2017, peer-reviewed
Abstract: For food application, recipe retrieval is an important task. However, many of them rely on only text query. Food image retrieval has relation to recipe retrieval so that similar food images are expected that they have similar recipes. Rising image retrieval performance is desired for recipe retrieval. On the other hand, to learn similarity by Siamese Network or Triplet Network are known as an effective method for image retrieval. However, there are no research for food image retrieval using similarity learning with Convolutional Neural Network as far as we know. Food recognition is known as one of fine-grained recognition tasks. Therefore it is unclear that how effective similarity learning methods based on CNN in food images. In our work, we trained some networks for feature similarity, and evaluated their effectiveness in food image retrieval.
Research paper (international conference proceedings), English

Conditional fast style transfer network
Keiji Yanai; Ryosuke Tanno
ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, Association for Computing Machinery, Inc, pp. 434-437, published 6 June 2017, peer-reviewed
Abstract: In this paper, we propose a conditional fast neural style transfer network. We extend the network proposed as a fast neural style transfer network by Johnson et al. [8] so that the network can learn multiple styles at the same time. To do that, we add a conditional input which selects a style to be transferred out of the trained styles. In addition, we show that the proposed network can mix multiple styles, although the network is trained with each of the training styles independently. The proposed network can also transfer different styles to the different parts of a given image at the same time, which we call "spatial style transfer". In the experiments, we confirmed that no quality degradation occurred in the multi-style network compared to the single network, and linear-weighted multi-style fusion enabled us to generate various kinds of new styles which are different from the trained single styles. In addition, we also introduce a mobile implementation of the proposed network which runs in about 5 fps on an iPhone 7 Plus.
Research paper (international conference proceedings), English

Simultaneous Estimation of Food Categories and Calories with Multi-task CNN
Takumi Ege; Keiji Yanai
Proc. of IAPR International Conference on Machine Vision Applications (MVA), published 2017-05, peer-reviewed
Research paper (international conference proceedings), English

Twitter Photo Geo-Localization Using Both Textual and Visual Features
Shin Matsuo; Wataru Shimoda; Keiji Yanai
Proc. of the IEEE International Conference on Multimedia Big Data (BigMM), published 2017-04, peer-reviewed
Research paper (international conference proceedings), English

Learning Food Image Embedding for Food Image Retrieval
Wataru Shimoda; Keiji Yanai
Proc. of the IEEE International Conference on Multimedia Big Data (BigMM), published 2017-04, peer-reviewed
Research paper (international conference proceedings), English

Unseen Style Transfer Based on a Conditional Fast Style Transfer Network
Keiji Yanai
Proc. of International Conference on Learning Representations Workshop Track (ICLR WS), published 2017-04, peer-reviewed
Research paper (international conference proceedings), English

Comparison of Two Approaches for Direct Food Calorie Estimation
Takumi Ege; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, vol. 10590, pp. 453-461, published 2017, peer-reviewed, In this paper, we compare CNN-based estimation and search-based estimation for image-based food calorie estimation. As recent direct food calorie estimation methods, we proposed a CNN-based calorie regression in [5], while Miyazaki et al. [9] proposed an image-search-based estimation method. The dataset used in the CNN-based direct estimation [5] contained 4877 images of 15 food classes, while the dataset used in the search-based work [9] consisted of 6522 images without any category information. In addition, [9] used hand-crafted features such as BoF and color histograms. The problems are that both datasets are small and that, as far as we know, no work clearly compares CNN-based and search-based estimation on the same dataset. In this work, we construct a calorie-annotated dataset of 68,774 food images, and compare CNN-based estimation [5] and search-based estimation [9] on the same dataset. For the search-based estimation, we use CNN features instead of the hand-crafted features used in [9].
Research paper (international conference proceedings), English

DeepStyleCam: A Real-Time Style Transfer App on iOS
Ryosuke Tanno; Shin Matsuo; Wataru Shimoda; Keiji Yanai
MULTIMEDIA MODELING, MMM 2017, PT II, SPRINGER INTERNATIONAL PUBLISHING AG, vol. 10133, pp. 446-449, published 2017, peer-reviewed, In this demo, we present a very fast CNN-based style transfer system running on ordinary iPhones. The proposed app can transfer multiple pre-trained styles to the video stream captured from the built-in camera of an iPhone in around 140 ms per frame (7 fps). We extended the real-time neural style transfer network proposed by Johnson et al. [1] so that the network can learn multiple styles at the same time. In addition, we modified the CNN so that the amount of computation is reduced to one tenth of that of the original network. The very fast mobile implementation of the app is based on our paper [2], which describes several new ideas for implementing CNNs on mobile devices efficiently. Figure 1 shows an example usage of DeepStyleCam running on an iPhone SE.
Research paper (international conference proceedings), English

Automatic Retrieval of Action Video Shots from the Web Using Density-Based Cluster Analysis and Outlier Detection
Nga Hang; Keiji Yanai
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, vol. E99D, no. 11, pp. 2788-2795, published 2016-11, peer-reviewed, In this paper, we introduce a fully automatic approach to constructing action datasets from noisy Web video search results. The idea is based on combining cluster structure analysis and density-based outlier detection. For a specific action concept, we first download its top Web search videos and segment them into video shots. We then organize these shots into subsets using density-based hierarchical clustering. For each subset, we rank its shots by their outlier degrees, which are determined by their isolation with respect to their surroundings. Finally, we collect highly ranked shots as training data for the action concept. We demonstrate that action models trained on our data achieve promising precision in action classification while offering the advantage of fully automatic, scalable learning. Experimental results on UCF11, a challenging action dataset, show the effectiveness of our method.
Research paper (academic journal), English

Efficient mobile implementation of A CNN-based object recognition system
Keiji Yanai; Ryosuke Tanno; Koichi Okamoto
MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, Association for Computing Machinery, Inc, pp. 362-366, published 2016-10-01, peer-reviewed, Because of recent progress in deep learning, Convolutional Neural Network (CNN) based methods have outperformed conventional object recognition methods by a large margin. However, they require much more memory and computation than conventional methods. Therefore, it is not easy to implement a CNN-based object recognition system on a mobile device, where memory and computational power are limited. In this paper, we examine CNN architectures which are suitable for mobile implementation, and propose multi-scale network-in-networks (NIN) in which users can adjust the trade-off between recognition time and accuracy. We implemented multi-threaded mobile applications on both iOS and Android employing either NEON SIMD instructions or the BLAS library for fast computation of convolutional layers, and compared them in terms of recognition time on mobile devices. As a result, we found that BLAS is better for iOS, while NEON is better for Android, and that reducing the size of an input image by resizing is very effective for speeding up CNN-based recognition.
Research paper (international conference proceedings), English

Visual Event Mining from the Twitter Stream
Takamu Kaneko; Keiji Yanai
Proc. of ACM International World-Wide Web Conference (WWW), published 2016-04, peer-reviewed
Research paper (international conference proceedings), English

Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications
Ryosuke Tanno; Keiji Yanai
ADJUNCT PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON MOBILE AND UBIQUITOUS SYSTEMS: COMPUTING NETWORKING AND SERVICES (MOBIQUITOUS 2016), ASSOC COMPUTING MACHINERY, pp. 159-164, published 2016, peer-reviewed, In this study, we create "Caffe2C", which converts CNN (Convolutional Neural Network) models trained with the existing CNN framework, Caffe, into C-language source code for mobile devices. Since Caffe2C generates a single C source file which includes everything needed to execute the trained CNN, Caffe2C makes it easy to run CNN-based applications on any kind of mobile device or embedded device without a GPU. Moreover, Caffe2C achieves faster execution than the existing Caffe for iOS/Android and the OpenCV iOS/Android DNN class. The reasons are as follows: (1) direct conversion of trained CNN models to C code, (2) efficient use of NEON/BLAS with multi-threading, and (3) performing as much pre-computation as possible in the computation of CNNs. In addition, in this paper, we demonstrate the availability of Caffe2C by showing four kinds of CNN-based object recognition mobile applications.
Research paper (international conference proceedings), English

Overview of the ACM MultiMedia 2016 International Workshop on Multimedia Assisted Dietary Management
Stavroula Mougiakakou; Giovanni Maria Farinella; Keiji Yanai
MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, ASSOC COMPUTING MACHINERY, pp. 1489-1490, published 2016, peer-reviewed, This abstract provides a summary and overview of the 2nd international workshop on multimedia assisted dietary management.
Research paper (international conference proceedings), English

Automatic construction of action datasets using web videos with density-based cluster analysis and outlier detection
Nga Hang Do; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, vol. 9431, pp. 160-172, published 2016, peer-reviewed, In this paper, we introduce a fully automatic approach to constructing action datasets from noisy Web video search results. The idea is based on combining cluster structure analysis and density-based outlier detection. For a specific action concept, we first download its top Web search videos and segment them into video shots. We then organize these shots into subsets using density-based hierarchical clustering. For each subset, we rank its shots by their outlier degrees, which are determined by their isolation with respect to their surroundings. Finally, we collect highly ranked shots as training data for the action concept. We demonstrate that action models trained on our data achieve promising precision in action classification while offering the advantage of fully automatic, scalable learning. Experimental results on UCF11, a challenging action dataset, show the effectiveness of our method.
Research paper (international conference proceedings), English

Grillcam: A real-time eating action recognition system
Koichi Okamoto; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, vol. 9517, pp. 331-335, published 2016, In this demo, we demonstrate a mobile real-time eating action recognition system, GrillCam. It continuously recognizes a user’s eating actions and estimates the categories of eaten food items during mealtime. With this system, we can find the total amount of eaten food items and calculate the total calorie intake, even for meals where the amount of food to be eaten is not decided before eating starts. The system, implemented on a smartphone, continuously monitors eating actions during mealtime. It detects the moment when a user eats food, extracts food regions near the user’s mouth and classifies them. As a prototype, we implemented a mobile system targeting the Japanese-style meals “Yakiniku” and “Oden”. It can recognize five different kinds of ingredients for each of “Yakiniku” and “Oden” in real time, with classification rates of 87.7% and 80.8%, respectively. In a user study, it was evaluated as superior to a baseline system which employed no eating action recognition.
Research paper (international conference proceedings), English

A system to help amateurs take pictures of delicious looking food
Takao Kakimori; Makoto Okabe; Keiji Yanai; Rikio Onai
2016 IEEE SECOND INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), IEEE, pp. 456-461, published 2016, peer-reviewed, Recently, many people have begun to take pictures of meals and food either at home or in restaurants. These pictures are then uploaded to social networking services (SNS) where they are shared with friends. People want to take pictures of food that looks delicious, but they often find this difficult. This is because most people lack the knowledge required to take attractive pictures. There are many photography techniques in use, e.g., composition [1], lighting, color, focus, etc. The techniques used to take good pictures vary depending on the subject. Amateur photographers find it difficult to choose techniques and apply them appropriately. In this paper, we consider the composition of food photographs and develop a system to support amateurs taking pictures of meals and food to make the food look delicious. Our target users are food photography amateurs. Our target photographic subjects are food items on plates or dishes. Using our system, there are four steps to food photography: 1) the user provides information about the foods to be photographed, or our system automatically recognizes these food items, with the aid of a camera on a mobile phone; 2) our system suggests a composition and camera tilt that will result in a picture that makes the food look delicious; 3) the user arranges the food and dishes on the table, and sets the camera position and tilt; 4) finally, the user takes the picture. If the user is not satisfied with the suggestion, we allow the user to design a new composition quickly and easily using their mobile phone. We performed a usability study for our system followed by a subjective evaluation of the quality of the pictures taken using our system.
Research paper (international conference proceedings), English

CNN-based Style Vector for Style Image Retrieval
Shin Matsuo; Keiji Yanai
ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ASSOC COMPUTING MACHINERY, pp. 309-312, published 2016, peer-reviewed, In this paper, we examine the effectiveness, as an image feature for image retrieval, of the "style matrix" used in the works on style transfer and texture synthesis by Gatys et al. [2, 3]. A style matrix is represented by the Gram matrix of the feature maps in a deep convolutional neural network. We propose a style vector which is generated from a style matrix with PCA dimension reduction. In the experiments, we evaluate image retrieval performance using artistic images downloaded from Wikiarts.org regarding both artistic styles and artists. We obtained 40.64% and 70.40% average precision for style search and artist search, respectively, both of which outperformed the results obtained with common CNN features. In addition, we found that PCA compression boosted the performance.
Research paper (international conference proceedings), English

Event photo mining from Twitter using keyword bursts and image clustering
Takamu Kaneko; Keiji Yanai
NEUROCOMPUTING, ELSEVIER SCIENCE BV, vol. 172, pp. 143-158, published 2016-01, peer-reviewed, Twitter is a unique microblogging service which enables people to post and read not only short messages but also photos from anywhere. Since microblogs are different from traditional blogs in terms of timeliness and on-the-spot-ness, they include much information on various events around the world. In particular, photos posted to microblogs are useful for understanding what happens in the world visually and intuitively.
In this paper, we propose a system to discover events and related photos from the Twitter stream. We make use of "geo-photo tweets", which are tweets including both geotags and photos, in order to mine various events visually and geographically. Some works on event mining which utilize geotagged tweets have been proposed so far. However, they used no images but only textual analysis of tweet message texts. In this work, we detect events using visual information as well as textual information.
In the experiments, we analyzed 17 million geo-photo tweets posted in the United States and 3 million geo-photo tweets posted in Japan with the proposed method, and evaluated the results. We show some examples of detected events and their photos such as "rainbow", "fireworks", "Tokyo firefly festival" and "Halloween".
Research paper (academic journal), English

Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation
Wataru Shimoda; Keiji Yanai
COMPUTER VISION - ECCV 2016, PT IV, SPRINGER INT PUBLISHING AG, vol. 9908, pp. 218-234, published 2016, peer-reviewed, In this paper, we deal with a weakly supervised semantic segmentation problem where only training images with image-level labels are available. We propose a weakly supervised semantic segmentation method which is based on CNN-based class-specific saliency maps and fully-connected CRF. To obtain distinct class-specific saliency maps which can be used as unary potentials of CRF, we propose a novel method to estimate class saliency maps which significantly improves the method proposed by Simonyan et al. (2014) through the following changes: (1) using CNN derivatives with respect to feature maps of the intermediate convolutional layers with up-sampling instead of an input image; (2) subtracting the saliency maps of the other classes from the saliency maps of the target class to differentiate target objects from other objects; (3) aggregating multiple-scale class saliency maps to compensate for the lower resolution of the feature maps. After obtaining distinct class saliency maps, we apply fully-connected CRF using the class maps as unary potentials. Through experiments, we show that the proposed method outperformed state-of-the-art results on the PASCAL VOC 2012 dataset under the weakly-supervised setting.
Research paper (international conference proceedings), English

Foodness Proposal for Multiple Food Detection by Training of Single Food Images
Wataru Shimoda; Keiji Yanai
MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, ASSOC COMPUTING MACHINERY, pp. 13-21, published 2016, peer-reviewed, We propose a CNN-based "food-ness" proposal method which requires neither pixel-wise annotation nor bounding box annotation. Several proposal methods have been proposed so far to detect regions with high "object-ness". However, many of them generate a large number of candidates to raise the recall rate. Considering the recent advent of deeper CNNs, methods that generate a large number of proposals have processing times too long for practical use. Meanwhile, a fully convolutional network (FCN), which localizes target objects directly, was proposed. FCN saves computational cost, although it is essentially equivalent to the sliding window search. This approach made large progress and achieved significant success in various tasks.
In this paper, we propose an intermediate approach between the traditional proposal approach and the fully convolutional approach. Specifically, we propose a novel proposal method which generates high "food-ness" regions using fully convolutional networks and a back-propagation-based approach, trained with food images gathered from the Web.
Research paper (international conference proceedings), English

An Automatic Calorie Estimation System of Food Images on a Smartphone
Koichi Okamoto; Keiji Yanai
MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, ASSOC COMPUTING MACHINERY, pp. 63-70, published 2016, peer-reviewed, In recent years, due to rising health consciousness about eating, many people take care of their eating habits, and some record their daily diet regularly. To assist them, many mobile applications for recording everyday meals have been released so far. Some of them employ food image recognition which can estimate not only food names but also food calories. However, most such applications have problems, especially in usability. In this paper, we propose a novel single-image-based food calorie estimation system which runs on a smartphone as a standalone application without external recognition servers. The proposed system carries out food region segmentation, food region categorization, and calorie estimation automatically. The experiments and the user study confirmed the effectiveness of the proposed system.
Research paper (international conference proceedings), English

DeepFoodCam: A DCNN-based Real-time Mobile Food Recognition System
Ryosuke Tanno; Koichi Okamoto; Keiji Yanai
MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, ASSOC COMPUTING MACHINERY, p. 89, published 2016, peer-reviewed
Research paper (international conference proceedings), English

Weakly-Supervised Segmentation by Combining CNN Feature Maps and Object Saliency Maps
Wataru Shimoda; Keiji Yanai
2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE COMPUTER SOC, pp. 1935-1940, published 2016, peer-reviewed, CNN-based semantic segmentation methods generally assume that pixel-wise annotation is available, which is costly to obtain. On the other hand, image-level annotations are much easier to obtain than pixel-level annotations. In this work, we therefore focus on weakly-supervised semantic segmentation, the task of using training data with only image-level annotations.
In this paper, we propose a new CNN-based semantic segmentation method which uses both activation features calculated by feed-forwarding and object saliency maps obtained by back-propagation. As the CNN, we use VGG-16 pre-trained with the 1000-class ILSVRC dataset and fine-tune it with multi-label training using only an image-level labeled dataset. Through experiments, we show that the proposed method achieved state-of-the-art results on the PASCAL VOC 2012 dataset.
Research paper (international conference proceedings), English

A system to support the amateurs to take a delicious-looking picture of foods
Takao Kakimori; Makoto Okabe; Keiji Yanai; Rikio Onai
SIGGRAPH Asia 2015 Mobile Graphics and Interactive Applications, SA 2015, Association for Computing Machinery, Inc, published 2015-11-02, peer-reviewed, Recently, many people take pictures of food at home or in restaurants and upload them to a social networking service (SNS) to share with friends. People want to take delicious-looking pictures of food, but this is often difficult because most of them have no idea how to do so. There are many photography techniques for composition [Liu et al. 2010], lighting, color, focus, etc., and the techniques used differ for different types of subjects. The problem lies in the difficulty for amateur photographers of choosing and applying appropriate ones from so many techniques. In this paper, we focus on composition and develop a system to support amateurs in taking a delicious-looking picture of food in a short time. Our target users are amateurs in food photography, and our target photographic subjects are foods on dishes. There are four steps to take a picture using our system: 1) our system automatically recognizes foods on dishes; 2) our system suggests the composition and the camera tilt with which the user can take a delicious-looking picture; 3) the user arranges foods and dishes on the table, and sets the camera position and tilt; 4) finally, the user takes the picture.
Research paper (international conference proceedings), English

Automatic Action Dataset Construction from Web using Density-based Cluster Analysis and Outlier Detection
Nga Do; Keiji Yanai
Proc. of Pacific Rim Symposium on Image and Video Technology (PSIVT), published 2015-11, peer-reviewed
Research paper (international conference proceedings), English

UEC at TRECVID 2015 SIN task
Do Hang Nga; Keiji Yanai
Proc. of TRECVID Workshop, published 2015-11
Research paper (international conference proceedings), English

FoodCam: A real-time food recognition system on a smartphone
Yoshiyuki Kawano; Keiji Yanai
MULTIMEDIA TOOLS AND APPLICATIONS, SPRINGER, vol. 74, no. 14, pp. 5263-5287, published 2015-07, peer-reviewed, We propose a mobile food recognition system, FoodCam, the purposes of which are estimating the calories and nutrition of foods and recording a user's eating habits. In this paper, we propose image recognition methods which are suitable for mobile devices. The proposed method enables real-time food image recognition on a consumer smartphone. This characteristic is completely different from existing systems, which require sending images to an image recognition server. To recognize food items, a user first draws bounding boxes by touching the screen, and then the system starts food item recognition within the indicated bounding boxes. To recognize them more accurately, we segment each food item region by GrabCut, extract image features and finally classify it into one of one hundred food categories with a linear SVM. As image features, we adopt two kinds of features: one is the combination of the standard bag-of-features and color histograms with χ2 kernel feature maps, and the other is a HOG patch descriptor and a color patch descriptor with the state-of-the-art Fisher Vector representation. In addition, the system estimates the direction of food regions where a higher SVM output score is expected to be obtained, and shows the estimated direction as an arrow on the screen in order to ask the user to move the smartphone camera. This recognition process is performed repeatedly and continuously. We implemented this system as a standalone mobile application for Android smartphones so as to use multiple CPU cores effectively for real-time recognition. In the experiments, we achieved a 79.2% classification rate for the top 5 category candidates for a 100-category food dataset with the ground-truth bounding boxes when we used HOG and color patches with the Fisher Vector coding as image features.
In addition, we obtained a positive evaluation in a user study compared to a food recording system without object recognition.
Research paper (academic journal), English

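既存カテゴリの利用とクラウドソーシングによる食事画像データセットの自動拡張 [Automatic expansion of a food image dataset using existing categories and crowdsourcing]
河野憲之; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), vol. J98-D, no. 4, published 2015-04, peer-reviewed
Research paper (academic journal), Japanese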

Automatic Expansion of a Food Image Dataset Leveraging Existing Categories with Domain Adaptation
Yoshiyuki Kawano; Keiji Yanai
COMPUTER VISION - ECCV 2014 WORKSHOPS, PT III, SPRINGER-VERLAG BERLIN, vol. 8927, pp. 3-17, published 2015, peer-reviewed, In this paper, we propose a novel and effective framework to expand an existing image dataset automatically, leveraging existing categories and crowdsourcing. In particular, we focus on the expansion of food image datasets. The number of food categories is uncountable, since foods differ from place to place. A Japanese food dataset does not directly help build a French food recognition system. That is why food datasets for different food cultures have been built independently so far. In this paper, we propose to leverage existing knowledge on the food of other cultures through a generic "foodness" classifier and domain adaptation. This enables us not only to build food datasets for other cultures automatically from an original food image dataset, but also to save as much crowdsourcing cost as possible. In the experiments, we show the effectiveness of the proposed method over the baselines.
Research paper (international conference proceedings), English

Hand Detection and Tracking in Videos for Fine-Grained Action Recognition
Nga H. Do; Keiji Yanai
COMPUTER VISION - ACCV 2014 WORKSHOPS, PT I, SPRINGER-VERLAG BERLIN, vol. 9008, pp. 19-34, published 2015, peer-reviewed, In this paper, we develop an effective method of detecting and tracking hands in uncontrolled videos based on multiple cues including hand shape, skin color, upper body position and flow information. We apply our hand detection results to perform fine-grained human action recognition. We demonstrate that motion features extracted from hand areas can help classify actions even when they look familiar and they are associated with visually similar objects. We validate our method of detecting and tracking hands on the VideoPose2.0 dataset and apply our method of classifying actions to the playing-instrument group of the UCF-101 dataset. Experimental results show the effectiveness of our approach.
Research paper (international conference proceedings), English

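テレビ番組からの位置情報付き旅行映像データベースの自動構築 [Automatic construction of a geotagged travel video database from TV programs]
向井康貴; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), vol. J98-D, no. 1, published 2015-01, peer-reviewed
Research paper (academic journal), Japanese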

VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence
Nga H. Do; Keiji Yanai
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, vol. E98D, no. 1, pp. 166-172, published 2015-01, peer-reviewed, In this paper, we propose a novel ranking method called VisualTextualRank which ranks media data according to the relevance between the data and specified keywords. We apply our method to a video shot ranking system which aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action such as "surfing wave" (sport action) or "brushing teeth" (daily activity). Top ranked video shots are expected to be relevant to the keywords. While our baseline exploits only visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and textual information of their corresponding videos to improve shot ranking. We validated our framework on a database which was used by the baseline. Experiments showed that our proposed ranking method, VisualTextualRank, significantly improved the performance of the video shot extraction system over the baseline.
Research paper (academic journal), English

FOOD IMAGE RECOGNITION USING DEEP CONVOLUTIONAL NETWORK WITH PRE-TRAINING AND FINE-TUNING
Keiji Yanai; Yoshiyuki Kawano
2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, published 2015, peer-reviewed, In this paper, we examined the effectiveness of deep convolutional neural networks (DCNN) for the food photo recognition task. Food recognition is a kind of fine-grained visual recognition, which is a relatively harder problem than conventional image recognition. To tackle this problem, we sought the best combination of DCNN-related techniques such as pre-training with the large-scale ImageNet data, fine-tuning and activation features extracted from the pre-trained DCNN. From the experiments, we concluded that the fine-tuned DCNN pre-trained with 2000 ImageNet categories including 1000 food-related categories was the best method, achieving a top-1 accuracy of 78.77% for UEC-FOOD100 and 67.57% for UEC-FOOD256, both of which were the best results so far.
In addition, we applied the food classifier employing the best combination of the DCNN techniques to Twitter photo data. We achieved great improvements in food photo mining in terms of both the number of food photos and accuracy. In addition to its high classification accuracy, we found that DCNN was very suitable for large-scale image data, since it takes only 0.03 seconds to classify one food photo with a GPU.
Research paper (international conference proceedings), English

A VISUAL ANALYSIS ON RECOGNIZABILITY AND DISCRIMINABILITY OF ONOMATOPOEIA WORDS WITH DCNN FEATURES
Wataru Shimoda; Keiji Yanai
2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), IEEE, 掲載ページ -, 出版日 2015年, 査読付, In this paper, we examine the relation between onomatopoeia and images using a large number of Web images. The objective of this paper is to examine whether images corresponding to Japanese onomatopoeia words that express the feeling of visual appearance can be recognized by state-of-the-art visual recognition methods. In our work, we first collect images corresponding to onomatopoeia words using a Web image search engine, and then filter out noise images with an automatic image re-ranking method to obtain a clean dataset. Next, we analyze the recognizability of various kinds of onomatopoeia images using improved Fisher vector (IFV) and deep convolutional neural network (DCNN) features. In addition, we collect images corresponding to pairs of nouns and onomatopoeia words, and examine whether images associated with the same noun and different onomatopoeia words are visually discriminable. The experiments showed that the DCNN features extracted from layer 7 of Overfeat's network pre-trained with the ILSVRC 2013 data have a prominent ability to represent onomatopoeia images, and that most of the onomatopoeia words have visual characteristics which can be recognized.
研究論文（国際会議プロシーディングス）, 英語

A review of web image mining
Keiji Yanai
ITE Transactions on Media Technology and Applications, Institute of Image Information and Television Engineers, 3巻, 3号, 掲載ページ 156-169, 出版日 2015年, 査読付, 招待, In this paper, we review works related to big visual data on the Web in the literature of computer vision and multimedia research regarding the following points: (1) Web image acquisition for construction of visual concept database for image/video recognition, (2) Web image application for visual concept analysis and data-driven computer graphics, and (3) real-world sensing through Web images to detect location-dependent and event-related visual information.
研究論文（学術雑誌）, 英語
DOI URL

CNN-Based Food Image Segmentation Without Pixel-Wise Annotation
Wataru Shimoda; Keiji Yanai
NEW TRENDS IN IMAGE ANALYSIS AND PROCESSING - ICIAP 2015 WORKSHOPS, SPRINGER-VERLAG BERLIN, 9281巻, 掲載ページ 449-457, 出版日 2015年, 査読付, We propose a CNN-based food image segmentation which requires no pixel-wise annotation. The proposed method consists of food region proposals by selective search and bounding box clustering, back propagation based saliency map estimation with the CNN model fine-tuned with the UEC-FOOD100 dataset, GrabCut guided by the estimated saliency maps and region integration by non-maximum suppression. In the experiments, the proposed method outperformed RCNN regarding food region detection as well as the PASCAL VOC detection task.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Twitter Event Photo Detection Using both Geotagged Tweets and Non-geotagged Photo Tweets
Takamu Kaneko; Do Hang Nga; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT II, SPRINGER INT PUBLISHING AG, 9315巻, 掲載ページ 128-138, 出版日 2015年, 査読付, In this paper, we propose a system to detect event photos using geotagged tweets and non-geotagged photo tweets. In our previous work, only "geotagged photo tweets", which account for a very limited share of all tweets, were used for event photo detection. In the proposed system, we use geotagged tweets without photos for event detection, and non-geotagged photo tweets for event photo detection in addition to geotagged photo tweets. As a result, we detected about ten times as many photo events with higher accuracy than in the previous work.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Low-Bit Representation of Linear Classifier Weights for Mobile Large-Scale Image Classification
Yoshiyuki Kawano; Keiji Yanai
Proceedings 3rd IAPR Asian Conference on Pattern Recognition ACPR 2015, IEEE, -巻, 掲載ページ 489-493, 出版日 2015年, 査読付, In this paper, we propose an effective method to implement a large-scale visual recognition system with more than 1000 classes on mobile devices. Because the memory and storage of mobile devices such as smartphones are limited, an image recognition application should be as small as possible. To reduce the memory required for mobile visual recognition, we previously proposed a scalar-based classifier weight compression method [6]. Although it is very simple and effective, it has the drawback that performance degrades substantially with lower-bit representations. In this paper, we therefore propose an improved method that makes 2-bit and 1-bit representations feasible, and we conduct more comprehensive experiments, including larger-scale 10k image classification, with a combination of the proposed improved scalar-based compression method and product quantization.
研究論文（国際会議プロシーディングス）, 英語

Real-time Food Image Mining and Analysis from the Twitter Stream
Keiji Yanai; Yoshiyuki Kawano
Proc. of Pacific-Rim Conference on Multimedia, -巻, 出版日 2014年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Food Image Recognition using Deep Convolutional Features Pre-trained with Food-related Categories
Yoshiyuki Kawano; Keiji Yanai
Proc. of PCM Workshop on Multimedia Big Data Analytics (MBDA), 掲載ページ -, 出版日 2014年12月, 査読付, 招待
研究論文（国際会議プロシーディングス）, 英語

Object Categorization by Local Feature Matching with a Large Number of Web Images
Mizuki Akiyama; Yoshiyuki Kawano; Keiji Yanai
Proc. of PCM Workshop on Multimedia Big Data Analytics (MBDA), 掲載ページ -, 出版日 2014年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

An Analysis on Visual Recognizability of Onomatopoeia Using Web Images and DCNN features
Wataru Shimoda; Keiji Yanai
Proc. of PCM Workshop on Multimedia Big Data Analytics (MBDA), 掲載ページ -, 出版日 2014年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

FoodCam-256: A large-scale real-time mobile food recognition system employing high-dimensional features and compression of classifier weights
Yoshiyuki Kawano; Keiji Yanai
MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, Association for Computing Machinery, Inc, Demo paper巻, 掲載ページ 761-762, 出版日 2014年11月03日, 査読付, In the demo, we demonstrate a large-scale food recognition system employing high-dimensional Fisher Vectors and linear one-vs-rest classifiers. Since all image recognition processing is performed on a smartphone, the system does not require an external image recognition server and runs on an ordinary smartphone in real time. The proposed system can recognize 256 kinds of food by using the UEC-Food256 food image dataset, which we recently built ourselves, as a training dataset. To implement an image recognition system employing high-dimensional features on mobile devices, we propose a linear weight compression method to save memory. In the experiments, we showed that the proposed compression method causes only a small performance loss while reducing the size of the weight vectors to 1/8. The proposed system provides not only food recognition but also estimation of food calories and nutrition and recording of a user's eating habits. In the experiments with 100 kinds of food categories, we achieved a 74.4% classification rate for the top 5 category candidates. The prototype system is open to the public as an Android-based smartphone application.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

UEC at TRECVID 2014 SIN task
Keiji Yanai; Hiroyoshi Harada; Do Hang Nga
Proc. of TRECVID Workshop, 掲載ページ -, 出版日 2014年11月
研究論文（国際会議プロシーディングス）, 英語

Analyzing the similarities of actions based on video clustering
Vu Gia Truong; Do Hang Nga; Keiji Yanai
Proc. of International Workshop on Modern Science and Technology (IWMST), 掲載ページ -, 出版日 2014年10月, 査読付
研究論文（国際会議プロシーディングス）, 英語

道案内動画の作成のためのウェアラブルカメラ映像の自動要約
岡本昌也; 柳井啓司
電子情報通信学会論文誌D, J97-D巻, 8号, 出版日 2014年08月
研究論文（学術雑誌）, 日本語

Realtime Eating Action Recognition System on a Smartphone
Koichi Okamoto; Keiji Yanai
Proc. of ICME Workshop on Mobile Multimedia Computing, -巻, 出版日 2014年07月, 査読付
研究論文（国際会議プロシーディングス）, 英語

A cooking recipe recommendation system with visual recognition of food ingredients
Keiji Yanai; Takuma Maruyama; Yoshiyuki Kawano
International Journal of Interactive Mobile Technologies, International Association of Online Engineering, 8巻, 2号, 掲載ページ 28-34, 出版日 2014年, 査読付, In this paper, we propose a cooking recipe recommendation system which runs on a consumer smartphone as an interactive mobile application. The proposed system employs real-time visual object recognition of food ingredients, and recommends cooking recipes related to the recognized food ingredients. Thanks to visual recognition, a user can find related cooking recipes instantly just by pointing the smartphone's built-in camera at food ingredients. The objective of the proposed system is to help people who cook decide on a recipe at a grocery store or in a kitchen. In the current implementation, the system can recognize 30 kinds of food ingredients in 0.15 seconds, and it achieved an 83.93% recognition rate within the top six candidates. Through a user study, we confirmed the effectiveness of the proposed system.
研究論文（学術雑誌）, 英語
DOI URL

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance
Masaya Okamoto; Keiji Yanai
IMAGE AND VIDEO TECHNOLOGY, PSIVT 2013, SPRINGER-VERLAG BERLIN, 8333巻, 掲載ページ 431-442, 出版日 2014年, 査読付, In this paper, we propose a method to summarize an egocentric moving video (a video recorded by a moving wearable camera) for generating a walking route guidance video. To summarize an egocentric video, we analyze it by applying pedestrian crosswalk detection as well as ego-motion classification, and estimate an importance score for each section of the given video. Based on the estimated importance scores, we dynamically control the video playback speed instead of generating a summarized video file in advance. In the experiments, we prepared an egocentric moving video dataset containing more than one hour of video in total, and evaluated the crosswalk detection and ego-motion classification methods. A user study evaluating the whole system showed that the proposed method is much better than a simple baseline summarization method without video analysis.
研究論文（国際会議プロシーディングス）, 英語

Automatic extraction of relevant video shots of specific actions exploiting Web data
Do Hang Nga; Keiji Yanai
COMPUTER VISION AND IMAGE UNDERSTANDING, ACADEMIC PRESS INC ELSEVIER SCIENCE, 118巻, 掲載ページ 2-15, 出版日 2014年01月, 査読付, Video sharing websites have recently become a tremendous video source, which is easily accessible at no cost. This has encouraged researchers in the action recognition field to construct action databases exploiting Web sources. However, Web sources are generally too noisy to be used directly as a recognition database. Thus, building an action database from Web sources has required extensive human effort for manual selection of video parts related to specified actions. In this paper, we introduce a novel method to automatically extract video shots related to given action keywords from Web videos according to their metadata and visual features. First, we select relevant videos among tagged Web videos based on the relevance between their tags and the given keyword. After segmenting the selected videos into shots, we rank these shots by exploiting their visual features in order to obtain shots of interest as top-ranked shots. In particular, we propose to adopt Web images and a human pose matching method in the shot ranking step, and show that this helps to boost more relevant shots to the top. This unsupervised method only requires action keywords such as "surf wave" or "bake bread" to be provided at the beginning. We conducted large-scale experiments on various kinds of human actions as well as non-human actions and obtained promising results. (C) 2013 Elsevier Inc. All rights reserved.
研究論文（学術雑誌）, 英語
DOI URL

A dense SURF and triangulation based spatio-temporal feature for action recognition
Do Hang Nga; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8325巻, 1号, 掲載ページ 375-387, 出版日 2014年, 査読付, In this paper, we propose a novel method of extracting spatio-temporal features from videos. Given a video, we extract its features according to every set of N frames. The value of N is small enough to guarantee the temporal denseness of our features. For each frame set, we first extract dense SURF keypoints from its first frame. We then select points with the most likely dominant and reliable movements, and consider them as interest points. In the next step, we form triangles of interest points using Delaunay triangulation and track points within each triple through the frame set. We extract one spatio-temporal feature from each triangle based on its shape feature along with the visual features and optical flows of its points. This enables us to extract spatio-temporal features based on groups of related points and their trajectories. Hence the features can be expected to be robust and informative. We apply Fisher Vector encoding to represent videos using the proposed spatio-temporal features. We conduct experiments on several challenging benchmarks, and show the effectiveness of our proposed method. © 2014 Springer International Publishing.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

FoodCam: A real-time mobile food recognition system employing Fisher Vector
Yoshiyuki Kawano; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8326巻, 2号, 掲載ページ 369-373, 出版日 2014年, 査読付, In the demo, we demonstrate a mobile food recognition system with Fisher Vector and linear one-vs-rest SVMs which enables us to record our food habits easily. In the experiments with 100 kinds of food categories, we achieved a 79.2% classification rate for the top 5 category candidates when the ground-truth bounding boxes are given. The prototype system is open to the public as an Android-based smartphone application. © 2014 Springer International Publishing.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Offline 1000-Class Classification on a Smartphone
Yoshiyuki Kawano; Keiji Yanai
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), IEEE, 掲載ページ 193-194, 出版日 2014年, 査読付, In this demo, we propose an offline large-scale image classification system on a smartphone. The proposed system can classify 1000-class objects in the ILSVRC2012 dataset in 0.270 seconds. To implement a 1000-class object classification system, we compress the weight vectors of linear classifiers, which leads to only a slight performance loss.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

ILSVRC on a smartphone
Yoshiyuki Kawano; Keiji Yanai
IPSJ Transactions on Computer Vision and Applications, Information Processing Society of Japan, 6巻, 掲載ページ 83-87, 出版日 2014年, 査読付, In this work, to the best of our knowledge, we propose a stand-alone large-scale image classification system running on an Android smartphone. The objective of this work is to prove that mobile large-scale image classification requires no communication with external servers. To do that, we propose a scalar-based compression method for the weight vectors of linear classifiers. As an additional characteristic, the proposed method does not need to uncompress the compressed vectors to evaluate the classifiers, which saves recognition time. We have implemented a large-scale image classification system on an Android smartphone, which can perform 1000-class classification for a given image in 0.270 seconds. In the experiment, we show that compressing the weights to 1/8 led to only a 0.80% performance loss for 1000-class classification with the ILSVRC2012 dataset. In addition, the experimental results indicate that weight vectors compressed to low bit widths, even in the binarized case (bit = 1), are still valid for classification of high-dimensional vectors.
研究論文（学術雑誌）, 英語
DOI URL
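
One way to picture a scalar weight compression scheme whose classifier scores are computed without uncompressing the weights is uniform quantization. The sketch below is an assumption made for illustration, not the method from the paper: `quantize_weights`, `compressed_score`, and the uniform grid between the minimum and maximum weight are all hypothetical. Because each weight is approximated as `lo + step * code`, the dot product splits into `lo * sum(x) + step * sum(x_i * code_i)`, so only the integer codes are touched in the inner loop.

```python
def quantize_weights(weights, bits):
    """Map each weight to an integer code on a uniform grid of 2**bits levels."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def compressed_score(features, codes, lo, step):
    """Approximate dot(features, weights) directly from the integer codes."""
    return lo * sum(features) + step * sum(x * c for x, c in zip(features, codes))


if __name__ == "__main__":
    weights = [-1.0, 0.0, 1.0, 0.5, -0.25]
    features = [0.2, 1.5, -0.7, 3.0, 1.0]
    exact = sum(x * w for x, w in zip(features, weights))
    for bits in (8, 4, 2, 1):
        codes, lo, step = quantize_weights(weights, bits)
        approx = compressed_score(features, codes, lo, step)
        print(bits, "bits: exact", exact, "approx", approx)
```

With 32-bit floating-point weights, 4-bit codes would correspond to a 1/8 reduction in size; the per-weight rounding error is bounded by half a grid step, which is why the error grows as the bit width shrinks.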

REAL-TIME EATING ACTION RECOGNITION SYSTEM ON A SMARTPHONE
Koichi Okamoto; Keiji Yanai
2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, ELECTRON DEVICES SOC & RELIABILITY GROUP, -巻, 出版日 2014年, 査読付, Recently, many mobile applications for recording everyday meals for dieting have become popular. Some of them can recognize the names of food items in a meal just from a photo. However, such image-recognition-based food recording systems require taking meal photos before eating, so they are not applicable to meals in which the amount of food to be eaten is not decided beforehand, such as large platters for sharing and barbecue-style dishes.
In this paper, we therefore propose a mobile real-time eating action recognition system. It continuously recognizes the user's eating actions and estimates the categories of eaten food items during mealtime. With this system, we can find out the total amount of eaten food items and calculate the total calories of eaten foods even for meals where the amount of food to be eaten is not decided before starting to eat.
The system, implemented on a smartphone, continuously monitors eating actions during mealtime. It detects the moment when a user eats food, extracts food regions near the user's mouth, and classifies them. In the experiments, we implemented a mobile system targeting Japanese-style "Yakiniku", where people eat meat and vegetables while grilling. It can recognize five different kinds of "Yakiniku" ingredients, such as beef, carrot and pumpkin, in real time. It achieved a 74.8% classification rate, and a user study rated it superior to a baseline system which employed no eating action recognition.
研究論文（国際会議プロシーディングス）, 英語

Food image recognition with deep convolutional features
Yoshiyuki Kawano; Keiji Yanai
UbiComp 2014 - Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Association for Computing Machinery, Inc, -巻, 掲載ページ 589-593, 出版日 2014年, 査読付, In this paper, we report that features obtained from a Deep Convolutional Neural Network greatly boost food recognition accuracy when integrated with conventional hand-crafted image features, namely Fisher Vectors with HoG and Color patches. In the experiments, we achieved 72.26% top-1 accuracy and 92.00% top-5 accuracy on the 100-class food dataset UEC-FOOD100, which greatly outperforms the best classification accuracy reported so far for this dataset, 59.6%.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Real-time Photo Mining from the Twitter Stream: Event Photo Discovery and Food Photo Detection
Keiji Yanai; Takamu Kaneko; Yoshiyuki Kawano
2014 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), IEEE, 掲載ページ 295-302, 出版日 2014年, 招待, Many people post photos as well as short messages to Twitter every minute from all over the world. By monitoring the Twitter stream, we can obtain various kinds of photos with texts. In this paper, as case studies of real-time Twitter photo mining, we introduce our ongoing projects on event photo discovery and food photo mining from the Twitter stream.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

複数品目が含まれる食事画像の認識における共起関係の利用
松田裕司; 柳井啓司
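電子情報通信学会論文誌D, 一般社団法人電子情報通信学会, J96-D巻, 8号, 掲載ページ 1724-1730, 出版日 2013年08月, 査読付, A meal generally contains several dishes, and these dishes tend to appear in particular combinations. In this paper, we propose a method that takes co-occurrence relations between dishes into account when recognizing images containing multiple dishes. The proposed method obtains the final scores by re-ranking SVM output scores with Manifold Ranking using co-occurrence probabilities. The co-occurrence probabilities were estimated from co-occurrence frequencies in a database and from text on the Web. In experiments on 100-category food classification of images containing multiple food items, when ten candidates were presented, the method using co-occurrence probabilities estimated from the database improved the classification rate by 8.8 points over the conventional method that uses no co-occurrence relations, achieving 64.6%. This shows that using co-occurrence relations between dishes is effective for recognizing food images containing multiple items.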
研究論文（学術雑誌）, 日本語
URL

Real-time Mobile Food Recognition System
Yoshiyuki Kawano; Keiji Yanai
Proc. of IEEE CVPR International Workshop on Mobile Vision, 掲載ページ -, 出版日 2013年06月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Real-time Mobile Food Recognition System
Yoshiyuki Kawano; Keiji Yanai
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), IEEE, 掲載ページ 1-7, 出版日 2013年, 査読付, We propose a mobile food recognition system whose purposes are estimating the calories and nutrition of foods and recording a user's eating habits. Since all image recognition processing is performed on a smartphone, the system does not need to send images to a server and runs on an ordinary smartphone in real time.
To recognize food items, a user first draws bounding boxes by touching the screen, and then the system starts food item recognition within the indicated bounding boxes. To recognize them more accurately, we segment each food item region by GrabCut, extract a color histogram and SURF-based bag-of-features, and finally classify it into one of fifty food categories with a linear SVM and a fast χ² kernel. In addition, the system estimates the direction of the food region where a higher SVM output score is expected to be obtained, and shows it as an arrow on the screen in order to ask the user to move the smartphone camera. This recognition process is performed repeatedly about once a second. We implemented this system as an Android smartphone application that uses multiple CPU cores effectively for real-time recognition.
In the experiments, we achieved an 81.55% classification rate for the top 5 category candidates when the ground-truth bounding boxes are given. In addition, a user study gave the system a positive evaluation compared with a food recording system without object recognition.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Visual analysis of tag co-occurrence on nouns and adjectives
Yuya Kohara; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7732巻, 1号, 掲載ページ 47-57, 出版日 2013年, 査読付, In recent years, due to the spread of photo sharing Web sites such as Flickr and Picasa, we can easily put our own photos on the Web and show them to the public. To make photos easy to search for, it is common to add several keywords, called "tags", when uploading them. However, most tags are added one by one independently, without much consideration of the associations between them. In this paper, as a preparation for realizing simultaneous recognition of nouns and adjectives, we examine the visual relationship between tags, particularly noun tags and adjective tags, by analyzing image features of a large number of tagged photos on social media sites with mutual information. As a result, it turned out that the mutual information between some nouns such as "car" and "sea" and color-related adjectives such as "red" and "blue" was relatively high, which showed that their relations were stronger. © Springer-Verlag 2013.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

VISUAL EVENT MINING FROM GEO-TWEET PHOTOS
Takamu Kaneko; Keiji Yanai
ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, 掲載ページ -, 出版日 2013年, 査読付, In this paper, we propose a system to mine events visually from the Twitter stream by making use of "geo-tweet photos" which are tweets including both geotags and photos. Some works on event mining which utilize geotagged tweets have been proposed so far. However, they used no images but only textual analysis of tweet texts. In this work, we detect events using visual information as well as textual information. In the experiments, we show some examples of detected events and their photos such as "rainbow", "fireworks" and "Tokyo firefly festival".
研究論文（国際会議プロシーディングス）, 英語

[DEMO PAPER] MIRURECIPE: A MOBILE COOKING RECIPE RECOMMENDATION SYSTEM WITH FOOD INGREDIENT RECOGNITION
Yoshiyuki Kawano; Takanori Sato; Takuma Maruyama; Keiji Yanai
ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, (demo paper)巻, 出版日 2013年, 査読付, In this demo, we demonstrate a cooking recipe recommendation system which runs on a consumer smartphone. The proposed system carries out object recognition on food ingredients in a real-time way, and recommends cooking recipes related to the recognized food ingredients. By only pointing a built-in camera on a mobile device to food ingredients, the user can obtain a recipe list instantly. The objective of the proposed system is to assist people who cook to decide a cooking recipe at grocery stores or at a kitchen. In the current implementation, the system can recognize 30 kinds of food ingredient in 0.15 seconds, and it achieved the 83.93% recognition rate within the top six candidates.
研究論文（国際会議プロシーディングス）, 英語

Twitter visual event mining system
Takamu Kaneko; Hiroyoshi Harada; Keiji Yanai
Electronic Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2013, (demo paper)巻, 出版日 2013年, 査読付, In this demo, we demonstrate a system to mine events visually from the Twitter stream by making use of 'geo-tweet photos'. Some works on event mining which utilize geotagged tweets have been proposed so far. However, they used no images but only textual analysis of tweet texts. In this work, we detect events using visual information as well as textual information. As far as we know, this is the first work to mine event photos automatically from a huge number of Twitter photos. In the experiments, we show some examples of detected events and their photos such as 'blooming cherry blossom' and 'Tokyo firefly festival'. © 2013 IEEE.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Large-scale web video shot ranking based on visual features and tag co-occurrence
Do Hang Nga; Keiji Yanai
MM 2013 - Proceedings of the 2013 ACM Multimedia Conference, 掲載ページ 525-528, 出版日 2013年, 査読付, In this paper, we propose a novel ranking method, VisualTextualRank, which extends [1] and [2]. Our method is based on random walk over bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and textual information of their corresponding videos to improve shot ranking. We apply our proposed method to the system of extracting automatically relevant video shots of specific actions from Web videos [3]. Based on our experimental results, we demonstrate that our ranking method can improve the performance of video shot retrieval. Copyright © 2013 ACM.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Rapid mobile object recognition using fisher vector
Yoshiyuki Kawano; Keiji Yanai
Proceedings - 2nd IAPR Asian Conference on Pattern Recognition, ACPR 2013, IEEE Computer Society, 掲載ページ 476-480, 出版日 2013年, 査読付, We propose a real-time object recognition method for a smartphone, which consists of light-weight local features, Fisher Vector and linear SVM. As light local descriptors, we adopt a HOG Patch descriptor and a Color Patch descriptor, and sample them densely from an image. Then we encode them with the Fisher Vector representation, which greatly reduces the number of visual words needed. As a classifier, we use a linear SVM, whose computational cost is very low. In the experiments, we achieved a 79.2% classification rate for the top 5 category candidates on a 100-category food dataset. It outperformed the results using a conventional bag-of-features representation with a chi-square-RBF-kernel-based SVM. Moreover, food recognition takes only 0.065 seconds, which is four times faster than the existing work. © 2013 IEEE.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

A Spatio-Temporal Feature based on Triangulation of Dense SURF
Do Hang Nga; Keiji Yanai
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), IEEE, 掲載ページ 420-427, 出版日 2013年, 査読付, In this paper, we propose a spatio-temporal feature which is based on the appearance and movement of interest SURF keypoints. Given a video, we extract its spatiotemporal features according to every small set of frames. For each frame set, we first extract dense SURF keypoints from its first frame and estimate their optical flows at each frame. We then detect camera motion and compensate flow vectors in case camera motion exists. Next, we select interest points based on their movement based relationship through the frame set. We then apply Delaunay triangulation to form triangles of selected points. From each triangle we extract its shape feature along with trajectory based visual features of its points. We show that concatenating these features with SURF feature can form a spatio-temporal feature which is comparable to the state of the art. Our proposed spatio-temporal feature is supposed to be robust and informative since it is not based on characteristics of individual points but groups of related interest points. We apply Fisher Vector encoding to represent videos using the proposed feature. We conduct various experiments on UCF101, the largest action dataset of realistic videos up to date, and show the effectiveness of our proposed method.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Yuji Matsuda and Keiji Yanai
Yuji Matsuda; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition, 掲載ページ -, 出版日 2012年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

UEC at TRECVID 2012 SIN and MED task
Kazuya Hizume; Keiji Yanai
Proc. of TRECVID Workshop, 掲載ページ -, 出版日 2012年11月
研究論文（国際会議プロシーディングス）, 英語

Entropy-Based Analysis of Visual and Geolocation Concepts in Images
Keiji Yanai; Hidetoshi Kawakubo; Kobus Barnard
Multimedia Information Extraction: Advances in Video, Audio, and Imagery Analysis for Search, Data Mining, Surveillance, and Authoring, John Wiley and Sons, 掲載ページ 63-80, 出版日 2012年08月24日, 査読付
論文集(書籍)内論文, 英語
DOI URL

候補領域推定に基づく複数品目食事画像認識
松田裕司; 甫足創; 柳井啓司
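電子情報通信学会論文誌D, 一般社団法人電子情報通信学会, J95-D巻, 8号, 掲載ページ 1554-1564, 出版日 2012年08月, 査読付, In this work, to record meal contents with little effort, we built a recognition engine that uses image recognition to display candidate names of the dishes estimated to be contained in an image. We previously built a recognition engine that used Multiple Kernel Learning to integrate multiple features, such as color features and local region features, for training and classification. In this work, we improve that food image recognition method. Using fast sliding-window search, region segmentation, and circle detection, we estimate candidate locations of dishes in the image and classify those regions with the previous method, which makes it possible to handle images containing multiple dishes. In the experiments, we evaluated classification performance on 100 kinds of dishes. By combining multiple region detection methods, when ten candidates were displayed, the classification rate reached 69.6% for images containing a single dish, an improvement of 5.4 points over the previous method, and 55.5% for images containing multiple dishes, an improvement of 40.1 points. The proposed method was thus shown to be effective particularly for recognizing food images containing multiple dishes.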
研究論文（学術雑誌）, 日本語
URL

Visual Analysis on Relations between Nouns and Adjectives Using a Large Number of Web Images
Yuuya Kohara; Keiji Yanai
Proc. of International Workshop on Modern Science and Technology (IWMST), 掲載ページ -, 出版日 2012年08月
研究論文（国際会議プロシーディングス）, 英語

Multiple-Food Recognition Considering Co-occurrence Employing Manifold Ranking
Yuji Matsuda; Keiji Yanai
2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), IEEE, 掲載ページ 2017-2020, 出版日 2012年, 査読付, In this paper, we propose a method to recognize food images which include multiple food items by considering co-occurrence statistics of food items. The proposed method employs a manifold ranking method which has been applied successfully to image retrieval in the literature. In the experiments, we prepared co-occurrence matrices of 100 food items using various kinds of data sources including Web texts, Web food blogs and our own food database, and evaluated the final results obtained by applying manifold ranking. The results showed that co-occurrence statistics obtained from a food photo database are very helpful in improving the classification rate within the top ten candidates.
研究論文（国際会議プロシーディングス）, 英語
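
The re-ranking idea in the abstract above can be sketched with the standard manifold ranking iteration f ← αSf + (1 − α)y, where S is a symmetrically normalized affinity matrix and y holds the initial scores. Treating the food co-occurrence matrix as the affinity matrix and the per-class classifier scores as y is an assumption made for illustration, as are the function name, the value of `alpha`, and the toy data; the paper's exact construction may differ.

```python
import math


def manifold_rank(cooccurrence, scores, alpha=0.5, n_iter=100):
    """Re-rank class scores by propagating them over a co-occurrence graph.

    cooccurrence[i][j] >= 0 is a hypothetical symmetric co-occurrence weight
    between classes i and j; scores are the initial per-class scores.
    """
    n = len(scores)
    degree = [sum(row) for row in cooccurrence]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2); isolated classes get zeros.
    s = [
        [
            cooccurrence[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    f = list(scores)
    for _ in range(n_iter):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * scores[i]
            for i in range(n)
        ]
    return f


if __name__ == "__main__":
    # Toy classes: 0 = rice, 1 = miso soup, 2 = pizza. Rice and miso soup co-occur.
    cooc = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    initial = [1.0, 0.2, 0.3]
    print(manifold_rank(cooc, initial))
```

In the toy example, miso soup starts with a lower score than pizza but overtakes it after propagation, because it co-occurs with the confidently detected rice.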

A SURF-based spatio-temporal feature for feature-fusion-based action recognition
Akitsugu Noguchi; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6553巻, 1号, 掲載ページ 153-167, 出版日 2012年, 査読付, In this paper, we propose a novel spatio-temporal feature which is useful for feature-fusion-based action recognition with Multiple Kernel Learning (MKL). The proposed spatio-temporal feature is based on moving SURF interest points grouped by Delaunay triangulation and on their motion over time. Since this local spatio-temporal feature has different characteristics from holistic appearance features and motion features, combining it with visual and motion features using MKL can boost action recognition performance for both controlled videos such as the KTH dataset and uncontrolled videos such as YouTube datasets. In the experiments, we evaluate our method using the KTH dataset and the YouTube dataset. As a result, we obtain a classification rate of 94.5% for the KTH dataset, which is almost equivalent to the state of the art, and 80.4% for the YouTube dataset, which greatly outperforms the state of the art. © 2012 Springer-Verlag.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

World Seer : A realtime geo-tweet photo mapping system
Keiji Yanai
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012, (demo paper)巻, 出版日 2012年, 査読付, Twitter is a unique microblog which differs from conventional social media in its immediacy. Many Twitter users send messages on the spot with mobile phones or smartphones, and some of them send tweets with photos and geotags, which can be regarded as geotagged photos. Geotagged tweet photos are very useful for understanding what is currently happening around the world. In the demo, we introduce "World Seer", a real-time geo-tweet photo mapping system. Users can see the latest geo-tweet photos related to given keywords and areas on online maps. The system shows geo-tweet photos not only on the map, but also in the street view. In addition, for some of the geo-tweet photos, the system can show representative photos for the given locations and times by employing the GeoVisualRank method, which takes into account both the visual features of photos and the proximity of geotags. Copyright © 2012 ACM.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Automatic collection of web video shots corresponding to specific actions using web images
Do Hang Nga; Keiji Yanai
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 掲載ページ 15-20, 出版日 2012年, 査読付, In this paper, we apply Web images to the problem of automatically extracting video shots corresponding to specific actions from Web videos. Our framework modifies the unsupervised method for automatically collecting Web video shots corresponding to given actions, which we proposed last year [9]. For each action, following that work, we first exploit tag relevance to gather the 200 most relevant videos for the given action and segment each video into several video shots. Shots are then converted into bags of spatio-temporal features and ranked by the VisualRank method. We refine the approach by introducing Web action images into the shot ranking step. For human actions, we select images by applying Poselets [2] to detect humans. We test our framework on 28 human action categories whose precision values were 20% or below and 8 non-human action categories whose precision values were less than 15% in [9]. The results show that our model can improve the precision by approximately 6% over the 28 human action categories and 16% over the 8 non-human action categories. © 2012 IEEE.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Recognition of multiple-food images by detecting candidate regions
Yuji Matsuda; Hajime Hoashi; Keiji Yanai
Proceedings - IEEE International Conference on Multimedia and Expo, 掲載ページ 25-30, 出版日 2012年, 査読付, In this paper, we propose a two-step method to recognize multiple-food images by detecting candidate regions with several methods and classifying them with various kinds of features. In the first step, we detect several candidate regions by fusing the outputs of several region detectors including Felzenszwalb's deformable part model (DPM) [1], a circle detector and the JSEG region segmentation. In the second step, we apply a feature-fusion-based food recognition method to the bounding boxes of the candidate regions with various kinds of visual features including bag-of-features of SIFT and CSIFT with spatial pyramid (SP-BoF), histogram of oriented gradient (HoG), and Gabor texture features. In the experiments, we estimated ten food candidates for multiple-food images in descending order of the confidence scores. As a result, we achieved a classification rate of 55.8% on a multiple-food image data set, improving on the baseline that uses only DPM by 14.3 points. This demonstrates that the proposed two-step method is effective for recognition of multiple-food images. © 2012 IEEE.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

VISUALIZATION OF REAL-WORLD EVENTS WITH GEOTAGGED TWEET PHOTOS
Yusuke Nakaji; Keiji Yanai
2012 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, 掲載ページ 272-277, 出版日 2012年, Recently, microblogs such as Twitter, which enable people to post and read short messages from anywhere, have become very common. Since microblogs differ from traditional blogs in being instant and on the spot, they include much more information on various events happening around the world. In addition, some of the messages posted to Twitter include photos and geotags as well as text. From them, we can intuitively learn what is happening and where.
We therefore propose a method to select photos related to given real-world events from geotagged Twitter messages (tweets), taking advantage of the geotags and visual features of the photos. We implemented a system which can visualize real-world events on an online map.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Real-time mobile recipe recommendation system using food ingredient recognition
Takuma Maruyama; Yoshiyuki Kawano; Keiji Yanai
IMMPD 2012 - Proceedings of the 2012 ACM Workshop on Interactive Multimedia on Mobile and Portable Devices, Co-located with ACM Multimedia 2012, 掲載ページ 27-33, 出版日 2012年, 査読付, In this paper, we propose a mobile cooking recipe recommendation system employing object recognition for food ingredients such as vegetables and meats. The proposed system carries out object recognition on food ingredients in real time on an Android-based smartphone, and recommends cooking recipes related to the recognized food ingredients. By simply pointing the built-in camera of a mobile device at food ingredients, the user can obtain a recipe list instantly. As an object recognition method, we adopt bag-of-features with SURF and color histograms extracted from multiple images as image features, and a linear SVM with the one-vs-rest strategy as a classifier. We built a short-video database of 30 kinds of food ingredients for the experiments. With this database, we achieved a recognition rate of 83.93% within the top six candidates. In the experiment, we also conducted a user study comparing mobile recipe recommendation systems with and without ingredient recognition. © 2012 ACM.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

A Travel Planning System Based on Travel Trajectories Extracted from a Large Number of Geotagged Photos on the Web
Kohya Okuyama; Keiji Yanai
Proc. of Pacific-Rim Conference on Multimedia, 掲載ページ -, 出版日 2011年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

UEC at TRECVID 2011 SIN and MED task
Kazuya Hizume; Keiji Yanai
Proc. of TRECVID Workshop, 掲載ページ -, 出版日 2011年11月
研究論文（国際会議プロシーディングス）

Folksonomyを用いた画像特徴とタグ共起に基づく画像オントロジーの自動構築
秋間雄太; 川久保秀敏; 柳井啓司
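電子情報通信学会論文誌D, 一般社団法人電子情報通信学会, J94-D巻, 8号, 掲載ページ 1248-1259, 出版日 2011年08月, In recent years, with the emergence of Folksonomy, adding semantic value to databases through tags and similar annotations has advanced, but few databases incorporate relations between concepts such as a hierarchical structure. In this study, we therefore propose a method for building an image database that takes a semantic hierarchical structure into account. To build the hierarchy, we first remove noise from each concept in a large body of image data. We then represent each concept with three kinds of vector representation (one using visual features, one using tags, and one integrating visual features and tags), estimate the distances between concepts with a distance measure based on JS divergence, and compute a concept entropy so that parent-child relations can be inferred from the breadth of each concept. Finally, we examined the resulting hierarchies built from visual features only, from tag features only, and from combined tag and visual features. As a result, we confirmed hierarchies characteristic of the visual features and of the tag information respectively, and the integrated hierarchy took both into account, successfully producing a new hierarchy that contains the characteristics of each. The constructed hierarchy includes relations between concepts that are hard to discover by hand, suggesting that it may be useful for image search.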
研究論文（学術雑誌）, 日本語
URL

Geotagged Image Recognition by Combining Three Different Kinds of Geo location Features
Keita Yaegashi; Keiji Yanai
COMPUTER VISION - ACCV 2010, PT II, SPRINGER-VERLAG BERLIN, 6493巻, 掲載ページ 360-373, 出版日 2011年, 査読付, Scenes and objects represented in photos have a causal relationship to the places where they are taken. In this paper, we propose using geo-information such as aerial photos and location-related texts as features for geotagged image recognition and fusing them with Multiple Kernel Learning (MKL). Through experiments, we have verified the possibility of reflecting location contexts in image recognition by evaluating not only recognition rates but also the feature fusion weights estimated by MKL. As a result, the mean average precision (MAP) for 28 categories increased to 80.87% with the proposed method, compared with 77.71% for the baseline. In particular, for the categories related to location-dependent concepts, MAP improved by 6.57 points.
研究論文（国際会議プロシーディングス）, 英語

GeoVisualRank: A ranking method of geotagged images considering visual similarity and geo-location proximity
Hidetoshi Kawakubo; Keiji Yanai
Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011, 掲載ページ 69-70, 出版日 2011年, 査読付
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Automatic Construction of an Action Video Shot Database using Web Videos
Do Hang Nga; Keiji Yanai
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 掲載ページ 527-534, 出版日 2011年, 査読付, There are a huge number of videos with text tags on the Web nowadays. In this paper, we propose a method of automatically extracting video shots corresponding to specific actions from Web videos by providing only action keywords such as "walking" and "eating".
The proposed method consists of three steps: (1) tag-based video selection, (2) segmenting videos into shots and extracting features from the shots, and (3) visual-feature-based video shot selection with tag-based scores taken into account. Firstly, we gather video IDs and tag lists for 1000 Web videos corresponding to the given keywords via a Web API, and we calculate tag relevance scores for each video using a tag co-occurrence dictionary which is constructed in advance. Secondly, we fetch the top 200 videos from the Web in descending order of the tag relevance scores, and segment each downloaded video into several shots. From each shot we extract spatio-temporal features, global motion features and appearance features, and convert them into the bag-of-features representation. Finally, after calculating a similarity matrix between video shots, we apply the VisualRank method to select the video shots which best describe the actions corresponding to the given keywords. In the experiments, we achieved a precision of 49.5% at 100 shots over six kinds of human actions by just providing keywords without any supervision. In addition, we conducted large-scale experiments on 100 kinds of action keywords.
研究論文（国際会議プロシーディングス）, 英語
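
The final ranking step in the abstract above can be illustrated with a minimal sketch of a VisualRank-style power iteration over a shot-to-shot similarity matrix. This is not the authors' implementation: the function name `visual_rank`, the damping value 0.85, and the use of tag-based scores as an optional bias vector are all assumptions made for illustration, since the abstract does not say how the tag-based scores are folded in.

```python
def visual_rank(similarity, bias=None, damping=0.85, iterations=100, tol=1e-9):
    """Rank items by power iteration on a column-normalised similarity matrix.

    similarity: square list of lists with non-negative similarities.
    bias: optional non-negative scores (hypothetically, tag relevance);
          uniform when omitted.
    """
    n = len(similarity)
    # Normalise each column so that it sums to 1 (uniform if a column is all zero).
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    norm = [
        [similarity[i][j] / col_sums[j] if col_sums[j] > 0 else 1.0 / n
         for j in range(n)]
        for i in range(n)
    ]
    # Normalise the bias vector so that it sums to 1.
    if bias is None:
        bias = [1.0 / n] * n
    else:
        total = sum(bias)
        bias = [b / total for b in bias]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [
            damping * sum(norm[i][j] * rank[j] for j in range(n))
            + (1.0 - damping) * bias[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Shots 0 and 1 resemble each other; shot 2 is an outlier.
shots = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
scores = visual_rank(shots)
```

In this toy input the two mutually similar shots receive equal, higher scores than the outlier, which is the behaviour a shot-selection step would rely on.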

UEC at TRECVID 2010 Semantic Indexing Task
Yasushi Shimoda; Akitsugu Noguchi; Keiji Yanai
Proc. of TRECVID Workshop, 掲載ページ -, 出版日 2010年11月
研究論文（国際会議プロシーディングス）, 英語

Multiple Kernel Learning による50 種類の食事画像の認識
上東太一; 柳井啓司
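電子情報通信学会論文誌D, 一般社団法人電子情報通信学会, J93-D巻, 8号, 掲載ページ 1397-1406, 出版日 2010年08月, 査読付, In recent years, diet-related health management has attracted attention, and a system that makes it easier to record meals is desired. In this study, we therefore propose a system that records meal contents using image recognition. As the recognition method, we propose using Multiple Kernel Learning (MKL), a state-of-the-art machine learning method, to integrate multiple kinds of image features such as local features, color features, and texture features and thereby achieve highly accurate recognition. With MKL, the image features that are effective for recognition can be estimated automatically for each category, and optimal weights can be learned for each feature. In addition, we implemented a prototype food image recognition system that incorporates the proposed method. In the experiments, we built a dataset of food images in 50 categories, evaluated the proposed method, and achieved an average classification rate of 61.34%. Classification of food images on a scale as large as 50 categories had not been reported, because achieving practical accuracy was difficult; with the proposed MKL-based feature integration, this study achieved high recognition accuracy in large-scale food image classification for the first time.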
研究論文（学術雑誌）, 日本語
URL

単語概念の視覚性と地理的分布の関係性の分析
川久保秀敏; 柳井啓司
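電子情報通信学会論文誌D, 一般社団法人電子情報通信学会, J93-D巻, 8号, 掲載ページ 1417-1428, 出版日 2010年08月, 査読付, The aim of this study is to quantitatively analyze the relationship between word concepts and image features using a large amount of image data on the Web. Specifically, we carried out (1) an analysis of the visualness of words using image region entropy based on the Bag-of-Features representation, (2) an analysis of the geographical distribution of word concepts using geo-entropy, which represents the distribution of images with location information, and (3) an analysis of the relationship between the visualness and the geographical distribution of words using image region entropy and geo-entropy. This is the first study to analyze both the visualness and the geographical distribution of words. For 230 nouns and 100 adjectives, we collected 500 corresponding images each from the Web and performed these analyses. The analysis showed that sky-related nouns such as "sun" and "rainbow" tend to have smaller image region entropy and larger geo-entropy than other words. In contrast, words related to place and region names or to the names of famous historical figures tended to have small geo-entropy and large image region entropy.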
研究論文（学術雑誌）, 日本語
URL

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions
Akitsugu Noguchi; Keiji Yanai
COMPUTER VISION - ACCV 2009, PT II, SPRINGER-VERLAG BERLIN, 5995巻, 掲載ページ 458-467, 出版日 2010年, 査読付, Recently spatio-temporal local features have been proposed as image features to recognize events or human actions in videos. In this paper, we propose yet another local spatio-temporal feature based on the SURF detector, which is a lightweight local feature. Our method consists of two parts: extracting visual features and extracting motion features. First, we select candidate points based on the SURF detector. Next, we calculate motion features at each point with local temporal units divided in order to consider consecutiveness of motions. Since our proposed feature is intended to be robust to rotation, we rotate optical flow vectors to the main direction of extracted SURF features. In the experiments, we evaluate the proposed spatio-temporal local feature on the common dataset containing six kinds of simple human actions. As a result, the accuracy reaches 86%, which is almost equivalent to the state of the art. In addition, we conduct experiments to classify large amounts of Web video clips downloaded from YouTube.
研究論文（国際会議プロシーディングス）, 英語

Region-based automatic web image selection
Keiji Yanai; Kobus Barnard
MIR 2010 - Proceedings of the 2010 ACM SIGMM International Conference on Multimedia Information Retrieval, 掲載ページ 305-312, 出版日 2010年, 査読付, We propose a new Web image selection method which employs the region-based bag-of-features representation. The contribution of this work is (1) to introduce the region-based bag-of-features representation into a Web image selection task where training data is incomplete, and (2) to demonstrate its effectiveness by experiments with both generative and discriminative machine learning methods. In the experiments, we used a multiple-instance learning SVM and a standard SVM as discriminative methods, and pLSA and LDA mixture models as probabilistic generative methods. Several works on the Web image filtering task with bag-of-features have been proposed so far. However, when the training data includes much noise, sufficient results could not be obtained. In this paper, we divide images into regions and classify each region instead of classifying whole images. With this region-based classification, we can separate foreground regions from background regions and achieve more effective image training from incomplete training data. Through the experiments, we show that the proposed methods outperform the whole-image-based bag-of-features. Copyright 2010 ACM.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Associating faces and names in japanese photo news articles
Akio Kitahara; Keiji Yanai
Progress in Informatics, National Institute of Informatics, 7号, 掲載ページ 63-70, 出版日 2010年, 査読付, We propose a system which extracts faces and person names from news articles with photos on the Web and associates them automatically. The system detects face images in news photos with a face detector and extracts person names from news text with a morphological analyzer. In addition, the bag-of-keypoints representation is applied to the extracted face images for filtering out non-face images. The system uses the eigenface representation as image features of the extracted faces, and associates them with the extracted names by the modified k-means clustering in the eigenface subspace. In the experiment, we obtained a precision of up to 66% for the association of faces and names. © 2010 National Institute of Informatics.
研究論文（学術雑誌）, 英語
DOI URL

Geotagged photo recognition using corresponding aerial photos with multiple kernel learning
Keita Yaegashi; Keiji Yanai
Proceedings - International Conference on Pattern Recognition, 掲載ページ 3272-3275, 出版日 2010年, 査読付, In this paper, we deal with generic object recognition for geotagged images. As a recognition method for geotagged photos, we have already proposed exploiting aerial photos around geotagged places as additional image features for visual recognition of geotagged photos. In the previous work, we simply concatenated the two kinds of features to fuse them. Instead, in this paper, we introduce Multiple Kernel Learning (MKL) to integrate the features of both photos and aerial images. MKL can estimate the contribution weights used to integrate both kinds of features. In the experiments, we confirmed the effectiveness of using aerial photos for recognition of geotagged photos, and we evaluated the weights of both features estimated by MKL for eighteen concepts. © 2010 IEEE.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Automatic construction of a Folksonomy-based visual ontology
Hidetoshi Kawakubo; Yuuta Akima; Keiji Yanai
Proceedings - 2010 IEEE International Symposium on Multimedia, ISM 2010, 掲載ページ 330-335, 出版日 2010年, 査読付, Recently, Folksonomy has been attracting attention as a new method to index large-scale image databases. Folksonomy-style image databases allow users to attach keywords to images as "tags". Since tag words are uncontrolled, many different kinds of tags are associated with images. This is quite different from conventional image databases. In this paper, we propose a novel method to extract a hierarchical structure of relations between tags from Folksonomy. The tag structure we extract can be used as an ontology for image database search which reflects both textual and visual relations between tags. In the proposed method, we first collect millions of tagged images from Flickr, which is the world's largest Folksonomy-style image database, and remove noise images from them. Next, we estimate concept vectors for highly frequent tags based on visual features only, tag word features only, and combined visual and textual features, and compute JS divergence and entropy for the three kinds of concept vectors. Finally, we estimate hierarchical structures between tags for each of the three kinds of concept vectors. In the experiments, we show the obtained hierarchical structure, which includes interesting relations that are sometimes difficult for humans to discover. In addition, as an application, we used and evaluated the obtained ontology for query expansion in text-tag-based image search over Flickr. These results indicate that the proposed method is promising and that the structure is expected to help image search and other applications. © 2010 IEEE.
研究論文（国際会議プロシーディングス）, 英語
DOI URL
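
As a rough illustration of the distance measure named in the abstract above, the following sketch computes the Jensen-Shannon divergence between two concept vectors treated as discrete probability distributions. The function names and the toy tag histograms are hypothetical; how the paper builds and normalises its concept vectors is not stated in the abstract.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when using log base 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical normalised histograms for two tags over the same four bins.
tag_a = [0.7, 0.1, 0.1, 0.1]
tag_b = [0.1, 0.1, 0.1, 0.7]
distance = js_divergence(tag_a, tag_b)
```

Because the midpoint distribution `m` is non-zero wherever `p` or `q` is, the divergence stays finite even when the two vectors have disjoint support, which is one reason it is convenient as a distance between sparse concept vectors.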

Image recognition of 85 food categories by feature fusion
Hajime Hoashi; Taichi Joutou; Keiji Yanai
Proceedings - 2010 IEEE International Symposium on Multimedia, ISM 2010, 掲載ページ 296-301, 出版日 2010年, 査読付, Recognition of food images is challenging because of their diversity, and it is of practical use for dietary health care. In this paper, we propose an automatic food image recognition system for 85 food categories by fusing various kinds of image features including bag-of-features (BoF), color histogram, Gabor features and gradient histogram with Multiple Kernel Learning (MKL). In addition, we implemented a prototype system to recognize food images taken by cellular-phone cameras. In the experiment, we achieved a classification rate of 62.52% for 85 food categories.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Detecting "In-play" Photos from Web Sports News Photos
Akio Kitahara; Keiji Yanai
Proc. of the Pacific-Rim Conference on Multimedia, 掲載ページ -, 出版日 2009年12月, 査読付
研究論文（国際会議プロシーディングス）, 英語

UEC at TRECVID 2009 High Level Feature Task
Zhiyuan Tang; Akitsugu Noguchi; Keiji Yanai
Proc. of TRECVID Workshop, 掲載ページ -, 出版日 2009年11月
研究論文（国際会議プロシーディングス）, 英語

Detecting "In-Play" Photos in Sports News Photo Database
Akio Kitahara; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, SPRINGER-VERLAG BERLIN, 5879巻, 掲載ページ 268-279, 出版日 2009年, 査読付, In this paper, we deal with in-play classification of sports news photos as an instance of research on more sophisticated search methods for large-scale photo news databases. We propose two methods to classify sports news photos into one of six given sports categories and to discriminate in-play photos from not-in-play ones. One is the two-step method, which classifies sports categories first and recognizes in-play conditions next, and the other is the one-step method, which classifies them simultaneously. In the proposed methods, we integrate textual features extracted from news articles and image features extracted from photo images by Multiple Kernel Learning (MKL). In the experiment with the two-step method, we obtained a classification rate of 99.33% for the sports category classification, which is the first step, and 80.75% for the in-play classification, which is the second step. On the other hand, in the experiment with the one-step method, we obtained 77.08%, which was slightly lower than the result of the two-step method.
研究論文（国際会議プロシーディングス）, 英語

Can Geotags Help Image Recognition?
Keita Yaegashi; Keiji Yanai
ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 5414巻, 掲載ページ 361-373, 出版日 2009年, 査読付, In this paper, we propose to exploit geotags as additional information for visual recognition of consumer photos to improve its performance. Geotags, which represent places where the photos were taken, can be obtained automatically for photos by carrying a small portable GPS device with digital cameras. Geotags have potential to improve performance of visual image recognition, since recognition targets are unevenly distributed. For example, "beach" photos can be taken near the sea and "lion" photos can be taken only in a zoo except in Africa.
To integrate geotag information into visual image recognition, we adopt two types of geographical information: raw values of latitude and longitude, and visual features of aerial photos around the location the geotag represents. As classifiers, we use both a discriminative method and a generative method in the experiments.
The objective of this paper is to examine if geotags can help category-level image recognition. Note that we define an image recognition problem as deciding if an image is associated with a certain given concept such as "mountain" and "beach" in this paper. We propose a novel method to carry out geotagged image recognition in this paper. The experimental results demonstrate the effectiveness of using geographical information for recognition of consumer photos.
研究論文（国際会議プロシーディングス）, 英語

Detecting cultural differences using consumer-generated geotagged photos
Keiji Yanai; Keita Yaegashi; Bingyu Qiu
Proceedings of the 2nd International Workshop on Location and the Web, LOCWEB'09, 掲載ページ 40-43, 出版日 2009年, 査読付, We propose a novel method to detect cultural differences over the world automatically by using a large amount of geotagged images on the photo sharing Web sites such as Flickr. We employ the state-of-the-art object recognition technique developed in the research community of computer vision to mine representative photos of the given concept for representative local regions from a large-scale unorganized collection of consumer-generated geotagged photos. The results help us understand how objects, scenes or events corresponding to the same given concept are visually different depending on local regions over the world. Copyright 2009 ACM.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

Mining cultural differences from a large number of geotagged photos
Keiji Yanai; Bingyu Qiu
WWW'09 - Proceedings of the 18th International World Wide Web Conference, 掲載ページ 1173-1174, 出版日 2009年, We propose a novel method to detect cultural differences over the world automatically by using a large amount of geotagged images on the photo sharing Web sites such as Flickr. We employ the state-of-the-art object recognition technique developed in the research community of computer vision to mine representative photos of the given concept for representative local regions from a large-scale unorganized collection of consumer-generated geotagged photos. The results help us understand how objects, scenes or events corresponding to the same given concept are visually different depending on local regions over the world. Copyright is held by the author/owner(s).
研究論文（国際会議プロシーディングス）, 英語
DOI URL

WEB IMAGE GATHERING WITH REGION-BASED BAG-OF-FEATURES AND MULTIPLE INSTANCE LEARNING
Keiji Yanai
ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, IEEE, 掲載ページ 450-453, 出版日 2009年, 査読付, We propose a new Web image gathering system which employs the region-based bag-of-features representation and multiple instance learning. The contribution of this work is introducing the region-based bag-of-features representation into a Web image gathering task where training data is incomplete, and demonstrating its effectiveness by comparing the proposed method with the normal whole-image-based bag-of-features representation.
In our method, we first perform region segmentation on an image, and then generate a bag-of-features vector for each region. In this paper, one image is represented by a set of bag-of-features vectors, whereas in the normal bag-of-features representation, which has recently become very popular for visual object categorization tasks, one image is represented by just one bag-of-features vector.
Several works on Web image selection with bag-of-features have been proposed so far. However, when the training data includes much noise, sufficient results could not be obtained. In this paper, we divide images into regions and classify each region with a multiple-instance support vector machine (mi-SVM) instead of classifying whole images. With this region-based classification, we can separate foreground regions from background regions and achieve more effective image training from incomplete training data. Through the experiments, we show that the proposed methods outperform the whole-image-based bag-of-visual-words and the normal support vector machine.
研究論文（国際会議プロシーディングス）, 英語

AN ANALYSIS OF THE RELATION BETWEEN VISUAL CONCEPTS AND GEO-LOCATIONS USING GEOTAGGED IMAGES ON THE WEB
Hidetoshi Kawakubo; Keiji Yanai
ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, IEEE, 掲載ページ 1644-1647, 出版日 2009年, 査読付, Recently, a large number of geotagged images are available on photo sharing Web sites such as Flickr. In this paper, we propose image region entropy and geo-location entropy for analyzing the relation between visual concepts and geographical locations using a large-scale geotagged image database. Image region entropy represents to what extent concepts have visual characteristics, while geo-location entropy represents to what extent concepts are distributed over the world. In the experiment, we analyzed relations between image region entropy and geo-location entropy in terms of 230 nouns and 100 adjectives, and we found that the concepts with low image entropy tend to have high geo-location entropy and vice versa.
研究論文（国際会議プロシーディングス）, 英語
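
To make the notion of geo-location entropy in the abstract above concrete, here is a minimal sketch that bins geotags into a latitude/longitude grid and computes the Shannon entropy of the resulting cell histogram. The grid binning, the 10-degree cell size, and the function name `geo_entropy` are assumptions for illustration; the abstract does not specify how locations are discretised.

```python
import math
from collections import Counter


def geo_entropy(points, cell_deg=10.0):
    """Shannon entropy (bits) of geotags binned into cell_deg x cell_deg grid cells.

    points: iterable of (latitude, longitude) pairs in degrees.
    A concept photographed everywhere yields high entropy; a concept tied
    to a single place yields entropy close to zero.
    """
    cells = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )
    total = sum(cells.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in cells.values())


# Hypothetical geotags: one concept spread over four regions, one concentrated.
spread = [(35.6, 139.7), (48.8, 2.3), (-33.9, 151.2), (40.7, -74.0)]
local = [(35.6, 139.7), (35.7, 139.6), (35.5, 139.8), (35.6, 139.5)]
```

With these toy inputs, the spread-out concept occupies four distinct cells and the concentrated one occupies a single cell, mirroring the high versus low geo-location entropy contrast described in the abstract.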

A visual analysis of the relationship between word concepts and geographical locations
Keiji Yanai; Hidetoshi Kawakubo; Bingyu Qiu
CIVR 2009 - Proceedings of the ACM International Conference on Image and Video Retrieval, 掲載ページ 92-99, 出版日 2009年, 査読付, In this paper, we describe two methods to analyze the relationship between word concepts and geographical locations by using a large number of geotagged images on photo sharing Web sites such as Flickr. Firstly, we propose using both image region entropy and geo-location entropy to analyze relations between location and visual features, and in the experiment we found that concepts with low image entropy tend to have high geo-location entropy and vice versa. Secondly, we propose a novel method to select representative photographs for regions on a worldwide scale, which helps detect cultural differences around the world regarding word concepts with high geo-location entropy. In the proposed method, we first extract the most relevant images by clustering and evaluation on the visual features. Then, based on the geographic information of the images, representative regions are automatically detected. Finally, we select and generate a set of representative images for the representative regions by employing Probabilistic Latent Semantic Analysis (PLSA) modelling. The results show the ability of our approach to mine regional representative photographs and cultural differences around the world. Copyright 2009 ACM.
研究論文（国際会議プロシーディングス）, 英語
DOI URL

A FOOD IMAGE RECOGNITION SYSTEM WITH MULTIPLE KERNEL LEARNING
Taichi Joutou; Keiji Yanai
2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, IEEE, 掲載ページ 285-288, 出版日 2009年, 査読付, Since diet-related health care has recently been drawing people's attention, a system that can easily record everyday meals is in demand. In this paper, we propose an automatic food image recognition system for recording people's eating habits. In the proposed system, we use the Multiple Kernel Learning (MKL) method to integrate several kinds of image features such as color, texture and SIFT adaptively. MKL makes it possible to estimate optimal weights for combining image features for each category. In addition, we implemented a prototype system to recognize food images taken by cellular-phone cameras. In the experiment, we achieved a classification rate of 61.34% for 50 kinds of foods. To the best of our knowledge, this is the first report of a food image classification system which can be applied for practical use.
研究論文（国際会議プロシーディングス）, 英語
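
The feature fusion described in the abstract above rests on combining one kernel matrix per image feature into a single kernel with per-feature weights. The sketch below shows only that weighted combination; the weights are passed in by hand, whereas in MKL they would be learned per category, and the function name and toy matrices are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of same-sized kernel (Gram) matrices.

    kernels: list of n x n matrices, one per feature type (e.g. color, texture).
    weights: one non-negative weight per kernel; learned by MKL in practice,
             supplied directly here for illustration.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Toy Gram matrices for two images under two hypothetical feature types.
color_kernel = [[1.0, 0.0], [0.0, 1.0]]
texture_kernel = [[2.0, 2.0], [2.0, 2.0]]
fused = combine_kernels([color_kernel, texture_kernel], [0.5, 0.5])
```

The fused matrix could then be handed to any kernel classifier that accepts a precomputed kernel; a feature whose learned weight is near zero contributes little to the decision for that category.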

UEC at TRECVID 2008 High Level Feature Task
Zhiyuan Tang; Keiji Yanai
Proc. of TRECVID Workshop, 掲載ページ -, 出版日 2008年11月
研究論文（国際会議プロシーディングス）, 英語

WEB VIDEO RETRIEVAL BASED ON THE EARTH MOVER'S DISTANCE BY INTEGRATING COLOR, MOTION AND SOUND
Keisuke Takada; Keiji Yanai
2008 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, PROCEEDINGS, IEEE, 掲載ページ 89-92, 出版日 2008年, 査読付, In this paper, we propose a novel content-based video retrieval method for short video clips which are stored on consumer video sharing Web sites. It is based on the Earth Mover's Distance (EMD), which enables us to evaluate dissimilarities among videos that differ in the number of shots and in length. As features extracted from videos, we use color, motion, sound and position of shots. By defining the ground distance of EMD as the weighted sum of the Euclidean distances of these four kinds of features, we integrate them when calculating EMD. In the experiments on video retrieval for YouTube videos, we obtained an average precision of up to 0.98, which shows the effectiveness of the proposed method. In addition, the results with all four kinds of features integrated outperformed those with single features, which shows that feature combination is effective.
研究論文（国際会議プロシーディングス）, 英語
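
The ground distance described in the abstract above, a weighted sum of per-feature Euclidean distances between two shots, can be sketched as follows. The dictionary layout, the feature dimensions, and the equal weights are illustrative assumptions; the abstract does not give the actual weights or feature encodings, and the EMD itself would additionally require a transportation solver that is not shown here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ground_distance(shot_a, shot_b, weights):
    """Weighted sum of per-feature Euclidean distances between two shots.

    shot_a, shot_b: dicts mapping a feature name to its vector.
    weights: dict mapping the same feature names to non-negative weights.
    """
    return sum(
        weight * euclidean(shot_a[name], shot_b[name])
        for name, weight in weights.items()
    )


# Hypothetical shots described by the four feature types named in the abstract.
shot_1 = {"color": [0.0, 0.0], "motion": [0.0], "sound": [0.0], "position": [0.0]}
shot_2 = {"color": [3.0, 4.0], "motion": [1.0], "sound": [2.0], "position": [0.0]}
equal_weights = {"color": 0.25, "motion": 0.25, "sound": 0.25, "position": 0.25}
d = ground_distance(shot_1, shot_2, equal_weights)
```

Placing the feature integration inside the ground distance means a single EMD computation compares two videos on all four feature types at once, rather than computing one EMD per feature and merging the results afterwards.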

Web image gathering with a part-based object recognition method
Keiji Yanai
ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 4903巻, 掲載ページ 297-306, 出版日 2008年, 査読付, We propose a new Web image gathering system which employs a part-based object recognition method. The novelty of our work is introducing the bag-of-keypoints representation into a Web image gathering task instead of the color histograms or segmented regions our previous system used. The bag-of-keypoints representation has been shown, in spite of its simplicity, to have an excellent ability to represent image concepts in the context of visual object categorization and recognition. Most object recognition work has assumed that complete training data is available. On the other hand, in the Web image gathering task, since images associated with the given keywords are gathered from the Web fully automatically, complete training images are not available. In this paper, we combine the HTML-based automatic positive training image selection and the bag-of-keypoints-based image selection with an SVM, which is a supervised machine learning method. This combination enables the system to gather many images related to given concepts with high precision, fully automatically and with no human intervention. Our main objective is to examine if the bag-of-keypoints model is also effective for the Web image gathering task where training images always include some noise. Through the experiments, we show that the new system greatly outperforms our previous systems, other systems and Google Image Search.
研究論文（国際会議プロシーディングス）, 英語

Associating Faces and Names in Japanese Photo News Articles on the Web
Akio Kitahara; Taichi Joutou; Keiji Yanai
2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3, IEEE, 掲載ページ 1156-1161, 出版日 2008年, 査読付, We propose a system which extracts faces and person names from news articles with photos on the Web and associates them automatically. The system detects face images in news photos with a face detector and extracts person names from news text with a morphological analyzer. In addition, the bag-of-keypoints technique is applied to the extracted face images to filter out non-face images. The system uses the eigenface representation as image features of the extracted faces, and associates them with the extracted names by modified k-means clustering in the eigenface subspace. In the experiment, we obtained a precision of 66% for the association of faces and names.
研究論文（国際会議プロシーディングス）, 英語

Automatic web image selection with a probabilistic latent topic model
Keiji Yanai
Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08, 掲載ページ 1237-1238, 出版日 2008年, 査読付, We propose a new method to select images relevant to given keywords from images gathered from the Web, based on the Probabilistic Latent Semantic Analysis (PLSA) model, which is a probabilistic latent topic model originally proposed for text document analysis. The experimental results show that the proposed method performs almost equivalently to or better than existing methods. In addition, we show that our method can select a more diverse set of images than the existing SVM-based methods.
研究論文（国際会議プロシーディングス）, 英語

WEB IMAGE SELECTION WITH PLSA
Keiji Yanai
2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, IEEE, 掲載ページ 1373-1376, 出版日 2008年, 査読付, In this paper, we propose a new method to select images relevant to given keywords from images gathered from the Web. Our method is based on the Probabilistic Latent Semantic Analysis (PLSA) model, which is a generative probabilistic topic model. First, we gather images related to the given keywords from the Web with Web search engines. Second, we choose pseudo-training images from them by simple heuristic HTML analysis, and train our PLSA-based probabilistic model with them. Finally, we select relevant images from all the gathered images with the learned model. The experimental results show that the proposed method performs almost equivalently to existing methods, although, unlike them, it does not need negative training samples to be prepared in advance.
研究論文（国際会議プロシーディングス）, 英語

WEB VIDEO RETRIEVAL BASED ON THE EARTH MOVER'S DISTANCE BY INTEGRATING COLOR, MOTION AND SOUND
Keisuke Takada; Keiji Yanai
2008 15TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, IEEE, 掲載ページ 89-92, 出版日 2008年, 査読付, In this paper, we propose a novel content-based video retrieval method for short video clips stored on consumer video-sharing Web sites. It is based on the Earth Mover's Distance (EMD), which enables us to evaluate dissimilarities among videos that differ in the number of shots and in length. As features extracted from videos, we use color, motion, sound and position of shots. By defining the ground distance of EMD as the weighted sum of the Euclidean distances of these four kinds of features, we integrate them when calculating EMD. In experiments on video retrieval for YouTube videos, we obtained an average precision of up to 0.98, which shows the effectiveness of the proposed method. In addition, the integration of the four kinds of features outperformed each single feature, which shows that feature combination is effective.
研究論文（国際会議プロシーディングス）, 英語
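
The ground distance described in the abstract above, a weighted sum of per-feature Euclidean distances between two shots, might be sketched as follows. The feature vectors, their dimensions and the weight values are assumptions made for illustration only; the abstract does not give them, and the full EMD computation over sets of shots is not shown.

```python
import math

def ground_distance(shot_a, shot_b, weights):
    """Weighted sum of per-feature Euclidean distances between two shots."""
    return sum(
        w * math.dist(shot_a[name], shot_b[name])
        for name, w in weights.items()
    )

# Hypothetical weights and toy feature vectors for two shots.
weights = {"color": 0.4, "motion": 0.3, "sound": 0.2, "position": 0.1}
shot_a = {"color": (0.0, 0.0), "motion": (0.0,), "sound": (0.0,), "position": (0.0,)}
shot_b = {"color": (3.0, 4.0), "motion": (1.0,), "sound": (2.0,), "position": (0.0,)}

# 0.4 * 5 + 0.3 * 1 + 0.2 * 2 + 0.1 * 0, i.e. about 2.7
d = ground_distance(shot_a, shot_b, weights)
```

In an EMD-based retrieval setting, a function like this would supply the pairwise cost between shots of two videos, so changing the weights changes how much each modality contributes to the final dissimilarity.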

Rushes summarization based on color, motion and face
Akitsugu Noguchi; Keiji Yanai
MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops, 掲載ページ 139-143, 出版日 2008年, 査読付, In this paper, we present a method for the Rushes Summarization task, which is one of the tasks of TRECVID 2008. In the proposed method, an input video is first decomposed into shots by comparing consecutive frames. Then, these shots are grouped by the k-means method, using color, motion and faces as features. In the preliminary experiments, we compared three systems which employed the following feature combinations: "color", "color and motion" and "color, motion and faces". As a result, we found that motion features and face features were effective. Our results for Rushes Summarization 2008 were a little below the median regarding IN (inclusion ratio of ground truth) and JU (lack of junk shots), but were above the median regarding TE (pleasant tempo). To improve IN and JU, we then modified the method to detect clapper boards by introducing a visual feature in addition to the sound feature. An additional experiment on this modification, conducted after submission, shows that it improved the results. © 2008 ACM.
研究論文（国際会議プロシーディングス）, 英語

Objects over the World
Bingyu Qiu; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2008, 9TH PACIFIC RIM CONFERENCE ON MULTIMEDIA, SPRINGER-VERLAG BERLIN, 5353巻, 掲載ページ 296-+, 出版日 2008年, 査読付, This paper considers the problem of selecting representative photographs for regions in the worldwide dimensions. Selecting and generating such representative photographs for representative regions from large-scale collections would help us understand about local specific objects with a worldwide perspective. We propose a solution to this problem using a large-scale collection of geo-tagged photographs. Our solution firstly extracts the most relevant images by clustering and evaluation on the visual features. Then, based on geographic information of the images, representative regions are automatically detected. Finally, we select and generate a set of representative images for the representative regions by employing the Probabilistic Latent Semantic Analysis (PLSA) modelling. The results show the ability of our approach to generate region-based representative photographs.
研究論文（国際会議プロシーディングス）, 英語

一般画像認識の現状と今後
柳井啓司
情報処理学会論文誌：コンピュータビジョン・イメージメディア, 48巻, CVIM19号, 掲載ページ 1-24, 出版日 2007年12月, 査読付
研究論文（学術雑誌）, 日本語

一般物体認識のための単語概念の視覚性の分析
柳井啓司; Kobus Barnard
情報処理学会論文誌：コンピュータビジョン・イメージメディア, 48巻, CVIM17号, 出版日 2007年02月, 査読付
研究論文（学術雑誌）, 日本語

確率的Web画像収集
柳井啓司
人工知能学会誌, 22巻, 1号, 掲載ページ 10-18, 出版日 2007年01月, 査読付
研究論文（学術雑誌）, 日本語

Image collector III: A web image-gathering system with bag-of-keypoints
Keiji Yanai
16th International World Wide Web Conference, WWW2007, 掲載ページ 1295-1296, 出版日 2007年, 査読付, We propose a new system to mine visual knowledge on the Web. There is a huge amount of image data as well as text data on the Web. However, mining image data from the Web has received less attention than mining text data, since treating the semantics of images is much more difficult. In this paper, we propose introducing a recent image recognition technique, the bag-of-keypoints representation, into the Web image-gathering task. Through experiments, we show that the proposed system greatly outperforms our previous systems and Google Image Search.
研究論文（国際会議プロシーディングス）, 英語

The photo news flusher: A photo-news clustering browser
Tatsuya Iyota; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2007, SPRINGER-VERLAG BERLIN, 4810巻, 掲載ページ 462-466, 出版日 2007年, 査読付, We propose a novel news browsing system that can cluster photo news articles based on both textual features of articles and image features of news photos for a personal news database which is built by accumulating Web photo news articles. The system provides two types of clustering methods: normal clustering and thread-style clustering. It enables us to browse news articles over several weeks or months visually and find out useful news easily. In this paper, we describe an overview of our system, some examples of uses and user studies.
研究論文（国際会議プロシーディングス）, 英語

Web image gathering with a spatial pyramid kernel
Keiji Yanai
ISM WORKSHOPS 2007: NINTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA - WORKSHOPS, PROCEEDINGS, IEEE COMPUTER SOC, 掲載ページ 309-314, 出版日 2007年, 査読付, In this paper, we propose a Web image gathering system employing state-of-the-art object recognition techniques, namely the bag-of-visual-words representation and a spatial pyramid kernel. In recent years, research on object recognition has progressed greatly. Most work on object recognition assumes that complete training data is available, whereas complete training data is generally not available when gathering Web images with no human intervention. The objective of this paper is to examine whether a state-of-the-art object recognition technique is also effective for the Web image gathering task, where training images always include some noise. Through experiments, we show that the state-of-the-art object recognition method is also very effective for Web image gathering and that its results greatly outperform those of existing methods.
研究論文（国際会議プロシーディングス）, 英語

UEC at TRECVID 2007 High Level Feature Task
O. Liu; Z. Tang; K. Yanai
Proc. of TRECVID Workshop 2007, 掲載ページ -, 出版日 2007年
研究論文（国際会議プロシーディングス）, 英語

Image Classification by a Probabilistic Model Learned from Imperfect Training Data on the Web
Keiji Yanai
Proc. of ACM Knowledge Discovery and Data Mining (KDD) Workshop on Multimedia Data Mining, 掲載ページ 75-82, 出版日 2006年08月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Cross modal disambiguation
Kobus Barnard; Keiji Yanai; Matthew Johnson; Prasad Gabbur
TOWARD CATEGORY-LEVEL OBJECT RECOGNITION, SPRINGER-VERLAG BERLIN, 4170巻, 掲載ページ 238-+, 出版日 2006年, 査読付, We consider strategies for reducing ambiguity in multi-modal data, particularly in the domain of images and text. Large data sets containing images with associated text (and vice versa) are readily available, and recent work has exploited such data to learn models for linking visual elements to semantics. This requires addressing a correspondence ambiguity because it is generally not known which parts of the images connect with which language elements. In this paper we first discuss using language processing to reduce correspondence ambiguity in loosely labeled image data. We then consider a similar problem of using visual correlates to reduce ambiguity in text with associated images. Only rudimentary image understanding is needed for this task because the image only needs to help differentiate between a limited set of choices, namely the senses of a particular word.
研究論文（国際会議プロシーディングス）, 英語

Automatic "Go" record generation from a TV program
Keiji Yanai; Takehisa Hayashiyama
12TH INTERNATIONAL MULTI-MEDIA MODELLING CONFERENCE PROCEEDINGS, IEEE, 掲載ページ 414-417, 出版日 2006年, 査読付, We present a video recognition system for "Go" TV programs. It automatically generates a Go play record from a broadcast of Go played by human professionals. "Go" is an ancient Asian board game played between two players, similar to Chess and Shogi. For an MPEG2 video of a TV Go program, the system distinguishes play commentary board shots from other types of shots such as player shots, and detects Go stones placed on the board from board shots. The system removes several types of noise such as a player's head or hand. In addition, it also detects Go stones in commentary board shots, which are often inserted between play board shots, and compensates for the order of Go stones placed on the play board during commentary board shots. In the experiments on eight TV Go programs, the system achieved 95.7% precision and 95.7% recall.
研究論文（国際会議プロシーディングス）, 英語

Mutual Information Between Words and Pictures
Kobus Barnard; Keiji Yanai
Proc. of the Workshop on Information Theory and Applications, 掲載ページ (), 出版日 2006年01月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Finding visual concepts by web image mining
Keiji Yanai; Kobus Barnard
Proceedings of the 15th International Conference on World Wide Web, 掲載ページ 923-924, 出版日 2006年, 査読付, We propose measuring the "visualness" of concepts with images on the Web, that is, to what extent concepts have visual characteristics. This is a new application of "Web image mining". Knowing which concepts have visually discriminative power is important for image recognition, since not all concepts are related to visual content. Mining image data on the Web with our method makes this possible. Our method performs probabilistic region selection for images and computes an entropy measure which represents the "visualness" of concepts. In the experiments, we collected about forty thousand images from the Web for 150 concepts. We examined which concepts are suitable for annotation of image contents.
研究論文（国際会議プロシーディングス）, 英語

Evaluation Strategies for Image Understanding and Retrieval
Keiji Yanai; Nikhil V. Shirahatti; Prasad Gabbur; Kobus Barnard
Proc. of ACM Multimedia Workshop on Multimedia Information Retrieval, 掲載ページ 217-226, 出版日 2005年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Probabilistic Web Image Gathering
Keiji Yanai; Kobus Barnard
Proc. of ACM Multimedia Workshop on Multimedia Information Retrieval, 掲載ページ 57-64, 出版日 2005年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

UEC at TRECVID 2005 High Level Feature Task --Web Images Meet TRECVID--
Keiji Yanai; Liu Ounan; Yuki Tsujita
Proc. of the TRECVID Conference, 掲載ページ -, 出版日 2005年11月
研究論文（国際会議プロシーディングス）, 英語

Image region entropy: A measure of "visualness" of web images associated with one concept
Keiji Yanai; Kobus Barnard
Proceedings of the 13th ACM International Conference on Multimedia, MM 2005, 掲載ページ 419-422, 出版日 2005年, 査読付, We propose a new method to measure the "visualness" of concepts, that is, to what extent concepts have visual characteristics. Knowing which concepts have visually discriminative power is important for image annotation, especially automatic image annotation by image recognition systems, since not all concepts are related to visual content. Our method performs probabilistic region selection for images which are labeled as concept "X" or "non-X", and computes an entropy measure which represents the "visualness" of concepts. In the experiments, we collected about forty thousand images from the World-Wide Web using Google Image Search for 150 concepts. We examined which concepts are suitable for annotation of image contents. Copyright © 2005 ACM.
研究論文（国際会議プロシーディングス）, 英語
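
The abstract above does not spell out how the entropy measure is computed. As a minimal sketch, assuming each selected region has already been assigned to one of a fixed set of feature clusters (an assumption, not something stated in the abstract), Shannon entropy over the cluster distribution gives a score that is low when the regions concentrate in a few clusters and high when they are spread evenly. The function name and the toy cluster labels are hypothetical.

```python
import math
from collections import Counter

def region_entropy(cluster_ids):
    """Shannon entropy (in bits) of the cluster distribution of selected regions."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    # Sum of p * log2(1 / p) over the clusters that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical cluster assignments for the regions of two concepts.
concentrated = [0, 0, 0, 0]   # all regions fall into one cluster
spread_out = [0, 1, 2, 3]     # regions are spread evenly over four clusters

low = region_entropy(concentrated)    # 0.0
high = region_entropy(spread_out)     # 2.0
```

Under this assumption, a concept whose regions look alike would receive a lower entropy than one whose regions are visually scattered.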

Image collector II: A system to gather a large number of images from the web
Keiji Yanai
IEICE Transactions on Information and Systems, Institute of Electronics, Information and Communication Engineers (IEICE), E88-D巻, 10号, 掲載ページ 2432-2436, 出版日 2005年, 査読付, We propose a system that enables us to gather hundreds of images related to one set of keywords provided by a user from the World Wide Web. The system is called Image Collector II. The Image Collector, which we proposed previously, can gather only one or two hundred images. We propose the following two improvements on our previous system in terms of the number of gathered images and their precision: (1) We extract some words appearing with high frequency from all HTML files in which output images are embedded in an initial image gathering, and using them as keywords, we carry out a second image gathering. Through this process, we can obtain hundreds of images for one set of keywords. (2) The more images we gather, the more the precision of gathered images decreases. To improve the precision, we introduce word vectors of HTML files embedding images into the image selecting process in addition to image feature vectors. Copyright © 2005 The Institute of Electronics, Information and Communication Engineers.
研究論文（国際会議プロシーディングス）, 英語

一般画像自動分類の実現へ向けたWorld Wide Webからの画像知識の獲得
柳井啓司
人工知能学会論文誌, 19巻, 5号, 掲載ページ 429-439, 出版日 2004年10月, 査読付
研究論文（学術雑誌）, 日本語

A fast image-gathering system from the World-Wide Web using a PC cluster
K Yanai; M Shindo; K Noshita
IMAGE AND VISION COMPUTING, ELSEVIER SCIENCE BV, 22巻, 1号, 掲載ページ 59-71, 出版日 2004年01月, 査読付, Due to the recent explosive progress of WWW (World-Wide Web), we can easily access a large number of images on WWW. There are, however, no established methods to make use of WWW as a large image database. In this paper, we describe an automatic image-gathering system from WWW, in which we use both keywords and image features. By exploiting some existing keyword-based search engines and selecting images by their image features, our system obtains, with high accuracy, images that are relevant to query keywords. Our system has the following two novel properties: (1) It does not need to make a huge index for a great number of images on the whole WWW, because it takes advantage of commercial keyword-based text-search engines. (2) It can gather a lot of images related to given keywords fully automatically, without a user's intervention during the processing. The system has been implemented on a parallel PC cluster, which enables us to gather more than one hundred images from WWW in about one minute. (C) 2003 Elsevier B.V. All rights reserved.
研究論文（学術雑誌）, 英語

Generic Image Classification Using Visual Knowledge on the Web
Keiji Yanai
Proc. of ACM International Conference on Multimedia, 掲載ページ 67-76, 出版日 2003年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Web Image Mining toward Generic Image Recognition
Keiji Yanai
Proc. of ACM International World Wide Web Conference, 掲載ページ Poster Paper No.193, 出版日 2003年05月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Image Collector II : An Over-One-Thousand-Image-Gathering System
Keiji Yanai
Proc. of ACM International World Wide Web Conference, 掲載ページ Poster Paper No.47, 出版日 2003年05月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Image collector II: A system for gathering more than one thousand images from the web for one keyword
K Yanai
2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, IEEE, 掲載ページ 785-788, 出版日 2003年, 査読付, We propose a system that enables us to gather more than one thousand images from the World Wide Web. The system is called Image Collector II. The Image Collector, which we proposed previously, can gather only several hundred images. We made the following two improvements to extend the ability of our previous system in terms of the number of gathered images and their precision: (1) We extracted some words appearing with high frequency from all HTML files embedding output images in an initial image gathering, and using them as keywords, we carried out a second image gathering. Through this, we obtained more than one thousand images for one keyword. (2) The more images we gathered, the more the precision of gathered images decreased. To raise the precision, we introduced word vectors of HTML files embedding images into the image selecting process in addition to image feature vectors.
研究論文（国際会議プロシーディングス）, 英語

Web Image Mining: Can we gather visual knowledge for image recognition from the Web?
K Yanai
ICICS-PCM 2003, VOLS 1-3, PROCEEDINGS, IEEE, 掲載ページ 186-190, 出版日 2003年, 査読付, Because of the wide spread of digital imaging devices and the World Wide Web, we can easily obtain digital images of various kinds of real world scenes. Currently, however, classification/recognition of generic real world images is far from practical due to the diversity of real world scenes. To deal with such diversity, we have proposed gathering real world images from the World-Wide Web and using them as training images for image classification. We call this research project "Web Image Mining". Web images are as diverse as real world scenes, since they are taken by a large number of people for various purposes. It is expected that diverse training images will enable us to classify/recognize diverse real world images. In this paper, we describe our ongoing project, "Web Image Mining for Generic Image Recognition".
研究論文（国際会議プロシーディングス）, 英語

Recognition of indoor images employing supporting relation between objects
Keiji Yanai; Koichiro Deguchi
Systems and Computers in Japan, 33巻, 11号, 掲載ページ 14-26, 出版日 2002年10月, In this paper, we describe a new design of a recognition system for a single image of an indoor scene including complex occlusions. In conventional works, the systems could not recognize images of an indoor scene including complex occlusions. Our system can treat them by employing supporting relation between objects. In our system, first, the system estimates the 3D structure of an object by fitting a 3D structure model to the image qualitatively. Next, by checking the supporting relation between objects, it eliminates object candidates that cannot exist and estimates real objects from their parts in the image. Finally, the system recognizes objects that are compatible with each other. We implemented the system as a multi-agent-based image understanding system. In this paper, we describe the design of the system and results of experiments. © 2002 Wiley Periodicals, Inc. Syst. Comp. Jpn., 33(11).
研究論文（学術雑誌）, 英語

Image Classification by Web Images
Keiji Yanai
Proc. of the Seventh Pacific-Rim International Conference on Artificial Intelligence (Springer LNAI no.2417), 掲載ページ 613-614, 出版日 2002年08月, 査読付
研究論文（国際会議プロシーディングス）, 英語

反復深化探索に基く協力詰将棋の解法
星由雄; 野下浩平; 柳井啓司
情報処理学会論文誌, 43巻, 1号, 掲載ページ 11-19, 出版日 2002年01月
研究論文（学術雑誌）, 日本語

An experiment on generic image classification using Web images
K Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2002, PROCEEDING, SPRINGER-VERLAG BERLIN, 2532巻, 掲載ページ 303-310, 出版日 2002年, 査読付, In this paper, we describe an experiment on generic image classification using a large number of images gathered from the Web as learning images. The processing consists of three stages. In the gathering stage, a system gathers images related to given class keywords from the Web automatically. In the learning stage, it extracts image features from gathered images and associates them with each class. In the classification stage, the system classifies a test image into one of the classes corresponding to the class keywords by using the association between image features and classes. In the experiments, we achieved a classification rate of 44.6% for generic images by using images gathered from the World-Wide Web automatically as learning images.
研究論文（学術雑誌）, 英語

A multi-resolution image understanding system based on multi-agent architecture for high-resolution images
K Yanai; K Deguchi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E84D巻, 12号, 掲載ページ 1642-1650, 出版日 2001年12月, 査読付, Recently, high-resolution images with more than one million pixels have become easily available. However, such an image requires much processing time and memory in an image understanding system. In this paper, we propose an image understanding system for high-resolution images that integrates multi-resolution analysis with a multi-agent-based architecture. The proposed system can handle a high-resolution image effectively without much extra cost. We implemented an experimental system for images of indoor scenes.
研究論文（学術雑誌）, 英語

キーワードと画像特徴を利用したWWWからの画像収集システム
柳井啓司
情報処理学会論文誌:データベース, 42巻, SIG10 (TOD11)号, 掲載ページ 79-91, 出版日 2001年10月, 査読付
研究論文（その他学術会議資料等）, 日本語

A Fast Image-Gathering System on the World-Wide Web Using a PC Cluster
Keiji Yanai; Masaya Shindo; Kohei Noshita
Proc. of International Conference on Web Intelligence 2001 (Springer LNAI no.2198), 1巻, 掲載ページ 324-334, 出版日 2001年09月, 査読付
研究論文（国際会議プロシーディングス）, 英語

物体間の支持関係を利用した室内画像の認識
柳井啓司; 出口光一郎
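電子情報通信学会論文誌D-II, 一般社団法人電子情報通信学会, 84-DII巻, 8号, 掲載ページ 1741-1752, 出版日 2001年08月, 査読付, In this paper, we propose an image recognition system for images of indoor scenes that include complex occlusions. Conventional systems cannot recognize an object unless it appears sufficiently in the image, and therefore could not handle images with complex occlusions such as indoor scenes. In contrast, the proposed system makes it possible to recognize objects hidden by other objects by qualitatively inferring the supporting relations between objects, that is, the relation in which one object rests on another. Specifically, the system first estimates the 3D structure of objects by fitting 3D structure models to the targets that appear clearly in the image. Next, using the estimated 3D structures, it checks the supporting relations between objects in order to infer the existence of objects that are only partially visible and to eliminate candidates for objects that do not actually exist, finally obtaining a recognition result that is consistent as a whole. We realized this recognition as a multi-agent image recognition system of the kind we have been studying. This paper describes the details of the system and the experiments and results obtained with an implemented prototype system.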
研究論文（その他学術会議資料等）, 日本語

Image collector: An image-gathering system from the world-wide web employing keyword-based search engines
Keiji Yanai
Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, 掲載ページ 523-526, 出版日 2001年, 査読付, Due to the recent explosive progress of WWW (World-Wide Web), we can easily access a large number of images from WWW. There are, however, no established methods to make use of WWW as a large image database. In this paper, we describe an automatic image-gathering system from WWW employing keywords and image features, which is called the Image Collector. By exploiting some existing keyword-based search engines and selecting images by their image features, our system obtains, with high accuracy, images that are strongly related to query keywords. We have implemented the system that gathers more than one hundred images from WWW in about five minutes.
研究論文（国際会議プロシーディングス）, 英語

An automatic image-gathering system for the World-Wide Web by integration of keywords and image features
K Yanai
ICCIMA 2001: FOURTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, PROCEEDINGS, IEEE COMPUTER SOC, 掲載ページ 303-307, 出版日 2001年, Due to the recent explosive progress of WWW (World-Wide Web), we can easily access a large number of images from WWW. However, methods to utilize WWW as a large image database have not been established yet. In this paper, we propose an automatic image-gathering system from WWW employing both keywords and image features, which is called Image Collector. By exploiting some existing keyword-based search engines and selecting images by their image features, the system obtains more than one hundred images related to query keywords in about five minutes.
研究論文（国際会議プロシーディングス）, 英語

A Multi-resolution Image Understanding System Based on Multi-agent Architecture for High-resolution Images
Keiji Yanai; Koichiro Deguchi
Proc. of IAPR Workshop on Machine Vision and Applications, 掲載ページ 291-294, 出版日 2000年11月, 査読付
研究論文（国際会議プロシーディングス）, 英語

Recognition of indoor images employing qualitative model fitting and supporting relation between objects
K Yanai; K Deguchi
15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, IEEE COMPUTER SOC, 掲載ページ 964-967, 出版日 2000年, 査読付, In this paper, we describe a new design of a recognition system for a single image of indoor scene including complex occlusions. In our system, first, the system estimates 3D structure of an object by fitting a 3D structure model to the image qualitatively. Next, by checking supporting relation between objects, it eliminates object candidates that are impossible to exist and estimates actual objects from their parts in the image. Then, finally, we recognize objects that are consistent with each other. We implemented the system as a multi-agent-based image understanding system. This paper describes an outline of the system and results of recognition experiments.
研究論文（国際会議プロシーディングス）, 英語

An image understanding system for various images based on multi-agent architecture
Keiji Yanai
Proceedings - 3rd International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 1999, Institute of Electrical and Electronics Engineers Inc., 掲載ページ 186-190, 出版日 1999年, An image understanding system for real world images which has an ability to recognize various kinds of images is proposed. We propose a multi-agent architecture in which object recognition modules for individual target objects are integrated and cooperate with each other. In our system, object candidates generated by different agents are integrated based not only on the evaluations by the modules themselves but also on spatial relations among objects. By checking spatial relations, the agents also estimate actual objects from parts seen in the image. Such mechanisms are realized by autonomous cooperation among the agents, and the most reliable result is selected after arbitration between them. We implemented an experimental system on a PC cluster system, and achieved recognition for both indoor and outdoor images.
研究論文（国際会議プロシーディングス）, 英語

マルチエージェントによる多様な画像に対応した物体認識システムの一構成法
柳井啓司; 出口光一郎
情報処理学会論文誌, 39巻, 2号, 掲載ページ 170-177, 出版日 1998年02月, 査読付
研究論文（その他学術会議資料等）, 日本語

An architecture of object recognition system for various images based on multi-agent
K Yanai; K Deguchi
FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, IEEE COMPUTER SOC, 掲載ページ 278-281, 出版日 1998年, 査読付, An image understanding system for real world images which has an ability to recognize various kinds of images is proposed. We propose a multi-agent architecture to integrate object recognition modules for individual target objects. In our method, results recognized by different agents are fused based not only on the evaluations by the modules themselves but also on relations among object locations, sizes and so on. This is carried out autonomously between the agents concerned, and the most reliable result is selected after arbitration between them. We implemented an experimental system on a parallel computer and achieved recognition for both indoor and outdoor images.
研究論文（国際会議プロシーディングス）, 英語

An Implementation of Multi-Agent Based Object Recognition System for Various Image
Keiji Yanai
Proc. of the Eighth Parallel Computing Workshop, 掲載ページ P1-P, 出版日 1998年
研究論文（国際会議プロシーディングス）, 英語

Implementation of Object Recognition System Employing Multiagent Architecture at Highly Parallel Computer
Keiji Yanai; Koichiro Deguchi
Proc. of the Sixth Parallel Computing Workshop'96, 掲載ページ 2-a, 出版日 1996年
研究論文（国際会議プロシーディングス）, 英語

MISC

食事画像認識の現状と今後
柳井啓司
人工知能学会, 出版日 2019年01月, 人工知能学会誌, 34巻, 1号, 掲載ページ -, 日本語, 記事・総説・解説・論説等（学術雑誌）

Neural Style Vectorによる絵画画像のスタイル検索
松尾真; 柳井啓司
出版日 2018年06月, 画像ラボ, 29巻, 6号, 掲載ページ -, 日本語, 記事・総説・解説・論説等（商業誌、新聞、ウェブメディア）

食メディアに関する研究最先端 (特集衣・食・住に入り込む先端メディア技術)
井手一郎; 柳井啓司; 山肩洋子
映像情報メディア学会, 出版日 2017年11月, 映像情報メディア学会誌 = The journal of the Institute of Image Information and Television Engineers, 71巻, 6号, 掲載ページ 768-772, 日本語, 1342-6907, 1881-6908, 201702260037003336, 40021375737, AN10588970

CNNを用いた高速モバイル画像認識エンジンの自動生成フレームワーク
丹野良介; 柳井啓司
日本工業出版, 出版日 2017年04月, 画像ラボ, 28巻, 4号, 掲載ページ 31-38, 日本語, 招待, 記事・総説・解説・論説等（商業誌、新聞、ウェブメディア）, 0915-6755, 40021174186, AN10164169

食事動作認識によるリアルタイム食事記録システム
岡元晃一; 柳井啓司
日本工業出版, 出版日 2015年10月, 画像ラボ, 26巻, 10号, 掲載ページ 1-7, 日本語, 招待, 記事・総説・解説・論説等（商業誌、新聞、ウェブメディア）, 0915-6755, 40020612817, AN10164169

ウェアラブルカメラ映像の自動要約による道案内映像の自動作成
岡本昌也; 柳井啓司
日本工業出版, 出版日 2015年10月, 画像ラボ, 26巻, 10号, 掲載ページ 8-15, 日本語, 招待, 記事・総説・解説・論説等（商業誌、新聞、ウェブメディア）, 0915-6755, 40020612826, AN10164169

CNNを用いた複数品食事画像の領域分割とカロリー推定 (データ工学)
下田和; 柳井啓司
電子情報通信学会, 出版日 2015年09月24日, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 115巻, 230号, 掲載ページ 65-70, 日本語, 0913-5685, 40020617008, AN10012921

CNNを用いた弱教師学習による画像領域分割 (情報論的学習理論と機械学習)
下田和; 柳井啓司
電子情報通信学会, 出版日 2015年09月14日, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 115巻, 225号, 掲載ページ 149-154, 日本語, 0913-5685, 40020617396

高速食事画像判別器を用いたTwitter食事画像分析とデータセット自動拡張
河野憲之; 柳井啓司
日本工業出版, 出版日 2014年12月, 画像ラボ, 25巻, 12号, 掲載ページ --54, 日本語, 招待, 記事・総説・解説・論説等（商業誌、新聞、ウェブメディア）, 0915-6755, 40020284287, AN10164169

Twitterからのジオタグ画像収集による視覚的イベント検出
金子昴夢; 柳井啓司
日本工業出版, 出版日 2014年11月, 画像ラボ, 25巻, 11号, 掲載ページ --28, 日本語, 招待, 記事・総説・解説・論説等（商業誌、新聞、ウェブメディア）, 0915-6755, 40020246600, AN10164169

画像メディア技術の実世界画像データへの応用
柳井啓司
出版日 2013年12月, 画像ラボ, 24巻, 12号, 日本語, 記事・総説・解説・論説等（その他）

ラーメンvsカレー : 2年分のログデータと高速食事画像認識エンジンを用いたTwitter食事画像分析とデータセット自動構築 (パターン認識・メディア理解)
河野憲之; 柳井啓司
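As many people have come to use Twitter, it has become possible to analyze people's behavior and thoughts through the large volume of posted tweets. Many tweets have images attached, and food photos in particular are tweeted in large numbers around lunch and dinner time. In this report, we describe an experiment in which we extracted food images of 100 categories from about one billion tweets with images collected over two years and four months, from May 2011 to August 2013, using food keyword search and a fast food image recognition engine. In the experiments, we produced a food image ranking, evaluated extraction precision by sampling for some food categories, and analyzed the regional distribution of "ramen" and "curry" using geotagged food image tweets. Furthermore, we describe a framework for automatically extending the 100-category food image dataset we built. Using a food image classification engine built from the 100-category food image data together with crowdsourcing on Amazon Mechanical Turk, it automatically constructs an image dataset with bounding boxes for a new food category from a given keyword alone. In the experiments, we compare recognition accuracy against a subset of the existing, manually created food image dataset., 一般社団法人電子情報通信学会, 出版日 2013年10月03日, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 113巻, 230号, 掲載ページ 59-64, 日本語, 0913-5685, 110009824980, AN10541106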
URL

FoodCam : スマートフォン上でのリアルタイム食事画像認識による食事記録アプリケーション (データ工学)
河野憲之; 柳井啓司
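Smartphones have become widespread in recent years, and their performance has improved. Conventional smartphone applications send data to a server and run image recognition there, which incurs communication cost and requires more computing resources as the number of users grows. It is therefore desirable to process images on the smartphone itself. This paper proposes a method for faster and more accurate generic object recognition on a smartphone with limited computing resources, and uses it to improve the image recognition component of an existing food recording application. Experiments confirmed that the method is both faster and substantially more accurate than the previous method. It also achieved higher recognition performance than a computationally expensive server-side recognition method, confirming its effectiveness.
一般社団法人電子情報通信学会, 出版日 2013年09月12日, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 113巻, 214号, 掲載ページ 13-18, 日本語, 0913-5685, 110009815150, AN10012921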

料理画像認識を用いたモバイル食事記録システム
河野憲之; 柳井啓司
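Smartphone performance has improved greatly in recent years. We propose a system that recognizes food images in real time on the smartphone, using only the phone's own computing resources, to help users record their meals. The system classifies 50 kinds of food from color histograms and Bag-of-SURF features with fast χ² kernel SVMs. When the user specifies the region to recognize, the system refines that region with GrabCut in the background; to handle misrecognition, it also uses the SVM scores to suggest the direction in which to move the camera so that the recognition result improves. In experiments, the system achieved a classification rate of 81.55% when the correct food region was given and five candidate dishes were presented. A user study was also conducted to verify the usefulness of the proposed system.
出版日 2013年05月23日, 研究報告コンピュータビジョンとイメージメディア（CVIM）, 2013巻, 4号, 掲載ページ 1-8, 日本語, 170000077117, AA11131797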
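As an illustration of the χ² (X2) kernel named in the abstract above, the following minimal sketch computes the additive χ² kernel between two L1-normalized histograms. It is an assumption that this is the kernel form meant in the report; the "fast" approximation and the SVM training are not reproduced, and the function name `chi2_kernel` is hypothetical.

```python
def chi2_kernel(x, y):
    """Additive chi-square kernel: sum over bins of 2*x_i*y_i / (x_i + y_i).

    x and y are non-negative histograms of equal length (for example a color
    histogram or a bag-of-features histogram). Bins that are empty in both
    histograms contribute nothing and are skipped to avoid division by zero.
    """
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


# Toy histograms (hypothetical values, L1-normalized).
same = chi2_kernel([0.5, 0.5], [0.5, 0.5])      # identical histograms
disjoint = chi2_kernel([1.0, 0.0], [0.0, 1.0])  # no overlapping bins
```

For L1-normalized inputs the kernel value lies between 0 (no shared bins) and 1 (identical histograms), which is one reason it is a common similarity measure for histogram features.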

ウェアラブルカメラを用いた道案内映像の自動作成
岡本昌也; 柳井啓司
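With the spread of wearable cameras, which are worn on the head or chest, first-person video is being recorded more and more. This study takes first-person video of a walking route as input and automatically generates a summary video whose playback speed changes dynamically according to scene importance, so that the route remains easy to follow. Two processes are used to generate the summary: crosswalk detection and ego-motion classification. Crosswalk detection finds crosswalks in the video in order to estimate intersections and branch points. Ego-motion classification assigns the camera wearer's motion to one of four classes: "forward", "stop", "turn right", and "turn left". The summary video is generated by combining the results of both processes and controlling the playback speed dynamically. Experiments measured the accuracy of crosswalk detection and of ego-motion classification to verify their effectiveness. Summary videos were generated for three videos with the proposed method and compared in an evaluation experiment, which showed that the proposed method outperforms a simple summarization method.
出版日 2013年05月23日, 研究報告コンピュータビジョンとイメージメディア（CVIM）, 2013巻, 5号, 掲載ページ 1-8, 日本語, 170000077118, AA11131797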
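The playback-speed control described in the abstract above can be pictured with a small sketch. The speed-up factors and the rule that segments containing a crosswalk play at normal speed are assumptions made for illustration only; the report does not state its actual values, and all names here are hypothetical.

```python
# Hypothetical speed-up factors per ego-motion class (not taken from the report).
SPEED_BY_MOTION = {
    "forward": 8.0,     # straight walking is skimmed quickly
    "stop": 16.0,       # standing still carries little route information
    "turn_right": 1.0,  # turns are shown at normal speed
    "turn_left": 1.0,
}


def playback_speed(motion, crosswalk_detected):
    """Return the speed-up factor for one video segment."""
    if crosswalk_detected:
        # Assumed rule: likely intersections are important, so do not speed up.
        return 1.0
    return SPEED_BY_MOTION[motion]


def summarized_duration(segments):
    """Total summary length in seconds.

    segments: iterable of (duration_seconds, motion, crosswalk_detected).
    """
    return sum(d / playback_speed(m, c) for d, m, c in segments)


# Toy input: 114 s of source video.
route = [
    (80.0, "forward", False),
    (10.0, "turn_left", False),
    (16.0, "stop", False),
    (8.0, "forward", True),
]
summary_seconds = summarized_duration(route)
```

With these assumed factors, most of the saving comes from long straight segments, while turns and crosswalks keep their original length.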

k-meansによる局所特徴量抽出と皿検出器による食事画像認識の改良 (パターン認識・メディア理解)
松田裕司; 柳井啓司
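We previously proposed a food image recognition engine that estimates food regions using sliding-window search, region segmentation, and circle detection, and classifies the estimated regions using multiple features. This study improves the engine by adding a feature based on the pixel values of local regions, obtained with k-means, and by using a plate detector that does not depend on the type of food for food region estimation. In experiments with known food regions, the k-means feature improved the classification rate by 2.4 points to 69.3% compared with BoF-SIFT of the same dimensionality. The plate detector improved the classification rate by 3.8 points to 66.4% compared with the previous DPMs trained per dish. These results show the effectiveness of both additions.
一般社団法人電子情報通信学会, 出版日 2013年03月14日, 電子情報通信学会技術研究報告 : 信学技報, 112巻, 495号, 掲載ページ 157-161, 日本語, 0913-5685, 110009713432, AN10541106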

視覚特徴およびタグ共起を用いた大規模Webビデオショットランキング (パターン認識・メディア理解)
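Do Hang Nga; 柳井啓司
This study proposes a new ranking method that uses the visual relationships between video shots and the co-occurrence between videos and their tags. The method is applied to a system that automatically extracts Web video shots relevant to a specific action. In the proposed method, noisy tag information is refined using the content of the videos so that it can be used more effectively. By considering the refined tag features of each video together with the visual features of its shots, the method places more relevant video shots at the top ranks than using visual features alone. A large-scale experiment comparing the method with a baseline verified its effectiveness.
一般社団法人電子情報通信学会, 出版日 2013年02月21日, 電子情報通信学会技術研究報告 : 信学技報, 112巻, 441号, 掲載ページ 221-226, 日本語, 0913-5685, 110009728829, AN10541106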

画像メディア技術の実世界データへの応用
柳井啓司
一般社団法人映像情報メディア学会, 出版日 2012年11月, 映像情報メディア学会誌, 66巻, 11号, 掲載ページ --906, 日本語, 記事・総説・解説・論説等（その他）, 1342-6907, 110009597746, AN10588970

Webマルチメディア情報と物体認識技術
柳井啓司
出版日 2012年07月, 画像ラボ, 23巻, 7号, 日本語, 記事・総説・解説・論説等（その他）

食材画像認識を用いたレシピ推薦システム
丸山拓馬; 秋山瑞樹; 柳井啓司
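This paper proposes a recipe recommendation system that uses image recognition on a mobile device. The system performs generic object recognition of food ingredients in real time on the smartphone. Because it recommends recipes one after another as the user simply points the smartphone at ingredients, recipe search is more intuitive and easier than with conventional systems that rely on key input only. For generic object recognition, the system uses color histograms and a Bag-of-Features representation based on SURF. In experiments on 30 kinds of ingredients, the system was compared with a touch-only recipe search system and evaluated by users. Image recognition identified the target ingredient with 44.9% accuracy, rising to 80.9% when the top five candidates are considered.
一般社団法人電子情報通信学会, 出版日 2012年03月05日, 電子情報通信学会技術研究報告. MVE, マルチメディア・仮想環境基礎, 111巻, 479号, 掲載ページ 43-48, 日本語, 0913-5685, 110009546395, AN10476092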

候補領域推定による複数品目に対応した食事画像認識
甫足創; 松田裕司; 柳井啓司
出版日 2011年07月20日, 画像の認識・理解シンポジウム(MIRU2011)論文集, 2011巻, 掲載ページ 234-240, 日本語, 170000067219

時空間特徴量を用いた Youtube 動画からの特定動作ショットの自動抽出
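Do Hang Nga; 柳井啓司
This study aims to extract shots corresponding to various verbs from Web videos automatically, using an unsupervised method that requires only a word as input. First, a Web 2.0 dictionary of action keywords is built from the co-occurrence of tags attached to videos of a specific action on YouTube, a video sharing site where general users can freely upload videos. Next, a large number of videos are collected in descending order of their dictionary-based relevance to the keyword. The collected videos are then split into shots based on color information, and spatio-temporal, motion, and visual features are extracted from each shot. Each shot is represented as a bag-of-features. Finally, VisualRank, a ranking method for selecting representative images from an image set, is applied to extract the shots that correspond to the specific action. The spatio-temporal features are those proposed by Noguchi et al. We also rank shots by combining the spatio-temporal features with motion features that represent overall motion and visual features from Gabor filters, and show that this is effective. In the VisualRank computation, shots from videos with high relevance according to the Web 2.0 dictionary, which is built from a tag database of actions, are emphasized. In experiments that ranked shots for six verbs, the shots corresponding to each verb were ranked with a precision of 53.3% in the top 10 and 48.7% in the top 100.
一般社団法人電子情報通信学会, 出版日 2011年02月10日, 電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解, 110巻, 414号, 掲載ページ 159-164, 日本語, 0913-5685, 110008690104, AN10541106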
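VisualRank is generally described as PageRank applied to a visual similarity graph. The sketch below, in plain Python, runs that power iteration with a bias vector standing in for the dictionary-based emphasis on shots from highly relevant videos mentioned in the abstract. The damping value 0.85, the toy similarity matrix, the bias values, and the function name are assumptions for illustration, not values from the report.

```python
def visual_rank(similarity, bias, damping=0.85, iterations=100):
    """PageRank-style ranking over a shot-similarity matrix with a bias vector.

    similarity: square list of lists with non-negative shot-to-shot similarities.
    bias: one positive weight per shot (e.g. tag relevance of its source video).
    Returns one score per shot; scores sum to 1.
    """
    n = len(similarity)
    # Column-normalize so each column of the similarity matrix sums to 1.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    norm = [[similarity[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]
    # Turn the bias weights into a probability vector.
    total = sum(bias)
    p = [b / total for b in bias]
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = [damping * sum(norm[i][j] * r[j] for j in range(n))
             + (1.0 - damping) * p[i] for i in range(n)]
    return r


# Toy example: shots 0 and 1 look alike, shot 2 is visually different.
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
uniform = visual_rank(S, [1.0, 1.0, 1.0])
biased = visual_rank(S, [0.1, 0.1, 0.8])  # emphasize shot 2
```

With a uniform bias the two mutually similar shots rank above the outlier; raising the bias weight of shot 2 lifts its score, which is the role the tag-based relevance plays in the abstract.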

物体認識技術の進歩
柳井啓司
出版日 2010年05月, 日本ロボット学会誌, 28巻, 3号, 掲載ページ 257-260, 日本語, 記事・総説・解説・論説等（その他）

Bag-of-Features に基づく物体認識(2)-一般物体認識-
柳井啓司
アドコム・メディア, 出版日 2010年, コンピュータビジョン最先端ガイド, 3巻, 掲載ページ 85-117, 20001476662

Multiple Kernel Learningを用いた50種類の食事画像分類
上東太一; 柳井啓司
日本工業出版, 出版日 2010年01月, 画像ラボ, 21巻, 1号, 掲載ページ 12-18, 日本語, 記事・総説・解説・論説等（その他）, 0915-6755, 40016947602, AN10164169

サポートベクターマシン関連ツール
柳井啓司
一般社団法人映像情報メディア学会, 出版日 2009年12月, 映像情報メディア学会誌, 63巻, 12号, 掲載ページ 1778-1781, 日本語, 記事・総説・解説・論説等（その他）, 1342-6907, 110009669470, AN10588970

セマンティックギャップを超えて―画像・映像の内容理解に向けて―
井手一郎; 柳井啓司
人工知能学会, 出版日 2009年09月, 人工知能学会誌, 24巻, 5号, 掲載ページ 691-699, 日本語, 記事・総説・解説・論説等（その他）, 0912-8085, 110007340609, AN10067140

Bag-of-Featuresによるカテゴリー分類
柳井啓司
日本工業出版, 出版日 2009年01月, 画像ラボ, 20巻, 1号, 掲載ページ 59-64, 日本語, 記事・総説・解説・論説等（その他）, 0915-6755, 40016422766, AN10164169

カメラが情景を理解する --シーンの意味的分類技術の最先端--
柳井啓司
出版日 2008年07月, 映像情報メディア学会誌, 62巻, 7号, 掲載ページ 40-45, 英語, 査読付, 記事・総説・解説・論説等（その他）

Semi-Supervised Learningを用いたWeb画像収集システム
柳井啓司
人工知能学会, 出版日 2005年, 人工知能学会全国大会論文集, 19巻, 掲載ページ 1-4, 日本語, 1347-9881, 40020231017, AA11578981

WWWからの高速画像収集と収集画像を用いた画像認識の試み
柳井啓司
出版日 2001年, 第15回人工知能学会全国大会講演論文集, 2001, 80012673207

マルチエージェントによる多重解像度画像理解システム
柳井啓司; 出口光一郎
人工知能学会, 出版日 1999年06月15日, 人工知能学会全国大会論文集 = Proceedings of the Annual Conference of JSAI, 13巻, 掲載ページ 463-466, 日本語, 0914-4293, 10009927951, AN10258229

マルチエージェントによる画像理解システム構成法
柳井啓司; 出口光一郎
日本工業出版, 出版日 1998年, 画像ラボ, 9巻, 8号, 掲載ページ 15-19, 日本語, 記事・総説・解説・論説等（その他）, 0915-6755, 40005023964, AN10164169

物体認識の結果を利用した3次元モデルの空間への配置の試み
柳井啓司; 出口光一郎
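Estimating the position and pose of a 3D object in a single 2D image, given accurate 3D shape models of the candidate objects in advance, is known as model-based recognition and has been studied extensively. For real-world images, however, it is generally difficult to provide accurate object shapes in advance. Recognition of the real world, on the other hand, is usually carried out as recognition of generic names. In particular, object recognition from a single image is mainly achieved by labeling the regions of a segmented image using region features and the relationships between regions. As a result, even though the real world is three-dimensional, the recognition process is two-dimensional and its output is often a labeling of regions, so no 3D information is obtained. Humans, however, can perceive depth even in a single 2D image (for example, Fig. 4) when the scene is familiar. This is thought to be because humans know the typical 3D structure of each object in the scene. Based on this idea, we attempted to recover 3D information such as the relative 3D orientation and size of each object by using the results of segmentation-based object recognition to fit a given prototype 3D model of each object class (the set of objects sharing a generic name) to the image. This presentation describes the concept and the results of a simple experiment.
出版日 1997年09月24日, 全国大会講演論文集, 55巻, 掲載ページ 295-296, 日本語, 110002891448, AN00349328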

分散協調処理による多数の物体認識プログラムの統合
柳井啓司; 出口光一郎
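We are building a distributed cooperative object recognition system that uses multiple agents as a framework for flexible recognition of real images. In our system, an object recognition program that recognizes a single kind of object is combined with a cooperation function to form one agent, and object recognition is realized by a collection of such agents. Each agent recognizes objects in the image independently, and the agents exchange their results and perform cooperative processing, such as resolving conflicts and using the results of other agents, based on knowledge of the relationships that normally hold between objects. We implemented the system on the AP1000+ parallel computer. This presentation gives an overview of the object recognition system built on this concept and shows examples of its operation.
一般社団法人電子情報通信学会, 出版日 1997年08月20日, 電子情報通信学会技術研究報告. CPSY, コンピュータシステム, 97巻, 226号, 掲載ページ 31-38, 日本語, 110003179856, AN10013141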

形状知識と関係知識の統合によるマルチエージェント物体認識システムの実現
柳井啓司; 出口光一郎
出版日 1997年06月24日, 人工知能学会全国大会論文集 = Proceedings of the Annual Conference of JSAI, 11巻, 掲載ページ 408-411, 日本語, 0914-4293, 10011367423, AN10258229

マルチエージェント物体認識システムにおける対象物体に関する知識についての一考察
柳井啓司; 出口光一郎
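Conventional object recognition systems have mainly used model-based recognition, in which knowledge about the shape of the target objects is stored as numerical data in a knowledge base and used for recognition. This approach is effective when the exact shape of the target object is known in advance. In real-world images, however, the exact shape of the target usually cannot be known beforehand, so it is difficult to build shape models from absolute numerical values, and the approach is not always effective. We therefore tried to make active use of the relative relationships between objects, in addition to the shapes of the target objects, as knowledge in an object recognition system for real images built with multiple agents. This paper reports on that experiment.
出版日 1997年03月12日, 全国大会講演論文集, 54巻, 掲載ページ 85-86, 日本語, 110002890902, AN00349328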

マルチエージェントによる機能ベーストな物体認識システムの試み
柳井啓司; 出口光一郎
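Conventional research on model-based object recognition has dealt with objects whose shapes are known in advance. Humans, however, can recognize roughly what an object is, even when seeing its shape for the first time, by inferring its function. To achieve recognition closer to that of humans, unknown objects must also be recognized through inference. As a new approach to this kind of recognition, function-based object recognition, which recognizes the functions of an object and identifies the object from them, has recently been proposed. Separately, image understanding by distributed cooperative processing, which applies ideas from distributed AI and multi-agent systems to the architecture of image understanding systems, has also been proposed in recent years. Introducing distributed cooperative processing makes recognition based on the relationships between objects in an image possible. This presentation describes a plan to apply function-based object recognition, realized through distributed cooperative processing, to object recognition in real images of indoor environments composed of man-made objects, and introduces the experimental system we are prototyping.
出版日 1996年09月04日, 全国大会講演論文集, 53巻, 掲載ページ 235-236, 日本語, 110002887722, AN00349328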

書籍等出版物

IT Text 深層学習
柳井啓司; 中鹿亘; 稲葉通将
学術書, 日本語, 共著, オーム社, 出版日 2022年11月

レクチャーマルチメディア: 基礎からわかる音・画像・映像の情報処理
川崎洋; 柳井啓司; 佐川立昌; 森山剛; 古川亮
学術書, 日本語, 共著, 数理工学社, 出版日 2022年03月29日

光学辞典
事典・辞書, 日本語, 分担執筆, 物体認識 (P.204-280), 朝倉書店, 出版日 2014年

総合コミュニケーション科学シリーズユニーク＆エキサイティングサイエンス III
一般書・啓蒙書, 日本語, 共著, 「Web 情報マイニングへの招待 –Web 上の画像・映像から面白いこと，役に立つことを発見する！–」, 近代科学社, 出版日 2014年

Multimedia Information Extraction
Keiji Yanai; Hidetoshi Kawakubo; Kobus Barnard
英語, 共著, Entropy-based Analysis of Visual and Geo-location Concepts in Images, IEEE Computer Society Press, 出版日 2011年

Handbook of Social Network Technologies and Applications
Keiji Yanai; Bingyu Qiu
英語, 共著, Mining Regional Representative Photos from a Consumer-Generated Geotagged Photo Database, Springer, 出版日 2010年

コンピュータビジョン最先端ガイド III
柳井啓司
日本語, 共著, Bag-of-Featuresに基づく物体認識 -- 一般物体認識 --, アドコム・メディア株式会社, 出版日 2010年

Toward Category-Level Object Recognition
Kobus Barnard; Keiji Yanai; Matthew Johnson; Prasad Gabbur
英語, 共著, Cross Modal Disambiguation, Springer, 出版日 2006年12月

講演・口頭発表等

StableSeg: Stable Diffusionによるゼロショット領域分割
本部勇真; 山口廉斗; 柳井啓司
画像の認識・理解シンポジウム (MIRU), 査読付
発表日 2023年07月

人物・物体・動作デコーダの分離によるHOI検出
陳俊文; 王瀛成; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 査読付
発表日 2023年07月

VQ-VDM: ベクトル量子化変分オートエンコーダと拡散モデルを用いた動画生成モデル
梶凌太; 柳井啓司
画像の認識・理解シンポジウム (MIRU), 査読付
発表日 2023年07月

CalorieCam360: 全方位カメラによる複数人同時食事カロリー量推定システム
寺内健人; 山本耕平; 柳井啓司
画像の認識・理解シンポジウム (MIRU), 査読付
発表日 2023年07月

CLIPと微分可能レンダラーを用いたフォントスタイル変換
泉幸太; 柳井啓司
画像の認識・理解シンポジウム (MIRU)
発表日 2023年07月

Stable Diffusionによるゼロショット画像領域分割
本部勇真; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2023年03月03日

深層距離学習の特許図面検索への適用
樋口幸太郎; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2023年03月03日

分離されたデコーダとノイズ除去学習を用いたHOI検出
陳俊文; 王瀛成; 柳井啓司
発表日 2023年03月02日

全方位カメラを用いた複数人食事動作同時認識
寺内健人; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2023年03月02日

SetMealAsYouLike: Few-shot Segmentationによる食事画像への皿領域マスクの追加と食事画像生成への応用
本部勇真; 寺内健人; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

時間的一貫性を考慮したビデオ会議のための自然な仮想試着
清水大輝; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

Cross-Modal Recipe Embeddingを用いたマスクに基づく食事画像生成
陳仲涛; 楊景; 本部勇真; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

大規模マルチモーダルモデルCLIPを用いた画像形状変換
銭雨晨; 山本耕平; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

TransformerとLarge Batch Sizeを用いたクロスモーダルレシピエンベディング学習
楊景; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

マルチスケールのアンカーを用いた人間と物体のインタラクション検出
陳俊文; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

単一RGB-D画像と陰関数表現を用いた食事と食器の実寸三次元再構成と体積推定
成冨志優; 本部勇真; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

Vision Transformerを用いたContinual Learning
武田麻奈; 清水大輝; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2022年07月

Vision TransformerにおけるContinual Learning
武田麻奈; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

陰関数表現とRGB-D画像を用いた実寸通り食事と食器の三次元再構成
成冨志優; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

StyleGANによるCLIP-Guidedな画像形状特徴編集
銭雨晨; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

クエリベースのアンカーを用いた人間と物体のインタラクション検出
陳俊文; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

Transformerを用いた人物行動検出
水野颯介; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

クロスモーダルレシピエンベディングによるマスクに基づく食事画像生成
陳仲涛; 本部勇真; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

Transformerを用いたクロスモーダルレシピ検索・画像生成
楊景; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2022年03月

Adaptive Point-wise グループ化畳み込みを用いた小規模データセットからの画像の生成
武田麻奈; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2021年07月

初期ポーズ生成の改良とGCNの導入によるポーズシーケンス生成モデルの拡張
寺内健人; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2021年07月

全体カロリー量のみがアノテーションされた複数品食事画像の個別カロリー量推定
岡本開夢; 足立賢人; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2021年07月

食事画像に対するFew/Zero-shot Segmentation
本部勇真; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2021年07月

食事画像に対する少数およびゼロショット領域分割
本部勇真; 柳井啓司
ポスター発表, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), 国内会議
発表日 2021年05月

初期ポーズ生成の改良とGCNの導入によるポーズシーケンス生成モデルの拡張
寺内健人; 柳井啓司
ポスター発表, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), 国内会議
発表日 2021年05月

単一画像からの食事(食器含む)と食器単体の三次元形状の同時復元を用いた食事領域の体積推定
成冨志優; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会, 国内会議
発表日 2020年10月

意味と形状の分離によるマルチモーダルレシピ検索及び画像生成
杉山優; 岡本開夢; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2020年08月

単一画像変換ネットワークによる複数タスクと組み合わせタスクの学習
武田麻奈; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2020年08月

単一食事画像からの皿と食事の同時分離形状復元
成冨志優; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2020年08月

RamenStyleAsYouLike: 領域毎のスタイル特徴の融合による画像生成
岡本開夢; 下田和; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2020年08月

画素単位アノテーション付きの食事画像データセットの構築と認識・生成への応用
岡本開夢; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2020年08月

意味と形状の分離によるマルチモーダルレシピ検索及び画像生成
杉山優; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2020年03月

ラーメンスタイルエンコーダーを用いたスタイル特徴とマスク画像からの画像生成
趙宰亨; 下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU)
発表日 2020年03月

映像・音・センサー情報の統合によるレスキュー犬の1人称行動認識
井出佑汰; 荒木勇人; 濱田龍之介; 大野和則; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), 国内会議
発表日 2020年03月

皿領域の推論を活用した食事の弱教師あり領域分割
下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会 (CEA), 国内会議
発表日 2019年10月

食事画像領域分割データセットの作成とその活用
岡本開夢; Cho Jaehyeong; 會下拓実; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会 (CEA), 国内会議
発表日 2019年10月

自己教師あり学習による変化領域の推論を活用した弱教師あり領域分割
下田和; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

米飯を基準としたCNNによる食事画像からのカロリー量推定
會下拓実; Jaehyeong Cho; 松平礼史; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

Ramen as You Like
Jaehyeong Cho; Wataru Shimoda; Keiji Yanai
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

レスキュー犬の一人称動画を用いた動作推定
荒木勇人; 井出佑汰; 濱田龍之介; 大野和則; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

DepthCalorieCam: 深度カメラと深層学習による自動食事カロリー量推定システム
安蒜祥和; 會下拓実; 岡本開夢; 泉裕貴; Jaehyeong Cho; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

大量のTwitter位置情報付き画像を用いた世界各地域における食事傾向分析
岡本開夢; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

Identityと化粧Styleの分離による顔画像変換
五味京祐; 越野誠也; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

重み選択マスクを用いた画像変換ネットワークの連続学習
松本晨人; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

SSA-GAN: Cloud Video Generation from a Single Image with Spatial Self-Attention Generative Adversarial Networks
Daichi Horita; Keiji Yanai
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

ウミネコ動画の自動分析
井出佑汰; 水谷友一; 依田憲; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

ONNX2MPSNNGraph: モバイル深層学習コードジェネレータの実装と評価
泉裕貴; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2019年08月

米飯画像の実寸推定に基づく面積を考慮したカロリー量推定
會下拓実; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム (DEIM), 国内会議
発表日 2019年03月

深度付き画像と深層学習による食事カロリー量推定システムの開発
安蒜祥和; 會下拓実; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム (DEIM), 国内会議
発表日 2019年03月

画像変換ネットワークによる連続学習
松本晨人; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム (DEIM), 国内会議
発表日 2019年03月

位置情報付きTwitter画像を用いた世界の食事傾向分析
岡本開夢; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム (DEIM), 国内会議
発表日 2019年03月

Conditional GANによる化粧顔画像変換
五味京祐; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム (DEIM), 国内会議
発表日 2019年03月

変化領域の推測による弱教師あり領域分割の精度向上
下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 国内会議
発表日 2019年03月

深層学習による太陽画像からの太陽黒点数の推定
樋口陽光; 會下拓実; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会総合大会, 国内会議
発表日 2019年03月

教師情報に含まれるノイズに堅牢な弱教師あり領域分割手法
下田和; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

NNによる料理検出とカロリー量推定のマルチタスク学習
會下拓実; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

単語情報を利用した画像の質感転送
杉山優; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

Chainer2MPSGraph: 高速深層学習モバイルアプリ作成のためのモデルコンバータ
泉裕貴; 堀田大地; 丹野良介; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

画像マイニングを用いた Conditional Cycle GAN による食事画像変換
堀田大地; 丹野良介; 下田和; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

CNNを用いた質感文字生成
成沢淳史; 下田和; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

AR 技術とモバイル深層学習を活用した食事カロリー量推定
丹野良介; 會下拓実; Jaehyeong Cho; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2018年08月

深層学習による質感文字生成
成沢淳史; 下田和; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会, 国内会議
発表日 2018年06月

画像内容を考慮した質感表現に基づく画像変換
杉山優; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会, 国内会議
発表日 2018年06月

大量のTwitter画像を用いたConditional Cycle GANによる食事写真カテゴリ変換
堀田大地; 成冨志優; 丹野良介; 下田和; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会, 国内会議
発表日 2018年06月

深層学習による画像認識革命
口頭発表（招待・特別）, 日本語, オプティメカトロニクス協会講演会, 招待, 国内会議
発表日 2018年03月14日

スタイル転移によるフォント画像変換
成沢淳史; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), 国内会議
発表日 2018年03月

AR DeepCalorieCam: AR表示型食事カロリー量推定システム
丹野良介; 會下拓実; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM), 国内会議
発表日 2018年03月

Conditional GANによる食事写真の属性操作
成冨志優; 堀田大地; 丹野良介; 下田和; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM), 国内会議
発表日 2018年03月

Conditional GANを用いた大規模食事画像データからの画像生成
伊藤祥文; 丹野良介; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会(CEA), 国内会議
発表日 2018年03月

単一の畳み込みネットワークによる料理検出とカロリー量推定のマルチタスク学習
會下拓実; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会(CEA), 国内会議
発表日 2018年03月

CNNによる複数品食事画像の同時カロリー推定とそのモバイル実装
會下拓実; 丹野良介; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会(CEA), 国内会議
発表日 2017年12月

CoreMLによるiOS深層学習アプリの実装と性能分析
丹野良介; 泉裕貴; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 国内会議
発表日 2017年10月

CNNによる複数料理写真からの同時カロリー量推定
會下拓実; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 国内会議
発表日 2017年10月

食事画像カロリー量推定における回帰による手法と検索による手法の比較
會下拓実; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会 (CVIM)
発表日 2017年09月

完全教師あり学習手法を用いた弱教師あり領域分割におけるシード領域生成方法の改良
下田和; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会 (CVIM), 国内会議
発表日 2017年09月

Conditional GAN を用いた複数詳細カテゴリ画像の合成
伊藤祥文; Jaehyeong Cho; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

弱教師あり領域分割のための一貫性に基づく学習画像の領域分割容易性推定
下田和; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

画像スタイル変換とWeb画像を用いた画像の任意質感生成
松尾真; 下田和; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

Multi-task CNNを用いた食材および調理手順情報を利用した食事画像カロリー量推定
會下拓実; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

Twitter画像に対する地域別画像タイプの大規模分析
長野哲也; 會下拓実; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

Neural Style TransferとCycle GANを利用したフォント変換
成沢淳史; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

ConvDeconvNetの効率的モバイル実装による画像変換・物体検出・領域分割リアルタイムiOSアプリ群
丹野良介; 泉裕貴; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

Unseen Style Transfer Network
Ryosuke Tanno; Keiji Yanai
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2017年08月

画像を生成する深層学習ネットワーク ―領域分割と画像生成・変換―
柳井啓司; 下田和
口頭発表（招待・特別）, 日本語, 日本画像学会主催セミナー『人工知能における学習技術』, 招待, 日本画像学会, 国内会議
発表日 2017年07月04日

食事レシピ情報を利用した食事画像からのカロリー量推定
會下拓実; 柳井啓司
ポスター発表, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), 国内会議
発表日 2017年05月11日

映像認識技術の現状と生物映像への適用可能性
柳井啓司
口頭発表（招待・特別）, 日本語, 生物学動画・音声アーカイブに関するシンポジウム, 招待, 大阪市立自然史博物館, 国内会議
発表日 2017年03月05日

深層学習技術が引き起こした画像認識の大幅性能向上
柳井啓司
口頭発表（招待・特別）, 日本語, 「くらしの中の共生」第13回人類動態学会シンポジウム, 招待, 人類動態学会, 国内会議
発表日 2016年12月17日

食事画像認識の現状と今後
柳井啓司
口頭発表（招待・特別）, 日本語, 電子情報通信学会データ工学研究会(DE), 招待, 国内会議
発表日 2016年12月01日

深層学習による質感画像の認識・変換
柳井啓司
口頭発表（招待・特別）, 日本語, 質感のつどい第２回公開フォーラム, 招待, 国内会議
発表日 2016年11月30日

ディープラーニングによる画像・映像の認識と生成
柳井啓司
口頭発表（招待・特別）, 日本語, 「映画のまち調布」映画・映像技術シンポジウム, 招待, 調布市役所、多摩信用金庫、国立大学法人電気通信大学, 国内会議
発表日 2016年11月25日

Neural Style Transferと領域分割による画像の部分的質感操作
松尾真; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2016年09月

弱教師学習手法を用いたWebからの食事検出器の自動学習
下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 国内会議
発表日 2016年09月

CNNの順・逆伝搬値とCRFを利用した弱教師領域分割
下田和; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2016年08月

Style Image Retrieval Using CNN-based Style Vector
Shin Matsuo; Keiji Yanai
口頭発表（一般）, 英語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2016年08月

DeepXCam: CNNによるリアルタイムモバイル画像認識・変換アプリ群
丹野良介; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2016年08月

CNNを用いた商品札文字認識
成沢淳史; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2016年08月

Twitter食事画像からの詳細カテゴリ発見
伊藤祥文; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2016年08月

質感画像の弱教師領域分割とその結果に基づく質感の部分的変換
下田和; 松尾真; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会
発表日 2016年06月

スマートフォン上でのDeep Learningによる画像認識
丹野良介; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会, 国内会議
発表日 2016年06月

80msで認識可能な深層学習による2000種類物体認識iOSアプリ
丹野良介; 柳井啓司
ポスター発表, 日本語, 画像センシングシンポジウム (SSII)
発表日 2016年06月

値札文字認識による価格記録Webアプリケーション
成沢淳史; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM), 国内会議
発表日 2016年03月

iOS上のDeep Learningによる食事画像認識アプリ
丹野良介; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM), 国内会議
発表日 2016年03月

テレビ映像からの特定動作シーンの自動検出
小林隼人; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM), 国内会議
発表日 2016年03月

位置情報付き画像を用いた単語概念の時間変化の分析
ボルドビレグサイハン; 及川雄介; 伊藤祥文; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM), 国内会議
発表日 2016年03月

スマートフォンによる食事画像からの自動カロリー量推定システム
岡元晃一; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会, 国内会議
発表日 2016年03月

料理写真撮影におけるおいしそうな構図決定および撮影支援モバイルアプリ
柿森隆生; 岡部誠; 柳井啓司; 尾内理紀夫
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会
発表日 2016年03月

CNNとMILを用いた弱教師あり領域分割
下田和; 柳井啓司
口頭発表（一般）, 日本語, 情報論的学習理論ワークショップ (IBIS), 国内会議
発表日 2015年11月27日

CNNを用いた複数品食事画像の領域分割とカロリー推定
下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会データ工学研究会
発表日 2015年09月25日

CNNを用いた弱教師学習による画像領域分割
下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 国内会議
発表日 2015年09月15日

Web大規模画像データを用いた画像とオノマトペの関係分析
柳井啓司; 下田和
口頭発表（招待・特別）, 日本語, FIT2015 第14回情報科学技術フォーラム, 招待, 国内会議
発表日 2015年09月

料理写真撮影におけるおいしそうな構図決定を支援するシステム
柿森隆生; 岡部誠; 柳井啓司; 尾内理紀夫
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2015年07月

DeepFoodCam: DCNNによる101種類食事認識アプリ
岡元晃一; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2015年07月

画像特徴とテキスト特徴を用いた画像ツイートの位置推定
松尾真; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU)
発表日 2015年07月

シーン文字認識と自己動作分類を用いた車載動画の要約
佐藤享憲; 成沢淳史; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU)
発表日 2015年07月

CNNの逆伝搬を利用した食事画像の領域分割
下田和; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU)
発表日 2015年07月

画像の位置推定を用いたマイクロブログからの視覚的なイベント検出
金子昂夢; 松尾真; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU)
発表日 2015年07月

Automatic Action Video Dataset Construction from Web using Density-based Cluster Analysis and Outlier Detection
Do Hang Nga; Keiji Yanai
ポスター発表, 英語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2015年07月

DCNN特徴を用いたWeb からの質感画像の収集と分析
下田和; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）, 国内会議
発表日 2015年01月22日

Real-time Photo Mining from the Twitter Stream: Event Photo Discovery and Food Photo Detection
Keiji Yanai; Takamu Kaneko; Yoshiyuki Kawano
口頭発表（招待・特別）, 英語, IEEE International Symposium on Multimedia (ISM), 招待, 国際会議
発表日 2014年12月

大量の位置情報付きTwitter画像データからの視覚的イベント検出
金子昂夢; 柳井啓司
ポスター発表, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2014年07月

既存カテゴリの活用とクラウドソーシングによる食事画像データセットの自動拡張
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU), 国内会議
発表日 2014年07月

Twitter上の位置情報付き画像を利用したリアルタイムイベント画像検出
金子昂夢; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会, 国内会議
発表日 2014年05月

クラウドソーシングによる食事画像データセットの自動構築
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会, 国内会議
発表日 2014年05月

Twitterからのジオタグ画像収集による視覚的イベント検出
金子昂夢; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2013年10月

ラーメン vs カレー：２年分のログデータと高速食事画像認識エンジンを用いたTwitter食事画像分析とデータセット自動構築
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2013年10月

FoodCam:スマートフォン上でのリアルタイム食事画像認識による食事記録アプリケーション
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会データ工学研究会（DE）
発表日 2013年09月

VisualTextualRank: A Video Shot Ranking Method Using Visual Similarity and Tag Co-occurrence
Do Hang Nga; Keiji Yanai
口頭発表（一般）, 英語, 画像の認識・理解シンポジウム(MIRU2013)
発表日 2013年07月

スマートフォン上でのリアルタイム食事認識システム
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2013)
発表日 2013年07月

スマートフォン上でのリアルタイム食事・食材画像認識アプリケーション
河野憲之; 丸山拓馬; 柳井啓司
口頭発表（一般）, 日本語, 画像センシングシンポジウム (SSII)
発表日 2013年06月

ウェアラブルカメラを用いた道案内映像の自動作成
岡本昌也; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会 (CVIM)
発表日 2013年05月

料理画像認識を用いたモバイル食事記録システム
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
発表日 2013年04月

k-meansによる局所特徴量抽出と皿検出器による食事画像認識の改良
松田裕司; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2013年03月

食事認識を用いたモバイル食事管理システム
河野憲之; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM)
発表日 2013年03月

位置情報付き画像ツイートを利用した視覚的なイベント検出
金子昂夢; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM)
発表日 2013年03月

クラウドソーシングによる食事画像認識モデルの自動構築
大澤翔吾; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM)
発表日 2013年03月

視覚特徴およびタグ共起を用いた大規模Webビデオショットランキング
Do Hang Nga; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会（PRMU）
発表日 2013年02月

TRECVID Semantic Indexing Task と Multimedia Event Detection Taskへの取り組み
樋爪和也; 柳井啓司
口頭発表（一般）, 日本語, ビジョン技術の実利用ワークショップ (ViEW2012)
発表日 2012年12月

Web動画・画像を用いた特定動作の対応ショットの自動抽出
Do Hang Nga; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2012)
発表日 2012年08月

料理間の共起関係を考慮した食事画像認識
松田裕司; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2012)
発表日 2012年08月

タグの組み合わせによる視覚的な関連性変化の分析
小原侑也; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2012)
発表日 2012年08月

食材画像認識を用いたモバイルレシピ推薦システム
丸山拓馬; 樋爪和也; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2012)
発表日 2012年08月

Automatically Extracting Relevant Video Shots of Specific Actions from the Web
Do Hang Nga; Keiji Yanai
口頭発表（招待・特別）, 英語, Greater Tokyo Area Multimedia/Vision Workshop, Tokyo, Japan, 国際会議
発表日 2012年08月

Web上の大量画像を用いた名詞と形容詞の関係分析
小原侑也; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
発表日 2012年05月

物体認識技術を用いたモバイル物品管理システム
望月宏史; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム（DEIM）
発表日 2012年03月

テレビ番組からの位置情報付き旅行映像データベースの自動構築
向井康貴; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム（DEIM）
発表日 2012年03月

食材画像認識を用いたモバイルレシピ推薦システム
丸山拓馬; 秋山瑞樹; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会食メディア研究会(CEA)
発表日 2012年03月

服飾画像マイニングのための衣類領域からの色情報抽出
相田優; 柳井啓司; 柴原一友; 藤本浩司
口頭発表（一般）, 電子情報通信学会画像工学研究会(IE)
発表日 2012年03月

特徴点選択とペア化による Naive-Bayes Nearest-Neighbor手法の改良
秋山瑞樹; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
発表日 2012年03月

Web動画・画像を用いた特定動作ショットの自動収集
Do Hang Nga; 樋爪和也; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
発表日 2012年03月

Deformable Part Modelを用いた料理の位置検出
松田裕司; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2011年12月

物体認識技術を用いた食事画像認識
柳井啓司
口頭発表（招待・特別）, 日本語, 人工知能セミナー「食とAI～消費・小売流通・生産の立場から～」, 国内会議
発表日 2011年10月

Webマルチメディアと物体認識
柳井啓司
口頭発表（招待・特別）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会, 国内会議
発表日 2011年10月

一般物体認識技術の発展
柳井啓司
口頭発表（招待・特別）, 日本語, 精密工学会画像応用技術専門委員会研究会, 国内会議
発表日 2011年09月

大量のWeb動画からの教師なし特定動作ショット抽出
Do Hang Nga; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2011)
発表日 2011年07月

GeoVisualRank を用いた単語概念の地域性の分析
川久保秀敏; 樋爪和也; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2011)
発表日 2011年07月

候補領域推定による複数品目に対応した食事画像認識
甫足創; 松田裕司; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2011)
発表日 2011年07月

マルチフレームを用いた動画像認識
樋爪和也; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンとイメージメディア研究会（CVIM）
発表日 2011年05月

地域別代表画像を用いた単語概念の地域性の分析
川久保秀敏; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会(CVIM)
発表日 2011年03月

候補領域推定による食事画像の複数品目認識
甫足創; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会(CVIM)
発表日 2011年03月

Bag-of-frames と時空間特徴量を用いたSemantic Indexing Taskへの取り組み
下田保志; 野口顕嗣; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU)
発表日 2011年02月

時空間特徴量を用いたWeb 動画からの特定動作ショットの自動抽出
Do Hang Nga; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会(PRMU)
発表日 2011年02月

写真撮影の位置軌跡を利用した旅行支援システム
奥山幸也; 柳井啓司
口頭発表（一般）, 日本語, データ工学と情報マネジメントに関するフォーラム(DEIM)
発表日 2011年02月

Webマルチメディアと物体認識
柳井啓司
口頭発表（招待・特別）, 日本語, NHK放送技術研究所セミナー, 国内会議
発表日 2010年12月

Folksonomyを用いた画像特徴とタグ共起に基づく画像オントロジーの自動構築
秋間雄太; 川久保秀敏; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2010)
発表日 2010年07月

Web上の大量画像を用いた特定物体認識手法による一般物体認識
秋山瑞樹; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2010)
発表日 2010年07月

動作認識のための時空間特徴量と特徴統合手法の提案
野口顕嗣; 下田保志; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2010)
発表日 2010年07月

ジオタグ画像認識における位置情報の利用法の検討と分析
八重樫恵太; 丸山拓馬; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2010)
発表日 2010年07月

位置情報を考慮したVisualRankによる地域別代表画像の選出
川久保秀敏; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2010)
発表日 2010年07月

画像・映像の認識と意味的検索
柳井啓司
口頭発表（招待・特別）, 日本語, 電子情報通信学会音声研究会(SP), 国内会議
発表日 2010年07月

一般物体認識における機械学習の利用
柳井啓司
口頭発表（招待・特別）, 日本語, 電子情報通信学会情報論的学習と機械学習研究会(IBIS-ML), 国内会議
発表日 2010年06月

Web 上の大量画像を用いた特定物体認識手法による一般物体認識
秋山瑞樹; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2010年05月

位置情報付き路上画像の撮影方向推定システムの提案
丸山拓馬; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2010年05月

ジオタグ画像認識における周辺テキスト情報の有効性の検証
八重樫恵太; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2010年03月

Web動画ショットの動作分類のための時空間特徴抽出手法の提案
野口顕嗣; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2010年03月

Folksonomyによる階層構造画像データベースの構築
秋間雄太; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2010年03月

多種類特徴統合による動作認識手法の提案
野口顕嗣; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2010年03月

【チュートリアル】一般物体認識
柳井啓司
口頭発表（招待・特別）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会, 招待, 国内会議
発表日 2009年11月

VisualRankにおける位置情報活用の検討
川久保秀敏; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2009年11月

マルチカーネル学習を用いた画像特徴と航空写真特徴の重要度の推定
八重樫恵太; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2009年11月

位置情報付き写真における撮影位置の航空写真を利用した画像認識
八重樫恵太; 柳井啓司
口頭発表（一般）, Webとデータベースに関するフォーラム (WebDB Forum 2009)
発表日 2009年11月

一般物体認識技術の発展と映像検索への応用
柳井啓司
口頭発表（招待・特別）, 日本語, FIT2009 第８回情報科学技術フォーラム
発表日 2009年09月

Web画像マイニング -Web上の膨大な画像データからの知識発見-
柳井啓司
口頭発表（招待・特別）, 日本語, FIT2009 第８回情報科学技術フォーラム
発表日 2009年09月

動きの連続性を考慮した動画からの局所的な時空間特徴の抽出
野口顕嗣; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2009)
発表日 2009年08月

単語概念の視覚性と地理的分布の関係性の分析
川久保秀敏; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2009)
発表日 2009年08月

画像特徴とテキスト特徴の統合によるWebスポーツニュース画像のイベント分類
北原章雄; 奥山幸也; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2009)
発表日 2009年08月

Multiple Kernel Learningによる50種類の食事画像の認識
上東太一; 甫足創; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2009)
発表日 2009年08月

Multiple Kernel Learningを用いた食べ物画像の認識
上東太一; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2009年03月

TRECVID高次特徴抽出タスクにおける多種類特徴統合手法の比較
湯志遠; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2009年03月

画像特徴とテキスト特徴を用いたWebスポーツニュース画像のイベント分類
北原章雄; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2009年03月

Bag-of-Features表現を用いたエントロピーによる単語の視覚性の分析
川久保秀敏; 柳井啓司
口頭発表（一般）, 英語, 情報処理学会コンピュータビジョンイメージメディア研究会
発表日 2009年03月

現状および今後の物体認識技術のデジタルカメラへの応用の可能性
柳井啓司
口頭発表（招待・特別）, 日本語, オプティメカトロニクス協会講演会, オプティメカトロニクス協会
発表日 2009年03月

一般物体認識の現状と今後の展望
柳井啓司
口頭発表（招待・特別）, 日本語, 筑波大学システム情報工学研究科コンピュータサイエンス専攻学術講演会, 筑波大学システム情報工学研究科コンピュータサイエンス専攻
発表日 2009年03月

Mining Regional Representative Photos from a Large-scale Geotagged Image Database
Bingyu Qiu; Keiji Yanai
口頭発表（一般）, 英語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2008年12月

一般物体認識と機械学習
柳井啓司
口頭発表（招待・特別）, 日本語, T-PRIMAL セミナー
発表日 2008年08月

色・動き・音特徴を用いたEarth Mover’s Distance に基づくWeb動画検索
高田圭佑; 柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2008)
発表日 2008年07月

確率トピックモデルによるWeb画像の分類
柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会,http://www.ai-gakkai.or.jp/jsai/conf/2008/
発表日 2008年06月

Bag-of-keypointsによるカテゴリー分類
柳井啓司
口頭発表（招待・特別）, 日本語, 画像センシングシンポジウム(SSII)
発表日 2008年06月

色，動き，顔特徴に基づくTRECVIDラッシュ映像の自動要約
野口顕嗣; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
発表日 2008年05月

撮影位置の情報を用いた一般画像認識の可能性の検討
八重樫恵太; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
発表日 2008年05月

多種類特徴の統合によるTRECVID映像の認識
劉謳南; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会全国大会
発表日 2008年03月

クラスタリングによるTRECVIDラッシュ映像の要約
野口顕嗣; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会全国大会
発表日 2008年03月

Earth Mover's Distance を用いた類似Web動画検索
高田圭佑; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会全国大会
発表日 2008年03月

Bag-of-Keypointsを用いたWeb画像収集の高精度化
柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2007)
発表日 2007年08月

Web画像ニュースの顔と人物名の対応付け
北原章雄; 柳井啓司
口頭発表（一般）, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2007年05月

Bag-of-Keypoints表現を用いたWeb画像分類
上東太一; 柳井啓司
口頭発表（一般）, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2007年05月

Bag-of-keypointsによるTRECVIDデータに対する映像認識
湯志遠; 柳井啓司
口頭発表（一般）, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2007年05月

位置情報を用いた旅行自動記録システム
阿久津剛之; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会全国大会
発表日 2007年03月

Web写真ニュースの分類と検索
伊與田達也; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会全国大会
発表日 2007年03月

画像付きニュース記事からの顔と人物名の抽出
北原章雄; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会全国大会
発表日 2007年03月

一般物体認識の現状と今後
柳井啓司
口頭発表（招待・特別）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2006年09月

Web画像マイニング: Webからの画像知識の獲得とその応用
柳井啓司
口頭発表（招待・特別）, 日本語, 電気関係学会東海支部連合大会
発表日 2006年09月

確率モデルを用いたWeb画像マイニングによる画像認識
柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU2006)
発表日 2006年07月

確率モデルを用いたWeb画像マイニングによる画像認識
柳井啓司
口頭発表（一般）, 日本語, 人工知能学会全国大会
発表日 2006年06月

テキスト特徴と画像特徴を併用したWeb上の写真付きニュース記事のクラスタリング
伊与田達也; 阿久津剛之; 柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2006年06月

一般物体認識のための単語概念の視覚性の分析
柳井啓司; Kobus Barnard
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2006年01月

Evaluation Strategies for Image Understanding and Retrieval
Keiji Yanai; Nikhil V. Shirahatti; Prasad Gabbur; Kobus Barnard
口頭発表（招待・特別）, 英語, ACM Multimedia Workshop on Multimedia Information Retrieval, Singapore, 国際会議
発表日 2005年11月

実世界画像コーパス作成のための高精度Web画像収集 ―準教師あり学習を用いた画像選択―
柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU 2005)
発表日 2005年07月

Semi-Supervised Learning を用いたWeb画像収集システム
柳井啓司
口頭発表（一般）, 日本語, 第19回人工知能学会全国大会
発表日 2005年06月

対局盤面と解説盤面の認識結果の統合による囲碁対局番組からの棋譜自動生成
林山剛久; 柳井啓司; 野下浩平
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2005年03月

Web Image Mining: Can We Gather Visual Knowledge for Image Recognition from the Web ?
Keiji Yanai
口頭発表（招待・特別）, 英語, Pacific-Rim International Conference on Multimedia, Singapore, 国際会議
発表日 2003年12月

実世界画像自動分類のためのWeb画像マイニング
柳井啓司
口頭発表（一般）, 日本語, 第17回人工知能学会全国大会(2003年度)
発表日 2003年06月

Web画像収集における単語ベクトルの導入と画像特徴の改良
柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 2003年01月

Web画像を用いた一般画像の自動分類
柳井啓司
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウム(MIRU 2002)
発表日 2002年07月

囲碁テレビ番組からの棋譜自動生成システム
林山剛久; 柳井啓司; 野下浩平
口頭発表（一般）, 日本語, 情報処理学会コンピュータビジョン・イメージメディア研究会
発表日 2002年05月

反復深化探索に基く協力詰将棋の解法
星由雄; 野下浩平; 柳井啓司
口頭発表（一般）, 日本語, 情報処理学会ゲーム情報学研究会
発表日 2001年03月

PCクラスタを用いたWWWからの高速画像収集システム
新藤雅也; 柳井啓司; 野下浩平
口頭発表（一般）, 英語, 電子情報通信学会パターン認識・メディア理解研究会報告,パターン認識・メディア理解研究会報告
発表日 2001年03月

WWWからの高速画像収集と収集画像を用いた画像認識の試み
柳井啓司
口頭発表（一般）, 日本語, 第15回人工知能学会全国大会
発表日 2001年

WWWからの大量の収集画像を用いた画像認識の試み
柳井啓司
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会報告
発表日 2001年

キーワードと画像特徴を利用したWWWからの画像収集の試み
柳井啓司
口頭発表（一般）, 日本語, 第14回人工知能学会全国大会論文集
発表日 2000年

解像度選択を用いた高解像度実画像に対する画像理解システム
柳井啓司; 出口光一郎
口頭発表（一般）, 日本語, 画像の認識・理解シンポジウムMIRU 2000論文集
発表日 2000年

マルチエージェントによる多重解像度画像理解システム
柳井啓司; 出口光一郎
口頭発表（一般）, 日本語, 第13回人工知能学会全国大会
発表日 1999年

高解像度画像利用のためのマルチエージェントによる多重解像度画像理解システム
柳井啓司; 出口光一郎
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会
発表日 1999年

マルチエージェント画像理解システムにおける対象間の空間的関係に関する考察
柳井啓司
口頭発表（一般）, 日本語, 第7回マルチ・エージェントと協調計算ワークショップ(MACC'98)
発表日 1998年

定性的モデル当てはめと空間推論による室内画像の認識
柳井啓司; 出口光一郎
口頭発表（一般）, 日本語, 電子情報通信学会パターン認識・メディア理解研究会報告
発表日 1998年

マルチエージェント物体認識システムにおける協調に関する考察
柳井啓司
口頭発表（一般）, 日本語, 第6回マルチ・エージェントと協調計算ワークショップ(MACC'97)
発表日 1997年12月

担当経験のある科目_授業

メディア情報学実験
2019年04月 - 現在
電気通信大学

情報学工房
2018年04月 - 現在
電気通信大学

画像認識システム特論
2016年04月 - 現在
電気通信大学

物体認識論
2016年04月 - 現在
電気通信大学

深層学習・画像認識
2016年04月 - 2024年03月
電気通信大学

基礎プログラミングおよび演習
2016年04月 - 2022年03月
電気通信大学

深層学習と画像認識
2020年03月 - 2020年03月
京都大学霊長類研究所

メディア情報学プログラミング演習
2016年04月
電気通信大学

所属学協会

IEEE Computer Society

人工知能学会

電子情報通信学会

情報処理学会

共同研究・競争的資金等の研究課題

深層学習を用いた能動的な新しい食事管理技術の創出
研究期間 2022年04月01日 - 2026年03月31日

文字を介した視覚的コミュニケーション基盤の創成
内田誠一; 北本朝展; 中山英樹; 牛久祥孝; 柳井啓司; 大町真一郎; 塩入諭; 黄瀬浩一; 岩村雅一; 山本和明
日本学術振興会, 科学研究費助成事業, 九州大学, 基盤研究(A), 22H00540
研究期間 2022年04月01日 - 2025年03月31日

機能の重ね合せを実現する深層学習におけるタスク融合学習
研究期間 2022年06月30日 - 2024年03月31日

質感と形状の分離による奥深質感画像分析・生成のためのマルチモーダル深層学習モデル【深奥質感】
研究期間 2021年09月01日 - 2023年03月31日

機械可読時代における文字科学の創成と応用展開【分担者】
内田誠一
研究期間 2017年07月01日 - 2022年03月31日

モバイル深層学習技術を活用した動物ロギングのためのリアルタイム行動認識
柳井啓司
日本学術振興会, 科学研究費助成事業, 電気通信大学, 新学術領域研究(研究領域提案型), 19H04929
本研究では，小型デバイス上での深層学習によるリアルタイム生物行動認識の実現のために，(1) 動物一人称映像の認識の研究，(2) タスクとデバイスに適応した小型ネットワークの探索，(3) センサ情報と画像認識を併用したリアルタイム行動認識の実現，(4) (1)～(3)の手法を統合的に用いた小型IoTデバイス上での実装，を実施する．特に1年目の2019年度は，基礎的研究として(1) 動物一人称映像の認識の研究，(2) タスクとデバイスに適応した小型ネットワークの探索を行った．これらは独立して研究できるので，並行して実施した．
(1)では，レスキュー犬の一人称映像に対して，映像，音声に加えて，レスキュー犬が着用しているサイバースーツに搭載されている各種センサーの情報も統合して，動作認識を行う手法について研究を行った．静止画の情報，映像の動き，音声，センサーの4つの情報を統合し，従来を上回る認識精度を達成した．
(2)では，ネットワーク圧縮技術であるpruningによってネットワークパラメータを減らす実験，自動的に最適なネットワーク構成を探索する技術であるNeural Architecture Search (NAS)技術によって最適なネットワークを探索する実験を実施した．また，複数タスクを同時に1つのネットワークで学習するマルチドメイン学習に関する基礎的な研究，またIoTデバイス上で，実行のみならず簡単な学習を行うことによって，環境変化に応じて動的にネットワークを更新する実験も実施した．
研究期間 2019年04月01日 - 2021年03月31日

信号変調に基づく視聴触覚の質感認識機構【多元質感知】【計画班分担者】
西田眞也
研究期間 2015年10月01日 - 2020年03月31日

自動食事診断実現のための深層学習とWeb知識を用いた食事写真カロリー量推定
研究期間 2017年 - 2020年

文字科学 ― 文字の機能の多面的解明
内田誠一; 柳井啓司; 牛久祥孝
日本学術振興会, 科学研究費助成事業, 九州大学, 基盤研究(A), 17H00736
「文字」は我々の文化的活動やコミュニケーションを支える最重要メディアである．本課題では，「言語であり画像でもある」という文字の二面性に注目しながら，文字の持つ多様な機能の本質を総合的に解析する新分野「文字科学」を推進する予定であった．特にこれまで注目されることのなかった文字の4機能（周囲の明確化，知識・意味伝達，雰囲気伝達，可読性維持）について，広汎で挑戦的かつ世界にも類例のない基礎的研究群を実施する予定であった．
こうした予定であったところ，本課題での実施内容を含んだ基盤S課題「機械可読時代における文字科学の創成と応用展開（17H06100）」が平成29年5月末日に採択されるに至った．その結果，重複制約により，本課題に係る事業は開始後2か月で廃止される運びとなった．なお，基盤S課題の実施担当者も，本課題と同じ3名であり，本課題で実施予定だった内容は，基盤S課題においても問題なく実施可能である．
本2か月間においては，代表者ならびに2名の分担者での議論により，上記の文字の4機能解明に係る課題の具体化を行った．特に，フォント自動生成，デザインとフォント形状の解明，情景内文字とその意味の関係解明，情景の画像情報が与える情報と情景内文字が与える情報の関係解明，アルファベット生成プロセスの工学的解明，等の課題について具体的な方法論を設定し，担当者を決定するとともに，データ収集・実装をスタートした．上述の通り，これらはいずれも前記基盤S課題に引き継がれることになっている．
研究期間 2017年04月01日 - 2018年03月31日

低消費電力リアルタイム画像認識実現のためのモバイル深層学習技術【生物移動情報学】
研究期間 2017年 - 2018年

単機能の重ね合せにより新機能を創発するマルチファンクショナル深層学習ネットワーク【人工知能と脳科学】
研究期間 2017年 - 2018年

実環境でロバストに動作可能な高速高精度な一般物体認識技術の開発
JST, マッチングプランナープログラム
研究期間 2015年 - 2016年

文字工学リノベーション【分担者】
内田誠一
研究期間 2014年 - 2016年

大規模位置情報画像マイニングによる画像と視覚概念の関係の地域性に関する総合的研究
研究期間 2012年 - 2015年

集合知を用いた質感認知と物体認知の関係に関する大規模分析【質感脳情報学】
研究期間 2013年 - 2014年

Webマルチメディアマイニングによる動詞概念と名詞概念およびその関係の自動学習
研究期間 2011年 - 2013年

1000クラスに対応した大規模一般画像認識システムの実現
研究期間 2008年 - 2010年

時空間情報の利用による一般物体認識の研究
研究期間 2007年 - 2008年

実世界画像自動分類のためのWorldWideWebからの画像知識の獲得
研究期間 2004年 - 2006年

ベイジアンネットワークを用いたマルチエージェント型物体認識システムの実現
研究期間 2000年 - 2001年

ゲーム木の並列探索のための分散共有ハッシュ機構の実現と評価
野下浩平; 柳井啓司; 中山泰一
日本学術振興会, 科学研究費助成事業, 電気通信大学, 基盤研究(C), 10680340
ゲーム木探索における局面表(トランスポジション表)は、局面の探索結果を表に登録し、同一または類似の局面の探索を表の参照ですませる高速化技法である。本研究では、分散的並列計算環境でゲーム木の並列探索を能率的に実行するために、計算機の間で局面表を共有する方式の設計、実現、評価、応用を行う。
ネットワークにより複数の計算機を結合した分散的非共有メモリ型並列計算環境において、ゲーム木の並列探索を行うための基本システムを設計、実現、改良した。分散した計算機の間で共有するハッシュ表により共有局面表を実現した。この共有局面表の実現法を分散共有ハッシュ法とよぶ。共有局面表の構成法として、集中型と分散型の2種類を設計、実現した。
評価のための実験対象として、オセロゲームと並列選択問題に関する応用問題をとりあげた。各問題に関する新しい理論的結果をいくつか示した。並列探索アルゴリズムを実現して、実行時間や各種オーバーヘッドを測定した。これにより、様々な側面から共有局面表の効果を調べた。各計算機の中だけの局所局面表を使う方法に対して、局所局面表に加えて共有局面表を使う方法の速度向上比を調べた。実例を用いて、並列計算の性能向上の目標に近い台数効果が実現できることを示した。逐次計算では時間がかかりすぎて解けなかった問題をいくつか並列計算により解いた。
本研究では、通信速度が遅い分散的並列計算環境において、分散共有ハッシュ法により十分良い台数効果が達成できることを実験的に示した。
研究期間 1998年 - 1999年

産業財産権

画像スタイル変換装置，画像スタイル変換方法及び画像スタイル変換プログラム
特許権, 特願2017-024688, 出願日: 2017年02月14日

線形識別器，大規模一般物体認識装置及び電子計算機
特許権, 河野憲之, 柳井啓司, 特願2014-150063, 出願日: 2014年07月23日

画像ランキング方法，プログラム及び記録媒体並びに画像表示システム
特許権, 川久保秀敏, 柳井啓司, 特願2010-109454, 出願日: 2010年05月11日, 特許第5569728号, 発行日: 2014年07月04日

摂取量推定装置，摂取量推定方法及びプログラム
特許権, 柳井啓司, 岡元晃一, 特願2014-132385, 出願日: 2014年06月27日

画像処理方法，その方法を実行するプログラム，記憶媒体，撮像機器，画像処理システム
特許権, 柳井啓司, 特願2008-106546, 出願日: 2008年04月16日, 特許第5018614号, 発行日: 2012年06月22日

メディア報道

NHK総合「NHKスペシャルエジプト悠久の王国プロローグピラミッド透視とファラオの謎」に出演
NHK, NHK総合「NHKスペシャルエジプト悠久の王国プロローグピラミッド透視とファラオの謎」
公開日 2024年03月31日

学術貢献活動

International Multimedia Modeling Conference (MMM)
学会・研究会等, 企画立案・運営等, General Co-chair, 実施期間 2025年01月

European Conference on Computer Vision (ECCV)
査読, Program Committee Member, 実施期間 2024年

ACM Multimedia
学会・研究会等, 査読, Program Committee Member, 実施期間 2024年

International Conference on Learning Representations (ICLR)
査読, 実施期間 2024年

Computer Vision and Pattern Recognition (CVPR)
査読, Area Chair, 実施期間 2024年

8th International Workshop on Multimedia Assisted Dietary Management (MADiMa)
企画立案・運営等, Co-organizer, 実施期間 2023年

ACM Multimedia
査読, 実施期間 2023年

Neural Information Processing Systems (NeurIPS)
査読, 実施期間 2023年

International Conference on Computer Vision (ICCV)
査読, 実施期間 2023年

Computer Vision and Pattern Recognition (CVPR)
査読, Program Committee Member, 実施期間 2023年

International Multimedia Modeling Conference (MMM)
学会・研究会等, 査読, 実施期間 2023年01月

Neural Information Processing Systems (NeurIPS)
学会・研究会等, 査読, 実施期間 2022年12月

ACM Multimedia Asia
学会・研究会等, 企画立案・運営等, General Co-chair, 実施期間 2022年12月

ACM Multimedia (ACMMM)
学会・研究会等, その他, 実施期間 2022年10月

7th International Workshop on Multimedia Assisted Dietary Management (MADiMa)
学会・研究会等, 企画立案・運営等, Co-organizer, 実施期間 2022年10月

画像の認識・理解シンポジウム(MIRU)
学会・研究会等, その他, 実施期間 2022年08月

Computer Vision and Pattern Recognition (CVPR)
学会・研究会等, その他, 実施期間 2022年06月

ACM International Conference on Multimedia Retrieval (ICMR)
学会・研究会等, その他, 実施期間 2022年06月

International Conference on Learning Representations (ICLR)
学会・研究会等, その他, 実施期間 2022年04月

International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP)
学会・研究会等, その他, 実施期間 2022年02月

IEEE Winter Conference on Applications of Computer Vision (WACV)
学会・研究会等, その他, 実施期間 2022年01月

International Multimedia Modeling Conference (MMM)
学会・研究会等, その他, 実施期間 2022年01月
