KEIJI YANAI
Department of Informatics | Professor
Cluster I (Informatics and Computer Engineering) | Professor
Artificial Intelligence eXploration Research Center | Professor
Researcher Information
Career
- Apr. 2015 - Present: The University of Electro-Communications, Tokyo, Department of Informatics, Professor
- Apr. 2010 - Mar. 2015: The University of Electro-Communications, Tokyo, Department of Informatics, Associate Professor
- Apr. 2006 - Mar. 2010: The University of Electro-Communications, Department of Computer Science, Associate Professor
- Oct. 1997 - Mar. 2006: The University of Electro-Communications, Department of Computer Science, Research Associate
- Nov. 2003 - Sep. 2004: The University of Arizona, Computer Science Department, Visiting Scholar
Research Activity Information
Award
- Aug. 2024
WaveFontStyler: Font Style Transfer Based on Sound
MIRU Demo Presentation Award, Kota Izumi; Keiji Yanai
- Dec. 2023
ACM Multimedia Asia
VQ-VDM: Video Diffusion Models with 3D VQGAN
Best Poster Award, Ryota Kaji; Keiji Yanai
- Jul. 2023
International Conference on Machine Vision and Applications
QAHOI: Query-based Anchors for Human-Object Interaction Detection
Best Paper Award at MVA 2023, Junwen Chen; Keiji Yanai
- Jul. 2023
Meeting on Image Recognition and Understanding (MIRU2023)
Font Style Transfer Using CLIP and a Differentiable Renderer
MIRU Interactive Presentation Award, Kota Izumi; Keiji Yanai
- Jul. 2023
Meeting on Image Recognition and Understanding (MIRU2023)
StableSeg: Zero-Shot Segmentation with Stable Diffusion
MIRU Excellence Award, Yuma Honbu; Rento Yamaguchi; Keiji Yanai
- May 2023
Zero-Shot Image Segmentation with Stable Diffusion
PRMU Research Encouragement Award, Yuma Honbu
- Mar. 2023
Zero-Shot Image Segmentation with Stable Diffusion
PRMU Monthly Best Presentation Award, Yuma Honbu
- Oct. 2022
MADiMa Best Paper Award, Shu Naritomi; Keiji Yanai
International society
- Aug. 2021
CEA Best Paper Award, Kaimu Okamoto; Kento Adachi; Keiji Yanai
International society
- Aug. 2020
Meeting on Image Recognition and Understanding (MIRU2020)
Multimodal Recipe Retrieval and Image Generation by Disentangling Semantics and Shape
MIRU Excellence Award
Japan society
- Mar. 2020
IEICE (Institute of Electronics, Information and Communication Engineers)
Estimation of Sunspot Numbers from Solar Images Using Deep Learning
IEICE Young Researcher's Award, 樋口陽光; Takumi Ege; Keiji Yanai
Japan society
- Aug. 2019
Meeting on Image Recognition and Understanding (MIRU)
Weakly Supervised Segmentation Leveraging Self-Supervised Inference of Changed Regions
MIRU Student Encouragement Award, Wataru Shimoda
Japan society
- Jun. 2018
IEICE (Institute of Electronics, Information and Communication Engineers)
ISS Distinguished Reviewer Award
Japan society
- Mar. 2018
Forum on Data Engineering and Information Management (DEIM2018)
Attribute Manipulation of Food Photos with a Conditional GAN
Student Presentation Award, Shu Naritomi
Japan society
- Mar. 2018
Forum on Data Engineering and Information Management (DEIM2018)
AR DeepCalorieCam: An AR-Based Food Calorie Estimation System
Outstanding Interactive Award, Ryosuke Tanno; Takumi Ege; Keiji Yanai
Japan society
- Aug. 2017
Meeting on Image Recognition and Understanding (MIRU)
Synthesis of Images of Multiple Fine-Grained Categories Using a Conditional GAN
MIRU Student Encouragement Award
Japan society
- Aug. 2017
Meeting on Image Recognition and Understanding (MIRU)
Unseen Style Transfer Network
MIRU Interactive Presentation Award, Ryosuke Tanno; Keiji Yanai
Japan society
- Aug. 2017
Meeting on Image Recognition and Understanding (MIRU)
Real-Time iOS Apps for Image Transformation, Object Detection and Semantic Segmentation Based on an Efficient Mobile Implementation of ConvDeconvNet
MIRU Demo Presentation Award, Ryosuke Tanno; Yuki Izumi; Keiji Yanai
Japan society
- May 2017
IPSJ SIG on Computer Vision and Image Media (CVIM)
Food Calorie Estimation from Food Images Using Recipe Information
Best Bachelor's Thesis Session Award, Takumi Ege
Japan society
- Mar. 2017
Forum on Data Engineering and Information Management (DEIM2017)
Multi Style Transfer: Real-Time Style Transfer on Mobile Devices by Arbitrarily Weighted Blending of Multiple Styles
Student Presentation Award, Ryosuke Tanno
Japan society
- Mar. 2017
Forum on Data Engineering and Information Management (DEIM2017)
Food Calorie Estimation from Food Images with a Multi-Task CNN
Student Presentation Award, Takumi Ege
Japan society
- Jan. 2017
International MultiMedia Modeling Conference (MMM)
DeepStyleCam: A Real-Time Style Transfer App on iOS
MMM Best Demo Award, Ryosuke Tanno; Keiji Yanai
International society
- Aug. 2016
Meeting on Image Recognition and Understanding (MIRU)
Style Image Retrieval Using CNN-Based Style Vector
MIRU Student Encouragement Award, Shin Matsuo
Japan society
- Aug. 2016
Meeting on Image Recognition and Understanding (MIRU)
Weakly Supervised Segmentation Using CNN Forward/Backward Propagation Values and a CRF
MIRU Student Encouragement Award, Wataru Shimoda
Japan society
- Mar. 2016
Forum on Data Engineering and Information Management (DEIM2016)
Implementation and Comparative Analysis of Deep-Learning-Based Image Recognition Systems on Mobile OSes
Outstanding Interactive Award and Student Presentation Award, Ryosuke Tanno
Japan society
- Jul. 2015
Meeting on Image Recognition and Understanding (MIRU)
A System That Helps Users Choose Delicious-Looking Compositions When Photographing Food
MIRU Interactive Presentation Award, Takao Kakimori; Makoto Okabe; Keiji Yanai; Rikio Onai
Japan society
- Jan. 2014
International MultiMedia Modeling Conference (MMM), Ireland
Best Demo Award, Yoshiyuki Kawano; Keiji Yanai
International society
Paper
- Focusing on what to decode and what to train: SOV Decoding with Specific Target Guided DeNoising and Vision Language Advisor
Junwen Chen; Yingcheng Wang; Keiji Yanai
Proc. of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Feb. 2025, Peer-reviewed - WaveFontStyler: Font Style Transfer Based on Sound
Kota Izumi; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM) (demo), Jan. 2025, Peer-reviewed - KuzushijiFontDiff: Diffusion Model for Japanese Kuzushiji Font Generation
Honghui Yuan; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM) (demo), Jan. 2025, Peer-reviewed - SceneTextStyler: Editing Text with Style Transformation
Honghui Yuan; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM) (demo), Jan. 2025, Peer-reviewed - CalorieVoL: Integrating Volumetric Context into Multimodal Large Language Models for Image-based Calorie Estimation
Hikaru Tanabe; Keiji Yanai
Proc. of the International Multimedia Modeling Conference (MMM), Jan. 2025, Peer-reviewed - KuzushijiDiffuser: Japanese Kuzushiji Font Generation with FontDiffuser
Honghui Yuan; Keiji Yanai
Proc. of International Conference on Multimedia Modeling (MMM), Jan. 2025, Peer-reviewed - Health Literacy and Internet Use Among Japanese Older Adults: A Gender-Stratified Cross-Sectional Analysis of the Moderating Effects of Neighborhood Relationships
Tsubasa Nakada; Kayo Kurotani; Satoshi Seino; Takako Kozawa; Shinichi Murota; Miki Eto; Junko Shimasawa; Yumiko Shimizu; Shinobu Tsurugano; Fuminori Katsukawa; Kazunori Sakamoto; Hironori Washizaki; Yo Ishigaki; Maki Sakamoto; Keiki Takadama; Keiji Yanai; Osamu Matsuo; Chiyoko Kameue; Hitomi Suzuki; Kazunori Ohkawara
Healthcare, Dec. 2024
Scientific journal - Exploring Cross-Attention Maps in Multi-modal Diffusion Transformers for Training-Free Semantic Segmentation
Rento Yamaguchi; Keiji Yanai
Proc. of ACCV Workshop on Large Vision-Language Model Learning and Applications (LAVA), Dec. 2024, Peer-reviewed - Vector Logo Image Synthesis Using Differentiable Renderer
Ryuta Yamakura; Keiji Yanai
Proc. of ACCV Workshop on Rich Media and Generative AI, Dec. 2024, Peer-reviewed - Calorie-Aware Food Image Editing with Image Generation Models
Kohei Yamamoto; Honghui Yuan; Keiji Yanai
International Workshop on Multimedia Assisted Dietary Management, Dec. 2024, Peer-reviewed - CalorieLLaVA: Image-based Calorie Estimation with Multimodal Large Language Models
Hikaru Tanabe; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADiMa), Dec. 2024, Peer-reviewed - Font Style Translation in Scene Text Images with CLIPstyler
Honghui Yuan; Keiji Yanai
Proc. of International Conference on Pattern Recognition (ICPR), Dec. 2024, Peer-reviewed - Act-ChatGPT: Introducing Action Features into Multi-Modal Large Language Models for Video Understanding
Yuto Nakamizo; Keiji Yanai
Proc. of International Conference on Pattern Recognition (ICPR), Dec. 2024, Peer-reviewed - RecipeSD: Injecting Recipe into Food Image Synthesis with Stable Diffusion
Jing Yang; Junwen Chen; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice (McGE), Oct. 2024, Peer-reviewed - Patent Image Retrieval Using Cross-entropy-based Metric Learning
Kotaro Higuchi; Yuma Honbu; Keiji Yanai
Proc. of International Workshop on Frontiers of Computer Vision (IW-FCV), Feb. 2023, Peer-reviewed - Zero-shot Font Style Transfer with a Differentiable Renderer
Kota Izumi; Keiji Yanai
Proc. of ACM Multimedia Asia, Dec. 2022, Peer-reviewed
International conference proceedings, English - Parallel Queries for Human-Object Interaction Detection
Junwen Chen; Keiji Yanai
Proc. of ACM Multimedia Asia, Dec. 2022, Peer-reviewed
International conference proceedings, English - SetMealAsYouLike: Sketch-based Set Meal Image Synthesis with Plate Annotations
Yuma Honbu; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2022, Peer-reviewed
International conference proceedings, English - DepthGrillCam: A Mobile Application for Real-time Eating Action Recording Using RGB-D Images
Kento Adachi; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2022, Peer-reviewed
International conference proceedings, English - Text-based Image Editing for Food Images with CLIP
Kohei Yamamoto; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2022, Peer-reviewed
International conference proceedings, English - Real Scale 3D Reconstruction of a Dish and a Plate using Implicit Function and a Single RGB-D Image
Shu Naritomi; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2022, Peer-reviewed
International conference proceedings, English - Continual Learning in Vision Transformer
Mana Takeda; Keiji Yanai
Proc. of IEEE International Conference on Image Processing (ICIP), Oct. 2022, Peer-reviewed
International conference proceedings, English - FASSD-Net: Fast and Accurate Real-Time Semantic Segmentation for Embedded Systems
L. Rosas-Arias; G. Benitez-Garcia; J. Portillo-Portillo; J. Olivares-Mercado; G. Sanchez-Perez; K. Yanai
IEEE Transactions on Intelligent Transportation Systems, IEEE, 23, 9, 14349-14360, Sep. 2022, Peer-reviewed
Scientific journal, English - StyleGAN-based CLIP-guided Image Shape Manipulation
Yuchen Qian; Kohei Yamamoto; Keiji Yanai
Proc. of International Conference on Content-based Multimedia Indexing (CBMI), Sep. 2022, Peer-reviewed
International conference proceedings, English - Unseen Food Segmentation
Yuma Honbu; Keiji Yanai
Proc. of ACM International Conference on Multimedia Retrieval (ICMR), Jun. 2022, Peer-reviewed
International conference proceedings, English - Ketchup As You Like: Drawing Editor for Foods
Shu Naritomi; Gibran Benitez-Garcia; Keiji Yanai
Proc. of IEEE Artificial Intelligence and Virtual Reality (IEEE AIVR) (demo paper), Nov. 2021, Peer-reviewed
International conference proceedings, English - Pose Sequence Generation with a GCN and an Initial Pose Generator
Kento Terauchi; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), Nov. 2021, Peer-reviewed
International conference proceedings, English - Few-Shot and Zero-Shot Semantic Segmentation for Food Images
Yuma Honbu; Keiji Yanai
Proc. of ICMR WS on Multimedia for Cooking and Eating Activities (CEA), Nov. 2021, Peer-reviewed
International conference proceedings, English - Region-Based Food Calorie Estimation for Multiple-Dish Meals
Kaimu Okamoto; Kento Adachi; Keiji Yanai
Proc. of ICMR WS on Multimedia for Cooking and Eating Activities (CEA), Nov. 2021, Peer-reviewed
International conference proceedings, English - Ketchup GAN: A New Dataset for Realistic Synthesis of Letters on Food
Gibran Benitez-Garcia; Keiji Yanai
Proc. of ICMR WS on Multimedia Artworks Analysis and Attractiveness Computing (MMArt), Nov. 2021, Peer-reviewed
International conference proceedings, English - 3D Mesh Reconstruction of Foods from a Single Image
Shu Naritomi; Keiji Yanai
Proc. of ACM Multimedia WS on AIxFood, Oct. 2021, Peer-reviewed
International conference proceedings, English - Cross-Modal Recipe Embeddings by Disentangling Recipe Contents and Dish Styles
Yu Sugiyama; Keiji Yanai
Proc. of ACM Multimedia, Oct. 2021, Peer-reviewed
International conference proceedings, English - Pop'n Food: 3D Food Model Estimation System from a Single Image
Shu Naritomi; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval, Sep. 2021, Peer-reviewed
International conference proceedings, English - A Study on Persistence of GAN-Based Vision-Induced Gustatory Manipulation
Kizashi Nakano; Daichi Horita; Norihiko Kawai; Naoya Isoyama; Nobuchika Sakata; Kiyoshi Kiyokawa; Keiji Yanai; Takuji Narumi
Electronics (Switzerland), 10, 10, 02 May 2021, Vision-induced gustatory manipulation interfaces can help people with dietary restrictions feel as if they are eating what they want by modulating the appearance of the alternative foods they are eating in reality. However, it is still unclear whether vision-induced gustatory change persists beyond a single bite, how the sensation changes over time, and how it varies among individuals from different cultural backgrounds. The present paper reports on a user study conducted to answer these questions using a generative adversarial network (GAN)-based real-time image-to-image translation system. In the user study, 16 participants were presented with somen noodles or steamed rice through a video see-through head-mounted display (HMD) in two conditions: without or with visual modulation (somen noodles and steamed rice were translated into ramen noodles and curry and rice, respectively), and brought the food to the mouth and tasted it five times with an interval of two minutes. The results of the experiments revealed that vision-induced gustatory manipulation is persistent in many participants. Their persistent gustatory changes are divided into three groups: those in which the intensity of the gustatory change gradually increased, those in which it gradually decreased, and those in which it did not fluctuate, each with about the same number of participants. Although the generalizability is limited due to the small population, it was also found that non-Japanese and male participants tended to perceive stronger gustatory manipulation compared to Japanese and female participants. We believe that our study deepens our understanding and insight into vision-induced gustatory manipulation and encourages further investigation.
Scientific journal
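As a rough illustration of the per-frame translation loop such a system needs, the sketch below pushes camera frames through a pretrained image-to-image generator. The TorchScript file name, the Translator wrapper and the webcam stand-in for the HMD camera are hypothetical assumptions, not the authors' implementation.

```python
# Minimal sketch of a real-time food-to-food translation loop, assuming a
# pretrained image-to-image generator (e.g., somen noodles -> ramen).
import cv2
import torch

class Translator:
    def __init__(self, weights="somen2ramen.pt", size=256):  # hypothetical model file
        self.G = torch.jit.load(weights).eval()  # assumed TorchScript generator
        self.size = size

    @torch.no_grad()
    def __call__(self, frame_bgr):
        # HWC uint8 BGR -> NCHW float in [-1, 1]
        img = cv2.resize(frame_bgr, (self.size, self.size))
        x = torch.from_numpy(img[:, :, ::-1].copy()).permute(2, 0, 1).float()
        x = x.unsqueeze(0) / 127.5 - 1.0
        y = self.G(x)  # translated food image, same layout
        out = ((y[0].permute(1, 2, 0) + 1.0) * 127.5).clamp(0, 255).byte().numpy()
        return cv2.resize(out[:, :, ::-1], (frame_bgr.shape[1], frame_bgr.shape[0]))

if __name__ == "__main__":
    translate = Translator()
    cap = cv2.VideoCapture(0)  # stand-in for the HMD's see-through camera
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("modulated food", translate(frame))
        if cv2.waitKey(1) == 27:  # Esc quits
            break
```

- Multi-Style Transfer Generative Adversarial Network for Text Images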
Honghui Yuan; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval, Mar. 2021, Peer-reviewed
International conference proceedings, English - Hungry Networks: 3D Mesh Reconstruction of a Dish and a Plate from a Single Dish Image for Estimating Food Volume
Shu Naritomi; Keiji Yanai
Proc. of ACM Multimedia Asia, Feb. 2021, Peer-reviewed
International conference proceedings, English - Training of Multiple and Mixed Tasks With A Single Network Using Feature Modulation
Mana Takeda; Gibran Benitez-Garcia; Keiji Yanai
Proc. of ICPR Workshop on Deep Learning for Pattern Recognition, Jan. 2021, Peer-reviewed
International conference proceedings, English - Rescue Dog Action Recognition by Integrating Ego-centric Video, Sound and Sensor Information
Yuta Ide; Tsuyohito Araki; Ryunosuke Hamada; Kazunori Ohno; Keiji Yanai
Proc. of ICPR Workshop on Applications of Egocentric Vision, Jan. 2021, Peer-reviewed
International conference proceedings, English - UEC-FoodPix Complete: A Large-scale Food Image Segmentation Dataset
Kaimu Okamoto; Keiji Yanai
Proc. of ICPR Workshop on Multimedia Assisted Dietary Management, Jan. 2021, Peer-reviewed
International conference proceedings, English - Mask-Based Style-Controlled Image Synthesis Using a Mask Style Encoder
Jaehyeong Cho; Wataru Shimoda; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition (ICPR), Jan. 2021, Peer-reviewed
International conference proceedings, English - IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition
Gibran Benitez-Garcia; Jesus Olivares-Mercado; Gabriel Sanchez-Perez; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition (ICPR), Jan. 2021, Peer-reviewed
International conference proceedings, English - Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions
Leonel Rosas-Arias; Gibran Benitez-Garcia; Jose Portillo-Portillo; Gabriel Sanchez-Perez; Keiji Yanai
Proc. of IAPR International Conference on Pattern Recognition (ICPR), Jan. 2021, Peer-reviewed
International conference proceedings, English - Food Image Generation and Translation and Its Application to Augmented Reality
Keiji Yanai; Daichi Horita; Jaehyeong Cho
Proceedings - 3rd International Conference on Multimedia Information Processing and Retrieval, MIPR 2020, Institute of Electrical and Electronics Engineers Inc., 181-186, 01 Aug. 2020, Food image recognition has drawn attention because it is an important technology to record and manage people's eating habits automatically. In addition, recently, due to the great progress of generative adversarial networks (GAN) and GAN-based image translation, we can generate food images and change the categories of the foods in food images. In addition, food-to-food image translation can be integrated with augmented reality for new eating experiences as well as food entertainment. In this paper, we introduce some works of ours on food image generation and translation and its application to augmented reality (AR). (1) Conditional food image generation. (2) Food image translation system, 'MagicalRice-Bowl' and AR-based system for new eating experience, 'DeepTaste'. (3) Interactive sketch-based ramen image generating system, 'RamenAsYouLike' and its extension with a ramen style encoder.
International conference proceedings, English - Weakly-supervised plate and food region segmentation
Wataru Shimoda; Keiji Yanai
Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, 2020, 01 Jul. 2020, In this paper, we propose a novel method to infer plate regions of food images without any pixel-wise annotation. We synthesize plate segmentation masks using differences between the visualizations of food image classifiers. Concretely, we use two types of classifiers: a food category classifier and a food/non-food classifier. Using Class Activation Mapping (CAM), which is one of the basic visualization techniques for CNNs, a food category classifier can highlight food regions containing no plate regions, while a food/non-food classifier can highlight food regions including plate regions. By taking advantage of the difference between the food regions estimated by the visualizations of the two kinds of classifiers, we demonstrate that we can estimate plate regions without any pixel-wise annotation, and we propose an approach for boosting the accuracy of weakly-supervised food segmentation using the plate segmentation. In the experiments, we show the effectiveness of the proposed approach by evaluating and comparing the accuracy of the weakly-supervised segmentation. The proposed approach certainly improved an image-level weakly-supervised segmentation method in the food domain and outperformed a well-known bounding box-level weakly-supervised segmentation method.
International conference proceedings, English
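The core idea above lends itself to a compact sketch: a food/non-food classifier's CAM covers food plus plate, a food-category classifier's CAM covers the food only, and their thresholded difference approximates the plate region. Both classifiers below are untrained torchvision stand-ins for the fine-tuned networks the paper uses, and the 0.5 thresholds are illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def cam(model, x, class_idx):
    """Class Activation Map for a ResNet-style model (GAP + single fc layer)."""
    feats = {}
    hook = model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
    with torch.no_grad():
        model(x)
        w = model.fc.weight[class_idx]                          # (C,)
        m = F.relu((w[:, None, None] * feats["a"][0]).sum(0))   # (H, W)
        m = m / (m.max() + 1e-8)
    hook.remove()
    return m

# Stand-ins for the paper's two fine-tuned classifiers (untrained here).
food_nonfood = resnet18(num_classes=2).eval()
food_category = resnet18(num_classes=100).eval()

x = torch.randn(1, 3, 224, 224)              # stand-in normalized food photo
cam_with_plate = cam(food_nonfood, x, 1)     # "food" map: food + plate
cam_food_only = cam(food_category, x, 0)     # category map: food without plate
plate_mask = (cam_with_plate > 0.5) & ~(cam_food_only > 0.5)
```

- Predicting Plate Regions for Weakly-supervised Food Image Segmentation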
Wataru Shimoda; Keiji Yanai
Proc. of IEEE International Conference on Multimedia and Expo (ICME), Jul. 2020, Peer-reviewed
International conference proceedings, English - Iconify: Converting Photographs into Icons
Takuro Karamatsu; Gibran Benitez-Garcia; Keiji Yanai; Seiichi Uchida
Proc. of ACM ICMR Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM), 7-12, Jun. 2020, Peer-reviewed
International conference proceedings, English - Style Image Retrieval for Improving Material Translation Using Neural Style Transfer
Gibran Benitez-Garcia; Wataru Shimoda; Keiji Yanai
Proc. of ACM ICMR Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM), Jun. 2020, Peer-reviewed
International conference proceedings, English - CalorieCaptorGlass: Food Calorie Estimation based on Actual Size using HoloLens and Deep Learning
Shu Naritomi; Keiji Yanai
Proc. of IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR) Demo Track, Mar. 2020, Peer-reviewed
International conference proceedings, English - Weakly Supervised Semantic Segmentation Using Distinct Class Specific Saliency Maps
Wataru Shimoda; Keiji Yanai
Computer Vision and Image Understanding, Elsevier, 191, Feb. 2020, Peer-reviewed
Scientific journal, English - Partial Image Texture Translation Using Weakly-Supervised Semantic Segmentation
Gibran Benitez-Garcia; Wataru Shimoda; Shin Matsuo; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Science and Business Media Deutschland GmbH, 12331, 387-401, 2020, The field of Neural Style Transfer (NST) has led to interesting applications that enable transforming reality as human beings perceive it. Particularly, NST for material translation aims to change the material (texture) of an object to a different material from a desired image. In order to generate more realistic results, in this paper, we propose a partial texture style transfer method by combining NST with semantic segmentation. The original NST algorithm changes the style of an entire image, including the style of the background, even though the texture is contained only in object regions. Therefore, we segment target objects using a weakly supervised segmentation method, and transfer the material of the style image to only material-based segmented areas. As a result, we achieved partial style transfer for only specific object regions, which enables us to change materials of objects in a given image as we like. Furthermore, we analyze the material translation capability of state-of-the-art image-to-image (I2I) translation algorithms, including the conventional NST method of Gatys, WCT, StarGAN, MUNIT, and DRIT++. The analysis of our experimental results suggests that the conventional NST produces more realistic results than other I2I translation methods. Moreover, there are certain materials that are easier to synthesize than others.
International conference proceedings, English - UEC at TRECVID 2014 SIN task
Keiji Yanai; Hiroyoshi Harada; Do Hang Nga
2014 TREC Video Retrieval Evaluation, TRECVID 2014, National Institute of Standards and Technology (NIST), 2020, In this paper, we describe our approach and results for the semantic indexing (SIN) task at TRECVID 2014. We submitted four runs for the SIN task of TRECVID 2014, which include one run submitted last year as a progress run for the 2014 dataset. In our best run, we used deep convolutional neural network features, Fisher Vector with dense SIFT and Fisher Vector with spatio-temporal local features as features, and combined them in the manner of late fusion. As a result, the best run achieved the mean infAP=0.1537.
International conference proceedings, English
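The late-fusion step of the run is simple enough to sketch: per-feature concept scores are normalized and combined by a weighted sum. The weights and the z-normalization below are illustrative assumptions, not the values tuned for the submission.

```python
import numpy as np

def late_fusion(score_lists, weights):
    """score_lists: list of (n_shots,) score arrays, one per feature type."""
    scores = [(s - s.mean()) / (s.std() + 1e-8) for s in score_lists]  # z-normalize
    return sum(w * s for w, s in zip(weights, scores))

rng = np.random.default_rng(0)
cnn, fv_sift, fv_st = (rng.random(5) for _ in range(3))   # stand-in concept scores
fused = late_fusion([cnn, fv_sift, fv_st], weights=[0.5, 0.3, 0.2])
ranking = np.argsort(-fused)                              # shots ranked for the concept
```

- SSA-GAN: End-to-End Time-Lapse Generation with Spatial Self-Attention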
Daichi Horita; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), Nov. 2019, Peer-reviewed
International conference proceedings, English - Pre-trained and Shared Encoder in Cycle-Consistent Adversarial Networks to Improve Image Quality
Runtong Zhang; Yuchen Wu; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), Nov. 2019, Peer-reviewed
International conference proceedings, English - Continual Learning of An Image Transformation Network Using Task-dependent Weight Selection Masks
Asato Matsumoto; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), Nov. 2019, Peer-reviewed
International conference proceedings, English - Attention Guided Unsupervised Image-to-Image Translation with Progressively Growing Strategy
Yuchen Wu; Runtong Zhang; Keiji Yanai
Proc. of ACPR Workshop on Advances and Applications on Generative Deep Learning Models (AAGM), Nov. 2019, Peer-reviewed
International conference proceedings, English - MADiMa'19: 5th International Workshop on Multimedia Assisted Dietary Management
Stavroula G. Mougiakakou; Keiji Yanai; Giovanni Maria Farinella; Dario Allegra
MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2722-2723, 15 Oct. 2019, This abstract provides a summary and overview of the 5th International Workshop on Multimedia Assisted Dietary Management.
International conference proceedings, English - Self-supervised Difference Detection for Weakly-supervised Semantic Segmentation
Wataru Shimoda; Keiji Yanai
Proc. of IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, Peer-reviewed
International conference proceedings, English - DepthCalorieCam: A Mobile Application for Volume-Based Food Calorie Estimation using Depth Cameras
Yoshikazu Ando; Takumi Ege; Jaehyeong Cho; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2019, Peer-reviewed
International conference proceedings, English - A New Large-scale Food Image Segmentation Dataset and Its Application to Food Calorie Estimation Based on Grains of Rice
Takumi Ege; Wataru Shimoda; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2019, Peer-reviewed
International conference proceedings, English - Unseen Food Creation by Mixing Existing Food Images with Conditional StyleGAN
Daichi Horita; Wataru Shimoda; Keiji Yanai
Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA), Oct. 2019, Peer-reviewed
International conference proceedings, English - Ramen as You Like: Sketch-based Food Image Generation and Editing
Jaehyeong Cho; Wataru Shimoda; Keiji Yanai
Proc. of ACM Multimedia (demo paper), Oct. 2019, Peer-reviewed
International conference proceedings, English - Zero-Annotation Plate Segmentation Using a Food Category Classifier and a Food/Non-Food Classifier
Wataru Shimoda; Keiji Yanai
Proc. of ICCV Workshop on Multi-Discipline Approach for Learning Concepts (MDALC), Oct. 2019, Peer-reviewed
International conference proceedings, English - Dog-Centric Activity Recognition by Integrating Appearance, Motion and Sound
Tsuyohito Araki; Ryunosuke Hamada; Kazunori Ohno; Keiji Yanai
Proc. of ICCV Workshop on Egocentric Perception, Interaction and Computing (EPIC), Oct. 2019, Peer-reviewed
International conference proceedings, English - Analyzing Regional Food Trends with Geo-tagged Twitter Food Photos
Kaimu Okamoto; Keiji Yanai
Proc. of International Conference on Content-Based Multimedia Indexing (CBMI), Sep. 2019, Peer-reviewed
International conference proceedings, English - Large-scale Twitter Food Photo Mining and Its Applications
Keiji Yanai; Kaimu Okamoto; Tetsuya Nagano; Daichi Horita
Proc. of International Conference on Multimedia Big Data (BIGMM), Sep. 2019, Peer-reviewed, Invited
International conference proceedings, English - Self-supervised Difference Detection for Refinement CRF and Seed Interpolation
Wataru Shimoda; Keiji Yanai
Proc. of CVPR WS on Learning from Imperfect Data, Jun. 2019, Peer-reviewed
International conference proceedings, English - Enchanting your noodles: A gustatory manipulation interface by using GAN-based real-time food-to-food translation
Kizashi Nakano; Kiyoshi Kiyokawa; Daichi Horita; Keiji Yanai; Nobuchika Sakata; Takuji Narumi
26th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2019 - Proceedings, 1339-1340, Mar. 2019, In this demonstration, we present a novel gustatory manipulation interface which utilizes the cross-modal effect of vision on taste elicited with real-time food appearance modulation using a generative adversarial network (GAN). Unlike existing systems which only change color or texture pattern of a particular type of food in an inflexible manner, our system changes the appearance of food into multiple types of food in real-time flexibly, dynamically and interactively in accordance with the deformation of the food that the user is actually eating by using GAN-based image-to-image translation. Our system can turn somen noodles into ramen noodles or fried noodles, or steamed rice into curry and rice or fried rice. Users of our demonstration system will taste what is visually presented to some extent rather than what they are actually eating.
International conference proceedings - A Large-scale Analysis of Regional Tendency of Twitter Photos Using Only Image Features
Tetsuya Nagano; Takumi Ege; Wataru Shimoda; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR), Mar. 2019, Peer-reviewed
International conference proceedings, English - Image-Based Estimation of Real Food Size for Accurate Food Calorie Estimation
Takumi Ege; Yoshikazu Ando; Ryosuke Tanno; Wataru Shimoda; Keiji Yanai
Proc. of IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR), Mar. 2019, Peer-reviewed
International conference proceedings, English - Mosquito Larvae Image Classification Based on DenseNet and Guided Grad-CAM
Zaira García; Keiji Yanai; Mariko Nakano; Antonio Arista; Laura Cleofas Sanchez; Hector Perez
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, 11868, 239-246, 2019, The surveillance of the Aedes aegypti and Aedes albopictus mosquitoes to avoid the spread of the arboviruses that cause Dengue, Zika and Chikungunya is becoming more important, because these diseases have major public-health repercussions across much of the world. Mosquito larvae identification methods require special equipment, skillful entomologists and tedious, time-consuming work. In comparison with the short mosquito lifecycle, which is less than 2 weeks, the time required for the whole surveillance process is too long. In this paper, we propose a novel technological approach based on Deep Neural Networks (DNNs) and visualization techniques to classify mosquito larvae images using the comb-like figure that appears in the eighth segment of the larva's abdomen. We present the DNN and the visualization technique employed in this work, and the results achieved after training the DNN to classify an input image into two classes: Aedes and non-Aedes mosquito. Based on the proposed scheme, we obtain the accuracy, sensitivity and specificity, and compare this performance with existing technological approaches to demonstrate that the automatic identification process offered by the proposed scheme provides better identification performance.
International conference proceedings, English - DeepTaste: Augmented Reality Gustatory Manipulation with GAN-Based Real-Time Food-to-Food Translation.
Kizashi Nakano; Daichi Horita; Nobuchika Sakata; Kiyoshi Kiyokawa; Keiji Yanai; Takuji Narumi
Proc. of IEEE International Symposium on Mixed and Augmented Reality (ISMAR), IEEE, 212-223, 2019, Peer-reviewed
International conference proceedings, English - Webly-Supervised Food Detection with Foodness Proposal
Wataru Shimoda; Keiji Yanai
IEICE Transactions on Information and Systems, IEICE, 2019, Peer-reviewed
Scientific journal, English - Simultaneous Estimation of Dish Locations and Calories with Multi-task Learning
Takumi Ege; Keiji Yanai
IEICE Transactions on Information and Systems, IEICE, 2019, Peer-reviewed
Scientific journal, English - FoodChangeLens: CNN-based Food Transformation on HoloLens
Shu Naritomi; Ryosuke Tanno; Takumi Ege; Keiji Yanai
Proc. of International Workshop on Interface and Experience Design with AI for VR/AR (DAIVAR 2018), Dec. 2018, Peer-reviewed
International conference proceedings, English - Word-Conditioned Image Style Transfer
Yu Sugiyama; Keiji Yanai
Proc. of ACCV Workshop on AI Aesthetics in Art and Media, Dec. 2018, Peer-reviewed
International conference proceedings, English - Font Style Transfer Using Neural Style Transfer and Unsupervised Cross-domain Transfer
Atsushi Narusawa; Wataru Shimoda; Keiji Yanai
Proc. of ACCV Workshop on AI Aesthetics in Art and Media, Dec. 2018, Peer-reviewed
International conference proceedings, English - Real-Time Image Classification and Transformation Apps on iOS by "Chainer2MPSNNGraph"
Yuki Izumi; Daichi Horita; Ryosuke Tanno; Keiji Yanai
Proc. of NIPS WS on Machine Learning on the Phone and other Consumer Devices (MLPCD), Dec. 2018, Peer-reviewed
International conference proceedings, English - Continual Learning for an Encoder-Decoder CNN Using "Piggyback"
Asato Matsumoto; Keiji Yanai
Proc. of NIPS Continual Learning Workshop, Dec. 2018, Peer-reviewed
International conference proceedings, English - CNN-based photo transformation for improving attractiveness of ramen photos
Daichi Horita; Jaehyeong Cho; Takumi Ege; Keiji Yanai
Proc. of ACM Symposium on Virtual Reality Software and Technology (VRST), Nov. 2018, Peer-reviewed
International conference proceedings, English - AR DeepCalorieCam V2: Food Calorie Estimation with CNN and AR-based Actual Size Estimation
Ryosuke Tanno; Takumi Ege; Keiji Yanai
Proc. of ACM Symposium on Virtual Reality Software and Technology (VRST), Nov. 2018, Peer-reviewed
International conference proceedings, English - Magical Rice Bowl: Real-time Food Category Changer
Ryosuke Tanno; Daichi Horita; Wataru Shimoda; Keiji Yanai
Proc. of ACM Multimedia, Oct. 2018, Peer-reviewed
International conference proceedings, English - Food Image Calorie Estimation by Image Retrieval Based on CNN Feature Learning
Takumi Ege; Wataru Shimoda; Keiji Yanai
IEICE Transactions on Information and Systems (Japanese Edition), J101-D, 8, 1099-1109, Aug. 2018, Peer-reviewed
Scientific journal, Japanese - Multi-task Learning of Dish Detection and Calorie Estimation
Takumi Ege; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), 53-58, Jul. 2018, Peer-reviewed
International conference proceedings, English - Food Category Transfer with Conditional Cycle GAN and a Large-scale Food Image Dataset
Daichi Horita; Ryosuke Tanno; Wataru Shimoda; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), 67-70, Jul. 2018, Peer-reviewed
International conference proceedings, English - Food Image Generation using A Large Amount of Food Images with Conditional GAN: RamenGAN and RecipeGAN
Yoshifumi Ito; Wataru Shimoda; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), 71-74, Jul. 2018, Peer-reviewed
International conference proceedings, English - Image-based food calorie estimation using recipe information
Takumi Ege; Keiji Yanai
IEICE Transactions on Information and Systems, Institute of Electronics, Information and Communication Engineers (IEICE), E101D, 5, 1333-1341, 01 May 2018, Peer-reviewed, Recently, mobile applications for recording everyday meals have drawn much attention for dietary self-management. However, most of the applications return food calorie values simply associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, estimating food calories from a food photo with practical accuracy has not yet been achieved, and it remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneous learning of food calories, categories, ingredients and cooking directions using deep learning. Since there exists a strong correlation between food calories and food categories, ingredients and cooking directions in general, we expect that simultaneous training of them brings a performance boost compared to independent single training. To this end, we use a multi-task CNN. In addition, in this research, we construct two kinds of datasets: a dataset of calorie-annotated recipes collected from Japanese recipe sites on the Web, and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs, and compared them. As a result, the multi-task CNN achieved better performance on both food category estimation and food calorie estimation than the single-task CNNs. For the Japanese recipe dataset, introducing the multi-task CNN improved the correlation coefficient by 0.039, while for the American recipe dataset it was raised by 0.090 compared to the result of the single-task CNN. In addition, we showed that the proposed multi-task CNN based method outperformed previously proposed search-based methods.
International conference proceedings, English
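The simultaneous-learning setup can be sketched as a shared backbone with one classification head and one regression head, trained with a summed loss. The backbone choice, head sizes and the loss weighting below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskFoodNet(nn.Module):
    def __init__(self, n_categories=100):
        super().__init__()
        backbone = resnet18(num_classes=1000)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.category_head = nn.Linear(512, n_categories)   # food category
        self.calorie_head = nn.Linear(512, 1)                # calorie regression

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.category_head(h), self.calorie_head(h).squeeze(1)

model = MultiTaskFoodNet()
x = torch.randn(4, 3, 224, 224)           # stand-in food photos
cat_true = torch.randint(0, 100, (4,))    # category labels
kcal_true = torch.rand(4) * 800           # calorie annotations
cat_pred, kcal_pred = model(x)
loss = nn.functional.cross_entropy(cat_pred, cat_true) \
     + 0.1 * nn.functional.mse_loss(kcal_pred, kcal_true)  # assumed task weighting
loss.backward()
```

- Font image conversion using style transfer and cross domain transfer learning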
Wataru Shimoda; Atsushi Narusawa; Keiji Yanai
Proceedings of the International Display Workshops, International Display Workshops, 3, 1410-1413, 2018, In this paper we study font generation and conversion. Previous methods dealt with characters as sets of strokes and defined stroke models for each character. In contrast, we extract features equivalent to the strokes from font images and transfer texture or patterns using deep learning. We expect that original fonts such as handwritten characters can be generated automatically by our proposed approach. In the experiments, we construct unique datasets such as a ketchup character image dataset, and improve image generation quality for character readability by combining neural style transfer with cross-domain learning.
International conference proceedings, English - AR DeepCalorieCam: An iOS App for Food Calorie Estimation with Augmented Reality
Ryosuke Tanno; Takumi Ege; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 10705, 352-356, 2018, Peer-reviewed, A food photo generally includes several kinds of food dishes. In order to recognize multiple dishes in a food photo, we need to detect each dish in a food image. Meanwhile, in recent years, the accuracy of object detection has improved drastically with the appearance of Convolutional Neural Networks (CNNs). In this demo, we present two automatic calorie estimation apps, DeepCalorieCam and AR DeepCalorieCam, running on iOS. DeepCalorieCam can estimate food calories by detecting dishes from the video stream captured by the built-in camera of an iPhone. We use YOLOv2 [1], the state-of-the-art CNN-based object detector, as a dish detector to detect each dish in a food image, and the food calorie of each detected dish is estimated by image-based food calorie estimation [2, 3]. AR DeepCalorieCam is a combination of calorie estimation and augmented reality (AR), which is an AR version of DeepCalorieCam.
International conference proceedings, English
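The demo's two-stage pipeline (detect each dish, then estimate calories per detected region and sum) can be sketched as follows. The torchvision detector stands in for YOLOv2, and calorie_net is a hypothetical regressor, not the authors' trained model.

```python
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights=None).eval()  # stand-in dish detector
calorie_net = torch.nn.Sequential(                       # hypothetical calorie regressor
    torchvision.models.resnet18(num_classes=1),
).eval()

@torch.no_grad()
def estimate_meal_calories(image, score_thresh=0.7):
    """image: (3, H, W) float tensor in [0, 1]; returns summed kcal estimate."""
    det = detector([image])[0]
    total = 0.0
    for box, score in zip(det["boxes"], det["scores"]):
        if score < score_thresh:
            continue
        x0, y0, x1, y1 = box.int().tolist()
        crop = image[:, y0:y1, x0:x1]
        crop = torch.nn.functional.interpolate(crop[None], size=(224, 224))
        total += calorie_net(crop).item()  # kcal for this detected dish
    return total

print(estimate_meal_calories(torch.rand(3, 480, 640)))
```

- An Integration of Bottom-up and Top-Down Salient Cues on RGB-D Data: Saliency from Objectness vs. Non-Objectness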
Nevrez Imamoglu; Wataru Shimoda; Chi Zhang; Yuming Fang; Asako Kanezaki; Keiji Yanai; Yoshifumi Nishida
Signal, Image and Video Processing, Springer, -, 2, 307-314, 2018, Peer-reviewed
Scientific journal, English - Predicting Segmentation "Easiness" from the Consistency for Weakly-Supervised Segmentation
Wataru Shimoda; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), Nov. 2017, Peer-reviewed
International conference proceedings, English - Estimating Food Calories for Multiple-dish Food Photos
Takumi Ege; Keiji Yanai
Proc. of Asian Conference on Pattern Recognition (ACPR), Nov. 2017, Peer-reviewed
International conference proceedings, English - Scene Text Eraser
Toshiki Nakamura; Anna Zhu; Keiji Yanai; Seiichi Uchida
Proc. of the International Conference on Document Analysis and Recognition (ICDAR), Nov. 2017, Peer-reviewed
International conference proceedings, English - Neural Font Style Transfer
Gantugs Atarsaikhan; Brian Kenji Iwana; Atsushi Narusawa; Keiji Yanai; Seiichi Uchida
Proc. of ICDAR Workshop on Machine Learning, Nov. 2017, Peer-reviewed
International conference proceedings, English - Image-Based Food Calorie Estimation Using Knowledge on Food Categories, Ingredients and Cooking Directions
Takumi Ege; Keiji Yanai
Proc. of ACM Multimedia Thematic Workshops on Understanding, Oct. 2017, Peer-reviewed
International conference proceedings, English - Comparison of Two Approaches for Direct Food Calorie Estimation
Takumi Ege; Keiji Yanai
Proc. of International Workshop on Multimedia Assisted Dietary Management (MADIMA), Sep. 2017, Peer-reviewed
International conference proceedings, English - Style Retrieval of Painting Images Using Neural Style Vectors
Shin Matsuo; Keiji Yanai
IEICE Transactions on Information and Systems (Japanese Edition), J100-D, 8, 742-749, 01 Aug. 2017, Peer-reviewed
Scientific journal, Japanese - Partial Style Transfer Using Weakly-Supervised Semantic Segmentation
Shin Matsuo; Wataru Shimoda; Keiji Yanai
Proc. of ICME Workshop on Multimedia Artworks Analysis (MMArt), Jul. 2017, Peer-reviewed
International conference proceedings, English - Learning Food Image Similarity for Food Image Retrieval
Wataru Shimoda; Keiji Yanai
Proceedings - 2017 IEEE 3rd International Conference on Multimedia Big Data, BigMM 2017, Institute of Electrical and Electronics Engineers Inc., 165-168, 30 Jun. 2017, Peer-reviewed, For food applications, recipe retrieval is an important task. However, many existing approaches rely only on text queries. Food image retrieval is related to recipe retrieval in that similar food images are expected to have similar recipes, so improving image retrieval performance is desirable for recipe retrieval. Learning similarity with a Siamese network or a triplet network is known to be an effective method for image retrieval. However, as far as we know, there is no research on food image retrieval using CNN-based similarity learning. Food recognition is known as a fine-grained recognition task, so it is unclear how effective CNN-based similarity learning methods are on food images. In our work, we trained several networks for feature similarity and evaluated their effectiveness in food image retrieval.
International conference proceedings, English
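A minimal triplet-loss training step of the kind the paper evaluates might look like the following; the margin, embedding size and backbone are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

embed = resnet18(num_classes=128)              # CNN mapping images to 128-d embeddings
criterion = nn.TripletMarginLoss(margin=0.2)
opt = torch.optim.SGD(embed.parameters(), lr=1e-3)

# Stand-in batch: anchor and positive share a food category, negative differs.
anchor, positive, negative = (torch.randn(8, 3, 224, 224) for _ in range(3))

a = nn.functional.normalize(embed(anchor))
p = nn.functional.normalize(embed(positive))
n = nn.functional.normalize(embed(negative))
loss = criterion(a, p, n)   # pull same-category pairs together, push others apart
loss.backward()
opt.step()

# Retrieval then ranks database images by distance to the query embedding.
```

- Conditional fast style transfer network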
Keiji Yanai; Ryosuke Tanno
ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, Association for Computing Machinery, Inc, 434-437, 06 Jun. 2017, Peer-reviewed, In this paper, we propose a conditional fast neural style transfer network. We extend the network proposed as a fast neural style transfer network by Johnson et al. [8] so that the network can learn multiple styles at the same time. To do that, we add a conditional input which selects a style to be transferred out of the trained styles. In addition, we show that the proposed network can mix multiple styles, although the network is trained with each of the training styles independently. The proposed network can also transfer different styles to different parts of a given image at the same time, which we call "spatial style transfer". In the experiments, we confirmed that no quality degradation occurred in the multi-style network compared to the single-style network, and linear-weighted multi-style fusion enabled us to generate various kinds of new styles which are different from the trained single styles. In addition, we also introduce a mobile implementation of the proposed network which runs at about 5 fps on an iPhone 7 Plus.
International conference proceedings, English
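The conditional input that selects or mixes styles can be sketched with condition-blended instance-normalization parameters (in the spirit of conditional instance normalization); the exact layer placement in the paper's transformer network may differ.

```python
import torch
import torch.nn as nn

class ConditionalIN(nn.Module):
    """Instance norm whose affine parameters are a condition-weighted mix of styles."""
    def __init__(self, channels, n_styles):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(n_styles, channels))
        self.beta = nn.Parameter(torch.zeros(n_styles, channels))

    def forward(self, x, cond):            # cond: (N, n_styles), rows sum to 1
        g = cond @ self.gamma              # (N, C) blended scale
        b = cond @ self.beta               # (N, C) blended shift
        return self.norm(x) * g[:, :, None, None] + b[:, :, None, None]

n_styles = 4
layer = ConditionalIN(32, n_styles)
x = torch.randn(1, 32, 64, 64)
one_hot = torch.eye(n_styles)[2:3]                 # select style #2 only
mixed = torch.tensor([[0.5, 0.0, 0.25, 0.25]])     # linear-weighted style fusion
y_single, y_mixed = layer(x, one_hot), layer(x, mixed)
```

- Simultaneous Estimation of Food Categories and Calories with Multi-task CNN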
Takumi Ege; Keiji Yanai
Proc. of IAPR International Conference on Machine Vision Applications (MVA), May 2017, Peer-reviewed
International conference proceedings, English - Twitter Photo Geo-Localization Using Both Textual and Visual Features
Shin Matsuo; Wataru Shimoda; Keiji Yanai
Proc. of the IEEE International Conference on Multimedia Big Data (BigMM), Apr. 2017, Peer-reviewed
International conference proceedings, English - Learning Food Image Embedding for Food Image Retrieval
Wataru Shimoda; Keiji Yanai
Proc. of the IEEE International Conference on Multimedia Big Data (BigMM), Apr. 2017, Peer-reviewed
International conference proceedings, English - Unseen Style Transfer Based on a Conditional Fast Style Transfer Network
Keiji Yanai
Proc. of International Conference on Learning Representations Workshop Track (ICLR WS), Apr. 2017, Peer-reviewed
International conference proceedings, English - Comparison of Two Approaches for Direct Food Calorie Estimation
Takumi Ege; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 10590, 453-461, 2017, Peer-reviewed, In this paper, we compare CNN-based estimation and search-based estimation for image-based food calorie estimation. As up-to-date direct food calorie estimation methods, we proposed a CNN-based calorie regression in [5], while Miyazaki et al. [9] proposed an image-search-based estimation method. The dataset used in the CNN-based direct estimation [5] contained 4877 images of 15 kinds of food classes, while the dataset used in the search-based work [9] consisted of 6522 images without any category information. In addition, in [9], hand-crafted features such as BoF and color histograms are used. The problems are that both datasets are small and, as far as we know, there is no work that clearly compares CNN-based and search-based estimation on the same dataset. In this work, we construct a calorie-annotated 68,774 food image dataset, and compare CNN-based estimation [5] and search-based estimation [9] on the same dataset. For the search-based estimation, we use CNN features instead of the hand-crafted features used in [9].
International conference proceedings, English - DeepStyleCam: A Real-Time Style Transfer App on iOS
Ryosuke Tanno; Shin Matsuo; Wataru Shimoda; Keiji Yanai
Multimedia Modeling, MMM 2017, Part II, Springer International Publishing AG, 10133, 446-449, 2017, Peer-reviewed, In this demo, we present a very fast CNN-based style transfer system running on normal iPhones. The proposed app can transfer multiple pre-trained styles to the video stream captured from the built-in camera of an iPhone in around 140 ms (7 fps). We extended the network proposed as a real-time neural style transfer network by Johnson et al. [1] so that the network can learn multiple styles at the same time. In addition, we modified the CNN network so that the amount of computation is reduced to one tenth of the original network. The very fast mobile implementation of the app is based on our paper [2], which describes several new ideas for implementing CNNs on mobile devices efficiently. Figure 1 shows an example usage of DeepStyleCam running on an iPhone SE.
International conference proceedings, English - Automatic Retrieval of Action Video Shots from the Web Using Density-Based Cluster Analysis and Outlier Detection
Do Hang Nga; Keiji Yanai
IEICE Transactions on Information and Systems, IEICE, E99D, 11, 2788-2795, Nov. 2016, Peer-reviewed, In this paper, we introduce a fully automatic approach to construct action datasets from noisy Web video search results. The idea is based on combining cluster structure analysis and density-based outlier detection. For a specific action concept, first, we download its top Web search videos and segment them into video shots. We then organize these shots into subsets using density-based hierarchical clustering. For each set, we rank its shots by their outlier degrees, which are determined as their isolatedness with respect to their surroundings. Finally, we collect highly ranked shots as training data for the action concept. We demonstrate that with action models trained on our data, we can obtain promising precision rates in the task of action classification while offering the advantage of fully automatic, scalable learning. Experimental results on UCF11, a challenging action dataset, show the effectiveness of our method.
Scientific journal, English
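The cluster-and-rank idea is easy to sketch with off-the-shelf tools: density-based clustering over shot descriptors, then an outlier degree from neighbor distances. The feature dimensionality, DBSCAN parameters and neighbor count below are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
shots = rng.normal(size=(200, 64))            # stand-in video-shot descriptors

labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(shots)

nn = NearestNeighbors(n_neighbors=6).fit(shots)
dists, _ = nn.kneighbors(shots)
outlier_degree = dists[:, 1:].mean(axis=1)    # mean distance to 5 nearest neighbors

keep = (labels != -1)                          # drop DBSCAN noise points
ranked = np.argsort(outlier_degree[keep])      # most "typical" shots first
training_shots = np.flatnonzero(keep)[ranked][:50]
```

- Efficient mobile implementation of a CNN-based object recognition system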
Keiji Yanai; Ryosuke Tanno; Koichi Okamoto
MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, Association for Computing Machinery, Inc, 362-366, 01 Oct. 2016, Peer-reviewed, Because of the recent progress in deep learning studies, Convolutional Neural Network (CNN) based methods have outperformed conventional object recognition methods by a large margin. However, they require much more memory and computational cost compared to the conventional methods. Therefore, it is not easy to implement a CNN-based object recognition system on a mobile device where memory and computational power are limited. In this paper, we examine CNN architectures which are suitable for mobile implementation, and propose multi-scale network-in-networks (NIN) in which users can adjust the trade-off between recognition time and accuracy. We implemented multi-threaded mobile applications on both iOS and Android employing either NEON SIMD instructions or the BLAS library for fast computation of convolutional layers, and compared them in terms of recognition time on mobile devices. As a result, it has been revealed that BLAS is better for iOS, while NEON is better for Android, and that reducing the size of an input image by resizing is very effective for speeding up CNN-based recognition.
International conference proceedings, English - Visual Event Mining from the Twitter Stream
Takamu Kaneko; Keiji Yanai
Proc. of ACM International World-Wide Web Conference (WWW), Apr. 2016, Peer-reviewed
International conference proceedings, English - Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications
Ryosuke Tanno; Keiji Yanai
Adjunct Proceedings of the 13th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MOBIQUITOUS 2016), ACM, 159-164, 2016, Peer-reviewed, In this study, we create "Caffe2C", which converts CNN (Convolutional Neural Network) models trained with the existing CNN framework Caffe into C-language source code for mobile devices. Since Caffe2C generates a single C source file which includes everything needed to execute the trained CNN, Caffe2C makes it easy to run CNN-based applications on any kind of mobile or embedded device without GPUs. Moreover, Caffe2C achieves faster execution speed compared to the existing Caffe for iOS/Android and the OpenCV iOS/Android DNN class. The reasons are as follows: (1) direct conversion of trained CNN models to C code, (2) efficient use of NEON/BLAS with multi-threading, and (3) performing pre-computation as much as possible in the computation of CNNs. In addition, in this paper, we demonstrate the usefulness of Caffe2C by showing four kinds of CNN-based object recognition mobile applications.
International conference proceedings, English - Overview of the ACM MultiMedia 2016 International Workshop on Multimedia Assisted Dietary Management
Stavroula Mougiakakou; Giovanni Maria Farinella; Keiji Yanai
MM'16: Proceedings of the 2016 ACM Multimedia Conference, ACM, 1489-1490, 2016, Peer-reviewed, This abstract provides a summary and overview of the 2nd international workshop on multimedia assisted dietary management.
International conference proceedings, English - Automatic construction of action datasets using web videos with density-based cluster analysis and outlier detection
Nga Hang Do; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 9431, 160-172, 2016, Peer-reviewed, In this paper, we introduce a fully automatic approach to construct action datasets from noisy Web video search results. The idea is based on combining cluster structure analysis and density-based outlier detection. For a specific action concept, first, we download its top Web search videos and segment them into video shots. We then organize these shots into subsets using density-based hierarchical clustering. For each set, we rank its shots by their outlier degrees, which are determined as their isolatedness with respect to their surroundings. Finally, we collect top-ranked shots as training data for the action concept. We demonstrate that with action models trained on our data, we can obtain promising precision rates in the task of action classification while offering the advantage of fully automatic, scalable learning. Experimental results on UCF11, a challenging action dataset, show the effectiveness of our method.
International conference proceedings, English - Grillcam: A real-time eating action recognition system
Koichi Okamoto; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 9517, 331-335, 2016, In this demo, we demonstrate a mobile real-time eating action recognition system, GrillCam. It continuously recognizes the user's eating actions and estimates the categories of eaten food items during mealtime. With this system, we can get to know the total amount of eaten food items, and can calculate the total calorie intake of eaten foods even for meals where the amount of food to be eaten is not decided before starting to eat. The system, implemented on a smartphone, continuously monitors eating actions during mealtime. It detects the moment when a user eats food, extracts food regions near the user's mouth and classifies them. As a prototype system, we implemented a mobile system targeting the Japanese-style meals "Yakiniku" and "Oden". It can recognize five different kinds of ingredients for each of "Yakiniku" and "Oden" in real time with classification rates of 87.7% and 80.8%, respectively. In a user study, it was evaluated as superior to a baseline system which employed no eating action recognition.
International conference proceedings, English - A system to help amateurs take pictures of delicious looking food
Takao Kakimori; Makoto Okabe; Keiji Yanai; Rikio Onai
2016 IEEE Second International Conference on Multimedia Big Data (BigMM), IEEE, 456-461, 2016, Peer-reviewed, Recently, many people have begun to take pictures of meals and food either at home or in restaurants. These pictures are then uploaded to social networking services (SNS) where they are shared with friends. People want to take pictures of food that looks delicious, but they often find this difficult. This is because most people lack the knowledge required to take attractive pictures. There are many photography techniques in use, e.g., composition [1], lighting, color, focus, etc. The techniques used to take good pictures vary depending on the subject. Amateur photographers find it difficult to choose techniques and apply them appropriately. In this paper, we consider the composition of food photographs and develop a system to support amateurs taking pictures of meals and food to make the food look delicious. Our target users are food photography amateurs. Our target photographic subjects are food items on plates or dishes. Using our system, there are four steps to food photography: 1) the user provides information about the foods to be photographed, or our system automatically recognizes these food items, with the aid of a camera on a mobile phone; 2) our system suggests a composition and camera tilt that will result in a picture that makes the food look delicious; 3) the user arranges the food and dishes on the table, and sets the camera position and tilt; 4) finally, the user takes the picture. If the user is not satisfied with the suggestion, we allow the user to design a new composition quickly and easily using their mobile phone. We performed a usability study for our system followed by a subjective evaluation of the quality of the pictures taken using our system.
International conference proceedings, English - CNN-based Style Vector for Style Image Retrieval
Shin Matsuo; Keiji Yanai
ICMR'16: Proceedings of the 2016 ACM International Conference on Multimedia Retrieval, ACM, 309-312, 2016, Peer-reviewed, In this paper, we have examined the effectiveness of the "style matrix", which is used in the works on style transfer and texture synthesis by Gatys et al. [2, 3], as an image feature in the context of image retrieval. A style matrix is represented by the Gram matrix of the feature maps in a deep convolutional neural network. We propose a style vector which is generated from a style matrix with PCA dimension reduction. In the experiments, we evaluate image retrieval performance using artistic images downloaded from Wikiarts.org regarding both artistic styles and artists. We obtained 40.64% and 70.40% average precision for style search and artist search, respectively, both of which outperformed the results by common CNN features. In addition, we found that PCA compression boosted the performance.
International conference proceedings, English
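The style vector itself reduces to a few lines: the Gram matrix of CNN feature maps, flattened and compressed with PCA. The VGG layer and the PCA dimensionality below are illustrative; the paper evaluates its own variants.

```python
import torch
import numpy as np
from sklearn.decomposition import PCA
from torchvision.models import vgg16

features = vgg16(weights=None).features[:16].eval()   # up to relu3_3 (stand-in choice)

@torch.no_grad()
def gram_vector(img):                  # img: (1, 3, H, W), normalized
    f = features(img)                  # (1, C, H, W)
    f = f.flatten(2)[0]                # (C, H*W)
    gram = f @ f.t() / f.shape[1]      # (C, C) style matrix
    return gram.flatten().numpy()

imgs = [torch.randn(1, 3, 224, 224) for _ in range(20)]   # stand-in artwork images
X = np.stack([gram_vector(im) for im in imgs])
style_vecs = PCA(n_components=10).fit_transform(X)        # compact style vectors
# Retrieval then ranks database images by distance between PCA-compressed vectors.
```

- Event photo mining from Twitter using keyword bursts and image clustering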
Takamu Kaneko; Keiji Yanai
Neurocomputing, Elsevier Science BV, 172, 143-158, Jan. 2016, Peer-reviewed, Twitter is a unique microblogging service which enables people to post and read not only short messages but also photos from anywhere. Since microblogs are different from traditional blogs in terms of timeliness and on-the-spot-ness, they include much information on various events over the world. In particular, photos posted to microblogs are useful for understanding what happens in the world visually and intuitively.
In this paper, we propose a system to discover events and related photos from the Twitter stream. We make use of "geo-photo tweets", which are tweets including both geotags and photos, in order to mine various events visually and geographically. Some works on event mining which utilize geotagged tweets have been proposed so far. However, they used no images but only textual analysis of tweet message texts. In this work, we detect events using visual information as well as textual information.
In the experiments, we analyzed 17 million geo-photo tweets posted in the United States and 3 million geo-photo tweets posted in Japan with the proposed method, and evaluated the results. We show some examples of detected events and their photos, such as "rainbow", "fireworks", "Tokyo firefly festival" and "Halloween".
Scientific journal, English - Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation
Wataru Shimoda; Keiji Yanai
COMPUTER VISION - ECCV 2016, PT IV, SPRINGER INT PUBLISHING AG, 9908, -, 218-234, 2016, Peer-reviewed, In this paper, we deal with a weakly supervised semantic segmentation problem where only training images with image-level labels are available. We propose a weakly supervised semantic segmentation method based on CNN-based class-specific saliency maps and a fully-connected CRF. To obtain distinct class-specific saliency maps which can be used as unary potentials of the CRF, we propose a novel method to estimate class saliency maps which significantly improves the method proposed by Simonyan et al. (2014) with the following changes: (1) using CNN derivatives with respect to feature maps of the intermediate convolutional layers with up-sampling instead of an input image; (2) subtracting the saliency maps of the other classes from the saliency maps of the target class to differentiate target objects from other objects; (3) aggregating multiple-scale class saliency maps to compensate for the lower resolution of the feature maps. After obtaining distinct class saliency maps, we apply the fully-connected CRF using the class maps as unary potentials. In the experiments, we show that the proposed method outperformed state-of-the-art results on the PASCAL VOC 2012 dataset under the weakly supervised setting.
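A rough sketch of improvements (1) and (2) above, under stated assumptions (PyTorch, VGG-16 as a stand-in network, an arbitrary intermediate layer); the multi-scale aggregation and the CRF step are omitted, and all indices are illustrative:

```python
# Sketch of distinct class-specific saliency: gradients w.r.t. an intermediate
# feature map, up-sampled to input size, with competing classes' maps subtracted.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

def class_saliency(x, target_cls, other_classes, layer_idx=22):
    feats = {}
    def hook(_module, _inp, out):
        out.retain_grad()                 # keep gradients of a non-leaf tensor
        feats["f"] = out
    handle = model.features[layer_idx].register_forward_hook(hook)
    scores = model(x)
    handle.remove()

    def grad_map(cls):
        model.zero_grad()
        feats["f"].grad = None
        scores[0, cls].backward(retain_graph=True)
        g = feats["f"].grad.abs().amax(dim=1, keepdim=True)  # max over channels
        return F.interpolate(g, size=x.shape[2:], mode="bilinear",
                             align_corners=False)

    saliency = grad_map(target_cls)
    for cls in other_classes:             # subtract other classes' saliency
        saliency = saliency - grad_map(cls)
    return saliency.clamp(min=0).squeeze()
```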
International conference proceedings, English - Foodness Proposal for Multiple Food Detection by Training of Single Food Images
Wataru Shimoda; Keiji Yanai
MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, ASSOC COMPUTING MACHINERY, -, -, 13-21, 2016, Peer-reviewed, We propose a CNN-based "food-ness" proposal method which requires neither pixel-wise annotation nor bounding box annotation. Several proposal methods have been proposed to detect regions with high "object-ness" so far. However, many of them generate a large number of candidates to raise the recall rate. Considering the recent advent of deeper CNNs, such methods that generate a large number of proposals have difficulty in processing time for practical use. Meanwhile, the fully convolutional network (FCN), which localizes target objects directly, was proposed. An FCN saves computational cost, although it is essentially equivalent to a sliding window search. This approach made large progress and achieved significant success in various tasks.
In this paper, we therefore propose an intermediate approach between the traditional proposal approach and the fully convolutional approach. Specifically, we propose a novel proposal method which generates high "food-ness" regions by fully convolutional networks and a back-propagation based approach, trained with food images gathered from the Web.
International conference proceedings, English - An Automatic Calorie Estimation System of Food Images on a Smartphone
Koichi Okamoto; Keiji Yanai
MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, ASSOC COMPUTING MACHINERY, -, -, 63-70, 2016, Peer-reviewed, In recent years, due to rising health consciousness about eating, many people take care of their eating habits, and some people record their daily diet regularly. To assist them, many mobile applications for recording everyday meals have been released. Some of them employ food image recognition which can estimate not only food names but also food calories. However, most of such applications have problems, especially with their usability. In this paper, we therefore propose a novel single-image-based food calorie estimation system which runs on a smartphone as a standalone application without external recognition servers. The proposed system carries out food region segmentation, food region categorization, and calorie estimation automatically. The effectiveness of the proposed system was confirmed by experiments and a user study.
International conference proceedings, English - DeepFoodCam: A DCNN-based Real-time Mobile Food Recognition System
Ryosuke Tanno; Koichi Okamoto; Keiji Yanai
MADIMA'16: PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMEDIA ASSISTED DIETARY MANAGEMENT, ASSOC COMPUTING MACHINERY, -, -, 89-89, 2016, Peer-reviewed
International conference proceedings, English - Weakly-Supervised Segmentation by Combining CNN Feature Maps and Object Saliency Maps
Wataru Shimoda; Keiji Yanai
2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE COMPUTER SOC, -, -, 1935-1940, 2016, Peer-reviewed, In general, CNN-based semantic segmentation methods assume that pixel-wise annotation is available, which is costly to obtain. On the other hand, image-level annotations are much easier to obtain than pixel-level annotations. In this work, we therefore focus on weakly-supervised semantic segmentation, which is known as the task of using training data with only image-level annotations.
In this paper, we propose a new CNN-based semantic segmentation method which uses both activation features calculated by feed-forwarding and object saliency maps obtained by back-propagation. As a CNN, we use VGG-16 pre-trained on the 1000-class ILSVRC dataset and fine-tune it with multi-label training using only an image-level labeled dataset. In the experiments, we show that the proposed method achieved state-of-the-art results on the PASCAL VOC 2012 dataset.
International conference proceedings, English - A system to support the amateurs to take a delicious-looking picture of foods
Takao Kakimori; Makoto Okabe; Keiji Yanai; Rikio Onai
SIGGRAPH Asia 2015 Mobile Graphics and Interactive Applications, SA 2015, Association for Computing Machinery, Inc, -, 02 Nov. 2015, Peer-reviewed, Recently, many people take pictures of foods at home or in restaurants, and upload the pictures to a social networking service (SNS) to share them with friends. People want to take a delicious-looking picture of foods, but it is often difficult, because most of them have no idea how to take a delicious-looking picture. There are many photography techniques for composition [Liu et al. 2010], lighting, color, focus, etc., and the techniques used to take a picture differ for different types of subjects. The problem lies in the difficulty for amateur photographers of choosing and applying appropriate ones from among so many techniques. In this paper, we pay attention to composition and develop a system to support amateurs in taking a delicious-looking picture of foods in a short time. Our target users are amateurs at food photography, and our target photographic subjects are foods on dishes. There are four steps to taking a picture using our system: 1) our system automatically recognizes foods on dishes;
2) our system suggests the composition and the camera tilt by which the user can take a delicious-looking picture;
3) the user arranges foods and dishes on the table, and sets the camera position and tilt;
4) finally, the user takes the picture.
International conference proceedings, English - Automatic Action Dataset Construction from Web using Density-based Cluster Analysis and Outlier Detection
Nga Do; Keiji Yanai
Proc. of Pacific Rim Symposium on Image and Video Technology (PSIVT), -, Nov. 2015, Peer-reviewed
International conference proceedings, English - UEC at TRECVID 2015 SIN task
Do Hang Nga; Keiji Yanai
Proc. of TRECVID Workshop, -, Nov. 2015
International conference proceedings, English - FoodCam: A real-time food recognition system on a smartphone
Yoshiyuki Kawano; Keiji Yanai
MULTIMEDIA TOOLS AND APPLICATIONS, SPRINGER, 74, 14, 5263-5287, Jul. 2015, Peer-reviewed, We propose a mobile food recognition system, FoodCam, the purposes of which are estimating the calories and nutrition of foods and recording a user's eating habits. In this paper, we propose image recognition methods which are suitable for mobile devices. The proposed method enables real-time food image recognition on a consumer smartphone. This characteristic is completely different from the existing systems, which require sending images to an image recognition server. To recognize food items, a user draws bounding boxes by touching the screen first, and then the system starts food item recognition within the indicated bounding boxes. To recognize them more accurately, we segment each food item region by GrabCut, extract image features and finally classify it into one of one hundred food categories with a linear SVM. As image features, we adopt two kinds of features: one is the combination of the standard bag-of-features and color histograms with chi-square kernel feature maps, and the other is a HOG patch descriptor and a color patch descriptor with the state-of-the-art Fisher Vector representation. In addition, the system estimates the direction of food regions where a higher SVM output score is expected to be obtained, and shows the estimated direction as an arrow on the screen in order to ask the user to move the smartphone camera. This recognition process is performed repeatedly and continuously. We implemented this system as a standalone mobile application for Android smartphones so as to use multiple CPU cores effectively for real-time recognition. In the experiments, we achieved a 79.2% classification rate for the top 5 category candidates on a 100-category food dataset with ground-truth bounding boxes when we used HOG and color patches with Fisher Vector coding as image features. In addition, we obtained a positive evaluation in a user study compared to a food recording system without object recognition.
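A minimal sketch of the Fisher Vector encoding mentioned in this abstract, assuming NumPy and scikit-learn; it keeps only the gradients with respect to the GMM means (the full encoding also includes variance terms, and the HOG/color patch extraction is omitted):

```python
# Simplified Fisher Vector encoding (mean gradients only; illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(descriptors, k=64):
    """Fit a diagonal-covariance GMM to local descriptors of shape (N, D)."""
    return GaussianMixture(n_components=k, covariance_type="diag").fit(descriptors)

def fisher_vector(gmm, desc):
    """Encode one image's local descriptors (N, D) into a (K*D,) vector."""
    q = gmm.predict_proba(desc)                      # (N, K) soft assignments
    mu, sigma, w = gmm.means_, np.sqrt(gmm.covariances_), gmm.weights_
    fv = []
    for k in range(gmm.n_components):
        diff = (desc - mu[k]) / sigma[k]             # whitened residuals
        g = (q[:, k, None] * diff).sum(axis=0)       # gradient w.r.t. mean k
        fv.append(g / (desc.shape[0] * np.sqrt(w[k])))
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)         # L2 normalization
```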
Scientific journal, English - Automatic Expansion of a Food Image Dataset Using Existing Categories and Crowdsourcing
河野憲之; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), J98-D, 4, -, Apr. 2015, Peer-reviewed
Scientific journal, Japanese - Automatic Expansion of a Food Image Dataset Leveraging Existing Categories with Domain Adaptation
Yoshiyuki Kawano; Keiji Yanai
COMPUTER VISION - ECCV 2014 WORKSHOPS, PT III, SPRINGER-VERLAG BERLIN, 8927, -, 3-17, 2015, Peer-reviewed, In this paper, we propose a novel, effective framework to expand an existing image dataset automatically, leveraging existing categories and crowdsourcing. In particular, we focus on expanding a food image dataset. The number of food categories is uncountable, since foods differ from place to place. If we have a Japanese food dataset, it does not directly help build a French food recognition system. That is why food datasets for different food cultures have been built independently so far. In this paper, we therefore propose to leverage existing knowledge on the food of other cultures through a generic "foodness" classifier and domain adaptation. This enables us not only to build other-cultured food datasets based on an original food image dataset automatically, but also to save as much crowdsourcing cost as possible. In the experiments, we show the effectiveness of the proposed method over the baselines.
International conference proceedings, English - Hand Detection and Tracking in Videos for Fine-Grained Action Recognition
Nga H. Do; Keiji Yanai
COMPUTER VISION - ACCV 2014 WORKSHOPS, PT I, SPRINGER-VERLAG BERLIN, 9008, 19-34, 2015, Peer-reviewed, In this paper, we develop an effective method of detecting and tracking hands in uncontrolled videos based on multiple cues including hand shape, skin color, upper body position and flow information. We apply our hand detection results to perform fine-grained human action recognition. We demonstrate that motion features extracted from hand areas can help classify actions even when they look familiar and are associated with visually similar objects. We validate our method of detecting and tracking hands on the VideoPose2.0 dataset and apply our method of classifying actions to the playing-instrument group of the UCF-101 dataset. Experimental results show the effectiveness of our approach.
International conference proceedings, English - テレビ番組からの位置情報付き旅行映像データベースの自動構築
向井康貴; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), J98-D, 1, Jan. 2015, Peer-reviewed
Scientific journal, Japanese - VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence
Nga H. Do; Keiji Yanai
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E98D, 1, 166-172, Jan. 2015, Peer-reviewed, In this paper, we propose a novel ranking method called VisualTextualRank, which ranks media data according to the relevance between the data and specified keywords. We apply our method to a video shot ranking system which aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action, such as "surfing wave" (sport action) or "brushing teeth" (daily activity). Top-ranked video shots are expected to be relevant to the keywords. While our baseline exploits only visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and the textual information of their corresponding videos to improve shot ranking. We validated our framework on the database which was used by the baseline. Experiments showed that our proposed ranking method, VisualTextualRank, significantly improved the performance of the video shot extraction system over the baseline.
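A minimal sketch of a random walk over a shot-tag bipartite graph in the spirit of this abstract; the affinity construction, damping and convergence details are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch: rank video shots by alternating random-walk updates over a
# shot-tag bipartite graph (illustrative simplification).
import numpy as np

def bipartite_rank(A, alpha=0.85, iters=100, tol=1e-9):
    """A: (n_shots, n_tags) non-negative shot-tag affinity matrix.
    Returns a relevance score per shot."""
    A = A + 1e-12                                # avoid all-zero rows/columns
    P = A / A.sum(axis=1, keepdims=True)         # shot -> tag transitions
    Q = A.T / A.T.sum(axis=1, keepdims=True)     # tag -> shot transitions
    n = A.shape[0]
    s = np.full(n, 1.0 / n)                      # uniform initial shot scores
    for _ in range(iters):
        t = P.T @ s                              # propagate shot scores to tags
        s_new = alpha * (Q.T @ t) + (1 - alpha) / n   # back to shots + restart
        if np.abs(s_new - s).sum() < tol:
            break
        s = s_new
    return s
```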
Scientific journal, English - FOOD IMAGE RECOGNITION USING DEEP CONVOLUTIONAL NETWORK WITH PRE-TRAINING AND FINE-TUNING
Keiji Yanai; Yoshiyuki Kawano
2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, -, -, -, 2015, Peer-reviewed, In this paper, we examined the effectiveness of deep convolutional neural networks (DCNN) for the food photo recognition task. Food recognition is a kind of fine-grained visual recognition which is a relatively harder problem than conventional image recognition. To tackle this problem, we sought the best combination of DCNN-related techniques such as pre-training with the large-scale ImageNet data, fine-tuning, and activation features extracted from the pre-trained DCNN. From the experiments, we concluded that the fine-tuned DCNN which was pre-trained with 2000 categories in ImageNet, including 1000 food-related categories, was the best method, achieving 78.77% top-1 accuracy for UEC-FOOD100 and 67.57% for UEC-FOOD256, both of which were the best results so far.
In addition, we applied the food classifier employing the best combination of the DCNN techniques to Twitter photo data. We achieved great improvements on food photo mining in terms of both the number of food photos and accuracy. In addition to its high classification accuracy, we found that the DCNN was very suitable for large-scale image data, since it takes only 0.03 seconds to classify one food photo with a GPU.
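A minimal sketch of the pre-train-then-fine-tune recipe described above, assuming PyTorch/torchvision; AlexNet stands in for the paper's DCNN, and the hyperparameters and data `loader` are placeholders:

```python
# Sketch: fine-tune an ImageNet-pre-trained network for food classification.
import torch
import torch.nn as nn
from torchvision import models

NUM_FOOD_CLASSES = 100                     # e.g., UEC-FOOD100

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(4096, NUM_FOOD_CLASSES)  # new output layer

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    """loader yields (images, labels) batches of food photos."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```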
International conference proceedings, English - A VISUAL ANALYSIS ON RECOGNIZABILITY AND DISCRIMINABILITY OF ONOMATOPOEIA WORDS WITH DCNN FEATURES
Wataru Shimoda; Keiji Yanai
2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), IEEE, -, 2015, Peer-reviewed, In this paper, we examine the relation between onomatopoeia and images using a large number of Web images. The objective of this paper is to examine whether the images corresponding to Japanese onomatopoeia words, which express the feeling of visual appearance, can be recognized by state-of-the-art visual recognition methods. In our work, we first collect the images corresponding to onomatopoeia words using a Web image search engine, and then filter out noise images to obtain a clean dataset with an automatic image re-ranking method. Next, we analyze the recognizability of various kinds of onomatopoeia images using improved Fisher Vector (IFV) and deep convolutional neural network (DCNN) features. In addition, we collect images corresponding to pairs of nouns and onomatopoeia words, and examine whether the images associated with the same nouns and different onomatopoeia words are visually discriminable or not. The experiments showed that the DCNN features extracted from layer 7 of Overfeat's network pre-trained with the ILSVRC 2013 data have a prominent ability to represent onomatopoeia images, and that most of the onomatopoeia words have visual characteristics which can be recognized.
International conference proceedings, English - A review of web image mining
Keiji Yanai
ITE Transactions on Media Technology and Applications, Institute of Image Information and Television Engineers, 3, 3, 156-169, 2015, Peer-reviewed, Invited, In this paper, we review works related to big visual data on the Web in the computer vision and multimedia research literature regarding the following points: (1) Web image acquisition for constructing visual concept databases for image/video recognition, (2) Web image application for visual concept analysis and data-driven computer graphics, and (3) real-world sensing through Web images to detect location-dependent and event-related visual information.
Scientific journal, English - CNN-Based Food Image Segmentation Without Pixel-Wise Annotation
Wataru Shimoda; Keiji Yanai
NEW TRENDS IN IMAGE ANALYSIS AND PROCESSING - ICIAP 2015 WORKSHOPS, SPRINGER-VERLAG BERLIN, 9281, 449-457, 2015, Peer-reviewed, We propose a CNN-based food image segmentation method which requires no pixel-wise annotation. The proposed method consists of food region proposals by selective search and bounding box clustering, back-propagation based saliency map estimation with a CNN model fine-tuned on the UEC-FOOD100 dataset, GrabCut guided by the estimated saliency maps, and region integration by non-maximum suppression. In the experiments, the proposed method outperformed RCNN on food region detection as well as on the PASCAL VOC detection task.
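A minimal sketch of the saliency-guided GrabCut step, assuming OpenCV; all thresholds are illustrative assumptions rather than the paper's values:

```python
# Sketch: GrabCut seeded by a saliency map instead of a user rectangle.
import cv2
import numpy as np

def saliency_grabcut(img_bgr, saliency, fg_thr=0.7, pr_fg_thr=0.3, iters=5):
    """img_bgr: (H, W, 3) uint8; saliency: (H, W) float in [0, 1]."""
    mask = np.full(img_bgr.shape[:2], cv2.GC_PR_BGD, np.uint8)
    mask[saliency > pr_fg_thr] = cv2.GC_PR_FGD    # probably foreground
    mask[saliency > fg_thr] = cv2.GC_FGD          # confident foreground
    mask[saliency < 0.05] = cv2.GC_BGD            # confident background
    bgd = np.zeros((1, 65), np.float64)           # GrabCut's internal buffers
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```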
International conference proceedings, English - Twitter Event Photo Detection Using both Geotagged Tweets and Non-geotagged Photo Tweets
Takamu Kaneko; Do Hang Nga; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT II, SPRINGER INT PUBLISHING AG, 9315, 128-138, 2015, Peer-reviewed, In this paper, we propose a system to detect event photos using geotagged tweets and non-geotagged photo tweets. In our previous work, only "geotagged photo tweets" were used for event photo detection, and their ratio to the total number of tweets is very limited. In the proposed system, we use geotagged tweets without photos for event detection, and non-geotagged photo tweets for event photo detection, in addition to geotagged photo tweets. As a result, we detected about ten times as many photo events with higher accuracy compared to the previous work.
International conference proceedings, English - Low-Bit Representation of Linear Classifier Weights for Mobile Large-Scale Image Classification
Yoshiyuki Kawano; Keiji Yanai
Proceedings 3rd IAPR Asian Conference on Pattern Recognition ACPR 2015, IEEE, -, 489-493, 2015, Peer-reviewed, In this paper, we propose an effective method to implement a large-scale visual recognition system, where the number of classes is more than 1000, on mobile devices. Because the size of memory and storage on mobile devices such as smartphones is limited, the size of an image recognition application should be as small as possible. To save the memory required for mobile visual recognition, we previously proposed a scalar-based classifier weight compression method [6]. Although it is very simple and effective, it has the drawback that the performance degrades largely in the case of lower-bit representations. In this paper, we therefore propose an improved method that makes 2-bit and 1-bit representations feasible, and conduct more comprehensive experiments, including larger-scale 10k-class image classification, combining the proposed improved scalar-based compression method with product quantization.
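A minimal sketch of scalar weight compression for a linear classifier; the quantizer design (quantile bins) is an assumption, and the point of interest is that evaluation never reconstructs the full weight vector:

```python
# Sketch: low-bit scalar compression of linear-classifier weights.
import numpy as np

def compress(w, bits=2):
    """Quantize weights to 2**bits levels; return codes and a tiny codebook."""
    levels = 2 ** bits
    edges = np.quantile(w, np.linspace(0.0, 1.0, levels + 1))
    codes = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, levels - 1)
    # Reconstruction value per bin: the mean of the weights it contains.
    codebook = np.array([w[codes == c].mean() if np.any(codes == c) else 0.0
                         for c in range(levels)])
    return codes.astype(np.uint8), codebook

def score(codes, codebook, x):
    """Evaluate w . x without decompressing: sum features per code first,
    then take one dot product with the tiny codebook."""
    partial = np.bincount(codes, weights=x, minlength=len(codebook))
    return float(partial @ codebook)
```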
International conference proceedings, English - Real-time Food Image Mining and Analysis from the Twitter Stream
Keiji Yanai; Yoshiyuki Kawano
Proc. of Pacific-Rim Conference on Multimedia, -, Dec. 2014, Peer-reviewed
International conference proceedings, English - Food Image Recognition using Deep Convolutional Features Pre-trained with Food-related Categories
Yoshiyuki Kawano; Keiji Yanai
Proc. of PCM Workshop on Multimedia Big Data Analytics (MBDA), -, Dec. 2014, Peer-reviewed, Invited
International conference proceedings, English - Object Categorization by Local Feature Matching with a Large Number of Web Images
Mizuki Akiyama; Yoshiyuki Kawano; Keiji Yanai
Proc. of PCM Workshop on Multimedia Big Data Analytics (MBDA), -, Dec. 2014, Peer-reviewed
International conference proceedings, English - An Analysis on Visual Recognizability of Onomatopoeia Using Web Images and DCNN features
Wataru Shimoda; Keiji Yanai
Proc. of PCM Workshop on Multimedia Big Data Analytics (MBDA), -, Dec. 2014, Peer-reviewed
International conference proceedings, English - FoodCam-256: A large-scale real-time mobile food recognition system employing high-dimensional features and compression of classifier weights
Yoshiyuki Kawano; Keiji Yanai
MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, Association for Computing Machinery, Inc, Demo paper, 761-762, 03 Nov. 2014, Peer-reviewed, In the demo, we demonstrate a large-scale food recognition system employing high-dimensional Fisher Vectors and linear one-vs-rest classifiers. Since all the image recognition processes are performed on a smartphone, the system does not require an external image recognition server, and runs on an ordinary smartphone in real time. The proposed system can recognize 256 kinds of food by using the UEC-Food256 food image dataset, which we built ourselves recently, as a training dataset. To implement an image recognition system employing high-dimensional features on mobile devices, we propose a linear weight compression method to save memory. In the experiments, we proved that the proposed compression method causes only a little performance loss, while reducing the amount of weight vectors to 1/8. The proposed system has not only a food recognition function but also functions for estimating food calories and nutrition and recording a user's eating habits. In the experiments with 100 kinds of food categories, we achieved a 74.4% classification rate for the top 5 category candidates. The prototype system is open to the public as an Android-based smartphone application.
International conference proceedings, English - UEC at TRECVID 2014 SIN task
Keiji Yanai; Hiroyoshi Harada; Do Hang Nga
Proc. of TRECVID Workshop, -, Nov. 2014
International conference proceedings, English - Analyzing the similarities of actions based on video clustering
Vu Gia Truong; Do Hang Nga; Keiji Yanai
Proc. of International Workshop on Modern Science and Technology (IWMST), -, Oct. 2014, Peer-reviewed
International conference proceedings, English - Automatic Summarization of Wearable Camera Videos for Generating Route Guidance Videos
岡本昌也; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), J97-D, 8, Aug. 2014
Scientific journal, Japanese - Real-time Eating Action Recognition System on a Smartphone
Koichi Okamoto; Keiji Yanai
Proc. of ICME Workshop on Mobile Multimedia Computing, -, Jul. 2014, Peer-reviewed
International conference proceedings, English - Twitter food photo mining and analysis for one hundred kinds of foods
Keiji Yanai; Yoshiyuki Kawano
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 8879, 22-32, 2014, So many people post photos as well as short messages to Twitter every minute from everywhere on the earth. By monitoring the Twitter stream, we can obtain various kinds of images with texts. In this paper, as a case study of Twitter image mining for specific kinds of photos, we describe food photo mining from the Twitter stream. To collect food photos from Twitter, we monitor the Twitter stream to find tweets containing both food-related keywords and photos, and apply a "foodness" classifier and 100-class food classifiers to them to verify whether they represent foods or not after downloading the corresponding photos. In the paper, we report the experimental results of our food photo mining on the Twitter photo data we collected over two years and four months. As a result, we detected about 470,000 food photos from Twitter. With this data, we performed a spatio-temporal analysis of food photos.
International conference proceedings, English - A cooking recipe recommendation system with visual recognition of food ingredients
Keiji Yanai; Takuma Maruyama; Yoshiyuki Kawano
International Journal of Interactive Mobile Technologies, International Association of Online Engineering, 8, 2, 28-34, 2014, Peer-reviewed, In this paper, we propose a cooking recipe recommendation system which runs on a consumer smartphone as an interactive mobile application. The proposed system employs real-time visual object recognition of food ingredients, and recommends cooking recipes related to the recognized food ingredients. Thanks to visual recognition, by only pointing the built-in camera of a smartphone at food ingredients, a user can get to know related cooking recipes instantly. The objective of the proposed system is to assist people who cook in deciding on a cooking recipe at grocery stores or in a kitchen. In the current implementation, the system can recognize 30 kinds of food ingredients in 0.15 seconds, and it achieved an 83.93% recognition rate within the top six candidates. A user study confirmed the effectiveness of the proposed system.
Scientific journal, English - Summarization of Egocentric Moving Videos for Generating Walking Route Guidance
Masaya Okamoto; Keiji Yanai
IMAGE AND VIDEO TECHNOLOGY, PSIVT 2013, SPRINGER-VERLAG BERLIN, 8333, 431-442, 2014, Peer-reviewed, In this paper, we propose a method to summarize an egocentric moving video (a video recorded by a moving wearable camera) for generating a walking route guidance video. To summarize an egocentric video, we analyze it by applying pedestrian crosswalk detection as well as ego-motion classification, and estimate an importance score for each section of the given video. Based on the estimated importance scores, we dynamically control video playing speed instead of generating a summarized video file in advance. In the experiments, we prepared an egocentric moving video dataset totaling more than one hour of video, and evaluated the crosswalk detection and ego-motion classification methods. An evaluation of the whole system by user study proved that the proposed method is much better than a simple baseline summarization method without video analysis.
International conference proceedings, English - Automatic extraction of relevant video shots of specific actions exploiting Web data
Do Hang Nga; Keiji Yanai
COMPUTER VISION AND IMAGE UNDERSTANDING, ACADEMIC PRESS INC ELSEVIER SCIENCE, 118, 2-15, Jan. 2014, Peer-reviewed, Video sharing websites have recently become a tremendous video source, which is easily accessible without any costs. This has encouraged researchers in the action recognition field to construct action databases exploiting Web sources. However, Web sources are generally too noisy to be used directly as a recognition database. Thus, building action databases from Web sources has required extensive human effort on manual selection of video parts related to specified actions. In this paper, we introduce a novel method to automatically extract video shots related to given action keywords from Web videos according to their metadata and visual features. First, we select relevant videos among tagged Web videos based on the relevance between their tags and the given keyword. After segmenting selected videos into shots, we rank these shots exploiting their visual features in order to obtain shots of interest as top-ranked shots. In particular, we propose to adopt Web images and a human pose matching method in the shot ranking step, and show that this helps to boost more relevant shots to the top. This unsupervised method of ours only requires the provision of action keywords such as "surf wave" or "bake bread" at the beginning. We have made large-scale experiments on various kinds of human actions as well as non-human actions and obtained promising results. (C) 2013 Elsevier Inc. All rights reserved.
Scientific journal, English - A dense SURF and triangulation based spatio-temporal feature for action recognition
Do Hang Nga; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8325, 1, 375-387, 2014, Peer-reviewed, In this paper, we propose a novel method of extracting spatio-temporal features from videos. Given a video, we extract its features from every set of N frames. The value of N is small enough to guarantee the temporal denseness of our features. For each frame set, we first extract dense SURF keypoints from its first frame. We then select points with the most likely dominant and reliable movements, and consider them as interest points. In the next step, we form triangles of interest points using Delaunay triangulation and track the points within each triple through the frame set. We extract one spatio-temporal feature from each triangle based on its shape feature along with the visual features and optical flows of its points. This enables us to extract spatio-temporal features based on groups of related points and their trajectories, so the features can be expected to be robust and informative. We apply Fisher Vector encoding to represent videos using the proposed spatio-temporal features. We conduct experiments on several challenging benchmarks, and show the effectiveness of our proposed method. © 2014 Springer International Publishing.
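A rough sketch of the tracking-and-triangulation skeleton described above, assuming OpenCV and SciPy; ORB stands in for SURF (which requires opencv-contrib), and the movement-based point selection and per-triangle descriptor of the paper are omitted:

```python
# Sketch: track dense keypoints over a short frame set and triangulate them.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def triangulated_tracks(frames, n_points=500):
    """frames: list of same-sized grayscale uint8 images (one frame set)."""
    orb = cv2.ORB_create(nfeatures=n_points)
    kps = orb.detect(frames[0], None)
    pts = np.float32([k.pt for k in kps]).reshape(-1, 1, 2)
    tracks = [pts]
    for prev, nxt in zip(frames, frames[1:]):     # LK optical flow per frame
        pts, status, _err = cv2.calcOpticalFlowPyrLK(prev, nxt, tracks[-1], None)
        tracks.append(pts)                        # (failed tracks not filtered)
    tri = Delaunay(tracks[0].reshape(-1, 2))      # triangles of interest points
    trajectories = np.stack([t.reshape(-1, 2) for t in tracks])  # (T, N, 2)
    return tri.simplices, trajectories
```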
International conference proceedings, English - FoodCam: A real-time mobile food recognition system employing Fisher Vector
Yoshiyuki Kawano; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8326, 2, 369-373, 2014, Peer-reviewed, In the demo, we demonstrate a mobile food recognition system with Fisher Vectors and linear one-vs-rest SVMs which enables us to record our food habits easily. In the experiments with 100 kinds of food categories, we achieved a 79.2% classification rate for the top 5 category candidates when the ground-truth bounding boxes are given. The prototype system is open to the public as an Android-based smartphone application. © 2014 Springer International Publishing.
International conference proceedings, English - Offline 1000-Class Classification on a Smartphone
Yoshiyuki Kawano; Keiji Yanai
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), IEEE, 193-194, 2014, Peer-reviewed, In this demo, we propose an offline large-scale image classification system on a smartphone. The proposed system can classify 1000 classes of objects in the ILSVRC2012 dataset in 0.270 seconds. To implement the 1000-class object classification system, we compress the weight vectors of the linear classifiers, which leads to only a slight performance loss.
International conference proceedings, English - ILSVRC on a smartphone
Yoshiyuki Kawano; Keiji Yanai
IPSJ Transactions on Computer Vision and Applications, Information Processing Society of Japan, 6, 83-87, 2014, Peer-reviewed, In this work, we propose, to the best of our knowledge, the first stand-alone large-scale image classification system running on an Android smartphone. The objective of this work is to prove that mobile large-scale image classification requires no communication with external servers. To do that, we propose a scalar-based compression method for the weight vectors of linear classifiers. As an additional characteristic, the proposed method does not need to uncompress the compressed vectors to evaluate the classifiers, which saves recognition time. We have implemented a large-scale image classification system on an Android smartphone which can perform 1000-class classification for a given image in 0.270 seconds. In the experiment, we show that compressing the weights to 1/8 led to only 0.80% performance loss for 1000-class classification with the ILSVRC2012 dataset. In addition, the experimental results indicate that weight vectors compressed in low bits, even in the binarized case (bit = 1), are still valid for classification of high-dimensional vectors.
Scientific journal, English - REAL-TIME EATING ACTION RECOGNITION SYSTEM ON A SMARTPHONE
Koichi Okamoto; Keiji Yanai
2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, ELECTRON DEVICES SOC & RELIABILITY GROUP, -, 2014, Peer-reviewed, Recently, many mobile applications for recording everyday meals for dieting have become popular. Some of them can recognize the names of food items in meals by simply taking photos. However, such image-recognition-based food recording systems require taking meal photos before eating, which is not applicable to meals where the amount of food to be eaten is not decided before eating, such as large platters for sharing and barbecue-style dishes.
In this paper, we therefore propose a mobile real-time eating action recognition system. It continuously recognizes the user's eating actions and estimates the categories of eaten food items during mealtime. With this system, we can get to know the total amount of eaten food items, and can calculate the total calories of eaten foods even for meals where the amount of food to be eaten is not decided before starting to eat.
The system, implemented on a smartphone, continuously monitors eating actions during mealtime. It detects the moment when a user eats food, extracts food regions near the user's mouth, and classifies them. In the experiments, we implemented a mobile system targeting Japanese-style "Yakiniku", where people eat meats and vegetables while grilling. It can recognize five different kinds of ingredients for "Yakiniku", such as beef, carrot and pumpkin, in real time. It achieved a 74.8% classification rate, and was evaluated in a user study as superior to a baseline system which employed no eating action recognition.
International conference proceedings, English - Food image recognition with deep convolutional features
Yoshiyuki Kawano; Keiji Yanai
UbiComp 2014 - Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Association for Computing Machinery, Inc, -, 589-593, 2014, Peer-reviewed, In this paper, we report that the features obtained from a deep convolutional neural network boost food recognition accuracy greatly when integrated with conventional hand-crafted image features, namely Fisher Vectors with HOG and color patches. In the experiments, we achieved 72.26% top-1 accuracy and 92.00% top-5 accuracy on the 100-class food dataset UEC-FOOD100, which greatly outperforms the best classification accuracy reported for this dataset so far, 59.6%.
International conference proceedings, English - Real-time Photo Mining from the Twitter Stream: Event Photo Discovery and Food Photo Detection
Keiji Yanai; Takamu Kaneko; Yoshiyuki Kawano
2014 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), IEEE, 295-302, 2014, Invited, So many people are posting photos as well as short messages to Twitter every minute from everywhere on the earth. By monitoring the Twitter stream, we can obtain various kinds of photos with texts. In this paper, as case studies of real-time Twitter photo mining, we introduce our ongoing projects on event photo discovery and food photo mining from the Twitter stream.
International conference proceedings, English - Use of Co-occurrence Relations in Recognition of Food Images Containing Multiple Items
松田裕司; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J96-D, 8, 1724-1730, Aug. 2013, Peer-reviewed, In general, a meal often contains multiple dishes, and combinations among them can be assumed to exist. In this paper, we propose a method that takes co-occurrence relations between dishes into account when recognizing images containing multiple dishes. The proposed method obtains final scores by re-ranking SVM scores with manifold ranking based on co-occurrence probabilities. The co-occurrence probabilities were estimated from co-occurrence frequencies in our database and from texts on the Web. In the experiments, we classified images containing multiple items into 100 dish categories; when ten candidates were presented and co-occurrence probabilities were estimated from the database, the classification rate improved by 8.8 points over the conventional method without co-occurrence, reaching 64.6%. This shows that exploiting co-occurrence relations between dishes is effective for recognizing food images containing multiple items.
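A minimal sketch of the manifold ranking step described above, using the standard closed-form solution of Zhou et al.; the construction of the co-occurrence affinity matrix is an illustrative assumption:

```python
# Sketch: re-rank SVM scores with manifold ranking over dish co-occurrence.
import numpy as np

def manifold_rank(W, y, alpha=0.5):
    """W: (K, K) symmetric dish co-occurrence affinity; y: (K,) SVM scores.
    Returns f* = (I - alpha * S)^-1 y, with S = D^-1/2 W D^-1/2."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))      # symmetric normalization
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)
```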
Scientific journal, Japanese - Real-time Mobile Food Recognition System
Yoshiyuki Kawano; Keiji Yanai
Proc. of IEEE CVPR International Workshop on Mobile Vision, -, Jun. 2013, Peer-reviewed
International conference proceedings, English - UEC, Tokyo at MediaEval 2013 retrieving diverse social images task
Keiji Yanai; Do Hang Nga
CEUR Workshop Proceedings, CEUR-WS, 1043, 2013, In this paper, we describe our method and results for the MediaEval 2013 Retrieving Diverse Social Images Task. To accomplish the task objective, we adopt VisualRank [5] and Ranking with Sink Points [2], which are common methods for selecting representative and diverse photos. To obtain an affinity matrix for both ranking methods, we used only the officially provided features, including visual features and tag features. We submitted the three required runs: a visual-feature-only run, a textual-feature-only run, and a textual-visual fused-feature run.
International conference proceedings, English - Real-time Mobile Food Recognition System
Yoshiyuki Kawano; Keiji Yanai
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), IEEE, 1-7, 2013, Peer-reviewed, We propose a mobile food recognition system, the purposes of which are estimating the calories and nutrition of foods and recording a user's eating habits. Since all the image recognition processes are performed on a smartphone, the system does not need to send images to a server, and runs on an ordinary smartphone in real time.
To recognize food items, a user draws bounding boxes by touching the screen first, and then the system starts food item recognition within the indicated bounding boxes. To recognize them more accurately, we segment each food item region by GrabCut, extract a color histogram and SURF-based bag-of-features, and finally classify it into one of fifty food categories with a linear SVM and a fast chi-square kernel. In addition, the system estimates the direction of food regions where a higher SVM output score is expected to be obtained, and shows it as an arrow on the screen in order to ask the user to move the smartphone camera. This recognition process is performed repeatedly, about once a second. We implemented this system as an Android smartphone application so as to use multiple CPU cores effectively for real-time recognition.
In the experiments, we achieved an 81.55% classification rate for the top 5 category candidates when the ground-truth bounding boxes are given. In addition, we obtained a positive evaluation in a user study compared to a food recording system without object recognition.
International conference proceedings, English - Visual analysis of tag co-occurrence on nouns and adjectives
Yuya Kohara; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7732, 1, 47-57, 2013, Peer-reviewed, In recent years, due to the wide spread of photo sharing Web sites such as Flickr and Picasa, we can put our own photos on the Web and show them to the public easily. To make photos easy to search for, it is common to add several keywords, called "tags", when we upload photos. However, most of the tags are added one by one independently, without much consideration of the associations between them. In this paper, as a preparation for realizing simultaneous recognition of nouns and adjectives, we examine the visual relationship between tags, particularly noun tags and adjective tags, by analyzing image features of a large number of tagged photos on social media sites with mutual information. As a result, it turned out that the mutual information between some nouns such as "car" and "sea" and color-related adjectives such as "red" and "blue" was relatively high, which showed that their relations were stronger. © Springer-Verlag 2013.
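For reference, a sketch of the mutual-information computation itself; this is a simplification over binary tag-presence counts (the paper analyzes image features), and all counts in the usage line are made up:

```python
# Sketch: mutual information between a noun tag and an adjective tag
# from co-occurrence counts over a photo collection.
import numpy as np

def tag_mutual_information(n_both, n_noun, n_adj, n_total):
    """MI (bits) between two binary tag-presence variables."""
    joint = np.array([[n_both, n_noun - n_both],
                      [n_adj - n_both, n_total - n_noun - n_adj + n_both]],
                     dtype=float) / n_total
    px = joint.sum(axis=1, keepdims=True)        # noun present / absent
    py = joint.sum(axis=0, keepdims=True)        # adjective present / absent
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px * py))
    return float(np.nansum(terms))               # 0*log0 terms dropped

# Hypothetical counts: tag_mutual_information(1200, 5000, 3000, 100000)
```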
International conference proceedings, English - VISUAL EVENT MINING FROM GEO-TWEET PHOTOS
Takamu Kaneko; Keiji Yanai
ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, -, 2013, Peer-reviewed, In this paper, we propose a system to mine events visually from the Twitter stream by making use of "geo-tweet photos", which are tweets including both geotags and photos. Some works on event mining which utilize geotagged tweets have been proposed so far; however, they used no images, only textual analysis of tweet texts. In this work, we detect events using visual information as well as textual information. In the experiments, we show some examples of detected events and their photos, such as "rainbow", "fireworks" and "Tokyo firefly festival".
International conference proceedings, English - [DEMO PAPER] MIRURECIPE: A MOBILE COOKING RECIPE RECOMMENDATION SYSTEM WITH FOOD INGREDIENT RECOGNITION
Yoshiyuki Kawano; Takanori Sato; Takuma Maruyama; Keiji Yanai
ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, (demo paper), 2013, Peer-reviewed, In this demo, we demonstrate a cooking recipe recommendation system which runs on a consumer smartphone. The proposed system carries out object recognition of food ingredients in real time, and recommends cooking recipes related to the recognized food ingredients. By only pointing the built-in camera of a mobile device at food ingredients, the user can obtain a recipe list instantly. The objective of the proposed system is to assist people who cook in deciding on a cooking recipe at grocery stores or in a kitchen. In the current implementation, the system can recognize 30 kinds of food ingredients in 0.15 seconds, and it achieved an 83.93% recognition rate within the top six candidates.
International conference proceedings, English - Twitter visual event mining system
Takamu Kaneko; Hiroyoshi Harada; Keiji Yanai
Electronic Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2013, (demo paper), 2013, Peer-reviewed, In this demo, we demonstrate a system to mine events visually from the Twitter stream by making use of 'geo-tweet photos'. Some works on event mining which utilize geotagged tweets have been proposed so far; however, they used no images, only textual analysis of tweet texts. In this work, we detect events using visual information as well as textual information, which is, as far as we know, the first work to mine event photos automatically from a huge number of Twitter photos. In the experiments, we show some examples of detected events and their photos, such as 'blooming cherry blossom' and 'Tokyo firefly festival'. © 2013 IEEE.
International conference proceedings, English - Large-scale web video shot ranking based on visual features and tag co-occurrence
Do Hang Nga; Keiji Yanai
MM 2013 - Proceedings of the 2013 ACM Multimedia Conference, 525-528, 2013, Peer-reviewed, In this paper, we propose a novel ranking method, VisualTextualRank, which extends [1] and [2]. Our method is based on a random walk over a bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and the textual information of their corresponding videos to improve shot ranking. We apply our proposed method to a system for automatically extracting relevant video shots of specific actions from Web videos [3]. Based on our experimental results, we demonstrate that our ranking method can improve the performance of video shot retrieval. Copyright © 2013 ACM.
International conference proceedings, English - Rapid mobile object recognition using fisher vector
Yoshiyuki Kawano; Keiji Yanai
Proceedings - 2nd IAPR Asian Conference on Pattern Recognition, ACPR 2013, IEEE Computer Society, 476-480, 2013, Peer-reviewed, We propose a real-time object recognition method for a smartphone which consists of light-weight local features, Fisher Vectors and a linear SVM. As light local descriptors, we adopt a HOG patch descriptor and a color patch descriptor, and sample them densely from an image. We then encode them with the Fisher Vector representation, which can greatly reduce the number of required visual words. As a classifier, we use a linear SVM, the computational cost of which is very low. In the experiments, we achieved a 79.2% classification rate for the top 5 category candidates on a 100-category food dataset. This outperformed the results of a conventional bag-of-features representation with a chi-square-RBF-kernel-based SVM. Moreover, food recognition takes only 0.065 seconds, which is four times faster than the existing work. © 2013 IEEE.
International conference proceedings, English - A Spatio-Temporal Feature based on Triangulation of Dense SURF
Do Hang Nga; Keiji Yanai
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), IEEE, 420-427, 2013, Peer-reviewed, In this paper, we propose a spatio-temporal feature which is based on the appearance and movement of SURF interest keypoints. Given a video, we extract its spatio-temporal features from every small set of frames. For each frame set, we first extract dense SURF keypoints from its first frame and estimate their optical flows at each frame. We then detect camera motion and compensate the flow vectors if camera motion exists. Next, we select interest points based on their movement-based relationships through the frame set. We then apply Delaunay triangulation to form triangles of the selected points. From each triangle we extract its shape feature along with trajectory-based visual features of its points. We show that concatenating these features with the SURF feature forms a spatio-temporal feature which is comparable to the state of the art. Our proposed spatio-temporal feature is expected to be robust and informative, since it is based not on characteristics of individual points but on groups of related interest points. We apply Fisher Vector encoding to represent videos using the proposed feature. We conduct various experiments on UCF101, the largest action dataset of realistic videos to date, and show the effectiveness of our proposed method.
International conference proceedings, English - UEC at TRECVID 2012 SIN and MED task
Kazuya Hizume; Keiji Yanai
Proc. of TRECVID Workshop, -, Nov. 2012
International conference proceedings, English - Entropy-Based Analysis of Visual and Geolocation Concepts in Images
Keiji Yanai; Hidetoshi Kawakubo; Kobus Barnard
Multimedia Information Extraction: Advances in Video, Audio, and Imagery Analysis for Search, Data Mining, Surveillance, and Authoring, John Wiley and Sons, 63-80, 24 Aug. 2012, Peer-reviewed
In book, English - Multiple-Food Image Recognition Based on Candidate Region Detection
松田裕司; 甫足創; 柳井啓司
IEICE Transactions on Information and Systems (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J95-D, 8, 1554-1564, Aug. 2012, Peer-reviewed, In this study, to record meal content with little effort, we built a recognition engine that uses image recognition to present candidate names of the dishes estimated to be contained in an image. We previously built a recognition engine that trains and classifies by integrating multiple features, such as color features and local region features, with Multiple Kernel Learning. In this study, we improve that food image recognition method: using fast sliding-window search, region segmentation and circle detection, we estimate candidate dish locations in an image and apply the conventional classification to those regions, which makes it possible to handle images containing multiple dishes. In the experiments, we evaluated classification performance over 100 dish categories. As a result, when combining multiple region detectors and presenting ten candidates, the classification rate reached 69.6% for images containing a single item (a 5.4-point improvement over the conventional method) and 55.5% for images containing multiple items (a 40.1-point improvement), showing that the proposed method is particularly effective for recognizing food images containing multiple items.
Scientific journal, Japanese - Visual Analysis on Relations between Nouns and Adjectives Using a Large Number of Web Images
Yuuya Kohara; Keiji Yanai
Proc. of International Workshop on Modern Science and Technology (IWMST), -, Aug. 2012
International conference proceedings, English - Multiple-Food Recognition Considering Co-occurrence Employing Manifold Ranking
Yuji Matsuda; Keiji Yanai
2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), IEEE, 2017-2020, 2012, Peer-reviewed, In this paper, we propose a method to recognize food images which include multiple food items, considering co-occurrence statistics of food items. The proposed method employs a manifold ranking method which has been applied successfully to image retrieval in the literature. In the experiments, we prepared co-occurrence matrices of 100 food items using various kinds of data sources, including Web texts, Web food blogs and our own food database, and evaluated the final results obtained by applying manifold ranking. As a result, it was shown that co-occurrence statistics obtained from a food photo database are very helpful for improving the classification rate within the top ten candidates.
International conference proceedings, English - A SURF-based spatio-temporal feature for feature-fusion-based action recognition
Akitsugu Noguchi; Keiji Yanai
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6553, 1, 153-167, 2012, Peer-reviewed, In this paper, we propose a novel spatio-temporal feature which is useful for feature-fusion-based action recognition with Multiple Kernel Learning (MKL). The proposed spatio-temporal feature is based on moving SURF interest points grouped by Delaunay triangulation and on their motion over time. Since this local spatio-temporal feature has different characteristics from holistic appearance features and motion features, it can boost action recognition performance for both controlled videos such as the KTH dataset and uncontrolled videos such as the YouTube dataset, by combining it with visual and motion features with MKL. In the experiments, we evaluate our method using the KTH dataset and the YouTube dataset. As a result, we obtain a 94.5% classification rate for the KTH dataset, which is almost equivalent to the state of the art, and 80.4% for the YouTube dataset, which greatly outperforms the state of the art. © 2012 Springer-Verlag.
International conference proceedings, English - World Seer : A realtime geo-tweet photo mapping system
Keiji Yanai
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012, (demo paper), 2012, Peer-reviewed, Twitter is a unique microblog which is different from conventional social media in terms of its quickness. Many Twitter users send messages to Twitter on the spot with mobile phones or smartphones, and some of them send tweets with photos and geotags, which can be regarded as geotagged photos. Geotagged tweet photos are very useful for understanding what is currently happening over the world. In this demo, we introduce "World Seer", a real-time geo-tweet photo mapping system. Users can see the latest geo-tweet photos related to given keywords and areas on online maps. The system shows geo-tweet photos not only on the map, but also on the street view. In addition, for some of the geo-tweet photos, the system can show representative photos for given locations and times employing the GeoVisualRank method, which takes into account both visual features of photos and proximity of geotags. Copyright © 2012 ACM.
International conference proceedings, English - Automatic collection of web video shots corresponding to specific actions using web images
Do Hang Nga; Keiji Yanai
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 15-20, 2012, Peer-reviewed, In this paper, we apply Web images to the problem of automatically extracting video shots corresponding to specific actions from Web videos. Our framework modifies the unsupervised method for automatic collection of Web video shots corresponding to given actions which we proposed last year [9]. For each action, following that work, we first exploit tag relevance to gather the 200 most relevant videos of the given action and segment each video into several shots. The shots are then converted into bags of spatio-temporal features and ranked by the VisualRank method. We refine the approach by introducing Web action images into the shot ranking step. We select images by applying Poselets [2] to detect humans in the case of human actions. We test our framework on 28 human action categories whose precision values were 20% or below and 8 non-human action categories whose precision values were less than 15% in [9]. The results show that our model can improve the precision by approximately 6% over the 28 human action categories and 16% over the 8 non-human action categories. © 2012 IEEE.
International conference proceedings, English - Recognition of multiple-food images by detecting candidate regions
Yuji Matsuda; Hajime Hoashi; Keiji Yanai
Proceedings - IEEE International Conference on Multimedia and Expo, 25-30, 2012, Peer-reviewed, In this paper, we propose a two-step method to recognize multiple-food images by detecting candidate regions with several methods and classifying them with various kinds of features. In the first step, we detect several candidate regions by fusing the outputs of several region detectors, including Felzenszwalb's deformable part model (DPM) [1], a circle detector and the JSEG region segmentation. In the second step, we apply a feature-fusion-based food recognition method to the bounding boxes of the candidate regions with various kinds of visual features, including bag-of-features of SIFT and CSIFT with spatial pyramid (SP-BoF), histogram of oriented gradients (HoG), and Gabor texture features. In the experiments, we estimated ten food candidates for multiple-food images in descending order of the confidence scores. As a result, we achieved a 55.8% classification rate, which improved the baseline result obtained using only DPM by 14.3 points, on a multiple-food image dataset. This demonstrates that the proposed two-step method is effective for recognition of multiple-food images. © 2012 IEEE.
International conference proceedings, English - VISUALIZATION OF REAL-WORLD EVENTS WITH GEOTAGGED TWEET PHOTOS
Yusuke Nakaji; Keiji Yanai
2012 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), IEEE, 272-277, 2012, Recently, microblogs such as Twitter, which enable people to post and read short messages from anywhere, have become very common. Since microblogs differ from traditional blogs in being instant and on the spot, they include much more information on the various events happening around the world. In addition, some of the messages posted to Twitter include photos and geotags as well as text. From them, we can intuitively grasp what is happening and where.
We therefore propose a method to select photos related to given real-world events from geotagged Twitter messages (tweets), taking advantage of both geotags and the visual features of photos. We implemented a system which can visualize real-world events on an online map.
International conference proceedings, English - Real-time mobile recipe recommendation system using food ingredient recognition
Takuma Maruyama; Yoshiyuki Kawano; Keiji Yanai
IMMPD 2012 - Proceedings of the 2012 ACM Workshop on Interactive Multimedia on Mobile and Portable Devices, Co-located with ACM Multimedia 2012, 27-33, 2012, Peer-reviewed, In this paper, we propose a mobile cooking recipe recommendation system employing object recognition for food ingredients such as vegetables and meats. The proposed system carries out object recognition of food ingredients in real time on an Android-based smartphone, and recommends cooking recipes related to the recognized food ingredients. By simply pointing the built-in camera of a mobile device at food ingredients, the user can obtain a recipe list instantly. As the object recognition method, we adopt bag-of-features with SURF and color histograms extracted from multiple images as image features, and a linear SVM with the one-vs-rest strategy as a classifier. We built a short-video database of 30 kinds of food ingredients for the experiments. With this database, we achieved an 83.93% recognition rate within the top six candidates. In the experiment, we also conducted a user study comparing mobile recipe recommendation systems with and without ingredient recognition. © 2012 ACM.
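For illustration, here is a minimal Python sketch of the classification step described above: a linear SVM with the one-vs-rest strategy over fused bag-of-features and color-histogram vectors, returning the top six candidates. The feature values, dimensions and class count are synthetic stand-ins, not the paper's SURF-based pipeline.

```python
# One-vs-rest linear SVM over fused BoF + color-histogram features (sketch).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_classes, dim = 30, 500 + 64            # e.g. 500-word BoF codebook + 64-bin color histogram
X_train = rng.random((n_classes * 20, dim))
y_train = np.repeat(np.arange(n_classes), 20)

clf = LinearSVC().fit(X_train, y_train)  # LinearSVC is one-vs-rest by default

x = rng.random((1, dim))                 # features of the ingredient seen by the camera
scores = clf.decision_function(x)[0]
top6 = np.argsort(scores)[::-1][:6]      # present the six most likely ingredients
print("top-6 candidate ingredient IDs:", top6)
```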
International conference proceedings, English - A Travel Planning System Based on Travel Trajectories Extracted from a Large Number of Geotagged Photos on the Web
Kohya Okuyama; Keiji Yanai
Proc. of Pacific-Rim Conference on Multimedia, -, Dec. 2011, Peer-reviewed
International conference proceedings, English - UEC at TRECVID 2011 SIN and MED task
Kazuya Hizume; Keiji Yanai
Proc. of TRECVID Workshop, -, Nov. 2011
International conference proceedings - Folksonomyを用いた画像特徴とタグ共起に基づく画像オントロジーの自動構築
秋間雄太; 川久保秀敏; 柳井啓司
電子情報通信学会論文誌D, The Institute of Electronics, Information and Communication Engineers, J94-D, 8, 1248-1259, Aug. 2011, In recent years, with the emergence of Folksonomy, semantic value has been added to databases through tags, but few databases incorporate relations between concepts such as hierarchical structure. In this work, we propose a method for building an image database that takes semantic hierarchy into account. After removing noise from a large amount of image data for each concept, we represent each concept with three kinds of vectors (visual features, tags, and a combination of both), estimate the distances between concepts using JS divergence as the distance measure, and infer superordinate/subordinate relations from concept entropy, which reflects the breadth of each concept. We then compare the hierarchies built from visual features only, from tags only, and from the combined representation. The visual and tag hierarchies each exhibited characteristic structures, and the combined hierarchy successfully produced a new structure incorporating the characteristics of both. The constructed hierarchy includes relations between concepts that are difficult to discover manually, which suggests its usefulness for image retrieval.
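The two quantities driving the hierarchy construction above are easy to state in code. Below is a minimal sketch, with synthetic concept vectors standing in for the visual/tag/combined representations built in the paper:

```python
# Jensen-Shannon divergence as the distance between two concept vectors
# (treated as discrete distributions), and Shannon entropy as a proxy for
# concept "breadth" (the broader concept is placed higher in the hierarchy).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def js_divergence(p, q):
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

rng = np.random.default_rng(0)
p = rng.random(100); p /= p.sum()        # concept vector of concept A (synthetic)
q = rng.random(100); q /= q.sum()        # concept vector of concept B (synthetic)
print("distance JS(A, B):", js_divergence(p, q))
print("A broader than B (A above B)?", entropy(p) > entropy(q))
```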
Scientific journal, Japanese - Geotagged Image Recognition by Combining Three Different Kinds of Geo location Features
Keita Yaegashi; Keiji Yanai
COMPUTER VISION - ACCV 2010, PT II, SPRINGER-VERLAG BERLIN, 6493, 360-373, 2011, Peer-reviewed, Scenes and objects represented in photos have a causal relationship to the places where they are taken. In this paper, we propose using geo-information such as aerial photos and location-related texts as features for geotagged image recognition, and fusing them with Multiple Kernel Learning (MKL). Through experiments, we verified the possibility of reflecting location context in image recognition by evaluating not only recognition rates but also the feature fusion weights estimated by MKL. As a result, the mean average precision (MAP) for 28 categories increased to 80.87% with the proposed method, compared with 77.71% for the baseline. In particular, for categories related to location-dependent concepts, MAP improved by 6.57 points.
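The fusion step can be pictured as a weighted sum of per-feature kernels. A minimal sketch follows; real MKL learns the weights jointly with the SVM, whereas here they are fixed by hand, and both feature matrices are synthetic:

```python
# Combined kernel K = beta1 * K_visual + beta2 * K_aerial fed to an SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_vis = rng.random((100, 64))    # visual features of the photos (synthetic)
X_geo = rng.random((100, 32))    # aerial-photo features around the geotags (synthetic)
y = rng.integers(0, 2, 100)

beta = (0.7, 0.3)                # fusion weights; MKL would estimate these per category
K = beta[0] * rbf_kernel(X_vis) + beta[1] * rbf_kernel(X_geo)

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```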
International conference proceedings, English - GeoVisualRank: A ranking method of geotagged imagesconsidering visual similarity and geo-location proximity
Hidetoshi Kawakubo; Keiji Yanai
Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011, 69-70, 2011, Peer-reviewed
International conference proceedings, English - Automatic Construction of an Action Video Shot Database using Web Videos
Do Hang Nga; Keiji Yanai
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 527-534, 2011, Peer-reviewed, There are a huge number of videos with text tags on the Web nowadays. In this paper, we propose a method of automatically extracting from Web videos the video shots corresponding to specific actions, given only action keywords such as "walking" and "eating".
The proposed method consists of three steps: (1) tag-based video selection, (2) segmenting videos into shots and extracting features from the shots, and (3) visual-feature-based video shot selection with tag-based scores taken into account. Firstly, we gather video IDs and tag lists for 1000 Web videos corresponding to the given keywords via a Web API, and calculate tag relevance scores for each video using a tag co-occurrence dictionary constructed in advance. Secondly, we fetch the top 200 videos from the Web in descending order of tag relevance score, and segment each downloaded video into several shots. From each shot we extract spatio-temporal features, global motion features and appearance features, and convert them into the bag-of-features representation. Finally, after calculating a similarity matrix between video shots, we apply the VisualRank method to select the video shots which best describe the actions corresponding to the given keywords. In the experiments, we achieved 49.5% precision at 100 shots over six kinds of human actions by just providing keywords, without any supervision. In addition, we made large-scale experiments on 100 kinds of action keywords.
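VisualRank is essentially PageRank run on a visual-similarity graph. A minimal sketch of the final ranking step follows, with a random symmetric matrix standing in for the bag-of-features shot similarities and a uniform bias vector standing in for the tag-based scores:

```python
# PageRank-style power iteration over a column-normalized shot-similarity matrix.
import numpy as np

rng = np.random.default_rng(0)
n = 50
S = rng.random((n, n)); S = (S + S.T) / 2     # symmetric shot-to-shot similarities (synthetic)
P = S / S.sum(axis=0, keepdims=True)          # column-stochastic transition matrix

bias = np.full(n, 1.0 / n)                    # uniform here; could encode tag relevance scores
r = np.full(n, 1.0 / n)
alpha = 0.85
for _ in range(100):                          # power iteration
    r = alpha * P @ r + (1 - alpha) * bias

print("top-5 ranked shots:", np.argsort(r)[::-1][:5])
```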
International conference proceedings, English - UEC at TRECVID 2010 Semantic Indexing Task
Yasushi Shimoda; Akitsugu Noguchi; Keiji Yanai
Proc. of TRECVID Workshop, -, Nov. 2010
International conference proceedings, English - Multiple Kernel Learning による50 種類の食事画像の認識
上東太一; 柳井啓司
電子情報通信学会論文誌D, The Institute of Electronics, Information and Communication Engineers, J93-D, 8, 1397-1406, Aug. 2010, Peer-reviewed, In recent years, dietary health management has attracted attention, and a system that can record meal content more easily is desired. In this work, we propose a system that records meal content using image recognition technology. As the recognition method, we propose using Multiple Kernel Learning (MKL), a state-of-the-art machine learning method, to integrate multiple kinds of image features such as local features, color features and texture features, achieving highly accurate recognition. MKL makes it possible to automatically estimate which image features are effective for recognizing each category and to learn optimal weights for each feature. In addition, we implemented a prototype food image recognition system incorporating the proposed method. In the experiments, we built a 50-category food image dataset, evaluated the proposed method, and achieved an average classification rate of 61.34%. Classification of food images on a scale as large as 50 categories had not been reported before, because practical accuracy was hard to attain; with the proposed MKL-based feature fusion, we achieved high recognition accuracy on large-scale food image classification for the first time.
Scientific journal, Japanese - 単語概念の視覚性と地理的分布の関係性の分析
川久保秀敏; 柳井啓司
電子情報通信学会論文誌D, The Institute of Electronics, Information and Communication Engineers, J93-D, 8, 1417-1428, Aug. 2010, Peer-reviewed, The aim of this work is to quantitatively analyze the relationship between word concepts and image features using a large amount of image data on the Web. Concretely, we carried out (1) analysis of the visualness of words by image region entropy with the bag-of-features representation, (2) analysis of the geographic distribution of word concepts by geo-entropy, which represents the distribution of geotagged images, and (3) analysis of the relation between the visualness and the geographic distribution of words using both entropies. This is the first study to analyze both the visualness and the geographic distribution of words. We collected 500 images each from the Web for 230 nouns and 100 adjectives and analyzed them. As a result, we found that nouns related to the sky such as "sun" and "rainbow" tend to have smaller image region entropy and larger geo-entropy than other words. On the other hand, words related to place names, region names and famous people tended to have small geo-entropy and large image region entropy.
Scientific journal, Japanese - Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions
Akitsugu Noguchi; Keiji Yanai
COMPUTER VISION - ACCV 2009, PT II, SPRINGER-VERLAG BERLIN, 5995, 458-467, 2010, Peer-reviewed, Recently, spatio-temporal local features have been proposed as image features to recognize events or human actions in videos. In this paper, we propose yet another local spatio-temporal feature based on the SURF detector, which is a lightweight local feature. Our method consists of two parts: extracting visual features and extracting motion features. First, we select candidate points based on the SURF detector. Next, we calculate motion features at each point within divided local temporal units, in order to take the consecutiveness of motions into account. Since our proposed feature is intended to be robust to rotation, we rotate optical flow vectors to the main direction of the extracted SURF features. In the experiments, we evaluate the proposed spatio-temporal local feature on a common dataset containing six kinds of simple human actions. As a result, the accuracy reaches 86%, which is almost equivalent to the state of the art. In addition, we conduct experiments classifying a large number of Web video clips downloaded from YouTube.
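The rotation-normalization step mentioned above (rotating optical flow to the SURF keypoint's main direction) reduces to a 2D rotation. A minimal sketch with synthetic flow vectors:

```python
# Rotate flow vectors so the keypoint's dominant orientation maps to the x-axis,
# making the motion descriptor robust to in-plane rotation.
import numpy as np

def rotate_flows(flows, dominant_angle):
    """Rotate (N, 2) optical-flow vectors by -dominant_angle radians."""
    c, s = np.cos(-dominant_angle), np.sin(-dominant_angle)
    R = np.array([[c, -s], [s, c]])
    return flows @ R.T

rng = np.random.default_rng(0)
flows = rng.standard_normal((16, 2))     # flow samples around one SURF point (synthetic)
theta = np.deg2rad(30.0)                 # dominant orientation of the keypoint
print(rotate_flows(flows, theta)[:3])
```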
International conference proceedings, English - Region-based automatic web image selection
Keiji Yanai; Kobus Barnard
MIR 2010 - Proceedings of the 2010 ACM SIGMM International Conference on Multimedia Information Retrieval, 305-312, 2010, Peer-reviewed, We propose a new Web image selection method which employs the region-based bag-of-features representation. The contribution of this work is (1) to introduce the region-based bag-of-features representation into a Web image selection task where training data is incomplete, and (2) to prove its effectiveness by experiments with both generative and discriminative machine learning methods. In the experiments, we used a multiple-instance learning SVM and a standard SVM as discriminative methods, and pLSA and LDA mixture models as probabilistic generative methods. Several works on the Web image filtering task with bag-of-features have been proposed so far; however, when the training data includes much noise, sufficient results could not be obtained. In this paper, we divide images into regions and classify each region instead of classifying whole images. With this region-based classification, we can separate foreground regions from background regions and achieve more effective learning from incomplete training data. The experiments show that the results of the proposed methods outperformed those of the whole-image-based bag-of-features. Copyright 2010 ACM.
International conference proceedings, English - Associating faces and names in japanese photo news articles
Akio Kitahara; Keiji Yanai
Progress in Informatics, National Institute of Informatics, 7, 63-70, 2010, Peer-reviewed, We propose a system which extracts faces and person names from news articles with photos on the Web and associates them automatically. The system detects face images in news photos with a face detector and extracts person names from the news text with a morphological analyzer. In addition, the bag-of-keypoints representation is applied to the extracted face images to filter out non-face images. The system uses the eigenface representation as the image features of the extracted faces, and associates them with the extracted names by a modified k-means clustering in the eigenface subspace. In the experiment, we obtained a precision of up to 66% for the association of faces and names. © 2010 National Institute of Informatics.
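A minimal sketch of the association step: project face crops into an eigenface subspace with PCA, then cluster there. The face vectors are synthetic, and the paper's modified k-means (which ties clusters to the extracted person names) is simplified here to plain k-means:

```python
# Eigenface projection via PCA, then clustering in the subspace.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
faces = rng.random((200, 64 * 64))           # 200 flattened face crops (synthetic)
pca = PCA(n_components=20).fit(faces)        # the principal components are the eigenfaces
coords = pca.transform(faces)

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(coords)
print("cluster of face 0:", labels[0])       # faces sharing a cluster get the same name
```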
Scientific journal, English - Geotagged photo recognition using corresponding aerial photos with multiple kernel learning
Keita Yaegashi; Keiji Yanai
Proceedings - International Conference on Pattern Recognition, 3272-3275, 2010, Peer-reviewed, In this paper, we address generic object recognition for geotagged images. As a recognition method for geotagged photos, we have already proposed exploiting aerial photos around the geotag locations as additional image features for visual recognition of geotagged photos. In the previous work, to fuse the two kinds of features, we simply concatenated them. Instead, in this paper, we introduce Multiple Kernel Learning (MKL) to integrate the features of photos and aerial images. MKL can estimate the contribution weights for integrating both kinds of features. In the experiments, we confirmed the effectiveness of using aerial photos for the recognition of geotagged photos, and we evaluated the weights of both features estimated by MKL for eighteen concepts. © 2010 IEEE.
International conference proceedings, English - Automatic construction of a Folksonomy-based visual ontology
Hidetoshi Kawakubo; Yuuta Akima; Keiji Yanai
Proceedings - 2010 IEEE International Symposium on Multimedia, ISM 2010, 330-335, 2010, Peer-reviewed, Recently, Folksonomy has attracted attention as a new method to index large-scale image databases. Folksonomy-style image databases allow users to attach keywords to images as "tags". Since tag words are uncontrolled, a wide variety of tags are associated with images, which is quite different from conventional image databases. In this paper, we propose a novel method to extract the hierarchical structure of relations between tags from Folksonomy. The tag structure we extract can be used as an ontology for image database search which reflects both textual and visual relations between tags. In the proposed method, we first collect millions of tag-attached images from Flickr, the world's largest Folksonomy-style image database, and remove noise images from them. Next, we estimate concept vectors for highly frequent tags based on visual features only, tag word features only, and combined visual and textual features, and compute the JS divergence and entropy for the three kinds of concept vectors. Finally, we estimate hierarchical structures between tags for the three kinds of concept vectors. In the experiments, we show the obtained hierarchical structure, which includes interesting relations that are sometimes difficult for humans to discover. In addition, as an application, we used and evaluated the obtained ontology for query expansion of text-tag-based image search over Flickr. These results indicate that the proposed method is promising and that the structure is expected to help image search and other applications. © 2010 IEEE.
International conference proceedings, English - Image recognition of 85 food categories by feature fusion
Hajime Hoashi; Taichi Joutou; Keiji Yanai
Proceedings - 2010 IEEE International Symposium on Multimedia, ISM 2010, 296-301, 2010, Peer-reviewed, Recognition of food images is challenging due to their diversity, and is practical for dietary health care. In this paper, we propose an automatic food image recognition system for 85 food categories which fuses various kinds of image features, including bag-of-features (BoF), color histograms, Gabor features and gradient histograms, with Multiple Kernel Learning (MKL). In addition, we implemented a prototype system to recognize food images taken by cellular-phone cameras. In the experiment, we achieved a 62.52% classification rate for the 85 food categories.
International conference proceedings, English - Detecting ``In-play'' Photos from Web Sports News Photos
Akio Kitahara; Keiji Yanai
Proc. of the Pacific-Rim Conference on Multimedia, -, Dec. 2009, Peer-reviewed
International conference proceedings, English - UEC at TRECVID 2009 High Level Feature Task
Zhiyuan Tang; Akitsugu Noguchi; Keiji Yanai
Proc. of TRECVID Workshop, -, Nov. 2009
International conference proceedings, English - Detecting "In-Play" Photos in Sports News Photo Database
Akio Kitahara; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, SPRINGER-VERLAG BERLIN, 5879, 268-279, 2009, Peer-reviewed, In this paper we deal with in-play classification of sports news photos as an instance of research on more sophisticated search methods for large-scale news photo databases. We propose two methods to classify sports news photos into one of six given sports categories and to discriminate in-play photos from not-in-play ones. One is a two-step method which classifies the sports category first and recognizes the in-play condition next, and the other is a one-step method which classifies them simultaneously. In the proposed methods, we integrate textual features extracted from news articles and image features extracted from photo images by Multiple Kernel Learning (MKL). In the experiment on the two-step method, we obtained a 99.33% classification rate for the sports category classification, which is the first step, and 80.75% for the in-play classification, which is the second step. On the other hand, in the experiment on the one-step method, we obtained 77.08%, which was slightly lower than the result of the two-step method.
International conference proceedings, English - Can Geotags Help Image Recognition?
Keita Yaegashi; Keiji Yanai
ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 5414, 361-373, 2009, Peer-reviewed, In this paper, we propose exploiting geotags as additional information for the visual recognition of consumer photos to improve its performance. Geotags, which represent the places where photos were taken, can be obtained automatically by carrying a small portable GPS device along with a digital camera. Geotags have the potential to improve the performance of visual image recognition, since recognition targets are unevenly distributed. For example, "beach" photos can be taken only near the sea, and "lion" photos can be taken only in a zoo, except in Africa.
To integrate geotag information into visual image recognition, we adopt two types of geographical information: raw values of latitude and longitude, and visual features of aerial photos around the location the geotag represents. As classifiers, we use both a discriminative method and a generative method in the experiments.
The objective of this paper is to examine whether geotags can help category-level image recognition. Note that in this paper we define the image recognition problem as deciding whether an image is associated with a certain given concept such as "mountain" or "beach". We propose a novel method to carry out geotagged image recognition. The experimental results demonstrate the effectiveness of using geographical information for the recognition of consumer photos.
International conference proceedings, English - Detecting cultural differences using consumer-generated geotagged photos
Keiji Yanai; Keita Yaegashi; Bingyu Qiu
Proceedings of the 2nd International Workshop on Location and the Web, LOCWEB'09, 40-43, 2009, Peer-reviewed, We propose a novel method to detect cultural differences across the world automatically by using the large number of geotagged images on photo sharing Web sites such as Flickr. We employ state-of-the-art object recognition techniques developed in the computer vision research community to mine representative photos of a given concept for representative local regions from a large-scale unorganized collection of consumer-generated geotagged photos. The results help us understand how the objects, scenes or events corresponding to the same given concept differ visually depending on the local region of the world. Copyright 2009 ACM.
International conference proceedings, English - Mining cultural differences from a large number of geotagged photos
Keiji Yanai; Bingyu Qiu
WWW'09 - Proceedings of the 18th International World Wide Web Conference, 1173-1174, 2009, We propose a novel method to detect cultural differences across the world automatically by using the large number of geotagged images on photo sharing Web sites such as Flickr. We employ state-of-the-art object recognition techniques developed in the computer vision research community to mine representative photos of a given concept for representative local regions from a large-scale unorganized collection of consumer-generated geotagged photos. The results help us understand how the objects, scenes or events corresponding to the same given concept differ visually depending on the local region of the world. Copyright is held by the author/owner(s).
International conference proceedings, English - WEB IMAGE GATHERING WITH REGION-BASED BAG-OF-FEATURES AND MULTIPLE INSTANCE LEARNING
Keiji Yanai
ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, IEEE, 450-453, 2009, Peer-reviewed, We propose a new Web image gathering system which employs the region-based bag-of-features representation and multiple instance learning. The contribution of this work is introducing the region-based bag-of-features representation into a Web image gathering task where training data is incomplete, and proving its effectiveness by comparing the proposed method with the normal whole-image-based bag-of-features representation.
In our method, we first perform region segmentation on an image, and then generate a bag-of-features vector for each region. In this paper, one image is represented by a set of bag-of-features vectors, whereas in the normal bag-of-features representation, which has recently become very popular for visual object categorization tasks, one image is represented by just one bag-of-features vector.
Several works on Web image selection with bag-of-features have been proposed so far; however, when the training data includes much noise, sufficient results could not be obtained. In this paper, we divide images into regions and classify each region with a multiple-instance support vector machine (mi-SVM) instead of classifying whole images. With this region-based classification, we can separate foreground regions from background regions and achieve more effective learning from incomplete training data. The experiments show that the results of the proposed methods outperformed those of the whole-image-based bag-of-visual-words and the normal support vector machine.
International conference proceedings, English - AN ANALYSIS OF THE RELATION BETWEEN VISUAL CONCEPTS AND GEO-LOCATIONS USING GEOTAGGED IMAGES ON THE WEB
Hidetoshi Kawakubo; Keiji Yanai
ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, IEEE, 1644-1647, 2009, Peer-reviewed, Recently, a large number of geotagged images have become available on photo sharing Web sites such as Flickr. In this paper, we propose image region entropy and geo-location entropy for analyzing the relation between visual concepts and geographical locations using a large-scale geotagged image database. Image region entropy represents to what extent concepts have visual characteristics, while geo-location entropy represents to what extent concepts are distributed over the world. In the experiment, we analyzed the relations between image region entropy and geo-location entropy for 230 nouns and 100 adjectives, and found that concepts with low image entropy tend to have high geo-location entropy and vice versa.
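Both quantities are plain Shannon entropies taken over different distributions, as the short sketch below shows; the two example distributions are synthetic illustrations:

```python
# Image region entropy: entropy over relevance probabilities of image regions.
# Geo-location entropy: entropy over the distribution of geotags across map cells.
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

region_probs = [0.9, 0.05, 0.03, 0.02]   # visually consistent concept -> low region entropy
geo_counts = np.ones(100)                # photos spread evenly over 100 cells -> high geo entropy
print("image region entropy:", shannon_entropy(region_probs))
print("geo-location entropy:", shannon_entropy(geo_counts))
```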
International conference proceedings, English - A visual analysis of the relationship between word concepts and geographical locations
Keiji Yanai; Hidetoshi Kawakubo; Bingyu Qiu
CIVR 2009 - Proceedings of the ACM International Conference on Image and Video Retrieval, 92-99, 2009, Peer-reviewed, In this paper, we describe two methods to analyze the relationship between word concepts and geographical locations by using a large number of geotagged images on photo sharing Web sites such as Flickr. Firstly, we propose using both image region entropy and geo-location entropy to analyze the relations between location and visual features, and in the experiment we found that concepts with low image entropy tend to have high geo-location entropy and vice versa. Secondly, we propose a novel method to select representative photographs for regions on a worldwide scale, which helps detect cultural differences across the world for word concepts with high geo-location entropy. In the proposed method, we first extract the most relevant images by clustering and evaluation of visual features. Then, based on the geographic information of the images, representative regions are automatically detected. Finally, we select and generate a set of representative images for the representative regions by employing Probabilistic Latent Semantic Analysis (PLSA) modelling. The results show the ability of our approach to mine regional representative photographs and cultural differences across the world. Copyright 2009 ACM.
International conference proceedings, English - A FOOD IMAGE RECOGNITION SYSTEM WITH MULTIPLE KERNEL LEARNING
Taichi Joutou; Keiji Yanai
2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, IEEE, 285-288, 2009, Peer-reviewed, Since dietary health care has been drawing people's attention recently, a system that can easily record everyday meals is awaited. In this paper, we propose an automatic food image recognition system for recording people's eating habits. In the proposed system, we use the Multiple Kernel Learning (MKL) method to adaptively integrate several kinds of image features such as color, texture and SIFT. MKL makes it possible to estimate optimal weights for combining image features for each category. In addition, we implemented a prototype system to recognize food images taken by cellular-phone cameras. In the experiment, we achieved a 61.34% classification rate for 50 kinds of foods. To the best of our knowledge, this is the first report of a food image classification system that can be applied to practical use.
International conference proceedings, English - UEC at TRECVID 2008 High Level Feature Task
Zhiyuan Tang; Keiji Yanai
Proc. of TRECVID Workshop, -, Nov. 2008
International conference proceedings, English - WEB VIDEO RETRIEVAL BASED ON THE EARTH MOVER'S DISTANCE BY INTEGRATING COLOR, MOTION AND SOUND
Keisuke Takada; Keiji Yanai
2008 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, PROCEEDINGS, IEEE, 89-92, 2008, Peer-reviewed, In this paper, we propose a novel content-based video retrieval method for short video clips stored on consumer video sharing Web sites. It is based on the Earth Mover's Distance (EMD), which enables us to evaluate dissimilarities among videos whose numbers of shots and time lengths differ. As features extracted from videos, we use the color, motion, sound and position of shots. By defining the ground distance of the EMD as the weighted sum of the Euclidean distances of these four kinds of features, we integrate them when calculating the EMD. In the experiments on video retrieval for YouTube videos, we obtained an average precision of up to 0.98, which shows the effectiveness of the proposed method. In addition, the results of integrating the four kinds of features outperformed those of single features, which shows that feature combination is effective.
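The retrieval distance can be written down compactly: an EMD between the shot signatures of two videos, whose ground distance is a weighted sum of per-feature Euclidean distances. The sketch below solves the small transportation LP with scipy; the features, weights and shot counts are synthetic stand-ins:

```python
# Earth Mover's Distance between two videos' shot signatures (sketch).
import numpy as np
from scipy.optimize import linprog

def ground_distance(a, b, weights):
    """Weighted sum of per-feature Euclidean distances between two shots."""
    return sum(w * np.linalg.norm(a[k] - b[k]) for k, w in weights.items())

def emd(shots_a, shots_b, weights):
    m, n = len(shots_a), len(shots_b)
    D = np.array([[ground_distance(a, b, weights) for b in shots_b] for a in shots_a])
    wa, wb = np.full(m, 1.0 / m), np.full(n, 1.0 / n)   # equal mass on every shot
    A_eq, b_eq = [], []
    for i in range(m):                                   # each source ships exactly wa[i]
        row = np.zeros((m, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(wa[i])
    for j in range(n):                                   # each sink receives exactly wb[j]
        col = np.zeros((m, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(wb[j])
    res = linprog(D.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

rng = np.random.default_rng(0)
def shot():
    return {"color": rng.random(8), "motion": rng.random(4),
            "sound": rng.random(4), "position": rng.random(1)}

weights = {"color": 0.4, "motion": 0.3, "sound": 0.2, "position": 0.1}
print("EMD:", emd([shot() for _ in range(3)], [shot() for _ in range(5)], weights))
```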
International conference proceedings, English - Web image gathering with a part-based object recognition method
Keiji Yanai
ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 4903, 297-306, 2008, Peer-reviewed, We propose a new Web image gathering system which employs a part-based object recognition method. The novelty of our work is introducing the bag-of-keypoints representation into the Web image gathering task, instead of the color histograms or segmented regions our previous system used. The bag-of-keypoints representation has been shown to have an excellent ability to represent image concepts in the context of visual object categorization/recognition, in spite of its simplicity. Most object recognition work assumes that complete training data is available. On the other hand, in the Web image gathering task, since images associated with the given keywords are gathered from the Web fully automatically, complete training images are not available. In this paper, we combine HTML-based automatic selection of positive training images with bag-of-keypoints-based image selection using an SVM, which is a supervised machine learning method. This combination enables the system to gather many images related to given concepts with high precision, fully automatically and with no human intervention. Our main objective is to examine whether the bag-of-keypoints model is also effective for the Web image gathering task, where training images always include some noise. The experiments show that the new system greatly outperforms our previous systems, other systems and Google Image Search.
International conference proceedings, English - Associating Faces and Names in Japanese Photo News Articles on the Web
Akio Kitahara; Taichi Joutou; Keiji Yanai
2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3, IEEE, 1156-1161, 2008, Peer-reviewed, We propose a system which extracts faces and person names from news articles with photos on the Web and associates them automatically. The system detects face images in news photos with a face detector and extracts person names from the news text with a morphological analyzer. In addition, the bag-of-keypoints technique is applied to the extracted face images to filter out non-face images. The system uses the eigenface representation as the image features of the extracted faces, and associates them with the extracted names by a modified k-means clustering in the eigenface subspace. In the experiment, we obtained a 66% precision rate for the association of faces and names.
International conference proceedings, English - Automatic web image selection with a probabilistic latent topic model
Keiji Yanai
Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08, 1237-1238, 2008, Peer-reviewed, We propose a new method to select images relevant to given keywords from images gathered from the Web, based on the Probabilistic Latent Semantic Analysis (PLSA) model, a probabilistic latent topic model originally proposed for text document analysis. The experimental results show that the results of the proposed method are almost equivalent to or outperform those of existing methods. In addition, it is shown that our method can select more varied images than the existing SVM-based methods.
International conference proceedings, English - WEB IMAGE SELECTION WITH PLSA
Keiji Yanai
2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, IEEE, 1373-1376, 2008, Peer-reviewed, In this paper, we propose a new method to select images relevant to given keywords from the images gathered from the Web. Our method is based on the Probabilistic Latent Semantic Analysis (PLSA) model, which is a generative probabilistic topic model. Firstly, we gather images related to the given keywords from the Web with Web search engines. Secondly, we choose pseudo-training images from them by simple heuristic HTML analysis, and train our PLSA-based probabilistic model with them. Finally, we select relevant images from all the gathered images with the learned model. The experimental results show that the results of the proposed method are almost equivalent to those of existing methods, although our method, unlike existing methods, does not need negative training samples prepared in advance.
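A compact sketch of the model at the core of this method: PLSA fitted by EM on a small synthetic image-by-visual-word count matrix. The pseudo-training selection and the final ranking step are omitted:

```python
# PLSA EM updates for P(w|z) and P(z|d) on a count matrix N (sketch).
import numpy as np

rng = np.random.default_rng(0)
N = rng.integers(0, 5, size=(30, 40)).astype(float)   # 30 images x 40 visual words
D, W, Z = N.shape[0], N.shape[1], 4                   # 4 latent topics

p_w_z = rng.random((Z, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
p_z_d = rng.random((D, Z)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: responsibilities P(z|d,w), shape (D, Z, W)
    post = p_z_d[:, :, None] * p_w_z[None, :, :]
    post /= post.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate parameters from expected counts
    nz = N[:, None, :] * post
    p_w_z = nz.sum(axis=0); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = nz.sum(axis=2); p_z_d /= p_z_d.sum(axis=1, keepdims=True)

print("topic mixture of image 0:", np.round(p_z_d[0], 3))
```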
International conference proceedings, English - Rushes summarization based on color, motion and face
Akitsugu Noguchi; Keiji Yanai
MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops, 139-143, 2008, Peer-reviewed, In this paper, we present a method for the Rushes Summarization task, one of the tasks of TRECVID 2008. In the proposed method, an input video is first decomposed into shots by comparing consecutive frames. Then, these shots are grouped by the k-means method, using color, motion and faces as features. In preliminary experiments, we compared three systems which employed the following feature combinations: "color", "color and motion" and "color, motion and faces". As a result, we found that motion features and face features were effective. Our results for Rushes Summarization 2008 were a little below the median regarding IN (inclusion ratio of ground truth) and JU (lack of junk shots), but above the median regarding TE (pleasant tempo). To improve IN and JU, we then modified the method to detect clapper boards by introducing visual features in addition to sound features. An additional post-submission experiment shows that this modification improved the results. © 2008 ACM.
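A minimal sketch of the grouping step described above: shots, represented here by synthetic color/motion/face feature rows, are clustered with k-means and the shot nearest each cluster center is kept for the summary. Shot boundary detection and the clapper-board filter are omitted:

```python
# Group shots by k-means on color/motion/face features, keep one per cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# one row per shot: [color histogram (8) | motion (2) | face count (1)]  (synthetic)
shots = np.hstack([rng.random((40, 8)), rng.random((40, 2)),
                   rng.integers(0, 3, (40, 1)).astype(float)])

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(shots)
summary = [int(np.argmin(np.linalg.norm(shots - c, axis=1)))
           for c in km.cluster_centers_]
print("selected shot indices:", sorted(set(summary)))
```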
International conference proceedings, English - Objects over the World
Bingyu Qiu; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2008, 9TH PACIFIC RIM CONFERENCE ON MULTIMEDIA, SPRINGER-VERLAG BERLIN, 5353, 296-+, 2008, Peer-reviewed, This paper considers the problem of selecting representative photographs for regions on a worldwide scale. Selecting and generating such representative photographs for representative regions from large-scale collections would help us understand region-specific objects from a worldwide perspective. We propose a solution to this problem using a large-scale collection of geotagged photographs. Our solution first extracts the most relevant images by clustering and evaluation of visual features. Then, based on the geographic information of the images, representative regions are automatically detected. Finally, we select and generate a set of representative images for the representative regions by employing Probabilistic Latent Semantic Analysis (PLSA) modelling. The results show the ability of our approach to generate region-based representative photographs.
International conference proceedings, English - 一般画像認識の現状と今後
柳井 啓司
情報処理学会論文誌:コンピュータビジョン・イメージメディア, 48, CVIM19, 1-24, Dec. 2007, Peer-reviewed
Scientific journal, Japanese - 一般物体認識のための単語概念の視覚性の分析
柳井啓司; Kobus Barnard
情報処理学会論文誌:コンピュータビジョン・イメージメディア, 48, CVIM17, Feb. 2007, Peer-reviewed
Scientific journal, Japanese - 確率的Web画像収集
柳井啓司
人工知能学会誌, 22, 1, 10-18, Jan. 2007, Peer-reviewed
Scientific journal, Japanese - Image collector III: A web image-gathering system with bag-of-keypoints
Keiji Yanai
16th International World Wide Web Conference, WWW2007, 1295-1296, 2007, Peer-reviewed, We propose a new system to mine visual knowledge on the Web. There are huge amounts of image data as well as text data on the Web. However, mining image data from the Web has received less attention than mining text data, since handling the semantics of images is much more difficult. In this paper, we propose introducing a state-of-the-art image recognition technique, the bag-of-keypoints representation, into the Web image-gathering task. The experiments show that the proposed system greatly outperforms our previous systems and Google Image Search.
International conference proceedings, English - The photo news flusher: A photo-news clustering browser
Tatsuya Iyota; Keiji Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2007, SPRINGER-VERLAG BERLIN, 4810, 462-466, 2007, Peer-reviewed, We propose a novel news browsing system that can cluster photo news articles based on both the textual features of articles and the image features of news photos, for a personal news database built by accumulating Web photo news articles. The system provides two types of clustering: normal clustering and thread-style clustering. It enables us to browse news articles over several weeks or months visually and to find useful news easily. In this paper, we describe an overview of our system, some example uses, and user studies.
International conference proceedings, English - Web image gathering with a spatial pyramid kernel
Keiji Yanai
ISM WORKSHOPS 2007: NINTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA - WORKSHOPS, PROCEEDINGS, IEEE COMPUTER SOC, 309-314, 2007, Peer-reviewed, In this paper, we propose a Web image gathering system employing state-of-the-art object recognition techniques, namely the bag-of-visual-words representation and a spatial pyramid kernel. In recent years, research on object recognition has progressed greatly. Most work on object recognition assumes complete training data is available, whereas complete training data is in general not available when gathering Web images with no human intervention. The objective of this paper is to examine whether a state-of-the-art object recognition technique is also effective for the Web image gathering task, where training images always include some noise. The experiments show that the state-of-the-art object recognition method is also very effective for Web image gathering, and that the results greatly outperform those of existing methods.
International conference proceedings, English - UEC at TRECVID 2007 High Level Feature Task
O. Liu; Z. Tang; K. Yanai
Proc. of TRECVID Workshop 2007, -, 2007
International conference proceedings, English - Image Classification by a Probabilistic Model Learned from Imperfect Training Data on the Web
Keiji Yanai
Proc. of ACM Knowledge Discovery and Data Mining (KDD) Workshop on Multimedia Data Mining, 75-82, Aug. 2006, Peer-reviewed
International conference proceedings, English - Cross modal disambiguation
Kobus Barnard; Keiji Yanai; Matthew Johnson; Prasad Gabbur
TOWARD CATEGORY-LEVEL OBJECT RECOGNITION, SPRINGER-VERLAG BERLIN, 4170, 238-+, 2006, Peer-reviewed, We consider strategies for reducing ambiguity in multi-modal data, particularly in the domain of images and text. Large data sets containing images with associated text (and vice versa) are readily available, and recent work has exploited such data to learn models for linking visual elements to semantics. This requires addressing a correspondence ambiguity, because it is generally not known which parts of the images connect with which language elements. In this paper we first discuss using language processing to reduce correspondence ambiguity in loosely labeled image data. We then consider the similar problem of using visual correlates to reduce ambiguity in text with associated images. Only rudimentary image understanding is needed for this task, because the image only needs to help differentiate between a limited set of choices, namely the senses of a particular word.
International conference proceedings, English - Automatic "Go" record generation from a TV program
Keiji Yanai; Takehisa Hayashiyama
12TH INTERNATIONAL MULTI-MEDIA MODELLING CONFERENCE PROCEEDINGS, IEEE, 414-417, 2006, Peer-reviewed, We present a video recognition system for "Go" TV programs. It generates a Go game record automatically from a broadcast of Go played by human professionals. "Go" is an ancient Asian board game played between two players, similar to Chess and Shogi. For an MPEG2 video of a TV Go program, the system distinguishes play and commentary board shots from other types of shots such as shots of the players, and detects the Go stones placed on the board from the board shots. The system removes several types of noise such as a player's head or hand. In addition, it also detects Go stones in commentary board shots, which are often inserted between play board shots, and compensates for the order of the Go stones placed on the play board during commentary board shots. In the experimental results for eight TV Go programs, the system achieved a 95.7% precision and a 95.7% recall rate.
International conference proceedings, English - Mutual Information Between Words and Pictures
Kobus Barnard; Keiji Yanai
Proc. of the Workshop on Information Theory and Applications, -, Jan. 2006, Peer-reviewed
International conference proceedings, English - Finding visual concepts by web image mining
Keiji Yanai; Kobus Barnard
Proceedings of the 15th International Conference on World Wide Web, 923-924, 2006, Peer-reviewed, We propose measuring the "visualness" of concepts with images on the Web, that is, to what extent concepts have visual characteristics. This is a new application of "Web image mining". Knowing which concepts have visually discriminative power is important for image recognition, since not all concepts are related to visual content. Mining image data on the Web with our method makes this possible. Our method performs probabilistic region selection for images and computes an entropy measure which represents the "visualness" of concepts. In the experiments, we collected about forty thousand images from the Web for 150 concepts. We examined which concepts are suitable for the annotation of image contents.
International conference proceedings, English - Evaluation Strategies for Image Understanding and Retrieval
Keiji Yanai; Nikhil V. Shirahatti; Prasad Gabbur; Kobus Barnard
Proc. of ACM Multimedia Workshop on Multimedia Information Retrieval, 217-226, Nov. 2005, Peer-reviewed
International conference proceedings, English - Probabilistic Web Image Gathering
Keiji Yanai; Kobus Barnard
Proc. of ACM Multimedia Workshop on Multimedia Information Retrieval, 57-64, Nov. 2005, Peer-reviewed
International conference proceedings, English - UEC at TRECVID 2005 High Level Feature Task --Web Images Meet TRECVID--
Keiji Yanai; Liu Ounan; Yuki Tsujita
Proc. of the TRECVID Conference, -, Nov. 2005
International conference proceedings, English - Image region entropy: A measure of "visualness" of web images associated with one concept
Keiji Yanai; Kobus Barnard
Proceedings of the 13th ACM International Conference on Multimedia, MM 2005, 419-422, 2005, Peer-reviewed, We propose a new method to measure the "visualness" of concepts, that is, to what extent concepts have visual characteristics. Knowing which concepts have visually discriminative power is important for image annotation, especially automatic image annotation by an image recognition system, since not all concepts are related to visual content. Our method performs probabilistic region selection for images labeled as concept "X" or "non-X", and computes an entropy measure which represents the "visualness" of concepts. In the experiments, we collected about forty thousand images from the World-Wide Web using Google Image Search for 150 concepts. We examined which concepts are suitable for the annotation of image contents. Copyright © 2005 ACM.
International conference proceedings, English - Image collector II: A system to gather a large number of images from the web
Keiji Yanai
IEICE Transactions on Information and Systems, Institute of Electronics, Information and Communication Engineers, IEICE, E88-D, 10, 2432-2436, 2005, Peer-reviewed, We propose a system that enables us to gather hundreds of images related to one set of keywords provided by a user from the World Wide Web. The system is called Image Collector II. The Image Collector, which we proposed previously, can gather only one or two hundred images. We propose the following two improvements over our previous system in terms of the number of gathered images and their precision: (1) We extract words appearing with high frequency from all the HTML files in which the output images of an initial image gathering are embedded, and using them as keywords, we carry out a second image gathering. Through this process, we can obtain hundreds of images for one set of keywords. (2) The more images we gather, the more the precision of the gathered images decreases. To improve the precision, we introduce word vectors of the HTML files embedding images into the image selection process, in addition to image feature vectors. Copyright © 2005 The Institute of Electronics, Information and Communication Engineers.
International conference proceedings, English - 一般画像自動分類の実現へ向けたWorld Wide Webからの画像知識の獲得
柳井 啓司
人工知能学会論文誌, 19, 5, 429-439, Oct. 2004, Peer-reviewed
Scientific journal, Japanese - A fast image-gathering system from the World-Wide Web using a PC cluster
K Yanai; M Shindo; K Noshita
IMAGE AND VISION COMPUTING, ELSEVIER SCIENCE BV, 22, 1, 59-71, Jan. 2004, Peer-reviewed, Due to the recent explosive progress of the WWW (World-Wide Web), we can easily access a large number of images on the WWW. There are, however, no established methods to make use of the WWW as a large image database. In this paper, we describe an automatic image-gathering system for the WWW, in which we use both keywords and image features. By exploiting existing keyword-based search engines and selecting images by their image features, our system obtains, with high accuracy, images that are relevant to the query keywords. Our system has the following two novel properties: (1) It does not need to build a huge index for the great number of images on the whole WWW, because it takes advantage of commercial keyword-based text-search engines. (2) It can gather many images related to given keywords fully automatically, without user intervention during processing. The system has been implemented on a parallel PC cluster, which enables us to gather more than one hundred images from the WWW in about one minute. (C) 2003 Elsevier B.V. All rights reserved.
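To make the keyword-then-features idea concrete, here is an illustrative sketch of the selection step: downloaded images are kept only if their visual features are close enough to a trusted seed set. The cosine measure, color histograms and threshold are placeholders, not the paper's actual features or selection rule:

```python
# Keep a downloaded image if it is visually close to at least one seed image.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
candidates = rng.random((500, 64))   # color histograms of downloaded images (synthetic)
seeds = rng.random((20, 64))         # images trusted from the keyword search (synthetic)

accepted = [i for i, img in enumerate(candidates)
            if max(cosine(img, s) for s in seeds) > 0.8]
print(f"kept {len(accepted)} of {len(candidates)} downloaded images")
```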
Scientific journal, English - Generic Image Classification Using Visual Knowledge on the Web
Keiji Yanai
Proc. of ACM International Conference on Multimedia, 67-76, Nov. 2003, Peer-reviewed
International conference proceedings, English - Web Image Mining toward Generic Image Recognition
Keiji Yanai
Proc. of ACM International World Wide Web Conference, Poster Paper No.193, May 2003, Peer-reviewed
International conference proceedings, English - Image Collector II : An Over-One-Thousand-Image-Gathering System
Keiji Yanai
Proc. of ACM International World Wide Web Conference, Poster Paper No.47, May 2003, Peer-reviewed
International conference proceedings, English - Image collector II: A system for gathering more than one thousand images from the web for one keyword
K Yanai
2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, IEEE, 785-788, 2003, Peer-reviewed, We propose a system that enables us to gather more than one thousand images from the World Wide Web. The system is called Image Collector II. The Image Collector, which we proposed previously, can gather only several hundred images. We made the following two improvements to extend the ability of our previous system in terms of the number of gathered images and their precision: (1) We extract words appearing with high frequency from all the HTML files embedding the output images of an initial image gathering, and using them as keywords, we carry out a second image gathering. Through this, we obtain more than one thousand images for one keyword. (2) The more images we gather, the more the precision of the gathered images decreases. To raise the precision, we introduce word vectors of the HTML files embedding images into the image selection process, in addition to image feature vectors.
International conference proceedings, English - Web Image Mining: Can we gather visual knowledge for image recognition from the Web?
K Yanai
ICICS-PCM 2003, VOLS 1-3, PROCEEDINGS, IEEE, 186-190, 2003, Peer-reviewed, Because of the wide spread of digital imaging devices and the World Wide Web, we can easily obtain digital images of various kinds of real-world scenes. Currently, however, the classification/recognition of generic real-world images is far from practical due to the diversity of real-world scenes.
To deal with such diversity, we have proposed gathering real-world images from the World-Wide Web and using them as training images for image classification. We call this research project "Web Image Mining". Web images are as diverse as real-world scenes, since Web images are taken by a large number of people for various kinds of purposes. It is expected that such diverse training images will enable us to classify/recognize diverse real-world images. In this paper, we describe our ongoing project, "Web Image Mining for Generic Image Recognition".
International conference proceedings, English - Recognition of indoor images employing supporting relation between objects
Keiji Yanai; Koichiro Deguchi
Systems and Computers in Japan, 33, 11, 14-26, Oct. 2002, In this paper, we describe a new design for a recognition system for single images of indoor scenes including complex occlusions. Conventional systems could not recognize images of indoor scenes including complex occlusions. Our system can handle them by employing supporting relations between objects. In our system, the system first estimates the 3D structure of an object by qualitatively fitting a 3D structure model to the image. Next, by checking the supporting relations between objects, it eliminates object candidates that cannot exist and estimates real objects from their parts in the image. Finally, the system recognizes objects that are compatible with each other. We implemented the system as a multi-agent-based image understanding system. In this paper, we describe the design of the system and the results of experiments. © 2002 Wiley Periodicals, Inc. Syst. Comp. Jpn., 33(11).
Scientific journal, English - Image Classification by Web Images
Keiji Yanai
Proc. of the Seventh Pacific-Rim International Conference on Artificial Intelligence (Springer LNAI no.2417), 613-614, Aug. 2002, Peer-reviewed
International conference proceedings, English - 反復深化探索に基く協力詰将棋の解法
星由雄; 野下浩平; 柳井啓司
情報処理学会論文誌, 43, 1, 11-19, Jan. 2002
Scientific journal, Japanese - An experiment on generic image classification using Web images
K Yanai
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2002, PROCEEDING, SPRINGER-VERLAG BERLIN, 2532, 303-310, 2002, Peer-reviewed, In this paper, we describe an experiment on generic image classification using a large number of images gathered from the Web as learning images. The processing consists of three steps. In the gathering stage, the system gathers images related to the given class keywords from the Web automatically. In the learning stage, it extracts image features from the gathered images and associates them with each class. In the classification stage, the system classifies a test image into one of the classes corresponding to the class keywords by using the association between image features and classes. In the experiments, we achieved a classification rate of 44.6% for generic images by using images gathered automatically from the World-Wide Web as learning images.
Scientific journal, English - A multi-resolution image understanding system based on multi-agent architecture for high-resolution images
K Yanai; K Deguchi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E84D, 12, 1642-1650, Dec. 2001, Peer-reviewed, Recently, high-resolution images with more than one million pixels have become easily available. However, such images require much processing time and memory in an image understanding system. In this paper, we propose an integrated image understanding system for high-resolution images that combines multi-resolution analysis with a multi-agent-based architecture. The proposed system has the capability to handle a high-resolution image effectively without much extra cost. We implemented an experimental system for images of indoor scenes.
Scientific journal, English - キーワードと画像特徴を利用したWWWからの画像収集システム
柳井啓司
情報処理学会論文誌:データベース, 42, SIG10 (TOD11), 79-91, Oct. 2001, Peer-reviewed
Research society, Japanese - A Fast Image-Gathering System on the World-Wide Web Using a PC Cluster
Keiji Yanai; Masaya Shindo; Kohei Noshita
Proc. of International Conference on Web Intelligence 2001 (Springer LNAI no.2198), 1, 324-334, Sep. 2001, Peer-reviewed
International conference proceedings, English - 物体間の支持関係を利用した室内画像の認識
柳井啓司; 出口光一郎
電子情報通信学会論文誌D-II, The Institute of Electronics, Information and Communication Engineers, 84-DII, 8, 1741-1752, Aug. 2001, Peer-reviewed, In this paper, we propose an image recognition system for indoor scene images that include complex occlusions. Conventional systems could not recognize objects unless they appeared sufficiently in the image, and therefore could not handle images with complex occlusions such as indoor scenes. In contrast, the proposed system can recognize objects hidden by other objects by qualitatively reasoning about the supporting relations between objects, i.e., the relation that one object rests on top of another. Concretely, the system first estimates the 3D structure of objects that appear clearly in the image by fitting 3D structure models. Next, using the estimated 3D structures, it checks the supporting relations between objects to infer the existence of objects that are only partially visible and to eliminate candidates for objects that cannot actually exist, finally obtaining a mutually consistent recognition result. We realized this as the multi-agent image recognition system that we have been studying. In this paper, we describe the details of the system and the results of experiments with an implemented prototype system.
Research society, Japanese - Image collector: An image-gathering system from the world-wide web employing keyword-based search engines
Keiji Yanai
Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, 523-526, 2001, Peer-reviewed, Due to the recent explosive progress of the WWW (World-Wide Web), we can easily access a large number of images from the WWW. There are, however, no established methods to make use of the WWW as a large image database. In this paper, we describe an automatic image-gathering system for the WWW employing keywords and image features, which is called the Image Collector. By exploiting existing keyword-based search engines and selecting images by their image features, our system obtains, with high accuracy, images that are strongly related to the query keywords. We have implemented a system that gathers more than one hundred images from the WWW in about five minutes.
International conference proceedings, English - An automatic image-gathering system for the World-Wide Web by integration of keywords and image features
K Yanai
ICCIMA 2001: FOURTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, PROCEEDINGS, IEEE COMPUTER SOC, 303-307, 2001, Due to the recent explosive progress of the WWW (World-Wide Web), we can easily access a large number of images from the WWW. However, methods to utilize the WWW as a large image database have not been established yet. In this paper, we propose an automatic image-gathering system for the WWW employing both keywords and image features, which is called the Image Collector. By exploiting existing keyword-based search engines and selecting images by their image features, the system obtains more than one hundred images related to the query keywords in about five minutes.
International conference proceedings, English - A Multi-resolution Image Understanding System Based on Multi-agent Architecture for High-resolution Images
Keiji Yanai; Koichiro Deguchi
Proc. of IAPR Workshop on Machine Vision and Applications, 291-294, Nov. 2000, Peer-reviewed
International conference proceedings, English - Recognition of indoor images employing qualitative model fitting and supporting relation between objects
K Yanai; K Deguchi
15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, IEEE COMPUTER SOC, 964-967, 2000, Peer-reviewed, In this paper, we describe a new design for a recognition system for single images of indoor scenes including complex occlusions. In our system, the system first estimates the 3D structure of an object by qualitatively fitting a 3D structure model to the image. Next, by checking the supporting relations between objects, it eliminates object candidates that cannot exist and estimates actual objects from their parts in the image. Finally, we recognize objects that are consistent with each other. We implemented the system as a multi-agent-based image understanding system. This paper describes an outline of the system and the results of recognition experiments.
International conference proceedings, English - An image understanding system for various images based on multi-agent architecture
Keiji Yanai
Proceedings - 3rd International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 1999, Institute of Electrical and Electronics Engineers Inc., 186-190, 1999, We propose an image understanding system for real-world images which has the ability to recognize various kinds of images. We propose a multi-agent architecture to integrate and coordinate object recognition modules for individual target objects. In our system, object candidates generated by different agents are integrated based not only on the evaluations of the modules themselves but also on the spatial relations among objects. By checking spatial relations, the agents also estimate actual objects from the parts visible in the image. Such mechanisms are realized by autonomous cooperation among the agents, and the most reliable result is selected after arbitration between them. We implemented an experimental system on a PC cluster, and achieved recognition for both indoor and outdoor images.
International conference proceedings, English - A Multi-Agent Architecture of Object Recognition System for Various Image
柳井 啓司; 出口 光一郎
Transactions of Information Processing Society of Japan, 39, 2, 170-177, Feb. 1998, Peer-reviewed
Research society, Japanese - An architecture of object recognition system for various images based on multi-agent
K Yanai; K Deguchi
FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, IEEE COMPUTER SOC, 278-281, 1998, Peer-reviewed, An image understanding system for real-world images that is able to recognize various kinds of images is proposed. We propose a multi-agent architecture that integrates object recognition modules for individual target objects. In our method, the results recognized by different agents are fused based not only on the evaluations of each module itself but also on relations of object locations, sizes, and so on. This is carried out autonomously between the agents concerned, and the most reliable result is selected after arbitration between them. We implemented an experimental system on a parallel computer and achieved recognition of both indoor and outdoor images.
International conference proceedings, English - An Implementation of Multi-Agent Based Object Recognition System for Various Image
Keiji Yanai
Proc. of the Eighth Parallel Computing Workshop, P1-P, 1998
International conference proceedings, English - Implementation of Object Recognition System Employing Multiagent Architecture at Highly Parallel Computer
Keiji Yanai; Koichiro Deguchi
Proc. of the Sixth Parallel Computing Workshop'96, 2-a, 1996
International conference proceedings, English
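The two Image Collector entries above describe the same two-stage gathering strategy: collect candidate images via existing keyword-based search engines, then keep only the subset whose image features are mutually consistent. As a hedged illustration (not the published system), the sketch below assumes a hypothetical `search_image_urls` helper standing in for whatever keyword-based image search API is available, and uses a coarse RGB color histogram in place of the richer image features used in the papers.

```python
# Hypothetical sketch of keyword-plus-feature image gathering (not the
# published Image Collector): search by keyword, then keep the images whose
# visual features lie closest to the candidate set's mean feature.
from io import BytesIO
from urllib.request import urlopen

import numpy as np
from PIL import Image


def search_image_urls(keyword):
    """Placeholder (assumed): candidate image URLs for `keyword` from an
    existing keyword-based search engine."""
    raise NotImplementedError


def color_histogram(img, bins=4):
    """Coarse RGB color histogram used as the image feature here."""
    arr = np.asarray(img.convert("RGB").resize((64, 64))) // (256 // bins)
    idx = arr[..., 0] * bins * bins + arr[..., 1] * bins + arr[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()


def gather(keyword, keep_ratio=0.5):
    images, feats = [], []
    for url in search_image_urls(keyword):
        try:
            img = Image.open(BytesIO(urlopen(url, timeout=5).read()))
        except Exception:
            continue  # skip broken links and non-image responses
        images.append(img)
        feats.append(color_histogram(img))
    if not images:
        return []
    feats = np.stack(feats)
    dists = np.linalg.norm(feats - feats.mean(axis=0), axis=1)
    keep = np.argsort(dists)[: int(len(images) * keep_ratio)]
    return [images[i] for i in keep]  # the visually consistent subset
```

The selection step simply keeps the images closest to the mean feature of the candidate set; the papers' actual image-feature-based selection is more elaborate, but the structure is the same: search first, then filter visually.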
MISC
- 食事画像認識の現状と今後
柳井啓司
人工知能学会, Jan. 2019, 人工知能学会誌, 34, 1, -, Japanese, Introduction scientific journal - Neural Style Vectorによる絵画画像のスタイル検索
松尾 真; 柳井 啓司
Jun. 2018, 画像ラボ, 29, 6, -, Japanese, Introduction commerce magazine - Research Trends in Food Media
Ide Ichiro; Yanai Keiji; Yamakata Yoko
The Institute of Image Information and Television Engineers, Nov. 2017, The Journal of The Institute of Image Information and Television Engineers, 71, 6, 768-772, Japanese, 1342-6907, 1881-6908, 201702260037003336, 40021375737, AN10588970 - CNNを用いた高速モバイル画像認識エンジンの自動生成フレームワーク
丹野良介; 柳井啓司
日本工業出版, Apr. 2017, 画像ラボ, 28, 4, 31-38, Japanese, Invited, Introduction commerce magazine, 0915-6755, 40021174186, AN10164169 - 食事動作認識によるリアルタイム食事記録システム
岡元晃一; 柳井啓司
日本工業出版, Oct. 2015, 画像ラボ, 26, 10, 1-7, Japanese, Invited, Introduction commerce magazine, 0915-6755, 40020612817, AN10164169 - ウェアラブルカメラ映像の自動要約による道案内映像の自動作成
岡本昌也; 柳井啓司
日本工業出版, Oct. 2015, 画像ラボ, 26, 10, 8-15, Japanese, Invited, Introduction commerce magazine, 0915-6755, 40020612826, AN10164169 - CNNを用いた複数品食事画像の領域分割とカロリー推定 (データ工学)
下田 和; 柳井 啓司
電子情報通信学会, 24 Sep. 2015, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 115, 230, 65-70, Japanese, 0913-5685, 40020617008, AN10012921 - Region Segmentation with Convolutional Neural Network by Weakly-Supervised Training
下田 和; 柳井 啓司
電子情報通信学会, 14 Sep. 2015, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 115, 225, 149-154, Japanese, 0913-5685, 40020617396 - 高速食事画像判別器を用いたTwitter食事画像分析とデータセット自動拡張
河野憲之; 柳井啓司
日本工業出版, Dec. 2014, 画像ラボ, 25, 12, --54, Japanese, Invited, Introduction commerce magazine, 0915-6755, 40020284287, AN10164169 - Twitterからのジオタグ画像収集による視覚的イベント検出
金子昴夢; 柳井啓司
日本工業出版, Nov. 2014, 画像ラボ, 25, 11, --28, Japanese, Invited, Introduction commerce magazine, 0915-6755, 40020246600, AN10164169 - 画像メディア技術の実世界画像データへの応用
柳井 啓司
Dec. 2013, 画像ラボ, 24, 12, Japanese, Introduction other - ラーメンvsカレー : 2年分のログデータと高速食事画像認識エンジンを用いたTwitter食事画像分析とデータセット自動構築 (パターン認識・メディア理解)
河野 憲之; 柳井 啓司
As Twitter has come into wide use, it has become possible to analyze people's behavior and thoughts through the huge number of posted tweets. Many tweets carry images, and at lunch and dinner time in particular large numbers of food photos are posted. In this paper, we report experiments in which 100 kinds of food photos were extracted from about one billion image tweets collected over the two years and four months from May 2011 to August 2013, using food-keyword search and a fast food-image recognition engine. We present food-photo rankings, evaluate extraction accuracy by sampling for some food categories, and analyze the regional distribution of "ramen" and "curry" using geotagged food-photo tweets. We further describe a framework for automatically extending the 100-category food image dataset we constructed: using a food-image classifier built from the 100-category data and crowdsourcing via Amazon Mechanical Turk, it automatically builds an image dataset with bounding boxes for a new food category given only a keyword. In the experiments, we compare recognition accuracy against a subset of an existing manually built food image dataset., 一般社団法人電子情報通信学会, 03 Oct. 2013, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 113, 230, 59-64, Japanese, 0913-5685, 110009824980, AN10541106 - FoodCam : スマートフォン上でのリアルタイム食事画像認識による食事記録アプリケーション (データ工学)
河野 憲之; 柳井 啓司
In recent years, smartphones have become widespread and their performance has improved. Conventional smartphone applications send data to a server and perform image recognition on the server side, which incurs communication costs and, as the number of users grows, demands large computing resources. Performing the image processing on the smartphone itself is therefore desirable. In this paper, we propose a method for fast, accurate generic object recognition on a smartphone with limited computing resources, and we improve the image-recognition component of a conventional recognition-based food-recording application. Experiments confirmed that the proposed method greatly outperforms the conventional one in recognition performance while remaining fast, and that it achieves higher recognition performance than a computationally expensive server-side recognition method, demonstrating its effectiveness., 一般社団法人電子情報通信学会, 12 Sep. 2013, 電子情報通信学会技術研究報告 = IEICE technical report : 信学技報, 113, 214, 13-18, Japanese, 0913-5685, 110009815150, AN10012921 - 料理画像認識を用いたモバイル食事記録システム
河野憲之; 柳井啓司
Smartphone performance has improved greatly in recent years. We propose a system that performs food image recognition in real time on a smartphone, using only the smartphone's computing resources, to assist users in recording their meals. It classifies 50 kinds of dishes with fast χ² kernel SVMs over color histograms and Bag-of-SURF features; when the user indicates the region to be recognized, it refines that region in the background with GrabCut, and, in case the recognition result is wrong, it directs the user, based on the SVM scores, to move the camera in a direction expected to improve the result. When the correct food region was given and five dish candidates were presented, the system achieved a classification rate of 81.55%; a user study further verified its usefulness. (A schematic sketch of this feature-plus-kernel pipeline appears after this MISC list.), 23 May 2013, 研究報告コンピュータビジョンとイメージメディア(CVIM), 2013, 4, 1-8, Japanese, 170000077117, AA11131797 - ウェアラブルカメラを用いた道案内映像の自動作成
岡本昌也; 柳井啓司
With the recent spread of wearable cameras worn on the head or chest, first-person-view videos are increasingly being recorded. This work takes a first-person video of a walking route as input and automatically generates a summary video whose playback speed changes dynamically with scene importance, so that the route remains easy to follow. Two processes drive the summarization: crosswalk detection and ego-action classification. Crosswalk detection estimates intersections and branch points by detecting crosswalks appearing in the video; ego-action classification classifies the wearer's action into four classes: "moving forward", "stopping", "turning right", and "turning left". The summary video is generated by integrating the two results and dynamically controlling the playback speed. In the experiments, we evaluated the accuracy of crosswalk detection and ego-action classification, generated summary videos for three videos, and showed through comparative evaluation that the proposed method outperforms a simple summarization method., 23 May 2013, 研究報告コンピュータビジョンとイメージメディア(CVIM), 2013, 5, 1-8, Japanese, 170000077118, AA11131797 - Improvement of food image recognition using local feature extraction by k-means and dish detector
MATSUDA Yuji; YANAI Keiji
In our previous work, we proposed a method to recognize multiple-food images by detecting candidate regions with several food region detectors, including a circle detector, JSEG region segmentation, and sliding-window search with the Deformable Part Model. In this paper, we improve food image recognition using local feature extraction by k-means and a dish detector., The Institute of Electronics, Information and Communication Engineers, 14 Mar. 2013, Technical report of IEICE. PRMU, 112, 495, 157-161, Japanese, 0913-5685, 110009713432, AN10541106 - Large-scale Web Video Shot Ranking Exploiting Visual Features and Tag Co-occurrence
HANG DO Nga; YANAI Keiji
In this paper, we propose a novel ranking method which aims to automatically detect relevant Web video shots of specific actions, using visual links between the video shots as well as textual links between the videos and their tags. Our method adopts content-based features to reduce tag noise and exploits the refined tags to improve the VisualRank method, which employs only visual features. We conduct large-scale experiments and show the effectiveness of the proposed method over the baseline. (A generic random-walk ranking sketch in this spirit appears after this MISC list.), The Institute of Electronics, Information and Communication Engineers, 21 Feb. 2013, Technical report of IEICE. PRMU, 112, 441, 221-226, Japanese, 0913-5685, 110009728829, AN10541106 - Webマルチメディア情報と物体認識技術
柳井 啓司
Jul. 2012, 画像ラボ, 23, 7, -, Japanese, Introduction other - 食材画像認識を用いたレシピ推薦システム
丸山 拓馬; 秋山 瑞樹; 柳井 啓司
In this paper, we propose a recipe recommendation system that uses image recognition on a mobile device. The system performs generic object recognition of food ingredients in real time on the smartphone; since recipes are suggested one after another simply by pointing the smartphone at an ingredient, recipe search becomes more intuitive and easier than with conventional key-input-only systems. As the recognition method, we adopt color histograms and a Bag-of-Features representation based on SURF. In experiments on 30 kinds of ingredients, we compared the system with a touch-operated recipe search system and carried out a user evaluation; the image recognition identified the target ingredient with 44.9% accuracy, rising to 80.9% when the top five candidates are considered., 一般社団法人電子情報通信学会, 05 Mar. 2012, 電子情報通信学会技術研究報告. MVE, マルチメディア・仮想環境基礎, 111, 479, 43-48, Japanese, 0913-5685, 110009546395, AN10476092 - 候補領域推定による複数品目に対応した食事画像認識
甫足創; 松田裕司; 柳井啓司
20 Jul. 2011, 画像の認識・理解シンポジウム(MIRU2011)論文集, 2011, 234-240, Japanese, 170000067219 - Mining Specific Actions from Youtube Video with Spatio-Temporal Features
DO HANG NGA; YANAI Keiji
In this paper, we present a new method of automatically extracting, from tagged Web videos, the video shots that correspond to specific actions by just inputting action keywords such as "walking" or "eating"., The Institute of Electronics, Information and Communication Engineers, 10 Feb. 2011, IEICE technical report, 110, 414, 159-164, Japanese, 0913-5685, 110008690104, AN10541106 - 物体認識技術の進歩
柳井啓司
May 2010, 日本ロボット学会誌, 28, 3, 257-260, Japanese, Introduction other - Bag-of-Features に基づく物体認識(2)-一般物体認識-
柳井啓司
アドコム・メディア, 2010, コンピュータビジョン最先端ガイド, 3, 85-117, 20001476662 - Multiple Kernel Learningを用いた50種類の食事画像分類
上東太一; 柳井啓司
日本工業出版, Jan. 2010, 画像ラボ, 21, 1, 12-18, Japanese, Introduction other, 0915-6755, 40016947602, AN10164169 - セマンティックギャップを超えて―画像・映像の内容理解に向けて―
井手一郎; 柳井啓司
人工知能学会, Sep. 2009, 人工知能学会誌, 24, 5, 691-699, Japanese, Introduction other, 0912-8085, 110007340609, AN10067140 - Bag-of-Featuresによるカテゴリー分類
柳井 啓司
日本工業出版, Jan. 2009, 画像ラボ, 20, 1, 59-64, Japanese, Introduction other, 0915-6755, 40016422766, AN10164169 - カメラが情景を理解する --シーンの意味的分類技術の最先端--
柳井啓司
Jul. 2008, 映像メディア学会誌, 62, 7, 40-45, English, Peer-reviewed, Introduction other - A Web Image-Gathering System Employing Semi-Supervised Learning
柳井 啓司
人工知能学会, 2005, 人工知能学会全国大会論文集, 19, 1-4, Japanese, 1347-9881, 40020231017, AA11578981 - WWWからの高速画像収集と収集画像を用いた画像認識の試み
柳井啓司
2001, 第15回人工知能学会全国大会講演論文集, 2001, 80012673207 - A Multi-resolution Image Understanding System Based on Multi-agent Architecture
YANAI Keiji; DEGUCHI Koichiro
人工知能学会, 15 Jun. 1999, 人工知能学会全国大会論文集 = Proceedings of the Annual Conference of JSAI, 13, 463-466, Japanese, 0914-4293, 10009927951, AN10258229 - マルチエージェントによる画像理解システム構成法
柳井啓司; 出口光一郎
日本工業出版, 1998, 画像ラボ, 9, 8, 15-19, Japanese, Introduction other, 0915-6755, 40005023964, AN10164169 - An Experiment of Placing 3D Object Model to the 3D Space Using Results of Object Recognition
柳井 啓司; 出口 光一郎
Estimating the position and pose of a 3D object in a single 2D image, given accurate 3D shape models of the object candidates expected to appear, is known as model-based recognition and has been studied extensively. For real-world images, however, accurate object shapes generally cannot be provided in advance, and recognition of the real world is usually carried out at the level of generic object names: for a single image, recognition is typically realized by segmenting the image into regions and labeling them using region features and inter-region relations. Consequently, even though the real world is three-dimensional, processing proceeds two-dimensionally, results are output as region labels, and no 3D information is obtained. Yet humans can perceive depth even from a single 2D image of a familiar scene, presumably because they know the typical 3D structure of each object in the scene. Based on this idea, we attempted to recover 3D information such as the relative orientation and size of each object by fitting prototype 3D models, prepared in advance for each object class (the set of objects sharing a generic name), to the image using the results of region-based object recognition. This presentation describes the concept and simple experimental results., 24 Sep. 1997, 全国大会講演論文集, 55, 295-296, Japanese, 110002891448, AN00349328 - Integration of Various Object Recognition Programs Based on Multi-agent Architecture
YANAI Keiji; DEGUCHI Koichiro
We propose a multi-agent architecture for an object recognition system that realizes flexible recognition of real-world images. In this architecture, each agent recognizes only its own kind of object, and the total system is constructed as an assembly of such agents, so that recognition of various objects is achieved through cooperation among the agents. Each agent consists of an object recognition program and a communication module. Using relational knowledge such as the relative locations and sizes of objects, the agents confirm their recognitions and resolve conflicts. We implemented this system on the highly parallel computer AP1000+. This paper describes an outline of the architecture and the results of recognition experiments with the implemented system., The Institute of Electronics, Information and Communication Engineers, 20 Aug. 1997, IEICE technical report. Computer systems, 97, 226, 31-38, Japanese, 110003179856, AN10013141 - Implementation of Object Recognition System Based on Multi-agent Architecture by Integrating Shape Knowledge and Relational Knowledge
YANAI Keiji; DEGUCHI Koichiro
24 Jun. 1997, 人工知能学会全国大会論文集 = Proceedings of the Annual Conference of JSAI, 11, 408-411, Japanese, 0914-4293, 10011367423, AN10258229 - A Consideration of Knowledge about Objects on the Object Recognition System employing Multiagent Architecture
柳井 啓司; 出口 光一郎
Conventional object recognition systems have mainly been model-based: the shapes of target objects are encoded as numerical knowledge bases and used for recognition. Such methods are effective when the exact shapes of the targets are known in advance, but for real-world images the exact shape usually cannot be known beforehand, so building shape models from absolute numerical values is difficult and not necessarily effective. In an object recognition system for real images built on a multi-agent architecture, we therefore tried to make active use not only of the shapes of target objects but also of the relative relations between objects as knowledge. This paper reports on those experiments., 12 Mar. 1997, 全国大会講演論文集, 54, 85-86, Japanese, 110002890902, AN00349328 - Experiment of Function-based Object Recognition System employing Multiagent Architecture
柳井 啓司; 出口 光一郎
Conventional model-based object recognition research has targeted objects whose shapes are known in advance. Humans, however, can roughly recognize what an object is, even one whose shape they see for the first time, by inferring its function; recognition closer to the human kind requires recognizing unknown objects through such inference. Function-based object recognition, which identifies an object from the functions it affords, has recently been proposed as a new approach for realizing this kind of recognition. Meanwhile, image understanding by distributed cooperative processing, which applies ideas from distributed AI and multi-agent systems to the construction of image understanding systems, has also been proposed; introducing distributed cooperative processing makes recognition that exploits relations between objects in an image possible. In this presentation, we outline a plan to apply function-based object recognition, realized by distributed cooperative processing, to object recognition in real images of indoor environments composed of artificial objects, and we introduce the experimental system we are prototyping., 04 Sep. 1996, 全国大会講演論文集, 53, 235-236, Japanese, 110002887722, AN00349328
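Several of the mobile food-recognition entries above (FoodCam and the mobile food-recording system) describe classifying dishes with fast χ² kernel SVMs over a color histogram combined with a bag of local features. The sketch below illustrates that feature-plus-kernel recipe under stated substitutions, and is not the published implementation: ORB replaces the patent-encumbered SURF, scikit-learn's exponential χ² kernel stands in for the papers' fast χ² kernel, and the codebook size `N_WORDS` is an assumed parameter.

```python
# Illustrative sketch (assumptions noted above), not the FoodCam code:
# chi-squared-kernel SVM over color histogram + bag-of-local-features.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

N_WORDS = 256  # visual-word codebook size (assumed)


def local_descriptors(img_bgr):
    """Local descriptors; ORB stands in for SURF here."""
    orb = cv2.ORB_create(nfeatures=300)
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    return np.zeros((1, 32), np.float32) if desc is None else desc.astype(np.float32)


def encode(img_bgr, codebook):
    """Concatenate an L1-normalized color histogram and BoF histogram."""
    color = cv2.calcHist([img_bgr], [0, 1, 2], None, [4, 4, 4],
                         [0, 256, 0, 256, 0, 256]).ravel()
    words = codebook.predict(local_descriptors(img_bgr))
    bof = np.bincount(words, minlength=N_WORDS).astype(float)
    return np.concatenate([color / max(color.sum(), 1.0),
                           bof / max(bof.sum(), 1.0)])


def train(images, labels):
    # Build the visual-word codebook from all local descriptors, then
    # train a chi-squared-kernel SVM over the combined histograms.
    codebook = KMeans(n_clusters=N_WORDS, n_init=3).fit(
        np.vstack([local_descriptors(im) for im in images]))
    X = np.stack([encode(im, codebook) for im in images])
    clf = SVC(kernel=chi2_kernel).fit(X, labels)
    return codebook, clf
```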
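The shot-ranking entries above (the large-scale Web video shot ranking report and VisualTextualRank) combine visual links between shots with tag information. As a generic illustration of this family of methods, not the proposed algorithm itself, the sketch below runs a VisualRank-style damped random walk over an assumed shot-similarity matrix, using tag relevance scores as the teleport prior.

```python
# Generic VisualRank-style sketch (not VisualTextualRank itself):
# power iteration of r = alpha * W r + (1 - alpha) * p over shot similarities.
import numpy as np


def visual_rank(sim, tag_prior, alpha=0.85, iters=100):
    w = sim / np.maximum(sim.sum(axis=0, keepdims=True), 1e-12)  # column-normalize
    p = tag_prior / tag_prior.sum()        # tag-based teleport distribution
    r = np.full(len(p), 1.0 / len(p))      # uniform initial relevance
    for _ in range(iters):
        r = alpha * (w @ r) + (1 - alpha) * p  # damped random-walk update
    return r  # higher score = more relevant shot


# Toy usage: four shots, pairwise visual similarities, tag relevance prior.
sim = np.array([[0.0, 0.9, 0.1, 0.2], [0.9, 0.0, 0.1, 0.3],
                [0.1, 0.1, 0.0, 0.8], [0.2, 0.3, 0.8, 0.0]])
print(visual_rank(sim, np.array([1.0, 1.0, 0.2, 0.2])))
```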
Books and other publications
- IT Text 深層学習
柳井 啓司; 中鹿 亘; 稲葉 通将
Scholarly book, Japanese, Joint work, オーム社, Nov. 2022 - レクチャー マルチメディア: 基礎からわかる音・画像・映像の情報処理
川崎 洋; 柳井 啓司; 佐川 立昌; 森山 剛; 古川 亮
Scholarly book, Japanese, Joint work, 数理工学社, 29 Mar. 2022 - 光学辞典
Dictionary or encyclopedia, Japanese, Contributor, 物体認識 (P.204-280), 朝倉書店, 2014 - 総合コミュニケーション科学シリーズユニーク&エキサイティング サイエンス III
General book, Japanese, Joint work, 「Web 情報マイニングへの招待 –Web 上の画像・映像から面白いこと,役に立つことを発見する!–」, 近代科学社, 2014 - Multimedia Information Extraction
Keiji Yanai; Hidetoshi Kawakubo; Kobus Barnard
English, Joint work, Entropy-based Analysis of Visual and Geo-location Concepts in Images, IEEE Computer Society Press, 2011 - Handbook of Social Network Technologies and Applications
Keiji Yanai; Bingyu Qiu
English, Joint work, Mining Regional Representative Photos from a Consumer-Generated Geotagged Photo Database, Springer, 2010 - コンピュータビジョン最先端ガイド III
柳井啓司
Japanese, Joint work, Bag-of-Featuresに基づく物体認識 -- 一般物体認識 --, アドコム・メディア株式会社, 2010 - Toward Category-Level Object Recognition
Kobus Barnard; Keiji Yanai; Matthew Johnson; Prasad Gabbur
English, Joint work, Cross Modal Disambiguation, Springer, Dec. 2006
Lectures, oral presentations, etc.
- 動画拡散モデルを用いた複数物体におけるゼロショット動作制御
梶凌太, 柳井啓司
画像の認識・理解シンポジウム(MIRU)
07 Aug. 2024 - RecipeSD: Injecting Recipe Embedding into Food Image Synthesis using Stable Diffusion
Jing Yang, Junwen Chen, Jingbin Xu and Keiji Yanai
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - 視覚言語モデルを用いたパノプティックシーングラフ生成
許敬斌,柳井啓司
画像の認識・理解シンポジウム(MIRU)
07 Aug. 2024 - Act-ChatGPT: 動作特徴を用いた対話型ビデオ理解モデル
中溝雄斗, 柳井啓司
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - 画像生成モデルによるカロリー量を考慮した食事画像編集
山本 耕平,大岸 茉由,柳井 啓司
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - 大規模視覚言語モデルと食品体積推定による食事画像からのカロリー量推定
田邊光, 柳井啓司
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - FontCLIPstyler: 言語によるシーンテキストスタイル変換
原 虹暉, 柳井 啓司
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - MM-DiT ベースの Stable Diffusion 3 によるゼロショット領域分割
山口廉斗, 柳井啓司
画像の認識・理解シンポジウム(MIRU)
07 Aug. 2024 - WaveFontStyler:音に基づくフォントスタイル変換
泉幸太,柳井啓司
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - 微分可能レンダラーを用いたロゴ画像生成
山倉隆太, 柳井啓司
画像の認識・理解シンポジウム(MIRU), Peer-reviewed
07 Aug. 2024 - テキストプロンプトによるベクター形式ロゴ画像生成
山倉隆太; 柳井啓司
情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
14 May 2024 - 大規模視覚言語モデルを用いた食事画像からのカロリー量推定
田邊光, 柳井啓司
情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
14 May 2024 - 対話型ビデオ理解モデルにおける動作特徴量の活用
中溝雄斗, 柳井啓司
情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
14 May 2024 - StableSeg: Stable Diffusionによるゼロショット領域分割
本部 勇真; 山口 廉斗; 柳井 啓司
画像の認識・理解シンポジウム (MIRU), Peer-reviewed
Jul. 2023 - 人物・物体・動作デコーダの分離によるHOI検出
陳 俊文; 王 瀛成; 柳井 啓司
電子情報通信学会パターン認識・メディア理解研究会(PRMU), Peer-reviewed
Jul. 2023 - VQ-VDM: ベクトル量子化変分オートエンコーダと 拡散モデルを用いた動画生成モデル
梶 凌太; 柳井 啓司
画像の認識・理解シンポジウム (MIRU), Peer-reviewed
Jul. 2023 - CalorieCam360: 全方位カメラによる複数人同時食事カロリー量推定システム
寺内 健人; 山本 耕平; 柳井 啓司
画像の認識・理解シンポジウム (MIRU), Peer-reviewed
Jul. 2023 - CLIPと微分可能レンダラーを用いたフォントスタイル変換
泉 幸太; 柳井 啓司
画像の認識・理解シンポジウム (MIRU)
Jul. 2023 - Stable Diffusionによるゼロショット画像領域分割
本部勇真; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会(PRMU)
03 Mar. 2023 - 深層距離学習の特許図面検索への適用
樋口幸太郎; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会(PRMU)
03 Mar. 2023 - 分離されたデコーダとノイズ除去学習を用いたHOI検出
陳 俊文; 王 瀛成; 柳井啓司
02 Mar. 2023 - 全方位カメラを用いた複数人食事動作同時認識
寺内健人; 柳井啓司
電子情報通信学会パターン認識・メディア理解研究会(PRMU)
02 Mar. 2023 - SetMealAsYouLike: Few-shot Segmentationによる 食事画像への皿領域マスクの追加 と 食事画像生成への応用
本部 勇真; 寺内 健人; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - 時間的一貫性を考慮した ビデオ会議のための自然な仮想試着
清水 大輝; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - Cross-Modal Recipe Embeddingを用いた マスクに基づく食事画像生成
陳仲 涛; 楊 景; 本部 勇真; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - 大規模マルチモーダルモデルCLIPを用いた画像形状変換
銭 雨晨; 山本 耕平; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - TransformerとLarge Batch Sizeを用いたクロスモーダルレシピエンベディング学習
楊 景; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - マルチスケールのアンカーを用いた人間と物体のインタラクション検出
陳 俊文; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - 単一RGB-D画像と陰関数表現を用いた食事と食器の実寸三次元再構成と体積推定
成冨 志優; 本部 勇真; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - Vision Transformerを用いたContinual Learning
武田 麻奈; 清水 大輝; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2022 - Vision TransformerにおけるContinual Learning
武田 麻奈; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - 陰関数表現とRGB-D画像を用いた実寸通り食事と食器の三次元再構成
成冨 志優; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - StyleGANによるCLIP-Guidedな画像形状特徴編集
銭 雨晨; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - クエリベースのアンカーを用いた人間と物体のインタラクション検出
陳 俊文; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - Transformerを用いた人物行動検出
水野 颯介; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - クロスモーダルレシピエンベディングによるマスクに基づく食事画像生成
陳 仲涛; 本部 勇真; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - Transformerを用いたクロスモーダルレシピ検索・画像生成
楊 景; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2022 - Adaptive Point-wise グループ化畳み込みを用いた 小規模データセットからの画像の生成
武田麻奈; 柳井啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2021 - 初期ポーズ生成の改良とGCNの導入による ポーズシーケンス生成モデルの拡張
寺内 健人; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2021 - 全体カロリー量のみがアノテーションされた 複数品食事画像の個別カロリー量推定
岡本 開夢; 足立 賢人; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2021 - 食事画像に対するFew/Zero-shot Segmentation
本部 勇真; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2021 - 食事画像に対する少数およびゼロショット領域分割
本部 勇真; 柳井 啓司
Poster presentation, Japanese, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), Domestic conference
May 2021 - 初期ポーズ生成の改良とGCNの導入によるポーズシーケンス生成モデルの拡張
寺内 健人; 柳井 啓司
Poster presentation, Japanese, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), Domestic conference
May 2021 - 単一画像からの食事(食器含む)と食器単体の三次元形状の同時復元を用いた食事領域の体積推定
成冨 志優; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンとイメージメディア研究会, Domestic conference
Oct. 2020 - 意味と形状の分離によるマルチモーダルレシピ検索及び画像生成
杉山優; 岡本開夢; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2020 - 単一画像変換ネットワークによる複数タスクと組み合わせタスクの学習
武田麻奈; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2020 - 単一食事画像からの皿と食事の同時分離形状復元
成冨志優; 柳井啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2020 - RamenStyleAsYouLike: 領域毎のスタイル特徴の融合による画像生成
岡本開夢; 下田和; 柳井啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2020 - 画素単位アノテーション付きの食事画像データセットの構築 と 認識・生成への応用
岡本開夢; 柳井啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2020 - 意味と形状の分離によるマルチモーダルレシピ検索及び画像生成
杉山 優; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2020 - ラーメンスタイルエンコーダーを用いたスタイル特徴とマスク画像からの画像生成
趙 宰亨; 下田 和; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU)
Mar. 2020 - 映像・音・センサー情報の統合によるレスキュー犬の1人称行動認識
井出佑汰; 荒木勇人; 濱田龍之介; 大野和則; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2020 - 皿領域の推論を活用した食事の弱教師あり領域分割
下田和; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会 (CEA), Domestic conference
Oct. 2019 - 食事画像領域分割データセットの作成とその活用
岡本開夢; Cho Jaehyeong; 會下拓実; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会 (CEA), Domestic conference
Oct. 2019 - 自己教師あり学習による変化領域の推論を活用した弱教師あり領域分割
下田 和; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - 米飯を基準としたCNNによる食事画像からのカロリー量推定
會下 拓実; Jaehyeong Cho; 松平 礼史; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - Ramen as You Like
Jaehyeong Cho; Wataru Shimoda; Keiji Yanai
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - レスキュー犬の一人称動画を用いた動作推定
荒木 勇人; 井出 佑汰; 濱田 龍之介; 大野 和則; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - DepthCalorieCam: 深度カメラと深層学習による自動食事カロリー量推定システム
安蒜 祥和; 會下 拓実; 岡本 開夢; 泉 裕貴; Jaehyeong Cho; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - 大量のTwitter位置情報付き画像を用いた世界各地域における食事傾向分析
岡本 開夢; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - Identityと化粧Styleの分離による顔画像変換
五味 京祐; 越野 誠也; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - 重み選択マスクを用いた画像変換ネットワークの連続学習
松本 晨人; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - SSA-GAN: Cloud Video Generation from a Single Image with Spatial Self-Attention Generative Adversarial Networks
Daichi Horita; Keiji Yanai
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - ウミネコ動画の自動分析
井出 佑汰; 水谷 友一; 依田 憲; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - ONNX2MPSNNGraph: モバイル深層学習コードジェネレータの実装と評価
泉 裕貴; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2019 - 米飯画像の実寸推定に基づく面積を考慮したカロリー量推定
會下 拓実; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム (DEIM), Domestic conference
Mar. 2019 - 深度付き画像と深層学習による食事カロリー量推定システムの開発
安蒜 祥和; 會下 拓実; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム (DEIM), Domestic conference
Mar. 2019 - 画像変換ネットワークによる連続学習
松本 晨人; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム (DEIM), Domestic conference
Mar. 2019 - 位置情報付きTwitter画像を用いた世界の食事傾向分析
岡本 開夢; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム (DEIM), Domestic conference
Mar. 2019 - Conditional GANによる化粧顔画像変換
五味 京祐; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム (DEIM), Domestic conference
Mar. 2019 - 変化領域の推測による弱教師あり領域分割の精度向上
下田 和; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU), Domestic conference
Mar. 2019 - 深層学習による太陽画像からの太陽黒点数の推定
樋口陽光; 會下拓実; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 総合大会, Domestic conference
Mar. 2019 - 教師情報に含まれるノイズに堅牢な弱教師あり領域分割手法
下田 和; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - NNによる料理検出とカロリー量推定のマルチタスク学習
會下 拓実; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - 単語情報を利用した画像の質感転送
杉山 優; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - Chainer2MPSGraph: 高速深層学習モバイルアプリ作成のためのモデルコンバータ
泉 裕貴; 堀田 大地; 丹野 良介; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - 画像マイニングを用いた Conditional Cycle GAN による食事画像変換
堀田 大地; 丹野 良介; 下田 和; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - CNNを用いた質感文字生成
成沢淳史; 下田和; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - AR 技術とモバイル深層学習を活用した 食事カロリー量推定
丹野良介; 會下拓実; Jaehyeong Cho; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2018 - 深層学習による質感文字生成
成沢 淳史; 下田 和; 柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会, Domestic conference
Jun. 2018 - 画像内容を考慮した質感表現に基づく画像変換
杉山 優; 柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会, Domestic conference
Jun. 2018 - 大量のTwitter画像を用いたConditional Cycle GANによる食事写真カテゴリ変換
堀田 大地; 成冨 志優; 丹野 良介; 下田 和; 柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会, Domestic conference
Jun. 2018 - 深層学習による画像認識革命
Invited oral presentation, Japanese, オプティメカトロニクス協会講演会, Invited, Domestic conference
14 Mar. 2018 - スタイル転移によるフォント画像変換
成沢淳史; 柳井啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), Domestic conference
Mar. 2018 - AR DeepCalorieCam: AR表示型食事カロリー量推定システム
丹野良介; 會下拓実; 柳井啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM), Domestic conference
Mar. 2018 - Conditional GANによる食事写真の属性操作
成冨志優; 堀田大地; 丹野良介; 下田和; 柳井啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM), Domestic conference
Mar. 2018 - Conditional GANを用いた大規模食事画像データからの画像生成
伊藤祥文; 丹野良介; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会(CEA), Domestic conference
Mar. 2018 - 単一の畳み込みネットワークによる料理検出とカロリー量推定のマルチタスク学習
會下拓実; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会(CEA), Domestic conference
Mar. 2018 - 會下拓実・丹野良介・柳井啓司
CNNによる複数品食事画像の同時カロリー推定とそのモバイル実装
Oral presentation, Japanese, 電子情報通信学会食メディア研究会(CEA), Domestic conference
Dec. 2017 - CoreMLによるiOS深層学習アプリの実装と性能分析
丹野 良介; 泉 裕貴; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU), Domestic conference
Oct. 2017 - CNNによる複数料理写真からの同時カロリー量推定
會下 拓実; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU), Domestic conference
Oct. 2017 - 食事画像カロリー量推定における回帰による手法と検索による手法の比較
會下拓実; 柳井啓司
Oral presentation, Japanese, 情報処理学会 コンピュータビジョンとイメージメディア研究会 (CVIM)
Sep. 2017 - 完全教師あり学習手法を用いた弱教師あり領域分割におけるシード領域生成方法の改良
下田 和; 柳井 啓司
Oral presentation, Japanese, 情報処理学会 コンピュータビジョンとイメージメディア研究会 (CVIM), Domestic conference
Sep. 2017 - Conditional GAN を用いた複数詳細カテゴリ画像の合成
伊藤 祥文; Jaehyeong Cho; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - 弱教師あり領域分割のための一貫性に基づく学習画像の領域分割容易性推定
下田 和; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - 画像スタイル変換とWeb画像を用いた画像の任意質感生成
松尾 真; 下田 和; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - Multi-task CNNを用いた食材および調理手順情報を利用した食事画像カロリー量推定
會下 拓実; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - Twitter画像に対する地域別画像タイプの大規模分析
長野 哲也; 會下 拓実; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - Neural Style TransferとCycle GANを利用したフォント変換
成沢 淳史; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - ConvDeconvNetの効率的モバイル実装による 画像変換・物体検出・領域分割リアルタイムiOSアプリ群
丹野 良介; 泉 裕貴; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - Unseen Style Transfer Network
Ryosuke Tanno; Keiji Yanai
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2017 - 画像を生成する深層学習ネットワーク ―領域分割と画像生成・変換―
柳井啓司; 下田和
Invited oral presentation, Japanese, 日本画像学会主催セミナー『人工知能における学習技術』, Invited, 日本画像学会, Domestic conference
04 Jul. 2017 - 食事レシピ情報を利用した食事画像からのカロリー量推定
會下 拓実; 柳井 啓司
Poster presentation, Japanese, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM), Domestic conference
11 May 2017 - 映像認識技術の現状 と 生物映像への適用可能性
柳井啓司
Invited oral presentation, Japanese, 生物学動画・音声アーカイブに関するシンポジウム, Invited, 大阪市立自然史博物館, Domestic conference
05 Mar. 2017 - 深層学習技術が引き起こした画像認識の大幅性能向上
柳井啓司
Invited oral presentation, Japanese, 「くらしの中の共生」第13回人類動態学会シンポジウム, Invited, 人類動態学会, Domestic conference
17 Dec. 2016 - 食事画像認識の現状と今後
柳井啓司
Invited oral presentation, Japanese, 電子情報通信学会データ工学研究会(DE), Invited, Domestic conference
01 Dec. 2016 - 深層学習による質感画像の認識・変換
柳井啓司
Invited oral presentation, Japanese, 質感のつどい 第2回公開フォーラム, Invited, Domestic conference
30 Nov. 2016 - ディープラーニングによる画像・映像の認識と生成
柳井啓司
Invited oral presentation, Japanese, 「映画のまち調布」映画・映像技術シンポジウム, Invited, 調布市役所、多摩信用金庫、国立大学法人電気通信大学, Domestic conference
25 Nov. 2016 - Neural Style Transferと領域分割による画像の部分的質感操作
松尾真; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU)
Sep. 2016 - 弱教師学習手法を用いたWebからの食事検出器の自動学習
下田和; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU), Domestic conference
Sep. 2016 - CNNの順・逆伝搬値とCRFを利用した弱教師領域分割
下田和; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2016 - Style Image Retrieval Using CNN-based Style Vector
Shin Matsuo; Keiji Yanai
Oral presentation, English, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2016 - DeepXCam: CNNによるリアルタイムモバイル画像認識・変換アプリ群
丹野 良介; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2016 - CNNを用いた商品札文字認識
成沢淳史; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2016 - Twitter食事画像からの詳細カテゴリ発見
伊藤 祥文; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Aug. 2016 - 質感画像の弱教師領域分割とその結果に基づく質感の部分的変換
下田 和; 松尾 真; 柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会
Jun. 2016 - スマートフォン上でのDeep Learningによる画像認識
丹野良介; 柳井啓司
Oral presentation, Japanese, 人工知能学会全国大会, Domestic conference
Jun. 2016 - 80msで認識可能な深層学習による2000種類物体認識iOSアプリ
丹野良介; 柳井啓司
Poster presentation, Japanese, 画像センシングシンポジウム (SSII)
Jun. 2016 - 値札文字認識による価格記録Webアプリケーション
成沢 淳史; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM), Domestic conference
Mar. 2016 - iOS上のDeep Learningによる食事画像認識アプリ
丹野 良介; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM), Domestic conference
Mar. 2016 - テレビ映像からの特定動作シーンの自動検出
小林 隼人; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM), Domestic conference
Mar. 2016 - 位置情報付き画像を用いた単語概念の時間変化の分析
ボルド ビレグサイハン; 及川 雄介; 伊藤 祥文; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM), Domestic conference
Mar. 2016 - スマートフォンによる食事画像からの自動カロリー量推定システム
岡元 晃一; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会, Domestic conference
Mar. 2016 - 料理写真撮影におけるおいしそうな構図決定および撮影支援モバイルアプリ
柿森 隆生; 岡部 誠; 柳井 啓司; 尾内 理紀夫
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会
Mar. 2016 - CNNとMILを用いた弱教師あり領域分割
下田 和; 柳井 啓司
Oral presentation, Japanese, 情報論的学習理論ワークショップ (IBIS), Domestic conference
27 Nov. 2015 - CNNを用いた複数品食事画像の領域分割とカロリー推定
下田和; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 データ工学研究会
25 Sep. 2015 - CNNを用いた弱教師学習による画像領域分割
下田和; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU), Domestic conference
15 Sep. 2015 - Web大規模画像データを用いた 画像とオノマトペの関係分析
柳井啓司; 下田和
Invited oral presentation, Japanese, FIT2015 第14回情報科学技術フォーラム, Invited, Domestic conference
Sep. 2015 - 料理写真撮影におけるおいしそうな構図決定を支援するシステム
柿森 隆生; 岡部 誠; 柳井 啓司; 尾内 理紀夫
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2015 - DeepFoodCam: DCNNによる101種類食事認識アプリ
岡元 晃一; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2015 - 画像特徴とテキスト特徴を用いた画像ツイートの位置推定
松尾 真; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU)
Jul. 2015 - シーン文字認識と自己動作分類を用いた車載動画の要約
佐藤 享憲; 成沢 淳史; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU)
Jul. 2015 - CNNの逆伝搬を利用した食事画像の領域分割
下田 和; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU)
Jul. 2015 - 画像の位置推定を用いたマイクロブログからの視覚的なイベント検出
金子 昂夢; 松尾 真; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU)
Jul. 2015 - Automatic Action Video Dataset Construction from Web using Density-based Cluster Analysis and Outlier Detection
Do Hang Nga; Keiji Yanai
Poster presentation, English, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2015 - DCNN特徴を用いたWeb からの質感画像の収集と分析
下田和; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU), Domestic conference
22 Jan. 2015 - Real-time Photo Mining from the Twitter Stream: Event Photo Discovery and Food Photo Detection
Keiji Yanai; Takamu Kaneko; Yoshiyuki Kawano
Invited oral presentation, English, IEEE International Symposium on Multimedia (ISM), Invited, International conference
Dec. 2014 - 大量の位置情報付きTwitter画像データからの視覚的イベント検出
金子 昂夢; 柳井 啓司
Poster presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2014 - 既存カテゴリの活用とクラウドソーシングによる食事画像データセットの自動拡張
河野 憲之; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU), Domestic conference
Jul. 2014 - Twitter上の位置情報付き画像を利用した リアルタイムイベント画像検出
金子 昂夢; 柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会, Domestic conference
May 2014 - クラウドソーシングによる食事画像データセットの自動構築
河野 憲之; 柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会, Domestic conference
May 2014 - Twitterからのジオタグ画像収集による視覚的イベント検出
金子 昂夢; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU)
Oct. 2013 - ラーメン vs カレー:2年分のログデータと高速食事画像認識エンジンを用いたTwitter食事画像分析とデータセット自動構築
河野 憲之; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU)
Oct. 2013 - FoodCam:スマートフォン上でのリアルタイム食事画像認識による食事記録アプリケーション
河野 憲之; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 データ工学研究会(DE)
Sep. 2013 - VisualTextualRank: A Video Shot Ranking Method Using Visual Similarity and Tag Co-occurrence
Do Hang Nga; Keiji Yanai
Oral presentation, English, 画像の認識・理解シンポジウム(MIRU2013)
Jul. 2013 - スマートフォン上でのリアルタイム食事認識システム
河野 憲之; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2013)
Jul. 2013 - スマートフォン上でのリアルタイム食事・食材画像認識アプリケーション
河野 憲之; 丸山 拓馬; 柳井 啓司
Oral presentation, Japanese, 画像センシングシンポジウム (SSII)
Jun. 2013 - ウェアラブルカメラを用いた道案内映像の自動作成
岡本 昌也; 柳井 啓司
Oral presentation, Japanese, 情報処理学会 コンピュータビジョンとイメージメディア研究会 (CVIM)
May 2013 - 料理画像認識を用いたモバイル食事記録システム
河野 憲之; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
Apr. 2013 - k-meansによる局所特徴量抽出と皿検出器による食事画像認識の改良
松田 裕司; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU)
Mar. 2013 - 食事認識を用いたモバイル食事管理システム
河野 憲之; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM)
Mar. 2013 - 位置情報付き画像ツイートを利用した視覚的なイベント検出
金子 昂夢; 柳井 啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM)
Mar. 2013 - クラウドソーシングによる食事画像認識モデルの自動構築
大澤翔吾; 柳井啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM)
Mar. 2013 - 視覚特徴およびタグ共起を用いた大規模Webビデオショットランキング
Do Hang Nga; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会 パターン認識・メディア理解研究会(PRMU)
Feb. 2013 - TRECVID Semantic Indexing Task と Multimedia Event Detection Taskへの取り組み
樋爪 和也; 柳井 啓司
Oral presentation, Japanese, ビジョン技術の実利用ワークショップ (ViEW2012)
Dec. 2012 - Web動画・画像を用いた特定動作の対応ショットの自動抽出
Do Hang Nga; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2012)
Aug. 2012 - 料理間の共起関係を考慮した食事画像認識
松田裕司; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2012)
Aug. 2012 - タグの組み合わせによる視覚的な関連性変化の分析
小原 侑也; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2012)
Aug. 2012 - 食材画像認識を用いたモバイルレシピ推薦システム
丸山 拓馬; 樋爪 和也; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2012)
Aug. 2012 - Automatically Extracting Relevant Video Shots of Specific Actions from the Web
Do Hang Nga; Keiji Yanai
Invited oral presentation, English, Greater Tokyo Area Multimedia/Vision Workshop, Tokyo, Japan, International conference
Aug. 2012 - Web上の大量画像を用いた名詞と形容詞の関係分析
小原侑也; 柳井啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
May 2012 - 物体認識技術を用いたモバイル物品管理システム
望月宏史; 柳井啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM)
Mar. 2012 - テレビ番組からの位置情報付き旅行映像データベースの自動構築
向井康貴; 柳井啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM)
Mar. 2012 - 食材画像認識を用いたモバイルレシピ推薦システム
丸山拓馬; 秋山瑞樹; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会 食メディア研究会(CEA)
Mar. 2012 - 服飾画像マイニングのための衣類領域からの色情報抽出
相田優; 柳井啓司; 柴原一友; 藤本浩司
Oral presentation, 電子情報通信学会 画像工学研究会(IE)
Mar. 2012 - 特徴点選択とペア化による Naive-Bayes Nearest-Neighbor手法の改良
秋山 瑞樹; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
Mar. 2012 - Web動画・画像を用いた特定動作ショットの自動収集
Do Hang Nga; 樋爪 和也; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会(CVIM)
Mar. 2012 - Deformable Part Modelを用いた料理の位置検出
松田裕司; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Dec. 2011 - 物体認識技術を用いた食事画像認識
柳井啓司
Invited oral presentation, Japanese, 人工知能セミナー「食とAI~消費・小売流通・生産の立場から~」, Domestic conference
Oct. 2011 - Webマルチメディア と 物体認識
柳井啓司
Invited oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会, Domestic conference
Oct. 2011 - 一般物体認識技術の発展
柳井 啓司
Invited oral presentation, Japanese, 精密工学会 画像応用技術専門委員会 研究会, Domestic conference
Sep. 2011 - 大量のWeb動画からの教師なし特定動作ショット抽出
DoHang Nga; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2011)
Jul. 2011 - GeoVisualRank を用いた単語概念の地域性の分析
川久保 秀敏; 樋爪 和也; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2011)
Jul. 2011 - 候補領域推定による複数品目に対応した食事画像認識
甫足 創; 松田 裕司; 柳井 啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2011)
Jul. 2011 - マルチフレームを用いた動画像認識
樋爪 和也; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンとイメージメディア研究会(CVIM)
May 2011 - 地域別代表画像を用いた単語概念の地域性の分析
川久保 秀敏; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会(CVIM)
Mar. 2011 - 候補領域推定による食事画像の複数品目認識
甫足 創; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会(CVIM)
Mar. 2011 - Bag-of-frames と時空間特徴量を用いたSemantic Indexing Taskへの取り組み
下田 保志; 野口 顕嗣; 柳井 啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU)
Feb. 2011 - 時空間特徴量を用いたWeb 動画からの特定動作ショットの自動抽出
Do Hang Nga; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会(PRMU)
Feb. 2011 - 写真撮影の位置軌跡を利用した旅行支援システム
奥山 幸也; 柳井啓司
Oral presentation, Japanese, データ工学と情報マネジメントに関するフォーラム(DEIM)
Feb. 2011 - Webマルチメディアと物体認識
柳井啓司
Invited oral presentation, Japanese, NHK放送技術研究所セミナー, Domestic conference
Dec. 2010 - Folksonomyを用いた画像特徴とタグ共起に基づく画像オントロジーの自動構築
秋間雄太; 川久保秀敏; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2010)
Jul. 2010 - Web上の大量画像を用いた特定物体認識手法による一般物体認識
秋山瑞樹; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2010)
Jul. 2010 - 動作認識のための時空間特徴量と特徴統合手法の提案
野口顕嗣; 下田保志; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2010)
Jul. 2010 - ジオタグ画像認識における位置情報の利用法の検討と分析
八重樫恵太; 丸山拓馬; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2010)
Jul. 2010 - 位置情報を考慮したVisualRankによる地域別代表画像の選出
川久保秀敏; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2010)
Jul. 2010 - 画像・映像の認識と意味的検索
柳井啓司
Invited oral presentation, Japanese, 電子情報通信学会 音声研究会(SP), Domestic conference
Jul. 2010 - 一般物体認識における機械学習の利用
柳井啓司
Invited oral presentation, Japanese, 電子情報通信学会情報論的学習と機械学習研究会(IBIS-ML), Domestic conference
Jun. 2010 - Web 上の大量画像を用いた特定物体認識手法による一般物体認識
秋山 瑞樹; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2010 - 位置情報付き路上画像の撮影方向推定システムの提案
丸山拓馬; 柳井啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
May 2010 - ジオタグ画像認識における周辺テキスト情報の有効性の検証
八重樫恵太; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2010 - Web動画ショットの動作分類のための時空間特徴抽出手法の提案
野口 顕嗣; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2010 - Folksonomyによる階層構造画像データベースの構築
秋間 雄太; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2010 - 多種類特徴統合による動作認識手法の提案
野口 顕嗣; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Mar. 2010 - 【チュートリアル】一般物体認識
柳井啓司
Invited oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会, Invited, Domestic conference
Nov. 2009 - VisualRankにおける位置情報活用の検討
川久保秀敏; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Nov. 2009 - マルチカーネル学習を用いた画像特徴と航空写真特徴の重要度の推定
八重樫恵太; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Nov. 2009 - 位置情報付き写真における撮影位置の航空写真を利用した画像認識
八重樫恵太; 柳井啓司
Oral presentation, Webとデータベースに関するフォーラム (WebDB Forum 2009)
Nov. 2009 - 一般物体認識技術の発展と映像検索への応用
柳井 啓司
Invited oral presentation, Japanese, FIT2009 第8回情報科学技術フォーラム
Sep. 2009 - Web画像マイニング -Web上の膨大な画像データからの知識発見-
柳井 啓司
Invited oral presentation, Japanese, FIT2009 第8回情報科学技術フォーラム
Sep. 2009 - 動きの連続性を考慮した動画からの局所的な時空間特徴の抽出
野口顕嗣; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2009)
Aug. 2009 - 単語概念の視覚性と地理的分布の関係性の分析
川久保秀敏; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2009)
Aug. 2009 - 画像特徴とテキスト特徴の統合によるWebスポーツニュース画像のイベント分類
北原章雄; 奥山幸也; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2009)
Aug. 2009 - Multiple Kernel Learningによる50種類の食事画像の認識
上東太一; 甫足創; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2009)
Aug. 2009 - Multiple Kernel Learningを用いた食べ物画像の認識
上東太一; 柳井啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2009 - TRECVID高次特徴抽出タスクにおける多種類特徴統合手法の比較
湯 志遠; 柳井啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2009 - 画像特徴とテキスト特徴を用いたWebスポーツニュース画像のイベント分類
北原章雄; 柳井啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2009 - Bag-of-Features表現を用いたエントロピーによる単語の視覚性の分析
川久保秀敏; 柳井啓司
Oral presentation, English, 情報処理学会コンピュータビジョンイメージメディア研究会
Mar. 2009 - 現状および今後の物体認識技術のデジタルカメラへの応用の可能性
柳井啓司
Invited oral presentation, Japanese, オプティメカトロニクス協会講演会, オプティメカトロニクス協会
Mar. 2009 - 一般物体認識の現状と今後の展望
柳井 啓司
Invited oral presentation, Japanese, 筑波大学システム情報工学研究科コンピュータサイエンス専攻 学術講演会, 筑波大学システム情報工学研究科 コンピュータサイエンス専攻
Mar. 2009 - Mining Regional Representative Photos from a Large-scale Geotagged Image Database
Qiu BINGYU; Keiji YANAI
Oral presentation, English, 電子情報通信学会パターン認識・メディア理解研究会
Dec. 2008 - 一般物体認識と機械学習
柳井啓司
Invited oral presentation, Japanese, T-PRIMAL セミナー
Aug. 2008 - 色・動き・音特徴を用いたEarth Mover’s Distance に基づくWeb動画検索
高田圭佑; 柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2008)
Jul. 2008 - 確率トピックモデルによるWeb画像の分類
柳井 啓司
Oral presentation, Japanese, 人工知能学会全国大会,http://www.ai-gakkai.or.jp/jsai/conf/2008/
Jun. 2008 - Bag-of-keypointsによるカテゴリー分類
柳井 啓司
Invited oral presentation, Japanese, 画像センシングシンポジウム(SSII)
Jun. 2008 - 色,動き,顔特徴に基づくTRECVIDラッシュ映像の自動要約
野口 顕嗣; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2008 - 撮影位置の情報を用いた一般画像認識の可能性の検討
八重樫 恵太; 柳井 啓司
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2008 - 多種類特徴の統合によるTRECVID映像の認識
劉謳南; 柳井啓司
Oral presentation, Japanese, 情報処理学会全国大会
Mar. 2008 - クラスタリングによるTRECVIDラッシュ映像の要約
野口顕嗣; 柳井啓司
Oral presentation, Japanese, 情報処理学会全国大会
Mar. 2008 - Earth Mover's Distance を用いた類似Web動画検索
高田圭佑; 柳井啓司
Oral presentation, Japanese, 情報処理学会全国大会
Mar. 2008 - Bag-of-Keypointsを用いたWeb画像収集の高精度化
柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2007)
Aug. 2007 - Web画像ニュースの顔と人物名の対応付け
北原章雄; 柳井啓司
Oral presentation, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2007 - Bag-of-Keypoints表現を用いたWeb画像分類
上東太一; 柳井啓司
Oral presentation, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2007 - Bag-of-keypointsによるTRECVIDデータに対する映像認識
湯志遠; 柳井啓司
Oral presentation, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2007 - 位置情報を用いた旅行自動記録システム
阿久津剛之; 柳井啓司
Oral presentation, Japanese, 情報処理学会全国大会
Mar. 2007 - Web写真ニュースの分類と検索
伊與田達也; 柳井啓司
Oral presentation, Japanese, 情報処理学会全国大会
Mar. 2007 - 画像付きニュース記事からの顔と人物名の抽出
北原章雄; 柳井啓司
Oral presentation, Japanese, 情報処理学会全国大会
Mar. 2007 - 一般物体認識の現状と今後
柳井 啓司
Invited oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会
Sep. 2006 - Web画像マイニング: Webからの画像知識の獲得 と その応用
柳井 啓司
Invited oral presentation, Japanese, 電気関係学会東海支部連合大会
Sep. 2006 - 確率モデルを用いたWeb画像マイニングによる画像認識
柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU2006)
Jul. 2006 - 確率モデルを用いたWeb画像マイニングによる画像認識
柳井啓司
Oral presentation, Japanese, 人工知能学会全国大会
Jun. 2006 - テキスト特徴と画像特徴を併用したWeb上の写真付きニュース記事のクラスタリング
伊与田達也; 阿久津剛之; 柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Jun. 2006 - 一般物体認識のための単語概念の視覚性の分析
柳井啓司; Kobus Barnard
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会
Jan. 2006 - Evaluation Strategies for Image Understanding and Retrieval
Keiji Yanai; Nikhil V. Shirahatti; Prasad Gabbur; Kobus Barnard
Invited oral presentation, English, ACM Multimedia Workshop on Multimedia Information Retrieval, Singapore, International conference
Nov. 2005 - 実世界画像コーパス作成のための高精度Web画像収集 ―準教師あり学習を用いた画像選択―
柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU 2005)
Jul. 2005 - Semi-Supervised Learning を用いたWeb画像収集システム
柳井啓司
Oral presentation, Japanese, 第19回人工知能学会全国大会
Jun. 2005 - 対局盤面と解説盤面の認識結果の統合による囲碁対局番組からの棋譜自動生成
林山剛久; 柳井啓司; 野下浩平
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Mar. 2005 - Web Image Mining: Can We Gather Visual Knowledge for Image Recognition from the Web ?
Keiji Yanai
Invited oral presentation, English, Pacific-Rim International Conference on Multimedia, Singapore, International conference
Dec. 2003 - 実世界画像自動分類のためのWeb画像マイニング
柳井啓司
Oral presentation, Japanese, 第17回人工知能学会全国大会(2003年度)
Jun. 2003 - Web画像収集における単語ベクトルの導入と画像特徴の改良
柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
Jan. 2003 - Web画像を用いた一般画像の自動分類
柳井啓司
Oral presentation, Japanese, 画像の認識・理解シンポジウム(MIRU 2002)
Jul. 2002 - 囲碁テレビ番組からの棋譜自動生成システム
林山剛久; 柳井啓司; 野下浩平
Oral presentation, Japanese, 情報処理学会コンピュータビジョン・イメージメディア研究会
May 2002 - 反復深化探索に基く協力詰将棋の解法
星 由雄; 野下浩平; 柳井啓司
Oral presentation, Japanese, 情報処理学会ゲーム情報学研究会
Mar. 2001 - PCクラスタを用いたWWWからの高速画像収集システム
新藤雅也; 柳井啓司; 野下浩平
Oral presentation, English, 電子情報通信学会パターン認識・メディア理解研究会報告
Mar. 2001 - WWWからの高速画像収集 と 収集画像を用いた画像認識の試み
柳井啓司
Oral presentation, Japanese, 第15回人工知能学会全国大会
2001 - WWWからの大量の収集画像を用いた画像認識の試み
柳井啓司
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会報告
2001 - キーワードと画像特徴を利用したWWWからの画像収集の試み
柳井啓司
Oral presentation, Japanese, 第14回人工知能学会全国大会論文集
2000 - 解像度選択を用いた高解像度実画像に対する画像理解システム
柳井啓司; 出口光一郎
Oral presentation, Japanese, 画像の認識・理解シンポジウムMIRU 2000論文集
2000 - マルチエージェントによる多重解像度画像理解システム
柳井啓司; 出口光一郎
Oral presentation, Japanese, 第13回人工知能学会全国大会
1999 - 高解像度画像利用のためのマルチエージェントによる多重解像度画像理解システム
柳井啓司; 出口光一郎
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会
1999 - マルチエージェント画像理解システムにおける対象間の空間的関係に関する考察
柳井啓司
Oral presentation, Japanese, 第7回マルチ・エージェントと協調計算ワークショップ(MACC '98)
1998 - 定性的モデル当てはめと空間推論による室内画像の認識
柳井啓司; 出口光一郎
Oral presentation, Japanese, 電子情報通信学会パターン認識・メディア理解研究会報告
1998 - マルチエージェント物体認識システムにおける協調に関する考察
柳井啓司
Oral presentation, Japanese, 第6回マルチ・エージェントと協調計算ワークショップ(MACC'97)
Dec. 1997
Courses
- メディア情報学実験
Apr. 2019 - Present
The University of Electro-Communications - 情報学工房
Apr. 2018 - Present
The University of Electro-Communications - 画像認識システム特論
Apr. 2016 - Present
The University of Electro-Communications - 物体認識論
Apr. 2016 - Present
The University of Electro-Communications - 深層学習・画像認識
Apr. 2016 - Mar. 2024
The University of Electro-Communications - 基礎プログラミングおよび演習
Apr. 2016 - Mar. 2022
The University of Electro-Communications - 深層学習と画像認識
Mar. 2020 - Mar. 2020
Primate Research Institute, Kyoto University - メディア情報学プログラミング演習
Apr. 2016
The University of Electro-Communications
Research Themes
- 深層学習を用いた能動的な新しい食事管理技術の創出
01 Apr. 2022 - 31 Mar. 2026 - 文字を介した視覚的コミュニケーション基盤の創成
内田 誠一; 北本 朝展; 中山 英樹; 牛久 祥孝; 柳井 啓司; 大町 真一郎; 塩入 諭; 黄瀬 浩一; 岩村 雅一; 山本 和明
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Kyushu University, Grant-in-Aid for Scientific Research (A), 22H00540
01 Apr. 2022 - 31 Mar. 2025 - 機能の重ね合せを実現する深層学習におけるタスク融合学習
30 Jun. 2022 - 31 Mar. 2024 - 質感と形状の分離による奥深質感画像分析・生成のためのマルチモーダル深層学習モデル 【深奥質感】
01 Sep. 2021 - 31 Mar. 2023 - 機械可読時代における文字科学の創成と応用展開 【分担者】
内田 誠一
01 Jul. 2017 - 31 Mar. 2022 - Real-time activity recognition for animal logging based on mobile deep learning techniques
柳井 啓司
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area), To realize real-time animal activity recognition by deep learning on small devices, this project (1) studies recognition of animal first-person videos, (2) searches for compact networks adapted to the task and device, (3) realizes real-time activity recognition combining sensor information with image recognition, and (4) implements (1)-(3) in an integrated way on small IoT devices. In FY2019, the first year, we carried out the basic research of (1) recognition of animal first-person videos and (2) search for compact networks adapted to the task and device; since these can be studied independently, they were conducted in parallel. In (1), we studied action recognition for first-person videos of rescue dogs that integrates video and audio with the readings of the various sensors mounted on the cyber-suit worn by the dog; integrating the four kinds of information (still-image appearance, video motion, audio, and sensors) achieved recognition accuracy exceeding previous results. In (2), we conducted experiments on reducing network parameters by pruning, a network compression technique, and on finding optimal architectures with Neural Architecture Search (NAS), a technique for automatically exploring optimal network structures. We also carried out basic research on multi-domain learning, in which a single network learns multiple tasks simultaneously, and experiments on dynamically updating a network in response to environmental changes by performing not only inference but also simple training on IoT devices., 19H04929
01 Apr. 2019 - 31 Mar. 2021 - 信号変調に基づく視聴触覚の質感認識機構 【多元質感知】【計画班分担者】
西田 眞也
01 Oct. 2015 - 31 Mar. 2020 - 自動食事診断実現のための深層学習とWeb知識を用いた食事写真カロリー量推定
2017 - 2020 - 文字科学 ― 文字の機能の多面的解明
内田 誠一; 柳井 啓司; 牛久 祥孝
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Kyushu University, Grant-in-Aid for Scientific Research (A), Written characters are the most important medium supporting our cultural activities and communication. This project planned to promote "character science", a new field that comprehensively analyzes the essence of the diverse functions of characters while attending to their dual nature as both language and image, and in particular to carry out broad, challenging, and internationally unprecedented basic studies of four hitherto unexamined functions of characters: clarification of the surroundings, transmission of knowledge and meaning, transmission of atmosphere, and maintenance of readability. However, a Grant-in-Aid (S) project subsuming this project's content, "Creation and application of character science in the machine-readable era (17H06100)", was adopted at the end of May 2017, and under the overlap restrictions this project was discontinued two months after it started. The three members of the Grant-in-Aid (S) project are the same as those of this project, and the work planned here can be carried out there without problems. During the two months, the principal investigator and the two co-investigators discussed and concretized the research topics on the four functions above: automatic font generation, the relation between design and font shape, the relation between scene text and its meaning, the relation between the information conveyed by scene imagery and by scene text, and an engineering account of the process by which alphabets are generated. We set concrete methodologies for these topics, assigned researchers, and started data collection and implementation. As noted above, all of these are carried over to the Grant-in-Aid (S) project., 17H00736
01 Apr. 2017 - 31 Mar. 2018 - 低消費電力リアルタイム画像認識実現のためのモバイル深層学習技術 【生物移動情報学】
2017 - 2018 - 単機能の重ね合せにより新機能を創発するマルチファンクショナル深層学習ネットワーク 【人工知能と脳科学】
2017 - 2018 - 実環境でロバストに動作可能な高速高精度な一般物体認識技術の開発
JST, Matching Planner Program
2015 - 2016 - 文字工学リノベーション 【分担者】
内田 誠一
2014 - 2016 - 大規模位置情報画像マイニングによる画像と視覚概念の関係の地域性に関する総合的研究
2012 - 2015 - 集合知を用いた質感認知と物体認知の関係に関する大規模分析 【質感脳情報学】
2013 - 2014 - Webマルチメディアマイニングによる動詞概念と名詞概念およびその関係の自動学習
2011 - 2013 - 1000クラスに対応した大規模一般画像認識システムの実現
2008 - 2010 - 時空間情報の利用による一般物体認識の研究
2007 - 2008 - 実世界画像自動分類のためのWorldWideWebからの画像知識の獲得
2004 - 2006 - ベイジアンネットワークを用いたマルチエージェント型物体認識システムの実現
2000 - 2001 - Design and Evaluation of a Distributed Shared-Hashing Mechanism for Searching Game-Trees in Parallel
NOSHITA Kohei; YANAI Keiji; NAKAYAMA Yasuichi
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), In game-tree searching, transposition tables are used for eliminating repetitions of the identical computation for reappeared (identical or similar) positions. Transposition tables are also expected to be effective for efficient parallel searching. On a distributed parallel computer-cluster, a new method for sharing the global transposition table among component processors is designed, implemented, applied and evaluated. The basic parallel software system for communicating among network-connected processors is designed, implemented and improved. The global transposition table consists of shared-hashing tables which are distributed on processors. Two types of the shared-hashing tables are experimentally compared. Two games are used for evaluating our distributed shared-hashing method. They are mini-othello and parallel selection. Some theoretical results concerning properties of those games are obtained. By executing some parallel algorithms, several kinds of overheads as well as the computation time are counted. Based on these experimental results, various aspects of our method are evaluated. The speedup factor is shown by comparing our method (together with local hashing tables) with the local-hashing method (without the global shared-hashing table). The excellent speedups in terms of the number of processors are achieved. By our method, several instances of the parallel selection problem are solved, which have not been solved so far on a single computer. The experiments prove that our distributed shared-hashing method is efficient enough to show a good performance near the maximum on a distributed parallel environment with slow interprocessor communication, 10680340
1998 - 1999
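The preceding entry describes a transposition table shared across the processors of a distributed cluster so that reappearing game positions are not searched repeatedly. A toy sketch of the underlying idea is given below, under the assumption of a hash-partitioned "home processor" scheme (the entry does not spell out its exact scheme); real systems would use message passing between processors rather than the single-process simulation used here.

```python
# Toy sketch of a distributed shared transposition table (assumed
# hash-partitioned scheme): each position's hash selects a "home" shard,
# so all workers consult one shared entry per position.
import zlib

N_PROCS = 4
tables = [{} for _ in range(N_PROCS)]  # one local shard per processor


def home(position):
    """The shard (processor) owning this position's entry."""
    return zlib.crc32(position.encode()) % N_PROCS


def store(position, depth, value):
    shard = tables[home(position)]
    old = shard.get(position)
    if old is None or old[0] < depth:   # keep the deepest search result
        shard[position] = (depth, value)


def probe(position, depth):
    entry = tables[home(position)].get(position)
    if entry and entry[0] >= depth:     # reusable only if searched deep enough
        return entry[1]
    return None


store("e4e5Nf3", depth=6, value=30)
print(probe("e4e5Nf3", depth=4))  # -> 30 (reused, avoiding a re-search)
```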
Industrial Property Rights
- 画像スタイル変換装置,画像スタイル変換方法及び画像スタイル変換プログラム
Patent right, 特願2017-024688, Date applied: 14 Feb. 2017 - 線形識別器,大規模一般物体認識装置及び電子計算機
Patent right, 河野憲之, 柳井啓司, 特願2014-150063, Date applied: 23 Jul. 2014 - 画像ランキング方法,プログラム及び記録媒体並びに画像表示システム
Patent right, 川久保秀敏, 柳井啓司, 特願2010-109454, Date applied: 11 May 2010, 特許第5569728号, Date issued: 04 Jul. 2014 - 摂取量推定装置,摂取量推定方法及びプログラム
Patent right, 柳井啓司, 岡元晃一, 特願2014-132385, Date applied: 27 Jun. 2014 - 画像処理方法,その方法を実行するプログラム,記憶媒体,撮像機器,画像処理システム
Patent right, 柳井啓司, 特願2008-106546, Date applied: 16 Apr. 2008, 特許第5018614号, Date issued: 22 Jun. 2012
Media Coverage
Academic Contribution Activities
- International Multimedia Modeling Conference (MMM)
Academic society etc, Planning etc, General Co-chair, Jan. 2025 - European Conference on Computer Vision (ECCV)
Peer review, Program Committee Member, 2024 - ACM Multimedia
Academic society etc, Peer review, Program Committee Member, 2024 - International Conference on Learning Representation (ICLR)
Peer review, 2024 - Computer Vision and Pattern Recognition (CVPR)
Peer review, Area Chair, 2024 - 8th International Workshop on Multimedia Assisted Dietary Management (MADiMa)
Planning etc, Co-organizer, 2023 - ACM Multimedia
Peer review, 2023 - Neural Information Processing Systems (NeurIPS)
Peer review, 2023 - International Conference on Computer Vision (ICCV)
Peer review, 2023 - Computer Vision and Pattern Recognition (CVPR)
Peer review, Program Committee Member, 2023 - International Multimedia Modeling Conference (MMM)
Academic society etc, Peer review, Jan. 2023 - Neural Information Processing Systems (NeurIPS)
Academic society etc, Peer review, Dec. 2022 - ACM Multimedia Asia
Academic society etc, Planning etc, General Co-chair, Dec. 2022 - ACM Multimedia (ACMMM)
Academic society etc, Others, Oct. 2022 - 7th International Workshop on Multimedia Assisted Dietary Management (MADiMa)
Academic society etc, Planning etc, Co-organizer, Oct. 2022 - 画像の認識・理解シンポジウム(MIRU)
Academic society etc, Others, Aug. 2022 - Computer Vision and Pattern Recognition (CVPR)
Academic society etc, Others, Jun. 2022 - ACM International Conference on Multimedia Retrieval (ICMR)
Academic society etc, Others, Jun. 2022 - International Conference on Learning Representation (ICLR)
Academic society etc, Others, Apr. 2022 - International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP)
Academic society etc, Others, Feb. 2022 - IEEE Winter Conference on Applications of Computer Vision (WACV)
Academic society etc, Others, Jan. 2022 - International Multimedia Modeling Conference (MMM)
Academic society etc, Others, Jan. 2022