Search Details ｜The University of Electro-Communications

TSUTOMU YOSHINAGA

Department of Computer and Network Engineering, Professor
Cluster I (Informatics and Computer Engineering), Professor
Meta-Networking Research Center, Professor

Profile:
Current research interests include computer systems, interconnection networks, cluster computing, and reconfigurable systems.

Researcher Information

Degree

Master of Engineering, Utsunomiya University

Doctor of Engineering, Utsunomiya University

Research Keyword

computer system

network computing

parallel and distributed system

cluster computing

routing algorithm

reconfigurable system

interconnection network

Field Of Study

Informatics, Computer systems

Career

Feb. 2023 - Present
The University of Electro-Communications, Graduate School of Informatics and Engineering, Department of Computer and Network Engineering, Cluster I (Informatics and Computer Engineering), Professor

01 Apr. 2016
University of Electro-Communications, Graduate School of Informatics and Engineering, Professor

01 Apr. 2010
The University of Electro-Communications, Graduate School of Information Systems, Department of Information Network Systems, Professor

Educational Background

Mar. 1988
Utsunomiya University, Graduate School, Division of Engineering, Department of Information Engineering

Mar. 1986
Utsunomiya University, Faculty of Engineering, Department of Information Engineering

Apr. 1979 - Mar. 1982
Gunma Prefectural Oota High School, Japan

Member History

Jun. 2017 - Present
adviser, IEICE, Society

Mar. 2017 - Present
fellow, IEICE, Society

May 2012 - Present
senior member, The Institute of Electronics, Information and Communication Engineers (IEICE), Society

Jun. 2015 - Jun. 2017
Technical Committee on Computer Systems, IEICE, Society

Nov. 2016
computer science education committee member, IPSJ, Society

1999 - May 2016
reviewer, IEICE, Society

May 2013 - May 2015
CPSY Chair, The Institute of Electronics, Information and Communication Engineers (IEICE), Society

Nov. 2010 - May 2014
IEICE Trans. ED Editor, The Institute of Electronics, Information and Communication Engineers (IEICE), Society

May 2011 - Apr. 2013
CPSY vice-Chair, The Institute of Electronics, Information and Communication Engineers (IEICE), Society

May 2005 - May 2011
member, Technical Committee on Computer Systems, IEICE, Society

May 2006 - May 2010
editorial committee member, IEICE Transactions D (Japanese edition), IEICE, Society

Jun. 2006 - May 2009
reviewer, IPSJ, Society

May 2007 - Mar. 2009
information systems education committee member, IPSJ, Society

2002 - May 2005
CPSY secretary, The Institute of Electronics, Information and Communication Engineers (IEICE), Society

2001 - 2005
journal editorial committee member, IPSJ, Society

1995 - 1997
liaison member, Special Interest Group on Computer Architecture, IPSJ, Society

Research Activity Information

Award

Sep. 2021
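Japanese Neural Network Society, This award is given to particularly outstanding papers among all those presented at the IEICE Technical Committee on Neurocomputing; about three papers are selected each year.
Excellent Research Award, 柳田悠介; 佐藤俊治; 策力木格; 吉永努
Japan society, Japan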

May 2015
IEICE
ISS Activity Contribution Award

Paper

DRL-Assisted Network Selection for Federated IoV.
Ganggui Wang; Celimuge Wu; Zhaoyang Du; Tsutomu Yoshinaga; Rui Yin; Lei Zhong
IEEE Internet of Things Magazine, 6, 3, 86-90, Sep. 2023, Peer-reviewed
Scientific journal, English

Load-based Content Allocation Scheme for Realizing Efficient Mobile Cooperative Cache
Taiki Akiba; Celimuge Wu; Tsutomu Yoshinaga
Last author, International Journal of Networking and Computing, 13, 2, 93-117, Jul. 2023, Peer-reviewed
Scientific journal, English

Blockchain-Enabled Internet of Vehicles Applications
Junting Gao; Chunrong Peng; Tsutomu Yoshinaga; Guorong Han; Siri Guleng; Celimuge Wu
Electronics, MDPI AG, 12, 6, article 1335-62 pages, 11 Mar. 2023, Internet of Vehicles (IoV) is a network that connects vehicles and everything. IoV shares traffic data by connecting vehicles with the surrounding environment, which brings huge potential to people’s life. However, a large number of connections and data sharing will seriously consume vehicle resources during the interaction. In addition, how to build a safe and reliable connection to ensure vehicle safety is also an issue to consider. To solve the above problems, researchers introduce blockchains into IoV to build a safe and reliable vehicle network relying on the distributed account structure, immutable, transparent and security features of blockchains. We have investigated the application of blockchains in IoV in recent years, and have summarized and compared these studies according to their purposes. On this basis, we also point out the future trends and opportunities.
Scientific journal, English

Semantic segmentation-based semantic communication system for image transmission
Jiale Wu; Celimuge Wu; Yangfei Lin; Tsutomu Yoshinaga; Lei Zhong; Xianfu Chen; Yusheng Ji
Digital Communications and Networks, Elsevier BV, 11 pages, Feb. 2023
Scientific journal, English

Communication-Efficient Federated Learning for UAV Networks with Knowledge Distillation and Transfer Learning.
Yalong Li; Celimuge Wu; Zhaoyang Du; Lei Zhong; Tsutomu Yoshinaga
GLOBECOM, 5739-5744, 2023
International conference proceedings

Multi-Robot Systems and Cooperative Object Transport: Communications, Platforms, and Challenges
Xing An; Celimuge Wu; Yangfei Lin; Min Lin; Tsutomu Yoshinaga; Yusheng Ji
IEEE Open Journal of the Computer Society, 4, 23-36, Jan. 2023, Peer-reviewed
Scientific journal, English

Load-Based Content Allocation for Mobile Cooperative Cache
Taiki Akiba; Celimuge Wu; Tsutomu Yoshinaga
Proc. of the 13th International Workshop on Advances in Networking and Computing (WANC2022), IEEE, 1-5, 22 Nov. 2022, Peer-reviewed
International conference proceedings, English

A communication-efficient distributed machine learning scheme in vehicular network.
Yalong Li; Celimuge Wu; Lei Zhong; Tsutomu Yoshinaga
Proceedings of the Conference on Research in Adaptive and Convergent Systems (RACS22), 7 pages, 92-98, Oct. 2022, Peer-reviewed
International conference proceedings, English

On-device federated learning with fuzzy logic based client selection
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Lei Zhong; Yusheng Ji
2022 International Conference on Research in Adaptive and Convergent Systems, 64-70, Oct. 2022, Peer-reviewed
International conference proceedings, English

Fuzzy Logic based Client Selection for Federated Learning in Vehicular Networks
Narisu Cha; Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Lei Zhong; Jing Ma; Fuqiang Liu; Yusheng Ji
IEEE Open Journal of the Computer Society, IEEE, 3, 39-50, 31 Mar. 2022, Peer-reviewed, In vehicular networks, the problem of choosing proper clients is particularly complex due to the heterogeneity of network users, including the differences in the data, computation capability, available throughput, and sample freshness. We design a fuzzy logic based client selection scheme to address this issue.
Scientific journal, English

Toward Efficient Blockchain for the Internet of Vehicles with Hierarchical Blockchain Resource Scheduling
Liming Gao; Celimuge Wu; Zhaoyang Du; Tsutomu Yoshinaga; Lei Zhong; Fuqiang Liu; Yusheng Ji
MDPI Electronics, MDPI, 11, 5, 1-21, 07 Mar. 2022, Peer-reviewed, In this paper, we propose a hierarchical resource scheduling scheme for blockchain-enabled IoV systems that improves the performance of the blockchain-enabled IoV system by efficiently allocating computational resources. The superiority of the proposed method is fully demonstrated by comparing it with existing baseline methods.
Scientific journal, English

A Failsoft Scheme for Mobile Live Streaming by Scalable Video Coding
H. Okada; M. Yoshimi; C. Wu; T. Yoshinaga
IEICE Transactions on Information and Systems, IEICE, E104-D, 12, 2121-2130, 04 Dec. 2021, Peer-reviewed
Scientific journal, English

A Reinforcement Learning based Edge Cloud Collaboration
Hiroki Kobari; Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Wugedele Bao
Proc. of 2021 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), IEEE, 4 pages, 26-29, 03 Dec. 2021, Peer-reviewed
International conference proceedings, English

Deep Reinforcement Learning Based Mode Selection for Coexistence of D2D-U and Wi-Fi
Ganggui Wang; Celimuge Wu; Tsutomu Yoshinaga; Wugedele Bao; Rui Yin
Proc. of 2021 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), IEEE, 6 pages, 9-14, 03 Dec. 2021, Peer-reviewed
International conference proceedings, English

Resource Management for Blockchain-enabled Internet of Vehicles
Liming Gao; Chunrong Peng; Qitu Hu; Celimuge Wu; Tsutomu Yoshinaga; Wugedele Bao; Siri Guleng
Proc. of 2021 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), IEEE, 7 pages, 164-170, 03 Dec. 2021, Peer-reviewed
International conference proceedings, English

A Fuzzy Logic Controller for Greenhouse Temperature Regulation System Based on Edge Computing.
Yue Ren; Celimuge Wu; Tsutomu Yoshinaga; Wugedele Bao
Mobile Networks and Management - 11th EAI International Conference (MONAMI), Springer, 18 pages, 316-332, Oct. 2021, Peer-reviewed
International conference proceedings, English

UAV-Empowered Vehicular Networking Scheme for Federated Learning in Delay Tolerant Environments
Zhaoyang Du; Ganggui Wang; Narisu Cha; Celimuge Wu; Tsutomu Yoshinaga; Rui Yin
Proc. of The 24th IEEE International Conference on Computational Science and Engineering (IEEE CSE 2021), IEEE, 72-79, Oct. 2021, Peer-reviewed
International conference proceedings, English

Toward Agile Information and Communication Framework for the Post-COVID-19 Era
Celimuge Wu; Chunrong Peng; Zhaoyang Du; Liming Gao; Tsutomu Yoshinaga; Yusheng Ji
IEEE Open Journal of the Computer Society, IEEE, 2, 290-299, Aug. 2021, Peer-reviewed
Scientific journal, English

Integrating autonomous decentralized communication and edge computing for real-time control in IoT system
Masaya Harada; Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Wugedele Bao; Yusheng Ji
Proc. of The 7th Euro-China Conference on Intelligent Data Analysis and Applications (ECC-2021), 9 pages, May 2021, Peer-reviewed
International conference proceedings, English

A Brief Review of Multipath TCP for Vehicular Networks
Luomeng Chao; Celimuge Wu; Tsutomu Yoshinaga; Wugedele Bao; Yusheng Ji
MDPI Open Access Journal Sensors, MDPI, 21, 8, 1-34, 15 Apr. 2021, Peer-reviewed, In this paper, we first conduct a brief survey of existing MPTCP studies and give a brief overview of multipath routing. Then we discuss the significant technical challenges in applying MPTCP for vehicular networks and point out future research directions.
Scientific journal, English

Multi-Channel Blockchain Scheme for Internet of Vehicles
Liming Gao; Celimuge Wu; Tsutomu Yoshinaga; Xianfu Chen; Yusheng Ji
IEEE Open Journal of the Computer Society, IEEE, 2, 192-203, 31 Mar. 2021, Peer-reviewed
Scientific journal, English

Coexistence Analysis of D2D-Unlicensed and Wi-Fi Communications
Ganggui Wang; Celimuge Wu; Tsutomu Yoshinaga; Rui Yin; Tutomu Murase; Kok-Lim Alvin Yau; Wugedele Bao; Yusheng Ji
Wireless Communications and Mobile Computing Journal, Hindawi, 2021, Article ID 5523273, 1-11, 25 Mar. 2021, Peer-reviewed
Scientific journal, English

Virtual Edge: Exploring Computation Offloading in Collaborative Vehicular Edge Computing
Narisu Cha; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji; Kok-Lim Alvin Yau
IEEE Access, IEEE, 9, 37739-37751, 02 Mar. 2021, Peer-reviewed, We design a virtual edge formation algorithm that considers both the stability of the virtual edge and the computational resources available at the vehicles constituting the virtual edge.
Scientific journal, English

A Peak-Avoidance Scheme for Chasing Playback of Mobile Live Streaming
Hiroki Okada; Masato Yoshimi; Celimuge Wu; Tsutomu Yoshinaga
Proc. of the 2020 Eighth International Symposium on Computing and Networking Workshops, IEEE, 474-476, 22 Feb. 2021, Peer-reviewed, In this study, we propose a mechanism called Adaptive Failsoft Control to avoid peak traffic in mobile live streaming with a chasing playback function.
International conference proceedings, English

A Routing Protocol for UAV-assisted Vehicular Delay Tolerant Networks
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Xianfu Chen; Wang Xiaoyan; Kok-Lim Alvin Yau; Yusheng Ji
IEEE Open Journal of the Computer Society, 1, 45-61, 28 Jan. 2021, Peer-reviewed
Scientific journal, English

Virtual Edge: Collaborative Computation Offloading in VANETs
Narisu Cha; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proc. of the 10th EAI International Conference on Mobile Networks and Management (MONAMI 2020), (LNICST, volume 338), Springer Nature Switzerland AG, 79-93, 22 Dec. 2020, Peer-reviewed, In this paper, we propose a virtual edge scheme where a node can offload its tasks to a virtual edge node that consists of multiple vehicles in the vicinity.
International conference proceedings, English

UAV-empowered Protocol for Information Sharing in VDTN
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga
Proc. of The 16th International Conference on Mobility, Sensing and Networking (MSN 2020), IEEE, 626-627, 17 Dec. 2020, Peer-reviewed
International conference proceedings, English

Context-Aware Clustering for SDN Enabled Network
Ran Duo; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
AIMCOM2 Workshop at the 28th IEEE International Conference on Network Protocols (ICNP 2020), Oct. 2020, Peer-reviewed
International conference proceedings

Vehicle Speed Prediction with Convolutional Neural Networks for ITS
Yifei Li; Celimuge Wu; Tsutomu Yoshinaga
Proc. of the 2020 IEEE/CIC International Conference on Communications in China (ICCC Workshops), IEEE, 41-46, 09 Aug. 2020, Peer-reviewed, In this paper, we propose a convolutional neural network-based approach for a better estimation of vehicle traffic.
International conference proceedings, English

Collaborative Learning of Communication Routes in Edge-enabled Multi-access Vehicular Environment
Celimuge Wu; Zhi Liu; Fuqiang Liu; Tsutomu Yoshinaga; Yusheng Ji; Jie Li
IEEE Transactions on Cognitive Communications and Networking, IEEE, 4, 4, 1155-1165, 15 Jun. 2020, Peer-reviewed, In this paper, we propose a collaborative learning-based routing scheme for multi-access vehicular edge computing environments.
Scientific journal, English

Federated Learning for Vehicular Internet of Things: Recent Advances and Open Issues
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Kok-Lim Alvin Yau; Yusheng Ji; Jie Li
IEEE Open Journal of the Computer Society, IEEE, 1, 61, 45-61, 06 May 2020, Peer-reviewed, In this paper, we first conduct a brief survey of existing studies on FL and its use in wireless IoT. Then, we discuss the significance and technical challenges of applying FL in vehicular IoT, and point out future research directions.
Scientific journal, English

SDN-based Handover Scheme in Cellular/IEEE 802.11p Hybrid Vehicular Networks
Ran Duo; Celimuge Wu; Tsutomu Yoshinaga; Jiefang Zhang; Yusheng Ji
MDPI Open Access Journal Sensors, MDPI, 20, 4, 1-17, 17 Feb. 2020, Peer-reviewed
Scientific journal, English

A VDTN scheme with enhanced buffer management
Zhaoyang Du; Celimuge Wu; Xianfu Chen; Xiaoyan Wang; Tsutomu Yoshinaga; Yusheng Ji
Wireless Networks, Springer Nature, 26, 1537-1548, 05 Jan. 2020, Peer-reviewed
Scientific journal, English

Traffic Big Data Assisted Broadcast in Vehicular Networks
Siri Guleng; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proceedings of the Conference on Research in Adaptive and Convergent Systems (RACS'19), ACM, 236-240, 24 Sep. 2019, Peer-reviewed, In this paper, we propose a traffic big data assisted broadcast scheme in VANETs.
International conference proceedings, English

Traffic Flow Prediction with Compact Neural Networks
Yuhang Li; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Int. Conf. on Cloud and Big Data Computing (CBDCom), IEEE, 5 pages, 1072-1076, Aug. 2019, Peer-reviewed
International conference proceedings, English

Integrating Licensed and Unlicensed Spectrum in Internet-of-Vehicles with Mobile Edge Computing
Celimuge Wu; Xianfu Chen; Tsutomu Yoshinaga; Yusheng Ji; Yan Zhang
IEEE Network Magazine, IEEE, 33, 4, 48-53, 31 Jul. 2019, Peer-reviewed, In this article, we propose a context-aware communication approach to efficiently integrate different licensed and unlicensed spectrums leveraging the edge computing technologies.
Scientific journal, English

Performance Evaluation of RPL-Based Sensor Data Collection in Challenging IoT Environment
Liming Gao; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proceedings of the 2nd International Conference on Healthcare Science and Engineering, Springer Nature Singapore Pte Ltd., 275-285, May 2019, Peer-reviewed
International conference proceedings, English

Decentralized Trust Evaluation in Vehicular Internet of Things
Siri Guleng; Celimuge Wu; Xianfu Chen; Xiaoyan Wang; Tsutomu Yoshinaga; Yusheng Ji
IEEE Access, IEEE, 2019, 7, 15980-15988, Feb. 2019, Peer-reviewed
Scientific journal, English

Computational Intelligence Inspired Data Delivery for Vehicle-to-roadside Communications
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji; Yan Zhang
IEEE Transactions on Vehicular Technology, IEEE, 67, 12, 12038-12048, Dec. 2018, Peer-reviewed
Scientific journal, English

A Color-Based Cooperative Caching Strategy for Time-Shifted Live Video Streaming
Hiroki Okada; Takayuki Shiroma; Celimuge Wu; Tsutomu Yoshinaga
Proc. of the 6th International Symposium on Computer Systems and Architectures (CSA'18), 6 pages, 29 Nov. 2018, Peer-reviewed
International conference proceedings, English

The template-based sub-optimal content distribution for a D2D content sharing network
Takayuki Shiroma; Celimuge Wu; Tsutomu Yoshinaga
Proc. of the 6th International Symposium on Computing and Networking (CANDAR 2018), IEEE, 6 pages, 28 Nov. 2018, Peer-reviewed
International conference proceedings, English

System Performance Assessment and Sizing for Cloud-based Data Backup
Y. Taguchi; T. Yoshinaga
Journal of Information Processing (ACS), IPSJ, 11, 2, 1-10, Nov. 2018, Peer-reviewed
Scientific journal, English

SDN-based Handover Approach in IEEE 802.11p and LTE hybrid vehicular networks
Ran Duo; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proc. of The IEEE International Conference on Cloud and Big Data Computing (CBDCom 2018), IEEE, 6 pages, 10 Oct. 2018, Peer-reviewed
International conference proceedings, English

A Prophet-based DTN protocol for VANETs
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proc. of The IEEE International Conference on Cloud and Big Data Computing (CBDCom 2018), IEEE, 4 pages, 1876-1879, 10 Oct. 2018, Peer-reviewed
International conference proceedings, English

Spatial Intelligence toward Trustworthy Vehicular IoT
Celimuge Wu; Zhi Liu; Di Zhang; Tsutomu Yoshinaga; Yusheng Ji
IEEE Communications Magazine, IEEE, 56, 10, 22-27, Oct. 2018, Peer-reviewed
Scientific journal, English

System Resource Management to Control the Risk of Data-Loss in Cloud-based Disaster Recovery
Yuichi Taguchi; Tsutomu Yoshinaga
Proc. of The 6th IEEE International Workshop on Architecture, Design, Deployment and Management of Networks and Applications (ADMNET), IEEE, 210-215, 23 Jul. 2018, Peer-reviewed, A method for system resource management of a cloud-based disaster recovery is proposed.
International conference proceedings, English

A context-aware edge-based VANET communication scheme for ITS
Chang An; Celimuge Wu; Tsutomu Yoshinaga; Xianfu Chen; Yusheng Ji
Sensors (Switzerland), MDPI AG, 18, 7, 1-15, 01 Jul. 2018, Peer-reviewed, We propose a context-aware edge-based packet forwarding scheme for vehicular networks. The proposed scheme employs a fuzzy logic-based edge node selection protocol to find the best edge nodes in a decentralized manner, which can achieve an efficient use of wireless resources by conducting packet forwarding through edges. A reinforcement learning algorithm is used to optimize the last two-hop communications in order to improve the adaptiveness of the communication routes. The proposed scheme selects different edge nodes for different types of communications with different context information such as connection-dependency (connection-dependent or connection-independent), communication type (unicast or broadcast), and packet payload size. We launch extensive simulations to evaluate the proposed scheme by comparing with existing broadcast protocols and unicast protocols for various network conditions and traffic patterns.
Scientific journal, English

Learning for Adaptive Anycast in Vehicular Delay Tolerant Networks
Celimuge Wu; Tsutomu Yoshinaga; Dabhur Bayar; Yusheng Ji
Journal of Ambient Intelligence and Humanized Computing, Springer Berlin Heidelberg, 1-10, 12 May 2018, Peer-reviewed
Research society, English

Cluster-Based Content Distribution Integrating LTE and IEEE 802.11p with Fuzzy Logic and Q-Learning
Celimuge Wu; Tsutomu Yoshinaga; Xianfu Chen; Lin Zhang; Yusheng Ji
IEEE Computational Intelligence Magazine, Institute of Electrical and Electronics Engineers Inc., 13, 1, 41-50, 01 Feb. 2018, Peer-reviewed, There is an increasing demand for distributing a large amount of content to vehicles on the road. However, the cellular network is not sufficient due to its limited bandwidth in a dense vehicle environment. In recent years, vehicular ad hoc networks (VANETs) have been attracting great interest for improving communications between vehicles using infrastructure-less wireless technologies. In this paper, we discuss integrating LTE (Long Term Evolution) with IEEE 802.11p for the content distribution in VANETs. We propose a two-level clustering approach where cluster head nodes in the first level try to reduce the MAC layer contentions for vehicle-to-vehicle (V2V) communications, and cluster head nodes in the second level are responsible for providing a gateway functionality between V2V and LTE. A fuzzy logic-based algorithm is employed in the first-level clustering, and a Q-learning algorithm is used in the second-level clustering to tune the number of gateway nodes. We conduct extensive simulations to evaluate the performance of the proposed protocol under various network conditions. Simulation results show that the proposed protocol can achieve 23% throughput improvement in high-density scenarios compared to the existing approaches.
Scientific journal, English

A learning-based probabilistic routing protocol for vehicular delay tolerant networks
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proceedings of 4th International Conference on Information and Communication Technologies for Disaster Management (ICT-DM2017), IEEE, 1-6, 12 Dec. 2017, Peer-reviewed, We propose a probabilistic routing protocol for VDTNs. The protocol takes into account the vehicle velocity, node centrality, and node buffer size using a fuzzy logic-based approach.
International conference proceedings, English

Vehicular multi-access edge computing with licensed sub-6 GHz, IEEE 802.11p and mmWave
Qitu Hu; Celimuge Wu; Xiaobing Zhao; Xianfu Chen; Yusheng Ji; Tsutomu Yoshinaga
IEEE Access, Institute of Electrical and Electronics Engineers Inc., 6, 1, 1995-2004, 07 Dec. 2017, Peer-reviewed, With the rapid increase of vehicular Internet of things applications, it is urgent to design a mobile edge computing architecture, which is possible to distribute and process a large amount of contents with vehicles on the road. From a communication perspective, the current cellular technology faces challenges due to the limited bandwidth in a dense vehicle environment. In this paper, we propose a multi-access edge computing framework and the corresponding communication protocol which integrates licensed Sub-6 GHz band, IEEE 802.11p, and millimeter wave (mmWave) communications for the content distribution and processing in vehicular networks. The proposed protocol uses a cluster-based approach, where a fuzzy logic-based algorithm is employed to select efficient gateway nodes which bridge the licensed Sub-6 GHz communication and the mmWave communication in order to maximize the overall network throughput. IEEE 802.11p vehicle-to-vehicle communication is used to share information among vehicles in order to achieve efficient clustering. We conduct extensive simulations to evaluate the performance of the proposed protocol under various network conditions. Simulation results show that the proposed protocol can achieve significant improvements in various scenarios compared with the existing approaches.
Scientific journal, English

Cooperative Content Delivery in Vehicular Networks with Integration of Sub-6 GHz and mmWave
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proc. of the IEEE Global Communications Conference Workshops, IEEE, 6 pages, 04 Dec. 2017, Peer-reviewed
International conference proceedings, English

Pipelined parallel join and its FPGA-based acceleration
Masato Yoshimi; Yasin Oge; Tsutomu Yoshinaga
ACM Transactions on Reconfigurable Technology and Systems, Association for Computing Machinery, 10, 4, 28:1-28:8, 01 Dec. 2017, Peer-reviewed, A huge amount of data is being generated and accumulated in data centers, which leads to an important increase in the required energy consumption to analyze these data. Thus, we must consider the redesign of current computer systems architectures to be more friendly to applications based on distributed algorithms that require a high data transfer rate. Novel computer architectures that introduce dedicated accelerators to enable near-data processing have been discussed and developed for high-speed big-data analysis. In this work, we propose a computer system with an FPGA-based accelerator, namely, interconnected-FPGAs, which offers two advantages: (1) direct data transmission and (2) offloading computation into data-flow in the FPGA. In this article, we demonstrate the capability of the proposed interconnected-FPGAs system to accelerate join operations in a relational database. We developed a new parallel join algorithm, PPJoin, targeted to big-data analysis in a shared-nothing architecture. PPJoin is an extended version of the NUMA-based parallel join algorithm, created by overlapping computation by multicore processors and data communication. The data communication between computational nodes can be accelerated by direct data transmission without passing through the main memory of the hosts. To confirm the performance of the PPJoin algorithm and its acceleration process using an interconnected-FPGA platform, we evaluated a simple query for large tables. Additionally, to support availability, we also evaluated the actual benchmark query. Our evaluation results confirm that the PPJoin algorithm is faster than a software-based query engine by 1.5-5 times. Moreover, we experimentally confirmed that the direct data transmission by interconnected FPGAs reduces computational time around 20% for PPJoin.
Scientific journal, English

Color-Based Cooperative Cache and Its Routing Scheme for Telco-CDNs
Takuma Nakajima; Masato Yoshimi; Celimuge Wu; Tsutomu Yoshinaga
IEICE Transactions on Information and Systems, IEICE, E100D, 12, 2847-2856, Dec. 2017, Peer-reviewed, Cooperative caching is a key technique to reduce rapidly growing video-on-demand traffic by aggregating multiple cache storages. Existing strategies periodically calculate a sub-optimal allocation of the content caches in the network. Although such technique could reduce the generated traffic between servers, it comes with the cost of a large computational overhead. This overhead will be the cause of preventing these caches from following the rapid change in the access pattern. In this paper, we propose a light-weight scheme for cooperative caching by grouping contents and servers with color tags. In our proposal, we associate servers and caches through a color tag, with the aim to increase the effective cache capacity by storing different contents among servers. In addition to the color tags, we propose a novel hybrid caching scheme that divides its storage area into colored LFU (Least Frequently Used) and no-color LRU (Least Recently Used) areas. The colored LFU area stores color-matching contents to increase cache hit rate and no-color LRU area follows rapid changes in access patterns by storing popular contents regardless of their tags. On top of the proposed architecture, we also present a new routing algorithm that takes advantage of the color-tag information to reduce the traffic by fetching cached contents from the nearest server. Evaluation results, using a backbone network topology, showed that our color-tag based caching scheme could achieve a performance close to the sub-optimal one obtained with a genetic algorithm calculation, with only a few seconds of computational overhead. Furthermore, the proposed hybrid caching could limit the degradation of hit rate from 13.9% in conventional non-colored LFU, to only 2.3%, which proves the capability of our scheme to follow rapid insertions of new popular contents. Finally, the color-based routing scheme could reduce the traffic by up to 31.9% when compared with the shortest-path routing.
Scientific journal, English

A Light-weight Cooperative Caching Strategy by D2D Content Sharing
Takayuki Shiroma; Takuma Nakajima; Celimuge Wu; Tsutomu Yoshinaga
Proc. of the Fifth International Symposium on Computing and Networking (CANDAR 2017), IEEE, 159-165, 20 Nov. 2017, Peer-reviewed
International conference proceedings, English

V2R Communication Protocol Based on Game Theory Inspired Clustering
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proc. of IEEE 86th Vehicular Technology Conference (VTC2017-Fall), 1-5, 24 Sep. 2017, Peer-reviewed
International conference proceedings, English

Scalable Photonic Networks-on-Chip Architecture Based on a Novel Wavelength-Shifting Mechanism
A. Ben Ahmed; Tsutomu Yoshinaga; A. Ben Abdallah
IEEE Transactions on Emerging Topics in Computing (April-June 2020), IEEE, 8, 2, 533-544, 09 Aug. 2017, Peer-reviewed, We propose in this paper an alternative to these two conventional PNoC architectures. Our proposed system is based on a novel Wavelength-Shifting mechanism, which combines the benefits of the previously mentioned schemes while limiting their drawbacks.
Scientific journal, English

A Reinforcement Learning-Based Data Storage Scheme for Vehicular Ad Hoc Networks
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji; Tutomu Murase; Yan Zhang
IEEE Transactions on Vehicular Technology, IEEE, 66, 7, 6336-6348, Jul. 2017, Peer-reviewed, Vehicular ad hoc networks (VANETs) have been attracting interest for their potential roles in intelligent transport systems (ITS). In order to enable distributed ITS, there is a need to maintain some information in the vehicular networks without the support of any infrastructure such as road side units. In this paper, we propose a protocol that can store the data in VANETs by transferring data to a new carrier (vehicle) before the current data carrier is moving out of a specified region. For the next data carrier node selection, the protocol employs fuzzy logic to evaluate instant reward by taking into account multiple metrics, specifically throughput, vehicle velocity, and bandwidth efficiency. In addition, a reinforcement learning-based algorithm is used to consider the future reward of a decision. For the data collection, the protocol uses a cluster-based forwarding approach to improve the efficiency of wireless resource utilization. We use theoretical analysis and computer simulations to evaluate the proposed protocol.
Scientific journal, English

DTN-based Vehicular Cloud for Post-disaster Information Sharing
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
2017 WIRELESS DAYS, IEEE, 167-172, 2017, Peer-reviewed, We first propose a framework which utilizes a vehicular delay tolerant network (DTN) to form a vehicular cloud in order to provide information exchange without communication infrastructure. The framework does not rely on a cellular network and therefore provides an approach which is suitable for post-disaster communication where the cellular network is unavailable or severely congested. The paper also proposes a protocol which is able to provide vehicle-to-cloud communication in a frequently changing vehicular environment. The protocol takes into account the link throughput, additional signal coverage, connection time, and the probability of encountering an RSU for the forwarder selection by using a fuzzy logic-based approach. The protocol also employs a network coding approach to reduce the overhead while maintaining a high data delivery ratio. We use computer simulations to evaluate the proposed framework.
International conference proceedings, English

Multihop Data Delivery Virtualization for Green Decentralized IoT
Lifeng Zhang; Celimuge Wu; Tsutomu Yoshinaga; Xianfu Chen; Tutomu Murase; Yusheng Ji
Wireless Communications and Mobile Computing, Hindawi Limited, 2017, 9805784, 9 pages, 2017, Peer-reviewed, Decentralized communication technologies (i.e., ad hoc networks) provide more opportunities for emerging wireless Internet of Things (IoT) due to the flexibility and expandability of distributed architecture. However, the performance degradation of wireless communications with the increase of the number of hops becomes the main obstacle in the development of decentralized wireless IoT systems. The main challenges come from the difficulty in designing a resource and energy efficient multihop communication protocol. Transmission control protocol (TCP), the most frequently used transport layer protocol for achieving reliable end-to-end communications, cannot achieve a satisfactory result in multihop wireless scenarios as it uses end-to-end acknowledgment, which does not work well in a lossy scenario. In this paper, we propose a multihop data delivery virtualization approach which uses multiple one-hop reliable transmissions to perform multihop data transmissions. Since the proposed protocol utilizes hop-by-hop acknowledgment instead of end-to-end feedback, the congestion window size at each TCP sender node is not affected by the number of hops between the source node and the destination node. The proposed protocol can provide a significantly higher throughput and shorter transmission time as compared to the end-to-end approach. We conduct real-world experiments as well as computer simulations to show the performance gain from our proposed protocol.
Scientific journal, English
DOI URL

Context-aware Unified Routing for VANETs Based on Virtual Clustering
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Proc. of the 2nd International Workshop on Vehicular Networking and Intelligent Transportation systems (VENITS'16), 281-286, 04 Sep. 2016, Peer-reviewed, We propose a context-aware routing protocol for vehicular ad hoc networks (VANETs).
International conference proceedings, English

A Cooperative Forwarding Scheme for VANET Routing Protocols
Celimuge Wu; Yusheng Ji; Tsutomu Yoshinaga
ZTE Communications, Editorial Office of ZTE Communications, 14, 3, 13-21, 25 Aug. 2016, Peer-reviewed, In this paper, we propose a loss-tolerant scheme for unicast routing protocols in VANETs. The proposed scheme employs multiple forwarding nodes to improve the packet reception ratio at the forwarding nodes.
Scientific journal, English
URL
DOI URL

Accelerating BLAST Computation on an FPGA-enhanced PC Cluster
Masato Yoshimi; Celimuge Wu; Tsutomu Yoshinaga
2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, 67-76, 2016, Peer-reviewed, This paper introduces an FPGA-based scheme to accelerate mpiBLAST, which is a parallel sequence alignment algorithm for computational biology. Recent rapidly growing biological databases for sequence alignment require high-throughput storage and network rather than computing speed. Our scheme utilizes specialized hardware configured on an FPGA board which connects flash storage and other FPGA boards directly. The specialized hardware configured on the FPGAs, which we call a Data Stream Processing Engine (DSPE), takes on the role of preprocessing to adjust data for high-performance multi- and many-core processors while simultaneously offloading system calls for storage access and networking. DSPE along the datapath achieves in-datapath computing, which applies operations to data streams passing through the FPGA. Two functions in mpiBLAST are implemented using DSPE to offload operations along the datapath. The first function is database partitioning, which distributes the biological database to multiple computing nodes before commencing the BLAST processes. Using DSPE, we observe a 20-fold improvement in computation time for the database partitioning operation. The second function is an early part of the BLAST process that determines the positions of sequences for more detailed computations. We implement IDP-BLAST (In-datapath BLAST), which annotates positions in data streams from solid-state drives. We show that IDP-BLAST accelerates the preprocessing stage of BLAST by a factor of three hundred by offloading heavy operations to the introduced special hardware.
International conference proceedings, English
DOI URL

A Light-weight Content Distribution Scheme for Cooperative Caching in Telco-CDNs
Takuma Nakajima; Masato Yoshimi; Celimuge Wu; Tsutomu Yoshinaga
2016 FOURTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, 126-132, 2016, Peer-reviewed, A key technique to curb the rapid growth of video-on-demand traffic is a cooperative caching strategy aggregating multiple cache storages. Many internet service providers have considered the use of cache servers on their networks as a solution to reduce the traffic. Existing schemes often periodically calculate a sub-optimal allocation of the content caches in the network. However, such approaches require a large computational overhead that cannot be amortized in the presence of frequent changes in the contents' popularities. This paper proposes a light-weight scheme for cooperative caching that obtains a sub-optimal distribution of the contents by focusing on their popularities. This was made possible by adding color tags to both cache servers and contents. In addition, we propose a hybrid caching strategy based on Least Frequently Used (LFU) and Least Recently Used (LRU) schemes, which efficiently manages the contents even with frequent changes in popularity. Evaluation results showed that our light-weight scheme could considerably reduce the traffic, reaching a sub-optimal result. In addition, the performance gain is obtained with a computation overhead of just a few seconds. The evaluation results also showed that the hybrid caching strategy could follow the rapid variation of the popularity. While a single LFU strategy drops the hit ratio by 13.9%, affected by rapid popularity changes, our proposed hybrid strategy could limit the degradation to only 2.3%.
International conference proceedings, English
DOI URL

Design and Evaluation of a Configurable Query Processing Hardware for Data Streams
Yasin Oge; Masato Yoshimi; Takefumi Miyoshi; Hideyuki Kawashima; Hidetsugu Irie; Tsutomu Yoshinaga
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E98D, 12, 2207-2217, Dec. 2015, Peer-reviewed, In this paper, we propose Configurable Query Processing Hardware (CQPH), an FPGA-based accelerator for continuous query processing over data streams. CQPH is a highly optimized and minimal-overhead execution engine designed to deliver real-time response for high-volume data streams. Unlike most of the other FPGA-based approaches, CQPH provides on-the-fly configurability for multiple queries with its own dynamic configuration mechanism. With a dedicated query compiler, SQL-like queries can be easily configured into CQPH at run time. CQPH supports continuous queries including selection, group-by operation and sliding-window aggregation with a large number of overlapping sliding windows. As a proof of concept, a prototype of CQPH is implemented on an FPGA platform for a case study. Evaluation results indicate that a given query can be configured within just a few microseconds, and the prototype implementation of CQPH can process over 150 million tuples per second with a latency of less than a microsecond. Results also indicate that CQPH provides linear scalability to increase its flexibility (i.e., on-the-fly configurability) without sacrificing performance (i.e., maximum allowable clock speed).
Scientific journal, English
DOI URL

An Implementation of Cloud Environment with Adaptive Computing Resource Sharing
Takuma NAKAJIMA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
IEICE Trans. on Information Systems, IEICE, J98-D, 8, 1142-1150, 05 Aug. 2015, Peer-reviewed
Scientific journal, Japanese
DOI URL

An Efficient Cache Grouping Strategy for Multinode Cache Networks
Takayuki Shiroma; Takuma Nakajima; Kouta Nojima; Masato Yoshimi; Tsutomu Yoshinaga
PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, 295-298, 2015, Peer-reviewed, The popularization of Video-On-Demand (VOD) services causes explosive and continuous growth of Internet traffic. Web cache servers are widely utilized for reducing such VOD traffic. However, ordinary cache strategies such as LRU often degrade the cache efficiency of a multi-node cache network, because popular contents are cached on all servers, which squeezes the total cache capacity. This paper proposes a novel strategy called Cache Grouping to improve cache efficiency in the multi-node cache network. Cache Grouping organizes multiple web cache servers into a single cache server to increase virtual cache capacity and the diversity of stored contents. Compared to a conventional cache strategy, Cache Grouping reduces web server transmissions by up to 59% and improves the download time to process all requests by 20% while maintaining the total amount of transmissions.
International conference proceedings, English
DOI URL

Packet Size-Aware Broadcasting in VANETs With Fuzzy Logic and RL-Based Parameter Adaptation
Celimuge Wu; Xianfu Chen; Yusheng Ji; Fuqiang Liu; Satoshi Ohzahata; Tsutomu Yoshinaga; Toshihiko Kato
IEEE ACCESS, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 3, 2481-2491, 2015, Peer-reviewed, Most existing multi-hop broadcast protocols for vehicular ad hoc networks do not consider the problem of how to adapt transmission parameters according to the network environment. Besides the propagation environment that determines the channel bit error rate, packet payload size has a significant effect on the packet loss rate. In this paper, we first discuss the effect of packet size on the packet reception ratio, and then propose a broadcast protocol that is able to specify the best relay node by taking into account the data payload size. The proposed protocol employs a fuzzy logic-based algorithm to jointly consider multiple metrics (link quality, inter-vehicle distance, and vehicle mobility) and uses a redundancy transmission approach to ensure high reliability. Since the fuzzy membership functions are tuned by using reinforcement learning, the protocol can adapt to various network scenarios. We use both real-world experiments and computer simulations to evaluate the proposed protocol.
Scientific journal, English
DOI URL

UDU-L: An Intuitive Device Access Method by Laser Pointing
Masato KOGI; Yuta OKI; Tsutomu YOSHINAGA; Hidetsugu IRIE
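IEICE TRANSACTIONS on Information and Systems (Japanese edition), The Institute of Electronics, Information and Communication Engineers, Vol. J97-D, No.1, 155-164, Jan. 2014, Peer-reviewed, Network-connectable devices have become widespread, and device cooperation, in which the data and functions of individual devices are shared over a network, is increasing. However, with existing device connection technologies the user must manually select the target device from the network, so even when devices are physically close to each other, establishing a connection on the spot at the moment one wants to communicate is cumbersome. We therefore propose UDU-L, a laser-based connection method for connecting intuitively to the target device. UDU-L uses a visible-light laser to designate the target device directly, in the manner of a laser pointer. The blinking of the laser light transmits the information needed for connection to the target device, and communication is then established. In our implementation, a target device 5 m away could be designated accurately and communication could be carried out. As an application example, we also implemented an application that transfers images from a handheld mobile terminal to a display and shows them.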
Scientific journal, Japanese
URL

An FPGA-based Tightly Coupled Accelerator for Data-intensive Applications
Masato Yoshimi; Ryu Kudo; Yasin Oge; Yuta Terada; Hidetsugu Irie; Tsutomu Yoshinaga
2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC), IEEE, 289-296, 2014, Peer-reviewed, Computation beside a data source plays an important role in achieving a high performance with low energy consumption in Big Data processing. In contrast to that of a conventional workload, the processing of Big Data frequently requires that a massive amount of data in distributed storage be scanned. A key technique for reducing energy-consuming processor loads is to install a reconfigurable accelerator that is tightly coupled to a computational resource with interfaces. The accelerator is capable of configuring application-specific hardware modules to allow some logical and arithmetic operations for data stream transmission between interfaces, as well as the offloading of control protocols for communication with other computing nodes or storage. In this paper, an FPGA-based accelerator, which is directly attached to DRAM, the network, and storage, is proposed in order to realize an energy efficient computing system. A simple application that counts the words appearing in the data is implemented to evaluate a prototype system. As the accelerator outperforms similar applications executed on an SSD-based Hadoop framework by 80.66 to 429 times, we confirm that the accelerator's utilization for Big Data processing is beneficial.
International conference proceedings, English
URL
DOI URL

Accelerating OLAP workload on interconnected FPGAs with Flash storage
Masato Yoshimi; Ryu Kudo; Yasin Oge; Yuta Terada; Hidetsugu Irie; Tsutomu Yoshinaga
2014 SECOND INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, 440-446, 2014, Peer-reviewed, The data volume used in online analytical processing (OLAP) applications is rapidly increasing because of the increasing popularity of various Web services and emerging sensor technologies. Since the amount of accumulated data is frequently too large to store in an in-memory database, it is necessary to have a secondary storage to store such big data. On the basis of this premise, the most important factor to determine the performance of data-intensive applications is to reduce the number and the size of the data transfers between the secondary storage and the main memory. To achieve an energy-efficient computing environment, offloading a user-defined function (UDF) onto interconnected FPGA boards equipped with high-speed storage is effective due to the FPGA's performance ratio of operations per I/O. In this paper, we focus on the aggregate operations that are popularly used UDFs in OLAP, and propose an acceleration scheme utilizing interconnected FPGAs with flash storage. The scheme introduces accelerator modules which apply operations to the data stream passing through the FPGA, in addition to appropriate data distribution and partitioning. We implemented an accelerator module that aggregates the data transferred from the flash storage to the DRAM in order to show availability. Through preliminary evaluations of the accelerator, we confirmed that aggregate operations supported by the active-disk mechanism outperform a software-based database management system by more than 30 times.
International conference proceedings, English
DOI URL

A Fully Optical Ring Network-on-Chip with Static and Dynamic Wavelength Allocation
Ahmadou Dit Adi Cisse; Michihiro Koibuchi; Masato Yoshimi; Hidetsugu Irie; Tsutomu Yoshinaga
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E96D, 12, 2545-2554, Dec. 2013, Peer-reviewed, Silicon photonics Network-on-Chips (NoCs) have emerged as an attractive solution to alleviate the high power consumption of traditional electronic interconnects. In this paper, we propose a fully optical ring NoC that combines static and dynamic wavelength allocation communication mechanisms. A different wavelength-channel is statically allocated to each destination node for light weight communication. Contention of simultaneous communication requests from multiple source nodes to the destination is solved by a token based arbitration for the particular wavelength-channel. For heavy load communication, a multiwavelength-channel is available by requesting it in execution time from source node to a special node that manages dynamic allocation of the shared multiwavelength-channel among all nodes. We combine these static and dynamic communication mechanisms in the same network, which introduces selection techniques based on message size and congestion information. Using a photonic NoC simulator based on Phoenixsim, we evaluate our architecture under uniform random, neighbor, and hotspot traffic patterns. Simulation results show that our proposed fully optical ring NoC presents a good performance by utilizing adequate static and dynamic channels based on the selection techniques. We also show that our architecture can reduce the energy consumption necessary for arbitration by more than half compared to hybrid photonic ring and mesh NoCs. A comparison with several previous works in terms of architecture hardware cost shows that our architecture can be an attractive cost-performance efficient interconnection infrastructure for future SoCs and CMPs.
Scientific journal, English
DOI URL

FLAT: An MPI Friendly GPGPU Programming Framework for GPU Clusters
Keigo Shima; Masato Yoshimi; Takefumi MIYOSHI; Masaaki Kondo; Hidetsugu IRIE; Hiroki Honda; Tsutomu Yoshinaga
IPSJ Trans. on Computing System, 6, 4, 105-116, Oct. 2013, Peer-reviewed, A program for a PC cluster equipped with GPUs consists of two types of code, for GPUs and for CPUs. The GPU code executes parallelized algorithms to achieve high speed computing, supported by CPU code which performs communication with other nodes. Although the MPI library is commonly utilized to transfer data in the CPU code, MPI functions cannot be written in the GPU code. Programmers are forced to implement CPU and GPU codes side by side while taking care of data movement between the CPU and the GPU. In order to reduce the programming cost of data communication among GPUs, we propose a programming framework called FLAT which enables GPU codes to embed MPI functions. Because MPI functions can be written in the GPU code, the data transferred among GPUs is made explicit. This paper describes the execution model and implementation of FLAT, and discusses availability and performance obtained by two case studies, Livermore Loop18 and optical flow programs. Through the experimental results, we confirmed that FLAT increases readability in synthesized GPU codes while keeping performance degradation tolerable, at less than 3% for a coarse-grained parallel program.
Scientific journal, Japanese
URL

A Novel Wire-activity-aware Floorplanner for 3D-stacked Processor
H. Irie; T. Inaba; H. Houchi; D. Fujiwara; K. Mazima; M. Yoshimi; T. Yoshinaga
IPSJ Trans. on Advanced Computing Systems, IPSJ, 6, 3, 131-145, 25 Sep. 2013, Peer-reviewed, As 3D-stacked silicon technology grows, a significant increase in the performance/power balance of 3D-stacked processors is expected. By exploiting 3D-stacked design, long wires that are not shrunk by process scaling can shrink geometrically, which essentially reduces the interconnect power that is the major part of the power dissipation. However, existing 3D module-mappers have not reflected switching activity in the cost functions of wires; moreover, their outputs of 3D-microprocessor floorplans have not been revealed. This study introduces a novel 3D module-mapper which reflects communication patterns in the cost function by collaborating with a pipeline simulator, and reveals the floorplan and its effects on 3D-stacked processor architectures. Our result showed efficient mapping of 3D data path and cache structures. With the condition of 3-layer, assuming the wire load of TSV as same as 30μm of the normal wire load, compared to that of 2D floorplan, it requires 34% footprint and shows 57% "Wire-Activity" value that represents interconnect power dissipation, which is improved by 10% from the result of existing 3D floorplanners.
Scientific journal, Japanese
URL

Variable Color Environment System using Heart Rate Variability
Naoko Kanda; Daiki Sakuma; Masato Yoshimi; Tsutomu Yoshinaga; Hidetsugu Irie
Proc. of the 2013 International Conference on Bioinformatics & Computational Biology (BIOCOMP'13), 1-BIOCOMP-6, Jul. 2013, Peer-reviewed
International conference proceedings, English

Wire-speed implementation of sliding-window aggregate operator over out-of-order data streams
Yasin Oge; Masato Yoshimi; Takefumi Miyoshi; Hideyuki Kawashima; Hidetsugu Irie; Tsutomu Yoshinaga
Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013, IEEE Computer Society, 55-60, 2013, Peer-reviewed, This paper shows the design and evaluation of an FPGA-based accelerator for sliding-window aggregation over data streams with out-of-order data arrival. We propose an order-agnostic hardware implementation technique for windowing operators based on a one-pass query evaluation strategy called Window-ID, which was originally proposed for software implementation. The proposed implementation succeeds in processing out-of-order data items, or tuples, at wire speed due to the simultaneous evaluations of overlapping sliding-windows. In order to verify the effectiveness of the proposed approach, we have also implemented an experimental system as a case study. Our experiments demonstrate that the proposed accelerator with a network interface achieves an effective throughput around 760 Mbps or equivalently nearly 6 million tuples per second, by fully utilizing the available bandwidth of the network interface. © 2013 IEEE.
International conference proceedings, English
DOI URL

An Efficient and Scalable Implementation of Sliding-Window Aggregate Operator on FPGA
Yasin Oge; Masato Yoshimi; Takefumi Miyoshi; Hideyuki Kawashima; Hidetsugu Irie; Tsutomu Yoshinaga
2013 FIRST INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, 112-121, 2013, Peer-reviewed, This paper presents an efficient and scalable implementation of an FPGA-based accelerator for sliding-window aggregates over disordered data streams. With an increasing number of overlapping sliding-windows, the window aggregates have a serious scalability issue, especially when it comes to implementing them in parallel processing hardware (e.g., FPGAs). To address the issue, we propose a resource-efficient, scalable, and order-agnostic hardware design and its implementation by examining and integrating two key concepts, called Window-ID and Pane, each of which was originally proposed for software implementation. Evaluation results show that the proposed implementation scales well compared to the previous FPGA implementation in terms of both resource consumption and performance. The proposed design is fully pipelined and our implementation can process out-of-order data items, or tuples, at wire speed up to 200 million tuples per second.
International conference proceedings, English
DOI URL

A real-time gait improvement tool using a smartphone
Hirotaka Kashihara; Hiroki Shimizu; Hiroyoshi Houchi; Masato Yoshimi; Tsutomu Yoshinaga; Hidetsugu Irie
ACM International Conference Proceeding Series, 243, 2013, Peer-reviewed, Recent handheld devices are provided with various sensors and have realized many functions thanks to the downsizing and speeding up of computers. Currently smartphones occupy a significant position as multifunctional handheld devices. One of their most notable features is that users carry the smartphone whenever they leave home. Analyzing the motion measured by such a device can be useful for improving lifestyle habits. Gait deserves attention as a representative behavior of daily living, as shown by the fact that there are many exercises intended to improve gait. Copyright 2013 ACM.
International conference proceedings, English
DOI URL

A fast handshake join implementation on FPGA with adaptive merging network
Yasin Oge; Takefumi Miyoshi; Hideyuki Kawashima; Tsutomu Yoshinaga
ACM International Conference Proceeding Series, No.44 (4 pages), 2013, Peer-reviewed, One of the critical design issues for implementing handshake-join hardware is result collection performed by a merging network. To address the issue, we introduce an adaptive merging network. Our implementation achieves over 3 million tuples per second when the selectivity is 0.1. The proposed implementation attains up to 5.2x higher throughput than the original handshake-join hardware. In this demonstration, we apply the proposed technique to filter out malicious packets from packet streams. To the best of our knowledge, our system is the fastest handshake join implementation on FPGA. Copyright © 2013 ACM.
International conference proceedings, English
DOI URL

Sharing Computing Resources with Virtual Machines by Transparent Data Access
Takuma Nakajima; Masato Yoshimi; Hidetsugu Irie; Tsutomu Yoshinaga
2013 FIRST INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, 359-365, 2013, Peer-reviewed, Cloud computing is growing rapidly in enterprise and academic areas. Computing platforms are making the transition from physical servers to virtual machines (VMs) in the cloud. Despite their many advantages, VMs still have several problems in making effective use of physical computing resources, especially many-core accelerators. Even though GPGPU is a promising solution for high-load applications, existing methods to utilize GPUs from VMs are subject to various restrictions. In order to solve this problem, we propose a flexible method to share external computing resources by providing transparent access to data in the VMs. By committing commands to a computing host which processes the jobs on its behalf, a VM can process high-load jobs as necessary even if the VM has a tiny configuration. The computing host mounts the working directories in the VMs and enqueues jobs committed by the VMs. Experimental results show that the overhead of our implementation is sufficiently small for processes with a low I/O load.
International conference proceedings, English
DOI URL

Using Cacheline Reuse Characteristics for Prefetcher Throttling
Hidetsugu Irie; Takefumi Miyoshi; Goki Honjo; Kei Hiraki; Tsutomu Yoshinaga
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E95D, 12, 2928-2938, Dec. 2012, Peer-reviewed, One of the significant issues of processor architecture is to overcome memory latency. Prefetching can greatly improve cache performance, but it has the drawback of cache pollution, unless its aggressiveness is properly set. Several techniques that have been proposed for prefetcher throttling use accuracy as a metric, but their robustness was not sufficient because of the variations in programs' working set sizes and cache capacities. In this study, we revisit prefetcher throttling from the viewpoint of data lifetime. Exploiting the characteristics of cache line reuse, we propose Cache-Convection-Control-based Prefetch Optimization Plus (CCCPO+), which enhances the feedback algorithm of our previous CCCPO. Evaluation results showed that this novel approach achieved a 30% improvement over no prefetching in the geometric mean of the SPEC CPU 2006 benchmark suite with 256 KB LLC, 1.8% over the latest prefetcher throttling, and 0.5% over our previous CCCPO. Moreover, it showed superior stability compared to related works, while lowering the hardware cost.
Scientific journal, English
DOI URL

Design and Implementation of a Handshake Join Architecture on FPGA
Yasin Oge; Takefumi Miyoshi; Hideyuki Kawashima; Tsutomu Yoshinaga
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E95D, 12, 2919-2927, Dec. 2012, Peer-reviewed, A novel design is proposed to implement highly parallel stream join operators on a field-programmable gate array (FPGA), by examining the handshake join algorithm for hardware implementation. The proposed design is evaluated in terms of the hardware resource usage, the maximum clock frequency, and the performance. Experimental results indicate that the proposed implementation can handle considerably high input rates, especially at low match rates. Results of a simulation conducted to optimize the size of the buffers included in the join and merge units give a new intuition regarding static and adaptive buffer tuning in handshake join.
Scientific journal, English
DOI URL

A Token-based Fully Photonic Network-on-Chip with Dynamic Wavelength Allocation
P. Qiu; C.A.D. Adi; H. Irie; T. Yoshinaga
Proc. of the International Workshop on Modern Science and Technology (IWMST 2012), 39-44, Aug. 2012
International conference proceedings, English

FLAT: A GPU programming framework to provide embedded MPI
Takefumi Miyoshi; Hidetsugu Irie; Keigo Shima; Hiroki Honda; Masaaki Kondo; Tsutomu Yoshinaga
ACM International Conference Proceeding Series, 20-29, 2012, Peer-reviewed, For leveraging multiple GPUs in a cluster system, it is necessary to assign application tasks to multiple GPUs and execute those tasks while appropriately using communication primitives to handle data transfer among GPUs. In current GPU programming models, communication primitives such as MPI functions cannot be used within GPU kernels. Instead, such functions should be used in the CPU code. Therefore, programmers must handle both GPU kernel and CPU code for data communications. This makes GPU programming and its optimization very difficult. In this paper, we propose a programming framework named FLAT which enables programmers to use MPI functions within GPU kernels. Our framework automatically transforms MPI functions written in a GPU kernel into runtime routines executed on the CPU. The execution model and the implementation of FLAT are described, and the applicability of FLAT in terms of scalability and programmability is discussed. We also evaluate the performance of FLAT. The result shows that FLAT achieves good scalability for intended applications. © 2012 ACM.
International conference proceedings, English
DOI URL

Design and implementation of a merging network architecture for handshake join operator on fpga
Yasin Oge; Takefumi Miyoshi; Hideyuki Kawashima; Tsutomu Yoshinaga
Proceedings - IEEE 6th International Symposium on Embedded Multicore SoCs, MCSoC 2012, 84-91, 2012, Peer-reviewed, A novel merging network architecture is proposed for a handshake join operator in order to achieve much higher data throughput than ever before. Handshake join is a highly parallelized algorithm for window-based stream joins. Result collection performed by a merging network is a significant design issue for the handshake join operator because the merging network becomes an overwhelming bottleneck for scalable performance. To address the issue, an adaptive merging network is proposed for hardware implementation of the algorithm. The proposed architecture is implemented on an FPGA and it is evaluated in terms of the hardware resource usage, the maximum clock frequency, and the performance. Experimental results demonstrate up to 16.3 times higher throughput than nested loops-style join implementation without dropping any tuples. To the best of our knowledge, this is the best performance for handshake join operator implemented on an FPGA. © 2012 IEEE.
International conference proceedings, English
DOI URL

Parallel Numerical Simulation of Visual Neurons for Analysis of Optical Illusion
Akira Egashira; Shunji Satoh; Hidetsugu Irie; Tsutomu Yoshinaga
2012 THIRD INTERNATIONAL CONFERENCE ON NETWORKING AND COMPUTING (ICNC 2012), IEEE, 130-136, 2012, Peer-reviewed, The detailed mechanism of optical illusions caused by visual neurons in the human brain is not well understood, and numerical simulation is helpful for analyzing the human visual system. This paper describes implementation techniques for parallel numerical simulation that help in understanding optical illusions by using a GPU-accelerated PC cluster. Our parallel acceleration techniques comprise the following three points. First, the input images of the numerical simulation are processed efficiently by dividing them among multiple computation nodes using MPI (Message Passing Interface). Second, convolution, which is the dominant computation for the optical flow, is accelerated by GPUs. Finally, a convolution algorithm specialized for analyzing optical illusions is proposed to speed up the simulation. Our experimental results give the interesting insight that the optical flow values for images that cause an optical illusion are quite different from those for images that do not. We also demonstrate that our implementation of the simulation works efficiently on the GPU-accelerated PC cluster.
International conference proceedings, English
DOI URL

STRAIGHT: Realizing a Lightweight Large Instruction Window by using Eventually Consistent Distributed Registers
Hidetsugu Irie; Daisuke Fujiwara; Kazuki Majima; Tsutomu Yoshinaga
2012 THIRD INTERNATIONAL CONFERENCE ON NETWORKING AND COMPUTING (ICNC 2012), IEEE, 336-342, 2012, Peer-reviewed, As the number of cores as well as the network size in a processor chip increases, the performance of each core is more critical for the improvement of the total chip performance. However, to improve the total chip performance, the performance per power or per unit area must be improved, making it difficult to adopt a conventional approach of superscalar extension. In this paper, we explore a new core structure that is suitable for manycore processors. We revisit prior studies of new instruction-level parallelism (ILP) and thread-level parallelism (TLP) architectures and propose our novel STRAIGHT processor architecture. By introducing the scheme of a distributed key-value store to the register file of clustered microarchitectures, STRAIGHT directly executes operations with large logical registers, which are written only once. By discussing the processor structure, microarchitecture, and code model, we show that STRAIGHT realizes both a large instruction window and lightweight rapid execution, while suppressing the hardware and energy cost. Preliminary estimation results are promising, and show that STRAIGHT improves single-thread performance by about 30% in the geometric mean over the SPEC CPU 2006 benchmark suite, without significantly increasing the power and area budget.
International conference proceedings, English
DOI URL

Throttling control for bufferless routing in on-chip networks
Yicheng Guan; Cisse Ahmadou Dit Adi; Takefumi Miyoshi; Michihiro Koibuchi; Hidetsugu Irie; Tsutomu Yoshinaga
Proceedings - IEEE 6th International Symposium on Embedded Multicore SoCs, MCSoC 2012, 37-44, 2012, Peer-reviewed, As the number of cores integrated on a single die grows, buffers consume significant energy and occupy chip area. Bufferless deflection routing, which eliminates the router's input-port buffers, can considerably help save energy and chip area while providing performance similar to that of existing buffered routing, especially for low-to-medium network loads. However, when congestion increases, bufferless routing frequently causes flit deflections and misrouting, leading to a degradation of network performance. In this paper, we propose IRT (Injection Rate Throttling), a local throttling mechanism that reduces deflection and misrouting for high-load bufferless networks. IRT provides injection rate control independently for each network node, which reduces network congestion. Our simulation results based on a cycle-accurate simulator show that IRT reduces average transmission latency by 8.65% compared to traditional bufferless routing. © 2012 IEEE.
International conference proceedings, English
DOI URL

Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster
Junichi Ohmura; Takefumi Miyoshi; Hidetsugu Irie; Tsutomu Yoshinaga
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E94D, 12, 2319-2327, Dec. 2011, Peer-reviewed, In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation, and achieve a performance improvement of 34% compared with the GPU-only case and 64% compared with the CPU-only case. For an entire 16-node cluster, each node of which is the same as the above and is connected with two gigabit Ethernet links, we use a computation-communication overlap scheme with GPU acceleration for the Linpack benchmark, and achieve a performance improvement of 28% compared with the GPU-accelerated high-performance Linpack benchmark (HPL) without overlapping. Our overlap GPU acceleration solution uses overlaps in which the main inter-node communication and data transfer to the GPU device memory are overlapped with the main computation task on the CPU cores. These overlaps use multi-core processors, which almost all of today's high-performance computers use. In particular, as well as using a CPU core for communication tasks, we also simultaneously use other CPU cores and the GPU for computation tasks. In order to enable overlap between inter-node communication and computation tasks, we eliminate their close dependence by breaking the main computation task into smaller tasks and rescheduling. Based on a scheme in which part of the CPU computation power is simultaneously used for tasks other than computation tasks, we experimentally find the optimal computation ratio for CPUs; this ratio differs from the case of parallel dgemm operation of one node.
Scientific journal, English
DOI URL

Realizing Window Join Operator by using FPGA
Takefumi MIYOSHI; Yuta TERADA; Hideyuki KAWASHIMA; Tsutomu YOSHINAGA
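IEICE Trans. on Communications, The Institute of Electronics, Information and Communication Engineers, J94-B, 10, 1313-1322, Oct. 2011, Peer-reviewed, This paper proposes a method and an architecture for implementing window joins over stream data on an FPGA. The proposed architecture achieves high performance by processing two data streams in parallel and pipelining that processing. We show that the proposed architecture implements a window join over 2^16 tuples and can process stream data generated at 1 ms intervals. We describe a problem in which output result tuples are dropped as a consequence of the higher performance, and show that the proposed admission control mechanism prevents this problem.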
Scientific journal, Japanese
URL
URL 2

An Efficient Path Setup for a Hybrid Photonic Network-on-Chip
C. A. D. Adi; H. Matsutani; M. Koibuchi; H. Irie; T. Miyoshi; T. Yoshinaga
International Journal of Networking and Computing, 1, 2, 244-259, Jul. 2011, Peer-reviewed
Scientific journal, English
URL
URL 2

A Dynamic Reconfigurable Processor Architecture for Stream Processing Engine
Takefumi Miyoshi; Yuta Terada; Hideyuki Kawashima; Tsutomu Yoshinaga
IPSJ TOD, Information Processing Society of Japan, 4, 2, 35-51, Jul. 2011, Peer-reviewed, A processor architecture for DR-SPE, a dynamically reconfigurable stream processing engine, is proposed. Using an SQL-like declarative query language, a stream processing engine can apply relational and arithmetic operations to stream data, that is, a flow of data that changes from moment to moment. DR-SPE is special-purpose hardware for stream processing that achieves both high processing performance, by exploiting parallelism in the target query, and fast query registration and switching of the execution order of operations at runtime. The operations available in DR-SPE are the same as those in Streams on Wires. In this paper, DR-SPE is implemented on an FPGA XC6VLX240T-1, and its query configuration time and processing performance are evaluated in real experiments. The results show that DR-SPE configures operations in 85 μs, far outperforming Streams on Wires, while achieving throughput comparable to that of Streams on Wires.
Scientific journal, Japanese
URL
URL 2

Prediction Router: A Low-Latency On-Chip Router Architecture with Multiple Predictors
Hiroki Matsutani; Michihiro Koibuchi; Hideharu Amano; Tsutomu Yoshinaga
IEEE TRANSACTIONS ON COMPUTERS, IEEE COMPUTER SOC, 60, 6, 783-799, Jun. 2011, Peer-reviewed, Multi and many-core applications are sensitive to interprocessor communication latencies, suggesting the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. The packets coming into the prediction routers are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing communication latency is the hit rates of the prediction algorithms, which vary based on network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that skip one or more pipeline stages use a bypass data path that is based on a static or single bypassing policy (e.g., accelerating the packets moving in the same dimension), our prediction router architecture predictively forwards packets based on the prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. Then, we present four case studies, each of which assumes a different many-core architecture. We implemented the prediction routers for each case study by using a 45 nm CMOS process, and evaluated them in terms of the prediction hit rate, zero-load latency, hardware amount, and energy consumption. A typical prediction router with two or three predictors shows that although the area and energy are increased by 4.8-12.0 percent and 5.3 percent, respectively, up to 89.8 percent of the prediction hit rate is achieved in real applications, which provides favorable trade-offs between modest hardware/energy overheads and significant latency saving.
Scientific journal, English
DOI URL

Control Mechanism with Virtual Remote Replication on 3 Data Center Storage System
N. Maki; Y. Hiraiwa; T. Imazu; T. Yoshinaga
Journal of the IPSJ, 52, 2, 1-13, Feb. 2011, Peer-reviewed
Scientific journal, Japanese

A coarse grain reconfigurable processor architecture for stream processing engine
Takefumi Miyoshi; Hideyuki Kawashima; Yuta Terada; Tsutomu Yoshinaga
Proceedings - 21st International Conference on Field Programmable Logic and Applications, FPL 2011, 490-495, 2011, Peer-reviewed, This paper proposes a processor architecture for DR-SPE, a dynamic reconfigurable stream processing engine. DR-SPE is special-purpose hardware for stream data processing, which achieves high processing performance by exploiting parallelism in the target query. It also handles query registration and execution order of operations at runtime. Available operations in DR-SPE are the same as those in Streams on Wires. In this paper, DR-SPE is implemented on an FPGA XC6VLX240T-1, and its performance is evaluated. The results of the evaluation show that DR-SPE achieves register modification within 506 μsec when the configuration path is driven at 1 Mbps, which is not achieved by Streams on Wires. DR-SPE also achieves flexibility and can support complicated queries by providing 10×10 operation units tiled onto an FPGA. DR-SPE achieves operation throughput comparable to that of Streams on Wires at the expense of requiring more LUTs. © 2011 IEEE.
International conference proceedings, English
DOI URL

Multi-GPU acceleration of optical flow computation in visual functional simulation
Junichi Ohmura; Akira Egashira; Shunji Satoh; Takefumi Miyoshi; Hidetsugu Irie; Tsutomu Yoshinaga
Proceedings - 2011 2nd International Conference on Networking and Computing, ICNC 2011, 228-234, 2011, Peer-reviewed, Numerical simulation of visual processing in the human brain is a time-consuming application. This paper presents acceleration techniques for a simulation program of visual processing. We parallelize the convolution calculations, which are the core operations of the simulation program, on a GPU-accelerated PC cluster. Our implementation includes three improvements. First, we consider efficient data mapping onto the global and shared memories of the GPU. Second, multiple convolutions for the same input data are computed by each node's GPU, referred to as package execution. Finally, an input 2-dimensional image is divided into regions and convolutions for these regions are executed in parallel utilizing MPI (Message Passing Interface). Our experimental results show a linear speedup up to 12 nodes in the PC cluster for the convolution program. We also show the effects of the package execution and reduced communication on NVIDIA Tesla C1060 and C2070, respectively. © 2011 IEEE.
International conference proceedings, English
DOI URL

CCCPO: Robust prefetcher optimization technique based on cache convection
Hidetsugu Irie; Takefumi Miyoshi; Goki Honjo; Kei Hiraki; Tsutomu Yoshinaga
Proceedings - 2011 2nd International Conference on Networking and Computing, ICNC 2011, 127-133, 2011, Peer-reviewed, One of the significant issues of processor architecture is overcoming memory latency. Prefetching can greatly improve cache performance; however, it has the drawback of cache pollution unless its aggressiveness is properly set. Although several techniques for prefetcher throttling have been proposed which use accuracy as a metric, their robustness was not sufficient due to the variations between program working set sizes and cache capacities. In this paper, we revisit cache behavior from the viewpoint of data lifetime in a cache with prefetching. Based on this observation, Cache-Convection-Control-based Prefetch Optimization (CCCPO) is proposed, which exploits the characteristics of cache line reuse and controls the prefetcher aggressiveness. Evaluation results showed that this novel approach achieved a 4.6% improvement over the most recent prefetcher throttling algorithms in the geometric mean of the SPEC CPU 2006 benchmark suite with a 256KB LLC. © 2011 IEEE.
International conference proceedings, English
DOI URL

An implementation of handshake join on FPGA
Yasin Oge; Takefumi Miyoshi; Hideyuki Kawashima; Tsutomu Yoshinaga
Proceedings - 2011 2nd International Conference on Networking and Computing, ICNC 2011, 95-104, 2011, Peer-reviewed, This paper presents an implementation of handshake join on a field-programmable gate array (FPGA). Handshake join is a stream join algorithm proposed by Teubner and Mueller. It supports very high degrees of parallelism and attains unprecedented throughput in order to achieve efficient support for window-based joins in streaming databases. In handshake join, it is necessary to take into account the problems regarding the capacity of the output channel and the limitation of the internal buffer sizes, in order to apply the join operation to input tuples efficiently and correctly. However, the implementation has not been clarified in detail in their paper. In this paper, to solve these issues, we propose a merging network and an admission controller. Then we evaluate the architecture in terms of hardware resource usage, maximum clock frequency, and operation performance. © 2011 IEEE.
International conference proceedings, English
DOI URL

Parallel matrix-matrix multiplication based on HPL with a GPU-accelerated PC cluster
Qin Wang; Junichi Ohmura; Shan Axida; Takefumi Miyoshi; Hidetsugu Irie; Tsutomu Yoshinaga
Proceedings - 2010 1st International Conference on Networking and Computing, ICNC 2010, 243-248, 2010, Peer-reviewed, In this paper, we propose an approach for significantly improving the performance of parallel matrix-matrix multiplication using a GPU-accelerated cluster. For one node, we implement a CPUs-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation and achieve a performance improvement of 32% as compared to the GPU-only case and 56% as compared to the CPUs-only case. For the entire cluster, we use the overlap GPU acceleration solution to high-performance Linpack (HPL), which eliminates the close dependency between the LU decomposition and the dgemm operation, and achieve a performance improvement of 17% as compared to the flat GPU acceleration case. © 2010 IEEE.
International conference proceedings, English
DOI URL

CODIE: Continuation-based overlapping data-transfers with instruction execution
Takefumi Miyoshi; Kenji Kise; Hidetsugu Irie; Tsutomu Yoshinaga
Proceedings - 2010 1st International Conference on Networking and Computing, ICNC 2010, 71-77, 2010, In this paper, a runtime system termed CODIE is proposed to execute the sequential parts of programs efficiently on a many-core architecture. All independent processing elements in a many-core architecture use a shared network and off-chip memory. Therefore, contention for such resources substantially degrades the system performance. On the CODIE system, when a cache miss occurs, the system first initiates a data transfer operation. Next, the system creates a continuation for executing the instructions related to the missing data. The continuation is stored in a buffer, and the instructions not related to the missing data are executed subsequently. In other words, data transfer and instruction execution can be performed simultaneously. In this way, the overhead of updating the cache entry (increased by memory access contention) is tolerated. The evaluation results show that the proposed CODIE system realizes a 1.86x speedup of the execution of the sequential write/read program on the M-Core architecture at 36 cores and a 1.97x speedup of the execution of blackscholes (from the PARSEC benchmark suite) on the Cell/BE processor with 6 SPEs. © 2010 IEEE.
International conference proceedings, English
DOI URL

OREX: An Optical Ring with Electrical Crossbar Hybrid Photonic Network-on-Chip
Cisse Ahmadou Dit Adi; Ping Qiu; Hidetsugu Irie; Takefumi Miyoshi; Tsutomu Yoshinaga
2010 PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS (IWIA 2010), IEEE, 3-10, 2010, Peer-reviewed, The role of the network-on-chip (NoC) is becoming more important as the number of processing elements (PEs) integrated onto a single chip increases. Lowering power consumption while providing high-performance communication is a challenging problem for the design of future NoCs. In this paper we propose OREX, which is a hybrid NoC consisting of an optical ring and an electrical crossbar central router. OREX takes advantage of both state-of-the-art electrical and optical technology designs to deliver a high-data-rate NoC at an acceptable power consumption cost. Using a cycle-accurate simulator, we evaluate the proposed hybrid NoC. Simulation experiments show that OREX offers slightly better communication performance in terms of bandwidth and power consumption compared to a conventional hybrid photonic torus network.
International conference proceedings, English
DOI URL

An efficient path setup for a hybrid photonic Network-on-Chip
Cisse Ahmadou Dit Adi; Hiroki Matsutani; Michihiro Koibuchi; Hidetsugu Irie; Takefumi Miyoshi; Tsutomu Yoshinaga
Proceedings - 2010 1st International Conference on Networking and Computing, ICNC 2010, 156-161, 2010, Electrical Network-on-Chip (NoC) faces critical challenges in meeting the high performance and low power consumption requirements for future multicore processor interconnection. Recent tremendous advances in CMOS-compatible optical components give photonics the potential to deliver efficient NoC performance at an acceptable energy cost. However, the lack of in-flight processing and buffering of optical data makes the realization of a fully optical NoC complicated. A hybrid architecture which uses optical high-bandwidth transfer and a tiny electrical control network can take advantage of both interconnection methods to offer an efficient performance-per-watt infrastructure to connect multicore processors and System-on-Chip (SoC). In this paper, we propose a hybrid photonic torus NoC (HPNoC) that uses predictive switching to improve the performance of a hybrid architecture. By using prediction techniques, we can reduce the path setup latency for the electrical control network, hence improving the overall end-to-end delay for communication in the HPNoC. Simulation results using a cycle-accurate simulator under uniform, neighbor and bit-reversal traffic patterns for 64 nodes show that predictive switching considerably improves the HPNoC overall performance. © 2010 IEEE.
International conference proceedings, English
DOI URL

Evaluation of Prediction Router for Low-Latency On-Chip Networks
Hiroki Matsutani; Michihiro Koibuchi; Hideharu Amano; Tsutomu Yoshinaga
IPSJ Trans. on Advanced Computing Systems, 2, 3, 26-38, Sep. 2009, Peer-reviewed
Scientific journal, Japanese

Prediction Router: Yet Another Low Latency On-Chip Router Architecture
Hiroki Matsutani; Michihiro Koibuchi; Hideharu Amano; Tsutomu Yoshinaga
HPCA-15 2009: FIFTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, IEEE COMPUTER SOC, 367-+, 2009, Peer-reviewed, Network-on-Chips (NoCs) are quite latency sensitive, since their communication latency strongly affects the application performance on recent many-core architectures. To reduce the communication latency, we propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration. In the prediction routers, incoming packets are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing the communication latency is the hit rates of the prediction algorithms, which vary with the network environment, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that speculatively skip one or more pipeline stages use a bypass datapath for specific packet transfers (e.g., packets moving on the same dimension), our prediction router predictively forwards packets based on a prediction algorithm selected from several candidates in response to the network environment. In this paper, we analyze the prediction hit rates of six prediction algorithms on meshes, tori, and fat trees. Then we provide three case studies, each of which assumes a different many-core architecture. We have implemented a prediction router for each case study by using a 65nm CMOS process, and evaluated them in terms of the prediction hit rate, zero-load latency, hardware amount, and energy consumption. The results show that although the area and energy are increased by 6.4-15.9% and 8.0-9.5% respectively, up to 89.8% of the prediction hit rate is achieved in real applications, which provides favorable trade-offs between the modest hardware/energy overheads and the latency saving.
International conference proceedings, English

Network Reconfiguration Protocols for Fault-Tolerant and Adaptive Deadlock-Recovery Routing
T. Yoshinaga; Y. Nishimura
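Trans. on IEICE, D, The Institute of Electronics, Information and Communication Engineers, 91-D, 12, 2881-2891, Dec. 2008, A scheme that switches the routing function at runtime through dynamic network reconfiguration has been proposed as a fault-tolerant/adaptive routing algorithm for parallel and distributed computers. This paper proposes a dynamic network reconfiguration protocol that sets a limit on the number of packets that may be ejected at a single PE (Processing Element) during network reconfiguration, and continues injecting packets into the network until the number of ejected packets reaches that limit. This reduces the time packets wait for injection during network reconfiguration. The target routing algorithm is fault-tolerant/adaptive deadlock-recovery routing for k-ary n-cubes. We simulated a 16-ary 2-cube network that uses up*/down* routing and L-turn routing for deadlock recovery. We show that, compared with conventional static and injection-stopping protocols, the proposed protocol contributes to maintaining throughput during network reconfiguration and to lowering latency after network reconfiguration.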
Scientific journal, Japanese
URL

Remote Sharing Support for DLNA Appliances with Rule-Based Access Control Functions
Daigo Mutoh; Tsutomu Yoshinaga
Journal of IPSJ, Information Processing Society of Japan, 49, 12, 3985-3996, Dec. 2008, Peer-reviewed, We developed software named the wormhole device (WD), which supports remote connection of DLNA equipment between two home networks. WD is interoperable with existing DLNA products and household UPnP broadband routers. It utilizes a SIP server to establish remote connections, assisting NAT traversal for popular home network environments. It also supports access control functions for sharing remote DLNA equipment and its contents based on rules specified by users. We constructed multiple home network environments using commercially available DLNA equipment, household UPnP routers, and consumer internet connection services, with different DLNA-enabled devices in each home. The experiments examined both remote connection and content sharing. The results show that WD realizes safe and easy remote content sharing as well as access control with acceptable latency.
Scientific journal, Japanese
URL
URL 2

A Low-Latency Network-on-chip using Predictive Routers
M. Koibuchi; T. Yoshinaga; K. Murakami; H. Matsutani; H. Amano
Trans. on IPSJ (ACS), 1, 2, 59-69, Aug. 2008, Peer-reviewed
Scientific journal, Japanese

Latency Reduction Utilizing Dynamic Communication Prediction in 2-D Tori
T. Yoshinaga; H. Murakami; M. Koibuchi
Trans. on Advanced Computing Systems, 1, 1 (ACS22), 28-39, May 2008, Peer-reviewed
Scientific journal, Japanese

The QC-2 parallel Queue processor architecture
Ben A. Abderazek; Arquimedes Canedo; Tsutomu Yoshinaga; Masahiro Sowa
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, ACADEMIC PRESS INC ELSEVIER SCIENCE, 68, 2, 235-245, Feb. 2008, Peer-reviewed, A queue-based instruction set architecture processor offers an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor. A prototype implementation is produced by synthesizing the high-level model for a target FPGA device. We present the architecture description and design results in a fair amount of detail. (c) 2007 Elsevier Inc. All rights reserved.
Scientific journal, English
DOI URL

Impact of predictive switching in 2-D torus networks
Tsutomu Yoshinaga; Hirokazu Murakami; Michihiro Koibuchi
INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, IEEE COMPUTER SOC, 11-+, 2007, Peer-reviewed, Predictive switching is a technique for reducing message latency in parallel computer networks. It tries to decide traversal paths of messages by utilizing a prediction mechanism so that processing time for message headers can be shortened. A key issue of predictive switching is the overhead of prediction failures. This paper presents simple and efficient treatments of prediction failures. Our proposal includes three schemes. The first scheme is arranging predictive and non-predictive routers in a network to safely detect and discard mis-predicted packets. The second is additional hardware to reduce occurrences of mis-predicted packets. The third scheme is to shorten the mis-predicted packets. We show the impact of predictive switching embodying the three schemes for k-ary 2-cubes (k = 8, 16, 32) with dimension-order routing. Our simulation results demonstrate that we can reduce average message latency by minimizing the prediction failure overhead. Network saturation throughput is also improved when the predictor's accuracy is high.
International conference proceedings, English
DOI URL

Mathematical model for multiobjective synthesis of NoC architectures
Ben A. Abderazek; Mushfiquzzaman Akanda; Tsutomu Yoshinaga; Masahiro Sowa
Proceedings of the International Conference on Parallel Processing Workshops, CD, 2007, Peer-reviewed, Network-on-Chip (NoC) interconnections have been proposed to overcome the problems associated with long wires used in chip-wide communications. They support asynchronous transfer of communication between cores within multicore systems-on-chips (MCSoCs). The design of such architectures is crucial for achieving high-performance and energy-efficient systems. However, the effectiveness of NoC-based design depends on the adopted design methodology. An automatic design approach is highly desirable to increase system design productivity. This paper presents a new mathematical formulation for synthesizing application-specific NoC architectures, such that the performance constraints are satisfied and the communication power consumption is minimized. © 2007 IEEE.
International conference proceedings, English
DOI URL

High-level modeling and FPGA prototyping of produced order parallel queue processor core
Ben A. Abderazek; Tsutomu Yoshinaga; Masahiro Sowa
JOURNAL OF SUPERCOMPUTING, SPRINGER, 38, 1, 3-15, Oct. 2006, Peer-reviewed, Emerging high-level hardware description and synthesis technologies in conjunction with field programmable gate arrays (FPGAs) have significantly lowered the threshold for hardware development. Opportunities exist to integrate these technologies into a tool for exploring and evaluating microarchitectural designs, especially for newly proposed architectures. This paper presents the prototyping of a new processor core based on the Queue architecture as a starting point for application-specific processor design exploration. Using a hardware description language, we have created a synthesizable model of a produced order parallel queue processor core for the integer subset of the parallel Queue architecture. A prototype implementation is produced by synthesizing the high-level model for the Stratix FPGA prototyping board. We show how to perform prototyping and optimizations to fully exploit the capabilities of the prototyped Queue processor core, while maintaining a common source base.
Scientific journal, English
DOI URL

Improving Linpack Performance on SMP Clusters with Asynchronous MPI Programming
Ta Quoc Viet; Tsutomu Yoshinaga
IPSJ Trans. ACS, Information Processing Society of Japan (IPSJ), 47, SIG 12 (ACS 15), 340-348, Sep. 2006, Peer-reviewed, This study proposes asynchronous MPI, a simple and effective parallel programming model for SMP clusters, to reimplement the High Performance Linpack benchmark. The proposed model forces processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. As a result, we can achieve significant improvements in performance with minimal programming effort. In comparison with a de facto flat MPI solution, our algorithm can yield a 20.6% performance improvement for a 16-node cluster of Xeon dual-processor SMPs.
Scientific journal, English

Predictive switching in 2-D torus routers
Tsutomu Yoshinaga; Shojiro Kamakura; Michihiro Koibuchi
INTERNATIONAL WORKSHOP ON INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH PERFORMANCE PROCESSORS AND SYSTEMS, IEEE COMPUTER SOC, 65-72, 2006, This paper proposes predictive switching in 2-D torus routers to reduce the number of pipeline stages for low-latency communication. By utilizing the communication regularity in parallel applications, a dynamic predicting mechanism presets packet traversal paths inside the router before packet arrivals. Hence, we can bypass the pipeline stages of routing computation, virtual channel allocation and switch allocation when the prediction hits. We considered the predictor architecture and accuracy for several traffic patterns in NAS parallel benchmarks. Our experiments show that a sampled pattern matching (SPM) predictor achieves 77% to 96% of the prediction hit rates when we use the dimension-order routing algorithm. We also discuss a method to improve the prediction accuracy of SPM by examining the frequency of occurrence for the prediction values in the communication history.
International conference proceedings, English

A partial irregular-network routing on faulty k-ary n-cubes
Michihiro Koibuchi; Tsutomu Yoshinaga; Yasuhiko Nishimura
INTERNATIONAL WORKSHOP ON INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH PERFORMANCE PROCESSORS AND SYSTEMS, IEEE COMPUTER SOC, 57-64, 2006, Interconnection networks have been studied to connect a number of processing elements on parallel computers. Their design increasingly includes a challenge to high fault-tolerance, as entire systems become complicated. This paper presents a partial irregular-network routing in order to provide a high fault-tolerance in k-ary n-cube networks. Since an irregular-network routing usually performs poorly in k-ary n-cube networks, it is only used for progressive deadlock-recovery, and avoiding hard failures. The network is logically divided into the fault and regular regions. In the regular region, most packets are transferred along fully adaptive paths that are computed, assuming that there are no hard failures, so as to uniformly distribute the traffic. Simulation results show that the proposed routing achieves the same throughput as that of Duato's protocol under no hard failures. As the number of faulty links increases to up to 8 on 256 nodes, its throughput is only decreased by 15%. Moreover the throughput of the proposed deadlock recovery routing is almost maintained during a dynamic reconfiguration.
International conference proceedings, English

Scalable core-based methodology and synthesizable core for systematic design environment in multicore SoC (MCSoC)
Ben A. Abderazek; Tsutomu Yoshinaga; Masahiro Sowa
2006 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, PROCEEDINGS, IEEE COMPUTER SOC, 345-+, 2006, Peer-reviewed, The strong demand for complex and high performance embedded system-on-chip requires quick turn around design methodology and high performance cores. Thus, there is a clear need for new methodologies supporting efficient and fast design of these systems on complex platforms implementing both hardware and software modules.
In this paper, we describe a novel scalable core-based methodology for a systematic design environment of application specific heterogeneous multicore systems-on-chip (MC-SoC). We also developed a high performance 32-bit Synthesizable QueueCore (QC-2) with single precision floating point support. The core is targeted for special purpose applications within our target MCSoC system. We present the architecture description and design results in a fair amount of detail.
International conference proceedings, English

Asynchronous Parallel Programming Model for SMP Clusters
Ta Quoc Viet; Tsutomu Yoshinaga
Proc. of the IASTED Int. Conf. on Parallel and Distributed Computing Systems (PDCS 2005), 466-070, 6 pages in CD, Nov. 2005, Peer-reviewed
International conference proceedings, English

Parallel queue processor architecture based on produced order computation model
M Sowa; BA Abderazek; T Yoshinaga
JOURNAL OF SUPERCOMPUTING, SPRINGER, 32, 3, 217-229, Jun. 2005, Peer-reviewed, This paper proposes a novel produced order parallel queue processor architecture. To store intermediate results, the proposed system uses first-in-first-out (FIFO) circular queue-registers instead of random access registers. Datum is inserted in the queue-registers in produced order scheme and can be reused. We show that this feature has profound implications in the areas of parallel execution, program compactness, hardware simplicity and high execution speed.
Our performance evaluations show a significant performance improvement (e.g., 10 to 26% decrease in program size and 6 to 46% decrease in execution time over a range of benchmark programs) when compared with the earlier proposed architecture.
Scientific journal, English

Construction of Hybrid MPI-OpenMP Solutions for SMP Clusters
Ta Quoc Viet; Tsutomu Yoshinaga; Ben A. Abderazek; Masahiro Sowa
Transactions on Advanced Computing Systems, Information Processing Society of Japan (IPSJ), 46, ACS8, 25-37, Jan. 2005, Peer-reviewed, This paper proposes a middle-grain approach to construct hybrid MPI-OpenMP solutions for SMP clusters from an existing MPI algorithm. Experiments on different cluster platforms show that our solutions exceed the solutions that are based on the de-facto MPI model in most cases, and occasionally by as much as 40% in performance. We also prove an automatic outperformance of a thread-to-thread communication model over a traditional process-to-process communication model in hybrid solutions. In addition, the paper performs a detailed analysis of the hardware and software factors affecting the performance of MPI in comparison to hybrid models.
English

Modular design structure and high-level prototyping for novel embedded processor core
BA Abderazek; S Kawata; T Yoshinaga; M Sowa
EMBEDDED AND UBIQUITOUS COMPUTING - EUC 2005, SPRINGER-VERLAG BERLIN, 3824, 340-349, 2005, Peer-reviewed, In this research work, we present a high-level prototyping of a new processor core based on Queue architecture as a starting point for application-specific processor design exploration. Using a modular design structure with control logic implemented as a set of communicating state machines, we show hardware emulation and optimization results of a parallel queue processor architecture (QueueCore). We also show how to fully exploit the capabilities of the designed QueueCore, while maintaining a common source base. From the evaluation results, we show that the QueueCore prototype fits on a single conventional FPGA device, thereby obviating the need to perform multi-chip partitioning which results in a loss of resource efficiency.
Scientific journal, English

Performance evaluation of dynamic network reconfiguration using Detour-UD routing
T Yoshinaga; Y Nishimura
INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, IEEE COMPUTER SOC, 110-118, 2005, Fault-tolerance is an emerging issue for massively parallel computers. This paper describes the performance impact of dynamic network reconfiguration protocols using a fault-tolerant, adaptive deadlock-recovery routing algorithm, Detour-UD, for k-ary n-cubes. We propose a scheme to specify unroutable packets by managing drain-flags in routing tables. We also propose two selective drainage protocols. One protocol drains the unroutable packets specified by the drain-flags after the reconfiguration process. The other protocol drains deadlocked packets to reduce the network load during the reconfiguration process. Our simulation results show that the first protocol helps reduce the number of drainage packets, and the second one keeps the network throughput during the reconfiguration process.
International conference proceedings, English

Fault-Tolerant Adaptive Deadlock-Recovery Routing for k-ary n-cube Networks
T. Yoshinaga; H. Hosogoshi; M. Sowa
IPSJ Transactions on Advanced Computing Systems, IPSJ, 45, SIG 11(ACS7), 408-419, Oct. 2004, Peer-reviewed
Scientific journal, Japanese

High Performance Hybrid Processor Architecture with Efficient Hardware Usability
Akanda Md. Musfiquzzaman; Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
Proceedings on International Workshop on Modern Science and Technology, 43-46, Sep. 2004
International conference proceedings, English

Design of Producer-Order Parallel Queue Processor Architecture
A. Markovskij; B.A. Abderazek; S. Shigeta; T. Yoshinaga; M. Sowa
Proceedings on International Workshop on Modern Science and Technology, 25-28, Sep. 2004
International conference proceedings, English

QJava: Integrate Queue Computational Model into Java
S. Shigeta; L.-Q. Wang; N. Yagishita; B. A. Abderazek; T. Yoshinaga; M. Sowa
Proc. of the Joint Japan-Tunisia Workshop on Computer Systems and Information Technology (JT-CSIT'04), 60-65, Jul. 2004
International conference proceedings, English

Optimization for Hybrid MPI-OpenMP Programs on a Cluster of SMP PCs
Tsutomu Yoshinaga; Ta Quoc Viet
Proc. of the Joint Japan-Tunisia Workshop on Computer Systems and Information Technology (JT-CSIT'04), 28-35, Jul. 2004
International conference proceedings, English

Theoretical Evaluation of Simultaneous Multithreading Parallel Queue Processor Architecture
Hirotoshi Sasaki; Yoshitomo Okumura; Ben Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
International Conference on Circuits/Systems, Computers and Communications, 6D1L-2-1～4, Jul. 2004
International conference proceedings, English

Fault-tolerant adaptive deadlock-recovery routing for k-ary n-cube networks
T Yoshinaga; H Hosogoshi; M Sowa
INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, PROCEEDINGS, IEEE COMPUTER SOC, 49-58, 2004, Peer-reviewed, This paper proposes a fault-tolerant fully adaptive deadlock-recovery routing algorithm for k-ary n-cube networks. We intend to consider both the adaptability for faults and the communication performance by integrating regular and irregular network routing. Our algorithm tolerates any number or shape of faults without disabling fault-free nodes by maintaining routing tables that are configured based on faulty information. Our algorithm also provides minimal misrouting paths around faults by guaranteeing deadlock freedom using only two virtual channels per physical channel. Simulation results show that the proposed algorithm attains robust communication performance for uniform and nonuniform traffic patterns not only on a fault-free torus network but also on irregular tori with faulty nodes.
International conference proceedings, English

Queue processor architecture for novel queue computing paradigm based on produced order scheme
BA Abderazek; M Arsenji; S Shigeta; T Yoshinaga; M Sowa
SEVENTH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND GRID IN ASIA PACIFIC REGION, PROCEEDINGS, IEEE COMPUTER SOC, 169-177, 2004, Peer-reviewed, This paper proposes a novel produced order parallel queue processor architecture. To store intermediate results, the proposed system uses FIFO queue registers instead of random access registers. Datum is inserted in the queue in produced order scheme and can be reused. We will show that this feature has a profound implication in the areas of parallel execution, program compactness, hardware simplicity and high execution speed. Our preliminary performance evaluations have shown a significant performance improvement (e.g., 10% to 26% decrease in program size and 6% to 46% decrease in execution time over a range of benchmark programs) when compared with the earlier proposed architecture.
International conference proceedings, English

QJAVAC: Queue-Java Compiler Design for High Parallelism Queue Java Bytecode
Li-Qiang Wang; Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
International Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2003), 900-903, Jul. 2003, Peer-reviewed
International conference proceedings, English

Architectural Issues in the Design of a High Performance Parallel Queue Processor
Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
4th Bilateral Symposium on Science & Technology, Apr. 2003
International conference proceedings, English

Design and evaluation of a fault-tolerant adaptive router for parallel computers
T Yoshinaga; H Hosogoshi; M Sowa
INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, IEEE COMPUTER SOC, 100-107, 2003, In this paper, we propose a design methodology for fault tolerant adaptive routers for parallel and distributed computers. The key idea of our method is integrating minimal and non-minimal routing that is supported by independent virtual channels (VCs). Distinguishing the routing functions for each set of VCs simplifies the design of fault-tolerant algorithms. After describing the method, we show an application of a routing algorithm for two-dimensional mesh and torus networks. This algorithm, called Detour-NF, supports three routing modes: deterministic, minimal fully adaptive and non-minimal fault-tolerant operations. We also discuss the hardware cost and operational speed of minimal and non-minimal routers based on our design, which uses hardware description language (HDL).
Communication performance and fault-tolerance are demonstrated by an HDL simulation. The experimental results show that supporting both minimal and non-minimal routing modes is advantageous for high-bandwidth and low-latency communication, as well as fault-tolerance.
International conference proceedings, English

On the design of a register queue based processor architecture (FaRM-rq)
BA Abderazek; S Shigeta; T Yoshinaga; M Sowa
PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 2745, 248-262, 2003, Peer-reviewed, We propose in this paper a processor architecture that supports multiple instruction sets through a run time functional assignment algorithm (RUNFA). The processor, which is named Functional Assignment Register Microprocessor (FaRM-rq), supports queue and register based instruction set architectures and functions in two different modes: (1) R-mode (FRM), when switched for register based instruction support, and (2) Q-mode (FQM), when switched for Queue based instruction support. The entities share a common data path and may operate independently though not in parallel.
In FRM mode, the machine's shared storage unit (SSU) behaves as a conventional register file. However, in FQM mode, the system organizes the SSU access as first-in-first-out latches, thus accesses concentrate around a small window and the addressing of registers is implicit through the Queue head and tail pointers.
First, we present the novel aspects of the FaRM-rq architecture. Then, we give the novel FQM fundamentals and the principles underlying the architecture.
Scientific journal, English

Proposal and Design of a Parallel Queue Processor Architecture (PQP)
M. Sowa; B. A. Abderazek; S. Shigeta; K. Nikolova; T. Yoshinaga
Proc. 14th IASTED International Conference on Parallel and Distributed Computing and Systems, 554-560, Oct. 2002, Peer-reviewed
International conference proceedings, English

Complexity Analysis of a Functional Assignment Register Microprocessor
Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
International Workshop on Modern Science and Technology (IWMST2002), 116-123, Sep. 2002
International conference proceedings, English

High parallelism Java Compiler with Queue Architecture
Li-Qiang Wang; Tsutomu Yoshinaga; Masahiro Sowa
International Workshop on Modern Science and Technology (IWMST02), 130-135, Sep. 2002
International conference proceedings, English

A scalable FPGA-based custom computing machine for a medical image processing
T Yokota; M Nagafuchi; Y Mekada; T Yoshinaga; K Ootsu; T Baba
10TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, IEEE COMPUTER SOC, 307-308, 2002, The concentration index filter is a kind of spatial image filter, and its typical application is diagnosis from medical images. This paper presents a dedicated computing engine for concentration index filtering. The original algorithm is modified to extract full parallelism, and the data width is optimized to maximize clock speed and minimize hardware scale. Evaluation results reveal that the system runs 100 times faster than a current workstation and enables real-time diagnosis.
International conference proceedings, English

Real-time medical diagnosis on a multiple FPGA-based system
T Yokota; M Nagafuchi; Y Mekada; T Yoshinaga; K Ootsu; T Baba
FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 2438, 1088-1091, 2002, Peer-reviewed, The concentration index is a novel characteristic measurement that indicates the degree of concentration of lines to a certain point. Its typical application is medical diagnosis; for example, gastric cancer has a distinctive nature that folds concentrate to the lesion. Its large computational complexity requires much computing time. This paper presents a multiple FPGA-based computing architecture which accelerates the concentration index calculation and enables real-time diagnosis of gastric cancer. Evaluation results reveal that gate and pin counts are within those of today's FPGA devices, and that the diagnosis process should be accelerated to about 100 times faster than ordinary workstations.
Scientific journal, English

High Speed Receiving Process of MPI by Receiving Message Prediction
Y. Iwamoto; R. Adachi; K. Ootsu; T. Yoshinaga; T. Baba
IPSJ Journal, IPSJ, 42, 4, 812-820, Apr. 2001, Peer-reviewed
Scientific journal, Japanese

Performance Evaluation of Adaptive Routers Based on the Number of Virtual Channels and Operating Frequencies
M. Horita; T. Yoshinaga; K. Ootsu; T. Baba
IPSJ Journal, IPSJ, 42, 4, 714-723, Apr. 2001, Peer-reviewed
Scientific journal, Japanese

Design and evaluation of speculative multithreading with selective multi-path execution
K. Ootsu; T. Yoshinaga; T. Baba
Proceedings - 15th International Parallel and Distributed Processing Symposium, IPDPS 2001, Institute of Electrical and Electronics Engineers Inc., 1409-1416, 2001, Peer-reviewed, Thread Level Parallelism (TLP) is the most promising way to the future high-performance microprocessors. When a sequential program code is speculatively executed in a multithreaded manner, it is natural that each speculative thread follows the control flow of the program. Based on this background, various studies have been performed on speculative multithreading, following control flow graph. This paper proposes a new execution model that aims at the speedup of the execution of usual sequential program codes by speculatively executing the multiple control flows in parallel. Further, this paper shows the thread control mechanism and the inter-thread communication facility required for the realization of the model. For the evaluation of our model, the cycle-based instruction-level simulator has been developed. We evaluate our model with simple benchmark programs and show the effectiveness of our model especially, for the case where the speculation by single control flow is difficult to speedup the programs.
International conference proceedings, English

Prediction Methodologies for Receiving Message Prediction
Y. Iwamoto; K. Ootsu; T. Yoshinaga; T. Baba
IPSJ Journal, IPSJ, 41, 09, 2582-2591, Sep. 2000, Peer-reviewed
Scientific journal, Japanese

Performance Evaluation of the Recover-X Adaptive Router for 2D Torus Networks
T. Yoshinaga; M. Hayashi; M. Horita; S. Nakamura; K. Ootsu; T. Baba
Proceedings of the World Multiconference on Systemics, Cybernetics and Informatics 2000, 4, 107-112, Aug. 2000, Peer-reviewed
International conference proceedings, English

Recover-X Adaptive Routing
T. Yoshinaga; M. Hayashi; M. Horita; S. Nakamura; K. Ootsu; T. Baba
Transactions of Information Processing Society of Japan, IPSJ, 41, 5, 1360-1369, May 2000, Peer-reviewed
Scientific journal, Japanese

Design, Implementation and Evaluation of a Parallel Object-Oriented Language A-NETL
Takanobu Baba; Tsutomu Yoshinaga; Yoshiyuki Iwamoto; Kanemitsu Ootsu
Parallel and Distributed Computing Practices, 3, 2, 199-219, 2000, Peer-reviewed
Scientific journal, English

Efficient Implementation of a Parallel Object-Oriented Language A-NETL on Multicomputers
Takanobu Baba; Tsutomu Yoshinaga; Yoshiyuki Iwamoto; Somchai Numprasertchai; Norihito Saitoh; Kanemitsu Ootsu; Mitsutoshi Hori
Proc. France-Japan Workshop on Object-Based Parallel and Distributed Computation, Object-Oriented Parallel and Distributed Programming, 75-93, 2000
International conference proceedings, English

Recover-x: An adaptive router with limited escape channels
T Yoshinaga; M Hayashi; M Horita; S Nakamura; K Ootsu; T Baba
SEVENTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, PROCEEDINGS, IEEE COMPUTER SOC, 272-279, 2000, Peer-reviewed, In order to improve network performance, a variety of adaptive routing algorithms has been proposed. Recent research focuses on their implementation costs, as well as the performance to enhance their practical applications. This paper proposes the Recover-x adaptive routing, which limits escape message candidates in a blocked or deadlocked configuration. This limitation simplifies the routing logic and offers a chance to balance the usage between adaptive and non-adaptive channels.
The cost and performance of four wormhole routers based on Verilog-HDL designs were compared. Synthesis results for the chosen gate array technology show that the Recover-x router attains a fast operating speed and low-latency, with high-bandwidth communication performance.
International conference proceedings, English

Efficient Implementation Techniques and the Performance of a Parallel Object-Oriented Language A-NETL
T. Baba; T. Yoshinaga; Y. Iwamoto; N. Saitou; S. Numprasertchai
IPSJ Journal, IPSJ, 40, 9, 3554-3563, 1999, Peer-reviewed
Scientific journal, Japanese

Message Prediction and Speculative Execution of the Reception Process
Y. Iwamoto; K. Ootsu; T. Yoshinaga; T. Baba
Proc. IASTED International Conference on Parallel and Distributed Computing and Systems '99, 329-334, 1999, Peer-reviewed
International conference proceedings, English

A Performance Evaluation Method Using Virtual Time on the A-NET Multicomputer and Its Implementation
Y. Iwamoto; D. Abe; K. Ootsu; T. Yoshinaga; T. Baba
IPSJ Journal, IPSJ, 40, 5, 1947-1957, 1999, Peer-reviewed
Scientific journal, Japanese

A Parallel Navigation Algorithm with Dynamic Load Balancing for OODBMSs
L. Mutenda; T. Baba; T. Yoshinaga; K. Ootsu
Trans. on IPSJ Database Systems, 40, SIG5(TOD2), 29-42, 1999, Peer-reviewed
Scientific journal, English

A speculative multithreading with selective multi-path execution
K Ootsu; W Yoshinari; F Furukawa; T Yoshinaga; T Baba
INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, IEEE COMPUTER SOC, 46-52, 1999, Peer-reviewed, Recent microprocessors' performance has been improved by their high-speed clock frequency and by their exploiting instruction-level parallelism (ILP). Physical limitations of clock speed and semantical limitations of control dependencies impede the improvement of performance. To overcome this difficulty, it is indispensable to make use of the thread-level parallelism. This paper proposes a speculative thread execution model that aims at a speed-up of sequential program execution by selective multi-path thread execution.
International conference proceedings, English

Prior-dimension Specification on output Channel Selection for Adaptive Routers
T. Yoshinaga; M. Hayashi; M. Horita; Y. Yamaguchi; K. Ootsu; T. Baba
Transactions of Information Processing Society of Japan, IPSJ, 40, 5, 1958-1967, 1999, Peer-reviewed, We propose a new adaptive routing method which is capable of selecting, based on a prioritizing system, a particular dimension to output each message. We have compared its hardware cost and performance based on HDL designs. The results of HDL synthesis and simulation lead to the following conclusions: (1) The dimension-selective routing can be supported inexpensively compared with adding virtual channels; (2) Adaptive routers which support communication scheduling are effective in improving network performance; (3) The ability to restrict adaptive routing is useful not only in maintaining in-order message delivery but also balancing the overall network load for uniform communication traffic.
Scientific journal, Japanese

The A-NET working prototype: A parallel object-oriented multicomputer with reconfigurable network
T Baba; T Yoshinaga; Y Iwamoto; D Abe
INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, PROCEEDINGS, IEEE COMPUTER SOC, 40-49, 1998, A multicomputer prototype has been co-designed and implemented in conjunction with a programming language, A-NETL, based on a parallel object-oriented computation model. Each node processor consists of a processing element and a router. The prototype PE has an A-NETL directed, high-level instruction set. The implementation is supported by firmware and hardware. The router has been designed to be independent of network topology, utilizing virtual-cut-through, adaptive routing.
Experimental results show that a round trip for a 35-byte message between adjacent nodes takes 85 machine cycles (MCs), and 6 MCs per hop; adaptive routing attains low latency communication under contention; the adaptation of network topology to a given communication pattern shows better performance than a generic topology; and the application to small problems attains a 10 to 18 times speedup on the 16-node prototype.
International conference proceedings, English

System Performance Evaluation for the A-NET Multicomputer
T. Yoshinaga; A. Sawada; M. Hirota; D. Abe; Y. Iwamoto; T. Baba
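The Transactions of the Institute of Electronics, Information and Communication Engineers, IEICE, J81-D-I, 4, 368-376, 1998, Peer-reviewed, The A-NET multicomputer is a distributed-memory parallel computer developed, together with the design of the parallel object-oriented language A-NETL, as part of a total architecture. Experiments on a 16-node prototype show that a round trip for a 35-byte message between adjacent nodes takes 85 machine cycles, and that the delay per hop is 6 machine cycles. Communication under contention remains sufficiently fast relative to the processing speed of the PE, confirming the effect of adaptive routing. Furthermore, based on the execution results of A-NETL programs with varied message overhead, we discuss the influence of communication processing and parallel algorithms on execution performance. For the programs tested, a speedup of about 9.6 to 18 times was obtained on 16 nodes.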
Scientific journal, Japanese

Methodologies for High Performance Message Passing: Implementation and Evaluation
Y. Iwamoto; A. Sawada; D. Abe; Y. Sawada; K. Ootsu; T. Yoshinaga; T. Baba
Trans. of IPSJ, IPSJ, 39, 6, 1663-1671, 1998
Scientific journal, Japanese

A cost and performance comparison for wormhole routers based on HDL designs
T Yoshinaga; M Hayashi; M Horita; Y Yamaguchi; K Ootsu; T Baba
1998 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, PROCEEDINGS, IEEE COMPUTER SOC, 375-382, 1998, Peer-reviewed, Our research investigates cost and performance characteristics for wormhole routers based on HDL designs. Comparison of dimension-order routers and turn model-based adaptive routers leads to the following conclusions: (1) Static and additional routing information which we propose in this paper, such as prior dimension specification and in-order delivery, improves the communication performance. (2) An adaptive routing algorithm must be implemented to satisfy the objective speed of the design. The operation speed of the routers affects the network performance a lot. (3) The virtual channels cancel the improvement not only for the dimension-order router but also for the naive implementation of the adaptive routers when they degrade the operation speed.
International conference proceedings, English

Parallel navigation in an A-NETL based parallel OODBMS
Lawrence Mutenda; Manabu Hiyama; Tsutomu Yoshinaga; Takanobu Baba
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 1336, 305-316, 1997, Peer-reviewed, A parallel OODBMS has been proposed based on a parallel object-oriented language, A-NETL. The OODBMS is designed for a shared-nothing environment. An overview of the database system is described. Accessing object-data in a parallel OODBMS is based on navigation. A parallel navigation algorithm being implemented for use in the system is presented including its features. The algorithm is based on the need to balance the load across all nodes in a parallel OODB accessing objects with set-valued attributes. An analytical evaluation of the features of the algorithm is presented.
International conference proceedings, English

The Implementation of A-NETL on the Highly Parallel Computer AP1000
T. Yoshinaga; T. Baba; S. Numprasertchai
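The Transactions of the Institute of Electronics, Information and Communication Engineers D-I, IEICE, J80-D-I, 9, 787-790, 1997, We report on the implementation and evaluation of a parallel object-oriented language on a highly parallel computer, taking execution performance and portability into account. Because the compiler automatically generates code for received-message handlers and method dispatchers, a high-level message interface and active objects can be supported. With compile-time optimization, message passing was found to achieve performance close to that of C programs on the target machine. Speedup from parallel processing was also confirmed using sample programs.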
Scientific journal, Japanese

Event-Based Debugging for a Parallel Object-Oriented Language A-NETL
T. Baba; Y. Furuya; T. Yoshinaga
IEICE Trans. on Information and Systems, IEICE, J79-D-I, 6, 331-340, Jun. 1996, Peer-reviewed
Scientific journal, Japanese

The Node Processor for a Parallel Object-Oriented Total Architecture A-NET
T. Yoshinaga; T. Baba
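THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D-I, IEICE, J79-D-I, 2, 60-68, 1996, Peer-reviewed, We designed a node processor for a highly parallel computer based on a parallel object-oriented computation model and built a prototype. The node processor consists of a PE that executes methods and a router that sends and receives messages. The PE decodes a high-level instruction set, including message send and receive instructions, in an instruction preprocessing unit (IPU) and executes it by microprogram. It also adopts a tagged architecture to support synchronization mechanisms, dynamic data typing, and the like, with hardware assistance from a tag processing unit (TPU). Examining the effect of the TPU with an add instruction as an example showed that execution time (number of microsteps) improves by about 44% compared with not using the TPU. Without the IPU, about three times as many execution steps are required. Measuring message transfer time with the prototype running at 30 MHz confirmed that transferring a 35-byte message to an adjacent node takes about 5.2 μs and that the delay per hop is about 1.0 μs.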
Scientific journal, Japanese

A DECLARATIVE SYNCHRONIZATION MECHANISM FOR PARALLEL OBJECT-ORIENTED COMPUTATION
T BABA; N SAITOH; T FURUTA; H TAGUCHI; T YOSHINAGA
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRON INFO COMMUN ENG, E78D, 8, 969-981, Aug. 1995, We have designed and implemented a simple yet powerful declarative synchronization mechanism for a parallel object-oriented computation model. The mechanism allows the user to control multiple message reception, specify the order of message reception, lock an invocation, and specify relations as invocation constraints. It has been included in a parallel object-oriented language, called A-NETL. The compiler and operating system have been developed on a total architecture, A-NET (Actors NETwork). The experimental results show that (i) the mechanism allows the user to model asynchronous events naturally, without losing the integrity of described programs; (ii) the replacement of the mechanism with the user's code requires tedious descriptions, but gains little performance enhancement, and certainly loses program readability and integrity; (iii) the mechanism allows the user to shift synchronous programs to asynchronous ones, with a scalable reduction of execution times: an average 20.6% for 6 to 17 objects and 46.1% for 65 objects. These prove the effectiveness of the proposed synchronization mechanism.
Scientific journal, English

Programming and Debugging for Massive Parallelism: The Case for a Parallel Object-Oriented Language A-NETL
Takanobu Baba; Tsutomu Yoshinaga; Takahiro Furuta
Proc. Workshop on Object-Based Parallel and Distributed Computation, Springer, Lecture Notes in Computer Science 1107, Springer-Verlag, 38-58, 1995
International conference proceedings, English

A-NETL: A Language for Massively Parallel Object-Oriented Computing
Takanobu Baba; Tsutomu Yoshinaga
Proc. Massively Parallel Programming Models, 98-105, 1995, Peer-reviewed
International conference proceedings, English

A Parallel Object-Oriented Language A-NETL Supporting the Topological Programming
T. Yoshinaga; T. Baba
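The Transactions of the Institute of Electronics, Information and Communication Engineers, IEICE, J77-D-I, 8, 557-566, 1994, Peer-reviewed, A-NETL is a highly parallel programming language designed in the A-NET parallel object-oriented total architecture project. Its features include (1) indexed objects, which allow a large number of objects to be defined statically, (2) dynamic object creation with node specification, (3) multicast and multi-receive of messages, and (4) control of method execution order. Indexed objects are effective for handling groups of objects systematically. Activating a series of objects in a group and waiting for their processing can be described concisely with the parallel message send and receive constructs. For problems whose structure changes dynamically, dynamic object creation with a specified allocation node can be used. With these features, the user can write programs that reflect problem-specific parallelism in the network topology. Furthermore, inter-object relation declarations, which reflect the communication relationships among individual objects, make irregular problems easy to handle. For protecting an object's internal state and dispatching messages, control constructs for method execution order and invocation conditions are provided, enabling objects to operate autonomously.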
Scientific journal, Japanese
URL

Organization of a Network-Topology Independent Router for a Parallel Object-Oriented Total Architecture A-NET
T. Yoshinaga; T. Baba
Transactions of Information Processing Society of Japan, Information Processing Society of Japan (IPSJ), 34, 4, 648-657, Apr. 1993, Peer-reviewed, We have designed a router for a parallel object-oriented machine. The A-NET router provides two functions, that is, message routing and dynamic object allocation, in a network-topology independent fashion using a programmable communication controller. It supports both adaptive virtual cut-through packet switching and the circuit-switching transfer of an object code. It consists of several hardware blocks, a message-sender, a message-receiver, a PE interface circuit, a packet buffer, 6 ports to connect with other routers, and a port for a host computer. These blocks have their own state machines and are connected to a crossbar network, enabling them to exchange data simultaneously. According to the preliminary evaluation, the time to decide a route per hop is about 2 μs, assuming that there is no conflict. The propagation delay of an average size message traveling 1 hop is 10 μs, and it takes 48 μs for one traveling 20 hops. These times are comparable with a typical continuous execution time of user programs per message.
Scientific journal, Japanese
URL

Integration of Language and Architecture in the Parallel Object-Oriented Total Architecture A-NET
T. Baba; T. Yoshinaga
IEICE Trans. on Information and Systems, IEICE, J75-D-I, 8, 563-574, Aug. 1992
Scientific journal, Japanese
URL

A Local Operating System for the A-NET Parallel Object-Oriented Computer
Tsutomu Yoshinaga; Takanobu Baba
Journal of Information Processing, 14, 4, 414-422, Apr. 1992, Peer-reviewed
Scientific journal, English

A PARALLEL OBJECT-ORIENTED LANGUAGE A-NETL AND ITS PROGRAMMING ENVIRONMENT
T YOSHINAGA; T BABA
COMPSAC 91 - THE FIFTEENTH ANNUAL INTERNATIONAL COMPUTER SOFTWARE & APPLICATIONS CONFERENCE, PROCEEDINGS, I E E E, COMPUTER SOC PRESS, 459-464, 1991, Peer-reviewed
International conference proceedings, English

Design and Implementation of a Three-Dimensional Color Graphics System on a Parallel Computer MUNAP using Extended L6 Language
T. Baba; M. Kagaya; T. Yoshinaga; S. Suzuki; K. Yamazaki; K. Okuda
IEICE Trans. on Information and Systems, IEICE, J73-D-I, 1, 9-17, Jan. 1990, Peer-reviewed
Scientific journal, Japanese

A NETWORK-TOPOLOGY INDEPENDENT TASK ALLOCATION STRATEGY FOR PARALLEL COMPUTERS
T BABA; Y IWAMOTO; T YOSHINAGA
SUPERCOMPUTING 90, I E E E, COMPUTER SOC PRESS, 878-887, 1990, Peer-reviewed
International conference proceedings, English

A PARALLEL OBJECT-ORIENTED TOTAL ARCHITECTURE - A-NET
T BABA; T YOSHINAGA; T IIJIMA; Y IWAMOTO; M HAMADA; M SUZUKI
SUPERCOMPUTING 90, I E E E, COMPUTER SOC PRESS, 276-285, 1990, Peer-reviewed
International conference proceedings, English

MISC

Performance Analysis of On-device Hierarchical Federated Learning Frameworks
DU Zhaoyang; WU Celimuge; YOSHINAGA Tsutomu
2023, IEICE Technical Report (Web), 123, 174(CQ2023 26-36), 2432-6380, 202302231592195804

Semantic Communication with Masked Autoencoders: Enhancing Efficiency in Image Transmission
WU Jiale; DU Zhaoyang; WU Celimuge; YOSHINAGA Tsutomu
2023, IEICE Technical Report (Web), 123, 174(CQ2023 26-36), 2432-6380, 202302286133140217

Enhancing Communication Efficiency for UAV Networks through Knowledge Distillation and Transfer Learning in Federated Learning
LI Yalong; DU Zhaoyang; WU Celimuge; YOSHINAGA Tsutomu
2023, IEICE Technical Report (Web), 123, 174(CQ2023 26-36), 2432-6380, 202302286796191909

Automated Driving Methods Using Federated Learning
小野航輝; WU Celimuge; 吉永努
2023, IEICE Technical Report (Web), 122, 438(CQ2022 80-106), 2432-6380, 202302230675097693

Effectiveness of Multi-Hop D2D Communication in Mobile Cooperative Cache
宮原雅司; 秋場大暉; 萩原賢; WU Celimuge; 吉永努
2023, IEICE Technical Report (Web), 122, 451(CPSY2022 34-55), 2432-6380, 202302228620977562

A Color-Based Cooperative Cache with Chunking Contents Distribution
岡田浩希; 城間隆行; 中島拓真; 策力木格; 吉永努
IEICE, 19 Nov. 2017, IEICE Technical Report, 117, 314, 3-8, Japanese, 0913-5685, 40021402233, AN10013141
URL

A Prototype Implementation of Color-based Cooperative Cache with Chunking Contents Delivery
中島拓真; 岡田浩希; 策力木格; 吉永努
IEICE, 19 Nov. 2017, IEICE Technical Report, 117, 314, 9-14, Japanese, 0913-5685, 40021402258, AN10013141
URL

Performance evaluation of RPL-based sensor data collection in challenging IoT environment (Communication Quality)
Gao Liming; Wu Celimuge; Yoshinaga Tsutomu
IEICE, 19 Jan. 2017, IEICE Technical Report, 116, 403, 7-12, English, 0913-5685, 40021101994, AN1054106X
URL

Reinforcement Learning-based Data Storage Scheme in Vehicular Ad Hoc Networks
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji; Tutomu Murase; Yan Zhang
IEEE, 24 May 2016, Proc. of the IEEE International Conference on Communications (ICC) 2016, 718-723, English, Peer-reviewed, Meeting report

Hierarchical Cache Strategies for VOD Networks with Popularity Change
中島拓真; 城間隆行; 吉見真聡; 策力木格; 吉永努
IEICE, 24 Mar. 2016, IEICE Technical Report, 115, 518, 247-252, Japanese, 0913-5685, 40020791009, AN10013141
URL

Design and Evaluation of Low-Latency Handshake Join on FPGA (Computer Systems) -- (Workshop on Embedded Technologies and Networks ETNET2016)
YOSHIMI Masato; OGE Yasin; WU Celimuge; YOSHINAGA Tsutomu
IEICE, 24 Mar. 2016, IEICE Technical Report, 115, 518, 253-258, English, 0913-5685, 40020791018, AN10013141
URL

The Technical Committee on Computer Systems: Over 60 Years of Activity
吉永努
IEICE, 01 May 2014, IEICE Information and Systems Society Journal, 19, 1, 10-11, Japanese, Invited, Others, 130005433624
DOI URL

Books and other publications

Roth and Kinney: Logic Circuits (ロス・キニー論理回路)
佐藤証; 三輪忍; 吉永努
Textbook, Japanese, Joint translation, Chapters 1-5, Tokyo Kagaku Dojin, 20 May 2021

The Massively Parallel Processing System JUMP-1
Takanobu Baba; Tsutomu Yoshinaga
English, Joint work, Ohmsha, 1996

Lectures, oral presentations, etc.

Low-Latency Request Process for an FPGA-based Cache Server
YUAN TIANYI; Celimuge WU; Tsutomu YOSHINAGA
CPSY
21 Mar. 2024
21 Mar. 2024 - 23 Mar. 2024
Joint research and competitive funding URL

Utilizing Multi-Hop D2D Communication in Mobile Distributed Cooperative Cache
宮原雅司; 秋場大暉; 萩原賢; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, vol. 122, no. 451, CPSY2022-37, pp. 19-24
23 Mar. 2023

An Efficient Content Sharing for the User QoE Improvement in Mobile Cooperative Cache
秋場大暉; 策力木格; 吉永努
Oral presentation, Japanese, IEICE, IEICE-CPSY2022-7, IEICE, Shimonoseki, Domestic conference
28 Jul. 2022

A UAV-empowered Routing Protocol for Federated Learning in Delay Tolerant Environments
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga
Oral presentation, English, IEICE Technical Report, CQ2021-55, Domestic conference
Sep. 2021

Load balancing method using reinforcement learning between edge and cloud
Hiroki Kobari; Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga
Oral presentation, English, IEICE Technical Report, CQ2021-38, IEICE, Domestic conference
Sep. 2021

Acceleration of Database Query Processing Using FPGA
Hirohiko OZAKU; Masato YOSHIMI; Celimuge WU; Tsutomu YOSHINAGA
Oral presentation, Japanese, IEICE/Technical Report, IEICE, online, https://www.ieice.org/ken/paper/20210126UC1M/, Domestic conference
26 Jan. 2021
URL

Task Offloading of a Distributed and Cooperative Cache Server using FPGA
Teppei Yamagishi; Masato Yoshimi; Celimuge Wu; Tsutomu Yoshinaga
Oral presentation, Japanese, SWoPP2020/CPSY2020-10, IEICE, online, https://www.ieice.org/ken/paper/20200731w1ZC/, Domestic conference
31 Jul. 2020
URL

Acceleration of numerical simulation of computational model for research on visual illusion
柳田悠介; 佐藤俊治; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, IEICE, Tokyo, Domestic conference
04 Mar. 2020

An efficient traffic reduction scheme for mobile cooperative cache by pushing contents preliminarily
城間隆行; 吉見真聡; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report(CPSY-2020-02-20), Domestic conference
27 Feb. 2020

FPGA-based Stream Data Aggregation for Large Sliding-Windows
大坂誠樹; 吉見真聡; 策力木格; 吉永努
Oral presentation, Japanese, IEICE, Technical Report, CPSY, IEICE, Hiyoshi, Domestic conference
23 Jan. 2020

Empowering ICN in Intermittent Connectivity Scenarios
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga
Oral presentation, English, IEICE Society Conference 2019, IEICE, Domestic conference
Sep. 2019

An efficient content sharing for distributed and collaborative cache networks of mobile devices
城間隆行; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, CPSY2019-38, IEICE, Kitami, Domestic conference
26 Jul. 2019

A VDTN Routing Protocol with Enhanced Buffer Management Policy
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE General Conference 2019, IEICE, Domestic conference
Mar. 2019

A Vehicular DTN Routing Protocol with Enhanced Buffer Management Policy
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Technical Report, CQ2018-81, IEICE, Domestic conference
Jan. 2019

Low-Latency Stream Data Join on Multiple FPGA Nodes
松下紘嗣; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, IEICE, Kumamoto, Domestic conference
31 Jul. 2018

A Prophet-based DTN protocol for VANETs
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Technical Report, CQ2018-24, Domestic conference
May 2018

An Openflow-based Management Framework for Sensor and Actuator Networks
Rui Kang; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Tech. Report, CQ2017-87, IEICE, Tokyo, Domestic conference
18 Jan. 2018

A Study on Interconnection Networks and Their Computing Systems
吉永努
Invited oral presentation, Japanese, Technical Committee on Computer Systems / IEICE Technical Report CPSY2017-115, Invited, IEICE, Hiyoshi, Domestic conference
18 Jan. 2018

Evaluation on performance gain of an SDN-based handover approach in IEEE 802.11p and LTE hybrid vehicular networks
Ran Duo; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Tech. Report, CQ2017-86, IEICE, Tokyo, Domestic conference
18 Jan. 2018

A Color-Based Distributed Cooperative Cache Using Chunked Content Placement
岡田浩希; 城間隆行; 中島拓真; 策力木格; 吉永努
Poster presentation, Japanese, IEICE / IEICE Technical Report CPSY2017-51, IEICE, Domestic conference
19 Nov. 2017

Prototype Implementation of a Distributed Cooperative Cache Based on Color Tag Information and of Chunked Cache Control
中島拓真; 岡田浩希; 策力木格; 吉永努
Poster presentation, Japanese, IEICE / IEICE Technical Report CPSY2017-52, IEICE, Aomori, Domestic conference
19 Nov. 2017

System Performance Assessment and Sizing for Cloud-based Data Backup
Yuichi Taguchi; Tsutomu Yoshinaga
Oral presentation, English, Computer System Symposium (ComSys2017), IPSJ Special Interest Group on System Software and Operating Systems, Kawasaki, http://www.ipsj.or.jp/sig/os/index.php?ComSys2017, Domestic conference
07 Nov. 2017
URL

Performance Evaluation of Vehicular DTN Protocols for Anycast Vehicle-to-cloud Communications
Zhaoyang Du; Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Technical Report, CQ2017-52, IEICE
Aug. 2017

Color-based Distributed and Cooperative Cache Control using Prediction of Access Change
Susumu CHIDA; Takayuki SHIROMA; Takuma NAKAJIMA; Celimuge WU; Tsutomu YOSHINAGA
Oral presentation, Japanese, IEICE Tech. Report, Domestic conference
26 Jul. 2017

Content Distribution in VANETs Integrating LTE and IEEE 802.11p
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Tech. Report, CQ2017-14, IEICE, Miyazaki, http://www.ieice.org/ken/paper/201705293bTu/, In this paper, we discuss the use of integrating LTE (Long Term Evolution) and IEEE 802.11p for the content distribution in vehicular ad hoc networks (VANETs)., Domestic conference
29 May 2017
URL

Proposal of a Distributed Cooperative Cache Mechanism Utilizing Device-to-Device Communication
城間隆行; 中島拓真; 吉見真聡; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, CPSY2016-159, IEICE, Okinawa, Domestic conference
10 Mar. 2017

多田昂介・川原尚人・吉見真聡・策力木格・吉永努
多田昂介
Oral presentation, Japanese, IEICE Technical Report CPSY2016-112, IEICE, Yokohama, Domestic conference
23 Jan. 2017

Performance evaluation of RPL-based sensor data collection in challenging IoT environment
Liming Gao; Celimuge Wu; Tsutomu Yoshinaga
Oral presentation, English, IEICE Tech. Report, CQ2016-91, IEICE, Osaka, Domestic conference
19 Jan. 2017

A Reinforcement Learning-based Data Storage Scheme for VANETs
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Tech. Report, CQ2016-92, IEICE, Osaka, Domestic conference
19 Jan. 2017

Processing Aggregation Queries using Interconnected Multiple FPGA Boards
川原尚人; 吉見真聡; 策力木格; 吉永努
Poster presentation, Japanese, IEICE Technical Report CPSY2016-49, IEICE, Makuhari, Domestic conference
06 Oct. 2016

A context-aware unified routing protocol for vehicular ad hoc networks
Celimuge Wu; Tsutomu Yoshinaga; Yusheng Ji
Oral presentation, English, IEICE Tech. Report, CQ2016-64, IEICE, Tsukuba, We propose a context-aware unified routing protocol for vehicular ad hoc networks (VANETs). The proposed protocol constructs routes based on virtual clustering, which only exchanges beacon messages in the one-hop neighborhood area. The packets are forwarded by the cluster heads, and the last 2-hop route is optimized by using a reinforcement learning algorithm which can attain good performance with low overhead. The advantage of the proposed protocol is shown by using computer simulations., Domestic conference
30 Aug. 2016

An Automatic Disaster Recovery Scheme for Inter-cloud Environment
溝田敦也; 城間隆行; 中島拓真; 吉見真聡; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, CPSY2016-34, IEICE, Matsumoto, Domestic conference
10 Aug. 2016

An Impact of In-Network Caching on Energy Saving for ISP Networks
野島幸大; 城間隆行; 中島拓真; 吉見真聡; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, IEICE, Matsumoto, Domestic conference
10 Aug. 2016

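A Study of Heterogeneous Mixed-Cache Networks That Follow Changes in Video Popularity
中島拓真; 城間隆行; 吉見真聡; 策力木格; 吉永努
Oral presentation, Japanese, IEICE Technical Report, CPSY2015-154, IEICE, Nagasaki, To maintain a high hit rate even when video popularity changes abruptly, this paper reduces traffic with LFU caches, which have a high hit rate, and combines them with a small number of LRU caches, which are robust to abrupt changes in access, to suppress drops in the hit rate. Simulations on a hierarchical cache network mixing LRU and LFU confirmed that a cache network with heterogeneous caches can follow abrupt popularity changes while maintaining a high hit rate and a short hop count., Domestic conference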
25 Mar. 2016

Design and Evaluation of Low-Latency Handshake Join on FPGA
Masato YOSHIMI; Yasin OGE; Celimuge Wu; Tsutomu YOSHINAGA
Oral presentation, English, IEICE, Tech. Report, CPSY2015-155, IEICE, Nagasaki, In this paper, we propose an FPGA-based implementation of low-latency handshake join algorithm and present a detailed evaluation of the proposed design., Domestic conference
25 Mar. 2016

Reinforcement learning-based parameter tuning for a broadcast protocol in VANETs
Celimuge Wu; Satoshi Ohzahata; Yusheng Ji; Tsutomu Yoshinaga; Toshihiko Kato
Oral presentation, English, IEICE Tech. Report CQ2015-105, IEICE, Tsukuba, http://www.ieice.org/ken/paper/20160122zbfW/, In this paper, we present a broadcast protocol which is able to make forwarding decision based on a self-learning mechanism., Domestic conference
21 Jan. 2016
URL

FPGA-based Parallel Processing of Sliding-Window Aggregate queries on Data Streams
Yoshimitsu OGAWA; Yasin OGE; Masato YOSHIMI; Celimuge WU; Tsutomu YOSHINAGA
Oral presentation, Japanese, Technical Report of IEICE, CPSY2015-119, IEICE, Yokohama, Domestic conference
19 Jan. 2016

Visualization of a Floorplanner for 3D Stacked Processors
村田篤志; 野村隼人; 吉見真聡; 入江英嗣; 吉永努; 坂井修一
Others, Japanese, IEICE Technical Report CPSY2015-58, IEICE, Makuhari, Domestic conference
08 Oct. 2015

Driving Posture Detection Using an RGB-D Sensor and Learning
土門憲司; 野村隼人; 吉見真聡; 入江英嗣; 吉永努; 坂井修一
Others, Japanese, IEICE Technical Report CPSY2015-59, IEICE, Makuhari, Domestic conference
08 Oct. 2015

Stubborn Cache: A Novel Strategy for Repeating Thrashing Access Patterns
Hayato Nomura; Takuma Nakajima; Masato Yoshimi; Tsutomu Yoshinaga; Hidetsugu Irie
Poster presentation, English, Proceedings Notebook for COOL Chips XVIII, International conference
13 Apr. 2015

A proposal of placement optimization algorithm by introducing TSV module
村田篤志; 稲場朋大; 吉見真聡; 入江英嗣; 吉永努
Oral presentation, Japanese, IEICE Technical Report/IEICE-CPSY2014-169, IEICE, Amami, The performance and the power efficiency of VLSI are expected to be significantly improved by the development of 3D stacking technologies. Various 3D floorplanner algorithms have been proposed to optimize the design of future 3D-ICs, but they approximate the arrangement of TSVs, which diminishes the optimization. In this paper, a novel algorithm that optimizes the location of TSVs as well as normal modules is proposed. Our algorithm is implemented and applied to the optimization of a 3D microprocessor floorplan. The evaluation results show that there are some common tendencies in effective TSV positions. It is also revealed that our algorithm estimates the "wire-activity" cost function with 28.4% higher accuracy for the optimization., Domestic conference
15 Mar. 2015

Acceleration of Big Data Partitioning with Multiple FPGA boards
Ryu KUDO; Saori SUDO; Yasin OGE; Yuta TERADA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, TECHNICAL REPORT OF IEICE, CPSY2014-152, IEICE, Yokohama, http://www.ieice.org/ken/program/index.php?tgs_regid=53077ae37f63d6d92da5a8fedb26a5e56da654fe71ed80dd51e298c078687e7a&tgid=IEICE-CPSY&lang=, This technical report describes an implementation of partitioning distribute nucleotide database for sequence similarity search in bioscience as a case study, and discusses the performance evaluation., Domestic conference
30 Jan. 2015
URL

An Implementation of Web Cache System using Access Frequency of Content Pieces
Takayuki SHIROMA; Takuma NAKAJIMA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
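Poster presentation, Japanese, IEICE, IEICE, Tokyo, This report focuses on differences in access frequency within video content and proposes a cache mechanism that delivers data efficiently by applying cache control to content pieces obtained by dividing videos., Domestic conference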
01 Dec. 2014

Efficient Communication Strategy among Web Cache Servers using SDN
Takuma NAKAJIMA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, TECHNICAL REPORT OF IEICE, IEICE, Hiroshima, http://www.ieice.org/ken/program/index.php?tgs_regid=0798faabdcdbe96efdb8ecc9bf1160946c02a0ad149b4213fb7af78655239c98&tgid=IEICE-CPSY&lang=, This paper proposes an efficient communication strategy among Web cache servers using SDN. The experimental result shows that the file download time and the traffic size are reduced by 57% and 44%, respectively., Domestic conference
13 Nov. 2014
URL

UI Improvement by Predictive Cursor Drawing in the AirTarget System
UI improvements; by a cursor; prediction drawing in AirTarget System
Oral presentation, Japanese, IEICE Technical Report, IEICE, Makuhari, http://www.ieice.org/ken/paper/20141010kBSe/, Domestic conference
10 Oct. 2014
URL

A Study of Driver Assistance Using Relative Checkpoints
土門憲司; 吉見真聡; 入江英嗣; 吉永努
Oral presentation, Japanese, IEICE / IEICE Technical Report CPSY2014-47, IEICE, Makuhari, Domestic conference
10 Oct. 2014

Evaluation of Loop Execution with the STRAIGHT Simulator
佐保田誠; 山中崇弘; 吉見真聡; 吉永努; 入江英嗣
Poster presentation, Japanese, IPSJ SIG Technical Report, IPSJ, Beppu, http://www.ipsj.or.jp/sig-reports/ARC/ARC212.html, Domestic conference
07 Oct. 2014
URL

A Cache Line Retention Method Based on Prefetch Accuracy
力翠湖; 吉見真聡; 吉永努; 入江英嗣
Oral presentation, Japanese, IPSJ SIG Technical Report, IPSJ, Beppu, http://www.ipsj.or.jp/sig-reports/ARC/ARC212.html, Domestic conference
06 Oct. 2014
URL

Cache Partitioning Optimization by Dynamic Estimation
野村隼人; 力翠湖; 吉見真聡; 吉永努; 入江英嗣
Oral presentation, Japanese, IPSJ SIG Technical Report, IPSJ, Niigata, http://www.ipsj.or.jp/sig-reports/ARC/ARC211.html, Domestic conference
28 Jul. 2014
URL

An Experimental Bit-Parallel Solution to Accelerate Smith-Waterman Algorithm
Saori Sudo; Masato Yoshimi; Hidetsugu Irie; Tsutomu Yoshinaga
Oral presentation, Japanese, Tech. Report/IEICE
Jan. 2014

Transparent Data Access and Dynamic Resource Sharing in Cloud Environments
Takuma NAKAJIMA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, TECHNICAL REPORT OF IEICE
Nov. 2013

Performance Evaluation of the Fingertip Recognition Algorithm that Runs on a HMD Device
Kohei CHIKAMA; Hiroshi IWASAKI; Mitsutaka MORITA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, TECHNICAL REPORT OF IEICE
Oct. 2013

Intuitive Gesture UIs for Optical See-Through HMDs
Hiroshi IWASAKI; Kohei CHIKAMA; Mitsutaka MORITA; Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, IEICE Technical Report
Oct. 2013

Design of Dedicated Hardware for Energy Efficient Distributed Computing System
Masato YOSHIMI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, Technical report in RIS of IEICE
Oct. 2013

A Real-Time Stress Relief System Using Heart Rate Measurement in a Seated Position
佐久間大輝; 神田尚子; 吉見真聡; 吉永努; 入江英嗣
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2013) Symposium, IPSJ, Hokkaido
Jul. 2013

Companion Robot Navigation by Touch Instruction
小野澤清人; 芝星帆; 吉永努; 入江英嗣
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2013) Symposium, IPSJ, Hokkaido
Jul. 2013

A Cache Line Replacement Algorithm That Predicts Re-reference from Prefetch Information
力翠湖; 眞島一貴; 藤原大輔; 吉見真聡; 吉永努; 入江英嗣
Oral presentation, Japanese, IPSJ SIG-ARC Technical Report
Jul. 2013

An Exercise Coaching System Using Relative Coordinates
黒田修平; 放地宏佳; 吉見真聡; 吉永努; 入江英嗣
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2013) Symposium, IPSJ, Hokkaido
Jul. 2013

What If the Register File of an ILP Processor Became a Distributed Key-Value Store?
入江英嗣; 山中崇弘; 佐保田誠; 吉見真聡; 吉永努
Oral presentation, Japanese, IPSJ SIG-ARC Technical Report
Jul. 2013

FPGA-based Implementation of Sliding-Window Aggregates over Data Streams
Yasin OGE; Masato YOSHIMI; Takefumi MIYOSHI; Hideyuki KAWASHIMA; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, English, IEICE Technical Report
Jan. 2013

A Thermal Evaluation Method for a Wire-Activity-Aware Floorplanner for 3D Stacked Processors
稲場朋大; 放地宏佳; 藤原大輔; 眞島一貴; 吉見真聡; 入江英嗣; 吉永努
Oral presentation, Japanese, IPSJ SIG Technical Report
Jan. 2013

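A Study on the Correlation Between Heart Rate Variability and Work Efficiency in Colored Environments
神田尚子; 佐久間大輝; 吉永努; 入江英嗣
Public symposium, Japanese, 20th Workshop on Interactive Systems and Software, Japan Society for Software Science and Technology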
Dec. 2012

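Proposal of a Selfie Support System for Mobile Devices Using Face Detection and Edge Extraction
芝星帆; 入江英嗣; 吉永努
Public symposium, Japanese, Japan Society for Software Science and Technology, Aomori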
Dec. 2012

Evaluation of Gait Analysis Using a Smartphone
樫原裕大; 清水裕基; 吉永努; 入江英嗣
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2012) Symposium, IPSJ, Kaga
Jul. 2012

Proposal of an Intuitive Connection Method for Device-to-Device Communication Using Laser Light
小木真人; 大木裕太; 吉永努; 入江英嗣
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2012) Symposium, IPSJ, Kaga
Jul. 2012

Design of IDUMO, a Comprehensive Mashup Framework for Network Computing
放地宏佳; 三好健文; 入江英嗣; 吉永努
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2012) Symposium, IPSJ, Kaga
Jul. 2012

A Data Flow Trace Method over Multiple Processes by Using Additional Identifier for Each TCP Session
Hiroki SHIMIZU; Takefumi MIYOSHI; Hidetsugu IRIE; Tsutomu YOSHINAGA
Public symposium, Japanese, 23rd Computer System Symposium (ComSys 2011), IPSJ, Kyoto
Dec. 2011

A Smartphone Application for Improving Gait
樫原裕大; 清水裕基; 三好健文; 吉永努; 入江英嗣
Oral presentation, Japanese, IPSJ SIG Technical Report
Nov. 2011

IDUMO: A Study of the Comprehensive Mashup Framework for Network Computing
放地宏佳; 三好健文; 入江英嗣; 吉永努
Oral presentation, Japanese, IPSJ SIG Technical Report
Nov. 2011

Proposal of an Intuitive External Display Access Method Using Peeking
小木真人; 清水裕基; 三好健文; 吉永努; 入江英嗣
Oral presentation, Japanese, IPSJ SIG Technical Report
Nov. 2011

A Study on Related-Data Prefetching and Smartphone Power Consumption
小貫貴央; 神田尚子; 放地宏佳; 吉永努; 入江英嗣
Oral presentation, Japanese, 10th Forum on Information Technology FIT2011
Sep. 2011

Multi-GPU Acceleration of Numerical Simulation for the Linear Model of Visual Neurons
Ohmura Junichi; Shunji Satoh; Akira Egashira; Takefumi Miyoshi; Hidetsugu Irie; Tsutomu Yoshinaga
Oral presentation, Japanese, IPSJ SIG Technical Report
Jul. 2011

An Availability Evaluation of GPU Programming Framework to Provide Embedded MPI
Keigo SHIMA; Takefumi MIYOSHI; Masaaki KONDO; Hidetsugu IRIE; Hiroki HONDA; Tsutomu YOSHINAGA
Oral presentation, Japanese, TECHNICAL REPORT OF IEICE
Jul. 2011

Parallel Numerical Simulation for the Linear Model of Visual Neurons with MPI
Yusuke Saito; Shunji Satoh; Ohmura Junichi; Takefumi Miyoshi; Hidetsugu Irie; Tsutomu Yoshinaga
Oral presentation, Japanese, IPSJ SIG Technical Report
May 2011

A Study of GPU Programming Framework to Provide Embedded MPI
Takefumi Miyoshi; Masaaki Kondo; Hidetsugu Irie; Tsutomu Yoshinaga; Hiroki Honda
Public symposium, Japanese, Symposium on Advanced Computing Systems and Infrastructures SACSIS2011, IPSJ, Tokyo
May 2011

Examination of block arrangement problem on 3D integrated microprocessor
Yuki MATSUMURA; Takefumi MIYOSHI; Tsutomu YOSHINAGA; Hidetsugu IRIE
Oral presentation, Japanese, IPSJ SIG Technical Report
Apr. 2011

Development of cloud based portable fingertip signature authentication system
RYOTA TERANISHI; TAKEFUMI MIYOSHI; HIDETSUGU IRIE; TSUTOMU YOSHINAGA
Oral presentation, Japanese, IPSJ SIG Technical Report
Mar. 2011

A Dynamic Reconfigurable Streaming Processing Engine and Its Compiler
T. Miyoshi; Y. Terada; H. Kawashima; T. Yoshinaga
Public symposium, Japanese, 52nd Programming Symposium, IPSJ, Atami
Jan. 2011

A Consideration of Window Join Operation over Data Streams by using FPGA
Yuta TERADA; Takefumi MIYOSHI; Hideyuki KAWASHIMA; Tsutomu YOSHINAGA
Oral presentation, Japanese, TECHNICAL REPORT OF IEICE
Jan. 2011

A Study of Comparison between In-Order and Out-of-Order Processors for Many-core Processor Era
T. Miyoshi; H. Irie; Y. Matsumura; T. Yoshinaga
Oral presentation, Japanese, IEICE Technical Report
Dec. 2010

ZeoBro: A Personal Life Assistance Service Platform on a PC-Cluster
Yuta TAJIMA; Takefumi MIYOSHI; Sayaka AKIOKA; Tatsuya GOTO; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, IEICE TECHNICAL REPORT
Nov. 2010

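A Study on Accelerating Camera Sensor Applications Through Cooperation Between Mobile Devices and Networked Computing Resources
高橋信宏; 入江英嗣; 吉永努; 寺西良太; 清水裕基
Oral presentation, Japanese, IPSJ, 9th Forum on Information Technology FIT2010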
Sep. 2010

Compiler and Runtime System to Conceal Overhead for Updating Software Cache
Takefumi MIYOSHI; Kenji KISE; Hidetsugu IRIE; Tsutomu YOSHINAGA
Oral presentation, Japanese, IPSJ SIG Technical Report
Aug. 2010

Tying up contents with real objects using pictures and sensors of mobile phones
K. Fujimoto; R. Teranishi; T. Yoshinaga; H. Irie; T. Miyoshi; Y. Suzuki
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2010) Symposium, IPSJ, Gifu
Jul. 2010

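Proposal of a Modeling Method Based on Feature Extraction of Resource Information and an Attack Detection Method
清水裕基; 菅谷みどり; 秋岡明香; 吉永努
Oral presentation, Japanese, IPSJ, 50th Anniversary (72nd) National Convention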
Mar. 2010

A Detection System for Unknown SQL Injection Attacks Using Pattern Learning
八木達哉; 秋岡明香; 吉永努
Oral presentation, Japanese, IPSJ, 50th Anniversary (72nd) National Convention
Mar. 2010

Computer Aided Detection System Implementation for Mammograms over a FPGA
Yessica Suarez Henandez; Sayaka Akioka; Tsutomu Yoshinaga; Gonzalo Duchen Sanchez; Volodymyr Ponomaryov
Oral presentation, English, IEICE Tech. Report
Jan. 2010

Prediction router for low latency Fat Tree network
Tomoaki Tateshita; Sayaka Akioka; Tsutomu Yoshinaga; Hiroki Matsutani; Michihiro Koibuchi
Oral presentation, Japanese, IPSJ SIG Technical Report
Aug. 2009

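Proposal and Discussion of an Attack Detection Method Based on Fine-Grained Resource Monitoring
清水裕基; 菅谷みどり; 秋岡明香; 吉永努
Oral presentation, Japanese, Japan Society for Software Science and Technology, 26th Annual Conference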
Aug. 2009

Prediction Switching for Photonic Network-on-chip
Cisse Ahmadou Dit ADI; Hiroki Matsutani; Michihiro Koibuchi; Sayaka Akioka; Tsutomu Yoshinaga
Oral presentation, English, IPSJ SIG Technical Report
Aug. 2009

A Web-based Online Communication Tool with a Location Awareness
Tatsuya Goto; Sayaka Akioka; Tsutomu Yoshinaga
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2009) Symposium, IPSJ, Beppu
Jul. 2009

A Remote Connection Service to DLNA Appliances via Web Application
Masashi Fukada; Takumi Koyama; Sayaka Akioka; Tsutomu Yoshinaga; Yoshihiro Suzuki
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO 2009) Symposium, IPSJ, Beppu
Jul. 2009

Evaluation of Prediction Router for Low-Latency On-Chip Networks
H. Matsutani; M. Koibuchi; H. Amano; T. Yoshinaga
Public symposium, Japanese, Symposium on Advanced Computing Systems and Infrastructures SACSIS 2009, IPSJ, Hiroshima
May 2009

Performance Evaluation of SMP Clusters with Multi-link Ethernet
Satoshi Kobayashi; Shan Axida; Tsutomu Yoshinaga
Oral presentation, Japanese, The 71st National Convention of IPSJ
Mar. 2009

Evaluations of Prediction Router for Low-Latency On-Chip Networks
Hiroki Matsutani; Michihiro Koibuchi; Hideharu Amano; Tsutomu Yoshinaga
Oral presentation, Japanese, IPSJ-ARC
Jan. 2009

Mechanism for Sharing Media Content in Multiple Home Network Environments
JingYuan Wu; Tsutomu Yoshinaga; Daigo Muto; Takumi Koyama
Oral presentation, English, IPSJ Technical Report
Jul. 2008

A DMS- and DMP-based Client Server System for Playing TV Content via Streaming over the Internet
T. Koyama; D. Muto; T. Yoshinaga
Public symposium, Japanese, Multimedia, Distributed, Cooperative and Mobile (DICOMO2008) Symposium, IPSJ, Sapporo
Jul. 2008

The Study of Low-Latency Network-on-Chip using Predictive Routers
M. Koibuchi; T. Yoshinaga; H. Murakami; H. Matsutani; H. Amano
Public symposium, Japanese, SACSIS 2008, IPSJ, Tsukuba
Jun. 2008

A Low-Latency On-Chip Router Architecture with Prediction Mechanism
H. Matsutani; M. Koibuchi; H. Amano; T. Yoshinaga
Oral presentation, Japanese, IPSJ
May 2008

Pipelined Round-Robin Broadcast Algorithm in Homogeneous Clusters of SMP
Axida; T. Q. Viet; T. Yoshinaga
Oral presentation, English, IPSJ Technical Report
Mar. 2008

Mobile-WormholeDevice: Software-Assisted Remote Communication Mechanism between Mobile Devices and DLNA-based Appliances
T. Koyama; D. Muto; J.-Y. Wu; T. Yoshinaga
Oral presentation, Japanese, IPSJ SIG Technical Report
Mar. 2008

Lowering Network Latency: Utilizing a Communication Prediction Mechanism and its Evaluation
村上弘和; 吉永努; 鯉渕道紘
Oral presentation, Japanese, IPSJ SIG ARC Technical Report
Nov. 2007

Wormhole Device: Software-Assisted Remote Communication Mechanism for DLNA-Based Appliances
D. Muto; T. Yoshinaga
Oral presentation, Japanese, Proc. Multimedia, Distributed, Cooperative and Mobile Symposium, IPSJ
Jul. 2007

Evaluation of Dynamic Communication Prediction in 2-D Tori
Tsutomu Yoshinaga; Hirokazu Murakami; Michihiro Koibuchi
Public symposium, Japanese, Symposium on Advanced Computing Systems and Infrastructures SACSIS2007, IPSJ, Tokyo
May 2007

A Consideration of Throttling for Fault-Tolerant Network Routing
村上弘和; 鎌倉正司郎; 吉永努
Oral presentation, Japanese, Proceedings of the 69th National Convention of IPSJ, 67th National Convention of IPSJ
Mar. 2007

A Network Architecture for Remote Communication Between Digital Appliances
武藤大悟; 吉永努
Oral presentation, Japanese, Proceedings of the 69th National Convention of IPSJ, 67th National Convention of IPSJ
Mar. 2007

Analysis of Prediction Accuracy for Communications on k-ary n-cubes
T. Yoshinaga; M. Koibuchi; S. Kamakura
Public symposium, English, 10th Int. Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA07), Maui, Hawaii
Jan. 2007

Output Port Prediction for Messages in 2-D Torus Routers
T. Yoshinaga
Public symposium, English, Tunisia-Japan Symposium on Society, Science and Technology 2006, Borj-Cedria Science and Technology Park, Tunisia, and ARENA of the University of Tsukuba, Sousse
Dec. 2006

Dynamic Predictive Routing for 2-D Torus Networks
鎌倉正司郎; 吉永努; 鯉渕道紘
Oral presentation, Japanese, IPSJ
Aug. 2006

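A Study on Automatic Tuning of Fault-Tolerant Adaptive Routing
西村康彦; 鎌倉正司郎; 吉永努; 鯉渕道紘
Public symposium, Japanese, Symposium on Advanced Computing Systems and Infrastructures SACSIS06, IPSJ SIG Computer Architecture and others, Osaka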
May 2006

Improving Linpack Performance on SMP Clusters with Asynchronous MPI Programming
T. Q. Viet; T. Yoshinaga
Public symposium, English, Symposium on Advanced Computing Systems and Infrastructures (SACSIS), IPSJ SIG ARC, IEICE SIG Computer System, etc., Osaka
May 2006

Predictive Routing of Communication Direction in 2-D Torus Networks
鎌倉正司郎; 西村康彦; 吉永努; 鯉渕道紘
Oral presentation, Japanese, IPSJ Annual Convention
Mar. 2006

Dynamic Tuning Impact on a Fault-Tolerant and Adaptive Routing Function
T. Yoshinaga; Y. Nishimura
Public symposium, English, The 6th Tunisian Japanese Seminar on Culture, Science and Technology, Borj Cedria Science and Technology Park, Tunisia, and ARENA of the University of Tsukuba, Sousse
Nov. 2005

Performance Enhancement for Matrix Multiplication on an SMP PC Cluster
Ta Quoc Viet; Tsutomu Yoshinaga; Ben A. Abderazek
Oral presentation, English, IPSJ SIG Technical Report, HPC
Aug. 2005

Evaluation of Network Reconfiguration Protocols for Fault-Tolerant Adaptive Routing
吉永努; 西村康彦; 曽和将容
Oral presentation, Japanese, Proc. SACSIS (Symposium on Advanced Computing Systems and Infrastructures) 2005
May 2005

Self-Tuning Techniques for Fault-Tolerant and Adaptive Routing
西村康彦; 吉永努; 曽和将容
Oral presentation, Japanese, IEICE Technical Report
Apr. 2005

Personalized Data Retrieval for Campus P2P Networks
杉原健司; 志田匡士; 吉永努; 曽和将容
Oral presentation, Japanese, IEICE Technical Report
Mar. 2005

Rapid FPGA Prototyping of a Queue Processor Core for Embedded Computing
B. A. Abderazek; T. Yoshinaga; M. Sowa
Oral presentation, English, IPSJ
Mar. 2005

A Proposal of an Instruction Code Generation Method for Producer-Order Queue Machines
川島祐介; 繁田聡一; 吉永努; 曽和将容
Oral presentation, Japanese, IPSJ
Mar. 2005

Evaluation of the fault tolerant routing algorithm Detour-UD
船山裕右; 戸村元; 吉永努; 曽和将容
Oral presentation, Japanese, The 67th National Convention of IPSJ
Mar. 2005

An Examination of Congestion Avoidance Routing for k-ary n-cube Networks
西村康彦; 戸村元; 吉永努; 曽和将容
Oral presentation, Japanese, The 67th National Convention of IPSJ
Mar. 2005

Delegated Authentication and Intelligent Search in P2P Networks Using JXTA
志田匡士; 杉原健司; 吉永努; 曽和将容
Oral presentation, Japanese, The 67th National Convention of IPSJ
Mar. 2005

A General Purpose Assembler for Queue Computers
A. Canedo; B. A. Abderazek; 吉永努; 曽和将容
Oral presentation, English, IPSJ
Mar. 2005

Theoretical Characterization of the Parallel Queue Computation Model
Halcham Kutluk; 吉永努; 曽和将容
Oral presentation, Japanese, The 67th National Convention of IPSJ
Mar. 2005

A Study on the Practical Realization of a Producer-Order Queue Processor
仲谷陵; Ben A. Abderazek; 吉永努; 曽和将容
Oral presentation, Japanese, The 67th National Convention of IPSJ
Mar. 2005

FaRMqs: Hybrid Processor Architecture in Verilog-HDL
M. M. Akanda; Abderazek Ben; 吉永努; 曽和将容
Oral presentation, English, IPSJ
Mar. 2005

Design of a Parallel Queue Processor in Verilog-HDL
三好崇之; ABDERAZEK Ben; 繁田聡一; 吉永努; 曽和将容
Oral presentation, Japanese, Proceedings of the 3rd Forum on Information Technology (FIT04)
Sep. 2004

Optimization for Hybrid MPI-OpenMP Programs with Thread-to-thread Communication
Ta Quoc Viet; Tsutomu Yoshinaga; Masahiro Sowa
Oral presentation, English, IEICE Technical Report, CPSY2004-12
Jul. 2004

Analysis of Fundamental Characteristics of Parallel Queue Computation Model
Halcham Kutluk; Ben A. Abderazek; 繁田聡一; 吉永努; 曽和将容
Oral presentation, Japanese, IEICE Technical Report, CPSY2004-15
Jul. 2004

Fault-Tolerant Adaptive Deadlock-Recovery Routing for k-ary n-cube Networks
吉永努; 細越洋行; 曽和将容
Oral presentation, Japanese, Proc. Symposium on Advanced Computing Systems and Infrastructures
May 2004

Dynamic Fault-Tolerance of an Adaptive Router for Parallel Computers
戸村元; 細越洋行; 吉永努; 曽和将容
Oral presentation, Japanese, IEICE Technical Report, CPSY2004-8
Apr. 2004

Design of a QJava Processor and Functional Verification Using a Verilog Simulator
阿部俊輔; 繁田聡一; B. A. Abderazek; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

Performance Evaluation of a Queue Machine Architecture with 2-Byte Fixed-Length Instructions
山崎淳一; B. A. ABDERAZEK; 繁田聡一; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

A Proposal of an Instruction Code Generation Method for Producer-Order Queue Machines
川島祐介; Ben A. Abderazek; 繁田聡一; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

Hardware Design of a Multithreaded Parallel Queue Processor Using Simultaneous Multithreading (SMT)
佐々木博敏; 奥村義智; B. A. Abderazek; 繁田聡一; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

Queue Computation Mechanism For Parallel execution in Parallel Queue Processor
M. M. Akanda; B. A. Abderazek; 繁田聡一; 吉永努; 曽和将容
Oral presentation, English, IPSJ
Mar. 2004

Supporting P2P Networks with Presence Information
杉原健司; Xuanhoa Tran; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

PQPpfB: Parallel Queue Processor Architecture in Verilog-HDL
B. A. Abderazek; M. Arsenij; K. Kiuchi; M. M. Akanda; S. Shigeta; T. Yoshinaga; M. Sowa
Oral presentation, English, IPSJ
Mar. 2004

Instruction Set Architecture for Parallel Queue Processor
M. Arsenij; B. A. Abderazek; S. Shigeta; H. Kutluk; M. Sowa; T. Yoshinaga
Oral presentation, English, IPSJ
Mar. 2004

An Efficient Instruction Issue Mechanism for Producer-Order Parallel Queue Processors
木内和之; Ben Abderazek; 繁田聡一; 曽和将容; 吉永努
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

Implementation of a Queue-Java Virtual Machine Based on the Queue Execution Model
茂野収; 繁田聡一; B. A. Abderazek; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

Implementation and Functional Verification of the QJava VM
柳下伸幸; 繁田聡一; B. A. Abderazek; 吉永努; 曽和将容
Oral presentation, Japanese, The 66th National Convention of IPSJ
Mar. 2004

Design of Producer-order Parallel Queue Processor Architecture
Arsenij Markovskij; Masahiro Sowa; Ben Abderazek; Soichi Shigeta; Tsutomu Yoshinaga
Oral presentation, English, Technical Report of IEICE, CPSY2003-26
Jan. 2004

Introduction of user authentication and access control mechanism in JXTA network
Xuanhoa Tran; 杉原健司; 吉永努; 曽和将容
Oral presentation, English, IPSJ, SIG Technical Reports, 2003-CSEC-23
Dec. 2003

Reduced Bit-Width Instruction Set Architecture for Q-mode Execution in Hybrid Processor Architecture (FaRM-rq)
Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
Oral presentation, English, IPSJ SIG Technical Reports, HPC
Jun. 2003

An Ambiguous, Context-Free Grammar for Deterministic Parsing In Queue-Java Compiler
Li. Qiang Wang; Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
Oral presentation, English, IPSJ SIG Technical Report, HPC
Jun. 2003

Proposal of QJava for High Bytecode Level Parallelism
繁田聡一; 王立強; Ben A. Abderazek; 吉永努; 曽和将容
Oral presentation, Japanese, Proceedings of the Symposium on Advanced Computing Systems and Infrastructures
May 2003

Fast, Effective Instruction Generation Algorithm for Queue-Java Compiler (QJAVAC)
Li. Qiang Wang; Ben A. Abderazek; Soichi Shigeta; Tsutomu Yoshinaga; Masahiro Sowa
Oral presentation, English, IPSJ SIG Technical Report, ARC-153
May 2003

Design of a Fault-Tolerant Fully Adaptive Router
細越洋行; 水戸部理; 吉永努; 曽和将容
Oral presentation, Japanese, Symposium on Advanced Computing Systems and Infrastructures (SACSIS03)
May 2003

A Hybrid MPI-OpenMP Solution for a Linear System on a Cluster of SMPs
Ta Quoc Viet; Tsutomu Yoshinaga; Ben A. Abderazek; Masahiro Sowa
Oral presentation, English, Proc. Symposium on Advanced Computing Systems and Infrastructures
May 2003

Fundamental Design of a QJava Processor
繁田聡一; 阿部俊輔; B. A. Abderazek; 吉永努; 曽和将容
Oral presentation, Japanese, IEICE
Apr. 2003

Introducing access control functions to a Jini network
宮本幹大; Hoa Tran Xuan; 吉永努; 曽和将容
Oral presentation, Japanese, National Convention of IPSJ
Mar. 2003

Design of an Adaptive Router with Detour Control
細越洋行; 水戸部理; 吉永努; 曽和将容
Oral presentation, Japanese, Proceedings of the 6th System LSI Workshop
Nov. 2002

Fundamental Design of a Parallel Queue Processor
B. A. Abderazek; 繁田聡一; K. Nikolova; 吉永努; 曽和将容
Oral presentation, Japanese, Technical Report of IEICE
Nov. 2002

A Master-Slave Algorithm for Hybrid MPI-OpenMP Programming on a Cluster of SMPs
Ta Quoc Viet; Tsutomu Yoshinaga; Masahiro Sowa
Oral presentation, English, IPSJ SIG Notes, HPC
Aug. 2002

A Proposal of a Network Switch for PC Clusters Using Adaptive Routing
水戸部理; 吉永努; 曽和将容
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP2002)
May 2002

Performance Comparison of Routers by Changing Virtual Channel Connection
水戸部理; 吉永努; 曽和将容
Oral presentation, Japanese, The 64th National Convention of IPSJ, 3ZB-3
Mar. 2002

A Java Compiler Using Queue Syntax Trees
王立強; 吉永努; 曽和将容
Oral presentation, Japanese, The 64th National Convention of IPSJ, 5ZB-02
Mar. 2002

Medical Image Processing on a Multi-FPGA-Based Custom Computer
横田隆史; 永淵雅道; 目加田慶人; 吉永努; 大津金光; 馬場敬信
Oral presentation, Japanese, Proceedings of the FPGA/PLD Design Conference
Jan. 2002

An LSI Design of the Concurrent Deadlock Recovery Router Recover-x
御代田雅俊; 吉永努; 横田隆史; 大津金光; 馬場敬信
Oral presentation, Japanese, Technical Report of IEICE, ICD2001-69
Aug. 2001

Speed-up of the medical image processing using the parallel FPGA system
永淵雅道; 吉永努; 横田隆史; 大津金光; 馬場敬信
Oral presentation, Japanese, Technical Report of IEICE, CPSY2001-44
Jul. 2001

Performance Evaluation of the Receiving Message Prediction Method on Different Platforms
足立涼子; 岩本善行; 大津金光; 吉永努; 馬場敬信
Oral presentation, Japanese, IPSJ SIG Technical Report
Sep. 2000

A Study on the Optimal Number of Virtual Channels in Adaptive Routers
堀田真貴; 吉永努; 大津金光; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP2000)
Jun. 2000

Performance Evaluation of a Parallel Computer Composed of SOC Nodes
古川文人; 大津金光; 吉永努; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP2000)
Jun. 2000

Performance Evaluation of the Concurrent Deadlock Recovery Router Recover-x
林匡哉; 堀田真貴; 中村さゆり; 吉永努; 大津金光; 馬場敬信
Oral presentation, Japanese, IPSJ SIG ARC Technical Report
2000

A Proposal of an Efficient Concurrent Deadlock Recovery Scheme for Adaptive Routers
林匡哉; 堀田真貴; 吉永努; 大津金光; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP '99)
1999

Performance Evaluation of a Parallel Computer Composed of System-on-Chip Nodes
古川文人; 大津金光; 吉永努; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP '99)
1999

Implementation and Evaluation of the Receiving Message Prediction Method
岩本善行; 大津金光; 吉永努; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP '99)
1999

A Performance Evaluation Method Using Virtual Time for MIMD Parallel Computers
岩本善行; 阿部大輝; 大津金光; 吉永努; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP '98)
1998

Cost-Performance of Adaptive Routers
吉永努; 山口喜教
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP '98)
1998

Communication Performance of the Topology-Independent A-NET Multicomputer
澤田東; 阿部大輝; 廣田守; 吉永努; 馬場敬信
Oral presentation, Japanese, Joint Symposium on Parallel Processing (JSPP '97)
1997

Courses

Computer Science Laboratory II A/B
Present

Computer Design
Present
The University of Electro-Communications

Logic Design
Present
The University of Electro-Communications

Advanced Computer Networks
Present
The University of Electro-Communications

Academic Literacy
The University of Electro-Communications

Logic Circuits
The University of Electro-Communications

Network Computing 2
The University of Electro-Communications

Computer Science Laboratory II A/B
The University of Electro-Communications

Technical English for Graduate Students
The University of Electro-Communications

Affiliated academic society

The Institute of Electronics, Information and Communication Engineers (IEICE)

IEEE

Information Processing Society of Japan (IPSJ)

Research Themes

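Development of a D2D distributed cooperative cache that adapts to communication load for traffic reduction and QoS improvement
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C) (General), Principal investigator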
Apr. 2024 - Mar. 2027

Distributed and cooperative cache server architecture to realize traffic volume reduction and low-latency response
吉永努
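Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), Principal investigator, In fiscal 2021, we carried out the following research targeting network video delivery services. 1. Design of a cache server that keeps the delivery service running under high load: The scheme predicts the change over time in the volume of user requests and automatically lowers the quality of delivered video under high load. The video cache server performs prioritized cache control per video quality level so that the hit rate of more important data improves in accordance with the communication load. We built a content delivery network (CDN) simulator incorporating the proposed scheme and examined the effectiveness of the prioritized cache control. The experiments confirmed that the prioritized cache improves user QoE over the conventional scheme. We also examined its combined use with a distributed cooperative cache composed of mobile devices. 2. Design of an FPGA-equipped cache server: An FPGA board is installed in the cache server of a video delivery network, and dedicated hardware implemented on the FPGA performs network protocol processing, content lookup, and memory (DRAM) and storage access. The goals of the FPGA implementation are lower latency and lower power consumption for cache control. In fiscal 2021, we prototyped cache control hardware on the FPGA and first ran experiments that process single user requests. These preliminary experiments indicated that implementing cache control in FPGA hardware can reduce the latency of user request processing., 21K11805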
Apr. 2021 - Mar. 2024

Researches on Model-aided Learning Approaches for Reliable Realtime Control in Future Wireless Systems
計宇生; 金子めぐみ; 村瀬勉; 吉永努; 策力木格; 江易翰
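Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, National Institute of Informatics, Grant-in-Aid for Scientific Research (A), Coinvestigator, In fiscal 2020, we conducted research on the following items. 1) Wireless resource allocation by integrating mathematical models and machine learning: We studied how to learn effectively with the help of mathematical models. We formulated the allocation of communication and computation resources as a nonlinear programming problem and, by parameterizing the model and converting it to an unconstrained form, exploited the duality between objective and constraints to train with Deep Dual Learning. We confirmed that this yields higher system performance than approximation by the mathematical model. We also proposed a spectrum sensing method, based on transfer learning with deep convolutional neural networks, that can be applied to different wireless environments. 2) Context-aware communication and offloading: For cases that require rapid decisions, such as autonomous driving and cooperative robots, we studied how to grasp the latest nearby conditions quickly and realize cooperative distributed processing among terminals. For environments whose surroundings change dynamically, we studied a method that uses reinforcement learning to secure communication routes between vehicles preemptively. Allowing routes to be changed flexibly as conditions change yielded better performance than existing methods. We also proposed a quality control method for mission-critical vehicular communication in the presence of dynamic interference. 3) Acceleration of communication processing, AI processing, and caching using FPGAs: As a way to accelerate low-latency, highly reliable real-time processing with FPGAs, we studied FPGA offloading of distributed cooperative cache server processing. A system called Interconnected-FPGAs, which combines multiple FPGAs tightly coupled with storage and networking for big data analysis, is effective for acceleration, and we confirmed that offloading part of the distributed cooperative cache server processing to it shortens response time., 20H00592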
Apr. 2020 - Mar. 2024

Reducing communication volume of contents delivery network using distributed collaborative caches
Yoshinaga Tsutomu
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), We propose a color-based distributed and cooperative cache control scheme. The proposed scheme assigns color-based tags to each cache server as well as to each content item. We also propose a template-based sub-optimal content distribution scheme, called templatized elastic assignment (TEA), for device-to-device (D2D) content sharing networks. The proposed TEA scheme minimizes the volume of communication traffic via base stations using parameters such as the number of mobile devices and content access popularity. We developed a prototype content delivery network (CDN) to examine the effectiveness of our proposals. In our evaluation, traffic volume on the CDN is reduced by up to 92% compared with a case that does not use the proposed cache system., 18K11259
Apr. 2018 - Mar. 2021

Verification of Internet traffic volume reduction using distributed caches
TIS Inc., Research grant
01 Jul. 2016 - 31 Mar. 2019

Research on data processing accelerators using FPGAs
AVAL DATA Corporation, Research grant
01 Feb. 2017 - 31 Jan. 2018

Development of a data stream management system using dedicated hardware
Tsutomu Yoshinaga
STARC, Idea Scout (IS) Program, Principal investigator, Develop high-speed, low-power hardware for stream data processing and contribute to green computing technology.
Apr. 2014 - Mar. 2017

Improving the efficiency of delivery networks using content-partitioned caches
Tsutomu Yoshinaga
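The Telecommunications Advancement Foundation, Research grant, Principal investigator, This study examines how to speed up video content delivery networks, and proposes and evaluates a method for controlling Web caches in units of content pieces obtained by splitting videos, together with an efficient delivery route selection method.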
01 Apr. 2015 - 31 Mar. 2016

Network architecture and communication of a photonic network-on-chip fully utilizing inherent parallelism in applications
YOSHINAGA Tsutomu
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), We proposed a photonic network-on-chip (NoC) that utilizes both static and dynamic wavelength allocation mechanisms. The proposed NoC carries fine-grained messages with static wavelength allocation and coarse-grained messages with dynamic wavelength allocation. Our experiments show that the proposed NoC improves communication performance per unit of energy consumed compared to previously proposed electronic-photonic hybrid NoCs., 22500042
2010 - 2012

A study on high-performance and reliable networks with dynamic communication prediction
YOSHINAGA Tsutomu
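Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), We proposed a scheme that achieves high-performance communication by predicting, at routers, the paths of packets flowing through the network, and carried out router hardware design and simulation-based evaluation. We also clarified how prediction algorithms relate to network topologies and to the communication patterns of applications. Although prediction accuracy is affected by the communication environment, we confirmed experimentally that, with an appropriate prediction algorithm, predictive routers are effective for low-latency communication in a variety of network environments., 19500040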
2007 - 2009

Research on the speculative Processing of computer systems based on the information theory approach
TE Sun Han; MORITA Hiroshi; NAGAOKA Hiroshi; NISHIARA Mikihiko; YOSHINAGA Tsutomu; KISE Kenji
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (B), This research examined universal data prediction techniques studied in information theory as applied to speculative processing in computer systems. Two research groups, the information theory group and the computer architecture group, jointly carried out the research and built a data structure suitable for speculative processing in computer systems. Moreover, we designed a speculative execution architecture that uses the proposed data structure and evaluated its performance. The main results of this research are as follows. 1. Antidictionary data structure suitable for speculative processing: We examined the compression method called the antidictionary, which uses the set of minimal strings that do not appear in a data sequence. We present a fast and memory-efficient algorithm to construct an antidictionary for a binary string using a suffix tree. It is proved that the complexity of the algorithm is linear in space and time, and its effectiveness is demonstrated by simulation results. 2. Branch prediction for microprocessors: We proposed a branch prediction scheme that uses pattern matching on a branch history and an instruction address, and examined its prediction accuracy. We verified that the scheme predicts more accurately than conventional branch prediction using only a branch history when there is no restriction on computational complexity or hardware resources. 3. Switching router with predictions: To realize a predictive switching router that operates efficiently, we proposed a technique for reducing mispredicted packets and a technique for detecting and canceling a mispredicted packet. We evaluated by simulation the effect of predictive switching in the presence of dynamic communication contention and verified the high performance of the proposed technique., 17360178
2005 - 2006

High-Performance Multicomputer based on Receiving Message Prediction
BABA Takanobu; YOKOTA Takashi; OOTSU Kanemitsu; YOSHINAGA Tsutomu; KATO Shigeo; HASEGAWA Madoka
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research (B), This research proposes and evaluates the Receiving Message Prediction Method for high-performance message passing. In this method, a node in the idle state predicts the next message reception and speculatively executes the message reception and user processes. This method is independent of the underlying computer architecture and message passing libraries. We propose algorithms for message prediction and evaluate them in terms of success ratio and speed-up. We use the NAS Parallel Benchmark programs as typical parallel applications running on two different types of parallel platforms, i.e., a workstation cluster and a shared memory multiprocessor. The experimental results show that the method can be applied to various platforms. The method can also be implemented just by changing the software inside the message passing libraries, without any support from the underlying system software or hardware. This means that application software that uses the libraries does not need to be changed. Applying the method to message passing interface libraries achieves a speed-up of 6.8% for the NAS Parallel Benchmarks, and static and dynamic selection of prediction methods based on profiling results improves performance., 14380135
2002 - 2005

Receiving Message Prediction and Speculative Execution of the Receiving Process
BABA Takanobu; YOSHINAGA Tsutomu; OOTSU Kanemitsu; YOKOTA Takashi
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research (C), We have designed and implemented the Receiving Message Prediction Method for high-performance message passing. In this method, a node in the idle state predicts the message to be received next, and speculatively executes the message reception and user processes. This method is independent of the underlying computer architecture and message passing library. We have proposed six methodologies for message prediction, and evaluated their effectiveness in terms of success ratio and speed-up. The experimental results show that (1) the method can be applied to various platforms, such as a workstation cluster, a distributed memory multicomputer, and a shared memory multiprocessor, and (2) a maximum speed-up of 6.8% was attained by implementing our method in the MPI receiving function and using the NAS Parallel Benchmarks. These results demonstrate the effectiveness of the Receiving Message Prediction Method., 12680328
2000 - 2001

Research on adaptive routing considering communication scheduling
吉永努
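Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Encouragement of Young Scientists (A), Principal investigator, In this research, we proposed Recover-x as a new adaptive routing algorithm, and designed and evaluated a router chip that implements it. Proposal of Recover-x adaptive routing: We reviewed existing deadlock recovery methods and proposed Recover-x adaptive routing, which can be realized with less hardware. Specifically, in a 2-D torus network, restricting the messages subject to deadlock handling to a single dimension simplifies the logic and facilitates high-speed operation. HDL description and verification of the router: We designed the Recover-x adaptive router in the hardware description language Verilog-HDL. Using LSI design tools, we verified its operation by simulation and evaluated its hardware cost by logic synthesis. Evaluation of communication performance: Using a Verilog-HDL simulator, we evaluated bandwidth and communication latency on a 10×10 (100-node) 2-D torus network. The results confirmed that Recover-x delivers high bandwidth and low latency regardless of the communication pattern. From the above, we found that Recover-x is better suited to high-speed designs than conventional adaptive routing and exhibits robust communication performance., 11780190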
1999 - 2000

Design and Implementation of the A-NET Multicomputer based on System-on-Chip Architecture
BABA Takanobu; KATOH Shigeo; OOTSU Kanemitsu; YOSHINAGA Tsutomu; FUKUNAGA Yasushi; KIMURA Yasunori; NAKATA Toshiyuki
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research (B), The A-NET project has designed and implemented a parallel object-oriented language, called A-NETL, and its processing systems. The systems have been implemented on three platforms: the A-NETL oriented multicomputer, a workstation cluster, and a commercially available parallel machine, AP1000. Based on the results of these implementations, this research has designed and evaluated the A-NET multicomputer using system-on-chip (SOC) technologies. The research thus includes the following: (1) the evaluation of the various A-NETL implementations as the base of the research, (2) the design and evaluation of a multi-threaded node processor architecture, (3) the design and evaluation of a new adaptive router architecture, called Recover-x, (4) the SOC multicomputer architecture of the year 2005, and (5) the applications of the A-NETL multicomputer to various areas. The results of this research indicate that (1) the SOC multicomputer is a promising architecture for future parallel machines, (2) the router should be included in an SOC node, and (3) there is a trade-off point between on-chip DRAM capacity and on-chip cache capacity., 10558039
1998 - 2000

Research on performance evaluation of parallel object-oriented computers
吉永努
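Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Encouragement of Young Scientists (A), In this research, we first used the 16-node A-NET multicomputer developed in our laboratory to evaluate the communication schemes of parallel computers in detail. We then proposed new communication schemes in both hardware and software, and conducted experiments. 1. Proposal and evaluation of a new adaptive router: The A-NET multicomputer supports adaptive routing. However, depending on the application's communication pattern, its advantages could not always be fully exploited. We therefore proposed a new routing method that allows the priority of output channel selection to be specified per message, and designed an actual router to evaluate its effectiveness. 2. Speculative program execution by receiving message prediction: We examined improvements to the message-driven model, which is the execution model of parallel object-oriented programs. We proposed a scheme that predicts the message likely to arrive next and speculatively executes the post-reception processing ahead of time, and implemented it on the A-NET multicomputer. A preliminary evaluation showed that using a prediction algorithm matched to the application's communication pattern makes it possible to predict arriving messages with high probability. In future work, we will conduct more detailed experiments on the two points above. Specifically, we will design the router in a hardware description language, evaluate it through large-scale simulation, and clarify the effect of receiving message prediction on practical applications., 09780237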
1997 - 1998

High Performance Implementation of a Parallel Object-Oriented Language using Type Inference
BABA Takanobu; OOTSU Kanemitsu; YOSHINAGA Tsutomu
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research (C), The advent of massively parallel computers requires new programming paradigms. As a parallel object-oriented language allows the user to describe various styles of parallel operations flexibly, it is recognized as one of the most promising solutions for describing highly parallel and distributed programs. However, the high-level language structure requires new techniques for high-performance implementation. This research proposes compile-time and run-time optimization techniques, including type inference, for a parallel object-oriented language called A-NETL. They have been implemented on various platforms, such as clusters of workstations/personal computers, a commercially available parallel machine, Fujitsu AP1000, and the A-NETL oriented parallel machine, called the A-NET multicomputer. Experimental results show that the high-level language facilities have been implemented efficiently, utilizing the proposed optimization techniques., 09680324
1997 - 1998

A replay mechanism based on logical time for verifying nondeterministic parallel programs
馬場敬信; 吉永努
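Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research (C), The results of this year's research are as follows. (1) Definition and verification of events and logical time: In the execution of parallel object-oriented language programs, we defined as events the points that delimit processing, such as message communication and context switches on each node processor, and defined logical time based on the partial order among events. (2) Design of a replay mechanism based on logical time: The key point of the proposed scheme is to perform a test run while recording events, to determine the logical time of each event from the event record, and to determine the order of execution during replay by logical time. (3) Design and prototyping of a debugger: We designed a parallel program debugger based on (1) and (2) and implemented it on the multicomputer prototyped in our laboratory. Each node records every event as it occurs and sends the records to the host at fixed time intervals. In prototyping the debugger, we made use of the firmware on each node to suppress the probe effect as far as possible. (4) Experimental evaluation: Experiments demonstrated the effectiveness of the recording, display, and replay mechanisms built on the event-based definition of logical time. In particular, from the user's viewpoint, asynchronous phenomena can be handled as if they were synchronous, which makes debugging much easier. The dynamic overhead of the added debugging functions was, for event recording, 0.04 ms on average per event and about 35 B of storage, and replay took 0.94 s to display one unit of logical time, which we confirmed is sufficient for practical use. (5) Reporting of results: As shown in the attached list of presentations, the results of this research were published in the IEICE Transactions and in Springer Lecture Notes in Computer Science 1107, among others. Related results were also reported as shown in the attached reference list., 08680346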
1996 - 1996

Research on highly efficient implementation of a parallel object-oriented language on highly parallel computers
吉永努
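Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Encouragement of Young Scientists (A), In this research, aiming to promote parallel object-oriented languages and establish implementation techniques, we implemented A-NETL, a language designed in our laboratory, on the Fujitsu AP1000, a commercial highly parallel computer, and evaluated communication performance and application processing efficiency. Establishing the implementation scheme: With portability to a wide range of architectures in mind, not only the AP1000, we adopted translation to the widely used C language. Using vendor-supplied C compilers makes optimizations that exploit architecture-specific low-level hardware mechanisms possible at low cost. The translator not only translates user programs but also automatically generates code that bridges the semantic gap between A-NETL and C. Implementation and improvement: Measuring message communication performance for A-NETL and C programs, we found that in the initial implementation A-NETL incurred about 3.4 times the overhead of C. By subsequently preallocating message receive areas and improving dispatch processing after reception, we reduced this to slightly less than 2 times. Application performance: Measuring execution time for problems such as N-queens, solving simultaneous equations by Gaussian elimination, and molecular dynamics simulation, we confirmed speed-ups of 10.3 (24), 5.7 (24), and 14.8 (25) times, respectively (number of nodes in parentheses). Future work: Compared with C, A-NETL is costly to implement because, for example, variables have no type declarations and there are various message patterns such as synchronous and asynchronous. We intend to improve efficiency further through compile-time type inference for variables, the introduction of Active Messages, and similar techniques., 07780225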
1995 - 1995

Research on an autonomous parallel object-oriented computation model
馬場敬信; 吉永努
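Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for General Scientific Research (C), The results of this year's research are as follows. (1) Definition of a declarative synchronization mechanism and its incorporation into the A-NETL language: We defined a declarative synchronization mechanism consisting of sending and receiving multiple messages, specifying execution order, locking method invocation, specifying invocation conditions, and so on, and incorporated it into the A-NETL language. We also verified the validity of the definition by describing various asynchronous problems according to the designed specification. (2) Prototyping of the compiler, OS, and other processing systems: We revised the compiler in line with the designed language specification. For run-time processing of the synchronization mechanism, we divided functions between user code and OS code as seemed appropriate. Basically, simple processing is done in user code, and heavy processing is done by transferring control to the OS through a trap instruction. (3) Measuring the effect by experiment: Using the defined synchronization mechanism, we described various parallel problems and ran them on the A-NET machine simulator, clarifying the effect of the mechanism experimentally. As a result, we found that the mechanism lets users model asynchronous events naturally as they are, and that its effect is scalable, reducing execution time by 20.6% on average for 6 to 17 objects and by 46.1% on average for 65 objects; the effectiveness of the synchronization mechanism was thus demonstrated experimentally. (4) Reporting of results: The results of this research were presented at JSPP, as listed in the attached references, and were submitted to and published in the IEICE Transactions (English edition). Related results were also reported as shown in the attached reference list., 07680334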
1995 - 1995

Construction of a visual programming environment for massively parallel object-oriented software
吉永努
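Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Encouragement of Young Scientists (A), In this research, based on A-NET, the parallel object-oriented total architecture currently under development in our laboratory, we built an environment that supports the development and debugging of parallel programs through a graphical user interface. 1. Programming support system APSS: From the dedicated programming support system APSS, the user can edit source code, compile, and execute in an integrated manner. To perform static load balancing that reflects the communication relationships among objects, APSS also supports the creation of a program configuration file. The program configuration file declares the objects, classes, and so on that belong to a program and specifies their computation costs and communication relationships. 2. Debugger: Debugging within an object is handled by source-level breakpoints. For parts that span multiple objects, problems specific to parallel processing, such as nondeterminism and the probe effect, may arise, so reproducibility of processing at the event level is ensured by recording events in a test run and replaying them. Events comprise message exchanges, as interactions between objects, and process switches that occur within each node of the parallel computer. The debugger records and edits these events and displays their relationships in a two-dimensional diagram; during replay it shows the state of each object and the sending and receiving of messages as animated graphics. 3. Future work: The system described above has only just begun to operate and has not yet been evaluated concretely. We plan to have both experienced and novice A-NETL programmers use the system to assess its effectiveness., 06780233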
1994 - 1994

Development of a parallel Object-Oriented Total Architecture [A-NET]
BABA Takanobu; MAEDA Akira; KOIKE Nobuhiko; BANDO Tadaaki; YONEZAWA Akinori; YOSHINAGA Tsutomu
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Developmental Scientific Research (B), We have developed a parallel object-oriented total architecture A-NET (Actors NETwork). The development includes the design and implementation of a programming language and multicomputer hardware, and their applications. (1) 8-node Prototype Multicomputer: We developed a 2-node prototype in 1994 and debugged its hardware. Based on that experience, we scaled it up to 8 nodes and put them into an originally designed 20-slot cabinet at the beginning of 1995. We also developed its device driver on the host workstation and connected the prototype and the host via a VME interface. (2) Parallel Object-Oriented Language: We have designed a parallel object-oriented language, A-NETL. In 1994, we added some file access functions to A-NETL. We also improved the reliability and portability of the language processors. (3) Future Work: We have to adjust the 8-node prototype hardware. Then we will extend it to 16 nodes. We also need to modify the software environment in order to replace the simulator with the real machine. We are now planning to implement A-NETL both on the AP1000 and on workstation clusters to extend the environments in which it is available., 04555077
1992 - 1994

Research on topology-independent message routing for massively parallel computers
吉永努
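Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Encouragement of Young Scientists (A), We designed a network-topology-independent message communication scheme for massively parallel computers, prototyped a simulator, and ran experiments on several sample programs. The communication scheme and experimental results are reported below. 1. Communication scheme: We targeted direct networks whose topology is statically configurable, and designed a scheme capable of route selection on hypercubes, 2D and 3D meshes, tori, trees, and so on. To this end, a message's destination is expressed as a serial node number, and routes are selected by looking up, in a table held by each node's router, the output port number for the destination node number. To make effective use of network resources, adaptive routing is supported: when a collision occurs, a different route can be chosen from among the shortest paths. When no output port is available, a virtual cut-through scheme is used that, to prevent deadlock, saves a suitable packet into a buffer in the router and frees the port. 2. Experimental results: We set up an environment for running simulation experiments on multiple UNIX workstations on a LAN and experimented with the communication patterns of several concrete sample programs. Because the simulator also simulates processing within nodes, it can simulate communication patterns accurately in time. In these experiments, we assumed the performance of the A-NET computer being designed in our laboratory as the processing capability within a node. The experiments showed that, for problems that behave in a MIMD manner, linear speed-up from parallel processing can be achieved with a communication latency of about 0.3 msec. For SIMD-like problems, good speed-up cannot be expected unless a communication latency of 0.1 msec or less is achieved., 05780226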
1993 - 1993

Research on an object-oriented language for massively parallel processing and its processing
馬場敬信; 吉永努
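Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research on Priority Areas, We improved the execution model of the parallel object-oriented language designed in fiscal 1992, prototyped the language processing system, and conducted simulation experiments. The details are summarized below. 1. Improvement of the execution model. (1) Faster object loading: To speed up the transfer of objects from the host to each node, the addresses of classes and indexed objects were made identical on all nodes so that they can be broadcast. (2) Faster method invocation: To shorten the time from receiving a message to invoking the user method, message selectors other than those addressed to variables were made the address in the receiving object's message dictionary, eliminating run-time method lookup. 2. Support for parallel programming: To support parallel object-oriented programming, we built the dedicated environment described below. (1) Compiler: We revised the compiler to achieve the execution model improvements described above. Specifically, object load addresses, previously determined dynamically, are now determined at compile time, eliminating the relocation of executable modules formerly done by the OS. (2) Integrated programming environment: We continued the development of the dedicated programming environment under way since fiscal 1992. This year we put in place an environment in which everything from program editing to execution can be done in an integrated manner. 3. Future work: Revision of the language processing system, OS, and simulator is almost complete. Performance now needs to be evaluated using the simulator. To make parallel program development easier, we are also building a parallel debugger that supports features such as replay of program execution based on event histories., 05219203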
1993 - 1993

Research on an operating system for highly parallel object-oriented computers
吉永努
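Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Encouragement of Young Scientists (A), 04750303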
1992 - 1992

Research on an object-oriented language for massively parallel processing and its processing
馬場敬信; 吉永努
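Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for Scientific Research on Priority Areas, Aiming to make massively parallel processing more practical, we re-examined the parallel object-oriented execution model and revised the description language accordingly. We also prototyped the language processing system and conducted simulation experiments. 1. Improvement of execution performance. (1) Introduction of an execution order control mechanism: So that the temporal information needed in various simulation problems can be managed without global synchronization, we added to the specification of the parallel object-oriented language constructs for flexibly managing the order of method invocation regardless of message arrival order. (2) Faster method invocation: To reduce OS overhead, the compiler performs scheduling analysis for execution order control, message waiting, and the like, which speeds up the dispatch of ordinary methods. 2. Support for parallel programming. (1) Allocator: Assuming execution on a distributed-memory parallel computer, we prototyped a network-topology-independent allocator for optimally assigning objects to node processors. (2) Integrated programming environment: We prototyped a dedicated programming environment in which program editing, compilation, specification of network topology and number of nodes, allocation, execution, and so on can be done in an integrated manner. 3. Experiments: We prototyped a simulator on UNIX workstations connected by a LAN and compared results before and after the improvements. The total communication volume of each program was reduced by 40 to 60%, and execution speed improved by a factor of about 1 to 3., 04235202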
1992 - 1992

THE STUDY ON CONFIGURATION CONTROL OF ELECTRICAL POWER SYSTEM WITH CASE-BASED REASONING
OKUDA Kenzo; YOSHINAGA Tsutomu; BABA Takanobu
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, UTSUNOMIYA UNIVERSITY, Grant-in-Aid for General Scientific Research (C), This study is concerned with a fault restoration support system for electrical power systems using case-based reasoning (CBR). We studied the important processes in CBR: construction of the case-base, and retrieval, evaluation, application and modification, and explanation of the reasoning process of a case. These functions are embedded in our fault restoration support system. In the fault restoration problem, the application and modification of a current case correspond to load switching in the electrical power system. The indispensable functions of load switching are made clear and implemented in our system. To evaluate our fault restoration support system, simulations of fault restoration were carried out exhaustively under various conditions with a practical-scale electrical power system. The results of the simulations are as follows: (1) optimum or sub-optimum solutions are obtained by CBR, (2) the processing time is considerably short, and this feature is especially pronounced for complicated cases, (3) the method of constructing the case-base and the guideline for selecting the cases to be stored in it are made clear., 63550201
1988 - 1989

Object-Oriented, Highly Parallel AI Machine
BABA Takanobu; YOSHINAGA Tsutomu; KUMAGAI Takeshi; AOKI Kyota; OKUDA Kenzo; YONEZAWA Akinori
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Utsunomiya University, Grant-in-Aid for General Scientific Research (C), This research has proposed an object-oriented total architecture, called A-NET(Actors Network), considering its applications, parallel programming language, and highly parallel machine configuration at the same time. The major results are as follows: (1) We have defined a parallel, object-oriented language A-NETL (A-NET Language) and developed its processor. (2) We have applied A-NETL to the description of AI problems such as an expert system and language recognition, and to the definition of an A-NET operating system to be placed on each processing element. (3) We have designed an A-NET oriented special processor and evaluated the design by a software simulation. These results support our research direction towards the design and implementation of a special purpose VLSI chip and, thus, the implementation of object-oriented, highly parallel architecture., 62550255
1987 - 1988

Industrial Property Rights

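Network system, node device, caching method, and program
Patent right, 吉永努, 中島拓真, 森元敏雄, 石橋靖嗣, Japanese Patent Application No. 2018-015767, Date applied: 31 Jan. 2018, TIS Inc., The University of Electro-Communications, Japanese Patent No. 67281879, Date issued: 23 Jun. 2020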

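Network system, caching method, caching program, management device, management method, and management program
Patent right, 吉永努, 中島拓真, 吉見真聡, 森本敏雄, 村木暢哉, Japanese Patent Application No. 2018-551597, PCT/JP2017/040485, Date applied: 09 Nov. 2017, TIS Inc., National University Corporation The University of Electro-Communications, WO2018/092679, Date announced: 24 May 2018, 6712744, Date issued: 04 Jun. 2020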

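Network system, node device, caching method, and program
Patent right, 吉永努, 中島拓真, 森元敏雄, 石橋靖嗣, Japanese Patent Application No. 2017-138406, Date applied: 14 Jul. 2017, TIS Inc., The University of Electro-Communications, Japanese Unexamined Patent Application Publication No. 2019-020994, Date announced: 07 Feb. 2019, 6638145, Date issued: 07 Jan. 2020, Distributed cooperative caching of video transferred in chunks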

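Network system, wireless communication terminal, communication method, and program
Patent right, 吉永努, 城間隆行, 中島拓真, 森元敏雄, 石橋靖嗣, Japanese Patent Application No. 2017-175164, Date applied: 12 Sep. 2017, TIS Inc., National University Corporation The University of Electro-Communications, Japanese Unexamined Patent Application Publication No. 2019-53358 (P2019-53358A), Date announced: 04 Apr. 2019, 6606808, Date issued: 01 Nov. 2019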

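Network system, node device, caching method, and program
Patent right, 吉永努, 中島拓真, 吉見真聡, 森元敏雄, 村木暢哉, Japanese Patent Application No. 2017-105055, Date applied: 26 May 2017, TIS Inc., National University Corporation The University of Electro-Communications, Japanese Unexamined Patent Application Publication No. 2018-200581, Date announced: 20 Dec. 2018, 6592809, Date issued: 04 Oct. 2019, Caching method and program for hierarchical networks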

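Communication terminal device, communication network system, communication method, and communication program
Patent right, 小木真人, 入江英嗣, 大木裕太, 吉永努, Japanese Patent Application No. 2012-242584, Date applied: 02 Dec. 2012, National University Corporation The University of Electro-Communications, Japanese Unexamined Patent Application Publication No. 2014-092903, Date announced: 19 May 2014, Japanese Patent No. 6061377, Date issued: 22 Dec. 2016, The aim is to provide a communication terminal device that lets the user intuitively designate the devices to be linked and establishes a connection between them without a cumbersome connection setup procedure.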

Network system, caching method, and caching program
Patent right, 吉永努, 中島拓真, 吉見真聡, 森本敏雄, 村木暢哉, Japanese Patent Application No. 2016-224243, Date applied: 17 Nov. 2016, TIS Inc., The University of Electro-Communications

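Data processing device, data processing method, and program
Patent right, オゲヤースィン, 吉見真聡, 入江英嗣, 吉永努, Japanese Patent Application No. 2014-230387, Date applied: 13 Nov. 2015, Japanese Unexamined Patent Application Publication No. 2016-095606, Date announced: 26 May 2016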

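Video data transmission method, video data transmission device that executes the method, video data transmission program for causing a computer to execute the method, and recording medium on which the program is recorded
Patent right, 吉永努, 小山卓視, 坪田浩乃, 小野松丈洋, 錦織義久, Japanese Patent Application No. 2008-168460, Date applied: 27 Jun. 2008, National University Corporation The University of Electro-Communications (50%), Funai Electric Co., Ltd. (50%), Japanese Patent No. 5257659, Date issued: 02 May 2013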

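Router and parallel distributed system
Patent right, 鯉渕道紘, 吉永努, 鎌倉正司郎, Japanese Patent Application No. 2007-135940, Date applied: 22 May 2007, Inter-University Research Institute Corporation Research Organization of Information and Systems, National University Corporation The University of Electro-Communications, Japanese Unexamined Patent Application Publication No. 2008-294586, Date announced: 04 Dec. 2008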