
Sato Shigeyuki
Department of Computer and Network Engineering | Associate Professor |
Cluster I (Informatics and Computer Engineering) | Associate Professor |
Researcher Information
Career
- Oct. 2023 - Present
The University of Electro-Communications, Graduate School of Informatics and Engineering, Associate Professor - Dec. 2018 - Sep. 2023
The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor - Apr. 2016 - Nov. 2018
Kochi University of Technology, School of Information, Postdoctoral Researcher - Apr. 2015 - Mar. 2016
The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
Educational Background
- Apr. 2011 - Mar. 2015
The University of Electro-Communications, Graduate School of Informatics and Engineering, Department of Communication Engineering and Informatics - Apr. 2009 - Mar. 2011
The University of Electro-Communications, Graduate School of Electro-Communications, Department in Computer Science - Apr. 2005 - Mar. 2009
The University of Electro-Communications, Faculty of Electro-Communications, Department of Computer Science
Member History
- Apr. 2024 - Present
編集委員, 日本ソフトウェア科学会編集委員会, Society - Apr. 2023 - Present
編集委員, 情報処理学会論文誌プログラミング編集委員会, Society - Apr. 2023 - Present
運営委員, 情報処理学会プログラミング研究会運営委員会, Society - 2021 - Mar. 2022
PPL2022 プログラム委員 - Mar. 2021 - Mar. 2021
PPL2021 プログラム組織委員 - 2020 - Mar. 2021
PPL2021 プログラム委員 - Nov. 2019 - Feb. 2020
PPoPP 2020 Artifact Evaluation Committee - 2018 - Mar. 2019
プログラム委員, PPL2019, Society - 2017 - Mar. 2018
組織委員, PPL2018, Society - 2016 - Mar. 2017
組織委員, PPL2017, Society - 2015 - Mar. 2016
プログラム委員, PPL2016, Society
Research Activity Information
Award
- 2020
Information Processing Society of Japan
Centaurus: A Just-in-Time Parallel-Parser Generator for Ad hoc Data Processing
IPSJ Yamashita SIG Research Award, 佐藤 重幸 - 2018
日本ソフトウェア科学会
第35回大会 優秀発表賞, 佐藤 重幸 - 2016
日本ソフトウェア科学会
第33回大会 高橋奨励賞, 佐藤 重幸 - 2013
日本ソフトウェア科学会
第30回大会 学生奨励賞, 佐藤 重幸 - 2011
日本ソフトウェア科学会
第28回大会 学生奨励賞, 佐藤 重幸
Paper
- Efficiently Adapting Stateless Model Checking for C11/C++11 to Mixed-Size Accesses
Shigeyuki Sato; Taiyo Mizuhashi; Genki Kimura; Kenjiro Taura
Lecture Notes in Computer Science, Springer Nature Singapore, 346-364, 28 Oct. 2024, Peer-reviwed, Abstract
Stateless model checking (SMC) is crucial for productivity in verified concurrent programming, and its recent developments for C/C++ and weak memory models are remarkable. The state-of-the-art SMC for C, GenMC, efficiently verifies C programs based on C11 atomics and pthreads. However, it does not support mixed-size accesses, accesses to the same memory region with different-sized types, even though they are ubiquitous in C/C++, particularly the code for memory management. As a result, GenMC does not work for C/C++ programs containing memory management. To resolve this problem, we develop a method of adapting GenMC to mixed-size accesses preserving its optimality. We experimentally evaluate the efficiency of our extended implementation of GenMC and its efficacy for memory management programs.
In book - Multiverse Notebook: Shifting Data Scientists to Time Travelers
Shigeyuki Sato; Tomoki Nakamaru
Lead, Proceedings of the ACM on Programming Languages, 8, OOPSLA1, 121:1-121:30, 29 Apr. 2024, Peer-reviwed
Scientific journal, English - Collective Allocator Abstraction to Control Object Spatial Locality in C++
Takato Hideshima; Shigeyuki Sato; Tomoharu Ugawa
The Art, Science, and Engineering of Programming, 15 Feb. 2024, Peer-reviwed
Scientific journal, English - Multiverse Notebook: A Notebook Environment for Safe and Efficient Exploration
Tomoki Nakamaru; Shigeyuki Sato
Companion Proceedings of the 2022 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, ACM, 7-8, 29 Nov. 2022
International conference proceedings - Reverse engineering for reduction parallelization via semiring polynomials.
Akimasa Morihata; Shigeyuki Sato
PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI), ACM, 820-834, Jun. 2021, Peer-reviwed
International conference proceedings - Pitfalls of InfiniBand with On-Demand Paging
Takuya Fukuoka; Shigeyuki Sato; Kenjiro Taura
Corresponding, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, 265-275, Mar. 2021, Peer-reviwed, InfiniBand is a popular high-performance interconnect and offers Remote Direct Memory Access (RDMA), which enables low-latency communication based on kernel bypassing. Although the conventional RDMA technology necessitates manual physical memory management, an emerging extension, On-Demand Paging (ODP), implements automatic memory management based on RDMA-triggered page faults, which benefits productivity. Although the existing studies said the overhead of a page fault of ODP to be small enough, an in-depth investigation in various network situations including retransmission and timeout is missing. In this work, we conduct a comprehensive analysis of the actual behaviors of ODP on different devices and reveal two awful performance pitfalls, which incur longer latencies 3-4 orders of magnitude than a common-case page fault does. We also experimentally demonstrate that the revealed pitfalls are harmful to existing software systems. This paper presents our experimental analysis and lessons learned therefrom.
International conference proceedings - A Dual-Index Based Representation for Processing XPath Queries on Very Large XML Documents
Wei Hao; Kiminori Matsuzaki; Shigeyuki Sato
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST, Springer Science and Business Media Deutschland GmbH, 363, 18-30, 2021, Although XML processing has been intensively studied in recent years, designing efficient implementations for evaluating XPath queries on XML documents remains a challenge in case XML documents are very large. In this study, we implemented a tree-shaped data structure called partial tree that is intrinsically suitable for large XML document processing with multiple computers. Our implementation uses two index sets to accelerate the evaluation of structural relationships among nodes, making it highly efficient for processing very large XML documents regarding three important classes of XPath queries: backward, order-aware and predicate-containing queries. Experiment results show that our implementation outperforms a start-of-the-art XML database BaseX in both absolute loading time and execution time for the target queries. The absolute execution time over 358 GB of XML data averagely is only seconds by using 32 EC2 instances.
International conference proceedings, English - MENPS: A Decentralized Distributed Shared Memory Exploiting RDMA
Wataru Endo; Shigeyuki Sato; Kenjiro Taura
2020 IEEE/ACM Fourth Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM), IEEE, 9-16, Nov. 2020, Peer-reviwed, The spread of RDMA-capable interconnects on supercomputers has enabled the middleware developers to explore new design options for runtime systems based on efficient communications. Observing low-latency networks and shared-memory infrastructure for multi-core processors, we have focused on extending shared-memory abstraction into multiple nodes exploiting RDMA, i.e., Distributed Shared Memory (DSM). We have found that the traditional protocols of DSM designed for two-sided communications cannot fully exploit the performance of RDMA, which necessitates decentralization and coarse-grained communications. To solve this problem, we introduced two methods for the DSM coherence protocol to exploit RDMA and implemented a DSM library MENPS using this protocol. Our evaluation shows that MENPS could accelerate two of five shared-memory applications with minimal modifications and beat an existing RDMA-based DSM runtime.
International conference proceedings - CENTAURUS: A Dynamic Parser Generator for Parallel Ad Hoc Data Extraction
Shigeyuki Sato; Hiroka Ihara; Kenjiro Taura
J. Inf. Process., 28, 724-732, 2020, Peer-reviwed, It is important to handle large-scale data in text formats such as XML, JSON, and CSV because these data very often appear in data exchange. For these data, instead of data ingestion to databases, ad hoc data extraction is highly desirable. The main issue of ad hoc data extraction is to serve both the programmability to allow handling various types of data intuitively and the performance for large-scale data. To pursue it, we develop Centaurus, a dynamic parser generator library for parallel ad hoc data extraction. This paper presents the design and implementation of Centaurus. The experimental results on ad hoc data extraction have demonstrated that Centaurus outperformed fast dedicated parser libraries in C++ for XML and JSON, and achieved excellent scalability with actions implemented in Python.
Scientific journal - Parallelization of XPath Queries Using Modern XQuery Processors.
Shigeyuki Sato; Wei Hao; Kiminori Matsuzaki
New Trends in Databases and Information Systems - ADBIS 2018 Short Papers and Workshops, AI*QA, BIGPMED, CSACDB, M2U, BigDataMAPS, ISTREND, DC, Budapest, Hungary, September, 2-5, 2018, Proceedings, Springer, 54-62, 2018, Peer-reviwed - A Generator of Hadoop MapReduce Programs that Manipulate One-dimensional Arrays
Reina Miyazaki; Kiminori Matsuzaki; Shigeyuki Sato
J. Inf. Process., Information Processing Society of Japan, 25, 841-851, 2017, Peer-reviwed, MapReduce is a framework for large-scale data processing proposed by Google, and its open-source implementation, Hadoop MapReduce, is now widely used. Several language systems have been proposed to make developing MapReduce programs easier, for instance, Sawzall, FlumeJava, Pig, Hive, and Crunch. These language systems mainly target applications that can be naturally solved by using a MapReduce-like programming model. In this study, we propose a new MapReduce-program generator that accepts programs manipulating one-dimensional arrays. By using the proposed generator, users only need to write sequential programs to generate Hadoop MapReduce programs automatically. We applied some program optimization techniques to the generation of Hadoop MapReduce programs. In this paper, we also report our experiment results that compare programs generated by the proposed generator with hand-written MapReduce programs.
Scientific journal, English - A Generic Implementation of Tree Skeletons
Shigeyuki Sato; Kiminori Matsuzaki
Int. J. Parallel Program., SPRINGER/PLENUM PUBLISHERS, 44, 3, 686-707, 2016, Peer-reviwed, In data-parallel skeleton libraries, the implementation of skeletons is usually tightly-coupled with that of data structures. However, loose coupling between them like C++ STL will improve modularity and flexibility of skeletons and data structures. This flexibility is particularly valuable for tree skeletons. To achieve such loose coupling, we present an iterator-based interface of trees for tree skeletons. We have implemented tree skeletons on the basis of our interface; we present their design and implementation. This paper also reports the results of preliminary experiments.
Scientific journal, English - A debugger-cooperative higher-order contract system in Python
Ryoya Arai; Shigeyuki Sato; Hideya Iwasaki
Corresponding, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 10017, 148-168, 2016, Peer-reviwed, Contract programming is one of the most promising ways of enhancing the reliability of Python, which becomes increasingly desired. Higher-order contract systems that support fully specifying the behaviors of iterators and functions are desirable for Python but have not been presented yet. Besides, even with them, debugging with contracts in Python would still be burdensome because of delayed contract checking. To resolve this problem, we present PyBlame, a higher-order contract system in Python, and ccdb, a source-level debugger equipped with features dedicated to debugging with delayed contract checking. PyBlame and ccdb are designed on the basis of the standard of Python and thus friendly to many Python programmers. We have experimentally confirmed the advantage and the efficacy of PyBlame and ccdb through the web framework Bottle.
International conference proceedings, English - s6raph: vertex-centric graph processing framework with functional interface.
Onofre Coll Ruiz; Kiminori Matsuzaki; Shigeyuki Sato
Proceedings of the 5th International Workshop on Functional High-Performance Computing, FHPC@ICFP 2016, Nara, Japan, September 22, 2016, ACM, 58-64, 2016, Peer-reviwed - Design and Implementation of Thunk Recycling in the Glasgow Haskell Compiler
Yasunao Takano; Hideya Iwasaki; Shigeyuki Sato
Computer Software, 32, 1, 253-287, Feb. 2015, Peer-reviwed
Scientific journal, Japanese - LibDSL: a library for developing embedded domain specific languages in d via template metaprogramming
Masato Shioda; Hideya Iwasaki; Shigeyuki Sato
Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences - GPCE 2014, ACM Press, 2014, Peer-reviwed
International conference proceedings - An Operator Generator for Skeletal Programming on Trees
Sato, S; Matsuzaki, K
IPSJ Trans. PRO, 情報処理学会, 6, 4, 38-49, Dec. 2013, Peer-reviwed, 木上の並列計算を抽象化および構造化する手段として,木スケルトンというものが存在する.著者らは,木スケルトンをC++上の高階関数ライブラリとして実装してきている.この木スケルトンは,効率的な並列計算を保証するために,逐次計算において使われる演算子に加えて,補演算子も要求する.この補演算子の導出は一般に難しく,木スケルトンの利用への大きな障害となっている.そこで本研究では,既存の補演算子導出法に基づいた演算子生成器を,Cコンパイラ拡張として実装した.本システムは,木スケルトンを隠蔽することで,利用者が木スケルトンの複雑さに触れることを防ぐ.利用者は逐次的な再帰関数をCで記述するだけで,暗黙かつ単純に木スケルトンを利用できるようになる.したがって,本システムは,木上のスケルトン並列プログラミングの難しさを首尾よく解消する.Tree skeletons are known as a way of abstracting and structuringparallel computation on trees. We have been implementing tree skeletonsas a library of higher-order functions in C++. For efficient parallelcomputing, our tree skeletons necessitate auxiliary operators as well asoperators used in sequential computation. Since an auxiliary operator isdifficult to derive in general, our tree skeletons are too difficult fornon-expert programmers to use. To overcome this difficulty, on thetheoretical basis of existing work, we have developed an operatorgenerator in the form of an extension to a C compiler. By hiding a treeskeleton itself, our implementation prevents users from touching itscomplicatedness; then they have only to describe recursive functions inC to use a tree skeleton simply and implicitly. Our implementationresolves the difficulty of skeletal programming on trees successfully.
Japanese - A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
Shigeyuki Sato; Hideya Iwasaki
PROGRAMMING LANGUAGES AND SYSTEMS, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 5904, 79-94, 2009, Peer-reviwed, Although today's graphics processing units (GPUs) have high performance and general-purpose computing on GPUs (GPGPU) is actively studied, developing GPGPU applications remains difficult for two reasons. First, both parallelization and optimization of GPGPU applications is necessary to achieve high performance. Second, the suitability of the target application for GPGPU must be determined, because whether at application performs well with GPGPU heavily depends on its inherent properties, which are not obvious from the source code. To overcome these difficulties, we developed a skeletal parallel programming framework for rapid GPGPU application developments. It enables program tin to easily write GPGPU applications and rapidly test them because it generates programs for both GPUs and CPUs from the same source code. It also provides an optimization mechanism based on fusion transformation. Its effectiveness was confirmed experimentally.
International conference proceedings, English
MISC
- Enjoy Data Processing[V・Finish]: Python Language
Shigeyuki Sato
The Institute of Electronics, Information and Communication Engineers, Dec. 2019, The journal of the Institute of Electronics, Information and Communication Engineers, 102, 12, 1135-1139, Japanese, Introduction scientific journal - PLDI 2016 Report
Shigeyuki Sato
Information Processing Society of Japan, Oct. 2016, IPSJ magazine, 57, 11, 1154-1156, Japanese, Meeting report
Lectures, oral presentations, etc.
- MN-Core アセンブリ言語の形式検証に向けて
中村 元昭; 佐藤 重幸
Oral presentation, Japanese, 第27回プログラミングおよびプログラミング言語ワークショップ PPL 2025, Peer-reviewed
05 Mar. 2025
05 Mar. 2025- 07 Mar. 2025 - 多相型に基づく型エラー診断の定式化
小林 亮太; 佐藤 重幸; 田浦 健次朗
Oral presentation, Japanese, 第26回プログラミングおよびプログラミング言語ワークショップ PPL 2024, Peer-reviewed
06 Mar. 2024
05 Mar. 2024- 07 Mar. 2024 - プレゼンテーションスライド作成のためのSATySFiノートブック環境
両角 颯; 佐藤 重幸; 田浦 健次朗
Oral presentation, Japanese, 第26回プログラミングおよびプログラミング言語ワークショップ PPL 2024, Peer-reviewed
05 Mar. 2024
05 Mar. 2024- 07 Mar. 2024 - SATySFi におけるドメイン固有型エラー診断
小林 亮太; 佐藤 重幸; 田浦 健次朗
Oral presentation, Japanese, 日本ソフトウェア科学会第40回大会
14 Sep. 2023
12 Sep. 2023- 14 Sep. 2023 - A Far-memory Allocator for Pointer-based Data Structures in C++
秀島, 宇音; 佐藤, 重幸; 田浦, 健次朗
Japanese, 情報処理学会論文誌プログラミング(PRO), http://id.nii.ac.jp/1001/00223355/, Far Memoryは,ネットワーク越しの他マシンのメモリをデータの置き場に加えつつ手元のメモリとの間でのデータ移動を自動的に行う技術であり,手元のメモリ資源を超えた規模のデータに対する多様な操作を,少ないプログラマ負担で実装できるようにする.これをC++などメモリへの低水準操作を許す言語に向けて提供するにあたっては,1)独自のAPIを介さず通常のメモリ領域と同様に読み書きでき,2)かつ通常の領域と使い分けられることが望ましい.だが我々の知る限りこの両方を満たす実装は存在しない.そこで本発表では,メモリ空間内の部分空間に対してFar Memory機能を付加し,そこに属するメモリ領域を専用のアロケータで確保できるようにする,アロケータとしてのFar Memory実装を提案する.これによりプログラマは,通常の領域と区別できる方法でFar Memory領域を確保し,そこへの読み書きは通常の領域と同様に記述できるようになる.提案実装はそうした読み書きにページキャッシュを用意して対応しつつ,さらに局所性を意識したメモリ確保によって性能向上に寄与する.我々は提案実装の上にポインタベースの構造である平衡木を実装することを通じて,各々の特性を活かして性能を高めるプログラミングが実現されることと,提案実装が生むFar Memory領域と通常の領域との高い相互運用性がプログラマの負担を軽減することを確認した.
Far memory is a technology that enables us to utilize the memory resources of remote machines as if they are local ones by transparently moving data through networks. It facilitates implementing various operations on larger-scale data than the memory resources of local machines. In C/C++, or languages supporting direct operations on memory, far-memory regions have two desiderata: 1) the same operations as normal regions are applicable to them; yet, 2) they support region-aware programming. Unfortunately, to the best of our knowledge, no implementation satisfies both of them. In this presentation, we propose an allocator implementation of far memory, which creates far-memory sub-regions within the memory space and provides a dedicated allocator for them. With this interface, programmers can perform region-aware allocation and direct read/write operations to the allocated far-memory regions. Our implementation handles such reads/writes with page cache, and furthermore, contributes to performance improvement by locality-aware allocation. Through implementing balanced trees atop the proposed allocator, we have confirmed that our allocator supports region-aware programming to improve performance and offers high productivity stemming from the interoperability with the existing C/C++ code.
13 Jan. 2023
13 Jan. 2023- 13 Jan. 2023 - Simultaneous Finite Automaton の部分構成による並列正規表現マッチ
高品 剛大; 佐藤 重幸; 田浦 健次朗
Japanese, 第24回プログラミングおよびプログラミング言語ワークショップ PPL 2022, Peer-reviewed
06 Mar. 2022
06 Mar. 2022- 08 Mar. 2022 - PLAGS UT: 自動評価付きPythonプログラミング課題管理システム
佐藤 重幸; 犬伏 貴之
Japanese, 第24回プログラミングおよびプログラミング言語ワークショップ PPL 2022, Peer-reviewed
06 Mar. 2022
06 Mar. 2022- 08 Mar. 2022 - Implementing Maximum Flow Algorithms with Spark GraphX
松本 拓也; 佐藤 重幸; 松崎 公紀
Japanese, 日本ソフトウェア科学会大会論文集, 日本ソフトウェア科学会, http://ci.nii.ac.jp/naid/40021461282
18 Sep. 2017
18 Sep. 2017- 18 Sep. 2017 - A Generator of Hadoop MapReduce Programs that Manipulate One-dimensional Arrays
Reina Miyazaki; Kiminori Matsuzaki; Shigeyuki Sato
English, 情報処理学会論文誌プログラミング(PRO), http://id.nii.ac.jp/1001/00182661/, MapReduce is a framework for large-scale data processing proposed by Google, and its open-source implementation, Hadoop MapReduce, is now widely used. Several language systems have been proposed to make developing MapReduce programs easier, for instance, Sawzall, FlumeJava, Pig, Hive, and Crunch. These language systems mainly target applications that can be naturally solved by using a MapReduce-like programming model. In this study, we propose a new MapReduce-program generator that accepts programs manipulating one-dimensional arrays. By using the proposed generator, users only need to write sequential programs to generate Hadoop MapReduce programs automatically. We applied some program optimization techniques to the generation of Hadoop MapReduce programs. In this paper, we also report our experiment results that compare programs generated by the proposed generator with hand-written MapReduce programs.------------------------------This is a preprint of an article intended for publication Journal ofInformation Processing(JIP). This preprint should not be cited. Thisarticle should be cited as: Journal of Information Processing Vol.25(2017) (online)------------------------------MapReduce is a framework for large-scale data processing proposed by Google, and its open-source implementation, Hadoop MapReduce, is now widely used. Several language systems have been proposed to make developing MapReduce programs easier, for instance, Sawzall, FlumeJava, Pig, Hive, and Crunch. These language systems mainly target applications that can be naturally solved by using a MapReduce-like programming model. In this study, we propose a new MapReduce-program generator that accepts programs manipulating one-dimensional arrays. By using the proposed generator, users only need to write sequential programs to generate Hadoop MapReduce programs automatically. We applied some program optimization techniques to the generation of Hadoop MapReduce programs. In this paper, we also report our experiment results that compare programs generated by the proposed generator with hand-written MapReduce programs.------------------------------This is a preprint of an article intended for publication Journal ofInformation Processing(JIP). This preprint should not be cited. Thisarticle should be cited as: Journal of Information Processing Vol.25(2017) (online)------------------------------
21 Jul. 2017
21 Jul. 2017- 21 Jul. 2017 - Locality Enhancement for Generalized N-body Algorithm
佐藤 重幸; 田浦 健次朗
Japanese, 日本ソフトウェア科学会大会論文集, 日本ソフトウェア科学会, http://id.ndl.go.jp/bib/027831494
07 Sep. 2016
07 Sep. 2016- 07 Sep. 2016 - Abstraction of Space Partitioning for Spatial Computation
Shigeyuki Sato; Kenjiro Taura
English, 情報処理学会論文誌プログラミング(PRO), http://id.nii.ac.jp/1001/00157597/, Space-partitioning trees, each of which holds particles in some space, are extensively used in various applications. Parallel programming with space-partitioning trees is an important issue because they are often used for spatial computation on large-scale data that motivates parallel computing. High-level computational patterns that abstract spatial computation on space-partitioning trees will make parallel programming easy, simple, and productive. Although they have tree structures, spatial computation on them is inherently interaction/correlation computation on two sets rather than tree computation. Existing patterns based only on tree structures are therefore insufficient to abstract it. In this study, we present a pattern based on cross edges between two space-partitioning trees. It abstracts computations between subspaces/particles as cross edges. We have implemented our pattern as a C++ template library at the proof-of-concept level. We present our library API, describe its affinity with parallel implementation, and also report the experimental overhead of sequential execution.Space-partitioning trees, each of which holds particles in some space, are extensively used in various applications. Parallel programming with space-partitioning trees is an important issue because they are often used for spatial computation on large-scale data that motivates parallel computing. High-level computational patterns that abstract spatial computation on space-partitioning trees will make parallel programming easy, simple, and productive. Although they have tree structures, spatial computation on them is inherently interaction/correlation computation on two sets rather than tree computation. Existing patterns based only on tree structures are therefore insufficient to abstract it. In this study, we present a pattern based on cross edges between two space-partitioning trees. It abstracts computations between subspaces/particles as cross edges. We have implemented our pattern as a C++ template library at the proof-of-concept level. We present our library API, describe its affinity with parallel implementation, and also report the experimental overhead of sequential execution.
26 Feb. 2016
26 Feb. 2016- 26 Feb. 2016 - Using Hardware Transactional Memory for Efficiently Solving Graph Problems
小林 哲; 佐藤 重幸; 岩崎 英哉
Japanese, 情報処理学会論文誌プログラミング(PRO), http://id.nii.ac.jp/1001/00142229/, 近年,ハードウェアトランザクショナルメモリ(HTM)を搭載したCPUが市場にリリースされてきている.HTMのような投機的並列実行は,グラフアルゴリズムの並列処理に有効であることが知られている.そのため,HTMをグラフ問題の並列処理に利用することは有用と考えられる.しかし,現実的なグラフ問題の並列処理におけるトランザクションは,グラフの探索と更新をアトミックに行う必要があるため長くなりやすい.HTMではハードウェアの制限によって,このようなトランザクションを効率的に実行することは難しい.この問題を解決するため,本発表では,HTMのトランザクションを短縮する方法を提案する.提案方法では,計算処理とグラフの探索部分をトランザクションの外に出し,グラフの更新だけをトランザクション内で実行する.本発表では,HTMとしてIntel HaswellのRestricted Transactional Memoryを,グラフ問題としてDelaunay Mesh Refinementを取り上げ,提案する方法をpthreadsのmutexによる実装と比較して評価を行った.Nowadays, commodity processors are equipped with hardware transactional memory (HTM). Speculative parallel execution, which HTM adopts, is known to bring a performance gain in a wide range of graph algorithms. Therefore, the application of HTM to parallel graph processing seems promising. However, transactions in parallel graph processing for practical graph problems tend to be long because both graph traversal and updating have generally to be atomic. Executing such transactions efficiently is difficult because of hardware restrictions of HTM. To resolve this problem, we propose a method for shortening transactions of HTM. It makes calculation and graph traversal migrate from transactions and performs only graph updating within transactions. In this study, we use the Restricted Transactional Memory, which is a HTM functionality of the Intel Haswell architecture, and deal with Delaunay Mesh Refinement as an example graph problem. We evaluate the proposed method by comparing it with mutex locks in Phreads.
02 Jun. 2015
02 Jun. 2015- 02 Jun. 2015 - Locality-Aware Parallel Iterative Tree Traversal
佐藤 重幸
English, 日本ソフトウェア科学会大会論文集, [日本ソフトウェア科学会], http://id.ndl.go.jp/bib/026725220
07 Sep. 2014
07 Sep. 2014- 07 Sep. 2014 - A Parallel Implementation of Syntax-Directed Dataflow Analysis
佐藤 重幸; 森畑 明昌
Japanese, 日本ソフトウェア科学会大会論文集, [日本ソフトウェア科学会], http://id.ndl.go.jp/bib/026720114
10 Sep. 2013
10 Sep. 2013- 10 Sep. 2013
Courses
- Software Engineering
Oct. 2024 - Present
The University of Electro-Communications - Fundamentals of Programming Lanugages
Apr. 2024 - Present
The University of Electro-Communications - Compiler
Apr. 2024 - Present
The University of Electro-Communications - Introduction to Python Programming
Dec. 2018 - Sep. 2023
The University of Tokyo - Python Programming Paradigms
Aug. 2020 - Aug. 2021
The University of Tokyo - 数値計算法
Jan. 2018 - Feb. 2018
Kochi University of Technology
Research Themes
- Research on Managed Languages and Runtime Systems to Utilize Memory with Computational Capabilities
鵜川 始陽; 塩谷 亮太; 佐藤 重幸; 穐山 空道
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B), The University of Tokyo, Grant-in-Aid for Scientific Research (B), Coinvestigator, 22H03566
Apr. 2022 - Mar. 2027 - Program synthesis for Processing-in-Memory architectures
佐藤 重幸
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Early-Career Scientists, The University of Tokyo, Grant-in-Aid for Early-Career Scientists, Principal investigator, 22K17872
Apr. 2022 - Mar. 2026 - Advanced loop parallelization and integrated vectorization
Sato Shigeyuki
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Early-Career Scientists, The University of Tokyo, Grant-in-Aid for Early-Career Scientists, Principal investigator, In this project, we studied automatic parallelization of complicated reduction loops. Our developed techniques enable us to systematically transform serial intuitive specifications into divide-and-conquer implementations amenable to various parallel computers. Specifically, we developed 1) a branch elimination method by operator extraction, 2) a parallelization method based on dynamic behaviors, and 3) a vectorization method based on data shuffling as part of the fundamental compiler technology. To evaluate these techniques, we also developed 4) a benchmark suite of reduction loops. Furthermore, we developed implementation techniques for 5) parallel lexing and 6) parallel regular expression matching as case studies on applications of complicated reductions., 18K18032
Apr. 2018 - Mar. 2022 - 自動チューニング可能な一般化N体問題解法枠組みの開発
佐藤重幸
JST, ACT-I加速フェーズ, Principal investigator, JPMJPR17UC
Apr. 2019 - Mar. 2021 - 自動チューニング可能な一般化N体問題解法枠組みの開発
佐藤重幸
JST, ACT-I, Principal investigator
Oct. 2017 - Mar. 2019