
Yusaku YAMAMOTO
Department of Computer and Network Engineering | Professor |
Cluster I (Informatics and Computer Engineering) | Professor |
Fundamental Program for Advanced Engineering (Evening Course) | Professor |
Researcher Information
Research Keyword
Research Activity Information
Award
Paper
- Comparison of Two QUBO Formulations of Approximate Block Diagonalization and Their Performance on the D-Wave Advantage Quantum Annealing Machine
Koushi Teramoto; Shuhei Kudo; Yusaku Yamamoto
Lecture Notes in Computer Science, Springer Nature Switzerland, 217-230, 01 Apr. 2025, Peer-reviwed
In book - Discrete Lotka–Volterra systems with time delay and its stability analysis
Yusaku Yamamoto; Taisei Yamamoto; Takumi Kuroiwa; Kurumi Oka; Emiko Ishiwata; Masashi Iwasaki
Lead, Physica D: Nonlinear Phenomena, Elsevier BV, 474, 134562-134562, Apr. 2025, Peer-reviwed
Scientific journal - Convergence analysis of the generalized Newton shift for computing singular values with the dqds algorithm
Kinji Kimura; Yusaku Yamamoto; Yusuke Hirota; Masami Takata; Takumi Yamashita; Yoshimasa Nakamura
Japan Journal of Industrial and Applied Mathematics, Springer Science and Business Media LLC, 03 Mar. 2025, Peer-reviwed, Abstract
The generalized Newton shift is a generalization of the Newton shift for computing the singular values of a bidiagonal matrix with the dqds or mdLVs algorithm and is defined as $$\left( \textrm{Tr}\left( \left( B^{(n)}\right) ^{\top } B^{(n)}\right) ^{-p}\right) ^{-1/p}$$ , where $$B^{(n)}$$ is the bidiagonal matrix at the n-th iteration and p is some positive integer. This quantity can be computed in O(pm) work, where m is the order of the matrix, and another $$O(p^2 m)$$ subtraction-free algorithm is also available for higher accuracy. We analyze the asymptotic convergence property of the dqds algorithm with the generalized Newton shift and show that its order of convergence is $$p+1-\epsilon $$ for any $$\epsilon >0$$ . This result is confirmed by numerical experiments. Since the generalized Newton shift can achieve arbitrarily high order of convergence, it should be useful as a building block for constructing an efficient shifting strategy for the dqds algorithm.
Scientific journal - 二重指数関数型数値積分公式に基づく行列符号関数計算法の改良と性能評価
宮下 朋也; 工藤 周平; 山本 有作
Corresponding, 日本応用数理学会論文誌, 34, 3, 66-97, Sep. 2024, Peer-reviwed
Scientific journal - q-discretization of the Kostant–Toda equation and its asymptotic analysis
Ryoto Watanabe; Masato Shinjo; Yusaku Yamamoto; Masashi Iwasaki
Transactions of Mathematics and Its Applications, Oxford University Press (OUP), 8, 2, 04 Jul. 2024, Peer-reviwed, Abstract
The famous Toda equation is an integrable system related to similarity transformations of tridiagonal matrices. The discrete Toda equation, which is a time-discretization of the Toda equation, is essentially the recursion formula of the quotient-difference (qd) algorithm for computing eigenvalues of tridiagonal matrices. Another time-discretization of the Toda equation is the $q$-discrete Toda equation, which is derived by replacing standard derivatives with the so-called $q$-derivatives that involves a parameter $q$ such that $0<q<1$. In a prior work, we related the $q$-discrete Toda equation to implicit-shift $LR$ transformations (which are similarity transformations) of tridiagonal matrices. Furthermore, we developed the determinantal solution to clarify the convergence as discrete-time goes to infinity. In this paper, we propose an extension of the $q$-discrete Toda equation as a time-discretization of the Kostant–Toda equation and then show the convergence as discrete-time goes to infinity from the perspective of implicit-shift $LR$ transformations of Hessenberg matrices. We also present numerical examples to verify the convergence as discrete-time goes to infinity in the proposed $q$-discrete Kostant–Toda equation.
Scientific journal - A fast and efficient computation method for reflective diffraction simulations
Shuhei Kudo; Yusaku Yamamoto; Takeo Hoshi
Computer Physics Communications, Elsevier BV, 296, 109029-109029, Mar. 2024, Peer-reviwed
Scientific journal - Approximate Block Diagonalization of Symmetric Matrices Using Quantum Annealing
Koushi Teramoto; Masaki Kugaya; Shuhei Kudo; Yasuhiko Takenaga; Yusaku Yamamoto
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, ACM, 18 Jan. 2024, Peer-reviwed
International conference proceedings - Box and ball system with numbered boxes and balls
Yusaku Yamamoto; Akiko Fukuda; Emiko Ishiwata; Masashi Iwasaki
Lead, RIMS Kokyuroku Bessatsu, B94, 1, 21-36, Nov. 2023, Peer-reviwed
Scientific journal, English - Automatic performance tuning using the ATMathCoreLib tool: Two experimental studies related to dense symmetric eigensolvers
Masato Kobayashi; Yusuke Hirota; Shuhei Kudo; Takeo Hoshi; Yusaku Yamamoto
Concurrency and Computation: Practice and Experience, Wiley, 30 Jun. 2023, Peer-reviwed, Summary
We consider automatic performance tuning of dense symmetric eigenvalue problems using ATMathCoreLib, which is a library to assist automatic tuning. We deal with two problems, namely, automatic code selection for the symmetric generalized eigenvalue problem in distributed‐memory parallel environments and automatic parameter tuning in tridiagonalization of dense symmetric matrices on multicore processors. As for the first problem, numerical experiments show that ATMathCoreLib can choose the fastest solver for a given computing environment and problem size quickly even if the fluctuation in the execution time is as high as 40%. As for the second problem, ATMathCoreLib was able to select nearly optimal combinations of the algorithm and its parameter reliably and efficiently for various computing environments and matrix sizes. The performance of auto‐tuning was further enhanced by incorporating a user‐provided execution‐time model into ATMathCoreLib.
Scientific journal - Discrete relativistic Toda equation from the perspective of shifted transformation
Yusaku Yamamoto; Naoya Minoshita; Masashi Iwasaki
Lead, Physica D, 440, 1, 133485-133493, 02 Aug. 2022, Peer-reviwed
Scientific journal, English - Convergence to Singular Triplets in the Two-Sided Block-Jacobi SVD Algorithm with Dynamic Ordering
Gabriel Okša; Yusaku Yamamoto; Marián Vajteršic
SIAM Journal on Matrix Analysis and Applications, 45, 3, 1238-1262, 01 Aug. 2022, Peer-reviwed
Scientific journal, English - Discrete hungry integrable systems — 40 years from the Physica D paper by W.W. Symes
Masato Shinjo; Akiko Fukuda; Koichi Kondo; Yusaku Yamamoto; Emiko Ishiwata; Masashi Iwasaki; Yoshimasa Nakamura
Physica D, 439, 1, 133422-133435, 23 Jun. 2022, Peer-reviwed
Scientific journal, English - Box and Ball System with Numbered Boxes
Yusaku Yamamoto; Akiko Fukuda; Sonomi Kakizaki; Emiko Ishiwata; Masashi Iwasaki; Yoshimasa Nakamura
Mathematical Physics, Analysis and Geometry, Springer, 25, 1, 1-20, 24 Apr. 2022, Peer-reviwed
Scientific journal, English - Combinatorial preconditioning for accelerating the convergence of the parallel block Jacobi method for the symmetric eigenvalue problem
Masaki Kugaya; Shuhei Kudo; Yusaku Yamamoto
JSIAM Letters, 13, 1, 56-59, 26 Sep. 2021, Peer-reviwed
Scientific journal, English - Acceleration and Parallelization of a Linear Equation Solver for Crack Growth Simulation Based on the Phase Field Model
Gaku Ishii; Yusaku Yamamoto; Takeshi Takaishi
Mathematics, MDPI Publishing, 9, 18, 2248-2268, 13 Sep. 2021, Peer-reviwed
Scientific journal, English - On eigenvalues of a matrix arising in energy-preserving/dissipative continuous-stage Runge-Kutta methods
Yusaku Yamamoto
Special Matrices, De Gruyter, 10, 1, 34-39, 10 Aug. 2021, Peer-reviwed
Scientific journal, English - Block red–black MILU(0) preconditioner with relaxation on GPU
Akemi Shioya; Yusaku Yamamoto
Parallel Computing, Elsevier, 103, 1, 102760-102772, 01 Jun. 2021, Peer-reviwed
Scientific journal, English - Error analysis of the cholesky qr-based block orthogonalization process for the one-sided block Jacobi SVD algorithm
Shuhei Kudo; Yusaku Yamamoto; Toshiyuki Imamura
Computing and Informatics, Slovak Academy of Sciences, Institute of Theatre and Film Research of the Center for Research in Art, 39, 6, 1203-1228, 01 May 2021, Peer-reviwed, The one-sided block Jacobi method (OSBJ) has attracted attention as a fast and accurate algorithm for the singular value decomposition (SVD). The computational kernel of OSBJ is orthogonalization of a column block pair, which amounts to computing the SVD of this block pair. Hari proposes three methods for this partial SVD, and we found through numerical experiments that the variant named “V2”, which is based on the Cholesky QR method, is the fastest variant and achieves satisfactory accuracy. While it is a good news from a practical viewpoint, it seems strange considering the well-known instability of the Cholesky QR method. In this paper, we perform a detailed error analysis of the V2 variant and explain why and when it can be used to compute the partial SVD accurately. Thus, our results provide a theoretical support for using the V2 variant safely in the OSBJ method.
Scientific journal, English - On convergence to eigenvalues and eigenvectors in the block-Jacobi EVD algorithm with dynamic ordering
Yusaku Yamamoto; Gabriel Oksa; Marian Vajtersic
Linear Algebra and its Applications, Elsevier, 622, 1, 19-45, 01 Mar. 2021, Peer-reviwed
Scientific journal, English - Convergence acceleration of the shifted LR transformations for totally nonnegative Hessenberg matrices, Applications of Mathematics
Akiko Fukuda; Yusaku Yamamoto; Masashi Iwasaki; Emiko Ishiwata; Yoshimasa Nakamura
Applications of Mathematics, 65, 5, 677-702, 13 Oct. 2020, Peer-reviwed, We design shifted LR transformations based on the integrable discrete hungry Toda equation to compute eigenvalues of totally nonnegative matrices of the banded Hessenberg form. The shifted LR transformation can be regarded as an extension of the extension employed in the well-known dqds algorithm for the symmetric tridiagonal eigenvalue problem. In this paper, we propose a new and effective shift strategy for the sequence of shifted LR transformations by considering the concept of the Newton shift. We show that the shifted LR transformations with the resulting shift strategy converge with order 2 − ε for arbitrary ε > 0.
Scientific journal, English - Fixed-point analysis of Ogita-Aishima’s symmetric eigendecomposition refinement algorithm for multiple eigenvalues
K. Shiroma; Y. Yamamoto
JSIAM Letters, 日本応用数理学会, 12, 1, 5-8, 01 Jun. 2020, Peer-reviwed
Scientific journal, English - Shifted Cholesky QR for Computing the QR Factorization of Ill-Conditioned Matrices.
Takeshi Fukaya; Ramaseshan Kannan; Yuji Nakatsukasa; Yusaku Yamamoto; Yuka Yanagisawa
SIAM J. Scientific Computing, 42, 1, -503, 2020, Peer-reviwed, The Cholesky QR algorithm is an efficient communication-minimizing algorithm
for computing the QR factorization of a tall-skinny matrix. Unfortunately it
has the inherent numerical instability and breakdown when the matrix is
ill-conditioned. A recent work establishes that the instability can be cured by
repeating the algorithm twice (called CholeskyQR2). However, the applicability
of CholeskyQR2 is still limited by the requirement that the Cholesky
factorization of the Gram matrix runs to completion, which means it does not
always work for matrices $X$ with $\kappa_2(X)\gtrsim { { \bf u } }^{-\frac{1},{2 } }$
where ${ { \bf u } }$ is the unit roundoff. In this work we extend the
applicability to $\kappa_2(X)=\mathcal{O}({\bf u}^{-1})$ by introducing a shift
to the computed Gram matrix so as to guarantee the Cholesky factorization
$R^TR= A^TA+sI$ succeeds numerically. We show that the computed $AR^{-1}$ has
reduced condition number $\leq { { \bf u } }^{-\frac{1},{2 } }$, for which CholeskyQR2
safely computes the QR factorization, yielding a computed $Q$ of orthogonality
$\|Q^TQ-I\|_2$ and residual $\|A-QR\|_F/\|A\|_F$ both $\mathcal{O}({ { \bf u } })$.
Thus we obtain the required QR factorization by essentially running Cholesky QR
thrice. We extensively analyze the resulting algorithm shiftedCholeskyQR to
reveal its excellent numerical stability. shiftedCholeskyQR is also highly
parallelizable, and applicable and effective also when working in an oblique
inner product space. We illustrate our findings through experiments, in which
we achieve significant (up to x40) speedup over alternative methods.
Scientific journal - Asymptotic Quadratic Convergence of the Two-Sided Serial and Parallel Block-Jacobi SVD Algorithm
Gabriel Oksa; Yusaku Yamamoto; Martin Becka; Marian Vajtersic
SIAM Journal on Matrix Analysis and Applications, Society for Industrial and Applied Mathematics, 40, 2, 639-671, 23 May 2019, Peer-reviwed
Scientific journal, English - Efficient implementations of the modified Gram–Schmidt orthogonalization with a non-standard inner product
Akira Imakura; Yusaku Yamamoto
Japan Journal of Industrial and Applied Mathematics, 36, 2, 619-641, 29 Apr. 2019, Peer-reviwed
Scientific journal, English - EigenKernel: A middleware for parallel generalized eigenvalue solvers to attain high scalability and usability
Kazuyuki Tanaka; Hiroto Imachi; Tomoya Fukumoto; Akiyoshi Kuwata; Yuki Harada; Takeshi Fukaya; Yusaku Yamamoto; Takeo Hoshi
Japan Journal of Industrial and Applied Mathematics, 36, 2, 719-742, 29 Apr. 2019, Peer-reviwed
Scientific journal, English - 荻田・相島の固有ベクトル反復改良法に基づく実対称行列の固有値分解追跡手法
白間久瑠美; 工藤周平; 山本有作
日本応用数理学会論文誌, 29, 1, 78-120, 25 Mar. 2019, Peer-reviwed
Scientific journal, Japanese - On Using the shifted minimal residual method for quantum-mechanical wave packet simulation
Hiroaki Seito; Takeo Hoshi; Yusaku Yamamoto
JSIAM Letters, 11, 1, 13-16, 05 Mar. 2019, Peer-reviwed
Scientific journal, English - A case study on modeling the performance of dense matrix computation: Tridiagonalization in the EigenExa eigensolver on the K computer
Takeshi Fukaya; Toshiyuki Imamura; Yusaku Yamamoto
Proceedings of 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, 1, 1, 1-1, 06 Aug. 2018, Peer-reviwed
International conference proceedings, English - Performance of the parallel block Jacobi method with dynamic ordering for the symmetric eigenvalue problem
Shuhei Kudo; Kousuke Yasuda; Yusaku Yamamoto
JSIAM Letters, 10, 1, 41-44, 01 Aug. 2018, Peer-reviwed
Scientific journal, English - Asymptotic quadratic convergence of the parallel block-Jacobi EVD algorithm with dynamic ordering for Hermitian matrices
Gabriel Oksa; Yusaku Yamamoto; Marian Vajtersic
BIT Numerical Mathematics, 0, 0, 0-0, 01 May 2018, Peer-reviwed
Scientific journal, English - The danger of combining block red–black ordering with modified incomplete factorizations and its remedy by perturbation or relaxation
Akemi Shioya; Yusaku Yamamoto
Japan Journal of Industrial and Applied Mathematics, Springer Tokyo, 35, 1, 195-216, 01 Mar. 2018, Peer-reviwed, Modified incomplete LU/Cholesky factorizations without fill-ins are popular preconditioners for Krylov subspace methods, because they require no extra memory and have more potential of accelerating the convergence than simple ILU/IC preconditioners. For parallelizing preconditioners, the block red–black ordering is attractive due to its highly parallel nature and small number of synchronization points. Hence, their combination seems to produce powerful and parallelizable preconditioners. In fact, however, this combination can cause breakdown of the factorization due to the occurrence of zero pivots. We analyze this phenomenon and give necessary and sufficient conditions of zero pivots in the case of a regular grid. We also show both theoretically and experimentally that adding perturbation to the diagonal elements or relaxing the compensation of dropped fill-ins is useful to alleviate the problem. Numerical tests show that the resulting preconditioners are highly effective and are applicable for up to 10 3 level of parallelism.
Scientific journal, English - On using the Cholesky QR method in the full-blocked one-sided Jacobi algorithm
Shuhei Kudo; Yusaku Yamamoto
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 10777, 612-622, 2018, Peer-reviwed, The one-sided Jacobi method is known as an alternative of the bi-diagonalization based singular value decomposition (SVD) algorithms like QR, divide-and-conquer and MRRR, because of its accuracy and comparable performance. There is an extension of the one-sided Jacobi method called “full-blocked” method, which can further improve the performance by replacing level-1 BLAS like operations with matrix multiplications. The main part of the full-blocked one-sided Jacobi method (OSBJ) is computing the SVD of a pair of block columns of the input matrix. Thus, the computation method of this partial SVD is important for both accuracy and performance of OSBJ. Hari proposed three methods for this computation, and we found out that one of the method called “V2”, which computes the QR decomposition in this partial SVD using the Cholesky QR method, is the fastest and has comparable accuracy with other method. This is interesting considering that Cholesky QR is generally known as fast but unstable algorithm. In this article, we analyze the accuracy of V2 and explain why and when the Cholesky QR method used in it can compute the QR decomposition accurately. We also show the performance and accuracy comparisons with other computational methods.
International conference proceedings, English - One-way dissectionオーダリングによる連立一次方程式の直接解法の並列化
NAKANO Tomoki; YOKOKAWA MITSUO; FUKAYA Takeshi; YAMAMOTO Yusaku
情報処理学会第162回ハイパフォーマンスコンピューティング研究会, Information Processing Society of Japan, 2017-HPC-162, 19, 1-10, Dec. 2017, 時間依存シュレディンガー方程式のモデル問題として2次元ポアソン方程式を取り上げ,one-way dissection オーダリングによる正定値対称疎行列を係数行列にもつ連立一次方程式の並列直接解法に対して,いくつかの疎行列格納方式を用いた場合の性能評価結果について述べる.また,新しいスカイライン格納方式を提案し,その有効性を確認した.
Symposium, Japanese - Solution of a Nonlinear Eigenvalue Problem Using Signed Singular Values
Ooi, Kouhei; Mizuno, Yoshinori; Sogabe, Tomohiro; Yamamoto, Yusaku; Zhang, Shao-Liang
EAST ASIAN JOURNAL ON APPLIED MATHEMATICS, CAMBRIDGE UNIV PRESS, 7, 4, 799-809, Nov. 2017, Peer-reviwed, We propose a robust numerical algorithm for solving the nonlinear eigenvalue problem A (lambda) x = 0. Our algorithmis based on the idea of finding the value of. forwhich A(lambda) is singular by computing the smallest eigenvalue or singular value of A(lambda) viewed as a constant matrix. To further enhance computational efficiency, we introduce and use the concept of signed singular value. Our method is applicable when A(lambda) is large and nonsymmetric and has strong nonlinearity. Numerical experiments on a nonlinear eigenvalue problem arising in the computation of scaling exponent in turbulent flow show robustness and effectiveness of our method.
Scientific journal, English - Asymptotic quadratic convergence of the serial block-Jacobi EVD algorithm for Hermitian matrices
Gabriel Oksa; Yusaku Yamamoto; Marian Vajtersic
NUMERISCHE MATHEMATIK, SPRINGER HEIDELBERG, 136, 4, 1071-1095, Aug. 2017, Peer-reviwed, We provide the proof of the asymptotic quadratic convergence of the classical serial block-Jacobi EVD algorithm for Hermitian matrices with well-separated eigenvalues (including the multiple ones) as well as clusters of eigenvalues. At each iteration step, two off-diagonal blocks with the largest Frobenius norm are eliminated which is an extension of the original Jacobi approach to the block case. Numerical experiments illustrate and confirm the developed theory.
Scientific journal, English - On the optimality and sharpness of Laguerre's lower bound on the smallest eigenvalue of a symmetric positive definite matrix
Yusaku Yamamoto
APPLICATIONS OF MATHEMATICS, ACAD SCIENCES CZECH REPUBLIC, INST MATHEMATICS, 62, 4, 319-331, Aug. 2017, Peer-reviwed, Lower bounds on the smallest eigenvalue of a symmetric positive definite matrix A a R (mxm) play an important role in condition number estimation and in iterative methods for singular value computation. In particular, the bounds based on Tr(A (-1)) and Tr(A (-2)) have attracted attention recently, because they can be computed in O(m) operations when A is tridiagonal. In this paper, we focus on these bounds and investigate their properties in detail. First, we consider the problem of finding the optimal bound that can be computed solely from Tr(A (-1)) and Tr(A (-2)) and show that the so called Laguerre's lower bound is the optimal one in terms of sharpness. Next, we study the gap between the Laguerre bound and the smallest eigenvalue. We characterize the situation in which the gap becomes largest in terms of the eigenvalue distribution of A and show that the gap becomes smallest when {Tr(A (-1))}(2)/Tr(A (-2)) approaches 1 or m. These results will be useful, for example, in designing efficient shift strategies for singular value computation algorithms.
Scientific journal, English - Performance analysis and optimization of the parallel one-sided block Jacobi SVD algorithm with dynamic ordering and variable blocking
Shuhei Kudo; Yusaku Yamamoto; Martin Becka; Marian Vajtersic
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, WILEY, 29, 9, May 2017, Peer-reviwed, The one-sided block Jacobi (OSBJ) method is known to be an efficient method for computing the singular value decomposition on a parallel computer. In this paper, we focus on the most recent variant of the OSBJ method, the one with parallel dynamic ordering and variable blocking, and present both theoretical and experimental analyses of the algorithm. In the first part of the paper, we provide a detailed theoretical analysis of its convergence properties. In the second part, based on preliminary performance measurement on the Fujitsu FX10 and SGI Altix ICE parallel computers, we identify two performance bottlenecks of the algorithm and propose new implementations to resolve the problem. Experimental results show that they are effective and can achieve up to 1.8 and 1.4 times speedup of the total execution time on the FX10 and the Altix ICE, respectively. Comparison with the ScaLAPACK SVD routine PDGESVD shows that our OSBJ solver is efficient when solving small to medium sized problems (n<10000) using modest number (<100) of computing nodes. Copyright (c) 2016 John Wiley & Sons, Ltd.
Scientific journal, English - 箱と玉の両方に番号が付いた箱玉系について
福田 亜希子; 山本有作; 岩崎雅史; 石渡 恵美子; 中村佳正
九州大学応用力学研究所 研究集会報告 「非線形波動研究の深化と展開」, 28AO-S6, 87-92, Mar. 2017, Peer-reviwed
Japanese - Probabilistic analysis of an estimator for the Frobenius norm of a matrix product
Yusaku Yamamoto; Shuhei Kudo
JSIAM Letters, 9, 9-12, 23 Feb. 2017, Peer-reviwed
Scientific journal, English - Xeon PhiにおけるDSYRKの並列化手法と性能解析
工藤 周平; 山本 有作
情報処理学会論文誌コンピューティングシステム(ACS), 9, 4, 15-24, 30 May 2016, Peer-reviwed
Scientific journal, Japanese - On constructing cost models for online automatic tuning using ATMathCoreLib: Case studies through the SVD computation on a multicore processor
Seiji Nagashima; Takeshi Fukaya; Yusaku Yamamoto
2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC), IEEE, 345-352, 2016, Peer-reviwed, We consider the problem of online automatic tuning. In this setting, we execute the target program with some tuning parameters N times, where N is given, while optimizing the parameters to minimize some objective function such as the total execution time. Thus we have to choose the parameters for each execution by taking into account the trade-off between exploration and exploitation. The ATMathCoreLib library developed by Suda is a set of software that solves this problem. To model the performance of the target software, ATMathCoreLib uses a linear statistical model, and its basis functions must be provided by the user.
In this paper, we investigate how to choose the basis functions appropriately, using the singular value decomposition of a square matrix as an example. We consider three cases, namely, (I) when the performance characteristics of the target problem are well understood by the user, (II) when the tuning parameter has a complicated structure, as occurs in the case of simultaneous selection of an algorithm and its parameter, and (III) when the performance characteristics of the target problem are not known to the user. The results of using ATMathCoreLib with different basis functions for each case are given. They help one understand the tuning by ATMathCoreLib and contribute to the progress of ATMathCoreLib.
International conference proceedings, English - Roundoff error analysis of the CholeskyQR2 algorithm in an oblique inner product
Y. Yamamoto; Y. Nakatsukasa; Y. Yanagisawa; T. Fukaya
JSIAM Letters, The Japan Society for Industrial and Applied Mathematics, 8, 5-8, 2016, Peer-reviwed, The Cholesky QR algorithm is an ideal QR decomposition algorithm for high performance computing, but known to be unstable. We present error analysis of the Cholesky QR algorithm in an oblique inner product defined by a positive definite matrix, and show that by repeating the algorithm twice (called CholeskyQR2), its stability is greatly improved.
Scientific journal, English - Performance of the Parallel One-Sided Block Jacobi SVD Algorithm on a Modern Distributed-Memory Parallel Computer
Shuhei Kudo; Yusaku Yamamoto; Martin Becka; Marian Vajtersic
PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I, SPRINGER INT PUBLISHING AG, 9573, 594-604, 2016, Peer-reviwed, The one-sided block Jacobi (OSBJ) method is known to be an efficient algorithm for computing the singular value decomposition. In this paper, we evaluate the performance of the most recent variant of the OSBJ method, the one with dynamic ordering and variable blocking, on the Fujitsu FX10 parallel computer. By analyzing the performance results, we identified two bottlenecks, namely, weight computation for ordering and diagonalization of 2x2 block matrices. To resolve the problem, we propose new implementations for these two tasks. Experimental results show that they are effective and can achieve speedup of up to 1.6 times in total. As a result, our OSBJ solver can compute the SVD of matrices of order 2048 to 8192 on 12 to 48 nodes of FX10 more than three times faster than ScaLAPACK PDGESVD.
International conference proceedings, English - On constructing cost models for online automatic tuning using ATMathCoreLib
Seiji Nagashima; Takeshi Fukaya; Yusaku Yamamoto
Proceedings of IEEE MCSoC 2016, IEEE Computer Society Press, 1, 1, 1-8, 2016, Peer-reviwed
International conference proceedings, English - CONSERVED QUANTITIES OF THE INTEGRABLE DISCRETE HUNGRY SYSTEMS
Sonomi Kakizaki; Akiko Fukuda; Yusaku Yamamoto; Masashi Iwasaki; Emiko Ishiwata; Yoshimasa Nakamura
DISCRETE AND CONTINUOUS DYNAMICAL SYSTEMS-SERIES S, AMER INST MATHEMATICAL SCIENCES, 8, 5, 889-899, Oct. 2015, Peer-reviwed, In this paper, conserved quantities of the discrete hungry Lotka-Volterra (dhLV) system are derived. Our approach is based on the Lax representation of the dhLV system, which expresses the time evolution of the dhLV system as a similarity transformation on a certain square matrix. Thus, coefficients of the characteristic polynomial of this matrix constitute conserved quantities of the dhLV system. These coefficients are calculated explicitly through a recurrence relation among the characteristic polynomials of its leading principal submatrices. The conserved quantities of the discrete hungry Toda (dhToda) equation is also derived with the help of the Backlund transformation between the dhLV system and the dhToda equation.
Scientific journal, English - A new subtraction-free formula for lower bounds of the minimal singular value of an upper bidiagonal matrix
Takumi Yamashita; Kinji Kimura; Yusaku Yamamoto
NUMERICAL ALGORITHMS, SPRINGER, 69, 4, 893-912, Aug. 2015, Peer-reviwed, Traces of inverse powers of a positive definite symmetric tridiagonal matrix give lower bounds of the minimal singular value of an upper bidiagonal matrix. In a preceding work, a formula for the traces which gives the diagonal entries of the inverse powers is presented. In this paper, we present another formula which gives the traces based on a quite different idea from the one in the preceding work. An efficient implementation of the formula for practice is also presented.
Scientific journal, English - 離散戸田方程式のある拡張に基づくTotally Nonnegative行列の固有値計算アルゴリズム
隅蔵亮; 福田亜希子; 石渡恵美子; 山本有作; 岩崎雅史; 中村佳正
九州大学応用力学研究所 研究集会報告 「非線形波動研究の現状-課題と展望を探る-」, 26AO-S2, 115-120, Mar. 2015, Peer-reviwed
Japanese - Eigenvalue computation of totally nonnegative upper Hessenberg matrices based on a variant of the discrete hungry Toda equation
Ryo Sumikura; Akiko Fukuda; Emiko Ishiwata; Yusaku Yamamoto; Masashi Iwasaki; Yoshimasa Nakamura
AIP Conference Proceedings 1648, 1648, 2015, Peer-reviwed
International conference proceedings, English - An asymptotic analysis for an integrable variant of the Lotka-Volterra prey-predator model via a determinant expansion technique
Masato Shinjo; Masashi Iwasaki; Akiko Fukuda; Emiko Ishiwata; Yusaku Yamamoto; Yoshimasa Nakamura
COGENT MATHEMATICS, TAYLOR & FRANCIS AS, 2, 1, 1-14, 2015, Peer-reviwed, The Hankel determinant appears in representations of solutions to several integrable systems. An asymptotic expansion of the Hankel determinant thus plays a key role in the investigation of asymptotic analysis of such integrable systems. This paper presents an asymptotic expansion formula of a certain Casorati determinant as an extension of the Hankel case. This Casorati determinant is then shown to be associated with the solution to the discrete hungry Lotka-Volterra (dhLV) system, which is an integrable variant of the famous prey-predator model in mathematical biology. Finally, the asymptotic behavior of the dhLV system is clarified using the expansion formula for the Casorati determinant.
Scientific journal, English - ROUNDOFF ERROR ANALYSIS OF THE CHOLESKYQR2 ALGORITHM
Yusaku Yamamoto; Yuji Nakatsukasa; Yuka Yanagisawa; Takeshi Fukaya
ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, KENT STATE UNIVERSITY, 44, 306-326, 2015, Peer-reviwed, We consider the QR decomposition of an m x n matrix X with full column rank, where m >= n. Among the many algorithms available, the Cholesky QR algorithm is ideal from the viewpoint of high performance computing since it consists entirely of standard level 3 BLAS operations with large matrix sizes, and requires only one reduce and broadcast in parallel environments. Unfortunately, it is well-known that the algorithm is not numerically stable and the deviation from orthogonality of the computed Q factor is of order O((kappa(2)(X))(2) u), where kappa(2)(X) is the 2-norm condition number of X and u is the unit roundoff. In this paper, we show that if the condition number of X is not too large, we can greatly improve the stability by iterating the Cholesky QR algorithm twice. More specifically, if kappa(2)(X) is at most O(u(-1/2)), both the residual and deviation from orthogonality are shown to be of order 0(u). Numerical results support our theoretical analysis.
Scientific journal, English - Implementation details of an extended oqds algorithm for singular values
S. Araki; K. Kimura; Y. Yamamoto; Y. Nakamura
JSIAM Letters, The Japan Society for Industrial and Applied Mathematics, 7, 9-12, 2015, Peer-reviwed, We introduce an extended oqds algorithm for singular values of lower tridiagonal matrix which is a condensed form of inputted full matrix. Reduction to the lower tridiagonal matrix is able to be performed using cache-efficient block Householder method based on BLAS 2.5 routines. In this letter, we describe the implementation details of the latter algorithm such as the shift strategy and criteria for deflation and splitting. The effectiveness of our approach is demonstrated by numerical experiments.
Scientific journal, English - 通信削減アルゴリズムCAQRのRSDFTの直交化処理への適用と評価
片桐孝洋; 高山恒一; 米村崇; 熊洞宏樹; 猪貝光祥; 北上純一; 江口義之; 深谷猛; 山本有作; 岩田潤一; 内田和之; 大島聡史; 中島研吾
情報処理学会研究報告(Web), 一般社団法人情報処理学会, 2014, HPC-144, VOL.2014-HPC-144,NO.3 (WEB ONLY)-6, 19 May 2014, 本報告では,量子力学的第一原理シミュレーションのソフトウェア RSDFT における直交化処理に,通信削減アルゴリズムを用いた QR 分解アルゴリズムである CAQR を組み込んだ性能について報告する.東京大学情報基盤センターの FX10 を用いた 1,024 ノード実行 (4,096MPI,MPI 当たり 4OMP 実行のハイブリッド MPI-OpenMP 実行) におけるバンド分割が 64 の時の実行では,従来の Gram-Schmidt 法による直交化に比べ CAQR を利用すると,最大で 11 倍の高速化が得られる事例があった.
Japanese - 箱に番号が付いた新しい箱玉系について
柿崎苑美; 福田亜希子; 石渡恵美子; 山本有作; 岩崎雅史; 中村佳正
九州大学応用力学研究所 研究集会報告 「非線形波動研究の拡がり」, 25AO-S2, 121-126, Mar. 2014, Peer-reviwed
Japanese - Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection.
Takeshi Fukaya; Toshiyuki Imamura; Yusaku Yamamoto
High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30 - July 3, 2014, Revised Selected Papers, Springer, 269-283, 2014, Peer-reviwed - Convergence analysis of the parallel classical block Jacobi method for the symmetric eigenvalue problem
Y. Yamamoto; L. Zhang; S. Kudo
JSIAM Letters, 6, 57-60, 2014, Peer-reviwed
Scientific journal, English - Performance analysis of the Householder-type parallel tall-skinny QR factorizations toward automatic algorithm selection
T. Fukaya; T. Imamura; Y. Yamamoto
Proceedings of VECPAR 2014, 1, 1, 1-1, 2014, Peer-reviwed
International conference proceedings, English - CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System
Takeshi Fukaya; Yuji Nakatsukasa; Yuka Yanagisawa; Yusaku Yamamoto
2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, 1, 1, 31-38, 2014, Peer-reviwed, Designing communication-avoiding algorithms crucial for high performance computing on a large-scale parallel system. The TSQR algorithm is a communication-avoiding algorithm for computing a tall-skinny QR factorization, and TSQR is known to he much faster and as stable as the classical Householder QR algorithm The Cholesky QR algorithm is another very simple and fast communication-avoiding algorithm, but rarely used in practice because of its numerical instability. Our recent work points out that an algorithm that simply repeats Cholesky QR twice, which we call CholeskyQR2, gives excellent accuracy for a wide range of matrices arising in practice. Although the communication cost of CholeskyQR2 is twice that of TSQR, it has an advantage that its reduction operation is addition whereas that of TSQR is a QR factorization, whose high-performance implementation is more difficult. Thus, CholeskvQR2 can potentially be significantly faster than TSQR. Indeed, in our experiments using 16384 nodes of the K computer, CholeskyQR2 ran about three times faster than TSQR for a 4194304 x 64 matrix,
International conference proceedings, English - Integrable discrete hungry systems and their related matrix eigenvalues
Akiko Fukuda; Emiko Ishiwata; Yusaku Yamamoto; Masashi Iwasaki; Yoshimasa Nakamura
ANNALI DI MATEMATICA PURA ED APPLICATA, SPRINGER HEIDELBERG, 192, 3, 423-445, Jun. 2013, Peer-reviwed, Recently, some of the authors designed an algorithm, named the dhLV algorithm, for computing complex eigenvalues of a certain class of band matrix. The recursion formula of the dhLV algorithm is based on the discrete hungry Lotka-Volterra (dhLV) system, which is an integrable system. One of the authors has proposed an algorithm, named the multiple dqd algorithm, for computing eigenvalues of a totally nonnegative (TN) band matrix. In this paper, by introducing a theorem on matrix eigenvalues, we first show that the eigenvalues of a TN matrix are also computable by the dhLV algorithm. We next clarify the asymptotic behavior of the discrete hungry Toda (dhToda) equation, which is also an integrable system, and show that a similarity transformation for a TN matrix is given through the dhToda equation. Then, by combining these properties of the dhToda equation, we design a new algorithm, named the dhToda algorithm, for computing eigenvalues of a TN matrix. We also describe the close relationship among the above three algorithms and give numerical examples.
Scientific journal, English - Discrete Integrable Systems of Hungry Type and Numerical Algorithms for Eigenvalues of Nonsymmetric Matrices : Recent Developments in Integrable Algorithms(Survey,
Activity Group "Applied Integrable Systems")
Fukuda Akiko; Iwasaki Masashi; Yamamoto Yusaku; Ishiwata Emiko; Nakamura Yoshimasa
Transactions of the Japan Society for Industrial and Applied Mathematics, The Japan Society for Industrial and Applied Mathematics, 23, 1, 109-181, 2013, Peer-reviwed, Some numerical algorithms for computing eigenvalues of nonsymmetric matrix with high accuracy have been recently designed based on the discrete hungry Toda equation and the discrete Lotka-Volterra system which are known as the discrete integrable systems of hungry type. In this paper, not only the process for formulating these algorithms but also the results concerning asymptotic analysis through the center manifold theory, mixed error analysis in floating point arithmetic and shift of origin for accelerating convergence are shortly explained. Backlund transformations between discrete integrable systems of hungry type are also shown.
Scientific journal, Japanese - On a shifted LR transformation derived from the discrete hungry Toda equation
Akiko Fukuda; Yusaku Yamamoto; Masashi Iwasaki; Emiko Ishiwata; Yoshimasa Nakamura
Monatshefte fur Mathematik, 170, 1, 11-26, 2013, Peer-reviwed, The discrete hungry Toda (dhToda) equation is known as an integrable system which is derived from the study of the numbered box and ball system. In the authors' paper (Fukuda et al., in Phys Lett A 375, 303-308, 2010), we associate the dhToda equation with a sequence of LR transformations for a totally nonnegative (TN) matrix, and then, in another paper (Fukuda et al. in Annal Math Pura Appl, 2011), based on the dhToda equation, we design an algorithm for computing the eigenvalues of the TN matrix. In this paper, in order to accelerate the convergence speed, we first introduce the shift of origin into the LR transformations associated with the dhToda equation, then derive a recursion formula for generating the shifted LR transformations. We next present a shift strategy for avoiding the breakdown of the shifted LR transformations. We finally clarify the asymptotic convergence and show that the sequence of TN matrices generated by the shifted LR transformations converges to a lower triangular matrix, exposing the eigenvalues of the original TN matrix. The asymptotic convergence is also verified through some numerical examples. © 2012 Springer-Verlag.
Scientific journal, English - Performance Optimization for the Blocked Householder QR Decomposition Using the Dynamic Programming
深谷 猛; 山本 有作; 張 紹良
情報処理学会論文誌 論文誌トランザクション, 情報処理学会, 2011, 2, 146-157, Apr. 2012, Peer-reviwed
Japanese - シフト付きLR変換に付随する離散ハングリー系について
濱洋輔; 福田亜希子; 山本有作; 岩崎雅史; 石渡恵美子; 中村佳正
九州大学応用力学研究所 研究集会報告 「非線形波動研究の進展 -現象と数理の相互作用-」, 23AO-S7, 159-164, Mar. 2012, Peer-reviwed
Japanese - Error analysis for matrix eigenvalue algorithm based on the discrete hungry Toda equation
A. Fukuda; Y. Yamamoto; M. Iwasaki; E. Ishiwata; Y. Nakamura
Numerical Algorithms, 61, 2, 243-260, 2012, Peer-reviwed
English - On some properties of a discrete hungry Lotka-Volterra system of multiplicative type
Yosuke Hama; Akiko Fukuda; Yusaku Yamamoto; Masashi Iwasaki; Emiko Ishiwata; Yoshimasa Nakamura
Journal of Math-for-Industry, 4, 5-15, 2012, Peer-reviwed
English - Optimization of the Multishift QR Algorithm with Coprocessors for Non-Hermitian Eigenvalue Problems
Takafumi Miyata; Yusaku Yamamoto; Takashi Uneyama; Yoshimasa Nakamura; Shao-Liang Zhang
EAST ASIAN JOURNAL ON APPLIED MATHEMATICS, GLOBAL SCIENCE PRESS, 1, 2, 187-196, May 2011, Peer-reviwed, The multishift QR algorithm is efficient for computing all the eigenvalues of a dense, large-scale, non-Hermitian matrix. The major part of this algorithm can be performed by matrix-matrix multiplications and is therefore suitable for modern processors with hierarchical memory. A variant of this algorithm was recently proposed which can execute more computational parts by matrix-matrix multiplications. The algorithm is especially appropriate for recent coprocessors which contain many processor-elements such as the CSX600. However, the performance of the algorithm highly depends on the setting of parameters such as the numbers of shifts and divisions in the algorithm. Optimal settings are different depending on the matrix size and computational environments. In this paper, we construct a performance model to predict a setting of parameters which minimizes the execution time of the algorithm. Experimental results with the CSX600 coprocessor show that our model can be used to find the optimal setting.
Scientific journal, English - A block IDR(s) method for nonsymmetric linear systems with multiple right-hand sides
L. Du; T. Sogabe; B. Yu; Y. Yamamoto; S. -L. Zhang
JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, ELSEVIER SCIENCE BV, 235, 14, 4095-4106, May 2011, The IDR(s) based on the induced dimension reduction (IDR) theorem, is a new class of efficient algorithms for large nonsymmetric linear systems. IDR(1) is mathematically equivalent to BiCGStab at the even IDR(1) residuals, and IDR(s) with s > 1 is competitive with most Bi-CG based methods. For these reasons, we extend the IDR(s) to solve large nonsymmetric linear systems with multiple right-hand sides. In this paper, a variant of the IDR theorem is given at first, then the block IDR(s), an extension of IDR(s) based on the variant IDR(s) theorem, is proposed. By analysis, the upper bound on the number of matrix-vector products of block IDR(s) is the same as that of the IDR(s) for a single right-hand side in generic case, i.e., the total number of matrix-vector products of IDR(s) may be m times that of of block IDR(s), where in is the number of right-hand sides. Numerical experiments are presented to show the effectiveness of our proposed method. (C) 2011 Elsevier B.V. All rights reserved.
Scientific journal, English - A Bäcklund transformation between two integrable discrete hungry systems
A. Fukuda; Y. Yamamoto; M. Iwasaki; E. Ishiwata; Y. Nakamura
Physics Letters, Section A: General, Atomic and Solid State Physics, 375, 3, 303-308, 17 Jan. 2011, Peer-reviwed
English - Four-point correlation function of a passive scalar field in rapidly fluctuating turbulence: Numerical analysis of an exact closure equation
Y. Mizuno; K. Ohi; T. Sogabe; Y. Yamamoto; Y. Kaneda
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 82, 20 Sep. 2010, A numerical analysis is made on the four-point correlation function in a similarity range of a model of two-dimensional passive scalar field ψ advected by a turbulent velocity field with infinitely small correlation time. The model yields an exact closure equation for the four-point correlation Ψ4 of ψ, which may be casted into the form of an eigenvalue problem in the similarity range. The analysis of the eigenvalue problem gives not only the scale dependence of Ψ4, but also the dependence on the configuration of the four points. The numerical analysis gives S4 (R) R ζ4 in the similarity range in which S2 (R) R ζ2, where SN is the structure function defined by SN (R) [ψ (x+R) -ψ x)] N and ζ4 ≠ 2 ζ2. The estimate of ζ4 by the numerical analysis of the eigenvalue problem is in good agreement with numerical simulations so far reported. The agreement supports the idea of universality of the exponent ζ4 in the sense that ζ4 is insensitive to conditions of ψ outside the similarity range. The numerical analysis also shows that the correlation C (R,r) [ψ (x+R)-ψ (x)] 2 [ψ (x+r) -ψ (x) ] 2 is stronger than that given by the joint-normal approximation, and scales like C (R,r) (r/R) χ for r/R1 with R and r in the similarity range, where χ is a constant depending on the angle between R and r. © 2010 The American Physical Society. - 密正方行列特異値分解における並列I-SVD法の特性を用いた後処理の高速化
豊川博己; 山本有作; 木村欣司; 高田雅美; 中村佳正
情報処理学会論文誌. コンピューティングシステム, 3, 2, 30-38, Jun. 2010, Peer-reviwed
Scientific journal, Japanese - Acceleration of Hessenberg reduction for nonsymmetric eigenvalue problems using GPU
Jun-Ichi Muramatsu; Shao-Liang Zhang; Yusaku Yamamoto
Proceedings - 2010 1st International Conference on Networking and Computing, ICNC 2010, 215-219, 2010, Solution of large-scale dense nonsymmetric eigen-value problem is required in many areas of scientific and engineering computing, such as vibration analysis of automobiles and analysis of electronic diffraction patterns. In this study, we focus on the Hessenberg reduction step and consider accelerating it using GPU. Our main strategy is to use the CUBLAS, an optimized BLAS library for GPU. However, since Hessenberg reduction requires operations not supported by CUBLAS, we combine CPU and GPU to perform the computation. We propose two approaches for combining CPU and GPU: the one that performs as much work as possible on GPU and the one that aggressively assigns computation of small-size matrices to CPU. Experimental results show that the latter approach is considerably faster than the former. Compared with the computation on the Core i7 processor with 4 cores, the latter approach with the Tesla C1060 GPU and the Core 17 processor achieves 2.8 times speedup when computing the Hessenberg form of a 4,800 x 4,800 real matrix. © 2010 IEEE.
International conference proceedings, English - Dynamic programming approaches to optimizing the blocking strategy for basic matrix decompositions
Yusaku Yamamoto; Takeshi Fukaya
Software Automatic Tuning: From Concepts to State-of-the-Art Results, Springer New York, 69-85, 2010, Peer-reviwed, In this chapter, we survey several approaches to optimizing the blocking strategy for basic matrix decompositions, such as LU, Cholesky, and QR. Conventional blocking strategies such as fixed-size blocking and recursive blocking are widely used to optimize the performance of these decompositions. However, these strategies have only a small number of parameters such as the block size or the level of recursion and are not sufficiently flexible to exploit the performance of modern high-performance architectures. As such, several attempts have been made to define a much larger class of strategies and to choose the best strategy among them according to the target machine and the matrix size. The number of candidate strategies is usually exponential in the size of the matrix. However, with the use of dynamic programming, the cost of optimization can be reduced to a realistic level. As representatives of such approaches, we survey variable-size blocking, generalized recursive blocking, and the combination of variable-size blocking and the TSQR algorithm. Directions for future research are also discussed. © 2010 Springer Science+Business Media LLC.
In book, English - Differential qd algorithm for totally nonnegative Hessenberg matrices: introduction of origin shifts and relationship with the discrete hungry Lotka-Volterra system
YAMAMOTO YUSAKU; FUKAYA TAKESHI
JSIAM Lett (Web), 2, 69-72 (J-STAGE), 2010, Peer-reviwed
English - An Extension of the Sakurai‐Sugiura Method for Eigenvalue Problems of Multiply Connected Region
宮田考史; DU Lei; 曽我部知広; 山本有作; ZHANG Shao‐Liang
日本応用数理学会論文誌, 19, 4, 537-550, 25 Dec. 2009
Japanese - Acceleration of the Singular Value Decomposition Algorithm for Square Matrices Using CUDA
深谷猛; 山本有作; 畝山多加志; 中村佳正
情報処理学会論文誌トランザクション(CD-ROM), 2009, 1, KONPYUTINGUSHISUTEMU,VOL.2,NO.2,98-109, Nov. 2009, Peer-reviwed
Japanese - 正方行列向け特異値分解のCUDAによる高速化
深谷猛; 山本有作; 畝山多加志; 中村佳正
情報処理学会論文誌. コンピューティングシステム, 2, 2, 98-109, Jul. 2009, Peer-reviwed
Scientific journal, Japanese - An efficient bidiagonalization algorithm for combined CPU-accelerator environments
Yusaku Yamamoto; Takeshi Fukaya; Takashi Uneyama; Yoshimasa Nakamura
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2009, 121-126, 2009, In computing the singular values of a square matrix, transformation of the input matrix to bidiagonal form accounts for most of the computation time. In this paper, we consider speeding up this process using a combination of CPU and floating-point accelerator. As an algorithm for bidiagonalization, we can use the conventional Householder's method or Bischof's two-phase algorithm, which can use the level-3 BLAS efficiently. We can also choose to store the whole matrix in the CPU memory or in the on-board memory of the accelerator. So there are four possible strategies. We investigate the advantages and disadvantages of each strategy and construct an analytical performance model for each of them. Using the models, we predict the performance of bidiagonalzation on the CSX600 accelerator and show that it is the best to achieve high performance to use Bischof's algorithm with the matrix stored in the on-board memory. This conclusion should hold for many other accelerators with similar performance characteristics.
International conference proceedings, English - Differential qd algorithm for totally nonnegative band matrices: convergence properties and error analysis
YAMAMOTO YUSAKU; FUKAYA TAKESHI
JSIAM Lett (Web), 1, 56-59 (J-STAGE), 2009, Peer-reviwed
English - Asymptotic analysis of a new multishift QR method for symmetric tridiagonal eigenproblems
Miyata T; Iwasaki M; Yamamoto Y; Zhang S-L
Trans. JSIAM, 18, 4, 563-577, Dec. 2008, Peer-reviwed
Scientific journal, Japanese - A dynamic programming approach to optimizing the blocking strategy for the Householder QR decomposition.
Takeshi Fukaya; Yusaku Yamamoto; Shao-Liang Zhang
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September - 1 October 2008, Tsukuba, Japan, IEEE Computer Society, 402-410, 2008, Peer-reviwed - Acceleration of the Singular Value Decomposition Algorithm for Rectangular Matrices with a Floating-point Coprocessor
深谷猛; 山本有作; 畝山多加志; 堀玄, 梅野健
情報処理学会論文誌, 48, SIG8(ACS18), 31-43, May 2007, Peer-reviwed
Japanese - Accelerating the complex Hessenberg QR algorithm with the csx600 floating-point coprocessor
Yusaku Yamamoto; Takafumi Miyata; Yoshimasa Nakamura
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems, 204-211, 2007, In this paper, we show how to speed up the Hessenberg QR algorithm for computing the eigenvalues of a dense complex nonsymmetric matrix using the CSX600 floating-point coprocessor. Our approach is based on the small-bulge multishift QR algorithm proposed by Braman et al. This algorithm can perform a major part of the computation in the form of matrix multiplication and is therefore very suited to high performance architectures. However, to exploit the potential of the CSX600, the number of shifts must be increased to more than two hundred. In that case, the parts other than matrix multiplication begin to occupy a considerable percentage of the execution time, thereby limiting the speedup. To solve this problem, we focus on two most time-consuming parts in the algorithm except for matrix multiplication: accumulation of bulge-chasing transformations and solution of a decoupled small eigenproblem. We reconstruct the former as a recursive algorithm and apply a blocking technique to the latter. This enables these parts to be computed also using matrix multiplication. As a result of these optimizations, we obtained up to 3.8 times speedup with the CSX600 processor over a 3.2GHz Xeon processor when computing the eigenvalues of a 12,000 × 12,000 complex Hessenberg matrix.
International conference proceedings, English
MISC
- 運搬車容量付きの箱に番号が付いた箱玉系とシフト付きLR変換との対応
西脇友哉; 福田亜希子; 山本有作; 岩崎雅史
2022, 日本応用数理学会年会講演予稿集(CD-ROM), 2022, 1345-3378, 202302270405635318 - 巨大行列の固有値計算
山本有作
日本評論社, 01 Feb. 2020, 数学セミナー, 59, 2, 12-17, Japanese, Peer-reviwed, Invited, Introduction commerce magazine - A parallelizable energy-preserving integrator MB4 and its application to quantum-mechanical wavepacket dynamics
Tsubasa Sakai; Shuhei Kudo; Hiroto Imachi; Yuto Miyatake; Takeo Hoshi; Yusaku Yamamoto
submitted, 2020, Preprint: https://arxiv.org/abs/2003.04345/ - 緩和型スーパーノードマルチフロンタル法の最適な緩和パラメータについて
NAKANO Tomoki; YOKOKAWA MITSUO; FUKAYA Takeshi; YAMAMOTO Yusaku
数値シミュレーションにおける多くの問題は,偏微分方程式を離散化して得られる連立一次方程式を解く問題に帰着される.そして,多くの場合,連立一次方程式を解く時間は全体のシミュレーション時間の大部分を占める.よって,連立一次方程式を高速に解くことは非常に重要である.本研究では,正定値対称行列に適用できるコレスキー分解を扱う.疎行列に対して,コレスキー分解を行う手法はいくつかあるが,本稿では,緩和型スーパーノードマルチフロンタル法を用いた.同手法では,2 つのスーパーノードを融合する際に非零と見なす零要素数の上限である緩和パラメータが性能に大きな影響を与える。そこで,このパラメータの最適値を求めることを目的として,Intel Xeon (Ivy Bridge-EX) とIntel Xeon Phi(Knights Landing, KNL) のそれぞれ1 コ, Information Processing Society of Japan, Dec. 2018, 情報処理学会第167回ハイパフォーマンスコンピューティング研究会, 2018-HPC-167, 28, 1-8, Japanese - 1次元分散型のCAQRアルゴリズムの性能評価とパネルサイズの自動チューニングに向けた検討
深谷猛; 深谷猛; 深谷猛; 山本有作; 山本有作; 今村俊幸; 今村俊幸
08 Jun. 2015, 計算工学講演会論文集(CD-ROM), 20, ROMBUNNO.E-1-3, Japanese, 1342-145X, 201502216577000797 - シフト付きコレスキーQR分解アルゴリズムの提案
柳澤優香; 深谷猛; 中務佑治; RAMASESHAN Kannan; 山本有作; 大石進一
27 Aug. 2014, 日本応用数理学会年会講演予稿集(CD-ROM), 2014, ROMBUNNO.9GATSU4NICHI,09:30,F,2, Japanese, 1345-3378, 201402216291168986 - 大規模並列計算機上での再直交化付きコレスキーQR分解の性能評価
深谷猛; 中務佑治; 柳澤優香; 山本有作
27 Aug. 2014, 日本応用数理学会年会講演予稿集(CD-ROM), 2014, ROMBUNNO.9GATSU3NICHI,11:00,E,1, Japanese, 1345-3378, 201402294746261271 - 固有値計算のためのハングリー型可積分アルゴリズムについて
福田亜希子; 山本有作; 岩崎雅史; 石渡恵美子; 中村佳正
27 Aug. 2014, 日本応用数理学会年会講演予稿集(CD-ROM), 2014, ROMBUNNO.9GATSU4NICHI,09:30,C,2, Japanese, 1345-3378, 201402264849885641 - 箱に番号が付いた新しい箱玉系について
柿崎苑美; 福田亜希子; 石渡恵美子; 山本有作; 岩崎雅史; 中村佳正
2014, 九州大学応用力学研究所研究集会報告, 25AO-S2, 65-75, Japanese, Peer-reviwed, Summary national conference - 超並列環境における密行列計算プログラムの性能モデリングに向けた検討
深谷猛; 今村俊幸; 山本有作
24 Jul. 2013, 情報処理学会研究報告(Web), 2013, HPC-140, WEB ONLY VOL.2013-HPC-140,NO.41, Japanese, 201502261843444684 - 超並列環境における縦長行列のQR分解に対する自動チューニングの検討
深谷猛; 山本有作
19 Jun. 2013, 計算工学講演会論文集(CD-ROM), 18, ROMBUNNO.D-13-4, Japanese, 1342-145X, 201302237393474291 - 京における密行列固有値ソルバEigen-Kの性能評価と性能モデリング
深谷猛; 今村俊幸; 山本有作
15 May 2013, 先進的計算基盤システムシンポジウム論文集, 2013, 132-133, Japanese, 170000076930 - オンライン自動チューニング数理基盤ライブラリATMathCoreLibの特異値分解問題への適用
長島聖児; 深谷猛; 山本有作
2013, 日本応用数理学会年会講演予稿集(CD-ROM), 2013, ROMBUNNO.9076, Japanese, 1345-3378, 201402207997990957 - ブロックヤコビ法に基づく固有値解法の超並列計算機上での実装
工藤周平; 高橋佑輔; 深谷猛; 山本有作
2013, 日本応用数理学会年会講演予稿集(CD-ROM), 2013, ROMBUNNO.9142, Japanese, 1345-3378, 201402271508181510 - 京コンピュータにおける対称密行列向け固有値計算プログラムの性能評価と性能予測
深谷猛; 今村俊幸; 山本有作
2013, 日本応用数理学会年会講演予稿集(CD-ROM), 2013, ROMBUNNO.9104, Japanese, 1345-3378, 201402299178620107 - ハウスホルダー変換に基づく直交化法の最近の進展 : 並列計算・高性能計算の観点から (科学技術計算における理論と応用の新展開)
山本 有作
京都大学, Apr. 2012, 数理解析研究所講究録, 1791, 57-65, Japanese, 1880-2818, 110009435842, AN00061013 - 離散ハングリー戸田方程式に基づく Totally Nonnegative 行列に対する固有値計算
福田亜希子; 山本有作; 岩崎雅史; 石渡恵美子; 中村佳正
Apr. 2012, 京都大学数理解析研究所講究録 「科学技術計算における理論と応用の新展開」, 1791, 1-10, Japanese - SMP上での並列QR分解に対する自動チューニングの検討
深谷猛; 山本有作; 張紹良
2012, 日本応用数理学会年会講演予稿集(CD-ROM), 2012, 283-284, Japanese, 1345-3378, 201302298604265740 - Performance Optimization for the Blocked Householder QR Decomposition Using the Dynamic Programming
深谷 猛; 山本 有作; 張 紹良
密行列計算においては,高性能化のためにアルゴリズムのブロック化が必須である.その際に,ブロック化の方法次第で性能が大きく変化するため,その最適化が重要な課題となっている.しかしながら,ブロック化の自由度が大きいため,従来は限定された範囲内で最適化を行うことがほとんどである.本論文では,QR 分解アルゴリズムを対象として,二分木を使うことで従来より格段に広いクラスのブロック化の方法を系統的に扱い,その中から動的計画法により最適なブロック化の方法を決定する手法を提案する.数値実験の結果,提案手法がブロック分割法に対する自動チューニング手法として有望であることが示された.Blocking techniques are widely used in high performance matrix computations. When using them, it is important to optimize a blocking way, which influences the performance of computations. However, because of the high degree of freedom in blocking techniques, such optimization is generally done in a limited class of blocking ways. In this paper, we propose a framework to determine the efficient blocking way for the algorithm of QR decomposition. In our framework, various kinds of blocking ways are represented systematically with binary trees and an optimal one is determined by dynamic programming. Results of numerical experiments show that our framework has good possibilities in the view of the automatic performance tuning., 情報処理学会, 05 Oct. 2011, 情報処理学会論文誌コンピューティングシステム(ACS), 4, 4, 146-157, Japanese, 1882-7829, 40019259189, AA11833852 - Automatic Tuning for the Algorithm of QR Decomposition-an investigation into performance models-
深谷猛; 山本有作; ZHANG Shao‐Liang
15 Aug. 2011, 情報処理学会研究報告(CD-ROM), 2011, 2, ROMBUNNO.HPC-130,NO.42, Japanese, 2186-2583, 201102215660199701 - LR flow から導かれる離散ハングリー系のベックルント変換
福田亜希子; 濱洋輔; 山本有作; 岩崎雅史; 石渡恵美子; 中村佳正
Mar. 2011, 九州大学応用力学研究所 研究集会報告 「非線形波動研究の新たな展開 ― 現象とモデル化-」, 22AO-S8, 182-187, Japanese - 動的計画法に基づく密行列計算アルゴリズムの再帰的ブロック化
深谷猛; 山本有作; 張紹良
11 Jan. 2011, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, 2011, 65-65, Japanese, 170000063932 - シフトを考慮したLR変換に関連する離散ハングリー系について
濱洋輔; 福田亜希子; 石渡恵美子; 岩崎雅史; 山本有作; 中村佳正
2011, 日本応用数理学会年会講演予稿集, 2011, 69-70, Japanese, 1345-3378, 201102251195821308 - Perforamnce Optimization and Evaluation of the Blocking Strategy for the Dense Matrix COmputations
深谷猛; 山本有作; ZHANG Shao‐Liang
15 Oct. 2010, 情報処理学会研究報告(CD-ROM), 2010, 3, ROMBUNNO.HPC-126,NO.33, Japanese, 2186-2583, 201002258680016739 - Automatic Performance Tuning of Matrix Computations Based on Hierarchical Performance Models
山本有作; 深谷猛
24 Sep. 2010, 応用数理, 20, 3, 201-211, Japanese, 0917-2270, 201002247930381444 - LR変換に関連する離散ハングリー系とそのベックルント変換 2
濱洋輔; 福田亜希子; 石渡恵美子; 岩崎雅史; 山本有作; 中村佳正
Sep. 2010, 日本応用数理学会年会講演予稿集, 2010, 167-168, Japanese, 1345-3378, 201002234075524653 - LR変換に関連する離散ハングリー系とそのベックルント変換 1
福田亜希子; 山本有作; 岩崎雅史; 石渡恵美子; 中村佳正
Sep. 2010, 日本応用数理学会年会講演予稿集, 2010, 165-166, Japanese, 1345-3378, 201002245645084794 - Perforamnce Optimization and Evaluation of the Blocking Strategy for the Dense Matrix COmputations
FUKAYA TAKESHI; YAMAMOTO YUSAKU; ZHANG SHAO-LIANG
高性能な行列計算を行う場合,プログラムの性能チューニングが必要不可欠である.我々は基本的な密行列計算が BLAS ルーチンを使って実行される点に着目し,チューニング済みの BLAS ルーチンを効率的に使えるようにプログラムをチューニングすることを目指す.ブロック化されたアルゴリズムにおいて,効率的に BLAS を使うためには行列のブロック分割法を最適化することが重要となる.本稿では,LU 分解のアルゴリズムをブロック化して,ブロック分割法が性能に与える影響を評価し,さらに適切な分割法を決定するための手法の検討を行う.For high performance matrix computations, it is necessity to tune the software. Since basic dense matrix computations consist almost entirely of the BLAS routines, it is important how to tune programs for exploiting the peak performance of optimized BLAS routines. In blocked algorithm, this means how to optimize the partitioning of the target matrix. In this paper, we evaluate and discuss the blocking strategy for the blocked LU decomposition., 情報処理学会, 27 Jul. 2010, 研究報告ハイパフォーマンスコンピューティング(HPC), 2010, 33, 1-6, Japanese, 0919-6072, 110007995521, AN10463942 - 帯行列の固有値を計算する離散可積分系について
福田亜希子; 石渡恵美子; 山本有作; 岩崎雅史; 中村佳正
Mar. 2010, 九州大学応用力学研究所 研究集会報告 「非線形波動研究と将来 ― 次の10年への展望」, 21ME-S7, 1-6, Japanese - On a variable transformation between two integrable systems: The discrete hungry Toda equation and the discrete hungry Lotka-Volterra system
Y. Yamamoto; A. Fukuda; M. Iwasaki; E. Ishiwata; Y. Nakamura
2010, AIP Conference Proceedings, 1281, 2045-2048, English, Peer-reviwed - A Study on Automatic Tuning for Parallel Computation of the Blocked Housseholder QR Decomposition
深谷猛; 山本有作; ZHANG Shao‐Liang
15 Oct. 2009, 情報処理学会研究報告(CD-ROM), 2009, 3, ROMBUNNO.HPC-121,18, Japanese, 2186-2583, 200902237723187860 - Totally Nonnegative帯行列向けqd法へのシフト導入について
山本有作; 深谷猛
28 Sep. 2009, 日本応用数理学会年会講演予稿集, 2009, 177-178, Japanese, 1345-3378, 200902203889947802 - 離散ハングリー可積分系に基づく固有値計算アルゴリズムの新たな展開
福田亜希子; 石渡恵美子; 山本有作; 岩崎雅史; 中村佳正
28 Sep. 2009, 日本応用数理学会年会講演予稿集, 2009, 235-236, Japanese, 1345-3378, 200902219801599854 - 離散ハングリー可積分系に基づく帯行列の固有値計算アルゴリズムとその誤差評価
福田亜希子; 石渡恵美子; 山本有作; 岩崎雅史; 中村佳正
28 Sep. 2009, 日本応用数理学会年会講演予稿集, 2009, 179-180, Japanese, 1345-3378, 200902242087749156 - A Study on Automatic Tuning for Parallel Computation of the Blocked Housseholder QR Decomposition
深谷 猛; 山本 有作; 張 紹良
行列計算を並列化する場合,行列ベクトル積や行列乗算などの BLAS ルーチンを並列化する方法と,それらのルーチンをコールする階層で並列化する方法が考えられる.また,行列をブロックに分割して計算を行うことが一般的となっている.そのため,ユーザーは並列化方法とブロック分割法の両者のチューニングを行う必要があるが,自由度が非常に大きいため,効果的なチューニングをすることが難しい.そこで,本稿ではハウスホルダー QR 分解を対象として,自動チューニング手法の検討を行う.In matrix computation, we can parallelize an algorithm by two ways: parallelization of BLAS routines such as matrix-vector multiplication, and parallelization in algorithm levels where BLAS routines are called. In addition, blocking techniques are widely used for matrix computations. Therefore we have many choices when tuning our programs for parallel computers. But it is very difficult for general users to tune their programs effectively. In this paper, we discuss an approach to automatic tuning the algorithm of the blocked Householder QR decomposition., 情報処理学会, 28 Jul. 2009, 研究報告ハイパフォーマンスコンピューティング(HPC), 2009, 18, 1-7, Japanese, 1884-0930, 110007995413, AN10463942 - 正方行列向け特異値分解のCUDAによる高速化
深谷猛; 山本有作; 畝山多加志; 中村佳正
15 Jan. 2009, 情報処理学会シンポジウム論文集, 2009, 2, 107-114, Japanese, 1344-0640, 200902293260151524 - Acceleration of the Singular Value Decomposition Algorithm for Rectangular Matrices with a Floating-point Coprocessor
FUKAYA TAKESHI; YAMAMOTO YUSAKU; UNEYAMA TAKASHI; HORI GEN; UMENO KEN
In this paper, we propose an approach for accelerating the singular value decomposition (SVD) of a rectangular matrix with the CSX600 floating point coprocessor. The SVD of rectangular matrices consists of five steps, namely, QR decomposition of the input matrix A, transformation of R into a bidiagonal matrix, SVD of the resulting bidiagonal matrix, back-transformation and back-transformation corresponding to the QR decomposition. In our study, we optimize each step so that most of the computation is done using the level-3 BLAS (DGEMM) and accelerate the execution of DGEMM with the CSX600. We implemented the proposed method using an accelerator board with two CSX600 chips and obtained up to 2.3 times speedup over 3.2 GHz Xeon processor when computing the SVD of a 40000×2000 rectangular matrix. Technical issues for further improving the performance are also discussed., Information Processing Society of Japan (IPSJ), 15 May 2007, 情報処理学会論文誌コンピューティングシステム(ACS), 48, 8, 31-43, Japanese, 1882-7829, 110006274060, AA11833852 - Acceleration of the Singular Value Decomposition Algorithm for Rectangular Matrices with a Floating-Point Coprocessor
深谷猛; 山本有作; 畝山多加志; 堀玄, 梅野健
17 Jan. 2007, 情報処理学会シンポジウム論文集, 2007, 1, 111-118, Japanese, 1344-0640, 200902222433133700 - Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the integrable SVD
Y. Yamamoto; T. Fukaya; T. Uneyama; M. Takata; K. Kimura; M. Iwasaki; Y. Nakamura
2007, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4671 LNCS, 340-345, English, Peer-reviwed, 200902269071311083
Books and other publications
- Advanced Mathematical Science for Mobility Society
Masashi Iwasaki, Masato Shinjo, Yusaku Yamamoto, Akiko Fukuda, Sennosuke Watanabe, Masaki Sekiguchi
Contributor, Chapter: Integrable Systems Related to Matrix LR Transformations, Springer Singapore, Nov. 2023, 9789819997718 - 20世紀のトップ10アルゴリズム
小柳義夫; 土谷隆; 曽我部知広; 杉原正顯; 西山博泰; 山本有作; 平田富夫; 大浦拓哉; 張紹良; 須田礼仁; 張紹良; 山本有作
Scholarly book, Japanese, Contributor, 第6章「QR法」、第9章「整数関係探知」(共訳)、第9章補遺「PSLQアルゴリズムの動作原理」, 共立出版, 01 Jan. 2022 - Vector Calculus
山本有作; 石原卓
Textbook, Japanese, Joint work, 裳華房, 18 Nov. 2020 - 計算科学のための基本数理アルゴリズム
山本有作; 曽我部知広
Scholarly book, Japanese, Joint work, 共立出版, 01 Jun. 2019, 9784320122666 - The Art of High Performance Computing for Computational Science, Vol. 1
Takahiro Katagiri; Maho Nakata; Yusaku Yamamoto; Daisuke Takahashi; Hiroshi Watanabe; Shin’chi Oishi; Yusuke Morikura; Kouta Sekine; Hisayasu Kuroda
Scholarly book, English, Contributor, High-Performance Algorithms for Numerical Linear Algebra, Springer, 23 May 2019 - 数値線形代数の数理とHPC (シリーズ応用数理 第 6巻)
山本 有作
Scholarly book, Japanese, Contributor, 第1章「連立一次方程式の数値解法」, 共立出版, 30 Aug. 2018, 9784320019553 - Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing
Yusaku Yamamoto
Scholarly book, English, Contributor, Article: An elementary derivation of the projection method for nonlinear eigenvalue problems based on complex contour integration, Springer, 04 Jan. 2018, 9783319624242 - 計算科学のためのHPC技術1
片桐孝洋; 中田真秀; 渡辺宙志; 山本有作; 吉井範行; Jaewoon Jung; 杉田有治; 石村和也; 大石進一; 関根晃太; 森倉悠介; 黒田久泰
Scholarly book, Japanese, Contributor, 第8章 行列計算における高速アルゴリズム, 大阪大学出版会, 01 Mar. 2017, 9784872595864 - Computational Science & Engineering
Gilbert Strang
Scholarly book, Japanese, Contributor, 第1章 応用線形代数, 近代科学社, 30 Jan. 2017 - 応用数理ハンドブック
山本有作
Dictionary or encycropedia, Japanese, Contributor, 第II編「方法の数理」中,「固有値問題の数値解法」, 朝倉書店, 25 Oct. 2013
Lectures, oral presentations, etc.
- Comparison of Two QUBO Formulations of Approximate Block Diagonalization and Their Performance on the D-Wave Advantage Quantum Annealing Machine
Koushi Teramoto; Shuhei Kudo; Yusaku Yamamoto
Oral presentation, English, 15th International Conference on Parallel Processing & Applied Mathematics, Peer-reviewed
10 Sep. 2024 - Picard逐次近似法とGauss-Seidel法
山本有作
Oral presentation, Japanese, 第50回数値解析シンポジウム
13 Jun. 2024 - Approximate Block Diagonalization of Symmetric Matrices Using Quantum Annealing
Koushi Teramoto; Masaki Kugaya; Shuhei Kudo; Yasuhiko Takenaga; Yusaku Yamamoto
Oral presentation, English, International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia '24), Peer-reviewed
26 Jan. 2024 - Approximate block diagonalization of symmetric matrices using a quantum annealing
Koushi Teramoto; Shuhei Kudo; Yusaku Yamamoto
Oral presentation, English, 2023 International Congress on Industrial and Applied Mathematics (ICIAM 2023)
24 Aug. 2023
20 Aug. 2023- 25 Aug. 2023 - Computing the matrix sign function with the double exponential formula
Tomoya Miyashita; Shuhei Kudo; Yusaku Yamamoto
Oral presentation, English, 2023 International Congress on Industrial and Applied Mathematics (ICIAM 2023)
23 Aug. 2023
20 Aug. 2023- 25 Aug. 2023 - Landau–Lifshitz–Gilbert 方程式に対する数値解法の構造保存性について
山本有作; 慈道亮人; 工藤周平
Oral presentation, Japanese, 第49回数値解析シンポジウム(NAS2023)
13 Jul. 2023
12 Jul. 2023- 14 Jul. 2023 - 2本のベクトルが張る平面上での回転による直交変換
森健太; 山本有作; 工藤周平
Oral presentation, Japanese, 日本応用数理学会 2023年研究部会連合発表会
10 Mar. 2023
08 Mar. 2023- 10 Mar. 2023 - q-離散戸田方程式の拡張とその時間発展が与えるLR変換について
渡邉凌斗; 新庄雅斗; 山本有作; 岩崎雅史
Oral presentation, Japanese, 日本応用数理学会 2023年研究部会連合発表会
10 Mar. 2023
08 Mar. 2023- 10 Mar. 2023 - 二重指数関数型数値積分公式を用いた行列符号関数計算法の精度および並列性の検証
宮下朋也; 山本有作; 工藤周平
Oral presentation, Japanese, 日本応用数理学会 2023年研究部会連合発表会
10 Mar. 2023
08 Mar. 2023- 10 Mar. 2023 - 固有値計算法で解く連続時間可積分系
山本有作
Public discourse, Japanese, 武蔵野大学MCMEセミナー, Invited, 武蔵野大学, 武蔵野大学有明キャンパス, https://www.musashino-u.ac.jp/research/laboratory/mathematical_engineering/seminar_symposium.html
17 Jan. 2023 - 行列計算における確率的誤差解析 ~ 行列指数関数の計算を例として ~
山本有作
Public discourse, Japanese, 日本応用数理学会 第14回 三部会連携「応用数理セミナー」, Invited, 日本応用数理学会, オンライン, https://na.cs.tsukuba.ac.jp/mepa/?page_id=2264
23 Dec. 2022 - 量子力学の固有値問題に対する緒方の方法の加速について
山本有作,今倉暁,緒方秀教
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第34回研究会, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会, オンライン, https://na.cs.tsukuba.ac.jp/mepa/?page_id=2427
06 Dec. 2022 - Automatic code selection for the dense symmetric generalized eigenvalue problem using ATMathCoreLib
Masato Kobayashi; Takeo Hoshi; Yusaku Yamamoto
Oral presentation, English, The 14th International Conference on Parallel Processing and Applied Mathematics (PPAM 2022), International conference
12 Sep. 2022 - 量子力学固有値問題の数値不定積分による数値解法
緒方秀教; 山本有作
Oral presentation, Japanese, 日本応用数理学会2022年度年会, Domestic conference
10 Sep. 2022 - 線形一般固有値問題に対する非線形べき乗法の収束解析
山本有作; 緒方秀教
Oral presentation, Japanese, 日本応用数理学会2022年度年会, Domestic conference
08 Sep. 2022 - 二重指数関数型数値積分公式を用いた行列符号関数の計算の改良および応用
宮下朋也; 山本有作
Oral presentation, Japanese, 日本応用数理学会2022年度年会, Domestic conference
08 Sep. 2022 - 運搬車容量付きの箱に番号が付いた箱玉系とシフト付きLR変換との対応
西脇友哉; 福田亜希子; 山本有作; 岩﨑雅史
Oral presentation, Japanese, 日本応用数理学会2022年度年会, Domestic conference
08 Sep. 2022 - Mobility and Eigenvalue/Singular Value Problems: Applications and Numerical Computation
Yusaku Yamamoto
Invited oral presentation, English, Mobility Workshop 2022 in Shizuoka: Mathematical Approaches to Mobility Problems, Invited, Network Machine Learning Team, Advanced Mathematical Science for Mobility Society Unit, Kyoto Univ.; Mathematical Informatics Lab, NAIST, Parche (Room C), Shizuoka City, https://sites.google.com/view/milab/home/mobility-workshop-2022-invited-only, International conference
17 Jan. 2022 - ブロック赤黒順序付けを用いた摂動付きMIC(0)分解のモデル問題による収束性解析
塩谷明美; 山本有作
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第32回研究会, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会, オンライン, Domestic conference
10 Dec. 2021 - ベイズ推定による超並列計算の性能予測
星健夫; 小橋恒士; 山本有作; 深谷猛
Oral presentation, Japanese, 日本応用数理学会2020年度年会, 日本応用数理学会, オンライン, Domestic conference
09 Sep. 2021 - 離散相対論的戸田方程式から導かれる交通流モデル
関口真基; 岩崎雅史; 山本有作; 石渡恵美子
Poster presentation, Japanese, 日本応用数理学会2020年度年会, 日本応用数理学会, オンライン, Domestic conference
08 Sep. 2021 - 原点シフト付きロトカ・ボルテラ系に対する超離散化から得られる箱玉系の性質について
岡来美; 関口真基; 岩﨑雅史; 山本有作; 石渡恵美子
Poster presentation, Japanese, 日本応用数理学会2020年度年会, 日本応用数理学会, オンライン, Domestic conference
08 Sep. 2021 - ブロック赤黒順序付けされた摂動付き修正不完全分解前処理の収束性解析
塩谷明美; 山本有作
Oral presentation, Japanese, 日本応用数理学会2020年度年会, 日本応用数理学会, Domestic conference
07 Sep. 2021 - テイラー展開に基づく行列指数関数計算の誤差解析
山本有作; 工藤周平; 星健夫
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第31回研究会, 日本応用数理学会, オンライン, Domestic conference
20 Jul. 2021 - ヒルベルト行列と関連するある行列の固有値について
山本 有作
Oral presentation, Japanese, 日本応用数理学会 第17回研究部会連合発表会, 日本応用数理学会, Domestic conference
04 Mar. 2021 - ヒルベルト行列と関連するある行列の 固有値について
山本 有作
Oral presentation, Japanese, 日本応用数理学会 第17回研究部会連合発表会, 日本応用数理学会, Domestic conference
04 Mar. 2021 - 縦長行列の列ピボット付きQR分解に対するコレスキーQR型アルゴリズムの検討
深谷猛; 中務祐治; 山本有作
Oral presentation, Japanese, 日本応用数理学会2020年度年会, 日本応用数理学会, Zoom開催, Domestic conference
09 Sep. 2020 - 実対称固有値問題向けのブロックヤコビ法に対する組み合わせ的前処理
久賀谷匡貴; 山本有作
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第29回研究会, 日本応用数理学会, Zoom開催, Domestic conference
30 Jul. 2020 - Shifted CholeskyQR3 for High Performance Tall-Skinny QR Factorization
Takeshi Fukaya; Ramaseshan Kannan; Yuji Nakatsukasa; Yusaku Yamamoto; Yuka Yanagisawa
Oral presentation, English, 2020 SIAM Conference on Parallel Processing for Scientific Computing, Society for Industrial and Applied Mathematics, Seattle, WA
13 Feb. 2020 - Performance improvement of block red-black MILU(0) preconditioner with relaxation on GPU
Akemi Shioya; Yusaku Yamamoto
Poster presentation, English, HPC Asia 2020, Across Fukuoka, International conference
16 Jan. 2020 - エネルギー散逸型数値解法の設計におけるある逆固有値問題について
山本 有作
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第28回研究会, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会, 東京, Domestic conference
02 Dec. 2019 - エネルギー散逸型数値解法の 設計におけるある逆固有値問題について
山本 有作
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第28回研究会, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会, 東京, Domestic conference
02 Dec. 2019 - Efficient implementations of the modified Gram–Schmidt orthogonalization in a non-standard inner product
Yusaku Yamamoto
Oral presentation, English, PARNUM 2019
28 Oct. 2019 - 一般内積における直交化のためのMGS-HP法の誤差解析
山本有作; 今倉暁
Oral presentation, Japanese, 日本応用数理学会2019年度年会, 日本応用数理学会, 東京大学駒場キャンパス, Domestic conference
05 Sep. 2019 - 離散相対論的戸田方程式が与えるシフト付きLR変換について
簑下尚也; 岩崎雅史; 山本有作
Oral presentation, Japanese, 日本応用数理学会2019年度年会, 日本応用数理学会, 東京大学駒場キャンパス, Domestic conference
05 Sep. 2019 - ブロック赤黒順序付け緩和MILU(0)前処理法のGPU向け高速化
塩谷明美; 山本有作
Poster presentation, Japanese, 日本応用数理学会2019年度年会, 日本応用数理学会, 東京大学駒場キャンパス, Domestic conference
04 Sep. 2019 - 超並列電子状態計算に対するベイズ推定型計算性能解析
星健夫; 角田皓亮; 田中和; 深谷猛; 山本有作
Oral presentation, Japanese, 日本応用数理学会2019年度年会, 日本応用数理学会, 東京大学駒場キャンパス, Domestic conference
03 Sep. 2019 - ブロック赤黒順序付け緩和MILU(0)前処理法のGPU実装と性能評価
塩谷明美; 山本有作
Oral presentation, Japanese, 2019年並列/分散/協調処理に関する『北見』サマー・ワークショップ (SWoPP2019), 北見市民会館, Domestic conference
25 Jul. 2019 - 荻田・相島の固有ベクトル反復改良法に基づく実対称行列の固有値分解追跡手法
山本 有作
Oral presentation, Japanese, RIMS共同研究(公開型)「次世代の科学技術を支える数値解析学の基盤整備と応用展開」, 京都大学数理解析研究所, Domestic conference
14 Nov. 2018 - Orthogonalization algorithms and dense symmetric eigenvalue solvers optimized for strong scaling
Yusaku Yamamoto
Invited oral presentation, English, 3rd International Symposium on Research and Education of Computational Science (RECS2018), Invited, The Computational Science Alliance of the University of Tokyo, The University of Tokyo, http://conf.compsci-alliance.jp/program/, International conference
20 Sep. 2018 - 荻田・相島の固有ベクトル反復改良法における重複固有値の扱いについて
白間 久瑠美; 工藤 周平; 山本 有作
Oral presentation, Japanese, 日本応用数理学会2018年度年会, 日本応用数理学会, 名古屋大学, http://annual2018.jsiam.org/, Domestic conference
04 Sep. 2018 - ベイズ推定を用いた並列数値計算ライブラリの性能予測
原田 祐希; 田中 和幸; 福本 智哉; 深谷 猛; 山本 有作; 星 健夫
Poster presentation, Japanese, 日本応用数理学会2018年度年会, 日本応用数理学会, 名古屋大学, http://annual2018.jsiam.org/, Domestic conference
04 Sep. 2018 - シフト付きCholeskyQR法を用いた一般内積空間におけるQR分解の計算
深谷 猛; 中務 佑治; Kannan Ramaseshan; 山本 有作; 柳澤 優香
Poster presentation, Japanese, 日本応用数理学会2018年度年会, 日本応用数理学会, 名古屋大学, http://annual2018.jsiam.org/, Domestic conference
04 Sep. 2018 - 一般化固有値問題に対する分割統治法におけるデフレーション手法について
廣田 悠輔; 山本 有作
Oral presentation, Japanese, 2018年並列/分散/協調処理に関する『熊本』サマー・ワークショップ (SWoPP2018), 電子情報通信学会・情報処理学会・日本応用数理学会, 熊本県熊本市, https://sites.google.com/site/swoppweb/, Domestic conference
31 Jul. 2018 - Relaxed MILU(0) Factorization with Block Red-Black Ordering for GPU parallelization of PIC simulation
Akemi Shioya; Yusaku Yamamoto
Oral presentation, English, EASIAM2018, SIAM East Asian Section, The University of Tokyo, http://sr3.t.u-tokyo.ac.jp/EASIAM2018/, International conference
25 Jun. 2018 - Application of an energy-preserving integrator to quantum-mechanical wavepacket dynamics
Tsubasa Sakai; Shuhei Kudo; Hiroto Imachi; Yuto Miyatake; Takeo Hoshi; Yusaku Yamamoto
Oral presentation, English, EASIAM2018, SIAM East Asian Section, The University of Tokyo, http://sr3.t.u-tokyo.ac.jp/EASIAM2018/, International conference
24 Jun. 2018 - エネルギー保存型並列解法MB4に基づ2次元量子ダイナミクス計算
酒井 翼; 工藤 周平; 井町 宏人; 宮武 勇登; 星 健夫; 山本 有作
Poster presentation, Japanese, 第47 回数値解析シンポジウム(NAS2018), 福井県あわら市, Domestic conference
07 Jun. 2018 - A case study on modeling the performance of dense matrix computation: tridiagonalization in the EigenExa eigensolver on the K computer
Takeshi Fukaya; Toshiyuki Imamura; Yusaku Yamamoto
Oral presentation, English, IPDPS Workshops 2018, IPDPS, Vancouver, Canada, International conference
21 May 2018 - Block Red-Black Milu(0) Preconditioner with Relaxation on GPU
Akemi Shioya; Yusaku Yamamoto
Oral presentation, English, 18th SIAM Conference on Parallel Processing for Scientific Computing, Society of Industrial and Applied Mathematics, Tokyo, Japan, International conference
09 Mar. 2018 - Orthogonalization Algorithms and Dense Symmetirc Eigenvalue Solvers for Massively Parallel Machines
Yusaku Yamamoto
Oral presentation, English, International Workshop on Massively Parallel Programming for Quantum Chemistry and Physics 2018, Invited, 理化学研究所 計算科学研究機構 (量子系分子科学研究チーム) 理化学研究所 理論科学連携研究推進グループ (iTHES) 超並列プログラミング国際WS実行委員会, 理化学研究所(埼玉県和光市), http://labs.aics.riken.jp/nakajimat_top/mpqcp2018_ws.html, International conference
16 Jan. 2018 - One-Way Dissectionオーダリングによる連立一次方程式の直接解法の並列化
中野 智輝; 横川 三津夫; 深谷 猛; 山本 有作
Oral presentation, Japanese, 情報処理学会第162回ハイパフォーマンスコンピューティング研究発表会, Domestic conference
19 Dec. 2017 - 虚数シフトを行った実対称行列のためのCOCG法と一般化MINRES法の性能比較
清藤 博暉; 山本 有作
Oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第24回研究会, 日本応用数理学会, Domestic conference
24 Nov. 2017 - On using the Cholesky QR method in the full-blocked one-sided Jacobi algorithm
S. Kudo; Y. Yamamoto
Oral presentation, English, The 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017), Lublin, Poland, https://www.ppam.pl/program, International conference
11 Sep. 2017 - Cholesky QR法を用いたブロック片側ヤコビ法の誤差解析
工藤周平; 山本有作
Oral presentation, Japanese, SWoPP 2017, 日本応用数理学会, Domestic conference
27 Jul. 2017 - ブロック赤-黒順序付けされた摂動/緩和MILU(0)前処理法のGPUとマルチコアCPUにおける性能評価
塩谷明美; 山本有作
Oral presentation, Japanese, SWoPP 2017, Domestic conference
27 Jul. 2017 - 一般内積における直交化のためのMGS-HP 法の誤差解析
山本有作
Oral presentation, Japanese, 第46回数値解析シンポジウム(NAS2017), Domestic conference
30 Jun. 2017 - Roundoff error analysis of the CholeskyQR2 and related algorithms
Yusaku Yamamoto
Oral presentation, English, International Workshop on Parallel Numerics (PARNUM 2017), Waischenfeld, Germany, International conference
20 Apr. 2017 - 一般内積に対する修正グラム・シュミット直交化の効率的実装法の誤差解析
山本有作; 今倉暁
Oral presentation, Japanese, 日本応用数理学会 2017年 研究部会連合発表会, 電気通信大学
06 Mar. 2017 - 片側ヤコビSVDの並列計算機向け実装と性能解析
工藤周平; 山本有作
Oral presentation, Japanese, 日本応用数理学会 2017年 研究部会連合発表会, 電気通信大学, Domestic conference
06 Mar. 2017 - ブロックヤコビ法に基づく強スケーリング型固有値・特異値計算アルゴリズム
山本 有作
Invited oral presentation, Japanese, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会 第22回研究会, 東京大学, Domestic conference
25 Nov. 2016 - Convergence Analysis of the Parallel Block Jacobi Method for the SVD with Dynamic Ordering
Yusaku Yamamoto
Invited oral presentation, English, Parallel Numerical Computing and Its Applications, Invited, Slovak Academy of Sciences, Bratislava, Slovakia, Smolenice, Slovakia, International conference
07 Sep. 2016 - ブロック化赤-黒順序付けにおけるMILU分解の問題点と緩和係数導入の効果
塩谷明美; 山本有作
Oral presentation, Japanese, SWoPP 2016, (社)情報処理学会, キッセイ文化ホール(長野県松本文化会館), Domestic conference
09 Aug. 2016 - High-Performance Parallelization Method of DSYRK for SVD and other Matrix Computations on Xeon Phi
Shuhei Kudo; Yusaku Yamamoto
Oral presentation, English, The 9th International Workshop on Parallel Matrix Algorithms and Applications (PMAA 16), International conference
07 Jul. 2016 - Solution of a Nonlinear Eigenvalue Problem using the Signed Singular Values with Application to the Scaling Exponent Computation in 2 Dimensional Turbulent Flow
Yusaku Yamamoto
Oral presentation, English, SIAM: East Asian Section Conference 2016 (EASIAM 2016), Society of Industrial and Applied mathematics, International conference
21 Jun. 2016 - Xeon PhiにおけるDSYRKの並列化手法と性能解析
工藤周平; 山本有作
Oral presentation, Japanese, 2016年ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2016), (社)情報処理学会, 東北大学片平キャンパス, Domestic conference
06 Jun. 2016 - Performance of the Parallel One-Sided Block Jacobi SVD Algorithm on a Modern Distributed-Memory Parallel Computer
S. Kudo; Y. Yamamoto; M. Bečka; M. Vajteršic
Oral presentation, English, 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, International conference
08 Sep. 2015 - 三重対角化に対するDongarra--Wilkinson法の性能解析と実装手法について
工藤周平; 山本有作; 横川三津夫
Oral presentation, Japanese, 2015年ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2015), (社)情報処理学会, 東京大学, Domestic conference
19 May 2015 - A nonlinear eigenvalue problem arising in theoretical fluid dynamics and its solution using signed singular values
Yusaku Yamamoto
Invited oral presentation, English, The Fifth China-Japan- Korea Conference on Numerical Mathematics, Invited, Yinchuan City, Ningxia Province, China, International conference
25 Aug. 2014 - 固有値計算のための古典的ブロックヤコビ法の並列化とその収束性解析
山本有作; 張瀾
Oral presentation, Japanese, 日本応用数理学会 研究部会連合発表会, 日本応用数理学会, 京都大学, http://chaosken.amp.i.kyoto-u.ac.jp/jsiam2014spring/program.html, Domestic conference
20 Mar. 2014 - エクサフロップス時代に向けた線形計算アルゴリズムの課題と研究動向
山本有作
Invited oral presentation, Japanese, 日本応用数理学会 研究部会連合発表会, Invited, 日本応用数理学会 「行列・固有値問題とその応用」研究部会, 東京大学, http://na.cs.tsukuba.ac.jp/mepa/?page_id=331, Domestic conference
26 Dec. 2013 - Solution of a Large-Scale Nonlinear Eigenvalue Problem Based on the Signed Singular Values
Yusaku Yamamoto
Invited oral presentation, English, 2013 NCTS Workshop on Numerical Linear Algebra and High Performance Computing, NCTS, National Tsing-Hua University,, Hsinchu, China, http://math.cts.nthu.edu.tw/Mathematics/2013NLA-HPC.htm, International conference
09 Dec. 2013
Courses
- 情報数理工学実験第二
The University of Electro-Communications - 情報数理工学実験第二
電気通信大学 - ハイパフォーマンスコンピューティング基礎論
The University of Electro-Communications - ハイパフォーマンスコンピューティング基礎論
電気通信大学 - ハイパフォーマンスコンピューティング
The University of Electro-Communications - ハイパフォーマンスコンピューティング
電気通信大学 - 応用数学第二
The University of Electro-Communications - 応用数学第二
電気通信大学 - Basics of High Performance Computing
The University of Electro-Communications - HPC基礎論
電気通信大学 - High Performance Computing 1
The University of Electro-Communications - HPC第一
電気通信大学 - High Performance Computing 2
The University of Electro-Communications - HPC第二
電気通信大学
Research Themes
- 組合せ的前処理と量子アニーリングの融合による行列計算の加速手法
山本 有作; 武永 康彦
日本学術振興会, 科学研究費助成事業 挑戦的研究(萌芽), 電気通信大学, 挑戦的研究(萌芽), 22K19772
30 Jun. 2022 - 31 Mar. 2025 - Investigation for the advancement of the general-purpose surface-structural-analysis programme, 2DMAT II
望月出海; 和田 健; 兵頭 俊夫; アハメッド レズワン; 福島 孝治; 吉見 一慶; 花田 貴; 平川 力; 星建夫; 高山あかり; 中西義典; 山本有作
2023 年度TIA 連携プログラム探索推進事業「かけはし」, 高エネルギー加速器研究機構, TK23-006
Apr. 2023 - Mar. 2024 - 高いスケーリング性能と高精度性を併せ持つ次世代固有値・特異値分解ライブラリの開発
山本有作
01 Apr. 2019 - 31 Mar. 2023 - Fast, accurate and stable matrix computation algorithms based on non-orthogonal transformations
Yamamoto Yusaku
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Challenging Research (Exploratory), The aim of this study is to develop new algorithms for matrix computations based on non-orthogonal transformations and analyze their convergence properties and accuracy theoretically. We chose the time-dependent eigenvalue problem and preconditioning methods for iterative solution of linear systems as examples. For the former, we developed an algorithm based on matrix multiplications, which is suited for next-generation microprocessors. For the latter, we developed a highly parallel algorithm based on the modified incomplete Cholesky factorization and the block red-block ordering., 17K19966
30 Jun. 2017 - 31 Mar. 2022 - Development of a parallel matrix library for computational physics with high strong-scaling performance
Yamamoto Yusaku
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (B), In this study, we developed matrix solvers which are applicable to a wide range of computer physics applications, such as electronic structure calculation, plasma simulation and crack growth simulation. Our main targets are direct solvers for sparse and band matrices and generalized eigenvalue solvers for dense matrices. With the use of communication-avoiding algorithms and automatic code selection techniques, our solvers aim at achieving high strong-scaling performance. We applied our solvers to time-dependent simulation of organic polymeric materials and crack growth simulation and verified their performance., 17H02828
01 Apr. 2017 - 31 Mar. 2020 - Innovative design of electronics material based on identifying and preserving mathematical structures
Matsuo Takayasu
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Tokyo, Grant-in-Aid for Scientific Research (B), In this research, we aimed at developing a new designing framework of innovative electronics material for which conventional designing approach has become too expensive due to the rapid increase of the complexity and size of target designs. The main idea there was to establish a unified approach where the physics, the mathematical model, and its numerical computation were demanded to be consistent in terms of a certain mathematical structure. As outcomes of this research, we have developed several important supporting new techniques, such that new structure-preserving model reduction techniques, new fast iterative solvers for electronics dynamics computations. Then we considered some electronics dynamics computations by efficient, energy-conserving parallel numerical methods, which confirmed the validity of the concept of the unified approach in that it actually enables efficient, stable computation., 16KT0016
19 Jul. 2016 - 31 Mar. 2019 - Theory and Application of Scalable Numerical Software on an O(100M) core environment
IMAMURA Toshiyuki; YAMAMOTO Yusaku; Todo Shinji
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Institute of Physical and Chemical Research, Grant-in-Aid for Scientific Research (B), This research project aims to realize high performance numerical services investigated in the past based on new mathematical principles in the emerging computing system where tens of thousands to hundreds of millions of processing cores are installed. Giving two important themes, `Mixed-granularity numerical kernel' and `Asynchronous numerical algorithm,' we conducted; i) the research on the theory of asynchronous numerical algorithms. Also avoidance of communication and synchronization at a practical level, then CAHTR and a new method for the FDTD scheme were proposed. Furthermore, we have practiced; ii) promoting research on core numerical infrastructure technologies such as automatic tuning for scalable, lightweight code generation at super-many-core, and promoting innovative research leading to the next generation numerical calculation software., 15H02709
01 Apr. 2015 - 31 Mar. 2018 - Research on Mathematical Methods and Development of Libraries for Combined and Hierarchical Autotuning
SUDA Reiji; TANAKA Teruo; YAMAMOTO Yusaku; KATAGIRI Takahiro
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Tokyo, Grant-in-Aid for Scientific Research (B), Autotuning is technology that aims to attain good performance under various conditions, by letting software controls its own parameter. In case multiple parameter exist, most of previous research chose either exhaustive search or heuristic pruning. In this research, we aim mathematically founded method using Bayesian statistics, which gives practically good and asymptotically optimal solutions. In survey of previous works, we found that linear models and correlation models have such properties, and they can be combined. From description of such models, we create a software that generates a code that constructs performance model from a priori information and observations. Also we apply autotuning mathematical libraries to various computations, and confirms their effectiveness., 15H02708
01 Apr. 2015 - 31 Mar. 2018 - New developments of matrix similarity transformations derived from integrable systems
Iwasaki Masashi; YAMAMOTO Yusaku; ISHIWATA Emiko; KONDO Koichi; FUKUDA Akiko; SHINJO Masato
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Kyoto Prefectural University, Grant-in-Aid for Scientific Research (C), One of the results is to improve the I-SVD algorithm for computing singular value decompositions of bidiagonal matrices with higher accuracy. The new I-SVD algorithm employs the proposed dLVs algorithm for singular values and the proposed divided Twisted factorization method for singular vectors. The second is to investigate the convergence of solutions to dynamical systems to eigenvalues of various band matrices. Solution expressions of the discrete hungry integrable systems and their asymptotic convergence are thoroughly clarified. A new algorithm for computing eigenvalues of pentadiagonal matrices is also designed. The third is to find relationships among the discrete hungry integrable systems, extended Fibonacci sequences and roots of polynomials, and then develop them in constructing band matrices with prescribed eigenvalues. New characteristic polynomials of matrices are also presented over min-plus algebra., 26400208
01 Apr. 2014 - 31 Mar. 2017 - Extension of a Linear algebra Library for Electronic Structure Calculation and Its Optimization for Many-Core Processors
Yusaku Yamamoto
Principal investigator, 大規模電子状態計算向けに行列計算ライブラリを拡張し,電子状態計算向けの固有値解法,ベクトルの逐次添加型直交化法,行列関数の計算法などの機能を備える。また,開発したライブラリをメニーコアプロセッサ向けに最適化する。
01 Apr. 2014 - 31 Mar. 2017 - Research on Software Foundations for General Purpose Automatic Tuning Mechanisms
SUDA Reiji; SATO Hiroyuki; YAMAMOTO Yusaku; IMAMURA Toshiyuki; YOSHIZOE Kazuki; KATAGIRI Takahiro; FUJII Akihiro; YUBA Toshitsugu; YASUGI Masahiro; OSAWA Noritaka
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Tokyo, Grant-in-Aid for Scientific Research (A), Aiming to establish Automatic Tuning (AT) as a general technology, we researched in the following four domains. In Math Domain, we developed and released AT mathematical core library ATMathCoreLib, and developed mathematical methods for AT with performance correlation. In Programming Domain, we combined ATMathCoreLib with AT description language ppOpen-AT, and proposed a method to cancel performance variation from code placements. In System Domain, we proposed architecture of general purpose Auto-Tuner, and created light-weight performance data collection mechanism. In Application Domain, we apply AT to matrix computations and large scale search, and apply ATMathCoreLib to some applications. Thus we researched on four domains in focus and in collaboration, and achieved research results toward general purpose AT methodology., 23240005
01 Apr. 2011 - 31 Mar. 2015 - Fast solution for ultra-large scale systems as a basis of computational materials science
ZHANG Shao-Liang; ABE Kuniyoshi; YAMAMOTO Yusaku; SOGABE Tomohiro; IMAHORI Shinji; MIYATA Takahumi; SUGIHARA Masaaki
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Nagoya University, Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area), Finding novel complex correlation phenomena and clarifying the non-equilibrium dynamics are of prime importance in the field of materials design. This research group tackles the challenging problems from the viewpoints of numerical linear algebra, optimization and high-performance computing. The major purpose of the researches is to develop robust and efficient numerical algorithms for solving large linear systems and eigenvalue problems in order to shed light on a breakthrough toward the challenging problems. Some of numrical algrotihms for solving linear systems we developed are as follows: the shifted COCR method for solving shifted complex symmetric linear systems; the look back GMRES(m) method for nonsymmetric linear systems; a variand of the IDR(s) method for nonsymmetric linear systems. Some of numrical algrotihms for solving eigenvalue problems we developed are as follows: the Arnoldi(M,W,G) method; an extension of the SS method;, 22104004
01 Apr. 2010 - 31 Mar. 2015 - Fast and accurate eigenvalue solver for many-core processors
YAMAMOTO Yusaku
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Kobe University, Grant-in-Aid for Scientific Research (C), The objective of this research is to develop efficient algorithms for computing eigenvalues on many-core processors. The achievements of this research can be summarized as follows: (1) Implementation and performance evaluation of an eigenvalue computation algorithm designed for multi-core/many-core processors (2) Error analysis of the AllReduce algorithm for Householder QR decomposition (3) A new highly parallel algorithm for incremental orthogonalization (4) A new algorithm for computing the eigenvalues of a totally nonnegative band Hessenberg matrix with high relative accuracy, 21560065
2009 - 2012 - Adaptive Auto-tuning Technology Aiming Complex Multicore and Multiprocessor Environments
IMAMURA Toshiyuki; KATAGIRI Takahiro; SUDA Reiji; TAKAHASHI Daisuke; YAMAMOTO Yusaku; NAKAJIMA Kengo
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, The University of Electro-Communications, Grant-in-Aid for Scientific Research (B), To establish a key component of"automatic tuning technology"on multi-core processors and multi-GPU's for a next-generation supercomputer system, we conducted this research project. In this work, we developed an"automatic tuning technique employing performance stabilization on GPU's"and an"automatic tuning technique which utilizes a performance database". Furthermore, we successfully gave the semioptimal experimental design based on a Bayesian model through the mathematical research of the automatic tuning when the environmental conditions change dynamically. AT(automatic-tuning) for eigenvalue solvers, Fast Fourier Transform, and iterative solver with preconditioning were exploited on a cluster system or a multicore computer system., 21300013
2009 - 2011 - Development of Innovative Library for Singular Value Decomposition Suited to Multi-Core Processors
NAKAMURA Yoshimasa; TSUJIMOTO Satoshi; KIMURA Kinji; YAMAMOTO Yusaku; IWASAKI Masahi; TAKATA Masami
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Kyoto University, Grant-in-Aid for Scientific Research (A), This project obtains the following fruits in parallel computations with multi-core CPU environments. 1) A shift strategy based on the generalized Newton shift is completed for computing bidiagonal singular values by the mdLVs. 2) A speed up is established in reorthogonalization of singular vectors by using the Householder transformation with the compact WY representation. 3) A speed up of preconditioning process from given dense matrix to bidiagonal is performed. 4) Evaluation both of performance and accuracy of some singular value decomposition algorithms including I-SVD and D & C in double precision computation in GPGPU are discussed. 5) New integrable algorithms for computing eigenvalues and singular values of some class of band matrices are designed. The resulting code DBDSLV of the fast I-SVD algorithm in terms of 1)-4) can be downloaded in the web page as well as many presentations have been given in international workshops and annual meetings and many research papers have been published., 20246027
2008 - 2011 - Research for new algorithms by integrable system approach
IWASAKI Masashi; KONDO Koichi; ISHIWATA Emiko; YAMAMOTO Yusaku; NAKAMURA Yoshimasa
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Kyoto Prefectural University, Grant-in-Aid for Young Scientists (B), We design new three kinds of algorithms based on the studies of integrable systems. One of our proposal algorithms is matrix eigenvalues algorithm for band matrices. The second is matrix diagonalization algorithm for general matrices. The third is matrix eigenvalues algorithm in Max-Plus algebra. We also clarify some numerical properties of our proposal algorithms and existing algorithms., 20740064
2008 - 2010 - A study on fast solvers for large linear systems in the field of scientific computations
ZHANG Shao-Liang
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Nagoya University, Grant-in-Aid for Scientific Research (C), There are many researches about Krylov subspace methods for large linear systems and some research results were utilized for solving practical problems. However, these results can be used for some special cases. We studied and developed fast solvers for large linear systems in the field of scientific computations., 19560065
2007 - 2010 - 情報爆発時代のロバストな自動チューニングシステムに向けた数理的基盤技術の研究
須田 礼仁; 竹村 彰通; 室田 一雄; 山本 有作; 直野 健; 今村 俊幸
日本学術振興会, 科学研究費助成事業, 東京大学, 特定領域研究, 情報爆発時代のソフトウェアは, 質・量とも爆発的に大きくなる情報を有効に活用するため, それとおなじくらい爆発的に拡大する計算・通信・記憶資源の多様性に, 柔軟かつロバストに適応する能力を備えなければならない. このような適応能力(以下総じて「自動チューニング機能」と呼ぶ)は, 一般に情報収集→推定→最適化→実行制御という流れで実現される. 本研究では上記の流れの4つの部分のうち, 計算環境や応用分野への依存が少なく, 数理の世界で表現できる推定と最適化の部分に焦点を当て, 自動チューニングの数理の普遍的構造を明らかにし, 情報爆発時代の資源とデータに柔軟かつ確実に適応するシステムを実現するために必要となるロバストな自動チューニング機構の数理的基盤を構築することを目的とする. 平成20年度は2つのサブテーマについて特に注力して研究を推進した. 「オンライン自動チューニングのための逐次実験計画」では,一般のBayesモデルにおいて性能を適切に学習できなくなる場合があることを指摘し, そのような事態を確実に回避できるようなBayesモデルの定式化を行った. また, そのようなBayesモデルを具体的に示すことにより, 確実に性能を学習して, 最適解に近づいてゆける手法を確立した. また, 「未知の組み合わせにおける性能の推定」では, 連立一次方程式の反復解法ライブラリであるLisをターゲットとして, 部分的な実験データから未測定のパラメタの組み合わせにおける収束・発散および所要時間を予測することを試みた. その結果, パラメタ探索を効率化する複数の手法が得られた. これらの手法は, 実験および最適化の目的関数によって使い分けるものである. このほか, 自動チューニングとその周辺のさまざまな研究課題に取り組み, また, 3年連続で国際ワークショップを主催し, 自動チューニング研究を世界的にけん引した., 19024018
2007 - 2008 - Extension of option pricing algorithms based on the fast multipole methods to advanced and non-financial derivatives
YAMAMOTO Yusaku
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Nagoya University, Grant-in-Aid for Scientific Research (C), 本課題では,非金融デリバティブ,および新しい資産価格モデルに基づく金融デリバティブに対し,高速・高精度な価格計算手法を開発することを目的として研究を行った。その結果,次の成果を得た。 (1)Dischelモデルと呼ばれる標準的な気温モデルに基づく天候デリバティブに対し,従来のモンテカルロ法に比べて10倍程度高速な新しい価格評価手法を開発した。 (2)ジャンプ拡散モデルに基づく金融デリバティブに対し,高速多重極子展開法に基づく価格評価手法が適用できることを明らかにした。 (3)オプション価格評価で用いられる連立線形常微分方程式に対し,行列の指数関数に基づく高速・高精度な数値解法を提案した。, 18560058
2006 - 2008 - Study on large-scale scalable P2P grid infrastructure for large-capacity distributed computing
SATO Mitsuhisa; BOKU Taisuke; TATEBE Osamu; AMAGASA Toshiyuki; SAKURAI Tetsuya; YUMAMOTO Yusaku
Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, University of Tsukuba, Grant-in-Aid for Scientific Research (A), The concept of P2P grid is to make use of PCs in office and laboratory as a computing resource of Grid computing using P2P technology, while the Grid computing is mainly for sharing computing resource in each research organization. Our objective is to investigate and establish the technology and infrastructure of P2P grid for large-capacity computing and demonstrate its effectiveness in several applications. 1. To make use of potential computing resources such as PCs in office as a computing source of grid computing, we have developed a system, called BEE, which enables the Linux binary to run a Grid worker in Windows, and a network-layer which allows the communication traversing the firewall by using UDP-hole punching. And, we also proposed an authentication method called AUBReX (Authentication method Using Buddy-buddy relationship Represented by Cross certificate), which supports both anonymity and malicious user identification, for P2P environment, and a framework for a parallel programming model by remote procedure calls, which bridge large-scale computing resource pools managed by multiple Grid-enabled job scheduling systems. 2. We proposed a model to separate the data transfer by a data management layer from the RPC programming. We have designed and implemented a prototype data transfer layer, OmniStorage, which allows efficient data transmission of a large amount of data. And we proposed a framework and programming model to support large capacity computing on global distributed file system. We have proposed a scheme for storage and retrieval of XML data using DHT with storage-load balancing method based on extendible hashing. 3. We have developed a master-worker type parallel method for finding several eigenvalues and eigenvectors of a generalized eigenvalue problem, which suitable for master-worker programming models in P2P grid. We have also developed numerical algorithms suited to the heterogeneous multi-core systems such as Cell processor which can be recognized as a powerful computing resource in P2P grid., 17200002
2005 - 2007 - 情報爆発時代のロバストな自動チューニングソフトウェアに向けた数理的基盤技術の研究
須田 礼仁; 竹村 彰通; 室田 一雄; 山本 有作; 直野 健; 片桐 孝洋
日本学術振興会, 科学研究費助成事業, 東京大学, 特定領域研究, 情報爆発時代のソフトウェアは,質・量とも爆発的に大きくなる情報を技術と社会に活かすため,同様に爆発的に拡大する計算・通信・記憶資源の多様性に適応する能力(自動チューニング機能)を備えなければならない.我々は柔軟かつロバストな自動チューニング機能を有するソフトウェア開発するための数理的基盤技術の開拓を日指して以下の研究を行った. 1.実行時自動チューニングのための最適実験計画では,ベイズ統計に基づく最適解を大規模な数値計算により求め,優れた性質を持つことを示した. 2.計算機性能のためのモデルとして,キャッシュあふれやライン衝突などを考慮した性能モデルを構築した.また,AICによるモデル選択が一定の有効性を示すことを確認した. 3.分割表の検定に用いるMCMCの基底の研究を展開した.これは,従来困難であった多様な統計的解析を効率的に実現するものである. 4.M凸関数の概念をジャンプシステム上に拡張し,離散凸解析の研究をさらに展開した.これにより,マトロイド,デルタマトロイド,基多面体での知見を一般化した. 5.固有値解法であるマルチシフトQR法の性能モデルを構築し自動チューニングへの見通しをつけた.また,対称行列の三重対角化の手法について,その優劣がシステムや問題によりどのように変化するかについて系統的に調査した. 6.行列計算に関する自動チューニングのサーベイを行い,その動向を明らかにし,既存研究で残された課題を明らかにした. 7.分散メモリ並列計算における集団通信について,網羅的なアルゴリズム列挙法を開発した.特に本年度は2のべき以外のマシン数に適応した. 8.固有値問題の解法のうち,多固有値多分法について,実行時自動チューニングの手法を開発した.この結果は次期LAPACKに収納される予定である. このように,我々が目指す数理基盤の基礎をなす重要な研究成果が得られた., 18049014
2006 - 2006 - 高速多重極子展開法を用いた派生証券の高速価格計算手法に関する研究
山本 有作
日本学術振興会, 科学研究費助成事業, 名古屋大学, 若手研究(B), 本研究では,アメリカン・オプション,経路依存型オプション,複数資産に依存するオプションなど,価格を求める解析的公式が存在しない複雑な派生証券について,高速・高精度に価格を計算するための数値計算手法の開発を目的とする。これに対して,平成17年度は次のような研究成果が得られた。 (1)アメリカン・ルックバック・オプションに対する価格計算法の開発 ルックバック・オプションのうち,期限前行使が可能ないわゆるアメリカン・ルックバック・オプションは実用上重要である。本研究では,高速ガウス変換と2重指数型数値積分公式の組み合わせによるルックバック・オプションの価格計算法をこの場合に拡張した。数値実験の結果,従来法で最速とされるReinerの方法と比べても,より高速・高精度に価格を計算できることを確認した。本成果は論文誌Publ.RIMS Kyoto Univ.で報告した。 (2)天候デリバティブに対する価格計算法の並列化 高速ガウス変換に基づく天候デリバティブの価格計算法を前年度に開発したが,この方法は従来のモンテカルロ法に比べて高速ではあるものの,リアルタイム・プライシングなどの応用にとってはまだ計算時間がかかり過ぎるという問題点があった。そこで,この方法に対する効率的な並列化手法を開発した。並列化の結果,10台程度のPCクラスタを用いれば典型的な天候デリバティブの価格計算を10秒程度で行えることを示した。この速度は,ほとんどの応用にとって十分である。本成果はSpringer-VerlagのLecture Notes in Computer Scienceで報告した。, 16760053
2004 - 2005