Tetsu NARUMI

Department of Computer and Network Engineering, Professor
Cluster I (Informatics and Computer Engineering), Professor
Admissions Research Center, Professor
  • Profile:
    Apr. 1995 - Mar. 1998 : Special Postdoctoral Researcher,
    Japan Society for the Promotion of Science
    Apr. 1998 - Mar. 2001 : Special Postdoctoral Researcher,
    RIKEN (Institute of Physical and Chemical Research)
    Apr. 2001 - Mar. 2007 : Research Scientist,
    High-Performance Molecular Simulation Team, RIKEN Genomic Sciences Center
    Apr. 2007 - Mar. 2009 : Research Associate Professor,
    Faculty of Science and Technology, Keio University
    Apr. 2009 - Mar. 2013 : Associate Professor,
    Department of Computer Science, University of Electro-Communications
    (Department of Communication Engineering and Informatics from Apr. 2010)
    Apr. 2013 - Mar. 2016 : Professor,
    Department of Communication Engineering and Informatics,
    University of Electro-Communications
    Apr. 2016 - present : Professor,
    Department of Computer and Network Engineering,
    University of Electro-Communications

Degree

  • Ph. D., The University of Tokyo

Research Keyword

  • FPGA
  • GPGPU
  • High Performance Computing

Field Of Study

  • Informatics, High-performance computing
  • Informatics, Computer systems

Educational Background

  • Apr. 1995 - Mar. 1998
    Graduate School of Arts and Sciences, The University of Tokyo, Department of General Systems Studies

Member History

  • Apr. 2012 - Mar. 2019
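    Expert Committee Member, Joint Usage Expert Committee, Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, Others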
  • Apr. 2013 - Mar. 2018
    Committee Member, Supercomputing Expert Committee, Information Technology Center, The University of Tokyo, Others
  • Apr. 2011 - Mar. 2015
    MWG Member, Editorial Committee of the IPSJ Magazine, Information Processing Society of Japan, Others
  • Apr. 2010 - Mar. 2014
    Steering Committee Member, Steering Committee of the IPSJ Special Interest Group on High Performance Computing (SIGHPC), Others

Award

  • Nov. 2009
    ACM
    USA
    Gordon Bell Prize (Winner, Price/Performance Category)
  • Nov. 2006
    IEEE
    USA
    Gordon Bell Prize (Honorable Mention, Peak Performance)
  • Nov. 2000
    IEEE
    USA
    Gordon Bell Prize (Winner, Peak Performance)
  • Sep. 1996
    Nikkei Science Inc.
    AVS Award (Grand Prize)

Paper

  • An Implementation and Evaluation of a Remote Control System for Logic Circuit Design Assignment using an FPGA
    Hideo Akaike; Toshiyuki Shimazaki; Tetsu Narumi
    IPSJ Transactions on Computers and Education, Information Processing Society of Japan, 8, 2, 51-63, Jun. 2022, Peer-reviewed
    Scientific journal, Japanese
  • Estimating Configuration Parameters of Pipelines for accelerating N-Body Simulations with an FPGA using High-level Synthesis
    Tetsu Narumi; Akio Muramatsu
    9th International Conference on Pervasive and Embedded Computing and Communication Systems, 65-64, 19 Sep. 2019, Peer-reviewed
    International conference proceedings, English
  • CUDA offloading for energy‐efficient and high‐frame‐rate simulations using tablets
    Edgar Josafat Martinez‐Noriega; Syunji Yazaki; Tetsu Narumi
    Concurrency and Computation: Practice and Experience, John Wiley & Sons Ltd, 33, 2, e5488-14, 23 Aug. 2019, Peer-reviewed
    Scientific journal, English
  • Structural determinants in the bulk heterojunction
    Angela Acocella; Siegfried Höfinger; Ernst Haunschmid; Sergiu C. Pop; Tetsu Narumi; Kenji Yasuoka; Masato Yasui; Francesco Zerbetto
    Physical Chemistry Chemical Physics, Royal Society of Chemistry, 20, 8, 5708-5720, 2018, Peer-reviewed, Photovoltaics is one of the key areas in renewable energy research, with remarkable progress made every year. Here we consider the case of a photoactive material and study its structural composition and the resulting consequences for the fundamental processes driving solar energy conversion. A multiscale approach is used to characterize essential molecular properties of the light-absorbing layer. A selection of bulk-representative pairs of donor/acceptor molecules is extracted from the molecular dynamics simulation of the bulk heterojunction and analyzed at increasing levels of detail. Significantly increased ground-state energies, together with an array of additional structural characteristics, are identified that all point towards an auxiliary role of the material's structural organization in mediating charge transfer and separation. Mechanistic studies of the type presented here can provide important insights into fundamental principles governing solar energy conversion in next-generation photovoltaic devices.
    Scientific journal, English
  • An FPGA-Based Tiled Display System for a Wearable Display
    Tetsu Narumi
    FIFTH INTERNATIONAL CONFERENCE ON INFORMATICS AND APPLICATIONS (ICIA2016), SOC DIGITAL INFORMATION & WIRELESS COMMUNICATIONS, 12-17, 2016, Peer-reviewed, We developed an FPGA-based tiled display system for use as a wearable display. Four 15.6-inch displays are arranged in a grid to form a single display that accepts an HDMI input signal. FPGAs are used to divide and scale the image across the four LCD panels. The system can be used as a wearable display, such as a digital sandwich man. A smartphone that supports MHL output can be connected to the system, and users can modify the contents of the display on the fly. The merit of using hardware is that the displays are synchronized closely enough to play movies on them; no screen tearing occurs. The system consumes less power than a single display of equivalent size.
    International conference proceedings, English
  • Comparison of the Accuracy of Periodic Reaction Field Methods in Molecular Dynamics Simulations of a Model Liquid Crystal System
    Takuma Nozawa; Kazuaki Z. Takahashi; Tetsu Narumi; Kenji Yasuoka
    J. Comput. Chem., 36, 2406-2411, 15 Dec. 2015, Peer-reviewed
    Scientific journal, English
  • GPU-accelerated replica exchange molecular simulation on solid-liquid phase transition study of Lennard-Jones fluids
    Kentaro Nomura; Minoru Oikawa; Atsushi Kawai; Tetsu Narumi; Kenji Yasuoka
    MOLECULAR SIMULATION, TAYLOR & FRANCIS LTD, 41, 10-12, 874-880, Aug. 2015, Peer-reviewed, Determining the solid-liquid phase transition point by conventional molecular dynamics (MD) simulations is difficult because the system tends to get trapped in local minimum-energy states at low temperatures and shows hysteresis during cooling and heating cycles. The replica exchange method, which performs many MD simulations of the system at different temperatures simultaneously and exchanges these temperatures at certain intervals, has been introduced as a tool to overcome this local-minimum problem. However, around the phase transition temperature, a greater number of different temperatures is required to locate the phase transition point adequately. In addition, the number of temperature values increases for larger systems, resulting in huge computation times. We propose a computational acceleration of replica exchange MD simulation on graphics processing units (GPUs) for studying first-order solid-liquid phase transitions of Lennard-Jones (LJ) fluids. The phase transition temperature for a 108-atom LJ fluid was calculated to validate our new code, and the result agrees with that of a previous study using the multicanonical ensemble. The computational speed was measured for various GPU-cluster sizes. A peak performance of 196.3 GFlops with one GPU and 8.13 TFlops with 64 GPUs was achieved.
    Scientific journal, English
  • Application of isotropic periodic sum method for 4-pentyl-4'-cyanobiphenyl liquid crystal
    Takuma Nozawa; Kazuaki Z. Takahashi; Shun Kameoka; Tetsu Narumi; Kenji Yasuoka
    MOLECULAR SIMULATION, TAYLOR & FRANCIS LTD, 41, 10-12, 927-935, Aug. 2015, Peer-reviewed, In future large-scale molecular dynamics (MD) simulations on parallel computers, the isotropic periodic sum (IPS) method is expected to reduce the cost of interaction calculations effectively while maintaining adequate accuracy. To assess the accuracy of this method for low-charge-density polymer systems, we performed atomistic MD simulations of the bulk state of liquid crystal systems based on 4-pentyl-4'-cyanobiphenyl (5CB). At 270 K ≤ T ≤ 320 K and normal pressure, the temperature dependence of the density, potential energy, and order parameter was estimated using the IPS and Ewald sum methods. The results of the IPS method and the Ewald sum were consistent within error. Under conditions close to the phase transition point, however, the averaged potential energy and order parameter showed small differences. We conclude that the fundamental physical properties of the bulk state of 5CB systems are determined reasonably well by the IPS method, at least under conditions that are not close to the phase transition point.
    Scientific journal, English
  • Acceleration of the Fast Multipole Method on FPGA Devices
    Hitoshi Ukawa; Tetsu Narumi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E98D, 2, 309-312, Feb. 2015, Peer-reviewed, The fast multipole method (FMM) for N-body simulations is attracting much attention since it requires minimal communication between computing nodes. We implemented hardware pipelines specialized for the FMM on an FPGA device, the GRAPE-9. An N-body simulation with 1.6 × 10^7 particles ran 16 times faster than on a CPU. Moreover, the particle-to-particle stage of the FMM on the GRAPE-9 executed 2.5 times faster than on a GPU in a limited case.
    Scientific journal, English
  • Acceleration of Othello Computer Game using an FPGA Tablet
    Tomoya Sato; Tetsu Narumi
    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), IEEE, WANC2, 581-584, 2015, Peer-reviewed, In this research, we accelerate the AI (artificial intelligence) portion of the Othello computer game using an FPGA (field-programmable gate array). In computer board games such as Go and Othello, the calculation time of the AI part increases with the number of game states and the look-ahead depth. We propose the use of an FPGA tablet running Android OS to accelerate these games. CPUs in mobile devices are generally relatively slow, so a dedicated circuit on an FPGA can easily outperform them. Using the FPGA also allows us to reconfigure the circuit for different applications while the operating system is running. By developing a pattern-matching circuit, we achieved a 2.5-fold speedup compared with a CPU.
    International conference proceedings, English
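  • Parallelization of Replica-Exchange Molecular Dynamics Simulation Using 1,024 GPUs
    Minoru Oikawa; Kentaro Nomura; Kenji Yasuoka; Tetsu Narumi
    IPSJ Transactions on Advanced Computing Systems (ACS), 48, Dec. 2014, Peer-reviewed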
    Scientific journal, Japanese
  • Parallel Molecular Dynamics Simulation with Replica-Exchange Method using 1,024 GPUs
    Minoru Oikawa; Kentaro Nomura; Kenji Yasuoka; Tetsu Narumi
    IPSJ Trans. ACS, Information Processing Society of Japan (IPSJ), 7, 4, 1-14, Dec. 2014, Peer-reviewed, We parallelized molecular dynamics simulation software written for GPUs in a single node so that it runs on GPUs across multiple nodes, by applying a GPU virtualization tool to the CUDA C/C++ program. To obtain the global minimum state of molecules efficiently, we used the replica-exchange method, which is suitable for parallel computation since it has large granularity and little communication between nodes. The GPU virtualization software enabled a single-node program to support multi-node calculation without any modification. Parallel efficiency was measured with up to 1,024 GPUs, and an efficiency of 87% was obtained even with 1,024 GPUs. The main cause of the efficiency loss at higher parallelism was analyzed quantitatively and found to be mostly the latency between nodes.
    Scientific journal, Japanese
  • Petascale molecular dynamics simulation using the fast multipole method on K computer
    Yousuke Ohno; Rio Yokota; Hiroshi Koyama; Gentaro Morimoto; Aki Hasegawa; Gen Masumoto; Noriaki Okimoto; Yoshinori Hirano; Huda Ibeid; Tetsu Narumi; Makoto Taiji
    COMPUTER PHYSICS COMMUNICATIONS, ELSEVIER SCIENCE BV, 185, 10, 2575-2585, Oct. 2014, Peer-reviewed, In this paper, we report all-atom simulations of molecular crowding, a result from the full-node simulation on the "K computer", a 10-PFLOPS supercomputer in Japan. The capability of this machine enables us to simulate crowded cellular environments, which are more realistic than conventional MD simulations in which proteins are simulated in isolation. Living cells are "crowded" because macromolecules comprise approximately 30% of their molecular weight. Recently, the effects of crowded cellular environments on protein stability have been revealed through in-cell NMR spectroscopy. To measure the performance of the "K computer", we performed all-atom classical molecular dynamics simulations of two systems: target proteins in a solvent, and target proteins in an environment of molecular crowders that mimic the conditions of a living cell. Using the full system, we achieved 4.4 PFLOPS during a 520 million-atom simulation with a cutoff of 28 Å. Furthermore, we discuss the performance and scaling of fast multipole methods for molecular dynamics simulations on the "K computer", as well as comparisons with Ewald summation methods.
    Scientific journal, English
  • Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
    Rio Yokota; L. A. Barba; Tetsu Narumi; Kenji Yasuoka
    Computer Physics Communications, 184, 3, 445-455, Mar. 2013, Peer-reviewed, This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as the numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method achieves just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. Under these conditions, the calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
    Scientific journal, English
  • Common force field thermodynamics of cholesterol
    Francesco Giangreco; Eiji Yamamoto; Yoshinori Hirano; Milan Hodoscek; Volker Knecht; Matteo Di Giosia; Matteo Calvaresi; Francesco Zerbetto; Kenji Yasuoka; Tetsu Narumi; Masato Yasui; Siegfried Höfinger
    The Scientific World Journal, Hindawi Publishing Corporation, 2013, 1-7, 2013, Peer-reviewed, Four different force fields are examined for dynamic characteristics using cholesterol as a case study. The extent to which various types of internal degrees of freedom become thermodynamically relevant is evaluated by means of principal component analysis. More complex degrees of freedom (angle bending, dihedral rotations) show a trend towards force field independence. Moreover, charge assignments for membrane-embedded compounds are revealed to be critical, with significant impact on biological reasoning.
    Scientific journal, English
  • DS-CUDA: a Middleware to Use Many GPUs in the Cloud Environment
    Minoru Oikawa; Atsushi Kawai; Kentaro Nomura; Kazuyuki Yoshikawa; Kenji Yasuoka; Tetsu Narumi
    SHPCloud workshop at SC12, 10-16, Nov. 2012, Peer-reviewed
    International conference proceedings, English
  • GPU-accelerated computation of electron transfer
    Siegfried Hoefinger; Angela Acocella; Sergiu C. Pop; Tetsu Narumi; Kenji Yasuoka; Titus Beu; Francesco Zerbetto
    JOURNAL OF COMPUTATIONAL CHEMISTRY, WILEY-BLACKWELL, 33, 29, 2351-2356, Nov. 2012, Peer-reviewed, Electron transfer is a fundamental process that can be studied with the help of computer simulation. The underlying quantum mechanical description renders the problem a computationally intensive application. In this study, we probe the graphics processing unit (GPU) for suitability to this type of problem. Time-critical components are identified via profiling of an existing implementation, and several different variants are tested involving the GPU at increasing levels of abstraction. A publicly available library supporting basic linear algebra operations on the GPU turns out to accelerate the computation approximately 50-fold, with minor dependence on actual problem size. The performance gain does not compromise numerical accuracy and is of significant value for practical purposes.
    Scientific journal, English
  • An Improved Isotropic Periodic Sum Method That Uses Linear Combinations of Basis Potentials
    Kazuaki Z. Takahashi; Tetsu Narumi; Donguk Suh; Kenji Yasuoka
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, AMER CHEMICAL SOC, 8, 11, 4503-4516, Nov. 2012, Peer-reviewed, Isotropic periodic sum (IPS) is a technique that calculates long-range interactions differently from conventional lattice sum methods. The difference between IPS and lattice sum methods lies in the shape and distribution of remote images for long-range interaction calculations. The images used in lattice sum calculations are identical to those generated from periodic boundary conditions and are discretely positioned at lattice points in space. The images for IPS calculations are "imaginary", which means they do not explicitly exist in a simulation system and are distributed isotropically and periodically around each particle. Two different versions of the original IPS method exist: the IPSn method is applied to calculations for point charges, whereas the IPSp method calculates polar molecules. However, both IPSn and IPSp have their advantages and disadvantages in simulating bulk water or water-vapor interfacial systems. In bulk water systems, the cutoff radius effect of IPSn strongly affects the configuration, whereas IPSp does not provide adequate estimations of water-vapor interfacial systems unless very long cutoff radii are used. To extend the applicability of the IPS technique, an improved IPS method, which has better accuracy in both homogeneous and heterogeneous systems, has been developed and named the linear-combination-based isotropic periodic sum (LIPS) method. This improved IPS method uses linear combinations of basis potentials. We performed molecular dynamics (MD) simulations of bulk water and water-vapor interfacial systems to evaluate the accuracy of the LIPS method. For bulk water systems, the LIPS method has better accuracy than IPSn in estimating thermodynamic and configurational properties without the countercharge assumption, which is used for IPSp. For water-vapor interfacial systems, LIPS has better accuracy than IPSp and properly estimates thermodynamic and configurational properties. In conclusion, the LIPS method can successfully estimate homogeneous and heterogeneous systems of polar molecules with good accuracy.
    Scientific journal, English
  • Optimization of Molecular Dynamics Core Program on the K computer
    Yousuke Ohno; Rio Yokota; Hiroshi Koyama; Gentaro Morimoto; Aki Hasegawa; Gen Masumoto; Tetsu Narumi; Makoto Taiji
    International Conference on Simulation Technology (JSST2012), 1-2, Sep. 2012
    International conference proceedings, English
  • Structural features of aquaporin 4 supporting the formation of arrays and junctions in biomembranes
    Siegfried Hoefinger; Eiji Yamamoto; Yoshinori Hirano; Francesco Zerbetto; Tetsu Narumi; Kenji Yasuoka; Masato Yasui
    BIOCHIMICA ET BIOPHYSICA ACTA-BIOMEMBRANES, ELSEVIER SCIENCE BV, 1818, 9, 2234-2243, Sep. 2012, Peer-reviewed, A limited class of aquaporins has been described to form regular arrays and junctions in membranes. The biological significance of these structures, however, remains uncertain. Here we analyze the underlying physical principles with the help of a computational procedure that takes into account protein-protein as well as protein-membrane interactions. Experimentally observed array/junction structures are systematically (dis)assembled and the major driving forces identified. Aquaporin 4 was found to be markedly different from the non-junction-forming aquaporin 1. The environmental stabilization resulting from embedding into the biomembrane was identified as the main driving force. This highlights the role of protein-membrane interactions in aquaporin 4. Analysis of the type presented here can help to decipher the biological role of membrane arrays and junctions formed by aquaporins.
    Scientific journal, English
  • Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability
    Atsushi Kawai; Kenji Yasuoka; Kazuyuki Yoshikawa; Tetsu Narumi
    The Fourth International Conference on Future Computational Technologies and Applications, FUTURE COMPUTING 2012, 22-27, Jul. 2012, Peer-reviewed
    International conference proceedings, English
  • DS-CUDA: a Middleware to Use Many GPUs in the Cloud Environment
    Minoru Oikawa; Atsushi Kawai; Kentaro Nomura; Kenji Yasuoka; Kazuyuki Yoshikawa; Tetsu Narumi
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), IEEE, 1207-1214, 2012, Peer-reviewed, GPGPU (general-purpose computing on graphics processing units) faces several difficulties in a cloud environment, such as narrow bandwidth, higher cost, and lower security, compared with computation using only CPUs. Most high performance computing applications require heavy communication between nodes and do not fit a cloud environment, since the network topology and its bandwidth are not fixed and they affect the performance of the application program. However, some applications need little communication, such as molecular dynamics (MD) simulation with the replica exchange method (REM). For such applications, we propose DS-CUDA (Distributed-Shared Compute Unified Device Architecture), a middleware to use many GPUs in a cloud environment with lower cost and higher security. It virtualizes GPUs in a cloud so that they appear to be locally installed GPUs on a client machine. Its redundancy mechanism ensures reliable calculation with consumer GPUs, which greatly reduces the cost. It also enhances the security level, since no data other than commands and data for the GPUs are stored on the cloud side. REM-MD simulation with 64 GPUs ran 58 and 36 times faster than on a locally installed GPU, via InfiniBand and the Internet, respectively.
    International conference proceedings, English
  • Cut-off radius effect of the isotropic periodic sum method for polar molecules in a bulk water system
    Kazuaki Takahashi; Tetsu Narumi; Kenji Yasuoka
    MOLECULAR SIMULATION, TAYLOR & FRANCIS LTD, 38, 5, 397-403, 2012, Peer-reviewed, Molecular dynamics simulation has been applied to water to compare the isotropic periodic sum (IPS) method for polar molecules (IPSp) with the normal IPS (IPSn) method and the Ewald sum by evaluating the diffusion coefficient and liquid structure. In our previous study, we applied the IPSn method to bulk water and found a notable deviation of the radial distribution function g(r). In this work, IPSp gives a good estimation of the potential energy and the self-diffusion coefficient at a cut-off radius, r_c, greater than 2.2 nm, while avoiding the notable deviation of g(r) that appeared in the case of IPSn. The distance-dependent Kirkwood factor G_K(r) was also calculated, and the truncation of long-range interactions in cut-off-like methods (such as cut-off with or without a switch function, and the reaction field) shows serious shortcomings for dipole-dipole correlations in bulk water systems. This was observed by comparing the shape with that of the Ewald sum: G_K(r) of the cut-off-like methods deviates greatly from that of the Ewald sum. However, the discrepancy in G_K(r) for the IPSp method was found to be much smaller than that of other typical cut-off-like methods. We conclude that the IPSp method is an adequately accurate technique for estimating transport coefficients and the liquid structure of water in a homogeneous system at long cut-off distances.
    Scientific journal, English
  • A combination of the tree-code and IPS method to simulate large scale systems by molecular dynamics
    Kazuaki Z. Takahashi; Tetsu Narumi; Kenji Yasuoka
    JOURNAL OF CHEMICAL PHYSICS, AMER INST PHYSICS, 135, 17, 174108, Nov. 2011, Peer-reviewed, An IPS/Tree method, which combines the isotropic periodic sum (IPS) method and a tree-based method, was developed for large-scale molecular dynamics simulations, such as of biological and polymer systems, that need hundreds of thousands of molecules. The tree-based method uses a hierarchical tree structure to reduce the calculation cost of long-range interactions. IPS/Tree is an efficient method, like IPS/DFFT (a combination of the IPS method and FFT), for calculating large-scale systems that require massively parallel computers. The IPS method has two different versions: IPSn and IPSp. The basic idea is the same, except that the IPSn method is applied to calculations for point charges, while the IPSp method is used to calculate polar molecules. The concept of the IPS/Tree method is available for both IPSn and IPSp, as IPSn/Tree and IPSp/Tree. Even though the accuracy of Coulomb forces with the tree-based method is well known, the accuracy of the combination of the IPS and tree-based methods is unclear. Therefore, to evaluate the accuracy of the IPS/Tree method, we performed molecular dynamics simulations of 32,000 bulk water molecules, which contain around 10^5 point charges. IPSn/Tree and IPSp/Tree were both applied to the calculation of Coulombic interactions. The accuracy of the Coulombic forces and other physical properties of bulk water systems was evaluated. The IPSp/Tree method not only has a reasonably small error in estimating Coulombic forces, but the error was almost the same as the theoretical error of the ordinary tree-based method. These facts show that the algorithm of the tree-based method can be successfully applied to the IPSp method. On the other hand, IPSn/Tree has a relatively large error, which seems to derive from the interaction treatment of the original IPSn method. The self-diffusion coefficient and radial distribution function of water were calculated with both the IPSn/Tree and IPSp/Tree methods, and both showed reasonable agreement with the Ewald method. In conclusion, the IPSp/Tree method is a potentially fast and sufficiently accurate technique for predicting transport coefficients and liquid structures of water in a homogeneous system. doi:10.1063/1.3658640
    Scientific journal, English
  • Fast Quasi Double-Precision Method with Single-Precision Hardware to Accelerate Scientific Applications
    Tetsu Narumi; Tsuyoshi Hamada; Keigo Nitadori; Ryuji Sakamaki; Kenji Yasuoka
    INTERNATIONAL JOURNAL OF COMPUTATIONAL METHODS, WORLD SCIENTIFIC PUBL CO PTE LTD, 8, 3, 561-581, Sep. 2011, Peer-reviewed, Recent commodity hardware, such as the Cell processor in the PLAYSTATION 3 or the GeForce GTX 280 GPU, has much higher peak single-precision performance than the CPUs of PCs, while its double-precision performance is only comparable to theirs. Although a quasi double-precision method achieves accuracy comparable to double precision, its performance is relatively low. In this paper, the quasi double-precision method is modified so that it delivers higher performance than the native double precision of the GPU. Calculation of a Mandelbrot set and a molecular dynamics (MD) simulation are used to test the method.
    Scientific journal, English
  • Fast Calculation of Electrostatic Potentials on the GPU or the ASIC MD-GRAPE-3
    Tetsu Narumi; Kenji Yasuoka; Makoto Taiji; Francesco Zerbetto; Siegfried Hoefinger
    COMPUTER JOURNAL, OXFORD UNIV PRESS, 54, 7, 1181-1187, Jul. 2011, Peer-reviewed, Electrostatic potentials (ESPs) are frequently used in structural biology for the characterization of biomolecules. Here we study the use of hardware accelerators such as the graphics processing unit or the application-specific integrated circuit MD-GRAPE-3 for the efficient computation of ESPs. An algorithm closely coupled to the general description of molecular surfaces is ported to both specialized architectures. The high-level interface library MR1/3 is used, which greatly simplifies the porting process. Hardware-accelerated versions show significant speed-up factors of up to 27x. Once ESP computations take only seconds, the underlying application can be offered as a web service.
    Scientific journal, English
  • Cutoff radius effect of the isotropic periodic sum and Wolf method in liquid-vapor interfaces of water
    Kazuaki Z. Takahashi; Tetsu Narumi; Kenji Yasuoka
    JOURNAL OF CHEMICAL PHYSICS, AMER INST PHYSICS, 134, 17, 174112, May 2011, Peer-reviewed, As computation methods that are more economical than the Ewald sum but similarly accurate, the isotropic periodic sum (IPS) method for nonpolar molecules (IPSn) and polar molecules (IPSp), along with the Wolf method, are of interest, but their cutoff radius dependence is an important issue. To evaluate the cutoff radius effect of the three methods, a water-vapor interfacial system has been studied by molecular dynamics. The Wolf method can produce adequate results for surface tension compared with the Ewald sum (within 2.9%) at a long enough cutoff radius, r_c. However, its estimation of the electrostatic potential profile and dipole orientational function is poor. The Wolf method cannot estimate the electrostatic configuration at r_c ≤ L_z/2 (L_z is the longest lattice length of the system). We found that the surface tension and electrostatic configuration converge faster with the IPSn method than with the IPSp method. Moreover, the IPSn method is the most accurate of the three methods for the same cutoff radius. Furthermore, the behavior of the surface tension against the cutoff radius differs considerably between the IPSn and IPSp methods. The surface tension of the IPSp method fluctuates and gives a result similar to that of the Ewald sum, but the surface tension for the IPSn method deviates greatly near r_c = L_z/3. The cause of this deviation is the difference between the interfacial configuration of the water surface and the cutoff treatment of the IPS method. The deviation becomes insignificant far from r_c = L_z/3. In spite of this shortcoming, the IPSn method gives the most accurate result in estimating the surface tension at r_c = L_z/2. From all the results in this work, the IPSn and IPSp methods have been found to be more accurate than the Wolf method. In conclusion, the surface tension and structure of the water-vapor interface can be calculated by the IPSn method when r_c is greater than or equal to the longest lattice length of the system. The IPSp method and the Wolf method require a cutoff radius longer than the longest lattice length of the system to estimate interfacial properties. doi:10.1063/1.3578473
    Scientific journal, English
  • Thermodynamic properties of methane/water interface predicted by molecular dynamics simulations
    Ryuji Sakamaki; Amadeu K. Sum; Tetsu Narumi; Ryo Ohmura; Kenji Yasuoka
    JOURNAL OF CHEMICAL PHYSICS, AMER INST PHYSICS, 134, 14, 144702, Apr. 2011, Peer-reviewed, Molecular dynamics simulations have been performed to examine the thermodynamic properties of the methane/water interface using two different water models, TIP4P/2005 and SPC/E, and two sets of combining rules. The density profiles, interfacial tensions, surface excesses, surface pressures, and coexisting densities are calculated over a wide range of pressure conditions. The TIP4P/2005 water model, used with a combining rule between water and methane optimized to fit the solubility, provides good predictions of interfacial properties. The use of the infinite dilution approximation to calculate the surface excesses from the interfacial tensions is examined by comparing the surface pressures obtained by different approaches. It is shown that both the change in methane solubility with pressure and the position of the maximum of the methane density profile at the interface are independent of pressure up to about 2 MPa. We have also calculated the adsorption enthalpies and entropies to describe the temperature dependence of the adsorption. (C) 2011 American Institute of Physics. [doi:10.1063/1.3579480]
    Scientific journal, English
  • Molecular dynamics simulations of vapor/liquid coexistence using the nonpolarizable water models
    Ryuji Sakamaki; Amadeu K. Sum; Tetsu Narumi; Kenji Yasuoka
    JOURNAL OF CHEMICAL PHYSICS, AMER INST PHYSICS, 134, 12, 124708, Mar. 2011, Peer-reviewed, The surface tension, vapor-liquid equilibrium densities, and equilibrium pressure for common water models were calculated using molecular dynamics simulations over temperatures ranging from the melting point to the critical point. The TIP4P/2005 and TIP4P-i models produced better values for the surface tension than the other water models. We also examined the correlation of the data with scaling temperatures based on the critical and melting temperatures. The reduced temperature (T/T_c) gives consistent equilibrium densities and pressure, and the shifted temperature T + (T_c,exp - T_c,sim) gives consistent surface tension among all models considered in this study. A modified fixed-charge model, which has the same Lennard-Jones parameters as the TIP4P-FQ model but uses an adjustable molecular dipole moment, is also simulated to find the differences in the vapor-liquid coexistence properties between fixed and fluctuating charge models. The TIP4P-FQ model (2.72 Debye) gives the best estimate of the experimental surface tension. The equilibrium vapor density and pressure are unaffected by changes in the dipole moment as well as the surface tension and liquid density. (C) 2011 American Institute of Physics. [doi:10.1063/1.3574038]
    Scientific journal, English
  • CUTOFF RADIUS EFFECT OF WATER CONFIGURATION USING THE WOLF METHOD
    Kazuaki Takahashi; Tetsu Narumi; Kenji Yasuoka
    PROCEEDINGS OF THE ASME/JSME 8TH THERMAL ENGINEERING JOINT CONFERENCE 2011, VOL 3, AMER SOC MECHANICAL ENGINEERS, 615-+, 2011, Peer-reviewed, Molecular dynamics simulation has been applied to water to compare the Wolf method with the IPS method and the Ewald sum by evaluating the diffusion coefficient and liquid structure. In our previous study, we applied the IPS method to bulk water and found a notable deviation of the radial distribution function g(r). The Wolf method gives a good estimate of the potential energy and the self-diffusion coefficient at a cutoff radius, r_c, greater than 2.2 nm while avoiding the notable deviation of g(r) that appeared in the case of IPS. The distance-dependent Kirkwood factor G_K(r) was also calculated; the truncation of long-range interactions in cutoff-like methods (such as cutoff with or without the switch function and the reaction field) shows serious shortcomings for dipole-dipole correlations in bulk water systems. This was observed by comparing the shape of G_K(r) with that of the Ewald sum: G_K(r) of the cutoff-like methods deviates greatly from that of the Ewald sum. However, the discrepancy of G_K(r) for the Wolf method was found to be much smaller than that of other typical cutoff-like methods. We conclude that the Wolf method is an adequately accurate technique for estimating transport coefficients and the liquid structure of water in a homogeneous system at long cutoff distances.
    International conference proceedings, English
  • Accelerating Molecular Dynamics Simulation Using Graphics Processing Unit
    Hun Joo Myung; Ryuji Sakamaki; Kwang Jin Oh; Tetsu Narumi; Kenji Yasuoka; Sik Lee
    BULLETIN OF THE KOREAN CHEMICAL SOCIETY, KOREAN CHEMICAL SOC, 31, 12, 3639-3643, Dec. 2010, Peer-reviewed, We have developed a CUDA-enabled version of a general-purpose molecular dynamics simulation code for the GPU. Implementation details, including the parallelization scheme and performance optimization, are described. Here we have focused on the non-bonded force calculation because it is the most time-consuming part of molecular dynamics simulation. Timing results using the CUDA-enabled and CPU versions were obtained and compared for a biomolecular system containing 23,558 atoms. The CUDA-enabled versions were found to be faster than the CPU version. This suggests that the GPU could be useful hardware for molecular dynamics simulation.
    Scientific journal, English
  • Cutoff radius effect of the isotropic periodic sum method in homogeneous system. II. Water
    Kazuaki Takahashi; Tetsu Narumi; Kenji Yasuoka
    JOURNAL OF CHEMICAL PHYSICS, AMER INST PHYSICS, 133, 1, 014109, Jul. 2010, Peer-reviewed, Molecular dynamics simulation has been applied to water to compare the isotropic periodic sum (IPS) method [X. Wu and B. R. Brooks, J. Chem. Phys. 122, 044107 (2005)] with the Ewald sum based on the diffusion coefficient and liquid structure. The IPS method gives a good estimate of the self-diffusion coefficient at a cutoff radius, r_c, greater than 2.2 nm; however, the radial distribution function g(r) shows a notable deviation. The peak of this deviation appears at specific intermolecular distances near each cutoff radius, and its height decreases in proportion to the inverse cube of r_c. Thus the deviation becomes insignificant (less than 1%) at r_c greater than 2.2 nm. The distance-dependent Kirkwood factor G_K(r) was also calculated. The truncation of long-range interactions in cutoff-like methods (such as cutoff with or without the switch function and the reaction field) shows serious shortcomings for dipole-dipole correlations in bulk water systems, as observed by comparing the shape of G_K(r) with that of the Ewald sum [Y. Yonetani, J. Chem. Phys. 124, 204501 (2006); D. van der Spoel and P. J. van Maaren, J. Chem. Theory Comput. 2, 1 (2006)]. The G_K(r) of the cutoff-like methods deviates greatly from that of the Ewald sum. However, the discrepancy of G_K(r) for the IPS method was found to be much smaller than that of other typical cutoff-like methods. In conclusion, the IPS method is an adequately accurate technique for estimating transport coefficients and the liquid structure of water in a homogeneous system at long cutoff distances. (C) 2010 American Institute of Physics. [doi:10.1063/1.3462241]
    Scientific journal, English
  • A 281 Tflops Calculation for X-ray Protein Structure Analysis with Special-Purpose Computers MDGRAPE-3
    Yousuke Ohno; Eiji Nishibori; Tetsu Narumi; Takahiro Koishi; Tahir H. Tahirov; Hideo Ago; Masashi Miyano; Ryutaro Himeno; Toshikazu Ebisuzaki; Makoto Sakata; Makoto Taiji
    2007 ACM/IEEE SC07 CONFERENCE, IEEE, USB memory, 1-+, 2010, Peer-reviewed, We have achieved a sustained calculation speed of 281 Tflops for the optimization of the 3-D structures of proteins from X-ray experimental data by the Genetic Algorithm - Direct Space (GA-DS) method. In this calculation we used MDGRAPE-3, a special-purpose computer for molecular simulations, with a peak performance of 752 Tflops. In the GA-DS method, a set of selected parameters that define the crystal structures of proteins is optimized by the Genetic Algorithm. As a criterion to evaluate the model parameters, we used the reliability factor R_1, which indicates the statistical difference between the calculated and measured diffraction data. To evaluate this factor, it is necessary to reconstruct the diffraction patterns of the model structures every time the model is updated. Therefore, in this method the nonequispaced Discrete Fourier Transformation (DFT) used to calculate the diffraction patterns accounts for most of the computation time. To accelerate the DFT calculations, we used the special-purpose computer MDGRAPE-3. The molecule investigated was Carbamoyl-Phosphate Synthetase. The final reliability factors were much smaller than the typical values obtained with other methods such as the Molecular Replacement (MR) method. Our results demonstrate that high-performance computing with the GA-DS method on special-purpose computers is effective for the structure determination of biological molecules, and the method has the potential to be widely used in the near future.
    International conference proceedings, English
  • Current Performance Gains From Utilizing the GPU or the ASIC MDGRAPE-3 Within an Enhanced Poisson Boltzmann Approach
    Tetsu Narumi; Kenji Yasuoka; Makoto Taiji; Siegfried Hoefinger
    JOURNAL OF COMPUTATIONAL CHEMISTRY, JOHN WILEY & SONS INC, 30, 14, 2351-2357, Nov. 2009, Peer-reviewed, Scientific applications frequently suffer from limited compute performance. In this article, we investigate the suitability of specialized computer chips to overcome this limitation. An enhanced Poisson Boltzmann program is ported to the graphics processing unit and to the application-specific integrated circuit MDGRAPE-3, and the resulting execution times are compared with the conventional performance obtained on a modern central processing unit. Speedup factors are measured and an analysis of numerical accuracy is provided. On both specialized architectures the improvement increases with problem size and reaches a speedup factor of up to 39x for the largest problem studied. This type of alternative high-performance computing can significantly improve the performance of demanding scientific applications. (C) 2009 Wiley Periodicals, Inc. J Comput Chem 30: 2351-2357, 2009
    Scientific journal, English
  • Fast multipole methods on a cluster of GPUs for the meshless simulation of turbulence
    R. Yokota; T. Narumi; R. Sakamaki; S. Kameoka; S. Obi; K. Yasuoka
    COMPUTER PHYSICS COMMUNICATIONS, ELSEVIER SCIENCE BV, 180, 11, 2066-2078, Nov. 2009, Peer-reviewed, Recent advances in the parallelizability of fast N-body algorithms and in the programmability of graphics processing units (GPUs) have opened a new path for particle-based simulations. For the simulation of turbulence, vortex methods can now be considered an interesting alternative to finite difference and spectral methods. The present study focuses on the efficient implementation of the fast multipole method and the pseudo-particle method on a cluster of NVIDIA GeForce 8800 GT GPUs, and applies this to a vortex method calculation of homogeneous isotropic turbulence. The results of the present vortex method agree quantitatively with those of the reference calculation using a spectral method. We achieved a maximum speed of 7.48 TFlops using 64 GPUs, and the cost performance was about $9.4/GFlops. The calculation of the present vortex method on 64 GPUs took 4120 s, while the spectral method on 32 CPUs took 4910 s. (C) 2009 Elsevier B.V. All rights reserved.
    Scientific journal, English
  • High-Performance Drug Discovery: Computational Screening by Combining Docking and Molecular Dynamics Simulations
    Noriaki Okimoto; Noriyuki Futatsugi; Hideyoshi Fuji; Atsushi Suenaga; Gentaro Morimoto; Ryoko Yanai; Yousuke Ohno; Tetsu Narumi; Makoto Taiji
    PLOS COMPUTATIONAL BIOLOGY, PUBLIC LIBRARY SCIENCE, 5, 10, e1000528, Oct. 2009, Peer-reviewed, Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, this method is not completely reliable and therefore unsatisfactory. In this study, we used massive molecular dynamics simulations of protein-ligand conformations obtained by molecular docking in order to improve the enrichment performance of molecular docking. Our screening approach employed the molecular mechanics/Poisson-Boltzmann and surface area method to estimate the binding free energies. For the top-ranking 1,000 compounds obtained by docking to a target protein, approximately 6,000 molecular dynamics simulations were performed using multiple docking poses in about a week. As a result, the enrichment performance of the top 100 compounds by our approach was 1.6-4.0 times that of molecular docking. This result indicates that the application of molecular dynamics simulations to virtual screening for lead discovery is both effective and practical. However, further optimization of the computational protocols is required for screening various target proteins.
    Scientific journal, English
  • High-Performance Quasi Double-Precision Method using Single-Precision Hardware for Molecular Dynamics Simulations with GPUs
    Tetsu Narumi; Tsuyoshi Hamada; Keigo Nitadori; Ryuji Sakamaki; Shun Kameoka; Kenji Yasuoka
    HPC Asia 2009, 160-167, Mar. 2009, Peer-reviewed
    International conference proceedings, English
  • Using Special-Purpose and Video-Game Computers for Accelerating Molecular Dynamics Simulations
    Tetsu Narumi; Ryuji Sakamaki; Shun Kameoka; Kenji Yasuoka
    MOLECULAR SIMULATION IN MATERIAL AND BIOLOGICAL RESEARCH, NOVA SCIENCE PUBLISHERS, INC, 29-40, 2009, Peer-reviewed, Molecular Dynamics (MD) simulation requires huge computational power, as each atom interacts with the others through long-range forces such as the Coulomb or van der Waals forces. Recently, video game computers, such as the SONY PLAYSTATION 3 (PS3) or NVIDIA's Graphics Processing Unit (GPU), have become candidate hardware for accelerating MD simulations, as has the MDGRAPE-3 special-purpose computer, owing to their better performance than current PC CPUs and their cost-effectiveness. We compared the performance of gravitational N-body and MD simulations on four hardware platforms: CPU (dual Intel Xeon E5430), PS3, GPU (NVIDIA GeForce 9800 GTX), and MDGRAPE-3. For the gravitational N-body simulation, the GPU is the fastest and is nearly the best in the other criteria as well: cost/performance, power/performance, and size/performance. For MD simulation with the AMBER MD package, the MDGRAPE-3 is the fastest and the best for power/performance, while the PS3 is the best for cost/performance.
    International conference proceedings, English
  • 42 TFlops Hierarchical N-body Simulations on GPUs with Applications in both Astrophysics and Turbulence
    Tsuyoshi Hamada; Rio Yokota; Keigo Nitadori; Tetsu Narumi; Kenji Yasuoka; Makoto Taiji
    PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS, IEEE, USB memory, 2009, Peer-reviewed, As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N^2), the present method calculates the O(N log N) treecode and the O(N) fast multipole method (FMM) on the GPUs with unprecedented efficiency. We demonstrate the performance of our method with one standard application, a gravitational N-body simulation, and one non-standard application, a simulation of turbulence using vortex particles. The gravitational simulation using the treecode with 1,608,044,129 particles showed a sustained performance of 42.15 TFlops. The vortex particle simulation of homogeneous isotropic turbulence using the periodic FMM with 16,777,216 particles showed a sustained performance of 20.2 TFlops. The overall cost of the hardware was 228,912 dollars. The maximum corrected performance is 28.1 TFlops for the gravitational simulation, which results in a cost performance of 124 MFlops/$. This correction is performed by counting the Flops based on the most efficient CPU algorithm. Any extra Flops that arise from the GPU implementation and parameter differences are not included in the 124 MFlops/$.
    International conference proceedings, English
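  • Fast molecular dynamics simulation of water surface tension using graphics cards
    Ryuji Sakamaki; Tetsu Narumi; Kenji Yasuoka
    IPSJ Transactions on Advanced Computing Systems (ACS), 2, 2, 89-97, 2009, Peer-reviewed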
    Scientific journal, Japanese
  • JINR CICC in computational chemistry and nanotechnology problems: DL_POLY performance for different communication architectures
    E. Dushanov; Kh Kholmurodov; G. Aru; V. Korenkov; W. Smith; Y. Ohno; T. Narumi; G. Morimoto; M. Taiji; K. Yasuoka
    Physics of Particles and Nuclei Letters, 6, 3, 251-259, 2009, Peer-reviewed, This report compares the performance of the DL_POLY general-purpose molecular dynamics simulation package on the LIT JINR computing cluster CICC with various communication systems. The comparison involved two cluster architectures: Gigabit Ethernet and InfiniBand. The code performance tests include a comparison of the CICC cluster with the special-purpose computer MDGRAPE-3, developed at RIKEN for high-speed acceleration of MD (molecular dynamics) simulations without a fixed cutoff. The DL_POLY benchmark covers a set of typical MD system simulations detailed below. © Pleiades Publishing, Ltd. 2009.
    Scientific journal, English
  • Knoppix for CUDA: a CD-bootable GPGPU training environment
    Tsuyoshi Hamada; Tetsu Narumi; Fumikazu Konishi; Kenji Yasuoka; Kiyoshi Oguri; Yuichiro Shibata; Makoto Taiji
    Journal of the Visualization Society of Japan, The Visualization Society of Japan, 28, 1, 249-254, 01 Jul. 2008
    Scientific journal, Japanese
  • Overheads in Accelerating Molecular Dynamics Simulations with GPUs
    Tetsu Narumi; Ryuji Sakamaki; Shun Kameoka; Kenji Yasuoka
    PDCAT 2008: NINTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, IEEE COMPUTER SOC, 143-150, 2008, Peer-reviewed, Molecular Dynamics (MD) simulation requires huge computational power, as each atom interacts with the others through long-range forces such as the Coulomb or van der Waals forces. Recently, video game computers, such as the SONY PLAYSTATION 3 (PS3) or NVIDIA's Graphics Processing Unit (GPU), have become candidate hardware for accelerating MD simulations, as has the MDGRAPE-3 special-purpose computer, owing to their better performance than current PC CPUs and their cost-effectiveness. In particular, the latest GPU has a much higher peak performance than a PC CPU or an MDGRAPE-3, although a GPU incurs much larger overheads in accelerating MD simulations. When the number of particles is small or the calculation kernel becomes complicated, the performance of the GPU drops dramatically, to as low as that of the MDGRAPE-3. However, the acceleration ratio per cost of the GPU and the PS3 exceeds that of the MDGRAPE-3.
    International conference proceedings, English
  • Cell size dependence of orientational order of uniaxial liquid crystals in flat slit
    Toshiki Mima; Tetsu Narumi; Shun Kameoka; Kenji Yasuoka
    MOLECULAR SIMULATION, TAYLOR & FRANCIS LTD, 34, 8, 761-773, 2008, Peer-reviewed, In order to investigate the ordered structure of nematic liquid crystal molecules confined in a nanoslit, we carried out classical molecular dynamics simulations of uniaxial prolate Gay-Berne particles in a flat, structureless slit at several temperatures. When the slit gap is so small that the system cannot be regarded as bulk, particles in the slit possess orientationally ordered structures different from those in the bulk. A weak spatial orientational correlation existed when the temperature corresponded to the isotropic phase in the bulk system. The first-order isotropic-nematic phase transition was not clearly observed, and the transitional phenomenon of the creation and annihilation of uniaxial domains was observed. These results revealed that the ordered structure depends on the number of particles, in other words, the cell size, and that a system with 100,000 or more particles gives reasonable results for an infinitely wide slit. This number of particles corresponds to up to 220 particles along the length of the base.
    Scientific journal, English
  • ACCELERATING MOLECULAR DYNAMICS SIMULATIONS ON PLAYSTATION 3 PLATFORM USING VIRTUAL-GRAPE PROGRAMMING MODEL
    Tetsu Narumi; Shun Kameoka; Makoto Taiji; Kenji Yasuoka
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, SIAM PUBLICATIONS, 30, 6, 3108-3125, 2008, Peer-reviewed, Molecular dynamics (MD) simulation requires huge computational power because each atom interacts with the others through long-range forces, such as Coulomb and van der Waals forces. Therefore, parallel computers or special accelerators, such as MDGRAPE-3, are required for accelerating MD simulations. A video game processor in a Sony PlayStation 3 or NVIDIA's graphics accelerator card is also candidate hardware for accelerating MD simulations, because the peak performance of the latest video game processors exceeds that of a current PC's CPU, and they are also very cost-effective. However, software development for these processors requires much more time than for PC CPUs because the hardware is highly parallel. We propose the Virtual-GRAPE programming model to utilize the hardware resources of video game processors with minimal software development time. GRAPE is a special-purpose computer used to accelerate particle-based simulations, such as astrophysical or MD simulations. Under the Virtual-GRAPE model, the subroutine whose calculation is accelerated by the special hardware, GRAPE, is replaced with a specially tuned subroutine that runs without the accelerator. We implemented this model on a PlayStation 3 to accelerate the "sander" MD module in the AMBER software package. We achieved a 20-fold acceleration compared with a serial job using an Intel Xeon 5160 processor. Its cost-performance is far superior to that of a PC or an MDGRAPE-3. To obtain the highest performance from the subroutine, most of the arithmetic operations in the tuned routine were performed in single precision, which is sufficient for MD simulations.
    Scientific journal, English
  • Cutoff radius effect of isotropic periodic sum method for transport coefficients of Lennard-Jones liquid
    Kazuaki Takahashi; Kenji Yasuoka; Tetsu Narumi
    JOURNAL OF CHEMICAL PHYSICS, AMER INST PHYSICS, 127, 11, 114511, Sep. 2007, Peer-reviewed, Molecular dynamics simulations of a Lennard-Jones (LJ) liquid were applied to compare the isotropic periodic sum (IPS) method [X. Wu and B. R. Brooks, J. Chem. Phys. 122, 044107 (2005)], which can reduce the calculation cost of long-range interactions such as the Lennard-Jones and Coulombic ones, with the cutoff method for the transport coefficients, which include the self-diffusion coefficient, bulk viscosity, and thermal conductivity. The self-diffusion coefficient, bulk viscosity, and thermal conductivity were estimated with reasonable accuracy if the cutoff distance of the LJ potential for the IPS method was greater than 3σ. The IPS method is an effective technique for estimating the transport coefficients of the Lennard-Jones liquid in a homogeneous system. (c) 2007 American Institute of Physics.
    Scientific journal, English
  • Folding dynamics of 10-residue beta-hairpin peptide chignolin
    Atsushi Suenaga; Tetsu Narumi; Noriyuki Futatsugi; Ryoko Yanai; Yousuke Ohno; Noriaki Okimoto; Makoto Taiji
    CHEMISTRY-AN ASIAN JOURNAL, WILEY-V C H VERLAG GMBH, 2, 5, 591-598, 2007, Peer-reviewed, Short peptides that fold into beta-hairpins are ideal model systems for investigating the mechanism of protein folding because their folding process shows dynamics typical of proteins. We performed folding, unfolding, and refolding molecular dynamics simulations (a total of 2.7 μs) of the 10-residue beta-hairpin peptide chignolin, which is the smallest beta-hairpin structure known to be stable in solution. Our results revealed the folding mechanism of chignolin, which comprises three steps. First, folding begins with hydrophobic assembly, which brings the main chain together; subsequently, a nascent turn structure is formed. The second step is the conversion of the nascent turn into a tight turn structure, along with interconversion of the hydrophobic packing and interstrand hydrogen bonds. Finally, the formation of the hydrogen-bond network and the complete hydrophobic core, as well as the arrangement of side-chain-side-chain interactions, occur at approximately the same time. This three-step mechanism appropriately interprets the folding process as a combination of previous, inconsistent explanations of the beta-hairpin folding mechanism: that the first folding event is the formation of hydrogen bonds and the second is that of the hydrophobic core, or vice versa.
    Scientific journal, English
  • A 55 Tflops Simulation of Amyloid-forming Peptides from Yeast Prion Sup35 with the Special-Purpose Computer System MDGRAPE-3
    Tetsu Narumi; Yousuke Ohno; Noriaki Okimoto; Takahiro Koishi; Atsushi Suenaga; Noriyuki Futatsugi; Yoko Yanai; Ryutaro Himeno; Shigenori Fujikawa; Mitsuru Ikei; Makoto Taiji
    SC2006, CDROM, Nov. 2006, Peer-reviewed
    International conference proceedings, English
  • Structure and dynamics of RNA polymerase II elongation complex
    A Suenaga; N Okimoto; N Futatsugi; Y Hirano; T Narumi; Y Ohno; R Yanai; T Hirokawa; T Ebisuzaki; A Konagaya; M Taiji
    BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, ACADEMIC PRESS INC ELSEVIER SCIENCE, 343, 1, 90-98, Apr. 2006, Peer-reviewed, RNA polymerase (Pol) II is a fundamental and important enzyme in the transcription process. However, two mysterious questions have remained unsolved: how an unwound bubble of DNA is established and maintained, and how the enzyme moves along the DNA. To answer these questions, we constructed a model structure of the Pol II elongation complex with 50 base pairs of DNA and 24 bases of RNA, including the unwound bubble of DNA, and performed a molecular dynamics simulation. We obtained a reliable model structure of the Pol II elongation complex in the pre-translocation state, which has not yet been determined by X-ray crystallography. The model structure revealed that multiple protein loops work in concert to form and maintain the bubble structure. We also found that the conformational change of a loop in Pol II, fork loop 1, is coupled with the unidirectional movement of Pol II along the DNA. (c) 2006 Elsevier Inc. All rights reserved.
    Scientific journal, English
  • Novel mechanism of interaction of p85 subunit of phosphatidylinositol 3-kinase and ErbB3 receptor-derived phosphotyrosyl peptides
    A Suenaga; N Takada; M Hatakeyama; M Ichikawa; XM Yu; K Tomii; N Okimoto; N Futatsugi; T Narumi; M Shirouzu; S Yokoyama; A Konagaya; M Taiji
    JOURNAL OF BIOLOGICAL CHEMISTRY, AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC, 280, 2, 1321-1326, Jan. 2005, Peer-reviewed, Ligand-activated and tyrosine-phosphorylated ErbB3 receptor binds to the SH2 domain of the p85 subunit of phosphatidylinositol 3-kinase and initiates intracellular signaling. Here, we studied the interactions between the N-terminal (N-SH2) and C-terminal (C-SH2) SH2 domains of the p85 subunit of phosphatidylinositol 3-kinase and eight ErbB3 receptor-derived phosphotyrosyl peptides (P-peptides) by using molecular dynamics, free energy, and surface plasmon resonance (SPR) analyses. In SPR analysis, these P-peptides showed no binding to the C-SH2 domain, but P-peptides containing a phospho-YXXM or a non-phospho-YXXM motif did bind to the N-SH2 domain. The N-SH2 domain has two phosphotyrosine binding sites, in its N-terminal (N1) and C-terminal (N2) regions. Interestingly, we found that the P-peptides of pY1180 and pY1241 favored binding to the N2 site, whereas all other P-peptides showed favorable binding to the N1 site. Remarkably, two phosphotyrosines, pY1178 and pY1243, which are just 63 amino acids apart from pY1241 and pY1180, respectively, showed favorable binding to the N1 site. These findings suggest the possibility that the pair of phosphotyrosines pY1178-pY1241 or pY1243-pY1180 folds into an appropriate configuration for binding to the N1 and N2 sites simultaneously. Our model structures of the cytoplasmic C-terminal domain of the ErbB3 receptor also strongly supported this speculation. The calculated binding free energies between the N-SH2 domain and P-peptides showed excellent qualitative agreement with SPR data, with a correlation coefficient of 0.91. The total electrostatic solvation energy between the N-SH2 domain and P-peptide was the dominant factor for its binding affinity.
    Scientific journal, English
  • Nanoscale hydrophobic interaction and nanobubble nucleation
    T Koishi; S Yoo; K Yasuoka; XC Zeng; T Narumi; R Susukita; A Kawai; H Furusawa; A Suenaga; N Okimoto; N Futatsugi; T Ebisuzaki
    PHYSICAL REVIEW LETTERS, AMER PHYSICAL SOC, 93, 18, 185701-1-4, Oct. 2004, Peer-reviewed, We report large-scale atomistic simulation of midrange nanoscale hydrophobic interaction, manifested by the nucleation of a nanobubble between nanometer-sized hydrophobes at constrained equilibrium. When the length scale of the hydrophobes is greater than 2 nm, nanobubble formation shows hysteresis behavior resembling a first-order transition. Calculation of the potential of mean force versus interhydrophobe distance provides a quantitative measure of the strength of the nanoscale hydrophobic interaction.
    Scientific journal, English
  • Simulations of magnetic materials with MDGRAPE-2
    BG Elmegreen; RH Koch; ME Schabes; T Crawford; T Ebisuzaki; H Furusawa; T Narumi; R Susukita; K Yasuoka
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, IBM CORP, 48, 2, 199-207, Mar. 2004, Peer-reviewed, The use of accelerator hardware for micromagnetics simulations is described, along with some initial results. The accelerator calculates the dipole interactions at 400 gigaflops, allowing large simulations to be performed with arbitrary geometries. Two research programs are highlighted: the simulation of a curved MRAM cell and the simulation of the write head in a computer disk drive.
    Scientific journal, English
  • Tyr-317 phosphorylation increases Shc structural rigidity and reduces coupling of domain motions remote from the phosphorylation site as revealed by molecular dynamics simulations
    A Suenaga; AB Kiyatkin; M Hatakeyama; N Futatsugi; N Okimoto; Y Hirano; T Narumi; A Kawai; R Susukita; T Koishi; H Furusawa; K Yasuoka; N Takada; Y Ohno; M Taiji; T Ebisuzaki; JB Hoek; A Konagaya; BN Kholodenko
    JOURNAL OF BIOLOGICAL CHEMISTRY, AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC, 279, 6, 4657-4662, Feb. 2004, Peer-reviewed, Activated receptor tyrosine kinases bind the Shc adaptor protein through its N-terminal phosphotyrosine-binding (PTB) and C-terminal Src homology 2 (SH2) domains. After binding, Shc is phosphorylated within the central collagen-homology (CH) linker region on Tyr-317, a residue remote from both the PTB and SH2 domains. Shc phosphorylation plays a pivotal role in the initiation of mitogenic signaling through the Ras/Raf/MEK/ERK pathway, but it is unclear whether Tyr-317 phosphorylation affects Shc-receptor interactions through the PTB and SH2 domains. To investigate the structural impact of Shc phosphorylation, molecular dynamics simulations were carried out using special-purpose Molecular Dynamics Machine-Grape computers. After a 1-nanosecond equilibration, atomic motions in the structures of unphosphorylated Shc and Shc phosphorylated on Tyr-317 were calculated over a 2-nanosecond period. The results reveal larger phosphotyrosine-binding domain fluctuations and more structural flexibility of unphosphorylated Shc compared with phosphorylated Shc. Collective motions between the PTB-SH2, PTB-CH, and CH-SH2 domains were highly correlated only in unphosphorylated Shc. Dramatic changes in domain coupling and structural rigidity, induced by Tyr-317 phosphorylation, may alter Shc function, bringing about marked differences in the association of unphosphorylated and phosphorylated Shc with its numerous partners, including activated membrane receptors.
    Scientific journal, English
  • 1P053 Proteomic Structural Analysis based on Molecular Dynamics Simulations : Protein-Protein Interactions of Ras-Raf and Ras-RalGDS com
    Futatsugi N.; Shirouzu M.; Suenaga A.; Okimoto N.; Narumi T.; Ebisuzaki T.; Yokoyama S.; Taiji M.; Konagaya A.
    Seibutsu Butsuri, The Biophysical Society of Japan General Incorporated Association, 44, S43, 2004
    Japanese
  • MDGRAPE-3: A petaflops special-purpose computer system for molecular dynamics simulations
    M Taiji; T Narumi; Y Ohno; A Konagaya
    PARALLEL COMPUTING: SOFTWARE TECHNOLOGY, ALGORITHMS, ARCHITECTURES AND APPLICATIONS, ELSEVIER SCIENCE BV, 13, 669-676, 2004, Peer-reviewed, We are developing the MDGRAPE-3 system, a petaflops special-purpose computer system for molecular dynamics simulations. It uses special-purpose engines to calculate nonbonded interactions between atoms, which is the most time-consuming part of the simulations. A dedicated LSI, the 'MDGRAPE-3 chip', performs these force calculations at a speed of 165 gigaflops or higher. The system will have 6,144 MDGRAPE-3 chips to achieve a nominal peak performance of one petaflops.
    International conference proceedings, English
  • Hardware accelerator for molecular dynamics: MDGRAPE-2
    R Susukita; T Ebisuzaki; BG Elmegreen; H Furusawa; K Kato; A Kawai; Y Kobayashi; T Koishi; GD McNiven; T Narumi; K Yasuoka
    COMPUTER PHYSICS COMMUNICATIONS, ELSEVIER SCIENCE BV, 155, 2, 115-131, Oct. 2003, Peer-reviewed, We developed MDGRAPE-2, a hardware accelerator that calculates forces at high speed in molecular dynamics (MD) simulations. MDGRAPE-2 is connected to a PC or a workstation as an extension board. The sustained performance of one MDGRAPE-2 board is 15 Gflops, roughly equivalent to the peak performance of the fastest supercomputer processing element. One board can calculate all forces between 10,000 particles in 0.28 s (i.e. about 310,000 time steps per day). If 16 boards are connected to one computer and operated in parallel, the calculation becomes about 10 times faster. In addition to MD, MDGRAPE-2 can be applied to gravitational N-body simulations and, in computational fluid dynamics, to the vortex method and smoothed particle hydrodynamics.
    Scientific journal, English
  • Molecular dynamics study on class A beta-lactamase: Hydrogen bond network among the functional groups of penicillin G and side chains of the conserved residues in the active site
    Y Fujii; N Okimoto; M Hata; T Narumi; K Yasuoka; R Susukita; A Suenaga; N Futatsugi; T Koishi; H Furusawa; A Kawai; T Ebisuzaki; S Neya; T Hoshino
    JOURNAL OF PHYSICAL CHEMISTRY B, AMER CHEMICAL SOC, 107, 37, 10274-10283, Sep. 2003, Peer-reviewed, Molecular dynamics simulation was performed on class A β-lactamase binding penicillin G (pen G). The structure of the acyl enzyme intermediate (AEI) was derived from the crystallographic data of the clavulanic acid-bound enzyme. To execute the simulation precisely, the AEI was solvated by nearly 8000 water molecules and the no-cutoff (NCO) method was applied to the calculation of the Coulomb term. The Coulomb term calculation was accelerated with MDGRAPE-2 hardware. In the first step of this study, the reliability of the NCO method was confirmed by comparing experimental and computational B-factors. We confirmed that the NCO method is much more reliable than the particle mesh Ewald and generalized Born methods. Hence the NCO method was applied to the simulation of the AEI. The integrated simulation time was 1.2 ns. The simulation showed that Ser130, Asn132, Ser235, Gly237, and Arg244 cooperatively restricted the mobility of the pen G moiety by forming salt bridges among the side chains of these residues and the C3-carboxyl or C6-amide group of the substrate. The oxyanion hole composed of the main-chain N atoms of Ser70 and Gly237 was properly reproduced under aqueous conditions. The simulation also shows that Glu166 cannot act as a general base in the acylation of pen G, because the average distances between the Glu166 carboxyl oxygens and Ser70 Oγ are too long for direct proton transfer (5.2 and 5.5 Å, respectively) and there is no water molecule between the Glu166 carboxylate and Ser70 Oγ. A molecular dynamics simulation of the substrate-free enzyme (SFE) was also carried out and compared with the AEI. While no drastic change due to substrate binding was observed in either the secondary structure or the positions of the catalytic residues of the enzyme, the mobility of the catalytic water molecule was strongly restricted by the presence of the substrate.
    Scientific journal, English
  • Molecular dynamics, free energy, and SPR analyses of the interactions between the SH2 domain of grb2 and ErbB phosphotyrosyl peptides
    A Suenaga; M Hatakeyama; M Ichikawa; XM Yu; N Futatsugi; T Narumi; K Fukui; T Terada; M Taiji; M Shirouzu; S Yokoyama; A Konagaya
    BIOCHEMISTRY, AMER CHEMICAL SOC, 42, 18, 5195-5200, May 2003, Peer-reviewed, We studied the interactions between the SH2 domain of growth factor receptor binding protein 2 (Grb2) and ErbB receptor-derived phosphotyrosyl peptides using molecular dynamics, free energy calculations, and surface plasmon resonance (SPR) analysis. Binding free energies for nine phosphotyrosyl peptides were calculated using the MM-PBSA continuum solvent method, and excellent qualitative agreement with the SPR experimental data, with a correlation coefficient of 0.92, was obtained. Consistent with previous experimental findings, phosphotyrosyl peptides with the consensus sequence pYXNX showed favorable binding affinity for Grb2. Unexpectedly, phosphotyrosyl peptides with the consensus sequence pYQQD, which had not shown any specific binding affinity for Grb2 in earlier studies, also showed favorable binding affinity for Grb2 in our experimental and computational analyses. Component analysis of the calculated binding free energies revealed that the van der Waals interaction between Grb2 and the phosphotyrosyl peptide was the dominant factor for specificity and binding affinity. These results indicate that current methods of estimating binding free energies are efficient for obtaining important information about protein-protein interactions, which are essential for the transmission of signals in cellular signaling pathways.
    Scientific journal, English
  • Parallelized Simulation of Molecular Dynamics with a Special-Purpose Computer: MDGRAPE-2
    Takada Naoki; Futatsugi Noriyuki; Suenaga Atsushi; Narumi Tetsu; Okimoto Noriaki; Hirano Hidenori; Kawai Atsushi; Yasuoka Kenji; Ebisuzaki Toshikazu; Taiji Makoto; Konagaya Akihiko
    GI, Japanese Society for Bioinformatics, 14, 625-626, 2003
    English
  • Protein Explorer
    Makoto Taiji; Tetsu Narumi; Yousuke Ohno; Noriyuki Futatsugi; Atsushi Suenaga; Naoki Takada; Akihiko Konagaya
    Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03, ACM Press, 2003
    International conference proceedings
  • Protein explorer: A petaflops special-purpose computer system for molecular dynamics simulations
    Makoto Taiji; Tetsu Narumi; Yousuke Ohno; Noriyuki Futatsugi; Atsushi Suenaga; Naoki Takada; Akihiko Konagaya
    Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003, CDROM, 2003, Peer-reviewed, We are developing the 'Protein Explorer' system, a petaflops special-purpose computer system for molecular dynamics simulations. The Protein Explorer is a PC cluster equipped with special-purpose engines that calculate nonbonded interactions between atoms, which is the most time-consuming part of the simulations. A dedicated LSI, the 'MDGRAPE-3 chip', performs these force calculations at a speed of 165 gigaflops or higher. The system will have 6,144 MDGRAPE-3 chips to achieve a nominal peak performance of one petaflop. The system will be completed in 2006. In this paper, we describe the project plans and the architecture of the Protein Explorer.
    International conference proceedings, English
  • Molecular Dynamics Simulations of Prion Proteins - Effect of Ala 117→Val Mutation -
    Noriaki Okimoto; Kazunori Yamanaka; Atsushi Suenaga; Yoshinori Hirano; Noriyuki Futatsugi; Tetsu Narumi; Kenji Yasuoka; Ryutaro Susukita; Takahiro Koishi; Hideaki Furusawa; Atsushi Kawai; Masayuki Hata; Tyuji Hoshino; Toshikazu Ebisuzaki
    Chem-Bio Informatics Journal, 3, 1-11, 2003, Peer-reviewed
    Scientific journal, English
  • Numerical simulations of magnetic materials with MD-GRAPE: curvature induced anisotropy
    BG Elmegreen; R Koch; K Yasuoka; H Furusawa; T Narumi; R Susukita; T Ebisuzaki
    JOURNAL OF MAGNETISM AND MAGNETIC MATERIALS, ELSEVIER SCIENCE BV, 250, 1-3, 39-48, Sep. 2002, Peer-reviewed, The time development of an array of magnetic dipoles representing the internal magnetization of a thin film is calculated using a hardware accelerator, MD-GRAPE, for the determination of the magnetic vector potential. The results for single-layer arrays of dipoles are compared with the analogous results obtained from an FFT method and found to be in reasonable agreement. Three-dimensional MD-GRAPE simulations with sinusoidal deformations illustrate the utility of the hardware accelerator in cases that cannot be solved easily with FFT methods. The deformations give the internal field an asymmetry with components parallel to the wave crest, leading to significant changes in the critical external fields required for switching. These changes occur even for small wave amplitudes, comparable to or less than the layer thickness. Layer curvature affects the astroid pattern of critical field strengths by shifting the threshold to lower absolute values of the hard axis field when the curvature is in the easy axis direction, and to lower absolute values of the easy axis field when the curvature is in the hard axis direction. This effect of curvature differs from orange peel coupling between two layers because here there is only one layer, and because orange peel coupling shifts the whole astroid pattern as a result of an effective field bias, whereas here the pattern shape is changed without any centroid shift.
    Scientific journal, English
  • A large scale molecular dynamics simulation for water with special-purpose computer MDM
    T Koishi; K Yasuoka; XC Zeng; T Narumi; R Susukita; A Kawai; H Furusawa; T Ebisuzaki
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS, INT INST INFORMATICS & SYSTEMICS, 503-507, 2002, Peer-reviewed, The calculation cost of molecular dynamics (MD) simulation of a system that contains Coulomb interactions is very large. We have developed a special-purpose computer, MDM (Molecular Dynamics Machine), to carry out simulations of systems with over a million particles. MDM can perform MD simulations of Coulomb systems at very high speed. A large-scale water MD simulation was carried out to estimate the calculation speed of MDM. The calculation speed was found to be about 1000 times faster than that of a conventional workstation.
    International conference proceedings, English
  • Molecular dynamics study of the solidification process in alkali halide cluster
    T Koishi; K Yasuoka; T Narumi; R Susukita; H Furusawa; T Ebisuzaki
    JOURNAL OF NON-CRYSTALLINE SOLIDS, ELSEVIER SCIENCE BV, 312-14, 332-336, 2002, Peer-reviewed, Molecular dynamics (MD) simulations of the solidification process of an NaCl cluster are carried out. Voronoi analysis is employed to distinguish a crystal nucleus from molten NaCl. In the early stage of the simulation, solid clusters smaller than the critical nucleus size are repeatedly formed and broken. Under the low-temperature condition (T = 700 K), a polycrystalline NaCl solid, in which two or three large solid grains survive, appears in the later stage of the simulation. Under the high-temperature condition (T = 740 K), one large single-crystal cluster is formed. All simulations in this work are performed on a special-purpose computer for MD simulation called MDGRAPE-2.
    Scientific journal, English
  • An 8.61 Tflop/s Molecular Dynamics Simulation for NaCl with a Special-Purpose Computer: MDM
    Tetsu Narumi; Atsushi Kawai; Takahiro Koishi
    SC2001, ACM, CDROM, Nov. 2001, Peer-reviewed
    International conference proceedings, English
  • 1.34 Tflops Molecular Dynamics Simulation for NaCl with a Special-Purpose Computer: MDM
    Tetsu Narumi; Ryutaro Susukita; Takahiro Koishi; Kenji Yasuoka; Hideaki Furusawa; Atsushi Kawai; Toshikazu Ebisuzaki
    SC2000, CDROM, Nov. 2000, Peer-reviewed
    International conference proceedings, English
  • 46 Tflops special-purpose computer for molecular dynamics simulations: WINE-2
    T Narumi; R Susukita; H Furusawa; T Ebisuzaki
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, PUBLISHING HOUSE ELECTRONICS INDUSTRY, 1, 575-582, 2000, Peer-reviewed, We developed WINE-2, a 46 Tflops computer, to accelerate molecular dynamics (MD) simulations. Huge computational power is needed to understand the dynamics of large and complex molecules such as proteins and nucleic acids. WINE-2 is a specialised computer that accelerates the wavenumber-space part of the Ewald summation for calculating the Coulomb forces among atoms. It has a highly parallel architecture and is composed of 2,304 WINE-2 chips. We measured 29 Tflops of performance in the case of 24 million atoms and half a million wavenumber vectors. This sustained speed is more than seven times faster than the peak speed of the fastest supercomputer in the world.
    International conference proceedings, English
  • Molecular dynamics machine: Special-purpose computer for molecular dynamics simulations
    E Narumi; R Susukita; T Ebisuzaki; G McNiven; B Elmegreen
    MOLECULAR SIMULATION, TAYLOR & FRANCIS LTD, 21, 5-6, 401-415, 1999, Peer-reviewed, We are now developing the Molecular Dynamics Machine (MDM), a special-purpose computer for classical molecular dynamics simulations. It accelerates the calculation of non-bonding forces (Coulomb and van der Waals forces), because the calculation cost of the Coulomb force dominates the total calculation time when a large system of charged particles is treated without truncating the Coulomb force. With the Ewald method, the Coulomb force can be calculated by dividing it into real-space and wavenumber-space parts. MDM is composed of MDGRAPE-2, WINE-2, and a host computer. MDGRAPE-2 calculates the van der Waals force and the real-space part of the Coulomb force. WINE-2 calculates the wavenumber-space part of the Coulomb force. The host computer calculates bonding forces and updates the positions and velocities of atoms. The target performance of MDM is 100 Tflops, and it will sustain about 30 Tflops in realistic applications. It can calculate 3.2 × 10^6 time steps of MD simulation with a million atoms in a week. The total system will be completed in 1999.
    Scientific journal, English
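Several of the MDM-era abstracts above (the WINE-2 paper and the MDM overview in MOLECULAR SIMULATION) rely on the Ewald split of the Coulomb sum. The real-space part is handled by MDGRAPE-2, and the wavenumber-space part by WINE-2. The Python sketch below is a minimal, unoptimized illustration of that textbook decomposition, not the MDM hardware algorithm. It assumes Gaussian units, a cubic box, a charge-neutral system, and tin-foil boundary conditions. The function name `ewald_energy` and its parameters (`alpha`, `n_real`, `k_max`) are hypothetical choices made for this example.

```python
import itertools
import math

import numpy as np


def ewald_energy(pos, q, box, alpha, n_real=1, k_max=6):
    """Coulomb energy of a periodic, charge-neutral cubic system (Gaussian units).

    Sketch of E = E_real + E_recip + E_self with tin-foil boundary conditions.
    """
    pos = np.asarray(pos, dtype=float)
    q = np.asarray(q, dtype=float)
    n = len(q)

    # Real-space part (MDGRAPE-2's role in MDM): erfc-screened pair sum over nearby images.
    e_real = 0.0
    images = [np.array(s, dtype=float) * box
              for s in itertools.product(range(-n_real, n_real + 1), repeat=3)]
    for i in range(n):
        for j in range(n):
            for shift in images:
                if i == j and not shift.any():
                    continue  # skip an ion interacting with itself in the home cell
                r = np.linalg.norm(pos[i] - pos[j] + shift)
                e_real += 0.5 * q[i] * q[j] * math.erfc(alpha * r) / r

    # Wavenumber-space part (WINE-2's role in MDM): smooth remainder summed over k vectors.
    volume = box ** 3
    e_recip = 0.0
    for m in itertools.product(range(-k_max, k_max + 1), repeat=3):
        if m == (0, 0, 0):
            continue
        k = 2.0 * math.pi * np.array(m, dtype=float) / box
        k2 = float(k @ k)
        s_k = np.sum(q * np.exp(1j * (pos @ k)))  # structure factor S(k)
        e_recip += (2.0 * math.pi / volume) * math.exp(-k2 / (4.0 * alpha ** 2)) * abs(s_k) ** 2 / k2

    # Self-energy correction for the Gaussian screening charges.
    e_self = -alpha / math.sqrt(math.pi) * float(np.sum(q ** 2))
    return e_real + e_recip + e_self


# Rock-salt (NaCl) conventional cell: 4 Na+ and 4 Cl-, nearest-neighbour distance 1.
na = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
cl = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
energy = ewald_energy(na + cl, [1.0] * 4 + [-1.0] * 4, box=2.0, alpha=2.5)
madelung = -energy / 4  # 4 ion pairs, so this estimates the NaCl Madelung constant
print(f"{madelung:.5f}")
```

As a sanity check, the rock-salt cell should reproduce the NaCl Madelung constant (about 1.7476) for any reasonable `alpha`. Changing `alpha` only shifts work between the two sums, which is what makes it natural to assign them to separate hardware.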

MISC

  • Fault-Tolerant Molecular Dynamics Simulation using Virtualized GPUs
    老川稔; 野村昴太郎; 泰岡顕治; 成見哲
    GPUs are widely used as calculation accelerators in many high-performance computing systems. In recent years, the increasing scale and complexity of high-performance computing systems, which consist of many components, have shortened the MTBF (Mean Time Between Failures), so fault-tolerance methods are required to achieve high reliability. We modified GPU virtualization software to enhance the reliability of GPU computing systems, and tested the fault-tolerance functions we implemented by running molecular dynamics simulations on GPU devices. We considered two types of system fault: (1) calculation errors during a GPU kernel function and (2) unexpected disconnection between the host and the GPUs. We achieved automatic recovery from these faults using redundant GPUs, checkpointing, and rollback recovery techniques., Information Processing Society of Japan (IPSJ), 21 Jul. 2014, IPSJ SIG Notes, 2014, 40, 1-6, Japanese, 110009808135, AN10463942
  • G134 The electrostatic effect on the ordering process of 4-pentyl-4'-cyanobiphenyl liquid crystal
    Nozawa Takuma; Takahashi Kazuaki; Narumi Tetsu; Yasuoka Kenji
    Atomistic molecular dynamics simulations of the bulk liquid-crystal state of 4-pentyl-4'-cyanobiphenyl (5CB) were performed. For the Coulomb interaction under periodic boundary conditions, the Particle Mesh Ewald (PME) method and the Isotropic Periodic Sum (IPS) method were applied. In both cases, the system became ordered between 280 K and 290 K. However, the ordering processes differed between the two cases. When the temperature was fixed at 280 K, PME produced a layer structure, whereas IPS did not. The mean square displacement with PME changed drastically between 280 K and 290 K., The Japan Society of Mechanical Engineers, 18 Oct. 2013, Proceedings of Thermal Engineering Conference, 2013, 221-222, Japanese, 110009955702, AA11901963
  • CUDA Enabled for Android Tablets through DS-CUDA
    Edgar Josafat Martinez-Noriega; Atsushi Kawai; Kazuyuki Yoshikawa; Kenji Yasuoka; Tetsu Narumi
    15 May 2013, Proceedings of the Symposium on Advanced Computing Systems and Infrastructures (SACSIS), 2013, 115-116, English, 170000076921
  • Vortex-method analysis of 4096³-scale homogeneous isotropic turbulence using 4096 GPUs
    横田 理央; Barba Lorena; 成見 哲
    Global Scientific Information and Computing Center, Tokyo Institute of Technology, Jul. 2012, Tsubame ESJ. : e-science journal, 6, 1-6, Japanese, 2185-6028, 40019415507
  • Turbulence Simulation Using 4096³ Vortex Particles on 4096 GPUs
    Yokota Rio; Barba Lorena; Narumi Tetsu
    Global Scientific Information and Computing Center, Tokyo Institute of Technology, Jul. 2012, Tsubame ESJ. : e-science journal, 6, 17-22, English, 2185-6028, 40019415552
  • Automatic Redundant Calculation with GPU Virtualization
    吉川 和幸; 川井 敦; 泰岡 顕治; 成見 哲
    GPGPU, which exploits the high computing power of GPUs, is becoming more and more popular, and software for image manipulation and movie encoding has recently come to support it. However, consumer GPUs used for such applications are not very reliable (for example, they lack ECC memory), even though their performance is comparable to that of server GPUs. We modified DS-CUDA, a framework for GPU virtualization, to enhance the reliability of consumer GPUs. Our system performs redundant calculations on multiple GPUs and compares their results to detect a calculation error on any GPU. When an error occurs, the system automatically repeats the preceding operations on the GPUs. The key advantage is that no modification of existing application software is needed to gain this reliability., 25 May 2012, IPSJ SIG Technical Report: High Performance Computing (HPC), 2012, 6, 1-5, Japanese, 110009453400, AN10463942
  • Thermodynamic properties of methane/water interface predicted by molecular dynamics simulations (vol 134, 144702, 2011)
    Ryuji Sakamaki; Amadeu K. Sum; Tetsu Narumi; Ryo Ohmura; Kenji Yasuoka
    AMER INST PHYSICS, Dec. 2011, JOURNAL OF CHEMICAL PHYSICS, 135, 21, English, Others, 0021-9606, WOS:000298490700044
  • Petascale Turbulence Simulation Using FMM
    横田 理央; 成見 哲; L.A.Barba; 泰岡 顕治
    Fast multipole methods (FMM) were originally developed for accelerating N-body problems in astrophysics and other particle-based methods, and a recent trend in HPC has been to use FMMs in unconventional application areas. We performed a 2048³ turbulence calculation using an FMM designed for large-scale GPU systems and compared it with a spectral method under similar conditions. The method is a hybrid of the treecode and FMM: it combines the data-parallel treecode, which achieves high flops on GPUs, with the faster O(N) FMM. The run on TSUBAME 2.0 using 4096 GPUs achieved 74% parallel efficiency, and the sustained performance reached 1.01 PFlops., 21 Nov. 2011, IPSJ SIG Technical Report: Computer Architecture (ARC), 2011, 29, 1-8, Japanese, 110008713499, AN10096105
  • High-Performance Drug Discovery: Computational Screening by Combining Docking and Molecular Dynamics Simulations
    Noriaki Okimoto; Noriyuki Futatsugi; Hideyoshi Fuji; Atsushi Suenaga; Gentaro Morimoto; Ryoko Yanai; Yousuke Ohno; Tetsu Narumi; Makoto Taiji
    CELL PRESS, Jan. 2010, BIOPHYSICAL JOURNAL, 98, 3, 460A-460A, English, Summary international conference, 0006-3495, WOS:000208762004306
  • Cell size dependence of orientational order of uniaxial liquid crystals in flat slit (vol 34, pg 761, 2008)
    Toshiki Mima; Tetsu Narumi; Shun Kameoka; Kenji Yasuoka
    TAYLOR & FRANCIS LTD, 2010, MOLECULAR SIMULATION, 36, 3, 254-254, English, Others, 0892-7022, WOS:000274928600011
  • 412 Implementation and Evaluation of Particle Mesh Ewald Method on GPU for Molecular Dynamics Simulation
    SAKAMAKI Ryuji; NARUMI Tetsu; YASUOKA Kenji
    The Japan Society of Mechanical Engineers, 10 Oct. 2009, The Computational Mechanics Conference, 2009, 22, 756-757, Japanese, 1348-026X, 110008702456, AA1190257X
  • Acceleration of particle-method simulations using accelerators
    成見哲; 濱田剛; 小西史一
    Feb. 2009, IPSJ Magazine (Joho Shori), 50, 2, 129-139, Japanese, Introduction other
  • 618 Large Scale Molecular Dynamics Simulation using GPU
    SAKAMAKI Ryuji; NARUMI Tetsu; YASUOKA Kenji
    The Japan Society of Mechanical Engineers, 01 Nov. 2008, The Computational Mechanics Conference, 2008, 21, 546-547, Japanese, 1348-026X, 110008701884, AA1190257X
  • MOLECULAR DYNAMICS SIMULATION OF WATER INTERFACE
    SAKAMAKI Ryuji; NARUMI Tetsu; OHMURA Ryo; YASUOKA Kenji
    08 Oct. 2008, Thermophysical properties, 29, 143-145, Japanese, 0911-1743, 10024651215, AN10370091
  • D233 System-size dependence of the diffusion coefficient in molecular dynamics simulations (Molecular Dynamics 1)
    亀岡 駿; 美馬 俊喜; 成見 哲; 戎崎 俊一; 泰岡 顕治
    Molecular dynamics (MD) is one of the computer simulation methods used to observe the microscopic state of materials. It has been reported that the diffusion coefficient obtained from MD simulation increases with system size for fluids of 128-2,048 water molecules and 128-1,000 Lennard-Jones (LJ) particles. We performed large-scale MD simulations to study the system-size effect on the diffusion coefficient. The number of molecules is 1,000-100,000 for LJ and 1,000-50,000 for water. MDGRAPE-3, which is the special purpose computer for molecular dynamics si..., The Japan Society of Mechanical Engineers, 23 Nov. 2007, Proceedings of Thermal Engineering Conference, 2007, 307-308, 110007082989
  • MDGRAPE-3: a special-purpose computer for molecular dynamics simulations
    成見哲
    Apr. 2007, Kagaku Kogaku (Chemical Engineering), 71, 4, 7-10, Japanese, Introduction other
  • Molecular dynamics study of the mechanism of structural change between the free and ligand-bound forms of an enzyme
    沖本憲明; 中村卓; 二木紀行; 末永敦; 平野秀典; 成見哲; 川井敦; 泰岡顕治; 戎崎俊一
    2003, Abstracts of the Annual Meeting of the Pharmaceutical Society of Japan, 123rd, 3, 0918-9823, 200902226524798999
  • Parallelization of molecular dynamics simulations using the special-purpose computer MDGRAPE-2
    高田直樹; 二木紀行; 末永敦; 沖本憲明; 平野秀典; 成見哲; 川井敦; 泰岡顕治; 泰地真弘人; 戎崎俊一; 小長谷明彦
    2003, Proceedings of the Symposium on Advanced Computing Systems and Infrastructures (SACSIS2003), 197-198
  • High-performance computer architectures for bioinformatics
    小長谷明彦; 小西史一; 成見哲
    2002, Software Biology, 1, 73-77, Japanese, Introduction other
  • 18pTK-8 Simulation of crystal growth in NaCl-KCl mixed systems III
    古石 貴裕; 泰岡 顕治; 成見 哲; 薄田 竜太郎; 古沢 秀明; 戎崎 俊一
    The Physical Society of Japan, 03 Sep. 2001, Meeting Abstracts of the Physical Society of Japan, 56, 2, 1342-8349, 110002047479
  • 29pYH-9 Simulation of crystal growth in NaCl-KCl mixed systems II
    古石 貴裕; 泰岡 顕治; 成見 哲; 薄田 竜太郎; 古沢 秀明; 戎崎 俊一
    The Physical Society of Japan, 09 Mar. 2001, Meeting Abstracts of the Physical Society of Japan, 56, 1, 1342-8349, 110002163031
  • Porting the AMBER program to a special-purpose computer for molecular dynamics and performance evaluation toward practical use
    二木紀行; 沖本憲明; 末永敦; 成見哲; 川井敦; 泰岡顕治; 戎崎俊一
    2001, Abstracts of the Molecular Simulation Symposium, 14th, 200902198008459297
  • MDM: a 100 Tflops special-purpose computer for molecular dynamics simulations
    成見哲
    Nov. 2000, Bit, 11, Japanese, Introduction other
  • 2000-HPC-82-33 Development status of MDM, a 100 Tflops special-purpose computer for molecular dynamics simulations
    成見哲; 薄田竜太郎; 古沢秀明; 川井敦; 古石貴裕; 泰岡顕治; 戎崎俊一
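    We are currently developing the Molecular Dynamics Machine (MDM), a 100 Tflops special-purpose computer for molecular dynamics simulations. MDM is a computer developed specifically to calculate Coulomb forces with the Ewald method and intermolecular forces, and it can accelerate large-scale molecular dynamics simulations in which Coulomb forces dominate, such as those of systems containing proteins. MDM consists of WINE-2, MDGRAPE-2 and a host computer, which respectively calculate the wavenumber-space part of the Coulomb force; the real-space part of the Coulomb force and the intermolecular forces; and the bonding forces and other terms. It is scheduled for completion at the end of 2000, and a sustained speed of 29 Tflops has been achieved so far for the wavenumber-space part., Information Processing Society of Japan, 03 Aug. 2000, IPSJ SIG Technical Report [High Performance Computing], 2000, 73, 191-196, 0919-6072, 110002932456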
  • 22aZC-1 Performance evaluation of MDM, a special-purpose computer for molecular dynamics
    薄田 竜太郎; 戎崎 俊一; 加藤 健矢; 小林 芳直; 成見 哲; 古沢 秀明; 泰岡 顕治; Elmegreen B.G; McNiven G.D
    The Physical Society of Japan, 10 Mar. 2000, Meeting Abstracts of the Physical Society of Japan, 55, 1, 1342-8349, 110001910686
  • 16 MDM: a special-purpose computer for molecular dynamics
    泰岡 顕治; 薄田 竜太郎; 戎崎 俊一; 加藤 健矢; 小林 芳直; 成見 哲; 古沢 秀明; Elmegreen B. G; McNiven G. D; 大口 晃司
    The Japan Society of Mechanical Engineers, 29 Feb. 2000, Proceedings of the Joint Symposium on Micro-Simulation of Thermofluid and Solid Systems / Molecular Dynamics Symposium, 2000, 5, 31-32, 110002506740
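The DS-CUDA reports above describe a reliability scheme for GPUs. The same calculation runs on several GPUs, the results are compared, and the calculation is repeated automatically on a mismatch. The Python sketch below shows only that control logic, under these assumptions:
- devices are plain integer IDs;
- `kernel` is a hypothetical callable standing in for a GPU launch;
- results are compared exactly, which presumes deterministic kernels.

The names `redundant_call` and `square_kernel` are illustrative. This is not DS-CUDA's interface, which intercepts CUDA calls transparently.

```python
from typing import Callable, List, Sequence, Tuple

Result = List[float]


def redundant_call(kernel: Callable[[int], Result],
                   device_ids: Sequence[int],
                   max_retries: int = 3) -> Tuple[Result, int]:
    """Run the same kernel on every device and accept the result only if all copies agree.

    Returns (result, retries_used). Raises RuntimeError if the copies still
    disagree after max_retries re-executions.
    """
    if len(device_ids) < 2:
        raise ValueError("redundant calculation needs at least two devices")
    for attempt in range(max_retries + 1):
        results = [kernel(dev) for dev in device_ids]
        if all(r == results[0] for r in results[1:]):
            return results[0], attempt
        # Disagreement: some device produced a wrong answer, so re-run on all devices.
    raise RuntimeError("redundant results still disagree after retries")


# Hypothetical kernel: squares 0..3; device 1 suffers one simulated silent error.
calls = {"n": 0}


def square_kernel(dev: int) -> Result:
    calls["n"] += 1
    out = [float(x * x) for x in range(4)]
    if dev == 1 and calls["n"] == 2:  # the first call that lands on device 1
        out[2] += 1.0                  # simulated bit flip
    return out


result, retries = redundant_call(square_kernel, device_ids=[0, 1])
print(result, retries)
```

With two devices, a mismatch can only be detected, not attributed, so both devices are re-run. With three or more, a majority vote could identify the faulty device instead.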

Books and other publications

  • Petascale Computing: Algorithms and Applications
    Makoto Taiji; Tetsu Narumi; Yousuke Ohno; Edited by David A. Bader
    English, Joint work, Petascale Special-Purpose Computer for Molecular Dynamics Simulations, CRC Press, 2008
  • Simulation with special-purpose computers
    成見哲; edited by 杉本大一郎
    Japanese, Joint work, Chapter 10: The illumination problem can also be accelerated with special-purpose computers, Asakura Shoten, 1994

Lectures, oral presentations, etc.

  • Micromagnetic simulation to play with on a smartphone
    涌井 桐哉; 植田 武; 成見 哲
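    Poster presentation, Japanese, Entertainment Computing 2022 (EC2022), IPSJ SIG on Entertainment Computing, Micromagnetics is a field that deals with magnetic interactions at the atomic level and is applied, for example, to the development of magnetoresistive memory. Because simulations must take multiple effects into account, such as the external field, the exchange field and the magnetostatic field, the phenomena are difficult to understand. In this study, we aimed to make these difficult phenomena fun to learn by running a GPU-based simulation on a smartphone and developing a game that players control with body movements through sensors such as the accelerometer., Domestic conference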
    02 Sep. 2022
  • A 12-screen tiled display system using FPGAs
    木村 智美; 柴尾 啓太; 成見 哲
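    Poster presentation, Japanese, Entertainment Computing 2022 (EC2022), IPSJ SIG on Entertainment Computing, In this study, we developed a system that combines 12 displays into one large screen (equivalent to about 84 inches). Because we built dedicated hardware using an FPGA, the system is inexpensive yet prevents synchronization mismatch between screens, so no tearing occurs. Since no large display is required, the system can be taken apart and carried, and spare displays can be put to good use. In the demonstration, we show a large-screen VR game combined with Kinect., Domestic conference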
    02 Sep. 2022
  • Visualization of micromagnetics simulations with Unity
    李 嘉慶; 成見哲
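    Oral presentation, Japanese, The 84th National Convention of IPSJ, Tokyo, Micromagnetics is a field that deals with the magnetization structures formed by atomic magnetic moments inside a magnet and their dynamic changes, and it is used, for example, to simulate hard-disk heads and MRAM. Because the magnetic moments interact with one another, the computational cost is high and the motion is hard to predict. In this study, we developed a multi-platform, real-time visualization system for micromagnetics simulations using the Unity game engine. With compute shaders, it runs 2 times faster than CUDA and about 60 times faster than the CPU. Students can easily run simulations on devices such as smartphones, so the system may be useful for education., Domestic conference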
    04 Mar. 2022
  • Accelerating radio propagation simulation with GPUs specialized for ray tracing
    荒生太一; 成見哲
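    Oral presentation, Japanese, The 84th National Convention of IPSJ, Tokyo, Vehicle-to-vehicle and road-to-vehicle wireless communication in ITS (Intelligent Transport Systems) uses radio waves at GHz frequencies and above, which are strongly affected by reflection and diffraction. Handling complex surrounding environments requires simulation methods based on ray tracing, but the computation is enormous. Meanwhile, in computer graphics, GPUs (NVIDIA RTX) that perform ray-tracing calculations quickly with dedicated processors are becoming widespread. In this study, we accelerated ray-tracing-based radio propagation simulation with RTX, compared its accuracy and other aspects with the conventional imaging method, and examined its practicality., Domestic conference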
    04 Mar. 2022
  • Proposal of a real-time exercise-data recording support system for fitness games using FPGAs
    滝川 潤; 成見 哲
    Oral presentation, Japanese, The 196th System and LSI Design Methodology Research Meeting (Design Gaia 2021), Information Processing Society of Japan, Online, Domestic conference
    02 Dec. 2021
  • Performance comparison of high-level synthesis tools using the gravitational N-body problem as an example: differences between SDSoC and Vitis
    村松耀生; 成見 哲
    Oral presentation, Japanese, The 196th System and LSI Design Methodology Research Meeting (Design Gaia 2021), Information Processing Society of Japan, Online, Domestic conference
    02 Dec. 2021
  • A Report on the Results of Making a Logic Circuit Design Assignment using an FPGA Remotely Accessible
    赤池英夫; 島崎俊介; 成見哲
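    Poster presentation, Japanese, Proceedings of the Information Education Symposium (SSS2021), IPSJ SIG-CE, Online, http://id.nii.ac.jp/1001/00212253/, In this study, we investigated how an experiment system originally designed only for in-person use on our campus, which was made remotely accessible out of necessity, affected education., Domestic conference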
    28 Aug. 2021
  • Remote implementation of a logic circuit design laboratory using FPGAs
    赤池 英夫; 島崎 俊介; 成見 哲
    Oral presentation, Japanese, The 157th Research Meeting, IPSJ SIG Technical Report: Computers and Education (CE), IPSJ SIG on Computers and Education, Online, Domestic conference
    31 Oct. 2020
  • Life-size virtuality, reality in your hands
    明石 禎紀; 冨井 陸矢; 真鍋 光希; 高見 太基; 帆山 遼; 羽賀 聡希; 寺崎 葉月; 武井 友里恵; 鈴ヶ嶺 聡哲; 山中 佑紀; 峯水 延浩; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2019 (EC2019), Hakata, Domestic conference
    22 Sep. 2019
  • A 3D-model live show using participants' own Android devices and sound source separation
    杉田 陽亮; 荒生 太一; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2019 (EC2019), Hakata, Domestic conference
    22 Sep. 2019
  • A portable tiled display using FPGAs
    白井 暁; 前田 諒磨; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2019 (EC2019), Domestic conference
    22 Sep. 2019
  • Development of a software/hardware co-designed FPGA system for the radiosity method using high-level synthesis
    田村昂太郎; 成見 哲
    Oral presentation, Japanese, Design Gaia 2018: New Frontier of VLSI Design, Information Processing Society of Japan, Satellite Campus Hiroshima, Domestic conference
    05 Dec. 2018
  • Logic circuits that walk around in a virtual physical world
    渋谷峻; 成見哲
    Poster presentation, Japanese, Entertainment Computing 2018 (EC2018), IPSJ SIG on Entertainment Computing, Domestic conference
    14 Sep. 2018
  • Proposal of a VR mahjong game in which players can cheat using LeapMotion
    石井拓斗; 成見哲
    Poster presentation, Japanese, Entertainment Computing 2018 (EC2018), IPSJ SIG on Entertainment Computing, Domestic conference
    14 Sep. 2018
  • Development of inverse tone mapping from True Color to Deep Color for CG images
    難波宗介; 成見哲
    Oral presentation, Japanese, The 80th National Convention of IPSJ, Tokyo, Domestic conference
    14 Mar. 2018
  • A 6-screen tiled display system using FPGAs
    岩田拳太郎; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2017 (EC2017), Sendai, Domestic conference
    16 Sep. 2017
  • Tiled display on the spot
    井出 拓弥; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2017 (EC2017), Domestic conference
    16 Sep. 2017
  • A system for arranging exhibits in a virtual collection case
    佐野 明日香; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2017 (EC2017), Domestic conference
    16 Sep. 2017
  • Evaluation of the P2P function in DS-CUDA
    伊藤一輝; 成見 哲
    Oral presentation, Japanese, The 15th Forum on Information Technology (FIT2016), Toyama, Domestic conference
    09 Sep. 2016
  • A study on accelerating deep learning applications using an FPGA tablet
    佐藤知哉; 成見 哲
    Oral presentation, Japanese, The 15th Forum on Information Technology (FIT2016), Toyama, Domestic conference
    08 Sep. 2016
  • CUDA Offloading for Molecular Dynamics Simulation
    Martinez-Noriega Edgar Josafat; Tetsu Narumi
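    Oral presentation, English, The 21st Conference on Computational Engineering and Science, Japan Society for Computational Engineering and Science, Niigata Convention Center, Domestic conference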
    01 Jun. 2016
  • Comparison of computational performance between a PCI Express expansion box and virtual GPUs
    瀬戸口幸寿; 成見 哲
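    Oral presentation, Japanese, The 78th National Convention of IPSJ, Information Processing Society of Japan, Yokohama, The technique of using GPUs for general-purpose tasks such as scientific computing is known as GPGPU (General-Purpose computing on Graphics Processing Units). DS-CUDA (Distributed Shared CUDA) is middleware that virtualizes GPUs on servers across a network, enabling GPGPU with those GPU resources without rewriting the software on the client side. However, communication between the client and the server tends to become a bottleneck. In this study, using MD simulation as the subject, we built performance models and compared using virtual GPUs through DS-CUDA with handling physical GPUs directly through a PCI Express expansion box., Domestic conference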
    11 Mar. 2016
  • DS-CUDA: GPU Virtualization Middleware to Support Migration Functionality
    成見 哲; 老川 稔; Martinez-Noriega Edgar Josafat; 泰岡 顕治
    Oral presentation, Japanese, The 153rd High Performance Computing Research Meeting, IPSJ SIG-HPC, Matsuyama, Ehime, DS-CUDA is middleware that virtualizes remote GPUs so that they look like local GPUs, by recompiling CUDA source code with our compiler. The redundant calculation mechanism of DS-CUDA enables fault-tolerant calculation with consumer GPUs, which are usually not very reliable. However, when a large number of GPU nodes are used, the server nodes and network should also be fault tolerant, since more nodes cause more failures. In this research, we added migration functionality to DS-CUDA and evaluated its overhead. This function lets the user code continue to run even if GPU server nodes stop. Other implementations using Dynamic Parallelism and GPUDirect RDMA are also explained., Domestic conference
    02 Mar. 2016
  • Development of an Android app that rates the ride comfort of driving technique and gives advice to the driver
    嶋田 貴行; 難波 宗介; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2015 (EC2015), IPSJ SIG on Entertainment Computing (SIG-EC), Sapporo, Domestic conference
    26 Sep. 2015
  • Development of a virtual study-room system in mixed-reality space
    大和田 瑛美華; 佐藤 知哉; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2015 (EC2015), IPSJ SIG on Entertainment Computing (SIG-EC), Sapporo, Domestic conference
    26 Sep. 2015
  • DS-CUDA: a GPU virtualization tool for easily using dozens of GPUs
    成見 哲
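    Invited oral presentation, Japanese, GPU Technology Conference Japan, Invited, GPU Computing Research Group, NVIDIA GK, Toranomon, Tokyo, CUDA has become widespread and more people can write code that uses one GPU, but using dozens of GPUs is still a high hurdle because of issues such as MPI parallelization and machine management. We provide DS-CUDA, a tool that virtualizes GPUs, with which it is relatively easy to use dozens of GPUs and to switch automatically away from failed GPUs. This talk gives an overview of the tool with examples., Domestic conference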
    18 Sep. 2015
  • High Performance Computing on Mobile Devices through Distributed Shared CUDA
    Edgar Josafat Martinez-Noriega
    Oral presentation, English, GPU Technology Conference, San Jose, Through a GPU virtualization tool (DS-CUDA), we remotely use an NVIDIA GPU on our local network to accelerate a molecular dynamics (MD) simulation running on an Android device (NVIDIA SHIELD™). We implement an NaCl MD simulation on Android and accelerate the computation of forces, velocities and coordinates with CUDA through the DS-CUDA tool. We use a laptop equipped with a GeForce GTX 680M (server) connected to our LAN over Gigabit Ethernet. The Android device (client) is connected to the same LAN over Wi-Fi 802.11n. Server and client communicate over a TCP socket. We reached up to 420 Gflops in force computation for a simulation with 5832 ions, 5700 times faster than the 0.073 Gflops delivered by the CPU implementation on NVIDIA SHIELD™., International conference
    18 Mar. 2015
  • Accelerating Unity Applications by Moving Them to the Cloud
    高橋 悠; 伊藤 一輝; 成見 哲
    Oral presentation, Japanese, 148th High Performance Computing Research Meeting, Beppu, Domestic conference
    03 Mar. 2015
  • A Mobile Accelerator Using an FPGA Tablet
    塩谷 丈史; 成見 哲
    Oral presentation, Japanese, 148th High Performance Computing Research Meeting, Beppu, Domestic conference
    02 Mar. 2015
  • Making Logic Circuits with a Single Component in a Virtual Kinematic Environment
    Tetsu Narumi
    Poster presentation, English, Chem-Bio Informatics Society (CBI) Annual Meeting 2014, Chem-Bio Informatics Society (CBI), Tower Hall Funabori, Tokyo, Domestic conference
    28 Oct. 2014
  • Virtualization Technology That Makes Multiple GPUs Easy to Use
    成見 哲
    Keynote oral presentation, Japanese, Prometech Simulation Conference 2014, GPU Computing Workshop for Advanced Manufacturing, Prometech Software, Inc., Tokyo, Domestic conference
    26 Sep. 2014
  • A 3D Tiled Display System Using FPGAs
    堀田 将也; 嶋田 貴行; 大和田 瑛美華; 成見 哲
    Oral presentation, Japanese, 19th Annual Conference of the Virtual Reality Society of Japan, Nagoya, Domestic conference
    17 Sep. 2014
  • Virtualization of Unity Apps for Android
    高橋 悠; 成見 哲
    Poster presentation, Japanese, Entertainment Computing 2014 (EC2014), Nakano, Domestic conference
    13 Sep. 2014
  • Accelerating Processing by Using FPGAs from Android Apps
    塩谷丈史; 成見 哲
    Oral presentation, Japanese, 13th Forum on Information Technology (FIT2014), Tsukuba, Domestic conference
    05 Sep. 2014
  • Proposal of Cube-Shaped Gates for Realizing Large-Scale Logic Circuits That Operate in a Virtual Physical World
    神澤 俊; 成見 哲
    Oral presentation, Japanese, 13th Forum on Information Technology (FIT2014), Tsukuba, Domestic conference
    03 Sep. 2014
  • Fault-Tolerant Molecular Dynamics Simulation Using GPU Virtualization
    老川稔; 野村昴太郎; 泰岡顕治; 成見哲
    Oral presentation, Japanese, 2014 Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2014), IPSJ SIG on High Performance Computing, Niigata, Domestic conference
    30 Jul. 2014
  • Logic gates in kinematic virtual environment
    Yukitoshi Setoguchi; Tetsu Narumi
    Poster presentation, Japanese, 28th Annual Conference of the Japanese Society for Artificial Intelligence (2014), Japanese Society for Artificial Intelligence, Matsuyama, Each individual piece of artificial life (ALife) is usually controlled by an AI that is programmed by the scientist. Therefore, we cannot say that the piece lives entirely in the virtual environment. We propose ALife that works within the kinematic virtual environment, including the program itself. As a first step, logic units in the kinematic environment are implemented using Unity, a game development tool with a kinematic engine. A ring oscillator works faster than the previous one, which was built with PhysX, a lower-level kinematic engine. A comparison of the speed of a simple particle-collision calculation between Unity and CUDA, the lowest-level engine, running on the graphics processing unit, showed that CUDA is much faster than Unity for simple calculations. A simpler NAND gate is also proposed to make complicated logic units easier to build., Domestic conference
    14 May 2014
  • Development of a Mobile Tiled Display
    安枝 光; 堀田将也; 成見 哲
    Oral presentation, Japanese, 76th National Convention of IPSJ, Information Processing Society of Japan, Tokyo, Domestic conference
    11 Mar. 2014
  • Running CUDA on Android Through GPU Virtualization
    Edgar Josafat Martinez-Noriega
    Poster presentation, English, GPU Technology Conference, NVIDIA, San Jose, USA, The way we interact with computers has changed since tablets and smartphones entered daily life. These new smart, interactive devices offer new ways to represent data and interact with it, taking input from different sources. Nowadays they are equipped with multicore CPUs and integrated GPUs; however, they still lack libraries for multicore computing such as CUDA (Compute Unified Device Architecture) or OpenCL. In this work we propose using DS-CUDA, a middleware that allows you to manage NVIDIA GPUs on a distributed network, to run CUDA code on Android devices., International conference
    Mar. 2014
  • Development of a Hands-Free Input Device for Interactive Applications
    椨木 正博; 成見 哲
    Oral presentation, Japanese, Entertainment Computing 2013
    Oct. 2013
  • Development of a Mobile Tiled Display
    安枝 光; 成見 哲
    Oral presentation, Japanese, Entertainment Computing 2013
    Oct. 2013
  • Investigating the Correlation Between Linguistic Features and Impressions of Answers Given in the National Diet
    御崎 淳; 成見 哲
    Oral presentation, Japanese, 12th Forum on Information Technology (FIT2013)
    Sep. 2013
  • Development of a 3D Modeling System Using Augmented Reality
    春田英和; 成見哲
    Oral presentation, Japanese, 2013 Annual Convention of the Institute of Image Information and Television Engineers
    Aug. 2013
  • Large-Scale Simulation with GPUs and Supporting Tools
    成見 哲
    Oral presentation, Japanese, 339th CBI Society Research Seminar
    Jun. 2013
  • Accelerating Replica-Exchange Molecular Dynamics Simulations Using 256 GPUs
    老川 稔; 野村 昴太郎; 川井 敦; 泰岡 顕治; 成見 哲
    Oral presentation, Japanese, 18th Conference on Computational Engineering and Science
    Jun. 2013
  • CUDA Enabled for Android Tablets through DS-CUDA
    Edgar Josafat Martinez-Noriega; Atsushi Kawai; Kazuyuki Yoshikawa; Kenji Yasuoka; Tetsu Narumi
    Oral presentation, Japanese, Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2013)
    May 2013
  • A System for Improving GPGPU Reliability Using DS-CUDA
    吉川 和幸; Edgar Josafat Martinez-Noriega; 川井 敦; 泰岡 顕治; 成見 哲
    Oral presentation, Japanese, Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2013)
    May 2013
  • Large-Scale Molecular Dynamics Simulation with GPUs
    成見 哲
    Oral presentation, Japanese, Computational Polymer Science Research Group, symposium "New Developments in Polymer Research Opened Up by Large-Scale Computer Simulation"
    Mar. 2013
  • Molecular Dynamics Simulation of Water/Ice Coexistence in a Rigid Water Model
    高岩大輔; 坂牧隆司; Amadeu K. Sum; 成見哲; 泰岡顕治
    Oral presentation, Japanese, 26th Molecular Simulation Symposium
    Nov. 2012
  • Accelerating Replica-Exchange Molecular Dynamics Simulations Using GPUs
    野村昴太郎; 老川稔; 川井敦; 成見哲; 泰岡顕治
    Oral presentation, Japanese, 26th Molecular Simulation Symposium
    Nov. 2012
  • An Automatic Redundant Computation System Using GPU Virtualization
    吉川和幸; 川井敦; 泰岡顕治; 成見哲
    Oral presentation, Japanese, 134th High Performance Computing Research Meeting
    Jun. 2012
  • Research on Artificial Life Using High-Performance Computing
    原 健一郎; 成見 哲
    Oral presentation, Japanese, 26th Annual Conference of the Japanese Society for Artificial Intelligence (2012)
    Jun. 2012
  • Accelerating Replica-Exchange Molecular Dynamics Simulations Using GPUs
    野村昴太郎; 坂牧隆司; 成見哲; 泰岡顕治
    Oral presentation, Japanese, 17th Conference on Computational Engineering and Science
    May 2012
  • Analysis of Homogeneous Isotropic Turbulence Using Large-Scale GPUs and the FMM
    成見哲; 横田理央; Lorena Barba; 泰岡顕治
    Oral presentation, Japanese, 19th Hokkaido Workshop on High Performance Computing and Architecture Evaluation
    Nov. 2011
  • Accelerating N-Body Problems with Parallel GPUs
    成見哲
    Oral presentation, Japanese, 3rd Acceleration Technology Presentation and Discussion Meeting
    Sep. 2011
  • Parallel Processing of Monte Carlo Go Simulations with GPGPU
    岩川 夏季; 成見哲; 村松正和
    Oral presentation, Japanese, 26th Game Informatics Research Meeting
    Jul. 2011
  • What the GPU Computing Technology Can or Cannot in the Near Future
    Tetsu Narumi
    Keynote oral presentation, English, Mini Symposium on Computational mechanics on GPUs and modern many-core processors in the 9th World Congress on Computational Mechanics (WCCM) and 4th Asian Pacific Congress on Computational Mechanics (APACOM), IACM (International Association for Computational Mechanics) APACM (Asian Pacific Association for Computational Mechanics), Sydney, Australia, International conference
    Jul. 2010
  • Achieving Enough Accuracy without Double-Precision Hardware to Accelerate Molecular Dynamics Simulations with GPUs
    Tetsu Narumi
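    Invited oral presentation, English, 2nd International Workshop on Advances in Computational Mechanics (IWACOM-II), Japan Society for Computational Engineering and Science, Japan Society of Mechanical Engineers (Computational Mechanics Division), Yokohama National University, Yokohama, Japan, International conference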
    Mar. 2010
  • Implementation and Evaluation of the Particle Mesh Ewald Method on GPUs for Molecular Dynamics Simulations
    坂牧隆司; 成見哲; 泰岡顕治
    Oral presentation, Japanese, 22nd Computational Mechanics Conference
    Oct. 2009
  • Molecular Dynamics Simulation of the Water/Methane Vapor-Liquid Interface
    坂牧隆司; 成見哲; 大村亮; 泰岡顕治
    Public symposium, Japanese, 46th National Heat Transfer Symposium of Japan, Kyoto
    Jun. 2009
  • MD Simulations Using the Isotropic Periodic Sum Method and the Wolf Method
    高橋和義; 成見哲; 泰岡顕治
    Public symposium, Japanese, 46th National Heat Transfer Symposium of Japan, Kyoto
    Jun. 2009
  • Accelerating Molecular Dynamics Simulations with GPUs
    成見哲; 坂牧隆司; 泰岡顕治
    Oral presentation, Japanese, 14th Conference on Computational Engineering and Science
    May 2009
  • Using Special-Purpose and Video-Game Computers for Accelerating Molecular Dynamics Simulations
    Tetsu Narumi; Ryuji Sakamaki; Shun Kameoka; Kenji Yasuoka
    Invited oral presentation, English, International Workshop on Molecular Simulation Studies in Material and Biological Sciences, Joint Institute for Nuclear Research, Dubna, Russia, International conference
    Sep. 2008
  • Special-Purpose Computers and Video-Game Consoles as a High Performance Computing Platform for Molecular Dynamics Simulation
    Tetsu Narumi; Ryuji Sakamaki; Shun Kameoka; Makoto Taiji; Kenji Yasuoka
    Invited oral presentation, English, ICCSE2007 at the 9th High Performance Computing International Conference & Exhibition, Seoul, Korea, International conference
    Sep. 2007
  • Accelerating Molecular Dynamics Simulations by Special-Purpose Computers and Game Consoles
    Tetsu Narumi; Toshikazu Ebisuzaki; Makoto Taiji; Kenji Yasuoka
    Invited oral presentation, English, CALCON2007 (62nd Calorimetry Conference), Hawaii, USA, International conference
    Aug. 2007
  • A High-Speed Special-Purpose Computer for Molecular Dynamics Simulations : MDGRAPE-3
    Tetsu Narumi; Yousuke Ohno; Noriyuki Futatsugi; Noriaki Okimoto; Atsushi Suenaga; Ryoko Yanai; Makoto Taiji
    Invited oral presentation, English, NIC Workshop: From Computational Biophysics to Systems Biology 2006, NIC Series, John von Neumann Institute for Computing, Jülich, Germany, International conference
    Jun. 2006

Courses

  • Computer Science Laboratory I
    The University of Electro-Communications
  • Computer Science Laboratory II A, B
    The University of Electro-Communications
  • Computer Graphics
    The University of Electro-Communications
  • Advanced High Performance Computing
    The University of Electro-Communications
  • Comprehensive Communication Science
    The University of Electro-Communications
  • Innovative Comprehensive Communication Design 2
    The University of Electro-Communications
  • Innovative Comprehensive Communication Design 1
    The University of Electro-Communications
  • Computer Science Laboratory II
    The University of Electro-Communications
  • Computer Engineering
    The University of Electro-Communications
  • Computer Design
    The University of Electro-Communications
  • Engineering Design 2
    The University of Electro-Communications
  • Information Engineering Workshop
    The University of Electro-Communications
  • Graduate Technical English
    The University of Electro-Communications
  • Engineering Design 1
    The University of Electro-Communications

Affiliated academic society

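  • Information Processing Society of Japan
  • IEEE Computer Society
  • Japanese Society for Artificial Intelligence
  • Institute of Electronics, Information and Communication Engineers
  • Virtual Reality Society of Japan
  • Japan Society for Software Science and Technology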

Research Themes

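  • Development of a General-Purpose Wearable Display Using a Tiled Display
    成見哲
    Principal investigator, This research develops a display that can be worn like clothing by using a tiled display. Tiled displays, which arrange many displays in a grid, are used for digital signage and similar applications because they provide large, high-resolution screens at low cost. Smartphones can also be arranged side by side and controlled by software, but this approach causes tearing during video playback. The distinguishing feature of this project is that, by controlling multiple small tablet displays with FPGA hardware, it achieves at low cost a tearing-free wearable display that can be worn like a vest.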
    01 Apr. 2017 - 31 Mar. 2020
  • Large tiled display system with virtualized GPU
    Narumi Tetsu
    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C), The University of Electro-Communications, Grant-in-Aid for Scientific Research (C), We developed a cost-effective tiled display system by designing a custom FPGA board instead of using professional equipment. Because the system is implemented in hardware, tearing cannot occur. We confirmed that the timing difference between displays is very small, since the system works with stereo displays using shutter glasses, which are very sensitive to display synchronization. We also developed software that supports remote OpenGL rendering by extending DS-CUDA, a middleware that virtualizes remote GPUs., 24500108
    01 Apr. 2012 - 31 Mar. 2016
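  • Realization of Large-Scale Logic Circuits in a Virtual Physical World
    成見 哲
    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research on Innovative Areas (Research in a Proposed Research Area), The University of Electro-Communications, Grant-in-Aid for Scientific Research on Innovative Areas (Research in a Proposed Research Area), The purpose of this research is to establish a method for realizing large-scale logic circuits in a virtual physical world inside a computer, ahead of realizing the large-scale logic circuits that are indispensable for molecular robots. In FY2014 we planned to implement large-scale logic circuits using virtual GPU technology, but the work was limited to confirming several functions that robots need. In FY2013 we designed mechanical logic circuits with the game engine Unity and created a cube-shaped NAND gate. However, we found that some circuits could not be wired, so in FY2014 we created cube-shaped AND/NAND gates, which provide basic components from which any logic circuit can be designed. A ring oscillator built from them reached a speed comparable to, though slightly lower than, that of the cube-shaped NAND gate. In addition, to build more complex circuits that move like living things, we created a walking logic circuit by attaching leg parts to the ring oscillator. Of the four elements indispensable to molecular robots ("sensors", "motors", "computers", and "structure"), we built something equipped with "motors" and a "computer", which confirmed the significance of building logic circuits in a virtual physical world. However, we could not scale the circuits up using GPUs. A major reason is that, although PhysX, the physics engine underlying Unity, is advertised as GPU-accelerated, GPU acceleration was not actually implemented for rigid-body models. We also found that Unity requires craftsman-like design work, such as tuning the subtle play between rigid bodies, which became a bottleneck for performance. We concluded that the approach must change, for example by using a simpler physical model and implementing GPU acceleration in a lower-level language such as CUDA., 25104509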
    01 Apr. 2013 - 31 Mar. 2015
  • Development of Enhanced GRAPE library for Quasi general-purpose Computer
    NARUMI Tetsu
    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B), Keio University, Grant-in-Aid for Young Scientists (B), I extended the library program for the MDGRAPE special-purpose computer to quasi-general-purpose computers (PLAYSTATION 3 and GPUs) for molecular dynamics simulations and for turbulence simulations with the vortex method. First, the new MR3 library, the library program for MDGRAPE-3, now supports PS3 and GPUs. Second, parallel PS3s further accelerated MD simulations with large numbers of particles. Third, parallel GPUs further accelerated turbulence simulations with the vortex method., 20700031
    2008 - 2009

Industrial Property Rights

  • Numerical Computation Processing Device
    Patent right, 泰地真弘人, 成見哲, 大野洋介, -, 2006-236256, Date announced: 2006

Media Coverage

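  • Assessing Programming Skills with CBT: The University of Electro-Communications (Interviews with Notable Universities, Part 3)
    Nihon Kyoiku Shimbun, Paper
    21 Jul. 2023
  • Ohayo Nippon (Kanto-Koshinetsu): Aiming for the Olympics! Students' "Area Broadcasting", Live from Chofu, Tokyo
    NHK, Media report
    Sep. 2013
  • Morning edition, Tama area: One-Seg Programs for Visitors; UEC Students Broadcast Only Around the Venue, Ajinomoto Stadium
    The Asahi Shimbun, Paper
    Sep. 2013
  • PS3 Makes It Possible to Predict Molecular Motion
    Nikkei Sangyo Shimbun, Paper
    Dec. 2008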