Search Details ｜The University of Electro-Communications

Name

Author

Position

Affiliation

Research Areas

CONG-KHA PHAM

Department of Computer and Network Engineering	Professor
Cluster II (Emerging Multi-interdisciplinary Engineering)	Professor
UEC ASEAN Research Center	Professor

Researcher Information

Degree

Master of Engineering, Sophia University

Doctor of Engineering, Sophia University

Research Keyword

Hardware System

neural network

analog circuit

digital circuit

Integrated circuit

Field Of Study

Manufacturing technology (mechanical, electrical/electronic, chemical engineering), Electronic devices and equipment

Career

01 Apr. 2017
The University of Electro-Communications, Professor

01 Apr. 2007 - 31 Mar. 2017
The University of Electro-Communications, Associate Professor

01 Apr. 2000 - 31 Mar. 2007
The University of Electro-Communications, Associate Professor

01 Apr. 1996 - 31 Mar. 2000
Tokyo University of Information Sciences, Department of Information Science, Lecturer

01 Apr. 1992 - 31 Mar. 1996
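Sophia University, Faculty of Science and Engineering, Department of Electrical and Electronics Engineering, Research Associate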

Educational Background

Mar. 1992
Sophia University, Graduate School, Division of Science and Engineering, Electrical and Electronics Engineering

Mar. 1990
Sophia University, Graduate School, Division of Science and Engineering, Electrical and Electronics Engineering

Mar. 1989
Sophia University, Faculty of Science and Engineering, Department of Electrical and Electronics Engineering

Member History

Apr. 2012
Electronic Circuits Technical Committee, The Institute of Electrical Engineers of Japan (IEEJ), Society

Apr. 2012
Integrated Circuits Technical Committee, The Institute of Electronics, Information and Communication Engineers (IEICE), Society

Research Activity Information

Award

Oct. 2023
ATC 2023
2023 International Conference on Advanced Technologies for Communications (ATC 2023) Best Paper Award

Sep. 2023
The 5th ASEAN-UEC Workshop on Informatics and Engineering
Young Researchers Encouragement Award, Khai-Duy Nguyen, Tuan-Kiet Dang, Koichiro Ishibashi, Cong-Kha Pham, Trong-Thuc Hoang

Oct. 2022
ISOCC 2022
Korea
The 19th International SoC Conference (ISOCC 2022) Best Paper Award
International society, Korea, Republic of

Sep. 2022
ICICDT 2022
2022 International Conference on IC Design and Technology Best Student Paper Award
International society, Korea, Republic of

Nov. 2021
ISOCC 2021
USA
The 18th International SoC Conference (ISOCC 2021) MetaCNI Award
International society, United States

May 2017
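IEEE, NGUYEN XUAN THUAN (third-year doctoral student, Department of Engineering Science) received the Best Student Paper Award. ISCAS 2017, organized by the IEEE Circuits and Systems Society (CASS), was held from May 28 to 31, 2017, in Baltimore, Maryland, USA. With the goal of connecting dreams to innovation, ISCAS 2017 nurtures creative, research-driven ideas in circuits and systems as they transition into innovation and promote economic development. The program was arranged to reflect the broad range of research areas and applications shared by researchers. The ISCAS 2017 topics comprehensively cover areas of particular importance to the circuits and systems community. For the 2017 regular sessions, 612 of 1,339 submissions from 49 countries and regions were accepted, for an acceptance rate of 45.7%, making it a highly competitive conference, on par with previous years. Submissions from academia accounted for 91% of the total; among these, this paper was one of seven to receive the Best Student Paper Award, which recognizes papers whose first author is a student.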
USA
Best Paper Award of The International Symposium on Circuits and Systems (ISCAS 2017)
International society, United States

Aug. 2012
IEEE
Vietnam
Best Paper Award of 4th International Conference on Communications and Electronics (ICCE2012)
Viet Nam

Paper

Compacting Side-Channel Measurements With Amplitude Peak Location Algorithm.
Thai-Ha Tran; Duc-Thuan Dam; Ba-Anh Dao; Van-Phuc Hoang; Cong-Kha Pham; Trong-Thuc Hoang
IEEE Trans. Very Large Scale Integr. Syst., 32, 3, 573-586, Mar. 2024, Peer-reviewed
Scientific journal
URL
DOI URL

Accumulator-Based 16-Bit Processor for Wireless Sensor Nodes
Tuan Kiet Dang; Khai Duy Nguyen; Cong Kha Pham; Trong Thuc Hoang
IEEE Transactions on Circuits and Systems II: Express Briefs, Feb. 2024, Peer-reviewed, Wireless sensor network (WSN) has emerged as a significant application among Internet-of-Things (IoT) applications. Energy harvesting systems have a high potential for deployment in WSN to monitor natural environments and industrial equipment. With limited resources, including power and chip area, an energy harvesting system demands thorough resource allocation to several circuits like a control system, sensors, and a transceiver. Also, such systems are required to function with low-peak power to adapt to the fluctuation of harvested energy. This research presents a System-on-Chip (SoC) featuring a tiny 16-bit processor for batteryless systems. The processor is implemented using an accumulator-based instruction set architecture, realizing a small-scale design. The SoC integrates the 16-bit processor, two static random-access-memory blocks (1KB and 512B) for instruction and data memory, and peripherals for communication. It is fabricated on general-purpose CMOS 180nm and Silicon On-Thin-Buried Oxide (SOTB) 65nm process. Implemented results show the total area cost of the SoC is 241,036μm² and 52,558μm² on CMOS 180nm and SOTB. The SoC design achieves low-peak power consumption at 0.6μW and on the CMOS 180nm chip. Power consumption can decline further with a key technique in varying the back body bias by SOTB technology to 21.56nW. The minimum energy point is observed to be 10.38μW/MHz and 0.64μW/MHz in CMOS 180nm and SOTB 65nm chips, respectively. The small-scale features in size and power dissipation make the proposed SoC suitable for energy harvesting applications.
Scientific journal
DOI URL

High-Speed NTT Accelerator for CRYSTAL-Kyber and CRYSTAL-Dilithium.
Trong-Hung Nguyen; Binh Kieu-Do-Nguyen; Cong-Kha Pham; Trong-Thuc Hoang
IEEE Access, 12, 34918-34930, Feb. 2024, Peer-reviewed
Scientific journal
URL
DOI URL

FPGA-Based Secured and Efficient Lightweight IoT Edge Devices with Customized RISC-V
Nguyen The Binh; Binh Kieu-Do; Trong-Thuc Hoang; Pham Cong-Kha; Cuong Pham-Quoc
2023 RIVF International Conference on Computing and Communication Technologies (RIVF), IEEE, 23 Dec. 2023
International conference proceedings
URL
DOI URL

Revealing Secret Key from Low Success Rate Deep Learning-Based Side Channel Attacks
Van-Phuc Hoang; Ngoc-Tuan Do; Trong-Thuc Hoang; Cong-Kha Pham
2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), IEEE, 18 Dec. 2023
International conference proceedings
URL
DOI URL

A High-Speed Barret-Based Modular Multiplication with Bit-Correction for the CRYSTAL-KYBER Cryptosystem
Trong-Hung Nguyen; Cong-Kha Pham; Trong-Thuc Hoang
Intelligence of Things: Technologies and Applications, Springer Nature Switzerland, 191-199, 20 Oct. 2023
In book
URL
DOI URL

A High-Performance Pipelined FPGA-SoC Implementation of SHA3-512 for Single and Multiple Message Blocks
Tan-Phat Dang; Tuan-Kiet Tran; Trong-Thuc Hoang; Cong-Kha Pham; Huu-Thuan Huynh
Intelligence of Things: Technologies and Applications, Springer Nature Switzerland, 288-298, 20 Oct. 2023
In book
URL
DOI URL

Optimizing ECC Implementations Based on SoC-FPGA with Hardware Scheduling and Full Pipeline Multiplier for IoT Platforms
Tuan-Kiet Tran; Tan-Phat Dang; Trong-Thuc Hoang; Cong-Kha Pham; Huu-Thuan Huynh
Intelligence of Things: Technologies and Applications, Springer Nature Switzerland, 299-309, 20 Oct. 2023
In book
URL
DOI URL

The Efficiency of High-performance SHA-3 Accelerator on the System Level
Tan-Phat Dang; Tuan-Kiet Tran; Trong-Thuc Hoang; Cong-Kha Pham; Huu-Thuan Huynh
2023 International Symposium on Electrical and Electronics Engineering (ISEE), IEEE, 19 Oct. 2023
International conference proceedings
URL
DOI URL

An Efficient Cryptographic Accelerators for IoT System Based on Elliptic Curve Digital Signature
Huu Thuan Huynh; Tan Phat Dang; Trong Thuc Hoang; Cong Kha Pham; Tuan Kiet Tran
Communications in Computer and Information Science, 1950 CCIS, 106-118, Oct. 2023, Peer-reviewed, Given the importance of security requirements in today’s Internet of Things (IoT) landscape, this study focuses on enhancing the security of IoT systems through encryption algorithms implemented on the SoC-FPGA platform. Specifically, this paper presents the development of a hardware system that generates public keys and performs digital signature generation and verification using the Elliptic Curve Digital Signature Algorithm (ECDSA) based on the SECP256K1 curve. The ECDSA encryption hardware functions as a co-processor and demonstrates a maximum operating frequency of 30 MHz. In terms of performance, the ECDSA IP achieves efficient processing speeds, taking approximately 17 ms to generate a public key and produce a digital signature and nearly 30 ms to verify the digital signature. These results showcase a well-balanced design that enables a trade-off between speed, area, and power dissipation in the proposed system.
International conference proceedings
DOI URL

A High-Efficiency Modular Multiplication Digital Signal Processing for Lattice-Based Post-Quantum Cryptography
Trong-Hung Nguyen; Cong-Kha Pham; Trong-Thuc Hoang
Cryptography, Sep. 2023, Peer-reviewed
Scientific journal
URL
DOI URL

Efficiency System-level SHA-3 Accelerator for IoT
Thuan Huu Huynh; Phat Tan Dang; Kiet Tuan Tran; Thuc Trong Hoang; Kha Cong Pham
18 Aug. 2023, Peer-reviewed
DOI URL

A Survey of Post-Quantum Cryptography: Start of a New Race
Thuan Dam; Thai-Ha Tran; Van-Phuc Hoang; Cong-Kha Pham; Trong-Thuc Hoang
Cryptography, Aug. 2023, Peer-reviewed
Scientific journal
URL
DOI URL

Dynamic Gold Code-Based Chaotic Clock for Cryptographic Designs to Counter Power Analysis Attacks
Thai-Ha Tran; Anh-Tien Le; Trong-Thuc Hoang; Van-Phuc Hoang; Cong-Kha Pham
Proceedings of the Great Lakes Symposium on VLSI 2023, ACM, 05 Jun. 2023, Peer-reviewed
International conference proceedings
URL
DOI URL

In-NVRAM Unified PUF and TRNG Based on Standard CMOS Technology
Ronaldo Serrano; Marco Sarmiento; Ckristian Duran; Tuan-Kiet Dang; Trong-Thuc Hoang; Cong-Kha Pham
2023 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 21 May 2023
International conference proceedings
URL
DOI URL

Design of an SoC Based on 32-Bit RISC-V Processor with Low-Latency Lightweight Cryptographic Cores in FPGA
Khai-Minh Ma; Duc-Hung Le; Cong-Kha Pham; Trong-Thuc Hoang
Future Internet, MDPI AG, 15, 5, 186-186, 19 May 2023, Peer-reviewed, The security of Internet of Things (IoTs) devices in recent years has created interest in developing implementations of lightweight cryptographic algorithms for such systems. Additionally, open-source hardware and field-programmable gate arrays (FPGAs) are gaining traction via newly developed tools, frameworks, and HDLs. This enables new methods of creating hardware and systems faster, more simply, and more efficiently. In this paper, the implementation of a system-on-chip (SoC) based on a 32-bit RISC-V processor with lightweight cryptographic accelerator cores in FPGA and an open-source integrating framework is presented. The system consists of a 32-bit VexRiscv processor, written in SpinalHDL, and lightweight cryptographic accelerator cores for the PRINCE block cipher, the PRESENT-80 block cipher, the ChaCha stream cipher, and the SHA3-512 hash function, written in Verilog HDL and optimized for low latency with fewer clock cycles. The primary aim of this work was to develop a customized SoC platform with a register-controlled bus suitable for integrating lightweight cryptographic cores to become compact embedded systems that require encryption functionalities. Additionally, custom firmware was developed to verify the functionality of the SoC with all integrated accelerator cores, and to evaluate the speed of cryptographic processing. The proposed system was successfully implemented in a Xilinx Nexys4 DDR FPGA development board. The resources of the system in the FPGA were low with 11,830 LUTs and 9552 FFs. The proposed system can be applicable to enhancing the security of Internet of Things systems.
Scientific journal
URL
DOI URL

A flexible and efficient FPGA-based random forest architecture for IoT applications
Trung Pham Dinh; Cuong Pham-Quoc; Tran Ngoc Thinh; Binh Kieu Do Nguyen; Pham Cong Kha
Internet of Things, Elsevier BV, 100813-100813, May 2023, Peer-reviewed
Scientific journal, English
DOI URL

Transition Factors of Power Consumption Models for CPA Attacks on Cryptographic RISC-V SoC
Thai Ha Tran; Ba Anh Dao; Trong Thuc Hoang; Van Phuc Hoang; Cong Kha Pham
IEEE Transactions on Computers, Mar. 2023, Peer-reviewed, Physical cryptographic devices are vulnerable to side-channel information leakages during operation. They are widely used in software as well as hardware implementations, ranging from microcontrollers and microprocessors to hardware accelerators in System on Chips (SoCs). Nowadays, cryptographic RISC-V SoCs are becoming the most prominent solution compared to the rest. Cryptographic accelerators provide users with a very high level of flexibility and customization of chips suited to specific applications in these systems. First, this research aims to confirm the effectiveness of the Correlation Power Analysis attack on cryptographic SoCs based on three different power consumption models. In each model, the effectiveness of an attack depends on the transition factor, which is a ratio related to different characteristics of the device's power consumption. Then, we focus on modifying the configuration on the SoC and attacking the AES hardware implementation on these designs. The experimental results show that applying the Switching Distance model brings the highest performance. With our suggested range of transition factors, the number of traces needed to find the secret key can be reduced by 13.35% in the best case.
Scientific journal
DOI URL

Multi-Functional Resource-Constrained Elliptic Curve Cryptographic Processor
Binh Kieu Do-Nguyen; Cuong Pham-Quoc; Ngoc-Thinh Tran; Cong-Kha Pham; Trong-Thuc Hoang
IEEE Access, 11, Jan. 2023, Peer-reviewed
Scientific journal
URL
DOI URL

A cross-process Spectre attack via cache on RISC-V processor with trusted execution environment.
Anh-Tien Le; Trong-Thuc Hoang; Ba-Anh Dao; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
Computers & Electrical Engineering, Elsevier BV, 105, 108546-108546, Jan. 2023, Peer-reviewed
Scientific journal, English
URL
DOI URL

On the performance of non‐profiled side channel attacks based on deep learning techniques
Ngoc‐Tuan Do; Van‐Phuc Hoang; Van Sang Doan; Cong‐Kha Pham
IET Information Security, Institution of Engineering and Technology (IET), 17, 3, 377-393, 20 Dec. 2022
Scientific journal
URL
URL 2
DOI URL

Design of a Low-Power and Low-Area 8-Bit Flash ADC Using a Double-Tail Comparator on 180 nm CMOS Process.
Hong-Hai Thai; Cong-Kha Pham; Duc-Hung Le
Sensors, 23, 1, 76-76, Dec. 2022, Peer-reviewed
Scientific journal
URL
DOI URL

A 3.65 Gb/s Area-Efficiency ChaCha20 Cryptocore
Ronaldo Serrano; Marco Sarmiento; Ckristian Duran; Trong-Thuc Hoang; Cong-Kha Pham
Proc. of 19th International SoC Conference (ISOCC 2022), 19th International SoC Conference (ISOCC 2022), 79-80, 19 Oct. 2022, Peer-reviewed, In the last decade, the efforts to provide a secure channel for end-to-end communications have focused on developing high-throughput, side-channel resistant, and hardware efficiency implementations in the Advanced Encryption Standard (AES). However, the relevance of the ChaCha20 cipher increases due to the addition in Transport Layer Security 1.3, generating another solution different than AES to provide a secure channel in end-to-end communications in computer networks. This paper shows the hardware efficiency perspective on the ChaCha20 cipher. The ChaCha20 is implemented in a 0.18μm standard CMOS technology, occupying a 25.05-kGE. In addition, the implementation reports a 67.17-mW and 145-Kbps/GE of power consumption and hardware efficiency, respectively. The ChaCha20 implementation increased 40% of hardware efficiency compared with the related works.
International conference proceedings, English
URL
DOI URL

A Novel Ring Oscillator PUF for FPGA Based on Feedforward Ring Oscillators
Tuan-Kiet Dang; Ronaldo Serrano; Trong-Thuc Hoang; Cong-Kha Pham
Proc. of 19th International SoC Conference (ISOCC 2022), 19th International SoC Conference (ISOCC 2022), 87-88, 19 Oct. 2022, Peer-reviewed, A Physical Unclonable Function (PUF) exploits uncontrollable variations in manufacturing to characterize an integrated circuit. There have been many PUF designs proposed which apply different strategies to extract process variation on Field Programmable Gate Arrays (FPGAs). Ring Oscillator PUF (RO PUF) is one of the FPGA-friendly designs taking advantage of the difference in hardware delay to generate an unpredictable output of a silicon device. This paper proposes a novel variation of RO PUF on FPGA based on feedforward ring oscillators (FRO). The experiment results of FRO PUF are conducted on Xilinx Artix-7 FPGA and illustrate satisfactory results in uniqueness, uniformity, and reliability with 50.23%, 52.64%, and 95.92%, respectively.
International conference proceedings, English
URL
DOI URL

A Unified PUF and Crypto Core Exploiting the Metastability in Latches
Ronaldo Serrano; Ckristian Duran; Marco Sarmiento; Tuan-Kiet Dang; Trong-Thuc Hoang; Cong-Kha Pham
Future Internet 2022, MDPI, 14, 10, 1-12, 17 Oct. 2022, Peer-reviewed, Hardware acceleration of cryptography algorithms represents an emerging approach to obtain benefits in terms of speed and side-channel resistance compared to software implementations. In addition, a hardware implementation can provide the possibility of unifying the functionality with some secure primitive, for example, a true random number generator (TRNG) or a physical unclonable function (PUF). This paper presents a unified PUF-ChaCha20 in a field-programmable gate-array (FPGA) implementation. The problems and solutions of the PUF implementation are described, exploiting the metastability in latches. The Xilinx Artix-7 XC7A100TCSG324-1 FPGA implementation occupies 2416 look-up tables (LUTs) and 1026 flip-flops (FFs), reporting a 3.11% area overhead. The PUF exhibits values of 49.15%, 47.52%, and 99.25% for the average uniformity, uniqueness, and reliability, respectively. Finally, ChaCha20 reports a speed of 0.343 cycles per bit with the unified implementation.
Scientific journal, English

A System-on-Chip for IoT Applications with 16-bit Tiny Processor
Dang Tuan Kiet; Khai-Duy Nguyen; Nguyen Quang Nhu Quynh; Trong-Thuc Hoang; Cong-Kha Pham
Proc. of 2022 International Conference on IC Design and Technology (ICICDT), 2022 International Conference on IC Design and Technology (ICICDT), 1-4, 21 Sep. 2022, Peer-reviewed, In an Internet of Things (IoT) system, many embedded devices are deployed to gather massive amounts of information. These devices may collect indexes of natural substances (air, soil, water) or physiological parameters to provide data for later assessment on environmental conditions or improving healthcare. Gathering this information requires embedded processors to execute lightweight tasks involving sensing and communication through a wireless channel. This paper presents a low-area System on Chip (SoC) capable of performing sensing tasks for IoT applications. The SoC contains an accumulated architecture 16-bit processor, 512-Byte and 1-Kilobyte Random Access Memory (RAM) blocks for data and instruction, a General-Purpose Input/Output (GPIO), a Serial Peripheral Interface (SPI), and a programmer. The SoC has been synthesized in 65-nm Silicon-On-Thin-Box (SOTB) technology and occupies 350×600 μm². The processor area represents only 3.56% of the total SoC. The implementation of FPGA (Altera Cyclone IV EP4CE115) costs 373 LUTs, 202 Flip-flops, and 2 Block RAMs.
International conference proceedings, English

High-speed FPGA-based Design and Implementation of Text Search Processor
Binh Kieu-Do-Nguyen; Dang Tuan Kiet; Trong-Thuc Hoang; Katsumi Inoue; Toshinori Usugi; Masanori Odaka; Shuichi Kameyama; Cong-Kha Pham
Proc. of 2022 International Conference on IC Design and Technology (ICICDT), 2022 International Conference on IC Design and Technology (ICICDT), 1-4, 21 Sep. 2022, Peer-reviewed, In the age of computer evolution, the amount of data grows swiftly. Moreover, the requirement of extracting the information from the database becomes urgent. Full-text search provides methods to quickly locate multiple keywords inside extensive text data and has gained more consideration in recent years. The proposed tools, such as Lucene, Hyper Estraier, and Namazu, are based on general-purpose processors. They spend more time indexing input documents and require more space to store these indexes. In this work, we provide a text search processor design that could perform the full-text search without indexing. The text search processor offers high performance, a high level of parallelism, and scalability. The design is deployed on Field Programmable Gate Arrays (FPGA) platforms. More than 70K processing units can be integrated on Xilinx Alveo U50. The working frequency achieves 266-MHz after place and route.
International conference proceedings, English

An Efficient Masking Method for AES Using Tower Fields
Khac-Hoan Pham; Thai-Ha Tran; Thi-Phuong Nguyen; Cong-Kha Pham
Proc. of 2022 IEEE Ninth International Conference on Communications and Electronics (ICCE), 2022 IEEE Ninth International Conference on Communications and Electronics (ICCE), 207-212, 27 Jul. 2022, Peer-reviewed, A combination of the Boolean masking with the multiplicative masking for AES S-box is secured against side-channel attacks, particularly power analysis attacks. However, it is paid for by significantly increasing the complexity of the S-box implementation in hardware. This paper proposes a masking method based on the inversion in the tower field to cope with that problem. The experimental results show that the proposed method assures security against CPA up to 30,000 traces with the AES-TFM-2 scheme and more than 12,660 traces with the AES-TFM-1 one. There is a trade-off between the security level and the hardware implementation cost of the two proposed schemes. However, the technique reduces that cost considerably compared to existing approaches, and it is also secured against zero-value attacks.
International conference proceedings, English
DOI URL

A Unified NVRAM and TRNG in Standard CMOS Technology
Ronaldo Serrano; Ckristian Duran; Marco Sarmiento; Cong-Kha Pham
IEEE Access, IEEE, 10, 79213-79221, 25 Jul. 2022, Peer-reviewed, This paper presents a unified NVRAM-TRNG in a 0.18μm standard CMOS technology without an additional mask or fabrication steps. The unified implementation does not need additional circuits for the random number generation mode. The differential NVRAM bit cell is implemented using a high voltage transistor to resist the non-volatile memory application. The NVRAM presents times of 15-ms of programming and erasing and 20-ns of reading functions. The bit cell needs a voltage of 8.5-V for the programming and erasing functions. The TRNG implemented passes the NIST SP800-22 statistical test and NIST SP800-90B entropy test with a 0.9859 minimum entropy. The entropy and statistical test are applied with Process, Voltage, and Temperature (PVT) variations. The implementation occupies 476 μm² with a normalized area of 14.69×10³ F². Besides, the NVRAM bit cell in TRNG mode shows a bit rate of 50-Mbps. Finally, the implementation as TRNG reports a 49.5-μW power consumption with 0.99-pJ/bit energy efficiency.
Scientific journal, English
DOI URL

High-performance Multi-function HMAC-SHA2 FPGA Implementation
Binh Kieu-Do-Nguyen; Trong-Thuc Hoang; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
Proc. of 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), 30-34, 19 Jun. 2022, Peer-reviewed, Today, Hash-based Message Authentication Code with Secure Hash Algorithm 2 (HMAC-SHA2) is widely used in modern protocols, such as in Internet Protocol Security (IPSec) and Transport Layer Security (TLS). Many authors proposed their HMAC-SHA2 hardware implementations. Some targeted a high-performance design, while others aimed to satisfy an area constraint. Those implementations are acceptable for applications that require only low-cost or high throughput. However, some applications, such as Software-Defined Networking (SDN), Internet-of-Things (IoT), and Wireless Sensor Network (WSN), need an efficient design that can satisfy both merits. In this paper, an FPGA implementation is proposed that can operate on multiple HMAC-SHA2 variants without re-synthesis. The proposed architecture achieves high performance with a low-cost area. The experimental results show that it can run up to 380-MHz, more than 4.8 Giga-bit-per-second (Gbps), with fewer resources compared to other similar designs.
International conference proceedings, English
DOI URL

Low-Cost Area-Efficient FPGA-Based Multi-Functional ECDSA/EdDSA
Binh Kieu-Do-Nguyen; Cuong Pham-Quoc; Ngoc-Thinh Tran; Cong-Kha Pham; Trong-Thuc Hoang
Cryptography 2022, MDPI, 10, 1-14, 10 May 2022, Peer-reviewed, In cryptography, elliptic curve cryptography (ECC) is considered an efficient and secure method to implement digital signature algorithms (DSAs). ECC plays an essential role in many security applications, such as transport layer security (TLS), internet protocol security (IPsec), and wireless sensor networks (WSNs). The proposed designs of ECC hardware implementation only focus on a single ECC variant and use many resources. These proposals cannot be used for resource-constrained applications or for the devices that need to provide multiple levels of security. This work provides a multi-functional elliptic curve digital signature algorithm (ECDSA) and Edwards-curve digital signature algorithm (EdDSA) hardware implementation. The core can run multiple ECDSA/EdDSA algorithms in a single design. The design consumes fewer resources than the other single-functional design, and is not based on digital signal processors (DSP). The experiments show that the proposed core could run up to 112.2 megahertz with Virtex-7 devices while consuming only 10,259 slices in total.
Scientific journal, English

ChaCha20-Poly1305 Authenticated Encryption with Additional Data for Transport Layer Security 1.3
Ronaldo Serrano; Ckristian Duran; Marco Sarmiento; Cong-Kha Pham; Trong-Thuc Hoang
Cryptography 2022, MDPI, 10, 1-12, 10 May 2022, Peer-reviewed, This paper shows ChaCha20 and Poly1305 primitives. In addition, a compatible ChaCha20–Poly1305 AEAD with TLS 1.3 is implemented with a fault detector to reduce the problems in fragmented blocks. The AEAD implementation reaches 1.4-cycles-per-byte in a standalone core. Additionally, the system implementation presents 11.56-cycles-per-byte in an RISC-V environment using a TileLink bus. The implementation in Xilinx Virtex-7 XC7VX485T Field-Programmable Gate-Array (FPGA) denotes 10,808 Look-Up Tables (LUT) and 3731 Flip-Flops (FFs), represented in 23% and 48% of ChaCha20 and Poly1305, respectively. Finally, the hardware implementation of ChaCha20–Poly1305 AEAD demonstrates the viability of using a different option from the conventional cipher suite based on AES for TLS 1.3.
Scientific journal, English

Trusted Execution Environment Hardware by Isolated Heterogeneous Architecture for Key Scheduling
Trong-Thuc Hoang; Ckristian Duran; Ronaldo Serrano; Marco Sarmiento; Khai-Duy Nguyen; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 1-1, Apr. 2022
Scientific journal
URL
DOI URL

A Robust and Healthy Against PVT Variations TRNG based on Frequency Collapse
Ronaldo Serrano; Ckristian Duran; Marco Sarmiento; Trong-Thuc Hoang; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 1-1, Apr. 2022
Scientific journal
URL
DOI URL

Systems on a Chip with 8bits and 32bits Processors in 0.18μm Technology for IoT Applications
Marco Sarmiento; Khai-Duy Nguyen; Ckristian Duran; Trong-Thuc Hoang; Ronaldo Serrano; Koichiro Ishibashi; Cong-Kha Pham
IEEE Transactions on Circuits and Systems II: Express Briefs, IEEE, 69, 5, 2438-2442, 22 Mar. 2022, Peer-reviewed, The Internet-of-Things applications use embedded processors to execute lightweight tasks for sensing and management of communications. These applications use different energy reducing strategies such as clock gating and domain switching. However, some power supplies for sensor systems are designed for low-power delivery rather than low-energy battery consumption. Regarding power consumption, it is important to choose the system-based processor in which some variables are taken into account. Depending on the final IoT application, such variables are power consumption, area, performance, and software tools. This paper presents an 8bits and 32bits based System on Chip (SoC) in a General Purpose (GP) CMOS technology. The two processors are implemented in the same tape-out and the same peripherals. The experiment results show a 1.69μW and 1.76μW in the 32bits and 8bits SoC, respectively. In terms of area, the 32bits processor is 46% overhead of the 8bits processor, with 6.6-kGE over 3.6-kGE. Finally, the 32bits SoC presents a 1.11 DMIPS and 8bits SoC a 1.38 DMIPS.
Scientific journal, English
DOI URL

Low Complexity Correlation Power Analysis by Combining Power Trace Biasing and Correlation Distribution Techniques
Ngoc-Tuan Do; Van-Phuc Hoang; Cong-Kha Pham
IEEE Access, IEEE, 10, 17578-17589, 10 Feb. 2022, Peer-reviewed, This paper proposes new methods to reduce the computation time by using Point of Interest (POI) extractor with the power trace biasing technique and the correlation distribution for the low complexity correlation power analysis (CPA). The theoretical explanations are provided and the experiments on different platforms such as ASCAD and RISC-V processor based databases are conducted to justify the proposed techniques. Especially, our experiments are performed with different protected schemes such as masking, hiding and combined hiding-masking techniques. The experimental results indicate that our proposed methods provide reliable results in comparison with the standard CPA. By using only a half of the power traces for taking the POIs, our first proposal not only decreases the execution time approximately by half but also enhances the success rate of the attack. Moreover, the second method based on power trace biasing technique is proposed in order to achieve better results and reduce the number of traces needed for selecting the POIs. With only 28.9% of given power traces needed, our second proposed technique reduces the execution time to only 2.6 times of the standard CPA.
Scientific journal, English
DOI URL

A Combined Blinding-Shuffling Online Template Attacks Countermeasure Based on Randomized Domain Montgomery Multiplication
Bien-Cuong Nguyen; Cong-Kha Pham
Proc. of 2022 IEEE International Conference on Consumer Electronics (ICCE), 2022 IEEE International Conference on Consumer Electronics (ICCE), 1-6, 07 Jan. 2022, Peer-reviewed, Online template attacks (OTA), high-efficiency side-channel attacks, are initially presented to attack the elliptic curve scalar. The modular exponentiation is similarly vulnerable to OTA. The correlation between modular multiplication's intermediate products is a crucial leakage of the modular exponent. This paper proposes a practical OTA countermeasure based on randomized domain Montgomery multiplication, which combines blinding and shuffling methods to eliminate the correlation between modular multiplication's inner products without additional computation requirements. The proposed OTA countermeasure is implemented on the Sakura-G board under the assumption that the target board and template board are identical. The experiment results show that the proposed countermeasure is sufficient to protect the modular exponentiation from OTA.
International conference proceedings, English

An Obstacle Avoidance Two-Wheeled Self-Balancing Robot
Ryuichi Tsutada; Trong-Thuc Hoang; Cong-Kha Pham
International Journal of Mechanical Engineering and Robotics Research, IJMERR, 11, 1, 1-7, Jan. 2022, Peer-reviewed, This paper introduces a Two-Wheeled Self-Balancing Robot (TWSBR) which is controlled to avoid obstacles. The TWSBR is a type of inverted pendulum and is treated as an inherently unstable nonlinear system. Therefore, continuous appropriate control is required to maintain the inverted state. The TWSBR consists of two DC motors with encoders and a 6-axis sensor (accelerometer and gyroscope). All peripherals are connected to a 32-bit RISC-V soft microprocessor implemented on an FPGA, and all control circuits for the peripherals are also implemented on the same FPGA. The attitude control system of the TWSBR is provided through 3 Proportional-Integral-Differential (PID) controllers with sensor fusion based on a Kalman Filter, which is implemented on the 32-bit RISC-V soft microprocessor. The obstacle avoidance system of the TWSBR is based on fuzzy control using multiple ultrasonic sensors. The 32-bit RISC-V soft microprocessor includes 32-bit fixed-point (Q16.16) arithmetic instructions for addition, subtraction, multiplication, maximum, and minimum as custom instruction set architecture (ISA) extensions to speed up calculation.
Scientific journal, English
DOI URL
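
The Q16.16 format mentioned in the abstract above stores a number in a 32-bit word with 16 integer bits and 16 fractional bits. A minimal software sketch of the five operations follows; the function names are hypothetical, and the wrap-around behavior on overflow is an assumption, since the abstract does not say whether the custom instructions wrap or saturate.

```python
FRAC_BITS = 16          # number of fractional bits in Q16.16
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Wrap an integer into the signed 32-bit range (assumed overflow behavior)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def to_q16(value):
    """Convert a Python float to Q16.16."""
    return to_signed32(int(round(value * ONE)))


def from_q16(q):
    """Convert Q16.16 back to a Python float."""
    return q / ONE


def q16_add(a, b):
    return to_signed32(a + b)


def q16_sub(a, b):
    return to_signed32(a - b)


def q16_mul(a, b):
    # The full product carries 32 fractional bits; shift right by 16 to return to Q16.16.
    return to_signed32((a * b) >> FRAC_BITS)


def q16_max(a, b):
    return a if a >= b else b


def q16_min(a, b):
    return a if a <= b else b
```

Only multiplication needs the extra shift; addition, subtraction, maximum, and minimum operate on the raw 32-bit words directly, which is one reason fixed-point arithmetic maps well onto a small soft processor.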

Design of a High-speed 8-bit Flash ADC using Double-Tail Comparator on 180nm CMOS Process
Hong-Hai Thai; Cong-Kha Pham; Duc-Hung Le
Proc. of 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), 1-4, 21 Dec. 2021, Peer-reviewed, This paper presents a high-speed 8-bit Flash ADC. The design, which is considered a mixed-signal type, includes two main blocks: a comparator block and an encoder block. The comparator block contains a TIQ comparator, a control circuit, and a proposed architecture of a Double-Tail (DT) comparator. The advantage of using the DT comparator is that it halves the number of comparators, which helps reduce the design area. The comparator is implemented with custom analog design, while the encoder block is designed with a digital design flow. This mixed-signal circuit is designed and simulated on 180-nm CMOS technology. The 8-bit Flash ADC employs only 128 comparators. The applied input clock for testing is 50 MHz, with the input voltage ranging from 0.6 V to 1.8 V. The comparator block outputs 127 bits of thermometer code and sends them to the encoder, which exports the 7 LSBs of the binary code. The MSB is decided by only one DT comparator.
International conference proceedings, English
DOI URL
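
The thermometer-to-binary step described in the abstract above can be illustrated with a short sketch. It assumes an ideal thermometer code in which all comparators below the input level output 1 and all others output 0; the function name and the bubble check are illustrative and do not describe the paper's encoder circuit.

```python
def thermometer_to_binary(bits):
    """Convert an ideal thermometer code to its integer value.

    bits: list of 0/1 comparator outputs, lowest threshold first.
    With 127 comparator outputs the result fits in 7 bits (0..127).
    """
    ones = sum(bits)
    # An ideal code is a run of ones followed by a run of zeros.
    if list(bits) != [1] * ones + [0] * (len(bits) - ones):
        raise ValueError("not a valid thermometer code (bubble detected)")
    return ones


# 100 of the 127 comparators have tripped, so the 7 LSBs encode 100.
code = thermometer_to_binary([1] * 100 + [0] * 27)
lsb_bits = format(code, "07b")
```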

A Low-Power Low-Area SoC based in RISC-V Processor for IoT Applications
Ronaldo Serrano; Marco Sarmiento; Ckristian Duran; Khai-Duy Nguyen; Trong-Thuc Hoang; Koichiro Ishibashi; Cong-Kha Pham
Proc. of 2021 18th International SoC Design Conference (ISOCC), 2021 18th International SoC Design Conference (ISOCC), 1-4, 21 Nov. 2021, Peer-reviewed, IoT applications use embedded processors to execute lightweight tasks for sensing and management of communications, using different energy harvesting strategies. However, many IoT applications need low power consumption because of limited power supplies. This paper presents a low-power low-area System On a Chip (SoC) for IoT applications with a stable power supply. The SoC consists of a microprocessor, 1 KB of Static Random Access Memory (SRAM), a debug module, a timer, General-Purpose In-Outs (GPIO), and a Serial Peripheral Interface (SPI) programmer. The processor uses the RISC-V Instruction Set Architecture (ISA). The implementation is fabricated in 0.18-µm CMOS General Purpose (GP) technology and occupies 750 µm × 536 µm. The microprocessor represents only 7.6% of the area of the whole SoC. Measurements show a power consumption of 2.17 µW at a 1-V supply voltage and a 32-kHz operating clock frequency.
International conference proceedings, English
DOI URL

ChaCha20-Poly1305 Crypto Core Compatible with Transport Layer Security 1.3
Ronaldo Serrano; Ckristian Duran; Trong-Thuc Hoang; Marco Sarmiento; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
Proc. of 2021 18th International SoC Design Conference (ISOCC), 2021 18th International SoC Design Conference (ISOCC), 1-4, 21 Nov. 2021, Peer-reviewed, The security of information is a vital part of all communication protocols. In computer networks, Transport Layer Security (TLS) accounts for the majority of secure-channel use in end-to-end communications. However, efforts so far have been directed only at optimizing software implementations. This paper shows an Authenticated Encryption with Associated Data (AEAD) hardware implementation of ChaCha20-Poly1305 compatible with TLS 1.3. Compared to a software implementation in a RISC-V environment, the performance increases by 7×. The AEAD implementation reaches a speed of 21.5 cycles/byte. The design is implemented in a Xilinx Virtex-7 XC7VX485T Field-Programmable Gate Array (FPGA), using 7897 Look-Up Tables (LUT) and 4840 Flip-Flops (FF), with ChaCha20 accounting for 26% and Poly1305 for 54%.
International conference proceedings, English
DOI URL

A Power-efficient Implementation of SHA-256 Hash Function for Embedded Applications
Binh Kieu-Do-Nguyen; Trong-Thuc Hoang; Cong-Kha Pham; Cuong Pham-Quoc
Proc. of 2021 International Conference on Advanced Technologies for Communications (ATC), 2021 International Conference on Advanced Technologies for Communications (ATC), 1-4, 14 Oct. 2021, Peer-reviewed, SHA-256 is a well-known algorithm widely used in many security applications. The algorithm provides a sufficient level of safety and can be performed efficiently by FPGA devices due to its high level of parallelism. This paper presents a high-throughput, low-resource, and power-efficient architecture of the SHA-256 algorithm targeting FPGA-based embedded platforms. The SHA-256 computing core takes advantage of the specific architecture of FPGAs to achieve high performance. We implement the SHA-256 computing core with hardware description languages so that the computing core is technology-independent. Therefore, the computing core is suitable for building applications with various FPGA-based platforms. We conduct several experiments with both simulation and SoC boards. The experimental results show that the core achieves the same functionality, performance, and power consumption when implemented on different FPGA families. The implemented system with our SHA-256 computing core can function at 139.04 MHz, achieving a bandwidth of up to 1.04 Gbps. The SHA-256 computing core is power-efficient, consuming only 0.072 W with the minimum configuration.
International conference proceedings, English
DOI URL

A Sub-μW Reversed-Body-Bias 8-bit Processor on 65-nm Silicon-on-Thin-Box (SOTB) for IoT Applications
Marco Sarmiento; Khai-Duy Nguyen; Ckristian Duran; Trong-Thuc Hoang; Ronaldo Serrano; Van-Phuc Hoang; Xuan-Tu Tran; Koichiro Ishibashi; Cong-Kha Pham
IEEE Transactions on Circuits and Systems II: Express Briefs, IEEE, 68, 9, 3182-3186, Sep. 2021, Peer-reviewed, This brief presents a sub-μW 8-bit processor which is suitable for IoT applications. The processor implements the Open8 Instruction Set Architecture (ISA) with an 8-bit datapath and 16-bit bus addressing. The chip contains the processor and 4 KB of Static Random-Access Memory (SRAM), and is fabricated in the 65-nm Silicon-On-Thin-Box (SOTB) process. The SOTB process is one of the Fully-Depleted Silicon-On-Insulator (FD-SOI) technologies. Hence, the ability to control biasing voltages is one of its key advantages for achieving low power. The experimental results show that the power consumption under reverse body bias can reach down to 50 nW at a 0.5-V supply voltage and a 32-kHz operating clock frequency. The completed microcontroller consists of the Open8 processor, 32 KB of Read-Only Memory (ROM), 4 KB of SRAM, a Serial Peripheral Interface (SPI), an SPI programmer, a debug module, General-Purpose In-Outs (GPIOs), and a UART. The system was tested using an XC7A100T Xilinx Field-Programmable Gate Array (FPGA); it used 1.8% of the total FPGA resources.
Scientific journal, English
DOI URL

A trigonometric hardware acceleration in 32-bit RISC-V microcontroller with custom instruction
Khai-Duy Nguyen; Dang Tuan Kiet; Trong-Thuc Hoang; Nguyen Quang Nhu Quynh; Xuan-Tu Tran; Cong-Kha Pham
IEICE Electronics Express, The Institute of Electronics, Information and Communication Engineers (IEICE), 18, 16, 1-6, 25 Aug. 2021, Peer-reviewed, This work presents a 32-bit Reduced Instruction Set Computer fifth-generation (RISC-V) microprocessor with a COordinate Rotation DIgital Computer (CORDIC) accelerator. The accelerator is implemented inside the core and is used by the software via a custom instruction. The microprocessor used is the VexRiscv with the RV32IM Instruction Set Architecture (ISA), that is, 32-bit RISC-V with the Integer and Multiplication extensions. The experimental results were collected using a Field-Programmable Gate Array (FPGA) on the DE2-115 development kit and an Application-Specific Integrated Circuit (ASIC) synthesizer with a 180-nm CMOS process library.
Scientific journal, English
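
The CORDIC algorithm named in the abstract above computes trigonometric functions with shift-and-add style iterations. The sketch below shows the textbook rotation mode in floating point for readability; the iteration count and function name are assumptions, and a hardware accelerator such as the one described would use fixed-point shifts and a precomputed angle table instead.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Approximate (sin(theta), cos(theta)) with rotation-mode CORDIC.

    Assumes theta lies within the basic convergence range, roughly
    [-pi/2, pi/2]; larger angles need range reduction first.
    """
    # Elementary rotation angles atan(2^-i); a ROM table in hardware.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-scale by the inverse gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


sin_half, cos_half = cordic_sin_cos(0.5)
```

Because each iteration multiplies only by a power of two, the datapath needs adders and shifters rather than multipliers, which is what makes the method attractive as a custom-instruction accelerator.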

A CORDIC-based Trigonometric Hardware Accelerator with Custom Instruction in 32-bit RISC-V System-on-Chip
Khai-Duy Nguyen; Dang Tuan Kiet; Trong-Thuc Hoang; Nguyen Quang Nhu Quynh; Cong-Kha Pham
Proc. of The Hot Chips 33, Hot Chips, 1-13, 22 Aug. 2021, Peer-reviewed, This poster presents a 32-bit Reduced Instruction Set Computer five (RISC-V) microprocessor with a COordinate Rotation DIgital Computer (CORDIC) algorithm accelerator. The implemented core processor is the VexRiscv CPU, an RV32IM variant of the RISC-V ISA processor. Within the VexRiscv core, the CORDIC accelerator was connected directly to the Execute stage. The core was placed in the Briey System-on-Chip (SoC) and was synthesized on a Field-Programmable Gate Array (FPGA) and at the Application-Specific Integrated Circuit (ASIC) level with the cell logic of ROHM 180-nm technology.
International conference proceedings, English

System-on-Chip Implementation of Trusted Execution Environment with Heterogeneous Architecture
Trong-Thuc Hoang; Ckristian Duran; Ronaldo Serrano; Marco Sarmiento; Khai-Duy Nguyen; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham; Presenter: Trong-Thuc Hoang; National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Proc. of The Hot Chips 33, Hot Chips, 1-16, 22 Aug. 2021, Peer-reviewed, This poster presents a Trusted Execution Environment (TEE) hardware implementation based on a heterogeneous architecture. The TEE verifies the integrity of software applications based on a chain of trust with the initial authentication. The chain-of-trust is implemented in software, using TEE hardware crypto-processors. The initial authentication is called the Root-of-Trust (RoT), and the isolated 32-bit system handles it. On the peripheral bus, there are several cryptography accelerators implemented, such as SHA3, ED25519, AES, and a True Random Number Generator (TRNG). The TRNG module has not only the public channel over the peripheral bus but also a special private channel just for the isolated core. The proposed system was implemented in a 5 mm × 5 mm die with the 180-nm ROHM process library.
International conference proceedings, English

A proposal for enhancing training speed in deep learning models based on memory activity survey
Dang Tuan Kiet; Binh Kieu-Do-Nguyen; Trong-Thuc Hoang; Khai-Duy Nguyen; Xuan-Tu Tran; Cong-Kha Pham
IEICE Electronics Express, The Institute of Electronics, Information and Communication Engineers (IEICE), 18, 15, 1-6, 10 Aug. 2021, Peer-reviewed, This paper presents a new profiling approach that uses a cooperative software-hardware solution. The idea is to use Field-Programmable Gate Array memory as the main memory for deep learning (DL) training processes on a computer. Then, the memory behavior can be monitored and evaluated from both the software and hardware points of view. The most common DL models are selected for the tests, including ResNet, VGG, AlexNet, and GoogLeNet. The CIFAR-10 dataset is chosen as the training database. The experimental results show that the ratio between read and write transactions is roughly 3 to 1. The requested allocations vary from 2 bytes to 64 MB, with the most requested sizes being approximately 16 KB to 64 KB. Based on these statistics, a suggestion was made to improve the training speed using an L4 cache for the Double-Data-Rate (DDR) memory. It can be demonstrated that our recommended L4 cache configuration can improve the DDR performance by about 15% to 18%.
Scientific journal, English

Charge Pump Circuit for Ultra-Low Voltage (極低電圧用チャージポンプ回路)
長岡慶一; 範公可
電子情報通信学会論文誌 C, 電子情報通信学会, J104-C, 8, 225-232, 01 Aug. 2021, Peer-reviewed, In recent years, energy harvesting, which harvests electric power from energy existing in the environment, has attracted attention as a power supply method for many wireless sensors. However, the voltage obtained by energy harvesting is very small. A booster circuit that can operate at a low voltage is required to convert these voltages to a magnitude sufficient for the power supply voltage of an LSI. In this work, we propose a charge pump circuit for very low voltage that can boost from 100 mV.
Scientific journal, Japanese
DOI URL

Exploiting the Back-Gate Biasing Technique as a Countermeasure Against Power Analysis Attacks
Ba-Anh Dao; Trong-Thuc Hoang; Anh-Tien Le; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, IEEE, 9, 24768-24786, 05 Feb. 2021, Peer-reviewed, Fully depleted silicon-on-insulator (FD-SOI) technology is renowned for its back-gate bias voltage controllability. It allows devices fabricated with FD-SOI technology to be optimized for low power consumption or high performance with proper back-gate biases, depending on the required application. This article proposes using the back-gate biasing technique in novel countermeasures against power analysis attacks. Theoretical explanations are discussed, and realistic differential power analysis (DPA) attacks, targeting AES-128 encryption on a 65-nm SOTB 32-bit RISC-V microcontroller, are conducted to justify the proposed idea. The experimental results show that when compared with applying no bias, applying our first proposal, which involves using forward back-gate bias, not only improves the test device performance but also enhances its resistance to DPA attacks. Moreover, vulnerability to DPA attacks is kept unchanged when a reverse back-gate bias is applied to achieve low power consumption. The DPA resistance is even more vital when combining the back-gate bias technique with a lower supply voltage.
Scientific journal, English
URL
DOI URL

A Real-time Cache side-channel attack detection system on RISC-V Out-of-order processor
Anh-Tien Le; Trong-Thuc Hoang; Ba-Anh Dao; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 1-1, 2021
Scientific journal
URL
DOI URL

Correlation Power Analysis Attack Resisted Cryptographic RISC-V SoC with Random Dynamic Frequency Scaling Countermeasure
Ba-Anh Dao; Trong-Thuc Hoang; Anh-Tien Le; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 1-1, 2021
Scientific journal
URL
DOI URL

A Fully Digital True Random Number Generator With Entropy Source Based in Frequency Collapse
Ronaldo Serrano; Ckristian Duran; Trong-Thuc Hoang; Marco Sarmiento; Khai-Duy Nguyen; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 9, 105748-105755, 2021
Scientific journal
URL
DOI URL

Cryptographic Accelerators for Trusted Execution Environment in RISC-V Processors
Trong-Thuc Hoang; Ckristian Duran; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2020), IEEE, 1-4, 10 Oct. 2020, Peer-reviewed, The trusted execution environment protects data by taking advantage of memory isolation schemes. Most of the software implementations of security enclaves offer a framework that can be implemented on any processor architecture. Assuming that privilege escalation is not possible through software means, the only way to access protected data is through authentication via a driver in kernel mode. However, the use of hardware back-doors cannot prevent processor execution in more privileged modes. Implementation of kernel mode allows the reading of sensitive data over the protected regions of memory. In this work, a crypto-accelerator is proposed. The peripheral bus in the proposed architecture features a write-only secure memory. That means cryptography operations at the software level cannot read the sensitive data from that secure memory. This approach suppresses any cache coherence manipulation and fault execution-related attacks aimed at reading sensitive data. The peripheral can be useful for accelerating cryptography operations and for securely storing intermediate calculations as well as secure keys.
International conference proceedings, English
DOI URL

Low-power High-performance 32-bit RISC-V Microcontroller on 65-nm Silicon-On-Thin-BOX (SOTB)
Trong-Thuc Hoang; Ckristian Duran; Khai-Duy Nguyen; Tuan-Kiet Dang; Quynh Nguyen Quang Nhu; Phuc Hong Than; Xuan-Tu Tran; Duc-Hung Le; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEICE Electronics Express, The Institute of Electronics, Information and Communication Engineers (IEICE), 17, 10, 1-6, 04 Sep. 2020, Peer-reviewed, In this paper, a 32-bit RISC-V microcontroller in a 65-nm Silicon-On-Thin-BOX (SOTB) chip is presented. The system is developed based on the VexRiscv Central Processing Unit (CPU) with the Instruction Set Architecture (ISA) extensions of RV32IM. The proposed SoC performs the Dhrystone and CoreMark benchmarks with the results of 1.27 DMIPS/MHz and 2.4 CoreMark/MHz, respectively. The layout occupies 1.32 mm² of die area, which is equivalent to 349,061 NAND2 gate-counts. The 65-nm SOTB process is chosen not only because of its low-power feature but also because of the back-gate biasing technique that allows us to control the microcontroller to favor low-power or high-performance operation. The measurement results show that the highest operating frequency of 156 MHz is achieved at a 1.2-V supply voltage (VDD) with a +1.6-V back-gate bias voltage (VBB). The best power density of 33.4 µW/MHz is reached at 0.5-V VDD with +0.8-V VBB. The least current leakage of 3 nA is retrieved at 0.5-V VDD with −2.0-V VBB.
Scientific journal, English
URL
DOI URL

Dynamic Frequency Scaling as a countermeasure against simple power analysis attack in RISC-V processors.
Ba Anh Dao; Anh Tien Le; Trong Thuc Hoang; Akira Tsukamoto; Kuniyasu Suzaki; Cong Kha Pham
Proc. of The First International Workshop on Secure RISC-V (SECRISCV), The First International Workshop on Secure RISC-V (SECRISCV), 1-4, 23 Aug. 2020, Peer-reviewed, Dynamic Frequency Scaling (DFS) is a technique for dynamically changing the clock frequency of hardware modules during their operation. This paper demonstrates integrating the DFS technique into an open-source RISC-V processor and using it as a simple, cost-effective countermeasure against Simple Power Analysis attacks. The integrated processor is implemented on a Sakura-X FPGA board for experiments. Experimental results show that the DFS module can cover up sensitive information in measured power traces while the hardware resource requirements of the processor are virtually unchanged.
International conference proceedings, English

TEE Boot Procedure with Crypto accelerators in RISC-V Processors
Ckristian Duran; Trong Thuc Hoang; Akira Tsukamoto; Kuniyasu Suzaki; Cong Kha Pham
Proc. of Fourth Workshop on Computer Architecture Research with RISC-V (CARRV 2020), Fourth Workshop on Computer Architecture Research with RISC-V (CARRV 2020), 1-4, 30 May 2020, Peer-reviewed, In this paper, a Trusted Execution Environment (TEE) boot procedure with RISC-V processors and crypto-accelerators is presented. The RISC-V system consists of dual Rocket Chip cores and an SHA-3 accelerator connected on the peripheral bus. Together with the Ed25519 computation in software, the TEE boot procedure, which is based on the Keystone framework, is implemented. The Keystone framework provides a TEE that can protect data by taking advantage of the Physical Memory Protection (PMP) of the RISC-V ISA. The completed system is built and tested on an Altera Field-Programmable Gate Array (FPGA). The experimental results show that the calculation process for any bootloader payload to authenticate can be reduced by about 2.5 decades of milliseconds in comparison with pure software approaches.
International conference proceedings, English

Experiment on Replication of Side Channel Attack via Cache of RISC-V Berkeley Out-of-Order Machine (BOOM) Implemented on FPGA
Anh-Tien Le; Ba-Anh Dao; Kuniyasu Suzaki; Cong Kha Pham
Proc. of Fourth Workshop on Computer Architecture Research with RISC-V (CARRV 2020), Fourth Workshop on Computer Architecture Research with RISC-V (CARRV 2020), 1-4, 30 May 2020, Peer-reviewed, In this work, we start by describing the implementation and benchmark of the BOOM processor (RISC-V Berkeley Out-of-Order Machine) on a ZC706 FPGA board. We then compare the result with the RISC-V in-order scalar processor, the Rocket Core. Subsequently, we demonstrate a side-channel attack that exploits some characteristics of out-of-order processors in general and the BOOM processor in particular. The experiment serves as a premise for constructing a custom heterogeneous processor.
International conference proceedings, English

An Efficient Hardware Implementation of Residual Data Binarization in HEVC CABAC Encoder
Dinh-Lam Tran; Xuan-Tu Tran; Duy-Hieu Bui; Cong-Kha Pham
Electronics - Open Access Journal, MDPI, 9, 684, 1-12, 23 Apr. 2020, Peer-reviewed, This paper proposes an efficient hardware implementation of a binarizer for CABAC that focuses on low area cost and low power consumption while still providing enough bins for high-throughput CABAC. On average, the proposed design can process up to 3.5 residual syntax elements (SEs) per clock cycle at the maximum frequency of 500 MHz with an area cost of 9.45 Kgates (6.41 Kgates for the binarizer core) and power consumption of 0.239 mW (0.184 mW for the binarizer core) with NanGate 45-nm technology. It shows that our proposal achieved a high overhead-efficiency of 1.293 Mbins/Kgate/mW, much better than the other related high-performance designs. In addition, our design also achieved a high power-efficiency of 8288 Mbins/mW; this is an important factor for handheld applications.
Scientific journal, English
URL
DOI URL

Quick Boot of Trusted Execution Environment with Hardware Accelerators
Trong-Thuc Hoang; Ckristian Duran; Duc-Thinh Nguyen-Hoang; Duc-Hung Le; Akira Tsukamoto; Kuniyasu Suzaki; Cong-Kha Pham
IEEE Access, IEEE, 8, 74015-74023, 13 Apr. 2020, Peer-reviewed, The Trusted Execution Environment (TEE) offers a software platform for secure applications. The TEE offers a memory isolation scheme and software authentication from a high privilege mode. The procedure uses different algorithms, such as hashes and signatures, to authenticate the application to be secured. Although the TEE hardware has been defined for memory isolation, the security algorithms are often executed using software implementations. In this paper, a RISC-V system compatible with TEEs featuring security algorithm accelerators is presented. The hardware accelerators are the SHA-3 hash and the Ed25519 elliptic curve algorithms. TileLink is used for the communications between the processor and the registers of the accelerators. For the TEE boot, the software procedures are replaced with their accelerated counterparts. Compared to the software approach, a 2.5-decade increment is observed in the throughput of the signature procedure using the SHA-3 acceleration for big chunks of data. The Ed25519 performs 90% better than its software counterpart in execution time.
Scientific journal, English
DOI URL

High-Performance FPGA-Based BWA-MEM Accelerator
Binh Kieu-Do-Nguyen; Cuong Pham-Quoc; Cong-Kha Pham
Proc. of 2020 9th International Conference on Software and Computing Technologies (ICSCT 2020), 2020 9th International Conference on Software and Computing Technologies (ICSCT 2020), 1-4, 04 Apr. 2020, Peer-reviewed, There is no denying that bioinformatics is one of the most important realms for our forthcoming development. As a demonstration of this fact, a plethora of new algorithms were published over the last decade. These significantly speed up the processes of biological analysis, especially DNA alignment. Despite their undeniable contributions, it is still premature to state that DNA alignment has already achieved ideal performance. In this work, we focus on a DNA alignment system based on our previously published improved BWA-MEM algorithm. Besides that, we also propose some optimization methods, which were applied in order to improve the performance as well as the stability of our entire system. The system offers a speed-up of 46.52× when compared with the other computing platforms.
International conference proceedings, English

Hardware-assisted High-performance DNA Alignment System
Binh Kieu-Do-Nguyen; Cuong Pham-Quoc; Cong-Kha Pham
Proc. of 2020 5th International Conference on Intelligent Information Technology (ICIIT 2020), IEEE, 1-4, 19 Feb. 2020, Peer-reviewed, Investigations of DNA are becoming more and more important in this era. The plethora of new algorithms published over the last decade is clear evidence of this fact. In DNA research, alignment is one of the most important steps and is given special care and continuously developed. Although there are already many algorithms for this problem, and some of them provide impressive enhancements, it is still premature to state that DNA alignment has already achieved ideal performance. Therefore, in this work, we present an efficient architecture based on our improved BWA-MEM algorithm that we have already published in [14]. Besides that, we also propose a communication protocol as well as its API in order to ensure the accuracy and stability of the system. The system offers a speed-up of 18.14× when compared with modern computing platforms.
International conference proceedings, English

Reducing Bitrate and Increasing the Quality of Inter Frame by Avoiding Quantization Errors in Stationary Blocks
Xuan-Tu Tran; Ngoc-Sinh Nguyen; Duy-Hieu Bui; Minh-Trien Pham; Hung K. Nguyen; Cong-Kha Pham
EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, EAI, 7, 22, 1-10, 17 Jan. 2020, Peer-reviewed, In image compression and video coding, quantization error helps to reduce the amount of information in the high-frequency components. However, in temporal prediction, the quantization error contributes its value as noise in the total residual information. Therefore, the residual signal of the inter-picture prediction is greater than expected and always differs from zero, even when the input video contains only homogeneous frames. In this paper, we reveal the negative effects of quantization errors in inter prediction and propose a video encoding scheme which is able to avoid the side effects of quantization errors in the stationary parts. We propose to implement a motion detection algorithm as the first stage of video encoding to separate the video into two parts: motion and static. The motion information allows us to force the residual data of the unchanged part to zero and keep the residual signal of the motion part as usual. Besides, we design block-based filters which improve the motion results and fit those results to the block encoding size. The fixed residual data of the static information permits us to precalculate its quantized coefficients and create a bypass encoding path for it.
Scientific journal, English
URL
DOI URL

An Improved All-Digital Background Calibration Technique for Channel Mismatches in High Speed Time-Interleaved Analog-to-Digital Converters
Van-Thanh Ta; Van-Phuc Hoang; Van-Phu Pham; Cong-Kha Pham
Electronics - Open Access Journal, MDPI, 9, 73, 1-13, 01 Jan. 2020, Peer-reviewed, The performance of time-interleaved analog-to-digital converters (TIADCs) is seriously affected by channel mismatches, especially for applications in next-generation communication systems. This work presents an improved all-digital background calibration technique for TIADCs by combining the Hadamard transform for calibrating gain and timing mismatches and averaging for offset mismatch cancellation. The numerical simulation results show that the proposed calibration technique completely suppresses the spurious images due to the channel mismatches at the output spectrum, which increases the spurious-free dynamic range (SFDR) and signal-to-noise and distortion ratio (SNDR) by 74 dB and 43.7 dB, respectively. Furthermore, hardware co-simulation on a field-programmable gate array (FPGA) platform is performed to confirm the effectiveness of the proposed calibration technique. The simulation and experimental results confirm the improvement in the TIADC's performance achieved by the proposed calibration technique.
Scientific journal, English
URL

Low-Power Floating-Point Adaptive-CORDIC-Based FFT Twiddle Factor on 65-nm Silicon-on-Thin-BOX (SOTB) With Back-Gate Bias
Trong-Thuc Hoang; Xuan-Thuan Nguyen; Duc-Hung Le; Cong-Kha Pham
IEEE Transactions on Circuits and Systems II: Express Briefs, IEEE, 66, 10, 1723-1727, 11 Jul. 2019, Peer-reviewed, In this brief, a silicon-on-thin-BOX (SOTB) implementation of a single-precision floating-point fast-Fourier-transform (FFT) twiddle factor (TF) is presented. The architecture of the proposed TF is developed based on the adaptive method of the coordinate rotation digital computer (CORDIC) algorithm. The 65-nm SOTB technology was chosen because of its ultra-low-power advantage. Furthermore, the back-gate bias technique can be applied on an SOTB chip to adjust the operation for high-performance or low-power requirements. The layout of the SOTB 65-nm TF core is about 22869 gate-count on a die area of 86721 µm². The measurement results show that the core reached its highest operating frequency of 55 MHz at the 1.2-V supply voltage (VDD) with the forward back-gate bias (FBB) ≥ 1.5 V. The power and energy consumptions at this point were 1.54 mW and 27.91 pJ/cycle, respectively. The lowest operating VDD was 0.5 V with the FBB ≥ 0.5 V. In the standby mode, when the clock-gating technique was deployed, the leakage current can be reduced to 0.4 nA at the 0.4-V VDD and −2.5-V reverse back-gate bias (RBB).
Scientific journal, English
URL
DOI URL

A 1.2-V 162.9-pJ/cycle Bitmap Index Creation Core with 0.31-pW/bit Standby Power on 65-nm SOTB
Xuan-Thuan Nguyen; Trong-Thuc Hoang; Hong-Thu Nguyen; Katsumi Inoue; Cong-Kha Pham
Microprocessors and Microsystems, Elsevier, 69, 112-117, 04 Jun. 2019, Peer-reviewed, Maximizing the performance during peak workload hours and minimizing the power consumption during off-peak time play a significant role in energy-efficient systems. Our previous work proposed an efficient architecture of a bitmap index creator (BIC) that produced higher indexing throughput than central processing units and graphics processing units. This paper extends the previous study by focusing on the ASIC implementation of the BIC in a 65-nm silicon-on-thin-buried-oxide (SOTB) CMOS process. The fabricated chip could operate at different supply voltages, from 0.4 V to 1.2 V. In the active mode with the supply voltage of 1.2 V, it was fully operational at 41 MHz and consumed 6.68 mW, or 162.9 pJ/cycle. In the standby mode with the supply voltage of 0.4 V and the clock gated, the power consumption dropped to 10.6 μW. More significantly, when the reverse back-gate bias voltages are supplied, the standby power fell sharply to 2.64 nW. This achievement is of considerable importance to energy-efficient systems.
Scientific journal, English
DOI URL
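As a quick consistency check on the figures quoted in the abstract above, energy per cycle is simply power divided by clock frequency. The sketch below is an illustration only; the function name `energy_per_cycle_pj` is hypothetical and does not come from the paper.

```python
def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Energy per clock cycle in picojoules, using E = P / f."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return power_w / freq_hz * 1e12


# Figures quoted in the abstract: 6.68 mW at 41 MHz in active mode.
print(round(energy_per_cycle_pj(6.68e-3, 41e6), 1))  # 162.9
```

The result agrees with the 162.9 pJ/cycle reported for the active mode at 1.2 V.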

A 1.2-V 90-MHz Bitmap Index Creation Accelerator with 0.27-nW Standby Power on 65-nm Silicon-On-Thin-Box (SOTB) CMOS
Xuan-Thuan Nguyen; Trong-Thuc Hoang; Katsumi Inoue; Ngoc-Tu Bui; Van-Phuc Hoang; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2019), IEEE, 1-4, 26 May 2019, Peer-reviewed, Although bitmap index (BI) can surmount complex and multi-dimensional queries, the creation of BI itself is a time-consuming task. Many studies exploit the highly parallel processing capabilities of multi-core CPUs, graphics processing units (GPUs), or field-programmable gate arrays (FPGAs) to overcome this obstacle. This study, on the other hand, proposes a 65-nm silicon-on-thin-buried-oxide (SOTB) hardware accelerator dedicated to BI creation. The fabricated chip could operate at different supply voltages, from 0.45 V to 1.2 V. Concretely, in the active mode with the supply voltage of 1.2 V, this chip was fully operational at 90 MHz and consumed approximately 88.1 pJ/cycle. In the standby mode with the supply voltage of 0.45 V and clock gated, the power consumption was only 476.1 nW. Moreover, when the reverse back-gate bias voltage of −2.5 V is supplied, the standby power dropped sharply to 0.27 nW, a reduction of approximately 1,763 times. This achievement is essential for energy-efficient applications, where the performance should be maximized during peak workload hours and the power should be minimized during off-peak time.
International conference proceedings, English
DOI URL

Live Demonstration: Real-Time Auto-Exposure Histogram Equalization Video-System using Frequent Items Counter
Takahiro Hosaka; Trong-Thuc Hoang; Van-Phuc Hoang; Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2019), IEEE, 1-1, 26 May 2019, Peer-reviewed, In this demonstration, a real-time auto-exposure Histogram Equalization (HE) video-system is presented. The video histogram is extracted in each frame by the Frequent Items Counter (FIC) core. Based on the HE Transformation Function (HE-TF), the camera exposure value is adjusted to fit the current luminance condition. The proposed system was developed on the VEEK-MT-SoCKit with an FPGA chip of Altera Cyclone V SoC and a 5-Megapixel (5-MP) Charge Coupled Device (CCD). The video resolution is 1280×800. The monitor display rate is 60 Hz, while the CCD capture rate ranges from 24.28 Hz to 38.98 Hz depending on the exposure value. The histogram, the transformation function, and the camera exposure value are changed in each frame to satisfy the real-time requirement.
International conference proceedings, English
DOI URL

A 1.05-V 62-MHz with 0.12-nW standby power SOTB-65nm chip of 32-point DCT based on adaptive CORDIC
Duc-Hung Le; Trong-Thuc Hoang; Cong-Kha Pham
IEICE Electronics Express, The Institute of Electronics, Information and Communication Engineers (IEICE), 16, 10, 1-6, 25 May 2019, Peer-reviewed, In this paper, a Silicon On Thin Buried-oxide (SOTB) implementation of a 32-point Discrete Cosine Transform (DCT) is presented. The architecture is based on the fixed-rotation adaptive COordinate Rotation DIgital Computer (CORDIC) algorithm. The SOTB-65nm process was chosen due to its advantages of low power and high performance. The core layout contained about 47.2K gate-count and had the size of about 183K-μm². The measurement results showed that the highest operating frequency of 62-MHz was achieved at the 1.05-V power supply and consumed about 737-μW and 11.89-pJ/cycle. In the standby mode, the least power consumption of 0.12-nW was achieved at the 0.4-V power supply when the clock-gating technique and -2.5-V reverse back-gate biasing were applied.
Scientific journal, English
DOI URL

An Efficient I/O Architecture for RAM-based Content-Addressable Memory on FPGA
Xuan-Thuan Nguyen; Trong-Thuc Hoang; Hong-Thu Nguyen; Katsumi Inoue; Cong-Kha Pham
IEEE Transactions on Circuits and Systems II: Express Briefs, IEEE, 66, 3, 472-476, Mar. 2019, Peer-reviewed, Despite the impressive search rate of one key per clock cycle, the update stage of a random-access-memory-based content-addressable-memory (RAM-based CAM) always suffers high latency. Two primary causes of such latency are: (1) the compulsory erasing stage along with the writing stage and (2) the major difference in data width between the RAM-based CAM (e.g., 8-bit width) and the modern systems (e.g., 256-bit width). This brief, therefore, proposes an efficient input/output (I/O) architecture of RAM-based binary CAM (RCAM) for low-latency update. To achieve this goal, three techniques, namely centralized erase RAM, bit-sliced, and hierarchical-partitioning, are proposed to eliminate the latency of erasing stage, as well as to allow RCAM to exploit the bandwidth of modern systems effectively. Several RCAMs, whose data width ranges from 8 bits to 64 bits, were integrated into a 256-bit system for the evaluation. The experimental results in an Intel Arria V 5ASTFD5 FPGA prove that at 100 MHz, the proposed designs achieve at least 9.6 times higher I/O efficiency as compared to the traditional RCAM.
Scientific journal, English
URL
DOI URL

A 0.75-V 32-MHz 181-µW SOTB-65nm Floating-point Twiddle Factor Using Adaptive CORDIC
Ngoc-Tu Bui; Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
Proc. of The 2019 IEEE International Conference on Industrial Technology (ICIT), IEEE, 1-4, 13 Feb. 2019, Peer-reviewed, In this paper, a Silicon On Thin Buried-oxide (SOTB) implementation of the 32-bit floating-point Twiddle Factor (TF) is presented. The architecture was developed based on the adaptive COordinate Rotation DIgital Computer (CORDIC). The CORDIC method is a well-known approach for approximating the complex-number multiplication, also known as TF in FFT designs. The SOTB-65nm TF core layout has an area of 86.7K-µm². The measurement results showed that at the best crossing-point of the 0.75-V power supply (VDD), the chip could run at the maximum operating frequency of 32-MHz and consumed 181-µW power. In the sleep mode, the leakage power dropped about 258.6 to 0.7-µW at the 0.75-V VDD.
International conference proceedings, English

Hardware System for Quaternion Neural Network Dedicated to Real-Time Systems
Phuoc-Loc Diep; Trong-Thuc Hoang; Cong-Kha Pham
Proc. of The Irago Conference 2018, University of Electro-Communications, Tokyo Toyohashi University of Technology Tokai University, **-**, 01 Nov. 2018, Peer-reviewed
International conference proceedings, English

Low-resource hardware implementation of ECDSA for the Internet of Things
Bien-Cuong Nguyen; Cong-Kha Pham
Proc. of The Irago Conference 2018, University of Electro-Communications, Tokyo Toyohashi University of Technology Tokai University, **-**, 01 Nov. 2018, Peer-reviewed
International conference proceedings, English

Frequent Items Counter Based on Binary Decoders
Katsumi Inoue; Trong-Thuc Hoang; Cong-Kha Pham
IEICE Electronics Express, The Institute of Electronics, Information and Communication Engineers (IEICE), 15, 20, 1-12, 29 Oct. 2018, Peer-reviewed, In this paper, the hardware design of a frequent items counter is proposed. The key idea is to create a matrix of binary values by using an array of binary decoders to decode all of the input items in parallel. After that, an array of population count modules is applied to the rows of the matrix to generate the counting results. The architecture was implemented with five options of bit/item, from 6-bit/item to 10-bit/item, and seven options of count register bit-width, from 8-bit counters to 32-bit counters. Therefore, there were 35 different versions of implementation presented in this paper. Those implementations were built on the Field-Programmable Gate Array (FPGA) board of the Altera Arria V SoC development kit. Also, they were synthesized to chips with the process technology of 65nm Silicon On Thin Buried-Oxide (SOTB). The experimental results of the proposed architecture achieved outstanding timing performances compared to other attempts to date.
Scientific journal, English
DOI URL
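The decoder-and-popcount idea in the abstract above can be pictured in software as follows. This is a minimal sketch under stated assumptions: each item is decoded into a one-hot column, and a population count over each row of the resulting matrix gives that value's frequency. The names `decode_one_hot` and `count_items` are hypothetical, and the loops stand in for what the hardware does in parallel.

```python
def decode_one_hot(item: int, bits: int) -> list[int]:
    """Binary decoder: a one-hot vector of length 2**bits for one item."""
    if not 0 <= item < (1 << bits):
        raise ValueError("item does not fit in the given bit width")
    return [1 if value == item else 0 for value in range(1 << bits)]


def count_items(items: list[int], bits: int) -> list[int]:
    """Count occurrences of every possible value among the input items."""
    # One decoded column per input item forms the binary matrix.
    columns = [decode_one_hot(item, bits) for item in items]
    # Population count along each row gives the count for that value.
    return [sum(col[value] for col in columns) for value in range(1 << bits)]


print(count_items([3, 1, 3, 0], bits=2))  # [1, 1, 0, 2]
```

The output lists the count for every possible value, so values that never occur appear with a count of zero.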

VLSI Design of Floating-Point Twiddle Factor Using Adaptive CORDIC on Various Iteration Limitations
Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
Proc. of The 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), IEEE, 225-232, 12 Sep. 2018, Peer-reviewed, The design of 32-bit floating-point Fast Fourier Transform (FFT) Twiddle Factor (TF) is proposed in this paper. The architecture was developed based on the adaptive algorithm of COordinate Rotation DIgital Computer (CORDIC). The CORDIC method is a well-known approach for approximating the complex-number multiplication in FFT implementations, also known as TF. The calculations of adaptive CORDIC are done by an iterative process. Therefore, by limiting the number of iterations, accuracy can be sacrificed for better throughput rates. As a result, three different FFT TF implementations are presented in this paper. They are TF-4, TF-8, and TF-16 for the design of TF implemented with limits of four, eight, and 16 iterations, respectively. The results of the three implementations were reported on both Field Programmable Gate Array (FPGA) and Application Specific Integrated Chip (ASIC) level. The FPGA results were examined on the Altera Stratix IV development kit, and the ASIC results were reported by the Synopsys tools with the Silicon On Thin Buried-oxide (SOTB) 65nm process library.
International conference proceedings, English
DOI URL
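The trade-off between iteration count and accuracy mentioned in the abstract above can be illustrated with a small sketch. Note the assumptions: this uses the conventional rotation-mode CORDIC in floating-point Python, not the adaptive variant or the hardware arithmetic of the paper, and `cordic_rotate` with its parameters is a hypothetical name chosen for illustration.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 16):
    """Rotate (x, y) by `angle` radians with conventional CORDIC.

    Valid for |angle| up to roughly 1.74 rad (the CORDIC convergence range).
    """
    # Scale factor that compensates for the gain of the micro-rotations.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotation direction
        shift = 2.0 ** -i                # a shift in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)        # residual angle
    return x * gain, y * gain


# Twiddle-factor style use: rotating (1, 0) approximates (cos, sin).
for n in (4, 8, 16):
    c, s = cordic_rotate(1.0, 0.0, math.pi / 6, iterations=n)
    print(n, abs(c - math.cos(math.pi / 6)), abs(s - math.sin(math.pi / 6)))
```

Each extra iteration tightens the bound on the residual angle, which is why a lower iteration limit costs accuracy while shortening the computation.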

VLSI Design of Frequent Items Counting Using Binary Decoders Applied to 8-bit per Item Case-study
Katsumi Inoue; Trong-Thuc Hoang; Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
Proc. of The 14th Conference on PhD Research in Microelectronics and Electronics (IEEE PRIME 2018), IEEE, 161-164, 02 Jul. 2018, Peer-reviewed, In this paper, the Very-Large-Scale Integration design of Frequent Items Counting (FIC) is proposed. The fundamental idea is to use binary decoders to generate a matrix of binary values of all input items, with each column representing one item's binary value. Then, sums are computed over the rows of the matrix to obtain the counting results for the input items. The design was implemented on the Altera Arria V SoC Development Kit. After being successfully built and verified on a Field Programmable Gate Array (FPGA), the design was synthesized using Synopsys tools with the SOTB (Silicon on Thin Buried-oxide) 65nm process. Compared to our previous work and the software-based application, the achieved speed results are more than three times and more than 150 times faster, respectively. The SOTB-65nm builds achieved a theoretical speed of about 75% of the average practical results of the FPGA implementations.
International conference proceedings, English
DOI URL

A two-stage-pipeline CPU of SH-2 architecture implemented on FPGA and SoC for IoT, edge AI and robotic applications
Kesami Hagiwara; Tomoichi Hayashi; Shumpei Kawasaki; Fumio Arakawa; Oleg Endo; Hayato Nomura; Akira Tsukamoto; Duong Nguyen; Binh Nguyen; Anh Tran; Hoan Hyunh; Ikuo Kudoh; Cong-Kha Pham
21st IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL Chips 2018 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 1-3, 05 Jun. 2018, Peer-reviewed, SH ISA patents were filed in 1991 by Hitachi, and expired in 2014. Thereafter the ISA belonged to the public domain. We developed a 2-stage pipeline SH-2 CPU core, which uses only 4,655 logic cells of an Intel MAX 10 FPGA fabricated on 55nm embedded NOR flash technology, 33KG of a 40nm NVM process at 240MHz, and 20KG of a 0.18µm process at 80MHz. We bifurcated the RTL into (1) SoC integration and (2) small FPGA, each optimized for the respective technology. The MCU which incorporates the CPU supports AHB, APB, UART, CAN-FD, PWM, and ADC. We plan to move this solution to IoT, edge AI and robotic applications. GNU and other compilers, assemblers, simulators, and debuggers support the CPU.
International conference proceedings, English
DOI URL

A Scalable High-Performance Priority Encoder Using 1D-Array to 2D-Array Conversion
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2018), IEEE, 1-4, 27 May 2018, Peer-reviewed, In our prior study of an L-bit priority encoder (PE), a so-called one-directional-array to two-directional-array conversion method is deployed to turn an L-bit input data into an M×N-bit matrix. Following this, an N-bit PE and an M-bit PE are employed to obtain a row index and column index. From those, the highest priority bit of L-bit input data is achieved. This brief extends our previous work to construct a scalable architecture of high-performance large-sized PEs. An optimum pair of (M, N) and look-ahead signal are proposed to improve the overall PE performance significantly. The evaluation is achieved by implementing a variety of PEs whose L varies from 4-bit to 4096-bit in 180-nm CMOS technology. According to post-place-and-route simulation results, at PE size of 64 bits, 256 bits, and 2048 bits the operating frequencies reach 649 MHz, 520 MHz, and 370 MHz, which are 1.2 times, 1.5 times, and 1.4 times, as high as state-of-the-art ones.
International conference proceedings, English
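The 1D-to-2D conversion described in the abstract above can be sketched in software as follows. This is only an illustration under assumptions not stated in the abstract: the lowest index is treated as the highest priority, the input is split row by row, and the look-ahead signal is not modeled. The names `priority_encode` and `priority_encode_2d` are hypothetical.

```python
from typing import Optional, Sequence


def priority_encode(bits: Sequence[int]) -> Optional[int]:
    """Plain priority encoder: index of the first set bit, or None."""
    for index, bit in enumerate(bits):
        if bit:
            return index
    return None


def priority_encode_2d(bits: Sequence[int], m: int, n: int) -> Optional[int]:
    """Priority-encode an L-bit input viewed as an M x N matrix (L = M * N)."""
    if len(bits) != m * n:
        raise ValueError("input length must equal m * n")
    rows = [bits[r * n:(r + 1) * n] for r in range(m)]
    # One flag per row tells whether that row contains any set bit.
    row_flags = [1 if any(row) else 0 for row in rows]
    row = priority_encode(row_flags)       # M-bit encoder picks the row
    if row is None:
        return None
    col = priority_encode(rows[row])       # N-bit encoder picks the column
    return row * n + col


data = [0] * 64
data[37] = data[50] = 1
print(priority_encode_2d(data, m=8, n=8))  # 37
```

Two small encoders of widths M and N replace one encoder of width L, which is the property that makes large encoders easier to build.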

A 219-µW 1D-to-2D-Based Priority Encoder on 65-nm SOTB CMOS
Xuan-Thuan Nguyen; Trong-Thuc Hoang; Hong-Thu Nguyen; Katsumi Inoue; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2018), IEEE, 1-4, 27 May 2018, Peer-reviewed, Priority encoder (PE) is recognized as an indispensable component in the content-addressable memory. In this paper, two efficient architectures of 64-bit PE and 256-bit PE using the 1D-array to 2D-array conversion (1D-to-2D) method are presented and implemented in a 65-nm Silicon-on-thin-buried-oxide (SOTB) CMOS process. The 1D-to-2D method is exploited because of its advantages in large-sized PE construction. The SOTB CMOS process is utilized because of its prominent advantages of low-power and high-performance configuration using back bias voltages. The measurement results at 1.2 V showed that a fabricated PE256 chip was fully operational at 45 MHz and consumed approximately 219 µW. Additionally, in sleep mode, the leakage power dropped as low as 0.34 µW at 0.6 V.
International conference proceedings, English

High-Speed 8/16/32-Point DCT Architecture Using Fixed-Rotation Adaptive CORDIC
Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2018), IEEE, 1-4, 27 May 2018, Peer-reviewed, In this paper, the high-speed Discrete Cosine Transform (DCT) architecture is presented using the Adaptive CORDIC (ACor) algorithm built with a fixed-rotation angle. The proposed method is implemented in six different versions corresponding to the number of DCT points, i.e., 8-point (8p), 16-point (16p), and 32-point (32p), and the number of ACor stages, i.e., 2-Stage (2S) and 3-Stage (3S). The implementations are built and verified on an Altera Stratix IV FPGA. The 2S designs of 8p-DCT, 16p-DCT, and 32p-DCT achieve the maximum operating frequencies of 179.86 MHz, 162.60 MHz, and 136.97 MHz, respectively. Moreover, the 2S-32p-DCT module is implemented in ASIC with the 65nm-SOTB CMOS technology. The synthesis shows that the core costs 47.2K gates and consumes about 0.68 mW while operating at a 100 MHz clock rate. The 2S implementations of 8p-DCT, 16p-DCT, and 32p-DCT achieve four, five, and six adder-delay, mean-square-error of 1.403e-4, 2.029e-2, and 7.663e-2, and coding gain of 8.8108 dB, 9.0984 dB, and 9.2170 dB, respectively. In comparison with recent works, the proposed method achieves the best timing performances, good accuracy results, and adequate resources cost.
International conference proceedings, English

Hardware Implementation of Background Calibration Technique for TIADCs with Signals in Any Nyquist Bands
Han Le Duc; Van-Phuc Hoang; Duc-Minh Nguyen; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2018), IEEE, 1-4, 27 May 2018, Peer-reviewed, In this work, we investigate a novel fully digital background calibration technique to mitigate the gain and timing mismatches in Time-Interleaved Analog-to-Digital Converters (TIADCs) for a wideband bandlimited input signal in any Nyquist zone. The correction scheme is simple: the image signals are subtracted from the distorted signal. The channel mismatch parameters are estimated based on out-of-band error estimation. Neither an additional reference channel nor a pilot input is required in calibration. The efficiency of the proposed calibration is demonstrated for a 4-channel 60dB SNR TIADC clocked at 2.7GHz by both simulation and experimental results. The SNDR improvement is 16dB for a multi-tone input occupying the third Nyquist band. The calibration is validated on an Altera FPGA DE4 board. In a Hardware-In-the-Loop emulation framework, the synthesized circuit works effectively and utilizes very little of the hardware resources in the FPGA chip.
International conference proceedings, English

Minimum adder-delay architecture of 8/16/32-point DCT based on fixed-rotation adaptive CORDIC
Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
IEICE Electronics Express, Institute of Electronics Information Communication Engineers, 15, 10, 1-12, 25 May 2018, Peer-reviewed, In this paper, the minimum adder-delay Discrete Cosine Transform (DCT) architecture is proposed using the Adaptive CORDIC (ACor) algorithm with fixed-rotation implementations. The proposed method has six different versions that differ in the number of DCT points, i.e., 8-point (8p), 16-point (16p), and 32-point (32p), and the number of ACor stages, i.e., 2-Stage (2S) and 3-Stage (3S). The Altera Stratix IV and Stratix II FPGAs were used to build and verify the implementations. The 2S designs of 8p, 16p, and 32p DCT achieved the timing performances of four, five, and six adder-delay results, respectively. The proposed method was proven to have the best timing performances, good accuracy results, and adequate resources cost in comparison with other recent works.
Scientific journal, English
DOI URL

Dependence of Short-Channel Effects on Semiconductor Bandgap in Tunnel Field-Effect Transistors
Nguyen Dang Chien; Chun-Hsing Shih; Hung-Jin Teng; Cong-Kha Pham
Journal of Physics: Conference Series, IOP Publishing Ltd, 1034, 1-6, 01 May 2018, Peer-reviewed, Scaling down the bandgap is considered an essential approach to enhance the performance of tunnel field-effect transistors (TFETs). Using two-dimensional simulations, this study examines the dependence of short-channel effects on the semiconductor bandgap in TFETs. It is shown that the short-channel effect is more severe when using lower bandgap materials, even though the supply voltage is scaled in parallel with the bandgap. For a given bandgap material, the short-channel effect can be well evaluated by the increase of drain-induced barrier thinning (DIBT) with decreasing channel length. For different bandgap TFETs, however, their short-channel effects cannot be compared properly by comparing the DIBTs. Adequately considering the effect of bandgap on the TFET scalability is necessary in designing scaled integrated circuits.
Scientific journal, English
URL
DOI URL

High-speed Hardware Implementation of 8-bit per Item Frequent Items Counter
Katsumi Inoue; Trong-Thuc Hoang; Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
Proc. of IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips 21), 1, 18 Apr. 2018, Peer-reviewed, In this paper, the high-speed architecture of Frequent Items Counting (FIC) is proposed. FIC is the problem of counting frequently appearing items in an itemset. The task is a must-have function in almost every data mining algorithm, such as frequent elements [1], iceberg queries [2], and top-k queries [3]. In related works, the space-saving method was the primary method used to solve the FIC problem in software applications [1], [3]. It is an approximation method with the idea of selecting and monitoring only a few best candidates. The algorithm was also implemented in hardware as in [4]. However, due to the approximation approach, the architectures in [4] cannot produce the full FIC table. Therefore, the goal of the proposed FIC architecture in this paper is to produce the complete FIC table. Hence, the proposed implementations did not deploy an approximation method such as the space-saving algorithm, but the tuple-scan approach. The tuple-scan approach can produce the complete FIC table in a single pass of the itemset by maintaining an array of count registers.
International conference proceedings, English

A Low-Power Hybrid Adaptive CORDIC
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham
IEEE Transactions on Circuits and Systems II: Express Briefs, Institute of Electrical and Electronics Engineers Inc., 65, 4, 496-500, 01 Apr. 2018, Peer-reviewed, The purpose of this brief is to introduce a hybrid adaptive coordinate rotation digital computer (HA-CORDIC) implemented on 65-nm silicon on thin buried oxide technology. The supply voltage of HA-CORDIC ranges from 0.25 V to 1.2 V and the lowest energy in active mode and sleep mode are 2.4 pJ/cycle and 0.003 pJ/cycle, respectively. By changing body bias voltages, the leakage current can be reduced to as small as 1.0 nA. The experimental results prove that HA-CORDIC is potentially a good choice for low-power and high-precision applications in comparison with the previous work.
Scientific journal, English
DOI URL

A high-throughput low-energy Arithmetic Processor
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham
IEICE Transactions on Electronics, Institute of Electronics, Information and Communication Engineers (IEICE), E101C, 4, 281-284, 01 Apr. 2018, Peer-reviewed, In this paper, the hardware architecture of a CORDIC-based Arithmetic Processor utilizing both angle recoding (ARD) CORDIC algorithm and scaling-free (SCFE) CORDIC algorithm is proposed and implemented in 180 nm CMOS technology. The arithmetic processor is capable of calculating the sine, cosine, hyperbolic sine, hyperbolic cosine, and multiplication functions. The experimental results prove that the design is able to work at a frequency of 100 MHz and consumes 12.96 mW of power. In comparison with some previous work, the design can be seen as a good choice for high-throughput low-energy applications.
Scientific journal, English
DOI URL

A CORDIC-based QR decomposition for MIMO signal detector
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Trong-Thuc Hoang; Cong-Kha Pham
IEICE Electronics Express, Institute of Electronics Information Communication Engineers, 15, 6, 1-8, 25 Mar. 2018, Peer-reviewed, The purpose of this article is to propose a CORDIC-based QR Decomposition (CQRD) for a MIMO Signal Detector module with qualities of low resource usage and low latency. The design contains four stages with six CORDIC modules, and its hardware architecture employs both vectoring and rotation mode equations. The evaluated results of the CORDIC-based QRD prove that the proposed hardware design is high performance, low resource, and low latency. Because of the advantages of CQRD, it is suitable for the signal detector in MIMO systems.
Scientific journal, English
DOI URL

An FPGA-Based Hardware Accelerator for Energy-Efficient Bitmap Index Creation
Xuan-Thuan Nguyen; Trong-Thuc Hoang; Hong-Thu Nguyen; Katsumi Inoue; Cong-Kha Pham
IEEE Access, Institute of Electrical and Electronics Engineers Inc., 6, 16046-16059, 14 Mar. 2018, Peer-reviewed, Bitmap index is recognized as a promising candidate for online analytics processing systems, because it effectively supports not only parallel processing but also complex and multi-dimensional queries. However, bitmap index creation is a time-consuming task. In this paper, by taking full advantage of massive parallel computing of field-programmable gate array (FPGA), two hardware accelerators of bitmap index creation, namely BIC64K8 and BIC32K16, are originally proposed. Each of the accelerators contains two primary components, namely an enhanced content-addressable memory and a query logic array module, which allow BIC64K8 and BIC32K16 to index 65 536 8-bit words and 32 768 16-bit words in parallel, at every clock cycle. The experimental results on an Intel Arria V 5ASTFD5 FPGA prove that at 100 MHz, BIC64K8 and BIC32K16 achieve the approximate indexing throughput of 1.43 GB/s and 1.46 GB/s, respectively. The throughputs are also proven to be stable, regardless of the size of the data sets. More significantly, BIC32K16 only consumes as low as 6.76% and 3.28% of energy compared to the central-processing-unit- and graphic-processing-unit-based designs, respectively.
Scientific journal, English
DOI URL

An efficient fixed-point arithmetic processor using a hybrid CORDIC algorithm
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham
Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC, Institute of Electrical and Electronics Engineers Inc., 2018-, 327-328, 20 Feb. 2018, Peer-reviewed, The purpose of this article is to introduce a CORDIC-based Arithmetic Processor which utilizes both angle recoding (ARD) and scaling-free (SCFE) CORDIC algorithms. The proposed processor is able to compute the sine, cosine, hyperbolic sine, hyperbolic cosine, and multiplication functions. Its hardware architecture implemented in 180 nm CMOS technology is capable of working at a frequency of 100 MHz and consumes 12.96 mW of power. In comparison with some previous work, the design is a good choice for high-throughput low-energy applications.
International conference proceedings, English
DOI URL

Design of ultra-low power AES encryption cores with silicon demonstration in SOTB CMOS process
V-P Hoang; V-L Dao; C-K Pham
ELECTRONICS LETTERS, INST ENGINEERING TECHNOLOGY-IET, 53, 23, 1512-1513, Nov. 2017, Peer-reviewed, The design of ultra-low power advanced encryption standard (AES) encryption cores for emerging wireless networks and Internet of things systems by combining optimised architectures, a simple clock gating technique and an advanced 65 nm silicon on thin buried oxide (SOTB) CMOS process is presented. The implementation results show that the proposed 2-Sbox AES encryption core requires the smallest number of clock cycles and achieves the lowest power consumption of 0.4 µW/MHz, which is 3.3× lower than that of the best previously presented AES encryption core, with a very small area overhead. Moreover, the proposed 1-Sbox AES encryption core consumes very low hardware resources of 2.4 kgates gate equivalent.
Scientific journal, English
DOI URL

Quadrature Multi-carrier DCSK: A High-efficiency Scheme for Radio Communications
Xuan-Quyen Nguyen; Cong-Kha Pham
Proc. of The International Conference on Advanced Technologies for Communications (ATC2017), IEEE, 186-191, Oct. 2017, Peer-reviewed, In the proposed scheme, the chaotic spreading sequence is transmitted on a predefined frequency, the same as in the conventional MC-DCSK, while each of the remaining frequencies is phase-shifted by a 90° angle in order to produce two quadrature sub-carriers located at the same frequency. These subcarriers are modulated by the data-bearing sequences, which are the product of the chaotic spreading sequence and the corresponding bit sub-streams in parallel. The use of quadrature modulation aims at doubling the data rate for a defined bandwidth and hence improving the bandwidth efficiency of the system. In the receiver, the chaotic sequence retrieved from the predefined frequency is correlated with the data-bearing sequences retrieved from the subcarriers. The bit sub-streams are recovered based on the sign of the correlation values. The structure and operation of the conventional and proposed schemes are described. The BER performance over a typical model of radio channel is theoretically analyzed and then verified by numerical simulations. The improvement in terms of bit rate, energy and bandwidth efficiencies of QMC-DCSK is evaluated in comparison to those of MC-DCSK.
International conference proceedings, English

A low power AES-GCM authenticated encryption core in 65nm SOTB CMOS process
Van-Phuc Hoang; Van-Tinh Nguyen; Anh-Thai Nguyen; Cong-Kha Pham
Midwest Symposium on Circuits and Systems, Institute of Electrical and Electronics Engineers Inc., 2017-, 112-115, 27 Sep. 2017, Peer-reviewed, This paper presents a low power AES-GCM authenticated encryption IP core which combines an improved four-parallel architecture, an advanced 65nm SOTB CMOS technology and a low complexity clock gating technique. As a result, the power consumption of the proposed AES-GCM core is only 8.9mW, which is lower than other AES-GCM IP cores presented in the literature. The detailed implementation results are also presented and discussed.
International conference proceedings, English
DOI URL

FPGA-based frequent items counting using matrix of equality comparators
Trong-Thuc Hoang; Xuan-Thuan Nguyen; Hong-Thu Nguyen; Nhu-Quynh Truong; Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
Midwest Symposium on Circuits and Systems, Institute of Electrical and Electronics Engineers Inc., 2017-, 285-288, 27 Sep. 2017, Peer-reviewed, In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys the equality comparator matrix for comparing the input items with themselves to count them instantly within a single operating clock. The proposed architecture is applied to the case of the 8-bit item. That means 256 different types of items in total. The system is built and verified on the Altera Arria V SoC Development Kit. The experimental results show that the implementation can run at the maximum clock frequency of 40.85 MHz and requires 51,094 ALUTs and 8,417 registers, which is about 29% of the FPGA's resources. The average throughput reaches 1,280 million items per second, which is about 50 times faster than that of the software-based application at the same setting.
International conference proceedings, English
DOI URL

Highly parallel bitmap-based regular expression matching for text analytics
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Katsumi Inoue; Osamu Shimojo; Cong-Kha Pham
Proceedings - IEEE International Symposium on Circuits and Systems, Institute of Electrical and Electronics Engineers Inc., 25 Sep. 2017, Peer-reviewed, Text analytics has become increasingly important in the past few years because of the substantial growth in the amount of research, business, and government needs. An efficient text analytics system is likely to require high-powered regular expression matching (REGEX), as REGEX operations dominate the whole execution time. Some approaches have exploited the parallelism of graphic processing units (GPUs) and field-programmable logic arrays (FPGAs) to boost REGEX's performance. Nevertheless, those approaches still used a finite-state automaton to detect the given patterns, while the automaton structure is naturally inadequate for parallel processing. In this paper, we propose a completely different hardware architecture of REGEX that employs a bitmap index instead of the finite-state automaton. Internal logic gates/registers and embedded memory of FPGA are used to construct the query processing units and a bitmap index, respectively. The experimental results on an Intel Arria V FPGA prove that our REGEX is fully operational at 100 MHz and can process a 64-character query inside a 64-KB text data within 43.76 μs. The throughput achieved, therefore, reaches 11.98 Gbps.
International conference proceedings, English
DOI URL

A Scalable High-Performance Priority Encoder Using 1D-Array to 2D-Array Conversion
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 64, 9, 1102-1106, Sep. 2017, Peer-reviewed, In our prior study of an L-bit priority encoder (PE), a so-called one-directional-array to two-directional-array conversion method is deployed to turn an L-bit input data into an M×N-bit matrix. Following this, an N-bit PE and an M-bit PE are employed to obtain a row index and column index. From those, the highest priority bit of L-bit input data is achieved. This brief extends our previous work to construct a scalable architecture of high-performance large-sized PEs. An optimum pair of (M, N) and look-ahead signal are proposed to improve the overall PE performance significantly. The evaluation is achieved by implementing a variety of PEs whose L varies from 4-bit to 4096-bit in 180-nm CMOS technology. According to post-place-and-route simulation results, at PE size of 64 bits, 256 bits, and 2048 bits the operating frequencies reach 649 MHz, 520 MHz, and 370 MHz, which are 1.2 times, 1.5 times, and 1.4 times, as high as state-of-the-art ones.
Scientific journal, English
URL
DOI URL
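
The row/column decomposition described in the abstract above can be illustrated with a minimal software sketch. This is not the authors' circuit: it assumes that index 0 has the highest priority, that the input is split into rows of `n_cols` bits, and that each row is reduced to a single flag by OR-ing its bits. The function names `priority_encode` and `priority_encode_2d` are hypothetical and chosen only for this illustration.

```python
def priority_encode(bits):
    """Return the index of the first set bit, or None if no bit is set."""
    for i, b in enumerate(bits):
        if b:
            return i
    return None


def priority_encode_2d(bits, n_cols):
    """Find the highest-priority set bit via a 1D-to-2D conversion.

    Assumption: lower index means higher priority; len(bits) is a
    multiple of n_cols.
    """
    # Reshape the 1D input into rows of n_cols bits each.
    rows = [bits[i:i + n_cols] for i in range(0, len(bits), n_cols)]
    # One flag per row: is any bit in that row set?
    row_flags = [any(row) for row in rows]
    # A small encoder over the row flags gives the row index.
    row = priority_encode(row_flags)
    if row is None:
        return None
    # A second small encoder over the selected row gives the column index.
    col = priority_encode(rows[row])
    return row * n_cols + col


bits = [0] * 16
bits[9] = 1
bits[13] = 1
print(priority_encode_2d(bits, 4))
```

In this sketch two small encoders (one over the rows, one over the columns) replace a single 16-input encoder, which is the general idea behind building large encoders from small ones.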

Reliable and Energy-Efficient Transmission on the Internet-of-Video-Things
Yuichiro Mori; Xuan-Thuan Nguyen; Cong-Kha Pham
Proc. of The 17th International Symposium on Communications and Information Technologies (ISCIT2017), 1-4, Sep. 2017, Peer-reviewed, Due to the rapid development of smart homes, smart grid, and intelligent transportation, the Internet-of-Video-Things (IoVT) has become increasingly important. IoVT is considered as a part of Internet-of-Things (IoT) that can effectively deal with large volumes of data, such as image and video. In IoVT, reliable and energy-efficient transmission is extremely important. The reliability guarantees all data are properly transferred in the network, while the energy efficiency allows a large amount of data to be processed at low power consumption. In this paper, a hardware platform based on Raspberry Pi Zero (RPZ) is proposed. RPZ is exploited due to its integrated H.264 hardware encoder/decoder. A source node is composed of an RPZ, a camera, and an Atmel RF, whereas a sink node excludes the camera. The input is a 640×480@30fps video, and the output is the 300-Kbps H.264 encoded bit stream. Based on various experiments, we concluded that data are properly transferred and the energy per bit is approximately 6.4 nJ/bit.
International conference proceedings, English

FPGA-based Frequent Items Counting Using Matrix of Equality Comparators
Trong-Thuc Hoang; Xuan-Thuan Nguyen; Hong-Thu Nguyen; Nhu-Quynh Truong; Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
Proc. of The 60th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 285-288, Aug. 2017, Peer-reviewed, In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys the equality comparator matrix for comparing the input items with themselves to count them instantly within a single operating clock. The proposed architecture is applied to the case of the 8-bit item. That means 256 different types of items in total. The system is built and verified on the Altera Arria V SoC Development Kit. The experimental results show that the implementation can operate at a maximum clock frequency of 40.85 MHz and requires 51,094 ALUTs and 8,417 registers, which is about 29% of the FPGA’s resources. The average throughput reaches 1,280 million items per second, which is about 50 times faster than that of the software-based application at the same setting.
International conference proceedings, English
URL
DOI URL

A Low Power AES-GCM Authenticated Encryption Core in 65nm SOTB CMOS Process
Van-Phuc Hoang; Van-Tinh Nguyen; Anh-Thai Nguyen; Cong-Kha Pham
Proc. of The 60th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 112-115, Aug. 2017, Peer-reviewed, This paper presents a low power AES-GCM IP core which combines an improved four-parallel architecture, an advanced 65nm SOTB CMOS ASIC library and a low complexity clock gating technique. The power consumption of the proposed AES-GCM core with clock gating is only 8.9 mW, which is much lower than other AES-GCM IP cores presented in the literature.
International conference proceedings, English
DOI URL

Low Power Constant gm Rail-to-Rail Opamp only Using Subthreshold Region
Takayuki Ito; Cong-Kha Pham
Proc. of The 2017 Taiwan and Japan Conference on Circuits and Systems, **-**, Aug. 2017, Peer-reviewed, In recent years, many types of portable equipment, such as smartphones and tablets, have come into use. These devices are required to operate with low power consumption for long battery-driven periods. In addition, as CMOS technology develops, transistor sizes keep shrinking in accordance with the scaling law. Although scaling lowers the power consumption of LSIs, it also lowers the supply voltage (VDD). This reduces the signal range and degrades the signal-to-noise ratio (SNR). Therefore, rail-to-rail operational amplifiers (opamps) will become necessary, because they can handle wide signal swings at both input and output and thus improve the SNR. In this work, we propose a low-power rail-to-rail opamp that operates only in the subthreshold region. In addition, we realize a constant gm to prevent signal distortion when the opamp amplifies the signal. The proposed opamp can operate at a low supply voltage and is suitable for future low-power designs.
International conference proceedings, English

Highly Parallel Bitmap-Based Regular Expression Matching for Text Analytics
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Katsumi Inoue; Osamu Shimojo; Cong-Kha Pham
Proc. of The IEEE International Symposium on Circuits and Systems (ISCAS 2017), IEEE, 1-4, 28 May 2017, Peer-reviewed, Text analytics has become increasingly important in the past few years because of the substantial growth in the amount of research, business, and government needs. An efficient text analytics system is likely to require high-powered regular expression matching (REGEX), as REGEX operations dominate the whole execution time. Some approaches have exploited the parallelism of graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) to boost REGEX’s performance. Nevertheless, those approaches still used a finite-state automaton to detect the given patterns, while the automaton structure is naturally inadequate for parallel processing. In this paper, we propose a completely different hardware architecture of REGEX that employs a bitmap index instead of the finite-state automaton. Internal logic gates/registers and embedded memory of FPGA are used to construct the query processing units and a bitmap index, respectively. The experimental results on an Intel Arria V FPGA prove that our REGEX is fully operational at 100 MHz and can process a 64-character query inside a 64-KB text data within 43.76 µs. The throughput achieved, therefore, reaches 11.98 Gbps.
International conference proceedings, English
URL
DOI URL
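
The throughput figure quoted in the abstract above follows from the text size and the processing time. The short check below reproduces that arithmetic, assuming that "64-KB" means 64 × 1024 bytes and that "Gbps" means 10^9 bits per second; both readings are assumptions, as the abstract does not spell them out. The name `throughput_gbps` is introduced only for this illustration.

```python
def throughput_gbps(num_bytes, seconds):
    """Return the throughput in gigabits per second (1 Gbps = 1e9 bit/s)."""
    return num_bytes * 8 / seconds / 1e9


text_bytes = 64 * 1024      # 64-KB text data (assumed binary kilobytes)
elapsed = 43.76e-6          # 43.76 microseconds, as stated in the abstract

print(round(throughput_gbps(text_bytes, elapsed), 2))
```

Under these assumptions the result rounds to the 11.98 Gbps stated in the abstract.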

A Low-Latency Parallel Pipeline CORDIC
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham
IEICE TRANSACTIONS ON ELECTRONICS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E100C, 4, 391-398, Apr. 2017, Peer-reviewed, COordinate Rotation DIgital Computer (CORDIC) is an efficient algorithm to compute elementary functions such as trigonometric, exponential, and logarithmic functions. However, the main drawback of the conventional CORDIC is that the number of iterations is equal to the number of angle constants. Among a great deal of research to overcome this disadvantage, the angle recoding method is effective because it is capable of reducing the number of iterations by 50%. Nevertheless, the hardware architecture of this algorithm is difficult to implement as a pipeline. Therefore, a low-latency parallel pipeline hybrid adaptive CORDIC (PP-CORDIC) architecture is proposed in this paper. In the design, a hybrid architecture is exploited together with pipeline and parallel techniques to achieve low latency. This design is able to operate at a frequency of 122.6 MHz and has a latency of 8, 12, and 15 clock cycles in the best, average, and worst case, respectively. More significantly, the latency of PP-CORDIC in the worst case is 1.1 times lower than that of Altera's commercial floating-point sine and cosine IP cores.
Scientific journal, English
DOI URL
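
For orientation, the conventional CORDIC that the abstract above takes as its baseline can be sketched as follows. This is the textbook rotation-mode algorithm with one iteration per angle constant, not the PP-CORDIC architecture of the paper. It uses floating-point arithmetic for readability, whereas hardware would use shifts and adds. The function name `cordic_sin_cos` and the default of 24 iterations are assumptions made for this illustration.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Conventional rotation-mode CORDIC; returns (sin(theta), cos(theta)).

    Valid for |theta| up to roughly 1.74 rad, the convergence range of
    the basic algorithm without any range reduction.
    """
    # Angle constants atan(2^-i), one per iteration.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-compute the scale factor that compensates the rotation gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(abs(s - math.sin(0.5)) < 1e-5, abs(c - math.cos(0.5)) < 1e-5)
```

Every call runs all `iterations` steps regardless of the input angle, which is the fixed-latency behaviour that angle recoding and adaptive angle selection aim to shorten.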

DYNAMIC NODE LABELING SCHEMES FOR XML UPDATES
Xuan-Thuan Nguyen; Su-Cheng Haw; Samini Subramaniam; Cong-Kha Pham
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON COMPUTING AND INFORMATICS, UNIV UTARA MALAYSIA-UUM, 505-510, 2017, Peer-reviewed, Recent years have witnessed the rapid development of XML labeling schemes for the facilitation of XML query processing. Nonetheless, relabeling remains a daunting challenge because of the space and time it consumes whenever labels are inserted or deleted. In this paper, we review three XML labeling schemes that completely avoid relabeling and can re-use the deleted labels for encoding the new nodes. Afterwards, we also discuss the current trends in labeling schemes.
International conference proceedings, English

A Floating-point FFT Twiddle Factor Implementation Based on Adaptive Angle Recoding CORDIC
Phuong-Thao Vo-Thi; Trong-Thuc Hoang; Cong-Kha Pham; Duc-Hung Le
2017 INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SIGNAL PROCESSING, TELECOMMUNICATIONS & COMPUTING (SIGTELCOM), IEEE, 21-26, 2017, Peer-reviewed, In this paper, a single-precision floating-point FFT Twiddle Factor (TF) implementation is proposed. The architecture is based on Adaptive Angle Recoding CORDIC (AARC) algorithm. The TF design is built and verified on Altera Stratix IV FPGA chip and 65 nm SOTB synthesis. The FPGA implementation has 103.9 MHz maximum frequency, throughput result of 16.966 Mega-Sample per second (MSps), and resources utilization of 7,747 ALUTs and 625 registers. On the other hand, the SOTB synthesis has 16,858 standard cells on an area of 86,718 µm², 166 MHz maximum frequency, and the speed of 27.107 MSps. The accuracy results are 1.133E-10 Mean-Square-Error (MSE) and about 26 part-per-million (ppm) maximum error-ratio.
International conference proceedings, English
DOI URL

A 180-nm CMOS Bitmap-Index-Based Query Processor for Fast Data Analytics
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
2017 INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN SIGNAL PROCESSING, TELECOMMUNICATIONS & COMPUTING (SIGTELCOM), IEEE, 155-157, 2017, Peer-reviewed, Fast database analytics has become increasingly important nowadays due to the massive growth of global data created by social networking services, mobile devices, and Internet-of-Things. In this paper, a high-performance 180-nm query processor using bitmap index technology is presented. The parallel processing is fully applied in the processor so that thousands of records can be queried in every clock cycle. The post-simulation results prove that a 1.8-V 200-MHz processor can process as many as 5.7×10⁶ queries per 92.7×10⁶ records in every second.
International conference proceedings, English
DOI URL

A hybrid adaptive CORDIC in 65nm SOTB CMOS process
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Trong-Thuc Hoang; Duc-Hung Le
Proceedings - IEEE International Symposium on Circuits and Systems, Institute of Electrical and Electronics Engineers Inc., 2016-, 2158-2161, 29 Jul. 2016, Peer-reviewed, In this paper, a hybrid adaptive Coordinate Rotation Digital Computer (HA-CORDIC) has been implemented in 65nm Silicon On Thin Buried oxide (SOTB) CMOS technology. In the HA-CORDIC implementation, the adaptive algorithm is utilized for reducing the number of iterations of the CORDIC algorithm. In comparison with other floating-point CORDIC designs, the latency of our proposed scheme is lower. It spends only 12, 20, and 26 clock cycles in the best, average, and worst case, respectively. The HA-CORDIC exploits some design techniques such as resource sharing, pipeline, and parallel processing to achieve low-resource and low-latency. In 65nm SOTB CMOS technology, this design is able to operate at 50 MHz frequency with 0.5 V supply voltage, 0.36 mA current, and 0.058 mm² area. The power consumption of HA-CORDIC is 0.251 mW, about three times lower than the one in conventional CMOS technology. Its leakage current is about 0.492 μA if the supply voltage VDD is 0.4 V and the bias voltage VBB is -1.5 V. This leakage current is about four times lower than that of HA-CORDIC implemented in conventional CMOS.
International conference proceedings, English
DOI URL

An efficient FPGA-based database processor for fast database analytics
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Trong-Thuc Hoang; Katsumi Inoue; Osamu Shimojo; Toshio Murayama; Kenji Tominaga; Cong-Kha Pham
Proceedings - IEEE International Symposium on Circuits and Systems, Institute of Electrical and Electronics Engineers Inc., 2016-, 1758-1761, 29 Jul. 2016, Peer-reviewed, Recent years have witnessed a massive growth of global data due to the ubiquitous Internet-of-Things products, social networking services, and mobile devices. Fast database analytics, therefore, has become increasingly attractive to numerous researchers. In this paper, a low-latency FPGA-based Database Processor (DBP) using bitmap index is proposed. By exploiting available embedded memory blocks and logic elements, a 50-MHz DBP is capable of performing 1,024 queries over all 32,768 4-KB records within around 3.31 ms. In other words, the DBP can analyze nearly 37.76 GB of data per second.
International conference proceedings, English
DOI URL

An FPGA approach for high-performance multi-match priority encoder
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
IEICE ELECTRONICS EXPRESS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, 13, 13, 1-9, Jul. 2016, Peer-reviewed, In this paper, a scalable high-performance multi-match priority encoder (MPE) for information retrieval is presented. This approach deploys a new design architecture to construct the large-sized MPEs by using an 8-bit priority encoder as a base. The experiments in an 8-bit MPE, 64-bit MPE, and 2,048-bit MPE prove that the achieved throughputs are 1.5 times, 1.7 times, and 1.4 times as high as those of previous works. Furthermore, a 4,096-bit MPE is fully operational in an information retrieval system and is capable of returning one match per clock cycle. At the operating frequency of 75 MHz, the processing times in the worst and best cases are 54.6 µs and 0.03 µs, respectively.
Scientific journal, English
DOI URL

A High-Throughput and Low-Power Design for Bitmap Indexing on 65-nm SOTB CMOS
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
Proc. of The IEEE International Conference on IC Design and Technology (ICICDT 2016), 1-4, 27 Jun. 2016, Peer-reviewed
International conference proceedings, English

An FPGA approach for fast bitmap indexing
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
IEICE ELECTRONICS EXPRESS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, 13, 4, 1-9, Feb. 2016, Peer-reviewed, In this paper, an efficient architecture of an FPGA-based bitmap index creation is proposed. The design utilizes a content addressable memory together with a bit-level transpose matrix to index multi-record documents by several given keywords. The experiments in a Cyclone V SX FPGA proved that our circuit could attain a throughput of 330.6 million records per second while only using around 71% of embedded memory together with 45% of lookup tables and registers. In fact, the achieved throughput is 2.8 times and 1.7 times as high as that of CPU-based and GPU-based designs, respectively.
Scientific journal, English
DOI URL
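
As background for the bitmap-index entries in this list, the sketch below shows what a bitmap index is in software terms: one bit vector per keyword, with bit i set when record i contains that keyword. It is a generic illustration only and does not model the content addressable memory or the bit-level transpose used in the paper. The function names `build_bitmap_index` and `matching_records` and the sample records are hypothetical.

```python
def build_bitmap_index(records, keys):
    """Return {key: bitmap}, where bit i of the bitmap is 1 if records[i] contains key."""
    index = {}
    for key in keys:
        bitmap = 0
        for i, record in enumerate(records):
            if key in record:
                bitmap |= 1 << i
        index[key] = bitmap
    return index


def matching_records(bitmap):
    """Decode a bitmap into the list of record numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample data: each record is a set of keywords.
records = [{"fpga", "cmos"}, {"cmos"}, {"fpga", "cordic"}]
index = build_bitmap_index(records, ["fpga", "cmos"])

# A multi-condition query reduces to bitwise operations on the bitmaps.
print(matching_records(index["fpga"] & index["cmos"]))  # records with both keywords
print(matching_records(index["fpga"] | index["cmos"]))  # records with either keyword
```

Because each query condition is a bitwise AND or OR over whole bit vectors, many records are evaluated at once, which is the property the hardware designs in these papers exploit.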

A Parallel Pipeline CORDIC based on Adaptive Angle Selection
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Trong-Thuc Hoang; Duc-Hung Le
2016 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATIONS (ICEIC), IEEE, 411-414, 2016, Peer-reviewed, COordinate Rotation DIgital Computer (CORDIC) is an efficient algorithm to compute elementary arithmetic such as multiplication, division, and root extraction. However, the conventional CORDIC algorithm requires high latency to obtain the results. This paper proposes a low-latency parallel pipeline CORDIC (PP-CORDIC) to calculate trigonometric functions. The results show that PP-CORDIC can operate at a frequency of 83.64 MHz with a latency of 10, 15, and 17 clock cycles in the best, average, and worst case, respectively. The hardware architecture occupies 7,035 LUTs and 3,409 registers on a Stratix IV FPGA.
International conference proceedings, English

A decentralized localization scheme for swarm robotics based on coordinate geometry and distributed gradient descent
Vy-Long Dang; Binh-Son Le; Trong-Tu Bui; Huu-Thuan Huynh; Cong-Kha Pham
2016 7TH INTERNATIONAL CONFERENCE ON MECHANICAL, INDUSTRIAL, AND MANUFACTURING TECHNOLOGIES (MIMT 2016), E D P SCIENCES, 54, 1-6, 2016, Peer-reviewed, In this paper, a decentralized localization scheme using coordinate geometry and distributed gradient descent (DGD) algorithm is presented. Coordinate geometry is proposed to provide a rough estimation of robots' location instead of the traditional trigonometry approach, which suffers from flip and discontinuous flex ambiguity. Then, these estimations will be used as initial values for DGD algorithm to determine robots' real position. Evaluated results on real mobile robots show an average mean error of 2.56 cm, which is close to the minimum achievable accuracy of the testing platform (2 cm). For a team of eight robots, the total average run time of the proposed scheme is 66.7 seconds. Finally, its application in swarm robotics is verified by experimenting with a self-assembly algorithm named DASH.
International conference proceedings, English
DOI URL

An Efficient FPGA-based DataBase Processor for Fast Database Analytics
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Trong-Thuc Hoang; Katsumi Inoue; Osamu Shimojo; Toshio Murayama; Kenji Tominaga; Cong-Kha Pham
2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 1758-1761, 2016, Peer-reviewed, Recent years have witnessed a massive growth of global data due to the ubiquitous Internet-of-Things products, social networking services, and mobile devices. Fast database analytics, therefore, has become increasingly attractive to numerous researchers. In this paper, a low-latency FPGA-based Database Processor (DBP) using bitmap index is proposed. By exploiting available embedded memory blocks and logic elements, a 50-MHz DBP is capable of performing 1,024 queries over all 32,768 4-KB records within around 3.31 ms. In other words, the DBP can analyze nearly 37.76 GB of data per second.
International conference proceedings, English

A Hybrid Adaptive CORDIC in 65nm SOTB CMOS Process
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Trong-Thuc Hoang; Duc-Hung Le
2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 2158-2161, 2016, Peer-reviewed, In this paper, a hybrid adaptive COordinate Rotation DIgital Computer (HA-CORDIC) has been implemented in 65nm Silicon On Thin Buried oxide (SOTB) CMOS technology. In the HA-CORDIC implementation, the adaptive algorithm is utilized for reducing the number of iterations of the CORDIC algorithm. In comparison with other floating-point CORDIC designs, the latency of our proposed scheme is lower. It spends only 12, 20, and 26 clock cycles in the best, average, and worst case, respectively. The HA-CORDIC exploits some design techniques such as resource sharing, pipeline, and parallel processing to achieve low-resource and low-latency. In 65nm SOTB CMOS technology, this design is able to operate at 50 MHz frequency with 0.5 V supply voltage, 0.36 mA current, and 0.058 mm² area. The power consumption of HA-CORDIC is 0.251 mW, about three times lower than the one in conventional CMOS technology. Its leakage current is about 0.492 µA if the supply voltage VDD is 0.4 V and the bias voltage VBB is -1.5 V. This leakage current is about four times lower than that of HA-CORDIC implemented in conventional CMOS.
International conference proceedings, English

High-performance DCT Architecture Based on Angle Recoding CORDIC and Scale-free Factor
Trong-Thuc Hoang; Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Duc-Hung Le
2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), IEEE, 199-204, 2016, Peer-reviewed, In this paper, the authors proposed high-performance DCT architectures based on COordinate Rotation DIgital Computer (CORDIC). The implementations deployed Adaptive angle recoding CORDIC (ACor) method and Scale-Free Factor (SFF) technique. There are two models presented in the paper: ACor-based Chen-DCT (ACor-DCT-C) and ACor-based Loeffler-DCT (ACor-DCT-L). The critical path in both models is six adder delays. The experimental results give the coding gain performances of 8.8238 dB and 8.8229 dB for ACor-DCT-C and ACor-DCT-L, respectively. The mean-square-error (MSE) results are 6.27e-6 and 4.42e-4 for ACor-DCT-C and ACor-DCT-L, respectively. Each design requires 36 adders and 16 shifters in its implementation.
International conference proceedings, English

A Bit-Level Matrix Transpose for Bitmap-Index-Based Data Analytics
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), IEEE, 217-220, 2016, Peer-reviewed, When it comes to data analytics, bitmap index is likely to be one of the most effective approaches to processing a large number of queries with multiple conditions. In this work, we propose a ping-pong architecture of a bit-level matrix transpose (BLMT), which is successfully applied to the creation of bitmap index. BLMT employs a set of registers and multiplexers operating in parallel, so as to transpose every single row of the matrix within one clock cycle. The experimental results in an Altera Arria V SX FPGA show that BLMT achieves the peak throughput of 6.37 Gbps at 50-MHz operating frequency, which is equivalent to 99.5% of the theoretical throughput. Additionally, BLMT only costs approximately 1.5% of lookup tables and 4.8% of registers.
International conference proceedings, English

A Compact, Ultra-Low Power AES-CCM IP Core for Wireless Body Area Networks
Van-Phuc Hoang; Thi-Thanh-Dung Phan; Van-Lan Dao; Cong-Kha Pham
2016 IFIP/IEEE INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION (VLSI-SOC), IEEE, 1-4, 2016, Peer-reviewed, This paper presents a compact, ultra-low power AES-CCM authenticated encryption IP core for wireless body area networks (WBANs) by combining a low area 8-bit AES encryption core, iterative structure and other optimized circuits. The proposed AES-CCM IP core can be used for the message security at the MAC level, e.g. message encryption and authentication, based on AES forward cipher function with a 128-bit key for counter and cipher block chaining modes of operation. The implementation results show that the proposed AES-CCM IP core achieves a very high resource efficiency and ultra-low power consumption while meeting the requirement of operation speed in WBANs.
International conference proceedings, English
DOI URL

An All-Digital PLL with SAR Frequency Locking System in 65nm SOTB CMOS
Keita Arai; Cong-Kha Pham
2016 IEEE SOI-3D-SUBTHRESHOLD MICROELECTRONICS TECHNOLOGY UNIFIED CONFERENCE (S3S), IEEE, 1-2, 2016, Peer-reviewed, This paper presents an all-digital PLL (ADPLL) which synthesizes any frequency using the successive approximation (SAR) algorithm. The proposed ADPLL consists of a high-frequency resolution digitally controlled oscillator, a time-to-digital converter, a frequency detection divider and the SAR controller. The proposed ADPLL is designed using 65nm SOTB CMOS process and occupies an area of 124.6 × 68.4 µm². The range of output frequency is from 577 to 1876 MHz at 1.0 V power supply. The power consumption is 0.46 mW at 1876 MHz. Lock-in takes 10 clock cycles in the best case and 34 clock cycles in typical cases.
International conference proceedings, English
DOI URL

A High-Performance Bitmap-Index-Based Query Processor on 65-nm SOTB CMOS Process
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
2016 IEEE SOI-3D-SUBTHRESHOLD MICROELECTRONICS TECHNOLOGY UNIFIED CONFERENCE (S3S), IEEE, 1-2, 2016, Peer-reviewed, This paper presents the efficient architecture of a bitmap-index-based query processor for fast data analytics on 65-nm SOTB CMOS process. The post-simulation with a supply voltage of 0.55 V indicates that the processor could operate at 125 MHz and process up to 232 million records/second when 32 keys and queries are utilized. Furthermore, by applying the reverse body bias voltage of -2 V, the leakage current is reduced by up to 73 times as compared to that in normal body bias.
International conference proceedings, English
DOI URL

An Ultra-Low Power AES Encryption Core in 65nm SOTB CMOS Process
Van-Phuc Hoang; Van-Lan Dao; Cong-Kha Pham
2016 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), IEEE, 89-90, 2016, Peer-reviewed, This paper presents an efficient ASIC implementation of a low area and ultra-low power AES encryption core with optimized S-box, Rcon and control blocks, combined with a simple clock gating technique using an ultra-low power 65nm SOTB CMOS technology. The ASIC implementation results show that the proposed AES encryption core requires a small number of clock cycles with ultra-low power consumption and achieves higher resource usage efficiency compared with other designs.
International conference proceedings, English
DOI URL

A High-Throughput Multi-Match Priority Encoder for Data Retrieval on 65-nm SOTB CMOS Process
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), IEEE, 2392-2395, 2016, Peer-reviewed, In this paper, a high-throughput multi-match priority encoder (MPE) for data retrieval is implemented in a low power 65-nm SOTB CMOS process. This approach employs an 8-bit priority encoder (PE) as a base and utilizes a new design architecture to construct a 2,048-bit MPE (MPE2K). The experimental results on an FPGA proved that the operating frequency of MPE2K is 1.42 times higher than that of a similar design, while the resource utilization is 4.39 times lower. Moreover, when applied in a 100-MHz data analytics system, MPE2K achieves the minimum throughput of 99.8 Mbps. The post-simulation results on SOTB process indicate that MPE2K can operate at 312 MHz and consumes 9.56 mW at 1.0 V. Additionally, at 0.4 V, the leakage current in idle mode is reduced approximately 203.8 times due to the usage of reverse body bias voltage.
International conference proceedings, English
DOI URL

DataBase Processor (DBP) - A New Search Engine for the Big Data Era
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Trong-Thuc Hoang; Katsumi Inoue; Osamu Shimojo; Toshio Murayama; Kenji Tominaga; Cong-Kha Pham
Proc. of The 2015 International Conference on Integrated Circuits, Design, and Verification (ICDV 2015), 9-14, 10 Aug. 2015, Peer-reviewed
International conference proceedings, English

An FPGA Implementation of OFDM System for IEEE 802.22 WRAN
Tieu-Khanh Luong; Van-Phuc Hoang; Cong-Kha Pham
Proc. of The 2015 International Conference on Integrated Circuits, Design, and Verification (ICDV 2015), 104-107, 10 Aug. 2015, Peer-reviewed
International conference proceedings, English

Parallel pipelining configurable multi-port memory controller for multimedia applications
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
Proceedings - IEEE International Symposium on Circuits and Systems, Institute of Electrical and Electronics Engineers Inc., 2015-, 2908-2911, 27 Jul. 2015, Peer-reviewed, Despite many significant improvements of processors up to now, the off-chip memory performance has still lagged far behind. The high-performance memory controller, therefore, has become the key to success. In this paper, a parallel pipelining configurable multi-port memory controller is proposed to not only exploit the external memory bandwidth effectively, but also provide the flexibility in use and the independence from other system architectures. The proposed architecture is composed of multi-clock multi-data-width buffers to speed up the transactions, embedded memory to store the configuration, and a priority scheme arbiter to schedule all accesses. The design, then, is evaluated in a low-cost low-power Altera Cyclone V FPGA with 1 GB DDR3 external memory. The experimental results demonstrate that the proposed controller can support up to 32 concurrent connections with various clocks and data widths, and achieve approximately 82% and 87% of the theoretical peak bandwidth in the write and read processes, respectively.
International conference proceedings, English
DOI URL

A Perpetuum Mobile 32bit CPU on 65nm SOTB CMOS Technology with Reverse-Body-Bias Assisted Sleep Mode
Koichiro Ishibashi; Nobuyuki Sugii; Shiro Kamohara; Kimiyoshi Usami; Hideharu Amano; Kazutoshi Kobayashi; Cong-Kha Pham
IEICE TRANSACTIONS ON ELECTRONICS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E98C, 7, 536-543, Jul. 2015, Peer-reviewed, A 32-bit CPU that can operate for more than 15 years on a 220 mAh Li battery, or operate eternally with an indoor-light energy harvester, is presented. The CPU was fabricated by using 65nm SOTB CMOS technology (Silicon on Thin Buried oxide) where gate length is 60nm and BOX layer thickness is 10nm. The threshold voltage was designed to be as low as 0.19V so that the CPU operates in the over-threshold region, even at lower supply voltages down to 0.22V. Large reverse body bias up to -2.5V can be applied to bodies of SOTB devices without increasing gate-induced drain leakage current, which reduces the sleep current of the CPU. It operated at 14MHz and 0.35V with the lowest energy of 13.4 pJ/cycle. The sleep current of 0.14 µA at 0.35V with the body bias voltage of -2.5V was obtained. These characteristics are suitable for such new applications as energy harvesting sensor network systems and long-lasting wearable computers.
Scientific journal, English
DOI URL

Design of co-processor for real-time HMM-based text-to-speech on hardware system applied to Vietnamese
Trong-Thuc Hoang; Hong-Kiet Su; Hieu-Binh Nguyen; Duc-Hung Le; Huu-Thuan Huynh; Trong-Tu Bui; Cong-Kha Pham
IEICE ELECTRONICS EXPRESS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, 12, 14, 1-10, Jul. 2015, Peer-reviewed, Although HMM-based TTS has been studied for many years, there are some limitations, such as real-time applications based on low-performance and low-cost systems. In this paper, we present a design of a TTS co-processor used for HMM-based Text-to-Speech (TTS) hardware systems. Based on a dedicated FPU and resource sharing architecture, the co-processor can compute many of the DSP algorithms required by HMM at very high speed. The system has been built and verified on the FPGA system with English and Vietnamese languages. The results show that it can compute up to 3 words per second at a frequency of 100 MHz with a resource cost of about 32,000 logic elements, 19,000 registers, and 957 KB memory.
Scientific journal, English
DOI URL

Low-resource low-latency hybrid adaptive CORDIC with floating-point precision
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
IEICE ELECTRONICS EXPRESS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, 12, 9, 1-12, May 2015, Peer-reviewed, Despite having been proposed more than 50 years ago, COordinate Rotation DIgital Computer (CORDIC) is still one of the most effective algorithms for elementary function calculation so far. Original CORDIC, however, suffers high latency due to its nature of unvarying number of rotations. As a result, a low-latency hybrid adaptive (HA) CORDIC is proposed in this paper. Firstly, adaptive angle selection decreases total iterations up to 50% with respect to higher accuracy of results. Secondly, hybrid architecture including fixed-point input and floating-point output reduces the total hardware utilization and enhances the dynamic range of final results. Lastly, parallel and pipeline processing together with resource sharing technique allow the design to operate fully at 175.7 MHz with low resource consumption: 1,139 LUTs and 489 registers.
Scientific journal, English
DOI URL

An Efficient Multi-port Memory Controller for Multimedia Applications
Xuan-Thuan Nguyen; Cong-Kha Pham
2015 20TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), IEEE, 12-13, 2015, Peer-reviewed, The remedy for the processor-memory bottleneck has been considered as the key to success because of the substantial growth in multimedia applications. In this paper, an efficient external multi-port memory controller (MPMC), which consists of several buffers to speed up the transactions, embedded memory to store the configuration, and an arbiter to schedule all accesses, is proposed. The experimental results prove that the proposed design can operate independently of other system architectures, support up to 16 simultaneous external components with different clocks and data widths, and achieve up to 88% and 92% of the theoretical peak bandwidth for the write and read processes, respectively.
International conference proceedings, English
DOI URL

Parallel Pipelining Configurable Multi-port Memory Controller For Multimedia Applications
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 2908-2911, 2015, Peer-reviewed, Despite many significant improvements in processors, off-chip memory performance still lags far behind. The high-performance memory controller, therefore, has become the key to success. In this paper, a parallel pipelining configurable multi-port memory controller is proposed, not only to exploit the external memory bandwidth effectively but also to provide flexibility in use and independence from other system architectures. The proposed architecture is composed of multi-clock multi-data-width buffers to speed up the transactions, embedded memory to store the configuration, and a priority scheme arbiter to schedule all accesses. The design is then evaluated in a low-cost low-power Altera Cyclone V FPGA with 1 GB DDR3 external memory. The experimental results demonstrate that the proposed controller can support up to 32 concurrent connections with various clocks and data widths, and achieve approximately 82% and 87% of the theoretical peak bandwidth in write and read processes, respectively.
International conference proceedings, English

A Reliable Protocol For Multimedia Transmission Over Wireless Sensor Networks
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
2015 11TH CONFERENCE ON PH.D. RESEARCH IN MICROELECTRONICS AND ELECTRONICS (PRIME), IEEE, 302-305, 2015, Peer-reviewed, The development of wireless sensor networks in the last decade has changed the way information is collected and retrieved from the monitored area. However, the harvested data might be too limited for more complex applications such as surveillance, object detection and tracking. For this reason, research efforts in the field of wireless multimedia sensor networks (WMSNs) have grown rapidly. In this paper, the reliability issue is addressed in order to deliver sustained data over WMSNs for long periods of time despite environmental noise. This reliable transport protocol (RTP) combines a modified automatic repeat request error-control with an error-correction mechanism. The experimental results on a Raspberry Pi compute module and an Atmel transceiver show that RTP can transfer data successfully over a wide range of transmission rates, from 16.6 to 430.5 Kbps, at a distance up to 128 m.
International conference proceedings, English
DOI URL

Design of a Low-power Fixed-point 16-bit Digital Signal Processor Using 65nm SOTB Process
Duc-Hung Le; Nobuyuki Sugii; Shiro Kamohara; Xuan-Thuan Nguyen; Koichiro Ishibashi; Cong-Kha Pham
2015 International Conference on IC Design & Technology (ICICDT), IEEE, 1-4, 2015, Peer-reviewed, In this paper, a design of a 16-bit fixed-point digital signal processor (DSP) is proposed. This DSP is based on the Harvard architecture, having two buses for the ALU and a pipeline multiply accumulator (MAC). It comprises 16 general purpose 24-bit registers together with 41 four-cycle instruction sets. The DSP has a simple structure which is compact and flexible. The DSP is designed for low power consumption and implemented on ASIC using the SOTB 65nm process, which is a kind of SOI device. The DSP chip consumes very low power, 282 μW, at an operating voltage of 0.55V and an operating frequency of 200MHz.
International conference proceedings, English
DOI URL

SAR: A Self-Adaptive and Reliable Protocol for Wireless Multimedia Sensor Networks
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Cong-Kha Pham
2015 SEVENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS, IEEE, 760-765, 2015, Peer-reviewed, Recent years have witnessed a substantial growth in wireless multimedia sensor networks (WMSNs) due to the development of low-cost cameras together with low-power system-on-chip (SoC). Unlike previous wireless sensor networks (WSNs), WMSNs strictly require sustained multimedia data to be delivered over long periods of time despite environmental noise. In this paper, the Self-Adaptive and Reliable Point-to-Point (SAR) protocol for WMSNs is presented. This transport protocol not only guarantees to dispatch all messages successfully but also adapts the data rate of the higher layer to prevent network congestion. The design is validated on a pair of Raspberry Pi and Atmel AT86RF212 modules under different conditions. The experimental results indicate that SAR can support a wide range of transmission rates, from 16.6 to 470.5 Kbps, at distances up to 50 m. Moreover, the number of lost messages is reduced by around 47% when the proposed self-adaptive framework, Decrease Quickly Increase Slowly (DQIS), is used.
International conference proceedings, English
DOI URL

A 400mV 0.59mW Low-power CAM-based Pattern Matching System on 65nm SOTB Process
Duc-Hung Le; Nobuyuki Sugii; Shiro Kamohara; Hong-Thu Nguyen; Koichiro Ishibashi; Cong-Kha Pham
TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE, IEEE, 1-2, 2015, Peer-reviewed, A CAM-based matching system for fast exact pattern matching is implemented on ASIC, using 65nm SOTB process, for very low power consumption. The system has a simple structure, which consists of Content Addressable Memory (CAM), AND, SHIFT, and an FSM, and does not employ a Central Processor Unit (CPU) or complicated algorithms. We take advantage of CAM, which has an ability of parallel multi-match mode, for designing the system. The system is applied to fast pattern matching with various required search patterns without using any search principles. In this paper, the system operates at 400mV with power consumption of 0.59mW using the SOTB 65nm process.
International conference proceedings, English
DOI URL

A Low-resource Low-latency Hybrid Adaptive CORDIC in 180-nm CMOS Technology
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Trong-Thuc Hoang; Duc-Hung Le
TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE, IEEE, 1-4, 2015, Peer-reviewed, In this paper, a low-resource low-latency hybrid adaptive COordinate Rotation DIgital Computer (HA-CORDIC) is implemented both in FPGA and 180-nm CMOS technology. The adaptive algorithm reduces the number of iterations by around 50% in comparison with the conventional CORDIC algorithm. The hybrid architecture together with resource sharing, parallel and pipeline processing are utilized in the HA-CORDIC implementation. In the FPGA implementation, the results show that the proposed system can operate at a frequency of 108.15 MHz, consuming 716 LUTs and 473 registers. In the CMOS implementation, the hardware architecture costs 10,299 cells with 0.41 mm² area and fully operates at a frequency of 50 MHz.
International conference proceedings, English
DOI URL

A 0.9V 200kHz Current-mode Successive Approximation Analog-to-Digital Converter in 0.18μm CMOS Technology
Takumu Yomogita; Cong-Kha Pham
Proc. of The 2014 International Conference on Integrated Circuits, Design, and Verification (ICDV 2014), 20-23, Nov. 2014, Peer-reviewed
International conference proceedings, English

Designing a High Performance Cryptographic System for Video Applications
Van Toan Nguyen; Huu Thuan Huynh; Cong-Kha Pham
Proc. of The 2014 International Conference on Integrated Circuits, Design, and Verification (ICDV 2014), 56-61, Nov. 2014, Peer-reviewed
International conference proceedings, English

[Invited] Perpetuum-Mobile Sensor Network Systems using a CPU on 65nm SOTB CMOS Technology
Koichiro Ishibashi; Nobuyuki Sugii; Cong-Kha Pham
Proc. of The 2014 International Conference on Integrated Circuits, Design, and Verification (ICDV 2014), 2-3, Nov. 2014
International conference proceedings, English

An FPGA-based Multi-port Memory Controller for High Bandwidth Applications
Xuan-Thuan NGUYEN; Cong-Kha PHAM
Proc. of The joint conference 4S-2014/AVIC2014 (3rd Solid-State Systems Symposium & VLSI & Related Technologies/17th International Conference on Analog VLSI Circuits), 240-245, 22 Oct. 2014, Peer-reviewed
International conference proceedings, English

A Circuit Structure for MOS Only R-2R Ladder DAC Having Higher Linearity
Takumu Yomogita; Nobuyuki Sugii; Shiro Kamohara; Koichiro Ishibashi; Cong-Kha Pham
Proc. of IEEE International Conference on Communications and Electronics (ICCE2014), 650-654, Jul. 2014, Peer-reviewed
International conference proceedings, English

Point-to-point H.264 Video Streaming over IEEE 802.15.4 with Reed-Solomon Error Correction
Wei-Chun Tung; Nhat-Tan Mai; Duy-Tung Dao; Huu-Thuan Huynh; Cong-Kha Pham
Proc. of International Conference on Green and Human Information Technology 2014 (ICGHIT 2014), 89-93, Feb. 2014, Peer-reviewed
International conference proceedings, English

A Perpetuum Mobile 32bit CPU on 65nm SOTB CMOS Technology with Reverse-Body-Bias Assisted Sleep Mode
Shiro Kamohara; Nobuyuki Sugii; Koichiro Ishibashi; Kimiyoshi Usami; Hideharu Amano; Kazutoshi Kobayashi; Cong-Kha Pham
2014 IEEE HOT CHIPS 26 SYMPOSIUM (HCS), IEEE, 2014, Peer-reviewed
International conference proceedings, English

A CAM-Based Information Detection Hardware System for Fast Image Matching on FPGA
Duc-Hung Le; Tran-Bao-Thuong Cao; Katsumi Inoue; Cong-Kha Pham
IEICE TRANSACTIONS ON ELECTRONICS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E97C, 1, 65-76, Jan. 2014, Peer-reviewed, In this paper, the authors present a CAM-based Information Detection Hardware System for fast, exact and approximate image matching on 2-D data, using FPGA. The proposed system can be potentially applied to fast image matching with various required search patterns, without using search principles. In designing the system, we take advantage of Content Addressable Memory (CAM) which has parallel multi-match mode capability and has been designed, using dual-port RAM blocks. The system has a simple structure, and does not employ any Central Processor Unit (CPU) or complicated computations.
Scientific journal, English
DOI URL

A Perpetuum Mobile 32bit CPU with 13.4pJ/cycle, 0.14 μA Sleep Current using Reverse Body Bias Assisted 65nm SOTB CMOS Technology
Koichiro Ishibashi; Nobuyuki Sugii; Kimiyoshi Usami; Hideharu Amano; Kazutoshi Kobayashi; Cong-Kha Pham; Hideki Makiyama; Yoshiki Yamamoto; Hirofumi Shinohara; Toshiaki Iwamatsu; Yasuo Yamaguchi; Hidekazu Oda; Takumi Hasegawa; Shinobu Okanishi; Hiroshi Yanagita; Shiro Kamohara; Masaru Kadoshima; Keiichi Maekawa; Tomohiro Yamashita; Duc-Hung Le; Takumu Yomogita; Masaru Kudo; Kuniaki Kitamori; Shuya Kondo; Yuuki Manzawa
2014 IEEE COOL CHIPS XVII, IEEE, 8, 2014, Peer-reviewed, A 32-bit CPU which operates with the lowest energy of 13.4 pJ/cycle at 0.35V and 14MHz, operates from 0.22V to 1.2V, and has a 0.14 μA sleep current is demonstrated. The low power performance is attained by Reverse-Body-Bias-Assisted 65nm SOTB CMOS (Silicon On Thin Buried oxide) technology. The CPU can operate for more than 100 years with a 610mAh Li battery.
International conference proceedings, English

Design of a Parallel CAM-based Multi-Match Search System Using 0.18-μm CMOS Process
Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
2014 IEEE FIFTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS (ICCE), IEEE, 336-339, 2014, Peer-reviewed, A novel data matching method has been proposed for a very fast and efficient search engine. This method is implemented on a 0.18-μm CMOS process. We take advantage of Content Addressable Memory (CAM), which has an ability of parallel multi-match mode, for designing the system. The system operates based on the CAMs for pattern matching in a parallel manner to return multiple addresses of multi-match results. This principle increases the search performance of the search system. Based on the parallel multi-match operations, the system can be applied to pattern matching or searching applications with various constraint query patterns without using any search principles.
International conference proceedings, English

A 4pA/Gate Sleep Current 65nm SOTB Logic Gates Using On-chip VBB Generator for Energy Harvesting Sensor Network Systems
Hiroki Nagatomi; Le Duc-Hung; Cong-Kha Pham; Nobuyuki Sugii; Shirou Kamohara; Toshiaki Iwamatsu; Koichiro Ishibashi
Proc. of The 2013 International Conference on Integrated Circuits, Design, and Verification (ICDV 2013), 42-45, Nov. 2013, Peer-reviewed
International conference proceedings, English

A Design of Differential Digital-Controlled Oscillator in a 0.18um CMOS Process
Trung-Khanh Le; Duc-Hung Le; Cong-Kha Pham; Trong-Tu Bui
Proc. of The 2013 International Conference on Integrated Circuits, Design, and Verification (ICDV 2013), 57-60, Nov. 2013, Peer-reviewed
International conference proceedings, English

Point-to-Point Real-time H.264 Video Streaming over IEEE 802.15.4
Wei-Chun Tung; Hong-Thang Nguyen; Minh-Triet Luu; Cao-Quyen Tran; Huu-Thuan Huynh; Cong-Kha Pham; Kenzo Ozaki
Proc. of The 2013 International Conference on Integrated Circuits, Design, and Verification (ICDV 2013), 182-187, Nov. 2013, Peer-reviewed
International conference proceedings, English

A Compact Improved TDES Cryptography Module for Wearable Medical Devices
Quang-Kien Trinh; Xuan-Tien Do; Van-Phuc Hoang; Thi-Thanh-Dung Phan; Cong-Kha Pham
Proc. of The 2013 International Conference on Integrated Circuits, Design, and Verification (ICDV 2013), 79-82, Nov. 2013, Peer-reviewed
International conference proceedings, English

Power Reduction Methodologies for High-Speed Flash ADC Using 180 nm CMOS Process
Thanh-Tri Vo; Duc-Hung Le; Cong-Kha Pham; Trong-Tu Bui
Proc. of The 2013 International Conference on Integrated Circuits, Design, and Verification (ICDV 2013), 46-51, Nov. 2013, Peer-reviewed
International conference proceedings, English

An ASIC Implementation of 16-bit Fixed-Point Digital Signal Processor
Xuan-Thuan Nguyen; Duc-Hung Le; Cong-Kha Pham; Trong-Tu Bui; Huu-Thuan Huynh
Proc. of The International Conference on Advanced Computing and Applications (ACOMP), **-**, Oct. 2013, Peer-reviewed
International conference proceedings, English

An ASIC Implementation of 16-Bit Fixed-Point Digital Signal Processor
Xuan-Thuan NGUYEN; Trong-Tu BUI; Huu-Thuan HUYNH; Cong-Kha PHAM; Duc-Hung LE
Journal of Science and Technology, 51, 4B, 282-289, Oct. 2013, Peer-reviewed
Scientific journal, English

Design a Fast CAM-Based Exact Pattern Matching System on FPGA and 0.18 μm CMOS Process
Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E96A, 9, 1883-1888, Sep. 2013, Peer-reviewed, A CAM-based matching system for fast exact pattern matching is implemented on a hardware system with FPGA and ASIC. The system has a simple structure, and does not employ any Central Processor Unit (CPU) or complicated computations. We take advantage of Content Addressable Memory (CAM), which has an ability of parallel multi-match mode, for designing the system. The system is applied to fast pattern matching with various required search patterns without using search principles. In this paper, the authors present a CAM-based system for fast exact pattern matching on 2-D data.
Scientific journal, English
DOI URL

A fast CAM-based image matching system on FPGA
Duc-Hung Le; Tran Bao Thuong Cao; Katsumi Inoue; Cong-Kha Pham
Proceedings - IEEE International Symposium on Circuits and Systems, 1797-1800, 2013, Peer-reviewed, A CAM-based (Content Addressable Memory) image matching system is implemented on a hardware system using FPGA. The system has a simple structure and does not employ any Central Processor Units (CPUs) or complicated computations. The authors take advantage of CAM, which has an ability of parallel multi-match mode, for designing the system. This increases the matching performance of the system. The system is applied to exact image matching or approximate image matching with various required search patterns without using search principles. In this paper, the authors present the system for fast image matching applications on 2-D data. © 2013 IEEE.
International conference proceedings, English
DOI URL

Low complexity logarithmic and anti-logarithmic converters for hybrid number system processors and DSP applications
Van-Phuc Hoang; Cong-Kha Pham
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Institute of Electronics, Information and Communication Engineers, IEICE, E96-A, 2, 584-590, 2013, Peer-reviewed, This paper presents an efficient approach for logarithmic and anti-logarithmic converters which can be used in the arithmetic unit of hybrid number system processors and logarithm/exponent function generators in DSP applications. By employing the novel quasi-symmetrical difference method with only simple shift-add logic and a look-up table, the proposed approach can reduce the hardware area and improve the conversion speed significantly while achieving similar accuracy compared with the previous methods. The implementation results in both FPGA and 0.18-μm CMOS technology are also presented and discussed. Copyright © 2013 The Institute of Electronics, Information and Communication Engineers.
Scientific journal, English
DOI URL

A Fast CAM-based Image Matching System on FPGA
Duc-Hung Le; Tran Bao Thuong Cao; Katsumi Inoue; Cong-Kha Pham
2013 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 1797-1800, 2013, Peer-reviewed, A CAM-based (Content Addressable Memory) image matching system is implemented on a hardware system using FPGA. The system has a simple structure and does not employ any Central Processor Units (CPUs) or complicated computations. The authors take advantage of CAM, which has an ability of parallel multi-match mode, for designing the system. This increases the matching performance of the system. The system is applied to exact image matching or approximate image matching with various required search patterns without using search principles. In this paper, the authors present the system for fast image matching applications on 2-D data.
International conference proceedings, English

A Fast CAM-based Watermarking Extraction on FPGA
Duc-Hung Le; Tran-Bao-Thuong Cao; Katsumi Inoue; Cong-Kha Pham
2013 INTERNATIONAL CONFERENCE ON IC DESIGN AND TECHNOLOGY (ICICDT), IEEE, 207-210, 2013, Peer-reviewed, A CAM-based (Content Addressable Memory) system for fast Watermarking extraction has been proposed and implemented on a hardware system using an FPGA device. The system has a simple structure and does not employ any Central Processor Units (CPUs) or complicated algorithms for extracting hidden information. The authors take advantage of CAM, which has an ability of parallel multi-match mode, for designing the system. This increases the decryption performance of the Watermarking extraction system. The proposed system is applied to fast Watermarking extraction on 2-dimension (2-D) data.
International conference proceedings, English

Design a Fast CAM-based Information Detection System on FPGA and 0.18 μm ASIC Technology
Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
2013 IEEE INTERNATIONAL CONFERENCE OF ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC), IEEE, 1-2, 2013, Peer-reviewed, A novel information detection method has been proposed for a fast and efficient search engine. This method is implemented on a hardware system, first using FPGA for functional verification and then on ASIC using 0.18 μm CMOS technology. We take advantage of Content Addressable Memory (CAM), which has an ability of parallel multi-match mode, for designing the system. The system operates based on CAM blocks for pattern matching in a parallel manner to return multiple addresses of multi-match results. Based on the parallel multi-match operations, the system can be applied to pattern matching or searching applications with various constraint search patterns without using any search principles. This increases the matching performance of the search systems.
International conference proceedings, English

A CAM-based Information Detection Hardware System for fast exact pattern matching
Duc-Hung Le; Tran-Bao-Thuong Cao; Katsumi Inoue; Cong-Kha Pham
Midwest Symposium on Circuits and Systems, 848-851, 2013, Peer-reviewed, A CAM-based Information Detection Hardware System for fast exact pattern matching is implemented on a hardware system with FPGA and ASIC. The system has a simple structure and does not employ any Central Processor Unit (CPU) or complicated algorithms. We take advantage of Content Addressable Memory (CAM), which has an ability of parallel multi-match mode, for designing the system. The system is applied to fast pattern matching with various required search patterns without using any search principles. In this paper, the authors present the system for exact pattern matching on 2-D data. © 2013 IEEE.
International conference proceedings, English
DOI URL

An Efficient ASIC Implementation of Logarithm Approximation for HDR Image Processing
Van-Phuc Hoang; Xuan-Tien Do; Cong-Kha Pham
2013 INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR COMMUNICATIONS (ATC), IEEE, 535-539, 2013, Peer-reviewed, This paper presents an efficient ASIC implementation for the hardware approximation of the logarithm function which can be used for emerging high dynamic range image processing applications. By employing a new logarithm approximation method, a modified barrel shifter circuit and an optimized leading one detector and encoder, the proposed approach can reduce the hardware area and improve the logarithm computation speed significantly while achieving similar accuracy compared with other methods. The implementation results in 0.18-μm CMOS technology are also presented and discussed.
International conference proceedings, English

A 44 μW/10MHz Minimum Power Operation of 50K Logic Gate using 65nm SOTB Devices with Back Gate Control
Shotaro Morohashi; Nobuyuki Sugii; Toshiaki Iwamatsu; Shiro Kamohara; Yudai Kato; Cong-Kha Pham; Koichiro Ishibashi
2013 IEEE SOI-3D-SUBTHRESHOLD MICROELECTRONICS TECHNOLOGY UNIFIED CONFERENCE (S3S), IEEE, **-**, 2013, Peer-reviewed, Performance, leakage and E-min on 65-nm SOTB and bulk were compared. We evaluated ring oscillators for SOTB and bulk with the same layout pattern. It is shown that the operation frequency can be controlled from 6MHz to 40MHz, and that leakage in sleep mode can be decreased by 3 orders of magnitude on SOTB. By applying adjustable body bias and supply voltage depending on frequency, the energy of a 50k-gate CMOS logic circuit can be minimized to 4.4pJ/Hz, which corresponds to 44 μW at 10MHz. Leakage of the logic gates can be reduced to 4.2nA in sleep mode.
International conference proceedings, English

An FPGA-Based Information Detection Hardware System Employing Multi-Match Content Addressable Memory
Duc-Hung Le; Katsumi Inoue; Masahiro Sowa; Cong-Kha Pham
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E95A, 10, 1708-1717, Oct. 2012, Peer-reviewed, A new information detection method has been proposed for a very fast and efficient search engine. This method is implemented on a hardware system using FPGA. We take advantage of Content Addressable Memory (CAM), which has an ability of matching mode, for designing the system. The CAM blocks have been designed using available memory blocks of the FPGA device to save access times of the whole system. The entire memory can return multi-match results concurrently. The system operates based on the CAMs for pattern matching, in a parallel manner, to output multiple addresses of multi-match results. Based on the parallel multi-match operations, the system can be applied to pattern matching with various required constraint conditions without using any search principles. The very fast multi-match results are achieved in 60 ns at an operation frequency of 50 MHz. This increases the search performance of the information detection system which uses this method as the core system.
Scientific journal, English
DOI URL

A CAM-based Information Detection Hardware System for fast pattern matching on FPGA
Duc Hung Le; Tran Bao Thuong Cao; Katsumi Inoue; Cong Kha Pham
Proc. of 2nd Solid-State Systems Symposium & VLSI & Related Technologies (4S-2012), 223-226, Aug. 2012, Peer-reviewed
International conference proceedings, English

A Design of 16-bit Pi-Type DAC Employing Three-Stage Indirect Feedback Compensation Opamp
Trung-Khanh LE; Trong-Tu BUI; Duc-Hung LE; Cong-Kha PHAM
Proc. of 3rd ICICE International Conference on Integrated Circuits and Devices in Vietnam ICDV 2012, 64-68, Aug. 2012, Peer-reviewed
International conference proceedings, English

A PCIe-based FFT Implementation for High-speed Spectrum Analysis
Xuan-Thuan NGUYEN; QM-Dang-Do; Huu-Thuan HUYNH; Cong-Kha PHAM
Proc. of 3rd ICICE International Conference on Integrated Circuits and Devices in Vietnam ICDV 2012, 126-131, Aug. 2012, Peer-reviewed
International conference proceedings, English

A Design of Three-Stage CMOS Opamp Using Indirect Feedback Compensation Technique
Trung-Khanh LE; Trong-Tu BUI; Duc-Hung LE; Cong-Kha PHAM
Proc. of 2nd Solid-State Systems Symposium & VLSI & Related Technologies (4S-2012), 153-156, Aug. 2012, Peer-reviewed
International conference proceedings, English

The New Structure of Time-to-Digital Converter (TDC) - Multi Diagonal Vernier based TDC
Phu-Quoc NGUYEN; Cong-Kha PHAM
Proc. of 2nd Solid-State Systems Symposium & VLSI & Related Technologies (4S-2012), 119-122, Aug. 2012, Peer-reviewed
International conference proceedings, English

An Improved Hybrid LUT-Based Architecture for Low-Error and Efficient Fixed-Width Squarer
Van-Phuc Hoang; Cong-Kha Pham
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E95A, 7, 1180-1184, Jul. 2012, Peer-reviewed, In this paper, an improved hybrid LUT-based architecture for low-error and efficient fixed-width squarer circuits is presented, in which LUT-based and conventional logic circuits are employed together to achieve a good trade-off between hardware complexity and performance. By exploiting the mathematical identities and hybrid architecture, the mean error and mean square error of the proposed squarer are reduced by up to 40% compared with the best previous method presented in the literature. Moreover, the proposed method can improve the speed and reduce the area of the squarer circuit. The implementation and chip measurement results in 0.18-μm CMOS technology are also presented and discussed.
Scientific journal, English
DOI URL

Efficient LUT-Based Truncated Multiplier and Its Application in RGB to YCbCr Color Space Conversion
Van-Phuc Hoang; Cong-Kha Pham
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E95A, 6, 999-1006, Jun. 2012, Peer-reviewed, High performance, low area multipliers are highly desired for modern and future DSP systems due to the increasing demand for high speed DSP applications. In this paper, we present an efficient architecture for an LUT-based truncated multiplier and its application in RGB to YCbCr color space conversion which can be used for digital TV, image and video processing systems. By employing an improved split LUT-based architecture and LUT optimization method, the proposed multiplier can reduce the value of area-delay product by up to 52% compared with other constant multiplier methods. The FPGA implementation of a color space conversion application employing the proposed multiplier also results in significant reduction of area-delay product of up to 48%.
Scientific journal, English
DOI URL

Novel Quasi-Symmetrical Approach for Efficient Logarithmic and Anti-logarithmic Converters
Van-Phuc HOANG; Cong-Kha PHAM
Proc. of IEEE 8th Conference on Ph.D. Research in Microelectronics & Electronics (PRIME2012), 111-114, Jun. 2012, Peer-reviewed
International conference proceedings, English

A novel Information Detection Hardware System
Duc-Hung Le; Cong-Kha PHAM
Proc. of IEEE 8th Conference on Ph.D. Research in Microelectronics & Electronics (PRIME2012), 123-126, Jun. 2012, Peer-reviewed
International conference proceedings, English

Low-Area, High-Speed Logarithmic and Anti-logarithmic Converters for Digital Signal Processors Based on Hybrid Number System
Van-Phuc HOANG; Cong-Kha PHAM
Proc. of IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips XV), 8, Apr. 2012, Peer-reviewed
International conference proceedings, English

Low-Error and Efficient Fixed-Width Squarer for Digital Signal Processing Applications
Van-Phuc Hoang; Cong-Kha Pham
2012 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS (ICCE), IEEE COMPUTER SOC, 477-482, 2012, Peer-reviewed, This paper presents a new approach of using the improved hybrid LUT-based architecture for low-error and efficient fixed-width squarer circuits. By employing both LUT-based and simple conventional logic circuits, a good trade-off between hardware complexity and performance can be achieved. Moreover, the mathematical identity of the squaring operation is exploited so that the error can be reduced significantly compared with other methods. The proposed method can also improve the speed and reduce the area of the squarer circuit. The implementation and chip measurement results in 0.18-μm CMOS technology are also presented and discussed.
International conference proceedings, English

A Fully-Parallel Information Detection Hardware System Employing Content Addressable Memory
Duc-Hung Le; Masahiro Sowa; Cong-Kha Pham; Katsumi Inoue
2012 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS (ICCE), IEEE COMPUTER SOC, 447-452, 2012, Peer-reviewed, A new information detection method has been proposed for a very fast and efficient search engine. This method is implemented on a hardware system using FPGA. We take advantage of Content Addressable Memory (CAM), which has an ability of searching and matching mode, for designing the system. The CAM blocks have been designed using available memory blocks of the FPGA device to save access times of the whole system. The entire memory can return multi-matched results concurrently. The system operates based on the CAMs for pattern matching in a parallel manner to return multiple addresses of multi-matched results. Based on the parallel multi-matching operations, the system can be applied to pattern matching with various required constraint conditions without using any search principles. The very fast multi-matched results are achieved in 60 ns at an operational frequency of 50 MHz. This increases the matching performance of the information detection system which uses this method as the core system.
International conference proceedings, English

Parameter extraction and optimization using Levenberg-Marquardt algorithm
Le Duc-Hung; Pham Cong-Kha; Nguyen Thi Thien Trang; Bui Trong Tu
2012 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS (ICCE), IEEE COMPUTER SOC, 434-437, 2012, Peer-reviewed, Parameter extraction is an important part of model development. The goal of parameter extraction and optimization is to determine the values of device model parameters that minimize the differences between a set of measured characteristics and the results obtained by evaluating the device model. This minimization process is often called fitting of model characteristics to the measurement data.
The objective of this paper is to present an extraction and optimization method, using the Levenberg-Marquardt algorithm, for a set of electrical parameters of the EKV MOSFET Model. The Levenberg-Marquardt algorithm is an efficient and popular damped least squares technique. This algorithm is a combination of the steepest gradient descent and the Gauss-Newton algorithms. All implementations are carried out in the MATLAB environment.
International conference proceedings, English
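
As an illustration of the damped least-squares idea described in the abstract above, the following is a minimal sketch in Python, not the authors' MATLAB implementation. The two-parameter model `a * (1 - exp(-b * x))` is a hypothetical stand-in for a device characteristic (it is not the EKV model), and the function names, the initial damping value and the factor-of-ten damping schedule are all assumptions chosen for illustration.

```python
import math


def model(params, x):
    """Hypothetical toy model y = a * (1 - exp(-b * x)); not the EKV equations."""
    a, b = params
    return a * (1.0 - math.exp(-b * x))


def jacobian_row(params, x):
    """Partial derivatives (dy/da, dy/db) of the toy model at x."""
    a, b = params
    e = math.exp(-b * x)
    return 1.0 - e, a * x * e


def sum_sq(params, xs, ys):
    """Sum of squared differences between measured and modelled values."""
    return sum((y - model(params, x)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, params, lam=1e-2, iters=100):
    """Fit the two model parameters with a basic Levenberg-Marquardt loop."""
    params = tuple(params)
    cost = sum_sq(params, xs, ys)
    for _ in range(iters):
        # Accumulate J^T J (2x2, symmetric) and J^T r (2x1).
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            j1, j2 = jacobian_row(params, x)
            r = y - model(params, x)
            h11 += j1 * j1
            h12 += j1 * j2
            h22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Solve the damped normal equations (J^T J + lam * I) d = J^T r.
        a11, a22 = h11 + lam, h22 + lam
        det = a11 * a22 - h12 * h12
        d1 = (a22 * g1 - h12 * g2) / det
        d2 = (a11 * g2 - h12 * g1) / det
        trial = (params[0] + d1, params[1] + d2)
        try:
            trial_cost = sum_sq(trial, xs, ys)
        except OverflowError:
            trial_cost = float("inf")  # treat a runaway step as rejected
        if trial_cost < cost:
            params, cost = trial, trial_cost
            lam *= 0.1   # small damping: behaves like Gauss-Newton
        else:
            lam *= 10.0  # large damping: behaves like a short gradient step
    return params, cost


# Synthetic "measurements" generated from known parameters (a=2.0, b=0.7).
xs = [0.5 * i for i in range(1, 11)]
ys = [model((2.0, 0.7), x) for x in xs]
fitted, residual = levenberg_marquardt(xs, ys, (1.0, 1.0))
```

The damping term `lam` is what blends the two algorithms named in the abstract: accepted steps shrink it toward a Gauss-Newton update, and rejected steps grow it toward a small gradient-descent step.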

A Linearity Optimization Method for CMOS R-2R Ladder Network
Yuta Kato; Cong-Kha Pham
Proc. of 2011 IEEJ International Analog VLSI Workshop, **-**, Nov. 2011, Peer-reviewed
International conference proceedings, English

A Constant-gm Rail-to-Rail Operational Amplifier with Low-gain Variation and It's Analysis
Nobuyuki Yokoyama; Cong-Kha Pham
Proc. of 2011 IEEJ International Analog VLSI Workshop, **-**, Nov. 2011, Peer-reviewed
International conference proceedings, English

An SoPC for Real-Time Motion Detection Using Spatial-Temporal Entropy
Thuan NGUYEN; Thuan HUYNH; Cong-Kha PHAM
Proc. of Integrated Circuits and Devices in Vietnam (ICDV 2011), 43-48, Aug. 2011, Peer-reviewed
International conference proceedings, English

Efficient LUT-based multiplier and squarer for DSP applications
Van-Phuc HOANG; Cong-Kha PHAM
Proc. of Integrated Circuits and Devices in Vietnam (ICDV 2011), 148-153, Aug. 2011, Peer-reviewed
International conference proceedings, English

Implementation of Search-Less Information Detection based on Content Addressable Memory on FPGA
Duc-Hung LE; Katsumi INOUE; Masahiro SOWA; Cong-Kha PHAM
Proc. of Integrated Circuits and Devices in Vietnam (ICDV 2011), 166-171, Aug. 2011, Peer-reviewed
International conference proceedings, English

Parameter Extraction and Optimization using Levenberg-Marquardt and Genetic Algorithm
Duc-Hung LE; Cong-Kha PHAM; Thi Thien Trang NGUYEN; Trong-Tu Bui
Proc. of Triangle Symposium on Advanced ICT 2011, 54-58, Aug. 2011, Peer-reviewed
International conference proceedings, English

An Improved Linear Difference Method with High ROM Compression Ratio in Direct Digital Frequency Synthesizer
Van-Phuc Hoang; Cong-Kha Pham
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E94A, 3, 995-998, Mar. 2011, Peer-reviewed, The increasing demand for low-power Direct Digital Frequency Synthesizers (DDFS) requires efficient compression methods that reduce the ROM size needed to store sine function values. This paper presents a technique that achieves a very high compression ratio by using an optimized four-segment linear difference method. The proposed technique results in a ROM compression ratio of about 117.3:1 and a word size reduction of 6 bits for the design of a DDFS with 11-bit sine amplitude output. This high compression ratio is very promising for meeting the requirements of low power consumption and low hardware complexity in digital VLSI technology.
Scientific journal, English
DOI URL

Low error, efficient fixed width squarer using hybrid LUT-based architecture
Van-Phuc Hoang; Cong-Kha Pham
Lecture Notes in Electrical Engineering, 134, 223-230, 2011, Peer-reviewed, This paper presents the design of a low-error, efficient fixed-width squarer employing a hybrid LUT-based architecture that can be used for future DSP applications. By applying mathematical identities and the hybrid architecture, the mean error and mean square error of the proposed squarer are reduced by up to 40% compared with the best previous method presented in the literature. Moreover, the proposed method can improve the speed and reduce the area of squaring circuits. The implementation results for both FPGA hardware and 0.18-μm CMOS technology are also reported and discussed. © 2011 Springer-Verlag.
International conference proceedings, English
DOI URL

A Novel Soft-Start Control Circuit for Current-Mode DC-DC Converter
Kimio Shibata; Cong-Kha Pham
Proc. of 2010 International Conference on Solid State Device and Materials, 343-344, Sep. 2010, Peer-reviewed
International conference proceedings, English

Improved linear difference method for sine ROM compression in Direct Digital Frequency Synthesizer
Van-Phuc Hoang; Cong-Kha Pham
Proc. of 1st Solid-State Systems Symposium – VLSI & Related Technologies (4S-2010), 192-195, Jun. 2010, Peer-reviewed
International conference proceedings, English

A Low-Power High-PSRR Low-Dropout Regulator With Bulk-Gate Controlled Circuit
Socheat Heng; Cong-Kha Pham
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 57, 4, 245-249, Apr. 2010, Peer-reviewed, In this brief, we presented a bulk-gate controlled circuit for improving the power supply rejection ratio (PSRR) of a low-dropout voltage regulator (LDO), which deteriorates as power consumption is lowered. A test chip was fabricated using a 0.18-μm complementary metal-oxide-semiconductor process, and experimental results demonstrated that the proposed circuit improves the PSRR to 77 dB at 10 Hz and 64.3 dB at 1 kHz, while the current consumption of the whole LDO with all component circuits was 8.5 μA without a load and 35 μA with a full load.
Scientific journal, English
DOI URL

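A Current-Mode DC-DC Converter Using a High-Speed Soft-Start Control Circuit
Kimio Shibata; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J93-A, 2, 127-135, Feb. 2010, Peer-reviewed, Portable electronic devices are required to be small and lightweight and to extend the operating time of their batteries. Because sleep and standby modes suppress unnecessary power consumption, frequent switching between operating modes is effective for extending battery operating time. However, a switching power supply produces a large input inrush current and an output voltage overshoot at start-up, so a soft-start circuit takes a time on the order of milliseconds to stabilize the voltage. Input inrush current and output voltage overshoot in the power supply circuit not only prevent battery life from being extended but also impair the reliability of inductors and electronic components. This paper proposes a control circuit for a current-mode PWM DC-DC step-down converter that enables fast power on/off. Simulation results confirm that the proposed high-speed soft-start control circuit reduces inrush current and overshoot and achieves a soft-start time of about 150 μs from no load to the maximum load current, independent of input voltage, operating temperature and similar conditions. This corresponds to 1/50 of 7.5 ms, a typical soft-start time for conventional circuits.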
Scientific journal, Japanese
URL

A Compact Adaptive Slope Compensation Circuit for Current-Mode DC-DC Converters
Kimio Shibata; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), J93-A, 1, 27-30, Jan. 2010, Peer-reviewed
Scientific journal, Japanese

A Compact Adaptive Slope Compensation Circuit for Current-Mode DC-DC Converter
Kimio Shibata; Cong-Kha Pham
2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, IEEE, 1651-1654, 2010, Peer-reviewed, In this paper, an adaptive slope compensation circuit that operates at low power consumption with a low component count is proposed. Sub-harmonic oscillation is a well-known problem in Current-Mode DC-DC converters, and the proposed adaptive slope compensation circuit solves it. The circuit automatically adjusts the slope compensation ramp according to the output voltage. The proposed circuit has been implemented in a Current-Mode DC-DC converter that operates at a switching frequency of 1.2 MHz, and was simulated in HSPICE using standard 0.5 μm CMOS parameters. The proposed circuit, which is composed of 15 components and consumes only 10 μA, eliminates the sub-harmonic oscillation problem.
International conference proceedings, English

A DC-DC Converter Using A High Speed Soft-Start Control Circuit
Kimio Shibata; Cong-Kha Pham
2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, IEEE, 833-836, 2010, Peer-reviewed, In this paper, a high-speed soft-start control circuit is proposed for the Current-Mode DC-DC Converter. The soft-start time does not depend on the load condition from no load to the maximum load current. The proposed circuit consists of SS(RAMP), VREF(RAMP) and a small differential-voltage generator. The SS(RAMP) is a piecewise linear voltage that has four different ramps. The VREF(RAMP) is a linear ramp voltage. The HSPICE simulation results show that a Current-Mode DC-DC Converter using the proposed high-speed soft-start control circuit achieves a soft-start time of 150 μs, which is 1/50 of the roughly 7.5 ms of a commonly available high-performance converter.
International conference proceedings, English

A Wide Frequency Range and Adjustable Duty Cycle CMOS Ring Voltage Controlled Oscillator
Minh-Hai Nguyen; Cong-Kha Pham
2010 THIRD INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS (ICCE), IEEE, 107-109, 2010, Peer-reviewed, This paper presents a voltage controlled ring oscillator (VCO) with a wide tuning range and adjustable duty cycle. The circuit was designed using Rohm 0.18 μm technology with a 1.5 V supply voltage. The simulation results show that the VCO oscillates from 300 Hz up to 1.4 GHz, while the duty cycle can be adjusted from 20% to 80% independently of the oscillating frequency.
International conference proceedings, English

A Low-Power High Accuracy Over Current Protection Circuit for Low Dropout Regulator
Socheat Heng; Cong-Kha Pham
IEICE TRANSACTIONS ON ELECTRONICS, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E92C, 9, 1208-1214, Sep. 2009, Peer-reviewed, In this paper, a low power current protection circuit implemented in a low dropout regulator (LDO) is presented. The proposed circuit, designed in a 0.35 μm CMOS process, provides a precise limiting current as well as holding current with low dependency on both supply voltage and regulator output voltage. The experimental results showed that the proposed circuit is operable in the regulator output voltage range from VOUT = 1.2 V to VOUT = 3.6 V and supply voltage range from VDD = VOUT + 0.5 V to VDD = 5.6 V. Since the proposed circuit is composed of a few simple basic circuits such as a comparator and a Schmitt Trigger, it has a low current consumption of less than ISS = 0.82 μA at a load current of ILOAD = 200 mA. This makes the circuit suitable for low power and low voltage LDO design.
Scientific journal, English
DOI URL

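A Quick Response Circuit for Improving the Load Transient Response of Low-Power Series Regulators
Socheat Heng; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J92-A, 7, 470-476, Jul. 2009, Peer-reviewed, This paper proposes a Quick Response Circuit (QRC) that allows a series regulator (hereafter, regulator) to respond quickly at low power consumption. Evaluation of a chip prototyped in a 0.18 μm CMOS process confirmed that, for a regulator incorporating the QRC with an output voltage of 1.2 V and an output stabilization capacitance of 1 μF, the drop and rise of the output voltage VOUT are held to 196 mV and 172 mV or less, respectively, even when the load current IOUT changes abruptly between 0.1 mA and 150 mA within 0.5 μs. Under these conditions, the current consumption of the entire regulator circuit, including the reference voltage circuit, the overcurrent protection circuit and the divider resistors, was only 8.5 μA at light load and only 35 μA at heavy load.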
Scientific journal, Japanese
URL

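Configuration of an Inrush Current Limiting Circuit Enabling Fast Start-up of Series Regulators
Socheat Heng; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J92-A, 7, 521-523, Jul. 2009, Peer-reviewed, This paper proposes an inrush current limiting circuit for LDOs to solve the problem of the large inrush current that occurs when multiple series regulators (hereafter, LDOs) built into power management ICs and similar devices start up simultaneously. The circuit was designed in a 0.18 μm CMOS process and simulated with HSPICE. The maximum inrush current was suppressed to 144.1 mA or less even with an LDO output capacitor of 10 μF, and the maximum start-up time of the output voltage is within only 313 μs. Unlike the conventional method of controlling the slope of the start-up characteristic of the reference voltage, directly controlling the gate voltage of the power MOSFET eliminates the need for an external soft-start capacitor, which yields a small-area, low-cost power supply system. The current consumption of the proposed circuit is only 4 μA.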
Scientific journal, Japanese
URL

A low-power high accuracy over current protection circuit for low dropout regulator
Socheat Heng; Cong-Kha Pham
IEICE Transactions on Electronics, Institute of Electronics, Information and Communication Engineers (IEICE), E92-C, 9, 1208-1214, 2009, Peer-reviewed, In this paper, a low power current protection circuit implemented in a low dropout regulator (LDO) is presented. The proposed circuit, designed in a 0.35 μm CMOS process, provides a precise limiting current as well as holding current with low dependency on both supply voltage and regulator output voltage. The experimental results showed that the proposed circuit is operable in the regulator output voltage range from VOUT = 1.2 V to VOUT = 3.6 V and supply voltage range from VDD = VOUT + 0.5 V to VDD = 5.6 V. Since the proposed circuit is composed of a few simple basic circuits such as a comparator and a Schmitt Trigger, it has a low current consumption of less than ISS = 0.82 μA at a load current of ILOAD = 200 mA. This makes the circuit suitable for low power and low voltage LDO design. Copyright © 2009 The Institute of Electronics, Information and Communication Engineers.
Scientific journal, English
DOI URL

New Design Method of Low Power Over Current Protection Circuit for Low Dropout Regulator
Socheat Heng; Weichun Tung; Cong-Kha Pham
2009 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), PROCEEDINGS OF TECHNICAL PROGRAM, IEEE, 47-+, 2009, Peer-reviewed, In this paper, a low power current protection circuit implemented in LDOs is presented. The proposed circuit, designed in a 0.35 μm CMOS process, provides a precise limiting current as well as holding current with low dependency on both supply voltage and regulator output voltage. The experimental results showed that the proposed circuit is operable in the regulator output voltage range VOUT = 1.2 V to VOUT = 3.6 V and supply voltage range VDD = VOUT + 0.5 V to VDD = 5.6 V. Since the proposed circuit is composed of a few simple basic circuits such as a comparator and a Schmitt Trigger, it has a low current consumption of less than ISS = 0.82 μA at a load current of ILOAD = 200 mA. This makes the circuit suitable for low power and low voltage LDO design.
International conference proceedings, English

Improvement of LDO's PSRR Deteriorated By Reducing Power Consumption : Implementation and Experimental Results
Socheat Heng; Cong-Kha Pham
2009 IEEE INTERNATIONAL CONFERENCE ON INTEGRATED CIRCUIT DESIGN AND TECHNOLOGY, PROCEEDINGS, IEEE, 11-15, 2009, Peer-reviewed, In this work, a Bulk-Gate Controlled Circuit is proposed for improving the power supply rejection ratio (PSRR) of a Low Dropout Voltage Regulator (LDO), which deteriorates as power consumption is lowered. A test chip was fabricated using a 0.18-μm CMOS process. Experimental results of the test chip demonstrate that the proposed circuit provides a high PSRR of up to 77 dB at 10 Hz and 64.3 dB at 1 kHz, while the current consumption of the whole LDO, which includes the currents of all component circuits such as a reference circuit, an over current protection circuit, etc., is reduced to 8.5 μA without load and 35 μA with full load. Compared with the basic type of conventional LDO, the PSRR of the proposed Bulk-Gate Controlled LDO improves by 16 dB at 10 Hz and 27.8 dB at 1 kHz.
International conference proceedings, English

Low Power LDO with Fast Load Transient Response Based on Quick Response Circuit
Socheat Heng; Weichun Tung; Cong-Kha Pham
ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5, IEEE, 2529-+, 2009, Peer-reviewed, In this work, we propose a design technique for a low-power, fully CMOS low-dropout voltage regulator (LDO) based on a quick response (QR) circuit to improve the load transient response. Implemented in 0.18 μm CMOS technology, the LDO with the proposed QR circuit achieves fast load transient responses with less transient overshoot or undershoot when driving a large load current. For a 1 μF decoupling capacitor and a 0.1 mA-150 mA load current change, the output undershoot and overshoot are 196 mV and 172 mV, while the settling times are approximately 60 μs and 65 μs, respectively. The proposed circuit dissipates very low static power, drawing only 8.5 μA at light load and 35 μA at heavy load for output voltage VOUT = 1.2 V and input voltage VDD = VOUT + 1.0 V. This includes the reference circuit, the over current protection circuit and the feedback network.
International conference proceedings, English

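A High-Speed, Low-Power Full Adder
原田津; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J91-A, 9, 915-918, Sep. 2008, Peer-reviewed, This paper proposes a 10-transistor full adder using an XNOR circuit with full-swing output. Its performance was evaluated by pre-layout and post-layout HSPICE simulations using a 0.18 μm CMOS process. Compared with conventional full adders, the proposed full adder greatly improves both delay and power consumption.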
Scientific journal, Japanese
URL

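A High-Accuracy Active Voltage Divider Circuit
Kimio Shibata; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J91-A, 9, 919-922, Sep. 2008, Peer-reviewed, This paper proposes an active voltage divider circuit that has high input impedance and generates a high-accuracy differential voltage with no dependence on supply voltage, temperature, manufacturing process and similar factors. The circuit was evaluated by SPICE simulation using a 0.5 μm CMOS process. The output obtained by dividing a 1 Vpp input voltage by 100 had an accuracy of 10 mVpp ± 1 mVpp or better, corresponding to an error of 0.1% or less.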
Scientific journal, Japanese
URL

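Improvement of the Degradation of Power Supply Ripple Rejection Ratio (PSRR) Caused by Reducing the Power Consumption of Series Regulators
Socheat Heng; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J91-A, 4, 535-537, Apr. 2008, Peer-reviewed, This paper proposes a low-power, high-ripple-rejection series regulator configuration that uses a substrate bias control circuit to improve the degradation of the power supply ripple rejection ratio (PSRR) caused by reducing power consumption. The circuit was designed in a 0.25 μm CMOS process, and HSPICE simulation confirmed that, although power consumption was reduced to 1/10 of the conventional value, the PSRR characteristic improved by up to 40 dB compared with the conventional circuit.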
Scientific journal, Japanese
URL

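A Compensation Circuit for Stable Load Regulation of Series Regulators Considering Bonding Wire Resistance
Socheat Heng; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J91-A, 1, 172-175, Jan. 2008, Peer-reviewed, This paper proposes a compensation circuit that improves the load regulation of a series regulator by taking the resistance of the bonding wire into account. HSPICE simulation results confirm that the regulator output voltage is kept within a 0.5% variation even when the load changes.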
Scientific journal, Japanese
URL

Improvement of power supply rejection ratio of LDO deteriorated by reducing power consumption
Socheat Heng; Cong-Kha Pham
2008 IEEE INTERNATIONAL CONFERENCE ON INTEGRATED CIRCUIT DESIGN AND TECHNOLOGY, PROCEEDINGS, IEEE, 43-46, 2008, Peer-reviewed, In this work, a bulk-gate controlled circuit is proposed to improve the power supply rejection ratio (PSRR) of a Low Dropout Regulator (LDO), which deteriorates as power consumption is lowered. The circuit was designed in a 0.25 μm CMOS process, and HSPICE simulation results show that the proposed circuit provides a high PSRR even though the power consumption is reduced to 1/10 of that of the conventional circuit. It is confirmed that the PSRR improves by about 40 dB at 10 Hz and 20 dB at 1 kHz.
International conference proceedings, English

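A Compact Hamming Weight Comparator Circuit
Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J90-A, 10, 762-766, Oct. 2007, Peer-reviewed, A compact circuit is proposed that compares the Hamming weights of sequences consisting of "0"s and "1"s. A 64-bit Hamming weight comparator can be built with only 266 transistors, and HSPICE simulation results show a maximum delay of 4.5 ns when using the same 0.8 μm CMOS process as the conventional circuit.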
Scientific journal, Japanese
URL

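A High-Speed, Small-Scale Digital Comparator Based on Optimized Design
Cong-Kha Pham; 高橋俊太郎
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J90-A, 9, 727-730, Sep. 2007, Peer-reviewed, This paper describes a 64-bit digital comparator in which the comparison results of the individual digits are connected in a tree-like structure, improved by optimizing stage 0 and by gate sizing, buffer (inverter) insertion and similar measures. Improvements in circuit size and delay over conventional digital comparators were confirmed.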
Scientific journal, Japanese
URL

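An Overcurrent Protection Circuit for Low-Power Series Regulators
Socheat Heng; 清水麻里江; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J90-A, 7, 619-621, Jul. 2007, Peer-reviewed, An overcurrent protection circuit for series regulators (hereafter, regulators) that uses simple basic analog circuits is proposed. The concept of the proposed circuit yields stable limiting and holding currents with little dependence on the regulator's input and output voltages. Because the circuit configuration is simple, a low current consumption of about 1.2 μA and a small area of 0.0079 mm² can be achieved. Post-layout simulation of a layout designed in the ROHM 0.35 μm process confirmed that the circuit is usable over a regulator output voltage VOUT range of 1.2 V to 3.6 V and a supply voltage VDD range of VOUT + 0.5 V to 6.0 V.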
Scientific journal, Japanese
URL

A Compact Hamming Distance Detector
Fu Qu; Cong-Kha Pham
Proc. of 2007 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'07), 37-40, Mar. 2007, Peer-reviewed
International conference proceedings, English

Rank Order Filter Using Analog Hamming Comparator
Tei Ko; Cong-Kha Pham
Proc. of 2007 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'07), 105-108, Mar. 2007, Peer-reviewed
International conference proceedings, English

Image Encryption Method Using Chaotic System Having Dynamic Initial Condition
Do Toan; Cong-Kha Pham
Proc. of 2007 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'07), 245-248, Mar. 2007, Peer-reviewed
International conference proceedings, English

Solving Large N-Queen Problem with a Maximum Neuron Model by Canceling Diagonal Competition
Cong-Kha Pham; Wataru Noguchi
Journal of Signal Processing, Research Institute of Signal Processing (RISP), 11, 1, 25-32, Jan. 2007, Peer-reviewed
Scientific journal, English
URL

CMOS schmitt trigger circuit with controllable hysteresis using logical threshold voltage control circuit
Cong-Kha Pham
6TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, PROCEEDINGS, IEEE COMPUTER SOC, 48-+, 2007, Peer-reviewed, A simple logical threshold voltage control circuit is proposed. It can be implemented using conventional CMOS inverters. The proposed circuit can control the logical threshold voltage of a gate linearly and continuously over the range of the power supply voltage. Applications to a Schmitt trigger circuit with controllable hysteresis and to a window comparator are shown to demonstrate practical uses of the proposed circuit.
International conference proceedings, English

An edge extraction method for color image using multiple-valued LoG filter and color space
Cong-Kha Pham; Koutaro Yamano
6TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, PROCEEDINGS, IEEE COMPUTER SOC, 658-+, 2007, Peer-reviewed, All conventional edge extraction methods using the LoG filter give binary-valued outputs, so the extracted edge always loses a lot of important information. To overcome this problem, an edge extraction method using a LoG filter with multiple-valued output, such as 3-valued and 5-valued, has been proposed [1]. In this work, as a first proposed method, we enhance that method by changing the size of the LoG filter and the threshold values. As a second method, we obtain a more detailed edge from the lightness information L and the chrominance information contained in the I (orange-blue) and Q (purple-green) components.
International conference proceedings, English

Compensated circuit for Low Dropout Regulator having stable load regulation after consideration of bonding wire resistance
Socheat Heng; Cong-Kha Pham
2007 EUROPEAN CONFERENCE ON CIRCUIT THEORY AND DESIGN, VOLS 1-3, IEEE, 120-123, 2007, Peer-reviewed, A compensation circuit that takes the resistance of the bonding wire into account to improve the load regulation of a Low Dropout Regulator (LDO) is presented. The circuit is designed in a conventional 0.18 μm CMOS process and provides good load regulation for an LDO despite a high load current and a high bonding wire resistance. The proposed circuit is not affected by the input to the LDO or the output voltage setting, nor by variations of temperature and transistor threshold voltages. HSPICE simulation results confirm that the output voltage of the LDO is maintained within a 0.5% fluctuation when the load current changes from 0 mA to 300 mA. This characteristic plays an important role in the design of LDOs with low output voltage and high output current.
International conference proceedings, English

Quick response circuit for low-power LDO voltage regulators to improve load transient response
Socheat Heng; Cong-Kha Pham
2007 INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES, VOLS 1-3, IEEE, 28-33, 2007, Peer-reviewed, In this work, we propose a quick response circuit to improve the load transient response of a low-dropout linear voltage regulator (LDO) that operates with very low power consumption. HSPICE simulation with 0.35 μm CMOS technology shows that the transient response has less overshoot and undershoot when driving large load currents. Compared with a generic LDO, for example with a 1 μF decoupling capacitor, the output drop and settling time are improved by about 95% and 27% for a 0.1 mA to 100 mA load current step, and the output overshoot and settling time by 88% and 63% for a 100 mA to 0.1 mA step. The proposed circuit dissipates little static power, so the LDO consumes only 3.3 μA at a supply voltage of Vout + 1 V and a 150 mA load current, where Vout is the output voltage of the regulator.
International conference proceedings, English

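A Method for Solving the N-Queen Problem Using a Maximum Neuron and a Modified Hill-Climbing Term
Wataru Noguchi; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J89-A, 11, 1012-1017, Nov. 2006, Peer-reviewed, The maximum neuron model proposed by Takefuji is a kind of Hopfield-type neural network that adopts a "winner-take-all" scheme, in which only the neuron with the largest input fires within each of the disjoint groups into which the neurons are partitioned. It is highly effective for constraint-satisfaction combinatorial optimization problems. This paper applies gradient ascent learning to the maximum neuron model and, by modifying the conventional hill-climbing term, proposes a method for solving the N-Queen problem. The results show that it has better solving performance than conventional methods.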
Scientific journal, Japanese
URL

Low Power Full Input Range Current-Mode Operational Amplifier Using Level Shifter Technique
Socheat Heng; Cong-Kha Pham
Journal of Signal Processing, Research Institute of Signal Processing (RISP), 10, 6, 385-390, Nov. 2006, Peer-reviewed
Scientific journal, English
URL

An effective solving method for N-Queens problem
Wataru Noguchi; Cong-Kha Pham
Proc. of 2006 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'06), 321-324, Mar. 2006, Peer-reviewed
International conference proceedings, English

A Robust-Fragile Dual Watermarking System Based on Bilateral Filtering
Fu Qu; Cong-Kha Pham
Proc. of 2006 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'06), 345-348, Mar. 2006, Peer-reviewed
International conference proceedings, English

A 1.2V Current-Mode Operational Amplifier Using Level Shifter Technique
Socheat Heng; Cong-Kha Pham
Proc. of 2006 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'06), 393-396, Mar. 2006, Peer-reviewed
International conference proceedings, English

CPL-Based Low-Power Full Adder
Chau-Hai Huynh; Cong-Kha Pham
Proc. of 2006 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'06), 401-404, Mar. 2006, Peer-reviewed
International conference proceedings, English

A Digital Comparator using Analog Operations
Yuji Kunida; Cong-Kha Pham
Proc. of 2006 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'06), 397-400, Mar. 2006, Peer-reviewed
International conference proceedings, English

A 1.5V current-mode operational amplifier using level shifter technique
Socheat Heng; Cong-Kha Pham
2006 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION, AND TEST (VLSI-DAT), PROCEEDINGS OF TECHNICAL PAPERS, IEEE, 291-+, 2006, Peer-reviewed, A low-voltage, low-power current-mode operational amplifier designed with a level shifter technique is presented. This simple integrator is built with only 9 typical MOSFETs and 2 bias current sources. To minimize the influence of common-mode signals and noise on the signal processing, a differential structure is applied. Simulation confirms that the proposed circuit works as an integrator in the frequency range 0-1.6 MHz at a 1.5 V supply voltage and consumes at most 8.85 μW of DC power in a 1.2 μm double-poly CMOS process.
International conference proceedings, English

A proposal to solve N-Queens problems using maximum neuron model with a modified hill-climbing term
Wataru Noguchi; Cong-Kha Pham
2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, IEEE, 2679-+, 2006, Peer-reviewed, An effective solving method for the N-Queens problem is proposed, in which a modified hill-climbing term is applied to a maximum neuron model. Three models are adopted: a first model using gradient ascent learning to determine the A and B coefficients, a second model using fixed A and B coefficients determined by an upper bound of the input value to a neuron, and a third model that applies modified initial values to the second model. As a result, calculation times are reduced compared with previous methods.
International conference proceedings, English

A hardware accelerator for solving the N-Queen problem
Cong-Kha Pham; Wataru Noguchi
PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, ACTA PRESS ANAHEIM, 146-+, 2006, Peer-reviewed, A hardware accelerator for solving the N-Queen problem is presented. It is designed to perform an effective solving method in which a modified hill-climbing term is applied to a maximum neuron model. The solving method adopts three models: a first model using gradient ascent learning to determine the A and B coefficients, a second model using fixed A and B coefficients determined by an upper bound of the input value to a neuron, and a third model that applies modified initial values to the second model. As a result, calculation times are reduced by up to 40% compared with previous methods, and the hardware accelerator runs up to 25 times faster than software running on a PC.
International conference proceedings, English

Simple logic threshold conversion circuits
Cong-Kha Pham
2006 13TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS 1-3, IEEE, 268-271, 2006, Peer-reviewed, Three simple logic threshold conversion circuits are proposed. They can be implemented using conventional MOS transistors. The conversion characteristics of the proposed circuits have been confirmed using HSPICE simulations. As a result, the proposed circuits can linearly and continuously change the logic threshold voltage through the whole range of the supply voltage.
International conference proceedings, English

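A Low-Power Full Adder
Makoto Yanagisawa; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J88-A, 10, 1163-1167, Oct. 2005, Peer-reviewed, For circuits using pass transistors, we propose a method that mitigates the long-standing problem of output-signal voltage drop caused by the threshold voltage by suppressing the substrate bias effect. Reducing the output voltage drop is also expected to lower the power consumed in the output buffer. The proposed method was evaluated using the SERF full adder [1], which uses pass transistors. HSPICE simulation results show that abnormal output signals produced by the conventional circuit for particular input signals were corrected to normal output signals. Furthermore, reductions in power consumption and delay compared with the conventional circuit were confirmed.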
Scientific journal, Japanese
URL

A Stochastic Bit-Stream Digital Neuron Using Generalized LFSR and It's Application to Two-Dimensional Binary Classification
Cong-Kha Pham; Makoto Fukuda
Journal of Signal Processing, Research Institute of Signal Processing (RISP), 9, 5, 409-414, Sep. 2005, Peer-reviewed
Scientific journal, English
URL

Low Power Full Adder Cell using XNOR Circuit of Pass Transistor
Makoto Yanagisawa; Cong-Kha Pham
Proc. of 2005 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'05), 163-166, Mar. 2005, Peer-reviewed
International conference proceedings, English

A new N-parallel updating method of the Hopfield-type neural network for N-queens problem
TN Le; CK Pham
Proceedings of the International Joint Conference on Neural Networks (IJCNN), Vols 1-5, IEEE, 788-791, 2005, Peer-reviewed, In previous N-parallel updating methods of the Hopfield-type neural network for the N-queens problem, the NxN neurons are grouped into N groups, each composed of N neurons located on the same horizontal line (column) or on the same diagonal line. However, these methods did not converge in 100% of cases for all sizes of N, and they required a large number of convergence time steps. In our work, we propose a new N-parallel updating method of the Hopfield-type neural network for the N-queens problem that adopts a new way of grouping the N neurons in each group. Simulation results show that the proposed method generally performs better than the previous methods.
International conference proceedings, English

Tolerance on geometrical operation as an attack to watermarked JPEG image
CK Pham; H Yamashita
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, SPRINGER-VERLAG BERLIN, 3681, 1199-1204, 2005, Peer-reviewed, A method of embedding binary data into JPEG bitstreams has been reported in [1]. However, attacks in the form of geometrical operations on the watermarked JPEG image data have not been analyzed. In this work, we propose a method for improving the tolerance of the watermarked JPEG image to geometrical operations used as attacks.
Scientific journal, English

Built-In Self-Test Using a Parallel Generalized LFSR
東裕貴; Cong-Kha Pham
IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J87-A, 9, 1252-1253, Sep. 2004, Peer-reviewed, This paper proposes a Parallel Generalized Linear Feedback Shift Register (parallel GLFSR) as the Test Pattern Generator (TPG) for Built-In Self-Test (BIST). The numbers of tests required when the conventional TPGs (LFSR and GLFSR) and the parallel GLFSR are applied to the ISCAS'85 benchmark circuits are compared. Because the proposed TPG needs fewer tests to exceed a fault coverage of 95%, we conclude that it is effective as a TPG for BIST.
Scientific journal, Japanese

Improvement of Robustness on Embedding of Binary Data to JPEG Image
Hiroshi Yamashita; Cong-Kha Pham
Proc. of 2004 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'04), 65-68, Mar. 2004, Peer-reviewed
International conference proceedings, English

A stochastic pulse bit-stream with high accurate multiplication
CK Pham; M Fukuda
2004 47TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL III, CONFERENCE PROCEEDINGS, IEEE, 93-96, 2004, Peer-reviewed, To scale down the circuit size in hardware realizations of neural network systems, much research has been done on the conventional pulse model neuron, in which conventional numerical calculation is replaced by stochastic calculation. In conventional stochastic calculation, an LFSR (Linear Feedback Shift Register) is used as the random number generator. In general, good randomness is required for an exact calculation result. However, the random numbers generated by an LFSR lack randomness, which degrades the accuracy of conventional stochastic calculation. In this work, to improve that accuracy, we propose using a GLFSR (Generalized LFSR), which generates random numbers rich in randomness, as the random number generator for the stochastic calculation. The accuracy of the stochastic calculation is evaluated using the standard deviation (SD) and the maximum error. As a result, the proposed stochastic calculation using the GLFSR as the random number generator shows the highest accuracy.
International conference proceedings, English

A pulse model neuron with high accurate calculation and it's application to two-dimensional binary classification
CK Pham; M Fukuda
Proceedings of the 2004 Intelligent Sensors, Sensor Networks & Information Processing Conference, IEEE, 411-416, 2004, Peer-reviewed, To scale down the circuit size in hardware realizations of neural network systems, much research has been done on the conventional pulse model neuron, in which conventional numerical calculation is replaced by stochastic calculation. In conventional stochastic calculation, an LFSR (Linear Feedback Shift Register) is used as the random number generator. In general, good randomness is required for an exact calculation result. However, the random numbers generated by an LFSR lack randomness, which degrades the accuracy of conventional stochastic calculation. In this work, to improve that accuracy, we propose using a GLFSR (Generalized LFSR), which generates random numbers rich in randomness, as the random number generator for the stochastic calculation. The accuracy of the stochastic calculation is evaluated using the root-mean-square (RMS) error and the maximum error. As a result, the proposed stochastic calculation using the GLFSR as the random number generator shows the highest accuracy. Moreover, when applied to a neural network that solves a two-dimensional binary classification problem, the network shows better classification results than a neural network that uses the conventional stochastic calculation.
International conference proceedings, English

Implementation of a Novel CMOS Synapses Circuit
Cong-Kha Pham
Journal of Signal Processing, Research Institute of Signal Processing, 7, 1, 111-116, Jan. 2003, Peer-reviewed
Scientific journal, English

A Novel Synapses Circuit and It's Application to a Neural-Based A/D Converter
C-K. Pham
Proc. of IEEE International Symposium on Circuits and Systems ISCAS'01, 612-615, May 2001, Peer-reviewed
International conference proceedings, English

A Novel CMOS Synapse Circuit
範公可
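IEICE Transactions A (Japanese Edition), The Institute of Electronics, Information and Communication Engineers, J84-A, 2, 246-248, Feb. 2001, Peer-reviewed, This paper proposes a novel CMOS synapse circuit for use in neuron circuits and similar applications. In conventional synapse circuits built only from CMOS inverters, the weight value is realized as the transconductance g_m of the PMOS and NMOS transistors that make up the inverter. However, the g_m of these transistors varies with the voltage level at the inverter output terminal, which gives the synapse circuit a nonlinear output characteristic. In the proposed circuit, resistors are introduced into the CMOS inverter circuit so that the synaptic weight values can be realized accurately.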
Scientific journal, Japanese

A Neural-Based A/D Converter Using only CMOS Inverter
C-K. Pham
Journal of Signal Processing, 4, 1, 95-98, Jan. 2000, Peer-reviewed
Scientific journal, English

An Application of Genetic Algorithm to a Backward Evolution of Cellular Automata
Y. Murakami; K. Kitakaze; C-K. Pham
Proc. of 1999 International Symposium on Nonlinear Theory and Its Applications, 327-330, Dec. 1999, Peer-reviewed
International conference proceedings, English

Simple Methods for Secure Communications Using Nonlinear Mapping Function
C-K. Pham
Proc. of 1997 Int. Symposium on Nonlinear Theory and Its Applications NOLTA '97 Proceedings, 101-103, Dec. 1997, Peer-reviewed
International conference proceedings, English

Bifurcational Communication with Novel Chaotic Transistors Circuit
C-K. Pham
International Journal of Chaos Theory and Applications, 2, 2, 25-34, Feb. 1997, Peer-reviewed
Scientific journal, English

Chaotic behavior and synchronization phenomena in a novel chaotic transistors circuit
CK Pham; M Korehisa; M Tanaka
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 43, 12, 1006-1011, Dec. 1996, Peer-reviewed, In this brief, chaotic behavior and synchronization phenomena which occur in a novel Chaotic Transistors circuit with high speed operation are described. The most important point in this brief is to change the nonlinear transfer characteristic of a MOS inverter into a nonlinearity that generates chaos. The proposed circuit includes a looped MOS inverter having a pull-up resistor serially connected to a pull-down NMOS transistor. A switched capacitor (SC) circuit having a hold capacitor and two CMOS switches is added in the loop of the circuit to perform sample-and-hold operation. The chaotic behavior is found as the sampling clock frequency is varied. Synchronization phenomena are also found between two coupled Chaotic Transistors circuits. The test chip is implemented in 2 μm CMOS technology through the MOSIS service.
Scientific journal, English

A simple 6-bit neural-based A/D converter employing only CMOS inverters
CK Pham; M Tanaka; K Shono
ISCAS 96: 1996 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - CIRCUITS AND SYSTEMS CONNECTING THE WORLD, VOL 1, IEEE, 357-360, 1996, Peer-reviewed
International conference proceedings, English

Bifurcational communication with novel chaotic transistors circuits
CK Pham; M Tanaka
ISCAS 96: 1996 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - CIRCUITS AND SYSTEMS CONNECTING THE WORLD, VOL 3, IEEE, 100-103, 1996, Peer-reviewed
International conference proceedings, English

Implementation of nonlinear circuits employing MOSIS
C-K. Pham; M. Tanaka
Proc. of 1995 International Symposium on Nonlinear Theory and Applications NOLTA '95, 1-4, Dec. 1995, Peer-reviewed
International conference proceedings, English

DISCRETE-TIME CELLULAR NEURAL NETWORKS WITH 2 TYPES OF NEURON CIRCUITS FOR IMAGE-CODING AND THEIR VLSI IMPLEMENTATIONS
CK PHAM; M IKEGAMI; M TANAKA
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E78A, 8, 978-988, Aug. 1995, Peer-reviewed, This paper describes discrete-time Cellular Neural Networks (DT-CNN) with two types of neuron circuits for image coding from an analog format to a digital format, and their VLSI implementations. The image coding methods proposed in this paper have been investigated for the purpose of transmitting a coded image and restoring it without a large loss of the original image information. Each neuron circuit of a network receives one pixel of an input image and processes it together with binary output data fed from neighboring neuron circuits. Parallel dynamics quantization methods have been adopted as the image coding methods. They are performed in the networks to decide the binary output value of each neuron circuit according to the output values of neighboring neuron circuits. Delayed binary outputs of the neuron circuits in a neighborhood are directly connected to the inputs of the currently active neuron circuit. The next state of a network is computed from the current state at some neuron circuits in any time interval. Models of the two types of neuron circuits and networks are presented and simulated to confirm the capability of the proposed methods. Physical layout designs of the coding chips have also been carried out to show the feasibility of VLSI realization.
Scientific journal, English

Discrete Time Cellular Neural Networks with Two Types of Neuron Circuits for Image Coding and Their VLSI Implementation
C-K. Pham; M. Ikegami; M. Tanaka
IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, E-78-A, 8, 291-299, Aug. 1995, Peer-reviewed
Scientific journal, English

A Simple Chaos Generator and It's Nonlinear Analysis
C-K. Pham; M. Korehisa; M. Tanaka
Proc. of European Conference on Circuit Theory Design ECCTD '95, 1125-1128, Aug. 1995, Peer-reviewed
International conference proceedings, English

CHAOTIC BEHAVIOR IN SIMPLE LOOPED MOS INVERTERS
CK PHAM; M TANAKA; K SHONO
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E78A, 3, 291-299, Mar. 1995, Peer-reviewed, In this paper, bifurcation and chaotic behavior which occur in simple looped MOS inverters with high speed operation are described. The most important point in this work is to change the nonlinear transfer characteristic of a MOS inverter into a nonlinearity that generates chaos. Three types of circuits, which include four, three, and one MOS inverters, respectively, are proposed. A switched capacitor (SC) circuit that performs sample-and-hold operation is added in the loop of each circuit. The bifurcation and chaotic behavior are found as an external input and/or the sampling clock frequency is varied, and they are verified with the SPICE circuit simulator as well as by experiments. For the first type, with four looped CMOS inverters, the Lyapunov exponent λ, which is positive in the chaotic regions, can be calculated using a fitted nonlinear function synthesized from two sigmoid functions. For the second type, with three looped CMOS inverters, and the third type, with one looped MOS inverter, the nonlinear charge/discharge characteristics of the hold capacitor in the SC circuit are used efficiently to form the nonlinearity that generates the bifurcation and chaotic behavior. Their bifurcation can be generated by the sampling clock frequency, a parameter that is easily controlled.
Scientific journal, English

Associative dynamics of competitive cellular neural network
M KANAYA; M TAKAHIRA; T WATANABE; CK PHAM; M TANAKA
1995 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-3, IEEE, 1152-1155, 1995, Peer-reviewed
International conference proceedings, English

Pulse coded cellular neural network and it's hardware implementation
CK Pham; T Kimura; M Ikegami; M Tanaka
1995 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS PROCEEDINGS, VOLS 1-6, IEEE, 4, 1590-1594, 1995, Peer-reviewed
International conference proceedings, English

A Novel Chaos Generator Employing CMOS Inverter for Cellular Neural Networks
C-K. Pham; M. Tanaka
Proc. of 3rd IEEE International Workshop on Cellular Neural Networks and their Applications CNNA-94, 355, Dec. 1994, Peer-reviewed
International conference proceedings, English

Design of Dynamical Image Halftoning Processor
H. Numata; C-K. Pham; M. Tanaka
Proc. of 2nd Asian Pacific Conference on Hardware Description Languages APCHDL'94, 151-154, Oct. 1994, Peer-reviewed
International conference proceedings, English

Binocular Stereo Vision by Silicon Retina
M. Awata; Y. Nakamura; C-K. Pham; M. Tanaka
Proc. of 5th Australian Conference on Neural Networks, 125-128, Feb. 1994, Peer-reviewed
International conference proceedings, English

BIFURCATION AND CHAOS IN CMOS INVERTERS RING OSCILLATOR
CK PHAM; M TANAKA; K SHONO
1994 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, IEEE, E697-E700, 1994, Peer-reviewed
International conference proceedings, English

A SIMPLE NEURAL-BASED A/D CONVERTER EMPLOYING CMOS INVERTERS
CK PHAM; M TANAKA; K SHONO
1994 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOL 1-7, IEEE, 2093-2096, 1994, Peer-reviewed
International conference proceedings, English

Design and Performance of CMOS Analog Fuzzy Chips
K. Shono; C-K. Pham
Proc. of 3rd International Conference on Industrial Fuzzy Control Intelligent Systems IFIS'93, 161-166, Dec. 1993, Peer-reviewed
International conference proceedings, English

2-bit Neuron Circuit for Cellular Neural Network
C-K. Pham; M. Ikegami; M. Tanaka
Proc. of 1993 International Symposium on Nonlinear Theory and its Applications NOLTA'93, 1371-1374, Dec. 1993, Peer-reviewed
International conference proceedings, English

A HARDWARE ACCELERATOR FOR DESIGN-RULE CHECKING IN A BIT-MAPPING CAD-SYSTEM
CK PHAM; K SHONO
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, E76A, 10, 1684-1693, Oct. 1993, Peer-reviewed, A hardware accelerator for raster-based design-rule checking, called BITDRC, for a bit-mapping CAD system is described. BITDRC is a special-purpose hardware accelerator that performs design-rule checking for Manhattan layout style VLSI circuits much faster than the software checker previously included in the bit-mapping CAD system. The bit-mapping CAD system was developed for both educational and VLSI design purposes, and needs only a personal computer as a compact working environment. The proposed hardware architecture is rather simple and is characterized by the bit-mapping CAD system on which it works. The hardware architecture and checking algorithm have been confirmed by implementing a breadboard prototype using discrete components. As a result, BITDRC runs as much as 500 times faster than the original software and takes only 4 seconds to check every rule on a 1500 x 1500 grid layout pattern. BITDRC performs error checking together with data scanning, so it can be used as an on-line design-rule checker for the bit-mapping CAD system. Finally, the physical layout of BITDRC has been designed using a conventional CMOS technology.
Scientific journal, English

Pipelining System of Discrete Time Cellular Neural Networks for Information Coding and Decoding
M. Tanaka; N. Shimizu; C.-K. Pham; M. Ikegami; Y. Nakamura
11th European Conference on Circuit Theory and Design ECCTD'93, 45-50, Sep. 1993, Peer-reviewed
International conference proceedings, English

CMOS DIGITAL RETINA CHIP WITH MULTIBIT NEURONS FOR IMAGE-CODING
CK PHAM; M IKEGAMI; M TANAKA; K SHONO
1993 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS: PROCEEDINGS, VOLS 1-4 (ISCAS 93), IEEE, 2752-2755, 1993, Peer-reviewed
International conference proceedings, English

Fuzzy Processors using Neural Phenomena in CMOS Digital LSI
K. Shono; C-K. Pham
Proc. of 2nd International Conference on Fuzzy Logic and Neural Networks, 17-22, Jun. 1992, Peer-reviewed
International conference proceedings, English

A CMOS CELL COMPILER FOR A BIT-MAPPING CAD SYSTEM
CK PHAM; K SHONO
IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, IEICE-INST ELECTRON INFO COMMUN ENG, 74, 9, 2603-2611, Sep. 1991, Peer-reviewed, In this paper, the CMOS Cell Compiler (CCC), which was developed on our bit-mapping CAD system, is introduced. The CCC generates the physical layout of a logic cell from a functional description expressed as a set of Boolean equations. A CMOS digital LSI with up to 10^4 transistors can be designed on this bit-mapping CAD system by placing the generated cell layouts in the bit-map editor and manually routing the interconnections among the cells. The physical layout obeys lambda-based design rules on a bit-map grid plane and supports single- and double-metal CMOS processes. The aspect ratio of a rectangular cell and the positions of its input and output terminals depend on the functional description given by the Boolean equations.
Scientific journal, English

A Bitmap Memory Bank of Region Access
高窪統; ファム・コンカー; 庄野克房
IEICE Transactions C-II (Japanese Edition), IEICE Electronics Society, J75-C-II, 4, 227-235, Apr. 1991
Scientific journal, Japanese

MISC

A Parallel Hybrid Adaptive CORDIC in 180 nm CMOS Technology (Integrated Circuits and Devices)
Nguyen Hong-Thu; Pham Cong-Kha
The Institute of Electronics, Information and Communication Engineers, 15 Dec. 2016, IEICE technical report, 116, 364, 101-104, English, 0913-5685, 40021059179, AN10013276

A Current-mode Successive Approximation Analog-to-Digital Converter in 0.18μm CMOS Technology
YOMOGITA Takumu; PHAM Cong-Kha
An 8-bit current-mode Successive Approximation (SAR) Analog-to-Digital Converter (ADC) is proposed. The proposed current-mode SAR ADC was fabricated using ROHM 0.18 μm CMOS technology. Simulation results give a maximum sampling frequency of 200 kHz, a power consumption of 2.3 μW, and a Figure of Merit (FoM) of 45 fJ/conv. In addition, the proposed ADC can operate from a 0.9 V supply voltage, so the circuit can be implemented on a mixed-signal chip., The Institute of Electronics, Information and Communication Engineers, 01 Dec. 2014, IEICE technical report. Computer systems, 114, 346, 95-99, Japanese, 0913-5685, 110009977423, AN10013141

Set Operating Processor (SOP) : Application for Image recognition
INOUE Katsumi; LE Duc-Hung; SOWA Masahiro; PHAM Cong-Kha
The processing burden of information search, such as verification and recognition, is very large for conventional processors such as CPUs, GPUs, and DSPs. Since the birth of the computer, technology has been used to compensate for this weakness and to reduce the burden on conventional processors. Today the processing speed of conventional processors has reached its limit, and a new hardware architecture that implements fast search with more accurate matching and better recognition is essential. Against this background, the authors propose a memory that stores information and can find that information by itself using its own hardware algorithm (Set Operating Processor). The feasibility of practical application of this technology to image recognition has been confirmed., The Institute of Electronics, Information and Communication Engineers, 30 Sep. 2013, Technical report of IEICE. VLD, 113, 235, 35-40, Japanese, 110009821692, AN10013323

Input common-mode voltage of the opamp improved wide-area by bulk-control
OHSAWA Mamoru; PHAM Cong-Kha
We examine widening the common-mode input voltage range of an op amp that operates at a low power supply voltage. In this study, we aim to widen the range by using a silicon substrate that allows the bulk terminal of the MOS transistor to be controlled. The results indicate that controlling the bulk terminal can extend the common-mode voltage range without changing the transconductance gm of the conventional circuit. In addition, this research shows an improvement not only in the common-mode voltage range but also in CMRR., The Institute of Electronics, Information and Communication Engineers, 01 Aug. 2013, Technical report of IEICE. SDM, 113, 172, 105-110, Japanese, 0913-5685, 110009806049, AN10013254

Input common-mode voltage of the opamp improved wide-area by bulk-control
OHSAWA Mamoru; PHAM Cong-Kha
We examine widening the common-mode input voltage range of an op amp that operates at a low power supply voltage. In this study, we aim to widen the range by using a silicon substrate that allows the bulk terminal of the MOS transistor to be controlled. The results indicate that controlling the bulk terminal can extend the common-mode voltage range without changing the transconductance gm of the conventional circuit. In addition, this research shows an improvement not only in the common-mode voltage range but also in CMRR., The Institute of Electronics, Information and Communication Engineers, 01 Aug. 2013, Technical report of IEICE. ICD, 113, 173, 105-110, Japanese, 0913-5685, 110009806072, AN10013276

Low-Power High-Speed Rail-to-Rail Voltage Buffer for LCD Drivers
TSUKAMOTO Yousuke; Kha Pham Cong
This paper proposes a high-speed low-power class-B buffer amplifier topology for liquid crystal display applications. The output buffer amplifiers of an LCD driver determine the speed, resolution, voltage swing, and power consumption of the whole driver. This paper achieves a low static current by using a circuit that drives the load capacitor of the liquid crystal display only in the dynamic state. The proposed buffer can drive a 1 nF column line load at 1.5 V/us on the rising edge and 1 V/us on the falling edge. The static current is only 1.5 uA from a 3.3 V power supply., The Institute of Electronics, Information and Communication Engineers, 15 Dec. 2011, Technical report of IEICE. ICD, 111, 352, 47-52, Japanese, 0913-5685, 110009466837, AN10013276

A Compact Adaptive Slope Compensation Circuit for Current-Mode DC-DC Converter
SHIBATA Kimio; PHAM Cong-Kha
A novel compact adaptive slope compensation circuit for the current-mode dc-dc converter is proposed. The proposed compact circuit achieves a small die size and low power consumption. Operation of the current-mode dc-dc converter with the proposed circuit was confirmed by HSPICE simulation at a switching frequency of 1.2 MHz., The Institute of Electronics, Information and Communication Engineers, 25 Nov. 2009, IEICE technical report, 109, 316, 107-111, Japanese, 0913-5685, 110007863167, AA11645397

A DC-DC Converter Using A High Speed Soft-Start Control Circuit
SHIBATA Kimio; PHAM Cong-Kha
Battery-powered equipment demands small size, light weight, and long battery operating time. Sleep mode and shutdown mode are ways to reduce power consumption. A high-speed power-up supply system enables a system that enters sleep mode frequently. Without a soft-start control circuit, DC-DC switching regulators generate inrush current and overshoot voltage. The inrush current and the overshoot shorten the battery operating time and degrade the reliability of the coil and electronic parts. This paper proposes a current-mode PWM DC-DC buck converter that enables high-speed power-up without high inrush current or overshoot. The proposed control circuit reduces the inrush current and the output overshoot independently of the load current, the input voltage, and the operating temperature. The power-up time (soft-start time) is reduced to between 1/10 and 1/50 of that of typical industry-standard products., The Institute of Electronics, Information and Communication Engineers, 12 Jun. 2009, IEICE technical report, 109, 90, 1-6, Japanese, 0913-5685, 110007331828, AN10012932

64-bit Digital Comparator
TAKAHASHI Syuntarou; PHAM Cong-Kha
The Institute of Image Information and Television Engineers, 27 Jul. 2006, ITE technical report, 30, 38, 57-60, Japanese, 1342-6893, 10018045727, AN1059086X

64-bit Digital Comparator
TAKAHASHI Syuntarou; PHAM Cong-Kha
A new connection method for connecting comparison results in a digital comparator, which uses previous comparison results to determine the next comparison results, is proposed. The implementation of a 64-bit digital comparator based on a digital signal comparison algorithm corresponding to the proposed connection method is presented. According to simulation results, the number of circuit components and the delay time are improved compared with previous work., The Institute of Electronics, Information and Communication Engineers, 20 Jul. 2006, IEICE technical report, 106, 189, 57-60, Japanese, 0913-5685, 110004811051, AN10013276

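Education Aiming at a Comprehensive Understanding of Electronic Devices and Integration: An Educational Project under the Research and Education Activation Support System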
野崎眞次; 範公可
Jan. 2005, Bulletin of the University of Electro-Communications, 17, 1, 2, Japanese, Introduction other

Research on improvement of fault coverage in BIST using parallel GLFSR
HIGASHI Hirotaka; PHAM C-K
In this paper, a Parallel Generalized Linear Feedback Shift Register (Parallel GLFSR) applied to the Test Pattern Generator (TPG) of Built-in Self-Test (BIST) is proposed. The fault coverage is compared for TPGs implemented using a Linear Feedback Shift Register (LFSR), a GLFSR, and a Parallel GLFSR, each applied to the ISCAS benchmark circuits. The TPG composition and the simulation method of the proposed technique are described. As a result, the proposed technique is the most suitable TPG for BIST, because the highest fault coverage for the same number of tests is obtained when the TPG is implemented using the Parallel GLFSR., The Institute of Electronics, Information and Communication Engineers, 31 Oct. 2003, Technical report of IEICE. CST, 103, 406, 9-12, Japanese, 0913-5685, 110003299912, AN10438446

Books and other publications

"Logic Circuits" (Computer Science Textbook Series)
曽和将容; 範公可
Japanese, Joint work, Corona Publishing, Aug. 2013

Lectures, oral presentations, etc.

A Perpetuum Mobile 32bit CPU with 13.4pJ/cycle, 0.14μA Sleep Current using Reverse-Body-Bias Assisted 65nm SOTB CMOS Technology
Koichiro Ishibashi (UEC); Nobuyuki Sugii (LEAP); Kimiyoshi Usami (SIT); Hideharu Amano (K); Kazutoshi Kobayashi (KIT); Cong-Kha Pham (UEC); Hideki Makiyama; Yoshiki Yamamoto; Hirofumi Shinohara; Toshiaki Iwamatsu; Yasuo Yamaguchi; Hidekazu Oda; Takumi Hasegawa; Shinobu Okanishi; Hiroshi Yanagita (LEAP)
Invited oral presentation, English, IEICE, Technical Committee on Integrated Circuits and Devices (SDM2014-62, ICD2014-31), Invited, Domestic conference
04 Aug. 2014

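A Database Processor (DBP) for Ultra-High-Speed Data Search: Advancing Information Processing with Memory-Based Computing
井上克己; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)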
Apr. 2014

A Method for Improving the Linearity of CMOS R-2R Ladder D/A Converters
蓬田拓夢; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Mar. 2014

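A Study on Extending the Oscillation Frequency Range of a Ring VCO Using a Logic Threshold Conversion Circuit
塩野谷雅仁; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)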
Mar. 2014

Set Operating Processor (SOP)-Application for Image recognition
井上克己; レドゥクフン; 曽和将容; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Oct. 2013

Input common-mode voltage of the opamp improved wide-area by bulk-control
大澤衛; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Aug. 2013

Improvement Linearity of the DAC with Unification of the MOSFET's Operating Region
蓬田拓夢; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Aug. 2013

Low-Power High-Speed Rail-to-Rail Voltage Buffer for LCD Drivers
塚本洋介; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Dec. 2011

Design of a Low Error LUT-based Truncated Multiplier
ホアンヴァンフック; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Dec. 2010

Study on CMOS R-2R Ladder for Linearity Optimization by Adjust Channel Width
加藤雄大; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Dec. 2010

High Transient Performance of Low-Dropout(LDO) regulator
Fouzhiwei Tong; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Silicon Device and Materials (SDM)
Nov. 2010

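A Compact Adaptive Slope Compensation Circuit for Current-Mode DC-DC Converter
柴田公男; 範公可
Oral presentation, Japanese, IEICE, Dependable Computing, VLSI Design Technologies, Computer Systems, Reconfigurable Systems, Integrated Circuits and Devices, Component Parts and Materials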
Dec. 2009

An Efficient and Simple DC-DC Converter That Adapts to Input Voltage and Load Variations
張品; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Dec. 2009

Wide Swing, Low Gain Error Voltage Buffer with Adaptive Biasing for Improving Slew-rate
ジャガトジョティギミレ; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices (ICD)
Dec. 2009

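A Current-Mode DC-DC Converter Using a High-Speed Soft-Start Control Circuit
柴田公男; 範公可
Oral presentation, Japanese, IEICE, Component Parts and Materials, Electromechanical Devices, Organic Molecular Electronics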
Jun. 2009

Inrush Current Limiting Circuit for Low Dropout Regulator
Socheat HENG; Cong-Kha PHAM
Oral presentation, English, The 22nd Workshop on Circuits and Systems in Karuizawa (2009)
Apr. 2009

Variable Transconductance, Nonlinearity, Variable Gain Amplifier, Operational Transconductance
池本真樹; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on VLSI Design Technologies (VLD)
Sep. 2008

64-bit Digital Comparator
高橋俊太郎; 範公可
Oral presentation, Japanese, IEICE, Technical Committee on Integrated Circuits and Devices / Information Sensing
Jul. 2006

A Simple Differential Voltage Comparator
Christopher NTYANGIRI; Cong-Kha PHAM
Oral presentation, English, IEICE Technical Report, VLD2005-38, SDM 2005-157,STM05-02, Vol.105, No. 307
Sep. 2005

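A Contour Extraction Method for Multi-Level Images Using a Laplacian-of-Gaussian Filter with a Variable Threshold
山野公太郎; 範公可
Oral presentation, Japanese, IEICE Technical Report, SIP2005-85 to 95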
Sep. 2005

A New Quasi-Synchronous Updating Method for Neurons
Le Thanh Nhat; 範公可
Oral presentation, Japanese, IEICE Technical Committee on Neurocomputing
Dec. 2004

Implementation of an Ogg Vorbis Decoder on a Custom Processor
渡辺智是; 範公可
Oral presentation, Japanese, 2004 IEICE Spring National Convention
Mar. 2004

A Study on the Implementation of a Large-Scale Integrated Matrix Operation Circuit Using an FPGA
崔巍; 範公可
Oral presentation, Japanese, IEICE Technical Committee on Circuits and Systems
Nov. 2003

A Stochastic Bit-Stream for High-Accuracy Multiplication
福田真人; 範公可
Oral presentation, Japanese, IEICE Technical Committee on Circuits and Systems
Nov. 2003

Research on Improvement of Fault Coverage in BIST Using Parallel GLFSR
東裕貴; 範公可
Oral presentation, Japanese, IEICE Technical Committee on Circuits and Systems
Nov. 2003

A Study on the Design of a Parallel Multiplier Using One's Complement
青山達也; 範公可
Oral presentation, Japanese, 2002 IEICE Spring National Convention
Mar. 2003

A Study on the Design and Implementation of a Pipelined FFT Using an FPGA
青山達也; 範公可
Oral presentation, Japanese, The 6th System LSI Workshop
Nov. 2002

Hardware Design Systems and System Integration, Part 2
範公可
Others, Japanese, Domestic conference
1999

Hardware Design Systems and System Integration, Part 1
範公可
Others, Japanese, Domestic conference
Feb. 1998

Accessing MOSIS
フアムコンカ
Others, Japanese, Domestic conference
Jul. 1995

Synchronization Phenomena in Chaotic Transistor Circuits
ファム・コン・カー; 伊久信; 田中衞
Oral presentation, Japanese, 1995 IEICE Spring National Convention
Mar. 1995

Chaotic Transistors
ファム・コン・カー; 田中衞; 庄野克房
Oral presentation, Japanese, IEICE Technical Committee on Nonlinear Problems
Sep. 1994

Chaotic Behavior in CMOS Inverters Ring
ファム・コン・カー; 田中衞; 庄野克房
Oral presentation, Japanese, The 7th Workshop on Circuits and Systems in Karuizawa (1994)
Apr. 1994

Chaotic CMOS Inverters
ファム・コン・カー; 田中衞; 庄野克房
Oral presentation, Japanese, IEICE Technical Committee on Nonlinear Problems
Mar. 1994

Chaotic CMOS Inverters
ファム・コン・カー; 田中衞; 庄野克房
Oral presentation, English, 1994 IEICE Spring National Convention
Mar. 1994

A CMOS Retina Chip for Binarizing Grayscale Images
ファム・コン・カー; 池上宗光; 田中衞; 庄野克房
Oral presentation, Japanese, IEICE Technical Committee on Nonlinear Problems
Mar. 1993

CMOS Digital Retinal Chip with 1-bit Neurons for Image Coding
ファム・コン・カー; 池上宗光; 田中衞; 庄野克房
Oral presentation, English, 1993 IEICE Spring National Convention
Mar. 1993

Implementation of Fuzzy Processors on CMOS Digital IC
C-K. Pham; K. Shono
Public symposium, English, Japan-Korea Joint Seminar
1992

A Bitmap Memory Bank of Region Access
高窪統; ファム・コン・カー; 庄野克房
Oral presentation, Japanese, The 4th Microelectronics Symposium (MES'91)
May 1991

A Successive Approximation A/D Converter Using an Analog-Digital-Balance Circuit
ファム・コン・カー; 庄野克房
Oral presentation, Japanese, 1991 IEICE Spring National Convention
Mar. 1991

Successive Approximation Analog-to-Digital conversion employing Analog-Digital-Balance Circuit - Part 2
C-K Pham; K. Shono
Public symposium, English, Japan-Korea Joint Seminar
1991

BITDRC : A Hardware Accelerator for Design-Rule Checking on bit-mapping CAD System
C-K. Pham; K. Shono
Public symposium, English, Japan-Korea Joint Seminar
1991

A Memory Bank Adopting the Bitmap Scheme
高窪統; ファム・コン・カー; 庄野克房
Oral presentation, Japanese, IEICE Technical Committee on Integrated Circuits and Devices
May 1990

A CMOS Digital Macro-Cell Compiler
ファム・コン・カー; 庄野克房
Oral presentation, Japanese, IEICE Spring National Convention
Mar. 1990

A Memory Bank Adopting the Bitmap Scheme
高窪統; ファム・コン・カー; 庄野克房
Oral presentation, Japanese, 1990 IEICE Spring National Convention
Mar. 1990

A 2-bit Microcomputer as Teaching Material: LSI Dry and Wet Laboratory
庄野克房; 姜黎一; ファム・コン・カー
Oral presentation, Japanese, Microelectronics Research and Development Organization, Research Communication
Jan. 1990

Bitmap-IV : A Layout system for Manual and Automatic Design
C-K. Pham; K. Shono
Public symposium, English, Japan-Korea Joint Seminar
1990

Successive Approximation Analog-to-Digital conversion Employing Analog-Digital-Balance Circuit - Part 1
C-K Pham; K. Shono
Public symposium, English, Japan-Korea Joint Seminar
1990

A Bitmap Memory Bank for Image Processing
ファム・コン・カー; 高窪統; 庄野克房
Oral presentation, Japanese, 1989 IEICE Spring National Convention
Mar. 1989

Design of a Bitmap Memory Bank
高窪統; ファム・コン・カー; 庄野克房
Oral presentation, Japanese, IEICE Technical Committee on Integrated Circuits and Devices
Mar. 1988

A Handwritten Logic Circuit Diagram Recognition System Using VTL
高窪統; 山本美奈; ファム・コン・カー; 庄野克房
Oral presentation, Japanese, IEICE Technical Committee on Integrated Circuits and Devices
Dec. 1987

Design of a DRAM Controller for Bit Access
ファム・コン・カー; 庄野克房
Oral presentation, Japanese, 1987 IEICE Student Research Presentation Meeting
Dec. 1987

Affiliated academic society

The Institute of Electronics, Information and Communication Engineers (IEICE)

IEEE

Information Processing Society of Japan (IPSJ)

Japanese Association of Science and Technology for Identification

Signal Processing

Industrial Property Rights

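Semiconductor with a function for detecting information that includes ambiguity, and a device incorporating this semiconductor
Patent right, Japanese Patent Application No. 2011-201425 (特願2011-201425), Date applied: 2011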