AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers

Faraone, Julian; Kumm, Martin; Hardieck, Martin; Zipf, Peter; Liu, Xueyuan; Boland, David; Leong, Philip H. W.

Published in

arXiv, 2019

DOI: 10.48550/arxiv.1911.08097

Institute of Electrical and Electronics Engineers, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 1, pp. 115-128, 2020

DOI: 10.1109/tvlsi.2019.2939429

Journal article published in 2020 by Julian Faraone, Martin Kumm, Martin Hardieck, Peter Zipf, Xueyuan Liu, David Boland, Philip H. W. Leong

Abstract

Low-precision arithmetic operations to accelerate deep-learning applications on field-programmable gate arrays (FPGAs) have been studied extensively, because they offer the potential to save silicon area or increase throughput. However, these benefits come at the cost of a decrease in accuracy. In this article, we demonstrate that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving silicon area than low-precision arithmetic. RCCMs multiply input values by a restricted choice of coefficients using only adders, subtractors, bit shifts, and multiplexers (MUXes), meaning that they can be heavily optimized for FPGAs. We propose a family of RCCMs tailored to FPGA logic elements to ensure their efficient utilization. To minimize information loss from quantization, we then develop novel training techniques that map the possible coefficient representations of the RCCMs to neural network weight parameter distributions. This enables the use of RCCMs in hardware while maintaining high accuracy. We demonstrate the benefits of these techniques using AlexNet, ResNet-18, and ResNet-50 networks. The resulting implementations achieve up to 50% resource savings over traditional 8-bit quantized networks, translating to significant speedups and power savings. Our RCCM with the lowest resource requirements exceeds 6-bit fixed point accuracy, and all other RCCM implementations achieve accuracy at least similar to that of an 8-bit uniformly quantized design while delivering significant resource savings.
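
To make the RCCM idea in the abstract concrete, the following is a minimal Python sketch of a multiplier built only from shifts, one adder/subtractor, and MUX-style selects, plus a nearest-coefficient lookup for a weight. The topology, the shift choices, and the names rccm_multiply, coefficient_table, and quantize_weight are assumptions made for illustration; they are not the circuits or the training technique from the paper, and weight signs are ignored for brevity.

```python
from itertools import product

# Hypothetical RCCM topology (an assumption for illustration, not one of the
# paper's circuits): y = (x << a) +/- (x << b), with a and b chosen by MUXes.
SHIFTS_A = (2, 3)  # first MUX selects x*4 or x*8
SHIFTS_B = (0, 1)  # second MUX selects x*1 or x*2


def rccm_multiply(x: int, sel_a: int, sel_b: int, subtract: bool) -> int:
    """Multiply x by a selectable constant using only shifts and one add/sub."""
    left = x << SHIFTS_A[sel_a]
    right = x << SHIFTS_B[sel_b]
    return left - right if subtract else left + right


def coefficient_table() -> dict:
    """Map each coefficient this circuit can realize to one select setting."""
    table = {}
    for sel_a, sel_b, subtract in product((0, 1), (0, 1), (False, True)):
        # Feeding x = 1 through the circuit yields the coefficient itself.
        coeff = rccm_multiply(1, sel_a, sel_b, subtract)
        table.setdefault(coeff, (sel_a, sel_b, subtract))
    return table


def quantize_weight(w: float, scale: float, table: dict) -> int:
    """Return the representable coefficient nearest to w / scale."""
    target = w / scale
    return min(table, key=lambda c: abs(c - target))


if __name__ == "__main__":
    table = coefficient_table()
    print(sorted(table))
    coeff = quantize_weight(0.52, 0.1, table)
    print(coeff, rccm_multiply(7, *table[coeff]))
```

With these shift choices the sketch can realize only the coefficients 2, 3, 5, 6, 7, 9, and 10. The set is small and non-uniform, which suggests why the abstract pairs the hardware with training techniques that fit the weight distribution to the available coefficients.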
