Design tradeoff analysis of floating-point adder in FPGAs
Field Programmable Gate Arrays (FPGA) are increasingly being used to design high end computationally intense microprocessors capable of handling both fixed and floating-point mathematical operations. Addition is the most complex operation in a floating-point unit and offers major delay while taking significant area. Over the years, the VLSI community has developed many floating-point adder algorithms mainly aimed to reduce the overall latency. An efficient design of floating-point adder onto an FPGA offers major area and performance overheads. With the recent advancement in FPGA architecture and area density, latency has been the main focus of attention in order to improve performance. Our research was oriented towards studying and implementing standard, Leading One Predictor (LOP), and far and close data-path floating-point addition algorithms. Each algorithm has complex sub-operations which lead significantly to overall latency of the design. Each of the sub-operation is researched for different implementations and then synthesized onto a Xilinx Virtex2p FPGA device to be chosen for best performance. This thesis discusses in detail the best possible FPGA implementation for all the three algorithms and will act as an important design resource. The performance criterion is latency in all the cases. The algorithms are compared for overall latency, area, and levels of logic and analyzed specifically for Virtex2p architecture, one of the latest FPGA architectures provided by Xilinx. According to our results standard algorithm is the best implementation with respect to area but has overall large latency of 27.059 ns while occupying 541 slices. LOP algorithm improves latency by 6.5% on added expense of 38% area compared to standard algorithm. Far and close data-path implementation shows 19% improvement in latency on added expense of 88% in area compared to standard algorithm. The results clearly show that for area efficient design standard algorithm is the best choice but for designs where latency is the criteria of performance far and close data-path is the best alternative. The standard and LOP algorithms were pipelined into five stages and compared with the Xilinx Intellectual Property. The pipelined LOP gives 22% better clock speed on an added expense of 15% area when compared to Xilinx Intellectual Property and thus a better choice for higher throughput applications. Test benches were also developed to test these algorithms both in simulation and hardware. Our work is an important design resource for development of floating-point adder hardware on FPGAs. All sub components within the floating-point adder and known algorithms are researched and implemented to provide versatility and flexibility to designers as an alternative to intellectual property where they have no control over the design. The VHDL code is open source and can be used by designers with proper reference.