Low-Power Design of Ethernet Data Transmission

2014-03-24 05:40

Journal of Electronic Science and Technology, 2014, Issue 4

Wen-Ming Pan, Qin Zhang, Jia-Feng Chen, Hao-Yuan Wang, and Jia-Chong Kan


Abstract——To address the reliability and power consumption issues of Ethernet data transmission based on the field programmable gate array (FPGA), a low-power design method suitable for FPGA implementation is proposed. To reduce the dynamic power consumption of the integrated circuit (IC) design, the method dynamically controls the clock frequency. Because the port is in the idle or low-rate state most of the time, the reading clock frequency can be reduced, or the reading clock turned off entirely, which lowers the clock toggle rate and thus the dynamic power consumption. When the receiving rate is high, the reading clock frequency is raised in time to ensure that no data are lost. Simulation and verification in ModelSim show that the proposed method can dynamically control the clock frequency, including switching between high-speed and low-speed clock toggle rates and stopping the clock altogether.

Index Terms——Clock frequency, Ethernet, field programmable gate array, low-power consumption.

1. Introduction

Since the IC was born in the 1960s, designers have continually increased its level of integration, from hundreds of thousands of transistors to today's hundreds of millions of transistors on a single chip, which has driven tremendous development in circuit system design. However, as speed and area have been optimized further and further, designers have also had to address power consumption, which is equally important[1]. Meanwhile, because of the ingrained influence of the von Neumann architecture on modern ICs, the instruction set provides flexible usage on the one hand, while on the other hand frequent reads and writes to memory greatly increase power consumption. Although processors can technically be made to work at higher frequencies, power consumption prevents further increases of the clock frequency; this limit is known as the power wall[2]. Many ways have been proposed to reduce power consumption. At the hardware level, there are clock gating, variable-frequency clocks, low-power cell libraries, low-power cache design, and so on. At the operating system level, power-management systems collect state information and use a variety of algorithms to control the hardware devices and reduce power consumption. In practice, Ethernet transmission ports are idle most of the time, which raises reliability and power consumption issues for Ethernet data transmission based on the field programmable gate array (FPGA). We therefore propose a low-power design method suitable for FPGA implementation. The design mainly aims to realize dynamic clock management, combining clock gating with control of the clock frequency; it is suitable for high-speed Ethernet data transmission and achieves low power consumption stably and reliably.

2. Theoretical Background

2.1 Composition of Power Consumption

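The power consumption of a digital IC is mainly composed of two parts: static and dynamic power consumption[3]. Including the start-up power, the total power consumption can be represented by

$$P = P_{\text{boot}} + P_{\text{st}} + P_{\text{dyn}} \tag{1}$$

where $P_{\text{boot}}$ represents the start-up power consumption, $P_{\text{st}}$ denotes the static power consumption, and $P_{\text{dyn}}$ is the dynamic power consumption.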

2.2 Power Consumption Analysis

Static power consumption arises mainly from transistor leakage current, which consists of the leakage current from the source to the drain and the leakage current from the gate to the substrate. Dynamic power consumption arises from charging and discharging node capacitances, and its main parameters are the voltage, node capacitance, and operating frequency. As shown in (2), the dynamic power consumption of a signal node is proportional to the node capacitance and the operating frequency, and to the square of the supply voltage:

$$P_{\text{dyn}} = a\,c\,V_{dd}^{2}\,f \tag{2}$$

where $a$ is the average number of times the node capacitance is charged and discharged per clock cycle, $c$ is the node capacitance, $V_{dd}$ is the operating voltage, and $f$ is the clock frequency[4].
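To make the proportionalities in (2) concrete, the Python sketch below evaluates $P_{\text{dyn}} = a\,c\,V_{dd}^{2}\,f$ for a single node. The activity factor, capacitance, and voltage are hypothetical illustration values, not figures from this paper; only the four clock frequencies are taken from Section 4.3.

```python
def dynamic_power(a: float, c: float, vdd: float, f: float) -> float:
    """Dynamic power of one node per (2): P_dyn = a * c * Vdd**2 * f.

    a   -- average charge/discharge cycles of the node per clock cycle
    c   -- node capacitance in farads
    vdd -- operating voltage in volts
    f   -- clock frequency in hertz
    """
    return a * c * vdd ** 2 * f


if __name__ == "__main__":
    # Hypothetical node parameters, chosen only to illustrate the scaling.
    a, c, vdd = 0.5, 10e-12, 1.2
    for f_mhz in (0, 27, 80, 137):  # reading clock frequencies used in Section 4.3
        p_uw = dynamic_power(a, c, vdd, f_mhz * 1e6) * 1e6
        print(f"{f_mhz:>3} MHz -> {p_uw:7.1f} uW")
```

Because $f$ enters linearly, halving the clock halves the node's dynamic power, while $V_{dd}$ enters quadratically; with the clock off ($f = 0$) the dynamic term vanishes.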

In an FPGA, dynamic power consumption comes mainly from memory, internal logic, the clock network, and input/output (I/O). In typical designs, dynamic power accounts for more than 90% of the power consumption of the entire system, so reducing it is the key to reducing the power consumption of the entire system[5].

3. Design Methods of Reducing Dynamic Power Consumption

According to the composition of power consumption described above, reducing the power consumption of an FPGA means reducing its static and dynamic power consumption. The design in this paper works on the clock of the digital circuit and mainly targets dynamic power consumption. The clock-based methods for reducing dynamic power consumption are briefly introduced below.

3.1 Clock Gating

When a register's data come from a bus, chip-select or clock-enable logic is generally used to control the register; this is known as a register transfer level (RTL) low-power technique. It reduces power consumption mainly by preventing unnecessary transitions of the register. For example, if a set of registers has its clock source turned off, their gates do not switch, so there is no charging or discharging and thus no wasted power[6]. In digital circuits, every clock transition inevitably triggers the sequential elements; in the idle state, the same value is loaded into the downstream registers again and again. A gated clock circuit therefore closes the register clock in time and prevents the clock from triggering the registers. This method can reduce power consumption by 30% to 40%[7].
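The behavioural Python sketch below illustrates how a gated clock suppresses register clocking in idle cycles. It assumes a latch-based gate (the enable is captured while the clock is low and then ANDed with the clock), a common way to build glitch-free clock gating; the function names and the activity trace are hypothetical, and each clock cycle is modelled as one low and one high half-cycle.

```python
def gated_clock(enable_per_cycle: list[bool]) -> list[int]:
    """Behavioural model of a latch-based clock gate.

    Each clock cycle is modelled as a low half followed by a high half.
    The enable latch is transparent while the clock is low and holds its
    value while the clock is high, so the gated output can only pulse for
    whole high phases. Returns the gated clock, one level per half-cycle.
    """
    gclk: list[int] = []
    en_latched = False
    for en in enable_per_cycle:
        # Low half: latch is transparent and follows the enable; output low.
        en_latched = en
        gclk.append(0)
        # High half: latch holds; output = clock AND latched enable.
        gclk.append(1 if en_latched else 0)
    return gclk


def rising_edges(wave: list[int]) -> int:
    """Count 0 -> 1 transitions, i.e. clock edges that reach the registers."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


if __name__ == "__main__":
    # Hypothetical port activity: data arrive in only 3 of 10 cycles.
    busy = [True, False, False, True, False, False, False, True, False, False]
    free_running = [level for _ in busy for level in (0, 1)]
    print("edges without gating:", rising_edges(free_running))      # 10
    print("edges with gating:   ", rising_edges(gated_clock(busy)))  # 3
```

In real hardware the latch is what keeps an enable change during the high phase from truncating or glitching a clock pulse; this discrete model captures only the edge counting, not timing.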

3.2 Clock-Off Control

The clock itself should also be controlled to reduce dynamic power consumption. If some parts of the design are inactive, stopping the clock tree from toggling can be considered instead of using a clock enable. Although a clock enable prevents unnecessary register transitions, the clock tree still toggles and consumes power[7]. It is therefore necessary to control the clock frequency itself. Since the power consumption of complementary metal-oxide-semiconductor (CMOS) circuits depends on the frequency, dynamically turning off the logic clock in the idle state has an obvious energy-saving effect[8].
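The contrast between a clock enable and stopping the clock can be sketched by counting transitions on the clock tree. In this simplified Python model (an illustration, not a measurement from the paper), the clock tree makes one rise and one fall per clock cycle whenever it is running, and the registers load only in active cycles under either scheme; the function names and activity trace are hypothetical.

```python
def clock_tree_transitions(active: list[bool], stop_clock_when_idle: bool) -> int:
    """Number of clock-tree transitions (rise + fall) over the given cycles.

    stop_clock_when_idle=False models a clock-enable design: the enable
    blocks register loads, but the clock tree keeps toggling every cycle.
    stop_clock_when_idle=True models turning the clock off at its source,
    so the tree toggles only in active cycles.
    """
    running_cycles = sum(active) if stop_clock_when_idle else len(active)
    return 2 * running_cycles


def register_loads(active: list[bool]) -> int:
    """Register load events; identical for both schemes."""
    return sum(active)


if __name__ == "__main__":
    # Hypothetical port that is active in 10 of 100 cycles.
    activity = [i % 10 == 0 for i in range(100)]
    print("register loads:", register_loads(activity))                             # 10
    print("tree transitions, enable:", clock_tree_transitions(activity, False))    # 200
    print("tree transitions, off:   ", clock_tree_transitions(activity, True))     # 20
```

Both schemes do the same useful work (10 loads), but only stopping the clock removes the idle-cycle activity of the clock network itself.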

Accordingly, the design combines clock gating with control of the clock frequency to manage the clock dynamically, so that it suits high-speed Ethernet data transmission and achieves low power consumption stably and reliably.

4. Implementation of Dynamic Clock Management

The proposed design implements the dynamic clock management described in Section 3 and has been used for high-speed Ethernet data transmission on an FPGA. Data from the external interface are first written into a first-in first-out (FIFO) buffer, then read out and sent to the downstream module for processing. Since the port is idle or at a low rate most of the time, the system can lower the reading clock frequency, or turn the reading clock off, to reduce its own dynamic power consumption. When the receiving rate rises, the system raises the reading clock frequency in time to ensure that no data are lost.

4.1 Innovation and Advantages

The design reduces clock power by using clock gating techniques[9]. Compared with a single clock-control technique, dynamic clock management combines data-enable gating (the gated clock) with turning the clock off and with control of the clock frequency value, so it covers a wider range of needs while guaranteeing minimal dynamic power consumption without loss of transmitted data. The approach sets the clock frequency according to the amount of data: depending on how much data is buffered, the clock can run at a medium or high frequency, or even be turned off, and the frequency bands can be configured for the specific application. Fig. 1 shows the relationship between the clock frequency and the amount of data stored in the FIFO.

Fig. 1. Dynamic clock frequency setting.

In Fig. 1, $y_1$ and $y_2$ are the deceleration threshold and acceleration threshold, respectively. When the amount of data stored in the FIFO crosses a threshold, the corresponding request is issued. After receiving the request signal, the clock generation unit generates a clock whose frequency corresponds to the current amount of data; if the FIFO is empty, the clock is turned off.

For high-speed Ethernet data transmission, one alternative is for software to monitor whether a data stream is present and issue clock-on or clock-off requests. Dynamic clock management instead triggers clock control directly from the amount of data stored in the FIFO, which allows fast, real-time response and saves more power. Because the function otherwise performed through software interrupt requests is implemented directly in IC circuitry, this architecture considerably improves performance and significantly reduces power consumption while keeping usage flexible. The software interrupt approach needs the clock tree to stay on so that it can check whether data have arrived and decide whether to send an interrupt that turns off the gated clock; the circuit-based approach does not.

4.2 Program Design

As shown in Fig. 2, the dynamic clock management scheme consists mainly of the clock generation module and the status information modules of multiple receiving ports. The system is applied to Ethernet data transmission on an FPGA. Each receiving port receives its own amount of data independently of the others, using a FIFO as the buffer. To control the reading clock frequency of each FIFO, the system automatically monitors the amount of data stored in the FIFO and dynamically issues the corresponding clock requests, ensuring that the FIFO does not overflow and no data are lost even while the reading clock is slowed or stopped to save power.

Fig. 2. Dynamic clock management architecture.

A. Receiving Port Module

Each receiving port contains a receiving FIFO and a state generation module. The writing clock of the receiving FIFO is the interface clock, and different interfaces have different clock frequencies. The reading clock of the receiving FIFO is an internal clock generated and controlled by the clock generation module; controlling this reading clock is what realizes the low-power design. Because user interfaces are idle most of the time, the reading clock is usually off or at a low frequency, which lowers the clock toggle rate and thus the dynamic power consumption.

The state generation module issues acceleration, deceleration, and clock-off requests based on the fill state of the receiving FIFO. It uses the deceleration threshold $y_1$ and the acceleration threshold $y_2$, as shown in Fig. 3. When the amount of data stored in the receiving FIFO is larger than the acceleration threshold $y_2$, the acceleration request is valid, requesting the clock generation module to increase the clock frequency. When the stored data are less than the deceleration threshold $y_1$, the deceleration request is valid, indicating that the clock generation module should reduce the clock frequency. When the amount of stored data is 0, the clock-off request is generated and the clock generation module turns off the clock.
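The threshold rules above can be written as a small combinational function. The Python sketch below is a hypothetical software model of the state generation module that returns the three request flags for a given FIFO fill level; the class, flag names, and example thresholds are assumptions for illustration. It also assumes that an empty FIFO raises only the clock-off request, since the text does not say whether the deceleration request is asserted at the same time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockRequests:
    accelerate: bool  # stored data > y2: ask for a higher reading clock
    decelerate: bool  # 0 < stored data < y1: ask for a lower reading clock
    clock_off: bool   # FIFO empty: ask for the reading clock to stop


def state_generation(stored: int, y1: int, y2: int) -> ClockRequests:
    """Derive the requests from the amount of data stored in the receiving FIFO."""
    if not 0 < y1 < y2:
        raise ValueError("expected 0 < y1 < y2")
    if stored < 0:
        raise ValueError("stored must be non-negative")
    empty = stored == 0
    return ClockRequests(
        accelerate=stored > y2,
        # Assumption: an empty FIFO is signalled by clock_off only.
        decelerate=(not empty) and stored < y1,
        clock_off=empty,
    )


if __name__ == "__main__":
    for level in (0, 10, 40, 100):  # hypothetical fill levels, y1 = 16, y2 = 64
        print(level, state_generation(level, y1=16, y2=64))
```

Between the two thresholds no request is raised, so the clock generation module keeps the current frequency.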

Fig. 3. Threshold distribution.

The value of the acceleration threshold $y_2$ depends on the FIFO depth and the acceleration delay. Assume that the FIFO depth is $M$, the maximum writing rate is $z$, the reading rate before acceleration is $x$, and $t$ is the delay from the generation of the acceleration request until the reading rate reaches $z$, the maximum reading and writing rate. During this delay, the data stored in the FIFO can grow by $(z-x)t$, so to ensure that the FIFO does not overflow and no data are lost, $y_2$ should satisfy

$$y_2 \le M - (z-x)\,t$$
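As a worked check of the overflow condition $y_2 \le M - (z-x)t$, the Python sketch below computes the largest admissible acceleration threshold. The example numbers are hypothetical: rates are expressed in FIFO words per microsecond, assuming one word per clock cycle so that a 125 MHz writing clock corresponds to 125 words/µs, and the 2 µs switching delay is an assumed value rather than a figure from the paper.

```python
def max_acceleration_threshold(m: float, z: float, x: float, t: float) -> float:
    """Upper bound on y2 from y2 <= M - (z - x) * t.

    m -- FIFO depth M (words)
    z -- maximum writing rate (words per unit time)
    x -- reading rate before acceleration (words per unit time)
    t -- delay from the acceleration request until reading reaches z
    """
    # While the request is pending, the FIFO fills at (z - x) words per unit time.
    growth = max(z - x, 0.0) * t
    bound = m - growth
    if bound <= 0:
        raise ValueError("FIFO is too shallow for this acceleration delay")
    return bound


if __name__ == "__main__":
    # Hypothetical: 2048-word FIFO, 125 words/us in, 27 words/us out, 2 us delay.
    print(max_acceleration_threshold(m=2048, z=125.0, x=27.0, t=2.0))  # 1852.0
```

Any $y_2$ at or below this bound leaves room for the data that arrive while the faster clock is being established; a longer switching delay or a lower pre-acceleration reading rate pushes the threshold down.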

To avoid frequent alternation between acceleration and deceleration requests, the deceleration threshold $y_1$ must be less than the acceleration threshold $y_2$, with a sufficient gap between them.
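The need for a gap between $y_1$ and $y_2$ can be illustrated with a two-level controller that switches to the fast clock above $y_2$ and back to the slow clock below $y_1$. The Python sketch below counts frequency switches on a hypothetical fill-level trace that hovers around 32 words; the trace, thresholds, and function name are illustrative assumptions.

```python
def count_switches(fill_levels: list[int], y1: int, y2: int) -> int:
    """Count clock-frequency switches for a two-threshold (hysteresis) controller.

    The controller moves to the fast clock when the fill level exceeds y2 and
    back to the slow clock when it drops below y1; in between it keeps its
    current frequency.
    """
    if y1 > y2:
        raise ValueError("y1 must not exceed y2")
    fast = False
    switches = 0
    for level in fill_levels:
        if not fast and level > y2:
            fast, switches = True, switches + 1
        elif fast and level < y1:
            fast, switches = False, switches + 1
    return switches


if __name__ == "__main__":
    trace = [30, 33, 31, 34, 29, 35, 30, 36, 28, 33]  # hypothetical fill levels
    print("no gap   (y1 = y2 = 32):", count_switches(trace, 32, 32))  # 9
    print("with gap (y1=24, y2=32):", count_switches(trace, 24, 32))  # 1
```

Without a gap, small fluctuations around a single threshold make the clock switch almost every sample; with a gap, the controller switches once and then holds.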

B. Clock Generation Module

The clock generation module consists of phase-locked loop (PLL), clock control, and clock divider units.

The PLL unit generates a 550 MHz high-frequency clock, which serves as the working clock of the clock control unit and the clock divider unit. This unit is generated from a PLL intellectual property (IP) core.

The clock control unit receives the acceleration and deceleration requests from all ports. According to the current situation of each port, it generates the frequency division factor for that port's clock and sends it to the clock divider unit.

The clock divider unit generates the working clock required by each port from the 550 MHz clock of the PLL unit and the frequency division factor from the clock control unit. The clock divider unit thus allows the clock to be controlled dynamically, keeping the clock frequency as low as possible without FIFO overflow or data loss. As discussed in Section 2, the power consumed in the registers, including short-circuit current, grows in proportion to the toggle frequency of the clock distribution network, so minimizing the clock frequency minimizes this power consumption.
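The Python sketch below shows one possible way for the clock control unit to choose an integer division factor for the 550 MHz PLL clock, and for a counter-based divider to derive the port clock. Choosing the largest integer $N$ with $550/N$ not below the requested frequency is an assumption made here so that the divided clock is never slower than requested; the paper does not specify how its division factor is computed, and this simple counter gives an exact 50% duty cycle only for even $N$.

```python
import math

PLL_MHZ = 550.0  # PLL output frequency used in the paper


def division_factor(required_mhz: float, source_mhz: float = PLL_MHZ) -> int:
    """Largest integer N such that source_mhz / N >= required_mhz."""
    if not 0 < required_mhz <= source_mhz:
        raise ValueError("required frequency must be in (0, source]; use clock-off for 0")
    return math.floor(source_mhz / required_mhz)


def divide_clock(n: int, input_cycles: int) -> list[int]:
    """Counter-based divider: output level for each PLL clock cycle.

    The output is high for the first n // 2 of every n input cycles, giving
    one output period per n input cycles.
    """
    if n < 2:
        raise ValueError("this sketch requires n >= 2")
    levels: list[int] = []
    count = 0
    for _ in range(input_cycles):
        levels.append(1 if count < n // 2 else 0)
        count = (count + 1) % n
    return levels


if __name__ == "__main__":
    for f in (27.0, 80.0, 137.0):  # reading clock frequencies from Section 4.3
        n = division_factor(f)
        print(f"request {f:5.1f} MHz -> N = {n:2d}, actual {PLL_MHZ / n:6.2f} MHz")
```

With plain integer division some requests are only approximated (80 MHz becomes about 91.7 MHz here, while 27 MHz and 137 MHz become 27.5 MHz and 137.5 MHz), so a real design may use a finer division scheme.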

4.3 Case Verification

The proposed design is illustrated with Ethernet data transmission as an example. Two programs are compared in the experiment. Program A is a traditional design with a fixed reading clock frequency; program B is the proposed design with dynamic clock management. To prevent the Ethernet transmission data FIFO from overflowing and losing data, program A must use a reading clock faster than 125 MHz; its clock frequency is set to 137 MHz, the same as the accelerated frequency in program B. Let the maximum dynamic power of program A be $S$ and its total power consumption be $W_a$. In program B, the reading clock frequency depends on the threshold band in which the FIFO fill level lies. The writing clock of the Ethernet transmission data FIFO is 125 MHz. The reading clock generated after a deceleration request is 27 MHz, the reading clock in the middle frequency band is 80 MHz, the reading clock after acceleration is 137 MHz, and the output clock after a clock-off request is 0. Each clock frequency has a different dynamic power consumption. Let the total power consumption of program B be $W_b$; program B is implemented with these four frequencies. The expression of $W_b$ is

where $x$ represents the amount of data currently stored in the FIFO, and $k_1$ and $k_2$ are constants. The relationship is sketched in Fig. 4.

Fig. 4. Power consumption.

Let the operating time of the whole system be $T$, of which the clock-off time is $t_0$, the low-speed running time is $t_1$, the medium-speed running time is $t_2$, and the high-speed running time is $t_3$. Because program A always runs at its fixed maximum frequency, its total dynamic power consumption is

$$W_a = S\,T$$

The total dynamic power consumption of program B is

$$W_b = 0\cdot t_0 + P_L\,t_1 + P_M\,t_2 + S\,t_3$$

where $P_L$ and $P_M$ are the dynamic power consumption of program B in the low- and medium-frequency bands, respectively.

Based on our team's analysis of internal data on Ethernet channel utilization, the Ethernet ports are idle or at a low rate for most of the time while messages are being sent, so we assume in general that $t_0 = 60\%T$, $t_1 = 20\%T$, $t_2 = 10\%T$, and $t_3 = 10\%T$.

As noted above, the higher the clock frequency, the greater the dynamic power consumption. Since (2) makes dynamic power proportional to $f$, the maximum low-band and mid-band power consumption of program B can be estimated with the maximum dynamic power $S$ of program A (at 137 MHz) as the reference:

$$P_L \approx \frac{27}{137}\,S \approx 19.7\%\,S, \qquad P_M \approx \frac{80}{137}\,S \approx 58.4\%\,S$$

Then:

$$W_b \approx 0.2T\cdot\frac{27}{137}S + 0.1T\cdot\frac{80}{137}S + 0.1T\cdot S \approx 0.198\,S\,T \approx 20\%\,W_a \tag{4}$$

Formula (4) shows that program B, which adopts dynamic clock management, saves about 80% of the dynamic power consumption of program A.
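The arithmetic behind (4) can be reproduced with the short Python sketch below. It assumes that dynamic power scales linearly with the reading clock frequency, per (2), with $S$ reached at 137 MHz, and it uses the time fractions $t_0$ to $t_3$ assumed above. Modelling the two-state design of [10] as running at 137 MHz whenever its clock is on is likewise an assumption made for this comparison; the function and variable names are hypothetical.

```python
def relative_dynamic_energy(profile: list[tuple[float, float]],
                            f_ref_mhz: float = 137.0) -> float:
    """Total dynamic energy relative to W_a = S * T.

    profile -- (fraction of T, clock frequency in MHz) pairs; fractions sum to 1
    Power in each band is taken as S * f / f_ref, following (2).
    """
    if abs(sum(frac for frac, _ in profile) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(frac * f_mhz / f_ref_mhz for frac, f_mhz in profile)


program_a = [(1.0, 137.0)]                                        # fixed clock
program_b = [(0.6, 0.0), (0.2, 27.0), (0.1, 80.0), (0.1, 137.0)]  # t0..t3
two_state = [(0.6, 0.0), (0.4, 137.0)]  # clock either off or at full speed

if __name__ == "__main__":
    for name, prof in (("A", program_a), ("B", program_b), ("two-state", two_state)):
        print(f"{name:>9}: {relative_dynamic_energy(prof):.3f} W_a")
```

The result for program B, about 0.198 $W_a$, corresponds to the roughly 80% saving, and the two-state case reproduces the 40% $W_a$ figure given for the design of [10].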

Among clock-off designs, Xu et al. proposed a low-power multi-core digital signal processing (DSP) design for software defined radio (SDR) applications[10], which implements multi-core clock control to achieve low power consumption. However, its clock has only two states, off and on, without the medium- and low-frequency states of the proposed design. Using the calculation method above, the power consumption of the system proposed by Xu et al. is 40% $W_a$, which is 20% $W_a$ higher than that of the proposed design.

5. Conclusions

This paper implements a low-power design for Ethernet data transmission with dynamic clock management on an FPGA. The proposed system generates a clock frequency corresponding to the amount of data stored in the FIFO and can switch between off, low-speed, medium-speed, and high-speed operation. The frequency bands can be adjusted according to the FIFO depth and other specific circumstances. The design is completed by the clock generation module working together with the receiving port modules. It uses the lowest clock frequency that still guarantees no loss of transmitted data, and minimizes dynamic power consumption by eliminating unnecessary register transitions. The proposed system achieves reliable, low-power Ethernet data transmission on FPGA.

[1] F.-M. Sun, H.-Y. Wang, F. Wu, and X.-Y. Li, “Survey of FPGA low power design,” in Proc. of Int. Conf. on Intelligent Control and Information Processing, Dalian, 2010, pp. 547-550.

[2] W.-S. Jun and S.-J. Wei, “Research and progress of low-power design in SoC era,” Microelectronics, vol. 35, no. 2, pp. 174-179, 2005 (in Chinese).

[3] P. Kitsos, “Low power FPGA implementations of 256-bit Luffa Hash function,” in Proc. of the 13th Euromicro Conf. on Digital System Design: Architectures, Methods and Tools, Washington DC, 2010, pp. 416-419.

[4] Y.-J. Song, T.-F. Xu, G.-Q. Ni, K. Gao, and Q. Wang, “Low-power consumption image fusion system based on Virtex_4 FPGA,” Optics and Precision Engineering, vol. 15, no. 6, pp. 935-940, Jun. 2007 (in Chinese).

[5] X. Han and W.-C. Guo, “Research on low-power consumption design of FPGA,” Microcontrollers & Embedded Systems, vol. 3, pp. 9-11, Jan. 2010 (in Chinese).

[6] A. Natkha, J. Palicot, P. Leray, and Y. Louet, “Leakage power consumption in FPGAs: thermal analysis,” in Proc. of Int. Symposium on Wireless Communication Systems, Paris, 2012, pp. 606-610.

[7] Y.-P. Liang, “Review of digital low-power consumption design,” Advanced Technology Research, vol. 3, pp. 47-50, Apr. 2009 (in Chinese).

[8] G. Larri, “ARM810: Dancing to the beat of a different drum, hotchips 8: A symposium on high-performance chips,” Ph.D. dissertation, Stanford University, Palo Alto, 1996.

[9] B. Pandey, J. Yadav, J. Kumar, and R. Kumar, “Clock gating aware low power global reset ALU and implementation on 28 nm FPGA,” in Proc. of the 5th Int. Conf. on Computational Intelligence and Communication Networks, Mathura, 2013, pp. 413-417.

[10] L. Xu, S.-B. Shi, and Q. Wang, “Low-power design and implementation on multi-core DSP in SDR platform,” Journal of University of Electronic Science and Technology of China, vol. 41, pp. 136-141, Jan. 2012 (in Chinese).

Wen-Ming Pan was born in Guangdong, China in 1982. He received the B.S. and M.S. degrees in electronic engineering from Jinan University, Guangzhou in 2004 and 2007, respectively. He works as a research assistant with Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences. He has been engaged in the FPGA design research for years. His research interests include networks-on-chip and parallel computing.

Qin Zhang was born in Hubei, China in 1989. She received her B.S. degree from Hainan Normal University, Hainan. Currently, she is pursuing her M.S. degree in electronic and communication engineering with South China Normal University, Guangzhou. Her research interest is digital image processing.

Jia-Feng Chen was born in Guangdong, China in 1993. He is pursuing the B.S. degree with Guangdong University of Technology, Guangzhou. His research interests include image processing and parallel computing.

Hao-Yuan Wang was born in Guangdong, China in 1991. He is pursuing his B.S. degree (expected in 2015) with South China Normal University, Guangzhou. His research interests include image processing and parallel computing.

Jia-Chong Kan was born in Liaoning, China in 1990. He received his M.S. degree in circuit and system from South China Normal University, Guangzhou in 2014. He works as an intern with the Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences. His research interests include image processing and parallel computing.

Manuscript received August 11, 2014; revised November 12, 2014. This work is supported by the Natural Science Foundation of China under Grant No. 61376024 and No. 61306024, Natural Science Foundation of Guangdong Province under Grant No. S2013040014366, and Basic Research Programme of Shenzhen under Grant No. JCYJ20140417113430642 and No. JCYJ20140901003939020.

W.-M. Pan is with the Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou 511400, China (Corresponding author e-mail: wm.pan@giat.ac.cn).

Q. Zhang, H.-Y. Wang, and J.-C. Kan are with the School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China.

J.-F. Chen is with the School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China.

Digital Object Identifier: 10.3969/j.issn.1674-862X.2014.04.006
