Platform-Based Electronic System-Level (ESL) Synthesis

There was an interesting talk earlier today at Microsoft Research by Jason Cong of the UCLA Computer Science Department on compiling design specifications in C/C++/SystemC, along with user constraints, into ASIC and FPGA designs. The advantages of compiler-based approaches include higher productivity from working at a higher level, automated verification, room for optimization, and rapid experimentation at different frequencies and with different optimization goals (performance vs. power, for example). As design complexity increases, higher-level languages and optimization have excellent potential. Essentially the same thing is happening in hardware design as happened 30 years ago with operating system implementation languages: higher-level implementation languages replace lower-level ones as complexity climbs. For example, the Intel Core 2 Duo processor is a 1B-transistor implementation, whereas the 386 was 1/1000th of that complexity at 1M.

Also super interesting was the example of a financial institution that takes the hottest parts of a software-based stock analysis system and compiles them to FPGA implementations: 30x faster at 1/10th the power. Very cool.
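Because energy is power multiplied by time, the speed-up and the power reduction compound. The sketch below uses a hypothetical helper, `energy_ratio`. It assumes that the quoted wattages are average power over the run and that the 30x speed-up applies to the same workload. The 68 W and 6 W figures are the ones given in the talk notes further down.

```python
def energy_ratio(speedup, sw_power_w, hw_power_w):
    """Software energy divided by accelerator energy for the same workload.

    Energy = power * time. If software takes time T, the accelerator takes
    T / speedup, so the ratio is (sw_power * T) / (hw_power * T / speedup).
    """
    return speedup * sw_power_w / hw_power_w


# Figures from the talk notes: 30x speed-up, 68 W (software) vs 6 W (FPGA).
print(energy_ratio(30, 68, 6))   # 340.0
# Using the rounded "1/10th the power" figure instead:
print(energy_ratio(30, 10, 1))   # 300.0
```

Under those assumptions, each analysis run would use roughly 300x less energy, not just 10x less.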

Now that AMD supports HyperTransport, it is possible to implement custom processors with excellent overall system performance. Intel has opened up the FSB and is also expected to offer a non-compatible, HyperTransport-like interconnect in the future.

My rough notes from the talk follow:

· Speaker: Jason Cong, UCLA Computer Science (cong@cs.ucla.edu)

· Working on on-chip interconnects & communications

o 3D IC design

o RF-interconnects

§ Note that power constraints limit processors to ~5 GHz

· But communication links can scale to hundreds of GHz

· Divide each communication link into 10 or more “channels” that operate at different frequencies

· This talk focused on an ESL SystemC-to-FPGA compiler

· Why?

o 700,000 lines of RTL for a 10M gate design is too much

o Allows executable specification

o Verification requires executable design

o Accelerated or reconfigurable computing also needs C/C++-based compilation/synthesis to FPGAs

§ CPUs coupled with FPGAs to run common functions at higher performance and lower power

· Note that performance is limited by communication (getting data to the CPU)

o Long wires that must be traversed in a single clock cycle are the limiting factor

o This research focuses on supporting multi-cycle communications

· xPilot: Behavior-to-RTL (Register Transfer Level design) synthesis flow

o Takes a behavioral spec in C/SystemC through a front-end compiler to SSDM

o SSDM is optimized using standard compiler optimizations (loop unrolling, strength reduction, scheduling, etc.)

o SSDM is compiled to:

§ Verilog/VHDL/SystemC

§ FPGAs: Altera, Xilinx

§ ASICs: Magma, Synopsys

o UPS: Uniform Power Specification

o During final compilation, optimize for power: shut off compute units that are not being used, and shut off units that are in use during their idle periods (a busy disk controller is frequently waiting on the mechanicals and does not need to execute instructions)

§ Can’t shut off units on FPGAs, but can with ASICs

§ The only solution to leakage power is to shut the component off

o Allows faster experimentation than hand coding. You can try different frequencies and different power optimizations (too complex for most humans)

o Scheduling (allocation of operations to compute logic and specific clock cycles) is NP-complete, and automated techniques can exceed the quality of expert designs

· Example:

o Schedule the behavior to RTL using the following characterization, cycle time, constraints, and objectives:

§ Platform characterization: adder (2ns) & multiplier (5ns)

§ Target cycle time (10ns)

§ Resource constraint: only one multiplier is available

§ Objective: high performance or low power, for example

· Note that optimizing and reducing component counts requires less area, which can allow faster clocking

· Investigating compilation for Reconfigurable Accelerated Computing

o Take a GCC 3DES implementation and synthesize an FPGA RTL description

o Example took 3DES from a C-level implementation to an FPGA (Xilinx Virtex-5)

· An investment bank is using this tool to compile financial optimizations from a software implementation to FPGA accelerators (Black-Scholes software kernel)

o 30x speed-up over the software implementation at roughly 1/10 the power (6 W vs 68 W)
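The scheduling example in the notes can be made concrete with a small sketch. The notes do not record the behavior being scheduled, so the dataflow graph here (y = a*b + c*d + e) is hypothetical, as are the names `Op` and `list_schedule`. The sketch uses greedy list scheduling with operation chaining, which is one common heuristic for this NP-complete problem and not necessarily what xPilot does. Operations are packed into a 10 ns cycle as long as the chained adder (2 ns) and multiplier (5 ns) delays fit, and at most one multiplication is issued per cycle.

```python
from dataclasses import dataclass

# Platform characterization from the talk's example (ns). The graph below is hypothetical.
DELAY_NS = {"add": 2.0, "mul": 5.0}
# Only one multiplier is available; kinds not listed here are treated as unconstrained.
MAX_UNITS = {"mul": 1}


@dataclass(frozen=True)
class Op:
    name: str
    kind: str            # "add" or "mul"
    preds: tuple = ()    # names of operations whose results this one consumes


def list_schedule(ops, cycle_ns=10.0, delays=DELAY_NS, max_units=MAX_UNITS):
    """Greedy resource-constrained list scheduling with operation chaining.

    `ops` must be in topological order. Returns {name: (cycle, finish_ns)},
    where finish_ns is when the result settles within its cycle.
    """
    if any(delays[op.kind] > cycle_ns for op in ops):
        raise ValueError("an operation does not fit in one clock cycle")
    sched = {}
    remaining = list(ops)
    cycle = 0
    while remaining:
        used = {}  # functional units of each kind already busy in this cycle
        progress = True
        while progress:  # keep sweeping until nothing more fits in this cycle
            progress = False
            for op in list(remaining):
                if any(p not in sched for p in op.preds):
                    continue  # an operand is not available yet
                # Chaining: start after any predecessor finishing earlier in this cycle.
                start = max((sched[p][1] for p in op.preds if sched[p][0] == cycle),
                            default=0.0)
                finish = start + delays[op.kind]
                limit = max_units.get(op.kind, float("inf"))
                if finish > cycle_ns or used.get(op.kind, 0) >= limit:
                    continue  # would miss the clock edge, or no free unit
                sched[op.name] = (cycle, finish)
                used[op.kind] = used.get(op.kind, 0) + 1
                remaining.remove(op)
                progress = True
        if not used:
            raise ValueError("no progress: check predecessor names")
        cycle += 1
    return sched


# Hypothetical behavior: y = a*b + c*d + e
ops = [
    Op("m1", "mul"),
    Op("m2", "mul"),
    Op("s1", "add", ("m1", "m2")),
    Op("s2", "add", ("s1",)),
]
schedule = list_schedule(ops)
print(schedule)
print("cycles:", max(c for c, _ in schedule.values()) + 1)  # cycles: 2
```

With a 10 ns cycle, the single multiplier forces the two multiplications into separate cycles. The second multiply and both additions then chain into cycle 1, giving a two-cycle schedule. Re-running with `cycle_ns=5.0` stretches the schedule to three cycles. This is the kind of frequency experiment the talk described as cheap to try in a compiler.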

A related presentation: http://cadlab.cs.ucla.edu/~cong/slides/fpt05_xpilot_final.pdf.

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
