Applications such as wireless base stations, radar signal processing, fingerprint identification, and software radio all require extremely high processing power. These new high-performance DSP applications push the limits of what a standalone processor can deliver, and hardware solutions keep evolving to raise performance.
In the early 1990s, designers faced the challenge of combining multiple processors to gather enough processing power to meet their performance requirements. However, coordinating the functions of multiple processors makes system-level design extremely difficult, and the approach is both costly and wasteful.
When the first FPGAs capable of implementing DSP functions appeared, DSP designers began using them to augment the processor. In this approach, the FPGA supplements the processor by accelerating the performance-critical parts of the DSP algorithm.
Today, DSP-oriented FPGAs such as the Xilinx Virtex-4 or the Altera Stratix II offer great potential and improve performance through parallelism. In fact, DSP-specific FPGA technology offers a performance advantage of 100 times over other implementations (table 1).
Figure 1: An FPGA provides 100 times the MACOPS of a DSP processor. MACOPS (multiply-accumulate operations per second) is the product of the clock frequency and the number of multipliers.
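The MACOPS definition in the caption can be checked with a few lines of arithmetic. The sketch below is only an illustration: the clock frequencies and multiplier counts are hypothetical round numbers chosen to produce a 100x ratio, not figures taken from any vendor's data sheet.

```python
def macops(clock_hz: float, num_multipliers: int) -> float:
    """Peak multiply-accumulate operations per second:
    clock frequency times the number of multipliers."""
    return clock_hz * num_multipliers


# Hypothetical devices, for illustration only (not vendor data).
processor_macops = macops(clock_hz=1e9, num_multipliers=1)
fpga_macops = macops(clock_hz=400e6, num_multipliers=250)

ratio = fpga_macops / processor_macops
print(f"processor: {processor_macops:.3g} MACOPS")
print(f"FPGA:      {fpga_macops:.3g} MACOPS")
print(f"ratio:     {ratio:.0f}x")
```

Even at a lower clock frequency, the device with many multipliers working in parallel comes out far ahead, which is the point the figure makes.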
Therefore, it is becoming increasingly common to pair a standard DSP with an FPGA, and the number of designs that use FPGAs this way is expected to grow rapidly.
Design challenge
However, this powerful hardware capability leaves designers with the problem of how to implement FPGA-based DSP systems effectively. Such large, complex designs challenge the traditional DSP design method, largely because the traditional FPGA design flow, when applied to DSP, does not make full use of two key elements of an efficient design process: synthesis technology and portable IP.
Engineers who design ASICs with synthesis technology are well aware of its advantages. For FPGA-based DSP this technology is key: it lets the design be captured at a high level of abstraction and automatically explores the trade-off between area and performance. Combining fast, high-abstraction design entry with automation yields not just a single implementation but a range of alternative implementations.
For applications where performance takes precedence over area, an implementation with hundreds of multipliers may be required. Such an implementation is very fast, but it also consumes a lot of silicon area. Conversely, for applications that are more sensitive to area, the implementation should use fewer, lower-performance multipliers to obtain a smaller area. Trade-offs of this kind are essential to developing advanced FPGA-based DSP, so powerful tools are needed.
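To make the area/performance trade-off concrete, the following sketch estimates how many multipliers a FIR filter needs at different sample rates. It is a simplified model under stated assumptions: the filter length, clock frequency, and sample rates are hypothetical, and the function `multipliers_needed` is an illustrative helper, not part of any synthesis tool.

```python
import math


def multipliers_needed(taps: int, sample_rate_hz: float, clock_hz: float) -> int:
    """Multipliers required for a `taps`-tap FIR filter when each multiplier
    can be reused floor(clock / sample_rate) times per input sample."""
    if sample_rate_hz > clock_hz:
        raise ValueError("sample rate above clock rate is outside this simple model")
    reuse = math.floor(clock_hz / sample_rate_hz)
    return math.ceil(taps / reuse)


# Hypothetical design: 128-tap filter on a 200 MHz fabric clock.
TAPS = 128
CLOCK_HZ = 200e6

design_points = {
    rate: multipliers_needed(TAPS, rate, CLOCK_HZ)
    for rate in (200e6, 50e6, 12.5e6, 1.5625e6)
}

for rate, mults in design_points.items():
    print(f"{rate / 1e6:9.4f} Msample/s -> {mults:3d} multipliers")
```

The fastest design point needs one multiplier per tap, while the slowest can share a single multiplier across all taps; a synthesis tool explores design points like these automatically.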
Another key element of efficient DSP development is having suitable building blocks, or IP. IP suited to these applications has two main attributes: scalability and portability.
Compared with similar IP of narrower applicability, scalable IP lets designers build customized IP functions without sacrificing efficiency. The resulting functional module remains efficient because unused or unnecessary parts are optimized away during subsequent synthesis.
Portability also contributes to efficiency. DSP designers must be able to design an algorithm and run it on any FPGA vendor's devices without modification. This portability gives designers the freedom to choose the best implementation.
DSP verification brings its own challenges. Signal debugging and analysis become more complicated when verifying DSP, and go beyond inspecting time-domain plots, frequency-domain plots, and scatter plots. Because the characteristics of a digital signal depend on its sampling time and discrete amplitude, DSP verification tools must be able to define and manipulate time effectively in multi-rate DSP applications.
In addition, these tools must make it easy to switch from full-precision floating-point simulation to finite-word-length fixed-point simulation. They also need a language for modeling DSP algorithms with native support for concepts such as time, fixed-point resources, and parallelism.
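The switch from floating-point to finite-word-length simulation can be sketched in a few lines. This is a minimal illustration, not how any particular verification tool works: the 16-bit word with 15 fractional bits, the helper `to_fixed`, and the coefficient and sample values are all assumptions made for the example.

```python
def to_fixed(x: float, frac_bits: int, word_bits: int) -> int:
    """Quantize x to a signed two's-complement integer, saturating on overflow.
    Python's round() rounds exact halves to the nearest even integer."""
    scaled = round(x * (1 << frac_bits))
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


FRAC_BITS = 15  # assumed format: 16-bit word, 15 fractional bits
WORD_BITS = 16

# Hypothetical filter coefficients and input samples.
coeffs = [0.1, 0.25, 0.4, 0.25]
samples = [0.5, -0.75, 0.3, 0.9]

# Full-precision floating-point reference.
float_result = sum(c * s for c, s in zip(coeffs, samples))

# Fixed-point version: integer products carry 2 * FRAC_BITS fractional bits.
acc = sum(
    to_fixed(c, FRAC_BITS, WORD_BITS) * to_fixed(s, FRAC_BITS, WORD_BITS)
    for c, s in zip(coeffs, samples)
)
fixed_result = acc / (1 << (2 * FRAC_BITS))

error = abs(float_result - fixed_result)
print(f"float: {float_result:.8f}  fixed: {fixed_result:.8f}  error: {error:.2e}")
```

Running the same stimulus through both versions and comparing the results is the kind of check a designer repeats while choosing word lengths.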
Synthesis
Recent advances in design technology offer an attractive way to address the unique challenges DSP designers face. Simulink, from MathWorks, is a system design environment based on mathematical models that gives DSP designers powerful modeling and simulation capabilities. The environment handles DSP issues such as defining and managing multi-rate discrete time and running floating-point simulation from a single source.
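As a small illustration of what managing multi-rate discrete time involves, the sketch below tracks the sample rate through a chain of rate-changing stages using exact rational arithmetic. It does not use or imitate the Simulink API; the stage names, rate-change factors, and input rate are hypothetical.

```python
from fractions import Fraction


def stage_rates(input_rate_hz: int, stages):
    """Return (name, output sample rate) for each stage in a chain.
    Each stage is (name, up, down): interpolate by `up`, decimate by `down`."""
    rate = Fraction(input_rate_hz)
    rates = []
    for name, up, down in stages:
        rate = rate * up / down
        rates.append((name, rate))
    return rates


# Hypothetical receive chain, for illustration only.
chain = [
    ("half-band decimator", 1, 2),
    ("decimate by 5", 1, 5),
    ("rational resampler 3/5", 3, 5),
]

rates = stage_rates(64_000_000, chain)
for name, rate in rates:
    print(f"{name:24s} {float(rate) / 1e6:8.3f} Msample/s, period {1 / rate} s")
```

Exact fractions avoid the rounding drift that would otherwise creep into sample periods when many rate changes are chained.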
Figure 2: DSP design flow based on FPGA.
For FPGA implementation, DSP synthesis is the key innovation that links DSP verification to an optimal DSP implementation. With the capabilities built into the Synplify DSP tool, designers can examine implementation trade-offs and complete target mapping in an automated, device-independent way.
Combining DSP synthesis with Simulink brings the expertise of system architects and hardware designers into a common environment. System architects can create a vendor-independent model in Simulink and keep the design entry point at the purely algorithmic level, so that they can focus on higher-level design work.
When the model is handed over to the hardware designer, the specification carries no architectural implications. As long as the DSP verification tool in the modeling environment integrates seamlessly with the synthesis engine, the hardware designer can explore architectural trade-offs without modifying the verification source.
Because both teams work from the same verification source, system architects need not worry about hardware implementation, and hardware designers need not struggle to understand the DSP algorithm specification. This also preserves the integrity and optimization of the design and makes both teams more productive.
The key to this design method is the use of a generic DSP library. Vendor-specific IP drags algorithm design into unnecessary implementation details. With a generic DSP function library that is independent of architectural parameters, the design produces its output according to the high-level specification.
With an advanced function library, even the latency associated with a DSP function can be deferred to the architecture optimization stage, which DSP synthesis makes possible. Innovations such as DSP synthesis, Simulink, and portable libraries are all key to improving DSP design, but integrating them into a common methodology is equally important. The best DSP design process adds a generic library and the integration of DSP synthesis with Simulink to existing design capabilities (see Figure 2).
When writing the specification, system architects only need to work at the level of pure algorithmic abstraction. Using functional blocks, designers can capture the algorithm in terms close to DSP concepts.
Later in the design process, algorithm verification becomes straightforward because Simulink serves as the DSP verification environment. Visualization, debugging, and a built-in accelerator make it easier to simulate discrete-time designs quickly.
The engine of this design method is DSP synthesis, which works toward system-level goals such as area and performance. This step aims to create an architecture that achieves the required performance with the fewest resources. By applying appropriate system-level optimizations, such as folding, system-level retiming, and adding latency, DSP synthesis can meet system-level performance goals.
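Folding, one of the optimizations named above, can be illustrated with a behavioral model: the same FIR output is computed either with one multiplier per tap in a single cycle or with a few multipliers reused over several cycles. This is a sketch of the general technique only, not the algorithm any particular DSP synthesis tool uses; the function names and data are hypothetical.

```python
def fir_parallel(coeffs, window):
    """One multiplier per tap: all products are formed in the same clock cycle."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_folded(coeffs, window, num_multipliers):
    """Time-multiplex the taps over `num_multipliers` multipliers.
    Returns (result, clock cycles spent on one output sample)."""
    acc = 0
    cycles = 0
    for start in range(0, len(coeffs), num_multipliers):
        stop = start + num_multipliers
        # Products issued during this clock cycle.
        for c, x in zip(coeffs[start:stop], window[start:stop]):
            acc += c * x
        cycles += 1
    return acc, cycles


# Hypothetical integer coefficients and input window.
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
window = [8, 7, 6, 5, 4, 3, 2, 1]

reference = fir_parallel(coeffs, window)
folded_result, folded_cycles = fir_folded(coeffs, window, num_multipliers=2)
print(reference, folded_result, folded_cycles)
```

The folded version produces the same value with a quarter of the multipliers but needs four cycles per output, which is the resource-for-throughput exchange that folding makes.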
The final architecture is emitted as synthesizable, vendor-independent RTL code. Because the design remains vendor-independent, the full capabilities of RTL synthesis tools can be used for further design optimization.
Compared with the traditional design flow, the DSP design method described above has clear advantages. As design size grows, the DSP synthesis flow outperforms the traditional method precisely because the algorithm is specified without latency and without manual time synchronization across multiple paths.
Comparing the results of DSP synthesis with those of the traditional flow shows that DSP synthesis yields improvements under a range of optimization settings. When no advanced optimization is performed during DSP synthesis, any gains come mainly from RTL synthesis. Even without DSP synthesis optimizations, the number of logic cells used decreased and performance improved across all test circuits.
Several optimization scenarios need to be considered. When resource sharing is allowed, the goal is usually a significant improvement in resource utilization, even at some cost in performance. The test circuits confirm that resource consumption can be reduced significantly, at the cost of a noticeable drop in performance.
This optimization is best suited to cases where resources are limited but some performance loss is acceptable. Retiming is another optimization that improves the performance obtained from DSP synthesis. Although it may consume more resources, it improves performance significantly compared with both unoptimized DSP synthesis and the traditional design method.
To meet timing, some DSP synthesis solutions redistribute registers at the architectural level and introduce additional pipeline stages. This high-level retiming can be complemented by gate-level retiming; combining the two gives the best optimization results, with a clear performance gain and no increase in resources.
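The effect of moving registers can be seen in a toy timing model: a chain of combinational blocks with fixed delays, split into pipeline segments by registers. The sketch below searches for the register placement with the shortest critical path while keeping the register count fixed. The delays are hypothetical, and the brute-force search is only practical for tiny examples; real retiming works on general circuit graphs.

```python
from itertools import combinations


def critical_path(delays, cuts):
    """Longest register-to-register delay when a register follows each block
    index listed in `cuts` (positions 1 .. len(delays) - 1, ascending)."""
    bounds = [0, *cuts, len(delays)]
    return max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))


def best_retiming(delays, num_registers):
    """Try every placement of `num_registers` registers and keep the best."""
    positions = range(1, len(delays))
    return min(
        combinations(positions, num_registers),
        key=lambda cuts: critical_path(delays, cuts),
    )


# Hypothetical block delays in nanoseconds.
delays_ns = [4, 3, 5, 4]

original = (1,)  # register placed after the first block
retimed = best_retiming(delays_ns, num_registers=1)

print("original critical path:", critical_path(delays_ns, original), "ns")
print("retimed  critical path:", critical_path(delays_ns, retimed), "ns")
```

In this model the register count stays at one while the critical path shortens, so the maximum clock frequency rises without extra registers. In real designs a moved register may change width, so the resource cost is not always zero.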
Author: Andrew Dorman
Vice president of applied engineering
Dirk Seinhaff
Director of DSP application engineering
New city company
DSP design process from top to bottom to physical realization
Dan Ganousis, the electronic design application of AccelChip Company.
The world is at the start of the next wave of rapid high-tech growth. DSP has become the industry's recognized technical focus and is expected to grow exponentially. At present, most DSP designs are implemented on general-purpose DSP chips from semiconductor manufacturers such as TI, ADI, and Freescale. General-purpose processors are relatively cheap and are supported by high-quality, inexpensive programming tools that make implementing DSP algorithms quick and convenient, and developers value being able to reprogram them during prototyping and debugging.
Figure 1 Comparison between the performance of general DSP processor and the processing performance of DSP needed in communication field
Demand for speed
The performance requirements of electronic systems now exceed the capabilities of general-purpose DSP processors. Figure 1 shows the gap between the DSP algorithm performance demanded by the broadband network market and the performance of general-purpose DSP processors. The gap between what general-purpose DSPs can deliver and what new broadband communication technologies require is widening exponentially.
Traditionally, the only way for DSP developers to exceed the performance of general-purpose DSP processors was to move the DSP algorithms into an ASIC for hardware acceleration. However, the ASIC approach is very difficult to carry out. Implementing a DSP algorithm in an ASIC sacrifices reprogrammability and also entails high non-recurring engineering (NRE) costs, long prototype turnaround, and the purchase of many expensive IC design tools.
With the introduction of advanced FPGA architectures such as Xilinx Virtex-II and Altera Stratix-II, DSP designers gained a new class of hardware that combines the advantages of general-purpose DSP processors with the performance of ASICs. These FPGA architectures can optimize DSP implementations and provide the processing power that today's electronic systems need.
The advantage of an FPGA is that it lets DSP designers fit the architecture to the algorithm: designers can exploit as much of the FPGA's parallel resources as system performance demands. In a general-purpose DSP processor, by contrast, resources are fixed, because each processor contains only a limited number of basic arithmetic units such as multipliers. The designer must fit the algorithm to the architecture and therefore cannot reach the performance attainable in an FPGA.
Figure 2 Global DSP Revenue Forecast
Highlights of semiconductor industry
Figure 2 shows the annual revenue forecast for the overall DSP market and for the algorithm-on-chip market (FPGAs, structured ASICs, and ASICs). The algorithm-on-chip market is forecast to grow by more than 42% per year over the next three years, making it the fastest-growing segment of the semiconductor industry.
The challenges DSP design teams now face resemble those ASIC designers faced in the 1990s: how to move from general-purpose DSPs to a design method that targets FPGAs, how to develop the new design skills required, how to improve the company's design process, and how to introduce a new way of implementing DSP algorithms without endangering current product development plans. Perhaps more importantly, how can managers minimize the risk of a catastrophic outcome?
AccelChip believes the future of DSP depends on adopting new design methods that let companies meet the DSP market's demanding time-to-market and cost requirements. As happened with ASICs and FPGAs, the transformation of DSP design lies in adopting a true top-down design flow.
Figure 3 Traditional DSP design flow
Traditional top-down design process
Traditionally, DSP design is divided into two kinds of work: system/algorithm development and software/hardware implementation. These are carried out by two entirely different groups of engineers, who usually work in relative isolation from each other. Algorithm developers use mathematical analysis tools to create, analyze, and refine the required DSP algorithms without considering the system architecture or the details of hardware and software implementation. System designers concentrate on defining functions and designing the architecture so that they conform to the product description and interface standards. The hardware and software design teams then take the specifications written by the system engineers and algorithm developers and carry out the physical implementation of the DSP design.
Typically, a detailed specification is divided into many small modules, each assigned to a team member, who must first understand what the module does.
If the DSP algorithm targets an FPGA, structured ASIC, or SoC, the first task is to build an RTL model in a hardware description language such as Verilog or VHDL. This requires engineers who understand communication theory and signal processing well enough to follow the detailed specifications written by the system engineers. Building the RTL model and simulation test bench often takes one to two months, mainly because the match between the RTL files and the MATLAB model must be verified manually. Once the RTL simulation environment is in place, the implementation engineer works with the system engineers and algorithm developers to analyze the performance, area, and functionality of the hardware implementation of the DSP system.
Because system engineers have no visibility into the physical design during algorithm development, the original algorithm and system architecture usually have to be modified, the written specification updated, the RTL model and test bench revised, and the simulation rerun. This cycle often repeats several times until the performance requirements of the DSP system can be met in hardware. The implementation engineer then follows the standard top-down FPGA/ASIC design flow, using logic synthesis to map the RTL model to a gate-level netlist and physical design tools to place and route the netlist in the chosen FPGA/ASIC device. Figure 3 shows the basic algorithm-on-chip DSP design flow, which consists of two largely independent parts: algorithm development and hardware implementation.
As noted above, the lack of connection between the two design domains delays development, because manually building an RTL model from a written specification takes a long time. More noteworthy for the design project, however, is that the physical design of the DSP algorithm rests on the hardware engineers' subjective interpretation of the written specification.
A lack of DSP expertise among hardware engineers often leads to disastrous consequences when the required functionality is misinterpreted. As DSP complexity grows, errors in manually built RTL models become common. Because the same errors are written into the simulation test bench, simulation cannot catch them, however many there are. The hardware design errors are discovered only at the prototype stage.
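The failure mode described above, where the test bench repeats the same misreading as the RTL, can be shown with a toy example. Everything here is hypothetical: the "scale by 3/4" operation, the rounding rule, and the three functions are plain Python stand-ins for an algorithm model, a hand-written RTL model, and a hand-written test bench.

```python
def golden_model(x: int) -> int:
    """Algorithm developer's intent: scale by 3/4, rounding half up."""
    return (3 * x + 2) // 4


def rtl_model(x: int) -> int:
    """Hardware engineer's reading of the written spec: scale by 3/4, truncate."""
    return (3 * x) // 4


def handwritten_expected(x: int) -> int:
    """Test bench written from the same misreading as the RTL model."""
    return (3 * x) // 4


stimulus = range(16)

# The hand-written test bench agrees with the RTL, so simulation reports no errors.
passes_handwritten = all(rtl_model(x) == handwritten_expected(x) for x in stimulus)

# Checking against the original algorithm model exposes the misinterpretation.
mismatches = [x for x in stimulus if rtl_model(x) != golden_model(x)]

print("hand-written test bench passes:", passes_handwritten)
print("inputs that differ from the algorithm model:", mismatches)
```

The design passes its own test bench yet disagrees with the algorithm model on half the inputs, which is why expected values need to come from the algorithm source rather than from a second reading of the specification.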
Improving the method
One of the most important benefits that FPGA/ASIC designers gained from adopting a true top-down design method was better design data management. When ASIC and FPGA design still followed the same bottom-up method that DSP design uses today, the lack of a single authoritative source of design data introduced many errors. In today's DSP design, each independent design group is responsible for keeping the MATLAB model synchronized with the manually created RTL model and test bench. However, as mentioned above, the two teams rarely communicate and are often geographically far apart, so managing this data becomes very difficult.
CoWare offers a solution to the model synchronization problem in its SPW toolkit: it brings a simulation-aided design method into its hardware design system to carry the design from detailed specification to implementation. In this approach, CoWare suggests that the DSP design team use its hardware design system, together with a library of DSP hardware models, to create an executable specification that replaces the written description of the DSP specification and algorithms.
This method is very effective at eliminating the misinterpretations hardware engineers introduce when developing the RTL model, but it still falls short in keeping design data synchronized. Because the executable specification must be updated by hand every time the model changes, the likelihood of errors rises sharply, especially under the combined pressure of growing complexity and shrinking time to market.
A true top-down DSP design method
AccelChip's DSP synthesis tool reads the MATLAB model directly and automatically generates a synthesizable RTL model and simulation test bench in VHDL or Verilog. By connecting the two DSP design domains, it greatly reduces the manpower and time a DSP design team needs: it eliminates misinterpretation and costly rework, verifies the hardware implementation automatically, and lets system designers and algorithm developers explore architectures early in development.
AccelChip removes the need for hardware designers to create RTL models and simulation test benches by hand, which shortens the development cycle and reduces the number of designers needed for hardware implementation. Moreover, the automatically generated RTL model is architecture-aware with respect to the target FPGA device rather than a generic RTL model. Once the RTL model has been generated, the synthesis tool creates an optimized implementation for logic synthesis, ensuring that the generated gate-level netlist exploits the strengths of the FPGA device.
For example, when a DSP algorithm is implemented in FPGA device families from different manufacturers, performance and area vary widely because the devices differ in architecture, logic resources, routing resources, and routing methods. Through architecture awareness, AccelChip gives the DSP design team a good physical implementation for the target FPGA device. At the same time, by providing an easy-to-use, automated, direct path from MATLAB to hardware implementation, it lets DSP system designers and algorithm developers define their algorithms early in design and development. Algorithm developers can also quickly convert a MATLAB design into a gate-level netlist for the target FPGA and weigh performance, area, cost, and power consumption. Getting feedback from physical implementation at the start of the algorithm development cycle means fewer iterations later in the design process, which again saves valuable time and manpower.
Conclusion
The importance of DSP technology grows by the day, and the performance its algorithms require far exceeds the capabilities of general-purpose DSP processors, prompting DSP implementation teams to look for hardware solutions. FPGAs provide an ideal platform for DSP implementation, and the true top-down design flow provided by AccelChip integrates seamlessly into the DSP design environment, minimizing management risk when switching to a true top-down DSP design method.