BB_2DFDWT Block Based Forward
Discrete Wavelet Transform Core
The core implements the forward 5/3 2D Discrete Wavelet Transform using a localized, block-based computational architecture. This architecture drastically reduces data accesses and transfers to and from off-chip frame memories, which significantly increases processing speed and lowers energy dissipation. With the addition of an extra block, the core also allows functional pipelining between the transform and Zero-Tree-Coding engines (e.g. MPEG4-VTC).
The BB_2DFDWT is designed for reuse in ASIC and FPGA implementations. However, it has so far been tested only on Xilinx Virtex and Spartan-II FPGAs. The design is strictly synchronous, with positive-edge clocking and a synchronous reset, so scan insertion is straightforward.
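As an illustration of the 5/3 filtering that the core applies along rows and columns, the following is a minimal software sketch of one level of a 1D 5/3 transform in lifting form. It assumes the reversible integer variant with whole-sample symmetric boundary extension, as used for lossless coding in JPEG2000; this datasheet does not state which rounding or boundary handling the core uses. The function names `fwd53_1d` and `inv53_1d` are hypothetical and are not part of the core's deliverables.

```python
def fwd53_1d(x):
    """One level of a reversible 5/3 lifting transform on a list of ints.

    Assumed variant: integer lifting with symmetric extension at the edges.
    Returns (low_pass, high_pass).
    """
    n = len(x)
    if n < 2:
        return list(x), []
    nd = n // 2          # number of high-pass samples (odd positions)
    ns = n - nd          # number of low-pass samples (even positions)

    def xe(i):
        # Mirror an index that runs past the right edge back into the signal.
        return x[2 * (n - 1) - i] if i >= n else x[i]

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = [x[2 * k + 1] - ((x[2 * k] + xe(2 * k + 2)) >> 1) for k in range(nd)]

    def de(k):
        # Symmetric extension of the high-pass sequence (clamp to the ends).
        return d[min(max(k, 0), nd - 1)]

    # Update step: each even sample plus a rounded quarter of neighbouring details.
    s = [x[2 * k] + ((de(k - 1) + de(k) + 2) >> 2) for k in range(ns)]
    return s, d


def inv53_1d(s, d):
    """Inverse of fwd53_1d; undoes the update step, then the predict step."""
    nd = len(d)
    if nd == 0:
        return list(s)
    n = len(s) + nd
    x = [0] * n

    def de(k):
        return d[min(max(k, 0), nd - 1)]

    for k in range(len(s)):
        x[2 * k] = s[k] - ((de(k - 1) + de(k) + 2) >> 2)

    def xe(i):
        return x[2 * (n - 1) - i] if i >= n else x[i]

    for k in range(nd):
        x[2 * k + 1] = d[k] + ((x[2 * k] + xe(2 * k + 2)) >> 1)
    return x
```

A 2D transform level is obtained by applying such a 1D step to every row and then to every column (or the reverse order); under the assumptions above, the integer lifting steps are exactly invertible.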
Features
- 2D 5/3 Forward Discrete Wavelet Transform
- Minimizes Accesses to External Memory and Energy Dissipation
- 3 cycles per input pixel average throughput
- 16 Bits Precision
- “In Place” transform (if desired)
- Fully Synchronous Design
- Based on the local wavelet technology of IMEC
The core is parameterized. Its parameterization is realized in two levels as follows:
- Synthesis (VHDL’s Generics):
  - Max. Image Dimensions
  - Max. Decomposition Levels
- Run Time (using corresponding inputs):
  - Image Dimensions
  - Decomposition Levels
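The relationship between the two parameter levels can be sketched in software: run-time values must fit within the maxima fixed at synthesis, and each decomposition level splits the current low-pass band into four subbands. The sketch below assumes a standard dyadic decomposition in which the low-pass band takes the extra sample when a dimension is odd; this datasheet does not state any restrictions on image dimensions, and the function name `subband_sizes` and its arguments are hypothetical.

```python
def subband_sizes(rows, cols, levels, max_rows, max_cols, max_levels):
    """Return the (rows, cols) size of each subband for every decomposition level.

    max_rows, max_cols and max_levels stand in for the synthesis-time generics;
    rows, cols and levels stand in for the run-time inputs.
    """
    if not (1 <= rows <= max_rows and 1 <= cols <= max_cols):
        raise ValueError("image dimensions exceed the synthesized maximum")
    if not (1 <= levels <= max_levels):
        raise ValueError("decomposition levels exceed the synthesized maximum")

    sizes = []
    r, c = rows, cols
    for level in range(1, levels + 1):
        lo_r, hi_r = (r + 1) // 2, r // 2   # low-pass gets the extra row if r is odd
        lo_c, hi_c = (c + 1) // 2, c // 2   # same assumption for columns
        sizes.append({
            "level": level,
            "LL": (lo_r, lo_c),
            "HL": (lo_r, hi_c),
            "LH": (hi_r, lo_c),
            "HH": (hi_r, hi_c),
        })
        r, c = lo_r, lo_c                   # the next level decomposes the LL band
    return sizes
```

For example, under these assumptions a 512 x 512 image with 3 decomposition levels ends with a 64 x 64 LL band.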
Applications
The BB_2DFDWT core can be used in a variety of multimedia applications such as:
- Still Image Coding (e.g. JPEG2000 Encoders)
- Video Coding
Symbol Diagram

Block Diagram

Functional Description
The BB_2DFDWT is internally partitioned into functional blocks as shown in the block diagram and discussed below.
Filtering Unit
Consists of a FIFO and a heavily optimized parallel filter, which performs the actual filtering.
Address Unit
Generates addresses for the internal buses and for the external image memory (dual-ported RAM).
Local Memory Blocks
Store intermediate results.
Control Unit
Controls internal operation and contains the memory mapped configuration registers.
Usage
Usage of the BB_2DFDWT is simple. The image dimensions are set through the R and C inputs, which give the number of image rows and columns respectively. The L input sets the desired number of decomposition levels. The transform starts when the ST input is driven high for at least one clock cycle. An active-high pulse on the EOT output signals that the transform of one full frame is complete. Image data are transferred to the wavelet engine over the bi-directional IMG_FU_BUS, a 32-bit bus that carries two coefficients. The transform coefficients are written back (or to another external memory space or module) over the same data bus. Both operations are controlled by the 3-bit IMG_MEM_CTRL bus and addressed through the IMG_ADDR bus.
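For testbench or memory-model code, the two-coefficient organization of IMG_FU_BUS can be sketched as follows. The sketch assumes signed 16-bit two's complement coefficients, with the first coefficient in the lower half of the 32-bit word; this datasheet does not specify the ordering or the number format, and the helper names `pack_pair` and `unpack_pair` are hypothetical.

```python
def pack_pair(c0, c1):
    """Pack two signed 16-bit coefficients into one 32-bit bus word.

    Assumed layout: c0 in bits 15..0, c1 in bits 31..16.
    """
    for c in (c0, c1):
        if not -32768 <= c <= 32767:
            raise ValueError("coefficient does not fit in 16 bits")
    return ((c1 & 0xFFFF) << 16) | (c0 & 0xFFFF)


def unpack_pair(word):
    """Split a 32-bit bus word back into two signed 16-bit coefficients."""
    def to_signed16(v):
        # Reinterpret a 16-bit pattern as a two's complement value.
        return v - 0x10000 if v & 0x8000 else v

    return to_signed16(word & 0xFFFF), to_signed16((word >> 16) & 0xFFFF)
```

If the actual core places the first coefficient in the upper half, only the shift positions in the two helpers need to be swapped.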
The widths of R, C, L, and IMG_ADDR depend on the values of the synthesis parameters: each has the minimum width required to hold its maximum value. Note that IMG_ADDR has two times the width needed to address an image frame, since it addresses a two-coefficient bus.
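The "minimum required width" rule for R, C, and L can be illustrated with a short calculation. The sketch assumes that each input carries its value directly as an unsigned binary number (not the value minus one); the function names and the parameter names `max_rows`, `max_cols`, and `max_levels` are hypothetical stand-ins for the VHDL generics. IMG_ADDR is left out because its exact width formula is not given here.

```python
def min_width(max_value):
    """Smallest number of bits that can hold max_value as an unsigned integer."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    return max(1, max_value.bit_length())


def port_widths(max_rows, max_cols, max_levels):
    """Widths of the R, C and L inputs for a given set of synthesis maxima."""
    return {
        "R": min_width(max_rows),
        "C": min_width(max_cols),
        "L": min_width(max_levels),
    }
```

Under these assumptions, maxima of 1024 rows, 1024 columns, and 5 levels would give 11-bit R and C inputs and a 3-bit L input.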
Performance - Area
Both performance (in frames/sec) and area depend strongly on the number of decomposition levels and on the image dimensions.
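A first-order frame-rate estimate can be derived from the average throughput of 3 cycles per input pixel listed under Features. The sketch below takes that figure at face value and ignores any dependence on the number of decomposition levels, so it should be read as a rough guide rather than a guaranteed figure. The function name and the clock frequency used in the example are hypothetical.

```python
def estimated_fps(clock_hz, rows, cols, cycles_per_pixel=3.0):
    """Rough frames/sec estimate: clock rate divided by cycles needed per frame."""
    if clock_hz <= 0 or rows <= 0 or cols <= 0 or cycles_per_pixel <= 0:
        raise ValueError("all arguments must be positive")
    return clock_hz / (cycles_per_pixel * rows * cols)


# Hypothetical example: a 50 MHz clock and a 512 x 512 image.
fps = estimated_fps(50e6, 512, 512)
```

With these example numbers the estimate comes to roughly 63.6 frames/sec.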
Implementation Results
BB_2DFDWT reference designs have been evaluated in a variety of technologies.
Support
The core as delivered is warranted against defects for three years from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.
Verification
The correct operation of the BB_2DFDWT core was verified for all possible sets of parameters as follows:
- A large number of images were fed to both the Core and reference Software and the results were compared.
- After performing the forward transform with the core, the inverse transform of the output was performed with reference software. PSNR measurements between the reconstructed and original images were performed.
- The core was implemented on a Xilinx Virtex equipped prototyping board and validated for correct operation.
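The PSNR comparison mentioned in the second verification step can be sketched as below. This is a generic PSNR calculation, not the vendor's comparison utility; it assumes images flattened to lists of pixel values and an 8-bit peak value of 255, neither of which is stated in this datasheet.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    if squared_error == 0:
        return math.inf                      # identical images
    mse = squared_error / len(original)      # mean squared error
    return 10.0 * math.log10(peak * peak / mse)
```

An infinite result means the reconstructed image matches the original exactly.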
Deliverables
The core is available in ASIC (synthesizable HDL) and FPGA (netlist) forms, and includes everything required for successful implementation:
- HDL RTL source code (ASICs) or post-synthesis EDIF netlist (FPGAs)
- Sophisticated HDL Testbench including external FIFOs, buffers, models of interfaces, and the core
- Simulation script, vectors, expected results, and comparison utility
- Synthesis script (ASICs) or place and route script (FPGAs)
- Comprehensive user documentation, including detailed specifications, architectural overview, user's guide, and a system integration guide
- Design support, including interface customization
