IDCT 2D Inverse Discrete Cosine Transform Core
The IDCT core implements the 2D Inverse Discrete Cosine Transform (2D-IDCT) on an 8x8 block of coefficients. It is therefore suited to IDCT-based image and video decoders and can serve as a core for the JPEG, MPEG1, MPEG2, MPEG4, H.261, and H.263 standards. It is based on the row-column computational architecture.
The IDCT is designed for reuse in ASIC and FPGA implementations. The design is fully synchronous with positive edge clocking and no internal tri-state buffers. It offers high performance, while maintaining low gate count, and can be used in any multimedia, digital video and digital printing applications.
Applications
The IDCT core is a typical building block for image processing (e.g. image decompression) applications and can be utilized for a variety of multimedia applications including:
- Digital printing
- Desktop video editing
- Digital still cameras
- Various progressive image transmission (PIT) systems such as:
  - Teleconferencing
  - Medical diagnostic imaging
  - Security services
Features
- High clock speed
- Low gate count
- 8x8 IDCT block size
- Fully synchronous
- Continuous one symbol per clock cycle
- 11-bit inputs, 8-bit outputs, 12-bit cosine coefficients, and 15-bit internal computation precision
- No internal RAM requirements
- Internal zero-level shifting on output samples
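The last feature, level shifting of the output samples, can be illustrated with a minimal sketch. It assumes the JPEG-style convention in which the signed IDCT result is offset by 128 and saturated to the unsigned 8-bit range. The offset, the saturation, and the function name `level_shift` are assumptions made for illustration, not documented behaviour of the core.

```python
def level_shift(sample, bits=8):
    """Offset a signed IDCT result into the unsigned range and saturate it."""
    offset = 1 << (bits - 1)        # 128 for 8-bit samples (assumed convention)
    max_value = (1 << bits) - 1     # 255 for 8-bit samples
    shifted = sample + offset
    return max(0, min(max_value, shifted))
```

For example, under these assumptions a signed result of 0 maps to mid-grey (128), and results outside -128..127 are clipped to 0 or 255.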
Block Diagram

Functional Description
The IDCT is a transform that converts a set of frequency coefficients to a signal. For an image, this transform is performed on a two-dimensional array of coefficients, resulting in a two-dimensional array of samples. Data enters and leaves the core in blocks of 8x8 samples.
The IDCT core performs the Inverse Discrete Cosine Transform (IDCT) on an 8x8 block of samples. The mathematical definition for the IDCT is given below.
 
where are the input coefficients, are the output image samples, for and otherwise and N=8.
In order to operate, the IDCT core must be connected to an external NxN dual-port RAM. The IDCT transformation is implemented using the row/column algorithm, which is possible because of the separability property of the above equation. This small external dual-port memory is required to store the intermediate coefficients.
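The row/column decomposition can be sketched in floating point as follows. This is an illustrative model only, assuming the standard orthonormal 2D-IDCT definition; it ignores the core's fixed-point precision, and all function names are hypothetical. Each 1-D pass carries a scale factor of sqrt(2/N), so two passes give the 2/N factor of the 2-D transform.

```python
import math

N = 8  # block size used by the core


def norm(k):
    """Normalization factor: 1/sqrt(2) for k == 0, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_1d(coeffs):
    """1-D IDCT of one row of coefficients (floating point, no rounding)."""
    n = len(coeffs)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(
            norm(u) * coeffs[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
            for u in range(n)
        )
        for i in range(n)
    ]


def transpose(block):
    """Swap rows and columns, as the transpose memory does between stages."""
    return [list(column) for column in zip(*block)]


def idct_2d_row_column(block):
    """2-D IDCT computed as two 1-D passes with a transpose in between."""
    stage1 = [idct_1d(row) for row in block]              # first 1-D pass
    stage2 = [idct_1d(row) for row in transpose(stage1)]  # second 1-D pass
    return transpose(stage2)


def idct_2d_direct(block):
    """Direct evaluation of the double sum, used only as a cross-check."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (
                        norm(u) * norm(v) * block[u][v]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                    )
            out[i][j] = 2.0 / n * total
    return out
```

A block whose only non-zero coefficient is the DC term yields a constant block of samples equal to that coefficient divided by 8, which is a convenient sanity check for either function.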
Stage 1
This processing stage comprises a set of multiplier-accumulator units and a cosine lookup table for the respective IDCT computations. The input to this stage is the 11-bit data DIN from the I/O port. The output of this stage is a 15-bit word that is passed to the transpose memory.
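A multiply-accumulate stage driven by a cosine lookup table can be modelled in integer arithmetic along the following lines. This is a minimal sketch, not the core's actual datapath: the scaling of the table by 4096, the rounding step, and the names `COS_TABLE` and `idct_1d_fixed` are assumptions. The scaling works because every 1-D kernel value for N=8 has magnitude below 0.5, so the scaled values fit in a signed 12-bit word.

```python
import math

N = 8
COEFF_BITS = 12            # cosine coefficient width from the feature list
SCALE = 1 << COEFF_BITS    # assumed scaling of the lookup table (x4096)


def build_cosine_table():
    """Quantize the 1-D IDCT kernel to signed integers."""
    table = []
    for u in range(N):
        norm = 1.0 / math.sqrt(2.0) if u == 0 else 1.0
        row = [
            round(math.sqrt(2.0 / N) * norm
                  * math.cos((2 * i + 1) * u * math.pi / (2 * N)) * SCALE)
            for i in range(N)
        ]
        table.append(row)
    return table


COS_TABLE = build_cosine_table()


def idct_1d_fixed(coeffs):
    """1-D IDCT using integer multiply-accumulate and a final rounding shift."""
    out = []
    for i in range(N):
        acc = 0
        for u in range(N):
            acc += coeffs[u] * COS_TABLE[u][i]   # one MAC operation per term
        # Add half an LSB, then drop the fractional bits (arithmetic shift).
        out.append((acc + (SCALE >> 1)) >> COEFF_BITS)
    return out
```

In this sketch the result is rounded to an integer after each pass. A real implementation would typically keep some fractional bits between the two stages, which is one plausible use of a wider intermediate word.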
Stage 2
This processing stage comprises a set of multiplier-accumulator units and a cosine lookup table for the respective IDCT computations. The input to this stage is the data stored in the transpose memory by stage 1. This stage, like stage 1, performs a 1-D IDCT and provides the final 11-bit output sample at the DOUT port.
Control Unit
This is the control unit for the IDCT transformation. It receives all input control signals (RESET, START) and generates: a) all the internal control signals for each stage; b) the output control signals for communication (BUSY, READY, DATA_AV); c) the control signals for the communication with the external transpose memory.
Transpose Memory
The core requires an externally connected dual-port RAM of 64 words in order to operate correctly. This memory is used to store the IDCT intermediate results. The core uses one port as write-only and the other port as read-only. The two ports must be independently addressable. The core accesses both the write and read ports synchronously, using a clock synchronous to the IDCT core clock.
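One simple way to obtain a transpose from a 64-word memory is to write the stage 1 results in row-major order and read them back in column-major order. The sketch below models only that addressing idea. The address functions and their names are hypothetical, and the core's real address sequence (including how it overlaps reads and writes of consecutive blocks) is not described here.

```python
N = 8  # the memory holds N * N = 64 intermediate words


def write_address(count):
    """Row-major address for the count-th stage 1 result (count in 0..63)."""
    row, column = divmod(count, N)
    return row * N + column


def read_address(count):
    """Column-major address for the count-th stage 2 read (count in 0..63)."""
    column, row = divmod(count, N)
    return row * N + column


def transpose_through_memory(stage1_results):
    """Write an N x N block row by row, then read it back column by column."""
    memory = [None] * (N * N)
    for count in range(N * N):
        row, column = divmod(count, N)
        memory[write_address(count)] = stage1_results[row][column]
    return [memory[read_address(count)] for count in range(N * N)]
```

With this scheme the first eight words read back are the first column of the stage 1 block, which is the input the second 1-D pass needs.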
Implementation Results
IDCT reference designs have been evaluated in a variety of technologies.
Support
The core as delivered is warranted against defects for three years from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.
Verification
The correct operation of the IDCT core was verified for all possible sets of parameters as follows. A large number of DCT coefficient sets were fed to the core implementing the 2D-IDCT. The outputs of the VHDL core (IDCT output samples) were binary compared with the outputs of the reference Bit Accurate Model.
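The comparison flow described above can be sketched as a pair of helpers: one that generates random stimulus and one that compares two output streams sample by sample. This is an illustration only. The function names are hypothetical, the signed 11-bit input range is an assumption, and the actual comparison utility shipped with the core may work differently.

```python
import random


def random_coefficient_block(rng, n=8, bits=11):
    """Generate an n x n block of random signed coefficients (assumed range)."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return [[rng.randint(low, high) for _ in range(n)] for _ in range(n)]


def first_mismatch(core_samples, model_samples):
    """Return the index of the first differing sample, or None if identical."""
    for index, (core, model) in enumerate(zip(core_samples, model_samples)):
        if core != model:
            return index
    if len(core_samples) != len(model_samples):
        # One stream ended early: report the first missing position.
        return min(len(core_samples), len(model_samples))
    return None
```

A test run would feed each generated block to both the simulated core and the bit-accurate model, then require `first_mismatch` to return `None` for every block.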
Core Modifications
The IDCT core can be easily customized to meet your area/throughput constraints. Essentially the core can be modified as follows: a) a dual-port RAM can be included within the core for technologies that provide such memory blocks; b) if the image block is stored in a single-port memory, then the IP needs only one IDCT_block that implements the row/column IDCT algorithm (this saves area because only one stage is needed, at the cost of half the throughput); c) the block size (N=8 or 16) and the input/output, cosine, and internal precisions are configurable. Please contact CAST directly for any required modifications.
Deliverables
The core is available in ASIC (synthesizable HDL) and FPGA (netlist) forms, and includes everything required for successful implementation:
- HDL RTL source code (ASICs) or post-synthesis EDIF netlist (FPGAs)
- Bit Accurate Model (BAM)
- Sophisticated HDL Testbench including external FIFOs, buffers, models of interfaces, and the core
- Simulation script, vectors, expected results, and comparison utility
- Synthesis script (ASICs) or place and route script (FPGAs)
- Comprehensive user documentation, including detailed specifications and a system integration guide
- Design support
