Hardware implementation of Theora decoding

Integration with LEON3

by student André Luiz Nazareth da Costa (andre.lnc [at] gmail.com)

and mentor Timothy B. Terriberry (tterribe [at] vt.edu)

 

About

Decoding video in the Theora format requires a lot of processing power. Developing dedicated hardware for it is therefore a viable solution, and some modules were already implemented successfully in hardware during GSoC 2006 (Google Summer of Code). The idea is to use an FPGA with a small embedded processor and move only the critical modules into hardware.
The goal of my project is to continue last year's work by moving one or more modules into hardware and thereby reducing the CPU time spent on decoding. The implementation is done in VHDL and synthesized for the Altera Stratix II FPGA. GSoC Project page: http://code.google.com/soc/2007/xiph/appinfo.html?csaid=4235040C184DBD68

XIPH and Theora

The Xiph.Org Foundation (http://www.xiph.org/) is a non-profit corporation dedicated to protecting the foundations of Internet multimedia from control by private interests. The purpose is to support and develop free, open protocols and software to serve the public, developer, and business markets. Theora is the video codec from Xiph, based on the VP3 codec donated by On2 Technologies.

GSoC

Google Summer of Code (http://code.google.com/soc/) is a program that offers student developers stipends to write code for various open source projects. Google works with several open source, free software and technology-related groups to identify and fund projects over a three-month period. Historically, the program has brought together over 1,000 students with over 100 open source projects to create hundreds of thousands of lines of code. The program, which kicked off in 2005, is now (2007) in its third year.

Hardware implementation

The first step (analysis of the Theora decoding process) was carried out first by Felipe Portavales and later by Leonardo Piga. Their conclusion was that the function ReconRefFrames consumes approximately 60% of the CPU time, while the functions that run before it consist mostly of decision structures (branches) and contain little arithmetic (such as multiplication). You can see this at http://svn.xiph.org/trunk/theora-fpga/doc/. Thus, this first part is not very interesting to implement in hardware, but ReconRefFrames is.

Felipe Portavales implemented the iDCT and Leonardo Piga implemented the other functions. The VHDL simulation works, and so does the synthesis for the FPGA.

However, the integration was done with the NIOS processor, which is proprietary. A non-proprietary alternative is the LEON processor.
NIOS has a good interface and good support for FPGAs. LEON has a different interface and is very flexible. So I started to study this processor in more depth; my first GSoC goal was to do the complete integration of the Theora hardware with LEON. This page describes how to do this integration step by step.

Sites:

Theora: http://www.theora.org/

Xiph: http://www.xiph.org/

Theora Hardware Wiki: http://wiki.xiph.org/index.php/TheoraHardware

Google Summer of Code: http://code.google.com/soc/

GSoC Project page: http://code.google.com/soc/2007/xiph/appinfo.html?csaid=4235040C184DBD68

Gaisler: http://www.gaisler.com

Vorbis Hardware implementation on LEON2: http://oggonachip.sourceforge.net/

MP3 Hardware implementation on LEON2: http://lampiao.lsc.ic.unicamp.br/~billo/leon2_on_mblazeboard/index.htm

Lists:

Leon Sparc: http://tech.groups.yahoo.com/group/leon_sparc/

Theora: http://lists.xiph.org/mailman/listinfo/theora-dev

1 - The LEON3 processor.

figure 1

About Gaisler

Gaisler Research provides a complete framework for the development of processor-based SOC designs. The framework is centered around the LEON processor core and includes a large IP library, behavioral simulators, and related software development tools.
http://www.gaisler.com

About GRLIB

The GRLIB IP Library is an integrated set of reusable IP cores, designed for system-on-chip (SOC) development. The IP cores are centered around the common on-chip bus, and use a coherent method for simulation and synthesis.
http://gaisler.com/products/grlib/grlib.pdf

Choosing an ideal configuration (my configuration file)

First, install GRLIB (I worked with grlib-gpl-1.0.15-b2149.tar.gz), following the instructions in grlib.pdf.
Once GRLIB is installed, run "make xconfig" in "grlib/designs/leon3-altera-ep2s60-sdr" (I used the Stratix II EP2S60F672C5ES).
There, select these components:

Component                        Vendor
LEON3 SPARC V8 Processor         Gaisler Research
AHB Debug UART                   Gaisler Research
AHB Debug JTAG TAP               Gaisler Research
LEON2 Memory Controller          European Space Agency
AHB/APB Bridge                   Gaisler Research
LEON3 Debug Support Unit         Gaisler Research
Generic APB UART                 Gaisler Research

My configuration file: config.in

Now you can run the synthesis of your design (make quartus).
FPGA pin problems: make sure you select the design that matches your FPGA board; otherwise you may have problems with the pin mapping.

Using the GRMON JTAG interface

GRMON is a general debug monitor for the LEON processor and for SOC designs based on the GRLIB IP library.
We will use it to load and execute LEON applications.
Manual: http://www.gaisler.com/doc/grmon.pdf

ftp://gaisler.com/gaisler.com/grmon/grmon-eval-1.1.21.tar.gz

Run GRMON with this command:

grmon-eval -altjtag -u

-altjtag : Connect to the JTAG Debug Link using Altera USB Blaster or Byte Blaster.
-u : Put UART 1 in loop-back mode, and print its output on monitor console.

 

2 - Application on LEON3

figure 2

libtheora-1.0alpha6, libogg-1.1.3 and sparc-elf-3.4.4-1.0.29

Create a new directory (for example /theora_hardware/).
Download libtheora-1.0alpha6.tar.gz and unpack it in /theora_hardware/:

http://downloads.xiph.org/releases/theora/libtheora-1.0alpha6.tar.gz
tar -xzf libtheora-1.0alpha6.tar.gz

Download libogg-1.1.3.tar.gz and unpack it in /theora_hardware/libtheora-1.0alpha6/:

http://downloads.xiph.org/releases/ogg/libogg-1.1.3.tar.gz
tar -xzf libogg-1.1.3.tar.gz

Now you will need BCC (Bare-C Cross-Compiler), a cross-compiler for the LEON2 and LEON3 processors.
Download sparc-elf-3.4.4-1.0.29.tar.bz2 and unpack it in /opt/:

mkdir -p /opt
tar -C /opt -xjf sparc-elf-3.4.4-1.0.29.tar.bz2

Modified dump_video.c and input vector

Since we are not running on Linux, there is no file system, so you need to take care with the file functions: comment out the fprintf calls, replace the fread calls with reads from a vector of inputs, and turn the fwrite calls into plain printf calls. For example:

dump_video_hardware.c
vector_of_input.h (included by dump_video_hardware.c)
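A minimal sketch of the fread replacement, assuming the .ogg file has been converted into a constant byte array. The names vec_stream and vec_read are hypothetical and are not taken from dump_video_hardware.c:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical memory-backed stream that stands in for the FILE * input.
 * In the bare-metal build the .ogg bytes are compiled into the program
 * as a constant array (for example one declared in a generated header). */
struct vec_stream {
    const unsigned char *data; /* start of the embedded .ogg bytes */
    size_t size;               /* total number of bytes */
    size_t pos;                /* current read position */
};

/* Mirrors fread(dst, size, nmemb, file): returns the number of complete
 * items copied, and 0 once the whole vector has been consumed.
 * Unlike fread(), a trailing partial item is left unread. */
size_t vec_read(void *dst, size_t size, size_t nmemb, struct vec_stream *s)
{
    size_t items;

    if (size == 0 || s->pos >= s->size)
        return 0;
    items = (s->size - s->pos) / size;
    if (items > nmemb)
        items = nmemb;
    memcpy(dst, s->data + s->pos, items * size);
    s->pos += items * size;
    return items;
}
```

Wherever dump_video.c read a chunk of the input file with fread, the bare-metal version would call vec_read on the embedded vector instead; the Ogg sync layer does not care where the bytes come from.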

Bug detected in the Ogg library (unaligned address error)

The Ogg library caused this error:

IU in error mode (tt = 0x07)
400013a4 e8220011 st %l4, [%o0 + %l1]

Trap type 0x07 is a memory access to an unaligned address. Some architectures support unaligned stores, but SPARC does not: a 32-bit store must use an address that is a multiple of 4. Luckily, I found a report from a group that put the Vorbis decoder on an FPGA, the master's thesis of two students (http://oggonachip.sourceforge.net/).

The fix is to add a few extra lines to the configure.in file of the Ogg library (/theora_hardware/libtheora-1.0alpha6/libogg-1.1.3/), as follows:

AC_CHECK_SIZEOF(short,2)
AC_CHECK_SIZEOF(int,4)
AC_CHECK_SIZEOF(long,4)
AC_CHECK_SIZEOF(long long,8)
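To illustrate the constraint behind trap 0x07, here is a small sketch, independent of the configure.in fix (which is what actually solved the problem here), of how C code can write a 32-bit value to an address that may not be 4-byte aligned. store_u32, load_u32 and is_aligned4 are hypothetical helper names:

```c
#include <stdint.h>
#include <string.h>

/* SPARC V8 requires a 32-bit ld/st to use an address that is a multiple
 * of 4; otherwise the processor takes trap 0x07 (mem_address_not_aligned),
 * the "IU in error mode (tt = 0x07)" message shown above. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Write a 32-bit value at any address. memcpy() makes no alignment
 * assumption, so on strict-alignment targets such as SPARC the compiler
 * uses accesses that are legal for any address. */
static void store_u32(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Read a 32-bit value from any address (same technique). */
static uint32_t load_u32(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

A direct store such as *(uint32_t *)(buf + 1) = x is exactly the kind of access that traps on LEON3.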

Compilation of Libtheora for LEON3 architecture

Run this script from /theora_hardware/libtheora-1.0alpha6/:

# Export sparc-elf PATH
export PATH=/opt/sparc-elf-3.4.4/bin:$PATH

# Clean all
make distclean
cd libogg-1.1.3/
make distclean

# Set CROSS-Compiler and parameters
export CC=sparc-elf-gcc
export CXX=sparc-elf-gcc
export CFLAGS='-mv8 -msoft-float -static'
# -mv8 generate SPARC V8 mul/div instructions - needs hardware multiply and divide
# -msoft-float emulate floating-point - must be used if no FPU exists in the system

#Configure and install OGG lib
./configure --prefix=/theora_hardware/ --target=sparc-elf --host=sparc-elf --enable-static
make
make install

#Configure and make Theora for LEON (sparc)
cd ../
./configure --prefix=/theora_hardware/ --target=sparc-elf --host=sparc-elf --enable-static --disable-encode
make

How to run the test of figure 2

After the last step you will have the binary "dump_video_hardware". In the first step (The LEON3 processor), the synthesis generated a programming file (leon3mp.sof); use it now to program your FPGA. Then open GRMON and load the binary ("load dump_video_hardware"). Now "run dump_video_hardware".

 

3 - LINUX on LEON3

 

figure 3

Snapgear

Linux support for LEON2 and LEON3 is provided through a special version of the SnapGear Embedded Linux distribution. SnapGear Linux is a full source package, containing kernel, libraries and application code for rapid development of embedded Linux systems.

Download the Snapgear:
ftp://gaisler.com/gaisler.com/linux/snapgear/snapgear-p33a.tar.bz2

Snapgear Manual:
ftp://gaisler.com/gaisler.com/linux/snapgear/snapgear-manual-1.33.0.pdf

Download the Sparc Linux Cross Compiler:
ftp://gaisler.com/gaisler.com/linux/snapgear/sparc-linux-1.0.0.tar.bz2

Kernel version I am using: linux-2.6.21.1, for MMU systems

The tool-chain should be installed under /opt :

cd /opt
tar xjf /sparc-linux-1.0.0.tar.bz2

Add /opt/sparc-linux/bin to your PATH.

The SnapGear distribution can be installed anywhere:

tar -xjf snapgear-p33a.tar.bz2

General instructions on how to use SnapGear Linux are provided with the distribution.

 

Testing

After programming your FPGA with LEON3, start GRMON with this command:
./grmon-eval -altjtag -nb -abaud 38400 -nosram

GRMON should be started with -nb so that it does not go into break mode on a page fault or data exception.

Problem with SRAM

I disabled the SRAM (-nosram) because my FPGA board has only 2 Mbit of SRAM, so I needed to load the kernel into SDRAM. However, I was having memory-mapping problems, so I decided to disable the SRAM.

Serial and JTAG debug link

The "-abaud 38400" option sets the application baud rate for UART 1.
To get a console from Linux, you need to connect a serial cable to your computer. Then use a program like "kermit" to communicate with the Linux console running on the FPGA. Some FPGA boards have two serial connectors; MAKE SURE you are using the right one!
I am using the following kermit configuration:

set line /dev/ttyS0
define sz !sz \%0 > /dev/ttyS0 < /dev/ttyS0
set speed 38400
set carrier-watch off
set prefixing all
set parity none
set stop-bits 1
set modem none
set file type bin
set file name lit
set flow-control none
set prompt "Sparc Linux Kermit> "
c


Now load the kernel image (image.dsu) generated with SnapGear and watch your console running in kermit.

 

4 - Libtheora running on LINUX

figure 4

Libtheora compilation for Linux on LEON3

Now you can use the original dump_video.c, because under Linux you can work with files.

# Export sparc-linux PATH
export PATH=/opt/sparc-linux/bin/:$PATH

# Clean all
make distclean
cd libogg-1.1.3/
make distclean

# Set CROSS-Compiler and parameters
export CC=sparc-linux-gcc
export CXX=sparc-linux-gcc
export CFLAGS='-msoft-float -fPIC -static'
# -msoft-float emulate floating-point - must be used if no FPU exists in the system
# -g generate debugging information - must be used for debugging with gdb (not included in CFLAGS above)
# -fPIC generate position-independent machine code; used here because we are now targeting Linux
# -static when linking an application static, all code used from libraries are included into the output binary

#Configure and install OGG lib
./configure --prefix=/homes_export/andre.lnc/theora/libtheora6_hard/ --target=sparc-linux --host=sparc-linux --enable-static
make
make install

#Configure and make Theora for LEON (sparc)
cd ../
./configure --prefix=/homes_export/andre.lnc/theora/libtheora6_hard/ --target=sparc-linux --host=sparc-linux --enable-static
make

How to run the test of figure 4

After generating the binary for Linux on LEON3, copy it to /snapgear-p33/romfs/home/ and build a Linux kernel image that includes the compiled Theora program (dump_video). Don't forget to copy a video to /snapgear-p33/romfs/home/ as well. Keep an eye on the size of your Linux image: the SDRAM on your FPGA board needs enough space for it.

Then:
Program your board with LEON3;
Load the Linux image on LEON3 (using GRMON);
Open your kermit interface and set the configuration;
Run the Linux kernel (using GRMON);
Go back to kermit and you will see a Linux console;
Now go to home (cd home) and run dump_video (./dump_video video.ogg);

 

5 - A Peripheral on LEON3

figure 5

AHB and APB bus

AHB is a new generation of AMBA bus which is intended to address the requirements of high-performance synthesizable designs. It is a high-performance system bus that supports multiple bus masters and provides high-bandwidth operation.
AMBA AHB implements the features required for high-performance, high clock frequency systems.
The APB is part of the AMBA hierarchy of buses and is optimized for minimal power consumption and reduced interface complexity. The AMBA APB appears as a local secondary bus that is encapsulated as a single AHB slave device. APB provides a low-power extension to the system bus which builds on AHB signals directly.
The APB bridge appears as a slave module which handles the bus handshake and control signal retiming on behalf of the local peripheral bus.

AMBA Protocol

You can see details:
http://www.gaisler.com/doc/amba.pdf

Why the APB interface was chosen

I searched theses and articles to decide where the Theora hardware should be placed, how software and hardware should communicate over the bus, and how to pass the data to the hardware. I found many different solutions.
AHB is a high-speed bus suitable for connecting units with high data rates. The problem is that the Theora hardware would be a master on the AHB bus and could overload it, reducing the performance of LEON3. APB is slower than AHB, but its protocol is simpler and it does not disturb the communication between LEON3 and the memory controller. I also found hybrid solutions using both APB and AHB, but I thought it better to attach the hardware only to the APB bus.

 

6 - Plugging Theora Hardware on LEON3

figure 6

How to include an APB core

How to include the Theora APB core

Create the directory grlib/lib/opencores/theora_hardware.
Add "theora_hardware" to grlib/lib/opencores/dir.txt.

Download revision 13432 from SVN into grlib/lib/opencores/theora_hardware/:
http://svn.xiph.org/trunk/theora-fpga/

You will need to rename the entity syncram to tsyncram in these modules: Syncram, expand block, loopfilter, copyrecon, databuffer. This is necessary because the name syncram is already used by a different component in the LEON3/GRLIB library.

Now, we need to create the theora_hardware.vhd and theora_amba_interface.vhd:
theora_hardware.vhd and theora_amba_interface.vhd

Create grlib/lib/opencores/theora_hardware/vhdlsyn.txt listing all the VHDL files.

If you prefer, you can download these files here: theora_hardware1.tar

Register the Theora Hardware APB/AMBA core (OPENCORES_THEORA_HARDWARE under VENDOR_OPENCORES) by editing the file devices.vhd (grlib/lib/grlib/amba/):
devices.vhd

Finally, instantiate theora_hardware in leon3mp.vhd, taking care to use a free index of the APB slave output vector (apbo(i)):
leon3mp.vhd

Before synthesis ("make quartus"), run "make distclean" and "make scripts" in your design directory (grlib/designs/leon3-altera-ep2s60-sdr/).

Addressing protocol I defined between the software interface and theora_amba_interface

struct theora_regs_t {
volatile int flag_send_data;
volatile int data_transmitted;
volatile int flag_read_data;
volatile int data_received;
};

struct theora_regs_t * theora_regs = (struct theora_regs_t *)0x80000800;

flag_send_data (address 0x80000800): flag that tells the driver whether it can send a word to the Theora hardware.
data_transmitted (address 0x80000804): data transmitted to the Theora hardware.
flag_read_data (address 0x80000808): flag that tells the driver whether it can read a word from the Theora hardware.
data_received (address 0x8000080C): data received from the Theora hardware.
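The struct maps field by field onto these four addresses. A small sketch of that arithmetic, assuming 4-byte int with no padding between fields; the macro names THEORA_APB_BASE and THEORA_REG_ADDR are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Register block of the Theora APB slave, as listed above. */
struct theora_regs_t {
    volatile int flag_send_data;   /* 1 when a word may be sent */
    volatile int data_transmitted; /* word sent to the hardware */
    volatile int flag_read_data;   /* 1 when a word can be read */
    volatile int data_received;    /* word read from the hardware */
};

/* APB base address used in this design (the pointer value above). */
#define THEORA_APB_BASE 0x80000800u

/* Bus address of one register: base + byte offset of the field.
 * With 4-byte ints and no padding (true for SPARC V8 and for typical
 * hosts), consecutive fields land 4 bytes apart. */
#define THEORA_REG_ADDR(field) \
    (THEORA_APB_BASE + (uint32_t)offsetof(struct theora_regs_t, field))
```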

How to write the software interface

To send a word, loop in software until flag_send_data is '1', then write the word to data_transmitted.
To read a word, loop in software until flag_read_data is '1', then read the word from data_received.
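A minimal bare-metal sketch of this polling handshake, assuming the hardware updates the flags itself as each word is consumed or produced; theora_send and theora_receive are hypothetical names, not the functions in send_vector_of_input.c:

```c
/* Register block of the Theora APB slave (layout from the struct above). */
struct theora_regs_t {
    volatile int flag_send_data;
    volatile int data_transmitted;
    volatile int flag_read_data;
    volatile int data_received;
};

/* Send one word: spin until the hardware reports it can accept data,
 * then write the word. */
static void theora_send(struct theora_regs_t *regs, int word)
{
    while (regs->flag_send_data != 1)
        ; /* busy-wait: acceptable on bare metal, not inside a Linux driver */
    regs->data_transmitted = word;
}

/* Receive one word: spin until the hardware reports data is available,
 * then read it. */
static int theora_receive(struct theora_regs_t *regs)
{
    while (regs->flag_read_data != 1)
        ;
    return regs->data_received;
}
```

On the board the functions would be called with struct theora_regs_t *regs = (struct theora_regs_t *)0x80000800; the volatile qualifiers force every loop iteration to re-read the register instead of caching it.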
Below is an example of a simple program that I wrote to test this communication:

send_vector_of_input.c and input.h
Compiling the software:

sparc-elf-gcc -mv8 -msoft-float -g send_vector_of_input.c -o send_vector_of_input.exe

Inputs and ReconRefFrames

There is a specific sequence of inputs that must be sent to ReconRefFrames. You can generate this input vector with a modified libtheora:
libtheora-1.0alpha6-fpga.tar.gz

Theora AMBA interface

theora_amba_interface implements the APB/AMBA peripheral that receives data from the driver and passes it to theora_hardware (and back), using the addressing protocol defined above and the ReconRefFrames protocol.

7 - Integration of the Theora decoder software and hardware on LEON3

figure 7

Theora driver

A driver is necessary because we are running Linux: user-space software cannot write directly to a physical address, so it needs a driver.
There are many tutorials on the internet about writing a character device driver, so I will not go into those details. The parameters exchanged between the software and the driver are these:

struct _data
{
int read;
int wrote;
int data;
};
struct _data dt;

I/O control function: theora_ioctl(struct inode *inode, struct file *filp, unsigned int nFunc, unsigned long nParam)

If nFunc = '0', the driver tries to read from the Theora hardware. If nFunc = '1', the driver tries to write to the Theora hardware.

If the read succeeds, dt.read returns 1; otherwise it returns 0.
If the write succeeds, dt.wrote returns 1 and the data is in dt.data; otherwise dt.wrote returns 0.

See the driver theora: theora.c

How to include the driver on linux image

Copy theora.c (the driver) to snapgear-p33/linux-2.6.21.1/drivers/char/

Add the line "obj-$(CONFIG_THEORA) += theora.o" to snapgear-p33/linux-2.6.21.1/drivers/char/Makefile, as in this Makefile

Include the lines ...
config THEORA
bool "Theora Driver"
default y

... on snapgear-p33/linux-2.6.21.1/drivers/char/Kconfig. Like this Kconfig

Make sure to select a unique major number from snapgear-p33/linux-2.6.21.1/Documentation/devices.txt. In my case it was 121.
Then add the line "DEVICES = theora,c,121,0 \" to snapgear-p33/vendors/gaisler/leon3mmu/Makefile, as in this Makefile. This creates /dev/theora each time make is run.

Now, to generate the Linux image, just run "make" in the snapgear-p33 directory.

When you boot Linux on the FPGA, you will see these lines:
Loading theora ...
LEON THEORA driver by Andre Costa (2007) - andre.lnc@gmail.com

Problems:

Below I describe some errors I had with the driver and how to solve them:

- Unable to handle kernel paging request at virtual address 80000000:
The MMU protects certain memory regions. You either bypass the MMU using the SPARC-specific STA or LDA instructions (not recommended) or use ioremap to inform the MMU about the new area. In my case I used ioremap.

- Warning: ioremap: done with statics, switching to malloc
Error (running on FPGA): alloc_io_res(phys_80000800): cannot occupy
Halt
Halt

The problem is that you are calling ioremap() repeatedly. You should call it once, keep the pointer it returns, and use that pointer to access the hardware in the rest of the code. I was calling ioremap in ioctl(), but it should be in theora_init().

- BUG: soft lockup detected on CPU#0!
A soft lockup happens when the kernel fails to reschedule for 10 seconds. This implies that your driver does not yield the CPU. For example, your read/write functions should either return immediately or sleep until woken up by an interrupt; you may not busy-wait. I was running the loop (waiting to receive data from theora_hardware) in the driver, but it should be done only in the modified libtheora software.
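A sketch of keeping the wait loop in user space, assuming the driver makes one attempt per ioctl() call and returns immediately, that nParam carries a pointer to struct _data, and that a successful read returns the word in dt.data. The request values 0 and 1 follow the nFunc description above; if the driver compares nFunc against the characters '0' and '1', adjust them. theora_open, theora_write_word and theora_read_word are hypothetical names:

```c
#include <fcntl.h>
#include <sys/ioctl.h>

/* Parameter block exchanged with the driver (struct _data above). */
struct _data {
    int read;
    int wrote;
    int data;
};

/* Request codes taken from the nFunc description (assumption: plain 0/1). */
#define THEORA_IOCTL_READ  0
#define THEORA_IOCTL_WRITE 1

/* Open the device node created by the DEVICES entry, with read/write access. */
int theora_open(void)
{
    return open("/dev/theora", O_RDWR);
}

/* Send one word. The driver is assumed to try once and return at once;
 * the retry loop lives here in user space, so the kernel never busy-waits.
 * Returns 0 on success, -1 if ioctl() itself fails. */
int theora_write_word(int fd, int word)
{
    struct _data dt;

    for (;;) {
        dt.read = 0;
        dt.wrote = 0;
        dt.data = word;
        if (ioctl(fd, THEORA_IOCTL_WRITE, &dt) < 0)
            return -1;
        if (dt.wrote == 1)
            return 0;
    }
}

/* Receive one word with the same retry pattern. */
int theora_read_word(int fd, int *word)
{
    struct _data dt;

    for (;;) {
        dt.read = 0;
        dt.wrote = 0;
        dt.data = 0;
        if (ioctl(fd, THEORA_IOCTL_READ, &dt) < 0)
            return -1;
        if (dt.read == 1) {
            *word = dt.data;
            return 0;
        }
    }
}
```

Because every attempt is a separate system call that returns quickly, the scheduler can run other tasks between attempts, which is what the soft-lockup message asks for.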

How to cut libtheora and send the data to the Theora driver

You need to edit dct_decode.c from libtheora. First, open the driver: pf = open("/dev/theora", O_RDWR); The function write_theoradriver(int pf, int data) that I implemented sends one word to the driver. All the data must be sent and received in the correct sequence. Be careful with this: if a single word is not sent or read, the whole decoding pipeline can stall. You can also read the data back in order to compare the output.
dct_decode2.c
codec_internal.h (some small changes in this file)

8 - Video Controller

figure 8

The controller consists of a YUV-to-RGB converter and a video signal generator that sends the signal to a D/A converter.
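As an illustration of what the YUV-to-RGB stage computes, here is a common integer form of the ITU-R BT.601 limited-range conversion. This is an assumption for illustration only; the coefficients and rounding used in Leonardo Piga's controller may differ:

```c
#include <stdint.h>

/* Scale a fixed-point (x256) intermediate back to 8 bits and clamp to
 * 0..255. Negative values are clamped first, so nothing negative is
 * ever right-shifted. */
static uint8_t clamp_shift8(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* BT.601 limited-range YCbCr -> RGB with 8-bit fixed-point weights
 * (1.164, 1.596, 0.391, 0.813, 2.018 scaled by 256). */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y - 16, d = u - 128, e = v - 128;

    *r = clamp_shift8(298 * c + 409 * e + 128);
    *g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
    *b = clamp_shift8(298 * c + 516 * d + 128);
}
```

With this formula Y = U = V = 0 gives (0, 135, 0), a dark green, which matches the green band described in section 9.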

Lancelot board

It is a video D/A converter board, and it is necessary because the Stratix II board does not have one.
You should read its manual.

9 - Integration Video Controller and Hardware Theora

figure 9

Leonardo Piga built a video controller and attached it to NIOS. I worked on attaching this video controller to my LEON-Theora integration and ran into some problems, which I describe below.

Changes

dct_decode: The difference between this dct_decode.c and dct_decode2.c is that we no longer need to receive the outputs of ReconRefFrames and compare them with the software; we only need to send the pre-decoded data to ReconRefFrames. In addition, we need to send the height and the width, because the video controller requests them.
You can see my dct_decode.c and dump_video.c here (nothing is printed to a file any more; the data are transmitted to theora_amba_interface).

Hierarchy of the modules: theora_hardware now contains ReconRefFrames and the video controller. Some adaptations were necessary (theora_apb.vhd, theora_amba_interface.vhd, theora_hardware.vhd ...). You can download all these modules here: theora_apb.tar

Pins of Lancelot: you will need to connect all the Lancelot pins to the LEON system. My new top level is leon3mp.vhd, and my pin assignment file is leon3mp.qsf

Memory Problems

At first, this system (LEON + Theora_Hardware + video controller) used so much internal memory that I could not fit it in my FPGA. There were some bugs in the video controller and the buffer did not need to be that large, but when I changed its size (for a 96x80 video), the system did not run. These bugs are now solved, but I can only decode videos at 96x80 resolution. This will be solved once the external memory is implemented.
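A rough estimate of why the resolution is bounded by internal memory: a sketch computing frame storage, assuming 8-bit 4:2:0 frames and three frame buffers (the current frame plus the previous and golden reference frames that Theora predicts from). Borders, line buffers and the video controller's buffer are ignored, so the real requirement is higher; the function names are hypothetical:

```c
#include <stddef.h>

/* Bytes in one 8-bit 4:2:0 frame: a full-resolution Y plane plus
 * U and V planes subsampled by 2 horizontally and vertically. */
static size_t frame_bytes_420(size_t width, size_t height)
{
    return width * height + 2 * (width / 2) * (height / 2);
}

/* Rough lower bound for the decoder's frame storage, assuming three
 * frames (current reconstruction, previous and golden reference). */
static size_t decoder_frame_bytes(size_t width, size_t height)
{
    return 3 * frame_bytes_420(width, height);
}
```

For 96x80 this gives 34,560 bytes, while 640x480 would need 1,382,400 bytes, 40 times the 96x80 figure.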

Cross-clock domain

I had some difficulty attaching it to LEON because of hardware constraints. The video controller runs at 25 MHz, but the LEON system runs at 50 MHz. A simple clock divider was not enough, because synthesis reported cross-clock-domain problems in the timing analysis: the video controller (25 MHz) needs to receive data from a 50 MHz module, which caused clock skew problems. The solution was simple: I changed some parameters of the LEON system's PLL. The PLL (phase-locked loop) is basically a closed-loop frequency control system that generates the clocks of LEON and the SDRAM with adjusted phase, and I added a new clock there with the correct parameters, as in /grlib/lib/techmap/clocks/clkgen_altera_mf.vhd:
clkgen_altera_mf.vhd

An 8-pixel green band below the video

dump_video adds an 8-pixel green band below the video: if I decode a 96x72 video, I get a 96x80 video. Something like:
Ogg logical stream 583c6ca0 is Theora 96x80 29.97 fps video
Encoded frame content is 96x72 with 0x0 offset

Theora encodes the frame in whole 16x16 macro blocks, so both the width and height must be a multiple of 16. When the actual video content is not a multiple of 16, it is expanded to one and a clipping rectangle is stored in the header (that's the "Encoded frame content..." message). dump_video does not crop the output down to the actual size of this rectangle, but outputs the entire expanded frame. The encoder by default stores zeros in this part of the frame, so that's why it looks green.
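A sketch of the cropping step that dump_video skips: round_up_16 gives the coded size, and crop_plane copies the visible picture out of the padded plane. Both names are hypothetical, and the sketch assumes the picture offsets are counted in the same row order as the buffer (check how your libtheora version reports them):

```c
#include <stddef.h>
#include <string.h>

/* Round a dimension up to the next multiple of 16 (one macro block). */
static unsigned round_up_16(unsigned n)
{
    return (n + 15u) & ~15u;
}

/* Copy the visible w x h picture at offset (x, y) out of a padded plane
 * whose rows are 'stride' bytes apart; dst receives w bytes per row. */
static void crop_plane(const unsigned char *src, size_t stride,
                       unsigned x, unsigned y, unsigned w, unsigned h,
                       unsigned char *dst)
{
    unsigned row;

    for (row = 0; row < h; row++)
        memcpy(dst + (size_t)row * w, src + (size_t)(y + row) * stride + x, w);
}
```

For the example above, round_up_16(72) is 80, and cropping the luma plane to 96x72 (and each 4:2:0 chroma plane to 48x36) at offset 0x0 would remove the green band.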

Small purple dots in the video

In the first tests there were small purple dots in the image; the fix was a phase shift on the video controller clock (25 MHz). We think the video controller memory was being read in the same phase as it was being written.

ffmpeg2theora

Here you can find ffmpeg2theora, which lets you change the resolution, the start and end points, and several other very useful settings that you will certainly need when testing videos.

Demonstration

I recorded a demonstration of this integration, up to the video controller, and it is on YouTube. Click here to see the video

This video shows the sequence:

- Program the board (Stratix II) with LEON3 (via USB Blaster, JTAG);

- Load the Linux image on LEON3 (using GRMON via USB Blaster, JTAG). The unknown device is the Theora hardware; its name "Theora Hardware" is not shown because I am using an evaluation version of GRMON;

- Open the kermit interface and set the configuration (via the serial interface);

- Run the Linux kernel (using GRMON via USB Blaster, JTAG);

- Go back to kermit and you will see a Linux console (via the serial interface). Here you can see the "LEON Theora driver" being recognized by Linux;

- Now I go to the home directory and run dump_video (./dump_video ronaldinho9680.ogg);

My current FPGA programming file: leon3mp.sof
My current Linux kernel image (with the Theora driver and dump_video included and compiled): image.dsu

The picture is small because the buffer multiplexed with an external memory (SRAM) has not been implemented yet, so we can only use the few blocks of internal FPGA memory.
There is basically one problem in this demonstration. On NIOS, the video ran very slowly, almost 7 times slower than real time. On my LEON system it is still slow, but only 5 times slower, so the performance is a little better than on NIOS: a 15-second video takes 75 seconds to decode.

I discovered that the problem was Linux! Without Linux (as is done on NIOS), I can decode much faster than real time.
The problem is the Linux system calls, because I call the driver for each word I want to send. The solution would be to copy a block of words to the driver at once, but this is not implemented yet.
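A sketch of the user-space side of that idea, assuming a driver extended to accept a whole block per call (which does not exist yet, as noted above). Words are buffered and handed over BATCH_WORDS at a time through a flush callback; the names, the block size and the callback are hypothetical, and count_flush is only a stand-in for host-side testing:

```c
#include <stddef.h>

#define BATCH_WORDS 256 /* hypothetical block size */

/* Hands a whole block to the driver in one call; returns 0 on success. */
typedef int (*flush_fn)(const int *words, size_t count, void *ctx);

struct word_batch {
    int words[BATCH_WORDS];
    size_t count;
    flush_fn flush;
    void *ctx;
};

/* Send everything buffered so far as one block. On failure the words
 * stay buffered so that the caller can retry. */
static int batch_flush(struct word_batch *b)
{
    int rc;

    if (b->count == 0)
        return 0;
    rc = b->flush(b->words, b->count, b->ctx);
    if (rc == 0)
        b->count = 0;
    return rc;
}

/* Queue one word; a system call happens only when the buffer is full. */
static int batch_put(struct word_batch *b, int word)
{
    if (b->count == BATCH_WORDS) {
        int rc = batch_flush(b);
        if (rc != 0)
            return rc; /* buffer still full, word not queued */
    }
    b->words[b->count++] = word;
    return 0;
}

/* Stand-in flush target for testing: counts calls and words. */
struct flush_stats {
    size_t calls;
    size_t words;
};

static int count_flush(const int *words, size_t count, void *ctx)
{
    struct flush_stats *st = ctx;

    (void)words;
    st->calls++;
    st->words += count;
    return 0;
}
```

With 256-word blocks, the number of system calls drops by a factor of about 256 compared with one call per word; the last partial block must be sent with an explicit batch_flush at the end of the frame.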
Without Linux, a 30-second video encoded with the best quality (ffmpeg2theora -x 96 -y 80 -v 10 -e 30 original.ogg -o video.ogg) is decoded in 25 seconds. I describe this other implementation below.

 

10 - LEON3 and Hardware Theora without LINUX



Demonstration


Now we have two software variants (the hardware is the same, LEON + Theora_hardware):

[a] Theora on Linux: here we have a .ogg file that can be decoded on the FPGA and shown on a monitor, but it cannot be decoded in real time. I suppose the problem is the Linux system calls, but I don't yet know how it could be solved.

[b] Theora without Linux: here we can decode much faster than real time. But without Linux we cannot use a .ogg file directly; we need to convert it into a vector and compile it with dump_video (Leonardo also needs to do this on NIOS).


Most importantly, we now have complete Theora decoding on an FPGA without NIOS or any other proprietary module: you put in an Ogg video and see it on a monitor.

We are working at a resolution of 96x80; the video on YouTube was duplicating the output pixels. The next step will be to connect system [b] to the SRAM memory in order to increase the resolution.

You can download the latest version of the Theora hardware and the integration from SVN: https://trac.xiph.org/browser/trunk/theora-fpga

 

11 - Memory Controller for the Multiplexed Memory

figure 10

NOT IMPLEMENTED

 

12 - Full Integration

figure 11

 

Final consideration

[to complete]

Timing analysis

[to complete]

 
