####_Theora Bitstream Specification and Reference Decoder -- Theora 1.0
##Document version: 0.074

#/(c) 2003 Xiph Foundation, Dan Miller

#/special thanks to various people at On2 Technologies for donating VP3, and to the folks at Xiph.org Foundation for doing good things with it.  Particular shout outs to Mike Melansen, Tim Teriberry, Ralph Giles, Christopher Montgomery, Mauricio Piacentini, and Jack Moffit (please let me know if I spelled your name wrong..)

#/This documentation and software are distributed under the terms of the BSD-style Theora license, as outlined in the "COPYING" file in the root directory of this distribution.  Please familiarize yourself with these terms before distributing any of the documents in this directory.

###_Introduction

#Theora is a video codec. This is the spec. 

##_State of this document

#This document is unfinished.  It is a work in progress that is included in the Theora release for your edification.  Once it is complete and verified, it will hopefully become the canonical reference for Theora compatibility.

#The most glaring deficiency is that presently the documentation and Python decoder only cover keyframes.  We are actively looking for volunteers to help us complete the work of fully documenting the Theora format.  If you are interested and believe you can help, please post a message to theora-dev@xiph.org, or contact jack@xiph.org or danbmil99@yahoo.com.

#In addition to support for predicted frames, the Python script needs to be heavily error-protected.  Presently various illegal bitstreams will cause the program to go into an endless loop or crash in other unpredictable ways.  The specification should eventually be usable as a syntax checker that is able to detect and report on any possible error condition within the bitstream.  In addition to work on the code, we will need to generate a comprehensive test suite of encoded files that exercise every aspect of the bitstream.  Illegal files will also need to be generated to test the error handling capacity of the script.

##_About this document

#This document is The Theora Bitstream Specification and Reference Decoder. It is both an english language description of the Theora video bitstream, and an interpretable program in the Python programming language, which can be executed using a version 2.0 or later Python interpreter.  More information about Python can be gleaned at www.python.org. When run as a Python script, this document will decode compliant Theora bitstreams, producing uncompressed YUV files.  The YUV output format is identical to that used by theora/win32/experimental/dumpvid/dump_vid.exe, which is as follows:

#For each frame, Y pixel data is output as unsigned characters, scanning left to right for each row of pixels starting with the top row.  The Y data is followed by interleaved V and U data, ie one scanline of V followed by one scanline of U.  This format allows raw data to be inspected visually as a monochrome image, with the V and U planes situated directly below the Y plane.

#To invoke this document as a Python program: at the command prompt, type:
#*[prompt]> python spec.py testspec.ogg out.raw
#The program will attempt to open and decode 'testspec.ogg'.  The raw decoded file will be saved as "out.raw"

##_Formatting Conventions

#There are three versions of this document: text, HTML, and Python.  The HTML and Python versions are created from the text file, spec.txt.  While spec.txt is actually a legal Python script, the formatted version spec.py is more readable as the comments have been reformatted with linefeeds and formatting characters have been stripped.  To create the HTML version spec.html, a simple hinting scheme is used to specify some rudimentary formatting (see txt2html.py for details).  The following commands will regenerate both files from the source file:

#*[prompt]> python txt2py.py spec.txt spec.py
#*[prompt]> python txt2html.py spec.txt spec.html

#In the HTML version, non-code sections are generally formatted using a variable-width font to distinguish them from code.  Code and non-code can be interspersed, even within Python routines.  In general, non-code blocks of text within routines will be italicized.  

##_Overview

#This document represents part of the overall specification package. The specification package includes the following elements: 

#* * Theora Bitstream Specification and Reference Decoder (this document) 

#* * Verification Bitstreams (compressed data) 

#* * Verification Output Streams (uncompressed YUV data)

##/(Note: the following paragraph will not be considered in effect until this document is completed and verified - Ed.)

#A decoding application is Theora compliant IFF it can decode any Theora bitstream that is decodable with the Reference Decoder (this document), producing output that is bytewise equivalent. Several Verification Bitstreams and Verification Output Streams are included in the Specification Package for reference; the Verification Output Streams were produced using this document as a Python script. However, it is understood that a compliant decoder must be capable of decoding any legal Theora bitstream, not just those included in the Specification Package.  A legal Theora bitstream is any bitstream that can be decoded by this document.

#Please note: the Python code herein is written solely to facilitate the definition and compliance testing of Theora bitstreams.  It is by design an extremely inefficient and poorly structured piece of code.  Do not use this as a template for real-world player applications.  Instead, start with the C-based Theora decoder available for download at www.theora.org.

#The Theora Bitstream Specification does not cover encoders, except for the following sentence:

#_/An encoder is Theora compliant if it can produce a compressed bitstream that the Verification Decoder can decode.

#Note that this could be an application that produces nothing but black frames. The quality and scope of a Theora compliant encoder are entirely defined by the application domain for that product. If it produces a syntactically correct Theora bitstream, it is by definition a Theora encoder. The rest is up to you.


###_Theora Bitstream Specification
#A Theora bitstream consists of header packet(s), followed by video packets. A decoder must receive all valid header packets before playing video. (note that this means if you wish to play a Theora stream from an arbitrary point, you need a mechanism to acquire the header information for that stream before commencing playback).

#Before we define our first routine, a little housekeeping for Python:

from array import array
from sys import exit
import sys
import ogg2

#/usage:

if len(sys.argv) < 3:
  print "usage: python spec.py infile outfile"
  exit()
#/some globals & useful definitions:

oggfile = file(sys.argv[1],"rb")
outfile = file(sys.argv[2],"wb")
oggsync = ogg2.OggSyncState()

streams = {}
streamid = None
page = None
packet = None

huffs = []                                            #this will contain list of huffman trees
infoflag = 0                                          #initialization flags
tableflag = 0


print
print "THEORA SPEC PYTHON SCRIPT"
print "Test: decoding first frame of", sys.argv[1]

def output(ret) :
  global outfile

  if type(ret) == str :                               #if it's a header packet,
    print "header packet type:", ret                  #print the type (info, comment, tables)
    if ret == "info":
      print "  version:", version_major, version_minor, version_subminor 
      print "  encoded width:", encoded_width
      print "  encoded height:", encoded_height
      print "  decode width:", decode_width
      print "  decode height:", decode_height
      print "  X offset:", offset_x
      print "  Y offset:", offset_y
      print "  fps:", fps_numerator, "/", fps_denominator
      print "  aspect:", aspect_numerator, "/", aspect_denominator
      print "  colorspace:",
      if colorspace == 0:
        print "  not specified"
      elif colorspace == 1:
        print "  ITU 601"
      elif colorspace == 2:
        print "  CIE 709"
      else:
        print "  colorspace type not recognized"
      print "  target bitrate:", bitrate
      print "  target quality:", quality
      outfile.write('YUV4MPEG2 W%d H%d F%d:%d Ip A%d:%d\n' % \
                    (decode_width, decode_height, \
                     fps_numerator, fps_denominator, \
                     aspect_numerator, aspect_denominator))
    elif ret == "comment":
      print "  vendor string:", vendor_string
      print "  comment length:", comment_string_len
    elif ret == "table":
      print "  tables loaded"
  else:
    print "frame decoded"
    Y, U, V = ret
    outfile.write('FRAME\n')
    outfile.write(array('B', Y + U + V))


##_Bitstream parsing routines

#Bit & byte ordering:  typically, Ogg packs bits starting with the most significant bit to the least.  For historical reasons, Theora packs bits least significant bit first.  In cases where values are byte-aligned (8 bit boundaries), this only affects byte ordering.  We do some Ogg parsing but it happens to be only on byte-aligned values, so we use readbits(8) but reverse the byte order (See for instance readOgg32())

#Note that in this specification we are assuming bytes are always 8 bit values.  Future versions may support non-8-bit platforms.

#/this routine just grabs a byte from the input stream:

#/These are used during the bulk of Theora stream parsing:

def spec_err(msg):
  print 'Error: %s' % msg
  exit()

#/readbits: our workhorse.  Gets up to 32 bits from the stream 

def readbits(x):
  global buffer
  return buffer.read(x)

#/readstring reads a string of 8-bit unsigned chars:

def readstring(x) :
  global buffer
  s = ''
  for i in range(x):
    byte = readbits(8)
    if byte != None :
      s += chr(byte)
  return s

#readOgg32 reads a longword Ogg style.  ONLY GOOD FOR BYTE-ALIGNED READS! (should fix)

def readOgg32():                                      #different than readbits(32): byte order is reversed
  return readbits(8) + (readbits(8) << 8) + (readbits(8) << 16) + (readbits(8) << 24)

def get_page() :
  global page, oggsync
  while page == None :
    new = oggsync.input(oggfile)
    page = oggsync.pageout()
    if new == 0 : break
    

##_entropy coding routines

#Certain values in Theora (such as DCT coefficients) are encoded using a context-sensitive Huffman scheme based on 32 possible token values.  Each token value has an associated set of extra bits that are bitpacked immediately following the primary huffman string.  The binary decision trees (80 of them) necessary for decoding are in the table header.

##Set up the Huffman tables

#Huffman tables are encoded in compressed form using the following algorithm:

#/Note how this function is called recursively for each possible branch in the tree until all branches have bottomed out with complete bitstrings:

hufftokens=0                                          #keep track of # of token strings -- 32 max

def read_hufftable(table):
  global hufftokens
  if readbits(1):                                      #if bit==1, this bitstring is complete
    table.append( readbits(5) )                       #next 5 bits = token number for this string
    hufftokens += 1
    if hufftokens > 32:
      spec_err("illegal huffman table, > 32 tokens")
  else:                                               #if bit was zero, we have two more entries defining
                                                      #the zero and one case for the next bit:
    table.append([])                                  #add another pair of tables
    table.append([])
    read_hufftable(table[0])                          #with an entry for zero
    read_hufftable(table[1])                          #and one for one

##read a token

#/Again, we use recursion to parse the bits until we have a complete string:

def readtoken(huf):
  if type(huf[0]) == type(0):                         #integer means we have a value
    return huf[0]                                     #return token value
  else:
    if readbits(1):                                   #read a bit, recurse into subtable 0 or 1
      return readtoken(huf[1])                        #case for bit=1
    else:
      return readtoken(huf[0])                        #case for bit=0

##define an array of information tables for each token

#/Each table contains the following five entries in this order:

#*#  base run length
#*#  number of extra run length bits (0 - 12)
#*#  base value
#*#  number of extra value bits (0 - 9)
#*#  number of extra sign bits (0 or 1)

token_array = [    
[1, 0, 'eob', 0, 0], 
[2, 0, 'eob', 0, 0], 
[3, 0, 'eob', 0, 0], 
[4, 2, 'eob', 0, 0], 
[8, 3, 'eob', 0, 0], 
[16, 4, 'eob', 0, 0], 
[0, 12, 'eob', 0, 0], 
[0, 3, 0, 0, 0], 

[0, 6, 0, 0, 0], 
[0, 0, 1, 0, 0], 
[0, 0, -1, 0, 0], 
[0, 0, 2, 0, 0], 
[0, 0, -2, 0, 0], 
[0, 0, 3, 0, 1], 
[0, 0, 4, 0, 1], 
[0, 0, 5, 0, 1], 

[0, 0, 6, 0, 1], 
[0, 0, 7, 1, 1], 
[0, 0, 9, 2, 1], 
[0, 0, 13, 3, 1], 
[0, 0, 21, 4, 1], 
[0, 0, 37, 5, 1], 
[0, 0, 69, 9, 1], 
[1, 0, 1, 0, 1], 

[2, 0, 1, 0, 1], 
[3, 0, 1, 0, 1], 
[4, 0, 1, 0, 1], 
[5, 0, 1, 0, 1], 
[6, 2, 1, 0, 1], 
[10, 3, 1, 0, 1], 
[1, 0, 2, 1, 1], 
[2, 1, 2, 1, 1] ]

##parse the tokens

#/this function returns a run length & value based on the token & extended bits:

def parsetoken(huf):
  global token_array                                  #(not strictly necessary for read-only in Python)
  token = readtoken(huf)                              #read a token from the stream

  table = token_array[token]                          #get our table of parameters for this token
  run = table[0]                                      #base run length
  run_extra = table[1]                                #number of extra bits for run length
  value = table[2]                                    #actual value
  val_extra = table[3]                                #number of extra value bits
  sign_extra = table[4]                               #number of sign bits

  sign = 1
  if sign_extra:
    if readbits(1):
      sign = -1                                       #if there's a sign bit, get it.  1 means negative
                                                      #note that value may be negative to begin with, in
                                                      #which case there are no extra value or sign bits
  if val_extra:
    value += readbits(val_extra)                      #get extra value bits

  if run_extra:
    run += readbits(run_extra)                        #get extra run bits
  
  return [run, value * sign]                          #return run length and value
                                                      #note: string * 1 = string, so 'eob', 'zrl' are OK


##_routines to read & parse codec headers

#/routine to parse the Theora info header:

def read_info_header():
  global huffs, encoded_width, encoded_height, decode_width, decode_height, offset_x, offset_y
  global fps_numerator, fps_denominator, aspect_numerator, aspect_denominator, quality, bitrate
  global version_major, version_minor, version_subminor, colorspace, infoflag

  version_major = readbits(8)                         #major & minor version must be exact match
  if version_major != 3:
    spec_err("incompatible major version#")
  version_minor = readbits(8)
  if version_minor != 2:
    spec_err("incompatible minor version#")
  version_subminor = readbits(8)

  encoded_width = readbits(16) << 4                   #encoded width & height are in block units of 16x16
  encoded_height = readbits(16) << 4

  decode_width = readbits(24)                         #decode width & height are in actual pixels
  decode_height = readbits(24)
  offset_x = readbits(8)                              #offset for cropping if decode != full encoded frame
  offset_y = readbits(8)

  fps_numerator = readbits(32)                        #frames per second encoded as a fraction
  fps_denominator = readbits(32)
  aspect_numerator = readbits(24)                     #aspect not used now
  aspect_denominator = readbits(24)
  colorspace = readbits(8)                            #colorspace flag defines YUV to RGB mapping
  bitrate = readbits(24)                              #target bitrate; not used for decode
  quality = readbits(6)                               #target quality also not used for decode
  keyframe_granulepos_shift = readbits(5)             #P-frame granulepos field size; not used for decode
  readbits(5)                                         #5 bits set aside for future use

  infoflag = 1

#/parse the comment header:

def read_comment_header():
  global vendor_string, vendor_string_len, comment_string, comment_string_len

  vendor_string_len = readOgg32()
  vendor_string = readstring(vendor_string_len)
  comment_string_len = readOgg32()
  comment_string = readstring(comment_string_len)


#/read & parse the table header:

def read_table_header():
  global scale_table_AC, scale_table_DC, Y_quantizer, UV_quantizer, IF_quantizer
  global frequency_counts, hufftokens, tableflag

  scale_table_AC = []                                             #64 possible quantizer scalers for AC coeffs
  for x in range(64):    scale_table_AC.append(readbits(16))

  scale_table_DC = []                                             #64 possible quantizer scalers for DC coeffs
                                                                  #Note this is unrelated to 64 coeffs in an 8x8 block!
  for x in range(64):    scale_table_DC.append(readbits(16))

  Y_quantizer = []                                                #quantizers for intra Y coeff (this IS about 8x8 blocks!)
  for x in range(64):    Y_quantizer.append(readbits(8))

  UV_quantizer = []                                               #quantizers for intra U or V coeff
  for x in range(64):    UV_quantizer.append(readbits(8))

  IF_quantizer = []                                               #quantizers for interframe coeffs (Y, U, or V)
  for x in range(64):    IF_quantizer.append(readbits(8))

  for x in range(80):                                             #Read in huffman tables
    huffs.append([])
    hufftokens=0
    read_hufftable(huffs[x])

  tableflag = 1

def decode_header():

  header_type = readbits(7)

  cid = readstring(6)
  if cid != "theora":
    spec_err("not a theora stream header %s", cid)

  if header_type == 0:
    read_info_header()
    return "info"

  elif header_type == 1:
    read_comment_header()
    return "comment"

  elif header_type == 2:
    read_table_header()
    return "table"

  else:
    print "unknown stream header type -- skipping"
    return "unknown"

##_routine to process a new stream

def decode_new_stream() :
  global page, stream, buffer, streams, streamid

  codecs = (("\x80theora", "Theora"), ("\x01vorbis", "Vorbis"),
            ("Speex", "Speex"), ("fLaC", "FLAC"), ("\x00writ", "Writ"))

  serialno = page.serialno
  temp_stream = ogg2.OggStreamState(serialno)
  temp_stream.pagein(page)
  page = None
  packet = temp_stream.packetout()
  if packet == None :
    spec_err('first page of a bitstream must contain one complete packet')
  buffer = ogg2.OggPackBuffB(packet)
  magic = readstring(7)
  for c in codecs:
    if magic[:len(c[0])] == c[0] :
      streams[serialno] = c[1]
      if c[1] == "Theora" :
        if streamid == None :
          print "Theora stream found (serialno %d), will decode" % serialno
          stream = temp_stream
          streamid = serialno
          read_info_header()
          output("info")
        else :
          print "another Theora stream found (serialno %d), will ignore" % serialno
      else :
        print "%s stream found (serialno %d), will ignore" % (c[1], serialno)
      break


##_ Routines that decode video

#/[NOTE: for now, these routines only handle keyframes.  We may modify or add routines to support interframe data]
#/[NOTE: each frame of video resides in a single Ogg page.]
#/[NOTE: that's crap.  each frame of video resides in a logical Ogg packet.  pagination is irrelevant.]

##Hilbert ordering

#All data in Theora is organized into 8x8 blocks.  When encoding the data, these blocks are further grouped into 'super-blocks' of 16 blocks each, and encoded in 'Hilbert' order.  Each super-block consists of up to 16 blocks encoded in the following order:

#*      X -> X    X -> X
#*           |    ^
#*           v    |
#*      X <- X    X <- X
#*      |              ^
#*      v              |
#*      X    X -> X    X
#*      |    ^    |    ^
#*      v    |    v    |
#*      X -> X    X -> X

#(thanks to Mike Melanson for the diagram)

#If any block is not coded (due to clipping, for instance -- encoded images can include partial super-blocks) the pattern continues until the next coded block is hit.  Each pixel plane -- Y, V, and U -- are encoded using their own pattern of superblocks.

#By way of example: if a plane consisted of 32x16 pixels, only the top half of the Hilbert pattern would be used.  If the 8 8x8 blocks in this example are labled in this way:

#*    A B C D
#*    E F G H

#then they will be encoded in the following order: A, B, F, E, H, G, C, D.


#/the following array & function are used to 'de-hilbertize' the data.  [note: this is a keyframe-only routine right now]

hilbert = [    
[0,0], [1,0], [1,1], [0,1], 
[0,2], [0,3], [1,3], [1,2], 
[2,2], [2,3], [3,3], [3,2], 
[3,1], [2,1], [2,0], [3,0] ]

#/note the use of Python style integer division, //
#/(this is important as Python's / operator behaves differently in various versions!)

def de_hilbert(w, h, colist):                               #width, height, coefficient list
  sbw = (w+3) // 4                                          #super-block width (width in sb's)
  sbh = (h+3) // 4                                          #super-block height (height in sb's)
  ii = 0 
  comap = []

  for x in range(w):                                        #initialize coefficient map
    comap.append([])
    for y in range(h):
      comap[x].append([])

  for y in range(sbh):
    for x in range(sbw):
      for i in range(16):
        p, q = hilbert[i]                                   #nice Python syntax
        xx = x*4 + p
        yy = y*4 + q
        if (xx < w) & (yy < h):                             #skip stuff out of range
          comap[xx][yy] = colist[ii]                        #if in range, get a coeff
          ii += 1
  return comap

##decoding the DC coefficients

#The DC values are simply the zero-order coefficients of each 8x8 block.  These values tend to have more entropy than most AC components, so in addition to quantization, it is desirable to use delta coding to reduce the data requirement of encoding them.

#Theora uses a scheme where each encoded DC value is in fact a difference between a predicted value and the actual value.  Since the blocks are coded in raster order, the predicted value can be any combination of DC values of blocks to the left, up, upper left, and upper right.

#/routine to do DC prediction on a single color plane:

def DCpredict(comap, w, h):

#/first row is straight delta coding:

  l = 0                                                     #l = DC coeff to my left; init to zero
  for x in range(w):
    l += comap[x][0][0]
    comap[x][0][0] = l
      
#/now the rest:

  for y in range(h):
    for x in range(w):
      if y>0:                                               #already got the first row
        if x == 0:                                          #left column?
          u = comap[0][y-1][0]                              #u = upper
          ur = comap[1][y-1][0]                             #ur = upper-right
          p = u
          comap[0][y][0] += p                               #add predictor to decoded value

        else:                                               #general case -- neither left column nor top row
          l = comap[x-1][y][0]                              #l = left
          ul = comap[x-1][y-1][0]                           #ul = upper-left
          u = comap[x][y-1][0]                              #u = upper
          p = (l*29 + u*29 - ul*26)                         #compute weighted predictor
          if p < 0:
            p += 31                                         #round towards zero
          p >>= 5                                           #shift by weight multiplier
          if abs(p-u) > 128:                                #range checking
            p = u;
          if abs(p-l) > 128:
            p = l;
          if abs(p-ul) > 128:
            p = ul;

          comap[x][y][0] += p                               #add predictor to decoded value


##Dequantization logic

#/(thanks again to Mike Melanson for this exposition)

#_Setting Up The Dequantizers

#Theora has three static tables for dequantizing fragments-- one for intra Y fragments, one for intra C fragments, and one for inter Y or C fragments.  In the following code, these tables are loaded as Y_quantizer[], UV_quantizer[], and IF_quantizer[] (see definition for read_table_header below).  However, these tables are adjusted according to the value of quality_index.

#quality_index is an index into 2 64-element tables: scale_table_DC[] and scale_table_AC[]. Each dequantizer from the three dequantization tables is adjusted by the appropriate scale factor according to this formula:

#*    Scale dequantizers:
     
#*        dequantizer * sf
#*        ----------------
#*              100
     
#*    where sf = scale_table_DC[quality_index] for DC dequantizer
#*               scale_table_AC[quality_index] for AC dequantizer

#/(Note that the dequantize routine also multiplies each coefficient by 4.  This is to facilitate the iDCT later on.)

def dequant(data, scaleDC, scaleAC, dqtable):
  mul = ( (scaleDC * dqtable[0]) // 100 ) * 4
  for x in range(64):
    if x>0:
      mul = ( (scaleAC * dqtable[x]) // 100 ) * 4
    data[x] *= mul

##ZIG-ZAG order

#The coefficients in each 8x8 DCT coded block are layed out in 'zig-zag' order.  

#/The following table shows the order in which the 64 coefficients are coded:

zigzag=[  
  0,  1,  5,  6,  14, 15, 27, 28,
  2,  4,  7,  13, 16, 26, 29, 42,
  3,  8,  12, 17, 25, 30, 41, 43,
  9,  11, 18, 24, 31, 40, 44, 53,
  10, 19, 23, 32, 39, 45, 52, 54,
  20, 22, 33, 38, 46, 51, 55, 60,
  21, 34, 37, 47, 50, 56, 59, 61,
  35, 36, 48, 49, 57, 58, 62, 63]

#/this routine remaps the coefficients to their original order:

def unzig(data,un):
  for x in range(64):
    un[x] = data[zigzag[x]]

##inverse DCT (iDCT)

#Once coefficients are re-ordered and dequantized, the iDCT is performed on the 8x8 matrix to produce the actual pixel values (or differentials for predicted blocks).

#Theora's particular choice of iDCT computation involves intermediate values that are calculated using 32-bit, fixed-point arithmetic.  Multiplication is defined as follows (assuming 32-bit integer parameters):

def Mul_fix(a, b):
  return (a * b) >> 16

#addition and subtraction can be performed normally.

#/First we define a one-dimensional iDCT.  Note that there are many implementations of iDCT available.  Theora compatibility requires that the output of your iDCT routine be bitwise equivalent to the one outlined here:

def idct_1D(data, i, stride):

  A = Mul_fix(64277, data[i+stride]) + Mul_fix(12785, data[i+7*stride])
  B = Mul_fix(12785, data[i+stride]) - Mul_fix(64277, data[i+7*stride])
  C = Mul_fix(54491, data[i+3*stride]) + Mul_fix(36410, data[i+5*stride])
  D = Mul_fix(54491, data[i+5*stride]) - Mul_fix(36410, data[i+3*stride])
  A2 = Mul_fix(46341, A - C)
  B2 = Mul_fix(46341, B - D)
  C2 = A + C
  D2 = B + D
  E = Mul_fix(46341, data[i] + data[i+4*stride])
  F = Mul_fix(46341, data[i] - data[i+4*stride])
  G = Mul_fix(60547, data[i+2*stride]) + Mul_fix(25080, data[i+6*stride])
  H = Mul_fix(25080, data[i+2*stride]) - Mul_fix(60547, data[i+6*stride])
  E2 = E - G
  G2 = E + G
  A3 = F + A2
  B3 = B2 - H
  F2 = F - A2
  H2 = B2 + H

  data[i] = G2 + C2
  data[i+stride] = A3 + H2
  data[i+2*stride] = A3 - H2
  data[i+3*stride] = E2 + D2
  data[i+4*stride] = E2 - D2
  data[i+5*stride] = F2 + B3
  data[i+6*stride] = F2 - B3
  data[i+7*stride] = G2 - C2

#2D iDCT is performed first on rows, then columns (note order can affect lower bits!)

#/(Note that we dequantized all coefficients to 4 times their real value.  Since each coefficient is run through the iDCT twice (horizontal & vertical), the final values must be divided by 16.)

def idct(data):
  for y in range(8):
    idct_1D(data, y*8, 1);

  for x in range(8):
    idct_1D(data, x, 8);

  for x in range(64):
    data[x] = (data[x] + 8) >> 4                            #add for rounding; /16 (remember dequant was *4!)

##loop filter

#the Theora loop filter is run along every horizontal and vertical edge between blocks where one of the blocks is coded.  In the keyframe case, this means every edge except the borders of the frame.  For predicted frames, the only edges that are not filtered are those between two uncoded blocks (because they were filtered at some point previously, when the block was originally reconstructed)...

#/clamping routine

def clampit(n, lo, hi):
  if n < lo:
    return lo
  if n > hi:
    return hi
  return n

#/first, define an array of quality-dependent filter parameters:

loopfilter_array = [
  30, 25, 20, 20, 15, 15, 14, 14,
  13, 13, 12, 12, 11, 11, 10, 10,
  9,  9,  8,  8,  7,  7,  7,  7,
  6,  6,  6,  6,  5,  5,  5,  5,
  4,  4,  4,  4,  3,  3,  3,  3,
  2,  2,  2,  2,  2,  2,  2,  2,
  0,  0,  0,  0,  0,  0,  0,  0,
  0,  0,  0,  0,  0,  0,  0,  0 ]

#/next, we define a step function with five linear segments:

def loopfunc(n):
  global loopfilter_array
  global quality_index                                      #from the frame header
  K = loopfilter_array[quality_index]
  if (n <= -K*2) | (n >= K*2):
    return 0
  elif (n > -K*2) & (n < -K):
    return -2*K - n
  elif (n >= -K) & (n <= K):
    return n
  else:
    return 2*K - n

#/now the 1D filter functions:

def filter_horiz(pixmap, x, y):
  for i in range(8):

    A = pixmap[x-2][y+i]
    B = pixmap[x-1][y+i]
    C = pixmap[x][y+i]
    D = pixmap[x+1][y+i]

    A = clampit(A, -128, 127)
    B = clampit(B, -128, 127)
    C = clampit(C, -128, 127)
    D = clampit(D, -128, 127)

    N = ( 4+(A - B*3 + C*3 - D)) >> 3
    delta = loopfunc(N)

    pixmap[x-1][y+i] = B + delta
    pixmap[x][y+i] = C - delta

def filter_vert(pixmap, x, y):

  for i in range(8):

    A = pixmap[x+i][y-2]
    B = pixmap[x+i][y-1]
    C = pixmap[x+i][y]
    D = pixmap[x+i][y+1]

    A = clampit(A, -128, 127)
    B = clampit(B, -128, 127)
    C = clampit(C, -128, 127)
    D = clampit(D, -128, 127)

    N = ( 4+(A - B*3 + C*3 - D)) >> 3
    delta = loopfunc(N)

    pixmap[x+i][y-1] = B + delta
    pixmap[x+i][y] = C - delta

#/full filter (keyframe case only):

def loopfilter(pixmap, w, h):
  for y in range(h>>3):
    for x in range(w>>3):
      xx = x*8
      yy = y*8
      if xx > 0:                                      #vertical if not on left edge
        filter_horiz(pixmap, xx, yy)
      if yy > 0:                                      #horiz if not on top row
        filter_vert(pixmap, xx, yy)

##Pixel management routines

#/this routine converts an array of 8x8 blocks of data into a pixel array

def blocks2pixels(data, w, h, dx, dy):
  pix = [ [0 for y in range(h)] for x in range(w) ]         #initialize 2D array, pix[w][h]

  for y in range(h):
    for x in range(w):
      xx = x+dx
      yy = y+dy
      bx = xx >> 3
      by = yy >> 3
      ix = xx % 8
      iy = yy % 8
      p = data[bx][by][ix + iy*8]
      pix[x][h-y-1] = p   # The h-y-1 trick inverts the frame
  return pix

#/one last helper -- turns a color map (integers with x, y coordinates) into a straight block array, clamped to 0-255:

def pixels2chars(pixels):
  h = len(pixels[0])
  w = len(pixels)

  data = []                                                 #initialze linear array width * height long

  for y in range(h):
    for x in range(w):
      p = pixels[x][y]
      p += 128
      p = clampit(p, 0, 255)
      data.append(p)                                        #add it to the list
  return data

## *** DECODING THE FRAME ***

#ok, let's do it!

def decode_frame():
  print
  print "DECODING FRAME"
  global quality_index, infoflag, tableflag

  if (infoflag == 0) | (tableflag == 0):                    #if info & table not initialized
    spec_err("stream parameters not initialized -- missing info or table headers?")

#/First, we decode the frame header:

  is_predicted = readbits(1)
  print "is_predicted:", is_predicted
  quality_index = readbits(6)
  print "quality_index =", quality_index
  spare_q_bit = readbits(1)
  if spare_q_bit == 1 :
    spec_err("spare QI bit is set, I don't know what to do!")
  scalefactor_AC = scale_table_AC[quality_index]            #(ThisFrameQualityValue in C)
  print "scalefactor_AC =", scalefactor_AC
  scalefactor_DC = scale_table_DC[quality_index]
  print "scalefactor_DC =", scalefactor_DC

  if is_predicted == 0:                                     #0 = keyframe, 1 = interframe

#/OK, this is a keyframe.  That means we just have 'intra' coded blocks.
    print "decoding keyframe"
    keyframe_type = readbits(1)                             #keyframe type always == 1 (DCT) (for now)
    readbits(2)                                             #2 unused bits

#/compute some values based on width & height:
    blocks_Y = encoded_height*encoded_width // 64           #Y blocks coded
    blocks_UV = blocks_Y // 2
    blocks_U = blocks_UV // 2
    blocks_V = blocks_U
    blocks_tot = int(blocks_Y * 1.5)                        #Y and UV blocks coded

#/initialize a map of coefficients.  For each coded block, we will eventually have 64:
    global coeffs
    coeffs = [ [] for x in range(blocks_tot) ]

#/Theora encodes each coefficient for every block in sequence.  IE first we decode all the DC coefficients; then coefficient #1, then #2, and so on.
#/Also, as we go higher into the coefficient index, we will use different huffman tables:

    for i in range(64):                                     #For each coefficient index,
      if i == 0:                                            #get DC huffman tables for coeff=0
        huff_Y = readbits(4)
        huff_UV = readbits(4)
      elif i == 1:
        huff_Y = readbits(4)+16                             #get AC huff tables at i=1
        huff_UV = readbits(4)+16
      elif i == 6:                                          #update at  6, 15, and 28
        huff_Y += 16
        huff_UV += 16
      elif i == 15:
        huff_Y += 16
        huff_UV += 16
      elif i == 28:
        huff_Y += 16
        huff_UV += 16

#/now, for every block, decode our coefficient:

      for x in range(blocks_tot):
        if x < blocks_Y:                                    #if we're in the Y plane,
          huff = huff_Y                                     #use huff_Y
        else:                                               #or,
          huff = huff_UV                                    #huff_UV for chroma planes

#/first check whether this coefficient was already decoded (Because of an end-of-block or other run):

        if len(coeffs[x]) <= i:                             #if this coeff has not been set

#/if not, get a token:

          run, val = parsetoken(huffs[huff])

#/if this is an end-of-block token, that means we have a run of blocks to mark as fully decoded:

          if val == 'eob':                                  #eob = End Of Block run
            xx = x                                          #temporary block index starts with x
            for r in range(run):                            #clear (run) blocks
              done = len(coeffs[xx])                        #this many coeffs are set in this block
              remain = 64 - done                            #this many remain
              for j in range (remain):                      #for all remaining coeffs
                coeffs[xx].append(0)                        #set to zero
              ii = i                                        #temporary coeff index starts with i
              while (len(coeffs[xx]) >ii) and ii<64:        #find next candidate block for eob treatment
                xx += 1                                     #next block
                if xx == blocks_tot:                        #if we wrapped around,
                  xx = 0                                    #back to block zero
                  ii += 1                                   #and next coeff

#/otherwise the token represents a run of zeros followed by a value:
          else:                                             #zero run + value
            for r in range (run):                           #a run of zeros
              coeffs[x].append(0)
            coeffs[x].append(val)                           #followed by a val

#/now 'de-hilbertize' coefficient blocks:

    Yheight = encoded_height//8
    Ywidth = encoded_width//8
    UVheight = Yheight//2
    UVwidth = Ywidth//2
    global comapY, comapU, comapV
    comapY = de_hilbert(Ywidth, Yheight, coeffs)
    comapU = de_hilbert(UVwidth, UVheight, coeffs[blocks_Y:])
    comapV = de_hilbert(UVwidth, UVheight, coeffs[(blocks_Y+blocks_U):])

#/next, we need to reverse the DC prediction-based delta coding:

    DCpredict(comapY, Ywidth, Yheight)
    DCpredict(comapV, UVwidth, UVheight)
    DCpredict(comapU, UVwidth, UVheight)

#/finally reorder, dequantize, and iDCT all the coefficients, for each color plane:

    temp = [0 for x in range(64)]                           #temporary array

    for y in range(Yheight):
      for x in range(Ywidth):
        unzig (comapY[x][y], temp)
        dequant(temp, scalefactor_DC, scalefactor_AC, Y_quantizer)
        idct(temp)
        comapY[x][y] = temp + []                            #Python quirk -- force assignment by copy (not reference)

    for y in range(UVheight):
      for x in range(UVwidth):
        unzig (comapU[x][y], temp)
        dequant(temp, scalefactor_DC, scalefactor_AC, UV_quantizer)
        idct(temp)
        comapU[x][y] = temp + []

    for y in range(UVheight):
      for x in range(UVwidth):
        unzig (comapV[x][y], temp)
        dequant(temp, scalefactor_DC, scalefactor_AC, UV_quantizer)
        idct(temp)
        comapV[x][y] = temp + []

#/convert the image into a raw byte array of planar Y, V, U:

    pixY = blocks2pixels(comapY, decode_width, decode_height, offset_x, offset_y)
    pixU = blocks2pixels(comapU, decode_width>>1, decode_height>>1, offset_x>>1, offset_y>>1)
    pixV = blocks2pixels(comapV, decode_width>>1, decode_height>>1, offset_x>>1, offset_y>>1)

#/run loop filter:

    loopfilter(pixY, decode_width, decode_height)
    loopfilter(pixU, decode_width>>1, decode_height>>1)
    loopfilter(pixV, decode_width>>1, decode_height>>1)

#/return the three color planes:

    return pixels2chars(pixY), pixels2chars(pixU), pixels2chars(pixV)

##/Decode Predicted Frame:                THIS SECTION UNFINISHED
  else: 
    spec_err("decoding interframe (NOT!)")
    coding_scheme = readbits(3)
    if coding_scheme == 0:
      mode_alphabet = []                                    #define a list (think of it as an array)
      for x in range(8):
        mode_alphabet.append(readbits(3))                   #add another mode to the list

#/--end of definition for decode_frame()


##Parse Theora packets

#/Define a function to parse the packet type & call appropriate functions.  Returns either a string for header packets, or a tuple of Y, U, and V data for frames:

def decode_packet():
  packet_type = readbits(1)
  if packet_type == 0:
    return decode_frame()
  else:
    return decode_header()

###_MAIN TEST SEQUENCE

#let's test our routines by parsing the stream headers and the first frame.

###_Main Loop

while 1 :
  get_page()
  if page == None :
    print 'No more pages left, done!'
    outfile.close()
    exit()
  if streams.has_key(page.serialno) :
    if page.serialno == streamid :
      stream.pagein(page)
      packet = stream.packetout()
      while packet != None :
        buffer = ogg2.OggPackBuffB(packet)
        output(decode_packet())
        packet = stream.packetout()
  else :
    decode_new_stream()
  page = None
    

#'ret' should now have the first frame:


 