From e03e3175a06e5bf31f78d3ffcd230c34194bcd76 Mon Sep 17 00:00:00 2001 From: "Timothy B. Terriberry" Date: Tue, 21 Aug 2012 18:25:28 -0700 Subject: Fix nits reported since draft16. --- doc/draft-ietf-codec-opus.xml | 508 ++++++++++++++++++++++-------------------- 1 file changed, 261 insertions(+), 247 deletions(-) diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml index c1fd820d..a97d6190 100644 --- a/doc/draft-ietf-codec-opus.xml +++ b/doc/draft-ietf-codec-opus.xml @@ -123,14 +123,14 @@ For these reasons, this RFC uses the reference implementation as the sole always the easiest way to understand the codec's operation. For this reason, this document also describes significant parts of the codec in English and takes the opportunity to explain the rationale behind many of the more -surprising elements of the design. +surprising elements of the design. - @@ -212,9 +212,9 @@ With this definition, if lo > hi, then lo is returned. The sign of x, i.e.,
- ( -1, x < 0 -sign(x) = < 0, x == 0 - ( 1, x > 0 + ( -1, x < 0 +sign(x) = < 0, x == 0 + ( 1, x > 0 ]]>
@@ -225,7 +225,7 @@ sign(x) = < 0, x == 0 The absolute value of x, i.e.,
@@ -249,7 +249,7 @@ The integer z nearest to f, with ties rounded towards negative infinity, i.e.,
@@ -304,7 +304,7 @@ The codec allows input and output of various audio bandwidths, defined as - + Abbreviation Audio Bandwidth @@ -314,7 +314,7 @@ The codec allows input and output of various audio bandwidths, defined as WB (wideband) 8 kHz 16 kHz SWB (super-wideband) 12 kHz 24 kHz FB (fullband) 20 kHz (*) 48 kHz - + @@ -637,7 +637,7 @@ The 32 possible configurations each identify which one of these operating modes the packet uses, as well as the audio bandwidth and the frame size. lists the parameters for each configuration. - + Configuration Number(s) Mode @@ -652,7 +652,7 @@ The 32 possible configurations each identify which one of these operating modes 20...23 CELT-only WB 2.5, 5, 10, 20 ms 24...27 CELT-only SWB 2.5, 5, 10, 20 ms 28...31 CELT-only FB 2.5, 5, 10, 20 ms - + The configuration numbers in each range (e.g., 0...3 for NB SILK-only) @@ -1120,8 +1120,8 @@ MSB = Most Significant Bit ]]> - @@ -1184,11 +1184,11 @@ The second step updates the range decoder state with the three-tuple (fl[k], fh[k], ft) corresponding to that symbol. -The first step is implemented by ec_decode() (entdec.c), which computes +The first step is implemented by ec_decode() (entdec.c), which computes
@@ -1202,7 +1202,7 @@ It uses this tuple to update val according to
@@ -1210,7 +1210,7 @@ If fl[k] is greater than zero, then the decoder updates rng using
@@ -1218,7 +1218,7 @@ Otherwise, it updates rng using
@@ -1256,17 +1256,18 @@ The remaining bit in the byte just read is buffered for use in the next If no more input bytes remain, it uses zero bits instead. See for the initialization used to process the first byte. -Then, it sets +Then, it sets
It is normal and expected that the range decoder will read several bytes - into the data of the raw bits (if any) at the end of the packet by the time the frame - is completely decoded, as illustrated in . + into the data of the raw bits (if any) at the end of the frame by the time the + frame is completely decoded, as illustrated in + . This same data MUST also be returned as raw bits when requested. The encoder is expected to terminate the stream in such a way that the decoder will decode the intended values regardless of the data contained in the raw @@ -1381,7 +1382,7 @@ In such contexts, ec_dec_icdf() can decode the symbol by using a table that
-The raw bits used by the CELT layer are packed at the end of the packet, with +The raw bits used by the CELT layer are packed at the end of the frame, with the least significant bit of the first value packed in the least significant bit of the last byte, filling up to the most significant bit in the last byte, continuing on to the least significant bit of the penultimate byte, and so on. @@ -1425,10 +1426,10 @@ If ftb is 8 or less, then t is decoded with t = ec_decode(ft), and ft). -If ftb is greater than 8, then the top 8 bits of t are decoded using +If ftb is greater than 8, then the top 8 bits of t are decoded using
-t = ec_decode(((ft - 1) >> (ftb - 8)) + 1) +t = ec_decode(((ft - 1) >> (ftb - 8)) + 1) ]]>
the decoder state is updated using the three-tuple @@ -1437,7 +1438,7 @@ t = ec_decode(((ft - 1) >> (ftb - 8)) + 1) and the remaining bits are decoded as raw bits, setting
If, at this point, t >= ft, then the current frame is corrupt. @@ -1527,7 +1528,7 @@ Let
-r_Q15 = rng >> (lg-16) +r_Q15 = rng >> (lg-16) ]]>
so that 32768 <= r_Q15 < 65536, an unsigned Q15 value representing the @@ -1537,21 +1538,21 @@ First, update
-r_Q15 = (r_Q15*r_Q15) >> 15 +r_Q15 = (r_Q15*r_Q15) >> 15 ]]>
Then, add the 16th bit of r_Q15 to lg via
-lg = 2*lg + (r_Q15 >> 16) +lg = 2*lg + (r_Q15 >> 16) ]]>
Finally, if this bit was a 1, reduce r_Q15 by a factor of two via
-r_Q15 = r_Q15 >> 1 +r_Q15 = r_Q15 >> 1 ]]>
so that it once again lies in the range 32768 <= r_Q15 < 65536. @@ -1688,7 +1689,7 @@ Figures  mono and stereo, respectively.
- + Symbol(s) @@ -1715,7 +1716,7 @@ Figures  - +
- + Frame Size PDF 40 ms {0, 53, 53, 150}/256 60 ms {0, 41, 20, 29, 41, 15, 28, 82}/256 - + @@ -1924,7 +1925,7 @@ The quantized excitation signal (see ) follows SILK frame. - + Symbol(s) @@ -2003,7 +2004,7 @@ The quantized excitation signal (see ) follows - +
- + Stage PDF @@ -2068,7 +2069,7 @@ Then, let i0 and i1 be indices decoded with the stage-2 and stage-3 PDFs in Stage 3 {51, 51, 52, 51, 51}/256 - + @@ -2100,7 +2101,7 @@ Although wi0 and wi1 only have 15 possible values, interpolation between entry wi0 and (wi0 + 1) (and likewise for wi1). - + Index @@ -2121,7 +2122,7 @@ Although wi0 and wi1 only have 15 possible values, 13 8266 14 10050 15 13732 - +
@@ -2173,11 +2174,11 @@ In that case, if this flag is zero (indicating that there should be a side Otherwise, the stereo image will collapse. - + PDF {192, 64}/256 - +
@@ -2198,16 +2199,16 @@ If the frame is an LBRR frame or a regular SILK frame whose VAD flag was set type.
- + VAD Flag PDF Inactive {26, 230, 0, 0, 0, 0}/256 Active {0, 0, 24, 74, 148, 10}/256 - + - + Frame Type @@ -2219,7 +2220,7 @@ If the frame is an LBRR frame or a regular SILK frame whose VAD flag was set 3 Unvoiced High 4 Voiced Low 5 Voiced High - + @@ -2265,7 +2266,7 @@ In an independently coded subframe gain, the 3 most significant bits of the type (see ).
- + Signal Type @@ -2273,18 +2274,18 @@ In an independently coded subframe gain, the 3 most significant bits of the Inactive {32, 112, 68, 29, 12, 1, 1, 1}/256 Unvoiced {2, 17, 45, 60, 62, 47, 19, 4}/256 Voiced {1, 3, 26, 71, 94, 50, 9, 2}/256 - + The 3 least significant bits are decoded using a uniform PDF: - + PDF {32, 32, 32, 32, 32, 32, 32, 32}/256 - + @@ -2293,7 +2294,7 @@ When the gain for the previous subframe is available, then the current gain is limited as follows:
This may help some implementations limit the change in precision of their @@ -2314,7 +2315,7 @@ For subframes that do not have an independent gain (including the first The PDF in yields a delta_gain_index value between 0 and 40, inclusive.
- + PDF @@ -2323,7 +2324,7 @@ The PDF in yields a delta_gain_index value 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256 - + The following formula translates this index into a quantization gain for the @@ -2331,7 +2332,7 @@ The following formula translates this index into a quantization gain for the
@@ -2411,7 +2412,7 @@ The actual codebook elements are listed in Tables stages of reconstructing the LSF coefficients. - + Audio Bandwidth @@ -2445,7 +2446,7 @@ The actual codebook elements are listed in Tables 14, 12, 2, 6, 1, 12, 12, 11, 10, 3, 10, 5, 1, 1, 1, 3}/256 - + @@ -2463,7 +2464,7 @@ Which PDF is used for which coefficient is driven by the index, I1, lists the same information for WB. - + Codebook @@ -2476,10 +2477,10 @@ Which PDF is used for which coefficient is driven by the index, I1, f {1, 3, 17, 55, 90, 73, 15, 1, 1}/256 g {1, 7, 24, 53, 74, 67, 26, 3, 1}/256 h {1, 1, 18, 63, 78, 58, 30, 6, 1}/256 - + - + Codebook @@ -2492,10 +2493,10 @@ Which PDF is used for which coefficient is driven by the index, I1, n {1, 1, 14, 54, 100, 72, 12, 1, 1}/256 o {1, 1, 15, 61, 87, 61, 25, 4, 1}/256 p {1, 7, 21, 50, 77, 81, 17, 1, 1}/256 - + - + I1 @@ -2566,10 +2567,10 @@ Which PDF is used for which coefficient is driven by the index, I1, c f d h f f e e f e 31 e e f e f g f g f e - + - + I1 @@ -2640,7 +2641,7 @@ Which PDF is used for which coefficient is driven by the index, I1, l  n  n  m  p  n  l  l  k  l  k  k  j  i  j  i 31 k  l  n  l  m  l  l  l  k  j  k  o  m  i  i  i - + @@ -2655,12 +2656,12 @@ If the index is either -4 or 4, it reads a second symbol using the PDF in This gives the index, I2[k], a total range of -10 to 10, inclusive. - + PDF {156, 60, 24, 9, 4, 2, 1}/256 - + @@ -2677,7 +2678,7 @@ There are two lists for NB and MB, and another two lists for WB, giving two possible prediction weights for each coefficient. - + Coefficient @@ -2700,7 +2701,7 @@ There are two lists for NB and MB, and another two lists for WB, giving two 12 198 160 13 192 142 14 182 155 - + @@ -2726,7 +2727,7 @@ res_Q10[k] = (k+1 < d_LPC ? (res_Q10[k+1]*pred_Q8[k])>>8 : 0) respectively). - + I1 @@ -2797,10 +2798,10 @@ res_Q10[k] = (k+1 < d_LPC ? (res_Q10[k+1]*pred_Q8[k])>>8 : 0) A A A B B A B A B 31 B A B B A B B B B - + - + I1 @@ -2871,7 +2872,7 @@ res_Q10[k] = (k+1 < d_LPC ? 
(res_Q10[k+1]*pred_Q8[k])>>8 : 0) D  C  C  C  C  C  C  C  C  C  C  D  C  C  C 31 C  C  D  C  C  D  D  D  C  C  D  C  C  D  C - + @@ -2897,7 +2898,7 @@ Then, for 0 <= k < d_LPC, the following expression @@ -2924,7 +2925,7 @@ The reference implementation already requires code to compute these weights on required. - + I1 @@ -2995,10 +2996,10 @@ The reference implementation already requires code to compute these weights on 24  30  52  84 131 150 166 186 203 229 31 37  48  64  84 104 118 156 177 201 230 - + - + I1 @@ -3069,7 +3070,7 @@ The reference implementation already requires code to compute these weights on 16 29 47 61 76  90 106 119 133 147 161 176 193 209 224 240 31 15 21 35 50 61  73  86  97 110 119 129 141 175 198 218 237 - + @@ -3079,7 +3080,7 @@ Given the stage-1 codebook entry cb1_Q8[], the stage-2 residual res_Q10[], and
where the division is integer division. @@ -3108,7 +3109,7 @@ For the purposes of computing this spacing for the first and last coefficient, NLSF_Q15[-1] is taken to be 0 and NLSF_Q15[d_LPC] is taken to be 32768.
- + Coefficient @@ -3131,7 +3132,7 @@ For the purposes of computing this spacing for the first and last coefficient, 14 7 15 3 16 347 - + @@ -3171,7 +3172,7 @@ center_freq_Q15 = clamp(min_center_Q15[i], NLSF_Q15[i-1] = center_freq_Q15 - (NDeltaMin_Q15[i]>>1) - NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] + NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] ]]> Then, the procedure repeats again, until it has either executed 20 times or @@ -3185,13 +3186,13 @@ First, the values of NLSF_Q15[k] for 0 <= k < d_LPC Then, for each value of k from 0 to d_LPC-1, NLSF_Q15[k] is set to
Next, for each value of k from d_LPC-1 down to 0, NLSF_Q15[k] is set to
@@ -3222,12 +3223,12 @@ After either For 10 ms SILK frames, this factor is not stored at all. - + PDF {13, 22, 29, 11, 181}/256 - + @@ -3238,7 +3239,7 @@ Then, the normalized LSF coefficients used for the first half of a 20 ms frame, n1_Q15[k], are
-n1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2) +n1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2) ]]>
This interpolation is performed in silk_decode_parameters() @@ -3267,7 +3268,7 @@ with P(z) = A(z) + z * A(z ) -d_LPC-1 -1 -Q(z) = A(z) - z * A(z ) +Q(z) = A(z) - z * A(z ) ]]> The even normalized LSF coefficients correspond to a pair of conjugate roots of @@ -3306,7 +3307,7 @@ These values are also re-ordered to improve numerical accuracy when constructing the LPC polynomials.
- + Coefficient @@ -3328,7 +3329,7 @@ These values are also re-ordered to improve numerical accuracy when 13 9 14 14 15 1 - + @@ -3341,7 +3342,7 @@ Then, the re-ordered, approximated cosine, c_Q17[ordering[k]], is
c_Q17[ordering[k]] = (cos_Q12[i]*256 - + (cos_Q12[i+1]-cos_Q12[i])*f + 4) >> 3 + + (cos_Q12[i+1]-cos_Q12[i])*f + 4) >> 3 ]]>
where ordering[k] is the k'th entry of the column of @@ -3349,7 +3350,7 @@ c_Q17[ordering[k]] = (cos_Q12[i]*256 bandwidth and cos_Q12[i] is the i'th entry of .
- + i @@ -3423,7 +3424,7 @@ c_Q17[ordering[k]] = (cos_Q12[i]*256 -4076-4085-4091-4095 128 -4096 - + @@ -3444,10 +3445,10 @@ Then, for 0 < k < d2 and 0 <= j <
p_Q16[k][j] = p_Q16[k-1][j] + p_Q16[k-1][j-2] - - ((c_Q17[2*k]*p_Q16[k-1][j-1] + 32768)>>16) + - ((c_Q17[2*k]*p_Q16[k-1][j-1] + 32768)>>16) q_Q16[k][j] = q_Q16[k-1][j] + q_Q16[k-1][j-2] - - ((c_Q17[2*k+1]*q_Q16[k-1][j-1] + 32768)>>16) + - ((c_Q17[2*k+1]*q_Q16[k-1][j-1] + 32768)>>16) ]]>
The use of Q17 values for the cosine terms in an otherwise Q16 expression @@ -3464,7 +3465,7 @@ silk_NLSF2A() uses the values from the last row of this recurrence to
-maxabs_Q12 = min((maxabs_Q17 + 16) >> 5, 163838) +maxabs_Q12 = min((maxabs_Q17 + 16) >> 5, 163838) ]]>
If this is larger than 32767, the procedure derives the chirp factor, @@ -3505,7 +3506,7 @@ If this is larger than 32767, the procedure derives the chirp factor,
> 2 ]]>
@@ -3542,7 +3543,7 @@ After 10 rounds of bandwidth expansion are performed, they are simply saturated to 16 bits:
-a32_Q17[k] = clamp(-32768, (a32_Q17[k] + 16) >> 5, 32767) << 5 +a32_Q17[k] = clamp(-32768, (a32_Q17[k] + 16) >> 5, 32767) << 5 ]]>
Because this performs the actual saturation in the Q12 domain, but converts the @@ -3562,7 +3563,7 @@ The prediction gain of an LPC synthesis filter is the square root of the output energy when the filter is excited by a unit-energy impulse. Even if the Q12 coefficients would fit, the resulting filter may still have a significant gain (especially for voiced sounds), making the filter unstable. -silk_NLSF2A() applies up to 18 additional rounds of bandwidth expansion to +silk_NLSF2A() applies up to 16 additional rounds of bandwidth expansion to limit the prediction gain. Instead of controlling the amount of bandwidth expansion using the prediction gain itself (which may diverge to infinity for an unstable filter), @@ -3578,7 +3579,7 @@ The reflection coefficients, rc[k], can be computed using a simple Levinson rc[k] = -a[k][k] , a[k][n] - a[k][k-n-1]*rc[k] -a[k-1][n] = --------------------------- +a[k-1][n] = --------------------------- 2 1 - rc[k] ]]> @@ -3616,43 +3617,55 @@ Increasing the precision of these Q12 coefficients to Q24 for intermediate so the decoder initializes the recurrence via
Then, for each k from d_LPC-1 down to 0, if abs(a32_Q24[k][k]) > 16773022, the filter is unstable and the recurrence stops. The constant 16773022 here is approximately 0.99975 in Q24. -Otherwise, row k-1 of a32_Q24 is computed from row k as +Otherwise, the inverse of the prediction gain, inv_gain_Q30[k], is updated via
> 32) - b1[k] = ilog(div_Q30[k]) +inv_gain_Q30[k] = (inv_gain_Q30[k+1]*div_Q30[k] >> 32) << 2 +]]> +
+ and if inv_gain_Q30[k] < 107374, the filter is unstable and the + recurrence stops. +The constant 107374 here is approximately 1/10000 in Q30. +If neither of these checks determine that the filter is unstable and + k > 0, row k-1 of a32_Q24 is computed from row k as +
+> (b2[k]+1) err_Q29[k] = (1<<29) - - ((div_Q30[k]<<(15-b2[k]))*inv_Qb2[k] >> 16) + - ((div_Q30[k]<<(15-b2[k]))*inv_Qb2[k] >> 16) gain_Qb1[k] = ((inv_Qb2[k] << 16) - + (err_Q29[k]*inv_Qb2[k] >> 13)) + + (err_Q29[k]*inv_Qb2[k] >> 13)) num_Q24[k-1][n] = a32_Q24[k][n] - - ((a32_Q24[k][k-n-1]*rc_Q31[k] + (1<<30)) >> 31) + - ((a32_Q24[k][k-n-1]*rc_Q31[k] + (1<<30)) >> 31) a32_Q24[k-1][n] = (num_Q24[k-1][n]*gain_Qb1[k] - + (1<<(b1[k]-1))) >> b1[k] + + (1<<(b1[k]-1))) >> b1[k] ]]>
where 0 <= n < k. -Here, rc_Q30[k] are the reflection coefficients. +In the above, rc_Q31[k] are the reflection coefficients. div_Q30[k] is the denominator for each iteration, and gain_Qb1[k] is its multiplicative inverse (with b1[k] fractional bits, where b1[k] ranges from 20 to 31). @@ -3670,8 +3683,10 @@ In practice, because each row only depends on the next one, an implementation does not need to store them all.
-If abs(a32_Q24[k][k]) <= 16773022 for - 0 <= k < d_LPC, then the filter is considered stable. +If abs(a32_Q24[k][k]) <= 16773022 and + inv_gain_Q30[k] >= 107374 for + 0 <= k < d_LPC, then the filter is considered + stable. However, the problem of determining stability is ill-conditioned when the filter contains several reflection coefficients whose magnitude is very close to one. @@ -3680,22 +3695,22 @@ This fixed-point algorithm is not mathematically guaranteed to correctly in practice. -On round i, 1 <= i <= 18, if the filter passes these +On round i, 0 <= i < 16, if the filter passes these stability checks, then this procedure stops, and the final LPC coefficients to use for reconstruction in are
-a_Q12[k] = (a32_Q17[k] + 16) >> 5 +a_Q12[k] = (a32_Q17[k] + 16) >> 5 ]]>
Otherwise, a round of bandwidth expansion is applied using the same procedure as in , with
-During the 15th round, sc_Q16[0] becomes 0 in the above equation, so a_Q12[k] +During round 15, sc_Q16[0] becomes 0 in the above equation, so a_Q12[k] is set to 0 for all k, guaranteeing a stable filter.
@@ -3738,10 +3753,11 @@ That previous SILK frame was coded, but was not voiced (see With absolute coding, the primary pitch lag may range from 2 ms (inclusive) up to 18 ms (exclusive), corresponding to pitches from 500 Hz down to 55.6 Hz, respectively. -It is comprised of a high part and a low part, where the decoder reads the high - part using the 32-entry codebook in - and the low part using the codebook corresponding to the current audio - bandwidth from . +It is comprised of a high part and a low part, where the decoder first reads + the high part using the 32-entry codebook in + and then the low part using the + codebook corresponding to the current audio bandwidth from + . The final primary pitch lag is then
, respectively. - + PDF @@ -3761,10 +3777,10 @@ lag = lag_high*lag_scale + lag_low + lag_min 11, 10, 12, 13, 13, 12, 11, 9, 8, 7, 6, 4, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1}/256 - + - + Audio Bandwidth @@ -3775,7 +3791,7 @@ lag = lag_high*lag_scale + lag_low + lag_min NB {64, 64, 64, 64}/256 4 16 144 MB {43, 42, 43, 43, 42, 43}/256 6 24 216 WB {32, 32, 32, 32, 32, 32, 32, 32}/256 8 32 288 - + @@ -3801,14 +3817,14 @@ However, because an Opus frame can use relative coding for at most two consecutive SILK frames, integer overflow should not be an issue. - + PDF {46, 2, 2, 3, 4, 6, 10, 15, 26, 38, 30, 22, 15, 10, 7, 6, 4, 4, 2, 2, 2}/256 - + @@ -3824,7 +3840,7 @@ Tables  subframe given the decoded codebook index. - + Audio Bandwidth @@ -3845,10 +3861,10 @@ Tables  5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1}/256 - + - + Index @@ -3856,10 +3872,10 @@ Tables  0  0  0 1  1  0 2  0  1 - + - + Index @@ -3875,10 +3891,10 @@ Tables  8  1  0  0  0 9  0  0  0 -1 10  1  0  0 -1 - + - + Index @@ -3895,10 +3911,10 @@ Tables  9 -2  3 10  3 -2 11 -3  3 - + - + Index @@ -3937,7 +3953,7 @@ Tables  31  5  2 -2 -5 32  8  3 -2 -7 33 -9 -3  3  9 - + @@ -3984,11 +4000,11 @@ This immediately follows the subframe pitch lags, and is coded using the 3-entry PDF from . - + PDF {77, 80, 99}/256 - + @@ -4000,7 +4016,7 @@ Tables  contain the corresponding filter taps as signed Q7 integers. - + Periodicity Index Codebook Size @@ -4012,10 +4028,10 @@ Tables  11, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 6, 6, 5, 4, 5, 4, 4, 4, 3, 4, 3, 2}/256 - + - + Index @@ -4036,10 +4052,10 @@ Tables   -6   4  66   7  -8 7  16  14  38  -3  33 - + - + Index @@ -4077,10 +4093,10 @@ Tables   -2  55  46  -2  15 15   3  -1  21  16  41 - + - + Index @@ -4149,7 +4165,7 @@ Tables   81   5  11   3   7 31   2   0   9  10  88 - + @@ -4197,12 +4213,12 @@ Frames that do not code the scaling parameter use the default factor of 15565 (approximately 0.95). 
- + PDF {128, 64, 64}/256 - + @@ -4224,12 +4240,12 @@ The decoder reads the seed using the uniform 4-entry PDF in , yielding a value between 0 and 3, inclusive. - + PDF {64, 64, 64, 64}/256 - + @@ -4247,7 +4263,7 @@ Thus, the codebook includes all integer codevectors y of dimension N that @@ -4279,7 +4295,7 @@ The decoder contains no special case that prevents an encoder from placing if present, but they are otherwise ignored. - + Audio Bandwidth @@ -4291,7 +4307,7 @@ The decoder contains no special case that prevents an encoder from placing NB 20 ms 10 MB 20 ms 15 WB 20 ms 20 - +
@@ -4312,7 +4328,7 @@ Level 0 provides a more efficient encoding at low rates generally, and An encoder should, but is not required to, use the most efficient rate level. - + Signal Type @@ -4321,7 +4337,7 @@ An encoder should, but is not required to, use the most efficient rate level. {15, 51, 12, 46, 45, 13, 33, 27, 14}/256 Voiced {33, 30, 36, 17, 34, 49, 18, 21, 18}/256 - +
@@ -4350,7 +4366,7 @@ The cumulative distribution for rate level 10 is just a shifted version of that for 9 and thus does not require any additional storage. - + Rate Level @@ -4377,7 +4393,7 @@ The cumulative distribution for rate level 10 is just a shifted version of {1, 1, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2}/256 10 {2, 1, 6, 27, 58, 56, 39, 25, 14, 10, 6, 3, 3, 2, 1, 1, 2, 0}/256 - + @@ -4411,7 +4427,7 @@ This process skips partitions without any pulses, i.e., where the initial pulse These partitions have nothing to code, so they require no PDF. - + Pulse Count @@ -4432,10 +4448,10 @@ These partitions have nothing to code, so they require no PDF. 14 {1, 1, 4, 10, 17, 27, 37, 47, 43, 33, 21, 9, 4, 1, 1}/256 15 {1, 1, 1, 8, 14, 22, 33, 40, 43, 38, 28, 16, 8, 1, 1, 1}/256 16 {1, 1, 1, 1, 13, 18, 27, 36, 41, 41, 34, 24, 14, 1, 1, 1, 1}/256 - + - + Pulse Count @@ -4456,10 +4472,10 @@ These partitions have nothing to code, so they require no PDF. 14 {1, 1, 4, 10, 20, 31, 40, 42, 40, 31, 20, 10, 4, 1, 1}/256 15 {1, 1, 3, 8, 16, 26, 35, 38, 38, 35, 26, 16, 8, 3, 1, 1}/256 16 {1, 1, 2, 6, 12, 21, 30, 36, 38, 36, 30, 21, 12, 6, 2, 1, 1}/256 - + - + Pulse Count @@ -4480,10 +4496,10 @@ These partitions have nothing to code, so they require no PDF. 14 {1, 3, 7, 14, 22, 29, 34, 36, 34, 29, 22, 14, 7, 3, 1}/256 15 {1, 2, 5, 11, 18, 25, 31, 35, 35, 31, 25, 18, 11, 5, 2, 1}/256 16 {1, 1, 4, 9, 15, 21, 28, 32, 34, 32, 28, 21, 15, 9, 4, 1, 1}/256 - + - + Pulse Count @@ -4504,7 +4520,7 @@ These partitions have nothing to code, so they require no PDF. 14 {1, 4, 7, 13, 21, 29, 35, 36, 35, 29, 21, 13, 7, 4, 1}/256 15 {1, 2, 5, 10, 17, 25, 32, 36, 36, 32, 25, 17, 10, 5, 2, 1}/256 16 {1, 2, 4, 7, 13, 21, 28, 34, 36, 34, 28, 21, 13, 7, 4, 2, 1}/256 - + @@ -4521,11 +4537,11 @@ The LSBs are coded from most significant to least significant, and they all use the PDF in . 
- + PDF {136, 120}/256 - + @@ -4561,7 +4577,7 @@ If a block contains many positive coefficients, it is sometimes beneficial to coefficient magnitude encoding. - + Signal Type @@ -4610,7 +4626,7 @@ If a block contains many positive coefficients, it is sometimes beneficial to Voiced High 4 {168, 88}/256 Voiced High 5 {161, 95}/256 Voiced High 6 or more {154, 102}/256 - + @@ -4627,7 +4643,7 @@ The constant quantization offset varies depending on the signal type and quantization offset type (see ). - + Signal Type @@ -4639,7 +4655,7 @@ The constant quantization offset varies depending on the signal type and Unvoiced High 60 Voiced Low 8 Voiced High 25 - + @@ -4734,8 +4750,7 @@ During reconstruction of the first subframe for this channel after either An uncoded regular SILK frame (if this is the side channel), or A decoder reset (see ), - out[] is rewhitened into an LPC residual, - res[i], via + out[i] is rewhitened into an LPC residual, res[i], via
@@ -4771,7 +4786,7 @@ Then for i such that j <= i < (j + n), @@ -4810,7 +4825,7 @@ Then, for i, such that j <= i < (j + n), the @@ -4825,7 +4840,7 @@ This requires storage for up to 16 values of lpc[i] (for WB frames). Then, the signal is clamped into the final nominal range:
This clamping occurs entirely after the LPC synthesis filter has run. @@ -4880,18 +4895,18 @@ Then, for i, such that j <= i < (j + n2),
@@ -4948,7 +4963,7 @@ However, such deviations are unlikely to be perceptible, and the comparison The delays listed here are the ones that should be targeted by the encoder. - + Audio Bandwidth @@ -4956,7 +4971,7 @@ The delays listed here are the ones that should be targeted by the encoder. NB 0.538 MB 0.692 WB 0.706 - + @@ -4994,7 +5009,7 @@ preserve the spectral envelope of the signal. The remaining unit-norm shape vector is encoded using a Pyramid Vector Quantizer (PVQ) . - + Frame Size: @@ -5026,7 +5041,7 @@ vector is encoded using a Pyramid Vector Quantizer (PVQ)  Frame size (ms) @@ -5691,10 +5706,10 @@ resolution is shown in the tables below. 5 0 -1 10 0 -2 20 0 -2 - + - + Frame size (ms) @@ -5704,11 +5719,11 @@ resolution is shown in the tables below. 5 0 -2 10 0 -3 20 0 -3 - + - + Frame size (ms) @@ -5718,10 +5733,10 @@ resolution is shown in the tables below. 5 1 0 10 2 0 20 3 0 - + - + Frame size (ms) @@ -5731,7 +5746,7 @@ resolution is shown in the tables below. 5 1 -1 10 1 -1 20 1 -1 - + @@ -5785,7 +5800,7 @@ It is derived from a basic (full-overlap) 240-sample version of the window used
@@ -5839,7 +5854,7 @@ used in the encoder:
@@ -5997,8 +6012,7 @@ The presence of redundancy is signaled in all SILK-only and Hybrid frames, not This allows the frames to be decoded correctly even if an adjacent frame is lost. For SILK-only frames, this signaling is implicit, based on the size of the - of the Opus frame and the number of bits consumed decoding the SILK portion of - it. + Opus frame and the number of bits consumed decoding the SILK portion of it. After decoding the SILK portion of the Opus frame, the decoder uses ec_tell() (see ) to check if there are at least 17 bits remaining. @@ -6016,11 +6030,11 @@ Otherwise (if there were fewer than 37 bits left or the value was 0), the frame does not contain redundancy. - + PDF {4095, 1}/4096 - + @@ -6036,11 +6050,11 @@ After determining that a frame contains redundancy, the decoder reads a (). - + PDF {1, 1}/2 - + @@ -6311,7 +6325,7 @@ For a normal encoder where both the SILK and the CELT modules are included, an o encoder should select which coding mode to use at run-time depending on the conditions. In the reference implementation, the frame size is selected by the application, but the other configuration parameters (number of channels, bandwidth, mode) are automatically -selected (unless explicitly overridden by the application) depend on the following: +selected (unless explicitly overridden by the application) depending on the following: Requested bitrate Input sampling rate @@ -6400,11 +6414,11 @@ If fl[k] is greater than zero, then
@@ -6430,7 +6444,7 @@ First, the top 9 bits of val, (val>>23), are sent to the carry buffer, Then, the encoder sets
@@ -6463,7 +6477,7 @@ If ext is non-zero, then the encoder outputs ext bytes -- all with a value of 0 rem is set to the 8 data bits:
@@ -6578,7 +6592,7 @@ Then, while end is not zero, the top 9 bits of end, i.e., (end>>23), are , and end is updated via
Finally, if the buffered output byte, rem, is neither zero nor the special @@ -6757,7 +6771,7 @@ the remainder of this section. An overview of the encoder is given in 9: LSF coefficients 10: Quantized LSF coefficients 11: Processed gains, and synthesis noise shape coefficients -12: LTP state scaling coefficient. Controlling error +12: LTP state scaling coefficient. Controlling error propagation / prediction gain trade-off 13: Quantized signal ]]> @@ -7045,7 +7059,7 @@ origin, using the formulas a_ana(k) = a(k)*g_ana and k - a_syn(k) = a(k)*g_syn + a_syn(k) = a(k)*g_syn ]]>
@@ -7122,7 +7136,7 @@ as @@ -7537,7 +7551,7 @@ taking into account the frame size by subtracting 80 bits per frame for coarse e band using intensity coding is as follows:
- + bitrate (kbit/s) @@ -7549,8 +7563,8 @@ band using intensity coding is as follows: 84-102 19 102-130 20 >130 disabled - - + + @@ -7836,16 +7850,16 @@ We would also like to thank Igor Dyakonov, Christian Hoene, and Jan Skoglund Thanks to Andrew D'Addesio, Elwyn Davies, Ralph Giles, Christian Hoene, John Ridges, Ben Schwartz, Kat Walsh, Keith Yan, and many others on the Opus and CELT mailing lists for their bug reports and feedback. At last, the -authors would like to thank Robert Sparks, Cullen Jennings, and Johathan +authors would like to thank Robert Sparks, Cullen Jennings, and Jonathan Rosenberg for their support throughout the standardization process.
- @@ -8398,22 +8412,22 @@ Releases and other resources are available at
@@ -15906,7 +15920,7 @@ necessary updates.
Because of size constraints, the Opus test vectors are not distributed in this -document. They are available in the proceedings of the 83th IETF meeting (Paris) and from the Opus codec website at +document. They are available in the proceedings of the 83rd IETF meeting (Paris) and from the Opus codec website at . These test vectors were created specifically to exercise all aspects of the decoder; therefore, the audio quality of the decoded output is significantly lower than what Opus can achieve in normal operation.
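The LPC stability hunk earlier in this patch adds an inverse prediction gain check (inv_gain_Q30[k] < 107374) alongside the existing abs(a32_Q24[k][k]) > 16773022 check. The C sketch below is one way to express those two per-row tests and the inv_gain_Q30 update as the patched text writes them. It is not taken from the reference implementation: the function names are hypothetical, and the starting value of 1.0 in Q30 (1 << 30) used in the usage note is an assumption, since the initialization artwork is not visible in this diff.

```c
#include <stdint.h>

/* Thresholds quoted in the patched text. */
#define MAX_ABS_A_Q24    16773022 /* approximately 0.99975 in Q24 */
#define MIN_INV_GAIN_Q30 107374   /* approximately 1/10000 in Q30 */

/* Hypothetical helper: the update
 *   inv_gain_Q30[k] = (inv_gain_Q30[k+1]*div_Q30[k] >> 32) << 2
 * Q30 * Q30 is Q60; >> 32 gives Q28; << 2 restores Q30.
 * Both inputs are assumed to be non-negative, so the shifts are well defined. */
static int32_t update_inv_gain_Q30(int32_t inv_gain_Q30_next, int32_t div_Q30)
{
    int64_t prod_Q60 = (int64_t)inv_gain_Q30_next * (int64_t)div_Q30;
    return (int32_t)((prod_Q60 >> 32) << 2);
}

/* Hypothetical helper: returns 1 if row k passes both stability checks
 * described in the patched text, and 0 if either one flags the filter
 * as unstable. */
static int row_passes_checks(int32_t a32_Q24_kk, int32_t inv_gain_Q30_k)
{
    /* Widen before negating so INT32_MIN cannot overflow. */
    int64_t mag = a32_Q24_kk < 0 ? -(int64_t)a32_Q24_kk : (int64_t)a32_Q24_kk;

    if (mag > MAX_ABS_A_Q24) {
        return 0; /* coefficient magnitude too close to 1.0 */
    }
    if (inv_gain_Q30_k < MIN_INV_GAIN_Q30) {
        return 0; /* prediction gain above roughly 10000 */
    }
    return 1;
}
```

As a usage note, if the running value starts at 1 << 30 (assumed), a div_Q30 of 1 << 29 (0.5 in Q30) halves it to 1 << 29, and a div_Q30 of 1 << 30 leaves it unchanged. Both thresholds are inclusive on the passing side, which matches the "<= 16773022" and ">= 107374" wording in the revised paragraph.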