author: Timothy B. Terriberry <tterribe@xiph.org> 2012-05-15 10:51:59 -0400
committer: Jean-Marc Valin <jmvalin@jmvalin.ca> 2012-05-15 10:51:59 -0400
commit: df39d65c839183e19498edb640da8e783d130722 (patch)
tree: 8622453756d9c834e1e506a85bd09a5e9ed06848
parent: 2cb95f528582ed19a1acedbba6ca513649dd736c (diff)
download: opus-df39d65c839183e19498edb640da8e783d130722.tar.gz
Gen-art part2
-rw-r--r-- doc/draft-ietf-codec-opus.xml 78
1 files changed, 44 insertions, 34 deletions
diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml
index 4b59db99..b4ce3520 100644
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -3588,8 +3588,8 @@ Otherwise, a round of bandwidth expansion is applied using the same procedure
sc_Q16[0] = 65536 - (2<<i) .
]]></artwork>
</figure>
-After the 15th round, the filter is guaranteed to be stable because sc_Q16[0]
-is 0 so a_Q12[k] is set to 0 for all k.
+During the 15th round, sc_Q16[0] becomes 0 in the above equation, so a_Q12[k]
+ is set to 0 for all k, guaranteeing a stable filter.
</t>
</section>
@@ -4820,31 +4820,33 @@ When the decoder is reset, any samples remaining in the resampling buffer
<section title="CELT Decoder">
<t>
-The CELT part of Opus is based on the Modified Discrete Cosine Transform
+The CELT layer of Opus is based on the Modified Discrete Cosine Transform
<xref target='MDCT'/> with partially overlapping windows of 5 to 22.5 ms.
The main principle behind CELT is that the MDCT spectrum is divided into
bands that (roughly) follow the Bark scale, i.e. the scale of the ear's
-critical bands. There are 21 of those bands, a band can contain as little as
-one MDCT bin per channel, and up to 176 bins per channel. In hybrid mode, the first
-17 bands (up to 8 kHz) are not coded. In each band, the gain (energy) is coded separately from
+critical bands. The normal CELT layer uses 21 of those bands, though Opus
+ Custom (see <xref target="opus-custom"/>) may use a different number of bands.
+A band can contain as little as one MDCT bin per channel, and as many as 176
+bins per channel.
+In each band, the gain (energy) is coded separately from
the shape of the spectrum. Coding the gain explicitly makes it easy to
preserve the spectral envelope of the signal. The remaining unit-norm shape
-vector is encoded using a pyramid vector quantizer <xref target='PVQ-decoder'/>.
+vector is encoded using a Pyramid Vector Quantizer (PVQ)&nbsp;<xref target='PVQ-decoder'/>.
</t>
<t>
-Transients are notoriously difficult to code for transform codecs and CELT
-uses two different strategies for dealing with them:
+Transients are notoriously difficult for transform codecs to code.
+CELT uses two different strategies for them:
<list style="numbers">
-<t>Using multiple smaller MDCTs instead of a large MDCT</t>
-<t>Dynamic time-frequency changes (See <xref target='tf-change'/>)</t>
+<t>Using multiple smaller MDCTs instead of a single large MDCT, and</t>
+<t>Dynamic time-frequency resolution changes (See <xref target='tf-change'/>).</t>
</list>
To improve quality on highly tonal and periodic signals, CELT includes
-a prefilter/postfilter combination. The prefilter on the encoder side
+a prefilter/postfilter combination. The prefilter on the encoder side
attenuates the signal's harmonics. The postfilter on the decoder size,
restores the original gain of the harmonics, while shaping the coding noise
to roughly follow the harmonics. Such noise shaping reduces the perception
-of the noise.
+of the noise.
</t>
<t>
@@ -4924,7 +4926,7 @@ in the SILK layer.
<section anchor="transient-decoding" title="Transient Decoding">
<t>
-The "transient" flag indicates whether the frame uses a long MDCT or shoft MDCTs.
+The "transient" flag indicates whether the frame uses a single long MDCT or several short MDCTs.
When it is set, then the MDCT coefficients represent multiple
short MDCTs in the frame. When not set, the coefficients represent a single
long MDCT for the frame. The flag is encoded in the bitstream with a probability of 1/8.
@@ -4943,7 +4945,7 @@ tf_change flags.
<t>
It is important to quantize the energy with sufficient resolution because
any energy quantization error cannot be compensated for at a later
-stage. Regardless of the resolution used for encoding the shape of a band,
+stage. Regardless of the resolution used for encoding the spectral shape of a band,
it is perceptually important to preserve the energy in each band. CELT uses a
three-step coarse-fine-fine strategy for encoding the energy in the base-2 log
domain, as implemented in quant_bands.c</t>
@@ -5014,18 +5016,16 @@ This is implemented in unquant_energy_finalise() (quant_bands.c).
</section> <!-- Energy decode -->
<section anchor="allocation" title="Bit Allocation">
-<t>Many codecs transmit significant amounts of side information for
-the purpose of controlling bit allocation within a frame. Often this
-side information controls bit usage indirectly and must be carefully
-selected to achieve the desired rate constraints.</t>
-
-<t>The band-energy normalized structure of Opus MDCT mode ensures that a
-constant bit allocation for the shape content of a band will result in a
-roughly constant tone-to-noise ratio, which provides for fairly consistent
-perceptual performance <xref target='Valin2010'/>. The effectiveness of this approach is the result of
-two factors: that the band energy, which is understood to be perceptually
-important on its own, is always preserved regardless of the shape precision, and because
-the constant tone-to-noise ratio implies a constant intra-band noise to masking ratio.
+
+<t>The band-energy normalized structure of the CELT layer ensures that using
+ the same number of bits for the spectral shape of a band in every packet will
+ result in a roughly constant tone-to-noise ratio.
+This provides fairly consistent perceptual
+ performance&nbsp;<xref target='Valin2010'/>.
+The effectiveness of this approach is the result of
+two factors: 1) the band energy, which is perceptually important on its own, is
+always preserved regardless of the shape precision, and 2) because
+the constant tone-to-noise ratio implies a constant intra-band noise-to-masking ratio.
Intra-band masking is the strongest of the perceptual masking effects. This structure
means that the ideal allocation is more consistent from frame to frame than
it is for other codecs without an equivalent structure.</t>
@@ -5036,16 +5036,26 @@ made in the encoder and decoder. Any deviation from the reference's resulting
bit allocation will result in corrupted output, though implementers are
free to implement the procedure in any way which produces identical results.</t>
-<t>Because all of the information required to decode a frame must be derived
-from that frame alone in order to retain robustness to packet loss, the
-overhead of explicitly signaling the allocation would be considerable,
-especially for low-latency (small frame size) applications,
-even though the allocation is relatively static.</t>
+<t>Many codecs transmit significant amounts of side information to control the
+ bit allocation within a frame.
+Often this control is only indirect, and must be exercised carefully to
+ achieve the desired rate constraints.
+The CELT layer, however, can adapt over a very wide range of rates, and thus
+ has a large number of codebooks sizes to choose from for each band.
+Explicitly signaling the size of each of these codebooks would impose
+ considerable overhead, even though the allocation is relatively static from
+ frame to frame.
+This is because all of the information required to compute these codebook sizes
+ must be derived from a single frame by itself, in order to retain robustness
+ to packet loss, so the signaling cannot take advantage of knowledge of the
+ allocation in neighboring frames.
+This problem is exacerbated in low-latency (small frame size) applications,
+ which would include this overhead in every frame.</t>
<t>For this reason, in the MDCT mode Opus uses a primarily implicit bit
allocation. The available bitstream capacity is known in advance to both
the encoder and decoder without additional signaling, ultimately from the
-packet sizes expressed by a higher-level protocol. Using this information
+packet sizes expressed by a higher-level protocol. Using this information,
the codec interpolates an allocation from a hard-coded table.</t>
<t>While the band-energy structure effectively models intra-band masking,
@@ -7446,7 +7456,7 @@ are built and &lt;vector path&gt; is the directory containing the test vectors.
</t>
</section>
-<section title="Opus Custom">
+<section anchor="opus-custom" title="Opus Custom">
<t>
Opus Custom is an OPTIONAL part of the specification that is defined to
handle special sample rates and frame rates that are not supported by the