docs/design/design-decodebin.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294

Decodebin design

GstDecodeBin
------------

Description:

  Autoplug and decode to raw media

  Input : single pad with ANY caps Output : Dynamic pads

* Contents

  _ a GstTypeFindElement connected to the single sink pad

  _ optionally a demuxer/parser

  _ optionally one or more DecodeGroup

* Autoplugging

  The goal is to reach 'target' caps (by default raw media).

  This is done by using the GstCaps of a source pad and finding the available
demuxers/decoders GstElement that can be linked to that pad.

  The process starts with the source pad of typefind and stops when no more
non-target caps are left. It is commonly done while pre-rolling, but can also
happen whenever a new pad appears on any element.

  Once a target caps has been found, that pad is ghosted and the
'pad-added' signal is emitted.

  If no compatible elements can be found for a GstCaps, the pad is ghosted and
the 'unknown-type' signal is emitted.


* Assisted auto-plugging

  When starting the auto-plugging process for a given GstCaps, two signals are
emitted in the following way in order to allow the application/user to assist or
fine-tune the process.

  _ 'autoplug-continue' :

    gboolean user_function (GstElement * decodebin, GstPad *pad, GstCaps * caps)

    This signal is fired at the very beginning with the source pad GstCaps. If
  the callback returns TRUE, the process continues normally. If the callback
  returns FALSE, then the GstCaps are considered as a target caps and the
  autoplugging process stops.

  - 'autoplug-factories' :

    GValueArray user_function (GstElement* decodebin, GstPad* pad, 
         GstCaps* caps);

    Get a list of elementfactories for @pad with @caps. This function is used to
    instruct decodebin2 of the elements it should try to autoplug. The default
    behaviour when this function is not overridern is to get all elements that
    can handle @caps from the registry sorted by rank.

  - 'autoplug-select' :

    gint user_function (GstElement* decodebin, GstPad* pad, GstCaps* caps,
                GValueArray* factories);

    This signal is fired once autoplugging has got a list of compatible
  GstElementFactory. The signal is emitted with the GstCaps of the source pad
  and a pointer on the GValueArray of compatible factories.

    The callback should return the index of the elementfactory in @factories
    that should be tried next.

    If the callback returns -1, the autoplugging process will stop as if no
  compatible factories were found.

  The default implementation of this function will try to autoplug the first
  factory of the list.

* Target Caps

  The target caps are a read/write GObject property of decodebin.

  By default the target caps are:

  _ Raw audio : audio/x-raw-int, audio/x-raw-float

  _ and raw video : video/x-raw-rgb, video/x-raw-yuv

  _ and Text : text/plain, text/x-pango-markup


* media chain/group handling

  When autoplugging, all streams coming out of a demuxer will be grouped in a
DecodeGroup.

  All new source pads created on that demuxer after it has emitted the
'no-more-pads' signal will be put in another DecodeGroup.

  Only one decodegroup can be active at any given time. If a new decodegroup is
created while another one exists, that decodegroup will be set as blocking until
the existing one has drained.


DecodeGroup
-----------

Description:

  Streams belonging to the same group/chain of a media file.

* Contents

  The DecodeGroup contains:

  _ a GstMultiQueue to which all streams of a the media group are connected.

  _ the eventual decoders which are autoplugged in order to produce the
  requested target pads.

* Proper group draining

  The DecodeGroup takes care that all the streams in the group are completely
drained (EOS has come through all source ghost pads).

* Pre-roll and block

  The DecodeGroup has a global blocking feature. If enabled, all the ghosted
source pads for that group will be blocked.

  A method is available to unblock all blocked pads for that group.
 

GstMultiQueue
-------------

Description:

  Multiple input-output data queue
  
  The GstMultiQueue achieves the same functionnality as GstQueue, with a few
differences:

  * Multiple streams handling.

    The element handles queueing data on more than one stream at once. To
  achieve such a feature it has request sink pads (sink_%d) and 'sometimes' src
  pads (src_%d).

    When requesting a given sinkpad, the associated srcpad for that stream will
  be created. Ex: requesting sink_1 will generate src_1.


  * Non-starvation on multiple streams.

    If more than one stream is used with the element, the streams' queues will
  be dynamically grown (up to a limit), in order to ensure that no stream is
  risking data starvation. This guarantees that at any given time there are at
  least N bytes queued and available for each individual stream.

    If an EOS event comes through a srcpad, the associated queue should be
  considered as 'not-empty' in the queue-size-growing algorithm.


  * Non-linked srcpads graceful handling.

    A GstTask is started for all srcpads when going to GST_STATE_PAUSED.

    The task are blocking against a GCondition which will be fired in two
  different cases:

    _ When the associated queue has received a buffer.

    _ When the associated queue was previously declared as 'not-linked' and the
    first buffer of the queue is scheduled to be pushed synchronously in
    relation to the order in which it arrived globally in the element (see
    'Synchronous data pushing' below).

    When woken up by the GCondition, the GstTask will try to push the next
  GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns
  GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'. If
  pushing the GstBuffer/GstEvent succeeded the queue will no longer be marked as
  'not-linked'.

    If pushing on all srcpads returns GstFlowReturn different from GST_FLOW_OK,
  then all the srcpads' tasks are stopped and subsequent pushes on sinkpads will
  return GST_FLOW_NOT_LINKED.

  * Synchronous data pushing for non-linked pads.

    In order to better support dynamic switching between streams, the multiqueue
  (unlike the current GStreamer queue) continues to push buffers on non-linked
  pads rather than shutting down. 

    In addition, to prevent a non-linked stream from very quickly consuming all
  available buffers and thus 'racing ahead' of the other streams, the element
  must ensure that buffers and inlined events for a non-linked stream are pushed
  in the same order as they were received, relative to the other streams
  controlled by the element. This means that a buffer cannot be pushed to a
  non-linked pad any sooner than buffers in any other stream which were received
  before it.


=====================================
 Parsers, decoders and auto-plugging
=====================================

This section has DRAFT status.

Some media formats come in different "flavours" or "stream formats". These
formats differ in the way the setup data and media data is signalled and/or
packaged. An example for this is H.264 video, where there is a bytestream
format (with codec setup data signalled inline and units prefixed by a sync
code and packet length information) and a "raw" format where codec setup
data is signalled out of band (via the caps) and the chunking is implicit
in the way the buffers were muxed into a container, to mention just two of
the possible variants.

Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.

Where there are multiple stream formats, parsers are usually expected
to be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as

   ... ! parser ! decoder ! sink

where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.

In an auto-plugging context this is not so straight-forward though,
because elements are plugged incrementally and not before the previous
element has processes some data and decided what it will output exactly
(unless the template caps are completely fixed, then it can continue
right away, this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the preferred
decoder.

If the parser had sufficiently concise but fixed source pad template caps,
decodebin could continue to plug a decoder right away, allowing the
parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the
parser needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration, etc.),
and there may be different decoders for different profiles (e.g. DSP codec
for baseline profile, and software fallback for main/high profile; or a DSP
codec only supporting certain resolutions, with a software fallback for
unusual resolutions). So if decodebin just plugged the most highest-ranking
decoder, that decoder might not be be able to handle the actual stream later
on, which would yield in an error (this is a data flow error then which would
be hard to intercept and avoid in decodebin). In other words, we can't solve
this issue by plugging a decoder right away with the parser.

So decodebin need to communicate to the parser the set of available decoder
caps (which would contain the relevant capabilities/restrictions such as
supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
filtering/sorting of course.

This could be done in multiple ways, e.g.

  - plug a capsfilter element right after the parser, and construct
    a set of filter caps from the list of available decoders (one
    could append at the end just the name(s) of the caps structures
    from the parser pad template caps to function as an 'ANY other'
    caps equivalent). This would let the parser negotiate to a
    supported stream format in the same way as with the static
    pipeline mentioned above, but of course incur some overhead
    through the additional capsfilter element.

  - one could add a filter-caps equivalent property to the parsers
    (and/or GstBaseParse class) (e.g. "prefered-caps" or so).

  - one could add some kind of "fixate-caps" or "fixate-format"
    signal to such parsers

Alternatively, one could simply make all decoders incorporate parsers, so
that always all formats are supported. This is problematic for other reasons
though (e.g. we would not be able to detect the profile in all cases then
before plugging a decoder, which would make it hard to just play the audio
part of a stream and not the video if a suitable decoder was missing, for
example).

Additional considerations: the same problem exists with sinks that support
non-raw formats. Consider, for example, an audio sink that accepts DTS audio,
but only the 14-bit variant, not the 16-bit variant (or only native endiannes).
Ideally dcaparse would convert into the required stream format here.