/****************************************************************************
**
** Copyright (C) 2017 The Qt Company Ltd.
** Contact: https://www.qt.io/licensing/
**
** This file is part of the documentation of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:FDL$
** Commercial License Usage
** Licensees holding valid commercial Qt licenses may use this file in
** accordance with the commercial license agreement provided with the
** Software or, alternatively, in accordance with the terms contained in
** a written agreement between you and The Qt Company. For licensing terms
** and conditions see https://www.qt.io/terms-conditions. For further
** information use the contact form at https://www.qt.io/contact-us.
**
** GNU Free Documentation License Usage
** Alternatively, this file may be used under the terms of the GNU Free
** Documentation License version 1.3 as published by the Free Software
** Foundation and appearing in the file included in the packaging of
** this file. Please review the following information to ensure
** the GNU Free Documentation License version 1.3 requirements
** will be met: https://www.gnu.org/licenses/fdl-1.3.html.
** $QT_END_LICENSE$
**
****************************************************************************/

/*!
\page xquery-introduction.html
\title A Short Path to XQuery

\pagekeywords XPath XQuery
\startpage XQuery
\keyword XQuery-introduction

XQuery is a language for querying XML data or non-XML data that can be
modeled as XML. XQuery is specified by the \l{http://www.w3.org}{W3C}.

\tableofcontents

\section1 Introduction

Where Java and C++ are \e{statement-based} languages, the XQuery
language is \e{expression-based}. The simplest XQuery expression is an
XML element constructor:

\snippet code/doc_src_qtxmlpatterns.qdoc 20

This \c{<recipe/>} element is an XQuery expression that forms a
complete XQuery. In fact, this XQuery doesn't actually query
anything. It just creates an empty \c{<recipe/>} element in the
output. But \l{Constructing Elements} {constructing new elements in an
XQuery} is often necessary.

An XQuery expression can also be enclosed in curly braces and embedded
in another XQuery expression. This XQuery has a document expression
embedded in a node expression:

\snippet code/doc_src_qtxmlpatterns.qdoc 21

It creates a new \c{<html>} element in the output and sets its \c{id}
attribute to be the \c{id} attribute from an \c{<html>} element in the
\c{other.html} file.

\section1 Using Path Expressions To Match And Select Items

In C++ and Java, we write nested \c{for} loops and recursive functions
to traverse XML trees in search of elements of interest. In XQuery, we
write these iterative and recursive algorithms with \e{path
expressions}.

A path expression looks somewhat like a typical \e{file pathname} for
locating a file in a hierarchical file system. It is a sequence of one
or more \e{steps} separated by slash '/' or double slash '//'.
Although path expressions are used for traversing XML trees, not file
systems, in Qt XML Patterns we can model a file system to look like an
XML tree, so in Qt XML Patterns we can use XQuery to traverse a file
system. See the \l {File System Example} {file system example}.

Think of a path expression as an algorithm for traversing an XML tree
to find and collect items of interest. This algorithm is evaluated by
evaluating each step moving from left to right through the sequence. A
step is evaluated with a set of input items (nodes and atomic values),
sometimes called the \e focus.  The step is evaluated for each item in
the focus. These evaluations produce a new set of items, called the \e
result, which then becomes the focus that is passed to the next step.
Evaluation of the final step produces the final result, which is the
result of the XQuery.  The items in the result set are presented in
\l{http://www.w3.org/TR/xquery/#id-document-order} {document order}
and without duplicates.

With Qt XML Patterns, a standard way to present the initial focus to a
query is to call QXmlQuery::setFocus(). Another common way is to let
the XQuery itself create the initial focus by using the first step of
the path expression to call the XQuery \c{doc()} function. The
\c{doc()} function loads an XML document and returns the \e {document
node}. Note that the document node is \e{not} the same as the
\e{document element}. The \e{document node} is a node constructed in
memory, when the document is loaded. It represents the entire XML
document, not the document element. The \e{document element} is the
single, top-level XML element in the file. The \c{doc()} function
returns the document node, which becomes the singleton node in the
initial focus set. The document node will have one child node, and
that child node will represent the document element.  Consider the
following XQuery:

\snippet code/doc_src_qtxmlpatterns.qdoc 18

The \c{doc()} function loads the \c{cookbook.xml} file and returns the
document node. The document node then becomes the focus for the next
step \c{//recipe}. Here the double slash means select all \c{<recipe>}
elements found below the document node, regardless of where they
appear in the document tree.  The query selects all \c{<recipe>}
elements in the cookbook. See \l{Running The Cookbook Examples} for
instructions on how to run this query (and most of the ones that
follow) from the command line.

Conceptually, evaluation of the steps of a path expression is similar
to iterating through the same number of nested \e{for} loops. Consider
the following XQuery, which builds on the previous one:

\snippet code/doc_src_qtxmlpatterns.qdoc 19

This XQuery is a single path expression composed of three steps. The
first step creates the initial focus by calling the \c{doc()}
function. We can paraphrase what the query engine does at each step:

\list 1
    \li for each node in the initial focus (the document node)...
    \li for each descendant node that is a \c{<recipe>} element...
    \li collect the child nodes that are \c{<title>} elements.
\endlist

Again the double slash means select all the \c{<recipe>} elements in the
document. The single slash before the \c{<title>} element means select
only those \c{<title>} elements that are \e{child} elements of a
\c{<recipe>} element (i.e. not grandchildren, etc). The XQuery evaluates
to a final result set containing the \c{<title>} element of each
\c{<recipe>} element in the cookbook.
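
As an illustrative sketch (not one of the shipped snippets), the
nested-loop paraphrase can be spelled out with explicit \e{for}
clauses. It assumes the three-step query described above and
\c{<recipe>} elements that are not nested inside one another, so that
the loops visit the titles in document order:

\code
(: Roughly equivalent to doc("cookbook.xml")//recipe/title :)
for $doc in doc("cookbook.xml")      (: step 1: the document node :)
for $recipe in $doc//recipe          (: step 2: each <recipe> descendant :)
for $title in $recipe/title          (: step 3: each <title> child :)
return $title
\endcode

The path expression remains the idiomatic form. Unlike the explicit
loops, it also guarantees document order and removes duplicates from
the result.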

\section2 Axis Steps

The most common kind of path step is called an \e{axis step}, which
tells the query engine which way to navigate from the context node,
and which test to perform when it encounters nodes along the way. An
axis step has two parts, an \e{axis specifier}, and a \e{node test}.
Conceptually, evaluation of an axis step proceeds as follows: For each
node in the focus set, the query engine navigates out from the node
along the specified axis and applies the node test to each node it
encounters. The nodes selected by the node test are collected in the
result set, which becomes the focus set for the next step.

In the example XQuery above, the second and third steps are both axis
steps. Both apply the \c{element(name)} node test to nodes encountered
while traversing along some axis. But in this example, the two axis
steps are written in a \l{Shorthand Form} {shorthand form}, where the
axis specifier and the node test are not written explicitly but are
implied. XQueries are normally written in this shorthand form, but
they can also be written in the longhand form. If we rewrite the
XQuery in the longhand form, it looks like this:

\snippet code/doc_src_qtxmlpatterns.qdoc 22

The two axis steps have been expanded. The first axis step
(\c{//recipe}) has been rewritten as
\c{/descendant-or-self::element(recipe)}, where
\c{descendant-or-self::} is the axis specifier and \c{element(recipe)}
is the node test. The second axis step (\c{title}) has been rewritten
as \c{/child::element(title)}, where \c{child::} is the axis specifier
and \c{element(title)} is the node test. The output of the expanded
XQuery will be exactly the same as the output of the shorthand form.

To create an axis step, concatenate an axis specifier and a node
test. The following sections list the axis specifiers and node tests
that are available.

\section2 Axis Specifiers

An axis specifier defines the direction you want the query engine to
take, when it navigates away from the context node. Qt XML Patterns
supports the following axes.

\table
\header
  \li Axis Specifier
  \li refers to the axis containing...
  \row
    \li \c{self::}
    \li the context node itself
  \row
    \li \c{attribute::}
    \li all attribute nodes of the context node
  \row
    \li \c{child::}
    \li all child nodes of the context node (not attributes)
  \row
    \li \c{descendant::}
    \li all descendants of the context node (children, grandchildren, etc)
  \row
    \li \c{descendant-or-self::}
    \li all nodes in \c{descendant} + \c{self}
  \row
    \li \c{parent::}
    \li the parent node of the context node, or empty if there is no parent
  \row
    \li \c{ancestor::}
    \li all ancestors of the context node (parent, grandparent, etc)
  \row
    \li \c{ancestor-or-self::}
    \li all nodes in \c{ancestor} + \c{self}
  \row
    \li \c{following::}
    \li all nodes in the tree containing the context node, \e not
    including \c{descendant}, \e and that follow the context node
    in the document
  \row
    \li \c{preceding::}
    \li all nodes in the tree containing the context node, \e not
    including \c{ancestor}, \e and that precede the context node in
    the document
  \row
    \li \c{following-sibling::}
    \li all children of the context node's \c{parent} that follow the
    context node in the document
  \row
    \li \c{preceding-sibling::}
    \li all children of the context node's \c{parent} that precede the
    context node in the document
\endtable
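
The following queries are a hedged sketch of a few of these axes in
use, written in the longhand form. They assume the \c{cookbook.xml}
structure used elsewhere on this page (\c{<recipe>} elements that
contain a \c{<title>} child). Each expression is meant to be run as a
separate query:

\code
(: child axis: the <title> children of every <recipe> :)
doc("cookbook.xml")/descendant::element(recipe)/child::element(title)

(: parent axis: climb from each <title> to the element that contains it :)
doc("cookbook.xml")/descendant::element(title)/parent::element()

(: following-sibling axis: every <recipe> sibling after the first <recipe> :)
doc("cookbook.xml")/descendant::element(recipe)[1]/following-sibling::element(recipe)
\endcode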

\section2 Node Tests

A node test is a conditional expression that must be true for a node
if the node is to be selected by the axis step. The conditional
expression can test just the \e kind of node, or it can test the \e
kind of node and the \e name of the node. The XQuery specification for
\l{http://www.w3.org/TR/xquery/#node-tests} {node tests} also defines
a third condition, the node's \e {Schema Type}, but schema type tests
are not supported in Qt XML Patterns.

Qt XML Patterns supports the following node tests. The tests that have a
\c{name} parameter test the node's name in addition to its \e{kind}
and are often called the \l{Name Tests}.

\table
\header
  \li Node Test
  \li matches all...
  \row
    \li \c{node()}
    \li nodes of any kind
  \row
    \li \c{text()}
    \li text nodes
  \row
    \li \c{comment()}
    \li comment nodes
  \row
    \li \c{element()}
    \li element nodes (same as star: *)
  \row
    \li \c{element(name)}
    \li element nodes named \c{name}
  \row
    \li \c{attribute()}
    \li attribute nodes
  \row
    \li \c{attribute(name)}
    \li attribute nodes named \c{name}
  \row
    \li \c{processing-instruction()}
    \li processing-instructions
  \row
    \li \c{processing-instruction(name)}
    \li processing-instructions named \c{name}
  \row
    \li \c{document-node()}
    \li document nodes (there is only one)
  \row
    \li \c{document-node(element(name))}
    \li document node with document element \c{name}
\endtable
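
As a sketch of how the kind tests differ from the name tests, the
queries below select nodes by kind only. They assume the
\c{cookbook.xml} file used elsewhere on this page, and each expression
is meant to be run as a separate query:

\code
(: text(): the text nodes inside each <title>, without the element markup :)
doc("cookbook.xml")//recipe/title/child::text()

(: comment(): every comment in the document (empty result if there are none) :)
doc("cookbook.xml")//comment()

(: document-node(): matches only the node returned by doc() itself :)
doc("cookbook.xml")/self::document-node()
\endcode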

\target Shorthand Form
\section2 Shorthand Form

Writing axis steps using the longhand form with axis specifiers and
node tests is semantically clear but syntactically verbose. The
shorthand form is easy to learn and, once you learn it, just as easy
to read. In the shorthand form, the axis specifier and node test are
implied by the syntax. XQueries are normally written in the shorthand
form. Here is a table of some frequently used shorthand forms:

\table
\header
  \li Shorthand syntax
  \li Short for...
  \li matches all...
  \row
    \li \c{name}
    \li \c{child::element(name)}
    \li child nodes that are \c{name} elements

  \row
    \li \c{*}
    \li \c{child::element()}
    \li child nodes that are elements (\c{node()} matches
    \e all child nodes)

  \row
    \li \c{..}
    \li \c{parent::node()}
    \li parent nodes (there is only one)

  \row
    \li \c{@*}
    \li \c{attribute::attribute()}
    \li attribute nodes

  \row
    \li \c{@name}
    \li \c{attribute::attribute(name)}
    \li \c{name} attributes

  \row
    \li \c{//}
    \li \c{descendant-or-self::node()}
    \li descendant nodes (when used instead of '/')

\endtable

The \l{http://www.w3.org/TR/xquery/}{XQuery language specification}
has a more detailed section on the shorthand form, which it calls the
\l{http://www.w3.org/TR/xquery/#abbrev} {abbreviated syntax}. More
examples of path expressions written in the shorthand form are found
there. There is also a section listing examples of path expressions
written in the \l{http://www.w3.org/TR/xquery/#unabbrev} {longhand
form}.
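
As a small sketch, the pair of queries below shows one shorthand
query next to the same query with every abbreviation expanded. The
expansion follows the table above and the specification's rule that
\c{//} abbreviates \c{/descendant-or-self::node()/}. The query itself
is only an illustration and assumes the \c{<title>} elements of
\c{cookbook.xml}:

\code
(: shorthand: the parent of every <title> element :)
doc("cookbook.xml")//title/..

(: the same query in the longhand form :)
doc("cookbook.xml")/descendant-or-self::node()/child::element(title)/parent::node()
\endcode

Both forms should select the same nodes, here the elements that
contain a \c{<title>}.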

\target Name Tests
\section2 Name Tests

The name tests are the \l{Node Tests} that have the \c{name}
parameter. A name test must match the node \e name in addition to the
node \e kind. We have already seen name tests used:

\snippet code/doc_src_qtxmlpatterns.qdoc 19

In this path expression, both \c{recipe} and \c{title} are name tests
written in the shorthand form. XQuery resolves these names
(\l{http://www.w3.org/TR/xquery/#id-basics}{QNames}) to their expanded
form using whatever
\l{http://www.w3.org/TR/xquery/#dt-namespace-declaration} {namespace
declarations} it knows about. Resolving a name to its expanded form
means replacing its namespace prefix, if one is present (there aren't
any present in the example), with a namespace URI. The expanded name
then consists of the namespace URI and the local name.

But the names in the example above don't have namespace prefixes,
because we didn't include a namespace declaration in our
\c{cookbook.xml} file. However, we will often use XQuery to query XML
documents that use namespaces. Forgetting to declare the correct
namespace(s) in an XQuery is a common cause of XQuery failures. Let's
add a \e{default} namespace to \c{cookbook.xml} now. Change the
\e{document element} in \c{cookbook.xml} from:

\snippet code/doc_src_qtxmlpatterns.qdoc 23

to...

\snippet code/doc_src_qtxmlpatterns.qdoc 24

This is called a \e{default namespace} declaration because it doesn't
include a namespace prefix. By including this default namespace
declaration in the document element, we mean that all unprefixed
\e{element} names in the document, including the document element
itself (\c{cookbook}), are automatically in the default namespace
\c{http://cookbook/namespace}. Note that unprefixed \e{attribute}
names are not affected by the default namespace declaration. They are
always considered to be in \e{no namespace}.  Note also that the URL
we choose as our namespace URI need not refer to an actual location,
and doesn't refer to one in this case. Some namespace URIs do resolve,
however. One example is \l{http://www.w3.org/XML/1998/namespace},
the namespace URI for elements and attributes prefixed with \c{xml:}.

Now when we try to run the previous XQuery example, no output is
produced! The path expression no longer matches anything in the
cookbook file because our XQuery doesn't yet know about the namespace
declaration we added to the cookbook document. There are two ways we
can declare the namespace in the XQuery. We can give it a \e{namespace
prefix} (e.g. \c{c} for cookbook) and prefix each name test with the
namespace prefix:

\snippet code/doc_src_qtxmlpatterns.qdoc 3

Or we can declare the namespace to be the \e{default element
namespace}, and then we can still run the original XQuery:

\snippet code/doc_src_qtxmlpatterns.qdoc 4

Both methods will work and produce the same output, all the
\c{<title>} elements:

\snippet code/doc_src_qtxmlpatterns.qdoc 5

But note how the output is slightly different from the output we saw
before we added the default namespace declaration to the cookbook file.
Qt XML Patterns automatically includes the correct namespace attribute
in each \c{<title>} element in the output. When Qt XML Patterns loads a
document and expands a QName, it creates an instance of QXmlName,
which retains the namespace prefix along with the namespace URI and
the local name. See QXmlName for further details.

One thing to keep in mind from this namespace discussion, whether you
run XQueries in a Qt program using Qt XML Patterns, or you run them from
the command line using \c{xmlpatterns}, is that if you don't get the
output you expect, it might be because the data you are querying uses
namespaces, but you didn't declare those namespaces in your XQuery.

\section3 Wildcards in Name Tests

The wildcard \c{'*'} can be used in a name test. To find all the
attributes in the cookbook but select only the ones in the \c{xml}
namespace, use the \c{xml:} namespace prefix but replace the
\e{local name} (the attribute name) with the wildcard:

\snippet code/doc_src_qtxmlpatterns.qdoc 7

Oops! If you save this XQuery in \c{file.xq} and run it through
\c{xmlpatterns}, it doesn't work. You get an error message instead,
something like this: \e{Error SENR0001 in file:///...file.xq, at line
1, column 1: Attribute xml:id can't be serialized because it appears
at the top level.} The XQuery actually ran correctly. It selected a
bunch of \c{xml:id} attributes and put them in the result set. But
then \c{xmlpatterns} sent the result set to a \l{QXmlSerializer}
{serializer}, which tried to output it as well-formed XML. Since the
result set contains only attributes and attributes alone are not
well-formed XML, the \l{QXmlSerializer} {serializer} reports a
\l{http://www.w3.org/TR/2005/WD-xslt-xquery-serialization-20050915/#id-errors}
{serialization error}.

Fear not. XQuery can do more than just find and select elements and
attributes. It can \l{Constructing Elements} {construct new ones on
the fly} as well, which is what we need to do here if we want
\c{xmlpatterns} to let us see the attributes we selected. The example
above and the ones below are revisited in the \l{Constructing
Elements} section. You can jump ahead to see the modified examples
now, and then come back, or you can press on from here.

To find all the \c{name} attributes in the cookbook and select them
all regardless of their namespace, replace the namespace prefix with
the wildcard and write \c{name} (the attribute name) as the local
name:

\snippet code/doc_src_qtxmlpatterns.qdoc 8

To find and select all the attributes of the \e{document element} in
the cookbook, replace the entire name test with the wildcard:

\snippet code/doc_src_qtxmlpatterns.qdoc 9

\section1 Using Predicates In Path Expressions

Predicates can be used to further filter the nodes selected by a path
expression. A predicate is an expression in square brackets ('[' and
']') that returns either a boolean value or a number. A predicate can
appear at the end of any path step in a path expression. The predicate
is applied to each node in the focus set.  If a node passes the
filter, the node is included in the result set.  The query below
selects the recipe element that has the \c{<title>} element
\c{"Hard-Boiled Eggs"}.

\snippet code/doc_src_qtxmlpatterns.qdoc 10

The dot expression ('.') can be used in predicates and path
expressions to refer to the current context node. The following query
uses the dot expression to refer to the current \c{<method>} element.
The query selects the empty \c{<method>} elements from the cookbook.

\snippet code/doc_src_qtxmlpatterns.qdoc 11

Note that passing the dot expression to the
\l{http://www.w3.org/TR/xpath-functions/#func-string-length}
{string-length()} function is optional. When
\l{http://www.w3.org/TR/xpath-functions/#func-string-length}
{string-length()} is called with no parameter, the context node is
assumed:

\snippet code/doc_src_qtxmlpatterns.qdoc 12

Actually, selecting an empty \c{<method>} element might not be very
useful by itself. It doesn't tell you which recipe has the empty
method:

\snippet code/doc_src_qtxmlpatterns.qdoc 31

\target Empty Method Not Robust
What you probably want to see instead are the \c{<recipe>} elements that
have empty \c{<method>} elements:

\snippet code/doc_src_qtxmlpatterns.qdoc 32

The predicate uses the
\l{http://www.w3.org/TR/xpath-functions/#func-string-length}
{string-length()} function to test the length of each \c{<method>}
element in each \c{<recipe>} element found by the node test. If a
\c{<method>} contains no text, the predicate evaluates to \c{true} and
the \c{<recipe>} element is selected. If the method contains some
text, the predicate evaluates to \c{false}, and the \c{<recipe>}
element is discarded.  The output is the entire recipe that has no
instructions for preparation:

\snippet code/doc_src_qtxmlpatterns.qdoc 33

The astute reader will have noticed that this use of
\c{string-length()} to find an empty element is unreliable. It works
in this case, because the method element is written as \c{<method/>},
guaranteeing that its string length will be 0. It will still work if
the method element is written as \c{<method></method>}, but it will
fail if there is any whitespace between the opening and ending
\c{<method>} tags. A more robust way to find the recipes with empty
methods is presented in the section on \l{Boolean Predicates}.

There are many more functions and operators defined for XQuery and
XPath. They are all \l{http://www.w3.org/TR/xpath-functions}
{documented in the specification}.

\section2 Positional Predicates

Predicates are often used to filter items based on their position in
a sequence. For path expressions processing items loaded from XML
documents, the normal sequence is
\l{http://www.w3.org/TR/xquery/#id-document-order} {document order}.
This query returns the second \c{<recipe>} element in the
\c{cookbook.xml} file:

\snippet code/doc_src_qtxmlpatterns.qdoc 13

A frequently used positional function is
\l{http://www.w3.org/TR/xpath-functions/#func-last} {last()}, which
returns the numeric position of the last item in the focus set. Stated
another way, \l{http://www.w3.org/TR/xpath-functions/#func-last}
{last()} returns the size of the focus set. This query returns the
last recipe in the cookbook:

\snippet code/doc_src_qtxmlpatterns.qdoc 16

And this query returns the next to last \c{<recipe>}:

\snippet code/doc_src_qtxmlpatterns.qdoc 17

\section2 Boolean Predicates

The other kind of predicate evaluates to \e true or \e false. A
boolean predicate takes the value of its expression and determines its
\e{effective boolean value} according to the following rules:

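\list
    \li An expression that evaluates to the empty sequence is \c{false}.

    \li An expression that evaluates to a sequence whose first item is
    a node is \c{true}.

    \li An expression that evaluates to a string is \c{false} if the
       string is empty and \c{true} if the string is not empty.

    \li An expression that evaluates to a boolean value (i.e. type
    \c{xs:boolean}) is that value.

    \li If the expression evaluates to anything else, it's an error
    (e.g. type \c{xs:date}). A number is the exception inside a
    predicate: it is treated as a position, as described in
    \l{Positional Predicates}.

\endlist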
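
The queries below are a minimal sketch of three of these cases. They
assume the \c{cookbook.xml} structure used on this page, with at most
one \c{<title>} per \c{<recipe>}, and each expression is meant to be
run as a separate query. Note that a path that selects nothing yields
the empty sequence, whose effective boolean value is \c{false}:

\code
(: node sequence: true for each <recipe> that has at least one <method> child :)
doc("cookbook.xml")//recipe[method]

(: string: true for each <recipe> whose title text is not the empty string :)
doc("cookbook.xml")//recipe[string(title)]

(: xs:boolean: the comparison itself yields true or false :)
doc("cookbook.xml")//recipe[title = "Hard-Boiled Eggs"]
\endcode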

We have already seen some boolean predicates in use.  Earlier, we saw
a \e{not so robust} way to find the \l{Empty Method Not Robust}
{recipes that have no instructions}. \c{[string-length(method) = 0]}
is a boolean predicate that would fail in the example if the empty
method element was written with both opening and closing tags and
there was whitespace between the tags. Here is a more robust way that
uses a different boolean predicate.

\snippet code/doc_src_qtxmlpatterns.qdoc 34

This one uses the
\l{http://www.w3.org/TR/xpath-functions/#func-empty} {empty()}
function to test whether the method contains any steps. If the method
contains no steps, then \c{empty(step)} will return \c{true}, and
hence the predicate will evaluate to \c{true}.

But even that version isn't foolproof. Suppose the method does contain
steps, but all the steps themselves are empty. That's still a case of
a recipe with no instructions that won't be detected. There is a
better way:

\snippet code/doc_src_qtxmlpatterns.qdoc 35

This version uses the
\l{http://www.w3.org/TR/xpath-functions/#func-not} {not()} and
\l{http://www.w3.org/TR/xpath-functions/#func-normalize-space}
{normalize-space()} functions. \c{normalize-space(method)} takes the
string value of the method element, which is the concatenated text of
everything it contains, strips leading and trailing whitespace, and
collapses each internal run of whitespace to a single space. If the
resulting string is empty, then \c{not()} returns \c{true} and the
predicate is \c{true}.
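
The effect of whitespace on these functions can be seen without any
input document. The following stand-alone query is an illustrative
sketch that uses only string literals. It evaluates to a sequence of
four atomic values, noted in the comments:

\code
(
  string-length("   "),                    (: 3: whitespace counts as characters :)
  string-length(normalize-space("   ")),   (: 0: nothing is left after normalization :)
  normalize-space("  Boil   the eggs. "),  (: "Boil the eggs." :)
  not(normalize-space("   "))              (: true: the empty string is false :)
)
\endcode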

We can also use the
\l{http://www.w3.org/TR/xpath-functions/#func-position} {position()}
function in a comparison to inspect positions with conditional logic. The
\l{http://www.w3.org/TR/xpath-functions/#func-position} {position()}
function returns the position index of the current context item in the
sequence of items:

\snippet code/doc_src_qtxmlpatterns.qdoc 14

Note that the first position in the sequence is position 1, not 0. We
can also select \e{all} the recipes after the first one:

\snippet code/doc_src_qtxmlpatterns.qdoc 15
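
Position tests can also be combined with \c{and} and \c{or}. As a
sketch, assuming the \c{<recipe>} elements are siblings under the
document element, this query selects every recipe except the first and
the last:

\code
(: position() is the item's index, last() is the size of the focus :)
doc("cookbook.xml")//recipe[position() > 1 and position() < last()]
\endcode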

\target Constructing Elements
\section1 Constructing Elements

In the section about \l{Wildcards in Name Tests} {using wildcards in
name tests}, we saw three simple example XQueries, each of which
selected a different list of XML attributes from the cookbook.  We
couldn't use \c{xmlpatterns} to run these queries, however, because
\c{xmlpatterns} sends the XQuery results to a \l{QXmlSerializer}
{serializer}, which expects to serialize the results as well-formed
XML. Since a list of XML attributes by itself is not well-formed XML,
the serializer reported an error for each XQuery.

Since an attribute must appear in an element, for each attribute in
the result set, we must create an XML element. We can do that using a
\l{http://www.w3.org/TR/xquery/#id-for-let} {\e{for} clause} with a
\l{http://www.w3.org/TR/xquery/#id-variables} {bound variable}, and a
\l{http://www.w3.org/TR/xquery/#id-orderby-return} {\e{return}
clause} with an element constructor:

\snippet code/doc_src_qtxmlpatterns.qdoc 25

The \e{for} clause produces a sequence of attribute nodes from the result
of the path expression. Each attribute node in the sequence is bound
to the variable \c{$i}. The \e{return} clause then constructs a \c{<p>}
element around the attribute node. Here is the output:

\snippet code/doc_src_qtxmlpatterns.qdoc 28

The output contains one \c{<p>} element for each \c{xml:id} attribute
in the cookbook. Note that XQuery puts each attribute in the right
place in its \c{<p>} element, despite the fact that in the \e{return}
clause, the \c{$i} variable is positioned as if it is meant to become
\c{<p>} element content.

The other two examples from the \l{Wildcards in Name Tests} {wildcard}
section can be rewritten the same way. Here is the XQuery that selects
all the \c{name} attributes, regardless of namespace:

\snippet code/doc_src_qtxmlpatterns.qdoc 26

And here is its output:

\snippet code/doc_src_qtxmlpatterns.qdoc 29

And here is the XQuery that selects all the attributes from the
\e{document element}:

\snippet code/doc_src_qtxmlpatterns.qdoc 27

And here is its output:

\snippet code/doc_src_qtxmlpatterns.qdoc 30

\section2 Element Constructors are Expressions

Because node constructors are expressions, they can be used in
XQueries wherever expressions are allowed.

\snippet code/doc_src_qtxmlpatterns.qdoc 40

If \c{cookbook.xml} is loaded without error, an \c{<oppskrift>} element
(the Norwegian word for recipe) is constructed for each \c{<recipe>}
element in the cookbook, and the child nodes of the \c{<recipe>} are
copied into the \c{<oppskrift>} element. But if the cookbook document
doesn't exist or does not contain well-formed XML, a single
\c{<oppskrift>} element is constructed containing an error message.

\section1 Constructing Atomic Values

XQuery also has atomic values. An atomic value is a value in the value
space of one of the built-in datatypes in the \l
{http://www.w3.org/TR/xmlschema-2} {XML Schema language}. These
\e{atomic types} have built-in operators for doing arithmetic,
comparisons, and for converting values to other atomic types. See the
\l {http://www.w3.org/TR/xmlschema-2/#built-in-datatypes} {Built-in
Datatype Hierarchy} for the entire tree of built-in, primitive and
derived atomic types. \note Click on a data type in the tree for its
detailed specification.

To construct an atomic value as element content, enclose an expression
in curly braces and embed it in the element constructor:

\snippet code/doc_src_qtxmlpatterns.qdoc 36

Sending this XQuery through \c{xmlpatterns} produces:

\snippet code/doc_src_qtxmlpatterns.qdoc 37

To compute the value of an attribute, enclose the expression in
curly braces and embed it in the attribute value:

\snippet code/doc_src_qtxmlpatterns.qdoc 38

Sending this XQuery through \c{xmlpatterns} produces:

\snippet code/doc_src_qtxmlpatterns.qdoc 39
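
The built-in operators for atomic types can be used in the same kind of
enclosed expression. The following is a minimal sketch, not one of the
Qt example files; the element names are made up for illustration:

\code
(: Arithmetic, a value comparison, and a conversion between atomic types. :)
<values>
    <sum>{3 + 4}</sum>
    <earlier>{xs:date("2009-01-01") lt xs:date("2010-01-01")}</earlier>
    <converted>{xs:integer("42") + 1}</converted>
</values>
\endcode

Under these assumptions, the three enclosed expressions evaluate to the
integer 7, the boolean true, and the integer 43.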

\section1 Running The Cookbook Examples

Most of the XQuery examples in this document refer to the
\c{cookbook.xml} example file from the \l{Recipes Example}.
Copy the \c{cookbook.xml} to your current directory, save one of the
cookbook XQuery examples in a \c{.xq} file (e.g., \c{file.xq}), and
run the XQuery using Qt's command line utility:

\snippet code/doc_src_qtxmlpatterns.qdoc 6

\section1 Further Reading

There is much more to the XQuery language than we have presented in
this short introduction. We will be adding more here in later
releases. In the meantime, playing with the \c{xmlpatterns} utility
and making modifications to the XQuery examples provided here will be
quite informative. An XQuery textbook will be a good investment.

You can also ask questions on XQuery mailing lists:

\list
\li
\l{http://lists.qt-project.org/mailman/listinfo/interest}{qt-interest}
\li
\l{http://www.x-query.com/mailman/listinfo/talk}{talk at x-query.com}
\endlist

\l{http://www.functx.com/functx/}{FunctX} has a collection of XQuery
functions that can be both useful and educational.

This introduction contains many links to the specifications, which, of course,
are the ultimate source of information about XQuery. They can be a bit
difficult, though, so consider investing in a textbook. The main specifications are:

\list

    \li \l{http://www.w3.org/TR/xquery/}{XQuery 1.0: An XML Query
    Language} - the main source for syntax and semantics.

    \li \l{http://www.w3.org/TR/xpath-functions/}{XQuery 1.0 and XPath
       2.0 Functions and Operators} - the built-in functions and operators.

\endlist

\section1 FAQ

The answers to these frequently asked questions explain the causes of
several common mistakes that most beginners make.  Reading through the
answers ahead of time might save you a lot of head scratching.

\section2 Why didn't my path expression match anything?

The most common cause of this bug is failure to declare one or more
namespaces in your XQuery. Consider the following query for selecting
all the examples in an XHTML document:

\quotefile patternist/simpleHTML.xq

It won't match anything because \c{index.html} is an XHTML file, and
all XHTML files declare the default namespace
\c{"http://www.w3.org/1999/xhtml"} in their top (\c{<html>}) element.
But the query doesn't declare this namespace, so the path expression
expands \c{html} to \c{{}html} and tries to match that expanded name.
But the actual expanded name is
\c{{http://www.w3.org/1999/xhtml}html}. One possible fix is to declare the
correct default namespace in the XQuery:

\quotefile patternist/simpleXHTML.xq
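
Another possible fix is to bind the XHTML namespace to a prefix and use
the prefix in every element name test. The sketch below assumes that the
examples are \c{<p>} elements with a \c{class} attribute of \c{example};
the path and the prefix \c{x} are hypothetical and not copied from the
files above:

\code
(: Bind the XHTML namespace to the prefix x instead of making it the default. :)
declare namespace x = "http://www.w3.org/1999/xhtml";
doc("index.html")/x:html/x:body/x:p[@class = "example"]
\endcode

The attribute name test needs no prefix, because unprefixed attributes
are in no namespace.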

Another common cause of this bug is to confuse the \e{document node}
with the top element node. They are different. This query won't match
anything:

\quotefile patternist/docPlainHTML.xq

The \c{doc()} function returns the \e{document node}, not the top
element node (\c{<html>}). Don't forget to match the top element node
in the path expression:

\quotefile patternist/docPlainHTML2.xq

\section2 What if my input namespace is different from my output namespace?

Just remember to declare both namespaces in your XQuery and use them
properly. Consider the following query, which is meant to generate
XHTML output from XML input:

\quotefile patternist/embedDataInXHTML.xq

We want the \c{<html>}, \c{<body>}, and \c{<p>} nodes we create in the
output to be in the standard XHTML namespace, so we declare the
default namespace to be \c{http://www.w3.org/1999/xhtml}. That's
correct for the output, but that same default namespace will also be
applied to the node names in the path expression we're trying to match
in the input (\c{/tests/test[@status = "failure"]}), which is wrong,
because the elements in \c{testResult.xml} are in a different namespace,
perhaps the empty one. So we must declare that namespace too, with a
namespace prefix, and then use the prefix with the node names in
the path expression. This one will probably work better:

\quotefile patternist/embedDataInXHTML2.xq

\section2 Why doesn't my return clause work?

Recall that XQuery is an \e{expression-based} language, not
\e{statement-based}. Because an XQuery is composed entirely of expressions,
understanding XQuery expression precedence is very important.
Consider the following query:

\quotefile patternist/forClause2.xq

It looks OK, but it isn't. It is supposed to be a FLWOR expression
comprising a \e{for} clause and a \e{return} clause, but it isn't just
that. It \e{has} a FLWOR expression, certainly (with the \e{for} and
\e{return} clauses), but it \e{also} has an arithmetic expression
(\e{+ $d}) dangling at the end because we didn't enclose the return
expression in parentheses.

Using parentheses to establish precedence is more important in XQuery
than in other languages, because XQuery is \e{expression-based}. In
this case, without parentheses enclosing \c{$i + $d}, the return
clause only returns \c{$i}. The \c{+$d} will have the result of the
FLWOR expression as its left operand. And, since the scope of variable
\c{$d} ends at the end of the \e{return} clause, a variable out of
scope error will be reported. Correct these problems by using
parentheses.

\quotefile patternist/forClause.xq
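
To make the grouping concrete, here is a hypothetical query of the same
shape (the values are illustrative and not taken from the files above),
with a comment showing how the unparenthesized form would be parsed:

\code
(: Without the parentheses, the query would be parsed as
       (for $i in (1, 2, 3) let $d := 10 return $i) + $d
   which leaves $d outside the scope of the FLWOR expression. :)
for $i in (1, 2, 3)
let $d := 10
return ($i + $d)
\endcode

With the parentheses in place, this sketch evaluates to 11 12 13.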

\section2 Why didn't my expression get evaluated?

You probably misplaced some curly braces. When you want an expression
evaluated inside an element constructor, enclose the expression in
curly braces. Without the curly braces, the expression will be
interpreted as text. Here is a \c{sum()} expression used in an \c{<e>}
element. The table shows cases where the curly braces are missing,
misplaced, and placed correctly:

\table
\header
  \li element constructor with expression...
  \li evaluates to...
  \row
    \li <e>sum((1, 2, 3))</e>
    \li <e>sum((1, 2, 3))</e>
  \row
    \li <e>sum({(1, 2, 3)})</e>
    \li <e>sum(1 2 3)</e>
  \row
    \li <e>{sum((1, 2, 3))}</e>
    \li <e>6</e>
\endtable
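
The same rule applies to attribute values, and an element can mix
literal text with several enclosed expressions. A small sketch (the
\c{total} attribute is made up for illustration):

\code
<e total="{sum((1, 2, 3))}">{1 + 2} and {count((1, 2, 3))}</e>
\endcode

Only the parts inside curly braces are evaluated, so the result should
be \c{<e total="6">3 and 3</e>}.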

\section2 My predicate is correct, so why doesn't it select the right stuff?

Either you put your predicate in the wrong place in your path
expression, or you forgot to add some parentheses.  Consider this
input file \c{doc.txt}:

\quotefile patternist/doc.txt

Suppose you want the first \c{<span>} element of every \c{<p>}
element. Apply a position filter (\c{[1]}) to the \c{/span} path step:

\quotefile patternist/filterOnStep.xq

Applying the \c{[1]} filter to the \c{/span} step returns the first
\c{<span>} element of each \c{<p>} element:

\snippet code/doc_src_qtxmlpatterns.qdoc 41

\note You can write the same query this way:

\snippet code/doc_src_qtxmlpatterns.qdoc 44

Or you can reduce it right down to this:

\snippet code/doc_src_qtxmlpatterns.qdoc 45

On the other hand, suppose you really want only one \c{<span>}
element, the first one in the document (i.e., you only want the first
\c{<span>} element in the first \c{<p>} element). Then you have to do
more filtering. There are two ways you can do it. You can apply the
\c{[1]} filter in the same place as above but enclose the path
expression in parentheses:

\quotefile patternist/filterOnPath.xq

Or you can apply a second position filter (\c{[1]} again) to the
\c{/p} path step:

\snippet code/doc_src_qtxmlpatterns.qdoc 43

Either way the query will return only the first \c{<span>} element in
the document:

\snippet code/doc_src_qtxmlpatterns.qdoc 42

\section2 Why doesn't my FLWOR behave as expected?

The quick answer is you probably expected your XQuery FLWOR to behave
just like a C++ \e{for} loop. But they aren't the same. Consider a
simple example:

\quotefile patternist/letOrderBy.xq

This query evaluates to \e{4 -4 -2 2 -8 8}. The \e{for} clause does
set up a \e{for} loop style iteration, which does evaluate the rest of
the FLWOR multiple times, one time for each value returned by the
\e{in} expression. That much is similar to the C++ \e{for} loop.

But consider the \e{return} clause. In C++ if you hit a \e{return}
statement, you break out of the \e{for} loop and return from the
function with one value. Not so in XQuery. The \e{return} clause is
the last clause of the FLWOR, and it means: \e{Append the return value
to the result list and then begin the next iteration of the FLWOR}.
When the \e{for} clause's \e{in} expression no longer returns a value,
the entire result list is returned.

Next, consider the \e{order by} clause. It doesn't do any sorting on
each iteration of the FLWOR. It just evaluates its expression on each
iteration (\c{$a} in this case) to get an ordering value to map to the
result item from each iteration. These ordering values are kept in a
parallel list. The result list is sorted at the end using the parallel
list of ordering values.

The last difference to note here is that the \e{let} clause does
\e{not} set up an iteration through a sequence of values like the
\e{for} clause does. The \e{let} clause isn't a sort of nested loop.
It isn't a loop at all. It is just a variable binding.  On each
iteration, it binds the \e{entire} sequence of values on the right to
the variable on the left. In the example above, it binds (4 -4) to
\c{$b} on the first iteration, (-2 2) on the second iteration, and (-8
8) on the third iteration. So the following query doesn't iterate
through anything, and doesn't do any ordering:

\quotefile patternist/invalidLetOrderBy.xq

It binds the entire sequence (2, 3, 1) to \c{$i} one time only; the
\e{order by} clause only has one thing to order and hence does
nothing, and the query evaluates to 2 3 1, the sequence assigned to
\c{$i}.

\note We didn't include a \e{where} clause in the example. The
\e{where} clause is for filtering results.
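
A minimal sketch of such filtering, using illustrative values rather
than one of the example files, iterates with a \e{for} clause so that
both \e{where} and \e{order by} have something to work on:

\code
(: for binds $i to one item per iteration, unlike let. :)
for $i in (2, 3, 1)
where $i > 1
order by $i
return $i
\endcode

This sketch evaluates to 2 3, because the \e{where} clause drops 1
before the remaining items are ordered.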

\section2 Why are my elements created in the wrong order?

The short answer is your elements are \e{not} created in the wrong
order, because when constructed elements appear as operands to a path
expression, there is no correct order. Consider the following query,
which again uses the input file \c{doc.txt}:

\snippet code/doc_src_qtxmlpatterns.qdoc 46

The query finds all the \c{<p>} elements in the file. For each \c{<p>}
element, it builds a \c{<p>} element in the output containing the
concatenated contents of all the \c{<p>} element's child \c{<span>}
elements. Running the query through \c{xmlpatterns} might produce the
following output, which is not sorted in the expected order.

\snippet code/doc_src_qtxmlpatterns.qdoc 47

You can use a \e{for} loop to ensure that the order of
the result set corresponds to the order of the input sequence:

\snippet code/doc_src_qtxmlpatterns.qdoc 48

This version produces the same result set but in the expected order:

\snippet code/doc_src_qtxmlpatterns.qdoc 49

\section2 Why can't I use \c{true} and \c{false} in my XQuery?

You can, but not by just using the names \c{true} and \c{false}
directly, because they are \l{Name Tests} {name tests} although they look
like boolean constants. The simple way to create the boolean values is
to use the built-in functions \c{true()} and \c{false()} wherever
you want to use \c{true} and \c{false}. The other way is to invoke the
boolean constructor:

\quotefile patternist/xsBooleanTrue.xq
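
As a small sketch of the first approach (illustrative, not one of the
example files), the built-in functions can be used anywhere a boolean
value is expected:

\code
(: A bare true or false would be a name test selecting child elements. :)
if (true() and not(false())) then "yes" else "no"
\endcode

This evaluates to the string yes.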
*/