.. _intro_to_parsing: 

======================
Loading and saving RDF
======================

Reading an NT file
-------------------

RDF data can be serialized in several syntaxes (``xml``, ``n3``, ``ntriples``, ``trix``, etc.), any of which you might need to read. The simplest is ``ntriples``, a line-based format in which each line holds one statement. Create the file :file:`demo.nt` in the current directory with these two lines:

.. code-block:: n3

    <http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .

RDFLib needs to know which format to parse. Pass the ``format`` keyword argument to :meth:`~rdflib.graph.Graph.parse`; it accepts either a MIME type or a format name (see the :doc:`list of available parsers <plugin_parsers>`).
If you are not sure what format a file is in, :func:`rdflib.util.guess_format` will guess from the file extension.
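
As a minimal sketch of that guessing step, the snippet below calls :func:`rdflib.util.guess_format` on a few file names. Only :file:`demo.nt` comes from this page; the other two names are hypothetical and chosen purely for illustration. No file needs to exist, since only the extension is inspected.

.. code-block:: python

    from rdflib.util import guess_format

    # guess_format looks only at the extension; it does not open the file.
    fmt_nt = guess_format("demo.nt")            # N-Triples extension
    fmt_rdf = guess_format("foaf.rdf")          # hypothetical RDF/XML file name
    fmt_unknown = guess_format("notes.zzz")     # hypothetical unknown extension

    print(fmt_nt, fmt_rdf, fmt_unknown)

An unrecognized extension yields ``None``, so it is worth checking the result before handing it to ``parse``.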

In an interactive Python interpreter, try this:

.. code-block:: pycon

    >>> from rdflib import Graph
    >>> g = Graph()
    >>> g.parse("demo.nt", format="nt")
    <Graph identifier=HCbubHJy0 (<class 'rdflib.graph.Graph'>)>
    >>> len(g)
    2
    >>> import pprint
    >>> for stmt in g:
    ...     pprint.pprint(stmt)
    ... 
    (rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
     rdflib.term.URIRef('http://example.com/says'),
     rdflib.term.Literal('Hello world'))
    (rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
     rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
     rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))

The final lines show how RDFLib represents the two statements in the file. Each statement is a plain 3-tuple, and its subject, predicate, and object are all RDFLib term types.
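
Because each statement is a 3-tuple, it can be unpacked directly in a loop. The following sketch assumes the same two statements as :file:`demo.nt`, but passes them as a string through the ``data`` argument so that it runs without the file; the variable names are illustrative only.

.. code-block:: python

    from rdflib import Graph, Literal, URIRef

    # Same two statements as demo.nt, inlined so the example is self-contained.
    NT_DATA = """\
    <http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .
    """

    g = Graph()
    g.parse(data=NT_DATA, format="nt")

    # Unpack each statement into subject, predicate, and object.
    for s, p, o in g:
        kind = "literal" if isinstance(o, Literal) else "URI"
        print(f"{s} {p} -> {o} ({kind})")

    # Collect only the literal objects.
    literals = [o for s, p, o in g if isinstance(o, Literal)]

The iteration order of a graph is not guaranteed, so the two lines may print in either order.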

Reading remote graphs
---------------------

Reading graphs from the net is just as easy:

.. code-block:: pycon

    >>> g.parse("http://bigasterisk.com/foaf.rdf")
    >>> len(g)
    42

The format defaults to ``xml``, which is the usual format for the ``.rdf`` files you'll find on the net.

RDFLib will also happily read RDF from any file-like object, that is, anything with a ``read`` method.
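
A minimal sketch of reading from a file-like object, assuming an in-memory ``io.BytesIO`` stream stands in for an open file or network response. The single statement is taken from :file:`demo.nt`; the variable names are illustrative.

.. code-block:: python

    import io

    from rdflib import Graph

    # One N-Triples statement held in memory as bytes.
    nt_bytes = (
        b'<http://bigasterisk.com/foaf.rdf#drewp> '
        b'<http://example.com/says> "Hello world" .\n'
    )

    # BytesIO provides a read method, so parse accepts it as a source.
    stream = io.BytesIO(nt_bytes)

    g = Graph()
    g.parse(stream, format="nt")
    triple_count = len(g)
    print(triple_count)

The format is given explicitly here because a stream has no file extension to guess from.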