| Commit message | Author | Age | Files | Lines |
|
recursive parser is grddl.
|
raptor_parser_get_current_base_id
|
time so it does a fresh guess.
(raptor_grddl_run_grddl_transform_doc): Save and restore the genid
around recursive parsers, so blank nodes are numbered across graphs.
(raptor_grddl_run_recursive): Switch to parser_name, flags args.
Pass on the filter to the internal parser call.
Do not add parent if the parser is not grddl.
Pass on the ignore error flag to raptor_grddl_fetch_uri.
Save and restore the genid
around recursive parsers, so blank nodes are numbered across graphs.
Do not call rdfxml parser if selected parser is already rdfxml.
Update raptor_grddl_run_recursive calls to use parser name and flags.
Alter the <link> processing to use the guess parser to figure out the
mime type during the recursion. Do not filter the triples.
Fixes Issue#0000238 http://bugs.librdf.org/mantis/view.php?id=238
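
The genid save/restore described above can be sketched roughly as follows. This is only an illustration: sketch_parser, sketch_next_blank_id and sketch_run_child are hypothetical names, not raptor's real structures or functions, and the real parser state is richer than a single counter.

```c
#include <assert.h>

/* Hypothetical stand-in for a parser; NOT raptor's real struct. */
typedef struct {
  int genid; /* next blank node number to hand out */
} sketch_parser;

/* Hand out the next blank node number and advance the counter. */
static int sketch_next_blank_id(sketch_parser *p)
{
  assert(p);
  return p->genid++;
}

/* Run a (pretend) child parse that generates n_blank_nodes blank nodes.
 * The counter is copied into the child before the parse and copied back
 * afterwards, so numbering continues across the parent and child graphs
 * instead of restarting and colliding. */
static void sketch_run_child(sketch_parser *parent, sketch_parser *child,
                             int n_blank_nodes)
{
  int i;

  child->genid = parent->genid;  /* pass the counter down */
  for(i = 0; i < n_blank_nodes; i++)
    (void)sketch_next_blank_id(child);
  parent->genid = child->genid;  /* take the advanced counter back */
}
```

With this shape, a blank node generated by the parent after the child returns never reuses a number the child already handed out.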
|
decide whether to enable libxml HTML_PARSE_NONET with the html
parser.
|
XML nonet option if it was set with raptor feature nonet.
|
to check for enum value XML_PARSE_NONET
|
is not a file URI
|
which passes the tests again and
Fixes Issue#0000239 http://bugs.librdf.org/mantis/view.php?id=239
(raptor_grddl_parser_add_parent): Restored.
(raptor_grddl_copy_state): Removed
(raptor_grddl_new_child_parser): Removed.
(raptor_grddl_run_recursive): Remove reference to the above -
replacing raptor_grddl_new_child_parser with
raptor_grddl_ensure_internal_parser and replacing 'nparser'
references with grddl_parser->internal_parser.
|
feature RAPTOR_FEATURE_WWW_TIMEOUT
|
to enable looking for <html> <link> with RDF/XML value.
(raptor_grddl_parse_init_common): Enable html <link> by default.
(raptor_rdfa_parse_init): Disable html <link> for RDFa parser.
(raptor_grddl_parse_chunk): Check for html <link> available as
well as allowed by feature.
|
type="application/rdf+xml" href="uri">
|
(raptor_grddl_parser_add_parent): Deleted, merged into
raptor_grddl_new_child_parser.
Delete html:link entry from table for now - handle rdf/xml links
specially later.
(raptor_grddl_copy_state): Added, pulled out of
raptor_grddl_ensure_internal_parser
(raptor_grddl_ensure_internal_parser): Call raptor_grddl_copy_state
(raptor_grddl_new_child_parser): Added, from
raptor_grddl_ensure_internal_parser and
raptor_grddl_parser_add_parent to allocate a new parser rather than
overwrite the 'internal_parser'.
(raptor_grddl_fetch_uri): Set/reset the content type handler each time.
(raptor_grddl_run_xpath_match): Free URI after calculating relative
to base.
(raptor_grddl_run_recursive): Gains filter arg, again.
Use raptor_grddl_new_child_parser to make a new (GRDDL) raptor_parser*
and free it here when done.
(raptor_grddl_parse_chunk): Add new filter arg to
raptor_grddl_run_recursive
Look for <link type="application/rdf+xml" href="URI" /> with RDF
expected, not an XSLT transform URI.
|
(raptor_grddl_parse_chunk): Run XML then HTML parsing in sequence,
discarding all errors here. Restore the error handlers afterwards.
Move tidying up of buffers to function exit tidying.
|
href="URI" />
|
(raptor_init_parser_grddl_common): Deny reading, writing to files,
creating directories or writing to network.
(raptor_terminate_parser_grddl_common): Tidy up xslt security prefs.
|
with XSLT are reported against that URI not the documents.
|
raptor_grddl_check_rdf_content_type_handler since it stores all
content types now.
Check for HTML content type and set html_base processing flag
(raptor_grddl_run_recursive): Remove allow_rdf flag, always true.
|
html_base_processing flags.
(raptor_grddl_parse_init_common): Initialise grddl, xinclude but not
html base.
(raptor_rdfa_parse_init): Disable grddl, xinclude and init html base.
(raptor_grddl_run_xpath_match): If html_base_processing is enabled,
switch XML doc type to XML_HTML for the xmlNodeGetBase() call
and restore afterwards.
(raptor_grddl_parse_chunk): Look for HTML or XHTML mime types to
enable html_base_processing.
Conditionalise XML Include processing with xinclude_processing flag.
|
(raptor_grddl_parse_chunk): After xinclude processing, reserialize
the document DOM so it can be parsed later as RDF/XML if needed.
|
process as RDF/XML. Fix RDF/XML parsing of doc to not filter
triples.
|
(raptor_grddl_run_recursive): Use typedef raptor_parse_bytes_context with
raptor_parse_uri_write_bytes as a handler for starting the parse lazily.
|
when ignore_errors set.
|
send a different accept header.
(raptor_grddl_run_grddl_transform_uri): Call raptor_grddl_fetch_uri
and expect XSLT.
(raptor_grddl_run_recursive): Call raptor_grddl_fetch_uri
and ignore errors.
|
(raptor_grddl_fetch_uri): Added ignore_errors argument to set the
raptor_www error handler to raptor_grddl_discard_message
(raptor_grddl_run_grddl_transform_uri): Do not discard errors from
raptor_grddl_fetch_uri call.
(raptor_grddl_run_recursive): Added ignore_errors argument and
use it to return 0 with no warnings, when errors happen.
(raptor_grddl_parse_chunk): Run namespace URI recursive grddl while
discarding errors.
Run head profile URIs recursive grddl while discarding errors.
|
use the base URI there rather than the parser's.
(raptor_grddl_run_grddl_transform_uri): Pass on the xml context to
the above.
|
(raptor_new_xml_context): Renamed from
raptor_sequence_push_xml_context, moving sequence push to main code.
(raptor_rdfa_parse_init): Push URI for RDFa in raptor_grddl_parse_start.
(raptor_grddl_parse_start): Add XML context for RDFa here.
(raptor_grddl_add_transform_xml_context): Renamed from
raptor_grddl_add_transform_uri
(raptor_grddl_run_grddl_transform_doc): If no parser name can be
guessed, return.
(raptor_grddl_run_grddl_transform_uri): Take a grddl_xml_context* arg
instead of raptor_uri.
|
Transform and profile URI raptor_sequences are now sequences of
grddl_xml_context structures.
(raptor_sequence_push_xml_context, grddl_free_xml_context): Added.
(raptor_grddl_parse_init_common): No need for
raptor_libxml_init_generic_error_handlers, raptor_new_sax2 does it.
Initialise raptor_sequence with grddl_free_xml_context.
(raptor_grddl_add_transform_uri): Use raptor_sequence_push_xml_context.
(raptor_grddl_filter_triples): Use grddl_xml_context for profile_uri
sequence.
(raptor_grddl_run_xpath_match): Use grddl_xml_context for URI results.
(raptor_grddl_parse_chunk): Use raptor_sequence_push_xml_context
for former URI sequences.
|
be RDF/XML by mime type during recursive GRDDL, and an additional
parse of the top level content too, if also found.
|
to disable/enable checking for hardcoded microformats
|
Add hReview sheet that, if it matches, stops looking for later
microformats such as hCard.
(raptor_grddl_parse_chunk): Use MATCH_LAST to stop looking
for later hardcoded matches.
(raptor_init_parser_grddl_common,
raptor_terminate_parser_grddl_common): Added, called once
for grddl or rdfa available.
|
(XSLT) for a document, removing duplicate URIs.
(raptor_grddl_filter_triples, raptor_grddl_parse_chunk): Use
raptor_grddl_add_transform_uri.
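
Adding a transform URI while skipping duplicates, as the entry above describes, might look like the following minimal sketch. The fixed-size sketch_uri_list and sketch_add_transform_uri are hypothetical; the real code works on raptor sequences and raptor_uri objects rather than C strings.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_URIS 16

/* Hypothetical list of transform URIs; NOT raptor's real sequence type. */
typedef struct {
  const char *uris[SKETCH_MAX_URIS];
  int size;
} sketch_uri_list;

/* Add uri to the list unless an equal URI is already present.
 * Returns 1 if added, 0 if it was a duplicate, -1 if the list is full.
 * The list stores the pointer; the caller keeps ownership of the string. */
static int sketch_add_transform_uri(sketch_uri_list *list, const char *uri)
{
  int i;

  assert(list && uri);
  for(i = 0; i < list->size; i++) {
    if(!strcmp(list->uris[i], uri))
      return 0; /* already queued, so the transform runs only once */
  }
  if(list->size >= SKETCH_MAX_URIS)
    return -1;
  list->uris[list->size++] = uri;
  return 1;
}
```

Funnelling every caller through one helper like this keeps the duplicate check in a single place.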
|
GRDDL parser
|
from invalid or not well-formed HTML content.
raptor_grddl_parser_context_s gains htmlParserCtxt as well as
xmlParserCtxt and process_this_as_rdfxml GRDDL flag to know when
to parse the content twice.
(raptor_grddl_parse_terminate): Tidy htmlParserCtxt.
Add MATCH_IS_HARDCODED to match_table just to make it clear.
(match_table): Re-enable hcalendar and hcard as hardcoded XSLTs
(raptor_grddl_run_xpath_match): Handle non-namespace elements.
Handle MATCH_IS_HARDCODED and return on first match.
(raptor_grddl_parse_chunk): Major change in structure - all content
passed in is saved until is_end=1, then parsed with XML parser and if
that fails, HTML parser. HTML parser is run with no errors or
warnings.
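
The buffer-then-parse structure described for raptor_grddl_parse_chunk can be sketched as below. Everything here is hypothetical: the toy "parsers" only inspect the first bytes and stand in for the real libxml2 XML and HTML parsers, and the names and return codes are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A parse attempt returns 0 on success, non-0 on failure. */
typedef int (*sketch_parse_fn)(const char *buf, size_t len);

typedef struct {
  char *buf;   /* all content seen so far */
  size_t len;
} sketch_buffer;

/* Toy stand-ins: "XML" succeeds only on an XML declaration,
 * "HTML" is lenient and accepts any non-empty input. */
static int sketch_toy_xml(const char *buf, size_t len)
{
  return (len >= 5 && !memcmp(buf, "<?xml", 5)) ? 0 : 1;
}

static int sketch_toy_html(const char *buf, size_t len)
{
  (void)buf;
  return len > 0 ? 0 : 1;
}

/* Save each chunk; only when is_end is set, try the XML parser and fall
 * back to the HTML parser if that fails.
 * Returns -1 on allocation failure, 0 if more content is expected,
 * 1 if parsed as XML, 2 if parsed as HTML, 3 if both attempts failed. */
static int sketch_parse_chunk(sketch_buffer *sb, const char *s, size_t len,
                              int is_end, sketch_parse_fn try_xml,
                              sketch_parse_fn try_html)
{
  int result;

  if(len) {
    char *nbuf = (char*)realloc(sb->buf, sb->len + len);
    if(!nbuf)
      return -1;
    memcpy(nbuf + sb->len, s, len);
    sb->buf = nbuf;
    sb->len += len;
  }
  if(!is_end)
    return 0;

  if(!try_xml(sb->buf, sb->len))
    result = 1;
  else if(!try_html(sb->buf, sb->len))
    result = 2;
  else
    result = 3;

  /* tidy the saved content at function exit */
  free(sb->buf);
  sb->buf = NULL;
  sb->len = 0;
  return result;
}
```

Deferring all parsing until the final chunk is what makes the fallback possible: the second parser needs the whole document, not just the tail that arrived after the first parser gave up.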
|
higher, very unlikely another parser is dealing with this.
|