path: root/src/raptor_grddl.c
Commit message | Author | Age | Files | Lines
* (raptor_grddl_run_recursive): Only set content type handler when recursive parser is grddl. | Dave Beckett | 2007-09-30 | 1 | -2/+4
* #ws | Dave Beckett | 2007-09-30 | 1 | -2/+3
* Replaced all calls to get parser's current base ID with raptor_parser_get_current_base_id | Dave Beckett | 2007-09-30 | 1 | -11/+13
* (raptor_grddl_parse_chunk): Remove #ifdef-out old <link> processing | Dave Beckett | 2007-09-30 | 1 | -22/+0
* (raptor_grddl_ensure_internal_parser): Re-init the guess parser each time so it does a fresh guess. | Dave Beckett | 2007-09-30 | 1 | -15/+74
  (raptor_grddl_run_grddl_transform_doc): Save and restore the genid around recursive parsers, so blank nodes are numbered across graphs.
  (raptor_grddl_run_recursive): Switch to parser_name, flags args. Pass on the filter to the internal parser call. Do not add parent if the parser is not grddl. Pass on the ignore error flag to raptor_grddl_fetch_uri. Save and restore the genid around recursive parsers, so blank nodes are numbered across graphs. Do not call rdfxml parser if selected parser is already rdfxml.
  Update raptor_grddl_run_recursive calls to use parser name and flags.
  Alter the <link> processing to use the guess parser to figure out the mime type during the recursion. Do not filter the triples.
  Fixes Issue#0000238 http://bugs.librdf.org/mantis/view.php?id=238
* (raptor_grddl_parse_chunk): Use RAPTOR_LIBXML_HTML_PARSE_NONET to decide whether to enable libxml HTML_PARSE_NONET with the html parser. | Dave Beckett | 2007-09-30 | 1 | -0/+4
* Add declaration for libxml_options | Dave Beckett | 2007-09-30 | 1 | -0/+2
* (raptor_grddl_parse_chunk): Use RAPTOR_LIBXML_XML_PARSE_NONET to set XML nonet option if it was set with raptor feature nonet. | Dave Beckett | 2007-09-30 | 1 | -0/+9
* (raptor_grddl_uri_xml_parse_bytes): Use RAPTOR_LIBXML_XML_PARSE_NONET to check for enum value XML_PARSE_NONET | Dave Beckett | 2007-09-30 | 1 | -1/+1
* (raptor_grddl_fetch_uri): Reject a URI with feature noNet only if it is not a file URI | Dave Beckett | 2007-09-30 | 1 | -2/+4
* Revert GRDDL to the main algorithm of around 12377 which passes the tests again and Fixes Issue#0000239 http://bugs.librdf.org/mantis/view.php?id=239 | Dave Beckett | 2007-09-24 | 1 | -81/+43
  (raptor_grddl_parser_add_parent): Restored.
  (raptor_grddl_copy_state): Removed.
  (raptor_grddl_new_child_parser): Removed.
  (raptor_grddl_run_recursive): Remove reference to the above - replacing raptor_grddl_new_child_parser with raptor_grddl_ensure_internal_parser and replacing 'nparser' references with grddl_parser->internal_parser.
* (raptor_grddl_discard_message): debug message tweak. | Dave Beckett | 2007-09-24 | 1 | -3/+2
* Remove RDFa support for now | Dave Beckett | 2007-09-16 | 1 | -60/+1
* GRDDL and RDFa | Dave Beckett | 2007-08-28 | 1 | -4/+9
* (raptor_grddl_fetch_uri): Set WWW timeout from value of new parser feature RAPTOR_FEATURE_WWW_TIMEOUT | Dave Beckett | 2007-08-26 | 1 | -0/+4
* struct raptor_grddl_parser_context_s gains html_link_processing to enable looking for <html> <link> with RDF/XML value. | Dave Beckett | 2007-07-09 | 1 | -2/+8
  (raptor_grddl_parse_init_common): Enable html <link> by default.
  (raptor_rdfa_parse_init): Disable html <link> for RDFA parser.
  (raptor_grddl_parse_chunk): Check for html <link> available as well as allowed by feature.
* Added RAPTOR_FEATURE_HTML_LINK to control GRDDL looking for html <link type="application/rdf+xml" href="uri"> | Dave Beckett | 2007-07-05 | 1 | -1/+1
* (grddl_free_xml_context): Free the context itself. | Dave Beckett | 2007-07-04 | 1 | -62/+136
  (raptor_grddl_parser_add_parent): Deleted, merged into raptor_grddl_new_child_parser.
  Delete html:link entry from table for now - handle rdf/xml links specially later.
  (raptor_grddl_copy_state): Added, pulled out of raptor_grddl_ensure_internal_parser.
  (raptor_grddl_ensure_internal_parser): Call raptor_grddl_copy_state.
  (raptor_grddl_new_child_parser): Added, from raptor_grddl_ensure_internal_parser and raptor_grddl_parser_add_parent to allocate a new parser rather than overwrite the 'internal_parser'.
  (raptor_grddl_fetch_uri): Set/reset the content type handler each time.
  (raptor_grddl_run_xpath_match): Free URI after calculating relative to base.
  (raptor_grddl_run_recursive): Gains filter arg, again. Use raptor_grddl_new_child_parser to make a new (GRDDL) raptor_parser* and free it here when done.
  (raptor_grddl_parse_chunk): Add new filter arg to raptor_grddl_run_recursive. Look for <link type="application/rdf+xml" href="URI" /> with RDF expected, not an XSLT transform URI.
* (raptor_grddl_discard_message): Report discarded errors when debugging. | Dave Beckett | 2007-07-03 | 1 | -64/+125
  (raptor_grddl_parse_chunk): Run XML then HTML parsing in sequence, discarding all errors here. Restore the error handlers afterwards. Move tidying up of buffers to function exit tidying.
* style | Dave Beckett | 2007-07-02 | 1 | -3/+3
* match-table gains: looking for <link type="application/rdf+xml" href="URI" /> | Dave Beckett | 2007-07-02 | 1 | -0/+10
* Add XSLT security | Dave Beckett | 2007-06-20 | 1 | -0/+29
  (raptor_init_parser_grddl_common): Deny reading, writing to files, creating directories or writing to network.
  (raptor_terminate_parser_grddl_common): Tidy up xslt security prefs.
* (raptor_grddl_run_grddl_transform_uri): Hack locator URI so errors with XSLT are reported against that URI, not the document's. | Dave Beckett | 2007-06-13 | 1 | -2/+8
* (raptor_grddl_fetch_uri): Fix accept header | Dave Beckett | 2007-06-13 | 1 | -1/+1
* Use /* for root element | Dave Beckett | 2007-06-13 | 1 | -1/+1
* XML @dataview:transformation are only on the root element. | Dave Beckett | 2007-06-13 | 1 | -1/+2
* (raptor_grddl_check_recursive_content_type_handler): Renamed from raptor_grddl_check_rdf_content_type_handler since it stores all content types now. Check for HTML content type and set html_base processing flag. | Dave Beckett | 2007-06-13 | 1 | -17/+26
  (raptor_grddl_run_recursive): Remove allow_rdf flag, always true.
* struct raptor_grddl_parser_context_s gains xinclude_processing and html_base_processing flags. | Dave Beckett | 2007-06-13 | 1 | -22/+50
  (raptor_grddl_parse_init_common): Initialise grddl, xinclude but not html base.
  (raptor_rdfa_parse_init): Disable grddl, xinclude and init html base.
  (raptor_grddl_run_xpath_match): If html_base_processing is enabled, switch XML doc type to XML_HTML for the xmlNodeGetBase() call and restore afterwards.
  (raptor_grddl_parse_chunk): Look for HTML or XHTML mime types to enable html_base_processing. Conditionalise XML Include processing with xinclude_processing flag.
* Debug message madness! | Dave Beckett | 2007-06-11 | 1 | -15/+38
  (raptor_grddl_parse_chunk): After xinclude processing, reserialize the document DOM so it can be parsed later as RDF/XML if needed.
* (raptor_grddl_parse_chunk): Recognise root rdf:RDF element and process as RDF/XML. Fix RDF/XML parsing of doc to not filter triples. | Dave Beckett | 2007-06-11 | 1 | -1/+10
* (raptor_grddl_run_recursive): Send to right parser. | Dave Beckett | 2007-06-10 | 1 | -1/+1
* (raptor_grddl_parse_uri_write_bytes): Removed. | Dave Beckett | 2007-06-10 | 1 | -22/+12
  (raptor_grddl_run_recursive): Use typedef raptor_parse_bytes_context with raptor_parse_uri_write_bytes as a handler for starting parse lazily
* (raptor_grddl_run_recursive): Zaps error handlers on recursive parse when ignore_errors set. | Dave Beckett | 2007-06-10 | 1 | -0/+7
* (raptor_grddl_fetch_uri): flags argument (was ignore_errors) can now send a different accept header. | Dave Beckett | 2007-06-10 | 1 | -8/+16
  (raptor_grddl_run_grddl_transform_uri): Call raptor_grddl_fetch_uri and expect XSLT.
  (raptor_grddl_run_recursive): Call raptor_grddl_fetch_uri and ignore errors.
* (raptor_grddl_discard_message): Added. | Dave Beckett | 2007-06-10 | 1 | -14/+33
  (raptor_grddl_fetch_uri): Added ignore_errors argument to set the raptor_www error handler to raptor_grddl_discard_message
  (raptor_grddl_run_grddl_transform_uri): Do not discard errors from raptor_grddl_fetch_uri call.
  (raptor_grddl_run_recursive): Added ignore_errors argument and use it to return 0 with no warnings, when errors happen.
  (raptor_grddl_parse_chunk): Run namespace URI recursive grddl while discarding errors. Run head profile URIs recursive grddl while discarding errors.
* (raptor_grddl_run_grddl_transform_doc): Pass in an xml context and use the base URI there rather than the parser's. | Dave Beckett | 2007-06-10 | 1 | -6/+12
  (raptor_grddl_run_grddl_transform_uri): Pass on the xml context to the above.
* Use XML base URI passed around with the grddl_xml_context. | Dave Beckett | 2007-06-10 | 1 | -52/+87
  (raptor_new_xml_context): Renamed from raptor_sequence_push_xml_context, moving sequence push to main code.
  (raptor_rdfa_parse_init): Push URI for RDFa in raptor_grddl_parse_start.
  (raptor_grddl_parse_start): Add XML context for RDFa here.
  (raptor_grddl_add_transform_xml_context): Renamed from raptor_grddl_add_transform_uri
  (raptor_grddl_run_grddl_transform_doc): If there is no parser name guessable, return.
  (raptor_grddl_run_grddl_transform_uri): Take a grddl_xml_context* arg instead of raptor_uri.
* Added grddl_xml_context structure. | Dave Beckett | 2007-06-07 | 1 | -25/+74
  Transform and profile URI raptor_sequences are now sequences of grddl_xml_context structures.
  (raptor_sequence_push_xml_context, grddl_free_xml_context): Added.
  (raptor_grddl_parse_init_common): No need for raptor_libxml_init_generic_error_handlers, raptor_new_sax2 does it. Initialise raptor_sequence with grddl_free_xml_context.
  (raptor_grddl_add_transform_uri): Use raptor_sequence_push_xml_context.
  (raptor_grddl_filter_triples): Use grddl_xml_context for profile_uri sequence.
  (raptor_grddl_run_xpath_match): Use grddl_xml_context for URI results.
  (raptor_grddl_parse_chunk): Use raptor_sequence_push_xml_context for former URI sequences.
* Do an additional RDF/XML parse of content that is found to be RDF/XML by mime type during recursive GRDDL, and an additional parse of the top level content too, if also found. | Dave Beckett | 2007-06-05 | 1 | -37/+53
* (raptor_grddl_parse_chunk): Use feature RAPTOR_FEATURE_MICROFORMATS to dis/enable checking for hardcoded microformats | Dave Beckett | 2007-06-05 | 1 | -0/+4
* Remove C++ comment | Dave Beckett | 2007-06-05 | 1 | -3/+3
* Added MATCH_LAST to stop searching for hardcoded sheets. | Dave Beckett | 2007-06-05 | 1 | -12/+26
  Add hReview sheet that if matches, stops looking for later microformats such as hCard.
  (raptor_grddl_parse_chunk): Use MATCH_LAST to stop looking for later hardcoded matches.
  (raptor_init_parser_grddl_common, raptor_terminate_parser_grddl_common): Added, called once for grddl or rdfa available.
* (raptor_grddl_add_transform_uri): Added, to add a transformation URI (XSLT) for a document, removing duplicate URIs. | Dave Beckett | 2007-06-04 | 1 | -6/+35
  (raptor_grddl_filter_triples, raptor_grddl_parse_chunk): Use raptor_grddl_add_transform_uri.
* Add RDFa parser | Dave Beckett | 2007-06-04 | 1 | -4/+84
* (raptor_grddl_parse_chunk): Use HTML_PARSE_RECOVER if available | Dave Beckett | 2007-05-15 | 1 | -2/+8
* Added parser feature RAPTOR_FEATURE_HTML_TAG_SOUP aka htmlTagSoup for use by GRDDL parser | Dave Beckett | 2007-05-15 | 1 | -1/+2
* debugmsg | Dave Beckett | 2007-05-15 | 1 | -1/+1
* Use HTML parser when XML parser fails, to create a DOM for GRDDLing from invalid/not-WF HTML content. | Dave Beckett | 2007-05-15 | 1 | -80/+155
  raptor_grddl_parser_context_s gains htmlParserCtxt as well as xmlParserCtxt and process_this_as_rdfxml GRDDL flag to know when to parse the content twice.
  (raptor_grddl_parse_terminate): Tidy htmlParserCtxt.
  Add MATCH_IS_HARDCODED to match_table just to make it clear.
  (match_table): Re-enable hcalendar and hcard as hardcoded XSLTs
  (raptor_grddl_run_xpath_match): Handle non-namespace elements. Handle MATCH_IS_HARDCODED and return on first match.
  (raptor_grddl_parse_chunk): Major change in structure - all content passed in is saved until is_end=1, then parsed with XML parser and if that fails, HTML parser. HTML parser is run with no errors or warnings.
* Update for error_handlers arrays. | Dave Beckett | 2007-04-25 | 1 | -2/+3
* (raptor_grddl_parser_register_factory): Register XHTML mime type higher, very unlikely another parser is dealing with this. | Dave Beckett | 2007-03-26 | 1 | -1/+1