delta/raptor.git - github.com: dajobe/raptor.git

	Commit message	Author	Age	Files	Lines
*	(raptor_grddl_run_recursive): Only set content type handler when	Dave Beckett	2007-09-30	1	-2/+4
\| \| \| \|	recursive parser is grddl.
*	#ws	Dave Beckett	2007-09-30	1	-2/+3
\|
*	Replaced all calls to get parser's current base ID with	Dave Beckett	2007-09-30	1	-11/+13
\| \| \| \|	raptor_parser_get_current_base_id
*	(raptor_grddl_parse_chunk): Remove #ifdef-out old <link> processing	Dave Beckett	2007-09-30	1	-22/+0
\|
*	(raptor_grddl_ensure_internal_parser): Re-init the guess parser each	Dave Beckett	2007-09-30	1	-15/+74
\|	time so it does a fresh guess.
\|	(raptor_grddl_run_grddl_transform_doc): Save and restore the genid around recursive parsers, so blank nodes are numbered across graphs.
\|	(raptor_grddl_run_recursive): Switch to parser_name, flags args. Pass on the filter to the internal parser call. Do not add parent if the parser is not grddl. Pass on the ignore error flag to raptor_grddl_fetch_uri. Save and restore the genid around recursive parsers, so blank nodes are numbered across graphs. Do not call rdfxml parser if selected parser is already rdfxml.
\|	Update raptor_grddl_run_recursive calls to use parser name and flags. Alter the <link> processing to use the guess parser to figure out the mime type during the recursion. Do not filter the triples.
\|	Fixes Issue#0000238 http://bugs.librdf.org/mantis/view.php?id=238
*	(raptor_grddl_parse_chunk): Use RAPTOR_LIBXML_HTML_PARSE_NONET to	Dave Beckett	2007-09-30	1	-0/+4
\| \| \| \| \|	decide whether to enable libxml HTML_PARSE_NONET with the html parser.
*	Add declaration for libxml_options	Dave Beckett	2007-09-30	1	-0/+2
\|
*	(raptor_grddl_parse_chunk): Use RAPTOR_LIBXML_XML_PARSE_NONET to set	Dave Beckett	2007-09-30	1	-0/+9
\| \| \| \|	XML nonet option if it was set with raptor feature nonet.
*	(raptor_grddl_uri_xml_parse_bytes): Use RAPTOR_LIBXML_XML_PARSE_NONET	Dave Beckett	2007-09-30	1	-1/+1
\| \| \| \|	to check for enum value XML_PARSE_NONET
*	(raptor_grddl_fetch_uri): Reject a URI with feature noNet only if it	Dave Beckett	2007-09-30	1	-2/+4
\| \| \| \|	is not a file URI
*	Revert GRDDL to the main algorithm of around 12377	Dave Beckett	2007-09-24	1	-81/+43
\|	which passes the tests again and fixes Issue#0000239 http://bugs.librdf.org/mantis/view.php?id=239
\|	(raptor_grddl_parser_add_parent): Restored.
\|	(raptor_grddl_copy_state): Removed.
\|	(raptor_grddl_new_child_parser): Removed.
\|	(raptor_grddl_run_recursive): Remove references to the above, replacing raptor_grddl_new_child_parser with raptor_grddl_ensure_internal_parser and replacing 'nparser' references with grddl_parser->internal_parser.
*	(raptor_grddl_discard_message): debug message tweak.	Dave Beckett	2007-09-24	1	-3/+2
\|
*	Remove RDFa support for now	Dave Beckett	2007-09-16	1	-60/+1
\|
*	GRDDL and RDFa	Dave Beckett	2007-08-28	1	-4/+9
\|
*	(raptor_grddl_fetch_uri): Set WWW timeout from value of new parser	Dave Beckett	2007-08-26	1	-0/+4
\| \| \| \|	feature RAPTOR_FEATURE_WWW_TIMEOUT
*	struct raptor_grddl_parser_context_s gains html_link_processing	Dave Beckett	2007-07-09	1	-2/+8
\|	to enable looking for <html> <link> with RDF/XML value.
\|	(raptor_grddl_parse_init_common): Enable html <link> by default.
\|	(raptor_rdfa_parse_init): Disable html <link> for RDFa parser.
\|	(raptor_grddl_parse_chunk): Check that html <link> is available as well as allowed by the feature.
*	Added RAPTOR_FEATURE_HTML_LINK to control GRDDL looking for html <link	Dave Beckett	2007-07-05	1	-1/+1
\| \| \| \|	type="application/rdf+xml" href="uri">
*	(grddl_free_xml_context): Free the context itself.	Dave Beckett	2007-07-04	1	-62/+136
\|	(raptor_grddl_parser_add_parent): Deleted, merged into raptor_grddl_new_child_parser.
\|	Delete html:link entry from table for now - handle rdf/xml links specially later.
\|	(raptor_grddl_copy_state): Added, pulled out of raptor_grddl_ensure_internal_parser.
\|	(raptor_grddl_ensure_internal_parser): Call raptor_grddl_copy_state.
\|	(raptor_grddl_new_child_parser): Added, from raptor_grddl_ensure_internal_parser and raptor_grddl_parser_add_parent to allocate a new parser rather than overwrite the 'internal_parser'.
\|	(raptor_grddl_fetch_uri): Set/reset the content type handler each time.
\|	(raptor_grddl_run_xpath_match): Free URI after calculating relative to base.
\|	(raptor_grddl_run_recursive): Gains filter arg, again. Use raptor_grddl_new_child_parser to make a new (GRDDL) raptor_parser* and free it here when done.
\|	(raptor_grddl_parse_chunk): Add new filter arg to raptor_grddl_run_recursive. Look for <link type="application/rdf+xml" href="URI" /> with RDF expected, not an XSLT transform URI.
*	(raptor_grddl_discard_message): Report discarded errors when debugging.	Dave Beckett	2007-07-03	1	-64/+125
\| \| \| \| \| \|	(raptor_grddl_parse_chunk): Run XML then HTML parsing in sequence, discarding all errors here. Restore the error handlers afterwards. Move tidying up of buffers to function exit tidying.
*	style	Dave Beckett	2007-07-02	1	-3/+3
\|
*	match-table gains: looking for <link type="application/rdf+xml"	Dave Beckett	2007-07-02	1	-0/+10
\| \| \| \|	href="URI" />
*	Add XSLT security	Dave Beckett	2007-06-20	1	-0/+29
\|	(raptor_init_parser_grddl_common): Deny reading, writing to files, creating directories or writing to network.
\|	(raptor_terminate_parser_grddl_common): Tidy up xslt security prefs.
*	(raptor_grddl_run_grddl_transform_uri): Hack locator URI so errors	Dave Beckett	2007-06-13	1	-2/+8
\| \| \| \|	with XSLT are reported against that URI, not the document's.
*	(raptor_grddl_fetch_uri): Fix accept header	Dave Beckett	2007-06-13	1	-1/+1
\|
*	Use /* for root element	Dave Beckett	2007-06-13	1	-1/+1
\|
*	XML @dataview:transformation are only on the root element.	Dave Beckett	2007-06-13	1	-1/+2
\|
*	(raptor_grddl_check_recursive_content_type_handler): Renamed from	Dave Beckett	2007-06-13	1	-17/+26
\|	raptor_grddl_check_rdf_content_type_handler since it stores all content types now. Check for HTML content type and set html_base processing flag.
\|	(raptor_grddl_run_recursive): Remove allow_rdf flag, always true.
*	struct raptor_grddl_parser_context_s gains xinclude_processing and	Dave Beckett	2007-06-13	1	-22/+50
\|	html_base_processing flags.
\|	(raptor_grddl_parse_init_common): Initialise grddl, xinclude but not html base.
\|	(raptor_rdfa_parse_init): Disable grddl, xinclude and init html base.
\|	(raptor_grddl_run_xpath_match): If html_base_processing is enabled, switch XML doc type to XML_HTML for the xmlNodeGetBase() call and restore afterwards.
\|	(raptor_grddl_parse_chunk): Look for HTML or XHTML mime types to enable html_base_processing. Conditionalise XML Include processing with xinclude_processing flag.
*	Debug message madness!	Dave Beckett	2007-06-11	1	-15/+38
\| \| \| \| \|	(raptor_grddl_parse_chunk): After xinclude processing, reserialize the document DOM so it can be parsed later as RDF/XML if needed.
*	(raptor_grddl_parse_chunk): Recognise root rdf:RDF element and	Dave Beckett	2007-06-11	1	-1/+10
\| \| \| \| \|	process as RDF/XML. Fix RDF/XML parsing of doc to not filter triples.
*	(raptor_grddl_run_recursive): Send to right parser.	Dave Beckett	2007-06-10	1	-1/+1
\|
*	(raptor_grddl_parse_uri_write_bytes): Removed.	Dave Beckett	2007-06-10	1	-22/+12
\| \| \| \| \|	(raptor_grddl_run_recursive): Use typedef raptor_parse_bytes_context with raptor_parse_uri_write_bytes as a handler for starting parse lazily
*	(raptor_grddl_run_recursive): Zaps error handlers on recursive parse	Dave Beckett	2007-06-10	1	-0/+7
\| \| \| \|	when ignore_errors set.
*	(raptor_grddl_fetch_uri): flags argument (was ignore_errors) can now	Dave Beckett	2007-06-10	1	-8/+16
\|	send a different accept header.
\|	(raptor_grddl_run_grddl_transform_uri): Call raptor_grddl_fetch_uri and expect XSLT.
\|	(raptor_grddl_run_recursive): Call raptor_grddl_fetch_uri and ignore errors.
*	(raptor_grddl_discard_message): Added.	Dave Beckett	2007-06-10	1	-14/+33
\|	(raptor_grddl_fetch_uri): Added ignore_errors argument to set the raptor_www error handler to raptor_grddl_discard_message.
\|	(raptor_grddl_run_grddl_transform_uri): Do not discard errors from raptor_grddl_fetch_uri call.
\|	(raptor_grddl_run_recursive): Added ignore_errors argument and use it to return 0 with no warnings, when errors happen.
\|	(raptor_grddl_parse_chunk): Run namespace URI recursive grddl while discarding errors. Run head profile URIs recursive grddl while discarding errors.
*	(raptor_grddl_run_grddl_transform_doc): Pass in an xml context and	Dave Beckett	2007-06-10	1	-6/+12
\|	use the base URI there rather than the parser's.
\|	(raptor_grddl_run_grddl_transform_uri): Pass on the xml context to the above.
*	Use XML base URI passed around with the grddl_xml_context.	Dave Beckett	2007-06-10	1	-52/+87
\|	(raptor_new_xml_context): Renamed from raptor_sequence_push_xml_context, moving sequence push to main code.
\|	(raptor_rdfa_parse_init): Push URI for RDFa in raptor_grddl_parse_start.
\|	(raptor_grddl_parse_start): Add XML context for RDFa here.
\|	(raptor_grddl_add_transform_xml_context): Renamed from raptor_grddl_add_transform_uri.
\|	(raptor_grddl_run_grddl_transform_doc): If there is no parser name guessable, return.
\|	(raptor_grddl_run_grddl_transform_uri): Take a grddl_xml_context* arg instead of raptor_uri.
*	Added grddl_xml_context structure.	Dave Beckett	2007-06-07	1	-25/+74
\|	Transform and profile URI raptor_sequences are now sequences of grddl_xml_context structures.
\|	(raptor_sequence_push_xml_context, grddl_free_xml_context): Added.
\|	(raptor_grddl_parse_init_common): No need for raptor_libxml_init_generic_error_handlers, raptor_new_sax2 does it. Initialise raptor_sequence with grddl_free_xml_context.
\|	(raptor_grddl_add_transform_uri): Use raptor_sequence_push_xml_context.
\|	(raptor_grddl_filter_triples): Use grddl_xml_context for profile_uri sequence.
\|	(raptor_grddl_run_xpath_match): Use grddl_xml_context for URI results.
\|	(raptor_grddl_parse_chunk): Use raptor_sequence_push_xml_context for former URI sequences.
*	Do an additional RDF/XML parse of content that is found to	Dave Beckett	2007-06-05	1	-37/+53
\| \| \| \| \|	be RDF/XML by mime type during recursive GRDDL, and an additional parse of the top level content too, if also found.
*	(raptor_grddl_parse_chunk): Use feature RAPTOR_FEATURE_MICROFORMATS	Dave Beckett	2007-06-05	1	-0/+4
\| \| \| \|	to dis/enable checking for hardcoded microformats
*	Remove C++ comment	Dave Beckett	2007-06-05	1	-3/+3
\|
*	Added MATCH_LAST to stop searching for hardcoded sheets.	Dave Beckett	2007-06-05	1	-12/+26
\|	Add hReview sheet that, if it matches, stops looking for later microformats such as hCard.
\|	(raptor_grddl_parse_chunk): Use MATCH_LAST to stop looking for later hardcoded matches.
\|	(raptor_init_parser_grddl_common, raptor_terminate_parser_grddl_common): Added, called once for grddl or rdfa available.
*	(raptor_grddl_add_transform_uri): Added, to add a transformation URI	Dave Beckett	2007-06-04	1	-6/+35
\|	(XSLT) for a document, removing duplicate URIs.
\|	(raptor_grddl_filter_triples, raptor_grddl_parse_chunk): Use raptor_grddl_add_transform_uri.
*	Add RDFa parser	Dave Beckett	2007-06-04	1	-4/+84
\|
*	(raptor_grddl_parse_chunk): Use HTML_PARSE_RECOVER if available	Dave Beckett	2007-05-15	1	-2/+8
\|
*	Added parser feature RAPTOR_FEATURE_HTML_TAG_SOUP aka htmlTagSoup for use by	Dave Beckett	2007-05-15	1	-1/+2
\| \| \| \|	GRDDL parser
*	debugmsg	Dave Beckett	2007-05-15	1	-1/+1
\|
*	Use HTML parser when XML parser fails, to create a DOM for GRDDLing	Dave Beckett	2007-05-15	1	-80/+155
\|	from invalid/not-WF HTML content.
\|	raptor_grddl_parser_context_s gains htmlParserCtxt as well as xmlParserCtxt and process_this_as_rdfxml GRDDL flag to know when to parse the content twice.
\|	(raptor_grddl_parse_terminate): Tidy htmlParserCtxt.
\|	Add MATCH_IS_HARDCODED to match_table just to make it clear.
\|	(match_table): Re-enable hcalendar and hcard as hardcoded XSLTs.
\|	(raptor_grddl_run_xpath_match): Handle non-namespace elements. Handle MATCH_IS_HARDCODED and return on first match.
\|	(raptor_grddl_parse_chunk): Major change in structure - all content passed in is saved until is_end=1, then parsed with XML parser and if that fails, HTML parser. HTML parser is run with no errors or warnings.
*	Update for error_handlers arrays.	Dave Beckett	2007-04-25	1	-2/+3
\|
*	(raptor_grddl_parser_register_factory): Register XHTML mime type	Dave Beckett	2007-03-26	1	-1/+1
\| \| \| \|	higher, very unlikely another parser is dealing with this.