A quick test of tokenizing:

    >>> from lxml.cssselect import tokenize, parse
    >>> def ptok(s):
    ...     for item in tokenize(s):
    ...         print(repr(item).replace("u'", "'"))
    >>> ptok('E > f[a~="y\\"x"]')
    Symbol('E', 0)
    Token('>', 2)
    Symbol('f', 4)
    Token('[', 5)
    Symbol('a', 6)
    Token('~=', 7)
    String('y"x', 9)
    Token(']', 15)

Then of parsing:

    >>> parse('td.foo, .bar')
    Or([Class[Element[td].foo], CombinedSelector[Element[*] <followed> Class[Element[*].bar]]])
    >>> parse('div, td.foo, div.bar span')
    Or([Element[div], Class[Element[td].foo], CombinedSelector[Class[Element[div].bar] <followed> Element[span]]])
    >>> parse('div > p')
    CombinedSelector[Element[div] > Element[p]]
    >>> parse('td:first')
    Pseudo[Element[td]:first]
    >>> parse('a[name]')
    Attrib[Element[a][name]]
    >>> parse('a [name]')
    CombinedSelector[Element[a] <followed> Attrib[Element[*][name]]]
    >>> repr(parse('a[rel="include"]')).replace("u'", "'")
    "Attrib[Element[a][rel = String('include', 6)]]"
    >>> repr(parse('a[hreflang |= \'en\']')).replace("u'", "'")
    "Attrib[Element[a][hreflang |= String('en', 14)]]"
    >>> parse('div:nth-child(10)')
    Function[Element[div]:nth-child(10)]
    >>> parse('div:nth-of-type(10)')
    Function[Element[div]:nth-of-type(10)]
    >>> parse('div div:nth-of-type(10) .aclass')
    CombinedSelector[CombinedSelector[Element[div] <followed> Function[Element[div]:nth-of-type(10)]] <followed> Class[Element[*].aclass]]
    >>> parse('label:only')
    Pseudo[Element[label]:only]
    >>> parse('a:lang(fr)')
    Function[Element[a]:lang(Element[fr])]
    >>> repr(parse('div:contains("foo")')).replace("u'", "'")
    "Function[Element[div]:contains(String('foo', 13))]"
    >>> parse('div#foobar')
    Hash[Element[div]#foobar]
    >>> parse('div:not(div.foo)')
    Function[Element[div]:not(Class[Element[div].foo])]
    >>> parse('td ~ th')
    CombinedSelector[Element[td] ~ Element[th]]

Some parse error tests:

    >>> try: parse('attributes(href)/html/body/a')
    ... except: # Py2, Py3, ...
    ...     import sys
    ...     print(str(sys.exc_info()[1]).replace("(u'", "('"))
    Expected selector, got '(' at [Symbol('attributes', 0)] -> Token('(', 10)

Now of translation:

    >>> def xpath(css):
    ...     print(parse(css).xpath())
    >>> xpath('*')
    *
    >>> xpath('E')
    e
    >>> xpath('E[foo]')
    e[@foo]
    >>> xpath('E[foo="bar"]')
    e[@foo = 'bar']
    >>> xpath('E[foo~="bar"]')
    e[contains(concat(' ', normalize-space(@foo), ' '), ' bar ')]
    >>> xpath('E[foo^="bar"]')
    e[starts-with(@foo, 'bar')]
    >>> xpath('E[foo$="bar"]')
    e[substring(@foo, string-length(@foo)-2) = 'bar']
    >>> xpath('E[foo*="bar"]')
    e[contains(@foo, 'bar')]
    >>> xpath('E[hreflang|="en"]')
    e[@hreflang = 'en' or starts-with(@hreflang, 'en-')]
    >>> #xpath('E:root')
    >>> xpath('E:nth-child(1)')
    */*[name() = 'e' and (position() = 1)]
    >>> xpath('E:nth-last-child(1)')
    */*[name() = 'e' and (position() = last() - 1)]
    >>> xpath('E:nth-last-child(2n+2)')
    */*[name() = 'e' and ((position() +2) mod -2 = 0 and position() < (last() -2))]
    >>> xpath('E:nth-of-type(1)')
    */e[position() = 1]
    >>> xpath('E:nth-last-of-type(1)')
    */e[position() = last() - 1]
    >>> xpath('E:nth-last-of-type(1)')
    */e[position() = last() - 1]
    >>> xpath('div E:nth-last-of-type(1) .aclass')
    div/descendant::e[position() = last() - 1]/descendant::*[contains(concat(' ', normalize-space(@class), ' '), ' aclass ')]
    >>> xpath('E:first-child')
    */*[name() = 'e' and (position() = 1)]
    >>> xpath('E:last-child')
    */*[name() = 'e' and (position() = last())]
    >>> xpath('E:first-of-type')
    */e[position() = 1]
    >>> xpath('E:last-of-type')
    */e[position() = last()]
    >>> xpath('E:only-child')
    */*[name() = 'e' and (last() = 1)]
    >>> xpath('E:only-of-type')
    e[last() = 1]
    >>> xpath('E:empty')
    e[not(*) and not(normalize-space())]
    >>> xpath('E:contains("foo")')
    e[contains(css:lower-case(string(.)), 'foo')]
    >>> xpath('E.warning')
    e[contains(concat(' ', normalize-space(@class), ' '), ' warning ')]
    >>> xpath('E#myid')
    e[@id = 'myid']
    >>> xpath('E:not(:contains("foo"))')
    e[not(contains(css:lower-case(string(.)), 'foo'))]
    >>> xpath('E F')
    e/descendant::f
    >>> xpath('E > F')
    e/f
    >>> xpath('E + F')
    e/following-sibling::*[name() = 'f' and (position() = 1)]
    >>> xpath('E ~ F')
    e/following-sibling::f
    >>> xpath('div#container p')
    div[@id = 'container']/descendant::p
    >>> xpath('p *:only-of-type')
    Traceback (most recent call last):
        ...
    NotImplementedError: *:only-of-type is not implemented

Now a Unicode character test:

    >>> from lxml.cssselect import css_to_xpath
    >>> import sys
    >>> if sys.version_info[0] >= 3:
    ...     css_expr = '.a\xc1b'
    ... else:
    ...     css_expr = '.a\xc1b'.decode('ISO-8859-1')

    >>> xpath_expr = css_to_xpath(css_expr)
    >>> print( css_expr[1:] in xpath_expr )
    True
    >>> print( xpath_expr.encode('ascii', 'xmlcharrefreplace').decode('ASCII') )
    descendant-or-self::*[contains(concat(' ', normalize-space(@class), ' '), ' a&#193;b ')]

And some special character tests:

    >>> print( css_to_xpath('*[aval="\'"]') )
    descendant-or-self::*[@aval = "'"]
    >>> print( css_to_xpath('*[aval="\'\'\'"]') )
    descendant-or-self::*[@aval = "'''"]
    >>> print( css_to_xpath('*[aval=\'"\']') )
    descendant-or-self::*[@aval = '"']
    >>> print( css_to_xpath('*[aval=\'"""\']') )
    descendant-or-self::*[@aval = '"""']

Some Unicode escape tests (including the trailing whitespace rules):

    >>> print( css_to_xpath(r'*[aval="\'\22\'"]') )     # \22 == '"'
    descendant-or-self::*[@aval = concat("'",'"',"'")]
    >>> print( css_to_xpath(r'*[aval="\'\22 2\'"]') )
    descendant-or-self::*[@aval = concat("'",'"2',"'")]
    >>> print( css_to_xpath(r'*[aval="\'\20 \'"]') )    # \20 == ' '
    descendant-or-self::*[@aval = "' '"]
    >>> print( css_to_xpath('*[aval="\'\\20\r\n \'"]') )
    descendant-or-self::*[@aval = "' '"]

Then some tests for parse_series:

    >>> from lxml.cssselect import parse_series
    >>> parse_series('1n+3')
    (1, 3)
    >>> parse_series('n-5')
    (1, -5)
    >>> parse_series('odd')
    (2, 1)
    >>> parse_series('3n')
    (3, 0)
    >>> parse_series('n')
    (1, 0)
    >>> parse_series('5')
    (0, 5)
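
For readers who want to see how an "an+b" expression could map to the (a, b) pairs in the parse_series examples above, here is a minimal stand-alone sketch. The name parse_series_sketch and its regular expression are hypothetical and are not taken from lxml; the function only aims to reproduce the results listed in those examples. Its handling of 'even' is an assumption based on the CSS definition (even is 2n+0), since the tests above do not cover it.

```python
import re

# Hypothetical helper, NOT lxml's implementation: it only mirrors the
# (a, b) results shown in the parse_series examples of this file.
_SERIES_RE = re.compile(r'^([+-]?\d*)n([+-]\d+)?$')


def parse_series_sketch(s):
    """Parse a CSS 'an+b' expression into an (a, b) tuple of ints."""
    s = s.strip().lower().replace(' ', '')
    if s == 'odd':
        return (2, 1)
    if s == 'even':
        # Assumption: follows the CSS definition, even == 2n+0.
        return (2, 0)
    match = _SERIES_RE.match(s)
    if match is None:
        # No 'n' term: a plain integer means a == 0.
        # int() raises ValueError for anything that is not a number.
        return (0, int(s))
    a_text, b_text = match.groups()
    if a_text in ('', '+'):
        a = 1          # bare 'n' or '+n'
    elif a_text == '-':
        a = -1         # '-n'
    else:
        a = int(a_text)
    b = int(b_text) if b_text else 0
    return (a, b)
```

Calling parse_series_sketch('n-5') takes the regular-expression branch with an empty coefficient, so a defaults to 1 and b is read from '-5'. An input without 'n', such as '5', falls through to the plain-integer branch.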