lxml / pyquery: an XHTML gotcha when parsing Bing pages

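No idea who invented XHTML. What a pain. Microsoft's Bing search results page is freaking XHTML: it declares xmlns="http://www.w3.org/1999/xhtml", so once lxml has parsed it, the elements end up in that namespace and plain CSS selectors such as 'li' match nothing. The fix: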

import pyquery
pyquery.PyQuery('http://global.bing.com/search?mkt=en-US&q=test').xhtml_to_html()('li')
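
To see why the selector comes back empty, here is a minimal sketch using only the standard library's xml.etree.ElementTree, which stores namespaced tags the same way lxml does ({namespace}tag). The SAMPLE string and the strip_xhtml_namespace helper are made up for illustration: the first is a stand-in for a real Bing page, and the second is only my approximation of what xhtml_to_html() does to the tree, not pyquery's actual code.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical miniature page standing in for a Bing results page.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><ul><li>first</li><li>second</li></ul></body>"
    "</html>"
)


def strip_xhtml_namespace(root):
    """Drop the XHTML namespace prefix from every element tag, in place."""
    prefix = "{%s}" % XHTML_NS
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            el.tag = el.tag[len(prefix):]
    return root


root = ET.fromstring(SAMPLE)
print(root.tag)                  # the tag carries the namespace in braces

before = root.findall(".//li")   # a plain "li" does not match namespaced tags
strip_xhtml_namespace(root)
after = root.findall(".//li")    # the same query now finds the items

print(len(before), len(after))
```

Once the namespace prefix is gone, the same plain 'li' lookup finds both items, which is the effect the one-liner above gets on the real page.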

References:

https://bitbucket.org/olauzanne/pyquery/issues/10/pyquery-fails-without-errors-when

https://bitbucket.org/olauzanne/pyquery/issues/17/pyquery-fails-when-trying-to-query-a

https://bitbucket.org/olauzanne/pyquery/issues/45/pyquery-fails-to-work-for-the-following
