<dt><a name="313"></a>ADHE 313 (6) <b>Organization of Adult Basic Education Programs</b></dt>
I want to extract "ADHE 313" and "Organization of Adult Basic Education Programs".

1 asj May 28, 2015 Shouldn't this be done with a CSS/jQuery selector, or XPath? |
2 phx13ye May 28, 2015 <\/a>(.*)<b>(.*?)<\/b> |
3 sicongliu OP XPath is simpler, but I'd like to learn how to do it with regex. |
4 shoumu May 28, 2015 Take a look at pyquery; it supports jQuery syntax. |
5 professorz May 28, 2015 .+<\\/a>(.+)(6)<b>(.+)<\\/b>.+ (a regex for Java) |
6 sicongliu OP How would it be written in Python? |
7 yiyiwa May 28, 2015 I tested this in Python; it's imperfect and also returns some empty matches. '\>([^\<]*)\<' |
8 sicongliu OP m=re.search("</a>(.*?)\s",text) print (m.group(1)) m=re.search("<b>(.*?)</b>",text) print (m.group(1)) |
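9 sicongliu OP What if I want "ADHE 313"? How do I match up to the second space? Of course this is easy with string search and slicing; I just want to know how to do it with regex. |
10 sicongliu OP m=re.search("</a>(.*?)\s+\(",text) print (m.group(1)) Admittedly this is a clumsy approach: it fails if the second space is not followed by "(". |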
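The limitation noted in reply 10 comes from anchoring on the literal "(". One possible alternative, offered here only as a sketch and not something proposed in the thread, is to match two whitespace-separated tokens directly after `</a>`. The names `code_pattern` and `extract_code` are made up for illustration, and the pattern assumes the course code always has exactly two parts (a subject and a number).

```python
import re

text = ('<dt><a name="313"></a>ADHE 313 (6) '
        '<b>Organization of Adult Basic Education Programs</b></dt>')

# Two runs of non-whitespace characters separated by whitespace,
# starting immediately after the closing </a> tag.
code_pattern = re.compile(r"</a>(\S+\s+\S+)")

def extract_code(html):
    """Return the first two tokens after </a>, or None if there is no match."""
    m = code_pattern.search(html)
    return m.group(1) if m else None

print(extract_code(text))
```

Because `\S+` stops at the second space, the pattern does not care what follows the number.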
11 asj May 28, 2015 |
12 fy May 28, 2015 This task is much simpler without regex: page.xpath("//dt/text()") -> ADHE 313 (6) page.xpath("//dt/b/text()") -> Organization of Adult Basic Education Programs |
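The `page.xpath(...)` calls in reply 12 presumably rely on a third-party parser such as lxml. As a rough standard-library approximation of the same parser-based idea, the sketch below uses `xml.etree.ElementTree`. It only works because this particular fragment happens to be well-formed XML; real-world HTML often is not. The function name `parse_course` is hypothetical.

```python
import xml.etree.ElementTree as ET

fragment = ('<dt><a name="313"></a>ADHE 313 (6) '
            '<b>Organization of Adult Basic Education Programs</b></dt>')

def parse_course(fragment):
    """Return (code, title) from a single well-formed <dt> fragment."""
    dt = ET.fromstring(fragment)
    # Text that follows the empty <a> element is stored as that element's "tail".
    code_and_credits = (dt.find("a").tail or "").strip()
    title = (dt.find("b").text or "").strip()
    # Drop the trailing "(6)" credits token, if one is present.
    code = code_and_credits.rsplit(" (", 1)[0]
    return code, title

print(parse_course(fragment))
```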
13 picasso250 May 28, 2015 /a>([\w ()]+)<b>([\w ]+)</b> This is the simplest solution to your current problem. |
14 picasso250 May 28, 2015 Sorry, the previous one was wrong; it also captured "(6)". /a>(\w+ \d+).+?<b>([\w ]+)</b> |
15 leozy2014 May 28, 2015 print re.findall('</a>(.*?) \(6\) <b>(.*?)</b></dt>', s) #[('ADHE 313', 'Organization of Adult Basic Education Programs')] |
16 wmttom May 28, 2015 Python regex: (?<=>)[\w, ,\(,\)]+?(?= \(|<) re.findall("(?<=>)[\w, ,\(,\)]+?(?= \(|<)", '<dt><a name="313"></a>ADHE 313 (6) <b>Organization of Adult Basic Education Programs</b></dt>') ['ADHE 313', 'Organization of Adult Basic Education Programs'] |
17 sicongliu OP Neither of the two replies above seems to work. |
18 sicongliu OP Sorry, this one does work: print re.findall('</a>(.*?) \(6\) <b>(.*?)</b></dt>', s) #[('ADHE 313', 'Organization of Adult Basic Education Programs')] |
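The accepted pattern hard-codes `\(6\)`, so it would presumably miss entries with a different credit value. A hedged generalization is to replace the literal 6 with `\d+`. In the sketch below the second `<dt>` entry is entirely made up to exercise that case, and `course_pattern` and `extract_courses` are hypothetical names.

```python
import re

s = (
    '<dt><a name="313"></a>ADHE 313 (6) '
    '<b>Organization of Adult Basic Education Programs</b></dt>\n'
    # Hypothetical second entry, invented only to test a different credit value.
    '<dt><a name="999"></a>XXXX 999 (3) <b>Placeholder Course Title</b></dt>'
)

# Same shape as the pattern from reply 15, but accepting any integer credit value.
course_pattern = re.compile(r"</a>(.*?) \(\d+\) <b>(.*?)</b></dt>")

def extract_courses(html):
    """Return a list of (code, title) tuples, one per matching <dt> entry."""
    return course_pattern.findall(html)

for code, title in extract_courses(s):
    print(code, "|", title)
```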