1 INT21H 2012-06-14 19:47:46 +08:00
>>> from BeautifulSoup import BeautifulSoup
>>> html="""<html>
... <head>
... <title>Test</title>
... </head>
... <body>
... <p>输出我</p>
... <p>我来捣乱</p>
... </body>
... </html>"""
>>> bs = BeautifulSoup(html)
>>> bs.p
<p>输出我</p>
>>> bs.p.contents
[u'\u8f93\u51fa\u6211']
>>>
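2 vfasky 2012-06-14 20:56:33 +08:00
<code>
html = '''<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>'''
for t in html.split('</p>'):
    # Keep only the text after the opening <p>; replace('<p>','') alone
    # would also print the <html>/<head> markup that precedes it.
    print t.split('<p>')[-1]
    break
</code>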
3 vfasky 2012-06-14 20:58:41 +08:00
4 muzuiget 2012-06-14 21:03:12 +08:00
Keywords: regular expressions, DOM.
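
As a rough illustration of the DOM route, here is a minimal Python 3 sketch using the standard library's `xml.dom.minidom`. It assumes the markup is well-formed XML, which happens to hold for the sample in this thread but often does not for real-world HTML. The helper name `first_p_text` is made up for this example.

```python
from xml.dom import minidom

# Sample markup from the thread; assumed to be well-formed XML.
html = """<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>"""


def first_p_text(markup):
    """Return the text of the first <p> element, or None if there is none."""
    doc = minidom.parseString(markup)
    paragraphs = doc.getElementsByTagName("p")
    if not paragraphs:
        return None
    # Join the direct text children of the first <p>.
    return "".join(
        node.data
        for node in paragraphs[0].childNodes
        if node.nodeType == node.TEXT_NODE
    )


first = first_p_text(html)
print(first)
```

For markup that is not valid XML, a forgiving HTML parser such as BeautifulSoup is the safer choice.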
5 goofansu 2012-06-14 21:05:13 +08:00
I've been playing with this lately too. BeautifulSoup is great.
6 yibin001 2012-06-14 21:16:34 +08:00
BeautifulSoup really is a killer tool.
7 likuku 2012-06-14 21:29:06 +08:00
#!/usr/bin/env python
# encoding: utf-8
"""
html.py

Created by likuku on 2012-06-14.
Copyright (c) 2012 __MyCompanyName__. All rights reserved.
"""

import sys
import os

html="""
<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>
"""

def main():
    for text in html.split('\n'):
        if text.find('<p>') != -1:
            tmp = text.replace('</p>','').replace('<p>','')
            print tmp
            break

if __name__ == '__main__':
    main()
8 aa88kk 2012-06-14 21:51:15 +08:00
Use a regex: m = re.search(r'<p>(.*?)</p>', s, re.S)
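
A slightly fuller Python 3 sketch of the regex approach, for reference. The sample string and the helper name `first_p_regex` are assumptions for illustration. The pattern only matches a bare `<p>` tag with no attributes, and the non-greedy `(.*?)` together with `re.S` lets the match span line breaks while still stopping at the first `</p>`.

```python
import re

# Sample markup from the thread, bound to the name `s` used in the reply.
s = """<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>"""


def first_p_regex(text):
    """Return the contents of the first <p>...</p> pair, or None if absent."""
    m = re.search(r"<p>(.*?)</p>", text, re.S)
    # group(1) is the text captured between the two tags.
    return m.group(1) if m else None


print(first_p_regex(s))
```

Checking `m` before calling `group` matters because `re.search` returns `None` when nothing matches.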
9 cute 2012-06-14 21:57:50 +08:00
start = s.find('<p>')+ len('<p>')
end = s.find('</p>', start)
print s[start:end]
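
One caveat with plain `find` slicing: `find` returns -1 when the tag is missing, so the slice silently yields the wrong text. Below is a hedged Python 3 sketch of the same idea with that case guarded. The function name `first_between` and its default arguments are invented for this example.

```python
def first_between(s, open_tag="<p>", close_tag="</p>"):
    """Return the text between the first open_tag and the next close_tag.

    Returns None if either tag cannot be found.
    """
    start = s.find(open_tag)
    if start == -1:
        return None
    # Move past the opening tag so the slice starts at the content.
    start += len(open_tag)
    end = s.find(close_tag, start)
    if end == -1:
        return None
    return s[start:end]


sample = "<body>\n<p>输出我</p>\n<p>我来捣乱</p>\n</body>"
print(first_between(sample))
```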
10 ihciah OP
Thanks, everyone!!
11 ling0322 2012-06-18 21:46:38 +08:00
Actually, there's something even more badass than BeautifulSoup: it's called PyQuery.
12 binux 2012-06-18 21:49:04 +08:00
BeautifulSoup uses too much memory.
13 chairo 2012-06-18 22:25:00 +08:00
Just dropping by to mention libxml.