[Crawler] PHP with ajax, file_get_contents, and Snoopy all fail to fetch the dynamic page of the 壹心理 FM station - V2EX
    bosshida 2014-12-22 23:05:46 +08:00 · 7286 views
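    I'm trying to scrape information from 壹心理's FM pages, for example http://fm.xinli001.com/#4916186. Using Firebug, I found that after the page loads it sends a request to
    http://fm.xinli001.com/broadcast/?pk=4916186&t=1419258885474 to fetch the basic information for the current FM broadcast. When I request that URL with ajax, it returns 403 FORBIDDEN.
    Firefox also reports: "Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://fm.xinli001.com/broadcast/?pk=12139723&t=1419253731488. This can be fixed by moving the resource to the same domain or enabling CORS."
    JS code: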
    $.ajax({
        type: "get",
        dataType: "json",
        url: "http://fm.xinli001.com/broadcast/?pk=4916186&t=1419258885474",
        success: function (data) { alert('ok'); },
        timeout: 30000,
        error: function (XMLHttpRequest, textStatus, errorThrown) {
            alert('error');
        }
    });

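    After some googling I found a suggestion to add jQuery.support.cors = true;, but that didn't work either.
    Adding the relevant request headers didn't help either. I'm out of ideas.
    Is there any way to get the FM's basic information?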
    13 replies    2014-12-24 11:02:21 +08:00
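    #1 Jat001 2014-12-22 23:18:32 +08:00
    Send these headers:
    X-Requested-With: XMLHttpRequest
    Referer: http://fm.xinli001.com/
    Writing a crawler means imitating a browser. Look at which headers the browser sends, then remove them one at a time until the request fails; that tells you which headers are required.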
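The "remove headers one by one until it breaks" advice can be sketched as a small routine. This is only an illustration, not anyone's actual crawler: `findRequiredHeaders` and `probe` are hypothetical names, and `probe(headers)` is assumed to send the request with the given headers and resolve to `true` when the server accepts it.

```javascript
// Greedy one-pass elimination: try the request without each header in turn.
// If the request still succeeds, that header was not needed and is dropped for good.
// `probe` is a hypothetical async function supplied by the caller.
async function findRequiredHeaders(allHeaders, probe) {
  const required = { ...allHeaders };
  for (const name of Object.keys(allHeaders)) {
    const trial = { ...required };
    delete trial[name];
    if (await probe(trial)) {
      delete required[name]; // still accepted without it
    }
  }
  return required;
}

// Stand-in probe for demonstration: pretends the server insists on the two
// headers named in reply #1. A real probe would perform the HTTP request.
const fakeProbe = async (headers) =>
  "Referer" in headers && "X-Requested-With" in headers;

const browserHeaders = {
  "Accept": "*/*",
  "Referer": "http://fm.xinli001.com/",
  "User-Agent": "Mozilla/5.0",
  "X-Requested-With": "XMLHttpRequest",
};

const demo = findRequiredHeaders(browserHeaders, fakeProbe);
demo.then((required) => console.log(Object.keys(required)));
```

One pass is enough when headers are checked independently; if the server only rejects certain combinations, the result may not be minimal.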
    #2 fising 2014-12-22 23:38:37 +08:00 via iPad
    The ajax request is cross-origin, so the browser blocked it.
    #3 bosshida (OP) 2014-12-23 00:13:00 +08:00 via Android
    @fising Is there any way to solve it?
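    #4 Jat001 2014-12-23 00:37:07 +08:00
    @bosshida Either their server sets the Access-Control-Allow-Origin header, which you certainly don't have permission to do, or you use something like userscripts.
    Personally I think this kind of request is best handled on the server side.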
    #5 esile 2014-12-23 01:52:08 +08:00
    Setting Referer and X-Requested-With is enough to fetch it successfully.

    Here is the response from my test:
    {"code": 0, "data": {"favnum": 398, "commentnum": 120, "speaker_id": 108, "is_home": true, "background": "http://image.xinli001.com/20141220/18083879570a3ec9b9a360.jpg", "speak_url": "http://www.xinli001.com/user/742450/", "duration": 1283, "tags": [], "weight": 397, "title1": "", "_cache_key": "data_fm_broadcast_4916186", "rticle": null, "specials": [], "_id": "54954aea4f670ade3e8b4a1b", "range": 20535196, "word": "\u6625\u6653", "speakers_id": [], "lizhi_url": "", "created": "2014-12-20 18:01", "word_url": "http://www.xinli001.com/user/article/3866918/", "speak": "\u5cf0_\u5c0f\u5cf0", "id": 4916186, "is_teacher": false, "message_url": "", "cover": "http://image.xinli001.com/20141220/18094254011b53336c1227.jpg", "title": "\u6211\u548c\u90b5\u6bdb\u6bdb\u7684\u65e5\u4e0e\u591c", "url": "http://image.kaolafm.net/mz/audios/201412/a59b5e60-e515-4804-88f5-64f167aa957e.mp3", "absolute_url": "http://fm.xinli001.com/4916186/", "content": "\u4e0d\u8bba\u751f\u6d3b\u5728\u54ea\u91cc\uff0c\u53ea\u8981\u5728\u4e00\u8d77\u5c31\u597d\u4e86\u3002\u6211\u4eec\u5728\u83dc\u5e02\u573a\u4e70\u83dc\uff0c\u5728\u623f\u95f4\u91cc\u505a\u996d\uff0c\u996d\u540e\u6cbf\u7740\u8857\u8fb9\u6563\u6b65\uff0c\u4e00\u8d77\u770b\u592a\u9633\u5347\u8d77\uff0c\u592a\u9633\u843d\u4e0b\uff0c\u8fd9\u6837\u5c31\u8db3\u591f\u4e86\u3002", "url1": ""}}
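For readers who want to use that response, here is a rough sketch of pulling a few fields out of it. The sample string is an abridged copy of the JSON above. Treating `code: 0` as "success" and `speak` as the speaker's display name are assumptions based on the field names, and `summarizeBroadcast` is a hypothetical helper.

```javascript
// Abridged copy of the response quoted in reply #5 (only a few fields kept).
// String.raw keeps the \uXXXX escapes literal so JSON.parse decodes them.
const raw = String.raw`{"code": 0, "data": {"id": 4916186, "duration": 1283, "speak": "\u5cf0_\u5c0f\u5cf0", "title": "\u6211\u548c\u90b5\u6bdb\u6bdb\u7684\u65e5\u4e0e\u591c", "url": "http://image.kaolafm.net/mz/audios/201412/a59b5e60-e515-4804-88f5-64f167aa957e.mp3", "absolute_url": "http://fm.xinli001.com/4916186/"}}`;

// Hypothetical helper: parse the payload and keep the fields a crawler
// would most likely store.
function summarizeBroadcast(rawJson) {
  const payload = JSON.parse(rawJson);
  if (payload.code !== 0) {
    // Assumption: a non-zero code signals an error.
    throw new Error("unexpected response code: " + payload.code);
  }
  const d = payload.data;
  return {
    id: d.id,
    title: d.title,       // \uXXXX escapes are already decoded by JSON.parse
    speaker: d.speak,
    duration: d.duration, // unit not documented in the thread
    audioUrl: d.url,
    pageUrl: d.absolute_url,
  };
}

const summary = summarizeBroadcast(raw);
console.log(summary);
```

No manual unescaping is needed: `JSON.parse` turns the `\uXXXX` sequences into the original Chinese characters.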
    #6 bosshida (OP) 2014-12-23 10:17:52 +08:00
    @Jat001 I've added every header I can, but it still doesn't work. I went through the headers Firefox sends and added them one by one, and it still returns 403 FORBIDDEN.

    <!DOCTYPE html>
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=">
    <script src="./jquery-2.0.0.min.js"></script>

    <script type="text/javascript">
    function test() {
        $.ajax({
            type: "get",
            url: "http://fm.xinli001.com/broadcast/",
            dataType: "json",
            data: "pk=97701348&t=1419296643104",
            headers: {
                "Referer": "http://fm.xinli001.com/",
                "X-Requested-With": "XMLHttpRequest",
                "Accept": "*/*",
                "Accept-Encoding": "gzip, deflate",
                "Accept-Language": "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
                "Connection": "keep-alive",
                "Host": "fm.xinli001.com",
                "User-Agent": "Mozilla/5.0 (Windows NT 5.1; rv:34.0) Gecko/20100101 Firefox/34.0"
            },
            success: function (json) {
                alert('ok');
            },
            error: function () {
                alert('fail');
            }
        });
    }
    </script>

    <title>parseFm</title>
    </head>
    <body>
    <input type="button" value="test" onclick="test();">
    </body>
    </html>
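One likely reason the attempt above still fails: page scripts are not allowed to set certain request headers, and the browser drops them no matter what is passed in `headers`. Referer and Host are among them. The sketch below sorts the attempted headers into "settable" and "ignored"; the list is abridged from the Fetch standard's forbidden request-header names, and `splitHeaders` is a hypothetical helper. Browsers have differed on User-Agent over time, so it is left off the list here.

```javascript
// Abridged list of request headers that page scripts cannot set
// (after the Fetch standard; names are compared case-insensitively).
const FORBIDDEN = new Set([
  "accept-charset", "accept-encoding", "connection", "content-length",
  "cookie", "date", "host", "origin", "referer", "te", "upgrade", "via",
]);

// Hypothetical helper: separate headers the browser will honor from those
// it silently drops.
function splitHeaders(headers) {
  const settable = {};
  const ignored = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    const blocked =
      FORBIDDEN.has(lower) || lower.startsWith("proxy-") || lower.startsWith("sec-");
    (blocked ? ignored : settable)[name] = value;
  }
  return { settable, ignored };
}

// The headers from the attempt in reply #6.
const attempted = {
  "Referer": "http://fm.xinli001.com/",
  "X-Requested-With": "XMLHttpRequest",
  "Accept": "*/*",
  "Accept-Encoding": "gzip, deflate",
  "Accept-Language": "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
  "Connection": "keep-alive",
  "Host": "fm.xinli001.com",
  "User-Agent": "Mozilla/5.0 (Windows NT 5.1; rv:34.0) Gecko/20100101 Firefox/34.0",
};

const { settable, ignored } = splitHeaders(attempted);
console.log("ignored by the browser:", Object.keys(ignored));
console.log("settable from script:", Object.keys(settable));
```

Even the settable X-Requested-With header does not help across origins: a custom header makes the browser send a CORS preflight first, which the target server would have to approve.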
    #7 yrdr 2014-12-23 10:18:23 +08:00
    First, the request is cross-origin, so use JSONP.
    Second, you didn't set the HTTP headers, so the server probably rejected the request.
    #8 bosshida (OP) 2014-12-23 10:18:27 +08:00
    @esile How did you get it to work? Could you post your test code?
    #9 zhangwei727 2014-12-23 12:10:54 +08:00
    @esile I'd like the test source as well, [email protected] Thanks!
    #10 nilennoct 2014-12-23 13:41:04 +08:00
    @bosshida Don't try to do this kind of thing in the browser; just use Node ==
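A minimal sketch of what the Node route might look like, using only the built-in `http` module and the two headers that reply #5 reports are sufficient. `buildBroadcastUrl`, `buildHeaders`, and `fetchBroadcast` are hypothetical names; treating `t` as a millisecond timestamp is an assumption based on the values in the thread, and whether the endpoint still behaves this way is not known.

```javascript
const http = require("http");

// Headers named in replies #1 and #5. Outside a browser there is no
// same-origin policy, and Referer can be set freely.
function buildHeaders() {
  return {
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "http://fm.xinli001.com/",
  };
}

// Assumption: `t` is a millisecond timestamp, as the sample URLs suggest.
function buildBroadcastUrl(pk, now = Date.now()) {
  const url = new URL("http://fm.xinli001.com/broadcast/");
  url.searchParams.set("pk", String(pk));
  url.searchParams.set("t", String(now));
  return url.toString();
}

// Fetch the broadcast info and resolve with the parsed JSON body.
function fetchBroadcast(pk) {
  return new Promise((resolve, reject) => {
    const options = { headers: buildHeaders(), timeout: 10000 };
    const req = http.get(buildBroadcastUrl(pk), options, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error("HTTP " + res.statusCode));
        return;
      }
      res.setEncoding("utf8");
      let body = "";
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("timeout", () => req.destroy(new Error("request timed out")));
    req.on("error", reject);
  });
}

// Usage (performs a real network request, so it is left commented out):
// fetchBroadcast(4916186).then((json) => console.log(json.data.title), console.error);
```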
    #11 bosshida (OP) 2014-12-23 20:55:48 +08:00
    @yrdr I tried JSONP and it still doesn't work. Both the jQuery version and the plain-JS version of JSONP return 403 forbidden.
    jQuery:
    <script type="text/javascript">
    function haha() {
        $.ajax({
            type: "get",
            async: false,  // has no effect: JSONP requests are always asynchronous
            url: "http://fm.xinli001.com/broadcast/",
            data: "pk=97701348&t=1419336731430",
            dataType: "jsonp",
            jsonpCallback: "fmHandler",
            // JSONP loads the URL through a script tag, so these headers are never sent
            headers: {
                "Referer": "http://fm.xinli001.com/",
                "X-Requested-With": "XMLHttpRequest",
                "Accept": "*/*",
                "Accept-Encoding": "gzip, deflate",
                "Accept-Language": "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
                "Connection": "keep-alive",
                "Host": "fm.xinli001.com",
                "User-Agent": "Mozilla/5.0 (Windows NT 5.1; rv:34.0) Gecko/20100101 Firefox/34.0"
            },
            success: function (json) {
                console.log(json);
                alert('ok');
            },
            error: function () {
                alert('fail');
            }
        });
    }
    </script>

    Plain JS:
    <script type="text/javascript">
    var myFmHandler = function(data){
    alert('ok');
    };
    var url = "http://fm.xinli001.com/broadcast/?pk=97701348&t=1419336731430&callback=myFmHandler";
    var script = document.createElement('script');
    script.setAttribute('src', url);
    document.getElementsByTagName('head')[0].appendChild(script);
    </script>

    I've never used the Node.js mentioned above, so I'll go learn it now and try it out...
    #12 esile 2014-12-24 11:01:38 +08:00
    @bosshida @zhangwei727 Does it really need to be that complicated (typed as 负责, "responsible")?
    <?php
    // Fetch a URL with the Referer and X-Requested-With headers that
    // reply #5 reports are enough for this endpoint.
    function fetchpage($url, $referer)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Requested-With: XMLHttpRequest'));
        curl_setopt($ch, CURLOPT_HEADER, false);         // leave response headers out of the result
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
        curl_setopt($ch, CURLOPT_REFERER, $referer);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6; .NET CLR 2.0.50727; CIBA)");
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // seconds
        $temp = curl_exec($ch);                          // false on failure
        curl_close($ch);
        return $temp;
    }

    var_dump(fetchpage('http://fm.xinli001.com/broadcast/?pk=4916186&t=1419258885474', 'http://fm.xinli001.com/'));
    #13 esile 2014-12-24 11:02:21 +08:00
    负责 ("responsible") should have been 复杂 ("complicated"). o()o Sigh, pinyin input strikes again.