Node.js API requests are very slow under high concurrency, while a single request is very fast

This topic was created 3263 days ago; the information in it may have evolved or changed since then.

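Setup: express.js is the server and the request.js library calls the backend API. A local URL (curl http://localhost:8080/proxy-api) is handled by a router that uses request.js to call the API. I timed that request, and a single request is very fast, as shown here: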

get http://ip:9190/user/getUserInfo 13 ms
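For readers who want to reproduce the setup, here is a minimal sketch of such a forwarding route. It is not the original poster's code: it uses only Node's built-in `http` module in place of express.js and request.js, and the name `createProxyServer` and the port in the usage comment are assumptions for illustration.

```javascript
const http = require('http');

// Hypothetical helper: returns a server that forwards every incoming request
// to `backendUrl` and logs how long the upstream call took.
function createProxyServer(backendUrl) {
  return http.createServer((clientReq, clientRes) => {
    const start = Date.now();
    const upstream = http.get(backendUrl, (backendRes) => {
      clientRes.writeHead(backendRes.statusCode, {
        'content-type': backendRes.headers['content-type'] || 'application/octet-stream',
      });
      backendRes.pipe(clientRes); // stream the backend body to the client
      backendRes.on('end', () => {
        console.log(`get ${backendUrl} ${Date.now() - start} ms`);
      });
    });
    upstream.on('error', (err) => {
      // Only send a status line if nothing has been sent yet.
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end(`upstream error: ${err.message}`);
    });
  });
}

// Usage (port assumed to match the post):
// createProxyServer('http://ip:9190/user/getUserInfo').listen(8080);
```

The timing covers the whole upstream call as seen from the Node process, so it includes any time the request spends waiting for a socket, not only the backend's own processing time.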

Then I ran concurrency tests with webbench and ab, at a concurrency of 500, and found that the API calls take a very long time.

# Test commands
ab -n 1000 -c 200 -r http://localhost:8080/proxy-api
webbench -t 10 -c 500 http://localhost:8080/proxy-api

# Excerpt of response times:
get http://ip:9190/user/getUserInfo 2019 ms cost time: 2020
get http://ip:9190/user/getUserInfo 2062 ms cost time: 2062
get http://ip:9190/user/getUserInfo 2064 ms cost time: 2065
get http://ip:9190/user/getUserInfo 2063 ms cost time: 2063
get http://ip:9190/user/getUserInfo 2062 ms cost time: 2063
get http://ip:9190/user/getUserInfo 2063 ms cost time: 2063
get http://ip:9190/user/getUserInfo 2061 ms cost time: 2062
get http://ip:9190/user/getUserInfo 2063 ms cost time: 2064
get http://ip:9190/user/getUserInfo 2063 ms
... ...
get http://ip:9190/user/getUserInfo 1362 ms cost time: 1362
get http://ip:9190/user/getUserInfo 1361 ms cost time: 1362
get http://ip:9190/user/getUserInfo 1362 ms cost time: 1362
get http://ip:9190/user/getUserInfo 1362 ms cost time: 1362
get http://ip:9190/user/getUserInfo 1362 ms cost time: 1362
get http://ip:9190/user/getUserInfo 1363 ms cost time: 1363
get http://ip:9190/user/getUserInfo 1362 ms cost time: 1362
... ...
get http://ip:9190/user/getUserInfo 1006 ms cost time: 1006
get http://ip:9190/user/getUserInfo 627 ms cost time: 628
get http://ip:9190/user/getUserInfo 629 ms cost time: 629
get http://ip:9190/user/getUserInfo 628 ms cost time: 629
get http://ip:9190/user/getUserInfo 1403 ms cost time: 1403
get http://ip:9190/user/getUserInfo 1402 ms

Does anyone have experience solving this kind of problem?

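Addendum 1, 2016-11-03 13:05:23 +08:00

++++++++++
2016-11-03 13:00:06 A follow-up: here is simple code that reproduces the problem. The indices are printed out of order, which shows the requests are asynchronous, but the later ones take longer and longer: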
```
const request = require('request').defaults({
  pool: { maxSockets: 5000 }
});

const c = 500;
const api = 'replace with an API URL';

// Fire one GET and log how long it took from start to callback.
function doGet(i) {
  const start = Date.now();
  const index = i;
  request.get(api, () => {
    const end = Date.now() - start;
    console.log(`${index}: ${end}ms`);
  });
}

// Start all requests at once, without waiting for earlier ones to finish.
for (let i = 0; i < c; i++) {
  doGet(i);
}
```
Output:
```
3: 570ms
1: 582ms
5: 580ms
7: 580ms
9: 580ms
11: 580ms
2: 582ms
0: 591ms
4: 582ms
6: 582ms
12: 582ms
10: 582ms
8: 583ms
14: 583ms
13: 583ms
16: 583ms
17: 584ms
20: 583ms
22: 583ms
24: 583ms
26: 583ms
28: 582ms
30: 582ms
...
457: 2262ms
467: 2263ms
469: 2263ms
237: 2283ms
474: 2267ms
471: 2267ms
493: 2269ms
475: 2272ms
479: 2271ms
477: 2272ms
485: 2272ms
483: 2273ms
481: 2273ms
487: 2273ms
489: 2274ms
495: 2273ms
497: 2273ms
499: 2274ms
491: 4380ms
```

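Addendum 2, 2016-11-03 13:12:55 +08:00

I can roughly understand why this happens: JS is single-threaded and tasks are processed from a queue, so the more you pile on, the longer each one takes. But this is only 500 requests, so it shouldn't be this bad. The key question is: how can this be optimized?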

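Addendum 3, 2016-11-03 14:16:27 +08:00

A combined reply to the causes people have guessed: load-testing the backend directly also gives very low latency, so the problem is in the front-end (Node) request. The transfer rate can be ignored: the front end is local traffic on the same machine and the backend is on the LAN, so speed is not an issue and the bandwidth is nowhere near saturated; it is only a tiny amount of data.
ab against the front-end forwarding: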
Requests per second: 171.05 [#/sec] (mean)
Time per request: 1169.261 [ms] (mean)
Time per request: 5.846 [ms] (mean, across all concurrent requests)
Transfer rate: 1981.31 [Kbytes/sec] received

ab directly against the backend API:
Requests per second: 858.12 [#/sec] (mean)
Time per request: 233.068 [ms] (mean)
Time per request: 1.165 [ms] (mean, across all concurrent requests)
Transfer rate: 190.23 [Kbytes/sec] received
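As a reading aid for the two `Time per request` lines: ab derives both from the request rate, and the figures above are consistent with the `-c 200` in the test command. The sketch below recomputes them; the function name `abTimes` is made up for illustration, and small differences from the reported values come from the rounded `Requests per second`.

```javascript
// ab reports two "Time per request" values derived from requests per second:
//   mean                       = concurrency * 1000 / rps
//   mean across all concurrent = 1000 / rps
function abTimes(rps, concurrency) {
  return {
    perRequestMs: (concurrency * 1000) / rps,
    acrossAllMs: 1000 / rps,
  };
}

console.log(abTimes(171.05, 200)); // through the Node forwarding layer
console.log(abTimes(858.12, 200)); // backend hit directly
```

In other words, the roughly 1169 ms versus 233 ms gap is the same fact as the 171 versus 858 requests per second gap, seen from the point of view of one of the 200 concurrent clients.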

The time spent in console.log is not included in the timings of the test code posted above.

Switching to the native http module gives the same result. I haven't tried superagent, but I doubt it would be much better.
http.get(api, (res) => {
  res.on('end', () => {
    const end = Date.now() - start;
    console.log(`${index}: ${end}ms`);
  });
  res.resume();
}).on('error', (e) => {
  console.log(`Got error: ${e.message}`);
});

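Addendum 4, 2016-11-03 14:42:32 +08:00

Upgrading node from v6.9.1 to v7.0.0 cut the latency in half; v7 improved performance quite a bit. Latency is still fairly high, though, at around 1500ms.

Addendum 5, 2016-11-09 10:23:29 +08:00

Final summary: after several days of repeated testing, I have not found a particularly effective code-level optimization for a single process on a single core; this may be the limit.
What can still be done: 1. Use node v7, which performs better than v6. 2. Deploy with pm2 across multiple cores and processes.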
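To illustrate what multi-process deployment means at the code level, here is a minimal sketch using Node's built-in `cluster` module. This is a swapped-in technique, not how pm2 itself is configured; `startCluster` and the port in the usage comment are assumptions.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// `isPrimary` exists on newer Node versions; `isMaster` is the older name.
const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Hypothetical entry point: fork `workers` processes that share one port.
function startCluster(port, workers = os.cpus().length) {
  if (isPrimary) {
    for (let i = 0; i < workers; i++) cluster.fork();
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited, starting a new one`);
      cluster.fork();
    });
  } else {
    // Each worker runs its own event loop and accepts connections on the shared port.
    http.createServer((req, res) => {
      res.end(`handled by pid ${process.pid}\n`);
    }).listen(port);
  }
}

// Usage (port assumed): startCluster(8080);
```

Each worker is a separate process with its own event loop, so this spreads the work across cores rather than making any single request faster.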


46 replies 2019-06-14 11:23:16 +08:00

penjianfeng

2016-11-03 11:38:39 +08:00

Following this. One of our company's systems is about to be moved to node, so please share what you find :-)

tofishes

2016-11-03 11:45:54 +08:00

@penjianfeng Will do. For now this concurrency problem is the only issue.

janxin

2016-11-03 11:47:49 +08:00

Are you running only a single process?

mcfog

2016-11-03 11:49:27 +08:00

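So 8080 is an infinitely capable backend API, 9190 is the node process you're asking about, its logic uses request to hit 8080, and that is where the concurrency problem shows up?

If I remember correctly, the request library has a connection pool and its default concurrency is only 5, so long latencies when you hit it with 500 concurrent requests would be normal.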

tofishes

2016-11-03 12:33:04 +08:00

@janxin I haven't done anything with processes; it's a plain, default program.

tofishes

2016-11-03 12:42:32 +08:00

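@mcfog You have it backwards: the node server is on 8080 and the backend API is on 9190. The load goes to a URL on the node server, which in turn uses request to call the API, and I timed request from start to completion. As for the request connection pool you mention, do you mean 'pool': { maxSockets: 5000 }? Raising it to 5000 gave no noticeable improvement either.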

ibigbug

2016-11-03 12:47:40 +08:00 via iPhone

Have you looked at the system's CPU, memory, and load?

xxxyyy

2016-11-03 12:48:04 +08:00 via Android

@tofishes What latency do you get when you load-test the backend API directly?

sherlocktheplant

2016-11-03 12:50:56 +08:00

Check whether you're running out of memory and the system is paging.

tofishes

2016-11-03 13:09:05 +08:00

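@xxxyyy Load-testing the backend directly also gives very low latency, so the problem is in the front-end request.
ab against the front-end forwarding: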
Requests per second: 171.05 [#/sec] (mean)
Time per request: 1169.261 [ms] (mean)
Time per request: 5.846 [ms] (mean, across all concurrent requests)
Transfer rate: 1981.31 [Kbytes/sec] received

ab directly against the backend API:
Requests per second: 858.12 [#/sec] (mean)
Time per request: 233.068 [ms] (mean)
Time per request: 1.165 [ms] (mean, across all concurrent requests)
Transfer rate: 190.23 [Kbytes/sec] received

tofishes

2016-11-03 13:11:28 +08:00

@ibigbug
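@sherlocktheplant I'm testing everything locally; only the backend API is on a server in the LAN. The machine is a mid-spec 13-inch MacBook Pro, so I don't think it's what you two describe.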

sherlocktheplant

2016-11-03 13:15:38 +08:00

@tofishes Please post the forwarding code.

powerfj

2016-11-03 13:16:27 +08:00

Tested this way, you are probably measuring the response of the backend API that request calls; it may well be the backend that can't keep up.

sorra

2016-11-03 13:23:15 +08:00

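Get a clearer picture of how long each step takes. Note that console.log can also cost time.
The high-concurrency numbers advertised online were measured on high-end servers; don't expect too much from an ordinary machine.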
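One way to time the individual steps of an outgoing request is to record when each socket and response event fires. The sketch below does this with Node's built-in `http` module; `timeRequest` and the phase names are made up for illustration, and the URL in the usage comment is an assumption.

```javascript
const http = require('http');

// Resolve with the time (ms since the call started) at which each phase happened.
// Phases that did not happen stay undefined, e.g. `lookup` for a literal IP
// address or `connect` when a kept-alive socket is reused.
function timeRequest(url) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const phases = {};
    const mark = (name) => { phases[name] = Date.now() - start; };

    const req = http.get(url, (res) => {
      mark('headers'); // response status line and headers received
      res.on('end', () => { mark('end'); resolve(phases); });
      res.resume(); // discard the body so 'end' fires
    });
    req.on('socket', (socket) => {
      mark('socket'); // a socket was assigned to this request
      socket.once('lookup', () => mark('lookup'));
      socket.once('connect', () => mark('connect'));
    });
    req.on('error', reject);
  });
}

// Usage (URL assumed): timeRequest('http://localhost:8888').then(console.log);
```

A large gap before `socket` points at queueing inside the client, a large gap before `connect` at connection setup, and a large gap before `headers` at the server or at the event loop being busy.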

smallpath

2016-11-03 13:27:51 +08:00

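First use nginx to forward the requests to the backend API and load-test that, to see whether the problem is in the node layer.
Once you've confirmed it is in the node layer, switch to another request library such as superagent and see whether every library has the problem.
We saw the same thing with ab before, when we were not on a LAN; it turned out the local bandwidth was saturated, and once that was fixed the problem went away.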

zhuangzhuang1988

2016-11-03 13:30:59 +08:00

Take a look with Visual Studio plus the node extension; it comes with profiling tools.

smallpath

2016-11-03 13:45:57 +08:00

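If superagent still shows the delay, try the native http module. Of course, if you have access, you can check directly whether the backend's own request timings look right.
I think having nginx forward the node layer's requests to the backend might help.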

xxxyyy

2016-11-03 13:55:29 +08:00 via Android

@tofishes Could it be your node version? I ran the code you posted on node.js 7.0 and the per-request times differ very little (the first and the last differ by no more than 80ms).

shanelau

PRO

2016-11-03 14:37:45 +08:00

Queue the requests.
Keep an eye on the machine's performance and find out what the cause is: network, CPU, or memory.
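A minimal sketch of what queueing could look like in the node layer: a small limiter that lets at most `limit` tasks run at once and holds the rest in a FIFO queue. `createLimiter` is a hypothetical helper, not part of request.js or any library mentioned in this thread.

```javascript
// Hypothetical helper: run at most `limit` async tasks at once, queue the rest.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)                 // also catches synchronous throws from task
      .then(resolve, reject)      // hand the outcome back to the caller
      .then(() => { active--; next(); }); // free the slot, start the next task
  };

  // `task` is a function returning a promise (or a plain value).
  return (task) => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    next();
  });
}

// Usage, assuming some promise-returning `fetchOnce(url)` of your own:
// const run = createLimiter(50);
// run(() => fetchOnce(api)).then(handleResult);
```

Queueing does not raise throughput by itself; it bounds how many upstream requests are in flight so that each one is started only when it can be served promptly.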

tofishes

2016-11-03 14:40:59 +08:00

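@xxxyyy I upgraded from v6.9.1 to v7.0.0 and the latency was cut in half, but some requests still take more than 1000ms. I even pointed the api URL at http://localhost, the default nginx page on this machine, and latencies range from about 150ms to 500ms.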

enenaaa

2016-11-03 14:56:44 +08:00

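I don't know node.
Consider system and network performance: the server's TCP listening port has an accept queue, so opening connections in rapid succession may cost some time.
You could try testing over a single connection.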
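A sketch of a single-connection test with Node's built-in `http` module: a keep-alive agent capped at one socket, so every request reuses the same TCP connection. The helper names `createSingleConnectionAgent` and `getMany` are assumptions for illustration.

```javascript
const http = require('http');

// keepAlive keeps the socket open between requests; maxSockets: 1 makes the
// agent queue requests instead of opening additional sockets.
function createSingleConnectionAgent() {
  return new http.Agent({ keepAlive: true, maxSockets: 1 });
}

// Issue `count` GET requests through `agent`; resolves with each latency in ms.
// Each latency includes the time the request waited for the socket.
function getMany(url, count, agent) {
  const one = () => new Promise((resolve, reject) => {
    const start = Date.now();
    http.get(url, { agent }, (res) => {
      res.on('end', () => resolve(Date.now() - start));
      res.resume(); // discard the body so 'end' fires
    }).on('error', reject);
  });
  return Promise.all(Array.from({ length: count }, one));
}

// Usage (URL assumed):
// const agent = createSingleConnectionAgent();
// getMany('http://localhost:8888', 500, agent).then((ms) => { console.log(ms); agent.destroy(); });
```

If latencies stay flat in this mode but grow when every request opens its own connection, connection setup is a likely contributor.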

tofishes

2016-11-03 15:00:19 +08:00

@sherlocktheplant I've posted the simple code as an addendum.

xxxyyy

2016-11-03 15:42:52 +08:00

@tofishes Try testing with the code below. If the results are about the same, the problem may be in the backend:

```
const http = require("http");

// Minimal backend: answer every request after a random delay of up to 100 ms.
const server = http.createServer((req, res) => {
  setTimeout(function () {
    res.writeHead(200);
    res.end();
  }, Math.random() * 100);
});

server.listen(8888);
```

=============================
Then change the script you posted above to this:
```
const request = require('request').defaults({
  pool: { maxSockets: 5000 }
});

const c = 500;
const api = 'http://localhost:8888';

// Collect the timings and print them once at the end,
// so console.log does not run while requests are still in flight.
const costs = [];
function doGet(i) {
  const start = Date.now();
  const index = i;
  request.get(api, () => {
    const end = Date.now() - start;
    costs.push(`${index}: ${end}ms`);
    if (costs.length === c) {
      console.log(costs.join("\n"));
    }
  });
}

for (let i = 0; i < c; i++) {
  doGet(i);
}
```

wxx199101046

2016-11-03 15:50:42 +08:00

Would the result be any different with pm2?

Arrowing

2016-11-03 16:26:14 +08:00

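Are you making use of multiple processes and multiple cores?
I tested with PM2. The server is CPU: Intel(R) Xeon(R) E5-2640 0 @ 2.50GHz, 24 cores in total, MEM: 32G

| Concurrency | TPS | Response time | CPU | Successful requests | Written to file | Loss rate | Duration |
|---|---|---|---|---|---|---|---|
| 5000 | 10331.5 | 0.446 | 55% | 19327244 | 19323366 | 0.02% | 1 hour |
| 10000 | 10392.1 | 0.86 | 55% | 23631728 | 23626987 | 0.02% | 1 hour |
| 2000 | 9673.45 | 0.198 | 55% | 416065314 | 415983941 | 0.0195% | 11+ hours |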

tofishes

2016-11-03 18:34:32 +08:00

@Arrowing No, I'm not using multiple cores or multiple processes.

tofishes

2016-11-03 18:38:57 +08:00

@xxxyyy I used your code and the times became more even. Here are excerpts of the longest and shortest times:

```
# Shortest times:
127: 235ms
34: 285ms
62: 283ms
72: 283ms
97: 283ms
55: 289ms
89: 286ms
124: 283ms

# Longest times:
279: 611ms
267: 612ms
437: 600ms
467: 599ms
421: 608ms
417: 609ms
436: 608ms
479: 606ms
461: 607ms
443: 608ms
448: 607ms
438: 609ms
430: 609ms
```
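Rather than eyeballing excerpts, the collected timings can be reduced to a few summary numbers. The sketch below is one way to do that, using the nearest-rank definition of a percentile; `percentile` and `summarize` are made-up names, and the sample input is only the group of shortest values quoted above, not the full run.

```javascript
// Nearest-rank percentile of an ascending list of latencies in milliseconds.
function percentile(sortedMs, p) {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(rank, 1) - 1];
}

// Reduce a list of latencies to count, min, median, p95, max and mean.
function summarize(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, ms) => acc + ms, 0);
  return {
    count: sorted.length,
    min: sorted[0],
    median: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max: sorted[sorted.length - 1],
    mean: sum / sorted.length,
  };
}

// The group of shortest times from the excerpt above (not the full run).
console.log(summarize([235, 285, 283, 283, 283, 289, 286, 283]));
```

To use it with the test script, push the numeric `end` value into an array instead of a formatted string and pass that array to `summarize` once all requests have finished.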

tofishes

2016-11-03 18:45:18 +08:00

@wxx199101046 I did a simple deployment with pm2 and the result is still the same.

sampeng

2016-11-03 18:51:11 +08:00

Why not try replacing node with nginx? Just try a plain nginx reverse proxy...

tofishes

2016-11-03 19:44:28 +08:00

@sampeng Reverse proxying isn't the real goal. Node does the page rendering, and forwarding API calls is just one part of that; it isn't a job for nginx.

hxsf

2016-11-03 20:02:28 +08:00 via iPhone

Marking this; I'll reply later from my computer.

zhuangzhuang1988

2016-11-03 20:57:57 +08:00

https://github.com/Microsoft/nodejstools/wiki/Profiling It's such a good tool, give it a try.

https://github.com/Microsoft/nodejstools/wiki/Images/prof-callercalee.png

xxxyyy

2016-11-03 21:05:50 +08:00 via Android

@tofishes That is still a big difference. Maybe you could capture the traffic with wireshark and work out the timings to see whether anything shows up.

tofishes

2016-11-04 09:45:12 +08:00

@zhuangzhuang1988 It's based on Visual Studio? That's a bit awkward on OS X... I'll set up a virtual machine.

sampeng

2016-11-04 10:52:04 +08:00

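@tofishes No no no, the point is that you need to confirm whether the problem is in node or in the forwarding. If that turns out fine, you can rule out a fault in the api.
In my own use of nodejs I've also found forwarding to behave oddly; it often gets completely clogged...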

tofishes

2016-11-04 11:17:05 +08:00

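@sampeng Here's the situation: with 1000 requests at a concurrency of 1, every request is fast. But as soon as the concurrency goes up, to 10, 50, 100, the latency rises with it, so this can be regarded as a problem under concurrency.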
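The observation above can be turned into a repeatable measurement by sweeping the concurrency level and recording the mean latency at each step. The sketch below uses Node's built-in `http` module; `meanLatency` and `sweep` are hypothetical names, and the default levels simply mirror the numbers mentioned in this reply.

```javascript
const http = require('http');

// Send `total` GET requests with at most `concurrency` in flight,
// and resolve with the mean latency in milliseconds.
function meanLatency(url, total, concurrency) {
  return new Promise((resolve, reject) => {
    let started = 0;
    let finished = 0;
    let sumMs = 0;

    const launch = () => {
      if (started >= total) return;
      started++;
      const start = Date.now();
      http.get(url, (res) => {
        res.on('end', () => {
          sumMs += Date.now() - start;
          finished++;
          if (finished === total) resolve(sumMs / total);
          else launch(); // keep the number of in-flight requests constant
        });
        res.resume(); // discard the body so 'end' fires
      }).on('error', reject);
    };

    for (let i = 0; i < Math.min(concurrency, total); i++) launch();
  });
}

// Measure each concurrency level in turn, one after another.
async function sweep(url, levels = [1, 10, 50, 100], total = 1000) {
  const rows = [];
  for (const concurrency of levels) {
    rows.push({ concurrency, meanMs: await meanLatency(url, total, concurrency) });
  }
  return rows;
}

// Usage (URL assumed): sweep('http://localhost:8080/proxy-api').then(console.table);
```

Running the same sweep against the backend directly and against the node layer shows at which concurrency the two curves start to diverge.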

tofishes

2016-11-04 11:17:45 +08:00

@zhuangzhuang1988 I looked into it; VS Code has a Mac version.

zhuangzhuang1988

2016-11-04 11:28:25 +08:00

@tofishes As far as I remember, vscode has no profiling feature.

zhuangzhuang1988

2016-11-04 13:56:15 +08:00

Also, this is supported on Windows once nodejs is installed.

tofishes

2016-11-09 10:19:16 +08:00

@Arrowing A 24-core server, no wonder the performance is that good. May I ask which testing software you used?

tofishes

2016-11-09 10:20:42 +08:00

@wxx199101046 I was using pm2 incorrectly last time. After running it on two cores, forwarding throughput improved.

Arrowing

2016-11-09 10:52:16 +08:00

@tofishes It was tested by our company's testing expert; the software is loadrunner.

scyuns

2016-11-10 03:09:39 +08:00 via Android

I can't follow any of this, how embarrassing.

wxx199101046

2016-11-10 10:45:47 +08:00

@tofishes Good to hear. In theory, running multiple instances under pm2 is exactly how you raise concurrency.

Sparetire

2016-11-21 17:21:29 +08:00

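Did the OP solve this in the end? I have a 4-core server handling 200,000 requests, and each request has to be brought under 5s... I don't know how to optimize it.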

E2gCaBAT5I87sw1M

2019-06-14 11:23:16 +08:00

Problem solved: ![123](https://imgchr.com/i/V4sQl4 "123")