ClickHouse
way to explore
2025-10-06T16:16:58Z
Copyright © 2010-2018, V2EX

Is anyone familiar with ClickHouse? How well does ClickHouse support distributed deployments? tag:www.v2ex.com,2025-10-06:/t/1163573 2025-10-06T22:17:58Z 2025-10-06T16:16:58Z red13 member/red13
I need to query a very large table. There is no complex query logic, only simple WHERE, ORDER BY, GROUP BY, SUM, AVG, and COUNT queries. The table currently holds close to 50 billion rows and will grow to 1 trillion within six months.

The current solution uses Spark. I know ClickHouse is well suited to OLAP queries and is fast, but can ClickHouse cope with 1 trillion rows? And does ClickHouse also support distributed deployments well?

I don't know ClickHouse in depth, so I would appreciate some guidance.

ClickHouse's MaterializedMySQL engine tag:www.v2ex.com,2025-04-28:/t/1128690 2025-04-28T09:17:10Z 2025-04-28T09:15:10Z yb2313 member/yb2313
Has anyone actually used it? How was the experience?

Why does this node have only 8 topics? That is far too few.
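Two database creation statements produce the same effect: CREATE DATABASE hello1; and CREATE DATABASE hello ON CLUSTER 'xxxxx'; tag:www.v2ex.com,2024-06-27:/t/1053202 2024-06-27T13:20:07Z 2024-06-27T15:20:07Z aapeli member/aapeli
Problem: CREATE DATABASE hello1; and CREATE DATABASE hello ON CLUSTER 'xxxxx'; produced the same effect. Both created the database on every ClickHouse node. What could be causing this?

Expected behavior: without ON CLUSTER the database is created only on the local node; with ON CLUSTER it is created on every node in the cluster.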
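Hi everyone, I'm back! Can a Synology NAS handle running ClickHouse? tag:www.v2ex.com,2024-03-25:/t/1026970 2024-03-25T16:16:21Z 2024-03-25T18:16:21Z xyxy member/xyxy
With this Synology's CPU and 6 GB of RAM, can it cope?
Queries are very infrequent, and each query covers roughly one month of data, about 3 million rows.

What exactly is using the memory? tag:www.v2ex.com,2024-02-23:/t/1017867 2024-02-23T06:44:09Z 2024-02-23T06:41:09Z fruitmonster member/fruitmonster
I'm new to ClickHouse. A few days ago ClickHouse was restarted unexpectedly, and I noticed in monitoring that memory usage dropped sharply: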

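The memory that dropped was of type cache, and after the drop it started slowly climbing again, so I assumed it must be related to ClickHouse. After some searching I found the following documentation: https://clickhouse.com/docs/en/operations/query-cache

Current memory usage:

I have a few questions and would appreciate answers:

1. Is the red Cache portion memory used by ClickHouse's query cache? If so, why does it keep growing without being released?

2. Will this red Cache portion be released when memory is nearly full? Will that have any impact?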

Asking for help optimizing a NOT IN query over 30 million rows tag:www.v2ex.com,2023-12-13:/t/1000096 2023-12-13T09:28:29Z 2023-12-26T01:08:59Z sunrealzhang member/sunrealzhang
I have a table with 30 million rows, and I need to count the people who newly enrolled in insurance from a given year onward. The original database was Oracle, where I used this syntax:

AND NOT EXISTS (SELECT 1 FROM AC02_TEMP B WHERE A.AAC001 = B.AAC001 AND B.AAC030 < '2018-01-01 00:00:00')

On ClickHouse I tried both LEFT JOIN and NOT IN, and neither performed well.

SELECT COUNT(1) AS "新参保人数"  -- alias means "number of new enrollees"
FROM AC02_TEMP AS A
WHERE A.AAB301 IN (SELECT AAB301 FROM AA26 WHERE AAA148 = '130800')
  AND A.AAE200 = '41'
  AND A.AAC031 = '1'
  AND A.AAC030 >= '2018-01-01 00:00:00'
  AND A.AAC001 NOT IN (
      SELECT B.AAC001
      FROM AC02_TEMP AS B
      WHERE B.AAC030 < '2018-01-01 00:00:00'
  );

Here is the EXPLAIN output:

CreatingSets (Create sets before main query execution)
  Expression ((Projection + Before ORDER BY))
    Aggregating
      Expression (Before GROUP BY)
        ReadFromMergeTree (default.AC02_TEMP)
        Indexes:
          PrimaryKey
            Keys:
              AAC001
              AAE200
            Condition: and((AAC001 notIn 18692488-element set), (AAE200 in ['41', '41']))
            Parts: 2/2
            Granules: 4821/4821
  CreatingSet (Create set for subquery)
    Expression ((Projection + Before ORDER BY))
      ReadFromMergeTree (default.AA26)
      Indexes:
        PrimaryKey
          Condition: true
          Parts: 1/1
          Granules: 1/1

I'm new to ClickHouse and don't know where to start with this. Any help would be appreciated.
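
One direction that may be worth trying for this kind of anti-join is to avoid building the 18-million-element set and instead group by person once: a person counts as newly enrolled exactly when their earliest AAC030 is on or after the cutoff. The Python sketch below only checks that equivalence on toy rows. The sample data and the helper names (`count_not_in`, `count_single_pass`) are invented for illustration, the AA26 filter is left out, NULL keys are assumed absent, and it says nothing about real ClickHouse performance.

```python
from collections import defaultdict

CUTOFF = "2018-01-01 00:00:00"

# Toy rows: (AAC001 person id, AAC030 enrollment time, AAE200, AAC031).
# Invented data, not taken from the original table.
ROWS = [
    ("p1", "2016-05-01 00:00:00", "41", "1"),
    ("p1", "2019-03-01 00:00:00", "41", "1"),
    ("p2", "2018-06-01 00:00:00", "41", "1"),
    ("p2", "2020-01-01 00:00:00", "41", "1"),
    ("p3", "2019-09-01 00:00:00", "11", "1"),
    ("p4", "2021-02-01 00:00:00", "41", "1"),
]


def matches_filters(row):
    """Row-level filters of the outer query (AA26 filter omitted)."""
    _, enrolled_at, aae200, aac031 = row
    return aae200 == "41" and aac031 == "1" and enrolled_at >= CUTOFF


def count_not_in(rows):
    """Mirror the original shape: build the set of old ids, then filter."""
    old_ids = {r[0] for r in rows if r[1] < CUTOFF}
    return sum(1 for r in rows if matches_filters(r) and r[0] not in old_ids)


def count_single_pass(rows):
    """One grouped pass: earliest AAC030 and a conditional count per id."""
    earliest = {}
    hits = defaultdict(int)
    for r in rows:
        pid, enrolled_at = r[0], r[1]
        if pid not in earliest or enrolled_at < earliest[pid]:
            earliest[pid] = enrolled_at
        if matches_filters(r):
            hits[pid] += 1
    # Keep only ids whose earliest enrollment is on or after the cutoff.
    return sum(n for pid, n in hits.items() if earliest[pid] >= CUTOFF)


print(count_not_in(ROWS), count_single_pass(ROWS))
```

In ClickHouse the grouped form would presumably be written with GROUP BY AAC001, min(AAC030) and countIf(...) in a subquery, then summed in the outer query. Whether it actually beats NOT IN on the real table would need to be measured.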

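Asking for help optimizing multiple UNIONs over 30 million rows tag:www.v2ex.com,2022-11-10:/t/894027 2022-11-10T01:36:23Z 2022-11-10T04:24:08Z dollck member/dollck
I have a table with 30 million rows. After a user enters a value, I need to check each of 6 columns in the table in turn for a match. I tried EXPLAIN SYNTAX, but it didn't help. The query currently takes about 3-4 s. Does anyone know how to speed it up? The statement is: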

WITH A AS (SELECT * FROM otherinfor)
SELECT * FROM A WHERE value1 = '1'
UNION DISTINCT
SELECT * FROM A WHERE value2 = '1'
UNION DISTINCT
SELECT * FROM A WHERE value3 = '1'
UNION DISTINCT
SELECT * FROM A WHERE value4 = '1'
UNION DISTINCT
SELECT * FROM A WHERE value5 = '1'
UNION DISTINCT
SELECT * FROM A WHERE value6 = '1'
Here is the EXPLAIN output:

Distinct
  Union
    Expression ((Projection + Before ORDER BY))
      Filter ((WHERE + (Projection + Before ORDER BY)))
        ReadFromMergeTree (default.otherinfor)
    Expression ((Projection + Before ORDER BY))
      Filter ((WHERE + (Projection + Before ORDER BY)))
        ReadFromMergeTree (default.otherinfor)
    Expression ((Projection + Before ORDER BY))
      Filter ((WHERE + (Projection + Before ORDER BY)))
        ReadFromMergeTree (default.otherinfor)
    Expression ((Projection + Before ORDER BY))
      Filter ((WHERE + (Projection + Before ORDER BY)))
        ReadFromMergeTree (default.otherinfor)
    Expression ((Projection + Before ORDER BY))
      Filter ((WHERE + (Projection + Before ORDER BY)))
        ReadFromMergeTree (default.otherinfor)
    Expression ((Projection + Before ORDER BY))
      Limit (preliminary LIMIT (without OFFSET))
        Filter ((WHERE + (Projection + Before ORDER BY)))
          ReadFromMergeTree (default.otherinfor)
Many thanks, everyone. This is very important to me.
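
Since every branch reads the same table with the same column list, the six UNION DISTINCT branches look equivalent to a single scan whose six conditions are joined with OR, followed by a DISTINCT. The Python sketch below checks only that equivalence, on invented rows and with hypothetical helper names (`union_distinct`, `single_scan`); it is not a measurement of ClickHouse behavior.

```python
# Toy rows standing in for `otherinfor`: (id, value1 .. value6). Invented data.
ROWS = [
    (1, "1", "0", "0", "0", "0", "0"),
    (2, "0", "1", "0", "0", "0", "1"),  # matches two columns
    (3, "0", "0", "0", "0", "0", "0"),  # matches none
    (2, "0", "1", "0", "0", "0", "1"),  # exact duplicate of the row above
    (4, "1", "1", "1", "1", "1", "1"),  # matches all six columns
]


def union_distinct(rows, needle):
    """Six separate filtered scans, merged with set union (UNION DISTINCT)."""
    result = set()
    for col in range(1, 7):
        result |= {r for r in rows if r[col] == needle}
    return result


def single_scan(rows, needle):
    """One scan: keep a row if any of the six columns matches, then dedupe."""
    return {r for r in rows if needle in r[1:7]}


print(sorted(union_distinct(ROWS, "1")) == sorted(single_scan(ROWS, "1")))
```

If the equivalence holds for the real data, the SQL counterpart would be a single SELECT DISTINCT with `value1 = '1' OR value2 = '1' OR ...` in the WHERE clause, which reads the table once instead of six times. Whether that is faster in practice depends on the table's sorting key and would need testing.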
Does the hot/cold multi-disk storage configuration in the ClickHouse docs really split hot and cold data by time? tag:www.v2ex.com,2022-10-09:/t/885593 2022-10-09T10:32:16Z 2022-10-09T10:32:16Z meso5533 member/meso5533

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#table_engine-mergetree-multiple-volumes

move_factor: when the amount of available space gets lower than this factor, data automatically starts to move on the next volume if any (by default, 0.1). ClickHouse sorts existing parts by size from largest to smallest (in descending order) and selects parts with the total size that is sufficient to meet the move_factor condition. If the total size of all parts is insufficient, all parts will be moved.

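Going by the documentation, it looks like parts are ordered by size and the largest ones are moved to the next disk first.

But what kind of data gets merged into a single part?

And is a large part necessarily old data?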
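
To make the quoted paragraph concrete, here is a minimal Python sketch of the selection rule as the documentation describes it: sort parts by size in descending order and pick them until enough space would be freed. The function name, the part names, and the sizes are all invented, and this models only the quoted text, not ClickHouse's actual implementation.

```python
def select_parts_to_move(part_sizes, total_space, free_space, move_factor=0.1):
    """Pick parts largest-first until free space would reach move_factor.

    part_sizes maps a (hypothetical) part name to its size, in the same
    unit as total_space and free_space.
    """
    needed = move_factor * total_space - free_space
    if needed <= 0:
        return []  # enough free space, nothing moves
    selected, freed = [], 0
    by_size = sorted(part_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if freed >= needed:
            break
        selected.append(name)
        freed += size
    # If all parts together are too small, every part ends up selected.
    return selected


parts = {"2022-01_old": 10, "2022-09_merged": 40, "2022-10_new": 30}
print(select_parts_to_move(parts, total_space=1000, free_space=50))
```

In this toy run the oldest part is also the smallest, so it stays on the first volume while two newer, larger parts are selected. Under this rule, size rather than age decides what moves. If the goal is to move data by age, the same documentation page describes TTL rules with TO DISK and TO VOLUME, which is probably the mechanism to look at.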

A question about ClickHouse for the experts tag:www.v2ex.com,2021-10-22:/t/809834 2021-10-22T07:49:15Z 2021-10-22T07:48:15Z qq1340691923 member/qq1340691923
How should a basic user-information table be stored in ClickHouse? Should I modify rows by user id with ALTER TABLE, or use the ReplacingMergeTree engine and run OPTIMIZE periodically?
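
For the ReplacingMergeTree option, the point to keep in mind is that deduplication happens during background merges, so a read may still see several versions of the same user until a merge runs. The Python sketch below models the usual workaround of picking the latest version per key at read time. The field names (`user_id`, `version`, `name`) and the helper `latest_per_user` are assumptions made for illustration, not part of the original question.

```python
def latest_per_user(rows):
    """Keep, for each user_id, the row with the highest version.

    On equal versions the later row wins, mimicking "last inserted wins".
    """
    latest = {}
    for row in rows:
        uid = row["user_id"]
        if uid not in latest or row["version"] >= latest[uid]["version"]:
            latest[uid] = row
    return latest


# Every update is written as a new row with a higher version (invented data).
rows = [
    {"user_id": 1, "version": 1, "name": "alice"},
    {"user_id": 2, "version": 1, "name": "bob"},
    {"user_id": 1, "version": 2, "name": "alice_renamed"},
]
print(latest_per_user(rows)[1]["name"])
```

In ClickHouse the same effect is usually obtained at query time with FINAL or with argMax(column, version) grouped by the key, so the table stays correct between merges without relying on OPTIMIZE alone.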

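Is ClickHouse suitable for my scenario? tag:www.v2ex.com,2021-06-03:/t/781077 2021-06-03T03:17:26Z 2021-06-03T06:10:21Z wenjun19931112 member/wenjun19931112
For example, say we want each customer's "average purchasing power". Our table has 3 columns (userId, orderId, price). However, the price may change, and a record may be deleted (for business reasons).

To support update and delete, we chose the VersionedCollapsingMergeTree engine, with the primary key set to the two columns (userId, orderId). But my aggregation dimension is only userId (each customer's "average purchasing power"), and aggregating the VersionedCollapsingMergeTree table by that single dimension gives inaccurate results.

Is there any way around this?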
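
With collapsing engines, rows that have not been merged yet are still visible to queries, so a plain average counts both the old state and its cancel row. The approach the ClickHouse documentation suggests for CollapsingMergeTree is to weight every value by the sign column. The Python sketch below models that arithmetic on invented rows; the helper name `avg_price_per_user` and the data are assumptions for illustration.

```python
from collections import defaultdict

# (userId, orderId, price, sign): sign=+1 is a state row, sign=-1 cancels it.
# Invented data, shown as it might look before any background merge.
ROWS = [
    ("u1", "o1", 100, +1),
    ("u1", "o1", 100, -1),  # cancel the old price of o1
    ("u1", "o1", 120, +1),  # new price of o1
    ("u1", "o2", 60, +1),
    ("u2", "o3", 50, +1),
    ("u2", "o3", 50, -1),   # o3 was deleted
    ("u2", "o4", 70, +1),
]


def avg_price_per_user(rows):
    """Sign-weighted average: sum(price * sign) / sum(sign) per user."""
    total = defaultdict(int)
    count = defaultdict(int)
    for user, _order, price, sign in rows:
        total[user] += price * sign
        count[user] += sign
    # Users whose rows cancel out completely have no live orders.
    return {u: total[u] / count[u] for u in total if count[u] > 0}


print(avg_price_per_user(ROWS))
```

The SQL counterpart would be roughly `sum(price * sign) / sum(sign)` with GROUP BY userId and HAVING sum(sign) > 0. Because cancelled rows contribute zero to both sums, the result does not depend on whether the parts have been merged yet.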
