The current plan uses Spark. I know ClickHouse is a good fit for OLAP query workloads and is very fast, but can ClickHouse hold up under a data volume of 1 trillion rows? Or does ClickHouse also have solid distributed support?
I don't know ClickHouse in much depth yet, so I'm hoping someone experienced can point me in the right direction.
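For reference, a minimal sketch of the usual ClickHouse scale-out pattern: sharded local tables plus a Distributed table on top. The cluster name, table, and schema below are illustrative assumptions, not part of the setup described above.

-- Hypothetical cluster and table names; the {shard}/{replica} macros come from each server's config.
CREATE TABLE events_local ON CLUSTER my_cluster
(
    event_time DateTime,
    user_id    UInt64,
    value      Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (user_id, event_time);

-- Queries go through this table; it fans out to events_local on every shard.
CREATE TABLE events ON CLUSTER my_cluster AS events_local
ENGINE = Distributed(my_cluster, default, events_local, rand());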
…and the type of memory that dropped was cache, which then slowly climbed back up afterwards, so I guessed it had to be related to ClickHouse. After some digging I found the following document: https://clickhouse.com/docs/en/operations/query-cache
Current memory usage:
I now have a few questions and would appreciate some answers:
1. Is the red "Cache" portion ClickHouse's query cache? If so, why does it keep growing and never get released?
2. Will this red Cache usage be released when memory is close to full? Does it cause any problems?
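For what it's worth, a couple of hedged diagnostics that might help narrow this down. They only inspect and clear ClickHouse's own query cache, so whether the red curve moves with them is exactly the open question:

-- What the query cache currently holds and how large each cached result is.
SELECT query, result_size, stale
FROM system.query_cache;

-- Server-side cache-related metrics, for comparison against the monitoring graph.
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric ILIKE '%cach%';

-- Drop all query cache entries and watch whether the "Cache" memory drops as well.
SYSTEM DROP QUERY CACHE;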
AND NOT EXISTS (SELECT 1 FROM AC02_TEMP AS B WHERE A.AAC001 = B.AAC001 AND B.AAC030 < '2018-01-01 00:00:00')
For syntax like the above, I have tried both LEFT JOIN and NOT IN on ClickHouse, and the performance of neither is acceptable.
SELECT COUNT(1) AS "新参保人数"
FROM AC02_TEMP AS A
WHERE A.AAB301 IN (SELECT AAB301 FROM AA26 WHERE AAA148 = '130800')
  AND A.AAE200 = '41'
  AND A.AAC031 = '1'
  AND A.AAC030 >= '2018-01-01 00:00:00'
  AND A.AAC001 NOT IN (SELECT B.AAC001 FROM AC02_TEMP AS B WHERE B.AAC030 < '2018-01-01 00:00:00');
Here is the EXPLAIN output:
CreatingSets (Create sets before main query execution)
  Expression ((Projection + Before ORDER BY))
    Aggregating
      Expression (Before GROUP BY)
        ReadFromMergeTree (default.AC02_TEMP)
        Indexes:
          PrimaryKey
            Keys:
              AAC001
              AAE200
            Condition: and((AAC001 notIn 18692488-element set), (AAE200 in ['41', '41']))
            Parts: 2/2
            Granules: 4821/4821
  CreatingSet (Create set for subquery)
    Expression ((Projection + Before ORDER BY))
      ReadFromMergeTree (default.AA26)
        Indexes:
          PrimaryKey
            Condition: true
            Parts: 1/1
            Granules: 1/1
I'm new to ClickHouse and have no leads at the moment; any help would be much appreciated. 0.0
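One hedged alternative that may be worth benchmarking: ClickHouse also supports a dedicated anti-join, so the NOT IN over the ~18M-element key set from the plan above can be written as a LEFT ANTI JOIN against a deduplicated key list. Table and column names are taken from the query above; whether this actually beats NOT IN here is not guaranteed.

SELECT count() AS "新参保人数"
FROM AC02_TEMP AS A
-- Keep only rows of A whose AAC001 has no match in B (i.e. no record before 2018).
LEFT ANTI JOIN
(
    SELECT DISTINCT AAC001
    FROM AC02_TEMP
    WHERE AAC030 < '2018-01-01 00:00:00'
) AS B ON A.AAC001 = B.AAC001
WHERE A.AAB301 IN (SELECT AAB301 FROM AA26 WHERE AAA148 = '130800')
  AND A.AAE200 = '41'
  AND A.AAC031 = '1'
  AND A.AAC030 >= '2018-01-01 00:00:00';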
move_factor: when the amount of available space gets lower than this factor, data automatically starts to move on the next volume if any (by default, 0.1). ClickHouse sorts existing parts by size from largest to smallest (in descending order) and selects parts with the total size that is sufficient to meet the move_factor condition. If the total size of all parts is insufficient, all parts will be moved.
From the documentation, it sounds like parts are chosen by size, with the larger parts moved to the next disk first.
But what kind of data gets merged into a single part?
Are the large parts necessarily the older data?
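As context: a part is written per INSERT (for each partition the insert touches), and parts are only merged with other parts of the same partition in the background, so a large part mostly reflects how much has been merged within its partition rather than data age by itself. To see this concretely, system.parts lists every part with its partition, size, and the disk it currently sits on; the table name below is a placeholder.

SELECT
    partition,
    name,
    rows,
    formatReadableSize(bytes_on_disk) AS size,
    disk_name,
    modification_time
FROM system.parts
WHERE database = 'default' AND table = 'my_table' AND active
ORDER BY bytes_on_disk DESC;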
To support update/delete, we chose VersionedCollapsingMergeTree. The primary key is set to the two columns (userId, orderId). But my aggregation dimension is only userId (computing each customer's "average purchasing power"), and if I aggregate the versioned collapsing tree by just that one dimension, the numbers I get are inaccurate.
Is there any way to handle this?
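One hedged sketch of the usual workaround: fold the rows at query time using the sign column, so uncollapsed +1/-1 pairs cancel out no matter which dimension you group by. The table name orders and the amount/sign column names are assumptions; only userId and orderId come from the post.

-- Average order amount per customer, folding by sign at query time.
SELECT
    userId,
    sum(amount * sign) / sum(sign) AS avg_purchase
FROM orders
GROUP BY userId
HAVING sum(sign) > 0;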