Clickhouse 字符串函数

2022-10-25 leiting (9540阅读)

标签数据库

1.基本字符串操作

1.1判断是否为空

clickhouse> select empty(''),empty(' '),notEmpty(''),notEmpty(' ');

┌─empty('')─┬─empty(' ')─┬─notEmpty('')─┬─notEmpty(' ')─┐

│ 1 │ 0 │ 0 │ 1 │

└───────────┴────────────┴──────────────┴───────────────┘

1 rows in set. Elapsed: 0.007 sec.

clickhouse的函数名严格区分大小写。

'' 表示空字符串

' ' 表示的一个空格

1.2拼接

concat

clickhouse> select concat('China',' Huawei',' Shenzhen') str;

┌─str───────────────────┐

│ China Huawei Shenzhen │

└───────────────────────┘

clickhouse> select concat('China', null) str;

┌─str──┐

│ ᴺᵁᴸᴸ │

└──────┘

1.3截取：

substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

三个函数的用法和含义是一样的。

clickhouse> select substring('china',1,2) str;

┌─str─┐

│ ch │

└─────┘

1.4字符串长度

length

char_length, CHAR_LENGTH

character_length, CHARACTER_LENGTH

clickhouse> select length('China'),char_length('China'),character_length('China');

┌─length('China')─┬─char_length('China')─┬─character_length('China')─┐

│ 5 │ 5 │ 5 │

└─────────────────┴──────────────────────┴───────────────────────────┘

clickhouse> select length('中国'),char_length('中国'),character_length('中国');

┌─length('中国')─┬─char_length('中国')─┬─character_length('中国')─┐

│ 6 │ 2 │ 2 │

└────────────────┴─────────────────────┴──────────────────────────┘

1.5字符串重复:

repeat(s, n)

clickhouse> select repeat('China ',5)

┌─repeat('China ', 5)────────────┐

│ China China China China China │

└────────────────────────────────┘

1.6字符串反转：

reverse

1.7追加：

appendTrailingCharIfAbsent(s, c)

1.8获取字符串的开头结尾:

endsWith(s, suffix) startsWith(str, prefix)

1.9 字符串大小写转换:

lower, lcase

upper, ucase

1.10 字符串trim函数:trim

trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)

trimLeft(input_string) ==ltrim(input_string)

trimRight(input_string) ==rtrim(input_string)

trimBoth(input_string) ==trim(input_string)

1.11 字符串加密和解密:

base64Encode(s)

base64Decode(s)

tryBase64Decode(s)

CRC32(s)

CRC32IEEE(s)

CRC64(s)

1.12 转换函数：

convertCharset(s, from, to)

1.13其他函数：

concatAssumeInjective

format(pattern, s0, s1, …)

isValidUTF8

toValidUTF8

normalizeQuery

normalizedQueryHash

2.字符串查找

2.1 字符串定位：

position(haystack, needle), locate(haystack, needle),positionCaseInsensitive

语法：

position(haystack, needle[, start_pos])

locate(haystack, needle[, start_pos])

positionCaseInsensitive(haystack, needle[, start_pos])

2.2 多个字符串定位：

multiSearchAllPositions,multiSearchAllPositionsCaseInsensitive

multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])

multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])

multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])

multiSearchAny(haystack, [needle1, needle2, …, needlen])

2.3 字符串匹配：

match(haystack, pattern)

multiMatchAny(haystack, [pattern1, pattern2, …, patternn])

multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])

multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])

2.4 字符串提取：

extract(haystack, pattern)

extractAll(haystack, pattern)

extractAllGroupsHorizontal

extractAllGroupsVertical

注意：extractAllGroupsHorizontal function is slower than extractAllGroupsVertical.

2.5字符串模糊查询：

like(haystack, pattern), haystack LIKE pattern operator

notLike(haystack, pattern), haystack NOT LIKE pattern operator

ilike

ilike(haystack, pattern)

2.6

ngramDistance(haystack, needle)

ngramSearch(haystack, needle)

2.7字符串出现的次数：

countSubstrings(haystack, needle)

countSubstrings(haystack, needle[, start_pos])

类似的函数：

countSubstrings()/countSubstringsCaseInsensitive()/countSubstringsCaseInsensitiveUTF8()

该功能在clickhouse 20.12.+版本实现。

clickhouse> select countSubstrings('I love China,he love china too,she also love china','china') china_count;

3.字符串替换

replaceOne(haystack, pattern, replacement)

replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)

replaceRegexpOne(haystack, pattern, replacement)

replaceRegexpAll(haystack, pattern, replacement)

regexpQuoteMeta(s)

Predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -.

4.字符串切割

splitByChar(separator, s)

splitByString(separator, s)

arrayStringConcat(arr[, separator])

alphaTokens(s)

extractAllGroups(text, regexp)

5.UTF8 相关的函数：

Clickhouse> select * from system.functions where name like '%UTF%';

SELECT *

FROM system.functions

WHERE name LIKE '%UTF%'

┌─name────────────────────────────────────────┬─is_aggregate─┬─case_insensitive─┬─alias_to─┐

│ ngramSearchCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ ngramDistanceCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ multiSearchFirstIndexCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ multiSearchAnyUTF8 │ 0 │ 0 │ │

│ positionCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ positionUTF8 │ 0 │ 0 │ │

│ reverseUTF8 │ 0 │ 0 │ │

│ lengthUTF8 │ 0 │ 0 │ │

│ ngramSearchUTF8 │ 0 │ 0 │ │

│ randomStringUTF8 │ 0 │ 0 │ │

│ javaHashUTF16LE │ 0 │ 0 │ │

│ multiSearchFirstIndexUTF8 │ 0 │ 0 │ │

│ isValidUTF8 │ 0 │ 0 │ │

│ multiSearchAllPositionsCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ multiSearchFirstPositionCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ lowerUTF8 │ 0 │ 0 │ │

│ multiSearchAllPositionsUTF8 │ 0 │ 0 │ │

│ multiSearchFirstPositionUTF8 │ 0 │ 0 │ │

│ substringUTF8 │ 0 │ 0 │ │

│ multiSearchAnyCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ toValidUTF8 │ 0 │ 0 │ │

│ ngramDistanceUTF8 │ 0 │ 0 │ │

│ upperUTF8 │ 0 │ 0 │ │

└─────────────────────────────────────────────┴──────────────┴──────────────────┴──────────┘

23 rows in set. Elapsed: 0.016 sec.

参考：

https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/

————————————————

原文链接：https://blog.csdn.net/vkingnew/article/details/108229090

IT PHP 编程语言开发编程 Linux 科技 Elasticsearch 数据库面试 HTML/CSS/XML 网络 JAVA NoSQL 操作系统 C/C++ Golang Git 算法正则表达式 Redis 互联网 MySql 软件运维 JavaScript 国际商业架构设计 Mac OS TCP/IP Excel Windows Oracle Socket VR Vim MongoDB 运营 Python MemCache 硬件电子娱乐设计摄影 nginx 游戏 WordPress HTTP 团建数码电器 Docker 大模型

赛默飞明星产品引领色谱分析领域发展如何以创新色谱分析解决方案引领实验室高效运作？ git 通过 SSH 账号验证限制提交到 master 分支灵活人才如何破解供应链战略升级困局？ BD经理与销售经理如何重塑企业增长引擎？代谢组学研究中技术整合驱动生命科学新突破 win10 安装tesla v100 驱动配合R7 240输出（R5 240 以上）

略微加速

略速 - 互联网笔记