Hash Functions

Hash functions can be used for the deterministic pseudo-random shuffling of elements.

Simhash is a hash function that returns close hash values for close (similar) arguments.

halfMD5

Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64 in big-endian byte order.

The function is relatively slow (5 million short strings per second per processor core). Consider using the sipHash64 function instead.
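For a single string argument, the description above amounts to taking the first 8 bytes of the MD5 digest and reading them as a big-endian unsigned integer. The Python sketch below illustrates only that case. The helper name half_md5 is hypothetical, and the sketch assumes the combining step is a no-op when there is only one argument; how several arguments are combined is not reproduced here.

```python
import hashlib

def half_md5(data: bytes) -> int:
    """First 8 bytes of MD5(data), read as a big-endian unsigned 64-bit integer."""
    digest = hashlib.md5(data).digest()        # 16 raw bytes
    return int.from_bytes(digest[:8], "big")   # value in [0, 2**64)

print(half_md5(b"ClickHouse"))
```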

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for equal values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Example

MD4

Calculates the MD4 from a string and returns the resulting set of bytes as FixedString(16).

MD5

Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16). If you do not need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the sipHash128 function instead. To get the same result as the output of the md5sum utility, use lower(hex(MD5(s))).
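The difference between the raw 16-byte result and the md5sum-style text can be seen outside ClickHouse as well. The Python sketch below uses the standard hashlib module; the helper names are hypothetical. It shows that the lowercase hex string is just the 16 raw bytes printed as 32 hex characters.

```python
import hashlib

def md5_raw(data: bytes) -> bytes:
    """Raw digest: 16 bytes, comparable to a FixedString(16) value."""
    return hashlib.md5(data).digest()

def md5_hex_lower(data: bytes) -> str:
    """Lowercase hex text, the form printed by the md5sum utility."""
    return hashlib.md5(data).hexdigest()

raw = md5_raw(b"ClickHouse")
print(len(raw), md5_hex_lower(b"ClickHouse"))
```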

RIPEMD160

Produces RIPEMD-160 hash value.

Syntax

Parameters

Returned value

Example

Use the hex function to represent the result as a hex-encoded string.

Query:

sipHash64

Produces a 64-bit SipHash hash value.

This is a cryptographic hash function. It works at least three times faster than the MD5 hash function.

The function interprets all the input parameters as strings and calculates the hash value for each of them. It then combines the hashes by the following algorithm:

  1. The first and the second hash value are concatenated to an array which is hashed.
  2. The previously calculated hash value and the hash of the third input parameter are hashed in a similar way.
  3. This calculation is repeated for all remaining hash values of the original input.
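The three steps above describe a left fold over the per-argument hashes. The Python sketch below shows the shape of that fold only. It uses truncated BLAKE2b as a stand-in 64-bit hash (an assumption made because SipHash is not exposed by the Python standard library), so its values will not match sipHash64. The names h64 and combine are hypothetical.

```python
import hashlib
from functools import reduce

def h64(data: bytes) -> bytes:
    """Stand-in 64-bit hash (BLAKE2b truncated to 8 bytes), NOT SipHash."""
    return hashlib.blake2b(data, digest_size=8).digest()

def combine(*args: bytes) -> int:
    """Hash each argument, then fold the hashes pairwise from left to right."""
    hashes = [h64(a) for a in args]  # requires at least one argument
    # Each step hashes the concatenation of the running value and the next hash.
    folded = reduce(lambda acc, nxt: h64(acc + nxt), hashes)
    return int.from_bytes(folded, "little")

print(combine(b"a", b"b", b"c"))
```

Because each step feeds the previous result into the next hash, the combined value depends on the order of the arguments.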

Arguments

The function takes a variable number of input parameters of any of the supported data types.

Returned Value

A UInt64 data type hash value.

Note that the calculated hash values may be equal for the same input values of different argument types. This affects for example integer types of different size, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data.

Example

sipHash64Keyed

Same as sipHash64 but additionally takes an explicit key argument instead of using a fixed key.

Syntax

Arguments

Same as sipHash64, but the first argument is a tuple of two UInt64 values representing the key.

Returned value

A UInt64 data type hash value.

Example

Query:

sipHash128

Like sipHash64 but produces a 128-bit hash value, i.e. the final xor-folding of the state is done up to 128 bits.

Note

This 128-bit variant differs from the reference implementation and it's weaker. This version exists because, when it was written, there was no official 128-bit extension for SipHash. New projects should probably use sipHash128Reference.

Syntax

Arguments

Same as for sipHash64.

Returned value

A 128-bit SipHash hash value of type FixedString(16).

Example

Query:

Result:

sipHash128Keyed

Same as sipHash128 but additionally takes an explicit key argument instead of using a fixed key.

Note

This 128-bit variant differs from the reference implementation and it's weaker. This version exists because, when it was written, there was no official 128-bit extension for SipHash. New projects should probably use sipHash128ReferenceKeyed.

Syntax

Arguments

Same as sipHash128, but the first argument is a tuple of two UInt64 values representing the key.

Returned value

A 128-bit SipHash hash value of type FixedString(16).

Example

Query:

Result:

sipHash128Reference

Like sipHash128 but implements the 128-bit algorithm from the original authors of SipHash.

Syntax

Arguments

Same as for sipHash128.

Returned value

A 128-bit SipHash hash value of type FixedString(16).

Example

Query:

Result:

sipHash128ReferenceKeyed

Same as sipHash128Reference but additionally takes an explicit key argument instead of using a fixed key.

Syntax

Arguments

Same as sipHash128Reference, but the first argument is a tuple of two UInt64 values representing the key.

Returned value

A 128-bit SipHash hash value of type FixedString(16).

Example

Query:

Result:

cityHash64

Produces a 64-bit CityHash hash value.

This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and an implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.

Note that Google changed the CityHash algorithm after it was added to ClickHouse. In other words, ClickHouse's cityHash64 and Google's upstream CityHash now produce different results. ClickHouse cityHash64 corresponds to CityHash v1.0.2.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for equal values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Examples

Call example:

The following example shows how to compute the checksum of the entire table with accuracy up to the row order:
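A checksum that ignores row order can be built by hashing each row and combining the row hashes with a commutative operation such as XOR. In ClickHouse this would typically be an XOR aggregate (for example groupBitXor) applied to cityHash64 over the row. The Python sketch below illustrates the idea only. It uses truncated BLAKE2b as a stand-in row hash, so the values do not match cityHash64, and the names row_hash and table_checksum are hypothetical.

```python
import hashlib
from functools import reduce
from operator import xor

def row_hash(row: tuple) -> int:
    """Stand-in 64-bit row hash (BLAKE2b over repr), NOT CityHash."""
    digest = hashlib.blake2b(repr(row).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def table_checksum(rows) -> int:
    """XOR of all row hashes; XOR is commutative, so row order does not matter."""
    return reduce(xor, (row_hash(r) for r in rows), 0)

rows = [(1, "a"), (2, "b"), (3, "c")]
print(table_checksum(rows) == table_checksum(rows[::-1]))
```

One caveat of XOR-based checksums is that a row appearing an even number of times cancels itself out.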

intHash32

Calculates a 32-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers.

Syntax

Arguments

  • int — Integer to hash. (U)Int*.

Returned value

Example

Query:

Result:

intHash64

Calculates a 64-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers. It works faster than intHash32.

Syntax

Arguments

  • int — Integer to hash. (U)Int*.

Returned value

Example

Query:

Result:

SHA1, SHA224, SHA256, SHA512, SHA512_256

Calculates the SHA-1, SHA-224, SHA-256, SHA-512, or SHA-512/256 hash of a string and returns the resulting set of bytes as FixedString.

Syntax

The functions work fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million). We recommend using these functions only when you need a specific hash function and cannot choose a different one. Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECT queries.

Arguments

  • s — Input string for SHA hash calculation. String.

Returned value

  • SHA hash as a raw (not hex-encoded) FixedString: FixedString(20) for SHA-1, FixedString(28) for SHA-224, FixedString(32) for SHA-256, and FixedString(64) for SHA-512.
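The FixedString lengths listed above are the digest sizes of the algorithms in bytes. As a quick cross-check outside ClickHouse, the Python sketch below reads the sizes from the standard hashlib module and shows that hex encoding doubles the length. The dictionary name DIGEST_BYTES is hypothetical.

```python
import hashlib

# Digest size in bytes for each algorithm, as reported by hashlib.
DIGEST_BYTES = {
    name: hashlib.new(name, b"").digest_size
    for name in ("sha1", "sha224", "sha256", "sha512")
}

raw = hashlib.sha256(b"abc").digest()         # raw bytes
hex_text = hashlib.sha256(b"abc").hexdigest() # two hex characters per byte
print(DIGEST_BYTES, len(raw), len(hex_text))
```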

Example

Use the hex function to represent the result as a hex-encoded string.

Query:

Result:

BLAKE3

Calculates the BLAKE3 hash of a string and returns the resulting set of bytes as FixedString.

Syntax

This cryptographic hash function is integrated into ClickHouse through the BLAKE3 Rust library. The function is rather fast: it is approximately twice as fast as SHA-2, while generating hashes of the same length as SHA-256.

Arguments

  • s - input string for BLAKE3 hash calculation. String.

Return value

  • BLAKE3 hash as a byte array with type FixedString(32). FixedString.

Example

Use the hex function to represent the result as a hex-encoded string.

Query:

Result:

URLHash(url[, N])

A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.

  • URLHash(s): calculates a hash from a string without one of the trailing symbols /, ? or # at the end, if present.
  • URLHash(s, N): calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /, ? or # at the end, if present. Levels are the same as in URLHierarchy.
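The normalization for the one-argument form can be sketched as follows. This Python snippet covers only the removal of a single trailing symbol; the hash applied afterwards and the level handling of the two-argument form are not reproduced. The function name strip_one_trailing is hypothetical.

```python
def strip_one_trailing(url: str) -> str:
    """Drop at most one trailing '/', '?' or '#' from the string."""
    if url and url[-1] in "/?#":
        return url[:-1]
    return url

# These two inputs normalize to the same string, so they would hash identically.
print(strip_one_trailing("https://example.com/docs/"))
print(strip_one_trailing("https://example.com/docs"))
```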

farmFingerprint64

farmHash64

Produces a 64-bit FarmHash or Fingerprint value. farmFingerprint64 is preferred for a stable and portable value.

These functions use the Fingerprint64 and Hash64 methods respectively from all available methods.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for equal values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Example

javaHash

Calculates JavaHash from a string, Byte, Short, Integer, or Long. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.

Note that Java only supports hashing signed integers, so to hash an unsigned integer you must first cast it to the appropriate signed ClickHouse type.
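For strings, the algorithm being matched is Java's String.hashCode: a running value multiplied by 31 plus each UTF-16 code unit, wrapped to a signed 32-bit integer. The Python sketch below implements that Java rule; the function name java_string_hash is hypothetical. Whether ClickHouse treats non-ASCII input the same way is not claimed here.

```python
import struct

def java_string_hash(s: str) -> int:
    """Java's String.hashCode: h = 31*h + unit over UTF-16 code units, as signed Int32."""
    data = s.encode("utf-16-le")
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    h = 0
    for unit in units:
        h = (31 * h + unit) & 0xFFFFFFFF   # emulate 32-bit overflow
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h

print(java_string_hash("hello"))
```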

Syntax

Returned value

An Int32 data type hash value.

Example

Query:

Result:

Query:

Result:

javaHashUTF16LE

Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.

Syntax

Arguments

  • stringUtf16le — a string in UTF-16LE encoding.

Returned value

An Int32 data type hash value.

Example

Correct query with UTF-16LE encoded string.

Query:

Result:

hiveHash

Calculates HiveHash from a string.

This is just JavaHash with the sign bit zeroed out. This function is used in Apache Hive for versions before 3.0. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
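Zeroing the sign bit means masking the 32-bit Java hash with 0x7FFFFFFF, which always gives a non-negative value. The Python sketch below shows this for strings; the function names are hypothetical, and the loop assumes characters in the Basic Multilingual Plane, where a Python code point equals the UTF-16 code unit.

```python
def java_hash_u32(s: str) -> int:
    """Java-style string hash kept as an unsigned 32-bit value (BMP characters assumed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def hive_hash(s: str) -> int:
    """Java hash with the sign bit (bit 31) cleared."""
    return java_hash_u32(s) & 0x7FFFFFFF

print(hive_hash("Hello, world!"))
```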

Returned value

  • hiveHash hash value. Int32.

Example

Query:

Result:

metroHash64

Produces a 64-bit MetroHash hash value.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for equal values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

A UInt64 data type hash value.

Example

jumpConsistentHash

Calculates JumpConsistentHash from a UInt64. Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32. For more information, see JumpConsistentHash.
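The published jump consistent hash algorithm (Lamping and Veach) can be sketched in a few lines. The Python version below follows that published algorithm; whether ClickHouse's implementation agrees with it bit for bit is an assumption, and the function name is hypothetical.

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, wrapped to emulate unsigned overflow.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b

print([jump_consistent_hash(k, 10) for k in range(5)])
```

The property that makes the hash consistent is that, when the number of buckets grows from n to n + 1, a key either stays in its bucket or moves to the new bucket n.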

kostikConsistentHash

An O(1) time and space consistent hash algorithm by Konstantin 'kostik' Oblakov. Previously yandexConsistentHash.

Syntax

Alias: yandexConsistentHash (kept for backwards compatibility).

Parameters

  • input: A UInt64-type key. UInt64.
  • n: Number of buckets. UInt16.

Returned value

  • A UInt16 data type hash value.

Implementation details

It is efficient only if n <= 32768.

Example

Query:

murmurHash2_32, murmurHash2_64

Produces a MurmurHash2 hash value.

Arguments

Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for equal values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

  • The murmurHash2_32 function returns a UInt32 data type hash value.
  • The murmurHash2_64 function returns a UInt64 data type hash value.

Example

gccMurmurHash

Calculates a 64-bit MurmurHash2 hash value using the same hash seed as gcc. It is portable between Clang and GCC builds.

Syntax

Arguments

Returned value

  • Calculated hash value. UInt64.

Example

Query:

Result:

kafkaMurmurHash

Calculates a 32-bit MurmurHash2 hash value using the same hash seed as Kafka and with the highest bit cleared, to be compatible with the Default Partitioner.

Syntax

Arguments

Returned value

  • Calculated hash value. UInt32.

Example

Query:

Result:

murmurHash3_32, murmurHash3_64

Produces a MurmurHash3 hash value.

Arguments

Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for equal values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).

Returned Value

  • The murmurHash3_32 function returns a UInt32 data type hash value.
  • The murmurHash3_64 function returns a UInt64 data type hash value.

Example

murmurHash3_128

Produces a 128-bit MurmurHash3 hash value.

Syntax

Arguments

Returned value

A 128-bit MurmurHash3 hash value. FixedString(16).

Example

Query:

Result:

xxh3

Produces a 64-bit xxh3 hash value.

Syntax

Arguments

Returned value

A 64-bit xxh3 hash value. UInt64.

Example

Query:

Result:

xxHash32, xxHash64

Calculates xxHash from a string. It is available in two flavors, 32 and 64 bits.

Returned value

Note

The return type will be UInt32 for xxHash32 and UInt64 for xxHash64.

Example

Query:

Result:

See Also

ngramSimHash

Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.
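The general simhash technique can be sketched as follows: hash every n-gram, let each hash vote on every bit position, and keep the bits with a positive vote. The Python sketch below uses the first 4 bytes of MD5 as a stand-in 32-bit token hash. The token hash, the 32-bit width, and all function names are assumptions for illustration, so the values will not match ngramSimHash.

```python
import hashlib

def ngrams(s: str, n: int = 3):
    """All substrings of length n, in order."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def h32(token: str) -> int:
    """Stand-in 32-bit token hash (first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:4], "little")

def simhash32(s: str, n: int = 3) -> int:
    votes = [0] * 32
    for gram in ngrams(s, n):
        h = h32(gram)
        for bit in range(32):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    # A result bit is set when more n-gram hashes had it set than cleared.
    return sum(1 << bit for bit in range(32) if votes[bit] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")

print(hamming(simhash32("ClickHouse is fast"), simhash32("ClickHouse is fast!")))
```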

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

ngramSimHashCaseInsensitive

Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

ngramSimHashUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

ngramSimHashCaseInsensitiveUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

wordShingleSimHash

Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.
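A word shingle is a window of consecutive words. The Python sketch below shows one way to produce shingles. It splits on whitespace, which is an assumption about tokenization rather than the documented behavior, and the function name word_shingles is hypothetical.

```python
def word_shingles(text: str, size: int = 3, case_sensitive: bool = True):
    """Sliding windows of `size` consecutive words, split on whitespace."""
    if not case_sensitive:
        text = text.lower()
    words = text.split()
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]

print(word_shingles("ClickHouse is a column-oriented database", 3))
```

Each shingle would then be hashed and fed into the simhash voting step in place of an n-gram.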

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

wordShingleSimHashCaseInsensitive

Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

wordShingleSimHashUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

wordShingleSimHashCaseInsensitiveUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.

Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely it is that these strings are the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.

Returned value

Example

Query:

Result:

wyHash64

Produces a 64-bit wyHash64 hash value.

Syntax

Arguments

Returned value

Example

Query:

Result:

ngramMinHash

Splits an ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.
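The min/max scheme can be sketched as follows: hash every n-gram, take the hashnum smallest and hashnum largest hash values, and reduce each group to a single hash. The Python sketch below uses truncated BLAKE2b as a stand-in hash. The token hash, the de-duplication of n-gram hashes, the way each group is folded into one value, and all names are assumptions for illustration, so the values will not match ngramMinHash.

```python
import hashlib

def h64(data: bytes) -> int:
    """Stand-in 64-bit hash (BLAKE2b truncated to 8 bytes)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

def ngrams(s: str, n: int = 3):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def fold(values) -> int:
    """Reduce a list of 64-bit hashes to one hash by hashing their concatenation."""
    return h64(b"".join(v.to_bytes(8, "little") for v in values))

def ngram_min_hash(s: str, n: int = 3, hashnum: int = 6):
    hashes = sorted({h64(g.encode("utf-8")) for g in ngrams(s, n)})
    smallest = hashes[:hashnum]    # the hashnum minimum hashes
    largest = hashes[-hashnum:]    # the hashnum maximum hashes
    return fold(smallest), fold(largest)

print(ngram_min_hash("ClickHouse is a column-oriented database"))
```

Two strings that share the same extreme n-gram hashes produce the same tuple element, which is why a match in either position is treated as a sign of near-duplication.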

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashCaseInsensitive

Splits an ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashCaseInsensitiveUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashArg

Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHash function with the same input. Is case sensitive.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashArgCaseInsensitive

Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitive function with the same input. Is case insensitive.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashArgUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashUTF8 function with the same input. Is case sensitive.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

ngramMinHashArgCaseInsensitiveUTF8

Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.

Syntax

Arguments

  • string — String. String.
  • ngramsize — The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHash

Splits an ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashCaseInsensitive

Splits an ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashCaseInsensitiveUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.

Can be used for detection of semi-duplicate strings with tupleHammingDistance. If one of the returned hashes is the same for both strings, the strings are considered the same.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashArg

Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHash function with the same input. Is case sensitive.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashArgCaseInsensitive

Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitive function with the same input. Is case insensitive.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashArgUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashUTF8 function with the same input. Is case sensitive.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

wordShingleMinHashArgCaseInsensitiveUTF8

Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.

Syntax

Arguments

  • string — String. String.
  • shinglesize — The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
  • hashnum — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.

Returned value

Example

Query:

Result:

sqidEncode

Encodes numbers as a Sqid, which is a YouTube-like ID string. The output alphabet is abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789. Do not use this function for hashing: the generated IDs can be decoded back into the original numbers.

Syntax

Alias: sqid

Arguments

  • A variable number of UInt8, UInt16, UInt32 or UInt64 numbers.

Returned Value

A sqid String.

Example

sqidDecode

Decodes a Sqid back into its original numbers. Returns an empty array if the input string is not a valid sqid.

Syntax

Arguments

Returned Value

The numbers decoded from the sqid. Array(UInt64).

Example