• In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating...
    25 KB (3,204 words) - 13:19, 10 March 2025
  • Minhash and LSH for Google News personalization. MinHash w-shingling Count–min sketch Locality-sensitive hashing Cyphers, Bennett (2021-03-03). "Google's FLoC...
    3 KB (284 words) - 04:35, 14 November 2024
  • Thumbnail for Levenshtein distance
    engine that implements edit distance) Manhattan distance Metric space MinHash Optimal matching algorithm Numerical taxonomy Sørensen similarity index...
    21 KB (2,434 words) - 07:35, 10 March 2025
  • Thumbnail for Salesforce
    Toopher, a mobile authentication company, Tempo, an AI calendar app, and MinHash, an AI platform. The company also acquired SteelBrick, a software company...
    69 KB (5,906 words) - 08:21, 30 May 2025
  • exclusive ors and circular shifts. MinHash – Data mining technique w-shingling Daniel Lemire, Owen Kaser: Recursive n-gram hashing is pairwise independent, at...
    14 KB (2,014 words) - 07:21, 28 May 2025
  • improves scalability. Additive smoothing Feature extraction Machine learning MinHash Vector space model w-shingling McTear et al 2016, p. 167. Sivic, Josef...
    8 KB (926 words) - 02:02, 12 May 2025
  • trie Hash list Hash table Hash tree Hash trie Koorde Prefix hash tree Rolling hash MinHash Ctrie Many graph-based data structures are used in computer...
    9 KB (914 words) - 05:55, 20 March 2025
  • semantic analysis Local tangent space alignment Locality-sensitive hashing MinHash Multifactor dimensionality reduction Nearest neighbor search Nonlinear...
    21 KB (2,248 words) - 07:14, 18 April 2025
  • Thumbnail for Jaccard index
    are not well defined in these cases. The MinHash min-wise independent permutations locality sensitive hashing scheme may be used to efficiently compute...
    25 KB (3,922 words) - 17:47, 29 May 2025
  • document Locality-sensitive hashing – Algorithmic technique using hashing MinHash – Data mining technique Moody, John (1989). "Fast learning in multi-resolution...
    20 KB (3,124 words) - 18:26, 13 May 2024
  • methods that require a high-quality hash function, including hopscotch hashing, cuckoo hashing, and the MinHash technique for estimating the size of...
    19 KB (2,762 words) - 13:24, 2 September 2024
  • sketch not a linear sketch, it is still mergeable. Feature hashing Locality-sensitive hashing MinHash The following discussion assumes that only "positive"...
    10 KB (1,436 words) - 03:16, 28 March 2025
  • Thumbnail for Intersection (set theory)
    for combinations of sets Logical conjunction – Logical connective AND MinHash – Data mining technique Naive set theory – Informal set theories Symmetric...
    12 KB (1,733 words) - 23:16, 26 December 2023
  • D., et al. "Mash: fast genome and metagenome distance estimation using MinHash." Genome biology 17.1 (2016): 1-14. Bray, J. Roger; Curtis, J. T. (1957)...
    14 KB (1,794 words) - 21:26, 5 March 2025
  • processes can consist of dimensionality -reduction techniques, such as Minhash, and clusterization algorithms such as k-medoids and affinity propagation...
    10 KB (1,175 words) - 23:21, 24 May 2025
  • neighbor algorithm Linear least squares Locality sensitive hashing Maximum inner-product search MinHash Multidimensional analysis Nearest-neighbor interpolation...
    27 KB (3,341 words) - 05:46, 24 February 2025
  • Collocation Feature engineering Hidden Markov model Longest common substring MinHash n-tuple String kernel Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal;...
    20 KB (2,647 words) - 06:45, 26 May 2025
  • approach using minhash. In this method, given a number k, a genomic sequence is transformed into a shorter sketch through a random hash function on the...
    14 KB (1,992 words) - 22:54, 9 March 2025
  • variables. The MinHash algorithm can be implemented using a log ⁡ 1 ϵ {\displaystyle \log {\tfrac {1}{\epsilon }}} -independent hash function as was...
    15 KB (2,001 words) - 14:49, 17 October 2024
  • Bloom filter (category Hash-based data structures)
    portal Count–min sketch – Probabilistic data structure in computer science Feature hashing – Vectorizing features using a hash function MinHash – Data mining...
    90 KB (10,788 words) - 18:48, 28 May 2025
  • Bag-of-words model Jaccard index Concept mining k-mer MinHash n-gram Rabin fingerprint Rolling hash Vector space model Broder; Glassman; Manasse; Zweig...
    3 KB (318 words) - 09:59, 4 June 2025
  • The Secure Hash Algorithms are a family of cryptographic hash functions published by the National Institute of Standards and Technology (NIST) as a U.S...
    3 KB (464 words) - 07:05, 4 October 2024
  • In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability...
    31 KB (4,202 words) - 22:41, 1 June 2025
  • Metropolis–Hastings algorithm Mexican paradox Microdata (statistics) Midhinge Mid-range MinHash Minimax Minimax estimator Minimisation (clinical trials) Minimum chi-square...
    87 KB (8,280 words) - 23:04, 12 March 2025
  • S2CID 196180156. Criscuolo A (November 2020). "On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic...
    44 KB (2,453 words) - 03:20, 15 May 2025
  • Conference on Artificial Intelligence Michael Kearns (computer scientist) MinHash Mixture model Mlpy Models of DNA evolution Moral graph Mountain car problem...
    39 KB (3,386 words) - 19:51, 2 June 2025
  • In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only n / m {\displaystyle n/m} keys...
    22 KB (2,597 words) - 01:56, 26 May 2025
  • Thumbnail for Quotient filter
    Quotient filter (category Hashing)
    just the quotients and remainders. MinHash Bloom filter Cuckoo filter Cleary, John G. (September 1984). "Compact hash tables using bidirectional linear...
    20 KB (2,664 words) - 05:02, 27 December 2023
  • billion byte-pair-encoded tokens. Fuzzy deduplication used Apache Spark's MinHashLSH.: 9  Other sources are 19 billion tokens from WebText2 representing...
    55 KB (4,923 words) - 20:03, 12 May 2025
  • Thumbnail for Andrei Broder
    set-intersection problem and "min-hashing" or to construct "sketches" of sets. This was a pioneering effort in the area of locality-sensitive hashing. In 1998, he co-invented...
    9 KB (844 words) - 06:45, 12 December 2024