edits

A collection of edit distance algorithms in Crystal.

Includes Levenshtein, Restricted Edit (Optimal Alignment), and Damerau-Levenshtein distances, as well as Jaro and Jaro-Winkler similarity.

Installation

Add this to your application's shard.yml:

dependencies:
  edits:
    github: tcrouch/edits.cr

Usage

require "edits"

Levenshtein variants

Calculate the edit distance between two sequences with variants of the Levenshtein distance algorithm.

Edits::Levenshtein.distance "raked", "bakers"
# => 3
Edits::RestrictedEdit.distance "iota", "atom"
# => 3
Edits::DamerauLevenshtein.distance "acer", "earn"
# => 3

|                      | Levenshtein | Restricted Damerau-Levenshtein | Damerau-Levenshtein |
|----------------------|-------------|--------------------------------|---------------------|
| "raked" vs. "bakers" | 3           | 3                              | 3                   |
| "iota" vs. "atom"    | 4           | 3                              | 3                   |
| "acer" vs. "earn"    | 4           | 4                              | 3                   |
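
To make the first column of the table concrete, here is a minimal sketch of the classic two-row dynamic programming formulation of Levenshtein distance. It is an illustration only, not the shard's implementation; the name `levenshtein_sketch` is hypothetical and does not exist in `edits`.

```crystal
# Illustrative only: `levenshtein_sketch` is a hypothetical name,
# not part of the edits shard.
def levenshtein_sketch(a : String, b : String) : Int32
  s = a.chars
  t = b.chars

  # prev[j] is the distance between the processed prefix of s
  # and the first j characters of t.
  prev = (0..t.size).to_a
  curr = Array(Int32).new(t.size + 1, 0)

  s.each_with_index do |sc, i|
    curr[0] = i + 1 # deleting the first i + 1 characters of s
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      deletion = prev[j + 1] + 1
      insertion = curr[j] + 1
      substitution = prev[j] + cost
      curr[j + 1] = Math.min(Math.min(deletion, insertion), substitution)
    end
    prev, curr = curr, prev
  end

  prev[t.size]
end

puts levenshtein_sketch("raked", "bakers") # => 3
puts levenshtein_sketch("iota", "atom")    # => 4
```

Plain Levenshtein has no transposition operation, which is why "iota" vs. "atom" costs 4 here but 3 under the two Damerau variants.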

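Levenshtein and Restricted Edit distances accept an optional maximum bound. When the true distance exceeds the bound, the bound itself is returned: the two strings below are 6 edits apart, but the result is capped at 3.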

Edits::Levenshtein.distance "fghijk", "abcde", 3
# => 3
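
One way a maximum bound can save work is by abandoning the computation as soon as the answer is known to reach the bound. The following is a sketch under that assumption; `bounded_levenshtein_sketch` is a hypothetical name, and the shard's actual strategy may differ.

```crystal
# Illustrative only: `bounded_levenshtein_sketch` is a hypothetical
# name, not part of the edits shard.
def bounded_levenshtein_sketch(a : String, b : String, max : Int32) : Int32
  s = a.chars
  t = b.chars

  # The length difference is a lower bound on the distance.
  return max if (s.size - t.size).abs >= max

  prev = (0..t.size).to_a
  curr = Array(Int32).new(t.size + 1, 0)

  s.each_with_index do |sc, i|
    curr[0] = i + 1
    row_min = curr[0]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      best = Math.min(Math.min(prev[j + 1] + 1, curr[j] + 1), prev[j] + cost)
      curr[j + 1] = best
      row_min = best if best < row_min
    end
    # Every alignment passes through each row and costs only accumulate,
    # so the final distance cannot be smaller than the smallest value
    # in this row.
    return max if row_min >= max
    prev, curr = curr, prev
  end

  Math.min(prev[t.size], max)
end

puts bounded_levenshtein_sketch("fghijk", "abcde", 3)  # => 3
puts bounded_levenshtein_sketch("fghijk", "abcde", 10) # => 6
```

With a bound of 3 the loop exits after the third row; with a bound of 10 it runs to completion and reports the true distance.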

The convenience method most_similar searches for the best match to a given sequence from a collection. It is similar to using min_by, but leverages a maximum bound.

Edits::RestrictedEdit.most_similar "atom", ["iota", "tome", "mown", "tame"]
# => "tome"
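
The idea of combining a `min_by`-style search with a maximum bound can be sketched as follows: the best distance found so far becomes the bound for every later candidate. This is an assumption about the general technique, not a copy of the shard's code. Both `capped_distance` and `most_similar_sketch` are hypothetical names, and the sketch uses plain Levenshtein distance rather than Restricted Edit.

```crystal
# Illustrative only: both methods are hypothetical stand-ins,
# not part of the edits shard.

# Levenshtein distance, returning `max` once the result must reach it.
def capped_distance(a : String, b : String, max : Int32) : Int32
  s = a.chars
  t = b.chars
  return max if (s.size - t.size).abs >= max

  prev = (0..t.size).to_a
  curr = Array(Int32).new(t.size + 1, 0)

  s.each_with_index do |sc, i|
    curr[0] = i + 1
    row_min = curr[0]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      best = Math.min(Math.min(prev[j + 1] + 1, curr[j] + 1), prev[j] + cost)
      curr[j + 1] = best
      row_min = best if best < row_min
    end
    return max if row_min >= max
    prev, curr = curr, prev
  end

  Math.min(prev[t.size], max)
end

# Returns the first candidate with the smallest distance, or nil
# for an empty collection.
def most_similar_sketch(target : String, candidates : Array(String)) : String?
  best = nil
  best_distance = Int32::MAX

  candidates.each do |candidate|
    # Candidates that cannot beat the current best come back capped
    # at best_distance and are skipped.
    distance = capped_distance(target, candidate, best_distance)
    if distance < best_distance
      best = candidate
      best_distance = distance
    end
  end

  best
end

puts most_similar_sketch("atom", ["iota", "tome", "mown", "tame"]) # => tome
```

Once "tome" is found at distance 2, the remaining candidates are only compared against a bound of 2, so their full distances never need to be computed.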

Jaro & Jaro-Winkler

Calculate the Jaro and Jaro-Winkler similarity/distance of two sequences.

Edits::Jaro.similarity "information", "informant"
# => 0.90235690235690236
Edits::Jaro.distance "information", "informant"
# => 0.097643097643097643

Edits::JaroWinkler.similarity "information", "informant"
# => 0.94141414141414137
Edits::JaroWinkler.distance "information", "informant"
# => 0.05858585858585863
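
For readers unfamiliar with these measures, the sketch below follows the textbook definitions: Jaro counts characters that match within a sliding window and penalizes matched characters that appear in a different order, and Jaro-Winkler adds a bonus for a shared prefix of up to four characters. The names `jaro_sketch` and `jaro_winkler_sketch` and the `prefix_scale` parameter are hypothetical. The sketch applies the prefix bonus unconditionally, whereas some implementations apply it only above a similarity threshold.

```crystal
# Illustrative only: hypothetical names, not part of the edits shard.
def jaro_sketch(a : String, b : String) : Float64
  s = a.chars
  t = b.chars
  return 1.0 if s.empty? && t.empty?
  return 0.0 if s.empty? || t.empty?

  # Characters match only if they are at most `window` positions apart.
  window = Math.max(Math.max(s.size, t.size) // 2 - 1, 0)
  s_matched = Array(Bool).new(s.size, false)
  t_matched = Array(Bool).new(t.size, false)

  matches = 0
  s.each_with_index do |sc, i|
    lo = Math.max(i - window, 0)
    hi = Math.min(i + window, t.size - 1)
    (lo..hi).each do |j|
      next if t_matched[j] || t[j] != sc
      s_matched[i] = true
      t_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches == 0

  # Walk both sets of matched characters in order and count the
  # positions where they disagree (each transposition counts twice).
  k = 0
  out_of_order = 0
  s.each_with_index do |sc, i|
    next unless s_matched[i]
    while !t_matched[k]
      k += 1
    end
    out_of_order += 1 if sc != t[k]
    k += 1
  end

  m = matches.to_f
  (m / s.size + m / t.size + (m - out_of_order // 2) / m) / 3.0
end

def jaro_winkler_sketch(a : String, b : String, prefix_scale : Float64 = 0.1) : Float64
  jaro = jaro_sketch(a, b)
  s = a.chars
  t = b.chars

  # Length of the common prefix, capped at four characters.
  limit = Math.min(4, Math.min(s.size, t.size))
  prefix = 0
  while prefix < limit && s[prefix] == t[prefix]
    prefix += 1
  end

  jaro + prefix * prefix_scale * (1.0 - jaro)
end

puts jaro_sketch("information", "informant")         # about 0.9024
puts jaro_winkler_sketch("information", "informant") # about 0.9414
```

For "information" and "informant" there are 9 matching characters and 1 transposition, giving (9/11 + 9/9 + 8/9) / 3 for Jaro. The shared prefix "info" then closes 40% of the remaining gap to 1.0.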

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors