GitXplorerGitXplorer
L

strings

public
1 stars
1 forks
0 issues

Commits

List of commits on branch main.
Unverified
56ea226d38a9d275ac48074123133be62d6d3b99

[Package] Update package dependencies info

LLucianoPAlmeida committed 6 months ago
Unverified
954025e90c912a01aa5e2e122cefa1fd5dea75dd

Use accessor to array that does not perform bound checking

LLucianoPAlmeida committed 2 years ago
Unverified
9f9efd3d054604597686b3d72745797d4f7e9732

Fixing issue with wrong index

LLucianoPAlmeida committed 2 years ago
Unverified
7424563fd66128fb2a9c651ec5d3de789ad84111

Remove cost option

LLucianoPAlmeida committed 2 years ago
Unverified
79076a978b88ec28d4eadf1573178a85cf259da6

[NFC] Minors

LLucianoPAlmeida committed 2 years ago
Unverified
8193bfe334c0e47791247791dc3eb375fa7a3ba2

Swiftlint

LLucianoPAlmeida committed 3 years ago

README

The README file for this repository.

strings

license Swift Xcode SPM Swift

Just playing around with strings... It started as a collection of string algorithms implemented in order to learn a bit more about them.

Making it a package so people can import if they find useful.

Algorithms Available

Levenshtein

An implementation of Levenshtein distance algorithm with a few algorithms and swift optimization. Step by step:

  • An optimization by trimming equal prefixes and suffixes from source and destination and since it doesn't affect the result and we can reduce the cost. Example "abcd" -> "auid" == "bc" -> "ui".

  • Early exit for empty cases.

  • Allocate two rows (current and previous) which at the initial stage of the algorithm is first and second row initialized accordingly the first row is 0...destination.count and second row is 1 at initial position and all zeros. Note that the original algorithm pseudo code describes a matrix source.count x destination.count but since through the algorithm iterations it only operates in terms of the current and the previous rows a size allocation optimization can be done.

  • Iterate through "all the rows" calculating the minimum of insertion, deletion and substitution + 1. Pseudo code:

if s[i] = t[j]:
    substitutionCost := 0
else:
    substitutionCost := 1

d[i, j] := minimum(d[i-1, j] + 1,                   // deletion
                   d[i, j-1] + 1,                   // insertion
                   d[i-1, j-1] + substitutionCost)  // substitution

Given that we don't have the whole matrix, the iteration is done by at the end of operating the current row we swap the previous and current so the previous row becomes the current a we reuse the area which was the previous(that will be discarded) for operate the new current initializing the first position of the current accordingly.

Jaro and Jaro-Winkler

Hamming

  • The Hamming distance between two strings of equal length is the count of positions in which the caracter is different in the between the two.

TODO

  • Implement other algorithms e.g. Damerau–Levenshtein(differs from the Levenshtein by including also transponsitions on the calculation in addition to the insertion, substitution and deletion).

Usage

Is very simple to use...

import strings

"friend".levenshteinDistance(to: "fresh") // 3
"adlsajdlsa".jaroDistance(to: "asv") // ~= 0.6222
"friend".hammingDistance(to: "frinnd") // 1

Instalation

You can use The Swift Package Manager to install by adding the proper description to your Package.swift

import PackageDescription

let package = Package(
    name: "YOUR_PROJECT_NAME",
    targets: [],
    dependencies: [
        .package(url: "https://github.com/LucianoPAlmeida/strings.git", from: "0.0.1")
    ]
)

Then add the target as dependency

.target(
    name: "YOUR_TARGET_NAME",
    dependencies: [
        "strings", ... 
    ]
),

Or you can manually just copy the implementations into your project.

Benchmarks

Benchmark was there to make us strive to the most performant implementation given the knowledge of the algorithms and of the swift language.

References

Licence

strings is released under the MIT License.