How to know the operations made to calculate the Levenshtein distance between strings?

With the function stringdist, I can calculate the Levenshtein distance between strings: it counts the number of deletions, insertions and substitutions necessary to turn one string into another. For instance, stringdist("abc abc", "abcd abc") returns 1 because "d" was inserted in the second string.

Is it possible to know the operations made to obtain the Levenshtein distance between two strings? Or else, to know the characters that differ between the two strings (in this example, only "d")?

For example:

    library(stringdist)
    stringdist("abc abc", "abcde acc")
    # [1] 3

I would like to know that:

  • "d" was inserted
  • "e" was inserted
  • "b" was substituted with "c"

Or more simply, I would like to have the list ("d", "e", "c").

Tags: r, string, levenshtein-distance, stringdist

asked 8 hours ago by yaki; edited 7 hours ago by Konrad Rudolph

Comments:

  • I don't know of any R package that does what you want. Do you really need it, or are you asking for pedagogic purposes? In any case, see what Wikipedia has to say on this. And upvote.
    – Rui Barradas, 8 hours ago

  • It could help me in my research, because I'm trying to find the differences between strings. Thanks for the link.
    – yaki, 8 hours ago

  • @RuiBarradas Not only do such packages exist, their existence is the major reason for R’s popularity today. :-)
    – Konrad Rudolph, 7 hours ago

3 Answers

With adist(), you can retrieve the operations:

    drop(attr(adist("abc abc", "abcde acc", counts = TRUE), "counts"))
    # ins del sub
    #   2   0   1

From ?adist:

> If counts is TRUE, the transformation counts are returned as the
> "counts" attribute of this matrix, as a 3-dimensional array with
> dimensions corresponding to the elements of x, the elements of y, and
> the type of transformation (insertions, deletions and substitutions),
> respectively.

answered 8 hours ago by tmfmnk (edited 8 hours ago)
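
As a small illustration of the adist() attributes mentioned in this answer, the sketch below pulls the distance, the operation counts and the "trafos" string out of a single call. It uses only base R. The variable names are illustrative, and the argument is spelled `counts` in full (the shorter `count` works too, through R's partial argument matching).

```r
# adist() lives in the utils package, which is attached by default.
x <- "abc abc"
y <- "abcde acc"

d <- adist(x, y, counts = TRUE)

# The distance itself; as.vector() strips the matrix shape and attributes.
distance <- as.vector(d)

# Number of insertions, deletions and substitutions, as a named vector.
op_counts <- drop(attr(d, "counts"))

# One letter per edit step: M = match, I = insert, D = delete, S = substitute.
trafos <- as.vector(attr(d, "trafos"))

print(distance)   # 3
print(op_counts)  # ins = 2, del = 0, sub = 1
print(trafos)     # "MMMIIMMSM"
```

Reading the "trafos" string left to right gives the order in which the operations apply when turning x into y.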












Comments:

  • Thanks, it helps me a lot! Do you know if there is a function to directly get the characters corresponding to these operations? Otherwise, I could try to write a function using attr(adist("abda cc", "abc abc", counts = TRUE), "trafos"), which returns "MMSDMSIM", where M = match, S = substitute, D = delete, I = insert.
    – yaki, 8 hours ago

  • I don't know of any handy function that will do it. However, I assume that playing around with trafos will lead you to the desired results.
    – tmfmnk, 8 hours ago

Building off tmfmnk's answer and the suggestion to play around with the "trafos" attribute, here's a function that shows a table of all the characters inserted or substituted, and how many times each was inserted or substituted. If you set all_actions = TRUE, it also shows matches.

    f <- function(x, y, all_actions = FALSE) {
      o <- adist(x, y, counts = TRUE)
      # Pair each character of y with its edit step. This lines up only when
      # trafos contains no deletions ('D'), since deletions have no character in y.
      cva <- list(
        char = strsplit(y, '')[[1]],
        action = strsplit(attr(o, "trafos"), '')[[1]]
      )
      if (!all_actions) {
        cva <- lapply(cva, '[', cva$action %in% c('I', 'S'))
      }
      do.call(table, cva)
    }

    f(x = "abc abc", y = "abcde acc")
    #     action
    # char I S
    #    c 0 1
    #    d 1 0
    #    e 1 0

    f(x = "abc abc", y = "abcde acc", all_actions = TRUE)
    #     action
    # char I M S
    #      0 1 0
    #    a 0 2 0
    #    b 0 1 0
    #    c 0 2 1
    #    d 1 0 0
    #    e 1 0 0

answered 7 hours ago by IceCreamToucan (edited 5 hours ago)
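
The function above pairs the "trafos" letters with the characters of y, which lines up only when there are no deletions. One possible way to cover deletions as well is to track a position in each string while walking through the edit steps. The sketch below does that in base R; `edit_ops` is a hypothetical helper name, not a function from any package, and it assumes a single pair of strings.

```r
# Hypothetical helper: list every non-matching edit step that adist()
# reports when turning x into y, including deletions.
edit_ops <- function(x, y) {
  d <- adist(x, y, counts = TRUE)
  steps <- strsplit(as.vector(attr(d, "trafos")), "")[[1]]
  xc <- strsplit(x, "")[[1]]
  yc <- strsplit(y, "")[[1]]

  # M, S and D each consume one character of x; M, S and I one character of y.
  # The cumulative counts give the position reached in each string.
  xi <- cumsum(steps %in% c("M", "S", "D"))
  yi <- cumsum(steps %in% c("M", "S", "I"))
  xi[steps == "I"] <- NA  # an insertion has no source character
  yi[steps == "D"] <- NA  # a deletion has no target character

  ops <- data.frame(op = steps, from = xc[xi], to = yc[yi],
                    stringsAsFactors = FALSE)
  ops[ops$op != "M", ]
}

ops <- edit_ops("abc abc", "abcde acc")
ops$op    # "I" "I" "S"
ops$from  # NA  NA  "b"
ops$to    # "d" "e" "c"

# A deletion shows up with its source character and no target character.
del <- edit_ops("abcd", "abd")
del$op    # "D"
del$from  # "c"
```

For the strings in the question, the `to` column is the list ("d", "e", "c") that was asked for. When several alignments have the same cost, the result depends on which one adist() happens to report.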





















This is known as the Needleman–Wunsch algorithm. It calculates both the distance between two strings and the so-called traceback, which allows you to reconstruct the alignment.

Since this problem mostly crops up in biology when comparing biological sequences, this algorithm and related ones are implemented in the R package Biostrings, which is part of Bioconductor.

Since this package implements a more general solution than the simple Levenshtein distance, the usage is unfortunately more complex, and the usage vignette is correspondingly long. But the fundamental usage for your purposes is as follows:

    library(Biostrings)

    dist_mat = diag(27L)
    colnames(dist_mat) = rownames(dist_mat) = c(letters, ' ')

    result = pairwiseAlignment(
        "abc abc", "abcde acc",
        substitutionMatrix = dist_mat,
        gapOpening = 1, gapExtension = 1
    )

This won’t simply give you the list c('b', 'c', 'c'), though, because that list does not fully represent what actually happened here. Instead, it will return an alignment between the two strings. This can be represented as a sequence with substitutions and gaps:

    score(result)
    # [1] 3
    aligned(result)
    as.matrix(aligned(result))
    #      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
    # [1,] "a"  "b"  "c"  "-"  "-"  " "  "a"  "b"  "c"

aligned(result): for each character in the second string, it provides the corresponding character in the original string, replacing inserted characters by -. Basically, this is a “recipe” for transforming the first string into the second string. Note that it will only contain insertions and substitutions, not deletions. To get these, you need to perform the alignment the other way round (i.e. swapping the string arguments).
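
To make the idea of a traceback concrete without installing Bioconductor, here is a minimal base-R sketch. It is not the Biostrings implementation: it fills the usual dynamic-programming table for the plain Levenshtein distance (the Wagner–Fischer scheme, which has the same fill-then-trace-back shape as Needleman–Wunsch) and then walks back through the table to recover one optimal alignment. The name `lev_align` is hypothetical.

```r
# Illustrative sketch: Levenshtein distance plus one optimal alignment.
lev_align <- function(x, y) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(y, "")[[1]]
  n <- length(a)
  m <- length(b)

  # D[i + 1, j + 1] = distance between the first i characters of x
  # and the first j characters of y.
  D <- matrix(0L, n + 1, m + 1)
  D[, 1] <- 0:n
  D[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      D[i + 1, j + 1] <- min(D[i, j] + cost,    # match or substitution
                             D[i, j + 1] + 1L,  # deletion from x
                             D[i + 1, j] + 1L)  # insertion into x
    }
  }

  # Traceback: walk from the bottom-right cell to the top-left cell,
  # recording which move produced each cell. "-" marks a gap.
  i <- n
  j <- m
  top <- character(0)
  bottom <- character(0)
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && D[i + 1, j + 1] == D[i, j] + (a[i] != b[j])) {
      top <- c(a[i], top); bottom <- c(b[j], bottom)
      i <- i - 1; j <- j - 1
    } else if (i > 0 && D[i + 1, j + 1] == D[i, j + 1] + 1L) {
      top <- c(a[i], top); bottom <- c("-", bottom)
      i <- i - 1
    } else {
      top <- c("-", top); bottom <- c(b[j], bottom)
      j <- j - 1
    }
  }

  list(distance = D[n + 1, m + 1],
       x = paste(top, collapse = ""),
       y = paste(bottom, collapse = ""))
}

res <- lev_align("abc abc", "abcde acc")
res$distance  # 3
res$x         # "abc-- abc"
res$y         # "abcde acc"
```

A gap in the first row is an insertion, a gap in the second row is a deletion, and a column with two different characters is a substitution. When several alignments share the minimal cost, this sketch returns whichever one its tie-breaking order reaches first.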























Comments:

  • Unfortunately the code above requires you to specify dist_mat manually, such that it contains one row and one column for each character that your strings might contain. The code shown in this answer thus only allows lower-case letters and spaces, nothing else.
    – Konrad Rudolph, 6 hours ago
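
One way to avoid listing the alphabet by hand might be to derive it from the strings being compared. The sketch below builds a matrix of the same form as dist_mat in the answer (1 on the diagonal, 0 elsewhere) using base R only. `make_subst_matrix` is a hypothetical helper name, and whether pairwiseAlignment() accepts every possible character as a row or column name is an assumption this sketch does not check.

```r
# Hypothetical helper: identity-style matrix over the characters that
# actually occur in x and y, in order of first appearance.
make_subst_matrix <- function(x, y) {
  chars <- unique(c(strsplit(x, "")[[1]], strsplit(y, "")[[1]]))
  m <- diag(length(chars))
  dimnames(m) <- list(chars, chars)
  m
}

dist_mat <- make_subst_matrix("abc abc", "abcde acc")
rownames(dist_mat)  # "a" "b" "c" " " "d" "e"
dist_mat["a", "a"]  # 1
dist_mat["a", "b"]  # 0
```

The resulting matrix has one row and one column per distinct character, so upper-case letters, digits or punctuation in the inputs get their own entries.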















            3














            This is known as the Needleman–Wunsch algorithm. It calculates both the distance between two strings as well as the so-called traceback, which allows you to reconstruct the alignment.



            Since this problem mostly crops up in biology when comparing biological sequences, this algorithm (and related ones) are implemented in the R package Biostrings, which is part of Bioconductor.



            Since this package implements are more general solution than the simple Levenshtein distance, the usage is unfortunately more complex, and the usage vignette is correspondingly long. But the fundamental usage for your purposes is as follows:



            library(Biostrings)

            dist_mat = diag(27L)
            colnames(dist_mat) = rownames(dist_mat) = c(letters, ' ')

            result = pairwiseAlignment(
            "abc abc", "abcde acc",
            substitutionMatrix = dist_mat,
            gapOpening = 1, gapExtension = 1
            )


            This won’t simply give you the list c('b', 'c', 'c'), though, because that list does not fully represent what actually happened here. Instead, it will return an alignment between the two strings. This can be represented as a sequence with substitutions and gaps:



            score(result)
            # [1] 3
            aligned(result)
            as.matrix(aligned(result))
            # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
            # [1,] "a" "b" "c" "-" "-" " " "a" "b" "c"
            aligned(result)


            — For each character in the second string it provides the corresponding character in the original string, replacing inserted characters by -. Basically, this is a “recipe” for transforming the first string into the second string. Note that it will only contain insertions and substitutions, not deletions. To get these, you need to perform the alignment the other way round (i.e. swapping the string arguments).






            share|improve this answer























            • Unfortunately the code above requires you to specify dist_mat manually such that it contains one row and column for each character that your string might contain. The code shown in this answer thus only allows lower-case letters and spaces, nothing else.

              – Konrad Rudolph
              6 hours ago






























            answered 6 hours ago









            Konrad Rudolph

            411k103 gold badges805 silver badges1051 bronze badges




























































