


word frequency from file using partial match

















I have a text file like this:

    tom
    and
    jerry
    went
    to
    america
    and
    england

I want to get the frequency of each word.

When I tried the following command:

    cat test.txt | sort | uniq -c

I got the following output:

    1 america
    2 and
    1 england
    1 jerry
    1 to
    1 tom
    1 went

But I need partial matches too, i.e. the word "to" is also present in the word "tom", so my expected count for "to" is 2. Is this possible using Unix commands?

















Tags: text-processing, command-line

asked 14 hours ago by TweetMan (new contributor); edited 13 hours ago by terdon

























          4 Answers















Here's one way, but it isn't very elegant:

    $ sort -u file | while IFS= read -r word; do
        printf '%s\t%s\n' "$word" "$(grep -cFe "$word" file)"
      done
    america 1
    and 3
    england 1
    jerry 1
    to 2
    tom 1
    went 1






answered 13 hours ago by terdon; edited 7 hours ago by Stéphane Chazelas
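A note on the grep-based loop: `grep -c` counts matching lines, not individual matches. With one word per line, as in the question, that is usually the same thing, but a line such as "tomato" contains "to" twice. Here is a minimal sketch of both counts, assuming a grep that supports `-o` (GNU and BSD grep do, although POSIX does not require it). The function names and the demo data are made up for illustration.

```shell
#!/bin/sh
# count_lines WORD FILE: number of lines containing WORD as a fixed string
count_lines() {
    grep -cF -e "$1" -- "$2"
}

# count_occurrences WORD FILE: number of non-overlapping occurrences of WORD
# (-o prints each match on its own line; -o is a common extension, not POSIX)
count_occurrences() {
    grep -oF -e "$1" -- "$2" | wc -l | tr -d ' '
}

# Hypothetical demo data, not taken from the question
demo=$(mktemp)
printf '%s\n' to tom tomato > "$demo"
count_lines to "$demo"         # lines that contain "to"
count_occurrences to "$demo"   # every occurrence of "to"
rm -f "$demo"
```

Whether lines or occurrences should be counted is not stated in the question, so treat the second function as optional.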







































An awk approach:

    awk '
      !x {c[$0]; next}
      {for (i in c) if (index($0, i)) c[i]++}
      END {for (i in c) print c[i]"\t"i}' file x=1 file | sort -k1rn

which on your input gives:

    3 and
    2 to
    1 america
    1 england
    1 jerry
    1 tom
    1 went






answered 7 hours ago by Stéphane Chazelas















• Thank you, this command works. If I run it against a large file of around 30 GB, will a machine with 8 GB of RAM handle that? – TweetMan, 7 hours ago

• @TweetMan It depends on how many unique words there are. It stores all unique words in memory. – Stéphane Chazelas, 7 hours ago

• Hmm, then that would be a problem. It may crash the system. – TweetMan, 7 hours ago

• Awk isn't safe with large files and it bogs down. You may want to look into loading the data into a SQL database and querying it that way. – A.Danischewski, 4 hours ago
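Since the awk approach keeps one array entry per unique line, it may help to measure how many unique lines there are before running it on the big file. This is a rough sketch: `unique_stats` is a made-up helper name. It relies on `sort`, which in common implementations such as GNU sort spills to temporary files rather than holding the whole input in RAM. The byte total is only a lower bound, because awk's per-entry overhead comes on top of it.

```shell
#!/bin/sh
# unique_stats FILE: print the number of distinct lines and their total size.
# LC_ALL=C makes sort compare bytes, which is faster and locale-independent.
unique_stats() {
    LC_ALL=C sort -u -- "$1" | wc -lc |
        awk '{ print $1 " unique lines, " $2 " bytes" }'
}

# Demo on the sample data from the question
demo=$(mktemp)
printf '%s\n' tom and jerry went to america and england > "$demo"
unique_stats "$demo"
rm -f "$demo"
```

If the byte count is a small fraction of available memory, the in-memory approach is probably fine. If it is close to the RAM size, one of the loop-based answers is the safer choice.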






























This won't crash the system, but it may take a long time to run, since it rescans the input once for every unique word. Assuming the input file is called "in":

    sort -u < in | while read w
    do
        printf "%d\t%s\n" "$(grep -c "$w" in)" "$w"
    done

which on your input got me:

    1 america
    3 and
    1 england
    1 jerry
    2 to
    1 tom
    1 went






answered 4 hours ago by sitaram
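One caveat with a plain `grep -c "$w"`: the word is interpreted as a regular expression, so a word containing characters such as `.` or `*` can match more lines than intended. The sketch below contrasts the two modes. The function names and the demo data are invented for illustration and do not come from the question.

```shell
#!/bin/sh
# count_regex WORD FILE: WORD is treated as a basic regular expression
count_regex() {
    grep -c -e "$1" -- "$2"
}

# count_fixed WORD FILE: WORD is treated as a literal string (-F)
count_fixed() {
    grep -cF -e "$1" -- "$2"
}

# Hypothetical demo data: one line with a literal dot, one without
demo=$(mktemp)
printf '%s\n' 'a.c' 'abc' > "$demo"
count_regex 'a.c' "$demo"   # "." matches any character, so both lines count
count_fixed 'a.c' "$demo"   # only the line containing the literal "a.c" counts
rm -f "$demo"
```

For input made only of letters, as in the question, both give the same counts. Adding `-F` (and `IFS= read -r` in the loop) mainly matters for words with punctuation, backslashes or surrounding whitespace.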







































It's not clear to me whether the partial matches have to be anchored to the beginning of the line.
Assuming that is the case, what might speed things up here is binary search via the venerable look command.
Of course, look requires its input file to be sorted.
Therefore, first create a sorted version of the original file:

    sort file > file.sorted

Then loop through the original file, looking up one word at a time against the sorted file:

    while read -r word; do
        printf "%s %d\n" "$word" "$(look -b "$word" file.sorted | wc -l)"
    done <file

Some systems don't need the -b flag to be passed to look to force a binary search.
Disk caching of the sorted file could help speed things up even further.







                                    share|improve this answer














                                    share|improve this answer



                                    share|improve this answer








                                    edited 50 mins ago

























                                    answered 59 mins ago









iruvar
13k reputation, 6 gold badges, 34 silver badges, 64 bronze badges























