word frequency from file using partial match
I have a text file like this:
tom
and
jerry
went
to
america
and
england
I want to get the frequency of each word.
When I tried the following command
cat test.txt |sort|uniq -c
I got the following output
1 america
2 and
1 england
1 jerry
1 to
1 tom
1 went
But I need partial matches too, i.e. the word to is present in the word tom, so my expected count for to is 2. Is it possible using unix commands?
text-processing command-line
4 Answers
Here's one way, but it isn't very elegant:
$ sort -u file | while IFS= read -r word; do
    printf '%s\t%s\n' "$word" "$(grep -cFe "$word" file)"
done
america 1
and 3
england 1
jerry 1
to 2
tom 1
went 1
An awk approach:
awk '
  !x {c[$0]; next}
  {for (i in c) if (index($0, i)) c[i]++}
  END {for (i in c) print c[i] "\t" i}' file x=1 file | sort -k1rn
Which on your input gives:
3 and
2 to
1 america
1 england
1 jerry
1 tom
1 went
Thank you, this command works. If I run it against a large file, around 30 GB, will a machine with 8 GB of RAM handle that?
– TweetMan
7 hours ago
@TweetMan It depends on how many unique words there are: it stores all unique words in memory.
– Stéphane Chazelas
7 hours ago
Hmm, then that would be a problem; it may crash the system.
– TweetMan
7 hours ago
Awk isn't safe with large files and it bogs down. You may want to look into loading the data into a SQL database and querying it that way.
– A.Danischewski
4 hours ago
This won't crash the system but it may take a long time to run, since it parses the input multiple times. Assuming the input file is called "in":
sort -u < in | while read w
do
    printf "%d\t%s\n" `grep -c "$w" in` "$w"
done
which on your input got me:
1 america
3 and
1 england
1 jerry
2 to
1 tom
1 went
It's not clear to me if the partial matches have to be anchored to the beginning of the line. Assuming that to be the case, what might speed things up here is the use of binary search via the venerable look command.
Of course, look requires that its input file be sorted, so first create a sorted version of the original file:
sort file > file.sorted
Then loop through the original file, looking up one word at a time against the sorted file:
while read -r word; do
    printf "%s %d\n" "$word" "$(look -b "$word" file.sorted | wc -l)"
done <file
Some systems don't need the -b flag to be passed to look to force a binary search. Disk caching of the sorted file could help speed things up even further.