Pronounceable encrypted text
$begingroup$
I want to "translate" a text so that its encrypted form is pronounceable. Is there any crypto technique that can make encrypted text look like another language (yet remain difficult to decrypt)?
encryption
$endgroup$
asked 9 hours ago by octo (edited 9 hours ago)
$begingroup$
Not another language, but look at Format Preserving Encryption, whose output you can spell. You can also encode the output into base64...
$endgroup$
– kelalaka
9 hours ago
$begingroup$
@kelalaka - thank you. Since I am a total amateur, is there any software (win, mac or web) that does this?
$endgroup$
– octo
9 hours ago
$begingroup$
There are many libraries, depending on your programming language; just Google "your_programming_language FPE". Base64 is available in almost all of them.
$endgroup$
– kelalaka
9 hours ago
$begingroup$
Just to be crystal clear, you want your cipher text to look like 46 66 72 BF 2E 3A 09 67 6D EF F3 77 14 3A A5 96 58 AA E9 79 88 2E B1 71 6F 93 11 E0 21 FA 35 rather than pronounceable English words (phonemes)? So not like what Lisa Gerrard sings (glossolalia)?
$endgroup$
– Paul Uszak
8 hours ago
$begingroup$
Commercial one-part codes used pronounceable words (prior to WW1, then the telegraph rules changed), but that isn't software and wasn't secure against attack even at the time.
$endgroup$
– Eugene Styer
8 hours ago
3 Answers
$begingroup$
I don't know of any tools that do what you want, so you'll likely need to develop something on your own.
I assume you want to take a plaintext and generate a 'ciphertext' which is effectively undecipherable (unless you know the key), and looks sort of like text in some unknown language (after some unspecified level of inspection - fooling an experienced linguist who studies it would be much harder than passing a casual inspection).
One overall design would be to run the plaintext through a standard cipher (e.g. AES-GCM), which produces a random-looking string of bits, and then encode that string of bits in some pseudo-language. This effectively covers the first requirement (security), as we believe that AES-GCM is secure, and the encoding process can't weaken things because it is a keyless, reversible mapping applied only to the ciphertext.
The only other question (apart from "which standard cipher") is how we encode the string.
The easiest method would be to generate a large dictionary (with, say, 4096 entries) of pseudowords: strings that are individually pronounceable, distinct, and, at first glance, look like they might be words from the same language. Then you would divide the bit string up into 12-bit sections and, for each section, output the word that corresponds to that 12-bit value; decoding is simply the reverse lookup.
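A minimal sketch of that 4096-entry pseudoword encoding follows. The syllable alphabet, the padding rule (a 0x80 marker byte followed by zero bytes) and the function names are all assumptions made for illustration; the input is assumed to be ciphertext already produced by a standard cipher.

```python
# Assumed alphabet: 16 consonants x 4 vowels = 64 syllables,
# and two syllables per word give 64 * 64 = 4096 distinct pseudowords.
CONSONANTS = "bdfgklmnprstvzhj"
VOWELS = "aeio"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]
WORDS = [a + b for a in SYLLABLES for b in SYLLABLES]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Encode bytes as pseudowords, 12 bits per word."""
    # Pad with 0x80 then zeros so the length is a multiple of 3 bytes (24 bits).
    padded = data + b"\x80"
    padded += b"\x00" * (-len(padded) % 3)
    words = []
    for i in range(0, len(padded), 3):
        chunk = int.from_bytes(padded[i:i + 3], "big")
        words.append(WORDS[chunk >> 12])     # upper 12 bits
        words.append(WORDS[chunk & 0xFFF])   # lower 12 bits
    return " ".join(words)


def decode(text: str) -> bytes:
    """Reverse encode(): look up each word, rebuild the bytes, strip padding."""
    indices = [INDEX[word] for word in text.split()]
    if len(indices) % 2:
        raise ValueError("expected an even number of words")
    out = bytearray()
    for hi, lo in zip(indices[0::2], indices[1::2]):
        out += ((hi << 12) | lo).to_bytes(3, "big")
    data = bytes(out).rstrip(b"\x00")
    if not data.endswith(b"\x80"):
        raise ValueError("bad padding")
    return data[:-1]
```

Because every word is exactly two consonant-vowel syllables, the output is easy to read aloud, but it also looks very regular; a hand-curated word list would look more natural.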
This is not too much work; however, it probably won't fool anyone who studies the text in any detail. They are likely to see that the word order really doesn't follow any particular pattern (and they will likely see repeated words on occasion; that is, the same word occurring twice in a row; that is very very unlikely to happen in a real language text[1]).
If fooling people is a goal, the next step might be to place a pseudogrammar on top of your pseudovocabulary. That is, instead of having a single dictionary, you might divide it up into 'nouns', 'verbs', 'adjectives', 'prepositions', etc.; then, when you create a sentence, you randomly pick a sentence form (e.g. 'Article-Noun-Adjective-Verb-Adverb-Noun' might be one possibility; remember, your pseudogrammar need not follow English rules), and for each slot in that form you use bits from the bitstring to select a word from the appropriate dictionary (for example, if you have defined 4 articles, you would use 2 bits from the bitstring to select from them).
This is obviously more work, but may be able to fool some observers, even after some small period of study. Things could get even more complex (real languages often conjugate and decline words, for example), but I suspect you aren't interested in doing that much work...
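The pseudogrammar idea can be sketched roughly as below. The lexicon, the two sentence forms and the function names are invented for illustration. The sketch assumes that every category holds a power-of-two number of words and that no word appears in two categories, so the decoder can tell from the word alone how many bits it carries.

```python
import random

# Hypothetical pseudovocabulary, split by part of speech.
LEXICON = {
    "article": ["ka", "lo", "mi", "su"],                     # 2 bits each
    "noun": ["tabor", "melin", "dorak", "sipel",
             "gurun", "navit", "belos", "rikam"],            # 3 bits each
    "adjective": ["ona", "eli", "uru", "ati"],               # 2 bits each
    "verb": ["prend", "skol", "trav", "glim"],               # 2 bits each
}
# Hypothetical sentence forms; the pseudogrammar need not follow English rules.
TEMPLATES = [
    ["article", "noun", "adjective", "verb"],
    ["article", "adjective", "noun", "verb", "article", "noun"],
]
CATEGORY_OF = {w: cat for cat, words in LEXICON.items() for w in words}


def width(category: str) -> int:
    """Number of bits one word of this category carries."""
    return (len(LEXICON[category]) - 1).bit_length()


def encode(bits: str, rng=random) -> str:
    """Encode a string of '0'/'1' characters as pseudo-sentences."""
    sentences, pos = [], 0
    while pos < len(bits):
        words = []
        for category in rng.choice(TEMPLATES):
            n = width(category)
            chunk = bits[pos:pos + n].ljust(n, "0")  # zero-pad past the end
            pos += n
            words.append(LEXICON[category][int(chunk, 2)])
        sentences.append(" ".join(words).capitalize() + ".")
    return " ".join(sentences)


def decode(text: str, n_bits: int) -> str:
    """Recover the first n_bits bits; the sentence form itself is not needed."""
    bits = []
    for word in text.lower().replace(".", " ").split():
        category = CATEGORY_OF[word]
        bits.append(format(LEXICON[category].index(word), f"0{width(category)}b"))
    return "".join(bits)[:n_bits]
```

The sentence form is picked with ordinary randomness rather than with message bits, so it carries no data; the decoder has to be told the original bit length because the last sentence is padded with zero bits.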
[1]: Yes, that was intentional...
$endgroup$
$begingroup$
It is easy. For example:
Take the first 256 words from a dictionary of the language, such as the Oxford dictionary or the Merriam-Webster dictionary, and for each byte of your encrypted message use the word with the corresponding number.
Or take the first 65536 words from the dictionary and replace each pair of bytes with the corresponding word.
Or take each encrypted byte and add a random multiple of 256, chosen so that the result is less than the number of words in your dictionary. E.g. if the encrypted byte is 65 and the random number is 194, take word number 49729 (= 65 + 256*194). To get the encrypted message back from the words, find each word's number in the dictionary and take (word number) mod 256. E.g. 49729 mod 256 = 65.
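The third variant can be sketched as follows. The 4096-entry pseudoword list stands in for a real dictionary and is purely an assumption, as are the function names.

```python
import secrets

# Placeholder dictionary of 4096 pseudowords (assumption; any list of at
# least 256 distinct words would do).
CONSONANTS = "bdfgklmnprstvzhj"
VOWELS = "aeio"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]
WORDS = [a + b for a in SYLLABLES for b in SYLLABLES]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Replace each byte with word number byte + 256 * r for a random r."""
    out = []
    for byte in data:
        # Count the values of r that keep the word number inside the dictionary.
        choices = (len(WORDS) - 1 - byte) // 256 + 1
        r = secrets.randbelow(choices)
        out.append(WORDS[byte + 256 * r])
    return " ".join(out)


def decode(text: str) -> bytes:
    """Recover each byte as (word number) mod 256."""
    return bytes(INDEX[word] % 256 for word in text.split())
```

The random multiple only varies which word is shown for a given byte, so the same byte does not always map to the same word; it adds nothing to security, which still rests entirely on the encryption applied beforehand.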
$endgroup$
$begingroup$
The best solution is out of scope for this website. Just apply an algorithm that converts binary data to human-readable, pronounceable text.
A simple solution, much like base-64 or diceware, would be to download a dictionary, split the binary ciphertext into chunks, and replace each chunk with the nth word in the dictionary, where n is the chunk's value. Join the words together using spaces as separators. Punctuation can be added randomly.
The process would be reversed by removing punctuation, splitting on spaces, looking up the index of each word, then joining the binary format of the indices together.
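A minimal sketch of this word-list encoding and its reversal is shown below. The 16-word list (4 bits per word) is a stand-in chosen for brevity; a real list would be much larger, and the function names are assumptions.

```python
# Hypothetical 16-word list: each word stands for one 4-bit chunk.
WORDS = ["apple", "bread", "cloud", "dance", "eagle", "flame", "grape", "house",
         "island", "jungle", "kettle", "lemon", "mango", "night", "ocean", "piano"]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Map each byte to two words: high 4 bits first, then low 4 bits."""
    words = []
    for byte in data:
        words.append(WORDS[byte >> 4])
        words.append(WORDS[byte & 0x0F])
    return " ".join(words)


def decode(text: str) -> bytes:
    """Strip punctuation, look up each word's index, and rebuild the bytes."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalpha() or ch.isspace())
    nibbles = [INDEX[word] for word in cleaned.split()]
    if len(nibbles) % 2:
        raise ValueError("expected an even number of words")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

Since the decoder ignores case and punctuation, decorative punctuation can be added to the encoded text without affecting the round trip.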
Format-preserving encryption is undesirable in this case. English prose is not a simple data format. It would be hard to convert one message to another without leaking some information about the content and/or format of the text.
Chances are also pretty good that readers who want to implement the same kind of thing would roll their own format-preserving encryption algorithm that is little more than a substitution cipher. (Replacing plaintext words with pseudorandom dictionary words, for example.)
Substitution ciphers can be broken with paper and pencil. Newspapers print puzzles next to the Sudoku and comics that challenge readers to do exactly that.
The primary reason I suggest looking at reversible methods of encoding binary data as English text is that I think it's best to treat encrypting data and formatting data as two totally independent problems.
When you use normal strong encryption (don't forget authentication), the ciphertext will be indistinguishable from a randomly generated bit sequence. It won't matter if the binary-to-text conversion reveals something about the input because the input is ciphertext. Even side channels in the conversion algorithm wouldn't be a problem if the encryption implementation isn't vulnerable to side-channel attacks.
Therefore, I think you would have better luck if you took that half of the question to Stack Overflow.
You should not have high hopes for the results. Procedural generation of text is a difficult problem. It's of interest to AI researchers, spammers, news media, anyone with customer support chat, and artists. Even the most successful algorithms produce text that seems unnatural.
It's really hard to generate text which can pass for something written by a real person. Luckily (or not, if you use a spam filter) it's also somewhat hard for computers to tell the difference between human-generated and computer-generated text.
As for what kind of suggestion I would give if you asked on Stack Overflow...
I would take word frequency and n-gram frequency into account if the purpose of the algorithm was steganography. "To" is more likely to appear in a normal sentence than "meteorite" is. "Go to" is more likely than "jump bread".
Think of using the auto-complete function on a phone. You could write a random sentence that might look legitimate at first glance pretty easily. You could randomly select the first one or two letters of each word and select from the options given to you. An n-gram based data encoding algorithm might produce pretty similar results.
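As a toy sketch of the n-gram idea, the code below assumes a tiny hand-made bigram table (FOLLOWERS, entirely hypothetical) in which every word has exactly two plausible successors, so each word carries one bit. A realistic table would be built from corpus statistics and would weight choices by frequency.

```python
# Hypothetical bigram table: each word lists its two allowed followers.
# Every follower is itself a key, so encoding can continue indefinitely.
FOLLOWERS = {
    "<start>": ["the", "a"],
    "the": ["cat", "dog"],
    "a": ["cat", "dog"],
    "cat": ["sees", "likes"],
    "dog": ["sees", "likes"],
    "sees": ["the", "a"],
    "likes": ["the", "a"],
}


def encode_bits(bits):
    """Each bit picks one of the two followers of the previous word."""
    previous, words = "<start>", []
    for bit in bits:
        previous = FOLLOWERS[previous][bit]
        words.append(previous)
    return " ".join(words)


def decode_bits(text):
    """Recover each bit as the position of a word among its predecessor's followers."""
    previous, bits = "<start>", []
    for word in text.split():
        bits.append(FOLLOWERS[previous].index(word))
        previous = word
    return bits
```

For example, the bits 0, 1, 1, 0 walk the table as "the", "dog", "likes", "the". One bit per word is very inefficient; larger follower sets carry more bits per word at the cost of less natural text.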
If steganography is not required (say the purpose of making ciphertext human-readable was so that you could send a message over an audio channel or a less-than-reliable text channel) then the simple word list encoding method is what I would try. I would make sure the word list doesn't contain homophones or even words with similar or confusing spellings.
(See one of EFF's short diceware word lists. There is one where the first three letters of a word uniquely determine which number it corresponds to, and no two words are so close in spelling that spellcheck-style correction becomes hard. Something like this would be more resilient to typos and misunderstandings.)
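A small sketch of why a unique three-letter prefix helps: the decoder can identify a word from its first three letters and ignore a typo later in the word. The eight words below are illustrative stand-ins, not the actual EFF list.

```python
# Hypothetical word list in which the first three letters of every word are unique.
WORDS = ["acid", "bunch", "cargo", "dove", "ember", "fable", "gravy", "humid"]
PREFIX_TO_INDEX = {word[:3]: i for i, word in enumerate(WORDS)}
assert len(PREFIX_TO_INDEX) == len(WORDS), "prefixes must be unique"


def word_to_index(word: str) -> int:
    """Identify a word by its first three letters, ignoring the rest."""
    return PREFIX_TO_INDEX[word.strip().lower()[:3]]
```

With this lookup, a misspelling such as "carog" still resolves to the same number as "cargo"; a typo inside the first three letters would still need an edit-distance check.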
$endgroup$
The best solution is out of scope for this website. Just apply an algorithm that converts binary data to human-readable pronounceable text.
A simple solution, much like base-64 or diceware, would be to download a dictionary, split binary ciphertext into chunk, and replace each chunk by selecting the nth word in the dictionary. Join the words together using spaces as separators. Punctuation can be added randomly.
The process would be reversed by removing punctuation, splitting on spaces, looking up the index of each word, then joining the binary format of the indices together.
Format preserving encryption is undesirable in this case. English prose is not a simple data format. It would be hard to convert one message to another without leaking some information about the content and/or format of the text.
Chances are also pretty good that readers wanting to implement the same kind of thing as you would roll their own format-preserving encryption algorithm which is little more than a substitution cipher. (Replacing plaintext words with pseudorandom dictionary words, for example.)
Substitution ciphers can be broken with paper and pencil. They put puzzles in newspapers next to Sudoku and comics where people are challenged to do exactly that.
The primary reason I suggest looking at reversible methods of encoding binary data as English text is because I think it's best to work to work on the process of encrypting and of formatting data as two totally independent problems.
When you use normal strong encryption (don't forget authentication), the ciphertext will be indistinguishable from a randomly generated bit sequence. It won't matter if the binary-to-text conversion reveals something about the input because the input is ciphertext. Even side channels in the conversion algorithm wouldn't be a problem if the encryption implementation isn't vulnerable to side-channel attacks.
Therefore, I think you would have better luck if you took that half of the question to StackOverflow.
You should not have high hopes for the results. Procedural generation of text is a difficult problem. It's of interest to AI researchers, spammers, news media, anyone with customer support chat, and artists. Even the most successful algorithms produce text that seems unnatural
It's really hard to generate text which can pass for something written by a real person. Luckily (or not, if you use a spam filter) it's also somewhat hard for computers to tell the difference between human-generated and computer-generated text.
As for what kind of suggestion I would give if you asked on StackOverflow...
I would take word frequency and n-gram frequency into account if the purpose of the algorithm was steganography. "To" is more likely to appear in a normal sentence than "meteorite" is. "Go to" is more likely than "jump bread".
Think of using the auto-complete function on a phone. You could write a random sentence that might look legitimate at first glance pretty easily. You could randomly select the first one or two letters of each word and select from the options given to you. An n-gram based data encoding algorithm might produce pretty similar results.
If steganography is not required (say the purpose of making ciphertext human readable was so that you could send a message over an audio channel or less-than-reliable text channel) then the simple word list encoding method is what I would try. I would make sure the word list doesn't contain homophones or even words with similar or confusing spellings.
(See one of EFF's short diceware word lists. There is one where the first three letters of a word uniquely determine which number it corresponds to, and no two words are spelled so similarly that a typo would be hard to correct. Something like this would be more resilient to typos and misunderstandings.)
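A rough sketch of why a prefix-unique list helps: if the first three letters identify a word, the decoder can ignore everything after them. The 16-word list below is invented for illustration (it is not the EFF list), and each word stands for one 4-bit nibble.

```python
# Hypothetical 16-word list; every word has a unique 3-letter prefix.
WORDS = [
    "acid", "bake", "camp", "dawn", "echo", "fern", "gift", "harp",
    "iron", "jazz", "kelp", "lamp", "mint", "nest", "opal", "pine",
]
PREFIX_TO_INDEX = {word[:3]: i for i, word in enumerate(WORDS)}
assert len(PREFIX_TO_INDEX) == len(WORDS) == 16  # prefixes really are unique


def encode(data: bytes) -> str:
    """Emit two words per byte: high nibble first, then low nibble."""
    out = []
    for byte in data:
        out.append(WORDS[byte >> 4])
        out.append(WORDS[byte & 0x0F])
    return " ".join(out)


def decode(text: str) -> bytes:
    """Look words up by their first three letters only, ignoring case."""
    nibbles = [PREFIX_TO_INDEX[word[:3].lower()] for word in text.split()]
    if len(nibbles) % 2:
        raise ValueError("expected an even number of words")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


if __name__ == "__main__":
    print(encode(b"\x4a"))
    print(decode("Echx kelq"))  # typos after the third letter are tolerated
```

This only tolerates errors after the third letter; correcting typos inside the prefix would need an edit-distance search on top.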
edited 6 hours ago
answered 6 hours ago
Future Security
$begingroup$
I don't know of any tools that do what you want, and so likely you'll need to develop something on your own.
I assume you want to take a plaintext and generate a 'ciphertext' which is effectively undecipherable (unless you know the key), and looks sort of like text in some unknown language (after some unspecified level of inspection - fooling an experienced linguist who studies it would be much harder than passing a casual inspection).
One overall design would be to run the plaintext through a standard cipher (e.g. AES-GCM), which produces a random-looking string of bits, and then encode that string of bits in some pseudo-language. This effectively covers the first requirement (security), as we believe that AES-GCM is secure, and the encoding process can't weaken things.
The only other question (apart from "which standard cipher") is how we encode the string.
The easiest method would be to generate a large dictionary (with, say, 4096 entries) of pseudowords: strings that are individually pronounceable, distinct, and, at first glance, look like they might be words from the same language. Then you would divide the bit string into 12-bit sections and, for each section, output the word that corresponds to that setting of 12 bits; the decoding process is obvious.
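As a minimal sketch of that dictionary approach: the 4096 pseudowords below are built from two consonant-vowel syllables each (16 consonants x 4 vowels = 64 syllables, and 64 x 64 = 4096 words). The alphabet choice and the padding scheme (a 0x80 byte followed by zero bytes, so the input fills whole 24-bit groups) are assumptions of this example, not part of the description above. The input is assumed to be ciphertext that has already been produced by a real cipher.

```python
from itertools import product

CONSONANTS = "bdfgklmnprstvzhj"  # 16 choices -> 4 bits
VOWELS = "aeio"                  # 4 choices  -> 2 bits
SYLLABLES = [c + v for c, v in product(CONSONANTS, VOWELS)]  # 64 syllables
WORDS = [a + b for a, b in product(SYLLABLES, repeat=2)]     # 4096 pseudowords
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Map every 12 bits of (padded) input to one pseudoword."""
    padded = data + b"\x80"                  # padding marker
    padded += b"\x00" * (-len(padded) % 3)   # fill up to a multiple of 24 bits
    words = []
    for i in range(0, len(padded), 3):
        chunk = int.from_bytes(padded[i:i + 3], "big")
        words.append(WORDS[chunk >> 12])     # upper 12 bits
        words.append(WORDS[chunk & 0xFFF])   # lower 12 bits
    return " ".join(words)


def decode(text: str) -> bytes:
    """Invert encode(): words -> 12-bit values -> bytes, then strip padding."""
    values = [INDEX[word] for word in text.split()]
    if len(values) % 2:
        raise ValueError("expected an even number of words")
    padded = b"".join(
        ((hi << 12) | lo).to_bytes(3, "big")
        for hi, lo in zip(values[0::2], values[1::2])
    )
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("bad padding")
    return stripped[:-1]


if __name__ == "__main__":
    text = encode(b"ciphertext bytes go here")
    print(text)
    print(decode(text))
```

Because every word has the same length and shape, the output looks even more uniform than the description suggests; a hand-curated list would look more like a language.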
This is not too much work; however, it probably won't fool anyone who studies the text in any detail. They are likely to see that the word order really doesn't follow any particular pattern (and they will likely see repeated words on occasion; that is, the same word occurring twice in a row; that is very very unlikely to happen in a real language text[1]).
If fooling people is a goal, the next step might be to place a pseudogrammar on top of your pseudovocabulary. That is, instead of having a single dictionary, you might divide it up into 'nouns', 'verbs', 'adjectives', 'prepositions', etc. Then, when you create a sentence, you randomly pick a sentence form (e.g. 'Article-Noun-Adjective-Verb-Adverb-Noun' might be one possibility; remember, your pseudogrammar need not follow English rules), and for each position in that form, you use bits from the bitstring to select a word from the appropriate dictionary (for example, if you have defined 4 articles, you would use 2 bits from the bitstring to select from them).
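A small sketch of the pseudogrammar idea, under some assumptions: the lexicon, category sizes and sentence forms below are invented, every category size is a power of two, and no word appears in more than one category. That last property is what lets the decoder ignore the randomly chosen sentence forms and simply look each word up.

```python
import random

# Hypothetical lexicon; each list length is a power of two.
LEXICON = {
    "article": ["ke", "do"],                        # 1 bit per word
    "noun": ["mira", "tolu", "sabe", "niko"],       # 2 bits per word
    "adjective": ["vash", "lumi", "orek", "pend"],  # 2 bits per word
    "verb": ["tavi", "seru", "golm", "hask"],       # 2 bits per word
}
# Hypothetical sentence forms; they need not follow English rules.
FORMS = [
    ("article", "noun", "verb"),
    ("article", "adjective", "noun", "verb", "article", "noun"),
]
# word -> (bit width of its category, index within the category)
WORD_INFO = {
    word: (len(words).bit_length() - 1, i)
    for words in LEXICON.values()
    for i, word in enumerate(words)
}


def encode(bits: str, rng: random.Random) -> str:
    """Consume bits left to right, filling randomly chosen sentence forms."""
    sentences, pos = [], 0
    while pos < len(bits):
        sentence = []
        for category in rng.choice(FORMS):
            words = LEXICON[category]
            width = len(words).bit_length() - 1
            # Zero-pad if the bit string runs out in the middle of a sentence.
            chunk = bits[pos:pos + width].ljust(width, "0")
            pos += width
            sentence.append(words[int(chunk, 2)])
        sentences.append(" ".join(sentence).capitalize() + ".")
    return " ".join(sentences)


def decode(text: str) -> str:
    """Recover the bits (plus any trailing zero padding) word by word."""
    bits = []
    for token in text.lower().replace(".", " ").split():
        width, index = WORD_INFO[token]
        bits.append(format(index, f"0{width}b"))
    return "".join(bits)


if __name__ == "__main__":
    secret_bits = "1011001110001"
    text = encode(secret_bits, random.Random())
    print(text)
    print(decode(text)[:len(secret_bits)] == secret_bits)
```

The decoded string can carry a few extra zero bits from the final sentence, so a real format would need a length field or padding convention inside the ciphertext.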
This is obviously more work, but may be able to fool some observers, even after some small period of study. Things could get even more complex (real languages often conjugate and decline words, for example), but I suspect you aren't interested in doing that much work...
[1]: Yes, that was intentional...
$endgroup$
answered 6 hours ago
poncho
$begingroup$
It is easy. For example:
Take the first 256 words from a dictionary of the language, such as the Oxford or Merriam-Webster dictionary, and for each byte of your encrypted message use the word with the corresponding number.
Or take the first 65536 words from the dictionary and replace each pair of bytes with the corresponding word.
Or take each encrypted byte and add a random multiple of 256, chosen so that the result is less than the number of words in your dictionary. E.g. if the encrypted byte is 65 and the random number is 194, take word number 49729 (= 65 + 256*194). To get the encrypted message back from the words, find each word's number in the dictionary and take (word number) mod 256. E.g. 49729 mod 256 = 65.
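The third variant could be sketched roughly as follows. The word list here is a placeholder of generated strings (`word0`, `word1`, ...) standing in for a real dictionary, and the function names are made up for the example.

```python
import random


def encode(data: bytes, words: list[str], rng: random.Random) -> list[str]:
    """Replace each byte b with the word at index b + 256*k for a random k."""
    if len(words) < 256:
        raise ValueError("need at least 256 words")
    out = []
    for byte in data:
        # Largest k for which byte + 256*k is still a valid index.
        max_k = (len(words) - 1 - byte) // 256
        k = rng.randint(0, max_k)
        out.append(words[byte + 256 * k])
    return out


def decode(tokens: list[str], words: list[str]) -> bytes:
    """Look up each word's number and reduce it mod 256."""
    index = {word: i for i, word in enumerate(words)}
    return bytes(index[token] % 256 for token in tokens)


if __name__ == "__main__":
    # Placeholder dictionary; a real one would hold actual words.
    dictionary = [f"word{i}" for i in range(50000)]
    tokens = encode(b"\x41\x00\xff", dictionary, random.Random())
    print(tokens)
    print(decode(tokens, dictionary))
    print(decode(["word49729"], dictionary))  # 49729 mod 256 = 65
```

The random multiple makes the same byte map to many different words, which hides repeats at the cost of using about 16 bits of word choice to carry 8 bits of data.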
$endgroup$
edited 1 hour ago
answered 1 hour ago
mentallurg
$begingroup$
Not another language, but look at Format Preserving Encryption, whose output you can spell out; you can also encode the output in Base64...
$endgroup$
– kelalaka
9 hours ago
$begingroup$
@kelalaka - thank you. Since I am a total amateur, is there any software (win, mac or web) that does this?
$endgroup$
– octo
9 hours ago
$begingroup$
There are many libraries, depending on your programming language; just Google "your_programming_language FPE". Base64 is available in almost all of them.
$endgroup$
– kelalaka
9 hours ago
$begingroup$
Just to be crystal clear, you want your cipher text to look like
46 66 72 BF 2E 3A 09 67 6D EF F3 77 14 3A A5 96 58 AA E9 79 88 2E B1 71 6F 93 11 E0 21 FA 35
rather than pronounceable English words (phonemes)? So not like what Lisa Gerrard sings (glossolalia)?
$endgroup$
– Paul Uszak
8 hours ago
$begingroup$
Commercial one-part codes used pronounceable words (prior to WW1, then the telegraph rules changed), but that isn't software and wasn't secure against attack even at the time.
$endgroup$
– Eugene Styer
8 hours ago