6. Symmetric-Key Encryption¶
IND-CPA Security¶
Recall confidentiality means that an attacker cannot read our messages. However, let's make it more precise and rigorious:
The cipher text $C$ should give the attacker no additional information about the message $M$. In other words, the attacker should not learn any new information about $M$ beyond what they already knew before seeing $C$.
Further formalize via experiment. Alice has encrypted and sent one of two messages, either \(M_0\) or \(M_1\), and Eve has no idea which was sent. Eve tries to guess which was sent by looking at the cipher text. If the encryption scheme is confidential, then Eve's probability of guessing which message was sent should be 1/2, which is the same probability if she had not intercepted the ciphertext at all, and was instead guessing at random.
Our definition says that even if Eve can trick Alice into encrypting some messages, she still cannot distinguish whether Alice sent \(M_0\) or \(M_1\) in the experiment. This definition is known as indistinguishability under chosen plaintext attack, or IND-CPA. It works as follows: 1. Eve chooses two different messages \(M_0\), and \(M_1\), and sends both messages to Alice 2. Alice flips a fair coin. If the coin is heads, she encrypts \(M_0\). If the coin is tails, she encrypts \(M_1\). More formally, Alice will choose a bit \(b \in\{0, 1\}\) uniformly at random, and then encrypts \(M_b\). Alice sends the encrypted message \(Enc(K, M_b)\) back to Eve 3. Eve is now allowed to ask Alice for encryptions of messages of Eve's choosing. Eve can send a plaintext message with the secret key. Eve is allowed to repeat this as many times as she wants. Intuitively, this steps is allowing Eve to perform a chosen-plaintext attack in an attempt to learn something about which message was sent 4. After Eve is finished asking for encryptions, she must guess whether the encrypted message from step 2 is the encryption of \(M_0\) or \(M_1\).
If Eve can guess which message was sent with probability > 1/2, she wins, and the scheme is not IND-CPA secure.
Some restrictions:
-
\(M_0\) and \(M_1\) must be the same length.
-
Eve is limited to a practical number of encryption requests.
-
Eve only wins if she has a non-negligible advantage.
XOR Review¶
Symmetric-key encryption often relies on the bitwise XOR operation. Some properties of the XOR operations:
The last one is important to note. Given \(x\oplus y\), you can retrieve \(y\) by XOR'ing with \(x\), cancelling out the \(x\). This allows us to perform algebra with XOR:
One Time Pad¶
One Time Pad is a symmetric encryption scheme. Alice and Bob share an \(n\)-bit secret key \(K= k_1\cdots k_n\). The bits are picked uniformly at random. To send an \(n\)-bit message \(M=m_1\cdots m_n\), we want the desired properties: 1. It should scramble up the message, i.e., map it to a ciphertext \(C = c_1\cdots c_n\). 2. Given knowledge of the secret key \(K\), it should be easy to recover \(M\) from \(C\). 3. Eve, who does not know \(K\), should get no information about \(M\).
How to do it: for encryption
for decryption
Maybe more summarizable:
- Alice and Bob pick a shared random key \(K\).
- Encryption: \(C = M\oplus K\)
- Decryption: \(M = C\oplus K\)
Proof that One-Time Pad is IND-CPA Secure¶
For a fixed choice of plaintext \(M\), every possible value of ciphertext \(C\) can be achieved by an appropriate and unique choice of the shared key \(K\), namely \(K = M\oplus C\). Since each such key value is equally likely, it follows that \(C\) is also equally likely to be any n-bit string.
Eve thus sees a uniformly random \(n\) bit string no matter what the plaintext message was, and thus gets no information about which of the two messages was encrypted.
Suppose Eve observes a ciphertext \(C\), and knows the message \(M\) is either \(M_0\) or \(M_1\). The probability space here is \(2^{n+1}\) since there are \(2^n\) choices for the n-bit key \(K\), as well as the challenger's choice of whether to send \(M_0\) or \(M_1\). All choices are equally likely.
Assume key \(K\) is generated uniformly at random; then the challenger randomly chooses a bit \(b\in\{0, 1\}\), and Alice sends the encryption of \(M_b\). So, if Eve observes that the cipher text has some specific value \(C\), what is the conditional probability that \(b = 0\) given her observation?
Drawback of one time pad: Shared key cannot be reused to transmit another message \(M'\).
-
If \(K\) is reused to encrypt two messages \(M\) and \(M'\), then Eve can XOR the two ciphertexts \(C = M\oplus K\) and \(C' = M'\oplus K\) to obtain \(C\oplus C' = M\oplus M'\).
-
If Eve happens to learn \(M\), then she can easily deduce the other message \(M'\). But she can also just reconstruct the key \(K\) too.
-
Even if Eve does not know \(M\) or \(M'\), there is often enough redundancy in messages that merely knowing \(M\oplus M'\) is enough to recover most of the messages.
In general, one-time pad is not secure if the key is used to encrypt more than one message.
Block Ciphers¶
Block cipher transforms a fixed-length \(n\)-bit input into a fixed-length \(n\)-bit output. The block cipher has \(2^k\) different settings for scrambling, so it also takes in a \(k\)-bit key as input, where each key corresponds to a different scrambling setting.
Given a fixed key, the block cipher encryption must map each of the \(2^n\) possible plaintext inputs to a different ciphertext output. The block cipher encryption must map uniquely, so that there is some inverse operation that decrypts it. This means block cipher is deterministic.
Mathematically: There is an encryption function \(E : \{0, 1\}^k \times \{0, 1\} \rightarrow \{0, 1\}^n\). Once we fix a key \(K\), the function maps \(E_K: \{0, 1\}^n\rightarrow \{0, 1\}^n\), defined by \(E_K(M) = E(K, M)\).
We require \(E_K\) to be a permutation on the n-bit strings, so it is invertible (bijective). The inverse mapping of this permutation is the decryption algorithm \(D_K\). In other words, \(D_K(E_K(M)) = M\).
Block Cipher Security¶
Block ciphers are not IND-CPA secure on their own because they are deterministic. In other words, encrypting the same message twice with the same key produces the same output twice. Eve sends \(M_0\) and \(M_1\) to be encrypted, then queries the challenger for the encryption of \(M_0\) and receives the same encryption. If the two encryptions she receives from the challenger are the same, then Eve knows the challenger encrypted \(M_0\) and sent \(E(K, M_0\)) otherwise they encrypted \(M_1\).
Altough they are not IND-CPA secure, they have a desirable security property that helps us build IND-CPA secure symmetric encryption schemes: a block cipher is computationally indistinguishable from a random permutation.
A random permutation is a function that maps each \(n\)-bit input to exactly one \(n\)-bit random output.
AES is not truly indistinguishable from random, but is believed to be computationally indistinguishable from random. Intuitively, this means that given a practical amount of computation power, Eve cannot guess which type of permutation she received.
This gives a strong security guarantee. given a single ciphertext \(C = E_K(M)\), an attacker without the key cannot learn anything about the original message \(M\). If the attacker could learn something about \(M\), then AES would no longer be computationally indistinguishable. There is no proof that AES is computationally indistinguishable from random, but it is believed to be computationally indistinguishable.
Block Cipher Modes of Operation¶
Why AES cannot be a practical IND-CPA secure encryption scheme: 1. We would like to encrypt arbitrarily long messages, but the block cipher only takes fixed-length inputs 2. If the same message is sent twice, the ciphertext in the two transmissions is the same with AES.
To fix these problems, the encryption algorithm can be randomized or stateful.
Several standard ways of building an encryption algorithm, using a block cipher
Electronic Code Book (ECB) Mode¶
The plaintext \(M\) is simply broken into \(n\)-bit blocks \(M_1\cdots M_l\), and each block is encoded using the block cipher \(C_i = E_K(M_i)\). The ciphertext is just a concatenation of these individual blocks: \(C = C_1\cdot C_2\cdots C_l\).
This scheme is flawed. Any redundancy in the blocks will show through and allow the eavesdropper to deduce information about the plaintext. If, for instance \(M_i = M_j\), then \(C_i = C_j\).
-
ECB mode leaks information about the plaintext
-
Encryption: C_i = E_K(M_i)
-
Decryption: M_i = D_K(C_i)
Cipher Block Chaining (CBC) Mode¶
For each message, the sender picks a random \(n\)-bit string, called the initial vector (IV). Define \(C_0 = IV\). The \(i\)th ciphertext block is given by \(C_i = E_K(C_{i-1}\oplus M_i)\). The ciphertext is the concatenation of the initial vector and these individual blocks \(C = IV\cdot C_1\cdot C_2\cdots C_l\).
-
CBC mode has been proven to provide strong security guarantees on the privacy of the plaintext message
-
Encryption: \(C_0 = IV;\;\; C_i = E_K(P_i\oplus C_{i-1})\)
-
Decryption: \(P_i = D_K(C_i)\oplus C_{i-1}\)
Ciphertext Feedback (CFB) Mode¶
\(C_0\) is again the IV. The \(i\)th ciphertext block is now given by \(C_i = E_K(C_{i-1})\oplus M_i\).
-
Encryption: \(C_0 = IV;\;\;\; C_i = E_K(C_{i-1})\oplus P_i\)
-
Decryption: \(P_i = E_K(C_{i-1})\oplus C_i\)
Output Feedback (OFB) Mode¶
The initial vector is repeatedly encrypted to obtain a set of values \(Z_i\) as follows: \(Z_0 = IV\) and \(Z_i = E_K(Z_{i-1})\). These values \(Z_i\) are now used as though they were the key for a one-tie pad, so that \(C_i = Z_i\oplus M_i\). The ciphertext is the concatenation of the IV and these individual blocks: \(C = IV\cdot C_1\cdot C_2\cdots C_l\).
In OFB, it is easy to tampe with ciphertexts. Suppose that the adversary happens to know that the \(j\)th block of the message \(M_j\), specifies the amount of money being transferred to his account from the bank, and suppose he knows that \(M_j = 100\). Since he knows both \(M_j\) and \(C_j\), he can determine \(Z_j\). He can then substitute any \(n\)-bit block in place of \(M_j\) and get a new ciphertext \(C'_j\) where 100 is replaced by any amount of his choice.
-
Encryption: \(Z*0 = IV;\;\;\; Z_i = E_K(Z*i-1);\;\;\; C_i = M_i\oplus Z_i\)
-
Decyption: \(P_i = C_i\oplus Z_i\)
Counter (CTR) Mode¶
A counter is initialized to IV and repeatedly incremented and encrypted to obtain a sequence that can now be used as though they were they keys for a one-time pad, namely \(Z_i = E_K(IV + i)\) and \(C_i = Z_i\oplus M_i\). In CTR, the IV is renamed the nonce.
-
Encryption: \(C_i = E_K(IV + i) \oplus M_i\)
-
Decryption: \(M_i = E_K(IV + i) \oplus C_i\)
Parallelization¶
Some modes require successive blocks to be encrypted or decrypted sequentially, but some of them can be parallelized:
-
CBC encryption cannot be parallized since it depends on \(C_{i-1}\).
-
CBC decryption can be parallelized. Since we already have all the ciphertext blocks
-
CTR encryption and decryption can both be parallelized.
Padding¶
Recall that block ciphers let us encrypt messages that are longer than one block long. What happens if we want to send a message that is not a multiple of the block size? It will depends on which mode is being used. Let's assume the block size is 128 bits (16 characters).
CBC: if the plaintext isn't a multiple of 128 bits, then the last block of plaintext will be slightly shorter than 128 bits. The XOR would be undefined-bitwise XOR. The solution is to pad the plaintext until it is a multiple of 128 bits. But that means we have to remove the padding in decryption.
- PKCS#7 padding pads the message by the number of padding bytes used. This allows the decryption to understand how much bytes to remove.
Not all modes need to be padded. CTR doesn't because the result of XOR never has to be passed into a block cipher again at the end, so the 100 bits left is fine.
Reusing IVs is Insecure¶
Recall ECB is not IND-CPA because it is deterministic. Encrypting the same plaintext twice produces the same output, and this causes information leakage. All other modes introduce a random initialization vector IV that is different on every encryption in order to ensure that encrypting the same plaintext twice with the same key results in different output.
However, reusing IV in CTR is equivalent to reusing the one-time pad.