Encryption Help

Hi, I need to encrypt some data for a client. I’ve looked at a few code samples and have adapted this from Stack Exchange. Does it look okay? Are there any ways to make it more secure?

I have a few questions:

  1. Is it true that MCRYPT_RIJNDAEL_128 == AES 256-bit? If so, why the misnomer?
  2. I’ve heard you can use a MAC in encryption. What is this and do I need to adapt the code below?
  3. Why the need for base64? It seems to work without it?
  4. What is MCRYPT_MODE_CBC and is it the right option?

Thanks.

function encrypt($toEncrypt, $key) {
    $iv = mcrypt_create_iv(mcrypt_get_iv_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC), MCRYPT_DEV_URANDOM);
    
    return base64_encode($iv . mcrypt_encrypt(MCRYPT_RIJNDAEL_128, $key, $toEncrypt, MCRYPT_MODE_CBC, $iv));
	
}

function decrypt($toDecrypt, $key) {
    $ivSize = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC);
    $toDecrypt = base64_decode($toDecrypt);

    return rtrim(mcrypt_decrypt(MCRYPT_RIJNDAEL_128, $key, substr($toDecrypt, $ivSize), MCRYPT_MODE_CBC, substr($toDecrypt, 0, $ivSize)));	
}

$key 		= 'my key';
$string 	= 'Plain text string';

$encrypted = encrypt($string, $key);
echo $encrypted;
echo '<br />';
echo decrypt($encrypted, $key);

PHP’s documentation about each cipher is sparse, but most likely the “128” and “256” are referring to different things. Rijndael 128 is probably referring to the block size, because Rijndael can have one of several block sizes, so it’s important to specify which one you’re using. AES, on the other hand, allows only a single block size, 128-bit. The 256-bit is almost certainly referring to the key size.

Message authentication code. Encryption provides privacy, but not integrity. Just because a third party can’t read the message doesn’t mean they didn’t flip a bit somewhere. A MAC is basically a keyed hash of the message: the receiver, who shares the key, recomputes it to verify the message came through unaltered.
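To make the MAC idea concrete, here is a minimal sketch using PHP's built-in hash_hmac() and hash_equals(). The key, the messages and the helper name isAuthentic() are made up for illustration; nothing here comes from the original poster's code.

```php
<?php
// Hypothetical values, for illustration only.
$macKey  = random_bytes(32);                 // secret shared by sender and receiver
$message = 'transfer 100 to account 42';

// Sender: compute the tag and send it along with the message.
$tag = hash_hmac('sha256', $message, $macKey, true);

// Receiver: recompute the tag and compare in constant time.
function isAuthentic(string $message, string $tag, string $macKey): bool
{
    $expected = hash_hmac('sha256', $message, $macKey, true);
    return hash_equals($expected, $tag);
}

var_dump(isAuthentic($message, $tag, $macKey));                      // untouched message
var_dump(isAuthentic('transfer 900 to account 42', $tag, $macKey));  // altered message
```

Without the key an attacker cannot produce a matching tag for an altered message, which is what separates a MAC from a plain hash.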

Base64 turns binary data into printable characters. Whether that’s useful or not depends on what you plan to do next.
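As a rough illustration of when base64 matters: raw ciphertext usually isn't valid UTF-8, so anything that expects text (JSON, a URL, a text column) can choke on it. The bytes below are an arbitrary stand-in for ciphertext, chosen only because they are not valid UTF-8.

```php
<?php
// Stand-in for ciphertext: four bytes that are not valid UTF-8.
$binary = "\xff\xfe\x00\x10";

// json_encode() refuses malformed UTF-8 and returns false.
var_dump(json_encode(['data' => $binary]));

// The base64 form is plain ASCII, so it encodes without trouble...
$encoded = base64_encode($binary);
var_dump(json_encode(['data' => $encoded]));

// ...and decodes back to exactly the original bytes.
var_dump(base64_decode($encoded, true) === $binary);
```

If the data goes straight into a binary-safe store, the encoding step only adds about a third to the size.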

Block ciphers encrypt your message one chunk at a time. But what if two chunks have the same data? Maybe your message has a long line of whitespace, for example. If the cipher is given identical inputs, then it will produce identical outputs. That is, part of the ciphertext will appear to repeat. And that’s bad. That would leak information about the message. So we need a system for each block such that even identical inputs would produce different outputs. We call such a system a block cipher mode. There are several good ones to choose from (and some bad ones). CBC is a good one and also the most commonly used.
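The repeating-block problem is easy to see for yourself. This sketch uses the OpenSSL extension rather than mcrypt (mcrypt has not shipped with PHP since 7.2), and compares ECB, which has no chaining, with CBC. The key and plaintext are throwaway values.

```php
<?php
$key       = random_bytes(16);
$plaintext = str_repeat('A', 32);   // two identical 16-byte blocks

// ECB: every block is encrypted independently of the others.
$ecb = openssl_encrypt($plaintext, 'aes-128-ecb', $key, OPENSSL_RAW_DATA);

// CBC: every block is mixed with the previous ciphertext block (or the IV).
$iv  = random_bytes(16);
$cbc = openssl_encrypt($plaintext, 'aes-128-cbc', $key, OPENSSL_RAW_DATA, $iv);

$ecbRepeats = substr($ecb, 0, 16) === substr($ecb, 16, 16);
$cbcRepeats = substr($cbc, 0, 16) === substr($cbc, 16, 16);

echo 'ECB blocks repeat: ', var_export($ecbRepeats, true), "\n";
echo 'CBC blocks repeat: ', var_export($cbcRepeats, true), "\n";
```

ECB is one of the "bad ones": the two identical plaintext blocks produce two identical ciphertext blocks, while under CBC they differ.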

The $key isn’t a key yet. It’s a password. A cryptographic key usually needs to be a very specific size, such as 256 bits. PHP is probably masking the error by null padding the string. Typically you would use something like PBKDF2 to convert a password into a key. Also, the message to encrypt needs to be padded to a multiple of the block size, then unpadded on decrypt. PHP seems to be masking this error too by just tacking on some nulls.
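Here is a small sketch of turning a password into a fixed-size key with PBKDF2, using hash_pbkdf2() (available from PHP 5.5). The password and the iteration count are placeholder assumptions; the salt is random and would be stored next to the ciphertext.

```php
<?php
$password   = 'correct horse battery staple'; // hypothetical password
$salt       = random_bytes(16);               // not secret; store it with the ciphertext
$iterations = 10000;                          // assumed value; tune it for your hardware

// Ask for 32 raw bytes, i.e. a 256-bit key.
$key = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);

echo strlen($key), " bytes\n";
```

The same password and salt always give the same key, so decryption can re-derive it; a different salt gives an unrelated key.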

But more importantly, encryption is very easy to get wrong, even when you have some idea of what you’re doing. It’s almost always better to use existing tools or systems rather than hand roll your own. Why don’t you go into more detail about what your client needs, what goal he’s trying to achieve, and we can probably find you a better solution.

Great answer, thanks for taking the time to write it!

  1. Got you!
  2. See context below and see if you think MAC is required
  3. Do you mind giving an example where base64 would help/be required?
  4. Got you!
  5. If I add in hash_pbkdf2 sha256 would that do the job?

The client has sensitive customer information (passwords for networks, etc). Each customer would have a number of rows in a database that contain encrypted data. As well as logging in, the client would need to enter an encryption password per customer to decrypt the data, so the data is only decrypted within the scope of a single page load. So to get at the data, the user would need to be logged in and also have the encryption password.

If the database got hacked the hacker would need to break each customer separately much the same way you salt hashed passwords.

I guess it would be similar to how OS X unlocks your keychain when you log in.

If I convert my password to a 256-bit key would I be good to go?

Just been looking into this a bit more and have adapted the code from the PHP page:

# --- ENCRYPTION ---

# the key should be random binary, use scrypt, bcrypt or PBKDF2 to
# convert a string into a key
# key is specified using hexadecimal
$key = pack('H*', hash('sha256', 'human readable password goes here?')); // CHANGE 1: I added the hash function

# show key size use either 16, 24 or 32 byte keys for AES-128, 192
# and 256 respectively
$key_size =  strlen($key);
echo "Key size: " . $key_size . "\n";

$plaintext = "This string was AES-256 / CBC / ZeroBytePadding encrypted.";

# create a random IV to use with CBC encoding
$iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC);
$iv = mcrypt_create_iv($iv_size, MCRYPT_DEV_URANDOM); // CHANGE 2: I changed to MCRYPT_DEV_URANDOM

# creates a cipher text compatible with AES (Rijndael block size = 128)
# to keep the text confidential 
# only suitable for encoded input that never ends with value 00h
# (because of default zero padding)
$ciphertext = mcrypt_encrypt(MCRYPT_RIJNDAEL_128, $key,
							 $plaintext, MCRYPT_MODE_CBC, $iv);

# prepend the IV for it to be available for decryption
$ciphertext = $iv . $ciphertext;

# encode the resulting cipher text so it can be represented by a string
$ciphertext_base64 = base64_encode($ciphertext);

echo  $ciphertext_base64 . "\n";

# === WARNING ===

# Resulting cipher text has no integrity or authenticity added
# and is not protected against padding oracle attacks.

# --- DECRYPTION ---

$ciphertext_dec = base64_decode($ciphertext_base64);

# retrieves the IV, iv_size should be created using mcrypt_get_iv_size()
$iv_dec = substr($ciphertext_dec, 0, $iv_size);

# retrieves the cipher text (everything except the $iv_size in the front)
$ciphertext_dec = substr($ciphertext_dec, $iv_size);

# may remove 00h valued characters from end of plain text
$plaintext_dec = mcrypt_decrypt(MCRYPT_RIJNDAEL_128, $key,
								$ciphertext_dec, MCRYPT_MODE_CBC, $iv_dec);

echo  rtrim($plaintext_dec) . "\n"; // CHANGE 3: I added rtrim

See above:

CHANGE 1: I added the hash function
CHANGE 2: I changed to MCRYPT_DEV_URANDOM
CHANGE 3: I added rtrim

If I want to password-protect data and create a 32 byte key is sha256ing it as I’ve done the way to go? Should it be salted?

Finally, can someone please explain the “only suitable for encoded input that never ends with value 00h” comment?

Thanks.

The comments say “use scrypt, bcrypt or PBKDF2”, so how come you didn’t use one of those?

Rather than comment about this limitation, why not fix it? You need to pad your message to a multiple of the block size in a way that’s reversible. See the Wikipedia article “Padding (cryptography)”.

Why? The decryption routine shouldn’t be altering the message.
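To illustrate the “never ends with value 00h” comment: once zero padding has been added, there is no way to tell padding nulls from nulls that belonged to the data. The zeroPad() helper below is a stand-in written for this example to mimic mcrypt's default behaviour; it is not an mcrypt function.

```php
<?php
$blockSize = 16;

// Zero padding, re-implemented here purely for illustration.
function zeroPad(string $data, int $blockSize): string
{
    $remainder = strlen($data) % $blockSize;
    return $remainder === 0 ? $data : $data . str_repeat("\0", $blockSize - $remainder);
}

$original  = "binary\x00\x00";   // data that legitimately ends in two null bytes
$recovered = rtrim(zeroPad($original, $blockSize), "\0");

var_dump($recovered === $original);   // the trailing nulls from the data are gone too
```

Calling rtrim() with no second argument is worse still, because it also strips trailing spaces, tabs and newlines from the decrypted text.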

I was once contemplating encrypting credit card numbers and storing them.
I am convinced the encrypted value will never be cracked without the keys.
Everything gets encrypted twice; refresh your screen and you will see that for the same data, the encrypted string is different every time.
Also you can see why base64 is important: you can store its results in the database.

function encrypt($string,$key, $type) {

	srand((double) microtime() * 1000000); //for sake of MCRYPT_RAND
	$key = md5($key); //to improve variance
	if ($type == '1'){
		    $td = mcrypt_module_open('serpent', '', 'cfb', '');
		}
	if ($type == '2'){
		    $td = mcrypt_module_open('rijndael-256', '', 'cfb', ''); 
		}
	$key = substr($key, 0, mcrypt_enc_get_key_size($td));
	$iv_size = mcrypt_enc_get_iv_size($td);
	$iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
	/* Initialize encryption handle */
	if (mcrypt_generic_init($td, $key, $iv) != -1) {
	
		/* Encrypt data */
		$c_t = mcrypt_generic($td, $string);
		mcrypt_generic_deinit($td);
		mcrypt_module_close($td);
		
		$c_t = $iv.$c_t;	  
		if ($type == '1'){
			return $c_t;
		}
		elseif ($type == '2'){
			echo '<br>before b64 '.$c_t.'<br>';
			$encoded_64 = base64_encode($c_t); 
			echo '<br>after b64 '.$encoded_64.'<br>';
			
			return $encoded_64;
		} 
	} //end if
}

function decrypt($string, $key, $type) {
	$key = md5($key); //to improve variance
	/* Open module */
	if ($type == '1'){
		$td = mcrypt_module_open('serpent', '', 'cfb', '');
	}
	if ($type == '2'){
		$td = mcrypt_module_open('rijndael-256', '', 'cfb', '');
		/* Type 2 output was base64-encoded after the IV was prepended, so decode it first */
		$string = base64_decode($string);
	}
	$key = substr($key, 0, mcrypt_enc_get_key_size($td));
	/* Split the prepended IV from the ciphertext */
	$iv_size = mcrypt_enc_get_iv_size($td);
	$iv = substr($string, 0, $iv_size);
	$string = substr($string, $iv_size);
	/* Initialize decryption handle */
	if (mcrypt_generic_init($td, $key, $iv) != -1) {

		/* Decrypt data */
		$c_t = mdecrypt_generic($td, $string);

		mcrypt_generic_deinit($td);
		mcrypt_module_close($td);
		return $c_t;
	} //end if
}

$key1 = 123;
$key2 = 321;
$str = 'My name is mud ~ Primus';

$enc1 = encrypt($str, $key1, '1');
$enc2 = encrypt($enc1, $key2, '2');

/* Undo the layers in reverse order: outer (type 2) first, then inner (type 1) */
$dec1 = decrypt($enc2, $key2, 2);
$dec2 = decrypt($dec1, $key1, 1);

echo $dec2;

Don’t. If you’re caught, regardless of whether or not your security is defeated, the fine is up to $500,000 per number. You must be licensed to store such numbers and the auditing process on your code is well north of $10,000 last time I looked it up. Besides, Verisign and other payment gateways provide means to store credit cards for recurrent charges without needing to take the risk of storing the numbers on your server.

MCRYPT_RAND isn’t actually a cryptographically secure random number generator. And believe it or not, sometimes weak random numbers are all it takes to break the system. Also, your seed provides only 10^6 possibilities. That needs to be a lot closer to 10^38 (roughly 2^128).
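For comparison, a short sketch of the two search spaces. It uses random_bytes() (PHP 7+), which draws from the operating system's secure generator and needs no seeding; the arithmetic simply converts the 10^6 figure above into bits.

```php
<?php
// random_bytes() reads from the operating system's CSPRNG; no srand() involved.
$iv = random_bytes(16);                  // 16 bytes = 128 bits
echo bin2hex($iv), "\n";

// Rough size of each search space, in bits.
$seedBits = log(10 ** 6, 2);             // a seed with 10^6 possible values
$ivBits   = strlen($iv) * 8;             // a full 128-bit random value
printf("seed: about %.1f bits, IV: %d bits\n", $seedBits, $ivBits);
```

An attacker who only has to try about 2^20 seeds can regenerate every "random" IV the seeded generator could have produced.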

Let’s say I provide 16 bytes (128 bits) for the key. You then hash it and get back a hex string. Then mcrypt_enc_get_key_size tells you it needs 16 key bytes, so you substr from the hex string. But since each character of hex represents only 4 bits worth of data, you end up severely diminishing the key strength.
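The hex problem in numbers, following the 16-byte example above. The password is made up, and md5() appears only because the code under discussion uses it; this is not a recommendation to derive keys with md5.

```php
<?php
$password = 'hunter2';                        // hypothetical input

$hexKey = substr(md5($password), 0, 16);      // 16 hex characters: only 0-9 and a-f
$rawKey = md5($password, true);               // 16 raw bytes: any value 0-255

// Each hex character carries 4 bits; each raw byte carries 8.
echo strlen($hexKey) * 4, " bits\n";
echo strlen($rawKey) * 8, " bits\n";
```

Both keys are 16 bytes long as far as the cipher is concerned, but the hex one can only take 2^64 distinct values instead of 2^128.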

That doesn’t actually help anything.

Did you talk to professional cryptanalysts? Did any security expert review your code? You should never declare something secure just because you can’t break it. You actually have several glaring security issues.

Encryption is very easy to get wrong, even when you have some idea of what you’re doing. Avoid at all costs trying to do it yourself. And if for some reason you absolutely have to, have several security experts triple check your work.

Okay, I have read up on it a lot more now and believe I have it working to a decent standard.

The comments say “use scrypt, bcrypt or PBKDF2”, so how come you didn’t use one of those?

Fair point, see below.

Rather than comment about this limitation, why not fix it? You need to pad your message to a multiple of the block size in a way that’s reversible.

I didn’t actually add that comment, it was from the PHP site — which is why I was questioning it.

Why? The decryption routine shouldn’t be altering the message.

From what I gather if you have any null padding on the data you want to encrypt this can get lost after decryption; rtrim is used to remove padding. However, not all block modes require padding (see my example that uses CFB). See http://stackoverflow.com/questions/1061765/should-i-trim-the-decrypted-string-after-mcrypt-decrypt for more information.

Here is what I came up with. It uses hash_pbkdf2 which isn’t available < PHP 5.5. That isn’t a problem for me but if you need a < 5.5 version see http://stackoverflow.com/questions/1788150/how-to-encrypt-string-in-php/19445173#19445173

The code below has a hard-coded password and encrypts and decrypts all in one go. Obviously, you wouldn’t do it this way in reality but it gives you an example of how to use pbkdf2 and mcrypt together.

Any thoughts or issues with this? I’ve base64’d the salt, IV and encrypted data so as to store it in a database. Are there any issues in using MCRYPT_MODE_CFB? Are 5000 iterations secure enough without being too slow?

// Check hash_pbkdf2 exists as it's not available < 5.5
if(function_exists('hash_pbkdf2') === false) {

    // A fallback could be implemented here
    exit('hash_pbkdf2() does not exist');

}

// Settings
$stringToEncrypt                = 'This was encrypted';        // The plain text string to encrypt; binary files should be base64 encoded
$encryptionPassword             = 'MyP@$$w0rd';                // The password; the only data that should be kept private
$pbkdf2Iterations               = 5000;                        // More iterations = more secure, 1000 should be used as a minimum
$pbkdf2HashAlgo                 = 'sha256';                    // 256-bit key size
$cipher                         = MCRYPT_RIJNDAEL_128;         // Note: 128 refers to block size, not key size; MCRYPT_RIJNDAEL_128 is similar to 256-bit AES
$blockMode                      = MCRYPT_MODE_CFB;            // CFB mode requires a random IV and does not require padding
$ivSource                       = MCRYPT_DEV_URANDOM;        // URANDOM is faster than RANDOM in some environments

// Create key using PBKDF2 hash
$ivSize                         = mcrypt_get_iv_size($cipher, $blockMode);
$pbkdf2Salt                     = mcrypt_create_iv($ivSize, $ivSource);
$key                            = hash_pbkdf2($pbkdf2HashAlgo, $encryptionPassword, $pbkdf2Salt, $pbkdf2Iterations);

// Pack the key into a binary hex string
$key                            = pack('H*', $key);

// Create random IV
$iv                             = mcrypt_create_iv($ivSize, $ivSource);

// Encrypt the data
$encryptedData                  = mcrypt_encrypt($cipher, $key, $stringToEncrypt, $blockMode, $iv);

// Prepend the IV as it's needed for decryption
$encryptedData                  = $iv . $encryptedData;
    
// Base64 encode it
$encryptedData                  = base64_encode($encryptedData);

// Decrypt
$decryptedData                  = base64_decode($encryptedData);

// Get the IV
$ivDecrypted                    = substr($decryptedData, 0, $ivSize); // Don't need to use mb_substr as it's hexadecimal

// Get the encrypted data on its own
$decryptedData                  = substr($decryptedData, $ivSize);

// Decrypted data
$decryptedData                  = mcrypt_decrypt($cipher, $key, $decryptedData, $blockMode, $ivDecrypted);

// Output for testing
echo "<p>stringToEncrypt: $stringToEncrypt</p>";
echo "<p>encryptionPassword: $encryptionPassword</p>";
echo "<p>pbkdf2Iterations: $pbkdf2Iterations</p>";
echo "<p>pbkdf2HashAlgo: $pbkdf2HashAlgo</p>";
echo "<p>ivSize: $ivSize</p>";
echo "<p>iv: $iv</p>";
echo "<p>pbkdf2Salt: $pbkdf2Salt</p>";
echo "<p>key: $key</p>";
echo "<p>encryptedData: $encryptedData</p>";
echo "<p>ivDecrypted: $ivDecrypted</p>";
echo "<p>decryptedData: $decryptedData</p>";

Trimming null padding is a poor-man’s solution. The better solution is to use a padding scheme that is guaranteed to always be reversible (null padding isn’t).

But this is a moot point now with CFB.

Not a big issue, but databases are capable of storing binary data. You probably don’t need to bother with base64 in this case.

Nope.

However many iterations it takes to hit an ideal (for security) running time of between 0.25 and 0.5 seconds. (That range was suggested by ircmaxell, but it squares with other recommendations I recall reading over time.) I haven’t benchmarked it, but I suspect 5000 will be a good number.
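Since the right iteration count depends on the server, it is easy enough to measure. This is a rough sketch; timePbkdf2() is a helper invented for this example and the iteration counts are arbitrary starting points.

```php
<?php
// Time a single PBKDF2-SHA256 derivation for a given iteration count.
function timePbkdf2(int $iterations): float
{
    $salt  = random_bytes(16);
    $start = microtime(true);
    hash_pbkdf2('sha256', 'benchmark password', $salt, $iterations, 32, true);
    return microtime(true) - $start;
}

foreach ([5000, 50000] as $iterations) {
    printf("%d iterations: %.4f s\n", $iterations, timePbkdf2($iterations));
}
```

Run it on the production hardware and raise the count until a derivation lands in the target range.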

Not a big deal but just a tip, the hash_pbkdf2 function has a “raw_output” parameter that will cause the result to be returned as raw binary data rather than a hex string. It’ll save you from later on having to convert the hex string back to binary data again.
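A quick check that the raw-output route gives the same key as the hex-then-pack route. The password and iteration count are the ones from the code above; the salt is generated fresh for the example.

```php
<?php
$password   = 'MyP@$$w0rd';
$salt       = random_bytes(16);
$iterations = 5000;

// Route used in the code above: hex string, then pack() back to binary.
$fromHex = pack('H*', hash_pbkdf2('sha256', $password, $salt, $iterations));

// Same derivation with the raw-output flag set (length 0 means "full hash length").
$raw = hash_pbkdf2('sha256', $password, $salt, $iterations, 0, true);

var_dump($fromHex === $raw);
var_dump(strlen($raw));
```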

Only the comment is wrong here. $decryptedData isn’t hexadecimal. It’s raw binary data. The real reason you don’t need mb_substr is because multi-byte operations are for text encodings, such as UTF-8, where a single character might span multiple bytes.
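A small sketch of the difference. The mb_* calls are guarded because the mbstring extension is optional; the sample word and the fake blob are made up for the example.

```php
<?php
$text = "na\u{00EF}ve";   // "naive" with a diaeresis, UTF-8 encoded: 5 characters, 6 bytes

echo strlen($text), "\n";                     // counts bytes
if (function_exists('mb_strlen')) {
    echo mb_strlen($text, 'UTF-8'), "\n";     // counts characters
}

// An IV plus ciphertext is just bytes, so byte offsets are exactly what we want.
$blob = random_bytes(16) . random_bytes(32);  // stand-in for IV . ciphertext
$iv   = substr($blob, 0, 16);
$rest = substr($blob, 16);

echo strlen($iv), ' + ', strlen($rest), "\n";
```

Using mb_substr() on such a blob would be the mistake, since it would try to interpret arbitrary bytes as characters.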


Summary:

Things are looking good now. Password to key generation and iterations are good. Cipher is good. Block mode is good. Padding issue was avoided. Salts and IVs are good, and random data source is good. Everything looks good to my eyes.

But… DISCLAIMER… although I consider myself reasonably knowledgeable on this topic, I’m far from an expert. If you (or your client) want to be truly sure that everything is good, then you should contract a security expert to audit the code and process.

Trimming null padding is a poor-man’s solution.

I agree, that’s why I thought I’d use CFB instead.

That range was suggested by ircmaxell, but it squares with other recommendations I recall reading over time.

Great link, I am familiar with ircmaxell from Stackoverflow. I use the excellent bcrypt for password hashing so am familiar with most of that.

Only the comment is wrong here. $decryptedData isn’t hexadecimal. It’s raw binary data. The real reason you don’t need mb_substr is because multi-byte operations are for text encodings, such as UTF-8, where a single character might span multiple bytes.

Yep, good point.

But… DISCLAIMER… although I consider myself reasonably knowledgeable on this topic, I’m far from an expert. If you (or your client) want to be truly sure that everything is good, then you should contract a security expert to audit the code and process.

Thanks for sticking with me. I have found that encryption in this manner isn’t particularly well documented in PHP (perhaps because hash_pbkdf2 is quite new) so hopefully this’ll serve as a good reference point for others.

I’ve just been looking at adding a HMAC. That makes sure that data is the same as what you encrypted, right?

Is this simply hashing the encrypted data with hash_hmac using the encryption key and appending it to the encrypted data? Then on decryption you split off the HMAC, hash_hmac the rest of the data with the key again and check it matches the HMAC.

I’m guessing if you get hash_hmac to return raw output and then use strcmp on the HMAC you can keep all the data binary.

Is that how you do it?

Correct. That kind of protection is typically called data integrity.

I have to admit that this particular detail is a hole in my knowledge. But I searched around, this seems useful, and it seems to confirm your approach.

Yup, you should be able to keep it all in binary.

Thanks.

I got it all working and put it into a class. It appears the HMAC is simply hash_hmac-ing it after encryption and then checking again before decryption. You can HMAC before encryption but there are drawbacks (see http://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac).
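For anyone following along, here is one way the encrypt-then-MAC flow might look. This is a sketch under stated assumptions, not the class mentioned above: it uses OpenSSL's AES-256-CBC instead of mcrypt, it uses a separate key for the HMAC (the thread reuses the encryption key), it compares with hash_equals() rather than strcmp() to avoid timing differences, and the function names are invented.

```php
<?php
// Encrypt, then append an HMAC-SHA256 tag over everything that will be stored.
function encryptThenMac(string $plaintext, string $encKey, string $macKey): string
{
    $iv         = random_bytes(16);
    $ciphertext = openssl_encrypt($plaintext, 'aes-256-cbc', $encKey, OPENSSL_RAW_DATA, $iv);
    $payload    = $iv . $ciphertext;
    return $payload . hash_hmac('sha256', $payload, $macKey, true);   // tag is 32 raw bytes
}

// Check the tag first; only decrypt if it matches. Returns null on any failure.
function verifyThenDecrypt(string $blob, string $encKey, string $macKey): ?string
{
    if (strlen($blob) < 16 + 16 + 32) {
        return null;                          // shorter than IV + one block + tag
    }
    $payload  = substr($blob, 0, -32);
    $tag      = substr($blob, -32);
    $expected = hash_hmac('sha256', $payload, $macKey, true);
    if (!hash_equals($expected, $tag)) {
        return null;                          // altered data or wrong key
    }
    $plaintext = openssl_decrypt(
        substr($payload, 16), 'aes-256-cbc', $encKey, OPENSSL_RAW_DATA, substr($payload, 0, 16)
    );
    return $plaintext === false ? null : $plaintext;
}

$encKey = random_bytes(32);
$macKey = random_bytes(32);

$blob = encryptThenMac('network password', $encKey, $macKey);
var_dump(verifyThenDecrypt($blob, $encKey, $macKey));
```

Everything stays binary throughout; base64 would only be needed if the result has to travel as text.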

I have checked this now and it seems CBC and CFB are the best block modes for general file encryption. CFB doesn’t require padding but is a lot slower than CBC. I ran it on a decent reseller server and CBC was up to ten times faster than CFB.

So, I’m thinking about using that instead. I know you mentioned trim and it being a poor man’s padding so it looks like PKCS7 padding is the way to go. According to the Wikipedia article on it:

The value of each added byte is the number of bytes that are added, i.e. N bytes, each of value N are added. The number of bytes added will depend on the block boundary to which the message needs to be extended.

The padding will be one of:

01
02 02
03 03 03
04 04 04 04
05 05 05 05 05
etc.

There is an example on stackoverflow that seems to suggest adding this kind of padding is easy.

$str    = 'Foo';
$block  = 16; // You wouldn't hard code this in practice
$pad = $block - (strlen($str) % $block);
$str .= str_repeat(chr($pad), $pad);

On decryption you simply reverse it:

$len = strlen($str);
$pad = ord($str[$len-1]);
$str = substr($str, 0, strlen($str) - $pad);
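Wrapped up as functions, the same idea might look like the sketch below. The names pkcs7Pad() and pkcs7Unpad() are invented here, and the unpad side adds validity checks that the snippet above leaves out. It assumes a block size between 1 and 255 bytes.

```php
<?php
function pkcs7Pad(string $data, int $blockSize = 16): string
{
    // Always adds between 1 and $blockSize bytes, even when already aligned.
    $pad = $blockSize - (strlen($data) % $blockSize);
    return $data . str_repeat(chr($pad), $pad);
}

// Returns the unpadded data, or null when the padding is malformed.
function pkcs7Unpad(string $data, int $blockSize = 16): ?string
{
    $len = strlen($data);
    if ($len === 0 || $len % $blockSize !== 0) {
        return null;
    }
    $pad = ord($data[$len - 1]);
    if ($pad < 1 || $pad > $blockSize) {
        return null;
    }
    // Every padding byte must carry the same value.
    if (substr($data, -$pad) !== str_repeat(chr($pad), $pad)) {
        return null;
    }
    return substr($data, 0, $len - $pad);
}

$padded = pkcs7Pad('Foo');
echo bin2hex($padded), "\n";
var_dump(pkcs7Unpad($padded));
```

Because a full block of padding is added when the data is already aligned, unpadding is never ambiguous, which is what makes this scheme reversible where null padding is not.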

Is that about it? Are chr and ord used so that it is always padded with an entire byte and thus allows you to pad up to 256-bit sized blocks?

As a final question, is there any way to “stream” an encryption? That is, you read a binary/file stream, then encrypt and save as you go, thus allowing you to encrypt large files without running out of memory. I guess things that require huge amounts of RAM aren’t suitable for web apps but I’m interested all the same.

Yup! :smile:

Large block sizes might not have been the motivation, but yes, padding is in whole bytes.

I glanced through the list of mcrypt functions, but it doesn’t look like they support that. :-/
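Outside of mcrypt, chunked CBC can be done by hand, because each chunk only needs the last ciphertext block of the previous one as its IV. The sketch below uses the OpenSSL extension and an in-memory string to stay self-contained; a real version would fread() from one handle and fwrite() to another. cbcEncryptChunked() is a name made up for this example, and $chunkSize is assumed to be a multiple of 16.

```php
<?php
// Encrypt $data one chunk at a time; $chunkSize must be a multiple of the 16-byte block size.
function cbcEncryptChunked(string $data, string $key, string $iv, int $chunkSize = 4096): string
{
    $chunks = $data === '' ? [''] : str_split($data, $chunkSize);
    $last   = count($chunks) - 1;
    $out    = '';
    foreach ($chunks as $i => $chunk) {
        // Only the final chunk gets PKCS#7 padding; earlier chunks are exact multiples of 16.
        $options   = OPENSSL_RAW_DATA | ($i === $last ? 0 : OPENSSL_ZERO_PADDING);
        $encrypted = openssl_encrypt($chunk, 'aes-256-cbc', $key, $options, $iv);
        $out      .= $encrypted;                 // a file-based version would fwrite() here
        $iv        = substr($encrypted, -16);    // CBC chains on the last ciphertext block
    }
    return $out;
}

$key  = random_bytes(32);
$iv   = random_bytes(16);
$data = random_bytes(10000);

$chunked = cbcEncryptChunked($data, $key, $iv);
$whole   = openssl_encrypt($data, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
var_dump($chunked === $whole);
```

Memory use is then bounded by the chunk size rather than the file size. An HMAC can be fed incrementally in the same loop with hash_init(), hash_update() and hash_final().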

Great, thank you!

Last question (I think!): I read on one stackoverflow post that when you create the HMAC you should hash the IV along with the encrypted data. Seems pointless to me, what are your thoughts? I know you said HMACs weren’t your strong point but as a programmer what extra security do you think this adds?

Thanks again for your help.

Off the top of my head, I can’t think of any reason why that would help. As it is, if you’re HMACing the final encrypted content, then the IV might already be in there somewhere. It’s common for the encrypted content to include a prepended IV.

That’s what I thought. I’m keeping the IV, salt and encrypted data stored in separate database fields. I assume the IV and salt can be public like the salt for a hash.

If you prepended the IV you’d do it after encryption, presumably, to save having to store the IV and encrypted data separately. So HMACing the encrypted data and IV together would add no security. I guess it would confirm the IV was correct but you’d know if the IV was wrong as the decryption would fail. :smile:

Yup.

True. Although, I think one of the side-benefits HMACing is supposed to provide is to allow you to know whether decryption will succeed before you even attempt it. So from that perspective, it can be useful to make sure the IV is included in the HMAC.
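A sketch of why it matters when the IV lives in its own database field. The values are random stand-ins rather than real ciphertext. Note that with CBC a modified IV does not make decryption fail outright; it changes the first plaintext block, so an uncovered IV can be altered without any error being raised.

```php
<?php
$macKey     = random_bytes(32);
$iv         = random_bytes(16);
$ciphertext = random_bytes(48);          // stand-in for real ciphertext

// Two ways to compute the stored tag.
$tagCiphertextOnly = hash_hmac('sha256', $ciphertext, $macKey, true);
$tagWithIv         = hash_hmac('sha256', $iv . $ciphertext, $macKey, true);

// Simulate the IV column being altered in the database.
$tamperedIv    = $iv;
$tamperedIv[0] = chr(ord($tamperedIv[0]) ^ 0x01);

// The ciphertext-only tag never looks at the IV, so it still verifies.
$passesWithoutIv = hash_equals($tagCiphertextOnly, hash_hmac('sha256', $ciphertext, $macKey, true));

// The tag that covers the IV no longer matches.
$passesWithIv = hash_equals($tagWithIv, hash_hmac('sha256', $tamperedIv . $ciphertext, $macKey, true));

var_dump($passesWithoutIv, $passesWithIv);
```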

True. Although, I think one of the side-benefits HMACing is supposed to provide is to allow you to know whether decryption will succeed before you even attempt it. So from that perspective, it can be useful to make sure the IV is included in the HMAC.

Excellent point :slight_smile:

Sort of off-topic but do you think salts should be generated from a cryptographically secure source? I personally don’t think they should, they should just be unique for each password as they are there to prevent rainbow tables. From http://blog.ircmaxell.com/2012/04/properly-salting-passwords-case-against.html:

To put it simply, a salt is a unique value that you use to differentiate multiple hashes from each other. Now, there are a few key words in there. First, for a salt to be effective, it does not need to be random! The only condition that needs to be satisfied is that it’s unique.