Ruby HTTPS web calls


As I noted in my entry on Ruby security, VERIFY_NONE is used all over the place. And what I realized when I tried to use VERIFY_PEER was that it really doesn’t work for net/https, and doesn’t seem to ever have worked for me. I got a bit mystified by this since I couldn’t find much mention about it online. And then Victor Grey came to the rescue in one of the comments. The solution is to not use net/https at all, but instead use the httpclient gem (formerly called http-access2). So do a ‘gem install httpclient’. Then you can use this code:

require 'rubygems'
require 'httpclient'

clnt = HTTPClient.new
puts clnt.get_content("https://www.random.org/")

This will just work. Under the covers, httpclient uses VERIFY_PEER as default. And you can see this by changing the hostname from www.random.org to random.org. That will generate a verification error directly. Awesome. So what’s the lesson? Never use net/https, folks!



Ruby Security quick guide


I’ve looked around a bit and it seems that there is no real good guide to security programming in Ruby. Neither is there any book available (although Maik Schmidts book Enterprise Recipes with Ruby and Rails will be the best reference once it arrives). The aim for this blog entry will be to note a few things you often would like to do, and how you can do it with Ruby. The focus will be mostly on the cryptographic APIs for Ruby, which doesn’t have much documentation either. In fact, the best documentation for this is probably the OpenSSL documentation, once you learn how to map it to the Ruby libraries.

You want to avoid clear text passwords in your database

One of the very good properties of handling passwords, is that you usually don’t need to actually know what they are. The only thing you need to be able to do is to reset them if someone forgets their password, and verify the correct combination of username and password. In fact, many practitioners feel better if they don’t need to have the responsibility of knowing the passwords of every user on their system, and conversely I feel much better knowing that my password is secure from even the administrators of the system. Not that I ever use the same password to two different systems, but someone else might… =)

So how do you solve this easily? Well, the easiest way – and also the most common way – is to use a digest. A digest is a mathematical function that takes as input a series of numbers of any length and returns a large number that represents the input text. There are three properties of digests that makes them useful: the first is that a small change in the input text generate a large effect in the outcome data, meaning that olabini and olbbini will have quite different digests.

The second is digests generally have a good distribution and small risk of collisions – meaning that it’s extremely unlikely that two different texts have the same output for a specific digest algorithm. The third property is that they are fundamentally one way mathematical operations – it’s very easy to go from plain text to digest, but extremely hard to go in the other direction.

There are currently several algorithms used for digests. MD5 and SHA-1 is by far the most common ones. If you can avoid MD5, do so, because there have recently been some successful attacks against it. The same might happen with SHA-1 at some point, but right now it’s the best algorithm for these purposes. Oh, and avoid doing a double digest – digesting the output of a digest algorithm – since this generally makes the plain text easier to crack, not harder.

Let’s see some code:

require 'openssl'

digest = OpenSSL::Digest::SHA1.hexdigest("My super secret pass")
p digest # => "dd5f30310682e5b41e122c637e8542b1b39466cf"

digest = OpenSSL::Digest::SHA1.hexdigest("my super secret pass")
p digest # => "f923786cc72ed61ae31325b6e8e285e6c35e6519"

d = OpenSSL::Digest::SHA1.new
d << "first part of pass phrase"
d << "second part of pass phrase"
p d.hexdigest # => "f13d7bdee0634c017babb8c72dcebe18f9e0598e"

I have used two different variations here. The first one, calling a method directly on the SHA1 class, is useful when you have a small string that you want to digest immediately. It’s also useful when you won’t need different algorithms or send the digest object around. The second method allows you to update the string data to digest several times and the finally generate the end digest from that. I have taken the liberty of using the methods called hexdigest for both cases – this is because they are more readable. If you replace hexdigest with digest, you will get back a Ruby string that contains characters from 0 to 255 instead, which means they doesn’t print well. But if you were to compare the result of hexdigest and digest, you will see that they return the same data, just in different formats.

So as you can see, this is extremely easy to incorporate in your code. To verify a password you just make a digest of the password the person trying to authenticate sent in to you, and compare that to what’s in your database. Of course, the approach is generalizable to other cases when you want to protect data in such a way that you can’t recover the data itself.

Finally, be careful with this approach. You should generally combine the password with something else, to get you good security. There are attacks based on something called rainbow tables that makes it easy to find the password from a digested password. This can be avoided using a salt or other kind of secret data added to the password.

You want to communicate with someone securely
When you want to so send messages back and forth between actors, you generally need a way to turn it unreadable and then turn it back into something readable again. The operation you need for this is called a cipher, but there are many kinds of ciphers, and not all of them are right. For example, rot-13 be considered a cipher, but there is no real security inherent in it.

In general, what you want for communication between two parties is a symmetric cipher using a secret key. A symmetric cipher means that you can use the same key to “lock” and “unlock” a message – or encrypt and decrypt it. There are other kinds of ciphers that are very useful, which we’ll see in the next example, but symmetric ciphers are the most commonly used ciphers since they are quite easy to use and also very efficient. For a symmetric cipher to be completely safe, the key should always be as long as the data that should be encrypted. Generally this doesn’t happen, since key distribution becomes a nightmare, but the current approaches are reasonably sure against most kinds of cracking. Coupled with asymmetric ciphers, they become extremely useful.

There are three things you need to use a symmetric cipher. The first one is the algorithm. There are loads of different kinds of symmetric ciphers around. Which kind you will use doesn’t matter that much as long as you choose something that is reasonably sure. One of the more widely used algorithms is called DES. In it’s original form DES should definitely not be used (since the key length is only 56 bits, it’s actually not that hard to crack it.). There is another form of DES called Triple DES, or DES3, which effectively gives you more security. DES3 might work in some circumstances, but I would recommend AES in almost all cases. AES come in three varieties called AES-128, AES-192 and AES-256. The difference between them is the key length. As you might guess, AES-128 needs a 128 bit long key, AES-192 a 192 bit long and AES-256 a 256 bit long. These key lengths all give reasonable security. The more security you need the longer key algorithm you can choose. The tradeoff is that the longer the key is, the more time it will take to encrypt and decrypt messages.

Once you have an algorithm, you need a key. This can come in two varieties – either you get a key from a humans, where that key is generally a password. Otherwise you might want to automatically generate a key. This should be done with a secure random number generator – NOT with rand().

Depending on which algorithm you choose, you might also need something called an IV (initialization vector). The algorithms that require an IV is called cyclic block ciphers (CBC) and will work on a small amount of bytes at the same time. In the case of AES-128, a block of 16 bytes are generated on every cycle of the algorithm. These 16 bytes will be based on the algorithm, the key, and the 16 bytes generated the last time. The problem is that the first time there were no bytes generated, which means these will have to be initialized another way and this is where the IV comes in. It’s just the first block that will be used for generating the first real block of data. The IV does not need to be secured and can be sent in the clear. In Ruby, if you don’t provide an IV when initializing your cipher, the default will be a part of the string “OpenSSL for Ruby rulez!”. Depending what length the IV should have, a substring will be used.

So, let’s take a look at some code. This code will just encrypt a message with a password and then decrypt it again:

require 'openssl'

cipher = OpenSSL::Cipher::AES128.new("CBC")
cipher.encrypt
cipher.key = "A key that is 16"
cipher.iv =  "An IV that is 16"

output = ""

output << cipher.update("One")
output << cipher.update(" and two")
output << cipher.final

p output # => "\023D\aL\375\314\277\264\256\245\225\a\360|\372+"

cipher = OpenSSL::Cipher::AES128.new("CBC")
cipher.decrypt
cipher.key = "A key that is 16"
cipher.iv =  "An IV that is 16"

output2 = ""

output2 << cipher.update(output)
output2 << cipher.final

p output2 # => "One and two"

We first create a cipher instance, giving it CBC as the type. This is the default but will generate a warning if you don’t supply. We tell the cipher to encrypt, then initialize it with key= and iv=. The key is 16 bytes because it’s a 128 bit cipher, and the IV is 16 bytes because that’s the block size of AES. Finally we call update on the data we would like to encrypt. We need to save the return value from the update call, since that’s part of the generated cipher text. This means that even encrypting really large texts can be very efficient, since you can do it in smaller pieces. Finally you need to call final to get the last generated cipher text. The only difference when decrypting is that we call decrypt on the object instead of encrypt. After that the same update and final calls are made. (Am I the only one who thinks that encrypt and decrypt should be called encrypt! and decrypt!)?

That’s how easy it is to work with ciphers in Ruby. There are some complications with regards to padding, but you generally don’t need to concern yourself with that in most applications.

You want nodes to be able to communicate with each other securely without the headache of managing loads of secret keys

OK, cool, symmetric ciphers are nice. But they have one weakness, which is the secret key. The problem shows up if you want to have a network of computers talking to each other securely. Of course, you can have a secret key for all of them, which means that all nodes in the network can read messages to other parties. Or you can have a different secure key for each combination of nodes. That’s ok if you have 3 nodes (when you will just need 3 secret keys). But if you have a network with a 100 nodes that all need to communicate with each other you will need 4950 keys. It will be hard to distribute these keys since they need to be kept secure, and generally just managing it all will be painful.

The solution to this problem is called asymmetric ciphers. The most common form of this is public key cryptography. The idea is that you need one key to encrypt something, and another key to decrypt it. If you have key1 and encrypt something with it, you can’t decrypt that something with key1. This curious property is extremely useful. In the version of public key cryptography you generate two keys, and then you publish one of the keys widely around. Since the public key can’t be used to decrypt content there is no security risk in not keeping it secret. The private key should obviously be kept private, since you can generate the public key from it. What does this mean in practice? Well, that in your 100 node network, each node can have a key pair. When node3 wants to send a message to node42, node3 will ask around for the public key of node42, encrypt his message with that and send it to node42. Finally, node42 will decrypt the message using his private key.

There is one really large downside with these ciphers. Namely, it is quite expensive operations, even compared to other ciphers. So the way you generally solve this is to generate a random key for a symmetric cipher, encrypt the message with that key and then encrypt the actual key using the asymmetric cipher. This is how https works, it’s how SSH works, it’s how SMIME and PGP works. It’s generally a good way of matching the strengths and weaknesses of ciphers with each other.

So how do you use an asymmetric cipher? First you need to generate the keys, and then you can use them. To generate the keys you can use something like this:

require 'openssl'

key = OpenSSL::PKey::RSA.generate(1024)
puts key.to_pem
puts key.public_key.to_pem

This will give you the output in form of one private key and one public key in PEM format. Once you have this saved away somewhere, you can start distributing the public key. As the name says, it’s public so there is no problem with distributing it. Say that someone has the public key and wants to send you a message. If the PEM-encoded public key is in a file called public_key.pem this code will generate a message that can only be decrypted with the private key and then read the private key and decrypted the message again.

require 'openssl'

key = OpenSSL::PKey::RSA.new(File.read("public_key.pem"))
res = key.public_encrypt("This is a secret message. WOHO")
p res

key2 = OpenSSL::PKey::RSA.new(File.read("private_key.pem"))
res2 = key2.private_decrypt(res)
p res2

Note that the methods we use are called public_encrypt and private_decrypt. Since the public and private prefix is there, there has to be a reason for it. And there is, as you’ll see soon.

You want to prove that it was you and only you who wrote a message

Being able to send things in private is all good and well, but if you distribute you public key wide you may never know who you get messages from. Or rather, you know that you get a message from their addresses (if you’re using mail) – but you have no way of ensuring that the other person is actually who they say they are. There is a way around this problem too, using asymmetric ciphers. Interestingly, if you encrypt something using your private key, anyone with your public key can decrypt it. THat doesn’t sound very smart from a security perspective, but it’s actually quite useful. Since you are the only one with your private key, if someone can use your public key to decrypt it, then you have to be the one who wrote the message. This have two important consequences. First, you can always trust that the person who wrote you a message was actually the one writing it. And second, that person can never retract a message after it’s been written, since it’s been signed.

So how do you do this in practice? It’s called cryptographic signatures, and OpenSSL supports it quite easily. If you have the keys we created earlier, you can do something like this using the low level operations available on the keys:

require 'openssl'

pub_key = OpenSSL::PKey::RSA.new(File.read("public_key.pem"))
priv_key = OpenSSL::PKey::RSA.new(File.read("private_key.pem"))

text = "This is the text I want to send"
signature = priv_key.private_encrypt(text)

if pub_key.public_decrypt(signature) == text
  puts "Signature match"
else
  puts "Signature didn't match!"
end

There happens to be a slight problem with this code. It works, but if you want to send the signature along the message will always be twice the size of the original text. Also, the larger the text to encrypt the longer it takes. The way this is solved in basically all cases is to first create a digest of the text and then sign that. The code to do that would look like this:

require 'openssl'

pub_key = OpenSSL::PKey::RSA.new(File.read("public_key.pem"))
priv_key = OpenSSL::PKey::RSA.new(File.read("private_key.pem"))

text = "This is the text I want to send"*200

signature = priv_key.sign(OpenSSL::Digest::SHA1.new,text)

if pub_key.verify(OpenSSL::Digest::SHA1.new, signature, text)
  puts "Signature verified"
else
  puts "Signature NOT verified"
end

As you can see we use the sign and verify methods on the keys. We also have to send in the digest to use.

You want to ensure that a message doesn’t get modified in transit

Another usage of signatures, that we actually get for free together with them, is that there is no way to tamper with the message in transit. So even if you send everything in clear text, if the signature verification succeeds you know that the message can’t have been tampered with. The reason is simply this: if the text was tampered with, the digest that gets generated would be different, meaning that the signature would not be equal to what’s expected. And if someone tampered with the signature, it wouldn’t match either. Theoritically you can tamper with both the text and the signature, but then you’d first have to crack the private key used, since otherwise you would never be able to generate a correct signature. All of this comes for free with the earlier code example.

You want to ensure that you can trust someones public key
Another thing that can get troublesome with public and private keys is the distribution problem. When you have lots of agents you need to make sure that the public key you get from someone is actually their public key and that they are the ones they say. In real life this is a social problem as much as a technological. But the way you generally make sure that you can trust that someones public key is actually theirs is that you use somethong called a certificate. A certificate more or less is a signature, where the signer is someone you trust, and the text that is signed is the actual public key under question. If you trust someone that issues certificates, and that someone has issued a certificate saying that a public key belongs to a specific entity, you can trust that public key and use it. What’s more interesting is that certificates can also be signed, which means that you can trust someone, that someone can sign another certificate authority, which signs another certificate authority, which finally signs a public key. If you trust the root CA (certificate authority) you should also be able to trust all the certificates and public keys in the chain. This is how the infrastructure surrounding https works. You have a list of implicitly trusted certificates in your browser, and if an https site can’t be verified with this trust store, you will get one of those popup boxes asking if you really want to trust this site or not.

I will not show any code for this, since X509 certificates are actually quite well documented for Ruby. They are also a solution that takes some more knowledge to handle correctly, so a real book would be recommended.

You want to use https correctly

The protocol https is very useful, since it allows you to apply the whole certificate and asymmetric cipher business to your http communication. There is a library called net/https which works really well if you use it right. The problem is that I see lots and lots of example code that really doesn’t use it correctly. The main problem is the setting of verification level. If you have the constant VERIFY_NONE anywhere in your code, you probably aren’t as secure as you’d like to think. The real problem is that I’ve never been able to get Ruby to verify a certificate with VERIFY_PEER. No matter what I do in the manner of adding stores and setting ca_file and ca_path, I haven’t ever been able to make this work. It’s quite strange and I can’t seem to find much mention of it online, since everyone just uses VERIFY_NONE. That’s not acceptable to me, since if no verification happens, there is no way to know if the other party is the right web site.

This is one area where JRuby’s OpenSSL seems to work better, in fact. It verifies those sites that should be verified and not the others.

If anyone feels like illuminating this for me, I would be grateful – the state of it right now is unacceptable if you need to do really secure https connections.

Keep security in mind

My final advice is twofold and have nothing to do with Ruby. First, always keep security in mind. Always think about how someone externally can exploit what you’re doing, intercept your connections, or just listen to them. Programmers generally need to be much more aware of this.

The last thing I’ll say is this: start reading Cryptogram by Bruce Schneier. While not completely technical anymore, it talks about lots of things surrounding security and also gives a very good feeling about how you should think and approach security. Subscribe here.