OAuth 2.0 and OpenID Connect For Dummies
Many of us struggle with this subject. What’s up with that? And what are those, anyway?
OAuth 2.0 and OpenID Connect are simply protocols we use to handle authorization and authentication in our apps. However, a lot of us have a hard time understanding how they work. They are indeed pretty complicated to grasp, and for some reason the internet is full of confusing information about them. This post will hopefully help.
Authentication at its most basic example would be this:
A login form that allows the user to submit their username and password.
The backend, in turn, goes to the database to look for it and, if found, makes sure the password matches. It might also get all the info about the user and will probably throw some cookie into the browser so it can keep track of them.
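To make that basic scenario concrete, here is a minimal sketch of the backend side in Python. Everything in it is an assumption made for illustration: the in-memory USERS dictionary stands in for the database, the function names are made up, and storing a salted PBKDF2 hash instead of the raw password is a common practice that this post doesn't go into.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "database": username -> (salt, password hash).
USERS = {}

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    USERS[username] = (salt, hash_password(password, salt))

def login(username: str, password: str) -> bool:
    record = USERS.get(username)
    if record is None:
        return False  # no such user in the "database"
    salt, stored_hash = record
    # Compare the stored hash with the hash of the submitted password in constant time.
    return hmac.compare_digest(stored_hash, hash_password(password, salt))

register("nemo", "just-keep-swimming")
print(login("nemo", "just-keep-swimming"))  # True
print(login("nemo", "wrong-password"))      # False
```

Even this toy version leaves out cookies, sessions, roles and rate limiting, which hints at why nobody wants to maintain it by hand.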
This seemingly trivial scenario was the bane of many coders in ancient times, mostly because they had to write it themselves, and it is not trivial at all. There are security, roles, and permissions concerns to consider. You also have to keep it up to date with best practices, maintain it, and make sure it always works properly.
The good news is that the industry has changed a lot since then. Developing your authentication mechanism in-house is no longer a thing. Phew!
We Have Protocols for This
OAuth 2.0 ** and OpenID Connect are considered best practice for dealing with what I just described. What are they? How do they work? The information on the internet is not always very newbie-friendly. The terminology is very specific and most of the texts assume that the readers already speak the lingo. If you’re not already in deep, it’s very hard to get in. Also, it appears that people are using them in all kinds of ways, and it’s easier to find debates on how to use them than an actual spec. That is to say, if you find yourself intimidated by this subject, you are not alone.
** Yes, once upon a time there was OAuth 1.0. But we’re not going to talk about it here.
You’ve probably run into a dialog saying something like this: “Hey, this app is asking for access to your Facebook account, but won’t publish posts on your behalf”. This very common pattern is OAuth.
Let’s break it down piece by piece.
Say we’re a new social app (“Tunagram”- kinda like Instagram, but for fish) and we need the user’s Google contacts to invite them to join. We don’t need anything else from them. Just contacts.
- Resource owner: the user (which is — some fish, probably).
- Client: the application (“Tunagram”).
- Authorization server: the system used to grant access to the client (ex: accounts.google.com).
- Resource server: the API which holds the data we want to access (sometimes it’s the same as the authorization server. Many times it’s not).
- Authorization grant: a “proof” that the user gave permission.
- Redirect URI: where to send the user when they’re done authenticating. Also known as callback (tunagram.com/callback).
- Access token: the key the client is going to use to actually get the data, upon receiving permission.
Refer to the diagram and breakdown below:
The Authorization Code Flow
- The resource owner, that is, the user, clicks the login button.
- The user’s browser redirects them to a Google domain: accounts.google.com (which is the authorization server, remember?), where they will probably need to log in to Google.
- The browser also passes along the redirect URI (or callback) — which is where we want the authorization server to redirect us, assuming everything went well, and the response type — what kind of proof do we want. In this use case, which is the most common one, we’re asking for a code (hence “the authorization code flow”).
- Once the user has successfully logged in, they’re going to see a dialog similar to the one above: the app is asking for this and that, do you consent?
- Assuming the user consented, the browser will successfully redirect back to the redirect URI with the authorization code (that’s what we asked for).
- The client (meaning the application) can only do one thing with said code, which is send it back to the authorization server and exchange it for an access token.
- Once it has the access token, the client can do what it originally meant to do: happily access the resource server (contacts.google.com). Thanks to the access token attached to the request, the resource server will understand that the client is allowed to access it.
What if the client tries to do something other than reading the contacts? Say, delete them? The resource server will not allow it, because the access token only covers what the user agreed to. That set of permissions is what OAuth calls a scope.
The authorization server has a list of scopes it understands. For example, contacts.read, contacts.write etc. When the client starts the flow, it can also define the wanted scopes in its request.
Let’s look at the above flow again, this time with this important addition: all the scopes we need. This is what the user will see in the consent dialog. And when all is said and done, the access token received is limited to the scopes we asked for.
(See an example of such a request here: https://www.oauth.com/oauth2-servers/authorization/the-authorization-request/)
Back Channel and Front Channel
This flow is designed to benefit from both the front and back channels and use them to be as secure as it can be.
Wait, what are those?
The back channel is the highly secure communication channel. For example, code on our server, which only we can access, making an API request to another server over HTTPS. The front channel is everything that goes through the user’s browser, which we don’t control and where requests can be seen or tampered with.
Check out the flow drawing again. Everything up until receiving the authorization code happens in the front channel. All of those steps are done by the browser. There are no secrets there.
The next step though — exchanging the code for an access token — happens only in the back channel. The client takes the code received and makes a POST request, along with some additional information that only the server knows, such as a secret key ***.
*** The secret key, and also the client id, are created when setting up a client in Google (that is, the initial setup at our end as the app owners), and both are used to identify the client, meaning our app, to the authorization server. We don’t ever want that secret key to be exposed anywhere in the browser.
The communication with the resource server using the access token also happens in the back channel. That is because once the client has the token, if it is passed to the browser, it can be stolen.
So the user’s browser sends something like this:
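As a rough sketch of that first request, here is how the URL the browser is sent to could be assembled in Python. The endpoint path, the client id and the state value are illustrative assumptions (check your provider's documentation for the real endpoint), and state is a standard OAuth parameter, not covered in this post, that lets the client match the callback to the request it started.

```python
from urllib.parse import urlencode

# Assumed authorization endpoint, for illustration only.
AUTHORIZATION_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

def build_authorization_url(client_id: str, redirect_uri: str, scope: str, state: str) -> str:
    params = {
        "response_type": "code",       # we want an authorization code back
        "client_id": client_id,        # identifies our app, created at setup time
        "redirect_uri": redirect_uri,  # where to send the user afterwards
        "scope": scope,                # what we are asking permission for
        "state": state,                # opaque value echoed back in the callback
    }
    return f"{AUTHORIZATION_ENDPOINT}?{urlencode(params)}"

url = build_authorization_url(
    client_id="tunagram-client-id",                # hypothetical client id
    redirect_uri="https://tunagram.com/callback",
    scope="contacts.read",
    state="xyz123",                                # hypothetical value
)
print(url)
```

Notice there is nothing secret in this URL, which is exactly why it is fine for it to travel through the front channel.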
And if all went well, and the user consented, the browser is redirected back to the callback with the code, and the client can proceed to do this:
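Here is a hedged sketch of that exchange in Python: pull the code out of the callback URL, then prepare the back-channel POST. The token endpoint, the code value and the client credentials are all placeholders assumed for the example, and the request is only built, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Assumed token endpoint, for illustration only.
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

def extract_code(callback_url: str) -> str:
    # The authorization server appends ?code=... to our redirect URI.
    return parse_qs(urlparse(callback_url).query)["code"][0]

def build_token_request(code: str, client_id: str, client_secret: str, redirect_uri: str) -> Request:
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,  # known only to our server
    }).encode("ascii")
    return Request(
        TOKEN_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

callback = "https://tunagram.com/callback?code=oMsCeLvIaQm6bTrgtp7"  # made-up code
request = build_token_request(
    extract_code(callback),
    client_id="tunagram-client-id",          # hypothetical
    client_secret="tunagram-client-secret",  # hypothetical
    redirect_uri="https://tunagram.com/callback",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the prepared request to urllib.request.urlopen would actually send it. Since the secret is in the body, this must only ever run on the server.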
And assuming that also went well, the client will get this back:
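A minimal sketch of what such a response might look like, and how the client could read it. The field names are the standard ones for an OAuth 2.0 token response, but the values (including the one-hour lifetime) are made up for the example.

```python
import json

# Illustrative token response body; the values are invented.
raw_response = """
{
  "access_token": "FgrgGsd456GdSghDsgsa3fHg",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "contacts.read"
}
"""

token_response = json.loads(raw_response)
access_token = token_response["access_token"]

# The token type tells the client how to present the token to the API.
authorization_header = f"{token_response['token_type']} {access_token}"
print(authorization_header)  # Bearer FgrgGsd456GdSghDsgsa3fHg
```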
Then they can finally do this:
Authorization: Bearer FgrgGsd456GdSghDsgsa3fHg
The client makes a request to the API, it includes the token, and the API makes sure it’s a valid token and that it has the scope to do what it asks to do in the request.
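Seen from the API’s side, that check could be sketched like this. The ISSUED_TOKENS dictionary is a hypothetical stand-in for however the resource server really validates tokens (a lookup, an introspection call, a signature check), and the function name is made up.

```python
# Hypothetical record of valid tokens: token -> scopes the user consented to.
ISSUED_TOKENS = {"FgrgGsd456GdSghDsgsa3fHg": {"contacts.read"}}

def is_request_allowed(authorization_header: str, required_scope: str) -> bool:
    # Split "Bearer <token>" into its scheme and the token itself.
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    granted_scopes = ISSUED_TOKENS.get(token)
    if granted_scopes is None:
        return False  # unknown or invalid token
    return required_scope in granted_scopes

header = "Bearer FgrgGsd456GdSghDsgsa3fHg"
print(is_request_allowed(header, "contacts.read"))   # True: reading was granted
print(is_request_allowed(header, "contacts.write"))  # False: writing was never granted
print(is_request_allowed("Bearer nope", "contacts.read"))  # False: invalid token
```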
Other Possible Flows
Note there are some other flows too, but this is by far the most common one, and unless you have a service-to-service situation with no front channel, or otherwise a browser application with no back channel, this is what you should be looking at.
(For other flows read: https://auth0.com/docs/authorization/authorization-flows)
OAuth For Authentication?
So this was OAuth, which was invented around 2006 to give us this great way to do this delegated authorization (and there were no good ways up until then — only really bad, bad ways). And it really was great. So great that everyone started to overuse it for things it was not designed for: not only delegated authorization but also standard authentication, mobile app login, single sign on and all of that stuff.
Using OAuth for authentication is bad, because in OAuth there is no standard, non-hacky way to get information about the user. OAuth was designed around permissions and scopes. It doesn’t know who just logged in or what their email is. It doesn’t have a standard for that, so everyone who wants to build an authentication flow on top of it has to implement it freestyle, with their own custom hacks.
So now what?
OpenID Connect came to solve what OAuth couldn’t: it is basically an extension of OAuth, with some extras made to close the gap:
- ID token, which adds some user information.
- /UserInfo endpoint — which we can access for more user information.
- Standard specs.
It means that if we’re connecting with an authorization server that understands OpenID Connect, we can ask for not just an access token, but also an ID token and the information at /UserInfo, which are basically everything we need to know.
OpenID Connect Authorization Code Flow
Notice that little addition to the request: now in the scopes we also ask for openid. This is the only thing that identifies the request as an OpenID Connect request. So we still go to the authorization server and get the authorization code and exchange that for an access token etc. But we also get the ID token now, which is used to let the app know about the user.
The id token we’re expecting to get looks like a long string of gibberish:
However, it isn’t so. It is a JWT — JSON Web Token. Upon decoding it, you can see its anatomy: a header, a payload (also called claims), and a signature.
In the payload part, you can see the basic information about the user. The signature is computed by the authorization server over the encoded header and payload combined, using its key. Verifying it tells us that the information is valid and that the id token wasn’t changed or corrupted along the way.
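To see that anatomy without any magic, here is a small self-contained sketch that builds a token and then decodes and verifies it using only the Python standard library. It assumes HS256 (a shared-secret HMAC) to keep things short. Real providers usually sign id tokens with an asymmetric algorithm such as RS256, and in production you would reach for a maintained JWT library. The claims and the key are made up.

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(data: bytes) -> str:
    # JWTs use URL-safe base64 without the trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign(signing_input: str, key: bytes) -> str:
    digest = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return b64url_encode(digest)

def make_token(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{sign(signing_input, key)}"

def decode_and_verify(token: str, key: bytes) -> dict:
    header_b64, payload_b64, signature = token.split(".")
    # Recompute the signature over header.payload and compare in constant time.
    expected = sign(f"{header_b64}.{payload_b64}", key)
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: token was altered or the key is wrong")
    return json.loads(b64url_decode(payload_b64))

key = b"demo-shared-secret"  # hypothetical key
id_token = make_token({"sub": "fish-42", "name": "Nemo", "email": "nemo@tunagram.com"}, key)
print(id_token.count("."))                       # 2: header.payload.signature
print(decode_and_verify(id_token, key)["name"])  # Nemo
```

Change a single character of the payload and decode_and_verify raises, which is the whole point of the signature.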
Let’s go back for a second to the super basic login form scenario from the beginning of the post. Knowing what we know now, imagine building it using OpenID Connect: all we have to do is use the OpenID authorization code flow, which would be approaching the authorization server, receiving a code, exchanging it for an access token and an id token (and optionally, a refresh token. See next part!) which are going to be saved at the backend.
Again, this is the common flow. But there are others, for other use cases. Please check out https://connect2id.com/learn/openid-connect#auth-request
Let’s Take a Short Break
This is a bit outside of the scope of the post but not really. It’s important enough to be mentioned. Tokens expire: a JWT such as the id token carries an expiration time in its payload, and access tokens have a limited lifetime too. After this time, the token is not valid anymore. In order to avoid forcing the user to log in again every five minutes, we can use a refresh token.
A refresh token is another type of token, used behind the scenes to get a new access token once the old one has expired. So upon the initial request, the authorization server returns both an access token and a refresh token (which has a much longer lifetime and, unlike the id token, doesn’t have to be a JWT; it is often just an opaque string). When the access token expires, the client issues another request, this time with grant type refresh_token, and after validation it gets back a new access token and replaces the old one with it. This happens without involving the user, so we can keep scrolling our Instagram (or in our case, Tunagram) feed basically forever without having to log in again.
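A sketch of the client-side logic, under a few assumptions: the helper names, the 30-second leeway and all the credential values are invented for the example, and the request body is only built, not sent.

```python
import time
from urllib.parse import urlencode

def is_expired(expires_at: float, now: float, leeway_seconds: float = 30) -> bool:
    # Treat the token as expired slightly early so it doesn't die mid-request.
    return now >= expires_at - leeway_seconds

def build_refresh_body(refresh_token: str, client_id: str, client_secret: str) -> str:
    # Form body for the back-channel POST to the token endpoint.
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })

issued_at = time.time()
expires_at = issued_at + 300  # an access token that lives five minutes

print(is_expired(expires_at, now=issued_at))        # False: still fresh
print(is_expired(expires_at, now=issued_at + 600))  # True: time to refresh
print(build_refresh_body(
    "hypothetical-refresh-token",
    "tunagram-client-id",
    "tunagram-client-secret",
))
```

Like the code exchange, this request carries the client secret, so it belongs in the back channel.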
End of Short Break
OAuth and OpenID Connect are simply different tools for different tasks. I hope this post helps to wrap your head around which is what. However, it is covering a lot of stuff in a very general fashion. For implementing OAuth and OpenID in your backend, for example, you should check other sources too.
There is a ton of info in the following great sources: