Security: Recognizing the Inappropriate
There are things I left out of my earlier post on authorization. One of them is dealing with activity that may pass those checks when divided into sufficiently small pieces, yet is undesirable in a larger context that may span time. In that case it’s less about leaking data or executing actions the apparent requestor is not authorized for, and more about detecting compromised requestors and/or DoS (denial of service) attacks.
Some may think that this is separate from authorization. Others may point out that the recognition process is like making an authorization decision; all it takes is presenting the question differently. For example: “Is this user authorized to perform this many requests within the current time period?” or “Is this user authorized to do that much all at once?” It matters less (to me) what you or I call this. It matters more how and where we manage, administer, and apply those constraints. Before we go there, there are some things we need to figure out first…
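As an aside, to make that reframing concrete, here is a minimal sketch of how the “how much, how fast” question can be answered by the same kind of policy decision as a plain permission check. The names and shapes are purely hypothetical, not taken from any particular product:

```typescript
// Hypothetical shapes: the point is that "how many, how much, how fast"
// is answered by the same kind of policy decision as "is this allowed at all".
interface AuthzRequest {
  subject: string;            // user, service, or device identity
  action: string;             // e.g. "orders.read"
  // Usage context gathered by the enforcement point:
  requestsInWindow: number;   // similar requests seen in the current period
  batchSize: number;          // how much is being asked for all at once
}

interface AuthzDecision {
  allowed: boolean;
  reason?: string;
}

// A policy engine would evaluate this; here it is just an illustrative stub.
function authorize(req: AuthzRequest, maxPerWindow: number, maxBatch: number): AuthzDecision {
  if (req.requestsInWindow > maxPerWindow) {
    return { allowed: false, reason: "too many requests in the current period" };
  }
  if (req.batchSize > maxBatch) {
    return { allowed: false, reason: "asking for too much at once" };
  }
  return { allowed: true };
}
```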
What is appropriate?
Step out of the servers and network infrastructure. Let’s talk about what you expect your users and other clients to do. If you expect them to do something, then that “something” must be appropriate. Conversely, at some point, the “unexpected” may be considered either “malicious” or a missed expectation. But how do you form your expectations? Based on what? I’ll mention a few examples to think about (a small sketch follows the list):
How fast can a real human type, click, use the UI?
How many computers can a real human use concurrently?
How quickly can a real human move from one location to another?
How long can a real human work before having to rest?
How many distinct resources do they need to access within some period, e.g., how many clients does a support agent handle?
What are the tools those humans use implemented to do? Doing more than that would suggest other tools are in play. Include proxies, gateways, middleware, etc.
Do other practical limits apply – e.g., known low bandwidth or high latency? Being faster becomes suspicious.
Are some abilities implied – e.g., being smart enough to not keep repeating the same or similar requests over and over again?
Do known users, tools, clients have a recognizable style or shape of requests, say the inclusion and order of details and metadata?
How frequently are which errors expected? Too many “not found” errors may indicate random fishing for entities/resources.
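To make a few of the questions above concrete, here is the small sketch promised earlier. The thresholds and field names are illustrative assumptions of mine, not recommendations; real values would come from observing your own users and tools:

```typescript
// Illustrative thresholds only; tune against what your real users actually do.
interface SessionActivity {
  requestsPerMinute: number;      // UI-driven typing/clicking has a bounded rate
  concurrentDevices: number;      // how many computers one human plausibly uses at once
  kmTravelledSinceLastLogin: number;
  minutesSinceLastLogin: number;
  notFoundErrorsPerHour: number;  // random fishing for entities tends to miss a lot
}

function suspicionSignals(a: SessionActivity): string[] {
  const signals: string[] = [];
  if (a.requestsPerMinute > 120) signals.push("faster than a human can click or type");
  if (a.concurrentDevices > 3) signals.push("too many concurrent devices");
  // Rough "impossible travel" check: assume nothing legitimate moves faster than ~900 km/h.
  const maxPlausibleKm = (a.minutesSinceLastLogin / 60) * 900;
  if (a.kmTravelledSinceLastLogin > maxPlausibleKm) signals.push("implausible change of location");
  if (a.notFoundErrorsPerHour > 50) signals.push("unusually many 'not found' errors");
  return signals;
}
```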
Consider also whether all your users and tools are the same. I’m going to assume that you’d at least like to be able to cover the examples above and that the challenge is only in how to do so. I’m going to approach this through the “beyond immediate user input, everything is an API” lens. Serving static web pages? That is the API that browsers use to get HTML. Sending IP packets to another computer? That is the API one computer uses to request another to handle those packets.
It becomes apparent that we have different levels of exposure, context, detail, and abstraction with different types of APIs. Think how hard it would be to implement security covering the above at the level of IP packets alone, without cheating by understanding higher-level protocols. Payloads would mostly be encrypted, so you’re generally constrained to headers and timing. You don’t even know who, or even what, is on either end of the communication. The higher you go up the abstraction tower, the more visibility you gain, but your reactions get delayed, and you stop seeing communication that is invalid at that level of abstraction. That tells you something…
One constraint can’t cover everything
Each level of abstraction deals with detail that isn’t present in the others. As we can only constrain the details we know, constraints have to be applied at their respective levels of abstraction, not above or below. Applying an IP packet rate limit to a particular link, perhaps per pair of IP addresses, may help curb some unexpected communication, but it says nothing about users or the kind of data communicated. Such constraints must be generous enough to allow all proper cases.
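For the IP level, the classic mechanism is a token bucket, here keyed by the address pair. This is only a sketch with illustrative parameters, not production tuning:

```typescript
// Token bucket per (source, destination) address pair. Capacity and refill rate
// have to be generous enough for all legitimate traffic, which is exactly why
// this level alone says so little about users or data.
class PairRateLimiter {
  private buckets = new Map<string, { tokens: number; lastRefill: number }>();

  constructor(private capacity: number, private refillPerSecond: number) {}

  allowPacket(srcIp: string, dstIp: string, now = Date.now()): boolean {
    const key = `${srcIp}->${dstIp}`;
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefill: now };
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSeconds = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.lastRefill = now;
    if (bucket.tokens < 1) {
      this.buckets.set(key, bucket);
      return false; // drop or delay the packet
    }
    bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return true;
  }
}
```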
What happens if we take a higher abstraction and apply the same concept, a rate limit, to HTTP requests? First, we lose visibility of how many packets a request involves. There are many ways to make this intentionally more stressful for the system while still appearing as a single, valid HTTP request. Of course, we gain visibility of higher-level interactions: authentication, details of requests, payloads. It becomes possible to express more expectations and constraints. Is an HTTP rate limit enough?
Not by a lightyear. HTTP remains very chatty and disconnected from real-world activity. The days when a single human action produced a single HTTP request disappeared in the early days, if not hours, of the HTTP standard. Just think how many CSS, image, script, and other requests are needed to display one page. How many hundreds, sometimes thousands, of extra HTTP requests are needed to gather additional resources after the initial REST API call (due to N+1 issues)? If you’re not familiar with what I’m talking about, do check out my series on REST APIs: what they are, carving the states, opposing forces, to list a few.
An HTTP rate limit is just a bit less removed from the user’s use case than an IP rate limit, especially with REST APIs. A REST API isn’t a use-case API style, it is a state transfer style, and a single use case frequently ends up needing many requests. Not only will you find it hard to discover what a good limit should be, you’ll have to significantly inflate it to cover valid edge cases, and that gives your attackers plenty of room to do what they want. Additionally, rate limits don’t control the kinds of data communicated, though you could introduce many independent rate limits, one for each pair of HTTP verb and endpoint (or REST resource). Nevertheless, as most REST API requests follow the same strict forms and have rigid output structures, you can’t properly do request style/signature recognition or output variation. You think you do? Then it’s probably not really REST any more. See that series of posts.
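If you do go down the route of independent limits per verb and endpoint mentioned above, the keying itself is easy to sketch (the limits shown are made-up numbers). The real problem is how many of these numbers you now have to pick, inflate, and maintain:

```typescript
// One limit per (HTTP verb, resource) pair, counted per subject in a fixed
// one-minute window. Each number is a guess that has to be generous enough
// to survive valid N+1 request storms, which is the core problem.
const limitsPerMinute: Record<string, number> = {
  "GET /orders": 600,
  "GET /orders/{id}": 3000,   // inflated to cover N+1 fan-out from clients
  "POST /orders": 60,
  "DELETE /orders/{id}": 30,
};

const counters = new Map<string, { count: number; windowStart: number }>();

function allowRequest(subject: string, verb: string, resource: string, now = Date.now()): boolean {
  const limit = limitsPerMinute[`${verb} ${resource}`];
  if (limit === undefined) return true; // no limit configured for this endpoint
  const key = `${subject}:${verb} ${resource}`;
  const entry = counters.get(key) ?? { count: 0, windowStart: now };
  if (now - entry.windowStart >= 60_000) {
    entry.count = 0;            // start a new one-minute window
    entry.windowStart = now;
  }
  entry.count += 1;
  counters.set(key, entry);
  return entry.count <= limit;
}
```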
Options
So how do you map your real-world, use-case-based expectations into implementable constraints? Do you?
1. Go to an even higher level of abstraction above REST and HTTP? Which one?
2. Use an approach that aligns use cases with HTTP better than REST does?
3. Abandon HTTP and switch to something else that aligns better?
Answer (1) is theoretically obvious, but do you have a practical answer? I don’t. That leaves us with the two other choices if we want to do well… and neither is RESTful. A well-designed gRPC API could be the answer for (3), but it may not be suitable in all contexts. There are multiple contenders for (2). One of the major ones is (Microsoft-initiated) OData, and a minor one I’m aware of is TreeQL. For reasons that deserve a separate post, I don’t recommend either unless you are already involved with them.
That leaves the most popular, best-supported option, one that has taken the world by storm: GraphQL. It lets clients state exactly what they need for the use case, allowing an almost ideal mapping between use cases and GraphQL/HTTP requests. As a bonus, the style or signature of requests becomes apparent and can be considered. Some who are only familiar enough with it to be dangerous point out that a single GraphQL/HTTP request can demand too much. That is 100% correct. What is also 100% correct is that this becomes apparent and can trivially be accounted for and discriminated against. GraphQL does not suffer from REST’s afflictions as it isn’t rigid, and more tools are available beyond rate limiting. You begin to see more than nails once you have more than a hammer. A huge ecosystem of tools is out there to help, including my open-source project, Paniql. Check out Apollo GraphQL contracts and complexity-based limits, and Stellate complexity-based rate limiting as well.
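To show what “accounting for a demanding request” can look like, here is a minimal complexity walk over a parsed GraphQL document using the graphql-js parse and visit helpers. It is a sketch of the idea only, with a made-up cost model; it is not how Paniql, Apollo, or Stellate actually compute costs:

```typescript
import { parse, visit, Kind } from "graphql";

// Walk the query AST, give every field a unit cost, and multiply nested
// selections by the requested page size ("first" argument) when present.
// Real tools are far more nuanced; this only shows that the demand of a
// single request is visible before executing it.
function estimateComplexity(query: string): number {
  const ast = parse(query);
  const multipliers: number[] = [1];
  let total = 0;

  visit(ast, {
    Field: {
      enter(node) {
        const current = multipliers[multipliers.length - 1];
        total += current;
        const firstArg = node.arguments?.find((arg) => arg.name.value === "first");
        const firstVal = firstArg?.value;
        const pageSize = firstVal && firstVal.kind === Kind.INT ? parseInt(firstVal.value, 10) : 1;
        multipliers.push(current * pageSize); // children cost this many times more
      },
      leave() {
        multipliers.pop();
      },
    },
  });
  return total;
}

// Asking for 100 orders with 100 line items each is visibly ~10,000x heavier
// than a single scalar fetch, and can be budgeted or rejected accordingly.
console.log(estimateComplexity("{ orders(first: 100) { lines(first: 100) { sku } } }"));
```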
Continuous Administration
Coming full circle back to my opening analogy with authorization, consider where any limits, constraints, budgets, and quotas should be expressed. How fine-grained should they be? Per user? Per system? Per service? Why not all? Who maintains them and updates them as use cases evolve and the needs and abilities of people and tools change? Are these to be manually burned into your infrastructure configuration by the IT people? Could they even keep up if they tried? Would you benefit from more sophisticated tooling? Does your ecosystem have such tooling already? Are you aware that there are tools that can help you with this? Would you start moving on from your legacy approaches to be able to use them?
No, mechanical application of authentication and rate limit constraints is not enough, as these are relatively easy to work around. We have the opportunity and the tooling to do much better, and we need to continuously manage our constraints to match our evolving expectations for every player in the system: human, machine, or infrastructure. If we want to do this well, frequently, and with many parameters, we shouldn’t rely on manual labour to keep updating some derivative numbers. While the enforcement of constraints can and should happen in places where authorization may not, what drives it (the actual limits and quotas) would benefit from authorization-like management: in the same tools, viewing both human and synthetic actors as security subjects, and authorizing the entire chain (graph, really) of “broken telephones” along the way.
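As one illustration of what authorization-like management of these numbers could look like, here is a hypothetical declarative policy shape. Every field name here is mine, invented for the example, not taken from any existing tool:

```typescript
// Hypothetical policy document: quotas and expectations live next to the
// authorization grants for each subject (human or synthetic), and a central
// tool keeps them in sync as use cases evolve.
interface UsagePolicy {
  subject: string;                       // "role:support-agent", "service:billing", ...
  grants: string[];                      // what they may do at all
  quotas: {
    graphqlComplexityPerMinute?: number; // budget for query cost, not request count
    distinctClientsPerDay?: number;      // e.g. how many customer records an agent touches
    concurrentSessions?: number;
  };
  reviewedBy: string;                    // who owns keeping this in line with reality
}

const policies: UsagePolicy[] = [
  {
    subject: "role:support-agent",
    grants: ["customers.read", "tickets.write"],
    quotas: { graphqlComplexityPerMinute: 50_000, distinctClientsPerDay: 200, concurrentSessions: 2 },
    reviewedBy: "support-ops",
  },
  {
    subject: "service:reporting-batch",
    grants: ["orders.read"],
    quotas: { graphqlComplexityPerMinute: 2_000_000 },
    reviewedBy: "data-platform",
  },
];
```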
Now what?
Do you think that a single number, or a few of them, used as limits can protect your system? I honestly hope your answer isn’t more positive than “Maybe, just as a beginning.” Proper protection against the inappropriate requires understanding real-world expectations, not “Nth derivatives” of them. Understanding is nothing until it is applied, so we must be able to both express and enforce those expectations, likely covering multiple levels of abstraction, from the lowest network layers up, as high as you can go. You’re going to have to think about use case mappings with REST APIs and decide whether you’re going to move on (if you haven’t already). There are other reasons why REST APIs are notoriously insecure; chief among them, in this context, are that they take too much development time and focus away from security considerations, and that they make generic tooling that could help all but impossible.