When you leave your house, go to sleep, or go do work in the yard, you lock yourdoor. Maybe you have a gate or fence you lock too. Without these, anyone canwaltz into your house and snoop around.
Row Level Security (RLS) can be attractive to developers for numerous reasons,but the foot-guns and gotchas in RLS often outweigh the benefits. You probablywant to keep your doors locked.
Friends and family: Managing access
RLS for Postgres lets administrators define security policies in theirdatabase, instead of the application layer. Let's imagine your house is yourdatabase, and the rows, tables, and data are like the things inside.
When your friends or family come over, you give them keys to every drawer theyare allowed to have access to. Maybe everyone gets access to the silverware, butonly the family can access your laundry room.
This is similar to how policies work in RLS. The rules for who gets which keysare your policies. If a user passes a policy rule (has the key) then they areallowed to access the data. At a very small scale, this can seem like a greatidea. Anyone can access your database however they want and your policies ensurethey aren't seeing things they shouldn't.
Testing and scaling these policies as your database grows becomes nearimpossible. For every new feature in your application, you must ensure your RLSpolicies are protecting the correct rows. Remembering to add these policies canbe cumbersome, especially when they need to be manually synced to your codebase.
RLS fundamentally exists to protect your data. If you mess up even a singlepolicy however, your data becomes exposed. Managing access in the same locationyour code lives is much easier than remembering to write a new policy every timea new table, column, or feature is added to your product.
The party: Managing connections
Postgres uses a process-per-connection architecture. Each new user connecting toyour database directly with their role is like a new person coming into yourhouse. At first it's fine, but once you have 100 people it gets crowded prettyquick.
PgBouncer is a connection pooler that reuses a small number of directconnections to your database while letting many clients connect to it. Whenusing PgBouncer with RLS, you lose the upstream identity of the client.
The traditional way of solving this is using local variables instead of roles todefine RLS policies. You define a policy that reads from a session-localvariable instead of checking the Postgres role:CREATE POLICY user_isolation ON orders
FOR ALL USING (user_id = current_setting('app.tenant_id')::bigint);
Then wrap every transaction in your application to set that variable:BEGIN;
SET LOCAL app.tenant_id = '1234';
SELECT * FROM orders;
COMMIT;
This requires a lot of extra application code to manage all the different localvariables attached to each and every transaction (1). IfSET LOCAL is omitted, current_setting() returns an empty string or throws anerror depending on how your policy is written.
Annoying neighbor: Attack Surface
You go out to get your mail and you find your neighbor standing over yourmailbox trying to open it over and over. You try to tell them that one isyours and to let you in, but they are having none of it. Now you have to sitand wait until they get bored and figure out they don't have the right key.
RLS acts like an extra WHERE clause appended to your queries. Unless the userlacks read permission on a table, their queries will still runeven if no data is returned. On complex joins or queries lacking indexes, thiscan hurt database performance.
If a malicious user starts retrying a query over and over, RLS will make sure theydon't see any data, but cannot stop them from running the query itself. Relyingon RLS to completely protect your tables burns valuable CPU cycles and canpotentially starve your other, honest users.
Any user of your application, particularly in situations where you do not havesufficient rate limiting in place, can DDoS your database simply by hitting anAPI endpoint. This is preventable by checking authentication to see if a user isallowed to run a query, without relying on RLS to manage your security for you.
A large keyring: Performance Implications
Every time your friend goes to get a Diet Coke, they need tofind the fridge key on their very large key chain. This wastes valuable timesifting through all the different keys and trying each one, so instead they markthe key so it's easier to find next time they go to the fridge.
RLS policies are executed per row, meaning any function or complex logic willrun for each row scanned. This can be solved by wrapping the function in asubquery. Setting up a simple benchmark, we can see the difference between RLS,RLS cached, and with RLS disabled. If you want to try it yourself, you can usethis benchmark repository.
For this benchmark, we tested 5 different setups. Two different functions thatare called from two different policies, and one without RLS at all.
RLS with a VOLATILE function
RLS with a STABLE function
RLS with a VOLATILE function + cache
RLS with a STABLE function + cache
No RLS
A volatile function is defined with the keyword VOLATILE that tells Postgresthe function may modify data or return different values uponsuccessive calls. This is the default mode for a new function in Postgres.CREATE OR REPLACE FUNCTION get_current_role()
RETURNS TEXT
LANGUAGE SQL
VOLATILE
SECURITY DEFINER
AS $$
...
$$;
The other option is to use STABLE in our function definition. Stable functionscannot modify data, and are expected to return the same value for successivecalls within the same transaction. When using RLS however, Postgres does notcache the value when evaluating the policy on each row during queries. In orderto successfully cache the result across each policy evaluation, we need to trickPostgres.
When we wrap the function call in a SELECT, Postgres creates an InitPlanquery node type. By default, anything after the USING keyword is executed as aSubPlan type, where Postgres expects that the outcome can change row to row.This is desired as that is what we are checking; for every row, should the userbe allowed to fetch it.
An InitPlan is only run once per execution of the outer plan, and cached forreuse in later rows of the evaluation. Using EXPLAIN, we can see how thedifferent policy definitions change the estimated cost.
-- RLS without subquery: no InitPlan, high cost
CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::bigint AND get_current_role() = 'admin');
EXPLAIN:
Aggregate (cost=34828.68..34828.69 rows=1 width=40)
-> Index Scan using orders_tenant_id_idx on orders (cost=0.43..34826.20 rows=495 width=6)
Index Cond: (tenant_id = (current_setting('app.tenant_id'::text))::bigint)
Filter: (get_current_role() = 'admin'::text)
-- RLS with subquery: Initplan caches result, lower cost
CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::bigint AND (SELECT get_current_role()) = 'admin');
EXPLAIN:
Aggregate (cost=10095.69..10095.70 rows=1 width=40)
InitPlan 1
-> Result (cost=0.00..0.26 rows=1 width=32)
-> Index Scan using orders_tenant_id_idx on orders (cost=0.43..10092.95 rows=495 width=6)
Index Cond: (tenant_id = (current_setting('app.tenant_id'::text))::bigint)
Filter: ((InitPlan 1).col1 = 'admin'::text)
The cost= in the explain rows is Postgres' guess at how expensive a querywill be to run, in arbitrary units. The first number is the estimated startupcost; or how expensive it is to do the sorting and filtering of the query beforereturning rows to the user. The second number is the estimated total cost,including fetching all the rows. The rows= and width= are how many expectedrows the query will return, and the width of those rows respectively.
When Postgres doesn't think it can cache the inner query, the cost is over 3xhigher than if it would have been able to. In reality, the actual latencydifference is much larger than 3x as seen in the chart above.
When Postgres doesn't cache expensive functions in your policy definitions, RLSbecomes expensive overhead. RLS can be just as fast as if you weren't using itat all in some scenarios. The issue is that RLS becomes yet another layer ofcode that needs to continuously optimized, where small mistakes can cause largeperformance hits.
It's your house: Permission ownership
It's your house, you obviously have the keys to everything, but what if youweren't supposed to?
Every Postgres table has an owner. Normally you'd control table and row accesson a per-Postgres-role basis, however when you connect to Postgres as the owningrole of a table, none of its RLS policies apply. You must explicitly opt in:ALTER TABLE users FORCE ROW LEVEL SECURITY;
Even this may not be sufficient if you are connected with the Postgres superuserrole. Any roles that contain the SUPERUSER attribute will always bypass RLS.This is easy to miss and easy to test incorrectly. Your policy tests might passunder a non-owner role while production traffic runs as the owner.
Making a ham sandwich: Stricter patterns
Let's say your friend Andy wanted to make a ham sandwich. He had access to thefridge and utensils, but not your grocery list. When he made his sandwich, heused up all the mustard, and now you need to go get more. When using RLS, Andy'squery can't touch our grocery list. We have to update that separately.
Without RLS this is easy. When using RLS, doing this type of query can add a lotof complexity. Getting the utensils, making the sandwich, and updating thegrocery list might not share the same permissions. While rows in one table maybe accessible to a user, updating rows in another may not be. Since we own thegrocery list, we don't want anyone touching it except in well defined scenarios.
One way to solve this is by using multiple roles and multiple transactions, butthis becomes overly cumbersome on our application layer. A better solution wouldbe to add a SECURITY DEFINER function in our database that gives roles accessto modify or view data in a well defined way:CREATE FUNCTION use_ingredients(ingredients text[])
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER AS $$
BEGIN
-- Runs as the function owner, bypassing Andy's RLS policies
UPDATE grocery_list SET quantity = quantity - 1
WHERE item = ANY(ingredients);
END;
$$;
SECURITY DEFINER causes the function to run as its owner's role, bypassing RLSentirely for that operation. Now you're back to managing security on both RLSand your application layer, ensuring only specific parameters are allowed topass to this function.
Keeping database functions in version control also becomes difficult. Mostmigration tools do not include SQL functions, and adding them to your versioncontrol is a manual process.
Your application layer also needs to stay in sync with every function it callsin your database. Changing function definitions, names, or return values becomesa delicate surgery not only for your application code, but also yourdatabase.
End of the day
Once we have managed locking everything under a different key inside your house,who has what keys, who is allowed in, and who is delegating access for who, wefind our application code has almost as much logic as if it didn't have RLS atall.
RLS policies themselves are stored in pg_policies inside your database, not inyour source code. Most standard migration tools don't track policy changesalongside schema changes. Policy migrations become a separate, manual process,and they drift. A schema change that adds a column or renames a table cansilently break a policy that no one realizes is outdated until something breaksin our application, impacting users.
Each query to the database will already need some sort of modifier in yourapplication code to add local variables for user identification when usingPgBouncer. Misconfigured local variables could be just as damaging as if RLSwasn't there to begin with.
We still need to check early on if a user has permission to run a query, or elsewe risk allowing users to degrade our database performance with spam. If we arealready checking permissions at the application layer, the benefits of RLSbecome harder to observe.
Optimizing queries also becomes much harder. Queries are artificially restrictedto what they are allowed to see, and need bespoke functions and permissions toget access. This causes our management of source code and database logic tobecome even harder to manage, between policies, functions, and the mappingsbetween them.
How to do it right
At PlanetScale, we typically recommend against relying on Postgres RLS.There may be occasional useful scenarios, but when implementing RLScorrectly at scale, the benefits quickly turn into cons with a higheroverhead not only to performance, but also developer experience and complexity.
Application-layer authorization like middleware, ORM-level scoping, or adedicated permissions table keeps your logic visible, testable, and co-locatedwith the code that uses it.
Your database is more like a warehouse. Don't treat it like your house.
Footnotes
Note that PgBouncer pool_mode must be in either session or transaction.statement mode won't work with SET LOCAL at all.