Authentication Patterns for Multi-Tenant SaaS Applications
Multi-tenancy adds a dimension to authentication that single-tenant applications do not have: you must not only verify who a user is, but also which tenant they belong to and what they are authorized to do within that tenant. Getting this wrong leads to the most damaging class of security vulnerabilities in SaaS: cross-tenant data leakage, where User A in Tenant X can access data belonging to Tenant Y. We have built authentication systems for four multi-tenant SaaS products at Harbor Software. This post covers the patterns we use, the tradeoffs we have encountered, and the specific mistakes to avoid.
Tenant Identification: The Foundation
Before you authenticate a user, you must identify the tenant. There are three common approaches, each with distinct tradeoffs:
Subdomain-based: tenant-a.yourapp.com, tenant-b.yourapp.com. This is the cleanest UX because the tenant context is always visible in the URL. It requires wildcard DNS and wildcard TLS certificates (straightforward with Let’s Encrypt and cert-manager). The main drawback is that subdomain-based routing breaks when a tenant needs a custom domain, which enterprise customers invariably request. Custom domain support requires a reverse proxy that maps arbitrary domains to tenant IDs, adding infrastructure complexity. We use Caddy for this because it handles automatic TLS certificate provisioning per-domain, but Nginx with lua-resty-auto-ssl works too.
Path-based: yourapp.com/tenant-a/dashboard, yourapp.com/tenant-b/dashboard. Simpler infrastructure (single domain, single certificate), but muddies the URL structure and makes it easy for a routing bug to leak tenant context across paths. We have seen this go wrong when a developer adds a new route that forgets to include the tenant prefix, creating a path that defaults to the wrong tenant or, worse, returns data from all tenants.
Header/token-based: The tenant is identified by a claim in the JWT or a custom HTTP header. This decouples tenant identification from URL structure entirely, which is flexible but invisible. Developers cannot tell which tenant a request is for by looking at the URL, which makes debugging harder and increases the risk of accidentally ignoring the tenant context in new code.
We use subdomain-based identification with token-based fallback. The subdomain is the primary identifier for browser-based access. The JWT claim is the primary identifier for API access (because API clients do not care about subdomains). For browser requests, both must agree: if a request comes to tenant-a.yourapp.com but the JWT contains tenant_id: "tenant-b", we reject it with a 403. This cross-check prevents a class of bugs where a user switches tenants in their browser but their token still references the old tenant.
// Middleware: tenant identification and cross-check
import type { Request, Response, NextFunction } from 'express';

export async function tenantMiddleware(
  req: Request, res: Response, next: NextFunction
) {
  const subdomainTenant = extractTenantFromSubdomain(req.hostname);
  // req.auth is assumed to be set by an earlier JWT-verification middleware
  const tokenTenant = req.auth?.tenant_id;

  // API requests: trust the (already verified) token.
  // x-api-client is set by the caller, so this branch must rely only on
  // the verified token claim, never on anything else in the request.
  if (req.headers['x-api-client']) {
    if (!tokenTenant) {
      return res.status(401).json({ error: 'missing_tenant_claim' });
    }
    req.tenantId = tokenTenant;
    return next();
  }

  // Browser requests: subdomain is primary, token must match
  if (!subdomainTenant) {
    return res.status(400).json({ error: 'missing_tenant_context' });
  }
  if (tokenTenant && tokenTenant !== subdomainTenant) {
    // Log this mismatch for security monitoring
    securityLog.warn('tenant_mismatch', {
      subdomain: subdomainTenant,
      token: tokenTenant,
      user: req.auth?.sub
    });
    return res.status(403).json({ error: 'tenant_mismatch' });
  }
  req.tenantId = subdomainTenant;
  next();
}
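The middleware calls extractTenantFromSubdomain without showing it. A minimal sketch of what such a helper could look like follows, including the custom-domain lookup mentioned earlier. BASE_DOMAIN, RESERVED_LABELS, and the customDomains map are illustrative assumptions; in a real deployment the custom-domain mapping would more likely live in a database or in the reverse proxy configuration.

```typescript
// Hypothetical helper: resolves a hostname to a tenant slug, or null.
const BASE_DOMAIN = 'yourapp.com';
const RESERVED_LABELS = new Set(['www', 'auth', 'api']);

// Assumed mapping of customer-owned domains to tenant slugs.
const customDomains = new Map<string, string>([
  ['portal.acme.example', 'tenant-a'],
]);

function extractTenantFromSubdomain(hostname: string): string | null {
  // Normalize case and drop a trailing dot from fully qualified names.
  const host = hostname.toLowerCase().replace(/\.$/, '');

  // Custom domains map directly to a tenant.
  const custom = customDomains.get(host);
  if (custom) return custom;

  const suffix = `.${BASE_DOMAIN}`;
  if (!host.endsWith(suffix)) return null;

  // Accept exactly one DNS label, so "a.b.yourapp.com" is not a tenant.
  const label = host.slice(0, -suffix.length);
  if (!/^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/.test(label)) return null;
  if (RESERVED_LABELS.has(label)) return null;
  return label;
}
```

Returning null for anything unexpected lets the middleware fail closed with its missing_tenant_context response instead of guessing a tenant.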
JWT Design for Multi-Tenancy
The JWT is where authentication and tenant authorization converge. A well-designed JWT for a multi-tenant system includes:
{
  "sub": "user_01HXYZ",
  "email": "alice@company.com",
  "tenant_id": "tn_acme_corp",
  "org_role": "admin",
  "permissions": [
    "projects:read",
    "projects:write",
    "billing:read",
    "members:manage"
  ],
  "tenants": [
    { "id": "tn_acme_corp", "role": "admin" },
    { "id": "tn_startup_inc", "role": "viewer" }
  ],
  "iat": 1704931200,
  "exp": 1704934800,
  "iss": "https://auth.yourapp.com"
}
Several design decisions matter here:
The tenant_id claim represents the current tenant context, not all tenants the user has access to. A user who belongs to three tenants gets a JWT scoped to whichever tenant they are currently working in. Switching tenants requires obtaining a new token (either by re-authenticating or by exchanging the current token for one scoped to a different tenant via a token exchange endpoint). This prevents a class of bugs where code accidentally uses the user’s privileges in Tenant A to access data in Tenant B. The token exchange endpoint verifies that the user has access to the requested tenant before issuing a new token.
Permissions are embedded in the token, not looked up at request time. This avoids a database round-trip on every request to check permissions. The tradeoff is that permission changes do not take effect until the token expires. With a 1-hour token lifetime and silent background refresh, the maximum delay is 1 hour, which is acceptable for most applications. For operations where immediate permission enforcement matters (deleting an organization, revoking admin access, changing payment methods), we check the database directly regardless of what the token says. We call these “sensitive operations” and they are marked with a @requireFreshPermissions decorator that bypasses the token-based cache.
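The @requireFreshPermissions decorator itself is not shown in this post. As a rough sketch of the idea, here is the same check written as a plain higher-order function rather than a decorator. AuthContext, PermissionLookup, and ForbiddenError are hypothetical names, and the lookup is injected so the sketch does not assume a particular database client.

```typescript
type AuthContext = { sub: string; tenantId: string; permissions: string[] };

// Assumed to read the user's current permissions from the source of truth.
type PermissionLookup = (userId: string, tenantId: string) => Promise<string[]>;

class ForbiddenError extends Error {}

// Wraps a handler so the required permission is re-read on every call.
function requireFreshPermissions<A extends unknown[], R>(
  permission: string,
  lookup: PermissionLookup,
  handler: (auth: AuthContext, ...args: A) => Promise<R>,
) {
  return async (auth: AuthContext, ...args: A): Promise<R> => {
    // Deliberately ignore auth.permissions: the token copy may be stale.
    const current = await lookup(auth.sub, auth.tenantId);
    if (!current.includes(permission)) {
      throw new ForbiddenError(`missing permission: ${permission}`);
    }
    return handler(auth, ...args);
  };
}
```

A sensitive handler would be wrapped once at registration time, so individual call sites cannot skip the fresh check.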
Short token lifetimes with refresh tokens. Access tokens expire in 1 hour. Refresh tokens expire in 30 days. The refresh token is stored in an HttpOnly, Secure, SameSite=Strict cookie. The access token is stored in memory (not localStorage, not sessionStorage) on the client side. Keeping the access token out of persistent storage makes it much harder for an XSS payload to steal it, and SameSite=Strict stops the browser from attaching the refresh cookie to cross-site requests, which blocks CSRF attacks that try to use it. Note that SameSite works at the site level rather than the origin level, so sibling tenant subdomains under yourapp.com count as same-site. When the page reloads, the access token is lost and the client silently refreshes it using the cookie-based refresh token. This adds one extra HTTP request on page load, which adds approximately 50ms of latency, but removes the most common token-theft path. It does not eliminate XSS risk: injected script can still issue requests from within the page, so output encoding and a Content Security Policy remain necessary.
We also implement token revocation via a deny-list. When a user’s access is revoked (removal from the organization, password reset, admin action), their refresh token is added to a Redis-backed deny-list. The deny-list is checked on every refresh token exchange, so revocation takes effect within 1 hour (the access token TTL) even though JWTs are stateless. For cases that need immediate revocation, the deny-list also contains short-lived entries for access tokens, which are checked on every request for sensitive operations.
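To make the deny-list behavior concrete, here is a small in-memory stand-in for the Redis-backed version. It assumes each token carries a unique identifier (for example a jti claim, which the sample JWT above does not include) and that entries only need to live until the token itself would have expired. The class name and the injected clock are hypothetical.

```typescript
// In-memory stand-in for a Redis-backed deny-list with per-entry TTLs.
class TokenDenyList {
  // token id -> expiry of the token, in epoch seconds
  private entries = new Map<string, number>();

  constructor(
    private now: () => number = () => Math.floor(Date.now() / 1000),
  ) {}

  // Deny a token until its own expiry; after that it is invalid anyway.
  revoke(tokenId: string, tokenExpiresAt: number): void {
    if (tokenExpiresAt > this.now()) {
      this.entries.set(tokenId, tokenExpiresAt);
    }
  }

  isRevoked(tokenId: string): boolean {
    const expiresAt = this.entries.get(tokenId);
    if (expiresAt === undefined) return false;
    if (expiresAt <= this.now()) {
      // Lazy cleanup, mirroring what a Redis TTL would do automatically.
      this.entries.delete(tokenId);
      return false;
    }
    return true;
  }
}
```

Tying each entry's lifetime to the token's expiry keeps the list small: once a token can no longer be accepted on its own, there is nothing left to deny.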
Row-Level Security: The Last Line of Defense
Regardless of how well you implement authentication and authorization in your application code, you need a database-level safety net. Application code has bugs. Middleware gets bypassed. New endpoints get added without tenant filtering. Row-level security (RLS) in PostgreSQL ensures that even if your application code forgets to filter by tenant, the database itself enforces the boundary.
-- Enable RLS on all tenant-scoped tables
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Force RLS even for table owners (prevents bypass by the owning role;
-- superusers and roles with BYPASSRLS still skip RLS)
ALTER TABLE projects FORCE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
ALTER TABLE invoices FORCE ROW LEVEL SECURITY;

-- Policy: users can only see rows belonging to their tenant.
-- With no WITH CHECK clause, the USING expression also applies to
-- rows being inserted or updated.
CREATE POLICY tenant_isolation ON projects
  USING (tenant_id = current_setting('app.current_tenant')::text);
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.current_tenant')::text);
CREATE POLICY tenant_isolation ON invoices
  USING (tenant_id = current_setting('app.current_tenant')::text);
In your application, set the tenant context at the start of every database transaction:
// Database middleware: set tenant context for every query
// Assumes `pool` is a node-postgres (pg) Pool created elsewhere.
export function createTenantConnection(tenantId: string) {
  return {
    async query(sql: string, params: any[] = []) {
      const client = await pool.connect();
      try {
        await client.query('BEGIN');
        // SET LOCAL cannot take bind parameters, so use its function form:
        // set_config(name, value, is_local = true) is transaction-scoped.
        await client.query(
          `SELECT set_config('app.current_tenant', $1, true)`, [tenantId]
        );
        const result = await client.query(sql, params);
        await client.query('COMMIT');
        return result;
      } catch (e) {
        await client.query('ROLLBACK');
        throw e;
      } finally {
        client.release();
      }
    }
  };
}
The SET LOCAL is critically important: it scopes the setting to the current transaction only (the function form, set_config with its third argument set to true, behaves the same way). If you use SET without LOCAL, the tenant context persists on the connection after the transaction ends, which can leak between requests if you use connection pooling. We caught this exact bug in a code review: a developer used SET instead of SET LOCAL, and under high concurrency, requests from Tenant A were sometimes executed with Tenant B’s context because they reused the same pooled connection. The bug was intermittent and extremely difficult to reproduce in testing; it only manifested under production-level concurrency.
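The difference is easier to see in a toy model. The sketch below does not talk to PostgreSQL at all; FakePooledConnection is a hypothetical class that only imitates the one rule that matters here: session-level settings survive the end of a transaction, while transaction-local settings are discarded.

```typescript
// Toy model of one pooled connection (not a real database client).
class FakePooledConnection {
  private sessionSettings = new Map<string, string>();
  private localSettings = new Map<string, string>();

  // isLocal = false models SET, isLocal = true models SET LOCAL.
  setTenant(tenantId: string, isLocal: boolean): void {
    const target = isLocal ? this.localSettings : this.sessionSettings;
    target.set('app.current_tenant', tenantId);
  }

  currentTenant(): string | undefined {
    return (
      this.localSettings.get('app.current_tenant') ??
      this.sessionSettings.get('app.current_tenant')
    );
  }

  // COMMIT or ROLLBACK: only transaction-local settings are dropped.
  endTransaction(): void {
    this.localSettings.clear();
  }
}

// Runs one "request" that sets its tenant, then reports the tenant context
// the next user of the same pooled connection would inherit.
function leftoverTenantAfter(
  conn: FakePooledConnection,
  tenantId: string,
  isLocal: boolean,
): string | undefined {
  conn.setTenant(tenantId, isLocal);
  conn.endTransaction();
  return conn.currentTenant();
}
```

In this model, a request that used the session-level form leaves its tenant behind on the connection, while the transaction-local form leaves nothing for the next request to inherit.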
Session Management and Token Lifecycle
Multi-tenant session management introduces complexity beyond standard single-tenant JWT flows. The primary challenge is handling users who belong to multiple tenants and need to switch between them without re-authenticating. Our implementation uses a session hierarchy: a global session (tied to the user’s identity) and tenant-scoped sessions (tied to the user’s context within a specific tenant).
The global session is established when the user authenticates (via password, SSO, or social login). It lives as an HttpOnly cookie and contains the user’s identity but no tenant context. The tenant-scoped session is established when the user selects a tenant (either by navigating to a tenant subdomain or explicitly choosing one from a tenant picker). The tenant-scoped session produces the JWT described earlier, with the tenant_id claim and tenant-specific permissions.
The token exchange endpoint that creates tenant-scoped sessions performs three checks: (1) the user’s global session is valid and not expired, (2) the user has an active membership in the requested tenant, and (3) the user’s account is not suspended or locked in that tenant. If all three checks pass, a new tenant-scoped JWT is issued. If any check fails, the exchange is rejected and the reason is logged for security monitoring.
// Token exchange endpoint: switch tenant context
app.post('/auth/exchange', authenticate, async (req, res) => {
  const { target_tenant_id } = req.body;
  const userId = req.auth.sub;
  if (typeof target_tenant_id !== 'string' || !target_tenant_id) {
    return res.status(400).json({ error: 'invalid_target_tenant' });
  }

  // Check membership in target tenant.
  // Assumes a node-postgres style client whose result exposes `rows`.
  const { rows } = await db.query(
    'SELECT role, status FROM tenant_members WHERE user_id = $1 AND tenant_id = $2',
    [userId, target_tenant_id]
  );
  const membership = rows[0];
  if (!membership || membership.status !== 'active') {
    securityLog.warn('tenant_exchange_denied', {
      user: userId,
      target: target_tenant_id,
      reason: membership ? 'suspended' : 'not_a_member'
    });
    return res.status(403).json({ error: 'access_denied' });
  }

  // Fetch tenant-specific permissions
  const permissions = await getPermissionsForRole(
    membership.role, target_tenant_id
  );

  // Issue new tenant-scoped token
  const token = signJWT({
    sub: userId,
    tenant_id: target_tenant_id,
    org_role: membership.role,
    permissions: permissions,
  }, { expiresIn: '1h' });
  res.json({ access_token: token });
});
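This approach ensures that switching tenants always checks current authorization state. A user who was removed from Tenant B five minutes ago will fail the exchange check, even if their browser still has a valid global session. The check does not invalidate tokens that were already issued: a Tenant B access token obtained before the removal stays valid until it expires, which is why the deny-list and the fresh permission checks on sensitive operations described earlier still matter.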
We also implement session binding: the refresh token is bound to the tenant context it was issued for. Refreshing a token for Tenant A does not produce a token valid for Tenant B. This prevents a subtle attack where a malicious browser extension captures a refresh token from one tenant context and attempts to use it in another. Each refresh token is stored in the database alongside its tenant ID, and the refresh endpoint verifies that the tenant context matches before issuing a new access token.
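A minimal sketch of the binding check, using an in-memory map in place of the database table. RefreshRecord, RefreshResult, and redeemRefreshToken are hypothetical names, and a production store would more likely keep a hash of the token than the token itself.

```typescript
type RefreshRecord = { userId: string; tenantId: string; expiresAt: number };

// Stand-in for the table that stores refresh tokens with their tenant ID.
const refreshTokens = new Map<string, RefreshRecord>();

type RefreshResult =
  | { ok: true; userId: string; tenantId: string }
  | { ok: false; reason: 'unknown_token' | 'expired' | 'tenant_mismatch' };

function redeemRefreshToken(
  token: string,
  requestTenantId: string,
  nowSeconds: number,
): RefreshResult {
  const record = refreshTokens.get(token);
  if (!record) return { ok: false, reason: 'unknown_token' };
  if (record.expiresAt <= nowSeconds) return { ok: false, reason: 'expired' };

  // The tenant comes from the stored record; the request can only confirm it.
  if (record.tenantId !== requestTenantId) {
    return { ok: false, reason: 'tenant_mismatch' };
  }
  return { ok: true, userId: record.userId, tenantId: record.tenantId };
}
```

The new access token is then issued for the tenant in the stored record, so a refresh token captured in one tenant context has no route to a token for another.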
For enterprise customers who use SAML or OIDC-based single sign-on, we support identity provider-initiated tenant selection. The SSO assertion includes a custom attribute that specifies which tenant the user is logging into. This attribute is validated against the user’s actual tenant membership (we never trust the IdP blindly because a misconfigured IdP could assert access to tenants the user does not belong to). The IdP attribute serves as a hint for which tenant to activate, but the authorization decision is always made by our system based on the membership database.
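The "hint, not authority" rule can be sketched as a small pure function. TenantMembership and resolveSsoTenant are hypothetical names; the memberships array is assumed to have been loaded from the membership database for the authenticated user.

```typescript
type TenantMembership = { tenantId: string; status: 'active' | 'suspended' };

// Returns the tenant to activate, or null when the IdP's hint is not
// backed by an active membership in our own records.
function resolveSsoTenant(
  hintedTenantId: string | undefined,
  memberships: TenantMembership[],
): string | null {
  if (!hintedTenantId) return null;
  const match = memberships.find(
    m => m.tenantId === hintedTenantId && m.status === 'active',
  );
  return match ? match.tenantId : null;
}
```

A null result would send the user to a tenant picker or an error page rather than into the tenant the IdP named.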
Common Mistakes and How to Avoid Them
Mistake 1: Trusting client-supplied tenant IDs without validation. If your API accepts a tenant_id parameter in the request body and uses it to filter data, an attacker can pass any tenant ID and access any tenant’s data. The tenant context must come from the authenticated session (JWT claim or subdomain), never from request parameters. This seems obvious, but we have found this vulnerability in three out of four multi-tenant codebases we have audited. It usually happens when a developer builds an internal admin endpoint and uses a request parameter for tenant selection, then that endpoint is accidentally exposed to regular users.
Mistake 2: Missing tenant filters on aggregate queries. Individual record lookups usually include tenant filtering because the developer thinks about it. Aggregate queries (reports, dashboards, export functions) often miss tenant filtering because the developer focuses on the aggregation logic and forgets the access control. We enforce tenant filtering on every new database query with a linter rule that flags any query on a tenant-scoped table that does not include a WHERE tenant_id = ? clause. The RLS layer catches a missing filter at the database level, but we prefer to catch it in code review because RLS filtering surfaces as empty results rather than explicit failures, which can be confusing to debug.
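Our actual linter rule is not reproduced here. As an illustration of the idea only, the sketch below does a naive text check on a SQL string: it is not a SQL parser, it accepts a tenant_id predicate anywhere in the statement, and it will miss or misjudge many real queries. The table list and function name are assumptions.

```typescript
const TENANT_SCOPED_TABLES = ['projects', 'documents', 'invoices'];

// Naive check: returns the tenant-scoped tables a query reads or updates
// when the query contains no "tenant_id =" predicate at all.
function findMissingTenantFilter(sql: string): string[] {
  const normalized = sql.toLowerCase();
  if (/\btenant_id\s*=/.test(normalized)) return [];
  return TENANT_SCOPED_TABLES.filter(table =>
    new RegExp(`\\b(from|join|update)\\s+${table}\\b`).test(normalized),
  );
}
```

Even a check this crude tends to catch the aggregate-query case, because those queries usually have no tenant_id predicate anywhere.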
Mistake 3: Leaking tenant context in error messages. A 404 error that says “Project proj_abc not found in tenant tn_acme_corp” reveals the tenant name to an attacker probing for valid tenant IDs. Error messages should never include tenant-specific information. Use generic messages: “Resource not found.” Similarly, timing-based information leaks are a concern: if a request for a resource in the correct tenant returns 404 in 5ms but a request for a resource in the wrong tenant returns 404 in 2ms (because the tenant check short-circuits before the resource lookup), an attacker can enumerate valid tenant IDs by measuring response times. Ensure your error paths take consistent time regardless of which check fails.
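One way to approach both points is to route "does not exist" and "exists in another tenant" through the same code path and the same response. The sketch below uses an in-memory map and hypothetical names; it shows identical responses and an identical sequence of steps, but it makes no claim of constant-time behavior against a real database.

```typescript
type Project = { id: string; tenantId: string; name: string };

type LookupResult =
  | { status: 200; body: Project }
  | { status: 404; body: { error: string } };

// A single shared response, so the two failure cases cannot drift apart.
const NOT_FOUND = { status: 404 as const, body: { error: 'Resource not found.' } };

function getProject(
  store: Map<string, Project>,
  tenantId: string,
  projectId: string,
): LookupResult {
  // Always perform the lookup, then apply the tenant check, so a missing
  // resource and a foreign resource follow the same steps.
  const project = store.get(projectId);
  if (!project || project.tenantId !== tenantId) return NOT_FOUND;
  return { status: 200, body: project };
}
```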
Mistake 4: Sharing caches across tenants. If Tenant A and Tenant B both request the same report, and you cache the report with a key that does not include the tenant ID, Tenant B might receive Tenant A’s cached report. Every cache key must include the tenant ID. This sounds obvious but we have seen it happen three times in production codebases we audited.
// Better: enforce at the cache layer so individual call sites
// cannot forget. Assumes an ioredis client (setex(key, seconds, value)).
import Redis from 'ioredis';

class TenantScopedCache {
  constructor(
    private cache: Redis, private tenantId: string
  ) {}

  async get(key: string): Promise<string | null> {
    return this.cache.get(`${this.tenantId}:${key}`);
  }

  async set(
    key: string, value: string, ttl: number
  ): Promise<void> {
    await this.cache.setex(
      `${this.tenantId}:${key}`, ttl, value
    );
  }
}
Mistake 5: Not testing cross-tenant access in automated tests. Every API endpoint should have at least one test that authenticates as a user in Tenant A and attempts to access a resource belonging to Tenant B. This test should return a 404 (not a 403, to avoid confirming the resource exists in another tenant). If you do not have these tests, you will eventually ship a cross-tenant vulnerability. It is not a matter of if, but when. We run these tests as part of our CI pipeline, and they have caught 5 cross-tenant bugs before they reached production in the past year.
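A sketch of how such a check could be shared across endpoints. Everything here is hypothetical and in-memory: a real suite would call the API over HTTP with a Tenant A token, but the shape of the assertion is the same. The second handler is deliberately buggy to show what the helper reports.

```typescript
type ApiResult = { status: number };

type EndpointUnderTest = {
  name: string;
  call: (actingTenantId: string, resourceId: string) => ApiResult;
};

// Returns the names of endpoints that answered anything other than 404
// when a user from one tenant asked for another tenant's resource.
function findCrossTenantLeaks(
  endpoints: EndpointUnderTest[],
  actingTenantId: string,
  foreignResourceId: string,
): string[] {
  return endpoints
    .filter(e => e.call(actingTenantId, foreignResourceId).status !== 404)
    .map(e => e.name);
}

// Illustrative data: resource id -> owning tenant.
const resourceOwners = new Map<string, string>([['proj_b1', 'tn_b']]);

const endpointsUnderTest: EndpointUnderTest[] = [
  {
    name: 'GET /projects/:id',
    call: (tenantId, id) => ({
      status: resourceOwners.get(id) === tenantId ? 200 : 404,
    }),
  },
  {
    // Deliberately buggy: finds the resource without checking the tenant.
    name: 'GET /projects/:id/export',
    call: (_tenantId, id) => ({ status: resourceOwners.has(id) ? 200 : 404 }),
  },
];
```

Running findCrossTenantLeaks(endpointsUnderTest, 'tn_a', 'proj_b1') flags only the export endpoint, and a CI job could fail whenever the returned list is not empty.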
One additional consideration for enterprise deployments: audit logging for all authentication events. Every login, logout, token exchange, permission check failure, and cross-tenant access attempt should be logged with the user ID, tenant ID, IP address, and user agent. These logs serve dual purposes: compliance (SOC 2 and ISO 27001 require authentication audit trails) and security incident investigation (when a suspicious access pattern is detected, the audit log provides the forensic evidence needed to determine scope and impact). We store authentication audit logs in a separate, append-only data store with a 13-month retention period, which satisfies both SOC 2 annual audit requirements and gives us a full year of historical data for pattern analysis. The audit log is immutable: even database administrators cannot modify or delete entries, which is a requirement for several compliance frameworks.
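As an illustration of the record shape described above, here is a minimal typed event and an append-only in-memory log. The type and class names are hypothetical, and the in-memory array only imitates the append-only property; it says nothing about how our separate audit store is implemented.

```typescript
type AuthEventType =
  | 'login'
  | 'logout'
  | 'token_exchange'
  | 'permission_denied'
  | 'cross_tenant_attempt';

type AuthAuditEvent = Readonly<{
  type: AuthEventType;
  userId: string;
  tenantId: string | null; // null before a tenant has been selected
  ipAddress: string;
  userAgent: string;
  occurredAt: string; // ISO 8601 timestamp
}>;

// Entries can be appended and read, but never updated or removed.
class AuthAuditLog {
  private entries: AuthAuditEvent[] = [];

  append(
    event: Omit<AuthAuditEvent, 'occurredAt'>,
    now: Date = new Date(),
  ): AuthAuditEvent {
    const stored: AuthAuditEvent = Object.freeze({
      ...event,
      occurredAt: now.toISOString(),
    });
    this.entries.push(stored);
    return stored;
  }

  forTenant(tenantId: string): readonly AuthAuditEvent[] {
    return this.entries.filter(e => e.tenantId === tenantId);
  }
}
```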
For organizations subject to GDPR or similar data protection regulations, tenant isolation in the authentication layer has additional implications. User data processed in the context of one tenant must not be accessible from another tenant’s context, even for internal reporting or analytics. This means your analytics pipeline must be tenant-aware: aggregate metrics can span tenants, but any drill-down that reaches individual user data must be scoped to a single tenant. We enforce this at the query layer using the same RLS policies that protect the application data, ensuring that a dashboard query from Tenant A’s admin cannot accidentally include user activity from Tenant B.
Authentication in multi-tenant SaaS is not harder than single-tenant authentication in terms of the cryptographic primitives or protocol flows. It is harder because every layer of your stack (routing, middleware, application code, database, caching) must be tenant-aware, and a failure at any single layer can expose data across tenants. Defense in depth is not a nice-to-have; it is the only approach that works in practice.