Securing SSH with Certificate-Based Authentication and Short-Lived Certs

The traditional method of managing SSH access (distributing public keys and appending them to `~/.ssh/authorized_keys`) is a scalability nightmare. In a growing infrastructure, this approach leads to "key sprawl," where orphaned keys from departed employees or decommissioned services linger indefinitely on production servers. Revocation is manual, error-prone, and nearly impossible to audit at scale.

To achieve a true Zero Trust architecture, we must move away from static identity (the permanent public key) and toward ephemeral identity (the short-lived certificate). By leveraging an SSH Certificate Authority (CA), we can decouple identity verification from server configuration, enabling a system where access is cryptographically bound to a period of time and a verified identity.

The Fundamental Shift: Keys vs. Certificates

To understand the benefit, we must distinguish between standard SSH key-based authentication and SSH certificate-based authentication.

Static Key-Based Authentication

In the standard model, the client possesses a private key, and the server possesses the corresponding public key. The server trusts the client because the public key is explicitly listed in its `authorized_keys` file. This creates an $O(n \times m)$ management problem, where $n$ is the number of users and $m$ is the number of servers.

Certificate-Based Authentication

In a CA-based model, the server does not store individual user keys. Instead, the server is configured to trust a single SSH Certificate Authority (CA).

When a user wants to connect, they present a certificate signed by the CA. The server verifies the CA's signature using its local copy of the CA's public key. If the signature is valid and the certificate's metadata (principals, validity period) meets the server's requirements, access is granted. This shifts the management problem from $O(n \times m)$ toward $O(1)$: every server holds the same CA public key, and adding or removing a user no longer requires touching any server.
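To make the scaling argument concrete, here is a small back-of-the-envelope calculation. The fleet size (200 users, 500 servers) is a made-up figure used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical fleet size, chosen only for illustration.
users=200
servers=500

# Static keys: in the worst case every user key needs an entry on every server.
static_entries=$((users * servers))

# CA model: each server holds one copy of the same CA public key,
# and onboarding a user touches no servers at all.
ca_trust_files=$servers
touched_per_new_user_static=$servers
touched_per_new_user_ca=0

echo "authorized_keys entries (static model): $static_entries"
echo "CA public key copies (CA model):        $ca_trust_files"
echo "servers touched per new user: static=$touched_per_new_user_static ca=$touched_per_new_user_ca"
```

The CA public key still has to reach every server once, but it is the same file everywhere and can be baked into the base image.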

The Mechanics of SSH Certificates

An SSH certificate is essentially an extension of an SSH public key. It contains the original public key plus metadata, including:

Principals: A list of usernames or roles (e.g., `admin`, `deploy-user`, `web-server-01`) that the certificate is authorized to act as.
Validity Period: An explicit `start_time` and `expire_time`.
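Critical Options and Extensions: Additional constraints and permissions. Source IP restrictions and forced commands are critical options (`source-address`, `force-command`), while extensions such as `permit-pty` enable optional features.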

The signing process is performed using the `ssh-keygen` utility (or via an automated API). The command structure looks roughly like this:

```bash
# Sign id_rsa.pub with the CA private key; the certificate is written to id_rsa-cert.pub.
ssh-keygen -s ca_key -I user_identity -n web-admin -V +4h id_rsa.pub
```

In this example:

`-s ca_key`: Signs the user's public key with the CA's private key.
`-I user_identity`: Sets the certificate's key ID, which `sshd` logs when the certificate is used (useful for auditing).
`-n web-admin`: Assigns the `web-admin` principal to the certificate.
`-V +4h`: Sets the expiration to 4 hours from now.
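The signing command can be tried end to end without touching any real keys. The following is a minimal sketch, assuming a local `ssh-keygen` from OpenSSH and a throwaway directory; the file names and the Ed25519 key type are choices made for this example, not requirements.

```bash
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so no real keys are touched.
workdir="$(mktemp -d)"

# 1. Create a demo CA key pair (a production CA key belongs in an HSM or KMS).
ssh-keygen -q -t ed25519 -N '' -C 'demo-user-ca' -f "$workdir/ca_key"

# 2. Create the user key pair that will be certified.
ssh-keygen -q -t ed25519 -N '' -C 'demo-user' -f "$workdir/id_ed25519"

# 3. Sign the user's public key: key ID user_identity, principal web-admin, valid 4 hours.
ssh-keygen -q -s "$workdir/ca_key" -I user_identity -n web-admin -V +4h \
  "$workdir/id_ed25519.pub"

# 4. Inspect the resulting certificate (written next to the public key).
cert_info="$(ssh-keygen -L -f "$workdir/id_ed25519-cert.pub")"
printf '%s\n' "$cert_info"
```

The `ssh-keygen -L` output lists the key ID, principals, validity window, critical options, and extensions, which makes it a quick way to confirm what a certificate actually grants.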

When the user attempts to connect, `sshd` consults the `TrustedUserCAKeys` directive in `/etc/ssh/sshd_config`. If the certificate was signed by a CA key listed in that file, is within its validity period, and names the account the user is logging in as among its principals, authentication succeeds.
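On the server side, the configuration is short. The sketch below writes an example fragment to a scratch file rather than editing the live config; the paths `/etc/ssh/user_ca.pub` and `/etc/ssh/auth_principals/%u` are hypothetical, while `TrustedUserCAKeys` and `AuthorizedPrincipalsFile` are standard `sshd_config` directives.

```bash
#!/usr/bin/env bash
set -eu

# Write the fragment to a scratch file; nothing under /etc is modified here.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
# Trust user certificates signed by the CA whose public key is in this file.
TrustedUserCAKeys /etc/ssh/user_ca.pub
# Optional: per-account list of certificate principals allowed to log in.
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
EOF

cat "$fragment"
# After review, the fragment would be merged into /etc/ssh/sshd_config and
# checked with `sshd -t` before reloading the daemon.
```

Without `AuthorizedPrincipalsFile`, the certificate must list the login username itself as a principal. With it, role-style principals such as `web-admin` can be mapped to local accounts.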

Implementing Short-Lived Certificates

The true power of this architecture is realized when certificates are "short-lived." In a mature implementation, certificates should expire in minutes or hours, not days.

The Automated Workflow

A robust implementation follows this lifecycle:

Identity Authentication: A user authenticates against a central Identity Provider (IdP) using modern protocols like OIDC or SAML (e.g., Okta, Google, or GitHub).
The Certificate Request: Upon successful authentication, the user's local machine generates a new, ephemeral SSH key pair. The public key is sent to a "Signing Service."
Verification and Signing: The Signing Service verifies the user's OIDC token. If valid, it uses the CA private key (stored securely in a Hardware Security Module or a secret manager like HashiCorp Vault) to sign the ephemeral public key.
Deployment: The signed certificate is returned to the user's machine and loaded into the local SSH agent.
Access: The user connects to the target host. The host validates the certificate against the CA public key.
Expiration: Once the validity window closes (4 hours in the earlier example), `sshd` rejects the certificate. No manual cleanup on the target host is required.
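The signing step of this lifecycle can be sketched as two small shell functions. This is an illustration under stated assumptions, not a complete Signing Service: OIDC token validation is assumed to have happened already, the 240-minute ceiling is a hypothetical policy value, and the function names are invented for this example.

```bash
#!/usr/bin/env bash

# Hypothetical policy ceiling: no certificate may outlive four hours.
MAX_TTL_MINUTES=240

# Clamp a requested lifetime (in minutes) to the policy ceiling.
clamp_ttl_minutes() {
  local requested="$1"
  if [ "$requested" -gt "$MAX_TTL_MINUTES" ]; then
    echo "$MAX_TTL_MINUTES"
  else
    echo "$requested"
  fi
}

# Sign an ephemeral public key once the caller's identity has been verified
# elsewhere (token validation is out of scope for this sketch).
# Arguments: CA key path, public key path, key ID, principal, requested TTL, serial.
sign_ephemeral_key() {
  local ca_key="$1" pubkey="$2" key_id="$3" principal="$4" ttl serial="$6"
  ttl="$(clamp_ttl_minutes "$5")"
  ssh-keygen -q -s "$ca_key" -I "$key_id" -n "$principal" \
    -V "+${ttl}m" -z "$serial" "$pubkey"
}
```

A call such as `sign_ephemeral_key ./ca_key ./ephemeral.pub alice@example.com web-admin 600 1001` would issue a certificate capped at 240 minutes. Assigning a unique serial with `-z` makes each certificate individually identifiable in logs.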

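This workflow greatly reduces reliance on Key Revocation Lists (KRLs). Because the window of exposure is so small, the risk posed by a compromised certificate is largely mitigated by its expiration. A KRL remains useful as an emergency measure when even a few hours of exposure is unacceptable.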
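For cases where waiting out the validity window is not acceptable, OpenSSH's KRL mechanism is still available. The following is a minimal sketch, assuming a local `ssh-keygen`; the key names and serial numbers are invented for the demonstration.

```bash
#!/usr/bin/env bash
set -eu

workdir="$(mktemp -d)"

# Demo CA and two certificates with distinct serial numbers (all values hypothetical).
ssh-keygen -q -t ed25519 -N '' -f "$workdir/ca_key"
ssh-keygen -q -t ed25519 -N '' -f "$workdir/alice"
ssh-keygen -q -t ed25519 -N '' -f "$workdir/bob"
ssh-keygen -q -s "$workdir/ca_key" -I alice -n web-admin -V +4h -z 1001 "$workdir/alice.pub"
ssh-keygen -q -s "$workdir/ca_key" -I bob -n web-admin -V +4h -z 1002 "$workdir/bob.pub"

# Revoke serial 1001 only, using ssh-keygen's KRL specification syntax.
printf 'serial: 1001\n' > "$workdir/krl_spec"
ssh-keygen -k -f "$workdir/revoked.krl" -s "$workdir/ca_key.pub" "$workdir/krl_spec" >/dev/null

# ssh-keygen -Q exits non-zero when a listed certificate is revoked.
if ssh-keygen -Q -f "$workdir/revoked.krl" "$workdir/alice-cert.pub" >/dev/null; then
  alice_status="ok"
else
  alice_status="revoked"
fi
if ssh-keygen -Q -f "$workdir/revoked.krl" "$workdir/bob-cert.pub" >/dev/null; then
  bob_status="ok"
else
  bob_status="revoked"
fi
echo "alice: $alice_status, bob: $bob_status"
```

On a server, the resulting file would be referenced with the `RevokedKeys` directive in `sshd_config`.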

Operational Considerations

Transitioning to a CA-based model requires real infrastructure investment, chiefly in the three areas below.

1. CA Key Protection

The CA private key is the "crown jewel." If an attacker gains access to this key, they can mint certificates for any user on any server in your fleet.

Never store the CA key on a persistent disk on a general-purpose server.
Use an HSM (Hardware Security Module) or a managed service like AWS KMS or Google Cloud KMS to perform the signing operations.
Implement strict IAM policies around the Signing Service.

2. Observability and Auditing

Since servers no longer hold per-user `authorized_keys` entries that record who was granted access, the authoritative audit trail moves to the Signing Service. Every certificate issuance should be logged with:

The identity of the user (from the IdP).
The principals requested.
The timestamp and expiration.
The source IP of the request.
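The fields above map naturally onto one structured log line per issuance. The sketch below is only an illustration: the function name `log_issuance`, the JSON field names, and the sample values are all assumptions, and a real service would use a proper JSON encoder instead of `printf`.

```bash
#!/usr/bin/env bash

# Emit one JSON line per issued certificate.
# Arguments: IdP identity, comma-separated principals, issued-at, expires-at, source IP.
# Values are assumed to contain no quotes or backslashes (no escaping is done here).
log_issuance() {
  printf '{"identity":"%s","principals":"%s","issued_at":"%s","expires_at":"%s","source_ip":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Sample record with made-up values.
log_issuance "alice@example.com" "web-admin" \
  "2024-01-01T09:00:00Z" "2024-01-01T13:00:00Z" "203.0.113.7"
```

One line per event keeps the log easy to ship to a central store and to join against the key ID that `sshd` records on each target host.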

3. Handling Clock Drift

Since certificates rely on `start_time` and `expire_time`, clock synchronization is critical. If a target server's clock drifts backward, it might accept expired certificates; if it drifts forward, it might reject valid ones. Implementing NTP (Network Time Protocol) across all hosts, including the Signing Service, is therefore essential.
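A simple skew check can be wired into host health monitoring. The following sketch assumes a reference timestamp is obtained from some trusted source; how that value is fetched is left out, and the function name and the 5-second tolerance are hypothetical.

```bash
#!/usr/bin/env bash

# Return success when two epoch timestamps differ by at most max_skew seconds.
clock_skew_ok() {
  local local_epoch="$1" reference_epoch="$2" max_skew="$3"
  local diff=$((local_epoch - reference_epoch))
  if [ "$diff" -lt 0 ]; then diff=$((-diff)); fi
  [ "$diff" -le "$max_skew" ]
}

# Placeholder usage: the reference here is just the local clock again.
# A real check would substitute a timestamp from a trusted time source.
now="$(date +%s)"
if clock_skew_ok "$now" "$now" 5; then
  echo "clock within tolerance"
else
  echo "clock skew exceeds tolerance" >&2
fi
```

The tolerance should stay small relative to the certificate lifetime, so that skew never meaningfully extends or shortens a validity window.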

Conclusion

As the sections on keys versus certificates, certificate mechanics, and short-lived certificates show, securing SSH with certificate-based authentication depends on execution discipline as much as design.

The practical hardening path combines strict validation of IdP tokens and claims (including replay resistance), deny-by-default policy for which principals an identity may request, and disciplined certificate lifecycle management: short validity periods, a protected CA key, and a revocation path for emergencies. Together these controls reduce both exploitability and attacker dwell time, because an attacker must defeat several independent layers.

Operational confidence should be measured, not assumed. Track the false-allow rate, the time to revoke privileged access, and the mean time to detect and remediate configuration drift, then use those results to tune preventive policy, detection fidelity, and response runbooks on a fixed review cadence.

