
Setting up Authentication

Nick Sweeting edited this page Oct 5, 2024 · 121 revisions


💬 We offer consulting services to set up, integrate, and maintain ArchiveBox with your org's auth & hosting.
If you need support, advanced development to capture difficult sites, audit logging, and more, we can provide it!

We use this revenue (from corporate clients who can afford to pay) to support open source development and keep ArchiveBox free.


ArchiveBox supports several types of authentication for users logging in via the Admin Web UI or REST API.

Set Up Admin Web UI Permissions

Non-admin user permissions are only available to paying ArchiveBox clients

Use these three options to set up your desired permissions for non-admin guest users:

Note

Open source ArchiveBox does not support setting up non-admin users & groups with custom permissions. We do offer this feature, audit logging, and more to paying clients.



Admin Web UI Authentication Methods


Username & Password (the default)

You need a user account to access the Admin UI. Run the commands below to create or edit a user from the CLI:

archivebox manage createsuperuser
archivebox manage changepassword <username>

# equivalent: docker compose run archivebox manage [...]
# equivalent: docker run -v $PWD:/data archivebox/archivebox manage [...]

Tip

If using Docker, you can set ADMIN_USERNAME & ADMIN_PASSWORD to auto-create an admin account on first run.

Existing users can be managed from the Admin UI here: /admin/auth/user/,
and you can change your password in the UI here: /admin/password_change/.



Reverse Proxy Authentication

Can be used with a reverse proxy auth provider like oauth2-proxy, Cloudflare Zero Trust, Authentik, and others.

Set these ArchiveBox configuration values based on your reverse proxy setup and needs:

# REQUIRED: the header where your upstream reverse proxy will place the authenticated user's username/email
# EXAMPLE: Cf-Access-Authenticated-User-Email (if using Cloudflare Access / Zero Trust)
REVERSE_PROXY_USER_HEADER=X-Remote-User

# REQUIRED: the IP/CIDR of your upstream reverse proxy server
# WARNING: make sure this range contains ONLY your reverse proxy server!
# ArchiveBox will completely trust any IP in this range for authentication
REVERSE_PROXY_WHITELIST=192.0.2.3/32

# OPTIONAL: redirect users to an external URL after they log out
LOGOUT_REDIRECT_URL=https://auth.yourcompany.example.com/after/logout
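The warning about REVERSE_PROXY_WHITELIST is easier to see with a small illustration of header-based trust. The sketch below is not ArchiveBox's actual implementation. The function name `trusted_proxy_user`, the comma-separated whitelist format, and the plain-dict header lookup are all assumptions made for illustration. It shows why any address inside the whitelisted range can log in as any user simply by setting the header.

```python
import ipaddress


def trusted_proxy_user(remote_addr, headers,
                       whitelist="192.0.2.3/32",
                       user_header="X-Remote-User"):
    """Return the username asserted by the proxy, or None if untrusted.

    Hypothetical helper: the header is only believed when the request's
    source address falls inside one of the whitelisted networks.
    """
    # Assumption: multiple CIDRs may be given as a comma-separated string.
    networks = [
        ipaddress.ip_network(cidr.strip())
        for cidr in whitelist.split(",")
        if cidr.strip()
    ]
    addr = ipaddress.ip_address(remote_addr)

    # Requests from outside the whitelist are never trusted,
    # no matter what the header claims.
    if not any(addr in net for net in networks):
        return None

    # Simplification: exact-case lookup in a plain dict. Real HTTP header
    # names are case-insensitive.
    return headers.get(user_header) or None
```

With the example values above, a request from 192.0.2.3 carrying `X-Remote-User: alice` is treated as alice, while the same header sent from 192.0.2.4 is ignored. A wider range such as a /24 would extend that trust to every host in it.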

LDAP Authentication

Can be used with an SSO provider like Authentik, Authelia, Okta / Auth0, Keycloak, and others.

First, pip-install the ldap add-on to use this feature (not needed when running ArchiveBox with Docker).

pip install 'archivebox[ldap]'

Then set these configuration values to finish configuring LDAP:

LDAP=True
LDAP_SERVER_URI="ldap://ldap.example.com:3389"
LDAP_BIND_DN="ou=archivebox,ou=services,dc=ldap.example.com"
LDAP_BIND_PASSWORD="secret-bind-user-password"
LDAP_USER_BASE="ou=users,ou=archivebox,ou=services,dc=ldap.example.com"
LDAP_USER_FILTER="(objectClass=user)"

LDAP_USERNAME_ATTR="uid"
LDAP_FIRSTNAME_ATTR="givenName"
LDAP_LASTNAME_ATTR="sn"
LDAP_EMAIL_ATTR="mail"
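To make the relationship between LDAP_USERNAME_ATTR and LDAP_USER_FILTER more concrete, here is a minimal sketch of how an LDAP-backed login commonly combines the two into one search filter under LDAP_USER_BASE. It is an assumption that ArchiveBox builds its filter in exactly this form. The helper names `escape_filter_value` and `build_user_filter` are hypothetical.

```python
def escape_filter_value(value):
    """Escape characters that are special inside LDAP search filters (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_filter(username,
                      username_attr="uid",
                      user_filter="(objectClass=user)"):
    """AND the username match together with the configured user filter."""
    return f"(&({username_attr}={escape_filter_value(username)}){user_filter})"


print(build_user_filter("alice"))
# (&(uid=alice)(objectClass=user))
```

If logins fail even though the bind credentials are correct, running the equivalent filter by hand against your directory (for example with ldapsearch) can confirm whether the configured attributes match your directory's schema.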

Not Yet Supported: SAML / OAuth2 / OpenID Authentication

We'd welcome PRs to add support for these using django-allauth!

These methods are not natively supported by ArchiveBox at the moment. However, it is still possible to use them with ArchiveBox by running your own IdP (Identity Provider) server to act as a bridge (e.g. Authentik, Authelia, oauth2-proxy).

The IdP server can act as a middleman gateway to authenticate users using an external SAML/OAuth/OpenID/etc. provider (e.g. Google, Microsoft, Github, Facebook, etc.), and then pass on the authenticated user's session info to ArchiveBox using LDAP or reverse proxy headers (as described above).




REST API

The REST API (available starting in v0.8.0) supports several methods of authentication for convenience.

To see API docs, try endpoints interactively, and see how auth works, visit this URL on your ArchiveBox server:
http://127.0.0.1:8000/api/v1/docs

(Screenshot: django-ninja Swagger API docs page)



To get started using the REST API, you can generate an API key for your user in the Admin Web UI:
http://127.0.0.1:8000/admin/api/apitoken/add/

or by calling the http://127.0.0.1:8000/api/v1/auth/get_api_token endpoint with a username & password:

curl -X 'POST' \
  'http://127.0.0.1:8000/api/v1/auth/get_api_token' \
  -H 'Content-Type: application/json' \
  -d '{"username": "YOURUSERNAMEHERE", "password": "YOURPASSWORDHERE"}'
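For scripts that would rather not shell out to curl, the same token request can be built with Python's standard library. This is a sketch using only the endpoint and JSON body shown above. The helper name `build_token_request` is hypothetical, and the shape of the response is deliberately not assumed here; check the interactive API docs on your server for it.

```python
import json
import urllib.request


def build_token_request(base_url, username, password):
    """Build (but do not send) a POST request to the get_api_token endpoint."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/auth/get_api_token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server, uncomment the lines below:
# req = build_token_request("http://127.0.0.1:8000", "YOURUSERNAMEHERE", "YOURPASSWORDHERE")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```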

Tip

Bearer Tokens are the recommended method for the best balance of security and convenience.

API Bearer Token Authentication

Pass Authorization: Bearer YOURAPITOKENHERE as a request header.

curl -X 'GET' \
  'http://127.0.0.1:8000/api/v1/core/snapshots?limit=10' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOURAPITOKENHERE'

API Request Header Authentication

This method is provided in case you have a reverse proxy in front of ArchiveBox that consumes the Authorization header before it reaches ArchiveBox.

Pass X-ArchiveBox-API-Key: YOURAPITOKENHERE as a request header.

curl -X 'GET' \
  'http://127.0.0.1:8000/api/v1/core/snapshots?limit=10' \
  -H 'accept: application/json' \
  -H 'X-ArchiveBox-API-Key: YOURAPITOKENHERE'
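The two header-based methods differ only in which header carries the token, so a client can switch between them with a single flag. The sketch below assumes nothing beyond the header names and snapshots endpoint shown above. The helper names `auth_headers` and `build_snapshots_request` are hypothetical.

```python
import urllib.request


def auth_headers(token, use_api_key_header=False):
    """Return request headers carrying the API token.

    By default the token goes in the Authorization header as a Bearer token.
    Set use_api_key_header=True when a proxy in front of ArchiveBox strips
    or consumes the Authorization header.
    """
    headers = {"Accept": "application/json"}
    if use_api_key_header:
        headers["X-ArchiveBox-API-Key"] = token
    else:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def build_snapshots_request(base_url, token, limit=10, use_api_key_header=False):
    """Build (but do not send) a GET request for the snapshots list."""
    url = f"{base_url.rstrip('/')}/api/v1/core/snapshots?limit={int(limit)}"
    return urllib.request.Request(
        url,
        headers=auth_headers(token, use_api_key_header),
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen` performs the call against a running server.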

API Query Parameter Authentication

Warning

This method is sometimes known as "Capability URLs" because anyone in possession of the URL can perform API actions. It comes with important security caveats and is not recommended unless you fully understand the risks.

Pass api_key=YOURAPITOKENHERE as a GET/POST query parameter.

curl -X 'GET' \
  'http://127.0.0.1:8000/api/v1/core/snapshots?limit=10&api_key=YOURAPITOKENHERE' \
  -H 'accept: application/json'
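Because the token becomes part of the URL with this method, it can leak anywhere URLs are recorded, such as logs, shell history, or shared links. The sketch below shows one way to append the api_key parameter safely and to redact it again before logging. Both helper names (`with_api_key`, `redact_api_key`) are hypothetical and only the parameter name api_key comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url, token):
    """Return url with api_key=<token> set, replacing any existing api_key."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "api_key"
    ]
    query.append(("api_key", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def redact_api_key(url):
    """Return url with the api_key value masked, suitable for log output."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key == "api_key" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Logging only the redacted form keeps the convenience of a single URL while reducing the chance that the token ends up in a log file.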

API Session Cookie Authentication

Caution

We recommend sticking to header-based authentication and not using this method unless you deeply understand the CSRF/CORS security risks. This method is mostly useful when accessing the API from external apps where CSRF/CORS is not a concern (e.g. curl, mobile apps, other servers, etc.).

Browsers enforce that requests made to the ArchiveBox API from other origins will not include any session cookies by default. This is a foundational security principle of the web that protects you from API requests being initiated by JS on websites you don't control (aka CSRF/CORS attacks).

To allow incoming POST/PUT/DELETE requests from other domains that you trust, you must add them to CSRF_TRUSTED_ORIGINS in the archivebox/core/settings.py source code on your machine (open an issue and explain your use-case for help).
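As a rough illustration, CSRF_TRUSTED_ORIGINS is a standard Django setting that takes a list of origins. The hostnames below are placeholders, not real values, and where exactly the setting lives in your copy of settings.py is not shown here. On Django 4.0 and later each entry must include the scheme.

```python
# Fragment for archivebox/core/settings.py (hostnames are hypothetical examples).
# Django 4.0+ requires the scheme (https:// or http://) on every entry.
CSRF_TRUSTED_ORIGINS = [
    "https://dashboard.example.com",
    "https://*.internal.example.com",  # wildcard subdomain form supported by Django
]
```

Keep this list as short as possible, since every origin in it is allowed to make state-changing requests using a logged-in user's session.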

Log in via the Admin Web UI at /admin/login/. You can then re-use your login session ID (stored in the sessionid cookie) for REST API requests. By default, this only allows you to make requests from the same domain ArchiveBox is being served on (e.g. from browser devtools open on an ArchiveBox page or CLI tools).

curl -X 'GET' \
  'http://127.0.0.1:8000/api/v1/core/snapshots?limit=10' \
  -H 'accept: application/json' \
  -H 'Cookie: sessionid=YOURSESSIONIDVALUEHERE'

API HTTP Basic Authentication

Caution

This method is fairly uncommon and is only useful in a few niche situations where the other methods are not available.
We will likely remove this method in a future ArchiveBox release if nobody uses it.
If you rely on this method and want us to keep it, please open an issue and explain your use-case!

Pass your ArchiveBox admin username & password via HTTP Basic Authentication.

curl -X 'GET' \
  'http://127.0.0.1:8000/api/v1/core/snapshots?limit=10' \
  -u 'YOURUSERNAMEHERE:YOURPASSWORDHERE' \
  -H 'accept: application/json'
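For clients that cannot use curl's -u flag, the header it produces can be built by hand. HTTP Basic Authentication sends the username and password joined by a colon and base64-encoded. The sketch below shows that encoding. The helper name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header that HTTP Basic Authentication uses."""
    credentials = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


print(basic_auth_header("user", "pass"))
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Base64 is an encoding, not encryption, so this method sends the password in recoverable form on every request and should only be used over HTTPS.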
