This page documents how to deploy the JupyterHealth Exchange (JHE) to a Kubernetes cluster on AWS.
Create the Kubernetes Cluster¶
Define the cluster in a configuration file, `cluster.yml`. Specify values that are appropriate for your deployment.
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: jhe
  region: us-east-2
  version: '1.30'
vpc:
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  nat:
    gateway: Single
nodeGroups:
  - name: public-nodes
    instanceType: t2.micro
    desiredCapacity: 2
    privateNetworking: false
managedNodeGroups:
  - name: system-nodes
    instanceType: t2.small
    privateNetworking: true
    minSize: 1
    maxSize: 3
```
The configuration will be provided to `eksctl`, which requires the following environment variables to be set:

- `AWS_SECRET_ACCESS_KEY`
- `AWS_ACCESS_KEY_ID`
- `AWS_DEFAULT_REGION`
Create the cluster:

```shell
eksctl create cluster -f cluster.yml
```
Install Cluster Components¶
Install ingress-nginx¶
First, prepare parameters in `ingress-nginx.yaml`:

```yaml
controller:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
  config:
    use-forwarded-headers: "true"
```
Then run the following:

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx -f ingress-nginx.yaml
```
Install cert-manager¶
```shell
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.4 \
  --set crds.enabled=true \
  --wait
```
Create a Database¶
This is an example configuration for an Amazon RDS PostgreSQL instance. Use values appropriate for your deployment. The VPC-related values would come from identifiers created when the cluster was created.
Example RDS¶
AWS RDS Configuration
Parameter | Value |
---|---|
Creation method | Standard create |
Engine type | PostgreSQL |
Engine version | 16.3-R3 |
Templates | Dev/Test |
Availability and durability, deployment | Multi-AZ DB Instance |
DB instance identifier | jhe-db-staging-1 |
Credentials management | Self managed, not auto generated |
DB instance class | Burstable classes, db.t3.small |
Storage type | General Purpose SSD (gp2) |
Allocated storage | 100 GiB |
Enable storage autoscaling | yes |
Maximum storage threshold | 1000 GiB |
Compute resource | Don’t connect to an EC2 compute resource |
VPC | eksctl-jhe-cluster/VPC |
DB subnet group | create new db subnet group |
Public access | no |
VPC security group | choose existing |
Existing VPC security groups | default, eks-cluster-jhe-... |
Database authentication | password |
Enable Performance insights | yes |
Retention period | 7 days (free tier) |
AWS KMS key | (default) aws/rds |
Initial database name | jhe |
Note the attributes of the database, e.g.
Database Attributes
Parameter | Value |
---|---|
db identifier | database-1 |
endpoint | database-1...rds.amazonaws.com |
port | 5432 |
master username | postgres |
secret value | (your secret) |
rotation | 365d |
Test the Database¶
Launch a shell in the cluster.
```shell
$ kubectl run postgres-test -it --rm --image=postgres:16.3 -- bash
If you don't see a command prompt, try pressing enter.
root@postgres-test:/#
```
Use the database endpoint, username, and secret to connect to the database you created.
```shell
root@postgres-test:/# psql -h {endpoint} -U {master username} -d postgres
Password for user postgres:
psql (16.3 (Debian 16.3-1.pgdg120+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
postgres=>
```
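If the `postgres` image is unavailable, basic TCP reachability of the RDS endpoint can also be checked from any pod with Python. This is a minimal sketch; the `can_connect` helper is illustrative and not part of the deployment:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect("{endpoint}", 5432)
```

A `True` result only confirms network routing and security-group rules; credentials still need to be verified with the `psql` test above.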
Seed Data into the Database¶
Migrate Database¶
Create a Job to migrate the database using our existing ConfigMap.
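This Job (and the seed Job below) reads the application's settings from a ConfigMap named `jhe-config`, which must already exist in the `jhe` namespace. The exact keys depend on the jupyterhealth-exchange settings module, so the following is only a hypothetical sketch of how the database attributes noted earlier might be wired in; the database password should live in a Secret, not a ConfigMap:

```yaml
# Hypothetical sketch of jhe-config; actual key names depend on
# the jupyterhealth-exchange settings module.
apiVersion: v1
kind: ConfigMap
metadata:
  name: jhe-config
  namespace: jhe
data:
  DB_HOST: "{endpoint}"   # the RDS endpoint noted above
  DB_PORT: "5432"
  DB_NAME: jhe
  DB_USER: postgres
```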
```yaml
# job-manage-migrate.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: jhe-manage-migrate
  namespace: jhe
spec:
  template:
    metadata:
      name: jhe-manage-migrate
    spec:
      restartPolicy: Never
      containers:
        - name: jhe-manage-migrate
          image: ryanlovett/jupyterhealth-exchange:a30ad58
          command: ["python", "manage.py", "migrate"]
          envFrom:
            - configMapRef:
                name: jhe-config
```
Run the job.

```shell
kubectl apply -f job-manage-migrate.yml
```
Seed the Database¶
This requires the `seed.sql` file from the jupyterhealth-exchange repository and a new Python script, `jhe/scripts/seed.py`, to import it. `seed.py` is currently available in a pull request to jupyterhealth-exchange.

Ingest them as ConfigMaps by running the following commands from within the working directory of the jupyterhealth-exchange repository:

```shell
kubectl -n jhe create configmap db-seed-sql --from-file=db/seed.sql
kubectl -n jhe create configmap jhe-scripts-seed.py --from-file=jhe/scripts/seed.py
```
Create a Job to seed the database.
```yaml
# job-import-seed.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: import-seed
  namespace: jhe
spec:
  template:
    metadata:
      name: import-seed
    spec:
      containers:
        - name: import-seed
          image: ryanlovett/jupyterhealth-exchange:a30ad58
          command: ["python", "/app/seed.py"]
          envFrom:
            - configMapRef:
                name: jhe-config
          volumeMounts:
            - name: seed-sql
              mountPath: /app/seed.sql
              subPath: seed.sql
            - name: seed-py
              mountPath: /app/seed.py
              subPath: seed.py
      restartPolicy: Never
      volumes:
        - name: seed-sql
          configMap:
            name: db-seed-sql
        - name: seed-py
          configMap:
            name: jhe-scripts-seed.py
```
and run it:

```shell
kubectl apply -f job-import-seed.yml
```
Install the Application¶
Finally, install the application into the cluster. `jhe-example.yml` is provided as an example Kubernetes configuration, although you will need to substitute values appropriate for your deployment.

```shell
kubectl apply -f jhe-example.yml
```
Administering JHE¶
Log in to your JupyterHealth Exchange app at https://jhe.example.org/admin/. Under Django OAuth Toolkit, add an application:

a. Save the client id
b. Add space-separated redirect URIs for the hubs:
   - http://localhost:8000/auth/callback
   - https://jupyterhub.example.org/hub/oauth_callback
   - https://jupyterhub.example.org/services/smart/oauth_callback
   - https://jupyterhub.example.org/user-redirect/smart/oauth_callback
   - https://jhe.example.org/auth/callback
c. Client type: Public
d. Authorization grant type: Authorization code
e. Client secret: {client secret}
f. Hash client secret: yes
g. Skip authorization: yes
h. Algorithm: RSA with SHA-2 256
Authenticating JupyterHub with JHE¶
For users of JupyterHub to have access to JHE, the simplest approach is to use JHE as the OAuth provider for logging in to JupyterHub. Below is the configuration to log in to JupyterHub with JHE as the OAuth provider:
```yaml
hub:
  config:
    JupyterHub:
      # use Exchange as OAuth provider
      authenticator_class: generic-oauth
    GenericOAuthenticator:
      client_id: ${{ saved from JHE }}
      cookie_max_age_days: 1
      authorize_url: https://jhe.example.org/authorize/
      token_url: https://jhe.example.org/o/token/
      userdata_url: https://jhe.example.org/api/v1/users/profile
      username_claim: email
      login_service: JupyterHealth Exchange
      scope:
        - openid
      admin_users:
        - email@example.org
      enable_auth_state: true
      # grant specific users access by email
      allowed_users:
        - user-email@example.org
      # or allow all JHE users to access the Hub with:
      # allow_all: true
      # see other example for group-based access
  extraConfig:
    # add access tokens from auth state to user env
    auth_state_env.py: |
      def auth_state_env(spawner, auth_state):
          if not auth_state:
              spawner.log.warning(f"Missing auth state for user {spawner.user.name}")
              return
          spawner.environment["JHE_TOKEN"] = auth_state["access_token"]

      c.Spawner.auth_state_hook = auth_state_env
singleuser:
  extraEnv:
    JHE_URL: https://jhe.example.org
```
You have three choices for authorizing JHE users to access the Hub:

1. Allow any JHE user to use the Hub, in which case set:

   ```yaml
   GenericOAuthenticator:
     allow_all: true
   ```

2. Allow specific users by email address:

   ```yaml
   GenericOAuthenticator:
     allowed_users:
       - user@example.org
   ```

3. Allow based on organization membership in JHE, which requires a bit more configuration.
Authorizing the Hub via JHE organization¶
To authorize access to the Hub based on JHE organization membership, we need to connect JupyterHub groups with JHE organizations. This lets you manage access to the Hub in the JHE UI by adding/removing users to the authorized groups.
1. [In JHE] Create the organization(s) that you want to grant access to the Hub. Note the integer “organization id” of each organization (they probably look like 2000X).
2. [In JHE] Add users to these organizations.
3. Configure JupyterHub to populate group membership based on JHE organization membership:
```yaml
# hub-jhe-access-groups.yaml
hub:
  config:
    GenericOAuthenticator:
      # grant access based on JHE organization membership
      manage_groups: true
      auth_state_groups_key: "organizations"
      allowed_groups:
        # the integer id (in quotes) in JHE of organizations
        # to allow access to the Hub
        - "2XXXX"
  extraConfig:
    # get organization membership for managed groups:
    managed_organizations.py: |
      from urllib.parse import urlparse

      async def auth_state_hook(authenticator, auth_state):
          if not auth_state:
              return auth_state
          access_token = auth_state["access_token"]
          url = urlparse(authenticator.authorize_url)
          org_url = f"{url.scheme}://{url.netloc}/api/v1/users/organizations"
          organizations = await authenticator.httpfetch(
              org_url, headers={"Authorization": f"Bearer {access_token}"}
          )
          # use string ids for now
          auth_state["organizations"] = [str(org['id']) for org in organizations]
          return auth_state

      c.OAuthenticator.modify_auth_state_hook = auth_state_hook
```
Accessing JHE from the Hub¶
With the above configuration, two environment variables are set when a user starts their server:

```shell
$JHE_URL    # the URL of the Exchange
$JHE_TOKEN  # the user's access token for the Exchange
```
You can use these to make API requests to the Exchange.
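For example, a small helper using only the standard library can call the Exchange API with these variables. The `/users/profile` path is the `userdata_url` endpoint configured above; `build_jhe_request` and `jhe_get` are illustrative names, not part of any JHE client library:

```python
import json
import os
import urllib.request

def build_jhe_request(path: str) -> urllib.request.Request:
    """Build an authenticated request to the Exchange API using
    the $JHE_URL and $JHE_TOKEN variables set by the Hub."""
    return urllib.request.Request(
        f"{os.environ['JHE_URL']}/api/v1{path}",
        headers={"Authorization": f"Bearer {os.environ['JHE_TOKEN']}"},
    )

def jhe_get(path: str) -> dict:
    """GET an Exchange API endpoint and decode the JSON response."""
    with urllib.request.urlopen(build_jhe_request(path)) as resp:
        return json.load(resp)

# e.g. profile = jhe_get("/users/profile")
```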
There is also the `jupyterhealth-client` Python package:

```shell
pip install --pre jupyterhealth-client
```

You can then use the `JupyterHealthClient` class to fetch patient data.