
Example: Deploying JupyterHealth Exchange on Kubernetes

This page documents how to deploy the jupyterhealth-exchange application onto a Kubernetes cluster running on AWS. In this case, the associated JupyterHub also ran on AWS, but in a different AWS account.

Create the Kubernetes Cluster

Define the cluster in a configuration file, cluster.yml. Specify values that are appropriate for your deployment.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: jhe
  region: us-east-2
  version: '1.30'

vpc:
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  nat:
    gateway: Single

nodeGroups:
  - name: public-nodes
    instanceType: t2.micro
    desiredCapacity: 2
    privateNetworking: false

managedNodeGroups:
  - name: system-nodes
    instanceType: t2.small
    privateNetworking: true
    minSize: 1
    maxSize: 3

The configuration is passed to eksctl, which in this case had access to the following environment variables:

Create the cluster:

eksctl create cluster -f cluster.yml

Install Cluster Components

Install ingress-nginx

First, prepare parameters in ingress-nginx.yml:

controller:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
  config:
    use-forwarded-headers: "true"

Then run the following:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx -f ingress-nginx.yml
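Once AWS has provisioned the NLB, its DNS name appears in the controller Service's status, and that is the name a DNS record for your JHE hostname (e.g. jhe.example.org) would point at. A minimal sketch for reading it, assuming kubectl is configured for the cluster and that the controller Service is named nginx-ingress-ingress-nginx-controller (the chart's usual naming for a release called nginx-ingress in the default namespace); the helper names are hypothetical:

```python
import json
import subprocess
from typing import Optional


def load_balancer_hostname(service: dict) -> Optional[str]:
    """Return the load balancer hostname from a Service object, or None if not assigned yet."""
    # AWS load balancers are reported by hostname under status.loadBalancer.ingress
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    for entry in ingress:
        if "hostname" in entry:
            return entry["hostname"]
    return None


def fetch_service(name: str, namespace: str = "default") -> dict:
    """Fetch a Service as JSON via `kubectl get service <name> -o json`."""
    out = subprocess.run(
        ["kubectl", "-n", namespace, "get", "service", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For example, load_balancer_hostname(fetch_service("nginx-ingress-ingress-nginx-controller")) returns the NLB hostname, or None while AWS is still provisioning it.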

Install cert-manager

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.4 \
  --set crds.enabled=true \
  --wait

Create a Database

This is an example configuration for an Amazon RDS PostgreSQL instance; use values appropriate for your deployment. The VPC-related values come from the resources eksctl created along with the cluster.

Example RDS

AWS RDS Configuration

| Parameter | Value |
| --- | --- |
| Creation method | Standard create |
| Engine type | PostgreSQL |
| Engine version | 16.3-R3 |
| Templates | Dev/Test |
| Availability and durability: deployment | Multi-AZ DB Instance |
| DB instance identifier | jhe-db-staging-1 |
| Credentials management | Self managed, not auto generated |
| DB instance class | Burstable classes, db.t3.small |
| Storage type | General Purpose SSD (gp2) |
| Allocated storage | 100 GiB |
| Enable storage autoscaling | Yes |
| Maximum storage threshold | 1000 GiB |
| Compute resource | Don’t connect to an EC2 compute resource |
| VPC | eksctl-jhe-cluster/VPC |
| DB subnet group | Create new DB subnet group |
| Public access | No |
| VPC security group | Choose existing |
| Existing VPC security groups | default, eks-cluster-jhe-... |
| Database authentication | Password |
| Enable Performance Insights | Yes |
| Retention period | 7 days (free tier) |
| AWS KMS key | (default) aws/rds |
| Initial database name | jhe |

Note the attributes of the database, for example:

Database Attributes

| Parameter | Value |
| --- | --- |
| DB identifier | database-1 |
| Endpoint | database-1...rds.amazonaws.com |
| Port | 5432 |
| Master username | postgres |
| Secret value | (your secret) |
| Rotation | 365d |
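These attributes are what the application's database settings are built from. How JHE expects them in jhe-config (a single URL or separate variables) is not covered here, so the helper below is a hypothetical sketch that assembles a standard PostgreSQL connection URL, defaulting to the jhe initial database from the RDS configuration:

```python
from urllib.parse import quote


def postgres_url(host: str, password: str, user: str = "postgres",
                 port: int = 5432, dbname: str = "jhe") -> str:
    """Assemble a PostgreSQL connection URL from the database attributes noted above."""
    # Percent-encode the credentials so characters such as '@', ':' or '/'
    # in a generated password do not break the URL structure
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )
```

Encoding the password matters because RDS passwords may contain characters that are reserved in URLs.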

Test the Database

Launch a shell in the cluster.

$ kubectl run postgres-test -it --rm --image=postgres:16.3 -- bash
If you don't see a command prompt, try pressing enter.
root@postgres-test:/#

Use the database endpoint, username, and secret to connect to the database you created.

root@postgres-test:/# psql -h {endpoint} -U {master username} -d postgres
Password for user postgres:
psql (16.3 (Debian 16.3-1.pgdg120+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=>
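If psql hangs instead of prompting for a password, the database is usually not reachable from the cluster network (for example, the chosen security groups do not admit traffic from the nodes). A minimal TCP reachability check, assuming you run it from a pod that has Python available (such as one started from a python image rather than the postgres image above); the function name is hypothetical:

```python
import socket


def port_is_reachable(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts (socket.timeout is an OSError)
        return False
```

A False result points at networking (security groups, subnets) rather than at credentials, since no authentication has happened yet.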

Seed Data into the Database

Migrate Database

Create a Job that runs the Django migrations, taking its environment from the existing jhe-config ConfigMap in the jhe namespace.

# job-manage-migrate.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: jhe-manage-migrate
  namespace: jhe
spec:
  template:
    metadata:
      name: jhe-manage-migrate
    spec:
      restartPolicy: Never
      containers:
      - name: jhe-manage-migrate
        image: ryanlovett/jupyterhealth-exchange:a30ad58
        command: ["python", "manage.py", "migrate"]
        envFrom:
        - configMapRef:
            name: jhe-config

Run the job.

kubectl apply -f job-manage-migrate.yml
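kubectl apply returns as soon as the Job object is created, not when the migration finishes. A minimal sketch for checking whether a Job has finished, assuming kubectl is configured for the cluster; the helper names are hypothetical. It reads the Complete and Failed conditions that Kubernetes sets on finished Jobs:

```python
import json
import subprocess


def job_state(job: dict) -> str:
    """Classify a Job object (as returned by `kubectl get job -o json`)."""
    # Finished Jobs carry a "Complete" or "Failed" condition whose status is "True"
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"].lower()
    # No terminal condition yet: the Job is still pending or running
    return "running"


def get_job(name: str, namespace: str = "jhe") -> dict:
    """Fetch a Job as JSON via `kubectl get job <name> -o json`."""
    out = subprocess.run(
        ["kubectl", "-n", namespace, "get", "job", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

job_state(get_job("jhe-manage-migrate")) returns "complete" once the migrations have run; the same check applies to the import-seed Job. If it reports "failed", kubectl -n jhe logs job/jhe-manage-migrate shows the container output.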

Seed the Database

This requires the db/seed.sql file from the jupyterhealth-exchange repository and a new Python script, jhe/scripts/seed.py, to import it. seed.py is currently available in a pull request to jupyterhealth-exchange.

Ingest them as ConfigMaps by running the following commands from the root of your jupyterhealth-exchange checkout:

kubectl -n jhe create configmap db-seed-sql --from-file=db/seed.sql
kubectl -n jhe create configmap jhe-scripts-seed.py --from-file=jhe/scripts/seed.py
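Kubernetes limits the data stored in a single ConfigMap to 1 MiB, so a large seed.sql would be rejected by kubectl create configmap. A quick pre-check, sketched under the assumption that each file's basename becomes its key (as --from-file does by default); the helper names are hypothetical and the count is approximate:

```python
from pathlib import Path

# Kubernetes caps the data stored in a single ConfigMap at 1 MiB
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def configmap_payload_bytes(path: str) -> int:
    """Approximate data size of a ConfigMap built with `--from-file=<path>`.

    The key is the file's basename, so count the key plus the file contents.
    """
    p = Path(path)
    return len(p.name.encode()) + p.stat().st_size


def fits_in_configmap(path: str) -> bool:
    """Return True if the file should fit in a single ConfigMap."""
    return configmap_payload_bytes(path) <= CONFIGMAP_LIMIT_BYTES
```

For example, fits_in_configmap("db/seed.sql") can be checked before running the commands above.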

Create a Job to seed the database.

# job-import-seed.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: import-seed
  namespace: jhe
spec:
  template:
    metadata:
      name: import-seed
    spec:
      containers:
      - name: import-seed
        image: ryanlovett/jupyterhealth-exchange:a30ad58
        command: ["python", "/app/seed.py"]
        envFrom:
        - configMapRef:
            name: jhe-config
        volumeMounts:
        - name: seed-sql
          mountPath: /app/seed.sql
          subPath: seed.sql
        - name: seed-py
          mountPath: /app/seed.py
          subPath: seed.py
      restartPolicy: Never
      volumes:
      - name: seed-sql
        configMap:
          name: db-seed-sql
      - name: seed-py
        configMap:
          name: jhe-scripts-seed.py

Then run it:

kubectl apply -f job-import-seed.yml

Install the Application

Finally, install the application into the cluster. jhe-example.yml is provided as an example Kubernetes configuration; substitute values appropriate for your deployment.

kubectl apply -f jhe-example.yml

Administering JHE

  1. Log in to the admin site of your JupyterHealth Exchange app, e.g. https://jhe.example.org/admin/

  2. Under Django OAuth Toolkit, add an application:

    a. Client id: save this value; JupyterHub uses it as client_id

    b. Redirect uris: the space-separated OAuth callback URLs of your hubs

    c. Client type: Public

    d. Authorization grant type: Authorization code

    e. Client secret: {client secret}

    f. Hash client secret: yes

    g. Skip authorization: yes

    h. Algorithm: RSA with SHA-2 256
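With OAuthenticator's default settings, a hub's OAuth callback URL is its public URL followed by /hub/oauth_callback. A small sketch for assembling the Redirect uris value, assuming each hub is served at the root of its domain and uses the default callback; the hub hostnames are hypothetical:

```python
def redirect_uris(hub_urls: list) -> str:
    """Build the space-separated Redirect uris value for the JHE OAuth application."""
    # Default OAuthenticator callback path under each Hub's public URL
    return " ".join(f"{url.rstrip('/')}/hub/oauth_callback" for url in hub_urls)


print(redirect_uris(["https://hub-a.example.org/", "https://hub-b.example.org"]))
```

If a hub sets a custom oauth_callback_url, register that URL instead.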

Authenticating JupyterHub with JHE

The simplest way to give JupyterHub users access to JHE is to use JHE as the OAuth provider for logging in to JupyterHub. Below are the Zero to JupyterHub Helm chart values for logging in to JupyterHub with JHE as the OAuth provider:

hub-jhe-auth.yaml
hub:
  config:
    JupyterHub:
      # first chunk: use the Exchange as the OAuth provider
      authenticator_class: generic-oauth
      cookie_max_age_days: 1
    GenericOAuthenticator:
      client_id: ${{ saved from JHE }}
      authorize_url: https://jhe.example.org/authorize/
      token_url: https://jhe.example.org/o/token/
      userdata_url: https://jhe.example.org/api/v1/users/profile
      username_claim: email
      login_service: JupyterHealth Exchange
      scope:
        - openid
      admin_users:
        - email@example.org
      enable_auth_state: true
      # grant specific users access by email
      allowed_users:
        - user-email@example.org
      # or allow all JHE users to access the Hub with:
      # allow_all: true
      # see the organization-based example below for group-based access
  extraConfig:
    # add access tokens from auth state to user env
    auth_state_env.py: |
      def auth_state_env(spawner, auth_state):
          if not auth_state:
              spawner.log.warning(f"Missing auth state for user {spawner.user.name}")
              return
          spawner.environment["JHE_TOKEN"] = auth_state["access_token"]

      c.Spawner.auth_state_hook = auth_state_env
singleuser:
  extraEnv:
    JHE_URL: https://jhe.example.org

You have three choices for authorizing JHE users to access the Hub:

  1. Allow any JHE user to use the Hub. In that case, set:

    GenericOAuthenticator:
      allow_all: true
  2. Allow specific users by email address:

    GenericOAuthenticator:
      allowed_users:
        - user@example.org
  3. Allow access based on organization membership in JHE, which requires a bit more configuration.

Authorizing the Hub via JHE organization

To authorize access to the Hub based on JHE organization membership, we need to connect JupyterHub groups with JHE organizations. This lets you manage access to the Hub in the JHE UI by adding/removing users to the authorized groups.

  1. [In JHE] Create the organization(s) that you want to grant access to the Hub. Note the integer “organization id” of each organization (they probably look like 2000X).

  2. [In JHE] Add users to these organizations.

  3. Configure JupyterHub to populate group membership based on JHE organization membership:

    hub-jhe-access-groups.yaml
    hub:
      config:
        GenericOAuthenticator:
          # grant access based on JHE organization membership
          manage_groups: true
          auth_state_groups_key: "organizations"
          allowed_groups:
            # the integer id (in quotes) in JHE of organizations to allow access to the Hub
            - "2XXXX"
      extraConfig:
        # get organization membership for managed groups:
        managed_organizations.py: |
          from urllib.parse import urlparse

          async def auth_state_hook(authenticator, auth_state):
              if not auth_state:
                  return auth_state
              access_token = auth_state["access_token"]
              url = urlparse(authenticator.authorize_url)
              org_url = f"{url.scheme}://{url.netloc}/api/v1/users/organizations"
              organizations = await authenticator.httpfetch(
                  org_url,
                  headers={"Authorization": f"Bearer {access_token}"}
              )
              # use string ids for now
              auth_state["organizations"] = [str(org['id']) for org in organizations]
              return auth_state

          c.OAuthenticator.modify_auth_state_hook = auth_state_hook
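The organization ids are converted to strings, and quoted in allowed_groups, because JupyterHub group names are strings: an integer id would never match a quoted entry. A simplified sketch of that matching, using a hypothetical organization record (the hook only relies on its id field); the real check lives in OAuthenticator and may differ in detail:

```python
def organization_groups(organizations: list) -> list:
    """Mirror the hook: turn JHE organization records into group names."""
    # Group names are strings, so the integer ids are converted
    return [str(org["id"]) for org in organizations]


def is_allowed(user_groups: list, allowed_groups: set) -> bool:
    """Simplified allowed_groups check: any overlap grants access."""
    return bool(set(user_groups) & allowed_groups)


groups = organization_groups([{"id": 20001, "name": "Example Org"}])
print(groups, is_allowed(groups, {"20001"}))
```

Without the str() conversion, the same user would have the group 20001 (an integer) and would not match the allowed entry "20001".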
    

Accessing JHE from the Hub

With the above configuration, two environment variables are set when a user starts their server:

$JHE_URL  # the URL of the Exchange
$JHE_TOKEN  # the user's access token for the Exchange

You can use these to make API requests to the Exchange. There is also the jupyterhealth-client package, which you can add to your user image:

pip install --pre jupyterhealth-client

And then you can use the JupyterHealthClient class to fetch patient data.
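For direct API calls without the client package, a minimal standard-library sketch follows. It assumes only the two environment variables above and the /api/v1/users/profile endpoint that the Hub configuration already uses as userdata_url; the helper names are hypothetical:

```python
import json
import os
import urllib.request


def jhe_request(path: str) -> urllib.request.Request:
    """Build an authenticated request against the Exchange from the Hub-provided env vars."""
    base = os.environ["JHE_URL"].rstrip("/")
    token = os.environ["JHE_TOKEN"]
    return urllib.request.Request(
        f"{base}{path}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def jhe_get(path: str):
    """GET a JSON endpoint on the Exchange and return the decoded body."""
    with urllib.request.urlopen(jhe_request(path)) as resp:
        return json.load(resp)
```

Inside a user server, profile = jhe_get("/api/v1/users/profile") would return the current user's profile as a dict.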