Fusion with Google GKE and Google object storage
Fusion streamlines the deployment of Nextflow pipelines in a Kubernetes cluster, because it removes the need to configure and maintain a shared file system in the cluster.
This feature requires Nextflow 23.02.1-edge or later.
Cluster preparation
- Create a GKE "standard" cluster ("Autopilot" is not supported yet). See the Google documentation for details.
- Make sure to use instance types with 2 or more CPUs that provide SSD instance storage (families: n1, n2, c2, m1, m2, m3); see the example gcloud command after this list.
- Make sure to enable the Workload Identity feature when creating (or updating) the cluster:
  - "Enable Workload Identity" in the cluster "Security" settings
  - "Enable GKE Metadata Server" in the node group "Security" settings
- Configure the cluster to authenticate workloads using Workload Identity. See the Google documentation for details; example commands are shown after this list.
- The following values were used in this example (replace them with values corresponding to your environment):
  - CLUSTER_NAME: the GKE cluster name, e.g. cluster-1
  - COMPUTE_REGION: the GKE cluster region, e.g. europe-west1
  - NAMESPACE: the GKE namespace, e.g. fusion-demo
  - KSA_NAME: the Kubernetes service account name, e.g. fusion-sa
  - GSA_NAME: the Google service account name, e.g. gsa-demo
  - GSA_PROJECT: the Google project ID, e.g. my-nf-project-261815
  - PROJECT_ID: the Google project ID, e.g. my-nf-project-261815
  - ROLE_NAME: the role to grant access permission to the Google Storage bucket, e.g. roles/storage.admin
- Create the K8s role and role binding required to run Nextflow by applying the Kubernetes configuration shown below:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: fusion-demo
  name: fusion-role
rules:
- apiGroups: [""]
  resources: ["pods", "pods/status", "pods/log", "pods/exec"]
  verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: fusion-demo
  name: fusion-rolebind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fusion-role
subjects:
- kind: ServiceAccount
  name: fusion-sa
---
apiVersion: v1
kind: Secret
metadata:
  namespace: fusion-demo
  name: fusion-sa-token
  annotations:
    kubernetes.io/service-account.name: fusion-sa
type: kubernetes.io/service-account-token
...
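For reference, a cluster matching these requirements can be created with the gcloud CLI along the following lines. This is a minimal sketch using the example values above; the machine type, node count, and local SSD settings are illustrative assumptions, so check the gcloud reference for the options that match your environment:

# Sketch only: GKE "standard" cluster with Workload Identity enabled
# and local SSD instance storage on the nodes (illustrative values)
gcloud container clusters create cluster-1 \
    --project my-nf-project-261815 \
    --region europe-west1 \
    --machine-type n2-standard-4 \
    --num-nodes 1 \
    --local-ssd-count 1 \
    --workload-pool my-nf-project-261815.svc.id.goog \
    --workload-metadata GKE_METADATA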
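The Workload Identity binding between the Kubernetes service account and the Google service account can then be set up following the Google documentation. The commands below are a sketch using the example values above (KSA_NAME, GSA_NAME, GSA_PROJECT, PROJECT_ID and ROLE_NAME):

# Create the namespace and the Kubernetes service account (KSA)
kubectl create namespace fusion-demo
kubectl create serviceaccount fusion-sa --namespace fusion-demo

# Create the Google service account (GSA) and grant it access to Google Storage
gcloud iam service-accounts create gsa-demo --project my-nf-project-261815
gcloud projects add-iam-policy-binding my-nf-project-261815 \
    --member "serviceAccount:gsa-demo@my-nf-project-261815.iam.gserviceaccount.com" \
    --role "roles/storage.admin"

# Allow the KSA to impersonate the GSA via Workload Identity
gcloud iam service-accounts add-iam-policy-binding \
    gsa-demo@my-nf-project-261815.iam.gserviceaccount.com \
    --project my-nf-project-261815 \
    --role "roles/iam.workloadIdentityUser" \
    --member "serviceAccount:my-nf-project-261815.svc.id.goog[fusion-demo/fusion-sa]"

# Annotate the KSA with the GSA email so GKE links the two identities
kubectl annotate serviceaccount fusion-sa --namespace fusion-demo \
    iam.gke.io/gcp-service-account=gsa-demo@my-nf-project-261815.iam.gserviceaccount.com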
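Finally, assuming the manifest shown earlier is saved to a file named fusion-rbac.yaml (an arbitrary name), it can be applied with:

kubectl apply -f fusion-rbac.yaml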
Nextflow configuration
The minimal Nextflow configuration looks like the following:
wave.enabled = true
fusion.enabled = true
process.executor = 'k8s'
process.scratch = false
k8s.context = '<YOUR-GKE-CLUSTER-CONTEXT>'
k8s.namespace = 'fusion-demo'
k8s.serviceAccount = 'fusion-sa'
k8s.pod.nodeSelector = 'iam.gke.io/gke-metadata-server-enabled=true'
In the above snippet, replace <YOUR-GKE-CLUSTER-CONTEXT> with the name of the context in your Kubernetes configuration, then save it to a file named nextflow.config in the pipeline launch directory.
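If you are unsure of the context name, one way to find it (assuming the example cluster name and region used above) is to fetch the cluster credentials with gcloud, which adds the context to your Kubernetes configuration, and then print it:

gcloud container clusters get-credentials cluster-1 --region europe-west1
kubectl config current-context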
Then launch the pipeline execution with the usual run command:
nextflow run <YOUR PIPELINE SCRIPT> -w gs://<YOUR-BUCKET>/work
Make sure to specify a Google Storage bucket to which you have read-write access as the work directory.
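For example, using the nextflow-io/rnaseq-nf demo pipeline and a hypothetical bucket named my-fusion-bucket:

nextflow run nextflow-io/rnaseq-nf -w gs://my-fusion-bucket/work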