Running a Container for Data Science in a Kubernetes Cluster

This guide shows how to run a container with pre-installed machine learning tools in your Kubernetes cluster and start the Jupyter Notebook service in it.

Below is the list of packages in a container:

Containers can be used for training and model inference when developing apps and working with data.

Creating a Cluster and Getting Started

Follow these steps to create a cluster and get started with it in the Control panel:

  1. Go to the Kubernetes section.

  2. Create a cluster.

  3. Choose a configuration of at least 4 vCPUs, 8 GB of RAM, and a 20 GB SSD so that the container runs comfortably.

    Note that the actual requirements depend on the scripts you run.

  4. Wait for the cluster status to switch to Active.

  5. Select the created cluster and go to the Settings tab.

  6. Click Download kubeconfig and save the YAML configuration file.

  7. Name the file, for example, my-kube.yaml.

Managing Clusters through the Console Client

Install kubectl, the Kubernetes console client.

In the console, export the path to the my-kube.yaml file downloaded in step 6 to an environment variable:

export KUBECONFIG=my-kube.yaml
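A relative path in KUBECONFIG stops resolving once you change directories. The following is a minimal sketch, not part of the original steps, of a helper that checks that the file exists and exports its absolute path. The function name use_kubeconfig is hypothetical.

```shell
# Hypothetical helper: export KUBECONFIG as an absolute path.
# Usage: use_kubeconfig my-kube.yaml
use_kubeconfig() {
  config_file=$1
  # Fail early if the file is missing instead of letting kubectl
  # fall back to another configuration.
  if [ ! -f "$config_file" ]; then
    echo "kubeconfig not found: $config_file" >&2
    return 1
  fi
  # Resolve the directory to an absolute path so that kubectl
  # still finds the file after a later `cd`.
  KUBECONFIG="$(cd "$(dirname "$config_file")" && pwd)/$(basename "$config_file")"
  export KUBECONFIG
}
```

After defining the function in your shell, call use_kubeconfig my-kube.yaml from the directory where you saved the file.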

To verify that the configuration works, run:

kubectl get nodes

If the configuration was successful, the output will be similar to the following:

NAME        STATUS    ROLES     AGE    VERSION
your-node   Ready     <none>    2m01s  v1.17.6
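If you want to check the node status from a script rather than by eye, the table can be filtered with awk. This is a rough sketch that assumes the column layout shown above. The function name count_not_ready is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and
# print how many nodes have a STATUS other than exactly "Ready".
# Combined statuses such as "Ready,SchedulingDisabled" are counted
# as not ready by this simple comparison.
count_not_ready() {
  awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}
```

A possible use is kubectl get nodes | count_not_ready, which prints 0 when every listed node reports Ready.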

Running a Container

Follow these steps to run a container:

  1. Download the YAML deployment configuration file.
  2. Run the following:
    kubectl apply -f selectel-ml.yaml
    
  3. If the command is accepted, the output is:
    deployment.apps/selectel-ml created
    
  4. Check the container status by running the following:
    kubectl get pod -w
    
  5. The ContainerCreating status will be displayed first:
    NAME         READY     STATUS              RESTARTS    AGE
    selectel-ml  0/1       ContainerCreating   0           10s
    
    Press Ctrl+C to stop watching.
  6. Wait for the status to change to Running, which means that the container has been created and started. This may take several minutes:
    selectel-ml  1/1       Running
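
As an alternative to watching the pod list by hand, kubectl can block until the deployment has finished rolling out. The sketch below assumes the deployment is named selectel-ml, as in the output above. The function name wait_for_deployment and the default timeout of 600s are illustrative choices, not values from this guide.

```shell
# Hypothetical helper: wait until a deployment is fully rolled out.
# $1: deployment name (assumed default: selectel-ml)
# $2: timeout accepted by kubectl, e.g. 600s (illustrative default)
wait_for_deployment() {
  deployment=${1:-selectel-ml}
  timeout=${2:-600s}
  # `kubectl rollout status` exits with a non-zero code if the
  # rollout does not complete within the timeout.
  kubectl rollout status "deployment/$deployment" --timeout="$timeout"
}
```

Calling wait_for_deployment with no arguments waits for selectel-ml; the exit code can then be checked in a script before exposing the service.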
    

Running the Jupyter Notebook

Follow these steps to run the Jupyter Notebook:

  1. Open the port to access the service:

    kubectl expose deployment selectel-ml --type=LoadBalancer --name=my-service
    
  2. If the command is accepted, the output is:

    service/my-service exposed
    
  3. To get the IP address and port for connecting to the Jupyter server, run the following:

    kubectl get services
    
  4. The output should be similar to the following:

    NAME       TYPE          CLUSTER-IP      EXTERNAL-IP     PORT(S)         AGE
    my-service LoadBalancer  10.100.90.86    203.0.113.1     8888:31779/TCP  30s
    
  5. In the browser address bar, enter the IP address from the EXTERNAL-IP column and the first port number from the PORT(S) column, for example: 203.0.113.1:8888. If the EXTERNAL-IP column shows <pending>, wait a few minutes and run kubectl get services again.

  6. In the Jupyter Notebook web interface that opens, enter the default password: 9lG0eXCevt.

Learn more about changing your password in the Jupyter Notebook documentation.
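
The address for the browser can also be assembled from the kubectl get services output in a script. The following is a minimal sketch that assumes the column layout shown above and the service name my-service. The function name jupyter_address is hypothetical.

```shell
# Hypothetical helper: read `kubectl get services` output on stdin and
# print EXTERNAL-IP:PORT for the given service.
# Prints nothing while the external IP is still <pending>.
# $1: service name, e.g. my-service
jupyter_address() {
  awk -v svc="$1" '
    $1 == svc && $4 != "<pending>" {
      # PORT(S) looks like 8888:31779/TCP; the part before the
      # colon is the port served on the external IP.
      split($5, ports, ":")
      print $4 ":" ports[1]
    }'
}
```

A possible use is kubectl get services | jupyter_address my-service. An empty result means that the load balancer has not been assigned an address yet, so the command should be repeated later.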

Managing Containers through the Console

To manage containers through the console, run the following:

kubectl exec -it [pod name] -- /bin/bash
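
The pod name can be looked up automatically instead of being copied from kubectl get pod. This is a rough sketch that assumes the current namespace contains only the pod created in this guide, since it simply takes the first pod in the list. The function name open_pod_shell is hypothetical.

```shell
# Hypothetical helper: open a Bash session in the first pod of the
# current namespace (assumed to be the data science container).
open_pod_shell() {
  # Ask kubectl for the name of the first pod in the list.
  pod_name=$(kubectl get pods -o jsonpath='{.items[0].metadata.name}') || return 1
  if [ -z "$pod_name" ]; then
    echo "no pods found in the current namespace" >&2
    return 1
  fi
  # Everything after `--` is passed to the container as the command.
  kubectl exec -it "$pod_name" -- /bin/bash
}
```

If the namespace holds several pods, pass the pod name to kubectl exec explicitly rather than relying on this helper.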