Configuring Jobs & Services

Three types of tasks are deployed and used on the DataTask platform: jobs (including cronjobs), deployments, and services. They are configured and built from a manifest.json file in the task's Git repository.

Jobs

Jobs are represented by Kubernetes Job objects. A job is a batch process that runs to completion.

Cronjobs

CronJobs use Job objects to complete their tasks. A CronJob creates a Job object about once per execution time of its schedule.

Deployments

Deployments are Kubernetes objects that help ensure that one or more instances of your application are available to serve user requests. The Deployment controller holds the specification of the application.

Services

A service in the DataTask environment represents both a Kubernetes Service and a Deployment. A Kubernetes Service is an object that defines a logical set of application pods and a policy by which to access them.

Manifest file (manifest.json)

The manifest.json file defines the nature and the structure of what is to be deployed. Its components are:

  • labels: String of the form key:value (for example "app:gstogs"). Represents the label of the Kubernetes deployment and acts as the Kubernetes service selector.

  • type: String. Describes the nature of your deployment: job, deployment, or service.

  • published: Dict, for a service. Configures an exposed service. To create a non-exposed service, leave the published key empty. If set, it contains the keys:

    • access: String. The group to which you wish to give access to your application.

    • prefix: String. The URL prefix under which your application is served.

    • rewrite: String, not required. Rewrites the URL before handing it off to the service.

    • timeout_ms: Int, not required, default : 6000. The timeout, in milliseconds, for requests through this service.

    • sticky_sessions: Boolean, not required, default : false. If enabled and your service has several replicas, a given user is always routed to the same pod.

    • bypass_auth: Boolean, not required, default : false. If you set this option to true, your service will be deployed with public access.

    • displayed_name: String, not required, default : "real name of the service". The name of the service as displayed on the portal.

    • icon: String, not required, default : "import_contacts". The name of the "Material icon" to use, or the http(s) link to a custom icon. For a Material icon, all available names are listed on this site : link.

    • visible: Boolean, not required, default : true. Whether the deployed service is visible on the portal.

  • name: String. Represents the name of your deployment.

  • image-source: String. Docker registry image. If set, this existing image is used directly for your application; image-destination is then left empty.

  • image-destination: String. Docker registry image. If set, this value is the tag given to the image produced by the docker build, whose build context is the code directory located alongside the manifest file; image-source is then left empty.

  • image-pull-policy: String, not required, default : "Always". Possible values : "Always", "IfNotPresent", "Never".

  • env: List of dicts with the keys name and value, both of which must be Strings. Defines the environment variables of your application.

  • svc-account-secret: String, not required, default : "". Represents the service account name that will be used. When you deploy a task with a service account stored in a secret, that secret must be in the same namespace as the task. You can find more information on how to create a secret here.

  • cmd: List of Strings. Overrides or defines the command of your image (image-source or image-destination).

  • cpu-limit: String. The maximum number of CPU cores your application may use.

  • memory-limit: String. The maximum amount of memory, in units of bytes, your application may use.

  • cpu-request: String. The requested number of CPU cores your application starts with.

  • memory-request: String. The requested amount of memory, in units of bytes, your application starts with.

  • schedule: String. Describes the recurring, time-based schedule on which the task is executed. If set, the job is deployed as a cronjob rather than a plain job.

  • replicas: Int, for a service or deployment. Defines the number of copies (replicas) of your deployment.

  • name-port: String, for a service. Defines the name of the port.

  • container-port: Int, for a service. Represents the port of your application.

  • volumes: Array. Represents the volume mount configuration. More information.

  • tolerations: Array. Represents the Taints and Tolerations used to configure node affinity. More information.

  • node-selector: Dict. Used to assign pods to nodes. More information.
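
As an illustration of the shapes described above for labels and env, here is a minimal Python sketch. The helper names parse_label and env_to_dict are hypothetical, and the "key:value" reading of labels is an assumption based on the sample manifests; this is not the platform's own parser.

```python
from typing import Dict, List, Tuple


def parse_label(label: str) -> Tuple[str, str]:
    """Split a "key:value" label string such as "app:gstogs" (assumed format)."""
    key, sep, value = label.partition(":")
    if not sep or not key or not value:
        raise ValueError(f"expected 'key:value', got {label!r}")
    return key, value


def env_to_dict(env: List[Dict[str, str]]) -> Dict[str, str]:
    """Turn the manifest's list of {"name", "value"} dicts into a plain dict."""
    result = {}
    for entry in env:
        name, value = entry["name"], entry["value"]
        # The field description requires both name and value to be strings.
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError(f"env name and value must be strings: {entry!r}")
        result[name] = value
    return result


print(parse_label("app:gstogs"))
print(env_to_dict([{"name": "logger_name", "value": "gs_to_gs"}]))
```

Note that a numeric env value such as 5000 would be rejected here; under the rule above it would have to be written as "5000".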

FYI:

You do not need all of these items in your JSON file; select and fill in only the ones you are interested in. Which fields are mandatory depends on what you want to build, but the following always appear in manifest.json files:

  • type

  • name

  • labels

  • either image-source or image-destination
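
The list of always-present fields can be checked before committing a manifest. The following is a minimal sketch, not the platform's validator: the function name check_manifest is hypothetical, and treating an empty string as "not set" is an assumption drawn from the sample manifests, which carry both image keys with one of them empty.

```python
import json

ALWAYS_PRESENT = ("type", "name", "labels")
KNOWN_TYPES = ("job", "deployment", "service")


def check_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    for key in ALWAYS_PRESENT:
        if not manifest.get(key):
            problems.append(f"missing or empty field: {key}")
    if manifest.get("type") and manifest["type"] not in KNOWN_TYPES:
        problems.append(f"unknown type: {manifest['type']}")
    # Exactly one of the two image fields should carry a value.
    has_source = bool(manifest.get("image-source"))
    has_destination = bool(manifest.get("image-destination"))
    if has_source == has_destination:
        problems.append("set exactly one of image-source or image-destination")
    return problems


manifest = json.loads("""
{
    "labels": "app:gstogs",
    "type": "job",
    "name": "gstogs",
    "image-source": "",
    "image-destination": "gcr.io/affini-tech-datalab/gs_to_gs:0.1"
}
""")
print(check_manifest(manifest))
```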

Manifest job configuration

The following represents a Job to be deployed.

{
    "labels": "app:gstogs",
    "type": "job",
    "name": "gstogs",
    "image-source": "",
    "image-destination": "gcr.io/affini-tech-datalab/gs_to_gs:0.1",
    "image-pull-policy": "Always",
    "env": [
        {"name":"bucket_source", "value":"dev_etl_pipeline"},
        {"name":"logger_name", "value":"gs_to_gs"}
    ],
    "svc-account-secret": "svc-account-affini-tech-a",
    "cmd": [
        "python3",
        "gs_to_gs.py"
    ],
    "cpu-limit": "0.20",
    "memory-limit": "200Mi",
    "cpu-request": "0.15",
    "memory-request": "150Mi",
    "schedule": ""
}
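
Kubernetes requires a container's resource requests to be no larger than its limits. The sketch below checks this for the four resource fields of a manifest. It assumes that DataTask hands these strings to Kubernetes unchanged and that they follow the Kubernetes quantity syntax (as "200Mi" suggests); the helper names are hypothetical and only the most common suffixes are handled.

```python
# Binary and decimal suffixes used by Kubernetes memory quantities (subset).
MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(value: str) -> float:
    """Convert a CPU quantity ("0.20" or "200m") to a number of cores."""
    if value.endswith("m"):  # millicores
        return int(value[:-1]) / 1000
    return float(value)


def parse_memory(value: str) -> int:
    """Convert a memory quantity ("200Mi", "1G", "1024") to bytes."""
    # Try two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(MEMORY_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * MEMORY_UNITS[suffix])
    return int(value)


def requests_fit_limits(manifest: dict) -> bool:
    """True if both the CPU and memory requests are within their limits."""
    cpu_ok = parse_cpu(manifest["cpu-request"]) <= parse_cpu(manifest["cpu-limit"])
    mem_ok = (parse_memory(manifest["memory-request"])
              <= parse_memory(manifest["memory-limit"]))
    return cpu_ok and mem_ok


resources = {
    "cpu-limit": "0.20", "memory-limit": "200Mi",
    "cpu-request": "0.15", "memory-request": "150Mi",
}
print(requests_fit_limits(resources))
```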

Manifest cronjob configuration

The JSON file below launches a cronjob (a job scheduled every day at 5 am). Note that type remains job; the non-empty schedule is what turns it into a cronjob.

{
    "labels": "app:gstogs",
    "type": "job",
    "name": "gstogs",
    "image-source": "",
    "image-destination": "gcr.io/affini-tech-datalab/gs_to_gs:0.1",
    "image-pull-policy": "Always",
    "env": [
        {"name":"bucket_source", "value":"dev_etl_pipeline"},
        {"name":"logger_name", "value":"gs_to_gs"}
    ],
    "svc-account-secret": "svc-account-affini-tech-a",
    "cmd": [
        "python3",
        "gs_to_gs.py"
    ],
    "cpu-limit": "0.20",
    "memory-limit": "200Mi",
    "cpu-request": "0.15",
    "memory-request": "150Mi",
    "schedule": "0 5 * * *"
}
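
The only difference between the two manifests above is the schedule value. A minimal sketch of that rule follows; the function names are hypothetical, and the five-field layout is that of a standard cron expression, which is assumed to be what the platform expects.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def workload_kind(manifest: dict) -> str:
    """Return "cronjob" for a job with a non-empty schedule, else the type."""
    if manifest["type"] == "job" and manifest.get("schedule"):
        return "cronjob"
    return manifest["type"]


def cron_fields(schedule: str) -> dict:
    """Name the five whitespace-separated fields of a standard cron expression."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {schedule!r}")
    return dict(zip(CRON_FIELDS, parts))


print(workload_kind({"type": "job", "schedule": ""}))
print(workload_kind({"type": "job", "schedule": "0 5 * * *"}))
print(cron_fields("0 5 * * *"))
```

In "0 5 * * *" the minute field is 0 and the hour field is 5, with every other field left as a wildcard, which matches the "every day at 5 am" description.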

Manifest deployment configuration

The JSON file below launches a job as a deployment. The advantage of running a job as a deployment is that the deployment ensures that at least one instance is always running (depending on your configuration).

{
    "labels": "app:subscriber-etl",
    "type": "deployment",
    "name": "subscriber-launch-pipeline",
    "image-source": "",
    "image-destination": "gcr.io/datataskio/launch_pipeline:0.2",
    "env": [
        {"name":"bucket_source", "value":"datatasketl"},
        {"name":"logger_name", "value":"etl_launch_pipeline"}
    ],
    "svc-account-secret": "svc-account-affini-tech-a",
    "cmd": [
        "python3",
        "launch_pipeline.py"
    ]
}

Manifest service configuration

  • published service

This JSON file represents a Flask application that is published, made accessible to the group dev, and served under the prefix /app/.

When a DataTask platform is shared by several organizations, it is recommended to put the name of your organization at the beginning of "prefix" for each deployed service, so that the prefix is guaranteed to be available. Example: "prefix": "/affini-tech/my-app/".

{
    "labels": "app:flask",
    "type": "service",
    "published": {
        "access": "dev",
        "prefix": "/app/"
    },
    "name": "app-flask",
    "image-source": "quay.io/gilles/flask-team",
    "image-destination": "",
    "image-pull-policy": "Always",
    "env": [],
    "svc-account-secret": "svc-account-affini-tech-a",
    "cmd": [],
    "replicas": 1,
    "name-port": "app-port",
    "container-port": 5000
}
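
The published block above sets only access and prefix, so every optional key falls back to its documented default. The sketch below spells those defaults out and builds an organization-scoped prefix. Both function names are hypothetical, and how the platform merges defaults internally is not documented here; this only mirrors the field descriptions.

```python
# Defaults as listed in the field descriptions of the published key.
PUBLISHED_DEFAULTS = {
    "timeout_ms": 6000,
    "sticky_sessions": False,
    "bypass_auth": False,
    "icon": "import_contacts",
    "visible": True,
}


def resolve_published(published: dict, service_name: str) -> dict:
    """Fill in the documented defaults for an exposed service."""
    if not published:
        return {}  # an empty published key means a non-exposed service
    resolved = dict(PUBLISHED_DEFAULTS)
    resolved["displayed_name"] = service_name  # default: the service's real name
    resolved.update(published)  # explicit values win over defaults
    return resolved


def org_prefix(organization: str, app: str) -> str:
    """Build a prefix that starts with the organization name."""
    return f"/{organization.strip('/')}/{app.strip('/')}/"


settings = resolve_published({"access": "dev", "prefix": "/app/"}, "app-flask")
print(settings["timeout_ms"], settings["displayed_name"])
print(org_prefix("affini-tech", "my-app"))
```

The rewrite key has no documented default, so it is left out unless the manifest provides it.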

  • Image destination

Whenever image-destination is set, the gitbox always looks like this:

[Image: cookbook3 1]

The associated manifest.json is written as follows:

{
   "labels": "app:gstogs",
   "type": "job",
   "name": "gstogs",
   "image-source": "",
   "image-destination": "gcr.io/datataskio/gs_to_gs:0.1",
   "image-pull-policy": "Always",
   "env": [
       {"name":"bucket_source", "value":"datatasketl"},
       {"name":"logger_name", "value":"gs_to_gs"}
   ],
   "cmd": [
       "python3",
       "gs_to_gs.py"
   ],
   "svc-account-secret": "svc-account-affini-tech-a",
   "cpu-limit": "0.20",
   "memory-limit": "200Mi",
   "cpu-request": "0.15",
   "memory-request": "150Mi",
   "schedule": ""
}
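
The choice between image-source and image-destination decides whether an image is built from the repository or an existing one is used. A minimal sketch of that decision follows, with hypothetical function names; the actual build is performed by the platform and is not reproduced here.

```python
from typing import Tuple


def image_plan(manifest: dict) -> Tuple[str, str]:
    """Return ("build", tag) or ("use", image) depending on which field is set."""
    source = manifest.get("image-source", "")
    destination = manifest.get("image-destination", "")
    if bool(source) == bool(destination):
        raise ValueError("set exactly one of image-source or image-destination")
    if destination:
        return "build", destination  # image is built and tagged with this name
    return "use", source  # existing image is used as-is


def split_tag(image: str) -> Tuple[str, str]:
    """Split "repository:tag" into its parts; the tag is "" when absent."""
    name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:  # no tag, or the colon belongs to a registry port
        return image, ""
    return name, tag


action, image = image_plan({
    "image-source": "",
    "image-destination": "gcr.io/datataskio/gs_to_gs:0.1",
})
print(action, split_tag(image))
```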