How do I link internal Kubernetes services? Environment variables or DNS?

 

in the following snippet from the weavescope yaml:

```
spec:
  hostPID: true
  hostNetwork: true
  containers:
    - name: weavescope-probe
      image: 'weaveworks/scope:latest'
      args:
        - '--no-app'
        - '--probe.docker.bridge=docker0'
        - '--probe.docker=true'
        - '--probe.kubernetes=true'
        - '$(WEAVESCOPE_APP_SERVICE_HOST):$(WEAVESCOPE_APP_SERVICE_PORT)'
```

 

they’re relying on a goofy Kubernetes convention: for every Service in the cluster, Kubernetes takes the Service name, appends `_SERVICE_HOST` and `_SERVICE_PORT` to it, and injects the results as env variables into the Docker containers it starts.

 

oh, and it uppercases the Service name - and replaces any `-` (hyphens) with `_` (underscores)

 

i.e. `weavescope-app` -> `WEAVESCOPE_APP_SERVICE_HOST` / `WEAVESCOPE_APP_SERVICE_PORT`
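for example, a Service like this (a hypothetical minimal manifest, not weavescope’s actual one) produces the env variables shown in the comments:

```
apiVersion: v1
kind: Service
metadata:
  name: weavescope-app
spec:
  selector:
    app: weavescope-app
  ports:
    - port: 80
# containers created after this Service exists will see:
#   WEAVESCOPE_APP_SERVICE_HOST=<the Service's cluster IP>
#   WEAVESCOPE_APP_SERVICE_PORT=80
```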

 

this is usually used when there is no internal DNS set up in your Kubernetes environment - which is rare.

 

my guess is that they’re doing this to address the least common denominator of Kubernetes cluster configurations.  some people might not have DNS set up for whatever reason.

 

the line that you highlighted is a way of saying, “once you get inside the container, use whatever value is placed in that env variable for this argument”:

 

                - '$(WEAVESCOPE_APP_SERVICE_HOST):$(WEAVESCOPE_APP_SERVICE_PORT)'

 

in this case, they’re linking the weavescope-probe Docker container to whatever host:port the weavescope-app Service ends up listening on at the time the container is created.  this has an obvious order-of-startup problem: these env variables are only injected for Services that already exist when the container starts, so the Service has to be created first.  i actually don’t know how they get around that here.  i don’t use this convention much.

 

and this is a bit brittle for obvious reasons, but the thinking is that the Service host:port shouldn't change often.  if it does, though, this link will be broken.  it’s better to rely on internal DNS, of course.

 

i actually started out this way when i first linked the Spark workers to the Spark master, but quickly moved to internal DNS.

 

here’s the DNS version (i don’t have history of the non-DNS version for some reason):

 

https://github.com/fluxcapacitor/pipeline/blob/a75ec61c7edfc1afcc2416e7479899cef524f6f3/apachespark.ml/spark-worker-rc.yaml#L30
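for contrast, the DNS version of the weavescope arg boils down to swapping the env var reference for the Service’s DNS name - something like this (a sketch; the short name works from within the same namespace, the fully-qualified form is in the comment):

```
args:
  - '--no-app'
  - '--probe.docker.bridge=docker0'
  - '--probe.docker=true'
  - '--probe.kubernetes=true'
  # cluster DNS resolves the Service name directly
  # (weavescope-app.default.svc.cluster.local from other namespaces)
  - 'weavescope-app:80'
```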

 

notice in the weavescope example above that they’re using `$(ENV_VAR_NAME)` - surrounded with `$(` and `)`.  this is a Kubernetes convention that defers resolution of the value until Docker (container) creation time.
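here’s a minimal, self-contained illustration of that convention (made-up names, not from the weavescope yaml):

```
containers:
  - name: demo
    image: busybox
    env:
      - name: GREETING
        value: 'hello'
    # Kubernetes expands $(GREETING) when it creates the container;
    # an undefined $(VAR) is passed through unchanged, and $$(VAR)
    # escapes the reference so the container sees a literal $(VAR)
    command: ['echo', '$(GREETING)']
```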

 

below is a snippet of the env vars inside the Docker container of a Spark worker.  i’m using the super-handy `kubectl exec -it <pod-name> bash` command to get onto the Docker container inside the pod.  note:  this command breaks down a bit if you have multiple Docker containers per pod, but you can get around that by specifying the `-c <container-name>` argument.
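for the multi-container case it looks something like this (pod and container names are made up):

```
kubectl exec -it spark-worker-abc12 -c spark-worker bash
```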

 

```
declare -x SPARK_MASTER_2_0_1_SERVICE_HOST="100.67.230.192"
...
declare -x SPARK_MASTER_2_0_1_SERVICE_PORT_ADMIN_HTTP="80"
declare -x SPARK_MASTER_2_0_1_SERVICE_PORT_REST_SUBMIT="6066"
declare -x SPARK_MASTER_2_0_1_SERVICE_PORT_SPARK_SUBMIT="7077"
declare -x WEAVESCOPE_APP_SERVICE_HOST="100.71.210.218"
declare -x WEAVESCOPE_APP_SERVICE_PORT="80"
```

 

oh, and another downside of this convention: the automatically injected env vars can conflict with your own app-specific env var names.  i was pulling my hair out debugging this exact issue early on.  a variable i was setting was being overwritten by the Kube-injected variable (or something like that).  it wasn’t pretty.
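if this bites you, newer Kubernetes versions let you opt out of the injection per pod - a sketch (this option may postdate the setup described here, and the container name is made up):

```
spec:
  # turn off the automatic <SERVICE>_SERVICE_HOST/_PORT injection
  enableServiceLinks: false
  containers:
    - name: my-app
      image: my-app:latest
```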

 

here are some references:

https://kubernetes.io/docs/user-guide/services/#environment-variables

 

https://github.com/kubernetes/kubernetes/tree/master/examples/guestbook
