*** mattp is now known as mkjpryor | 08:20 | |
opendevreview | John Garbutt proposed openstack/magnum master: WIP: Implement cluster update for Cluster API driver https://review.opendev.org/c/openstack/magnum/+/880805 | 08:36 |
jakeyip | gmann: ok | 08:57 |
jakeyip | I'm here if anyone needs me | 08:59 |
* dalees is here | 09:01 | |
dalees | we tested Fedora CoreOS 38 this week, doesn't look like any changes required. | 09:03 |
jakeyip | nice | 09:06 |
jakeyip | which k8s? | 09:06 |
dalees | that was with 1.25 I think. | 09:07 |
dalees | I did test 1.27 the other week too (with the same changes in place as required for 1.26), it started ok too. Might be some `kube-system` services to update, but the basic service ran with all existing arguments. | 09:08 |
mkjpryor | Not sure who has seen so far, but we have been working pretty hard on the Cluster API driver in the last few weeks. | 09:12 |
mkjpryor | We have had all the basic functionality working except template upgrade, so create/resize/delete and nodegroups all work | 09:14 |
dalees | we have seen and are following it closely mkjpryor - thank you! travisholton is testing your work. I'm currently working on the CAPI side (to ensure access to clusters with no API Floating IP) | 09:15 |
mkjpryor | dalees - nice. travisholton and others - any reviews of the patches would be much appreciated! | 09:16 |
travisholton | hi all | 09:16 |
mkjpryor | dalees - on the subject of CAPI clusters without a floating IP. I don't know if you know Azimuth? It is the user-friendly platform-focused portal we have been developing at StackHPC. | 09:17 |
mkjpryor | In Azimuth, we are able to create Kubernetes clusters and expose web services on them (e.g. monitoring, dashboards, JupyterHub, KubeFlow) without consuming any floating IPs at all. | 09:18 |
mkjpryor | We do this using a tunnelling proxy that we built called Zenith. Zenith also handles TLS and SSO for services that want it to (not the Kubernetes API). | 09:20 |
mkjpryor | Zenith has a server and a client that establish a tunnel (using SSH reverse port forwarding) over which traffic can flow to the proxied service. In the case of the Kubernetes API, we launch a Zenith client on each control plane node using static pod manifests, injected via cloud-init, that point to the API server on localhost. | 09:22 |
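
For illustration only, the static-pod approach described above might look roughly like the sketch below: a cloud-init `write_files` entry that drops a Zenith client manifest into `/etc/kubernetes/manifests` on a control plane node. The image name, environment variable names, and file paths are assumptions, not Azimuth's actual manifest.

```python
import json

def zenith_client_cloud_init(ssh_key_path: str = "/etc/zenith/client-key") -> dict:
    """Build a cloud-init fragment carrying a static pod for the Zenith client (sketch)."""
    static_pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "zenith-client", "namespace": "kube-system"},
        "spec": {
            "hostNetwork": True,  # reach the kube-apiserver on localhost
            "containers": [{
                "name": "zenith-client",
                "image": "registry.example.org/zenith/client:latest",  # hypothetical image
                "env": [
                    # Forward the local API server over the SSH reverse tunnel.
                    {"name": "ZENITH_CLIENT_FORWARD_TO", "value": "https://127.0.0.1:6443"},
                    {"name": "ZENITH_CLIENT_SSH_KEY", "value": ssh_key_path},
                ],
            }],
        },
    }
    # kubelet runs anything placed in /etc/kubernetes/manifests as a static pod,
    # even before the control plane itself is fully up - which is what makes this
    # usable for exposing the API server in the first place.
    return {
        "write_files": [{
            "path": "/etc/kubernetes/manifests/zenith-client.yaml",
            "permissions": "0600",
            "content": json.dumps(static_pod),  # valid YAML, since YAML accepts JSON
        }]
    }
```
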
dalees | That's interesting, I'll be keen to have a look at Zenith. I've PoC'd an Octavia solution that gives us access in from another project network. Now looking at a netns proxy on the neutron dhcp nodes, as Vexxhost have shown - seems like a good solution that consumes no extra resources. | 09:23 |
mkjpryor | I've also seen Vexxhost's solution, and it does look neat. | 09:23 |
mkjpryor | Zenith does require you to run the Zenith server somewhere, but it is one server for lots of tunnels. | 09:24 |
mkjpryor | We will stick with Zenith in Azimuth as we want the SSO integration. | 09:24 |
mkjpryor | But something like Vexxhost's solution could be neat for Magnum going forward. | 09:25 |
mkjpryor | In terms of the upstream patches we have now, something like that would not be in the first iteration as the patch(es) are already large enough! | 09:25 |
jakeyip | any link to vexxhost's solution ? | 09:25 |
dalees | and so all your zenith clients just phone home with reverse ssh? that's kinda neat, too - so once they're connected the zenith server handles the SSO and no-one needs the ssh keys? | 09:25 |
dalees | jakeyip: https://github.com/vexxhost/magnum-cluster-api/blob/main/magnum_cluster_api/proxy/manager.py (as mentioned on https://kubernetes.slack.com/archives/CFKJB65G9/p1683654401315339?thread_ts=1683121146.879219&cid=CFKJB65G9 ) | 09:26 |
mkjpryor | jakeyip it is in their out-of-tree driver | 09:26 |
jakeyip | thanks | 09:27 |
mkjpryor | Once we have the initial version of the in-tree Cluster API driver merged, it is something we could look at adding in a future patch | 09:27 |
dalees | mkjpryor: yes i agree, it doesn't need to be in the first iteration. Not a must have for Private clouds, and we can add later. It would deploy in a very different way to the rest of Magnum, too. | 09:27 |
mkjpryor | Their code should work in our driver with very few tweaks, as it is pretty much independent of the way the CAPI resources are actually made | 09:28 |
mkjpryor | dalees: I'm keen not to co-opt this into a discussion on Zenith, but yeah - basically. There is a process of associating an SSH key with a service ID, then any Zenith client that connects with that SSH key is associated with that service in a load balanced configuration. | 09:29 |
mkjpryor | The server then makes that load-balanced configuration available as <serviceid>.<basedomain>, handling TLS termination and SSO as required. | 09:30 |
mkjpryor | (It only works for HTTP services ATM) | 09:31 |
mkjpryor | Because it relies on virtual hosts | 09:31 |
jakeyip | vexxhost's implementation looks interesting | 09:31 |
mkjpryor | I do like vexxhost's implementation if your only aim is to expose the Kubernetes API server, which it would be in Magnum | 09:33 |
dalees | jakeyip: yeah, haproxy does all the heavy lifting. Just need to keep its config updated. | 09:33 |
mkjpryor | In Azimuth we also use Zenith to expose the Kubernetes and monitoring dashboards of the stacks that we deploy on the clusters, with SSO to make sure only authorised people can access them. | 09:34 |
jakeyip | mkjpryor: this is more like a managed service? Azimuth is used to deploy clusters and managed resources, and users use normal k8s ingress, etc? | 09:36 |
dalees | could do extra services with HAProxy into the netns too; just need to write the control plane host addresses into its config. | 09:37 |
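
The netns/HAProxy idea being discussed could look roughly like this: regenerate a TCP frontend/backend from the cluster's control plane addresses whenever membership changes, and reload HAProxy inside the namespace. This is not the vexxhost implementation; the ports, names, and bind address are assumptions.

```python
from typing import Iterable

def render_haproxy_config(cluster_id: str, control_plane_ips: Iterable[str],
                          bind_port: int = 6443) -> str:
    """Render a TCP proxy stanza for one cluster's API server (illustrative only)."""
    servers = "\n".join(
        f"    server {cluster_id}-cp-{i} {ip}:6443 check"
        for i, ip in enumerate(control_plane_ips)
    )
    return f"""
frontend k8s-api-{cluster_id}
    mode tcp
    bind *:{bind_port}
    default_backend k8s-api-{cluster_id}-servers

backend k8s-api-{cluster_id}-servers
    mode tcp
    balance roundrobin
{servers}
"""

# e.g. write render_haproxy_config("abc123", ["10.0.0.5", "10.0.0.6"]) into the
# HAProxy config used inside the DHCP namespace, then trigger a reload.
```
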
mkjpryor | jakeyip: At the moment, yes. Azimuth deploys Kubernetes clusters with dashboard and monitoring. It also allows users to deploy apps onto those clusters, like JupyterHub and KubeFlow, whose interfaces are also exposed using Zenith. | 09:38 |
mkjpryor | The Kubernetes API server is a slightly special case, done using static pod manifests for the Zenith clients. This is because they need to run before Kubernetes itself is actually up and running. | 09:39 |
jakeyip | deploying apps is interesting. do users still get kubeconfig access to their clusters? does Azimuth store state? what if a user deletes e.g. the namespace of the app, will Azimuth get confused? | 09:40 |
mkjpryor | For all the other services, there is a Zenith operator watching for instances of the CRDs "zenith.stackhpc.com/{Reservation, Client}" to be created on each tenant cluster, and creating resources on the clusters in response to those. | 09:40 |
mkjpryor | jakeyip: At the moment, we are just using Helm to splat the apps onto the cluster at deploy time. However we are in the process of using Argo CD to manage all the addons and apps on tenant clusters so that, in theory, we can recover from users doing stupid things like that ;-) | 09:41 |
mkjpryor | All the Azimuth state is in CRDs in the Kubernetes cluster on which it runs, so etcd in most cases | 09:42 |
jakeyip | nice | 09:42 |
mkjpryor | I did once have a user delete the CNI... In theory, the ArgoCD-managed addons could actually recover from that. | 09:43 |
mkjpryor | Because Argo is constantly watching to make sure that the resources it created are healthy | 09:44 |
mkjpryor | I'm giving a talk about Azimuth in Vancouver actually, if anyone is interested. | 09:44 |
jakeyip | will make sure my teammate goes :P | 09:45 |
jakeyip | I need to read more about Azimuth and Zenith | 09:45 |
dalees | cool, I will look out for the recording | 09:45 |
mkjpryor | It is called "Self-service LOKI applications for non-technical users" in the Private & Hybrid Cloud track | 09:45 |
mkjpryor | I'm also giving a talk about Cluster API and Magnum with mnaser from vexxhost | 09:46 |
jakeyip | we are also starting to dip our toes into Managed K8S so your experiences will be very helpful indeed. will need to bother you more if we have questions. | 09:47 |
jakeyip | hopefully the code merges before Vancouver! :D | 09:47 |
jakeyip | no pressure | 09:47 |
mkjpryor | jakeyip feel free to get in touch | 09:47 |
jakeyip | thanks | 09:48 |
mkjpryor | jakeyip speaking of merging the code, your eyes on the patches would be much appreciated | 09:48 |
mkjpryor | We have been working on improving the test coverage | 09:48 |
jakeyip | yeah I have a bunch of things to review | 09:48 |
mkjpryor | I think there is some stuff that is probably not too far from being mergable | 09:48 |
jakeyip | need everyone's help on the RBAC too | 09:48 |
dalees | travisholton: did you have any comments or changes on mkjpryor's patchsets in gerrit? | 09:49 |
jakeyip | mkjpryor: are tests passing yet? would appreciate if you can guide me with a list of reviews that I should be doing in order, etc | 09:50 |
travisholton | mkjpryor: I have been experimenting with it a bit lately as dalees mentioned | 09:50 |
mkjpryor | jakeyip: Tyler from our team has been doing a lot of work on the Tempest Magnum plugin. | 09:51 |
mkjpryor | I'm not sure things are completely passing yet in the gate | 09:51 |
mkjpryor | But then I don't think the old driver is fully tested in the gate either, right? Because of general slowness? | 09:52 |
mkjpryor | (That is what we are trying to fix) | 09:52 |
jakeyip | yeah I do the testing manually, which contributed to the slowness of reviews. mnasiadka was trying to help with that but he got busy for a bit. | 09:53 |
travisholton | mkjpryor: ok if I submit my own patches to that? I have one change I'd like to add | 09:53 |
mkjpryor | Other colleagues in our team have been working on the tests for the old driver. mnasiadka probably has more context than me on how far that got. | 09:53 |
mkjpryor | travisholton: The patch chain is already quite complicated - maybe for now just comment what you would change? We can add you as a co-author on the patch if we adopt it | 09:54 |
travisholton | yes I've been watching the patchsets change daily :-) | 09:55 |
mkjpryor | Is the change adding additional functionality or fixing something? | 09:55 |
travisholton | additional functionality (at least as of Monday) | 09:55 |
mkjpryor | We want to get the simplest possible chain merged, TBH. | 09:55 |
mkjpryor | For example, we hardly support any labels at all right now | 09:55 |
mkjpryor | But I think that is fine for the first pass. | 09:56 |
travisholton | lol yeah I wanted to add a way to pass extra --set args to helm via labels | 09:56 |
mkjpryor | We can add support for labels one-by-one in smaller, more reviewable, patches | 09:56 |
jakeyip | travisholton: you can also put up a WIP patch too without the intention of it getting merged, just for sharing your code and ideas | 09:57 |
jakeyip | WIP/DNM | 09:57 |
mkjpryor | travisholton so my idea for that is that there will be a template label that contains serialized JSON that will be merged into the Helm values | 09:57 |
mkjpryor | That is what we do in Azimuth (we also have a templates -> clusters model, where the operator manages the available templates) | 09:57 |
travisholton | my idea was similar I think. Just a label that adds a string for the --set option | 09:58 |
mkjpryor | I think I prefer the more structured approach, TBH | 09:58 |
mkjpryor | As in the template defines a set of values that are used as a "starting point" (just an empty dict now), then Magnum applies the cluster-specific stuff on top (node groups etc.) | 09:59 |
mkjpryor | --set also notoriously doesn't deal with certain types very well, which is why they added --set-string | 10:00 |
mkjpryor | So using serialised JSON in the label avoids that too | 10:00 |
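
A minimal sketch of the serialised-JSON label idea, assuming a hypothetical label name (`helm_values_overrides`) and illustrative value keys; the point is only that the label is parsed as structured data and deep-merged over the template's starting values, sidestepping the `--set` type problems mentioned above.

```python
import json

def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into base; overrides win on conflicts."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def values_from_labels(template_values: dict, labels: dict) -> dict:
    """Layer label-supplied JSON over the template's starting values (sketch)."""
    raw = labels.get("helm_values_overrides", "{}")  # hypothetical label name
    return deep_merge(template_values, json.loads(raw))

# An illustrative label value such as
# '{"apiServer": {"extraArgs": {"oidc-issuer-url": "https://idp.example.org"}}}'
# stays typed as nested structures instead of being flattened through `helm --set`.
```
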
travisholton | sounds sensible | 10:02 |
travisholton | I've needed to pass some extra arguments in to get clusters to build successfully on devstack. How are you managing things like machineSSHKeyName, and passing kubeletExtraArgs right now? | 10:07 |
mkjpryor | machineSSHKeyName should be set, if a key name is provided in the cluster | 10:10 |
mkjpryor | kubeletExtraArgs is not set ATM | 10:10 |
mkjpryor | What do you need to set in kubeletExtraArgs? | 10:10 |
travisholton | it hasn't been set in my devstack clusters (at least as of Monday). I haven't tried the past couple days | 10:11 |
travisholton | in ubuntu I need to set resolv-conf: /run/systemd/resolve/resolv.conf | 10:12 |
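
For context, the kubelet flag being discussed is normally carried in the kubeadm configuration under `nodeRegistration.kubeletExtraArgs`; in a Cluster API KubeadmControlPlane that sits under `spec.kubeadmConfigSpec`. How (or whether) the driver and chart expose this is an assumption, but the fragment that would need to reach the control plane looks like:

```python
# Dict form of the kubeadm fields that carry kubelet flags; the field names are
# standard kubeadm / Cluster API ones, but the wiring through the Helm chart and
# the Magnum driver is assumed, not something that exists yet.
RESOLV_CONF_KUBELET_ARGS = {
    "kubeadmConfigSpec": {
        "initConfiguration": {
            "nodeRegistration": {
                "kubeletExtraArgs": {"resolv-conf": "/run/systemd/resolve/resolv.conf"}
            }
        },
        "joinConfiguration": {
            "nodeRegistration": {
                "kubeletExtraArgs": {"resolv-conf": "/run/systemd/resolve/resolv.conf"}
            }
        },
    }
}
```
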
mkjpryor | TBH, I haven't tried since JohnG started mangling my original patches to make them more reviewable/testable | 10:14 |
mkjpryor | So it might be broken | 10:14 |
mkjpryor | But it should work | 10:14 |
travisholton | here's a complete output of the helm values that I used in devstack that works for me: https://0bin.net/paste/Kj+EsYct#Ps1iIFncxctoFg966Yg1aVBV7W8Z6HEbOnVM0IY-mbR | 10:15 |
mkjpryor | I've never seen a requirement to make that change to the resolv.conf. What is it that means you need to do that? | 10:15 |
travisholton | https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues | 10:16 |
mkjpryor | "kubeadm automatically detects systemd-resolved, and adjusts the kubelet flags accordingly." | 10:17 |
mkjpryor | Shouldn't that be happening? | 10:17 |
travisholton | hmm...I know I have seen it not working. I may try it without again and see if it's still a problem | 10:18 |
mkjpryor | That would explain why I haven't needed to do it before | 10:18 |
mkjpryor | What image are you using? | 10:18 |
travisholton | an ubuntu-22.04-x86_64 image that I built with packer+ansible | 10:19 |
mkjpryor | Maybe it is a 22.04 thing | 10:19 |
mkjpryor | We are using 20.04 still | 10:19 |
travisholton | right..I've only been using 22.04 lately | 10:19 |
mkjpryor | In any case, I think the implementation of the label containing structured "default values" would be a good thing to have | 10:20 |
travisholton | +1 | 10:20 |
mkjpryor | You can implement it at the top of the patch chain if you want | 10:20 |
mkjpryor | Just be prepared to rebase regularly :D | 10:21 |
travisholton | that's not a problem | 10:21 |
mkjpryor | We will probably also implement other labels to simplify common things that could "technically" all be specified via this "default values" template label | 10:22 |
dalees | in which order should they have precedence? I'd guess: helm chart defaults, then default value json structure, then specific labels? | 10:23 |
travisholton | we'll certainly want to have some that we can customise (eg imageRepository) | 10:23 |
mkjpryor | So the merge order, with rightmost entries taking precedence, will eventually be: "chart defaults" -> "Magnum global defaults" -> "template defaults" -> "template labels" -> "cluster labels" -> "Magnum-derived cluster-specifics (e.g. networks, node groups)" | 10:24 |
mkjpryor | IMHO anyway | 10:24 |
dalees | yeah, that makes sense. | 10:26 |
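
A sketch of the precedence chain laid out above, with rightmost layers winning; the function and parameter names are placeholders, and `deep_merge` is the same helper as in the earlier sketch.

```python
from functools import reduce

def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into base; overrides win (same helper as above)."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def effective_helm_values(chart_defaults: dict,
                          magnum_global_defaults: dict,
                          template_defaults: dict,
                          template_label_values: dict,
                          cluster_label_values: dict,
                          magnum_cluster_specifics: dict) -> dict:
    """Apply the layers in the order discussed; later layers override earlier ones."""
    layers = [
        chart_defaults,
        magnum_global_defaults,
        template_defaults,
        template_label_values,
        cluster_label_values,
        magnum_cluster_specifics,  # networks, node groups, etc. always win
    ]
    return reduce(deep_merge, layers, {})
```
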
dalees | thanks all, see you next week. | 10:43 |
jakeyip | seeya all | 10:54 |
gmann | jakeyip: thanks | 16:38 |