Tuesday, 2021-04-20

ianwProcess: 5316 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)01:45
ianwone of those days01:45
ianwhow this ended up with one file called "Got" is weird01:46
ianwi wonder if this relates to the ipv6 thing with zuul as well, docker has pushed some change01:47
openstackgerritIan Wienand proposed zuul/zuul-jobs master: collect-container-logs: don't copy on failure  https://review.opendev.org/c/zuul/zuul-jobs/+/78701902:10
ianwok, i've seen a bunch of these docker failures now04:02
ianwnothing in the logs really points to what it is04:02
ianwok, i have proposed https://review.opendev.org/c/zuul/nodepool/+/787065 which gets things further by not failing the job when nodepool logs aren't available; but i still can't see why docker is failing to install06:24
*** ykarel|lunch is now known as ykarel10:47
*** vishalmanchanda has quit IRC13:39
*** roman_g has quit IRC13:45
*** vishalmanchanda has joined #opendev13:47
*** artom has joined #opendev13:49
clarkbfungi: ianw: first thought is are any of our production systems upgraded to latest docker? and if so do any of them exhibit the same problem?14:50
clarkbI see ianw found the moby issue for failures when ipv6 is disabled on the host. But we shouldn't disable ipv6 on any of our hosts so I doubt that is related14:50
clarkbdocker 20.10.6 released just over a week ago14:52
clarkbbut these errors didn't start happening until more recently. Is it possible there is a ubuntu kernel + docker + ssh-keyscan interaction?14:52
fungiwell the "breaks when ipv6 disabled" bug could still be a clue to a behavior change14:59
fungiif things have suddenly started preferring ipv6 for some subset of situations, and we have baked in assumptions that ipv4 will be tried, i could see running into similar problems14:59
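If the moby "fails when IPv6 is disabled" bug is a suspect, one quick way to rule it in or out on a test node is to read the kernel's disable_ipv6 sysctl. This is a minimal sketch that assumes a Linux host. The helper name is hypothetical. Treating a missing /proc entry as "disabled" is also an assumption: it presumes the kernel was booted with IPv6 switched off entirely.

```python
from pathlib import Path

# Linux sysctl: a value of "1" means IPv6 is disabled for all interfaces.
DEFAULT_SYSCTL = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_disabled(sysctl_path: Path = DEFAULT_SYSCTL) -> bool:
    """Report whether IPv6 looks disabled on this host (hypothetical helper)."""
    try:
        return sysctl_path.read_text().strip() == "1"
    except FileNotFoundError:
        # Assumption: no sysctl entry means IPv6 was disabled at boot
        # (e.g. ipv6.disable=1 on the kernel command line).
        return True
```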
clarkbfungi: ianw: looking at syslog for the failed nodepool job I see that docker.io appears to be installed, we then remove it then install docker-ce15:08
clarkbI wonder if docker.io is leaving behind things that break docker-ce15:08
fungiooh, good catch15:10
fungido we have competing tasks for installing docker?15:11
fungiis one installed from ubuntu and the other from the external docker package repository?15:11
clarkbyes I think docker.io is ubuntu and docker-ce is the upstream package15:17
fungimakes sense, though the only thing the docker.io package should leave behind is data and configuration. if we explicitly purge the package before installing docker-ce then any lingering data and configuration should also be cleaned up15:18
fungi(the difference between apt remove and apt purge)15:19
clarkbright, but maybe that would delete containers too15:19
fungiyes, it probably would15:19
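The remove/purge distinction maps directly onto apt-get subcommands. The sketch below expresses the swap as plain command lists. It assumes root on an Ubuntu node where the upstream Docker apt repository is already configured, since docker-ce is not in Ubuntu's own archive. clarkb and fungi suspect that purging could also delete existing containers, so purge is opt-in here. The helper names are hypothetical.

```python
import subprocess
from typing import List


def docker_swap_commands(purge: bool = False) -> List[List[str]]:
    """Build apt-get commands that replace Ubuntu's docker.io with docker-ce."""
    # "remove" leaves the package's configuration files behind;
    # "purge" deletes them as well.
    action = "purge" if purge else "remove"
    return [
        ["apt-get", action, "-y", "docker.io"],
        ["apt-get", "install", "-y", "docker-ce"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure (needs root)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```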
clarkbI don't understand why docker.io is installed though15:19
clarkbtrying to figure that out15:19
clarkbhttps://zuul.opendev.org/t/zuul/build/4c8186da9fe043bf83d63926cbb3fc84/console#1/0/56/ubuntu-bionic is where we uninstall docker.io15:21
clarkbthe next task installs docker-ce15:22
clarkbfound it15:22
clarkbit is bindep15:22
fungihah, so we have docker.io in bindep?15:22
fungier, in bindep.txt15:23
clarkblet me push up a change that removes that entry and see if the job is happier15:23
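bindep.txt lists one package per line, optionally followed by bracketed profile selectors such as [platform:dpkg]. clarkb's change edits the file by hand. The sketch below only illustrates the format by dropping one entry programmatically. The helper name and sample content are hypothetical.

```python
def drop_bindep_entry(bindep_text: str, package: str) -> str:
    """Return bindep.txt content without any line for `package`."""
    kept = []
    for line in bindep_text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        # The package name is the first token before any [selector].
        tokens = stripped.split("[", 1)[0].split()
        if tokens and tokens[0] == package:
            continue
        kept.append(line)
    trailing = "\n" if bindep_text.endswith("\n") else ""
    return "\n".join(kept) + trailing
```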
clarkbfungi: does uninstall remove unit files or just disable them? the error complains about lack of socket activation stuff15:30
clarkbI wonder if there is an assumed unit behavior in the new package and the old one is leaving a different assumption behind15:30
fungii think it would remove the old systemd units15:32
fungibut it might leave them in a disabled state instead, i'm not positive15:32
fungipurge should definitely remove them15:32
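The failing dockerd command line uses -H fd://, which expects listening sockets handed over by systemd socket activation (docker.socket). To see which Docker unit files a removed package left behind, you could inspect the usual unit directories. This sketch assumes Ubuntu's /lib/systemd/system and /etc/systemd/system layout. It treats a unit symlinked to /dev/null as masked. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List

# Usual systemd unit directories on Ubuntu (packaged units, then local ones).
UNIT_DIRS = (Path("/lib/systemd/system"), Path("/etc/systemd/system"))
DOCKER_UNITS = ("docker.service", "docker.socket")


def find_docker_units(dirs: Iterable[Path] = UNIT_DIRS) -> Dict[str, List[Path]]:
    """Map each Docker unit name to every path where a file or link exists."""
    found: Dict[str, List[Path]] = {}
    for d in dirs:
        for unit in DOCKER_UNITS:
            candidate = d / unit
            # is_symlink() also catches dangling links, which exists() misses.
            if candidate.exists() or candidate.is_symlink():
                found.setdefault(unit, []).append(candidate)
    return found


def is_masked(unit_path: Path) -> bool:
    """A unit symlinked to /dev/null is masked and will not start."""
    return unit_path.is_symlink() and os.readlink(unit_path) == "/dev/null"
```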
clarkblooks like my test of focal failed so same behavior there15:32
clarkbI think that may be it because we haven't hit any retries yet15:41
clarkbthat being the docker.io install prior to docker-ce15:41
clarkbfungi: fyi https://review.opendev.org/c/opendev/system-config/+/778116 has passed testing and when I checked the screenshots I saw zuul status things. Maybe you want to check it over too and we can start working to land that then your jeepyb changes?16:01
fungiahh cool, yep16:10
corvusclarkb: what wm do you use?16:39
clarkbcorvus: xmonad16:41
clarkbmost people complain about the config because it is written in haskell but I've always enjoyed that aspect of it :)16:41
clarkb(you also need a ghc toolchain which isn't small)16:41
clarkbi3 has become the more mainstream equivalent. dwm, ratpoison, and awesome are all sort of similar16:43
corvusthx; i think i tried all of them like 8 years ago; it's probably time for a redo :)16:43
corvusi also tried qtile (written/configured in python); it was very new and not quite there yet.  but apparently it's still around.16:44
clarkbI would probably start with i3 if you aren't drawn to haskell. I keep meaning to try it too myself16:44
corvusarch wiki to the rescue: https://wiki.archlinux.org/index.php/Comparison_of_tiling_window_managers16:45
clarkbcorvus: I also use it with xfce. To do that I have to put an xfwm4 in my path, ahead of the actual xfwm4, that is actually xmonad. It works, and there is at least one library that makes it work better too (it respects things like panel space)16:47
fungii still swear by ratpoison, fwiw, i prefer the keybindings16:47
clarkbI wish xfce made it easier to switch out window managers but this has worked ok16:47
*** hamalq has quit IRC16:47
clarkbbut that gets me all the nice things from a desktop like battery management, network management, *-agent, etc without doing it all myself16:48
corvusclarkb: oh that's helpful (i also use xfce), thx16:48
corvusfungi: okay, i'll put it on the short list too :)16:48
corvusi love that all the tiling managers from 8 years ago are still around; time has done nothing to winnow the field :)16:48
fungiyeah, i don't use any desktop environment or login manager, just ratpoison called from .xinitrc and a basic rp config to precreate some layout i want and spawn terminals16:50
fungivery barebones. i'm not a fan of draggable windows nor window decorations, so all decorationless other than a one-pixel outline around the window with active keyboard focus16:50
JayFMeanwhile, I spent an hour this weekend getting GNOME 40 installed on my gentoo ;)16:51
JayFI do think that if I ever get a linux machine for work, I might look at a more barebones window manager, just to cut down on distractions. But gimme all the eye candy/decorations/etc for my fun PC :D16:51
*** fressi has joined #opendev16:55
clarkbJayF: I found it was particularly helpful on laptops where input devices don't work so well beyond the keyboard16:56
JayFHeh. I haven't run linux on a laptop in years, but there's a pilot program for linux work laptops at my employer, and I'm hoping to get one soon :D16:56
corvus2021 year of the linux laptop16:57
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: Roles to snapshot and cleanup image builds for digitalocean  https://review.opendev.org/c/zuul/zuul-jobs/+/78675717:02
fungione of the reasons i like keyboard-driven window management is that i have some graphical systems where a keyboard (or chording keyer) is the only input device, so the keynav utility winds up standing in as a pointer only when a pointer is absolutely required17:11
fungithough on that note, keynav *is* a really cool tool17:12
elodfungi clarkb : thanks for the merge (meta-config patch)! I'll drop a mail to ML about starting the delete of ocata-eol tagged branches and after some time (~1 week) I'll do some "live" testing on some branch (whose repository's acl was changed in the patch)17:28
eloddoes this sound OK for you? ^^^17:28
clarkbya I think we were hoping for some live testing before we started merging more of those changes17:29
clarkbjust to avoid any subsequent updates if they become necessary17:29
fungielod: yeah, should be fine. i suggest starting slowly and double-checking things worked the way you expect. then if there are no complaints we can work on updating the other acls too17:29
elodsure, cool17:29
elodJayF: some warning in advance ;)  ^^^17:31
JayFelod: Thank you :) Courtesy of our previous discussion we now have a simple runbook to follow to reconfigure CI if a retired branch breaks it17:32
elodJayF: ++17:32
smcginnisProbably doesn't have an impact on us, but just so folks know - https://grafana.com/blog/2021/04/20/grafana-loki-tempo-relicensing-to-agplv3/17:35
clarkbsmcginnis: should be fine, we deploy the upstream grafana images so we don't even have modifications to publish (which we would sort of do automatically anyway)17:36
*** fressi has quit IRC18:00
*** iurygregory has quit IRC18:15
*** iurygregory has joined #opendev18:33
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: Roles to snapshot and cleanup image builds for digitalocean  https://review.opendev.org/c/zuul/zuul-jobs/+/78675718:35
openstackgerritMatt McEuen proposed openstack/project-config master: Add ACL rule for deleting treasuremap branches  https://review.opendev.org/c/openstack/project-config/+/78728219:10
openstackgerritGonéri Le Bouder proposed zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/78727119:15
openstackgerritMerged openstack/project-config master: Add ACL rule for deleting treasuremap branches  https://review.opendev.org/c/openstack/project-config/+/78728219:29
zigofungi: Hi there! I saw that https://review.opendev.org/c/openstack/project-config/+/783613 was merged, does it mean I already can use Bullseye ?19:39
fungizigo: turns out we merged it prematurely, we should have waited for a new dib release with the dependency of that change included, and then an updated nodepool container image with that dib release included. ianw went ahead and tagged a new dib version yesterday, but found a new problem with docker which is blocking us from merging the subsequent change to update the nodepool image19:43
zigofungi: Though is the dib change for Bullseye ok? The docker thingy is unrelated, right?19:44
fungiwe think we've finally got a handle on what's changed in docker19:44
fungiyeah, unrelated, just blocking us from upgrading dib on our nodepool builders19:44
fungiso we won't be able to build proper bullseye node images until that's fixed, hopefully in the next few hours19:44
zigofungi: Alright, good news then, tomorrow should be ok for me... :P19:45
fungilooking that way, yes19:45
zigofungi: I still got that https://review.opendev.org/c/openstack/puppet-openstack_extras/+/786503 patch to merge before I can try a WIP patch on puppet-openstack-integration, so I have to wait anyways...19:46
fungithere's still an outstanding change we need to merge after we have nodes to start building amd64 and arm64 python wheels for bullseye, but that's just a performance booster for pip install of things on those platforms19:46
zigofungi: Did you hear about extrepo? It's great, it's kind of like PPAs for Debian, kind of ...19:46
zigoIt's a completely independent initiative from a single DD though ...19:47
zigoPPAs are hosted by canonical, extrepo is just a list of available (but authenticated) random Debian repos hosted wherever one wishes...19:48
zigoThe good thing is about auth: Extrepo is taking care of the GPG stuff, so you don't need to trust any random GPG key on the net.19:49
zigoAs for me:19:49
zigoextrepo openstack_wallaby19:49
zigoDONE ! :)19:49
fungivery cool!19:49
zigoextrepo enable openstack_wallaby19:49
zigo(that is)19:49
fungiwe're not running any debian servers currently, but i bet that could be useful in some of the roles in zuul-jobs19:49
zigohttps://salsa.debian.org/extrepo-team/extrepo-data/-/tree/master/repos/debian <--- List of external repositories. You just need a merge request to add yourself.19:50
openstackgerritGonéri Le Bouder proposed zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/78727119:50
fungithankfully we know better than to fetch signing keys from keyservers to configure package repositories anyway, and just bake the key material into our orchestration19:50
zigowget https://example.com/key.asc -O - | apt-key add <---- To me, this is the equivalent of "curl | sudo bash" :)19:51
funginot just from a security perspective, but also because it's additional network access which increases the chances a job will break on random connection failures19:52
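Baking the key into orchestration usually means shipping the keyring file alongside the config and scoping it to a single repository with apt's signed-by option. That way, no keyserver lookup or extra download is involved. This is a sketch under those assumptions. The URL, suite, keyring path and helper names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import List


def verify_baked_key(key_path: Path, expected_sha256: str) -> bool:
    """Check a repository key shipped with the config against a pinned hash."""
    return hashlib.sha256(key_path.read_bytes()).hexdigest() == expected_sha256


def apt_source_line(url: str, suite: str, components: List[str], keyring: str) -> str:
    """Build a sources.list entry that is trusted only via the given keyring."""
    # signed-by scopes trust to this one keyring, unlike `apt-key add`,
    # which makes a key valid for every configured repository.
    return f"deb [signed-by={keyring}] {url} {suite} {' '.join(components)}"
```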
openstackgerritMerged opendev/system-config master: Remove Jenkins related documentation fragments  https://review.opendev.org/c/opendev/system-config/+/78679619:57
corvusfungi, clarkb, mordred: https://fishshell.com/ i have not used the shell, but the web page is great20:13
slittle1https://review.opendev.org/c/starlingx/integ/+/785891   seems to be stuck20:14
slittle1neither recheck  nor WF 0/+1 have worked20:15
corvusslittle1: can you elaborate on 'stuck'?  do you mean zuul has not enqueued the change into gate?20:16
mordredcorvus: it's written in reasonably modern C++20:16
mordredcorvus: https://github.com/fish-shell/fish-shell/blob/master/src/builtin_commandline.cpp#L12720:16
fungislittle1: its parent is outdated, so the change cannot merge in its current state. gerrit doesn't do a great job of indicating that20:17
corvusslittle1: it looks like that patch may be based on an outdated version of its parent20:17
corvusyeah that :)20:17
corvusa rebase on current branch tip should fix20:17
corvusmordred: neat :)20:18
fungislittle1: look at the relation chain, and if you follow the link to the "merged" parent you'll see it takes you to patchset 1 of 785814 when patchset 2 is what merged20:18
fungibasically the git parent of 785891 is a commit which will never appear in that branch, so the commit can't be attached to the target branch without a rebase20:19
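fungi's manual check (follow the relation chain and compare patchset numbers) can be approximated with Gerrit's REST API. This sketch assumes the related-changes endpoint (GET /changes/{change}/revisions/current/related) and its `_revision_number` / `_current_revision_number` fields. The HTTP fetch is left out, and the helper names are hypothetical. The test data reuses the change numbers from this discussion, but the patchset values for 785891 are made up.

```python
import json
from typing import List

# Gerrit prepends this line to JSON responses to defeat XSSI attacks.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Decode a Gerrit REST response body."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def stale_related_changes(related: dict) -> List[int]:
    """Change numbers whose patchset in this chain is not their latest one."""
    stale = []
    for entry in related.get("changes", []):
        in_chain = entry.get("_revision_number")
        latest = entry.get("_current_revision_number")
        # A mismatch means the chain is built on an old patchset,
        # so the dependent change needs a rebase before it can merge.
        if in_chain is not None and latest is not None and in_chain != latest:
            stale.append(entry["_change_number"])
    return stale
```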
*** lbragstad has quit IRC20:20
slittle1So shouldn't 785891 be marked as merge-conflict or needs-rebase or something ?20:30
fungislittle1: the old gerrit ui used to show an indicator that the change had an "outdated" parent20:33
fungiit's not evident in the new ui where it displays that, if it does20:34
fungii agree it's a bit confusing, i don't know if it's on the gerrit developers' radar as something which needs to be improved20:34
fungizuul doesn't even attempt to evaluate the change, because the gerrit api indicates the change cannot merge20:35
fungithe way developers usually avoid that in a dependent series of commits is to edit earlier commits with an interactive rebase, so that the commits following it will get updated accordingly, and then the whole series re-pushed (at which point gerrit will create new patchsets for any of the commits in that series which have been updated)20:38
fungii find https://pypi.org/project/git-restack/ very useful for such cases20:39
fungithe other way i've seen it dealt with is to fetch the change you want to edit, and then cherry-pick the subsequent changes from gerrit in order onto it, before pushing20:41
fungigit-review's -x option is useful for that method20:41
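The cherry-pick method can be written out as an ordered list of commands. This sketch assumes git-review is installed and that the chain is given oldest-first. Any change numbers you pass are your own; the ones in the test are hypothetical. git-restack takes the other route and wraps an interactive rebase instead.

```python
from typing import List


def restack_plan(chain: List[int]) -> List[List[str]]:
    """Commands to edit the first change in a chain and re-stack the rest.

    `chain` lists Gerrit change numbers oldest-first.
    """
    if not chain:
        raise ValueError("chain must contain at least one change")
    first, *rest = chain
    plan = [
        ["git", "review", "-d", str(first)],  # check out the change to edit
        ["git", "commit", "--amend"],          # make the edit
    ]
    for change in rest:
        # Cherry-pick each later change, in order, on top of the edited one.
        plan.append(["git", "review", "-x", str(change)])
    # Push the series; commits keep their Change-Id, so Gerrit adds new patchsets.
    plan.append(["git", "review"])
    return plan
```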
openstackgerritGonéri Le Bouder proposed zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/78727120:44
fungislittle1: i'm actually somewhat confused, since https://review.opendev.org/Documentation/user-review-ui.html#related-changes-tab says (and shows a screenshot for) "If an ancestor change is marked with an orange dot it means that the currently viewed patch set depends on a outdated patch set of the ancestor change."20:55
fungii wonder if something is preventing those icons from displaying in our current theme20:55
corvusfungi: maybe that changes when ancestor is merged?20:55
fungioh, that might20:56
fungibe it20:56
fungi(merged) was showing for it where i would normally have expected the icon to be displayed instead20:57
fungiso perhaps it reuses the same field20:57
corvusfungi: merged is orange; is that different?20:57
fungiyeah, perhaps that's what they were trying to convey. not very colorblindness-friendly20:57
fungimaybe it would have also displayed a tooltip if i'd thought to try hovering over it21:00
ianwfungi / zigo : hopefully i can babysit the nodepool change in today if docker is behaving :)21:30
*** roman_g has joined #opendev21:41
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Remove use of --generator=run-pod/v1  https://review.opendev.org/c/zuul/zuul-jobs/+/78729121:43
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Remove use of --generator=run-pod/v1  https://review.opendev.org/c/zuul/zuul-jobs/+/78729121:43
openstackgerritIan Wienand proposed opendev/system-config master: OSU OSL : change upload format to RAW  https://review.opendev.org/c/opendev/system-config/+/78729321:58
openstackgerritGonéri Le Bouder proposed zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/78727122:00
openstackgerritGonéri Le Bouder proposed zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/78727122:04
ianwRamereth asked us to upload raw to OSU OSL, which i proposed, but also pointed out https://github.com/osuosl/packer-templates/blob/master/bin/upload.sh#L48-L5622:10
ianw --property hw_qemu_guest_agent=yes \22:10
ianw--property os_require_quiesce=yes22:10
ianwi don't think we run the qemu guest agent22:11
clarkbI don't think our images have a qemu guest agent?22:11
ianwand then the quiesce argument i think is related, in that if snapshotting, migrating etc. it sends a quiesce command via the guest agent22:11
ianwanyway, i'm not sure they're important arguments for us, but just something pointed out22:12
ianwraw images should put much less load on their ceph22:13
ianwkevinz: ^ not sure if this is something that would be helpful in linaro as well; not something i'd thought of before.  we just default to .qcow2 but doesn't have to be22:13
*** DSpider has joined #opendev22:15
clarkbianw: one thing to keep in mind is that this will likely double the disk space needs for our builder22:16
clarkbif we can do raw for linaro too that may be a good thing to keep builder disk needs down22:17
clarkb(we do fstrim the raw images so they end up roughly equivalent to qcow2 in disk use iirc)22:17
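fstrim'd raw images presumably need not cost much more builder disk than qcow2 because raw files can be sparse: ranges that were never written, or were discarded, occupy no blocks on the host filesystem. Here is a minimal sketch for comparing a file's apparent size with its real allocation. It assumes a POSIX host whose filesystem supports sparse files. The helper name is hypothetical.

```python
import os
from typing import Tuple


def disk_usage(path: str) -> Tuple[int, int]:
    """Return (apparent size, bytes actually allocated) for a file.

    st_blocks counts 512-byte units and is only available on POSIX systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```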
mordredclarkb, ianw : I agree - we do not do the qemu guest agent in our instances22:21
mordredand that's quite on purpose - we'd rather the instances just get killed and we can make new ones. having a provider try to "nicely" manage test nodes would be rather yuck22:22
mordredwhich is to say - yeah, we should not supply either of those parameters22:24
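OpenDev's builders handle image conversion and upload automatically. This sketch only spells out equivalent manual steps: convert to raw and upload without the guest-agent properties. It assumes qemu-img and python-openstackclient are installed. The file and image names are hypothetical.

```python
from typing import List


def raw_upload_commands(qcow2_path: str, raw_path: str, image_name: str) -> List[List[str]]:
    """Commands to convert a qcow2 image to raw and upload it to Glance."""
    return [
        # Convert the qcow2 build output into a raw image file.
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", qcow2_path, raw_path],
        # Upload as raw. Intentionally no hw_qemu_guest_agent or
        # os_require_quiesce properties: the images run no guest agent.
        ["openstack", "image", "create",
         "--disk-format", "raw", "--container-format", "bare",
         "--file", raw_path, image_name],
    ]
```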
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Remove use of --generator=run-pod/v1  https://review.opendev.org/c/zuul/zuul-jobs/+/78729122:30
openstackgerritClark Boylan proposed opendev/git-review master: Install PBR explicitly  https://review.opendev.org/c/opendev/git-review/+/78729722:46
clarkbI suspect we're going to need a story for solving ^ soon. Going to see if this works22:46
clarkbnote that is preventing the --no-thin patch from landing22:46
openstackgerritSteve Baker proposed openstack/diskimage-builder master: Ensure redhat efi packages are reinstalled during finalise  https://review.opendev.org/c/openstack/diskimage-builder/+/78680422:58
openstackgerritClark Boylan proposed opendev/git-review master: Install PBR explicitly  https://review.opendev.org/c/opendev/git-review/+/78729723:15
clarkbfungi: ^ that might make packagers have a fit, but I think that is a reasonable workaround23:15
ianw"Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible."23:16
ianwa new failure for the nodepool func job :/23:17
*** tosky has quit IRC23:17
clarkbfungi: looks like walmart has responded. I'm not sure what the proper fix is on the osa side? is it to pre clone then tell osa to fetch from there?23:34
clarkbI don't want to tell them the wrong thing23:35
openstackgerritMerged openstack/project-config master: nodepool: make ARM64 config names consistent  https://review.opendev.org/c/openstack/project-config/+/78635723:38
openstackgerritMerged opendev/system-config master: OSU OSL : change upload format to RAW  https://review.opendev.org/c/opendev/system-config/+/78729323:38
fungiclarkb: the last one to have this problem got help in #openstack-ansible... i suggested they follow up there or on the openstack-discuss ml23:39
fungibut if they really want to have us play proxy for a discussion with the ansible team, i guess we can try to get suggestions for them?23:39
clarkbwell I'm not sure they want us to play proxy, they just responded to the email you sent them a week ago23:40
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: Remove use of --generator=run-pod/v1 for oc  https://review.opendev.org/c/zuul/zuul-jobs/+/78730023:41
clarkboh maybe it isn't a response, they are just reaching out directly23:41
clarkbI wonder why they used incident23:41
clarkbmaybe we aren't clear enough about the purpose of that list23:41
clarkbfungi: maybe you want to respond to them with the details you already sent to them :/23:41
fungii cc'd service-incident on the message i sent them about blocking their access, because i didn't want to broadcast their ip address info or otherwise publicly shame them23:41
fungiso that's probably why they replied there23:42
*** DSpider has quit IRC23:44
clarkbI'm trying to find that email23:45
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/78727123:45
clarkbbut searching walmart doesn't seem to bring it up23:45
fungii sent it to "Chris Undernehr"23:46
fungiat the recommendation of some foundation account reps23:47
clarkbaha that found it23:47
clarkbya I think your initial email seems like a reasonable response23:48
fungiand yeah looking at the message they sent us, i have a feeling they never saw what i sent to their department head last thursday23:48
clarkbso I guess let the incident email through then respond with what you sent previously? Do you want to do that or should I?23:48
fungii can get to it shortly but have to switch rooms/computers23:49
clarkbcool, thanks (I don't think it's a huge rush since we did reach out on thursday, but getting a response to them should hopefully help them resolve things)23:49

