*** sgw has joined #opendev | 00:05 | |
*** ysandeep|away is now known as ysandeep|rover | 01:44 | |
*** ysandeep|rover is now known as ysandeep|rover|b | 02:36 | |
*** ykarel|away is now known as ykarel | 03:54 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [dnm] test with plain nodes https://review.opendev.org/712819 | 04:02 |
---|---|---|
*** ysandeep|rover|b is now known as ysandeep|rover | 04:19 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 05:52 |
*** sgw has quit IRC | 06:01 | |
*** dpawlik has joined #opendev | 06:06 | |
*** DSpider has joined #opendev | 06:35 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 06:56 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Adds roles to install and run hashicorp packer https://review.opendev.org/709292 | 06:57 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 07:01 |
*** mrunge_ is now known as mrunge | 07:05 | |
*** ykarel is now known as ykarel|afk | 07:21 | |
yoctozepto | morning folks | 07:34 |
yoctozepto | I have a weird issue with logs in swift: https://07bda34aab9549eeca55-002c6183c0f1ab9234471cb74705bd70.ssl.cf5.rackcdn.com/717073/2/check/kolla-ansible-centos8-source-upgrade/7b9a5ae/primary/logs/kolla/libvirt/ | 07:35 |
yoctozepto | the file has non-0 size in listing and really should have some content | 07:35 |
yoctozepto | yet downloads as a 0-length file ;/ | 07:35 |
yoctozepto | other files seem fine | 07:36 |
*** ykarel|afk is now known as ykarel | 07:37 | |
*** tosky has joined #opendev | 07:44 | |
openstackgerrit | Daniel Pawlik proposed zuul/zuul-jobs master: [DNM] - testing updated software-factory images https://review.opendev.org/717210 | 07:48 |
*** ykarel is now known as ykarel|afk | 07:55 | |
*** rpittau|afk is now known as rpittau | 07:56 | |
*** ralonsoh has joined #opendev | 07:57 | |
frickler | yoctozepto: iirc we had some reports of incomplete multipart-uploads, maybe the same is happening for single-part uploads too, sometimes | 07:59 |
*** ysandeep|rover is now known as ysandeep|rover|l | 08:03 | |
yoctozepto | frickler: ack, sad to lose logs | 08:04 |
*** ykarel|afk is now known as ykarel|lunch | 08:09 | |
*** dpawlik has quit IRC | 08:11 | |
*** dpawlik has joined #opendev | 08:11 | |
*** ysandeep|rover|l is now known as ysandeep|lunch | 08:32 | |
openstackgerrit | Merged openstack/project-config master: Add hourly periodic pipeline https://review.opendev.org/717063 | 08:44 |
openstackgerrit | Merged openstack/project-config master: Be clear that zone repos are owned by infra-core https://review.opendev.org/717121 | 08:44 |
*** rmart04 has joined #opendev | 08:48 | |
*** hashar has joined #opendev | 09:03 | |
*** ysandeep|lunch is now known as ysandeep|rover | 09:09 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add support for RedHat platforms on install-podman https://review.opendev.org/716578 | 09:14 |
*** ykarel|lunch is now known as ykarel | 09:26 | |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: release-approval pipeline: fix zuul-excluding regexp https://review.opendev.org/717277 | 10:06 |
*** rpittau is now known as rpittau|bbl | 10:29 | |
*** ysandeep|rover is now known as ysandeep|break | 11:05 | |
*** ysandeep|break is now known as ysandeep|rover | 11:48 | |
*** rpittau|bbl is now known as rpitau | 12:08 | |
*** rpitau is now known as rpittau | 12:09 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job https://review.opendev.org/679082 | 13:03 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: tox: allow tox to be upgraded https://review.opendev.org/690057 | 13:11 |
fungi | yoctozepto: does that file usually get served okay from other build results? | 13:20 |
openstackgerrit | Merged openstack/project-config master: release-approval pipeline: fix zuul-excluding regexp https://review.opendev.org/717277 | 13:20 |
fungi | interesting, certcheck says the https cert for mirror02.mtl01.inap.opendev.org isn't getting renewed | 13:24 |
fungi | ahh, it's the stale apache worker problem again | 13:26 |
fungi | openssl says "Not After : Jun 22 03:04:36 2020 GMT" there for me | 13:28 |
fungi | i've restart apache on it for good measure | 13:29 |
fungi | er, restarted | 13:29 |
fungi | #status log restarted apache on mirror02.mtl01.inap since certcheck spotted a stale apache worker | 13:30 |
openstackstatus | fungi: finished logging | 13:30 |
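A rough sketch of the arithmetic behind an expiry check like the one certcheck triggered on: it turns the "Not After" string fungi quotes from openssl into days remaining. The 30-day threshold, the function names and the check time are assumptions for illustration only; the log does not say how certcheck decides a certificate is not being renewed.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_remaining(not_after, now_epoch):
    """Days between now_epoch and an openssl-style notAfter timestamp."""
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / SECONDS_PER_DAY


def expires_soon(not_after, now_epoch, warn_days=30):
    """True when the certificate expires in fewer than warn_days days.

    warn_days is a hypothetical threshold, not certcheck's real setting.
    """
    return days_remaining(not_after, now_epoch) < warn_days


# notAfter value quoted in the log; the check time is a made-up example.
NOT_AFTER = "Jun 22 03:04:36 2020 GMT"
check_time = ssl.cert_time_to_seconds("May 23 03:04:36 2020 GMT")
print(days_remaining(NOT_AFTER, check_time))
```

A stale Apache worker would keep answering with the old certificate, so a check like this can flag a host even after the renewed certificate is on disk, which fits fungi seeing a later date than certcheck did.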
yoctozepto | fungi: yeah, it does | 13:37 |
fungi | okay, so it was just in that one build where it's corrupted, not a file which is consistently corrupted across builds. in that case frickler's theory is a good onw | 13:40 |
fungi | good one | 13:40 |
openstackgerrit | Merged opendev/system-config master: Run service-bridge in zuul and semaphore everything https://review.opendev.org/716745 | 13:42 |
openstackgerrit | Merged opendev/system-config master: Migrate gitea-lb to zuul https://review.opendev.org/716746 | 13:44 |
openstackgerrit | Merged opendev/system-config master: Run letsencrypt in zuul https://review.opendev.org/716747 | 13:47 |
openstackgerrit | Merged opendev/system-config master: Run nodepool in zuul https://review.opendev.org/716770 | 13:47 |
*** ysandeep|rover is now known as ysandeep|away | 13:59 | |
openstackgerrit | Monty Taylor proposed openstack/project-config master: zuul-worker: remove python-apt & libselinux deps https://review.opendev.org/716785 | 14:20 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: zuul-worker: remove additional install of apt-transport-https https://review.opendev.org/716789 | 14:20 |
openstackgerrit | Merged openstack/project-config master: Trigger infra-prod-service-nodepool on nodepool changes https://review.opendev.org/717135 | 14:24 |
mordred | woot. we are now triggering nodepool builder ansible when we land nodepool things in project-config | 14:29 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add support for RedHat platforms on install-podman https://review.opendev.org/716578 | 14:35 |
clarkb | mordred: part of me wants to merge silly changes and just watch it work. Disable gitea04, reenable gitea04 etc | 14:48 |
*** sgw has joined #opendev | 14:49 | |
*** ykarel is now known as ykarel|away | 14:52 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: tox: allow tox to be upgraded https://review.opendev.org/690057 | 15:01 |
*** yoctozepto has quit IRC | 15:17 | |
*** yoctozepto8 has joined #opendev | 15:18 | |
mordred | clarkb: yeah. that's actually why I pushed up updates to ianw's project-config patches above | 15:24 |
mordred | clarkb: https://review.opendev.org/#/c/716785/ should be safe to land and we can watch the ansible run | 15:24 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 15:36 |
mordred | clarkb, fungi : oh - we need to do this: https://review.opendev.org/#/c/717112/ | 15:41 |
mordred | (this is gerrit followup, not infra-prod-zuul stuff) | 15:42 |
clarkb | mordred: I don't see where ${git_cmd} is defined in those crons? | 15:43 |
*** rpittau is now known as rpittau|afk | 15:43 | |
mordred | clarkb: oh! good point | 15:45 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add cron jobs that were managed by puppet https://review.opendev.org/717112 | 15:46 |
mordred | clarkb: also - I don't think we need to escape those " do we? | 15:47 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add cron jobs that were managed by puppet https://review.opendev.org/717112 | 15:48 |
*** ysandeep|away is now known as ysandeep | 15:49 | |
clarkb | mordred: nope, that's from puppet string interpolation | 15:49 |
clarkb | mordred: because we have to use " in puppet to substitute git_cmd | 15:49 |
mordred | yah | 15:50 |
corvus | mordred: i want to jump in here; where do i start? | 15:50 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 15:50 |
mordred | corvus: which thing do you want to jump in with? | 15:50 |
corvus | mordred: the fun, easy thing? or maybe the zuul cd stuff | 15:50 |
mordred | corvus: the zuul cd is all fun and easy! | 15:51 |
corvus | mordred: should i just start reviewing 716771? | 15:51 |
mordred | if you review and land https://review.opendev.org/#/c/716785/ | 15:51 |
clarkb | mordred: one minor string thing related to the "s | 15:51 |
clarkb | (inlined on the change) | 15:51 |
mordred | corvus: you can watch zuul run nodepool playbooks when it's done | 15:51 |
corvus | mordred: ah cool didn't see that | 15:51 |
mordred | corvus: because we landed that! | 15:51 |
mordred | clarkb: we're sure we don't need to double \ in ansible? | 15:52 |
clarkb | mordred: you don't have it on line 326 | 15:52 |
mordred | good point | 15:53 |
clarkb | and I think the yaml ' quoting will preserve a \ | 15:53 |
clarkb | but now I'm double checking | 15:53 |
mordred | although I should put a ' on 326 | 15:53 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add cron jobs that were managed by puppet https://review.opendev.org/717112 | 15:53 |
clarkb | yup yaml single quotes preserves the \ | 15:53 |
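The quoting rule clarkb confirms can be sketched as a tiny decoder for YAML single-quoted scalars: in that style the only escape is a doubled single quote, so a backslash passes through untouched. This is a simplified illustration (it ignores multi-line folding) rather than a real YAML parser, and the cron command shown is hypothetical because the actual cron entries are not quoted in the log.

```python
def yaml_single_quoted_value(token):
    """Decode a one-line YAML single-quoted scalar.

    Only '' (two single quotes) is an escape in this style; backslashes
    are ordinary characters and are kept as written.
    """
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("not a single-quoted scalar")
    return token[1:-1].replace("''", "'")


# Hypothetical cron job value, written the way it would appear in YAML.
yaml_token = r"'find /tmp/example -mtime +1 -exec rm {} \;'"
print(yaml_single_quoted_value(yaml_token))
```

With double quotes the backslash would start an escape sequence instead, which is why the single-quoted form needs no doubling of the backslash.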
mordred | corvus: https://review.opendev.org/#/q/status:open+topic:infra-prod-zuul is the remaining zuul-cd stack | 15:54 |
mordred | corvus: paused at landing it on the nodepool one so that we could watch that work before doing more of them | 15:54 |
*** JayF is now known as JasonF | 15:55 | |
corvus | mordred: something is still rubbing me a little wrong about 716771 -- why do we have so many jobs instead of one playbook that does all those things? | 15:55 |
*** JasonF is now known as JayF | 15:55 | |
corvus | mordred: like the system-config, install-ansible, base, letsencrypt.... | 15:55 |
corvus | shouldn't there be a 'meetpad' playbook that does all of those if necessary? | 15:56 |
corvus | mordred: oh, most of those are soft depends... | 15:56 |
*** ysandeep is now known as ysandeep|away | 15:56 | |
mordred | corvus: oh - well, I mean - that would be a rearchitecture of our existing playbooks which I wasn't really doing ... yeah | 15:56 |
corvus | so they won't normally run? | 15:56 |
mordred | yeah | 15:56 |
mordred | they've all got file matchers on them | 15:56 |
mordred | so we should be running what we need to run when we need to run it and not running if we don't | 15:57 |
mordred | (except for in periodic, where we'll run everything) | 15:57 |
corvus | well, we started with 0 playbooks, and i thought we wanted a service-based architecture, so i wasn't thinking there should be a rearchitecture. :) | 15:57 |
corvus | (like, i'm thinking all the way back to when we started planning this :) | 15:57 |
corvus | i always thought it was supposed to be "run the etherpad playbook" | 15:57 |
mordred | yeah. I think there's also a reason for some of the base separation (except for base itself) | 15:58 |
corvus | anyway, i guess the soft deps make it moot for now | 15:58 |
mordred | which is that we need to run install-ansible separately so that it gets picked up in the next ansible runs | 15:58 |
mordred | and we need to update system-config separately | 15:58 |
clarkb | I think what we are doing is mapping the current shell script of run this playbook run that playbook, onto zuul | 15:58 |
mordred | base I *totally* agree with | 15:58 |
clarkb | from there we can continue to split things | 15:58 |
mordred | base should probably go away and be things we run in each service playbook | 15:58 |
corvus | mordred: why does meetpad need system-config updated? | 15:59 |
clarkb | the only real difference so far has been knowing when we need to run playbooks and not running them if we don't need to (but the old shell script ran them all regardless) | 15:59 |
mordred | because system-config is where the meetpad playbook is | 15:59 |
corvus | mordred: and we're not syncing repos from zuul onto bridge | 15:59 |
mordred | so if we update the meetpad playbook we need to pull the latest system-config to get the updated playbook | 15:59 |
mordred | right | 15:59 |
mordred | we can probably move to that | 15:59 |
mordred | and remove update-system-config | 15:59 |
mordred | but it'll be easier to do that once _everything_ is a zuul job and run_all is gone | 15:59 |
corvus | mordred: ack; i don't have an opinion on which is better yet :) | 16:00 |
mordred | me either :) | 16:00 |
mordred | corvus: one downside to including base in service playbooks is that it's a LOT of tasks that are mostly no ops most of the time but still take a decent amount of time to run | 16:00 |
mordred | by keeping it separate and triggering with files matchers we can avoid nooping the install of all the infra sysadmins each time and only run the thing that updated | 16:01 |
mordred | we'd also wind up with a larger surface area of triggering if we triggered on all of the base files for every service | 16:01 |
mordred | but ... I think we could also go to single-playbook and get rid of base and that would be valid too | 16:02 |
mordred | (I think we're still learning here) | 16:02 |
clarkb | and for right now mapping the shell script onto zuul makes the reviews nice and symmetric | 16:02 |
clarkb | it's clearer when things are correct | 16:02 |
corvus | mordred: yeah. letsencrypt might be the more interesting thing to fold in first. | 16:02 |
mordred | corvus: yeah | 16:02 |
mordred | corvus: and would be the one that would make more important logical service sense | 16:03 |
corvus | mordred: +2 with comment on 716771 | 16:03 |
mordred | corvus: I like that comment | 16:03 |
openstackgerrit | Merged openstack/project-config master: zuul-worker: remove python-apt & libselinux deps https://review.opendev.org/716785 | 16:03 |
mordred | corvus: I think I might do a followup that goes in and follows that pattern across all the jobs | 16:03 |
mordred | (where appropriate) | 16:03 |
corvus | (i'm confident the first half is a good idea; the second one may have unintended consequences) | 16:04 |
clarkb | don't forget to do similar on the group_vars path too :) | 16:05 |
clarkb | though maybe that's what the second suggestion gets us globally | 16:05 |
corvus | mordred: why doesn't 716772 have a dep on base, etc? | 16:05 |
mordred | corvus: yay! the nodepool project-config change failed in promote - | 16:05 |
mordred | fix coming | 16:05 |
mordred | corvus: because they're dep'd in infra-prod-service-base | 16:06 |
corvus | mordred: then shouldn't that be true for meetpad? | 16:06 |
mordred | corvus: meetpad adds a dep | 16:06 |
clarkb | ya I think we decided that you can't merge deps, that changing deps requires them to be relisted | 16:07 |
mordred | and I thought deps were overrides rather than adds - so I've been copying the full dep list when we need to add one | 16:07 |
corvus | i believe that's correct | 16:07 |
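A toy model of the behaviour mordred and corvus describe, where a job variant's dependency list replaces the inherited one instead of extending it. It is not Zuul code and has not been checked against Zuul itself; the dictionaries and dependency names are illustrative stand-ins, not the real job definitions.

```python
def apply_variant(job, variant):
    """Return a copy of job with variant applied.

    Every key in the variant, including list values such as
    'dependencies', overrides the inherited value outright.
    """
    result = dict(job)
    result.update(variant)
    return result


# Illustrative names only, loosely based on the jobs mentioned in the log.
base_job = {
    "name": "service-base",
    "dependencies": ["update-system-config", "install-ansible", "base"],
}

# Listing only the new dependency silently drops the inherited ones.
wrong = apply_variant(base_job, {"name": "meetpad", "dependencies": ["letsencrypt"]})

# Repeating the full list plus the addition keeps everything.
right = apply_variant(
    base_job,
    {"name": "meetpad", "dependencies": base_job["dependencies"] + ["letsencrypt"]},
)

print(wrong["dependencies"])
print(right["dependencies"])
```

Under this model, copying the whole list whenever one dependency is added, as mordred has been doing, is the only way to keep the inherited entries.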
openstackgerrit | Monty Taylor proposed openstack/project-config master: Add update-system-config to promote list for project-config https://review.opendev.org/717328 | 16:10 |
mordred | corvus, clarkb : that should fix the error from https://review.opendev.org/#/c/716785/ | 16:10 |
mordred | corvus: this is fun isn't it? | 16:11 |
corvus | mordred: :) | 16:12 |
mordred | (we could alternately give it a dependencies: [] - since it's a project-config change it doesn't ACTUALLY need to update system-config) | 16:12 |
mordred | corvus, clarkb : once 717328 lands, we can land https://review.opendev.org/#/c/716789/ as a second test | 16:15 |
clarkb | k | 16:16 |
clarkb | I'm about to enter tech support for remote learning/school stuff. It's fun because we get to use zoom! | 16:16 |
fungi | you're just secretly planning to zoom-bomb your preschoolers' virtual classrooms, aren't you? | 16:17 |
clarkb | fungi: well not sure if you know about the behavior of 4 year olds but they basically zoom bomb each other already | 16:17 |
clarkb | the teachers have discovered they can manage mute toggle of participants though so hopefully things will get better | 16:18 |
fungi | chances are the students showed them where | 16:20 |
corvus | clarkb: https://www.nytimes.com/2020/03/30/technology/new-york-attorney-general-zoom-privacy.html | 16:20 |
*** mlavalle has quit IRC | 16:20 | |
clarkb | corvus: yes I'm well aware | 16:20 |
clarkb | ironically if I want to read the nytimes I have to use the tabs with google account enabled | 16:21 |
corvus | clarkb: k, wasn't sure if you saw that article yet -- it's in a pretty accessible form that non-tech-experts might be comfortable digesting | 16:21 |
corvus | mordred: we can't land 789, it has a depends-on, and i think we'd need a new nodepool builder image | 16:23 |
*** mlavalle has joined #opendev | 16:23 | |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: opensuse: fix python 2.x install https://review.opendev.org/716437 | 16:23 |
mordred | corvus: ah - oh, good point | 16:23 |
clarkb | (by the way firefox tab containers are great! you discover fun things like that when your default has nothing signed in, website doesn't work, switch to container group with google and voila!) | 16:23 |
corvus | mordred: want to make a no-op change to test, or just leave it for the next time an element changes? | 16:24 |
fungi | clarkb: i use ff tab containers extensively on a few of my machines, but have slowly warmed to the idea of just setting my browser permanently incognito | 16:25 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Trigger nodepool run https://review.opendev.org/717330 | 16:26 |
mordred | corvus: there's a noop | 16:27 |
mordred | corvus: IIRC - noop changes match all file matchers so all the jobs should get run | 16:27 |
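The file-matcher behaviour discussed above can be sketched as follows: a job runs when any changed file matches one of its patterns, and a change with no files is treated as matching everything. The empty-change rule comes from mordred's "IIRC" and is an assumption here, as is the use of `re.match`; the patterns and paths are hypothetical.

```python
import re


def job_should_run(file_patterns, changed_files):
    """Decide whether a job with file matchers runs for a change."""
    if not changed_files:
        # Assumption taken from the log: an empty (no-op) change
        # matches all file matchers, so every job runs.
        return True
    return any(
        re.match(pattern, path)
        for pattern in file_patterns
        for path in changed_files
    )


# Hypothetical matcher list for a nodepool deployment job.
nodepool_patterns = [r"nodepool/.*"]

print(job_should_run(nodepool_patterns, ["nodepool/nodepool.yaml"]))
print(job_should_run(nodepool_patterns, ["zuul.d/projects.yaml"]))
print(job_should_run(nodepool_patterns, []))
```

This is the property the "Trigger nodepool run" change relies on: it touches no files, so it should start the nodepool job without editing anything under the matched paths.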
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: Build openSUSE images on opensuse nodepool image https://review.opendev.org/717331 | 16:28 |
openstackgerrit | Merged openstack/project-config master: Add update-system-config to promote list for project-config https://review.opendev.org/717328 | 16:37 |
fungi | clarkb: just got around to looking at the flock the opensuse-15 builds on nb02 are complaining about, lsof says one of the processes with an open file handle on it is: | 16:42 |
fungi | nodepool 24024 0.0 0.0 26960 3236 ? S Apr01 0:00 /bin/bash /opt/dib_tmp/dib_build.DFvCP7NK/hooks/extra-data.d/98-source-repositories | 16:42 |
fungi | that seems like it's been running longer than i would expect | 16:42 |
mordred | clarkb, fungi, corvus : quick +A on https://review.opendev.org/#/c/717330 ? | 16:42 |
fungi | mordred: looking | 16:42 |
mordred | fungi: I agree, that seems a little longer than we'd want | 16:42 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 16:42 |
*** yoctozepto8 is now known as yoctozepto | 16:43 | |
clarkb | fungi: oh I was looking at nb01 not nb02, but nb01 definitely has stale processes | 16:43 |
clarkb | ps -elf | grep disk-image-create | 16:43 |
clarkb | mordred: and we expect that to match all file matchers? I guess we'll find out? | 16:44 |
fungi | it seems like a harmless experiment at the very least | 16:46 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Support multiple matchers when parsing tox output https://review.opendev.org/716263 | 16:46 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Don't silently ignore exceptions when parsing tox output https://review.opendev.org/716766 | 16:47 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Strip source dir from file comments https://review.opendev.org/716264 | 16:47 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Ignore absolute paths after stripping work dir https://review.opendev.org/717042 | 16:47 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Add test cases for tox line comment parsing https://review.opendev.org/717341 | 16:47 |
clarkb | fungi: my hunch is that certain things crash in such a way that the processes don't clean up nicely | 16:47 |
fungi | though long term i wonder if we really want to be littering our repos with a bunch of empty commits just to manually trigger things | 16:47 |
clarkb | fungi: but I have no idea what that may be | 16:47 |
fungi | clarkb: looking at ps tree view suggests it's stalled out installing distro packages | 16:47 |
fungi | yum install install dnf dnf-plugins-core curl libcurl glibc-minimal-langpack glibc-langpack-en coreutils | 16:48 |
fungi | s/install // | 16:48 |
clarkb | oh hrm that's fedora | 16:48 |
clarkb | or does centos 8 do dnf now too? | 16:48 |
fungi | oh, yep, i was looking at the wrong tree | 16:48 |
clarkb | (fedora should be on nb04 only fwiw) | 16:48 |
fungi | yeah, so 24024 on nb02 has no children | 16:50 |
fungi | strace says it's stuck at: write(1, "Updating cache of https://opende"..., 160 | 16:50 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 16:50 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 16:52 |
fungi | looks like AJaeger and i are the only subscribers on service-discuss so far | 16:53 |
clarkb | thanks for the reminder! | 16:54 |
fungi | don't forget to sign up if you want to use our shiny new ml! http://lists.opendev.org/cgi-bin/mailman/listinfo/service-discuss | 16:54 |
fungi | clarkb: also i'm happy to co-admin the list if you let me know what you wind up resetting the listadmin pw to | 16:54 |
clarkb | fungi: k, I'm sorting that out now | 16:55 |
fungi | there's no hurry | 16:55 |
fungi | while i'm poking around, it looks like we should do a better job of promoting the service-announce ml too... it currently has 5 subscribers (4 of whom are on the osf staff, i don't recognize the 5th one) | 16:56 |
fungi | and probably want to make sure service-announce is set to moderate all posts with followup to service-discuss | 16:57 |
fungi | i'm happy to help admin that one too if you like | 16:57 |
openstackgerrit | Merged openstack/project-config master: Trigger nodepool run https://review.opendev.org/717330 | 16:58 |
corvus | fungi, clarkb: i think a "please subscribe to service-discuss" email sent to openstack-infra might be warranted? | 16:59 |
fungi | corvus: absolutely | 16:59 |
fungi | didn't know if we wanted to wait for next tuesday, but i don't see a compelling reason to hold off | 17:00 |
fungi | on a related note, there are only 5 non-bots in http://eavesdrop.openstack.org/irclogs/%23opendev-meeting/ | 17:00 |
fungi | but it's ready for service too | 17:00 |
fungi | maybe we should do a quick pass of the two service-.* ml configs and then suggest subscribing to both when we e-mail the old ml? | 17:01 |
clarkb | fungi: sounds good | 17:03 |
clarkb | fungi: what do you mean by "with followup to service-discuss"? | 17:05 |
clarkb | forward to service-discuss? | 17:05 |
fungi | reply-to | 17:05 |
clarkb | ah | 17:05 |
clarkb | ok I'll give that a go then you can double check what I've done | 17:06 |
fungi | so that if someone replies to an announcement from the service-announce ml their reply goes to service-discuss | 17:06 |
mordred | corvus: https://review.opendev.org/#/c/717330/ ... we're getting a graph issue because we didn't define infra-prod-letsencrypt, which is a soft-dep from nodepool | 17:08 |
clarkb | fungi: I've got the reply to set. Digging around for forcing moderation on all emails | 17:09 |
corvus | mordred: that suggests you'll need the image promote job too, yeah? | 17:10 |
clarkb | fungi: is that "set everyone's moderation bit, including those not visible" option the one that I want? | 17:11 |
mordred | corvus: should we maybe instead re-define the deps list there in project-config | 17:11 |
corvus | mordred: so maybe the dependencies:[] is the better approach? | 17:11 |
corvus | mordred: yeah that | 17:11 |
clarkb | I'm taking the secrets file lock in order to add these passwds there as others may need to interact with the lists from time to time | 17:11 |
*** mlavalle has quit IRC | 17:11 | |
fungi | clarkb: i think that's a one-time thing to set the moderation bit on all current subscribers, but i'll look in just a sec | 17:12 |
mordred | corvus: should we keep update-system-config? | 17:13 |
mordred | corvus: my only concern would be de-duplication of the nodepool job - but project is part of the uniqifier right? | 17:13 |
clarkb | done with the lock | 17:14 |
corvus | mordred: deduplication? like in the supercedent pipeline? | 17:14 |
mordred | yeah | 17:14 |
fungi | clarkb: almost on the reply-to... see the option immediately above the field you filled in | 17:14 |
corvus | mordred: pipelines deduplicate items, not jobs. so a change to project-config followed by a change to system-config will always run the jobs for both. | 17:15 |
clarkb | fungi: I want to set that to explicit address and remove the one I set? | 17:15 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Update deps list for nodepool job https://review.opendev.org/717351 | 17:15 |
mordred | corvus: cool - then that ^^ should do what we want | 17:15 |
fungi | clarkb: pretty sure you need both, set reply_goes_to_list to explicit address radio button, then specify the explicit address in the reply_to_address option immediately below it | 17:16 |
corvus | mordred: (the only thing that would be potentially deduplicated is two changes to the same branch of project-config in a row. note that means there's actually a pretty high probability that we may not run the jobs we need to.) | 17:16 |
clarkb | fungi: gotcha | 17:16 |
clarkb | fungi: done, if you want to check it | 17:16 |
mordred | corvus: yeah - I was wondering how file matchers and supercedent pipeline would work out | 17:16 |
corvus | mordred: i think promote with supercedent probably works best without file matchers | 17:16 |
fungi | clarkb: afaik reply_to_address is only ever used *if* reply_goes_to_list is set to explicit reply address | 17:17 |
corvus | mordred: we may want to take AJaeger's advice and make a new pipeline | 17:17 |
mordred | yeah. so ... that's unfortunate :( | 17:17 |
mordred | yeah | 17:17 |
corvus | mordred: make an independent change-merged pipeline, so there's no deduplication | 17:17 |
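A simplified simulation of the problem corvus points out with a supercedent pipeline combined with file matchers. The model assumes one queue per project and branch holding one running item and at most one waiting item, with a newer arrival replacing the waiting one. The job names, path prefixes and change numbers are hypothetical, and the class is a sketch of the idea, not of Zuul's implementation.

```python
class SupercedentQueue:
    """One running item plus at most one waiting item; newer replaces waiting."""

    def __init__(self):
        self.running = None
        self.waiting = None
        self.dropped = []

    def enqueue(self, item):
        if self.running is None:
            self.running = item
            return
        if self.waiting is not None:
            self.dropped.append(self.waiting)  # superseded, never runs
        self.waiting = item

    def finish_running(self):
        done = self.running
        self.running, self.waiting = self.waiting, None
        return done


# Hypothetical jobs, each limited to files under one path prefix.
JOB_FILE_PREFIXES = {"run-nodepool": "nodepool/", "run-docs": "docs/"}


def jobs_for(item):
    """Jobs whose file matcher matches this item's own changed files."""
    return [
        job
        for job, prefix in JOB_FILE_PREFIXES.items()
        if any(path.startswith(prefix) for path in item["files"])
    ]


def simulate(items):
    queue = SupercedentQueue()
    for item in items:
        queue.enqueue(item)
    ran = []
    while queue.running is not None:
        ran.extend(jobs_for(queue.finish_running()))
    return ran, [item["change"] for item in queue.dropped]


merged = [
    {"change": 1, "files": ["docs/a.rst"]},
    {"change": 2, "files": ["nodepool/nodepool.yaml"]},
    {"change": 3, "files": ["docs/b.rst"]},
]
ran, dropped = simulate(merged)
print(ran, dropped)
```

In this run change 2 is superseded while waiting, and since change 3 touches no nodepool files the nodepool job never starts. Without file matchers the run for change 3 would cover the whole branch state, which is the reasoning behind both "promote with supercedent probably works best without file matchers" and the proposed independent deploy pipeline.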
fungi | clarkb: yep, that looks right | 17:17 |
mordred | corvus: ++ | 17:17 |
corvus | mordred: call it 'deploy'? | 17:17 |
mordred | corvus: I'm starting to be sad that we're not in the opendev tenant already | 17:17 |
clarkb | fungi: I'm not seeing any other forced moderation options unless we set allowed message size to 0 | 17:18 |
fungi | clarkb: default_member_moderation under Privacy options...Sender filetrs | 17:20 |
fungi | filters | 17:20 |
fungi | the way to do it is set that to yes and then also manually (individually or mass) set the moderation bit on the handful of existing subscribers | 17:20 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Add a deploy pipeline https://review.opendev.org/717353 | 17:21 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Use deploy pipeline for project-config deployments https://review.opendev.org/717354 | 17:21 |
mordred | corvus: ^^ | 17:21 |
fungi | clarkb: and unset the moderation bit for any subscribers you want to be able to post announcements directly without being reviewed | 17:21 |
fungi | so what will happen then is any new subscribers will automatically be set to moderated | 17:22 |
mordred | corvus: here's another fun edge case ... | 17:22 |
clarkb | fungi: ok I've moderated everyone but you and I and changed that default setting | 17:22 |
corvus | mordred: why not squash those 2? | 17:22 |
clarkb | fungi: thanks for the help | 17:22 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Add test cases for tox line comment parsing https://review.opendev.org/717341 | 17:22 |
mordred | corvus: that'll work? I was thinking the deploy would be a config error until we landed it - but now that I say that, of course it won't be :) | 17:23 |
clarkb | fungi: though the set everyone's moderation bit is flipped. I don't think I did that, did you? | 17:23 |
mordred | corvus: while we're talking about this ... | 17:23 |
fungi | clarkb: my pleasure, we can also untick moderation on our other sysadmins once they subscribe, if there's a consensus that's how we want to operate the announcements list | 17:23 |
fungi | clarkb: i didn't, as far as i know | 17:23 |
clarkb | fungi: perhaps that was toggled by the default changing on the privacy page | 17:23 |
clarkb | fungi: I think that must've been it | 17:23 |
mordred | corvus: some of the deploy jobs depend on an image being published - but that's a promote pipeline thing. should we move those image publications to deploy too, even if it means no de-dupe for them? | 17:23 |
fungi | clarkb: oh, i see what you're talking about, at the bottom of the member list view. that's an action masquerading as a config option ;) | 17:24 |
clarkb | fungi: I thought it was set to no before | 17:24 |
corvus | mordred: yeah, they'll all need to be in the same pipeline | 17:25 |
clarkb | but now I see the "set" button is separate from the list update | 17:25 |
clarkb | so ya all good | 17:25 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Add and use a deploy pipeline https://review.opendev.org/717353 | 17:25 |
fungi | if you hit the set button while the radio toggle is at "on" it will set all current subscribers to moderated. if you hit the set button while the radio toggle is at "off" it'll unset everyone's moderation flag. it's not really a config option on its own | 17:25 |
mordred | corvus: kk. | 17:25 |
clarkb | fungi: yup | 17:25 |
clarkb | I missed the "set" button at first | 17:25 |
clarkb | fungi: on nb01 what do you think of my idea of stopping the builder service there and checking if dib cleans up at that point? | 17:26 |
clarkb | fungi: then we can start the builder up again | 17:26 |
fungi | clarkb: one potential topic for next week's meeting could be whether we want to adopt the same dmarc avoidance policies on service-discuss as we've got applied on openstack-discuss and zuul-discuss | 17:26 |
clarkb | fungi: I want to say we did apply them to -infra for testing? | 17:27 |
clarkb | so maybe we should for consistency, but can discuss it there | 17:27 |
fungi | yeah, i mean i'm in favor, but picking a stance on that and stating it clearly while the list is fresh is a good time to do so | 17:27 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Switch to deploy pipeline for deployments https://review.opendev.org/717356 | 17:28 |
mordred | corvus: k. there's the other side of that then | 17:28 |
fungi | clarkb: also we should set some good description and info text for these | 17:28 |
fungi | i'm happy to take a stab at that | 17:29 |
clarkb | fungi: go for it | 17:29 |
fungi | clarkb: yeah, stopping the builder can't hurt and would be good to find out what actually happens | 17:29 |
*** mlavalle has joined #opendev | 17:29 | |
fungi | though getting to the bottom of the stuck processes might be good too if we have time to do so | 17:30 |
clarkb | ok I'll give that a go on nb01 now and see if it cleans up those processes (i expect they will get zombied and reparented to init) | 17:30 |
mordred | clarkb, AJaeger: https://review.opendev.org/#/c/717353 could use eyeballs - and then https://review.opendev.org/717356 | 17:30 |
mordred | (because of the issue identified with promote + files matchers) | 17:30 |
*** dpawlik has quit IRC | 17:30 | |
clarkb | yup all the dib processes went away when nodepool-builder stopped | 17:30 |
clarkb | maybe we just need nodepool to waitpid harder? | 17:31 |
clarkb | I mean if init can clean them up we should be able to? | 17:31 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Add test cases for tox line comment parsing https://review.opendev.org/717341 | 17:34 |
clarkb | I think I see the nodepool bug | 17:35 |
*** mlavalle has quit IRC | 17:37 | |
corvus | biab | 17:37 |
*** mlavalle has joined #opendev | 17:39 | |
clarkb | https://review.opendev.org/717357 pushed to help with nodepool dib process leaks | 17:41 |
*** mlavalle has quit IRC | 17:42 | |
mordred | clarkb: lgtm | 17:44 |
fungi | clarkb: okay, i've drafted description and info text for both mailing lists, take a look and see if you want to tweak anything | 17:48 |
fungi | inspiration taken from openstack-announce and openstack-discuss | 17:48 |
clarkb | fungi: lgtm | 17:50 |
fungi | i guessed at the scope we ultimately want for the announcements list, we can go over that in the meeting too | 17:52 |
fungi | and we might want to consider forklifting the netiquette wiki article into the infra manual | 17:53 |
fungi | then link to that in the discuss ml info instead of to wiki.openstack.org | 17:53 |
openstackgerrit | sebastian marcet proposed opendev/puppet-openstackid master: Fixed permissions issues on SpammerProcess https://review.opendev.org/717359 | 17:54 |
*** hashar is now known as hasharDinner | 17:55 | |
clarkb | mordred: actually I think that change isn't necessary. There are two loops happening there and I thought there was just one | 18:11 |
clarkb | mordred: we fall into the subprocess checking via the outer loop | 18:12 |
clarkb | so now I think we want to look for logged timeouts | 18:12 |
clarkb | and maybe we need a p.wait() after the kill instead | 18:12 |
*** mlavalle has joined #opendev | 18:13 | |
clarkb | we never log a build timeout though | 18:16 |
clarkb | subprocess does say that process.wait() can deadlock if using PIPE for stdout and stderr. We do use a PIPE; I wonder if that is what is happening here | 18:17 |
clarkb | that would also explain why we don't hit the timeout case because we'd be stuck on the wait condition | 18:18 |
clarkb | I *think* what we want to do is get a thread dump and see if we are deadlocked in subprocess wait routines | 18:18 |
clarkb | fungi: ^ | 18:18 |
fungi | that sounds likely | 18:19 |
fungi | i'm about to disappear on our weekly outing to pick up our grocery order and some takeout pizza though | 18:19 |
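The PIPE deadlock clarkb describes can be sketched as follows. This is a minimal illustration, not nodepool's actual builder code: `run_and_drain` and `CHILD` are hypothetical names. The idea is that `communicate()` keeps reading the pipe while it waits, so a chatty child can never block on a full pipe buffer, and a second `communicate()` after `kill()` reaps the child so it does not linger.

```python
import subprocess
import sys


def run_and_drain(cmd, timeout=None):
    """Run cmd and drain its output while waiting (hypothetical helper).

    Calling wait() on a Popen created with stdout=PIPE can deadlock once
    the child fills the OS pipe buffer; communicate() reads while waiting.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Reap the killed child and collect whatever it wrote before dying.
        out, _ = proc.communicate()
    return proc.returncode, out


# Illustrative child that writes far more than a typical pipe buffer holds.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1_000_000)"]

rc, out = run_and_drain(CHILD, timeout=30)
print(rc, len(out))
```

Whether this matches the real failure would still need the thread dump mentioned above; the sketch only shows the general pattern the subprocess documentation recommends.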
corvus | clarkb, mordred: do we want to: 1) merge the 'deploy' pipeline change and rebase everything in system-config? 2) ignore the deploy pipeline for now and keep merging system-config changes? | 18:25 |
clarkb | corvus: mordred: I'm thinking that maybe we should switch to deploy to start as it will avoid any oddness that might demand debugging later | 18:27 |
clarkb | also why does https://review.opendev.org/#/c/717356/1/.zuul.yaml switch the image builds to deploy from promote? | 18:27 |
corvus | clarkb: because of job dependencies | 18:27 |
clarkb | ah right because we consume those images in deploy | 18:27 |
corvus | yep | 18:27 |
clarkb | I've acced that change and its parent but not approved in case consensus is to switch after landing things | 18:28 |
clarkb | *acked | 18:28 |
corvus | clarkb: i think maybe we should go ahead and merge it; i guess we could probably still move system-config to deploy later | 18:33 |
corvus | er scratch that, nm | 18:33 |
corvus | let's merge that and rebase the system-config stack | 18:33 |
*** ralonsoh has quit IRC | 18:38 | |
*** rmart04 has quit IRC | 18:38 | |
*** hasharDinner has quit IRC | 18:55 | |
*** hashar has joined #opendev | 18:57 | |
mordred | clarkb, corvus : yes - I agree- let's merge it | 19:04 |
mordred | also - I don't think we'll need to rebase the system-config stack - I think the patches in the stack will merge onto it just fine | 19:05 |
clarkb | mordred: oh because we edit the lines above ya | 19:05 |
mordred | (I might be wrong - but I think it'll just work) | 19:05 |
mordred | yeah | 19:05 |
clarkb | git may just do the right thing, I agree | 19:05 |
mordred | there's enough separation | 19:05 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 19:12 |
corvus | fingers crossed :) | 19:17 |
corvus | clarkb, mordred: gah, i did not notice https://review.opendev.org/717351 | 19:17 |
clarkb | why do we not need the deps in that case? | 19:19 |
clarkb | don't we still want system-config playbooks to be updated and LE to run first etc? | 19:19 |
corvus | clarkb: not on project-config where we're only running that to install new nodepool elements | 19:19 |
clarkb | its also installing the nodepool config which could depend on other bits? | 19:20 |
corvus | i dont think it needs to run le | 19:21 |
clarkb | ya I think system-config playbooks are what I'm most concerned about | 19:22 |
clarkb | specifically we write out clouds.yaml for nodepool in system-config but consume that in the nodepool configs in project-config | 19:22 |
corvus | clarkb: but they can't be updated in this change | 19:22 |
corvus | er | 19:22 |
corvus | clarkb: but they can't be updated in a change to project-config | 19:22 |
clarkb | so you might add a new cloud and get that written out without the proper clouds.yaml on disk | 19:22 |
clarkb | corvus: its the consumption that is updated in project-config | 19:22 |
clarkb | https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl01.openstack.org.yaml that stuff | 19:23 |
corvus | clarkb: i don't think we have any testing that the nodepool config file works with the clouds.yaml contents from system-config | 19:24 |
corvus | clarkb: so even if we did run it, zuul isn't going to stop us from merging a change that adds "cloud-that-does-not-exist" to nodepool.yaml | 19:24 |
clarkb | correct its the side effects on production I'm worried about | 19:25 |
clarkb | basically even if we do all the things properly with depends on and merge things in order we could apply them in the wrong order and make things unhappy | 19:25 |
corvus | clarkb: however, if we do the process correctly, then we will add a new cloud to system-config first, when that change merges, the system-config deploy job will put it on disk. then a project-config change to use it can rely that it will be there. | 19:25 |
clarkb | granted it will likely only be unhappy for a short time | 19:25 |
corvus | clarkb: i don't think we can do it in the wrong order, especially with a depends-on | 19:25 |
clarkb | corvus: they can merge concurrently | 19:25 |
clarkb | or at least one after the other | 19:25 |
clarkb | couldn't the project-config deploy job run before the system-config deploy job in that case? | 19:26 |
clarkb | (because zuul's scheduling isn't strictly ordered? though maybe that isn't true for empty nodeset jobs) | 19:26 |
corvus | clarkb: it generally shouldn't but could occasionally run in the wrong order if they merged close together | 19:27 |
corvus | clarkb: don't we have a semaphore though? | 19:27 |
clarkb | corvus: we do have a semaphore does that enforce order or just locking? | 19:27 |
corvus | clarkb: just locking, but the only way they could run out of order is if an executor failed or was slow. so if they share a semaphore, they'll run in order. | 19:28 |
clarkb | ok, I think the risk in this case then is that changes merged together could write invalid configs until the other change's deploy job ran. In most cases that would be a short period of invalid config. | 19:28 |
clarkb | do we think it is worthwhile to properly express that dependency? | 19:29 |
clarkb | Maybe I'm missing the motivation for removing it in the first place | 19:29 |
corvus | i think if there's a depends-on, then we can't break it | 19:29 |
corvus | and i think the motivation is so we don't put a whole bunch of the system-config deploy jobs in the project-config pipeline | 19:30 |
clarkb | the file filter would've restricted it to when nodepool was updated | 19:31 |
corvus | clarkb: yes, but we'd have to add the letsencrypt and base and system-config update when it's not actually possible to make a change to project-config which can do anything that would require those jobs to run | 19:31 |
clarkb | ok maybe thats what I'm missing | 19:31 |
clarkb | we can't run the system-config update without running base and le? | 19:32 |
mordred | we can ... but the deps list for the job currently has them | 19:32 |
corvus | yeah, any change to things that those playbooks touch are in the system-config project; when changes to system-config to touch those things merge, system-config will run the whole set of jobs | 19:32 |
mordred | we could also override the deps list to only have the system-config job | 19:32 |
mordred | instead of overriding to [] | 19:32 |
clarkb | mordred: I think thats what I'm trying to get at :) | 19:33 |
corvus | but the update-system-config job is not necessary | 19:33 |
mordred | but I agree with corvus | 19:33 |
mordred | that the system-config is not necessary | 19:33 |
clarkb | I'm still not sure I understand that | 19:33 |
mordred | because the system-config change that would go in would trigger it | 19:33 |
mordred | so when system-config changes, it'll run the update-system-config | 19:33 |
mordred | and it'll update system-config | 19:33 |
clarkb | right but didn't corvus say they could run out of order in some cases? | 19:33 |
corvus | no, i revised that to say, with a semaphore, that can't happen | 19:34 |
clarkb | and expressing the dep explicitly would avoid that | 19:34 |
mordred | yeah | 19:34 |
mordred | the semaphore gets us here | 19:34 |
clarkb | ok thats no thow I parsed that. I still parsed that as "in some cases wecan run out of order" and those cases would still be addressed by explicit dep | 19:35 |
clarkb | *not how | 19:35 |
corvus | pipeline change queues are ordered; so even though these projects don't share a change queue (because it's an independent pipeline), they're still created in order | 19:35 |
clarkb | corvus: but not necessarily executed in order (which was my question about empty node set jobs) | 19:35 |
corvus | i said that before we started talking about semaphores. i'm sorry i didn't remember we had semaphores at first, but as soon as i did, i attempted to correct myself. | 19:36 |
clarkb | ok | 19:36 |
clarkb | I think I'm still of the opinion that being explicit is worthwhile if it makes the system correct and more intuitively understandable. But if the setup as proposed is correct I'll go with it. | 19:37 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: install-yarn: add coverage for all platforms https://review.opendev.org/717375 | 19:38 |
corvus | clarkb: if we're worried about sequencing there, i'd rather do more work to make sure the jobs only run in the right order (ie, if we relax the semaphore restriction, make sure they still run in the right order). i don't think we should ever need to run the update-system-config on a change to project-config, and i think it's more correct for us not to. | 19:39 |
clarkb | also for anyone else following along here I think the lightbulb just went off a bit brighter. deploy pipeline is not supercedent | 19:40 |
clarkb | this means we'd be running redundant jobs earlier in the queue | 19:40 |
clarkb | or later depending on which way you look at it I suppose | 19:40 |
clarkb | corvus: I think my problem is I'm still thinking of it from a supercedent perspective | 19:42 |
clarkb | corvus: where you want any one run to ensure all of its deps run in that pipeline item | 19:42 |
clarkb | what we've done is split that out into running jobs for every change in the pipeline (in order via semaphore) | 19:43 |
clarkb | and so don't need each thing to always have its deps (they'll have run ahead) | 19:43 |
corvus | clarkb: yes, we can (due to semaphore) guarantee that | 19:44 |
corvus | clarkb, mordred: we may want to experiment with having a dependent pipeline with a defined shared queue | 19:44 |
AJaeger | anybody to review updating openstack-zuul-jobs hacking version, please? https://review.opendev.org/716261 | 19:45 |
corvus | it's unorthodox, but could be interesting | 19:45 |
mordred | corvus: hrm. that could be interesting | 19:45 |
openstackgerrit | Merged zuul/zuul-jobs master: Add phoronix-test-suite job https://review.opendev.org/679082 | 19:46 |
mordred | I'm not 100% sure we need it given the semaphore thing ... but it also might not be crazy | 19:46 |
*** dpawlik has joined #opendev | 19:48 | |
openstackgerrit | Merged openstack/project-config master: Update deps list for nodepool job https://review.opendev.org/717351 | 19:51 |
openstackgerrit | Merged openstack/project-config master: Add and use a deploy pipeline https://review.opendev.org/717353 | 19:51 |
corvus | mordred: it'll look really good | 19:52 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: install-yarn: always install https://review.opendev.org/717375 | 19:56 |
openstackgerrit | Merged zuul/zuul-jobs master: Support multiple matchers when parsing tox output https://review.opendev.org/716263 | 20:00 |
openstackgerrit | Merged zuul/zuul-jobs master: Don't silently ignore exceptions when parsing tox output https://review.opendev.org/716766 | 20:00 |
openstackgerrit | Merged zuul/zuul-jobs master: Strip source dir from file comments https://review.opendev.org/716264 | 20:00 |
openstackgerrit | Merged zuul/zuul-jobs master: Ignore absolute paths after stripping work dir https://review.opendev.org/717042 | 20:00 |
openstackgerrit | Merged zuul/zuul-jobs master: Add test cases for tox line comment parsing https://review.opendev.org/717341 | 20:02 |
mordred | corvus: well that's reason enough | 20:03 |
mnaser | i dont see more than 1 log for bionic on nb01 | 20:08 |
fungi | this discussion reminds me, i guess it's dangerous to use file filters on jobs in supercedent pipelines, since the deduplicated builds might run different sets of jobs | 20:08 |
mnaser | but https://nb01.openstack.org/ubuntu-bionic-plain-0000000031.log from a couple days ago installed gnupg https://nb01.openstack.org/ubuntu-bionic-plain-0000000043.log the most recent one does not show it being installed | 20:09 |
corvus | see scrollback in #zuul for info about a possible image problem | 20:09 |
corvus | what's ubuntu-bionic vs ubuntu-bionic-plain? | 20:09 |
fungi | -plain is the version without the pip-and-virtualenv element | 20:09 |
clarkb | it is what we are using to test that removing pip doesn't completely break the world | 20:09 |
clarkb | (because we'll have zuul jobs cover the delta) | 20:10 |
corvus | ok, so 'ubuntu-bionic' is mostly what we're concerned about, but '-plain' may give us extra data points | 20:10 |
mnaser | right, but i think what was hinted about the relation to the zuul-worker role change | 20:10 |
fungi | yeah, eventually that will disappear when we fold that change into the real images | 20:10 |
mnaser | and nb01 doesnt have logs except for 1 single bionic build | 20:10 |
mnaser | so this one provided some contextual history | 20:10 |
clarkb | I know mordred had made changes to things to not install suggests and recommends but I believe that was our docker images only | 20:10 |
clarkb | and gpg-agent is a requires not suggests or recommends of gnupg anyway | 20:10 |
corvus | oh i thought we applied it elsewhere too? | 20:11 |
mnaser | basically 2020-04-02 05:30 had gnupg and 2020-04-03 06:57 did not | 20:11 |
corvus | (i thought i recalled a sort of "we're doing it here, let's do it in the other place" commit message) | 20:11 |
clarkb | corvus: ah that could be | 20:11 |
mordred | clarkb: yeah - and that was actually because we already do that in our dib images | 20:12 |
corvus | mordred: was that a change to dib rather than a change to p-c? | 20:12 |
fungi | last dib release was monday | 20:12 |
mordred | it wasn't either - it was a change to python-base docker to match what p-c was already doing in dib images | 20:12 |
fungi | and we're installing dib from tagged releases | 20:12 |
corvus | mordred: okay, so nothing about that changed recently in our disk images | 20:13 |
mordred | nope | 20:13 |
fungi | so unless something delayed image deployment by a couple days the dib release seems unlikely to have brought this behavior in | 20:13 |
corvus | fungi: if nb04 is implicated, that's a possibility, but i think only nb01/nb02 are right now | 20:14 |
openstackgerrit | Merged zuul/zuul-jobs master: helm: collect kubernetes logs in post https://review.opendev.org/715709 | 20:14 |
corvus | (i mean, it's still a possibility on nb01/2 but it's remote) | 20:14 |
fungi | yeah, nb04 is still only building fedora-31 i think? | 20:14 |
clarkb | fwiw python-apt does hard dep gnupg | 20:14 |
mnaser | hmm | 20:14 |
clarkb | so I bet removing python-apt did cause us to stop installing gnupg ( and gpg-agent) | 20:14 |
fungi | er, 30 and 31 | 20:14 |
fungi | https://nb04.opendev.org/ | 20:14 |
mnaser | it looks like install-packages element didnt run at all | 20:15 |
clarkb | what I don't understand is how we have a `gpg` utility on the server without gpg-agent | 20:15 |
clarkb | mnaser: it had to otherwise we wouldn't have an ssh server | 20:15 |
fungi | dib may run apt installing without recommends in some places | 20:15 |
mnaser | ctrl+f on the log which doesnt include gnupg doesnt show `package-installs-v2 --phase install.d` running | 20:15 |
clarkb | fungi: they aren't recommends they are requires | 20:15 |
clarkb | fungi: that is why I'm confused gnupg requires gpg-agent | 20:15 |
fungi | depends, yes | 20:16 |
fungi | i just checked | 20:16 |
fungi | (not requires, that's a python thing) | 20:16 |
mordred | mnaser: I was about to say the same thing | 20:16 |
corvus | yeah, they seem surprisingly different | 20:16 |
mnaser | maybe comparing the 'plain' isn't the right idea but yeah | 20:17 |
mordred | maybe we were getting a transitive dep on install-packages from pip-and-virtualenv | 20:17 |
corvus | mnaser: even comparing https://nb02.openstack.org/ubuntu-bionic-0000104262.log and https://nb01.openstack.org/ubuntu-bionic-0000104264.log shows that difference | 20:17 |
mordred | infra-package-needs has a dep on package-installs | 20:18 |
mordred | not on install-packages | 20:18 |
mnaser | it looks like infra-package-needs is not running | 20:18 |
mnaser | or at least it's not doing the whole mapping thing | 20:18 |
corvus | oh 4262 failed | 20:18 |
clarkb | mnaser: thats what installs sshd though | 20:18 |
corvus | maybe it just didn't get far enough | 20:18 |
clarkb | mnaser: so how did we get an sshd if that didn't run? | 20:18 |
mordred | oh - nm. the element is package-installs | 20:19 |
mnaser | that's a very good question but i'm just trying to put things out | 20:19 |
clarkb | I think its running | 20:19 |
mordred | package-installs-v2 --phase install.d didn't run in the bad log | 20:19 |
mnaser | Map install for infra-package-needs: wget, rsync, ntpdate, git, traceroute, lvm2, ntp, tcpdump, iptables, tox, at, redhat-lsb-core, iproute2, dnsutils, build-essential, cron, gentoolkit, strace, curl, util-linux, rsyslog, redhat-rpm-config, haveged, python3-dev, python-dev, parted, uuid-runtime, coreutils, acpid, iputils-ping | 20:19 |
mnaser | i dont see that in https://nb01.openstack.org/ubuntu-bionic-0000104264.log | 20:19 |
clarkb | search DPKG_MANIFEST_NAME | 20:20 |
mnaser | clarkb: openssh comes in from another element | 20:20 |
mnaser | Map install for openssh-server: openssh-server | 20:20 |
mnaser | so i think the issue is that somehow infra-package-needs list of packages didnt get "appended" inside package-installs | 20:20 |
clarkb | hrm I thought that was in infra-package-needs (I always added it to my own image builds for that reason) but maybe we can get it from multiple places now | 20:20 |
fungi | okay, so here's how we can have gpg and no gpg-agent. the gpg and gpg-agent packages are depends entries for the gnupg metapackage, but you can install the gpg package which merely recommends gnupg | 20:21 |
mnaser | wait, im blind, sorry, infra-package-needs is there | 20:21 |
clarkb | 2020-04-03 17:51:51.325 | Map install for infra-package-needs: rsync, tcpdump, cron, tox, git, haveged, ntpdate, traceroute, util-linux, dnsutils, redhat-lsb-core, build-essential, ntp, python-dev, gentoolkit, acpid, redhat-rpm-config, python3-dev, parted, lvm2, coreutils, rsyslog, iptables, curl, strace, iputils-ping, wget, at, iproute2, uuid-runtime | 20:22 |
* fungi will retype that if people can't read his pizza-writing | 20:22 | |
clarkb | fungi: aha | 20:22 |
clarkb | so ya I bet something is installing gpg but not gnupg | 20:22 |
clarkb | before this didn't matter because python-apt installed gnupg | 20:22 |
clarkb | to fix this we should add gnupg to infra-package-needs | 20:22 |
clarkb | (or revert https://review.opendev.org/#/c/716785/3 ) | 20:23 |
fungi | yeah, i concur, adding gnupg to infra-package-needs ought to solve it | 20:23 |
mordred | ++ | 20:23 |
mnaser | that seems most reasonable to me too | 20:23 |
fungi | but who knows what other package we might also be missing beyond that | 20:23 |
fungi | i guess odds are not many | 20:23 |
clarkb | fungi: well its just things that python-apt and libselinux-python would pull in | 20:23 |
fungi | so better to play whack-a-mole with any we find than roll back | 20:23 |
clarkb | oh and libselinux-python wasn't a thing on debuntu | 20:24 |
clarkb | so just python-apt | 20:24 |
fungi | the upshot of this is that we're not getting images updated, right? | 20:24 |
clarkb | https://packages.ubuntu.com/bionic/python-apt | 20:24 |
fungi | it's not breaking any jobs? | 20:24 |
clarkb | fungi: no we found this because jobs broke | 20:24 |
corvus | fungi: no, we have broken images breaking jobs | 20:24 |
fungi | ahh | 20:24 |
corvus | mostly ones that install things (like docker or yarn) from apt repos | 20:24 |
mnaser | ^ and need to add apt-keys | 20:25 |
fungi | oh, more specifically things which try to use gpg to load third-party repository keys with apt-key? | 20:25 |
corvus | yep | 20:25 |
fungi | makes sense why we wouldn't have spotted that sooner | 20:25 |
corvus | fungi: we spotted it pretty quickly :) | 20:26 |
corvus | change merged 4 hours ago, the image is 2 hours old | 20:26 |
corvus | but yeah, it didn't break *everything* | 20:27 |
corvus | which probably means there aren't too many moles to whack | 20:27 |
fungi | neat | 20:27 |
corvus | is someone going to push up a fix we can merge quickly, or should i push up a change to pause bionic builds? | 20:27 |
mnaser | i can do that quickly | 20:27 |
fungi | thanks mnaser! | 20:27 |
fungi | standing by to review in that case | 20:28 |
clarkb | corvus: working on it | 20:28 |
clarkb | just double checking package names on various systems | 20:28 |
clarkb | I think 'gnupg2' will work universally | 20:28 |
fungi | yeah, gnupg2 in newer debian/ubuntu is just a transitional package but pulls in gnupg which in turn depends on gpg and gpg-agent and more | 20:28 |
mnaser | oh we need to play "figure out the package name" | 20:28 |
corvus | i'll go ahead and delete the current bionic image; we might end up building one more before the cleanup lands, but we can roll the dice on that | 20:28 |
mnaser | ok clarkb is doing that | 20:29 |
clarkb | just trying to confirm on suse now | 20:29 |
fungi | thanks corvus! | 20:29 |
corvus | #status log deleted ubuntu-bionic-0000104264 image due to missing gpg-related packages | 20:29 |
openstackstatus | corvus: finished logging | 20:29 |
mnaser | doesnt seem to be gnupg2 or gnupg | 20:30 |
mnaser | but do we need it on suse-based system anyways? | 20:30 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Install gpg tooling on dib images https://review.opendev.org/717386 | 20:31 |
clarkb | that should do it I think | 20:31 |
clarkb | mnaser: we want to be consistent I think | 20:31 |
clarkb | mnaser: ya I saved suse for last because it is what I run | 20:32 |
clarkb | mnaser: I just have to run zypper search locally :) | 20:32 |
mnaser | clarkb: oh cool, i enjoy using docker for these type of things lately | 20:32 |
mnaser | docker run -it --rm opensuse/tumbleweed /bin/bash and go | 20:32 |
mordred | mnaser: same | 20:35 |
mordred | clarkb: well - amusingly enough this should test our new update to the deps list :) | 20:39 |
clarkb | indeed | 20:39 |
corvus | mnaser: do i recall correctly that all the machines using an image in vexxhost have to be deleted before the image itself can be completely deleted? | 20:39 |
clarkb | corvus: ya due to boot from volume + ceph behavior | 20:40 |
corvus | k. no problem; i just noticed that the bionic image was sticking around in delete state and wanted to confirm that's expected | 20:40 |
corvus | should clear up eventually | 20:40 |
clarkb | I've just noticed that ubuntu gnupg2 is actually a transitional package | 20:41 |
clarkb | fungi: ^ should I push a followon change that will switch to gnupg on debuntu? | 20:41 |
clarkb | (the change that is approved should work, mostly thinking about future distro releases) | 20:42 |
fungi | clarkb: it's probably fine for now | 20:42 |
fungi | when that stops working (which won't be for a year or two probably) we'll start getting an error in the image builds anyway | 20:43 |
fungi | so we can always wait to fix that until we actually have to | 20:43 |
clarkb | k | 20:44 |
fungi | clarkb: especially since part of what's transitional about the gnupg2 package is that it ships symlinks from /usr/bin/gpg2 to /usr/bin/gpg and the like | 20:45 |
fungi | which folks are probably going to want for a long time to come | 20:45 |
clarkb | oh ya those are useful | 20:45 |
fungi | so there's no indication as to when (if ever) that package would go away | 20:45 |
fungi | so many custom scripts and tutorials still reference `gpg2` | 20:46 |
fungi | and from the package maintainers' point of view, continuing to provide that package costs basically nothing | 20:47 |
fungi | so there's little incentive to try to get rid of it | 20:47 |
openstackgerrit | Merged openstack/project-config master: Install gpg tooling on dib images https://review.opendev.org/717386 | 20:59 |
mordred | infra-root: infra-prod-service-nodepool is running in the new deploy pipeline :) | 21:00 |
mordred | and it has run | 21:04 |
clarkb | now we need to trigger an image build but there is probably already one running that we need to complete first (and when that one completes deleting its images will trigger the rebuild for us) | 21:05 |
mordred | yeah | 21:05 |
clarkb | hrm I don't think the files updated on nb01 at least | 21:05 |
clarkb | mordred: /etc/nodepool/elements/infra-package-needs/ doesn't seem to have updated on nb01 or nb02 | 21:06 |
clarkb | did that deploy job run properly? | 21:07 |
mordred | clarkb: it looks like it - I mean- it ran the playbook - lemme go read the playbook | 21:07 |
clarkb | oh wai | 21:08 |
clarkb | I know | 21:08 |
clarkb | the nodepool playbook is only for nb04 I bet | 21:08 |
clarkb | nb01-03 rely on puppet | 21:08 |
fungi | that's it exactly | 21:08 |
mordred | ah. wow. yeah. so ... yeah | 21:09 |
mordred | so we'll get it in the next pulse | 21:09 |
fungi | do we have a way to kill the in progress builds? | 21:09 |
clarkb | ya and nb04 is up to date | 21:09 |
mordred | I think ianw was planning on replacing 1-3 with ansible/docker soon | 21:09 |
clarkb | fungi: it will just restart | 21:09 |
fungi | or are we better off stopping and restarting the builders once puppet runs | 21:09 |
openstackgerrit | Merged opendev/system-config master: Switch to deploy pipeline for deployments https://review.opendev.org/717356 | 21:09 |
clarkb | fungi: ya I think that otherwise nodepool will immediately start a new build | 21:10 |
clarkb | the good news here is nb04 updated as expected so that side of the system is working well :) | 21:10 |
fungi | i mean, this change is going to affect all the images regardless, so we might as well wait for it to land and build everything anew | 21:10 |
fungi | i'll go stop the nodepool-builder service on 01-03 unless there are objections | 21:11 |
mordred | sounds fine to me | 21:11 |
corvus | ++ | 21:11 |
clarkb | yup that should be fine | 21:11 |
mordred | so - for things like nodepool now ... | 21:12 |
fungi | okay, term signals sent to all via initscripts | 21:12 |
mordred | should we remove nb* from remote_puppet_else and add a puppet play to the service-nodepool playbook? | 21:12 |
mordred | that way we'll properly trigger all of nodepool on those project-config patches | 21:12 |
clarkb | mordred: maybe check with ianw monday and see? since I know ianw said moving to docker nodepool builder was one of his next things | 21:13 |
fungi | some still have some long-running subprocesses underway, like qemu-img invocations | 21:13 |
clarkb | and if thats done the old servers just go away | 21:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run puppet on old nb0[1-3] in nodepool playbook https://review.opendev.org/717396 | 21:16 |
mordred | clarkb: ^^ I think we can wait for ianw to land it- but I think we should do that until the other builders are replaced - otherwise we might miss project-config changes and have to wait for the nightly | 21:16 |
mordred | clarkb: but also - yeah - just spinning up nb01-3.opendev.org and deleting the old ones will also be nice | 21:18 |
clarkb | mordred: the cron is still running though right? | 21:18 |
clarkb | so puppet is happening hourly | 21:18 |
mordred | clarkb: well - it is now until we land the remote_puppet_else change :) | 21:18 |
clarkb | oh gotcha | 21:18 |
fungi | the builders have gone idle now, wrapped up pending child processes | 21:19 |
mordred | although we could also just add a trigger | 21:19 |
corvus | mordred: what should we be doing now? merge meetpad change? | 21:19 |
mordred | yeah. | 21:20 |
mordred | clarkb: I forgot - we have a catchall for running remote_puppet_else hourly at the end of the stack | 21:20 |
corvus | clarkb: https://review.opendev.org/716771 | 21:20 |
mordred | https://review.opendev.org/#/c/717064/3 | 21:21 |
mordred | (just - if we land through there we're back to hourly catch-all runs) | 21:21 |
corvus | mordred: do you want to proceed one at a time through that, or do a batch? | 21:21 |
mordred | corvus: at this point I'm pretty happy with how it's running | 21:21 |
mordred | I think we can do a batch - main thing is to make sure we're not missing triggering from something else on any of them | 21:21 |
mordred | (of course, we can always add that as a followup as needed) | 21:22 |
mordred | so - from my pov - I don't think there's any reason to hold off on any of them | 21:22 |
clarkb | I think they've got the reviews they need now | 21:23 |
mordred | well - we need to pause on https://review.opendev.org/#/c/717058 for at least a deploy | 21:23 |
clarkb | mordred: if you want to +A what you are comfortable with | 21:23 |
clarkb | fwiw batches are nice simply if something goes wrong its less things to debug | 21:24 |
clarkb | but also take longer when things are going well :) | 21:24 |
mordred | yeah. well - I clicked +A up to https://review.opendev.org/#/c/717053 and then stopped because it's voteless | 21:24 |
mordred | so let's call that a natural pausing place :) | 21:24 |
corvus | i'm going to stop at 717053; that seems like a really good place to stop | 21:24 |
corvus | mordred: heh, yeah :) | 21:25 |
corvus | not opposed to doing that on a friday afternoon (or saturday), but that seems like a good one to merge only if everything else is stable | 21:26 |
mordred | corvus: re: comment - it was mostly because it kept confusing my eyeballs - and zuul-registry is running in the registry service, so preview felt similarly like an independent service | 21:28 |
mordred | corvus: but- I don't have strong feelings on it in either direction | 21:28 |
corvus | mordred: yeah, it's a +2 comment :) | 21:29 |
clarkb | https://etherpad.openstack.org/p/IBLWO1WBBc email draft for getting people on mailing lists | 21:29 |
clarkb | I worry sending that on friday afternoon might end up leading to it being ignored | 21:30 |
corvus | clarkb: yeah, next week ++ | 21:30 |
clarkb | but can either send that nowish or monday morning | 21:30 |
corvus | 1) sign up for service-discuss. 2) sign up for service-announce. 3) wash hands. | 21:30 |
mordred | corvus: I feel like wash hands should be in the list more | 21:31 |
corvus | clarkb: looks good, made some suggestions | 21:35 |
clarkb | corvus: those edits look great, thanks | 21:36 |
fungi | clarkb: yes, monday will in theory get more eyeballs | 21:38 |
fungi | clarkb: do we want to also use service-discuss to answer service usage questions? if so, the announcement doesn't give that impression with "plan changes to services, notify of meetings, and otherwise communicate about OpenDev" (though i guess it could technically fall into the "otherwise communicate about" category) | 21:43 |
clarkb | ya I was trying to give examples that set it apart from announce | 21:44 |
clarkb | answering service usage questions would fit under otherwise communicate but we should call it out | 21:44 |
fungi | definitely asking for help sets it apart from the scope of the announce ml | 21:44 |
fungi | if it's a place we expect users and collaborators on these services to go for help with them, then i do think it probably merits direct mention | 21:45 |
clarkb | fungi: how does that look? | 21:45 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 21:45 |
fungi | clarkb: perfect! | 21:46 |
fungi | i tried to capture that in the ml description/info texts too | 21:46 |
openstackgerrit | Merged opendev/system-config master: Run meetpad in zuul https://review.opendev.org/716771 | 21:51 |
openstackgerrit | Merged opendev/system-config master: Run mirror-update in zuul https://review.opendev.org/716772 | 21:55 |
openstackgerrit | Merged opendev/system-config master: Run nameserver in zuul https://review.opendev.org/716764 | 21:55 |
*** DSpider has quit IRC | 21:56 | |
*** hashar has quit IRC | 22:05 | |
openstackgerrit | Merged opendev/system-config master: Run mirror in zuul https://review.opendev.org/717048 | 22:08 |
openstackgerrit | Merged opendev/system-config master: Run static in zuul https://review.opendev.org/717049 | 22:08 |
openstackgerrit | Merged opendev/system-config master: Add file matchers for roles used via include_role https://review.opendev.org/717050 | 22:08 |
openstackgerrit | Merged opendev/system-config master: Run backup in zuul https://review.opendev.org/717051 | 22:08 |
openstackgerrit | Merged opendev/system-config master: Run registry in zuul https://review.opendev.org/717052 | 22:08 |
clarkb | fungi: nb01 seems to have new elements | 22:11 |
clarkb | should we start the services there? | 22:11 |
clarkb | nb02 looks good as well | 22:12 |
corvus | that's the current batch of approved changes merged | 22:12 |
fungi | clarkb: yeah, i suppose 03 is less critical but i'll start them all back up now | 22:12 |
clarkb | nb03 also. So ya I think we can start builders again? | 22:12 |
clarkb | fungi: they should all be good now | 22:13 |
fungi | all started again | 22:13 |
corvus | clarkb, mordred: are the prod-service playbooks not running when we add jobs? | 22:27 |
clarkb | corvus: I don't think so because they are in deploy? | 22:28 |
clarkb | (I want to say the auto run thing is only for pre merge things? but maybe I'm wrong) | 22:28 |
corvus | yeah, it may be, but that may be an omission we should see if we can lift | 22:28 |
clarkb | we've definitely been merging changes to trigger them | 22:28 |
clarkb | rather than relying on the self run behavior (because I don't think that happens) | 22:28 |
corvus | i wonder why it doesn't trigger | 22:31 |
*** dpawlik has quit IRC | 22:32 | |
clarkb | https://nb02.openstack.org/ubuntu-bionic-0000104265.log is the log of the image that should fix bionic for us | 22:32 |
clarkb | build is in progress | 22:32 |
clarkb | 2020-04-03 22:35:45.010 | > Get:155 http://mirror.dfw.rax.openstack.org/ubuntu bionic-updates/main amd64 gpg-agent amd64 2.2.4-1ubuntu1.2 [227 kB] | 22:40 |
fungi | lookin good, yep | 22:40 |
mordred | corvus: yeah - I've been thinking that's weird | 23:00 |
corvus | clarkb, fungi, mordred: looking into why new jobs don't run in deploy: i think the change to add the job is merged; it's enqueued into deploy, it sees that it's a config update, so it enqueues a config update event, it processes the pipelines (it's in deploy) and sends out a merger job for it, then it starts the main loop over, reconfigure events are handled first, so it stops the loop and performs a | 23:02 |
corvus | reconfigure event, it resumes, some time passes, the merge job comes back, it generates a diff between the requested and current config, and there is none, so it reads it as not updating the job. | 23:02 |
corvus | in short: by the time zuul evaluates whether it's a change to the config, the running config has changed, so it is no longer a change to the config | 23:03 |
fungi | that's mind-bending | 23:03 |
fungi | but yes | 23:03 |
fungi | it does make sense | 23:04 |
corvus | and i think that behavior is pretty solid (ie, not subject to races). | 23:04 |
fungi | so we need it comparing against the pre-reconfigure config | 23:04 |
clarkb | my mental image now is of zuul-scheduler and zuul-merger giving each other a high five | 23:04 |
fungi | bwahahaha | 23:04 |
* fungi did actually lol | 23:05 | |
clarkb | "we did it!" | 23:05 |
corvus | clarkb: i imagine them attempting a high five and just missing | 23:05 |
fungi | *woosh* | 23:05 |
mordred | corvus: that's amazing | 23:05 |
corvus | [i briefly investigated what would happen if there's a race -- if that were possible, i think it could start a deploy job, and then abort it when the config updated. but i don't think that's actually possible (and i poked around with a unit test to try to make it happen and could not).] | 23:07 |
corvus | so yeah, if we wanted that to happen, we'd have to compare to an earlier config... and, er, saving old versions of the layout may not be the best idea. | 23:08 |
corvus | that's pretty much our most effective memory leak creation tool :) | 23:08 |
fungi | as evidenced by prior cases where we failed to reap extra copies of the layout in a timely fashion | 23:09 |
corvus | this may be a topic to file away and revisit if we make any changes that might make that easier | 23:10 |
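The ordering corvus describes above can be sketched as a toy model. This is only an illustration under stated assumptions: the function names, the job names (`deploy-meetpad`, `deploy-mirror`), and the dict-based "config" are all hypothetical and are not Zuul's real classes or data structures. It shows why the diff comes back empty once the reconfigure has run, and what comparing against a pre-reconfigure copy would change.

```python
# Toy model only: names and structures are hypothetical, not Zuul's real API.

def jobs_to_run(proposed, baseline):
    """Return the jobs the change added or modified relative to baseline."""
    return sorted(
        name for name, definition in proposed.items()
        if baseline.get(name) != definition
    )


def simulate(compare_against_snapshot):
    # Running config before the change merges.
    running = {"deploy-meetpad": "v1"}
    # The merged change adds one new deploy job.
    proposed = dict(running, **{"deploy-mirror": "v1"})

    # Copy of the config taken when the change is enqueued into deploy.
    snapshot = dict(running)

    # Reconfigure events are handled first, so by the time the merge
    # result returns, the running config already contains the new job.
    running = dict(proposed)

    baseline = snapshot if compare_against_snapshot else running
    return jobs_to_run(proposed, baseline)


print(simulate(compare_against_snapshot=False))  # []
print(simulate(compare_against_snapshot=True))   # ['deploy-mirror']
```

The second call corresponds to fungi's suggestion of comparing against the pre-reconfigure config. The cost is the retained `snapshot`, which stands in for the saved layout copies that corvus flags as a memory-leak risk.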
clarkb | the bionic image has built | 23:15 |
clarkb | usually we get new images within about 10 minutes in some clouds | 23:15 |
*** tosky has quit IRC | 23:16 |