ianw | i'm going to reboot it ... see if it reoccurs after a certain build | 00:26 |
---|---|---|
ianw | fedora-30 or xenial-plain was the last thing it tried to build before it went crazy, i think | 00:27 |
ianw | root 24520 24518 0 00:49 ? 00:00:00 lvs --noheadings --separator : -o vg_name,lv_name ... hrm | 00:52 |
prometheanfire | ianw: have a tick to look at https://review.opendev.org/367487 ? | 02:39 |
prometheanfire | another glean thing, but the last one that's both passing tests and has at least one other +2 | 02:39 |
*** jhesketh has joined #opendev | 02:56 | |
*** kevinz has joined #opendev | 03:06 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Fix release for Fedora 31 https://review.opendev.org/721159 | 03:25 |
ianw | ^ i think this might be the "cause" of the nb04 issues -- it causes the build to fail, and something in the unwinding process kills the container .... | 03:26 |
openstackgerrit | Merged openstack/project-config master: Fix release for Fedora 31 https://review.opendev.org/721159 | 03:47 |
*** ykarel|away is now known as ykarel | 04:09 | |
ianw | unfortunately that has *not* fixed it. f31 built but then it looks like the container was hosed | 04:58 |
ianw | logs in /var/log/nodepool/builds/keep | 04:58 |
ianw | # docker run --privileged --entrypoint /bin/bash zuul/nodepool-builder:latest -c " umount /proc; mount" | 05:14 |
ianw | i notice that this "works" ... as in unmounts /proc ... whereas on a real system that's pretty much impossible | 05:14 |
ianw | it's a red herring | 05:57 |
ianw | i think | 05:57 |
ianw | # docker run --privileged --entrypoint /bin/bash zuul/nodepool-builder:latest -c "DIB_RELEASE=bionic disk-image-create -o test.qcow2 ubuntu-minimal vm ; echo ; mount ; ls /proc" | 05:57 |
ianw | is a replicator | 05:57 |
*** dpawlik has joined #opendev | 06:09 | |
*** ysandeep|away is now known as ysandeep | 06:53 | |
*** DSpider has joined #opendev | 06:57 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Move Ubuntu builds away from nb04 https://review.opendev.org/721175 | 07:05 |
*** ysandeep is now known as ysandeep|afk | 07:11 | |
*** rpittau|afk is now known as rpittau | 07:12 | |
*** ralonsoh has joined #opendev | 07:18 | |
openstackgerrit | Merged openstack/project-config master: Move Ubuntu builds away from nb04 https://review.opendev.org/721175 | 07:39 |
openstackgerrit | Merged openstack/project-config master: Add TrilioVault charms https://review.opendev.org/720534 | 07:47 |
openstackgerrit | Merged openstack/project-config master: Remove pypy job from x/surveil https://review.opendev.org/720699 | 07:47 |
AJaeger | config-core, please review https://review.opendev.org/720641 (nb03 update) | 07:51 |
AJaeger | ianw, are we ready to go to Fedora 31? https://review.opendev.org/717657 What about updating fedora-latest as well? | 07:51 |
AJaeger | infra-root, https://review.opendev.org/720534 has a promote failure: infra-prod-remote-puppet-else https://zuul.opendev.org/t/openstack/build/f6b57467b53f410b902d72285eadd4d6 : FAILURE in 1m 44s | 07:57 |
AJaeger | "ERROR! the playbook: /home/zuul/src/opendev.org/opendev/system-config/playbooks/remote-puppet-else.yaml could not be found" | 08:00 |
*** ysandeep|afk is now known as ysandeep | 08:02 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 08:04 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 08:08 |
*** ykarel is now known as ykarel|lunch | 08:17 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 08:19 |
*** sshnaidm has joined #opendev | 08:22 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 08:25 |
ianw | AJaeger: i'll re-evaluate tomorrow, i have to investigate these nb04 builder failures a bit more | 08:37 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 08:43 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 08:46 |
*** ykarel|lunch is now known as ykarel | 09:00 | |
*** tosky has joined #opendev | 09:01 | |
AJaeger | ianw: sure | 09:03 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 09:19 |
*** ysandeep is now known as ysandeep|lunch | 09:34 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output https://review.opendev.org/721192 | 09:47 |
openstackgerrit | Merged opendev/irc-meetings master: Not all meetings are OpenStack https://review.opendev.org/720063 | 09:51 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Fail fetch-sphinx-tarball if no html exists https://review.opendev.org/721221 | 10:12 |
*** ysandeep|lunch is now known as ysandeep | 10:13 | |
*** rpittau is now known as rpittau|bbl | 10:15 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Fail fetch-sphinx-tarball if no html exists https://review.opendev.org/721221 | 10:33 |
*** roman_g has joined #opendev | 10:39 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Use remote_src false for easier debugging https://review.opendev.org/721237 | 10:52 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src false https://review.opendev.org/721237 | 10:53 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src false https://review.opendev.org/721237 | 10:58 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: tox: allow tox to be upgraded https://review.opendev.org/690057 | 11:09 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src true https://review.opendev.org/721237 | 11:28 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml https://review.opendev.org/721245 | 11:48 |
*** dpawlik has quit IRC | 11:50 | |
*** dpawlik has joined #opendev | 11:50 | |
*** dpawlik has quit IRC | 11:52 | |
*** dpawlik has joined #opendev | 11:53 | |
*** dpawlik has quit IRC | 11:54 | |
*** dpawlik has joined #opendev | 11:54 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml https://review.opendev.org/721245 | 11:55 |
*** bwensley has left #opendev | 11:56 | |
*** rpittau|bbl is now known as rpittau | 12:03 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: Do not keep owner of archived files https://review.opendev.org/721248 | 12:05 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists https://review.opendev.org/721258 | 12:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook https://review.opendev.org/721098 | 12:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Move cloud-init removal to its own playbook https://review.opendev.org/721106 | 12:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Just move cloud-init removal into base-server https://review.opendev.org/721107 | 12:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use https://review.opendev.org/720892 | 12:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove some extra bits from site.pp https://review.opendev.org/721101 | 12:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split codesearch into its own playbook https://review.opendev.org/721102 | 12:43 |
AJaeger | morning, mordred ! Did you see my comment about the promote failure on 720534 above? | 12:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Fix remote_puppet_else playbook name https://review.opendev.org/721260 | 12:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Fix remote_puppet playbook names https://review.opendev.org/721260 | 12:45 |
mordred | AJaeger: ^^ that should fix it | 12:45 |
AJaeger | thx | 12:49 |
mordred | AJaeger: thanks for noticing :) | 12:50 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Update Neutron Grafana dashboard https://review.opendev.org/718392 | 12:53 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Update Neutron Grafana dashboard https://review.opendev.org/718392 | 12:54 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists https://review.opendev.org/721258 | 12:55 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split codesearch into its own playbook https://review.opendev.org/721102 | 12:56 |
*** ykarel is now known as ykarel|afk | 12:56 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists https://review.opendev.org/721258 | 12:58 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml https://review.opendev.org/721245 | 13:00 |
*** kevinz has quit IRC | 13:09 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists https://review.opendev.org/721258 | 13:11 |
*** kevinz has joined #opendev | 13:14 | |
openstackgerrit | Merged zuul/zuul-jobs master: Update Fedora to 31 https://review.opendev.org/717657 | 13:21 |
mordred | frickler, fungi: when you get a sec, https://review.opendev.org/721260 | 13:24 |
openstackgerrit | Merged zuul/zuul-jobs master: Make ubuntu-plain jobs voting https://review.opendev.org/719701 | 13:25 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Adds role: ensure-ansible https://review.opendev.org/721269 | 13:27 |
corvus | mordred: re the report of retry_limits in #openstack-infra, it looks like the scheduler is under memory pressure and we need to restart | 13:35 |
mordred | corvus: ah. I didn't think to look for memory pressure - I was trying to find an error or something in the log | 13:38 |
corvus | mordred: i looked at the graph, and then grepped for "ZooKeeper" to confirm there were connection losses | 13:39 |
corvus | mordred: we may have a problem, i may need more eyes | 13:41 |
mordred | corvus: k. where should I apply the eyes? | 13:41 |
corvus | i think the scheduler is at the part of the startup process where it queries gerrit for all project-branches, but i see no activity | 13:41 |
corvus | it is talking to gerrit though | 13:42 |
corvus | it's getting events | 13:42 |
mordred | corvus: yeah - I see the events | 13:43 |
corvus | huh, it looks like it never stopped | 13:43 |
corvus | but the process timestamp is recent | 13:43 |
corvus | i'll just try killing it again | 13:44 |
mordred | RuntimeError: cannot join current thread | 13:44 |
mordred | that's fun | 13:44 |
corvus | okay, there's definitely no scheduler process running now, restarting | 13:44 |
mordred | it looks like it's starting | 13:44 |
corvus | why aren't we seeing all the branches at cat jobs? | 13:45 |
mordred | but ... yeah | 13:45 |
mordred | do we need to restart the mergers? | 13:45 |
corvus | shouldn't need to | 13:45 |
corvus | it debug logs the cat jobs before sending them to the mergers anyway | 13:46 |
corvus | so even if we were waiting on cat jobs, we'd still see a bunch of output | 13:46 |
mordred | yeah | 13:46 |
mordred | I just checked and the main.yaml is there and is correct | 13:46 |
mordred | fwiw - puppet hasn't run on the zuul hosts in several days (or anywhere) - so honestly _nothing_ should be different on the hosts | 13:48 |
corvus | i was just about to say, the zuul install is rather outdated | 13:48 |
corvus | oh! | 13:49 |
corvus | it was doing a db migration? | 13:49 |
mordred | corvus: yeah | 13:49 |
corvus | and aborting it caused problems | 13:49 |
mordred | and being unhappy about it | 13:49 |
mordred | yeah | 13:49 |
corvus | ok, should we manually undo that migration and let it run this time | 13:50 |
mordred | well - it at least explains the pause - we should maybe add a log line | 13:50 |
corvus | and then put in some friggin log lines? :) | 13:50 |
mordred | "starting db migrations" or something | 13:50 |
mordred | yeah | 13:50 |
corvus | mordred: can you get a mysql prompt while i find the migration? | 13:50 |
mordred | corvus: working on it | 13:50 |
mordred | corvus: in a mysql prompt on zuul.o.o in a screen session | 13:52 |
corvus | mordred: alter table "zuul_build" drop column "error_detail"; | 13:52 |
mordred | running | 13:52 |
corvus | sorry, i never get the quotes right | 13:52 |
mordred | corvus: do we need to update the migrations table? | 13:52 |
corvus | mordred: i'm assuming it never got upgraded, but i'll get the values | 13:53 |
corvus | mordred: i think the value we want in there is 5f183546b39c. let's just check and see if that's what's there. | 13:53 |
mordred | k. will do once this is done | 13:54 |
corvus | i just checked, and wow, it really doesn't say anything about starting a migration | 13:55 |
mordred | we might want to set a status - if the restarts were any indication, we might be at this for a few minutes | 13:55 |
corvus | yeah | 13:55 |
mordred | corvus: I'd think "starting migrations" "running migration XXX" and "done with migrations" would all be nice | 13:55 |
mordred | we also might want to investigate starting to use online ddl for some of these | 13:56 |
corvus | status notice Zuul is temporarily offline; service should be restored in about 15 minutes. | 13:56 |
corvus | mordred: ^? | 13:56 |
corvus | (is that too optimistic?) | 13:57 |
mordred | corvus: let's go for it | 13:57 |
corvus | #status notice Zuul is temporarily offline; service should be restored in about 15 minutes. | 13:57 |
openstackstatus | corvus: sending notice | 13:57 |
mordred | corvus: https://dev.mysql.com/doc/refman/5.7/en/innodb-online-ddl-operations.html#online-ddl-column-operations <-- for our reading pleasure later | 13:57 |
-openstackstatus- NOTICE: Zuul is temporarily offline; service should be restored in about 15 minutes. | 13:57 | |
*** sgw has joined #opendev | 13:57 | |
* fungi is here if help is needed, but has just been watching since you seem to have figured this out already | 13:58 | |
*** dmsimard has joined #opendev | 13:58 | |
corvus | mordred: neat; it will be interesting to manage schema upgrades with multiple schedulers | 13:58 |
sgw | Morning folks, any idea why a .gitreview commit would get stuck for starlingx/kernel? Does the .zuul.yaml need to precede it? | 14:00 |
corvus | mordred: we should also get around to dropping some rows from this table. | 14:00 |
mordred | corvus: yeah. will probably need to do a leader election | 14:00 |
openstackstatus | corvus: finished sending notice | 14:00 |
corvus | sgw: yes it does | 14:00 |
corvus | sgw: at least add the 'noop' jobs to allow the gate to work | 14:01 |
corvus | mordred: let's check the rev | 14:02 |
*** iurygregory has joined #opendev | 14:02 | |
corvus | good that's what we want | 14:02 |
sgw | Ok, thanks | 14:02 |
fungi | sgw: or you could squash them both into a single change | 14:02 |
corvus | i'll restart scheduler now and let it run | 14:02 |
mordred | cool. I think you're good to ... yeah | 14:02 |
mordred | corvus: dropping took about 9 minutes, so we should expect the add to take the same | 14:03 |
corvus | mordred: ack; looks like it's running | 14:03 |
mordred | ++ | 14:04 |
ttx | hrm, github does not seem to allow me to do that git push --prune after all | 14:08 |
ttx | Fails with lots of ! [remote failure] refs/changes/66/217766/2 (remote failed to report status) | 14:08 |
ttx | I guess git push --mirror would work, but that is a bit more costly to run and potentially would introduce a race | 14:09 |
*** calcmandan has quit IRC | 14:09 | |
mordred | ttx: well - the race is probably fine - it'll just get resolved by the next push | 14:09 |
*** calcmandan has joined #opendev | 14:09 | |
ttx | Does someone know how to clone a repository with all the branches ? Doing git clone and then git push --mirror only pushes the master branch | 14:10 |
corvus | you could have someone standing by to force a gerrit replication on the project afterwards | 14:10 |
ttx | git clone --mirror seems to do something else than what you expect it to | 14:10 |
mordred | ttx: git clone --mirror makes a bare repo - after you do that, do "git config --bool core.bare false" | 14:12 |
mordred | (from in the repo) | 14:12 |
corvus | progress on zuul | 14:13 |
corvus | looks like the github driver is starting | 14:13 |
fungi | ttx: we have some code in jeepyb which clones all branches and tags | 14:13 |
* fungi finds | 14:13 | |
corvus | and there go the cat jobs \o/ | 14:13 |
mordred | there we go! | 14:13 |
ttx | fungi: yeah, I was hoping the async refs/changes cleanup script would not require to clone half of the universe | 14:14 |
corvus | ttx: despite that error, did the prune happen to work? | 14:15 |
*** ysandeep is now known as ysandeep|afk | 14:15 | |
ttx | corvus: no, it leaves the refs/changes untouched | 14:15 |
fungi | ttx: https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/utils.py#L135-L145 | 14:17 |
corvus | okay, it's up and re-enqueue is running | 14:17 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook https://review.opendev.org/721098 | 14:18 |
mordred | corvus: woot. that's a relief | 14:18 |
*** ykarel|afk is now known as ykarel | 14:22 | |
corvus | win 96 | 14:32 |
corvus | looks like all systems nominal | 14:34 |
mordred | corvus: woot | 14:34 |
corvus | mordred: shall we terminate the screen session now? | 14:34 |
mordred | corvus: yeah | 14:35 |
corvus | done | 14:36 |
mordred | woot | 14:36 |
corvus | mordred: thanks for your help; i think we recovered from my screwup pretty quickly :) | 14:36 |
mordred | corvus: yeah - I'm glad we accidentally cancelled the migration and then reapplied it causing that error | 14:37 |
mordred | corvus: otherwise we would have just been super confused for much longer | 14:37 |
corvus | "why did it take 10 minutes to start?" | 14:38 |
ttx | So in summary: we are trying to out-of-band clean up refs/changes/* on GitHub mirrors so that the executor does not get caught for hours cleaning them up the first time it does a git-mirror replication. The only way to do that seems to be to run a git push --mirror from a full clone with all refs/heads but no refs/changes. Any suggestion on how to do that without actually cloning all repositories locally ? | 14:39 |
mordred | ttx: nope. I think that'll require cloning all the repos | 14:39 |
ttx | OK, will upgrade my bash one-liner to a full bash script then :) | 14:40 |
mordred | :) | 14:40 |
corvus | ttx: maybe we just want to do this for nova and neutron, and let the executors handle the rest? | 14:49 |
ttx | yeah... I'll first assess how long it takes and see if that would work | 14:49 |
openstackgerrit | Merged zuul/zuul-jobs master: Document output variables https://review.opendev.org/719704 | 14:51 |
openstackgerrit | Merged zuul/zuul-jobs master: Python roles: misc doc updates https://review.opendev.org/720111 | 14:56 |
fungi | ttx: openstack-manuals is the other one gerrit tends to spend a bunch of time syncing, so maybe that one as well | 14:56 |
*** mlavalle has joined #opendev | 14:57 | |
clarkb | is there a tldr on the zuul situation? | 14:57 |
fungi | clarkb: i posted one in #openstack-infra i can copy here | 14:59 |
clarkb | I see it now thanks | 15:00 |
corvus | clarkb: ran out of memory, i restarted, got confused why it was hanging during start, aborted startup, started again, got a db migration error, realized that's why it was slow, manually reverted the db migration, started it again and just let it run, then all is good. | 15:00 |
clarkb | do we think the memory issue is a leak? | 15:00 |
corvus | clarkb: the big run up was a few weeks ago, and may have been due to me using the repl | 15:00 |
corvus | i'd like to disregard this data point based on that | 15:01 |
corvus | clarkb: https://review.opendev.org/721283 should improve the logging to print out migration info (but we won't see it until we stop using our custom log config) | 15:01 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook https://review.opendev.org/721098 | 15:03 |
*** dpawlik has quit IRC | 15:07 | |
mordred | clarkb, corvus: https://review.opendev.org/#/c/721098/ is not quite ready yet (I keep finding things - but I think they'll all help with splitting site.pp generally) ... but the stack leading up to it should all be both ready and safe to land | 15:09 |
clarkb | mordred: ok I've got a couple meetings to watch for the next little bit then will try to review | 15:09 |
clarkb | I also need to get a change up to simplify the docker-compose stuff (zk can use the global install and global install doesn't need to keep removing distro package now) | 15:10 |
*** ykarel is now known as ykarel|away | 15:18 | |
sgw | Another dumb question: So we added the .zuul.yaml to starlingx/kernel, but I don't see the zuul job running on zuul.openstack.org for that commit, did we miss something else? | 15:25 |
clarkb | sgw: can you link to the change? | 15:26 |
sgw | https://review.opendev.org/720004 | 15:26 |
fungi | entirely possible it got pushed in the few minutes when the scheduler was offline for emergency maintenance | 15:26 |
sgw | so try recheck? | 15:26 |
fungi | mm, no it was pushed after the maintenance concluded | 15:26 |
fungi | is it maybe already queued? | 15:27 |
fungi | nope, i don't see it on the status page either | 15:28 |
sgw | Right, I thought it should show up on the status page | 15:28 |
clarkb | we should double check the project was added to the tenant config | 15:28 |
fungi | yeah, next thing i checked was the tenant configuration errors zuul is reporting, but those are all for openstack/openstack-ansible-tests | 15:29 |
clarkb | it's in the file and the restarts today should've updated it. However, maybe mordred's changes aren't updating zuul config properly? | 15:29 |
fungi | mordred did say ansible hasn't updated the scheduler in several days | 15:30 |
frickler | corvus: once zuul is stable, meetpad is broken for me, the redirects for the etherpad work, but I can't start a meeting. I'm assuming some bug in https://review.opendev.org/720095 | 15:30 |
AJaeger | http://zuul.opendev.org/t/openstack/projects does not list starlingx/kernel | 15:31 |
corvus | frickler: ah thanks, i'll take a look. | 15:31 |
clarkb | AJaeger: k so it likely is the issue of infra-prod-* jobs not updating zuul.opendev.org properly | 15:31 |
clarkb | mordred: ^ fyi | 15:31 |
fungi | clarkb: AJaeger: yep, not (yet) updated in /etc/zuul/layout/main.yaml on zuul.o.o | 15:32 |
mordred | there is a patch in gate that should fix this | 15:34 |
mordred | https://review.opendev.org/#/c/721260/ | 15:35 |
corvus | frickler: it looks like only 2 of the 4 services are running | 15:36 |
AJaeger | mordred: the job is still in check after the restart - should we move it to gate? | 15:36 |
corvus | FATAL ERROR: JVB auth password must be changed, check the README | 15:37 |
mordred | AJaeger: yeah - how about i do that real quick | 15:38 |
corvus | frickler: ^ i guess they updated the docker images, probably to make that password change mandatory | 15:38 |
corvus | so we could either pin to old images, or actually come up with real passwords | 15:38 |
corvus | may as well do real passwords, since we might want to expose xmpp later anyway | 15:38 |
mordred | corvus: might as well | 15:39 |
mordred | AJaeger: enqueued | 15:39 |
*** donnyd has joined #opendev | 15:39 | |
mordred | clarkb: incidentally (and you'll see this in the eavesdrop patch) - I learned that the hostname: ansible task requires dbus to exist on systemd systems - and we have that on our cloud-provider images in prod, but it's not installed on the gate nodes. yay dbus | 15:40 |
donnyd | I guess I also need to start having my infra discussions here too right? | 15:41 |
corvus | mordred: what's the state of getting /p/ mapped into the gerrit container? | 15:41 |
corvus | mordred: my local git repos are getting further and further behind | 15:41 |
corvus | i feel like we should either treat that as a serious regression, or stop running the service | 15:42 |
clarkb | /p/ is going away right? | 15:42 |
mordred | corvus: we need to restart the gerrit container | 15:42 |
clarkb | should we redirect it to opendev.org/ or just / on gerrit? | 15:42 |
clarkb | ah if this is already handled then /me gets out of the way | 15:42 |
corvus | mordred: can we just do that now? | 15:42 |
mordred | yes | 15:42 |
corvus | mordred: you want to type that or shall i? | 15:42 |
mordred | either way - I can if you wanna do a status log | 15:43 |
corvus | mordred: sgtm | 15:43 |
corvus | mordred: do you want to do a notice though? | 15:43 |
corvus | or just a log? | 15:43 |
mordred | oh - yeah - that's what I meant | 15:43 |
mordred | a notice | 15:43 |
corvus | k | 15:43 |
corvus | status notice Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references. | 15:44 |
mordred | corvus: hang on one sec | 15:44 |
mordred | (but that looks good) | 15:44 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml https://review.opendev.org/721245 | 15:45 |
corvus | how's that ^ (i didn't want to advertise /p/ specifically, but wanted to give folks a breadcrumb in case they saw what i was seeing) | 15:45 |
mordred | corvus: ok. we're good (I was double checking the config and then had a "wait, is that right?" moment, but I'm back to being good) | 15:45 |
corvus | #status notice Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references. | 15:45 |
openstackstatus | corvus: sending notice | 15:45 |
-openstackstatus- NOTICE: Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references. | 15:45 | |
fungi | donnyd: opendev-wide infrastructure discussions in here, openstack-project-specific infrastructure discussions still make sense in #openstack-infra | 15:46 |
donnyd | thanks fungi | 15:47 |
donnyd | So I should have brought the issue i noticed this morning to this channel as it affected all opendev (on OE that is) | 15:48 |
openstackstatus | corvus: finished sending notice | 15:49 |
corvus | mordred: ^ | 15:49 |
mordred | ok. stopping gerrit | 15:49 |
mordred | gerrit is starting | 15:49 |
mordred | corvus: exception in error log | 15:51 |
corvus | looking | 15:51 |
mordred | gerrit seems up - but we're getting mergeability check tracebacks | 15:52 |
mordred | or - we got one | 15:52 |
clarkb | I think it's normal to get a variety of exceptions if you want to compare to pre restart | 15:52 |
clarkb | things like ssh connections closing unexpectedly | 15:52 |
mordred | nod | 15:52 |
corvus | yeah, i think there were some before | 15:52 |
mordred | ok. cool | 15:52 |
corvus | having said that, we may want to see if we can figure out what repo that is and fsck it | 15:53 |
corvus | because: Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing blob baeb8879a2ae011e4ea3836dabba584f1311f814 | 15:53 |
corvus | i love how it doesn't say what repo | 15:53 |
mordred | yup. super helpful | 15:53 |
corvus | "just google for the sha and see if github has it" | 15:53 |
clarkb | mordred: corvus: this restart used the new graceful stop right? (want to confirm that worked as expected) | 15:53 |
mordred | clarkb: the graceful stop was in the compose file - I don't know that I experienced different behavior | 15:54 |
corvus | i don't see any log lines about stopping | 15:54 |
corvus | clarkb: maybe you can test it out on review-dev? | 15:55 |
clarkb | k | 15:56 |
fungi | donnyd: there's a good chance the problem you spotted was actually a symptom of the zuul scheduler running out of memory | 15:57 |
donnyd | that would make sense... it was firing nodes and they came up fine... it just seemed like the rest of the process was busted somewhere | 15:57 |
fungi | since that would have caused nodepool to delete lots of nodes out from under zuul, resulting in jobs getting rerun en masse | 15:58 |
fungi | (because of zookeeper disconnects) | 15:58 |
sgw | So, is the starlingx/kernel setup correctly, or did the restart help? Or should I fire a recheck | 16:00 |
clarkb | sgw: we need a zuul job on our end to run and update zuul's config. This is fixed by a change in the gate apparently | 16:01 |
clarkb | sgw: I think you just need to wait until your project shows up here http://zuul.opendev.org/t/openstack/projects | 16:01 |
openstackgerrit | Merged opendev/system-config master: Fix remote_puppet playbook names https://review.opendev.org/721260 | 16:02 |
mordred | ok - there's the patch | 16:02 |
AJaeger | sgw: that change needed to merge first ^ | 16:02 |
sgw | Ah so we missed adding it to the projects list? | 16:03 |
mordred | we've been updating our config management- and we had a typo that caused some of it to not actually run | 16:03 |
AJaeger | sgw: you did fine - we missed telling zuul about it | 16:03 |
mordred | the config management in question was the stuff that actually applies the zuul main.yaml config file :) | 16:03 |
sgw | ah ok thanks | 16:03 |
mordred | the job should run in the next few minutes - the deploy job is enqueued | 16:04 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Use real passwords for meetpad https://review.opendev.org/721295 | 16:06 |
corvus | frickler, clarkb, mordred, fungi: ^ | 16:06 |
*** rpittau is now known as rpittau|afk | 16:07 | |
mordred | corvus: ++ | 16:07 |
fungi | sgw: yeah, you caught us mid-transition between how we're handling applying those configurations | 16:14 |
openstackgerrit | Merged zuul/zuul-jobs master: Fail fetch-sphinx-tarball if no html exists https://review.opendev.org/721221 | 16:19 |
clarkb | AJaeger: is http://lists.openstack.org/pipermail/openstack-infra/2020-April/006623.html something you might be able to respond to? it's questions about docs publishing and layout for airship | 16:20 |
AJaeger | clarkb: yes, can do | 16:22 |
clarkb | thank you! | 16:23 |
*** fdegir has joined #opendev | 16:25 | |
mordred | AJaeger: remote-puppet-else is running | 16:25 |
AJaeger | \o/ | 16:27 |
mordred | AJaeger, sgw : zuul has been updated - should be all good now | 16:42 |
mordred | sorry about the delay | 16:42 |
sgw | do I need to do a recheck or is it in the queue | 16:42 |
clarkb | sgw: you need to recheck | 16:43 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Cleanup unneeded things post docker-compose upgrade https://review.opendev.org/721304 | 16:47 |
clarkb | infra-root ^ small cleanup from docker-compose things friday | 16:48 |
clarkb | mordred: on the applytest changes in system-config we don't seem to run the applytest job? | 16:57 |
clarkb | oh wait nevermind I think we do I'm just blind | 16:57 |
*** sshnaidm is now known as sshnaidm|afk | 16:59 | |
clarkb | mordred: comment on https://review.opendev.org/#/c/720887/5 not sure if you want to do that in a followon or update the change | 17:00 |
clarkb | mordred: comment on https://review.opendev.org/#/c/717492/4 which is a bit more involved | 17:03 |
AJaeger | clarkb, mordred , let me do that followup change on 720887 | 17:04 |
AJaeger | clarkb: want to +A 720887 then? | 17:04 |
clarkb | AJaeger: sure | 17:04 |
dmsimard | is anyone else getting a basic authentication prompt on https://opendev.org/recordsansible/ara ? I don't get one on https://opendev.org/zuul/zuul | 17:06 |
openstackgerrit | Andreas Jaeger proposed opendev/system-config master: Remove system-config-puppet-beaker-rspec-puppet-4-centos-7-infra https://review.opendev.org/721312 | 17:06 |
AJaeger | clarkb, mordred ^ | 17:06 |
AJaeger | dmsimard: I get the prompt as well ;( | 17:06 |
clarkb | dmsimard: yes we tracked it down a few weeks back. It has to do with your readme having a 404 link iirc | 17:07 |
clarkb | something like that | 17:07 |
clarkb | (and that is how gitea expresses it when rendering the readme) | 17:07 |
dmsimard | yeah there's a link to 127.0.0.1:8000 in the readme, that's probably not going to work | 17:08 |
openstackgerrit | Merged opendev/system-config master: Remove puppet-beaker-rspec-puppet-4-infra-system-config https://review.opendev.org/720799 | 17:13 |
openstackgerrit | Merged opendev/system-config master: Remove unused rspec tests https://review.opendev.org/720802 | 17:13 |
openstackgerrit | Merged opendev/system-config master: Make applytest files outside of system-config https://review.opendev.org/720848 | 17:18 |
sgw | Thanks for your help unsticking zuul and the starlingx/kernel jobs | 17:20 |
openstackgerrit | Merged opendev/system-config master: Move puppet apply jobs to system-config repo https://review.opendev.org/720887 | 17:30 |
AJaeger | config-core, please review https://review.opendev.org/#/c/720889 now that 720887 is merged ^. And infra-root, please review https://review.opendev.org/721312 | 17:31 |
AJaeger | mordred: did you see the -1 on https://review.opendev.org/#/c/720719/ ? | 17:31 |
AJaeger | clarkb: please have a look at https://review.opendev.org/#/c/720900/ - mass puppet retirement. Should we ask for an announcement email? | 17:32 |
clarkb | AJaeger: ya why don't I write an email to openstack-infra | 17:33 |
AJaeger | that works as well ;) | 17:33 |
AJaeger | thx | 17:33 |
*** ralonsoh has quit IRC | 17:39 | |
clarkb | AJaeger: note sent | 17:40 |
*** roman_g has quit IRC | 17:56 | |
*** prometheanfire has quit IRC | 18:15 | |
*** slittle1 has quit IRC | 18:17 | |
*** slittle1 has joined #opendev | 18:21 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook https://review.opendev.org/721098 | 18:28 |
mordred | clarkb: yeah - I'll do the centos removal in a followup | 18:28 |
clarkb | mordred: AJaeger volunteered fwiw so I approved the parent | 18:28 |
mordred | clarkb: cool | 18:29 |
mordred | clarkb: so - etherpad-dev ... I don't know - do we want to keep an etherpad-dev? | 18:29 |
mordred | at the very least I think we should re-deploy one using the new stuff | 18:29 |
openstackgerrit | Merged opendev/system-config master: Use real passwords for meetpad https://review.opendev.org/721295 | 18:29 |
mordred | so removing the existing one probably isn't a bad idea anyway | 18:29 |
clarkb | mordred: ya I think its mostly about making sure we know what the plan is there and not ignoring it if we need to not ignore it | 18:30 |
mordred | ++ | 18:30 |
clarkb | I'm semi interested in running it like we do gitea and others. Basically rely on our test tooling as -dev replacement | 18:30 |
mordred | yeah. I think I'd like to do that until such a time as we realize it doesn't work | 18:30 |
clarkb | but etherpad is different in that its client behavior is really important so being able to test it "live" might make it an exception? | 18:30 |
mordred | also a good point | 18:30 |
mordred | since we're building our own images, we'd likely need to make a second image with like a :dev tag | 18:31 |
mordred | so that we could run them on different tags | 18:31 |
clarkb | mordred: thinking out loud here, maybe we can get away with holding a test node for actual client verification | 18:33 |
*** slittle1 has quit IRC | 18:33 | |
mordred | yeah | 18:34 |
clarkb | basically build -dev on demand rather than keeping it around (I kinda like that) | 18:34 |
mordred | yeah. me too | 18:34 |
mordred | I think it's worth at least trying | 18:34 |
mordred | turns out we can always spin up another etherpad-dev if we need to | 18:34 |
AJaeger | mordred: https://review.opendev.org/721312 is the centos removal | 18:35 |
mordred | AJaeger: ++ | 18:37 |
mordred | AJaeger: fun on the focal image build :( | 18:37 |
openstackgerrit | Monty Taylor proposed openstack/project-config master: Start building focal images https://review.opendev.org/720719 | 18:38 |
AJaeger | mordred: yeah | 18:39 |
AJaeger | speaking about nodepool, here's one more change to review, please https://review.opendev.org/720641 | 18:39 |
AJaeger | mordred: and you need a +2 on https://review.opendev.org/#/c/720718/ (mirror focal) first... | 18:40 |
*** jrosser has quit IRC | 18:41 | |
*** slittle1 has joined #opendev | 18:41 | |
AJaeger | mordred: don't we add it to nb04 as paused as well? | 18:42 |
*** jrosser has joined #opendev | 18:42 | |
clarkb | AJaeger: yes those two should be disjoint | 18:43 |
clarkb | so enable on !nb04 and disable on nb04 or vice versa | 18:43 |
AJaeger | config-core, please review https://review.opendev.org/#/c/720889/3 and https://review.opendev.org/#/c/720890/ (needs recheck once 720889 is merged) | 18:50 |
clarkb | looking | 18:51 |
mordred | clarkb: sigh: https://zuul.opendev.org/t/openstack/build/9ec11dba30e54f3a9f4d658d600b1d6d/log/eavesdrop01.openstack.org/syslog.txt#1518 | 18:54 |
mordred | clarkb: what do you think we should do about that? the issue is that puppet is trying to start accessbot - which is clearly not going to work because we don't have a real irc account there | 18:55 |
openstackgerrit | Drew Walters proposed openstack/project-config master: Add Airship subproject documentation job https://review.opendev.org/721328 | 18:55 |
mordred | (also - puppet logging to syslog is ... really annoying) | 18:55 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists https://review.opendev.org/721258 | 18:55 |
clarkb | mordred: maybe we should just stick with the noop apply jobs for now then? | 18:56 |
clarkb | seems like handling cases like that will be easier as we convert into ansible | 18:56 |
*** prometheanfire has joined #opendev | 18:56 | |
mordred | clarkb: yeah. well - oh, you know | 18:56 |
mordred | clarkb: what if we add noop support to the puppet role | 18:56 |
mordred | so we still break it out into a service playbook like that patch is doing | 18:57 |
mordred | but we just have the test job run with noop set | 18:57 |
clarkb | that might give us better coverage of the puppet machinery | 18:57 |
clarkb | if not quite for every puppet object | 18:57 |
mordred | yeah | 18:57 |
clarkb | ya I think I like that | 18:57 |
clarkb | its a decent halfway compromise | 18:57 |
mordred | yeah. that said - _most_ of these things work just fine | 18:58 |
mordred | it's really just starting services like accessbot | 18:58 |
mordred | so maybe thinking about passing a "skip services" flag, similar to noop, that we can protect things like accessbot that are just impossible | 18:58 |
mordred | we might not have that many | 18:58 |
clarkb | ++ | 18:58 |
mordred | and we _can_ do most of these things | 18:58 |
mordred | I'll play with that first | 18:58 |
mordred | (we could always spin up an irc daemon and point accessbot at it ;) ) | 18:59 |
* mordred is not going to do that | 18:59 | |
mordred | also - I mean, one of my next tasks is ansible-ifying eavesdrop _anyway_ | 19:00 |
mordred | I was thinking it would make an easy first puppet split out - but I might also just ansiblify it rather than update the puppet to conditionally run accessbot | 19:01 |
openstackgerrit | Merged openstack/project-config master: Use legacy infra puppet jobs from system-config https://review.opendev.org/720889 | 19:05 |
openstackgerrit | Drew Walters proposed openstack/project-config master: Add Airship subproject documentation job https://review.opendev.org/721328 | 19:08 |
AJaeger | ianw: since you reworked all the docs publishing for static, could you review the change above, please ^ | 19:15 |
AJaeger | is that all fine - including directory structure? | 19:15 |
clarkb | ok meeting agenda is out | 20:58 |
clarkb | I might finally have time to look at nodepool logs again | 20:58 |
clarkb | looks like rackspace has been happy but inap is not currently | 20:59 |
clarkb | based on grafana graphs | 21:00 |
openstackgerrit | Joseph Richard proposed openstack/project-config master: Add Portieris Armada app to StarlingX https://review.opendev.org/721343 | 21:01 |
clarkb | mordred: if you have a second I was going to test docker-compose stoppage of gerrit on review-dev but there are a ton of jeepyb upstream project leaked processes that maybe we should deal with first | 21:05 |
clarkb | I seem to recall something had to be fixed around that, is there cleanup we need to do? | 21:05 |
mordred | clarkb: oh - I think we already fixed that but probably didn't clean up on review-dev | 21:07 |
mordred | the issue was that we weren't mounting in all the right things, so manage-projects was starting and then couldn't log in to gerrit, so it just sits there retrying until the end of time | 21:08 |
*** sgw has quit IRC | 21:09 | |
clarkb | corvus: looking at the nodepool behavior with fresh eyes. I think nodepool is actually aware that it is at or near quota, it then pauses while waiting for ~150 nodes to delete. Then nodes "delete" but nova quota isn't updated so nodepool unpauses and tries to launch nodes and fails on quota errors. | 21:09 |
clarkb | now I think where we get in trouble is we then pause immediately again? | 21:10 |
clarkb | so we can end up with multiple requests locked in a provider that isn't really in a happy state | 21:10 |
clarkb | I'll try and get an etherpad of relevant logs together | 21:10 |
fungi | clarkb: mordred: also the cronspam from the daily backups for review-dev are complaining | 21:17 |
fungi | i don't have an example handy, but there will be another in a few hours | 21:17 |
*** sgw has joined #opendev | 21:18 | |
clarkb | https://etherpad.opendev.org/p/lF1-vMNVtDqlEH1QwNAZ my notes on nodepool behavior | 21:18 |
clarkb | going to share in #zuul now as I think this is mostly a nodepool thing | 21:19 |
sgw | Hi Team, has something changed with the build-openstack-docs-pti template? We are seeing a POST_FAILURE in the starlingx/zuul-jobs repo with this change: https://review.opendev.org/721294 | 21:20 |
sgw | This repo does not generate any docs | 21:20 |
sgw | AJaeger: ^^^ you put this into starlingx/zuul-jobs, we are also seeing a similar issue in the new starlingx/kernel repo | 21:22 |
clarkb | sgw: do you have the change that added the jobs handy? | 21:30 |
fungi | clarkb: https://review.opendev.org/677739 added build-openstack-docs-pti to starlingx/zuul-jobs when it merged on 2019-08-26 | 21:32 |
fungi | so that's been a while | 21:33 |
mordred | build-openstack-docs-pti seems like a weird job to run on a starlingx repo - but I don't know background there | 21:33 |
fungi | since it was added in august, that likely predates us better standardizing opendev docs jobs | 21:37 |
sgw | Should I just disable that job for now since that repo does not have any docs requirements anyway | 21:40 |
openstackgerrit | Douglas Mendizábal proposed openstack/project-config master: Add ansible role for managing Luna SA HSM https://review.opendev.org/721349 | 21:41 |
openstackgerrit | Monty Taylor proposed opendev/puppet-accessbot master: Add flag to skip running the access script https://review.opendev.org/721350 | 21:41 |
clarkb | sgw: yes if there are no docs to build I think you can safely drop the docs jobs | 21:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook https://review.opendev.org/721098 | 21:45 |
mordred | clarkb: ^^ the depends-on won't actually work - I don't have the puppet modules doing that yet - and I don't know if I care enough to | 21:45 |
mordred | clarkb: but if we could go ahead and land the puppet change, I think we can recheck the system-config change and it should work | 21:45 |
mordred | I looked at the other thing - but the ansible task is going to be ... a little more involved | 21:45 |
mordred | and I think it should be done as a separate change | 21:46 |
clarkb | k | 21:46 |
mordred | clarkb: that said - this is going to be another great one to have triggered by project-config changes | 21:48 |
mordred | since the puppet run itself is the actual "bot" in this case | 21:48 |
mordred | clarkb: we have a file - checkaccess.py in puppet-accessbot - that doesn't seem to be used anywhere | 21:50 |
clarkb | mordred: is that what we run to check that the proper perms are already set on channels to add the bot? | 21:50 |
mordred | clarkb: oh - maybe so? | 21:51 |
mordred | we don't install it | 21:51 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: WIP Build a container for accessbot https://review.opendev.org/721354 | 22:03 |
fungi | mordred: is that ^ maybe something we should eventually be running as a periodic zuul job? | 22:04 |
mordred | fungi: yup! | 22:04 |
fungi | cool, doesn't seem like it actually needs a home on any persistent server | 22:05 |
mordred | fungi: what I'm thinking is - make the container - install the container and config files with ansible in service-eavesdrop.yaml - and then make a run-accessbot.yaml playbook that just runs the command - that we can run in response to project-config changes and also in timer jobs | 22:06 |
mordred | then we can run service-eavesdrop in the gate, but leave off run-accessbot | 22:06 |
mordred | fungi: yeah - we could almost certainly even not install it - but we do need the config files and secrets, so it might be just as easy to keep the pattern | 22:07 |
fungi | yeah, for now it makes sense not to change much | 22:07 |
fungi | but ultimately it's just a command we run periodically with some config from git and a secret | 22:07 |
mordred | yup | 22:08 |
fungi | there's no state to maintain | 22:08 |
fungi | (unlike our other irc bots) | 22:08 |
mordred | fungi: in the mean time -if you have a sec, could you https://review.opendev.org/721350 ? | 22:09 |
mordred | I need that to unstick https://review.opendev.org/721098 | 22:09 |
mordred | (turns out running the command when we run the puppet in the gate job is unhappy) | 22:09 |
mordred | the depends-on will not work | 22:09 |
fungi | sure | 22:10 |
mordred | thanks! | 22:10 |
* mordred promises this is all going to lead to finishing the gerrit task and getting gerritbot updating again | 22:10 | |
fungi | mordred: looks like there's some negative ci results for 721350 | 22:11 |
mordred | fungi: booo | 22:11 |
fungi | legacy-puppet-lint and system-config-puppet-beaker-rspec-puppet-4-infra | 22:11 |
mordred | wow | 22:13 |
mordred | so ... | 22:13 |
mordred | ok - nevermind | 22:13 |
mordred | I'm just going to finish the ansiblifying in the morning | 22:13 |
fungi | heh | 22:13 |
mordred | because I'm NOT fixing that | 22:14 |
fungi | more bitrot? i haven't looked at the errors yet | 22:14 |
mordred | yeah- well - sort of | 22:14 |
mordred | I think it's bitrot in terms of we have a central test that expects to run that likely hasn't been run here | 22:14 |
fungi | but it seems like every time i turn my back, yet another ruby gem decides it no longer supports xenial's version of ruby | 22:14 |
mordred | yeah | 22:14 |
ianw | mordred / clarkb: not sure if you saw, after a number of false starts i diagnosed the nb04 down to ubuntu somehow destroying the container (https://etherpad.opendev.org/p/8GqzoNcof6_QHo1jfw2V) | 22:14 |
mordred | ianw: I saw the disable ubuntu on nb04 patch but hadn't seen that | 22:15 |
* fungi imagines ubuntu dousing some crates with petrol and setting them ablaze | 22:15 | |
ianw | i'm not sure why ... we do the same thing in the gate for the nodepool container func test | 22:15 |
ianw | anyway, that will obviously be a blocker for migrating all our hosts to the container builder | 22:16 |
mordred | ianw: do we run more than one in the gate? I mean - the first build seems to work fine | 22:16 |
ianw | that was the red-herring ... the rpms builds work fine | 22:16 |
mordred | ianw: is it that dib is finding a mount inside of the container and "cleaning it up" ... oh | 22:16 |
mordred | so it's not a fundamentally dib thing | 22:17 |
mordred | it's the debootstrap | 22:17 |
mordred | so - potentially something about how debootstrap builds its chroot - or more likely cleans up after itself | 22:17 |
mordred | is "cleaning up" something it shouldn't be | 22:17 |
ianw | i have a suspicion debootstrap is involved ... i need to trace it out today | 22:17 |
mordred | yea | 22:17 |
mordred | ianw: I agree with that suspicion | 22:17 |
ianw | i'm not really sure why "umount /proc" in a container actually works | 22:17 |
ianw | i guess no daemon has any part of it open? | 22:18 |
ianw | and i'm not sure why it would work in the gate if it doesn't in production | 22:18 |
ianw | these are the mysteries of our time | 22:18 |
mordred | ianw: by the time you figure this out, you're going to understand everything about docker | 22:18 |
*** tosky has quit IRC | 22:19 | |
ianw | haha docker will probably get bought by microsoft and we'll probably switch to podman then :) | 22:19 |
mordred | ianw: :) | 22:19 |
* ianw wonders if docker is already owned by microsoft, it's hard to keep up | 22:20 | |
mordred | ianw: so - a thing to ponder in parallel | 22:20 |
mordred | ianw: https://review.opendev.org/#/c/700083/ <-- if we get that working, we could use the ubuntu docker image for the initial rootfs and avoid the debootstrap step | 22:21 |
mordred | ianw: it might be a thing to ponder depending on how today's debugging goes | 22:22 |
ianw | yeah it's certainly on my mind | 22:22 |
fungi | also you might consider switching to mmdebstrap, if we're doing it in containers and not stuck with ancient tools | 22:23 |
mordred | fungi: wow - what's mmdebstrap? | 22:23 |
fungi | it's in debian as of buster, and ubuntu as of disco | 22:24 |
mordred | fungi: neat | 22:24 |
fungi | In contrast to debootstrap it uses apt, supports more than one mirror, automatically uses security and updates mirrors for Debian stable chroots, is 3-6 times faster, produces smaller output by removing unnecessary cruft, is bit-by-bit reproducible if $SOURCE_DATE_EPOCH is set, allows unprivileged operation using Linux user namespaces, fakechroot or proot and can setup foreign architecture chroots using qemu-user. | 22:24 |
fungi | https://packages.debian.org/mmdebstrap | 22:24 |
mordred | ianw: so - yeah - might also be worth trying updating dib to use mmdebstrap | 22:24 |
corvus | ianw, mordred: i believe that change is working | 22:24 |
corvus | gimme a sec to dig up links | 22:25 |
mordred | corvus: yeah - by "get it working" I might just mean "figure out the test failures" | 22:25 |
corvus | it's actually passing the relevant tests | 22:25 |
mordred | cool | 22:25 |
corvus | the failures are that it's only tested under bionic or something | 22:25 |
corvus | because podman isn't installed on the others | 22:25 |
fungi | mordred: ianw: i've been using mmdebstrap on sid for creating my stable chroots for a couple years now, works well | 22:25 |
mordred | cool | 22:26 |
* mordred has to afk | 22:26 | |
corvus | ianw, mordred: so to get this working we might just need to add more podman support to distros in zuul-jobs, or disable that functest on platforms where we can't | 22:26 |
openstackgerrit | Merged opendev/system-config master: Cleanup unneeded things post docker-compose upgrade https://review.opendev.org/721304 | 22:26 |
fungi | also the fact that it can use qemu-user for foreign architectures would make it possible to build an arm64 ubuntu image on an amd64 vm, in theory (though i don't know how slow that would be) | 22:27 |
ianw | corvus: it would be good to do some boot tests too, even in experimental. although the nodepool tests are on my todo list to replace with container based ones because they don't really reflect production | 22:28 |
corvus | ah yep | 22:28 |
clarkb | fungi: slow as molasses probably | 22:31 |
fungi | yeah, depends on how much actually runs under qemu-user | 22:32 |
fungi | like, unpacking the debs doesn't need emulation | 22:32 |
fungi | but maintscripts probably could | 22:32 |
fungi | also probably dib would need to be extended to run certain phases for elements in a similar emulation layer anyway | 22:32 |
fungi | so it's not like we could take advantage of that straight away | 22:33 |
clarkb | ianw: fungi mordred corvus is there a tldr on the dib things? or should I just not worry? was pretty heads down on nodepool things but I think we've maybe pulled that thread to a conclusion (at least until change gets written?) | 22:37 |
ianw | clarkb: the long story short is now i've realised that ubuntu doesn't seem to build under the container as is | 22:38 |
ianw | we can either fix it, or move ahead with an alternative approach like the container base images, or try something in between like new debootstraps | 22:39 |
ianw | then we should update the dib nodepool tests to be testing our production images under container builds | 22:39 |
clarkb | thanks | 22:40 |
ianw | ok, i've figured out the magic runes to run a test build under strace on nb04 and it's outputting to /root/trace/out.txt ... this should tell us if it's clearly something running "umount /proc" | 22:55 |
ianw | keeping notes in : https://etherpad.opendev.org/p/8GqzoNcof6_QHo1jfw2V | 22:57 |
ianw | https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=919659 | 23:02 |
openstack | Debian bug 919659 in live-build "live-build: building in docker fails with mounting /proc unmount /sys" [Important,Open] | 23:02 |
ianw | "2020-04-20 22:58:27.162 | W: Failure trying to run: chroot "/tmp/dib_build.JD7OVXuc/mnt" mount -t proc proc /proc" | 23:05 |
fungi | i guess that should be merged with bug 921815 | 23:05 |
fungi | er, https://bugs.debian.org/921815 i mean | 23:06 |
openstack | Debian bug 921815 in debootstrap "debootstrap umount "host" /proc when running in a Docker container" [Normal,Open] | 23:06 |
ianw | yeah, that links to an unmerged pull request | 23:06 |
fungi | well, "merge request" (because it's gitlab), but yep | 23:07 |
ianw | --cap-add SYS_ADMIN might be enough? i sort of thought that a privileged container had that | 23:07 |
clarkb | I need to reboot for overdue system updates. Back in a bit | 23:07 |
fungi | ianw: i thought it did too, but i'm not super confident in my grasp of container implementations | 23:07 |
fungi | the first bug you linked had a runtime hackaround mentioned downthread | 23:08 |
ianw | it must be allowed to do mounts, all the others mount a million things ... | 23:08 |
ianw | 4197 mount("proc", "/proc", "proc", MS_MGC_VAL, NULL) = -1 ELOOP (Too many levels of symbolic links) | 23:09 |
ianw | 4197 write(2, "/proc: mount(2) system call fail"..., 70) = 70 | 23:09 |
ianw | right, it's not failing with a permissions error, but something like the layout issues | 23:10 |
ianw | fungi: is mmdebstrap a complete rewrite? | 23:17 |
ianw | hrm "Debootstrap supports creating a Debian chroot on non-Debian systems but mmdebstrap requires apt and is thus limited to Debian and derivatives." | 23:19 |
clarkb | apt is available on other system though right? | 23:19 |
clarkb | possible that isn't sufficient though | 23:19 |
fungi | ianw: seems like a complete rewrite at least. there's also cdebootstrap, which is a rewrite in c | 23:20 |
ianw | johnsom / cgoncalves are possibly the main people who might care about building debuntu on !debuntu? | 23:21 |
johnsom | debuntu? | 23:22 |
fungi | a colloquial term for the family of debian and debian derivative distributions | 23:23 |
fungi | (such as ubuntu) | 23:23 |
johnsom | Currently Octavia cares about Ubuntu and CentOS/RHEL | 23:23 |
fungi | does octavia use centos/rhel to build ubuntu images with dib? | 23:24 |
johnsom | There might be some shenanigans like that happening, but I don't think it is a *need*. | 23:25 |
ianw | right now i'm thinking import the debootstrap patch into the openstackci ppa and put that in the container | 23:25 |
ianw | we ran with a debootstrap from there (still do maybe on xenial?) | 23:26 |
johnsom | We do use debootstrap for the ubuntu-minimal style builds as the cloud images are ... large and this has been more stable for us (i.e. changing cloud image formats, etc.) | 23:27 |
ianw | yeah i'd prefer to KISS ... i don't think we want to spend time on new debootstrap implementations that might make life hard for existing users of -minimal images | 23:29 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: enable using values file https://review.opendev.org/721365 | 23:34 |
fungi | also mmdebstrap isn't available on old ubuntu versions (prior to 20.04 lts, which isn't even out yet) so, yeah, that would be a challenge for anyone not running from a debian/buster container | 23:44 |
fungi | i agree patching debootstrap is probably the best option for now | 23:44 |
johnsom | I am jumping between a bunch of conversations, so hard for me to track here. Let me know if there is feedback I can give for DIB. (FYI, 20.04 is planned for release this week last I checked). | 23:52 |
fungi | johnsom: nope, that was helpful, thanks | 23:52 |
johnsom | Ok, cool! Easiest conversation I have had today | 23:53 |
fungi | we likely don't want to cause problems for folks building images from older ubuntu releases for a while still | 23:53 |
fungi | i was only considering opendev's use case, as ianw rightly pointed out | 23:53 |
ianw | https://launchpad.net/~openstack-ci-core/+archive/ubuntu/debootstrap/+sourcepub/11197793/+listing-archive-extra | 23:59 |
ianw | this should let us test | 23:59 |
ianw | it's nice to have options; of the three "fixes" available https://salsa.debian.org/installer-team/debootstrap/-/merge_requests/27 seemed the most appropriate | 23:59 |
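
The quota behaviour clarkb describes at 21:09 (the launcher pauses near quota, nodes "delete", nova's quota accounting lags, the launcher unpauses and launches then fail with quota errors) can be illustrated with a toy model. This is only a sketch under stated assumptions: `Provider`, `delete_nodes`, `try_launch` and `run_cycle` are hypothetical names invented for illustration and are not nodepool's actual code or API, and the model does not cover the later "pause immediately again" step, which the log itself only raises as a question.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Toy model of one cloud provider (hypothetical, not nodepool code)."""
    quota: int        # maximum instances the cloud allows
    cloud_usage: int  # what the cloud (e.g. nova) counts against the quota
    local_usage: int  # what the launcher believes is in use

    def delete_nodes(self, count, cloud_updates_quota):
        # The launcher's own view drops as soon as the deletes "finish".
        self.local_usage -= count
        # The cloud's accounting may lag behind and stay unchanged.
        if cloud_updates_quota:
            self.cloud_usage -= count

    def is_paused(self):
        # The launcher pauses when its own view says it is at quota.
        return self.local_usage >= self.quota

    def try_launch(self):
        # The cloud rejects launches based on its own accounting.
        if self.cloud_usage >= self.quota:
            return "quota-error"
        self.cloud_usage += 1
        self.local_usage += 1
        return "ok"


def run_cycle(provider, pending_requests):
    """Attempt each pending request once and record what happened."""
    results = []
    for _ in range(pending_requests):
        if provider.is_paused():
            results.append("paused")
        else:
            results.append(provider.try_launch())
    return results


if __name__ == "__main__":
    stale = Provider(quota=150, cloud_usage=150, local_usage=150)
    stale.delete_nodes(150, cloud_updates_quota=False)
    print(run_cycle(stale, 3))

    fresh = Provider(quota=150, cloud_usage=150, local_usage=150)
    fresh.delete_nodes(150, cloud_updates_quota=True)
    print(run_cycle(fresh, 3))
```

In the first run the launcher considers itself unpaused while the cloud still counts the deleted nodes, so every launch is rejected. In the second run the two views agree and the launches go through.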