opendevreview | Kenny Ho proposed zuul/zuul-jobs master: fetch-output-openshift: add option to specify container to fetch from https://review.opendev.org/c/zuul/zuul-jobs/+/843527 | 00:01 |
---|---|---|
*** dviroel is now known as dviroel|out | 00:04 | |
fungi | infra-prod-base deployments are choking on an apt cache failure for mirror01.regionone.osuosl | 00:24 |
fungi | taking a look | 00:24 |
opendevreview | Kenny Ho proposed zuul/zuul-jobs master: fetch-output-openshift: add option to specify container to fetch from https://review.opendev.org/c/zuul/zuul-jobs/+/843527 | 00:39 |
opendevreview | Ian Wienand proposed opendev/glean master: redhat-ish platforms: write out ipv6 configuration https://review.opendev.org/c/opendev/glean/+/843243 | 01:08 |
opendevreview | Ian Wienand proposed opendev/glean master: redhat-ish platforms: use IPV6_DEFAULTGW https://review.opendev.org/c/opendev/glean/+/843758 | 01:08 |
opendevreview | Ian Wienand proposed opendev/glean master: [wip] handle slaac https://review.opendev.org/c/opendev/glean/+/843777 | 01:08 |
opendevreview | Ian Wienand proposed opendev/glean master: tests: symlink same files together https://review.opendev.org/c/opendev/glean/+/843979 | 01:09 |
ianw | i'm figuring we didn't decide to do this weeks meeting? | 01:23 |
ianw | this glean ipv6 change has been a *lot* more work than i thought it would be. but i'm getting close | 01:23 |
Clark[m] | ianw I sent out an agenda. It is next week that may not happen. I also forgot to put the glean stuff on the agenda | 01:32 |
ianw | oh sorry i haven't even opened mail today, doh | 01:32 |
*** ysandeep|out is now known as ysandeep | 03:31 | |
*** marios is now known as marios|ruck | 05:06 | |
frickler | not sure who still cares about translation-updates, they are broken now since they are still running on bionic, but master reqs now only work for py38 and 39. iirc there was a reason they needed older python? https://zuul.opendev.org/t/openstack/builds?job_name=propose-translation-update&skip=0 | 05:13 |
frickler | fungi: looks like there was really only a repo update in progress, apt update working fine now. but I also wonder whether we actually need /etc/apt/sources.list.d/ddebs.list , doesn't seem to be created by automation. | 06:17 |
frickler | almost completely unrelated: do we have any plans to upgrade the bridge? might be needed/useful in the context of ansible updates | 06:18 |
ianw | frickler: hrm, that translation issue sounds familiar. i found from my notes https://etherpad.opendev.org/p/translation-job-failures-02-2022 | 06:49 |
ianw | istr thinking about moving that to focal and something became too hard about it. i'm afraid i can't remember what | 06:49 |
opendevreview | Ian Wienand proposed opendev/glean master: Revert "Add option to ignore config drive interfaces info" https://review.opendev.org/c/opendev/glean/+/843225 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: write_redhat_interfaces: refactor to walk interfaces first https://review.opendev.org/c/opendev/glean/+/843241 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: write_redhat_interfaces: pass multiple networks to output functions https://review.opendev.org/c/opendev/glean/+/843242 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: _network_info: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: _network_info: refactor to add ipv4 info at the end https://review.opendev.org/c/opendev/glean/+/843367 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: tests: symlink same files together https://review.opendev.org/c/opendev/glean/+/843979 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: redhat-ish platforms: write out ipv6 configuration https://review.opendev.org/c/opendev/glean/+/843243 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: redhat-ish platforms: use IPV6_DEFAULTGW https://review.opendev.org/c/opendev/glean/+/843758 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: redhat-ish platforms: handle ipv6_slaac metadata https://review.opendev.org/c/opendev/glean/+/843995 | 06:52 |
opendevreview | Ian Wienand proposed opendev/glean master: testing: add vexxhost sample config https://review.opendev.org/c/opendev/glean/+/843996 | 06:52 |
ianw | ^ that should be a full stack of things that get ipv6 setup on redhat hosts on our providers | 06:52 |
frickler | mgariepy: seems only one of your holds triggered so far. root@158.69.70.36 is now yours | 06:58 |
*** jpena|off is now known as jpena | 07:37 | |
*** ysandeep is now known as ysandeep|lunch | 08:47 | |
*** ysandeep|lunch is now known as ysandeep | 09:29 | |
*** rlandy|out is now known as rlandy | 10:20 | |
*** rlandy_ is now known as rlandy | 10:52 | |
*** ysandeep is now known as ysandeep|break | 11:03 | |
*** pojadhav is now known as pojadhav|afk | 11:11 | |
*** ysandeep|break is now known as ysandeep | 11:19 | |
mgariepy | thanks frickler i'll look at it . | 11:20 |
*** dviroel|out is now known as dviroel | 11:29 | |
fungi | oh, the mirror01.regionone.osuosl apt cache problem was apparently a bad signature on the debug symbol packages index, but after a few times trying to force it to update the package lists, it updated cleanly. i have a feeling one of the ubuntu mirrors has a corrupt file | 11:46 |
fungi | on ddebs.ubuntu.com | 11:46 |
mgariepy | thanks again a lot for the hold i'm done with these. | 11:49 |
frickler | mgariepy: so should I drop both holds, also the unused one? | 11:52 |
mgariepy | yep it was the same issue. | 11:52 |
frickler | o.k., done, thx for the feedback | 11:53 |
frickler | fungi: yes, I also tried it this morning and it works. but I wonder why we need ddebs at all (iiuc those are source pkgs)? | 11:54 |
fungi | frickler: not source packages, but the debug symbols stripped from built binaries, separately installable as packages | 11:55 |
fungi | that's the modern answer to how they previously used to ship two versions of every library (one stripped, and a larger one with the debug symbols left in) | 11:56 |
fungi | so this way if you get a crash with a core dump, you can just install the relevant ddebs to get symbol resolution when loading it into the debugger | 11:57 |
fungi | but no, i don't think we need the ddeb package repositories on every server by default, we can always add them later or run the debugger in a chroot elsewhere | 11:58 |
frickler | fungi: hmm, o.k., for an aarch64 host the choice of "elsewhere" might be a bit restricted. my guess is that this was installed while debugging afs things, maybe wait for ianw's feedback before cleaning up | 12:01 |
fungi | oh, indeed /etc/apt/sources.list.d/ddebs.list was added or altered a couple of weeks ago on that server | 12:06 |
fungi | we don't normally add that | 12:06 |
fungi | so our other servers don't have it | 12:06 |
fungi | good thinking | 12:06 |
fungi | i agree it was probably added for looking at (openafs?) arm64 binary coredumps | 12:07 |
fungi | which would explain why it was the only server that exhibited the bad signature error when updating the apt cache | 12:07 |
*** ysandeep is now known as ysandeep|afk | 12:53 | |
*** pojadhav|afk is now known as pojadhav | 13:14 | |
*** ysandeep|afk is now known as ysandeep | 13:37 | |
Clark[m] | Apparently openstacksdk 0.99.0 is an RC release for the known to break the world 1.0.0 release. Considering that I'm inclined to pin < 0.99.0 in zuul Ansible installs and see if we fix things. Separately I wonder if the sdk team knows there is a proper way to release an RC without violating server... | 13:37 |
Clark[m] | *violating semver | 13:38 |
fungi | certainly worth a try | 13:39 |
fungi | what did they expect to break with 1.0.0? | 13:39 |
Clark[m] | I don't recall details but they changed a ton of stuff to use a consistent layer internally for API operations (I wonder if that is filtering headers or not applying them for some reason on swift requests). There is an openstack discuss thread about how the openstack Ansible module is keeping an older release that pins yoo | 13:40 |
Clark[m] | I just wish a proper rc was made here. Isn't that part of what the openstack release team is meant to police? | 13:41 |
jrosser_ | there is a very long discussion about why it was apparently not possible to have an RC in the release channel on the 19th | 13:42 |
Clark[m] | But how is anyone supposed to know that 0.99.0 is a breaking change? | 13:44 |
Clark[m] | It should've been 1.0.0 | 13:45 |
Clark[m] | If you can't make a true RC then at least follow semver | 13:45 |
Clark[m] | I think we can pin in zuul. If that confirms to fix swift uploads the next step is debugging why. We may also want to raise an alarm that new sdk silently breaks swift uploads which is a different class of failure than your interface is different or you failed to make a server as there is potential data loss here (though we haven't seen data loss yet just metadata loss) | 13:52 |
fungi | specifically, breaking cors headers for objects uploaded to some swift implementations (rackspace) but not others (ovh) | 13:56 |
Clark[m] | I wonder too if the next nodepool launcher restart will break us | 13:58 |
Clark[m] | In theory nodepool tests against a devstack cloud and major impact there will be caught | 13:58 |
fungi | "the problem is that the validator currently doesn't allow an rc1 release for independent" https://meetings.opendev.org/irclogs/%23openstack-release/%23openstack-release.2022-05-19.log.html | 14:06 |
fungi | so it's just that the release management automation assumes "independent" release model won't use release candidates | 14:06 |
fungi | doesn't sound like there was any technical reason not to do an rc | 14:06 |
fungi | further reading suggests that the reason libs don't "do" release candidates is that it's hard to override constraints lists (which will only have the actual release versions, not prereleases), but that seems like it could be overcome for projects wanting to try it with a dnm change or whatever | 14:12 |
fungi | the original alternative suggestion from the release managers was to just make it 1.0.0 and then release a 2.0.0 if it needed another backward-incompatible fix | 14:14 |
fungi | but the counterargument was that projects don't need to follow semver conventions when they're still 0.x.y "experimental" versions | 14:15 |
Clark[m] | While true that ignores everyone has been told to beware the 1.0.0 release as it will break stuff. A 0.99.0 release looks like last stable release before 1.0.0 not a 1.0.0 rc | 14:19 |
Clark[m] | It's incredibly user unfriendly imo | 14:20 |
Clark[m] | The whole point here is communication and being clear to users not what you can technically get away with | 14:21 |
fungi | yeah, ultimately the guidance was "just release your breaking changes, nobody will use the release candidate so that's the only way to find out what you broke" | 14:22 |
fungi | but that assumes the only consumers of the release are other openstack projects, which is not the case especially where things like openstacksdk and openstackclient are concerned | 14:26 |
corvus | just an update: the merger shutdown fix is approved but is currently in a merging traffic jam... hopefully soon. | 14:33 |
Clark[m] | corvus: I think we should land an openstacksdk cap in zuul as well before the next restart. See above conversation for why. In addition to the swift concerns it also affects Ansible openstack modules which I think some users may use too | 14:34 |
Clark[m] | Even if this doesn't fix the swift issue the cap is appropriate for users of those modules aiui | 14:35 |
corvus | Clark: i have read and agree. you planning on proposing that? | 14:35 |
corvus | (i mean the actual changes) | 14:35 |
Clark[m] | Yes after the school run. I'm happy for someone else to push it too and I can review after | 14:35 |
fungi | well, we don't have empirical evidence to say that openstacksdk broke cors headers for rackspace swift uploads, so this pin would be a bit of a test | 14:35 |
fungi | but i can push the pin changes | 14:36 |
Clark[m] | fungi: correct but we do have a message from the openstack Ansible people saying they don't support this version of sdk with their modules and zuul supports those modules too iirc | 14:36 |
corvus | have we double checked that 0.99 is what is being used to upload them in opendev's zuul? | 14:36 |
Clark[m] | corvus I thought I checked that on Friday but it has been a long time since Friday. Double checking isn't a bad idea | 14:36 |
Clark[m] | And I was debugging several things that day | 14:36 |
corvus | Clark: i will double check that then. sound like it would be good to have 2 eyes on that. i'll await your changes after the school run. :) | 14:37 |
fungi | looks like openstacksdk is used in nodepool/requirements.txt, zuul-jobs/test-requirements.txt, zuul-registry/requirements.txt, zuul-storage-proxy/requirements.txt | 14:37 |
fungi | should we cap it there too? | 14:38 |
fungi | or just in the ansible envs? | 14:38 |
corvus | all those use sdk to talk to swift, fwiw. | 14:39 |
corvus | $ /usr/local/lib/zuul/ansible/2.9/bin/pip freeze|grep -i openstack | 14:44 |
corvus | openstacksdk==0.99.0 | 14:44 |
corvus | that's in a container on ze01. all the executors are running the same image. i believe that's the version of sdk that the log upload module will import. | 14:44 |
corvus | so i believe i've double checked Clark's work that 0.99 is the version we're currently using to upload logs | 14:45 |
*** ysandeep is now known as ysandeep|out | 14:48 | |
fungi | https://review.opendev.org/c/zuul/zuul/+/844090 Temporarily pin OpenStackSDK before 0.99 [NEW] | 14:50 |
*** dviroel is now known as dviroel|lunch | 15:22 | |
clarkb | looks like the ethercalc removal change landed. I'll go ahead and shutdown the server now. fungi remind me we have used poweroff successfully for that? I never remember which command makes the acpi vm stuff happy | 15:43 |
fungi | yes | 15:45 |
clarkb | poweroff has been run | 15:47 |
clarkb | Once the nova api reflects the shutdown I'll start a snapshot | 15:48 |
clarkb | it shows being shutoff now so I have started the snapshot process | 15:52 |
clarkb | fungi: corvus: as far as restarting zuul again goes we need to manually do the mergers since they aren't currently running with the fixed shutdown process. Should we go ahead and just do those? | 15:54 |
corvus | clarkb: not yet -- the stop fix has not merged yet. i'm trying to navigate the test races and get stuff merged, so hopefully soon. | 15:59 |
clarkb | ah | 16:02 |
clarkb | I guess it was just the model api fix that landed that I saw | 16:02 |
fungi | right, but once it merges, yes we still have to manually down the merger containers since they'll be running from the pre-fix images | 16:11 |
fungi | and yes, i agree it may make sense to upgrade the mergers manually and then we can try to run through the entire playbook to see if it will complete without intervention | 16:12 |
clarkb | my only concern with that is if we hit problems like last time and it takes away from summit prep or other needs. But considering we just did restarts and delta between now and then should be small this concern isn't huge | 16:14 |
fungi | yes, and it's only tuesday. i'm not flying until saturday so i'm happy to shift gears to look at this if needed | 16:19 |
clarkb | ya I just always feel swamped before a trip no matter what happens. Its worse now because I'm out of practice | 16:20 |
*** dviroel|lunch is now known as dviroel | 16:24 | |
fungi | same. i'm already half packed | 16:26 |
fungi | just trying to make sure i forget as little as possible | 16:26 |
*** marios|ruck is now known as marios|out | 16:29 | |
clarkb | the ethercalc snapshot appears complete. I'm not in a rush to delete the actual server and its dns records. If we want we can leave the server itself around for a day or two to catch any major problems should they come up? I guess we can cover that in the meeting too | 16:32 |
fungi | sounds good | 16:32 |
corvus | what are our thoughts about providing higher vcpu count nodes? | 16:38 |
fungi | by default, or with alternative labels? | 16:39 |
clarkb | corvus: I think the higher memory nodes also have higher cpu counts. But we have limited availability of that currently | 16:40 |
fungi | some providers have higher cpu count flavors with the same memory limits | 16:40 |
fungi | though for the most part it's the other way around, providers are increasing memory in their flavors relative to cpu count, i think? | 16:41 |
corvus | alternate labels would be fine | 16:41 |
clarkb | ya there isn't a ton of consistency with flavor types across providers. They all vary one way or another | 16:41 |
clarkb | the main issue from the opendev perspective is funneling the majority of jobs into those flavors when they realize they can go faster or install all the things and have more memory etc | 16:42 |
clarkb | then you've effectively reduced your capacity | 16:42 |
corvus | if i went through the providers and came up with a 'high cpu' label for each, would that be a useful change, or a waste of time? | 16:42 |
clarkb | I think the answer to that question boils down to "are we ok with a large portion of our users immediately not using the smaller flavor size anymore" as I think that is a very real risk (particularly due to openstack/devstack oom problems and general throughput struggles) | 16:43 |
corvus | (ie, are there constraints that would prohibit us from adding those to some/all providers?) | 16:43 |
clarkb | if we're ok with cutting capacity by whatever that ratio is then its probably ok | 16:43 |
clarkb | in a way we already do that with multinode jobs and projects that run 50 jobs against a single change | 16:44 |
clarkb | so its probably ok? | 16:44 |
corvus | if openstack wants some constraints, this can be limited per tenant: https://zuul-ci.org/docs/zuul/latest/tenants.html#attr-tenant.allowed-labels | 16:44 |
clarkb | corvus: I know vexxhost has indicated larger flavors are ok/preferable. OVH gives us specific flavors and we'd need to run a size up by them if we do it more globally. I'm not sure of the other clouds off the top of my head | 16:45 |
clarkb | specifically for OVH I think we'd need them to create a new flavor for us | 16:45 |
corvus | okay... i'll take a look at the options and set up an etherca...er...etherpad. | 16:46 |
fungi | well, vexxhost wants us to use higher memory-to-cpu ratio flavors, not the other way around | 16:48 |
fungi | so 8 vcpus with 32gb ram rather than 8gb | 16:49 |
fungi | because their hardware has a similar memory-to-cpu ratio | 16:50 |
clarkb | ah | 16:50 |
corvus | we have min-ram=8000 and 8192 in the same rax provider sections... any idea why the discrepancy? | 16:51 |
fungi | no clue. unless there's a hint from git blame, i have a feeling some of that dates back to when we had only rackspace and hpcloud | 16:53 |
clarkb | I think we may have started at 8192 but then some clouds set ram for 8GB flavors to a "round" number 8000 so then we shifted it | 16:54 |
clarkb | but we may not have shifted it globally or cargo culted the wrong thing? I think 8000 is the value we should use everywhere for 8GB flavors | 16:54 |
fungi | oh, right it's a minimum, so by insisting on 8192 we may have been excluding their "8gb" flavors | 16:54 |
fungi | but i agree, it's worth trying to clean up and make more consistent | 16:54 |
corvus | also, we have a lot of min-ram where we probably don't need it (ie, very specific flavor names) | 16:56 |
fungi | totally in favor of cleaning any of that up | 17:00 |
*** jpena is now known as jpena|off | 17:05 | |
*** rlandy is now known as rlandy|dr_appt | 17:09 | |
corvus | fungi: clarkb https://ethercalc.net/o6pfpf4dhi1v | 17:22 |
corvus | i think i got the current non-arm flavors right there | 17:23 |
corvus | then below i looked up what it would take to get us to 16vcpus | 17:23 |
clarkb | corvus: I left some comments to your questions | 17:24 |
corvus | basically: rax and iweb have flavors we can use... vexxhost has a flavor we are already using. for inmotion, we have a special flavor even though inmotion appears to already have one for 8 | 17:24 |
clarkb | also am curious if there is a 12vcpu or similar in between scaling factor on any of the clouds | 17:24 |
corvus | i don't remember seeing any, but wasn't specifically looking | 17:25 |
corvus | so inmotion is curious -- why do we have a special flavor name there? | 17:25 |
clarkb | corvus: because we manage the flavors in that cloud | 17:28 |
fungi | s/the flavors in// | 17:29 |
clarkb | it starts with whatever defaults nova provides none of which are near what we need | 17:29 |
clarkb | so we added the flavor we needed (and can add others, etc) | 17:29 |
fungi | did we configure the flavor manually, or with automation? | 17:29 |
corvus | oh... i see the difference... our flavors have no ephemeral disk | 17:29 |
corvus | so we should add a new flavor to match that then | 17:29 |
clarkb | ++ | 17:29 |
corvus | oh, we're still in that situation where vexxhost is only providing special flavors | 17:33 |
corvus | we could really use that memory cap... | 17:34 |
clarkb | corvus: memory cap on boot in the instance you mean? | 17:35 |
corvus | yeah | 17:39 |
corvus | the thing we used to do for hpcloud because of the huge ram values | 17:39 |
opendevreview | James E. Blair proposed openstack/project-config master: Add 16 vcpu flavors https://review.opendev.org/c/openstack/project-config/+/844116 | 17:39 |
corvus | that's why we don't have vexxhost used for normal nodes right now, their 8vcpu nodes have 32g of ram; everything else has 8192 or 8000(ovh only) | 17:40 |
clarkb | gotcha | 17:41 |
clarkb | in the past we did it as a kernel boot flag. Which we are not setting again to fix jammy syslog spam in ovh | 17:41 |
corvus | there is much more ram variance for 16vcpu nodes, but since 16gb is the minimum, it's perhaps less of an issue? i dunno; brave new world. | 17:41 |
fungi | s/not/now/ | 17:41 |
clarkb | iirc we had to set the value low enough to fit in ovh, but then it worked on all the providers? | 17:41 |
fungi | yeah, so wouldn't be hard to add | 17:41 |
fungi | we can do that independent of the images, right? | 17:42 |
clarkb | no you have to do it on the image | 17:42 |
clarkb | I don't think openstack allows you to set boot flags for whole image images | 17:42 |
clarkb | (you have to do it with the kernel image split but we don't support that for our images) | 17:42 |
fungi | so we'd need ram-specific images in order to have the same distro work on multiple different flavors | 17:42 |
fungi | that was the downside i'm remembering | 17:42 |
clarkb | yes. I think this is why we haven't done it again | 17:43 |
fungi | basically doubles our image count | 17:43 |
clarkb | or we limit these new labels to an image or two to make it reasonable | 17:43 |
corvus | oof | 17:43 |
corvus | well, it could be argued that even though there's more ram variance for the 16c flavors, since the minimum is higher it's less of a problem? | 17:44 |
corvus | i still think it's def a problem for the 8c flavors though -- i mean, we absolutely had jobs that worked in vexxhost under 32g that did not work anywhere else under 8g | 17:44 |
clarkb | ya the delta is smaller | 17:45 |
corvus | (if 32g were the minimum, i would be in the camp of "it almost certainly doesn't matter". with 16g the minimum, i'm at "it probably doesn't matter") | 17:45 |
corvus | (and at 8g i'm at "we've seen it matter") | 17:46 |
fungi | when my current meeting wraps up i have to go run some quick errands, but should hopefully be back in time for the opendev meeting | 17:53 |
fungi | back just in the nick of time | 18:58 |
clarkb | I got a drink of flavored fizzy water just in time :) | 18:59 |
mgariepy | hello, can i have a hold on : --project=opendev.org/openstack/openstack-ansible-galera_server --job=openstack-ansible-deploy-infra_lxc-centos-9-stream --ref=refs/changes/37/844037/1 | 19:00 |
fungi | mgariepy: sure, gimme a sec since the meeting is just starting | 19:01 |
mgariepy | whenever you can i'll probably just take a look tomorrow. | 19:01 |
fungi | still trying to reproduce a galera issue? | 19:01 |
fungi | mgariepy: that autohold is set now | 19:02 |
mgariepy | nop that one is another galera issue on c9s. that i have difficulty reproducing over here.. i don't even get to that point :/ | 19:02 |
mgariepy | the other day my patch wasn't quite right.. string vs list didn't render well in the file :) | 19:03 |
mgariepy | it's now fixed tho. | 19:03 |
opendevreview | Ian Wienand proposed opendev/glean master: testing: add vexxhost sample config https://review.opendev.org/c/opendev/glean/+/843996 | 19:08 |
*** rlandy|dr_appt is now known as rlandy | 19:56 | |
clarkb | alright lunch now but when I get back I'll try to review those glean changes | 20:02 |
clarkb | ianw: I guess we can hold off on approving things and you can approve what you think is appropriate for your release plan? | 20:02 |
ianw | if that works for you, sure. i'll make some extra notes in the reviews | 20:03 |
clarkb | yup I'm happy to +2 and have you land what you think is ready for tagging (that way we don't land the whole stack and end up reverting stuff oddly) | 20:03 |
clarkb | and your plan sounded great to me | 20:03 |
*** dviroel is now known as dviroel|afk | 20:25 | |
mgariepy | fungi, the instance seems to be stuck on a task. does the hold also apply when the tasks timeout ? | 20:49 |
clarkb | ianw: question on https://review.opendev.org/c/opendev/glean/+/843243 | 20:52 |
clarkb | mgariepy: yes timeout should be treated like a failure and get held | 20:52 |
mgariepy | ok cool thanks | 20:53 |
clarkb | ianw: and on the next one after that | 20:55 |
clarkb | ya left a few notes but overall lgtm | 21:04 |
Clark[m] | And now a school run | 21:12 |
fungi | Clark[m]: https://review.opendev.org/843411 is another refactor/cleanup in the same topic, but branches off the rest of the series | 21:32 |
ianw | oh, hrm, that is supposed to be in the stack | 21:39 |
ianw | i think that might have fallen out accidentally on a rebase | 21:40 |
opendevreview | Ian Wienand proposed opendev/glean master: tests: stub out fields https://review.opendev.org/c/opendev/glean/+/844143 | 22:02 |
opendevreview | Merged opendev/glean master: Revert "Add option to ignore config drive interfaces info" https://review.opendev.org/c/opendev/glean/+/843225 | 22:02 |
opendevreview | Merged opendev/glean master: write_redhat_interfaces: refactor to walk interfaces first https://review.opendev.org/c/opendev/glean/+/843241 | 22:02 |
*** rlandy is now known as rlandy|bbl | 22:04 | |
corvus | i believe all the zuul changes we were waiting on have merged. we can manually restart the mergers now, then perform another rolling restart of the whole system. | 22:13 |
corvus | clarkb: ^ | 22:13 |
Clark[m] | Cool. I should be home soon from the school run and can help coordinate that | 22:16 |
fungi | i'm around for a while this evening too if things go sideways | 22:17 |
corvus | i'll go ahead and restart the mergers | 22:18 |
corvus | (side effect: all zuul hosts will have the latest images locally too) | 22:19 |
fungi | noted, thanks | 22:22 |
opendevreview | Merged opendev/glean master: write_redhat_interfaces: pass multiple networks to output functions https://review.opendev.org/c/opendev/glean/+/843242 | 22:25 |
Clark[m] | Any concern with the package updates interfering with auto updates? | 22:25 |
Clark[m] | Or just send it and cross that bridge if it happens | 22:26 |
corvus | Clark: can you elaborate? | 22:27 |
corvus | #status log restarted zuul mergers on 6.0.1.dev54 7842e3fcf10e116ca47cfffbd82022802b53432d which includes merger graceful fix in preparation for rolling restart | 22:28 |
opendevstatus | corvus: finished logging | 22:28 |
corvus | we're gtg on the rolling restart whenever ^ | 22:28 |
fungi | as mentioned in the meeting, concern that our playbook doing apt update/upgrade may happen coincident with unattended-upgrades doing the same, leading to errors which cause the playbook to abort early | 22:30 |
fungi | i say we not worry about it unless it starts hitting us, otherwise we probably need some sort of lock/wait mechanism | 22:30 |
corvus | sounds reasonable, especially since we're doing it so slowly -- worst case is we stop and we're down one host until we fix it | 22:31 |
corvus | at least with the gui package stuff, i think there's some way to wait until unattended-upgrades is done... maybe there's a command we can throw in to do that | 22:32 |
opendevreview | Merged opendev/glean master: _network_info: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353 | 22:35 |
opendevreview | Merged opendev/glean master: _network_info: refactor to add ipv4 info at the end https://review.opendev.org/c/opendev/glean/+/843367 | 22:35 |
corvus | or maybe we just throw the update tasks into a big retry loop? | 22:37 |
clarkb | ya something like that may work. | 22:40 |
clarkb | I'll go ahead and start the playbook now in a root screen on bridge | 22:40 |
clarkb | `time ansible-playbook -f20 /home/zuul/src/opendev.org/opendev/system-config/playbooks/zuul_reboot.yaml 2>&1 | tee zuul_reboot.log.20220531` is the command | 22:41 |
clarkb | and it is running | 22:41 |
opendevreview | Merged opendev/glean master: tests: symlink same files together https://review.opendev.org/c/opendev/glean/+/843979 | 22:44 |
clarkb | ze01 is down to 2 running jobs. Of course those can run for another couple of hours | 23:23 |
fungi | and up to 6 i think? or more if one is paused | 23:27 |
clarkb | I think its max about 3 hours and 45 minutes? | 23:28 |
fungi | oh, i guess we have a shorter max timeout on post-run | 23:29 |
clarkb | but ya I'm monitoring it. I'd like to see the first one get done before I pop out for dinner but if not I can always check in after | 23:34 |
fungi | i'm watching it too, if you need to disappear for a while | 23:37 |
clarkb | now down to one. ya in about 10-15 minutes probably | 23:39 |
clarkb | ze01 is done and logging into it the packages updated and the executor seems to be running stuff | 23:52 |
clarkb | I think that means the package updates are generally fine. Now we wait and see how the rest do | 23:52 |
fungi | yep, lgtm | 23:53 |
clarkb | ok this looks happy enough for me to grab dinner and start to wind down. | 23:58 |
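
The min-ram exchange between 16:51 and 16:54 (why 8000 and 8192 both appear, and why insisting on 8192 may exclude a provider's "8gb" flavor) can be illustrated with a small sketch. This is a simplified model, not nodepool's actual selection code: the `Flavor` class, the `pick_flavor` helper, and the flavor names and sizes are all hypothetical, and it assumes that a minimum-RAM filter ends up picking the smallest flavor that satisfies it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Flavor:
    """Hypothetical stand-in for a cloud flavor (RAM in MB)."""
    name: str
    ram_mb: int
    vcpus: int


def pick_flavor(flavors: Iterable[Flavor], min_ram: int) -> Optional[Flavor]:
    """Return the smallest flavor with at least min_ram MB, or None.

    Assumption: "smallest that fits" approximates what a min-ram
    setting does; the real launcher logic may differ in detail.
    """
    candidates = [f for f in flavors if f.ram_mb >= min_ram]
    return min(candidates, key=lambda f: (f.ram_mb, f.vcpus), default=None)


# Made-up provider that rounds its "8GB" flavor down to 8000 MB.
FLAVORS = [
    Flavor("hypothetical-4g", 4000, 4),
    Flavor("hypothetical-8g", 8000, 8),
    Flavor("hypothetical-16g", 16000, 16),
]

if __name__ == "__main__":
    for min_ram in (8000, 8192):
        chosen = pick_flavor(FLAVORS, min_ram)
        print(min_ram, "->", chosen.name if chosen else None)
```

With these made-up numbers, a minimum of 8000 selects the 8 vcpu flavor, while 8192 skips it and lands on the 16 vcpu one, which is the kind of silent mismatch the cleanup discussed above would avoid.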
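
The "big retry loop" idea floated at 22:37, for the case where the playbook's apt update/upgrade collides with unattended-upgrades, might look roughly like the following. It is only a sketch under stated assumptions: `retry` and `LockedThenFree` are hypothetical names, the package-manager lock is simulated rather than real, and the actual playbook would express this in Ansible rather than Python.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 5, delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action() until it succeeds or the attempts are used up.

    Only RuntimeError is treated as "lock busy, try again"; the last
    error is re-raised once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RuntimeError:
            if attempt == attempts:
                raise
            sleep(delay)  # wait for the other updater to finish
    raise AssertionError("unreachable")


class LockedThenFree:
    """Simulated update step that fails while a lock is 'held'."""

    def __init__(self, busy_calls: int) -> None:
        self.busy_calls = busy_calls
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.busy_calls:
            raise RuntimeError("could not get lock (simulated)")
        return "updated"


if __name__ == "__main__":
    step = LockedThenFree(busy_calls=2)
    print(retry(step, attempts=5, delay=0.1), "after", step.calls, "calls")
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. Bounding the number of attempts keeps the worst case in line with what was described in the channel: the run stops and one host stays down until someone looks at it.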