clarkb | I assume then that centos 8 and other distros that work in rax where static networking is required are converting the move event to an add then? | 00:00 |
---|---|---|
clarkb | I think that is related to my remaining concern with triggering on a move. If we do that will we run twice? I guess that is ok because glean will see the config already exists and noop exit | 00:00 |
ianw | i would say that's right. systemd/udev sends a netlink message IFLA_IFNAME to trigger the rename. the kernel handles that, and eventually sends the udev event via device_rename->kobject_rename, where udev gets that event | 00:00 |
ianw | ... where systemd-udevd gets ... | 00:01 |
ianw | ... while interesting, i don't think really any closer to understanding what events should be triggered her | 00:05 |
ianw | here | 00:05 |
clarkb | well I think that says the move event is expected | 00:07 |
clarkb | what we don't know is if we should get an add after the move | 00:07 |
ianw | i wonder if we should do something with DEVPATH_OLD in glean-udev.rules | 00:07 |
ianw | clarkb: yes, agree on that | 00:07 |
clarkb | considering that the comments seem to indicate that moves are largely just for renaming network interfaces I think we can fairly safely match on move too? | 00:10 |
clarkb | then as long as matching move and add (say if some distros proxy move to an add) is safe then we're ok | 00:10 |
ianw | clarkb: yeah, i think there's enough due-diligence that we don't think this is an upstream issue but something we're missing, and so adding |move (and maybe |change?) is where we're at | 00:16 |
clarkb | well I think upstream may need to add the add event as there is enough implication that that was existing behavior and is behavior in other distros? | 00:23 |
clarkb | I mean I guess it isn't strictly required that that dod that since we can match on move | 00:24 |
clarkb | *that they do that | 00:24 |
clarkb | but it is annoying when behaviors like this randomly change | 00:24 |
clarkb | udev should be fairly stable imo | 00:24 |
ianw | what i might try is bring up a plain/from upstream qcow centos 8-stream and 9-stream, run a udev trace and rename the devices via "ip" commands? | 00:25 |
ianw | that should give us a trace of the events? if that differs, we know something's up | 00:25 |
clarkb | ianw: you might also just compare our centos 8 stream boot to centos 9? Also `udevadm info /sys/class/net/ens3` is useful | 00:26 |
clarkb | there should be a SYSTEMD_WANTS value in there if it is working | 00:27 |
clarkb | ianw: any reason for me to keep my first test node up in bhs1 or should I delete it now? | 00:27 |
ianw | umm, that is the node with the "bad" environment, right? | 00:27 |
clarkb | ianw: it's the one that I modified to have ACTION=="add|move" to test that that fixed things | 00:28 |
clarkb | so it is happy now I think | 00:28 |
clarkb | but you can revert that rules file and reboot to generate current behavior from the image | 00:28 |
ianw | ahh maybe keep it, just until we either decide to file a bug report or do that | 00:28 |
ianw | i'm just thinking get glean out of the picture for the rename trace; to avoid any confusion for a potential bug report | 00:28 |
clarkb | ah yup that makes sense | 00:29 |
clarkb | ok I'll keep the instance up. I can delete it tomorrow if this is concluded | 00:29 |
ianw | i'll try and get two traces of a manual rename now. i feel like that's the best way forward to decide where things might have changed | 00:29 |
clarkb | but now I need to figure out what our dinner plan is | 00:30 |
ianw | ok, renaming only issues a MOVE udev event on centos-8 | 02:25 |
mgagne | FYI I restarted some services on iweb cloud. that could explain why some instances were stuck in deleting state. | 03:18 |
ianw | clarkb: i have a theory -> https://etherpad.opendev.org/p/centos-9-glean-renaming | 03:58 |
ianw | it seems that on RAX centos-8 we are *not* renaming the devices. they are eth0/eth1 -- and glean is working | 03:58 |
ianw | my theory is that there is a driver issue there with /sys/class/net/eth0/name_assign_type returning an invalid value. this is what systemd/udev uses to decide if the device should be renamed. | 04:00 |
ianw | it's probing that value, getting -EINVAL or whatever and failing out without renaming | 04:00 |
ianw | i'm assuming that is fixed on 9-stream either in the kernel or systemd/udev and now it *is* deciding to rename | 04:01 |
ianw | and then we fall back to what you've already discovered; that we are not matching the "move" event and thus not setting up glean correctly | 04:02 |
ianw | however, on other clouds, we haven't noticed because, as you note, they fall back to dhcp. the only xen+!dhcp combo we have is RAX, and because the interface wasn't being renamed from eth0 we just happened to work | 04:03 |
ianw | to add extra confusion to all this, the upstream .qcow2 images set net.ifnames=0 on their default command line. so they don't rename interfaces, by design. we do not set that anywhere | 04:04 |
*** Guest318 is now known as diablo_rojo_phone | 04:06 | |
*** diablo_rojo_phone is now known as diablo_rojo | 04:10 | |
*** ysandeep|out is now known as ysandeep | 05:09 | |
*** akahat is now known as akahat|rover | 06:35 | |
*** jpena|off is now known as jpena | 07:11 | |
*** pojadhav- is now known as pojadhav | 08:13 | |
*** ysandeep is now known as ysandeep|lunch | 08:31 | |
*** ysandeep|lunch is now known as ysandeep | 09:15 | |
*** hjensas is now known as hjensas|out-sick | 09:52 | |
*** marios is now known as marios|food|biab | 09:54 | |
*** marios|food|biab is now known as marios | 10:26 | |
*** prometheanfire is now known as Guest928 | 10:40 | |
*** dviroel|afk is now known as dviroel | 11:22 | |
*** ysandeep is now known as ysandeep|afk | 11:51 | |
*** soniya29 is now known as soniya29|afk | 12:04 | |
*** pojadhav- is now known as pojadhav | 12:18 | |
opendevreview | Jeremy Stanley proposed opendev/base-jobs master: No longer special-case CentOS Stream https://review.opendev.org/c/opendev/base-jobs/+/836181 | 12:47 |
fungi | infra-root: ^ semi-urgent fix, looks like the centos-8 wheel cache removal has broken things for centos-8-stream nodes | 12:48 |
*** artom_ is now known as artom | 13:17 | |
*** ysandeep|afk is now known as ysandeep | 13:28 | |
Clark[m] | fungi: I'm not sure that change is correct. That is for the actual distro mirroring and CentOS-8 stream is adjacent to CentOS 8 iirc. Then with CentOS 9 stream they changed roots and our mirrors accommodated that and that is what those conditions reflect | 13:56 |
Clark[m] | http://mirror.dfw.rax.opendev.org/centos/8-stream/ is one root and http://mirror.dfw.rax.opendev.org/centos-stream/9-stream/ the other | 13:57 |
Clark[m] | fungi: I think https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/defaults/main.yaml#L12 sets the wheel mirror path | 14:08 |
Clark[m] | Looks like the two CentOS files in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/vars override the default value though | 14:12 |
Clark[m] | I can take a closer look in about an hour | 14:13 |
fungi | oh, for some reason i thought the wheel mirror urls were under our control | 14:42 |
fungi | i'm back now and can work on an alternative patch | 14:42 |
fungi | that's going to get weird though since it's not just configuring our system, so need to make sure i keep it backward-compatible for other users | 14:43 |
fungi | the wheel_mirror override in vars/CentOS.yaml is identical to the one in vars/CentOS-9.yaml | 14:48 |
fungi | i wonder if this means we're actually putting the centos 8 wheels in the wrong place and we need to move it? | 14:49 |
*** ysandeep is now known as ysandeep|out | 14:53 | |
fungi | oh, we don't even have a centos stream 9 wheel cache | 14:54 |
fungi | so we can't reason about what is or isn't working for 9 | 14:54 |
fungi | should we try to override zj's configure-mirrors vars for centos from where we include the role in our base job? | 14:56 |
Clark[m] | Have we verified the Ansible vars we use lack -stream in them? | 14:57 |
Clark[m] | I think we might be able to include that if we can check a var for the info | 14:57 |
fungi | the jobs running on centos-8-stream nodes are looking for wheels in the url without -stream in it | 14:57 |
fungi | i'll see if centos-9-stream is trying similar | 14:58 |
Clark[m] | Those bars are recorded by zuul in the host info file | 14:58 |
Clark[m] | *vars | 14:58 |
fungi | unfortunately i'll have to find an actual example because the logs provided in #openstack-infra were via paste not zuul build results | 14:59 |
fungi | Clark[m]: which host info file? zuul-info/inventory.yaml or something else? | 15:05 |
fungi | oh, i'm blind. zuul-info/host-info.centos-9-stream.yaml et cetera | 15:06 |
Clark[m] | Ya those files have all the Ansible facts and that is what the role I linked uses to construct that var | 15:07 |
fungi | confirmed, ansible_distribution_major_version is just '8' or '9' with no -stream on the end | 15:07 |
fungi | https://zuul.opendev.org/t/openstack/build/3777ce57d6024b668499d433a3ebd93a/log/zuul-info/host-info.centos-8-stream.yaml#162 | 15:07 |
fungi | https://zuul.opendev.org/t/openstack/build/a3674b7b2fe24f7895ed25851b5f9b8d/log/zuul-info/host-info.centos-9-stream.yaml#130 | 15:07 |
fungi | so we need to override wheel_mirror to "{{ http_or_https }}://{{ mirror_fqdn }}/wheel/{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}-stream-{{ ansible_architecture | lower }}" for all centos versions | 15:08 |
*** dviroel is now known as dviroel|lunch | 15:09 | |
Clark[m] | Or modify the role to add -stream if that is detectable | 15:10 |
fungi | in zuul-jobs? | 15:10 |
fungi | the problem is that what wheel_mirror should be is determined by where you're publishing your wheels | 15:11 |
fungi | and there's nowhere to "detect" that choice | 15:11 |
fungi | so we can either change it for everyone who's using configure-mirror and possibly break their systems if they were already putting wheels where that role expects them, or we can override our use of the role to point to where we publish our wheels, or we can move our wheels to the location the role expects to find them | 15:12 |
Clark[m] | Right but we might be able to detect if the distro is a stream release which is distinct to the old rhel clone releases | 15:12 |
Clark[m] | 8 != 8-stream I think it is fair to update zuul-jobs here if we can tell it is 8-stream and not 8 | 15:13 |
fungi | the role expects centos stream 8 wheels to be in centos-8 and centos stream 9 wheels to be in centos-9 | 15:13 |
Clark[m] | Yes but that is wrong because wheels for stream don't always work on not stream | 15:13 |
Clark[m] | They need to be distinct to be correct | 15:13 |
fungi | anybody who's been using it for centos stream 9 (we haven't) may be publishing them to centos-9 as the url | 15:13 |
fungi | changing it in the role in zuul-jobs could break any existing consumers who are publishing their wheels where the role currently expects to find them | 15:14 |
fungi | distinguishing between 8 and stream 8 may be a thing, but there is and was no 9 separate from stream 9 | 15:15 |
Clark[m] | Yes, that is a risk. And I agree 8 vs 9 is a different situation. I'd personally be inclined to say meh here. And tell the job to install libvirt-dev. It never worked for stream anyway except by random chance for a while | 15:16 |
fungi | also since centos linux 8 (non-stream) is no longer a thing, it would make just as much sense to fix the wheel cache urls/paths we're publishing | 15:16 |
clarkb | ya we could just put a symlink in place maybe | 15:21 |
fungi | i don't think changing the centos default in zuul-jobs can be safely done without notifying potential users in advance at the very least, which leaves either reorganizing our mirrors (volume/mount rename, symlink, whatever), or overriding the var in our base job | 15:22 |
clarkb | or accepting it never worked anyway and jobs need to adjust. | 15:22 |
clarkb | But since a symlink is simple enough I could see just doing that and maybe make a note of the weirdness somewhere | 15:23 |
fungi | yeah, i don't see any reason to assume it never worked for other users of that role, it just didn't work for us because we put our mirror at a different place than the role expects to find it | 15:24 |
clarkb | ya I mean tell opendev users there is no centos wheel mirrors anymore | 15:24 |
clarkb | which means installing libvirt-dev etc (and maybe they can pressure their colleagues to finally publish to pypi) | 15:25 |
fungi | oh, that | 15:25 |
fungi | and yes, we have no centos stream 9 wheel cache anyway since it's never been added | 15:25 |
clarkb | ianw: that all makes a ton of sense. I'll try to look into addressing the python27 testing for glean | 15:30 |
fungi | okay, after a bit of hunting, this is where we decide on what path to publish the centos wheels under: https://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/wheel-mirror.yaml#L14-L23 | 15:38 |
opendevreview | Clark Boylan proposed opendev/glean master: Handle udev move events in addition to add events https://review.opendev.org/c/opendev/glean/+/836103 | 15:39 |
opendevreview | Clark Boylan proposed opendev/glean master: Install older voluptuous on py27 due to import error https://review.opendev.org/c/opendev/glean/+/836194 | 15:39 |
clarkb | fungi: I think our two options if we wish to keep the mirror are either to put a symlink in place or in our base jobs set the wheel_mirror var to include -stream on our centos machines | 15:41 |
*** marios is now known as marios|out | 15:42 | |
fungi | or rename the two volumes to what the configure-mirrors role expects and change the publish playbook to match | 15:44 |
fungi | which would basically just be a simple revert of https://review.opendev.org/803411 | 15:45 |
clarkb | well no because centos 8 wheels were published to centos-8 and now we need centos-8-stream to be published there | 15:47 |
clarkb | that does give a hint at how we can override wheel_mirror though using ansible_lsb.id to check if we are on stream | 15:47 |
fungi | centos 8 wheels were published to centos-8 but we deleted them | 15:49 |
fungi | backwards compatibility there is unnecessary | 15:49 |
clarkb | correct but if you just revert that change it will try to publish centos-8 wheels to centos-8? | 15:49 |
fungi | yes, which is exactly where the configure-mirrors role expects to find them | 15:50 |
clarkb | I think we have to keep the new jobs but change the slug definition. So not a proper revert | 15:50 |
clarkb | revert https://review.opendev.org/c/openstack/project-config/+/803411/4/playbooks/publish/wheel-mirror.yaml but not the rest of the change | 15:50 |
fungi | oh, right i meant revert 803411's change of the playbooks/publish/wheel-mirror.yaml file only | 15:50 |
fungi | i'll push that up for discussion, and if others agree i can work on the corresponding volume moves | 15:54 |
*** dviroel|lunch is now known as dviroel | 16:00 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Match configure-mirrors for CentOS wheel URLs https://review.opendev.org/c/openstack/project-config/+/836200 | 16:14 |
fungi | clarkb: ^ something like that | 16:15 |
clarkb | ya I think that will do it if we also update afs | 16:15 |
clarkb | might be good to double check with ianw since he had pushed on it but that would wait for ~sunday our time | 16:16 |
fungi | dpawlik: that region has no instance flavors with 8 vcpus and only 4gb ram, so i'm resizing to v2-highcpu-8 which has 8 vcpus with 8 gb ram instead | 16:43 |
fungi | are you ready for me to initiate the resize? | 16:43 |
*** rlandy is now known as rlandy|biab | 16:55 | |
dpawlik | fungi: hey, gimme 2 min | 17:02 |
fungi | sure thing, there's no hurry | 17:03 |
fungi | just want to make sure you're not in the middle of something when it reboots into the new flavor | 17:03 |
dpawlik | fungi: exactly, I need to stop logscraper | 17:03 |
dpawlik | few logs left | 17:04 |
dpawlik | we can go | 17:04 |
*** jpena is now known as jpena|off | 17:04 | |
dpawlik | fungi: ping me after, I will check how it works | 17:09 |
fungi | resizing now | 17:20 |
fungi | sorry for the delay, stepped away for a few | 17:20 |
fungi | dpawlik: console log says it's booted. once you give me the thumbs-up i'll mark it confirmed | 17:21 |
dpawlik | fungi: works. Thank you | 17:26 |
fungi | awesome, the latest resize is marked as confirmed now | 17:26 |
dpawlik | hope it will be the last time | 17:29 |
dpawlik | if not, need to think about some other language for parsing time and message | 17:30 |
*** rlandy|biab is now known as rlandy | 17:45 | |
fungi | well, we had 20 servers running 4 concurrent preprocessors each for the old solution | 17:46 |
dpawlik | fungi: there are a few data nodes and master nodes on Opensearch side, but just one host that is getting the logs, parsing and sending them... | 18:44 |
dpawlik | today I see that there are a lot of devstack jobs, that require more time to parse one build than usual... | 18:45 |
dpawlik | Have a good weekend | 18:47 |
fungi | thanks, you too! | 18:58 |
fungi | and yes, now that the release is done, development activity on openstack is picking back up. after next week (ptg) i expect it to accelerate further | 18:58 |
clarkb | https://review.opendev.org/c/opendev/glean/+/836103/2 the glean stack to try and address the centos 9 udev stuff has +1's from zuul now. I believe we consume that from releases? Maybe if we can land those changes today then early next week we can tag glean 1.20.1 ? | 19:31 |
fungi | yeah, i think it needs a release before dib will pick it up, but anyway i approved the change just now | 19:35 |
clarkb | fungi: the parent will need review too | 19:36 |
fungi | oh | 19:36 |
clarkb | I suspect the parent is safe to land with single reviewer approval since it is just a dep cap | 19:36 |
fungi | single-core approved it just now, yep | 19:36 |
clarkb | great, hopefully that addresses the node failure problems by giving us more clouds to boot in. fwiw I think the fix in bhs1 no valid hosts and adding gra1 back again helped a lot too | 19:37 |
*** dviroel is now known as dviroel|brb | 20:05 | |
opendevreview | Merged opendev/glean master: Install older voluptuous on py27 due to import error https://review.opendev.org/c/opendev/glean/+/836194 | 20:53 |
opendevreview | Merged opendev/glean master: Handle udev move events in addition to add events https://review.opendev.org/c/opendev/glean/+/836103 | 20:53 |
*** dviroel|brb is now known as dviroel | 21:16 | |
fungi | #status log Restarted Zuul executors for kernel and docker updates, now running on Zuul 5.2.2.dev2 (08348143) | 22:38 |
opendevstatus | fungi: finished logging | 22:38 |
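
For readers following the udev part of the discussion, here is a rough Python sketch of why switching the rule from `ACTION=="add"` to `ACTION=="add|move"` matters. It is not glean or udev code: `udev_match` is a hypothetical helper that only approximates udev's value matching (`|`-separated alternatives, shell-style globs), and the event list assumes what the log reports, namely that a rename issued only a `move` event.

```python
from fnmatch import fnmatchcase


def udev_match(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if value matches any '|'-separated glob.

    This is a simplification; it splits naively on '|' and does not
    handle a '|' inside a bracket expression.
    """
    return any(fnmatchcase(value, alt) for alt in pattern.split("|"))


# Assumption taken from the log: renaming an interface produced only "move".
events = ["move"]

old_rule = "add"       # the match before the change
new_rule = "add|move"  # the match tested on the bhs1 node

old_hits = [e for e in events if udev_match(old_rule, e)]
new_hits = [e for e in events if udev_match(new_rule, e)]

print("old rule fires on:", old_hits)
print("new rule fires on:", new_hits)
```

If a distro were to emit both `add` and `move`, the new rule would fire twice, which is the double-run case clarkb expects glean to handle by exiting when the config already exists.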
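
The wheel cache mismatch can also be sketched in Python. This is only an illustration of the URL shape fungi quotes, not the configure-mirrors role itself: `wheel_mirror_url` is a hypothetical function, and the `https` scheme and `x86_64` architecture are assumed values. The mirror hostname is one that appears in the log.

```python
def wheel_mirror_url(scheme, mirror_fqdn, distribution, major_version,
                     architecture, stream=False):
    """Hypothetical helper mirroring the wheel_mirror template in the log."""
    slug = f"{distribution.lower()}-{major_version}"
    if stream:
        # The override discussed in the log inserts "-stream" here.
        slug += "-stream"
    return f"{scheme}://{mirror_fqdn}/wheel/{slug}-{architecture.lower()}"


# Assumed fact values; the log confirms only that the major version
# is a bare '8' or '9' with no "-stream" suffix.
facts = {
    "scheme": "https",
    "mirror_fqdn": "mirror.dfw.rax.opendev.org",
    "distribution": "CentOS",
    "major_version": "8",
    "architecture": "x86_64",
}

role_default = wheel_mirror_url(**facts)
with_stream = wheel_mirror_url(**facts, stream=True)

print(role_default)
print(with_stream)
```

The first URL is where jobs on centos-8-stream nodes were looking; the second is the shape of the path the wheels were being published under, which is why either the variable or the publish location has to change.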