*** rlandy|bbl is now known as rlandy|out | 00:45 | |
*** ykarel_ is now known as ykarel | 04:51 | |
*** ysandeep|out is now known as ysandeep|rover | 04:52 | |
*** pojadhav|afk is now known as pojadhav | 05:10 | |
*** ysandeep|rover is now known as ysandeep|rover|brb | 05:51 | |
*** ysandeep|rover|brb is now known as ysandeep|rover | 05:56 | |
*** jpena|off is now known as jpena | 07:36 | |
*** ysandeep|rover is now known as ysandeep|rover|lunch | 09:31 | |
*** ysandeep|rover|lunch is now known as ysandeep|rover | 10:09 | |
*** rlandy|out is now known as rlandy | 10:27 | |
*** dviroel|out is now known as dviroel | 11:26 | |
*** rlandy is now known as rlandy|mtg | 11:26 | |
*** ysandeep|rover is now known as ysandeep|rover|afk | 11:33 | |
*** rlandy|mtg is now known as rlandy | 12:02 | |
*** ysandeep|rover|afk is now known as ysandeep|rover | 12:26 | |
frickler | infra-root: cf. https://review.opendev.org/c/zuul/zuul/+/837852/5/doc/source/client.rst how would you currently enqueue a patch into gate, for example? the zuul-client on zuul02 doesn't know the --trigger option; it does work without it, though. do we need to do something before zuul moves on? | 12:27 |
frickler | also for reference I enqueued 842532,1 into gate to speed up unblocking devstack, is that worth a status log? | 12:30 |
fungi | frickler: i do often #status log manual actions like that, just to serve as a clear record | 12:36 |
frickler | #status log enqueued 842532,1 into gate to speed up unblocking devstack | 12:41 |
fungi | frickler: i have this in my command history on zuul02: | 12:44 |
fungi | sudo zuul-client enqueue --tenant=openstack --pipeline=check --project=openstack/placement --change=825849,1 | 12:44 |
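For the gate enqueue frickler asked about, the same command works with only the pipeline swapped; the project and change below are simply reused from fungi's example for illustration:

    sudo zuul-client enqueue --tenant=openstack --pipeline=gate --project=openstack/placement --change=825849,1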
frickler | fungi: yes, that is what I used, but it seems from the above patch that that is deprecated and zuul-admin should be used now? or maybe I read that wrong | 12:47 |
fungi | oh, i thought it was `zuul` being deprecated in favor of `zuul-admin` | 12:49 |
frickler | I just noticed a regression in gerrit: if I hit rebase and start typing a patch ID or text, it shows patches from all projects, not just from the project the patch is against. | 12:50 |
frickler | fungi: there's a note added that says: For operations related to normal workflow like enqueue, dequeue, autohold and promote, the `zuul-client` CLI should be used instead. | 12:51 |
frickler | but later on there are still examples with zuul-admin for enqueue etc. | 12:51 |
frickler | but maybe we should discuss on that patch rather than here | 12:52 |
fungi | that seems like it might be a typo | 12:52 |
fungi | yeah | 12:52 |
fungi | frickler: oh! | 12:55 |
fungi | i get it now | 12:55 |
fungi | the enqueue, dequeue, autohold and promote subcommands are being retained for now for backward compatibility, so the zuul (not zuul-client) documentation about them is being updated to indicate that you need to run zuul-admin instead of just zuul | 12:56 |
fungi | it's not saying "use `zuul-admin enqueue` instead of `zuul-client enqueue`" but rather "...instead of `zuul enqueue`" | 12:57 |
frickler | fungi: ah, o.k., then I really read this the wrong way around and we should be fine with what we are doing | 13:02 |
fungi | i added a recommendation in a review comment to hopefully reduce that point of confusion | 13:03 |
fungi | if you look at the change, it will print a deprecation note "Warning: this command is deprecated with zuul-admin, please use `zuul-client` instead" | 13:05 |
frickler | yes, maybe then the docs should also be updated to not show deprecated examples, I'll add a comment with that on the patch | 13:06 |
fungi | that's basically what my comment suggested | 13:06 |
fungi | er, well i suggested each subcommand's documentation entry mention it's deprecated, but you're right we probably should also drop the examples from those entries | 13:07 |
fungi | i could go either way on keeping examples for deprecated options until they're actually removed | 13:08 |
fungi | as long as we make it clear they're deprecated | 13:08 |
frickler | I suggested grouping them into a "Deprecated" section to make it more obvious | 13:10 |
fungi | oh, yep that could also work | 13:10 |
fungi | looks like the ovh-bhs1 mirror has broken package updates, which has in turn broken our base deploy job, which is the reason ssl certs aren't getting updated | 13:17 |
fungi | i'll get it squared up | 13:17 |
fungi | huh, it wants openafs-build-deps | 13:19 |
fungi | which is going to drag in a slew of other packages | 13:19 |
fungi | specifically for openafs-build-deps 1.8.8.1-2~ppa0~bionic | 13:21 |
fungi | the gra1 mirror is also running bionic and has that version installed with no problem | 13:22 |
fungi | oh, no it doesn't have openafs-build-deps installed, just the dkms package for the openafs lkm | 13:23 |
frickler | fungi: openafs-build-deps shouldn't be installed on mirrors, should they? | 13:24 |
fungi | nope! | 13:24 |
fungi | not sure when/why it was installed there | 13:25 |
fungi | #status log Purged the unneeded openafs-build-deps package from mirror01.bhs1.ovh.opendev.org in order to unblock our base deploy job | 13:25 |
fungi | cleaning it up seems to have solved the problem | 13:25 |
frickler | hmm, doesn't show up in any of the apt logs, so must have been in place for a very long time. or installed manually with dpkg? | 13:27 |
fungi | probably not via dpkg -i since that's a virtual package | 13:27 |
TheJulia | I just observed a bunch of jobs hit this within the last few minutes: E: Failed to fetch https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/focal/universe/binary-amd64/Packages 403 Forbidden [IP: 158.69.73.218 443] | 13:45 |
TheJulia | I don't know if I should expect things to be working or not at the moment | 13:46 |
abhishekk | clarkb,hi, around? | 13:56 |
abhishekk | can anyone help me to get this issue resolved, https://review.opendev.org/c/openstack/glance/+/842400 | 13:57 |
abhishekk | We are stuck and not able to merge anything due to this error | 13:57 |
abhishekk | 2022-05-19 08:31:55.100787 | ubuntu-bionic | The conflict is caused by: | 13:57 |
abhishekk | 2022-05-19 08:31:55.100796 | ubuntu-bionic | The user requested glance-store>=2.3.0 | 13:57 |
abhishekk | 2022-05-19 08:31:55.100804 | ubuntu-bionic | The user requested (constraint) glance-store===4.0.0 | 13:57 |
rosmaita | if we hack upper-constraints to glance-store===3.0.0 , we can build the tox py36 environment locally | 13:58 |
rosmaita | and we tried as a short term thing to put glance-store>=2.3.0,<4.0.0 in requirements.txt, but that just gives us | 13:59 |
rosmaita | ERROR: Cannot install glance-store<4.0.0 and >=2.3.0 because these package versions have conflicting dependencies. | 13:59 |
rosmaita | The conflict is caused by: | 13:59 |
rosmaita | The user requested glance-store<4.0.0 and >=2.3.0 | 13:59 |
rosmaita | The user requested (constraint) glance-store===4.0.0 | 13:59 |
Clark[m] | fungi: is The Julia's error related to your mirror surgery? | 14:00 |
Clark[m] | abhishekk: rosmaita: I'm not sure how we would help with that. Seems like constraints and your requirements are in conflict so you need to change one or the other | 14:01 |
rosmaita | Clark[m]: i misread that as "minor surgery" and was wondering what happened to fungi | 14:01 |
Clark[m] | It is the server you need to worry about :) | 14:02 |
Clark[m] | Anyway, I'm not really here yet and need to do a school run but can help in an hour or so | 14:02 |
abhishekk | Clark[m], if we do change requirement in glance to >= 4.0.0 then also it is failing with same error | 14:03 |
abhishekk | The conflict is caused by: | 14:03 |
abhishekk | The user requested glance-store>=4.0.0 | 14:03 |
abhishekk | The user requested (constraint) glance-store===4.0.0 | 14:03 |
Clark[m] | abhishekk: because 4.0.0 requires python3.8 or newer? | 14:05 |
abhishekk | yes, it does not support py36 and py37 | 14:06 |
Clark[m] | That is still a requirements and constraints conflict. You'll need to use different constraints under python 3.6 likely | 14:06 |
abhishekk | ack | 14:08 |
abhishekk | any example for the same? | 14:08 |
rosmaita | i think the setup.cfg in 4.0.0 tag still says 3.6 | 14:09 |
rosmaita | nope was looking at the wrong branch | 14:10 |
abhishekk | https://pypi.org/project/glance-store/ different here | 14:10 |
fungi | Clark[m]: TheJulia: oh, yes i didn't think about it but the openafs lkm may have been unloaded on the ovh-bhs1 mirror while it was being upgraded | 14:16 |
Clark[m] | fungi it looks to still be broken if you load the root http dir | 14:17 |
fungi | looks like it may still be that way. rebooting the mirror server now | 14:17 |
fungi | lsmod didn't show the module resident at all | 14:18 |
TheJulia | sweet | 14:18 |
fungi | it also must have decided to run a filesystem check, or is otherwise timing out trying to load openafs now | 14:21 |
fungi | A start job is running for OpenAFS client (5min 16s / 8min 3s) | 14:23 |
fungi | i want to say we've seen this before with the ovh mirrors | 14:23 |
fungi | i can't remember if forcing the dkms rebuild was necessary, or if it was just a boot-time race and rebooting usually solved it | 14:24 |
fungi | it did eventually boot and seems afs is working now | 14:26 |
fungi | #status log Distribution package mirrors on mirror01.bhs1.ovh.opendev.org were unavailable 13:25-14:25 UTC due to a package upgrade not removing and not reloading the openafs kernel module; related job errors can be safely rechecked | 14:27 |
fungi | TheJulia: https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/focal/universe/binary-amd64/Packages is returning content again. thanks for the heads up! | 14:27 |
*** pdeore is now known as pdeore|afk | 15:03 | |
TheJulia | fungi: thanks! | 15:04 |
clarkb | fungi: I think it is just slow with iops? it is the arm64 mirrors that have similar trouble | 15:15 |
fungi | ahh, maybe related to the afs cache volume then | 15:16 |
clarkb | abhishekk: rosmaita: ok I'm at a proper keyboard now. The first thing is to determine why you are testing with python3.6 in the first place. Is this on master? if so hasn't openstack dropped master python3.6 support? | 15:16 |
clarkb | fungi: ya I think it prunes it or verifies it or something and that takes time | 15:16 |
fungi | oh, right. we've blown away the cache before rebooting in the past when needed to avoid that | 15:16 |
rosmaita | clarkb: thanks, i think we have it sorted | 15:17 |
clarkb | abhishekk: rosmaita: if you are still testing with 3.6 because master glance-store is expected to work with stable releases of openstack then you need to do something like carrying constraint override files or convincing requirements to carry special 3.6 rules | 15:17 |
rosmaita | apparently the zed template change got stuck in the gate | 15:17 |
clarkb | PBR has examples of the special constraint override files iirc because it installs stuff back to python2.7 | 15:17 |
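A rough sketch of that override-file approach, using a hypothetical py36-constraints.txt and tox environment name (the real PBR and glance layouts will differ):

    [testenv:py36]
    # py36-constraints.txt is a hypothetical override file that would pin
    # glance-store to a release (e.g. ===3.0.0) still compatible with python 3.6
    deps =
      -c{toxinidir}/py36-constraints.txt
      -r{toxinidir}/requirements.txt
      -r{toxinidir}/test-requirements.txt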
abhishekk | yep :/ | 15:17 |
rosmaita | clarkb: there is some kind of weird dependency on a job not defined in the glance repo that was breaking things, i don't understand it really, but abhishekk has a handle on it | 15:19 |
rosmaita | "things" being preventing the zed template merge | 15:19 |
abhishekk | ++ | 15:20 |
*** dviroel is now known as dviroel|lunch | 15:26 | |
*** ysandeep|rover is now known as ysandeep|out | 15:29 | |
clarkb | johnsom: I can see an argument to put it in the zuul job. Devstack probably doesn't need to know when its system is appropriately booted; all it cares about is that the user has triggered it, and the user should be sure that the system is ready | 15:50 |
clarkb | johnsom: that might still be a change to the devstack repo, but in the zuul playbooks or roles that trigger stack.sh | 15:50 |
fungi | yeah, conversely, the fips setup role shouldn't need to care that dns resolution works | 15:51 |
fungi | and it's unclear what or when something in zuul-jobs can generically assure the system is "fully booted" | 15:52 |
johnsom | I am flexible. I just liked the fact that devstack would stop and give a direct error when DNS was broken instead of the current behavior where it runs a while and complains of missing packages. | 15:52 |
clarkb | right I'm beginning to think the best place for the check is in the devstack ansible stuff that runs stack.sh. Before running stack.sh you can do whatever system readiness checks are appropriate for devstack | 15:52 |
johnsom | Yeah, I looked briefly at systemd to get status, all of my systems reported "degraded" even though they are booted ok. | 15:52 |
clarkb | well I think you can do the same wait-until-nslookup-returns-a-result check | 15:53 |
clarkb | but move it into the ansible running stack.sh and devstack proper can assume the user is triggering it on a ready system | 15:53 |
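A minimal form of that pre-stack.sh readiness check, assuming opendev.org as the probe name and an arbitrary timeout:

    timeout 120 bash -c 'until nslookup opendev.org >/dev/null 2>&1; do sleep 2; done'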
fungi | though if systemd deferred allowing logins until unbound was actually started up, this would likely be a non-issue | 15:54 |
fungi | to ianw's point | 15:55 |
clarkb | ya but then you have to modify every third party ci setup's images | 15:55 |
clarkb | and systemd is designed to have these problems somewhat intentionally aiui | 15:56 |
clarkb | it wants to give you access to the system as early as possible so that you can decide if you are ready to do additional work (to reduce total system startup cost) | 15:56 |
fungi | something else i liked about sysvinit | 15:56 |
clarkb | ya I think for a laptop it makes a lot of sense. For servers consistency and stability are desirable and worth a few seconds of startup cost | 15:57 |
fungi | agreed. it seems like systemd optimized for portable devices, at the expense of adding instability and vague lack of startup assurances for servers | 15:59 |
frickler | fungi: ade_lee: taking the reboot issue here, because there is a deeper question hidden I think: should a consumer be able to expect a CI node to be working properly after a reboot. if the answer is "Yes, we as opendev want to support this", then we likely need to set up things like unbound in place and have tests for the images we build that ensure that this works | 16:00 |
frickler | if not, maybe the fips job setup needs a different approach | 16:00 |
fungi | granted, the behavior with afs on the mirror servers is a clear indication that it does block on some things. openssh wouldn't allow me to log into the mirror until afs had started | 16:00 |
clarkb | my perspective on that is if you reboot in your job then you assume all responsibility for making the node happy | 16:00 |
johnsom | There is a nss-lookup.target that in theory should mean the system can accept queries, but I don't know how reliable that is | 16:00 |
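In theory that could be wired up with a systemd drop-in along these lines, assuming the local resolver's unit orders itself Before=nss-lookup.target; this is an untested sketch using the Debian/Ubuntu ssh.service name:

    # /etc/systemd/system/ssh.service.d/wait-for-dns.conf
    [Unit]
    Wants=nss-lookup.target
    After=nss-lookup.target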
clarkb | we already do ensure the node is ready for you when we hand you the node | 16:00 |
fungi | should we rerun the validate-host role after reboots? | 16:01 |
clarkb | rebooting throws a huge wrench in things. It's powerful that you are able to reboot at all (jenkins couldn't do it), but it's also something that jobs need to accept can be problematic and deal with | 16:01 |
clarkb | fungi: that may be an option | 16:01 |
fungi | granted, validate-host doesn't wait for the server to be booted, it just discards the build if the server isn't ready for the things it checks | 16:02 |
clarkb | a "wait for system to settle after reboot" role may be reasonable to add to zuul-jobs | 16:03 |
clarkb | then add that to the fips jobs after the reboot | 16:03 |
clarkb | a wait for network to be up (checked via ssh connectivity?), restart zuul console logger, and validate dns resolution are the three steps I can think of off the top of my head | 16:04 |
johnsom | The bonus with the switch to ansible is you would have access to the mirror FQDN | 16:06 |
ade_lee | sounds like a reasonable plan | 16:08 |
ade_lee | where would the ssh check connect to? | 16:09 |
ade_lee | and whats the ansible parameter that specifies the mirror FQDN? | 16:10 |
clarkb | ade_lee: zuul-executor ansible being able to connect to the node that rebooted post reboot | 16:10 |
clarkb | I'm sure you're already doing that in a wait_for or something. I'm just suggesting we can collect these common post reboot actions into a single role | 16:11 |
*** marios is now known as marios|out | 16:14 | |
ade_lee | clarkb, ack. | 16:18 |
ade_lee | johnsom, clarkb , fungi frickler -- If we agree on this approach, I can start putting such a role together. | 16:18 |
ade_lee | something that does a wait_for, restarts the zuul console, and checks dns by resolving opendev.org | 16:19 |
johnsom | ade_lee zuul_site_mirror_fqdn | 16:20 |
ade_lee | johnsom, sorry yes ^^ that one :) | 16:20 |
fungi | to make the role generic, opendev.org should be at most a default for some rolevar so other sites can supply a record they expect to be resolvable by their nodes | 16:23 |
clarkb | or make it a required value | 16:24 |
fungi | yeah, ideally we'd ask it to resolve $zuul_site_mirror_fqdn in our deployment, but other sites may want to supply a different record to test | 16:25 |
*** dviroel|lunch is now known as dviroel | 16:25 | |
ade_lee | ok - so resolve $zuul_site_mirror_fqdn if set, else some rolevar which defaults to opendev.org ? | 16:26 |
clarkb | ade_lee: I think I would make it a required var input. Maybe suggest that you can use $zuul_site_mirror_fqdn if using mirrors | 16:27 |
clarkb | then people using the role can decide if google.com is more appropriate | 16:27 |
ade_lee | ok | 16:28 |
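A very rough sketch of such a role's tasks, with an illustrative required variable name; the console step would reuse the existing start-zuul-console role rather than anything new:

    # tasks/main.yaml (sketch)
    - name: Wait for the node to come back after the reboot
      wait_for_connection:
        timeout: 600

    - name: Wait for DNS resolution to work
      command: "nslookup {{ post_reboot_check_hostname }}"
      register: dns_check
      until: dns_check.rc == 0
      retries: 30
      delay: 10

    - name: Restart the zuul console streamer
      include_role:
        name: start-zuul-console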
*** jpena is now known as jpena|off | 16:31 | |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Re-sync test-mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842572 | 16:34 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842573 | 16:34 |
clarkb | corvus (and the rest of infra-root) only our openstack tenant uses the deprecated zuul queue syntax according to that script as of the last 10 minutes or so | 16:34 |
clarkb | I'm working on an email to the openstack-discuss list now. There are a number of projects so I'm going to do my best to catch the attention of those that need it | 16:35 |
fungi | thanks for checking it! | 16:35 |
johnsom | I just posted patches for octavia and designate repos. | 16:35 |
fungi | awesome! | 16:36 |
fungi | don't forget your stable branches, if you also set it there | 16:36 |
johnsom | Yeah, that will be some work/time to get through. | 16:36 |
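For reference, the shape of the queue syntax migration being patched here, with placeholder names: the deprecated form declares the queue per pipeline inside the project stanza, while the current form declares it once at the project level (optionally alongside a top-level queue definition):

    # deprecated
    - project:
        gate:
          queue: my-shared-queue

    # current
    - queue:
        name: my-shared-queue
    - project:
        queue: my-shared-queue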
corvus | clarkb: re projects that are unmaintained -- maybe consider dropping them from the zuul config if they don't clean up errors after a certain time? | 17:00 |
corvus | they can always be added back later easily enough | 17:00 |
clarkb | ya I think that is a reasonable approach to take | 17:00 |
fungi | i concur | 17:07 |
clarkb | ok email sent | 17:10 |
clarkb | fungi: it is just over the size limit if you can moderate it through | 17:11 |
fungi | gladly | 17:12 |
clarkb | (I attached a file with all the branch and file info for each project which did that) | 17:12 |
fungi | i discarded your message and approved the ones for hydraulics investment opportunities and shipping notices | 17:12 |
fungi | (just kidding, it was the other way around) | 17:13 |
clarkb | I'm always open to good investment opportunities | 17:14 |
*** timburke__ is now known as timburke | 17:28 | |
opendevreview | Merged zuul/zuul-jobs master: Re-sync test-mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842572 | 18:01 |
opendevreview | Merged zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842573 | 18:03 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Make test-prepare-workspace-git role https://review.opendev.org/c/zuul/zuul-jobs/+/842598 | 18:11 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Make test-prepare-workspace-git role https://review.opendev.org/c/zuul/zuul-jobs/+/842598 | 18:40 |
opendevreview | Merged zuul/zuul-jobs master: Make test-prepare-workspace-git role https://review.opendev.org/c/zuul/zuul-jobs/+/842598 | 19:29 |
opendevreview | James E. Blair proposed opendev/base-jobs master: Switch base-test to test-prepare-workspace-git https://review.opendev.org/c/opendev/base-jobs/+/842615 | 20:12 |
corvus | infra-root: ^ a base-test change to prepare us for ansible 5 | 20:12 |
*** dviroel is now known as dviroel|out | 20:45 | |
clarkb | I'm trying to push a few of these queue update changes to projects that are likely abandoned (but figure it makes it easy for them to address, and we can then remove them from the zuul projects list if they don't) and am discovering they don't even have valid .gitreview configs on their branches, ugh | 20:59 |
clarkb | I tried to push to stable/xena and it pushed a second patchset to my master change. Trying to push to ussuri-test made a second stable/ussuri change and overriding the branch doesn't seem to do anything due to some .gitreview config they have | 21:00 |
clarkb | https://review.opendev.org/q/topic:fix-queue-config that was fun | 21:11 |
fungi | yeah, i just make a point of always telling git-review what branch to target when doing that sort of thing, for exactly that reason | 21:58 |
clarkb | learned my lesson | 22:22 |
fungi | make no assumptions | 22:24 |
corvus | fungi: got a sec for https://review.opendev.org/842615 ? | 22:24 |
fungi | lookin' | 22:24 |
corvus | would like to keep the base-test cycle moving | 22:24 |
fungi | me too, thanks! | 22:25 |
opendevreview | Merged opendev/base-jobs master: Switch base-test to test-prepare-workspace-git https://review.opendev.org/c/opendev/base-jobs/+/842615 | 22:29 |
opendevreview | Merged openstack/project-config master: update generate constraints to py38,39 https://review.opendev.org/c/openstack/project-config/+/837815 | 22:35 |
ianw | ok, sorry i got distracted yesterday but i've parsed https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f46993d83ff4abb310ef7b4beced56ba96f0d9d now | 23:43 |
clarkb | I was distracted yesterday too :) | 23:43 |
ianw | spec_store_bypass_disable and spectre_v2_user can both be set to "seccomp" or "prctl" | 23:44 |
ianw | if it's seccomp, every seccomp() enabled thing will try to enable these flags for the process. if it's prctl, it becomes an opt-in thing userland needs to set explicitly | 23:45 |
ianw | 0x4 is ssbd from previous investigation. so it is presumably spec_store_bypass_disable that is causing the problems | 23:46 |
ianw | i just need to rejig my test machine back to the standard kernel, but i'll try booting that with spec_store_bypass_disable=prctl and i expect the flood of messages goes away | 23:48 |
ianw | oh, and that change modified the default kernel to turn it to prctl, because, as the changelog goes into, they're basically unhelpful | 23:49 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 2.8 https://review.opendev.org/c/zuul/zuul-jobs/+/842647 | 23:50 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 2.9 https://review.opendev.org/c/zuul/zuul-jobs/+/842648 | 23:50 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 5 https://review.opendev.org/c/zuul/zuul-jobs/+/842649 | 23:50 |
ianw | so hopefully we can get an argument to someone upstream that jammy kernels should do the same and it can be backported. however that doesn't solve the immediate issue of jammy nodes having hundreds of megabytes of logs on OVH | 23:51 |
clarkb | is that something that can be set via sysfs on boot? | 23:51 |
clarkb | or maybe via a kernel flag? | 23:52 |
clarkb | if so we could make dib element modify that? | 23:52 |
ianw | hrm, yes i wonder if sysctl works dynamically. i'm just reinstalling kernels and can test | 23:54 |
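If it does end up needing to be a boot-time flag, one way a DIB element could persist it on Ubuntu is a grub defaults snippet; treat this as an untested sketch:

    cat > /etc/default/grub.d/99-ssbd-prctl.cfg <<'EOF'
    GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT spec_store_bypass_disable=prctl"
    EOF
    update-grub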