opendevreview | Merged openstack/ironic-inspector master: CI: re-add genade job to normal CI queues https://review.opendev.org/c/openstack/ironic-inspector/+/895863 | 00:11 |
---|---|---|
opendevreview | Jake Hutchinson proposed openstack/bifrost master: Bifrost NTP configuration https://review.opendev.org/c/openstack/bifrost/+/895691 | 08:45 |
dtantsur | JayF, could we squeeze https://review.opendev.org/c/openstack/ironic-inspector/+/881463 in the release? The IPA deprecation part has been there for some time. | 09:34 |
*** vanou is now known as Guest737 | 10:33 | |
opendevreview | Merged openstack/ironic-inspector master: Handle bracketed IPv6 redfish_address https://review.opendev.org/c/openstack/ironic-inspector/+/895734 | 10:57 |
iurygregory | morning ironic o/ | 11:15 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: RedfishFirmware Interface https://review.opendev.org/c/openstack/ironic/+/885425 | 11:35 |
iurygregory | ok, funny thing I still get the Failed to set node power state to power on. =( | 11:36 |
iurygregory | maybe is a bad firmware I'm updating? <thinking> | 11:36 |
opendevreview | Merged openstack/ironic-inspector master: Support LLDP data coming in the new field https://review.opendev.org/c/openstack/ironic-inspector/+/881463 | 12:38 |
opendevreview | Harald Jensås proposed openstack/ironic-inspector stable/2023.1: Handle bracketed IPv6 redfish_address https://review.opendev.org/c/openstack/ironic-inspector/+/895906 | 12:49 |
TheJulia | Bad firmware or firmware not ready | 13:03 |
TheJulia | Let’s move forward, and just be mindful to backport fixes | 13:04 |
mmalchuk | good day Ironic o/ | 13:06 |
iurygregory | TheJulia, yeah agree, trying to understand if there is something else I could do to handle the case... like re-try to power on after some time maybe... | 13:16 |
iurygregory | if the error was ironic.common.exception.PowerStateFailure: Failed to set node power state to power on. | 13:16 |
iurygregory | but in general, the code updates the firmware, the information is updated in the DB (in case of the power failure I haven't seen the versions being updated) | 13:18 |
TheJulia | I think, if it does fail to power on, and our expected state is power on, we'll fix that eventually | 13:19 |
TheJulia | but you can only try/except/retry ;) so many different cases with minimal data | 13:20 |
mmalchuk | folks, please review https://review.opendev.org/c/openstack/diskimage-builder/+/895486 and trivial fixes in the related chain | 13:21 |
iurygregory | TheJulia, yeah =( I wish things could be more deterministic lol | 13:22 |
TheJulia | iurygregory: you need lots of data points | 13:22 |
iurygregory | yup | 13:23 |
TheJulia | iurygregory: if you're super worried about it, add a release note indicating we would love feedback since it is a new feature and we as a developer community have limited hardware access | 13:23 |
TheJulia | or something along those lines | 13:23 |
TheJulia | we don't have everything everyone is using | 13:23 |
iurygregory | ++ let me update the release note, I'm also pushing a separate patch with the docs about the feature | 13:24 |
iurygregory | will also mention there | 13:24 |
TheJulia | ++ | 13:26 |
TheJulia | "warning: may have sharp edges... and dull ones too. please reach out to the ironic developer community with any unexpected behavior." | 13:27 |
iurygregory | oh perfect CI doesn't seem in a good shape in ironic :D or maybe is just my patch | 13:34 |
TheJulia | iurygregory: link? | 13:40 |
TheJulia | help me see what you see! | 13:40 |
dtantsur | mmalchuk, I think only TheJulia has any rights on DIB, and she has already reviewed | 13:41 |
TheJulia | stevebaker[m] does as well | 13:41 |
TheJulia | and there is the #openstack-dib channel | 13:41 |
iurygregory | https://zuul.opendev.org/t/openstack/build/529fc30cdf5f4abeb014352780de5ac3 https://zuul.opendev.org/t/openstack/build/f7028e211ccb4c2e87b7c4bdb3d95ee5 https://zuul.opendev.org/t/openstack/build/8a71f08821f64da4a1186465c7e96351 I'm focusing in the standalone first | 13:42 |
mmalchuk | dtantsur thank you | 13:42 |
iurygregory | metalsmith seems unhappy and bifrost also .-. | 13:42 |
mmalchuk | TheJulia thanks for link | 13:42 |
TheJulia | really?!? | 13:43 |
TheJulia | (that was for iurygregory ) | 13:43 |
iurygregory | :D | 13:43 |
iurygregory | ok, failed to start uefi doesn't seem like a good sign | 13:44 |
iurygregory | https://zuul.opendev.org/t/openstack/build/529fc30cdf5f4abeb014352780de5ac3/log/controller/logs/ironic-bm-logs/node-3_console_log.txt | 13:44 |
TheJulia | iurygregory: I think it is your patch maybe | 13:51 |
TheJulia | or maybe not | 13:51 |
TheJulia | https://paste.opendev.org/show/bxrj0E82McAM8RMLIcpe/ | 13:53 |
iurygregory | yeah, trying to figure out what could have caused this; until patch set 4, CI was green on it | 13:53 |
TheJulia | perhaps a dummy change on CI just to see it's current health | 14:09 |
iurygregory | yeah pushing now | 14:15 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: [DNM] Testing https://review.opendev.org/c/openstack/ironic/+/895938 | 14:23 |
TheJulia | anyone have the ptg etherpad link handy? | 14:49 |
TheJulia | https://etherpad.opendev.org/p/ironic-ptg-october-2023 | 14:50 |
iurygregory | on my patch the functional job is hitting timed_out, in the dnm is green =( | 14:54 |
iurygregory | yeah standalone is also green | 15:08 |
TheJulia | I think I see what is going on | 15:36 |
TheJulia | iurygregory: want to jump on a call and talk through it? | 15:37 |
iurygregory | sure | 15:37 |
iurygregory | let me get a meet link | 15:37 |
TheJulia | ok | 15:38 |
iurygregory | TheJulia, https://meet.google.com/afo-jzvj-vpj | 15:39 |
JayF | FYI we (well, openstack telemetry project) are adding an OSC plugin to talk to prometheus | 15:48 |
JayF | pretty neat | 15:48 |
JayF | https://review.opendev.org/c/openstack/governance/+/894915 | 15:48 |
clarkb | JayF: this is the query data in prometheus? | 15:49 |
clarkb | "communication with prometheus" is a bit ambiguous :) | 15:49 |
JayF | https://github.com/infrawatch/python-observabilityclient | 15:51 |
JayF | > observabilityclient is an OpenStackClient (OSC) plugin implementation that implements commands for management of Prometheus. | 15:51 |
JayF | that's where it's being imported from | 15:51 |
JayF | I' | 15:51 |
JayF | **I am going to begin cutting releases and branches for all Ironic things except Ironic proper | 15:51 |
clarkb | side note: All those osc plugins you are installing are what make osc performance terrible | 15:51 |
clarkb | the plugin registration system adds significant overhead to python process startup times | 15:52 |
JayF | TheJulia: iurygregory: /me will be more appreciative of those tests in the future :D | 16:03 |
TheJulia | :) | 16:03 |
iurygregory | :D | 16:04 |
opendevreview | Mark Goddard proposed openstack/bifrost master: ironic: Perform online data migrations with localhost DB https://review.opendev.org/c/openstack/bifrost/+/895948 | 16:11 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: RedfishFirmware Interface https://review.opendev.org/c/openstack/ironic/+/885425 | 16:13 |
JayF | mgoddard: I suspect it's being used as a proxy for sqlite | 16:25 |
JayF | mgoddard: and we don't support sqlite migrations | 16:25 |
JayF | mgoddard: just a hunch | 16:25 |
JayF | fyi; I'm tracking my work on making Ironic releases for 2023.2: https://etherpad.opendev.org/p/ironic-bobcat-releases in case any of the release managers on the team want to review my work | 16:30 |
iurygregory | JayF, ack | 16:33 |
iurygregory | if you want to share the overhead I can take care of some releases | 16:33 |
JayF | You have one job | 16:34 |
* JayF kicks the redfish bmc | 16:34 | |
JayF | lol | 16:34 |
JayF | I don't mind overviewing all the releases, if you want to offer peace of mind you can check my work | 16:34 |
iurygregory | I'm checking the patches you have open o/ | 16:37 |
JayF | iurygregory: I was all worried about us cutting a bifrost release outta master and not doing bugfix/ or stable/ branching at the time | 16:38 |
JayF | I ran around for a little while worried until it occurred to me: that is the happy path for all other projects | 16:38 |
JayF | we are the weirdos who love a good branch lol | 16:38 |
iurygregory | didn't we make a decision about not having the bugfix in bifrost? | 16:38 |
JayF | in this situation, it's more that when we cut libraries a month ago | 16:39 |
JayF | we didn't branch | 16:39 |
JayF | which is something to look out for next time | 16:39 |
iurygregory | yeah | 16:39 |
iurygregory | Jesus https://review.opendev.org/q/project:openstack/bifrost+status:open+branch:master | 16:40 |
iurygregory | a lot of merge conflict :D | 16:40 |
iurygregory | https://review.opendev.org/c/openstack/bifrost/+/884198 this would be good to include in bifrost, but we can have this as backport also | 16:41 |
JayF | I have no problem handling that as a backport | 16:41 |
iurygregory | JayF, fyi molteniron we don't have releases | 16:42 |
iurygregory | :D | 16:42 |
JayF | ack | 16:42 |
JayF | that is just a list from parsing projects.yaml | 16:42 |
JayF | which I do because I sometimes forget things we manage LOL | 16:42 |
iurygregory | just to be sure, the list you have in the etherpad are you planning to cut the stable branch right? | 16:42 |
JayF | that is my list of things to check | 16:42 |
JayF | if it has a bullet under it, it's been checked that's the status | 16:42 |
JayF | iurygregory: fwiw I usually use the release cycle boundary as a good opportunity to see if the independent projects need a release, it's just easy to do +4 more :D | 16:51 |
iurygregory | agree | 16:52 |
iurygregory | is just a reminder that we don't have stable branches for them | 16:52 |
JayF | iurygregory: TheJulia: others; any objection to me making sushy-tools next release 1.0.0? | 16:53 |
* JayF really dislikes the 0.x.y styling of releases | 16:53 | |
JayF | I'm pretty sure it works :D | 16:53 |
iurygregory | it's a good point | 16:54 |
* JayF JFDI | 16:54 | |
iurygregory | I don't have objections tbh | 16:54 |
JayF | mgoddard: tenks hasn't been released since 2019; and has a weird branching model. We should probably get some understanding of how releases there should be managed (or maybe just ... no more releases and change the model?) | 17:02 |
iurygregory | JayF, I've added a comment in the metalsmith one | 17:02 |
JayF | updated | 17:02 |
JayF | yeah that's gone | 17:03 |
JayF | mgoddard: /me sends you an inside slack about this since I don't know how much you IRC :) | 17:04 |
JayF | I'm cutting inspector now, that only leaves Ironic | 17:12 |
JayF | which I will wait to cut until Iury's patch gets along or at some point tomorrow if it's not gonna make it (but it will) | 17:12 |
JayF | Note that at this point; any changes landed in master to any Ironic project except openstack/ironic will miss the release unless you contact me to update the releases patch | 17:15 |
iurygregory | JayF, I have the feeling that ngs is broken... | 17:26 |
JayF | iurygregory: it is until neutron branches | 17:26 |
JayF | iurygregory: I have a note on the PTG etherpad about it; I think in the future we need to branch and release it at library time | 17:26 |
iurygregory | ok, so I probably missed something lol | 17:26 |
JayF | iurygregory: it's all requirements-shifting-bs | 17:26 |
iurygregory | gotcha | 17:26 |
JayF | iurygregory: line 76, it talks about networking-bm but I expect ngs to have similar problems | 17:27 |
JayF | https://etherpad.opendev.org/p/ironic-ptg-october-2023 | 17:28 |
iurygregory | JayF, tks! | 17:28 |
iurygregory | JayF, ironic-lib I don't think we need, like metalsmith | 17:33 |
JayF | iurygregory: ack; got it | 17:34 |
TheJulia | yeah, anything which uses neutron lib can break, but I think ngs was okay last I looked sans the dlm testing is failing because of something with etcd. I don't quite grok where things went sideways though | 17:45 |
iurygregory | humm | 18:03 |
* iurygregory checks something | 18:03 | |
TheJulia | it happens, we've had to fix it post-release in the past | 18:13 |
TheJulia | iurygregory: +2'ed, "short order" in my book is the next couple of weeks. | 18:19 |
TheJulia | three things, two minor, one I think you'll be spending some time on, but given the amount of testing I'm comfortable at this time | 18:19 |
iurygregory | ack | 18:20 |
TheJulia | So, any odds on if my car will be done today? :) | 18:21 |
iurygregory | depends on what they are fixing I would say | 18:21 |
JayF | based on my recent experience with repair professionals of all kinds | 18:22 |
JayF | probably not | 18:22 |
JayF | did they tell you it'd be done last week sometime? if so, maybe | 18:22 |
iurygregory | https://zuul.opendev.org/t/openstack/build/9fd16425ab244fb8ac4959ffe2594578/log/job-output.txt#10170 seems like ovs is mad in ngs | 18:23 |
TheJulia | JayF: "most likely today", which really means tomorrow | 18:25 |
JayF | most likely today means they are starting on it today | 18:26 |
JayF | and maybe tomorrow if there's no shenanigans | 18:26 |
JayF | but shenanigans pay well so good luck :P | 18:26 |
TheJulia | heh | 18:27 |
iurygregory | https://stackoverflow.com/questions/48577019/not-able-create-ports-in-ovs | 18:28 |
iurygregory | I'm wondering if this is what we are hitting in ngs | 18:29 |
JayF | Does NGS' CI do anything that Ironic's doesn't? | 18:29 |
JayF | meaning in terms of environmental setup | 18:29 |
JayF | (I'm curious if that is the breakage, would we see it in Ironic; if so, can you steal the fix from ironic/devstack/etc) | 18:30 |
iurygregory | well the job I was looking is pure tempest testing.. | 18:30 |
TheJulia | it does some additional networking setup | 18:30 |
iurygregory | thinking of giving a try in https://opendev.org/openstack/networking-generic-switch/src/branch/master/devstack/plugin.sh#L128 | 18:30 |
TheJulia | but https://60870c86ba00a6b46654-40c40653085e805e0ea6c3df0bb43128.ssl.cf2.rackcdn.com/886404/1/check/networking-generic-switch-tempest-dlm/9fd1642/controller/logs/screen-q-svc.txt is downright fatal | 18:31 |
JayF | iurygregory: TheJulia: I'm going to go -1 my releases patch for ngs while you are digging on this | 18:31 |
iurygregory | right etcd is angry | 18:31 |
JayF | No I'm not, just kidding, it's just a branch create not a release | 18:32 |
TheJulia | yeah, I've been trying to get people to look/chime in for weeks on this | 18:32 |
JayF | TheJulia: this is in johnthetubaguy's changes that merged this cycle, yes? | 18:32 |
TheJulia | no | 18:32 |
TheJulia | I don't think so | 18:32 |
JayF | I thought that's what used etcd | 18:32 |
TheJulia | the dlm code has been there for ages, but for some reason it can't find the entry | 18:33 |
TheJulia | now, maybe he touched it, dunno | 18:33 |
TheJulia | but it seems to have started failing after that s well | 18:33 |
TheJulia | s/\ s\ /\ as\ / | 18:34 |
TheJulia | well, https://review.opendev.org/c/openstack/networking-generic-switch/+/743283 was the last patch to merge | 18:41 |
opendevreview | Jay Faulkner proposed openstack/networking-generic-switch master: DNM: Revert "Support batching up commands" https://review.opendev.org/c/openstack/networking-generic-switch/+/895915 | 18:42 |
iurygregory | interesting | 18:42 |
JayF | just some science | 18:42 |
JayF | if we nail it down to that patch it'll make it easier | 18:42 |
iurygregory | ++ | 18:42 |
TheJulia | i don't think it is though, it raises the exception in the dlm code | 18:42 |
TheJulia | which is not what that patch touched | 18:42 |
iurygregory | hummm | 18:43 |
iurygregory | but the patch added the etcd3gw requirement | 18:44 |
iurygregory | https://review.opendev.org/c/openstack/networking-generic-switch/+/743283/11/requirements.txt | 18:44 |
TheJulia | hmmm | 18:45 |
TheJulia | yeah | 18:45 |
TheJulia | interesting | 18:45 |
TheJulia | since i thought the dlm code was previously using it | 18:45 |
JayF | I bet it's something like | 18:45 |
iurygregory | *magic* | 18:45 |
JayF | if etcd3gw is installed | 18:45 |
JayF | tooz wants to use it as the backend | 18:45 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: DNM: Revert "Support batching up commands" https://review.opendev.org/c/openstack/networking-generic-switch/+/895916 | 18:45 |
iurygregory | 2 reverts? lol | 18:46 |
JayF | just for science | 18:46 |
JayF | not actually going to land them | 18:46 |
TheJulia | oh, doh | 18:46 |
iurygregory | :D | 18:46 |
iurygregory | agree, two showing the same result is better than one | 18:46 |
TheJulia | I abandoned mine | 18:46 |
iurygregory | going to the gym now, be back in about 2hrs | 18:47 |
TheJulia | have fun, heading back to the house to wait for car in air conditioning | 18:47 |
JayF | TheJulia: iurygregory: https://github.com/openstack/networking-generic-switch/blob/master/networking_generic_switch/devices/__init__.py#L173 returns true in all cases, I think | 18:50 |
JayF | which leads to etcd/tooz coordination getting returned in every case | 18:50 |
JayF | when in CI we should have it disabled until we do the work to enable it | 18:50 |
JayF | hmm or backend_url is set | 18:53 |
JayF | line 145, so it is the elif case there | 18:53 |
JayF | that code /should not be running/ | 18:53 |
JayF | https://60870c86ba00a6b46654-40c40653085e805e0ea6c3df0bb43128.ssl.cf2.rackcdn.com/886404/1/check/networking-generic-switch-tempest-dlm/9fd1642/controller/logs/etc/neutron/plugins/ml2/ml2_conf_genericswitch.ini | 18:57 |
JayF | we tell it etcd is there, but it's not or is not configured | 18:57 |
JayF | https://60870c86ba00a6b46654-40c40653085e805e0ea6c3df0bb43128.ssl.cf2.rackcdn.com/886404/1/check/networking-generic-switch-tempest-dlm/9fd1642/controller/logs/screen-etcd.txt and etcd is running | 18:58 |
JayF | Sep 14 00:37:08.318687 np0035244654 etcd[23648]: advertise client URLs = http://10.209.64.96:2379 | 18:58 |
JayF | this is correct, too | 18:59 |
JayF | firewalling, perhaps? | 18:59 |
JayF | since it's going to ip:port and not localhost:port? | 18:59 |
JayF | and etcd had been running for 2 minutes at the point in which it's called | 19:00 |
JayF | WTF | 19:00 |
opendevreview | Jay Faulkner proposed openstack/networking-generic-switch master: CI fix: Ensure we use the same ETCD version tooz CI does https://review.opendev.org/c/openstack/networking-generic-switch/+/895973 | 19:07 |
JayF | trying ^ since I came to the conclusion the most likely explanation is that etcd3gw and etcd were incompatible, given they were all configured correctly | 19:12 |
JayF | iurygregory: I'm approving firmware interface; please ensure you get the followup with TheJulia's issues resolved pushed up ASAP; I'd prefer just have that and not need to backport (I wanna cut Ironic by EOD tomorrow so I think we have time) | 19:13 |
JayF | if that etcd version lock works; I might make a -nv version of the job on master after branch is cut that uses etcd and a voting version that doesn't; just to help us isolate failures (since I anticipate "etcd version mismatch" might be a recurring pain) | 19:14 |
JayF | yes, confirmed | 19:19 |
JayF | https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.4.md?plain=1#L811 + https://github.com/openstack/tooz/blob/master/tooz/drivers/etcd3gw.py#L204 | 19:19 |
JayF | I think that tells the story | 19:20 |
JayF | and my version lock should work | 19:20 |
TheJulia | Interesting | 19:21 |
TheJulia | Seems like tooz needs a fix then too | 19:21 |
TheJulia | I guess that can just be next cycle too | 19:21 |
JayF | I just mentioned in #openstack-oslo | 19:21 |
JayF | and also you can force api version in config via url | 19:21 |
JayF | so it's not that bad | 19:21 |
TheJulia | great find! | 19:22 |
JayF | also I'll note: NGS is fine | 19:22 |
TheJulia | \o/ | 19:22 |
JayF | this is just operational shenanigans in CI | 19:22 |
JayF | (assuming this is correct) | 19:22 |
JayF | which is relatively safe at this point, I think | 19:22 |
JayF | that etcd version 404'd | 20:07 |
TheJulia | well, etcd, if we got a newer etcd by default via the gate, it would break then | 20:08 |
JayF | I'm just looking at releases page to find one that works | 20:08 |
JayF | to get this passing for now, until we talk to oslo about fixing tooz | 20:09 |
JayF | I'd also rather not code in a default version to our devstack | 20:09 |
clarkb | Jan Gutter had a thread on openstack-discuss about upgrading etcd | 20:09 |
clarkb | starts on August 10 according to my email client | 20:09 |
clarkb | https://review.opendev.org/c/openstack/tooz/+/891355 seems this is on peoples radar just stalled out maybe? | 20:10 |
JayF | ack, makes sense | 20:11 |
JayF | I will try to just fix our job locally then | 20:11 |
clarkb | oh maybe it is waiting on the pifpaf release | 20:11 |
opendevreview | Jay Faulkner proposed openstack/networking-generic-switch master: CI fix: Use the un-deprecated v3 etcd API https://review.opendev.org/c/openstack/networking-generic-switch/+/895973 | 20:14 |
JayF | clarkb: thank you for that pointer, I never explicitly said that :D | 20:19 |
clarkb | you're welcome | 20:19 |
JayF | I think for our purposes; we just need to fix CI and help (if not too many cooks) getting that stuff landed in tooz | 20:19 |
JayF | all the knobs are there for the operators to ensure ngs works with their installed etcd version | 20:20 |
JayF | that failing NGS job is passing with that new patchset | 20:36 |
JayF | there's still another job running but that's even more confirmation our release is good and works \o/ | 20:36 |
opendevreview | Merged openstack/ironic master: RedfishFirmware Interface https://review.opendev.org/c/openstack/ironic/+/885425 | 21:05 |
iurygregory | I'm back | 21:44 |
iurygregory | so we need to add "?api_version=v3" | 21:44 |
iurygregory | and this will magically fix ngs? | 21:44 |
TheJulia | seems like it | 21:45 |
iurygregory | what kind of sorcery is that?! | 21:46 |
iurygregory | I'm ok with hardcoding (we should probably add some note regarding this so we don't forget to change or something) | 21:46 |
JayF | tooz defaults to v3alpha | 21:46 |
JayF | which is gone since 3.3 | 21:46 |
iurygregory | perfect! | 21:46 |
JayF | tooz parses that arg and uses the new one | 21:46 |
JayF | they have a patch to fix it up but it's delayed on deps | 21:47 |
iurygregory | got it | 21:47 |
iurygregory | we should keep on our radar | 21:47 |
iurygregory | :D | 21:47 |
iurygregory | and maybe see if we can land other patches in ngs other than the fixes if we are ok with that... | 21:47 |
JayF | They will not be a part of the branch/release unless the patch for those are updated | 21:48 |
JayF | my preference would be to land things as if they are going in Caracal and evaluate for backport | 21:49 |
JayF | rather than rushing in more changes to a project we just got warm-fuzzies about the CI of :D | 21:49 |
TheJulia | if we don't, the release is nearly identical to 2023.1 | 21:51 |
TheJulia | just as a data point | 21:51 |
iurygregory | updated? | 21:51 |
iurygregory | maybe I'm lost here | 21:51 |
TheJulia | Jay just wants to ship the release, and not have to update the release branch | 21:51 |
TheJulia | the branch to be cut | 21:51 |
TheJulia | that is | 21:51 |
JayF | Right now, the HEAD of ngs master is tagged in a releases change as the sha for stable/2023.2 | 21:52 |
JayF | changing that and/or backporting things for another release are ~trivial? | 21:52 |
TheJulia | changing it is super trivial, but we have to reach consensus | 21:52 |
JayF | let me put it this way | 21:52 |
JayF | lets not talk about imaginary patches | 21:53 |
JayF | lets talk about what's up and eligible, potentially | 21:53 |
TheJulia | basically everything sitting there looks like features | 21:53 |
iurygregory | https://review.opendev.org/c/openstack/networking-generic-switch/+/874793 https://review.opendev.org/c/openstack/networking-generic-switch/+/886405 | 21:53 |
TheJulia | so we wouldn't backport them | 21:53 |
iurygregory | https://review.opendev.org/c/openstack/networking-generic-switch/+/847592 | 21:53 |
iurygregory | these 3 would need recheck after we land the fix | 21:53 |
iurygregory | unless they have conflict between them | 21:53 |
JayF | afaict 793 is CI/testing related, I'm not sure having it in a release would matter to folks using it in that context but IMBW | 21:54 |
iurygregory | and doesn't seem like they do | 21:54 |
JayF | 405 is a bugfix that should land | 21:54 |
JayF | and would be backportable | 21:54 |
TheJulia | i guess 592 is also a fix | 21:55 |
TheJulia | so we can backport that | 21:55 |
iurygregory | makes sense | 21:55 |
JayF | 592 is theoretically a fix | 21:55 |
JayF | I do not know NGS well enough to make that judgement if it's OK or not | 21:55 |
iurygregory | green https://review.opendev.org/c/openstack/networking-generic-switch/+/895973 | 21:55 |
JayF | changing when something happens is worrisome in terms of "will the status quo keep working" but I should trust CI :) | 21:56 |
JayF | TheJulia: iurygregory: To be clear: my concern is not "I don't wanna change a handful of characters in a PR", it's more about going "are we sure this works?" to "lets land a bunch of changes now that we are" is a little whiplash | 21:56 |
JayF | note that I indicated not landing stuff pre-release was my preference | 21:57 |
iurygregory | I see =) | 21:57 |
JayF | not trying to dictate just like ... a little skittish :) | 21:57 |
TheJulia | almost nobody reviews n-g-s | 21:57 |
JayF | I don't because I don't trust myself to, bluntly | 21:57 |
TheJulia | so... all I can do is recheck and trust CI | 21:57 |
JayF | which is playing into the conservative preference I have | 21:57 |
TheJulia | if we don't trust CI though... that is a whole other issue | 21:57 |
iurygregory | in CI we trust | 21:57 |
TheJulia | .... it is our community ethos | 21:58 |
JayF | true | 21:58 |
JayF | but 90% of the community that has that ethos already has their rc releases in ;) | 21:58 |
JayF | lol | 21:58 |
iurygregory | when we don't have CI we trust from the data points from testing in real hardware :D | 21:58 |
JayF | I trust you all to make the right decision | 21:58 |
TheJulia | iurygregory: heh | 21:58 |
JayF | I have a preference but it's not based in real technical proof just feelings | 21:58 |
JayF | If "trust CI" is the consensus; land my CI fix and rebase stuff tomorrow and we'll see where we are | 21:59 |
iurygregory | TheJulia, feel free to +W https://review.opendev.org/c/openstack/networking-generic-switch/+/895973 :D | 21:59 |
TheJulia | i just did | 21:59 |
JayF | iurygregory: TheJulia: FWIW; I have an email in my inbox from HPE | 21:59 |
iurygregory | we don't need rebase *I think* we just wait for the promote job to finish and recheck | 21:59 |
JayF | asking if we'd remove the ilo driver if they stopped 3rd party CI | 21:59 |
iurygregory | oh wow | 22:00 |
iurygregory | jesus | 22:00 |
JayF | they want to change to a "validate at the end of the cycle" model | 22:00 |
JayF | I'm unsure how to respond, and I want to put it on the PTG and invite HPE to the session about it; does that sound alright to folks? | 22:00 |
* iurygregory is not aware of this ... | 22:00 | |
TheJulia | Yeah, lets see if we can talk through the mechanics of what they are thinking and the risks | 22:00 |
TheJulia | because... we *cannot* delay for them | 22:00 |
iurygregory | thinking a bit, we still support some drivers that stopped 3rd Party CI | 22:01 |
TheJulia | eh, we're kind of in a "if we get a report it is broken, out it goes" | 22:01 |
iurygregory | or maybe patches didn't trigger their CI... | 22:01 |
TheJulia | mode | 22:01 |
iurygregory | haven't seen FJ or Dell reporting in patches... | 22:01 |
TheJulia | But their 3rd party CI has been broken for a while because they have local changes | 22:01 |
TheJulia | yeah | 22:01 |
iurygregory | so maybe they only run if we change files related to their driver | 22:02 |
JayF | My major concern is not policy/precedent/etc | 22:02 |
TheJulia | I think I could be fine with a validate at the end of the cycle model, but I don't want to be in a situation like this | 22:02 |
JayF | it's that ilo is about to release a substantial change | 22:02 |
JayF | and it's a really, really crummy timing for them to pull 3rd party ci | 22:02 |
JayF | but I guess that's baked in the likely-invalid assumption that they'd version-bump | 22:02 |
iurygregory | ilo6 is already out | 22:02 |
TheJulia | ilo folks *are* better about bumping and versioning | 22:03 |
TheJulia | but yeah, that too is a risk | 22:03 |
TheJulia | Lets set aside an hour, and try to get them to bring their thoughts/concerns | 22:03 |
JayF | My thought is that the best way to handle it would be to try and get them to come to a PTG session | 22:03 |
TheJulia | and discuss the risks | 22:03 |
JayF | Hmm you have more exp working with them so if you think a private meeting is better I can do that too | 22:03 |
TheJulia | Maybe time for perception to be revisited on our part as well, its not a bad thing | 22:03 |
JayF | then we can take the output to the PTG Session on hardware drivers | 22:03 |
iurygregory | ++ | 22:04 |
TheJulia | ++ | 22:04 |
JayF | Who wants in on it? | 22:04 |
iurygregory | feel free to add me, it will depend when its ofc =) | 22:04 |
TheJulia | sure! | 22:05 |
JayF | I assumed it might be a 4am/9pm meeting for me | 22:05 |
JayF | honestly I don't hate the idea of qualification | 22:08 |
JayF | but if we went that route we'd have to make our release process have a good spot for that tbh | 22:08 |
JayF | cycle-with-intermediary-and-rc-and-all-of-the-things /s | 22:09 |
TheJulia | lets maybe not try and reflect this in the community release modeling | 22:10 |
* TheJulia *really* hates the RC model | 22:10 | |
TheJulia | like... unreasonably so | 22:10 |
JayF | I'm more or less saying, you start pulling in third party QA people to help validate releases, you start flirting with things like "freezes" and the like potentially being needed | 22:12 |
JayF | I'm not saying "these things are tied together" I'm saying "Ironic's special release framework gives us an extra thing to think about with consideration of that" :) | 22:12 |
TheJulia | oh, very much so | 22:14 |
* iurygregory also hates the RC model :D | 22:15 | |
TheJulia | err, and they were not able to verify the major issue *why* I took the car to the dealer | 22:16 |
TheJulia | another aspect is intermediate releases | 22:17 |
JayF | lolyep | 22:17 |
JayF | bugfix releases, do they get validated? etc | 22:17 |
TheJulia | yup | 22:17 |
JayF | and plus this model is extra rough for anyone community implementing anything | 22:17 |
JayF | they can only hope it works against theirs means it works in a general case | 22:17 |
JayF | which I guess is same for 3rd party CI except it's whatever boxes they have running there | 22:18 |
TheJulia | Yeah, and then the other concern is what patches get applied for the validation | 22:18 |
TheJulia | I think in every third party CI, we've seen one or more patches carried locally they rebase over, often it is just config/devstack stuffs, but... yeah | 22:19 |
TheJulia | just makes it harder for us to be aware | 22:19 |
JayF | that's a good thought as well | 22:20 |
iurygregory | maybe they need to reduce when they run things | 22:28 |
iurygregory | at least would help a bit, or have jobs in experimental | 22:29 |
iurygregory | some points we should consider when talking with them | 22:29 |
JayF | Honestly it plays into something else I was going to suggest during bug triage topic at PTG | 22:32 |
JayF | if we had a rolling role, like cinder/neutron's bug deputy | 22:32 |
JayF | we could make "ensure CI is sane" part of that role too and try to reduce interrupts across the team (and maybe actually monitor/identify CI breakages when they happen to make them easier to root cause) | 22:32 |
JayF | I think the difficulty of something like that is you need real commitment from folks to be able to dedicate time upstream periodically, and that can be tough | 22:33 |
TheJulia | they have been reporting broken for a while | 22:42 |
TheJulia | unfortunately | 22:42 |
JayF | who has been reporting broken? | 22:42 |
JayF | HPE? They fixed it very recently | 22:42 |
TheJulia | oh, did they finally fix it? | 22:42 |
JayF | like a 48 hour turnaround from an email from me, it wasn't that bad once I told em | 22:42 |
TheJulia | the merge conflict | 22:42 |
TheJulia | no, not bad at all | 22:42 |
JayF | I've had good experiences with them, I just have to poll for them myself | 22:42 |
JayF | they do not subscribe to the Ironic events feed so to speak :D | 22:43 |
TheJulia | yeah | 22:43 |
JayF | HPE CI is 100% passing, it was passing on Iury's firmware change, for instance | 22:43 |
JayF | in fact it passed even with the functional test working forever | 22:43 |
JayF | s/work/hang/ | 22:43 |
TheJulia | okay, I've just not seen it recently | 22:43 |
TheJulia | I just went looking for dell/FJ | 22:44 |
JayF | it's easy to get sorta notice-blind to it | 22:44 |
JayF | Dell/FJ is gone-gone-gone afaict | 22:44 |
TheJulia | yeah | 22:44 |
JayF | and have been... my entire PTL tenure, I believe? | 22:44 |
TheJulia | *looks* like both went gone gone pulled plugs at EOY | 22:44 |
JayF | so a year? | 22:44 |
JayF | yeah that more or less lines up | 22:44 |
TheJulia | FJ commented on stuff months later it looks like, but kind of hard to hunt down the accounts | 22:44 |
TheJulia | dell was definitely end of year, I think FJ might have been up until May | 22:45 |
JayF | In a weird way, it's sorta a victory for Ironic and hardware? | 22:45 |
JayF | because like, the third party CI is not as needed, and it flipped at some point from being "make sure Ironic works right" to "make sure the hardware works right" | 22:46 |
JayF | with redfish I think we have a stronger sense that it's going to work | 22:46 |
TheJulia | yeah, sort of I guess. The challenge is folks out there default into $vendor thinking it is the best/perfect option | 22:47 |
JayF | I'm trying to think of the right way to think about this | 22:47 |
TheJulia | I think we need to work at providing feedback, in terms of "that won't work" or "you may want to try other driver" | 22:47 |
JayF | yes | 22:47 |
JayF | exactly | 22:47 |
JayF | which makes me think of like, QVLs from motherboards | 22:47 |
JayF | where like "most stuff should work; we promise this stuff works" with an actual list of model numbers, etc | 22:47 |
JayF | e.g. right now, someone comes to us and says "I have an HPE server, what driver do I use?" We need more information | 22:48 |
TheJulia | yeah, but nobody is going to actually do that for Open Source, so all we can do today is "detect and try to shunt" or "detect bad case, and provide guidance" | 22:48 |
JayF | a real process like that would probably be DMTF/redfish centered | 22:49 |
JayF | we test against a redfish standard way of doing things, hw vendors get certified good for redfish (or like; a matrix of features or whatever) | 22:49 |
JayF | I want it to exist but I don't think we're the people to do it :) | 22:50 |
TheJulia | but then what do we do about the $super_weird_vendory_thing | 22:50 |
JayF | How many customers care about $super_weird_vendory_thing? | 22:50 |
TheJulia | "insert only one token, with one ethernet port connected... port is magically cloned because *magic*, and deploy the thing" | 22:50 |
TheJulia | only $vendor can *really* tell us | 22:51 |
TheJulia | which means, the *right* way is the vendor works with us to begin to tear down their driver | 22:52 |
JayF | oh, I see what you mean now | 22:52 |
TheJulia | which sounds awful, but maybe that is the right path | 22:52 |
JayF | I thought you meant like, vendor features outside of redfish | 22:52 |
JayF | you mean deconstructing $brandName drivers into the redfish driver | 22:52 |
TheJulia | oh, no, like... FJ virtual media *has* to use SMBFS | 22:52 |
TheJulia | there is no other option, which makes it wholesale incompatible with stock redfish-virtual-media | 22:53 |
JayF | fj virtual media mounts a samba share?! | 22:53 |
TheJulia | well, we don't manage it, but that is the design default in their firmware | 22:53 |
TheJulia | you configure a location AIUI, it gets the filename from there | 22:53 |
JayF | we'd almost have to like ... have some concept of "quirks" and mapping those quirks to hardware automatically or manually | 22:54 |
JayF | like maybe in this world, you point an FJ server at the redfish driver, it does some kind of redfishy "what are you?" question, we activate it into "fj-virtual-media-quirk" mode, and put that in like node[driver_info][quirks] (or make a top level idea for it) | 22:57 |
JayF | try to detach the idea of how the hardware is weird from who mnfr'd that hardware | 22:57 |
TheJulia | I think the special driver features likely need to be cataloged as well. Like FJ wants to do a new driver, but I don’t *think* they have posted a spec yet. | 23:01 |
TheJulia | Oh, posted 2 weeks ago | 23:03 |
TheJulia | Err a week ago | 23:03 |
TheJulia | Everything is scrambled for me | 23:03 |
JayF | yeah, I'm taking my EOD now, I'll see you in the morning | 23:07 |
JayF | stevebaker[m]: if you're working NZ TZ tomorrow and want to do a solid, please recheck https://review.opendev.org/c/openstack/networking-generic-switch/+/874793 https://review.opendev.org/c/openstack/networking-generic-switch/+/886405 https://review.opendev.org/c/openstack/networking-generic-switch/+/847592 once CI fix merges | 23:07 |
JayF | thank you in advance o/ | 23:08 |
stevebaker[m] | I'm on it | 23:08 |
TheJulia | Heh, the FJ driver spec says they will use third party ci. | 23:15 |
opendevreview | Merged openstack/ironic-specs master: Fix linter error in past spec which blocks new spec https://review.opendev.org/c/openstack/ironic-specs/+/893576 | 23:21 |
TheJulia | vanou: o/ | 23:23 |
opendevreview | Merged openstack/networking-generic-switch master: CI fix: Use the un-deprecated v3 etcd API https://review.opendev.org/c/openstack/networking-generic-switch/+/895973 | 23:25 |
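
The "quirks" idea JayF floats between 22:54 and 22:57 could be sketched roughly as below. This is only an illustration under stated assumptions: the quirk name `fj-virtual-media-quirk` and the `node[driver_info][quirks]` location come from the discussion, while the detector table, the `virtual_media_transport` fact key, the function names, and the plain-dict node are all hypothetical and are not Ironic APIs.

```python
# Hypothetical sketch of the "quirks" concept from the discussion.
# None of these names exist in Ironic; the node is a plain dict here.

# Map an observed hardware behaviour (not a brand name) to a quirk.
# The "virtual_media_transport" fact key is an assumption for illustration.
QUIRK_DETECTORS = {
    "fj-virtual-media-quirk": (
        lambda facts: facts.get("virtual_media_transport") == "smbfs"
    ),
}


def detect_quirks(facts):
    """Return the sorted quirk names whose detector matches the probed facts."""
    return sorted(
        name for name, matches in QUIRK_DETECTORS.items() if matches(facts)
    )


def apply_quirks(node, facts, manual=()):
    """Record detected plus manually assigned quirks on the node."""
    quirks = sorted(set(detect_quirks(facts)) | set(manual))
    node.setdefault("driver_info", {})["quirks"] = quirks
    return node


# A node pointed at the generic redfish driver, with facts gathered by
# some "what are you?" probe (the probe itself is not modelled here).
node = apply_quirks({"driver": "redfish"}, {"virtual_media_transport": "smbfs"})
print(node["driver_info"]["quirks"])
```

Keying the detectors on behaviour rather than on manufacturer matches the stated goal of detaching how the hardware is weird from who made it; the `manual` argument covers the "automatically or manually" part of the suggestion.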