Wednesday, 2023-07-19

fricklerdansmith: gmann: kopecmartin: seems 886795 causes some regression, see e.g. https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/888787 06:00
opendevreviewMaxim Sava proposed openstack/tempest master: Add image task client and image tests task APIs.  https://review.opendev.org/c/openstack/tempest/+/888755 06:09
opendevreviewMaxim Sava proposed openstack/tempest master: Add image task client and image tests task APIs.  https://review.opendev.org/c/openstack/tempest/+/888755 06:10
opendevreviewDr. Jens Harbott proposed openstack/devstack master: Revert "Set two different image in tempest irespective of DEFAULT_IMAGE_NAME"  https://review.opendev.org/c/openstack/devstack/+/888650 06:26
opendevreviewyatin proposed openstack/devstack master: Handle more than 1 image while configuring tempest  https://review.opendev.org/c/openstack/devstack/+/888906 06:47
ykarelfrickler, ^06:47
fricklerykarel: doh, lgtm, I always struggle with these advanced bash features, too. do you want to verify with a depends-on? else I'll just approve at once06:55
ykarelfrickler, sure can send a test patch, but let me first update patch to also add a break in for loop06:57
opendevreviewyatin proposed openstack/devstack master: Handle more than 1 image while configuring tempest  https://review.opendev.org/c/openstack/devstack/+/888906 07:00
ykareltesting in https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/888907 07:02
fricklermeh, next CI failure. /me fetches the force-merge hammer11:34
opendevreviewMerged openstack/devstack master: Handle more than 1 image while configuring tempest  https://review.opendev.org/c/openstack/devstack/+/888906 11:41
sean-k-mooneyi just found something fun 13:15
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/8525a961871c4602bedf3df605f72791/log/controller/logs/screen-n-cpu.txt#60338 13:15
sean-k-mooneysome of our job logs are large enough that they will crash your browser tab13:15
sean-k-mooneyi guess that's what happens if you try to mark up a 2.4MB text file in a browser tab13:22
dansmithgmann: I'm seeing a test fail with an unexpected host key, but it's after a rebuild14:46
dansmithAFAIK, post-rebuild the host key *should* be different, so I'm not sure how the ssh client in tempest would know what it *should* be until it has ssh'd in once14:46
dansmithis that not correct?14:46
dansmithhmm: ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())14:49
dansmithperhaps we don't clear the ssh key after the rebuild, or race to do that?14:50
dansmithhmm, I wonder if we start ssh polling before the rebuild is finished, connect once but don't fully execute the command, but have grabbed the host key, then when the rebuild finishes, the polling loop continues and finds the wrong host key14:52
dansmithkopecmartin: does that make sense?14:53
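The race dansmith describes can be illustrated with a toy simulation. This is only a sketch of the hypothesis, not tempest's or paramiko's code: `FakeSSHClient`, `FakeServer`, `HostKeyMismatch` and `poll` are hypothetical names, and the client simply mimics an auto-add policy by remembering the first host key it sees.

```python
class HostKeyMismatch(Exception):
    """Stand-in for the error an SSH client raises on a changed host key."""


class FakeSSHClient:
    """Toy client that remembers the first host key it sees (auto-add style)."""

    def __init__(self):
        self.known_key = None

    def connect(self, server):
        key = server.host_key
        if self.known_key is None:
            # First contact: trust the key and remember it.
            self.known_key = key
        elif self.known_key != key:
            raise HostKeyMismatch(f"expected {self.known_key}, got {key}")
        if not server.ready:
            # Key was exchanged, but the session never became usable.
            raise ConnectionError("connected but command did not complete")


class FakeServer:
    """Toy server whose host key changes when the rebuild completes."""

    def __init__(self):
        self.host_key = "key-before-rebuild"
        self.ready = False

    def finish_rebuild(self):
        self.host_key = "key-after-rebuild"
        self.ready = True


def poll(client, server, attempts, rebuild_done_at):
    """Retry connect; the rebuild completes just before attempt `rebuild_done_at`."""
    for attempt in range(attempts):
        if attempt == rebuild_done_at:
            server.finish_rebuild()
        try:
            client.connect(server)
            return "connected"
        except ConnectionError:
            continue
    return "timed out"
```

If the first attempt happens before the rebuild finishes (`rebuild_done_at=1`), the reused client has cached the old key and the next attempt raises `HostKeyMismatch`; if polling only starts after the rebuild (`rebuild_done_at=0`), the same loop connects cleanly.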
kopecmartinsean-k-mooney: my browser can open that fine, it's slow, but it works .. that's because of the UI i think, if you open the raw file, it works well https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_852/887255/5/check/tempest-integrated-compute/8525a96/job-output.txt .. although you can't send a link to a specific line , hmm, each has its pros and cons15:18
kopecmartindansmith: no idea, what test are we talking about?15:20
dansmithkopecmartin: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_84d/879499/10/check/nova-multi-cell/84d38ab/testr_results.html 15:24
dansmithkopecmartin: if you notice there, it goes to post-rebuild ssh to the server, a couple failed attempts that take a while, and then starts complaining about the host key being wrong15:26
dansmithlooking at lib/ssh I think that likely means we've successfully gotten the host key in one iteration of the poll loop and then subsequent ones fail after the rebuild has completed and the host key is (obviously) different15:27
dansmithI've seen two of these failures this morning15:27
dansmithso I'm thinking we need to wait for REBUILD state before we wait for active to make sure we don't race through and start polling before the rebuild starts, maybe15:28
dansmithalthough I dunno, seems like the transition to REBUILD is synchronous with the request call so we should stall on the wait for active15:29
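The "wait for REBUILD, then wait for ACTIVE" idea floated above could look roughly like the following. It is a minimal sketch under assumptions: `get_status` is a hypothetical callable returning the server's current status string, and `wait_for_status` / `wait_for_rebuild` are illustrative helpers, not tempest waiters. As dansmith notes, it may be unnecessary if the transition to REBUILD is synchronous with the request.

```python
import time


def wait_for_status(get_status, wanted, timeout=5.0, interval=0.0):
    """Poll get_status() until it returns `wanted` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"server never reached {wanted}, last saw {status}")
        time.sleep(interval)


def wait_for_rebuild(get_status, **kwargs):
    """Two-phase wait: see REBUILD first, so a stale ACTIVE is not accepted."""
    wait_for_status(get_status, "REBUILD", **kwargs)
    return wait_for_status(get_status, "ACTIVE", **kwargs)
```

One trade-off of this design: if the rebuild completes between two polls, REBUILD is never observed and the first phase times out, so a real waiter would probably need another signal as well.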
kopecmartinhm, we're rebuilding to an alt_image .. it used to be the same, but we changed that and now the alt_image is different from image - partially thanks to this? https://review.opendev.org/c/openstack/devstack/+/886795 .. so maybe that's the cause? question would be what we wanna do about it15:36
kopecmartini'm sorry if i'm stating obvious, had already a couple of drinks :D .. celebrating today15:37
dansmithI don't think so .. the image shouldn't have a host key in it, that should be generated on first boot15:38
dansmithhowever, after rebuild the host key should be regenerated of course15:39
dansmithbut yeah, the correlation with the image change in devstack does seem ... hard to ignore15:39
dansmiththe ssh client is complaining about the host key being different than what it expects, but we don't pass it into the client, so I think the only thing that it could be picking up on is the ssh key being different from one attempt to the other in a polling loop15:40
kopecmartini'm comparing the server's metadata before and after the rebuild, the name is different, the image is different, and all that is consistent with the code 15:47
kopecmartini can try to run it and put a few breakpoints there, but only tomorrow 15:47
dansmithokay I think we're not on the same page here :)15:51
dansmithI think gmann likely has more context here.. we'll see when he pops up ;)15:52
*** gthiemon1e is now known as gthiemonge15:59
gmanndansmith: checking 17:17
gmanndansmith: there's a chance that the rebuild is not finished before ssh, let me check the test code17:20
gmanndansmith: no, we wait for rebuild to finish and server to be active before trying ssh https://github.com/openstack/tempest/blob/180717d3833b6e0f89c3aa8b34b369f4cccf69fd/tempest/api/compute/servers/test_server_actions.py#L229 17:21
gmanndansmith: maybe this is the issue? we pass the 'server' response to this ssh, which we got before the rebuild, so that is an obsolete 'server' response  https://github.com/openstack/tempest/blob/180717d3833b6e0f89c3aa8b34b369f4cccf69fd/tempest/api/compute/servers/test_server_actions.py#L325 17:22
gmannwhich is used not just to get the ip but here too https://github.com/openstack/tempest/blob/180717d3833b6e0f89c3aa8b34b369f4cccf69fd/tempest/api/compute/servers/test_server_actions.py#L329 17:23
dansmithright, but.. I'm not sure how that would cause this17:23
dansmithagree that we're waiting for active, which should only happen after the rebuild is finished since the rebuild sets the task state17:23
dansmiththe ip should not change across a rebuild and there's an assert in the inner rebuild test helper that verifies that17:25
gmannyeah, and in other place it is used to log console output only so yes this should not be issue17:25
dansmithso I just can't really see how we got here17:25
dansmithwe're also hitting this quite a bit lately, and I see it almost 100% locally:17:28
dansmithhttps://zuul.opendev.org/t/openstack/build/94b65655fe6643aa9ec61ce61a5d3c75 17:28
dansmithgouthamr: ^17:28
gouthamro/ late response; weird19:18
gouthamrseeing it consistently now in the past few builds: https://zuul.opendev.org/t/openstack/builds?job_name=nova-ceph-multistore&project=openstack/nova 19:19
dansmithit doesn't fail for me on fedora 37 nor macos, but does on jammy, all coming from the same local network19:22
dansmithand I've tried updating my ca-certificates package in case it's relevant19:23
dansmithcurl -k works, as expected19:24
gouthamr^ same; no wonder.. i was trying to find evidence in a different job, but that runs on centos-9-stream19:25
dansmithit's not failing all the time.. just this morning after a failure, I rechecked and it passed check then failed again in gate19:28
dansmithso it doesn't seem to be 100% reproducible19:28
dansmithit does, however, seem like maybe we should be doing something better than just curl'ing a raw file out of a repo. Like maybe we should just clone the ceph repo and run it out of there?19:29
gouthamrhttps://bugs.launchpad.net/ubuntu/+source/curl/+bug/2028170 19:30
dansmithI know that'll be a lot bigger, but...19:30
dansmithhah19:30
dansmithI think that bug is wrong (and is marked invalid now) since I was using curl from months ago this morning when testing19:31
gouthamrmarked invalid for everything but jammy?19:32
dansmithoh?19:32
dansmithokay I see19:32
gouthamryeah, they released a fixed package a few hours ago from what i can tell19:32
dansmithis curl'ing the cephadm thing direct from the ceph tree really their recommendation? I dunno what the policy is for landing stuff there, but seems like it could be an easy thing for us to trip on if there's a bad commit there19:33
gouthamryes, because ceph is a really huge repo19:34
gouthamrhttps://docs.ceph.com/en/latest/cephadm/install/#curl-based-installation 19:35
gouthamrour aversion to use a distro package lest we miss some bugfix :P 19:35
gouthamrbut, the location's updated in those docs ^19:35
dansmithyeah, I just wish there was a github release we could fetch or something19:36
dansmithack19:36
* gouthamr will push up a fix19:36
dansmiththat looks more cultivated than just-what-is-in-the-tree19:36
dansmithcool19:36
gouthamrweird to put this under "rpm-${CEPH_RELEASE}/el9/noarch/" cephadm is a python file, and the distro here wouldn't matter19:40
opendevreviewGoutham Pacha Ravi proposed openstack/devstack-plugin-ceph master: Update location of cephadm script  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/888952 19:47
opendevreviewGoutham Pacha Ravi proposed openstack/devstack-plugin-ceph master: Update location of cephadm script  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/888952 19:48
opendevreviewBrian Haley proposed openstack/devstack master: Always set image_uuid_alt in configure_tempest()  https://review.opendev.org/c/openstack/devstack/+/888953 20:08
haleybfrickler: ^^ that change is needed as well for tempest image change, just verified it fixed my stack.sh issue locally20:09
gmannhaleyb: thanks. i thought image_uuid is not set in case of single image but your fix lgtm, 1 comment in the gerrit20:21
haleybgmann: only image_uuid is ever set anymore, https://review.opendev.org/c/openstack/devstack/+/886795/4/lib/tempest remove setting of image_uuid_alt. Running stack.sh without this fails locally, gates must be doing something special20:23
haleybi added similar debugging locally and it was always ""20:24
gmannhaleyb: ok. I agree with always setting the alt uuid too. one comment on your change, to check whether it exists or not, otherwise lgtm20:26
haleybi'm adding a response20:26
haleybgmann: see my comments, but adding a check doesn't get much since we know it's not set as that code was removed :(20:32
gmannhaleyb: but how is it set as "" and the if condition is not letting it be set to the proper uuid in your local env20:32
haleybgmann: it can only be "" if image_uuid is also "", which can't happen in the single image case. sorry i don't understand. it took me an hour of debugging to write that one line of code :)20:35
gmannhaleyb: ok but as we are removing setting of alt based on image_uuid let's set it separately and under if condition. if anyone setting it to "" that we need to fix because there might be valid case of setting it to  "" somewhere 20:36
gmannso set it when it is not set at all. if any scenario does not want to set it to "" and it is set somewhere we can fix that20:37
haleybit will blow up on L302 call if "" in image_size_in_gib20:39
opendevreviewJakub Skunda proposed openstack/devstack stable/yoga: git: git checkout for a commit hash combinated with depth argument  https://review.opendev.org/c/openstack/devstack/+/888752 20:41
gmannhaleyb: but there should not be a case where it is set as "" right ?20:42
gmannwith your change, as you are removing the setting of it at L252 20:43
haleybgmann: right, not with my change, without it and a single image it's always "" though. Even the check on L258 in the multi-image case can go away, as it's not been set there either.20:44
haleybhttps://review.opendev.org/c/openstack/devstack/+/886795/4/lib/tempest changed the behavior completely but the code seems wrong looking at it20:45
gmannhaleyb: need to go for my appointment, will come back and tty in an hr or so20:46
haleybgmann: ack, it is close to eod here so might not respond until tomorrow20:47
dansmithgouthamr: ah, I guess devstack is updating my curl package, which explains why I see it during/after devstack20:53
gouthamrdansmith: oh.. ack; the change (https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/888952) is cool to have despite the curl update; i can see it working on the jammy job we have https://zuul.opendev.org/t/openstack/stream/1b90f46c303b4ed2acdfad4075d504f7?logfile=console.log20:57
dansmithack, my mirror just got the curl update and I can confirm it works also20:58
dansmithI've been rechecking something all week (>:() so I'm hoping I might get it through now...20:58
dansmithunfortunately that console log has a test failure in it21:00
dansmithI'll look at the details when it finishes21:00
gouthamr++21:03
gouthamr> I've been rechecking something all week -- :( 21:03
dansmithyeah, it's sucking my will to live :/21:12
gmannhaleyb: I checked and could not find where it can be set as "" but anyways setting it always in case of single image in glance is ok. approved your changes22:31
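The behaviour agreed on here, always setting the alternate image UUID and falling back to the primary one when only a single image exists, can be sketched as below. This is an illustration of the logic only: devstack's `lib/tempest` is shell, and `pick_image_uuids` is a hypothetical helper, not the merged change.

```python
def pick_image_uuids(image_uuids):
    """Return (image_uuid, image_uuid_alt) from the list of available image UUIDs.

    With a single image the alternate falls back to the primary, so the
    alternate is never left as an empty string.
    """
    if not image_uuids:
        raise ValueError("no images available to configure tempest")
    image_uuid = image_uuids[0]
    image_uuid_alt = image_uuids[1] if len(image_uuids) > 1 else image_uuid
    return image_uuid, image_uuid_alt
```

With this shape, later steps that look up the alternate image (such as the image size call haleyb mentions) always get a non-empty UUID.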
