*** mhen_ is now known as mhen | 02:16 | |
opendevreview | Biser Milanov proposed openstack/tempest master: test_instances_with_cinder_volumes: Make sure attachments in guest are correct https://review.opendev.org/c/openstack/tempest/+/939633 | 12:53 |
---|---|---|
sp-bmilanov | Hi openstack-qa, can someone have a look at this bug and the proposed change? It has been causing some (a non-trivial amount) of our CI runs to fail https://bugs.launchpad.net/tempest/+bug/2095336 https://review.opendev.org/c/openstack/tempest/+/939633 | 13:03 |
opendevreview | Biser Milanov proposed openstack/tempest master: test_instances_with_cinder_volumes: Make sure attachments in guest are correct https://review.opendev.org/c/openstack/tempest/+/939633 | 13:46 |
frickler | the failure I'm seeing most often is cleanup failing during volume tests, but maybe making sure the attachments worked properly in the first case would help, too ^^ cc gmann sean-k-mooney | 13:57 |
sean-k-mooney | one or two of the volume test still wait for active rather then sshable which may also be a factor | 14:41 |
sean-k-mooney | frickler: for exampel the test ehy are modifying https://review.opendev.org/c/openstack/tempest/+/939633/2/tempest/scenario/test_instances_with_cinder_volumes.py#151 | 14:43 |
sean-k-mooney | any tests that attaches a server at run time should wait for sshabel before trying to do the attach | 14:44 |
frickler | sean-k-mooney: yes, seems that this wasn't don't done consistently everywhere, which surprises me a bit | 15:34 |
sean-k-mooney | we got most of the placese a few cycles ago but we defintly missed a few | 15:35 |
frickler | ok, so next time I stumble across some volume error, I'll try to remember to check the test that is failing for this | 15:38 |
gmann | yeah we have some places where sshable server might be needed. We have done a lot of them especially volume attach/detach one in past and then did case by case later | 18:01 |
*** haleyb_ is now known as haleyb | 18:03 | |
gmann | sean-k-mooney: frickler but in this case, test does wait for the sshable server https://review.opendev.org/c/openstack/tempest/+/939633/2/tempest/scenario/test_instances_with_cinder_volumes.py#169 | 18:04 |
gmann | there is something else causing issue | 18:04 |
sean-k-mooney | its waiting after the attachment is done | 18:08 |
sean-k-mooney | we need to wait for it to be sshable before call " self.nova_volume_attach" | 18:08 |
sean-k-mooney | moving the volume attachment to line 174 as is done in the patch | 18:09 |
sean-k-mooney | may be sufficent but ti woudl be better to explictly have a wait for sshable to make it clear that is what happening | 18:09 |
sean-k-mooney | does constucting the linux clinet on line 169 actully ensure it sshable before it returns? | 18:10 |
gmann | sean-k-mooney: in this change volume attachment is moved after ssh access right | 18:10 |
sean-k-mooney | its moved after the clinet is constucted | 18:11 |
sean-k-mooney | is that enouch to try and ssh in want wait? | 18:11 |
sean-k-mooney | i woudl not expect get_remote_client to do that but if it does then with the chage applied that enough | 18:12 |
gmann | get_remote_client is what ssh confirmation | 18:12 |
gmann | which is this guy https://github.com/openstack/tempest/blob/master/tempest/lib/common/utils/linux/remote_client.py#L118 | 18:12 |
sean-k-mooney | validate_authentication | 18:13 |
gmann | this is what we so for wait=SSHABLE case also | 18:13 |
gmann | we do | 18:13 |
sean-k-mooney | ack | 18:13 |
sean-k-mooney | ok so its validated as a sideffect fo concustiing the client | 18:13 |
sean-k-mooney | then y athe current patch is suffeicnt | 18:13 |
sean-k-mooney | its less obvious to me then an explict wait but it will do what we want | 18:14 |
gmann | yeah, its a lengthy test | 18:16 |
sean-k-mooney | the part that im still not conviced by it using dev_name | 18:18 |
sean-k-mooney | we shoudl not rely on the dev_name hint as that all it is | 18:18 |
gmann | sean-k-mooney: sure, maybe we can change that in separate change, but good to leave comment there | 18:21 |
sean-k-mooney | im leaving one now but that depenency is being added by this patch | 18:21 |
sean-k-mooney | it did not exist before so to me it intorducitng a bug | 18:21 |
gmann | I see, make sense, will remove my +2, we should fix that | 18:23 |
sean-k-mooney | i left a comment, you know tempet way better then i so ill defer to your judgement on that but i would be inclind to do the ordering change in the intial patch and then have a followup for the extra assertion | 18:24 |
sean-k-mooney | im not sure what the best way to write that assetion is given we do not have the udev rule for disk/by-id in cirros | 18:24 |
sean-k-mooney | but i woudl assuem blkid or other /sys locations provide the same info? | 18:25 |
sean-k-mooney | althoguh cirros does use the busybox blkid so it may not provide that info | 18:25 |
sean-k-mooney | its been a while since i looked at athat in any detail | 18:25 |
gmann | I am ok to do these extra verification as we have remote ssh client in tests (especially scenario tests) but agree with you that it should handle all cases. Also, I am ok to separate these two things or fix both together, let's wait for author. | 18:32 |
sean-k-mooney | ack | 18:33 |
sean-k-mooney | part of my concern is the name of the volume depend both on how the device is attached, the order and which bus is used. i.e. vd* if its virtio, sd* if its sata or scsi, im not sure if the hint reflect that or not off the top of my head. | 18:35 |
sp-bmilanov | sean-k-mooney: gmann: we have a truncated serial in /sys/block/vdX/serial, and in theory, the Tempest code can verify those. That was one of my initial implementation options, but it is more complex than attaching devices one by one and making sure we have the expected dev inside the instance | 18:54 |
sp-bmilanov | I still do not know the root cause of the issue, but it seems to be a race that gets taken care of when doing the attachments serially | 18:55 |
sp-bmilanov | also, for some reason, blkid, and lsblk do not show serial IDs for virtio devices | 18:57 |
sean-k-mooney | we cant rely on the serial either in all cases | 18:57 |
sean-k-mooney | for type=lun devices it may or may not match the cinder volume uuid | 18:57 |
sean-k-mooney | i guess we decied to assuem sda in https://review.opendev.org/c/openstack/tempest/+/918457/2/tempest/api/compute/volumes/test_attach_volume.py | 18:58 |
sp-bmilanov | right, so, ultimately if we don't want to use /sys/block/vdX/serial, there's no way to confirm the mapping? also, I am going to split the commits just now so its one for the reorder and one for the extra check | 19:01 |
sean-k-mooney | i think, if we are to assert the mappign we woudl ideally try using the serial and fall back to the name but im not sure if that is really rquired in the tempest tests | 19:03 |
sean-k-mooney | if i was to do this genericlly, i would be included to ssh into the vm and list all the blk devices, then attach a voluem and assert an addtional entry exists (extracting it name) and then on detach assert that the orginal set was restored | 19:04 |
sean-k-mooney | but again that really depend on what your are trying to test | 19:05 |
opendevreview | Liron Kuchlani proposed openstack/tempest master: Add test to verify instance bootability after volume snapshot https://review.opendev.org/c/openstack/tempest/+/939329 | 19:06 |
opendevreview | Biser Milanov proposed openstack/tempest master: test_instances_with_cinder_volumes: Make sure attachments in guest are correct https://review.opendev.org/c/openstack/tempest/+/939633 | 19:08 |
opendevreview | Biser Milanov proposed openstack/tempest master: test_instances_with_cinder_volumes: Delay attaching volumes https://review.opendev.org/c/openstack/tempest/+/939663 | 19:08 |
opendevreview | Biser Milanov proposed openstack/tempest master: test_instances_with_cinder_volumes: Delay attaching volumes https://review.opendev.org/c/openstack/tempest/+/939663 | 19:09 |
opendevreview | Biser Milanov proposed openstack/tempest master: test_instances_with_cinder_volumes: Make sure attachments in guest are correct https://review.opendev.org/c/openstack/tempest/+/939633 | 19:09 |
sp-bmilanov | > but again that really depend on what your are trying to test | 19:18 |
sp-bmilanov | I want to make sure that the actual device inside the guest matches what libvirt and OpenStack think the device should be, your approach with the pre- and post- attach checks seem best, I can refactor the change if nobody else minds | 19:18 |
sp-bmilanov | s/libvirt and OpenStack think/libvirt and OpenStack (and hence the Tempest test) think/ | 19:19 |
opendevreview | Merged openstack/devstack master: Removing start_ovn_services call https://review.opendev.org/c/openstack/devstack/+/937606 | 20:05 |
sp-bmilanov | sean-k-mooney: thanks for the suggestions in the comments to the change, I will upload something tomorrow | 20:16 |
sp-bmilanov | btw, for 939663, at this point we don't care whether attach has completed or not, that is the point of the next commit that will verify the attachment inside the guest | 20:21 |
sp-bmilanov | hmm, or maybe waiting to verify the attach at the API level can give us more information if the test fails in some other way | 20:22 |
sp-bmilanov | yeah, that's fair | 20:22 |
opendevreview | Merged openstack/devstack master: Refactor readiness and custom config for ovn-nortd https://review.opendev.org/c/openstack/devstack/+/937039 | 23:23 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!