*** songwenping__ is now known as songwenping | 00:48 | |
sdmitriev | https://review.opendev.org/c/openstack/nova/+/760354/5 | 01:56 |
---|---|---|
sdmitriev | https://review.opendev.org/c/openstack/nova/+/710848/6 | 01:56 |
sdmitriev | https://review.opendev.org/c/openstack/nova/+/710847/8 | 01:57 |
sdmitriev | Hello there, just rebased the commits above, would really appreciate it if someone could review them | 01:57 |
sdmitriev | I'm really interested in fixing that bug since deployments are heavily affected by it | 01:57 |
*** bhagyashris_ is now known as bhagyashris | 06:21 | |
opendevreview | Pierre-Samuel Le Stang proposed openstack/nova-specs master: Implements: blueprint soft-delete-instance-actions https://review.opendev.org/c/openstack/nova-specs/+/821387 | 08:47 |
gibi | bauzas, sean-k-mooney: this is my last week before vacation and I will not be back until M2 so I'd like to get over with the placement spec https://review.opendev.org/q/topic:any-traits-support this week. | 09:42 |
bauzas | gibi: ack good point | 09:42 |
gibi | it was already approved a long time ago so I don't think it is controversial | 09:42 |
bauzas | and reminder that we have our spec review day tomorrow | 09:42 |
gibi | yepp, tomorrow is good for me | 09:43 |
gibi | :) | 09:43 |
gibi | as a trade I'm willing to look at many reviews this week as I don't want to start anything new on my side, just close down open things | 09:50 |
gibi | so hit me with reviews | 09:50 |
gibi | sean-k-mooney: I'm +2 on the recent version of the instance action softdelete spec https://review.opendev.org/c/openstack/nova-specs/+/821387 | 11:03 |
gibi | bauzas: and easy stable/xena backport needing a second core (lyarwood is already on PTO) https://review.opendev.org/q/topic:%2522bug/1945310%2522+branch:stable/xena | 11:11 |
bauzas | gibi: done | 11:17 |
bauzas | gibi: I'm just reviewing pslestang's spec already :) | 11:18 |
sean-k-mooney | gibi: just reading back | 11:50 |
sean-k-mooney | ill look at the placement one shortly | 11:50 |
sean-k-mooney | gibi: i was pretty happy with the instance action spec so ill look at that again this morning since its been revised | 11:51 |
sean-k-mooney | gibi: by the way i don't have +2 rights on the placement spec directory since it's in tree but ill review it anyway | 11:52 |
sean-k-mooney | gibi: oh this is the spec where you do required=in:TRAIT1,TRAIT2 | 11:56 |
*** dpawlik7 is now known as dpawlik | 11:56 | |
opendevreview | Wenping Song proposed openstack/nova master: Fill the AcceleratorRequestBindingFailed exception msg info https://review.opendev.org/c/openstack/nova/+/817326 | 11:57 |
gibi | sean-k-mooney: yes, that is in :TRAIT1,TRAIT2 | 12:07 |
sean-k-mooney | gibi: i have one open question/suggestion for that spec | 12:07 |
sean-k-mooney | gibi: https://review.opendev.org/c/openstack/placement/+/649992/5/doc/source/specs/yoga/approved/2005346-any-traits-in-allocation_candidates-query.rst#68 | 12:09 |
sean-k-mooney | oh i see the last spec is for mixing which is what i was asking us to support | 12:10 |
sean-k-mooney | the proposal for mixing will work for placement but not for nova/glance | 12:11 |
sean-k-mooney | i think just using ; is simpler | 12:12 |
gibi | sean-k-mooney: it is almost directly following how member_of is modelled in the placement API | 12:24 |
gibi | so repeating the param means AND; OR is expressed with the in: prefix | 12:25 |
gibi | sure ';' could be used instead of repeating the required query param but then we would create an inconsistency with member_of | 12:26 |
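The AND/OR semantics gibi describes can be sketched in a few lines of Python. Note this is an illustration only: `build_required_params` is a hypothetical helper, not placement code; it just mirrors the discussed rule that repeating `required` ANDs the groups while the `in:` prefix ORs the traits within one group (consistent with `member_of`).

```python
def build_required_params(groups):
    """Build a placement-style 'required' query string.

    groups: a list of lists of trait names. Traits inside one inner
    list are OR-ed via the in: prefix; the groups themselves are
    AND-ed by repeating the required parameter.
    """
    parts = []
    for group in groups:
        value = group[0] if len(group) == 1 else "in:" + ",".join(group)
        parts.append("required=" + value)
    return "&".join(parts)

# HW_CPU_X86_AVX2 AND (CUSTOM_FOO OR CUSTOM_BAR)
qs = build_required_params([["HW_CPU_X86_AVX2"], ["CUSTOM_FOO", "CUSTOM_BAR"]])
# qs == "required=HW_CPU_X86_AVX2&required=in:CUSTOM_FOO,CUSTOM_BAR"
```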
opendevreview | Pierre-Samuel Le Stang proposed openstack/nova-specs master: Implements: blueprint soft-delete-instance-actions https://review.opendev.org/c/openstack/nova-specs/+/821387 | 12:32 |
sean-k-mooney | gibi: it would yes but the follow-on proposal of using multiple &required=...&required=... | 12:33 |
sean-k-mooney | can't be modelled in flavors or images | 12:33 |
sean-k-mooney | since we cannot support multi opts and merge them | 12:33 |
sean-k-mooney | so we will need to have some other way to mix them in the flavor/image | 12:33 |
sean-k-mooney | which is why i think either requiring all in: parameters to be at the end or using ; makes sense | 12:34 |
gibi | the current flavor syntax is trait:HW_CPU_X86_AVX2=required , so it can be extended to trait:in:FOO,BAR=required | 12:35 |
opendevreview | Merged openstack/nova stable/xena: Reproduce bug 1945310 https://review.opendev.org/c/openstack/nova/+/811405 | 12:35 |
gibi | also trait:in:FOO,BAR=required and trait:BAZ=required can be mixed | 12:36 |
sean-k-mooney | i guess that could work | 12:36 |
pslestang | gibi: thanks for reviewing the spec, I pushed another patch to fix all the syntax errors you highlighted | 12:36 |
gibi | the flavor syntax is ugly | 12:36 |
gibi | but that is a different story | 12:36 |
gibi | :D | 12:36 |
sean-k-mooney | gibi: this might be nicer trait:FOO,BAR=required_in | 12:36 |
gibi | sean-k-mooney: yeah, that could be another option | 12:37 |
gibi | the point is that we have a way to model repetition as the key contains the trait name | 12:37 |
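The flavor-side mapping being discussed can be sketched as below. This is a hypothetical helper, not nova code, and the exact key spelling (`trait:in:...` vs sean-k-mooney's `...=required_in` suggestion) was still being debated in this conversation: the point it illustrates is only that putting the trait names in the extra-spec *key* lets repetition (AND) and `in:` groups (OR) coexist.

```python
def extra_specs_to_required_groups(extra_specs):
    """Translate flavor extra specs into AND-ed trait groups.

    Keys like 'trait:NAME=required' become single-trait groups;
    keys like 'trait:in:A,B=required' become OR groups. Unrelated
    extra specs are ignored.
    """
    groups = []
    for key, value in extra_specs.items():
        if not key.startswith("trait:") or value != "required":
            continue
        body = key[len("trait:"):]
        if body.startswith("in:"):
            groups.append(body[len("in:"):].split(","))  # OR group
        else:
            groups.append([body])  # single AND-ed trait
    return groups

specs = {
    "trait:in:CUSTOM_FOO,CUSTOM_BAR": "required",  # FOO OR BAR
    "trait:HW_CPU_X86_AVX2": "required",           # AND this one
}
groups = extra_specs_to_required_groups(specs)
```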
gibi | pslestang: I will check soon | 12:37 |
sean-k-mooney | gibi: ok the image/flavor representation is out of scope of the placement spec anyway | 12:37 |
sean-k-mooney | so i think you have convinced me that this can work | 12:38 |
gibi | yes, but you had a point: if the flavor had used the required=FOO syntax then we would be in deep trouble | 12:38 |
gibi | so it was a good exercise to see how the flavor will fit | 12:38 |
sean-k-mooney | ya i forgot we reversed it | 12:38 |
sean-k-mooney | and did trait:*=required | 12:39 |
sean-k-mooney | ill update my review with this conversation and change to a +1 | 12:40 |
gibi | sean-k-mooney: thanks | 12:41 |
gibi | pslestang: +2 | 12:41 |
sean-k-mooney | pslestang: ill review that shortly after i grab a coffee | 12:43 |
sean-k-mooney | i unexpectedly have meetings for the next 4 hours but i should have time to review your spec before i have to join them | 12:43 |
sean-k-mooney | its almost as if there are pending holidays that people will be taking en masse and everything is being crammed into this week :) | 12:44 |
gibi | sean-k-mooney: I hope that 4 hour meeting is some light gathering before the vacation time to chill out and slow down :) | 12:44 |
pslestang | gibi: thanks | 12:44 |
plibeau3 | lyarwood: I have applied part of the proposition. I have put some questions on patchset 3. -> https://review.opendev.org/c/openstack/nova/+/820531 | 12:44 |
pslestang | sean-k-mooney: sure, don't worry | 12:44 |
gibi | pslestang: thank you for the nicely written spec! | 12:45 |
gibi | plibeau3: lyarwood is on PTO already | 12:45 |
gibi | plibeau3: so be prepared for slow response :) | 12:45 |
plibeau3 | gibi: oki :) | 12:46 |
opendevreview | Merged openstack/nova stable/xena: Query ports with admin client to get resource_request https://review.opendev.org/c/openstack/nova/+/811407 | 12:47 |
sean-k-mooney | bauzas: unless you want to specifically look at the instance-action spec im also now +2 on it and was going to +w | 12:54 |
sean-k-mooney | pslestang: thanks for adding the lines regarding restore | 12:55 |
pslestang | gibi: sean-k-mooney thanks! | 13:08 |
gibi | bauzas: I've responded in https://review.opendev.org/c/openstack/nova/+/820549 , could you upgrade your vote to +2? :) | 13:29 |
bauzas | gibi: sean-k-mooney: sorry, was afk for getting a 3rd jab | 14:04 |
gibi | bauzas: nice | 14:04 |
sean-k-mooney | bauzas: no worries i just +w'd the instance action spec | 14:04 |
sean-k-mooney | i can probably pull it out of the gate if you want to review? | 14:04 |
sean-k-mooney | gibi and i are both +2 on it | 14:05 |
bauzas | sean-k-mooney: nah, eventually I just +w'd it too | 14:07 |
bauzas | I'm OK with the approach | 14:07 |
bauzas | so we wouldn't have any upgrade issues | 14:07 |
sean-k-mooney | ack | 14:11 |
opendevreview | Merged openstack/nova-specs master: Implements: blueprint soft-delete-instance-actions https://review.opendev.org/c/openstack/nova-specs/+/821387 | 14:16 |
gibi | gmann: due to https://review.opendev.org/c/openstack/nova/+/821423 the AZ related tempest tests cannot be run in parallel with tests that are booting instances (you can see many example failures in the run on that patch). Is there an easy way to separate these tests out? | 14:38 |
gibi | I see fixtures.LockFixture('availability_zone') in these tests, but I guess that only protects the AZ tests from each other | 14:39 |
gibi | and it would be hard to add the same lock to all the other tests that boot instances | 14:39 |
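gibi's point about the lock scope can be seen with a toy named-lock registry. This is a sketch, not tempest's actual LockFixture implementation: it only shows that a named lock serializes exactly the tests that acquire that same name, so server-booting tests that never take `'availability_zone'` are not excluded at all.

```python
import threading

# Toy named-lock registry: one shared Lock object per name.
_locks = {}

def named_lock(name):
    """Return the (process-local) lock associated with a name."""
    return _locks.setdefault(name, threading.Lock())

# Two AZ tests use the same name, so they contend on one lock object:
az_a = named_lock("availability_zone")
az_b = named_lock("availability_zone")
# A server-boot test takes a different name (or none), so nothing
# serializes it against the AZ tests:
boot = named_lock("some_boot_test")
```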
gibi | gmann: I'm thinking about separating out the AZ tests to a separate tempest run command in all tox targets, but that is also a bit of a pain to do properly | 14:44 |
gmann | gibi: looking | 15:37 |
opendevreview | yuval proposed openstack/nova master: Lightbits LightOS driver https://review.opendev.org/c/openstack/nova/+/821606 | 16:15 |
jhartkopf | Hi there, I need another +2 on my patch: https://review.opendev.org/c/openstack/nova/+/813672 It's just a small fix. I'd appreciate if someone could take time for review. | 16:36 |
gmann | gibi: I got the problem, the tempest test also needs to be modified as per your change. | 16:46 |
gibi | gmann: what would be the good way to separate the AZ tests from the rest? | 16:47 |
gmann | gibi: the test is trying to remove a host which has servers (the bug you are fixing). you can see the remove host request - https://zuul.opendev.org/t/openstack/build/5a9aad45498d45cda1eabd2b87b743d2/log/controller/logs/screen-n-api.txt#2268 | 16:47 |
gmann | and the delete server request comes after that https://zuul.opendev.org/t/openstack/build/5a9aad45498d45cda1eabd2b87b743d2/log/controller/logs/screen-n-api.txt#2568 | 16:47 |
gmann | gibi: I am fixing the test to wait for the server delete before it does the remove host request | 16:48 |
gmann | there is no race condition here; the test just did not wait for the server to be deleted before remove_host is called in cleanup | 16:48 |
gibi | gmann: OK, if that is in a single test case I can fix that up. But I think there are cases where we have two parallel tempest tests: 1) does some AZ testing by moving a host between aggregates 2) boots an instance on that host. If these two happen in parallel then 1) will fail after the bugfix | 16:49 |
gmann | gibi: I can check if there is any such case. I think that is the only test we have that adds/removes a host to an agg with servers | 16:50 |
gibi | OK, I will gather the full picture and fix up the cleanup issue | 16:50 |
gmann | gibi: let me update the test and add your change as depends-on to see if all pass. and later we can merge the tempest fix before the nova change as per the bug fix procedure | 16:50 |
gibi | gmann: sure, if you have time then I would appreciate the tempest fix | 16:51 |
gmann | those tempest tests play on the same host but we have only a single test doing a server boot; I will check if we have any case like the race condition you explained | 16:51 |
gmann | sure | 16:51 |
gibi | gmann: I think test_aggregate_basic_ops failure in https://zuul.opendev.org/t/openstack/build/83aad9cdd5b849d48f4ffb463880d259/logs is a race | 16:53 |
gibi | as that test case does not boot a server | 16:53 |
gibi | but it fails as probably other parallel test booted one | 16:53 |
gmann | gibi: ohk, did not see that in latest run. let me check | 16:54 |
gmann | we should not need a lock here; instead the test should check the host with AZ etc carefully | 16:55 |
gibi | gmann: so you mean skip the AZ test if the host has an instance? | 16:58 |
gibi | or wait in the test until the host has no instance? | 16:58 |
gmann | gibi: skip like we do here https://github.com/openstack/tempest/blob/7e96c8e854386f43604ad098a6ec7606ee676145/tempest/api/compute/admin/test_aggregates.py#L228 | 16:59 |
gibi | gmann: but would that not mean we sometimes run the test and sometimes don't, depending on which other parallel test runs at the same time | 16:59 |
gibi | ? | 16:59 |
gibi | so in theory we might lose test coverage by that | 17:00 |
gmann | gibi: i mean skip if there is no such host in the env. a test creating an instance has to clean it up, so we can add a wait or so | 17:00 |
gmann | gibi: ah, we do have a lock there but not sure why the race is happening https://github.com/openstack/tempest/blob/master/tempest/scenario/test_aggregates_basic_ops.py#L119 | 17:02 |
gmann | gibi: ohk but any other non-agg test might be picking that host and creating a server | 17:02 |
gibi | gmann: exactly ^^ | 17:03 |
gibi | so the lock only avoids two AZ tests racing with each other | 17:03 |
gmann | yeah | 17:03 |
gibi | can we somehow make the AZ tests always run _after_ all the other tests? | 17:04 |
gmann | and skipping if the host has a server also does not solve the issue as a server can be created in between | 17:04 |
gmann | gibi: we can configure it that way in our CI but for tempest runs it is still a bug that this test cannot be run in parallel | 17:05 |
sean-k-mooney | we do run the scenario tests after all the others, in serial | 17:06 |
sean-k-mooney | so we could put the az tests in that group | 17:07 |
sean-k-mooney | at least for tempest-full | 17:07 |
sean-k-mooney | https://github.com/openstack/tempest/blob/master/tox.ini#L112-L113 | 17:07 |
sean-k-mooney | we could exclude the az tests the same way and run them with the scenario and slow tests | 17:07 |
sean-k-mooney | its not the ideal solution but its an option | 17:08 |
gmann | gibi: the best we can do in the tests is 'check the host has no servers and only then add/remove' but that does not solve the race completely | 17:08 |
sean-k-mooney | gmann: we don't really want to make the test depend on things that are dynamic like that really | 17:09 |
gmann | gibi: or we can add try/except and skip the test if there is a conflict error due to a server being present, otherwise pass | 17:09 |
gibi | sean-k-mooney: you mean the AZ tests would be a 3rd group? | 17:09 |
gibi | to separate them from all the others | 17:10 |
sean-k-mooney | gibi: not entirely, just put them with the scenario tests | 17:10 |
sean-k-mooney | we could have them be a third group | 17:10 |
gmann | otherwise our only choice is to remove the test | 17:10 |
sean-k-mooney | but only if we could run them in parallel in that group | 17:10 |
gmann | sean-k-mooney: that does not solve the issue, it is just our CI fix | 17:10 |
gmann | anyone running tempest will face the same issue | 17:10 |
sean-k-mooney | gmann: right but the only way to solve the issue is to use a lock really | 17:11 |
gibi | sean-k-mooney: does the scenario / slow group run in a single thread? if so we can add the AZ tests there, but if those also run in parallel then the same race can happen there too | 17:11 |
gmann | sean-k-mooney: we cannot use a lock as it ends up making all tests run serially | 17:11 |
sean-k-mooney | gibi: for tempest-full its serial so one thread yes | 17:11 |
gmann | gibi: yeah that is happening now. | 17:11 |
gmann | the scenario tests also create servers. | 17:12 |
gibi | yeah, locking means we basically need to lock against any test that boots an instance, that is pretty restrictive. | 17:12 |
gmann | yeah | 17:12 |
sean-k-mooney | gibi: not quite | 17:12 |
sean-k-mooney | yes we do but the locking is slightly different | 17:13 |
gmann | it ends up running tempest in serial | 17:13 |
sean-k-mooney | basically we need something like a reader-writer lock | 17:13 |
sean-k-mooney | the aggregate tests need to grab the writer lock but the boot tests need only a reader lock | 17:13 |
gmann | I see these as good options if we want to keep the tests - 1. the test will 'check the host has no servers and only then add/remove' 2. add try/except and skip the test if there is a conflict error due to a server being present, otherwise pass | 17:14 |
sean-k-mooney | so the build tests can run in parallel with the read lock until an aggregate test tries to modify things, in which case it grabs the writer lock | 17:14 |
gmann | we do have the same kind of try/except for the image tests too for some cases; that is the best tempest can do to test these API race operations | 17:16 |
gmann | and it matches the real operation scenario | 17:16 |
sean-k-mooney | perhaps but i still think a reader/writer lock would be a good fit here | 17:16 |
gmann | sean-k-mooney: but boot tests holding the reader lock will still create servers on the host while the aggregate test is waiting for the write lock. correct? | 17:17 |
gmann | if we want to isolate them then it has to be the same lock | 17:18 |
sean-k-mooney | it depends on the semantics of the lock. you can have it prefer reads so that you can't get the write lock until all readers release | 17:18 |
sean-k-mooney | https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock | 17:18 |
gmann | yeah which is the same thing, right. it ends up running in serial given the many current boot tests | 17:19 |
sean-k-mooney | well we have to run in serial | 17:19 |
gmann | let me try the above check and then we can see how it looks | 17:19 |
sean-k-mooney | we can run in parallel only if aggregates are not being changed | 17:19 |
sean-k-mooney | if they are then only that test can happen as it's global state | 17:20 |
sean-k-mooney | we actually need "write-preferring RW lock" semantics if we were to use this correctly | 17:21 |
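The write-preferring readers-writer lock sean-k-mooney describes can be sketched as below. This is a minimal single-process illustration, not a tempest fixture (tempest workers are separate processes, so a real version would need cross-process locking): server-boot tests would take the read side, aggregate/AZ tests the write side, and a queued writer blocks new readers so it cannot be starved.

```python
import threading

class RWLock:
    """A minimal write-preferring readers-writer lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._waiting_writers = 0

    def acquire_read(self):
        with self._cond:
            # Write-preferring: new readers also wait while a writer
            # is merely queued, so aggregate tests are not starved.
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Multiple readers (boot tests) hold the lock concurrently; a writer (an aggregate test) gets exclusive access once all readers drain.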
gibi | as we have a small number of AZ tests I still think separating them out to the end would be easier than adding a reader lock to each test that boots an instance | 17:22 |
gibi | of course if such separation is possible | 17:22 |
sean-k-mooney | gibi: yep it would be but it needs to be done by the caller of tempest | 17:22 |
sean-k-mooney | we can do it via tox with the regex support | 17:22 |
sean-k-mooney | but every client would have to do that | 17:23 |
gibi | sean-k-mooney: then we might need to invent something inside tempest to order cases? | 17:23 |
sean-k-mooney | unless we add a new feature to tempest to tag things as "must be serial" | 17:23 |
gibi | yeah | 17:23 |
gibi | like that | 17:23 |
sean-k-mooney | if we could do that as a decorator that would be nice | 17:23 |
gibi | OK, so we have a couple of options: a) RW lock b) @serial decorator | 17:23 |
sean-k-mooney | i just dont know if we can. i suspect its possible | 17:23 |
sean-k-mooney | c) tox.ini for now | 17:24 |
gibi | yeah temporarily tox.ini but I don't really like that | 17:24 |
sean-k-mooney | its what we have done for scenario tests for years | 17:24 |
gmann | yeah we should not do that as other tempest users will still face the error | 17:24 |
sean-k-mooney | it works but you have to know to do it | 17:24 |
gibi | I will sleep on this. I don't need to rush to land the nova bug fix so we have time | 17:25 |
gmann | sean-k-mooney: we did that only for env/timeout restrictions; they can be run in parallel like we do in the tempest-parallel job | 17:25 |
gibi | thanks for the input | 17:25 |
sean-k-mooney | gmann: they can if you have enough resources in the cloud yes | 17:25 |
gibi | and I will read back if you continue discussing this | 17:25 |
gibi | :) | 17:25 |
gmann | sean-k-mooney: here we are making it so tempest cannot be run in parallel, so we have to fix the tests, not the way we run them | 17:25 |
sean-k-mooney | we do it because with our default concurrency they wont fit in the ci vms | 17:26 |
sean-k-mooney | gmann: yep i understand | 17:26 |
sean-k-mooney | however if this is a gate blocker its better to unblock the gate and then fix it properly | 17:26 |
gmann | gibi: sean-k-mooney let me add the test modification with checks and try/except. and later we can see if tagging tests to run serially can be done or not | 17:26 |
sean-k-mooney | ack | 17:26 |
gmann | sean-k-mooney: this is not a gate blocker but the change is needed with the nova fix | 17:27 |
gmann | a new bug fix in nova, so the tempest tests need a fix too | 17:27 |
sean-k-mooney | this is for stephen's bug fix to prevent you updating an aggregate when there are instances on the host? | 17:29 |
gibi | sean-k-mooney: yes | 17:30 |
sean-k-mooney | that was previously meant to be blocked but i assume there was an edge case we missed | 17:30 |
gmann | yeah | 17:30 |
gmann | https://bugs.launchpad.net/nova/+bug/1907775 | 17:30 |
gibi | sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/821423 | 17:30 |
sean-k-mooney | ok that was "fixed" for a different edge case many releases ago | 17:30 |
sean-k-mooney | so im surprised tempest has not hit this already | 17:30 |
gibi | sean-k-mooney: what was actually fixed? | 17:33 |
gibi | sean-k-mooney: as this bug now has a reproduction test | 17:33 |
gmann | yeah, we never added a restriction for add/remove host having servers | 17:34 |
gmann | that is what the tempest test is doing | 17:34 |
sean-k-mooney | is_safe_to_update_az | 17:35 |
sean-k-mooney | https://github.com/openstack/nova/commit/8e19ef4173906da0b7c761da4de0728a2fd71e24 | 17:35 |
sean-k-mooney | https://github.com/openstack/nova/commit/0ad5a64dc9ac4b1cfb8038f9171b6385fdf07f28 | 17:35 |
gmann | ohk, no two different AZs while adding | 17:35 |
sean-k-mooney | we currently block you from adding or removing the host to the host-aggregate that has the az metadata | 17:36 |
gmann | different AZ | 17:37 |
sean-k-mooney | well more when doing an az update | 17:37 |
sean-k-mooney | but this code was meant to also cater for this usecase | 17:37 |
gibi | hm the first one is aggregate update, that is different from add/remove host but yes that could race with instance boot | 17:38 |
gibi | the second is about the one-host-can-only-be-in-one-AZ case | 17:39 |
gibi | that I think cannot race with instance boot | 17:39 |
sean-k-mooney | gibi: well yes and no. adding or removing a host from an aggregate is really an update too in the more general sense | 17:39 |
sean-k-mooney | my point was that https://review.opendev.org/c/openstack/nova/+/821423/1/nova/compute/api.py#6458 | 17:40 |
sean-k-mooney | should have been blocking this edge case already | 17:40 |
sean-k-mooney | but it does not, hence the need for stephen's change | 17:40 |
gibi | OK, I really need to drop | 17:42 |
gibi | for today | 17:42 |
gibi | I will think about it more tomorrow o/ | 17:42 |
*** tbachman_ is now known as tbachman | 17:55 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!