*** abhishekk is now known as akekane|home | 04:48 | |
*** akekane|home is now known as abhishekk | 04:48 | |
manuvakery1 | Hi .. if i take an instance snapshot after resizing to a higher flavor, the min_disk property is set to the older flavor's disk size. Is this a known issue in train? | 04:58 |
sean-k-mooney | ade_lee: so first bit of feedback is the job is not capturing logs from compute-0, only compute-1 | 05:31 |
sean-k-mooney | ade_lee: second bit of feedback is this is a known issue | 05:31 |
sean-k-mooney | 2022-06-27 10:32:42.103 2 ERROR nova.virt.libvirt.driver [-] [instance: 1fb29abc-c443-4404-81df-312b233d05ca] Live Migration failure: End of file while reading data: | 05:32 |
sean-k-mooney | We trust you have received the usual lecture from the local System | 05:32 |
sean-k-mooney | Administrator. It usually boils down to these three things: | 05:32 |
sean-k-mooney | #1) Respect the privacy of others. | 05:32 |
sean-k-mooney | #2) Think before you type. | 05:32 |
sean-k-mooney | #3) With great power comes great responsibility. | 05:32 |
sean-k-mooney | sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper | 05:32 |
sean-k-mooney | https://bugzilla.redhat.com/show_bug.cgi?id=2089520 | 05:33 |
sean-k-mooney | and its the same root cause | 05:33 |
sean-k-mooney | live_migration_uri=qemu+ssh://nova_migration@%s:2022/system?keyfile=/etc/nova/migration/identity&proxy=netcat | 05:33 |
sean-k-mooney | you are defining ^ | 05:34 |
sean-k-mooney | the live_migration_uri is deprecated and we should not be using it downstream | 05:34 |
sean-k-mooney | but the actual issue is that the netcat at the end is forcing netcat to be used in the live migration wrapper | 05:35 |
sean-k-mooney | so we are taking this if branch instead of the previous one https://github.com/rdo-packages/nova-distgit/blob/rpm-master/nova-migration-wrapper#L73-L75= | 05:35 |
sean-k-mooney | there are two issues with that: first, nc is not installed, and second, the sudoers file only allows it to be used via the fully qualified path | 05:36 |
sean-k-mooney | https://github.com/rdo-packages/nova-distgit/blob/rpm-master/nova_migration-sudoers#L3= | 05:37 |
sean-k-mooney | if nc was actually installed it might work, as the command-not-found helper might be triggering the sudo prompt, so the path might not be the problem | 05:37 |
sean-k-mooney | but the issue is it's taking the netcat path | 05:37 |
sean-k-mooney | https://gitlab.com/libvirt/libvirt/-/blob/65312001bd972df8b7d4f11ea4662aff4889bee5/src/rpc/virnetclient.c#L446-448 | 05:43 |
sean-k-mooney | this is the relevant libvirt code | 05:43 |
sean-k-mooney | proxy=netcat will not work on 17; we should either use auto or force the virt-ssh-helper via the native proxy | 05:49 |
sean-k-mooney | change the migration uri to proxy=native | 05:49 |
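(For reference, a sketch of the fix being discussed: the same downstream live_migration_uri with the proxy parameter switched from netcat to native so libvirt uses virt-ssh-helper; auto would also work. The keyfile path and port are copied from the log above, not recommendations, and since live_migration_uri is deprecated the longer-term fix is to drop the option and let libvirt pick the transport.)

```ini
# /etc/nova/nova.conf (compute node) - illustrative only
[libvirt]
live_migration_uri = qemu+ssh://nova_migration@%s:2022/system?keyfile=/etc/nova/migration/identity&proxy=native
```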
sean-k-mooney | ade_lee: based on james's comments and looking at the code this is a puppet-nova bug https://bugzilla.redhat.com/show_bug.cgi?id=2089520#c3 they obviously never got the memo that using the uri directly has been deprecated since like osp 13 and that netcat was going away in 17 and they needed to stop forcing netcat and use either auto or native or stop generating the uri to let | 05:56 |
sean-k-mooney | libvirt use the virt-ssh-helper. | 05:56 |
sean-k-mooney | actually looking at puppet-nova this looks like it's a ooo issue | 06:00 |
kashyap | sean-k-mooney: At least there seems to be an upstream bug that says "fix released", but no patch there: https://bugs.launchpad.net/tripleo/+bug/1918250 | 06:01 |
sean-k-mooney | nice find | 06:02 |
kashyap | Yep, here it is (it's in Mar 2021): https://review.opendev.org/c/openstack/puppet-tripleo/+/779784/ | 06:02 |
kashyap | sean-k-mooney: Unrelated: You seem to be quite early up today | 06:02 |
sean-k-mooney | well | 06:02 |
sean-k-mooney | that might have been regressed | 06:02 |
sean-k-mooney | ya i woke up at 5:30 and didnt feel like going back to sleep | 06:03 |
sean-k-mooney | so i got up | 06:03 |
kashyap | Is there a new issue? | 06:03 |
kashyap | (I see) | 06:03 |
sean-k-mooney | live migration does not work on 17 period | 06:03 |
sean-k-mooney | well at least not with tls-e config | 06:03 |
kashyap | Is there a bz link? | 06:03 |
sean-k-mooney | not sure about with it disabled | 06:03 |
sean-k-mooney | kashyap: yes i'll get it in a sec but the issue is that proxy=netcat is getting generated in the migration uri | 06:04 |
frickler | are you sure you are in the right channel? this sounds like rdo talk | 06:04 |
sean-k-mooney | and netcat is not installed in the container since we should be using the virt-ssh-helper in 17 | 06:04 |
kashyap | frickler: Heh, I briefly wondered if it's an upstream or a downstream-specific discussion | 06:04 |
sean-k-mooney | frickler: actually this was meant to be downstream but meh, it's a ooo bug | 06:05 |
sean-k-mooney | it was meant to be rhos-dev but i clicked the wrong tab | 06:05 |
sean-k-mooney | i have upstream on the top pane and downstream on the bottom | 06:05 |
gibi | o/ | 07:00 |
sean-k-mooney | gibi: o/ | 07:17 |
sean-k-mooney | frickler: so kashyap found https://review.opendev.org/c/openstack/puppet-tripleo/+/779313/ and i have filed the revert https://review.opendev.org/c/openstack/puppet-tripleo/+/847818 just to close the loop on the downstream topic i raised upstream :) in case you were wondering | 07:21 |
sean-k-mooney | tl;dr is we forced netcat to work around the lack of support in the rdo package for virt-ssh-helper and since then netcat has been removed from the ooo container | 07:22 |
sean-k-mooney | so we should just go back to auto now that we have support in the package | 07:22 |
whoami-rajat | hi #openstack-nova , would like to request reviews on my volume-backed instance rebuild feature (3 patches in chain). the spec has merged and it was targeted for the yoga cycle (but couldn't make it to the feature freeze) so would like to get some early feedback | 07:23 |
whoami-rajat | https://review.opendev.org/c/openstack/nova/+/820368 | 07:23 |
sean-k-mooney | whoami-rajat: specs are approved per release so the important thing is that it was re-approved for zed | 07:33 |
whoami-rajat | sean-k-mooney, yep it was re-approved | 07:35 |
whoami-rajat | https://review.opendev.org/c/openstack/nova-specs/+/840155 | 07:36 |
sean-k-mooney | whoami-rajat: yep i remember | 07:36 |
sean-k-mooney | i have your review open. i need to do some downstream jira work quickly but ill review them when im done | 07:37 |
whoami-rajat | great, what i meant was it was approved last cycle as well but just wanted early reviews on it so i can make the deadline this time :) | 07:37 |
whoami-rajat | sean-k-mooney, thanks! | 07:37 |
frickler | sean-k-mooney: thx for the update. I really didn't want to drive you away, it just looked a bit out of context | 08:11 |
jkulik | Is there something in Nova that lets me express anti-affinity towards a group of hosts in the same rack? Use-case: we structure our HVs into racks and there might be customers not wanting their VM to run on the same HV (current anti-affinity I know about) and customers who don't want to run in the same rack as another VM, while staying in the same AZ. | 09:03 |
gibi | jkulik: I think there is no automatic way to express rack level (or host aggregate level) anti-affinity. What you can do is to create separate flavors for separate racks by referring to the specific rack in the flavor | 09:06 |
gibi | but that does not scale well for many racks | 09:06 |
gibi | and many flavors | 09:06 |
gibi | I do remember discussing this on the last summit in berlin | 09:07 |
gibi | was it in the blazar session? | 09:07 |
gibi | https://etherpad.opendev.org/p/blazar-preemptible-and-gpus L64 | 09:08 |
gibi | or more like L57 | 09:08 |
sean-k-mooney | jkulik: we only support this at the az level | 09:11 |
sean-k-mooney | well no | 09:11 |
sean-k-mooney | at the host level | 09:11 |
sean-k-mooney | you can use AZs per rack | 09:11 |
sean-k-mooney | but we have no aggregate or az anti-affinity concept in nova | 09:11 |
sean-k-mooney | and its not simple to add | 09:12 |
gibi | sean-k-mooney: I had a suggestion in the above etherpad how to add it to nova | 09:12 |
sean-k-mooney | jkulik: if you want to isolate customers we have tenant isolation filters | 09:12 |
sean-k-mooney | gibi: we could do it via placement aggregates if we wanted to | 09:13 |
gibi | sean-k-mooney: nope | 09:13 |
sean-k-mooney | but i was just saying we don't currently support it | 09:13 |
gibi | yes, we do not currently support it | 09:13 |
gibi | the placement way would be harder than a nova way as placement aggregates have no metadata | 09:13 |
sean-k-mooney | we have also rejected it in the past as we did not want to add more orchestration to nova | 09:13 |
gibi | while nova aggregates have metadata | 09:13 |
sean-k-mooney | well nova aggregates are mapped to placement aggregates | 09:14 |
gibi | without the metadata piece | 09:14 |
sean-k-mooney | but sure for the rack affinity that is doable | 09:14 |
gibi | we need the metadata to mark an aggregate as target for affinity/anti-affinity | 09:14 |
sean-k-mooney | az anti-affinity which was the other request can't be done that way but we can do rack/row/room anti-affinity with aggregate metadata and a filter | 09:15 |
sean-k-mooney | gibi: ya so if we were to do this i would make the filter generic | 09:15 |
gibi | sure | 09:15 |
gibi | that was an afterthought for me as well | 09:15 |
sean-k-mooney | so that you can define a set of labels and then express the anti-affinity requirement in the flavor | 09:15 |
sean-k-mooney | like you do with ceph | 09:16 |
sean-k-mooney | and the placement maps | 09:16 |
gibi | yep generic label based affinity/anti-affinity either via the flavor or via the server group API | 09:16 |
sean-k-mooney | yep | 09:16 |
gibi | so we only need some devs to propose a spec and then the implementation :D | 09:17 |
gibi | easy peasy :D | 09:17 |
sean-k-mooney | that i would be ok with but you could map the info to placement too perhaps as a step 2 | 09:17 |
gibi | sean-k-mooney: you mean extend the aggregate concept in placement with metadata? | 09:17 |
sean-k-mooney | no we could likely model this with aggregates and custom traits | 09:17 |
sean-k-mooney | i have not fully thought that out | 09:18 |
gibi | I don't like it, as then the trait needs to be on all the RPs in the aggregate | 09:18 |
sean-k-mooney | but it feels like we should be able to do that | 09:18 |
sean-k-mooney | well i was thinking more like how misc_shares_via_aggregate works | 09:18 |
gibi | technically doable but I would extend the aggregate concept instead in placement | 09:18 |
sean-k-mooney | i would be fine with extending aggregates too if we had a clean way to extend it | 09:19 |
sean-k-mooney | perhaps traits on aggregates or some other metadata but doing it in nova first is a lot simpler as you said | 09:19 |
sean-k-mooney | so get it working (nova) then make it fast (placement) | 09:19 |
gibi | we are in agreement :) | 09:20 |
sean-k-mooney | jkulik: interested in working on ^ | 09:20 |
sean-k-mooney | jkulik: it's not an uncommon request so we can also try and pitch it to our pm downstream | 09:20 |
sean-k-mooney | but no promises they will go for it and we won't look at it until at least the A cycle in any case | 09:21 |
sean-k-mooney | jkulik: but if you had time to work on it we have 2 weeks to spec freeze and we could probably review it if you had something ready before code freeze | 09:21 |
sean-k-mooney | jkulik: realistically though it's probably an A or later cycle thing | 09:22 |
sean-k-mooney | but it would be doable in an out-of-tree scheduler filter today | 09:22 |
sean-k-mooney | at least if you took the flavor approach initially to express the affinity requirement | 09:22 |
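(A rough sketch of what such an out-of-tree filter could look like, not an existing nova filter: the aggregate_anti_affinity extra spec, the RackAntiAffinityFilter name and the /etc/nova/rack_map.json host-to-rack mapping are made up for illustration; only BaseHostFilter, host_passes and the request spec attributes are real nova APIs.)

```python
# Hypothetical out-of-tree scheduler filter sketch. The extra spec name and
# the JSON rack map are assumptions, not existing nova mechanisms.
import json

from nova.scheduler import filters


class RackAntiAffinityFilter(filters.BaseHostFilter):
    """Reject hosts whose rack already holds a member of the server group."""

    RUN_ON_REBUILD = False

    def __init__(self):
        super().__init__()
        # e.g. {"compute-0": "rack-1", "compute-1": "rack-2", ...}
        with open('/etc/nova/rack_map.json') as f:
            self._rack_of = json.load(f)

    def host_passes(self, host_state, spec_obj):
        # flavor-driven policy, e.g. extra spec aggregate_anti_affinity=rack
        level = spec_obj.flavor.extra_specs.get('aggregate_anti_affinity')
        group = spec_obj.instance_group
        if level != 'rack' or not group or not group.hosts:
            return True
        used_racks = {self._rack_of.get(h) for h in group.hosts}
        return self._rack_of.get(host_state.host) not in used_racks
```

(Such a filter would be enabled like any other via [filter_scheduler]enabled_filters.)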
sean-k-mooney | extending the server group api would require an api change which needs a spec and is not backportable due to how api versioning works | 09:23 |
sean-k-mooney | gibi: what was the blazar use case/importance of this | 09:24 |
sean-k-mooney | i assume some sort of ha/fault tolerance use case | 09:24 |
sean-k-mooney | oh no performance | 09:25 |
sean-k-mooney | i.e. reserve three GPU VMs in the same rack, where possible (hard/soft affinity). | 09:25 |
sean-k-mooney | Specifically this is to reduce network latency (MPI/horovod), and unnecessary TOR switch network contention. | 09:25 |
gibi | yepp | 09:25 |
sean-k-mooney | i mean i gues it could be used for either | 09:25 |
gibi | but there was limited time so we did not dig deep | 09:25 |
sean-k-mooney | ack | 09:25 |
sean-k-mooney | so the soft vs hard "requirement" would be filter vs weigher | 09:26 |
sean-k-mooney | but most of the logic would be the same so i would probably just do both | 09:26 |
jkulik | sean-k-mooney: I don't think I can make it in 2 weeks, but would be able to do it in the longer run | 09:26 |
sean-k-mooney | and try and share the code | 09:26 |
sean-k-mooney | jkulik: ack | 09:26 |
jkulik | custom scheduler filter would be my idea, too. but we'd need it in the server-group API anyways as having one flavor per rack really doesn't scale | 09:28 |
sean-k-mooney | not per rack | 09:29 |
sean-k-mooney | so in the flavor you would have a policy | 09:29 |
sean-k-mooney | aggregate_anti_affinity:rack | 09:29 |
sean-k-mooney | aggregate_anti_affinity:room | 09:30 |
sean-k-mooney | aggregate_anti_affinity:row | 09:30 |
jkulik | ah, but still. customers are already overwhelmed by the number of flavors as is. I don't think we can add more for this :) | 09:30 |
sean-k-mooney | ya that is why the server-group api would be better | 09:30 |
sean-k-mooney | there is a hack that you could use if you promise not to mention my name as the source :P | 09:31 |
jkulik | :D | 09:31 |
sean-k-mooney | so you could use server tags or instance metadata for now | 09:31 |
bauzas | I wonder how feasible a weigher could be | 09:31 |
bauzas | we could weigh per aggregate | 09:32 |
sean-k-mooney | so you could add an aggregate_anti_affinity_* set of server tags | 09:32 |
sean-k-mooney | and then look at those in the filter/weigher | 09:32 |
bauzas | each host getting the same weight | 09:32 |
jkulik | sean-k-mooney: that sounds wrong :D but thanks for the idea | 09:32 |
sean-k-mooney | bauzas: that is doable yes | 09:32 |
sean-k-mooney | jkulik: tags are sometimes used for this in other out-of-tree implementations like tripleo's instance ha feature | 09:33 |
jkulik | speaking of the server-group API: we've extended it to allow adding servers to server-groups after they're spawned. this can be helpful if you need to spawn a new instance anti-affine to a previously spawned one, where you didn't know that requirement yet. | 09:33 |
bauzas | I'm still confused by the weights | 09:33 |
sean-k-mooney | it's not how the api is intended to be used but still | 09:33 |
bauzas | but | 09:33 |
jkulik | is that something that has a chance upstream, if we write a spec for it? | 09:33 |
bauzas | you could pass a hint | 09:33 |
bauzas | and then have a weigher looking up at the hint | 09:34 |
sean-k-mooney | scheduler hint ya but that is also versioned | 09:34 |
sean-k-mooney | bauzas: oh for the pack vs spread policy | 09:34 |
sean-k-mooney | for weighers | 09:34 |
bauzas | yup | 09:34 |
sean-k-mooney | ya so i was thinking the weigher would just look at the metadata in the server-group policy | 09:35 |
sean-k-mooney | and ignore the multiplier sign | 09:35 |
sean-k-mooney | or we would just set min=0 for the config option | 09:35 |
bauzas | I'm not in favor of adding more to the existing server group API but... | 09:35 |
sean-k-mooney | so the multiplier for the weigher would affect only magnitude | 09:35 |
bauzas | I guess we need to agree on the use case | 09:36 |
sean-k-mooney | well server group api is preferable to scheduler hint | 09:36 |
bauzas | sean-k-mooney: from a UX perspective, I think so | 09:36 |
bauzas | but, you know how much I like our server group implementations | 09:36 |
* bauzas wonders whether we should just do this for multicreate | 09:37 | |
bauzas | like --min 2 --hint spread_my_stuff | 09:37 |
gibi | bauzas: you can look at the soft-anti-affinity weigher for reference how to do it | 09:45 |
gibi | ie how to do weigher for soft things | 09:46 |
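(A sketch of the soft variant gibi refers to, modelled loosely on the in-tree ServerGroupSoftAntiAffinityWeigher; the host-to-rack mapping is the same hypothetical one as in the filter sketch above, hard-coded here only to keep the example self-contained.)

```python
# Hypothetical out-of-tree weigher sketch for "soft" rack anti-affinity.
from nova.scheduler import weights

# Assumed host -> rack mapping; in practice this would come from aggregate
# metadata or an external source, not a hard-coded dict.
RACK_OF = {'compute-0': 'rack-1', 'compute-1': 'rack-1', 'compute-2': 'rack-2'}


class RackSoftAntiAffinityWeigher(weights.BaseHostWeigher):
    def _weigh_object(self, host_state, weight_properties):
        group = weight_properties.instance_group
        if not group or not group.hosts:
            return 0.0
        used = {RACK_OF.get(h) for h in group.hosts}
        # penalise hosts whose rack already holds a group member; the
        # configured multiplier then only scales the magnitude
        return -1.0 if RACK_OF.get(host_state.host) in used else 0.0
```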
gibi | jkulik: adding an existing instance to a server group needs a decision between a) allow adding an instance only if the policy is valid, b) allow adding an instance even if the policy is not valid yet but will be at the next move | 09:47 |
gibi | jkulik: I think in the past we were not able to agree on which way we should go | 09:47 |
gibi | but you can try again | 09:47 |
jkulik | we've opted for a) ... | 09:53 |
jkulik | where would be the right point to discuss this? | 09:53 |
sean-k-mooney | https://github.com/openstack/nova-specs/tree/master/specs | 09:54 |
sean-k-mooney | sorry my wifi dropped so missed the last few minutes | 09:54 |
sean-k-mooney | so not sure what a) is | 09:54 |
sean-k-mooney | but if you want to create a spec to discuss the design upstream copy https://github.com/openstack/nova-specs/blob/master/specs/zed-template.rst | 09:55 |
sean-k-mooney | and fill it in and we can discuss on the spec and on irc | 09:55 |
sean-k-mooney[m] | ah so looking at my matrix client i see i did not miss much | 09:56 |
sean-k-mooney[m] | i dont like multi create | 09:56 |
sean-k-mooney[m] | so i don't think we should add this there | 09:57 |
* sean-k-mooney switch back to irc | 09:57 | |
sean-k-mooney | gibi: correct we did not come to an agreement on how to handle the case where the instance did not comply with the policy | 09:58 |
sean-k-mooney | i did not like the idea of the add operation implicitly live migrating the instance | 09:58 |
sean-k-mooney | we have basically 3 options: 1 reject the request if the policy would be violated, 2 accept but automatically trigger a move operation to reconcile the state, 3 allow it to be invalid and return some kind of warning and leave it to the end user to reconcile the state with a move operation | 10:00 |
sean-k-mooney | at a later date | 10:00 |
jkulik | is moving an instance allowed for normal users? | 10:01 |
sean-k-mooney | technically yes but only via a resize or shelve | 10:01 |
sean-k-mooney | live migrate and cold migrate are admin only | 10:01 |
gibi | sean-k-mooney: yeah, from those options I would go with either reject the add if violates the policy, or accept it but only warn (or extend the server group api to show if the policy is invalid), but I definitely don't want to trigger a move automatically | 10:11 |
sean-k-mooney | jkulik: gibi this was the previous spec on the topic https://review.opendev.org/c/openstack/nova-specs/+/782353 | 10:26 |
jkulik | sean-k-mooney: thanks. that will be helpful | 10:28 |
jkulik | oh, that's proposed as a server-action | 10:29 |
jkulik | fyi, this is how we built it downstream https://github.com/sapcc/nova/commit/7220be3968ee1dd257c9add88228cc5bb9857795 (+ some commits afterwards to fix certain stuff) | 10:33 |
sean-k-mooney | i see that still has the same problem | 10:41 |
frickler | does that ring a bell for someone? "nova.exception.InternalError: Unexpected vif_type=unbound" unstable failure in OSC jobs, shows as JSONDecodeError https://zuul.opendev.org/t/openstack/build/181d8177eab5428a82facc4d95ce00e2 | 10:41 |
sean-k-mooney | vif_type unbound is what the neutron interface has before you set the host-id | 10:42 |
sean-k-mooney | frickler: so openstackclient.tests.functional.compute.v2.test_server.ServerTests.test_server_attach_detach_floating_ip might be racing | 10:43 |
sean-k-mooney | with the server boot | 10:43 |
sean-k-mooney | if it has not finished booting when you try to attach the floating ip then you would get that issue i guess | 10:43 |
frickler | oh, so yet another set of tests needing wait-for-ssh things | 10:43 |
sean-k-mooney | maybe, i have not looked at the test yet | 10:44 |
sean-k-mooney | it's not necessarily sshable | 10:44 |
sean-k-mooney | it would need to be active | 10:44 |
sean-k-mooney | so this is the test https://github.com/openstack/python-openstackclient/blob/master/openstackclient/tests/functional/compute/v2/test_server.py#L339= | 10:45 |
sean-k-mooney | it should be waiting for active | 10:45 |
sean-k-mooney | it looks like the error is coming from self.server_create | 10:45 |
sean-k-mooney | ah from here https://github.com/openstack/python-openstackclient/blob/20e7b01af8f0fb4cf0f4af253270ad470926ba4e/openstackclient/tests/functional/compute/v2/common.py#L89= | 10:46 |
sean-k-mooney | frickler: so it's assuming that a value will be populated in the output i guess | 10:46 |
sean-k-mooney | from the trace it looks like it got an empty response or something like that | 10:47 |
sean-k-mooney | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) | 10:47 |
sean-k-mooney | frickler: without a request id it might not be easy to see what the api returned | 10:48 |
sean-k-mooney | 136.246758s that seemed to take a very long time | 10:49 |
sean-k-mooney | is there a timeout for wait? | 10:49 |
frickler | neutron says "Concurrent port binding operations failed on port 5b1348e9-4048-4b39-a1df-1161a798052e" before nova fails, so more likely an issue on their side | 11:15 |
sean-k-mooney | concurrent? | 11:19 |
sean-k-mooney | how | 11:19 |
sean-k-mooney | that could only happen if we failed, rescheduled and tried to bind again | 11:19 |
sean-k-mooney | but they were still binding the port to the first host | 11:19 |
sean-k-mooney | so i guess yes that would be a neutron issue | 11:20 |
sean-k-mooney | that is not, as far as i am aware, one of the exceptions they are allowed to raise at the api level | 11:20 |
frickler | that's just a warning in the q-svc log and iiuc they retry and succeed after that. but it may be that they send a notification to nova about the first attempt anyway | 11:42 |
sean-k-mooney | maybe, i know ralonsoh has a wip patch for something else that i don't think will actually help but a race was mentioned in context to that | 11:43 |
sean-k-mooney | frickler: https://review.opendev.org/c/openstack/neutron/+/846422/3 | 11:44 |
sean-k-mooney | https://bugs.launchpad.net/neutron/+bug/1979072 | 11:45 |
sean-k-mooney | that should actually be fixed in nova | 11:45 |
ralonsoh | sean-k-mooney, right, this is just a WIP patch | 11:45 |
ralonsoh | trying to address an issue that is on the Neutron side | 11:45 |
sean-k-mooney | right, please don't | 11:46 |
ralonsoh | however that doesn't address the problem of "concurrent port binding" | 11:46 |
sean-k-mooney | at least not with a periodic | 11:46 |
ralonsoh | don't what? | 11:46 |
sean-k-mooney | try and fix it from neutron | 11:46 |
ralonsoh | ok but the problem is anytime we have this issue, the bug is a Neutron bug | 11:46 |
sean-k-mooney | we need to fix this from the nova side to avoid possible races between nova and neutron | 11:46 |
sean-k-mooney | right so there are two ways to fix this in nova. 1 make sure we delete the inactive port binding when we revert | 11:47 |
sean-k-mooney | i think we try that already today but it can fail | 11:47 |
sean-k-mooney | second when we live migrate or try to create a port binding | 11:47 |
sean-k-mooney | and it already exists, delete and recreate | 11:47 |
sean-k-mooney | that will prevent this from breaking in the future | 11:48 |
frickler | the failure in osc isn't related to migration. it happens on initial server create. so I don't understand what could cause the duplicate there | 11:48 |
sean-k-mooney | if the initial port create failed and we reschedule then we will try binding it to a second host | 11:48 |
opendevreview | Sergii Golovatiuk proposed openstack/nova master: Replace "db archive" with "db archive_deleted_raws" https://review.opendev.org/c/openstack/nova/+/847963 | 11:48 |
sean-k-mooney | we won't create a second binding | 11:48 |
sean-k-mooney | we will just update the host-id | 11:48 |
sean-k-mooney | but if neutron is still binding it from the first failed attempt we would get a concurrent error | 11:49 |
sean-k-mooney | frickler: did you check the logs to see if the vm was retried on a second host? | 11:49 |
frickler | sean-k-mooney: it is a single-node job, I would be surprised if that happened | 11:51 |
sean-k-mooney | frickler: ok well the only other thing i can think of is the client retry | 11:53 |
sean-k-mooney | ralonsoh: could this happen if nova retried creating the port binding because the initial call timed out | 11:54 |
sean-k-mooney | i think we try 3 times | 11:54 |
sean-k-mooney | this being "Concurrent port binding operations failed on port ..." | 11:54 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add regression test for bug 1838392 https://review.opendev.org/c/openstack/nova/+/847965 | 12:19 |
*** dasm|off is now known as dasm | 13:04 | |
bauzas | gibi: sean-k-mooney: jkulik: fwiw, this is not new meat https://review.opendev.org/c/openstack/nova-specs/+/130005 | 13:08 |
sean-k-mooney | bauzas: yep i know | 13:08 |
bauzas | sure, but I provided the existing context | 13:08 |
sean-k-mooney | there are other specs more recent than that | 13:09 |
bauzas | just look at the gerrit comments | 13:09 |
sean-k-mooney | but ya we have rejected dynamic server groups and adding other affinity policies in the past | 13:09 |
bauzas | yes and no, this is just the fact that the spec was split https://review.opendev.org/c/openstack/nova-specs/+/139272 and https://review.opendev.org/c/openstack/nova-specs/+/136487 | 13:10 |
bauzas | look again about the comments | 13:10 |
bauzas | sean-k-mooney: not for you, but rather jkulik | 13:10 |
jkulik | bauzas: thanks, will take a look | 13:11 |
sean-k-mooney | bauzas: ack | 13:11 |
bauzas | jkulik: there was by then a try to have a new API instead of server groups https://review.opendev.org/c/openstack/nova-specs/+/183837/4/specs/liberty/approved/generic-scheduling-policies.rst | 13:13 |
sean-k-mooney | bauzas: im not sure how that would help | 13:13 |
sean-k-mooney | the issue with server groups is adding an instance that violates the policy | 13:14 |
bauzas | just saying this is a can of worms | 13:14 |
sean-k-mooney | it is | 13:14 |
bauzas | sean-k-mooney: the problem with server groups is that if we touch it, it creates more races than the ones it fixes | 13:14 |
sean-k-mooney | but that is a separate topic from the original thing that jkulik raised | 13:14 |
bauzas | sean-k-mooney: well, this is all about colocality | 13:15 |
bauzas | we express this in Nova with server groups | 13:15 |
sean-k-mooney | well affinity and anti-affinity | 13:15 |
sean-k-mooney | with different granularity | 13:15 |
sean-k-mooney | but yes | 13:15 |
bauzas | but the question remains about the best UX we may have | 13:15 |
bauzas | anyway, me goes back at bug scrub | 13:18 |
Uggla | question, from the api, I try to check that scheduling is impossible. To do that I look at 'No valid host found for unshelve instance' in the log. My test seems to work, but I have to wait before checking the logs. Is there a proper way to do that? | 14:29 |
Uggla | bauzas, gibi ^ | 14:30 |
sean-k-mooney | using notifications | 14:33 |
sean-k-mooney | but in general we dont tend to use logs in tests | 14:33 |
sean-k-mooney | we do sometimes but there are often better ways to do that | 14:33 |
Uggla | is _wait_for_action_fail_completion a possible option? | 14:33 |
gibi | Uggla: you wait for the server to go to ERROR state then you can check the fault in the server to see if it is a no valid host | 14:43 |
gibi | give me a sec and I will find an example | 14:43 |
*** diablo_rojo is now known as Guest3525 | 14:43 | |
gibi | Uggla: for example https://github.com/openstack/nova/blob/c53ec4e48884235566962bc934cbf292ad5b67b8/nova/tests/functional/test_servers.py#L4100-L4108 | 14:45 |
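(A minimal sketch of the pattern gibi points at, assuming the usual integrated_helpers helpers; the shelve/unshelve trigger and the exact fault wording are placeholders to adapt to the real test.)

```python
# Sketch of a functional test asserting a NoValidHost failure via the server
# fault instead of log scraping. Helper names (_create_server,
# _shelve_server, _wait_for_state_change) are the integrated_helpers ones;
# adapt if the local test base class differs.
def test_unshelve_no_valid_host(self):
    server = self._create_server(networks='none')
    self._shelve_server(server)
    # make scheduling impossible before unshelving, e.g. by disabling the
    # only compute service (omitted here)
    self.api.post_server_action(server['id'], {'unshelve': None})
    server = self._wait_for_state_change(server, 'ERROR')
    self.assertIn('No valid host', server['fault']['message'])
```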
Uggla | gibi, ok probably better than what I have just done using wait_for_assert(). | 14:45 |
bauzas | Uggla: not sure I understand your question | 15:07 |
bauzas | you're asking how nova-api could know there is a scheduling error ? | 15:07 |
Uggla | bauzas, no worries Gibi is helping me right now. | 15:07 |
bauzas | oh, for testing | 15:08 |
sean-k-mooney | yep testing | 15:08 |
sean-k-mooney | presumably functional testing | 15:08 |
sean-k-mooney | rather than unit | 15:08 |
gibi | yepp func testing | 15:11 |
gibi | Uggla just found a limitation of our nova.tests.functional.integrated_helpers.InstanceHelperMixin._wait_for_instance_action_event assert | 15:11 |
gibi | it always checks the first action of a given type from the list of instance actions | 15:12 |
gibi | and he had two unshelve actions in the test case | 15:12 |
gibi | and the assert only checked the first | 15:12 |
gibi | even though the second was in error state | 15:12 |
bauzas | reminder: nova meeting in 30 mins | 15:29 |
sean-k-mooney | oh fun | 15:31 |
sean-k-mooney | i was not aware we had that limitation | 15:31 |
gibi | me neither, but now Uggla can improve on that :) | 15:40 |
Uggla | gibi, sean-k-mooney , currently not completely sure but no, the code looks ok. It is more the event I'm looking for which is not the right one. | 15:42 |
gibi | it can be that the listing of the instance actions is not stable so sometimes the code finds the proper unshelve action, sometimes not | 15:43 |
gibi | I mean the sorting is not stable | 15:48 |
*** diablo_rojo__ is now known as diablo_rojo | 15:54 | |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jun 28 16:00:11 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hello everyone | 16:00 |
bauzas | mmmm | 16:01 |
bauzas | heyhoooooooo | 16:01 |
* bauzas awaits the echoing | 16:01 | |
gibi | o/ | 16:02 |
bauzas | hah, I hear some sound | 16:02 |
gibi | crickets ? | 16:02 |
bauzas | I'm maybe in a cave | 16:02 |
bauzas | gibi: not sure we have quorum for today's meeting :( | 16:03 |
elodilles | o/ | 16:03 |
gibi | sean-k-mooney already gone for today | 16:03 |
gibi | I guess melwitt is still on PTO | 16:03 |
bauzas | hah | 16:03 |
bauzas | yes indeed + artom | 16:04 |
bauzas | what would you want ? | 16:04 |
gibi | do we have somebody here for today's nova meeting to talk about specific things? (if not then we can close this) | 16:05 |
bauzas | we still have one critical bug | 16:05 |
elodilles | no special news from the stable point of view, so - | 16:05 |
bauzas | I can be the bug baton owner for the next week | 16:06 |
bauzas | that said, next week we will have a spec review day | 16:06 |
bauzas | I'll email it | 16:06 |
bauzas | that's it for me | 16:06 |
gibi | nothing from me I spent most of the week downstream | 16:07 |
bauzas | #info Next bug baton is still for bauzas | 16:07 |
bauzas | #info One Critical bug | 16:07 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 8 new untriaged bugs (-4 since the last meeting) | 16:07 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 27 open stories (+1 since the last meeting) in Storyboard for Placement | 16:07 |
bauzas | #link https://storyboard.openstack.org/#!/story/2010108 new Placement bug | 16:07 |
bauzas | but I'll tell it again next week | 16:08 |
bauzas | #info Spec review day on July 5th | 16:08 |
bauzas | that's basically it | 16:08 |
gibi | I guess in the critical bug we keep the job non-voting while waiting for the fix to be released in centos stream 9 | 16:08 |
bauzas | gibi: yeah, we'll discuss about it next week | 16:08 |
gibi | ok | 16:08 |
bauzas | I have a concern about the centos 9 stream job | 16:09 |
gibi | then I think we can close early today | 16:09 |
bauzas | ok, then | 16:09 |
bauzas | #agreed given we don't have quorum for this meeting, let's punt it for this week until next week | 16:09 |
bauzas | #info remember we'll have a spec review day next week | 16:09 |
bauzas | that's it, thanks | 16:10 |
bauzas | #endmeetingh | 16:10 |
bauzas | meh | 16:10 |
bauzas | #endmeeting | 16:10 |
opendevmeet | Meeting ended Tue Jun 28 16:10:14 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:10 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2022/nova.2022-06-28-16.00.html | 16:10 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-06-28-16.00.txt | 16:10 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2022/nova.2022-06-28-16.00.log.html | 16:10 |
bauzas | even | 16:10 |
gibi | have a nice evening folks | 16:11 |
elodilles | you too o/ | 16:11 |
Uggla | bauzas, already done ! | 16:15 |
bauzas | Uggla: yeah we didn't have a quorum | 16:15 |
Uggla | oh ok. | 16:16 |
opendevreview | Merged openstack/osc-placement master: Replace deprecated assertRaisesRegexp https://review.opendev.org/c/openstack/osc-placement/+/817365 | 16:27 |
bauzas | gibi: sean-k-mooney: Uggla: others: I forgot to tell you I'll be on PTO tomorrow | 16:55 |
sean-k-mooney[m] | ok | 16:57 |
*** diablo_rojo is now known as Guest3544 | 17:40 | |
*** dasm is now known as dasm|afk | 19:34 | |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Remove system scope from all APIs https://review.opendev.org/c/openstack/nova/+/848021 | 19:47 |
dansmith | gmann: ^ passes unit tests locally for me, we'll see what happens in functional | 19:47 |
dansmith | also, side note | 19:47 |
dansmith | I'm f**king tired of policy stuff | 19:48 |
gmann | dansmith: thanks for that. I was also fixing some unit tests but functional test will be good i think. | 19:49 |
gmann | dansmith: agree, same here on policy stuff. | 19:49 |
dansmith | ack, it would be GREAT if I don't have to mess with functional failures | 19:49 |
*** mfo is now known as Guest3572 | 22:51 | |
*** mfo_ is now known as mfo | 22:51 |