*** igordc has quit IRC | 00:13 | |
*** sapd1 has joined #openstack-nova | 00:30 | |
*** awalende has joined #openstack-nova | 00:31 | |
*** mlavalle has quit IRC | 00:34 | |
*** awalende has quit IRC | 00:36 | |
*** jcosmao has left #openstack-nova | 00:36 | |
*** tbachman has quit IRC | 00:46 | |
*** tbachman has joined #openstack-nova | 00:51 | |
*** zhanglong has joined #openstack-nova | 01:02 | |
*** sapd1 has quit IRC | 01:02 | |
*** gyee has quit IRC | 01:09 | |
*** Liang__ has joined #openstack-nova | 01:12 | |
openstackgerrit | Merged openstack/nova master: Tie requester_id to RequestGroup suffix https://review.opendev.org/696946 | 01:13 |
---|---|---|
*** jmlowe has joined #openstack-nova | 01:32 | |
*** zhanglong has quit IRC | 01:36 | |
*** mmethot has quit IRC | 01:36 | |
*** zhanglong has joined #openstack-nova | 01:38 | |
*** mdbooth has quit IRC | 01:44 | |
*** mdbooth has joined #openstack-nova | 01:46 | |
*** mmethot has joined #openstack-nova | 01:47 | |
*** larainema has joined #openstack-nova | 01:49 | |
*** rcernin has quit IRC | 01:53 | |
*** dave-mccowan has joined #openstack-nova | 02:00 | |
*** jmlowe has quit IRC | 02:10 | |
*** lvbin02 has joined #openstack-nova | 02:18 | |
*** lvbin01 has quit IRC | 02:21 | |
*** lvbin02 has quit IRC | 02:21 | |
*** lvbin01 has joined #openstack-nova | 02:22 | |
openstackgerrit | jichenjc proposed openstack/nova master: libvirt: avoid cpu check at s390x arch https://review.opendev.org/696228 | 02:30 |
*** awalende has joined #openstack-nova | 02:32 | |
*** awalende has quit IRC | 02:37 | |
*** nweinber__ has joined #openstack-nova | 02:44 | |
openstackgerrit | hutianhao27 proposed openstack/nova master: Revert "nova shared storage: rbd is always shared storage" https://review.opendev.org/682523 | 02:45 |
*** rcernin has joined #openstack-nova | 02:54 | |
*** dave-mccowan has quit IRC | 02:55 | |
*** zhanglong has quit IRC | 03:07 | |
*** nweinber__ has quit IRC | 03:18 | |
*** munimeha1 has quit IRC | 03:28 | |
*** mkrai has joined #openstack-nova | 03:28 | |
*** factor has joined #openstack-nova | 03:31 | |
*** nweinber__ has joined #openstack-nova | 03:36 | |
*** psachin has joined #openstack-nova | 03:41 | |
*** zhanglong has joined #openstack-nova | 03:54 | |
openstackgerrit | hutianhao27 proposed openstack/nova master: Revert "nova shared storage: rbd is always shared storage" https://review.opendev.org/682523 | 03:55 |
*** tetsuro has quit IRC | 03:58 | |
*** tetsuro has joined #openstack-nova | 03:59 | |
*** ociuhandu has joined #openstack-nova | 04:04 | |
*** nweinber__ has quit IRC | 04:06 | |
*** ociuhandu has quit IRC | 04:08 | |
*** awalende has joined #openstack-nova | 04:10 | |
*** udesale has joined #openstack-nova | 04:13 | |
*** awalende has quit IRC | 04:15 | |
*** bhagyashris has joined #openstack-nova | 04:20 | |
*** zhanglong has quit IRC | 04:46 | |
*** dansmith has quit IRC | 05:10 | |
*** dansmith has joined #openstack-nova | 05:14 | |
*** tetsuro has quit IRC | 05:28 | |
*** tetsuro has joined #openstack-nova | 05:31 | |
*** shilpasd has quit IRC | 05:35 | |
*** tetsuro has quit IRC | 06:04 | |
*** tetsuro has joined #openstack-nova | 06:09 | |
*** pcaruana has joined #openstack-nova | 06:09 | |
*** tbachman has quit IRC | 06:19 | |
*** tbachman has joined #openstack-nova | 06:21 | |
openstackgerrit | Merged openstack/nova master: Switch to uses_virtio to enable iommu driver for AMD SEV https://review.opendev.org/696697 | 06:25 |
openstackgerrit | Merged openstack/nova master: Also enable iommu for virtio controllers and video in libvirt https://review.opendev.org/684825 | 06:25 |
*** avolkov has joined #openstack-nova | 06:39 | |
*** dpawlik has joined #openstack-nova | 07:00 | |
*** damien_r has quit IRC | 07:03 | |
*** dpawlik has quit IRC | 07:04 | |
*** dpawlik has joined #openstack-nova | 07:07 | |
*** tosky has joined #openstack-nova | 07:09 | |
*** lpetrut has joined #openstack-nova | 07:18 | |
openstackgerrit | Shilpa Devharakar proposed openstack/nova master: Handle new is_volume_backend join column query https://review.opendev.org/694462 | 07:18 |
openstackgerrit | Shilpa Devharakar proposed openstack/nova master: Instance object changes for the new 'is_volume_backed' expected_attr https://review.opendev.org/694463 | 07:18 |
openstackgerrit | Shilpa Devharakar proposed openstack/nova master: Ignore root_gb if instance is booted from volume https://review.opendev.org/612626 | 07:18 |
*** damien_r has joined #openstack-nova | 07:27 | |
*** zainub_wahid has joined #openstack-nova | 07:30 | |
*** gibi_off is now known as gibi | 07:44 | |
* gibi is back | 07:46 | |
gibi | reading the notifications from while I was away, I feel like IRC's away message system doesn't work as intended: I noted there that I'd be back on Wednesday, but it seems this info was unknown to the folks on the channel | 07:47 |
*** slaweq has joined #openstack-nova | 07:47 | |
*** bhagyashris has quit IRC | 07:54 | |
*** belmoreira has joined #openstack-nova | 07:56 | |
*** damien_r has quit IRC | 07:57 | |
*** tesseract has joined #openstack-nova | 07:57 | |
*** ociuhandu has joined #openstack-nova | 08:05 | |
*** ociuhandu has quit IRC | 08:05 | |
*** maciejjozefczyk has joined #openstack-nova | 08:06 | |
*** bhagyashris has joined #openstack-nova | 08:06 | |
*** ociuhandu has joined #openstack-nova | 08:07 | |
openstackgerrit | Eric Xie proposed openstack/nova master: Report trait 'COMPUTE_IMAGE_TYPE_PLOOP' https://review.opendev.org/698132 | 08:10 |
*** tkajinam has quit IRC | 08:12 | |
*** ccamacho has joined #openstack-nova | 08:13 | |
*** awalende has joined #openstack-nova | 08:20 | |
*** damien_r has joined #openstack-nova | 08:21 | |
*** pcaruana has quit IRC | 08:22 | |
*** ociuhandu has quit IRC | 08:30 | |
*** bhagyashris has quit IRC | 08:30 | |
*** rpittau|afk is now known as rpittau | 08:36 | |
*** links has joined #openstack-nova | 08:44 | |
*** aloga has quit IRC | 08:45 | |
openstackgerrit | Guo Jingyu proposed openstack/nova master: Make scheduling more debuggable https://review.opendev.org/698421 | 08:45 |
*** aloga has joined #openstack-nova | 08:45 | |
*** ccamacho is now known as ccamacho|pto | 08:50 | |
*** ociuhandu has joined #openstack-nova | 08:54 | |
*** ralonsoh has joined #openstack-nova | 08:55 | |
*** awalende has quit IRC | 08:59 | |
*** ociuhandu has quit IRC | 08:59 | |
openstackgerrit | Guo Jingyu proposed openstack/nova master: Make scheduling more debuggable https://review.opendev.org/698421 | 08:59 |
*** ociuhandu has joined #openstack-nova | 09:00 | |
*** awalende has joined #openstack-nova | 09:01 | |
*** bhagyashris has joined #openstack-nova | 09:01 | |
*** iurygregory has joined #openstack-nova | 09:02 | |
openstackgerrit | Guo Jingyu proposed openstack/nova master: Make scheduling more debuggable https://review.opendev.org/698421 | 09:03 |
*** jangutter has quit IRC | 09:03 | |
*** jangutter has joined #openstack-nova | 09:04 | |
*** ociuhandu has quit IRC | 09:05 | |
*** zhanglong has joined #openstack-nova | 09:08 | |
*** ociuhandu has joined #openstack-nova | 09:27 | |
*** udesale has quit IRC | 09:29 | |
*** udesale has joined #openstack-nova | 09:32 | |
*** martinkennelly has joined #openstack-nova | 09:32 | |
*** jangutter_ has joined #openstack-nova | 09:36 | |
*** zainub_wahid has quit IRC | 09:37 | |
*** dpawlik has quit IRC | 09:38 | |
*** ccamacho|pto has quit IRC | 09:38 | |
*** ccamacho has joined #openstack-nova | 09:38 | |
*** jangutter has quit IRC | 09:39 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle multiple bindings https://review.opendev.org/696246 | 09:42 |
*** derekh has joined #openstack-nova | 09:43 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Do not mock setup net and migrate inst in NeutronFixture https://review.opendev.org/696247 | 09:43 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Move _get_request_group_mapping() to RequestSpec https://review.opendev.org/696541 | 09:45 |
*** abaindur has quit IRC | 09:46 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Move _update_pci_request_spec_with_allocated_interface_name https://review.opendev.org/696574 | 09:46 |
*** factor has quit IRC | 09:46 | |
*** factor has joined #openstack-nova | 09:47 | |
*** ttsiouts has joined #openstack-nova | 09:47 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support live migration with qos ports https://review.opendev.org/695905 | 09:48 |
*** rcernin has quit IRC | 09:50 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle multiple bindings https://review.opendev.org/696246 | 09:53 |
*** farhanjamil has joined #openstack-nova | 09:54 | |
*** ccamacho is now known as ccamacho|pto | 09:54 | |
*** dpawlik has joined #openstack-nova | 09:54 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Do not mock setup net and migrate inst in NeutronFixture https://review.opendev.org/696247 | 09:55 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Move _get_request_group_mapping() to RequestSpec https://review.opendev.org/696541 | 09:56 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Move _update_pci_request_spec_with_allocated_interface_name https://review.opendev.org/696574 | 09:58 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support live migration with qos ports https://review.opendev.org/695905 | 09:59 |
*** dtantsur|afk is now known as dtantsur | 10:06 | |
*** farhanjamil has quit IRC | 10:07 | |
*** vishalmanchanda has quit IRC | 10:09 | |
*** vishalmanchanda has joined #openstack-nova | 10:10 | |
*** Liang__ has quit IRC | 10:17 | |
*** dpawlik has quit IRC | 10:19 | |
*** dpawlik has joined #openstack-nova | 10:22 | |
*** zhanglong has quit IRC | 10:25 | |
*** salmankhan has joined #openstack-nova | 10:31 | |
*** salmankhan has joined #openstack-nova | 10:32 | |
*** belmoreira has quit IRC | 10:44 | |
*** belmoreira has joined #openstack-nova | 10:45 | |
*** belmoreira has quit IRC | 10:49 | |
*** bhagyashris has quit IRC | 10:53 | |
*** mkrai_ has joined #openstack-nova | 10:54 | |
*** mkrai has quit IRC | 10:57 | |
*** mkrai_ has quit IRC | 10:59 | |
*** mkrai has joined #openstack-nova | 10:59 | |
*** udesale has quit IRC | 11:04 | |
*** mkrai has quit IRC | 11:04 | |
*** bhagyashris has joined #openstack-nova | 11:07 | |
*** awalende has quit IRC | 11:20 | |
*** awalende has joined #openstack-nova | 11:21 | |
*** dpawlik has quit IRC | 11:21 | |
*** awalende_ has joined #openstack-nova | 11:23 | |
*** ttsiouts has quit IRC | 11:24 | |
*** awalende has quit IRC | 11:25 | |
*** awalende_ has quit IRC | 11:27 | |
*** arxcruz is now known as arxcruz|pto | 11:32 | |
*** udesale has joined #openstack-nova | 11:35 | |
*** pcaruana has joined #openstack-nova | 11:36 | |
*** tbachman has quit IRC | 11:46 | |
*** ttsiouts has joined #openstack-nova | 11:52 | |
sean-k-mooney | gibi: ya we did not get any away notifications | 11:53 |
sean-k-mooney | did you perhaps set yourself as away before changing your nick or something | 11:53 |
gibi | sean-k-mooney: used /away <message> in my irssi but then I will not try to rely on that in the future | 11:54 |
sean-k-mooney | that is what i use but with weechat and it works fine | 11:54 |
sean-k-mooney | if you ping me now you should get a message | 11:54 |
gibi | yeah could be an ordering issue | 11:55 |
*** sean-k-mooney is now known as skm | 11:55 | |
skm | did you get a message before | 11:55 |
*** skm is now known as sean-k-mooney | 11:55 | |
sean-k-mooney | oh you didnt use my name | 11:56 |
sean-k-mooney | gibi: anyway shall i proceed with keeping the notifications for now. | 11:57 |
gibi | sean-k-mooney: I think you should add your new image meta to the payload | 11:58 |
gibi | just to follow the pattern | 11:58 |
sean-k-mooney | but i am the only person that has done that since the notifications were added | 11:59 |
sean-k-mooney | and we have added lots of image properties since then | 11:59 |
sean-k-mooney | i also added the 1.1 version bump | 11:59 |
*** brinzhang has joined #openstack-nova | 12:00 | |
*** dpawlik has joined #openstack-nova | 12:00 | |
sean-k-mooney | so i can set and follow a new pattern but if i was to follow the existing one i would ignore it | 12:00 |
gibi | I opened the original patch that introduced the payload class, and at that time it contained everything, so the intention is clear to me based on that | 12:00 |
sean-k-mooney | yep which is why i added the property but we have not been doing that | 12:01 |
*** larainema has quit IRC | 12:01 | |
*** awalende has joined #openstack-nova | 12:01 | |
*** brinzhang has quit IRC | 12:02 | |
sean-k-mooney | i can file a bug to bring them in sync and work on a patch to do that if that is what we want to do going forward | 12:02 |
gibi | honestly I would not block your patch on this either way. It is pretty clear to me know that there are no active notification consumers out there that are willing to give us feedback on what they need | 12:02 |
gibi | s/know/now/ | 12:02 |
*** brinzhang has joined #openstack-nova | 12:02 | |
gibi | so I won't be hard with rules | 12:02 |
*** brinzhang has quit IRC | 12:03 | |
sean-k-mooney | well it's more efried that was interested since he was adding image properties too and he was wondering if he should be doing the same | 12:03 |
*** brinzhang has joined #openstack-nova | 12:03 | |
sean-k-mooney | i was wondering if we should document the policy if there was one somewhere so that we can do the right thing | 12:04 |
*** brinzhang has quit IRC | 12:04 | |
*** spsurya has joined #openstack-nova | 12:05 | |
*** dpawlik has quit IRC | 12:05 | |
gibi | so we need to 1) decide what is the policy 2) make sure we can somehow enforce the policy | 12:06 |
gibi | for 1) the original policy was to mirror image meta but then later patches did not follow that policy | 12:06 |
sean-k-mooney | well we can just compare the fields dict of both ovos | 12:06 |
sean-k-mooney | they should always be in sync right | 12:07 |
*** brinzhang has joined #openstack-nova | 12:07 | |
*** brinzhang has quit IRC | 12:07 | |
gibi | then we need a separate patch to re-sync the two | 12:07 |
sean-k-mooney | yes, i commented on the bug that i was not sure why it needed to be a separate object vs the nova.object.ImageMetaProps | 12:08 |
*** brinzhang has joined #openstack-nova | 12:08 | |
sean-k-mooney | *patch | 12:08 |
sean-k-mooney | was there a reason you made the payload object separate? | 12:08 |
*** brinzhang has quit IRC | 12:09 | |
gibi | in general we wanted to have separate objects for notification payloads so that we are not forced to push every internal data out in the notification. But for this particular case, if we decide that we always want to push every image meta in the notification, then sure, in this case we don't need a separate payload class | 12:09 |
*** brinzhang has joined #openstack-nova | 12:11 | |
sean-k-mooney | well i guess that is the issue right. having a separate class made the notification update optional since it would not fail tests if you forgot. | 12:11 |
*** brinzhang has quit IRC | 12:11 | |
*** brinzhang has joined #openstack-nova | 12:12 | |
sean-k-mooney | ok so i'll leave the notification updates in the patch for now assuming efried is ok with that and i'll file a bug for the fact they are not in sync. | 12:12 |
gibi | sean-k-mooney: ack, it works for me | 12:12 |
gibi | and thanks for taking care of it | 12:12 |
sean-k-mooney | then we can decide if we want to fix that by extending the new object and adding the test for the fields or if we want to just use one object in this case. sound good? | 12:13 |
*** brinzhang_ has joined #openstack-nova | 12:16 | |
*** nicolasbock has joined #openstack-nova | 12:18 | |
*** brinzhang has quit IRC | 12:19 | |
*** dpawlik has joined #openstack-nova | 12:20 | |
*** dpawlik has quit IRC | 12:25 | |
*** Liang__ has joined #openstack-nova | 12:27 | |
*** elod has quit IRC | 12:28 | |
*** dpawlik has joined #openstack-nova | 12:33 | |
*** ociuhandu has quit IRC | 12:37 | |
*** jangutter_ is now known as jangutter | 12:41 | |
*** parlos has joined #openstack-nova | 12:50 | |
*** udesale has quit IRC | 12:50 | |
*** elod has joined #openstack-nova | 12:51 | |
*** ociuhandu has joined #openstack-nova | 13:03 | |
*** CeeMac has quit IRC | 13:08 | |
*** brinzhang has joined #openstack-nova | 13:10 | |
*** tbachman has joined #openstack-nova | 13:11 | |
*** mmethot has quit IRC | 13:13 | |
*** tbachman_ has joined #openstack-nova | 13:16 | |
*** tbachman has quit IRC | 13:16 | |
*** tbachman_ is now known as tbachman | 13:16 | |
*** mmethot has joined #openstack-nova | 13:17 | |
*** brinzhang_ has quit IRC | 13:33 | |
*** damien_r has quit IRC | 13:33 | |
openstackgerrit | Guo Jingyu proposed openstack/nova master: Make scheduling more debuggable https://review.opendev.org/698421 | 13:53 |
*** Liang__ is now known as LiangFang | 13:54 | |
*** mriedem has joined #openstack-nova | 13:59 | |
gibi | sean-k-mooney: sorry, I was pulled. Yeah, it sounds like a plan | 14:00 |
sean-k-mooney | hehe no worries | 14:01 |
*** bnemec has joined #openstack-nova | 14:01 | |
*** ociuhandu_ has joined #openstack-nova | 14:02 | |
*** mdbooth has quit IRC | 14:02 | |
*** ociuhandu has quit IRC | 14:05 | |
*** bhagyashris has quit IRC | 14:06 | |
*** kaisers has joined #openstack-nova | 14:06 | |
*** mdbooth has joined #openstack-nova | 14:08 | |
*** brinzhang has quit IRC | 14:15 | |
efried | gibi, sean-k-mooney: ack, thanks for the followup. Sounds like we'll want a big patch to sync the two objects. | 14:16 |
efried | and maybe some kind of clever test that enforces their parity. | 14:16 |
efried | for the future | 14:16 |
gibi | efried: hi! I can get behind this plan | 14:17 |
efried | This ship has probably sailed, but is there no way to use the same object for both purposes? | 14:17 |
sean-k-mooney | well the clever test is just to loop over the fields dict in the nova.object.image_meta.ImageProp object and assert they are in the notification object dict | 14:18 |
gibi | efried: if we hack the versioning then I guess we can. But I don't know if we really want to hack the versioning | 14:18 |
gibi | sean-k-mooney: yeah that could work | 14:19 |
sean-k-mooney | ill write a patch to do that when i finish updating the functional tests | 14:19 |
efried | sean-k-mooney: and vice versa | 14:20 |
efried | thanks sean-k-mooney | 14:20 |
*** ociuhandu_ has quit IRC | 14:20 | |
sean-k-mooney | well actually they are dicts so i could just assert the keys are equal | 14:21 |
sean-k-mooney | if we want the types to match i could just assert the dicts are equal to check the keys and values | 14:21 |
efried | I'd be fine just asserting they have the same keys. | 14:26 |
efried | it's really just a sniff test to make sure devs didn't miss syncing. | 14:26 |
sean-k-mooney | yep ill include it in the sync patch | 14:27 |
sean-k-mooney | also ddt makes updating the func test way simpler | 14:27 |
efried | sweet. | 14:27 |
efried | gibi: what's your vacation schedule? | 14:28 |
gibi | efried: last official day in the office is the 16th. If I cannot finish the qos live migration by then, I will add some extra time to that before the 20th | 14:29 |
gibi | then back on the 6th of Jan | 14:29 |
efried | Okay. I'll try to get some reviews in on that. I'm also going to ask for another look at the vTPM spec soon if that's okay. I need to do another update, so maybe tomorrow? | 14:30 |
mriedem | stephenfin: i've got a few questions in this one https://review.opendev.org/#/c/696509/ | 14:31 |
gibi | efried: sure, lets do it | 14:32 |
efried | I'm going to make it simpler. Basically, the bits that are really awkward or hard to explain to users, I'm just going to say "behaves like baremetal". | 14:32 |
gibi | :) | 14:32 |
mriedem | stephenfin: also, not sure how others feel about this, but a massive rename like this screws up git history https://review.opendev.org/#/c/696745/7 | 14:32 |
mriedem | and backports | 14:32 |
mriedem | so i'd rather not do that personally | 14:32 |
mriedem | dansmith: how are your feels on this ^? | 14:32 |
efried | I'm more offended by the misused apostrophe in the commit message. | 14:33 |
stephenfin | mriedem: It really should. git's automerge tooling should detect renames for us | 14:33 |
mriedem | i guess https://review.opendev.org/#/c/696745/7/nova/network/api.py isn't as terrible as i would expect | 14:33 |
stephenfin | *it really shouldn't | 14:33 |
mriedem | maybe i still have ptsd from when ed renamed all of the legacy v2 contrib api modules | 14:34 |
dansmith | mriedem: yeah, that seems unnecessary to me | 14:34 |
stephenfin | There will be merge conflicts but that's due to stuff having been moved around. Straight up file renames aren't an issue | 14:34 |
dansmith | if others really care about it, then whatever, but it seems more trouble than it's worth to me | 14:35 |
mriedem | backports related to anything touching networking stuff are going to be a nightmare for awhile anyway, but that's expected | 14:35 |
efried | so might as well shoot the whole hog? | 14:36 |
stephenfin | dansmith: Well, I really care about it :) | 14:36 |
dansmith | this definitely makes backports harder, | 14:36 |
dansmith | I'm not sure why they're hard without it | 14:36 |
efried | (hm, I've been misusing that idiom for years. It's "go the whole hog".) | 14:36 |
efried | dansmith: because of all the *other* nova-net removal churn | 14:37 |
stephenfin | mocks, for one | 14:37 |
stephenfin | we no longer need to care about mocking stuff that nova-net was doing. If you backport a test, that goes back to not being true | 14:38 |
dansmith | efried: okay but that's hard like code conflict, not hard like "this file doesn't exist in the old branch" hard | 14:38 |
efried | I get it. | 14:38 |
openstackgerrit | sean mooney proposed openstack/nova master: support pci numa affinity policies in flavor and image https://review.opendev.org/674072 | 14:39 |
mriedem | anywho, i didn't -1 it and it's later in the series, just mentioning it | 14:39 |
mriedem | someone has to take on the current big ass bottom change that i +2ed already | 14:39 |
efried | stephenfin: Given that we have nova/image/glance.py, I don't see a problem with having nova/networking/neutron.py | 14:39 |
sean-k-mooney | efried: i have not added the spy function to check the pci assignment but i think i have made all the other changes you suggested | 14:40 |
stephenfin | dansmith: but it won't - git is smart | 14:40 |
mriedem | efried: that's a good point | 14:40 |
mriedem | we also have a cinder.py | 14:40 |
efried | nova/volume/cinder.py yeah. | 14:40 |
stephenfin | dansmith: Go make a trivial modification to nova/tests/unit/test_nova_manage.py, commit on master, then backport to stable/stein | 14:40 |
efried | we should make sure Sundar names it nova/accelerator/cyborg.py and we're golden. | 14:40 |
stephenfin | it'll happen cleanly, even though I renamed that file in commit nova/tests/unit/test_nova_manage.py | 14:41 |
stephenfin | whoops, commit 463017b51b8cde48582b2f55ad7a1f2321d03d02 | 14:41 |
mriedem | actually i did just -1 | 14:41 |
mriedem | i think having the module having neutron in the name would be less confusing to someone coming to hack on nova and knowing that there was at least once a nova-network thing, and finding nova/network/api.py and wondering if it's nova-net or neutron | 14:42 |
sean-k-mooney | efried: that would imply that that code is virt driver independent. if the cyborg code in that folder was, then sure | 14:42 |
dansmith | stephenfin: aight, well, when we moved tests/ into tests/unit things were not "smart", but whatevs | 14:42 |
mriedem | since we have explicit glance.py and cinder.py modules those are pretty clear what they are for | 14:42 |
mriedem | dansmith: oh yeah the tests -> tests/unit was another big one | 14:42 |
efried | mriedem: oh, I hope we're getting rid of the API shim | 14:43 |
efried | stephenfin: ? ^ | 14:43 |
mriedem | efried: i'm fine with removing base_api.py and api.py (nova-net) | 14:43 |
mriedem | but i don't think we need to rename neutronv2.api to api.py | 14:43 |
efried | yeah, I'm fine leaving it named neutron | 14:43 |
sean-k-mooney | well on that how would people feel about eventually moving the neutron code to os-vif? | 14:44 |
mriedem | why? | 14:44 |
efried | that sounds like a conversation for a far future release | 14:44 |
*** dtantsur is now known as dtantsur|brb | 14:44 | |
sean-k-mooney | well partly to not need to have any networking code in nova | 14:44 |
mriedem | "hey let's move some already really complicated and not very well understood code out to a library with a different core team" | 14:44 |
canori01 | Is it safe to rebuild the placement database? I have an issue where all my hypervisors are running into conflicts (conflicting resource provider name) | 14:45 |
sean-k-mooney | well all nova cores are os-vif cores | 14:45 |
canori01 | Can I just empty the db and bounce nova-compute? | 14:45 |
sean-k-mooney | but anyway its just an idea | 14:45 |
mriedem | sean-k-mooney: not worth the effort | 14:45 |
mriedem | pick your battles | 14:45 |
sean-k-mooney | it's currently pluggable so i was thinking of porting it then we could swap after | 14:45 |
sean-k-mooney | but ok | 14:45 |
mriedem | canori01: the inventory will rebuild itself automatically, the consumers/allocations will not | 14:46 |
sean-k-mooney | ill drop it for now | 14:46 |
*** ociuhandu has joined #openstack-nova | 14:46 | |
mriedem | canori01: nova-manage placement heal_allocations should heal those up those if you do have to rebuild the placement db | 14:46 |
mriedem | *heal those up though | 14:47 |
*** eharney has quit IRC | 14:47 | |
sean-k-mooney | mriedem: that was added in rocky right. in queens we still had the periodic heal task from the move to placement | 14:47 |
mriedem | wrongish | 14:48 |
canori01 | mriedem: What happened is I used to have a third-party backup service that had service entries under nova. After removing them, all the hypervisors complain that an entry for them already exists and are unschedulable as a result | 14:48 |
canori01 | So my thought was to rebuild the placement db. I don't know if there's a better option | 14:49 |
*** links has quit IRC | 14:49 | |
sean-k-mooney | canori01: did you remove all the nova services | 14:49 |
canori01 | no, the nova services are still there | 14:50 |
canori01 | sean-k-mooney: would removing the service and bouncing nova-compute put things back in order? | 14:51 |
mriedem | kvm right? | 14:53 |
mriedem | the problem is placement has a unique constraint on the hostname, but the uuid on your computes has changed | 14:53 |
mriedem | the uuid on the compute_nodes table record that nova creates and uses to report the resource_providers to placement | 14:53 |
mriedem | so i think you're hitting some version of this https://bugs.launchpad.net/nova/+bug/1817833 | 14:55 |
openstack | Launchpad bug 1817833 in OpenStack Compute (nova) "Check compute_id existence when nova-compute reports info to placement" [Medium,In progress] - Assigned to Matt Riedemann (mriedem) | 14:55 |
sean-k-mooney | canori01: no. i asked because if you removed the service the uuid would change when the agent restarts but the hostname would be the same and would cause a resource provider conflict | 14:56 |
*** ociuhandu has quit IRC | 14:56 | |
mriedem | if you really need to rebuild the placement db, then i think your steps would be: | 14:57 |
mriedem | 1. backup your current placement db | 14:57 |
mriedem | 2. drop it and rebuild the schema so it's empty | 14:57 |
mriedem | 3. let the computes report their inventory in which will create resource providers on the first run | 14:57 |
mriedem | 4. run: nova-manage placement heal_allocations | 14:57 |
stephenfin | mriedem: replied on https://review.opendev.org/#/c/696509/ | 14:57 |
mriedem | 5. run: nova-manage placement sync_aggregates | 14:57 |
stephenfin | tl;dr: if I don't do that req stuff, s*** breaks, so I did enough to make it work ¯\_(ツ)_/¯ | 14:58 |
mriedem | this is also assuming you aren't using some of the more advanced features like QoS ports in neutron | 14:58 |
canori01 | mriedem: I am not yet using QoS for neutron | 14:58 |
stephenfin | mriedem: also, it's way up the stack but you should definitely look at https://review.opendev.org/#/c/696746/ since it affects your security group caching changes. I think what I did is correct | 14:59 |
*** chason has joined #openstack-nova | 14:59 | |
sean-k-mooney | canori01: it specifically would only be an issue if you were using a minimum bandwidth qos policy | 14:59 |
mriedem | jesus that is a big change | 14:59 |
sean-k-mooney | canori01: the other qos policies do not interact with placement | 14:59 |
sean-k-mooney | canori01: are you using routed networks out of interest. e.g. calico | 15:00 |
canori01 | sean-k-mooney: So if I sync the uuid on the database to match the uuid of my existing service entries, that would also solve the issue? | 15:00 |
canori01 | For example, nova service-list has: | 15:00 |
canori01 | 2c1037b3-4977-4a13-aea8-700a805cc11c | nova-compute | bctlz7nova36 | 15:00 |
sean-k-mooney | canori01: that is easier said than done as you would also have to consider existing allocations but in principle yes | 15:01 |
canori01 | placement has: | 2019-10-22 19:12:06 | 2019-11-25 19:27:36 | 157 | c53e4b12-0b0b-4eaa-9fb1-373da8538cea | bctlz7nova36 | 15:01 |
*** chason has left #openstack-nova | 15:01 | |
mriedem | no those aren't the same | 15:01 |
mriedem | the nova services table uuid and nova compute_nodes uuid are not the same | 15:01 |
canori01 | ah ok | 15:01 |
*** chason has joined #openstack-nova | 15:01 | |
sean-k-mooney | right the placement uuid is the compute node uuid | 15:01 |
canori01 | sean-k-mooney: I'm not using calico. Just overlay vxlan networks advertised out with the neutron bgp agent | 15:02 |
*** amodi has quit IRC | 15:02 | |
sean-k-mooney | canori01: ok neutron reports the network segment for routed networks to placement too | 15:02 |
sean-k-mooney | canori01: if you are using vxlan then you are fine | 15:02 |
mriedem | stephenfin: ok +2 on the 2nd from bottom change | 15:05 |
stephenfin | ta | 15:05 |
efried | stephenfin: and the bottom one is +A | 15:05 |
canori01 | mriedem: so would the safest course of action be to rebuild the placement db and heal the allocations? | 15:06 |
mriedem | i'm assuming canori01 didn't understand the question about what routed networks as a feature in neutron is | 15:06 |
mriedem | https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html | 15:06 |
mriedem | tl;dr it relies on aggregates | 15:06 |
mriedem | and they aren't supported in nova anyway so it's a red herring here | 15:06 |
*** ociuhandu has joined #openstack-nova | 15:07 | |
mriedem | canori01: i think doing that (what i laid out above) is likely more foolproof than trying to hack the uuids to get all synced up | 15:07 |
mriedem | disclaimer: if you run into problems with that this isn't a support channel nor am i your paid vendor so i'm not going to be walking you through every issue you run into :) | 15:08 |
efried | It would be a good real world test of that procedure, anyway. | 15:08 |
efried | A canori in a coal mine, so to speak. | 15:08 |
canori01 | mriedem: ok, thanks. Also, I'm definitely not using the routed networks. My provider network is just one segment | 15:08 |
mriedem | efried: indeed - test it in (someone else's) production | 15:08 |
mriedem | efried: maybe a troubleshooting item to document, "oh no my placement db is all screwed up, how can i just start over w/ my existing nova" | 15:09 |
canori01 | mriedem: of course. I understand about the disclaimer :D | 15:09 |
efried | I thought cdent had that somewhere mebbe? | 15:09 |
efried | he's hanging out in -placement atm... | 15:09 |
* efried bbiab | 15:09 | |
*** ociuhandu has quit IRC | 15:12 | |
*** chason has quit IRC | 15:13 | |
mriedem | sean-k-mooney: to answer your earlier question, yes heal_allocations was added in rocky, but the RT did not report allocations periodically in rocky *unless* it's an ironic compute | 15:13 |
mriedem | see https://review.opendev.org/#/c/576462/ | 15:14 |
*** dpawlik has quit IRC | 15:16 | |
sean-k-mooney | mriedem: no i meant it did that in queens | 15:16 |
sean-k-mooney | although only if you had ironic or pike compute nodes | 15:16 |
mriedem | correct | 15:17 |
mriedem | wait, no, <pike computes | 15:17 |
sean-k-mooney | am maybe | 15:17 |
mriedem | starting in pike once all your computes were upgraded we stopped having the RT report allocations because it would overwrite what the scheduler did and screw up allocations during move operations | 15:17 |
sean-k-mooney | yes | 15:17 |
mriedem | that's also why dansmith did migration-based allocations for move ops in queens | 15:18 |
sean-k-mooney | yep so basically the downstream issue was related to fixing allocations for undercloud (ironic) nodes where the customer accidentally deleted the service | 15:19 |
mriedem | stephenfin: while you're still around, i need you and efried to come to an agreement on https://review.opendev.org/#/c/696582/ | 15:20 |
sean-k-mooney | because it was ironic and queens that periodic saved them. but they were asking what would happen if the same happened on the overcloud (libvirt nodes) | 15:20 |
sean-k-mooney | which was when we noticed that the heal_allocations command was not on queens just rocky | 15:21 |
*** eharney has joined #openstack-nova | 15:23 | |
*** parlos has quit IRC | 15:24 | |
*** ociuhandu has joined #openstack-nova | 15:24 | |
*** ociuhandu has quit IRC | 15:28 | |
*** damien_r has joined #openstack-nova | 15:30 | |
*** mkrai has joined #openstack-nova | 15:30 | |
*** lpetrut has quit IRC | 15:32 | |
*** damien_r has quit IRC | 15:33 | |
mriedem | speaking of which, melwitt - should i continue backporting these to rocky? https://review.opendev.org/#/q/topic:heal_allocations_dry_run+(status:open+OR+status:merged) | 15:34 |
*** links has joined #openstack-nova | 15:38 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add troubleshooting doc about rebuilding the placement db https://review.opendev.org/698517 | 15:42 |
mriedem | efried: canori01: ^ brain dump | 15:42 |
*** panda is now known as panda|bbl | 15:44 | |
*** ttsiouts has quit IRC | 15:48 | |
*** mlavalle has joined #openstack-nova | 15:53 | |
*** mkrai has quit IRC | 15:58 | |
melwitt | mriedem: if you do, it would be a help | 16:00 |
melwitt | I support ++ | 16:00 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support live migration with qos ports https://review.opendev.org/695905 | 16:06 |
gibi | mriedem: the happy case support for live migration is now complete in ^^ | 16:06 |
efried | stephenfin: re https://review.opendev.org/#/c/696582/ -- I want the shiny new command in the docs for sure. And I'm not sure your PS2 commentary meant you wanted it actually removed -- did it? | 16:10 |
mriedem | gibi: ack - throw that series into the runways etherpad? | 16:12 |
mriedem | i'm also waiting on efried to come back on https://review.opendev.org/#/c/696541/ | 16:12 |
*** lbragstad_ has joined #openstack-nova | 16:12 | |
efried | mriedem: looking now | 16:13 |
gibi | mriedem: ack, adding... | 16:13 |
mriedem | gibi: i also replied to your comments on https://review.opendev.org/#/c/637070/ but then accidentally rebased | 16:14 |
gibi | mriedem: ack, put it in my queue | 16:14 |
*** lbragstad has quit IRC | 16:15 | |
*** Sundar has joined #openstack-nova | 16:16 | |
efried | mriedem, gibi: I'm +2 on https://review.opendev.org/#/c/696541/ | 16:19 |
gibi | efried: thanks | 16:19 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Remove native LUKS compat code https://review.opendev.org/669121 | 16:25 |
mriedem | i'm not | 16:25 |
*** jaosorior has joined #openstack-nova | 16:25 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add --dry-run option to heal_allocations CLI https://review.opendev.org/698525 | 16:29 |
gibi | mriedem: ack, I will need to get back to that patch tomorrow | 16:30 |
*** jmlowe has joined #openstack-nova | 16:30 | |
gibi | mriedem: most of the noise is there because this patch went through a couple of PSs with different solutions | 16:31 |
gibi | mriedem: I will get back to it tomorrow and clean it up | 16:31 |
* gibi leaves for today, happy hacking folks | 16:32 | |
sean-k-mooney | gibi: o/ | 16:33 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add --instance option to heal_allocations https://review.opendev.org/698529 | 16:38 |
openstackgerrit | sean mooney proposed openstack/nova stable/train: Block rebuild when NUMA topology changed https://review.opendev.org/698530 | 16:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Add BFV wrinkle to TestNovaManagePlacementHealAllocations https://review.opendev.org/698531 | 16:39 |
openstackgerrit | sean mooney proposed openstack/nova stable/train: Disable NUMATopologyFilter on rebuild https://review.opendev.org/698532 | 16:40 |
sean-k-mooney | mriedem: since im backporting stuff should i backport https://review.opendev.org/#/c/695118/ which is the fix to https://bugs.launchpad.net/nova/+bug/1847367 | 16:45 |
openstack | Launchpad bug 1847367 in OpenStack Compute (nova) "Images with hw:vif_multiqueue_enabled can be limited to 8 queues even if more are supported" [Undecided,Fix released] - Assigned to sean mooney (sean-k-mooney) | 16:45 |
sean-k-mooney | mriedem: it was opened against rocky so i guess it should go back at least that far | 16:45 |
mriedem | i'd probably let eandersson or his minions do the backports if they want them | 16:47 |
efried | mriedem: If I delete a shelved instance, does the virt driver ever get a crack at cleaning up? | 16:49 |
efried | a) if not offloaded, I assume yes, because the instance is still on the host | 16:49 |
efried | b) if offloaded, I assume no, because whose virt driver would we hit? | 16:49 |
mriedem | correct | 16:49 |
efried | thx | 16:49 |
mriedem | is this vpmem or accelerator related? | 16:49 |
efried | neither, vtpm | 16:51 |
efried | Means I think we're going to have to delete the swift obj from the conductor rather than the virt driver. | 16:52 |
efried | which kinda sucks because it's a virt driver-specific thing. At least the contents are. | 16:52 |
mriedem | i haven't been paying attentiong to the vtpm hullabaloo | 16:52 |
mriedem | *attention | 16:52 |
mriedem | conductor isn't involved in the server delete btw, | 16:53 |
mriedem | so the api would be doing whatever external cleanup is necessary | 16:53 |
efried | sigh, that's what I meant. | 16:53 |
efried | "controller" | 16:53 |
efried | can't imagine why I confuse that with "conductor". | 16:53 |
efried | though by now you would have thought I could get it the f right. | 16:54 |
efried | maybe after 8 years... | 16:54 |
mriedem | with enough pedantic ridicule you'll get there! | 16:54 |
efried | is that what it's for? gtk | 16:54 |
efried | I thought it was just plain old schoolyard bullying. | 16:54 |
*** gyee has joined #openstack-nova | 16:55 | |
*** dtantsur|brb is now known as dtantsur | 16:55 | |
*** iurygregory has quit IRC | 16:56 | |
*** maciejjozefczyk has quit IRC | 16:59 | |
*** LiangFang has quit IRC | 17:01 | |
canori01 | mriedem: That seems to have worked well. I'm still doing tests, but looks promising. I had to do it slightly differently. I'm running rocky, but my placement db is not broken out. So instead of dropping and recreating the db, I had to truncate the tables that placement uses. Then I bounced nova-compute and did the healing and aggregate syncing | 17:04 |
mriedem | ah cool | 17:07 |
mriedem | glad it's working | 17:07 |
*** rpittau is now known as rpittau|afk | 17:13 | |
openstackgerrit | Merged openstack/nova stable/queens: Do not update root_device_name during guest config https://review.opendev.org/696469 | 17:16 |
openstackgerrit | Merged openstack/nova master: nova-net: Drop nova-network-base security group tests https://review.opendev.org/696508 | 17:16 |
sean-k-mooney | efried: the virt driver should have cleaned up everything on the host as part of the offload step | 17:19 |
sean-k-mooney | oh i see this is related to vtpm | 17:19 |
*** psachin has quit IRC | 17:19 | |
*** panda|bbl is now known as panda | 17:20 | |
*** jlvillal has left #openstack-nova | 17:20 | |
*** links has quit IRC | 17:22 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Resolve (most) flake8 3.x issues https://review.opendev.org/695732 | 17:24 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Switch to flake8 3.x https://review.opendev.org/695733 | 17:24 |
*** yan0s has quit IRC | 17:26 | |
*** ociuhandu has joined #openstack-nova | 17:30 | |
mriedem | wtf, so back on dec 5 i had a passing run of nova-multi-cell with migration tests enabled. now since the 9th with a new run all migration tests are failing because once the confirmed resized server is active again the api is saying the flavor is the old id even though i can see in the conductor logs where right before that the instance has the correct flavor | 17:31 |
mriedem | the api is pulling from the correct cell | 17:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add resource provider allocation unset example to troubleshooting doc https://review.opendev.org/696582 | 17:32 |
stephenfin | mriedem: Revert that if you don't like what I've done. It was just easier do it than explain what I wanted in a comment ^ | 17:33 |
stephenfin | efried: you asked about that earlier ^ | 17:33 |
mriedem | ok | 17:34 |
openstackgerrit | Merged openstack/nova master: Use provider mappings from Placement (mostly) https://review.opendev.org/696992 | 17:40 |
openstackgerrit | Merged openstack/nova master: Create a controller for qga when SEV is used https://review.opendev.org/693072 | 17:40 |
openstackgerrit | Merged openstack/nova master: Extend NeutronFixture to handle multiple bindings https://review.opendev.org/696246 | 17:40 |
openstackgerrit | Merged openstack/nova master: Do not mock setup net and migrate inst in NeutronFixture https://review.opendev.org/696247 | 17:40 |
*** igordc has joined #openstack-nova | 17:42 | |
*** ociuhandu has quit IRC | 17:45 | |
*** awalende has quit IRC | 17:46 | |
*** awalende has joined #openstack-nova | 17:47 | |
*** awalende has quit IRC | 17:51 | |
*** awalende has joined #openstack-nova | 17:52 | |
*** awalende_ has joined #openstack-nova | 17:54 | |
*** awalende has quit IRC | 17:56 | |
*** awalende_ has quit IRC | 17:57 | |
*** derekh has quit IRC | 18:01 | |
*** Sundar has quit IRC | 18:12 | |
*** igordc has quit IRC | 18:13 | |
*** igordc has joined #openstack-nova | 18:14 | |
*** salmankhan has quit IRC | 18:19 | |
*** awalende has joined #openstack-nova | 18:21 | |
efried | mriedem: are you okay with stephenfin's update to that doc patch? Since you both have hands in it, if you're okay with it I'll fast approve, taking stephenfin's authorship as implicit approval and since it's docs... | 18:22 |
melwitt | johnthetubaguy: hey, are you around by chance? | 18:22 |
*** dtantsur is now known as dtantsur|afk | 18:23 | |
*** ociuhandu has joined #openstack-nova | 18:24 | |
*** awalende has quit IRC | 18:25 | |
*** igordc has quit IRC | 18:25 | |
*** jaosorior has quit IRC | 18:27 | |
*** ociuhandu has quit IRC | 18:29 | |
mriedem | efried: yet to look at it | 18:37 |
mriedem | but soon, very soon....muwahhaaha | 18:37 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add cross-cell resize tests for _poll_unconfirmed_resizes https://review.opendev.org/698322 | 18:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: debug cross-cell resize https://review.opendev.org/698304 | 18:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: debug cross-cell resize https://review.opendev.org/698304 | 18:41 |
mriedem | efried: stephenfin's changes look fine to me | 18:44 |
*** tosky has quit IRC | 18:44 | |
efried | +A | 18:45 |
mriedem | if someone can push through https://review.opendev.org/#/c/696509/ it's the current bottom of the nova-net removal series; it looks like the rest of the series after that is now in merge conflict so the whole thing has to be rebased. | 18:45 |
*** henriqueof has joined #openstack-nova | 18:50 | |
efried | mriedem: I'm gonna try to hit that today, but it keeps getting pushed down my stack :( | 18:52 |
mriedem | ack it's pretty mechanical so anyone should be able to hit it | 18:53 |
*** ralonsoh has quit IRC | 18:55 | |
melwitt | TheJulia: do you know whether cpu_arch is supposed to be required from an ironic pov? https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L103-L106 is it valid for a deployment *not* to specify cpu_arch? (for example, in a single arch environment) context is this bug https://bugzilla.redhat.com/show_bug.cgi?id=1688838 | 19:02 |
openstack | bugzilla.redhat.com bug 1688838 in openstack-nova "Ironic should not treat cpu_arch as mandatory" [Medium,New] - Assigned to mwitt | 19:02 |
sean-k-mooney | gibi: by the way i assume attaching a port with a resource request to an existing instance is still unsupported, correct? it was declared out of scope in stein and it was not mentioned as addressed in train. is that on your ussuri todo list? | 19:06 |
*** mvkr has quit IRC | 19:06 | |
*** martinkennelly has quit IRC | 19:07 | |
sean-k-mooney | i assume we would have updated https://github.com/openstack/nova/blob/master/releasenotes/notes/reject-interface-attach-with-port-resource-request-17473ddc5a989a2a.yaml if it was supported or added another release note | 19:08 |
*** nicolasbock has quit IRC | 19:11 | |
*** nicolasbock has joined #openstack-nova | 19:11 | |
*** spsurya has quit IRC | 19:14 | |
melwitt | jroll: ^ maybe you might know (ironic driver question from me) | 19:18 |
*** eharney has quit IRC | 19:30 | |
mriedem | sean-k-mooney: not supported - the request has to be validated with placement on attach and that isn't done | 19:31 |
sean-k-mooney | ya that is what i understood too | 19:32 |
sean-k-mooney | im doing a downstream docs review and wanted to make sure that was called out | 19:32 |
*** damien_r has joined #openstack-nova | 19:32 | |
mnaser | eandersson: is this similar to what you've been running into? https://bugs.launchpad.net/nova/+bug/1835637 | 19:33 |
openstack | Launchpad bug 1835637 in OpenStack Compute (nova) "(404) NOT_FOUND - failed to perform operation on queue 'notifications.info' in vhost '/nova' due to timeout" [Undecided,Incomplete] | 19:33 |
*** lpetrut has joined #openstack-nova | 19:33 | |
*** iurygregory has joined #openstack-nova | 19:34 | |
*** damien_r has quit IRC | 19:35 | |
*** mriedem has quit IRC | 19:37 | |
*** tesseract has quit IRC | 19:46 | |
*** lpetrut has quit IRC | 19:46 | |
*** awalende has joined #openstack-nova | 19:48 | |
*** awalende has quit IRC | 19:53 | |
*** lbragstad has joined #openstack-nova | 20:03 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: ksa auth conf and client for Cyborg access https://review.opendev.org/631242 | 20:04 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Define Cyborg ARQ binding notification event. https://review.opendev.org/692707 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into domain XML in libvirt driver. https://review.opendev.org/631245 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard reboot with accelerators. https://review.opendev.org/697940 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Pass accelerator requests to each virt driver from compute manager. https://review.opendev.org/698581 | 20:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 20:07 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Pass accelerator requests to each virt driver from compute manager. https://review.opendev.org/698581 | 20:07 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into domain XML in libvirt driver. https://review.opendev.org/631245 | 20:07 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 20:07 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard reboot with accelerators. https://review.opendev.org/697940 | 20:07 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 20:07 |
*** ociuhandu has joined #openstack-nova | 20:09 | |
*** eharney has joined #openstack-nova | 20:12 | |
*** abaindur has joined #openstack-nova | 20:16 | |
*** abaindur has quit IRC | 20:17 | |
*** abaindur has joined #openstack-nova | 20:17 | |
*** ociuhandu has quit IRC | 20:21 | |
*** ociuhandu has joined #openstack-nova | 20:21 | |
*** ociuhandu has quit IRC | 20:40 | |
*** ociuhandu has joined #openstack-nova | 20:41 | |
*** martinkennelly has joined #openstack-nova | 20:43 | |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/stein: block_device: Copy original volume_type when missing for snapshot based volumes https://review.opendev.org/696686 | 20:44 |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/rocky: block_device: Copy original volume_type when missing for snapshot based volumes https://review.opendev.org/697260 | 20:44 |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/queens: block_device: Copy original volume_type when missing for snapshot based volumes https://review.opendev.org/697261 | 20:45 |
*** iurygregory has quit IRC | 20:45 | |
*** damien_r has joined #openstack-nova | 20:47 | |
*** damien_r has quit IRC | 20:47 | |
*** damien_r has joined #openstack-nova | 20:47 | |
eandersson | mnaser I don't think it's the same issue, but could be similar | 20:49 |
mnaser | eandersson: seems like the queues just crash even on a restart | 20:49 |
eandersson | https://github.com/rabbitmq/rabbitmq-server/issues/641 | 20:49 |
mnaser | even with 3.8.1 | 20:49 |
eandersson | This is the issue we are running into | 20:49 |
mnaser | given that we have k8s running side by side our entire infra now, we're just probably going to run one single non-clustered rabbitmq instance | 20:50 |
eandersson | but yea could be the same because we see different symptoms each time | 20:50 |
mnaser | for every service | 20:50 |
eandersson | Yea - this has been very draining for us | 20:51 |
eandersson | Some of our RabbitMQ clusters have 17k+ queues as well | 20:52 |
eandersson | Since every compute ends up with like at least 7 queues | 20:52 |
eandersson | btw mnaser do you have network partition auto-healing enabled? | 20:52 |
eandersson | or pause_minority rather | 20:53 |
*** mgariepy has quit IRC | 20:54 | |
eandersson | I believe the trigger for these issues is that when RabbitMQ comes back up it is starting to accept connections before it is fully recovered. | 20:54 |
eandersson | At least in 3.6.X we had that issue. Could be a new issue in 3.7.X | 20:54 |
mnaser | eandersson: i ran into it even with 3.8.1 | 20:55 |
mnaser | eandersson: cluster_partition_handling, pause_minority | 20:55 |
eandersson | I tried to tell the RabbitMQ guys about this, but I don't know how to reproduce it properly. | 20:56 |
eandersson | Plus I run like 3.7.5 so they just keep telling me to upgrade. | 20:56 |
eandersson | But after the RabbitMQ 3.6.3 debacle we don't upgrade too frequently without first testing it properly. So takes time. | 20:57 |
*** ociuhandu has quit IRC | 20:57 | |
*** Sundar has joined #openstack-nova | 20:59 | |
Sundar | dansmith: Would it help to discuss in IRC and then summarize in Gerrit? | 21:00 |
dansmith | Sundar: if you were ever in irc, sure | 21:01 |
dansmith | Sundar: what I want is discussion instead of replies of "no you're wrong" two minutes before pushing up replacement set without addressing the thing | 21:01 |
Sundar | dansmith Sure. I addressed most of your points, BTW. | 21:02 |
Sundar | I think the disconnect is the understanding of the object model. Please see https://review.opendev.org/#/c/631243/46/nova/accelerator/cyborg.py@26 | 21:03 |
*** slaweq has quit IRC | 21:03 | |
dansmith | Sundar: I'm not sure what your point is | 21:04 |
dansmith | Sundar: I will eventually be able to have one crypto accelerator and one gzip accelerator, right? | 21:05 |
Sundar | dansmith: Yes, for the same instance | 21:05 |
dansmith | and those are two device profiles? | 21:05 |
Sundar | No, one device profile with 2 request groups | 21:05 |
dansmith | okay, so an instance will only ever have one device profile? | 21:05 |
Sundar | Yes. That single device profile's name is set in the flavor. | 21:06 |
dansmith | so, why are you setting the tag in the event to the device profile? | 21:06 |
Sundar | That was the only logical choice for a tag that seemed relevant. | 21:07 |
dansmith | setting the tag means that the event needs to be multiplexed for the instance | 21:07 |
dansmith | which is why we set it to the port id for neutron ports, for example, because there are multiple ports per instance | 21:08 |
dansmith | I'm not sure why it was decided to encompass all of the accelerators for an instance into a single entity, but alas | 21:08 |
Sundar | Would we ever need multiple events per instance? For example, for hot adds/deletes in the future, by updating the device profile? | 21:09 |
dansmith | Sundar: those would be different event types | 21:09 |
Sundar | Then what exactly is the problem -- that the tag is superfluous? | 21:10 |
dansmith | setting the tag implies that there can be multiples, so ... yes | 21:11 |
*** slaweq has joined #openstack-nova | 21:11 | |
Sundar | dansmith: Got the disconnect. We don't use the tag, at least not today. With hot adds/deletes, since it is going to be another event type, we still don't need it. | 21:12 |
Sundar | dansmith: BTW, the idea of one device profile per instance came from the idea of setting one device profile name in the instance. | 21:12 |
dansmith | I dunno what those would look like, but I would hope that there is some indication of what device the "thing got added" event pertains to | 21:12 |
dansmith | ...which would be done with a tag | 21:13 |
*** ociuhandu has joined #openstack-nova | 21:13 | |
Sundar | dansmith: It may make sense to have a single notification for an update too -- because there is not much that Nova can do with a partial update knowing that the next event may indicate a failure and things need to be rolled back | 21:14 |
Sundar | That is the same reasoning as for the bind logic here | 21:15 |
dansmith | Sundar: "something failed" events are pretty terrible | 21:15 |
*** slaweq has quit IRC | 21:16 | |
Sundar | dansmith: If you mean that the event should say what exactly failed, agreed. | 21:16 |
dansmith | Sundar: I mean we should at least know which thing failed, and tag is the "which" | 21:16 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Remove 'nova-xvpvncproxy' https://review.opendev.org/687909 | 21:17 |
Sundar | dansmith: The problem is that, if I were to send a separate event for each ARQ for binding success/failure, tagged with the ARQ UUID, Nova wouldn't know what to do with that. For one, there is no association between the ARQ UUID and the instance in nova -- that would presumably need a db change. | 21:18 |
Sundar | So, nova wouldn't know which instance is affected. | 21:19 |
dansmith | huh? the event would still be delivered to the instance, so we know which instance is affected | 21:19 |
dansmith | I understand we don't store the arq uuid anywhere, although I think we likely still have it before we're going to wait, since we just polled cyborg for the list | 21:20 |
Sundar | dansmith: true. The event has a server-uuid field. | 21:20 |
dansmith | that's kindof the whole point. | 21:20 |
dansmith | it would be much less odd if you did that, even if it doesn't mean that much to us right now | 21:20 |
dansmith | because we can log the detail, potentially report it in the instance action event, | 21:21 |
dansmith | instead of "well, I dunno, cyborg said it didn't work.. *shrug*" | 21:21 |
Sundar | dansmith: However, what does a single ARQ's success mean to Nova? It would have to wait anyway for all of them. If any of them failed, the whole set needs to be rolled back. | 21:21 |
dansmith | but whatever happens, if you're going to send one per instance, it needs to not have a tag | 21:21 |
Sundar | dansmith: Ok, I can remove the tag. | 21:22 |
dansmith | Sundar: in some future we may need to do extra things to the device on the host, and we can do that for devices that complete early while we wait for the others, | 21:22 |
dansmith | and like I said, being able to report to the user and the admin which and what happened is useful | 21:22 |
*** rcernin has joined #openstack-nova | 21:23 | |
*** francoisp has quit IRC | 21:23 | |
Sundar | Re. report to user/admin, the exact details would be in Cyborg logs. Given the range of errors, it will be tough to state the failure reason in the Nova event. | 21:24 |
Sundar | Perhaps the Nova logs could point the admin to Cyborg logs. | 21:24 |
dansmith | no, I mean report *which* thing failed | 21:25 |
dansmith | not why | 21:25 |
Sundar | For example, say we had one event per ARQ, tagged by ARQ UUID. Nova logs report that a certain ARQ UUID failed to bind, Would that really be useful, without knowing more about the ARQ? | 21:26 |
dansmith | yes? | 21:26 |
dansmith | I grab that arq uuid and hit logstash to see what else has reported stuff about that thing | 21:27 |
dansmith | We could just always report "error with (neutron|cinder|cyborg). You go figure it out" but that's not likely to be very popular | 21:28 |
Sundar | dansmith: IMHO, I think it will be better to look at an error in context. Giving the user an ARQ UUID is equivalent to 'go search Cyborg logs', right? | 21:30 |
dansmith | I can't believe I'm having this conversation. | 21:30 |
*** mvkr has joined #openstack-nova | 21:30 | |
dansmith | Sundar: either make the event look like the rest of our events, or stop passing the tag. | 21:30 |
Sundar | dansmith: Sure, I said I can stop passing the tag. You had 2 concerns with that -- one was ease of use of logs. We were attempting to resolve that. | 21:32 |
Sundar | The other was "in some future we may need to do extra things to the device on the host". | 21:32 |
Sundar | I am still thinking about that, esp. if those extra things will require a different event type, which could have or not have a tag, independent of this event. | 21:33 |
Sundar | dansmith: Please LMK if I am off. | 21:34 |
*** lbragstad has quit IRC | 21:34 | |
*** tbachman has quit IRC | 21:35 | |
dansmith | Sundar: is there any reason you can't send the event per ARQ in cyborg, and any reason you can't know the ARQ UUIDs at the time we wait? We have that info passed in now right? | 21:35 |
*** pcaruana has quit IRC | 21:38 | |
Sundar | dansmith: We are debating the benefits of sending one event per ARQ, beyond 'this is what we have done in the past.' It seems to me that logging the ARQ UUID gives little info to the user beyond 'look elsewhere for the details'. Please LMK why that is wrong. | 21:38 |
Sundar | Secondly, we still have to wait for all events for that instance, right? | 21:39 |
dansmith | efried: you around now? | 21:39 |
*** martinkennelly has quit IRC | 21:40 | |
efried | hi, yes, all the doctors this week. | 21:40 |
efried | shall I read scrollback or are you going to summarize for me? | 21:40 |
dansmith | so, it has become clear to me that the cyborg event as currently proposed is being used very differently than what we have for the other services | 21:40 |
efried | If we're discussing how many events... | 21:41 |
dansmith | specifically, it's one event per instance with "everything is done" or "everything failed" level granularity | 21:41 |
dansmith | instead of per-ARQ, akin to per-port or per-volume or anything else | 21:41 |
efried | I skimmed the reviews briefly and it makes sense to me to have an event per "attachable thingy" | 21:41 |
dansmith | right | 21:41 |
dansmith | so I think I'm going to convert to taking a hard line on that | 21:41 |
efried | That's me looking at a very high level, without really digging in. | 21:41 |
dansmith | that the event should be per-ARQ | 21:41 |
dansmith | even though everything is very monolithic right now, | 21:42 |
efried | unless the contract with cyborg is "all or nothing" | 21:42 |
Sundar | efried; What will Nova with a per-ARQ event? | 21:42 |
Sundar | *do with | 21:42 |
efried | I mean, I'm guessing we would abort the build if we get partial | 21:42 |
dansmith | eventually this should be similar to what we do with ports and volumes in terms of attachability | 21:42 |
dansmith | efried: yes, just like we do for ports and volumes now during build | 21:42 |
dansmith | however, | 21:42 |
dansmith | we use the same event during attach-one-later type operations | 21:42 |
dansmith | and this should be the same. | 21:43 |
efried | sure, but iiuc the (anticipated, future) design is still going to involve wrapping "attach later" arqs in a device profile-like thing. | 21:43 |
Sundar | If Nova starts sending per-ARQ delete for an instance, that is going to complicate things on both sides | 21:43 |
efried | and again, wouldn't we want to fail the whole operation if a subset of attachments fail? | 21:43 |
dansmith | efried: during build | 21:44 |
dansmith | efried: just like we do for network and storage | 21:44 |
dansmith | efried: but during attach, the granularity matters of course | 21:44 |
efried | Okay, if we do post-build attach of networks/volumes, we support "partial success" and don't revert the whole operation? | 21:45 |
Sundar | efried: Exactly. That is why it makes sense to wait for the whole thing.I Cyborg gets an abort in the middle of a device prep task, there may not be much to do, except wait for it to complete | 21:45 |
Sundar | *If | 21:45 |
dansmith | efried: post-build attach sends one event per attachable thing, exactly the same cardinality as one-per during build | 21:45 |
dansmith | efried: and yes, failure to attach just the new thing is not a completely totally fatal thing, of course | 21:46 |
dansmith | Sundar: waiting for them all to complete has nothing to do with whether or not there are multiple events | 21:46 |
dansmith | I don't even care if you currently batch them up and send them all at once today | 21:46 |
efried | dansmith: tbf, it's not "of course"; that's an architectural decision that was made at some point. Which is fine if that's the way it is. And | 21:46 |
efried | Sundar: I agree we want to have parity with the network/volume operations, even if it doesn't make sense for accels right now. | 21:46 |
dansmith | it's about the data structure and the protocol, and how that affects the future | 21:46 |
efried | so yeah, it happens that for build we'll abort the whole thing if any subset fails. | 21:47 |
efried | but this will allow us to do the future thing without having to like rearchitect the API on both sides and sweat upgrades etc. | 21:47 |
efried | Sundar: the added complexity is negligible, really. | 21:47 |
dansmith | that's the thing I want ... to be granular now so we can be granular in the future | 21:48 |
*** ociuhandu has quit IRC | 21:48 | |
Sundar | dansmith: efried: just to be sure, you are ok if Cyborg bunches up all the ARQ events for an instance, after all ARQs have bound? | 21:49 |
efried | Sundar: multiple events in the same POST /os-server-external-events call, yes. | 21:49 |
dansmith | Sundar: yes, I'm okay with you batching them all so they arrive at the same time as they do today, I just care that you represent them separately | 21:49 |
efried | dansmith: presumably if we run into the no-host-yet race, it would impact all the events in that payload, because all same instance, and that instance is loaded just once in the method... | 21:51 |
Sundar | The current logic to wait_for_instance_event waits for a single event, right? So, if we poll Cyborg at that point, and find that all have resolved, can we still exit early, like today? | 21:51 |
dansmith | Sundar: you're going to need to do one of two things, I think, which is to poll cyborg once more without the binding=complete filter to get those, or pass them down from conductor so you have them. And yes, I'm fine with that and think it's worth the overhead | 21:51 |
dansmith | efried: it doesn't matter | 21:51 |
efried | (I just checked it, the instance is loaded once, so yeah, if one 422s, all will 422.) | 21:52 |
dansmith | efried: the whole reason we got into this conversation is because he wasn't doing the skip granularly.. so we still do that, and only skip the ones we're missing, which is why we started having this discussion | 21:52 |
*** henriqueof has quit IRC | 21:52 | |
dansmith | efried: ah you mean the api side, yeah I think that's fine | 21:52 |
eandersson | mriedem, sean-k-mooney for the max_queues patch we don't need it as we just internally changed the logic to always allow 256 queues, but if I get some time over this week I can help backport it. | 21:52 |
*** ociuhandu has joined #openstack-nova | 21:52 | |
*** henriqueof has joined #openstack-nova | 21:52 | |
efried | Sundar: wait_for_instance_event takes multiple events to wait for. If you query cyborg and a subset have already bound, you cancel just those events -- dansmith's patch accounts for that -- and then wait_for_instance_event will continue to wait for the remainder. | 21:53 |
efried | dansmith: right ^ ? | 21:53 |
dansmith | precisely | 21:53 |
Sundar | efried: There's a problem with that. Nova doesn't keep ARQ UUIDs around, to use as tags in the event to wait for. | 21:54 |
dansmith | Sundar: I just addressed that above | 21:54 |
dansmith | [13:51:32] <dansmith>Sundar: you're going to need to do one of two things, I think, which is to poll cyborg once more without the binding=complete filter to get those, or pass them down from conductor so you have them. And yes, I'm fine with that and think it's worth the overhead | 21:55 |
efried | so | 21:56 |
efried | all_arqs = poll without binding=complete | 21:56 |
efried | wait_for_instance_event(all_arqs): | 21:56 |
efried | done_arqs = poll with binding=complete | 21:56 |
efried | if done_arqs: cancel(done_arqs) | 21:56 |
efried | dansmith: is that what you meant ^ | 21:56 |
dansmith | that's one option yes | 21:56 |
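Editor's note: the poll / wait / cancel sequence efried outlines above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nova or Cyborg code: `FakeCyborg`, `list_arqs`, `events_still_pending`, and the event name `arq-bound` are all hypothetical stand-ins, and the real `wait_for_instance_event` machinery is replaced by a plain set of pending events.

```python
class FakeCyborg:
    """Hypothetical stand-in for a Cyborg client; not the real API."""

    def __init__(self, arqs):
        # arqs maps ARQ UUID -> True if that ARQ has already bound.
        self._arqs = dict(arqs)

    def list_arqs(self, bound_only=False):
        # With bound_only=False this mimics polling without the
        # binding-complete filter; with True, polling with it.
        return [uuid for uuid, bound in self._arqs.items()
                if bound or not bound_only]


def events_still_pending(cyborg):
    """Return the (event name, tag) pairs the caller still has to wait for."""
    # Step 1: learn every ARQ UUID, bound or not, so there is one event per ARQ.
    all_arqs = cyborg.list_arqs()
    pending = {("arq-bound", uuid) for uuid in all_arqs}

    # Step 2: poll again for the ARQs that have already bound, and cancel
    # only those events; the rest remain to be waited on.
    for uuid in cyborg.list_arqs(bound_only=True):
        pending.discard(("arq-bound", uuid))
    return pending
```

The point of keeping one event per ARQ is that a partially bound set leaves exactly the unbound ARQs outstanding, rather than an all-or-nothing single event.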
efried | "pass down from conductor" like change RPC signatures and stuff? | 21:56 |
dansmith | you could also stash the ARQ UUIDs in sysmeta in the conductor and avoid having to dick with the rpc signature | 21:57 |
efried | I would prefer the other, just because mucking with RPC gorp scares me. | 21:57 |
dansmith | efried: so, something I've had in my back pocket on this is that you're already breaking upgrades | 21:57 |
dansmith | efried: because you are assuming that computes are new enough to do the thing you're promising to the user, but no checks are made for that | 21:57 |
dansmith | efried: an RPC change would bring that to the forefront | 21:58 |
efried | well | 21:58 |
efried | you'll never get scheduled to such a compute. | 21:58 |
efried | Because such a compute will never advertise the accel inventory | 21:58 |
efried | so I think that's n/a | 21:58 |
dansmith | efried: doesn't cyborg manage that? | 21:58 |
efried | the cyborg agent on the compute | 21:58 |
dansmith | right, so you're wrong :) | 21:59 |
dansmith | new cyborg agent, old compute agent = breakage | 21:59 |
efried | so as long as there's a xdep between the cyborg agent and compute on the same host | 21:59 |
efried | which, that must be a thing, no?? | 21:59 |
dansmith | eh? | 21:59 |
efried | I mean, we enforce that for placement | 21:59 |
dansmith | you mean like an RPM or DEB dependency? | 21:59 |
efried | no, I mean like cyborg agent refuses to start up if compute isn't at version xyz | 22:00 |
dansmith | efried: how would it know? | 22:00 |
Sundar | dansmith: An old Nova would not query Cyborg for device profiles or create/bind ARQs | 22:00 |
dansmith | Sundar: a new nova control plane would | 22:00 |
dansmith | Sundar: nova supports backlevel nova-compute services, which is the scenario I'm talking about | 22:00 |
*** dviroel has quit IRC | 22:00 | |
Sundar | dansmith: You mean new n-api, n-super-cond, n-sch but old n-cpu? | 22:01 |
efried | ...do we seriously not have a way to query the nova version on a compute? | 22:01 |
dansmith | Sundar: yes, a very important scenario we have supported for a long time that people depend on | 22:01 |
efried | I mean, obv the RPC API does, which is what you're leading up to. | 22:01 |
dansmith | efried: we don't and shouldn't | 22:02 |
dansmith | I guess I better go ahead and drop that in the review somewhere before I go poof for the year and y'all try to merge this :) | 22:03 |
Sundar | dansmith: I may be missing something. We had the create/bind and the wait all in n-cpu, which would have avoided this issue (at the expense of less concurrency), right? | 22:03 |
efried | Sundar: no | 22:04 |
dansmith | Sundar: no, you still have the same problem with that arrangement | 22:04 |
efried | So, ugh, what we actually need is something like: | 22:04 |
efried | compute advertises a COMPUTE_CAN_DO_CYBORG_SHIT trait so that | 22:04 |
efried | cyborg (which already queries that RP to know where to hang the accel RPs) can know whether it's even allowed to do that. | 22:04 |
dansmith | efried: or do it with service version but you might as well do an RPC version in that case | 22:04 |
efried | Sundar: The create/bind happens at deploy time, when you've already picked a host. We're talking about the bootstrapping process where cyborg decides to advertise accel resources in the first place. | 22:04 |
dansmith | efried: the trait is more expensive unless you've already got them | 22:04 |
efried | dansmith: expensive how? Because an extra placement query, once, on startup? Not worried about that. | 22:05 |
efried | or were you talking about something else? | 22:05 |
dansmith | something else, and I misunderstood what you mean so let me 'splain: | 22:05 |
*** ociuhandu has quit IRC | 22:06 | |
dansmith | efried: you could either try to avoid cyborg hanging the inventory off the compute at bootstrap time, or you could avoid letting the conductor/scheduler start down this path for a compute that can't handle it | 22:06 |
*** ociuhandu has joined #openstack-nova | 22:06 | |
efried | the former gets you the latter for free | 22:06 |
dansmith | IMHO, the latter is the job of nova anyway, | 22:06 |
dansmith | and would fit with a "this compute isn't new enough to do that thing you want" sort of check | 22:06 |
dansmith | efried: it depends on another service for upgrade correctness though | 22:06 |
efried | if you expose the inventory but have something else that prevents you from using it, that's a waste, isn't it? | 22:06 |
dansmith | i.e. it depends on cyborg being well-behaved, and/or not being modified to just always do it | 22:07 |
dansmith | efried: well, it means that cyborg can be upgraded and working before you've upgraded all your computes | 22:07 |
efried | heck, on that theory we could hang the inventory off a *really* old compute and it mushroom clouds because it can't handle nested providers at all. | 22:07 |
dansmith | so if you upgrade service by service, you don't have to loop back around | 22:07 |
dansmith | which was always annoying with things like ironic that had kinda circular deps with nova | 22:07 |
efried | cyborg is polling though, isn't it Sundar? | 22:08 |
dansmith | efried: well, that might be a good reason to do both then I guess, I dunno | 22:08 |
Sundar | efried: Sorry. Cyborg is polling for what? | 22:08 |
efried | Heh, mushroom cloud. That should be a thing. | 22:08 |
Sundar | Polling for new devices? Yes | 22:09 |
dansmith | efried: I understand that we avoid asking for the new thing in a roundabout way by not exposing a trait, cyborg not exposing inventory, placement not returning any candidates, and thus us not actually running our control plane code, | 22:09 |
efried | Sundar: If cyborg agent is shiny and new on a host, but the nova-compute on the host is downlevel | 22:09 |
efried | and then you upgrade the nova-compute | 22:09 |
efried | will cyborg eventually figure that out and realize it should now start reporting accel inventory on that host? | 22:09 |
dansmith | efried: but it would be a lot more in line with our regular checks to just be explicit about it, even if we should never hit it because of the other | 22:10 |
efried | dansmith: okay, so what is it that you're suggesting exactly? I didn't pick up on that. | 22:10 |
efried | An RPC version check from conductor to compute? | 22:10 |
openstackgerrit | Mykola Yakovliev proposed openstack/nova master: Fix boot_roles in InstanceSystemMetadata https://review.opendev.org/698040 | 22:10 |
dansmith | efried: if you pass the ARQs down in the spawn call, you need an RPC version, so you get a service version, which is something you can just check for easily in conductor, alternately just bump the service version without the rpc and check for that in conductor | 22:11 |
dansmith | efried: or expose it as a trait and have conductor/scheduler filter or check for that trait before agreeing to use that compute | 22:11 |
*** slaweq has joined #openstack-nova | 22:11 | |
dansmith | efried: perhaps the easiest thing would be just "if we have accel resources, also add CAN_DO_CYBORG_SHIT trait into the requirements" before we call placement during scheduling? | 22:11 |
dansmith | you'll need to do it again for attach when we add that | 22:12 |
dansmith | so CAN_DO_CYBORG_SHIT_V2alpha | 22:12 |
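Editor's note: the idea dansmith floats here, adding a capability trait to the scheduling requirements whenever accelerator resources are requested, might look something like the sketch below. Everything named in it is an assumption for illustration: `CAN_DO_CYBORG` is a placeholder trait name, the `FPGA` resource class is just an example, and `build_required_traits` is a hypothetical helper, not a Nova function.

```python
# Illustrative set of accelerator resource classes (assumption).
ACCEL_RESOURCE_CLASSES = {"FPGA"}

# Placeholder capability trait name; the real name was not settled in the chat.
CAPABILITY_TRAIT = "CAN_DO_CYBORG"


def build_required_traits(resources, required=()):
    """Return the traits to require from placement for this request.

    resources: dict of resource class -> amount requested.
    required:  traits the request already requires.
    """
    traits = set(required)
    # Only requests that actually ask for accelerators need a compute
    # that advertises the capability trait.
    if ACCEL_RESOURCE_CLASSES & set(resources):
        traits.add(CAPABILITY_TRAIT)
    return traits
```

Requests without accelerator resources are left untouched, so older computes stay eligible for everything they could already handle.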
efried | dansmith: I'm not offended by the thought of those capability traits, which seem on par with what we have for e.g. "can do multiattach" or whatever | 22:13 |
efried | so IMO we should do those regardless | 22:13 |
dansmith | ack | 22:13 |
dansmith | so cyborg can look for that, | 22:13 |
efried | those are simply flags on that virt driver dict | 22:13 |
dansmith | and we throw that into the scheduling request also if we're doing accel stuff? | 22:13 |
efried | Well, I don't think we need it in the sched request, if cyborg is using it to decide whether to present the inventory in the first place. | 22:13 |
efried | but if you think it should be there for form's sake, I wouldn't object. | 22:14 |
dansmith | again, that puts the onus on cyborg to be responsible, which I don't like | 22:14 |
efried | where's the trust?? | 22:14 |
Sundar | So, the use case for all this is that, the operator added accelerator to a compute node but did not upgrade n-cpu on that node? | 22:14 |
dansmith | right, thank you :) | 22:14 |
dansmith | efried: let me check my butt, hang on | 22:14 |
efried | That's not spelled "onus" dansmith | 22:14 |
dansmith | hehe | 22:15 |
efried | Sundar: yes, exactly. | 22:15 |
efried | Sundar: we're trying to make nova account for poorly-behaved a) operators, b) service code from other services. | 22:15 |
*** tosky has joined #openstack-nova | 22:15 | |
efried | and not-work in a predictable rather than unpredictable (and potentially unrecoverable) way. | 22:15 |
dansmith | just to be clear, | 22:16 |
*** slaweq has quit IRC | 22:16 | |
dansmith | I definitely think that the rpc method change is the right thing to do, instead of just coding in new assumptions on both sides | 22:16 |
*** mvkr has quit IRC | 22:16 | |
dansmith | I understand the hesitation to that, so I won't hold strictly to it, despite it being the safer option, IMHO | 22:16 |
efried | dansmith: if we did that, we would have to have a post-placement filter to remove hosts based on RPC version, right? | 22:17 |
efried | perhaps we already have that. | 22:17 |
Sundar | What's wrong with the option where Nova places a CAN_DO_CYBORG trait on some compute node RPs, and factors that into Placement query? | 22:17 |
efried | Sundar: nothing's wrong with that, and we should do it, but we're talking about doing another thing as well. | 22:18 |
dansmith | efried: no, you'd still want to pick a capable host during scheduling, and scheduling based on rpc version is not a thing | 22:18 |
efried | dansmith: so what's the RPC version for again? Just passing the arq uuids? | 22:18 |
*** tbachman has joined #openstack-nova | 22:18 | |
dansmith | efried: yes, and to crystallize our request to the compute node so that (a) the RPC layer can tell if we're (for whatever reason) asking the compute node to do something it can't handle, and (b) compute doesn't bake the assumption of the instance having a magical key in its flavor into meaning that it should call to cyborg to do all this stuff | 22:19 |
*** mvkr has joined #openstack-nova | 22:20 | |
dansmith | efried: because sometime in the future, the mere presence of that flavor key will imply a slightly different thing, and we will have an upgrade concern to deal with because we have a bunch of old computes that will still be doing the assume-it thing | 22:20 |
dansmith | I made reference to this in a previous discussion about all of this | 22:20 |
efried | This is getting into details I probably don't care about rn, but do we get (a) for free (does the call bounce if the compute is too old) or do we have to do an explicit check? | 22:22 |
dansmith | summarized the bits you do care about here: https://review.opendev.org/#/c/631244/51/nova/conductor/manager.py | 22:22 |
efried | ack | 22:23 |
dansmith | efried: the rpc layer knows what the minimum supported version is across the cluster, so when it is converting the request, it can explode if doing so would drop an arqs parameter | 22:23 |
dansmith | efried: it gets you the circuit breaker, not necessarily the "graceful" part -- that should still be scheduler-based | 22:24 |
efried | cool | 22:24 |
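Editor's note: the "circuit breaker" dansmith describes, where the RPC layer refuses to silently drop an `arqs` parameter when the cluster minimum version is too old, can be sketched as below. This is a simplified illustration, not the oslo.messaging or Nova RPC API: `prepare_spawn_kwargs`, `IncompatibleComputeError`, and the version numbers are all hypothetical.

```python
class IncompatibleComputeError(Exception):
    """Raised when a request cannot be expressed for an older compute."""


def prepare_spawn_kwargs(kwargs, cluster_min_version, arqs_min_version=(5, 1)):
    """Adapt spawn call arguments to the oldest compute RPC version in use.

    Versions are (major, minor) tuples; the default (5, 1) is made up.
    """
    if cluster_min_version >= arqs_min_version:
        # Every compute understands the new parameter; send it as-is.
        return dict(kwargs)
    if kwargs.get("arqs"):
        # Dropping a non-empty arqs list would change the meaning of the
        # request, so fail loudly instead of downgrading quietly.
        raise IncompatibleComputeError(
            "compute RPC version too old to accept arqs")
    # Nothing accelerator-related was requested; strip the unknown key.
    return {k: v for k, v in kwargs.items() if k != "arqs"}
```

As noted in the discussion, this only gives the hard stop; picking a capable host gracefully is still left to scheduling.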
dansmith | Sundar: while you're here, what's the plan for getting some small example of this running with an actual pci device in the intel pci ci system (or something) ? | 22:26 |
*** ociuhandu has quit IRC | 22:27 | |
*** nicolasbock has quit IRC | 22:27 | |
Sundar | dansmith: We have the server and the FPGA in a lab, and one person is working on that. We expect it by Jan, if not this year. | 22:27 |
Sundar | FWIW, I check each patch set with real FPGAs, with different device profiles, and also the fake driver. | 22:28 |
efried | We should have a 3rd party CI called "Intel Cyborg Sundar Manual CI" | 22:29 |
dansmith | Sundar: that would be good except that I totally don't trust you as much as a computer | 22:30 |
dansmith | :) | 22:30 |
Sundar | We already have that hehe | 22:30 |
dansmith | Sundar: so one server and one fpga and one person.. what is the plan? to have that be a manual "recheck use fpgas" command or something? | 22:30 |
Sundar | dansmith: Too bad, I am more reliable than other past Intel CIs ;) | 22:30 |
dansmith | Sundar: that's true, although doesn't really change my feelings about it :) | 22:30 |
efried | dansmith: experimental queue? | 22:31 |
*** tbachman has quit IRC | 22:31 | |
dansmith | efried: that's what I meant.. some not-on-everything per-request method | 22:31 |
efried | or severely restrict files | 22:31 |
efried | but yeah | 22:31 |
dansmith | per-request would be better I think, if it's severely constrained | 22:32 |
*** awalende has joined #openstack-nova | 22:34 | |
*** mriedem has joined #openstack-nova | 22:37 | |
*** awalende has quit IRC | 22:39 | |
*** mdbooth has quit IRC | 22:46 | |
*** mdbooth has joined #openstack-nova | 22:48 | |
*** Sundar has quit IRC | 23:04 | |
*** tkajinam has joined #openstack-nova | 23:08 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: debug cross-cell resize https://review.opendev.org/698304 | 23:09 |
*** slaweq has joined #openstack-nova | 23:11 | |
*** slaweq has quit IRC | 23:15 | |
openstackgerrit | Merged openstack/nova master: Add resource provider allocation unset example to troubleshooting doc https://review.opendev.org/696582 | 23:17 |
openstackgerrit | Merged openstack/nova master: nova-net: Convert remaining API tests to use neutron https://review.opendev.org/696509 | 23:17 |
*** mlavalle has quit IRC | 23:17 | |
*** Liang__ has joined #openstack-nova | 23:20 | |
*** brault has quit IRC | 23:30 | |
*** mmethot has quit IRC | 23:31 | |
*** brault has joined #openstack-nova | 23:31 | |
*** slaweq has joined #openstack-nova | 23:32 | |
*** brault has quit IRC | 23:32 | |
*** brault has joined #openstack-nova | 23:33 | |
*** mmethot has joined #openstack-nova | 23:36 | |
*** pmatulis has joined #openstack-nova | 23:37 | |
pmatulis | how do i map the hypervisor long ID (server show) to the hypervisor hostname (hypervisor list/show)? | 23:38 |
*** avolkov has quit IRC | 23:38 | |
*** tosky has quit IRC | 23:40 | |
dansmith | pmatulis: if you server-show as admin, you should see the unobfuscated hostid , IIRC | 23:40 |
*** tbachman has joined #openstack-nova | 23:41 | |
*** slaweq has quit IRC | 23:41 | |
dansmith | pmatulis: as OS-EXT-SRV-ATTR:hypervisor_hostname | 23:41 |
*** Liang__ has quit IRC | 23:42 | |
*** brinzhang has joined #openstack-nova | 23:58 | |
*** brault has quit IRC | 23:58 | |
*** brault has joined #openstack-nova | 23:59 | |
*** ccamacho|pto has quit IRC | 23:59 |