Monday, 2019-12-02

*** bbowen has joined #openstack-nova00:00
*** slaweq has joined #openstack-nova00:10
*** slaweq has quit IRC00:15
*** ociuhandu has joined #openstack-nova00:19
*** slaweq has joined #openstack-nova00:21
*** tosky has quit IRC00:26
*** ociuhandu has quit IRC00:26
*** spsurya has joined #openstack-nova01:28
*** ociuhandu has joined #openstack-nova01:29
*** ociuhandu has quit IRC01:34
*** dave-mccowan has joined #openstack-nova01:46
*** ociuhandu has joined #openstack-nova01:59
*** chenhaw has joined #openstack-nova02:00
*** slaweq has quit IRC02:02
*** ociuhandu has quit IRC02:03
*** ociuhandu has joined #openstack-nova02:04
*** chenhaw has quit IRC02:06
*** ociuhandu has quit IRC02:09
*** chenhaw has joined #openstack-nova02:21
*** yaawang has quit IRC02:24
*** yaawang has joined #openstack-nova02:33
*** ociuhandu has joined #openstack-nova02:51
*** ociuhandu has quit IRC02:53
*** ociuhandu has joined #openstack-nova02:55
*** ociuhandu has quit IRC03:03
*** mkrai has joined #openstack-nova03:26
*** ricolin has joined #openstack-nova03:47
*** davee_ has joined #openstack-nova04:03
*** ociuhandu has joined #openstack-nova04:03
*** davee___ has quit IRC04:04
*** bhagyashris has joined #openstack-nova04:04
*** ociuhandu has quit IRC04:08
*** udesale has joined #openstack-nova04:37
*** dave-mccowan has quit IRC04:40
*** awalende has joined #openstack-nova05:18
*** lbragstad_ has joined #openstack-nova05:21
*** awalende has quit IRC05:23
*** lbragstad has quit IRC05:24
*** ociuhandu has joined #openstack-nova05:30
*** ociuhandu has quit IRC05:36
*** mkrai has quit IRC05:38
*** links has joined #openstack-nova05:39
*** mkrai has joined #openstack-nova05:41
*** Luzi has joined #openstack-nova06:06
*** huaqiang has quit IRC06:27
*** avolkov has joined #openstack-nova06:29
*** pcaruana has joined #openstack-nova06:32
*** mkrai has quit IRC06:38
*** ralonsoh has joined #openstack-nova06:51
*** belmoreira has joined #openstack-nova06:54
*** ircuser-1 has quit IRC06:55
*** belmoreira has quit IRC06:56
*** dpawlik has joined #openstack-nova07:00
*** ociuhandu has joined #openstack-nova07:01
*** dpawlik has quit IRC07:05
*** ociuhandu has quit IRC07:05
*** slaweq has joined #openstack-nova07:08
*** abaindur has quit IRC07:08
*** tkajinam has quit IRC07:11
*** tkajinam has joined #openstack-nova07:12
*** dpawlik has joined #openstack-nova07:15
*** mkrai has joined #openstack-nova07:23
*** slaweq has quit IRC07:37
*** slaweq has joined #openstack-nova07:40
*** abaindur has joined #openstack-nova07:43
*** abaindur has quit IRC07:53
*** damien_r has joined #openstack-nova08:04
*** jangutter has joined #openstack-nova08:09
*** awalende has joined #openstack-nova08:14
*** tesseract has joined #openstack-nova08:14
*** awalende has quit IRC08:18
*** tkajinam has quit IRC08:25
*** tkajinam has joined #openstack-nova08:26
*** tosky has joined #openstack-nova08:31
*** tssurya has joined #openstack-nova08:31
*** rpittau|afk is now known as rpittau08:31
*** ccamacho has joined #openstack-nova08:34
*** yedongcan has joined #openstack-nova08:34
*** tkajinam has quit IRC08:39
*** maciejjozefczyk has quit IRC08:46
*** ttx has quit IRC08:50
*** ttx has joined #openstack-nova08:50
*** dtantsur|afk is now known as dtantsur08:52
*** ociuhandu has joined #openstack-nova08:55
*** ociuhandu has quit IRC08:56
*** ociuhandu has joined #openstack-nova08:57
*** trident has quit IRC09:07
*** trident has joined #openstack-nova09:09
openstackgerritLee Yarwood proposed openstack/nova master: WIP libvirt: Use virDomainBlockCopy to swap volumes with >= 5.10.0  https://review.opendev.org/69683409:10
*** ganso has quit IRC09:23
*** ganso has joined #openstack-nova09:23
*** ganso has quit IRC09:24
*** ganso has joined #openstack-nova09:24
*** derekh has joined #openstack-nova09:35
*** damien_r has quit IRC09:38
*** yedongcan has quit IRC09:49
*** damien_r has joined #openstack-nova09:50
*** awalende has joined #openstack-nova09:52
*** yaawang has quit IRC09:53
openstackgerritDirk Mueller proposed openstack/nova master: Remove rootwrap filters for nova network  https://review.opendev.org/69684409:59
*** ociuhandu has quit IRC10:04
*** ivve has joined #openstack-nova10:08
openstackgerritGuo Jingyu proposed openstack/nova master: WIP:Drop unreliable host_aggregate_map  https://review.opendev.org/69685110:11
*** yaawang has joined #openstack-nova10:12
*** lpetrut has joined #openstack-nova10:13
*** mkrai has quit IRC10:24
*** tosky_ has joined #openstack-nova10:26
*** ociuhandu has joined #openstack-nova10:28
*** tosky has quit IRC10:28
openstackgerritGuo Jingyu proposed openstack/nova-specs master: Proposal for a safer noVNC console with password authentication  https://review.opendev.org/62312010:36
*** tosky_ is now known as tosky10:38
*** mkrai has joined #openstack-nova10:45
*** ralonsoh has quit IRC11:01
*** ralonsoh has joined #openstack-nova11:03
*** mkrai has quit IRC11:05
*** chenhaw has quit IRC11:06
*** mkrai has joined #openstack-nova11:09
stephenfinsean-k-mooney: Hit https://review.opendev.org/#/c/689861/ and its predecessor, FYI. Sorry for the delay11:12
*** ociuhandu has quit IRC11:16
*** ociuhandu has joined #openstack-nova11:17
*** ociuhandu has quit IRC11:26
*** ociuhandu has joined #openstack-nova11:26
*** ivve has quit IRC11:26
*** tssurya has quit IRC11:29
*** mkrai has quit IRC11:32
*** ociuhandu has quit IRC11:43
*** ociuhandu has joined #openstack-nova11:45
sean-k-mooneyah cool ill address the comments on that today11:46
sean-k-mooneythanks11:46
sean-k-mooneystephenfin: by the way you realise that i did the test refactor before you started yours right. and that i cant depend on your refactor because i need to backport this which is why i originally asked you to base your refactor on top of that series11:48
sean-k-mooneyi can drop the new base class and go back to the other _wait_for_state_change but i can use any of your refactoring or ill have to undo all of it when i backport11:49
*** ociuhandu has quit IRC11:50
*** dtantsur is now known as dtantsur|afk11:54
*** mkrai has joined #openstack-nova11:57
sean-k-mooney* i can't use12:00
*** martinkennelly has joined #openstack-nova12:16
*** ivve has joined #openstack-nova12:19
*** ociuhandu has joined #openstack-nova12:24
*** ociuhandu has quit IRC12:29
stephenfinsean-k-mooney: Yup, but I don't think it's a massive deal that '_wait_for_state_change' behaves oddly on stable branches, given that there's plenty of tests using it12:31
openstackgerritLee Yarwood proposed openstack/nova master: Revert "nova shared storage: rbd is always shared storage"  https://review.opendev.org/68252312:33
openstackgerritLee Yarwood proposed openstack/nova master: libvirt: Rename _is_storage_shared_with to _is_path_shared_with  https://review.opendev.org/69333712:33
*** udesale has quit IRC12:36
*** udesale has joined #openstack-nova12:37
sean-k-mooneystephenfin: ya im using the fake notifier too so i can also use the notifier to wait for the completion events too so its fine ill rework them later today12:39
sean-k-mooneygibi: johnthetubaguy do either of ye feel comfortable reviewing pci code12:49
stephenfinsean-k-mooney: Can I say no to that? ^ :P12:59
*** ygk_12345 has joined #openstack-nova13:02
sean-k-mooneyyou can but you already reviewed the patch and said yes :)13:07
sean-k-mooneyalso i said comfortably not do you like reviewing pci stuff :)13:07
*** nweinber has joined #openstack-nova13:08
sean-k-mooneyi also feel like i missed a chance to use "no take backsies" but that might date me a little13:11
*** ccamacho has quit IRC13:12
*** ccamacho has joined #openstack-nova13:12
*** efried_pto is now known as efried13:13
*** ccamacho has quit IRC13:13
*** ccamacho has joined #openstack-nova13:13
openstackgerritEric Fried proposed openstack/os-traits master: Add 'TYPE_PLOOP' image type.  https://review.opendev.org/69643513:14
efriedstephenfin: I fixed that link ^ -- do you want to confirm real quick before I send it?13:15
*** dave-mccowan has joined #openstack-nova13:16
*** dave-mccowan has quit IRC13:21
sean-k-mooneythe ploop image type is only used by parallel/openvz containers right13:24
efriedNo idea, but it's on the list, so we should have a trait for it.13:25
efried(and by "no idea", I mean your sentence has at least three words I don't know)13:25
sean-k-mooneyya it is https://wiki.openvz.org/Ploop/format13:25
sean-k-mooneyand yes we should have a trait for it13:26
efriedsounds like a euphemism for a particular type of turd to me <shrug>13:26
openstackgerritEric Fried proposed openstack/nova master: Process requested_resources in ResourceRequest init  https://review.opendev.org/69635413:28
openstackgerritEric Fried proposed openstack/nova master: Reusable RequestGroup.add_{resource|trait}  https://review.opendev.org/69638013:28
openstackgerritEric Fried proposed openstack/nova master: WIP: Use string suffixes and provider mappings  https://review.opendev.org/69641813:28
efriedstephenfin, gibi: fixed the unit test on the bottom one ^13:28
*** ociuhandu has joined #openstack-nova13:30
openstackgerritMerged openstack/os-traits master: Add 'TYPE_PLOOP' image type.  https://review.opendev.org/69643513:34
*** ygk_12345 has quit IRC13:34
*** mkrai has quit IRC13:41
*** mkrai has joined #openstack-nova13:41
*** ociuhandu has quit IRC13:55
*** ociuhandu has joined #openstack-nova13:55
*** jroll has quit IRC13:57
*** jroll has joined #openstack-nova13:59
*** tkajinam has joined #openstack-nova14:06
*** ociuhandu has quit IRC14:10
*** mkrai has quit IRC14:10
*** haleyb|away is now known as haleyb14:17
*** bhagyashris has quit IRC14:20
*** ociuhandu has joined #openstack-nova14:22
*** lbragstad_ is now known as lbragstad14:22
*** eharney has joined #openstack-nova14:23
gibiefried: I'm +2 on the bottom patch14:23
efriedthanks gibi14:23
gibithanks for starting this work14:23
gibiyou are solving some of my todos :)14:23
efrieddoes the strategy seem sane to you wrt the requester_id <=> group ID?14:23
gibiefried: yeah, that step is what I had in my mind14:24
efriedgreat14:24
gibiefried: I'm a bit afraid of the replacement of fill_provider_mapping with the mapping from the a_c response. Theoretically it should work, I just hope there will not be too much complication14:25
efriedwe'll see :)14:25
efriedgibi: I think the most complicated bit might be revert_resize14:26
gibiyes, that is the type of surgery that better do that plan14:26
efriedBy that time the original mappings might be long gone14:26
efried...unless we persist them in the RequestSpec...?14:26
gibiefried: in case of revert_resize we can either start using multiple port binding for resize which is hard but nice, or persist the mapping somewhere14:26
gibiif we go to the latter then I would persist it to the Migration object14:27
gibias the migration object has the proper lifecycle for this type of data14:27
gibithe RequestSpec lives longer14:27
*** munimeha1 has joined #openstack-nova14:35
*** tbachman has joined #openstack-nova14:36
*** awalende_ has joined #openstack-nova14:40
sean-k-mooneygibi: i would like to move to using multiple port bindings for resize for what its worth14:41
sean-k-mooneygibi: we will be using it for cross cell resize14:41
sean-k-mooneyso i think it makes sense to just use it always for resize/cold migrate14:41
*** awalende has quit IRC14:44
*** awalende has joined #openstack-nova14:44
*** awalende_ has quit IRC14:44
*** awalende has quit IRC14:45
sean-k-mooneyefried: gibi when ye have time could ye take a look at https://review.opendev.org/#/c/674072/ and maybe https://review.opendev.org/#/c/695118/14:46
sean-k-mooneyill be going on vacation in 2 weeks but i have 4 patches i really want to get merged if i can before i do those are two of them and im respinning the other two now14:47
*** mriedem has joined #openstack-nova14:48
efriedstephenfin, lyarwood: thoughts on bug tagging here? https://review.opendev.org/#/c/682594/14:49
efriedsean-k-mooney: that might be a bit out of my bailiwick, but I'll give it a try.14:51
stephenfinlooking14:53
lyarwoodefried: yeah that's fair, I can also address the nits before we merge this afternoon14:53
efriedokay14:53
lyarwoodefried: I think I was trying to avoid any suggestion that this was the only fix required here14:53
stephenfinYour call. I could go either way. I'm guessing it was left that way since mriedem has a better albeit not backportable fix14:53
stephenfinyup14:54
efriedright, I'm looking at it from the perspective of stable, where if we don't Closes-Bug on this patch, there will never be a "fix".14:54
efriedI guess if you really want, we could keep this one Related-Bug and switch it to Closes-Bug on the backports14:54
efriedcan't believe there's not a precedent for something like this.14:55
*** mkrai has joined #openstack-nova14:55
lyarwoodYeah not that I recall but I'd be okay with that from a stable point of view14:56
lyarwoodrelated on master, closes on stable that is14:56
sean-k-mooneywell does it actually fix the bug14:56
sean-k-mooneyor just make it less bad14:56
sean-k-mooneyye are talking about https://bugs.launchpad.net/nova/+bug/1844296 right14:56
openstackLaunchpad bug 1844296 in OpenStack Compute (nova) "Stale BDM records remain in the DB after n-api to n-cpu RPC timeouts during reserve_block_device_name" [Undecided,In progress] - Assigned to Lee Yarwood (lyarwood)14:56
sean-k-mooneyand this patch https://review.opendev.org/#/c/682594/14:57
lyarwoodsean-k-mooney: yeah, it cleans up after one type of failure, it doesn't stop the failure from happening.14:57
sean-k-mooneyah i see14:57
sean-k-mooneyso its treating the symptoms not the cause14:57
lyarwoodsean-k-mooney: yeah and very specific symptoms at that.14:58
*** tkajinam has quit IRC14:58
sean-k-mooneyyou could make it partial-bug if you plan to try and fix it some other way on master in a follow up patch14:58
lyarwoodah yeah I forgot about partial14:59
lyarwoodthat might do14:59
sean-k-mooneyrelated bug makes sense too but i can see why you might want closes bug for stable if you dont think any complete fix would be backportable14:59
sean-k-mooneyPartial-Bug i think is the most correct but that wont mark it as "fixed" on stable15:00
mriedemsee my comments from PS415:00
mriedem1. Can we sort out the various bugs I've mentioned to see which ones this fixes and then duplicate whatever is a duplicate and decide on one (probably the oldest) bug for the same issue.15:00
mriedem2. I don't trust unit tests for this type of stuff since it involves (a) more than one service and (b) the database. Can we write a functional test to recreate the bug and then assert the fix resolves it?15:00
mriedemi believe i sorted out the bazillion related bugs here https://review.opendev.org/#/c/692940/15:01
lyarwoodgah, I'm not sure how I missed that, sorry mriedem15:02
mriedemprobably distracted by all of the comments inline about races and locks and such that might mean this doesn't fix a bug15:03
openstackgerritDan Smith proposed openstack/nova master: Add a way to exit early from a wait_for_instance_event()  https://review.opendev.org/69598515:04
mriedemif you want i could write the functional test if you don't want to bother with that15:10
mriedemlyarwood: since the bdm.create is the last thing that happens on the compute, in what case would this cleanup catch and actually clean anything up?15:11
mriedemlike just the split second that we actually created the record and then got a messaging timeout?15:11
lyarwoodmriedem: happy to write the func test15:13
lyarwoodmriedem: and yeah but that's assuming that is a split second and n-cpu isn't stuck somehow15:13
lyarwoodmriedem: as mdbooth suggested15:13
lyarwoodmriedem: FWIW we have seen this downstream at least, so either way it's possible to hit and reasonable to cleanup15:14
mriedemsure, so if n-cpu is stuck on let's say getting the instance uuid lock, or doing something in the driver like _get_device_name_for_instance, we wouldn't create the bdm in that case anyway15:14
mriedemwhere/how was it triggered downstream?15:14
*** ociuhandu has quit IRC15:14
mriedemslow compute and the api timed out?15:14
mriedembecause if so, the change to make reserve_block_device_name use the long_rpc_timeout will fix that15:15
*** ociuhandu has joined #openstack-nova15:15
lyarwoodmriedem: https://bugzilla.redhat.com/show_bug.cgi?id=1752734 - I didn't get a root cause out of this as the env wasn't logging in DEBUG unfortunately15:15
openstackbugzilla.redhat.com bug 1752734 in openstack-nova "Invalid bdm record remains when reserve_block_device_name rpc call times out" [High,On_dev] - Assigned to lyarwood15:15
lyarwoodmriedem: but we did end up with stale BDM records in the DB after seeing timeouts15:16
sean-k-mooneylyarwood: ok so there is a customer case attached to that so its not a ci issue15:16
mriedem3. nova-api again requests to nova-compute to do attaching operation  4. nova-compute updates bdm with retrieved connection info  5. if some error like timeout detected between 3-4, remove bdm record created at 215:17
mriedemso that's not the same thing15:17
mriedemthat's (1) reserve_block_device_name worked and created a bdm and then (2) when the api cast to attach_volume something timed out and the bdm was orphaned15:18
*** links has quit IRC15:18
lyarwoodmriedem: yeah I think that description is wrong, there's reserve_block_device_name failures documented in the next private comment of the bug15:19
lyarwoodsean-k-mooney: ^ can confirm that, however I don't think we can share the actual logs here.15:20
mriedemeven so, the change to make reserve_block_device_name use the long_rpc_timeout will fix that15:20
mriedembut i don't think that long rpc timeout goes back to queens15:20
sean-k-mooneyam i was reading it and im not sure the comment needs to be private. we cant share the raw logs but i do not see anything in comment 1 that is really an issue15:21
*** ociuhandu has quit IRC15:21
mriedemif it's just a messagingtimeout i can understand that15:21
sean-k-mooney MessagingTimeout: Timed out waiting for a reply to message ID fa3d55d2d8c248ba82a13c940350ea9c15:21
lyarwoodsuper secret volume uuids15:21
lyarwood;)15:21
sean-k-mooneyso the long rpc might fix it ya15:22
mriedemit's a 60 second timeout by default with a blocking api - if the compute is slow or we're waiting on the instance uuid lock (maybe because a periodic is doing something with the instance at the time of attach), we could reasonably timeout15:22
mriedemhttps://review.opendev.org/#/c/566696/ only goes back to rocky though, so you can't get the long_rpc_timeout fix to queens15:22
mriedemso no use in saying that's the backportable fix15:22
*** dpawlik has quit IRC15:22
mriedemanyway, it just seems like a really small window where the fix in the api would handle it since the bdm create is the last thing to happen - if we timeout before that we still orphan the bdm record15:23
mriedemand i know i'm contradicting my "what's the harm?" replies to mdbooth in the comments :)15:23
mriedemthe better longer term fix is the wip i posted, but that's not backportable15:24
lyarwoodyeah agreed on that part, we could backport long_rpc_timeout downstream *cough*15:24
lyarwoodbut yeah15:24
mriedemalso your bug is definitely a duplicate of https://bugs.launchpad.net/nova/+bug/142535215:25
openstackLaunchpad bug 1425352 in OpenStack Compute (nova) "A volume remains attached and cannot be detached after attaching it fails" [Low,Confirmed]15:25
* sean-k-mooney may have been looking at the patch to see if we could15:25
sean-k-mooneylyarwood: the issue would be the oslo.messaging min version bump15:25
mriedemsean-k-mooney: correct we're not backporting that to queens-em upstream15:26
lyarwoodokay that's likely not an issue downstram15:27
lyarwooddownstream*15:27
sean-k-mooneyi dont know15:27
sean-k-mooneyi would not necessarily be comfortable with that15:27
sean-k-mooneybut we could explore it as an option15:27
sean-k-mooneyit would depend on what version of oslo.messaging we are actually using downstream15:28
lyarwood>= 5.29.015:28
lyarwoodsean-k-mooney: when did this come in?15:28
sean-k-mooney6.3.015:28
lyarwoodah15:29
lyarwoodthat would be an issue15:29
sean-k-mooneyhttps://github.com/openstack/requirements/blob/stable/queens/upper-constraints.txt#L14515:29
sean-k-mooneyso queens was capped at 5.35.615:29
lyarwoodyeah guess what we use downstream15:29
sean-k-mooneyso 6.3.0 probably will break things15:29
lyarwoodokay that's dream is dead15:29
lyarwoodthat*15:29
sean-k-mooneyis it higher than the upper-constraints for queens? cause i hate when we do that15:30
lyarwoodno it's not15:30
sean-k-mooneycool15:30
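The version reasoning in the exchange above can be sketched as a simple comparison. This is only an illustration: the three version numbers are the ones quoted in the discussion (5.29.0 as the downstream minimum, 5.35.6 as the queens upper-constraints cap, 6.3.0 as the release sean-k-mooney names for the feature), while `parse_version` and `fits_under_cap` are hypothetical helpers written for this example.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.35.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Values quoted in the discussion above.
DOWNSTREAM_MINIMUM = parse_version("5.29.0")
QUEENS_UPPER_CONSTRAINT = parse_version("5.35.6")
NEEDED_FOR_LONG_RPC_TIMEOUT = parse_version("6.3.0")


def fits_under_cap(required, cap):
    """Return True if the required version does not exceed the cap.

    Tuples compare element by element, so (6, 3, 0) > (5, 35, 6).
    """
    return required <= cap


if __name__ == "__main__":
    print(fits_under_cap(DOWNSTREAM_MINIMUM, QUEENS_UPPER_CONSTRAINT))
    print(fits_under_cap(NEEDED_FOR_LONG_RPC_TIMEOUT, QUEENS_UPPER_CONSTRAINT))
```

Comparing integer tuples rather than raw strings matters here: as strings, "5.35.6" would sort after "5.4.0", which is the wrong order for version numbers.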
mriedemyou could create your own workaround option as a timeout for this specific call that overrides the default rpc_response_timeout just for your queens-only change and drop it for long_rpc_timeout when you start using that for this call from upstream15:30
*** ociuhandu has joined #openstack-nova15:32
sean-k-mooneyya is there an advantage to not using the same config option15:32
sean-k-mooneyis it just to make it clear it applies to just this one call15:33
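A minimal sketch of the per-call override mriedem describes above, assuming configuration is available as a plain dict. The option name `reserve_bdm_rpc_timeout` is hypothetical and invented for this example; only `rpc_response_timeout` and its 60 second default come from the discussion. It is not nova's actual implementation.

```python
# Default mentioned in the discussion: a 60 second blocking RPC timeout.
DEFAULT_RPC_RESPONSE_TIMEOUT = 60


def reserve_call_timeout(conf):
    """Pick the timeout for the single reserve_block_device_name call.

    `conf` is a plain dict standing in for real configuration.
    'reserve_bdm_rpc_timeout' is a hypothetical, call-specific workaround
    option; when it is unset the global rpc_response_timeout applies.
    """
    default = conf.get("rpc_response_timeout", DEFAULT_RPC_RESPONSE_TIMEOUT)
    override = conf.get("reserve_bdm_rpc_timeout")
    if override is None:
        return default
    return override


if __name__ == "__main__":
    print(reserve_call_timeout({}))
    print(reserve_call_timeout({"reserve_bdm_rpc_timeout": 300}))
```

Keeping the override as a separate option means every other RPC call still uses the global timeout, which seems to be the point of scoping the workaround to one call.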
mriedemlyarwood: i left some comments in the patch to try and summarize irc discussion and duplicated the bug to the existing one for the same issue. i'd hold off on writing a functional test until more people weigh in on what the correct direction is here given backports and such.15:37
*** artom has joined #openstack-nova15:37
gibisean-k-mooney: could you quickly respin https://review.opendev.org/#/c/695118/ to fix stephenfin nits in the conf doc and the release notes? The change looks good to me too but I guess you want to backport it so a follow up won't help15:38
*** Luzi has quit IRC15:38
*** ociuhandu has quit IRC15:38
sean-k-mooneyyep i can do that in 5 mins thanks for taking a look15:38
*** ociuhandu has joined #openstack-nova15:39
gibisean-k-mooney: thanks. Ping me and I will +215:39
mriedemwould be nice if any of the blizzard people would ack that change since they reported it15:39
*** ociuhandu has quit IRC15:44
mriedemstephenfin: tbc i'm waiting on vmware ci results on https://review.opendev.org/#/c/696503/ before moving it forward15:45
mriedemif that ci is busted (let's say we don't get results within the next 24 hours or something) then i'm cool with moving forward since if they can't maintain a working ci then we can't maintain their driver for them15:45
stephenfincoolness15:45
*** ociuhandu has joined #openstack-nova15:48
*** dpawlik has joined #openstack-nova15:51
sean-k-mooney... the release notes tox env takes forever to run15:55
*** dpawlik has quit IRC15:56
*** bhagyashris has joined #openstack-nova15:57
*** macz has joined #openstack-nova15:58
*** eharney has quit IRC15:59
openstackgerritsean mooney proposed openstack/nova master: add [libvirt]/max_queues config option  https://review.opendev.org/69511815:59
sean-k-mooneygibi: stephenfin ^ nits addressed no other changes15:59
gibisean-k-mooney: looking...15:59
*** ociuhandu has quit IRC16:01
gibieandersson: the bug is coming from blizzard ^^ could you check the bugfix from your perspective?16:01
sean-k-mooneyi will be filing and fixing a related bug for vhost-user separately but that should cover the case in the current bug if not let me know.16:02
sean-k-mooneyactually thinking about that i wonder if i should also add extra code to handle updating this on live migration?16:08
sean-k-mooneyi dont think we have code for the queue lengths on live migration so i kind of feel a separate patch to handle both queue length and max queues would make sense since that will need object changes and wont be backportable whereas this simple fix is16:11
sean-k-mooneygibi: stephenfin do you think that would be good to do as a followup? looking at the migrate data object we dont pass any of that info today16:13
gibisean-k-mooney: do you say that during live migration the server will use the source host config regarding the number of queues?16:14
sean-k-mooneyyes and rx/tx queue lengths16:15
*** ociuhandu has joined #openstack-nova16:15
sean-k-mooneywe have no support for updating this at all16:15
sean-k-mooneygibi: cold migrate will be fine and a hard reboot would fix it after a live migrate16:16
sean-k-mooneyit would only be an issue if you went from a new host with a higher limit to an old host with a smaller one16:16
sean-k-mooneythe bug in this case was we limit the queues to 8 even though the kernel supports 256 so in general it wont be an issue16:17
sean-k-mooneybut for correctness we should be using the destination value16:17
gibisean-k-mooney: yeah, I think that the fix to use the dest value make sense16:18
sean-k-mooneyit should be a separate patch however right16:18
sean-k-mooneyso we can backport this one16:18
sean-k-mooneythe fix for dest value would need an object change16:18
*** ociuhandu has quit IRC16:19
sean-k-mooneyassuming libvirt allows it at all. it might just say no and reject the live migration16:20
*** mkrai has quit IRC16:20
sean-k-mooneygibi: ill file a bug and we can figure out what to do with it later16:21
gibisean-k-mooney: ack16:21
*** mlavalle has joined #openstack-nova16:24
*** rouk has joined #openstack-nova16:26
*** ivve has quit IRC16:26
efriedstephenfin: can I get a quick re+A on https://review.opendev.org/#/c/696354/ ? (Updated UT)16:27
*** gyee has joined #openstack-nova16:29
openstackgerritMatt Riedemann proposed openstack/nova master: Enable cross-cell resize in the nova-multi-cell job  https://review.opendev.org/65665616:32
openstackgerritMatt Riedemann proposed openstack/nova master: Flesh out docs for cross-cell resize/cold migrate  https://review.opendev.org/69621216:32
openstackgerritMatt Riedemann proposed openstack/nova master: Simplify FinishResizeAtDestTask event handling  https://review.opendev.org/69533716:32
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: Add negative test to delete server during cross-cell resize claim  https://review.opendev.org/68883216:32
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: Implement reschedule logic for cross-cell resize/migrate  https://review.opendev.org/69621316:32
openstackgerritStephen Finucane proposed openstack/nova master: docs: Clarify configuration steps for PF devices  https://review.opendev.org/69452216:32
stephenfinSure, done16:33
stephenfinefried: Some real nice docs there ^ :)16:33
efriedthanks, and noted.16:35
dansmithefried: did you see my comment about the event getting 422?16:36
*** lbragsta_ has joined #openstack-nova16:36
efriednot yet16:36
*** gyee has quit IRC16:36
dansmithokay I don't want sundar to be held up on a decision there or anything16:36
efriedreading now16:37
*** rpittau is now known as rpittau|afk16:38
efrieddansmith: oh, that's a good point. If cyborg gets this particular 422 it should mean the bind finished before we even got to the host, so the poll inside wait_for_instance_event will trigger and prevent us from waiting (for an event that won't come)16:38
dansmithefried: right, if it gets a 404 then it should fail the bind, but if it gets 422, it shouldn't freak out16:39
efried++16:39
dansmithin fact, 422 confirms, in a weird way, that the instance is still there, just not on a host yet16:39
dansmithso if you could chuck that ++ into the review so sundar isn't waiting that would be good16:40
*** udesale has quit IRC16:41
efrieddansmith: done16:43
dansmithefried: spanks16:43
efriedcomfortable and slimming16:44
dansmithefried: have you chatted with him to know that he's responning?16:44
dansmith*respinning16:44
efriedhaltingly16:44
dansmithhaltingly?16:44
efriedthink voicemail tag16:44
efriedbut with slack16:44
dansmithI, uh, will trust you16:46
*** bhagyashris has quit IRC16:46
stephenfindansmith: Can I remove remotable methods on objects if there are no callers left in-tree16:49
*** bhagyashris has joined #openstack-nova16:50
stephenfinOr are they versioned since they're RPC'y16:50
dansmithstephenfin: not according to the letter of the law16:50
dansmithand yes, they're RPC by virtue of being remotable16:50
dansmithstephenfin: I would just replace their bodies with raise NotImplementedError()16:50
stephenfinYeah, this is for Network.(create|destroy|associate|disassociate)16:51
*** gyee has joined #openstack-nova16:51
stephenfinI'll do just that16:51
dansmithaye16:51
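A minimal sketch of what dansmith suggests above: keep the remotable methods so the object's RPC-visible interface stays the same, and replace their bodies with `raise NotImplementedError()`. The `remotable` decorator here is a stand-in written for this example, not the real one, and `Network` is a hypothetical stub; only the four method names come from the discussion.

```python
import functools


def remotable(fn):
    """Stand-in for the real remotable decorator (assumption for this sketch).

    It only marks the method; the real decorator routes calls over RPC.
    """
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        return fn(self, *args, **kwargs)
    wrapper.remotable = True
    return wrapper


class Network:
    """Hypothetical stub; signatures are simplified for illustration."""

    @remotable
    def create(self, *args, **kwargs):
        # Method kept so the versioned interface is unchanged,
        # but there are no in-tree callers left.
        raise NotImplementedError()

    @remotable
    def destroy(self, *args, **kwargs):
        raise NotImplementedError()

    @remotable
    def associate(self, *args, **kwargs):
        raise NotImplementedError()

    @remotable
    def disassociate(self, *args, **kwargs):
        raise NotImplementedError()
```

The methods still exist and are still marked remotable, so nothing about the object's declared interface changes; any caller that does reach them gets an explicit error instead of silently running dead code.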
*** trident has quit IRC16:51
*** trident has joined #openstack-nova16:53
*** df_sbr has joined #openstack-nova16:54
*** damien_r has quit IRC16:55
*** Sundar has joined #openstack-nova16:55
*** awalende has joined #openstack-nova16:56
*** eharney has joined #openstack-nova16:57
efriedstephenfin: +2 on that docs patch, but a pile of nits if you're so inclined.16:59
*** awalende has quit IRC16:59
efriedgibi or stephenfin: are you on board with https://review.opendev.org/#/c/695985/ (ability to cancel instance events)?17:00
melwittstephenfin: did you see my note here that other quotas related to nova-network will be able to be removed? https://review.opendev.org/#/c/686812/7/nova/api/openstack/compute/quota_classes.py@3817:01
stephenfinThat's way too much context to load up at 5pm /o\ I can hit it in the morning though17:01
efriedack, thx17:01
stephenfinmelwitt: Sure did. It's on the list of stuff still to clean up17:02
melwittstephenfin: k, coolness17:02
SundarResponded to the discussion on exit_wait_early for instance events in https://review.opendev.org/#/c/63124417:10
Sundarefried: stephenfin: ^ relates to https://review.opendev.org/#/c/695985/17:12
openstackgerritLee Yarwood proposed openstack/nova stable/stein: libvirt: Add a rbd_connect_timeout configurable  https://review.opendev.org/66916717:12
openstackgerritLee Yarwood proposed openstack/nova stable/rocky: libvirt: Add a rbd_connect_timeout configurable  https://review.opendev.org/66916817:12
openstackgerritLee Yarwood proposed openstack/nova stable/queens: libvirt: Add a rbd_connect_timeout configurable  https://review.opendev.org/66916917:12
lyarwoodmelwitt: ^ reopened and rebased17:12
melwittlyarwood: ack thanks17:16
*** KeithMnemonic has joined #openstack-nova17:18
mriedemlyarwood: did anyone that reported that issue actually say the fix resolved it for them?17:19
lyarwoodmriedem: we've just sent a test build out to a stable/queens user17:19
lyarwoodmriedem: it's going to fix their issues but will at least confirm that their network is borked17:20
*** mdbooth has joined #openstack-nova17:28
efriedSundar: back atcha. To be clear, I'm not seeing a problem; are you?17:31
*** macz has quit IRC17:32
Sundarefried: The notification from Cyborg will fail in many/most cases; ignoring that error could be considered a hack, rather than a solution. After all, folks looking at the n-api logs will see an error. Are we ok with that?17:33
sean-k-mooneygibi: this is the bug for the issue with live migrating https://bugs.launchpad.net/nova/+bug/1854844. ill try to think about ways to address this in a backportable way. i think i know of one way but this might just be something we have to fix on master only17:33
openstackLaunchpad bug 1854844 in OpenStack Compute (nova) "libvirt: tx/rx queue lenght and max queues are not updtated on live migration" [Low,Triaged] - Assigned to sean mooney (sean-k-mooney)17:33
efriedSundar: Only if they're looking at debug logs :)17:34
efriedThe cyborg logs should of course explain what's happening as well.17:35
efriedAnd yes, I'm fine with this.17:35
Sundarefried: From Cyborg's POV, the notification after binding would be best-effort: ignore any failures and complete the binding. The Cyborg logs will indicate as much.17:37
sean-k-mooneygibi: we might be able to store the queue parameter in the port binding profile which is an unversioned dict of strings field and was intended to store hypervisor specific info for a port.17:38
efriedSundar: Not *any* failures.17:39
SundarIf we're all ok with that, I'm good. But there are a few nuances to consider. First, the function to get the resolved ARQs is now defined in https://review.opendev.org/631245, while it needs to be used in the previous patch https://review.opendev.org/631244.17:39
sean-k-mooneygibi: the use of multiple port bindings for live migration and the rx/tx queue size options were both introduced in rocky so it should be backportable to all affected versions if i do it right17:39
efriedSundar: As dansmith suggested (oh, maybe it was in IRC, not in the patch) you should still fail and clean up the binding if you get e.g. a 404.17:40
efriedmeaning the instance was deleted by the time you went to emit the notification.17:40
sean-k-mooneygibi: well assuming the call order works out ill follow up with this later17:40
efriedSo, to be clear, the 422 is a special case, with a prominent NOTE and a nice debug log stating that the binding completed before the instance landed on the host, so it's acceptable that the notification is dropped.17:41
efriedSundar: refactoring method X from patch Z into patch Y because that's where it's now going to be used, yeah, that's SOP.17:42
Sundarefried: There could be a bunch of different failure modes: cannot locate n-api service, located but it timed out?, etc. We need to distinguish that category from 'called n-api and it returned an error'. In the latter case, is there granularity of error codes for why it failed?17:42
efriedSundar: Yes, the 422 is a special case; any other error is an error.17:42
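A minimal sketch of the error handling agreed on above, assuming the notifier only sees an HTTP status code from n-api. The names `BindingNotificationError` and `check_event_response` are hypothetical and are not taken from Cyborg or Nova; only the treatment of 422 versus every other error comes from the discussion.

```python
import logging

LOG = logging.getLogger(__name__)


class BindingNotificationError(Exception):
    """Hypothetical error: the binding must be failed and cleaned up."""


def check_event_response(status_code):
    """Decide whether a binding may complete after notifying n-api.

    Returns True when the binding may complete; raises otherwise.
    """
    if 200 <= status_code < 300:
        return True
    if status_code == 422:
        # NOTE: per the discussion, 422 is the one special case. The
        # binding completed before the instance landed on a host, so it
        # is acceptable that the notification is dropped.
        LOG.debug("Binding completed before the instance landed on a "
                  "host; dropping the notification.")
        return True
    # Any other error (for example a 404 because the instance was
    # deleted in the meantime) is treated as a real error.
    raise BindingNotificationError(
        "n-api returned %d for the binding notification" % status_code)
```

A caller would presumably let `BindingNotificationError` propagate so the binding is failed and cleaned up, and treat an unreachable or timed-out n-api the same way.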
*** jangutter has quit IRC17:43
SundarNeed to ceck if 422 is used for any other error case.17:43
Sundar*check17:43
*** lbragsta_ has quit IRC17:44
SundarRe. the patch series, I think we need to squash  https://review.opendev.org/631245 into the previous patch https://review.opendev.org/631244. Because once the get_resolved_arqs() moves to previous patch, the only things left behind are the virt driver changes. It would arguably be easier to understand those changes together with the changes to get17:47
Sundarthe arqs in compute/manager.py.17:47
mriedemnice https://docs.openstack.org/nova/latest/admin/#overview - "nova-api Receives XML requests"17:50
*** macz has joined #openstack-nova17:52
*** dpawlik has joined #openstack-nova17:52
*** bhagyashris has quit IRC17:55
melwittstephenfin: I noticed a new recent comment on this bug https://bugs.launchpad.net/nova/+bug/1289064 saying, maybe it can be closed bc numa aware lm was merged? seems so to me but wanted to check with you17:56
openstackLaunchpad bug 1289064 in OpenStack Compute (nova) "live migration of instance should claim resources on target compute node" [Medium,In progress] - Assigned to Artom Lifshitz (notartom)17:56
SundarOnly one use of 422 error code in create: https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/server_external_events.py . So, we are probably good.17:56
efriedyup (I checked that earlier)17:56
artommelwitt, stephenfin, oh, well - not entirely - only NUMA LM does claims17:57
*** dpawlik has quit IRC17:57
artomThen again, only NUMA LM has non-placement resources, so maybe yes?17:57
artomSRIOV is handled as well, albeit separately17:57
melwitter, or artom, sorry ^ (I pinged stephen bc his patches about rejecting lm requests that have numa topology were merged as Related-Bug)17:58
melwitthm, ok. I don't really understand what that means so I dunno either17:59
sean-k-mooneymelwitt: ya so stephen's patch just rejects live migration at the api if the guest has a numa topology17:59
sean-k-mooneysriov live migration directly allocates and claims the devices during live migration18:00
*** derekh has quit IRC18:00
sean-k-mooneynuma live migration uses a move claim to claim resources18:00
sean-k-mooney(cpus and mempages/hugepages)18:00
artomsean-k-mooney, so, could we say that all non-placement resources are not accounted correctly during live migration?18:02
sean-k-mooneyno18:02
sean-k-mooneyi dont think this is an issue on master18:02
artomErr, *are accounted18:02
eanderssonLooks good gibi18:02
eanderssonbtw mriedem did you see the minor bug I mentioned over the weekend18:03
sean-k-mooneyartom: right non-placement resources (cpu,mempages,sriov devices) should be accounted for in train+ and we blocked numa/sriov migration before that retroactively18:03
sean-k-mooneyartom: melwitt so we might just be able to close https://bugs.launchpad.net/nova/+bug/128906418:04
openstackLaunchpad bug 1289064 in OpenStack Compute (nova) "live migration of instance should claim resources on target compute node" [Medium,In progress] - Assigned to Artom Lifshitz (notartom)18:04
*** factor has quit IRC18:04
eanderssonhttps://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2075 <-- eventlet.Timeout (based on BaseException, not Exception) may be raised here causing an UnboundLocalError.18:04
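A small illustration of the failure mode described above. This is not the code from manager.py: `FakeTimeout` is a stand-in for eventlet.Timeout (so the snippet runs without eventlet), and `run_build` / `run_build_fixed` are hypothetical names. The point is only that `except Exception` does not catch a BaseException subclass, so a variable assigned inside the `try` can still be unbound when the `finally` block reads it.

```python
class FakeTimeout(BaseException):
    """Stand-in for eventlet.Timeout, which derives from BaseException."""


def run_build(do_build):
    """Buggy pattern: 'result' is bound only on success or on Exception."""
    try:
        result = do_build()
    except Exception:
        result = "failed"
        raise
    finally:
        # If do_build raised a BaseException that is not an Exception,
        # 'result' was never assigned, so this comparison raises
        # UnboundLocalError and masks the original timeout.
        if result == "failed":
            pass  # cleanup would go here
    return result


def run_build_fixed(do_build):
    """Same flow, but 'result' always has a value before the try block."""
    result = None
    try:
        result = do_build()
    except Exception:
        result = "failed"
        raise
    finally:
        if result == "failed":
            pass  # cleanup would go here
    return result


def timing_out_build():
    """Simulate a build that is interrupted by a timeout."""
    raise FakeTimeout()
```

With the fixed variant the original timeout propagates unchanged, which leaves the caller free to decide how to handle it.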
mriedemeandersson: nope18:05
melwittsean-k-mooney: if there's not anything more to do there, I'd think it would be nice to close it18:06
sean-k-mooneywell there are still cases where live migration is not supported but on master i think we account for all resources correctly in the cases where live migration is supported18:07
mriedemeandersson: open a bug18:07
artomsean-k-mooney, melwitt, so.... fix release 'Train'?18:07
dansmithefried: Sundar: I was on a call during your discussion, but...we're all good now?18:08
sean-k-mooneyartom: ya i think so18:08
* artom does the needful18:08
melwittyeah, I can't remember how/if there's a way to mark it as fixed in the past. but I don't think that matters too much18:09
artomDone.18:09
sean-k-mooneyyou can add the specific release series to the bug18:09
sean-k-mooneyi left a comment in the bug too18:09
* artom couldn't, maybe I need more lp powers18:09
eanderssonmriedem https://bugs.launchpad.net/nova/+bug/185484818:10
openstackLaunchpad bug 1854848 in OpenStack Compute (nova) "build_and_run_instance can error out with an UnboundException" [Undecided,New]18:10
Sundarefried: I think so, if you and all are ok with whatever I said above -- squishing the patches, etc. ignoring 422 alone seems fine. if n-api is unreachable or times out, that will still be an error that fails the binding.18:10
sean-k-mooneyartom: added train18:10
eanderssonIt's a minor issue, but caused VMs to get stuck in BUILD... fyi was caused by selinux not being configured properly.18:10
sean-k-mooneyartom: melwitt there is  a "Target to series" button you can use18:11
sean-k-mooneyyou need to be in the nova bug team on LP to see/use it18:11
*** ivve has joined #openstack-nova18:11
melwittyeah, I have that, but I don't think that will stop it from saying "released in <U version number>" whenever it's released next18:12
melwittit'll still count it as landing "today" first and then "backported" to train. I don't think it matters, just that's what the automation is gonna do when it comments later18:12
sean-k-mooneyya probably18:13
sean-k-mooneyi dont think we have automation tied to that outside of the completed specs thing but that uses blueprints not bugs18:14
sean-k-mooneyartom: if you are triaging nova bugs upstream you should join https://launchpad.net/~nova-bugs its an open team so you will be automatically approved18:16
artomsean-k-mooney, can't say I'm triaging diligently18:16
sean-k-mooneywell you know where to find it if you change your mind18:17
melwittsean-k-mooney: I mean the thing that will make a comment on the bug that says "this bug was fixed in version x.x.x.x" that will be technically wrong because it's gonna say a U version. again, I don't think it matters, just that's what it's going to say on the comment18:17
sean-k-mooney00.18:18
sean-k-mooney00.118:18
sean-k-mooneydropped phone18:18
sean-k-mooneyam isnt that done by the release tooling18:18
sean-k-mooneynot launchpad18:18
melwittit is yeah18:19
sean-k-mooneybut ya i guess in this case its not going to be set since artoms change did not have closes bug18:19
sean-k-mooneyanyway it should be fine18:20
melwittI don't think it matters, I think it just does a sweep through things that are Fix Released and makes a comment with the fixed in version18:20
*** ricolin has quit IRC18:22
*** macz has quit IRC18:23
*** spsurya has quit IRC18:25
*** macz has joined #openstack-nova18:27
*** macz has quit IRC18:27
efrieddansmith: Yes, thanks.18:27
*** macz has joined #openstack-nova18:28
dansmithefried: so there'll need to be a change to cyborg for the 422 thing presumably18:30
efriedyes18:30
dansmithefried: Sundar: is there a testing patch somewhere in the series that wires this all up with a fake cyborg driver?18:31
dansmithI assume that would be the one that needs to depends-on that cyborg change18:31
efriedthe CI job is working last I checked18:31
sean-k-mooneythere is a patch to add a tempest job against nova and i think the job is now running on all cyborg patches18:31
dansmiththere are a lot of patches that have various confused histories18:31
efrieddansmith: https://review.opendev.org/#/c/670999/2018:32
efriedpassing atm18:32
dansmithack just found that18:32
sean-k-mooneysame was going to link it18:32
efriedthere was a time when I looked and it wasn't doing anything, but I thought the last time I checked it was actually running through the spawn flow with a real (pseudo) device18:33
sean-k-mooneyit is doing two things18:33
efriedstill very basic though18:33
efriedgreen path only18:33
sean-k-mooneyits creating a device profile and then a super minimal test of booting a vm18:33
sean-k-mooneyalthough that one is not correct18:33
sean-k-mooneyits booting a vm but not waiting for it to go active before deleting it18:34
sean-k-mooneyat least the last time i looked but everything is in place to add proper lifecycle tests18:34
dansmithseems like a lot of vm_state=error in the compute logs for that, I'll have to track it down closer to see if it's related18:34
sean-k-mooneyhttps://opendev.org/openstack/cyborg-tempest-plugin/src/branch/master/cyborg_tempest_plugin/tests/scenario/test_accelerator_basic_ops.py#L45-L6018:35
dansmithdoes that even wait for it to become running?18:35
sean-k-mooneyno which i mentioned before18:36
sean-k-mooneyto the cyborg folks18:36
sean-k-mooneycreate server does https://opendev.org/openstack/cyborg-tempest-plugin/src/branch/master/cyborg_tempest_plugin/tests/scenario/manager.py#L245-L25718:36
dansmithheh okay18:36
dansmithoh okay18:36
dansmithalso, the name of the test makes it hard to track down those instances in the logs18:36
dansmithwould be better if it was called test_server_ops_with_accel or something18:36
sean-k-mooneyya with that said its an easy change, as is extending it to do more real tests18:37
Sundarsean-k-mooney: The tempest test waits till the server becomes active: https://review.opendev.org/#/c/667231/11/cyborg_tempest_plugin/tests/scenario/manager.py@15518:37
dansmithI don't see that the cyborg instance ever made it to the compute node18:39
sean-k-mooneyoh you are using the common tempest function ok18:39
*** jmlowe has quit IRC18:39
Sundardansmith:  The tempest CI test in cyborg-tempest-plugin is called test_server_basic_ops '' https://review.opendev.org/#/c/667231/11/cyborg_tempest_plugin/tests/scenario/test_accelerator_basic_ops.py@4518:40
dansmithSundar: right, which makes it hard to distinguish from the other test_server_basic_ops18:40
dansmithI don't see it actually running though: https://8664d62e69400bd89796-9cb1d5e035819d8d5535734f80756cd4.ssl.cf1.rackcdn.com/670999/20/check/tempest-integrated-compute/cdfc846/testr_results.html.gz18:40
sean-k-mooneySundar: that is an existing class name in tempest18:40
dansmithoh wait, that's the wrong log, ignore me18:41
sean-k-mooneyalthough the module path should be different18:41
SundarIf you just want to mention 'accel' somewhere in the name, that's not a problem. Will do.18:41
dansmithsean-k-mooney: right but the test name is what drives the tempest tenant name I think18:42
sean-k-mooneyhttps://2371b0492dbe3a0c56c0-5469ab4f5c2741453cb8b95135b2c449.ssl.cf2.rackcdn.com/670999/20/check/cyborg-tempest/2124885/testr_results.html.gz18:42
sean-k-mooneythat is the result you were looking for18:42
dansmithbut they're hardcoding the server name, which helps18:42
dansmithyeah I know18:42
dansmithso for this fake test, we don't actually provide any device to libvirt to attach to the guest right:18:42
dansmith?18:43
SundarThat's right18:43
sean-k-mooneycorrect18:43
sean-k-mooneythe arq has type fake and its ignored18:43
SundarThe path till creating/binding.getting ARQs is the same as for a real driver18:43
dansmithwill there be a real job running on intel ci or something that can actually do this?18:43
dansmithSundar: sure, it just doesn't cover actually producing a device and getting libvirt to attach it18:44
Sundardansmith: Yes, we are planning a 3rd party CI for Intel FPGAs18:44
sean-k-mooneywe could maybe use some other virtual device for testing in a more robust way too18:44
dansmithsean-k-mooney: yeah I was going to ask if we could pass some harmless pci device through, but wasn't sure if we can do that in the gate without nested18:45
Sundarsean-k-mooney: We have plans to enhance the fake driver to simulate programming -- it will just respond to the programming API within Cyborg with a success/failure18:45
*** igordc has joined #openstack-nova18:45
sean-k-mooneydansmith: i was thinking of creating a file on disk and passing it to the guest18:45
dansmithSundar: yeah, that's good, I just want to see the actual "here's the pci device" stuff tested18:45
SundarThat would cover more paths, including Glance interaction within Cyborg. But it would still not attach a PCI device to the VM18:45
*** martinkennelly has quit IRC18:45
dansmithsean-k-mooney: how does a file on disk simulate a pci device/18:46
dansmithokay anyway, this log looks good for what we have now and can test now, so that's cool18:46
dansmithI'll want to look at this once Sundar retools the current patches per the pending feedback18:46
sean-k-mooneyit does not but it will simulate generating xml. i was really thinking of a lvm driver for local caching so we could have cyborg provide local storage to a vm.18:47
dansmithSundar: FWIW, I have this week and next week before I turn to a pumpkin for the rest of the year18:47
sean-k-mooneye.g. have a driver that could be useful in production but has no hardware dependency18:47
Sundardansmith: I am going to squash https://review.opendev.org/631245 into https://review.opendev.org/631244, for the reasons I mentioned above18:47
dansmithsean-k-mooney: is that a thing? I thought we were focused on pci devices for accelerators right now18:47
sean-k-mooneywe are but from a libvirt driver point of view there is very little difference between generating the xml for a pci device vs a block device18:48
dansmithSundar: okay I skimmed over that.. IMHO, keeping the new event definition in its own patch makes sense just because of how much api stuff has to change18:48
dansmithsean-k-mooney: seems fairly different to me.. I understand it's just a chunk of xml, but it's not the same kind of xml we really care about18:49
Sundardansmith: Agreed. The event definition patch https://review.opendev.org/692707 will stay as is.18:49
sean-k-mooneyif we just want a way to create a device (lvm volume) program it (dd image to volume) and validate it in the guest it was a way to do it.18:49
dansmithsean-k-mooney: it's something, it's just not enough18:49
dansmithSundar: oh sorry I was looking at the wrong two tabs, lemme re-read18:49
openstackgerritLee Yarwood proposed openstack/nova master: libvirt: Request and store the decrypted path of an attached encrypted volume  https://review.opendev.org/69669318:50
sean-k-mooneyya i was also looking at ways to fake pci devices in linux and there are some ways but its not trivial18:50
Sundarsean-k-mooney: The code paths within the virt driver are quite different for block devices and PCI devices. So, how does this help?18:50
dansmithSundar: 631234 all but goes away after you address my feedback18:51
dansmithSundar: so assuming you mean just bringing forward the cyborg client method, then sure18:51
dansmithSundar: it would be attaching a block device as an accelerator, so no typical nova block device handling for it18:51
sean-k-mooneyits a more complete end to end test that is all. also im not sure how https://review.opendev.org/#/c/689070/ is going to progress and its another way to enable local caching18:51
dansmithsean-k-mooney: I don't think cyborg for block devices is a good way to address the local caching thing18:52
dansmithsean-k-mooney: how is that not going to confuse people?18:52
sean-k-mooneywell given samsung are now supporting sriov on their new ssds it might be but sure ill let you get back to it18:52
dansmithwell, I guess we can have that discussion, but I definitely don't want to merge that with this effort at the moment18:53
dansmithand since PCI is different enough, I still want to see PCI working for this18:53
sean-k-mooneyoh sure. its totally seperate18:53
sean-k-mooneyya18:53
sean-k-mooneyi just have not figured out how to fake it yet18:53
sean-k-mooneythe kernel netdevsim module was the closest i came up with in the past18:54
dansmithyeah, well, as long as we'll have intel ci so we can see it working, that's something18:54
Sundardansmith: 631234 is a typo, right? Quoting myself from above:  I think we need to squash  https://review.opendev.org/631245 into the previous patch https://review.opendev.org/631244. Because once the get_resolved_arqs() moves to previous patch, the only things left behind are the virt driver changes. It would arguably be easier to understand those18:54
Sundarchanges together with the changes to get the arqs in compute/manager.py.18:54
dansmithSundar: I meant 631245 yes, sorry18:54
sean-k-mooneyya. the fake driver should allow us to test the api flow. the intel ci should validate it works with real hardware18:55
SundarCool, np18:55
dansmithSundar: the virt changes will be insulated from the cyborg client method is my point18:55
*** jmlowe has joined #openstack-nova18:55
dansmithSundar: if I were you I would have one patch that adds the support into the manager, all the build_resources changes to capture the ARQs and pass them to the virt driver,18:56
dansmithSundar: then a separate patch to implement things for libvirt18:56
Sundardansmith: Sure, will try to get back ASAP with the changes18:56
SundarOh, separate patch for libvirt?18:56
dansmithyes definitely18:56
SundarThat would be 63124 today18:56
Sundar*63124518:56
dansmithSundar: so when hyperv people come along, we point them at the libvirt patch and say "implement this part for your driver and you will be good"... separate from the plumbing in compute to wire everything up18:57
SundarI see, fine with that too18:58
dansmithSundar: in the compute manager patch, you just assume that any virt driver below you will ignore the ARQs you pass to it, which will be true of libvirt, until the next patch in the series where that becomes untrue for libvirt, but remains true for all the others18:58
openstackgerritMerged openstack/nova stable/pike: Add regression recreate test for bug 1830747  https://review.opendev.org/66312418:59
openstackbug 1830747 in OpenStack Compute (nova) pike "Error 500 trying to migrate an instance after wrong request_spec" [High,In progress] https://launchpad.net/bugs/1830747 - Assigned to Matt Riedemann (mriedem)18:59
openstackgerritMerged openstack/nova stable/pike: Workaround missing RequestSpec.instance_group.uuid  https://review.opendev.org/66312518:59
Sundardansmith: In the compute manager patch, the signature for libvirt driver's spawn() will have to change to include the arqs as a parameter.18:59
dansmithSundar: does this method go away if you pass the stuff from compute manager to virt? https://review.opendev.org/#/c/631245/39/nova/accelerator/cyborg.py18:59
dansmithSundar: that would leave the virt patch you have as just the libvirt config bits18:59
dansmithSundar: correct, spawn() will  need to take the ARQs, just like it takes network and block device info19:00
*** lpetrut has quit IRC19:00
*** macz has quit IRC19:01
dansmithSundar: https://github.com/openstack/nova/blob/56d3cd7aa7f4d3be01fd2a5c10903fb548c49458/nova/compute/manager.py#L2491-L249919:01
*** dpawlik has joined #openstack-nova19:01
dansmithSundar: just like block and network, you should pull out arqs from resources and pass to spawn19:02
dansmitharguably we could just be passing in resources to spawn to avoid splitting it out, but let's not go down that hole in this set19:02
dansmithSundar: and since I will have a strong opinion on that later, let me go ahead and say, it should be called:19:03
dansmithaccel_info = resources['accel_info']19:03
dansmith:)19:03
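A rough sketch of the plumbing described above, assuming `resources` is a plain dict and using toy driver classes. `ToyBaseDriver`, `ToyAccelDriver`, `spawn_with_resources`, and the ARQ dict layout are all hypothetical; only the `accel_info` name and the idea of pulling it out of `resources` alongside the network and block device info come from the discussion.

```python
class ToyBaseDriver:
    """Toy virt driver: accepts accel_info but ignores it."""

    def spawn(self, instance, network_info=None, block_device_info=None,
              accel_info=None):
        return {"instance": instance, "attached_accels": []}


class ToyAccelDriver(ToyBaseDriver):
    """Toy driver that actually consumes the ARQs it is given."""

    def spawn(self, instance, network_info=None, block_device_info=None,
              accel_info=None):
        attached = [arq["uuid"] for arq in (accel_info or [])]
        return {"instance": instance, "attached_accels": attached}


def spawn_with_resources(driver, instance, resources):
    """Pull each kind of info out of resources and hand it to spawn()."""
    network_info = resources.get("network_info")
    block_device_info = resources.get("block_device_info")
    accel_info = resources.get("accel_info")
    return driver.spawn(instance,
                        network_info=network_info,
                        block_device_info=block_device_info,
                        accel_info=accel_info)
```

The split mirrors the suggested patch ordering: the manager-side plumbing works with any driver that ignores `accel_info`, and a later driver-specific change starts using it.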
Sundardansmith: That method would remain, albeit it would move to the compute manager patch. So that the waiting logic can call it once to decide whether to quit early. The virt driver patch 631245 will slim down to just the libvirt driver changes. The change in the signature for virt driver base class and the libvirt driver (to include arqs as a19:03
Sundarparameter) will now be in the compute manager patch.19:03
dansmithSundar: ah right, forgot that would be the check, so yes, move it thusly19:03
Sundardansmith: What you said about resources['accel...'] is exactly what I said in the gerrit review :)19:04
dansmithack19:04
dansmithSundar: except not the same names19:04
dansmithSundar: I was giving you a freebie pre -1 on the *name* :)19:04
SundarWhat name?19:05
dansmithaccel_info instead of accelerator_requests19:05
SundarI see. To match network_info etc.19:05
dansmithyes19:05
dansmithbecause I have issues. my therapist and I are working on them.19:05
SundarOk, I'm not too hung up on names19:05
dansmithgood :)19:05
*** dpawlik has quit IRC19:06
Sundarsean-k-mooney: still intrigued by your idea for a fake device, block or PCI. Do you have a way to simulate a PCI device? It can be *quite* difficult AFAICS with linux PCI subsystem.19:08
dansmithso I was thinking about something else, like grab the last usb host and pass that through19:08
dansmithSundar: I would focus on getting the patches back into order, and let sean-k-mooney pull something out of his butt for faking the PCI stuff... :)19:09
SundarYup, that's what I plan to do :)  but a 'real fake driver' (heck, I'll grab the award for today's oxymoron) would be very useful for the future.19:10
dansmithhe pulled off some fancy stuff for the numa live migration stuff, so we should see if he can do it again here :)19:10
sean-k-mooneySundar: the closest i have come up with is the netdevsim kernel module which does create sriov capable fake pci devices for testing and developing the kernel ebpf code19:10
sean-k-mooneySundar: however i dont think you can actually pass those devices to a qemu instance. i have not tried however19:10
sean-k-mooneyi did try to use it for sriov testing in the past but i could not get it to work19:11
sean-k-mooneyi was able to create the fake device but i could not assign the vfs. the pf might have worked but we dont really want to have to compile a kernel module in the gate if we can avoid it19:11
Sundarsean-k-mooney: It would be useful to have a libvirt patch where the domain XML for hostdev can have an attribute that indicates 'do not pass this to qemu'.19:12
SundarThat would be useful outside of OpenStack too, I would guess.19:12
dansmithsean-k-mooney: that's why i was thinking pick something on the host we could pass through, like one of those USB hosts19:12
dansmithSundar: can't imagine the libvirt folks would go for that19:12
sean-k-mooneydansmith: maybe ya19:12
sean-k-mooneythere was talk and patches for adding the ability to emulate sriov devices in qemu19:14
sean-k-mooneyif that ever gets merged upstream we might be able to use that19:14
sean-k-mooneybut for now i think focusing on getting the patches in order is better19:14
sean-k-mooneyif i come up with a way to fake sriov/pci devices ill let ye know19:15
SundarWe're all agreed on that :). Time for me to get back to work.19:15
sean-k-mooneyone thing i have been meaning to test for a while is if i can use a neutron port as a pci device for a guest19:16
sean-k-mooneythey dont support vfs but i might be able to do a PF passthrough19:16
sean-k-mooneyim pretty sure we are missing a few things to make that work however especially in the upstream ci.19:18
*** macz has joined #openstack-nova19:33
openstackgerritEric Fried proposed openstack/nova master: Use Placement 1.34 (string suffixes & mappings)  https://review.opendev.org/69641819:38
openstackgerritEric Fried proposed openstack/nova master: WIP: Tie requester_id to RequestGroup suffix  https://review.opendev.org/69694619:38
efriedgibi: 696418 ^ now just does the 1.34 cutover, will continue with the rest in subsequent patches.19:40
*** dpawlik has joined #openstack-nova19:41
*** jangutter has joined #openstack-nova19:44
*** dpawlik has quit IRC19:46
*** jmlowe has quit IRC19:48
*** jangutter has quit IRC19:48
*** tesseract has quit IRC19:49
openstackgerritMerged openstack/nova master: Process requested_resources in ResourceRequest init  https://review.opendev.org/69635419:51
*** avolkov_ has joined #openstack-nova19:54
*** donnyd_ has joined #openstack-nova19:54
*** jrosser_ has joined #openstack-nova19:54
*** amodi_ has joined #openstack-nova19:55
*** tbachman_ has joined #openstack-nova19:56
*** tinwood_ has joined #openstack-nova19:56
*** mugsie_ has joined #openstack-nova19:56
*** StevenK_ has joined #openstack-nova19:57
*** macz has quit IRC20:02
*** tbachman has quit IRC20:02
*** ralonsoh has quit IRC20:02
*** pcaruana has quit IRC20:02
*** avolkov has quit IRC20:02
*** brault has quit IRC20:02
*** gryf has quit IRC20:02
*** amodi has quit IRC20:02
*** Roamer` has quit IRC20:02
*** gibi has quit IRC20:02
*** Jeffrey4l_ has quit IRC20:02
*** jkulik has quit IRC20:02
*** tinwood has quit IRC20:02
*** mugsie has quit IRC20:02
*** StevenK has quit IRC20:02
*** johanssone has quit IRC20:02
*** jrosser has quit IRC20:02
*** donnyd has quit IRC20:02
*** rabel has quit IRC20:02
*** fnordahl has quit IRC20:02
*** cmurphy has quit IRC20:02
*** dr_gogeta86 has quit IRC20:02
*** avolkov_ is now known as avolkov20:02
*** tbachman_ is now known as tbachman20:02
*** donnyd_ is now known as donnyd20:02
*** jrosser_ is now known as jrosser20:02
*** Jeffrey4l has joined #openstack-nova20:02
*** johanssone has joined #openstack-nova20:03
efrieddansmith: is nullable=True (e.g. [1]) made moot by having a default and using obj_set_defaults() in __init__ (e.g. [2])?20:04
efried[1] https://github.com/openstack/nova/blob/f1382651dc3b7b945b69a2af0bc05aa472f26b28/nova/objects/request_spec.py#L106120:04
efried[2] https://github.com/openstack/nova/blob/f1382651dc3b7b945b69a2af0bc05aa472f26b28/nova/objects/request_spec.py#L107120:04
*** ralonsoh has joined #openstack-nova20:04
dansmithno20:04
dansmithefried: nullable=True *only* means "can be set to None"20:04
efriedah fusk, will I *ever* get that right?20:04
dansmithprobably not20:05
efriedI thought it meant "can be absent" (in terms of `in`)20:05
dansmithno,20:05
dansmithall fields can be absent all the time20:05
dansmithalso unrelated to defaults20:05
*** pcaruana has joined #openstack-nova20:05
efrieddansmith: oh; so does obj_set_defaults() mean I don't have to do the `in` check?20:05
dansmithno20:05
dansmithif you haven't set defaults on the object, then 'in' will fail if the field is unset20:06
dansmithyou're connecting three things which are unrelated20:06
dansmith__contains__ means nothing more than "is it set right now"20:06
dansmithnullable means "can be set to none"20:06
*** cmurphy has joined #openstack-nova20:07
dansmithand default= means "if they call obj_set_defaults(), set this field to this value"20:07
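The three unrelated notions above can be modelled with a toy class. This is deliberately not oslo.versionedobjects: `ToyField` and `ToyObject` are hypothetical stand-ins, and the field names are chosen only for illustration. The sketch just mirrors the stated semantics: `in` reports whether a field is set right now, `nullable` controls whether None is an allowed value, and a default is applied only when `obj_set_defaults()` is called.

```python
_UNSET = object()  # sentinel meaning "no default was declared"


class ToyField:
    def __init__(self, nullable=False, default=_UNSET):
        self.nullable = nullable
        self.default = default


class ToyObject:
    fields = {
        "requester_id": ToyField(nullable=True, default=None),
        "name": ToyField(nullable=False),
    }

    def __init__(self):
        # Nothing is set at construction time: every field starts absent.
        self._values = {}

    def __contains__(self, name):
        # "is it set right now", nothing more.
        return name in self._values

    def set(self, name, value):
        # nullable only governs whether None is an acceptable value.
        if value is None and not self.fields[name].nullable:
            raise ValueError("%s cannot be None" % name)
        self._values[name] = value

    def obj_set_defaults(self):
        # Defaults are applied only on request, and only to unset fields
        # that declared one.
        for name, field in self.fields.items():
            if name not in self._values and field.default is not _UNSET:
                self._values[name] = field.default
```

In this model a freshly built object contains neither field; after `obj_set_defaults()` it contains `requester_id` (set to None) but still not `name`, which has no default.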
efriedright, so forget nullable; I'm really just confirming whether gibi is correct here https://review.opendev.org/#/c/696418/1/nova/scheduler/utils.py@32720:07
dansmithtechnically he is not right20:07
dansmithpractically, he is.20:07
efriedbecause the obj could be loaded from cold storage20:07
dansmithright20:07
efriedwhere it was unset20:07
dansmithor you could del obj['field']20:08
efriedbut in this case the only way it got to cold storage was from having been previously __init__()ed, which set the defaults20:08
dansmiththe init hack might actually result in it being set when you load it from the db, but unintentionally, which is why I have campaigned against *ever* doing that20:08
efriedoh, unless it got stored/restored at an earlier version, I guess.20:08
dansmithwe don't check for every field in every object always because it's pointless.. in most cases we know which fields are always set20:09
*** gyee has quit IRC20:09
dansmithif there is any question, then you must check, but if there's not then you don't20:09
efriedso the check is not truly redundant, but not worth keeping in this case.20:10
dansmithso the init hack probably means you dont need to check (hence "practically") but... the init hack is not something we should be doing, IMHO, so..20:10
dansmithright20:10
dansmithI mean,20:10
openstackgerritMerged openstack/nova master: Reusable RequestGroup.add_{resource|trait}  https://review.opendev.org/69638020:10
dansmithyou can answer the "worth keeping" part, as I don't have context20:10
dansmithbut I assume you're saying it has always been set somewhere, so probably safe20:10
dansmithwhich, if true, is in keeping with our practices20:11
efriedI'm glad I asked.20:11
efriedthanks20:11
dansmithyarp20:13
openstackgerritLee Yarwood proposed openstack/nova stable/train: compute: Use long_rpc_timeout in reserve_block_device_name  https://review.opendev.org/69695320:13
*** gryf has joined #openstack-nova20:13
openstackgerritMatt Riedemann proposed openstack/nova master: api-guide: flesh out the server actions section  https://review.opendev.org/69695420:14
mriedemdansmith: melwitt: i wrote up a doc on using the instance actions API/commands ^20:14
*** jmlowe has joined #openstack-nova20:14
melwittf yeah20:15
openstackgerritLee Yarwood proposed openstack/nova stable/stein: compute: Use long_rpc_timeout in reserve_block_device_name  https://review.opendev.org/69695520:15
openstackgerritLee Yarwood proposed openstack/nova stable/rocky: compute: Use long_rpc_timeout in reserve_block_device_name  https://review.opendev.org/69695620:16
*** munimeha1 has quit IRC20:20
*** ralonsoh has quit IRC20:20
melwittmriedem: I thought you might appreciate this video https://twitter.com/Fobwashed/status/120070135862347366420:24
efriedI am clearly missing a huge amount of context and will never get that video.20:26
dansmithsame20:26
mriedemi don't actually play games made after 98 so while i get it, ...20:26
mriedemit's a joke about a shitty first person adventure game20:26
dansmithpeople who live in their parents' basements are laughing hysterically right now though, I'm sure20:27
melwittreally? aw man20:27
dansmith(and melwitt apparently :P)20:27
melwittmy favorite game in the series was morrowind. I started playing skyrim but lost interest20:28
dansmithnever heard of either20:28
mriedemthe pan flute is a nice touch20:28
dansmithdamn kids.20:28
melwittyeah, I guess only rpg fans might know it. though skyrim got pretty universally popular (to my surprise)20:29
*** macz has joined #openstack-nova20:31
*** gyee has joined #openstack-nova20:36
*** tbachman has quit IRC20:50
*** tbachman has joined #openstack-nova20:55
mriedemoh i am an rpg fan and have the saving throws to prove it20:55
efried"saving throws" sounds like a judo thing to me20:56
*** amodi_ is now known as amodi20:57
efriedlike "sacrifice throws" https://www.youtube.com/watch?v=XHuqFd1BHbE20:58
mriedemthere is an equivalent amount of BO smell21:00
mriedemi'm not sure you can be turned to stone by a cockatrice in judo though21:01
melwittlol ......and that's why I thought there was a chance you'd appreciate the video21:02
*** eharney has quit IRC21:06
*** damien_r has joined #openstack-nova21:10
*** damien_r has quit IRC21:14
*** tbachman has quit IRC21:19
*** Sundar has quit IRC21:25
*** tbachman has joined #openstack-nova21:27
<openstackgerrit> Dustin Cowles proposed openstack/nova master: Provider Config File: YAML file loading and schema validation  https://review.opendev.org/673341 21:40
<openstackgerrit> Dustin Cowles proposed openstack/nova master: Provider Config File: Function to further validate and retrieve configs  https://review.opendev.org/676029 21:40
<openstackgerrit> Dustin Cowles proposed openstack/nova master: Provider Config File: Functions to merge provider configs to provider tree  https://review.opendev.org/676522 21:40
<openstackgerrit> Dustin Cowles proposed openstack/nova master: WIP: Provider Config File: Enable loading and merging of provider configs  https://review.opendev.org/693460 21:40
*** dpawlik has joined #openstack-nova 21:43
*** dpawlik has quit IRC 21:48
<openstackgerrit> Eric Fried proposed openstack/nova master: Tie requester_id to RequestGroup suffix  https://review.opendev.org/696946 21:48
*** nweinber has quit IRC 21:48
*** slaweq has quit IRC 21:48
*** StevenK_ is now known as StevenK 21:49
*** fnordahl has joined #openstack-nova 21:56
*** rcernin has joined #openstack-nova 21:57
*** mugsie_ is now known as mugsie 21:59
*** rcernin has quit IRC 21:59
*** factor has joined #openstack-nova 22:08
*** slaweq has joined #openstack-nova 22:09
*** gshippey has quit IRC 22:10
<openstackgerrit> Merged openstack/nova master: api-guide: flesh out the server actions section  https://review.opendev.org/696954 22:11
*** pcaruana has quit IRC 22:11
*** tbachman has quit IRC 22:12
*** slaweq has quit IRC 22:13
*** eharney has joined #openstack-nova 22:13
*** abaindur has joined #openstack-nova 22:16
*** dpawlik has joined #openstack-nova 22:22
*** abaindur has quit IRC 22:24
*** abaindur has joined #openstack-nova 22:24
*** dpawlik has quit IRC 22:26
*** rcernin has joined #openstack-nova 22:57
*** avolkov has quit IRC 23:08
*** tkajinam has joined #openstack-nova 23:09
*** tbachman has joined #openstack-nova 23:19
*** mlavalle has quit IRC 23:30
<openstackgerrit> Eric Fried proposed openstack/nova master: Use Placement 1.34 (string suffixes & mappings)  https://review.opendev.org/696418 23:41
<openstackgerrit> Eric Fried proposed openstack/nova master: Tie requester_id to RequestGroup suffix  https://review.opendev.org/696946 23:41
*** tosky has quit IRC 23:41
*** mriedem has quit IRC 23:54
