Thursday, 2018-11-01

*** markvoelker has quit IRC00:01
*** brinzhang has joined #openstack-nova00:14
*** slaweq has quit IRC00:15
*** erlon has quit IRC00:30
*** betherly has joined #openstack-nova00:35
*** betherly has quit IRC00:39
*** mlavalle has quit IRC00:42
*** erlon has joined #openstack-nova00:43
*** betherly has joined #openstack-nova00:56
*** betherly has quit IRC01:01
*** zul has quit IRC01:04
*** wangy has joined #openstack-nova01:06
*** hongbin has joined #openstack-nova01:24
*** TuanDA has joined #openstack-nova01:37
*** betherly has joined #openstack-nova01:48
*** betherly has quit IRC01:52
*** Dinesh_Bhor has joined #openstack-nova01:54
*** erlon has quit IRC01:58
*** sapd1 has quit IRC02:02
*** sapd1_ has joined #openstack-nova02:02
openstackgerritZhenyu Zheng proposed openstack/nova master: WIP: Support attach/detach instance root volume  https://review.openstack.org/61444102:02
*** markvoelker has joined #openstack-nova02:03
*** cfriesen has quit IRC02:04
openstackgerritBrin Zhang proposed openstack/nova master: Remove useless sample and add the lack of tests in v266  https://review.openstack.org/61467102:07
*** tetsuro has joined #openstack-nova02:08
*** tetsuro has quit IRC02:11
*** mhen has quit IRC02:13
*** mhen has joined #openstack-nova02:16
*** tiendc has joined #openstack-nova02:25
openstackgerritZhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid  https://review.openstack.org/61467202:29
openstackgerritZhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid  https://review.openstack.org/61467202:33
*** markvoelker has quit IRC02:35
*** tetsuro has joined #openstack-nova02:54
*** mrsoul has quit IRC02:55
*** tetsuro has quit IRC02:59
*** tetsuro has joined #openstack-nova03:00
*** psachin has joined #openstack-nova03:10
*** sapd1_ has quit IRC03:15
*** sapd1__ has joined #openstack-nova03:17
*** icey has quit IRC03:18
*** betherly has joined #openstack-nova03:21
*** sapd1__ has quit IRC03:22
*** sapd1_ has joined #openstack-nova03:22
*** icey has joined #openstack-nova03:23
*** betherly has quit IRC03:26
*** markvoelker has joined #openstack-nova03:32
*** threestrands has joined #openstack-nova03:46
*** udesale has joined #openstack-nova03:50
*** betherly has joined #openstack-nova03:53
*** Dinesh_Bhor has quit IRC03:56
*** betherly has quit IRC03:57
*** bzhao__ has quit IRC03:58
*** wangy has quit IRC04:04
*** markvoelker has quit IRC04:06
*** betherly has joined #openstack-nova04:24
*** betherly has quit IRC04:29
*** hongbin has quit IRC04:37
*** tetsuro has quit IRC04:44
*** Dinesh_Bhor has joined #openstack-nova04:46
*** alex_xu has quit IRC04:53
*** alex_xu has joined #openstack-nova04:56
*** markvoelker has joined #openstack-nova05:02
*** wangy has joined #openstack-nova05:10
*** TuanDA has quit IRC05:10
*** betherly has joined #openstack-nova05:16
*** betherly has quit IRC05:21
*** ircuser-1 has quit IRC05:24
*** abhishekk has joined #openstack-nova05:26
*** ratailor has joined #openstack-nova05:35
*** markvoelker has quit IRC05:36
*** betherly has joined #openstack-nova05:37
*** betherly has quit IRC05:42
*** fanzhang has joined #openstack-nova05:49
*** betherly has joined #openstack-nova05:57
*** betherly has quit IRC06:02
*** wangy has quit IRC06:04
openstackgerritZhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid  https://review.openstack.org/61467206:15
*** Dinesh_Bhor has quit IRC06:25
*** markvoelker has joined #openstack-nova06:33
*** tiendc has quit IRC06:39
openstackgerritZhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid  https://review.openstack.org/61467206:44
*** wangy has joined #openstack-nova06:45
*** threestrands has quit IRC06:52
*** Dinesh_Bhor has joined #openstack-nova07:01
*** brinzhang has quit IRC07:01
*** brinzhang has joined #openstack-nova07:02
*** tetsuro has joined #openstack-nova07:04
*** icey has quit IRC07:04
*** markvoelker has quit IRC07:05
*** icey_ has joined #openstack-nova07:12
*** icey_ is now known as icey07:14
openstackgerritMerged openstack/nova stable/rocky: conductor: Recreate volume attachments during a reschedule  https://review.openstack.org/61248707:18
*** Dinesh_Bhor has quit IRC07:24
*** lpetrut has joined #openstack-nova07:26
*** Dinesh_Bhor has joined #openstack-nova07:30
*** skatsaounis has quit IRC07:30
*** alexchadin has joined #openstack-nova07:38
*** pcaruana|elisa| has joined #openstack-nova07:40
*** slaweq has joined #openstack-nova07:58
*** pcaruana|elisa| has quit IRC07:59
*** imacdonn has quit IRC08:00
*** udesale has quit IRC08:02
*** Dinesh_Bhor has quit IRC08:02
*** markvoelker has joined #openstack-nova08:03
*** pcaruana has joined #openstack-nova08:05
*** lpetrut has quit IRC08:12
*** slaweq has quit IRC08:18
*** ralonsoh has joined #openstack-nova08:22
*** ralonsoh has quit IRC08:22
*** ralonsoh has joined #openstack-nova08:23
*** markvoelker has quit IRC08:36
*** gokhani has joined #openstack-nova08:37
*** skatsaounis has joined #openstack-nova08:38
openstackgerritZhenyu Zheng proposed openstack/nova-specs master: Make scheduling weight more granular  https://review.openstack.org/59930808:46
*** mgoddard has joined #openstack-nova08:50
*** tetsuro has quit IRC08:51
*** tetsuro has joined #openstack-nova08:53
*** alexchadin has quit IRC09:00
*** alexchadin has joined #openstack-nova09:01
*** rmk has quit IRC09:02
*** rabel has quit IRC09:02
*** Dinesh_Bhor has joined #openstack-nova09:05
*** tetsuro has quit IRC09:09
*** alexchadin has quit IRC09:11
*** fghaas has joined #openstack-nova09:14
*** derekh has joined #openstack-nova09:16
*** udesale has joined #openstack-nova09:16
*** wangy has quit IRC09:20
*** fghaas has quit IRC09:23
*** k_mouza has joined #openstack-nova09:24
openstackgerritMartin Midolesov proposed openstack/nova master: vmware:PropertyCollector for caching instance properties  https://review.openstack.org/60827809:26
openstackgerritMartin Midolesov proposed openstack/nova master: VMware: Expose esx hosts to Openstack  https://review.openstack.org/61362609:26
*** rabel has joined #openstack-nova09:28
*** masayukig[m] has joined #openstack-nova09:29
*** markvoelker has joined #openstack-nova09:33
*** ttsiouts has joined #openstack-nova09:36
openstackgerritLucian Petrut proposed openstack/os-vif master: Do not import pyroute2 on Windows  https://review.openstack.org/61472809:39
*** ttsiouts has quit IRC09:42
*** ttsiouts has joined #openstack-nova09:47
*** lpetrut has joined #openstack-nova09:49
openstackgerritStephen Finucane proposed openstack/nova master: PowerVM upt parity for reshaper, DISK_GB reserved  https://review.openstack.org/61464309:50
openstackgerritgaobin proposed openstack/nova master: Improve the properties of the api  https://review.openstack.org/61473009:50
stephenfinlyarwood: Morning. Think you could take a look at these backports today? https://review.openstack.org/#/q/topic:bug/1799727+branch:stable/rocky09:52
*** spatel has joined #openstack-nova09:56
*** panda|off is now known as panda09:56
lyarwoodstephenfin: yup will do09:58
*** spatel has quit IRC10:00
*** ralonsoh has quit IRC10:02
*** ralonsoh has joined #openstack-nova10:03
*** markvoelker has quit IRC10:07
openstackgerritLucian Petrut proposed openstack/os-vif master: Do not import pyroute2 on Windows  https://review.openstack.org/61472810:10
*** phuongnh has joined #openstack-nova10:12
*** k_mouza has quit IRC10:16
openstackgerritgaobin proposed openstack/nova master: Improve the properties of the api  https://review.openstack.org/61473010:17
openstackgerritStephen Finucane proposed openstack/nova master: Fail to live migration if instance has a NUMA topology  https://review.openstack.org/61108810:17
*** k_mouza has joined #openstack-nova10:18
*** k_mouza has quit IRC10:19
*** k_mouza has joined #openstack-nova10:20
*** jaosorior has quit IRC10:23
*** tssurya has joined #openstack-nova10:24
*** phuongnh has quit IRC10:31
*** tbachman has quit IRC10:32
*** k_mouza has quit IRC10:40
openstackgerritTakashi NATSUME proposed openstack/python-novaclient master: Fix flavor keyerror when nova boot vm  https://review.openstack.org/58214710:41
*** k_mouza has joined #openstack-nova10:44
*** alexchadin has joined #openstack-nova10:45
*** dave-mccowan has joined #openstack-nova10:46
*** Dinesh_Bhor has quit IRC10:57
*** markvoelker has joined #openstack-nova11:04
*** pcaruana has quit IRC11:05
*** k_mouza has quit IRC11:07
johnthetubaguystephenfin: yeah, it makes sense not to disrupt the chain.11:15
*** udesale has quit IRC11:24
*** alexchadin has quit IRC11:26
*** k_mouza has joined #openstack-nova11:26
*** ttsiouts has quit IRC11:30
*** jaosorior has joined #openstack-nova11:35
*** ratailor has quit IRC11:36
*** markvoelker has quit IRC11:36
*** Nel1x has joined #openstack-nova11:40
openstackgerrithuanhongda proposed openstack/nova master: AZ operations: check host has no instances  https://review.openstack.org/61183311:50
*** pcaruana has joined #openstack-nova11:52
*** erlon has joined #openstack-nova11:57
*** ttsiouts has joined #openstack-nova12:06
*** lpetrut has quit IRC12:13
openstackgerritZhenyu Zheng proposed openstack/nova master: WIP support attach/detach root volume 2  https://review.openstack.org/61475012:19
jaypipesjohnthetubaguy: re: the unified limits thing... I should have some PoC code to show you by end of week. It will give us something more concrete to discuss. It doesn't impact the REST API in nova at all.12:20
*** alexchadin has joined #openstack-nova12:21
johnthetubaguyjaypipes: OK, cool. Which bit are you looking at, using placement or the oslo.limits piece, or both?12:21
johnthetubaguyjaypipes: was hoping to start work on a PoC soon, now I have finished the previous project that has been distracting me full time!12:22
jaypipesjohnthetubaguy: both.12:22
johnthetubaguyat the PTG we seemed to land on doing the placement thing second, but I would certainly like to see the two together12:22
jaypipesjohnthetubaguy: not actually using oslo.limits, but with a bunch of "TODO(jaypipes): This should be ported to oslo.limits" notes. :) Along with a healthy dose of "NOTE(jaypipes): Under no circumstances should this infect oslo.limits"12:23
johnthetubaguyah, OK, got you12:23
johnthetubaguysounds good12:23
jaypipesjohnthetubaguy: yeah, I'm tackling the limit-getting stuff first, placement queries second.12:23
jaypipesjohnthetubaguy: obviously, the limit-*setting* stuff along with quota classes are the things marked "under no circumstances should this infect oslo.limits" :)12:24
johnthetubaguyjaypipes: I was thinking along the lines of a parallel quota system, so we just ditch all the old stuff, its too infected with junk like user limits12:25
johnthetubaguywell, its clearly not quite that simple, but anyways, looking forward to seeing the PoC12:26
jaypipesjohnthetubaguy: yeah, I haven't touched any of the "develop a system to migrate nova to use unified limits" stuff. that part of your spec would still very much be needed.12:27
jaypipesjohnthetubaguy: that said, I've added the infrastructure to be able to configure CONF.quota.driver to something like "unified" and have that switch the underlying mechanisms for limits retrieval.12:27
jaypipesjohnthetubaguy: so hopefully that data migration stuff can build on top of my work.12:28
jaypipesjohnthetubaguy: hopefully it should all make sense when I push the code today or tomorrow.12:28
jaypipes(I'm OOO this afternoon)12:28
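A rough sketch of the driver switch jaypipes describes above; the class names and the 'unified' value are illustrative stand-ins, not code from the PoC:

    # Hypothetical sketch: pick the limits-retrieval mechanism from a config
    # string such as CONF.quota.driver. Class names and the 'unified' key are
    # made up for illustration.
    class DbQuotaDriver(object):
        def get_limits(self, project_id):
            return {'instances': 10, 'cores': 20}   # would read nova's own quota tables

    class UnifiedLimitsDriver(object):
        def get_limits(self, project_id):
            return {'instances': 10, 'cores': 20}   # would read keystone unified limits

    def load_quota_driver(name):
        drivers = {
            'nova.quota.DbQuotaDriver': DbQuotaDriver,  # today's default value
            'unified': UnifiedLimitsDriver,             # hypothetical new value
        }
        return drivers[name]()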
johnthetubaguyjaypipes: ah, I don't have code for it yet, only a plan. Yeah, I think I get what you mean, but will look out for the patches12:29
jaypipesjohnthetubaguy: cool, thanks. I'll add you to the reviews.12:30
*** udesale has joined #openstack-nova12:32
*** Nel1x has quit IRC12:37
*** brinzhang has quit IRC12:40
*** munimeha1 has joined #openstack-nova12:47
*** jmlowe has quit IRC12:47
openstackgerritOpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata  https://review.openstack.org/61475712:51
*** ttsiouts has quit IRC12:55
*** zul has joined #openstack-nova12:56
*** udesale has quit IRC12:56
*** eharney has joined #openstack-nova13:01
*** udesale has joined #openstack-nova13:01
*** ttsiouts has joined #openstack-nova13:04
*** zul has quit IRC13:04
*** zul has joined #openstack-nova13:06
*** jmlowe has joined #openstack-nova13:07
*** tbachman_ has joined #openstack-nova13:10
*** mchlumsky has joined #openstack-nova13:12
*** tbachman_ is now known as tbachman13:13
*** liuyulong has joined #openstack-nova13:15
*** belmoreira has joined #openstack-nova13:17
*** k_mouza has quit IRC13:20
*** k_mouza has joined #openstack-nova13:25
*** awaugama has joined #openstack-nova13:28
*** mriedem has joined #openstack-nova13:28
sean-k-mooneybauzas: mriedem we have a regression in the os-vif 1.12.0 release which is fixed in one of my patches already so we are going to blacklist 1.12.0 in the global requirements. https://review.openstack.org/#/c/614764/113:38
sean-k-mooneyi'm going to work on getting 2 new os-vif gate jobs to test ovs with iptables and linux bridge next sprint to catch these kinds of things going forward13:39
sean-k-mooneyi'll likely start on that next week however.13:40
sean-k-mooneybauzas: as the nova release liaison could you comment on13:40
sean-k-mooneyhttps://review.openstack.org/#/c/614764/113:40
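For context, blacklisting a release in global-requirements is a one-line version specifier; the line below is an illustrative assumption rather than a quote from the linked review:

    os-vif!=1.12.0  # hypothetical global-requirements entry; any minimum-version bound is omitted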
mnaserok please forgive me if this sound silly but13:43
mnasermicroversion 1.4 > microversion 1.25, right?13:43
sean-k-mooneyno13:45
sean-k-mooneyit's not a decimal point13:45
sean-k-mooneyit's semantic versioning13:45
*** takashin has joined #openstack-nova13:46
mnaserok13:46
mnaserexplains things13:46
* mnaser goes back to hacking things13:46
mnaserthanks sean-k-mooney13:46
*** liuyulong has quit IRC13:47
johnthetubaguymnaser: its more like version 4.0 vs version 25.0 actually, as any micro-version can drop functionality13:50
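A minimal illustration of the comparison being described: microversions compare as (major, minor) integer pairs, not as decimal fractions, so 1.25 is newer than 1.4:

    def parse(microversion):
        # split '1.25' into the integer pair (1, 25); tuples compare element-wise
        major, minor = microversion.split('.')
        return (int(major), int(minor))

    assert parse('1.25') > parse('1.4')   # (1, 25) > (1, 4)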
mnaserOkay, so trying to figure out why this upgrade somehow is causing nova to request a micro version 1.25 but the service is not providing that13:51
mnaserCould be a super screwed up deployment too.13:51
johnthetubaguyoh right, request the version from cinder or ironic?13:51
johnthetubaguywe usually have a minimum version we need, which implies a minimum version of all the dependent services13:52
johnthetubaguymnaser: who is requesting 1.25 from whom?13:53
mnaserjohnthetubaguy: so it looks like os_region_name is not a valid option inside the placement section13:55
mnaserSo this multiregion deployment was probably hitting the wrong region. os_region_name was silently dropped?13:56
mnaserSo it was hitting an older region13:56
johnthetubaguygood question, that sounds bad13:56
mnaserIt was removed after one cycle..13:57
sean-k-mooneymnaser: the simplest thing to do is pretend there is no '.'13:57
mnaserhttps://github.com/openstack/nova/commit/3db815957324f4bd6912238a960a90624d97c51813:58
mriedemnova meeting in 2 minutes13:58
mnaserA bit quick to remove it after just a cycle?13:58
*** suggestable has joined #openstack-nova14:01
johnthetubaguymnaser: that has always been the norm for config, we just don't usually remember to do it14:01
*** suggestable has left #openstack-nova14:01
*** liuyulong_ has joined #openstack-nova14:03
mnaserjohnthetubaguy: ah okay14:04
johnthetubaguymnaser: now the whole skip version upgrades thing clearly makes that less of a good policy... not sure if we have an answer for that one yet.14:06
mnaserjohnthetubaguy: yeah, i dont do that (nor do i support that idea).. so i should look at logs :p14:07
johnthetubaguymnaser: heh :)14:09
mriedemoslo.config has a new thing for FFU with config stuff14:15
johnthetubaguymriedem: ah, cool14:19
sean-k-mooneyoh the meeting ended quicker than i thought it would14:20
*** takashin has left #openstack-nova14:20
sean-k-mooneyi was going to ask people to assess https://blueprints.launchpad.net/nova/+spec/libvirt-neutron-sriov-livemigration and the related spec if they can, to indicate if this can proceed for this cycle14:21
sean-k-mooneyi have spec updates to make but they will be done later today.14:21
mriedemjohnthetubaguy: mnaser: this thing https://specs.openstack.org/openstack/oslo-specs/specs/rocky/handle-config-changes.html14:23
mriedemi think that is still a WIP14:23
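As an aside, the usual oslo.config way to bridge a renamed option for a deprecation period is the deprecated_name metadata; a minimal sketch, with option names mirroring the placement case but not taken from nova's actual definition:

    from oslo_config import cfg

    placement_group = cfg.OptGroup('placement')
    placement_opts = [
        cfg.StrOpt('region_name',
                   deprecated_name='os_region_name',
                   help='Region used to look up the placement endpoint; the '
                        'deprecated_name keeps the old key working (with a '
                        'warning) until it is removed.'),
    ]

    CONF = cfg.ConfigOpts()
    CONF.register_group(placement_group)
    CONF.register_opts(placement_opts, group=placement_group)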
mriedemjackding: if https://review.openstack.org/#/c/609180/ is ready for review please put it in the runways queue https://etherpad.openstack.org/p/nova-runways-stein14:24
*** mlavalle has joined #openstack-nova14:24
*** Luzi has joined #openstack-nova14:27
*** mvkr has quit IRC14:40
mriedemhmm, did something regress with performance? https://bugs.launchpad.net/nova/+bug/180075514:42
openstackLaunchpad bug 1800755 in OpenStack Compute (nova) "The instance_faults table is too large, leading to slow query speed of command: nova list --all-tenants" [Undecided,New]14:42
mriedemthat was fixed with https://bugs.launchpad.net/nova/+bug/180075514:42
mriedemoops14:42
mriedemhttps://review.openstack.org/#/c/409943/14:42
mriedemis there any reason we don't purge old faults?14:43
mriedemwe only show the latest14:43
mriedemand we don't provide any API or nova-manage CLI to show *all* faults for a given instance14:43
*** Luzi has quit IRC14:44
jackdingmriedem: sure, will do14:44
sean-k-mooneymriedem: would that mess with audit logs?14:45
mriedemyou mean that config/api that no one uses?14:45
sean-k-mooneymriedem: a nova-manage command could make sense or an admin only api14:45
mriedemhttps://developer.openstack.org/api-ref/compute/?expanded=list-server-usage-audits-detail#server-usage-audit-log-os-instance-usage-audit-log14:45
sean-k-mooneymriedem: no i was thinking that for some deployments there may be a requirement to record faults for audit/sla reasons14:46
sean-k-mooneyi was not thinking of any feature in particular14:46
sean-k-mooneyi'm just not sure if auto cleanup of old faults would be something we would want in all cases14:47
mriedemi'm not suggesting an auto cleanup,14:47
mriedembut a nova-manage db purge_faults14:48
openstackgerritSurya Seetharaman proposed openstack/nova master: [WIP] Make _instances_cores_ram_count() be smart about cells  https://review.openstack.org/56905514:48
openstackgerritSurya Seetharaman proposed openstack/nova master: WIP: API microversion bump for handling-down-cell  https://review.openstack.org/59165714:48
openstackgerritSurya Seetharaman proposed openstack/nova master: [WIP] Add os_compute_api:servers:create:cell_down policy  https://review.openstack.org/61478314:48
sean-k-mooneyya that i think makes total sense. the same way keystone allows you to purge the expired uuid tokens from its db14:48
sean-k-mooneymriedem: were you thinking it would drop faults older than X from the db or move them to an archive table?14:50
mriedemi'm not really putting much thought into this14:51
*** Swami has joined #openstack-nova14:52
sean-k-mooneyit's one of those things that if you brought it up at the ptg i would be like "sure go for it", but it also does not sound like a super high priority either, so ya in any case i can't really think of a reason not to allow it off the top of my head14:54
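To make the purge idea concrete: there is no 'nova-manage db purge_faults' command, but a hypothetical implementation could be as small as a single delete against the cell database, e.g.:

    # Hypothetical sketch only; the command name and helper are made up.
    import datetime
    import sqlalchemy as sa

    def purge_instance_faults(engine, before):
        """Delete instance_faults rows created before the given datetime."""
        metadata = sa.MetaData()
        faults = sa.Table('instance_faults', metadata, autoload_with=engine)
        with engine.begin() as conn:
            result = conn.execute(
                faults.delete().where(faults.c.created_at < before))
            return result.rowcount   # number of fault rows removed

    # e.g. purge_instance_faults(engine, datetime.datetime(2018, 1, 1))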
mriedemtssurya: in case you haven't started yet, i was thinking about how to do 2.68 down-cell functional api samples testing, which will require some kind of fixture to simulate a down cell,14:58
mriedemand i think i have an idea of how to write that14:58
*** cfriesen has joined #openstack-nova15:00
tssuryamriedem: I saw your todos but I haven't started, feel free to start if you have the time your tests are surely going to be more thorough than mine.15:00
tssuryabug thanks15:00
tssuryabig*15:00
mriedemok i think i'll just hack on a DownCellFixture in a separate patch below the API microversion one at the end, and then it could be used in the functional api samples tests,15:00
mriedemthe nice thing with fixtures is they are also context managers,15:01
mriedemso you could create a server while the cell is 'up' and then do something like:15:01
mriedemwith down_cell_fixture:15:01
mriedem    get('/servers')15:01
mriedemand you should get the minimal construct back15:01
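For illustration, a Fixture from the fixtures library can be used either via useFixture() in a test or directly as a context manager, which is the pattern being described; this toy stand-in is not the real DownCellFixture from the linked review:

    import fixtures

    class FakeDownCellFixture(fixtures.Fixture):
        """Toy stand-in showing the pattern only."""
        def _setUp(self):
            # the real fixture would patch cell DB lookups to appear unreachable;
            # here we just set a marker to show the setup/cleanup lifecycle
            self.useFixture(fixtures.EnvironmentVariable('FAKE_CELL_DOWN', '1'))

    # as a context manager, after creating a server while the cell is "up":
    with FakeDownCellFixture():
        pass  # e.g. GET /servers here and expect the minimal UNKNOWN construct back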
tssuryaoh nice15:02
tssuryathere was a doubt however with the sample tests, the jsons you have created.. I thought they were supposed to be created automatically once we write the tests ?15:02
mriedemi think they are,15:02
mriedemi was just trying to get the api-ref build to pass15:02
tssuryaah okay :)15:03
*** liuyulong_ has quit IRC15:05
*** liuyulong has joined #openstack-nova15:06
mriedemlyarwood: don't forget to add an etherpad for your forum session to https://wiki.openstack.org/wiki/Forum/Berlin201815:07
mriedemi think i have crap to dump in there15:07
mriedemmelwitt: were you going to send https://etherpad.openstack.org/p/nova-forum-stein to the ML for the list of xp sessions to have warm nova bodies in attendance?15:09
lyarwoodmriedem: unfortunately I'm no longer attending, sent a note to the foundation when I found out yesterday.15:09
mriedemlyarwood: hmm, i could possibly run that session15:09
mriedemor we could find *someone*15:09
dansmithI vote for mriedem15:09
mriedemrandom berliner on the street15:09
mriedemi'll pay them in sausage15:10
dansmithhe needs more stuff to do and I hear he loves volumes, especially multi-attached ones15:10
*** k_mouza has quit IRC15:10
*** jmlowe has quit IRC15:10
mriedemi'll gladly moderate any number of forum sessions if it means i don't have to do any presentations15:10
*** mvkr has joined #openstack-nova15:11
*** kukacz has quit IRC15:12
mriedemlyarwood: well if the foundation doesn't pull the session, it looks like i had it marked on my calendar to attend anyway so if you want i can moderate it15:12
mriedemand just assign all of the work to you15:12
*** psachin has quit IRC15:13
*** itlinux has quit IRC15:13
lyarwoodmriedem: haha so nothing would ever get done15:13
lyarwoodmriedem: but yeah let me ping them quickly and see if I can save that session15:14
mriedemthere are a few volume-related specs for stein that would be good to discuss there, like this one to specify delete_on_termination when attaching a volume (and changing that value for existing attachments)15:14
mriedemwhich reminds me, i dusted this off too https://review.openstack.org/#/c/393930/15:15
mriedemgetting device tags out of the API15:15
mriedemdansmith: i think you've been on board with that in the past ^15:15
lyarwoodmriedem: ha the delete_on_termination issue just came up downstream and we NACK'd assuming it would be lots of work for little gain15:15
* dansmith nods15:16
lyarwoodmriedem: are you getting pushed for that as well given users can do this in AWS?15:16
mriedemthe major problem i see with that one is we already have a PUT API for volume attachments,15:16
mriedemthe dreaded swap volume API15:16
mriedemlyarwood: no i'm not getting pushed for it from our product people15:16
mriedemas far as i know anyway15:16
mriedembut it's one of those things that comes up every so often, like proxying the volume type on bfv15:17
mriedemi don't think it's much work, it's just updating the DB15:17
mriedemand taking a new parameter on attach15:17
mriedemupdating existing attachments is difficult b/c of our already f'ed up api15:18
mriedemhttps://developer.openstack.org/api-ref/compute/#update-a-volume-attachment15:18
dansmithit's not a major amount of heavy lifting,15:18
dansmithbut the gain seems very minor to me15:18
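For reference, the proposal under discussion would add a field to the existing volume-attach request; the delete_on_termination key below is the proposed addition and not part of the API at the time of this conversation:

    # Sketch of the proposed request body for
    # POST /servers/{server_id}/os-volume_attachments
    attach_body = {
        'volumeAttachment': {
            'volumeId': '0aac0a4e-0000-0000-0000-000000000000',  # placeholder UUID
            'delete_on_termination': True,   # proposed new field
        }
    }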
sean-k-mooneyso random quest. would people object to an api to list the currently enabled scheduler filters? specifically to enable tempest and other multicloud services to detect what scheduler features they can expect15:19
dansmiththe strongest argument I've seen for it is that AWS has it and thus the standalone EC2 thing needs to be able to proxy that in15:19
sean-k-mooney*question however it could become a quest15:19
dansmithbut afaik, that's pretty much dead these days15:19
*** kukacz has joined #openstack-nova15:19
dansmithsean-k-mooney: yes I would object15:19
*** alexchadin has quit IRC15:19
mriedem"because AWS and Alibaba have it" is something i hear every week15:19
sean-k-mooneydansmith: because we are exposing configuration via the api or something else?15:20
dansmithsean-k-mooney: it would literally be an api call that would make an rpc call to scheduler to return a chunk of config, which shouldn't be visible externally anyway. and if you're running multiple schedulers, which do you cal?15:20
mriedemthe only reason i could see for doing something like that (scheduler filters and such) is to tell users, via the api, which hints are available15:20
*** fghaas has joined #openstack-nova15:20
johnthetubaguysean-k-mooney: discovery of available scheduler hints was something we once said we would consider, which is a bit different15:21
dansmithyep15:21
mriedemright, it would only be feasible if it was a list of hints15:21
mriedemwhich is totally pluggable btw15:21
sean-k-mooneyjohnthetubaguy: its related to this tempest change https://review.openstack.org/#/c/570207/1215:21
johnthetubaguyyeah, that was the downside, in the general case, it means nothing useful15:21
sean-k-mooneyjohnthetubaguy: the issue i have with the change is it requires us to keep the nova and tempest defaults in sync15:22
dansmithsean-k-mooney: tempest has always been blackbox, requiring you to tell it the nova side scheduler config for this reason15:22
artomsean-k-mooney, don't you dare bring more people into this. I will fly to Ireland and cut you, I swear.15:22
artomWe already can't agree downstream15:22
dansmithtempest is a testing/validation tool.. keeping the two configs in sync is a few lines of bash15:22
mriedemright, devstack configures the filters in both nova and tempest15:22
dansmithright15:22
*** k_mouza has joined #openstack-nova15:22
sean-k-mooneydansmith: the issue is making tripleo do that15:22
mriedemdevstack also adds the same/different host filtesr which aren't in the default enabled_filters list for nova15:23
dansmithsean-k-mooney: s/bash/puppet/15:23
mriedemfor any nfv ci, they'd also need to configure to numa/pci filters15:23
mriedemetc15:23
sean-k-mooneydansmith: ya i know, it's just that tripleo is a pain to make work compared to devstack15:23
dansmithsean-k-mooney: adding an api to nova to work around tripleo not being able to communicate config to another module is INSANITY15:23
sean-k-mooneymriedem: yes today they only need to enable it in nova however15:23
johnthetubaguyI think sdague convinced me about this in the past, you don't want auto discovery, you want to tell the test system what you expect to happen, else there be dragons15:24
artomdansmith, it's not communicate per se - if tripleo doesn't set the nova value, it shouldn't have to set the corresponding tempest value15:24
dansmithartom: find another way15:25
dansmithseriously.15:25
johnthetubaguymatching defaults?15:25
artomdansmith, my other way is https://review.openstack.org/#/c/570207/1215:25
sean-k-mooneyjohnthetubaguy: ya well it was just a thought; my main issue with artom's change is that we would have to keep it in sync if we add a filter to the default set in the future15:25
artomjohnthetubaguy, that's what ^^ is15:25
artomBut apparently everyone is literally willing to fight to the death over this.15:25
dansmithI am15:25
* dansmith breaks a bottle on the table15:25
dansmithlet's do it.15:25
artomI only have this bluetooth mouse :(15:26
dansmithforfeit?15:26
* johnthetubaguy hopes people defend themselves with pumpkins15:26
sean-k-mooneyjohnthetubaguy: the interop benefit is really only a side effect and i don't feel that strongly that it's a good thing15:26
artomPfft, as if. I'm making brass knuckles. Wireless ones.15:26
*** k_mouza has quit IRC15:27
johnthetubaguyartom: curious, when nova changes a default in its config, what happens to the rest of the tempest settings?15:29
artomjohnthetubaguy, you mean for other config options where Tempest uses values from Nova? Good question - gmann was saying on that review that they just update Tempest, but I'd need to look for concrete examples15:31
johnthetubaguyartom: cool, that is what I assumed. I know its branchless, but the default is just a helping hand.15:31
* johnthetubaguy shudders, its complicated15:32
artomjohnthetubaguy, yeah, I grok that it can't be perfect, I figured at least making it match what Nova has in master is a good first step.15:32
*** rmk has joined #openstack-nova15:32
artomjohnthetubaguy, because the previous default of 'all' is... well, it's a handy "feature" for CIs, because they can just enable any filter in Nova and Tempest just runs with it15:33
*** gyee has joined #openstack-nova15:33
artomBut it becomes a problem if a filter *hasn't* been enabled in Nova, Tempest will still try to run with it.15:33
mriedemthis is fun https://bugs.launchpad.net/nova/+bug/180050815:52
openstackLaunchpad bug 1800508 in OpenStack Compute (nova) "Missing exception handling mechanism in 'schedule_and_build_instances' for DBError at line 1180 of nova/conductor/manager.py" [Low,New]15:52
mriedem"nova should set the instance to error state when nova fails to insert the instance into the db"15:53
sean-k-mooneymriedem: am wait, if nova can't insert the instance into the db what is it setting error on?15:54
artomChicken, meet again. Cart, meet horse.15:54
artom"again"? I mean egg15:54
mriedemthe only thing i could think there is we could try updating the instance within the build request, but that's pretty shitty15:57
sean-k-mooneymriedem: in this case however it seems like they are booting with an invalid flavor id right?15:59
mriedemno15:59
mriedemhe's injecting some kind of fault into the code15:59
mriedemto trigger the db error15:59
mriedemif you try to boot with an invalid flavor id, you'll get a 404 in the api looking up the flavor15:59
sean-k-mooneyoh ok i was trying to figure out how they created a flavor with id 1E+22 but then failed to boot with that flavor16:00
mriedemso, i mean, your cell db could drop right when we're trying to create the server i guess, that would do it as well16:00
mriedembut are we going to handle that scenario everywhere in nova?16:00
sean-k-mooneyok right so in that case the instance would be in the api db but fail to insert into the cell db16:01
sean-k-mooneywe moved the instance status into the api db recently, right? so ya in that case we could set error on the api db i guess, but there are a ton of other edge cases like that we don't handle16:02
sean-k-mooneymriedem: the other thing we could do is have a periodic task that just updates the status of perpetually building instances to error after some time, e.g. a day or retry limit*build timeout or something16:04
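A rough sketch of that periodic-task idea, using oslo.service; nova has no such task and the helper below is a made-up placeholder:

    from oslo_service import periodic_task

    def _instances_building_longer_than(context, seconds):
        # placeholder: would query for instances stuck in BUILDING past the cutoff
        return []

    class BuildWatchdog(periodic_task.PeriodicTasks):
        @periodic_task.periodic_task(spacing=600)
        def _fail_stuck_builds(self, context):
            for instance in _instances_building_longer_than(context, 86400):
                instance.vm_state = 'error'   # or whatever terminal state is agreed on
                instance.save()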
mriedemthis guy has been busy https://bugs.launchpad.net/nova/+bug/1800204 https://bugs.launchpad.net/nova/+bug/179994916:05
openstackLaunchpad bug 1800204 in OpenStack Compute (nova) "n-cpu.service consuming 100% of CPU indeterminately" [Undecided,New]16:05
openstackLaunchpad bug 1799949 in OpenStack Compute (nova) "VM instance building forever when an RPC error occurs" [Undecided,New]16:05
sean-k-mooneymriedem: actually https://bugs.launchpad.net/nova/+bug/1800204 seems familiar; there was a similar bug report a few months back around the rocky release16:06
openstackLaunchpad bug 1800204 in OpenStack Compute (nova) "n-cpu.service consuming 100% of CPU indeterminately" [Undecided,New]16:06
*** k_mouza has joined #openstack-nova16:10
*** belmoreira has quit IRC16:11
sean-k-mooneyoh wait, that's n-cpu not the conductor, never mind16:13
*** itlinux has joined #openstack-nova16:14
*** imacdonn has joined #openstack-nova16:15
*** ttsiouts has quit IRC16:16
sean-k-mooneymriedem: do you think we will actually address any of those bugs?16:16
stephenfinartom: So, do I need to review https://code.engineering.redhat.com/gerrit/#/c/154627/2 yet?16:16
sean-k-mooneystephenfin: wrong irc16:16
stephenfinta :)16:16
mriedemsean-k-mooney: probably not16:19
mriedemunless there is a more obvious way to create those faults with injecting code into the path and blow up the system16:19
mriedem*without16:19
*** ircuser-1 has joined #openstack-nova16:20
sean-k-mooneymriedem: i was just debating if we should triage them as incomplete or wontfix unless a different way to reproduce can be provided16:22
mriedemi marked one of them as opinion16:23
*** k_mouza has quit IRC16:24
johnthetubaguyFWIW, I always wanted to be able to "timeout" tasks to try and catch that pending forever case. They caused me endless pain at Rackspace (I think mostly in the migrate/resize code path). The difference was they were more expected / user triggered errors.16:25
openstackgerritMatt Riedemann proposed openstack/nova master: Add --before to nova-manage db archive_deleted_rows  https://review.openstack.org/55675116:25
mriedemjohnthetubaguy: how much of that was resolved with service user tokens though?16:26
mriedemor the long_rpc_timeout we have since rocky16:26
mriedemwhich we're using now in the live migration flows that do rpc calls16:26
johnthetubaguymriedem: yeah, I saw that go in. Although most of those cases it went to Error (eventually) when it didn't have to.16:27
*** ttsiouts has joined #openstack-nova16:27
mriedemthat's a different bug then16:27
sean-k-mooneyjohnthetubaguy: i have seen this happen with rabbitmq restarts in the past when persistence was disabled, on instance build and a few other cases.16:29
sean-k-mooneyi never really considered that a nova bug however because i caused the issue by restarting rabbit16:29
*** slaweq has joined #openstack-nova16:29
sean-k-mooneyjohnthetubaguy: but ya it's probably more complicated than just setting to error after x time, as some requests could still be in flight16:31
melwittmriedem: yes, it completely slipped my mind :( and I'm not done going through the entire list of the schedule yet16:36
mriedemmelwitt: i added several sessions in there based on my schedule16:40
melwittok, thank you. that's helpful16:41
johnthetubaguysean-k-mooney: yeah, its hard to get right16:47
*** k_mouza has joined #openstack-nova16:53
*** Swami has quit IRC16:55
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: API microversion bump for handling-down-cell  https://review.openstack.org/59165716:56
openstackgerritMatt Riedemann proposed openstack/nova master: Add DownCellFixture  https://review.openstack.org/61481016:56
mriedemtssurya: ^16:56
tssuryalooking, thanks16:57
*** k_mouza has quit IRC16:59
*** mgoddard has quit IRC17:02
*** belmoreira has joined #openstack-nova17:02
tssuryamriedem: okay I am going to write the tests here https://review.openstack.org/#/c/591657/12/nova/tests/functional/api_sample_tests/test_servers.py based on your fixture17:04
*** udesale has quit IRC17:06
openstackgerritMerged openstack/nova master: Make ResourceTracker.tracked_instances a set  https://review.openstack.org/60878117:13
dansmithmriedem: tssurya I'm explaining the host-status concept to someone right now, and why an instance state doesn't  go to STOPPED just because the compute node is down17:18
dansmithmriedem: I wonder if it would make sense to integrate the use of the UNKNOWN state we're adding here with that feature,17:18
*** fghaas has quit IRC17:18
dansmithso that in the same microversion, instances with a down host show up as UNKNOWN as well17:18
*** fghaas has joined #openstack-nova17:19
*** belmoreira has quit IRC17:19
*** fghaas has quit IRC17:19
*** k_mouza has joined #openstack-nova17:20
*** belmoreira has joined #openstack-nova17:20
tssuryadansmith: you mean you want to add a new "UNKNOWN" vm_state ?17:25
*** Swami has joined #openstack-nova17:26
dansmithtssurya: you're already doing that from the external view right now17:26
tssuryayea17:26
dansmithwe would do a similar thing for real instances we can look up just fine, but which have down hosts17:26
dansmiththe only problem would be that right now UNKNOWN means "the rest of the instance details aren't there" which would be slightly more ambiguous in this case17:27
*** slaweq has quit IRC17:27
tssuryahmm, makes sense to make the instance state UNKNOWN since we don't know the host state, I mean I guess "UNKNOWN" could mean unknown details/state right ?17:28
belmoreiradansmith mriedem should placement/nova issues be discussed here or in placement channel17:28
cfriesendansmith: for what it's worth, in our environment if a compute node goes down an external entity sets all of the instances to the "error" state, until they were automatically recovered.17:28
dansmiththat seems like an improper use of the error state to me17:29
dansmithnot to mention that nova on its own won't know whether they're still up and fine or not17:29
cfriesenif the host is "down", then we fence it off and force a reboot.  those instances are guaranteed to be toast17:29
dansmithwhich is why we don't call them "stopped"17:29
dansmithcfriesen: okay, well, that's better in that case but vanilla nova can't do or know that17:30
cfriesenagreed, nova itself can't know the bigger picture17:30
dansmithbelmoreira: depends on what it is.. if it's integration issues then probably here17:30
belmoreirait is the increase in the number of requests to placement17:32
cfriesendansmith: although, if an external entity uses the nova API to tell nova that the compute node is "down", it's supposed to have already fenced off the node to prevent instances from (eg) talking to volumes.17:32
belmoreirahave a look into: https://docs.google.com/document/d/1d5k1hA3DbGmMyJbXdVcekR12gyrFTaj_tJdFwdQy-8E/edit?usp=sharing17:32
cfriesenotherwise you could evacuate and then have two copies of an instance trying to access the same cinder volume17:32
dansmithcfriesen: yeah, true.. I guess I just prefer something less overloaded like UNKNOWN than saying it's stopped or error17:33
belmoreirathis is the number of requests to placement when compute-nodes get upgraded to Rocky17:33
dansmithefried: ^17:33
efriedhow far back am I reading?17:33
dansmithefried: one line17:33
dansmithand the url he posted17:33
dansmithbelmoreira: is it really increasing, or is that you bringing nodes on over time?17:35
belmoreirathe increase of requests shows the compute nodes being upgraded over time (queens -> rocky)17:36
dansmithokay17:36
dansmith(ouch)17:36
efriedWhat's happening at that cat-head bump?17:36
efriedor possibly batman17:36
dansmithonline migrations?17:36
efriedthose look like trait requests17:37
efriedif I'm reading this right.17:37
belmoreirano, it must be a cell that upgraded and then stopped nova-compute17:37
*** ttsiouts has quit IRC17:38
belmoreiraso, in the second graph we can see all the new requests17:38
*** ttsiouts has joined #openstack-nova17:38
*** liuyulong has quit IRC17:39
belmoreiraUUID/traits ; ?in_tree; UUID/aggregates; ...17:39
*** k_mouza_ has joined #openstack-nova17:40
efriedright, so it looks to me like, before rocky, we weren't calling ?in_tree, UUID/aggregates, or ?member_of at all. Which makes sense.17:40
belmoreirayes, and this seems to be the reason of the increase of requests17:41
efriedbut also increased number of requests for inventories.17:42
belmoreirabut is a huge increase. Just added another graph with the response time of my placement infrastructure17:42
efriedI have to say, this isn't all that surprising.17:42
*** ttsiouts has quit IRC17:43
*** k_mouza has quit IRC17:43
efriedalthough, hm, I would have expected this jump in queens17:44
efriedbelmoreira: Was this an upgrade from queens, or from earlier?17:44
*** k_mouza_ has quit IRC17:45
belmoreiraefried from queens17:45
belmoreiraI could handle it by creating more placement nodes (x3). But it looks like too much...17:47
efriedbelmoreira: Can you give me a sense of what this timeline represents? At what point are all the upgrades done and the cloud in stable state?17:48
belmoreiraefried the nova/placement control plane was upgraded between 8:00 and 9:00. ~12:00 the compute nodes started to upgrade (this takes 24h for all of them to upgrade)17:50
belmoreiraat 12:00 (today) almost all compute nodes are in Rocky.17:51
efriedbelmoreira: So where it tails off at the end, that's when the upgrades are pretty much done?17:51
belmoreirathe load graphs shows when I added more capacity for placement17:52
efriedDo you have a graph for what it looks like right now?17:52
efriedI'm just wondering if it's a massive spike during upgrade, but then it evens back out afterward.17:52
efriedin which case... yeah17:52
belmoreiraefried I'm getting a new graph from now17:54
efriedthough once again, I wouldn't have expected e.g. ?in_tree to be zero at queens. That should be happening every periodic.17:55
dansmithefried: you mean you think it's startup storm?17:57
dansmithso every time they reboot computes they'll get this?17:57
efrieddansmith: If you reboot a thousand computes...17:58
efrieddansmith: I just wanted to understand *whether* it was startup storm.17:58
dansmithright but presumably they're not rebooting them every second17:58
dansmithack17:58
efriedWhether it goes back to normal once everything stabilizes17:58
openstackgerritMerged openstack/nova stable/rocky: De-dupe subnet IDs when calling neutron /subnets API  https://review.openstack.org/60833617:58
dansmiththey also know what upgrades look like17:58
efried(I don't)17:58
dansmithso the fact that they're concerned probably means something17:58
efriedHeh, I'm not tryng to weasel out of anything. Just trying to grok the problem domain.17:59
dansmithno, I know17:59
dansmithjust sain'17:59
dansmitheven if we just made the reboot storm a lot worse, that's something we probably need to look at18:00
belmoreiraefried a new graph from now18:00
*** derekh has quit IRC18:00
belmoreirait is flat at the end. That is the total number of requests that we handle now18:00
efrieddansmith: Can you sanity-check me on this, though - the _refresh_associations code is in queens, including _ensure_resource_provider invoking _get_provider_in_tree, which is what invokes the ?in_tree URI.18:01
efriedthe mystery being, why would they be seeing zero ?in_tree calls right before the upgrade?18:01
dansmithI just headed into a meeting I have to pay attention to18:01
belmoreirahumm. tssurya just pointed out the "resource_provider_association_refresh" configuration that we had in queens; we don't have it in rocky18:04
efriedmm, that'd explain a lot. Y'all added that to compensate for this kind of spike in placement traffic iirc18:05
belmoreiraefried that explains " I wouldn't have expected e.g. ?in_tree to be zero at queens"18:05
efriedyup18:05
mriedemi thought you totally nuked resource_provider_association_refresh rather than just set it to a large value?18:06
efriedbut also why all those things are zero before the upgrade and nonzero after. Like I was saying, I expect all this stuff to happen at the queens boundary, not rocky.18:06
belmoreirain queens we patched it and set it to a very large number (to not run again). And I missed it now. My fault!18:07
efriedIOW I suspect you would have seen the same graphs simply by turning that switch off and leaving your nodes at queens18:08
belmoreirabut the number of requests is really impressive! meaning that it is very difficult to keep this option in a large infrastructure18:08
efriedbelmoreira: I don't disagree with that.18:09
mriedemso by default, every compute (70K?) is refreshing inventory every 1 minute, and every 5 minutes it's also refreshing in_tree, aggregates and traits?18:10
efriedI would think moving it to a fairly generous interval and hoping your computes don't all hit that interval at the same time :)18:10
tssuryamriedem: yea18:10
mriedemand we don't use the aggregates stuff in compute yet at all from what i can tell18:10
mriedemit was there for sharing providers which we don't support yet18:10
efriedwell, didn't we start cloning host azs ?18:10
mriedemthat's in the API18:10
efriedbut we're not using that in the scheduler yet?18:11
mriedemthe mirrored aggregates stuff? yes there are pre-request placement filters that rely on it (or something external doing the mirroring)18:11
mriedemi'm not sure what that has to do with the cache / refresh for aggregates in all the computes18:11
mriedemiow, i'm not sure what the cache in the compute buys us18:12
efriedyeah, I'm actually trying to think what we actually use the cache for at all... right.18:12
*** ralonsoh has quit IRC18:13
efriedcdent has been grousing for a while that we should just be able to make placement calls when we need 'em.18:13
sean-k-mooneyif we really wanted to make the storm less likely we could use a random prime offset for the update18:14
efriedI was thinking it, but then you said it.18:14
mriedemoslo already does something like that for periodics18:15
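A minimal sketch of that kind of jitter, assuming a 300-second refresh interval: oslo.service looping calls accept an initial delay, so each compute can start its periodic at a random offset instead of in lock-step:

    import random
    from oslo_service import loopingcall

    def _refresh_associations():
        pass   # stand-in for the periodic placement refresh work

    interval = 300   # assumed refresh interval in seconds
    timer = loopingcall.FixedIntervalLoopingCall(_refresh_associations)
    timer.start(interval=interval,
                initial_delay=random.randint(0, interval))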
*** jdillaman has quit IRC18:15
efriedTrying to think what it would take to rip out the cache completely.18:15
belmoreiraI'm changing this option, it will take ~2h to propagate. Will let you know the result18:15
efriedack18:15
belmoreiraI have to leave now for some minutes. Thanks for all your help18:16
efriedo/18:16
efriedmriedem: We use the cache data so the virt driver has the opportunity every periodic to update the provider tree.18:17
efriedmriedem: assuming stable placement, no _refresh'ing, we would be doing a helluva lot fewer calls18:18
efriedand that's also why we cache agg data. Because upt gets to muck with that stuff also.18:18
mriedembut nothing does right now right?18:18
mriedemfor aggregates18:18
mriedemand assuming inventory isn't wildly changing on a compute node, we don't really need the cache18:19
efriednot sure I'm following.18:19
efriedare you saying "as long as nothing is changing, we don't need to call update_provider_tree" ?18:20
mriedemupdate_provider_tree is what returns the inventory from the driver to the RT to push off to placement every 60 seocnds18:20
mriedem*seconds18:20
mriedemright?18:20
efriedYes18:21
mriedemand assuming that disk/ram/cpu on a host doesn't change all that often, at least without a restart of the host, it seems odd we need to cache that information18:21
efriedBut how else would we know whether to push the info back to placement?18:21
*** ldau has joined #openstack-nova18:21
mriedemin the before upt times, didn't the RT/report client just pull inventory, compare to what was reported by the driver, and the PUT it back if there were changes?18:21
efriedWhat does "pull inventory" mean, though?18:22
*** jmlowe has joined #openstack-nova18:22
efriedpull from placement18:22
mriedemGET /resource_providers/{rp_uuid}/inventories18:22
efriedi.e. GET /rps/UUID/inventory18:22
efriedyeah18:22
sean-k-mooneyefried: well the driver could have a periodic check but remember the last value it sent and only send a value if it detects there was a change18:22
efriedsean-k-mooney: ^ cache18:22
sean-k-mooneythat not the same as a cache18:22
efriedand that's what we do18:23
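A minimal sketch of the "only PUT when something changed" idea discussed above, whether the baseline comes from a fresh GET or a remembered last value; get_inventory and put_inventory are hypothetical placeholders for the placement client calls:

    def sync_inventory(rp_uuid, reported, get_inventory, put_inventory):
        current = get_inventory(rp_uuid)   # GET /resource_providers/{uuid}/inventories
        if current == reported:
            return False                   # nothing changed, nothing to flush back
        put_inventory(rp_uuid, reported)   # push the driver-reported view
        return True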
mriedemget_provider_tree_and_ensure_root is what gets the provider tree from the report client and pulls the current inventory from placement, yes?18:23
mriedemand also checks to see that the provider exists on every periodic18:23
efriedyes18:23
mriedemwhich we should actually know18:23
efriedyeah, we could conceivably expect the compute RP not to disappear once we've created it.18:24
efriedI mean, I don't know how resilient we're trying to be in the face of OOB changes.18:24
efriedwe do offer a placement CLI, not just for GETs but for writes as well18:24
mriedemthe compute service record, compute node record, and rp can all be deleted if the compute service record is deleted18:25
mriedembut to get the compute service record back, you have to restart the compute service18:25
mriedemto recreate the record which would also re-create the compute node18:26
mriedemand then the RP18:26
mriedemsince we know https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L78018:26
sean-k-mooneyefried: true but in that case could we do what we do with neutron and have placement send a notification to nova that it was changed instead of polling18:26
mriedemwe can pass that down18:26
mriedemi'll hack something up quick18:26
efriedIOW, only have the create code path on start=True18:26
efriedand don't bother with the existence check otherwise18:26
mriedemnot even start true18:26
mriedemsince https://github.com/openstack/nova/commit/418fc93a10fe18de27c75b522a6afdc15e1c49f2 we have a flag to pass through when we create the compute node18:26
mriedemwe just don't plumb it far enough18:27
mriedemi can push up a change that does18:27
efriedmriedem: That's what pike looked like, though. The stuff that's causing the spike is necessary for *enablement* of nrp, which we haven't started using yet18:27
mriedemthat might save precious ms for belmoreira :)18:27
efriedso it seems useless atm18:27
efriedbut as soon as we get e.g. neutron or cyborg adding shit to the tree, we're going to need to do that ?in_tree call every periodic.18:27
efriedunless there's some kind of async notification hook to trigger a refresh18:28
efriedyeah, what sean-k-mooney said.18:28
mriedemi'm saying we can resolve this todo i think https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/scheduler/client/report.py#L101118:28
mriedemcan we agree on that?18:28
mriedemneutron sending an event is possible, but it's also per instance...18:29
mriedemnot per host18:29
efriedmriedem: unfortunately not anymore, because we plan to allow other-than-nova to edit the tree.18:29
mriedemincluding delete the compute node root provider?18:29
mriedemthat nova creates?18:29
efriedno, not that.18:29
mriedemwell isn't that what that todo is all about?18:29
mriedemcreate the resource provider for the compute node if it doesn't exist18:30
efriedno18:30
sean-k-mooneymriedem: we are going to allow them to manage their own subtrees only, so they won't be allowed to modify any nodes created by nova18:30
*** tssurya has quit IRC18:30
efriedmriedem: You could probably factor out *just* the root provider part of that; but you can't get rid of the whole method.18:31
mriedemi'm not saying remove the method18:31
efriedand the GET that _ensure_resource_provider is doing is the ?in_tree one that we can't get rid of anyway.18:31
mriedembecause of something external adding/removing things from the root18:35
mriedemright?18:35
sean-k-mooneywell external entitiy can only leagally add nested resouce providers to the root node they cant add invetories or traits18:36
sean-k-mooneytechnicall the api does not enforce that as we dont have owner of resouce proivers in the api however18:37
mriedemwhat i'm hearing is we can't remove this todo https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/scheduler/client/report.py#L1011 even if we know we didn't just create the root compute node, because we need to call it anyway to determine if there are new nested providers under that pre-existing compute node18:38
mriedems/remove/resolve/18:38
mriedemiow, the todo should be removed b/c we can't do anything about it18:38
mriedemeven if we *know* the compute node record was just created18:38
sean-k-mooneyam, i don't know if we need to know if there are new nested resource providers18:39
mriedemthat's the whole in_tree thing i thought18:39
sean-k-mooneyin fact i would assert as the compute node we don't need to know that18:39
mriedemok well what i'm saying is we (the RT) know when we created a new compute node record, and thus need to create its resource provider, i guess i'll wait for someone to tell me if that's worth doing so we can avoid the GET /resource_providers?in_tree=<uuid of the thing we just created and thus doesn't exist yet> case18:40
sean-k-mooneymriedem: in that case i think you are right, we don't need the /resource_providers?in_tree=<thing i just created> call18:42
sean-k-mooneythe placement api will not allow me to create a resource provider with a parent uuid that does not exist18:43
* mriedem shoots self and moves on18:43
efriedsean-k-mooney: We *do* need to know if there are new nested providers.18:44
sean-k-mooneyefried: there can't be nested resource providers of a compute node if we have not created the compute node yet right18:44
efriedThat's what ?in_tree is about, though.18:45
efried?in_tree=$compute_rp gives me the compute RP and any descendants.18:45
sean-k-mooneyyes18:45
sean-k-mooneybut if the compute RP does not exist yet then cyborg can't create nested resources under it18:46
efriedSo at T0, it gives me nothing, so I create the compute RP. At T1 it gives me the compute RP. At T2, cyborg creates a child provider for a device. At T3 ?in_tree=$compute_rp gives me both providers.18:46
efriedIf I didn't call ?in_tree I would never know about that device RP18:46
efriedand I need to know about that device RP e.g. from my virt driver so I can white/blacklist it and/or deploy it.18:46
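For reference, the call being discussed is a single GET with the in_tree query parameter, which returns the named provider and all of its descendants; in this sketch 'placement' is assumed to be a keystoneauth1 Adapter pointed at the placement service:

    def get_provider_tree(placement, compute_rp_uuid):
        resp = placement.get(
            '/resource_providers?in_tree=%s' % compute_rp_uuid,
            headers={'OpenStack-API-Version': 'placement 1.14'})
        # includes any child providers (e.g. device RPs) added by other services
        return resp.json()['resource_providers']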
sean-k-mooneyefried: sure you dont own that and as nova you cant directly modify it18:46
efriedUnclear whether blacklisting happens at cyborg or at nova18:47
efriedbut what you say makes sense (single ownership) so it would have to be at cyborg.18:47
*** mvkr has quit IRC18:47
efriedWhich means virt driver-esque code would need to be invoked by cyborg to do discovery in the first place.18:48
*** belmoreira has quit IRC18:48
*** cfriesen has quit IRC18:48
openstackgerritMatt Riedemann proposed openstack/nova master: Remove TODO from get_provider_tree_and_ensure_root  https://review.openstack.org/61483518:48
sean-k-mooneyefried: well not necessarily18:48
*** cfriesen has joined #openstack-nova18:49
sean-k-mooneywe did suggest that in update_provider_tree we could call os-acc to do that18:49
efriedyes18:49
efriedwait18:49
sean-k-mooneybut what cyborg needs to do is 1) look up the root provider for the compute node18:50
efriedupdate_provider_tree would call os_acc with a list of discovered-and-already-whitelist-scrubbed devices so that cyborg can create the providers?18:50
sean-k-mooneyefried: no18:50
sean-k-mooneyif nova is doing the discovery we are rebuilding cyborg in nova18:50
sean-k-mooneythe idea was that we would pass in the current tree to cyborg and it would do the discovery itself and append to that tree18:51
sean-k-mooneybut the other approch18:51
sean-k-mooneywhich is what we were going to do18:51
sean-k-mooneywas cyborg polling placement for the compute node to be created.18:52
sean-k-mooneythen it would add child resource providers to the tree created by nova18:52
sean-k-mooneybut not modify any resource provider it did not create18:52
sean-k-mooneyefried: today are we doing the provider tree update by put or patch. if put what would it take to make it a patch so nova can do a partial update and merge it on the placement side18:57
efriedmriedem: quick fix pls18:58
efriedsean-k-mooney: patch is only applicable if you're talking about modifying part of a single provider. Which I think we're not considering.18:59
efriedsean-k-mooney: IIUC, you're suggesting modifying some providers in the tree, but not others. That's still PUT - one per provider to be modified.18:59
efriedAnd it's what we do today, see update_from_provider_tree19:00
sean-k-mooneyefried: ok cool19:00
openstackgerritMatt Riedemann proposed openstack/nova master: Remove TODO from get_provider_tree_and_ensure_root  https://review.openstack.org/61483519:00
efried+2 ^19:00
*** ldau has quit IRC19:01
*** itlinux has quit IRC19:01
sean-k-mooneywhat i was actually suggesting was making it a single patch call to placement to update all resource providers in a tree owned by service x, but that's a different conversation19:03
efriedtotally19:03
sean-k-mooneyi am still not aware of a use case that would require nova to be aware of a resource provider created by another service, by the way19:04
sean-k-mooneywhen i say nova i specifically mean the compute agent19:05
*** pcaruana has quit IRC19:05
dansmithefried: so was there some outcome?19:07
efrieddansmith: Remember that thing they did where they disabled the refresh_associations?19:08
dansmithoh they reverted that?19:08
dansmithaccidentally19:08
efrieddansmith: That's why all those calls were zeroes in queens and nonzero once they upgraded (because that hack was no longer there). Yeah.19:08
dansmithah cool.19:09
efriedSo belmiro is going to reinstate that and come back at us.19:09
dansmithright on19:09
efriedbut it's still a shit ton of calls19:09
efriedMatt and Sean and I brainstormed briefly on whether we could just get rid of the cache completely (and whether that would actually help).19:10
efriedAnd what we actually ended up doing was getting rid of a comment: https://review.openstack.org/614835 :(19:10
sean-k-mooneyefried: well i think we could maybe get rid of the cache but i think it needs a spec not irc ideas19:13
efriedsean-k-mooney: I would rather see a PoC in code for that one than a spec.19:13
sean-k-mooneyefried: well that's a possibility but it would be similar to neutron's notifications for port/network events19:14
efriedyeah, a subscribable notification framework at placement itself would be cool.19:15
sean-k-mooneyyep that is what i was about to type but had not decided how to phrase it19:15
mriedemthere was a blueprint for that at one point i think19:15
mriedemhttps://blueprints.launchpad.net/nova/+spec/placement-notifications19:16
sean-k-mooneywell this is a much more positive response to this idea than i had expected19:16
sean-k-mooneyif we had an owner attribute on every provider and an api to register owners with callbacks then each service could register a subscription to the rps they own19:17
mriedemand if ifs and buts were candy and nuts we'd all have a merry christmas19:19
*** tbachman has quit IRC19:19
efriedI was thinking much simpler to start.19:19
efriedYou could register for a notification any time $rp_uuid is touched.19:19
efriedwhich includes "create a resource provider with $rp_uuid as a root", which solves the use case we were discussing.19:20
sean-k-mooneyefried: i considered that but that could be a lot of RPs19:20
efriedOnly one per host19:20
sean-k-mooneythat said i guess you would only have to do that on creating the rp the first time19:20
sean-k-mooneyat least for clean deployments19:20
cfriesensean-k-mooney: does nova-compute need to know about the child resource providers that cyborg created?19:20
*** panda is now known as panda|off19:20
sean-k-mooneycfriesen: i dont think so19:21
sean-k-mooneycfriesen: not in any of the interaction specs i have seen19:21
efriedcfriesen: Yeah, we talked about that above; the virt driver needs to know about them for purposes of whitelisting, deploying/attaching, etc.19:21
sean-k-mooneyefried: no it does not19:21
sean-k-mooneythe whitelisting is cyborg's job19:21
*** ldau has joined #openstack-nova19:22
sean-k-mooneyand deploy/attaching will now be done in terms of the VARs or whatever the equivalent of a port binding has become19:22
cfriesenassuming nova owns a specific set of resource providers, and it's the only thing consuming from those resource providers (can we assume that) then it should only have to update inventory once at startup.19:22
openstackgerritMerged openstack/os-vif master: Do not import pyroute2 on Windows  https://review.openstack.org/61472819:23
ldauHi, has somebody installed all-in-one openstack using vmware as the hypervisor?19:23
sean-k-mooneycfriesen: that is the assumption we stated in denver so yes i think that is still true19:23
cfriesennow if anything else can consume those resources, then we need the periodic inventory update19:23
efriedsean-k-mooney, cfriesen: if all of that is true, then we can indeed resolve the TODO that mriedem just blew away.19:23
efriedcfriesen: You mocking this up?19:24
cfriesennot me. :)19:24
efriedcfriesen: "Consume the resources" doesn't matter. Changing inventory matters, but only for the providers I onw.19:24
efriedown19:24
efriedwe don't cache allocation/usage data.19:25
cfriesenefried: agreed.  I guess it'd have to be something like CPU/RAM hotplug where it actually changes the inventory19:25
efriedBut that would be noticed by the virt driver, which would update_provider_tree, which the rt would flush back.19:26
sean-k-mooneycfriesen: it would be the virt driver that would do that however19:26
efriedIOW, we're not getting rid of the cache. We're getting rid of all the cache *refreshing*.19:26
sean-k-mooneyefried: yes19:26
efriedsean-k-mooney: you mocking this up?19:26
efriedor is mriedem?19:26
mriedemi stopped paying attention, what now?19:27
sean-k-mooneyefried: if we just disable the refresh in the config does that not effectively mock it up19:27
efriedmriedem: We're operating on the hypothesis that nova does *not* in fact need to know if outside agents create child providers that they will continue to own.19:27
efriedmriedem: And if that's true, we do *not* in fact need to refresh the cache, ever.19:28
efriedmriedem: So we *can* in fact resolve the TODO you just removed.19:28
efriedsean-k-mooney: More or less, yeah. Which is what CERN did. Which they seem to have had success with.19:28
mriedemhow about someone poop this out in the ML and get it sorted out there when gibi, jaypipes and cdent can also weigh in19:28
efriedsean-k-mooney: I think there's more we can do, though.19:29
mriedemi think in general, we should default to *not* cache b/c of the cern issue, and only allow caching if you want to opt-in b/c you have a wildly busy env where inventory is changing a lot19:29
efriedmriedem: What was your idea to get the compute RP creation happening only once?19:29
mriedemwhich i guess is a powervm thing19:29
sean-k-mooneyefried: yes probably19:29
mriedemefried: yes, create the compute node root rp when the compute node is created, otherwise don't attempt to do the _ensure_resource_provider thing again19:29
mriedemi.e. is it a powervm thing to be swapping out disk and such on the fly and expect nova-compute to just happily handle that?19:30
cfriesenmriedem: changing inventory, or changing usage?19:30
mriedeminventory19:31
mriedemanyway, that probably doesn't matter here,19:31
mriedemwe get the inventory regardless to know if it changed so we can push updates back to placement19:31
cfriesendo we expect anyone to have wildly changing inventory?  I would have thought inventory is relatively stable.19:31
mriedemcfriesen: that's what i said about 2 hours ago19:31
sean-k-mooneymriedem: if it's something that is discovered by the virt driver it's not an issue if it changes19:31
mriedemi also don't think we really need to worry about refreshing aggregate relationships in the compute,19:32
efriedmriedem: I contend it doesn't matter if powervm (or any driver) changes inventory every single periodic. Because update_provider_tree is getting whatever the previous state of placement was (because placement isn't changed yet) and then update_from_provider_tree is flushing that back to placement *and* updating the cache accordingly.19:32
mriedemsince we don't do anything with those yet19:32
*** belmoreira has joined #openstack-nova19:32
efriedyeah, what sean-k-mooney said, only bigger.19:32
sean-k-mooneyso do we all agree we don't need to refresh the cache in any case that is at least immediately obvious to us19:33
efriedtentatively yes19:33
mriedemi would have thought a lot of prior discussion about why we even have a cache in the first place has happened19:34
mriedemtherefore it seems pretty severe to just all of a sudden say, "oh i guess we don't"19:34
mriedemand if there are reasons, do those reasons justify us caching by default19:35
mriedemanyway, those are questions for the ML, not irc19:35
sean-k-mooneyif so then should we start with a patch to default the refresh to off, and a mailing list post to see what operators think / other feedback19:35
mriedemcern is the only deployment big enough and new enough that i've heard complain about that refresh19:35
mriedemnot sure if mnaser is doing anything about it19:36
mnaserhi19:36
* mnaser reads19:36
mriedemmnaser: tl;dr do you turn this way down? https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh19:36
mriedemto avoid computes DDoS'ing placement every 5 minutes19:37
mnaseri didn't know about that but i do feel like placement gets waaaaay too much unnecessary traffic19:37
mnaseran idle cloud (aka literally no new vms being created/deleted) will constantly hit placement19:38
* mnaser never understood why19:38
efriedso mriedem, to do the thing you were talking about earlier, we would add is_new_compute_node as a kwarg from _update_available_resource into _update => _update_to_placement and then only call _get_provider_tree_and_ensure_root if it's true?19:38
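Roughly the plumbing being proposed, as a sketch only: get_provider_tree_and_ensure_root, update_provider_tree and update_from_provider_tree are real names from the discussion, while the is_new_compute_node flag and the get_provider_tree cache accessor are assumptions about how it might be wired:

    def _update_to_placement(self, context, compute_node,
                             is_new_compute_node=False):
        if is_new_compute_node:
            # talk to placement (and create the root RP) only the first time
            prov_tree = self.reportclient.get_provider_tree_and_ensure_root(
                context, compute_node.uuid,
                name=compute_node.hypervisor_hostname)
        else:
            # afterwards, trust the locally cached tree
            prov_tree = self.reportclient.get_provider_tree(compute_node.uuid)
        self.driver.update_provider_tree(
            prov_tree, compute_node.hypervisor_hostname)
        self.reportclient.update_from_provider_tree(context, prov_tree)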
mriedemmnaser: inventory updates baby!19:38
mnaseri'm all in favour of minimizing the amount of traffic that placement gets however a lot of times we end up seeing weird stuff happen in placement db19:38
mnaserso if we have to edit stuff via the api19:38
mnaserit'd be nice to just know they work19:39
*** itlinux has joined #openstack-nova19:39
mriedemefried: that's what i was thinking yeah, and nearly started writing it, but then you guys all said in_tree was mega importante19:39
efriedmnaser: True story. If you edit something by hand, and we've switched off all this cache refreshing, the only way you're going to pick it up again is to restart the compute service.19:39
efriedor maybe we can work a HUP in there or something.19:39
mriedemHUP is how we refresh the enabled/disabled cell cache in the scheduler19:40
sean-k-mooneyand any of the mutable config stuff if we have it so HUP makes sense19:40
mnaserbut really the only editing ive had to do was delete/remove allocations19:40
mnaserthat were stale for $reasons19:40
sean-k-mooneymnaser: oh if it's allocations that's fine19:40
efriedmriedem: based on having toggled some kind of setting, right? I.e. HUP is a no-op if you haven't changed anything.19:40
mnaserso forgive me for my silly question but19:40
mnaserwhy do computes care about that information19:41
efriedmnaser: which information specifically?19:41
efriedallocations?19:41
mnaserwell, whatever api hits all the time, i think its allocations?19:41
efriedwhat hits the API all the time is inventory, providers, aggregates.19:41
mnaserbut inventory can be a one time hit on start up because afaik the state of that is pretty darn static right19:42
sean-k-mooneyno this checks aggregates and traits on the provider not the associations19:42
efriedmnaser: And what we're discussing is, the virt drivers need to know what that stuff looks like so they can *modify* the provider layout, inventory, etc.19:42
sean-k-mooney*allocations19:42
efriedBUT that stuff shouldn't change unless the virt driver changes it19:42
efried...or if you muck with it in the CLI :)19:42
* mnaser should probably read placement for dummies19:42
mnaseryeah usually my mucking around is around allocations, that's where things get out of sync usually19:43
efriedmnaser: Placement for dummies won't help you. And for placement-in-nova, there's no such thing as for-dummies.19:43
mnaserbut i mean the "traits" change dynamically?19:43
mnaseranyways, i wont let your discusion diverge too much19:43
efriedno, traits is a good point, /me thinks...19:44
mriedemmnaser: you are asking and saying the same thing i've been saying for an hour or so,19:44
mriedemthat inventory is pretty static19:44
mnaserits 100% static.. things in inventory are19:44
sean-k-mooneymnaser: we technically proposed that operators should be able to add traits to RPs but currently the virt driver just overwrites them i think19:44
efriedonce again, if traits change due to external factors, you could HUP to get that flushed.19:44
mriedemtraits could be changed out of band if you're decorating capabilities on your compute node for scheduling,19:44
mnasermemory.. disk.. vcpus..19:44
mriedemand aggregates aren't used in the compute service (yet)19:44
mnaseryeah unless someone is hot plugging in memory/disk/cpu19:45
mnaseri dont see inventory changing19:45
mnaserand yes i agree traits make sense if you wanna say this compute node is special.. but also, does that compute node really care to know if its special?19:45
mnaseronly the scheduler cares that it's special..19:45
sean-k-mooneymriedem: well the only ones that are, are the ones created from nova host aggregates (assuming jay's stuff landed last cycle)19:45
mriedemmnaser: almost correct19:45
efriedyou could run into some interesting race conditions. If you muck with traits at the same time as the virt driver is mucking with traits, whoever gets there last will win.19:45
mriedemright, we try to merge in what the compute is reporting for traits with what was put on the node resource provider for traits externally19:45
*** fghaas has joined #openstack-nova19:46
sean-k-mooneymriedem: sorry, you said compute service, ignore that19:46
mnaserok so its like19:46
mriedemthe *only* thing on the compute that would care about aggregates in placement is shared storage providers,19:46
mriedemwhich we don't support yet19:46
mnaserself reported traits + "user" decorated traits19:46
mriedemmnaser: yes19:46
mnaserbut the nova-compute reported traits.. i feel like those are pretty static right? maybe i just don't know any wild use cases19:47
mriedemi think i've been saying since pike, at least queens, we don't need to refresh aggregate info in compute19:47
mriedemmnaser: probably depends on the driver19:47
mnaserbut i feel like most of the time, if a system trait changes, you probably have nova-compute restart things19:47
mnaserah ok19:47
mriedemvmware would love to be able to randomly proxy traits from vcenter through nova-compute to placement19:47
cfriesenmnaser: the one exception would be something like vTPM where the driver uses the presence of the requested trait to decide to do something with the instance.19:47
mriedemfor changes in vcenter19:47
sean-k-mooneymnaser: the intent with the user decorated traits was to let the operator tag nodes with stuff the virt driver can't discover or to express policy19:47
cfriesenmnaser: but that's really looking at the requested trait, not the trait on the resource provider19:48
sean-k-mooneycfriesen: but that is in the instance request19:48
mnaseryeah, i dunno, i feel like those will not change much, and i think just calling a method *once* when you make some changes rather than all the time isn't problematic19:48
sean-k-mooneycfriesen: it does not need the RP info19:48
cfriesensean-k-mooney: yah, that's what I realized after typing the first sentence. :)19:48
*** tbachman has joined #openstack-nova19:48
mnaseri mean lets be honest, we don't "refresh" allocations and those can go pretty stale19:48
mnaserare "system" and "user" traits distingusable or in a race they can wipe each other out?19:49
*** ldau has quit IRC19:49
mriedemsystem traits might be 'standard' traits19:49
mriedemuser traits would be CUSTOM traits19:49
sean-k-mooneyefried: i assume we update the resource provider generation when updating traits?19:49
mriedembut a user could put standard traits on a provider that the virt driver doesn't report19:49
mriedemsean-k-mooney: yes19:49
efriedsean-k-mooney: Placement does19:49
cfriesensean-k-mooney: on a totally different topic, have you ever run into a scenario where qemu has a thread sitting at 100% cpu but not making any forward progress?  I'm assuming it's a livelock somehow, just not sure how.19:50
sean-k-mooneyya so one of either the user or the virt driver will fail in that case and have to retry19:50
mriedemi think the only virt driver today that reports any traits is the libvirt driver reporting cpu features19:50
efriedso yeah, this is where re-GET-and-redrive comes into play.19:50
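A sketch of that re-GET-and-redrive pattern against the placement traits API; the GET/PUT payload shape and the 409-on-stale-generation behaviour are real, while the placement client here is assumed to be a keystoneauth/requests-style session:

    def set_traits_with_retry(placement, rp_uuid, extra_traits, attempts=3):
        for _ in range(attempts):
            current = placement.get(
                '/resource_providers/%s/traits' % rp_uuid).json()
            payload = {
                'resource_provider_generation':
                    current['resource_provider_generation'],
                'traits': sorted(set(current['traits']) | set(extra_traits)),
            }
            resp = placement.put(
                '/resource_providers/%s/traits' % rp_uuid, json=payload)
            if resp.status_code != 409:  # 409 == someone else won the race
                return resp
        raise RuntimeError('trait update still conflicting after %d tries'
                           % attempts)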
sean-k-mooneyso there is no race19:50
mnaseri still don't see the need for constantly updating, sounds like this is something that you can just report once on start up or when it changes19:50
sean-k-mooneywell unless the client is coded badly19:50
efriedmnaser: Yes19:50
mnaserMAYBE pull it in when a new VM gets spun up19:51
*** mvkr has joined #openstack-nova19:51
mnaserif it means a different codepath19:51
*** tbachman_ has joined #openstack-nova19:51
sean-k-mooneyso currently resource_provider_association_refresh has a min value of 1. can we allow 0 and define that to mean update the cache only on startup or SIGHUP19:51
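A sketch of what treating 0 as "never refresh after startup" might look like; the option name is real, the surrounding helper is invented:

    import time

    def associations_stale(last_refreshed_at, refresh_interval, now=None):
        # refresh_interval mirrors [compute]/resource_provider_association_refresh;
        # 0 would mean "only rebuild the cache at startup or on SIGHUP"
        if refresh_interval == 0:
            return False
        now = time.time() if now is None else now
        return (now - last_refreshed_at) > refresh_interval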
*** cfriesen has quit IRC19:52
*** cfriesen has joined #openstack-nova19:53
sean-k-mooneymriedem: did you not have a proposal to report things like support_migration as traits too19:53
cfriesendunno what's going on today, I keep disconnecting19:53
sean-k-mooneymriedem: or is that handled by the compute manager above the virt driver level19:54
*** tbachman has quit IRC19:54
*** tbachman_ is now known as tbachman19:54
mnaseranyways thats my 2 cents19:54
* mnaser goes back to dealing with rocky upgrades19:54
efriedmriedem, sean-k-mooney: Is ComputeManager.reset() the right hook for that SIGHUP thing?19:56
*** tbachman has quit IRC19:56
mriedemsean-k-mooney: https://review.openstack.org/#/c/538498/19:56
mriedemefried: yeah19:56
sean-k-mooneymriedem: ah ya that is what i was thinking of19:56
openstackgerritMerged openstack/nova master: PowerVM upt parity for reshaper, DISK_GB reserved  https://review.openstack.org/61464319:57
efriedmriedem: So like if I wanted to "clear the cache" I could add19:58
efried        self.scheduler_client = scheduler_client.SchedulerClient()19:58
efried        self.reportclient = self.scheduler_client.reportclient19:58
efriedto that method19:58
efriedor if I wanted to be narrower about it, I could add a reset() to the report client and invoke self.reportclient.reset() from there instead.19:58
mriedemi reckon19:59
sean-k-mooneyefried: reset might be better19:59
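A sketch of that SIGHUP hook; ComputeManager.reset() exists, but clear_provider_cache() here is a hypothetical report client method standing in for whatever the cache-dropping call ends up being:

    class ComputeManagerSketch(object):
        """Illustrative only, not the real nova.compute.manager.ComputeManager."""

        def __init__(self, reportclient):
            self.reportclient = reportclient

        def reset(self):
            # invoked on SIGHUP: drop the cached provider tree/associations so
            # the next periodic repopulates them from placement
            self.reportclient.clear_provider_cache()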
mriedemthis just seems like a 180 in attitude about how important it is to have nova-compute be totally self-healing on every periodic19:59
mriedemwhich i'm sure was debated to death in releases past19:59
sean-k-mooneymriedem: well nothing is stopping us from also reviving the notification idea to move the healing to a push model20:00
mriedemthere are plenty of things stopping me from doing anything20:01
openstackgerritMatt Riedemann proposed openstack/nova master: Clean up cpu_shared_set config docs  https://review.openstack.org/61486420:01
efriedmriedem: I think a lot of the reasoning in the past was because information was coming from several different places while we were transitioning to placement but not fully there yet.20:02
efriedWe're getting pretty close to fully there at this point, so I think a lot of this stuff is going to get to be cleaned up.20:02
efriedLike _normalize_allocation_and_reserved_bullshit()20:03
mriedemefried: maybe, but i specifically remember you bringing up something once about how powervm shared storage pools can have disk swapped in and out on a whim and nova-compute should be cool with reporting that as it changes - but maybe that's unrelated to this, idk20:04
efriedmriedem: 1) that's not implemented yet, but 2) even when it is, that gels just fine with this, precisely because it's being done by virt.powervm.update_provider_tree and *not* out of band.20:04
mriedemok i thought it was to handle some out of band thing20:05
mriedembut it was a while ago and i've been high on ether since then20:06
sean-k-mooneywell in addition to the SIGHUP stuff + disabling the cache refresh via config = 0, if we inject a sleep(random(refresh interval)) seconds into the specific periodic task once, the jitter should spread out the updates over the entire interval smoothly on average20:06
sean-k-mooneyso for those that don't turn this off the same amount of updates to placement will happen just not all at once every x seconds20:07
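A minimal sketch of that jitter idea: offset each compute's first refresh by a random fraction of the interval so they do not all hit placement in the same second. Plain threading/sleep is used here purely for illustration; nova's periodic tasks are driven differently:

    import random
    import time

    def run_with_jitter(task, interval_secs):
        # one-time random startup offset spreads the computes across the interval
        time.sleep(random.uniform(0, interval_secs))
        while True:
            task()
            time.sleep(interval_secs)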
*** eharney has quit IRC20:08
mriedemi imagine cern would appreciate a way to disable the refresh altogether since they are already doing that out of tree20:10
efriedI'm working something up now.20:11
sean-k-mooneyefried: code or ml post or spec20:11
*** belmoreira has quit IRC20:12
efriedsean-k-mooney: code20:13
efriedIf it gets traction, I can spec it.20:13
sean-k-mooneycool20:13
*** itlinux has quit IRC20:14
sean-k-mooneymriedem: by the way i know you're busy with other stuff but do you plan to revive https://review.openstack.org/#/c/538498/ at some point20:15
*** itlinux has joined #openstack-nova20:15
mriedemit's pretty low priority20:16
sean-k-mooneymriedem: ok i starred it. so in the unlikely event i run out of things to do i might take a look at it if you don't get back to it. but ya there are many things ahead of it20:18
openstackgerritMatt Riedemann proposed openstack/nova-specs master: High Precision Event Timer (HPET) on x86 guests  https://review.openstack.org/60798920:27
mriedemcfriesen: jackding: ^ i cleaned that up, +220:29
cfriesenmriedem: sweet, thanks.  any chance you could take another look at the vtpm one?20:30
cfriesensean-k-mooney: you too20:30
sean-k-mooneycfriesen: am sure20:30
sean-k-mooneycfriesen: i'm respinning a patch but i'll take a look after20:30
mriedemffs yes you know i'd love to20:30
cfriesenyou're so sweet20:30
*** KeithMnemonic has joined #openstack-nova20:31
mriedemi have my moments20:32
mriedemonce per quarter20:32
KeithMnemonicmriedem: can someone help move this along https://review.openstack.org/#/c/611326/1 ?20:32
*** dave-mccowan has quit IRC20:32
* cfriesen snags another bag of leftover halloween snacks20:32
mriedemKeithMnemonic: umm, melwitt and/or dansmith could probably hammer that through20:32
mriedemKeithMnemonic: how far back do you need that fix?20:34
KeithMnemonicthanks melwitt: dansmith: can you help here ?20:34
KeithMnemonicjust pike20:34
mriedemok i can work on the queens and pike backports in the meantime20:35
KeithMnemonicbut it needs to get in rocky first then20:35
mriedemyup20:35
melwittlooking20:35
KeithMnemonicthanks for helping out!!20:35
*** cdent has joined #openstack-nova20:43
*** slaweq has joined #openstack-nova20:50
openstackgerritsean mooney proposed openstack/os-vif master: add support for generic tap device plug  https://review.openstack.org/60238421:02
openstackgerritsean mooney proposed openstack/os-vif master: add isolate_vif config option  https://review.openstack.org/61253421:02
*** erlon has quit IRC21:05
openstackgerritsean mooney proposed openstack/os-vif master: always create ovs port during plug  https://review.openstack.org/60238421:07
openstackgerritsean mooney proposed openstack/os-vif master: add isolate_vif config option  https://review.openstack.org/61253421:07
sean-k-mooneyjaypipes: sorry for the delay, i should have addressed all your comments in ^ i have also reworded the commit message for the first patch to clarify things a little21:08
mriedemcfriesen: done https://review.openstack.org/#/c/571111/21:10
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Fix NoneType error in _notify_volume_usage_detach  https://review.openstack.org/61486821:11
cfriesenthanks.  do you think we should deal with shelve/unshelve as part of this, given that it's broken for UEFI nvram currently?21:11
mriedemi think if you're not going to deal with it now, it should be explicitly called out as a limitation21:12
cfriesenokay, happy to do that21:12
mriedemhappier than adding shelve support anyway :)21:12
cfriesenI think for both cases we'd need to store those files somewhere, either in glance or maybe swift (if present)21:13
mriedemnova doesn't do anything with swift directly so idk21:14
mriedemif only we had switched to glare 3 years ago when they wanted us to21:14
cfriesenfyi, there are actual differences between 1.2 and 2.0 other than CRB21:15
mriedemi figured maybe there were, but idk what they are21:16
mriedembut assume people that care about using this would know the difference21:16
cfriesenme too. :)21:16
mriedemooo https://www.dell.com/support/article/us/en/04/sln312590/tpm-12-vs-20-features21:16
cfriesenmy impression is that this stuff is all crazy complicated21:17
sean-k-mooneycfriesen: yes yes it is21:17
mriedemcool, let's add it to nova!21:17
mriedemWHAT COULD GO WRONG?!21:17
sean-k-mooneymriedem: well a version number is a lot better than traits for all the crap added in each version21:18
cfriesenyou're giving me nightmares21:18
mriedemi'm fine with reporting the different versions as traits21:19
mriedemhttps://en.wikipedia.org/wiki/Trusted_Platform_Module#TPM_1.2_vs_TPM_2.0 could be a reference in the spec if we cared21:19
mriedemsounds like 2.0 is more secure21:19
sean-k-mooneycfriesen: the cloud platform group gave me them first when they wanted me to enable tpm traits 12 months ago21:19
*** awaugama has quit IRC21:19
sean-k-mooneymriedem: yes it is21:20
sean-k-mooneymriedem: when i was originally trying to standardise tpm traits i had multiple version traits https://review.openstack.org/#/c/514712/3/os_traits/hw/platform/security.py21:21
sean-k-mooneybut honestly 1.2 and 2.0 are all that matter21:22
sean-k-mooneyas far as i know very few deployments of tpm 1.0 or 1.1 were ever a thing21:22
cfriesenon a totally different topic, I'd like to draw your attention to https://review.openstack.org/#/c/473973/21:24
cfriesenoriginally we used these for the nova/neutron update where we were being blasted with a bunch of neutron updates.  now with the changes to get fewer neutron updates it's probably not as big a deal, but we might want to consider using the fair locks in a few places.21:26
sean-k-mooneycfriesen: so these are basically the opposite of priority locks hehe21:27
cfriesenthey're like ticket spinlocks21:27
sean-k-mooneycfriesen: just looking at the implementation21:29
cfriesenthe original problem we hit was that the nova-compute thread handling "real work" (like a migration or something) was being starved by tons of incoming neutron events that always got the lock first21:29
cfriesensean-k-mooney: for simplicity it uses the fact that fasteners writer locks are queued21:29
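A minimal sketch of the fair-lock idea from that review, leaning on the fact mentioned above that fasteners queues writers; the named-lock registry is an illustrative stand-in, not the oslo.concurrency API:

    import fasteners

    _fair_locks = {}

    def fair_lock(name):
        # per-name ReaderWriterLock; taking only the write side gives FIFO
        # (ticket-like) ordering between contending threads
        return _fair_locks.setdefault(name, fasteners.ReaderWriterLock())

    def critical_section():
        with fair_lock('compute_resources').write_lock():
            # long-running work is no longer starved by a flood of short
            # event-handler acquisitions, since waiters run in arrival order
            pass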
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Fix NoneType error in _notify_volume_usage_detach  https://review.openstack.org/61487221:30
sean-k-mooneycfriesen: how does this interact with and without eventlet's monkeypatching21:31
cfriesenshould just work.  the underlying stuff is threading.Condition21:33
*** slaweq has quit IRC21:36
sean-k-mooneycfriesen: one comment if you respin the patch but ya it's neat21:38
sean-k-mooneythat said if you did not need a named lock you could just use the reader/writer lock directly21:39
jackdingmriedem: I forgot to push my change, Thank you for doing that.21:45
sean-k-mooneycfriesen: so for realtime guests do you care that we can't disable the performance monitoring unit in the libvirt xml in nova22:06
cfriesensean-k-mooney: I don't think it's come up.  Do they default to on?22:10
sean-k-mooneycfriesen: yep22:10
sean-k-mooneyi have no idea what the impact of that is22:10
sean-k-mooneyi assume low22:10
sean-k-mooneybut i have an internal email asking about turning on realtime instances and that was the only item that is not already supported upstream22:11
sean-k-mooneyi could write a patch to allow disabling it in like an hour, just not sure it's worth my time and/or if people would accept the patch if i did22:11
cfriesensean-k-mooney: I don't see a "perf" section if I do "virsh dumpxml"22:12
cfriesenmaybe we default it to off or something in libvirt22:13
sean-k-mooneyit's in this section https://libvirt.org/formatdomain.html#elementsFeatures22:13
sean-k-mooneyand it defaults to on in libvirt22:14
cfriesen"virsh domstats --perf <domain>" gives me nothing22:16
sean-k-mooneycfriesen:  virsh dumpxml | grep pmu ?22:16
*** threestrands has joined #openstack-nova22:17
cfriesennothing22:17
sean-k-mooneywhat version of libvirt are you running22:17
sean-k-mooneythe docs could be wrong22:17
cfriesen3.5.022:18
sean-k-mooneyand qemu22:18
cfriesenqemu-kvm-ev-2.10.022:18
*** cdent has quit IRC22:19
sean-k-mooneyok it said since 1.2.12, i'll assume the docs are wrong until they show me a vm xml with this from an openstack instance22:19
cfriesenI have a specific CPU model though, not host-passthrough, if that matters22:20
sean-k-mooneyit may, in this case it was using host-passthrough22:21
sean-k-mooneythat said pmu is not a cpu flag so it should not22:21
*** mriedem has quit IRC22:23
sean-k-mooneyactually maybe when the default is on it just does not include it in the xml22:23
sean-k-mooneyi'll get them to verify it's actually on before spending any more time on it. thanks cfriesen :)22:24
cfriesenhow do we handle long URLs in specs?22:24
sean-k-mooneyi believe flake8 ignores them22:25
sean-k-mooneyat least it appeared to in the ones i was writing so i just put them in the references section and use [0]_ to refer to them22:25
sean-k-mooneyi don't believe there is an openstack url shortener so just use google or something else if you need to22:26
cfriesenhmm..just had a thought.  is there a way to schedule based on libvirt version?22:32
sean-k-mooneycfriesen: nope but you could have a trait22:33
* sean-k-mooney ducks before jaypipes see ^22:33
cfriesenheh.  actually, I think I'm okay.  I have a trait for TPM 2.0, and that requires libvirt 4.5 which will also support CRB22:34
sean-k-mooneyya i think realistically we don't want to expose software versions as things to schedule on and should use features instead22:35
sean-k-mooneyTPM 2.0 is different as that is referring to an iso standard and well they take a bit more time to have revisions and get implemented in hardware22:36
jaypipessean-k-mooney: you're now officially on the naughty list.22:38
sean-k-mooneyhehe i did say lets not use traits for this :) also was i ever not?22:38
*** fghaas has quit IRC22:45
jaypipes:)22:48
*** KeithMnemonic has quit IRC22:55
openstackgerritEric Fried proposed openstack/nova master: WIP: Trust the report client cache more  https://review.openstack.org/61488623:06
efriedmriedem, sean-k-mooney, jaypipes, cfriesen, belmoreira: ^^23:07
efriedI should link today's IRC discussion in there. But I gotta run riiight now.23:07
*** owalsh_ has joined #openstack-nova23:14
*** owalsh has quit IRC23:15
*** tbachman has joined #openstack-nova23:19
*** spatel has joined #openstack-nova23:21
*** mvkr has quit IRC23:22
*** mvkr has joined #openstack-nova23:23
*** tbachman has quit IRC23:25
*** spatel has quit IRC23:25
*** mlavalle has quit IRC23:36
*** Swami has quit IRC23:53
*** gyee has quit IRC23:58
