Friday, 2019-09-20

*** igordc has quit IRC00:10
openstackgerritsean mooney proposed openstack/nova-specs master: resubmit image metadata prefiltering spec for ussuri  https://review.opendev.org/68325800:21
*** mriedem_away has quit IRC00:22
*** gyee has quit IRC00:28
*** elod has quit IRC00:29
*** mmethot_ has quit IRC00:40
*** mmethot has joined #openstack-nova00:40
*** mmethot has quit IRC00:45
*** TxGirlGeek has quit IRC00:45
*** mmethot has joined #openstack-nova00:45
*** markvoelker has joined #openstack-nova00:46
*** mmethot has quit IRC00:46
*** mmethot has joined #openstack-nova00:47
*** elod has joined #openstack-nova00:49
*** markvoelker has quit IRC00:50
*** mmethot has quit IRC00:55
*** mmethot_ has joined #openstack-nova00:55
*** Tianhao_Hu has joined #openstack-nova01:06
*** Tianhao_Hu has left #openstack-nova01:06
*** mlavalle has quit IRC01:09
*** larainema has quit IRC01:15
*** takashin has left #openstack-nova01:30
*** yedongcan has joined #openstack-nova01:37
*** ricolin has joined #openstack-nova01:55
*** boxiang has joined #openstack-nova02:13
*** tbachman has quit IRC02:24
*** mkrai_ has joined #openstack-nova02:45
*** mkrai_ has quit IRC02:50
*** tinwood has quit IRC02:50
*** tinwood has joined #openstack-nova02:52
*** boxiang has quit IRC02:54
*** zhubx has joined #openstack-nova02:54
*** mkrai has joined #openstack-nova02:57
*** cfriesen has quit IRC02:59
*** ricolin_ has joined #openstack-nova03:04
*** ricolin has quit IRC03:06
*** zhubx has quit IRC03:13
*** boxiang has joined #openstack-nova03:13
*** TxGirlGeek has joined #openstack-nova03:27
*** psachin has joined #openstack-nova03:35
*** dave-mccowan has quit IRC04:19
*** BjoernT has joined #openstack-nova04:28
*** BjoernT_ has joined #openstack-nova04:32
*** BjoernT has quit IRC04:33
*** ratailor has joined #openstack-nova04:40
*** _mmethot_ has joined #openstack-nova04:41
*** mmethot_ has quit IRC04:41
*** Luzi has joined #openstack-nova05:01
*** BjoernT_ has quit IRC05:02
*** BjoernT has joined #openstack-nova05:04
*** BjoernT has quit IRC05:05
*** TxGirlGeek has quit IRC05:07
*** BjoernT has joined #openstack-nova05:07
*** BjoernT has quit IRC05:11
*** TxGirlGeek has joined #openstack-nova05:11
*** BjoernT has joined #openstack-nova05:12
*** pcaruana has joined #openstack-nova05:28
*** BjoernT has quit IRC05:29
*** yedongcan has quit IRC05:38
*** pcaruana has quit IRC05:53
*** ricolin_ is now known as ricolin05:54
*** jawad_axd has joined #openstack-nova05:58
*** jawad_ax_ has joined #openstack-nova06:02
*** jawad_axd has quit IRC06:03
*** TxGirlGeek has quit IRC06:08
*** slaweq has joined #openstack-nova06:19
*** psachin has quit IRC06:23
*** xek has joined #openstack-nova06:36
*** psachin has joined #openstack-nova06:38
*** tetsuro has joined #openstack-nova06:44
*** tetsuro has quit IRC06:47
*** zbr is now known as zbr|ruck06:48
*** tetsuro has joined #openstack-nova06:48
*** trident has quit IRC06:49
*** luksky has joined #openstack-nova06:50
*** tetsuro has quit IRC06:51
*** markvoelker has joined #openstack-nova06:57
*** trident has joined #openstack-nova07:01
*** markvoelker has quit IRC07:01
*** cshen has joined #openstack-nova07:02
*** tetsuro has joined #openstack-nova07:03
*** trident has quit IRC07:07
*** damien_r has joined #openstack-nova07:08
*** maciejjozefczyk has joined #openstack-nova07:11
*** ccamacho has joined #openstack-nova07:13
*** zhubx has joined #openstack-nova07:13
*** boxiang has quit IRC07:14
*** awalende has joined #openstack-nova07:17
*** trident has joined #openstack-nova07:17
*** udesale has joined #openstack-nova07:20
*** rcernin has quit IRC07:29
*** rpittau|afk is now known as rpittau07:31
*** zhubx has quit IRC07:32
*** zhubx has joined #openstack-nova07:32
*** tbachman has joined #openstack-nova07:42
*** ralonsoh has joined #openstack-nova07:44
*** ttsiouts has joined #openstack-nova07:45
*** ivve has joined #openstack-nova07:45
*** tbachman has quit IRC07:47
*** huth has joined #openstack-nova07:49
*** huth has left #openstack-nova07:49
*** tkajinam has quit IRC08:09
*** avolkov has joined #openstack-nova08:10
*** cshen has quit IRC08:27
*** cshen has joined #openstack-nova08:44
*** tetsuro has quit IRC08:45
*** cshen has quit IRC08:49
*** mkrai has quit IRC08:51
*** ociuhandu has joined #openstack-nova08:52
*** dtruong has quit IRC08:54
*** rcernin has joined #openstack-nova08:55
luyaoefried_pto, stephenfin: Hi, could you help review the vpmems doc if you get time, since you are both familiar with the vpmems series. :D I think I've got the content close; it focuses on the current functionality. https://review.opendev.org/#/c/680300.08:59
luyaoAnd the patch 'objects: use all_things_equal from objects.base' is ready to merge and needs +W: https://review.opendev.org/#/c/681397/1309:02
*** mkrai has joined #openstack-nova09:05
*** cshen has joined #openstack-nova09:05
*** tetsuro has joined #openstack-nova09:05
openstackgerritSylvain Bauza proposed openstack/nova master: Add a prelude for the Train release  https://review.opendev.org/68332709:07
*** cshen has quit IRC09:10
*** pcaruana has joined #openstack-nova09:12
*** igordc has joined #openstack-nova09:15
*** igordc has quit IRC09:20
*** derekh has joined #openstack-nova09:27
*** dtantsur|afk is now known as dtantsur09:27
*** ociuhandu has quit IRC09:30
*** ociuhandu_ has joined #openstack-nova09:30
*** luksky has quit IRC09:37
*** cshen has joined #openstack-nova09:38
*** igordc has joined #openstack-nova09:39
*** rcernin has quit IRC09:49
*** zhongjun2__ has joined #openstack-nova09:50
*** zhongjun2__ has quit IRC09:50
*** AdamMork has joined #openstack-nova09:53
*** tetsuro has quit IRC09:55
*** sean-k-mooney has quit IRC09:57
*** sean-k-mooney has joined #openstack-nova09:58
AdamMorkHello, friends! I have an OpenStack deployment with 2 controllers and 4 compute nodes, and I need to enable AES-NI for one or more VMs. OpenStack was deployed via kolla and runs in docker containers (nova container, neutron container, etc. on the controllers). I read this manual: https://software.intel.com/en-us/articles/openstack-epa-feature-breakdown-and-analysis . On the09:59
AdamMorkcompute node I enabled AES-NI in the BIOS, but inside the VM `cat /proc/cpuinfo | grep aes` finds no aes support. How do I configure nova to enable the aes instructions?09:59
*** jaosorior has quit IRC10:03
*** tetsuro has joined #openstack-nova10:05
*** ociuhandu_ has quit IRC10:07
*** ociuhandu has joined #openstack-nova10:07
*** ttsiouts has quit IRC10:11
*** ttsiouts has joined #openstack-nova10:12
*** tetsuro has quit IRC10:13
*** brinzhang has quit IRC10:14
*** ociuhandu has quit IRC10:14
*** tetsuro has joined #openstack-nova10:14
*** ttsiouts has quit IRC10:17
*** ociuhandu has joined #openstack-nova10:18
*** luksky has joined #openstack-nova10:22
*** ociuhandu has quit IRC10:22
AdamMorkok! How can I modify the nova conf? "The Nova* libvirt driver takes its configuration information from a section in the main Nova file /etc/nova/nova.conf." I have docker containers, so how do I find /etc/nova/nova.conf? Should I connect to the nova container?10:28
*** tetsuro has quit IRC10:30
*** ociuhandu has joined #openstack-nova10:32
*** artom has quit IRC10:32
*** ociuhandu has quit IRC10:33
*** ociuhandu has joined #openstack-nova10:34
*** ttsiouts has joined #openstack-nova10:35
*** ociuhandu has quit IRC10:39
*** sapd1_x has joined #openstack-nova10:40
*** ociuhandu has joined #openstack-nova10:40
*** brault has joined #openstack-nova10:47
*** mkrai has quit IRC10:52
*** mkrai_ has joined #openstack-nova10:52
*** pcaruana has quit IRC10:56
*** panda is now known as panda|lunch11:03
*** artom has joined #openstack-nova11:05
*** ccamacho has quit IRC11:21
*** igordc has quit IRC11:26
*** ociuhandu has quit IRC11:27
*** jaosorior has joined #openstack-nova11:27
AdamMorkok. I read some more manuals and understand now: in globals.yml you need to uncomment node_custom_config: "/etc/kolla/config" and override the basic config for nova (as of now kolla only supports config overrides for ini-based configs). https://github.com/openstack/kolla-ansible/blob/master/doc/source/admin/advanced-configuration.rst  THX11:28
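The override AdamMork describes can be sketched as a drop-in file. This is a hedged example: [libvirt]/cpu_mode is a standard nova option, but whether host-passthrough (or host-model) is appropriate depends on the deployment; either exposes host CPU flags such as aes to guests.

```ini
# /etc/kolla/config/nova.conf -- merged over the generated nova.conf
# inside the nova containers by kolla-ansible's config override mechanism
[libvirt]
# pass the host CPU model (including AES-NI) through to guests
cpu_mode = host-passthrough
```

After a `kolla-ansible reconfigure` run and a hard reboot of the instance, `cat /proc/cpuinfo | grep aes` inside the guest should show the flag.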
*** pcaruana has joined #openstack-nova11:29
*** AdamMork has quit IRC11:40
*** pcaruana has quit IRC11:42
*** pcaruana has joined #openstack-nova11:42
*** jaosorior has quit IRC11:47
*** udesale has quit IRC11:52
*** udesale has joined #openstack-nova11:54
*** mgariepy has joined #openstack-nova12:01
*** artom has quit IRC12:04
*** ociuhandu has joined #openstack-nova12:06
*** ociuhandu has quit IRC12:06
*** markvoelker has joined #openstack-nova12:07
*** ociuhandu has joined #openstack-nova12:07
*** mrch_ has quit IRC12:08
*** mrch_ has joined #openstack-nova12:10
*** sapd1_x has quit IRC12:11
*** panda|lunch is now known as panda12:16
*** ttsiouts has quit IRC12:18
*** ttsiouts has joined #openstack-nova12:19
*** ttsiouts has quit IRC12:23
*** ratailor has quit IRC12:25
*** psachin has quit IRC12:28
*** awalende has quit IRC12:36
*** jaosorior has joined #openstack-nova12:38
*** tetsuro has joined #openstack-nova12:39
*** ociuhandu has quit IRC12:40
*** ociuhandu has joined #openstack-nova12:45
*** ociuhandu has quit IRC12:45
*** ociuhandu has joined #openstack-nova12:46
*** mkrai_ has quit IRC12:46
*** mkrai has joined #openstack-nova12:47
*** mriedem has joined #openstack-nova12:52
*** dave-mccowan has joined #openstack-nova12:54
mriedemgibi: i saw your resize reschedule rpc pin bug, do you know if that's a new regression in train?12:54
gibimriedem: not yet. I'm about to push a reproduction patch, then I will look into when the fault was introduced12:55
gibiit is clear that the problem is with the legacy request spec in prep_resize ending up in the conductor during a re-schedule with rpc pinned to 5.012:55
*** eharney has joined #openstack-nova12:56
*** rcernin has joined #openstack-nova12:56
*** ociuhandu has quit IRC12:57
*** ociuhandu has joined #openstack-nova12:57
*** Luzi has quit IRC12:58
gibimriedem: this is where it blows https://github.com/openstack/nova/blob/9b2e00e015f22b2d876cd3c239af8e139040c8c8/nova/conductor/manager.py#L32712:58
mriedemi know that in compute rpc api 5.1 we send the RequestSpec to compute https://opendev.org/openstack/nova/src/branch/master/nova/compute/rpcapi.py#L83312:58
mriedembut backlevel it to the dict form if we can't12:58
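The version-capping pattern mriedem describes can be sketched like this. Hedged: the classes below are illustrative stand-ins for nova's RPC client and RequestSpec object, not the real code; only the idea (send the object on >= 5.1, convert to the legacy dict when capped at 5.0) mirrors compute/rpcapi.py.

```python
# Illustrative sketch: pin compute RPC and pick the request_spec payload form.

class FakeRequestSpec:
    """Stand-in for nova.objects.RequestSpec (fields are hypothetical)."""
    def to_legacy_request_spec_dict(self):
        return {'instance_properties': {}, 'image': {}}

class FakeClient:
    """Stand-in for an oslo.messaging RPC client with a version cap."""
    def __init__(self, cap):
        self.cap = cap

    def can_send_version(self, version):
        # naive string compare is fine for '5.0' vs '5.1'
        return version <= self.cap

def prep_resize_payload(client, spec):
    """Return (rpc version, request_spec payload) per the cap check."""
    if client.can_send_version('5.1'):
        return '5.1', spec  # new computes accept the object
    # old computes (cap 5.0) only understand the legacy dict form
    return '5.0', spec.to_legacy_request_spec_dict()
```

With the cap at 5.0 the conductor ends up handling a plain dict, which is exactly the case the reschedule bug hits.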
mriedemgibi: ah yeah - that's my patch12:59
mriedemremember?12:59
mriedemhttps://review.opendev.org/#/c/680762/12:59
mriedemso if you have a recreate we can lay that on top13:00
gibimriedem: ohh you have a fix for it. cool. Yeah I will push a functional repro soon and then you can rebase13:00
gibior I can rebase13:00
gibiyour series13:00
mriedemyou can rebase it,13:01
mriedemalso, i've duplicated your bug against 184309013:01
gibimriedem: thanks. I will update my patch accordingly13:01
gibimriedem: I reached this bug while trying to create a func test for the bandwidth case when rpc is pinned13:01
gibimriedem: it seems there will also be extra issues13:02
mriedemyeah it was this patch that made me think of it in your bw series for prep_resize where you were using the request spec13:02
gibimriedem: btw do you know about a bug regarding booting an instance with --availability-zone az:node and then migrating it? I saw that in this case nova does not try to re-schedule the migration if the first dest host fails in prep_resize13:03
gibiit seems that filter_properties are not populated for re-schedule13:03
gibiif I boot the server with the new host parameter then the re-schedule works13:04
mriedemgibi: that's working as designed https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/utils.py#L80113:05
mriedemusing az::node will set force_nodes13:05
mriedemdo you see "Re-scheduling is disabled due to forcing a host" in the logs?13:06
gibimriedem: but forcing the host during the boot allows me to migrate it without dest host specified, but it does not allow nova to re-schedule13:06
gibiduring that migrate13:06
gibimriedem: I have to go back and re-create the situation as I saw this while I created the func test for the rpc pin bug.13:07
mriedemgibi: which release? https://review.opendev.org/#/q/I3f488be6f3c399f23ccf2b9ee0d76cd000da0e3e13:07
gibimriedem: master :)13:07
mriedemoh nvm that's ignore_hosts13:07
mriedemso i think force_hosts/nodes gets persisted for some reason, i'm not sure why, but for every move operation we call this to basically unset those fields https://opendev.org/openstack/nova/src/branch/master/nova/objects/request_spec.py#L69213:08
mriedemit would be simpler if we just didn't persist those values13:08
mriedemgibi: so maybe we're calling ^ *after* populate_retry13:09
*** ociuhandu has quit IRC13:09
mriedemyup https://opendev.org/openstack/nova/src/branch/master/nova/conductor/tasks/migrate.py#L29313:10
*** spatel has joined #openstack-nova13:11
gibimriedem: so is the order of populate_retry and reset wrong there?13:11
mriedemit appears so13:11
gibimriedem: OK I will create a reproduction for that as well13:12
mriedemi'm checking if that's a regression13:12
*** ociuhandu has joined #openstack-nova13:13
*** brault has quit IRC13:13
mriedemgibi: looks like it's been that way since newton https://review.opendev.org/#/c/284974/18/nova/conductor/tasks/migrate.py13:13
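The ordering issue can be reduced to a small sketch. Hedged: populate_retry and the forced-destination reset are real steps in nova's conductor, but these function bodies are simplified illustrations, not nova's code.

```python
# Simplified illustration of the bug: populate_retry() disables rescheduling
# while it still sees force_hosts, so the reset must run first.

def populate_retry(filter_properties, request_spec):
    # mirrors nova's rule: a forced host/node disables re-scheduling
    if request_spec.get('force_hosts') or request_spec.get('force_nodes'):
        return
    filter_properties['retry'] = {'num_attempts': 1, 'hosts': []}

def reset_forced_destinations(request_spec):
    request_spec.pop('force_hosts', None)
    request_spec.pop('force_nodes', None)

def prepare_migration(request_spec, fixed_order):
    """Return filter_properties; fixed_order=True resets before populating."""
    filter_properties = {}
    if fixed_order:
        reset_forced_destinations(request_spec)
        populate_retry(filter_properties, request_spec)
    else:  # the buggy order discussed above
        populate_retry(filter_properties, request_spec)
        reset_forced_destinations(request_spec)
    return filter_properties
```

In the buggy order a server booted with a forced node never gets a 'retry' entry, so its migration cannot be rescheduled to alternates.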
gibimriedem: then nobody cares :D13:14
mriedemha13:14
gibimriedem: btw there are re-schedule func tests that use az:node boot, so those will start behaving differently if we fix this13:14
mriedemor nobody just ever thought to ask if that's wrong13:14
mriedemdansmith: bauzas: can you think of any rason why we persist RequestSpec.force_hosts/nodes?13:15
mriedemlike many of the other request spec fields, persisting that seems to only cause headaches13:15
bauzasPEBKAC ?13:15
bauzasI thought we have a method for this13:16
mriedemi understand that when request spec was originally written everything was persisted, but we've rolled back a lot of that piece by piece as we find problems13:16
*** ociuhandu has quit IRC13:16
mriedembauzas: yeah, but that's kind of ... dumb, right?13:16
bauzasmriedem: https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L69213:16
*** ociuhandu has joined #openstack-nova13:16
mriedemwe could just not persist the damn field and then we wouldn't have to explicitly reset for every move operation13:16
bauzasmriedem: yeah, this13:16
*** jawad_ax_ has quit IRC13:16
bauzasmriedem: when alaski provided the persistence for the spec, he and I forgot to discuss this :(13:17
bauzasso we created this method13:17
*** jawad_axd has joined #openstack-nova13:17
mriedemok but going forward, to avoid new ways of shooting off our toes, we could just stop persisting force_hosts/nodes and only rely on that method as a workaround for existing request specs13:18
mriedemthese are the fields i count that we've changed to not be persisted on the requestspec: ignore_hosts, requested_destination, retry, network_metadata, requested_resources13:19
mriedemthe latter 2 are newer and were not persisted since they were introduced, but the first 3 were all retroactive13:20
mriedemoh and the instance_group members/hosts https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L61413:20
mriedemwhich reminds me, if any stable core cares about pike and ocata https://review.opendev.org/#/q/topic:bug/1830747+status:open13:21
bauzasmriedem: cool with this13:21
*** jawad_axd has quit IRC13:22
bauzasmriedem: I was also thinking about some way to say which specific fields shouldn't be persisted13:22
bauzasmriedem: and sure, will look at the stable changes13:22
*** mrch_ has quit IRC13:22
mriedemgibi: as for existing functional resize reschedule tests using az::node, i'd have to see the impacts to assess really13:23
bauzasmriedem: efried_pto: btw. https://review.opendev.org/#/c/683327/13:23
mriedembut if we ignore the host/node that was forced at boot during resize scheduling, it seems we should allow rescheduling to alternates13:24
gibimriedem: sure. we will see when I manage to write the reproduction and reorder the populate_retry call13:26
mriedemstephenfin: are you working on an admin guide docs patch for pcpu?13:27
stephenfinmriedem: yup13:27
stephenfinatm, in fact13:27
stephenfinshould have something ready to go before EOD13:27
mriedembauzas: notes on the prelude patch13:35
bauzascool13:36
*** zhubx has quit IRC13:37
*** zhubx has joined #openstack-nova13:37
*** zhubx has quit IRC13:38
mriedemweee https://zuul.opendev.org/t/openstack/build/e3f767dbd32247ffa5aa7daa79e0e5af/log/job-output.txt#3274813:40
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate reschedule with pinned compute rpc  https://review.opendev.org/68338513:40
gibimriedem: reproduction for 1843090 ^^13:41
gibimriedem: now I'm going to do the rebase of your fix on it13:41
mriedemgibi: when you rebase you might as well strike whatever word efried_pto didn't like from my comment13:42
gibimriedem: sure thing13:42
*** BjoernT has joined #openstack-nova13:45
*** cshen has quit IRC13:46
*** mkrai has quit IRC13:48
*** mkrai_ has joined #openstack-nova13:48
*** tbachman has joined #openstack-nova13:48
*** tetsuro has quit IRC13:49
*** JamesBenson has joined #openstack-nova13:52
mriedemgibi: comments in your functional recreate patch13:52
*** artom has joined #openstack-nova13:54
gibimriedem: thanks. regarding the request_spec ugliness: I can cook up something that is test only, but sooner or later we (I) need to go back and fix this ugliness as it is now needed in two places (as noted)13:54
*** macz has joined #openstack-nova13:55
gibimriedem: the rest of your comments are valid and clear.13:55
mriedemyeah i have no idea why that object is a problem b/c we pass request spec over rpc (select_destinations and such) all the time13:55
mriedemin tests i mean13:56
gibimriedem: in the meantime I can tell you that your fix works according to the functional test13:56
mriedem\o/13:56
*** mlavalle has joined #openstack-nova13:56
gibi... what a nice Friday it is13:56
*** belmoreira has quit IRC13:56
*** belmoreira has joined #openstack-nova14:00
mriedemmy kid is also staying home from school today claiming to be sick and i can't tell if she's playing possum14:02
mriedemi need to put her to work14:02
*** mkrai_ has quit IRC14:04
artomHow good is she with Python? Weren't we looking for interns for all the osc gaps?14:05
gibimriedem: Is there a change that we consider this https://review.opendev.org/#/c/672577/ as a bug and backport it to stable/pike? (yeah I have a bug day today)14:05
gibiartom, mriedem: hm osc gaps, and osc bugfix backport ^^14:05
*** openstackgerrit has quit IRC14:06
mriedemaspiers: you might want to talk to some suse developers downstream about their upstream "fix only on the branch i care about" practices https://review.opendev.org/#/c/676882/14:07
mriedemgibi: s/change/chance/ ?14:08
gibimriedem: yes, sorry14:08
mriedemgoing to pike is probably going to be tough...14:08
mriedembut, that fix should be compatible...14:09
gibimriedem: OK. I will look into that as well14:09
mriedemit's clearly a bug if you're using >= 2.5314:09
gibimriedem: or find somebody who has time14:09
mriedempike is in extended maintenance upstream so i'm not sure how much dtroyer is going to care about backporting it that far14:09
gibimriedem: yes, we have pike deployments out there using 2.53 and seeing the openstack client fail14:09
mriedemgibi: are they pinning the version to 2.latest or something?14:10
mriedemor they are just pinning to 2.53 since that's the max in pike?14:10
gibimriedem: using 2.53 by default, but sure we can go back to an older one for this command as a workaround14:10
gibimriedem: 2.53 is used as that is max pike14:11
*** tbachman has quit IRC14:11
gibiit seems I was able to hook in elod to do this client backports...14:11
mriedemi started stein for you https://review.opendev.org/#/c/683394/114:11
gibimriedem: thanks, elod ^^14:12
mriedemgibi: btw, where do you get your packages? some upstream distro or do you build your own?14:12
mriedemb/c for pike you might have to patch your osc package14:12
gibimriedem: packages come from Mirantis14:13
gibimriedem: so if I fail to do a clean thing upstream I will dump the "downstream" work there14:13
dtroyergibi, mriedem: that is enough of a bug it would be back-portable.  OSC follows the same rules of working backwards through releases on backports.   Of course, standard disclaimer of just using a modern OSC goes here, I understand that packagers don't do that, I wish they would/could…14:15
gibidtroyer: thanks14:16
*** BjoernT has quit IRC14:16
mriedemthe biggest barrier to backporting osc fixes is if they rely on something in novaclient that's not in whatever stable branch you're targeting14:17
mriedemit looks like in this case you might get lucky14:17
*** BjoernT has joined #openstack-nova14:19
*** rcernin has quit IRC14:23
*** cfriesen has joined #openstack-nova14:29
*** jaosorior has quit IRC14:30
*** dtantsur is now known as dtantsur|afk14:33
*** liuyulong has joined #openstack-nova14:34
*** tbachman has joined #openstack-nova14:36
hemnahey guys, any idea why I might be getting these failures against stable/pike: http://paste.openstack.org/show/778056/14:36
hemnatox -epy27 against stable/pike gives about 6300 of those errors14:37
*** rcernin has joined #openstack-nova14:38
*** ociuhandu has quit IRC14:39
*** damien_r has quit IRC14:39
mriedemoh hemna14:42
mriedemhemna: KeithMnemonic1 must have some dirt on you14:43
KeithMnemonic1lol14:43
mriedemhemna: so you're trying to figure out why these are failing https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6e4/683008/2/check/openstack-tox-py27/6e40a0c/testr_results.html.gz right? i'm not sure why you'd be hitting that weird db issue locally - should be using sqlite14:45
mriedemhemna: i'd try cleaning pycs and rebuilding the venv:14:45
mriedemfind . -name '*.pyc' -delete14:45
mriedemtox -r -e py2714:45
bauzasmriedem: just to make it clear, you want me to provide the links to https://releases.openstack.org/train/highlights.html#nova-compute-service in https://review.opendev.org/#/c/683327/1/releasenotes/notes/train-prelude-3db0f5f6a75cc57a.yaml ?14:46
*** BjoernT_ has joined #openstack-nova14:46
mriedembauzas: in the commit message?14:47
mriedemi would14:47
mriedemwait no not the latter14:47
bauzasmriedem: looks like you also want to have links in the release note14:47
mriedemthe etherpad and the published highlights doc14:47
mriedembauzas: no14:47
*** damien_r has joined #openstack-nova14:47
mriedemi'm telling you, pull your content from the highlights,14:47
bauzasoh, k14:48
mriedemplus additional stuff not in the highlights that's in the etherpad, like nova-cells/consoleauth removal and xenapi driver deprecation14:48
bauzasok, understood14:48
mriedemwe intentionally didn't include ^ in highlights14:48
mriedemb/c they aren't highlights14:48
bauzascool14:48
bauzaswill do it later14:48
mriedemxenapi driver deprecation is a brown light14:48
*** BjoernT has quit IRC14:48
*** ociuhandu has joined #openstack-nova14:49
*** ociuhandu has quit IRC14:49
*** ociuhandu has joined #openstack-nova14:50
hemnaok I'll try that14:50
hemnaI'm not sure I'll be successful with this conflict resolution on that patch14:50
*** udesale has quit IRC14:50
hemnasince so much changed between pike and queens14:50
*** udesale has joined #openstack-nova14:51
mriedemi didn't do the tests for a reason14:51
mriedemi could, but i'd need some incentive$$$14:52
*** openstackgerrit has joined #openstack-nova14:52
openstackgerritWalter A. Boring IV (hemna) proposed openstack/nova stable/pike: WIP: Avoid redundant initialize_connection on source post live migration  https://review.opendev.org/68300814:52
hemnafixed the pep8 issues at least.....14:53
hemnaheh14:53
KeithMnemonic1mriedem will you be in Shanghai. I can incent you there14:54
*** TxGirlGeek has joined #openstack-nova14:55
*** TxGirlGeek has quit IRC14:56
*** TxGirlGeek has joined #openstack-nova14:57
*** luksky has quit IRC14:58
*** macz has quit IRC14:59
mriedemi don't think so14:59
gibiwhaaat?15:02
gibiwho will be in Shanghai then?15:02
mnaserits friday and i dont want to deal with this >:(15:04
mnasernova_api.instance_mappings shows a mapping for an instance15:04
bauzasgibi: \o15:04
*** _mmethot_ has quit IRC15:04
gibibauzas: :)15:04
mnasernova.instances shows the instance there with deleted=0, vm_state=error though15:04
*** _mmethot_ has joined #openstack-nova15:05
bauzasat least we can ask a room for two :p15:05
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate reschedule with pinned compute rpc  https://review.opendev.org/68338515:05
openstackgerritBalazs Gibizer proposed openstack/nova master: Handle legacy request spec dict in ComputeTaskManager._cold_migrate  https://review.opendev.org/68076215:05
openstackgerritBalazs Gibizer proposed openstack/nova master: Isolate request spec handling from _cold_migrate  https://review.opendev.org/68076315:05
mriedemstephenfin: you might have thoughts on this https://bugs.launchpad.net/nova/+bug/184472115:05
openstackLaunchpad bug 1844721 in OpenStack Compute (nova) "Need NUMA aware RAM reservation to avoid OOM killing host processes" [Undecided,New]15:05
* bauzas should consider playing ping-pong there15:05
mnaserapi gives 404 when looking up the instance..15:05
mnaserbut its there when doing a list15:05
*** bnemec is now known as beekneemech15:06
stephenfinmriedem: Yup, that's a long standing issue. There's definitely at least one existing report of that. Possibly many15:06
stephenfinUnfortunately the impact of a fix will likely be significant, which is why we've punted it continuously /o\15:06
*** belmoreira has quit IRC15:07
mriedemdo you know if it's documented as a known issue?15:07
sean-k-mooneythey can get numa aware ram reservation by setting hw:mem_page_size=small15:07
sean-k-mooneyif you are using cpu pinning but not hugepages you should set that15:07
mriedemyeah the bug says "Many mitigation are "invented", but those mitigation all have some form  of technical or operational "difficulties". One mitigation, for example,  is to enable huge pages, and put VMs on huge pages."15:07
*** maciejjozefczyk has quit IRC15:07
*** gyee has joined #openstack-nova15:07
aspiersmriedem: I'm sure it was honest ignorance of the policy rather than deliberately avoiding work. Your -1 should be sufficient to get it fixed.15:08
*** jmlowe has joined #openstack-nova15:08
sean-k-mooneymriedem: hugepages will prevent it but hw:mem_page_size=small will use 4k pages and still fix the issue15:08
mriedemaspiers: i didn't say anyone was deliberately avoiding doing the work,15:08
stephenfinmriedem: yeah, what sean-k-mooney said15:08
mriedembut i've also seen it across several projects from multiple suse developers15:08
mriedemso it's a pattern15:08
mriedemhence my post to the ML15:08
aspiersmriedem: OK, I'll forward it internally15:09
mriedemaspiers: thanks15:09
mriedemsean-k-mooney: stephenfin: ok it would be good if we had this documented as a limitation somewhere if we don't already15:09
mriedemi'm not sure if that's best in the numa topo admin docs, flavor extra spec, reserved host ram config option, other?15:09
stephenfinI think we should, but I'll check. Will be easy tack onto this existing doc15:09
sean-k-mooneyya we should document the requirement to set hw:mem_page_size=small if you set hw:cpu_policy=dedicated and don't use hugepages15:10
mriedemok so maybe it's best to just document alongside that extra spec https://docs.openstack.org/nova/latest/user/flavors.html15:10
sean-k-mooneythe reason this happens is that when you enable pinning without setting hw:mem_page_size we only look at available cpus in the NUMA topology filter, not also at available memory on the numa nodes15:11
*** belmoreira has joined #openstack-nova15:11
sean-k-mooneyif you enable hw:mem_page_size=small|large it validates the available memory too15:11
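sean-k-mooney's advice translates to flavor extra specs along these lines (a hedged CLI fragment: the flavor name is illustrative and the command needs a real cloud, but hw:cpu_policy and hw:mem_page_size are the actual extra spec keys being discussed):

```shell
# pinned flavor whose NUMA fitting also accounts for 4k-page guest RAM
openstack flavor set pinned.medium \
  --property hw:cpu_policy=dedicated \
  --property hw:mem_page_size=small
```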
stephenfinsean-k-mooney: isn't there some issue with how we track mempages though15:12
stephenfinnone of this is obviously in placement yet15:12
sean-k-mooneyyou can't mix vms that use hw:mem_page_size with vms that don't on the same host or it invalidates the tracking15:12
sean-k-mooneybut you should not mix numa and non numa vms on the same host anyway15:12
mriedem"but you should not mix numa and non numa vms on the same host anyway" - is that documented?15:13
mriedemwe have a shit load of tribal knowledge about this stuff15:13
mriedemand by we i mean you15:13
mriedemall i see on https://docs.openstack.org/nova/latest/admin/cpu-topologies.html is "Host aggregates should be used to separate pinned instances from unpinned instances as the latter will not respect the resourcing requirements of the former."15:14
sean-k-mooneyi don't think it's in the upstream docs or redhat's; it was in the intel tuning guide i helped write in like 201415:14
sean-k-mooneyso yes i should document my tribal knowledge around this15:14
*** cfriesen has quit IRC15:14
*** trident has quit IRC15:14
sean-k-mooneythe extension of that warning to numa instead of just pinned is because of the OOM behavior15:15
sean-k-mooneyi can try and write something for this on monday if you / stephen can correct the spelling after15:16
kashyapsean-k-mooney: Happy to look as well :-)15:16
mriedemah here is the duplicate https://bugs.launchpad.net/nova/+bug/179298515:17
openstackLaunchpad bug 1439247 in OpenStack Compute (nova) "duplicate for #1792985 Small pages memory are not take into account when not explicitly requested" [Medium,Confirmed]15:17
mriedemwhich is a duplicate of another bug15:17
aspiersmriedem: forwarded internally. If you see it happen again feel free to ping me privately15:17
mriedemack15:17
sean-k-mooneymriedem: do you want to assing one of those bugs to me and i can update the documentaiton regarding the mixing of numa and non numa instances and the use of hw:mem_page_size=small|large to avoid OOM events for numa affined instances15:19
mriedemartom: this isn't fixed with your numa live migration series right? https://bugs.launchpad.net/nova/+bug/1496135 - that's the thing you were half fixing but we had you remove it?15:19
openstackLaunchpad bug 1496135 in OpenStack Compute (nova) "live-migration will not honor destination vcpu_pin_set config" [Medium,Confirmed]15:19
mriedemsean-k-mooney: i can only do so much assing in one day15:20
sean-k-mooneysee alrealy indicating my spelling mistakes :)15:20
mriedemsean-k-mooney: i don't think we probably need to assign it to you, a docs patch would just be a related bug anyway15:21
sean-k-mooneyhmm, i think that should be fixed by the numa live migration changes15:21
artommriedem, not clear - they talk about policy, which I assume means hw:cpu_policy, which my thing *did* fix15:21
mriedemyou can track a todo however you want though15:21
*** ivve has quit IRC15:21
sean-k-mooneyoh15:21
sean-k-mooneythis is the edgecase we removed15:21
artomBut... then they also talk about vcpuset15:21
sean-k-mooneyyes15:22
artomSo, it's confusing15:22
sean-k-mooneythis is migration of a non numa instance between host with vcpu_pin_set defined15:22
sean-k-mooneyso the thing we added then removed15:22
sean-k-mooneyso this is not fixed yet15:22
artomI guess? In any case, that's what it can become, because we need something to track that15:22
mriedemok i left a comment https://bugs.launchpad.net/nova/+bug/1496135/comments/1015:23
openstackLaunchpad bug 1496135 in OpenStack Compute (nova) "live-migration will not honor destination vcpu_pin_set config" [Medium,Confirmed]15:23
mriedemartom: while you're here, are we going to try and get https://review.opendev.org/#/c/672595/ into train yet?15:24
mriedemi'm not sure how much fun that would be to backport15:24
artommriedem, I'm always here man15:24
*** trident has joined #openstack-nova15:24
mriedemthough it's all test stuff15:24
mriedemyou weren't yesterday when i needed to bug you15:24
artomPTO, apple picking with son's daycare15:24
mriedemsee, you're not *always* here15:25
artomBought some cider, good stuff15:25
artomIn my hear I am15:25
artom*heart, even15:25
mriedemheart15:25
artomAnyways15:25
mriedemmeanwhile i'm just assing around15:25
artomYeah, the func test should land in Train15:25
artomIf you and dansmith are ready to dive back in, I can pick it up again15:25
artomAlthough it conflicts with some of the PCPU stuff that's still in flight15:26
mriedemi've added it to https://etherpad.openstack.org/p/nova-train-release-todo15:26
artomhttps://review.opendev.org/#/c/681060/ specifically15:26
mriedemi'm not going to review that today15:26
artom^^ needs to land first, anyways, then I rebase and continue15:27
mriedemso likely rc2 at this point15:27
*** luksky has joined #openstack-nova15:27
artomI guess yeah15:28
*** trident has quit IRC15:29
mriedemwhat is the end of the pcpu series? the reshaper patch? because i was thinking about rebasing https://review.opendev.org/#/c/683011/ on top of that so we can approve it to make sure it gets into rc115:29
sean-k-mooneyif its a test only patch we could technically backport it right15:30
openstackgerritMatt Riedemann proposed openstack/nova master: Revert "Temporarily skip TestNovaMigrationsMySQL"  https://review.opendev.org/68301115:30
sean-k-mooneyso does it need to land now15:30
mriedemsean-k-mooney: correct15:30
mriedemsean-k-mooney: what i worry about is saying "sure we'll get to it in ussuri" and then 4 months go by and dansmith and i have lost all context on that stuff and then we don't want to land it and it just never lands15:30
mriedemstrike while the iron is hot and all that15:31
sean-k-mooneyi think we should time box any of this stuff to ensure it lands before m115:31
sean-k-mooneyya15:31
sean-k-mooneyi was just suggesting that if it did not make rc1/2 it could still end up in train15:32
mriedemif anyone is looking for glance image properties docs to work on https://bugs.launchpad.net/nova/+bug/176376115:35
openstackLaunchpad bug 1763761 in Glance "CPU topologies in nova - doesn't mention numa specific image properties" [Medium,Triaged]15:35
*** belmoreira has quit IRC15:37
mriedemlooking at old numa bugs, it looks like https://review.opendev.org/#/c/458848/ just needed some tests,15:39
mriedemmight be something for a core to pick up15:39
*** trident has joined #openstack-nova15:40
mriedemartom: i wonder if this is the same numa topology claim race failure thing described in the ML https://bugs.launchpad.net/nova/+bug/182934915:41
openstackLaunchpad bug 1829349 in OpenStack Compute (nova) "Resource Tracker failed to update usage in case numa topology conflict happen" [Undecided,In progress] - Assigned to leehom (feli5)15:41
artommriedem, not sure that's still applicable, stephenfin did a thing around that a long time ago15:41
mriedemnote they are enabling the workaround config15:41
artommriedem, I don't think that's related - their bug is "if two instances have the same pins, RT blows up"15:42
artomBut with NUMA LM we're not supposed to end up in that situation in the first place15:43
mriedem"supposed to"15:43
mriedembut if we have a race with claims not reporting correctly or something, we could still race and blow up right?15:43
artomRight15:44
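(A minimal sketch of the conflict the bug reporter describes — two instances on one host claiming overlapping pinned pCPUs. Helper and variable names here are hypothetical; nova's real resource-tracker/NUMA logic lives in nova/virt/hardware.py and is far more involved.)

```python
# Hypothetical sketch: detect instances whose pinned physical CPUs overlap,
# i.e. the situation the resource tracker "blows up" on in the bug report.

def find_pin_conflicts(instances):
    """Given {uuid: set_of_pinned_pcpus}, return a list of
    (uuid_a, uuid_b, shared_pcpus) tuples for every overlapping pair."""
    conflicts = []
    items = list(instances.items())
    for i, (uuid_a, pins_a) in enumerate(items):
        for uuid_b, pins_b in items[i + 1:]:
            shared = pins_a & pins_b
            if shared:
                conflicts.append((uuid_a, uuid_b, sorted(shared)))
    return conflicts

# Example: inst-a and inst-b both claim pCPU 2, which should never happen.
claims = {
    "inst-a": {0, 1, 2},
    "inst-b": {2, 3},
    "inst-c": {4, 5},
}
print(find_pin_conflicts(claims))  # [('inst-a', 'inst-b', [2])]
```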
mriedemanyway there are some details from the reporter on what they think the problem is and where, so maybe would help15:44
*** rcernin has quit IRC15:44
artommriedem, well they proposed https://review.opendev.org/#/c/661208/15:45
artomWhich is... no15:45
artomI need food15:46
sean-k-mooneymriedem: if you're looking for a numa related bug we should fix sooner rather than later, we should aim to merge and backport https://review.opendev.org/#/c/662522/15:52
sean-k-mooneyit's alex_xu and stephenfin's fix for https://bugs.launchpad.net/nova/+bug/180576715:52
openstackLaunchpad bug 1805767 in OpenStack Compute (nova) "The new numa topology in the new flavor extra specs weren't parsed when resize" [Medium,In progress] - Assigned to Stephen Finucane (stephenfinucane)15:52
sean-k-mooneyi think all move operations bar shelve are broken in some way if you have a numa topology15:53
sean-k-mooneythere are specific things that work but it feels like there are more edge cases that don't than do sometimes15:54
sean-k-mooneythat said we have a ton of fixes for them pending in general15:54
mriedemthis is where i make a generic statement about lack of integration test coverage with tempest for numa,15:55
mriedemand then you say, "i'm working on a ci job for that"15:55
sean-k-mooneyi still think there are enough latent bugs to keep us going for U before im going to be happy we have hardened it enough15:55
sean-k-mooneywell its true15:55
*** jawad_axd has joined #openstack-nova15:55
sean-k-mooneybut also we have been without any real ci coverage for like 18 months so :(15:56
mriedemyou've done more to try and get numa tempest integration testing on nova changes than anyone so i'm not faulting you15:56
sean-k-mooneyoh i know i just wish i had more time to spend on it15:56
mriedemit's just that my eyes glaze over when i hear "we have a bug with x that's related to numa"15:57
sean-k-mooneyalthough for the next week or two im going to continue looking at testing and ci15:57
*** TxGirlGeek has quit IRC15:57
*** rpittau is now known as rpittau|afk15:57
*** TxGirlGeek has joined #openstack-nova15:58
*** elod has quit IRC15:58
sean-k-mooneymriedem: i just pushed this by the way https://review.opendev.org/#/c/683431/15:58
sean-k-mooneythat will allow us to test cpu pinning, hugepages and dpdk more or less reliably15:59
*** jawad_axd has quit IRC15:59
sean-k-mooneyim going to wait until after RC* to follow up with getting multi numa flavor on vexxhost and limestone16:00
openstackgerritSylvain Bauza proposed openstack/nova master: Add a prelude for the Train release  https://review.opendev.org/68332716:00
mriedemstephenfin: i'm going to rebase https://review.opendev.org/#/c/682267/4 on top of https://review.opendev.org/#/c/674895/44 since it sounds like we're holding back on merging the former so as to not interrupt the latter16:04
openstackgerritMatt Riedemann proposed openstack/nova master: libvirt: Get the CPU model, not 'arch' from get_capabilities()  https://review.opendev.org/68226716:04
mriedemthere wasn't even a merge conflict so i'm not sure what gerrit was complaining about16:05
mriedembauzas: do you want to re-approve https://review.opendev.org/#/c/681750/ since it was rebased?16:06
*** damien_r has quit IRC16:06
mriedemthe rest of that stack is stuck behind it16:06
openstackgerritStephen Finucane proposed openstack/nova master: docs: Update CPU topologies guide to reflect the new PCPU world  https://review.opendev.org/68343716:08
donnydsean-k-mooney: LGTM16:08
stephenfinmriedem: Makes sense. Also, there's the start of the doc ^16:08
mriedemyup16:08
donnydI gave it my whole +116:08
bauzasmriedem: cool16:09
bauzasand I call it a day16:09
mriedemstephenfin: man i wish there wasn't a big refactor in that patch16:10
sean-k-mooneydonnyd: thanks the first thing i hope to use it for is ovs-dpdk testing with https://review.opendev.org/#/c/656580/16:10
stephenfinI literally just said I shouldn't have done that to sean-k-mooney /o\16:10
stephenfin(who's working up in the RH office in Dublin with me for the day)16:10
sean-k-mooneyyep you did16:11
stephenfinmriedem: Gimme five to separate it out16:11
mriedemi know it's a compulsion16:11
mriedemobligatory "P16:11
mriedem:P16:11
*** elod has joined #openstack-nova16:13
sean-k-mooneystephenfin: by the way once i get the ovs-dpdk job working on networking-ovs-dpdk i assume you have no issue with me adding it as non voting to check in os-vif and then promote it to voting around m116:15
stephenfinnot in the slightest16:15
sean-k-mooneycool i think the vhost user path is the only one without tempest coverage in os-vif currently16:16
*** jdillaman has quit IRC16:16
sean-k-mooneyi also need to port one of the jobs to zuul v3 which i hope to do next week16:16
*** pcaruana has quit IRC16:19
mriedemmnaser: does the instance mapping have a cell mapping?16:23
mriedemdoes the instance have a stale build request?16:23
mriedemin that weird GET /servers/{server_id} 404 case before i've handled the InstanceNotFound in the API controller code, logged the traceback and then re-raised16:23
mriedemso you can see where we're coming from, e.g. build request or what16:24
mriedemmnaser: log the traceback here https://github.com/openstack/nova/blob/stable/stein/nova/api/openstack/common.py#L47116:25
mriedemi'm pretty sure you wrote a script to deal with the case that the instance is in a cell, the build request is gone, but the instance mapping doesn't have a cell mapping16:26
mriedemyeah which i used here https://review.opendev.org/#/c/655908/16:27
openstackgerritStephen Finucane proposed openstack/nova master: docs: Clarify everything CPU pinning  https://review.opendev.org/68343716:27
openstackgerritStephen Finucane proposed openstack/nova master: docs: Update CPU topologies guide to reflect the new PCPU world  https://review.opendev.org/68348516:27
openstackgerritOpenStack Release Bot proposed openstack/os-vif stable/train: Update .gitreview for stable/train  https://review.opendev.org/68348816:28
openstackgerritOpenStack Release Bot proposed openstack/os-vif stable/train: Update TOX/UPPER_CONSTRAINTS_FILE for stable/train  https://review.opendev.org/68348916:28
openstackgerritOpenStack Release Bot proposed openstack/os-vif master: Update master for stable/train  https://review.opendev.org/68349016:28
stephenfinmriedem: Split ^16:28
stephenfinwait, that doesn't look right16:29
* stephenfin tries again16:29
stephenfin:(16:29
mriedemi left some other comments/questions about upgrade and quota details, not sure if you saw those while splitting16:31
stephenfinack16:32
*** BjoernT_ has quit IRC16:32
sean-k-mooneylyarwood: can you take a look at the proposal bot patches for os-vif above on stable? they look sane to me16:34
sean-k-mooneylyarwood: also it's not super urgent so it can wait till next week but they are trivial16:35
mriedemdansmith: finally got around to replying to our comments on this heal instance mappings command https://review.opendev.org/#/c/655908/ - since there was no vote i hadn't noticed16:38
openstackgerritStephen Finucane proposed openstack/nova master: docs: Clarify everything CPU pinning  https://review.opendev.org/68343716:39
openstackgerritStephen Finucane proposed openstack/nova master: docs: Update CPU topologies guide to reflect the new PCPU world  https://review.opendev.org/68348516:39
*** sean-k-mooney has quit IRC16:42
*** derekh has quit IRC16:42
*** ociuhandu has quit IRC16:43
dansmithmriedem: okay I'll have to re-read to build my context back up16:45
mnasermriedem: the instance does have a mapping, which is where this is different from every other time16:51
mnaserinstance_mapping in nova_api exists, and instance in nova exists..16:51
*** igordc has joined #openstack-nova16:55
*** udesale has quit IRC16:57
*** gbarros has joined #openstack-nova16:58
*** ociuhandu has joined #openstack-nova17:00
*** ociuhandu has quit IRC17:04
mnaserwaaaaaaaaaaait17:10
mnaserthe cell mapping points at cell017:10
mnaserbut the instance exists in error state.. in the actual cell (cell1 in this case)17:11
mnaserand the cell0 one _is_ marked as deleted17:12
mnasergiven my cell knowledge so far, you cant start from cell1 and then get buried in cell0 right?17:12
mnaseryou can't end up in cell0 if you're in cell1?17:13
dansmithcorrect17:13
dansmithI mean.. that's the intent/rule17:14
openstackgerritArtom Lifshitz proposed openstack/nova master: Move pre-3.44 Cinder post live migration test to test_compute_mgr  https://review.opendev.org/68359717:17
mriedemartom: "The previous patch (Id0e8b1c32600d53382e5ac938e403258c80221a0) created" you're learning!17:22
artommriedem, next step: multicellular organisms17:22
mriedemmnaser: that's really strange, there are only two places we bury in cell0,17:24
mriedem1. if scheduling fails https://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L135917:24
mriedem2. or scheduling picks a host that doesn't have a mapping https://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L138617:24
mriedemi wonder if you had hosts incorrectly mapped to cell0?17:25
mriedemhttps://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L139717:25
mriedemthere shouldn't ever be host mappings pointing at cell0 but we don't explicitly fail if you try to do that17:25
mriedeme.g. if you started a compute service pointing at the wrong cell database and it was registered in cell0 and then got mapped there in the host mapping17:26
mriedemdansmith: do you think it'd be worthwhile to have a sanity check around https://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L1421 such that if the cell is actually cell0 in that case we blow up with some kind of "idk what you're doing but this is definitely wrong"17:27
*** mrch_ has joined #openstack-nova17:30
mriedemthat still wouldn't explain how the instance in mnaser's case went from cell0 to cell1, unless he has a busted heal script17:30
mriedemmaybe a host mapping pointing at cell0 isn't possible after all https://github.com/openstack/nova/blob/stable/stein/nova/objects/host_mapping.py#L25317:32
* mnaser catching up on backlog17:33
mnaseryeah so in this case, i have a db record for this instance  both in cell0 and cell117:33
mnaserand i dont think the config changed anytime recently..17:33
mriedemis the host value set for either instance record?17:33
mnaserboth null17:34
mnaserbut the cell0 one is deleted and cell1 is not (aka deleted=0)17:34
mriedemwhich has the older created_at time?17:34
mnaseroh good idea17:34
mnaserthe same exact time.17:35
mriedemwtf17:35
mriedemyou don't have some janky homebrew scripts that try to heal and revive an instance in cell0 and put it into cell1?17:35
mnasernot that i know of17:36
mnaserand also in case im losing it17:36
mnaserhttp://paste.openstack.org/show/778146/17:36
mriedemsome sort of weird mariadb cluster issue or something?17:37
mriedemyou wouldn't be clustering those together though unless you really screwed up17:37
mnaserits just plain old galera, and those are two different databases too..17:38
mnaserso i dont know how it would like "oh let me make a cell0 thing"17:38
mriedemit's clearly a phantom, some kind of 9/11 phantom, and it showed up too early for halloween17:38
mriedemno obvious weirdness for that instance uuid in the logs?17:38
mnaserstuff are mostly rotated out but im looking at all the fields and see the diffs17:39
mnaseroh hm17:39
mnaserso different fields are17:40
openstackgerritBalazs Gibizer proposed openstack/nova master: Remove functional test specific nova code  https://review.opendev.org/68360917:40
mnaserupdated_at (obviously cause deleted was changed), deleted_at (for the one that deleted), id (different ids from cell0 and cell1), availability zone and launched_on (it looks like it was scheduled?!)17:40
dansmithmriedem: hmm, that's probably not a bad idea17:41
mnaserlet me check if there's anything intersting on the host it was provisioned on..17:41
mriedemit's very weird that launched_on would be set but host is not17:41
openstackgerritOpenStack Release Bot proposed openstack/python-novaclient stable/train: Update .gitreview for stable/train  https://review.opendev.org/68362517:42
dansmithmriedem: I agree that this sounds highly fishy17:42
openstackgerritOpenStack Release Bot proposed openstack/python-novaclient stable/train: Update TOX/UPPER_CONSTRAINTS_FILE for stable/train  https://review.opendev.org/68362617:42
openstackgerritOpenStack Release Bot proposed openstack/python-novaclient master: Update master for stable/train  https://review.opendev.org/68362717:42
mriedemdansmith: not even sounds, smells!17:42
mnaserok, timed out waiting for network-vif-plugged17:42
dansmithmriedem: reeks of fishy sounds17:42
mriedemmnaser: this is what sets launched_on https://github.com/openstack/nova/blob/f4aaa9e229c98a97af085f31e43509189e2e4585/nova/compute/resource_tracker.py#L54417:42
mriedemoh i bet i know why,17:42
mnaser BuildAbortException: Build of instance 10d8a93a-bb6b-44ef-83f1-be6b21336651 aborted: Failed to allocate the network(s), not rescheduling.17:42
mriedemwe fail the build, set host/node to None but not launched_on or az17:43
mriedemhttps://github.com/openstack/nova/blob/f4aaa9e229c98a97af085f31e43509189e2e4585/nova/compute/manager.py#L206917:43
mriedemyeah _nil_out_instance_obj_host_and_node should also null out launched_on17:44
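(The fix mriedem sketches here — later pushed as https://review.opendev.org/683725 — could look roughly like this. This is an illustrative stand-in, not nova's actual _nil_out_instance_obj_host_and_node helper, and the dict stands in for the Instance object.)

```python
# Sketch of the proposed fix: when a build fails, clear launched_on (and
# availability_zone) along with host/node, so a never-launched instance
# doesn't carry stale placement breadcrumbs like the one mnaser found.

def nil_out_host_and_node(instance):
    instance["host"] = None
    instance["node"] = None
    # The gap discussed above: these two were previously left set on failure.
    instance["launched_on"] = None
    instance["availability_zone"] = None
    return instance

inst = {"host": "compute-1", "node": "compute-1",
        "launched_on": "compute-1", "availability_zone": "nova"}
nil_out_host_and_node(inst)
print(inst["launched_on"])  # None
```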
mriedemanyway, that doesn't explain why the instance is also in cell017:44
mnaseryeah that explains half the story i guess17:44
mriedemmnaser: which copy of the instance has launched_on set?17:44
mriedemcell1?17:44
mnaseryeah17:44
mriedemand that compute host that launched_on is set to, the host_mapping in the api db is pointing at cell1?17:45
mnaseryeah the one is cell1 is correct17:45
mnasergood question about host_mapping17:46
mriedemdoes the name of the instance have an index-like suffix on it, i.e. was it part of a multi-create request?17:46
mriedeme.g. my-vm-1, my-vm-2 etc17:46
mnaserits not a multicreate request (afaik) but the title looks like it is17:46
mnasercentos-7-large-tf-ci-000000632117:47
mnasera zuul instance though17:47
mriedemoh that's not the kind of index i'd be looking for17:47
mriedemsimple integers17:47
mnaseryeah17:47
mnaseri was just wondering in case nova had some logic to match <name>-<number>17:47
mnaserbut i didn't think we'd be that wild :)17:47
mriedemi don't know how this could happen, unless some kind of weird rabbit thing where like we actually got 2 copies of the same build request message to conductor, one went to cell1 and one failed scheduling and went to cell0 or something17:48
mnaseryeah host_mappings is the right one17:48
mnaseryeah im pretty confused..17:49
*** markvoelker has quit IRC17:49
dansmiththat sounds pretty fantastical17:49
dansmithespecially without other MAJOR problems showing up all over the place17:49
mriedembtw the request_specs.num_instances field for the instance would tell you if it was a multi-create request17:49
mriedemeach instance in a multi-create request gets a unique id but i've seen some weird things with multi-create requests that blow up the scheduler and trigger automatic retries to the scheduler17:50
mriedemwe removed that in stein though i think17:50
mriedemthis thing https://github.com/openstack/nova/blob/stable/queens/nova/scheduler/utils.py#L80717:51
mnaseri mean, given the instance failed to deploy because of a timeout17:51
mnaserthere may be other weird things happening17:52
mriedemwell the timeout was likely a failure on the ovs agent17:52
mnaseri .. don't know how it can result in something like scheduling it twice though17:52
mriedemcausing vif plugging to fail17:52
*** markvoelker has joined #openstack-nova17:52
dansmithif the failure was vif plugging, we should be well-settled on the destination host and in the right cell17:53
dansmithway way way away from anything that would re-create it in cell017:53
mnaseryeah thats why i think the only place it could have happened is scheduled twice17:54
mriedemmnaser: do you see instance actions for that instance in both dbs?17:59
mnaseroh good idea17:59
mriedemand if so, are there any differences in the events?17:59
mriedemthe instance_action_events tables i mean17:59
mriedem*table17:59
mriedemi'm not sure we actually have instance action events until we have picked a cell though18:00
mnasertheres no column for instance in instance_actions_events or am i losing it?18:00
mriedemyeah we create the action once we've picked a cell https://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L148318:00
mriedemmnaser: instance -> instance_actions -> instance_action_events18:00
mnaserohhhh okay18:00
mriedem*instance_actions_events18:01
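(The lookup path mriedem describes — instance → instance_actions → instance_actions_events — can be sketched with a toy schema; the real cell DB tables have many more columns, and the sample data below is made up.)

```python
import sqlite3

# Toy versions of the two cell DB tables, with only the columns needed to
# show the instance -> actions -> events join.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE instance_actions (
    id INTEGER PRIMARY KEY, instance_uuid TEXT, action TEXT);
CREATE TABLE instance_actions_events (
    id INTEGER PRIMARY KEY, action_id INTEGER, event TEXT, result TEXT);
INSERT INTO instance_actions VALUES (1, 'uuid-1', 'create');
INSERT INTO instance_actions_events
    VALUES (1, 1, 'compute__do_build_and_run_instance', 'Error');
""")

# Events hang off an action, and the action carries the instance uuid.
rows = db.execute("""
    SELECT a.action, e.event, e.result
    FROM instance_actions a
    JOIN instance_actions_events e ON e.action_id = a.id
    WHERE a.instance_uuid = 'uuid-1'
""").fetchall()
print(rows)  # [('create', 'compute__do_build_and_run_instance', 'Error')]
```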
mnasercreate in cell1, nothing in cell018:01
*** BjoernT has joined #openstack-nova18:01
mnasercompute__do_build_and_run_instance in actions18:01
mriedemyeah i guess that's what i'd expect since we bury in cell0 before creating the "create" action18:01
mnaserso the only thing is somehow it got scheduled .. twice?18:02
mnaserone said "nope" and the other said "yep"18:02
mriedemidk how that could happen18:02
mnaserprobably because a db exception error for some constraint18:02
mnaseri mean at this point i'm convinced it might be an infra thing18:02
mnaseragain the VM failed to provision because port plugging timed out so that does hint at things likely being weird18:03
mriedemwell that's why i brought up rabbit18:03
mnaserand not an issue in galera, yeah, it would be  rabbit issue surely18:03
mriedeme.g. is it possible that rabbit thought the message to conductor (or scheduler) wasn't received and resent it?18:03
mnaseri guess that is the most likely theory18:04
dansmithI don't think so18:04
mriedempre-cells v2 we wouldn't have a problem b/c instances.uuid is unique per cell db18:04
mnaserunless rabbit thought the cluster was split or something18:04
dansmithI think that when you enqueue a message, it goes into a queue for a given receiver, if there is one waiting18:04
mriedemdo you have rabbit logs going back to when that instance was created?18:04
dansmithbut yeah, rabbit split brain could have done something I guess18:04
dansmithmriedem: what if, in bury_in_cell0, we check last minute to see if there's already a host mapping and if so, we log and abort?18:05
dansmithI mean, I would think that's where we fail in that routine anyway,18:05
*** eharney has quit IRC18:05
dansmithso there should be some log fallout from us failing to create the cell0 mapping anyway right?18:05
mriedemis this one of the many reasons people say to not cluster rabbit?18:06
mriedembeyond performance?18:06
mnaserrabbitmq is the worlds biggest pita :(18:06
*** BjoernT has quit IRC18:06
mnaserok so at least to clean things up i will update the instance_mappings to point that the right cell18:07
mnaserand then at least the instance will be delete-able18:08
mnaserand then the cell0 will just get archived and disappear18:08
mriedemdansmith: i'm not sure what you mean by "already a host mapping" in bury_in_cell018:08
dansmithmriedem: when we bury in cell0, we also have to create a mapping for it to be there18:09
dansmithmriedem: we should fail to do that if we're late to the party because we're a failed reschedule right?18:09
*** gbarros has quit IRC18:09
mnaserhmm yeah, that's true, i guess the bury doesn't check if there's an existing one there18:09
mriedemdansmith: so here https://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L1282 ?18:09
*** henriqueof has joined #openstack-nova18:09
mriedembefore doing that, double check that instance_mapping.cell_mapping is None, right?18:09
mriedemi mean we'd want to check way before that though, before this https://github.com/openstack/nova/blob/stable/stein/nova/conductor/manager.py#L126018:11
mriedemi'm not opposed to adding a sanity check18:11
mriedemworst case is it doesn't hurt anything,18:11
mriedembest case is it avoids crazy rabbit clustering split brain weirdness18:11
dansmithmriedem: well, I'm thinking more like direct hit the database to make sure18:12
dansmithmriedem: but again, we should fail to create a duplicate mapping there18:12
mnaseris this just a matter of `if inst_mapping.cell_mapping is not None:` before updating it?18:13
dansmithno18:14
dansmithmriedem: oh, hmm18:19
dansmithmriedem: we look up the mapping and set it to cell0 there and then save, not a create18:20
dansmiththat's why no dupe18:20
dansmithso maybe the cell0 save happens first and then the cell1 one comes in later?18:20
dansmithso, mnaser maybe yes to your above question18:20
*** igordc has quit IRC18:20
dansmithI was forgetting how this worked, that we create the mapping early and then update it later18:20
mriedemmnaser: yes that's what i was thinking18:21
mriedemdansmith: the create happens in the api18:21
mriedematomically with the build request and request spec18:21
mriedemyeah that (sorry was catching up)18:21
mriedemmnaser: so i'd move the code that gets the instance mapping earlier, before we create the instance record in cell0, check if instance_mapping.cell_mapping is not None and if so log an error or something and return18:22
mriedemnote we still have a "remove after ocata" thing in that method :)18:22
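(The sanity check being discussed could be sketched like this — hypothetical shapes, with a plain dict standing in for the InstanceMapping object; the real code under discussion is nova/conductor/manager.py's _bury_in_cell0 and what later became https://review.opendev.org/683730.)

```python
# Sketch of the proposed guard: before creating the instance record in
# cell0, check whether something else (e.g. a duplicate scheduling attempt)
# already mapped the instance to a real cell, in which case burying it in
# cell0 would create the duplicate-record situation mnaser hit.
import logging

LOG = logging.getLogger(__name__)

def should_bury_in_cell0(instance_mapping):
    """Return False (and log) if the instance is already mapped to a cell."""
    if instance_mapping.get("cell_mapping") is not None:
        LOG.error("Instance %s is already mapped to cell %s; refusing to "
                  "bury it in cell0.",
                  instance_mapping["instance_uuid"],
                  instance_mapping["cell_mapping"])
        return False
    return True

print(should_bury_in_cell0({"instance_uuid": "u1", "cell_mapping": None}))     # True
print(should_bury_in_cell0({"instance_uuid": "u2", "cell_mapping": "cell1"}))  # False
```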
dansmithmriedem: move the getting of the mapping? I would think you'd want it very tight like it is now so you had the best opportunity to see that it has been set by a competitor18:23
*** igordc has joined #openstack-nova18:26
*** ricolin has quit IRC18:27
*** ricolin has joined #openstack-nova18:27
*** adriant has quit IRC18:28
*** adriant has joined #openstack-nova18:28
*** jistr has quit IRC18:28
*** redrobot has quit IRC18:28
*** jistr has joined #openstack-nova18:28
*** BjoernT has joined #openstack-nova18:28
*** jkulik has quit IRC18:29
*** dansmith has quit IRC18:29
*** dansmith has joined #openstack-nova18:29
*** jkulik has joined #openstack-nova18:30
*** BjoernT_ has joined #openstack-nova18:34
*** BjoernT has quit IRC18:35
mriedemdansmith: as long as it's before we create the instance record in cell018:37
mriedemb/c otherwise you have to roll that back18:38
mriedemanyway, i think the shed should be bronw18:38
mriedem*brown18:38
*** mriedem has quit IRC18:40
*** mriedem has joined #openstack-nova18:40
dansmithmaybe we're talking about different things18:43
mriedemmnaser: were you going to push a patch for this sanity check?18:47
mriedemit'd be easier to discuss in review18:47
*** eharney has joined #openstack-nova18:52
*** igordc has quit IRC19:13
*** ricolin has quit IRC19:16
openstackgerritArtom Lifshitz proposed openstack/nova master: Poison netifaces.interfaces() in tests  https://review.opendev.org/67177319:18
*** igordc has joined #openstack-nova19:19
*** slaweq has quit IRC19:32
mnasermriedem: i was going to, *if* i was talking about the right thing19:37
mnasereven if i pushed the initial patch tbh i dont know if i have enough bandwidth to drive it all the way through review and all right now19:38
mnaserbut i can do the initial if statement before the update there..19:38
mnaserwith a test19:38
openstackgerritArtom Lifshitz proposed openstack/nova master: Rename Claims resources to compute_node  https://review.opendev.org/67947019:40
* mriedem finally got a travis ci build with an encrypted file to work properly, yay19:45
mriedemtravis --com, travis --org, what a mess19:45
mriedemmnaser: that's good enough to start, it shouldn't be a difficult change, i think the complexity is mostly in the test since we want to assert that (1) we fail if the instance mapping already has a cell mapping set in _bury_in_cell0 and (2) we check that before calling instance.create19:46
*** ociuhandu has joined #openstack-nova19:49
*** ociuhandu has quit IRC20:03
*** markvoelker has quit IRC20:05
*** ralonsoh has quit IRC20:05
*** artom has quit IRC20:10
*** tbachman has quit IRC20:15
*** tbachman has joined #openstack-nova20:22
*** KeithMnemonic1 has quit IRC20:23
openstackgerritMatt Riedemann proposed openstack/nova master: Clear instance.launched_on when build fails  https://review.opendev.org/68372520:24
*** CeeMac has quit IRC20:29
*** redrobot has joined #openstack-nova20:34
*** BjoernT has joined #openstack-nova20:37
*** BjoernT_ has quit IRC20:39
*** tbachman has quit IRC20:52
*** tbachman has joined #openstack-nova20:59
mriedemmaybe we should finally fix that bug where nova-compute creates volumes but doesn't name them...21:01
*** _mmethot_ has quit IRC21:05
*** mmethot_ has joined #openstack-nova21:05
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: Sanity check instance mapping in _bury_in_cell0  https://review.opendev.org/68373021:10
mriedemmnaser: dansmith: ^ this is what i was thinking21:11
mriedemit's a bit fuglier than i expected21:11
dansmithmriedem: okay I thought you meant adding it quite a bit earlier21:12
dansmither, moving21:12
mriedemso did i,21:12
mriedembut realized we have a list here21:12
mriedemso it has to go in the loop21:12
dansmithmriedem: the other thing is, we should probably do the same sanity check when we set it for a non cell0 cell,21:12
dansmithbecause if cell0 is faster (likely) then we'll create it there and map it,21:12
dansmiththen remap it to cell121:12
dansmithand we should at least warn that that happened for the forensic value21:13
mriedemyeah - can you leave a comment so i don't forget?21:13
dansmithI mean...if we think this is what is happening21:13
dansmithdone21:14
mriedemdanke21:14
mriedemreplied with a question,21:18
mriedemmostly about leaving 2 copies in different dbs which could mess up listing,21:18
mriedemi guess the warning could just be "uh oh spaghettios this is in cell0 and now it's in cell1 too - we're going to trust the cell1 version but you'll need to cleanup cell0"21:19
dansmithit's going to be in both places anyway, apparently, so mapped appropriately and logged is minimal best case I think21:20
*** liuyulong has quit IRC21:20
dansmithyou could also nuke the cell0 version, but I kinda want to see evidence that this is really happening first21:20
mriedemright i don't want to overengineer this21:21
mriedemanyway, not going to happen today, i'm taking off21:21
*** mriedem is now known as mriedem_away21:21
dansmithack21:21
*** JamesBenson has quit IRC21:29
*** JamesBenson has joined #openstack-nova21:31
*** markvoelker has joined #openstack-nova21:32
*** JamesBenson has quit IRC21:35
*** eharney has quit IRC21:39
*** markvoelker has quit IRC21:42
*** Ben78 has joined #openstack-nova21:44
*** dave-mccowan has quit IRC21:55
mlavalleI am working on a devstack built with stable rocky. I modified and restarted the nova api with the following changes: https://review.opendev.org/#/c/674038 and https://review.opendev.org/#/c/645452. I also modified https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L58 to base.RULE_ADMIN_API or base.SYSTEM_READER22:07
mlavallegranted alt_demo user the reader role with system(all) scope. alt_demo is still unable to do list server --all-users. what am I missing?22:08
mlavallegranted alt_demo user the reader role with system(all) scope. alt_demo is still unable to do list server --all-projects. what am I missing?22:09
mlavallelbragstad: ^^^^ any advice?22:09
*** xek has quit IRC22:12
lbragstadmlavalle what kind of token are you using to make the request to nova?22:14
lbragstadmlavalle you could put something like https://pasted.tech/pastes/7d06348fdea072ad4784fa75940c142dc3d63f86.raw in your clouds.yaml and then export OS_CLOUD=devstack-alt-system to ensure you're using a system-scoped token in your request22:19
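(lbragstad's pasted snippet has since expired, but a clouds.yaml entry for a system-scoped token looks roughly like this — cloud name, endpoint and credentials below are made up:)

```yaml
# Assumed clouds.yaml fragment: the key difference from a project-scoped
# entry is system_scope (instead of project_name/project_domain_*), which
# makes keystone issue a system-scoped token.
clouds:
  devstack-alt-system:
    auth:
      auth_url: http://keystone.example.com/identity
      username: alt_demo
      password: secret
      user_domain_id: default
      system_scope: all
    identity_api_version: 3
```

Then `export OS_CLOUD=devstack-alt-system` as lbragstad says, and openstackclient requests will carry the system-scoped token.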
lbragstadi assume you're using openstackclient22:19
mlavallelbragstad: duuh. that was the problem. I was getting a project scoped token. I just tested with a system scoped token and it worked22:19
lbragstadmlavalle nice22:19
mlavallethanks!22:19
mlavallehave a nice weekend22:19
lbragstadno problem22:20
lbragstadyou, too22:20
*** gbarros has joined #openstack-nova22:29
*** tbachman has quit IRC22:30
*** gbarros has quit IRC22:40
*** tbachman has joined #openstack-nova22:41
*** spatel has quit IRC22:44
*** gbarros has joined #openstack-nova23:08
*** mgoddard has quit IRC23:17
*** mgoddard has joined #openstack-nova23:19
*** avolkov has quit IRC23:20
*** jawad_axd has joined #openstack-nova23:24
*** JamesBenson has joined #openstack-nova23:25
*** jawad_axd has quit IRC23:28
*** JamesBenson has quit IRC23:29
*** markvoelker has joined #openstack-nova23:43
*** markvoelker has quit IRC23:47
*** luksky has quit IRC23:56
*** gbarros has quit IRC23:58

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!