Monday, 2019-09-23

*** gbarros has quit IRC00:59
*** yedongcan has joined #openstack-nova01:19
*** yikun has joined #openstack-nova02:02
*** zhubx has joined #openstack-nova02:04
*** BjoernT has joined #openstack-nova02:22
*** BjoernT_ has joined #openstack-nova02:26
*** BjoernT has quit IRC02:26
*** dklyle has quit IRC02:33
*** JamesBenson has joined #openstack-nova02:35
openstackgerritFan Zhang proposed openstack/nova master: Fix exception translation when creating volume  https://review.opendev.org/67899102:40
*** BjoernT_ has quit IRC02:42
*** BjoernT has joined #openstack-nova02:45
*** dklyle has joined #openstack-nova02:48
*** larainema has joined #openstack-nova03:00
*** mkrai has joined #openstack-nova03:19
*** ricolin has joined #openstack-nova03:20
*** dklyle has quit IRC03:20
*** psachin has joined #openstack-nova03:27
*** markvoelker has joined #openstack-nova03:54
*** markvoelker has quit IRC03:58
*** sapd1_x has joined #openstack-nova03:59
*** BjoernT has quit IRC04:06
*** BjoernT has joined #openstack-nova04:14
*** JamesBenson has quit IRC04:31
*** brault has quit IRC04:43
*** BjoernT has quit IRC04:46
*** dave-mccowan has quit IRC04:48
openstackgerritwangyue proposed openstack/nova master: Returns the max disk/cdrom unit used by scsi controller  https://review.opendev.org/68384504:56
*** slaweq has joined #openstack-nova04:57
*** ircuser-1 has quit IRC04:58
*** Luzi has joined #openstack-nova05:00
*** slaweq has quit IRC05:05
*** ratailor has joined #openstack-nova05:10
*** slaweq has joined #openstack-nova05:10
*** sapd1_x has quit IRC05:19
*** macz has joined #openstack-nova05:26
*** slaweq has quit IRC05:28
*** slaweq has joined #openstack-nova05:30
*** macz has quit IRC05:31
openstackgerrithulina proposed openstack/nova master: Nova raise exceptions when extending volume fails  https://review.opendev.org/68064805:42
*** jaosorior has joined #openstack-nova05:43
*** brault has joined #openstack-nova05:49
*** dpawlik has joined #openstack-nova06:00
*** luksky has joined #openstack-nova06:17
*** brault has quit IRC06:17
*** yikun has quit IRC06:21
*** luksky has quit IRC06:23
*** TxGirlGeek has joined #openstack-nova06:23
*** maciejjozefczyk has joined #openstack-nova06:26
*** slaweq has quit IRC06:27
openstackgerritChason Chan proposed openstack/nova master: Note the ``hw_numa_nodes`` image property  https://review.opendev.org/68384906:29
*** maciejjozefczyk has quit IRC06:30
*** slaweq has joined #openstack-nova06:34
*** rpittau|afk is now known as rpittau06:40
*** brault has joined #openstack-nova06:41
*** brault has quit IRC06:41
*** brault has joined #openstack-nova06:41
*** TxGirlGeek has quit IRC06:42
*** damien_r has joined #openstack-nova06:59
*** tesseract has joined #openstack-nova07:00
*** rcernin has quit IRC07:04
*** slaweq has quit IRC07:04
*** slaweq has joined #openstack-nova07:06
*** ivve has joined #openstack-nova07:10
*** cshen has joined #openstack-nova07:14
*** ccamacho has joined #openstack-nova07:16
*** xek has joined #openstack-nova07:17
*** pcaruana has joined #openstack-nova07:41
*** eharney has joined #openstack-nova07:44
*** slaweq has quit IRC07:47
*** zhubx has quit IRC07:54
*** boxiang has joined #openstack-nova07:55
*** ralonsoh has joined #openstack-nova07:59
*** toabctl has joined #openstack-nova08:02
*** brault has quit IRC08:21
openstackgerritpengyuesheng proposed openstack/os-resource-classes master: Update the constraints url  https://review.opendev.org/68387208:25
openstackgerritpengyuesheng proposed openstack/os-vif master: Update the constraints url  https://review.opendev.org/68387308:27
openstackgerritBalazs Gibizer proposed openstack/nova master: Move HostNameWeigher to a common fixture  https://review.opendev.org/68387408:28
*** jangutter has joined #openstack-nova08:31
*** brault has joined #openstack-nova08:32
openstackgerritLee Yarwood proposed openstack/nova master: compute: Remove stale BDMs on reserve_block_device_name failure  https://review.opendev.org/68259408:38
*** rcernin has joined #openstack-nova08:45
*** lpetrut has joined #openstack-nova08:48
*** derekh has joined #openstack-nova08:49
*** markvoelker has joined #openstack-nova08:57
*** CeeMac has joined #openstack-nova08:58
*** markvoelker has quit IRC09:02
*** dpawlik has quit IRC09:04
*** dpawlik has joined #openstack-nova09:04
*** ociuhandu has joined #openstack-nova09:05
*** martinkennelly has joined #openstack-nova09:05
*** dtantsur|afk is now known as dtantsur09:07
openstackgerritSilvan Kaiser proposed openstack/nova stable/stein: Exec systemd-run without --user flag in Quobyte driver  https://review.opendev.org/66070509:09
*** yikun has joined #openstack-nova09:23
openstackgerritStephen Finucane proposed openstack/nova master: docs: Note use of 'nova-manage db sync --config-file'  https://review.opendev.org/67129809:36
openstackgerritStephen Finucane proposed openstack/nova master: docs: Correct 'nova-manage db sync' documentation  https://review.opendev.org/67750809:36
openstackgerritStephen Finucane proposed openstack/nova master: docs: Document global options for nova-manage  https://review.opendev.org/67744309:36
openstackgerritStephen Finucane proposed openstack/nova master: config: Explicitly register 'remote_debug' CLI opts  https://review.opendev.org/67744409:36
openstackgerritStephen Finucane proposed openstack/nova master: WIP: docs: Rewrite nova-manage docs to use proper directives  https://review.opendev.org/67750909:36
*** dpawlik has quit IRC09:37
stephenfinbauzas: Think you could send this trivial patch through? https://review.opendev.org/#/c/676898/09:37
bauzasmorning :)09:37
*** dpawlik has joined #openstack-nova09:38
bauzasand done09:38
bauzasstephenfin: congrats on your rugby team btw, nice play ;)09:38
stephenfinHeh, thanks :)09:39
openstackgerritStephen Finucane proposed openstack/nova master: docs: Rewrite host aggregate, availability zone docs  https://review.opendev.org/66713309:39
*** awalende has joined #openstack-nova09:39
*** lpetrut has quit IRC09:45
*** lpetrut has joined #openstack-nova09:48
*** ricolin has quit IRC09:48
kashyapaspiers: Hey, if you're back ... I'm feeling stupid ... do you know why I'm getting the 'secure' enum value _three_ times here? -- http://paste.openstack.org/show/778857/09:51
aspierso/09:52
kashyapFull "self-contained" WIP script in the pastebin :D09:53
aspiersImportError: No module named libvirt09:55
aspiersI haven't woken up yet, help me09:55
kashyapaspiers: Err, I lied.09:55
aspiersthe tox envs don't have libvirt?09:56
kashyapaspiers: For it to be "self-contained", I ran it with my Nova 'tox' env sourced.09:56
kashyapaspiers: Yes, that's correct09:56
aspierswhich tox env?09:56
kashyapaspiers: (Missing libvirt from your 'tox' env)09:56
aspiersThere are many09:56
aspiersI've tried py27 and functional09:56
kashyapHm, I'm using py2709:57
aspierswhat's the path to your libvirt module?09:58
aspierspip install libvirt doesn't even work09:58
openstackgerritStephen Finucane proposed openstack/nova master: Handle libvirt reporting incorrect 4k page quantities  https://review.opendev.org/63103809:59
openstackgerritStephen Finucane proposed openstack/nova master: Make overcommit check for pinned instance pagesize aware  https://review.opendev.org/63105309:59
*** ociuhandu has quit IRC09:59
*** ociuhandu has joined #openstack-nova10:00
kashyapaspiers: Err, the module is called: 'libvirt-python'10:01
kashyapaspiers: The location of it is: `/home/kashyapc/.virtualenvs/nova/lib/python2.7/site-packages`10:01
kashyap(In my 'tox' env, obv)10:01
aspiersYeah, you must have installed it yourself10:01
aspiersIt's not in requirements.txt because nova doesn't have a hard requirement on it I guess10:01
kashyapYes, installed it myself; and indeed Nova doesn't have a hard req. on it10:02
*** cdent has joined #openstack-nova10:02
aspiersI guess I need to check out one of your branches?10:02
aspiersgit review -d ... ?10:03
cdentaspiers: I've given you days and days to respond to my response that you requested. /me stamps foot :)10:04
aspierscdent: sorry, can't even remember what that was about - been on holiday :-/10:05
kashyapaspiers: Not really; I'm just parsing existing domain_caps data10:05
cdentaspiers: :) me too. It was about openstack/opensource. Something in the tc election related threads10:05
aspierskashyap: AttributeError: 'LibvirtConfigDomainCaps' object has no attribute '_os'10:05
aspierscdent: rings a vague bell :-o10:06
*** ociuhandu has quit IRC10:06
kashyapaspiers: Bad me, you were right, get this one, please: https://review.opendev.org/#/c/673790/10:06
kashyapaspiers: Okay ... I see it, was blind, it's due to the damn 'for' loop, for _each_ of the enums (readonly, type, secure), it is printing the value of 'secure'10:10
aspiersyeah10:11
* aspiers wonders how much to charge for being a rubber duck10:12
kashyapHehe; top it with a drink10:12
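Below is a minimal, hypothetical Python reconstruction of the symptom kashyap describes. The actual WIP script from the paste is not reproduced here. The enum names come from the discussion, but their values are illustrative only. In the buggy version, the loop reads a variable left over from an earlier loop, so every enum reports the last one ('secure'). The fix looks up each enum's own values inside the loop.

```python
# Illustrative loader enums: names come from the discussion, values are made up.
LOADER_ENUMS = {
    "readonly": ["yes", "no"],
    "type": ["rom", "pflash"],
    "secure": ["yes", "no"],
}


def buggy_report(enums):
    """Reproduces the symptom: every enum reports the last enum's values."""
    values = None
    for name in enums:
        values = enums[name]  # after this loop, 'values' is the 'secure' list
    report = []
    for name in enums:
        # Bug: reuses the stale 'values' instead of looking up enums[name].
        report.append((name, values))
    return report


def fixed_report(enums):
    """Pairs each enum name with its own values."""
    return [(name, values) for name, values in enums.items()]


for name, values in fixed_report(LOADER_ENUMS):
    print(f"{name}: {values}")
```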
aspierskashyap: you may eventually want to use https://review.opendev.org/#/c/680777/6/tempest/scenario/test_server_sev.py as a basis for a Secure Boot (SB) tempest test10:14
* kashyap clicks10:14
kashyapaspiers: Ah, thank you10:14
kashyapaspiers: In your "copious free time", can you also please review these "sketch" methods (no tests yet, the first method is broken) for detecting SB: https://review.opendev.org/#/c/682627/1/nova/virt/libvirt/host.py10:17
kashyapaspiers: You'll see that they're "stolen" from how SEV is detected (although SEV also requires checking for kernel & host support)10:17
kashyap(Note to self: maybe move these methods out from host.py --> to the grab-bag-of-all utils.py)10:17
kashyapaspiers: The commit message explains the design, with an example, even (of firmware auto-selection in action)10:18
*** brault has quit IRC10:36
*** brault has joined #openstack-nova10:39
*** ociuhandu has joined #openstack-nova10:41
*** brault has quit IRC10:44
*** ociuhandu has quit IRC10:46
*** AdamMork has joined #openstack-nova10:46
*** brault has joined #openstack-nova10:49
*** ociuhandu has joined #openstack-nova10:50
*** brault has quit IRC10:53
*** ociuhandu has quit IRC10:54
*** dpawlik has quit IRC10:57
*** jaosorior has quit IRC10:58
*** zhubx has joined #openstack-nova11:01
openstackgerritMerged openstack/os-traits master: Add support for ppc64le platforms  https://review.opendev.org/68058011:02
*** dpawlik has joined #openstack-nova11:04
*** boxiang has quit IRC11:04
*** dpawlik has quit IRC11:09
*** dpawlik has joined #openstack-nova11:14
*** ociuhandu has joined #openstack-nova11:19
*** cdent has quit IRC11:19
*** ociuhandu has quit IRC11:23
*** ccamacho has quit IRC11:24
openstackgerritMatthew Booth proposed openstack/nova stable/stein: libvirt: Fix service-wide pauses caused by un-proxied libvirt calls  https://review.opendev.org/68392211:31
*** sean-k-mooney has joined #openstack-nova11:34
*** cdent has joined #openstack-nova11:38
openstackgerritMatthew Booth proposed openstack/nova stable/rocky: libvirt: Fix service-wide pauses caused by un-proxied libvirt calls  https://review.opendev.org/68392711:44
*** zul has joined #openstack-nova11:47
*** rcernin has quit IRC11:49
openstackgerritMatthew Booth proposed openstack/nova stable/queens: libvirt: Fix service-wide pauses caused by un-proxied libvirt calls  https://review.opendev.org/68393011:50
*** ociuhandu has joined #openstack-nova11:57
*** ociuhandu has quit IRC11:58
*** ociuhandu has joined #openstack-nova11:59
*** markvoelker has joined #openstack-nova12:03
*** brault has joined #openstack-nova12:20
*** brault has quit IRC12:22
*** brault has joined #openstack-nova12:22
openstackgerritMerged openstack/nova master: Rename 'nova.common.config' module to 'nova.middleware'  https://review.opendev.org/67689812:26
*** rcernin has joined #openstack-nova12:33
*** ratailor has quit IRC12:35
*** lbragstad_ is now known as lbragstad12:39
*** psachin has quit IRC12:39
openstackgerritBalazs Gibizer proposed openstack/nova master: Functional reproduction for bug 1844993  https://review.opendev.org/68394712:45
openstackbug 1844993 in OpenStack Compute (nova) "migrate a server with qos port with compute RPC pinned to 5.1 fails and leaves the qos port in an inconsistent state" [Undecided,New] https://launchpad.net/bugs/1844993 - Assigned to Balazs Gibizer (balazs-gibizer)12:45
*** brault has quit IRC12:45
openstackgerritBalazs Gibizer proposed openstack/nova master: Reject migration with QoS port from conductor if RPC pinned  https://review.opendev.org/68394812:46
*** nweinber has joined #openstack-nova12:53
*** ricolin has joined #openstack-nova12:59
*** francoisp has joined #openstack-nova13:01
*** larainema has quit IRC13:01
*** mloza has joined #openstack-nova13:02
*** gbarros has joined #openstack-nova13:02
*** efried_pto is now known as efried13:03
*** rouk has joined #openstack-nova13:04
efriedo/ nova13:06
gibio/ efried13:09
*** ociuhandu has quit IRC13:10
*** ociuhandu has joined #openstack-nova13:10
*** jaosorior has joined #openstack-nova13:11
*** Luzi has quit IRC13:14
*** mriedem has joined #openstack-nova13:16
*** ociuhandu has quit IRC13:17
sean-k-mooneyefried: o/ have a good weekend?13:17
*** mkrai has quit IRC13:18
efriedyeah, not bad, did much laziness.13:18
*** artom has joined #openstack-nova13:19
sean-k-mooneygood. stephenfin's cpu patches finally merged on Saturday too13:19
sean-k-mooneyso that's a thing13:19
efriedyeah, I was watching all of that, need to go update blueprints if that hasn't already been done...13:19
openstackgerritStephen Finucane proposed openstack/nova master: Recalculate 'RequestSpec.numa_topology' on resize  https://review.opendev.org/66252213:20
openstackgerritStephen Finucane proposed openstack/nova master: tests: Cleanup of '_test_resize' helper test  https://review.opendev.org/66424513:20
openstackgerritStephen Finucane proposed openstack/nova master: tests: Add '_setup_compute_services' helper  https://review.opendev.org/66310213:20
efrieddone13:20
sean-k-mooneysame, but it meant on Sunday i could actually relax and not worry about them13:21
bauzasmriedem: I guess we can close https://bugs.launchpad.net/nova/+bug/1427772, right?13:21
openstackLaunchpad bug 1427772 in OpenStack Compute (nova) "Instance that uses force-host still needs to run some filters" [Low,Confirmed]13:21
*** mkrai has joined #openstack-nova13:21
bauzasmriedem: because 1/ we no longer allow forcing live-migrations13:21
bauzas2/ we removed the CachingScheduler etc.13:21
sean-k-mooneybauzas: we still accept force for evacuation and maybe resize/cold migrate13:22
bauzaswell, good point13:23
sean-k-mooneyi'm not sure if that affects the bug or not13:23
stephenfinbauzas: It can be closed because it's possible to request a specific host without bypassing the scheduler filters13:23
*** BjoernT has joined #openstack-nova13:23
sean-k-mooneywell at least the numa related stuff is going to be addressed separately13:24
*** beekneemech is now known as bnemec13:24
stephenfinbauzas: Specifically, blueprint add-host-and-hypervisor-hostname-flag-to-create-server13:24
sean-k-mooneystephenfin: that won't prevent people from using the old way with --force13:25
sean-k-mooneyusing the az13:25
stephenfinsean-k-mooney: then the answer is "you're holding it wrong"13:25
*** brault has joined #openstack-nova13:26
*** brault has quit IRC13:26
mriedemosc by default forces live migrations to the specified host13:26
sean-k-mooneyi guess we could close it because we have a new feature to replace it13:26
sean-k-mooneyyes13:26
sean-k-mooneywell13:26
sean-k-mooneyyou have to specify a host13:26
mriedemgiven the age of that bug i'd say screw it, close it13:26
sean-k-mooneyi don't know if it forces it13:26
stephenfinwe can't remove the old way and the new way was specifically added to avoid this issue, so the answer is surely use the new feature13:26
mriedemsean-k-mooney: yes it does13:26
mriedemosc by default (1) requires you specify a host and (2) defaults to 2.113:27
mriedemwhich by default forces the host13:27
sean-k-mooney:(13:27
sean-k-mooneyok i know 1 but not 213:27
mriedemthat is deprecated in osc 4.0 with something i added in train13:27
*** ociuhandu has joined #openstack-nova13:27
mriedemhttps://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-migrate13:27
mriedem--live <hostname> is deprecated13:28
mriedembut for rhosp customers using queens that's going to be an open issue for a long time13:28
sean-k-mooneyunless we backport it13:28
bauzasanyway, nevermind13:28
sean-k-mooneythat said i don't know if we can do that13:28
stephenfinwe'll backport it, I imagine13:28
mriedemwe == red hat13:29
* bauzas doing some internal bug triage :(13:29
sean-k-mooneywell we can't backport the api change downstream13:29
mriedemthe api change is in queens, that's not your problem13:29
mriedemit's the client side tooling13:29
stephenfinwell you were talking about rhosp, so yes, clearly Red Hat13:29
sean-k-mooneyoh ok13:29
sean-k-mooneyi guess we should add it to our backlog to consider. the question is can we do it without breaking people13:30
* bauzas goes back working on https://review.opendev.org/#/c/670112/13:30
sean-k-mooneyif we can't, we can't backport13:30
sean-k-mooneythat said, just adding --live-migration should be fine13:31
stephenfinefried, gibi, (dansmith): Interesting issue here https://review.opendev.org/#/c/663102/11/nova/tests/functional/libvirt/test_numa_servers.py@106013:31
stephenfinefried, gibi, (dansmith): As the comment suggests, that test is working by creating two hosts, one of which doesn't have the second node to which a given physnet is associated https://review.opendev.org/#/c/663102/11/nova/tests/functional/libvirt/test_numa_servers.py@85513:32
*** ociuhandu has quit IRC13:33
stephenfinand we're doing that because it's not possible to have different configuration for different compute "services" in functional tests13:33
stephenfinhowever, there's a check that prevents us from doing just this (configuring a physnet for a NUMA node that doesn't exist) that the test simply wasn't triggering13:33
stephenfinbecause it's only hit as part of the calculation of the host NUMA topology object, which happens in the 'update_available_resource' periodic task, which our test wouldn't normally have time to trigger13:34
*** gbarros has quit IRC13:35
efriednot having looked yet, just reacting to ^, we have a way to trigger that periodic13:35
efriedbut it's also triggered as part of instance creation fwiw13:35
openstackgerritBalazs Gibizer proposed openstack/nova master: Error out interrupted builds  https://review.opendev.org/66685713:36
stephenfinit seems to be triggered when we query against the placement fixture https://review.opendev.org/#/c/663102/11/nova/tests/functional/libvirt/test_numa_servers.py@8213:36
stephenfinand that's the issue - by adding that query, we end up in a situation when the compute service for that falls over13:36
*** BjoernT has quit IRC13:36
*** dpawlik has quit IRC13:37
stephenfinaaactually, maybe it was falling over already and we just never thought to check for that13:37
openstackgerritBalazs Gibizer proposed openstack/nova master: Pull up compute node queries to init_host  https://review.opendev.org/68268013:37
stephenfinwhich is the real reason we're getting NoValidHost - not because the compute service can't support the request but because it never actually started13:37
* stephenfin deletes that test13:38
sean-k-mooneystephenfin: you could check that by looking at the compute service logs13:38
*** BjoernT has joined #openstack-nova13:39
stephenfinthat's what I'm doing as we speak13:39
stephenfin:)13:39
stephenfinyup, it never even started \o/13:40
sean-k-mooneyi don't know the specific incantation to do that in the functional test but i have seen mriedem and gibi asserting log messages are emitted in the functional tests from different agents in the past13:40
* gibi was slow, running that test13:40
mriedemself.assertIn('whatever', self.stdlog.logger.output)13:41
gibisean-k-mooney: you can access the logs from the logger fixture13:41
gibisean-k-mooney: self.stdlog.logger.output13:41
gibisean-k-mooney: but it is not per service13:41
sean-k-mooneycool ya it's the self.stdlog.logger.output bit that i had seen but not used personally in the past13:41
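Nova's functional tests expose captured logs through `self.stdlog.logger.output`. As gibi notes, that capture is shared by all services in the test. The sketch below mimics that pattern using only the standard library. `SharedLogCapture` is a hypothetical stand-in for Nova's logging fixture: it exposes `.output` directly, not `.logger.output`. The logger names and the error message are illustrative. Because the buffer is shared, the logger name in each record is the only thing that tells services apart.

```python
import io
import logging
import unittest


class SharedLogCapture:
    """Hypothetical stand-in for a logging fixture: one buffer for every logger."""

    def __init__(self):
        self.stream = io.StringIO()
        self.handler = logging.StreamHandler(self.stream)
        self.handler.setFormatter(
            logging.Formatter("%(name)s %(levelname)s %(message)s"))
        self._old_level = None

    def start(self):
        root = logging.getLogger()
        self._old_level = root.level
        root.addHandler(self.handler)
        root.setLevel(logging.DEBUG)

    def stop(self):
        root = logging.getLogger()
        root.removeHandler(self.handler)
        root.setLevel(self._old_level)

    @property
    def output(self):
        return self.stream.getvalue()


class ComputeStartupLogTest(unittest.TestCase):
    def setUp(self):
        self.stdlog = SharedLogCapture()
        self.stdlog.start()
        self.addCleanup(self.stdlog.stop)

    def test_failed_compute_is_logged(self):
        # Two "services" log into the same captured buffer.
        logging.getLogger("compute.host1").info("Compute service started")
        logging.getLogger("compute.host2").error(
            "Physnet configured for nonexistent NUMA node")
        self.assertIn("nonexistent NUMA node", self.stdlog.output)
        # Only the logger name distinguishes which service emitted the record.
        self.assertIn("compute.host2 ERROR", self.stdlog.output)
```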
*** dklyle has joined #openstack-nova13:44
*** rcernin has quit IRC13:45
gibimriedem: do you think this qualifies as a rc candidate https://bugs.launchpad.net/nova/+bug/1844993 ?13:46
openstackLaunchpad bug 1844993 in OpenStack Compute (nova) "migrate a server with qos port with compute RPC pinned to 5.1 fails and leaves the qos port in an inconsistent state" [Undecided,In progress] - Assigned to Balazs Gibizer (balazs-gibizer)13:46
openstackgerritStephen Finucane proposed openstack/nova master: Remove 'test_cold_migrate_with_physnet_fails' test  https://review.opendev.org/68396113:47
stephenfingibi, sean-k-mooney: ^ (mriedem too, maybe)13:47
gibistephenfin: looking13:47
efriedgibi: was that a regression in train?13:47
efried(the qos one)13:48
gibiefried: no, it is a bug in the bandwidth + migration code we merged in Train13:48
*** jangutter has quit IRC13:48
*** jangutter_ has joined #openstack-nova13:48
mriedemgibi: technically no because it was a regression in stein13:48
mriedemgibi: but it would be good to get it fixed in train regardless13:48
mriedemgibi: oh i guess that's a side effect of the other bug right?13:49
*** dklyle has quit IRC13:50
*** tbachman has joined #openstack-nova13:50
sean-k-mooneystephenfin: so the fact we can't configure different compute nodes with different config has come up a few times recently13:50
*** cdent has quit IRC13:50
sean-k-mooneywe could remove that test but should we file a bug to add support for that in the future?13:50
gibimriedem: your migrate bug is visible if the rpc is pinned to 5.013:50
gibimriedem: the qos migrate bug is visible even if the rpc is pinned to 5.113:51
sean-k-mooneystephenfin: i think there has to be a way to use mocking to make the CONF non global13:51
sean-k-mooneyand in effect then allow us to have different configs in different agents although set_flags likely won't work in that context13:51
mriedemgibi: right so my bug, introduced in stein, is a problem if computes are pinned to rocky (5.0) and yours if computes are pinned to stein (5.1)13:52
mriedemgibi: then yeah it's probably worth tagging for rc113:52
gibimriedem: OK, then we are on the same page13:53
*** redrobot has quit IRC13:53
*** jangutter_ has quit IRC13:54
gibimriedem: I will respin the fix based on your rpc api method suggestion. I haven't seen the live migration way yet13:55
mriedemit's just a simple abstraction so the caller doesn't need to know the router.client or version internals13:56
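mriedem's point is that the conductor should ask the compute RPC API a yes/no question. It should not reach into `router.client` and compare version strings itself. Below is a minimal sketch of that pattern. `FakeClient`, `FakeRouter`, `supports_migrate_with_qos_port`, `check_migration_allowed` and the `"5.2"` threshold are all hypothetical. The discussion only establishes that computes pinned to 5.1 hit the bug. The `can_send_version` rule used here (same major version, requested minor not above the pinned cap) is an assumption written out by hand, not imported from oslo.messaging.

```python
class FakeClient:
    """Hypothetical RPC client whose sends are capped by a pinned version."""

    def __init__(self, version_cap):
        self.version_cap = version_cap

    @staticmethod
    def _parse(version):
        major, minor = version.split(".")
        return int(major), int(minor)

    def can_send_version(self, version):
        cap_major, cap_minor = self._parse(self.version_cap)
        major, minor = self._parse(version)
        # Same major version required; minor must not exceed the pin.
        return major == cap_major and minor <= cap_minor


class FakeRouter:
    def __init__(self, version_cap):
        self._client = FakeClient(version_cap)

    def client(self, ctxt):
        return self._client


class ComputeAPI:
    # Assumption: the first compute RPC version that handles QoS-port migration.
    QOS_MIGRATION_VERSION = "5.2"

    def __init__(self, router):
        self.router = router

    def supports_migrate_with_qos_port(self, ctxt):
        """Callers ask this instead of inspecting router/client versions."""
        return self.router.client(ctxt).can_send_version(
            self.QOS_MIGRATION_VERSION)


def check_migration_allowed(compute_rpcapi, ctxt, has_qos_ports):
    # Conductor-side guard: reject early rather than leave ports inconsistent.
    if has_qos_ports and not compute_rpcapi.supports_migrate_with_qos_port(ctxt):
        raise RuntimeError("migration with QoS ports needs newer computes")
```

Only the conductor-side guard knows about QoS ports. The version detail stays inside the RPC API class, so a later change to the threshold touches one place.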
gibistephenfin: theoretically you can go and mock things via self.compute['test_compute1'].manager.driver and that will be service and driver selective13:57
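gibi's suggestion works because patching an attribute on one service's driver instance leaves the other services untouched. A global `CONF` override, by contrast, affects every service. Below is a minimal sketch using `unittest.mock.patch.object`. `FakeService`, `FakeManager`, `FakeDriver`, `get_numa_node_ids` and the service names are hypothetical stand-ins, not Nova's functional-test classes.

```python
from unittest import mock


class FakeDriver:
    def get_numa_node_ids(self):
        # Default host topology shared by every fake compute.
        return [0, 1]


class FakeManager:
    def __init__(self):
        self.driver = FakeDriver()


class FakeService:
    def __init__(self):
        self.manager = FakeManager()


def numa_nodes_with_one_host_patched():
    """Give only 'test_compute1' a single NUMA node and report what each host sees."""
    computes = {name: FakeService() for name in ("test_compute0", "test_compute1")}
    driver = computes["test_compute1"].manager.driver
    with mock.patch.object(driver, "get_numa_node_ids", return_value=[0]):
        during = {name: svc.manager.driver.get_numa_node_ids()
                  for name, svc in computes.items()}
    after = driver.get_numa_node_ids()  # the patch is undone on exit
    return during, after


print(numa_nodes_with_one_host_patched())
```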
*** JamesBenson has joined #openstack-nova13:59
*** ociuhandu has joined #openstack-nova14:00
*** JamesBenson has quit IRC14:01
*** JamesBenson has joined #openstack-nova14:01
*** slaweq has joined #openstack-nova14:01
*** ociuhandu has quit IRC14:04
*** Guest30550 has joined #openstack-nova14:05
*** macz has joined #openstack-nova14:05
openstackgerritBalazs Gibizer proposed openstack/nova master: Reject migration with QoS port from conductor if RPC pinned  https://review.opendev.org/68394814:07
*** brault has joined #openstack-nova14:08
*** brault has quit IRC14:08
*** Guest30550 is now known as redrobot14:08
*** brault has joined #openstack-nova14:09
*** brault has quit IRC14:09
*** BjoernT has quit IRC14:09
*** BjoernT has joined #openstack-nova14:13
mriedemoh god schedule_and_build_instances is too big, have to write 100 LOC test case just to test one new line of code14:25
*** gbarros has joined #openstack-nova14:27
*** dklyle has joined #openstack-nova14:31
*** TxGirlGeek has joined #openstack-nova14:35
*** macz has quit IRC14:37
*** mlavalle has joined #openstack-nova14:42
kashyapstephenfin: Hey, about?14:43
kashyapstephenfin: When you are --14:43
stephenfinon a call but yeah14:43
kashyapstephenfin: How much of this is possible  _today_ in Nova?  https://kashyapc.fedorapeople.org/NUMA-pinning.txt14:43
kashyapstephenfin: Sure, respond when you can14:43
stephenfin1 and 214:44
kashyapstephenfin: Context is a QEMU dev was asking about it.14:44
stephenfinwe don't provide a way to configure IOThreads14:44
kashyapYeah, I see from the code that we don't do anything for IOThreads14:45
efriedmriedem: what would it take to split tempest-slow-py3 into two parts?14:45
stephenfinand I don't know what a PXB device is so let's assume not for that too14:45
*** JamesBenson has quit IRC14:46
kashyapstephenfin: For that, `grep` for 'pci-expander-bus' here: https://libvirt.org/formatdomain.html14:46
mriedemefried: why?14:46
efriedmriedem: That would make a run take ~1.5h rather than ~2.5h14:47
efriedthat last hour is always just waiting for tempest-slow-py3 to finish.14:47
openstackgerritMerged openstack/python-novaclient master: Update master for stable/train  https://review.opendev.org/68362714:47
kashyapstephenfin: So the suggested IOThreads 'formula' for management tools is, _assuming_ we know the required info here:14:47
mriedemefried: what do you mean the last hour is waiting for the job to finish?14:47
mriedemfor tempest to finish?14:47
mriedemor to collect logs?14:47
kashyapstephenfin:    num_iothreads = min(num_devices, num_vcpus, num_host_cpus)14:48
kashyapThanks for the answer so far14:48
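Below is kashyap's suggested formula written out as a small Python helper. This is only a sketch of the heuristic quoted above. As stephenfin notes, Nova does not configure IOThreads today, and the helper names are hypothetical. The second function renders the count as libvirt's `<iothreads>` domain XML element.

```python
import xml.etree.ElementTree as ET


def num_iothreads(num_devices, num_vcpus, num_host_cpus):
    """Never more IOThreads than I/O devices, guest vCPUs, or host CPUs."""
    for value in (num_devices, num_vcpus, num_host_cpus):
        if value < 0:
            raise ValueError("counts must be non-negative")
    return min(num_devices, num_vcpus, num_host_cpus)


def iothreads_xml(count):
    """Render the count as a libvirt <iothreads> element."""
    element = ET.Element("iothreads")
    element.text = str(count)
    return ET.tostring(element, encoding="unicode")


# Example: 3 devices, 8 vCPUs, 16 host CPUs, so the device count is the limit.
print(iothreads_xml(num_iothreads(3, 8, 16)))
```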
efriedmriedem: I mean that for a given patch, all the other jobs take <=1.5h to complete, but tempest-slow-py3 takes ~2.5h. Trying to reduce the wait time for a single patch to make it through CI.14:49
openstackgerritMatt Riedemann proposed openstack/nova master: Sanity check instance mapping during scheduling  https://review.opendev.org/68373014:50
mriedemefried: well, there are a few options. one is to go the way of tempest-integrated-compute and make a compute-specific job that only runs compute tests we care about (sort of like nova-next) but only runs slow tests (which tempest-integrated-compute does not),14:50
mriedemthe other major difference is tempest-slow* is a multinode job where nova-next and tempest-integrated-compute are single node14:51
*** ociuhandu has joined #openstack-nova14:51
mriedemthe multinode + slow combo is important to run some tests that we don't otherwise run14:51
mriedemone idea is just drop tempest-slow-py3 from nova's job list and make nova-next multi-node14:51
mriedemnova-next runs compute api and scenario tests, including slow14:51
sean-k-mooneyone factor for multinode jobs is it only starts stacking the second node after the first completes14:52
*** cdent has joined #openstack-nova14:52
efriedI'm sure I'm thinking of it way too simplistically, but what I was thinking was just, like, if tempest-slow-py3 currently runs 10 tests, make a tempest-slow-py3-1 that runs the first 5 and a tempest-slow-py3-2 that runs the second 5.14:52
efriedi.e. one additional job, but since it runs in parallel, reduces the overall run time in half.14:53
mriedemefried: that would be a pain in the ass to manage from a tempest perspective i'd think14:53
mriedemi.e. managing what gets run per job14:53
sean-k-mooneyefried: most of the time is spent on devstack14:53
sean-k-mooneyefried: so you would not reduce the time by half14:53
efriedmmph14:53
mriedemalso note that the scenario tests are run in serial14:53
sean-k-mooneywell that might not be quite true the slow test are slow14:53
mriedemwhich i tried to do something about but failed14:53
sean-k-mooneybut devstack is still a significant portion14:54
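efried's "first half / second half" split can be sketched as deterministic contiguous sharding of a sorted test list. The `shard` helper and the test names are hypothetical. The tempest and Zuul job definitions would still need to express the split, which is the management overhead mriedem raises. As sean-k-mooney points out, each shard job still pays the full devstack setup cost, so only the test-execution portion of the wall-clock time shrinks.

```python
def shard(tests, num_shards, index):
    """Return the contiguous slice of sorted tests owned by shard `index` (0-based).

    The first len(tests) % num_shards shards get one extra test, so shard
    sizes differ by at most one.
    """
    if num_shards < 1 or not 0 <= index < num_shards:
        raise ValueError("need num_shards >= 1 and 0 <= index < num_shards")
    ordered = sorted(tests)
    size, extra = divmod(len(ordered), num_shards)
    start = index * size + min(index, extra)
    end = start + size + (1 if index < extra else 0)
    return ordered[start:end]


tests = [f"test_slow_{i:02d}" for i in range(10)]
first, second = shard(tests, 2, 0), shard(tests, 2, 1)
print(len(first), len(second))
```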
efried"tempest_concurrency":214:54
mriedemhttps://review.opendev.org/#/c/650300/14:54
mriedemno,14:54
sean-k-mooneyefried: we need to run them serially14:54
mriedembecause you want the api tests running at default concurrency, which is i think nproc/214:54
sean-k-mooneythey can fail due to resource constraints if we don't14:54
mriedemsean-k-mooney: we don't *need* to14:54
efriedSo the above line in the job def is... not doing anything?14:54
efriedhttp://zuul.openstack.org/job/tempest-slow14:55
sean-k-mooneymriedem: we used to get intermittent failures when we didn't, right?14:55
mriedemoh geez i didn't realize tempest-slow ran *everything* with only 2 workers14:55
mriedemsean-k-mooney: yes but that was more about ssh issues14:55
mriedemwhich might have been resolved by now14:55
mriedembecause in the long ago, only the scenario tests ran ssh14:56
sean-k-mooneymriedem: oh ok14:56
mriedembut tempest-full has been running with ssh integration on for years now14:56
mriedemefried: so yeah that is definitely one reason that tempest-slow is slower, it's constrained to 2 workers14:56
sean-k-mooneythe scenario tests still use ssh14:57
mriedemsean-k-mooney: yes i know14:57
sean-k-mooneybut unfortunately some api tests also do...14:57
mriedemsean-k-mooney: i'm pretty sure ^ is intentional by the QA team14:57
*** ociuhandu has quit IRC14:57
*** awalende has quit IRC14:58
sean-k-mooneydoing it via the connectivity check is fine. but the api tests were meant to work with any hypervisor and should work with the fake driver14:58
sean-k-mooneybut i know that distinction has kind of faded over the years14:58
mriedemnova-next runs with full concurrency, api and scenario, and we don't have issues from that15:00
mriedemhttps://zuul.opendev.org/t/openstack/build/21573b9826664ec8a456f4e3007a91c4/log/job-output.txt#3163315:00
sean-k-mooneydon't we run the scenario tests serially as a separate step?15:00
mriedemRan: 579 tests in 2771.6131 sec.15:01
sean-k-mooney we do that in one of the jobs15:01
mriedemsean-k-mooney: that's what the full env does15:01
mriedemtempest-integrated-compute15:01
mriedemhttps://github.com/openstack/tempest/blob/master/tox.ini#L10615:01
efriedmriedem: if we made nova-next multinode, we could drop tempest-slow-py3, which would reduce the number of CI nodes consumed by one... but would likely inflate nova-next to 2.5h or more, wouldn't it?15:02
sean-k-mooneyok ya15:02
sean-k-mooneythat is why we don't have issues with concurrency 415:02
mriedemefried: nova-next would still be faster than tempest-slow i think b/c we'd be avoiding non-compute tests,15:02
mriedembut it's hard to say without just proposing15:02
mriedemefried: and we'd be running nova-next with 4 test workers rather than 215:02
sean-k-mooneyi did a multinode tempest-full run for one of the cpu pinning jobs15:03
sean-k-mooneyit was just about 2 hours15:03
mriedemtempest-integrated-compute SUCCESS in 1h 38m 29s15:04
mriedemi'd say if we can stay within a reasonable comparison time-wise to tempest-integrated-compute it's a win15:04
*** ociuhandu has joined #openstack-nova15:04
mriedemso make nova-next multinode and drop tempest-slow-py3 from nova runs15:04
*** ociuhandu has quit IRC15:04
sean-k-mooneyoh sorry tempest-full with concurrency:1 is just about 2 hours15:05
*** ociuhandu has joined #openstack-nova15:05
*** pcaruana has quit IRC15:05
sean-k-mooneyso with concurrency:4 it should be closer to 1.5h as you suggested15:06
*** eharney has quit IRC15:06
*** ociuhandu has quit IRC15:08
mriedemstephenfin: was there any reason why you didn't remove this cellsv1 mention here? https://github.com/openstack/nova/blob/5a1c2d4ffa0815e874f373a87eb38b1833d03b24/nova/conductor/manager.py#L56715:08
*** ociuhandu has joined #openstack-nova15:08
*** udesale has joined #openstack-nova15:09
efriedsean-k-mooney: do you understand mriedem's suggestion enough to propose... whatever change(s) are necessary?15:09
stephenfinmriedem: I started doing it but got stuck because of highly coupled tests https://review.opendev.org/#/c/651316/2/nova/conductor/manager.py I'd prefer to leave it until that patch is finished15:10
stephenfinOr at least put in a TODO to remove all the cells v1 stuff15:10
mriedemstephenfin: that patch is wrong anyway15:11
mriedemyou can't just drop compat code w/o a major rpc version bump15:11
*** cfriesen has joined #openstack-nova15:12
*** ociuhandu has quit IRC15:13
stephenfinYeah, I don't get why we do that. I get that we can't change the signature but can't we start crapping out if a too-old client calls us?15:13
stephenfini.e. 'if request_spec is None: raise Exception('too old')'15:14
mriedemi'll let dansmith answer that one15:14
cdentif cdent: raise Exception('too old')15:14
*** TxGirlGeek has quit IRC15:14
mriedemi ask b/c i want to drop ocata-era error handling in _populate_instance_mapping which would only be hit in the cells v1 case, which is no longer possible15:15
dansmithstephenfin: mriedem: Of course we *can* but we're breaking the contract/rules of what 5.x means vs 6.x. So yeah, we could go all wild-west and just deprecate minor versions by raising random exceptions in the receiving code,15:15
dansmithbut then it becomes hard to reason about when we can remove things and what the impacts will be15:15
dansmiththe version numbers and rules are there to make it easier for the humans to know what falls into what bucket, IMHO15:16
mriedemi know, i just pinged you since you're better with the words on this15:17
mriedem"we can but we shouldn't"15:17
dansmithokay15:17
mriedembad habits etc15:17
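The contract dansmith describes can be modeled as a small compatibility check. This is a hypothetical sketch, not oslo.messaging code; the function names are illustrative only.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR' string into a tuple of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def can_accept(server_version, client_version):
    """Return True if a server implementing server_version must accept
    a message that a client sent with client_version.

    Same major: the server promises to handle every older minor version,
    so compat code for 5.0 has to stay as long as the server speaks 5.x.
    Different major: no promise, which is why old compat code is only
    dropped together with a major bump (5.x -> 6.0).
    """
    s_major, s_minor = parse_version(server_version)
    c_major, c_minor = parse_version(client_version)
    return s_major == c_major and c_minor <= s_minor
```

Under this model, a 5.x server raising "too old" at a 5.0 caller breaks the promise that the major number makes; the version number alone no longer tells a human what is safe to remove.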
stephenfinWould bumping the major version each release be too expensive?15:17
mriedemconductor doesn't need to change that often15:18
dansmithit used to be done almost every release15:18
dansmithbut we don't change that much anymore,15:18
mriedembut we are definitely due for a conductor compute task api 2.015:18
dansmithso yeah, I think it's too expensive15:18
mriedemb/c there is a lot of old shit in here15:18
* dansmith nods15:18
stephenfinmaybe that's the solution so15:18
mriedemonce we go to 2.0 that _populate_instance_mapping just gets dropped15:18
stephenfinI can't remove that stuff yet but I can go to 2.0 and then drop that stuff15:19
openstackgerritMerged openstack/os-resource-classes master: Update the constraints url  https://review.opendev.org/68387215:19
stephenfinmriedem: While I have you - can we remove the os-networks entirely when we drop nova-net?15:19
*** ociuhandu has joined #openstack-nova15:19
*** lpetrut has quit IRC15:20
mriedemi had an etherpad with notes about the more complicated apis that involved networks b/c they aren't all nova-net only anymore15:20
stephenfinIt seems some of the APIs work with neutron but most don't, and I'm trying to decide if we should selectively 404 them or 404 everything15:20
mriedemthat's the issue15:20
mriedemif there are apis that work with neutron we can't just 410 those15:20
stephenfin410, sorry15:20
stephenfinyeah15:20
*** ociuhandu has quit IRC15:21
stephenfinI was afraid you'd say that :( Time to rework again15:22
stephenfinAny idea where that old etherpad is?15:22
mriedembtw this is the last time we did an rpc api major version bump https://review.opendev.org/#/c/541005/15:22
*** ociuhandu has joined #openstack-nova15:22
mriedemhttps://etherpad.openstack.org/p/nova-network-removal-rocky15:22
*** KeithMnemonic has joined #openstack-nova15:22
stephenfinI've a rough idea from converting all the API sample functional tests over but maybe I've missed some stuff15:22
*** mrch_ has quit IRC15:24
stephenfingibi: Just in case, you haven't tried adding floating IP stuff to NeutronFixture or some subclass, have you?15:25
gibistephenfin: let me check some notification tests15:25
*** dtantsur is now known as dtantsur|afk15:26
* stephenfin is trying to convert 'nova/tests/functional/api_sample_tests/test_floating_ips.py' and getting complaints about things like 'create_floatingip' not being defined15:26
stephenfingibi: It would be local if so. We don't have it in tree15:26
stephenfinat least searching for 'create_floatingip' doesn't turn up anything but mocks/the actual call15:26
gibistephenfin: unfortunately no, the IpPayload only contains fixed ips15:28
stephenfindamn15:28
stephenfinthanks for checking15:28
gibinp15:28
*** dklyle has quit IRC15:36
*** dklyle has joined #openstack-nova15:36
openstackgerritMatt Riedemann proposed openstack/nova master: Make nova-next multinode and drop tempest-slow-py3  https://review.opendev.org/68398815:37
mriedemefried: i think this is what you're looking for ^15:37
* efried watches15:37
efriedthanks mriedem15:37
*** damien_r has quit IRC15:46
dansmithmriedem: so on your remap sanity check patch,15:48
dansmithmriedem: it looks to me like you're now letting whichever one comes last win, right?15:48
dansmithmriedem: really what we want is to let the non-cell0 one win, if possible15:49
mriedemdansmith: in the bury in cell0 case, if it's already mapped we don't bury it in cell0,15:49
mriedemin the non-cell0 case, we log an error and map it to the cell we just scheduled it to15:50
mriedemwhich would not be cell015:50
dansmithmriedem: oh I see, you created a helper but only call it in the non-cell0 case15:50
mriedemi think those are both letting the non-cell0 ones win15:50
mriedemyes15:50
dansmithI assumed you were calling it from both places15:50
dansmithokay15:50
mriedemb/c i didn't want to have to mock 100 LOC just to test 1 line change in schedule_and_build_instances15:50
dansmithsounds like cheating15:50
mriedemschedule_and_build_instances is a monster15:51
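A toy model of the "let the non-cell0 mapping win" rule discussed here. The helper name, the `CELL0` constant, and the dict standing in for the instance mappings table are all hypothetical; per the discussion, the real patch only calls its helper from the non-cell0 path and the bury-in-cell0 path does its own "already mapped" check, whereas this sketch folds both into one function for illustration.

```python
import logging

LOG = logging.getLogger(__name__)

CELL0 = "cell0"  # hypothetical name for the cell that holds failed builds


def map_instance_to_cell(mappings, instance_uuid, cell_name):
    """Record which cell an instance lives in, letting a real cell win.

    mappings is a plain dict standing in for the instance mappings table.
    """
    existing = mappings.get(instance_uuid)
    if existing is not None:
        if cell_name == CELL0:
            # Already mapped (e.g. to a real cell): don't bury it in cell0.
            return existing
        if existing != cell_name:
            # Unexpected remap: log it, but trust the cell we just scheduled to.
            LOG.error("Instance %s already mapped to %s; remapping to %s",
                      instance_uuid, existing, cell_name)
    mappings[instance_uuid] = cell_name
    return cell_name
```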
*** rpittau is now known as rpittau|afk15:52
*** TxGirlGeek has joined #openstack-nova15:52
mriedemi also noted that if we drop that InstanceMappingNotFound pre-ocata compat in _bury_in_cell0 the _bury_in_cell0 method can just call the new _map_instance_to_cell method15:52
mriedemwe have a bunch of pre-ocata pre-cells v2 compat handlers all over the conductor task manager code which it'd be nice to remove15:52
mriedemsince they shouldn't be possible anymore15:52
dansmithmeh, I love piles of compat code15:53
dansmithmakes me feel nostalgic15:54
mriedemall of the TODO(alaski)s to feel like an old sweater15:54
mriedemi'll admit15:54
mriedems/to/do/15:54
*** ociuhandu has quit IRC15:56
*** ociuhandu has joined #openstack-nova15:57
dansmithlol15:57
*** ociuhandu has quit IRC16:01
*** ociuhandu has joined #openstack-nova16:02
stephenfinI'm almost certain we've discussed this before, but why do we drop support for old microversions in novaclient?16:03
stephenfinReferring to the API part, rather than the CLI16:03
stephenfinDoesn't osc use that?16:03
*** pcaruana has joined #openstack-nova16:06
*** ociuhandu has quit IRC16:06
mriedemgibi: a few comments in https://review.opendev.org/#/c/683947/ but not worth holding it up,16:07
mriedemgibi: but we might want to reconsider the fault message that the user could see16:07
mriedem"oh hi your cloud provider is doing an upgrade and you shouldn't be trying to resize your instance right now bob!"16:08
mriedemstephenfin: where have we dropped support for old microversions in novaclient?16:08
gibimriedem: the funny thing is that I think we cannot hit that error after the fix16:08
gibimriedem: so we could even remove it16:09
stephenfinmriedem: e.g. 01fb16533bf562f39fe822bc12b9cc34b858035916:09
mriedemgibi: well, technically your fix is in conductor checking its config but the computes involved in the resize/cold migrate could have different config with different pins16:09
mriedemstephenfin: that one broke osc16:09
gibimriedem: can we pin our computes differently? how will they talk to each other?16:09
mriedemand they had to fix to avoid using novaclient and hit the api directly16:10
*** ociuhandu has joined #openstack-nova16:10
mriedemgibi: well i meant if your conductor was unpinned but the computes were, or not restarted yet or something16:10
mriedemafter unpinning them16:10
mriedemi'm not saying that's normal16:10
*** slaweq has quit IRC16:10
mriedembut i'd probably leave the error checking in place until we have a compute rpc api major version bump to 6.016:10
gibimriedem: OK. So it is a sort of transient. I can accept that16:10
stephenfinYup, I recall that coming up now. Any reason we opted to do that instead of reverting the changes to 'novaclient/v2/client.py'?16:10
stephenfinthat = call the API directly16:11
mriedemgibi: so if you wanted, you could write a separate patch which (1) logs the upgrade stuff but doesn't include it in the PortUpdateFailed message, (2) adds a TODO to that PortUpdateFailed code to say we can drop it after compute RPC API is bumped to 6.016:11
gibimriedem: sure. I can do that16:11
mriedemstephenfin: it was released late in novaclient, end of the milestone before FF and i think osc found the problem too late16:11
mriedemstephenfin: so a mixture of (1) releasing breaking / major version things in novaclient too late in the release to catch them and (2) osc historically not doing functional testing16:12
mriedemwell, not doing good enough coverage with functional testing16:12
*** mkrai has quit IRC16:12
mriedemand (3) me not realizing osc was going to be broken16:13
stephenfinthat makes sense16:14
mriedemthat was probably before i cared more about osc and just figured, "this has been deprecated in novaclient since newton, we can surely drop it and no one will care"16:14
mriedemso having been burned before, and burning myself and others, that's why i'm less cavalier about just removing old stuff16:14
stephenfinYeah, certainly true for something like this16:15
mriedemspeaking of an upgrade-related burning sensation https://review.opendev.org/#/q/topic:bug/1843090+(status:open+OR+status:merged)16:15
stephenfinI was holding off on that til some people had looked at the oslo.messaging bug16:16
stephenfinthough maybe that doesn't make sense16:16
mriedemyou mean gibi's RequestSpecImageSerializationFixture in the functional test?16:17
mriedemit's an existing pattern / known issue16:17
mriedemcould also summon dansmith to review those since it's rpc pin related16:18
mriedemand that's dan's middle name16:18
mriedemdan rpc-pin-and-sometimes-evacuate smith16:18
stephenfinIf dansmith doesn't get to it by tomorrow, I will16:19
*** mrch_ has joined #openstack-nova16:19
mriedemgibi: i think we could drop the new and redundant unit test in https://review.opendev.org/#/c/683948/16:29
mriedemthe functional test covers it16:29
*** BjoernT has quit IRC16:31
*** dtruong has joined #openstack-nova16:41
*** pcaruana has quit IRC16:46
*** BjoernT has joined #openstack-nova16:53
*** AdamMork has quit IRC16:57
*** udesale has quit IRC17:00
*** nweinber_ has joined #openstack-nova17:02
*** BjoernT has quit IRC17:04
*** nweinber has quit IRC17:04
*** BjoernT has joined #openstack-nova17:19
*** jmlowe has quit IRC17:19
openstackgerritMerged openstack/nova master: Add note about needing noVNC >= v1.1.0 with using ESX  https://review.opendev.org/68294617:24
*** derekh has quit IRC17:24
*** ociuhandu has quit IRC17:27
*** lpetrut has joined #openstack-nova17:28
*** ociuhandu has joined #openstack-nova17:28
*** ociuhandu has quit IRC17:28
*** ociuhandu has joined #openstack-nova17:28
*** BjoernT has quit IRC17:30
bbobrov_hi! In https://opendev.org/openstack/nova/src/branch/master/nova/servicegroup/drivers/db.py#L99 messaging.MessagingTimeout is getting caught. My understanding is that service_ref.save() only interacts with the database. How can MessagingTimeout happen?17:30
*** bbobrov_ is now known as bbobrov17:30
*** ralonsoh has quit IRC17:35
sean-k-mooneybbobrov: if this is executing on the compute node then the save call would invoke the db update via rpc as the compute services do not have database access17:38
*** BjoernT has joined #openstack-nova17:39
sean-k-mooneybbobrov: so on the conductor, api and scheduler this should not raise a messaging timeout but it can in the compute agents17:41
bbobrovsean-k-mooney: understood, thanks17:41
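Why a "database" save can raise MessagingTimeout: in nova, `Service.save()` is a remotable object method, and on nova-compute oslo.versionedobjects routes remotable calls through the object's `indirection_api` (a conductor RPC client) instead of touching the DB. The toy sketch below imitates that routing; the class names and the fake RPC client are hypothetical stand-ins, not the library's code.

```python
class MessagingTimeout(Exception):
    """Stand-in for oslo_messaging.MessagingTimeout."""


class FakeConductorAPI:
    """Hypothetical RPC client; a real one sends the call over the message bus."""

    def __init__(self, reachable=True):
        self.reachable = reachable

    def object_action(self, obj, method):
        if not self.reachable:
            # No reply from conductor within the RPC timeout.
            raise MessagingTimeout("no reply from conductor")
        # Conductor has DB access, so it performs the write on our behalf.
        return obj._save_to_db()


class Service:
    # None on services with DB access (api, conductor, scheduler);
    # set to an RPC client on nova-compute, which has no DB credentials.
    indirection_api = None

    def __init__(self, db):
        self.db = db
        self.report_count = 0

    def _save_to_db(self):
        self.db["report_count"] = self.report_count

    def save(self):
        if self.indirection_api is not None:
            # Remote path: this is where a MessagingTimeout can surface.
            return self.indirection_api.object_action(self, "save")
        return self._save_to_db()
```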
*** BjoernT_ has joined #openstack-nova17:43
*** jmlowe has joined #openstack-nova17:44
*** BjoernT has quit IRC17:45
*** davee_ has joined #openstack-nova17:45
openstackgerritMatt Riedemann proposed openstack/nova master: Rename Claims resources to compute_node  https://review.opendev.org/67947017:45
openstackgerritMatt Riedemann proposed openstack/nova master: Add a prelude for the Train release  https://review.opendev.org/68332717:50
*** rouk has quit IRC17:50
*** lpetrut has quit IRC17:52
bbobrov#2 then. there is a `periodic_enable` option, and in nova codebase it seems to affect only https://opendev.org/openstack/nova/src/branch/master/nova/service.py#L205 . if true, self.periodic_tasks is scheduled to be run by add_dynamic_timer17:53
bbobrovBut there is no implementation of periodic_tasks or run_periodic_tasks anywhere in Nova. The method in the base scheduler class does `pass`. Are there any implementations of run_periodic_tasks in Nova that i'm missing, or can they come from somewhere else?17:54
sean-k-mooneywe use the one from oslo.service https://github.com/openstack/oslo.service/blob/master/oslo_service/periodic_task.py17:56
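oslo.service provides a `periodic_task` decorator and a `PeriodicTasks` base class whose `run_periodic_tasks` nova's managers inherit; the service then calls it on a timer. The following is a simplified, self-contained imitation of that pattern, not oslo.service's implementation (the real one also handles options such as `run_immediately`, error handling, and config). `ExampleManager` and its task are hypothetical.

```python
import time


def periodic_task(spacing):
    """Mark a method to run every `spacing` seconds (simplified imitation)."""
    def decorator(fn):
        fn._periodic_spacing = spacing
        return fn
    return decorator


class PeriodicTasks:
    """Collects decorated methods and runs the ones that are due."""

    def __init__(self):
        self._last_run = {}

    def _periodic_methods(self):
        for name in dir(type(self)):
            spacing = getattr(getattr(type(self), name), "_periodic_spacing", None)
            if spacing is not None:
                yield name, spacing

    def run_periodic_tasks(self, context, now=None):
        """Run every due task; return seconds until the next one is due."""
        now = time.monotonic() if now is None else now
        idle = None
        for name, spacing in self._periodic_methods():
            last = self._last_run.get(name)
            if last is None or now - last >= spacing:
                getattr(self, name)(context)
                self._last_run[name] = last = now
            remaining = spacing - (now - last)
            idle = remaining if idle is None else min(idle, remaining)
        return idle


class ExampleManager(PeriodicTasks):
    """Hypothetical manager with one periodic task."""

    def __init__(self):
        super().__init__()
        self.calls = []

    @periodic_task(spacing=60)
    def _heal_something(self, context):
        self.calls.append(context)
```

The returned idle time is what lets the caller's timer sleep until the next task is due instead of polling.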
*** yaawang has quit IRC17:57
*** yedongcan has quit IRC17:58
*** yaawang has joined #openstack-nova17:59
*** ricolin has quit IRC18:02
*** davee_ has quit IRC18:03
*** igordc has joined #openstack-nova18:06
*** luksky has joined #openstack-nova18:06
mriedemmelwitt: do you know if tripleo has any kind of tooling that pokes a cell mq/db on upgrade to make sure it's ok before saying everything is good to go in the api?18:07
*** davee_ has joined #openstack-nova18:11
*** cdent has quit IRC18:14
*** ociuhandu has quit IRC18:19
bbobrovsean-k-mooney: thanks, it makes sense now18:22
*** jdillaman has joined #openstack-nova18:31
*** martinkennelly has quit IRC18:32
lyarwoodmriedem: we are only just introducing full support for multi cell deployments now in 16 but I can ask around about that in the morning.18:32
lyarwoodowalsh: ^ unless you're around and know?18:33
mriedemlyarwood: yeah in grenade upstream jobs we don't have multiple cells, just cell0 and cell118:34
mriedemtrying to debug https://bugs.launchpad.net/nova/+bug/184492918:34
openstackLaunchpad bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,Confirmed]18:34
mriedemit looks like we upgrade code to train, restart scheduler, it starts up, even hits the cell1 database to pull compute nodes and instances for its in-memory cache, and then on the first scheduling attempt after that we hang18:35
dansmithmriedem: scheduler doesn't need rabbit to do that, so the rabbit errors in your bug report would be unrelated I'd think18:37
mriedemtrue, didn't think of that18:39
mriedemobviously18:39
mriedemi just left a comment with a bunch of log links,18:39
mriedembut it looks like we start up ok, hit the cell1 db to pull compute nodes and instances, then 4 minutes later is the first scheduling attempt after the upgrade and at that point we're timing out18:40
sean-k-mooneydansmith: we do eventually need to address the rabbitmq heartbeat issue even if it is unrelated to this issue18:41
mriedemit also looks like this goes back to 9/14 which is a few days before i originally thought18:41
dansmithmriedem: that timeout is related to the scatter18:41
dansmithmriedem: that's not new information, I'm just saying18:42
dansmithmriedem: I was expecting to see a trace, but I think the scatter squashes that18:42
sean-k-mooneyi believe if any of the requests raise an exception we catch it and return the exception instead of raising it, or something like that18:43
mriedemit just logs a warning without a trace18:43
dansmithmriedem: right18:43
sean-k-mooneyso i don't think we actually log the traceback the way we normally would18:43
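nova's scatter_gather_cells fans a query out to every cell and, as described here, hands back exceptions and timeouts as values rather than raising them, which is why no traceback reaches the log. A thread-based sketch of that shape follows; nova itself uses eventlet green threads, and the function and sentinel names here are hypothetical.

```python
import concurrent.futures

# Hypothetical sentinel for cells that did not answer in time.
DID_NOT_RESPOND = object()


def scatter_gather(cells, fn, timeout):
    """Call fn(cell) for every cell in parallel and wait at most `timeout` seconds.

    Returns {cell: result}, where a slow cell maps to DID_NOT_RESPOND and a
    failing cell maps to the exception it raised; nothing is re-raised.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(cells)))
    try:
        futures = {pool.submit(fn, cell): cell for cell in cells}
        done, _not_done = concurrent.futures.wait(futures, timeout=timeout)
        results = {}
        for future, cell in futures.items():
            if future not in done:
                results[cell] = DID_NOT_RESPOND
            elif future.exception() is not None:
                results[cell] = future.exception()
            else:
                results[cell] = future.result()
        return results
    finally:
        # Don't block on stragglers; their threads finish in the background.
        pool.shutdown(wait=False)
```

The caller has to inspect each value; a timed-out cell silently contributes nothing unless the caller chooses to log it.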
dansmithsean-k-mooney: didn't I just say that?18:43
*** slaweq has joined #openstack-nova18:44
mriedemwhat i'm concerned about is if something in https://review.opendev.org/#/c/641907/ which merged and was released late is causing some side effect with the restart18:44
sean-k-mooneyyes i was halfway through typing it when you did so i finished it and hit enter18:44
dansmithmriedem: you said it was able to prime its cache on startup right? so this is it getting all nodes from all cells on the schedule run?18:44
mriedemwe're getting through this https://github.com/openstack/nova/blob/597b34cd87ac349c0f3702a872630f3c830b1483/nova/scheduler/host_manager.py#L41318:45
mriedemand then 4 minutes later the first scheduling request comes,18:45
mriedemand it's going back to the cell to pull compute nodes by uuid18:45
mriedemand times out18:45
dansmithmriedem: I guess I thought we were stopping with systemd in that case, so no sighup involved yeah?18:45
sean-k-mooneyhttps://github.com/openstack/grenade/blob/master/projects/60_nova/shutdown.sh#L2118:47
sean-k-mooneyi have not checked devstack but yes18:47
sean-k-mooneyi think that does systemctl stop devstack@n-* effectively18:48
dansmithright18:48
sean-k-mooneyya it does https://github.com/openstack/devstack/blob/master/lib/nova#L1047-L105918:49
mriedemSep 22 00:37:32.126839 ubuntu-bionic-ovh-gra1-0011664420 systemd[1]: Stopped Devstack devstack@n-sch.service.18:49
mriedemSep 22 00:45:55.359862 ubuntu-bionic-ovh-gra1-0011664420 systemd[1]: Started Devstack devstack@n-sch.service.18:49
dansmithmriedem: do we even stop/start mysql in grenade during the upgrade?18:50
mriedemSep 22 00:37:27.786606 ubuntu-bionic-ovh-gra1-0011664420 nova-scheduler[25563]: INFO oslo_service.service [None req-91e88f0d-9b5c-4cb7-a5e9-e7309f922832 None None] Caught SIGTERM, stopping children18:50
mriedemyeah not a HUP18:50
mriedemdansmith: pretty sure we don't18:50
dansmithmriedem: yeah, so, pretty weird that it connects and works once and then times out later, because it's not like we restart mysql after that point or something18:51
dansmithmriedem: so I wonder if we're actually doing something other than what we think18:51
*** igordc has quit IRC18:51
dansmithand since we don't log a trace there we don't know where it's actually timing out18:51
openstackgerritDustin Cowles proposed openstack/nova-specs master: Spec: Provider config YAML file  https://review.opendev.org/68047118:52
*** brault has joined #openstack-nova18:53
*** ircuser-1 has joined #openstack-nova18:54
*** jdillaman has quit IRC18:55
*** tbachman has quit IRC18:57
openstackgerritMatt Riedemann proposed openstack/nova master: Log CellTimeout traceback in scatter_gather_cells  https://review.opendev.org/68411818:59
mriedemdansmith: thinking like this? ^18:59
dansmithmriedem: yeah, I mean we kinda specifically decided not to explode there so we're tolerant of transient failures19:00
dansmithI wonder if we should log debug with exc_info=True and warning without or something19:00
dansmithbut yeah, something19:00
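dansmith's suggestion (keep the WARNING terse, put the traceback at DEBUG via `exc_info`) can be sketched with the standard logging module, which accepts an exception instance as `exc_info`. The function and logger names are hypothetical.

```python
import logging

LOG = logging.getLogger("cell_timeout_example")


def log_cell_timeout(cell_uuid, exc):
    """Log a cell timeout: quiet WARNING, full traceback only at DEBUG."""
    # Operators see a one-line warning without a stack trace...
    LOG.warning("Timed out waiting for response from cell %s", cell_uuid)
    # ...while debug logs carry the traceback showing where the call hung.
    LOG.debug("Traceback for timeout in cell %s", cell_uuid, exc_info=exc)
```

This keeps the tolerant, non-exploding behavior while still making a reproduction diagnosable when debug logging is on, as it is in the gate.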
*** tbachman has joined #openstack-nova19:00
mriedemwas thinking about passing down a kwarg to scatter_gather_cells to tell it what to do, but that's probably icky19:01
mriedemtrace_on_timeout = kwargs.pop('trace_on_timeout', False)19:01
dansmithyeah, I don't like that19:01
dansmithI love **kwargs behavior in python, but hate kwargs.pop('arg', None) abuses of it19:02
dansmithwhich is why we can't have nice things in other languages :)19:02
*** nweinber_ has quit IRC19:02
*** nweinber has joined #openstack-nova19:03
*** tbachman has quit IRC19:03
dansmithmriedem: so you just need to recheck that a few times to get a repro on it yeah?19:05
mriedemmaybe if we get lucky http://status.openstack.org/elastic-recheck/#184492919:06
mriedemit mostly only shows up on certain node providers19:06
mriedemso if i hit a rax node i likely won't see it19:07
mriedemand only grenade jobs for some reason19:07
mriedemsomething getting f'ed up in restarts19:07
sean-k-mooneymriedem: which node providers?19:07
mriedemfort nebula and ovh19:07
dansmithmriedem: you should put that in the bug19:07
sean-k-mooneywe can target fort nebula using a specific node pool label19:07
dansmithso people can know and don't have to ask19:08
sean-k-mooneyim not sure about ovh19:08
mriedemdansmith: from the bug, "It also appears to only show up on fortnebula and OVH nodes, primarily fortnebula."19:08
sean-k-mooneybut we could try and reproduce it via FN if you think that would be useful19:08
*** zhubx has quit IRC19:08
dansmithmriedem: I know :)19:08
mriedemi see what you did there19:08
*** zhubx has joined #openstack-nova19:08
dansmithhaha19:09
*** macz has joined #openstack-nova19:09
hemnaso I still can't seem to run tox -epy27 on anything less than stable/rocky19:11
hemnaqueens and pike both fail19:11
*** slaweq has quit IRC19:11
hemnahttp://paste.openstack.org/show/778953/19:11
hemna6444 failures, all the same type of failure.  sqlalchemy.exc.NoSuchTableError: migration_tmp19:11
dansmithhemna: clean your local directory19:12
hemnaI did19:12
*** READ10 has joined #openstack-nova19:12
donnydIs this job CPU bound?19:12
hemnanuked all pyc files and .tox19:12
dansmither, well, maybe not with that _tmp prefix19:12
dansmithhemna: yeah, normally that comes from stale migration pycs but I think this is something else19:12
mriedemlooking a bit further between the n-sch restart and the timeout, verify_instance works19:14
mriedem2019-09-22 00:47:29.331 | Instance 44d7efdc-c048-4dca-8b4b-3d518321eddd is in cell: cell1 (8acfb79b-2e40-4e1c-bc3d-d404dac6db90)19:14
*** tbachman has joined #openstack-nova19:14
mriedemdonnyd: i think any job running nova and neutron is CPU bound :)19:14
donnydYea that is not a strength of FN19:15
*** macz has quit IRC19:15
*** READ10 has quit IRC19:16
*** macz has joined #openstack-nova19:16
sean-k-mooneymriedem: i assume you have seen the https://c0c3548b65f303ef6c0e-9dc5526a72bde5cc52e2c616e6a483fd.ssl.cf5.rackcdn.com/682061/5/check/grenade-heat/79209ac/logs/screen-n-cond.txt.gz i assume those are a result of the timeout19:16
*** mgoddard has quit IRC19:17
mriedemwhat, NoValidHost? yes that's the user visible failure19:17
sean-k-mooneyya19:17
mriedemthe scheduler asks placement for providers for a build request, it gives back 1, we ask the cell db for the compute node for that one by uuid, timeout and then run an empty list of hosts through the filters19:17
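The failure chain described here (placement returns a candidate, the cell lookup times out, the filters see an empty list, the user gets NoValidHost) can be modeled in a few lines. This is a simplified sketch with hypothetical names; the real scheduler's host manager and filter code is considerably more involved.

```python
class NoValidHost(Exception):
    """Stand-in for nova.exception.NoValidHost."""


# Hypothetical sentinel for a cell that timed out.
DID_NOT_RESPOND = object()


def usable_hosts(cell_results):
    """Flatten per-cell results, silently skipping cells that failed or timed out."""
    hosts = []
    for result in cell_results.values():
        if result is DID_NOT_RESPOND or isinstance(result, Exception):
            continue
        hosts.extend(result)
    return hosts


def select_destinations(cell_results, host_passes):
    """Filter candidate hosts; an empty list ends in NoValidHost."""
    hosts = [h for h in usable_hosts(cell_results) if host_passes(h)]
    if not hosts:
        raise NoValidHost("No valid host was found.")
    return hosts
```

Because the timed-out cell is skipped rather than reported, the user-visible error points at host capacity instead of at the cell connection.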
sean-k-mooneydonnyd: well for jobs that actually enable nested virt/kvm, FN is probably faster than the qemu based jobs19:18
openstackgerritEric Fried proposed openstack/nova master: doc: attaching virtual persistent memory to guests  https://review.opendev.org/68030019:19
sean-k-mooneymriedem: ya i notice the timestamp in the conductor log was after the timeout.19:19
donnydsean-k-mooney: for that case.. and anything with heavy IO should fly in FN - I am curious if the jobs that are failing on FN are related in any way to ipv619:19
*** mgoddard has joined #openstack-nova19:20
dansmithdonnyd: could be, as we might be initiating a connection when this fails,19:20
dansmithwhich might be network-related19:20
mriedemi didn't think grenade jobs were doing anything with ipv619:20
sean-k-mooneydonnyd: were you thinking about the reduced mtu?19:21
dansmithwell, maybe not, but just saying, if something network-y is the difference, that could explain flakiness19:21
donnydI was thinking more about the other one where the node loses contact when the ipv6 public side network is created19:22
donnydI recently tried to help that out by lowering my RA's so when it does it will pick it back up in time19:22
sean-k-mooneythe vms are dual stack vms right. they get a public ipv6 and private ipv4 address19:23
donnydcorrect sean-k-mooney19:23
sean-k-mooneyin that case i'll check what ip we actually use for devstack19:23
donnydThe mtu was lowered to 145019:23
donnydnot sure what the other providers are set at19:23
sean-k-mooneywe are using the ipv4 address19:24
sean-k-mooney192.168.48.93 in this case19:24
openstackgerritMerged openstack/nova master: Get pci_devices from _list_devices  https://review.opendev.org/68067419:24
donnydcan we replicate this job on a test instance that doesn't get kilt19:24
sean-k-mooneywe can always ask infra to hold the node19:25
donnydSo we can go poke around and see what the real issue is.. if its just FN failing this job then it must be FN related...19:25
dansmithdonnyd: it's not just FN19:25
sean-k-mooneydonnyd: its also failing on both of the ovh clouds19:26
dansmithand for this, it matters what is in the cell mapping, not what is in the config file, just FYI19:26
donnydso it only succeeds on rax then?19:26
donnydwhat about limestone or vexxhost?19:27
donnydi think limestone is setup like FN with ipv619:27
sean-k-mooneythis is the logstash link for elastic search http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags:%5C%22screen-n-sch.txt%5C%22%20AND%20voting:1&from=864000s19:28
donnydAnd I think maybe using the custom labels will make sure you can schedule the job to a provider known to fail..19:28
mriedemthis would be right around the time where we hit the scheduler for a tempest test to create a server and the timeout happens 1 minute after that19:29
mriedemhttps://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/syslog.txt.gz#826919:29
mriedemidk if the netlink messages in there could be an issue19:29
sean-k-mooneymriedem: i just realised i'm looking at a single node grenade-heat job which has the same issue19:29
openstackgerritMerged openstack/nova master: [Trivial]Remove unused helper should_switch_to_postcopy  https://review.opendev.org/67917719:31
*** tbachman has quit IRC19:31
openstackgerritMerged openstack/nova master: [Trivial]Removed unused helper _extract_query_params  https://review.opendev.org/67917419:31
mriedemlooks like those syslog messages are 'normal'19:32
dansmithmriedem: yeah probably not related to this19:32
dansmithI dunno if OVS uses netlink for any datapath stuff, but in general, it's for setup of networking stuff,19:33
*** nweinber_ has joined #openstack-nova19:33
dansmithnot related to moving actual packets19:33
dansmithalthough I guess if those indicated failure to set something up, then maybe, but..19:33
donnydI am also curious if this job was succeeding before I changed the MTU on FN19:33
sean-k-mooneyit uses netlink for a few things19:33
sean-k-mooneydonnyd: i dont think its MTU related19:34
donnydWell did this job pass on FN before 12 Sep?19:34
sean-k-mooneyas i said, the logs i was looking at for the grenade-heat job that failed were single node19:34
donnydor has it always failed19:34
*** nweinber has quit IRC19:35
mriedemdonnyd: this is recent19:36
mriedemlogstash only goes back 10 days for us but i'm seeing it go to at least 9/1419:36
mriedemi only noticed the failure while looking at some logs this weekend thouh19:36
mriedem*though19:36
mriedemso idk when it started,19:36
donnydreason I ask is the MTU on FN was changed on the 12th here https://review.opendev.org/#/c/681951/19:36
mriedemif you haven't noticed yet, when we get close to feature freeze and RC1, people recheck job failures like a chicken with its head cut off just to get their code merged19:37
mriedemSep 22 00:14:19 ubuntu-bionic-ovh-gra1-0011664420 sudo[13369]:    stack : TTY=unknown ; PWD=/opt/stack/old/devstack ; USER=root ; COMMAND=/sbin/ip link set mtu 1500 dev br-ex19:37
donnydmessage:"sbin/ip link set mtu 1500 dev br-ex" AND node_provider:"fortnebula-regionone"19:41
donnydyea that is probably not going to work well19:41
*** jmlowe has quit IRC19:42
donnydSep 23 16:41:40 ubuntu-bionic-fortnebula-regionone-0011712952 sudo[14580]:    stack : TTY=unknown ; PWD=/opt/stack/old/devstack ; USER=root ; COMMAND=/sbin/ip link set mtu 1500 dev br-ex19:42
*** pcaruana has joined #openstack-nova19:43
openstackgerritMerged openstack/nova master: Remove stubs from VolumeAttachmentsSample API sample test  https://review.opendev.org/68083419:43
openstackgerritMerged openstack/nova master: Use multiple attachments in test_list_volume_attachments  https://review.opendev.org/68161819:43
donnydbest case it will just cause every packet to be retransmitted at a lower MTU...19:43
sean-k-mooneydonnyd: again i don't think this is mtu related. none of the tests should need to send packets that leave the host19:44
sean-k-mooneydonnyd: the way tempest executes, zuul ssh's into the vm provided by FN and runs everything from within that vm19:45
sean-k-mooneyin a single node deployment that means that nothing should leave the vm19:45
donnydWell I don't think they would have to...19:45
sean-k-mooneyso the MTU wouldn't cause any issues19:46
donnydthe interface itself that br-ex is connected to would have a lower MTU though wouldn't it19:46
donnydand is the API connected to that same interface?19:46
sean-k-mooneyit may but tempest wont be using that interface19:46
donnydOk19:47
donnydI think we should just use the custom label and then hold the node19:48
donnydso we can see what is in fact the what19:48
hemnahrmm, so the tox -epy27 issue goes away on ubuntu for stable/rocky19:48
hemnahappens on opensuse Leap19:49
hemna:(19:49
sean-k-mooneywhat is the issue?19:49
hemnahttp://paste.openstack.org/show/778953/19:49
donnydyes what=issue19:49
*** dave-mccowan has joined #openstack-nova19:50
mriedemhemna: oh, i can't speak for suse, i only use bionic for unit test runs and local dev19:50
mriedemmaybe some sqlite issue with the suse you're using?19:50
hemnahrmm yah I suppose so19:51
owalshlyarwood, mriedem: AFAIK tripleo isn't doing anything for the cells (apart from the default cell).  The cell has to be manually added and the api services need to be bounced19:52
mriedemowalsh: ack19:53
owalshmriedem: and as lee mentioned support is recent so there isn't an upgrade path to worry about yet. Does something need poking?19:54
mriedemno, just started seeing some weirdness related to cell connection timeouts during scheduling after upgrade in our grenade jobs19:56
mriedemmelwitt: this should be good to go now, your comments have been addressed https://review.opendev.org/#/c/541420/19:57
mriedemand the patch on top is happy19:57
openstackgerritMerged openstack/nova master: Tune up db.instance_get_all_uuids_by_hosts  https://review.opendev.org/67962719:58
openstackgerritMerged openstack/nova master: Add reminder to update corresponding glance docs  https://review.opendev.org/68201219:58
*** derekh has joined #openstack-nova20:01
*** derekh has quit IRC20:01
*** brault has quit IRC20:02
*** markvoelker has quit IRC20:07
*** jmlowe has joined #openstack-nova20:09
efriedmriedem: docs for vpmem if you please? https://review.opendev.org/#/c/680300/20:10
*** pcaruana has quit IRC20:12
mriedemefried: one of the cores that reviewed that series can hit it20:17
mriedemi wouldn't know if any of it is true or not20:17
efriedack20:17
*** brault has joined #openstack-nova20:18
openstackgerritMerged openstack/nova master: Refactor pre-live-migration work out of _do_live_migration  https://review.opendev.org/64145320:22
*** tbachman has joined #openstack-nova20:30
*** gbarros has quit IRC20:35
*** gbarros has joined #openstack-nova20:36
melwittmriedem: ack, will look20:41
*** nweinber_ has quit IRC20:59
*** tbachman has quit IRC21:03
*** xek has quit IRC21:05
*** tesseract has quit IRC21:23
*** igordc has joined #openstack-nova21:27
openstackgerritEric Fried proposed openstack/nova-specs master: resubmit image metadata prefiltering spec for ussuri  https://review.opendev.org/68325821:32
*** BjoernT_ has quit IRC21:34
openstackgerritMerged openstack/nova-specs master: Spec: Provider config YAML file  https://review.opendev.org/68047121:34
*** rcernin has joined #openstack-nova21:38
*** rcernin has quit IRC21:40
*** rcernin has joined #openstack-nova21:40
*** takashin has joined #openstack-nova21:47
openstackgerritMatt Riedemann proposed openstack/nova master: Log error when volume validation fails during boot from volume  https://review.opendev.org/68414021:49
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional tests for [cinder]/cross_az_attach=False  https://review.opendev.org/68414121:49
mriedemdansmith: ^ i wrote some functional tests for that long-standing cross_az_attach api bug21:49
mriedemrather than rely on devstack wip'ery21:49
mriedemcrap, i should actually add the file21:50
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional tests for [cinder]/cross_az_attach=False  https://review.opendev.org/68414121:50
*** markvoelker has joined #openstack-nova21:53
*** mriedem has quit IRC21:55
*** markvoelker has quit IRC21:58
openstackgerritMerged openstack/nova-specs master: resubmit image metadata prefiltering spec for ussuri  https://review.opendev.org/68325822:02
*** mlavalle has quit IRC22:15
*** gbarros has quit IRC22:18
*** adriant has quit IRC22:39
*** ivve has quit IRC22:40
openstackgerritTakashi NATSUME proposed openstack/nova master: Update keypairs in saving an instance object  https://review.opendev.org/68304322:43
*** tkajinam has joined #openstack-nova23:01
*** TxGirlGeek has quit IRC23:06
*** adriant has joined #openstack-nova23:12
*** tbachman has joined #openstack-nova23:40
*** alex_xu has joined #openstack-nova23:44
*** luksky has quit IRC23:44

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!