opendevreview | Merged openstack/nova master: Abort startup if nodename conflict is detected https://review.opendev.org/c/openstack/nova/+/872432 | 01:14 |
---|---|---|
opendevreview | Merged openstack/nova master: Stable compute uuid functional tests https://review.opendev.org/c/openstack/nova/+/872441 | 01:14 |
opendevreview | Maxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted https://review.opendev.org/c/openstack/nova/+/872385 | 05:58 |
*** blarnath is now known as d34dh0r53 | 06:52 | |
*** blarnath is now known as d34dh0r53 | 07:02 | |
opendevreview | sean mooney proposed openstack/nova master: introduce global greenpool https://review.opendev.org/c/openstack/nova/+/873061 | 07:19 |
opendevreview | sean mooney proposed openstack/nova master: introduce global greenpool https://review.opendev.org/c/openstack/nova/+/873061 | 07:30 |
opendevreview | sean mooney proposed openstack/nova master: introduce global greenpool https://review.opendev.org/c/openstack/nova/+/873061 | 07:39 |
opendevreview | sean mooney proposed openstack/nova master: introduce global greenpool https://review.opendev.org/c/openstack/nova/+/873061 | 07:52 |
opendevreview | Pierre Libeau proposed openstack/nova master: Add mechanism to manage snapshot during nc init https://review.opendev.org/c/openstack/nova/+/873062 | 08:12 |
plibeau4 | hello guys, I have push https://review.opendev.org/c/openstack/nova/+/873062 to have your feedback before to start written some tests. I have explain the issue also in this bug: https://bugs.launchpad.net/nova/+bug/2006559 | 08:15 |
sean-k-mooney | gibi: bauzas when ye are around take a look at https://review.opendev.org/c/openstack/nova/+/873061 and specificaly the failing tests here https://2f622fb6915ea6772a94-26db3adf591a82ec37c96a7d3180086f.ssl.cf2.rackcdn.com/873061/4/check/openstack-tox-py310/be4df4f/testr_results.html those are the unit tests that are leaking | 08:31 |
bauzas | sean-k-mooney: okay but fwiw the leaked tests for https://launchpad.net/bugs/1946339 | 08:35 |
sean-k-mooney | ha ok i also know why | 08:35 |
bauzas | are functional tests | 08:35 |
sean-k-mooney | https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L497 | 08:36 |
sean-k-mooney | the test is calling h.initialize() | 08:36 |
sean-k-mooney | which calls init_event which does utils.spawn(self._dispatch_thread) | 08:37 |
sean-k-mooney | bauzas: here are the functionla failures https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3e4/873061/4/check/nova-tox-functional-py310/3e45bc4/testr_results.html | 08:38 |
sean-k-mooney | we have many more of them | 08:38 |
bauzas | sean-k-mooney: we agreed yesterday on merging it after the RC1, right? | 08:40 |
bauzas | for Bobcat I mean | 08:40 |
sean-k-mooney | maybe i created it because i tought it would find all the tests that were broken | 08:41 |
sean-k-mooney | i was more concenred with teh test fallout then any impact it woudl nova on nova in production | 08:43 |
sean-k-mooney | the code change is pretty low risk | 08:43 |
sean-k-mooney | fixign all the broken test could be a lot of work. | 08:44 |
sean-k-mooney | bauzas: the libvirt event dispatch thread is curently while True | 08:45 |
sean-k-mooney | https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L209-L218 | 08:45 |
sean-k-mooney | to fix that we need a way to stop that thread for the functionl tests at least | 08:45 |
sean-k-mooney | well or we need to make sure its mocked out properly | 08:46 |
sean-k-mooney | https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L607 is always creating 2 greenthreads that never exit if its called today | 08:48 |
sean-k-mooney | so the libvirt fixture need to be enhanced to stub that out | 08:49 |
bauzas | sean-k-mooney: the problem is that given we don't know which test is causing trouble, we won't be able to make sure your change can fix it | 08:49 |
sean-k-mooney | my patch show which tests are causing the issue | 08:49 |
bauzas | I see | 08:50 |
sean-k-mooney | bauzas: i added a fixture that detects when tests leak green threads https://review.opendev.org/c/openstack/nova/+/873061/4/nova/tests/fixtures/nova.py#1138 | 08:50 |
sean-k-mooney | and prints there name although that part currenlty has a race | 08:51 |
sean-k-mooney | fortunetly when we hit the races it still raise an exception and fails the test | 08:51 |
bauzas | ok, so then we could use your change for telling which functests leak out | 08:52 |
sean-k-mooney | yep | 08:52 |
bauzas | but for the moment, we should only merge your change at the beginning of Bobcat | 08:53 |
sean-k-mooney | i suspect that much of the cases it in the common code | 08:53 |
bauzas | just because I want to have time to make sure that if we find some issues, it shouldn't be a time problem | 08:53 |
bauzas | (I just want to be careful here) | 08:53 |
sean-k-mooney | so as i noted above. the host.py initalize currently spanw 2 greenthreads and that is call by the libvirt driver | 08:54 |
sean-k-mooney | i dont think that is stubbed by the the libvirt fixture | 08:55 |
sean-k-mooney | so today i think all libvirt functional tests are leaking at least 2 greenthreds. | 08:55 |
sean-k-mooney | the libvit event dispatcher thread and connection event thread | 08:55 |
bauzas | lemme verify | 08:57 |
sean-k-mooney | we dont currently save the returned GT and they have a while true so currenlty there is no way to stop those but it would no be hard to add one | 08:57 |
bauzas | at least we know the leaked thread is calling self.live_migration_abort() | 08:58 |
bauzas | but I think the leaked thread comes from a RPC call | 08:58 |
bauzas | not from a libvirt thread | 08:58 |
gibi | I have some concerns of the pooling in general and I left comments there | 08:58 |
sean-k-mooney | ack | 08:58 |
bauzas | honestly, my concern is more about the time here | 08:59 |
gibi | bauzas, sean-k-mooney: I think the current pooling won't catch the RPC threads as that is created in oslo.messaging | 08:59 |
bauzas | I see three cores at least looking at this bug while we only have 7 days for merging features | 08:59 |
sean-k-mooney | gibi: yes it likely wont but those are not created directly by nova | 08:59 |
sean-k-mooney | gibi: with that said i might be able to make it do that | 09:00 |
gibi | I will keep rechecking bauzas' | 09:00 |
gibi | I will keep rechecking bauzas's patch to catch a failure to see where it is coming from | 09:00 |
bauzas | so, while I think it's important to have a better way to have green threads pooling, I'm just saying that we maybe should try to just find the issue and at least do other stuff | 09:00 |
bauzas | sean-k-mooney: see,that's a problem then | 09:00 |
sean-k-mooney | bauzas: no its not | 09:00 |
bauzas | sean-k-mooney: as I said, I'm pretty sure that the threads that are leaked and create this libvirt exceptioin are RPC calls | 09:01 |
bauzas | when I say a problem, I mean I'm not sure this change would help then | 09:01 |
sean-k-mooney | right but ye did not have a repoducer so i tried to create one and found a bunch of issue | 09:01 |
bauzas | https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html | 09:01 |
sean-k-mooney | it may not fix the current issue but all the other tests if found may be flaky | 09:02 |
gibi | sean-k-mooney, bauzas: cool, we have two set of issues to solve then :) | 09:02 |
bauzas | agreed | 09:02 |
bauzas | and agreed with gibi | 09:02 |
sean-k-mooney | and if im wirte we are also constantly builiding up greenthreas as the test run | 09:02 |
bauzas | sean-k-mooneyI'm not saying "NO" to your change and thanks for having worked on it | 09:02 |
sean-k-mooney | yes | 09:02 |
sean-k-mooney | two sets of issues | 09:02 |
bauzas | sean-k-mooney:I'm just saying that I'm afraid this won't help the functest CI failure we have atm | 09:03 |
bauzas | anyway, I said loudly yesterday that I'll stop looking at the CI failures today and I'll rather review some changes | 09:04 |
bauzas | os-vif and os-traits first, and then nova features | 09:04 |
sean-k-mooney | there arnt any we need for os-vif in this release | 09:04 |
bauzas | and I'll continue to look at https://review.opendev.org/c/openstack/nova/+/872975 and recheck until we get a -1 | 09:04 |
sean-k-mooney | not sure about os-traits | 09:04 |
bauzas | that's what I'll be doing | 09:04 |
bauzas | people are free to do anything | 09:05 |
bauzas | they prefer | 09:05 |
bauzas | I also need to look at the releases we have for os-vif, os-traits and os-rc | 09:05 |
sean-k-mooney | i approved the os-vif one yesterday | 09:06 |
sean-k-mooney | i did not look at the others | 09:06 |
*** amorin_ is now known as amorin | 09:09 | |
sean-k-mooney | gibi: so we do stub out the events thread https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/tests/fixtures/libvirt.py#L919-L941 | 09:09 |
sean-k-mooney | but not the other one | 09:09 |
sean-k-mooney | https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L616-L619 | 09:10 |
sean-k-mooney | i wonder if we can just more utils.spawn(self._conn_event_thread) into self._init_events() | 09:11 |
gibi | sean-k-mooney: I won't mix the events part with the connection thread. the events part uses a native thread | 09:12 |
bauzas | elodilles: 2023-02-07 14:02:46.446989 | compute1 | neutron-openvswitch-agent: no process found on https://review.opendev.org/c/openstack/nova/+/871702 | 09:12 |
bauzas | elodilles: that's the second recheck having the same problem | 09:13 |
gibi | but sure we can wrap the connection thread spawning and mock it | 09:13 |
bauzas | elodilles: so, OK, when you say "broken broken", I understand it :) | 09:13 |
sean-k-mooney | well these are both related to event handeling | 09:13 |
sean-k-mooney | so it feell like it should be also in _init_events | 09:14 |
gibi | bauzas: how do you feel about https://review.opendev.org/c/openstack/nova/+/872975/2/nova/virt/libvirt/driver.py#10064 our current trials haven't hit the true positive case yet, but already hit the half false positives a lot | 09:14 |
gibi | sean-k-mooney: connection thread is where nova initiate calls to libvirt the event thread is where libvirt initiate call back to nova | 09:14 |
sean-k-mooney | ok but both of them are related to the libvirt events transmit vs recive | 09:15 |
sean-k-mooney | https://github.com/openstack/nova/blob/ea0526d959f7246c7d741ea24c207b52417d224a/nova/virt/libvirt/host.py#L480 | 09:15 |
sean-k-mooney | and the doc string makes it sound like a good fit | 09:16 |
sean-k-mooney | it woudl jsut be adding the spawn to the end fo that function | 09:16 |
sean-k-mooney | so we spawn the dispatche and reciver threads form the same place | 09:16 |
sean-k-mooney | and the native thread | 09:16 |
sean-k-mooney | anyway ill be back in an hour or two i was woken up at around 6 and started looking at this. so im going to see if i ca rest for a bit although at this point i may have missed that window | 09:18 |
gibi | sean-k-mooney: ack | 09:20 |
elodilles | bauzas: yepp, it's broken broken this time :] | 09:23 |
elodilles | bauzas: but this should do the work when merged: https://review.opendev.org/c/openstack/grenade/+/872969 | 09:25 |
opendevreview | Pierre Libeau proposed openstack/nova master: Add mechanism to manage snapshot during nc init https://review.opendev.org/c/openstack/nova/+/873062 | 09:26 |
zigo | sean-k-mooney[m]: Would you know how to easily just forbid *ALL* .vmdk on train and before? | 09:41 |
bauzas | gibi: sure, let's do it | 09:43 |
opendevreview | Sylvain Bauza proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase https://review.opendev.org/c/openstack/nova/+/872975 | 09:48 |
bauzas | gibi: ^ | 09:49 |
gibi | bauzas: thanks | 10:10 |
bauzas | folks, on os-traits, given we only merged https://opendev.org/openstack/os-traits/commit/feb3e28a00eeab0d4dbf097dd801aff3adeb10d6 we don't need a release | 10:17 |
bauzas | that being said, there are 2 open changes that are related to some accepted nova blueprint https://review.opendev.org/q/status:open+project:openstack/os-traits | 10:18 |
bauzas | elodilles: I just approved https://review.opendev.org/c/openstack/os-traits/+/871226 on os-traits | 10:18 |
bauzas | elodilles: we'll need a release | 10:19 |
bauzas | cores, need a second +2/+W on https://review.opendev.org/c/openstack/os-traits/+/872185 to let Uggla not blocked by this os-trait change | 10:21 |
bauzas | even if there are very little chances to have virtiofs-scaphandre series to be mergeable in Antelope | 10:21 |
* bauzas needs to disappear until noon-ish UTC time | 10:25 | |
opendevreview | Merged openstack/os-traits master: Add new 'COMPUTE_ADDRESS_SPACE_*' traits https://review.opendev.org/c/openstack/os-traits/+/871226 | 10:25 |
gibi | bauzas: it is in merge conflict now so I will rebase it and approve it | 10:30 |
opendevreview | Merged openstack/os-resource-classes master: Change minversion of tox to 3.18.0 https://review.opendev.org/c/openstack/os-resource-classes/+/791974 | 10:31 |
opendevreview | Balazs Gibizer proposed openstack/os-traits master: Add 'COMPUTE_SHARE_LOCAL_FS' https://review.opendev.org/c/openstack/os-traits/+/872185 | 10:33 |
gibi | bauzas: done ^^ | 10:34 |
gibi | bauzas: we need to bump the os-traits min version in placement once there is an os-traits release for A | 10:34 |
elodilles | bauzas: ack, let me know when os-traits can be released | 10:42 |
gibi | elodilles: I think we only waiting for https://review.opendev.org/c/openstack/os-traits/+/872185 to land then we are good to go | 10:43 |
gibi | the rest of the open os-traits reviews seems to be not related to the a release | 10:43 |
gibi | s/a release/A release/ | 10:44 |
elodilles | gibi: ack, thanks | 10:46 |
elodilles | as I see that patch will land within 2-3 minutes if everything goes well | 10:48 |
opendevreview | Merged openstack/os-traits master: Add 'COMPUTE_SHARE_LOCAL_FS' https://review.opendev.org/c/openstack/os-traits/+/872185 | 11:02 |
elodilles | and merged ^^^, do you want me to create the release patch? | 11:11 |
gibi | elodilles: if it means you cannot +2 it then I will create the release patch :) | 11:14 |
elodilles | gibi: then please do o:) | 11:22 |
gibi | on it | 11:25 |
opendevreview | John Garbutt proposed openstack/nova stable/victoria: Revert "Revert resize: wait for events according to hybrid plug" https://review.opendev.org/c/openstack/nova/+/857423 | 11:27 |
gibi | elodilles, bauzas https://review.opendev.org/c/openstack/releases/+/873106 | 11:30 |
elodilles | thanks! will review when the release jobs have finished | 11:34 |
gibi | elodilles: cool, and bauzas will be back soon I guess to review it too | 11:37 |
bauzas | just done | 11:38 |
* bauzas is eating a kebab while working :p | 11:38 | |
elodilles | +2'd! | 11:42 |
elodilles | bauzas: Bon Appétit! | 11:42 |
gibi | bauzas: enjoy | 11:45 |
*** blarnath is now known as d34dh0r53 | 12:14 | |
gibi | bauzas: I just realized that we don't have debug log enabled in the functional test by default so I will going to change https://review.opendev.org/c/openstack/nova/+/872975 to log a warning instead | 12:28 |
bauzas | ack, ok | 12:29 |
sean-k-mooney | zigo: the best you can do is disable the format in glance including disbaling the image conversion. there is no way to fully block it in nova and even then im not sure idsablity in glace will prevnet all ways it could be used. | 12:30 |
bauzas | or pass OS_DEBUG=1 in tox.ini? | 12:30 |
sean-k-mooney | OS_DEBUG=1 can be set on the comand line | 12:30 |
sean-k-mooney | no need to modify tox.ini | 12:30 |
sean-k-mooney | oh you mean for ci | 12:30 |
sean-k-mooney | ya you could in the DNM change | 12:30 |
sean-k-mooney | but its very verbose on failure i think the pass logs are not too bad | 12:31 |
bauzas | yeah I mean in the specific change | 12:31 |
bauzas | but ok, will change the level | 12:31 |
sean-k-mooney | well it might be useful to use debug | 12:31 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase https://review.opendev.org/c/openstack/nova/+/872975 | 12:31 |
bauzas | haha, gibi just did it :) | 12:32 |
gibi | we can switch to DEBUG all over for this patch if we want but that will mean a lot bigger subunit file to deal with :) | 12:35 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (db) https://review.opendev.org/c/openstack/nova/+/831193 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (objects) https://review.opendev.org/c/openstack/nova/+/839401 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (manila abstraction) https://review.opendev.org/c/openstack/nova/+/831194 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (drivers and compute manager part) https://review.opendev.org/c/openstack/nova/+/833090 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api) https://review.opendev.org/c/openstack/nova/+/836830 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support https://review.opendev.org/c/openstack/nova/+/850499 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add metadata for shares https://review.opendev.org/c/openstack/nova/+/850500 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach notification https://review.opendev.org/c/openstack/nova/+/850501 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach notification https://review.opendev.org/c/openstack/nova/+/851028 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add shares to InstancePayload https://review.opendev.org/c/openstack/nova/+/851029 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares https://review.opendev.org/c/openstack/nova/+/852085 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working. https://review.opendev.org/c/openstack/nova/+/852086 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add virt/libvirt error test cases https://review.opendev.org/c/openstack/nova/+/852087 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/854823 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/854824 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach_error notification https://review.opendev.org/c/openstack/nova/+/860282 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach_error notification https://review.opendev.org/c/openstack/nova/+/860283 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/860284 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860285 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares https://review.opendev.org/c/openstack/nova/+/860286 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part) https://review.opendev.org/c/openstack/nova/+/860287 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860288 | 12:55 |
opendevreview | ribaudr proposed openstack/nova master: Documentation https://review.opendev.org/c/openstack/nova/+/871642 | 12:55 |
*** dasm|off is now known as dasm|rover | 13:51 | |
*** Roamer`_ is now known as Roamer` | 14:00 | |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api) https://review.opendev.org/c/openstack/nova/+/836830 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support https://review.opendev.org/c/openstack/nova/+/850499 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add metadata for shares https://review.opendev.org/c/openstack/nova/+/850500 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach notification https://review.opendev.org/c/openstack/nova/+/850501 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach notification https://review.opendev.org/c/openstack/nova/+/851028 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add shares to InstancePayload https://review.opendev.org/c/openstack/nova/+/851029 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares https://review.opendev.org/c/openstack/nova/+/852085 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working. https://review.opendev.org/c/openstack/nova/+/852086 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add virt/libvirt error test cases https://review.opendev.org/c/openstack/nova/+/852087 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/854823 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/854824 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach_error notification https://review.opendev.org/c/openstack/nova/+/860282 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach_error notification https://review.opendev.org/c/openstack/nova/+/860283 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/860284 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860285 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares https://review.opendev.org/c/openstack/nova/+/860286 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part) https://review.opendev.org/c/openstack/nova/+/860287 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860288 | 14:11 |
opendevreview | ribaudr proposed openstack/nova master: Documentation https://review.opendev.org/c/openstack/nova/+/871642 | 14:11 |
bauzas | Uggla: your're next in my review list, can I do it ? ^ | 14:11 |
Uggla | bauzas, sure | 14:12 |
bauzas | ack, grabbing a coffee first | 14:13 |
*** tosky_ is now known as tosky | 14:19 | |
opendevreview | Elod Illes proposed openstack/nova stable/train: DNM: CI test https://review.opendev.org/c/openstack/nova/+/873116 | 14:33 |
gibi | bauzas: we are really unlucky with the DNM patch, it is still passing and not even producing the half false positives | 14:50 |
gibi | anyhow I rechecked again | 14:50 |
bauzas | yup, I've seen it | 14:50 |
bauzas | maybe we would need to merge it for like 2 or 3 days and revert it after | 14:50 |
bauzas | so more than one change would use it | 14:50 |
*** dasm|rover is now known as dasm|afk | 15:36 | |
bauzas | if, while I'm reviewing some series, someone wants to jab my own power management series, it would be loved <3 https://review.opendev.org/q/topic:bp%252Flibvirt-cpu-state-mgmt | 15:40 |
bauzas | I can hit the bullet. | 15:40 |
gibi | bauzas: on my list to go back to it | 15:40 |
bauzas | thanks | 15:41 |
bauzas | I'm on the manila shares one, but a single coffee shot isn't enough | 15:41 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Look for cpu controller on cgroups v2 https://review.opendev.org/c/openstack/nova/+/873127 | 15:50 |
gmann | gibi: did you get chance to look into the placement RBAC change, updated as per your comment https://review.opendev.org/c/openstack/placement/+/865618 | 16:57 |
gibi | gmann: good point. I'm fast tracking your patch now | 17:00 |
gibi | gmann: +2 +A thanks | 17:02 |
gibi | the check queue getting long in zuul (>2h to get an executor). It almost feels like FF week :) | 17:03 |
gmann | gibi: thanks | 17:10 |
gmann | yeah, gate is in bad situation | 17:10 |
opendevreview | Dan Smith proposed openstack/nova master: Add docs for stable-compute-uuid behaviors https://review.opendev.org/c/openstack/nova/+/872977 | 17:36 |
gibi | bauzas: you might not like it but I reviewed the power managemenet series | 18:18 |
gibi | bauzas: I'm close to -2 on the tar.gz part ;) | 18:19 |
sean-k-mooney | gibi: that is proably resolveable | 18:19 |
bauzas | gibi: thanks for the fish | 18:19 |
bauzas | gibi: do you have time for discussing about it ? | 18:19 |
sean-k-mooney | we started with a large copy of the syfs filesystem but the funcitonl test can be writeen a differnt way | 18:19 |
gibi | bauzas: I have a bit of time yes | 18:20 |
sean-k-mooney | the current approch was taken when thsi started as arbiterd | 18:20 |
sean-k-mooney | it had a much larger scope | 18:20 |
gibi | my main concern with the tar.gz is that I cannot review it or diff it in a normal way | 18:20 |
gibi | so it is hidden information | 18:20 |
bauzas | gibi: I do understand your point | 18:20 |
sean-k-mooney | for the limite scope in nova i htink we can create a fixture another way | 18:20 |
gibi | I would do couple of helper function that generates the fs on the fly | 18:20 |
bauzas | gibi: the problem is, as said by sean-k-mooney, is that the large number of files was creating a problem | 18:21 |
sean-k-mooney | gibi: ya that was the alternitive | 18:21 |
sean-k-mooney | bauzas: we dont need a large number of files | 18:21 |
gibi | bauzas: I assume we have two type of files | 18:21 |
bauzas | ok, I can try to generate those arbitrary files then | 18:21 |
sean-k-mooney | we actully only need a small subset | 18:21 |
bauzas | we only need a subset of the sysfs | 18:21 |
bauzas | yep, ok | 18:21 |
gibi | one for the governor and one for the online | 18:21 |
bauzas | sounds a plan then | 18:21 |
sean-k-mooney | gibi: ya more or less | 18:21 |
bauzas | yup | 18:22 |
gibi | huhh, this went better than I expected :) | 18:22 |
sean-k-mooney | and then the parent directory structure | 18:22 |
bauzas | and we can augment the fixture if needed | 18:22 |
sean-k-mooney | but thats just a call to "mkdir -p" | 18:22 |
bauzas | for other usages, I mean | 18:22 |
gibi | sean-k-mooney: +1 | 18:22 |
sean-k-mooney | within the tempfs/tempdir | 18:22 |
bauzas | yeah, I can do it quick | 18:22 |
gibi | I found one edge case that might be intersting | 18:23 |
* bauzas doesn't have the kids at home until end of next week, my productivity is raising the bar | 18:23 | |
sean-k-mooney | i didnt do that for arbiterd partly becasue it was a quick hack to get fucntional test and i expecet to do much more eventully | 18:23 |
gibi | if you do reconfiguration from cpu_state to governor strategy or back then the patch might not do the right thing now | 18:23 |
sean-k-mooney | oh that is intersting | 18:23 |
gibi | i.e. we set the governor but we never online a cpu that was off before | 18:24 |
sean-k-mooney | we coudl disallow that. but that should be resolveable | 18:24 |
gibi | yeah I think this is resolveable | 18:24 |
gibi | an fairly easily testable I hope | 18:24 |
bauzas | gibi: ah, good edge case | 18:24 |
sean-k-mooney | so the issue is when we start we cant know what it used to be | 18:24 |
gibi | just need a bit more logic on power_up and down | 18:24 |
bauzas | what would be your preference ? | 18:24 |
sean-k-mooney | i actully think i know how to detect it | 18:25 |
bauzas | to check the core status and online if a governor helper is called ? | 18:25 |
sean-k-mooney | when its set to govoner the libvirt chack that checks the cores should be online should catch this | 18:25 |
gibi | bauzas: yeah | 18:25 |
bauzas | ok, I can augment power_down and up | 18:25 |
bauzas | gibi: sean-k-mooney: fwiw, I'll be traveling tomorrow back-and-forth to Paris | 18:26 |
bauzas | so most of my day will be trains | 18:26 |
bauzas | but I'll be online | 18:26 |
bauzas | and on work status | 18:26 |
gibi | maybe the other direction is trickier. so you have a governor startegy and you set the cpu to low. Then you reconfigure to cpu_state. And you boot a VM that need the cpu so you online it, but it still has the low governor set | 18:26 |
sean-k-mooney | bauzas: you chould change https://review.opendev.org/c/openstack/nova/+/821228/5/nova/virt/libvirt/host.py#755 | 18:26 |
sean-k-mooney | we shoudl check if cpu power management is enabled and its set to cpu_state | 18:27 |
sean-k-mooney | if its set to govoner then we can detect that cores are offlien and online them | 18:27 |
sean-k-mooney | or raise an error and stop the compute | 18:27 |
bauzas | damn operators who like to play with config options | 18:27 |
sean-k-mooney | this is not really something you should change when there are instance on the host | 18:28 |
sean-k-mooney | gibi: the other direction wee should not really do anything (state->govoner) | 18:28 |
bauzas | sean-k-mooney: true, I wonder whether it would be rather preferable to detect it at startup and fail | 18:28 |
sean-k-mooney | sorry (govoner->state) | 18:29 |
gibi | I'm OK to reject the reconfiguration with instance on the host, BUT we power down CPUs at init_host without an instance | 18:29 |
sean-k-mooney | because you are allowed to manage the govoner today outside of nova | 18:29 |
gibi | so you don't need an instance to get a discrepancy | 18:29 |
sean-k-mooney | so you may have change the state | 18:29 |
bauzas | I think we're entering danger zone | 18:29 |
sean-k-mooney | with that we could say if you have ONF.libvirt.cpu_power_management=true | 18:29 |
bauzas | here, we're asking operators to think about what they gonna do | 18:30 |
sean-k-mooney | then all govenros in the cpu_dedicated_set must be the same | 18:30 |
sean-k-mooney | we could detect that in that case and catch the (govoner->state) change | 18:30 |
bauzas | I mean, if I'm an operator that already sets governor levels, then I should NOT let nova play with my cores | 18:30 |
sean-k-mooney | so (govoner->state) we can detect that there is a govoner differnce on the cpus, (govoner<-state) we can detect the offline cores | 18:31 |
bauzas | sean-k-mooney: I don't want to add logic while I can add documentation about supported features | 18:31 |
sean-k-mooney | i agree on documenting that you cant cahnge this with instances on the host | 18:31 |
sean-k-mooney | and shoudl reboot if you do change it or something like that | 18:32 |
sean-k-mooney | but we can detect and error or fix it fi we want too | 18:32 |
sean-k-mooney | if we have CONF.libvirt.cpu_power_management=true i think that shoudl mean you cannot manage the cpus outside of nova | 18:32 |
gibi | again, becase we apply the config to the cpus at init_host regardless of any instance on the host you don't need an instance on the host to cause problems | 18:32 |
sean-k-mooney | gibi: we dont your corect | 18:33 |
sean-k-mooney | but we can detect it and document that you should not do this without a host reboot | 18:33 |
bauzas | I'm confused | 18:33 |
gibi | sean-k-mooney: I agree to document | 18:34 |
gibi | sean-k-mooney: and detect if possible | 18:34 |
sean-k-mooney | detect is possibel but i think bauzas would prefer to not require that for the feature to merge and maybe to it later | 18:34 |
sean-k-mooney | (govoner->state) the concer is we leave some cpus in low perfomacne mode | 18:35 |
sean-k-mooney | (state-) | 18:35 |
sean-k-mooney | (state->govoner) the concern is we leave some cores offline | 18:35 |
sean-k-mooney | we can detect boot and make it a hard error from init_host | 18:35 |
sean-k-mooney | in either case a host reboot will resolve it or they operator can fix it | 18:36 |
gibi | yepp | 18:37 |
sean-k-mooney | bauzas: does ^ make sense | 18:38 |
sean-k-mooney | the offline core check is triival its just https://review.opendev.org/c/openstack/nova/+/821228/5/nova/virt/libvirt/host.py#755 | 18:39 |
sean-k-mooney | the inconsitent govoner state is also trivial | 18:39 |
sean-k-mooney | i dont think we shoudl check if its explictly powersave or anything like that | 18:39 |
opendevreview | Merged openstack/placement master: Modify the placement API policies defaults and scope_type https://review.opendev.org/c/openstack/placement/+/865618 | 18:39 |
sean-k-mooney | just in cpu_sate mofe assert the govoner is the same for all cores in cpu_dedicated_set | 18:40 |
sean-k-mooney | s/mofe/mode/ | 18:41 |
bauzas | I just want to clarify | 18:41 |
bauzas | are we OK with doing this in init_host() ? | 18:42 |
gibi | yes | 18:42 |
sean-k-mooney | yep | 18:42 |
bauzas | OK | 18:42 |
bauzas | and in init_host(), do we agree on detecting that if power strategy is set to governor, all cores should be up ? | 18:43 |
sean-k-mooney | i would prefer if you put these in a new fucntions that you invoked form there for simpler testing | 18:43 |
sean-k-mooney | bauzas: yes | 18:44 |
sean-k-mooney | if its set to govoner the old requiremnt that all core must be up should still apply | 18:44 |
bauzas | https://review.opendev.org/c/openstack/nova/+/868237/9/nova/virt/libvirt/driver.py#826 | 18:44 |
bauzas | I already call power_down_all_cores() | 18:44 |
bauzas | in power_down_all_cores() I can do two checks | 18:45 |
bauzas | besides the existing one | 18:45 |
sean-k-mooney | you should not over complicate that function | 18:45 |
bauzas | 1/ if strategy is set to governor, all cpus need to be online | 18:45 |
bauzas | 3/ if strategy is set to cpu_state, then all governors should be identical | 18:46 |
bauzas | s/3/2 | 18:46 |
sean-k-mooney | i would prefer a new validate_cores() or similr function and only call libvirt_cpu.power_down_all_dedicated_cpus() if the mode is state | 18:46 |
sean-k-mooney | bauzas: yes those are the two checks that i think you shoudl do | 18:47 |
sean-k-mooney | but not in libvirt_cpu.power_down_all_dedicated_cpus() | 18:47 |
sean-k-mooney | do it ins libvirt_cpu.validate_all_cpus() | 18:47 |
bauzas | I can manage that request | 18:47 |
gibi | for 2/ if all cores has governor low as nova set it to that before the reconfiguration then we have a problem still | 18:47 |
sean-k-mooney | gibi: no | 18:47 |
sean-k-mooney | we cant check for that | 18:47 |
sean-k-mooney | because the admin might have set it to low intentionally | 18:48 |
sean-k-mooney | or to any other value | 18:48 |
bauzas | this is why I think docs is important | 18:48 |
bauzas | (which is missing, but we can write it before RC1) | 18:48 |
sean-k-mooney | thats why i stated the requiremnt that they should all be the same if the power state is manged by nova | 18:48 |
bauzas | I'm OK with this | 18:49 |
bauzas | nova will try to detect | 18:49 |
bauzas | if the operator explicitely manages all the governors and wants to set nova to turn off cores, that's his choice | 18:49 |
gibi | I undrestand that we cannot catch that if the operator changed a governor directly. And I agree that we should not be able to catch that. But if an empty compute was configured with governor strategy, then reconfigured to cpu_state strategy then that compute will have all dediceted cpus in low performance mode for ever | 18:49 |
bauzas | he'll probably end up with problems, but meh, unsupported | 18:49 |
sean-k-mooney | gibi: that is fixed by the host reboot | 18:50 |
sean-k-mooney | that why i think we need to document that if you want to change this you should reboot the host | 18:50 |
sean-k-mooney | so we start form a clean state | 18:50 |
gibi | sean-k-mooney: so we will say in the doc that the startegy can only be change via host reboot? | 18:50 |
sean-k-mooney | that is what im suggestign yes | 18:50 |
gibi | OK | 18:50 |
gibi | that will solve it yes | 18:50 |
bauzas | gibi: the problem is that we can't assume that governor_high is the default value *before* | 18:50 |
gibi | bauzas: I know :) | 18:51 |
gibi | host reboot, it is | 18:51 |
bauzas | ok, so docs | 18:51 |
sean-k-mooney | docs and the 2 checks you wrote above please | 18:51 |
sean-k-mooney | docs is the most imporant | 18:51 |
sean-k-mooney | but i would like to see the check if possibel too | 18:51 |
gibi | works for me | 18:52 |
gibi | thanks folks for the discussion | 18:52 |
bauzas | ++ | 18:52 |
bauzas | will be working on it on the train tomorrow | 18:52 |
sean-k-mooney | i will be around for reviews tomorrow and then wednesday | 18:53 |
sean-k-mooney | so feel free to ping me | 18:54 |
sean-k-mooney | i would normaly take the week of valentines off but since it FF week im going to be here for wednesday-friday | 18:54 |
bauzas | don't feel obliged | 18:57 |
bauzas | we have already promised more than what we can offer | 18:58 |
bauzas | and we're short in time | 18:58 |
bauzas | the numbers will be terrible, but I can surely explain those | 18:58 |
opendevreview | Elod Illes proposed openstack/nova stable/train: DNM: CI test https://review.opendev.org/c/openstack/nova/+/873116 | 20:21 |
gmann | bauzas: with placement change merged (https://review.opendev.org/c/openstack/placement/+/865618) we can mark this BP as completed https://blueprints.launchpad.net/placement/+spec/policy-defaults-improvement | 21:09 |
*** dasm|afk is now known as dasm|offline | 22:05 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!