*** tbachman is now known as Guest2804 | 00:15 | |
*** bhagyashris is now known as bhagyashris|out | 03:30 | |
opendevreview | Lucian Petrut proposed openstack/nova master: api: enable oslo.reports when using uWSGI https://review.opendev.org/c/openstack/nova/+/810922 | 06:23 |
---|---|---|
bauzas | good morning Nova | 07:00 |
* kashyap waves | 07:19 | |
opendevreview | alecorps proposed openstack/nova master: VMware: Support volumes backed by VStorageObject https://review.opendev.org/c/openstack/nova/+/808791 | 08:32 |
bauzas | mmm, I'm stuck trying to install a devstack on RHEL8.2 with a "openstack: command not found" when creating keystone accounts... https://paste.opendev.org/show/809996/ | 08:41 |
bauzas | anyone hitting it ? | 08:42 |
bauzas | I'm out of ideas | 08:42 |
kashyap | bauzas: Why are you installing it on RHEL8.2? | 08:47 |
kashyap | FWIW, I'd suggest to pick a latest-1 Fedora (or Debian/Ubuntu - if you're comfy w/ it) :) | 08:47 |
* kashyap crawls back into his cave - need to prepare for a presentation on a short notice | 08:48 | |
frickler | bauzas: did you check that there is no earlier error already? also 8.2 afaict isn't supported by devstack anymore | 08:49 |
opendevreview | Rodolfo Alonso proposed openstack/nova master: Set "cache_ok=True" in "TypeDecorator" inheriting classes https://review.opendev.org/c/openstack/nova/+/807359 | 09:01 |
gibi | bauzas: hi! did you get the moderator info for the PTG from Ashlee? or should I forward? | 09:13 |
bauzas | frickler: kashyap: thanks (for some reason, got no pidgin notification when you highlighted me) | 09:18 |
bauzas | frickler: kashyap: I'll then use RHEL8.4 I guess (I need it for testing the nvidia GPUs, Fedora is not supported for their driver) | 09:18 |
bauzas | gibi: hmmm, by email ? if yes, nope | 09:19 |
bauzas | gibi: thanks | 09:19 |
gibi | email so I forward then | 09:19 |
gibi | done | 09:20 |
bauzas | gibi: thanks, will look at it ! | 09:21 |
gibi | also fyi, on Monday I will only be available from 15:00 UTC | 09:21 |
gibi | rest of the week I will be fully available | 09:24 |
bauzas | ++ | 09:25 |
kashyap | bauzas: Ah, I see. | 09:35 |
viks__ | hi, I have set `live_migration_completion_timeout=100` and `live_migration_timeout_action=force_complete`, now i'm stressing the vm via the stress-ng tool with load going up to 300. But my migration is not completing. Why is the `force_complete` action not kicking in? | 09:37 |
-opendevstatus- NOTICE: zuul was stuck processing jobs and has been restarted. pending jobs will be re-enqueued | 10:02 | |
opendevreview | Ilya Popov proposed openstack/nova master: Fix to use NUMA cell with more free memory first https://review.opendev.org/c/openstack/nova/+/805649 | 10:04 |
gibi | sean-k-mooney: you were right, the rpc.NOTIFIER is the global that we facilitate the test case crosstalk https://bugs.launchpad.net/nova/+bug/1946339/comments/7 | 12:04 |
gibi | s/we facilitate/ facilitates/ | 12:04 |
gibi | we reset that global between tests but nova dynamically gets the global from the rpc module, whatever it is at the moment | 12:05 |
gibi | so if the global was re-inited it uses the re-inited global for the next notification | 12:05 |
gibi | I don't see how to fix this from the rpc.NOTIFIER perspective | 12:09 |
sean-k-mooney | i see, i looked at that briefly but did not see how that happened, but that is what my gut was telling me had to be happening | 12:09 |
gibi | you have good gut :) | 12:10 |
sean-k-mooney | so this global https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/rpc.py#L53 is really the issue right | 12:11 |
gibi | yes, | 12:11 |
sean-k-mooney | we need that to be mocked in the setup of the test | 12:11 |
gibi | and whatever code wants to emit a notification uses that global | 12:11 |
gibi | sean-k-mooney: that won't work as the code grabs the global at the point of time when the notification needs to be emitted | 12:11 |
gibi | so the first tc will grab it 60 seconds after the tc is finished | 12:12 |
gibi | and at that time it is already restubbed to the current test case | 12:12 |
gibi | so it grabs the new stubbed version that is connected to the current testcase | 12:12 |
gibi | hence the crosstalk | 12:13 |
gibi | if nova would grab the global at service startup then yes stubbing would work | 12:13 |
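The crosstalk gibi describes can be reproduced in miniature without nova or eventlet. This is only a sketch: `NOTIFIER`, `FakeNotifier`, `stub_notifier` and the two emitters are hypothetical stand-ins, not nova code, and the "leaked" work is modelled as plain deferred callables rather than greenlets. It contrasts looking the global up at emit time with capturing it once at startup.

```python
# Hypothetical stand-in for a module-level global such as nova.rpc.NOTIFIER.
NOTIFIER = None


class FakeNotifier:
    """Records notifications; test_id says which test case stubbed it in."""

    def __init__(self, test_id):
        self.test_id = test_id
        self.received = []

    def notify(self, event):
        self.received.append(event)


def stub_notifier(test_id):
    """What a per-test fixture does: replace the global with a fresh fake."""
    global NOTIFIER
    NOTIFIER = FakeNotifier(test_id)
    return NOTIFIER


def emit_late_bound(event):
    # Looks the global up at the moment the notification is emitted.
    NOTIFIER.notify(event)


def make_early_bound_emitter():
    # Captures the notifier once, as if it were grabbed at service startup.
    notifier = NOTIFIER

    def emit(event):
        notifier.notify(event)

    return emit


# Test case 1 stubs the global and leaves deferred work behind.
first = stub_notifier("tc1")
early_emit = make_early_bound_emitter()
leaked_work = [
    lambda: emit_late_bound("late-bound event from tc1"),
    lambda: early_emit("early-bound event from tc1"),
]

# Test case 2 restubs the global, then the leaked work finally runs.
second = stub_notifier("tc2")
for work in leaked_work:
    work()

print(first.received)   # only the early-bound event stays with tc1
print(second.received)  # the late-bound event leaked into tc2
```

The late-bound emitter lands in the second test case's notifier, which is the crosstalk; the early-bound one stays with the test that created it, which is why stubbing would work if the global were grabbed at service startup.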
sean-k-mooney | damn ok ya that is annoying | 12:14 |
gibi | it is really due to that the test case executor thinks that a tc is finished and moves forward but the tc still has greenlets running in the background | 12:15 |
gibi | + the global :) | 12:15 |
gibi | I tried killing greenlets at the end of test case but I think I cannot properly kill it | 12:15 |
sean-k-mooney | we might be able to wait for the notification in that one test but this could affect any set of tests | 12:16 |
gibi | yes, waiting in each test for each build to finish is a way to solve this | 12:16 |
sean-k-mooney | so i think we need a more systematic way of mocking this but im not sure how to approach that | 12:16 |
gibi | yeah probably we need a higher level mock than the stub on rpc.NOTIFIER | 12:17 |
gibi | I have to think about it | 12:17 |
sean-k-mooney | we cant just stub out nova.rpc.get_versioned_notifier() | 12:18 |
gibi | nope | 12:18 |
gibi | the module level function is also a global | 12:18 |
gibi | so when the caller says rpc.get_versioned_notifier it gets whatever mocked version the module has at the moment | 12:19 |
gibi | and 60 seconds after the first tc, it will be mocked to the current tc not to the first tc | 12:19 |
sean-k-mooney | it does yes but i was wondering if we could have a per test dictionary of notifiers and do a lookup in that | 12:20 |
gibi | the caller cannot provide the test case id | 12:20 |
gibi | afaik | 12:20 |
gibi | or in other way, what would be the key in the lookuptable? | 12:21 |
sean-k-mooney | it cant but we can | 12:21 |
sean-k-mooney | in the fixture we can stash that value | 12:21 |
sean-k-mooney | so the idea i had was use a dict with setdefault with the test_id as a key and a new fake notifier as the default | 12:22 |
sean-k-mooney | then return the result | 12:22 |
sean-k-mooney | then clear it at the end of a test run | 12:22 |
sean-k-mooney | if a long running eventlet sends a notification after the test we will get a new notifier | 12:23 |
sean-k-mooney | instead of the current one | 12:23 |
gibi | the long running eventlet, when it calls nova.rpc.get_versioned_notifier, does not provide any tc id, same as if the currently running tc calls nova.rpc.get_versioned_notifier | 12:23 |
gibi | from the fixture perspective both nova.rpc.get_versioned_notifier calls are happening at the current tc time | 12:24 |
gibi | and providing no id | 12:24 |
gibi | is there a greenlet specific storage space like threadlocal? | 12:25 |
sean-k-mooney | i think we should be able to make it sticky to the greenlet yes | 12:25 |
gibi | somehow we need to mark the long running eventlet with a different id than the current eventlets | 12:25 |
sean-k-mooney | i feel like i have done this before | 12:26 |
gibi | and we have to store the tc id automatically in each greenlet nova spawns which is /o\ | 12:32 |
sean-k-mooney | i remember trying to use with context managers to create functional tests where each instance of nova compute had a different nova.conf in the past | 12:32 |
sean-k-mooney | we did not merge it but i was able to make each nova-compute have a different view of the global config | 12:33 |
sean-k-mooney | i have no idea where that is however so i think we can spawn the nova-compute service such that the things we have monkey patched are sticky to that instance but i have no idea if that would outlive the test | 12:34 |
sean-k-mooney | i think those patchers would likely get torn down when the test function ends | 12:34 |
sean-k-mooney | leading to the same problem | 12:35 |
sean-k-mooney | gibi: basically i was hoping we could use functools.partial or something to carry the extra info | 12:35 |
sean-k-mooney | gibi: there is https://eventlet.net/doc/modules/corolocal.html | 12:36 |
gibi | for the partial: for that we need to attach the partial to a thing that is specific to the current test case execution | 12:36 |
sean-k-mooney | gibi: ya and i dont really know how to do that | 12:37 |
gibi | for corolocal that can be the storage, but then we probably need to patch eventlet.spawn* to fill it | 12:37 |
gibi | I will play around | 12:37 |
sean-k-mooney | certainly not in the notification fixture which is where we really want to do this | 12:37 |
sean-k-mooney | ya i might try and play with this too | 12:37 |
gibi | sean-k-mooney: eventlet patches threading.local to be corolocal.local() | 12:38 |
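A small sketch of why local storage alone is not enough and why gibi expects to patch the spawn functions as well. It uses the standard library's `threading.local` with real threads as an assumption-laden stand-in for greenlet-local storage; the names `storage`, `worker` and `seen_in_worker` are made up for illustration. A value set by the spawner is not inherited by whatever it spawns, so something has to copy the test case id across at spawn time.

```python
import threading

# Per-thread storage; a stand-in here for greenlet-local storage.
storage = threading.local()
storage.test_id = "tc1"  # set in the main thread only

seen_in_worker = []


def worker():
    # A freshly started thread sees an empty local, not the spawner's values.
    seen_in_worker.append(getattr(storage, "test_id", None))


t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_in_worker)   # [None]
print(storage.test_id)  # tc1
```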
lajoskatona | gauzas, gibi: Hi, for rbac discussion do you think Neutron should join to the discussion? (see: https://etherpad.opendev.org/p/policy-popup-yoga-ptg ) | 12:38 |
lajoskatona | bauzas ---^ (sorry) | 12:38 |
gibi | lajoskatona: for the external event discussion would be good to have somebody from neutron as the client of that api | 12:39 |
sean-k-mooney | lajoskatona: i think there is work to be done with making nova capable of calling neutron where neutron is using scope enforcement | 12:39 |
sean-k-mooney | and ya the external events is the flip side of that | 12:40 |
lajoskatona | gibi, sean-k-mooney: thanks, then I add it to next week's schedule | 12:40 |
sean-k-mooney | gibi: i still think we need to create some form of oslo.middleware so that we can discover a service's policy programmatically from the api | 12:41 |
sean-k-mooney | right now the operator will need to set the correct scopes etc. in our config file | 12:41 |
lajoskatona | gibi, bauzas, sean-k-mooney: we have edge session at the same time (1400-1600) try to fix that | 12:42 |
bauzas | sorry in a meeting | 12:42 |
* gibi lets bauzas agree on the schedule | 12:42 | |
bauzas | can you tl;dr ? | 12:42 |
bauzas | I'm hardly following | 12:42 |
sean-k-mooney | bauzas: there is a clash between the nova rbac popup session and a neutron? edge session | 12:44 |
sean-k-mooney | it would be good if we could adjust the schedule to accommodate that. lajoskatona is that a correct summary? | 12:44 |
lajoskatona | sean-k-mooney: yes | 12:45 |
lajoskatona | sean-k-mooney, bauzas, gibi: I try to fetch ildikov to see if we need both hours for edge..... | 12:45 |
sean-k-mooney | we can also talk about neutron rbac issues in the nova neutron session too if we cant | 12:45 |
lajoskatona | sean-k-mooney: yeah, worst case | 12:46 |
*** bhagyashris|out is now known as bhagyashris|mtg | 13:00 | |
bauzas | ah ok | 13:01 |
bauzas | let's then wait for ildikov but I'm pretty sure we can find other slots | 13:01 |
* bauzas disappears for 30 mins after 1h30 of meetings => haircut | 13:30 | |
bauzas | and then I'm back | 13:30 |
*** bhagyashris|mtg is now known as bhagyashris|away | 15:03 | |
melwitt | bauzas: sorry to bring this up again but do you think there's a chance you could look at https://review.opendev.org/c/openstack/nova/+/791807 and https://review.opendev.org/c/openstack/nova/+/806629 before the ptg? elodilles is +2 on them and I'm trying to avoid them getting lost | 16:06 |
bauzas | melwitt: yeah I remember I had to review your patches but I wasn't seeing them on my review priority list, adding them | 16:08 |
* bauzas has to disappear now | 16:08 | |
melwitt | thank you bauzas | 16:08 |
bauzas | melwitt: will be the first patches I look tomorrow | 16:08 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Prevent leaked eventlets to send notifications https://review.opendev.org/c/openstack/nova/+/814036 | 16:29 |
gibi | sean-k-mooney: ^^ this fixes the local reproduction for me | 16:29 |
melwitt | gibi: omg, I spent some time looking at these failures yesterday and could not figure out how/why we got "no reply on conductor" and DBNonExistentTable. awesome find 🙌 | 16:34 |
gibi | melwitt: it was a hard one, sean-k-mooney and I spent two days figuring it out | 16:34 |
gibi | the lesson for me is that we probably should not run multiple testcases in a sequence in a same process if those testcases use some kind of parallelism | 16:36 |
gibi | but I have no viable alternative | 16:36 |
melwitt | I was completely stumped. I'm so happy y'all figured it out | 16:36 |
*** dpawlik5 is now known as dpawlik | 16:37 | |
gibi | ... and I still not like eventlets ;) | 16:37 |
melwitt | haha :) | 16:40 |
melwitt | speaking of eventlet... | 16:41 |
melwitt | here's a thing I'd appreciate your eyes on https://review.opendev.org/c/openstack/nova/+/813114 to see if this is the right way to solve it or if I'm missing a better way | 16:42 |
melwitt | gibi ^ | 16:42 |
gibi | melwitt: added to my queue | 16:43 |
melwitt | danke | 16:43 |
sean-k-mooney | most of the work was by gibi but ok ill review that it looks promising | 17:04 |
sean-k-mooney | it's not that complex at first glance | 17:04 |
sean-k-mooney | ah you are intercepting spawn | 17:06 |
sean-k-mooney | and that is where you're getting the testcase id | 17:06 |
sean-k-mooney | and propagating it | 17:06 |
sean-k-mooney | gibi: so the runtime error will that cause the test to fail? | 17:06 |
sean-k-mooney | i.e. when https://review.opendev.org/c/openstack/nova/+/814036/1/nova/tests/fixtures/notifications.py#168 is raised | 17:07 |
sean-k-mooney | which test will fail | 17:07 |
sean-k-mooney | or will we catch that and just log it | 17:08 |
sean-k-mooney | oh the runtime error goes to the eventlet that called notify | 17:08 |
sean-k-mooney | which is the one that is running in the background and it kills whatever was leaked | 17:09 |
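The approach sean-k-mooney reads out of the review (intercept spawn, propagate the test case id, raise in the worker that notifies after its test ended) can be sketched roughly as follows. This is not the code from the patch: it uses standard library threads instead of eventlet, and `NotificationFixture`, `start_test`, `spawn`, `slow_build` and `ACTIVE_FIXTURE` are hypothetical names. The sketch also catches the `RuntimeError` only so the outcome can be inspected; in the discussion above the error is simply left to end the leaked worker.

```python
import threading

_local = threading.local()  # stands in for greenlet-local storage
ACTIVE_FIXTURE = None       # hypothetical global, restubbed per test case


class NotificationFixture:
    def __init__(self, test_id):
        self.test_id = test_id
        self.notifications = []

    def notify(self, event):
        # Reject callers that were spawned by a different test case.
        caller_test_id = getattr(_local, "test_id", None)
        if caller_test_id != self.test_id:
            raise RuntimeError(
                f"worker from {caller_test_id} notified during {self.test_id}")
        self.notifications.append(event)


def start_test(test_id):
    """Tag the current thread and restub the global, as a fixture would."""
    global ACTIVE_FIXTURE
    _local.test_id = test_id
    ACTIVE_FIXTURE = NotificationFixture(test_id)
    return ACTIVE_FIXTURE


def spawn(func, *args):
    """Intercepted spawn: copy the spawner's test id into the new worker."""
    test_id = getattr(_local, "test_id", None)  # read in the spawning thread

    def tagged():
        _local.test_id = test_id
        func(*args)

    thread = threading.Thread(target=tagged)
    thread.start()
    return thread


errors = []


def slow_build(release):
    release.wait()  # outlives the test case that started it
    try:
        ACTIVE_FIXTURE.notify("instance.create.end")
    except RuntimeError as exc:
        errors.append(str(exc))  # caught here only to make the result visible


fixture1 = start_test("tc1")
release = threading.Event()
leaked = spawn(slow_build, release)

fixture2 = start_test("tc2")
ACTIVE_FIXTURE.notify("instance.delete.end")  # current test: accepted
release.set()                                  # leaked worker wakes up now
leaked.join()

print(errors)
print(fixture2.notifications)
```

With the id travelling along at spawn time, the late notification from the first test case is rejected in the leaked worker, and the second test case only records its own event.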
sean-k-mooney | gibi: by the way would stoping the compute service help in this case. | 17:12 |
sean-k-mooney | gibi: we start the service here https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/tests/functional/integrated_helpers.py#L1125-L1143 | 17:14 |
sean-k-mooney | gibi: if we added a tear_down function implementation that explicitly stops them would that help clean up any running eventlets | 17:15 |
sean-k-mooney | we have the service references so i feel like we should be able to use them to invoke stop https://github.com/openstack/nova/blob/7b063e4d0518af3e57872bc0288a94edcd33c19d/nova/service.py#L282-L296 | 17:16 |
sean-k-mooney | that will at least stop the rpc server instances | 17:16 |
sean-k-mooney | hum ok i guess we are at least partly doing that https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/test.py#L438-L441 | 17:18 |
sean-k-mooney | actually no that is not registering it to be invoked automatically, it's patching the stop function | 17:20 |
sean-k-mooney | the service fixture calls kill as a cleanup function which in turn calls stop so we are already stopping the services when the service fixture is disposed of | 17:22 |
opendevreview | Ilya Popov proposed openstack/nova master: Fix to use NUMA cell with more free memory first https://review.opendev.org/c/openstack/nova/+/805649 | 19:14 |
opendevreview | Hang Yang proposed openstack/nova master: Support creating servers with RBAC SGs https://review.opendev.org/c/openstack/nova/+/811521 | 23:28 |