opendevreview | Rajesh Tailor proposed openstack/nova master: Fix KeyError on assisted snapshot call https://review.opendev.org/c/openstack/nova/+/900783 | 10:35 |
opendevreview | Danylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added. https://review.opendev.org/c/openstack/nova/+/876075 | 11:46 |
jkulik | Hi folks, we just stumbled over https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/5ASRKETLJQAZM2KJ2EDIJZB6XMDWAL4P/ which is about removing VMware support from Nova and other projects. We're still actively using VMware inside SAP through OpenStack. Is there any way we can reverse the decision to completely remove the vmwareapi infrastructure from Nova? | 12:01 |
stephenfin | jkulik: I would suggest bringing this up on the Nova weekly meeting and likely openstack-discuss. The details and link to the agenda for the former can be found here https://meetings.opendev.org/#Nova_Team_Meeting | 12:12 |
stephenfin | jkulik: the tl;dr is that we have had no CI for any of this code for many years now, which means we have no ability to test it and ensure that it works/isn't broken by other changes | 12:15 |
opendevreview | Samuel Kunkel proposed openstack/nova master: fix: amd-sev handle missing img properties https://review.opendev.org/c/openstack/nova/+/874264 | 12:17 |
stephenfin | jkulik: I would also suggest you need to bring this up internally rather quickly. I can't speak for anyone else but _my guess_ would be that the only way this decision would be reversed would be via a long-term commitment from some third party to maintain testing infrastructure for these components (the VMWare-related drivers in Nova, Cinder, Neutron etc.; oslo-vmware, ...) | 12:17 |
stephenfin | jkulik: We previously got such a commitment from VMWare folks (about 2-3 years ago, iirc) but it wasn't followed up on, so there will likely be some hesitation at taking any commitment at face value without seeing e.g. a CI being established | 12:20 |
stephenfin | Hopefully that helps provide a bit of colour to the situation | 12:20 |
jkulik | thank you. indeed I remember the deprecation attempt 2-3 years ago when we got VMware to bring back some CI. we were not aware that it's broken again. | 12:23 |
sean-k-mooney | jkulik: the CI has been absent for 2+ years | 12:34 |
jkulik | iirc this was the time when we talked to VMware and they brought back a CI https://review.opendev.org/c/openstack/nova/+/712101 - 2020-03 ... so there should have been some CI ~3 years ago again. https://review.opendev.org/c/openstack/nova/+/806391 in 2021-09 still contains a message from a "VMware NSX CI". I'm just wondering when it went away. | 12:39 |
sean-k-mooney | jkulik: presumably related to the pandemic, that CI went away again in 2021 and never returned | 12:47 |
jkulik | 2022-02 I found a message `ext-nova-zuul: NOT_REGISTERED` for that CI. sounds like a good hint as to when it went away. in 2021 it still seemed to work. Even on 2022-01-27 there's still a message for a Cinder review-request | 12:47 |
jkulik | yeah, VMware doesn't have too much going on for OpenStack. is there a way for a third-party, non-VMware user to get notified if the CI goes away? | 12:48 |
sean-k-mooney | there isn't, and even upstream developers will only know if they notice it's not reporting | 12:49 |
sean-k-mooney | jkulik: they did not tell us they were no longer maintaining it or removing it | 12:49 |
sean-k-mooney | it just disappeared silently | 12:49 |
sean-k-mooney | again | 12:50 |
jkulik | they did not tell us either ... that's why I would have liked to get notified somehow to tell them to fix it | 12:50 |
sean-k-mooney | at this point i think it's beyond that | 12:50 |
sean-k-mooney | they have demonstrated that they will not maintain the CI or the libs required to support vmware | 12:51 |
jkulik | Could someone else step up? | 12:51 |
jkulik | Would that be acceptable, I mean? | 12:51 |
sean-k-mooney | honestly i don't think so | 12:51 |
jkulik | so we would be left with trying to maintain an out-of-tree driver, which is not supported, and keeping all the to-be-removed code in sync with upstream in our downstream fork ... :/ | 12:52 |
sean-k-mooney | at this point i think we would need to see more than a token effort | 12:52 |
sean-k-mooney | jkulik: out-of-tree drivers won't work | 12:53 |
jkulik | yeah, I read that | 12:53 |
stephenfin | sean-k-mooney: I don't see any reason why another third party couldn't maintain it. As you say though, it'd have to be more than a token effort | 12:53 |
sean-k-mooney | the core of vmware is open source, right? (ESXi) | 12:53 |
sean-k-mooney | stephenfin: someone else could | 12:54 |
jkulik | no, ESXi is not open source. not sure if that was sarcasm | 12:54 |
sean-k-mooney | i thought there was an open source core | 12:54 |
sean-k-mooney | esxi is not | 12:54 |
sean-k-mooney | but isn't esxi based on linux? | 12:54 |
jkulik | I haven't heard of any. The things we use are not open source. | 12:55 |
jkulik | what would "more than a token effort" entail? | 12:55 |
sean-k-mooney | actually contributing to the maintenance of vmware as well as first/third-party CI | 12:56 |
sean-k-mooney | jkulik: i might have been thinking of the fact that there is a free version of esxi | 12:56 |
sean-k-mooney | it's not open source but there are free licenses | 12:57 |
stephenfin | At a minimum, CI coverage. Practically speaking though, there's mountains of tech debt in the VMWare driver that needs addressing if we're keeping it around long-term | 12:57 |
sean-k-mooney | basically i was considering whether, instead of third-party CI, we could do this first-party somehow | 12:57 |
sean-k-mooney | right, there is a lot of tech debt (lack of functional tests, old APIs/deps) and large feature gaps | 12:58 |
jkulik | sounds like it boils down to us (SAP) working more with upstream, cleaning up the driver and, by doing that, noticing when the CI goes missing, if we want that driver to stay | 12:59 |
stephenfin | jkulik: Really it would be a near-full-time job. The obvious answer here is that SAP put forward someone to maintain CI and work to get your downstream fork synced back upstream (or at least work to get equivalent functionality added upstream) but I don't know how SAP are for upstream development. It's definitely something that has to be developed and nurtured in the org/company culture. There's also a cost to it, though how | 12:59 |
stephenfin | that cost compares to maintaining an out-of-tree driver would be interesting to study | 12:59 |
stephenfin | jkulik: exactly | 12:59 |
jkulik | we're open to upstreaming, but given that we need the features downstream first, they often don't end up upstream due to time constraints and us not actively including ourselves in the upstream community. I feel that pain every time we port our patches to a newer release. I'll bring that topic up internally with management. | 13:02 |
jkulik | thank you both | 13:03 |
stephenfin | jkulik: np. I'll also add that we're generally very open to adding additional cores, particularly when those cores are focused on a particular area and have competency there, but obviously that competency needs to be demonstrated for some time first | 13:05 |
jkulik | :+1: | 13:06 |
sean-k-mooney | well keep in mind the job of a core is primarily to review code, not write it | 13:06 |
sean-k-mooney | so while stephenfin is correct that we would be open to adding cores, the competency we would use to make that determination is their review output, not their code output | 13:07 |
jkulik | review output in Nova in general, I guess, not only the vmwareapi driver? Since that one doesn't see too many patches. | 13:08 |
sean-k-mooney | sort of both. in the past we have expected a general knowledge of the entire codebase, but most cores have areas of specialisation where they spend most of their time | 13:09 |
stephenfin | That would be an interesting discussion to have as a community. As you say, that area doesn't see too many patches and there is very little expertise in that area among existing cores. Ensuring we had decent review throughput would be an interesting thing to figure out | 13:10 |
sean-k-mooney | well part of the expectation would be that they would enforce the same feature design criteria as we do for the rest of nova | 13:10 |
sean-k-mooney | in the past vmware have added functionality in the out-of-tree driver and then tried to upstream it | 13:11 |
sean-k-mooney | or made changes that normally would require specs or blueprints, with little or no discussion | 13:11 |
sean-k-mooney | so if we were to add cores specialised in vmware, i would expect the process for adding vmware features to be the same as for libvirt/ironic | 13:12 |
jkulik | hm ... yeah, I'd expect upstream to work like upstream not like downstream :D | 13:12 |
sean-k-mooney | the problem we have had historically with drivers that have out-of-tree variants | 13:12 |
sean-k-mooney | is they add just enough functionality in the in-tree driver to enable them to create the feature they want to have in the out-of-tree one | 13:13 |
jkulik | hm ... having a lot of patches downstream, I can see the point. the VMware driver manages a whole cluster, so things might work quite differently than in the rest of the drivers / Nova in general | 13:14 |
sean-k-mooney | i was planning to bring this up next cycle, but once the vmware/hyperv drivers were removed i was going to propose removing the ability to have out-of-tree drivers entirely | 13:14 |
sean-k-mooney | jkulik: we don't officially support out-of-tree drivers in nova but we don't explicitly block them either | 13:15 |
jkulik | I mean, we're having a fork basically, so our patches are not necessarily "out of tree" - they are on top of tree :D | 13:15 |
stephenfin | yeah, that's how cloudbase were handling things with the hyper-v driver | 13:16 |
stephenfin | through a combination of (I suspect) lack of upstream review throughput but also lack of investment in the time needed to drive consensus on larger design changes | 13:16 |
jkulik | yeah, same here. whenever we try to upstream common code (as a basis or start to upstream more), those reviews hang around for a long time. | 13:17 |
stephenfin | the zVM driver takes a different tack, where the driver is essentially a shim over an external library where all the "real" functionality lives | 13:17 |
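(For illustration, a minimal self-contained sketch of the shim pattern stephenfin describes here; the class and method names below are hypothetical stand-ins, not the real zVM driver or its SDK.)

```python
# Hypothetical sketch of a "shim" virt driver: the in-tree class only
# translates calls to an external library where the real logic lives.
# ExternalSdkClient is a placeholder, not an actual zVM (or other) SDK.

class ExternalSdkClient:
    """Stand-in for the out-of-tree library doing the real work."""

    def create_guest(self, uuid):
        print(f"SDK: provisioning guest {uuid}")

    def delete_guest(self, uuid):
        print(f"SDK: deleting guest {uuid}")


class ShimDriver:
    """A virt-driver-shaped shim: every method delegates to the SDK."""

    def __init__(self):
        self._sdk = ExternalSdkClient()

    def spawn(self, instance_uuid):
        # No hypervisor logic in-tree; just forward the call.
        self._sdk.create_guest(instance_uuid)

    def destroy(self, instance_uuid):
        self._sdk.delete_guest(instance_uuid)


if __name__ == "__main__":
    driver = ShimDriver()
    driver.spawn("abc-123")
    driver.destroy("abc-123")
```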
jkulik | since there's no one involved with the vmware driver upstream, that's fully understandable, but doesn't help in reducing the technical debt either | 13:18 |
stephenfin | jkulik: then that would be on us. If someone were to start maintaining a CI and the codebase it would be up to us to ensure things aren't hanging around for months | 13:18 |
jkulik | point being no CI anymore for nearly 2 years. got it. | 13:20 |
sean-k-mooney | ya so zvm is interesting because i also have not seen it run as a third-party ci | 13:20 |
sean-k-mooney | but we did not mark it as experimental so we need to give them time to correct that | 13:21 |
sean-k-mooney | the IBM PowerKVM CI does not test zvm | 13:21 |
sean-k-mooney | it's testing libvirt on power | 13:21 |
sean-k-mooney | that said, it may only run when you make changes to the zvm driver | 13:22 |
sean-k-mooney | so we need to submit a test patch and determine if it's still functional | 13:23 |
jkulik | Does it make sense to submit monthly patches just to check that the different CIs are still running? | 13:25 |
stephenfin | CIs can be configured to run periodic jobs. That would probably be a better approach | 13:25 |
stephenfin | and publish the URL where those reports can be found | 13:26 |
tkajinam | o/ May I ask for reviews to merge https://review.opendev.org/c/openstack/nova/+/900538 to make the sqlalchemy-2.x job voting again? | 13:50 |
zigo | Also, would be cool to have this one merged: https://review.opendev.org/c/openstack/oslo.cache/+/900158 (ie: oslo.cache leaving sockets to memcached in CLOSE_WAIT state...) | 14:22 |
opendevreview | Danylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added. https://review.opendev.org/c/openstack/nova/+/876075 | 14:26 |
sean-k-mooney | zigo: wrong channel | 14:32 |
zigo | ok :) | 14:32 |
sean-k-mooney | i'm not sure about the while True: | 14:32 |
zigo | I am. | 14:32 |
sean-k-mooney | that might never exit | 14:32 |
zigo | There's a break ... | 14:32 |
sean-k-mooney | there is, but only if you get an IndexError | 14:33 |
zigo | Which is going to happen when all connections have been closed. | 14:33 |
sean-k-mooney | no | 14:33 |
zigo | queue.pop() will return nothing... | 14:33 |
zigo | I mean, it will raise an IndexError, no? | 14:33 |
sean-k-mooney | it returns None by default no? | 14:33 |
sean-k-mooney | which will cause an AttributeError when we dereference the connection | 14:34 |
zigo | tkajinam wrote it, he may tell. | 14:34 |
zigo | At first, I wrote: | 14:34 |
zigo | while ( conn = self.queue.pop().connection ) != None: | 14:34 |
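(As an aside: the draft above is not valid Python, since plain assignment cannot appear in a condition; the closest working form would use the walrus operator. An illustrative, runnable rework with placeholder data, sidestepping the empty-queue IndexError with a None sentinel:)

```python
import collections

# Hypothetical rework of the draft loop condition using the walrus
# operator (Python 3.8+). The string items stand in for connections.
queue = collections.deque(["conn-a", "conn-b", "conn-c"])

while (conn := queue.pop() if queue else None) is not None:
    print(f"closing {conn}")  # a real pool would call conn.close() here
```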
sean-k-mooney | https://docs.python.org/3/library/collections.html#collections.deque.pop | 14:35 |
sean-k-mooney | ok, pop does raise IndexError if it's empty | 14:35 |
sean-k-mooney | i was expecting it to return a default, like get on a dict | 14:35 |
sean-k-mooney | zigo: i still don't particularly like using exceptions as control flow but i see why this is working | 14:37 |
*** ozzzo1 is now known as ozzzo | 14:37 | |
zigo | I was surprised at first too. Probably this only deserves some added code comments ... | 14:38 |
sean-k-mooney | zigo: i still feel like it might be better to have a timeout but i see why you want it to wait | 14:38 |
zigo | A timeout ?!? | 14:38 |
zigo | You mean wait until the CLOSE_WAIT times out ? | 14:38 |
zigo | That is a crazy idea. One of our servers doesn't even have source ports available because of this. | 14:38 |
zigo | Also, we reach the memcached max open sockets very fast. | 14:39 |
sean-k-mooney | no | 14:39 |
sean-k-mooney | i mean perhaps we should not try to close the connections indefinitely | 14:39 |
zigo | If we don't, then there's some remaining sockets in CLOSE_WAIT state. | 14:40 |
tkajinam | the logic loops over all connections in the queue but it does not retry for a single one | 14:40 |
sean-k-mooney | oh you're right | 14:40 |
tkajinam | if a connection closure fails it leaves that one and processes the next | 14:40 |
sean-k-mooney | if we can't close it, it's already popped | 14:40 |
tkajinam | yeah | 14:40 |
sean-k-mooney | so we will at most try each connection once | 14:40 |
sean-k-mooney | ok yep | 14:40 |
sean-k-mooney | so ya i agree the patch is correct then | 14:40 |
* sean-k-mooney should read it more closely | 14:40 | |
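(For readers following along, a minimal self-contained sketch of the drain pattern just discussed, not the actual oslo.cache code: `deque.pop()` raising `IndexError` is what terminates the `while True:` loop, and a connection that fails to close has already been popped, so it is never retried.)

```python
import collections
import socket

# Throwaway sockets stand in for pooled memcached connections.
queue = collections.deque(socket.socket() for _ in range(3))

while True:
    try:
        conn = queue.pop()  # deque.pop() raises IndexError once empty
    except IndexError:
        break  # every connection was handled exactly once; loop exits
    try:
        conn.close()
    except OSError:
        # The failing connection is already popped, so we just move on
        # to the next one: no retry, hence no risk of looping forever.
        pass
```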
zigo | tkajinam: Are you ok if I add some more comments in your patch? | 14:41 |
tkajinam | you can, though we have a number of similar implementations in other places in that single file | 14:41 |
tkajinam | anyway this is not the right channel and we can continue it in #openstack-oslo | 14:42 |
sean-k-mooney | i +1'd it for what it's worth | 14:42 |
sean-k-mooney | and i +2'd the voting job changes | 14:43 |
tkajinam | sean-k-mooney, thanks ! | 14:43 |
* zigo added some comments in that code, as it's the 3rd time someone is surprised by it. | 14:47 | |
sean-k-mooney | while True: and exceptions as control flow are red flags for me | 14:48 |
sean-k-mooney | zigo: so if i see either i try to take a closer look and ensure they will exit in all cases | 14:48 |
sean-k-mooney | unless there is a comment explaining why this should run forever or similar | 14:49 |
zigo | Agreed, it's not very natural. | 14:49 |
zigo | Though there's a lot of valid cases in this world. | 14:50 |
zigo | Example: the wait-for-event loop in every single X-Window app ... :) | 14:50 |
sean-k-mooney | i always implement those in terms of a var that is set to true but can be set to false by the app | 14:54 |
sean-k-mooney | but yes | 14:54 |
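(By contrast, a tiny illustrative sketch of the flag-based loop sean-k-mooney describes, where the application exits by flipping a variable rather than by raising an exception:)

```python
import queue

# A toy event loop gated on a boolean flag instead of `while True:`
# plus an exception; the app requests shutdown by flipping the flag.
events = queue.Queue()
for item in ("paint", "click", "quit"):
    events.put(item)

running = True
while running:
    event = events.get()
    if event == "quit":
        running = False  # clean, explicit exit condition
    else:
        print(f"handling {event}")
```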
tkajinam | zigo, as I said earlier, the same snippet is used multiple times in the same file. If you add a comment then it should be added in all the places. If you fix it then you should fix them all. | 15:06 |
dvo-plv | hello, hello | 15:06 |
dvo-plv | I would like to suggest this spec for review https://review.opendev.org/c/openstack/nova-specs/+/895924 | 15:07 |
dvo-plv | If you have time | 15:07 |
tkajinam | zigo, hmm. ignore it. it might be the only place where that hack is used for a loop | 15:07 |
* tkajinam is moving to oslo, really now. sorry for the noise! | 15:07 |
opendevreview | Stephen Finucane proposed openstack/nova master: docs: Further tweaks to the CPU models document https://review.opendev.org/c/openstack/nova/+/784066 | 17:15 |
opendevreview | Robert Franzke proposed openstack/nova stable/zed: fix: server rebuild not working during zed upgrade https://review.opendev.org/c/openstack/nova/+/900811 | 17:22 |
opendevreview | Elod Illes proposed openstack/nova stable/2023.2: Fix rebuild compute RPC API exception for rolling-upgrades https://review.opendev.org/c/openstack/nova/+/900338 | 18:49 |
opendevreview | Elod Illes proposed openstack/nova stable/2023.2: Adding server actions tests to grenade-multinode https://review.opendev.org/c/openstack/nova/+/900339 | 18:49 |
opendevreview | melanie witt proposed openstack/nova stable/yoga: Decorate only Flavor.get_* methods that execute queries https://review.opendev.org/c/openstack/nova/+/900821 | 19:23 |
opendevreview | melanie witt proposed openstack/nova stable/xena: Decorate only Flavor.get_* methods that execute queries https://review.opendev.org/c/openstack/nova/+/900822 | 19:25 |
opendevreview | melanie witt proposed openstack/nova stable/yoga: Decorate only Flavor.get_* methods that execute queries https://review.opendev.org/c/openstack/nova/+/900821 | 19:30 |
opendevreview | melanie witt proposed openstack/nova stable/xena: Decorate only Flavor.get_* methods that execute queries https://review.opendev.org/c/openstack/nova/+/900822 | 23:05 |
opendevreview | Jay Faulkner proposed openstack/nova master: WIP: [ironic] Partition & use cache for list_instance* https://review.opendev.org/c/openstack/nova/+/900831 | 23:25 |
opendevreview | melanie witt proposed openstack/nova stable/wallaby: Decorate only Flavor.get_* methods that execute queries https://review.opendev.org/c/openstack/nova/+/900832 | 23:26 |
opendevreview | Jay Faulkner proposed openstack/nova master: WIP: [ironic] Partition & use cache for list_instance* https://review.opendev.org/c/openstack/nova/+/900831 | 23:34 |