*** bauzas_ is now known as bauzas | 03:23 | |
*** bauzas_ is now known as bauzas | 04:37 | |
frickler | nova-next failed with two unrelated issues in tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps for https://review.opendev.org/c/openstack/nova/+/923284 , I'm now going to resubmit that stack into gate directly | 06:54 |
frickler | bauzas: sean-k-mooney: there are some failures on the 2023.1 stack, also these still need reviews so they can be gated once the 2023.2 ones go in (in 2+ hours if no more gate issues happen) https://review.opendev.org/q/project:openstack/nova+branch:stable/2023.1+status:open+topic:%22bug/2059809%22 | 08:05 |
frickler | and nova-grenade-multinode failing on https://review.opendev.org/c/openstack/nova/+/923286 :-( | 08:29 |
frickler | bauzas: sean-k-mooney: gibi: please check ^^ and sorry to keep nagging you, but I think it would be really great to get the remainder of the stack merged before the weekend | 09:24 |
opendevreview | Merged openstack/nova stable/2023.2: Reject qcow files with data-file attributes https://review.opendev.org/c/openstack/nova/+/923284 | 09:24 |
opendevreview | Merged openstack/nova stable/2023.2: Check images with format_inspector for safety https://review.opendev.org/c/openstack/nova/+/923285 | 09:24 |
frickler | I've resubmitted the two remaining 2023.2 patches into gate now, but please check those errors later, just in case | 09:34 |
frickler | dansmith: ^^ fyi | 09:34 |
sean-k-mooney | frickler: yep this is still our priority today | 09:53 |
sean-k-mooney | im just going through my morning email routine but ill take a look at the backport before i go back to the iso regression | 09:55 |
sean-k-mooney | i asked gibi to respin the backport yesterday for the ami/ari/aki format regression so it looks like that has been done back to 2023.1 correctly | 09:56 |
frickler | sean-k-mooney: yes, the backports look fine, but 2023.1 had some CI failures and is also missing approvals | 09:59 |
stephenfin | sean-k-mooney: Seeing as it's the last quiet day before ze Americans return, any chance of some further reviews on the OpenAPI series so that can keep ticking forward? https://review.opendev.org/c/openstack/nova/+/915738/ | 10:13 |
sean-k-mooney | ill see how things go with the cve stuff but i may have a chance to look at some of them later today. if not then ill look at them on monday | 10:15 |
stephenfin | ack, ty! | 10:16 |
sean-k-mooney | ok downstream email and pings sorted | 10:24 |
sean-k-mooney | frickler: anywhere i should start with the backports | 10:25 |
sean-k-mooney | or will i just work my way from top to bottom | 10:25 |
frickler | sean-k-mooney: well bottom to top I'd say in how I look at the stack (and gerrit UI seems to agree) | 10:27 |
sean-k-mooney | i meant from newest to oldest branch | 10:27 |
sean-k-mooney | but yes bottom to top within the chain | 10:27 |
sean-k-mooney | i was just checking where we are in the different branches https://review.opendev.org/q/topic:%22bug/2059809%22+project:openstack/nova | 10:27 |
sean-k-mooney | looks like 2023.2 is almost done | 10:28 |
sean-k-mooney | so 2023.1 is next | 10:28 |
frickler | yes, only 2023.1 open, so starting at https://review.opendev.org/c/openstack/nova/+/923288/2 should be fine | 10:28 |
frickler | I'll ignore unmaintained branches fwiw | 10:29 |
sean-k-mooney | mnaser: may have a vested interest :) they have the zed backports up | 10:30 |
sean-k-mooney | but i cant approve them there anyway so ill stick to stable | 10:31 |
sean-k-mooney | on a lighter note this is the before https://fileshare.seanmooney.info/average_tempest_runtime_for_tempest-integrated-compute_by_provider.png and after https://fileshare.seanmooney.info/average_tempest_runtime_for_tempest-integrated-compute_by_provider_2024-07-03.png for turning on the devstack optimisations. both graphs are built from 600-700 ish successful job runs over | 10:37 |
sean-k-mooney | the preceding 2 weeks | 10:37 |
sean-k-mooney | new runs are faster but still not quite enough to account for the job to job variance sometimes causing timeouts on slow nodes but we went from about a 10% timeout rate to around 5% and hopefully that is still dropping | 10:38 |
frickler | sean-k-mooney: do you also want to approve the stack or do you think you need to wait for another review? just for comparison melwitt single-approved the 2023.2 ones | 10:40 |
sean-k-mooney | frickler: ill approve it once i get to the top; im going to +w all of the rest and then ill +w the bottom patch | 10:40 |
sean-k-mooney | im hoping the 2 that are pending in 2023.2 will merge before then so that the backport check does not kick the final patches out of gate | 10:41 |
frickler | does that check work on the speculative state in zuul or on the actual repos? the former should be fine as long as the other patches are ahead in the queue | 10:42 |
sean-k-mooney | actual | 10:42 |
sean-k-mooney | although if you have stable-only in the commit i think it skips it | 10:43 |
sean-k-mooney | thats how we generally work around it when we need to fast merge | 10:43 |
sean-k-mooney | we mark it stable only even when its not | 10:43 |
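As an illustration of that workaround (assuming, as described above, that the validation simply skips any change whose commit message carries the marker), the fast-merge variant of a backport commit message would just add a line such as:

    Fix vmdk_allowed_types checking

    stable-only: fast-merging ahead of the newer-branch backport

    (cherry picked from commit <hash on the newer branch>)

The note text and hash placeholder here are illustrative, not taken from the actual changes discussed in this log.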
frickler | hmm, ok, then let's wait for the 2023.2 patches, then I can simply promote the whole 2023.1 stack once it is approved and don't need to do more reshuffling | 10:44 |
sean-k-mooney | ok in that case ill leave +w off the base until those merge then we can send the full set | 10:45 |
frickler | ack | 10:45 |
sean-k-mooney | looks like there is about 30 left on the 2023.2 gate jobs so ill keep an eye on them | 10:59 |
sean-k-mooney | ok coffee then back to the iso regression | 11:00 |
sean-k-mooney | part of me does not like releasing the cve fixes without that as its a pretty common usecase but im hoping we can have that fixed properly today or very early next week | 11:00 |
frickler | I wouldn't worry about a release too much at this stage, as most people are running with the provisional patches currently anyway and might switch to head of master after today, but getting the iso issue fixed soon certainly will be a good idea | 11:15 |
frickler | humm, nova-multi-cell failing on https://review.opendev.org/c/openstack/nova/+/923287 | 11:16 |
sean-k-mooney | personally im not sure i would deploy this in production without the iso fixes but ya | 11:16 |
sean-k-mooney | annoying but lets see why | 11:17 |
sean-k-mooney | its one of the volume tests but not one of the normal kernel panics or similar | 11:17 |
frickler | you would rather run your cloud vulnerable? | 11:18 |
frickler | security group cleanup failure https://zuul.opendev.org/t/openstack/build/1f6d9308d0bb46999682ee43e856ee19 | 11:18 |
sean-k-mooney | than break all usage of iso in nova (and ami images if you dont have the stable version of the patches) | 11:19 |
sean-k-mooney | it would depend on what my customers are using | 11:19 |
sean-k-mooney | and if its a public or private cloud | 11:19 |
sean-k-mooney | for a private cloud with semi trusted workloads or images like our internal ci cloud i would wait | 11:20 |
sean-k-mooney | if i was vexxhost or ovh i probably would not | 11:20 |
sean-k-mooney | frickler: anyway looking at the failure i dont think its related so we can recheck when the job reports back | 11:21 |
frickler | I wouldn't even recheck but promote into gate again immediately, just behind the 2023.1 stack | 11:23 |
sean-k-mooney | fair, that works for me too | 11:24 |
opendevreview | Merged openstack/nova stable/2023.2: Additional qemu safety checking on base images https://review.opendev.org/c/openstack/nova/+/923286 | 11:35 |
frickler | sean-k-mooney: ready to approve 2023.1 ^^ | 11:36 |
sean-k-mooney | dont we still need to wait for https://review.opendev.org/c/openstack/nova/+/923287/3 | 11:36 |
frickler | sean-k-mooney: no, that is the one that failed, I'm re-enqueueing it now | 11:37 |
sean-k-mooney | i can approve and most of them will merge but i think its important to cover the vmdk part too | 11:37 |
sean-k-mooney | oh ok | 11:37 |
sean-k-mooney | ill send them but we might need to handle that on 2023.1 separately | 11:39 |
sean-k-mooney | we will see what the timing looks like | 11:39 |
frickler | sean-k-mooney: ah, right, maybe drop the W+1 from 923291 for now, then | 11:40 |
sean-k-mooney | sure done | 11:41 |
frickler | thx, promoted all the other changes now, we'll see more in 2h hopefully | 11:44 |
sean-k-mooney | let me know if i can help with anything else, ill check on them periodically | 11:45 |
frickler | ack | 11:50 |
fungi | looks like https://review.opendev.org/c/openstack/nova/+/923291 got unapproved at some point, are we ready to reapprove it? | 12:45 |
fungi | oh, though it's not explicitly listed in the ossa so i guess it's not as urgent | 12:47 |
sean-k-mooney | i unapproved it because the 2023.2 version failed and it would fail if it went to the gate on the backport check | 12:47 |
sean-k-mooney | fungi: so ill reapprove it when https://review.opendev.org/c/openstack/nova/+/923287 is merged | 12:48 |
fungi | hopefully the backport check wouldn't fail when the 2023.2 was ahead of it in the gate unless that test isn't properly speculative, but it seems there's no harm in waiting on that one either | 12:48 |
sean-k-mooney | it will | 12:49 |
sean-k-mooney | its intentionally not speculative | 12:49 |
fungi | what's the reasoning for that? | 12:49 |
sean-k-mooney | the reason it exists is to make sure we never merge something in an older branch if its not merged in the newer branch | 12:49 |
sean-k-mooney | we can override it by marking the change stable-only | 12:50 |
sean-k-mooney | which is what we normally do for CVEs but didnt this time | 12:50 |
fungi | yeah, but making it check zuul's speculative state would still accomplish that, because it would get retested if the newer branch's version gets kicked out for any reason and then would fail | 12:50 |
sean-k-mooney | fungi: if you know how to update the script to do that feel free to propose it :) | 12:51 |
fungi | i don't even know what job you're talking about, but if you have a pointer to the script i can take a look | 12:51 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/tools/check-cherry-picks.sh | 12:51 |
fungi | it's likely just a matter of making the script do less than it does now | 12:51 |
sean-k-mooney | this also needs to work locally | 12:52 |
sean-k-mooney | the end goal of the script is to make sure we dont regress going from 2023.1 to 2023.2 etc | 12:54 |
fungi | i guess i'll need to look at the context where the script is run, because i don't see it doing anything that should inherently break speculation, but if some earlier step is checking out from the origin remote instead of the current branch state that would break it | 12:54 |
sean-k-mooney | it runs in check as non-voting and in gate as voting | 12:55 |
fungi | i mean the job that runs that script | 12:56 |
sean-k-mooney | fungi: from my perspective not doing speculative execution is safer and more correct | 12:56 |
sean-k-mooney | so i dont consider this to be a bug | 12:56 |
fungi | i don't see it, but not my circus, not my monkeys ;) | 12:56 |
sean-k-mooney | its called nova-tox-validate-backport | 12:57 |
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/a6be643734b44f139bca7b4eb8963814 | 12:57 |
sean-k-mooney | Cherry pick hash 11301e7e3f0d81a3368632f90608e30d9c647111 not on any master or stable branches | 12:57 |
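For anyone reading along, the check being debated boils down to a couple of read-only git operations against the repo zuul has already prepared, roughly of this shape (a simplified illustration, not the literal contents of check-cherry-picks.sh):

    # extract the "(cherry picked from commit <hash>)" lines of the commit under test
    git show --quiet --format='%B' HEAD | grep 'cherry picked from commit'
    # for each extracted <hash>, require that it is already reachable from master
    # or a newer stable branch in the locally prepared repo (no fetch from origin)
    git branch -a --contains <hash>

Whether that second lookup sees zuul's speculative future state or only the real merged state is exactly the point fungi and sean-k-mooney go on to settle below.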
fungi | if you want backports to be delayed from being enqueued until other changes merge, and slow down throughput as a result even though zuul can ensure the correctness of the thing you actually want without that additional inefficiency, i'm not going to argue | 12:57 |
sean-k-mooney | fungi: can you explain how zuul can ensure that a patch is merged on all newer stable branches before it will allow it to merge on the older ones | 12:58 |
fungi | yes, speculative testing | 12:58 |
sean-k-mooney | fungi: because as far as im aware it cant. it would be incorrect to use depends-on | 12:58 |
sean-k-mooney | and it does not have awareness of the dependency in any other way | 12:59 |
fungi | zuul assumes the newer branch backport ahead of it will merge, but if it doesn't then testing gets rerun without that change present in the queue and the job will fail | 12:59 |
sean-k-mooney | thats only true if they are both in the queue at the same time | 12:59 |
sean-k-mooney | that is not normally the case | 12:59 |
fungi | yes, if they're not in the queue at the same time then the job will just fail when that change isn't already merged, or succeed if it is | 13:00 |
sean-k-mooney | why would it fail | 13:00 |
fungi | zuul is presenting all the branches to the job with a speculative future state, but if that speculative state changes then the presumed future is invalidated and a new possible future is constructed and tests are rerun | 13:01 |
sean-k-mooney | fungi: what youre stating does not fit with my mental model of how zuul works and im pretty sure its not correct | 13:01 |
sean-k-mooney | fungi: how would zuul know about the precondition | 13:01 |
sean-k-mooney | its not expressed in the normal jobs or via depends-on | 13:02 |
fungi | zuul assumes that changes in the same queue share a possible future | 13:02 |
sean-k-mooney | we wrote this script explicitly to address the fact that zuul did not block this | 13:02 |
sean-k-mooney | right again we cant rely on that for this case | 13:02 |
sean-k-mooney | backporting the same patch across branches and ensuring it merges in reverse chronological order | 13:02 |
fungi | yes, i'm not saying the job is unnecessary, i'm saying the job should be able to use the speculative future zuul has constructed instead of blocking on the actual present state of the repositories | 13:02 |
fungi | because if that future changes the job will be re-run | 13:03 |
sean-k-mooney | it would work if zuul speculatively merged it into the relevant stable branch in the git repo it prepared | 13:03 |
fungi | it does | 13:03 |
sean-k-mooney | well the job still fails so apparently not. or there is a bug in our script | 13:04 |
fungi | it prepares all branches of any required-project in the job (implicitly including the project that triggered the job) | 13:04 |
fungi | if the script is using the locally prepared repository state without resetting any branches and without querying remote git states then it should see the prepared speculative future | 13:05 |
sean-k-mooney | we are not downloading a separate repo, we are running the script on the repo prepared by zuul | 13:05 |
sean-k-mooney | it is https://github.com/openstack/nova/blob/master/tools/check-cherry-picks.sh | 13:05 |
sean-k-mooney | we are doing read only operations on the repo in the script | 13:05 |
fungi | yeah, i agree i'm not immediately seeing why it wouldn't work | 13:06 |
fungi | i'll enqueue it and test the assumption that it can't pass | 13:06 |
fungi | oh, never mind, i can't enqueue it if it's not approved | 13:07 |
sean-k-mooney | ok i can re add +w | 13:07 |
sean-k-mooney | it failed in check which is why i expect it to fail in gate | 13:07 |
sean-k-mooney | but maybe that would pass in this specific case | 13:07 |
fungi | well, it failed in check because the check pipeline lacks the speculative state of the gate pipeline | 13:07 |
sean-k-mooney | because of the queueing and the fact gate is a dependent pipeline | 13:08 |
fungi | if you did a depends-on in check it would likely pass there | 13:08 |
fungi | by creating the same speculative future | 13:08 |
sean-k-mooney | it would but the other jobs would not be happy with zuul doing a merge of two branches of the same repo | 13:08 |
fungi | why? it still merges them to different branches | 13:09 |
fungi | we do this all the time to test upgrade jobs like grenade | 13:09 |
sean-k-mooney | well its the same commit so would that not be a merge conflict | 13:09 |
sean-k-mooney | i know you can do depends-on in the same repo | 13:09 |
fungi | changes on different branches aren't going to merge conflict with one another | 13:09 |
sean-k-mooney | just never had it work in the same repo for the same commit from two branches | 13:10 |
fungi | the dependency will be merged to the stable/2024.1 branch, the current change will be merged to the stable/2023.2 branch | 13:10 |
sean-k-mooney | did that change at some point | 13:10 |
fungi | they don't get combined into the same branch together, so no merge conflict | 13:10 |
sean-k-mooney | good to know but i thought they would both be merged into the branch under test | 13:11 |
fungi | zuul has supported cross-branch dependencies this way since ~2014 | 13:11 |
sean-k-mooney | fungi: i reapproved https://review.opendev.org/c/openstack/nova/+/923291 so you can try and promote it if you want | 13:11 |
fungi | supporting dependencies across branches was a major design requirement because of grenade | 13:11 |
fungi | it'll get enqueued automatically in the current state anyway | 13:12 |
fungi | which it just did | 13:12 |
sean-k-mooney | ok im pretty sure ive hit edge cases with that but it may have been related to something other than the merger | 13:12 |
fungi | now to see if nova-tox-validate-backport passes | 13:13 |
sean-k-mooney | if it does that will be good to know | 13:13 |
fungi | yeah, if it does pass then that means in theory it should also be able to pass in check with a depends-on to the version on the newer branch | 13:15 |
fungi | if it fails then i should be able to trace through the steps in the job to see where assumptions about the constructed future state on the other branch are getting invalidated | 13:16 |
fungi | multi-branch speculative states are something we've had for ages because it's necessary for making sure that grenade jobs run in the gate pipeline reflect the end state when the commits on different branches later merge, so we don't land changes on two different branches that then break upgrade tests when used together | 13:20 |
fungi | validate-backport: OK | 13:23 |
fungi | looks like it's going to succeed | 13:23 |
fungi | yep, just went green | 13:23 |
fungi | so the job is actually correctly designed to support this, there's not an actual need to wait for the other change to merge before you approve | 13:24 |
fungi | and using depends-on in cases like this would have two benefits: 1. you'll get an actual passing result (you don't even necessarily have to keep the job non-voting in check if you don't want to), and 2. zuul will ensure that the changes can't get enqueued into the gate pipeline in the wrong order so you can just approve them in any order you like | 13:25 |
fungi | the only caveat i'm aware of is that you need to be careful you don't inadvertently create dependency loops when you do that, but the same can be said for any use of depends-on | 13:27 |
frickler | ok, looks like I could reenqueue the deferred patches, then | 13:40 |
dansmith | frickler: sean-k-mooney: FWIW, I'd much rather have broken iso support and angry users than have this exposure.. I expect images to start showing up on download sites that look to contain something awesome, but are just the exploit | 13:44 |
fungi | in cases like this where there are series of changes for each branch, i'd probably conservatively just set depends-on in the commit message of the first change in the series pointing to the change url for the last change in the series on the newer branch | 13:44 |
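To make fungi's suggestion concrete with the changes from this incident: the first change of the stable/2023.1 series (923288) could have carried a footer pointing at the last change of the stable/2023.2 series, e.g.

    Depends-On: https://review.opendev.org/c/openstack/nova/+/923287

so zuul would refuse to enqueue the 2023.1 stack ahead of the 2023.2 one. This is only an illustration of the approach; the actual backports were not pushed with that footer.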
dansmith | something a user of a private cloud might download and then upload into the internal glance and then ..boom | 13:44 |
frickler | humm, third nova-multi-cell failure on https://review.opendev.org/c/openstack/nova/+/923287 in a row, looks like something more serious | 13:44 |
dansmith | frickler: why serious? one fail.. | 13:46 |
dansmith | looks like failure to talk to metadata server, | 13:46 |
dansmith | which could be the same sort of thing as prevents us from SSHing to the guest | 13:47 |
dansmith | oh actually, | 13:47 |
dansmith | we're doing it from inside the guest, so "ssh guest curl metadata" | 13:47 |
fungi | so could be exactly the same problem in that case | 13:48 |
opendevreview | Merged openstack/nova stable/2023.1: Reject qcow files with data-file attributes https://review.opendev.org/c/openstack/nova/+/923288 | 13:48 |
frickler | ok, I miscounted, twice in a row still https://zuul.opendev.org/t/openstack/build/51577af322ee4809889698bde352efac | 13:48 |
frickler | actually https://review.opendev.org/923289 is going to fail, too, stuck in devstack not finishing, waiting for the timeout to strike | 13:49 |
frickler | going to do some gate reshuffling to override | 13:50 |
fungi | seems like a good opportunity | 13:52 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add iso file format inspector https://review.opendev.org/c/openstack/nova/+/923573 | 13:55 |
sean-k-mooney | oh i messed up that commit message | 13:57 |
sean-k-mooney | let me abandon the new review | 13:57 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add iso file format inspector https://review.opendev.org/c/openstack/nova/+/923533 | 13:57 |
*** bauzas_ is now known as bauzas | 14:30 | |
frickler | and another gate failure in the cve stack https://zuul.opendev.org/t/openstack/build/90c78eec9270430a8fcaad8652348d0c | 14:54 |
dansmith | nova.exception.InternalError: Unexpected vif_type=binding_failed | 14:57 |
dansmith | I assume that means some neutron fail | 14:57 |
dansmith | sean-k-mooney: ^ | 14:57 |
dansmith | (frickler that was from the gate fail you quoted, in case it's not clear) | 14:58 |
frickler | yes, it is quite possible that it is all unrelated and just unstable tests, I just wanted someone to check it | 15:01 |
dansmith | yep, definitely doesn't look related | 15:03 |
*** bauzas_ is now known as bauzas | 15:17 | |
fungi | did 923288 get reenqueued into the gate accidentally? it's already merged | 15:19 |
fungi | if we're reshuffling for the current failures anyway, should probably eject that one at the same time | 15:20 |
dansmith | fungi: maybe someone wanted to be really really sure it landed? | 15:21 |
fungi | double-merged! | 15:21 |
fungi | a change so nice we'll merge it twice | 15:21 |
dansmith | there were definitely things in check and forced into gate at the same time yesterday, perhaps a zuul bug or race? | 15:21 |
frickler | hmm, according to my bash history, I didn't reenqueue 923288, but zuul has been doing some additional requeues of patches on top of others | 15:24 |
frickler | maybe a race condition when I did 923289 and zuul wasn't aware yet of 923288 having been merged | 15:25 |
fungi | normally only if they're not yet merged and meet criteria to be enqueued and depend on a change you've enqueued (either explicitly via depends-on or implicitly through git commit parentage) | 15:25 |
fungi | but yeah, can't rule out bugs either | 15:26 |
frickler | but likely if that buildset passes, it will result in a merge failure and cause a gate reset anyway, so we might as well abandon it now IMO | 15:26 |
fungi | i agree we should kick it out, i'd give 50/50 odds on whether it would actually cause a gate reset or just no-op (not sure what gerrit's behavior is if you call the submit api on an already-merged change) | 15:27 |
frickler | and I'm reaching EOD quickly (time to watch some soccer), so if you could take over, fungi, that'd be nice | 15:27 |
fungi | glad to! | 15:27 |
fungi | enjoy the fußball! | 15:28 |
frickler | there's also a new set of deferred patches in https://etherpad.opendev.org/p/cve-gate-reshuffling in case the gate gets less crowded, else I'll handle these tomorrow | 15:28 |
fungi | thanks, i saw that too when i was looking back at the pad to see if i had maybe been the one to accidentally reenqueue the already-merged change | 15:29 |
opendevreview | sean mooney proposed openstack/nova master: add iso file format inspector https://review.opendev.org/c/openstack/nova/+/923533 | 15:29 |
sean-k-mooney | frickler: is that the same one i looked at earlier | 15:30 |
sean-k-mooney | i commented on the binding failed failure and rechecked one of the jobs for it when i was reviewing 2023.1 | 15:31 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/923289/2#message-348bc1fded02d0ec6cd19807bcc2aec0245f606f | 15:31 |
fungi | okay, so going to reshuffle the gate putting these first: 923287,3 923289,2 923290,2 923291,3 (and drop 923288,2 since it's already merged) | 15:32 |
fungi | any other requests while i'm causing a full gate reset anyway? | 15:33 |
fungi | taking your silence as a resounding "no" | 15:35 |
*** bauzas_ is now known as bauzas | 15:38 | |
mnaser | has anyone pushed/proposed the idea of rack-level failure domains in openstack? | 16:51 |
mnaser | I know a cloud is a cloud, but VMware people really feel like they see the need to get there | 16:51 |
sean-k-mooney | failure domains are really not a thing in openstack | 16:52 |
JayF | FYI; I know it's pretty low on you all's list right now, but the fix for the un-upgraded rlocks in eventlet for py3.12 is in the next release | 16:52 |
sean-k-mooney | the only way to really do that is using separate keystone regions and installs of all the services per rack | 16:53 |
sean-k-mooney | mnaser: looking only at nova, neither cells nor AZs map to failure domains | 16:53 |
sean-k-mooney | i.e. if the host running your scheduler or api dies that affects all hosts in that region | 16:54 |
sean-k-mooney | neutron also spans all cells/AZs | 16:54 |
fungi | pretty sure the same can be said for the host running your vcenter api in a vmware environment | 16:54 |
sean-k-mooney | yep so it depends on your definition of failure domains and if youre including the control plane in that | 16:55 |
sean-k-mooney | i.e. workload failure domains vs network vs storage vs control plane | 16:55 |
sean-k-mooney | cinder backends are storage level failure domains | 16:55 |
sean-k-mooney | of course that is only true if you access them over separate storage networks | 16:56 |
fungi | or have some sort of high availability for the storage network | 16:57 |
sean-k-mooney | yep | 16:57 |
mnaser | sean-k-mooney: workload failure domains | 16:57 |
sean-k-mooney | mlag or similar with two different top of rack switches etc | 16:57 |
mnaser | im thinking a scheduler hint maybe of some sorts + a filter? | 16:57 |
fungi | like cross-chassis lag | 16:57 |
sean-k-mooney | mnaser: we briefly discussed having something like cephs cluster map | 16:58 |
mnaser | yeah | 16:58 |
sean-k-mooney | basically building a tree of host aggregates and grouping things in a hierarchy then allowing affinity/anti-affinity etc | 16:59 |
mnaser | exactly | 16:59 |
mnaser | im thinking server group + an extra scheduler hint other than `group` to get that | 16:59 |
mnaser | but im wondering if im walking a path someone tried before and I can pick up work from or | 16:59 |
sean-k-mooney | we said it was probably too complex and out of scope of nova but that was also like 6 years ago so things have changed somewhat | 16:59 |
sean-k-mooney | the last time we really talked about this properly was pre/start of the pandemic | 17:00 |
sean-k-mooney | mnaser: there is a request to do az anti-affinity this/last cycle | 17:00 |
sean-k-mooney | in general i would not be against this in principle. it just would not be high on my list of things to do next | 17:01 |
mnaser | working in this case, it feels odd to have 1 az per rack | 17:01 |
mnaser | like that feels like its brute forcing the az feature if you're on rack level | 17:01 |
sean-k-mooney | az's are just named aggregates | 17:01 |
mnaser | yeah but I feel like in the perspective of a consumer of the cloud they might be like "oh isn't this like a azure/aws/etc az?" | 17:02 |
sean-k-mooney | and the answer to that is hell no | 17:03 |
sean-k-mooney | but i get that perspective | 17:03 |
sean-k-mooney | so what i would say is we could and might want to make nova and other services better able to support that usecase | 17:03 |
mnaser | cause like maybe it would be nice to be like.. this is an az .. but also you can have different fault domains | 17:03 |
sean-k-mooney | but i think we would need some feature in placement to do it efficiently | 17:04 |
sean-k-mooney | doing it in a filter or weigher would likely be slow and maybe buggy | 17:04 |
sean-k-mooney | im not saying we could not do it in nova alone but we would need to be careful with how we design it | 17:05 |
fungi | in retrospect it was probably a mistake to call them availability zones when the implementation wasn't equivalent to aws's az concept, because we've had to fight that confusion ever since, but it'd be hard to change now (i still refer to projects as tenants, after all) | 17:05 |
sean-k-mooney | nova still has instance_type and flavor depending on what your looking at | 17:06 |
fungi | indeed | 17:06 |
sean-k-mooney | so i think an aggregate groups feature is definitely doable and i like cephs model | 17:07 |
fungi | we just missed rounding out the suite of renames with "color" and "texture" and maybe "odor" | 17:07 |
sean-k-mooney | i just feel like cramming that into the existing filters/weighers might be suboptimal | 17:07 |
sean-k-mooney | im sure if we renamed flavors to Terroir it would confuse no one | 17:09 |
mnaser | sean-k-mooney: I think the path of least resistance is a filter that checks if there is a server group assigned, and a scheduler hint called `different_failure_domain` .. and that goes over host aggregates which have a `failure_domain` attribute and ensures that the hosts in the server group dont end up in that same host aggregate | 17:23 |
sean-k-mooney | mnaser: i really dont like scheduler hints | 17:24 |
sean-k-mooney | but thats an option | 17:24 |
mnaser | sean-k-mooney: could also use `rules` for server groups to have `different_failure_domain` too | 17:25 |
opendevreview | Dan Smith proposed openstack/nova master: Reproduce iso regression with deep format inspection https://review.opendev.org/c/openstack/nova/+/923507 | 17:25 |
opendevreview | Dan Smith proposed openstack/nova master: Add iso file format inspector https://review.opendev.org/c/openstack/nova/+/923533 | 17:25 |
mnaser | but that would mean an api change | 17:25 |
sean-k-mooney | server group/instance aware filters are some of the most ram/compute intensive you can create | 17:25 |
mnaser | yeah | 17:27 |
mnaser | that is for sure going to be a busy one | 17:27 |
mnaser | im not sure if there's another approach | 17:28 |
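To sketch what mnaser is proposing: a rough, illustrative nova scheduler filter, assuming the standard BaseHostFilter interface, a `different_failure_domain` scheduler hint, and a `failure_domain` key in host aggregate metadata. None of this exists in nova today; the names are made up, and a real implementation would need caching or placement support, since per-request aggregate lookups are exactly the cost sean-k-mooney is warning about.

    from nova import context as nova_context
    from nova import objects
    from nova.scheduler import filters


    class DifferentFailureDomainFilter(filters.BaseHostFilter):
        """Keep members of a server group out of each other's failure domains."""

        RUN_ON_REBUILD = False

        def host_passes(self, host_state, spec_obj):
            # Only act when the hint is present and the server belongs to a group.
            if not spec_obj.get_scheduler_hint('different_failure_domain'):
                return True
            group = spec_obj.instance_group
            if not group or not group.hosts:
                return True

            # Build a host -> failure_domain map from aggregate metadata.
            # (Unoptimised: a real filter would cache this per request.)
            ctxt = nova_context.get_admin_context()
            host_to_domain = {}
            for agg in objects.AggregateList.get_all(ctxt):
                domain = agg.metadata.get('failure_domain')
                if domain:
                    for host in agg.hosts:
                        host_to_domain[host] = domain

            # Reject the candidate host if its failure domain already holds
            # another member of the server group.
            used = {host_to_domain.get(h) for h in group.hosts} - {None}
            return host_to_domain.get(host_state.host) not in used

Wiring something like this in would just be a matter of adding the class to the scheduler's enabled_filters, but as noted above a placement-side model would scale better than a group-aware filter.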
*** bauzas_ is now known as bauzas | 17:39 | |
fungi | 923287 failed out on nova-next and nova-multi-cell: https://zuul.opendev.org/t/openstack/buildset/0a50e0745e3e4759bf743b39208ed1dc | 17:43 |
fungi | i've reenqueued it and repromoted | 17:44 |
opendevreview | Dan Smith proposed openstack/nova master: Add iso file format inspector https://review.opendev.org/c/openstack/nova/+/923533 | 18:02 |
dansmith | fungi: confirmed at least one of those was a guest kernel panic so obviously unrelated | 18:27 |
fungi | exciting! | 18:28 |
fungi | and good to know, thanks for taking a look | 18:28 |
dansmith | not really, we see guest kernel panics way too often :( | 18:28 |
fungi | yeah, my excitement was understated sarcasm really, sorry | 18:28 |
dansmith | heh, okay | 18:28 |
fungi | i've about worn out all other tools at my disposal, so sarcasm is what's left at this point | 18:29 |
dansmith | yeah, you don't have to apologize to me for that.. I'm beyond exhausted.. I dunno what's after that before death, maybe "acceptance" ? | 18:30 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Fix device type when booting from ISO image https://review.opendev.org/c/openstack/nova/+/909611 | 18:51 |
opendevreview | Dan Smith proposed openstack/nova master: Add iso file format inspector https://review.opendev.org/c/openstack/nova/+/923533 | 19:33 |
opendevreview | Merged openstack/nova stable/2023.2: Fix vmdk_allowed_types checking https://review.opendev.org/c/openstack/nova/+/923287 | 19:54 |
opendevreview | Merged openstack/nova stable/2023.1: Check images with format_inspector for safety https://review.opendev.org/c/openstack/nova/+/923289 | 19:55 |
fungi | 923290 hit a package install error, and that knocked out 923291 as a child commit. i'll get them both back into play but they're the only changes left from the ossa | 20:11 |
fungi | waiting to promote them to the front since there's a stack of 5 succeeding changes that are due to merge in the next 10 minutes | 20:13 |
fungi | but once those land i'll shuffle them up | 20:14 |
fungi | and done | 20:27 |
*** bauzas_ is now known as bauzas | 20:44 | |
opendevreview | Merged openstack/nova stable/2023.1: Additional qemu safety checking on base images https://review.opendev.org/c/openstack/nova/+/923290 | 22:31 |
opendevreview | Merged openstack/nova stable/2023.1: Fix vmdk_allowed_types checking https://review.opendev.org/c/openstack/nova/+/923291 | 22:32 |
fungi | yay!!! | 22:32 |
dansmith | fungi: thanks for all your work, I didn't appreciate how much that would involve you and your clicky finger | 22:44 |
fungi | no worries, it was thoroughly calloused already | 22:45 |
fungi | (no actual clicking was done, unless you count the clickity-clack of my mechanical keyboard) | 22:46 |
fungi | like all real software, zuul has a proper cli | 22:46 |
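For the record, the kind of gate surgery described above is done with commands roughly like the following; the zuul-client flags, project naming and change ids here are an illustrative sketch from memory, not the exact invocations used against the opendev deployment:

    # re-add a change to the gate pipeline
    zuul-client enqueue --tenant openstack --pipeline gate \
        --project openstack/nova --change 923290,2
    # move a set of changes to the front of the shared gate queue
    zuul-client promote --tenant openstack --pipeline gate \
        --changes 923290,2 923291,3
    # eject a change, e.g. the already-merged 923288,2
    zuul-client dequeue --tenant openstack --pipeline gate \
        --project openstack/nova --change 923288,2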