opendevreview | Ghanshyam proposed openstack/nova stable/xena: DNM: testing tempest pin for stable/wallaby https://review.opendev.org/c/openstack/nova/+/871800 | 04:41 |
---|---|---|
gmann | gibi: bauzas: updates on stable/wallaby gate: I have updated the devstack patch (depends-on) and it fixes the gate - https://review.opendev.org/c/openstack/nova/+/871798 | 04:51 |
gmann | gibi: bauzas: but it unhid another bug on the devstack/grenade side, due to which the stable/xena grenade job started failing (with the devstack stable/wallaby tempest pin) - https://bugs.launchpad.net/grenade/+bug/2003993 | 04:52 |
gmann | I have proposed the fix https://review.opendev.org/q/I5e938139b47f443a4c358415d0d4dcf6549cd085 and testing it in https://review.opendev.org/c/openstack/nova/+/871800 | 04:53 |
opendevreview | Ghanshyam proposed openstack/nova stable/xena: DNM: testing tempest pin for stable/wallaby https://review.opendev.org/c/openstack/nova/+/871800 | 05:58 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: compute: enhance compute evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858383 | 07:10 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858384 | 07:10 |
sahid | ^ fixed merge conflict | 07:11 |
sahid | o/ | 07:11 |
*** blarnath is now known as d34dh0r53 | 07:22 | |
*** bhagyashris|ruck is now known as bhagyashris | 07:47 | |
bauzas | sahid: I have a busy morning but I'll try to take a look | 09:01 |
bauzas | gmann: ack, thanks for the heads-up | 09:01 |
priteau | Would the nova team consider disabling the failing grenade job on stable/wallaby temporarily to be able to merge the VMDK patch? | 09:02 |
bauzas | priteau: that's one of the options on the table | 09:02 |
bauzas | but before doing it, I need to correctly understand the problem | 09:03 |
opendevreview | Kashyap Chamarthy proposed openstack/nova stable/zed: libvirt: At start-up rework compareCPU() usage with a workaround https://review.opendev.org/c/openstack/nova/+/871968 | 09:40 |
gibi | gmann: those fixes look good to me. thanks for proposing them | 09:45 |
kashyap | Hm, /me lost track of these upstream backports from last July still :-( -- https://review.opendev.org/q/topic:bug%252F1982853 | 09:46 |
opendevreview | Kashyap Chamarthy proposed openstack/nova stable/yoga: Add a workaround to skip hypervisor version check on LM https://review.opendev.org/c/openstack/nova/+/851202 | 09:54 |
opendevreview | Kashyap Chamarthy proposed openstack/nova stable/yoga: libvirt: At start-up rework compareCPU() usage with a workaround https://review.opendev.org/c/openstack/nova/+/871969 | 09:56 |
kashyap | elodilles: bauzas: This backport has been waiting for a while, can this be put through? - https://review.opendev.org/c/openstack/nova/+/851205 | 09:57 |
bauzas | kashyap: done | 10:02 |
kashyap | Thx! | 10:02 |
elodilles | it won't merge, as the yoga patch has not been merged yet: https://review.opendev.org/c/openstack/nova/+/851202 | 10:09 |
elodilles | bauzas: ^^^ | 10:09 |
bauzas | voilà, that's why I didn't +W before | 10:09 |
sahid | bauzas: no worries thank you for your time! | 10:09 |
bauzas | -ETOOMANYREVIEWSONFLY | 10:09 |
elodilles | (as I see you +W'd it once, but the patch wasn't cherry picked from the latest, merged PS) | 10:11 |
kashyap | elodilles: Oh, yeah; the Yoga one is still waiting. And are you saying the Xena cherry-pick is not correct? | 10:12 |
kashyap | Ah, you were talking about the _past_ ("wasn't"). Now it should be fine | 10:13 |
elodilles | kashyap: the cherry-pick needs to be done again from yoga patch to stable/xena | 10:21 |
kashyap | Duh, I thought I just did it ... /me face-palms and looks | 10:21 |
elodilles | (and the current xena patch won't merge as it is not cherry picked from the latest yoga PS) | 10:21 |
elodilles | kashyap: thx for fixing it | 10:22 |
kashyap | elodilles: Gonna cherry-pick from this Yoga commit to Xena: c07495d9d64dd0635d72fc7ff67d73a656a40d13 | 10:23 |
elodilles | kashyap: yepp, that is the hash of the latest PS | 10:24 |
kashyap | elodilles: Hmm, I also need to backport another one before that (https://review.opendev.org/c/openstack/nova/+/845045) | 10:29 |
kashyap | For Xena, i.e.; /me goes to do it | 10:29 |
opendevreview | Kashyap Chamarthy proposed openstack/nova stable/xena: Add a workaround to skip hypervisor version check on LM https://review.opendev.org/c/openstack/nova/+/851205 | 10:31 |
opendevreview | Kashyap Chamarthy proposed openstack/nova stable/xena: libvirt: Add a workaround to skip compareCPU() on destination https://review.opendev.org/c/openstack/nova/+/871975 | 10:31 |
kashyap | elodilles: Hope that looks better --^ | 10:31 |
elodilles | kashyap: yepp, looks good, that should be accepted by the backport validator job as well | 10:33 |
* kashyap nods; thx | 10:33 | |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Dividing global privsep profile https://review.opendev.org/c/openstack/nova/+/871729 | 10:45 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Dividing global privsep profile https://review.opendev.org/c/openstack/nova/+/871729 | 10:47 |
opendevreview | Sofia Enriquez proposed openstack/nova master: Implement encryption on backingStore https://review.opendev.org/c/openstack/nova/+/870012 | 11:22 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Dividing global privsep profile https://review.opendev.org/c/openstack/nova/+/871729 | 11:51 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Moving privsep profiles to nova/__init__.py https://review.opendev.org/c/openstack/nova/+/872010 | 11:51 |
opendevreview | Kashyap Chamarthy proposed openstack/nova stable/xena: libvirt: At start-up rework compareCPU() usage with a workaround https://review.opendev.org/c/openstack/nova/+/872011 | 12:13 |
*** labedz__ is now known as labed | 12:23 | |
*** labed is now known as labedz | 12:23 | |
elodilles | bauzas: speaking of the tons of things you have to review, i don't know whether you saw the 2023.2 Bobcat schedule plan: https://review.opendev.org/c/openstack/releases/+/869976 | 12:37 |
elodilles | o:) | 12:37 |
auniyal | Hello o/ | 12:38 |
auniyal | please review these patches improving logging at '_allocate_mdevs': | 12:39 |
auniyal | - https://review.opendev.org/q/topic:bug%252F1992451 | 12:39 |
*** tosky_ is now known as tosky | 12:55 | |
bauzas | elodilles: I've seen it and I forgot to ask on Tuesday for folks to look at this change | 13:18 |
bauzas | elodilles: during the nova meeting | 13:18 |
bauzas | elodilles: I'm fine with it fwiw, as we would have the Summit *before* Specfreeze, which is nice | 13:18 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: DNM: Testing check pipeline on master branch https://review.opendev.org/c/openstack/nova/+/872018 | 13:44 |
elodilles | bauzas: yepp, it's a fairly long cycle (28 weeks) so milestones have longer times | 13:52 |
*** dasm|off is now known as dasm | 14:01 | |
darkhorse | Hi team, Is it possible to disallow a user from all compute resources using keystone policy or any other means? The use case is I want to have multiple roles to a project. Some users have compute permissions, some have network permissions, some have volume permissions etc. | 14:12 |
darkhorse | I am not sure if this question is more relevant to keystone channel. | 14:12 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/victoria: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833436 | 15:54 |
opendevreview | Merged openstack/nova stable/yoga: Add a workaround to skip hypervisor version check on LM https://review.opendev.org/c/openstack/nova/+/851202 | 16:01 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833437 | 16:02 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/train: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833438 | 16:02 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/ussuri: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833437 | 16:03 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/train: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833438 | 16:04 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/train: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833438 | 16:04 |
sean-k-mooney | dansmith: can you take a look at https://review.opendev.org/c/openstack/nova/+/863919/12/nova/tests/unit/compute/test_resource_tracker.py#1553 when you have time | 16:14 |
dansmith | sean-k-mooney: yeah, it's on my list.. y'all dragged me into a call at 6:30am and I have another starting in a few ;P | 16:15 |
sean-k-mooney | sorry about that | 16:15 |
dansmith | thanks for hitting the ones below, that's some progress for sure | 16:15 |
sean-k-mooney | so I'll try to test the rest when I get a devstack deployed, and I'll loop back to them again later today if that works out | 16:16 |
dansmith | cool | 16:16 |
sean-k-mooney | cool devstack stacked first time | 17:10 |
sean-k-mooney | now to actually use your patches | 17:11 |
sean-k-mooney | so currently I have an all-in-one, so when I check out your branch and restart the nova services I'm expecting the compute to start up, write the uuid to a file, and update the service version, right | 17:12 |
sean-k-mooney | yep that worked https://paste.opendev.org/show/bLMvRfuUugp9iMcn4R60/ | 17:16 |
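The behaviour verified in that paste can be summarised in a minimal sketch (this is not nova's actual code; the file location and function name are hypothetical): on first start the compute service generates a UUID and persists it locally, and every later start reuses the persisted value.

```python
# Sketch of the "stable compute UUID" first-start behaviour being
# tested above. Names and paths are illustrative, not nova's.
import os
import tempfile
import uuid


def get_or_create_node_uuid(state_file):
    """Return the node UUID from state_file, creating it on first start."""
    if os.path.exists(state_file):
        with open(state_file) as f:
            return f.read().strip()
    node_uuid = str(uuid.uuid4())
    with open(state_file, "w") as f:
        f.write(node_uuid)
    return node_uuid


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "compute_id")
    first = get_or_create_node_uuid(path)
    second = get_or_create_node_uuid(path)  # simulated service restart
    assert first == second
```

Restarting the service (the second call) reads the file back instead of generating a new identity, which is what makes the UUID stable across hostname changes.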
sean-k-mooney | I'm going to break this in a few different ways and document it in an etherpad, and I'll let you know if I find anything odd or broken | 17:17 |
dansmith | sean-k-mooney: yeah, cool | 17:25 |
sean-k-mooney | https://etherpad.opendev.org/p/Stable-compute-uuid-manual-testing I'll be using that to keep my notes | 17:29 |
sean-k-mooney | there is not really much there yet but I'll keep adding to it as I go | 17:30 |
sean-k-mooney | dansmith: found a bug | 17:59 |
sean-k-mooney | getting logs now | 17:59 |
dansmith | in "test 3 restart with deleted file" ? | 18:00 |
sean-k-mooney | yep | 18:00 |
dansmith | does that mean deleted node id file? | 18:00 |
sean-k-mooney | i moved it to compute_id_old | 18:00 |
sean-k-mooney | the agent tried to add a duplicate row in the db, which resulted in a key error | 18:01 |
sean-k-mooney | and it wrote a new file | 18:01 |
sean-k-mooney | with a different uuid | 18:01 |
dansmith | well, right, | 18:01 |
dansmith | because it knows it's not an upgrade (now) and it didn't find a local compute node uuid | 18:02 |
dansmith | what do you think it should do there? | 18:02 |
sean-k-mooney | well it should not result in a traceback in the log, for one | 18:02 |
sean-k-mooney | but it should see that the compute node exists in the db and not try to create a new one | 18:03 |
sean-k-mooney | and we should not write a new file to disk | 18:03 |
dansmith | but the whole point of this is that the file becomes the source of truth | 18:03 |
dansmith | so maybe we should see the failure to create with a keyerror as a "something's wrong abort" but I bet it happens too late for us to abort | 18:04 |
sean-k-mooney | well, the file didn't exist, correct | 18:04 |
sean-k-mooney | so before we try to create a new compute node record, should we not check if one exists the old way | 18:04 |
sean-k-mooney | and abort then | 18:04 |
dansmith | we do, specifically on upgrade only | 18:05 |
dansmith | which you saw work | 18:05 |
dansmith | but once you have gotten past upgrade, it knows you're not upgrading, and can only really assume that it's a greenfield deployment | 18:05 |
dansmith | if we're going to stop relying on the hostname, there's really no other way, right? | 18:05 |
sean-k-mooney | no, if we have a compute node and the service is upgraded then the file should exist | 18:06 |
sean-k-mooney | if it does not then we know something odd happened | 18:06 |
sean-k-mooney | basically if compute service >= 62? and no file -> error | 18:07 |
sean-k-mooney | for greenfield there won't be a compute node in the db | 18:07 |
dansmith | how do we know the difference between "already upgraded" and "greenfield" ? | 18:07 |
dansmith | but again, the only way you know the "compute node in the db" is because you're relying on the hostname, which we should not be doing | 18:07 |
sean-k-mooney | or we can catch the duplicate key error from the db | 18:08 |
dansmith | depending on when the duplicate key happens, I'm fine aborting in that case for sure | 18:08 |
sean-k-mooney | we at least should not write the file with the wrong uuid | 18:08 |
dansmith | but I think we don't create that record until far too late | 18:08 |
sean-k-mooney | it's currently writing the one for the row that was rejected | 18:08 |
sean-k-mooney | probably because we set the singleton uuid | 18:08 |
dansmith | okay, I'm with you on not writing it if there's a keyerror, but I think that will be pretty hard to line those things up | 18:09 |
dansmith | so maybe delete it if we wrote it or something | 18:09 |
dansmith | although, hang on | 18:10 |
dansmith | let's say I deploy two new computes with the same hostname by accident, they both generate and write unique uuids, | 18:10 |
dansmith | one will fail to create their compute node because of the clash, | 18:10 |
dansmith | but if I just rename the offending duplicate one, then the uuid it generated is fine to use when I restart | 18:10 |
sean-k-mooney | yep | 18:10 |
dansmith | the "I deleted the id" case seems pretty edge-y to me, because there are tons of things I can randomly delete from a running system that will break stuff | 18:11 |
dansmith | if we can abort on key conflict at start, then I'm on board, but otherwise, I dunno | 18:11 |
dansmith | this is a bit like saying I deleted /etc/ssh/ssh_host_key and when I restarted it generated a new one and my clients all complain :) | 18:11 |
sean-k-mooney | honestly I did this as my first test because I thought it could be a trivial thing that might happen and we should prevent it | 18:12 |
sean-k-mooney | dansmith: I was more thinking about what happens if you fail to bind mount this in a container | 18:12 |
sean-k-mooney | if the file is ever lost, I think it defeats much of the utility of the feature if we allow the agent to start | 18:13 |
dansmith | okay, but if you do, there's not much harm, because the resolution is to restart with it, and no harm no foul right? | 18:13 |
dansmith | if you failed to bind mount this, you likely also failed to bind-mount the instance images no? | 18:14 |
sean-k-mooney | fair, it's in the state dir | 18:14 |
dansmith | in the pre-provisioned case, this goes in /etc/nova anyway | 18:14 |
sean-k-mooney | although it depends on the location | 18:14 |
sean-k-mooney | anyway, I'm going to keep testing/breaking it for a while and see what else I find | 18:17 |
sean-k-mooney | i just added the traceback | 18:17 |
dansmith | ack, I say we punt on that for the moment and see what else you can find | 18:17 |
dansmith | maybe we can circle back with some extra stuff that makes that smarter | 18:17 |
dansmith | at least catching keyerror and logging something that says "okay, here's how you've screwed up..." | 18:18 |
dansmith | because we have that trace right now and it's not super obvious why | 18:18 |
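The handling agreed on here — catch the duplicate-key error, log something readable, and don't keep a freshly written UUID file that never made it into the database — could look roughly like this hedged sketch. `DuplicateEntry`, the dict-backed "table", and the function names are stand-ins, not nova's or oslo.db's API.

```python
# Sketch of duplicate-key handling on compute node registration.
# All names here are hypothetical stand-ins for illustration.
import os
import tempfile


class DuplicateEntry(Exception):
    """Stand-in for the database layer's duplicate-key error."""


def create_node_record(db, host, node_uuid):
    # db is a plain dict standing in for the compute_nodes table,
    # keyed on the unique hostname column.
    if host in db:
        raise DuplicateEntry(host)
    db[host] = node_uuid


def register_node(db, host, node_uuid, state_file, wrote_file):
    try:
        create_node_record(db, host, node_uuid)
    except DuplicateEntry:
        # Roll back the state file we just wrote, so a later start does
        # not trust a UUID that never made it into the database.
        if wrote_file and os.path.exists(state_file):
            os.unlink(state_file)
        raise SystemExit(
            "compute node %r already exists with uuid %s but this host "
            "generated %s; refusing to start" % (host, db[host], node_uuid))
```

The point of the explicit message is exactly what dansmith describes: turn the bare trace into "here's how you've screwed up", and abort instead of persisting a UUID that conflicts with the existing record.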
sean-k-mooney | so i think the abort is broken in general | 18:26 |
dansmith | the abort on rename? | 18:28 |
sean-k-mooney | yep, so I tried restarting it with the incorrect uuid | 18:28 |
sean-k-mooney | and it just keeps trying to create the record with the incorrect uuid | 18:29 |
sean-k-mooney | which keeps failing | 18:29 |
dansmith | does the uuid exist in the db though? | 18:29 |
sean-k-mooney | I'll check, but I don't think so | 18:29 |
dansmith | then it is doing what it should do | 18:29 |
dansmith | because that's the pre-provisioned case | 18:29 |
dansmith | if you create an object in the db with that uuid but a different host, then it should trigger the rename detection | 18:29 |
sean-k-mooney | so i really think this is broken as is | 18:30 |
sean-k-mooney | but i will try renames later | 18:30 |
dansmith | if you give it a uuid that doesn't exist in the database, how is it supposed to know that's not what you want it to use? | 18:30 |
dansmith | you've told it "this is your uuid, end of story" and if there's no object in the db with that uuid, it's going to try to create it | 18:31 |
dansmith | it should catch and log a better error than the trace, saying there's a name conflict or something, but otherwise it's doing the right thing I think | 18:31 |
sean-k-mooney | so I don't think we should be ignoring the host/hypervisor_hostname entirely, especially when we get the db duplicate key error | 18:31 |
dansmith | isn't that the entire point of this series? to break that as the link? | 18:32 |
sean-k-mooney | no | 18:32 |
sean-k-mooney | it's to make sure a host never changes its hostname or host value | 18:32 |
sean-k-mooney | that is not entirely the same thing | 18:32 |
dansmith | I totally disagree that that's the point of this series :) | 18:33 |
sean-k-mooney | we want to make sure we always have a stable way to map a host to the same compute node record | 18:33 |
sean-k-mooney | and that the host and hypervisor_hostname don't change | 18:33 |
dansmith | because I didn't even have the rename protection in there until late | 18:33 |
sean-k-mooney | well, this would not have prevented the issues our downstream customer had without the host/hypervisor host checks | 18:34 |
dansmith | it would, if they didn't delete the state file | 18:34 |
dansmith | like I said, if the uuid is actually a compute node in the db, but with a hostname that doesn't match, it will abort | 18:35 |
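The lookup order dansmith is describing can be sketched as a small decision function (the names and dict-based "db" are illustrative, not nova's code): the local UUID file is the source of truth, and the hostname is only compared once a record is found, which is what catches renames.

```python
# Sketch of compute-node resolution keyed on the persisted UUID.
# Field and function names are hypothetical.
def resolve_compute_node(db_by_uuid, local_uuid, my_host):
    node = db_by_uuid.get(local_uuid)
    if node is None:
        # Pre-provisioned or greenfield: no record yet, so create one.
        return "create"
    if node["host"] != my_host:
        # The record exists but under another hostname: a rename, abort.
        return "abort-rename"
    return "use-existing"
```

This is why a UUID that is absent from the db leads to a create attempt (and a duplicate-key failure if the hostname is already taken), while a UUID present under a different hostname triggers the rename abort.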
sean-k-mooney | and you want to rely on them not doing something we told them not to do when they are doing something else we told them not to do :) | 18:35 |
dansmith | all I want to do is make finding the compute node not tied to the hostname | 18:35 |
sean-k-mooney | ok, so that will just result in it failing with placement | 18:36 |
sean-k-mooney | because we will move the duplicate key error to the resource provider creation | 18:36 |
dansmith | but we won't be recreating providers | 18:36 |
dansmith | we'll be accessing them by uuid, which hasn't changed right? | 18:37 |
dansmith | the hostname may be wrong, and if something else claims the hostname, then it will fail to create one, | 18:37 |
sean-k-mooney | well, I'm about to start testing that stuff, so we will see | 18:37 |
dansmith | but the case we've had downstream was not that they *shuffled* their compute node names, but rather moved to a different naming schema, still no overlaps | 18:37 |
sean-k-mooney | setting it back to the correct uuid, and I'll change CONF.host and the hostname separately | 18:37 |
sean-k-mooney | anyway, my initial feedback is this is not preventing as much as I was expecting it to | 18:38 |
dansmith | so here's the accepted spec: https://specs.openstack.org/openstack/nova-specs/specs/2023.1/approved/stable-compute-uuid.html#proposed-change | 18:38 |
dansmith | and I think that's covered here | 18:38 |
dansmith | it says that the uuid file will be what we use to find the compute node, | 18:39 |
dansmith | and we will detect compute node renames | 18:39 |
sean-k-mooney | right, and I kind of assumed "we will not introduce any new DB exceptions that prevent the resource tracker from working" would be an important point too | 18:40 |
dansmith | I really think that if you take out the "randomly deleted a key state file from the system" then it prevents quite a bit | 18:40 |
dansmith | how is this a new db exception? it's the same db exception as before if we try to create a conflicting compute node record | 18:41 |
sean-k-mooney | dansmith: well, if your release automation updated the uuid it would cause the same failure I'm seeing | 18:41 |
sean-k-mooney | that was basically step 4 | 18:41 |
sean-k-mooney | no | 18:41 |
sean-k-mooney | before we looked it up by hostname | 18:41 |
sean-k-mooney | and would not have got a collision in this case | 18:41 |
sean-k-mooney | so we are trying to create a duplicate record that would not have been created before | 18:42 |
dansmith | okay I'm getting frustrated | 18:42 |
dansmith | shall we take this to a gmeet? | 18:42 |
sean-k-mooney | sure | 18:42 |
sean-k-mooney | I'm not trying to frustrate you, by the way | 18:42 |
sean-k-mooney | just letting you know what I'm finding | 18:42 |
dansmith | meet.google.com/gkf-fdhr-wgr | 18:42 |
dansmith | sean-k-mooney: one other thing, we could also assert that if we're already upgraded *and* are not starting fresh, we could abort if the uuid file is missing | 19:26 |
dansmith | i.e. your didn't-bind-mount case | 19:26 |
dansmith | although, | 19:26 |
dansmith | if we handle that in the extra check we discussed, we can say "found X expected Y" which will be the easy way for them to fix their stuff, even if X is empty | 19:27 |
sean-k-mooney | yes i thought that was one of the things i said above. | 19:27 |
sean-k-mooney | yep | 19:28 |
dansmith | oh, maybe I was foaming at the mouth and missed it | 19:28 |
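The extra guard floated just above — abort when the deployment is already upgraded, the node is not brand new, and the UUID file is missing — could be sketched like this. The version number 62 comes from sean-k-mooney's "compute service >= 62?" earlier and is a placeholder, as are all the names.

```python
# Hedged sketch of the "didn't-bind-mount" guard. Hypothetical names;
# the real check would live in compute startup, not a free function.
STABLE_UUID_VERSION = 62  # placeholder for the real service version


def check_uuid_file(service_version, has_db_record, file_exists):
    if (service_version >= STABLE_UUID_VERSION and has_db_record
            and not file_exists):
        # State was lost (e.g. a forgotten bind mount): refuse to start
        # rather than mint a new identity for an existing node.
        raise SystemExit(
            "compute_id file is missing but this node was previously "
            "registered; restore the file instead of starting fresh")
```

A "found X expected Y" message, as discussed, would make the fix obvious to the operator even when X is empty.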
sean-k-mooney | I read over your comments on the persist change too, so I'm ok to proceed with that now based on what we discussed | 19:30 |
dansmith | ack | 19:31 |
sean-k-mooney | so the first 4 have +w and the first 2 are merged | 19:31 |
dansmith | thanks, I'll wait until those merge or fail before I push anything else up | 19:32 |
sean-k-mooney | ok, I'm going to see what else I can break or not break, and I'll review the last 3 on Monday | 19:33 |
opendevreview | Merged openstack/nova master: Add get_available_node_uuids() to virt driver https://review.opendev.org/c/openstack/nova/+/863917 | 19:53 |
sean-k-mooney | dansmith: fyi I'm going to bold the titles of the tests that look odd | 20:04 |
sean-k-mooney | but the rename logic does not detect a change in hypervisor_hostname today, as long as CONF.host does not change | 20:05 |
sean-k-mooney | https://etherpad.opendev.org/p/Stable-compute-uuid-manual-testing#L216 | 20:06 |
dansmith | and that's because we get that from libvirt yeah? | 20:06 |
sean-k-mooney | yep | 20:06 |
sean-k-mooney | so if the value of virsh hostname changes, then we update hypervisor_hostname in the db and rename the placement RP | 20:07 |
dansmith | and that's problematic why, just because cinder/neutron will be unhappy? | 20:07 |
sean-k-mooney | I'm sure that will break things like QoS or other nested resource providers, or at least how we discover them | 20:08 |
dansmith | that's technically out of scope of what I was trying to guard against in this set | 20:08 |
dansmith | and what if you change it back? | 20:08 |
sean-k-mooney | that said, if they also rename their RPs then it might just fix itself | 20:08 |
sean-k-mooney | that's going to be the next test | 20:08 |
dansmith | okay | 20:08 |
sean-k-mooney | after that I'm going to try changing the value via /etc/hosts | 20:09 |
sean-k-mooney | I expect changing it back to revert all the changes | 20:09 |
sean-k-mooney | I probably should test this with nested resource providers from neutron, but I'll do that after all the simple tests | 20:10 |
sean-k-mooney | actually, I need to check a few db tables first | 20:11 |
dansmith | okay, so you're saying it updates placement but does *not* update our compute node, right? | 20:13 |
sean-k-mooney | it updates the cn table and placement; they are both kept in sync | 20:13 |
sean-k-mooney | I'm checking what the api db looks like | 20:13 |
dansmith | L219 says otherwise | 20:14 |
dansmith | or you meant "isn't detected to abort" | 20:15 |
sean-k-mooney | I think we should abort starting up if the hypervisor_hostname for a compute node changes, yes | 20:15 |
sean-k-mooney | at least for libvirt | 20:16 |
sean-k-mooney | but again, that's an extra check we can add to the end of the series | 20:16 |
sean-k-mooney | so I probably should have a vm running while I do this, as I think the instance.node is going to be out of sync | 20:17 |
sean-k-mooney | so I'll boot one before I fix it | 20:17 |
sean-k-mooney | and my expectation is that when I restore the old hostname, instance.node will not be updated but placement and the compute_nodes table will be | 20:17 |
sean-k-mooney | that's probably the last test I'll do tonight | 20:18 |
sean-k-mooney | but I'll do more on Monday. | 20:18 |
dansmith | yeah, like I said, out of scope of what I'm trying to do, but probably should be in scope | 20:18 |
dansmith | ack | 20:18 |
sean-k-mooney | actually, I need to test with provider.yaml too at some point | 20:19 |
sean-k-mooney | we can reference the RP by name, I believe, which uses the hypervisor_hostname | 20:19 |
sean-k-mooney | we can also reference it by UUID | 20:19 |
sean-k-mooney | so I suspect the uuid way will be fine but the other way might break | 20:20 |
sean-k-mooney | again, not necessarily in scope, but I want to know what happens | 20:20 |
sean-k-mooney | ok, so if there are allocations in placement we can't delete and rename the resource provider, because we get a 409 | 20:41 |
sean-k-mooney | so that fails and we never update the compute node record either | 20:42 |
sean-k-mooney | the periodic is broken | 20:42 |
sean-k-mooney | but the agent does not exit, and we just get tracebacks in the logs. | 20:43 |
sean-k-mooney | I'll leave it there for today and come back to this on Monday | 20:44 |
dansmith | wait, | 20:46 |
dansmith | I thought you said it *did* update the placement provider hostname? | 20:46 |
sean-k-mooney | it appeared to, but based on that log it deleted the RP and recreated it with the old uuid and new name | 20:47 |
sean-k-mooney | so it's not updating in place; it's trying to delete the orphan compute node and then create a new one | 20:48 |
sean-k-mooney | that, by the way, I think is ironic code | 20:48 |
sean-k-mooney | or rather code we have in the common manager loop for ironic | 20:48 |
sean-k-mooney | if we detect hypervisor_hostname changes and abort, just like conf.host, we don't have to care about this | 20:50 |
dansmith | ah because it had no instances? | 20:50 |
sean-k-mooney | right, so with no instances it deleted it fine | 20:51 |
dansmith | right okay | 20:51 |
sean-k-mooney | the 409 conflict in placement is because of the allocation for the instance | 20:51 |
dansmith | so if no instances, maybe no harm to cinder and neutron? | 20:51 |
sean-k-mooney | it might still break the naming | 20:51 |
sean-k-mooney | but it might be ok in the no instance case | 20:51 |
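The 409 behaviour described here can be captured in a toy model (this is not placement's real API, just an illustration): a resource provider that still has allocations cannot be deleted, so a delete-and-recreate "rename" keeps failing every periodic run, while it succeeds on an empty provider.

```python
# Toy model of the placement 409 on provider delete. All names are
# hypothetical stand-ins for illustration.
class Conflict(Exception):
    """Stand-in for placement's HTTP 409 response."""


class FakePlacement:
    def __init__(self):
        self.providers = {}       # provider uuid -> provider name
        self.allocations = set()  # provider uuids that have allocations

    def delete_provider(self, rp_uuid):
        if rp_uuid in self.allocations:
            raise Conflict("409: provider has allocations")
        del self.providers[rp_uuid]

    def rename_by_recreate(self, rp_uuid, new_name):
        # Mirrors the delete-then-create behaviour seen in the logs;
        # an in-place rename of the same uuid would avoid the 409.
        self.delete_provider(rp_uuid)
        self.providers[rp_uuid] = new_name
```

With no instances (no allocations) the delete succeeds and the provider reappears under the new name, matching "with no instances it deleted it fine" above.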
sean-k-mooney | I will configure bandwidth QoS or something next week and see | 20:52 |
sean-k-mooney | I don't think cinder will use placement at all right now | 20:52 |
sean-k-mooney | but cyborg could break | 20:52 |
sean-k-mooney | cyborg and neutron might need me to restack, so I'll test what I can before that | 20:53 |
sean-k-mooney | dansmith: in this particular case, I think this placement exception happened before your code | 20:54 |
dansmith | during reshape or something? | 20:54 |
sean-k-mooney | I have definitely seen this before, and it's what I was expecting if your code did not block it | 20:54 |
dansmith | so that sanity check of the hostname->node mapping generates 94 functional test failures | 20:54 |
dansmith | fml | 20:54 |
sean-k-mooney | no, I have seen this when people actually change dns/hostname but had CONF.host set | 20:55 |
sean-k-mooney | we don't need to fix all these cases this cycle either; like, I know that test case 10 is a pre-existing failure mode | 20:57 |
sean-k-mooney | anyway, I'm going to go get food | 20:57 |
sean-k-mooney | don't spend your weekend on this o/ | 20:57 |
dansmith | I shan't, you either | 20:59 |
*** dasm is now known as dasm|off | 21:35 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!