opendevreview | sean mooney proposed openstack/nova master: [DNM] debug kernel paincs https://review.opendev.org/c/openstack/nova/+/905628 | 01:26 |
---|---|---|
*** efried1 is now known as efried | 06:07 | |
auniyal | Hi sean-k-mooney, gibi, bauzas, can you please look at this change? It's a simple and important change which can help many tests. | 06:11 |
auniyal | https://review.opendev.org/c/openstack/nova/+/893584 | 06:11 |
gibi | elodilles: we can land the zed backport of https://review.opendev.org/q/topic:%22bug/2025480%22 now | 08:03 |
gibi | auniyal: done | 08:06 |
auniyal | gibi, thanks there are 2 more patches with same topic - can you please review them as well https://review.opendev.org/q/topic:%22refactor-volumeAttachment-calls%22 | 08:07 |
elodilles | gibi: ACK, thanks! +2+W'd \o/ | 08:31 |
opendevreview | Merged openstack/nova stable/zed: Reproduce bug #2025480 in a functional test https://review.opendev.org/c/openstack/nova/+/904374 | 09:13 |
gibi | sean-k-mooney bauzas: re: powermgmt; it seems that during compute startup when power management offlines unused pcpus the privsep daemon is not spawned automatically. During VM boot the privsep daemon is spawned automatically | 09:23 |
gibi | it seems the privsep decorator fails to apply | 09:43 |
gibi | https://paste.opendev.org/show/b3m4lKm2z7c9yP2psvSJ/ | 09:43 |
Uggla | Hello, can someone clarify this: This returns a system hostname on which the hypervisor is running (based on the result of the gethostname system call, but possibly expanded to a fully-qualified domain name via getaddrinfo). I mean in which case it is a FQDN or not ? | 09:45 |
Uggla | And do you know if the behavior of libvirt changed recently ? | 09:46 |
gibi | bauzas: sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/885293 it is not backported to 2023.1 :/ | 09:54 |
gibi | bauzas: do you happen to have a list of bugfixes you implemented for the powermgmt feature. I want to check all to see what is not backported to 2023.1 | 09:55 |
bauzas | gibi: I don't have a lot of bugfixes for this | 09:56 |
bauzas | I think you found the only one missing :( | 09:56 |
bauzas | (sorry, was in meeting) | 09:56 |
gibi | OK | 09:57 |
gibi | I will backport it | 09:57 |
bauzas | thanks | 09:58 |
bauzas | Uggla: are you talking of hypervisor_hostname field ? | 10:02 |
bauzas | if so, this is virt-dependent | 10:03 |
Uggla | yep, in fact it seems we have users blocked by the patch introduced by Dan to check for compute renames. They registered their nodes with the short name and libvirt seems to provide the FQDN, thus making the comparison fail and blocking the compute startup. | 10:04 |
bauzas | Uggla: libvirt populates this field by https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L9611 | 10:05 |
bauzas | which itself calls libvirt with https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L1065 | 10:06 |
Uggla | @bauzas, yep I looked at the methods doc --> This returns a system hostname on which the hypervisor is running (based on the result of the gethostname system call, but possibly expanded to a fully-qualified domain name via getaddrinfo) | 10:06 |
Uggla | but I guess it's not libvirt that changed its behavior. | 10:07 |
bauzas | yeah, just checked the libvirt release notes | 10:08 |
bauzas | this is possibly something that the operator did which broke | 10:08 |
Uggla | The question was more personal to understand what libvirt is doing. | 10:08 |
bauzas | and what dansmith writes actually prevents that, which is definitely what :) | 10:08 |
bauzas | we want :) | 10:08 |
bauzas | accidental hostname rewrites break placement and many other things | 10:09 |
Uggla | It seems the operator upgraded to 2023.1 and he is blocked by the check. | 10:09 |
Uggla | because short name != fqdn | 10:10 |
bauzas | that's good that they're blocked then | 10:10 |
Uggla | I guess it should not be blocked if the compute were registered with the fqdn previously. | 10:11 |
Uggla | but I'm not sure we have a guideline that enforces this ? | 10:12 |
Uggla | And how can the operator get out of this issue without db surgery ? | 10:12 |
bauzas | nova only registers the nodename | 10:13 |
bauzas | that means that *something* changed it | 10:13 |
gibi | bauzas: there is another non-backported fix for power mgmt: https://review.opendev.org/c/openstack/nova/+/885352 you implemented this on master but did not backport it. Therefore I saw the bug in 2023.1 but did not notice the master fix and implemented another fix that is now backported https://review.opendev.org/c/openstack/nova/+/903169 . Unfortunately | 10:13 |
bauzas | and now nova refuses to start | 10:13 |
gibi | https://review.opendev.org/c/openstack/nova/+/903169 has a bug in it as it is expecting a wrong exception type | 10:13 |
gibi | bauzas: I will do some reverts and backports to clean this up | 10:14 |
bauzas | doh | 10:14 |
bauzas | gibi: apologies for the mess | 10:14 |
bauzas | I should create bugfixes with the same gerrit topic as the feature | 10:15 |
gibi | no worries I made my own fair share of mess in it | 10:15 |
bauzas | instead of using the bug number topic | 10:15 |
gibi | I prefer bug number topics | 10:15 |
bauzas | yeah I understand | 10:15 |
bauzas | this is a tradeoff | 10:16 |
bauzas | then I should mention some changeid in the commit msg, so gerrit could possibly link it | 10:16 |
bauzas | tbc, I don't trust my memory and that one is another evidence | 10:17 |
bauzas | the series was small and there were only two bugfixes, despite that I failed to remember about that | 10:17 |
bauzas | my brain is so fcked | 10:17 |
opendevreview | Merged openstack/nova stable/zed: Do not untrack resources of a server being unshelved https://review.opendev.org/c/openstack/nova/+/904375 | 10:19 |
gibi | bauzas: I don't know how to avoid this via automation | 10:24 |
gibi | elodilles: do I need to backport a revert, or I can do a revert on the stable branch independently of a revert on master? | 10:25 |
gibi | elodilles: i.e can I do git revert on stable, or do I need to do git cherry-pick -x on the revert commit from master to stable? | 10:26 |
gibi | elodilles: I will go with the cherry-pick as I am afraid the stable backport script will not allow an independent revert :/ | 10:32 |
Uggla | @bauzas you said we store only the shortname for compute name in the db right ? | 10:33 |
bauzas | nope | 10:33 |
bauzas | I said we store what libvirt gives us | 10:33 |
bauzas | if after an upgrade libvirt is giving us something else, then now with 2023.1 we fail | 10:34 |
bauzas | previously, we were blindly recreating duplicate resource providers leading to inconsistencies | 10:34 |
Uggla | yep it means that when the compute was registered it got the shortname and after an upgrade or something else libvirt supplies the fqdn. | 10:35 |
Uggla | @bauzas, note I understand the patch goal. | 10:36 |
elodilles | gibi: we usually simply backport reverts if it is needed all the way from master branch to multiple stable branches. if only needed for some older branches, then [stable-only] is fine. and of course if the bug is superimportant highly critical ultimate fix is needed, then also mark them as [stable-only] and merge them parallel in all stable branches :) | 10:47 |
elodilles | these are my memories from the past ^^^ :) | 10:48 |
gibi | not super critical | 10:49 |
gibi | going with the cherry-picks | 10:50 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Revert "[pwmgmt]ignore missin governor when cpu_state used" https://review.opendev.org/c/openstack/nova/+/905671 | 10:50 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.2: Revert "[pwmgmt]ignore missin governor when cpu_state used" https://review.opendev.org/c/openstack/nova/+/905672 | 10:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.2: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/905673 | 10:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.1: Revert "[pwmgmt]ignore missin governor when cpu_state used" https://review.opendev.org/c/openstack/nova/+/905674 | 10:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.1: cpu: fix the privsep issue when offlining the cpu https://review.opendev.org/c/openstack/nova/+/905675 | 10:51 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.1: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/905676 | 10:51 |
gibi | elodilles, bauzas, sean-k-mooney ^^ | 10:51 |
gibi | it is complicated as on master we only need to revert, on 2023.2 we revert and backport the proper fix from master, on 2023.1 we revert, backport a related fix from 2023.2 and then backport the original good fix from master->2023.2 | 10:52 |
* bauzas goes off for gym but I'll look later | 10:52 | |
bauzas | gibi: ++ | 10:52 |
Uggla | do we agree that there is no "proper" way to rename a compute node. And that we need to stop the compute service, remove the compute from the configuration, and restart the compute to have it registered with the new name ? | 11:06 |
zigo | kashyap: For Ubuntu stuff, please ask jamespage or coreycb_ please, I have no idea about #ubuntu-kernel on Libera at all, and I honestly don't know anything about the Ubuntu kernel in general. | 11:07 |
kashyap | zigo: Thanks for the contacts :) In the past I had a couple of chats on #ubuntu-kernel, while it used to be still on FN :) | 11:07 |
jamespage | kashyap: lacking some context - just reading the etherpad linked above somewhere | 11:12 |
kashyap | jamespage: Hi, I didn't ping you with full context yet. :) | 11:12 |
kashyap | jamespage: It's about a certain class of kernel panics we're seeing in the upstream CI. An example: https://bugs.launchpad.net/nova/+bug/2018612 | 11:13 |
kashyap | Admittedly (a) these are difficult to reproduce locally; and (b) distro kernel maintainers (understandably) might not have time to debug "unsupported distros" (as evidenced above). | 11:16 |
kashyap | I need to step out to get some lunch; need to drop here. (I don't have an IRC bouncer anymore) | 11:16 |
jamespage | ack | 11:16 |
auniyal | in cinderfixture for attachment_update https://review.opendev.org/c/openstack/nova/+/658904/4/nova/tests/fixtures.py#1997 | 11:19 |
auniyal | why must this volume_id be self.MULTIATTACH_VOL? (the sent volume is a dynamically created new volume) MULTIATTACH_VOL is a constant | 11:19 |
auniyal | gibi ^ | 11:19 |
auniyal | I am trying to create a server from a snapshot image, and while updating the attachment in cinder it's failing at this step - can someone please tell me why this is written and what I can do to use MULTIATTACH_VOL. | 11:23 |
auniyal | so attachment can get updated | 11:24 |
opendevreview | Merged openstack/nova master: Adds server show in helpers https://review.opendev.org/c/openstack/nova/+/893584 | 11:33 |
opendevreview | Amit Uniyal proposed openstack/nova master: Fixes bug 2048184 https://review.opendev.org/c/openstack/nova/+/904817 | 12:00 |
opendevreview | Amit Uniyal proposed openstack/nova master: Updates glance fixture for create image https://review.opendev.org/c/openstack/nova/+/905684 | 12:00 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: snapshot tests https://review.opendev.org/c/openstack/nova/+/905685 | 12:00 |
bauzas | Uggla: in our OSP product, we have a solution for renaming a compute by scaling the computes in and then out | 13:36 |
SvenKieske | Hey, has anyone ever thought about enabling "core scheduling" support by default? It's in libvirt since v8.9.0 but I can't find any information regarding openstack implementation status: https://www.libvirt.org/news.html#v8-9-0-2022-11-01 | 14:42 |
SvenKieske | when searching for this in nova I also found this gem, which seems not to be tracked anymore (last activity 2019, but bug was never closed): https://bugs.launchpad.net/nova/+bug/1417975 | 14:46 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: snapshot tests https://review.opendev.org/c/openstack/nova/+/905685 | 14:49 |
opendevreview | Amit Uniyal proposed openstack/nova master: Updates glance fixture for create image https://review.opendev.org/c/openstack/nova/+/905684 | 14:56 |
opendevreview | Amit Uniyal proposed openstack/nova master: Fixes bug 2048184 https://review.opendev.org/c/openstack/nova/+/904817 | 14:56 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: snapshot tests https://review.opendev.org/c/openstack/nova/+/905685 | 14:56 |
bauzas | SvenKieske: that sounds like a new feature request, if you want to use it with nova | 15:12 |
gibi | dansmith melwitt bauzas : a fresh kernel panic on master https://review.opendev.org/c/openstack/nova/+/905671 | 15:13 |
dansmith | gibi: that's similar to the one that we opened against the ubuntu team and they asked us to repro on a newer kernel, so that's ... "good" | 15:23 |
* gibi feels useful | 15:23 | |
SvenKieske | bauzas: yeah sure, I'm just a little baffled that I seem to be the first person to mention this feature request, as this is a rather useful feature, from a security perspective. | 15:31 |
kashyap | gibi: Hi, dansmith tells me you've run into another kernel panic. Got a log link? | 15:47 |
dansmith | kashyap: https://review.opendev.org/c/openstack/nova/+/905671 | 15:47 |
dansmith | kashyap: also for future reference: https://meetings.opendev.org/irclogs/ | 15:48 |
kashyap | dansmith: Hi; thanks! | 15:48 |
*** Continuity__ is now known as Continuity | 15:48 | |
kashyap | dansmith: Oh, yeah; I know we log upstream stuff; forgot for a min :) | 15:48 |
dansmith | oh, okay | 15:48 |
kashyap | Now the challenge is to find the elusive local reproducer :-( | 15:49 |
kashyap | Interesting, this was discovered by a power-management related patch | 15:49 |
dansmith | I doubt it's related, and I also doubt we're going to find a local reproducer | 15:50 |
gibi | kashyap: power management is disabled by default in our upstream CI | 15:52 |
kashyap | Hm, I see. | 15:52 |
gibi | so the patch itself is not related | 15:52 |
kashyap | dansmith: Hm, if we can't figure out a way to get a local reproducer, it throws a wrench in our communication with kernel maintainers (whether it be distro or upstream). I'm sure you know this | 15:53 |
kashyap | I'm wondering how we can unjam this "deadlock" situation. | 15:54 |
dansmith | there's still information in the stack traces to provide hints on where to go from here | 15:54 |
bauzas | nova meeting in 5 mins | 15:55 |
bauzas | *here | 15:55 |
dansmith | and also, kernel developers are (IME) used to working on issues they can't reproduce because they're related to hardware and/or deadlocks where we can't get forensics out | 15:55 |
dansmith | a kernel developer might also have a better idea about how to force a repro from the details of what was happening when it faulted | 15:56 |
kashyap | dansmith: Yeah; that's a fair point; at least _some_ of the traces should lead us somewhere | 15:57 |
kashyap | And to your previous point, from a recent experience with one kernel trace, I did see at least one kernel maintainer immediately say "it's a known issue due to a hardware bug" | 15:59 |
dansmith | it's also in very early boot of a kernel and userspace they have good access to | 15:59 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jan 16 16:00:02 2024 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
elodilles | o/ | 16:00 |
grandchild | o/ | 16:00 |
bauzas | heya | 16:00 |
dansmith | o/ (kinda) | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
bauzas | let's try to have a short meeting | 16:00 |
bauzas | some folks have another meeting in 15 mins | 16:01 |
bauzas | #topic Bugs (stuck/critical) | 16:01 |
bauzas | #info No Critical bug | 16:01 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 46 new untriaged bugs (+3 since the last meeting) | 16:01 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:01 |
bauzas | Uggla worked a lot on the bugs | 16:01 |
bauzas | Uggla: have you modified the status for all of the ones you looked at ? | 16:01 |
fwiesel | o/ | 16:02 |
bauzas | anyway, moving on | 16:03 |
bauzas | elodilles: are you okay if you could look at the bugs next week ? | 16:03 |
elodilles | sure o/ | 16:03 |
gibi | o/ | 16:04 |
bauzas | cool | 16:04 |
bauzas | #info bug baton is elodilles | 16:04 |
bauzas | #info bug baton is elodilles | 16:04 |
bauzas | shit | 16:04 |
bauzas | #topic Gate status | 16:04 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:04 |
bauzas | #link https://etherpad.opendev.org/p/nova-ci-failures-minimal | 16:04 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status | 16:05 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:05 |
bauzas | do you guys want to discuss the guest kernel issues ? | 16:05 |
bauzas | (periodic runs are all ok) | 16:05 |
bauzas | looks not, moving on then | 16:06 |
bauzas | #topic Release Planning | 16:06 |
bauzas | #link https://releases.openstack.org/caracal/schedule.html#nova | 16:06 |
bauzas | #info Caracal-3 (and feature freeze) milestone in 6 weeks | 16:06 |
bauzas | #topic Review priorities | 16:06 |
bauzas | as a reminder, please look at this etherpad if you want to review our implementations :)= | 16:07 |
bauzas | #link https://etherpad.opendev.org/p/nova-caracal-status | 16:07 |
bauzas | #topic Stable Branches | 16:07 |
bauzas | elodilles: your time | 16:07 |
elodilles | o/ | 16:07 |
elodilles | nothing to report actually | 16:07 |
elodilles | state is the same as last week | 16:07 |
elodilles | (i've pinged release cores to approve Nova's Zed release) | 16:07 |
bauzas | ++ | 16:08 |
elodilles | that's all | 16:08 |
bauzas | I think I did +1 for the zed release | 16:08 |
elodilles | yepp, you did | 16:08 |
bauzas | cool | 16:08 |
bauzas | elodilles: thanks | 16:08 |
elodilles | thanks too :) | 16:08 |
bauzas | #topic vmwareapi 3rd-party CI efforts Highlights | 16:08 |
bauzas | grandchild: fwiesel: your time :) | 16:08 |
fwiesel | #Info ETA on exemption for public access by end of week. Hopefully under DNS name: openstack-ci-logs.global.cloud.sap | 16:09 |
bauzas | <3 | 16:09 |
fwiesel | So, not much happened. Still wrapped in red tape. But at least I have a dns name. | 16:09 |
bauzas | so we'll wait :) | 16:09 |
fwiesel | That's it from my side. Any questions? | 16:09 |
bauzas | thanks a lot | 16:09 |
bauzas | nope from me | 16:09 |
fwiesel | You're welcome. | 16:09 |
bauzas | I think we said we should wait until milestone-2 to see whether we would remove vmwareapi, but given what you did and what you continue to do, of course we won't | 16:10 |
bauzas | so let's continue to see what things happen | 16:10 |
fwiesel | Thanks. Much appreciated. | 16:11 |
bauzas | thanks folks for the work, definitely nice | 16:11 |
bauzas | #topic Open discussion | 16:11 |
bauzas | nothing in the agenda, anything anyone ? | 16:11 |
bauzas | looks not, | 16:12 |
bauzas | thanks all, was a quick one | 16:12 |
bauzas | #endmeeting | 16:12 |
opendevmeet | Meeting ended Tue Jan 16 16:12:27 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:12 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2024/nova.2024-01-16-16.00.html | 16:12 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2024/nova.2024-01-16-16.00.txt | 16:12 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2024/nova.2024-01-16-16.00.log.html | 16:12 |
fwiesel | Thanks chat with you next week, hopefully with more news | 16:13 |
bauzas | ++ | 16:14 |
Uggla | @bauzas, sorry I was discussing with Artom, and I did not manage to answer. | 16:26 |
Uggla | Yep I updated the status of the bugs. I read a lot of them, but did not manage to triage a lot of them. | 16:27 |
opendevreview | Takashi Kajinami proposed openstack/placement master: Bump hacking https://review.opendev.org/c/openstack/placement/+/905706 | 16:29 |
Uggla | I think https://bugs.launchpad.net/nova/+bug/2048154 is important to be fixed. I hope @Amit could have a look. | 16:29 |
opendevreview | Takashi Kajinami proposed openstack/python-novaclient master: Bump hacking https://review.opendev.org/c/openstack/python-novaclient/+/905707 | 16:30 |
opendevreview | Takashi Kajinami proposed openstack/osc-placement master: Bump hacking https://review.opendev.org/c/openstack/osc-placement/+/905715 | 16:41 |
frickler | kashyap: dansmith: added some comments on the etherpad. given that the issue (afaict) only happens for volume resize tests, I'm wondering how we could verify that no actual data corruption is happening on the volume in that scenario | 16:44 |
frickler | because if it does happen, chasing possible kernel bugs is kind of moot | 16:44 |
kashyap | frickler: Hi, yeah, I've just seen your comment on the Etherpad. Good question. (I'm in a meeting, and will be slow here) | 16:45 |
dansmith | frickler: these test instances haven't even made it out of initial kernel boot, so no real opportunity to corrupt anything yet | 16:45 |
frickler | is this happening on the initial boot, not after resize? | 16:46 |
dansmith | frickler: and we're not booting from the volumes either, since we're using the preloaded kernel boot method, which means corruption of the on-disk data causing the crash wouldn't be a thing | 16:46 |
dansmith | before the recent switch to kernel preload that could have been a possible option, but with kernel preload, disk corruption due to the resize should be ruled out AFAIK | 16:47 |
dansmith | I'm also not sure these are only happening on resize tests | 16:49 |
frickler | well the errors seem to be happening around the time when the switch to the actual root-fs would be happening. and in the traceback I was looking at, it was happening after the resize. do you happen to have a pointer to the preloaded kernel change? | 16:57 |
dansmith | frickler: I think it's still loading modules from the initramfs when it crashes | 16:59 |
dansmith | frickler: https://review.opendev.org/c/openstack/nova/+/902217 | 17:00 |
dansmith | (happened while I was out) | 17:00 |
dansmith | frickler: I think what you're seeing there is that it tried to load all the modules from the initramfs that it needed to have access to storage, which all crashed, and then it kept going trying to do the switch_root and failed to do so | 17:01 |
dansmith | it did get vda from the virtio driver so I would think it would have worked even, but with that level of broken having happened who knows | 17:02 |
frickler | from a quick scan, that patch only changes some specific nova jobs, not tempest-integrated-compute? | 17:03 |
dansmith | frickler: tbf, I hadn't even looked at the patch, I was going based on sean-k-mooney, melwitt saying nova-next was the only one still doing image boot | 17:04 |
dansmith | the original bug we opened for this GPF was before we even had the resize-volume-backed test, FWIW :) | 17:05 |
tkajinam | o/ I'd appreciate it if https://review.opendev.org/c/openstack/nova/+/905314 can get some attention because it's now blocking requirement bump. | 17:20 |
dansmith | frickler: yeah, that instance in question is doing preloaded kernel boot: | 17:22 |
dansmith | Jan 16 11:45:59.955345 np0036421623 nova-compute[74855]: <kernel>/opt/stack/data/nova/instances/76256ac2-5db8-4dab-b3b8-0297a2ad2b71/kernel</kernel> | 17:22 |
dansmith | Jan 16 11:45:59.955345 np0036421623 nova-compute[74855]: <initrd>/opt/stack/data/nova/instances/76256ac2-5db8-4dab-b3b8-0297a2ad2b71/ramdisk</initrd> | 17:22 |
dansmith | so, shouldn't have loaded the ramdisk from disk | 17:23 |
sean-k-mooney | dansmith: sorry was drinking coffee reading back | 17:23 |
sean-k-mooney | frickler: dansmith https://review.opendev.org/c/openstack/nova/+/902809/1 | 17:24 |
sean-k-mooney | we had a follow-up patch to override the images for jobs defined out of repo | 17:24 |
sean-k-mooney | so tempest-integrated-compute was updated here https://review.opendev.org/c/openstack/nova/+/902809/1/.zuul.yaml#944 | 17:25 |
sean-k-mooney | dansmith: i quickly tried enabling the larger vms here by the way | 17:26 |
sean-k-mooney | going to 32 gb feels excessive but based on the low point (and concurrency 8) we were using about 10G on the controller | 17:27 |
sean-k-mooney | so if we had a 16G flavor or even an 11G flavor we would entirely eliminate our usage of swap from the job | 17:27 |
clarkb | fwiw we do have larger flavors they are just limited in quantity as only a few (maybe one?) cloud currently provide them | 17:28 |
sean-k-mooney | ya i was looking at that | 17:28 |
sean-k-mooney | we have 32G instances from one vexxhost cloud and 16G instances from another | 17:29 |
sean-k-mooney | and then 8G instances in general | 17:29 |
clarkb | but also I think any "make nodes bigger" effort should coincide with a "understand where the memory use is coming from and whether or not it represents bugs/leaks" effort | 17:29 |
sean-k-mooney | ya so it looks like if we were able to move to 16G as standard we would significantly increase the performance of the job (tempest full was about an hour) | 17:30 |
clarkb | for example privsep uses tremendous amounts of memory for what it is | 17:30 |
sean-k-mooney | but if we were to move to that we would likely expand to fill it | 17:30 |
clarkb | with the little info I have I would classify that as a bug | 17:30 |
sean-k-mooney | clarkb: frickler dansmith while ye are around could ye look at my zram change when ye have time https://review.opendev.org/c/openstack/devstack/+/890693 | 17:32 |
sean-k-mooney | clarkb: on the privsep topic, i have not looked at it in detail before but do you know roughly how much ram it's using per process | 17:33 |
sean-k-mooney | i assume it partly depends on the service? i.e. nova vs neutron | 17:33 |
dansmith | sean-k-mooney: yeah I'm for that for sure, but not sure the qa people will want it on by default | 17:33 |
frickler | sean-k-mooney: tbh I've stayed away from that patch for now since I'm wary of making the CI setup even more complex | 17:33 |
dansmith | gmann: kopecmartin ^ | 17:33 |
sean-k-mooney | ack i can change the defaults so if people prefer it to be off by default ill refactor for that | 17:34 |
JayF | sean-k-mooney: /me points an ironic change at 890693 for science | 17:34 |
sean-k-mooney | its off by default locally but on by default in ci currently | 17:34 |
dansmith | sean-k-mooney: if you disable it by default I'll +2 now, but if not, I want a read from gmann at least | 17:34 |
sean-k-mooney | ack ill do that i need to deploy a devstack vm for my healthcheck stuff anyway so ill update it then | 17:35 |
clarkb | sean-k-mooney: the memory usage varies by the service talking to the privsep instance (which implies to me that workload probably impacts buffer sizes). In total I think it was about half a gig for privsep | 17:35 |
sean-k-mooney | clarkb: i think it's more related to the size of our oslo.config global + the code that is imported | 17:35 |
clarkb | sean-k-mooney: oh ya I guess everything is a regex? Those shouldn't be terribly large but if you have enough of them... might also be worth checking if libre2 is more efficient maybe | 17:36 |
sean-k-mooney | we have some cached module level variables for example that might be related to this | 17:37 |
clarkb | sean-k-mooney: for zswap I notice you set swappiness to 100. I think devstack sets swappiness elsewhere you might need to reconcile the two to ensure they don't fight each other | 17:37 |
clarkb | sean-k-mooney: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/roles/configure-swap/tasks/main.yaml#L46-L61 this is the code that does it and I think devstack jobs call that first so you would override which I guess is fine | 17:38 |
sean-k-mooney | clarkb: i was thinking of https://github.com/openstack/nova/blob/a72f7eaac78927892b937d451cbacb24a83c05ac/nova/api/validation/parameter_types.py#L106 | 17:38 |
sean-k-mooney | although privsep should not be using the api stuff | 17:38 |
sean-k-mooney | so hopefully that wont matter | 17:39 |
clarkb | I don't have any other comments on the zswap change. Seems like it may be worth experimenting with | 17:39 |
dansmith | yep | 17:39 |
sean-k-mooney | i know people have seen a lot of success with desktop responsiveness on raspberry pi with it | 17:40 |
sean-k-mooney | so i was hoping it would help us in a similar way | 17:40 |
sean-k-mooney | ok ill go remove the default change in the patch | 17:40 |
sean-k-mooney | and we can then enable it on a per job basis in nova | 17:40 |
sean-k-mooney | JayF: did you see https://review.opendev.org/c/openstack/nova/+/905406 by the way | 17:44 |
sean-k-mooney | JayF: that extracts the common instance metadata code up to the driver level and refactors libvirt to use it | 17:44 |
JayF | https://review.opendev.org/c/openstack/nova/+/900831 I really want this to land before I push up more patches | 17:44 |
JayF | I've been burned in the past by stacking a bunch of patches and being in rebase hell | 17:45 |
sean-k-mooney | ah yes well want that to land to make backport simpler | 17:45 |
JayF | So I have that in my back pocket as a "next thing" but waiting for 900831 to land first | 17:45 |
clarkb | JayF: this may or may not be useful but the way I try to deal with those is to "squash back" I make all of my edits on the tip of my dev branch and then commit will be something like "squash fix for commit foo" then I git rebase -i HEAD~X where X is the number of commits I need to go back to and squash the new commit back into the old one. Then git review the whole thing. I | 17:48 |
clarkb | find it simplifies things because I can focus on the necessary fixes and then focus on the accounting and usually I only need a single rebase to update a whole stack with any number of fixes | 17:48 |
JayF | clarkb: it's more that I have enough things on my list that I can action and will move more quickly so I just prioritize those over things which will languish for longer | 17:49 |
bauzas | git reflog FTW | 17:49 |
JayF | I can't remember the last time I was just like "there's nothing to do today" :D | 17:49 |
jrosser | clarkb: a colleague of mine demoed this recently https://github.com/tummychow/git-absorb | 17:55 |
jrosser | which is some kind of magic | 17:56 |
clarkb | that looks similar to what I do with the pairing of changes to squash done automatically based on file heuristics | 17:56 |
clarkb | There is also git restack which corvus wrote so that you don't have to manually determine the value of X when rebasing | 17:57 |
sean-k-mooney | johnthetubaguy: are you able to be the second core to review https://review.opendev.org/c/openstack/nova/+/900831 it's JayF's fix for https://launchpad.net/bugs/2043036 | 19:05 |
sean-k-mooney | otherwise melwitt dansmith ^ perhaps ye can review | 19:05 |
opendevreview | Merged openstack/osc-placement master: Bump hacking https://review.opendev.org/c/openstack/osc-placement/+/905715 | 19:33 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Check if destination can support the src mdev types https://review.opendev.org/c/openstack/nova/+/904177 | 20:19 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Reserve mdevs to return to the source https://review.opendev.org/c/openstack/nova/+/904209 | 20:19 |
opendevreview | Sylvain Bauza proposed openstack/nova master: WIP(docs): Modify the mdevs in the migrate XML https://review.opendev.org/c/openstack/nova/+/904258 | 20:19 |
opendevreview | sean mooney proposed openstack/nova master: enable zswap in nova ci jobs. https://review.opendev.org/c/openstack/nova/+/905791 | 20:38 |
opendevreview | sean mooney proposed openstack/nova master: enable zswap in nova ci jobs. https://review.opendev.org/c/openstack/nova/+/905791 | 20:44 |
opendevreview | Merged openstack/python-novaclient master: Bump hacking https://review.opendev.org/c/openstack/python-novaclient/+/905707 | 20:51 |
opendevreview | Merged openstack/placement master: Bump hacking https://review.opendev.org/c/openstack/placement/+/905706 | 21:06 |
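
The short name versus FQDN mismatch discussed between 10:04 and 10:36 can be illustrated with a small sketch. This is not nova's actual startup check; `short_name` and `classify_node_name` are hypothetical helpers, and the host names are made-up examples. It only shows why a strict string comparison fails when a node was registered as `compute-1` and the hypervisor later reports `compute-1.example.org`.

```python
def short_name(hostname: str) -> str:
    """Return the part of a hostname before the first dot."""
    return hostname.split(".", 1)[0]


def classify_node_name(registered: str, reported: str) -> str:
    """Compare a stored node name with the name reported now.

    Hypothetical helper for illustration only; not nova's real check.
    """
    if registered == reported:
        return "match"
    # Exactly one of the two names is dotted (an FQDN) and the leading
    # labels agree: the same host seen as a short name and as an FQDN.
    one_is_fqdn = ("." in registered) != ("." in reported)
    if one_is_fqdn and short_name(registered) == short_name(reported):
        return "short-vs-fqdn"
    return "different"


# Assumed example names; a strict equality check treats the second
# case as a rename even though both strings refer to the same machine.
print(classify_node_name("compute-1", "compute-1"))
print(classify_node_name("compute-1", "compute-1.example.org"))
print(classify_node_name("compute-1.a.org", "compute-1.b.org"))
```

A classification like this could help an operator tell an accidental short name to FQDN flip apart from a genuine rename, but whether to tolerate either case is a policy decision that the log leaves to the existing check.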
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!