opendevreview | Merged openstack/nova stable/victoria: [stable-only] Drop lower-constraints job https://review.opendev.org/c/openstack/nova/+/838032 | 04:05 |
---|---|---|
*** amoralej|off is now known as amoralej | 06:47 | |
kashyap | gibi: Hi, will look at the review comments; if isomething is unclear, I'll ask | 07:44 |
kashyap | And thanks for the review! | 07:50 |
kashyap | Also, bonus points for reading the full commit message on baselineHypervisorCPU() :) | 07:51 |
gibi | :) | 08:02 |
gibi | Uggla: added some extra information about the notification work in https://review.opendev.org/c/openstack/nova-specs/+/833669 let me know if you have questions | 08:26 |
Uggla | gibi, thx I will have a look. | 08:28 |
opendevreview | ribaudr proposed openstack/nova master: [WIP] Attach Manila shares via virtiofs (db+object) https://review.opendev.org/c/openstack/nova/+/831193 | 08:42 |
opendevreview | ribaudr proposed openstack/nova master: [WIP] Attach Manila shares via virtiofs (manila abstraction) https://review.opendev.org/c/openstack/nova/+/831194 | 08:42 |
opendevreview | ribaudr proposed openstack/nova master: [WIP] Attach Manila shares via virtiofs (drivers) https://review.opendev.org/c/openstack/nova/+/833090 | 08:42 |
opendevreview | ribaudr proposed openstack/nova master: [WIP] Attach Manila shares via virtiofs (api) https://review.opendev.org/c/openstack/nova/+/836830 | 08:42 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Record SRIOV PF MAC in the binding profile https://review.opendev.org/c/openstack/nova/+/829248 | 08:52 |
opendevreview | Merged openstack/nova master: Follow up for nova-manage image property commands https://review.opendev.org/c/openstack/nova/+/830895 | 09:43 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: Reproduce bug 1945310 https://review.opendev.org/c/openstack/nova/+/811414 | 09:45 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: Query ports with admin client to get resource_request https://review.opendev.org/c/openstack/nova/+/811416 | 09:45 |
sean-k-mooney | bauzas: if i have missed anythin or missrepresneted can you comment on http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028268.html | 10:22 |
sean-k-mooney | im 99% sure that everything i said there is correct but i have not personally configured vGPUs so if you can confirm that woudl be good | 10:23 |
bauzas | sean-k-mooney: I already replied ;) | 10:58 |
bauzas | sean-k-mooney: but thanks for your reply | 10:58 |
kashyap | gibi[m]: Got a minute here? I'm going a bit mad with the unit test -- something is wrong here because the 'obj' is printing empty <cpu/> when I do a print (obj) - https://paste.opendev.org/show/b9TodzDYO7ukIBO8mGEs/ | 11:32 |
opendevreview | Merged openstack/nova master: Sync rootwrap.conf from oslo.rootwrap https://review.opendev.org/c/openstack/nova/+/823229 | 11:34 |
gibi | kashyap: will look in a minute | 11:35 |
kashyap | No rush | 11:37 |
*** bhagyashris_ is now known as bhagyashris | 11:55 | |
opendevreview | Wenping Song proposed openstack/nova-specs master: Usage of new trait of OWNER_NOVA https://review.opendev.org/c/openstack/nova-specs/+/819510 | 12:09 |
gibi | kashyap: now looking :) | 12:23 |
kashyap | Cool; I'm sure it's something stupid that I'm not parsing myself :D | 12:24 |
sean-k-mooney | gibi: just pushed my comment for https://review.opendev.org/c/openstack/nova-specs/+/819510 but i see the also just pushed an update... | 12:27 |
gibi | kashyap: at least I can reproduce your problem locally :) | 12:29 |
gibi | sean-k-mooney: I will look back to the spec | 12:30 |
kashyap | gibi: Yeah. I wonder where I'm messing up the parsing here | 12:30 |
sean-k-mooney | songwenping_: can you look at my comment on patchset 5 | 12:31 |
sean-k-mooney | songwenping_: if you can respin the spec with those changes i think it would me mergable, please also include my version of the upgrade impact section | 12:32 |
sean-k-mooney | its imporant to call out the depenecy on upgrading placement before nova so that the new standard traits are aviable | 12:33 |
gibi | kashyap: your test code calls https://github.com/openstack/nova/blob/dc30a7e6d19509ce34cb89ffcdb4ff2c36469d89/nova/virt/libvirt/config.py#L75 that is pretty empty | 12:33 |
gibi | wait, no | 12:34 |
gibi | hang on | 12:34 |
* kashyap hangs on | 12:34 | |
gibi | :D | 12:35 |
gibi | it correctly calls nova.virt.libvirt.config.LibvirtConfigCPU.parse_dom | 12:35 |
kashyap | Yea, did you put a breakpoint() to see it | 12:37 |
*** amoralej is now known as amoralej|lunch | 12:38 | |
kashyap | All I want to assert is that the test is parsing the <cpu> element correctly. So assert for the presence of 'model' / 'vendor' etc? | 12:39 |
gibi | does nova use LibvirtConfigCPU to parse xml? I only see existing tests that verify to_xml but no parse_dom | 12:42 |
gibi | it seems that parsing code in LibvirtConfigCPU.parse_dom is either incorrect or the example XML you use is incorrect | 12:45 |
kashyap | gibi: Yeah, I didn't find any of parse_dom either | 12:45 |
gibi | the parser code assumes that everything is directly under <cpu> but in your example XML the tags are under cpu/mode | 12:46 |
kashyap | gibi: The example XML is ... trimmed down output from running `virsh domcapabilities` on my computer | 12:46 |
* kashyap goes to recheck | 12:46 | |
gibi | so the exta nesting by "mode" is makes the parse_dom to find nothing to match | 12:46 |
kashyap | gibi: Run `virsh domcapabilities` and you'll see tags under several modes | 12:47 |
* kashyap checks | 12:47 | |
gibi | then you need to handle the extra mode nesting in the parse_dom code | 12:47 |
gibi | but be aware that the to_xml code also emits the xml without the extra mode nesting right now | 12:48 |
kashyap | gibi: Hmm, I see. So, here's a test you do: the "everything under <cpu>" from `virsh capabilities` | 12:48 |
kashyap | So carefully run these two commands and compare the <cpu> bits: | 12:49 |
kashyap | - virsh capabilities # notice the <cpu> — everything is under it; and it has no "policy" for features | 12:49 |
kashyap | - virsh domcapabilities # notice the <cpu> and compare w/ the above | 12:49 |
gibi | yeah I see | 12:50 |
gibi | so capabilities emits different structure than domcapabilities | 12:50 |
kashyap | Yes, indeed. | 12:50 |
gibi | so probably we need a different config object to parse the two different model | 12:50 |
gibi | as I assume nova still uses getCapabilities even after we start using getDomCapabitilies | 12:51 |
gibi | or can we simply ignore the mode part of the domcapabilities? | 12:51 |
gibi | as we can assume that mode is host-model | 12:51 |
* kashyap taps on the table and thinks | 12:52 | |
gibi | I have to step away from the keyboard for a while now but I will be back later this afternoon | 12:52 |
kashyap | Sure, no prob | 12:52 |
kashyap | I think we can ignore the mode part here, as we indeed know that it's of 'host-model' | 12:55 |
kashyap | And that works | 12:55 |
kashyap | gibi: When you're back: indeed, putting the stuff under <cpu> does resolve the test prob. (Also, near as I see, I don't think we need a diff config object) | 12:57 |
sean-k-mooney | form a consitncy point of view we should have different config objects | 13:04 |
sean-k-mooney | we have one config object per element type in the xml | 13:04 |
sean-k-mooney | so we shoudl not use the same python object to parse both the dom caps and normal caps | 13:05 |
sean-k-mooney | kashyap: ^ | 13:05 |
sean-k-mooney | so i agree with gibi here | 13:05 |
kashyap | sean-k-mooney: There already exists a config object for domCaps | 13:06 |
kashyap | Also, Gibi was wondering out loud, and was not asking. I'll dig a bit more and see what's required here. | 13:06 |
sean-k-mooney | ack | 13:06 |
opendevreview | Merged openstack/nova stable/xena: Reattach mdevs to guest on resume https://review.opendev.org/c/openstack/nova/+/821126 | 13:13 |
*** amoralej|lunch is now known as amoralej | 13:24 | |
gibi | sean-k-mooney, kashyap: if we can always treat caps and domcaps as equivalent with the assumption of mode host-mode in case of domcaps, then I'm OK to keep a single config object for both | 13:51 |
gibi | more preciesly *if we can treat the cpu part of caps and domcaps | 13:54 |
kashyap | gibi: Good point. A related thing, that you've noticed yourself, is domCaps also reports "policy" | 13:55 |
kashyap | gibi: And I _don't_ think we can treat the CPU part of caps and domCaps as same ... because, here, see the diff in features from both, from my laptop - they're not equal: | 13:57 |
kashyap | gibi: https://paste.opendev.org/show/bvMLvWCzjusOKOyzgsOi/ | 14:00 |
kashyap | They're not equal, you see | 14:00 |
kashyap | (Ignore line-1, it's accidental) | 14:00 |
gibi | ahh, I see, if we cannot ignore the diff as we depend on the topology for example | 14:01 |
kashyap | Lemme check with Jiri Denemark from libvirt | 14:01 |
gibi | so I have to side with sean-k-mooney that we need to split the config model | 14:02 |
gibi | and we can add helpers to move data between caps and domcaps object when needed | 14:02 |
gibi | kashyap: sure, more viewpoint helps | 14:03 |
sean-k-mooney | the feature listed in virsh capabliteis are the cpu feature on the host | 14:06 |
sean-k-mooney | and the model listed is the closes standard model to the host feature set | 14:06 |
kashyap | Yeah, as I wrote in my small primer on differences: | 14:06 |
sean-k-mooney | wehn we use dom caps that actully provides what will be used by the vm when that model is selected | 14:06 |
kashyap | sean-k-mooney: No, not "what will be used by the VM" - rather: | 14:07 |
sean-k-mooney | its providing the content of what we will be generted in the libvirt domain xml | 14:07 |
sean-k-mooney | if we requested host model | 14:07 |
sean-k-mooney | which is what i ment by "what will be used" in my previous sentence | 14:08 |
kashyap | sean-k-mooney: It answers two questions: (a) whether a feature is supported by the host; and (b) whether it is supported by a given QEMU binary running on the host | 14:08 |
* kashyap --> a call; bbiab | 14:08 | |
sean-k-mooney | artom: gibi by the way i can confirm that https://review.opendev.org/c/openstack/os-brick/+/838871 fixes the unit test failures | 14:09 |
sean-k-mooney | so this backport https://review.opendev.org/q/topic:bug%252F1947370 | 14:09 |
sean-k-mooney | is what causes the lock path issues | 14:09 |
sean-k-mooney | the reviert is not nessisarly the correct approch | 14:10 |
sean-k-mooney | but the backport made a previously optional config option required | 14:10 |
sean-k-mooney | we neither set the config option in our test or the enviornmental variable | 14:10 |
sean-k-mooney | so strictly speaking i do no think os-brick shoudl have backported that change as written | 14:10 |
sean-k-mooney | it did not comply with satable policy since it requirs config to be updated. | 14:11 |
sean-k-mooney | elodilles:^ correct me if that is a wrong intepretation | 14:11 |
sean-k-mooney | if they provided a sane default then i think it woudl have been fine | 14:12 |
sean-k-mooney | like /run/os-brick/lock or /tmp/os-brick/lock | 14:12 |
sean-k-mooney | but in its current form its a breaking change for any deployemnt that did not set that | 14:13 |
sean-k-mooney | artom: gibi https://bugs.launchpad.net/os-brick/+bug/1969794 | 14:40 |
elodilles | sean-k-mooney: yepp, that's against stable policy and was discussed here with cinder team: https://review.opendev.org/c/openstack/releases/+/829590 | 14:53 |
gibi | sean-k-mooney: good catch | 14:54 |
sean-k-mooney | so the option is form oslo_concurrency | 14:55 |
sean-k-mooney | not form os-brick or nova | 14:55 |
sean-k-mooney | so we can adress this in a few ways | 14:55 |
sean-k-mooney | in nova we could jsut set the env varable to somethign sane in the tox.ini | 14:56 |
sean-k-mooney | that woudl be a minimal fix we could also mock some of the calls in a fixture | 14:56 |
sean-k-mooney | apprently both nova and cinder docuemtn that this shoudl be set in the config in our install guides | 14:56 |
sean-k-mooney | so in practice all cloud should have the config option set | 14:56 |
gibi | so I assume nova also uses that config option for its of file lock | 14:57 |
sean-k-mooney | elodilles: gibi any prefernce on how to proceed. | 14:57 |
sean-k-mooney | yes | 14:57 |
sean-k-mooney | we do | 14:57 |
gibi | so somewhere in the nova test env we already provide a default for that option | 14:57 |
sean-k-mooney | and os-brick is runnign in novas process space so its reading our config value | 14:57 |
sean-k-mooney | gibi: that or we mock the locking code | 14:58 |
sean-k-mooney | so it does not write to disk | 14:58 |
gibi | yeah | 14:58 |
gibi | we already do something so that the nova usage of that opt is covered | 14:58 |
sean-k-mooney | in this case since its failing its actully trying to create a file systm lock in the unit test | 14:58 |
sean-k-mooney | which it shoudl not be | 14:58 |
sean-k-mooney | i woudl guess we are just missing a fixture in those tets | 14:58 |
gibi | yeah that is what I think too | 14:59 |
gibi | we need to find what is missing and extend the fixture | 14:59 |
kashyap | gibi: To tie up the lose end on caps vs domCaps -- the guidance from the libvirt folks is (a) no, we can't treat the caps == domCaps w/ 'host-model' mode; and (b) we should use domCaps wherever possible. | 15:13 |
kashyap | gibi: So that means, we should introduce a new config object | 15:13 |
gibi | kashyap: ack, make sense | 15:13 |
kashyap | gibi: I might need some help on this XML parsing ... I'll take a stab | 15:14 |
kashyap | ... at it. | 15:14 |
gibi | kashyap: just ping me if you need another set of eyes | 15:14 |
kashyap | Will do; thx | 15:14 |
mnaser | i would appreciate some reviews on https://review.opendev.org/c/openstack/nova/+/830646 (wrt allowing addition of viommu to vms) | 15:59 |
bauzas | mnaser: you're not the first one to ask for reviews on that stephenfin's old patch, will mark it as review priority for the team | 16:13 |
mnaser | bauzas: cool thanks, the other one might have been ricolin but thats coming from the same side, so not sure if thats super fair ;) haha | 16:14 |
bauzas | that being said, the bp isn't validated yet | 16:14 |
bauzas | and not sure stephenfin will unghost himself :) | 16:15 |
stephenfin | bauzas: I think mnaser and ricolin are taking care of it now? | 16:15 |
bauzas | good question | 16:16 |
mnaser | yeah it's ready (imho) from a code perspective, but if there's something else we have to do, we can take care of | 16:16 |
* mnaser has mostly been working on bug fixes and this is the most feature-esque thing in nova ever ;p | 16:16 | |
bauzas | mnaser: this is just a paperwork question | 16:17 |
bauzas | since a new extraspec is added + the api validation | 16:17 |
bauzas | this has to be tracked correctly | 16:17 |
bauzas | and we need to balance in a meeting whether we need a spec or not | 16:17 |
bauzas | mnaser: that being said, I'm more than glad to welcome you as a new feature contributor ! :p | 16:18 |
bauzas | we don't have badges but I can create one for Berlin :p | 16:19 |
stephenfin | I can't say if it needs a spec, but it definitely needs a blueprint and some discussion in the meeting | 16:19 |
stephenfin | given there's now an extra spec (I think? | 16:19 |
stephenfin | ) | 16:19 |
bauzas | yeah and yeah | 16:20 |
bauzas | (needs a blueprint and probably don't need a spec) | 16:20 |
bauzas | stephenfin: have you just added some fancy gerrit topic for some non-existing related blueprint ? | 16:21 |
bauzas | if so, you're bragging. | 16:21 |
bauzas | :p | 16:22 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Add workaround to remove compareCPU() check on the destination https://review.opendev.org/c/openstack/nova/+/838926 | 16:37 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Add workaround to skip compareCPU() check on the destination https://review.opendev.org/c/openstack/nova/+/838926 | 16:40 |
sean-k-mooney | mnaser: viommu is not useful without a lot of extra work | 17:08 |
sean-k-mooney | stephenfin: that definetly needs a spec | 17:08 |
sean-k-mooney | viommu is trivial to enable i have implemeted that locally before | 17:09 |
sean-k-mooney | mnaser: the issue is all devices will be in the same iommu group | 17:09 |
sean-k-mooney | so either you disable the memory isolation in the guest vfio-pci module | 17:10 |
sean-k-mooney | or nova has to take full contol of the pcie toplogy | 17:10 |
sean-k-mooney | mnaser: stephenfin did some poc work related to this | 17:10 |
sean-k-mooney | but i dont think its complete | 17:11 |
sean-k-mooney | and it had upgrade impacts potentially | 17:11 |
stephenfin | far from complete | 17:11 |
sean-k-mooney | so it needs a spec to figure that out and more expirece playing with the poc | 17:11 |
sean-k-mooney | mnaser: by the way apprently the prefomcne of the viommu is really bad with passthough devices | 17:12 |
sean-k-mooney | im not sure why but that is what the virt team told me in the past | 17:12 |
sean-k-mooney | hopefully that has changed mnaser have you done any tests locally with livbirt? | 17:12 |
mnaser | sean-k-mooney: so, my knowledge level to this is that we have a company which ships pci cards which are hardware accelerators | 17:13 |
mnaser | sean-k-mooney: as we tried to get them to use native openstack, we figured out the extra qemu parameters they had and it included the iommu stuff | 17:13 |
mnaser | Supposedly, the cards won’t work without them in PCI pass through in the guests. | 17:14 |
sean-k-mooney | ack | 17:14 |
sean-k-mooney | what driver to you use in the guest | 17:14 |
sean-k-mooney | is it vfio-pci or a normal kernel dirver | 17:14 |
sean-k-mooney | if they dont need to be in there own iommu group the stephenfin patch would enable your usecase | 17:14 |
mnaser | I think you do bring up a good point that we’re adding it by default to q35 machine types, so I think making another flag to make it opt in might be good | 17:15 |
sean-k-mooney | so we could perhaps start with the simpel turn this one feature | 17:15 |
sean-k-mooney | we could possble add it by default | 17:15 |
sean-k-mooney | but what we cant do by defualt is change the pci layout to put each device in its own iommu group | 17:16 |
sean-k-mooney | at least not without a lot of testing | 17:16 |
mnaser | Yeah I don’t think this is what the device needs in this case | 17:16 |
sean-k-mooney | havign a hw:viommu=on|off extra spec if simple | 17:16 |
sean-k-mooney | ok stephenfin i would be ok with a specless bluepint for just adding the extraspec/image propety for this | 17:17 |
sean-k-mooney | but if we were to do the iommu group split out that would need the spec | 17:17 |
sean-k-mooney | mnaser: tldr if we want each device to be in its own iommu group we need to create a pcie expantion bridge per device. and to do that we need to assign virtual pci devivie adress to every device in the xml | 17:18 |
sean-k-mooney | mnaser: so that is the bit that has upgade concerns as basically every device in the vm would change its address potentially | 17:19 |
sean-k-mooney | mnaser: have you tried stephenfin's patch with that card? | 17:21 |
sean-k-mooney | mnaser: im not sure im comfortable with exposeing the adress with by the way | 17:22 |
sean-k-mooney | that feel a bit too low level | 17:22 |
*** amoralej is now known as amoralej|off | 17:28 | |
mnaser | sean-k-mooney: i haven't yet, but it's kindof a reimplementation of how they've done things with flat libvirt | 17:30 |
mnaser | but i am thinknig hw:viommu=on|off is a good idea | 17:31 |
sean-k-mooney | im about to -1 the patch | 17:31 |
mnaser | in terms of controlling the address, i'm thinking since it's a flavor extra spec and not a image extra spec, it's an operator-level decision | 17:31 |
mnaser | okay great | 17:31 |
sean-k-mooney | no | 17:31 |
mnaser | we'll discuss further there and ill try to poc something as well | 17:31 |
sean-k-mooney | so image extra specs are ment to be used for contoleing emulated hardware | 17:31 |
sean-k-mooney | not extra specs | 17:31 |
sean-k-mooney | so in generaly this shoudl only be exposed as an image property | 17:32 |
sean-k-mooney | but if we are to expose it in the image i would be open to having it in both | 17:32 |
sean-k-mooney | so i think we should have somehtin like hw_viommu_modle=none|intel|smmuv3|virtio in the image to contol turing this on | 17:33 |
sean-k-mooney | and i woudl be open to also haveing hw:viommu_modle=none|intel|smmuv3|virtio for partiy in the flavor | 17:33 |
sean-k-mooney | the adress space unless its somethign we are going to schdule on i am not conviced we shoudl expose as a configurable | 17:33 |
mnaser | sean-k-mooney: you know this stuff better than we do, as long as at the end of the day, we can get the xml to show up on those virt guests, we're good to go | 17:35 |
mnaser | but i agree on the image extra specs controlling hw_viommu_module too | 17:35 |
sean-k-mooney | going forwad i think virtio is what we will want but that is very new | 17:36 |
sean-k-mooney | intel only works on x86 and smmuv3 only works on arm | 17:36 |
sean-k-mooney | virtio works on both | 17:36 |
sean-k-mooney | mnaser: stephenfin comment in line | 18:04 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/830646 | 18:04 |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Run against performance.json patch https://review.opendev.org/c/openstack/nova/+/838934 | 18:20 |
dansmith | gmann: ^ | 18:20 |
sean-k-mooney | dansmith: what is that mesuring by the way | 18:20 |
gmann | dansmith: ack | 18:21 |
dansmith | sean-k-mooney: https://85682c22f75746955b38-998a6cfec97762292b03290ad6103366.ssl.cf5.rackcdn.com/837139/20/check/nova-ceph-multistore/d3c4ea2/controller/logs/performance.json | 18:21 |
dansmith | sean-k-mooney: a random comparison: https://termbin.com/rls0 | 18:21 |
sean-k-mooney | so ram usage, db queiss and api request for differnt services? | 18:22 |
dansmith | for the moment | 18:22 |
sean-k-mooney | ok what is the overall goal? | 18:23 |
dansmith | ideally to get some flag when a patch increases one of these values substantially | 18:23 |
sean-k-mooney | by the way we had weird behavior in a downstream case where there was very high memory usage if you disabeld swap entirely | 18:23 |
dansmith | like, c-bak uses 1GiB of ram for one tempest run, which is pretty dang high.. not sure when that happened (maybe always) but.. | 18:24 |
sean-k-mooney | does it have swap enabeld on the vm? | 18:24 |
clarkb | yes we enable swap on all of our test nodes | 18:24 |
sean-k-mooney | clarkb: i think some jobs turn it back off | 18:24 |
clarkb | thoug hamybe that happens at a job level but it is definitely there for devstack + tempest beacuse you OOM otherwise | 18:25 |
clarkb | and ya swapoff is always an option | 18:25 |
dansmith | yeah we're flying really close to the sun right now | 18:25 |
sean-k-mooney | dansmith: we saw nova-compute and neutron l2 agents both taking over 2G each downstream when swap was off | 18:25 |
sean-k-mooney | and it went back to normal when it was turn on | 18:25 |
dansmith | not sure why that would be, but interesting | 18:25 |
sean-k-mooney | ... with no swap acutlly being used | 18:25 |
clarkb | one of the things that happened that led to this was the rbac work drastically increased the db queries/cost and this sort of thing could catch that? | 18:25 |
dansmith | clarkb: for ironic | 18:26 |
sean-k-mooney | dansmith: it seamed like when swap was disabele the virtual memory and resident memory was the same | 18:26 |
sean-k-mooney | but when it was enabel there was a large deleta between the two | 18:26 |
dansmith | sean-k-mooney: well, makes sense right? | 18:26 |
dansmith | maybe that's just an artifact then.. | 18:26 |
dansmith | if you're only looking at resident, then you see the working set, but if you have no other option, then rss=total | 18:27 |
sean-k-mooney | with swap on the virt memroy was the same but the prviate resent memroy was less and no swap was being used | 18:27 |
dansmith | right, but it's probably because it has overcomitted to the process and until it's CoWd it can lie with swap but not if there's no swap? | 18:28 |
sean-k-mooney | i dont realy know but it did nto appre to be paging it out to swap space | 18:28 |
dansmith | right, but until it needs to it won't, | 18:28 |
dansmith | but it can lie in that case | 18:28 |
sean-k-mooney | but perhaps there was some cow sematics at play | 18:28 |
dansmith | what I mean by cow is, you can expand your heap but until you touch a page, the kernel doesn't have to actually allocate you one | 18:29 |
sean-k-mooney | yes | 18:29 |
dansmith | I dunno what python's behavior is really, other than "use a crapload of memory all the time" | 18:30 |
sean-k-mooney | you can grow the adress space but not page it to phsyical ram | 18:30 |
sean-k-mooney | but it looked liek without swap python was always commiting | 18:30 |
dansmith | right, so without swap, you don't want to overcommit because the process thinks it already has that memory | 18:30 |
dansmith | but with swap, you can lie to it and offload something else during a fault | 18:31 |
sean-k-mooney | yep so just something to be aware of | 18:31 |
sean-k-mooney | if you see big difference between jobs | 18:31 |
dansmith | I think turning off swap on any of our workers would likely be a big fail | 18:31 |
sean-k-mooney | we shoudl just ensure the both have the same swap config | 18:31 |
dansmith | they do, afaik | 18:31 |
dansmith | but yeah, good to know | 18:31 |
sean-k-mooney | am i think they all have 2G by default unless the providers flavor add extra swap space | 18:32 |
sean-k-mooney | some jobs increase it to 8G | 18:32 |
clarkb | none of our providers give us swap | 18:32 |
sean-k-mooney | ack | 18:32 |
clarkb | we create it in the jobs | 18:32 |
sean-k-mooney | yep | 18:32 |
clarkb | there is a difference though. Some use swapfiles and some use swap partitions | 18:32 |
clarkb | we prefer to partition but can only do that when we have a second device | 18:32 |
sean-k-mooney | yep i am reusing the upstream swap role in my ansible stuff | 18:32 |
sean-k-mooney | so i read over it 2 weeks ago | 18:33 |
sean-k-mooney | clarkb: actully since your here is there a reason thats in openstakc-zuul-jobs i was considering porting it to zuul-jobs but just said i woudl ask | 18:33 |
sean-k-mooney | https://github.com/openstack/openstack-zuul-jobs/tree/master/roles/configure-swap | 18:34 |
clarkb | I think because its very specific to our cloud providers and the devices they expose? | 18:34 |
sean-k-mooney | https://github.com/openstack/openstack-zuul-jobs/blob/master/roles/configure-swap/tasks/main.yaml | 18:34 |
clarkb | you can probably make it more generic, possibly by dropping the partitioned swap option | 18:34 |
sean-k-mooney | it has some stuff for ephemeral0 | 18:34 |
sean-k-mooney | but it work to just use a swap file on root | 18:35 |
clarkb | also when it was first written we didn't use dd to preallicate which was fine until linux decided to break that | 18:35 |
clarkb | but now that we use dd it should work on all the filesystems that can host a swapfile | 18:35 |
sean-k-mooney | this https://github.com/openstack/openstack-zuul-jobs/blob/master/roles/configure-swap/tasks/main.yaml#L3-L29= i think is really the only non generic part | 18:35 |
sean-k-mooney | but ya no worries | 18:36 |
clarkb | specifically swapfile on ext4 created with fallocate worked until very recently on linux. But recent changes broke that and made ext4 much more like xfs and others | 18:36 |
sean-k-mooney | hum ok | 18:36 |
sean-k-mooney | we use fallocate in nova to preallocate | 18:37 |
sean-k-mooney | what exactly changed | 18:37 |
clarkb | I'd have to go find the commit that removed fallocate to remember | 18:37 |
sean-k-mooney | i think we check for odirect | 18:37 |
clarkb | note docuemntation specifically says fallocate is fine but it isn't anymore | 18:37 |
clarkb | but it is specifically for how swapfiles are managed by the kernel | 18:37 |
sean-k-mooney | ok | 18:38 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L283-L301= | 18:38 |
sean-k-mooney | we do this but perhaps that is still ok | 18:38 |
clarkb | https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/750941 is the change there is a red hat bz link that I can't open | 18:39 |
sean-k-mooney | thats bad form we are ment to make them public if we are using them in commit or upstream bugs i wonder if there is a reason that lee did not in this case | 18:40 |
sean-k-mooney | oh because it was rhel9 proably in this cases | 18:41 |
sean-k-mooney | ya that was commit before cenots 9 or rhel 9 were a thing publicly | 18:41 |
sean-k-mooney | clarkb: not much in it but you shoudl be able to see it now https://bugzilla.redhat.com/show_bug.cgi?id=1827115 | 18:53 |
sean-k-mooney | tldr fallocate -z shoud be the same as dd and result in no holes | 18:53 |
clarkb | I wonder if it is any faster at that point | 18:54 |
clarkb | might be worth testing that as creating the file can be slow at time | 18:55 |
sean-k-mooney | ya not really sure to be honest | 18:55 |
clarkb | sean-k-mooney: https://lkml.org/lkml/2020/9/24/810 | 18:58 |
clarkb | maybe it is all happy now again | 18:59 |
sean-k-mooney | i cant view the last message but the previous ones in that thread look promising at least | 19:00 |
sean-k-mooney | clarkb: if there was a performace hit we could just revert lees change. shall i open a revert and see if it works? | 19:07 |
sean-k-mooney | i guess we would need a DNM test patch agaisnt devstack to test all the distros that depending on the revert | 19:08 |
sean-k-mooney | actully we proably should talk about this in #openstack-qa isntead | 19:09 |
opendevreview | sean mooney proposed openstack/nova master: enable locking test fixture https://review.opendev.org/c/openstack/nova/+/838942 | 19:16 |
sean-k-mooney | artom: gibi ^ that shoudl fix it | 19:16 |
lyarwood | sean-k-mooney: FWIW the bug was public when I linked it, it was made private by someone else a few months later. | 19:19 |
lyarwood | sean-k-mooney: also looks like that switch for fallocate landed after my original change | 19:19 |
opendevreview | Ghanshyam proposed openstack/nova master: Update python classifier as per testing runtime https://review.opendev.org/c/openstack/nova/+/838943 | 19:20 |
sean-k-mooney | lyarwood: ack not blaming you by the way i assumed you had a good reason to keep it private ot that it was public when you filed it but didnt check who made it private | 19:20 |
lyarwood | Yeah no issues | 19:21 |
sean-k-mooney | i was going to submit a revirt but i see were are not actully passing -z | 19:22 |
artom | sean-k-mooney, that make sense, but why did we not hit more often upstream> | 19:22 |
artom | ? | 19:22 |
sean-k-mooney | i might try updating it to use -z and regert the size ot 8192 | 19:22 |
artom | In downstream CI it's 100% | 19:22 |
sean-k-mooney | artom: the upper constrait bump only happend 2 months ago | 19:23 |
sean-k-mooney | so i guess we only started seeing it recently | 19:23 |
sean-k-mooney | it could be realted to mirrors having old versions | 19:23 |
sean-k-mooney | but not really sure | 19:23 |
opendevreview | Ghanshyam proposed openstack/python-novaclient master: Update python classifier as per testing runtime https://review.opendev.org/c/openstack/python-novaclient/+/838944 | 19:24 |
artom | sean-k-mooney, ah, so actually, lemme test that with my downstream DNM patch | 19:27 |
sean-k-mooney | that shoudl be a clean cherry pick to wallby since i actuly did it on wallayb first and then stashed it checkout masters and applied it | 19:31 |
sean-k-mooney | so i can chery pick it upstream and then do it downstream if you like | 19:31 |
opendevreview | sean mooney proposed openstack/nova stable/yoga: enable locking test fixture https://review.opendev.org/c/openstack/nova/+/838877 | 19:52 |
sean-k-mooney | ... damit | 19:52 |
sean-k-mooney | gerrit nolonger addes the cherry-picked form lines by default | 19:52 |
sean-k-mooney | oh i bet its because its not merged. | 19:53 |
opendevreview | sean mooney proposed openstack/nova stable/yoga: enable locking test fixture https://review.opendev.org/c/openstack/nova/+/838877 | 19:54 |
opendevreview | sean mooney proposed openstack/nova stable/xena: enable locking test fixture https://review.opendev.org/c/openstack/nova/+/838945 | 19:55 |
opendevreview | sean mooney proposed openstack/nova stable/wallaby: enable locking test fixture https://review.opendev.org/c/openstack/nova/+/838946 | 19:55 |
melwitt | sean-k-mooney: correct, it will add the lines only if merged | 20:02 |
clarkb | sean-k-mooney: ya probably worth a revert to check it | 20:11 |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Run against performance.json patch https://review.opendev.org/c/openstack/nova/+/838934 | 21:59 |
opendevreview | Ghanshyam proposed openstack/nova master: Update python testing as per zed cycle teting runtime https://review.opendev.org/c/openstack/nova/+/838943 | 22:04 |
opendevreview | Ghanshyam proposed openstack/nova master: Update python testing as per zed cycle teting runtime https://review.opendev.org/c/openstack/nova/+/838943 | 22:06 |
*** dasm is now known as dasm|off | 23:57 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!