opendevreview | Takashi Kajinami proposed openstack/os-vif master: Fix missing or unnecessary dependencies https://review.opendev.org/c/openstack/os-vif/+/909341 | 01:40 |
opendevreview | Takashi Kajinami proposed openstack/os-vif master: Fix missing or unnecessary dependencies https://review.opendev.org/c/openstack/os-vif/+/909341 | 01:41 |
*** mklejn_ is now known as mklejn | 08:36 | |
*** tkajinam is now known as Guest213 | 08:56 | |
opendevreview | Merged openstack/nova master: HyperV: Remove RDP console connection information API https://review.opendev.org/c/openstack/nova/+/906991 | 09:28 |
opendevreview | Merged openstack/nova-specs master: Create specs directory for 2024.2 Dalmatian https://review.opendev.org/c/openstack/nova-specs/+/906073 | 11:03 |
opendevreview | Stephen Finucane proposed openstack/nova-specs master: Add openapi spec https://review.opendev.org/c/openstack/nova-specs/+/909448 | 13:18 |
opendevreview | Takashi Kajinami proposed openstack/nova-specs master: libvirt: Stateless firmware support https://review.opendev.org/c/openstack/nova-specs/+/908297 | 13:23 |
opendevreview | Takashi Kajinami proposed openstack/nova-specs master: libvirt: AMD SEV-ES support https://review.opendev.org/c/openstack/nova-specs/+/907702 | 13:24 |
bauzas | dansmith: sean-k-mooney: when you have a minute, I'd like to discuss about the mdev live-mig series https://review.opendev.org/c/openstack/nova/+/904209/16 | 14:37 |
bauzas | (about the periodic call) | 14:38 |
sean-k-mooney | sure | 14:41 |
sean-k-mooney | we will be hitting meetings shortly but im free for a while | 14:41 |
bauzas | awaiting dan when he can | 14:42 |
bauzas | my concern is about the API we could use for calling the driver | 14:42 |
dansmith | bauzas: I'm just getting started in my morning so I'm doing lots of stuff, sorry | 14:43 |
bauzas | no worries then | 14:43 |
bauzas | I'll try to provide a WIP then | 14:43 |
sean-k-mooney | bauzas: just do this in update_available_resource | 14:44 |
sean-k-mooney | that's already being called and it's very closely related | 14:45 |
bauzas | sean-k-mooney: sure, but given we don't call the driver from it, I need to create a new driver method :( | 14:45 |
sean-k-mooney | we do | 14:45 |
sean-k-mooney | update_available_resource calls the driver | 14:45 |
sean-k-mooney | that's why I'm suggesting it | 14:45 |
bauzas | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L9582 | 14:46 |
sean-k-mooney | I believe that is called from https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L10587 | 14:47 |
sean-k-mooney | yep here https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L911 | 14:48 |
sean-k-mooney | the resource tracker calls the driver | 14:48 |
sean-k-mooney | so you can just add it to get_available_resource | 14:48 |
sean-k-mooney | in the driver | 14:48 |
sean-k-mooney | which is what you linked to | 14:49 |
sean-k-mooney | the alternative, which I assume you wanted to discuss, | 14:50 |
sean-k-mooney | was adding a new virt driver api function | 14:50 |
sean-k-mooney | and then calling that directly | 14:50 |
sean-k-mooney | alternatively you could decorate a method in the libvirt driver as a periodic | 14:50 |
sean-k-mooney | and have it run separately from those run by the compute manager | 14:51 |
sean-k-mooney | both are workable but the minimal change is to just make get_available_resource call a new function to reconcile the dict. | 14:52 |
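A minimal sketch of that option, assuming a hypothetical `_reconcile_mdev_reservations()` helper and an illustrative `claimed_mdevs` dict (not the actual attribute names in the series): since the resource tracker's `update_available_resource` periodic already calls the driver's `get_available_resource()`, the reconciliation can piggyback on that call.

```python
# Sketch only, not actual Nova code: names here are assumptions.
class SketchLibvirtDriver:
    def __init__(self):
        # Hypothetical in-memory map of instance uuid -> reserved mdevs.
        self.claimed_mdevs = {}

    def get_available_resource(self, nodename):
        resources = {'hypervisor_hostname': nodename}
        # ... existing inventory collection would happen here ...
        # Piggyback on the periodic call the resource tracker already makes.
        self._reconcile_mdev_reservations()
        return resources

    def _reconcile_mdev_reservations(self):
        """Drop reservations that no longer map to anything we still track."""
        for uuid in list(self.claimed_mdevs):
            if self._is_stale(uuid):
                self.claimed_mdevs.pop(uuid, None)

    def _is_stale(self, uuid):
        # Placeholder predicate; which check is right (allocations vs
        # in-progress migrations) is exactly what the discussion below covers.
        return False
```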
sean-k-mooney | bauzas: looking at the code I wonder if there is a better approach | 14:53 |
sean-k-mooney | bauzas: have you considered doing it in update_provider_tree? | 14:53 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8869C9-L8869C29 | 14:54 |
sean-k-mooney | that has the advantage of being passed in the allocations | 14:54 |
sean-k-mooney | so if you see there is no allocation against the provider but there is a reservation in the dict for an instance | 14:54 |
sean-k-mooney | that would be a stale reservation correct? | 14:55 |
sean-k-mooney | hum perhaps not | 14:55 |
sean-k-mooney | the allocation may be owned by the migration uuid | 14:56 |
sean-k-mooney | if you were to do it in _update_provider_tree I would do it in the existing _update_provider_tree_for_vgpu function where you do reshapes | 14:56 |
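A rough sketch of the allocation-based variant sean-k-mooney describes, assuming `allocations` is the consumer-keyed dict that `update_provider_tree()` can be passed (it may also be None) and `claimed_mdevs` is the same illustrative reservation dict as above:

```python
# Sketch only: prune reservations whose consumer has no allocation here.
def prune_reservations_without_allocations(claimed_mdevs, allocations):
    if allocations is None:
        # update_provider_tree() is not always given allocations; do nothing
        # rather than wrongly dropping reservations.
        return claimed_mdevs
    return {uuid: mdevs for uuid, mdevs in claimed_mdevs.items()
            if uuid in allocations}


# Example: one instance still has an allocation, the other does not.
claimed = {'inst-1': ['mdev-a'], 'inst-2': ['mdev-b']}
allocs = {'inst-1': {'allocations': {'rp-uuid': {'resources': {'VGPU': 1}}}}}
print(prune_reservations_without_allocations(claimed, allocs))
# -> {'inst-1': ['mdev-a']}
```

As the rest of the thread notes, the open question for this variant is whether the destination allocation is held by the instance uuid or the migration uuid.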
bauzas | sean-k-mooney: sorry got a phonecall meanwhile | 14:56 |
sean-k-mooney | no worries | 14:57 |
sean-k-mooney | readback and we can chat later | 14:57 |
* bauzas scrolling back | 14:57 | |
bauzas | sean-k-mooney: okay I see your point | 14:58 |
bauzas | so, we would need to inject the db to get the migrations in _update_provider_tree_for_vgpu() | 14:58 |
bauzas | sean-k-mooney: would you be okay ? | 14:59 |
bauzas | I mean calling objects.MigrationList.get_in_progress_and_error() in _update_provider_tree_for_vgpu() | 15:00 |
dansmith | I've lost track of what we're trying to solve here | 15:01 |
dansmith | I thought the periodic was to clean up reservations that are now stale for migrations that never happened? | 15:02 |
dansmith | in the driver memory-resident dict | 15:02 |
sean-k-mooney | yes | 15:03 |
sean-k-mooney | so i think we can do that in _update_provider_tree_for_vgpu | 15:03 |
dansmith | so what injecting into the DB needs to happen? | 15:03 |
sean-k-mooney | nothing has to happen in the db | 15:03 |
dansmith | ack, okay, I'm confused by bauzas' statement then | 15:04 |
sean-k-mooney | dansmith: is the destination allocation held by the instance uuid or the migration uuid? | 15:04 |
sean-k-mooney | i think the instance_uuid? | 15:04 |
dansmith | instance IIRC yeah | 15:04 |
sean-k-mooney | ack, assuming that is correct, all _update_provider_tree_for_vgpu needs to do | 15:04 |
sean-k-mooney | is loop over the in memory dict | 15:05 |
dansmith | maybe bauzas is talking about _querying_ for canceled migration objects to know what to clean from the dict? | 15:05 |
sean-k-mooney | and remove any key that does not have an allocation | 15:05 |
bauzas | dansmith: btw. I added a new check by https://review.opendev.org/c/openstack/nova/+/904209/16/nova/virt/libvirt/driver.py#1740 | 15:05 |
dansmith | sean-k-mooney: you can't do that I think because you might be racing with the start of the migration? | 15:05 |
bauzas | dansmith: my question was about where we could verify the dict | 15:05 |
dansmith | sean-k-mooney: or are you thinking the instance already has an allocation for that host, until it's canceled and we can go on the presence of that allocation? | 15:06 |
sean-k-mooney | we should have claimed the placement allocation before calling migrate, correct? | 15:06 |
bauzas | and I just don't want to create a new virt driver API method just for this | 15:06 |
dansmith | right, okay, yeah that makes sense | 15:06 |
dansmith | I'm just wondering what bauzas meant by "so, we would need to inject the db to get the migrations in _update_provider_tree_for_vgpu() " | 15:06 |
bauzas | sean-k-mooney: I wouldn't check the allocations, I would just verify that all the dict keys have a current migration | 15:07 |
sean-k-mooney | well that requires additional db lookups | 15:07 |
sean-k-mooney | which can only be done in the compute manager or resource tracker | 15:07 |
sean-k-mooney | not the driver | 15:07 |
bauzas | dansmith: I'm saying that I want to get the current migrations | 15:07 |
dansmith | bauzas: but "has an allocation for gpu on this host" is good enough right? | 15:07 |
sean-k-mooney | so that's why I'm trying to use the allocations instead | 15:07 |
dansmith | it also means the reservation mirrors the allocation in placement, which seems right | 15:08 |
bauzas | dansmith: no, because you could have an allocation for an instance that isn't because of a migration | 15:08 |
bauzas | if we want to check the dict, we need to look at the *migrations* | 15:08 |
dansmith | bauzas: okay, but that would be for an instance that migrated in successfully in which case it doesn't hurt right? | 15:08 |
bauzas | dansmith: that's what you asked me | 15:08 |
bauzas | dansmith: no, it would only be for current migrations | 15:09 |
bauzas | once the migration is done, then the dict key is removed | 15:09 |
bauzas | (or aborted) | 15:09 |
dansmith | bauzas: actually, what I asked is for it not to get out of sync silently so that things stop working and a restart is the only solution :) | 15:09 |
bauzas | so we need to compare | 15:09 |
sean-k-mooney | bauzas: no, what dansmith is saying is if we have an entry in the dict and the instance has an allocation against the host and there is no migration | 15:09 |
dansmith | bauzas: if that happens, then we're fine, this is just for cleanup if the migration does *not* get rolled back successfully right? | 15:09 |
sean-k-mooney | then it can only have happened because the migration succeeded | 15:09 |
dansmith | sean-k-mooney: well, not exactly | 15:10 |
sean-k-mooney | and we failed to remove it from the dict due to some bug | 15:10 |
bauzas | dansmith: that's what I'm saying, we need to compare migrations with the dict | 15:10 |
bauzas | if a dict key is not related to a current migration, then it's stale | 15:10 |
dansmith | sean-k-mooney: what I'm saying is that if there's a reservation in the dict and there's an allocation, then it's either still in progress, has yet to be rolled back, or it succeeded in which case keeping the reservation in the dict doesn't prevent other stuff from working | 15:10 |
dansmith | once the instance is deleted or migrated away, healing based on the allocation would work itself out too | 15:11 |
sean-k-mooney | dansmith: ya i agree with that | 15:11 |
sean-k-mooney | and yes to the healing too | 15:11 |
sean-k-mooney | also bauzas if we ever use consumer_types | 15:11 |
dansmith | bauzas: a dict key for a completed migration where the instance is on our host is not incorrect, even if it is stale, and won't harm anything yeah? | 15:11 |
sean-k-mooney | you can actually tell if the allocation is a migration if you really want to | 15:11 |
sean-k-mooney | so if we start using consumer types for that, and we do this based on the allocation, you will be able to filter in the future if you want to | 15:12 |
bauzas | dansmith: I wouldn't be super happy, as we should have removed it by https://review.opendev.org/c/openstack/nova/+/904209/16/nova/virt/libvirt/driver.py#11355 | 15:12 |
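For contrast, the migration-based check bauzas has in mind would look roughly like the sketch below. It assumes the existing `objects.MigrationList.get_in_progress_and_error()` query with a `(context, host, node)` signature, which is why, as noted above, it would have to run somewhere with DB access (the resource tracker or compute manager) rather than in the driver; the dict name is illustrative.

```python
# Sketch only: keep reservations only for instances with an in-flight
# migration on this host; everything else in the dict is considered stale.
from nova import objects


def prune_stale_reservations(context, host, nodename, claimed_mdevs):
    migrations = objects.MigrationList.get_in_progress_and_error(
        context, host, nodename)
    migrating_instances = {m.instance_uuid for m in migrations}
    return {uuid: mdevs for uuid, mdevs in claimed_mdevs.items()
            if uuid in migrating_instances}
```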
dansmith | can we gmeet? | 15:12 |
dansmith | I feel like we're going in circles here | 15:12 |
bauzas | sure, I can | 15:12 |
bauzas | sec | 15:12 |
bauzas | https://meet.google.com/cyg-ucsv-xri | 15:13 |
dansmith | sean-k-mooney: are you able to join? | 15:14 |
opendevreview | Amit Uniyal proposed openstack/nova master: enforce remote console shutdown https://review.opendev.org/c/openstack/nova/+/901824 | 15:14 |
opendevreview | Amit Uniyal proposed openstack/nova master: enforce remote console shutdown https://review.opendev.org/c/openstack/nova/+/901824 | 15:17 |
sean-k-mooney | dansmith: sorry no, I was in/am in another call | 15:28 |
sean-k-mooney | dansmith: I can join quickly now since this call is just ending | 15:29 |
opendevreview | Mateusz Klejn proposed openstack/nova master: Fixed: unit tests for force delete instance https://review.opendev.org/c/openstack/nova/+/909464 | 16:01 |
opendevreview | Mateusz Klejn proposed openstack/nova master: Changed: adjust shelving for shelved image protection https://review.opendev.org/c/openstack/nova/+/909465 | 16:01 |
mnaser | Good morning! https://review.opendev.org/c/openstack/nova/+/909228 + https://review.opendev.org/c/openstack/nova/+/909229 are ready for review today :) | 16:04 |
melwitt | when I've looked into nova-lvm job failures, what I have seen is that they fail due to the glance apache server responding to nova with an HTTP 502 bad gateway, meanwhile with no errors in the glance logs https://etherpad.opendev.org/p/nova-ci-failures-minimal#L11 | 16:43 |
melwitt | not really sure what to do next to debug | 16:44 |
sean-k-mooney | melwitt: this may be related to glance-api being OOM killed | 16:45 |
sean-k-mooney | melwitt: artom was debugging a failure where that happened, so it would be good to confirm that is not what's happening for the nova-lvm case | 16:45 |
artom | "debugging" is a strong word | 16:46 |
artom | I chased a red herring only to figure out that glanceclient does chunking for us | 16:46 |
melwitt | hm ok. I didn't think there was OOM killer in these but let me check | 16:46 |
artom | So I still have no idea why glance-api gets oom-killed | 16:46 |
melwitt | oh, yeah there is some OOM kill in here | 16:47 |
sean-k-mooney | melwitt: there may not be but if there is no error in the glance api log i would first check for an OOM event | 16:47 |
sean-k-mooney | so we chatted about turning on zswap on nova-next | 16:47 |
sean-k-mooney | but since we are having OOM events in nova-lvm i might turn it on there too | 16:47 |
sean-k-mooney | or increase the swap there to 8G if its still at 4G | 16:48 |
melwitt | yeah, g-api. hm | 16:48 |
melwitt | Feb 15 13:59:43 np0036770057 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/system-devstack.slice/devstack@g-api.service,task=uwsgi,pid=86510,uid=1001 | 16:49 |
sean-k-mooney | melwitt: so it's running with 4G of swap today | 16:51 |
dansmith | also, g-api might not be actually consuming lots of memory, it might just be that the OOM killer decided it was the best thing to kill to create some space | 16:51 |
dansmith | so look at the oom output to see if it was huge or not | 16:51 |
opendevreview | sean mooney proposed openstack/nova master: bump nova-lvm to use 8G of swap from 4G https://review.opendev.org/c/openstack/nova/+/909469 | 16:55 |
melwitt | Feb 15 13:59:43 np0036770057 kernel: Out of memory: Killed process 86510 (uwsgi) total-vm:2290700kB, anon-rss:1733784kB, file-rss:6324kB, shmem-rss:32kB, UID:1001 pgtables:3732kB oom_score_adj:0 4707 so I guess 2.2 GB? not sure if that's typical | 16:56 |
sean-k-mooney | melwitt: dansmith ^ that's the swap bump and I'll look at adding zswap to that and nova-next in a separate patch later | 16:56 |
sean-k-mooney | melwitt: so it should be maybe 150-200mb at most normally | 16:57 |
dansmith | makes sense to try it on nova-lvm too so that seems okay | 16:57 |
sean-k-mooney | that looks like it's holding a guest image in memory | 16:57 |
sean-k-mooney | that, or we need to add an explicit python garbage collector call in the image upload path | 16:58 |
melwitt | Feb 15 13:59:43 np0036770057 kernel: Free swap = 0kB | 16:58 |
dansmith | the image upload path doesn't store the image | 16:58 |
sean-k-mooney | i.e. to eagerly run the garbage collector after streaming the image | 16:58 |
dansmith | it's possible that python keeps allocating new buffers for each thing we read and needs a break to do garbage collection, that's certainly possible | 16:59 |
sean-k-mooney | I believe we have had issues with that in either glance or cinder before | 17:00 |
sean-k-mooney | maybe I'm thinking of swift | 17:01 |
sean-k-mooney | https://codesearch.opendev.org/?q=gc.collect&i=nope&literal=nope&files=&excludeFiles=&repos= | 17:01 |
sean-k-mooney | there are a few projects that have an explicit gc.collect() call | 17:02 |
sean-k-mooney | hum although mainly in test code | 17:03 |
dansmith | there was also something about python never deflating the heap, which I think gorka can tell us about | 17:03 |
sean-k-mooney | yeah, that's what I was recalling: speaking to gorka about the agents growing their memory usage over time | 17:03 |
dansmith | but I would think that wouldn't be our problem here since we should be good to re-use those portions of the heap over and over, especially with allocations of exactly the same size each time | 17:03 |
sean-k-mooney | in any case, bumping to 8G of swap will mitigate the issue for now | 17:04 |
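If the eager-GC idea were pursued, it would amount to something like this sketch (illustrative only; the helper name and call site are made up, not Glance or Nova code):

```python
import gc


def stream_image(chunks, write_chunk):
    """Write image chunks to a backend, then eagerly collect garbage."""
    try:
        for chunk in chunks:
            write_chunk(chunk)
    finally:
        # Hint to CPython to reclaim the per-chunk buffers now instead of
        # waiting for the next automatic collection cycle.
        gc.collect()
```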
opendevreview | Jay Faulkner proposed openstack/nova-specs master: Sharding: More detailed test plan https://review.opendev.org/c/openstack/nova-specs/+/908673 | 17:05 |
JayF | https://review.opendev.org/c/openstack/nova-specs/+/908673 Heads up, I'm going to be implementing the plan laid out here this week; please make loud noises quickly if it's unacceptable :D | 17:05 |
sean-k-mooney | ack for 2024.2 or are you still hoping to merge the shard stuff for 2024.1 | 17:07 |
sean-k-mooney | i guess for 2024.1 | 17:08 |
sean-k-mooney | if you can get the testing working in time before FF i guess that is fine | 17:08 |
sean-k-mooney | for me the minimum I think we need is a 2 node job with two sets of vms, with 2 ironic nova-compute services partitioning the vms into shards within the same conductor group | 17:09 |
sean-k-mooney | JayF: having tempest tests that actually assert the topology is probably not in the scope of tempest. it would be in the scope of whitebox or could be done via a different tempest plugin | 17:11 |
gibi | Uggla: i responded in the manila series | 17:12 |
JayF | sean-k-mooney: It was going to be in ironic-tempest-plugin | 17:12 |
sean-k-mooney | ack, that's probably ok then | 17:12 |
JayF | Two node job is not something we do in Ironic really | 17:12 |
JayF | And we are working to slim down our job set; try to get more cases in less VMs | 17:12 |
sean-k-mooney | JayF: can you deploy 2 nova-compute services on a single node? | 17:13 |
JayF | so adding more extensive jobs works against that -- but if we validate the resource pool only contains sharded nodes, we're golden (and already have code to do this in devstack that I can lift) | 17:13 |
JayF | sean-k-mooney: We have zero tests where we do >1 configured nova-compute in Ironic. | 17:13 |
JayF | sean-k-mooney: and the devstack machinery does not support it | 17:13 |
sean-k-mooney | right so that is why i was expecting to have a 2 node job | 17:14 |
sean-k-mooney | each running a single nova-compute with ironic | 17:14 |
sean-k-mooney | but ok, we can get a similar effect with what you're suggesting | 17:14 |
sean-k-mooney | it just sounds like much more work | 17:14 |
JayF | No, much much less work. I've already looked into this with Julia and we have a good amount of confidence in this approach. | 17:15 |
sean-k-mooney | well, not necessarily much more, just more than I would have expected | 17:15 |
* JayF famous last words | 17:15 | |
sean-k-mooney | ack, did you have time to respin the patches to nova for the pep8 error? | 17:15 |
JayF | I just don't want to do it and someone come along and say "not good enough" :D | 17:15 |
JayF | yeah, I just did | 17:15 |
sean-k-mooney | cool | 17:15 |
JayF | right before talking in here because I went to link it and saw the V-1 :D | 17:16 |
dansmith | I guess I'm confused about how we can know the partitioning works with only one node? | 17:17 |
sean-k-mooney | ok, I put the spec update on the same gerrit topic as the code | 17:17 |
sean-k-mooney | "ironic-shards" | 17:17 |
sean-k-mooney | dansmith: the job will create more than 1 ironic "vm" and only tag a subset of them with the shard key | 17:17 |
sean-k-mooney | dansmith: so the job will need to assert that not all the nodes in the ironic node list | 17:18 |
sean-k-mooney | show up in the RPs | 17:18 |
JayF | Yeah, we'll end up with e.g. 10 nodes configured (fake drivers, no backend), and only 10-N of them configured with a shard | 17:18 |
sean-k-mooney | and that the ones that do show up match the expected shard key | 17:18 |
JayF | once the good nodes show up, we start a timer and look for any bad nodes | 17:18 |
dansmith | so just making sure that the one nova-ironic we have doesn't pick up the other fake resources outside the configured shard? | 17:18 |
JayF | exactly | 17:18 |
dansmith | okay the spec doesn't say ten.. so to be clear, it will be a bunch of fake ironic resources and not just .. one real and one fake, right? | 17:19 |
dansmith | because the existing hash ring thing will get 1/2 right half the time | 17:19 |
JayF | with fake nodes, the number can be as many as we want | 17:19 |
sean-k-mooney | which works and covers the gap we had. it's just not how I would have approached it | 17:19 |
JayF | so if we need a large number of "N" to have confidence, it's trivial to do it | 17:19 |
JayF | where large is like, small two digit number | 17:20 |
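A rough sketch of the assertion such a job would make, purely illustrative (the real check would live in the ironic-tempest-plugin and use its own clients): with a set of fake nodes and only a subset tagged with the shard key, the single nova-compute should expose exactly the tagged nodes as resource providers.

```python
# Illustrative only: the node names and helper are made up for the example.
def assert_shard_partitioning(all_nodes, sharded_nodes, resource_providers):
    sharded = set(sharded_nodes)
    providers = set(resource_providers)
    assert providers == sharded, (
        "compute picked up nodes outside its shard: %s"
        % sorted(providers - sharded))
    assert set(all_nodes) - sharded, (
        "test is vacuous: every node carries the shard key")


# Example: 4 fake nodes, 2 tagged with the shard key, 2 untagged.
assert_shard_partitioning(
    all_nodes=['n1', 'n2', 'n3', 'n4'],
    sharded_nodes=['n1', 'n2'],
    resource_providers=['n1', 'n2'])
```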
sean-k-mooney | JayF: we don't, but it would have been nice to have more than one nova-compute in the same ironic conductor group with different shard keys | 17:20 |
dansmith | okay I was hoping for better, tbh, but I guess it's better than nothing | 17:20 |
dansmith | right exactly | 17:20 |
JayF | sean-k-mooney: doing that same test for conductor groups is a follow on | 17:20 |
JayF | oh, not exactly what you said | 17:20 |
JayF | but ensuring we have nodes in good/bad conductor groups and testing there | 17:20 |
JayF | similarly to how we are planning for shards (so then we'll have CI coverage for conductor group based paritioning, too, which AIUI doesn't exist at all today either) | 17:21 |
sean-k-mooney | so from a nova point of view we want to ensure we can have multiple nova compute services with different shard keys in the same cloud | 17:21 |
sean-k-mooney | and for each to manage their own slice | 17:21 |
sean-k-mooney | we kind of get that by what you propose | 17:22 |
sean-k-mooney | it just would have been nicer to actually test that directly | 17:22 |
JayF | We are validating up to the virt driver level, but not up to the compute level | 17:22 |
JayF | and you'd rather us go up a little bit further, which makes sense | 17:22 |
JayF | This as an MVP is good; but I will keep an eye out for opportunities to test >1 nova compute -- this is likely something to plan and do as a separate project b/c I suspect it'll be a massive change to the ironic devstack plugin to do so | 17:23 |
JayF | TheJulia: ^ WDYT; tl;dr how hard would it be to do an Ironic test job with >1 nova-compute pulling from the same Ironic? | 17:23 |
sean-k-mooney | i dont think it would need much work at all | 17:23 |
JayF | Maybe I'll reword it to be more personal: that is closer to the approach I was taking last year that kept me running into walls; if it's possible/easy I don't see that path myself | 17:24 |
sean-k-mooney | JayF: the way multi node plugins are meant to work is running the same plugin once per node, just with bits enabled/disabled as needed | 17:24 |
sean-k-mooney | JayF: so to add a second nova-compute with ironic we would just turn off all the extra bits the plugin does for the ironic api | 17:24 |
JayF | I think you're ignoring that we'd have to have the second node talk to Ironic on the first node, which I think would require some network magic | 17:25 |
sean-k-mooney | and only deploy nova-compute with the ironic driver with a different shard key configured | 17:25 |
sean-k-mooney | JayF: yes but we have that for all the other multi node jobs | 17:25 |
sean-k-mooney | anyway when you have a job up for review | 17:26 |
sean-k-mooney | we can look at making a multi node variant later | 17:26 |
JayF | I am not sure Ironic community at large will be thrilled to add multinode to our gate complexities, but timing-wise that's probably PTG topic anyway. | 17:26 |
sean-k-mooney | if we are not creating any additional fake nodes then we can likely do it with just a config override to change the virt driver to ironic and populate the nova.conf with the ironic details | 17:27 |
dansmith | ironic has no multinode jobs? | 17:27 |
TheJulia | We have one, adding further complexity to it is not ideal | 17:27 |
Uggla | gibi, thx, I will have a look. | 17:27 |
dansmith | eesh | 17:28 |
JayF | oh, we do > ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode https://zuul.opendev.org/t/openstack/build/7a8be5f2987745f390f6078e8ac87d82 : SUCCESS in 1h 14m 03s (non-voting) | 17:28 |
JayF | non-voting | 17:28 |
TheJulia | JayF: to the original ask, that is how the job operates now | 17:28 |
JayF | Ack, so as a second step, I can look into making that job also test sharding | 17:28 |
TheJulia | I think it just would if set, fwiw | 17:28 |
dansmith | I guess I'm not sure what configuration is more deserving of a multinode layout than the configuration that arranges for more-than-trivial nodes | 17:29 |
JayF | I honestly don't know what that job tests; it'll be on my plate to look into it | 17:29 |
JayF | if I can just flip sharding on in that job and use it to test, that'd be even simpler (although I still think there's value in doing the "check resources" type job, because it'll fail more cleanly if we break things) | 17:30 |
sean-k-mooney | there is some logic in devstack to run multiple fake computes on a single node | 17:31 |
sean-k-mooney | so we could (next cycle) explore doing that for ironic. but honestly I think all that is needed is to enable the compute service on a second node, set the virt driver to ironic, and set the other config options | 17:31 |
TheJulia | JayF: multiple ironic-conductors and nova-computes if I'm recalling correctly | 17:31 |
TheJulia | It has just been a while since I've done any work on it | 17:32 |
JayF | oooh, so is that the job I didn't know existed that does coverage for conductor_groups | 17:32 |
JayF | that would make a lot of sense, actually | 17:32 |
sean-k-mooney | that's one of the two jobs we have in our experimental pipeline | 17:33 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/.zuul.yaml#L981-L990 | 17:33 |
sean-k-mooney | we have ironic-tempest-bfv and ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode | 17:34 |
JayF | I think that is the whole mystery laid out there then | 17:35 |
JayF | looks like I have a template job, and need to just shardify it | 17:35 |
sean-k-mooney | yeah, so this is how the second node is configured https://github.com/openstack/ironic/blob/master/zuul.d/ironic-jobs.yaml#L692-L733 | 17:35 |
JayF | those OVS config bits were the parts I was worried about figuring out, honestly | 17:36 |
sean-k-mooney | well thats common with the libvirt jobs | 17:37 |
sean-k-mooney | for the most part anyway | 17:37 |
JayF | My Ironic knowledge is concentrated closer to the endpoint :) I'm working to spread it up the stack more and more over time but networking is a big blind spot | 17:37 |
sean-k-mooney | JayF: what I would suggest is to focus on the testing you had planned before | 17:39 |
sean-k-mooney | if we can separately add the config setting to tag the ironic nodes in that other job with a shard key | 17:39 |
sean-k-mooney | and configure the relevant nova-computes on each host to look at only the shard key for the vms on the relevant host | 17:39 |
sean-k-mooney | that will cover us for the multi node job | 17:40 |
JayF | Well, except for the "you should do the other one first", that's what I had in mind | 17:40 |
sean-k-mooney | well if you want to do the multi node one first that's fine | 17:41 |
JayF | I'll take your advice and still keep on the path I added to the spec -- even though you did an ace job of making the multinode seem easier :D | 17:41 |
sean-k-mooney | I was just hoping you could provide an example of configuring it | 17:41 |
sean-k-mooney | then we could help port it to the multi node one | 17:41 |
opendevreview | Fabian Wiesel proposed openstack/nova master: Do not close returned image-chunk iterator https://review.opendev.org/c/openstack/nova/+/909474 | 17:41 |
sean-k-mooney | JayF: you can set the shard key in the nova.conf trivially like this https://github.com/openstack/nova/blob/master/.zuul.yaml#L459-L485 | 17:43 |
sean-k-mooney | I'm not sure how you go about tagging the relevant fake ironic nodes with a shard key | 17:43 |
sean-k-mooney | if you really needed to hack things in quickly I would use a local.sh file created by a pre playbook | 17:44 |
sean-k-mooney | like this https://review.opendev.org/c/openstack/nova/+/679656/13/playbooks/nfv/nfv.yaml | 17:45 |
sean-k-mooney | but I assume you have a nicer way to do that in the ironic plugin | 17:45 |
sean-k-mooney | ideally via a local.conf parameter | 17:45 |
sean-k-mooney | JayF: if you're not familiar with /opt/stack/devstack/local.sh: if it exists it runs right before devstack finishes, so if a job creates one it will be run after devstack has otherwise run fully but before any tests are run | 17:46 |
sean-k-mooney | so it's a way to hack in functionality quickly via a zuul job that should eventually be in a devstack plugin | 17:47 |
*** dmitriis is now known as Guest251 | 17:48 | |
JayF | honestly that's as nifty for local dev setup as it is for a CI trick, ty | 17:50 |
sean-k-mooney | yes, so the reason ci jobs should not use it heavily | 17:50 |
sean-k-mooney | is because it's meant to be reserved for local dev | 17:51 |
sean-k-mooney | but they can use it | 17:51 |
sean-k-mooney | plugins just should not require it to have stuff to work | 17:51 |
sean-k-mooney | since its for the user not the "framework/plugins" to use | 17:51 |
sean-k-mooney | so form a ci point of view its perfectly valid to use | 17:52 |
sean-k-mooney | but if it's complex it probably should be a feature in the plugin/devstack instead | 17:52 |
JayF | I mean, yeah, it's kinda obvious it'd be hacky to land a CI job using that | 17:54 |
sean-k-mooney | JayF: I was expecting to see something in enroll_nodes https://github.com/openstack/ironic/blob/245505dc04da6e401e1810812809fef3f3757f04/devstack/lib/ironic#L2431 | 17:56 |
sean-k-mooney | to set the shard key on the node | 17:57 |
sean-k-mooney | did I miss that? | 17:57 |
JayF | shard key is optional; you only need it when setting things up | 17:57 |
JayF | and it's unused upstream until the nova change lands | 17:58 |
JayF | so there's nothing to miss (yet) | 17:58 |
sean-k-mooney | ah ok, so the devstack plugin change has not happened yet | 17:58 |
sean-k-mooney | I was expecting to see a new setting added to the plugin | 17:58 |
sean-k-mooney | and then have it append the shard key into the node options if it's defined | 17:59 |
JayF | sean-k-mooney: https://opendev.org/openstack/ironic-tempest-plugin/commit/f2b9c74c9cb8459a979d9002aae3c1a41737c77a is the testing that exists r/n in tempest, no scenario tests yet | 17:59 |
JayF | this conversation is me kicking off that work for this week | 17:59 |
sean-k-mooney | right but that patch is in the tempest plugin not the devstack plugin | 17:59 |
JayF | yep; that's what I'm saying, nothing scenario is done, I have the old patch from last cycle which is basically useless now and never landed (but did what you're lookign for) | 18:00 |
sean-k-mooney | ah ok | 18:00 |
sean-k-mooney | well its not useless in that it allows you to deploy locally with ironic fake nodes via devstack | 18:01 |
sean-k-mooney | with shards | 18:01 |
JayF | Yeah, but it's useless in terms of testing coverage | 18:01 |
JayF | because it was passing even when the client wasn't even passing ?shard=blah to the Ironic API | 18:02 |
JayF | that's what I mean by useless :D | 18:02 |
sean-k-mooney | it depends on what you were trying to test, but I agree it alone was not sufficient | 18:18 |
sean-k-mooney | dansmith: melwitt nova-lvm passed with https://review.opendev.org/c/openstack/nova/+/909469 by the way https://zuul.opendev.org/t/openstack/build/5069b706178e4430ba48bdd52234b065 so do we want to just merge that for now | 18:20 |
sean-k-mooney | memory_tracker low_point: 358852 | 18:21 |
sean-k-mooney | SwapFree: 5804540 kB | 18:21 |
sean-k-mooney | so we are not using most of the additional swap | 18:22 |
dansmith | I'm cool with it yeah :) | 18:22 |
sean-k-mooney | but that does not mean it was not needed and just not captured by the memory tracker | 18:22 |
dansmith | um, parse error | 18:23 |
dansmith | but yeah, I'm in favor of 8G swap for any of these that are on the border | 18:23 |
sean-k-mooney | I mean the memory tracker log runs periodically and the extra swap could have been needed briefly in between runs | 18:23 |
melwitt | sean-k-mooney: noice, +W | 18:26 |
*** NeilHanlon is now known as nhanlon | 19:00 | |
*** nhanlon is now known as NeilHanlon | 19:02 | |
opendevreview | Merged openstack/nova stable/2023.1: libvirt: stop enabling hyperv feature reenlightenment https://review.opendev.org/c/openstack/nova/+/909229 | 20:43 |
opendevreview | Merged openstack/nova master: bump nova-lvm to use 8G of swap from 4G https://review.opendev.org/c/openstack/nova/+/909469 | 21:11 |