*** Liang__ has joined #openstack-nova | 00:02 | |
*** nicolasbock has quit IRC | 00:03 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 00:05 |
---|---|---|
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Pass accelerator requests to each virt driver from compute manager. https://review.opendev.org/698581 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into domain XML in libvirt driver. https://review.opendev.org/631245 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard/soft reboot with accelerators. https://review.opendev.org/697940 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable start/stop of instances with accelerators. https://review.opendev.org/699553 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable and use COMPUTE_ACCELERATORS trait. https://review.opendev.org/699554 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Bump compute rpcapi version and reduce Cyborg calls. https://review.opendev.org/704227 | 00:05 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 00:05 |
*** liuyulong has quit IRC | 00:11 | |
*** migawa|lunch is now known as migawa|AFK | 00:19 | |
*** xiaolin has joined #openstack-nova | 00:24 | |
*** samc-bbc has quit IRC | 00:26 | |
*** samc-bbc has joined #openstack-nova | 00:27 | |
*** tetsuro has quit IRC | 00:33 | |
*** tetsuro_ has joined #openstack-nova | 00:33 | |
*** gyee has quit IRC | 00:50 | |
*** mlavalle has quit IRC | 01:17 | |
*** tbachman has joined #openstack-nova | 01:32 | |
*** ileixe has joined #openstack-nova | 01:43 | |
*** tbachman has quit IRC | 02:06 | |
*** zhanglong has joined #openstack-nova | 02:15 | |
*** dave-mccowan has joined #openstack-nova | 02:23 | |
*** zhanglong has quit IRC | 02:27 | |
*** zhanglong has joined #openstack-nova | 02:31 | |
*** lbragstad has quit IRC | 02:37 | |
*** zhanglong has quit IRC | 02:45 | |
*** zhanglong has joined #openstack-nova | 02:47 | |
*** vishalmanchanda has joined #openstack-nova | 02:59 | |
*** brinzhang has joined #openstack-nova | 03:05 | |
*** migawa|AFK is now known as migawa | 03:05 | |
*** mkrai has joined #openstack-nova | 03:15 | |
*** zhanglong has quit IRC | 03:17 | |
*** spatel has joined #openstack-nova | 03:39 | |
*** spatel has quit IRC | 03:44 | |
*** tetsuro_ has quit IRC | 04:03 | |
*** tetsuro has joined #openstack-nova | 04:04 | |
*** udesale has joined #openstack-nova | 04:07 | |
*** migawa is now known as migawa|lunch|AFK | 04:18 | |
*** igordc has joined #openstack-nova | 04:37 | |
*** ileixe has quit IRC | 04:42 | |
*** igordc has quit IRC | 04:48 | |
*** igordc has joined #openstack-nova | 04:48 | |
*** igordc has quit IRC | 04:48 | |
*** migawa|lunch|AFK is now known as migawa|lunch | 04:52 | |
*** ileixe has joined #openstack-nova | 04:52 | |
*** yaawang has joined #openstack-nova | 05:13 | |
*** Liang__ has quit IRC | 05:28 | |
*** Liang__ has joined #openstack-nova | 05:30 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #openstack-nova | 05:34 | |
*** yaawang has quit IRC | 05:50 | |
*** spatel has joined #openstack-nova | 06:00 | |
*** spatel has quit IRC | 06:05 | |
*** zhanglong has joined #openstack-nova | 06:12 | |
*** migawa|lunch is now known as migawa|AFK | 06:24 | |
*** migawa|AFK is now known as migawa | 06:25 | |
*** ccamacho has quit IRC | 06:39 | |
*** tetsuro has quit IRC | 06:51 | |
*** tetsuro_ has joined #openstack-nova | 06:51 | |
*** spatel has joined #openstack-nova | 06:55 | |
*** spatel has quit IRC | 07:00 | |
*** ociuhandu has joined #openstack-nova | 07:30 | |
*** ociuhandu has quit IRC | 07:35 | |
*** imacdonn has quit IRC | 07:54 | |
*** imacdonn has joined #openstack-nova | 07:54 | |
*** maciejjozefczyk has joined #openstack-nova | 07:57 | |
*** kevinz has joined #openstack-nova | 08:08 | |
*** slaweq has joined #openstack-nova | 08:10 | |
*** ccamacho has joined #openstack-nova | 08:13 | |
kevinz | Hi Nova, Linaro has already donated some machines to the Infra team, and the nodes are ready in nodepool. We'd like to enable Arm64 CI for Nova (libvirt driver). Is there any guideline for me to get involved? | 08:15 |
*** ralonsoh has joined #openstack-nova | 08:18 | |
*** tosky has joined #openstack-nova | 08:19 | |
*** tesseract has joined #openstack-nova | 08:20 | |
kevinz | created a bug here to track this: https://bugs.launchpad.net/nova/+bug/1863058 | 08:21 |
openstack | Launchpad bug 1863058 in OpenStack Compute (nova) "Arm64 CI for Nova" [Undecided,New] | 08:21 |
*** ivve has joined #openstack-nova | 08:24 | |
*** ivve has quit IRC | 08:27 | |
*** tkajinam has quit IRC | 08:29 | |
*** amoralej|off is now known as amoralej | 08:29 | |
*** huaqiang has quit IRC | 08:30 | |
lyarwood | kevinz: I guess if these hosts are already in nodepool then we can use them directly without third party CI? | 08:49 |
kevinz | lyarwood: hi, yes these machines are now in nodepool. | 08:49 |
kevinz | lyarwood: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L414 | 08:50 |
kevinz | And for arm64 nodes, there is another pipeline https://review.opendev.org/#/c/698606 | 08:51 |
kevinz | due to a lack of nodes | 08:51 |
kevinz | called "check-arm64" | 08:51 |
lyarwood | https://github.com/openstack/project-config/blob/b393e477951ba3a38c63565c0824f4cf95ae292d/zuul.d/pipelines.yaml#L349-L376 - yeah just found that | 08:54 |
lyarwood | so I assume we'd need to add that pipeline alongside check etc in .zuul.yml - https://github.com/openstack/nova/blob/554a6ffa837ba915c06c8ae70c339e911c9c9303/.zuul.yaml#L213-L336 | 08:55 |
lyarwood | and define some jobs | 08:55 |
lyarwood | kevinz: I'd raise this on the weekly meeting later today if you're around. | 08:56 |
lyarwood | or the ML if you're not | 08:57 |
kevinz | lyarwood: Thanks, We can talk via ML first. as today's meeting is quite early for me :D | 08:58 |
kevinz | we should define some jobs for this CI | 08:58 |
kevinz | lyarwood: The meeting next week(UTC14 is available for me) | 09:00 |
lyarwood | kevinz: ack understood, well in that case feel free to propose a change and we can always talk about things directly there as well :) | 09:01 |
kevinz | lyarwood: no problem, I will. Thanks a lot | 09:01 |
*** tetsuro_ has quit IRC | 09:03 | |
lyarwood | np :) | 09:04 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Always provide the size in bytes when calling virDomainBlockResize https://review.opendev.org/707590 | 09:04 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: images: Remove Libvirt specific configurable use from qemu_img_info https://review.opendev.org/707591 | 09:04 |
*** tetsuro has joined #openstack-nova | 09:07 | |
*** huaqiang has joined #openstack-nova | 09:14 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: DNM - Test TEMPEST_EXTEND_ATTACHED_ENCRYPTED_VOLUME https://review.opendev.org/707593 | 09:21 |
*** derekh has joined #openstack-nova | 09:29 | |
*** bbowen has quit IRC | 09:34 | |
*** bbowen has joined #openstack-nova | 09:35 | |
*** ociuhandu has joined #openstack-nova | 09:37 | |
*** ociuhandu has quit IRC | 09:40 | |
*** ileixe has quit IRC | 09:52 | |
*** ociuhandu has joined #openstack-nova | 09:53 | |
*** ileixe has joined #openstack-nova | 09:57 | |
huaqiang | stephenfin: sean-k-mooney: alex_xu: Thanks for the review. I have several things I need to discuss further with you when you are around. | 10:00 |
*** ociuhandu has quit IRC | 10:01 | |
*** ociuhandu has joined #openstack-nova | 10:01 | |
*** ociuhandu has quit IRC | 10:02 | |
*** ociuhandu has joined #openstack-nova | 10:02 | |
*** ociuhandu has quit IRC | 10:05 | |
*** dtantsur|afk is now known as dtantsur | 10:05 | |
*** ociuhandu has joined #openstack-nova | 10:05 | |
*** bbowen has quit IRC | 10:11 | |
*** bbowen has joined #openstack-nova | 10:11 | |
*** ociuhandu has quit IRC | 10:13 | |
*** xiaolin has quit IRC | 10:13 | |
*** ileixe has quit IRC | 10:16 | |
huaqiang | sean-k-mooney: for the mixed instance spec, in the cpu policy matrix, when neither 'hw:cpu_policy' nor 'hw_cpu_policy' is defined, I think the final result should not be 'shared', which is what you suggested in your review. | 10:16 |
huaqiang | because in this case, say again, no 'hw_cpu_policy' in image property and no 'hw:cpu_policy' in flavor extra specs, | 10:17 |
*** ileixe has joined #openstack-nova | 10:17 | |
huaqiang | the final instance CPU allocation policy is determined by 'resources:(P|V)CPU' | 10:18 |
huaqiang | it might be 'dedicated' or 'mixed' | 10:18 |
*** ileixe has quit IRC | 10:21 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 10:37 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard/soft reboot with accelerators. https://review.opendev.org/697940 | 10:37 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable start/stop of instances with accelerators. https://review.opendev.org/699553 | 10:37 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable and use COMPUTE_ACCELERATORS trait. https://review.opendev.org/699554 | 10:37 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Bump compute rpcapi version and reduce Cyborg calls. https://review.opendev.org/704227 | 10:37 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 10:37 |
bauzas | stephenfin: sean-k-mooney: thanks for the comments on https://review.opendev.org/#/c/552924/, I'll try to update the spec by today around 2pm CET | 10:39 |
*** spatel has joined #openstack-nova | 10:41 | |
*** spatel has quit IRC | 10:46 | |
alex_xu | stephenfin: I replied about the image metadata, but we don't have any specific usecase for image meta, I'm just thinking about the generic usecase we may have in nova https://review.opendev.org/#/c/668656/19/specs/ussuri/approved/use-pcpu-vcpu-in-one-instance.rst@122. So I'm not insisting on that. just trying to ensure that isn't what we want | 10:47 |
alex_xu | sean-k-mooney: I prefer the service version, since I think the traits will become useless after upgrade. And this isn't a feature we have to support in the middle of upgrade. https://review.opendev.org/#/c/668656/19/specs/ussuri/approved/use-pcpu-vcpu-in-one-instance.rst@348 | 10:49 |
alex_xu | huaqiang: ^ | 10:49 |
*** xiaolin has joined #openstack-nova | 10:58 | |
*** zhanglong has quit IRC | 11:02 | |
*** udesale has quit IRC | 11:02 | |
*** zhanglong has joined #openstack-nova | 11:03 | |
*** dtantsur is now known as dtantsur|brb | 11:22 | |
openstackgerrit | Liang Fang proposed openstack/nova-specs master: Support volume local cache https://review.opendev.org/689070 | 11:25 |
*** jawad_axd has joined #openstack-nova | 11:25 | |
*** hamzy_ is now known as hamzy | 11:26 | |
*** ivve has joined #openstack-nova | 11:39 | |
*** mkrai has quit IRC | 11:40 | |
*** mkrai has joined #openstack-nova | 11:50 | |
*** Liang__ has quit IRC | 11:57 | |
stephenfin | alex_xu: Need to read your reply for the image metadata bit, but for traits vs. service version, by traits do you mean capabilities? | 11:58 |
*** vishalmanchanda has quit IRC | 11:58 | |
stephenfin | i.e. this virt driver can create/handle mixed instances | 11:59 |
sean-k-mooney | alex_xu: i just added a reason to keep the trait | 11:59 |
sean-k-mooney | alex_xu: specifically if you have mixed hypervisors e.g. hyper-v and libvirt the compute capability trait will be useful to select just the libvirt hosts via placement | 11:59 |
*** ociuhandu has joined #openstack-nova | 11:59 | |
*** amoralej is now known as amoralej|lunch | 12:00 | |
sean-k-mooney | alex_xu: huaqiang: stephenfin ^ what do you think is that enough reason to use the trait | 12:00 |
*** Liang__ has joined #openstack-nova | 12:01 | |
sean-k-mooney | stephenfin: we have standardised compute capabilities as traits in os-traits | 12:01 |
sean-k-mooney | stephenfin: so when we add a new compute capability we now also add a trait for that | 12:01 |
sean-k-mooney | basically that is what the compute namespace is for. not entirely but more or less https://github.com/openstack/os-traits/tree/master/os_traits/compute | 12:03 |
sean-k-mooney | for example the COMPUTE_VOLUME_MULTI_ATTACH trait https://github.com/openstack/os-traits/blob/master/os_traits/compute/volume.py#L24 | 12:03 |
sean-k-mooney | or same host cold migrate for vsphere https://github.com/openstack/os-traits/blob/master/os_traits/compute/__init__.py#L30 i think mix cpu support makes sense as it is not otherwise discoverable via placement and i think it is something we want to schedule on | 12:05 |
*** ociuhandu has quit IRC | 12:05 | |
*** nicolasbock has joined #openstack-nova | 12:07 | |
sean-k-mooney | alex_xu: with that said i wont -1 if you dont add the trait and decide to go with the service version bump i just dont like using the service version as a proxy for specific features if we can avoid it. | 12:07 |
stephenfin | sean-k-mooney: not all of them though | 12:21 |
stephenfin | we have traits for e.g. the 'supports_image_type_ploop' capability | 12:21 |
stephenfin | but I don't see any for 'supports_pcpus' | 12:22 |
stephenfin | what's the advantage of the trait approach? | 12:22 |
stephenfin | (for my own reference) | 12:22 |
*** ociuhandu has joined #openstack-nova | 12:22 | |
*** derekh has quit IRC | 12:23 | |
alex_xu | sean-k-mooney: that is good point. I didn't think about it. I agree with mix hypervisor, that is useful | 12:23 |
*** ccamacho has quit IRC | 12:25 | |
alex_xu | sean-k-mooney: we have hypervisor doesn't support NUMA right?, but we don't have traits for them also | 12:25 |
alex_xu | stephenfin: we needn't supports_cpus, probably we need supports_dedicated, or support_numa, I think we have some of hypervisor doesn't support those | 12:26 |
*** ociuhandu has quit IRC | 12:28 | |
sean-k-mooney | alex_xu: we do although numa with sylvains spec would be reflected in the topology of the RPs | 12:35 |
sean-k-mooney | so it would no longer need it | 12:35 |
*** ociuhandu has joined #openstack-nova | 12:35 | |
alex_xu | ah, right | 12:35 |
sean-k-mooney | numa is supported by hyper-v and libvirt today | 12:35 |
*** tetsuro_ has joined #openstack-nova | 12:40 | |
*** ociuhandu has quit IRC | 12:40 | |
alex_xu | sean-k-mooney: I guess libvirt is the only virt driver report pcpu | 12:40 |
*** tetsuro has quit IRC | 12:44 | |
*** tetsuro_ has quit IRC | 12:45 | |
huaqiang | sean-k-mooney: about the white-box test, it was the decision made at the Shanghai PTG meeting, I need stephenfin's opinion | 12:46 |
huaqiang | If I remember correctly, he proposed the test. I'd like to know if he still holds the same opinion now | 12:47 |
*** mkrai has quit IRC | 12:48 | |
huaqiang | stephenfin: If I'll add functional tests for the proposing mixed instance spec, do you still think the white-box tempest plugin should be a test that I have to pass? | 12:49 |
huaqiang | s/add functional tests/add functional tests in intree NUMA test cases/ | 12:51 |
alex_xu | sean-k-mooney: I still prefer service version now. Since it is the legacy way to figure out the upgrade status. And the trait support_mix isn't useful, since libvirt is the only driver that supports pcpu. So it doesn't feel good to add an extra trait in the placement request which needs extra filtering and db query inside placement, but isn't very useful for now. Maybe we need that trait in the future, if | 12:55 |
alex_xu | we have another virt driver that supports pcpu and vcpu on the same host. | 12:55 |
*** zhanglong has quit IRC | 13:00 | |
*** mkrai has joined #openstack-nova | 13:00 | |
*** nweinber has joined #openstack-nova | 13:01 | |
sean-k-mooney | ok ill leave it up to you to decide | 13:03 |
*** adriant has quit IRC | 13:04 | |
*** adriant has joined #openstack-nova | 13:04 | |
sean-k-mooney | regarding whitebox stephenfin i think you will agree it is a nice to have but not a hard requirement | 13:04 |
*** zhanglong has joined #openstack-nova | 13:04 | |
sean-k-mooney | from a downstream perspective we will need to have this tested with whitebox before we can support it in the osp product but it should not be a requirement for merging upstream | 13:05 |
sean-k-mooney | especially since we dont currently have a whitebox job running against nova | 13:05 |
alex_xu | sean-k-mooney: thanks | 13:06 |
*** tbachman has joined #openstack-nova | 13:07 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: DNM - Test TEMPEST_EXTEND_ATTACHED_ENCRYPTED_VOLUME https://review.opendev.org/707593 | 13:15 |
*** mkrai has quit IRC | 13:16 | |
*** ccamacho has joined #openstack-nova | 13:19 | |
*** slaweq has quit IRC | 13:21 | |
*** rosmaita has joined #openstack-nova | 13:23 | |
*** eharney has quit IRC | 13:24 | |
gibi | cores, volume local cache discussion will start soon on https://bluejeans.com/3228528973 | 13:24 |
gibi | or even anybody who is interested | 13:24 |
*** slaweq has joined #openstack-nova | 13:25 | |
*** eharney has joined #openstack-nova | 13:26 | |
*** belmoreira has joined #openstack-nova | 13:27 | |
*** ociuhandu has joined #openstack-nova | 13:27 | |
*** amoralej|lunch is now known as amoralej | 13:32 | |
gibi | sean-k-mooney: ^^ | 13:32 |
stephenfin | efried, gibi, dansmith: API question: should the `validation` parameter for the extra spec validation be part of the POST/PUT request body or the query string? | 13:38 |
*** martinkennelly has joined #openstack-nova | 13:38 | |
stephenfin | we seem to do the former for things like server hints https://docs.openstack.org/api-ref/compute/?expanded=create-extra-specs-for-a-flavor-detail,create-server-detail#id11 | 13:38 |
*** ociuhandu has quit IRC | 13:39 | |
*** ociuhandu has joined #openstack-nova | 13:39 | |
mnaser | just wondering if we can have more eyes on: https://review.opendev.org/#/c/670112/ -- its really helpful and still remains useful to this day :> | 13:41 |
sean-k-mooney | stephenfin: query arg | 13:41 |
sean-k-mooney | i think | 13:41 |
sean-k-mooney | although if we dont have other args in the query args | 13:41 |
sean-k-mooney | then maybe the body | 13:41 |
sean-k-mooney | query arg feels more natural | 13:41 |
sean-k-mooney | as its not part of the data of the flavor resources | 13:42 |
*** brinzhang has quit IRC | 13:43 | |
stephenfin | Hmm, yeah, you could make the same argument for server hints though | 13:44 |
*** brinzhang has joined #openstack-nova | 13:44 | |
*** tbachman has quit IRC | 13:45 | |
*** icarusfactor has quit IRC | 13:49 | |
*** tetsuro has joined #openstack-nova | 13:50 | |
*** martinkennelly has quit IRC | 13:58 | |
*** ociuhandu has quit IRC | 13:59 | |
*** huaqiang has quit IRC | 14:05 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs https://review.opendev.org/552924 | 14:06 |
*** huaqiang has joined #openstack-nova | 14:08 | |
*** tbachman has joined #openstack-nova | 14:10 | |
bauzas | efried: dansmith: gibi: sean-k-mooney: stephenfin: ^ | 14:10 |
*** lbragstad has joined #openstack-nova | 14:11 | |
*** lbragstad has quit IRC | 14:11 | |
*** lbragstad has joined #openstack-nova | 14:12 | |
*** Liang__ is now known as LiangFang | 14:15 | |
*** zhanglong has quit IRC | 14:19 | |
*** dtantsur|brb is now known as dtantsur | 14:24 | |
openstackgerrit | Merged openstack/nova master: Make RBD imagebackend flatten method idempotent https://review.opendev.org/704330 | 14:25 |
*** maciejjozefczyk has quit IRC | 14:26 | |
*** spatel has joined #openstack-nova | 14:27 | |
*** mriedem has joined #openstack-nova | 14:29 | |
*** udesale has joined #openstack-nova | 14:29 | |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/train: Make RBD imagebackend flatten method idempotent https://review.opendev.org/707650 | 14:29 |
*** spatel has quit IRC | 14:32 | |
*** tetsuro has quit IRC | 14:37 | |
huaqiang | sean-k-mooney: thanks. I have removed the white-box content and sent the spec again. | 14:38 |
*** ociuhandu has joined #openstack-nova | 14:38 | |
openstackgerrit | Huachang Wang proposed openstack/nova-specs master: Use PCPU and VCPU in one instance https://review.opendev.org/668656 | 14:39 |
huaqiang | stephenfin: sean-k-mooney: alex_xu: the mixed instance spec is updated, please review. Thanks | 14:40 |
*** ociuhandu has quit IRC | 14:45 | |
*** maciejjozefczyk has joined #openstack-nova | 14:48 | |
*** jawad_axd has quit IRC | 14:49 | |
dansmith | stephenfin: agree query arg feels more right-er | 14:52 |
*** jawad_axd has joined #openstack-nova | 14:56 | |
*** dtantsur is now known as dtantsur|afk | 14:58 | |
*** jawad_axd has quit IRC | 15:01 | |
efried | stephenfin: I vote qparam too. | 15:10 |
stephenfin | sweet. qparam it is | 15:10 |
*** mlavalle has joined #openstack-nova | 15:26 | |
*** derekh has joined #openstack-nova | 15:27 | |
*** priteau has joined #openstack-nova | 15:52 | |
*** artom has joined #openstack-nova | 15:59 | |
*** eharney has quit IRC | 16:02 | |
*** martinkennelly has joined #openstack-nova | 16:10 | |
*** udesale_ has joined #openstack-nova | 16:14 | |
*** udesale has quit IRC | 16:14 | |
*** ociuhandu has joined #openstack-nova | 16:16 | |
gibi | stephenfin: I vote for query arg as Sean stated it is not part of the entity you actually create or modify | 16:16 |
*** jmlowe has joined #openstack-nova | 16:18 | |
efried | bauzas: I'm very close on the NUMA RP spec; +2 if you just flip the defaults as noted. But (despite him saying he's not blocking) I want to convince stephenfin that this is the way we should go. | 16:21 |
*** tosky has quit IRC | 16:21 | |
efried | sean-k-mooney: do you agree with my notes on the default for implicit numa nodes? | 16:21 |
efried | https://review.opendev.org/#/c/552924/20/specs/ussuri/approved/numa-topology-with-rps.rst@269 | 16:22 |
stephenfin | efried: So I'm clear, what's the objection to a "I want this host to report/not report NUMA"? | 16:24 |
stephenfin | dansmith too ^ | 16:24 |
efried | stephenfin: no objection. That's being provided. But I want the default to be "report NUMA". | 16:24 |
stephenfin | It's being provided temporarily though, not long term | 16:25 |
stephenfin | Why not do this long-term | 16:25 |
efried | ah, okay: | 16:25 |
*** maciejjozefczyk has quit IRC | 16:25 | |
*** READ10 has joined #openstack-nova | 16:26 | |
efried | The way dansmith explained it, the only reason we don't always report NUMA and create real NUMA topologies for guests is because it's hard. But no consumer *actually* wants a guest that doesn't affine its memory to CPUs; they're taking a significant performance hit because we haven't solved this problem in nova. | 16:27 |
efried | I'm... paraphrasing. Dan was more eloquent about it. | 16:27 |
dansmith | doubtful, but yeah | 16:27 |
efried | But we have to balance that against fitting. | 16:27 |
dansmith | stephenfin: I don't want to have two "modes" for the compute service to operate in long term | 16:28 |
efried | again paraphrasing dansmith, nova explicitly disclaims the ability to fit that last VM on that last almost-full host. So if we're compromising, that's where we're compromising. | 16:28 |
stephenfin | But we could provide NUMA information to the guest. It would just match the topology of the host | 16:29 |
efried | that's exactly what we're doing. | 16:29 |
efried | the U proposal does it imperfectly, 80/20. | 16:29 |
efried | For V we can work on can_split to get that other 20% | 16:29 |
efried | and then we can remove the [workaround]. | 16:29 |
stephenfin | From the libvirt XML perspective, yeah, but not from placement perspective | 16:30 |
efried | sorry, wha? | 16:31 |
stephenfin | The only guests that *need* NUMA affinity are pinned instances, yeah? | 16:34 |
stephenfin | and those with hugepage, but that's a self-inflicted wound | 16:34 |
stephenfin | the pinned ones need it because their cores are pinned to host cores from a specific NUMA node | 16:35 |
stephenfin | whereas unpinned instances are floating across all (enabled) host cores | 16:35 |
stephenfin | so they naturally have affinity to everything, even though we don't properly expose that information to the guest | 16:36 |
efried | does memory float too? | 16:37 |
*** ociuhandu has quit IRC | 16:37 | |
stephenfin | if you don't provide a pagesize, yes | 16:37 |
stephenfin | but otherwise, no. that's the self-inflicted wound I talked about above | 16:37 |
efried | Then why are we modeling 4k pages under NUMA nodes? | 16:38 |
efried | Wait, *can* you pin 4k pages? | 16:38 |
stephenfin | I _think_ so, yeah | 16:39 |
stephenfin | if you use the strict mem policy | 16:39 |
stephenfin | actually, I don't think it's pinning in the traditional sense | 16:39 |
stephenfin | because they can be shared | 16:39 |
*** udesale_ has quit IRC | 16:40 | |
stephenfin | sorry, I'm in a meeting so I can't formulate my thoughts properly. gimme 20 | 16:40 |
*** jmlowe has quit IRC | 16:40 | |
*** ociuhandu has joined #openstack-nova | 16:42 | |
efried | Ight. I don't have the depth of understanding to refute the "unpinned/floating" argument, which I was kinda advocating the other day (albeit probably without specifics). Going to need dansmith to take that on. | 16:42 |
*** gyee has joined #openstack-nova | 16:43 | |
dansmith | I think maybe he's talking about the case where you're overcommitting memory | 16:45 |
dansmith | I'm also really not an expert on the low-level details, so maybe we've gotten lost in the woods a bit, | 16:47 |
efried | tobiash: What's the word on https://review.opendev.org/#/c/572805/ ? Spec freeze is today. | 16:49 |
dansmith | I'm trying to translate what I know of reports of what people do, vs. what they would like to do, and what makes sense into what we should be doing | 16:50 |
*** Sundar has joined #openstack-nova | 16:50 | |
tobiash | efried: sorry, I was busy with other things in the meantime, I fear it has to be postponed to the next release :( | 16:51 |
efried | tobiash: Okay, thanks, I'll do that. | 16:51 |
tobiash | thanks a lot | 16:51 |
*** tesseract has quit IRC | 16:59 | |
bauzas | efried: sorry, was on meeting | 16:59 |
* bauzas reads any comments on the spec | 17:00 | |
efried | bauzas: I really just want to know what you think of my French | 17:00 |
bauzas | LOL | 17:01 |
gibi | efried: based on Tushar's comment on the spec bp/support-shared-storage-resource-provider can be deferred out from U | 17:01 |
* efried looks... | 17:01 | |
bauzas | efried: we have some french continuous present but not really like yours | 17:02 |
efried | no, like I said, I don't see anybody ever saying or writing anything like that. | 17:02 |
bauzas | efried: anyway, I see your -1 but I intentionally flipped the default to *not* reshape as discussed between sean-k-mooney, stephenfin and me | 17:02 |
efried | bauzas: to me, that's the crux | 17:02 |
efried | If we don't reshape by default, everything changes. | 17:03 |
bauzas | efried: because of the potentiality of the regressions we could get | 17:03 |
bauzas | efried: I know | 17:03 |
bauzas | efried: and I wanted to discuss this with you | 17:03 |
bauzas | because I'm very afraid of any potential issue we would have in Ussuri | 17:03 |
efried | stephenfin and dansmith need to be involved, but I think they're on calls rn | 17:03 |
bauzas | if we flip to changing the world | 17:03 |
bauzas | my point is, I don't know the figure but not all clouds care about NUMA | 17:04 |
bauzas | for those clouds, I'd prefer us to not change their lifes | 17:04 |
bauzas | and pretending there will be no regressions | 17:04 |
efried | So dansmith's argument was "don't care about NUMA" does *not* mean "give me shitty performance". | 17:05 |
bauzas | on the other hand, if we allow a default to be "no-op", then we can work on the NUMA implementation seamlessly and iteratively like we did for Cells v2 | 17:05 |
efried | The [workaround] and reversible reshape gives you the way to deal with regressions. | 17:05 |
dansmith | I don't really understand.. I thought the agreement was to default the new behavior off for U, not reshape by default, let people opt-in during U and then flip the default (or remove it) for V? | 17:05 |
efried | ugh, no, if we were doing that, there would be no good motivation to hack the splitting thing in. | 17:05 |
efried | And also no motivation for operators to opt in. | 17:06 |
efried | so it would be a waste of effort. | 17:06 |
bauzas | dansmith: that's what i wrote but looks like the outcome of tuesday's discussion between you, sean-k-mooney and efried was the other way | 17:06 |
dansmith | bauzas: not that I remember | 17:06 |
dansmith | but I'll admit to being completely exhausted by this conversation | 17:06 |
bauzas | efried: dansmith: floor is yours | 17:06 |
bauzas | dansmith: and tbh, me too | 17:06 |
efried | you people need to do more cardio | 17:07 |
bauzas | I do gym twice a week | 17:07 |
stephenfin | lyarwood: comments on https://review.opendev.org/#/c/706880/ | 17:07 |
bauzas | (and skiing, but that's irrelevant) | 17:07 |
bauzas | efried: anyway, my point is, | 17:07 |
stephenfin | dansmith, efried, bauzas: so, from the top | 17:07 |
bauzas | we can't ask operators to modify their configs *before* they upgrade or *before* they restart their clouds | 17:08 |
stephenfin | from what CERN are saying, they're already dividing their hosts into those for NUMA and those for not NUMA | 17:08 |
efried | nope. We're not asking that. | 17:08 |
bauzas | efried: I know, but what you propose will frighten them | 17:08 |
stephenfin | so I'm not sure why we can't do the same in placement | 17:08 |
bauzas | because, we flip to NUMA everywhere | 17:08 |
bauzas | for CERN, that would mean non-NUMA cells would be NUMA-speaking from Ussuri | 17:09 |
efried | which will happen at some point anyway. | 17:09 |
bauzas | and they would have to let them speak... what? after this | 17:09 |
bauzas | efried: I don't disagree with you, and I think this could be Victoria | 17:09 |
efried | if we default 'off', nobody is going to switch it on. Then in V (or whenever) we switch the default and have this issue. | 17:09 |
stephenfin | if you care about NUMA affinity, configure things so placement speaks NUMA, otherwise YAGNI | 17:10 |
bauzas | but not Ussuri | 17:10 |
bauzas | stephenfin: that's what I propose | 17:10 |
stephenfin | it seems so much simpler | 17:10 |
bauzas | honestly, my vision of the work to do is : | 17:11 |
bauzas | 'plumb, plumb, plumb things on one side, and mark this feature as opt-in' | 17:11 |
bauzas | in the eventuality of a very bad situation close to RC1, then we just add an 'EXPERIMENTAL' flag on the option | 17:12 |
bauzas | boom, problem solved. | 17:12 |
efried | stephenfin: so in that scenario, you upgrade your control plane, and then any hw:numa*-havin flavors will simply refuse to land until you've upgraded *and* opted-in some hosts. | 17:12 |
stephenfin | no | 17:12 |
efried | or that's what the 'fallback' query is for | 17:12 |
stephenfin | yeah, short term fallback query like we do for PCPU | 17:13 |
efried | so, still you're doing two queries and either merging the results or violating pack/spread and server affinity groups. | 17:13 |
stephenfin | with a big ass warning saying "you're using this host for NUMA instances - update configuration now or perish in a future release" | 17:14 |
stephenfin | yeah, but some hosts can choose to never opt-in | 17:14 |
stephenfin | because they don't care | 17:14 |
efried | And you can land NUMA-aware flavors on either kind of host | 17:14 |
bauzas | yeah, that's my thoughts | 17:14 |
efried | What about NUMA-agnostic flavors? Those can only land on un-upgraded or un-reshaped hosts, right? | 17:14 |
efried | So one-way segregation? | 17:14 |
stephenfin | they never boot pinned instances and their instance floats across all (enabled) host cores as before | 17:15 |
bauzas | efried: non-NUMA would stick with non-NUMA hosts | 17:15 |
stephenfin | non-NUMA or non-upgraded | 17:15 |
bauzas | right | 17:15 |
stephenfin | because we can't distinguish | 17:15 |
bauzas | correct, we just say "not those hosts" | 17:15 |
efried | well, we could distinguish if we wanted to. | 17:15 |
bauzas | through a forbidden trait | 17:15 |
stephenfin | not without operator intervention | 17:16 |
stephenfin | the operator would have to do something to say "this host is intended to be a non-NUMA host" | 17:16 |
efried | we could make the segregation complete by simply adding a trait to (even unreshaped) U hosts. | 17:16 |
stephenfin | how do you tell the difference between unreshaped and intentionally non-NUMA hosts? | 17:17 |
efried | unreshaped U is intentionally non-NUMA. | 17:17 |
stephenfin | it can't be - you'd break upgrades | 17:17 |
efried | um | 17:18 |
efried | yes | 17:18 |
efried | that's what we're talking about doing. | 17:18 |
stephenfin | the query for NUMA-based instances in U would be "all NUMA hosts + all unreshaped hosts" | 17:18 |
stephenfin | the query for non-NUMA-based instances would be "all non-NUMA hosts + all unreshaped hosts" | 17:18 |
efried | or "all NUMA U hosts + all unreshaped pre-U hosts" and "all non-NUMA U hosts + all unreshaped pre-U hosts" | 17:18 |
efried | because I thought we were trying to segregate from U+ | 17:19 |
stephenfin | again, you'll break upgrades | 17:19 |
efried | how so? | 17:19 |
stephenfin | you might not be reshaping | 17:20 |
stephenfin | if NUMA'ness is optional long term | 17:20 |
stephenfin | by V, a host will identify itself as either caring about NUMA or not caring | 17:21 |
stephenfin | but before then, we're in an uncertain state where the host _might_ be NUMA or might not | 17:21 |
stephenfin | and we'd need the operator to do something to tell us which one it is | 17:21 |
bauzas | stephenfin: if the operator doesn't reshape, then all hosts are non-NUMA | 17:21 |
bauzas | we don't need to distinguish them | 17:22 |
lyarwood | stephenfin: thanks, just sent some comments back. FWIW it's part of this bugfix series https://review.opendev.org/#/q/topic:bug/1861071 | 17:22 |
bauzas | it's just that we're going to add a specific forbidden trait to ensure, either way, that non-NUMA instances can't land on NUMA hosts | 17:22 |
stephenfin | bauzas: how will you ever kill the fallback query in that case? | 17:23 |
bauzas | if the operator starts defining NUMA hosts, then they will shard their cloud, but I'm cool with it | 17:23 |
bauzas | stephenfin: the fallback query should only be for 'NUMA-aware' instances | 17:24 |
bauzas | .... aaaaand I probably messed this up | 17:24 |
bauzas | (in the last rev of the spec) | 17:24 |
stephenfin | we want to make sure a non-NUMA instance will not land on a NUMA host, but long term shouldn't we also make sure a NUMA instance won't land on a non-NUMA host? | 17:24 |
bauzas | stephenfin: yeah | 17:24 |
stephenfin | okay, then you need to find some way to indicate that yes, this *really* is a non-NUMA host | 17:25 |
stephenfin | that way your queries can be "give me all NUMA hosts and all unconfigured hosts, but *not* any non-NUMA hosts" | 17:26 |
stephenfin | and vice versa | 17:26 |
stephenfin | right? | 17:26 |
bauzas | sec, wrapping up things in my mind | 17:26 |
*** dave-mccowan has quit IRC | 17:26 | |
efried | so by your proposal, we actually need a three-way conf opt in U. | 17:27 |
bauzas | there are two timeframes in my mind | 17:27 |
bauzas | Ussuri where hosts can be unconfigured | 17:27 |
bauzas | (because default is no reshape) | 17:27 |
stephenfin | in a future release, those would simply become "give me all NUMA hosts" or "give me all non-NUMA hosts", depending on your instance type | 17:27 |
bauzas | Victoria where all hosts are configured | 17:27 |
stephenfin | efried: yeah, I was thinking a boolean that defaults to None | 17:27 |
stephenfin | I think we can do that | 17:27 |
stephenfin | none/unset | 17:28 |
efried | - "This host is NUMA" ==> reshape, only land hw:numa* flavors | 17:28 |
efried | - "This host is not NUMA" ==> no reshape, only land non-hw:numa* flavors | 17:28 |
efried | - None (default in U) ==> no reshape, looks just like a T host, land either type of flavor | 17:28 |
stephenfin | yup | 17:28 |
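The three-way option listed above can be written down as a small decision table. This is only a sketch of the behaviour described in the discussion, not nova code: the function name `flavor_may_land` and its parameters are hypothetical, and `host_is_numa` stands in for the proposed tri-state setting.

```python
from typing import Optional


def flavor_may_land(host_is_numa: Optional[bool], flavor_wants_numa: bool) -> bool:
    """Return True if a flavor may be scheduled to a host (hypothetical model).

    host_is_numa models the proposed three-way option:
      True  -> host is reshaped, only hw:numa* flavors land
      False -> host is not reshaped, only non-hw:numa* flavors land
      None  -> unset (default in U), looks like a T host, accepts either type
    """
    if host_is_numa is None:
        # Unconfigured host: no segregation yet.
        return True
    # Configured host: the flavor type must match the host type.
    return host_is_numa == flavor_wants_numa
```

Once the unset value is eventually removed, only the last line of the function would remain, which is the "give me all NUMA hosts" or "give me all non-NUMA hosts" end state.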
bauzas | I can write this | 17:28 |
efried | and then, what, make None illegal in V?? | 17:28 |
*** maciejjozefczyk has joined #openstack-nova | 17:28 | |
bauzas | efried: I'm cool with it | 17:28 |
efried | Thus breaking upgrades?? | 17:28 |
bauzas | nope | 17:28 |
stephenfin | V, W, X, ... at some point in the future | 17:28 |
bauzas | because, | 17:29 |
bauzas | we can test things | 17:29 |
bauzas | and see 'okay, look, this is harmless' | 17:29 |
bauzas | so, once we all agree, we remove the None value | 17:29 |
stephenfin | essentially this would become one of the things you have to configure | 17:29 |
stephenfin | like 'compute_driver' | 17:29 |
bauzas | and de facto all instances act upon NUMA checking | 17:29 |
efried | I mean, if we're going to segregate eventually, then at some point we "break upgrades". | 17:29 |
efried | btw, dansmith specifically said he didn't want two modes long term. | 17:30 |
stephenfin | yeah, but by that point they'll have had a couple of cycles of warnings saying "yo, you *really* need to set this config option" | 17:30 |
stephenfin | efried: yeah, I don't understand why that's a bad thing | 17:30 |
dansmith | I officially give up, please proceed. | 17:31 |
efried | sigh | 17:31 |
stephenfin | I get that all instances should have some kind of NUMA awareness | 17:31 |
efried | okay, back to PS16 | 17:31 |
efried | stephenfin: tbc, if we go this route, we don't need can_split ever, right? | 17:32 |
stephenfin | but it's a nice-to-have and I don't imagine everyone really cares | 17:33 |
* stephenfin doesn't care what NUMA node Chrome is running on | 17:33 | |
stephenfin | efried: correct | 17:33 |
stephenfin | if we're going with the "everything is mapped to NUMA", then I think we should move the ball forward on 'can_split' instead | 17:33 |
stephenfin | because if we don't, it won't ever happen :) | 17:34 |
*** evrardjp has quit IRC | 17:34 | |
stephenfin | implement that, then use it for NUMA in V | 17:34 |
*** evrardjp has joined #openstack-nova | 17:34 | |
bauzas | folks, you lost me | 17:34 |
stephenfin | but as cdent saw from the openstack-discuss thread, no one's really asking for their NUMA-based instance to coexist alongside their "I don't care about NUMA"-based instances | 17:35 |
stephenfin | bauzas: A boolean '[compute] enable_numa' option that defaults to unset (None) | 17:35 |
efried | bauzas: that ^, but otherwise PS16. | 17:35 |
stephenfin | when unset, we start flashing a warning saying "you need to decide if this host is meant for NUMA-based instances or not" | 17:36 |
stephenfin | i.e. "go configure this option" | 17:36 |
bauzas | and no "everything is NUMA and good luck finding a host that can fit your non-NUMA instance"? | 17:36 |
stephenfin | not needed, IMO | 17:36 |
bauzas | yeah I agree | 17:37 |
stephenfin | it's so much more additional complexity for idk how much gain | 17:37 |
bauzas | ok, it's 6:37pm here and I will have to eat soon | 17:37 |
bauzas | I'm rushing to provide another round | 17:37 |
*** martinkennelly has quit IRC | 17:38 | |
stephenfin | Yeah, I've to go but feel free to +2 in my absence if the spec roughly maps to the above ^^^ I'm onboard with that approach | 17:38 |
efried | As PTL I decree that we can do the final approvals tomorrow morning. | 17:39 |
efried | rather than try to rush it through "tonight". | 17:39 |
stephenfin | sounds good to me (y) | 17:40 |
bauzas | efried: I appreciate your help but I'll still stick with working on a rev tonight | 17:42 |
efried | k | 17:42 |
stephenfin | huaqiang: https://review.opendev.org/#/c/668656/ acked too, btw. Thanks for sticking with that | 17:42 |
efried | saying, I won't proxy stephenfin's +2 tonight; it's fine to wait til morning for that. | 17:42 |
efried | ah, woot | 17:42 |
efried | gibi: re DISK_GB, save me reading the comment history, are you saying that the nova spec will be dependent on the placement change? | 17:45 |
*** ociuhandu_ has joined #openstack-nova | 17:45 | |
efried | ...and because the placement change won't happen in U, therefore the nova bp can be deferred? | 17:45 |
*** ociuhandu has quit IRC | 17:48 | |
*** eharney has joined #openstack-nova | 17:48 | |
*** ociuhandu_ has quit IRC | 17:50 | |
*** Sundar has quit IRC | 17:57 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: virt: Provide block_device_info during rescue https://review.opendev.org/700811 | 17:58 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Add support for stable device rescue https://review.opendev.org/700812 | 17:58 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: compute: Report COMPUTE_RESCUE_BFV and check during rescue https://review.opendev.org/701429 | 17:58 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: api: Introduce microverion 2.82 allowing boot from volume rescue https://review.opendev.org/701430 | 17:58 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: compute: Extract _get_bdm_image_metadata into nova.utils https://review.opendev.org/705212 | 17:58 |
*** derekh has quit IRC | 17:59 | |
efried | brinzhang: What's the story on https://review.opendev.org/#/c/580336/ (bp/destroy-instance-with-datavolume)? We're at spec freeze... | 18:00 |
openstackgerrit | Merged openstack/nova-specs master: Use PCPU and VCPU in one instance https://review.opendev.org/668656 | 18:02 |
gmann | efried: can you remove -2 from this now as spec is merged and good to code- https://review.opendev.org/#/c/701609/ | 18:03 |
efried | gmann: Since we're at spec freeze, we should probably wait until we've decided which unfinished blueprints should be Direction:Approved. | 18:04 |
efried | If the code were ready, that would be different, but... | 18:04 |
gmann | efried: code is in progress so I am not sure if the author is still confused by the -2 | 18:05 |
efried | gmann: We can help educate the author :P | 18:05 |
gmann | but ok to wait till Direction:Approved decision | 18:05 |
*** maciejjozefczyk has quit IRC | 18:06 | |
gmann | commented on review the same. | 18:07 |
*** amoralej is now known as amoralej|off | 18:12 | |
efried | melwitt: are you now owning nova-audit? (https://review.opendev.org/#/c/693226/) | 18:20 |
melwitt | efried: I didn't want to but I think the answer is technically yes because dansmith lost interest | 18:20 |
efried | melwitt: well, I ask because we're at spec freeze, so you need to get a couple cores on board, ahem, today if it's going to happen in ussuri. | 18:21 |
bauzas | efried: melwitt: FWIW, this is related https://review.opendev.org/#/c/670112/ | 18:22 |
efried | it is? | 18:22 |
bauzas | technically, it's just a rename | 18:22 |
bauzas | but the intent of the spec is to provide a new specific command AFAICR | 18:23 |
bauzas | this change ^ would just be another subcommand | 18:23 |
melwitt | efried: yeah, I don't think that's going to happen. operators are interested but the spec didn't attract review from cores thus far and I don't think I could wrangle two that would not be considered part owners by the end of today | 18:24 |
efried | melwitt: if "tomorrow" would make the difference, I'm fine with that. Or do you just want me to defer? | 18:25 |
melwitt | bauzas: the intent of the spec is to organize all of the heal commands in one place and make them runnable as a daemon service so that they automatically heal your cloud periodically | 18:26 |
bauzas | oh missed the last part | 18:27 |
bauzas | gtk | 18:27 |
*** Sundar has joined #openstack-nova | 18:27 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs https://review.opendev.org/552924 | 18:28 |
bauzas | efried: ^ | 18:28 |
efried | ack | 18:28 |
*** igordc has joined #openstack-nova | 18:28 | |
*** igordc has quit IRC | 18:28 | |
bauzas | anyway, bailing out | 18:29 |
melwitt | efried: I guess yeah if you'll give it till tomorrow, I'll send some email and see if anyone's willing to review. if there's not interest after that, then punt it | 18:29 |
Sundar | dansmith: If https://review.opendev.org/#/c/673735/37/nova/conductor/manager.py@524 is not the right place to delete ARQs on a reschedule, do you have any suggestion for a better place? I could do it in the callers. | 18:29 |
efried | melwitt: ack. I'm adding it (with other open specs) to today's meeting agenda, if you want to drum up interest there. | 18:29 |
dansmith | melwitt: efried it seems highly unlikely that anything would get implemented in U either way, so I'm not sure it's worth that | 18:29 |
*** priteau has quit IRC | 18:29 | |
dansmith | I thought we were supposed to be trying to reduce the number of things we approved that aren't likely to make it, | 18:30 |
dansmith | but it kinda seems like we're doing the same ol' kind of behavior | 18:30 |
dansmith | Sundar: do it where it needs to be done, not inside a thing called something else.. so yes, wherever that's called from that is the right place | 18:31 |
melwitt | dansmith, efried: well, I could implement it quickly/dumbly (I'm imagining just moving the commands and adding a service) but getting review would be another story. worst case it sits there ready to go for V if ppl can't review in time. so, I dunno | 18:32 |
efried | dansmith: Yes, intend to do a sweep of Definition:Approved blueprints "soon" to decide which of those we can/should defer. | 18:33 |
efried | "spec freeze" -- no more definition approvals -- is what's happening now. | 18:34 |
openstackgerrit | John Garbutt proposed openstack/nova-specs master: Add Unified Limits Spec https://review.opendev.org/602201 | 18:37 |
efried | johnthetubaguy: Save me looking, did you squash the fup? | 18:45 |
efried | (abandon if so) | 18:45 |
*** dking_desktop has joined #openstack-nova | 18:50 | |
dking_desktop | I'm attempting to troubleshoot why I get the "No valid host was found." error when attempting to create a baremetal server, and just found this when I enabled debugging for the nova-scheduler: compute_status_filter request filter added forbidden trait COMPUTE_STATUS_DISABLED | 18:51 |
dking_desktop | Could that be the reason why I'm not able to find a valid host? How would I troubleshoot this further? | 18:52 |
*** ralonsoh has quit IRC | 18:52 | |
efried | dking_desktop: We always add that trait. It's only going to have an effect if the compute host is exposing that trait. You can check with a command like | 19:03 |
efried | openstack resource provider trait list $host_uuid | 19:03 |
efried | (I may not have the syntax exactly right -- see the docs) | 19:04 |
*** jawad_axd has joined #openstack-nova | 19:06 | |
dking_desktop | I'm using Ironic if that helps. I don't see anything for "openstack resource". Is it "openstack service provider list"? | 19:07 |
dking_desktop | Oh, maybe "openstack baremetal node trait list" | 19:08 |
dking_desktop | efried: I tried "openstack baremetal node trait list <UUID>", but that gave no results. Is that the problem? | 19:09 |
efried | dking_desktop: You need to install the osc-placement plugin to get the 'resource provider' subcommands | 19:10 |
efried | pip install osc-placement (or equivalent for your distro) | 19:10 |
efried | COMPUTE_STATUS_DISABLED isn't a trait that ironic itself would know about. | 19:10 |
*** jawad_axd has quit IRC | 19:11 | |
dking_desktop | Odd. I get "Operation or argument is not supported with version 1.0; requires at least version 1.6" | 19:12 |
dking_desktop | I wonder what software that refers to. The osc-placement package should be 1.8.0. | 19:14 |
dking_desktop | python-openstackclient is 4.0.0. Is there another way to check? It's good to know that isn't specifically an Ironic thing. However, my regular VMs work fine. It's only the baremetal nodes causing me trouble. | 19:16 |
dking_desktop | Oh, I add that to the command line. Okay, I can run that, but no mention of the above trait. | 19:19 |
*** READ10 has quit IRC | 19:20 | |
dking_desktop | http://paste.openstack.org/show/789544/ | 19:21 |
dking_desktop | efried: I notice that the above output doesn't show nearly as much information as I see for my compute node. Would the problem be that there's just not any information there about the CPU, etc.? | 19:25 |
efried | dking_desktop: sorry, yes, you need to specify a microversion for almost every OSC command with placement, as OSC defaults to 1.0 and very little in placement worked at that microversion. You can use an environment variable if you'd rather not have to think about it with every command. | 19:26 |
efried | dking_desktop: When you say "as I see for my compute node", you mean a libvirt host? | 19:27 |
efried | is that what compute1.stack1 is? | 19:27 |
dking_desktop | Correct | 19:30 |
efried | Next thing to look at is the inventory of your ironic node vs. the flavor you're trying to deploy. | 19:30 |
efried | If you're seeing COMPUTE_STATUS_DISABLED in play, your control plane is at least at Train, which means your node is supposed to be at least at Stein, by which time we had cut ironic over to reporting single-unit custom resource classes. | 19:30 |
efried | something like | 19:31 |
efried | openstack resource provider inventory list (or maybe show) $node_uuid | 19:31 |
efried | should show you that. | 19:31 |
dking_desktop | I'm using train. | 19:31 |
efried | okay, so you might want to make life easier with | 19:32 |
efried | export OS_PLACEMENT_API_VERSION=1.36 | 19:32 |
efried | (I think I'm spelling that var name right) | 19:32 |
efried | Then you won't have to add --os-placement-api-version with every command. | 19:32 |
dking_desktop | Okay, that shows me the resource class and a few other pieces of info: http://paste.openstack.org/show/789545/ | 19:32 |
efried | Great. So the flavor you're using should be asking for resources:CUSTOM_BAREMETAL_RESOURCE_CLASS=1 | 19:34 |
*** READ10 has joined #openstack-nova | 19:34 | |
efried | is it? | 19:34 |
dking_desktop | Yes, it is: http://paste.openstack.org/show/789546/ | 19:35 |
efried | the fact that your trait list didn't show COMPUTE_STATUS_DISABLED, and your inventory showed reserved=0, is two out of the three things that should make this node eligible for scheduling. | 19:35 |
dking_desktop | That sounds good. Any idea why I would be getting "No valid host was found." | 19:36 |
efried | okay, I need to go check whether you need to do something special for the ram and disk (set them to zero) but I think those should be ignored. Meanwhile, the last of the three ^ things is to make sure there's no allocation present. | 19:36 |
efried | openstack resource provider allocation list (or show?) $node_uuid | 19:36 |
dking_desktop | That's empty. | 19:37 |
efried | Okay. Easier than me finding that code would be checking your placement logs. | 19:37 |
efried | Look for a line that has a GET call to the /allocation_candidates route with a querystring that includes CUSTOM_BAREMETAL_RESOURCE_CLASS | 19:38 |
efried | Okay, yeah, it looks like you need to set the flavor VCPUs to zero to make this work. | 19:40 |
efried | dking_desktop: like this: https://docs.openstack.org/ironic/latest/install/configure-nova-flavors | 19:41 |
efried | (not just VCPU, MEMORY_MB and DISK_GB too) | 19:42 |
dking_desktop | efried: Sorry, took me a minute to find it: http://paste.openstack.org/show/789547/ | 19:42 |
dking_desktop | Ah, so the VCPUs could be the problem? Let me see if I can update that. | 19:42 |
efried | Yup, so you see where that query is *also* asking for DISK_GB%3A1%2CMEMORY_MB%3A512%2CVCPU%3A1 (DISK_GB:1,MEMORY_MB:512,VCPU:1)? | 19:43 |
efried | your baremetal node's inventory doesn't have any of those resources. | 19:44 |
efried | Those are being fed in from your base flavor's disk/ram/vcpus | 19:44 |
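The percent-encoded fragment quoted above can be decoded mechanically when reading placement logs. The following is a minimal sketch using only the standard library; the helper name `parse_resources` is hypothetical and the parsing assumes the simple `NAME:amount,NAME:amount` shape shown in the log line.

```python
from urllib.parse import unquote


def parse_resources(encoded: str) -> dict:
    """Decode a percent-encoded 'NAME:amount,NAME:amount' resources value."""
    decoded = unquote(encoded)  # %3A -> ':' and %2C -> ','
    resources = {}
    for item in decoded.split(","):
        name, amount = item.split(":")
        resources[name] = int(amount)
    return resources


print(parse_resources("DISK_GB%3A1%2CMEMORY_MB%3A512%2CVCPU%3A1"))
# prints {'DISK_GB': 1, 'MEMORY_MB': 512, 'VCPU': 1}
```

Any resource class that shows up here but not in the node's inventory is enough to exclude the node from the candidates.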
*** READ10 has quit IRC | 19:45 | |
efried | So the fix (ahem, it's a hack, I am ashamed) is to explicitly override those with zeros to take them out of the query. | 19:45 |
efried | And I think we did that hack because letting you set the base flavor values to real zeros would have blown up the code in a billion places | 19:46 |
efried | dking_desktop: anyway, if you follow https://docs.openstack.org/ironic/latest/install/configure-nova-flavors you should be able to make it work. | 19:46 |
dking_desktop | I'm having trouble getting it to accept 0. | 19:47 |
*** jawad_axd has joined #openstack-nova | 19:48 | |
efried | dking_desktop: You can't set the base flavor properties to zero. You have to set *additional* extra specs for resources:{VCPU, MEMORY_MB, DISK_GB} | 19:48 |
efried | ... to zero | 19:48 |
efried | That will cause the scheduler to ignore the base vcpus/ram/disk values. | 19:49 |
efried | ...which can be set to whatever. | 19:49 |
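The override behaviour described above can be modelled roughly as follows. This is a simplified sketch of the idea, not the scheduler's actual code; the function name `requested_resources` and its arguments are hypothetical, and it assumes that a `resources:<CLASS>` extra spec replaces the amount derived from the base flavor and that zero amounts are dropped from the request.

```python
def requested_resources(vcpus: int, ram_mb: int, disk_gb: int, extra_specs: dict) -> dict:
    """Build the resource request for a flavor (simplified, hypothetical model)."""
    # Amounts implied by the base flavor properties.
    request = {"VCPU": vcpus, "MEMORY_MB": ram_mb, "DISK_GB": disk_gb}
    # 'resources:<CLASS>' extra specs override or add to the base amounts.
    prefix = "resources:"
    for key, value in extra_specs.items():
        if key.startswith(prefix):
            request[key[len(prefix):]] = int(value)
    # A zero amount takes the resource class out of the request entirely.
    return {rc: amount for rc, amount in request.items() if amount > 0}


baremetal_specs = {
    "resources:CUSTOM_BAREMETAL_RESOURCE_CLASS": "1",
    "resources:VCPU": "0",
    "resources:MEMORY_MB": "0",
    "resources:DISK_GB": "0",
}
print(requested_resources(1, 512, 1, baremetal_specs))
```

With the zero overrides in place only the custom resource class remains in the request, which is the one thing the baremetal node's inventory actually provides.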
dking_desktop | Ah. Yes, that's what's in the article, so that makes sense. Let me try that. | 19:50 |
*** jawad_axd has quit IRC | 19:53 | |
dking_desktop | Great! That got much further. The build still failed, and I need to investigate that, but at least it started trying. | 19:55 |
efried | Okay, good deal. | 19:56 |
dking_desktop | I got: 'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance cc78edb5-0268-4d71-a48d-61608b532d6f.' | 19:59 |
dking_desktop | Do you know where in the logs I might see where that failed? | 19:59 |
*** lbragsta_ has joined #openstack-nova | 20:00 | |
dking_desktop | Oh, I see it in nova-compute-ironic.log. It seems that the deploy image needs to be a UUID and not a name. | 20:01 |
efried | dking_desktop: I assume that was in your controller logs. You want to look in the compute log on the host that owns your ironic node to see why it bounced. | 20:01 |
dking_desktop | Yeah. nova-compute-ironic.log showed: Validation of image href deploy-initrd failed, reason: Scheme-less image href is not a UUID. | 20:02 |
dking_desktop | efried: Thank you very much for your help! I wouldn't have been able to make progress today without it. | 20:06 |
efried | dking_desktop: You're welcome. | 20:07 |
efried | dking_desktop: Reading between the lines, you're trying $old_release configurations/images/etc against $new_release code, and running into stuff we changed along the way. | 20:08 |
*** jawad_axd has joined #openstack-nova | 20:09 | |
efried | dking_desktop: since it's ironic you're trying to deploy, you might also try the #openstack-ironic channel, as they'll generally be more familiar with the quirks in that area. dtantsur|afk I think would be particularly helpful, though he's Eastern Europe so best to hit him earlier in the day. | 20:09 |
dking_desktop | Earlier, I think I wasn't able to find any good documentation. My google searches kept taking me to outdated documentation, and so I think I started using that, only to find trouble down the road after I'd forgotten where I'd found the information. | 20:10 |
efried | ugh | 20:10 |
dking_desktop | And yes, the timezone issues make things difficult. I do most of my work while they're sleeping, though they're helpful early in my day. | 20:11 |
efried | well, what might work *sometimes* is, if you hit a page under docs.openstack.org/$proj/$rele/..., try replacing $rele with `latest` | 20:11 |
efried | You'll either get the latest instructions, or you'll get a 404, which means whatever you're trying to do won't work anyway :P | 20:11 |
efried | (in your case s/$rele/train/) | 20:12 |
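The URL trick above is easy to script when a search engine keeps returning pages for an old release. A minimal sketch, assuming the `docs.openstack.org/$proj/$rele/...` layout mentioned in the conversation; the helper name `swap_release` and the example release `pike` are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_release(url: str, release: str = "latest") -> str:
    """Replace the release segment of a docs URL shaped like /$proj/$rele/..."""
    parts = urlsplit(url)
    segments = parts.path.split("/")  # ['', proj, release, ...]
    if len(segments) < 3 or not segments[2]:
        raise ValueError("URL path does not look like /$proj/$rele/...")
    segments[2] = release
    return urlunsplit(parts._replace(path="/".join(segments)))


old = "https://docs.openstack.org/ironic/pike/install/configure-nova-flavors.html"
print(swap_release(old))           # latest docs
print(swap_release(old, "train"))  # docs for a specific release
```

The rewritten URL may still return a 404 if the page was moved or removed in the target release.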
efried | bauzas, sean-k-mooney, stephenfin: Responded on https://review.opendev.org/#/c/552924/ (NUMA RPs). | 20:12 |
efried | I would update it myself, but then stephenfin and gibi would have to be the approvers. | 20:13 |
*** jawad_axd has quit IRC | 20:13 | |
efried | I guess we pretty much have to wait for morning anyway, so I'll just leave it. | 20:13 |
dking_desktop | Thank you so much. I think that now that I'm seeing the correct documentation, I may just start over and read each piece again in order. | 20:15 |
efried | dking_desktop: Cool. If in doing that you find discrepancies (or even typos), please propose edits. | 20:16 |
efried | ...or open bugs | 20:16 |
efried | ...or at least shout in here and we'll help you do ^ | 20:16 |
dking_desktop | Thanks. I submitted some a while back, and I've been the inspiration for a few more fixes already. Even if I'm having trouble, at least, there's updates being made somewhere. | 20:17 |
*** rosmaita has left #openstack-nova | 20:18 | |
*** factor has joined #openstack-nova | 20:18 | |
*** brinzhang has quit IRC | 20:27 | |
*** jawad_axd has joined #openstack-nova | 20:29 | |
efried | Nova meeting in half an hour in #openstack-meeting. | 20:30 |
efried | Lots to discuss. Please plan to attend. | 20:30 |
*** jmlowe has joined #openstack-nova | 20:32 | |
sean-k-mooney | efried: sorry i was picking up the keys to my new house this afternoon. so i missed the spec discussions. im just getting dinner now but ill try and look at them in an hour or so. | 20:33 |
sean-k-mooney | i can try and attend the nova meeting too | 20:33 |
*** jawad_axd has quit IRC | 20:34 | |
efried | sean-k-mooney: congrats on the house | 20:35 |
sean-k-mooney | now all i need is furniture, broadband, utilities and to move all my stuff :) | 20:36 |
*** Sundar has quit IRC | 20:40 | |
efried | sean-k-mooney: would be nice to have a nova-side delegate who attended the vol local cache meeting. | 20:41 |
efried | I'm watching the replay, but I'm afraid it's not going to help me understand whether this is going to fly for U. | 20:41 |
efried | I assume lyarwood is sleeping? | 20:42 |
efried | sean-k-mooney: also, I added you to the os-vif release patch. Assume no reason not to merge that? https://review.opendev.org/707018 | 20:43 |
*** jawad_axd has joined #openstack-nova | 20:50 | |
*** jawad_axd has quit IRC | 20:54 | |
*** nweinber has quit IRC | 21:08 | |
sean-k-mooney | not that i know of. i will check the review queue and +1 it soon | 21:10 |
*** N3l1x has quit IRC | 21:13 | |
*** xek has quit IRC | 21:25 | |
*** lbragsta_ has quit IRC | 21:29 | |
*** penick has joined #openstack-nova | 21:31 | |
sean-k-mooney | melwitt: it sounds like the quota/flavor counting is borderline a bug fix but ya if you think its better to do in V then thats fine | 21:33 |
sean-k-mooney | i say that because once we fix it upstream im sure we will be asked to backport it downstream | 21:34 |
*** penick has quit IRC | 21:34 | |
melwitt | sean-k-mooney: normally it would be but I think because we need to leverage a new feature in placement and as part of that we'd have to migrate all of nova's allocations to have proper consumer types set, it's too big to be a bug fix. it's a spec | 21:35 |
sean-k-mooney | ah right ya that makes sense | 21:35 |
sean-k-mooney | i didnt think about the need to do a data migration of the allocations. | 21:41 |
*** jmlowe has quit IRC | 21:46 | |
*** penick has joined #openstack-nova | 21:50 | |
*** slaweq has quit IRC | 22:01 | |
*** jmlowe has joined #openstack-nova | 22:02 | |
*** jmlowe has quit IRC | 22:08 | |
*** slaweq has joined #openstack-nova | 22:11 | |
*** jawad_axd has joined #openstack-nova | 22:13 | |
efried | sean-k-mooney: you know that the num_implicit_numa_nodes thing is dead as of the latest rev, right? | 22:15 |
efried | so, your last comment is n/a? | 22:15 |
*** slaweq has quit IRC | 22:16 | |
sean-k-mooney | i just noticed that now | 22:17 |
sean-k-mooney | so we are not doing the splitting? | 22:17 |
*** eharney has quit IRC | 22:18 | |
sean-k-mooney | i havent fully got up to speed with what changed since v19-21 | 22:18 |
*** jawad_axd has quit IRC | 22:18 | |
*** jmlowe has joined #openstack-nova | 22:18 | |
sean-k-mooney | i see that the reporting is now a tristate True|false|none | 22:18 |
efried | right. But that's not explained at all in the doc; it needs to be. | 22:19 |
sean-k-mooney | are we still going to do the implicit numa generation for non numa guests? or has that been removed too | 22:19 |
*** penick has quit IRC | 22:21 | |
efried | removed | 22:22 |
efried | sean-k-mooney: we're back to segregating | 22:22 |
sean-k-mooney | ok | 22:22 |
sean-k-mooney | i might still propose my automatic asymmetric numa node change as a bug fix then | 22:23 |
sean-k-mooney | then someone can tell me its a feature and it can wait to V | 22:23 |
*** penick has joined #openstack-nova | 22:24 | |
efried | You mean: | 22:24 |
efried | Today if you say hw:numa_nodes=$x and we can't split $x evenly we bounce; | 22:24 |
efried | With your fix we would split asymmetrically, as close to evenly as possible | 22:24 |
efried | ? | 22:24 |
*** brinzhang has joined #openstack-nova | 22:24 | |
sean-k-mooney | yep | 22:24 |
sean-k-mooney | no other change | 22:24 |
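The "as close to evenly as possible" split described above can be illustrated with a few lines of arithmetic. This is a sketch of the idea only and is not the code in nova's `hardware.py`; the function name `split_evenly` is hypothetical.

```python
def split_evenly(total: int, nodes: int) -> list:
    """Split `total` units across `nodes` groups as close to evenly as possible."""
    if nodes < 1 or total < nodes:
        raise ValueError("need at least one unit per node")
    base, extra = divmod(total, nodes)
    # The first `extra` nodes each take one additional unit.
    return [base + 1 if i < extra else base for i in range(nodes)]


print(split_evenly(5, 2))  # prints [3, 2]
print(split_evenly(4, 2))  # prints [2, 2]
```

When the total divides evenly the result is the symmetric split that already works today; the asymmetric case only differs by giving the remainder to the first nodes.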
efried | What's the error for that bounce today? | 22:25 |
efried | From the API, I imagine? | 22:25 |
sean-k-mooney | we have an exception we raise from the api yes | 22:25 |
sean-k-mooney | that says we cant generate an asymmetric numa node configuration and you have to manually set it in the flavor/image | 22:25 |
sean-k-mooney | so it fails only at server boot but before we even create a build request | 22:26 |
efried | So yeah, I'm not sure if that counts as a bug fix or a feature. I'm also not sure whether it's important that it be discoverable or optional. | 22:26 |
sean-k-mooney | i think you will get a 400 | 22:26 |
efried | I can see the argument either way. | 22:26 |
sean-k-mooney | its a trivial change to this function https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1564-L1579 | 22:27 |
efried | yeah, I understand the change | 22:28 |
efried | maybe gmann could weigh in as to whether it would need a microversion. | 22:28 |
efried | I would be okay without one, I think. Basically today if you have flavors that look like that they are useless. | 22:28 |
sean-k-mooney | it currently returns a 400 https://github.com/openstack/nova/blob/0d3aeb0287a0619695c9b9e17c2dec49099876a5/nova/exception.py#L1776 | 22:29 |
efried | and it's not like you're sitting around trying them again and again to see if maybe they work now. | 22:29 |
efried | you would have tried them, gotten the bounce, and (if you understood the issue) just deleted them or modified them to work. | 22:29 |
efried | so this would just allow you to start making new flavors that aren't subject to that limitation. | 22:29 |
dking_desktop | I know this is the wrong place for this, but would anybody here happen to know, when a baremetal server is being deployed with "openstack server create ...", and it reboots to get a DHCP request, what service should be handling the DHCP request? I'm assuming that it's attempting to get the deploy_image from glance somehow. | 22:29 |
sean-k-mooney | well no so you can set them in the image | 22:29 |
efried | meh, same same. | 22:30 |
* efried steps aside and lets sean-k-mooney handle question with "DHCP" in it... | 22:30 | |
sean-k-mooney | efried: the point is the flavor may have had 5 cpus and you tried to use it with an image that asked for 2 numa nodes | 22:30 |
efried | oh, I see. Then it's not quite so simple. | 22:30 |
sean-k-mooney | ya so today it would fail | 22:30 |
efried | I didn't know you could ask for numa topo via the image. But that makes sense now I think about it. | 22:30 |
sean-k-mooney | with a tiny change it would work | 22:31 |
dking_desktop | sean-k-mooney: Would you happen to have any idea there? I think the folks over in #openstack-ironic are overseas and sleeping. | 22:31 |
*** penick has quit IRC | 22:31 | |
sean-k-mooney | dking_desktop: the dhcp request is handled by the neutron dhcp agent | 22:31 |
sean-k-mooney | the way it works is that as part of the dhcp response we pass a dhcp option that tells the server where to find the ipxe image | 22:32 |
sean-k-mooney | that then deploys the ironic python agent | 22:32 |
sean-k-mooney | which connects to glance and streams the image onto the local disk of the ironic server | 22:32 |
efried | I'm out. | 22:33 |
efried | Good luck. | 22:33 |
efried | o/ | 22:33 |
dking_desktop | efried: Have a great night! | 22:33 |
*** jawad_axd has joined #openstack-nova | 22:34 | |
sean-k-mooney | dking_desktop: did that answer your question | 22:34 |
sean-k-mooney | by default the deploy image with the ironic python agent is served off a tftp share and pxe booted, not from glance | 22:34 |
dking_desktop | sean-k-mooney: Great! I suspected that. I'm looking at the neutron-dhcp-agent container. I see that it's running dnsmasq, but I can't see it listening anywhere. | 22:35 |
sean-k-mooney | dking_desktop: that said i know ironic have been working on redfish and http boot | 22:35 |
sean-k-mooney | dking_desktop: it will be running in a network namespace | 22:35 |
dking_desktop | I'd love to use Redfish. Unfortunately, I think there's a flaw in my server's redfish implementation that causes it to fail when moving it to manage. | 22:35 |
sean-k-mooney | ya i have seen that although i have only worked with preproduction servers that had redfish support so i was just happy it booted :) | 22:36 |
*** jmlowe has quit IRC | 22:36 | |
*** jawad_axd has quit IRC | 22:38 | |
dking_desktop | I do like redfish, though. I'm using it for everything outside of openstack. | 22:39 |
sean-k-mooney | engineering samples of motherboards or alpha bios roms are not your friend when trying to get redfish to work however | 22:40 |
dking_desktop | It seems that I'm not familiar with network namespaces. That's something new I suppose that I'll need to learn about. I did find the configuration file, though. It only has one line, and that's for "log-facility". | 22:41 |
sean-k-mooney | dking_desktop: so anyway if you log into the network node and do "ip netns" you should see a bunch of network namespaces | 22:41 |
sean-k-mooney | the one where dnsmasq is running will be dhcp_<network uuid> i think | 22:42 |
dking_desktop | There's just a couple of them at the moment, and one is the "qdhcp-..." | 22:42 |
sean-k-mooney | yep that is likely the one | 22:42 |
sean-k-mooney | q stands for quantum which is what neutron was originally called | 22:43 |
*** penick has joined #openstack-nova | 22:43 | |
sean-k-mooney | so if you do "sudo ip netns exec qdhcp-.... bash" you will spawn a bash shell in the network namespace | 22:44 |
sean-k-mooney | then you can do "netstat -nlp" | 22:44 |
sean-k-mooney | and you should see it listening on port 53? | 22:44 |
sean-k-mooney | that is the dhcp port right | 22:44 |
dking_desktop | I was thinking it was port 67. I think 53 is DNS. | 22:45 |
dking_desktop | But yes, both are there. | 22:45 |
sean-k-mooney | ah you are right 53 is dns | 22:45 |
*** mriedem has quit IRC | 22:45 | |
dking_desktop | That's pretty neat. I see that I still have much to learn. | 22:46 |
sean-k-mooney | so if you install tcpdump or tshark you should be able to dump the dhcp packets | 22:46 |
sean-k-mooney | i prefer tshark (the cli for wireshark) since it prints the packets more nicely | 22:46 |
sean-k-mooney | so "tshark -i <interface> -V dhcp" | 22:47 |
sean-k-mooney | the -V is what prints the full packet | 22:47 |
sean-k-mooney | it might not recognise dhcp in which case you would do 'tshark -i <interface> -V udp port 67 or 68' | 22:48 |
*** jmlowe has joined #openstack-nova | 22:49 | |
dking_desktop | I've been using tcpdump. So, I see that inside the network namespace, my devices are limited to just the loopback, and another, which I'm assuming is from an ovs bridge port. | 22:49 |
sean-k-mooney | yes | 22:49 |
sean-k-mooney | what is the actual issue you are having by the way | 22:51 |
dking_desktop | Is that for the provisioning_network? | 22:51 |
sean-k-mooney | so it depends on how you have it set up. i believe you can either use a separate provisioning network with a dnsmasq managed by ironic or you can use a neutron network | 22:51 |
sean-k-mooney | i should point out that i have not used ironic in about 4 releases so they could have changed things. | 22:52 |
dking_desktop | The issue is that I'm trying to deploy a baremetal server. Where I'm at currently is that I have created the baremetal node, introspected it, provided it, and I'm attempting to "openstack server create". I see that the node reboots and sends a DHCP request, but it gets no response, so it never completes the BUILD. | 22:52 |
sean-k-mooney | ah ok | 22:53 |
sean-k-mooney | is your provisioning network a neutron network | 22:53 |
sean-k-mooney | if so did you make it a flat network | 22:53 |
dking_desktop | Yes, but I'm pretty sure I set it up incorrectly. I'm still trying to get familiar with openstack networking. | 22:53 |
sean-k-mooney | or are you using the external provisioning network approach where the network is not managed by openstack | 22:53 |
sean-k-mooney | dking_desktop: i think the issue you are hitting is that ironic only optionally uses neutron | 22:54 |
*** TxGirlGeek has joined #openstack-nova | 22:54 | |
sean-k-mooney | in older releases provisioning was handled by a non-neutron network | 22:54 |
sean-k-mooney | in more recent releases they use neutron | 22:55 |
sean-k-mooney | not all the docs are clear on what you should do in each case | 22:55 |
sean-k-mooney | i assume you are using enabled_network_interfaces=noop,flat,neutron and default_network_interface=neutron | 22:56 |
dking_desktop | I'm using train, currently. I'm open to whatever option works. I saw somewhere in the documentation that I should set cleaning_network, and then I got a complaint that I should set provisioning_network also. I didn't find any documentation, so I just made a flat network, and tried using that. | 22:56 |
sean-k-mooney | this is the relevant doc i think https://docs.openstack.org/ironic/train/install/configure-tenant-networks.html | 22:57 |
dking_desktop | enabled_network_interfaces = flat,neutron, and I don't have a default_network_interface. | 22:58 |
sean-k-mooney | dking_desktop: i think that is ok | 22:58 |
*** jmlowe has quit IRC | 22:58 | |
sean-k-mooney | it says if default_network_interface is not set the default network interface is determined by looking at the [dhcp]dhcp_provider | 22:59 |
sean-k-mooney | dking_desktop: did you disable security groups for your provisioning and cleaning network | 23:00 |
dking_desktop | I was just reading about that. I did not. | 23:02 |
dking_desktop | I suppose that I should set cleaning_network_security_groups and provisioning_network_security_groups ? Are those the group names or IDs? | 23:02 |
sean-k-mooney | usually the uuid | 23:03 |
sean-k-mooney | if introspection is working then you are 90% of the way there | 23:04 |
sean-k-mooney | as that means 1 ironic can manage the hardware over ipmi | 23:04 |
*** tkajinam has joined #openstack-nova | 23:04 | |
sean-k-mooney | 2 it can serve the introspection ram disk | 23:04 |
dking_desktop | Yep. It took quite some time to get that working. | 23:04 |
dking_desktop | So, I know that it at least can get a ram disk to boot. I just have to figure out how to get the networking straight for provisioning. | 23:05 |
*** nweinber has joined #openstack-nova | 23:06 | |
sean-k-mooney | ya unfortunately i think you will have to ask either the ironic or neutron folks | 23:07 |
sean-k-mooney | i have done it years ago but i dont use ironic often so i have forgotten most of it | 23:07 |
openstackgerrit | Brian Rosmaita proposed openstack/nova master: Reject boot request for unsupported images https://review.opendev.org/707738 | 23:08 |
sean-k-mooney | dking_desktop: do you need multi tenancy by the way for the provisioning network | 23:09 |
sean-k-mooney | if its a private cloud you could look at the simpler flat configuration | 23:10 |
dking_desktop | I might need to do that. Right now, I want to leave my options open. | 23:10 |
*** ivve has quit IRC | 23:11 | |
*** huaqiang has quit IRC | 23:11 | |
*** slaweq has joined #openstack-nova | 23:11 | |
dking_desktop | Is the "provisioning_network" only to get the ramdisk booted and deploy the server? So, once that's done, it's either not necessary, or perhaps only for status updates? | 23:11 |
*** huaqiang has joined #openstack-nova | 23:11 | |
sean-k-mooney | yes basically | 23:12 |
sean-k-mooney | it is the network that needs to have connectivity to where the image is located | 23:12 |
sean-k-mooney | and the tftp server | 23:12 |
sean-k-mooney | once the ironic node is provisioned it will normally use a different interface for the tenant to ssh in/have network connectivity out onto the datacenter | 23:13 |
sean-k-mooney | dking_desktop: you might be hitting this by the way https://docs.openstack.org/ironic/train/admin/troubleshooting.html#dhcp-during-pxe-or-ipxe-is-inconsistent-or-unreliable | 23:15 |
dking_desktop | So, maybe you can help me here. Inside of the network namespace, I'm not seeing any DHCP requests. That explains why I didn't see anything logged and no responses. | 23:15 |
*** slaweq has quit IRC | 23:16 | |
sean-k-mooney | ya so its possible the dhcp request is being dropped by the switch before it gets to the controller | 23:16 |
dking_desktop | So, how _should_ the packets be getting there? I see that this network interface is an ovs port inside of br-int. I know that br-int is patched to br-ex. | 23:17 |
*** nweinber has quit IRC | 23:17 | |
sean-k-mooney | yes and the br-ex should have a physical interface attached | 23:18 |
dking_desktop | The server is booting up using DHCP/PXE, but it is on a trunked port, so the packets are coming in untagged. I know that's caused me trouble before. | 23:18 |
sean-k-mooney | right so if the neutron network is a flat network | 23:18 |
*** TxGirlGeek has quit IRC | 23:18 | |
sean-k-mooney | then it should be untagged from the server, get to the top of rack switch and remain untagged | 23:19 |
sean-k-mooney | then as it's a broadcast it will flood | 23:19 |
dking_desktop | It does. It's attached to bond0. So, does br-ex send DHCP broadcasts to br-int, and then it sends them to all of its ports? That doesn't sound right. | 23:19 |
sean-k-mooney | and eventually make it to the controller | 23:19 |
sean-k-mooney | when it arrives at the controller it will enter the br-ex. it will be vlan tagged with a local vlan and then be flooded to only the ports of that vlan | 23:20 |
sean-k-mooney | then the tag will be stripped when it is sent to the dhcp namespace | 23:20 |
sean-k-mooney | so if you do a tcpdump on the bond you should see the request | 23:20 |
sean-k-mooney | if its getting that far | 23:21 |
dking_desktop | Yes, I see them on the requests. In order to get ironic dnsmasq to work, though, I had to bring up the br-ex interface with an IP address. Could that be messing with this? | 23:22 |
sean-k-mooney | perhaps, the br-ex normally should not require an ip | 23:23 |
dking_desktop | So, the baremetal server sends a DHCP request, it goes through the chassis switch, to the ToR, and then from there to the controller, and I see the data coming in on bond0. | 23:23 |
sean-k-mooney | so you deployed a second dnsmasq for ironic | 23:24 |
sean-k-mooney | that is valid but you have to set the dhcp provider i believe | 23:24 |
dking_desktop | Maybe not, but without it, I couldn't get ironic's dnsmasq to be able to see the packets. So, it was a hack. Would there have been a better way? Folks in the other channel were recommending that I have untagged packets tagged at the switch port, but so far, that's not been working. | 23:24 |
sean-k-mooney | so that is the old way to do it im not sure if its still required or the default. | 23:25 |
sean-k-mooney | when ironic was first created it handled almost all its networking itself | 23:25 |
sean-k-mooney | then neutron was added after | 23:25 |
dking_desktop | Ironic handles its own dnsmasq. It works fine once I manually changed the interface to br-ex and put an IP on br-ex to bring it up. | 23:25 |
sean-k-mooney | slowly over the last few years they have been moving to using neutron where possible | 23:25 |
sean-k-mooney | ya | 23:26 |
sean-k-mooney | that was how i deployed previously | 23:26 |
sean-k-mooney | if you do a tcpdump on br-ex | 23:26 |
sean-k-mooney | do you see the dhcp request | 23:26 |
dking_desktop | Yes, I can see them on br-ex | 23:27 |
sean-k-mooney | and they are not vlan tagged | 23:27 |
dking_desktop | Correct | 23:28 |
sean-k-mooney | i have had issues with the default route and arp that caused the responses to not be sent by the br-ex in the past | 23:28 |
sean-k-mooney | do you have a second interface on the same subnet | 23:28 |
dking_desktop | Which subnet is that? The one that I setup for br-ex? | 23:28 |
sean-k-mooney | yes | 23:29 |
sean-k-mooney | i have had issues in the past where i have added an ip to the br-ex and received packets but had the reply sent via ens3 because it had an ip in the same subnet but a better metric | 23:29 |
dking_desktop | I don't think so. I set that up to use an IP from the range I've been using for untagged packets. | 23:29 |
*** spatel has joined #openstack-nova | 23:30 | |
dking_desktop | No, the only route for that subnet is through dev br-ex. | 23:30 |
sean-k-mooney | ok this sounds very familiar but i dont recall the cause. | 23:31 |
dking_desktop | From inside of the qdhcp-* namespace, "tcpdump -i any -nne -xx -Avvvv" hasn't shown any packets yet. | 23:31 |
sean-k-mooney | yes so i dont think it will since the initial boot will go to the provisioning network | 23:32 |
*** lbragstad_ has joined #openstack-nova | 23:32 | |
sean-k-mooney | i suspect if you check the uuid, that is the dhcp agent for the tenant network | 23:32 |
sean-k-mooney | not the provisioning network | 23:32 |
sean-k-mooney | anyway its getting late and im out of ideas so ill have to leave it there | 23:33 |
*** lbragstad has quit IRC | 23:33 | |
*** spatel has quit IRC | 23:34 | |
dking_desktop | Oh, okay. I see now that the * in qdhcp-* is actually an ID for a network. Exactly, it's the tenant network. | 23:35 |
dking_desktop | Let me check that I setup DHCP for the provisioning network. | 23:35 |
dking_desktop | Oh, I did not enable DHCP for the provisioning network. I can enable it. Is that what's supposed to happen? | 23:36 |
sean-k-mooney | if you didnt enable it in the subnet then neutron would not create the namespace or spawn the dnsmasq process for it so that could be it | 23:36 |
sean-k-mooney | there is one way to find out :) but i think if you use the neutron networking interface driver then yes you should turn it on | 23:37 |
dking_desktop | Let me try that. But if that's the case, would the DNS be the right one, with the PXE information in it? | 23:37 |
sean-k-mooney | if you use the flat network interface driver i think you deploy a separate dnsmasq for ironic as you did manually | 23:37 |
sean-k-mooney | honestly im at the edge of my knowledge here as i said its been a while but i cant recall | 23:38 |
dking_desktop | Your help has been very enlightening. Also, I just saw some packets in that network, and it does seem to be set up for PXE. I'm going to try creating a server and see if that works. | 23:42 |
*** lbragstad__ has joined #openstack-nova | 23:42 | |
dking_desktop | But even if not, I've learned much, so thank you very much! | 23:42 |
*** lbragstad_ has quit IRC | 23:44 | |
sean-k-mooney | this is really really old but if you have not seen it before its how neutron ovs networking used to work | 23:45 |
sean-k-mooney | https://www.rdoproject.org/networking/networking-in-too-much-detail/ | 23:45 |
sean-k-mooney | its now simpler | 23:45 |
sean-k-mooney | but it is a good thing to read over at least once even if its not how it works exactly today | 23:45 |
* sean-k-mooney feels old since this was published after i started working on openstack and figuring this stuff out | 23:46 | |
dking_desktop | Thank you. I'll check that out. It may help with some of the dark spots in my knowledge. | 23:48 |
*** lbragstad_ has joined #openstack-nova | 23:49 | |
*** lbragstad__ has quit IRC | 23:51 | |
*** lbragstad__ has joined #openstack-nova | 23:56 | |
*** lbragstad_ has quit IRC | 23:57 |
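
The DHCP debugging steps discussed in this log (22:41 to 22:49 and 23:35) can be collected into a small shell sketch. This is only an illustration, not something posted in the channel: the function names `dhcp_ns_name` and `inspect_dhcp_ns` are hypothetical, the network ID and interface name are placeholders you must supply, and the `qdhcp-<network id>` naming is taken from what was observed in the conversation. It uses `tcpdump` with an explicit port filter rather than the `tshark` invocation mentioned above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, arguments are placeholders.

# Build the namespace name the neutron DHCP agent uses for a network,
# i.e. "qdhcp-<network id>" as seen in the "ip netns" output.
dhcp_ns_name() {
    printf 'qdhcp-%s\n' "$1"
}

# Inspect the DHCP namespace for one network.
#   $1 = neutron network ID (placeholder)
#   $2 = interface name inside the namespace (placeholder)
inspect_dhcp_ns() {
    ns="$(dhcp_ns_name "$1")"
    iface="$2"

    # List all network namespaces on this node.
    ip netns

    # Show listening sockets inside the namespace; dnsmasq should
    # appear on udp port 67 (DHCP) and port 53 (DNS).
    sudo ip netns exec "$ns" netstat -nlp

    # Capture DHCP traffic (client port 68, server port 67).
    sudo ip netns exec "$ns" tcpdump -i "$iface" -nn -e -vv \
        'udp port 67 or udp port 68'
}

# Example (not run automatically):
#   inspect_dhcp_ns <provisioning network id> <tap interface>
```

If no `qdhcp-` namespace exists for the provisioning network, that matches the situation found at 23:36: DHCP was not enabled on that network's subnet, so neutron had not created the namespace or started dnsmasq for it.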