Friday, 2021-02-12

00:25 <zer0c00l> the prepare_ramdisk() and prepare_instance() methods look kind of similar. I wonder when each of these methods is called?
00:25 <zer0c00l> is prepare_ramdisk called during an "ironic only" deployment?
00:26 <zer0c00l> is prepare_instance() called when nova calls ironic to do its job?
00:26 <zer0c00l> or whenever an instance boot is involved?
00:48 <TheJulia> zer0c00l: prepare_ramdisk is called when network booting is in the mix (the default), to set up the service ramdisk, whereas prepare_instance sets up how the deployed machine itself boots — network boot by default, or local boot
00:48 <TheJulia> preferably local
02:02 <zer0c00l> TheJulia: ack. Thanks.
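The split TheJulia describes can be sketched as follows. This is an illustrative class only: the method names mirror Ironic's real `ironic.drivers.base.BootInterface`, but the bodies are placeholders and ironic itself is not imported.

```python
# Illustrative sketch of the two hooks discussed above. The method names
# match Ironic's base.BootInterface; the bodies are placeholders.
class ExampleBoot:
    def prepare_ramdisk(self, task, ramdisk_params):
        # Runs before deploy/clean/rescue: set the node up to
        # (typically network-)boot the service ramdisk (IPA).
        return 'netboot the deploy/service ramdisk'

    def prepare_instance(self, task):
        # Runs when the deployed machine is about to boot: configure
        # netboot or, preferably, local boot of the instance image.
        return 'boot the instance (preferably locally)'


boot = ExampleBoot()
print(boot.prepare_ramdisk(task=None, ramdisk_params={}))
print(boot.prepare_instance(task=None))
```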
03:42 <HardCase> Hi all, question: is it possible to introspect a node that has already been deployed? Lost my introspection data and need to rebuild it so I can use NodeDataLookup
07:42 <arne_wiebalck> Good morning, ironic!
07:48 <arne_wiebalck> HardCase: Yes, there is a feature called 'active inspection'. It allows running the inspector from within the physical instance.
08:02 <rpittau> good morning ironic! o/
08:02 <rpittau> happy Friday!
09:35 <janders> good morning rpittau and arne_wiebalck o/
09:35 <rpittau> hey janders :)
09:57 <janders> https://bluejeans.com/772893798 SPUC anyone? :)
11:09 <iurygregory> good morning Ironic! happy Friday
11:09 * iurygregory grabs more coffee
11:41 <dtantsur> morning/afternoon ironic
11:41 <dtantsur> janders: sorry, had to miss SPUC because of a doctor appt
11:44 <iurygregory> morning dtantsur (I hope you are well)
11:49 <dtantsur> routine stuff, nothing to worry about
11:50 <iurygregory> awesome =)
11:55 <janders> hey iurygregory o/
11:55 <iurygregory> hey janders o/
11:55 <janders> dtantsur no worries! first things first!
11:56 <janders> I barely made it to the SPUC myself, the storm clouds beat me this time, got pretty wet on the bike
11:59 <janders> dtantsur Thank you for reviewing my NVMe patch. I have a question regarding https://review.opendev.org/c/openstack/ironic-python-agent/+/771904/27/ironic_python_agent/hardware.py#1669 (using json output). I actually wanted to do this first, but the json format seems really weird:
11:59 <janders> http://paste.openstack.org/show/802599/
12:00 <janders> I would need to figure out how to parse this (I suppose it's a decimal representation of bits indicating different capabilities). This can be done, but my other concern is that if some NVMe manufacturers don't stick to the spec, my interpretation of these numbers can be completely off
12:01 <janders> back when I was trying to support NVMe sanitize in addition to format, I learned that some vendors use the designated fields in their sanitize-logs, while others use reserved fields (!), so that the output makes no sense
12:01 * iurygregory is trying to understand why the json format is weird
12:03 <janders> long story short: would you like me to try to figure out how to drive the json output given all this? I ended up just parsing the plain output because it seemed easier, and I've seen it done with other tools (e.g. hdparm)
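For what it's worth, those capability bits can be decoded directly. Below is a sketch interpreting the FNA (Format NVM Attributes) field according to the NVMe spec's Identify Controller data structure; whether nvme-cli's JSON exposes it under exactly the key `fna` should be double-checked against the installed nvme-cli version.

```python
import json

# Bit meanings per the NVMe spec's Identify Controller data structure
# (FNA = Format NVM Attributes); the JSON key is assumed from nvme-cli.
def format_capabilities(id_ctrl_json):
    fna = json.loads(id_ctrl_json)['fna']
    return {
        'format_applies_to_all_ns': bool(fna & (1 << 0)),
        'erase_applies_to_all_ns': bool(fna & (1 << 1)),
        'crypto_erase_supported': bool(fna & (1 << 2)),
    }

# e.g. output of `nvme id-ctrl /dev/nvme0 -o json` piped in; fna=4 would
# mean only cryptographic erase is advertised, on a per-namespace basis
print(format_capabilities('{"fna": 4}'))
```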
12:06 <rpittau> janders, dtantsur: about the json output, just please check from which version that's supported. I think I didn't mention it before because it was not compatible with all the OSes we test with, but I may have made a mistake
12:07 * rpittau goes to lunch
12:13 <dtantsur> janders: I would expect the textual form to be derived from these numbers anyway
12:13 <dtantsur> rpittau: at least the centos 8 version supports it
12:14 <dtantsur> janders: https://gist.github.com/dtantsur/f5200859799d29f0296005580bbe7b1e
12:19 <janders> dtantsur nicely done!
12:20 <dtantsur> yeah, I wrote it when we first discussed NVMe cleaning.. not sure why I did not share it with you, maybe I just forgot?
12:21 <janders> I'm happy to switch the cleaning code to this, I'll just re-read the relevant part of the NVMe spec to make sure there are no catches
12:21 <dtantsur> sure
12:21 <janders> the catches I was concerned about were around sanitize-log
12:22 <dtantsur> but I think I did it based on the specs
12:22 <janders> secure format should be less problematic
12:22 <janders> as there is no need to check logs prior to / after cleaning
12:22 <janders> (I think :) )
12:23 <janders> thank you dtantsur! :)
12:23 <dtantsur> np
12:26 <openstackgerrit> Aija Jauntēva proposed openstack/sushy master: Refactor TaskMonitor and update Volume methods  https://review.opendev.org/c/openstack/sushy/+/774532
12:37 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic-python-agent stable/victoria: Fix error message with UEFI-incompatible images  https://review.opendev.org/c/openstack/ironic-python-agent/+/775340
12:38 <dtantsur> okay, where do we stand with the releases?
12:43 <iurygregory> dtantsur, I've requested bifrost yesterday
12:44 <dtantsur> great, thank you! I suspect we can request IPA already (will check ironic in a few)
12:44 <iurygregory> any other that's ready I can request
12:44 <iurygregory> ack, will push IPA
12:52 <iurygregory> dtantsur, ipa https://review.opendev.org/c/openstack/releases/+/775378
12:52 <dtantsur> ++
12:53 <dtantsur> iurygregory: ironic should be ready as well, I think
12:53 <iurygregory> dtantsur, I can request it in about 39min (getting ready for my 1:1) =)
12:54 <iurygregory> if we are not in a hurry =)
12:56 <dtantsur> not at all
12:59 <janders> see you on Monday Ironic
12:59 <janders> have a great weekend everyone o/
13:02 <openstackgerrit> Riccardo Pittau proposed openstack/ironic-python-agent master: Remove samples from the hardware test module  https://review.opendev.org/c/openstack/ironic-python-agent/+/775163
13:05 <dtantsur> you too janders
13:05 <rpittau> bye janders :)
13:20 <openstackgerrit> Merged openstack/ironic master: Populate existing policy tests  https://review.opendev.org/c/openstack/ironic/+/768136
13:21 <openstackgerrit> Merged openstack/ironic master: Duplicate testing for system scoped ACL testing  https://review.opendev.org/c/openstack/ironic/+/770002
13:23 <dtantsur> iurygregory: with ironic, let's probably wait for https://review.opendev.org/c/openstack/ironic/+/768353
13:23 <dtantsur> it's a great addition for this release
13:23 <dtantsur> ajya++
13:23 <openstackgerrit> Riccardo Pittau proposed openstack/ironic-python-agent master: [WIP] Move some raid specific functions to raid_utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/774854
13:32 <iurygregory> dtantsur, ack!
13:40 <openstackgerrit> Riccardo Pittau proposed openstack/ironic master: Replace retrying with tenacity  https://review.opendev.org/c/openstack/ironic/+/376574
13:41 <openstackgerrit> Riccardo Pittau proposed openstack/ironic master: Replace retrying with tenacity  https://review.opendev.org/c/openstack/ironic/+/376574
13:59 <dtantsur> rpittau: this ^^ doesn't seem to fix the unit tests issue. or does it?
14:00 <rpittau> dtantsur: mmm no, I ignored the first part of your comment for some reason
14:10 <dtantsur> heh
14:11 <openstackgerrit> Riccardo Pittau proposed openstack/ironic-python-agent master: Remove samples from the hardware test module  https://review.opendev.org/c/openstack/ironic-python-agent/+/775163
14:11 <openstackgerrit> Riccardo Pittau proposed openstack/ironic-python-agent master: [WIP] convert lsblk output to json  https://review.opendev.org/c/openstack/ironic-python-agent/+/775391
14:11 <openstackgerrit> Riccardo Pittau proposed openstack/ironic-python-agent master: [WIP] Use json for lsblk output  https://review.opendev.org/c/openstack/ironic-python-agent/+/775391
14:12 <dtantsur> ajya: hi, do you plan on adding deploy_steps support to openstacksdk?
14:16 <ajya> dtantsur: I can add it, will create a task in the deploy-steps story
14:16 <dtantsur> thanks!
14:29 <dtantsur> ajya: do you know if it works when you have the same step both in a template and explicitly? say, with different priority/arguments?
14:30 <TheJulia> good morning
14:30 <dtantsur> morning TheJulia
14:31 <ajya> dtantsur: the explicit step overrides the template step and uses whatever is in the explicit step
14:31 <ajya> morning TheJulia
14:31 <dtantsur> ajya: nice!
14:37 <iurygregory> good morning TheJulia =)
14:44 <dtantsur> seeking opinions on https://storyboard.openstack.org/#!/story/2008611 wrt the location specification. what is better:
14:44 <dtantsur> 1) "/dev/disk/by-label/boot:/some/location": "<content>"
14:45 <dtantsur> 2) "/some/location": {"on": "/dev/disk/by-label/boot", "data": "<content>"}
14:45 <dtantsur> ?
14:45 <openstackgerrit> Verification of a change to openstack/ironic failed: Guard conductor from consuming all of the ram  https://review.opendev.org/c/openstack/ironic/+/726483
14:45 <openstackgerrit> Verification of a change to openstack/ironic failed: Set default to prevent out of memory conditions  https://review.opendev.org/c/openstack/ironic/+/763107
14:45 <dtantsur> The latter is more explicit, the former is shorter (and thus more convenient for the CLI)
14:52 <dtantsur> or even 3) [ {"path": "/some/location", "on": "/dev/disk/by-label/boot", "data": "<content>"} ]
14:57 <iurygregory> I would vote for 2 or 3 just because they are more explicit
14:57 <dtantsur> and between those?
14:58 <dtantsur> well, #3 is probably easier to use from statically typed languages
14:58 <iurygregory> 3
14:58 <rpittau> I would say 3 as well
14:58 * dtantsur has Go and Rust in mind
14:58 <dtantsur> thanks guys, I think I'm leaning towards #3 now too
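As a sketch, option 3 would look roughly like this as a request payload — field names as floated in the discussion above, nothing final:

```python
import json

# Hypothetical deploy-step argument following option 3: a list of
# self-describing file specs.
files = [
    {
        'path': '/etc/sysctl.d/99-example.conf',  # location inside the target
        'on': '/dev/disk/by-label/boot',          # optional: device holding the fs
        'data': 'net.ipv4.ip_forward = 1\n',      # inline content; a URL could be allowed too
    },
]

# Each entry maps naturally onto a struct in statically typed clients
# (Go, Rust, ...), unlike the nested-key variants of options 1 and 2.
payload = json.dumps(files, indent=2)
print(payload)
```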
14:59 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Introduce common personas for secure RBAC  https://review.opendev.org/c/openstack/ironic/+/763255
14:59 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Address some rbac review feedback in merged patches  https://review.opendev.org/c/openstack/ironic/+/775399
15:03 <rpittau> bye everyone, have a great weekend! o/
15:03 <TheJulia> dtantsur: file path as key sounds like a good idea, data content makes sense; permissions are missing — we should be able to assert a numeric gid/uid as well as basic unix perms
15:04 <TheJulia> Only suggesting it because some software evaluates permissions as first-class protection against the machine having been compromised; if the file is too wide open it goes "all the nopes"
15:04 <dtantsur> wdyt about allowing URLs for content?
15:04 <TheJulia> sounds good to me
15:04 <dtantsur> yeah, that's why I had to add `mode`, but good call re owner.
15:05 <TheJulia> yeah, it bit me relatively recently :(
15:05 <TheJulia> also, folders
15:05 <TheJulia> or maybe folders are out of scope?
15:05 <TheJulia> or just get the same treatment
15:05 <TheJulia> I dunno
15:06 <TheJulia> "hi, uncompress this tarball on /"
15:06 <TheJulia> kthxbai
15:06 <dtantsur> directories are automatically created, dirmode is passed to makedirs
15:07 <TheJulia> \o/
15:07 <dtantsur> https://storyboard.openstack.org/#!/story/2008611 updated, PTAL
15:07 <TheJulia> one minor thing: owner/group should most likely be numeric
15:08 <TheJulia> unless the actions happen in a chroot of the OS that has been written to disk
15:08 <dtantsur> yeah, we probably cannot resolve names
15:08 <TheJulia> exactly
15:08 * TheJulia goes back to rbac
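Pulling the thread together, a minimal sketch of the write path under the constraints discussed: numeric uid/gid only (names can't be resolved outside the image's chroot) and parent directories created with a separate dirmode. The helper name and defaults are illustrative, not the actual IPA code.

```python
import os

def inject_file(path, data, mode=0o644, dirmode=0o755, owner=0, group=0):
    # Create missing parent directories with dirmode (passed to makedirs,
    # as in the discussion above).
    os.makedirs(os.path.dirname(path), mode=dirmode, exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
    # Numeric permissions and ownership only; if the file ends up too
    # wide open, security-sensitive consumers may reject it outright.
    os.chmod(path, mode)
    os.chown(path, owner, group)
```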
15:09 <dtantsur> iurygregory: please take another look if you have a minute
15:11 <iurygregory> dtantsur, sure
15:15 <openstackgerrit> Verification of a change to openstack/ironic failed: Guard conductor from consuming all of the ram  https://review.opendev.org/c/openstack/ironic/+/726483
15:15 <TheJulia> :(
15:21 <dtantsur> for some reason the bot started reporting each failure twice
15:29 <TheJulia> Out of curiosity, what are folks seeing on average for unit test runs these days?
15:29 <dtantsur> what = ?
15:30 <TheJulia> seconds
15:30 <TheJulia> runtime seconds
15:30 <dtantsur> I nearly never run the whole suite
15:30 <TheJulia> I typically see ~80-83 unless I'm streaming the news, in which case it is more like 110 seconds
15:30 <dtantsur> the last time I tried, it was something around 120-150 seconds
15:31 <TheJulia> my old work laptop was running like 200+ seconds
15:32 <TheJulia> I guess that means my desktop is still hanging in there
15:36 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement "system" scoped RBAC for the node endpoint  https://review.opendev.org/c/openstack/ironic/+/763257
15:54 <iurygregory> dtantsur, the RFE lgtm
15:54 <dtantsur> thx!
16:01 <openstackgerrit> Merged openstack/ironic master: Add 'deploy steps' parameter for provisioning API  https://review.opendev.org/c/openstack/ironic/+/768353
16:02 <openstackgerrit> Merged openstack/python-ironicclient master: Add 'deploy steps' for provisioning API  https://review.opendev.org/c/openstack/python-ironicclient/+/768354
16:02 <openstackgerrit> Merged openstack/ironic-inspector stable/train: Remove grenade jobs from old stable branches  https://review.opendev.org/c/openstack/ironic-inspector/+/773332
16:02 <dtantsur> w00t, deploy steps!
16:03 <ajya> \o/
16:04 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic master: Trivial: update version for deploy steps  https://review.opendev.org/c/openstack/ironic/+/775408
16:04 <dtantsur> could I get a quick approval ^^ iurygregory, TheJulia
16:10 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic master: Move the IPv6 job to the experimental pipeline  https://review.opendev.org/c/openstack/ironic/+/775410
16:12 <openstackgerrit> Merged openstack/ironic-python-agent stable/victoria: Fix error message with UEFI-incompatible images  https://review.opendev.org/c/openstack/ironic-python-agent/+/775340
16:12 <TheJulia> approved
16:12 * TheJulia awaits patches
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement "system" scoped RBAC for ports  https://review.opendev.org/c/openstack/ironic/+/763267
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for port groups  https://review.opendev.org/c/openstack/ironic/+/763268
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for chassis  https://review.opendev.org/c/openstack/ironic/+/763269
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for baremetal drivers  https://review.opendev.org/c/openstack/ironic/+/763270
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for node and driver passthru  https://review.opendev.org/c/openstack/ironic/+/763271
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for utility APIs  https://review.opendev.org/c/openstack/ironic/+/763272
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for volume APIs  https://review.opendev.org/c/openstack/ironic/+/763273
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for conductor APIs  https://review.opendev.org/c/openstack/ironic/+/763274
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for the allocation APIs  https://review.opendev.org/c/openstack/ironic/+/763275
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for the event APIs  https://review.opendev.org/c/openstack/ironic/+/763276
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Implement system scoped RBAC for the deploy templates APIs  https://review.opendev.org/c/openstack/ironic/+/763277
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: RBAC System Scope: observer -> reader  https://review.opendev.org/c/openstack/ironic/+/772450
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Initial Project scoped tests  https://review.opendev.org/c/openstack/ironic/+/772451
16:13 <openstackgerrit> Julia Kreger proposed openstack/ironic master: Project Scoping Node endpoint  https://review.opendev.org/c/openstack/ironic/+/773924
16:13 <TheJulia> kaboom
16:13 <dtantsur> boom!
16:13 <dtantsur> :)
16:14 <TheJulia> I had to do a minor unit test fix, because an earlier change caused the additional tests to be checked, and the filter view changes slightly if scopes are enabled... so the last patch in that chain now passes unit testing
16:46 <openstackgerrit> Dmitry Tantsur proposed openstack/python-ironicclient master: Add missing unit tests for provision state commands  https://review.opendev.org/c/openstack/python-ironicclient/+/377607
17:03 <TheJulia> spuc?
17:05 <iurygregory> not in the mood for spuc today .-.
17:05 <iurygregory> sorry =(
17:06 <TheJulia> :(
17:06 <TheJulia> it happens, hopefully everything is okay
17:07 <iurygregory> my mother is not feeling very well, I will probably take her to the doctor (I hope it's not covid .-.)
17:08 <TheJulia> *hugs*
17:08 <iurygregory> ty =)
17:12 <dtantsur> iurygregory: best wishes!
17:12 <iurygregory> dtantsur, ty!
17:14 <arne_wiebalck> cannot join either: we had a network/db outage and Ironic is still not back ...
17:34 <TheJulia> :(
17:45 * TheJulia digs deep into the port node controller
18:03 <TheJulia> hmm, it is monsooning
18:12 <openstackgerrit> Merged openstack/ironic master: Guard conductor from consuming all of the ram  https://review.opendev.org/c/openstack/ironic/+/726483
18:12 <openstackgerrit> Merged openstack/ironic master: Trivial: update version for deploy steps  https://review.opendev.org/c/openstack/ironic/+/775408
18:15 <iurygregory> insert happy dance
18:15 <iurygregory> \o/\o/
18:17 <iurygregory> I should probably wait for https://review.opendev.org/c/openstack/ironic/+/763107 since https://review.opendev.org/c/openstack/ironic/+/726483 is merged
18:18 <iurygregory> before requesting the bugfix release for ironic
18:20 <openstackgerrit> Merged openstack/ironic master: Introduce common personas for secure RBAC  https://review.opendev.org/c/openstack/ironic/+/763255
18:32 <openstackgerrit> Verification of a change to openstack/ironic failed: Set default to prevent out of memory conditions  https://review.opendev.org/c/openstack/ironic/+/763107
18:34 <dtantsur> iurygregory: I think we're ready for an ironic release. Can be done today or on Monday (it's unlikely that anybody processes it today)
18:35 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic-python-agent master: [WIP] New deploy step for injecting arbitrary files  https://review.opendev.org/c/openstack/ironic-python-agent/+/775428
18:35 <dtantsur> this ^^ is a complete implementation, although some tests are missing
18:35 <dtantsur> it's larger than I hoped, but it also does much more than initially planned
18:38 <iurygregory> dtantsur, ack, I will wait to push so we can have https://review.opendev.org/c/openstack/ironic/+/763107 in it
18:38 <dtantsur> I'm not sure if it was planned to have it in this release or the next one?
18:38 <dtantsur> otherwise why split the patches?
18:39 * dtantsur will leave it up to TheJulia
18:39 <dtantsur|afk> have a great weekend, folks!
18:41 <TheJulia> hmm
18:41 <TheJulia> makes sense to go ahead and release with it merged
18:41 <TheJulia> hopefully it will make it through the gate today
18:47 <iurygregory> we merged the 1st in the chain...
18:47 <TheJulia> the second is in the check queue after a failure on merge
18:47 <TheJulia> the 3rd is just docs
18:47 <iurygregory> yeah
21:18 <arne_wiebalck> TheJulia: we may be hitting the nova/ironic power sync as a scaling limit now: the time to complete a "node list" times the number of nova-compute nodes requesting it is larger than the power sync interval of 600 secs, so the processes seem to run into each other and melt the database (I am not sure if this only happens with cold caches, as it worked before we had a network outage today)
21:38 <TheJulia> arne_wiebalck: are you trying to bring both services online at the same time?
21:38 <TheJulia> arne_wiebalck: or is the conductor being given time to get back online and do the first sweep before the api/nova-compute services get fired up?
21:38 <arne_wiebalck> TheJulia: both at the same time
21:39 <arne_wiebalck> TheJulia: you think it is a startup-only issue?
21:39 <TheJulia> i suspect
21:39 <TheJulia> when you say melt the database, is it just request latency?
21:40 <arne_wiebalck> TheJulia: From what I see, it works ok some time after the startup, but when the power sync queries come in, all hell breaks loose ...
21:40 <arne_wiebalck> TheJulia: yes
21:40 <arne_wiebalck> TheJulia: requests time out at the loadbalancer
21:40 <TheJulia> which requests time out?
21:41 <TheJulia> nova-compute?
21:41 <arne_wiebalck> the requests from nova-compute to the ironic APIs
21:41 <arne_wiebalck> yes
21:42 <TheJulia> when you say outage, was a backup restored?
21:42 <arne_wiebalck> no
21:42 <arne_wiebalck> there was a network interruption
21:42 <TheJulia> okay
21:42 <TheJulia> that was my next question
21:42 <arne_wiebalck> the database was unreachable for a while
21:43 <arne_wiebalck> once it was back, the problems started ...
21:43 <arne_wiebalck> I am still a little puzzled why it worked before
21:44 <arne_wiebalck> I was thinking all processes were aligned or sth
21:44 <TheJulia> without the nova-computes running, what is it that you see as the response time just requesting a list of nodes?
21:44 <arne_wiebalck> I have not made this test, but the query takes about 30-40 secs for 8600 nodes.
21:45 <arne_wiebalck> this is a node list
21:45 <TheJulia> I guess the load balancers are killing the query at 60 seconds?
21:45 <arne_wiebalck> which I use as a probe
21:45 <arne_wiebalck> the timeout is 270 secs
21:45 <arne_wiebalck> I increased it at some point
21:45 <TheJulia> okay
21:46 <TheJulia> so everything seems to be fine, then power sync comes along
21:46 <TheJulia> pulls a giant list of nodes, locks one at a time and iterates through
21:46 <arne_wiebalck> yes, this is my current thinking
21:46 <TheJulia> that shouldn't impact db reads
21:46 <arne_wiebalck> pulling the list takes 30-40 secs
21:47 <arne_wiebalck> then there are around 20 doing this in parallel
21:47 <arne_wiebalck> every 600 secs
21:47 <arne_wiebalck> so every 30 secs on avg
21:47 <arne_wiebalck> mind you, we just moved from 17 to 20 earlier this week
21:48 <TheJulia> sounds like we have a thundering herd problem with the database
21:48 <arne_wiebalck> I think we just tipped over the threshold
21:48 <arne_wiebalck> yes
21:48 <arne_wiebalck> I have increased the interval from 600 to 3600 secs now
21:48 <TheJulia> any chance you're able to delay the startup of the conductors, to bring them up over a couple of minutes and not all at once?
21:48 <arne_wiebalck> things seem stable now
21:49 <arne_wiebalck> yes
21:49 <TheJulia> I guess we actually have two thundering herd problems in this
21:49 <arne_wiebalck> but after increasing the interval, I started them all at once :)
21:49 <arne_wiebalck> yes, totally!
21:49 <TheJulia> i guess your nova config is still asking ironic for the power state instead of getting a callback?
21:49 <arne_wiebalck> I saw processes piling up on the db
21:50 <arne_wiebalck> yes :(
21:50 <TheJulia> :(
21:50 <arne_wiebalck> nova is still on stein
21:50 <TheJulia> oh
21:50 <arne_wiebalck> ironic on train
21:50 <arne_wiebalck> quite ironic that we developed this feature :)
21:50 <TheJulia> I wasn't going to say that... but this is ironic
21:50 <arne_wiebalck> and have not yet deployed it
21:50 <arne_wiebalck> yeah :-D
21:51 <TheJulia> The simplest thing, I think, is that we need a randomizer on the power sync thundering herd initiator
21:51 <TheJulia> actually, all periodics
21:52 <arne_wiebalck> I think this is there already, no?
21:52 <TheJulia> I don't think so
21:52 <arne_wiebalck> hmm, ok
21:52 <arne_wiebalck> I thought there was some random start-up delay
21:52 <arne_wiebalck> if not, then yes!
21:52 <TheJulia> ipa heartbeat
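The randomizer TheJulia suggests is essentially startup splay: each process sleeps a random amount before its first periodic run, so a herd of identical workers restarting together spreads out over the interval. A sketch, with parameter names that are illustrative rather than the actual futurist/ironic periodic API:

```python
import random
import time

def run_periodic(task, interval, splay=60, max_runs=None):
    # Random initial delay de-synchronizes many identical workers that
    # all (re)start at the same moment, e.g. after an outage.
    time.sleep(random.uniform(0, splay))
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval)
```

With `max_runs=None` this loops forever, like a real periodic worker; the parameter exists only to make the sketch testable.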
21:53 <arne_wiebalck> but we also need to make sure that #processes x runtime < sync_interval, I think
21:53 <arne_wiebalck> this is for the nova sync
21:53 <arne_wiebalck> you cannot ask more questions than there are possible answers ;)
21:54 <arne_wiebalck> the db becomes the bottleneck
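With the numbers from this thread, that constraint is visibly violated, which makes for a plausible back-of-the-envelope for the meltdown (35 s is just the midpoint of the 30-40 s node list reported above):

```python
# "#processes x runtime < sync_interval": otherwise queries arrive
# faster than the DB can drain them.
computes = 20       # nova-compute processes (just moved up from 17)
list_seconds = 35   # ~30-40 s for a full node list of 8600 nodes
interval = 600      # power sync interval in seconds

demand = computes * list_seconds
print(f'{demand}s of listing requested per {interval}s window '
      f'-> overloaded: {demand > interval}')
```

At 17 computes the demand was ~595 s per 600 s window, right at the edge; three more computes tip it over, matching "we just tipped over the threshold".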
21:54 <TheJulia> that would slow that down
21:54 <arne_wiebalck> yes
21:54 <TheJulia> and all the read activity
21:54 <TheJulia> although, is there any sign that the row locking for the lock updates is also impacting it?
21:54 <arne_wiebalck> I wonder if this is also only a startup problem, as we have caches in nova, no?
21:55 <TheJulia> yeah, nova builds a cache, so if everything is starting up it is trying to populate the resource tracker cache as well
21:55 <TheJulia> and they also each have to build a hash ring
21:55 <arne_wiebalck> all I saw were big queries for the nodes in a conductor group
21:55 <arne_wiebalck> there is a cache for the power state, no?
21:56 <TheJulia> I don't think so
21:56 <arne_wiebalck> ah, no?
21:56 <TheJulia> but the objects are created via the db
21:56 <TheJulia> so there is the db hit there
21:56 <TheJulia> were you able to see what the effective db query was/is?
21:56 <arne_wiebalck> I was thinking it worked before since I only added a few nodes at a time while the others had warm caches
21:56 <TheJulia> I'm wondering if we've got room to improve a query or three
21:57 <TheJulia> well, the resource tracker itself would have been warm
21:57 <arne_wiebalck> :)
21:57 <TheJulia> and the other nodes' hash rings
21:57 <TheJulia> add just a little bit more, then mix in a thundering herd, and I could see everything colliding
21:57 <arne_wiebalck> hmm, yeah resource tracker ...
21:58 <arne_wiebalck> colliding ... that somehow rings a bell :-D
21:59 <TheJulia> I'm sure there is a soundtrack that would match
21:59 <arne_wiebalck> things have been up for an hour now with power_sync=3600
21:59 <arne_wiebalck> so, this seems to do the trick
22:00 <TheJulia> Yeah, you were hitting the initial crunch
22:00 <arne_wiebalck> yes
22:00 <arne_wiebalck> bringing up everything at the same time is probably not ideal and should be improved
22:01 <arne_wiebalck> for the power sync interval we have to see what the impact is in real life
22:01 <TheJulia> if you ever reproduce it or capture queries to the db, and we can figure out what is doing the most "work", then we should look into changing those
22:01 <arne_wiebalck> I have the query, I think
22:01 <TheJulia> yeah, I'm wondering if we need to autosize the deployment and try to prevent a thundering herd based on that
22:01 <TheJulia> ohhhhh
22:01 <TheJulia> we may have an easy fix to improve performance there as well
22:02 <TheJulia> add an index for conductor
22:02 <arne_wiebalck> well, not the full query
22:03 <arne_wiebalck> "SELECT nodes.created_at AS nodes_created_at, nodes.updated_at AS nodes_updated_at, nodes.version AS"
22:03 <arne_wiebalck> only the beginning
22:03 <arne_wiebalck> I guess I can get it from the DB colleagues who were looking at this with me during the past hours
22:03 <arne_wiebalck> but the whole problem smells very much like the inspector issue some months ago
22:03 <TheJulia> yeah, if they can extract the full query the DB is getting, that would be helpful
22:04 <arne_wiebalck> which was solved with the leader election
22:04 <arne_wiebalck> ok, will ask them
22:04 <arne_wiebalck> thanks for your thoughts, TheJulia, I will have another look at this next week
22:04 <arne_wiebalck> have a good weekend o/
22:04 <TheJulia> arne_wiebalck: I'm putting some notes into an item in storyboard
22:07 <arne_wiebalck> TheJulia: thanks, I will have a look and update, I think we have a scaling item there somewhere ...
22:08 <TheJulia> very much so
22:11 <TheJulia> arne_wiebalck: https://storyboard.openstack.org/#!/story/2008626
22:12 <arne_wiebalck> TheJulia: thanks!
22:46 <openstackgerrit> Merged openstack/ironic master: Set default to prevent out of memory conditions  https://review.opendev.org/c/openstack/ironic/+/763107
22:47 <iurygregory> merged \o/
23:07 <TheJulia> woot!
23:07 <TheJulia> ship it!
23:25 <iurygregory> \o/
23:31 <iurygregory> shipped in https://review.opendev.org/c/openstack/releases/+/775456 =)

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!