Tuesday, 2024-04-30

mnasiadkamorning07:05
SvenKieskemorning :)07:21
opendevreviewMartin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario  https://review.opendev.org/c/openstack/kolla-ansible/+/83694108:34
kevkomorning 09:59
opendevreviewMichal Nasiadka proposed openstack/kolla-ansible master: kolla_url: port is a string  https://review.opendev.org/c/openstack/kolla-ansible/+/91697910:09
opendevreviewMatúš Jenča proposed openstack/kolla-ansible master: Add Redis as caching backend for Heat  https://review.opendev.org/c/openstack/kolla-ansible/+/90922410:19
opendevreviewMichal Nasiadka proposed openstack/kolla-ansible master: kolla_url: port is a string  https://review.opendev.org/c/openstack/kolla-ansible/+/91697910:33
kevkoSvenKieske: check my reply and let me know 10:45
opendevreviewMatúš Jenča proposed openstack/kolla-ansible master: Add Redis as caching backend for Placement  https://review.opendev.org/c/openstack/kolla-ansible/+/90922210:52
opendevreviewMatúš Jenča proposed openstack/kolla-ansible master: Add Redis as caching backend for Nova  https://review.opendev.org/c/openstack/kolla-ansible/+/90920310:56
opendevreviewMatúš Jenča proposed openstack/kolla-ansible master: Add Redis as caching backend for Keystone  https://review.opendev.org/c/openstack/kolla-ansible/+/90920111:00
SvenKieskekevko: sorry, maybe I lost track, but do you mean https://review.opendev.org/c/openstack/kolla/+/915440 or something else?11:18
SvenKieskekevko: and belated happy birthday! :)11:18
opendevreviewJakub Darmach proposed openstack/kolla master: Add exclude for permissions in kolla config.json file  https://review.opendev.org/c/openstack/kolla/+/65562011:30
kevkoSvenKieske: yes11:57
kevkokevko: how do you know it :D 11:57
SvenKieskelinkedin has all your secrets ;)12:01
opendevreviewMichal Arbet proposed openstack/kolla master: Fix handling configs in base image  https://review.opendev.org/c/openstack/kolla/+/91544012:03
kevkohaha12:03
kevkoSvenKieske: so did you check the comment ? 12:03
kevkoi would like to handle most of my patches this week ... 12:10
kevkobtw, thanks for a birthday wishes .. SvenKieske .... maybe I can get some patches merged as a present ? :D 12:13
SvenKieskekevko: sure, but you didn't really address the comment about a docstring? even using actual variable names would go a long way to make this more readable.12:15
SvenKieskealso notice we have a public holiday tomorrow at least in germany, and frickler will be offline for the rest of the week (I'll be available though), so not sure you can get enough +2.12:16
kevkoSvenKieske: working on it now ..12:18
SvenKieskeI'll write specific comments where I would like to have an explanation, I guess that helps more than the generic comments I have done until now.12:19
kevkoSvenKieske: but I am still not sure that I should add a docstring to a python file while no function has docstring ....12:19
kevkoSvenKieske: as that function is self-explanatory ... but ok 12:19
kevkoSvenKieske: thanks ! ...that will help 12:19
kevkoSvenKieske: check also reply to mnasiadka where I pointed that this is not only policy json/yaml stuff ...from my perspective it's easy way how to corrupt service container  .... tested on horizon ...12:21
kevkostuff/problem/12:21
SvenKieskePart of my problem I think is that I'm not that familiar with this whole file, I guess I'll reread the complete code from top to bottom in that File. e.g. I currently don't know what ConfigFile.copy() class actually does (well from the name I think it will copy some config :D)12:22
SvenKieskegive me a second..12:22
kevkoSvenKieske: it's a long time defined class for a copy file ....that class copy files from /var/lib/kolla/config_files  <- from config.json 12:23
kevkoSvenKieske: so I am not adding another complexity ..I've just used what is already implemented ..just add the logic to "remember" the beginning state ...12:24
kevkoSvenKieske: so i am working only with dicts and copy the files as per class code-documentation :) 12:25
SvenKieskeyeah let me check quickly the original code and classes and stuff, never looked in this before, as far as I remember at least :D12:26
kevkoSvenKieske: and also check the comment mentioned above .... anybody who want to be a badass can create dummy non-valid file and inject into container using config.json ...let's say some operator will find that some badass changed it .... normally he need to remove container (as it's restarting ...or do some additional steps with and some docker cp12:31
kevko..etc ..etc stuff...) ...and rerun kolla-ansible ....   this patch just fixing this behaviour and ensure that if operator will remove the entry from config.json ... the original file will be restored (the one before the malicious dummy file from badass was copied to the location)12:31
SvenKieskeyeah I guess I understand the problem and part of the solution, I just don't understand/have the background how this works in general and I must say I'm not sure I'm much smarter looking at the ConfigFile class which also has a lot of code with not really much explanation why stuff is being done. Some of it I of course can infer from the problem domain and I suppose more can be extracted when looking at git commit messages 12:33
SvenKieskeand gerrit review comments.12:33
SvenKieskemy point is, this stuff is hard to review for someone not already deeply familiar with the code and it's also a lot of work making yourself familiar with it because all explanation is basically hidden at different places. really not ideal, but I guess I'll persist through it.12:34
SvenKieskethe downside is, for you, that my review will take a little longer I guess :) not that it matters much, with regards to having merging power.12:35
kevkohmm, but why do you focus on a code which i didn't provide .... mechanism for copying files defined in some config/dict was already merged in a past and working ... i've just take that class and used it 12:35
kevkoSvenKieske: hmm, no new comments ...did you click the send button in gerrit ? 12:37
SvenKieskeyeah, I just want to understand what "config_file.copy()" actually does, when I review code. So if I can't answer that I usually read the source myself until I can answer the question. If I can't tell myself what the code does I can't tell if it works correctly. I don't doubt it works correctly, but the point of a review is, to me at least, that I checked that this works, at least to soem degree.12:37
SvenKieskenot yet, I can send a single comment, if you want12:38
SvenKieskeI replied with a small comment, getting back to reading/understanding the code :D this class seems at first sight to do a lot of very basic stuff, but I guess there will be good reasons for all that, I will ignore it and focus on that copy function for now.12:39
SvenKieskewow, the path separator is even coded platform agnostic (win/linux via using os.sep instead of simply "/"). seems like unnecessary complexity..is kolla supported on windows? :D12:42
SvenKieskeokay it really looks rather simple12:44
SvenKieskegood thing that this stuff is not supposed to deal with malicious actors, the whole stuff is racy as ****12:46
SvenKieskekevko: replied, now with actual comments on the code, hope it makes sense.12:57
SvenKieskeI _think_ I understand now 90-99% percent of the code in that change :D12:58
kevkoKolla was supported on Windows also yes ... But I don't know if still 13:02
opendevreviewWill Szumski proposed openstack/kolla-ansible master: Adds option to add extra scape targets for node exporter  https://review.opendev.org/c/openstack/kolla-ansible/+/91597513:03
SvenKieskeah didn't know that13:03
SvenKieskekevko: maybe you can in turn take a look at https://review.opendev.org/c/openstack/kolla-ansible/+/915901 ? It's a really nice addition to the service-cert-copy role, so it also works for non haproxy certs13:09
SvenKieskeit's also just missing one plus 2 ;)13:09
kevkoSvenKieske: okay, replied and resolved as I replied detailed ..let me know 13:23
kevkoSvenKieske: ^^^ hmmmm, i don't like the solution in service-cert-copy 13:26
SvenKieskekevko: oh okay, well if you have a better idea feel free to comment on that I guess :)13:31
SvenKieskeah thanks for the explanation :)13:34
kevkoSvenKieske: well, i am wondering why there is an when conditional with those all haproxy stuff and then 'or not use_haproxy' ....task's name is 'Copying over backend internal TLS certificate' where it's trying to find a cert ...13:35
kevkoSvenKieske: i think removal of that when will just work :D ..as if no cert will be found ...no cert will be copied ..but if will be found ..then it will be copied ...13:36
SvenKieskeyeah, that is a little complicated but I couldn't come up with an easier solution, if you can, go ahead! :913:36
SvenKieskemhm13:36
SvenKieskemaybe? :D13:36
kevkoI think so ...13:36
kevkoit's backend tls ... from my perspective just means ... ok ..wherever i am configuring tls on the backend side (means container ) ...in apache.. or in some service config ... whatever ...it's done in service's role itself ...service-cert-copy need to ensure copy of that cert ... 13:38
SvenKieskelet me reread the code, it's been some days since I looked at it13:39
kevkoso in all cases i need to have cert inside if i placed into path kolla-ansible will find .. ..even if i want to run standalone 13:39
SvenKieskebut from the surface your explanation seems right13:39
SvenKiesketo be fair the code was already complicated before the change and I think the author is not that familiar with the code, so I can understand they made the most minimal change.13:41
SvenKieskebut wait, won't the copy task run into an error if there is no cert to be copied?13:44
SvenKieskethe detection of haproxy backend and tls support could be done easier I think though. it's rather complicated I think13:45
kevkoSvenKieske: Unfortunately, the author has to get used to receiving sometimes nonsensical objections or objections that they will have to refute in the comments, or he may completely rewrite the logic in several patch sets in some chain, if they defend it and the rework will makes sense. And I know what I'm talking about, I have it constantly here13:47
kevko...still explaining ..rewriting ..amending and amending , did you notice that ? :D :D 13:47
SvenKieskewell our patch workflow takes rather long from initial patch until it's merged. I don't think that's tracked anywhere, but I vividly remember my first patch taking almost a year or something, and it fixed a regression. But part of it taking so long was also me not working full time on it.13:49
SvenKieskeI would like to see that we spent more effort writing unit or integration tests when fixing bugs, so we can make sure that a bug doesn't resurface suddenly. I hope we can do that once we got tempest in, as I think that will boost our test matrix quite a bit.13:50
kevkoSvenKieske: yeah it will fail, but it's failing also now :D ... if kolla-ansible will not find let's say for service name rabbitmq .......   /etc/kolla/certificates/backend-cert.pem OR  /etc/kolla/config/controller0/rabbitmq-cert.pem OR /etc/kolla/config/controller0-cert.pem OR /etc/kolla/config/rabbitmq-cert.pem  it will fail 13:51
kevkoSvenKieske: you can't expect that task will not fail if the source for copy is missing, can you ? :D 13:51
SvenKieskesure13:52
kevkoSvenKieske: we are discussing  about when :  ...why it's there ...because from my perspective it's really not needed 13:52
SvenKieskemhm right, it doesn't even copy the certs into haproxy container, just into the project dir, they really don't hurt there13:53
SvenKieskewhy is that even special code, I mean it's just file copy, with a certain file mode set, that's all :D13:54
kevkoSvenKieske: yeah, and there is +2 where it's obvious it's bad :D ..but my code which is tested from bottom to top has -1 :D 13:54
SvenKieskeso not really a need for a dedicated role13:54
kevkoSvenKieske: why ? because you are not going to repeat the same code for every service ? :D 13:55
SvenKieskewell I think it depends on how you look at the code: if you look at only commit message and code change, I also think "yes, this does what it should do" I didn't think further if the old code is even needed. I think it's human nature: we take for granted what is already there :D13:55
SvenKiesketo be fair, your change is to a core component (copying stuff around in the core kolla libs), so I can understand doing more scrutiny there in contrast with adding only a single variable to a cert copy condition :)13:56
SvenKieskebut such "easy" changes can be deceptive as well.13:57
kevkoSvenKieske: also, i think there should be bugreport created ...because kolla saying we support TLS  on the backend side (means LB members ) ....but not saying that if you will use your own LB solution ..your certs will be not copied :D 13:59
SvenKieskemhmm, not sure what's really "supported" there with regards to different LB solutions? I love the term "supported" in relation to unpaid opensource support :D14:00
kevkoSvenKieske: yeah, but i provided a huuuuuge code serveral times and never reverted ... :) i think i need to take someone from my company to be core also :D 14:00
kevkoSvenKieske: once i need to create a video to just explain what is going on :D and put into bugreport as link to my server :D 14:01
kevkoSvenKieske: in past when you lost your haproxy with keepalived it takes minutes for services to start asking the new address 14:02
kevkoSvenKieske: it was one option in kernel param ...i needed to explain for months i think :D 14:02
SvenKieskeI mean I honestly would like to see less "I know this person, their code will be good" and more actual looking at the code with some scrutiny for sure. I understand why it's done the way it is, and I think it's normal human nature to do so, but well. I don't think it produces always the best results. I don't want to personally criticise anybody though, this happens to myself as well, when I think "a that guy has so many 14:04
SvenKieskecommits in project $foo, they won't certainly make many errors in this code?" 14:04
kevkoSvenKieske: last week we had upgrade to zed ...again we lost rabbitmq and we were unable to start rabbitmq ...(but from previous upgrade we knew what to do ...so it didn't hurt) ....but it's bug in kolla ...i probably know where is  a problem  ...but should i send a patch to upstream ?...i don't know .. it takes time to handle my waiting reviews :(14:05
SvenKieskekevko: I hope the rabbit situation will get better once we have proper quorum queues, but there are still some oslo patches from OVH lingering around afaik.14:06
kevkoSvenKieske: it's about clustering not about the queue types i would say 14:06
SvenKieskethere was a wrap up of that in one of the recent oslo meetings14:06
SvenKieskewell until some releases ago there was no clustering by default, at least that where the failures I have personally seen, did you already run everything with HA queues in yoga/zed and older releases? but I think there where also some buggy/false assumptions in core openstack services, e.g. with regards to "transient queues" which really aren't transient14:07
SvenKieskeso to be precise: yes of course rabbitmq was clustered, but the queues where not HA by default14:08
kevkolast thing what i wanted to say ..sometimes i am frustrated why some code can't be accepted ... why frustrated ...because i understand  new feature can break many things ...but if there is such type of risk ..it can be always written as "optional" ...so it will never run IF user will not allow14:09
kevkoSvenKieske: rabbit just said ...sorry ..blabla...mnesia tables...blabla ...inconsistency ..blabla ...14:10
kevkoSvenKieske: it's because of kolla should stop containers one by one and wait for stop... ...and when starting ... do it vice versa14:11
SvenKieskethat's why I mentioned more tests, because I think a lot of the fear comes from the fact we don't have - imho - that many tests catching errors early in CI, so we resolve this by being adding extra scrutiny during code review.14:14
SvenKieskeyeah the mnesia stuff, I think I have seen that, was it your bugreport or patch? I don't quite remember the details14:15
kevkoSvenKieske: point is that it work normally if you don't have many data .... it just don't work on big scale we have ...  how to do you replicate that ? i was trying to replicate ... i was not able to replicate it on test ..even if i run rally/tempest in several threads14:16
SvenKieskeyeah, scaling tests is hard, a colleague from another project just ran nova unrestricted on a 128 core server and nova defaults blew up the database due to spawning some worker subproccess * num_of_cpu and all hammered the DB :D14:18
SvenKieskekevko: if you want to look into the details: https://bugs.launchpad.net/nova/+bug/206345114:19
kevkoSvenKieske: well, we had nova DB performance problems ... that's the reason why I implemented proxysql ..14:20
SvenKieskeyeah, I really like proxysql, never could convince many people that it's good though (in the past). :(14:21
kevkoSvenKieske: well, on the end we have it in kolla-ansible and can be easy switched :) 14:23
kevkoSvenKieske: what version do you use ... for example osapi_compute_workers is not read by our setup as we serve api via apache ...14:26
SvenKieskeah that bug is not reported by myself, someone from another deployment project ran into it, they deploy openstack with k8s and now learn the hard way that you can't just use most openstack service defaults if you want to run in production :D14:29
SvenKieskeI just gave them the advice to learn from all the hard learned experience in kolla and copy our defaults where it makes sense, as I think we have much operational knowledge already encoded in kolla14:30
SvenKieskeso I don't know their exact setup14:30
kevkoSvenKieske: haha, our defaults are also not always good :D 14:32
kevkoSvenKieske: for example i rememebr that we need to amend max_overflow because of DB 14:33
opendevreviewMichal Arbet proposed openstack/kolla master: Fix handling configs in base image  https://review.opendev.org/c/openstack/kolla/+/91544015:08
kevkoSvenKieske: i marked one comment as resolved and one waiting for your reply ..check please 15:18
opendevreviewWill Szumski proposed openstack/kayobe master: Prevent accidental overriding of Ansible extensions  https://review.opendev.org/c/openstack/kayobe/+/91776216:35
opendevreviewMerged openstack/kolla stable/2023.2: nova-compute: Remove duplicate openvswitch package  https://review.opendev.org/c/openstack/kolla/+/91709818:36

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!