Thursday, 2022-10-13

ianwi'm really struggling to see why bridge01.opendev.org doesn't pick up variables from the bastion group in  https://review.opendev.org/c/opendev/system-config/+/861112 00:29
Clark[m]Is it the zuul level or the inner nested Ansible that isn't picking it up?00:35
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86111200:52
ianwin inner ansible.  why it finds it when it uses bridge.openstack.org is really weird :/00:53
*** rlandy is now known as rlandy|out01:00
ianwfor some reason when we switch the bastion name, it is picking up https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/templates/group_vars/control-plane-clouds.yaml.j2 01:35
opendevreviewIan Wienand proposed opendev/system-config master: Run jobs with a jammy bridge.openstack.org  https://review.opendev.org/c/opendev/system-config/+/85779902:12
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls  https://review.opendev.org/c/opendev/system-config/+/85800302:12
opendevreviewIan Wienand proposed opendev/system-config master: Abstract name of bastion host for testing path  https://review.opendev.org/c/opendev/system-config/+/85847602:12
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: use abstracted hostname  https://review.opendev.org/c/opendev/system-config/+/86103102:12
opendevreviewIan Wienand proposed opendev/system-config master: Convert production playbooks to bastion host group  https://review.opendev.org/c/opendev/system-config/+/85848602:12
opendevreviewIan Wienand proposed opendev/system-config master: Run a base test against "old" bridge  https://review.opendev.org/c/opendev/system-config/+/86080202:12
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86111202:12
opendevreviewIan Wienand proposed opendev/system-config master: Move clouds definitions into control-planes-clouds group  https://review.opendev.org/c/opendev/system-config/+/86113002:12
opendevreviewIan Wienand proposed opendev/system-config master: Abstract name of bastion host for testing path  https://review.opendev.org/c/opendev/system-config/+/85847603:26
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: use abstracted hostname  https://review.opendev.org/c/opendev/system-config/+/86103103:26
opendevreviewIan Wienand proposed opendev/system-config master: Convert production playbooks to bastion host group  https://review.opendev.org/c/opendev/system-config/+/85848603:26
opendevreviewIan Wienand proposed opendev/system-config master: Run a base test against "old" bridge  https://review.opendev.org/c/opendev/system-config/+/86080203:26
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86111203:26
fricklerfwiw I noticed that "duplicated IRC verification failed message" symptom on kolla changes earlier. bit confusing, but no easy way to resolve it, I guess05:02
opendevreviewIan Wienand proposed opendev/system-config master: infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113805:50
ianwclarkb: ^ your comment on https://review.opendev.org/c/opendev/system-config/+/861031/3 inspired 861138.  i think i intended the bootstrap process to run against bridge directly; not via the playbooks/zuul/run-production-playbook path.  05:52
ianwi might have just got distracted on seeing it through.  05:52
*** ysandeep|out is now known as ysandeep05:55
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to bridge01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/86111206:00
opendevreviewTakashi Kajinami proposed opendev/system-config master: Add puppetlabs packages for Ubuntu Jammy to mirror  https://review.opendev.org/c/opendev/system-config/+/86113906:00
*** jpena|off is now known as jpena07:18
*** pojadhav is now known as pojadhav|afk07:49
*** pojadhav|afk is now known as pojadhav08:16
*** ysandeep is now known as ysandeep|afk08:17
*** rlandy|out is now known as rlandy10:24
opendevreviewMerged opendev/irc-meetings master: Update Barbican meeting chair and time  https://review.opendev.org/c/opendev/irc-meetings/+/86092910:29
*** ysandeep|afk is now known as ysandeep10:42
*** rlandy is now known as rlandy|mtg11:09
*** rlandy|mtg is now known as rlandy11:50
*** dasm|off is now known as dasm13:00
*** knikolla[m] is now known as knikolla13:26
*** ysandeep is now known as ysandeep|dinner14:19
opendevreviewMerged opendev/system-config master: Revert "Pin version of grafana-oss container"  https://review.opendev.org/c/opendev/system-config/+/85205614:48
*** dviroel_ is now known as dviroel14:55
*** ysandeep|dinner is now known as ysandeep15:18
clarkbfrickler: if you have time for https://review.opendev.org/c/opendev/system-config/+/861117 today and don't have objections for us restarting Gerrit later today this will fix the gerrit + ssh rsa key issue some users have run into with newer openssh clients15:25
*** ysandeep is now known as ysandeep|out15:33
*** marios is now known as marios|out15:38
*** tkajinam is now known as Guest297115:43
clarkbfungi: you up for testing meetpad today to double check the recent update before the PTG starts next week?16:31
clarkbif so what is a good time for that?16:31
fungiclarkb: i can give it a go in a few minutes if you're available16:32
clarkbthat works for me16:34
*** jpena is now known as jpena|off16:34
clarkbwe can use https://meetpad.opendev.org/isitbroken just let me know when you are ready16:35
fungiclarkb: i've joined if you have a few minutes16:43
jrosser_clarkb: i got some mail from review.openstack.org today so messagelabs is passing those now - i added a rule for the domain. unusually there were no held messages that i could release from quarantine so i'm not sure what was happening to them before.16:47
clarkbjrosser_: well it seemed that the remote mail server was rejecting them outright according to the responses our mail server was getting16:52
clarkbwouldn't surprise me if they never made it far enough to be quarantined16:52
fungiprobably someone who uses that same mail service flagged messages from the address as spam or something16:53
clarkbI think gitea-lb01 will be replaced with a jammy gitea-lb02 since replacing that server should be really simple. I think all it requires is making the new server and updating DNS16:58
clarkber will be my first jammy replacement16:58
clarkbthat way I'm not debugging afs on jammy and other jammy things if they come up16:58
fungiyeah, should be quite trivial16:59
fungieven the dns update ought to be effectively hitless for clients as long as we don't take down the original server until the old ttl expires17:00
clarkbyup17:01
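The "hitless as long as we don't take down the original server until the old ttl expires" point above can be sketched as a small calculation. This is only an illustration: the function name, the switch time and the 3600 second TTL are hypothetical values, not taken from the actual opendev.org zone, and resolvers that ignore TTLs may cache for longer.

```python
from datetime import datetime, timedelta


def earliest_safe_shutdown(dns_switched_at: datetime, old_ttl_seconds: int) -> datetime:
    """Return the first moment when well-behaved resolvers should no longer
    hold the old record: the time of the DNS change plus the TTL the old
    record was served with."""
    return dns_switched_at + timedelta(seconds=old_ttl_seconds)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    switched = datetime(2022, 10, 14, 16, 0, 0)
    print(earliest_safe_shutdown(switched, old_ttl_seconds=3600))
```

Lowering the TTL ahead of the switch shortens this window, which appears to be the idea behind preparing the records before swapping them.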
clarkbhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66612&rra_id=all I think that shows just under 8GB of memory is appropriate so I'll stick with the same flavor size17:02
fungiagree17:03
clarkbhrm vexxhost has v3 flavors now though so I should look at this more carefully. This might be the most complicated part :)17:03
fungisizing based on processors may be better in that case17:03
fungiwe may end up with too few cpus if we choose based on ram17:03
clarkbhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66609&rra_id=all I don't think we are very cpu limited17:04
fungipretty sure the newer flavors there are comparatively generous on ram17:04
clarkbv3-standard-2 is 8GB memory and 2 dedicated vcpu. We currently run v2-highcpu-8 which is 8GB memory and 8 vcpu (not sure if dedicated)17:06
clarkbbut it really seems like we don't use a ton of cpu according to cacti. I think v3-standard-2 will be sufficient17:07
clarkbI'll go ahead with v3-standard-2 and we can always start over if necessary17:08
clarkbhrm there is no jammy image in vexxhost yet17:10
clarkbmnaser__: ^ is that something y'all are planning to do or should I upload our own image?17:11
mnaser__clarkb: feel free to upload one, i think we've got a couple other things we're juggling around17:11
clarkback17:13
clarkbhttps://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img is the image I'll pull down and reupload17:13
mnaser__clarkb: you might want to flip it to raw if you're doing bfv :>17:14
clarkbI don't think we're doing bfv. But ya I'll upload raw to make it more versatile17:14
* clarkb converts it locally first to ensure I don't fill bridge's disk17:16
clarkbI'm uploading the image now finally (took some time to convert and copy it and check hashes and so on)17:48
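The convert, check hashes, upload sequence described above might look roughly like the sketch below. It only builds the command lines and computes a checksum; nothing is executed. It assumes the downloaded Ubuntu cloud image is in qcow2 format, and the local file names and the image name are hypothetical, not the ones actually used.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-GB image is never held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def convert_to_raw_cmd(src, dst):
    """argv for converting a qcow2 image to raw (assumes src is qcow2)."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", src, dst]


def upload_cmd(raw_path, image_name):
    """argv for uploading the raw image with the openstack client."""
    return [
        "openstack", "image", "create",
        "--disk-format", "raw",
        "--container-format", "bare",
        "--file", raw_path,
        image_name,
    ]


if __name__ == "__main__":
    # Hypothetical file and image names, for illustration only.
    print(" ".join(convert_to_raw_cmd("jammy.img", "jammy.raw")))
    print(" ".join(upload_cmd("jammy.raw", "ubuntu-jammy-example")))
```

Comparing `sha256_of()` of the download against the published SHA256SUMS entry before converting is one way to do the hash check mentioned in the log.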
*** Guest2868 is now known as diablo_rojo17:59
clarkbI'm working through some ssh issues trying to boot the node now18:01
clarkbadding local debugging to launch-node18:01
clarkbI think authentication is failing but it is failing suspiciously quickly. I suspect that this may be related to when ssh tells you to go away as the server isn't fully booted yet. I may need to manually boot an instance and test it18:08
fungidoes passing --keep to the script not help?18:09
clarkbthat's the next step. I wanted a manual boot to control everything and get a baseline that the image works (it does)18:14
clarkbwhen you use --keep sometimes cleaning up everything that it leaves behind isn't straightforward18:14
clarkbI'm suspecting that maybe the userwarning about the unknown host key is related18:14
clarkboh nevermind we set the policy to warn which is what it does18:19
clarkbI think this might be rsa sha2 + paramiko problems18:34
clarkbparamiko doesn't have an ed25519 generate method18:38
clarkbwe could maybe use ecdsa or upgrade paramiko instead. I guess that should be my next thing to try. Install stuff into a venv. Run out of venv with newer paramiko18:38
clarkbbut I must eat lunch now18:39
ianwclarkb: if you have a sec to think about https://review.opendev.org/c/opendev/system-config/+/861138 which is a slight rework to the bridge bootstrap job, that would be good, as if ok i'll have to rebase the upgrade bits on it.  19:49
ianwthanks for looking at the jammy images, lmn if i can help ...19:50
ianwone other thing i noticed we should probably venv-ize is docker-compose.  not sure i want to switch every production host all at once, maybe an opt-in update19:51
clarkbI was just looking to see where we install paramiko on bridge and I'm not seeing it19:51
clarkbI suspect it may have been pulled in as an ansible dep in the past but it is no longer an ansible dep?19:51
clarkbanyway I think the next thing with jammy is to make a virtualenv with a new paramiko and run out of that then see if that fixes the auth issues19:52
clarkbyup that fixed it19:55
clarkbnow to see if we can bootstrap the rest of a jammy node19:55
*** dviroel is now known as dviroel|biab19:55
clarkbinfra-root I think the "fix" here is to update paramiko on bridge to latest. I'm not aware of anything else using paramiko there (ansible uses openssh)19:56
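A quick way to tell whether an installed paramiko is likely new enough is to compare its version against the release that added the SHA-2 RSA signature algorithms. The sketch below assumes that release is 2.9.0, which should be confirmed against the Paramiko changelog; the helper names are hypothetical and this is not part of launch-node.

```python
# Assumption: rsa-sha2-256/512 support arrived in Paramiko 2.9.0.
RSA_SHA2_MIN = (2, 9, 0)


def parse_version(text):
    """Turn a version string like '2.9.0' or '3.0.0rc1' into a 3-tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_rsa_sha2(version_text):
    """True if this version is at or above the assumed minimum release."""
    return parse_version(version_text) >= RSA_SHA2_MIN


if __name__ == "__main__":
    try:
        import paramiko
        print(paramiko.__version__, supports_rsa_sha2(paramiko.__version__))
    except ImportError:
        print("paramiko is not installed in this environment")
```

Running this inside the virtualenv and again with the system Python would show whether the two environments sit on different sides of that threshold.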
opendevreviewClark Boylan proposed opendev/system-config master: Add Jammy gitea-lb02 to our inventory  https://review.opendev.org/c/opendev/system-config/+/86122620:10
clarkbit appears that this IP was used by an older server in the past. We have a known hosts entry for it that conflicts. I'm going to clear that out20:15
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Add gitea-lb02 to DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/86122720:22
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Prepare opendev.org records for switch to new LB  https://review.opendev.org/c/opendev/zone-opendev.org/+/86122820:29
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Swap gitea-lb01 to gitea-lb02 for opendev.org  https://review.opendev.org/c/opendev/zone-opendev.org/+/86122920:29
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Reset TTLs for opendev.org records  https://review.opendev.org/c/opendev/zone-opendev.org/+/86123020:29
clarkbianw: I left a note on that change. I think it is mostly workable except for the one thing I called out20:31
opendevreviewClark Boylan proposed opendev/system-config master: Fixups for launch node  https://review.opendev.org/c/opendev/system-config/+/86123120:39
opendevreviewMerged openstack/diskimage-builder master: Added cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/85585620:41
clarkbianw: https://review.opendev.org/c/opendev/system-config/+/861117 is a good one for fixing ssh + rsa + sha1 with gerrit.20:43
clarkbI'm doing all the ssh + rsa + sha1 things today :)20:43
*** dviroel|biab is now known as dviroel20:51
ianwclarkb: hrm, yes good call on that key.  that's the key that is used to login to all the hosts, right?21:09
clarkbianw: I think so21:28
clarkbHere's haproxy on the test gitea-lb02 jammy node logging what appear to be successful requests https://zuul.opendev.org/t/openstack/build/0681e64943674281bbfa1645e51df878/log/gitea-lb02.opendev.org/haproxy.log21:30
clarkbI think https://review.opendev.org/c/opendev/system-config/+/861226 and https://review.opendev.org/c/opendev/zone-opendev.org/+/861227 are probably safe to land21:31
*** dviroel is now known as dviroel|afk21:33
clarkbif we ever need good examples of what healthy mailing list interactions look like I've been really impressed with the mailman3 users list21:58
opendevreviewMerged opendev/system-config master: Update our Gerrit images  https://review.opendev.org/c/opendev/system-config/+/86111722:10
*** rlandy is now known as rlandy|bbl22:12
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113822:23
ianwclarkb: ^ do you want to restart gerrit for 861117?22:24
clarkbianw: ya I can do it. Probably in about an hour just so that it's as quiet as possible first22:26
ianw++ i can do it my afternoon if you like too22:26
clarkbI should probably do one to keep in practice :) It should be a docker-compose pull && docker-compose down && docker-compose up -d right?22:28
ianwyep that's what i do :)  i usually run an inspect on the latest image just to double check it is what i think it is before restarting22:30
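The `pull && down && up -d` chain discussed above stops at the first failing step. A minimal Python sketch of the same behaviour follows; it assumes it is run from the directory holding the compose file, and `run_chain` is a hypothetical helper, not something in system-config.

```python
import subprocess

# The three steps named in the log, in order.
RESTART_STEPS = [
    ["docker-compose", "pull"],
    ["docker-compose", "down"],
    ["docker-compose", "up", "-d"],
]


def run_chain(steps, runner=None):
    """Run each argv in order and stop at the first non-zero exit code,
    mirroring shell `&&`. Returns the list of steps that succeeded."""
    if runner is None:
        # Default runner executes the command for real.
        runner = lambda argv: subprocess.run(argv).returncode
    completed = []
    for argv in steps:
        if runner(argv) != 0:
            break
        completed.append(argv)
    return completed
```

Passing a fake `runner` makes it possible to rehearse the sequence without touching the running service; the image inspection step mentioned above would fit between the pull and the down.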
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113822:33
ianwclarkb: 861226 should be safe to merge, as the host will just sit there until we switch dns?  might be good to just get it in and make sure it's running ansible ok, etc?22:35
clarkbianw: yup exactly22:39
clarkbinfra-root one thing I notice on review02 is that we may want to run `docker image prune -f` there? seems we've got a number of older images and I'm not sure how much value those provide but they do consume disk22:41
ianwyeah, i think we can prune most.  i guess we just keep them out of an abundance of caution so we can roll backwards22:42
clarkbyup, on a number of other services we automatically prune but we don't do that for gerrit (which seems sane? I dunno)22:43
clarkbI think it will leave behind images for gerrit 3.2, 3.3, and 3.4 since they are tagged22:43
clarkb(which I'm also fine with)22:43
clarkbshould I go ahead and run that? the upside to running it before we pull the new image too is then we can have the current image left behind untouched by prune for rolling back to22:44
ianwcan you do something like prune all but the latest 2?22:45
clarkbno unfortunately. You can time bound and leave behind tagged images. I guess you can figure out what the last two are and tag those before pruning22:46
clarkbmaybe do something like pruning everything more than 5 months old?22:46
clarkbthat should keep the last 2-3 images22:47
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113822:47
ianwmaybe start with that and just manual prune any left overs?  should remove the bulk of them22:48
clarkbya22:48
clarkb`sudo docker prune --filter "until=2022-05-01T00:00:00" -f` maybe22:49
clarkbrunning that command locally against the images i have on my laptop it pruned what I expected22:50
clarkbeverything with a tag was kept and then everything without a tag older than 5 months was removed22:51
clarkbhowever I don't have an untagged image newer than 5 months22:51
ianwlgtm; we don't need those 15/20 month old things22:51
clarkbok I'll run that on review02 now22:51
clarkber `sudo docker image prune` not `sudo docker prune`22:52
clarkbTotal reclaimed space: 17.72GB and the image listing looks sane22:54
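The "prune everything more than 5 months old" cutoff used above can be derived rather than typed by hand. The sketch below rounds down to the first of the month, which reproduces the 2022-05-01 timestamp from this log; the helper names are hypothetical and the command is only assembled, not run.

```python
from datetime import datetime


def prune_cutoff(today, months_back):
    """Return midnight on the first day of the month `months_back` months
    before `today`."""
    index = today.year * 12 + (today.month - 1) - months_back
    year, month_zero_based = divmod(index, 12)
    return datetime(year, month_zero_based + 1, 1)


def prune_cmd(cutoff):
    """argv for pruning dangling images created before `cutoff`.
    Without -a, `docker image prune` leaves tagged images alone."""
    stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%S")
    return ["sudo", "docker", "image", "prune", "--filter", f"until={stamp}", "-f"]


if __name__ == "__main__":
    print(" ".join(prune_cmd(prune_cutoff(datetime(2022, 10, 13), 5))))
```

A shorter window, such as the one week floated for other hosts, would need day-based arithmetic instead of the month rounding used here.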
clarkbone thing we could add to the gerrit automation is a docker image prune with an until of like 6 months22:57
clarkbadding an until to all our image prunes to keep the last week might also be good (though I think space is tighter on some hosts)22:57
clarkbI'll have to think about that a bit more. Maybe do it on a case by case basis22:57
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113822:59
clarkbthere are two changes in the openstack gate pipeline that look close to merging. I'll wait for those to clear out then do the gerrit restart23:12
clarkbfungi: if you are around this evening it would be great if you can test and confirm removing your sha1 override is functional too, but that can happen tomorrow23:12
fungiyeah, i can test, np23:13
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113823:18
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113823:27
clarkbthe last job I was waiting on is finishing up now.23:27
clarkband it's done. Running docker-compose pull and then will inspect it to double check23:29
clarkbopendevorg/gerrit@sha256:9947e82a212c9a00c7171a656e8935485522d509a49d5b97fde5e54fabfaf7c9 seems to match docker hub23:30
clarkbproceeding with the down and up -d now23:31
clarkbthe web ui is up for me23:33
clarkbthere is an IBM IP spamming the error log about not being able to negotiate ssh connections but that is preexisting in the log pre restart as well23:33
clarkbthe client only does diffie hellman sha1 variants and gerrit doesn't do those23:34
clarkbfungi: are you able to ssh without your config override?23:35
clarkbI can using my ed25519 key which doesn't tell us much other than ssh is generally working23:35
opendevreviewIan Wienand proposed opendev/system-config master: [wip] infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113823:37
fungiclarkb: yep! it works23:38
clarkbwoot23:38
fungii no longer need the override for it23:39
clarkbI just realized that we're still installing the 3.5.2 tag for a number of plugins when we build our gerrit war. We should update those to 3.5.3 but I don't think this is urgent. Often times the tags point to the same commit and we were already running on 3.5.3 with 3.5.2 plugins for a few days23:40
clarkbworking on a change for that now23:40
clarkband then I should write an email followup to the last email I sent about rsa keys23:41
clarkbyup confirmed that for 3.5.x all of the 3.5.2 plugins we install have 3.5.3 tagged on the same commit so definitely not urgent, just good bookkeeping23:45
clarkboh but I have discovered that we apparently try to checkout plugins/its-base on stable-3.6 which no longer exists (it got merged into master) so zuul must be using a stale version of that branch? And a stable-3.5 branch exists for that now (instead of master) which is 3 commits behind master all of which appear related to stable-3.6. I'll correct that for consistency too23:51
*** dasm is now known as dasm|off23:52
opendevreviewIan Wienand proposed opendev/system-config master: infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/86113823:54
ianwclarkb: ^ I think that works around the issue with the root key.  and gives us a point to write out other things if we need to23:55
*** rlandy|bbl is now known as rlandy23:55
opendevreviewClark Boylan proposed opendev/system-config master: Resync gerrit plugin versions to latest gerrit releases  https://review.opendev.org/c/opendev/system-config/+/86127023:58
clarkbok I think that ^ is bookkeeping caught up. Reviewers should pay attention to the its base changes23:58
clarkbsince that is the only real thing that changes there.23:58
*** rlandy is now known as rlandy|out23:59
