Friday, 2022-03-04

fungiyeah, i expected that might happen00:03
fungii also mentioned it as a possibility during the openstack tc meeting earlier today (yesterday utc), in case it's also happening to any of openstack's jobs which might have relied on those labels00:04
corvusNeilHanlon: is there a bug in zuul docs?00:04
opendevreviewIan Wienand proposed opendev/system-config master: docs: reorganise around a open infrastructure overview
opendevreviewIan Wienand proposed opendev/system-config master: docs: reorganise around a open infrastructure overview
*** wxy-xiyuan_ is now known as wxy-xiyuan01:45
corvusexecutor restart complete...i'm inclined to see if we can get some of the other changes landed and restart the schedulers after that tomorrow01:47
NeilHanloncorvus: I think so, but it may have just been my interpretation. says that nodesets specified as a dictionary in a job (instead of a string) need not specify the 'name' key as they are interpreted as anonymous pools. However the Nodeset docs say both are required, and indeed a03:54
NeilHanlonzuul run will error if the name key doesn't exist (e.g.:
NeilHanloni'd be happy to fixup docs and/or file a bug if needed 03:55
*** frenzy_friday is now known as frenzyfriday|ruck04:38
*** ysandeep|out is now known as ysandeep04:47
*** frenzyfriday|ruck is now known as frenzyfriday|rover05:40
*** ysandeep is now known as ysandeep|mtg07:13
*** jpena|off is now known as jpena08:10
*** ysandeep|mtg is now known as ysandeep|lunch08:45
mnasiadkaIs there a way to get newer Ansible version than 2.9 for Zuul executed playbooks?10:00
*** rlandy_ is now known as rlandy|ruck11:12
*** bhagyashris|ruck is now known as bhagyashris11:36
*** pojadhav is now known as pojadhav|brb13:09
fungimnasiadka: at the moment, zuul only supports ansible 2.8 and 2.9:
mnasiadkafungi: ok then, thank you - I'll need to live with ansible_os_family: "Rocky" instead of "RedHat" ;-)13:19
fungimnasiadka: adding a newer ansible version to zuul will entail a change something along the lines of which was what added support for 2.913:20
mnasiadkafungi: doesn't look bad, I'll think about adding 2.11 - thanks13:23
fungiprobably we should add 2.10 first or at the same time since i don't think zuul has ever supported a discontiguous series of minor versions, but i guess we can hash that out in review13:24
opendevreviewyatin proposed zuul/zuul-jobs master: [multi-node-bridge] Add script to configure connectivity
mnasiadkafungi: - seems there was an approach, but abandoned.13:46
fungiyeah, looks like it was mostly passing tests but that was over a year ago when it was last tested13:53
fungii'm happy to restore that change if you want to work on updating it13:54
mnasiadkasure, why not, can learn something new ;)13:54
fungiit's restored now13:56
mnasiadkathanks, rebased - let's see14:04
*** pojadhav|brb is now known as pojadhav|afk14:09
*** iurygregory_ is now known as iurygregory14:28
mnasiadkafungi: seems Zuul is stripping 0 from 2.10 - and tries to run ansible 2.1 - - any idea why?14:29
fungimnasiadka: yaml is probably interpreting that version field as a float instead of a str14:33
fungiwrap the value in quotes14:33
mnasiadkaok, makes sense14:33
fungiwe ran into a similar situation elsewhere recently trying to add python 3.10 jobs14:37
fungiwe've apparently not done a great job of quoting version strings in configs14:37
fungiit's really only a problem for two-component versions since they look like floats, three-component versions get treated as strings because they contain more than one .14:38
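The float-versus-string behaviour fungi describes can be reproduced without a YAML library. The sketch below is only a rough stand-in for how a YAML loader resolves an unquoted scalar: the regular expression and the `read_unquoted` helper are hypothetical and are not Zuul or PyYAML code.

```python
import re

# Assumption: an unquoted scalar made of digits, one dot, digits is read as a float.
# Real YAML resolvers accept more float spellings; this covers only the version case.
FLOAT_LIKE = re.compile(r"^[0-9]+\.[0-9]+$")


def read_unquoted(scalar: str):
    """Return what a loader would roughly hand back for an unquoted value."""
    if FLOAT_LIKE.match(scalar):
        return float(scalar)  # "2.10" becomes the number 2.1
    return scalar  # a second dot means it cannot be a float, so it stays a string


for version in ["2.9", "2.10", "3.10", "2.9.27"]:
    print(version, "->", repr(read_unquoted(version)))
```

Quoting the value in the config keeps it a string, so the trailing zero of 2.10 or 3.10 survives.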
* NeilHanlon is still sad the patch to backport rocky into 2.9 didn't make it14:41
corvusNeilHanlon: ah, i could see how that would be confusing.  the docs are trying to say that the first name attribute must be absent:
corvusNeilHanlon: not the second one which applies to the nodes:
corvusNeilHanlon: ie, the "nodeset" should not have a name since it's anonymous, but the nodes within the nodeset should.14:48
NeilHanlonahhh, yes. okay that makes sense lol14:49
corvusmaybe if we added the word nodeset before name there... or just added a parenthetical ("but the nodes still need names").... or, added a note in the nodeset side....14:50
corvusNeilHanlon: how's this look? remote: Clarify anonymous nodeset docs [NEW]        14:53
NeilHanlonyep, I think that clears it up! 14:55
*** dviroel is now known as dviroel|lunch14:56
opendevreviewyatin proposed zuul/zuul-jobs master: [multi-node-bridge] Add script to configure connectivity
corvusi'm going to complete the zuul rolling restart now (schedulers+web)15:44
corvusthis will entail both of the schedulers and the web service going offline for a few minutes; we will miss some gerrit events, but it shouldn't be a long outage.  existing queue states will remain.15:45
fungithanks corvus!15:49
corvusfirst scheduler is up; web and 2nd scheduler are on their way15:53
corvusi'm going to restart the nodepool launchers now15:53
corvusthat's done15:54
opendevreviewMerged openstack/diskimage-builder master: Correctly create DIB_ENV variable and dib_environment file
corvus#status log restarted zuul and nodepool launchers; schedulers are at bb2b38c4be8e2592dd2fb7f1f4b631436338ec98 executors a few commits behind, and launchers at ac35b630dfbba7c6af90398b3ea3c82f14eabbde15:56
opendevstatuscorvus: finished logging15:56
corvuscool, the node request time graph on is alive now15:58
corvusand everybody is back and running now16:00
fungithat node request time graph is going to be interesting for seeing how different pipeline priorities compare16:02
corvusyeah that should show up there16:03
opendevreviewyatin proposed zuul/zuul-jobs master: [multi-node-bridge] Add script to configure connectivity
*** dviroel|lunch is now known as dviroel16:20
clarkblooks like we are still a go for landing today to update the zuul-registry deployment on insecure-ci-registry's user? corvus  is that something you might be interested in reviewing given connection to zuul? Otherwise I think I'll approve it after my meeting this morning16:23
clarkbcorvus: one thing I notice on the performance metrics is the really good compression ratio16:26
corvusclarkb: nothing jumps out at me; lgtm thanks :)16:26
corvusclarkb: yeah, i'm wondering if it's "too" good, but i can't find an error in the code16:26
*** ysandeep is now known as ysandeep|out16:31
*** marios is now known as marios|out16:52
clarkbI have approved and will monitor it as it does its thing16:55
*** jpena is now known as jpena|off17:06
opendevreviewMerged opendev/system-config master: Adds support for running zuul-registry as a non-root user
clarkbthat is behind the hourly deploy jobs17:28
opendevreviewJames E. Blair proposed openstack/project-config master: Add more stats to zuul performance metrics dashboard
corvusinfra-root: ^ if you have a second for a quick review, that would be nice to get out there17:45
corvusi'm also wondering if reconfiguration_time is something we should add to the main zuul status page.. but maybe let's see what it looks like here first17:46
corvuson the ad-hoc graph i'm looking at, it looks like openstack reconfigures about every 5 minutes... and takes 2.5 minutes to do so.17:48
corvusthat's a wee bit more often than i would have expected.17:48
fungiyeah i had no idea it was that frequent17:49
opendevreviewMerged openstack/project-config master: Add more stats to zuul performance metrics dashboard
clarkb831462 triggered more jobs than I expected. I'm guessing due to the group vars update. We might want to look at those files specifications again18:14
clarkbIt shouldn't be a problem. Just takes longer to run the job we're actually interested in18:14
opendevreviewyatin proposed zuul/zuul-jobs master: [multi-node-bridge] Add script to configure connectivity
clarkboh heh the hourly jobs appear to have updated the insecure-ci-registry so this whole time I've been waiting for the job to run it was already done18:22
clarkbthe process is up and running as the expected user18:23
clarkbI guess I should recheck a change that will talk to it18:23
clarkb has been rechecked and should talk to it18:23
clarkball that to say initial indications are this is happy but double checking with actual jobs now18:25
corvuswhat's the status of the new lb?18:32
corvuslooks like the config changes merged... i'm guessing no one has launched a vm yet and that's next?18:33
clarkbcorvus: ya its been on my todo list to try and do that but I keep getting distracted18:33
clarkbbut ya we need to boot the instance, test it, then update dns to point at the lb instead of zuul0218:33
clarkbNext week I'll be afk a bunch for meetings. But if it isn't done by week after I can try and put it higher on the priority queue18:34
corvusi'm guessing we want it in rax-dfw?18:40
clarkbcorvus: ya that should match the region of the schedulers18:40
corvuslemme see if i can kick that off now18:40
corvuswe have an 8gb vm for gitea-lb... do we want to scale that down for zuul?18:41
corvusomg yes.18:42
clarkbcorvus: I think we can. The main consideration there is network bw and I think that scales with flavor size18:42
clarkbcorvus: however zuul web network traffic is pretty small compared to git traffic18:42
corvusoh yeah that must be why we did that.  i'll consult the rax tables.18:42
fungiis that the case in vexxhost?18:42
clarkbI'm not sure if it is the case in vexxhost18:42
fungii thought gitea-lb was in vexxhost, so that doesn't quite explain why we used an 8gb flavor there18:43
corvuswe're using basically no cpu and ram on gitealb18:44
fungibut yes, if we're creating the zuul lb in rackspace we'll want to be mindful of the flavor-specific bandwidth18:44
clarkbcorvus: ya haproxy is extremely efficient. We are limited by the application not haproxy18:44
corvusah rxtx factor is what we're looking for i think18:45
fungiyep, that's it18:45
corvuswe don't have a way to break out web traffic from zuul02... but we can probably subtract zuul01 from zuul02 and get a rough estimate of req bandwidth18:46
fungishould be more than accurate for that, yes. i expect we'll want to have a lot of breathing room anyway18:48
corvusback-of-napkin math says: zuul02 outbound bandwidth average 2.5mbps, zuul01 is 1.5; so we need 1mbps :)18:48
corvusso literally any flavor they have should work18:48
corvuswant to try a 2GB instance?  2vcpus, 240mbps, 80gb disk18:49
clarkbI think typically we have used the "performance" flavors which have smaller disks which may be helpful here since we don't need much disk18:50
corvussorry, we'd do performance flavor, so 40gb disk and 400mbps.  vcpu/ram is the same18:50
clarkbsounds great18:50
corvusmaking it so18:51
corvusfocal still the image du jour?18:53
fungi240mbps sounds like more than plenty19:01
fungieven if we missed a MB/s to Mbps conversion in there it's still an order of magnitude beyond what we need19:02
corvusthen 400mbps is even better :)19:02
fungiyep, wfm19:02
fungimoar betterer19:02
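The back-of-napkin estimate above can be written out as a quick check. This is a sketch using only the figures quoted in the discussion; the dictionary keys are made-up labels for the two options mentioned, not real flavor names.

```python
# Average outbound bandwidth quoted in the discussion, in Mbps.
ZUUL02_MBPS = 2.5  # host serving web traffic in addition to its scheduler
ZUUL01_MBPS = 1.5  # host without zuul-web

# Hypothetical labels for the two bandwidth figures that were mentioned.
FLAVOR_MBPS = {"2GB non-performance": 240, "2GB performance": 400}


def headroom(flavor_mbps: float, needed_mbps: float) -> float:
    """How many times over the flavor covers the estimated need."""
    return flavor_mbps / needed_mbps


web_mbps = ZUUL02_MBPS - ZUUL01_MBPS  # rough estimate of web traffic
worst_case_mbps = web_mbps * 8        # if the figure had really been MB/s

for name, mbps in FLAVOR_MBPS.items():
    print(name, "headroom:", headroom(mbps, web_mbps),
          "worst case:", headroom(mbps, worst_case_mbps))
```

Even in the worst case the smaller figure leaves a factor of 30, which matches the "order of magnitude" remark.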
corvusrunning unattended upgrades is taking a wee bit of time19:13
corvusgood, that was the prompt it needed to finish :)19:13
corvusthe rdns scripts don't seem to work... 19:15
corvusERROR: 'response'19:15
opendevreviewJames E. Blair proposed opendev/system-config master: Add zuul-lb01 to inventory
clarkbcorvus: it responds with error but the records are created19:18
opendevreviewJames E. Blair proposed opendev/ master: Add zuul-lb01
corvusclarkb: error: success19:19
fungithat's marvellous19:19
corvusstraight from the ministry of truth19:20
fungii figured it was ministry of information retrieval19:21
fungithe truth will MAKE you free19:21
elodillessorry, fyi, we now had again the issue with the create-yoga patches: they were not enqueued to the check queue (i mean they were, but no job matched for them if i remember correctly the error). do you need to leave them as they are to be able to debug those, or is that OK if I 'recheck' them?19:24
elodilles( )19:24
fungielodilles: you can recheck them19:24
elodillesfungi: ack, thanks19:24
fungiwe expect it's a race condition with layouts updating from the branch creation19:25
fungiwhich has been there for a while, but may have worsened when we started running more than one scheduler19:25
elodillesack, thanks for the details!19:26
opendevreviewMerged opendev/ master: Add zuul-lb01
fungijentoio: clarkb: looks like deployment of the updated registry container config finally happened19:36
clarkbfungi: ya I discovered it actually happened previously via the hourly job19:37
fungioh, awesome19:38
clarkbanyway I rechecked the gitea 1.16.3 change to see that it handles things happily and so far it seems fine19:38
opendevreviewMerged opendev/system-config master: Add zuul-lb01 to inventory
corvuszuul-lb service playbook is running20:24
corvusand now seems to be proxying20:24
clarkbit redirected me to zuul.o.o so seems it hit the backend20:25
clarkbI guess I need to set up /etc/hosts override to check it properly20:25
corvusi went to and got a cert warning but otherwise works20:26
clarkbmy /etc/hosts override shows that it seems to work from here as well20:26
fungii put "2001:4800:7818:104:be76:4eff:fe02:f30f" in my /etc/hosts and went to with my browser, no problems20:27
clarkbnote only zuul02 is in the balance pool currently20:27
clarkband zuul01 is not running a zuul-web but changing that should be straightforward. We put a cert on zuul01 already iirc20:28
fungiyep, that's the one the cert says i got20:28
fungizuul02 i mean20:28
corvusworking on a change for zuul01 now20:28
opendevreviewJames E. Blair proposed opendev/system-config master: Run zuul-web on zuul01 and add to load balancer
corvusi think we can go ahead and manually start the zuul-web process on 01; i'll go ahead and do that20:32
clarkbdoes it go through apache?20:33
corvusi think we were only not running the actual zuul-web service20:34
opendevreviewJames E. Blair proposed opendev/ master: Point zuul.o.o at the lb
corvusi think we should be able to land those changes at any time and in any order20:36
clarkbcorvus: fwiw the discussion about haproxy checks prompted me to look at that for gitea. If you think that would be helpful for zuul as well I can look into an update for that too20:36
clarkbbasically we do an http check against the backend which checks both apache and the service behind it are functional20:36
corvusyeah that sounds like it would be better20:37
corvusespecially since zuul-web can take a long time to start20:38
clarkbok I'll look at that20:38
corvuslooks like we're just doing tcp checks now20:38
corvusclarkb: thanks20:38
corvusin other news, the points on the grafana dashboard look ridiculous, i'll figure out how to make them less fisher price20:39
fungiyeah, those are some large dots relative to the spacing20:39
*** dviroel is now known as dviroel|out20:43
opendevreviewJames E. Blair proposed openstack/project-config master: Fix reconfiguration time graph
corvusthat fixes an oops on one of the graphs and also the dots20:44
clarkbcorvus: small issue on the zuul01 change one sec20:45
clarkband posted20:45
clarkbI noticed it putting the checks change together20:46
opendevreviewJames E. Blair proposed opendev/system-config master: Run zuul-web on zuul01 and add to load balancer
opendevreviewClark Boylan proposed opendev/system-config master: Do more robust checks against zuul-web with haproxy
clarkbI think ^ should do it for the checks20:52
corvusclarkb: cool, lgtm.  another option would be to have apache proxy the /health/ready endpoint (which is on a separate backend port) and check that.  but we intentionally don't start cherrypy until we're ready anyway, so this should be equivalent.20:56
fungicorvus: inline question on 83213821:08
opendevreviewJames E. Blair proposed opendev/system-config master: Run zuul-web on zuul01 and add to load balancer
corvusthx fixed21:11
opendevreviewClark Boylan proposed opendev/system-config master: Do more robust checks against zuul-web with haproxy
clarkbshould we hold off on landing the dns update until those two changes land and are confirmed working?21:13
fungii guess we could, just for a chance to double-check21:16
corvuseh, i'm not too worried.  either way :)21:17
clarkbya I guess its simple to fix if something has a sad21:18
corvusand the worst case scenario of zuul-web being semi-inaccessible on a friday afternoon isn't terrible21:19
clarkbmy typing is extra bad today because I'm on the laptop ensuring everything works before I depend on it next week. Unfortunately the network card in this thing has very variable rtt to my AP. I might swap it out with an intel ac200 in the future21:21
jentoiofungi: cool, glad to see it roll finally21:34
corvusclarkb: if you have a quick sec for this dashboard fix that'd be swell:
clarkboh yup missed it earlier21:40
corvusfungi: and if you have a sec for that'd be groovy21:40
opendevreviewMerged openstack/project-config master: Fix reconfiguration time graph
opendevreviewMerged opendev/ master: Point zuul.o.o at the lb
opendevreviewClark Boylan proposed opendev/system-config master: Do more robust checks against zuul-web with haproxy
clarkbcorvus: fungi  ^ I got the variable scoping wrong there. Yay for testing22:18
fungioh neat22:18
clarkbI see dns has updated for me22:43
fungifor me as well22:44
fungiwebui still solid here22:45
fungiif somewhat quiet. but it's the weekend22:45
opendevreviewMerged opendev/system-config master: Run zuul-web on zuul01 and add to load balancer
fungicorvus: the smaller dots look much better23:06
corvusdns updated for me, looks good23:20
corvusi think i'll try shutting down zuul-web on 01 and see how the lb responds23:20
clarkbcorvus: I don't know if that has applied yet?23:21
corvusyeah, logs are saying only 02 at this point23:21
clarkbits running jobs for that change now23:21
clarkbso should be soon23:21
opendevreviewJames E. Blair proposed opendev/system-config master: Allow zuul-lb to send stats to graphite
opendevreviewJames E. Blair proposed openstack/project-config master: Add zuul load balancer dashboard
corvusclarkb: fungi  something to do while waiting :)23:29
opendevreviewClark Boylan proposed opendev/system-config master: Don't run infra-prod-run-refstack on all group var updates
clarkb^ bugged me looking at the jobs running for the lb updates23:32
clarkbcorvus: the bits per second entries have a scale of 8 does that mean we are converting bytes to bits?23:33
corvuserm, i copied the existing dashboard and s/git/zuul/ :)23:37
corvusbtw, while i'm looking at that, does anyone know how to make grafana not look like a clickbait news site?23:38
clarkbya isn't it great they give a feed of their blog in the software?23:39
corvus"Learn these 10 secret tricks to grafana the most from your grafana" or whatever23:39
clarkb I'm not sure how to parse that23:40
clarkbsounds like if you have an account you change your account's preferences23:41
clarkbbut not sure how to drop it from the main page23:41
corvusclarkb: confirmed that haproxy reports bytes/sec and our dashboard translates to bits/s23:41
corvusso we can straight up see that looks like it does 100mbit continuous23:42
clarkbthe load balancer should be getting updated to do both 01 and 02 nowish23:42
clarkbya I think it just started the replacement process with the new config23:43
corvusand peaks around 400mbps23:43
clarkbI got what looked like a lack of css on first reload after the lb updated. I did a hard refresh and it seemed fine23:44
clarkbI am talking to 01 according to the cert23:44
corvusi am also talking to 0123:44
corvussome people are getting 02 though, i see log entries for both23:46
corvusi'm going to shut down 01 now23:46
clarkbthe tcp check doesn't quite do what we want there. Thats ok23:47
clarkb(it was a known issue)23:47
clarkbsince apache is listening on those ports the tcp check passes and we get a 500 error23:47
clarkbmy change which should land soon should fix that23:47
corvusoh has that change not landed yet?23:47
corvushah whoops ok23:47
clarkbnot yet. I can manually disable 01 in haproxy23:47
clarkbthat will fix it23:47
clarkbwe just have to manually reenable it again23:48
corvusnah, i mostly wanted to verify your change :)23:48
corvusi thought they were both going in together23:48
corvus01 is coming back up now23:48
corvusi shut down apache on 0123:49
corvusit did take the finger server out correctly :)23:49
corvusand 01 is out now because apache is down.  so the tcp check works.23:49
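The difference between the two kinds of health check discussed above can be shown with a small local demo. This is a sketch, not the haproxy configuration from the change under review: `BrokenBackend` is a hypothetical stand-in for a front end that is listening while the service behind it is down, and treating 2xx/3xx as healthy is an assumption of this example.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BrokenBackend(BaseHTTPRequestHandler):
    """Listens on the port but answers every request with a 500."""

    def do_GET(self):
        self.send_response(500)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def tcp_check(host, port, timeout=2.0):
    """Pass if anything accepts a TCP connection on the port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, port, path="/", timeout=2.0):
    """Pass only if an HTTP GET comes back with a 2xx or 3xx status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), BrokenBackend)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

tcp_ok = tcp_check(host, port)
http_ok = http_check(host, port)
server.shutdown()
server.server_close()

print("tcp check passes:", tcp_ok)
print("http check passes:", http_ok)
```

With the listener up but returning 500, the TCP check still passes while the HTTP check fails, which is the gap the HTTP-level check is meant to close. Once the listener itself is stopped, the TCP check fails too, as seen when apache on 01 was shut down.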
fungiwe could redirect from the root grafana page to the dashboards index23:59
