Monday, 2020-09-14

*** DSpider has quit IRC00:37
fungiyes, i agree the fact that we're logging both vhosts to the same file makes investigating this slightly more confusing00:47
fungiultimately, i would expect requests considered for caching to either be logged as a cache hit or a cache miss00:47
fungirequests logged as neither are, i think, not being considered for caching at all00:47
fungipossibly skipped by the cache mod, possibly not routed to it, i'm not sure which00:48
openstackgerritIan Wienand proposed opendev/system-config master: zuul-web: move LogFormat combined-cache into config
*** user_19173783170 has joined #opendev01:16
ianwfungi: ^ i agree it's not being considered, with the mod_cache status just "-"01:21
user_19173783170when i register my openstack foundation account, it always prompts "Please confirm that you are not a robot", why can't i receive the captcha?03:14
user_19173783170when i register my openstack foundation account, it always prompts "Please confirm that you are not a robot", why can't i receive the captcha?03:15
ianw2001:4800:7819:103:be76:4eff:fe04:5870 - - [2020-09-14 03:19:27.549] "GET /api/tenant/pyca/status HTTP/1.1" 200 1140 cache hit "-" "curl/7.47.0"03:20
ianwso so much is wrong03:20
ianwuser_19173783170: what's the page url?03:20
user_19173783170it's this:""03:21
ianwuser_19173783170: ok, so you don't see the "i'm not a robot" check box down the bottom?  i do03:23
ianwor are you saying you select that and it doesn't believe you?03:23
user_19173783170i dont see it03:23
user_19173783170is the reason my ip is in china?03:24
ianwuser_19173783170: do you have any sort of ad-blockers or similar installed?03:24
ianwoh ... china ... well maybe?  it's a standard reCAPTCHA box i see03:24
ianwGoogle reCAPTCHA works in China, as long as you reference reCAPTCHA library by instead of See developer doc section “Can I use reCAPTCHA globally”03:25
user_19173783170dont have ad-blockers03:25
ianw<script src=''></script>03:26
ianwso it looks like that recaptcha should probably be not referencing for global support03:26
ianwuser_19173783170: it looks like the website will have to fix this ... can you use a vpn :/03:28
clarkbfwiw I expect jimmy can help tomorrow. Maybe file a bug?03:28
ianwsorry not sure what else to suggest.  we definitely have users from China, but I'm not sure if they worked around this or it's something new03:28
clarkbis the bug tracker and I can ping jimmy et al in the morning03:30
ianwclarkb / user_19173783170 : i can quickly file the bug03:30
user_19173783170i use this for the first time03:31
openstackLaunchpad bug 1895496 in openstack-org "User from China reporting reCAPTCHA does not work" [Undecided,New]03:32
ianwuser_19173783170: ^ i'm afraid we might have to wait for a resolution on (which i imagine will happen US daytime tomorrow) to get this going for you03:33
user_19173783170no problem, thanks for your help03:33
openstackgerritIan Wienand proposed opendev/system-config master: zuul-web: rework caching
ianwfungi / clarkb: ^ i've been poking on the 000-default.conf to come up with that.  with it, i'm seeing everything get its cache-event flag in the logs filled out.  i think it's on the way, at least04:11
*** lpetrut has joined #opendev05:57
*** ysandeep|away is now known as ysandeep06:10
*** cgoncalves has joined #opendev06:13
*** qchris has quit IRC06:21
openstackgerritCarlos Goncalves proposed openstack/project-config master: Update branch checkout for octavia-lib DIB element
*** qchris has joined #opendev06:33
*** hashar has joined #opendev06:40
*** andrewbonney has joined #opendev07:42
*** ysandeep is now known as ysandeep|lunch07:46
openstackgerritPierre-Louis Bonicoli proposed zuul/zuul-jobs master: default test_command: don't use a shell builtin
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
openstackgerritSorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Create elastic-recheck container image
*** DSpider has joined #opendev08:24
*** tosky has joined #opendev08:27
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add support to use stow for ensure-python
openstackgerritwu.shiming proposed openstack/diskimage-builder master: Remove install unnecessary packages
openstackgerritPierre-Louis Bonicoli proposed zuul/zuul-jobs master: default test_command: don't use a shell builtin
*** ysandeep|lunch is now known as ysandeep09:10
*** sshnaidm|pto is now known as sshnaidm09:10
*** dtantsur|afk is now known as dtantsur10:27
openstackgerritSorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Create elastic-recheck container image
*** ykarel has joined #opendev10:52
ykarelIs there some issue with
ykarelit returns 50010:52
cgoncalvesI can confirm 500s11:01
user_19173783170i have solved the problem which can't receive the CAPTCHA in chinese ip11:14
user_19173783170the solution is installing a plugin which named "Ghelper" in Chrome11:16
user_19173783170i also want to ask how to relate my openstack foundation account to my ubuntuone account11:33
*** lpetrut has quit IRC12:00
*** lpetrut has joined #opendev12:01
*** Goneri has joined #opendev12:11
*** priteau has joined #opendev12:14
*** slaweq_ has joined #opendev12:37
ttxuser_19173783170: ah, good to know. I'll pass the info by12:41
ttxuser_19173783170: For your account issue, you should send an email to so that they can help you12:42
*** mnaser has quit IRC13:32
*** mnaser has joined #opendev13:32
*** mnaser has quit IRC13:32
*** mnaser has joined #opendev13:32
*** tkajinam has quit IRC13:37
*** slaweq_ has quit IRC13:38
fungiykarel: yeah, emilienm reported it in #openstack-infra too. taking a look now13:59
fungi#status log restarted houndd on codesearch.o.o following a json encoding panic at 10:03:40z
openstackstatusfungi: finished logging14:01
fungiykarel: ^ it should be on its way back up now14:02
fungicgoncalves: ^14:02
cgoncalvesfungi, thanks! waiting for reindexing to finish :)14:02
fungiyeah, it takes a few minutes for that to complete unfortunately14:04
*** auristor has quit IRC14:05
*** sshnaidm is now known as sshnaidm|afk14:07
ykarelfungi, Thanks14:08
fungiuser_19173783170: if you use the same e-mail addresses for both your openstack foundation and ubuntuone accounts, then we'll be able to correlate them14:12
*** hashar has quit IRC14:14
*** auristor has joined #opendev14:18
dmsimardbtw seeing all fedora-31 based jobs fail in RETRY_LIMIT due to unbound being in an "unknown state", i.e:
dmsimardwon't have time to troubleshoot for a while longer but wanted to point out in case others have a similar issue14:20
*** ykarel_ has joined #opendev14:25
*** ykarel has quit IRC14:28
*** ykarel__ has joined #opendev14:32
*** ykarel_ has quit IRC14:35
*** ykarel__ is now known as ykarel14:35
openstackgerritCarlos Goncalves proposed openstack/project-config master: Add 'check arm64' trigger to check-arm64 pipeline
*** slaweq_ has joined #opendev14:40
*** icey has quit IRC14:48
*** icey has joined #opendev14:49
*** slaweq_ has quit IRC14:55
*** ykarel is now known as ykarel|away15:00
*** lpetrut has quit IRC15:06
*** Topner has joined #opendev15:11
*** Topner has quit IRC15:11
*** ykarel|away has quit IRC15:20
*** Topner has joined #opendev15:23
*** lpetrut has joined #opendev15:28
*** priteau has quit IRC16:04
openstackgerritMerged opendev/system-config master: zuul-web: move LogFormat combined-cache into config
*** ysandeep is now known as ysandeep|away16:31
*** ykarel|away has joined #opendev16:56
*** mlavalle has joined #opendev16:57
*** lpetrut has quit IRC16:57
*** Gyuseok_Jung has quit IRC17:00
*** ykarel|away has quit IRC17:05
openstackgerritSorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Create elastic-recheck container image
clarkbI'm looking at the fedora issue dmsimard reported17:20
fungithanks, i hadn't gotten time to dig into that yet17:20
clarkb seems related17:20
openstack bug 1853736 in systemd "systemctl show service fails with "Failed to parse bus message: Invalid argument"" [Unspecified,Closed: errata] - Assigned to systemd-maint17:20
clarkbit seems that they fixed fedora 32 but not 3117:22
fungifedora 31 is *so* last year (literally!)17:23
clarkbya I'm not sure how to handle this17:23
clarkbwe could shell out on f31 but I worry the next use of the service module will just fail17:23
clarkbthough in this case maybe having a working base job is enough for us then jobs can correct their use of service otherwise17:24
dmsimardo/ thanks for looking into this -- it started happening fairly recently, worked until now17:31
*** andrewbonney has quit IRC17:31
dmsimardI can confirm that bumping to f32 "fixes" it17:31
clarkbI'm just going to brute force the service restarts with commands17:32
clarkbworking on that change now17:32
*** dtantsur is now known as dtantsur|afk17:35
clarkbhow do you use different handlers based on some criteria?17:40
clarkbdo we need to have different notifying tasks for those criteria?17:40
openstackgerritClark Boylan proposed opendev/base-jobs master: Test handling unbound restart on fedora 31
clarkbI guess we can test and see if ^ works17:45
clarkbanother option may be to just deprecate f31 quickly? I dunno where f33 is at. Something to talk to ianw about I guess17:50
clarkbinfra-root and are two different followups to the zuul web performance issues from last week. One addresses caching and the other adds more zuul-webs17:51
dmsimardf33 is in beta right now iirc17:52
fungiand includes zuul!17:53
* fungi is so proud17:53
clarkbdmsimard: ya I mean if the fix works I think we should land it. Mostly concerned that ansible makes this difficult17:53
clarkbits intentionally hidden in a test role to start too since I'm not super confident in it17:53
corvusclarkb: i vote try cache first then scale; i reviewed accordingly17:56
corvusclarkb: that okay, or do you want to get them going in parallel?17:56
clarkbcorvus: I think doing them one after another to better see impact is a good idea17:56
openstackgerritMerged opendev/system-config master: zuul-web: rework caching
openstackgerritMerged openstack/project-config master: Revert "Pin setuptools<50 in our image venvs"
fungiwhat's the next step toward deleting nb04? just taking it out of system-config? are there any remaining blockers to that? has that change been proposed already?18:40
clarkbfungi: we need to ensure none of its images are still alive in clouds18:40
fungiahh, right, particularly bfv clouds like vexxhost i guess18:40
fungii'll take a look shortly18:41
clarkbthere are two remaining opensuse-tumbleweed-0000240092 and opensuse-tumbleweed-000024009318:41
clarkbthose are the only two tumbleweed images we have18:41
fungistale ready nodes maybe18:42
clarkbwe probably aren't building new tumbleweed images otherwise nb01 and nb02 would have at least one18:42
fungi#status log provider maintenance 2020-09-30 01:00-05:00 utc involving ~5-minute outages for databases used by cacti, refstack, translate, translate-dev, wiki, wiki-dev18:44
openstackstatusfungi: finished logging18:44
clarkbya our tumbleweed image builds are failing18:46
clarkbconflict between grep and busybox-grep18:46
clarkbI think we can add busybox-grep to the deinstalls list to fix it18:46
fungi#status log deleted old 2017-01-04 snapshot of in rax-dfw18:48
openstackstatusfungi: finished logging18:48
fungi#status log cinder volume for wiki.o.o has been replaced and cleaned up18:50
openstackstatusfungi: finished logging18:50
fungiso that only leaves the nb04 cinder volume which would be impacted by next month's maintenance18:51
fungiand rackspace seems to have cleaned up all our old error_deleting volumes too18:51
fungionce nb04 is fully gone, i'll update the open ticket for the cinder maintenance and let them know we've replaced/deleted all the volumes they mentioned18:54
openstackgerritClark Boylan proposed openstack/diskimage-builder master: Install grep before busybox on suse distros
clarkbfungi: ^ I think that is the fix, we already do similar for xz in dib18:58
clarkb(why xz doesn't just supersede busybox-xz and grep supersede busybox-grep I don't know)18:59
fungiinteresting. yeah, debian doesn't even allow that. packages have to declare replaces or breaks if they have conflicting files, otherwise they don't make it into the distro19:01
clarkbzypper gives you the option of breaking rsync by keeping busybox-grep, replacing busybox-grep with grep or doing nothing19:02
clarkbso it has some of the info but doesn't just default to the sane thing19:02
openstackgerritClark Boylan proposed opendev/base-jobs master: Test handling unbound restart on fedora 31
clarkblinter didn't like that I used systemctl in command instead of the service module19:18
* fungi sighs19:18
clarkbit would be fine if the service module worked :)19:19
donnydSo the transition to ceph and nvme for object storage at OE is complete and I think we would probably be ok to put it back in the rotation for logs19:34
donnydNot sure what needs to be tested before putting it back into prod19:36
clarkbdonnyd: nice. We'll want to update the secret at as well as update the list of clouds used at
clarkbthen when we're happy with base-test behavior we can make the same playbook change to the base/post-logs.yaml change19:38
clarkbdonnyd: should we go ahead and add that when we get time or do you have more to do?19:38
clarkb(figure infra-root should do it since we have to encrypt the secret)19:38
donnydI think we are good to go... but I will find out for sure when the workload comes19:39
donnydI have tested it and seems to work as much as one person can test19:39
*** Topner has quit IRC19:39
fungiclarkb: i expect we can just revert the removal unless we have reason to believe the credentials changed?19:42
fungiwell, revert but apply it to base-test i mean19:42
fungibut no need to reencrypt19:42
clarkbfungi: well the cloud name changed at least. Did we also change credentials or did they stay the same?19:42
fungioh, if it was pre-oe then yeah the creds are likely entirely different19:43
donnydyea, the creds will likely need to be redone19:43
clarkbyes the current secret is labeled cloud_fn_one19:44
fungigot it19:44
clarkbI can work on a change in a bit19:44
openstackgerritPierre Riteau proposed ttygroup/boartty master: Update author and home page to match gertty
openstackgerritPierre Riteau proposed ttygroup/gertty master: Update author email address
*** slaweq_ has joined #opendev19:53
openstackgerritClark Boylan proposed opendev/base-jobs master: Use OpenEdge swift to host job logs
clarkbdonnyd: infra-root ^ fyi20:01
openstackgerritMerged opendev/storyboard master: Optimise the Story browsing query
donnydIf i wanted to help write some poorly written ansible to help the infra teams app deployments, where could I start? Is the deployment code in each app, or in a central repo?20:37
clarkbdonnyd: most of our config management is in
clarkbdonnyd: that contains our inventory and groups definitions as well as most of the playbooks and roles we use20:38
corvusdonnyd: we aim for well tested so poorly written won't bother us :)20:39
clarkbdonnyd: we're using more and more docker containers as well (driven by ansible and docker-compose) and the Dockerfiles for those tend to be in the application repos (like zuul or nodepool) unless we need to do a forked docker image for some reason20:39
donnydcorvus: we will see about that.. I write some pretty bad stuff20:39
clarkbfungi: good idea on the testing of OE swifts thing20:40
clarkblet me do a new pas20:40
donnydSo the dockerfile and compose will be in the app repo and system-config is the tooling to deploy it20:40
openstackgerritClark Boylan proposed opendev/base-jobs master: Use OpenEdge swift to host job logs
fungidonnyd: one which is halfway there is our storyboard deployment... we're publishing docker images to dockerhub for the various storyboard services but not using them yet, we're still deploying storyboard with the storyboard-puppet module at the moment20:41
clarkbdonnyd: the dockerfile will be in the app repo but then the docker-compose and ansible to deploy it is in system-config20:41
donnydso where do the containers get deployed? not that it matters.. just curious20:41
fungidonnyd: the service roles20:42
clarkbsystem-config/playbooks/roles/gitea may be a good example20:42
clarkbthough we have the dockerfile for gitea in system-config/docker/gitea because we've forked it to add our own main page and branding stuff20:42
fungier, service playbooks20:42
fungiwhich then use service-specific roles20:43
donnydI was just looking at that one clarkb20:43
donnydI was reading the thread on storyboard and it made me think that maybe I could actually make a useful contribution20:45
donnydthe etherpad one also looks like a decent example20:47
diablo_rojo_phonYou definitely could and we'd love to have whatever help you'd like to offer :)20:47
clarkbdiablo_rojo_phon: ya etherpad and gitea should be pretty similar to how we'd do storyboard except we'd put the Dockerfile in storyboard itself I bet20:47
clarkber donnyd ^20:47
clarkbtab complete failed me20:47
fungistoryboard or anything else you want to help with, help is most welcome20:49
fungiit's also worth noting that switching from puppet to ansible (+docker where relevant) is a blocker for us updating our deployment platforms too. the version of puppet we're stuck on works on xenial but not bionic, so to upgrade past xenial we need to replace the old puppet orchestration and config management20:51
donnydhow many things still need to be migrated? lots??20:51
clarkbits a fair bit, though I think many of them should be of the more direct variety now20:53
clarkbthe more difficult ones like gerrit and zuul have been done (though next up is working out the gerrit upgrade which I'm slowly making progress on)20:53
fungiyeah, the stuff which remains doesn't really have interdependencies20:55
fungiso a lot more manageable as a task on its own20:55
fungii think ianw has graphite in progress already, but i'm not aware of any others which are in progress21:08
openstackgerritMerged opendev/base-jobs master: Use OpenEdge swift to host job logs
*** diablo_rojo has joined #opendev21:15
clarkbI've rechecked which should test ^21:16
clarkbthe commit message is no longer accurate but its still using base-test21:18
clarkbdonnyd: it seems to work21:25
clarkbdonnyd: its a little weird to see https on port 8080 but nothing actually wrong with that. Do you want to check anything before I propose a change to add it into the production rotation?21:25
donnydYea there are containers populating in the project21:25
donnydyea, it is probably a bit strange21:26
donnydI usually proxy to the 13XXX range21:26
donnydbut eh.. .it works21:26
donnydI think we are good to hook21:26
clarkbya looks functional to me /me makes another change21:26
donnydI am hopeful that the nvme object storage will work well this time around21:27
donnydwe will see when its time for logs to expire hit21:27
openstackgerritClark Boylan proposed opendev/base-jobs master: Use openedge swift for logs on all jobs
donnydso long as everyone can reach them, we should be good to hook21:28
clarkbI'm ipv4 only at home. Maybe fungi wants to hit it from ipv6 before +2'ing ^21:29
clarkbor if there is no ipv6 then thats fine too :) I didn't check dns21:29
donnydhrm, there should be21:29
donnydI do have a record21:30
clarkbya there is a AAAA record so having someone like fungi confirm ipv6 access works too would be good. Otherwise I think we can land it21:31
donnydlooks like its open outside of my network best I can test.. that whole being local thing has bitten me before though... so probably best to wait for fungi21:35
*** slaweq_ has quit IRC21:38
*** slaweq_ has joined #opendev21:40
fungiyup, sorry, food distractions here. what url am i testing ipv6 connectivity to?21:50
clarkbfungi: that was generated by rechecking which used the base-test update to have OE swift hosted logs21:51
clarkbif that looks good to you (it does to me via ipv4) then should be safe to land21:52
fungiyeah, i have no trouble accessing that over ipv621:52
*** slaweq_ has quit IRC21:53
*** slaweq has joined #opendev21:54
clarkbunrelated: I'm about to send out the meeting agenda, Get your items in now :)21:54
openstackgerritMerged opendev/base-jobs master: Use openedge swift for logs on all jobs
*** slaweq has quit IRC22:05
*** slaweq has joined #opendev22:09
*** slaweq has quit IRC22:21
ianware we good with the zuul-web proxy bits?22:27
clarkbianw: ya I think its working fine22:27
clarkbor at least it hasn't regressed. I haven't tried to characterize the cache hit rate or anything like that22:27
ianw[2020-09-14 22:29:15.103] "GET /api/tenant/openstack/status/change/706153,10 HTTP/1.1" 200 2951 "cache miss: attempting entity save" "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"22:29
ianw[2020-09-14 22:29:14.308] "GET /api/status HTTP/1.1" 200 94041 "-" "" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"22:30
ianwi feel like the first we didn't expect cached, and the second we did22:31
clarkbI think we expected both? it will cache the urls we specify and any below22:31
clarkbso maybe having the changes below that is unexpected but not necessarily wrong aiui22:31
clarkband ya I would've expected the second to be cached or missed22:31
ianwi think i've copied it wrong in the config22:32
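The two log lines pasted above can be classified mechanically. What follows is a minimal Python sketch, not the deployed tooling: the regular expression assumes the field order seen in those two pasted lines (timestamp, request, status, size, quoted cache status, referer, user agent), `classify` is a hypothetical helper name, and the user agents are shortened. A cache status of "-" is treated as "not considered for caching", following fungi's reading earlier in the log.

```python
import re

# Assumed field order, taken from the two lines pasted in the channel:
# [timestamp] "request" status size "cache status" "referer" "user agent"
LOG_RE = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<cache>[^"]*)" '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)


def classify(line):
    """Return 'hit', 'miss', 'not considered', 'other' or 'unparsed'."""
    match = LOG_RE.search(line)
    if match is None:
        return "unparsed"
    cache = match.group("cache")
    if cache.startswith("cache hit"):
        return "hit"
    if cache.startswith("cache miss"):
        return "miss"
    if cache == "-":
        # mod_cache left no status, so the request was never considered.
        return "not considered"
    return "other"


# The two requests from the log, with the user agent strings shortened.
lines = [
    '[2020-09-14 22:29:15.103] "GET /api/tenant/openstack/status/change/706153,10 HTTP/1.1" '
    '200 2951 "cache miss: attempting entity save" "" "Mozilla/5.0"',
    '[2020-09-14 22:29:14.308] "GET /api/status HTTP/1.1" 200 94041 "-" "" "Mozilla/5.0"',
]
for line in lines:
    print(classify(line))
```

Run over a whole access log, a count of "not considered" per request path would show which locations are missing from the cache configuration.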
*** tosky has quit IRC22:33
clarkbianw: two other things came up that may interest you. The first is systemd is broken for ansible on f31. attempts to work around that and has links to bugs. The other is our tumbleweed image was built on nb04 and we haven't had a successful nb01 or 02 build which is preventing us from deleting nb04. will fix that I think22:34
openstackgerritIan Wienand proposed opendev/system-config master: zuul-web: fix location match
ianwyeah i saw that on f31.  given f33 comes in october, maybe we just get rid of it; we have f32 now22:35
ianwalthough i think there was something to do with swap images i saw come by for that ...22:36
clarkbthe swap change in ozj merged iirc22:36
clarkbbasically use dd instead of fallocate because ext4 in new kernels breaks swapon on fallocate22:36
ianwyeah, that one22:37
clarkbwe changed it universally too since the issue is expected to hit everywhere soon enough22:37
ianwi'll go through the dib queue and i guess we want a release22:41
clarkbya its also possible there is a better way to handle that in the zypper context22:43
clarkbbut we already do that workaround for xz and busybox-xz so went with it22:43
ianwfungi / clarkb: maybe one more eye on for copying keys to apt dir would be nice22:43
clarkbI can review that one in a few. Getting the infra meeting agenda out now22:44
clarkbianw: prometheanfire fungi I thought that doesn't work for xenial and I forget the corresponding debian release22:47
clarkbI guess the issue is that it never worked in the first place? so if we fix it we can just fix it for newer distros?22:47
fungithat worked for using binary pgp keyring files, just not ascii-armored keys22:49
clarkbso this will work for older releases too with the proper input data22:49
ianwyeah, that's what i thought.  we could possibly expand the release note to be clearer on that i guess if you want22:50
clarkbI think chances are slim that anyone was using this since prometheanfire found it didn't work at all due to gpg being missing22:50
clarkbso should be an improvement going forward. Probably fine as is22:51
*** tkajinam has joined #opendev22:52
clarkbianw: hrm I've just noticed that failed on the suse fix22:52
clarkbI wonder if we should figure that out too22:53
ianwe2fsprogs-1.45.6-1.19.x86_64 requires info, but this requirement cannot be provided22:53
clarkbits the same basic problem with busybox-gzip I think22:54
clarkbI wonder if we can do an install without busybox22:54
clarkbsince it seems to be problematic here22:55
clarkbbusybox, busybox-gzip, and busybox-static are the 3 busybox things we install in that log22:55
clarkbmaybe if we do gzip instead of busybox-gzip (like with xz and grep) that will be sufficient22:56
johnsomFYI we seem to be seeing that strange CDN/cache issue again. The recently released oslo.log 4.4.0 is returning not found on some jobs.22:56
prometheanfireah, cool, +2+W22:56
clarkbpatterns-openSUSE-base is likely what pulls in the busybox stuff22:56
ianwjohnsom: hrm, do you have a link?  is ymq involved again?23:00
openstackgerritClark Boylan proposed openstack/diskimage-builder master: Install gzip instead of busybox-gzip on suse
johnsomSome jobs pass, some aren't23:01
clarkbianw: ^ that should test it at least23:01
clarkbianw: looks like ovh gra123:02
ianwjohnsom: ovh-gra123:02
johnsomovh-bhs1 are passing and finding it fine23:03
ianw will be the thing to look at23:06
clarkbjohnsom: they are on different continents :)23:07
ianw< x-served-by: cache-bwi5141-BWI, cache-cdg20739-CDG23:07
ianw< x-cache: HIT, HIT23:07
ianw< x-cache-hits: 2, 123:07
ianwand that seems to show it23:07
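The three headers pasted above are easier to read side by side. This is a small Python sketch under one assumption: that the comma-separated values in `x-served-by`, `x-cache` and `x-cache-hits` line up position by position, one entry per cache node. `parse_fastly_headers` is a hypothetical helper name, and the input is exactly the curl-style output ianw pasted.

```python
def parse_fastly_headers(raw_lines):
    """Pair each cache node with its HIT/MISS state and hit count."""
    headers = {}
    for raw in raw_lines:
        # Drop curl's "< " prefix, then split the header name from its value.
        name, _, value = raw.lstrip("< ").partition(":")
        headers[name.strip().lower()] = [v.strip() for v in value.split(",")]
    nodes = headers.get("x-served-by", [])
    states = headers.get("x-cache", [])
    hits = [int(h) for h in headers.get("x-cache-hits", [])]
    # Assumption: entries in the three headers correspond by position.
    return list(zip(nodes, states, hits))


pasted = [
    "< x-served-by: cache-bwi5141-BWI, cache-cdg20739-CDG",
    "< x-cache: HIT, HIT",
    "< x-cache-hits: 2, 1",
]
for node, state, hit_count in parse_fastly_headers(pasted):
    print(node, state, hit_count)
```

Both nodes report a HIT here, which is consistent with the stale index being served from cache rather than fetched fresh.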
clarkb it is there now on the gra1 proxy23:07
johnsomI guess a number of the oslo libs are triggering it. I have only seen oslo.log but others are reporting other modules23:08
clarkbunfortunately we end up serving whatever pypi gives us and that has a short TTL23:08
clarkboften by the time we notice things have rolled over and are happy23:08
johnsominap-mtl01 is also good23:09
clarkb seems to be where they want feedback and issues23:10
clarkbI wonder how terrible it would be to file an issue with a captured index file23:11
ianwif it had x- headers that would probably be good23:12
clarkbya, now the trouble is catching one :/23:12
ianwi got pretty far with it last time, but fastly had status issue up for slow purging or something, so it was put down to that23:12
clarkbI think the key thing from the job logs is that it sees 4.3.0 as a valid version which implies this isn't a python version thing since 4.4.0 and 4.3.0 both have the same python version requirements in the source index html. It also implies we got an index.html and not an empty response23:14
clarkbthe source html also has a serial on it23:14
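The reasoning above (an index that lists 4.3.0 but not 4.4.0, plus a serial in the page source) suggests what a captured index would need to show. The Python sketch below is illustrative only: both HTML samples and their serial numbers are made up, `summarize_index` is a hypothetical helper, and the trailing `<!--SERIAL n-->` comment is assumed to be where the serial lives.

```python
import re
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect link texts (file names) and the trailing SERIAL comment."""

    def __init__(self):
        super().__init__()
        self.filenames = []
        self.serial = None
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self.filenames.append(data.strip())

    def handle_comment(self, data):
        match = re.match(r"\s*SERIAL\s+(\d+)", data)
        if match:
            self.serial = int(match.group(1))


def summarize_index(html, project):
    """Return (sorted versions, serial) for one captured index page."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    pattern = re.compile(re.escape(project) + r"-(\d+(?:\.\d+)*)")
    versions = set()
    for name in parser.filenames:
        match = pattern.match(name)
        if match:
            versions.add(match.group(1))
    ordered = sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
    return ordered, parser.serial


# Made-up captures: a stale page and a fresh one (serials are hypothetical).
STALE_INDEX = """<!DOCTYPE html>
<html><body>
<a href="oslo.log-4.3.0.tar.gz">oslo.log-4.3.0.tar.gz</a><br/>
</body></html>
<!--SERIAL 100-->"""

FRESH_INDEX = """<!DOCTYPE html>
<html><body>
<a href="oslo.log-4.3.0.tar.gz">oslo.log-4.3.0.tar.gz</a><br/>
<a href="oslo.log-4.4.0.tar.gz">oslo.log-4.4.0.tar.gz</a><br/>
</body></html>
<!--SERIAL 101-->"""

print(summarize_index(STALE_INDEX, "oslo.log"))
print(summarize_index(FRESH_INDEX, "oslo.log"))
```

A lower serial alongside the missing version would point at a stale cached page rather than a python-version filter in pip.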
clarkbianw: perhaps was the incident this time around23:16
clarkbor maybe they never properly purged back then and we hit the bad servers23:16
clarkbfwiw I think it is likely that fastly is at fault23:16
*** mlavalle has quit IRC23:20
ianwfungi: you could double check the /api/status cache match for in and i can monitor it when it deploys23:20
clarkbianw: your IRC logs are for about when oslo.log 4.4.0 was released too23:20
clarkbianw: I suppose it could just be fallout from that original incident and fastly/pypi never did a proper rsync23:20
ianw... ohhh, i had just assumed oslo.log 4.4.0 had released like an hour ago :)23:21
ianwthat perhaps makes it more interesting to pypa/fastly ...23:21
fungiianw: lgtm, i guess we got rid of all the conditional matches on whether different cache modules were loaded?23:22
clarkbfungi: ya that was the prior change
fungithanks, today has been a bit hectic23:23
fungii saw the title of that change but hadn't taken time to look through it23:23
ianwyeah, it took me quite a while to realise that the cache_mem module was no more ...23:23
fungiaha! what we were seeing makes MUCH more sense now23:23
fungithanks for figuring that out23:24
ianwi think the <ifdef> stuff is a bit of an anti-pattern; it's better for apache to just stop if you don't have the modules you want i think23:24
fungiit made some sense back when this was in presumed portable puppet modules, but no longer23:25
ianwi did also read that mod_rewrite with [p] is not considered as good as proxypass23:27
fungidid we change how puppet apply's stdout gets logged? is it no longer going to syslog?23:28
clarkbianw: ya my related changes converts it all to proxypass23:28
fungii'm trying and failing to work out why we've stopped deploying new storyboard commits on storyboard.o.o23:28
clarkbianw: but I think I may rewrite that one to not do the extra zuul-web servers if we don't need them with just the caching and the zuul-web bugfix23:28
fungiwe landed a new storyboard commit at 20:16 utc and it seems to have gotten checked out on the server at /opt/storyboard but not pip installed, according to `pbr freeze` there we're several commits behind23:29
fungibut i can't figure out where puppet's attempt to log that would be. we used to get puppet-user entries logged in /var/log/syslog23:30
clarkbfungi: I think it ends up in the ansible logs now23:31
clarkbon bridge23:31
clarkbI don't remember why it changed though23:31
fungii didn't think the puppet output ended up there, i guess that's the behavior change23:31
clarkbI think it was so that we could have the logs show up in zuul23:31
fungii grepped /var/log/ansible/remote_puppet_else.yaml.log for "pip" but didn't find anything23:31
fungithat log is huge23:31
clarkbwe switched from dumping into syslog to stdout23:32
clarkband ansible grabs the stdout23:32
fungii guess i need to figure out what the name of the task would have been to run puppet on there23:32
ianwthere was some wip to split out the puppet jobs from one big puppet_else into more separate things i think?23:32
clarkbfungi: it may be in an older log file too depending on when the triggering change landed23:32
clarkbianw: that hasn't been done for storyboard yet I don't think23:33
fungioh! we rotate these very aggressively23:33
clarkbfwiw I could go back to using syslog since we aren't really using zuul for those logs23:36
fungithere's a massive gap between the logfiles too23:37
fungithe newest rotated logfile ends at 14:16:14 but the first activity in the current log is 22:49:1123:38
fungiwhere did the other 8.5 hours go?23:38
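The size of that gap can be checked from the two timestamps fungi quotes. A quick Python sketch, assuming both entries fall on the same day:

```python
from datetime import datetime

FMT = "%H:%M:%S"
# Timestamps quoted in the channel; same-day is an assumption.
last_rotated_entry = datetime.strptime("14:16:14", FMT)
first_current_entry = datetime.strptime("22:49:11", FMT)

gap = first_current_entry - last_rotated_entry
gap_hours = gap.total_seconds() / 3600
print(gap, round(gap_hours, 2))
```

That works out to 8 hours, 32 minutes and 57 seconds, matching the "8.5 hours" estimate.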
fungiunfortunately our princess was in that castle, mario23:39
clarkbfungi: its rotated by the job when they complete23:39
clarkbI wonder if we're timing out and breaking that setup23:39
fungior maybe this was not the periodic job23:39
fungii'll start from the zuul end and work backwards23:39
clarkbjob runs and logs to service-foo.yaml.log. At the end of the job we copy that to service-foo.yaml.log.timestamp. Next job runs and does it again23:40
fungiin this case there's no separate service log because it's handled by the puppet catch-all23:41
clarkbbut its the same system (its per playbook)23:42
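The rotation scheme clarkb describes, and how a timeout could leave a hole in it, can be sketched in Python. This is a toy model under loud assumptions: each run overwrites the live log, the copy to `<name>.<timestamp>` only happens when the job completes, and `run_job`, the timestamp format and the dates are all hypothetical rather than taken from the real playbooks.

```python
import shutil
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def run_job(log_path, message, finished_at, completed=True):
    """Write the live log, then copy it aside only if the job completed."""
    # Assumption: every run truncates the live log before writing.
    log_path.write_text(message + "\n")
    if not completed:
        # A timed-out job never reaches the rotation step.
        return None
    rotated = log_path.with_name(f"{log_path.name}.{finished_at:%Y-%m-%dT%H%M%S}")
    shutil.copy2(log_path, rotated)
    return rotated


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "service-foo.yaml.log"
    start = datetime(2020, 9, 14, 14, 0, 0)  # hypothetical run times
    run_job(log, "job 1 output", start)
    run_job(log, "job 2 output", start + timedelta(hours=1), completed=False)
    run_job(log, "job 3 output", start + timedelta(hours=2))
    rotated_files = sorted(p.name for p in Path(tmp).glob("service-foo.yaml.log.*"))
    rotated_text = "".join((Path(tmp) / name).read_text() for name in rotated_files)

print(rotated_files)
print("job 2 output" in rotated_text)
```

In this model the timed-out run leaves no rotated copy and its output is overwritten by the next run, which would look like the kind of gap fungi noticed between log files.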
fungiyeah, the hourly runs for puppet are timing out23:43
fungii'll have to look at this with fresh eyes tomorrow, i'm quickly getting fuzzy here23:44
ianwi think that's the puppet holder-upper-er23:46
ianwit's dead, jim23:47
clarkbI wonder if we should proactively reboot those23:48
clarkbreboot first one, wait for status to settle, reboot second, wait for settling, etc23:48
ianwhave we done them all yet?  something has clearly happened to them :/23:48
ianw#status log rebooted, which was hung23:48
openstackstatusianw: finished logging23:48
clarkbI've done I think two of them23:49
*** DSpider has quit IRC23:52
ianwi've killed all the stuck processes23:52
clarkbwe might also consider splitting them out of puppet else23:52
clarkbthen if they fail the impact is lessened23:52
*** Goneri has quit IRC23:53
ianwyeah, i think we should probably continue that work to split up all of puppet-else23:56
clarkbthe mechanics of it are pretty straightforward iirc. We create a new .pp file for that service/hosts. We then add a job to run the puppet for that manifest and basically run it when else runs23:57
clarkbthere are a couple examples we can look at to compare23:57
clarkbthis would make a good meeting agenda item. I'll ninja add it tomorrow23:57
openstackgerritMerged opendev/system-config master: zuul-web: fix location match
