openstackgerrit | Ian Wienand proposed opendev/zone-opendev.org master: Fix RAX ORD internal mirror typo https://review.opendev.org/759974 | 00:01 |
openstackgerrit | Merged opendev/zone-opendev.org master: Fix RAX ORD internal mirror typo https://review.opendev.org/759974 | 00:14 |
*** hamalq has quit IRC | 00:26 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepo: enable cron jobs https://review.opendev.org/759965 | 00:29 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepro: deploy Ubuntu keys https://review.opendev.org/759975 | 00:29 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove mirror-update server and related puppet https://review.opendev.org/759976 | 00:43 |
ianw | $ host mirror-int.ord.rax.opendev.org | 00:43 |
ianw | mirror-int.ord.rax.opendev.org is an alias for mirror01-int.ord.rax.opendev.org. | 00:43 |
ianw | mirror01-int.ord.rax.opendev.org has address 10.209.128.57 | 00:43 |
fungi | lgtm now | 00:54 |
*** Green_Bird has joined #opendev | 00:57 | |
*** Green_Bird has quit IRC | 00:57 | |
*** Goneri has quit IRC | 01:03 | |
openstackgerrit | Merged opendev/system-config master: Generate internal certs for RAX ORD mirror https://review.opendev.org/759971 | 01:09 |
*** kindwindfall has quit IRC | 01:40 | |
openstackgerrit | Merged opendev/system-config master: reprepro: deploy Ubuntu keys https://review.opendev.org/759975 | 01:40 |
ianw | [Wed Oct 28 01:44:39 UTC 2020] Verifying: mirror01-int.ord.rax.opendev.org | 02:01 |
ianw | [Wed Oct 28 01:44:43 UTC 2020] mirror01-int.ord.rax.opendev.org:Verify error:During secondary validation: Incorrect TXT record | 02:01 |
ianw | i'm not sure what the deal is with that :/ | 02:01 |
ianw | looks like LE does multiple lookups and one of them failed, which is weird | 02:06 |
ianw | acme.opendev.org is showing the right txt records | 02:06 |
clarkb | did it race? | 02:10 |
clarkb | or maybe we have the old txt there too for the existing name and it doesn't see that as valid currently? | 02:10 |
ianw | [Wed Oct 28 01:44:50 UTC 2020] review-dev.opendev.org:Verify error:During secondary validation: Incorrect TXT record | 02:11 |
ianw | so same thing happened for review-dev | 02:11 |
ianw | "cmd": "rndc reload acme.opendev.org", | 02:13 |
ianw | it says "zone reload queued" | 02:13 |
ianw | but i've never known it to take that long | 02:14 |
clarkb | perhaps the lag was in syncing to the ns servers? | 02:15 |
ianw | Oct 28 01:44:34 adns1 named[15038]: client @0x7fdfd030d300 104.239.140.165#56334/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR started: TSIG tsig (serial 1603849472) | 02:16 |
ianw | Oct 28 01:44:34 adns1 named[15038]: client @0x7fdfd030d300 104.239.140.165#56334/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR ended | 02:16 |
ianw | Oct 28 01:52:19 adns1 named[15038]: client @0x7fdfd04e3790 162.253.55.16#35254/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR started: TSIG tsig (serial 1603849472) | 02:16 |
ianw | Oct 28 01:52:19 adns1 named[15038]: client @0x7fdfd04e3790 162.253.55.16#35254/key tsig (acme.opendev.org): transfer of 'acme.opendev.org/IN': AXFR ended | 02:16 |
ianw | i think that says that ns2 got the update some 10 minutes later | 02:17 |
ianw | maybe we have to manually run a transfer on ns1/ns2 to be sure? i haven't seen this before | 02:19 |
clarkb | Ya I'm not sure why it would stagger them like that. It could be a load balancing thing done for larger zones? | 02:19 |
ianw | Oct 28 01:44:34 ns2 nsd[18851]: notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches. | 02:20 |
ianw | Oct 28 01:44:34 ns2 nsd[2112]: [2020-10-28 01:44:34.796] nsd[18851]: info: notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches. | 02:20 |
ianw | i wonder if we don't have ipv6 configured | 02:21 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: nameserver: Allow master server to notify via ipv6 https://review.opendev.org/759988 | 02:34 |
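The refusal logged above means nsd on the secondaries only accepts NOTIFY from source addresses in its ACL, and the change just proposed presumably adds the master's IPv6 address alongside the existing IPv4 entry. A minimal sketch of what such an nsd.conf stanza could look like (the IPv6 address is the one from the log; the IPv4 placeholder and NOKEY setting are illustrative, and the real config is templated by ansible and may use a TSIG key):

    zone:
        name: "acme.opendev.org"
        # existing entry for the hidden master's IPv4 address
        allow-notify: <adns1 IPv4 address> NOKEY
        # new entry so NOTIFY arriving over IPv6 is no longer refused
        allow-notify: 2001:4800:7819:104:be76:4eff:fe04:43d0 NOKEY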
ianw | i think maybe we've just been lucky with timing? unless something in particular changed | 02:35 |
clarkb | does the ipv6 log happen earlier, when we expect it? | 02:37 |
ianw | yeah, the timestamps line up | 02:37 |
ianw | we issued the reload at 01:44:50 | 02:37 |
clarkb | change seems fine fwiw | 02:38 |
ianw | Oct 28 01:44:34 adns1 named[15038]: received control channel command 'reload acme.opendev.org' | 02:40 |
ianw | Oct 28 01:44:34 ns2 nsd[18851]: notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches | 02:40 |
ianw | so yeah, same second | 02:40 |
ianw | but [Wed Oct 28 01:44:50 UTC 2020] review-dev.opendev.org:Verify error:During secondary validation: Incorrect TXT record | 02:40 |
ianw | perhaps we should put like a 1 minute sleep in there at least | 02:41 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: letsencrypt-install-txt-record: pause after adding TXT records https://review.opendev.org/759991 | 02:45 |
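Rather than a fixed sleep, another option would be to poll the public nameservers until they serve the new serial before letting the validation continue; a rough shell sketch, assuming the hidden master answers as adns1.opendev.org and the secondaries as ns1/ns2.opendev.org:

    SERIAL=$(dig +short SOA acme.opendev.org @adns1.opendev.org | awk '{print $3}')
    for ns in ns1.opendev.org ns2.opendev.org; do
        # wait until this secondary has transferred the zone containing the new TXT records
        until [ "$(dig +short SOA acme.opendev.org @$ns | awk '{print $3}')" = "$SERIAL" ]; do
            sleep 5
        done
    done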
openstackgerrit | Ian Wienand proposed opendev/system-config master: nameserver: Allow master server to notify via ipv6 https://review.opendev.org/759988 | 03:10 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: letsencrypt-install-txt-record: pause after adding TXT records https://review.opendev.org/759991 | 03:10 |
*** tkajinam has quit IRC | 03:10 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: mirror: remove old ceph links https://review.opendev.org/760040 | 03:19 |
*** lamt has quit IRC | 03:22 | |
*** rchurch has quit IRC | 03:33 | |
*** rchurch has joined #opendev | 03:34 | |
openstackgerrit | Merged opendev/system-config master: ARM64 : run base test on Focal too https://review.opendev.org/756629 | 03:57 |
*** sean-k-mooney has quit IRC | 04:16 | |
*** sean-k-mooney has joined #opendev | 04:16 | |
*** ykarel has joined #opendev | 04:24 | |
*** fressi has joined #opendev | 04:33 | |
*** fressi has quit IRC | 04:43 | |
fungi | okay, awake again... are we okay for a quick gerrit restart now before the call starts? | 04:51 |
clarkb | I guess so? I'm around | 04:54 |
fungi | okay, restarting | 04:55 |
fungi | downed | 04:55 |
fungi | up -d'd | 04:55 |
clarkb | fungi: when gerrit is done you may want to catch up on the conversation we had around zone transfers and allowed ip addrs | 04:56 |
clarkb | tldr we think LE may be failing because it's trying to use ipv6 and it falls back to ipv4 too slowly for LE verification to work | 04:56 |
ianw | you'd think we would have noticed by now, but yeah, maybe something changed? | 04:57 |
clarkb | I know on some clouds the RAs happen late enough that you don't get them immediately on boot | 04:58 |
fungi | gerrit seems to be up again | 04:58 |
clarkb | maybe unattended upgrades caused the server to restart and notice it has an ipv6 address it can speak from now? | 04:58 |
ianw | yeah, it hasn't rebooted in ... a long time | 04:59 |
fungi | "notify for acme.opendev.org. from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches" suggests that our secondary nameservers are not configured to trust zone update notifications from our primary | 04:59 |
fungi | or at least not from its v6 addy | 05:00 |
clarkb | fungi: yup ianw has a change up to fix that | 05:00 |
fungi | ahh, okay | 05:00 |
fungi | now i see it | 05:00 |
fungi | also i guess call is starting | 05:00 |
clarkb | yup I'm there :) | 05:00 |
ianw | it seems everything is having fun with the new cryptography release | 05:00 |
clarkb | did they break things? | 05:01 |
ianw | http://paste.openstack.org/show/gYsegN1rbLIlCYAnR7Dj/ | 05:15 |
ianw | clarkb/fungi: http://paste.openstack.org/show/IpLdtZECcqsncXyThBGa/ | 05:18 |
fungi | #status log restarted gerrit at 04:55 to pick up its-storyboard plugin config update | 05:25 |
openstackstatus | fungi: finished logging | 05:25 |
fungi | openstackgerrit seems to have gone silent | 05:44 |
fungi | i wonder if the gerrit restart has confuddled it. will restart | 05:44 |
fungi | 2020-10-28 04:55:41,410 ERROR gerrit.GerritWatcher: Exception con | 05:45 |
fungi | suming ssh event stream: | 05:46 |
fungi | yeah, restarting it now | 05:46 |
*** openstackgerrit has quit IRC | 05:46 | |
fungi | #status log restarted gerritbot which got confused reading the event stream while gerrit was restarted | 05:47 |
openstackstatus | fungi: finished logging | 05:47 |
fungi | https://review.opendev.org/760051 Document dual account split for Gerrit admins | 05:49 |
*** openstackgerrit has joined #opendev | 05:55 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: mirror-update/reprepro : use common functions https://review.opendev.org/758695 | 05:55 |
openstackgerrit | Merged opendev/system-config master: mirror: remove old ceph links https://review.opendev.org/760040 | 06:08 |
*** ysandeep|away is now known as ysandeep|ruck | 06:08 | |
*** DSpider has joined #opendev | 06:34 | |
fungi | before i go to bed, just briefly marvelling at how polite robots are to one another, if only people were as cordial: https://github.com/openstack/openstack-helm-images/pull/1 | 06:34 |
fungi | as robograndpa always said, "you'll catch more rustflies with lubricant than cleaning fluid" | 06:44 |
* fungi realizing he's probably mildly delirious wanders off to find sleep | 06:46 | |
*** tkajinam has joined #opendev | 06:53 | |
*** marios has joined #opendev | 06:54 | |
*** sshnaidm|afk is now known as sshnaidm|rover | 07:00 | |
*** eolivare has joined #opendev | 07:22 | |
*** sboyron has joined #opendev | 07:30 | |
*** ykarel has quit IRC | 07:44 | |
*** ralonsoh has joined #opendev | 07:45 | |
*** ykarel has joined #opendev | 07:45 | |
*** rpittau|afk is now known as rpittau | 07:51 | |
*** slaweq has joined #opendev | 08:00 | |
*** andrewbonney has joined #opendev | 08:10 | |
*** lpetrut has joined #opendev | 08:11 | |
*** webmariner has quit IRC | 08:35 | |
*** hashar has joined #opendev | 08:42 | |
*** ykarel_ has joined #opendev | 08:49 | |
*** ykarel has quit IRC | 08:52 | |
*** tosky has joined #opendev | 08:57 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 09:00 | |
*** openstack has quit IRC | 09:21 | |
*** openstack has joined #opendev | 09:22 | |
*** ChanServ sets mode: +o openstack | 09:22 | |
openstackgerrit | Jens Harbott (frickler) proposed opendev/system-config master: nameserver: Allow master server to notify via ipv6 https://review.opendev.org/759988 | 09:26 |
*** logan- has quit IRC | 09:27 | |
*** logan- has joined #opendev | 09:27 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 09:45 | |
*** ykarel_ is now known as ykarel | 09:51 | |
openstackgerrit | Lajos Katona proposed openstack/project-config master: Add publish-to-pypi template to networking-l2gw https://review.opendev.org/760096 | 09:59 |
openstackgerrit | Lajos Katona proposed openstack/project-config master: Add publish-to-pypi template to networking-l2gw https://review.opendev.org/760096 | 10:03 |
*** marios has joined #opendev | 10:14 | |
*** ysandeep|ruck is now known as ysandeep|brb | 10:31 | |
zbr | apparently gerrit seems to have problems remembering sessions, it lost my session seconds ago, even if I was already logged in and active. | 10:41 |
frickler | #status log force-merged https://review.opendev.org/759831 at the request of nova ptl in order to unblock integrated gate | 10:43 |
openstackstatus | frickler: finished logging | 10:43 |
frickler | zbr: gerrit was restarted yesterday evening, was this the first time you accessed it since then? | 10:46 |
frickler | zbr: actually not yesterday but this morning at 04:55 UTC | 10:46 |
zbr | like seconds before, anyway let's hope we succeed with the upgrade soon. | 10:47 |
*** slaweq has quit IRC | 10:51 | |
*** ysandeep|brb is now known as ysandeep|ruck | 10:54 | |
*** slaweq has joined #opendev | 10:55 | |
*** Green_Bird has joined #opendev | 11:13 | |
*** lpetrut has quit IRC | 11:25 | |
*** lpetrut has joined #opendev | 11:26 | |
openstackgerrit | Merged opendev/system-config master: nameserver: Allow master server to notify via ipv6 https://review.opendev.org/759988 | 11:26 |
*** lpetrut has quit IRC | 11:26 | |
*** lpetrut has joined #opendev | 11:27 | |
*** dmellado has quit IRC | 11:38 | |
*** dmellado has joined #opendev | 11:39 | |
*** ykarel has quit IRC | 11:47 | |
*** ykarel has joined #opendev | 11:49 | |
*** Green_Bird has quit IRC | 12:03 | |
*** Green_Bird has joined #opendev | 12:04 | |
*** ykarel_ has joined #opendev | 12:16 | |
*** eolivare has quit IRC | 12:21 | |
*** hashar has quit IRC | 12:50 | |
*** hashar has joined #opendev | 12:56 | |
openstackgerrit | Merged openstack/project-config master: Use internal address for RAX ORD https://review.opendev.org/759972 | 13:10 |
fungi | zbr: if you have multiple gerrit browser tabs open and the session for one of them expires and you re-login, then the login for the other tabs is invalidated, and will appear not to be logged in the next time you do anything in them (unless you just reload them). then if you try to log one of them in it will invalidate the other login you just completed... the only workarounds i know of are to either close | 13:11 |
fungi | all but one of your gerrit tabs, or explicitly reload every other gerrit tab if you have to log one of them back in | 13:11 |
zbr | ouch... what can i say. | 13:12 |
fungi | i tend to do the latter if i have multiple gerrit tabs open | 13:12 |
zbr | this explains it, i have tons of tabs. | 13:12 |
fungi | yeah, basically auth will fight between tabs | 13:12 |
zbr | i wonder if newer versions sorted that | 13:12 |
fungi | polygerrit might, but this problem has been around for as long as i can remember | 13:13 |
fungi | it's a good idea of something to test on review-test once we run back through the upgrade on it (likely next week) | 13:15 |
openstackgerrit | Merged openstack/project-config master: Add publish-to-pypi template to networking-l2gw https://review.opendev.org/760096 | 13:16 |
*** eolivare has joined #opendev | 13:24 | |
*** sshnaidm|rover has quit IRC | 13:25 | |
*** ykarel_ has quit IRC | 13:31 | |
frickler | infra-root: publish-irc-meetings in opendev-prod-hourly has been queued for 87 hrs according to the status page, that doesn't look right to me | 13:39 |
*** sshnaidm|rover has joined #opendev | 13:39 | |
*** sshnaidm|rover is now known as sshnaidm|mtg | 13:44 | |
*** mlavalle has joined #opendev | 14:07 | |
*** d34dh0r53 has quit IRC | 14:07 | |
*** d34dh0r53 has joined #opendev | 14:11 | |
*** Goneri has joined #opendev | 14:48 | |
*** Goneri has quit IRC | 14:52 | |
frickler | infra-root: puppet has failed in the deploy pipeline for https://review.opendev.org/759972 with some issues I can't identify | 14:53 |
openstackgerrit | Aleksey Zvyagintsev proposed openstack/diskimage-builder master: Allow processing 'focal' ubuntu release in lvm https://review.opendev.org/760156 | 14:55 |
fungi | frickler: where do you see the deploy failure? or has it not reported yet? | 15:00 |
frickler | fungi: ah, wrong link, sorry, https://review.opendev.org/759988 is the one with the failures | 15:03 |
frickler | for the rax mirror change I was going to ask whether it needs some further action in order to get deployed | 15:04 |
fungi | which failure are you looking at for 759988? looks like both infra-prod-remote-puppet-afs and infra-prod-remote-puppet-else had problems | 15:05 |
*** ykarel is now known as ykarel|away | 15:07 | |
fungi | infra-prod-remote-puppet-else seems to have failed installing puppet on grafana01.opendev.org ("Platform not currently supported"), running puppet on ask01.openstack.org ("Error: Systemd start for jetty failed!") and running puppet on openstackid-dev01.openstack.org ("Error: Function lookup() did not find a value for the name 'openstackid_dev_message_broker_host'") | 15:10 |
*** lpetrut has quit IRC | 15:19 | |
fungi | etherpad is looking really slow for me. is anyone else having trouble? could just be that my system is overloaded by ptg browser stuff | 15:25 |
fungi | scoreboard looks fairly open, so it's probably just me | 15:28 |
*** ykarel|away has quit IRC | 15:41 | |
mrunge | hi there, is there a process to become stable core in projects? | 15:51 |
mrunge | I've been stable core in horizon quite some time ago | 15:51 |
mrunge | now most of the people from telemetry are gone, and it seems there is only one stable core for telemetry left | 15:52 |
fungi | mrunge: it's up to the projects themselves. sounds like a question you might want to (re)ask in #openstack-tc | 15:53 |
mrunge | fungi, ack | 15:53 |
fungi | mr i see zhurong is the ptl for telemetry, and cloudnull/gmann were the openstack tc liaisons for the project last cycle (i think they're in the process of identifying liaisons for the new cycle) so those might also be people to reach out to: https://governance.openstack.org/tc/reference/projects/telemetry.html | 15:54 |
fungi | er, mrunge ^ | 15:54 |
mrunge | fungi, it seems I am the PTL for telemetry in this cycle | 15:55 |
mrunge | :) | 15:55 |
fungi | mrunge: aha, i guess the change to set you hasn't merged yet | 15:56 |
mrunge | probably, yes | 15:56 |
fungi | in that case i can add you to the core review group if none of the existing core reviewers is around to do so. i'll just double-check the governance changes real fast | 15:56 |
mrunge | that would be great fungi | 15:56 |
mrunge | yes, sure | 15:56 |
gmann | mrunge: the process is to ask the stable core team to add you to the list, but I do not think that team is active anymore. We in the TC had some discussion about changing the process at the shanghai PTG but that could not proceed further. I am going to add it to the TC PTG etherpad to discuss | 15:59 |
clarkb | fungi: I believe we're running grafana on docker now so that may need an inventory cleanup? | 16:00 |
clarkb | not sure about the other two | 16:01 |
fungi | mrunge: aha, looks like the confusion is my (openstack technical election official) fault, i didn't notice https://review.opendev.org/757971 generated by our governance update script didn't reflect the telemetry election outcome recorded in https://review.opendev.org/757967 | 16:05 |
fungi | mrunge: i'll push up a change to fix that real quick | 16:05 |
*** sauloasilva1 has joined #opendev | 16:06 | |
mrunge | fungi, thank you! | 16:06 |
openstackgerrit | zbr proposed zuul/zuul-jobs master: Add test_setup_reset_connection setting https://review.opendev.org/653130 | 16:08 |
fnordahl | #os-charms discussing in-flight development and hashing out topics for tomorrows session | 16:09 |
fnordahl | #os-charms now discussing in-flight development and hashing out topics for tomorrows session | 16:10 |
*** chandankumar is now known as raukadah | 16:23 | |
AJaeger | fnordahl: are you sure you're in the right channel? | 16:27 |
*** ysandeep|ruck is now known as ysandeep|away | 16:29 | |
fnordahl | AJaeger: I'm positive that I was not, did not post #chanfail in an attempt to dodge that fact, but you caught me :) | 16:33 |
openstackgerrit | Jens Harbott (frickler) proposed openstack/project-config master: Fix the internal rax mirror name https://review.opendev.org/760183 | 16:35 |
mrunge | fungi, did we have a recent change in the ssh config for review? | 16:48 |
mrunge | "no mutual signature algorithm" ? | 16:48 |
clarkb | mrunge: did you recently upgrade to fedora 33? | 16:49 |
mrunge | yes, I did | 16:49 |
fungi | mrunge: i'm guessing you had a recent change in your operating system ;) | 16:49 |
fungi | yeah, that | 16:49 |
mrunge | yes.... sigh | 16:50 |
clarkb | fedora 33 has decided that the deprecated sha1 hashing for ssh host keys should be disallowed and not just deprecated, so it is disabled by default there | 16:50 |
clarkb | you can re-enable it on a per-host basis, which is our suggestion for now. And you can see yesterday's announcement for a gerrit upgrade in a few weeks for how we'll address that longer term | 16:50 |
mrunge | thank you clarkb , that is a great help | 16:51 |
mrunge | at least, I have an idea where to look now | 16:51 |
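For anyone else hitting the "no mutual signature algorithm" error after upgrading to Fedora 33, the usual per-host workaround is to re-allow RSA/SHA-1 signatures just for the Gerrit host in ~/.ssh/config, roughly like this (the exact option name depends on the OpenSSH version; newer releases spell it PubkeyAcceptedAlgorithms):

    Host review.opendev.org
        # re-enable the deprecated ssh-rsa signature scheme for this host only
        PubkeyAcceptedKeyTypes +ssh-rsa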
*** hamalq has joined #opendev | 16:51 | |
openstackgerrit | Merged openstack/project-config master: Fix the internal rax mirror name https://review.opendev.org/760183 | 16:55 |
mrunge | clarkb, out of curiosity, where was the announcement re. gerrit upgrade sent to? I certainly missed that | 16:56 |
fungi | mrunge: service-announce@lists.opendev.org mailing list | 16:57 |
clarkb | mrunge: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html | 16:57 |
mrunge | thank you, /me takes a note on subscribing to that | 16:57 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: This updates LE config for the ord mirror to the correct name https://review.opendev.org/760185 | 16:59 |
clarkb | AJaeger: fungi ^ I think that is the fix we need then that will trigger the playbook and update the cert | 17:00 |
fungi | thanks | 17:00 |
clarkb | mgagne: may also have thoughts on the network issues in inap | 17:01 |
fungi | the ones i looked at were consistently reporting an unexpected network disconnection in ansible which bubbled up as failed/retry | 17:05 |
clarkb | in the past we've seen similar when ansible tries to start a new connection (usually for rsync since it doesn't use the control persistent ssh process) when two instances are fighting over an IP address via arp | 17:06 |
fungi | i thought we usually saw ssh host key mismatches, but maybe only if the rogue instances were listening on ssh | 17:07 |
clarkb | ya that was how it often showed up, but I imagine it isn't the only way the error can manifest | 17:08 |
fungi | i agree | 17:08 |
*** rpittau is now known as rpittau|afk | 17:09 | |
*** mlavalle has quit IRC | 17:14 | |
openstackgerrit | Matthias Runge proposed openstack/project-config master: Create telemetry group and include https://review.opendev.org/760190 | 17:16 |
*** mlavalle has joined #opendev | 17:17 | |
*** tosky has quit IRC | 17:20 | |
*** marios is now known as marios|out | 17:25 | |
*** eolivare has quit IRC | 17:25 | |
clarkb | fungi: zuul is happy with https://review.opendev.org/#/c/760185/ now if you take a look at that to fix the ssl cert for mirror-int.ord | 17:41 |
*** sshnaidm|mtg is now known as sshnaidm|rover | 17:46 | |
*** ykarel|away has joined #opendev | 17:47 | |
fungi | will check it out, thanks. juggling too many things at once | 17:47 |
*** marios|out has quit IRC | 17:53 | |
*** andrewbonney has quit IRC | 17:57 | |
*** ykarel|away has quit IRC | 17:59 | |
*** ralonsoh has quit IRC | 18:00 | |
openstackgerrit | Merged opendev/system-config master: This updates LE config for the ord mirror to the correct name https://review.opendev.org/760185 | 18:18 |
*** hashar has quit IRC | 18:28 | |
*** webmariner has joined #opendev | 18:28 | |
clarkb | base playbook for the LE config fix failed on a small number of hosts. It looks like rc: -13 running TASK [base/exim : Install Exim] | 18:46 |
clarkb | that should run hourly too; I guess we check if the same hosts fail again and then dig in more (possible apt problem?) | 18:47 |
openstackgerrit | Matthias Runge proposed openstack/project-config master: Create telemetry group and include https://review.opendev.org/760190 | 18:52 |
clarkb | fungi: ok the cert looks good to me now on mirror.ord. Checking to see if zuul executors have an updated project config which will change the mirror naming | 18:54 |
clarkb | ya ze01 has an updated project config | 18:54 |
clarkb | I think we can reenable ord in nodepool if you agree | 18:54 |
*** sshnaidm_ has joined #opendev | 18:57 | |
*** sshnaidm|rover has quit IRC | 19:00 | |
*** sshnaidm_ is now known as sshnaidm|rover | 19:06 | |
TheJulia | is zuul being crawled again? | 19:14 |
clarkb | do you mean gerrit? | 19:14 |
clarkb | load is elevated http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=25&rra_id=all | 19:15 |
TheJulia | err, yeah | 19:16 |
TheJulia | gerrit | 19:16 |
TheJulia | 20-30 seconds per page load | 19:16 |
TheJulia | and now a few in a second | 19:16 |
TheJulia | ~7-8 seconds on my last click | 19:17 |
clarkb | the index threads are the ones consuming all the cpu time says melody so that continues to fit the pattern | 19:17 |
clarkb | ya I see the requests in the apache logs now | 19:18 |
*** mtreinish has quit IRC | 19:19 | |
clarkb | I've applied a firewall rule for that source | 19:20 |
*** mtreinish has joined #opendev | 19:20 | |
*** sshnaidm|rover is now known as sshnaidm|afk | 19:28 | |
*** mtreinish has quit IRC | 19:29 | |
*** mtreinish has joined #opendev | 19:29 | |
clarkb | ianw: I think you had mentioned maybe contacting google abuse and seeing if we can work out something via that avenue? do we still think that is a good idea? | 19:33 |
*** rchurch has quit IRC | 19:36 | |
*** hashar has joined #opendev | 19:39 | |
*** rchurch has joined #opendev | 19:40 | |
clarkb | "Excessive web crawling" is an option in their drop down of abuse types | 19:44 |
fungi | cool, i've taken nl01 back out of the emergency disable list. should i manually restore the max-servers for rax-ord or wait for ansible to bring it back up? i guess no need to wait since only the executors need the mirror name fix right? | 19:52 |
clarkb | ya I think you can just return the old max-servers value | 19:52 |
fungi | oh, huh, looks like ansible did anyway. i probably raced an in-progress run and lost | 19:53 |
fungi | yeah, it got updated at 17:33z | 19:54 |
fungi | shall i go ahead and revert the inap-mtl01 disablement too? | 19:54 |
clarkb | I guess so and we can monitor it for trouble | 19:55 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily stop booting nodes in inap-mtl01" https://review.opendev.org/760220 | 19:55 |
donnyd | in case somebody asks about POST_FAILURES related to OE - my edge router is on its last leg apparently and just turned itself off and was quite resistant to turning back on.... | 20:00 |
openstackgerrit | Clark Boylan proposed opendev/base-jobs master: Disable logging to OpenEdge https://review.opendev.org/760222 | 20:03 |
clarkb | fungi: donnyd ^ that should cover the log publishing side | 20:03 |
fungi | donnyd: i know how it feels | 20:05 |
donnyd | yea, its a pretty large bummer fungi | 20:10 |
donnyd | but I have done what I can | 20:10 |
fungi | well, er, yes i also know how it feels to have a router which doesn't want to boot, but i meant more that after the past few weeks i think i know how your router feels too. i've got a tendency to turn off at random and don't really want to power back on | 20:12 |
openstackgerrit | Donny Davis proposed openstack/project-config master: Disable OE, edge router is about to buy the farm https://review.opendev.org/760227 | 20:12 |
donnyd | I think the power supply is going, but it also puked up some controller errors for the internal disk controller | 20:13 |
donnyd | honestly I can't believe it came back on at all | 20:13 |
donnyd | not good at all | 20:14 |
fungi | ahh, yeah, failing psu can result in power sags which lead to misbehaving subsystems for sure | 20:14 |
donnyd | oh I feel you on the powering down part too | 20:15 |
donnyd | I want to power myself down right now and reboot sometime next week | 20:15 |
fungi | definitely tempting | 20:16 |
openstackgerrit | Merged opendev/base-jobs master: Disable logging to OpenEdge https://review.opendev.org/760222 | 20:28 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68293&rra_id=all I think that shows the mirror.ord change is working | 20:32 |
clarkb | traffic on the eth0 side has dropped way off | 20:33 |
clarkb | we're still proxying externally as well as pulling from afs externally so the public interface slowness may still impact us but much less so due to caching | 20:33 |
clarkb | I get much quicker throughput pulling that fedora atomic image from home so ya seems happier | 20:34 |
*** tosky has joined #opendev | 20:39 | |
clarkb | TheJulia: ^ fyi on that since you reported the pep8 timeouts that led to us doing that switch of interfaces | 20:39 |
*** mlavalle has quit IRC | 20:40 | |
ianw | clarkb: yeah, i ended up reporting it; whether it goes to a human is debatable | 20:46 |
clarkb | ianw: is there a way to add new IPs to that or maybe we just wait and see what happens with the original report? | 20:46 |
ianw | it comes from an email google-cloud-compliance-reply+<uuid-looking-thing> which says "please do not hesitate to reach out to us with additional questions or concerns" | 20:48 |
ianw | pleasingly vague on what reaching out means | 20:48 |
fungi | please feel free to go soak your head | 20:49 |
ianw | i can put together a reply with the latest ip | 20:49 |
ianw | fungi: i noticed discussion on grafana puppet, i need to remerge https://review.opendev.org/#/c/739625/ but we should get rid of that | 20:50 |
fungi | ahh, okay | 20:51 |
openstackgerrit | Merged openstack/project-config master: Disable OE, edge router is about to buy the farm https://review.opendev.org/760227 | 20:56 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Cleanup grafana.openstack.org https://review.opendev.org/739625 | 20:59 |
ianw | fungi: any objections switching the mirror-update jobs today? https://review.opendev.org/#/c/759965/ i've run a few by hand, especially the ubuntu ones, and they seem ok (keys available, seems to just work) | 21:01 |
ianw | note logs will be exported now too @ https://static.opendev.org/mirror/logs/reprepro/ | 21:02 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Document dual account split for Gerrit admins https://review.opendev.org/760051 | 21:03 |
fungi | ianw: should we poweroff the mirror-update.openstack.org server first so it won't fire any of its cronjobs in a race with the newer server between deployment jobs for the two, or do you already have it covered another way? | 21:06 |
*** lamt has joined #opendev | 21:08 | |
ianw | fungi: that might be best, i've marked them as disabled via puppet in that change but really i imagine we don't even need to apply that | 21:08 |
fungi | mmm, yeah, mainly trying to work out how we stop them from possibly both running briefly and fighting one another | 21:09 |
ianw | if you're happy i can shut it down, put it in emergency, merge the change, and then monitor; if it works we can just merge the change to remove the old server from puppet | 21:09 |
ianw | if it doesn't, we can always hand-edit the cron jobs while we figure it out | 21:10 |
fungi | yeah, i'll poweroff the old server now after commenting out everything in root's crontab just in case we have to boot it again for some reason | 21:11 |
fungi | #status log powered off mirror-update.openstack.org (with its root crontab content commented out) in preparation for merging https://review.opendev.org/759965 | 21:13 |
openstackstatus | fungi: finished logging | 21:13 |
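A rough sketch of that kind of shutdown sequence, purely illustrative rather than a record of the exact commands run:

    # comment out every active entry in root's crontab so it can be restored later
    sudo crontab -l | sed 's/^\([^#]\)/#\1/' | sudo crontab -
    sudo poweroff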
fungi | config-core: https://review.opendev.org/760220 brings inap back online, which should restore more capacity again | 21:15 |
ianw | fungi: cool, thanks, i'll watch the jobs today | 21:17 |
ianw | another puppet host down! | 21:17 |
ianw | clarkb: for consistency do we want to switch all the RAX's to an internal address? | 21:18 |
clarkb | ya probably a good idea | 21:18 |
ianw | i can setup the dns entries today | 21:19 |
openstackgerrit | Merged opendev/system-config master: reprepo: enable cron jobs https://review.opendev.org/759965 | 21:45 |
*** slaweq has quit IRC | 21:48 | |
*** Green_Bird has quit IRC | 21:54 | |
*** dulek has quit IRC | 21:59 | |
*** rm_work has quit IRC | 21:59 | |
*** jrosser has quit IRC | 21:59 | |
*** zer0c00l has quit IRC | 21:59 | |
*** zer0c00l has joined #opendev | 21:59 | |
*** rm_work has joined #opendev | 21:59 | |
*** jrosser has joined #opendev | 21:59 | |
*** dulek has joined #opendev | 22:01 | |
*** slaweq has joined #opendev | 22:04 | |
*** DSpider has quit IRC | 22:09 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepro: randomise start times better https://review.opendev.org/760250 | 22:10 |
*** slaweq has quit IRC | 22:14 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepro: fix cron config path and randomise times better https://review.opendev.org/760250 | 22:28 |
ianw | fungi: ^ that one's now an actual fix as the cron job requires the full path (in my testing i was using a manual command line that i guess i got right) | 22:29 |
fungi | i've been noticing the cronspam from reprepro runs is kinda verbose too | 22:32 |
ianw | hrm, are we missing stderr | 22:33 |
fungi | it says it's doing 2>&1 | 22:34 |
fungi | so i think reprepro may be writing to fd 3 or something | 22:34 |
ianw | that sounds like a reprepro thing to do | 22:35 |
fungi | if you want to temporarily add your address to the daily firehose, it's done by editing /etc/ansible/hosts/group_vars/all.yaml | 22:35 |
fungi | on bridge | 22:36 |
ianw | hrm, running it on command line i'm not seeing output with 2>&1 | 22:38 |
ianw | fungi: is it debian-docker in particular? | 22:39 |
ianw | flock -n /var/run/reprepro/debian-docker.lock bash -c "for DISTRO in xenial bionic focal; do reprepro-mirror-update /etc/reprepro/debian-docker-\$DISTRO mirror.deb-docker >>/var/log/reprepro/debian-docker-\$DISTRO-mirror.log; done" 2>&1 | 22:40 |
ianw | i think the 2>&1 might be in the wrong place there | 22:40 |
fungi | ahh, yup possibly | 22:40 |
fungi | needs to be inside the done, i agree that's it | 22:41 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepro: catch stderr of individual deb-docker runs https://review.opendev.org/760254 | 22:42 |
fungi | yeah, instead of redirecting stdout and stderr to the log, it was redirecting stdout to the log and then redirecting stderr of the outer calling process to its stderr | 22:43 |
fungi | er, to its stdout | 22:43 |
fungi | so the stderr of the reprepro invocations was being written to the stdout of the flock and caught by crond | 22:44 |
ianw | that job's a bit ridiculous, but because they all share a volume it doesn't seem to make sense to split it into 3 jobs that could sometimes lock each other out | 22:45 |
fungi | indeed | 22:45 |
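For reference, the fix presumably moves the redirection inside the per-distro loop so each reprepro run's stderr ends up in its own log instead of in cron mail; roughly:

    flock -n /var/run/reprepro/debian-docker.lock bash -c "for DISTRO in xenial bionic focal; do reprepro-mirror-update /etc/reprepro/debian-docker-\$DISTRO mirror.deb-docker >>/var/log/reprepro/debian-docker-\$DISTRO-mirror.log 2>&1; done"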
TheJulia | clarkb: thanks for the update! | 23:05 |
*** sboyron has quit IRC | 23:10 | |
openstackgerrit | Merged opendev/system-config master: reprepro: fix cron config path and randomise times better https://review.opendev.org/760250 | 23:27 |
*** hashar has quit IRC | 23:28 | |
openstackgerrit | Merged opendev/system-config master: reprepro: catch stderr of individual deb-docker runs https://review.opendev.org/760254 | 23:47 |
ianw | fungi: i think we're coming to the conclusion the next best step is to mod_rewrite these problem queries to a static html page saying "this is causing problems, please contact us" | 23:47 |
ianw | it seems like a match on the UA + bits of the query string will keep it pretty unique? | 23:48 |
clarkb | we should be able to grep apache logs to do a rough confirmation of ^ | 23:48 |
ianw | i guess we have to chain a few rules together to be effective? | 23:48 |
ianw | i was saying to clarkb though, it will be good to have examples in there when we're not under actual pressure to block something urgently | 23:49 |
clarkb | fwiw I just grepped the UA and the set of url query parameters and both the head and tail of that were our most recent friend | 23:50 |
clarkb | ianw: I think if you have a sequence of rewrite conditions they are ANDed together? | 23:50 |
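A minimal sketch of the kind of mod_rewrite stanza being discussed, with placeholder patterns for the user agent and query string (and yes, consecutive RewriteCond directives are ANDed by default):

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} "ExampleCrawler"
    RewriteCond %{QUERY_STRING} "problem-param=problem-value"
    # serve a static explanation page instead of running the expensive query
    RewriteRule ^ /crawler-blocked.html [L]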