Friday, 2021-06-11

*** opendevtest <opendevtest!~limnoria@104.239.144.232> has joined #opendev01:15
*** Guest1653 <Guest1653!~limnoria@104.239.144.232> has joined #opendev01:25
opendevreviewmelanie witt proposed opendev/jeepyb master: Convert update_blueprint to use the Gerrit REST API  https://review.opendev.org/c/opendev/jeepyb/+/79591201:27
ianwoh, it looks like i have not set up limnoria correctly to identify to nickserv01:28
ianw!plugins01:30
Guest1653ianw: Error: "plugins" is not a valid command.01:30
*** opendevmeet` <opendevmeet`!~limnoria@104.239.144.232> has joined #opendev01:32
opendevreviewmelanie witt proposed opendev/system-config master: Re-enable update_blueprint for patchset-created  https://review.opendev.org/c/opendev/system-config/+/79591401:34
*** opendevmeet` <opendevmeet`!~limnoria@104.239.144.232> has joined #opendev01:39
*** opendevmeet <opendevmeet!~limnoria@104.239.144.232> has joined #opendev02:02
corvus#status log restarted all of zuul on commit dd45f931b62ef6a5362e39bdb56ee203b74e1381 (4.5.0 +1)02:02
opendevstatuscorvus: finished logging02:02
opendevreviewIan Wienand proposed opendev/system-config master: limnoria: production fixes  https://review.opendev.org/c/opendev/system-config/+/79591702:02
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has joined #opendev02:07
corvusre-enqueuing02:08
corvusianw: ^ that may need a recheck02:09
*** boistordu <boistordu!~boistordu@0002bdcc.user.oftc.net> has joined #opendev02:11
*** boistordu_ex <boistordu_ex!~boistordu@0002bdcc.user.oftc.net> has quit IRC (Ping timeout: 480 seconds)02:17
corvusre-enqueue is done02:20
*** ysandeep|out is now known as ysandeep02:34
opendevreviewIan Wienand proposed opendev/system-config master: limnoria: production fixes  https://review.opendev.org/c/opendev/system-config/+/79591702:52
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has quit IRC (Ping timeout: 480 seconds)03:04
*** opendevmeet <opendevmeet!~limnoria@104.239.144.232> has joined #opendev03:04
opendevreviewIan Wienand proposed opendev/system-config master: limnoria: production fixes  https://review.opendev.org/c/opendev/system-config/+/79591703:17
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: add mariadb_container option  https://review.opendev.org/c/opendev/system-config/+/77596103:32
opendevreviewIan Wienand proposed opendev/system-config master: review02 : switch reviewdb to mariadb_container type  https://review.opendev.org/c/opendev/system-config/+/79519203:32
*** redrobot <redrobot!~redrobot@108-84-79-198.lightspeed.snantx.sbcglobal.net> has quit IRC (Remote host closed the connection)03:40
opendevreviewIan Wienand proposed opendev/system-config master: static: enable SSLProxyEngine for meetings  https://review.opendev.org/c/opendev/system-config/+/79592003:43
*** ysandeep <ysandeep!~sandy@202.173.126.121> has quit IRC (Ping timeout: 480 seconds)04:07
*** ykarel <ykarel!~ykarel@2405:201:5c10:d062:7dc:c662:5028:45c8> has joined #opendev04:22
opendevreviewMerged opendev/system-config master: limnoria: production fixes  https://review.opendev.org/c/opendev/system-config/+/79591704:25
*** ricolin_ <ricolin_!~ricolin@118.150.144.205> has joined #opendev04:31
*** ricolin <ricolin!~ricolin@118.150.144.205> has quit IRC (Ping timeout: 480 seconds)04:35
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has joined #opendev04:39
*** ysandeep <ysandeep!~sandy@202.173.126.121> has joined #opendev04:50
opendevreviewMerged opendev/system-config master: static: enable SSLProxyEngine for meetings  https://review.opendev.org/c/opendev/system-config/+/79592005:02
ianwtesting05:20
*** marios <marios!~marios@62-171-24.netrun.cytanet.com.cy> has joined #opendev05:37
*** marios is now known as marios|ruck05:41
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has quit IRC (Ping timeout: 480 seconds)06:17
ianwhttps://meetings.opendev.org/irclogs/%23opendev/%23opendev.2021-06-11.log.html looks good06:20
ianwi've tested a few eavesdrop links and they all correctly bounce eavesdrop01.openstack.org -> meetings.opendev.org -> (proxy) -> eavesdrop01.opendev.org06:22
ianwi think we can tick this one off06:22
ianw#status log meetbot/logging now running from limnoria on eavesdrop01.opendev.org06:23
opendevstatusianw: finished logging06:23
*** ralonsoh <ralonsoh!~ralonsoh@36.red-79-150-231.dynamicip.rima-tde.net> has joined #opendev06:27
*** amoralej <amoralej!~amoralej@153.red-80-26-161.dynamicip.rima-tde.net> has joined #opendev06:43
*** hashar <hashar!~hashar@hashar.user.oftc.net> has joined #opendev07:13
*** tosky <tosky!~luigi@dynamic-adsl-78-13-253-141.clienti.tiscali.it> has joined #opendev07:14
*** rpittau|afk is now known as rpittau07:17
*** andrewbonney <andrewbonney!uid417545@id-417545.highgate.irccloud.com> has joined #opendev07:27
*** jpena|off is now known as jpena07:33
opendevreviewMerged opendev/system-config master: Cleanup eavesdrop puppet references  https://review.opendev.org/c/opendev/system-config/+/79501407:45
opendevreviewMerged opendev/system-config master: Run statusbot from eavesdrop01.opendev.org  https://review.opendev.org/c/opendev/system-config/+/79521307:46
*** lucasagomes <lucasagomes!~lucasagom@89.100.20.18> has joined #opendev07:56
*** ysandeep is now known as ysandeep|lunch08:06
*** opendevstatus is now known as Guest168408:09
*** tosky <tosky!~luigi@dynamic-adsl-78-13-253-141.clienti.tiscali.it> has quit IRC (Ping timeout: 480 seconds)08:13
*** mgoddard- <mgoddard-!~mgoddard@238.240.125.91.dyn.plus.net> has joined #opendev08:14
opendevreviewIan Wienand proposed opendev/system-config master: Move statusbot channels out of hiera  https://review.opendev.org/c/opendev/system-config/+/79595808:16
*** mgoddard <mgoddard!~mgoddard@187.240.125.91.dyn.plus.net> has quit IRC (Ping timeout: 480 seconds)08:18
*** mgoddard- is now known as mgoddard08:18
fricklerianw: this doesn't look correct to me: 08:09 -!- opendevstatus is now known as Guest168408:21
*** sshnaidm is now known as sshnaidm|afk08:24
*** tosky <tosky!~luigi@dynamic-adsl-78-13-253-141.clienti.tiscali.it> has joined #opendev08:24
*** ykarel is now known as ykarel|lunch08:31
*** Guest1685 <Guest1685!~limnoria@104.239.144.232> has joined #opendev08:40
*** opendevstatus_ <opendevstatus_!~opendevst@104.130.70.91> has joined #opendev08:45
*** opendevstatus_ is now known as opendevstatus__08:47
*** opendevstatus__ is now known as opendevstatus___08:47
*** opendevstatus___ is now known as opendevstatus____08:47
*** opendevstatus____ is now known as opendevstatus_____08:48
*** opendevstatus_____ is now known as opendevstatus______08:48
ianwfrickler: sorry still working on opendevstatus atm08:57
*** opendevstatus______ <opendevstatus______!~opendevst@104.130.70.91> has quit IRC (Ping timeout: 480 seconds)08:58
*** Guest1684 <Guest1684!~opendevst@eavesdrop01.openstack.org> has quit IRC (Remote host closed the connection)08:58
*** ysandeep|lunch <ysandeep|lunch!~sandy@202.173.126.121> has quit IRC (Ping timeout: 480 seconds)08:59
ianwi've just killed the statusbot running in screen on eavesdrop01.openstack.org.  the service is running on eavesdrop01.opendev.org now, and i'm just waiting on 795958 to give us the channel config09:00
ianwsorry that above was the two bots fighting for the name09:00
*** opendevstatus_ <opendevstatus_!~opendevst@158.69.72.85> has joined #opendev09:13
*** opendevstatus_ is now known as opendevstatus__09:16
*** opendevstatus__ is now known as opendevstatus___09:16
*** opendevstatus___ is now known as opendevstatus____09:16
*** opendevstatus____ is now known as opendevstatus_____09:16
*** opendevstatus_____ is now known as opendevstatus______09:16
opendevreviewMerged opendev/system-config master: Move statusbot channels out of hiera  https://review.opendev.org/c/opendev/system-config/+/79595809:23
*** opendevstatus______ <opendevstatus______!~opendevst@158.69.72.85> has quit IRC (Ping timeout: 480 seconds)09:25
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has joined #opendev09:31
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has quit IRC (Remote host closed the connection)09:31
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has joined #opendev09:31
ianw#status log statusbot running on eavesdrop01.opendev.org09:32
opendevstatusianw: finished logging09:34
ianwthankyou statusbot09:34
ianwthat is visible on https://wiki.openstack.org/wiki/Infrastructure_Status09:35
*** ysandeep|lunch <ysandeep|lunch!~sandy@202.173.126.121> has joined #opendev09:41
*** ysandeep|lunch is now known as ysandeep09:47
opendevreviewIan Wienand proposed opendev/system-config master: limnoria: don't log channel join/parts  https://review.opendev.org/c/opendev/system-config/+/79597209:47
ianwi believe meetbot and statusbot are now fully deployed on eavesdrop01.opendev.org09:49
ianwi think i will shutdown eavesdrop01.openstack.org to avoid any confusion.  this leaves ptg still todo, but we know about that09:50
*** opendevmeet <opendevmeet!~limnoria@104.239.144.232> has joined #opendev09:55
*** opendevmeet is now known as Guest168809:56
*** opendevmeet <opendevmeet!~limnoria@104.239.144.232> has joined #opendev09:57
*** opendevmeet is now known as Guest168909:57
*** Guest1690 <Guest1690!~limnoria@104.239.144.232> has joined #opendev10:03
*** opendevmeet` <opendevmeet`!~limnoria@104.239.144.232> has joined #opendev10:16
*** opendevmeet <opendevmeet!~limnoria@104.239.144.232> has joined #opendev11:15
opendevreviewIan Wienand proposed opendev/system-config master: limnoria: fix nicks syntax  https://review.opendev.org/c/opendev/system-config/+/79598811:15
*** Guest1712 <Guest1712!~opendevst@104.239.144.232> has quit IRC (Remote host closed the connection)11:17
*** Guest1713 <Guest1713!~opendevst@149.202.169.13> has quit IRC (Ping timeout: 480 seconds)11:18
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has joined #opendev11:18
ianwwhen i was debugging connecting as opendevmeet (which i can't really do in the gate) i manually fixed the config file correctly.  but then i committed the typo ^, so when ansible applied it, it put in the broken config11:21
ianwthat's why it was working then stopped11:22
ianwanyway, once this little stack of fixes merges, i think we're all good11:22
*** jpena is now known as jpena|lunch11:27
*** opendevstatus_ <opendevstatus_!~opendevst@104.130.219.52> has joined #opendev11:40
*** opendevstatus_ is now known as opendevstatus__11:43
*** opendevstatus__ is now known as opendevstatus___11:43
*** opendevstatus___ is now known as opendevstatus____11:43
*** opendevstatus____ is now known as opendevstatus_____11:43
*** opendevstatus_____ is now known as opendevstatus______11:43
*** opendevstatus______ <opendevstatus______!~opendevst@104.130.219.52> has quit IRC (Ping timeout: 480 seconds)11:51
*** ykarel is now known as ykarel|afk12:07
*** whayutin <whayutin!~weshay|ru@c-73-229-75-146.hsd1.co.comcast.net> has joined #opendev12:09
*** opendevstatus_ <opendevstatus_!~opendevst@104.130.219.164> has joined #opendev12:12
*** opendevstatus_ is now known as opendevstatus__12:14
*** opendevstatus__ is now known as opendevstatus___12:15
*** opendevstatus___ is now known as opendevstatus____12:15
*** opendevstatus____ is now known as opendevstatus_____12:15
*** opendevstatus_____ is now known as opendevstatus______12:15
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has quit IRC (Remote host closed the connection)12:18
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has joined #opendev12:18
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has quit IRC (Remote host closed the connection)12:22
*** opendevstatus______ <opendevstatus______!~opendevst@104.130.219.164> has quit IRC (Ping timeout: 480 seconds)12:24
*** opendevstatus <opendevstatus!~opendevst@104.239.144.232> has joined #opendev12:26
*** ysandeep is now known as ysandeep|mtg12:29
*** jpena|lunch is now known as jpena12:30
opendevreviewIan Wienand proposed opendev/system-config master: statusbot: don't use opendevstatus name in testing  https://review.opendev.org/c/opendev/system-config/+/79599812:35
*** opendevstatus_ <opendevstatus_!~opendevst@213.32.72.249> has joined #opendev12:41
*** opendevstatus_ is now known as opendevstatus__12:44
*** opendevstatus__ is now known as opendevstatus___12:44
*** opendevstatus___ is now known as opendevstatus____12:44
*** opendevstatus____ is now known as opendevstatus_____12:44
*** opendevstatus_____ is now known as opendevstatus______12:44
ianw^ will stop this happening; it's trying to connect during testing12:45
*** amoralej is now known as amoralej|lunch12:46
*** opendevstatus______ <opendevstatus______!~opendevst@213.32.72.249> has quit IRC (Ping timeout: 480 seconds)12:53
*** ykarel|afk is now known as ykarel12:57
opendevreviewIan Wienand proposed opendev/system-config master: statusbot: don't use opendevstatus name in testing  https://review.opendev.org/c/opendev/system-config/+/79599812:59
opendevreviewGhanshyam proposed openstack/project-config master: Add gmann to IRC accessbot  https://review.opendev.org/c/openstack/project-config/+/79598613:07
*** opendevstatus_ <opendevstatus_!~opendevst@104.130.26.53> has joined #opendev13:12
*** opendevstatus_ is now known as opendevstatus__13:15
*** opendevstatus__ is now known as opendevstatus___13:15
*** opendevstatus___ is now known as opendevstatus____13:15
*** opendevstatus____ is now known as opendevstatus_____13:15
*** opendevstatus_____ is now known as opendevstatus______13:15
opendevreviewMerged opendev/system-config master: limnoria: fix nicks syntax  https://review.opendev.org/c/opendev/system-config/+/79598813:17
*** CeeMac <CeeMac!uid366483@id-366483.brockwell.irccloud.com> has quit IRC (Quit: Connection closed for inactivity)13:19
opendevreviewIan Wienand proposed opendev/system-config master: Update eavesdrop deploy job  https://review.opendev.org/c/opendev/system-config/+/79600613:24
*** opendevstatus______ <opendevstatus______!~opendevst@104.130.26.53> has quit IRC (Ping timeout: 480 seconds)13:24
*** amoralej|lunch is now known as amoralej13:28
opendevreviewIan Wienand proposed opendev/system-config master: statusbot: don't prefix with extra # for testing  https://review.opendev.org/c/opendev/system-config/+/79600913:32
*** artom_ <artom_!~artom@205.233.59.73> has quit IRC (Remote host closed the connection)13:35
*** artom_ <artom_!~artom@205.233.59.73> has joined #opendev13:35
ianwok, all config changes rolled out, the meetbot and statusbot containers should be happy and in a steady state13:35
ianwi'm going to turn in now13:35
*** ysandeep|mtg is now known as ysandeep13:37
*** artom <artom!~artom@205.233.59.73> has joined #opendev13:40
opendevreviewDanni Shi proposed openstack/diskimage-builder master: Add a keylime-agent element and a tpm-emulator element  https://review.opendev.org/c/openstack/diskimage-builder/+/78960113:40
*** ysandeep is now known as ysandeep|out13:45
*** artom_ <artom_!~artom@205.233.59.73> has quit IRC (Ping timeout: 480 seconds)13:47
*** artom <artom!~artom@205.233.59.73> has quit IRC (Remote host closed the connection)13:50
*** artom <artom!~artom@205.233.59.73> has joined #opendev13:51
opendevreviewMerged opendev/system-config master: statusbot: don't use opendevstatus name in testing  https://review.opendev.org/c/opendev/system-config/+/79599814:02
*** ralonsoh <ralonsoh!~ralonsoh@36.red-79-150-231.dynamicip.rima-tde.net> has quit IRC (Quit: Leaving)14:14
*** ralonsoh <ralonsoh!~ralonsoh@36.red-79-150-231.dynamicip.rima-tde.net> has joined #opendev14:16
*** artom <artom!~artom@205.233.59.73> has quit IRC (Quit: Leaving)14:37
*** artom <artom!~artom@205.233.59.73> has joined #opendev14:38
*** ysandeep|out <ysandeep|out!~sandy@202.173.126.121> has quit IRC (Ping timeout: 480 seconds)14:38
*** dklyle <dklyle!~dklyle@134.134.139.72> has joined #opendev14:51
*** david-lyle <david-lyle!~dklyle@jfdmzpr05-ext.jf.intel.com> has quit IRC (Remote host closed the connection)14:57
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has joined #opendev15:06
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has quit IRC (Ping timeout: 480 seconds)15:16
clarkbianw: thank you for taking care of that!15:24
*** hashar <hashar!~hashar@hashar.user.oftc.net> has quit IRC (Quit: I am a virus. Please copy paste me in your /quit message to help me propagate)15:29
*** marios|ruck is now known as marios|out15:37
*** ykarel is now known as ykarel|away15:44
*** rpittau is now known as rpittau|afk15:47
*** odyssey4me <odyssey4me!~odyssey4m@host31-51-109-193.range31-51.btcentralplus.com> has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)15:50
*** ykarel|away <ykarel|away!~ykarel@2405:201:5c10:d062:7dc:c662:5028:45c8> has quit IRC (Ping timeout: 480 seconds)15:56
*** lucasagomes <lucasagomes!~lucasagom@89.100.20.18> has quit IRC (Quit: Leaving)15:56
*** ysandeep <ysandeep!~sandy@202.173.126.240> has joined #opendev16:04
*** ysandeep <ysandeep!~sandy@202.173.126.240> has quit IRC ()16:04
*** timburke <timburke!~timburke@2601:645:c480:3660:d4f3:a3b4:736b:710d> has joined #opendev16:11
*** marios|out <marios|out!~marios@62-171-24.netrun.cytanet.com.cy> has quit IRC (Ping timeout: 480 seconds)16:11
clarkbfungi: ianw mentioned that logan- had responded and we might want to consider https://review.opendev.org/c/openstack/project-config/+/794406 during US hours for better overlap16:15
clarkbfungi: any thoughts on landing that now? I'll be around today if we need to disable it again16:16
*** jpena is now known as jpena|off16:17
fungilooks like it's still workflow -1... will need to revise the change or delete ianw's vote from it16:18
clarkbya I guess we can't as easily click the 'x' to remove the WIP vote anymore16:18
clarkba new ps is probably easiest16:18
fungior temporarily elevating account perms16:19
*** amoralej is now known as amoralej|off16:23
*** amoralej|off <amoralej|off!~amoralej@153.red-80-26-161.dynamicip.rima-tde.net> has quit IRC (Quit: Leaving)16:24
fungii have temporarily elevated my perms to delete ianw's workflow -1 from 794406 and am approving it16:36
clarkbcool16:36
opendevreviewMerged openstack/project-config master: Revert "Revert "Revert "Disable limestone due to mirror issues"""  https://review.opendev.org/c/openstack/project-config/+/79440616:46
*** andrewbonney <andrewbonney!uid417545@id-417545.highgate.irccloud.com> has quit IRC (Quit: Connection closed for inactivity)17:09
*** ralonsoh <ralonsoh!~ralonsoh@36.red-79-150-231.dynamicip.rima-tde.net> has quit IRC (Quit: Leaving)17:19
clarkbfungi: let me get a few commands stashed in a text document really quickly before I start17:41
*** slittle1 <slittle1!~slittle@108.162.140.52> has quit IRC (Read error: Connection reset by peer)17:42
* fungi is standing by to test mirrors17:42
*** slittle1 <slittle1!~slittle@108.162.140.52> has joined #opendev17:43
clarkb#status Notice Zuul is being restarted for server reboots17:45
opendevstatusclarkb: sending notice17:45
-opendevstatus- NOTICE: Zuul is being restarted for server reboots17:46
clarkbfungi: zuul is stopping now17:47
fungiawesome17:48
opendevstatusclarkb: finished sending notice17:48
clarkbzm01 and zm02 failed to reboot because they are not accessible via ssh according to ansible17:48
fungigood to see opendevstatus is still able to send notifications17:49
fungiso they were already down?17:49
*** CeeMac <CeeMac!uid366483@id-366483.brockwell.irccloud.com> has joined #opendev17:49
clarkbnot sure yet. ze01 04 and 05 show similar17:49
clarkbI think that's an ansible behavior17:50
clarkbthe reboot closed the ssh connection before ansible was done with it17:50
clarkbat least zm01 and zm02 report small uptimes17:50
clarkbI'll proceed with the mirror reboots and check the ze's that errored after17:50
fungiahh, okay17:53
clarkbfungi: the 6 focal mirrors are done, but they all produced the same error. Can you check their uptimes when you check their afs as well?17:53
clarkbI'm checking zuul servers now17:53
fungiyep, testing the mirrors now17:54
clarkbI'm checking uptimes with ansible too17:55
clarkbuptimes all lgtm. Let me know if you think afs is happy then I can run the zuul start playbook17:57
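The uptime check described above can be sketched in Python. This is a minimal illustration, not the ansible command used in the channel: it assumes the text of `/proc/uptime` has already been collected from each host, and the function names and the 30-minute threshold are hypothetical choices.

```python
def parse_uptime_seconds(proc_uptime_text: str) -> float:
    """Return the uptime in seconds from the text of /proc/uptime.

    The first whitespace-separated field of /proc/uptime is the
    number of seconds since boot.
    """
    return float(proc_uptime_text.split()[0])


def rebooted_recently(proc_uptime_text: str, max_age_seconds: float = 1800.0) -> bool:
    """True if the host booted within max_age_seconds (assumed threshold)."""
    return parse_uptime_seconds(proc_uptime_text) <= max_age_seconds
```

A host whose uptime is only a few minutes did go through the reboot, even if ansible reported an ssh error because the connection dropped first.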
fungii can't reach https://mirror.regionone.osuosl.opendev.org/17:58
fungii also seem to be timing out on https://mirror.regionone.linaro-us.opendev.org17:58
clarkbosuosl's mirror shows the same dmesg problem that limestone did17:59
clarkbor maybe it's a different error. /me checks linaro next17:59
fungiit's possible my wireless modem being ipv4-only is presenting some v6 connectivity problems for them17:59
fungiokay, i did finally get a response from https://mirror.regionone.linaro-us.opendev.org/18:00
clarkblinaro doesn't show the oops/protection fault in dmesg18:00
clarkbit is having a hard time with ls in /afs/openstack.org though18:00
fungilooks like mirror.regionone.osuosl.opendev.org is v4-only18:00
clarkbfungi: did you want to try cleaning out the openafs cache on the osuosl mirror?18:01
fungiso not a local v6 problem on my end18:01
clarkbyes I believe we don't have ipv6 there yet18:01
fungiyeah, i'll give that a shot18:01
clarkblinaro looks happy now I can ls in there. I think if you're happy with osuosl after cleaning the cache up we're ready to start zuul18:02
fungiremoving /var/cache/openafs/* on mirror01.regionone.osuosl.opendev.org now18:03
fungiit's taking a few minutes18:03
clarkblooks empty now?18:04
clarkbdon't worry about my ssh session, feel free to reboot again when you are ready18:04
fungiokay, deletion complete, rebooting it now18:04
*** odyssey4me <odyssey4me!~odyssey4m@rdng-28-b2-v4wan-161903-cust132.vm39.cable.virginm.net> has joined #opendev18:05
*** ykarel|away <ykarel|away!~ykarel@2405:201:5c10:d062:7dc:c662:5028:45c8> has joined #opendev18:05
clarkbshould I go ahead and start zuul since that is the last remaining sad server?18:07
clarkband zuul will take some time anyway?18:07
clarkbit isn't pinging :/18:07
fungiyeah, go for it. worst case we emergency turn down this provider until we get the mirror for it back on track18:08
clarkbok starting zuul now18:08
fungii'm still waiting for it to boot18:08
clarkbfungi: I wonder if we'll need to ask nova to hard reboot that osuosl mirror18:10
clarkbI wonder if the kernel isn't unloading openafs properly or some similar unit is failing to stop making it slow18:10
fungipossibly, i did a `systemctl reboot` on it18:11
fungii'll check the server console, and if it's not in progress i'll hard reboot it18:11
fungiA start job is running for OpenAFS client (2min 36s / 3min 21s)18:14
*** ykarel|away <ykarel|away!~ykarel@2405:201:5c10:d062:7dc:c662:5028:45c8> has quit IRC (Ping timeout: 480 seconds)18:14
fungiit finally booted18:16
clarkbfungi: looks like the reboot ya18:16
*** dviroel <dviroel!uid349012@id-349012.stonehaven.irccloud.com> has quit IRC (Quit: Connection closed for inactivity)18:16
clarkbzuul configs appear to have loaded I'm restoring queues now18:16
fungilooks like afs didn't completely sync up18:16
fungihttps://mirror.regionone.osuosl.opendev.org/ just shows a robots.txt file18:16
clarkb/afs is empty too18:16
fungils: cannot access '/afs/openstack.org': No such file or directory18:17
fungiyup18:17
fungiStarting AFS cache scan... Unable to handle kernel paging request at virtual address ffff800016723e40 [...] Internal error: Oops: 96000007 [#1] SMP18:18
fungii think it got unhappy18:18
fungii'll give it a second reboot18:18
clarkbok18:18
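Spotting the oops line pasted above in a larger kernel log can be sketched as follows. This is a minimal example that matches only the `Internal error: Oops: <code> [#<n>]` form quoted in this log; the regular expression and function name are illustrative assumptions, and other kernel message formats are not handled.

```python
import re

# Matches lines such as "Internal error: Oops: 96000007 [#1] SMP".
OOPS_RE = re.compile(
    r"Internal error: Oops: (?P<code>[0-9a-fA-F]+) \[#(?P<count>\d+)\]"
)


def find_oopses(log_text: str) -> list[tuple[str, int]]:
    """Return (error code, oops counter) for each oops line in log_text."""
    return [
        (match.group("code"), int(match.group("count")))
        for match in OOPS_RE.finditer(log_text)
    ]
```

Running it over saved `dmesg` output from each mirror would show which hosts hit the oops and with which error code.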
clarkbfungi: seems like it is being slow again? I suspect either something on shutdown trying to stop afs units or on startup trying to start them. Then when it gives up things complete and reboot finishes18:22
fungiit's more like it's timing out trying to communicate with the afs servers18:22
clarkbah18:23
fungii wonder if there's some sort of udp communications issue there18:23
clarkbre-enqueuing has completed18:25
fungilooks like it eventually rebooted and is timing out starting afsd again18:28
clarkbit isn't letting me ssh in yet either. I get the pam nologin message18:28
fungiyeah, it won't until it gets past this18:28
clarkbin the oops trace is 'afs_InitCacheFile'18:30
clarkbwhich implies that maybe something is still up with the cache?18:30
fungii can try clearing it again, sure18:30
clarkbit certainly seems to have populated18:30
opendevreviewAde Lee proposed zuul/zuul-jobs master: Add role to enable FIPS on a node  https://review.opendev.org/c/zuul/zuul-jobs/+/78877818:31
fungii'm clearing it again and will check more closely once it's done18:31
clarkbfungi: k18:31
fungini retrospect i should probably make sure afsd is completely stopped too18:32
fungiin retrospect18:32
* fungi is apparently one of the knights who say ni18:32
clarkbfungi: theory: we mount /var/cache/openafs as another device, I wonder if openafs starting and mounting normal devices race each other and we possibly mount over the cache while openafs is operating on it?18:33
fungini-wom!18:33
fungii'll check underneath it, great suggestion18:33
clarkbthat could explain errors writing to there too if all of a sudden dirs aren't present or something like that18:33
fungii can't seem to kill afsd18:35
clarkbI seem to recall this from before and it was cache related then too.18:35
fungii may need to disable the openafs-client.service unit temporarily and reboot18:36
clarkbfungi: I want to say you can disable it in systemd then reboot18:36
clarkbya18:36
fungidone and rebooting again18:36
fungiit's still trying to kill afsd to restart18:40
clarkbya I think that may be why reboots have been slow previously18:41
clarkbsince systemd wants to stop all the services as part of that18:41
*** david-lyle <david-lyle!~dklyle@134.134.139.72> has joined #opendev18:44
*** dklyle <dklyle!~dklyle@134.134.139.72> has quit IRC (Remote host closed the connection)18:44
fungiit did eventually reboot18:45
funginothing in /var/cache/openafs on its rootfs after unmounting the volume there18:46
clarkbI guess mount the normal cache back, clean it up, then try starting afs manually?18:47
fungiyeah, was just checking that i could ping all our afs servers from it18:47
clarkboh good idea18:48
fungicache volume mounted again and starting openafs-client now18:48
clarkband if this doesn't work maybe we force a dkms rebuild next?18:49
fungithis is arm64 right?18:49
clarkbyes18:49
clarkbit oopsed again18:50
clarkbfwiw linaro is also arm6418:50
clarkbbut possibly different hardware18:50
fungii'll try package upgrades too just to be sure it's not missing a newer rev18:50
clarkbthe trace looks the same as before. it's failing in a path to init cache file18:51
clarkbthe cache device has plenty of free disk so not a catastrophic handling of no more disk18:51
fungidisabled openafs-client.service and am rebooting again since i couldn't kill afsd and am worried that any openafs package updates might fail postinst scripts if it's stuck18:53
clarkb++18:53
fungiand if it's already latest, i'll force a reinstall so dkms rebuild will take place18:54
fungiand will clear the cache volume yet again18:54
fungii have seen package upgrades in the past timeout/fail dkms builds leaving incomplete or otherwise broken lkms which then act weird on the next reboot, so maybe it's that18:55
clarkbfingers crossed18:56
fungitesting to see if it reboots faster when it doesn't have to wait for afsd to not stop19:02
fungithe 10-second grub menu timeout was the longest part of that reboot ;)19:03
fungiso freshly rebooted with afsd not running at all, and `sudo ls -l /var/cache/openafs` took almost 30 seconds19:04
fungii wonder if there's something not quite right with that volume19:04
clarkbthat could also cause problems with cache init19:05
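The slow listing noted above can be measured rather than eyeballed. The following is a rough sketch that times a single directory listing; the function name is hypothetical, and it measures only `os.listdir`, not the per-file `stat` calls that `ls -l` also performs.

```python
import os
import time


def time_listing(path: str) -> tuple[int, float]:
    """List a directory once and return (entry count, elapsed seconds)."""
    start = time.monotonic()
    entries = os.listdir(path)
    elapsed = time.monotonic() - start
    return len(entries), elapsed
```

Comparing the elapsed time for the cache directory against a directory on the root filesystem would give a first hint as to whether the backing volume itself is slow.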
fungiyeah, i'm waiting for it to finish deleting contents again19:06
clarkbfungi: anything I can help with or should I go eat a sandwich really quickly?19:10
fungigo eat, i'm going to reformat the logvol for it19:11
clarkbk19:11
fungithough i have a feeling it's something like terrible iscsi throughput19:11
fungiyeah, slow for sure, even the mkfs.ext4 on that lv is taking a while19:16
fungiand rebooting again for good measure19:16
fungiokay, starting openafs-client again19:22
fungistill not starting19:24
fungii'll move on to trying the reinstall and dkms rebuild19:24
fungionce it's done oopsing again anyway19:25
fungiyeah, /var/cache/openafs is still megaslow even after reformatting19:34
fungiforcing reinstall of openafs-modules-dkms now19:39
clarkbfungi: do we think the dkms rebuild will help if we suspect a slow volume? maybe we need to try provision a new volume and swap them around and see if that is happier?19:40
fungii'm doing the dkms rebuild on the chance that the slow volume is unrelated to the afsd startup problem19:40
clarkbah19:41
fungiokay, that's done, server's rebooted, new openafs lkm is installed, manually starting openafs-client again19:54
fungiit's... taking a while. may still be just as broken as before19:55
fungiyep19:55
fungikernel:[  178.804188] Internal error: Oops: 96000007 [#1] SMP19:56
clarkbcould be the cache volume then being slow?19:56
clarkbmaybe we try replacing it?19:56
fungiprobably19:56
clarkbotherwise we're probably at potential bug in openafs on this particular arm hardware that linaro doesn't hit for some reason19:58
fungiremoving logical volumes apache and openafs (100gb each) and vg main, along with the pv backing it20:13
fungimirror01.regionone.osuosl.opendev.org/main detached and deleted20:16
fungii've added a new cinder volume carved up into 2 lvm logical volumes of the same names as before and reformatted, rebooted to confirm they're automatically mounting and showing the correct available space, trying to start openafs-client again now20:22
clarkbfingers extra crossed20:24
fungitaking a while, i suspect it's no better than before20:24
clarkb:(20:24
fungiyeah, i think it must still be hosed20:25
fungikernel:[  208.637324] Internal error: Oops: 96000047 [#1] SMP20:25
clarkband still in the init cache function?20:27
clarkbit's part of the trace20:27
*** dviroel <dviroel!uid349012@id-349012.stonehaven.irccloud.com> has joined #opendev20:28
fungiafs_GetDownDSlot.constprop.0+0xa0/0x1b0 [openafs]20:28
fungithough a couple calls up it's coming from afs_InitCacheFile+0xb0/0x628 [openafs]20:28
clarkbya that is what it looked like before20:28
fungii need to switch to dinner prep20:32
clarkbI can get up a change to disable that region shortly20:32
opendevreviewClark Boylan proposed openstack/project-config master: Disable the osuosl arm64 cloud  https://review.opendev.org/c/openstack/project-config/+/79606220:43
clarkbfungi: ^ maybe after dinner you can approve that one? otherwise I'll try to remember to approve it in a bit20:43
fungidone, i had a moment while waiting for the skillet to heat up20:54
*** donnyd <donnyd!sid368272@id-368272.tooting.irccloud.com> has joined #opendev20:54
*** donnyd_ <donnyd_!~oftc-webi@static-108-44-198-34.clppva.fios.verizon.net> has joined #opendev20:55
opendevreviewMerged openstack/project-config master: Disable the osuosl arm64 cloud  https://review.opendev.org/c/openstack/project-config/+/79606221:03
*** donnyd <donnyd!sid368272@id-368272.tooting.irccloud.com> has quit IRC ()21:06
*** donnyd <donnyd!sid368272@id-368272.tooting.irccloud.com> has joined #opendev21:06
*** donnyd_ <donnyd_!~oftc-webi@static-108-44-198-34.clppva.fios.verizon.net> has quit IRC (Quit: Page closed)21:08
opendevreviewClark Boylan proposed opendev/system-config master: Fix some hostnames in afs docs  https://review.opendev.org/c/opendev/system-config/+/79606421:21
*** tosky <tosky!~luigi@dynamic-adsl-78-13-253-141.clienti.tiscali.it> has quit IRC ()22:13
opendevreviewClark Boylan proposed opendev/system-config master: Use tmpfiles.d to create /var/run/reprepro  https://review.opendev.org/c/opendev/system-config/+/79609322:38
clarkbI learned a thing that systemd can do today ^22:38
clarkbI think we probably want to monitor that just to ensure that a reboot does what we want and no unexpected cleanup happens22:41
mordredclarkb: that doesn't suck22:59
clarkbfungi: ianw: it turns out that ianw has run into this openafs oops before https://www.mail-archive.com/openafs-info@openafs.org/msg41186.html23:39
clarkbseems that 1.8.7 may fix it? maybe we build new package next week?23:39
clarkbor maybe ianw remembers how it was addressed the last time around23:40
clarkbreading the patch that is expected to fix it, it does seem like a race between the kernel module setting things up and afsd starting23:41
clarkbI wonder if we can disable afsd, reboot, then force the kernel module to initialize that /proc entry somehow then start afsd23:41
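The workaround floated above, waiting for the kernel module to finish its setup before afsd starts, could be sketched as a polling helper. This is only an illustration of the idea: the log does not name the /proc entry, so the path is left as a parameter, and the function name, timeout, and interval are assumptions. Starting afsd afterwards is left to the caller.

```python
import os
import time


def wait_for_path(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until path exists; return False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Whether gating afsd on such a check actually avoids the oops is untested here; the linked mailing list thread suggests the underlying fix is in a newer OpenAFS release.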
