clarkb | I've started trying to figure out how the inmotion cloud vip is working. I set up an nc -l 54321 < file on each of the three hosts, and the content of each file was the last octet of that host's actual IP | 00:15 |
---|---|---|
clarkb | then I requested port 54321 against the VIP from home, review-test, and a third host in ovh | 00:15 |
clarkb | each time I got back a consistent IP address. That makes me think that the VIP is doing a 1:1 mapping currently | 00:16 |
clarkb | and they aren't doing higher level proxying or load balancing | 00:16 |
clarkb | as a sanity check I don't see the vip directly on an interface on that host either | 00:17 |
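A minimal sketch of the probe described above, assuming an OpenBSD-style netcat (traditional netcat wants "-l -p 54321") and placeholder addresses:

```sh
# on each of the three backend hosts: serve that host's last IP octet once
echo "23" > /tmp/octet           # e.g. for a host whose IP ends in .23
nc -l 54321 < /tmp/octet

# from several different clients, connect through the VIP
nc 203.0.113.10 54321            # VIP address here is a placeholder

# every client getting the same octet back suggests a 1:1 address mapping
# rather than per-connection proxying or load balancing
```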
*** gmann_afk is now known as gmann | 00:19 | |
clarkb | ok cool this is a kolla managed vip | 00:19 |
clarkb | I see it in the kolla config for that host | 00:20 |
clarkb | I still have no idea how it is functionally working but that is a start | 00:20 |
clarkb | aha, apparently ifconfig isn't showing me all the addresses on the interface kolla is using but ip addr does | 00:25 |
clarkb | cool so now I can see that the VIP address is present on one of the three hosts | 00:25 |
clarkb | that confirms it is effectively 1:1 | 00:25 |
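Worth noting for anyone retracing this: ifconfig predates iproute2 and skips secondary addresses added with "ip addr add" (which is how keepalived/kolla attach VIPs), so only ip sees them. The interface name below is an assumption:

```sh
ip addr show dev eth0    # lists every address on the interface, VIP included
ip -br addr              # brief form, one line per interface
ifconfig eth0            # may silently omit the kolla-managed VIP
```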
clarkb | and then if I exec into the haproxy container and look at /etc/haproxy/services.d/ contents i see haproxy listening on the vip | 00:26 |
clarkb | reading the kolla docs I think we can set some config options (specifically kolla_enable_tls_external, and kolla_external_fqdn_cert) then rerun kolla and that should update the haproxy configs with a cert? | 00:28 |
clarkb | I'll ask them if rerunning kolla is something they expect people to do | 00:29 |
clarkb | the upside to having kolla do that for us is it can be sure to get all the necessary ports in haproxy | 00:33 |
clarkb | but we could put another proxy in front of the haproxy from kolla and do it ourselves | 00:33 |
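A hedged sketch of the kolla-ansible route; kolla_enable_tls_external and kolla_external_fqdn_cert are the options named above, while the cert path and inventory location here are illustrative guesses:

```sh
# in /etc/kolla/globals.yml (illustrative values):
#   kolla_enable_tls_external: "yes"
#   kolla_external_fqdn_cert: "/etc/kolla/certificates/haproxy.pem"

# then rerun kolla so it regenerates the haproxy config with TLS on all ports
kolla-ansible -i /etc/kolla/inventory reconfigure --tags haproxy
```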
openstackgerrit | Merged opendev/system-config master: refstack: move non-private variables to public https://review.opendev.org/c/opendev/system-config/+/774587 | 00:37 |
openstackgerrit | Merged opendev/system-config master: Setup OpenInfra-Board Channel https://review.opendev.org/c/opendev/system-config/+/774706 | 00:40 |
clarkb | I think I have a general idea of how to rerun kolla with an updated config. I expect that to take some time to simply execute and dinner is happening momentarily. I'll see if my questions to inmotion have been answered tomorrow and take it from there | 00:55 |
*** rchurch has quit IRC | 01:05 | |
*** mlavalle has quit IRC | 01:07 | |
*** rchurch has joined #opendev | 01:07 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup-server: run a weekly backup verification https://review.opendev.org/c/opendev/system-config/+/774753 | 01:27 |
openstackgerrit | Merged opendev/system-config master: refstack: add production image and deployment jobs https://review.opendev.org/c/opendev/system-config/+/774586 | 01:28 |
openstackgerrit | Merged opendev/system-config master: borg-backup-server: add script for pruning borg backups https://review.opendev.org/c/opendev/system-config/+/774561 | 01:28 |
openstackgerrit | Merged opendev/system-config master: borg-backup-server: volume space monitor https://review.opendev.org/c/opendev/system-config/+/774564 | 01:28 |
openstackgerrit | Merged opendev/system-config master: doc: update backup instructions https://review.opendev.org/c/opendev/system-config/+/774570 | 01:29 |
openstackgerrit | Merged opendev/system-config master: borg testing: catch stdout and stderr from test prune correctly https://review.opendev.org/c/opendev/system-config/+/774745 | 01:33 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: trigger image upload https://review.opendev.org/c/opendev/system-config/+/774756 | 02:13 |
*** artom has quit IRC | 02:13 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup-server: run a weekly backup verification https://review.opendev.org/c/opendev/system-config/+/774753 | 02:39 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-<db|file>-server: fix role name https://review.opendev.org/c/opendev/system-config/+/774761 | 02:50 |
openstackgerrit | Merged opendev/system-config master: borg-backup: save PIPESTATUS before referencing https://review.opendev.org/c/opendev/system-config/+/774588 | 03:01 |
*** rchurch has quit IRC | 03:14 | |
*** rchurch has joined #opendev | 03:15 | |
*** artom has joined #opendev | 03:19 | |
*** hemanth_n has joined #opendev | 03:25 | |
openstackgerrit | Merged opendev/system-config master: refstack: trigger image upload https://review.opendev.org/c/opendev/system-config/+/774756 | 03:30 |
*** diablo_rojo has quit IRC | 03:41 | |
*** dviroel has quit IRC | 04:07 | |
*** lamt has quit IRC | 04:25 | |
*** mrunge has quit IRC | 04:37 | |
*** dmellado has quit IRC | 04:37 | |
*** JohnnyRainbow has quit IRC | 04:37 | |
*** ykarel has joined #opendev | 04:38 | |
*** mrunge has joined #opendev | 04:42 | |
*** dmellado has joined #opendev | 04:42 | |
*** JohnnyRainbow has joined #opendev | 04:42 | |
*** Eighth_Doctor has quit IRC | 04:47 | |
*** ysandeep|away is now known as ysandeep|rover | 04:49 | |
*** mordred has quit IRC | 04:50 | |
*** whoami-rajat__ has joined #opendev | 04:57 | |
*** openstackstatus has quit IRC | 04:58 | |
*** openstack has joined #opendev | 04:59 | |
*** ChanServ sets mode: +o openstack | 04:59 | |
*** ysandeep|rover is now known as ysandeep|brb | 05:13 | |
*** mordred has joined #opendev | 05:19 | |
*** Eighth_Doctor has joined #opendev | 05:22 | |
ianw | clarkb / kopecmartin : i have run a mysqldump of the refstack db and imported it into https://refstack01.openstack.org | 05:26 |
ianw | clarkb / kopecmartin : to me, it looks like things are not working. | 05:30 |
ianw | SQL connection failed. 10 attempts left.: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)") | 05:32 |
ianw | that's in the container | 05:32 |
*** redrobot4 has joined #opendev | 05:34 | |
*** ysandeep|brb is now known as ysandeep|rover | 05:35 | |
*** redrobot has quit IRC | 05:37 | |
*** redrobot4 is now known as redrobot | 05:37 | |
*** ykarel has quit IRC | 05:55 | |
*** marios has joined #opendev | 05:55 | |
*** ykarel has joined #opendev | 06:12 | |
ianw | i think it probably needs something to wait for the mysql container to be alive | 06:14 |
ianw | but, i've hacked in something like that and it still doesn't work | 06:14 |
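The 'localhost' in that DBConnectionError is itself a clue: inside a container, localhost is the container's own loopback, so the app generally needs the compose service name instead. Some hedged checks (container and service names here are guesses):

```sh
docker ps --format '{{.Names}}\t{{.Status}}'      # is the mariadb container up at all?
docker logs refstack_mariadb_1 --tail 50          # any db init errors?
docker exec -it refstack_api_1 \
    mysql -h mariadb -u refstack -p -e 'SELECT 1' # reachable by service name?
```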
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: create database storage area https://review.opendev.org/c/opendev/system-config/+/774773 | 06:35 |
ianw | clarkb / kopecmartin : ^ that's a start i guess ... out of time for today | 06:35 |
*** levalicious has joined #opendev | 07:19 | |
*** eolivare has joined #opendev | 07:32 | |
*** rpittau|afk is now known as rpittau | 07:51 | |
*** ysandeep|rover is now known as ysandeep|lunch | 07:54 | |
*** hashar has joined #opendev | 07:54 | |
*** ralonsoh has joined #opendev | 07:55 | |
*** sboyron has joined #opendev | 08:02 | |
*** andrewbonney has joined #opendev | 08:21 | |
*** slaweq|away is now known as slaweq | 08:29 | |
*** zbr|pto is now known as zbr | 08:35 | |
*** ysandeep|lunch is now known as ysandeep|rover | 08:52 | |
*** jpena|off is now known as jpena | 08:57 | |
*** tosky has joined #opendev | 09:12 | |
*** DSpider has joined #opendev | 09:15 | |
*** ykarel is now known as ykarel|lunch | 09:34 | |
*** dtantsur|afk is now known as dtantsur | 10:38 | |
*** hashar is now known as hasharLunch | 10:45 | |
*** ykarel|lunch is now known as ykarel | 10:53 | |
*** dviroel has joined #opendev | 11:02 | |
*** hasharLunch has quit IRC | 11:19 | |
*** hasharLunch has joined #opendev | 11:42 | |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs https://review.opendev.org/c/zuul/zuul-jobs/+/774650 | 11:44 |
openstackgerrit | Oleksandr Kozachenko proposed openstack/project-config master: Add zuul-storage-proxy in zuul namespace https://review.opendev.org/c/openstack/project-config/+/772364 | 11:48 |
*** hemanth_n has quit IRC | 11:56 | |
*** cloudnull has quit IRC | 12:02 | |
*** cloudnull has joined #opendev | 12:05 | |
*** eolivare_ has joined #opendev | 12:23 | |
*** eolivare has quit IRC | 12:25 | |
*** hasharLunch is now known as hashar | 12:29 | |
*** ysandeep|rover is now known as ysandeep|call | 12:31 | |
*** jpena is now known as jpena|lunch | 12:36 | |
*** hashar is now known as hasharAway | 12:39 | |
*** eolivare_ has quit IRC | 12:46 | |
*** iurygregory has quit IRC | 12:51 | |
*** ysandeep|call is now known as ysandeep|rover | 13:16 | |
*** eolivare_ has joined #opendev | 13:23 | |
*** ykarel_ has joined #opendev | 13:24 | |
*** ykarel has quit IRC | 13:27 | |
*** jpena|lunch is now known as jpena | 13:33 | |
*** ykarel_ is now known as ykarel | 13:40 | |
ttx | Hi all, I'm working to move the openstackptg bot to #openinfra-events and was taking the opportunity to rename it to "openinfraptg". But to do that it looks like someone will have to manually log in to Nickserv with the openstackptg account and associate an additional nick to it. Someone with access to the ptgbot password in hiera... Also after that the ptgbot_nick entry will have to be changed in hiera. I'm | 13:52 |
ttx | a bit unclear on the process to follow to do hiera things, so any guidance would be appreciated. | 13:52 |
fungi | ttx: i can take care of it shortly, just need to wire up a separate irc client | 13:55 |
ttx | fungi: ok, no urgency at all | 13:55 |
openstackgerrit | Thierry Carrez proposed opendev/system-config master: PTGBot is now openinfraptg on #openinfra-events https://review.opendev.org/c/opendev/system-config/+/774862 | 13:56 |
*** cloudnull has quit IRC | 13:56 | |
*** cloudnull has joined #opendev | 13:59 | |
fungi | config-core: diablo_rojo is volunteering to help with irc channel management, and is working on some foundation channel moves to the #openinfra channel namespace, simple change to add her to our default channel operators list here: https://review.opendev.org/774555 | 14:00 |
*** iurygregory has joined #opendev | 14:13 | |
*** hasharAway has quit IRC | 14:15 | |
openstackgerrit | Merged openstack/project-config master: Add diablo_rojo to AccessBot Operators https://review.opendev.org/c/openstack/project-config/+/774555 | 14:18 |
*** hasharAway has joined #opendev | 14:46 | |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 14:49 |
*** hasharAway is now known as hashar | 14:54 | |
mordred | ttx: my eyes were reading your ptg bot change and misparsed the new bot name as "open in fraptg" and I was like "what's a fraptg?" I'm clearly not fully awake | 14:57 |
fungi | that's open in frap tg | 14:59 |
mordred | exactly | 15:00 |
mordred | see - I knew I needed more coffee | 15:00 |
fungi | it's all about the frappuccino in here | 15:01 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 15:08 |
*** ysandeep|rover is now known as ysandeep|dinner | 15:09 | |
*** fressi has quit IRC | 15:23 | |
*** ysandeep|dinner is now known as ysandeep|rover | 15:29 | |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul-jobs master: Upgrade ansible-lint to 5.0 https://review.opendev.org/c/zuul/zuul-jobs/+/773245 | 15:38 |
*** hashar is now known as hasharAway | 15:42 | |
*** ykarel is now known as ykarel|away | 15:54 | |
clarkb | ianw: kopecmartin: I'll take a look after breakfast | 15:57 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs https://review.opendev.org/c/zuul/zuul-jobs/+/774650 | 16:08 |
*** mlavalle has joined #opendev | 16:16 | |
*** mlavalle has quit IRC | 16:16 | |
*** mlavalle has joined #opendev | 16:17 | |
openstackgerrit | Oleksandr Kozachenko proposed openstack/project-config master: Add zuul-storage-proxy in zuul namespace https://review.opendev.org/c/openstack/project-config/+/772364 | 16:17 |
*** ykarel|away has quit IRC | 16:17 | |
fungi | ttx: i've grouped openinfraptg into the nickserv registration for the existing openstackptg account, looking at what we need to update in hiera next | 16:23 |
ttx | fungi: probably just the $ptgbot_nick | 16:24 |
ttx | hiera('ptgbot_nick', 'username') | 16:25 |
*** marios has quit IRC | 16:31 | |
fungi | #status log Grouped openinfraptg nick to existing openstackptg account in Freenode and updated ptgbot_nick in our private group_vars accordingly | 16:32 |
openstackstatus | fungi: finished logging | 16:32 |
*** ianw has quit IRC | 16:39 | |
*** ianw has joined #opendev | 16:39 | |
openstackgerrit | Merged openstack/project-config master: Add zuul-storage-proxy in zuul namespace https://review.opendev.org/c/openstack/project-config/+/772364 | 16:40 |
*** hasharAway has quit IRC | 16:47 | |
*** hasharAway has joined #opendev | 16:49 | |
*** ysandeep|rover is now known as ysandeep|away | 16:54 | |
clarkb | ianw: the refstack change lgtm. I did leave a couple of thoughts/questions though; would be great if you can check those before we merge it | 16:55 |
openstackgerrit | Jeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version https://review.opendev.org/c/opendev/puppet-pip/+/774900 | 16:58 |
fungi | infra-root: ^ more fallout from pip 21 | 16:58 |
clarkb | +2 | 17:00 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs https://review.opendev.org/c/zuul/zuul-jobs/+/774650 | 17:15 |
tobiash | fungi: is the only problem with pip 21 the drop of py 3.5 or should I expect more issues? | 17:25 |
openstackgerrit | Luigi Toscano proposed openstack/project-config master: test-release-openstack: use focal https://review.opendev.org/c/openstack/project-config/+/774906 | 17:26 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Update upload-logs-swift and upload-logs-gcs https://review.opendev.org/c/zuul/zuul-jobs/+/774650 | 17:35 |
*** d34dh0r53 has quit IRC | 17:43 | |
*** diablo_rojo has joined #opendev | 17:44 | |
*** d34dh0r53 has joined #opendev | 17:45 | |
diablo_rojo | fungi, since all those patches for the irc channel admin have landed, are we good to go on with the next set of commands then? | 17:47 |
diablo_rojo | And hypothetically I will be able to run them given I am now a part of that group? | 17:47 |
clarkb | diablo_rojo: you can check your perms with chanserv first to confirm too | 17:48 |
clarkb | diablo_rojo: /query chanserv access #your-channel list | 17:49 |
clarkb | tobiash: yes pretty much. They just made new pip require python>=3.6 | 17:50 |
clarkb | I don't think much else about it has changed | 17:50 |
clarkb | (previously they added the new dependency resolver, which was a major change, but that happened while 3.5 was still supported) | 17:50 |
tobiash | clarkb: thanks, so probably no problem for us :) | 17:50 |
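For anything still stuck on python 3.5, the fix sketched in the get-pip change above amounts to pinning; pypa publishes per-version bootstrap scripts for EOL interpreters (a sketch of the approach, not the exact change):

```sh
# fetch the last get-pip.py that still supports python 3.5 (pip 20.3.x)
curl -sSLO https://bootstrap.pypa.io/pip/3.5/get-pip.py
python3.5 get-pip.py

# or, when upgrading an existing install, cap the version instead
python3.5 -m pip install --upgrade 'pip<21'
```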
diablo_rojo | Looks like I am good to go clarkb! Thanks for the direction. | 17:52 |
*** hasharAway is now known as hashar | 18:00 | |
*** jpena is now known as jpena|off | 18:01 | |
stephenfin | Hit a weird bug | 18:06 |
stephenfin | I tried to edit a patch's commit message and clicked save | 18:06 |
stephenfin | waait, ignore that | 18:06 |
* stephenfin had to refresh to get the Publish Edit button to appear | 18:07 | |
stephenfin | and as confused by the "Go to latest patch set" bar that was appearing | 18:07 |
stephenfin | *Was | 18:07 |
clarkb | I think the go to latest patchset button may imply you are editing an older patchset | 18:08 |
clarkb | but ya the in browser editor has always been a bit weird | 18:08 |
clarkb | (I think its better now than it was on 2.13 though) | 18:08 |
*** Alex_Gaynor has joined #opendev | 18:10 | |
Alex_Gaynor | 👋 I'm seeing arm64 builds hanging out in queue'd status for an extended period (>1 hour) https://zuul.opendev.org/t/pyca/status/ I don't see anything obvious in grafana that explains this. | 18:10 |
clarkb | https://grafana.opendev.org/d/pwrNXt2Mk/nodepool-linaro?orgId=1 shows that we were recently using at or near capacity, but right now it does look idle | 18:11 |
Alex_Gaynor | And has 0 in building. I'd expect things to be building if I'm queued :-) | 18:12 |
clarkb | me too | 18:12 |
clarkb | there are some errored launch attempts there, I wonder if it is failing so early that we never trigger the building report to graphite /me goes to look at launcher logs | 18:13 |
Alex_Gaynor | The queue also appears to be very deep, though I obviously have no idea if that's related. | 18:15 |
clarkb | a deep queue can also cause jobs to wait before building just due to lack of available resources to process everything at once, but in this case it seems to be that it isn't using any of the available resources for some reason | 18:16 |
clarkb | nodepool just logged 504 Gateway Time-out: The server didn't respond in time. for linaro server deletions | 18:17 |
clarkb | I wonder if the api services just went away /me digs more | 18:17 |
Alex_Gaynor | if it can't handle deletions, seems like you might end up with a "phantom" pool of resources that exist, but are unusable, and also prevent spinning up new ones. | 18:18 |
clarkb | ya its also affecting the listing of resources and services on different ports. | 18:18 |
*** eolivare_ has quit IRC | 18:18 | |
mordred | that reminds me of the old HP Public Cloud bug | 18:18 |
clarkb | my preliminary analysis is that the cloud apis just went away | 18:19 |
clarkb | kevinz: ^ fyi if you happen to be awake | 18:19 |
clarkb | I'll try interacting with it manually to see if I can observe any other useful behaviors | 18:19 |
mordred | with the missing database index that caused both creates and deletes to timeout at the LB but continue running/blocking on the backend - and of course since the LB timed it out, client code would retry the operation just putting more in the backend queue ... | 18:20 |
clarkb | mordred: ya image lists and server shows work | 18:21 |
clarkb | if run manually so I'm suspecting the issues are more narrow (similar to what you describe) | 18:21 |
* clarkb tries to manually boot and delete a server | 18:22 | |
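The manual exercise, roughly, via the standard openstack CLI (cloud, flavor, image, and network names are placeholders):

```sh
export OS_CLOUD=linaro                      # assumed clouds.yaml entry
openstack image list                        # read APIs responded fine
openstack server show some-existing-node    # as did server show
openstack server create --flavor test --image focal \
    --network default clarkb-test           # this returned "No valid host was found"
openstack server delete clarkb-test
```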
*** rpittau is now known as rpittau|afk | 18:23 | |
clarkb | my test node failed with 'No valid host was found. There are not enough hosts available.' | 18:25 |
clarkb | however, if nodepool was hitting ^ I would've expected node failures to bubble up to zuul | 18:26 |
clarkb | this is an interesting situation | 18:26 |
mordred | yeah - why was nodepool getting gateway timeouts - no valid host is a real error | 18:30 |
mordred | unless no valid host is causing nodepool to retry loop and the loadbalancer is rate-limiting nodepool now but not you | 18:30 |
mordred | did you do that manual launch from a nodepool node? | 18:31 |
clarkb | no I did it from bridge | 18:31 |
clarkb | so that could be it, the proxy telling us to go away after tight looping due to failures | 18:31 |
mordred | yeah | 18:31 |
mordred | so could be a double failure sitch | 18:31 |
clarkb | ya grepping on not enough hosts I see a bunch of those errors in a small period of time, then it stops, which would be in line with your hypothesis | 18:32 |
clarkb | hrm except the gateway failures happened first and now its going through and finding no valid host | 18:35 |
clarkb | I expect that what is happening is something broke at a network level and caused the cloud to have a sad. It has since recovered enough to fail node boots with no valid host but not recovered enough to actually boot them | 18:36 |
clarkb | and now jobs are going to start getting node failures | 18:36 |
clarkb | but I'll keep poking and see if I can come up with a more concrete idea of what is going on | 18:36 |
clarkb | just caught it doing another round of attempts and then getting back no valid hosts | 18:40 |
clarkb | do we have a backoff on node relaunch attempts? | 18:40 |
clarkb | that may explain why we aren't seeing this in a tight loop | 18:41 |
clarkb | corvus: ^ | 18:41 |
*** diablo_rojo has quit IRC | 18:44 | |
*** dtantsur is now known as dtantsur|afk | 18:45 | |
fungi | tobiash: that's the main change i'm aware of in pip 21, it drops support for python <3.6 (including dropping 2.7 support) | 18:45 |
fungi | oh, i see clarkb also answered you | 18:46 |
fungi | diablo_rojo seems to have dropped again | 18:46 |
corvus | clarkb: i don't think there's an explicit backoff, just a complex system of loops and timeouts | 18:52 |
clarkb | hrm, it is definitely not progressing through the requests as quickly as I would expect if there is no backoff. I think this is a "good" thing in that it means we may end up with a fixed cloud before everything NODE_FAILUREs though | 18:53 |
*** klonn has joined #opendev | 18:56 | |
*** whoami-rajat__ has quit IRC | 18:57 | |
openstackgerrit | Merged openstack/project-config master: test-release-openstack: use focal https://review.opendev.org/c/openstack/project-config/+/774906 | 19:02 |
clarkb | fwiw still seeing bursts of no valid host found | 19:15 |
*** ralonsoh has quit IRC | 19:17 | |
*** rchurch has quit IRC | 19:17 | |
Alex_Gaynor | FWIW, I'm now seeing clear "node_failure" statuses, so progress? | 19:18 |
fungi | corvus: so the manage-projects run took 1.5 hours | 19:19 |
corvus | infra-root: i'm looking at a manage-projects log, and it output a lot of errors on gitea03 | 19:19 |
corvus | and 02 | 19:19 |
fungi | maybe we're getting slammed again | 19:20 |
*** rchurch has joined #opendev | 19:20 | |
fungi | checking graphs | 19:20 |
clarkb | Alex_Gaynor: yes I think what is happening is the no valid host errors we are seeing more recently are going to start bubbling up as NODE_FAILURES | 19:20 |
tobiash | clarkb: I'm wondering if we should treat no valid host found errors in nodepool like non-fatal quota issues | 19:20 |
corvus | and 05... let's just say several giteas for now since it's hard to read these logs | 19:20 |
fungi | corvus: oh yeah, massive swap thrash and eventual oom in that timeframe | 19:20 |
fungi | so yay, our mystery load generator has returned? and now we have improved logging to investigate with | 19:21 |
clarkb | tobiash: in this case there is only one cloud provider for these node types so that would just cause all jobs to sit and wait until the cloud fixed itself | 19:21 |
clarkb | fungi: fwiw I think its been about exactly one week since last time | 19:21 |
fungi | neat | 19:21 |
clarkb | fungi: a fun cron maybe? | 19:21 |
fungi | last time our suspect was rdo's ci servers, right? | 19:21 |
clarkb | but ya the improved logs will hopefully allow us to identify the source | 19:21 |
tobiash | clarkb: we were getting no valid host found mostly when the cloud was short on resources due to potentially too-high overprovisioning | 19:22 |
clarkb | fungi: yes, they had an order of magnitude more requests on some of the servers (and theory was they tripped that one over which caused a chain reaction as haproxy rebalanced the pool) | 19:22 |
corvus | fungi: gitea03 starting at 2021-02-10T17:05:30.156912 | 19:22 |
clarkb | Alex_Gaynor: unfortunately I don't think there is much more we can do without the cloud intervening. | 19:22 |
Alex_Gaynor | 😢 | 19:22 |
clarkb | kevinz: when your day starts can you sync up with us and see if we can help with further debugging? | 19:22 |
clarkb | corvus: fungi: the rough debugging process with the improved logs is: look at apache2 access logs on affected hosts during the time frame and note the source port for large or out-of-place requests. Then go to the haproxy server syslog and grep for that port and gitea backend | 19:23 |
fungi | clarkb: also sometimes it's helped to e-mail kevinz since he may see that sooner than irc | 19:24 |
clarkb | because there are only 65k possible ports you also typically have to match timestamp ranges too | 19:24 |
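Sketching that correlation, with log paths, the 17z window, and the example port all as assumptions (source port plus a tight timestamp range is the join key, since ports recycle):

```sh
# on the suspect gitea backend: unusual requests in the window, noting client port
grep '10/Feb/2021:17:' /var/log/apache2/gitea-ssl-access.log | less

# on the load balancer: find which real client haproxy mapped to that port
grep ':45678' /var/log/syslog | grep gitea03    # 45678 is the port from step one
```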
clarkb | fungi: good idea. I'll write an email then see if I can help with gitea | 19:24 |
corvus | tobiash, clarkb: i agree, that's usually what that error means. i think i'd be in favor of treating it as a non-fatal error; though since it's not actually reflected in quota, i don't think we'll be able to handle it intelligently. i think we ought to decline the request if we are not the last possible launcher. | 19:24 |
fungi | yeah, i'm doing the thing with gitea02, but someone independently doing the same for another impacted backend would help correlation | 19:24 |
corvus | fungi: i think i need to leave the gitea debugging to you, sorry | 19:25 |
fungi | corvus: no worries, thanks for spotting it! | 19:26 |
fungi | i'll be semi-focused on this for the next little while, but also need to do some cooking shortly | 19:27 |
*** hashar has quit IRC | 19:33 | |
clarkb | ok email sent | 19:34 |
clarkb | tried to accurately describe the transition from 504 gateway errors to no valid host found, with timestamps | 19:35 |
fungi | interestingly, the greatest number of connections i see to gitea02 during the 17z hour was from codesearch.o.o | 19:36 |
*** andrewbonney has quit IRC | 19:39 | |
clarkb | is it possible that creating new projects is doing it? | 19:41 |
clarkb | 16:40:05 openstackgerrit | Merged openstack/project-config master: Add zuul-storage-proxy in zuul namespace https://review.opendev.org/c/openstack/project-config/+/772364 | 19:41 |
clarkb | or is that just getting caught in the fallout? Our gitea testing does actually create all the projects in our project list and it does that successfully | 19:42 |
fungi | yeah, it's mostly come to our attention when new project creation fails, but we create new projects at other times without issue | 19:43 |
fungi | i don't think it's codesearch, because it's also far and away the largest source of connections to gitea02 at other times where this wasn't going on | 19:44 |
clarkb | automated email response has reminded me that it is Chinese New Year | 19:46 |
fungi | d'oh! | 19:47 |
fungi | that doesn't bode well | 19:47 |
*** slaweq has quit IRC | 19:49 | |
*** zimmerry has quit IRC | 20:11 | |
*** zimmerry has joined #opendev | 20:13 | |
*** sboyron_ has joined #opendev | 20:55 | |
*** sboyron has quit IRC | 20:58 | |
*** sboyron_ has quit IRC | 21:09 | |
ianw | clarkb: how strongly do you feel about the /var/refstack v /var/lib/refstack? enough to respin? | 21:29 |
fungi | even with /var/refstack being non-fhs-compliant, i wouldn't want anyone to redo work | 21:30 |
clarkb | ianw: not super. I tend to always look at the docker compose file and work back from there anyway (and that has the /var/lib/refstack pointers) | 21:31 |
ianw | fungi: well the change is moving everything to /var/lib/refstack so i guess we're good from that pov | 21:31 |
clarkb | just calling it out as a difference to gitea if others care more strongly | 21:31 |
ianw | ok i might just go with it, and see if having a persistent db makes things work. i'm not sure though, i did a "mysqldump <trove-details> | mysql" to try and populate it and it didn't seem to work, but i don't know | 21:32 |
ianw | i have to just do school run but can help with gitea things if i can be useful | 21:33 |
fungi | in what way did it not work? i think i've used mysql -e for such things in the past | 21:34 |
fungi | or source the path to the dumpfile in the interactive mysqlclient prompt | 21:34 |
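The import being attempted, sketched; trove host, credentials, and container name are all assumptions:

```sh
# pipe a dump of the old trove db straight into the containerized mariadb
mysqldump -h <trove-host> -u refstack -p"$TROVE_PW" refstack \
    | docker exec -i refstack_mariadb_1 mysql -u root -p"$ROOT_PW" refstack

# or stage a dumpfile and source it, per the suggestion above
mysql -h 127.0.0.1 -u root -p refstack -e 'source /tmp/refstack.sql'
```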
clarkb | re gitea I'm beginning to wonder if it could be our own project description updates that does it | 21:34 |
clarkb | possible that when there is background load on gitea, doing the management stuff all at once like that can make things sad | 21:35 |
clarkb | however, not 100% sure of that yet | 21:35 |
fungi | yeah, i need to find a minute to try and match up the manage-projects ansible log to see if i can tell when it started hitting different backends with when the memory on each of them started to skyrocket | 21:37 |
fungi | ugh, perhaps unsurprisingly, we have bitrot in our puppet-pip module testing | 21:41 |
fungi | looks like it could be a problem with beaker-hiera | 21:42 |
fungi | reading a bit, we may need to pin beaker-hiera<0.2 in our spec helper repo | 21:44 |
clarkb | that seems to be the classic case of bit rot in the puppet space for us | 21:44 |
openstackgerrit | Jeremy Stanley proposed opendev/puppet-openstack_infra_spec_helper master: Pin beaker-hiera<0.2.0 https://review.opendev.org/c/opendev/puppet-openstack_infra_spec_helper/+/775030 | 21:49 |
openstackgerrit | Jeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version https://review.opendev.org/c/opendev/puppet-pip/+/774900 | 21:50 |
clarkb | fungi: I seem to recall that Depends-On won't work for that for some reason? we may just have to land the infra spec helper change (which I am reviewing) | 21:51 |
fungi | yeah, maybe | 21:52 |
openstackgerrit | Merged opendev/system-config master: refstack: create database storage area https://review.opendev.org/c/opendev/system-config/+/774773 | 21:54 |
clarkb | ianw: fwiw I would've expected the mariadb to work without the persistent mount but if docker-compose down then up -d was run you'd lose the db | 21:55 |
clarkb | it's definitely a good and correct improvement | 21:56 |
clarkb | fungi: just thinking out loud here about the gitea thing. Maybe we can measure it in our test job and see if that exhibits high load during the project description update pass? | 22:12 |
clarkb | I don't know how easy our existing test tooling makes that though | 22:12 |
fungi | wonder if we could add roles to start dstat and collect its record? | 22:16 |
ianw | devstack already has a background service that does similar | 22:17 |
fungi | yeah, just didn't know if devstack's implementation was easily reused or tightly coupled to devstack's overall design | 22:19 |
ianw | probably jumping on a running host and copying the .service file would be enough | 22:21 |
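If this turns into a role, the core is just a backgrounded dstat writing CSV for the life of the job (flags and output path here are illustrative):

```sh
# sample time/cpu/mem/net/disk/io every 5 seconds, csv for graphing later
nohup dstat -tcmndr --output /var/log/dstat-gitea.csv 5 >/dev/null 2>&1 &
```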
*** diablo_rojo has joined #opendev | 22:21 | |
diablo_rojo | fungi, been trying to run the renaming commands in the docs, but chanserv says I am not authorized? | 22:23 |
clarkb | diablo_rojo: you may have to explicitly op yourself in the channel first | 22:23 |
clarkb | the ability to op and actually being op are separate | 22:23 |
fungi | diablo_rojo: /msg chanserv op #openstack-board | 22:24 |
fungi | i think that's the syntax | 22:24 |
fungi | and then the same but deop instead of op when you're finished | 22:24 |
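Putting those steps together with the forward being set later in this log, the rename dance looks roughly like this (channel names are from this migration; MLOCK syntax per freenode's Atheme ChanServ, and as it turns out below, setting the forward needs the +s flag):

```
/msg ChanServ OP #openstack-ptg
/msg ChanServ SET #openstack-ptg MLOCK +ntcrf #openinfra-events
/msg ChanServ DEOP #openstack-ptg
```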
clarkb | also this could be a behavior change between gitea 1.13 and 1.14? | 22:28 |
clarkb | assuming the project description updates are related. | 22:28 |
clarkb | Just thinking out loud here: we could also remove the description updates for now and monitor the next project creation | 22:29 |
diablo_rojo | I did actually explicitly op myself in the channel first. | 22:33 |
diablo_rojo | But let me try again | 22:33 |
diablo_rojo | Yeah. It still says I am not authorized.. | 22:34 |
clarkb | do you need it on both sides maybe? | 22:35 |
diablo_rojo | In both #openinfra-events and #openstack-ptg I have op | 22:35 |
diablo_rojo | but I can't set the guard on #openstack-ptg | 22:35 |
clarkb | hrm | 22:36 |
diablo_rojo | Right? | 22:36 |
fungi | i'll need to re-check the freenode mode reference | 22:38 |
fungi | most of us are +Aeforstv but diablo_rojo is only +Aefortv | 22:38 |
diablo_rojo | Weird. | 22:38 |
fungi | founder perms are +AFRefiorstv | 22:39 |
fungi | https://freenode.net/kb/answer/channelmodes | 22:39 |
fungi | ahh, nope, i wanted perms | 22:40 |
*** levalicious has quit IRC | 22:40 | |
diablo_rojo | I would guess since I am missing the 's' that's why I can't 'SET' things. | 22:40 |
fungi | oh, though that says "An operator can use MLOCK with +f only if they have access flag +s in both channels, or if the channel to be forwarded to is +F and they have +s in the original channel." | 22:43 |
fungi | so, yep | 22:43 |
fungi | that looks likely | 22:43 |
fungi | i wonder why we don't normally grant +s to our operators list? | 22:43 |
fungi | diablo_rojo: i've added +s to your perms for #openinfra-events and #openstack-ptg, see if that worked? | 22:45 |
fungi | if so i can add you to the others you're working on while we figure out why we're not setting +s on everyone in the access list | 22:45 |
diablo_rojo | fungi, no dice. | 22:47 |
fungi | +s is "Enables use of the set command." according to `/msg chanserv help FLAGS` | 22:47 |
fungi | diablo_rojo: my fault, syntax error. should actually be added now | 22:49 |
fungi | i didn't spot the error response on my first try. probably doing too many things at the same time | 22:49 |
*** klonn has quit IRC | 22:51 | |
fungi | ianw: "This message is to inform you that the host your cloud server, ianw-klog-collector, resides on alerted our monitoring systems at 2021-02-09T12:39:27.515190." | 22:59 |
fungi | i can close that ticket out, just wanted to make sure you knew | 23:00 |
ianw | fungi: oh, we can delete that. that was the server i was using to collect logs from the linaro hosts that kept disappearing | 23:01 |
fungi | ianw: also are the "backup inconsistency" e-mails a test, or false negative? | 23:01 |
openstackgerrit | Merged opendev/puppet-openstack_infra_spec_helper master: Pin beaker-hiera<0.2.0 https://review.opendev.org/c/opendev/puppet-openstack_infra_spec_helper/+/775030 | 23:01 |
clarkb | ianw: re linaro, do you know if there is someone else we should email about the no valid host found errors there? kevinz's autoresponder returned that it is Chinese New Year | 23:01 |
fungi | "Inconsistency found in backup /opt/backups/borg-gitea01/backup on backup01 at Wed Feb 10 01:13:47 UTC 2021" | 23:02 |
fungi | et cetera | 23:02 |
ianw | sorry just pulling up my mail client | 23:02 |
clarkb | fungi: I've yet to receive that one it seems | 23:02 |
fungi | clarkb: they went to the root inbox | 23:02 |
ianw | clarkb: yeah, i don't have another contact unfortunately ... not sure what else to do :( | 23:02 |
diablo_rojo | fungi, good to go now. | 23:02 |
diablo_rojo | I can set stuff. | 23:03 |
ianw | huh, those backup inconsistency ones i would not expect | 23:03 |
fungi | clarkb: ianw: could hrw know who to contact? | 23:03 |
diablo_rojo | If you want to give me that perm on the other channels I can move forward. | 23:03 |
fungi | diablo_rojo: awesome, yeah doing that now, just a sec | 23:03 |
diablo_rojo | fungi, thank you! | 23:03 |
clarkb | fungi: ya not finding it (I did a global search too to rule out it being filed into an unexpected dir) | 23:04 |
ianw | fungi: maybe, he is usually in here but from #linaro we might have just missed him. worth a try | 23:05 |
fungi | clarkb: no, i mean our shared root inbox | 23:05 |
clarkb | oh I see | 23:05 |
fungi | diablo_rojo: i think i got them all | 23:06 |
ianw | oh, hrm we didn't actually approve the backup verification yet @ https://review.opendev.org/c/opendev/system-config/+/774753 | 23:06 |
fungi | clarkb: also inmotion has been sending setup complete notifications to that address, do you need those or can i file them into a subfolder? | 23:06 |
ianw | fungi: ok, they are definitely false positives from when i was testing it and pressed ctrl-c, killing the verification process that the script then warned about | 23:07 |
diablo_rojo | fungi, sweet! Thank you :) | 23:07 |
fungi | cool | 23:08 |
fungi | diablo_rojo: you're welcome, lmk if you run into more problems | 23:08 |
clarkb | fungi: they can be filed away | 23:09 |
diablo_rojo | fungi, will do! | 23:11 |
fungi | clarkb: thanks, done | 23:11 |
fungi | working on closing out the rackspace ticket about ianw-klog-collector as well | 23:12 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Build Gerrit 3.3 images https://review.opendev.org/c/opendev/system-config/+/765021 | 23:12 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Run gerrit 3.2 and 3.3 functional tests https://review.opendev.org/c/opendev/system-config/+/773807 | 23:12 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Cleanup refstack job dependencies https://review.opendev.org/c/opendev/system-config/+/775041 | 23:12 |
ianw | fungi: if you're in the control panel can you just delete it? | 23:12 |
ianw | (the server, otherwise i'll do it later) | 23:12 |
fungi | ianw: oh, yep happy to do that too | 23:12 |
clarkb | ianw: ^ I stuck that refstack cleanup behind my gerrit 3.3 jobs addition because there were merge conflicts | 23:12 |
diablo_rojo | fungi, missing #openstack-summit | 23:13 |
clarkb | I don't think either is urgent but wanted to point that out as I noticed it when fixing the conflicts | 23:13 |
ianw | clarkb: oh, sorry, i owed a review on the gerrit 3.3 things, looking | 23:13 |
fungi | diablo_rojo: oh, hah, i'm not allowed to do that | 23:13 |
diablo_rojo | Ha ha ha | 23:14 |
diablo_rojo | Alright, will circle back to that one then. | 23:14 |
fungi | diablo_rojo: apparently not actually an official channel; the only access is for the founder "spy" who created that channel >10 years ago | 23:14 |
fungi | >10.5 years ago in fact | 23:15 |
fungi | "modified 10y 31w 3d ago" | 23:16 |
fungi | corvus: mordred: does the irc nick "spy" ring a 10.5-year-old bell for you? | 23:16 |
clarkb | that must've been the first summit | 23:16 |
fungi | indeed | 23:16 |
clarkb | (if I've done math right) | 23:16 |
fungi | jbryce: ^ you might remember too, i suppose? | 23:18 |
corvus | yes spy was an OG | 23:20 |
fungi | ianw: i've closed out the ticket and deleted the instance now | 23:20 |
corvus | fungi: need me to ask freenode for it? | 23:20 |
ianw | thankyou! | 23:20 |
fungi | corvus: if you have a moment, that would be much appreciated! | 23:20 |
fungi | at this point we're just trying to set a forward on it anyway | 23:21 |
fungi | that reminds me i still need to start the org application for the #openinfra channel namespace, i found freenode's documentation on the process at least | 23:24 |
corvus | fungi: better sooner than later | 23:26 |
fungi | yup | 23:26 |
fungi | we're only just starting to forward to those, and i didn't want to jump the gun asking for that namespace until we'd given the former occupants of the base channel some time | 23:27 |
corvus | fungi: done; Flags +AFRefiorstv were set on openstackinfra in #openstack-summit. | 23:28 |
fungi | thanks corvus! | 23:28 |
fungi | diablo_rojo: as soon as our next accessbot run fires, we should be all set | 23:29 |
mordred | fungi: wow. spy is old | 23:29 |
diablo_rojo | seems I can't set the MLOCK for the openstack-foundation to openinfra redirect | 23:30 |
ianw | clarkb: did you ever look at making gitea pause until the db container was active? you used to be able to set a "healthcheck" on the mariadb instance and make other containers wait on that with a condition, but for some reason they removed that apparently | 23:31 |
corvus | diablo_rojo: there's an existing forward for the unregistered channel; maybe that needs to be removed first? | 23:32 |
corvus | info #openstack-foundation | 23:33 |
corvus | derp | 23:33 |
diablo_rojo | corvus, ohh that makes sense. Oh nailed it. | 23:33 |
diablo_rojo | That's already in place then | 23:33 |
clarkb | ianw: I think docker-compose doesn't have that ability to wait. You have to do it within the container with like an init script | 23:33 |
clarkb | ianw: I want to say once I couldn't do it with docker-compose I gave up because I didn't want to have a super complicated container image | 23:34 |
clarkb | (but maybe complicated container image is a good idea?) | 23:34 |
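The in-container approach amounts to a tiny entrypoint wrapper; a sketch, assuming the db answers at service name "mariadb" and the image ships mysqladmin:

```sh
#!/bin/sh
# wait-for-db.sh: block until mariadb responds, then exec the real command
until mysqladmin ping -h mariadb --silent; do
    echo "waiting for mariadb..." >&2
    sleep 2
done
exec "$@"
```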
corvus | diablo_rojo: the current status is: Mode lock : +ntcrf #openstack-unregistered | 23:34 |
diablo_rojo | oh so not redirected to the right place | 23:35 |
*** DSpider has quit IRC | 23:35 | |
ianw | clarkb: i got it working with http://paste.openstack.org/show/lCL5sfUhtkXLtvmHwMmV/ using version 2.1, but then i read that apparently that was considered too useful and so they removed it in version 3 :/ | 23:36 |
ianw | i don't actually know if it matters; i'm assuming refstack retries until it connects anyway | 23:36 |
ianw | we didn't deploy because of a typo | 23:36 |
clarkb | gitea retries | 23:36 |
clarkb | ianw: what was the typo? | 23:37 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: fix typo in role matcher https://review.opendev.org/c/opendev/system-config/+/775044 | 23:37 |
ianw | clarkb: ^ :) | 23:37 |
fungi | so looking at our accessbot config, we say to set +Aeforstv on everyone in the operators list, so i'm not sure why it added diablo_rojo to them without +s | 23:38 |
fungi | actually there are channels it didn't add her to at all, i'll check the accessbot output | 23:39 |
clarkb | ianw: doh | 23:41 |
ianw | i'll just do a manual run to get the new files on | 23:41 |
fungi | 2021-02-10 08:05:41,556 [INFO] setaccess - access #openinfra-board add diablo_rojo -FRis | 23:45 |
ianw | ok, i've started the refstack mariadb container and /var/lib/refstack/db/ is populated. i'm going to run the mysqldump import from the old trove | 23:45 |
fungi | now to figure out where/why accessbot is setting -FRis | 23:46 |
ianw | well https://refstack01.openstack.org/#/community_results still seems to not be happy | 23:48 |
clarkb | fungi: operators don't have FRi (but do have s) | 23:49 |
fungi | yeah, that's what i find weird | 23:49 |
clarkb | -FRi seems like what I would've expected | 23:49 |
fungi | but also she wouldn't have had FRi anyway | 23:49 |
clarkb | as a santiy check other operators do have +s (but no FRi) | 23:51 |
fungi | and it seems like accessbot isn't processing the whole list either, though it's not immediately apparent to me from the log why that is | 23:51 |
*** CeeMac has quit IRC | 23:51 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: capture container logs to disk https://review.opendev.org/c/opendev/system-config/+/775046 | 23:52 |
fungi | 2021-02-10 14:33:57,877 [DEBUG] irc.client - TO SERVER: QUIT :Connection reset by peer | 23:52 |
fungi | oh maybe we're getting disconnected | 23:53 |
clarkb | is it being rate limited? | 23:53 |
fungi | yeah, could be something like that, though the server doesn't seem to explain | 23:54 |
ianw | "Blocked loading mixed active content โhttp://refstack01.openstack.org:8000/v1/results?page=1โ | 23:55 |
fungi | that's being logged by the apache layer? | 23:55 |
clarkb | ianw: fwiw it gives me json back | 23:56 |
clarkb | but I have to switch it to port 443 | 23:56 |
ianw | yeah, i think the errors might be on the front end and the db is ok, it's just all confused between https/http and its hostname... | 23:57 |
clarkb | ianw: I think the port 8000 stuff isn't meant to be publicly exposed fwiw | 23:57 |
clarkb | but I guess if that is apache complaining then you aren't hitting that | 23:57 |
fungi | looks like tools/apply-test.sh needs some help to deal with latest cryptography now | 23:58 |
clarkb | fungi: that uses ansible to run puppet, and probably uncapped cryptography with old pip being the problem there? | 23:58 |
clarkb | fungi: can probably upgrade pip first or cap cryptography in the ansible install | 23:58 |
fungi | yeah, it's happening when we pip install ansible | 23:59 |
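Either of those is a small patch to the script; a sketch of both options, noting that cryptography 3.4 (current at the time) only ships wheels newer pip can select, and needs Rust to build from source:

```sh
# option 1: modernize pip first so it can use cryptography's newer wheels
pip install --upgrade pip setuptools wheel
pip install ansible

# option 2: cap cryptography at the last pre-Rust release
pip install 'cryptography<3.4' ansible
```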