Thursday, 2021-02-11

openstackgerrit	Merged opendev/system-config master: openafs-<db|file>-server: fix role name https://review.opendev.org/c/opendev/system-config/+/774761	00:00
openstackgerrit	Jeremy Stanley proposed opendev/system-config master: Upgrade pip in our apply tests https://review.opendev.org/c/opendev/system-config/+/775048	00:05
fungi	who knew a simple change to switch the irc channel for ptgbot would turn into an all-day rabbit hole of test bitrot? and i'm still not sure i've hit bedrock yet	00:06
* ianw actually probably did know :)		00:12
ianw	it's always like that	00:12
clarkb	looking at devstack dstat stuff not sure its super reconsumable directly, but this gives good clues as to how to redo it in a bit of ansible	00:15
clarkb	I'm giving that a go	00:15
diablo_rojo	fungi, stepping away for a little bit, but will get the etherpad updated this evening. I got the events stuff all done (aside from #openstack-summit). And the diversity channel should all be done as well. Will get the board stuff done and then circle back to the foundation channel.	00:15
fungi	diablo_rojo: awesome!	00:15
fungi	i'll still be around if you need any help with it	00:16
diablo_rojo	Sounds good :)	00:16
*** tosky has quit IRC		00:17
ianw	ok, i changed a config variable and https://refstack01.openstack.org/#/community_results seems to be working	00:21
openstackgerrit	Clark Boylan proposed opendev/system-config master: Use dstat to record performance of gitea management https://review.opendev.org/c/opendev/system-config/+/775051	00:33
clarkb	ianw: ^ something like that what you had in mind?	00:33
clarkb	I didn't actually test that unit locally though I should've I guess	00:34
ianw	looks about right, should be self-testing	00:36
fungi	and, shockingly, the tools/prep-apply.sh fix is failing system-config-legacy-logstash-filters	00:36
* fungi climbs in even deeper		00:36
clarkb	fungi: Those filters have not changed recently. We can probably turn off that job or make it non voting	00:37
clarkb	then if they need to change fix the testing then	00:37
fungi	we'll see how simple it is to unwind	00:37
clarkb	ianw: minor nit on https://review.opendev.org/c/opendev/system-config/+/774753 but testing seems to show it isn't a problem so I +2'd	00:40
fungi	actually, looks like upgrading pip wasn't actually the problem, but the patch to upgrade it is self-testing the same error with cryptography	00:43
fungi	https://zuul.opendev.org/t/openstack/build/7edfdec54dc0405783facfdc76c9e4e5	00:43
openstackgerrit	Ian Wienand proposed opendev/system-config master: borg-backup-server: run a weekly backup verification https://review.opendev.org/c/opendev/system-config/+/774753	00:43
fungi	i'm not able to make heads or tails of the error there, i got fixated earlier on the other exception hit while raising the actual one	00:44
clarkb	fungi: I'll take a look once I rereview ^	00:44
openstackgerrit	Ian Wienand proposed opendev/system-config master: refstack: use external https for API https://review.opendev.org/c/opendev/system-config/+/775053	00:46
fungi	ooh, now i'm thinking it's actually the setuptools pin causing the problem, and we're trying to use a version of pip which is too new to work with it	00:46
clarkb	fungi: ya I think we can drop that pin since they fixed things?	00:46
clarkb	I feel like I had a change for that somewhere too	00:47
clarkb	let me see	00:47
fungi	https://review.opendev.org/749766	00:47
fungi	i just rechecked it	00:47
ianw	fungi: would this all be better not running on xenial host?	00:47
clarkb	ianw: it would be but the puppets only run on xenial now	00:48
fungi	ianw: maybe, though right now there's still puppeted stuff on eavesdrop	00:48
ianw	ahh, yeah	00:48
clarkb	fungi: note that that change may need to be squashed into yours? I guess we'll find out	00:48
fungi	clarkb: yeah, that's why the rechecking	00:49
fungi	i think we don't actually need mine	00:49
fungi	the error i was trying to fix by upgrading pip actually stems from too new pip for the setuptools we pinned, if i'm reading correctly	00:49
fungi	the script seemed to have already installed the latest pip available for python 3.5 anyway	00:50
clarkb	got it	01:01
*** dmellado has quit IRC		01:03
fungi	though also it wants to build cryptography from sdist, so once we clear this hurdle i expect more	01:03
clarkb	ya that is why I assume it isn't respecting the python requires metadata which implies old pip	01:04
clarkb	since its 3.6 only now too right?	01:04
* clarkb checks		01:04
fungi	well, it seems to be using pip 20	01:04
fungi	it's pulling a slightly older cryptography i think	01:04
clarkb	cryptography 3.2.1 is what it should pull for python3.5	01:05
clarkb	3.4.4 is latest	01:05
fungi	pip 20.3.4 and cryptography 3.4.4 yeah	01:06
clarkb	and the metadata appears to be in pypi for that	01:06
fungi	yep, "Requires: Python >=3.6" at https://pypi.org/project/cryptography/ too	01:07
*** mlavalle has quit IRC		01:10
*** dmellado has joined #opendev		01:12
openstackgerrit	Jeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version https://review.opendev.org/c/opendev/puppet-pip/+/774900	01:18
openstackgerrit	Merged opendev/system-config master: refstack: fix typo in role matcher https://review.opendev.org/c/opendev/system-config/+/775044	01:31
openstackgerrit	Merged opendev/system-config master: refstack: capture container logs to disk https://review.opendev.org/c/opendev/system-config/+/775046	01:33
openstackgerrit	Merged opendev/system-config master: Revert "Install older setuptools in puppet apply jobs" https://review.opendev.org/c/opendev/system-config/+/749766	01:34
fungi	okay, can anyone else spot the error on this failure? https://zuul.opendev.org/t/openstack/build/100ea04cd6194669a84c8f16c5774e91	01:53
fungi	oh, maybe it's the two warnings about "class included by absolute name"	01:54
openstackgerrit	Jeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version https://review.opendev.org/c/opendev/puppet-pip/+/774900	01:56
ianw	yeah i think so https://zuul.opendev.org/t/openstack/build/100ea04cd6194669a84c8f16c5774e91/log/job-output.txt#1244	01:58
ianw	clarkb / Alex_Gaynor : kevinz reports the cloud is back up, some sort of rabbitmq issue	02:02
kevinz	ianw: yes, rabbitmq partition	02:02
fungi	thanks kevinz! sorry to bother you during holidays	02:03
kevinz	fungi: Np, I do not go anywhere, so just play with the cloud for fun :-)	02:04
openstackgerrit	Ian Wienand proposed opendev/system-config master: Refactor AFS groups https://review.opendev.org/c/opendev/system-config/+/775057	02:25
openstackgerrit	Ian Wienand proposed opendev/system-config master: Use dstat to record performance of gitea management https://review.opendev.org/c/opendev/system-config/+/775051	02:30
openstackgerrit	Ian Wienand proposed opendev/system-config master: Refactor AFS groups https://review.opendev.org/c/opendev/system-config/+/775057	02:35
*** dmellado has quit IRC		02:36
openstackgerrit	Merged opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version https://review.opendev.org/c/opendev/puppet-pip/+/774900	02:49
fungi	nearly there	02:50
openstackgerrit	Ian Wienand proposed opendev/system-config master: refstack: add backup https://review.opendev.org/c/opendev/system-config/+/775061	02:57
*** dmellado has joined #opendev		03:02
openstackgerrit	Merged opendev/system-config master: PTGBot is now openinfraptg on #openinfra-events https://review.opendev.org/c/opendev/system-config/+/774862	03:44
*** dviroel has quit IRC		04:28
*** ykarel|away has joined #opendev		04:40
*** ykarel|away is now known as ykarel		04:41
*** ysandeep|away is now known as ysandeep|rover		05:09
openstackgerrit	Ian Wienand proposed opendev/system-config master: borg-backup: fix backup script failure match https://review.opendev.org/c/opendev/system-config/+/775068	05:40
openstackgerrit	Merged opendev/system-config master: borg-backup-server: run a weekly backup verification https://review.opendev.org/c/opendev/system-config/+/774753	05:53
*** whoami-rajat__ has joined #opendev		06:03
*** marios has joined #opendev		06:05
*** ralonsoh has joined #opendev		06:49
*** diablo_rojo has quit IRC		06:55
*** CeeMac has joined #opendev		07:07
*** sboyron_ has joined #opendev		07:10
*** eolivare has joined #opendev		07:39
*** slaweq has joined #opendev		07:44
*** ykarel has quit IRC		07:44
*** ykarel has joined #opendev		07:46
*** hashar has joined #opendev		07:52
*** rpittau|afk is now known as rpittau		07:53
*** ysandeep|rover is now known as ysandeep|lunch		08:01
*** fressi has joined #opendev		08:03
*** DSpider has joined #opendev		08:08
*** andrewbonney has joined #opendev		08:15
*** jaicaa has quit IRC		08:27
*** jpena|off is now known as jpena		08:29
*** jaicaa has joined #opendev		08:30
*** hemanth_n has joined #opendev		08:31
*** tosky has joined #opendev		08:36
*** ysandeep|lunch is now known as ysandeep|rover		08:43
*** swest has joined #opendev		09:19
*** felixedel has joined #opendev		09:32
felixedel	Hi all, we are currently working on a larger feature in Zuul for which we currently have ~60 staging patches in Gerrit. During development we quite often have to update/rebase the whole stack of changes. When pushing this stack to Gerrit, it takes quite some time to process them (2-3 minutes at least) and often fails with the following error http://paste.openstack.org/show/802556/	09:33
felixedel	Although the changes seem to be all there, I'm not sure if this error has an impact on the consistency of those changes. Could you maybe have a look at this error?	09:33
felixedel	The latest update of those changes can be found here: https://review.opendev.org/c/zuul/zuul/+/774610/3	09:33
*** ykarel is now known as ykarel|lunch		09:33
*** ralonsoh has quit IRC		09:34
*** ralonsoh has joined #opendev		09:34
*** dtantsur|afk is now known as dtantsur		10:00
*** hashar has quit IRC		10:07
*** hashar has joined #opendev		10:08
priteau	Good morning. Is there a known issue with Zuul? I can see some patches in the gate queue have completed all their jobs but they are stuck there	10:17
yoctozepto	infra-root: it looks as if zuul (or gerrit) hung? jobs seem to never get off the queue	10:17
yoctozepto	priteau: lol, what a sync	10:17
cgoncalves	as of now: "Queue lengths: 4674 events, 0 management events, 445 results."	10:18
priteau	Get out of my mind yoctozepto!	10:19
yoctozepto	priteau: great men* think alike :-)	10:21
priteau	Some jobs have left the gate now but it seems much slower than usual	10:21
yoctozepto	* and women, I think the saying was meant to be genderless though	10:22
yoctozepto	yes, I confirm :-)	10:22
frickler	seems like zuul is having a hard time getting a stack of 60 patches being submitted all at once handled through all its queues	10:25
frickler	felixedel: I suggest you submit your stack in batches of maybe 10 patches at a time, seems getting 60 patches at once does overload zuul for quite some time	10:26
cgoncalves	frickler, felixedel has mentioned (scroll a few lines up) he was submitting a stack of 60 patches at once	10:26
yoctozepto	ugh	10:30
yoctozepto	:D	10:30
frickler	there also seem to be java exceptions in gerrit related to submitting this stack. maybe also discuss with #zuul folks whether that stack could be submitted in parts in order to reduce the amount of rebasing that is needed	10:33
swest	frickler: felixedel and I discussed this with corvus yesterday and agreed to continue pushing the stack for the time being and report issues here :D	10:36
*** ykarel|lunch is now known as ykarel		10:36
frickler	swest: hmm, o.k., then please wait for feedback from corvus before submitting any new stacks	10:44
swest	yea, we'll avoid pushing the whole stack at once the next time	10:44
swest	sorry, about the DoS :(	10:45
frickler	well, exploring the limits of our infrastructure generally isn't a bad thing I'd say, just avoid doing it too often ;)	10:46
tobiash	yeah, now we know that's a problem and have logs for that so I hope that's solvable	10:52
tobiash	looks like gerrit still has some performance regressions compared to the old version	10:52
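frickler's suggestion above (pushing the stack in batches of roughly 10 changes rather than all 60 at once) could be sketched as follows. This is only an illustration: `chunked` is a hypothetical helper, the change labels are placeholders, and the code merely groups an ordered list of commits without talking to Gerrit or git.

```python
from typing import Iterator, List, Sequence


def chunked(changes: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(changes), size):
        yield list(changes[start:start + size])


if __name__ == "__main__":
    # Placeholder labels standing in for the commits of a 60-change stack.
    stack = [f"change-{n:02d}" for n in range(1, 61)]
    for batch in chunked(stack):
        # Each batch would be pushed on its own, oldest commits first.
        print(batch[0], "..", batch[-1])
```

Because later changes depend on earlier ones, the batches have to be pushed in order, and the batch size of 10 is just the figure mentioned in the channel.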
*** dviroel has joined #opendev		12:01
*** bwensley has joined #opendev		12:23
*** jpena is now known as jpena|lunch		12:33
*** hemanth_n has quit IRC		12:47
*** mgagne has quit IRC		12:52
*** mgagne has joined #opendev		12:53
yoctozepto	unfortunately that is true :-(	13:09
fungi	well, we went into it knowing gerrit would have worse performance with the new notedb backend than the old sql backend, but new releases of gerrit haven't supported using sql for years	13:11
fungi	we held out quite some time while the gerrit developers improved notedb performance, but there's only so much you can do when you want quick access to data stored in git repos	13:12
yoctozepto	well, it also boils down to 'how much worse' :D	13:12
fungi	yeah, and in this case i expect, but haven't looked yet, that the java exceptions frickler saw in the log were related to contention over write locks	13:14
fungi	something a traditional rdbms is engineered to handle gracefully	13:14
bwensley	Hey everyone - just wondering if there is any progress on the missing launchpad updates when reviews are created or merged. I see it mentioned here: https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes	13:14
bwensley	Is there an LP or something tracking that issue?	13:15
bwensley	It's pretty painful for everyone to have to remember to update each LP. Easy to forget and then it becomes hard to track fixes.	13:15
fungi	bwensley: no, just the etherpad. someone needs to find time to write a replacement. i had hoped to but other fires keep erupting	13:16
fungi	if some of the people it's painful for would join the opendev sysadmins in running these services things might happen more quickly	13:16
fungi	probably the most sustainable replacement for launchpad integration would be a zuul job which talks to lp's api	13:18
fungi	felixedel: corvus: aha, actually the earliest unexpected error i see appears to be a worker timeout	13:22
fungi	[2021-02-11T09:17:40.772+0000] [SSH git-receive-pack /zuul/zuul.git (REDACTED)] WARN com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker killed after 240108ms: (timeout 108ms, cancelled)	13:23
fungi	[2021-02-11T09:17:40.773+0000] [SSH git-receive-pack /zuul/zuul.git (REDACTED)] WARN com.google.gerrit.server.git.MultiProgressMonitor : unable to finish processing	13:23
fungi	java.util.concurrent.CancellationException	13:23
*** dmellado has quit IRC		13:25
*** dmellado has joined #opendev		13:25
fungi	another of the same at 10:06:11.948	13:26
*** jpena|lunch is now known as jpena		13:26
bwensley	fungi: Thanks for the update. What skills would be required to write the new zuul job you propose?	13:26
bwensley	And is this something that is a few days of effort or bigger than that?	13:27
fungi	bwensley: familiarity with writing zuul jobs, and reading how to interact with the gerrit and launchpad rest apis, then doing some local testing of the resulting ansible	13:27
fungi	felixedel: corvus: and then at 10:20:51.036 an apparently new exception starts to appear for those, "com.google.gerrit.exceptions.StorageException: interrupted"	13:28
fungi	lots and lots of those	13:29
fungi	bwensley: probably a few days effort. it would basically need to functionally replace our old implementation from https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/update_bug.py and the update_blueprint.py in the same directory	13:31
fungi	those were called with gerrit hook scripts, but as a result they're not particularly transparent since they ran locally on the gerrit server and logged to gerrit's error log	13:32
fungi	and the old gerrit database access mechanism they relied on is no longer a possibility in the version of gerrit we're running now	13:32
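As a rough idea of the first step such a replacement job would need, the sketch below pulls bug references out of a commit message. It assumes the usual OpenStack footer convention (`Closes-Bug: #NNN`, `Partial-Bug:`, `Related-Bug:`); the regular expression is a simplified stand-in, not the pattern jeepyb uses, `find_bug_refs` is a hypothetical name, and the Launchpad API call itself is left out.

```python
import re
from typing import List, Tuple

# Assumed footer format: "<Closes|Partial|Related>-Bug: #<number>".
BUG_FOOTER = re.compile(
    r"^(Closes|Partial|Related)-Bug:[ \t]*#?(\d+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_bug_refs(commit_message: str) -> List[Tuple[str, int]]:
    """Return (kind, bug_number) pairs found in footer lines of a commit message."""
    return [
        (kind.lower(), int(number))
        for kind, number in BUG_FOOTER.findall(commit_message)
    ]
```

A job built on this would still need credentials for Launchpad and a decision about which pipeline events (change created, change merged) trigger which bug status updates.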
*** stand has quit IRC		13:54
*** ysandeep|rover is now known as ysandeep|dinner		14:20
*** ykarel_ has joined #opendev		14:26
*** ykarel has quit IRC		14:29
*** fressi has left #opendev		14:35
*** ykarel_ has quit IRC		14:49
*** hashar is now known as hasharAway		14:51
*** dmellado has quit IRC		14:53
*** dmellado has joined #opendev		14:54
openstackgerrit	Merged opendev/system-config master: Build Gerrit 3.3 images https://review.opendev.org/c/opendev/system-config/+/765021	14:55
*** ysandeep|dinner is now known as ysandeep|rover		15:04
slittle1	Can anyone help us understand why zuul objected to https://review.opendev.org/c/starlingx/integ/+/775056 ?	15:09
slittle1	we'll try a recheck. See if it happens again	15:10
*** dmellado has quit IRC		15:10
*** dmellado has joined #opendev		15:12
fungi	slittle1: that looks similar to a problem i just started investigating	15:13
*** hasharAway is now known as hashar		15:22
fungi	so far i haven't found a correlation, but i'm also double-booked in meetings currently and for the next couple of hours	15:24
fungi	looks like that one ran from our ze03 executor	15:29
fungi	debug log says it got a socket timeout talking to storage.bhs.cloud.ovh.net	15:30
fungi	the variety of network errors we're seeing in different failures and the spread across different executors makes me wonder if there are local network problems in rackspace's dfw region (where all the executors are running)	15:32
*** weshay|ruck has joined #opendev		15:33
weshay|ruck	0/	15:33
fungi	https://rackspace.service-now.com/system_status is where i thought they posted their systems status info, but all i get there is a blank page	15:35
slittle1	Can zuul jobs be redirected to other executors? other regions?	15:36
fungi	it looks like i can get to the https port on telnet storage.bhs.cloud.ovh.net from home, but not from one of our executors	15:37
fungi	storage.gra.cloud.ovh.net is still reachable for them though	15:37
fungi	so it's likely a routing problem close to rackspace or in a backbone provider i'm not traversing	15:37
fungi	i can also get to storage.bhs.cloud.ovh.net just fine from another rackspace region (iad)	15:38
fungi	yeah, looks like nothing i've tried in rackspace's dfw region can reach storage.bhs.cloud.ovh.net	15:39
fungi	i'll push up a temporary change to yank it from the log upload pool	15:39
clarkb	fungi: is that via ipv4 v6 or both?	15:40
fungi	ipv4	15:41
fungi	doesn't look like they publish aaaa rrs for it	15:41
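The reachability test fungi describes (opening a TCP connection to the HTTPS port from different vantage points) can be approximated with a few lines of Python. This is a minimal sketch: `can_connect` is a hypothetical helper, and the 5 second timeout is an arbitrary choice rather than anything used in the channel.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running `can_connect("storage.bhs.cloud.ovh.net")` and `can_connect("storage.gra.cloud.ovh.net")` from hosts in different regions would reproduce the comparison above. A failure only shows that the TCP handshake did not complete from that host; it does not say where along the path the problem lies.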
openstackgerrit	Jeremy Stanley proposed opendev/base-jobs master: Temporarily remove storage.bhs.cloud.ovh.net https://review.opendev.org/c/opendev/base-jobs/+/775193	15:43
clarkb	fungi: does ovh gra1 have the same issue or just bhs1?	15:44
fungi	clarkb: if that ^ looks acceptable, i'll bypass zuul and merge it directly	15:44
clarkb	oh the commit message says it was just bhs	15:44
fungi	clarkb: we can reach their gra swift from dfw, yeah	15:44
clarkb	fungi: +2	15:44
openstackgerrit	Merged opendev/base-jobs master: Temporarily remove storage.bhs.cloud.ovh.net https://review.opendev.org/c/opendev/base-jobs/+/775193	15:46
fungi	slittle1: weshay|ruck: ^ that should take effect immediately for any builds starting at this point forward	15:46
weshay|ruck	rock on.. thanks for jumping on this!	15:47
fungi	that still doesn't address the other examples weshay|ruck found earlier and mentioned in #openstack-infra but i think it's the bulk of the recent POST_FAILURE results	15:47
weshay|ruck	k	15:47
fungi	the builds running into this problem would have had no logs uploaded at all, not even zuul manifests	15:48
fungi	#status notice Recent POST_FAILURE results from Zuul for builds started prior to 15:47 UTC were due to network connectivity issues reaching one of our log storage providers, and can be safely rechecked	15:49
openstackstatus	fungi: sending notice	15:49
-openstackstatus- NOTICE: Recent POST_FAILURE results from Zuul for builds started prior to 15:47 UTC were due to network connectivity issues reaching one of our log storage providers, and can be safely rechecked		15:49
priteau	What should we do when the failure was in a promote-openstack-releasenotes job?	15:52
priteau	Ignore and wait until another change updates release notes?	15:52
clarkb	priteau: the failure should have happened at the very end of the jobs as they were related to log uploads	15:52
clarkb	priteau: I would check if your release notes have published successfully by checking the published results directly	15:52
clarkb	its possible nothing needs to be done	15:52
fungi	right, the actual work performed by the job was probably done	15:52
openstackstatus	fungi: finished sending notice	15:52
clarkb	re pushing large stacks of changes, it does seem to be a workload that newer gerrit struggles with	15:53
clarkb	it may be worth starting a discussion upstream to see if they have thoughts on making that particular case run smoother, but in the short term i expect that the best bet is to keep stacks shorter and manageable	15:54
fungi	clarkb: though most of the user-facing impact was zuul trying to get through the resulting event pileup	15:54
clarkb	fungi: yup	15:54
priteau	I don't see the change I was expecting, although the job ran two hours ago	15:56
priteau	Not a huge issue, I am sure we have more backports in the queue	15:56
clarkb	landing a followup change should rerun everything with those additional updates as well	15:57
clarkb	priteau: was the job a POST_FAILURE ? or just normal failure?	15:57
fungi	https://zuul.opendev.org/t/openstack/builds?job_name=promote-openstack-releasenotes shows several recent POST_FAILURE results	15:57
priteau	promote-openstack-releasenotes https://zuul.opendev.org/t/openstack/build/55f8c685f5fb4d6cbcf73b5a0acd74af : POST_FAILURE in 14m 30s	15:57
fungi	i can probably go digging in executor debug logs for the ansible output for those after my upcoming meetings	15:58
priteau	Only if you feel like it, I am sure you have better things to do :)	15:59
fungi	not necessarily better, just more	15:59
fungi	traceroutes to there seem to die within rackspace's dfw border	16:00
fungi	so i suspect bgp problems or something similar at their border	16:01
fungi	or that just happens to be the differentiating point for an asymmetric route with a problem elsewhere on the return path	16:01
clarkb	fungi: I'm seeing similar in the opposite direction and ipv6 seems affected too	16:03
clarkb	(host from bhs can't reach review)	16:03
fungi	tracerouting the other direction stops at an ovh router in newark, new jersey	16:03
fungi	so within their border too	16:04
fungi	i wonder if they peer with each other	16:04
*** Alex_Gaynor has left #opendev		16:04
*** ysandeep|rover is now known as ysandeep|out		16:09
fungi	slittle1: weshay|ruck: here it is... http://travaux.ovh.net/?do=details&id=48997&	16:16
fungi	looks like ovh has a network peer in dallas	16:17
fungi	or at least had ;)	16:17
weshay|ruck	k.. so.. they are working on it afaict	16:18
weshay|ruck	Provider operations continues on-site to restore service.	16:18
weshay|ruck	fungi thanks for the update	16:18
fungi	the internet never ceases to be a source of daily excitement	16:18
fungi	i nearly typed excrement there	16:18
clarkb	"weather conditions in the area are extending resolution time" I feel that	16:18
clarkb	fungi: no doubt it has that too	16:19
clarkb	looking at the csv generated in that gitea change I can see that it results in some load, but not significant from what I can tell. However that may be enough to tip over a busy system?	16:22
*** mlavalle has joined #opendev		16:24
clarkb	doing the description updates seems to produce a load average of 3.5 on the server, fairly consistent ~50% usr proc (is that 50% of 8vcpu or 50% of one?) and it runs steady with 1.3GB of memory consumed	16:25
clarkb	I want to try cleaning up this dstat role as I think this is actually pretty useful insight	16:25
clarkb	then maybe I'll see if there are some obvious ways to improve the description updates. Maybe newer gitea added a way to check existing descriptions or something	16:26
openstackgerrit	Jeremy Stanley proposed opendev/base-jobs master: Revert "Temporarily remove storage.bhs.cloud.ovh.net" https://review.opendev.org/c/opendev/base-jobs/+/775209	16:34
clarkb	fungi: ^ is that in preparation or do we think it is happy now?	16:35
fungi	not happy yet afaik, just pushing it so we don't forget	16:35
fungi	commit message and wip comment indicate preconditions for approving	16:36
fungi	i do wonder if we should turn that region down in nodepool too, but if the api endpoint there is also unreachable then we're not starting new jobs there anyway	16:39
clarkb	yup I thought about that too and I think ^ is effectively going to idle it	16:40
*** marios is now known as marios|out		17:01
*** jdwidari has joined #opendev		17:26
*** jdwidari has quit IRC		17:30
openstackgerrit	Iury Gregory Melo Ferreira proposed openstack/project-config master: Add Backport-Candidate label to Ironic projects https://review.opendev.org/c/openstack/project-config/+/775244	17:39
hashar	hi opendev :) We have a post failure for opendev/gear . It is apparently unable to publish a Docker image. I guess due to lack of credentials? https://review.opendev.org/c/opendev/gear/+/688446	17:39
hashar	I don't know where to report it so it can be acted on asynchronously	17:39
*** rpittau is now known as rpittau|afk		17:43
*** dtantsur is now known as dtantsur|afk		17:45
clarkb	hashar: likely just needs credentials	17:46
hashar	clarkb: is there any doc for that?	17:46
hashar	or maybe that is something that can only be setup by admins	17:46
clarkb	I think we do document it actually. Let me see	17:46
clarkb	but yes I think also that someone with access to the secret needs to do it	17:47
*** jpena is now known as jpena|off		17:47
hashar	the job is based on opendev-upload-docker-image if you are familiar with that	17:47
clarkb	https://docs.opendev.org/opendev/base-jobs/latest/docker-image.html#jobvar-opendev-upload-docker-image.docker_credentials	17:48
clarkb	hashar: specifically someone that has access to the docker hub org credentials needs to make the zuul secret	17:48
clarkb	in this case that would be one of the opendev admins	17:49
hashar	clarkb: should I file a bug about it somewhere?	17:50
hashar	or maybe an email to some list is sufficient	17:50
fungi	i'm not able to pull up that build result actually	17:50
hashar	maybe it has expired	17:51
clarkb	ya emailing the service-discuss list asking for someone to find time to encrypt the secret and update the change is probably a good next step	17:51
hashar	when I looked at it I think it complained about a lack of credential	17:51
clarkb	maybe mordred and corvus since they have previously reviewed it	17:51
clarkb	ya the ttl on those is 30 days iirc and the job ran in december	17:51
fungi	oh, yep, december 11. it would have just expired out of the object store today	17:51
hashar	but the task output was hidden in the ansible build output	17:51
fungi	er, a month ago	17:52
fungi	i'm terrible with calendars	17:52
fungi	looks like that job runs in the gate pipeline and is preventing the change from merging, so we can recheck to get fresh logs	17:54
*** ralonsoh has quit IRC		17:54
mordred	it just needs to be updated to add credential - and be updated to the new state of the art with image jobs	17:54
fungi	so i should be able to reuse the same credential we have for that dockerhub namespace just reencrypt it for the opendev/gear key	17:55
clarkb	fungi: yup	17:55
fungi	i'll see what i can do there	17:56
hashar	:-]	17:56
hashar	thank you !	17:56
mordred	https://opendev.org/zuul/zuul-registry/src/branch/master/.zuul.yaml#L1-L61 is a good simple place to cargo-cult from	17:56
fungi	after i finish reenqueuing a couple of failed release jobs from last month	17:56
fungi	okay, that's done, working on the gear image uploads	17:59
hashar	fungi: thank you for stepping in!	18:10
mordred	fungi: if you're in a secret encoding and uploading mood, https://review.opendev.org/c/zuul/zuul-storage-proxy/+/774998 needs a secret too - as well as the upload and promote jobs	18:19
*** eolivare has quit IRC		18:19
fungi	mordred: sure, i can take a look shortly. almost have the gear change done i think	18:20
mordred	\o/	18:20
*** marios|out has quit IRC		18:20
fungi	any idea where that failing gear-upload-image job is defined? codesearch isn't turning it up	18:24
fungi	will want to track it down and rip it out	18:25
mordred	I think it may not exist anymore?	18:26
mordred	like - wasn't there a "move jobs in repo" related to gear in the not-too-distant-past?	18:26
fungi	oh, i love problems which solve themselves ;)	18:26
mordred	I don't see anything in any of the jobs lists on zuul.opendev.org	18:28
fungi	not finding it in the openstack/project-config git history either	18:30
fungi	i'll just assume it no longer exists	18:30
mordred	++	18:30
fungi	and we'll find out if that's true	18:30
mordred	if it shows back up, we'll have breadcrumbs	18:30
fungi	looking in https://hub.docker.com/u/opendevorg/ i don't see that we ever published an image for it there anyway	18:32
fungi	should i call the image gear or geard?	18:32
fungi	i guess it needs to match whatever's in the dockerfile?	18:34
fungi	except there is no dockerfile	18:34
fungi	so i need to add more than just zuul configuration i suppose	18:34
fungi	oh, hah, i should have looked more closely at https://review.opendev.org/688446	18:35
fungi	it's adding the failing job, but also a dockerfile	18:35
mordred	yeah - definitely this is an "update that patch" sort of patch	18:39
mordred	and - I think I'd go with opendev/geard	18:39
mordred	(you'll want to s/as nodepool/as geard/ in the Dockerfile)	18:39
fungi	repository: opendevorg/gear	18:41
fungi	target: geard	18:41
fungi	that's what i want to pass in the docker_images var for the job?	18:41
fungi	i assume so anyway, giving it a shot	18:41
openstackgerrit	Jeremy Stanley proposed opendev/gear master: Added Docker image builds https://review.opendev.org/c/opendev/gear/+/688446	18:42
fungi	mnaser: hashar: mordred: ^ hopefully that's complete	18:42
openstackgerrit	Clark Boylan proposed opendev/system-config master: Use dstat to record performance of system-config-run hosts https://review.opendev.org/c/opendev/system-config/+/775051	18:42
clarkb	If ^ works I think that will produce dstat info for all our system-config-run hosts	18:43
clarkb	might produce interesting/useful data or it might be so inaccurate that it doesn't matter, but I figure having it can't hurt	18:43
fungi	oh, i see, repository is what to call the image on dockerhub, target is the dockerfile target to publish there. i'll respin	18:46
openstackgerrit	Jeremy Stanley proposed opendev/gear master: Added Docker image builds https://review.opendev.org/c/opendev/gear/+/688446	18:47
mordred	fungi: I left a comment on the prior PS	18:48
mordred	oh - nevermind - they're on the current PS	18:49
fungi	cool, thanks	18:55
openstackgerrit	Jeremy Stanley proposed opendev/gear master: Added Docker image builds https://review.opendev.org/c/opendev/gear/+/688446	19:11
*** sboyron_ has quit IRC		19:18
*** sshnaidm is now known as sshnaidm|afk		19:19
fungi	looking at the zuul-storage-proxy image publication change now	19:24
*** rchurch has quit IRC		19:34
*** rchurch has joined #opendev		19:34
hashar	mordred: fungi: you are awesome thank you!	19:35
fungi	it's my pleasure!	19:36
*** andrewbonney has quit IRC		19:42
openstackgerrit	Merged opendev/gear master: Added Docker image builds https://review.opendev.org/c/opendev/gear/+/688446	19:54
ianw	clarkb: using gnuplot and putting out nice-ish graphs as artifacts would be pretty cool for https://review.opendev.org/c/opendev/system-config/+/775051 :)	20:03
clarkb	ianw: https://lamada.eu/dstat-graph/ is what I usually use. I bet we could vendor that code (its on github somewhere iirc) and provide similar	20:05
ianw	clarkb: nice! git clone https://github.com/Dabz/dstat_graph.git && cd dstat_graph && ./generate_page.sh ./the-csv.file > page.html	20:13
clarkb	oh cool, I can take a look at adding that	20:14
clarkb	I wonder if we can actually vendor the code instead of cloning it too	20:14
ianw	it might be even better in zuul-jobs ... i don't see why we couldn't use it in a lot of places	20:14
clarkb	oooh good idea	20:14
ianw	the bits it uses are pretty old	20:15
ianw	https://github.com/Dabz/dstat_graph/tree/master/js	20:15
clarkb	ya there may be better ways of doing it now, I had just used that tool in the past via that website and it worked well enough for me	20:15
ianw	the glyphicons don't show up for me	20:17
clarkb	I wonder if more modern browsers have stricter rules for fonts/glyphs?	20:18
ianw	and the buttons don't seem to work	20:18
ianw	as with all great web projects, it looks like pretty much every part has been abandoned and needs to be basically re-written	20:20
clarkb	the buttons and menus do work on the hosted version. Not sure what the glyphs are for	20:20
clarkb	it is also somewhat slow	20:20
clarkb	(I expect that is based on the number of csv entries)	20:20
ianw	e.g. bootstrap 3 -> bootstrap 4; https://github.com/novus/nvd3 d3 v3 -> v4 ;	20:21
ianw	oh, d3 is old news, you should just be making your own graphs using canvas and css https://medium.com/@PepsRyuu/why-i-no-longer-use-d3-js-b8288f306c9a	20:27
clarkb	is that hard mode?	20:27
ianw	i just checked; the makefile that created all the graphs using gnuplot for my thesis from 2007 still works just fine :)	20:29
mordred	AND ... TIL about https://preactjs.com/	20:32
ianw	and somehow i had totally missed that dstat is dead(?) but replaced with something on rh platforms or something ... https://news.ycombinator.com/item?id=19986646	20:35
clarkb	I think spamaps took over maintaining it?	20:37
ianw	mordred: in the final insult, the preact website lists "BBC Roasting Calculator 🦃 Calculates cooking times for different cuts of meat." as a sample application, but it's gone 404	20:40
fungi	nooo! not the barbecue calculator!	20:41
*** bwensley_ has joined #opendev		20:47
mordred	hah	20:50
mordred	well - a different implementation of calculating cooking times is known as "thermometer"	20:50
*** bwensley__ has joined #opendev		20:50
*** bwensley has quit IRC		20:50
mordred	I actually had a fork of dstat YEARS ago (before openstack) that added plugin support - which at least at the time dag was against but which I needed for my life as a mysql consultant	20:54
mordred	https://launchpad.net/mtstat	20:54
mordred	I'm fairly certain the existence of that is not useful to anyone	20:54
*** bwensley_ has quit IRC		20:54
ianw	also see my github project "poke it with your finger to see how springy it is" ... you can, quite literally, fork it	20:54
mordred	oh wow - that actually just flat doesn't exist - since I didn't host the code directly on launchpad :)	20:54
mordred	well - I suppose the source tarball is there	20:55
* mordred goes back into his hole		20:55
fungi	yeah, i use remote digital thermometers with alarms because:lazy	20:56
ianw	could i request a couple of reviews	20:58
mordred	yup	20:58
ianw	https://review.opendev.org/c/opendev/system-config/+/775068 - fixes a backup script typo, want to make sure i didn't make any more mistakes	20:58
ianw	https://review.opendev.org/c/opendev/system-config/+/775057 - afs ansible isn't quite working. this allows the servers to share the key material from just one group file on bridge	20:59
ianw	https://review.opendev.org/c/opendev/system-config/+/775053 - this is for refstack ... it worked when i manually made those changes, but docker redeployed it	21:00
ianw	that will (i think) be it for refstack; we can get kopecmartin to confirm it looks ok and then start the process of removing the old bits	21:01
mordred	ianw, fungi : on the backup script change - I think there is a typo	21:11
fungi	oh?	21:12
fungi	oh, yep!	21:13
fungi	i did not see that	21:13
fungi	i think my eyes must filter out _	21:13
mordred	rightfully so	21:14
hashar	fungi: thanks for the opendevorg/geard Docker image!	21:14
ianw	thanks, i had that feeling something was still wrong :)	21:15
fungi	hashar: it worked? awesome!	21:15
hashar	yeah!	21:16
openstackgerrit	Ian Wienand proposed opendev/system-config master: borg-backup: fix backup script failure match https://review.opendev.org/c/opendev/system-config/+/775068	21:16
hashar	I will look at other open changes for that repo eventually :]	21:17
hashar	but for now sleep time. thanks for the fix	21:17
fungi	have a good evening hashar, thanks for the help!	21:17
hashar	you are welcome =]	21:18
*** hashar has quit IRC		21:18
mordred	O M G	21:22
mordred	https://www.youtube.com/watch?v=hgI0p1zf31k	21:22
mordred	corvus: ^^	21:22
fungi	his beard is the correct length	21:25
corvus	i love gary jules	21:27
corvus	mordred: you are a 'git blame titan'	21:28
mordred	yup	21:32
mordred	:)	21:32
corvus	however, while they did use the gary jules arrangement, the song is actually by tears for fears	21:32
corvus	(i love the gary jules version, but i like the interpretation of the original better due to the musical irony)	21:35
corvus	i have both albums :)	21:35
corvus	i could go on for a bit on this....	21:36
*** bwensley__ has quit IRC		21:41
ianw	clarkb: hrm, any idea why the gerrit 3.3 func tests worked in check but not in gate? https://review.opendev.org/c/opendev/system-config/+/773807	21:45
ianw	Pulling shell (docker.io/opendevorg/gerrit:3.3)...	21:45
ianw	manifest for opendevorg/gerrit:3.3 not found: manifest unknown: manifest unknown	21:45
clarkb	no	21:46
clarkb	did it not upload properly to the buildset registry maybe?	21:47
ianw	i guess https://review.opendev.org/c/opendev/system-config/+/765021/8 didn't promote it ...	21:47
ianw	system-config-promote-image-gerrit-3.2 is a dependency of infra-prod-manage-projects	21:49
clarkb	ya because 3.2 is what we are running in prod	21:50
clarkb	I don't know that promotion is the problem since those jobs create the image	21:50
clarkb	pre promotion	21:50
openstackgerrit	Merged opendev/system-config master: refstack: use external https for API https://review.opendev.org/c/opendev/system-config/+/775053	21:52
ianw	https://review.opendev.org/c/opendev/system-config/+/773807 didn't run system-config-build-image-gerrit-3.3	21:52
clarkb	it should run the upload job	21:53
clarkb	oh it didn't run the upload for 3.2 either	21:53
clarkb	so ya we haven't published that image in docker hub yet so it failed? I bet this is a dependency problem	21:53
ianw	it must have pulled it from the registry of its child job to pass in the check?	21:53
clarkb	from the parent change maybe	21:54
clarkb	in this case if we recheck it should work if the parent promoted properly?	21:54
ianw	yeah, but nothing will run system-config-promote-image-gerrit-3.3	21:55
clarkb	seems that it didn't	21:55
corvus	clarkb: if a change used a container dependency in check, it should do that in gate too	21:55
corvus	oh, unless the change merged ahead of time and then failed promote	21:55
corvus	(ie, if they weren't in gate at the same time)	21:56
ianw	i think the problem isn't that the promote failed as such, the promote job was never added. which is about the same thing in the end, i think	21:56
ianw	yeah, we have opendevorg/gerrit/change_765021_3.3	21:57
clarkb	ya so I did something wrong in the set of jobs?	21:57
clarkb	system-config-gerrit-images template has the promote job in it	21:59
clarkb	do we not actually use that template? is that just an attractive problem?	21:59
clarkb	no it is in there in the templates	21:59
openstackgerrit	Ian Wienand proposed opendev/system-config master: Add promote for gerrit 3.3 image https://review.opendev.org/c/opendev/system-config/+/775287	21:59
ianw	hrm, maybe i'm wrong on that. we have a template ...	22:00
clarkb	ya I think the template has the jobs. I'm guessing we didn't meet some condition to run the job?	22:00
clarkb	we probably have a file filter that excluded it. I feel like we've run into this before with promotes	22:00
ianw	indeed	22:01
ianw	zuul.d/docker-images/gerrit.yaml doesn't match the promote job	22:01
clarkb	we could do a noopy change to the dockerfile probably	22:03
openstackgerrit	Ian Wienand proposed opendev/system-config master: Add promote for gerrit 3.3 image https://review.opendev.org/c/opendev/system-config/+/775287	22:03
ianw	gerrit is doing that thing where it takes a long time to post a review	22:03
openstackgerrit	Ian Wienand proposed opendev/system-config master: Add promote for gerrit 3.3 image https://review.opendev.org/c/opendev/system-config/+/775287	22:04
openstackgerrit	Ian Wienand proposed opendev/system-config master: Trigger promote for gerrit 3.3 image https://review.opendev.org/c/opendev/system-config/+/775287	22:04
openstackgerrit	Merged opendev/system-config master: borg-backup: fix backup script failure match https://review.opendev.org/c/opendev/system-config/+/775068	22:06
fungi	ianw: "that thing" is usually accompanied by high system load on the server... same correlation again?	22:07
ianw	top seems ok	22:08
*** gmann is now known as gmann_afk		22:10
ianw	memory usage is high, but otherwise cpu doesn't seem up on cacti	22:11
clarkb	note the jvm hogs the memory and you can separately check jvm memory usage in melody	22:11
clarkb	though it does peak to use the jvm hogged memory; last I looked its 95th percentile was like 1/2 of peak	22:12
fungi	yeah, memory usage for the server is essentially irrelevant	22:12
fungi	usually when i look it's consumed all but a tiny bit of available ram and plateaued there	22:13
fungi	the jvm is doing its own internal allocation from that	22:13
*** Dmitrii-Sh has quit IRC		22:19
*** Dmitrii-Sh has joined #opendev		22:20
fungi	the high load average, when i see it, tends to be from or accompanied by a mix of much higher than usual system and user cpu cycles	22:20
corvus	mordred: it looks like since the python-builder image assemble script creates the /output/bindep/run.txt file on the python-builder image, that means that if a package is installed on the builder image it won't be included in that file. so if there's a package needed at runtime that's installed in python-builder but not in python-base, then it won't be there. does that sound right to you?	22:39
corvus	mordred: (this isn't actually a problem for me right now, i just discovered it when trying to add the 'unzip' package to bindep to get it to show up on the final image to debug the actual problem i'm having, which is that there doesn't seem to be any python files in the zuul_storage_proxy wheel that's being built)	22:40
*** ildikov has quit IRC		22:40
*** ildikov has joined #opendev		22:40
mordred	corvus: it seems right - it's why we try to install as little as possible in the builder images and have some amount of special logic to handle the things that unavoidably must be installed there	22:40
corvus	ok; so just something to keep in mind (and possibly add as a comment or something) and deal with later if it comes up :)	22:41
openstackgerrit	Merged opendev/system-config master: Refactor AFS groups https://review.opendev.org/c/opendev/system-config/+/775057	22:46
mordred	corvus: yah	22:50
*** slaweq has quit IRC		23:09
*** gmann_afk is now known as gmann		23:11
mordred	corvus: so - followup - you're saying we have unzip installed in python-builder?	23:22
ianw	fungi: hrm, so i can't get the wiki logging into the backup hosts ... so far not sure if it's mis-configuration, or some sort of weird issues with key types :/	23:22
corvus	mordred: i'm no longer sure about that. (i'm fairly sure our conclusion about the bindep "hole" is correct, but i was inspecting the wrong image, so unzip may well not be installed)	23:26
mordred	yeah - I just checked python-builder and didn't find it there	23:29
corvus	sorry for the red herring :(	23:29
corvus	clarkb, fungi, ianw: re the report on gerrit from the bmw folks this morning -- what are the next steps there? do we want to investigate it, or go with a "don't do that" approach for now?	23:35
ianw	is that the 60 changeset push?	23:36
corvus	yeah	23:36
ianw	excellent, it seems the trusty host can not log into the backup servers. if i take the key and use it locally, i can	23:36
clarkb	corvus: I suggested that maybe asking upstream gerrit about it would be a good idea, as it seems to affect large stacks of pushes generally	23:42
clarkb	perhaps this is a known issue or something they have tuning advice for	23:42
clarkb	but I half expect the answer may be don't do that or you need a bigger server	23:42
*** whoami-rajat__ has quit IRC		23:43
clarkb	fungi: I am able to fetch review.o.o's index.html from an ovh bhs1 host now	23:45
clarkb	fungi: maybe we test the other direction too then put bhs1 back into the rotation?	23:45
clarkb	ianw: ssh -vvv may have clues (that was super useful debugging the gerrit ssh stuff from fedora33 recently)	23:46
clarkb	fungi: https://grafana.opendev.org/d/qh6NXp2Mk/nodepool-ovh?orgId=1&from=now-12h&to=now that seems to show the gap when the network gear was sad	23:47
ianw	clarkb: yeah, afaict it seems to be offering the key ... the other side doesn't say anything helpful :/	23:48
ianw	the amount of time i want to debug trusty openssh issues is ... not large	23:49
clarkb	the client side should say if the server side didn't offer a matching algorithm though I think	23:49
clarkb	no mutual something something iirc	23:49
ianw	it just says "Invalid user borg-wiki-upgrade-test"	23:49
clarkb	huh I wonder if the server side /var/log/auth.log or similar would give more clues based on that message	23:50
ianw	that's coming from auth.log	23:50
clarkb	huh, I'm all out of immediate ideas then :)	23:52
ianw	yeah, me too. the fact that the same key works from my machine makes me suspect the trusty side client version	23:53
clarkb	is it an rsa key?	23:53
ianw	ed25519	23:53
clarkb	(I would expect maximum compat via rsa, but maybe the fedora situation implies otherwise)	23:53
ianw	it's offering it -- debug1: Offering ED25519 public key: /root/.ssh/id_borg_backup_ed25519	23:54
clarkb	I guess another thought would be to try rsa?	23:56
clarkb	http://travaux.ovh.net/?do=details&id=48997& implies that things may not be fully back to normal yet, maybe we wait a bit longer given that	23:57
