Thursday, 2020-11-05

ianw	infra-root: pypa/pip have enabled the opendev app with https://github.com/pypa/pip/issues/9103	00:01
ianw	infra-root: i've proposed https://review.opendev.org/#/c/761467/2 and https://review.opendev.org/#/c/761468/2 to set up the same project-config/tenant arrangement as for pyca	00:01
ianw	I'll leave those for review, essentially so we can agree that we're happy to have our resources put towards this; for my part, similar to pyca, I think it's going to help everyone	00:01
clarkb	what sort of job do we expect them to be running? openstack constraints integration testing type deals?	00:01
ianw	my understanding would be tox-type testing, particularly on debian/centos and perhaps fedora	00:06
*** tosky has quit IRC		00:14
openstackgerrit	Ian Wienand proposed opendev/system-config master: [wip] borg: add fuse https://review.opendev.org/761275	00:15
clarkb	ianw: couple of comments on ^	00:19
ianw	clarkb: not sure what you mean by whether it depends on borgbackup? i think adding it to extras includes it? i'll have a poke at the unmount; you're probably right, but it does seem to have a command	00:21
clarkb	I mean does pip install borgbackup[fuse] imply installing borgbackup? I don't actually know	00:21
clarkb	and for the unmount I think you're just doing `borg /opt/backups` rather than something like borg unmount /opt/backups	00:22
fungi	usually [extras] will install everything which gets installed without that, plus whatever is in the extra-requires	00:22
ianw	is grafana.opendev.org not responding for others?	00:28
clarkb	I can get it via ssh but not https	00:29
clarkb	looks like an iptables problem	00:30
clarkb	it doesn't have port 80 and 443 open in iptables	00:30
fungi	how would that have happened?	00:31
clarkb	did the groups change for it? we use webserver group for 80 and 443 in many cases	00:31
ianw	hrm, weird	00:31
fungi	last ssh login (before now) was nearly a month ago, so i doubt we did anything manually directly on that server	00:33
ianw	grafana[0-9].opendev.org is in the webservers group	00:33
ianw	that wants a *	00:35
ianw	hrm	00:35
fungi	d'oh!	00:36
openstackgerrit	Ian Wienand proposed opendev/system-config master: Add * match to grafana.opendev.org https://review.opendev.org/761476	00:36
fungi	that changed with the cleanup of the old grafana server i guess	00:36
ianw	i need to shut that down	00:38
ianw	i'll get the opendev working then do that today	00:38
ianw	i think i didn't notice because my url bar has auto-filled in the old openstack.org server	00:38
fungi	ahh, okay, i didn't realize that was still in progress	00:39
*** DSpider has quit IRC		00:40
ianw	neither did I :)	00:42
ianw	clarkb: This is a convenience wrapper that just calls the platform-specific shell command - usually this is either umount or fusermount -u.	00:48
ianw	so yeah, can just unmount	00:49
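The borg documentation quoted above describes `borg umount` as a thin wrapper that calls the platform's own unmount tool. A minimal sketch of that idea, assuming (this is not borg's actual code) that Linux uses `fusermount -u` and other platforms use plain `umount`; `unmount_command` and `unmount` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List


def unmount_command(mountpoint: str, platform: str = sys.platform) -> List[str]:
    """Return the argv used to unmount a FUSE mount (hypothetical helper).

    Assumption: Linux uses `fusermount -u`, everything else uses `umount`.
    """
    if platform.startswith("linux"):
        return ["fusermount", "-u", mountpoint]
    return ["umount", mountpoint]


def unmount(mountpoint: str) -> int:
    # Run the platform-specific command and return its exit status,
    # which is all a thin convenience wrapper needs to do.
    return subprocess.call(unmount_command(mountpoint))
```

For example, `unmount_command("/opt/backups", "linux")` gives `['fusermount', '-u', '/opt/backups']`, which is why a plain unmount works just as well as the wrapper.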
ianw	testinfra works though; it runs a test backup to the test backup server, then can mount it via fuse. pretty cool!	00:49
clarkb	nice	00:49
openstackgerrit	Ian Wienand proposed opendev/system-config master: borg-backup: add fuse https://review.opendev.org/761275	00:57
*** whoami-rajat___ has joined #opendev		01:06
openstackgerrit	Merged opendev/elastic-recheck master: Add query for bug 1901739 https://review.opendev.org/759967	01:07
openstack	bug 1901739 in OpenStack Compute (nova) " libvirt.libvirtError: internal error: missing block job data for disk 'vda'" [High,Confirmed] https://launchpad.net/bugs/1901739	01:07
openstackgerrit	melanie witt proposed opendev/elastic-recheck master: Add query for bug 1902002 https://review.opendev.org/761478	01:16
openstack	bug 1902002 in devstack "Fail to get default route device in CI jobs" [Medium,In progress] https://launchpad.net/bugs/1902002 - Assigned to Dr. Jens Harbott (j-harbott)	01:16
openstackgerrit	Merged opendev/system-config master: Add * match to grafana.opendev.org https://review.opendev.org/761476	01:16
openstackgerrit	melanie witt proposed opendev/elastic-recheck master: Add query for bug 1902002 https://review.opendev.org/761478	01:18
openstack	bug 1902002 in devstack "Fail to get default route device in CI jobs" [Medium,In progress] https://launchpad.net/bugs/1902002 - Assigned to Dr. Jens Harbott (j-harbott)	01:18
ianw	grafana opendev back, i'll clean up the old now	02:18
openstackgerrit	Ian Wienand proposed opendev/system-config master: grafana: redirect http to CNAME https://review.opendev.org/761487	02:28
ianw	i think the new graphite server is good too. i'll cleanup the old one	02:38
ianw	#status log remove old graphite01.opendev.org server and storage	02:41
openstackstatus	ianw: finished logging	02:42
ianw	#status log removed grafana02.openstack.org, CNAME now goes to grafana.opendev.org	02:42
openstackstatus	ianw: finished logging	02:42
openstackgerrit	Merged opendev/system-config master: borg-backup: add fuse https://review.opendev.org/761275	02:45
openstackgerrit	Ian Wienand proposed opendev/system-config master: grafana: fix typo in test name https://review.opendev.org/761489	02:57
*** hamalq has quit IRC		03:21
*** ykarel has joined #opendev		03:49
melwitt	does anyone know if this kind of 503 is a common/known thing? ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='opendev.org', port=443): Max retries exceeded with url: /openstack/requirements/raw/branch/master/upper-constraints.txt (Caused by ResponseError('too many 503 error responses',))	03:50
mnaser	melwitt: interesting you bring that up, i am getting a few failures in our downstream with `error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.`	04:59
melwitt	O.o	05:01
melwitt	the gate is killing me rn I swear :(	05:01
mnaser	:(	05:03
ianw	melwitt: that ... shouldn't happen. one of our backends might be unhappy	05:04
ianw	melwitt: which job?	05:04
melwitt	that was nova-live-migration https://zuul.opendev.org/t/openstack/build/5a4e126cd734457fa1024575ec193440/log/logs/devstacklog.txt#8474	05:04
ianw	mnaser: i agree with your point re: zuul as a 3rd party CI. but ... I think we need to reach out and try to bring people along for the journey	05:05
melwitt	I'll be back in a couple of hours to recheck for the 9th time o/	05:05
ianw	let me see if i can find where the lb sent that	05:06
mnaser	ianw: i only voice this because i kinda tried the experiment with cherrypy and it really led to nothing but just a normal reporting job amongst many others	05:06
mnaser	if there's no incentive to move towards gating, eh.	05:06
ianw	i do see your point and somewhat agree, and pyca has been similar. we could do their wheel releases for them, which they currently get manually involved with, but there's been resistance	05:08
ianw	however, i feel like having skin in the game, when things come up; when you can point out that zuul would have stopped that breaking change, etc. gives a chance for adoption	05:09
ianw	i think that was 23.253.203.147	05:09
ianw	i think that went to gitea05 balance_git_https/gitea05.opendev.org	05:11
ianw	sorry, it actually went to balance_git_https/gitea06.opendev.org	05:18
ianw	that host does actually look unhappy	05:20
ianw	2020/11/05 03:21:43 cmd/web.go:107:runWeb() [I] Starting Gitea on PID: 1	05:22
ianw	2020-11-05 03:21:32.393 | ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='opendev.org', port=443): Max retries exceeded with url: /openstack/requirements/raw/branch/master/upper-constraints.txt (Caused by ResponseError('too many 503 error responses',))	05:24
ianw	it seems this managed to happen right as the container was restarting	05:24
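pip gives up with "too many 503 error responses" once its retry budget is used up, so a backend that answers 503 for the whole container restart window can still fail a job even though retries are enabled. A simplified sketch of a bounded retry loop with exponential backoff; `fetch_with_retries`, its parameters and the injected `fetch`/`sleep` callables are hypothetical and are not pip's or urllib3's API.

```python
import time
from typing import Callable, Tuple


class TooMany503(Exception):
    """Raised when every attempt returned HTTP 503."""


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, bytes]],
    max_retries: int = 5,
    backoff: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns a status other than 503.

    fetch returns (status, body). The body is returned for any non-503
    status (this sketch does not handle other errors). Between tries it
    sleeps backoff * 2**attempt, so the total wait is bounded and a
    backend that stays unavailable longer than that still fails.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 503:
            return body
        if attempt < max_retries:
            sleep(backoff * (2 ** attempt))
    raise TooMany503(f"too many 503 error responses after {max_retries} retries")
```

With the defaults the loop waits at most 0.25 * (1 + 2 + 4 + 8 + 16) = 7.75 seconds in total, so a restart that takes longer than that surfaces as an error.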
*** fressi has joined #opendev		05:29
*** whoami-rajat___ is now known as whoami-rajat__		05:32
ianw	clarkb: just mounted fuse backups on ethercalc02, all seems to work. i think everything is ready to roll out to more servers now	05:39
mnaser	ianw: just had a downstream job fail on 'error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.'	06:27
*** sboyron has joined #opendev		06:27
*** ysandeep|away is now known as ysandeep|ruck		06:36
*** Tengu has quit IRC		06:58
*** Tengu has joined #opendev		07:05
*** mschoenlaub has joined #opendev		07:05
*** mschoenlaub has quit IRC		07:06
*** marios has joined #opendev		07:09
*** lpetrut has joined #opendev		07:11
*** melwitt has quit IRC		07:20
*** melwitt has joined #opendev		07:21
*** ykarel_ has joined #opendev		07:21
*** eolivare has joined #opendev		07:22
*** ykarel has quit IRC		07:24
frickler	mnaser: RPC sounds like some internal call, not a download. do you have the logs accessible somewhere?	07:31
mnaser	frickler: that was during a git clone -- i don't have the log in an easily locatable way but ill try to keep more for the next time	07:32
frickler	iiuc we did unblock that crawling job, maybe it isn't limiting itself enough yet	07:34
mnaser	frickler: that was git clones to https://opendev.org though	07:35
*** ykarel_ is now known as ykarel		07:39
frickler	ah, right, the crawler went against gerrit. I do see some spikes on the gitea-lb cacti graphs since 0400, not sure if those might be related or whether they are normal and just smoothed out in the longer intervals	07:43
frickler	seems selection of custom intervals in the cacti graphs doesn't work for me, I always get only the default view	07:44
mnaser	frickler: caught it -- http://paste.openstack.org/show/799721/	07:46
mnaser	i think it's almost always horizon at the root	07:46
*** slaweq has joined #opendev		07:54
*** ralonsoh has joined #opendev		07:59
frickler	I didn't find anything obvious in the logs. I also tried cloning horizon from every gitea instance, found no issues there, either	08:07
*** tosky has joined #opendev		08:13
*** andrewbonney has joined #opendev		08:14
zbr	fungi: ianw: ansible-lint does require ansible >=2.9, and the "can't upgrade ansible in place" problem is still valid, with no fix being planned.	08:18
zbr	the ansible team sees it as a pip/setuptools bug, and the other side has other priorities. so we need to be careful to avoid it.	08:19
ianw	zbr: ansible-lint only specifies ansible>=2.8 in its setup.cfg	08:19
zbr	well, that is easy to fix.	08:20
ianw	i'm not sure how that fits with https://review.opendev.org/#/c/761473/	08:21
zbr	i wanted to make the linter ansible-version agnostic, but it is not possible now; it may take a year or more and help from ansible core to implement some missing features.	08:21
ianw	it doesn't seem right to lint the jobs with an ansible that zuul isn't running, although i'm not sure how much that actually matters	08:22
*** ysandeep|ruck is now known as ysandeep|lunch		08:22
zbr	hmm... now i check and I see that we still have the 2.8 pipelines in linter so it should really support 2.8	08:25
zbr	if it failed, it is likely due to a messed-up ansible install (due to upgrade/downgrade)	08:25
ianw	there's no upgrade or downgrade happening in the tox jobs; it was pinning the version to 2.7 which causes the failure	08:26
zbr	yep. that is because old pip does not check for conflicts, the new resolver would have prevented it.	08:27
zbr	add a "pip check" as first command, to prevent running code with broken deps.	08:27
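The failure discussed here is an installed set that violates a declared requirement: ansible pinned to 2.7 while ansible-lint asks for ansible>=2.8. `pip check` reports this kind of mismatch. A toy sketch of the comparison that handles only plain dotted release numbers and the `>=` operator (real tools apply the full PEP 440 rules, e.g. via the `packaging` library); `parse` and `satisfies_min` are hypothetical names.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    # "2.9.10" -> (2, 9, 10); trailing zeros are dropped so "2.8.0" == "2.8".
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_min(installed: str, spec: str) -> bool:
    """Check a single '>=X.Y' requirement against an installed version."""
    if not spec.startswith(">="):
        raise ValueError("this sketch only understands '>=' specifiers")
    return parse(installed) >= parse(spec[2:])
```

`satisfies_min("2.7", ">=2.8")` is `False`: the old resolver installed that combination without complaint, and running `pip check` first would have flagged it before any linting started.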
zbr	upgrade/downgrade can still happen inside tox jobs based on the order the deps are defined, but that was not the issue in this case.	08:28
zbr	I could try to add an extra version check in the linter, but I am not sure it is worth the effort.	08:29
ianw	i'm not actually sure what the failure mode would be leaving ansible uncapped. perhaps a later version would correctly parse something that would not actually parse in the earlier version zuul is using?	08:31
openstackgerrit	zbr proposed zuul/zuul-jobs master: More E208 https://review.opendev.org/761293	08:33
zbr	linting should not be confused with functional testing; linting is more about testing practices and detecting upcoming changes that may break your code, which is why it is better to use the upper bounds instead of the lower ones. for functional testing it is different.	08:36
*** sshnaidm|rover has quit IRC		08:36
zbr	i have a good example from flake8, where it needed to run on a newer version of python in order to detect a big range of issues, even if the linted code did support a lower version of python.	08:36
zbr	to test compatibility, we would have to run functional testing with both lower and upper bounds, but that brings huge extra costs.	08:37
*** sshnaidm|rover has joined #opendev		08:38
zbr	i personally find that the version mix provides decent coverage of both.	08:38
*** rpittau|afk is now known as rpittau		08:39
*** sshnaidm|rover has quit IRC		08:43
*** sshnaidm|rover has joined #opendev		08:45
*** sshnaidm|rover has quit IRC		08:52
*** sshnaidm|rover has joined #opendev		08:56
*** sshnaidm|rover has quit IRC		09:00
*** jaicaa has quit IRC		09:01
*** jaicaa has joined #opendev		09:02
kevinz	frickler: ianw: Following up on the discussion about https://review.opendev.org/#/c/760790/: we'd like to introduce OpenEuler 20.09 to Devstack. It is an RPM-based operating system and now works on AArch64 and X86_64	09:17
kevinz	If DIB support is essential, I will ask the OpenEuler team to offer some support in upstreaming this feature.	09:18
kevinz	But if uploading images is fine temporarily, I think adding a job to test this Devstack support would be a good plus, so that we can work in parallel and make it happen quickly	09:20
frickler	infra-root: ^^ there seems to be a generic cloud image available, not sure whether it would be o.k. for us to start with that or whether we'd have to insist on having dib support in order to get our customizations in place from the start	09:20
*** ysandeep|lunch is now known as ysandeep|ruck		09:20
frickler	I'm also not sure whether we'd have a procedure in place to use upstream images in nodepool at all, or whether that would have to be done manually	09:21
*** Green_Bird has joined #opendev		09:33
*** sshnaidm has joined #opendev		09:49
*** DSpider has joined #opendev		10:00
*** fressi has quit IRC		10:10
*** hashar has joined #opendev		10:15
*** noonedeadpunk has quit IRC		10:32
*** noonedeadpunk has joined #opendev		10:32
*** ysandeep|ruck is now known as ysandeep|brb		10:35
kevinz	frickler: Thanks! will wait for more comments here :-D	10:48
*** ysandeep|brb is now known as ysandeep|ruck		10:51
*** fressi has joined #opendev		10:57
*** fressi has quit IRC		11:13
*** noonedeadpunk has quit IRC		11:21
*** noonedeadpunk has joined #opendev		11:25
*** sboyron has quit IRC		11:49
*** sboyron has joined #opendev		11:52
*** marios has quit IRC		12:17
*** marios has joined #opendev		12:21
*** marios has quit IRC		13:00
*** marios has joined #opendev		13:03
*** dmellado has quit IRC		14:16
*** dmellado has joined #opendev		14:20
openstackgerrit	Merged openstack/project-config master: tox.ini : update Ansible pin https://review.opendev.org/761473	14:34
*** dtantsur has joined #opendev		15:00
dtantsur	hi folks! sorry if it has been asked too often already, but would it be possible to enable code search on opendev git?	15:06
mordred	dtantsur: it exists? https://opendev.org/explore/code?tab=&q=novaclient https://opendev.org/sardonic/sardonic/search?q=cmdb	15:08
mordred	there is an open issue upstream gitea for making that all pluggable so that something like elasticsearch could be used to power the indexing ... so at the moment I think codesearch.openstack.org is still better at searching	15:11
dtantsur	mordred: this is empty for me: https://opendev.org/openstack/ironic/search?q=automated_clean	15:12
dtantsur	how does it look for you?	15:12
mordred	similar. I'm guessing automated_clean is something that is in the ironic repo?	15:12
dtantsur	yep. I've tried many things including "if" :)	15:13
mordred	fascinating	15:13
mordred	well - it's not a thing that's intentionally turned off	15:13
dtantsur	I've never had any results whenever I tried, so I assumed it might have been turned off	15:13
dtantsur	fascinating indeed	15:13
mordred	but it's also not a subsystem that's gotten a lot of love - largely because it's currently a per-gitea-node thing	15:13
*** hashar is now known as hasharKids		15:20
clarkb	the current code search uses a go lib that seems to have odd behaviors that don't map well to how humans search for text	16:01
clarkb	the elasticsearch support comes in the next release and should be more familiar to those who have used our logstash	16:01
clarkb	I expect we can try deploying ES alongside gitea in non clustered mode just to ensure that all works well	16:01
clarkb	frickler: kevinz: we strongly prefer dib because what we've found with the upstream images is that they change behaviors or do things with cloud-init that don't make sense. It's just easier to have a single common image that uses glean for our test nodes	16:02
*** ysandeep|ruck is now known as ysandeep|away		16:06
*** dmellado has quit IRC		16:11
*** dmellado has joined #opendev		16:13
openstackgerrit	Merged openstack/project-config master: Add manila client,ui,tempest plugin core teams https://review.opendev.org/758868	16:21
*** marios is now known as marios|out		17:01
*** ykarel has quit IRC		17:04
*** ykarel has joined #opendev		17:05
openstackgerrit	Merged openstack/project-config master: Update neutron grafana dashboard https://review.opendev.org/758208	17:06
*** marios|out has quit IRC		17:14
openstackgerrit	Clark Boylan proposed opendev/system-config master: Update gerrit plugins on 2.16 and 3.0 https://review.opendev.org/761641	17:16
clarkb	ensuring we're keeping our gerrit images up to date after hashar's feedback	17:16
openstackgerrit	Merged opendev/system-config master: Document dual account split for Gerrit admins https://review.opendev.org/760051	17:19
*** rpittau is now known as rpittau|afk		17:21
*** Green_Bird has quit IRC		17:21
*** Green_Bird has joined #opendev		17:25
hasharKids	clarkb: hi, I hope my reply was not perceived as me patronizing!	17:30
*** hasharKids is now known as hashar		17:30
fungi	hashar: not at all! it had a lot of good reminders	17:30
clarkb	hashar: nope, it was useful to get input on whether or not we are on track	17:30
clarkb	and the bits about the js stuff were helpful too	17:30
hashar	but generally Gerrit upstream recommends using the very latest patch release of any minor series	17:30
fungi	yeah, reassuring to see it basically matches our upgrade plan	17:30
clarkb	hashar: yup, our docker builds build off of stable-* branches and get the latest commit	17:30
hashar	so 2.x.max(y)	17:30
clarkb	so we should be at least as new as the most recent release for each stable branch when we rebuild	17:31
hashar	also note I haven't been directly involved in the Gerrit upgrade planning. Christian Aistleitner has done all the hard work	17:32
hashar	so the reference is his writing at https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag/m/pLin-i3mBgAJ :]	17:32
hashar	I merely echoed and mentioned a few things we found after we upgraded	17:33
fungi	2.x.max(y) or newer, yeah. in many cases there are subsequent stable branch commits which are not yet tagged as point releases	17:38
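hashar's "2.x.max(y)" rule means tracking the newest patch release within each minor series (and, as noted, a stable branch tip can be newer still). A small sketch of picking that from a list of release tags; the version strings in the example are made up for illustration, and `latest_per_series` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def latest_per_series(versions: Iterable[str]) -> Dict[Tuple[int, int], str]:
    """Map (major, minor) -> highest 'major.minor.patch' seen."""
    best: Dict[Tuple[int, int], Tuple[int, int, int]] = {}
    for v in versions:
        major, minor, patch = (int(x) for x in v.split("."))
        key = (major, minor)
        # Compare numerically, so 2.16.25 beats 2.16.9 (a string sort would not).
        if key not in best or (major, minor, patch) > best[key]:
            best[key] = (major, minor, patch)
    return {k: ".".join(map(str, t)) for k, t in best.items()}
```

For example, `latest_per_series(["2.16.9", "2.16.25", "3.0.12", "3.0.3"])` keeps `2.16.25` for the 2.16 series and `3.0.12` for 3.0.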
*** hamalq has joined #opendev		17:38
clarkb	yup, we were already testing with the notedb migration improvements prior to the latest 2.16 release as a result	17:39
*** eolivare has quit IRC		17:45
hashar	ahh great !	17:51
hashar	do you, or will you, run the production Gerrit out of a Docker image?	17:51
clarkb	hashar: we do and we will :)	17:52
fungi	yeah, we build a docker image with zuul jobs, so that our chosen set of plugins will be included	17:52
fungi	we also continuously deploy image updates with a zuul job too	17:53
hashar	nice. You are way more automated than us :-D	17:53
clarkb	we don't auto restart though	17:53
hashar	for later, you might be interested in the multi-site plugin https://gerrit.googlesource.com/plugins/multi-site	17:53
fungi	right, we'd rather still control the outage times for restarts	17:53
hashar	as I understand it, that lets one do rolling upgrades with 0 downtime	17:53
fungi	but yeah, if multi-site is robust enough now, maybe rolling restarts of cluster members behind an lb would suffice	17:54
frickler	that might even allow us to distribute over multiple providers. not sure how we'd lb ssh though?	17:55
hashar	there is some explanation by Luca Milanesio (a Gerrit maintainer, and he is behind https://www.gerritforge.com/ ) at https://www.mediawiki.org/wiki/Topic:Vwkvtt6hlurmo42t	17:55
clarkb	frickler: aiui its all primary secondary	17:55
clarkb	frickler: not active active	17:55
fungi	right, we'd configure haproxy or whatever to only ever send connections to one cluster member or the other, never both	17:56
fungi	but i agree, it will likely be disruptive for ssh stream-events connections. they'll get reset and need to reconnect	17:56
*** ykarel is now known as ykarel|away		17:56
fungi	which could leave windows of time where events are missed	17:56
clarkb	one thing at a time :)	17:57
fungi	yeah, i'm not in any hurry to add multi-site but it's neat to consider for down the road	17:58
fungi	it would also be lots of additional complexity optimizing away one-minute restart outages which happen at most once a month	17:58
fungi	so we should definitely weigh the positives and negatives of such a solution	17:59
hashar	another advantage is reduced latency, which is helpful when your users are geographically distributed all across the world	17:59
fungi	how does it reduce latency if only one cluster member is active?	18:00
hashar	I mean, you could have a Gerrit in asia for example	18:00
fungi	or is there an active/active model with multi-site too, not just active/standby?	18:00
clarkb	it was my understanding that the gerrit clustering doesn't do active active	18:00
hashar	but maybe some locks have to happen all the way back to a reference that is held in the US, so maybe that doesn't help much	18:00
clarkb	you sync from the primary to the standbys using the replication plugin	18:01
clarkb	and you can't sanely do that in both directions I don't think	18:01
fungi	in theory clients could read from the standby node, but not write to it	18:01
clarkb	(but maybe that has changed since I last looked at this)	18:01
hashar	the link I pasted above was us complaining about multi site not really working for us ( https://www.mediawiki.org/wiki/Topic:Vwkvtt6hlurmo42t ), but one of its maintainers pointed out that the doc we used was outdated	18:02
hashar	seems like the plugin has been largely improved and the doc has been updated as a result of the above discussion	18:02
hashar	https://gerrit.googlesource.com/plugins/multi-site/+/HEAD/DESIGN.md might give more clues	18:02
hashar	but as Clark said, one thing at a time. You can look at it next year I guess :]	18:03
*** ykarel|away has quit IRC		18:11
openstackgerrit	Merged opendev/system-config master: Update gerrit plugins on 2.16 and 3.0 https://review.opendev.org/761641	18:25
*** andrewbonney has quit IRC		18:29
*** hashar is now known as hashardinner		18:39
*** ralonsoh has quit IRC		18:41
fungi	ooh, python 3.10.0a2 just dropped!	18:42
*** sshnaidm is now known as sshnaidm|afk		18:44
fungi	i've booted review-test back up and then downed the gerrit container on it	18:52
*** dtantsur is now known as dtantsur|afk		18:53
*** lpetrut has quit IRC		19:16
*** _mlavalle_2 has quit IRC		19:16
*** Tengu has quit IRC		19:31
*** rchurch has quit IRC		20:22
*** hashardinner is now known as hashar		20:38
*** dwilde has quit IRC		20:46
*** d34dh0r53 has joined #opendev		20:46
fungi	any tmux users who aren't aware, be on the lookout for updates to fix code execution by carefully crafted escape sequences: https://www.openwall.com/lists/oss-security/2020/11/05/3	21:10
ianw	fungi: when you have a sec, would you mind a double check on the grafana http -> https redirect one-liner @ https://review.opendev.org/#/c/761487/ ... just making sure there isn't a better way to do it	21:19
fungi	we can redirect /.* to /$1	21:21
fungi	so old http urls continue to work	21:21
fungi	i think redirecting / only does any good if folks load up / explicitly?	21:22
ianw	fungi: i think Redirect just replaces the string and leaves the rest of the url alone ... i mean it seems to work that way? e.g. http://grafana.opendev.org/dashboards	21:24
fungi	oh, maybe	21:25
fungi	could be i'm confusing it with rewrite	21:25
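ianw's reading matches how Apache mod_alias `Redirect` works: a request whose path starts with the given URL-path is redirected with the rest of the path appended to the target, so no regex is needed (that is the job of `RedirectMatch` or mod_rewrite). A simplified model of that prefix rule, assuming a `Redirect / https://grafana.opendev.org/` style directive (the exact line in the change is not quoted in the log); `redirect_target` is hypothetical and ignores query strings, status codes and URL escaping.

```python
from typing import Optional


def redirect_target(path: str, url_path: str, target: str) -> Optional[str]:
    """Model of a prefix redirect: return the new URL, or None if no match."""
    if not path.startswith(url_path):
        return None
    rest = path[len(url_path):]
    # Only whole path segments match: "/servicefoo.txt" is not under "/service".
    if rest and not url_path.endswith("/") and not rest.startswith("/"):
        return None
    # Everything after the matched prefix is carried over to the target.
    return target + rest
```

Under this model `redirect_target("/dashboards", "/", "https://grafana.opendev.org/")` yields `https://grafana.opendev.org/dashboards`, which is the behaviour ianw observed.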
*** hashar has quit IRC		21:27
*** sboyron has quit IRC		21:28
ianw	yeah, something about "nice thing about standards is there's so many to choose from" :)	21:59
fungi	re.*	22:01
fungi	lgtm then	22:01
*** slaweq has quit IRC		22:06
*** mlavalle has joined #opendev		22:08
*** hamalq has quit IRC		22:11
openstackgerrit	Merged opendev/system-config master: grafana: redirect http to CNAME https://review.opendev.org/761487	22:31
ianw	does limestone not have ipv4 nat? or is glean doing something wrong?	22:37
ianw	wrt https://review.opendev.org/#/c/761178/	22:37
ianw	https://d4eb7e3efe98cba79a4b-f4d168cdb20f40841821e4b213645c0f.ssl.cf2.rackcdn.com/739139/12/gate/neutron-tempest-plugin-scenario-linuxbridge/9a6b4f7/zuul-info/zuul-info.controller.txt	22:37
clarkb	ianw: something is going on there. I pinged logan yesterday in -infra but haven't heard back	22:37
ianw	ahh, ok	22:37
clarkb	it should have a 10/8 network and glean should configure it to dhcp	22:37
clarkb	but I haven't actually poked at the openstack apis and hosts yet	22:38
openstackgerrit	Merged opendev/system-config master: grafana: fix typo in test name https://review.opendev.org/761489	22:38
openstackgerrit	Merged openstack/project-config master: Add pypa/project-config https://review.opendev.org/761467	22:39
ianw	fungi/clarkb: you seemed to have some opinions on the 8gb swap reset @ https://review.opendev.org/761119 in the linked irc conversation, so i haven't approved. it does seem that the larger swap is a matter of ~20 seconds to create which doesn't seem too bad to me	22:43
clarkb	the problem is projects like ironic run out of disk with even the 1gb swap	22:43
clarkb	and increasing it to 8gb will only make that bigger. If this wasn't a last ditch method to avoid jobs failing when they need to swap a little I'd be more on board but the swap isn't really there to double the "memory"	22:44
clarkb	if jobs have those problems they need to reduce memory or be multinode and distribute the memory load	22:44
clarkb	ultimately if the rest of openstack says projects like ironic are the ones that need to change then ok we can land something like that, but I think that gets the purpose of the swap device wrong	22:45
ianw	yeah, good points; we probably should communicate that though	22:50
ianw	back to limestone, in "nodepool list" the nodes have a 10. ip address. so presumably openstacksdk is seeing an address defined	22:50
clarkb	ya it was probably a mistake to make it so big previously, but we figured it's sparsely allocated so it doesn't actually matter unless you need it, and if you run out of disk and need swap you'll break anyway	22:51
openstackgerrit	Merged openstack/project-config master: Add pypa tenant https://review.opendev.org/761468	22:51
ianw	i just jumped on a random focal node and it has ipv4	22:51
clarkb	ianw: was it configured by dhcp (just want to confirm that assumption on my part)	22:51
ianw	Nov 5 22:21:38 ubuntu-focal-limestone-regionone-0021581754 dhclient[466]: DHCPREQUEST for 10.4.70.27 on ens3 to 255.255.255.255 port 67 (xid=0xcc89a40)	22:52
ianw	yep	22:52
clarkb	I wonder if there is some issue with dhcp for some hosts, like maybe neutron isn't setting up the mapping in dnsmasq in some cases then it fails?	22:53
clarkb	I've jumped on a bionic node and it too looks fine, has a default route via ens3 and a 10/8 address	22:53
clarkb	another thing it could be is we're running out of addresses in the pool?	22:54
ianw	similar on another two nodes i've jumped on	22:54
clarkb	allocation_pools \| 10.4.70.10-10.4.70.254	22:56
clarkb	that should be plenty for what I think is a ~50 node max-server limit	22:56
ianw	unfortunately the syslog in that job doesn't go back to the start of boot	22:56
clarkb	there are 62 ports in use	22:57
clarkb	all that is telling me that we're well below our allocation limit so that shouldn't be the problem	22:58
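The pool arithmetic behind clarkb's conclusion, using the allocation range from the subnet and the 62 ports counted above (the ~50 server limit is his estimate, not computed here):

```python
import ipaddress

start = ipaddress.ip_address("10.4.70.10")
end = ipaddress.ip_address("10.4.70.254")

# Both ends of a neutron allocation pool are usable addresses, hence the +1.
pool_size = int(end) - int(start) + 1
ports_in_use = 62
free = pool_size - ports_in_use
print(pool_size, free)  # 245 183
```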
fungi	it's possible that with random macs and decent churn we're overrunning the pool in dhcpd if the leases are established with too long of a timeout?	22:58
clarkb	oh maybe	22:58
clarkb	usually neutron leases are very short, but that isn't necessarily the case	22:58
clarkb	ianw: does your focal node say what the lease period is?	22:59
ianw	option dhcp-lease-time 86400;	23:00
clarkb	that is one day right? I wonder if that is the problem	23:00
ianw	option dhcp-renewal-time 43200;	23:01
ianw	option dhcp-rebinding-time 75600;	23:01
ianw	dunno what those are	23:01
fungi	yeah, that's a day. it really depends on the dhcpd though as to whether it will recycle leases which don't respond to ping/arp under pressure	23:01
clarkb	ianw: the renewal time is when the client should renew usually set to 1/2 the lease time	23:01
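The three values ianw found match the RFC 2131 defaults: T1 (renewal) is 0.5 times the lease duration and T2 (rebinding) is 0.875 times the lease duration. A quick check against the numbers from the focal node:

```python
LEASE = 86400  # dhcp-lease-time from the focal node, in seconds

t1 = int(LEASE * 0.5)    # renewal: client asks the original server to extend
t2 = int(LEASE * 0.875)  # rebinding: client broadcasts to any server
print(LEASE / 3600, t1, t2)  # 24.0 43200 75600
```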
clarkb	I'm trying to see where neutron might expose this and if we can see it as non cloud admins	23:02
fungi	however, if the api is claiming to have assigned an ip address for the nodes in question, then i don't expect it to be a pool problem	23:02
clarkb	looks like its a config option in the dhcp agent config	23:02
fungi	i want to say neutron sets up reservations in dnsmasq?	23:02
clarkb	not something exposed by the api?	23:03
ianw	not sure if the journal file will have the syslog	23:03
clarkb	fungi: yes it uses mac address maps that dnsmasq assigns	23:03
fungi	if it's really all explicit reservations then lease times are irrelevant	23:03
fungi	since it's not doing an actual dhcp "pool"	23:03
clarkb	ah ok.	23:03
fungi	(where allocation within the pool is left up to the dhcpd)	23:04
fungi	sounds like neutron is responsible for tracking allocations and just tells the dhcpd what's been assigned instead	23:04
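fungi's distinction can be sketched as two allocation models: a dynamic pool, where the DHCP server picks addresses and unexpired leases held by vanished random MACs can exhaust it, versus explicit MAC-to-IP reservations maintained by something else (as neutron does for dnsmasq), where lease time never decides who gets which address. Both classes are hypothetical toy models, not neutron or dnsmasq code.

```python
from typing import Dict, List, Optional


class DynamicPool:
    """Server-chosen addresses; a lease holds its address until it expires."""

    def __init__(self, addresses: List[str]) -> None:
        self.free = list(addresses)
        self.leases: Dict[str, str] = {}  # mac -> ip

    def request(self, mac: str) -> Optional[str]:
        if mac in self.leases:
            return self.leases[mac]
        if not self.free:
            # Exhausted: existing leases hold every address
            # (this toy never expires them).
            return None
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return ip


class Reservations:
    """Fixed mac -> ip map maintained outside the DHCP server."""

    def __init__(self) -> None:
        self.hosts: Dict[str, str] = {}

    def reserve(self, mac: str, ip: str) -> None:
        self.hosts[mac] = ip

    def release(self, mac: str) -> None:
        # The address is free as soon as the port is deleted, lease or not.
        self.hosts.pop(mac, None)

    def request(self, mac: str) -> Optional[str]:
        return self.hosts.get(mac)  # unknown macs get nothing
```

In the reservation model, a failed DHCP exchange points at a missing or wrong mapping rather than at pool pressure, which is why the lease times stop mattering.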
ianw	i think we should probably convert the journal to export from the start of boot, not from the time devstack started	23:14
clarkb	ianw: I think devstack does that since some people reuse the nodes for CI	23:15
clarkb	but in our case it would be fine	23:15
ianw	at the moment we're flying blind, but i guess there's nothing obvious/systematic, at least right now	23:16
ianw	i need to force merge the pypa project-config pipeline config	23:38
ianw	so trying the instructions	23:39
clarkb	ianw: why is that?	23:39
clarkb	oh its a config project with no jobs	23:39
clarkb	change adds the jobs	23:39
ianw	chicken egg because there's no pipeline config to merge before there's a config :)	23:39
ianw	"Members added to group Project Bootstrappers: n/a"	23:39
fungi	i think that's normal if the account running the set-members command isn't itself a member	23:40
clarkb	ianw: web ui says it added you	23:40
ianw	yeah, i think red herring because the email is "n/a"	23:41
fungi	aha	23:41
fungi	that makes sense	23:41
fungi	in a gerrit sort of way	23:41
clarkb	oh that reminds me one thing I ran into when testing new gerrit is it really wants an email on accounts	23:41
clarkb	so we may have to add email addrs to those accounts at some point	23:41
fungi	also you can use the ls-members command to look at the list of group members, if needed	23:42
clarkb	I had to set one to update the public key for the test project creator account	23:42
clarkb	fungi: we should test if the upgrade will break our admin accounts without email addrs set	23:42
clarkb	that is something we can test though	23:42
fungi	that could pose problems since gerrit also doesn't like accounts to share e-mail addresses, so all admins will need two e-mail addresses	23:42
clarkb	well we can also set it to a bogus value	23:43
clarkb	we don't actually need the review emails	23:43
clarkb	just need to convince it to not complain when setting things like public keys	23:43
clarkb	(my worry is there is a chicken and egg where we might not be able to change things like that because we need the email field to have something in it)	23:43
ianw	https://review.opendev.org/#/c/761681/ looks merged by ianw.admin, that's good	23:43
fungi	but also having it try to e-mail bogus addresses could be problematic	23:44
ianw	can probably use + addresses?	23:44
ianw	to make something valid but different	23:44
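ianw's "+ addresses" idea relies on subaddressing: many mail servers deliver `user+tag@domain` to `user@domain`, so each admin account can get a distinct but still deliverable address. Whether a particular mail server honours this is an assumption; `admin_address` and the example address are hypothetical.

```python
def admin_address(email: str, tag: str = "admin") -> str:
    """Turn user@example.org into user+admin@example.org."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an e-mail address: {email!r}")
    return f"{local}+{tag}@{domain}"
```

The result differs from the person's normal address, which is what a no-shared-e-mail rule needs, while mail still lands in the same mailbox.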
fungi	yeah, i mean it's no problem for me, i run my mailserver so i can add whatever addresses i want	23:44
clarkb	ya we'll sort it out on the test node	23:44
clarkb	its possible its a non issue too	23:44
ianw	https://zuul.opendev.org/t/pypa/status has all the pipelines	23:46
ianw	and removed. ++ to fungi for great instructions	23:47
fungi	+++ to gerrit's documentation	23:48
*** tosky has quit IRC		23:55