Friday, 2020-09-04

hashar	my ISP is probably too picky since that works fine from a gmail address	00:00
hashar	I tried with hashar@free.fr	00:00
hashar	so maybe there is a misconfiguration in your config or it is just my ISP being annoying	00:00
corvus	hashar: we got this from your mail server: "SMTP error from remote mail server after end of data: 550 spam detected"	00:01
hashar	ah yeah	00:01
hashar	that is my isp ;]	00:01
hashar	thank you for checking it!	00:01
corvus	hashar: np, sorry :(	00:01
hashar	it is one of the largest ISPs in France and they went in with a few hammers when it comes to dealing with inbound spam	00:01
hashar	anyway	00:01
hashar	the reason was to ask about the status of opendev/gear since it has a lot of small patches that could use review	00:02
*** cloudnull has joined #opendev		00:02
corvus	hashar: ah, i can try to take a pass through those soon. it mostly "just works" so i haven't really been looking	00:04
hashar	an idea I had was to write down a mail listing the patches and giving a brief overview for each of them	00:05
hashar	that might be less intimidating / easier to process them in bulk	00:05
* hashar mailed the postmaster		00:24
*** tkajinam has quit IRC		00:59
*** tkajinam has joined #opendev		00:59
*** elod has quit IRC		01:35
*** elod has joined #opendev		01:37
*** hashar has quit IRC		01:49
*** euclidsun has joined #opendev		02:53
*** euclidsun has left #opendev		02:58
*** zbr4 has joined #opendev		05:03
*** zbr has quit IRC		05:06
*** zbr4 is now known as zbr		05:06
*** ysandeep|away is now known as ysandeep		05:06
*** qchris has quit IRC		06:20
*** qchris has joined #opendev		06:33
*** Gyuseok_Jung has quit IRC		06:51
yoctozepto	morning	07:16
yoctozepto	how can infra help us (kolla) deal with the rate-limiting problem of docker.io - could we set up a caching docker registry?	07:17
*** tosky has joined #opendev		07:25
*** andrewbonney has joined #opendev		07:40
*** moppy has quit IRC		08:01
*** moppy has joined #opendev		08:03
*** hashar has joined #opendev		08:19
*** pushparajkvp has joined #opendev		08:19
*** dtantsur|afk is now known as dtantsur		08:24
*** moppy has quit IRC		08:28
*** moppy has joined #opendev		08:28
*** moppiner has joined #opendev		08:32
*** moppy has quit IRC		08:33
*** DSpider has joined #opendev		08:51
*** pushparajkvp has quit IRC		08:54
*** xiaolin has joined #opendev		09:15
*** xiaolin has quit IRC		09:28
*** stephenfin has quit IRC		10:27
*** hashar is now known as hasharAway		11:43
*** redrobot has quit IRC		12:10
*** Eighth_Doctor has quit IRC		12:21
*** mordred has quit IRC		12:22
*** mordred has joined #opendev		12:30
*** hasharAway has quit IRC		12:45
*** Eighth_Doctor has joined #opendev		13:00
*** lpetrut has joined #opendev		13:21
fungi	yoctozepto: we were talking about that yesterday (either in here or #openstack-infra, maybe both)	13:50
fungi	yoctozepto: docker has promised to publish recommendations for operators of ci systems as to how best to solve the problem, so we're mostly holding out for that	13:52
fungi	though if running a proxy registry does wind up being their recommended solution, i wonder if we should do a double-layered solution where we used a proxy registry to cache images somewhere centrally, and then pointed our current caching http proxies in each provider at that instead of at dockerhub. that way you get images cached near the nodes, but also have the caches hitting a registry which doesn't rate	13:54
fungi	limit them (we could even restrict access to it so only our http proxies were allowed to make requests if we needed to mitigate abuse)	13:54
frickler	fungi: yoctozepto: the docker blog says "To apply for an open source plan, please complete the short form here.", did anyone do that? Not sure whether they'll announce more details only to those that leave their data there, instead of publicly	13:56
*** ysandeep is now known as ysandeep|away		13:56
frickler	the form starts by asking for personal data including a docker id	13:57
frickler	oh, the form even says "Please complete our survey to get more information about how Docker can support your open source project on Docker Hub.	14:01
frickler	" at the top https://forms.gle/vvKURDTYwok7Pc4r5	14:02
fungi	i think we also assumed that it would require some sort of authentication to make use of a special plab	14:07
fungi	plan	14:07
*** qchris has quit IRC		14:13
*** qchris has joined #opendev		14:14
clarkb	to be clear we do already cache. The specific issue is we cache blobs not manifests. The old rate limits were based on blob fetches because they are the actual data but docker changed the rate limiting to be based on manifests because people found blob limits confusing	14:24
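
[Editor's sketch] The blob/manifest distinction above maps onto the URL paths of the Docker Registry HTTP API v2 (`/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>`). The following is a minimal illustration, not anything the opendev proxies actually run: the function names are hypothetical, and it assumes that only manifest fetches count against the limit, ignoring request method.

```python
import re

# Path shapes from the Docker Registry HTTP API v2:
#   /v2/<name>/manifests/<reference>
#   /v2/<name>/blobs/<digest>
_REGISTRY_PATH = re.compile(
    r"^/v2/(?P<name>.+)/(?P<kind>manifests|blobs)/(?P<ref>[^/]+)$"
)


def classify_registry_request(path):
    """Return "manifest", "blob", or "other" for a registry URL path."""
    match = _REGISTRY_PATH.match(path)
    if match is None:
        return "other"
    return "manifest" if match.group("kind") == "manifests" else "blob"


def count_manifest_fetches(paths):
    """Count requests that would hit a manifest-based rate limit.

    Assumption for illustration: every manifest path counts once and
    blob fetches do not count at all.
    """
    return sum(1 for p in paths if classify_registry_request(p) == "manifest")
```

A cache that only stores responses classified as "blob" still forwards every "manifest" request upstream, which is why the existing caches do not help with the new limit.
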
clarkb	it is unfortunate because we were doing the right thing for the previous situation	14:25
clarkb	and ya they promised a blog post specifically related to CI	14:25
fungi	new yesterday, https://discuss.python.org/t/pep-632-deprecate-distutils-module/5134 "Deprecate distutils module"	14:35
fungi	(in the ongoing setuptools/distutils saga)	14:35
fungi	er, i meant to link https://www.python.org/dev/peps/pep-0632/ but that's the discussion on their discourse	14:35
fungi	maybe this will finally force distros who want to be able to split files under their package management from those managed by pip et cetera to better come to a compromise with the upstream python devs and package ecosystem	14:37
fungi	since "just patch distutils" will cease to be an option	14:37
fungi	"Code that imports distutils will no longer work from Python 3.12."	14:39
fungi	that's going to be un	14:39
fungi	also fun	14:39
clarkb	and setuptools is vendoring distutils because it does import distutils?	14:42
*** priteau has joined #opendev		14:43
clarkb	for docker I expect we have two simple options in the short term. First is stop using our caches, then the requests will be distributed across many more IPs	14:57
clarkb	Second is set up per project accounts and then use those with the mirrors so that manifest fetches are associated to accounts not IPs but we still get blob caching for reliability (and perhaps speed)	14:58
*** lpetrut has quit IRC		15:06
fungi	setuptools is vendoring distutils because it needs new distutils features and doesn't want to have to maintain backward compatibility with whatever the implementations in various 5-year-old stdlib might be	15:12
fungi	and also as indicated by pep 632, the python stdlib maintainers would like to be able to stop maintaining it themselves (it's currently used for building the stdlib modules, but they're looking to switch to using makefiles directly like the interpreter does)	15:14
*** mlavalle has joined #opendev		15:14
*** rpittau is now known as rpittau|afk		15:18
openstackgerrit	Nate Johnston proposed openstack/project-config master: Make the Backport-Candidate field in Octavia reviews persist https://review.opendev.org/749986	15:19
*** hashar has joined #opendev		15:22
fungi	hrm, the old afs02.dfw cinder volumes i cleaned up went into error_deleting state for some reason	15:24
fungi	i don't think afs01.dfw's did that	15:25
fungi	#status log all four cinder volumes for afs02.dfw have been replaced and cleaned up	15:26
openstackstatus	fungi: finished logging	15:26
fungi	i'll get to work on the dfw mirror server's volume shortly	15:26
clarkb	infra-root I intend to catch up on email and any review response, then pop out for a bike ride. When I get back I plan to try booting an nb03.opendev.org server in linaro-us which can serve as our new dockerized nodepool builder for arm	15:37
fungi	awesome	15:38
clarkb	I do wonder if I should boot a nb05.opendev.org instead to avoid the hostname conflicts but iirc we fixed that in nodepool	15:40
clarkb	and maybe this is a good test of that	15:40
clarkb	fungi: re error deleting volumes, did they successfully detach from the VM at least?	15:48
*** pushparajkvp has joined #opendev		15:57
*** dtantsur is now known as dtantsur|afk		16:06
fungi	yep, or at least cinder thought they did	16:12
fungi	it reported them as "available" rather than "in-use"	16:12
fungi	ick, the dfw mirror recorded a page allocation failure in xenwatch the same time i attached its new volume	16:16
fungi	infra-root: ^ should we turn down that region temporarily and reboot the mirror?	16:17
clarkb	is it persistently unhappy?	16:18
clarkb	page allocation failures would indicate some type of OOM?	16:18
fungi	i have a feeling the new volume wasn't successfully hot-added	16:18
clarkb	ah	16:18
fungi	that was the first entry in dmesg in roughly a month	16:19
fungi	and it didn't claim to be out of memory, or even close	16:19
clarkb	ya unless you want to quickly write a partition table and mkfs and do some tests a reboot sounds practical (to be clear I'm saying reboot is probably simpler and easier than the alternative)	16:21
yoctozepto	fungi, frickler, clarkb: I haven't done that form for sure; glad to know there is plan to have some cache; it might be beneficial in general - dockerhub likes to go awry; please ping me wherever you discuss docker issues, I won't mind but rather be thankful :-)	16:23
openstackgerrit	Jeremy Stanley proposed openstack/project-config master: Temporarily disable rax-dfw for mirror reboot https://review.opendev.org/749993	16:24
openstackgerrit	Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily disable rax-dfw for mirror reboot" https://review.opendev.org/749994	16:24
fungi	i'll wip the revert for now	16:25
openstackgerrit	Nate Johnston proposed openstack/project-config master: Allow copyAnyScore in gerrit ACLs https://review.opendev.org/749995	16:31
openstackgerrit	Nate Johnston proposed openstack/project-config master: Make the Backport-Candidate field in Octavia reviews persist https://review.opendev.org/749986	16:34
fungi	while we wait for a safe mirror reboot, i'll work on etherpad, and then maybe gerrit after that since hopefully activity level will be dropping headed into the weekend so if there is any (unlikely) disruption from the pvmove there it won't be too painful	16:41
*** hashar has quit IRC		17:11
openstackgerrit	Merged openstack/project-config master: Temporarily disable rax-dfw for mirror reboot https://review.opendev.org/749993	17:12
fungi	etherpad pvmove is in progress under a root screen session now	17:13
fungi	shouldn't take long, it's a 50gb ssd	17:14
fungi	not like the afs servers where we have 4tb attached	17:14
fungi	already 10% complete	17:14
fungi	#status log cinder volume for etherpad01 has been replaced and cleaned up	17:28
openstackstatus	fungi: finished logging	17:28
fungi	clarkb: if we're worried about i/o performance, should we give review.o.o an ssd volume instead of sata?	17:34
fungi	easy enough to do while i'm replacing anyway	17:34
fungi	it might help with the upgrade	17:34
fungi	looks like the rax-dfw max-servers was zeroed at 17:25z, so in use counts there are dwindling	18:07
fungi	once they bottom out i'll reboot it and then may as well do its pvmove before bringing it back into service	18:08
fungi	demand's not that high now anyway	18:08
fungi	and after that maybe we'll have an idea of whether we want to make changes to the volume for review.o.o	18:08
*** pushparajkvp has quit IRC		18:11
*** andrewbonney has quit IRC		18:22
clarkb	fungi: I thought it was an ssd volume already	18:42
clarkb	but yes I think we want an ssd volume for the upgrade process	18:43
fungi	it's a 200gb sata volume right now. i could make it a... 256gb? ssd	18:43
fungi	it's around half-used currently, but i figure we've got db content moving into it too	18:44
fungi	when the notedb migration happens that is	18:44
clarkb	ya one of the things we'll need to sort out is how much extra disk we need	18:44
clarkb	one reason the current volume is so full is we've got a bit of old stuff in /home/gerrit2	18:44
clarkb	I cleaned up some of that recently though	18:44
clarkb	~250gb seems fine and if we need more we can always attach another	18:45
fungi	or migrate to a larger volume, yep	18:46
fungi	down to 7 nodes in-use for rax-dfw	18:47
fungi	though we have a bunch of nodes there which look to be stuck in a "deleting" state according to grafana/graphite/nodepool	18:48
fungi	the cinder volume for review.o.o is undergoing pvmove to a 256gb ssd in a root screen session	18:50
fungi	hopefully should be done within the hour	18:50
fungi	it's already 3% complete	18:50
clarkb	it looks like the cinder volume on nb03.openstack.org isn't lvm'd	18:51
fungi	by then we ought to be clear to do the mirror	18:51
clarkb	I think I'll lvm the new server and if that's wrong because of arm64 we'll sort that out	18:51
fungi	yeah, having the lvm layer in place allows us to do stuff like this ;)	18:51
*** zbr0 has joined #opendev		18:52
fungi	once the pvmove on review.o.o is done and cleaned up, we can either do the logical volume and filesystem resize, or just wait and remember we want to do that during the next gerrit maintenance	18:53
fungi	but risk for online resize is low in my opinion	18:53
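
[Editor's sketch] The volume replacement described above follows the usual LVM sequence: add the new physical volume to the volume group, `pvmove` the extents off the old one, drop the old one, then optionally grow the logical volume and filesystem. The helper below only builds the command list for review and does not run anything. The volume group name, device paths and logical volume path in the usage note are hypothetical placeholders, not the values used on review.o.o.

```python
import shlex


def lvm_migration_commands(vg, old_pv, new_pv, lv_path=None):
    """Build the LVM commands to move a volume group onto a new disk.

    Nothing is executed; the caller decides how to run the commands.
    """
    cmds = [
        ["pvcreate", new_pv],            # initialise the new disk as a PV
        ["vgextend", vg, new_pv],        # add it to the volume group
        ["pvmove", old_pv, new_pv],      # migrate extents while online
        ["vgreduce", vg, old_pv],        # drop the old PV from the group
        ["pvremove", old_pv],            # wipe the LVM label from the old disk
    ]
    if lv_path is not None:
        # -r (--resizefs) asks lvextend to grow the filesystem as well
        cmds.append(["lvextend", "-r", "-l", "+100%FREE", lv_path])
    return cmds


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in lvm_migration_commands("main", "/dev/xvdb", "/dev/xvdc",
                                      lv_path="/dev/main/gerrit"):
        print(shlex.join(cmd))
```

Because `pvmove` can take a long time on large volumes, running it under a screen session as described in the log keeps it alive if the SSH connection drops.
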
*** zbr has quit IRC		18:53
*** zbr0 is now known as zbr		18:53
*** priteau has quit IRC		19:00
clarkb	launch node is running (I had to specify a network to make it work but otherwise smooth sailing so far)	19:02
clarkb	hrm ping6 to wiki failed	19:04
clarkb	I can do --ignore_ipv6 which maybe is necessary in this cloud (it is for ovh)	19:04
clarkb	I'll see where that gets me I guess	19:05
fungi	aha, yeah ping6 in ovh will likely fail out of the gate	19:05
fungi	we should see if we can think up a round-trip mechanism to push a working v6 config to the instances there	19:05
clarkb	we can configure it statically with launch node there	19:06
fungi	ought to be able to query the nova api and then generate something to scp onto it	19:06
clarkb	(still not sure what is up with linaro-us ipv6 but we can always scrap this server and start over if necessary. Figure some progress is better than none)	19:06
fungi	review pvmove is 60% complete	19:09
fungi	0 nodes in use for rax-dfw. i'll reboot it now	19:11
clarkb	now I seem to have a conflict between make swap script and mount volume script	19:12
fungi	#status log rebooted mirror01.dfw.rax to resolve a page allocation failure during volume attach	19:13
openstackstatus	fungi: finished logging	19:13
clarkb	the issue is that /dev/vdb is the device and make_swap.sh assumes it owns that, then mount_volume.sh tries to use the same device and fails	19:14
clarkb	Instead of using launch_node to mount and configure the volume I'll do that after I've booted the instance and launch node is happy	19:14
clarkb	I need to pop out for a few minutes and finish lunch. I'll try again without automated volume mounting after	19:17
fungi	the old sata volume for review.o.o went into error_deleting too, but it's migrated	19:31
fungi	#status log cinder volume for review.o.o has been replaced, upgraded from 200gb sata to 256gb ssd, and cleaned up	19:32
openstackstatus	fungi: finished logging	19:32
fungi	i went ahead and resized the logical volume and fs to fill it	19:35
fungi	activity seems pretty low at the moment	19:35
fungi	infra-root: ^ just in case anyone spots anything amiss	19:36
fungi	the /home/gerrit2 fs is now 36% used	19:36
fungi	so we've got lots of headroom	19:36
fungi	pvmove for the mirror01.dfw.rax volume is underway in a root screen session on the server	19:40
clarkb	thank you for keeping us up to date	19:42
clarkb	I've just started a new launch without volume automation to get around make swap and mount volume fighting	19:42
fungi	yw	19:42
clarkb	server booted successfully this time. And it got its ipv6 address. It appears that the RAs may have a delay which is why it failed before	20:00
clarkb	we may want launch node to have some reasonable timeout for global ipv6 addr to show up via RAs but will worry about that once I've got this all sorted	20:01
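
[Editor's sketch] One way to express "wait a reasonable time for a global IPv6 address to appear" is a bounded polling loop. This is a generic illustration and not the change later proposed to system-config. `get_addresses` is a hypothetical callable returning the host's current addresses as strings, and the timeout and interval defaults are arbitrary.

```python
import ipaddress
import time


def has_global_ipv6(addresses):
    """True if any address string is a globally routable IPv6 address."""
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip anything that does not parse as an address
        if ip.version == 6 and ip.is_global:
            return True
    return False


def wait_for_global_ipv6(get_addresses, timeout=60.0, interval=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_addresses() until a global IPv6 address shows up.

    Returns True on success, False if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        if has_global_ipv6(get_addresses()):
            return True
        if clock() >= deadline:
            return False
        # Say why we are waiting so the delay is not mistaken for a hang.
        print("waiting for a global IPv6 address "
              "(router advertisements may be delayed)")
        sleep(interval)
```

Link-local addresses such as `fe80::1` do not satisfy the check, so the loop keeps waiting until a router advertisement has produced a global address or the timeout is reached.
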
clarkb	fungi: re uuids and devfs it's the /dev/disk/by-id path but it's not strictly 1:1	20:04
fungi	yeah, i think that makes sense (so long as we output a message saying that's the delay)	20:04
clarkb	you get an entry there that is virtio-truncateduuid	20:04
fungi	oh, fun	20:04
clarkb	this is on kvm though	20:04
fungi	yep, in rackspace (maybe because of xen?) we get no uuid for raw disk, only for partitions	20:04
fungi	at least according to the kernel	20:05
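
[Editor's sketch] The "truncated uuid" naming can be illustrated as follows. It assumes a KVM guest using virtio-blk, where the disk serial is limited to 20 bytes, and a cloud that sets the serial to the volume UUID. Neither assumption holds on the Xen-based Rackspace hosts mentioned above, and the function name is hypothetical.

```python
# virtio-blk exposes at most 20 bytes of disk serial to the guest
VIRTIO_SERIAL_MAX = 20


def expected_by_id_path(volume_uuid):
    """Guess the /dev/disk/by-id entry for an attached volume.

    Assumption: the hypervisor uses the volume UUID as the disk serial,
    so the guest sees only its first 20 characters.
    """
    return "/dev/disk/by-id/virtio-" + volume_uuid[:VIRTIO_SERIAL_MAX]
```

Since only a prefix of the UUID survives, the mapping from by-id entry back to volume is a prefix match rather than an exact one, which is the "not strictly 1:1" caveat.
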
fungi	pvmove on the rax-dfw mirror is about half done	20:05
clarkb	ok I think the server is all good now with a volume attached and part of lvm and fstab. Working on the changes needed to turn it into a builder now	20:08
openstackgerrit	Clark Boylan proposed opendev/zone-opendev.org master: Add nb03.opendev.org to DNS https://review.opendev.org/750037	20:16
openstackgerrit	Clark Boylan proposed opendev/system-config master: Remove nodepool builder puppetry and nb03.openstack.org https://review.opendev.org/749853	20:23
openstackgerrit	Clark Boylan proposed opendev/system-config master: Add nb03.opendev.org https://review.opendev.org/750039	20:23
clarkb	there is a whole depends on chain in there to ensure we don't configure the nb03 with the wrong config	20:23
clarkb	if we do that I think it may delete images in all the clouds (because of our default config?)	20:23
fungi	yep	20:24
clarkb	and I'll WIP the last one in the stack which is the removal of the puppetry	20:24
fungi	that got ugly last time	20:24
clarkb	(we should only do that once we are happy with the new server)	20:24
clarkb	It is a holiday on monday	20:49
clarkb	chances are I'll end up being around but maybe not as much as a typical monday	20:49
clarkb	infra-root https://review.opendev.org/#/c/744821/ is a useful launch node change from ianw to add sshfp info to the output of the script	21:05
clarkb	I had to manually figure that out (based on what the script does)	21:05
fungi	yeah, i think christine doesn't expect me to be on the computer much monday, but i'll still try to be around for emergencies	21:24
openstackgerrit	Clark Boylan proposed opendev/system-config master: Wait for ipv6 addrs when launching nodes https://review.opendev.org/750049	21:27
clarkb	that change feels hacky but should do the job	21:27
fungi	#status log cinder volume for mirror01.dfw.rax.o.o has been replaced and cleaned up	21:32
openstackstatus	fungi: finished logging	21:32
fungi	and i've approved the revert of its max-servers zeroing	21:33
fungi	after quickly browsing around it and making sure it's not obviously broken	21:33
openstackgerrit	Merged openstack/project-config master: Revert "Temporarily disable rax-dfw for mirror reboot" https://review.opendev.org/749994	21:39
*** mlavalle has quit IRC		22:59
*** tosky has quit IRC		23:07
*** DSpider has quit IRC		23:41
