Wednesday, 2022-06-01

*** rlandy\|bbl is now known as rlandy		00:03
opendevreview	Ian Wienand proposed opendev/glean master: redhat-ish platforms: refactor simplification of interface writing https://review.opendev.org/c/opendev/glean/+/844148	00:35
ianw	clarkb: ^ thanks, that was the refactor you suggested. i have to walk away from this or I'll end up rewriting the whole thing, which is probably not time well spent.	00:36
fungi	you'd distil it down to merely a glimmer	00:37
ianw	the next thing it needs to do is switch to "keyfile" ini-style NetworkManager files, instead of ifcfg-* format. clearly no development is happening on the NM plugin that reads ifcfg-* files, but I also don't see why any of it would need to be broken	00:37
*** rlandy is now known as rlandy\|out		01:19
*** rlandy is now known as rlandy\|out		01:24
ianw	thanks for the glean reviews. i tagged and pushed 1.22.0 which sets the foundation for rh ipv6 but is intended to have no functional changes. so i'll keep an eye on things, and we can push the actual changes in a day or two	01:36
fungi	thanks for all your hard work on it so far!	01:36
fungi	looks like we're on to ze03 now	01:38
opendevreview	Merged opendev/system-config master: Update Gerrit images to 3.4.5 and 3.5.2 https://review.opendev.org/c/opendev/system-config/+/843298	02:04
ianw	^ i can do a quick gerrit pull and restart for that in an hour or two when it's super quiet	02:55
ianw	docker inspect 23534ba51fc3 | grep opendevorg/gerrit@sha	03:59
ianw	"opendevorg/gerrit@sha256:e114ec73aa90e04f0611609f34f585b269bea766d42e1b57d150987b5d450864"	03:59
ianw	lines up with https://hub.docker.com/layers/opendevorg/gerrit/3.4/images/sha256-e114ec73aa90e04f0611609f34f585b269bea766d42e1b57d150987b5d450864?context=explore	03:59
ianw	i'll restart it now	04:01
ianw	#status log Restarted gerrit with 3.4.5 (https://review.opendev.org/c/opendev/system-config/+/843298)	04:04
opendevstatus	ianw: finished logging	04:04
*** marios is now known as marios\|ruck		05:05
*** ysandeep\|out is now known as ysandeep		06:14
opendevreview	Rodolfo Alonso proposed openstack/project-config master: Remove lower-constraints and tox-py36 from Neutron Grafana https://review.opendev.org/c/openstack/project-config/+/844254	07:43
Tengu	1	08:02
*** ysandeep is now known as ysandeep\|lunch		08:09
*** pojadhav is now known as pojadhav\|lunch		08:16
*** pojadhav\|lunch is now known as pojadhav		08:45
*** marios\|ruck is now known as marios\|ruck\|afk		08:55
*** ysandeep\|lunch is now known as ysandeep		09:26
*** jpena\|off is now known as jpena		09:45
*** marios\|ruck\|afk is now known as marios\|ruck		09:46
*** rlandy\|out is now known as rlandy		10:18
*** rlandy_ is now known as rlandy__		10:24
*** pojadhav is now known as pojadhav\|afk		11:12
mgariepy	good morning	11:45
mgariepy	is the centos9-stream hold available for 844037 change ?	11:46
fungi	mgariepy: just a sec and i'll take a peek	11:50
fungi	mgariepy: ssh root@149.202.172.204	11:52
mgariepy	great thanks a lot :D	11:53
fungi	yw	11:53
*** dviroel\|afk is now known as dviroel		12:21
mgariepy	thanks fungi i did find the issue.	12:42
mgariepy	https://zuul.opendev.org/t/openstack/build/ffb71d3850284a028fe4329c2c4abb20/log/logs/openstack/aio1_galera_container-170eefd4/mariadb.service.journal-21-15-29.log.txt#150	12:42
frickler	"find not found" sounds ... nice ;)	12:54
mgariepy	lol	12:56
mgariepy	yep indeed	12:56
mgariepy	thanks again for your help on this one.	12:57
fungi	mgariepy: so you're done with the held node, or still experimenting?	12:59
mgariepy	i'm done with it.	12:59
fungi	also don't forget, if it's just a matter of not being sure you have a representative test environment, we do publish our vm images you can download	12:59
mgariepy	where are the images?	13:00
fungi	mgariepy: https://nb01.opendev.org/images/ and https://nb02.opendev.org/images/ depending on which builder built a particular image (they grab the build requests at random)	13:02
fungi	also https://nb03.opendev.org/images/ for the aarch64 (arm64) images	13:02
fungi	you'd have to do something to get your ssh key into them (configdrive metadata or editing the images)	13:03
mgariepy	ha cool i didn't know that they were published.	13:03
frickler	worth noting that these require to be booted with a config-drive for setup, they don't do cloud-init (I think)	13:03
mgariepy	where is the config for the building of the image?	13:04
fungi	they can get away without configdrive if you have dhcp/slaac for dynamic network configuration	13:04
fungi	mgariepy: nodepool builds the images by calling diskimage-builder, but the basic parameters for that are configured here: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nodepool.yaml#L103	13:05
fungi	most of the elements listed are part of dib's stdlib, but a few (like infra-package-needs) are custom here: https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements	13:07
mgariepy	cool i'll take a look to try them on my servers so i can stop asking for hold : ) haha	13:08
fungi	yeah, if you have an openstack you can just upload those to glance and set the appropriate metadata for your ssh key, then boot them with configdrive enabled	13:09
Clark[m]	It is also worth noting that you should look at the logs jobs produce and determine if they are sufficient or need to be improved. In this case the error was logged	13:10
fungi	right, and obviously if there's information you're missing which would have helped to find that, gathering those additional logs in the job would be a great idea	13:10
Clark[m]	Looking at zuul restart progress as soon as ze12 is paused I think we can test the rax swift uploads as ze12 won't schedule new jobs	13:11
fungi	Clark[m]: agreed. what's the simplest way to exercise base-test?	13:11
fungi	we just need a change in an untrusted repo which would normally run something directly parented to base, i guess, and reparent it	13:11
fungi	but wondering if you happen to know a good one off the top of your head	13:12
Clark[m]	fungi: I typically use a DNM change against zuul-jobs swapping out base for base-test on its unittests iirc	13:12
fungi	maybe something in zuul/zuul-jobs	13:12
fungi	yeah that would work	13:12
fungi	i'll get that pushed now	13:12
*** pojadhav\|afk is now known as pojadhav		13:12
opendevreview	Jeremy Stanley proposed zuul/zuul-jobs master: DNM: exercise base-test job https://review.opendev.org/c/zuul/zuul-jobs/+/844291	13:14
*** dviroel_ is now known as dviroel		13:15
Clark[m]	https://zuul.opendev.org/t/zuul/stream/1906f09b18644882971112de85f54ef4?logfile=console.log if that uploads to rax it is running on ze04	13:24
Clark[m]	It has reported and looks like it uploaded to rax and the logs are viewable	13:29
Clark[m]	Someone not on a phone should double check all that :) but we may need to bring this up with openstacksdk next if that all checks out	13:30
*** marios\|ruck is now known as marios\|ruck\|call		13:32
fungi	i'll recheck it again just so we have a bit more data	13:47
opendevreview	Merged openstack/project-config master: Add a repository for the Large Scale SIG https://review.opendev.org/c/openstack/project-config/+/843534	14:01
corvus	ze12 is still running, so there's still a small chance it could run a job	14:13
*** rlandy__ is now known as rlandy		14:16
fungi	yeah, i'll check that the examples didn't come from there	14:22
fungi	we're still waiting for ze11 to stop completely	14:23
Clark[m]	ze12 is paused now. All new jobs should run with old openstacksdk	14:58
*** ysandeep is now known as ysandeep\|out		14:58
*** marios\|ruck\|call is now known as marios\|ruck		15:06
*** dviroel is now known as dviroel\|lunch		15:08
clarkb	tox-py27 and tox-py38 on fungi's recheck both uploaded to rax and they have logs	15:31
clarkb	the other tox jobs uploaded to ovh and also have logs	15:31
clarkb	I'm pretty much convinced now that openstacksdk is deleting our metadata somehow	15:31
clarkb	er openstacksdk 0.99.0	15:31
clarkb	gtema: fyi ^ would it help to send email to the discuss list about that or is filing an issue better or?	15:32
clarkb	fungi: and I think we can go ahead and revert the rax removal from base jobs?	15:33
fungi	clarkb: i think so, still seems to be working	15:33
fungi	checking to see if i proposed the revert as wip	15:34
opendevreview	Jeremy Stanley proposed opendev/base-jobs master: Revert "Temporarily stop uploading logs to Rackspace" https://review.opendev.org/c/opendev/base-jobs/+/844316	15:36
fungi	clarkb: ^	15:36
clarkb	I've gone ahead and approved that as ze12 won't run any new jobs now	15:41
opendevreview	Merged opendev/base-jobs master: Revert "Temporarily stop uploading logs to Rackspace" https://review.opendev.org/c/opendev/base-jobs/+/844316	15:47
*** marios\|ruck is now known as marios\|out		15:53
*** dviroel\|lunch is now known as dviroel		16:20
clarkb	corvus: I think we might be stuck stopping mergers still	16:36
clarkb	corvus: it looks like the merger restarted instead of stopping?	16:37
clarkb	ansible is still waiting on it to stop so I don't know what happened there	16:37
clarkb	maybe docker restarted it too quickly for our wait to notice?	16:38
fungi	oh, hrm	16:42
clarkb	ok ya the container id never changed the process just restarted	16:42
clarkb	the way we wait for the container to stop is we expect the container to go away?	16:43
clarkb	so I think there is a mismatch in how the graceful stop works now and how docker is reporting the container presence	16:43
fungi	though docker-compose logs for it doesn't mention any restart	16:43
fungi	yeah, i concur, i don't think docker restarted it	16:43
clarkb	docker ps -a says it is up 16 minutes	16:43
clarkb	and ps proper seems to correlate with that	16:43
fungi	oh, docker restarted it but didn't percolate into the docker-compose logs?	16:44
fungi	s/oh/or/	16:44
clarkb	https://paste.opendev.org/show/b972p4Kp9z2t6psFkHmZ/	16:45
clarkb	`docker-compose ps -q | xargs docker wait` is what we are waiting on so it's not the logs that matter but the listing	16:45
clarkb	restart: always <- is set on the service	16:46
clarkb	I think that is the issue	16:46
clarkb	ya executors set restart: on-failure	16:47
clarkb	let me push a fix then once that lands we can manually trigger a merger stop again which the ansible playbook should catch allowing it to continue	16:47
clarkb	hrm schedulers are also restart always. How did they work before?	16:48
clarkb	ah because we down the scheduler rather than doing a graceful thing	16:48
clarkb	so ya one sec change incoming	16:48
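Editor's note: the mismatch discussed above can be sketched in a few lines of Python. This is a deliberately simplified model of Docker's restart policies, not Docker's implementation; the function name `would_restart` is hypothetical, and the model ignores `unless-stopped`, retry limits, and manual stops. Under `restart: always` a process that exits cleanly is started again in the same container, which is consistent with the wait never finishing in the log above, while `on-failure` only restarts after a non-zero exit code.

```python
def would_restart(policy: str, exit_code: int) -> bool:
    """Simplified model: is the process started again after it exits?"""
    if policy == "always":
        # Restarted regardless of how the process exited.
        return True
    if policy == "on-failure":
        # Restarted only when the process reports an error.
        return exit_code != 0
    if policy == "no":
        return False
    raise ValueError(f"unhandled policy: {policy}")


# A graceful stop is assumed to exit with code 0.
print(would_restart("always", 0))      # True
print(would_restart("on-failure", 0))  # False
print(would_restart("on-failure", 1))  # True
```

In this model a graceful stop only leaves the container stopped under `on-failure`, which matches the executor setting mentioned in the log.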
opendevreview	Clark Boylan proposed opendev/system-config master: Fix zuul merger graceful stops https://review.opendev.org/c/opendev/system-config/+/844320	16:51
corvus	clarkb: ah yep, lgtm	16:54
*** jpena is now known as jpena\|off		17:10
clarkb	infra-root https://etherpad.opendev.org/p/3Rmqg-Tbb8qOqi1nFQp1 that's an openstack-discuss email about openstacksdk 0.99.0 and the swift uploads. Does that draft look good?	17:32
clarkb	also looking at the script I wonder if delete after was not set on those objects either	17:34
clarkb	which means we'll have leaked the job logs from those days potentially	17:34
frickler	gtema: ^^	17:42
gtema	I would be looking at that, was waiting to get bit more details	17:43
gtema	For sure 0.99 changes things and also on this front, but I really wonder that it now fails for particular type of cloud only	17:44
gtema	Would be good to think about testing possibility	17:44
frickler	well it seems to fail only for rax and not for ovh	17:44
clarkb	well it looks like we have to specifically set a header on each object in rax to get the cors headers (linked to in that etherpad)	17:45
clarkb	which is why I'm wondering if we just aren't setting those headers anymore	17:46
frickler	so possibly would need some historic version of swift to reproduce	17:46
clarkb	which is why I also wonder about the delete after values	17:46
fungi	rackspace's swift wasn't ever actually swift, from what i gather	17:46
frickler	clarkb: but setting the headers on ovh continued to work? or don't we need them there?	17:46
clarkb	frickler: we don't appear to need them there (they must either default to the right cors value or maybe they consult the index.html value for the container top level)	17:47
corvus	the headers are required for allowing access through rackspace's cdn, which is the only way to have anonymous public access in rax. that's the main difference.	17:47
clarkb	https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-logs-base/library/zuul_swift_upload.py#L108-L129 is the other place we set these headers but that happens once per container and all our containers would've had that happen long ago	17:47
fungi	i suppose we could record the api interactions at debug level with openstacksdk 0.61.0 and 0.99.0 and compare them	17:48
corvus	however, if the mechanism for setting those is broken in general, then it's possible that the x-delete-after header was not being set on any cloud uploads, so as clarkb suggested, we may have objects in all of our clouds which will not expire automatically.	17:48
clarkb	fungi: ya that might be a good way to test it too. Can check that all expected headers make it outbound	17:48
clarkb	corvus: exactly	17:48
fungi	clarkb: small edits to the pad, also those urls are not permalinks so may change before people read them	17:51
clarkb	fungi: good point I can fix the links	17:51
clarkb	I'll also add a note about x-delete-after too	17:51
clarkb	alright sending that out	17:53
fungi	thanks!	17:56
clarkb	we can track progress on the problem there. To be clear I'm not sure how much I'll be able to debug for the next while as other things are distracting too :) I think we're likely stable on 0.61.0 though	17:56
timburke_	from a swift perspective, i would've expected CORS to be controlled via X-Container-Meta-Access-Control-Allow-Origin set at the container level, not Access-Control-Allow-Origin set on individual objects	18:04
gtema	I am pretty confident x-delete was working earlier, since i rely on that heavily in my cloud. Will anyway have a deeper analysis tomorrow (will try to bisect what particularly changed in 0.99)	18:04
timburke_	rax might be doing something different, though -- while they certainly used to run swift, they might be running hummingbird for cloudfiles these days. even with vanilla swift, though, you could probably update the allowed_headers configured on the object-server to have that CORS header stick -- i just didn't know of anyone that did that	18:04
clarkb	timburke_: according to the comments its specific to their CDN	18:05
gtema	The only thing that immediately came to my mind is eventually changed case for header names	18:06
clarkb	timburke_: basically we're setting an object metadata/header value in swift to affect another service	18:06
clarkb	timburke_: and ya we set it at the container level too https://opendev.org/zuul/zuul-jobs/src/commit/e69d879caecb454c529a7d757b80ae49c3caa105/roles/upload-logs-base/library/zuul_swift_upload.py#L112-L117	18:07
clarkb	however we use something like 4096 containers to shard the logs across and all of those would've been created a long time ago so hard to say if that is also affected here	18:07
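Editor's note: a minimal sketch of how a count like 4096 can arise, assuming containers are named with a fixed prefix plus the first three hexadecimal characters of the build UUID. That naming scheme is an assumption made for illustration (the log only says "something like 4096 containers"), and both the `container_for` helper and the `logs_` prefix are hypothetical.

```python
import itertools

HEX_DIGITS = "0123456789abcdef"


def container_for(build_uuid: str, prefix: str = "logs_") -> str:
    """Pick a container from the first three hex characters of the UUID."""
    return prefix + build_uuid[:3].lower()


# Three hex characters give 16 ** 3 possible container names.
all_names = {
    container_for("".join(chars))
    for chars in itertools.product(HEX_DIGITS, repeat=3)
}
print(len(all_names))  # 4096
print(container_for("abcdef0123456789"))  # logs_abc
```

Because the shard depends only on the build UUID, any uploader can find the container again without a lookup table, and each container only needs its container-level headers set once when it is created.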
timburke_	👍 yeah, the CDN stuff can definitely cause headaches, too	18:07
gtema	Clarkb: do you have a chance to look at any of the "broken" containers/objects in rax to see if they have any headers set?	18:10
clarkb	let me see	18:11
clarkb	have to find an object from last week	18:11
gtema	No hurry, thanks. I am anyway already off for today	18:12
clarkb	ya enjoy your evening. we've managed to work around it for now	18:12
clarkb	hrm https://d321133537aef6ff2c0f-8ffa80ef1885272f8fa2b55d06420ca4.ssl.cf2.rackcdn.com/837180/7/check/designate-bind9-stable-xena/554a978/ is an example but I'm not sure how to map that to a container or cloud	18:12
*** rlandy is now known as rlandy\|mtg		18:13
gtema	Ok, that should work. Regular curl should help here	18:13
clarkb	gtema: I'm not sure if that will pass through all of the swift metadata though. I'm trying to map it to a swift container so I can check that directly but we'll see how successful I am	18:15
gtema	well, I clearly see that X-Delete-At is set on container and any random object	18:16
gtema	and Access-Control-Allow-Origin: "*" is set on root	18:17
clarkb	for the objects at the link I just provided? I don't see either	18:18
gtema	clarkb: are you sure this is bad example?	18:18
clarkb	but maybe you don't get them by default with a basic GET	18:18
clarkb	gtema: yes https://zuul.opendev.org/t/openstack/build/554a978fa1f346ddb89aea349cd4d76b fails to load and according to the console it is due to CORS and if you click the view log link top right you get the url above	18:19
clarkb	and there definitely isn't cors headers set	18:19
gtema	ok, I see	18:20
clarkb	ya ok curl shows the X-Delete-At but firefox didn't	18:20
clarkb	gtema: https://e6bafb017d035c5dec65-4d6af7aae3d7a953b788d97e53f5e54e.ssl.cf1.rackcdn.com/844291/1/check/tox-py27/5422451/ is a working example uploaded by 0.61.0	18:21
fungi	did something else from that buildset upload to ovh so we can compare? are we also missing the headers there?	18:21
clarkb	Access-Control-Allow-Origin: * is present there	18:21
gtema	yes, I see	18:21
clarkb	https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_780/837180/7/check/designate-bind9-stable-yoga/780677a/ is an ovh build result from the same buildset as the failing rax 0.99.0 case	18:22
gtema	https://opendev.org/zuul/zuul-jobs/src/commit/e69d879caecb454c529a7d757b80ae49c3caa105/roles/upload-logs-base/library/zuul_swift_upload.py#L227 - there is even explicit "hack" for rax in the role	18:23
clarkb	gtema: yes I called that out in my email	18:23
gtema	ah, right	18:23
clarkb	interesting the ovh case also has CORS errors in the console but it loads the logs in the dashboard anyway	18:25
clarkb	ah but it is only for a specific file which apparently zuul doesn't need and it isn't fatal?	18:26
opendevreview	Merged opendev/system-config master: Fix zuul merger graceful stops https://review.opendev.org/c/opendev/system-config/+/844320	18:26
gtema	ugh, I think I see the issue: access-control-allow-origin is not matching expected prefix for supported object headers (https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/object_store/v1/_base.py#L37)	18:27
fungi	once that ^ deploys, we'll need to manually down the zuul-merger container on zm01 but after that we should be good?	18:27
gtema	I will test this carefully tomorrow	18:27
*** artom_ is now known as artom		18:27
fungi	gtema: is there a reason for having an allowed list of object headers? i thought you could include any arbitrary header	18:28
gtema	well, pretty much API description of Swift that tells that object metadata starts with X-Object-Meta	18:29
gtema	there is set of additional "system" headers https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/object_store/v1/obj.py#L23 but cors headers are not in there	18:29
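Editor's note: the kind of client-side filtering described above can be sketched as follows. This is an illustration only, not openstacksdk's code: the `filter_headers` helper is hypothetical, and the `SYSTEM_HEADERS` set is an invented stand-in for the "system" header list mentioned in the log. It shows how a header without the `X-Object-Meta-` prefix and absent from the allow-list gets dropped before the request is sent.

```python
OBJECT_META_PREFIX = "x-object-meta-"

# Hypothetical allow-list, chosen for illustration only.
SYSTEM_HEADERS = {
    "content-type",
    "content-encoding",
    "x-delete-after",
    "x-delete-at",
}


def filter_headers(headers: dict) -> dict:
    """Keep only metadata-prefixed headers and allow-listed system headers."""
    kept = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower.startswith(OBJECT_META_PREFIX) or lower in SYSTEM_HEADERS:
            kept[name] = value
    return kept


requested = {
    "content-type": "text/plain",
    "x-delete-after": "2592000",
    "x-object-meta-build": "example",
    "access-control-allow-origin": "*",
}
sent = filter_headers(requested)
dropped = sorted(set(requested) - set(sent))
print(dropped)  # ['access-control-allow-origin']
```

Comparing the requested and sent header sets in this way is also one cheap approach to the "record the api interactions and compare them" idea raised earlier in the log.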
gtema	I will create a test case tomorrow for that in sdk	18:29
gtema	hopefully however we can unblock ourselves with devstack-networking often OOMing	18:30
clarkb	interestingly when I make requests against ovh with curl I don't get the cors headers but when my browser makes them it does get them. The one that fails is for a 404 which is why it breaks and why it's fine	18:32
timburke_	fwiw, there's also a configurable list of additional headers that may be stored: https://github.com/openstack/swift/blob/2.29.1/etc/object-server.conf-sample#L146	18:33
gtema	so at the moment code works correctly according to official docs. And since cors is an additional middleware it is not present in API docs and thus not properly considered even as concept	18:33
clarkb	we're fetching job-output.json.gz but the path is actually job-output.json. I think its a compat thing to try and find multiple possible versions but the 404 version fails CORS headers because nothing has said a 404 is a valid cross site request	18:33
timburke_	clarkb, if you include something like `-H 'Origin: example.com'` you'll get the CORS header	18:33
gtema	clarkb: I think you need also to send refer header or something like that (browsers should be doing that)	18:33
clarkb	timburke_: ah thanks	18:33
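Editor's note: a minimal Python sketch of the `Origin` header tip above, using only the standard library. The URL and the helper names `build_cors_probe` and `allows_origin` are hypothetical, and the sketch deliberately makes no network call; it only builds the request a client would send and shows how response headers could be checked for the CORS allow header.

```python
from urllib.request import Request


def build_cors_probe(url: str, origin: str = "https://example.com") -> Request:
    """Build a GET request carrying an Origin header, as a browser would."""
    return Request(url, headers={"Origin": origin})


def allows_origin(response_headers: dict, origin: str) -> bool:
    """Check a dict of response headers for a matching CORS allow header."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed in ("*", origin)


probe = build_cors_probe("https://logs.example.org/job-output.json")
print(probe.get_header("Origin"))  # https://example.com
print(allows_origin({"Access-Control-Allow-Origin": "*"}, "https://example.com"))  # True
print(allows_origin({"X-Delete-At": "1656633600"}, "https://example.com"))  # False
```

Passing `probe` to `urllib.request.urlopen` would perform the actual request; the response's headers can then be fed to `allows_origin`.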
clarkb	gtema: re working correctly according to official docs I don't think the docs say other headers are invalid just that if properly formatted they are treated special?	18:34
fungi	is the client-side header filtering idea that it can save the user time and bandwidth over a server-side api rejection?	18:34
clarkb	gtema: I did look at that fwiw and didn't find anything saying arbitrary headers are invalid/disallowed just that properly formatted ones can be managed by swift	18:34
gtema	clarkb: it depends on how you read the doc. It of course does not mention that not listed headers are forbidden, but it lists headers it recognizes	18:35
clarkb	gtema: right	18:35
fungi	because otherwise, a client second-guessing server-side limits is at best a redundancy and at worst likely to diverge over time	18:35
clarkb	I would argue clients/sdks/tools shouldn't be overly aggressive then	18:35
clarkb	client tools should always be forgiving	18:35
fungi	postel's law	18:36
clarkb	then let the remote end be angry if necessary	18:36
fungi	says the opposite ;)	18:36
gtema	approach of SDK was always to try to fail client as early as possible before even reaching server	18:36
fungi	but yeah, i don't think it's applicable in this case	18:36
clarkb	gtema: the problem with that is openstack has never been consistent enough to make that a reasonable thing to do	18:36
gtema	that is so sadly true, this makes me cry	18:37
* gtema is wiping tears		18:37
gtema	okay, as said - I will try to fix that tomorrow	18:38
clarkb	fungi: ya I mean swift seems to ignore it entirely in the ovh case for example	18:38
clarkb	the zuul merger docker compose config fix is deploying now	18:39
clarkb	once it deploys I'll manually gracefully stop zm01 again and see if we get further	18:39
clarkb	I wonder if I have to down up the container to pick up the new config though :/	18:39
fungi	from an sdk standpoint, i would interpret postel's law as saying that the user should be conservative in what data they supply as inputs but the sdk should be forgiving of what the user supplies it. then the sdk should be as conservative as it can in what it sends to the server-side api (while still trying to honor the caller's wishes), and the server should be as accepting as possible about	18:39
fungi	what it receives from the sdk	18:39
corvus	fwiw, the sdk docs don't say they will filter the header list: https://docs.openstack.org/openstacksdk/latest/user/connection.html#openstack.connection.Connection.create_object	18:40
corvus	```headers – These will be passed through to the object creation API as HTTP Headers.```	18:40
corvus	(which, to be clear, is what i think is the expected and desired behavior)	18:41
gtema	btw, https://docs.openstack.org/swift/latest/cors.html mentions that you need to set X-Container-Meta-Access-Control-Allow-Origin	18:42
clarkb	ya we do that too https://opendev.org/zuul/zuul-jobs/src/commit/e69d879caecb454c529a7d757b80ae49c3caa105/roles/upload-logs-base/library/zuul_swift_upload.py#L113	18:43
gtema	correct. That is why it works for OVH and not for RAX	18:44
clarkb	that and the containers were all created years ago	18:44
clarkb	but ya if we ran this against ovh today and it created new containers it would probably work	18:44
gtema	if they have something non-standard (i.e. not matching the swift API docs) we have issues	18:44
gtema	ok, done for tonight. Will add exception to sdk	18:45
clarkb	deployment of the zm fix is done. Manually running the merger stop on zm01 now	18:48
fungi	also remember that the current api docs for swift are not necessarily going to be relevant to the 10-year-old fork some major service providers are still running	18:48
clarkb	the playbook is proceeding	18:48
corvus	we also set content-encoding and content-type using that mechanism -- do we know if we expect those to make it through 0.99?	18:49
fungi	but users may have an application which needs to talk to diablo-era and yoga-era swift in different providers at the same time	18:49
clarkb	corvus: fungi might be a good idea to followup on the thread with that info so it isn't lost in irc scrollback? but ya I agree those are good questions and considerations :)	18:50
fungi	which was the use case for the code in nodepool which was later extracted to become shade and then merged into openstacksdk	18:50
clarkb	I think zm02 is hitting the same problem because it started on the wrong config?	18:50
clarkb	we may have to manually stop each merger. I'll do that if so	18:51
fungi	oh, so we'll need to stop them all manually this time	18:51
fungi	yeah, that makes sense. i guess docker-compose interprets that when "upping" and doesn't re-read it for other actions	18:51
corvus	but this is the last time, really for real this time	18:51
clarkb	yes I think so. I'm running the same command the playbook runs to stop them which means in theory they will all work next time	18:51
timburke_	looks like content-encoding and content-type should be fine: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/object_store/v1/obj.py#L26-L27	18:52
clarkb	ERROR: 137 trying to stop on zm03 but it seems to have stopped	18:54
clarkb	04 didn't do that but 05 did. I wonder if its a timing thing stopping the merger too close to startup	18:58
clarkb	I'll give 06 plenty of time	18:59
fungi	i guess 137 is a docker-specific exit code? i don't see zuul special-casing it anyway	19:00
clarkb	ya I think so	19:00
clarkb	apparently error 137 is a "I don't have enough memory" error	19:02
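Editor's note: exit status 137 follows the common POSIX shell convention of 128 plus a signal number, so it decodes to signal 9 (SIGKILL). A SIGKILL can come from the kernel's out-of-memory killer, but also from other sources such as a forced kill after a stop timeout, so the code alone does not prove a memory shortage. A minimal sketch, assuming a POSIX system; the helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode an exit status using the 128 + signal-number convention."""
    if code > 128:
        # Statuses above 128 conventionally mean "terminated by a signal".
        return f"killed by {signal.Signals(code - 128).name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(143))  # killed by SIGTERM
print(describe_exit_code(1))    # exited with status 1
```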
clarkb	maybe our mergers are a bit too small?	19:02
fungi	maybe it tried to start a new merger process while the old one's allocations hadn't been cleaned up?	19:03
clarkb	free reports plenty of available memory	19:03
fungi	but yeah, the mergers only have 2gb ram	19:04
fungi	they do have swap too though	19:04
clarkb	ya 07 was fine	19:04
clarkb	something to keep an eye on but probably not urgent?	19:04
clarkb	8 is proceeding now. It should get to zuul01 processes fairly quickly	19:06
clarkb	yup zuul01 is stopping now	19:07
clarkb	corvus: I notice that the fingergw does not remove itself from the components registry when stopped	19:08
clarkb	oh wait there it goes. Maybe just a delay on the zk ephemeral node cleanup?	19:08
clarkb	it is waiting for the scheduler on 01 to start now. i'm going to eat lunch while that happens	19:09
fungi	yeah, web and scheduler will take a while	19:09
*** rlandy\|mtg is now known as rlandy		19:12
clarkb	looks like it is doing 02 now	19:38
fungi	yep	19:40
clarkb	the playbook is done. Seems to have had no errors. I'll close the screen session now since everything was logged for it	20:06
clarkb	part of me wants to run it again today just to make sure the mergers are happy but I don't think that is super important	20:07
fungi	i'm happy to run it again. we technically also didn't really exercise the image update this last time since that was done prior to restarting the mergers manually	20:09
corvus	if you wanted a new version of zuul for the next update....merging https://review.opendev.org/843737 would do, and it's operationally interesting for opendev... ;)	20:12
corvus	(meanwhile, any objection to my restarting the launchers?)	20:13
fungi	no objection from me	20:13
clarkb	ya no objection here, though that will likely pull in openstacksdk 0.99.0 on the launchers	20:15
clarkb	I think now that the previous issue is better understood we wouldn't expect that to affect nodepool, but calling it out as a change	20:15
corvus	#status log restarted nodepool launchers on 6416b1483821912ac7a0d954aeb6e864eafdb819, likely with sdk 0.99	20:15
opendevstatus	corvus: finished logging	20:15
clarkb	I just we should status log the restart of zuul too	20:16
corvus	clarkb: agreed	20:16
clarkb	#status log Restarted all of zuul on 6.0.1.dev54 69199c6fa	20:16
corvus	(agreed re sdk)	20:16
opendevstatus	clarkb: finished logging	20:16
corvus	openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: [...] Bad networks format	20:17
corvus	i'm looking into whether that's new or not	20:17
corvus	nope that's new	20:18
corvus	i'm going to assume occam's razor and that's an sdk 0.99 bug (i have confirmed 0.99 is in the container)	20:19
clarkb	wouldn't surprise me	20:19
corvus	next step? roll back our launchers to nodepool 6.0.0 and then merge a pin?	20:19
clarkb	seems reasonable to me	20:20
clarkb	I don't think opendev is relying on any new unreleased nodepool features/functionality	20:20
fungi	oh that's a fun error	20:20
fungi	i'll get the pin pushed	20:21
corvus	ansible -f 20 nodepool-launcher -m shell -a 'docker pull zuul/nodepool-launcher:6.0.0; docker tag zuul/nodepool-launcher:6.0.0 zuul/nodepool-launcher:latest'	20:22
corvus	#status log restarted nodepool launchers on 6.0.0 after encountering suspected sdk 0.99 bug	20:22
opendevstatus	corvus: finished logging	20:22
clarkb	I'll review the pin but then I'm going for a bike ride. My opportunities to do that are becoming fewer as we get closer to the summit	20:23
fungi	corvus: clarkb: https://review.opendev.org/c/zuul/nodepool/+/844334 Temporarily pin OpenStackSDK before 0.99	20:27
clarkb	heh we even had the 1.0.0 cap	20:28
clarkb	+2	20:28
fungi	yeah...	20:29
clarkb	and now bike ride time. Back in a bit	20:30
*** timburke_ is now known as timburke		20:59
*** dviroel is now known as dviroel\|out		21:34
clarkb	fungi: just to catch up were you going to rerun the reboot playbook? I can help keep an eye on it if so	22:26
fungi	lemme check if that zuul change merged and published	22:30
clarkb	I think it did	22:31
clarkb	assuming the change that merged is the right one	22:31
fungi	yeah, promote finished	22:31
fungi	i have a root screen session with the new run teed up	22:32
clarkb	cool I'm not joined yet but can keep an eye on it via /components and grafana and dig in further if necessary	22:32
fungi	ready to hit enter if no immediate objections	22:33
clarkb	none from me.	22:33
fungi	fire in the hole!	22:33
fungi	seems to have pulled the new images	22:33
clarkb	ya that should be the first thing it does	22:33
fungi	ze01 should be in the process of stopping	22:33
*** rlandy is now known as rlandy\|out		22:44
clarkb	reminder I plan to delete the ethercalc server and its dns records tomorrow	23:52
clarkb	I haven't heard any noise since we shutdown the server. Please let me know if you saw something I missed	23:52
