*** JayF has joined #openstack-infra | 00:11 | |
*** tosky has quit IRC | 00:28 | |
*** thiago__ has joined #openstack-infra | 00:43 | |
*** tdasilva_ has quit IRC | 00:45 | |
*** tdasilva_ has joined #openstack-infra | 00:48 | |
*** thiago__ has quit IRC | 00:50 | |
*** hamalq has joined #openstack-infra | 01:17 | |
*** yamamoto has quit IRC | 01:18 | |
*** Xuchu has joined #openstack-infra | 01:20 | |
*** Xuchu_ has quit IRC | 01:22 | |
*** yamamoto has joined #openstack-infra | 01:23 | |
*** hamalq has quit IRC | 01:24 | |
*** hamalq has joined #openstack-infra | 01:25 | |
*** yamamoto has quit IRC | 01:28 | |
*** Xuchu_ has joined #openstack-infra | 02:12 | |
*** Xuchu has quit IRC | 02:15 | |
*** yamamoto has joined #openstack-infra | 02:21 | |
*** hamalq has quit IRC | 02:24 | |
*** yamamoto has quit IRC | 02:38 | |
*** dviroel has quit IRC | 03:22 | |
*** dviroel has joined #openstack-infra | 03:25 | |
*** Xuchu has joined #openstack-infra | 03:26 | |
*** Xuchu_ has quit IRC | 03:29 | |
dansmith | Saw this on a grenade job during setup, trying to install packages from apt: | 04:03 |
dansmith | E: You don't have enough free space in /var/cache/apt/archives/. | 04:03 |
dansmith | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c31/774317/2/check/grenade/c31a0f0/job-output.txt | 04:03 |
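[Editorial sketch, not part of the job: the condition apt reported above can be expressed as a simple free-space check on the filesystem holding /var/cache/apt/archives/, where apt downloads packages. The ~400 MB figure is only an illustration echoing the number discussed later in this log, not a value taken from the failed build.]

    # Sketch only: check whether the apt download area has room for a pending
    # install; the ~400 MB threshold is illustrative, echoing the failure above.
    import shutil

    ARCHIVES = "/var/cache/apt/archives"

    def enough_space_for(download_bytes, path=ARCHIVES):
        """True if the filesystem holding the apt cache can fit the download."""
        return shutil.disk_usage(path).free >= download_bytes

    if __name__ == "__main__":
        needed = 400 * 1024 * 1024
        if not enough_space_for(needed):
            print(f"E: You don't have enough free space in {ARCHIVES}/.")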
*** david-lyle has joined #openstack-infra | 05:25 | |
*** Xuchu_ has joined #openstack-infra | 05:25 | |
*** redrobot6 has joined #openstack-infra | 05:27 | |
*** dklyle has quit IRC | 05:28 | |
*** redrobot has quit IRC | 05:28 | |
*** redrobot6 is now known as redrobot | 05:28 | |
*** Xuchu has quit IRC | 05:29 | |
*** mgoddard has quit IRC | 05:31 | |
*** kota_ has quit IRC | 05:31 | |
*** kota_ has joined #openstack-infra | 05:31 | |
*** mgoddard has joined #openstack-infra | 05:31 | |
*** dviroel has quit IRC | 05:42 | |
*** Xuchu_ has quit IRC | 05:46 | |
*** yamamoto has joined #openstack-infra | 06:35 | |
*** yamamoto has quit IRC | 06:39 | |
*** david-lyle has quit IRC | 07:42 | |
*** vesper11 has joined #openstack-infra | 08:35 | |
*** yamamoto has joined #openstack-infra | 08:36 | |
*** yamamoto has quit IRC | 08:41 | |
*** matt_kosut has joined #openstack-infra | 09:19 | |
*** matt_kosut has quit IRC | 09:19 | |
*** xek has joined #openstack-infra | 09:51 | |
*** vesper11 has quit IRC | 09:51 | |
*** paladox has quit IRC | 09:59 | |
*** tosky has joined #openstack-infra | 10:14 | |
*** xek has quit IRC | 10:22 | |
*** yamamoto has joined #openstack-infra | 10:37 | |
*** yamamoto has quit IRC | 10:42 | |
*** yamamoto has joined #openstack-infra | 10:55 | |
*** yamamoto has quit IRC | 11:29 | |
*** yamamoto has joined #openstack-infra | 12:02 | |
*** yamamoto has quit IRC | 12:10 | |
*** tdasilva_ has quit IRC | 12:11 | |
*** tdasilva_ has joined #openstack-infra | 12:12 | |
*** yamamoto has joined #openstack-infra | 14:07 | |
fungi | dansmith: looks like devstack could stand to do an apt clean after each round of things it installs. by default, debian derivatives leave copies of all installed packages in /var/cache/archive, and disk space might be tight in that provider | 14:09 |
fungi | er, in /var/cache/apt/archives i mean | 14:10 |
fungi | `apt-get clean` or `apt clean` will clear them out | 14:10 |
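[A minimal sketch of the clean-up fungi describes, assuming root privileges on a Debian-derived image and not claiming to be devstack's implementation: measure the cached .deb files and reclaim the space with `apt-get clean` after a round of installs.]

    # Sketch: clear the apt package cache after installing packages so cached
    # .deb files do not fill a tight rootfs.
    import subprocess
    from pathlib import Path

    CACHE = Path("/var/cache/apt/archives")

    def cache_size_bytes() -> int:
        """Size of the .deb files apt has left behind in its cache."""
        return sum(p.stat().st_size for p in CACHE.glob("*.deb"))

    if __name__ == "__main__":
        before = cache_size_bytes()
        subprocess.run(["apt-get", "clean"], check=True)  # needs root
        print(f"freed roughly {(before - cache_size_bytes()) / 1e6:.0f} MB")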
*** yamamoto has quit IRC | 14:11 | |
*** paladox has joined #openstack-infra | 14:18 | |
*** dviroel has joined #openstack-infra | 14:23 | |
*** tosky has quit IRC | 16:03 | |
*** maysams has quit IRC | 16:12 | |
*** Tengu has quit IRC | 17:20 | |
*** Tengu has joined #openstack-infra | 17:21 | |
*** ralonsoh has joined #openstack-infra | 17:22 | |
*** xek has joined #openstack-infra | 17:22 | |
*** ralonsoh has quit IRC | 17:25 | |
*** xek has quit IRC | 17:35 | |
*** xek has joined #openstack-infra | 17:38 | |
*** xek has quit IRC | 17:38 | |
*** d34dh0r53 has quit IRC | 18:32 | |
*** tosky has joined #openstack-infra | 19:23 | |
*** slaweq has joined #openstack-infra | 19:48 | |
dansmith | fungi: really? it's complaining about not having 400mb of disk.. are the workers really that tight on space? | 20:02 |
dansmith | the workers get cleaned after each run, so it's not package cache from the previous run right? | 20:02 |
*** yamamoto has joined #openstack-infra | 20:10 | |
*** yamamoto has quit IRC | 20:14 | |
fungi | not sure what you mean by workers, but the job nodes are deleted and booted fresh | 20:15 |
*** slaweq has quit IRC | 20:15 | |
fungi | unfortunately it failed in such a way that the usual devstack log collection didn't happen, so we don't have a df to see what the actual filesystem size was | 20:15 |
fungi | possible something happened when that node booted which caused it not to growpart the rootfs at boot and left it at the nominal image size | 20:16 |
fungi | first time i've seen that, so hard to speculate as to the cause | 20:17 |
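[A hypothetical check along the lines fungi speculates about; the nominal image size below is an assumed figure for illustration, not an opendev value. If the root filesystem is still close to the base image size, growpart probably did not expand it at boot.]

    # Hypothetical sanity check; NOMINAL_IMAGE_GB is an assumption used purely
    # for illustration, not the size of any opendev image.
    import shutil

    NOMINAL_IMAGE_GB = 15

    def rootfs_looks_ungrown(path="/"):
        """True if the rootfs is still roughly the nominal image size."""
        total_gb = shutil.disk_usage(path).total / 1e9
        return total_gb <= NOMINAL_IMAGE_GB + 1

    if __name__ == "__main__":
        if rootfs_looks_ungrown():
            print("rootfs near nominal image size; growpart may not have run")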
fungi | are you finding multiple occurrences? | 20:18 |
dansmith | yeah I saw it a couple times yesterday, always on the grenade job | 20:18 |
dansmith | by workers I mean the thing that we run devstack in.. so yeah, I assumed those get booted fresh, but I thought maybe you were suggesting that we just do a ./clean and re-run of devstack, so wasn't sure | 20:19 |
dansmith | fungi: this doesn't have to be something for a saturday for either of us, it just seemed like maybe something had changed and we were going to see a rash of fails due to disk space coming | 20:19 |
dansmith | I did a bunch of pushes last night before the jobs finished, so I'm not sure many of those actually got reported, but the last round that I let complete last night seemed to finish | 20:21 |
fungi | we might ought to add a df to https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/validate-host/library/zuul_debug_info.py and then it will be included in the zuul-info/zuul-info.*.txt files we collect from the nodes | 20:22 |
fungi | i'll throw up a patch for that now while i'm thinking about it | 20:22 |
fungi | at least that way we'll know what the filesystem sizes and utilization look like at the start of each job, and can speculate a bit better as to what happened | 20:23 |
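[For illustration only: this is not the patch fungi proposes at https://review.opendev.org/774358, just a sketch of the idea of collecting per-mount filesystem size and utilization at job start so it lands in the debug output.]

    # Illustrative sketch of gathering df-style data from Python; the real
    # change lives in zuul-jobs' zuul_debug_info.py and may look different.
    import shutil

    def filesystem_usage():
        """Map each mounted block device's mountpoint to its size/used/free."""
        usage = {}
        with open("/proc/mounts") as mounts:
            for line in mounts:
                device, mountpoint = line.split()[:2]
                if not device.startswith("/dev/"):
                    continue  # skip tmpfs, proc, cgroup and friends
                total, used, free = shutil.disk_usage(mountpoint)
                usage[mountpoint] = {"size": total, "used": used, "free": free}
        return usage

    if __name__ == "__main__":
        for mount, stats in filesystem_usage().items():
            print(f"{mount}: {stats['used'] / 1e9:.1f}G used of "
                  f"{stats['size'] / 1e9:.1f}G ({stats['free'] / 1e9:.1f}G free)")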
dansmith | ack, devstack or zuul could also do it before to see what we start with.. I dunno how big the disks are on those, but 400m seemed like an awfully small margin | 20:23 |
fungi | yep, but that was also after numerous package install rounds earlier in the log | 20:24 |
dansmith | sure sure, but.. 400m :) | 20:24 |
dansmith | if we need to start being more disk conscious then that's a thing I guess, but I'd want to know where it's all going | 20:25 |
fungi | dansmith: https://review.opendev.org/774358 | 20:30 |
fungi | also there was a time when we sparsely fallocated swapfiles on nodes, but more recent linux kernels have required us to preallocate them instead | 20:30 |
fungi | so depending on the swap size set in the job configuration (default in our deployment is 1gb) that can eat away at available space on the rootfs | 20:31 |
fungi | a lot of jobs have it set to 8gb, but even that alone doesn't seem like it should be the cause of the problem in that example | 20:32 |
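[Back-of-envelope arithmetic for the point above, using the swap sizes mentioned (1 GB default, 8 GB in many jobs): a fully preallocated swapfile comes straight out of rootfs free space.]

    # Rough sketch: how much free space remains on the rootfs once a fully
    # preallocated swapfile of the configured size is carved out of it.
    import shutil

    GIB = 1024 ** 3

    def headroom_after_swap(swap_gib, path="/"):
        """Free bytes left on `path` after preallocating swap_gib of swap."""
        return shutil.disk_usage(path).free - swap_gib * GIB

    if __name__ == "__main__":
        for size in (1, 8):  # default and common job-configured swap sizes
            left = headroom_after_swap(size)
            print(f"{left / GIB:.1f} GiB left after a {size} GiB swapfile")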
dansmith | how big are the roots supposed to be? | 20:32 |
dansmith | maybe our fs isn't expanded or something and we have a small margin over the actual size of the disk? | 20:32 |
fungi | i think some providers have a rootfs as small as 20gb and then allocate a larger ephemeral disk which some kinds of jobs (e.g. devstack) mount at /opt | 20:33 |
dansmith | okay | 20:34 |
fungi | i don't recall how small they are for that particular provider from your example, i'd probably have to manually boot or hold a node and investigate | 20:34 |
dansmith | ack, well, anyway, let's not make a saturday out of this.. it was mostly just an FYI in case something has changed lately that was likely to cause a raft of disk space fails | 20:36 |
fungi | yep, totally appreciate the heads up, i'll advocate for the patch to collect initial fs sizes/utilization and see if we can't get a better idea of why we see it sometimes | 20:38 |
dansmith | logstash shows several hits in the last 48 hours btw | 20:41 |
dansmith | so it wasn't just those two | 20:41 |
fungi | same provider each time? | 20:42 |
dansmith | airship-kn1 yeah looks like | 20:43 |
fungi | possible something has changed there, in that case | 20:53 |
*** zzzeek has quit IRC | 21:01 | |
*** zzzeek has joined #openstack-infra | 21:02 | |
fungi | i wonder if they shrunk the disk on the flavor we've been using, for example | 21:07 |
dansmith | I heard there's some k8s malware going around mining bitcoin, maybe we've got an openstack virus on our hands that eats disk :) | 21:08 |
fungi | tasty, tasty disk | 21:08 |
fungi | it could also be something like this has always been the smallest rootfs of all our providers but recently some change merged to grenade which caused it to begin using far more disk, and because we boot so few instances in that provider it's gone unspotted until now | 21:09 |
dansmith | yeah I dunno what has changed really.. could be as simple as the mysql package includes sample databases now or something I guess | 21:14 |
dansmith | the message actually says 100mb is what it has free, but needs 400, which really seems like too close a margin for something not to have changed recently | 21:15 |
jrosser | if it’s focal could it be the delta to 20.04.2 trying to install? that landed on feb 4th | 21:19 |
dansmith | that's a good idea, but I don't see any giant "and all these 300 will come too" package installs in that log | 21:24 |
corvus | dtantsur|afk: hi, it looks like openlab terraform-provider-openstack jobs are failing after feb 2; i looked and do not immediately see the cause. here's the build history: http://status.openlabtesting.org/builds?job_name=terraform-provider-openstack-acceptance-test it's failing on "TASK [install-devstack : Set fact for devstack openrc]" which has a no_log, so i can't see why. do you have any idea? | 21:37 |
corvus | dtantsur|afk: see also https://github.com/theopenlab/openlab/issues/681 and https://github.com/theopenlab/openlab-zuul-jobs/pull/1104 | 22:03 |
corvus | (and i'm totally open to suggestions of a better irc channel, i realize this is not directly TACT related, but there's a community nexus here; sorry) | 22:05 |
*** slaweq has joined #openstack-infra | 22:05 | |
*** zzzeek has quit IRC | 22:06 | |
*** zzzeek has joined #openstack-infra | 22:08 | |
*** yamamoto has joined #openstack-infra | 22:11 | |
*** slaweq has quit IRC | 22:50 | |
*** dviroel has quit IRC | 22:53 | |
*** yamamoto has quit IRC | 23:19 | |
*** tosky has quit IRC | 23:52 |