*** panda|bbl is now known as panda|off | 00:24 | |
*** rlandy|biab is now known as rlandy | 00:50 | |
dmsimard | fsck 70% phase 1 | 00:51 |
dmsimard | 71% phase 2 !!! | 01:17 |
pabelanger | progress | 01:19 |
dmsimard | pabelanger: also, the increase in swap is because the scratch files are larger now.. it's above 7GB now, earlier it was between 5 and 6 | 01:20 |
dmsimard | s/is because/is likely because/ | 01:21 |
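The "scratch files" dmsimard mentions are e2fsck's mechanism for spilling its in-memory data structures to disk instead of RAM during a very large fsck, enabled through e2fsck.conf. A minimal sketch of the relevant fragment, assuming the conventional /etc/e2fsck.conf location and an illustrative directory path:

```ini
# /etc/e2fsck.conf -- have e2fsck keep its working data in on-disk
# scratch files rather than RAM (useful on a huge filesystem, at the
# cost of heavy I/O to the scratch directory)
[scratch_files]
directory = /var/cache/e2fsck
```

The tradeoff visible in the log is exactly this: less memory pressure, but growing scratch files (7 GB and up) and extra disk activity.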
*** SergeyLukjanov has quit IRC | 02:33 | |
*** SergeyLukjanov has joined #openstack-infra-incident | 02:35 | |
*** rosmaita has quit IRC | 03:08 | |
dmsimard | Stepping away for a while, we're at 74.2% | 03:50 |
dmsimard | load, swap and disk usage fairly stable | 03:50 |
*** openstackstatus has quit IRC | 04:56 | |
*** openstackstatus has joined #openstack-infra-incident | 04:57 | |
*** ChanServ sets mode: +v openstackstatus | 04:57 | |
*** rlandy has quit IRC | 06:57 | |
*** panda|off is now known as panda | 11:01 | |
*** rosmaita has joined #openstack-infra-incident | 11:54 | |
dmsimard | fsck @85.3%, temporary volume at 36% (I just bumped timeout since we have room).. swap at 4GB but swapping activity is low | 12:57 |
*** rlandy has joined #openstack-infra-incident | 13:34 | |
mordred | dmsimard: \o/ maybe it'll finish today | 13:55 |
*** panda is now known as panda|lunch | 13:55 | |
dmsimard | it's a shame the gate is backlogged due to the integrated gate resets | 13:56 |
mordred | yah | 13:57 |
*** efried is now known as fried_rice | 14:39 | |
*** dansmith is now known as superdan | 14:41 | |
rosmaita | i second that emotion | 14:51 |
dmsimard | We went to great lengths to keep the jobs running and prevent impact on the gate as much as possible but here we are :/ | 14:54 |
rosmaita | dmsimard appreciate all you and mordred are doing, just wish thursday had been 72 hours long this week | 14:57 |
mordred | rosmaita: yah. I could use another 72 hours myself | 14:58 |
dmsimard | rosmaita: just the fact that people are understanding is already awesome in itself :) | 14:59 |
*** panda|lunch is now known as panda | 15:05 | |
*** myoung|pto has quit IRC | 16:09 | |
*** myoung has joined #openstack-infra-incident | 16:13 | |
dmsimard | fsck 88.3% | 17:07 |
*** fried_rice is now known as fried_rolls | 17:15 | |
*** rlandy is now known as rlandy|brb | 18:02 | |
*** rlandy|brb is now known as rlandy | 18:28 | |
*** weshay is now known as weshay|ruck|brb | 18:43 | |
corvus | i continue to favor the reformat option. my feeling is that the utility of new logs exceeds that of the old ones, and so if an outage extends beyond the normal 8 hours, it's better to just start from scratch so that people can find and fix the current bugs. | 19:11 |
*** zaneb has joined #openstack-infra-incident | 19:11 | |
corvus | given that we're almost at the weekend, if folks wanted to try alternatives, that's probably okay. | 19:12 |
corvus | however, i'd suggest that we start monday morning with a fully functional log volume and all log uploads enabled, regardless. | 19:13 |
dmsimard | I'm losing hope that the server will become responsive in a timely fashion given the trend of the last two hours | 19:13 |
corvus | so, if folks wanted to spend the weekend either waiting for the current system to finish, or if they wanted to restart it without the scratch space and use only ram, i think that's okay. but we should set the deadline that if it isn't done by 00:01 utc monday, we reformat. | 19:14 |
dmsimard | We've gone through a large percentage of the fsck at this point.. with a bit of luck if we reboot we could just mount it successfully but I wouldn't count on that | 19:14 |
dmsimard | Is resizing the logserver to 16gb of ram entirely out of the question ? I don't know what the constraints are | 19:15 |
dmsimard | 16gb of ram sounds perfectly appropriate for 13TB of storage | 19:16 |
corvus | dmsimard: an online resize? it's possible. the server would be offline for an unknown amount of time while it ran. a replacement is also possible. | 19:17 |
dmsimard | corvus: a resize implies a hard reboot iirc | 19:18 |
corvus | dmsimard: that server only uses, at most, 2GB of ram normally, there's no reason for it to be that large, except to fsck. it would be better to create a server merely to perform the fsck, and shift the volumes to and from it. | 19:18 |
corvus | dmsimard: yes, it culminates with a hard-reboot after an unknown period of downtime | 19:18 |
dmsimard | corvus: yeah I thought about a temporary server for that purpose too -- the problem is making sure we mount/remount the volumes in the right order and re-create the LVM on the other end.. could prove tricky | 19:19 |
corvus | dmsimard: i don't believe order matters. | 19:19 |
dmsimard | it sure does when dealing with physical disks :D | 19:20 |
corvus | the lvm superblocks should take care of that | 19:20 |
dmsimard | ah.. perhaps. | 19:20 |
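corvus's point about the LVM superblocks: each physical volume carries a copy of the volume group metadata in its header, so a new host can reassemble the VG no matter what order the Cinder volumes are attached in. A dry-run sketch (run=echo prints each step instead of executing it; the VG name "main" is the one discussed later in the log):

```shell
# Reassemble the "main" volume group on a temporary fsck host.
# Dry run: run=echo prints the steps rather than executing them.
activate_vg() {
    local run="echo"
    $run pvscan              # scan attached devices; LVM reads VG metadata from each PV header
    $run vgchange -ay main   # activate the VG once all PVs are present, order-independent
    $run lvs main            # confirm the logical volumes are visible
}
activate_vg
```

Drop the `run="echo"` indirection to execute the steps for real (as root, on the host with the volumes attached).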
pabelanger | Yah, I think at this point in our outage, reformat might be best to move forward. I'm not sure I'd want to start rebuilding / resizing to get 16GB for fsck. | 19:24 |
dmsimard | corvus: the thing that bothers me about all of this is that we're not making the situation any better or doing anything to prevent this from re-occurring. I know pabelanger mentioned Vexxhost volume sizes were not as restrictive... At this point, is a bare metal even out of the question ? | 19:24 |
corvus | dmsimard: we'll move to swift. | 19:24 |
pabelanger | well, we know this is a point of failure and have some discussion to fix it | 19:24 |
pabelanger | yah | 19:24 |
corvus | dmsimard: or the vexxhost thing. | 19:24 |
corvus | one of those. | 19:25 |
pabelanger | I don't think we can fix that now | 19:25 |
corvus | indeed. this is the least appropriate time to change the system. | 19:25 |
pabelanger | I'm still not able to SSH into the server, do we still have a connection up at this point? If we do reboot, the volume is still removed from /etc/fstab, so we shouldn't mount | 19:27 |
*** weshay|ruck|brb is now known as weshay|ruck | 19:27 | |
dmsimard | Ok.. so how does this sound.. 1) Reboot 2) See if we can mount the log volume 3) If not, consider reformatting ? or run fsck without scratch files and fully disable log upload (which still won't impact the gate since only failed jobs would upload logs) | 19:27 |
dmsimard | pabelanger: even the console is unresponsive, there is no password prompt after typing the username | 19:28 |
dmsimard | but http is working flawlessly, go figure | 19:29 |
corvus | i'm not in favor of mounting without an fsck. i have no confidence it would not fail randomly at any point later. | 19:29 |
pabelanger | yah, I don't think we can do #2 without fsck | 19:29 |
dmsimard | ok so it boils down to reformat or fsck without scratch files (which is what we usually end up being able to do) | 19:29 |
pabelanger | so, reformat (lose 4 weeks of logs) or fsck for 6 hours | 19:30 |
dmsimard | hmmm... are we able to clone volumes ? Like, cinder create --source-volid | 19:30 |
corvus | in order of preference, i suggest: (1) reformat (2) fsck without scratch files (3) fsck on temporary larger host (4) allow to continue | 19:30 |
corvus | you could probably convince me to swap 2<->3 if you really felt like spending your weekend doing that | 19:31 |
dmsimard | I am thinking perhaps we could clone the volumes before formatting -- see if we can fsck them elsewhere | 19:31 |
corvus | but in all cases, i suggest we maintain the sunday/monday midnight reformat deadline. | 19:32 |
pabelanger | Right, agree with the order. I'm not sure if I am around much this weekend. | 19:33 |
corvus | dmsimard: i'm not certain if our quota would permit that, or how long it would take. | 19:33 |
corvus | i am sure i am not around this weekend. | 19:33 |
dmsimard | I don't know what their storage backend is -- with ceph, even with large volume sizes, it's near instantaneous | 19:33 |
pabelanger | we've also had release team extend queen-3 milestones due to CI issues too | 19:35 |
dmsimard | I'm not sure I know how to check what our quotas are, the different CLI commands are returning 400's or 404's and the bulk of our servers aren't showing up in the rax interface | 19:36 |
corvus | dmsimard: all the servers should be there... are you using the right account? openstackci? | 19:36 |
dmsimard | ah I was using openstack | 19:36 |
pabelanger | dmsimard: you're looking to see if we can clone volumes? | 19:42 |
dmsimard | yeah, sec | 19:42 |
dmsimard | finally managed to get quotas... | 19:51 |
dmsimard | So we have 51200 SATA and 25600 SSD.. I'm going to guess those are gigabytes. We're not using any SSDs and for SATA we're at 35591 out of 51200. | 19:52 |
dmsimard | We'd have just enough room in the sata pool and plenty in the ssd pool. | 19:53 |
dmsimard | I don't suspect it's possible to clone volumes across volume types | 19:53 |
corvus | dmsimard: cloning sounds okay to me, as long as we're okay with falling back on just losing the data. | 19:56 |
corvus | so we'd clone, reformat originals, attach clones to new 16g host, fsck clones, rsync data back, delete new host and clones? | 19:56 |
pabelanger | and we don't currently know how long a clone would take | 19:58 |
corvus | (i suppose rsyncing the other direction, and swapping the volumes out again is possible, but that means more downtime, whereas the plan above means no downtime after the reformat, and we just progressively fill in old data) | 19:58 |
*** fried_rolls is now known as fried_rice | 19:58 | |
corvus | pabelanger: true. if it's cow, it could be instantaneous. if not, it could take 8 hours. | 19:58 |
dmsimard | trying a clone now, it doesn't seem instantaneous and there's actually a sort of progress indicator: clone-progress='1.29%' | 19:59 |
corvus | i'm hoping/assuming that if things go wrong, we can just delete all the volumes and start over (thereby falling back on the 'just reformat' option) | 19:59 |
corvus | dmsimard: can you extrapolate that? (also, this is an on-line clone?) | 20:00 |
pabelanger | yah, 8 hours to clone, another 6 for fsck, then x hrs for rsync. Say another day to round off | 20:00 |
corvus | pabelanger: right. my main concern though is when the 'upload logs' service starts working again. so with the clone plan, that's determined by how long it takes to clone. | 20:01 |
corvus | if the other stuff takes longer, i'm not worried. | 20:01 |
dmsimard | corvus: That's online -- some Cinder backends provide the capability, some don't.. apparently they do, but it's not a snapshot/cow | 20:01 |
pabelanger | corvus: agree | 20:02 |
dmsimard | it's at 6% now, eh | 20:02 |
corvus | dmsimard: what's the start time? | 20:02 |
dmsimard | 2018-01-26T19:57:39.000000 clone-progress='6.96%' | 20:03 |
dmsimard | Seems slow considering we have 13 of these to do. | 20:03 |
corvus | dmsimard: can you do them all in parallel? | 20:03 |
dmsimard | Probably nothing preventing me from doing them in parallel | 20:03 |
corvus | dmsimard: i'm estimating about an hour if that rate holds. if that works in parallel, this is probably doable. | 20:04 |
dmsimard | Let me see | 20:04 |
dmsimard | 9.25% now :) | 20:04 |
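corvus's "about an hour" is a linear extrapolation from the readings above (the clone started at 19:57:39 and showed 6.96%, roughly five and a half minutes in). The arithmetic, sketched as a shell helper with awk doing the division (the sample values are from the log; the function itself is illustrative):

```shell
# Estimate minutes remaining for a cinder clone from one progress reading,
# assuming the rate so far holds.
clone_eta_minutes() {
    # $1 = percent complete, $2 = minutes elapsed since the clone started
    awk -v pct="$1" -v min="$2" 'BEGIN {
        rate = pct / min                      # percent per minute so far
        printf "%.0f\n", (100 - pct) / rate   # minutes remaining at that rate
    }'
}

clone_eta_minutes 6.96 5.5   # ~74 minutes left at the observed rate
```

That puts completion a little over an hour out, consistent with corvus's estimate, though as the log shows (main10), per-volume rates can diverge badly.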
corvus | i'm going to lunch now. | 20:05 |
corvus | dmsimard: if you want to proceed with this, i'd recommend getting all of them cloning, and also start spinning up a new temporary 16G server for the fsck. | 20:05 |
corvus | hopefully that means in about an hour we can get moving on all aspects. | 20:05 |
pabelanger | also looks like I'm going out for dinner tonight, so I won't be online much in the next 60mins | 20:06 |
pabelanger | corvus: ++ | 20:06 |
dmsimard | clone in progress: http://paste.openstack.org/raw/653933/ | 20:19 |
dmsimard | brb | 20:19 |
dmsimard | hmm, I think the fsck is hitting one of the volumes pretty hard, one of the clones is lagging behind a lot | 20:23 |
dmsimard | Should we reboot since we're not going to let the fsck finish anyway ? | 20:23 |
*** srwilkers has joined #openstack-infra-incident | 20:25 | |
*** mrhillsman has joined #openstack-infra-incident | 20:30 | |
pabelanger | yes, if the new plan is to clone / fsck and / or format, we should likely reboot now | 20:31 |
dmsimard | pabelanger: ok, I will attempt ctrl+alt+delete in hope that it kills the fsck in a way that is slightly more gentle and fallback to an API reboot. Ack ? | 20:33 |
pabelanger | okay | 20:34 |
dmsimard | no go on ctrl+alt+delete, trying a soft reboot | 20:37 |
dmsimard | I have a ping going, it's probably going to fall back to hard reboot.. it's not rebooting. | 20:39 |
dmsimard | so rax is still on cinder api v1, openstackclient defaults to v3 :/ | 20:41 |
dmsimard | hard reboot went through | 20:43 |
dmsimard | logs.o.o is back | 20:45 |
pabelanger | yup, I'm able to SSH again | 20:45 |
pabelanger | dmsimard: any improvement in cinder clone? | 20:46 |
dmsimard | pabelanger: Hard to tell without timed data points after rebooting, I'll be able to tell in a few minutes but here's the current status (main10 was the one lagging behind) http://paste.openstack.org/raw/653942/ | 20:47 |
pabelanger | ok | 20:50 |
*** rosmaita has quit IRC | 20:57 | |
dmsimard | Looks like main10 is struggling a bit despite the reboot, update from now: http://paste.openstack.org/raw/653973/ | 21:07 |
dmsimard | Maybe it's on a slower storage node or something | 21:07 |
corvus | dmsimard: so that's maybe 4-5 more hours to finish the clone. | 21:10 |
dmsimard | we got a few volumes that finished cloning | 21:24 |
dmsimard | main10 got about ... 4% in 15 minutes. Bleh. | 21:26 |
*** rosmaita has joined #openstack-infra-incident | 21:36 | |
dmsimard | main10 just got 15% progress in about 3 minutes \o/ | 21:41 |
dmsimard | I lied, bad timestamping, it's still at 38% now though. Most of the other volumes are finished cloning. | 21:43 |
*** rlandy has quit IRC | 21:49 | |
fungi | in another airport for a few minutes and caught up on scrollback in here... an instance resize wouldn't have been an option anyway (i think it might be possible with older non-pvhvm flavors if they even still have those, but not supported on the modern flavors we've been using) | 22:03 |
dmsimard | infra-root: Have something I need to take care of and need to afk for a few hours. I started a screen on the puppetmaster with the status for the main10 volume which is the last one left to clone. It's at 61% right now. Once the clone is finished we can go ahead with the reformat and I'll take care of attempting the recovery (which might be tomorrow) | 22:04 |
fungi | poor i/o to main10 is likely why the fsck was going slowly | 22:04 |
dmsimard | maybe | 22:04 |
dmsimard | infra-root: note that there's no loop right now deleting data from /srv/static, it's at 46% so we're more than likely going to have enough time to switch things around. | 22:07 |
dmsimard | taking off now, sorry | 22:07 |
fungi | reformatting the "slow" main10 may be a poor choice | 22:07 |
*** panda is now known as panda|off | 22:15 | |
corvus | fungi: agreed; we may want to roll the dice on a new volume while we're at it, if we have the quota. | 22:22 |
fungi | could lvremove the logs volume, vgreduce off of main10 (pvmove any other logical volumes if they're on there, but odds are they aren't), detach the old main10, cinder create a new main10 to attach, vgextend back onto it and then lvcreate the replacement logs volume and format that | 22:31 |
fungi | if the others cloned quickly then there's likely no need to replace them | 22:32 |
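fungi's replacement sequence, sketched as a dry run. The device path /dev/xvdk and the 1024 GB size are illustrative assumptions (the real values would come from the attached volume), and run=echo prints each step instead of executing it:

```shell
# Dry-run sketch of swapping out the "slow" main10 PV under the main VG.
replace_main10() {
    local run="echo"
    $run lvremove -f /dev/main/logs       # drop the (to-be-reformatted) logs LV
    $run vgreduce main /dev/xvdk          # take main10 out of the VG
                                          # (pvmove other LVs off it first, if any)
    $run cinder create --display-name main10-new 1024   # replacement volume
    $run pvcreate /dev/xvdk               # after detaching old / attaching new
    $run vgextend main /dev/xvdk          # grow the VG back onto the new PV
    $run lvcreate -n logs -l 100%FREE main
    $run mkfs.ext4 -m 0 -L logs /dev/main/logs
}
replace_main10
```

Reviewing the echoed plan before removing the dry-run indirection is the point of the pattern: lvremove and vgreduce are destructive and order matters here.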
dmsimard | temporarily stepped by the laptop, main10 has finished cloning and everything else is finished as well: http://paste.openstack.org/raw/654070/ | 22:41 |
dmsimard | feel free to proceed | 22:41 |
* dmsimard off | 22:41 | |
corvus | infra-root: i will execute fungi's plan now | 22:49 |
corvus | (though i'm going to create the new volume first, to reduce the chance that the old gets immediately re-used) | 22:51 |
corvus | infra-root: oh, since we're making a new fs, what settings should we use (inodes, etc)? | 23:02 |
fungi | i would consider kicking up the inode ratio a notch | 23:06 |
corvus | were we at the default before? | 23:06 |
fungi | to my knowledge, yes | 23:07 |
fungi | the graph said we were doing okay on inode count last time i looked, but it's frustrating to hit an inode cap when you still have room for more blocks | 23:07 |
corvus | do we graph inode count? | 23:08 |
fungi | oh, right, i may have been looking at df actually :/ | 23:08 |
corvus | ah | 23:08 |
corvus | i don't see anything in https://wiki.openstack.org/wiki/Infrastructure_Status about inodes, so assuming default | 23:08 |
fungi | so if we don't graph it (i can't easily check at the moment) then it'll be hard to know | 23:09 |
fungi | but yeah, i don't remember us setting it to a non-default value when we switched to ext4 way back whenever | 23:09 |
corvus | http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-11.log.html#t2017-10-11T20:25:23 | 23:11 |
corvus | there's some information | 23:12 |
fungi | i suppose if we want to reformat it a second time later in the weekend after we fsck the cloned copies, we might be able to tell for sure | 23:12 |
corvus | i'm not sure we need to double it... should we maybe do 1.5x? | 23:14 |
corvus | default is 16kB per inode. | 23:15 |
corvus | well, our *current* use, based on only the failed logs uploaded in the last few hours, is 28kB per inode. | 23:17 |
corvus | and of course, a lot of that stuff hasn't been gzipped yet | 23:17 |
fungi | did you mean should we maybe do something <1x? | 23:18 |
fungi | like 12kb/inode (0.75x) | 23:18 |
corvus | fungi: yes (i was thinking 1.5x inode count at the time i wrote that, which would be 0.75x inode ratio) | 23:19 |
fungi | 1.5x the old bytes per inode would mean we run out of inodes faster | 23:19 |
fungi | okay, cool | 23:20 |
fungi | same page then ;) | 23:20 |
corvus | right, i think no higher inode ratio than the default, but maybe less. | 23:20 |
corvus | i'm trying to run a gzip pass real quick to get slightly better data from the admittedly very small sample size. | 23:20 |
fungi | yeah, i'm pretty sure we were close to 1:1 block % vs inode % a few weeks ago when i took a look | 23:21 |
fungi | which suggests that the ratio was sufficient, but not generous in terms of absorbing unanticipated inode consumption spikes | 23:22 |
corvus | fungi: maybe even 0.625: 10240 ? | 23:23 |
corvus | just a small tweak | 23:23 |
corvus | or strike that, that's not what i meant to do | 23:24 |
corvus | 0.875: 14336 is what i meant | 23:25 |
corvus | okay, the gzip pass finished and it's still about 28k/inode used | 23:27 |
fungi | sure, that's probably plenty of headroom until we execute a plan to drop the old logserver in favor of something better (sharded across volumes, stashed in swift, whatever) | 23:27 |
corvus | fungi: i'm inclined to only make a small change, so my gut instinct based on remembered data would be to do -i 14336 | 23:28 |
corvus | how's that sound? | 23:28 |
fungi | wfm | 23:29 |
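The arithmetic behind the -i choice: mke2fs creates one inode per "bytes-per-inode" of filesystem size, so lowering the ratio from the 16384 default to 14336 buys 8/7 (about 14%) more inodes. A sketch using the 13 TB figure from earlier in the log:

```shell
# Inode count implied by an mke2fs -i bytes-per-inode ratio.
inodes_for() {
    # $1 = filesystem size in bytes, $2 = bytes-per-inode ratio
    awk -v fs="$1" -v ratio="$2" 'BEGIN { printf "%d\n", fs / ratio }'
}

TB13=$((13 * 1024 * 1024 * 1024 * 1024))
inodes_for "$TB13" 16384   # default ratio: ~872M inodes
inodes_for "$TB13" 14336   # -i 14336: ~997M inodes
```

Against the observed ~28 kB of data per inode, either ratio leaves headroom; the smaller bump hedges against inode-consumption spikes without wasting too much space on inode tables.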
corvus | #status log cloned all volumes from static.openstack.org for later fsck; replaced main10 device because it seemed slow and recreated logs logical volume. | 23:29 |
openstackstatus | corvus: finished logging | 23:29 |
fungi | matches what i would expect per the manpage | 23:29 |
fungi | i need to disappear again shortly to board yet another flight | 23:31 |
corvus | #status log created logs filesystem with "mkfs.ext4 -m 0 -j -i 14336 -L $NAME /dev/main/$NAME" http://paste.openstack.org/show/654140/ | 23:31 |
openstackstatus | corvus: finished logging | 23:31 |
corvus | fungi: that look reasonable? ^ | 23:31 |
fungi | not sure you need -j with mkfs.ext4 but it's likely fine? the rest is definitely sane | 23:32 |
corvus | okay, rsyncing the accumulated data | 23:34 |
fungi | yeah, based on my reading of the -t option (which is implied by mkfs.ext3 and 4) you'd have a journal created regardless (though could override to omit the journal under ext3 by adding a -O exclusion) | 23:35 |
fungi | i'll likely be offline again until fairly late tomorrow, sorry i can't be more help | 23:42 |
corvus | fungi: bon voyage! | 23:42 |
fungi | thanks! and also thanks to you and others for getting the logserver back on track | 23:43 |
fungi | i'll try to check back in again as soon as i can | 23:43 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!