*** panda|bbl is now known as panda|off | 00:24 | |
*** rlandy|biab is now known as rlandy | 00:50 | |
dmsimard | fsck 70% phase 1 | 00:51 |
dmsimard | 71% phase 2 !!! | 01:17 |
pabelanger | progress | 01:19 |
dmsimard | pabelanger: also, the increase in swap is because the scratch files are larger now.. it's above 7GB now, earlier it was between 5 and 6 | 01:20 |
dmsimard | s/is because/is likely because/ | 01:21 |
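The "scratch files" dmsimard mentions are e2fsck's mechanism for spilling its in-memory data structures to disk instead of RAM during a very large fsck, enabled through e2fsck.conf. A minimal sketch of the relevant fragment, assuming the conventional /etc/e2fsck.conf location and an illustrative directory path:

```ini
# /etc/e2fsck.conf -- have e2fsck keep its working data in on-disk
# scratch files rather than RAM (useful on a huge filesystem, at the
# cost of heavy I/O to the scratch directory)
[scratch_files]
directory = /var/cache/e2fsck
```

The tradeoff visible in the log is exactly this: less memory pressure, but growing scratch files (7 GB and up) and extra disk activity.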
*** SergeyLukjanov has quit IRC | 02:33 | |
*** SergeyLukjanov has joined #openstack-infra-incident | 02:35 | |
*** rosmaita has quit IRC | 03:08 | |
dmsimard | Stepping away for a while, we're at 74.2% | 03:50 |
dmsimard | load, swap and disk usage fairly stable | 03:50 |
*** openstackstatus has quit IRC | 04:56 | |
*** openstackstatus has joined #openstack-infra-incident | 04:57 | |
*** ChanServ sets mode: +v openstackstatus | 04:57 | |
*** rlandy has quit IRC | 06:57 | |
*** panda|off is now known as panda | 11:01 | |
*** rosmaita has joined #openstack-infra-incident | 11:54 | |
dmsimard | fsck @85.3%, temporary volume at 36% (I just bumped timeout since we have room).. swap at 4GB but swapping activity is low | 12:57 |
*** rlandy has joined #openstack-infra-incident | 13:34 | |
mordred | dmsimard: \o/ maybe it'll finish today | 13:55 |
*** panda is now known as panda|lunch | 13:55 | |
dmsimard | it's a shame the gate is backlogged due to the integrated gate resets | 13:56 |
mordred | yah | 13:57 |
*** efried is now known as fried_rice | 14:39 | |
*** dansmith is now known as superdan | 14:41 | |
rosmaita | i second that emotion | 14:51 |
dmsimard | We went to great lengths to keep the jobs running and prevent impact on the gate as much as possible but here we are :/ | 14:54 |
rosmaita | dmsimard appreciate all you and mordred are doing, just wish thursday had been 72 hours long this week | 14:57 |
mordred | rosmaita: yah. I could use another 72 hours myself | 14:58 |
dmsimard | rosmaita: just the fact that people are understanding is already awesome in itself :) | 14:59 |
*** panda|lunch is now known as panda | 15:05 | |
*** myoung|pto has quit IRC | 16:09 | |
*** myoung has joined #openstack-infra-incident | 16:13 | |
dmsimard | fsck 88.3% | 17:07 |
*** fried_rice is now known as fried_rolls | 17:15 | |
*** rlandy is now known as rlandy|brb | 18:02 | |
*** rlandy|brb is now known as rlandy | 18:28 | |
*** weshay is now known as weshay|ruck|brb | 18:43 | |
corvus | i continue to favor the reformat option. my feeling is that the utility of new logs exceeds that of the old ones, and so if an outage extends beyond the normal 8 hours, it's better to just start from scratch so that people can find and fix the current bugs. | 19:11 |
*** zaneb has joined #openstack-infra-incident | 19:11 | |
corvus | given that we're almost at the weekend, if folks wanted to try alternatives, that's probably okay. | 19:12 |
corvus | however, i'd suggest that we start monday morning with a fully functional log volume and all log uploads enabled, regardless. | 19:13 |
dmsimard | I'm losing hope that the server will become responsive in a timely fashion given the trend of the last two hours | 19:13 |
corvus | so, if folks wanted to spend the weekend either waiting for the current system to finish, or if they wanted to restart it without the scratch space and use only ram, i think that's okay. but we should set the deadline that if it isn't done by 00:01 utc monday, we reformat. | 19:14 |
dmsimard | We've gone through a large percentage of the fsck at this point.. with a bit of luck if we reboot we could just mount it successfully but I wouldn't count on that | 19:14 |
dmsimard | Is resizing the logserver to 16gb of ram entirely out of the question ? I don't know what the constraints are | 19:15 |
dmsimard | 16gb of ram sounds perfectly appropriate for 13TB of storage | 19:16 |
corvus | dmsimard: an online resize? it's possible. the server would be offline for an unknown amount of time while it ran. a replacement is also possible. | 19:17 |
dmsimard | corvus: a resize implies a hard reboot iirc | 19:18 |
corvus | dmsimard: that server only uses, at most, 2GB of ram normally, there's no reason for it to be that large, except to fsck. it would be better to create a server merely to perform the fsck, and shift the volumes to and from it. | 19:18 |
corvus | dmsimard: yes, it culminates with a hard-reboot after an unknown period of downtime | 19:18 |
dmsimard | corvus: yeah I thought about a temporary server for that purpose too -- the problem is making sure we mount/remount the volumes in the right order and re-create the LVM on the other end.. could prove tricky | 19:19 |
corvus | dmsimard: i don't believe order matters. | 19:19 |
dmsimard | it sure does when dealing with physical disks :D | 19:20 |
corvus | the lvm superblocks should take care of that | 19:20 |
dmsimard | ah.. perhaps. | 19:20 |
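corvus's point about the LVM superblocks: each physical volume carries a copy of the volume group metadata in its header, so a new host can reassemble the VG no matter what order the Cinder volumes are attached in. A dry-run sketch (run=echo prints each step instead of executing it; the VG name "main" is the one discussed later in the log):

```shell
# Reassemble the "main" volume group on a temporary fsck host.
# Dry run: run=echo prints the steps rather than executing them.
activate_vg() {
    local run="echo"
    $run pvscan              # scan attached devices; LVM reads VG metadata from each PV header
    $run vgchange -ay main   # activate the VG once all PVs are present, order-independent
    $run lvs main            # confirm the logical volumes are visible
}
activate_vg
```

Drop the `run="echo"` indirection to execute the steps for real (as root, on the host with the volumes attached).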
pabelanger | Yah, I think at this point in our outage, reformat might be best to move forward. I'm not sure I'd want to start rebuilding / resizing to get 16GB for fsck. | 19:24 |
dmsimard | corvus: the thing that bothers me about all of this is that we're not making the situation any better or doing anything to prevent this from re-occurring. I know pabelanger mentioned Vexxhost volume sizes were not as restrictive... At this point, is a bare metal even out of the question ? | 19:24 |
corvus | dmsimard: we'll move to swift. | 19:24 |
pabelanger | well, we know this is a point of failure and have some discussion to fix it | 19:24 |
pabelanger | yah | 19:24 |
corvus | dmsimard: or the vexxhost thing. | 19:24 |
corvus | one of those. | 19:25 |
pabelanger | I don't think we can fix that now | 19:25 |
corvus | indeed. this is the least appropriate time to change the system. | 19:25 |
pabelanger | I'm still not able to SSH into the server, do we still have a connection up at this point? If we do reboot, the volume is still removed from /etc/fstab, so we shouldn't mount | 19:27 |
*** weshay|ruck|brb is now known as weshay|ruck | 19:27 | |
dmsimard | Ok.. so how does this sound.. 1) Reboot 2) See if we can mount the log volume 3) If not, consider reformatting ? or run fsck without scratch files and fully disable log upload (which still won't impact the gate since only failed jobs would upload logs) | 19:27 |
dmsimard | pabelanger: even the console is unresponsive, there is no password prompt after typing the username | 19:28 |
dmsimard | but http is working flawlessly, go figure | 19:29 |
corvus | i'm not in favor of mounting without an fsck. i have no confidence it would not fail randomly at any point later. | 19:29 |
pabelanger | yah, I don't think we can do #2 without fsck | 19:29 |
dmsimard | ok so it boils down to reformat or fsck without scratch files (which is what we usually end up being able to do) | 19:29 |
pabelanger | so, reformat (lose 4 weeks of logs) or fsck for 6 hours | 19:30 |
dmsimard | hmmm... are we able to clone volumes ? Like, cinder create --source-volid | 19:30 |
corvus | in order of preference, i suggest: (1) reformat (2) fsck without scratch files (3) fsck on temporary larger host (4) allow to continue | 19:30 |
corvus | you could probably convince me to swap 2<->3 if you really felt like spending your weekend doing that | 19:31 |
dmsimard | I am thinking perhaps we could clone the volumes before formatting -- see if we can fsck them elsewhere | 19:31 |
corvus | but in all cases, i suggest we maintain the sunday/monday midnight reformat deadline. | 19:32 |
pabelanger | Right, agree with the order. I'm not sure if I am around much this weekend. | 19:33 |
corvus | dmsimard: i'm not certain if our quota would permit that, or how long it would take. | 19:33 |
corvus | i am sure i am not around this weekend. | 19:33 |
dmsimard | I don't know what their storage backend is -- with ceph, even with large volume sizes, it's near instantaneous | 19:33 |
pabelanger | we've also had release team extend queen-3 milestones due to CI issues too | 19:35 |
dmsimard | I'm not sure I know how to check what our quotas are, the different CLI commands are returning 400's or 404's and the bulk of our servers aren't showing up in the rax interface | 19:36 |
corvus | dmsimard: all the servers should be there... are you using the right account? openstackci? | 19:36 |
dmsimard | ah I was using openstack | 19:36 |
pabelanger | dmsimard: you're looking to see if we can clone volumes? | 19:42 |
dmsimard | yeah, sec | 19:42 |
dmsimard | finally managed to get quotas... | 19:51 |
dmsimard | So we have 51200 SATA and 25600 SSD.. I'm going to guess those are gigabytes. We're not using any SSDs and for SATA we're at 35591 out of 51200. | 19:52 |
dmsimard | We'd have just enough room in the sata pool and plenty in the ssd pool. | 19:53 |
dmsimard | I don't suspect it's possible to clone volumes across volume types | 19:53 |
corvus | dmsimard: cloning sounds okay to me, as long as we're okay with falling back on just losing the data. | 19:56 |
corvus | so we'd clone, reformat originals, attach clones to new 16g host, fsck clones, rsync data back, delete new host and clones? | 19:56 |
pabelanger | and we don't currently know how long a clone would take | 19:58 |
corvus | (i suppose rsyncing the other direction, and swapping the volumes out again is possible, but that means more downtime, whereas the plan above means no downtime after the reformat, and we just progressively fill in old data) | 19:58 |
*** fried_rolls is now known as fried_rice | 19:58 | |
corvus | pabelanger: true. if it's cow, it could be instantaneous. if not, it could take 8 hours. | 19:58 |
dmsimard | trying a clone now, it doesn't seem instantaneous and there's actually a sort of progress indicator: clone-progress='1.29%' | 19:59 |
corvus | i'm hoping/assuming that if things go wrong, we can just delete all the volumes and start over (thereby falling back on the 'just reformat' option) | 19:59 |
corvus | dmsimard: can you extrapolate that? (also, this is an on-line clone?) | 20:00 |
pabelanger | yah, 8 hours to clone, another 6 for fsck, then x hrs for rsync. Say another day to round off | 20:00 |
corvus | pabelanger: right. my main concern though is when the 'upload logs' service starts working again. so with the clone plan, that's determined by how long it takes to clone. | 20:01 |
corvus | if the other stuff takes longer, i'm not worried. | 20:01 |
dmsimard | corvus: That's online -- some Cinder backends provide the capability, some don't.. apparently they do, but it's not a snapshot/cow | 20:01 |
pabelanger | corvus: agree | 20:02 |
dmsimard | it's at 6% now, eh | 20:02 |
corvus | dmsimard: what's the start time? | 20:02 |
dmsimard | 2018-01-26T19:57:39.000000 clone-progress='6.96%' | 20:03 |
dmsimard | Seems slow considering we have 13 of these to do. | 20:03 |
corvus | dmsimard: can you do them all in parallel? | 20:03 |
dmsimard | Probably nothing preventing me from doing them in parallel | 20:03 |
corvus | dmsimard: i'm estimating about an hour if that rate holds. if that works in parallel, this is probably doable. | 20:04 |
dmsimard | Let me see | 20:04 |
dmsimard | 9.25% now :) | 20:04 |
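corvus's "about an hour" is a linear extrapolation from the readings above (the clone started at 19:57:39 and showed 6.96%, roughly five and a half minutes in). The arithmetic, sketched as a shell helper with awk doing the division (the sample values are from the log; the function itself is illustrative):

```shell
# Estimate minutes remaining for a cinder clone from one progress reading,
# assuming the rate so far holds.
clone_eta_minutes() {
    # $1 = percent complete, $2 = minutes elapsed since the clone started
    awk -v pct="$1" -v min="$2" 'BEGIN {
        rate = pct / min                      # percent per minute so far
        printf "%.0f\n", (100 - pct) / rate   # minutes remaining at that rate
    }'
}

clone_eta_minutes 6.96 5.5   # ~74 minutes left at the observed rate
```

That puts completion a little over an hour out, consistent with corvus's estimate, though as the log shows (main10), per-volume rates can diverge badly.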
corvus | i'm going to lunch now. | 20:05 |
corvus | dmsimard: if you want to proceed with this, i'd recommend getting all of them cloning, and also start spinning up a new temporary 16G server for the fsck. | 20:05 |
corvus | hopefully that means in about an hour we can get moving on all aspects. | 20:05 |
pabelanger | also looks like I'm going out for dinner tonight, so I won't be online much in the next 60mins | 20:06 |
pabelanger | corvus: ++ | 20:06 |
dmsimard | clone in progress: http://paste.openstack.org/raw/653933/ | 20:19 |
dmsimard | brb | 20:19 |
dmsimard | hmm, I think the fsck is hitting one of the volumes pretty hard, one of the clones is lagging behind a lot | 20:23 |
dmsimard | Should we reboot since we're not going to let the fsck finish anyway ? | 20:23 |
*** srwilkers has joined #openstack-infra-incident | 20:25 | |
*** mrhillsman has joined #openstack-infra-incident | 20:30 | |
pabelanger | yes, if the new plan is to clone / fsck and / or format, we should likely reboot now | 20:31 |
dmsimard | pabelanger: ok, I will attempt ctrl+alt+delete in hope that it kills the fsck in a way that is slightly more gentle and fallback to an API reboot. Ack ? | 20:33 |
pabelanger | okay | 20:34 |
dmsimard | no go on ctrl+alt+delete, trying a soft reboot | 20:37 |
dmsimard | I have a ping going, it's probably going to fall back to hard reboot.. it's not rebooting. | 20:39 |
dmsimard | so rax is still on cinder api v1, openstackclient defaults to v3 :/ | 20:41 |
dmsimard | hard reboot went through | 20:43 |
dmsimard | logs.o.o is back | 20:45 |
pabelanger | yup, I'm able to SSH again | 20:45 |
pabelanger | dmsimard: any improvement in cinder clone? | 20:46 |
dmsimard | pabelanger: Hard to tell without timed data points after rebooting, I'll be able to tell in a few minutes but here's the current status (main10 was the one lagging behind) http://paste.openstack.org/raw/653942/ | 20:47 |
pabelanger | ok | 20:50 |
*** rosmaita has quit IRC | 20:57 | |
dmsimard | Looks like main10 is struggling a bit despite the reboot, update from now: http://paste.openstack.org/raw/653973/ | 21:07 |
dmsimard | Maybe it's on a slower storage node or something | 21:07 |
corvus | dmsimard: so that's maybe 4-5 more hours to finish the clone. | 21:10 |
dmsimard | we got a few volumes that finished cloning | 21:24 |
dmsimard | main10 got about ... 4% in 15 minutes. Bleh. | 21:26 |
*** rosmaita has joined #openstack-infra-incident | 21:36 | |
dmsimard | main10 just got 15% progress in about 3 minutes \o/ | 21:41 |
dmsimard | I lied, bad timestamping, it's still at 38% now though. Most of the other volumes are finished cloning. | 21:43 |
*** rlandy has quit IRC | 21:49 | |
fungi | in another airport for a few minutes and caught up on scrollback in here... an instance resize wouldn't have been an option anyway (i think it might be possible with older non-pvhvm flavors if they even still have those, but not supported on the modern flavors we've been using) | 22:03 |
dmsimard | infra-root: Have something I need to take care of and need to afk for a few hours. I started a screen on the puppetmaster with the status for the main10 volume which is the last one left to clone. It's at 61% right now. Once the clone is finished we can go ahead with the reformat and I'll take care of attempting the recovery (which might be tomorrow) | 22:04 |
fungi | poor i/o to main10 is likely why the fsck was going slowly | 22:04 |
dmsimard | maybe | 22:04 |
dmsimard | infra-root: note that there's no loop right now deleting data from /srv/static, it's at 46% so we're more than likely going to have enough time to switch things around. | 22:07 |
dmsimard | taking off now, sorry | 22:07 |
fungi | reformatting the "slow" main10 may be a poor choice | 22:07 |
*** panda is now known as panda|off | 22:15 | |
corvus | fungi: agreed; we may want to roll the dice on a new volume while we're at it, if we have the quota. | 22:22 |
fungi | could lvremove the logs volume, vgreduce off of main10 (pvmove any other logical volumes if they're on there, but odds are they aren't), detach the old main10, cinder create a new main10 to attach, vgextend back onto it and then lvcreate the replacement logs volume and format that | 22:31 |
fungi | if the others cloned quickly then there's likely no need to replace them | 22:32 |
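fungi's replacement sequence, sketched as a dry run. The device path /dev/xvdk and the 1024 GB size are illustrative assumptions (the real values would come from the attached volume), and run=echo prints each step instead of executing it:

```shell
# Dry-run sketch of swapping out the "slow" main10 PV under the main VG.
replace_main10() {
    local run="echo"
    $run lvremove -f /dev/main/logs       # drop the (to-be-reformatted) logs LV
    $run vgreduce main /dev/xvdk          # take main10 out of the VG
                                          # (pvmove other LVs off it first, if any)
    $run cinder create --display-name main10-new 1024   # replacement volume
    $run pvcreate /dev/xvdk               # after detaching old / attaching new
    $run vgextend main /dev/xvdk          # grow the VG back onto the new PV
    $run lvcreate -n logs -l 100%FREE main
    $run mkfs.ext4 -m 0 -L logs /dev/main/logs
}
replace_main10
```

Reviewing the echoed plan before removing the dry-run indirection is the point of the pattern: lvremove and vgreduce are destructive and order matters here.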
dmsimard | temporarily stepped by the laptop, main10 has finished cloning and everything else is finished as well: http://paste.openstack.org/raw/654070/ | 22:41 |
dmsimard | feel free to proceed | 22:41 |
* dmsimard off | 22:41 | |
corvus | infra-root: i will execute fungi's plan now | 22:49 |
corvus | (though i'm going to create the new volume first, to reduce the chance that the old gets immediately re-used) | 22:51 |
corvus | infra-root: oh, since we're making a new fs, what settings should we use (inodes, etc)? | 23:02 |
fungi | i would consider kicking up the inode ratio a notch | 23:06 |
corvus | were we at the default before? | 23:06 |
fungi | to my knowledge, yes | 23:07 |
fungi | the graph said we were doing okay on inode count last time i looked, but it's frustrating to hit an inode cap when you still have room for more blocks | 23:07 |
corvus | do we graph inode count? | 23:08 |
fungi | oh, right, i may have been looking at df actually :/ | 23:08 |
corvus | ah | 23:08 |
corvus | i don't see anything in https://wiki.openstack.org/wiki/Infrastructure_Status about inodes, so assuming default | 23:08 |
fungi | so if we don't graph it (i can't easily check at the moment) then it'll be hard to know | 23:09 |
fungi | but yeah, i don't remember us setting it to a non-default value when we switched to ext4 way back whenever | 23:09 |
corvus | http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-11.log.html#t2017-10-11T20:25:23 | 23:11 |
corvus | there's some information | 23:12 |
fungi | i suppose if we want to reformat it a second time later in the weekend after we fsck the cloned copies, we might be able to tell for sure | 23:12 |
corvus | i'm not sure we need to double it... should we maybe do 1.5x? | 23:14 |
corvus | default is 16kB per inode. | 23:15 |
corvus | well, our *current* use, based on only the failed logs uploaded in the last few hours, is 28kB per inode. | 23:17 |
corvus | and of course, a lot of that stuff hasn't been gzipped yet | 23:17 |
fungi | did you mean should we maybe do something <1x? | 23:18 |
fungi | like 12kb/inode (0.75x) | 23:18 |
corvus | fungi: yes (i was thinking 1.5x inode count at the time i wrote that, which would be 0.75x inode ratio) | 23:19 |
fungi | 1.5x the old bytes per inode would mean we run out of inodes faster | 23:19 |
fungi | okay, cool | 23:20 |
fungi | same page then ;) | 23:20 |
corvus | right, i think no higher inode ratio than the default, but maybe less. | 23:20 |
corvus | i'm trying to run a gzip pass real quick to get slightly better data from the admittedly very small sample size. | 23:20 |
fungi | yeah, i'm pretty sure we were close to 1:1 block % vs inode % a few weeks ago when i took a look | 23:21 |
fungi | which suggests that the ratio was sufficient, but not generous in terms of absorbing unanticipated inode consumption spikes | 23:22 |
corvus | fungi: maybe even 0.625: 10240 ? | 23:23 |
corvus | just a small tweak | 23:23 |
corvus | or strike that, that's not what i meant to do | 23:24 |
corvus | 0.875: 14336 is what i meant | 23:25 |
corvus | okay, the gzip pass finished and it's still about 28k/inode used | 23:27 |
fungi | sure, that's probably plenty of headroom until we execute a plan to drop the old logserver in favor of something better (sharded across volumes, stashed in swift, whatever) | 23:27 |
corvus | fungi: i'm inclined to only make a small change, so my gut instinct based on remembered data would be to do -i 14336 | 23:28 |
corvus | how's that sound? | 23:28 |
fungi | wfm | 23:29 |
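The arithmetic behind the -i choice: mke2fs creates one inode per "bytes-per-inode" of filesystem size, so lowering the ratio from the 16384 default to 14336 buys 8/7 (about 14%) more inodes. A sketch using the 13 TB figure from earlier in the log:

```shell
# Inode count implied by an mke2fs -i bytes-per-inode ratio.
inodes_for() {
    # $1 = filesystem size in bytes, $2 = bytes-per-inode ratio
    awk -v fs="$1" -v ratio="$2" 'BEGIN { printf "%d\n", fs / ratio }'
}

TB13=$((13 * 1024 * 1024 * 1024 * 1024))
inodes_for "$TB13" 16384   # default ratio: ~872M inodes
inodes_for "$TB13" 14336   # -i 14336: ~997M inodes
```

Against the observed ~28 kB of data per inode, either ratio leaves headroom; the smaller bump hedges against inode-consumption spikes without wasting too much space on inode tables.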
corvus | #status log cloned all volumes from static.openstack.org for later fsck; replaced main10 device because it seemed slow and recreated logs logical volume. | 23:29 |
openstackstatus | corvus: finished logging | 23:29 |
fungi | matches what i would expect per the manpage | 23:29 |
fungi | i need to disappear again shortly to board yet another flight | 23:31 |
corvus | #status log created logs filesystem with "mkfs.ext4 -m 0 -j -i 14336 -L $NAME /dev/main/$NAME" http://paste.openstack.org/show/654140/ | 23:31 |
openstackstatus | corvus: finished logging | 23:31 |
corvus | fungi: that look reasonable? ^ | 23:31 |
fungi | not sure you need -j with mkfs.ext4 but it's likely fine? the rest is definitely sane | 23:32 |
corvus | okay, rsyncing the accumulated data | 23:34 |
fungi | yeah, based on my reading of the -t option (which is implied by mkfs.ext3 and 4) you'd have a journal created regardless (though could override to omit the journal under ext3 by adding a -O exclusion) | 23:35 |
fungi | i'll likely be offline again until fairly late tomorrow, sorry i can't be more help | 23:42 |
corvus | fungi: bon voyage! | 23:42 |
fungi | thanks! and also thanks to you and others for getting the logserver back on track | 23:43 |
fungi | i'll try to check back in again as soon as i can | 23:43 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!