*** jeblair has joined #openstack-infra-incident | 01:47 | |
*** anteaya has joined #openstack-infra-incident | 01:47 | |
fungi | quieter here | 01:47 |
*** ChanServ changes topic to "wiki compromise" | 01:48 | |
anteaya | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=355&rra_id=all | 01:48 |
fungi | so anyway, in combining jeblair's and my theories, best guess is that i missed that the trusty upgrade moved our v4 rules file in a way incompatible with how it was previously symlinked, so a reboot ended with no v4 rules loaded, exposing the elasticsearch api to the internet | 01:50 |
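The failure mode fungi describes, a persisted rules file that the upgrade left unreadable so a reboot came up with an empty ruleset, can be caught by comparing what the kernel has loaded against what is on disk. A minimal sketch, assuming the persisted rules live at /etc/iptables/rules.v4 (which may not be the path this deployment actually used):

```python
#!/usr/bin/env python
# Rough sketch (not the infra team's actual tooling): compare the iptables
# rules currently loaded in the kernel against the persisted v4 rules file,
# so a reboot that silently loses the symlinked file gets noticed.
import subprocess

RULES_FILE = '/etc/iptables/rules.v4'  # assumed location of persisted rules


def normalized_rules(text):
    """Keep only policy and rule lines, dropping comments and counters."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(('-A', ':')):
            rules.add(line.split('[')[0].strip())
    return rules


def main():
    loaded = normalized_rules(
        subprocess.check_output(['iptables-save']).decode())
    with open(RULES_FILE) as f:
        persisted = normalized_rules(f.read())
    missing = persisted - loaded
    if missing:
        print('WARNING: %d persisted rules are not loaded:' % len(missing))
        for rule in sorted(missing):
            print('  ' + rule)
    else:
        print('loaded rules match %s' % RULES_FILE)


if __name__ == '__main__':
    main()
```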
jeblair | fungi: bup claims to have daily backups up to and including 8-16. they appear at 05:37 | 01:51 |
jeblair | they are append-only backups and can therefore be used for forensics as well as recovery | 01:52 |
jeblair | having said that, i don't think we've done a restore test on wiki | 01:52 |
jeblair | so i don't believe anyone has verified whether there's anything actually in the backup. | 01:53 |
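The untested-restore worry could be addressed with a quick check on the backup server: list the saves for the wiki branch and spot-check that the most recent one actually contains files. A minimal sketch, assuming bup is on the path, BUP_DIR points at the wiki backup repository, and the save branch is named wiki.openstack.org (all guesses about this particular setup):

```python
#!/usr/bin/env python
# Restore sanity check, run on the backup server. Branch name and BUP_DIR
# are assumptions; substitute whatever `bup ls` actually shows.
import os
import subprocess

BRANCH = 'wiki.openstack.org'  # assumed bup save name
os.environ.setdefault('BUP_DIR', '/opt/backups/bup')  # assumed repo location


def bup(*args):
    return subprocess.check_output(('bup',) + args).decode()


def main():
    saves = bup('ls', BRANCH).split()
    if not saves:
        raise SystemExit('no saves found for %s' % BRANCH)
    print('%d saves found, most recent: %s' % (len(saves), saves[-1]))
    # List the top level of the latest save; an empty listing here would
    # mean the daily cron job is committing nothing useful.
    latest = bup('ls', '%s/latest' % BRANCH).split()
    if not latest:
        raise SystemExit('latest save is empty, backups are not usable')
    print('latest save contains: %s' % ', '.join(latest))


if __name__ == '__main__':
    main()
```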
fungi | with file uploads disabled, we're probably fine rolling back to the upgrade and using the current db from trove | 01:53 |
fungi | i'll see if the snapshot i saved comes up in a sane state | 01:53 |
jeblair | fungi: aren't there db migrations? | 01:53 |
fungi | yes, i mean reupgrade the snapshot and use the migrated db | 01:53 |
jeblair | fungi: okay, so create new server from snapshot, upgrade mediawiki, check firewall, then online? | 01:54 |
fungi | right | 01:54 |
fungi | no idea how long it'll take this to build from the snapshot though | 01:54 |
fungi | though actually, we _do_ have one already booted and configured | 01:56 |
fungi | with the db in trove, wiki-upgrade-test.openstack.org can simply have apache reenabled and started, and dns pointed at it | 01:57 |
anteaya | that sounds faster | 01:57 |
jeblair | fungi: was it similarly exposed? | 01:58 |
jeblair | (i worry about whether we can trust it) | 01:58 |
anteaya | I'm grateful for what you are worried about | 01:59 |
jeblair | i worry a lot so i hope your gratitude is boundless :) | 01:59 |
fungi | jeblair: not so far as i could tell. after the upgrade puppet ended up blocking everything because the server wasn't in system-config at all so it got our default ssh-only rules | 01:59 |
anteaya | jeblair: my gratitude for your worry is boundless :) | 02:00 |
fungi | we didn't turn off puppet for wiki-upgrade-test until after the upgrade, and then i had to manually copy the iptables rules back over from wiki.o.o | 02:00 |
jeblair | (i also worry a little about whether someone could have elevated wiki permissions by modifying the database, though i confess that might be somewhat low risk) | 02:00 |
anteaya | I think that is a fair worry | 02:01 |
anteaya | can we check if anyone was granted admin wiki status since friday? | 02:01 |
anteaya | from what I'm guessing cacti is telling me, whoever was doing what they were doing was doing it between 13:00 and 19:00 today | 02:02 |
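anteaya's question about recently granted admin rights could be answered straight from the database: MediaWiki records group membership changes in its logging table under log_type = 'rights'. A minimal sketch, with placeholder connection details and an assumed cutoff timestamp for "since friday":

```python
#!/usr/bin/env python
# List user-rights changes in the wiki database since a cutoff date.
# Connection details are placeholders; the query relies on MediaWiki's
# standard `logging` table, where log_type = 'rights' records group changes.
import pymysql

conn = pymysql.connect(host='TROVE_DB_HOST', user='wiki', password='...',
                       database='wiki')  # placeholders, not real credentials
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT log_timestamp, log_title, log_params"
            "  FROM logging"
            " WHERE log_type = 'rights'"
            "   AND log_timestamp >= %s"
            " ORDER BY log_timestamp",
            ('20160812000000',))  # assumed "friday"; format YYYYMMDDHHMMSS
        for ts, user, params in cur.fetchall():
            # log_params encodes the old and new group lists for the user
            print(ts, user, params)
finally:
    conn.close()
```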
jeblair | fungi: there must have been days when the clone and the real server were both online? | 02:03 |
fungi | yeah, it's possible, though it's also an unlikely target. once most scripted compromises get access to a shell their main interest is in using the server to run other stuff. it's possible we got compromised by someone who knew it was running mediawiki and wanted to leverage it to leave some sort of backdoor in the db, but it's on the unlikely end of things | 02:03 |
fungi | jeblair: by clone you mean wiki-upgrade-test.o.o? | 02:03 |
fungi | only one was pointed at the trove db at a time | 02:03 |
jeblair | fungi: i'm trying to figure out how this affects backups | 02:03 |
fungi | oh... | 02:04 |
anteaya | oh you are talking about server permissions, not wiki permissions, sorry | 02:04 |
jeblair | i imagine if we had two copies of a server online at a time, they would both run backups. i only see one backup per day, so it might be that bup only allowed the first one through, and which one that was was random... | 02:04 |
jeblair | anteaya: well, could be either | 02:05 |
fungi | yeah, both of them have the same backup cronjob | 02:05 |
anteaya | ah so we might not be able to trust the backups :( | 02:05 |
jeblair | anteaya: i'd rather say that they may not contain the data we expect | 02:06 |
fungi | it hadn't dawned on me that the backups are initiated from the servers being backed up, so clones are certainly a danger for servers with backup configuration in this case | 02:06 |
anteaya | jeblair: okay a better way of putting it | 02:06 |
anteaya | so is fungi's snapshot the most recent backup that would contain the data we expect? | 02:06 |
anteaya | that we have confidence in? | 02:07 |
jeblair | anteaya: with a key of when the clones were active, we can figure the last date that was certainly the old server. backups after that could be one or the other, and it may be possible to determine which by clues on their filesystems. | 02:07 |
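One way to act on the "clues on their filesystems" idea is to walk the daily saves and pull a small host-identifying file out of each, to see which machine wrote it. A rough sketch, assuming the same branch name and repository location as above, and that /etc/hostname (or any file known to differ between the two servers) is included in the backups:

```python
#!/usr/bin/env python
# For each daily save, restore a small telltale file and print its contents.
# Branch name, BUP_DIR and the choice of /etc/hostname are all assumptions
# about this setup; pick any path that differs between the two hosts.
import os
import subprocess
import tempfile

BRANCH = 'wiki.openstack.org'  # assumed bup save name
os.environ.setdefault('BUP_DIR', '/opt/backups/bup')  # assumed repo location


def saves():
    out = subprocess.check_output(['bup', 'ls', BRANCH]).decode().split()
    return sorted(s.rstrip('/') for s in out if s.rstrip('/') != 'latest')


def hostname_in(save):
    tmp = tempfile.mkdtemp()
    subprocess.check_call(
        ['bup', 'restore', '-C', tmp, '%s/%s/etc/hostname' % (BRANCH, save)])
    with open(os.path.join(tmp, 'hostname')) as f:
        return f.read().strip()


for save in saves():
    try:
        print(save, hostname_in(save))
    except subprocess.CalledProcessError:
        print(save, '(no /etc/hostname in this save)')
```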
jeblair | also, we need to rotate the backup key for wiki.o.o after this, because *it* has been compromised | 02:08 |
anteaya | ack | 02:08 |
jeblair | (the append-only nature of the backups makes them still reliable though [modulo the clone issue]) | 02:09 |
jeblair | i have moved the authorized_keys file for wiki on the backup server out of the way so that key can no longer be used to access the server | 02:10 |
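Rotating the compromised key would mean generating a fresh keypair on the rebuilt server and installing its public half on the backup server, ideally pinned to `bup server` so the client can only push backups. A sketch under those assumptions; the forced-command restrictions and paths are guesses, not the actual puppet configuration:

```python
#!/usr/bin/env python
# Generate a new, passphrase-less keypair for the backup cron job and emit
# the authorized_keys line to install on the backup server, replacing the
# entry that was moved out of the way.
import subprocess

KEY_PATH = '/root/.ssh/id_rsa_bup'  # assumed location for the backup key

subprocess.check_call(
    ['ssh-keygen', '-t', 'rsa', '-b', '4096', '-N', '', '-f', KEY_PATH,
     '-C', 'bup backups for wiki.openstack.org'])

with open(KEY_PATH + '.pub') as f:
    pubkey = f.read().strip()

# Forced command restricts the key to running the bup server side only,
# so a future compromise of the wiki host cannot use it for anything else.
restrictions = ('command="bup server",no-port-forwarding,'
                'no-agent-forwarding,no-X11-forwarding,no-pty')
print('%s %s' % (restrictions, pubkey))
```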
fungi | yeah, the wiki-upgrade-test clone was booted a few weeks ago, so i wouldn't trust it to necessarily be production data past july 21 (that's when rackspace says i created it) | 02:10 |
jeblair | so we won't get a backup again until we fix that (by creating a new key, ideally) | 02:11 |
jeblair | okay, well if we need the backups, we can poke at that, but i'm not going to restore 26 copies of the wiki server out of curiosity. :) | 02:12 |
anteaya | ha ha ha | 02:12 |
jeblair | that would probably take 26 days. | 02:12 |
anteaya | I have better things planned for the next 26 days | 02:13 |
jeblair | dinner is here, i have to run | 02:13 |
anteaya | enjoy dinner | 02:13 |
anteaya | thank you | 02:13 |
fungi | thanks jeblair! | 02:13 |
anteaya | fungi: happy to help or listen | 02:14 |
anteaya | or help by listening | 02:15 |
fungi | i'm moving forward pointing wiki.o.o dns at wiki-upgrade-test.o.o and taking the compromised wiki.o.o offline | 02:15 |
anteaya | ack | 02:15 |
fungi | the boot from snapshot is still spinning at "10%" so no idea how long that'll take | 02:16 |
anteaya | woooo | 02:16 |
anteaya | the watching paint dry routine | 02:16 |
anteaya | your wife loves this one | 02:16 |
fungi | okay, dns has propagated, configs have been adjusted on wiki-upgrade-test for the correct vhost name, and i've confirmed the firewall rules there are and were still sane | 02:34 |
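A quick external check along the lines of fungi's last message: confirm the name now resolves to the replacement server and that only the expected ports answer from outside. A minimal sketch with a placeholder address and port list:

```python
#!/usr/bin/env python
# Post-cutover sanity check: does wiki.openstack.org resolve to the new
# server, and are only the expected ports reachable? The expected IP and
# port lists are placeholders for whatever the replacement actually uses.
import socket

NAME = 'wiki.openstack.org'
EXPECTED_IP = '203.0.113.10'      # placeholder for wiki-upgrade-test's IP
SHOULD_BE_OPEN = [22, 80, 443]    # ssh + web
SHOULD_BE_CLOSED = [3306, 9200]   # e.g. mysql, elasticsearch


def port_open(ip, port, timeout=3):
    try:
        sock = socket.create_connection((ip, port), timeout=timeout)
        sock.close()
        return True
    except (socket.timeout, socket.error):
        return False


ip = socket.gethostbyname(NAME)
print('%s resolves to %s (expected %s)' % (NAME, ip, EXPECTED_IP))

for port in SHOULD_BE_OPEN:
    print('port %d open: %s (want True)' % (port, port_open(ip, port)))
for port in SHOULD_BE_CLOSED:
    print('port %d open: %s (want False)' % (port, port_open(ip, port)))
```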
*** ChanServ changes topic to "situation normal" | 02:59 | |
*** openstack has joined #openstack-infra-incident | 10:16 | |
*** pabelanger has quit IRC | 16:54 | |
*** pabelanger has joined #openstack-infra-incident | 16:54 | |
-openstackstatus- NOTICE: The volume for logs.openstack.org filled up rather suddenly, causing a number of jobs to fail with a POST_FAILURE result and no logs; we're manually expiring some logs now to buy breathing room, but any changes which hit that in the past few minutes will need to be rechecked and/or approved again | 19:45 |