Saturday, 2010-11-06

zaitcev -- ok, coding is basically done. Needs debugging and then deciding how and if we integrate things. Configuring through CLD is too arcane.01:54
* arcane is too01:55
uvirtbotNew bug: #671704 in swift "Stats collector failing to parse log lines, server_name match fails" [Undecided,New]
*** ChanServ sets mode: +v pvo02:59
*** krish has joined #openstack12:02
*** krish has joined #openstack13:18
*** krish has joined #openstack13:19
* jeremyb is looking at the Updaters and Auditors sections of the arch overview16:40
jeremyb... only as large as the frequency at which the updater runs and may not even be noticed ...16:41
jeremybbut it doesn't seem to say what an "updater" is16:42
jeremyb(is this channel dead on weekends? i'm new :) )16:42
jeremybalso, for auditors and initial writes how is integrity checked? are checksums generated, stored and verified? where are they stored and what algorithm? do clients do any checking or just the server?16:44
* jeremyb has 2 different use cases in mind: storing a repo of images (over a million? trying to get a count) and/or thumbs derived from those images (~5 std sizes, plus misc arbitrary sizes) and an unrelated project with ~25 million emails and various other file types related to each of the emails (lists of links extracted from emails, screenshots of emails)16:49
jeremyb(this is swift of course)16:49
notmynamejeremyb: let me see if I can help17:06
notmyname(and, yes, the channel is much less active than weekdays)17:07
jeremybnotmyname: hi!17:07
jeremybnotmyname: no rush, this is just when i got around to asking17:07
* jeremyb is screen'd :)17:07
notmynamejeremyb: when an object is PUT, swift attempts to write that in to the container listing and update the container metadata (object count, container size, etc)17:08
jeremybbut first writes the object itself?17:08
notmynamebut it will fail quickly to ensure good performance and queue it for an async update later17:08
notmynameya. object is written first17:08
notmynameso the updater handles the async requests17:09
notmynamethis is how you can be guaranteed to read your writes, but container listings may be eventually consistent17:09
notmynameintegrity is checked on writes with the etag header (md5 sum of the object)17:09
notmynamethe auditors scan the drives and verify that the checksums still match17:10
notmynamefor objects17:10
notmynameand verify that the db isn't corrupted for container and accounts17:10
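
The write-time check described above (the ETag header carrying the MD5 sum of the object) can be sketched from the client side. This is a minimal illustration, not Swift's own code: the function names `md5_etag` and `upload_is_intact` are hypothetical, and it assumes the ETag the server returns is the hex MD5 digest of the body, possibly wrapped in quotes.

```python
import hashlib


def md5_etag(data: bytes) -> str:
    """Return the hex MD5 digest of an object body, the value sent as the ETag."""
    return hashlib.md5(data).hexdigest()


def upload_is_intact(data: bytes, returned_etag: str) -> bool:
    """Compare a locally computed MD5 with the ETag the server reported.

    Assumption: the ETag is the hex MD5 of the body, optionally quoted.
    """
    return md5_etag(data) == returned_etag.strip('"').lower()
```

A client could also send its own digest in the ETag header of the PUT so that the server rejects a body that does not match; the sketch only covers the comparison after the fact.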
* jeremyb grumbles @ md517:11
notmynameit's fast and standard. and we aren't at risk for any preimage attacks17:11
jeremybright :)17:11
notmynameyour million+ image use case is a great fit for swift17:11
jeremybunless the node is compromised17:11
jeremybdoes swift do any compression on disk or wire?17:12
notmynameI would recommend a few things to help your use case get better performance17:12
jeremybdoes each object require at least 1 file on disk? what if objects are smaller than block size?17:12
jeremybnote: not familiar with XFS17:13
notmynameI'm by no means the expert on the swift team for XFS (looks at redbo), but it hasn't been an issue that is problematic as far as I know17:13
* jeremyb listens for notmyname's recommendations17:13
notmynamebut, yes, one file on disk per object17:14
jeremybi meant if there were 1k files. that's already double file size because of inode size17:14
notmynamerecommendation 1: use multiple containers. you will be able to use higher concurrency and get better throughput17:14
notmynameif there is a logical sharding method for your data, use that (grouped by month or date or resolution)17:15
notmynamealso, i'd recommend that you not use container listings to determine if your object exists. container listings are going to be eventually consistent and relatively slow (especially for million+ item containers)17:15
notmynameso I'd recommend that you keep a local index of your data. this also will let you sort and group better than swift allows17:16
notmynameI think using sqlite3 is nice because you can back up the db file itself to swift too17:16
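
The recommendations above (shard across multiple containers by a natural key, and keep a local index instead of relying on container listings) might look roughly like this on the client. It is only a sketch under assumptions: the `prefix-YYYY-MM` container naming, the table layout, and the helper names are all hypothetical, and no Swift API is called here.

```python
import sqlite3
from datetime import date


def container_for(prefix: str, day: date) -> str:
    """Pick a container by month, e.g. 'images-2010-11' (hypothetical scheme)."""
    return f"{prefix}-{day:%Y-%m}"


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local index of uploaded objects."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " container TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " etag TEXT NOT NULL,"
        " PRIMARY KEY (container, name))"
    )
    return conn


def record_upload(conn: sqlite3.Connection, container: str, name: str, etag: str) -> None:
    """Record an object after its PUT succeeded."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
            (container, name, etag),
        )


def exists(conn: sqlite3.Connection, container: str, name: str) -> bool:
    """Answer 'is this object stored?' locally, without a container listing."""
    row = conn.execute(
        "SELECT 1 FROM objects WHERE container = ? AND name = ?",
        (container, name),
    ).fetchone()
    return row is not None
```

With a file path instead of `:memory:`, the index is a single file that can itself be uploaded to the cluster as a backup, which is the point made about sqlite3 above.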
* jeremyb reads17:18
notmynamesome sort of dynamic compression would be a very interesting feature to add to swift. I'll have to add it to my list of "things I'd like to see in swift"17:19
jeremybalso, was thinking about encryption17:20
jeremybone use case was:17:20
notmynameencryption is a whole different ballgame. I'd be really reluctant to add it because it makes swift have to handle encryption keys. I think it's a much better feature for a client17:20
* notmyname doesn't want to get in the business of key management17:21
notmynamejeremyb: make sense? answer your questions?17:21
kashif1could someone please help me, i am setting up openstack and when i issue the command euca-upload-bundle -m /tmp/kernel.manifest.xml -b mybucket17:22
jeremybnotmyname: yeah, was writing my fantasy use case17:22
kashif1it throws an error saying i dont have permission to mybucket17:22
jeremybnotmyname: and i was already planning to do the natural partitioning you mentioned17:22
notmynamejeremyb: I'd still like to hear your use case17:24
jeremyb2-3 nodes per object in semitrusted DC + 2-3 untrusted+distributed nodes (e.g. stick a node in someone's house) (this is currently <10TB total so you could get a big bang for your buck with residential nodes)17:24
jeremybbut with data that you don't want leaking if something is stolen from the house17:25
jeremybi was thinking 4-5 sata drives with esata enclosure and a guruplug17:26
kashif1anybody can help me on the bucket permissions problem?17:26
notmynamejeremyb: interesting. one thing that has been mentioned (and will be talked about at the design summit next week) is having one logical cluster spread over a wide geographic area.17:27
jeremybdoesn't require much performance because there's no churn only additions so it's less than 1mbit/s to keep up once up to date17:27
notmynametechnically, it's possible now, but there are some things that would need to change to keep performance up17:27
notmynamethe general answer is that if you want your data to be encrypted, write encrypted data17:28
*** rsampaio has joined #openstack17:28
jeremybwell i'd at least want to ensure that the untrusted node can't change the checksum for something on another node17:28
notmynamejeremyb: so swift is divided into availability zones (see the Ring docs). these zones could be widely dispersed17:29
jeremybor delete or overwrite17:29
jeremybright, i've read some about those17:29
notmynamethat's pretty much what the auditors do. the object auditor will scan the drive and compare the checksum to the stored checksum17:30
notmynameit will quarantine bad objects and replication (from other zones) will replace the data17:30
jeremybright, but where does the stored checksum come from?17:31
notmynamethe initial write17:31
jeremybi mean to feed the auditor17:31
notmynamethe local store (the object metadata, stored in the fs xattrs)17:32
notmynameso, yes, it's all local17:32
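
The audit loop described above (recompute each object's checksum, compare it with the locally stored one, quarantine mismatches so replication can replace them) can be sketched as follows. This is not Swift's auditor: the stored checksums are passed in as a plain dict rather than read from filesystem xattrs, and the names `file_md5` and `audit` and the quarantine directory layout are assumptions made for illustration.

```python
import hashlib
import os
import shutil


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(stored: dict, quarantine_dir: str) -> list:
    """Move every file whose current MD5 differs from its stored MD5.

    `stored` maps file path -> expected hex MD5 (a stand-in for the
    checksum kept in the object's metadata). Returns the quarantined paths.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    quarantined = []
    for path, expected in stored.items():
        if file_md5(path) != expected:
            target = os.path.join(quarantine_dir, os.path.basename(path))
            shutil.move(path, target)
            quarantined.append(path)
    return quarantined
```

Because the expected value lives next to the data, a node that can rewrite both will pass its own audit, which is the limitation raised in the discussion above.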
notmynamewe didn't design swift to provide perfect security in an untrusted environment. most of your needs could be solved by a client, but there are some things that swift is just not designed to do17:34
notmynamein your example, I'd be more concerned about the network security than the disk security, but most things go out the window when you give the attacker physical access to hardware17:35
notmynamekashif1: sorry I can't help with your issue. perhaps some nova experts will be on later17:36
kashif1thanks man17:37
notmynamebut I'm happy to help if you have questions about swift :-)17:37
notmynamejeremyb: thoughts?17:40
jeremybnotmyname: in a few, getting pinged in 3+ windows17:40
kim0Hi folks, I'm installing nova on ubuntu 10.10 based on the wiki guide. All steps are ok, except the final euca-run-instances is hanging for more than 5 minutes17:57
kim0any pointers as to what could be wrong17:58
* jeremyb reads back18:06
kim0ah yeah .. nvm .. some services were failing to start18:06
kim0namely nova-network .. because "dnsmasq" was already bound to the port18:06
kim0problem fixed for me18:06
kim0not sure though if it's a problem others would face18:07
kim0sweet instance launched18:08
jeremybnotmyname: so basically i wanted to allow for 2 "residential" nodes to fail while still having a trusted copy of everything. so at least metadata (including checksums) would be signed by a private key in a vault that the residential nodes don't have access to18:09
kim0euca-describe-instances => Yields 3 results .. the one VM that's up, plus two old VMs that failed to start when nova-network was down18:09
*** JordanRinke has quit IRC18:09
notmynamejeremyb: that wouldn't be compatible with swift at all :-)18:14
jeremybnotmyname: anyway, like i said that was more fantasy18:14
jeremybreal life: what about snapshots? i know objects are immutable which makes it easier18:15
notmynamestoring snapshots of something in swift or taking snapshots of swift?18:15
jeremybcan you make a fast and cheap copy of a container?18:15
jeremybsnaps of swift18:15
jeremybbut could be just a container not the whole cluster18:15
notmynamea container is an sqlite3 db file18:15
jeremybi meant including all objects ref'd18:16
notmynameit only has an object listing and some metadata18:16
notmynamethe only thing I know of that could store a backup of swift is swift ;-)18:16
notmynameI mean, where do you back up a 10PB cluster to?18:16
jeremybare you familiar with zfs?18:16
jeremybone sec18:17
notmynameare you asking about a copy-on-write snapshot type feature?18:17
jeremybso, both my use cases seem to grow around 20GB / day18:19
jeremyband it's all additions no deletes18:19
jeremyb1 is now ~9TB, 1 is ~12TB. so much smaller than 10PB18:20
notmynamei suppose object versioning would allow for something similar (versioning is something else on my "cool features to add to swift" list). the question is making it work at scale18:22
jeremybi don't think that's necessary even18:22
jeremybwhat you'd need is a way to prevent objects from being deleted if they're ref'd by a snap but not a "live" container18:23
jeremybso if i'm partitioning on date then i want to iterate over containers periodically and decide that a given container will never get any more changes (or maybe just do it each time when switching to a new container) and then do a final backup of that container and lock it down by ACL18:24
jeremybwould be nice to be able to get atomic periodic backups of entire containers (to swift and then from there to anywhere) while they're still open for writes18:26
notmynamefor your current use case, or are we still talking "what-ifs"?18:26
jeremybcurrent use cases, both18:26
jeremybi guess another solution would be rotating containers, writing to 1 for a day then another, then switch back. back up each while idle18:27
jeremybinstead of just writing to one until full18:27
notmynameyour containers are locked down pretty tight by default, so there isn't a need to further lock them down after writing to them (IMO)18:27
notmynamebut atomic backups are not something that will ever happen18:28
jeremybeven per container?18:28
jeremybthe "guess another solution" would do it but they wouldn't be entirely up to date18:28
notmynamehow do you perform an atomic operation over all of the objects in a container when they are dispersed throughout hundreds of servers?18:29
notmynameeventually consistent backups, then18:29
notmynameand that, IMO, gets back in to the realm of the client rather than swift-proper18:30
jeremybso, at least in my cases, we can assume no deletes. so if it's in the container then it's readable. and objects are immutable so we don't have to worry about it changing during the backup18:30
jeremybbut the "no deletes" thing doesn't generalize18:31
* jeremyb goes to read on ACLs18:31
notmynameI've got to go do some stuff around the house18:32
jeremybk, thanks18:32
notmynamefeel free to ask any questions18:32
kim0killing a VM thru virsh, nova still thinks it's running18:41
jeremybcan containers be moved between accounts?18:44
jeremybalso, is there any log of actions so you could replay them if you had a point in time snapshot?18:44
jeremyb(would need to have actual data in them)18:44
notmynamejeremyb: containers cannot be moved between accounts. objects can be copied (swift-side) within an account.19:34
notmynamejeremyb: everything is logged, but the actual data isn't (or the logs would be a copy of the cluster!)19:34
jeremybhrmm, k19:34
jeremybthat's what i wanted :)19:34
notmynamejeremyb: well, i suppose that a token with read access could be copied to a different account19:35
notmynameessentially, the server-side copy feature does a GET + PUT19:35
notmynameso if the GET works, the PUT will work too19:35
jeremybi was just wondering about that. much more interested in logs with data :)19:36
notmynamehonestly, why? then your log files are as big as the cluster.19:38
*** dubsquared has quit IRC19:38
jeremybnotmyname: same as mysql. then you can back up the whole cluster and in between full backups back up the logs. then use the latest full + logs since then to recover20:02
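
The recovery scheme described above (latest full backup plus the logs written since then) can be illustrated with a toy in-memory model. This is purely hypothetical: as noted earlier in the conversation, the cluster's logs do not contain object data, so the `(operation, name, data)` log format and the `restore` function below are assumptions, not an existing feature.

```python
def restore(full_backup: dict, log: list) -> dict:
    """Rebuild the current state from a full backup and a data-carrying log.

    `full_backup` maps object name -> bytes at snapshot time.
    `log` is an ordered list of (operation, name, data) tuples recorded
    after the snapshot, where operation is "PUT" or "DELETE".
    """
    state = dict(full_backup)  # leave the backup itself untouched
    for operation, name, data in log:
        if operation == "PUT":
            state[name] = data
        elif operation == "DELETE":
            state.pop(name, None)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return state
```

For an additions-only workload like the ones discussed, the log would contain only PUT entries, so it would grow at the same rate as the data itself.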
*** rsampaio has quit IRC20:12
*** xfrogman5 has quit IRC20:20
