*** artom has quit IRC | 00:11 | |
openstackgerrit | Merged opendev/system-config master: gitea backup: prune some large directories https://review.opendev.org/c/opendev/system-config/+/771534 | 00:22 |
openstackgerrit | Merged opendev/system-config master: borg-backup: fix logrotate name https://review.opendev.org/c/opendev/system-config/+/771557 | 00:22 |
*** akrpan-pure has quit IRC | 00:25 | |
*** klonn has quit IRC | 00:26 | |
clarkb | https://nb02.opendev.org/centos-8-0000140776.log that looks promising (says the build succeeded) | 00:30 |
clarkb | I've cleaned up those three dirs in dib_tmp as they continued to not go away and the builder had moved on to its third image build | 00:47 |
clarkb | nb02 is looking good so far, it now needs time to take load off of nb01 | 00:48 |
clarkb | ianw: fungi I noticed that the afs servers are still in the emergency file when I was modifying it earlier for the nb0X servers. | 00:48 |
clarkb | Not sure if they still need to be there? | 00:48 |
ianw | clarkb: was just making sure that the new ansible was ok ... which it wasn't :) | 00:48 |
ianw | but i think it's good now. ord has its correct rules and i'm pretty confident it's all acting idempotently | 00:49 |
*** mlavalle has quit IRC | 00:49 | |
auristor | there is something odd because "vos examine docs" doesn't show the entry locked for a release but there is clearly a volume transfer from afs01.dfw to afs01.ord in flight. | 00:52 |
clarkb | nb02 has built two images now | 00:53 |
*** DSpider has quit IRC | 00:55 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-bcakup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 01:03 |
ianw | auristor: yeah, i think i unlocked it, not realising the cron job to release it had just kicked off | 01:03 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 01:04 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 01:20 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 01:34 |
*** hamalq has quit IRC | 01:40 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 02:05 |
*** lbragstad_ is now known as lbragstad | 02:15 | |
*** hemanth_n has joined #opendev | 02:18 | |
ianw | clarkb: https://4af1cb710e8a42609c94-0507f0d56ad72621551127593d6d8a94.ssl.cf5.rackcdn.com/771738/5/check/system-config-run-borg-backup/8d5f5e5/borg-backup-test01.opendev.org/borg-backup-borg-backup01.region.provider.opendev.org.log | 02:35 |
ianw | clarkb: hrmmm, i think we need to think about pruning with multiple archives | 02:35 |
ianw | Keeping archive: borg-backup-test01-random-2021-01-21T02:27:13 Thu, 2021-01-21 02:27:14 [5f875153437165003c135a6c0b45e96f48fe7a9a876f669bcec94bb4d653b90c] | 02:35 |
ianw | Pruning archive: borg-backup-test01-2021-01-21T02:27:04 Thu, 2021-01-21 02:27:05 [3b2d8dcaa0f43db571febfe846f60c698de4a17d9397a41c401e5b3b1f2daca4] (1/1) | 02:35 |
ianw | with --keep-daily, it deletes one of the mysqldump stream or file backups | 02:35 |
ianw | or i guess we need separate prunes with prefixes maybe ... | 02:37 |
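The pruning surprise above can be reproduced with a small model. This is a minimal sketch, assuming `borg prune --keep-daily` keeps the newest archive of each day among whatever archives a single prune run selects; `keep_daily`, `prune` and `series` are hypothetical helpers, not part of borg or the system-config backup script. The archive names and creation times are the ones from the prune output in the log.

```python
import re
from collections import defaultdict
from datetime import datetime

# Archive names and creation times copied from the prune output above.
ARCHIVES = [
    ("borg-backup-test01-2021-01-21T02:27:04", datetime(2021, 1, 21, 2, 27, 5)),
    ("borg-backup-test01-random-2021-01-21T02:27:13", datetime(2021, 1, 21, 2, 27, 14)),
]

_TIMESTAMP = re.compile(r"-\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$")


def series(name):
    """Hypothetical grouping key: the archive name minus its timestamp."""
    return _TIMESTAMP.sub("", name)


def keep_daily(archives, days):
    """Approximate --keep-daily: keep the newest archive of each of the
    `days` most recent days that have any archive at all."""
    kept = {}
    # Newest first, so the first archive seen for a day is that day's newest.
    for name, created in sorted(archives, key=lambda a: a[1], reverse=True):
        day = created.date()
        if day not in kept and len(kept) < days:
            kept[day] = name
    return set(kept.values())


def prune(archives, days, group_key=None):
    """Return the surviving names. group_key=None models one prune over a
    shared prefix; otherwise every group is pruned independently."""
    if group_key is None:
        return keep_daily(archives, days)
    groups = defaultdict(list)
    for name, created in archives:
        groups[group_key(name)].append((name, created))
    kept = set()
    for members in groups.values():
        kept |= keep_daily(members, days)
    return kept


if __name__ == "__main__":
    print(sorted(prune(ARCHIVES, days=7)))                    # only the "random" archive survives
    print(sorted(prune(ARCHIVES, days=7, group_key=series)))  # both archives survive
```

With real borg the grouping has to come from `--prefix`, and a filesystem-archive prefix such as `borg-backup-test01-` also matches `borg-backup-test01-random-...`, so per-series pruning only works if no series name is a prefix of another.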
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: separate archives https://review.opendev.org/c/opendev/system-config/+/771748 | 03:28 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: separate archives https://review.opendev.org/c/opendev/system-config/+/771748 | 04:04 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: separate archives https://review.opendev.org/c/opendev/system-config/+/771748 | 04:33 |
*** ykarel has joined #opendev | 04:52 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: separate archives https://review.opendev.org/c/opendev/system-config/+/771748 | 05:39 |
*** whoami-rajat__ has joined #opendev | 05:56 | |
*** ysandeep|away is now known as ysandeep | 06:06 | |
*** marios has joined #opendev | 06:15 | |
*** ykarel_ has joined #opendev | 06:16 | |
*** ykarel has quit IRC | 06:19 | |
openstackgerrit | Rico Lin proposed openstack/project-config master: Add ubuntu bionic and focal with xxxlarge flavor https://review.opendev.org/c/openstack/project-config/+/771565 | 07:02 |
*** lpetrut has joined #opendev | 07:10 | |
*** tkajinam_ has joined #opendev | 07:19 | |
*** tkajinam has quit IRC | 07:20 | |
*** ykarel_ is now known as ykarel | 07:27 | |
*** eolivare has joined #opendev | 07:39 | |
*** ralonsoh has joined #opendev | 07:41 | |
*** jpena|off is now known as jpena | 07:52 | |
*** sboyron has joined #opendev | 07:57 | |
*** slaweq has joined #opendev | 07:58 | |
*** DSpider has joined #opendev | 08:03 | |
*** fressi has joined #opendev | 08:06 | |
*** rpittau|afk is now known as rpittau | 08:11 | |
*** sboyron has quit IRC | 08:14 | |
*** tosky has joined #opendev | 08:16 | |
*** hashar has joined #opendev | 08:23 | |
*** sboyron has joined #opendev | 08:24 | |
*** andrewbonney has joined #opendev | 08:24 | |
openstackgerrit | Alfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream https://review.opendev.org/c/zuul/zuul-jobs/+/770815 | 08:33 |
*** sboyron has quit IRC | 08:37 | |
*** brinzhang has quit IRC | 08:55 | |
*** brinzhang has joined #opendev | 08:55 | |
*** brinzhang has quit IRC | 08:57 | |
*** brinzhang has joined #opendev | 08:57 | |
openstackgerrit | Alfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream https://review.opendev.org/c/zuul/zuul-jobs/+/770815 | 09:00 |
*** jaicaa has quit IRC | 09:16 | |
*** jaicaa has joined #opendev | 09:18 | |
*** lpetrut_ has joined #opendev | 09:34 | |
*** ykarel_ has joined #opendev | 09:34 | |
*** tosky has quit IRC | 09:36 | |
*** tosky_ has joined #opendev | 09:36 | |
*** lpetrut has quit IRC | 09:37 | |
*** ykarel has quit IRC | 09:37 | |
*** ykarel_ is now known as ykarel | 09:39 | |
*** tosky_ is now known as tosky | 09:43 | |
*** zoharm has joined #opendev | 09:58 | |
openstackgerrit | Daniel Lublin proposed opendev/git-review master: Allow choosing field for author in named branch https://review.opendev.org/c/opendev/git-review/+/444574 | 10:11 |
openstackgerrit | Guillaume Chauvel proposed opendev/system-config master: Increase autogenerated comment width to avoid line wrap https://review.opendev.org/c/opendev/system-config/+/771445 | 10:14 |
*** ysandeep is now known as ysandeep|afk | 10:14 | |
*** sboyron has joined #opendev | 10:14 | |
*** hashar is now known as hasharAway | 10:53 | |
*** dtantsur|afk is now known as dtantsur | 10:58 | |
openstackgerrit | Guillaume Chauvel proposed opendev/system-config master: Increase autogenerated comment width to avoid line wrap https://review.opendev.org/c/opendev/system-config/+/771445 | 11:33 |
openstackgerrit | Guillaume Chauvel proposed opendev/system-config master: [DNM] test comment width: review without autogenerated tag https://review.opendev.org/c/opendev/system-config/+/771798 | 11:33 |
*** ysandeep|afk is now known as ysandeep | 11:36 | |
*** jpena is now known as jpena|lunch | 12:30 | |
openstackgerrit | Rico Lin proposed openstack/project-config master: Add ubuntu bionic and focal with xxxlarge flavor https://review.opendev.org/c/openstack/project-config/+/771565 | 12:56 |
*** hemanth_n has quit IRC | 13:02 | |
*** tosky has quit IRC | 13:06 | |
*** tosky has joined #opendev | 13:06 | |
openstackgerrit | Dmitry Tantsur proposed openstack/diskimage-builder master: Remove the deprecated ironic-agent element https://review.opendev.org/c/openstack/diskimage-builder/+/771808 | 13:07 |
*** jpena|lunch is now known as jpena | 13:29 | |
*** owalsh has quit IRC | 13:48 | |
*** owalsh has joined #opendev | 14:08 | |
*** ykarel is now known as ykarel|mtg | 14:10 | |
*** hemanth_n has joined #opendev | 14:21 | |
kopecmartin | clarkb: hi, when you have a moment, could you have a look at this please? https://review.opendev.org/c/opendev/system-config/+/705258 .. i got patchset 22 passing, but i'm having trouble with https support, any advice? | 14:27 |
*** artom has joined #opendev | 14:28 | |
*** klonn has joined #opendev | 14:29 | |
mordred | infra-root: I've been getting some email bounces from mttest and buildx-test - which are both test vms I had spawned up a while ago. I'm going to delete them | 14:37 |
fungi | mordred: thanks! i kept meaning to ask you about whether we could delete mttest | 14:38 |
fungi | i hadn't noticed buildx-test but ++cleanup | 14:38 |
mordred | fungi: in fact, I'm going to delete both mttest's and mttest-docker too | 14:40 |
mordred | at least I name things consistently :) | 14:41 |
fungi | awesome | 14:41 |
openstackgerrit | Merged openstack/project-config master: Add ubuntu bionic and focal with xxxlarge flavor https://review.opendev.org/c/openstack/project-config/+/771565 | 14:42 |
*** zul has joined #opendev | 14:42 | |
mordred | fungi: done | 14:42 |
*** zimmerry has quit IRC | 14:45 | |
*** sshnaidm|ruck is now known as sshnaidm|afk | 14:49 | |
fungi | thanks again! | 14:50 |
*** brinzhang has quit IRC | 15:01 | |
*** brinzhang has joined #opendev | 15:01 | |
*** brinzhang has quit IRC | 15:04 | |
*** brinzhang has joined #opendev | 15:04 | |
*** zimmerry has joined #opendev | 15:04 | |
*** fressi has left #opendev | 15:07 | |
*** ykarel|mtg is now known as ykarel | 15:08 | |
*** hasharAway has quit IRC | 15:18 | |
*** Eighth_Doctor has quit IRC | 15:19 | |
*** mordred has quit IRC | 15:20 | |
*** hemanth_n has quit IRC | 15:25 | |
*** mordred has joined #opendev | 15:29 | |
*** klonn has quit IRC | 15:33 | |
*** lpetrut_ has quit IRC | 15:42 | |
*** Eighth_Doctor has joined #opendev | 15:49 | |
clarkb | I've reenabled nb01's builder. Will remove it from the emergency file now | 15:50 |
*** mlavalle has joined #opendev | 15:51 | |
clarkb | the disk use is much more balanced between the two servers now | 15:52 |
clarkb | kopecmartin: I need to find breakfast, but will take a look afterwards | 15:53 |
*** sshnaidm|afk is now known as sshnaidm|ruck | 15:56 | |
*** hashar has joined #opendev | 15:59 | |
clarkb | kopecmartin: I think you need to add an apache2 reverse proxy to terminate 80 and 443. I'm not seeing that in the current role | 16:09 |
clarkb | kopecmartin: our letsencrypt stuff has a test mode that will use a self signed cert which you can use | 16:09 |
clarkb | looking for some examples from other services now | 16:09 |
clarkb | kopecmartin: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/codesearch/templates/codesearch.vhost.j2 that is an apache config for the codesearch reverse proxy and https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/codesearch/tasks/main.yaml#L11-L37 is the ansible to manage the service | 16:11 |
openstackgerrit | Merged zuul/zuul-jobs master: Temporarily stop running Gentoo base role tests https://review.opendev.org/c/zuul/zuul-jobs/+/771105 | 16:11 |
kopecmartin | clarkb: thanks, i was trying to compare the code with other servers' code by analogy, but it takes a lot of time as this system-config stuff is new to me .. | 16:16 |
kopecmartin | thank you for the links, i'm gonna check them | 16:16 |
clarkb | kopecmartin: I would start just by getting apache running with a config and then worry about getting the letsencrypt testing set up next | 16:17 |
clarkb | since I think you should be able to mostly confirm that apache is running first on port 80 even if 443 doesn't work yet | 16:17 |
kopecmartin | clarkb: hm, in patchset 22 the test_refstack_listening passed, didn't that confirm that apache is running? | 16:19 |
clarkb | kopecmartin: no, that checked port 8000 which is the refstack python daemon service | 16:21 |
clarkb | kopecmartin: we want apache2 to proxy port 443 to port 8000 and do the ssl termination in apache (it should also redirect port 80 to port 443) | 16:21 |
kopecmartin | ah, ok, makes sense | 16:23 |
clarkb | kopecmartin: the codesearch apache config should do almost exactly that but for a different service that uses port 6080 instead of 8000 | 16:23 |
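A sketch of the vhost shape clarkb describes: port 80 only redirects to 443, and Apache terminates TLS on 443 and proxies plain HTTP to the refstack daemon on port 8000. It is rendered from Python only to keep a single language for the examples here; the hostname and certificate paths are hypothetical placeholders, the codesearch template linked above remains the real reference, and it assumes `mod_ssl`, `mod_proxy` and `mod_proxy_http` are enabled.

```python
def render_reverse_proxy_vhost(server_name, backend_port, cert_file, key_file):
    """Return Apache config that redirects HTTP to HTTPS and proxies HTTPS
    to a plain-HTTP backend listening on 127.0.0.1:backend_port."""
    if not 0 < backend_port < 65536:
        raise ValueError(f"invalid backend port: {backend_port}")
    backend = f"http://127.0.0.1:{backend_port}/"
    lines = [
        "<VirtualHost *:80>",
        f"  ServerName {server_name}",
        "  # Plain HTTP only redirects; nothing is served on port 80.",
        f"  Redirect permanent / https://{server_name}/",
        "</VirtualHost>",
        "",
        "<VirtualHost *:443>",
        f"  ServerName {server_name}",
        "  SSLEngine on",
        f"  SSLCertificateFile {cert_file}",
        f"  SSLCertificateKeyFile {key_file}",
        "  # TLS ends here; the backend only ever sees plain HTTP.",
        f"  ProxyPass / {backend}",
        f"  ProxyPassReverse / {backend}",
        "</VirtualHost>",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hostname and certificate paths.
    print(render_reverse_proxy_vhost(
        "refstack.example.org", 8000,
        "/etc/ssl/refstack/cert.pem", "/etc/ssl/refstack/key.pem"))
```

For codesearch the same shape applies with `backend_port=6080`.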
clarkb | kopecmartin: feel free to point out when new patchsets show up and I can rereview too | 16:29 |
*** ysandeep is now known as ysandeep|away | 16:31 | |
openstackgerrit | Merged opendev/git-review master: Fix bug in git_credentials() https://review.opendev.org/c/opendev/git-review/+/753946 | 16:34 |
*** lpetrut has joined #opendev | 16:42 | |
*** klonn has joined #opendev | 16:42 | |
*** marios is now known as marios|out | 16:44 | |
*** lpetrut has quit IRC | 16:50 | |
*** jpena is now known as jpena|off | 16:59 | |
*** marios|out has quit IRC | 17:03 | |
clarkb | fungi: the tc meeting got me thinking about general capacity issues and while I looked last week to see if any clouds were just hard failing I didn't realize that inap was also still off | 17:04 |
clarkb | I suppose we could try turning it on to see if the ip arp issues persist? | 17:04 |
fungi | yeah, we've tried several times but worth trying again, and giving mgagne (or someone) a new list of affected addresses | 17:06 |
clarkb | let me push that change up (along with a revert) | 17:06 |
clarkb | or maybe we already have a revert to the disable? | 17:06 |
fungi | not sure | 17:07 |
fungi | i can probably go digging after the next couple of meetings wrap up | 17:07 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Revert "Revert "Revert "Temporarily stop booting nodes in inap-mtl01""" https://review.opendev.org/c/openstack/project-config/+/771857 | 17:09 |
clarkb | easy enough to just make a change ^ | 17:09 |
*** ykarel has quit IRC | 17:09 | |
*** ykarel has joined #opendev | 17:11 | |
artom | Is /etc/nodepool/id_rsa(.pub) something that Zuul or another "base" component generates on the VMs running CI jobs | 17:13 |
artom | ? | 17:13 |
artom | I'm trying to track down where it's from... | 17:13 |
clarkb | artom: I think our base jobs do that as a way to keep backward compatibility from forever ago when nodepool did it | 17:13 |
clarkb | let me see if I can find it | 17:13 |
artom | It looks like TripleO-CI uses it (and creates if it's not already present) | 17:14 |
artom | But changing the way it's created in there didn't affect anything, so I'm assuming it comes from somewhere else | 17:14 |
artom | My "actual" issue is - it's generated in OpenSSH format, and read by some code that's using paramiko, which doesn't support the exact format | 17:15 |
clarkb | artom: openstack/openstack-zuul-jobs/playbooks/legacy/pre.yaml | 17:15 |
artom | clarkb++ | 17:16 |
clarkb | another option would be to get off of the legacy base job stuff and just do it directly in the way you need it | 17:16 |
clarkb | or modify the legacy stuff to do a ssh-keygen -m PEM conversion | 17:16 |
artom | Not a choice I have, I depend on the tripleo job for what I'm trying to do | 17:16 |
clarkb | also you can do that conversion at any time before you use it I suppose | 17:17 |
clarkb | new openssh reads the PEM format just fine | 17:17 |
clarkb | eventually it might stop, but for now that seems fine | 17:17 |
*** ykarel has quit IRC | 17:17 | |
fungi | i expect it will be *many years* before openssh starts to refuse to read pem formatted private keys | 17:18 |
fungi | they waited 5+ years from when they added support for the new key format before they changed the default in ssh-keygen, after all | 17:19 |
artom | clarkb, I don't suppose I'd be allowed to propose a change to the legacy playbooks that allows you to specify the format with a variable? | 17:19 |
artom | I tried a conversion on my laptop, while paramiko read the key fine, it wasn't able to authenticate with the public key... | 17:19 |
clarkb | I wouldn't make it optional I would just always write a pem version there | 17:20 |
clarkb | since it is forward compatible and backward compatible unlike the new ssh format | 17:20 |
clarkb | `ssh-keygen -p -m PEM -f ./$FILE -N '' -P ''` is the conversion process | 17:20 |
clarkb | and it shouldn't affect the public key at all | 17:21 |
clarkb | that command says change the keyfile's passphrase from empty '' to empty '' and in the process we can side effect a format change with -m PEM | 17:22 |
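A small helper for the conversion described above: it checks whether a private key uses the newer OpenSSH container (the `-----BEGIN OPENSSH PRIVATE KEY-----` header) and, only then, rewrites it in place with the same `ssh-keygen` invocation quoted in the log. This is a minimal sketch; the function names are hypothetical, and it assumes an unencrypted key like the nodepool key discussed here.

```python
import subprocess
from pathlib import Path
from typing import List

OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"


def private_key_format(text: str) -> str:
    """Classify a private key by its first armor line."""
    lines = text.strip().splitlines()
    first = lines[0].strip() if lines else ""
    if first == OPENSSH_HEADER:
        return "openssh"
    # e.g. "-----BEGIN RSA PRIVATE KEY-----" for a PEM (PKCS#1) RSA key
    if first.startswith("-----BEGIN ") and first.endswith(" PRIVATE KEY-----"):
        return "pem"
    return "unknown"


def pem_conversion_command(path: str) -> List[str]:
    """ssh-keygen arguments that change the passphrase from '' to '' and,
    as a side effect of -m PEM, rewrite the key file in PEM format."""
    return ["ssh-keygen", "-p", "-m", "PEM", "-f", path, "-N", "", "-P", ""]


def convert_to_pem_if_needed(path: str) -> bool:
    """Convert an OpenSSH-format key in place; return True if ssh-keygen ran."""
    if private_key_format(Path(path).read_text()) != "openssh":
        return False
    subprocess.run(pem_conversion_command(path), check=True)
    return True
```

Running it on a key that is already PEM is a no-op, and as clarkb notes the matching `.pub` file is unaffected.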
* artom tries again | 17:22 | |
artom | (Locally) | 17:22 |
*** hamalq has joined #opendev | 17:33 | |
artom | OK, I think the auth failures with the converted key are un-related... something about paramiko sending the wrong pubkey type... | 17:35 |
artom | clarkb, so is https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/771858 what you had in mind, or am I meant to add an override for that in my own job only? | 17:35 |
clarkb | are you on fedora 33? | 17:35 |
artom | Yeah | 17:35 |
clarkb | fedora 33 has effectively broken all ssh-rsa due to sillyness | 17:36 |
clarkb | (that's my personal opinion) | 17:36 |
clarkb | give me a minute and I can try and summarize | 17:36 |
clarkb | yes that change is roughly what I had in mind | 17:36 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: remove eselect python from gentoo element https://review.opendev.org/c/openstack/diskimage-builder/+/771861 | 17:36 |
artom | clarkb, cool, thanks - I was worried about changing stuff this "low down" in the CI stack | 17:37 |
clarkb | artom: SSH uses rsa for a number of things like authentication and verification of host keys (to avoid mitm attacks) | 17:37 |
clarkb | I think, to keep things to manageable sizes, pubkey information is exchanged in a hashed form. The old school hash is sha1 | 17:38 |
clarkb | so when ssh does things called 'ssh-rsa' that means rsa + sha1 when hashing stuff | 17:38 |
clarkb | because sha1 is no longer considered strong enough for security work, openssh has deprecated ssh-rsa (note deprecated, not disabled) for host key verification and supports rsa-sha2-256 and rsa-sha2-512 as alternatives to continue verifying rsa host keys but with much stronger sha2 hashes | 17:39 |
clarkb | fedora 33 has disabled all use of ssh-rsa including for the use of host key verification and authentication (note upstream hasn't even deprecated the authentication side yet, only hostkey verification as far as I can tell) | 17:39 |
*** lpetrut has joined #opendev | 17:40 | |
clarkb | the authentication side is where things break because in order to use rsa-sha2-256 and rsa-sha2-512 for authentication instead of the sha1 ssh-rsa both the client and the server must support kex extensions to specify server-sig-algs | 17:40 |
clarkb | fedora 33 users have struggled with using rsa keys to talk to our gerrit server because the java sshd there does not support the server-sig-algs extension and openssh falls back to ssh-rsa in that case, which is disabled | 17:41 |
*** diablo_rojo has quit IRC | 17:41 | |
clarkb | the rfc says that clients can one day default to a sha2 variant in this case, but we haven't reached that point yet I guess | 17:42 |
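The negotiation described above can be modelled as a small decision function. This is a simplified sketch, not OpenSSH's actual logic: `pick_rsa_sig_alg` and the policy set are hypothetical, `None` stands for a server (like the Gerrit sshd described here) that never sends `server-sig-algs`, and the `assume_sha2` flag models the alternative fallback to `rsa-sha2-512` that clarkb argues Fedora 33 should have shipped.

```python
from typing import AbstractSet, Optional, Sequence

# SHA-2 RSA signature algorithms, strongest first.
SHA2_PREFERENCE = ("rsa-sha2-512", "rsa-sha2-256")

# Simplified RSA portion of a Fedora 33-like crypto policy: SHA-1 based
# ssh-rsa signatures are no longer allowed.
FEDORA33_LIKE_POLICY = frozenset({"rsa-sha2-512", "rsa-sha2-256"})


def pick_rsa_sig_alg(server_sig_algs: Optional[Sequence[str]],
                     allowed: AbstractSet[str],
                     assume_sha2: bool = False) -> Optional[str]:
    """Return the RSA signature algorithm a client would try, or None if
    pubkey auth with this RSA key cannot proceed.

    server_sig_algs is the server's advertised list, or None when the
    server sends no server-sig-algs extension at all.
    """
    if server_sig_algs is not None:
        # The server said what it accepts: use the best SHA-2 variant we share.
        for alg in SHA2_PREFERENCE:
            if alg in server_sig_algs and alg in allowed:
                return alg
        fallback = "ssh-rsa" if "ssh-rsa" in server_sig_algs else None
    else:
        # No information from the server: the client falls back to ssh-rsa,
        # or, with assume_sha2, guesses rsa-sha2-512 instead.
        fallback = "rsa-sha2-512" if assume_sha2 else "ssh-rsa"
    return fallback if fallback in allowed else None


if __name__ == "__main__":
    print(pick_rsa_sig_alg(None, FEDORA33_LIKE_POLICY))                    # None: auth fails
    print(pick_rsa_sig_alg(None, FEDORA33_LIKE_POLICY, assume_sha2=True))  # rsa-sha2-512
    print(pick_rsa_sig_alg(["ssh-ed25519", "rsa-sha2-256", "rsa-sha2-512"],
                           FEDORA33_LIKE_POLICY))                          # rsa-sha2-512
```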
artom | Ah, so that's why you grok this so well - Gerrit SSH auth :) | 17:42 |
*** rpittau is now known as rpittau|afk | 17:42 | |
clarkb | ya :) | 17:42 |
artom | Thanks for the crash course, very concise and understandable | 17:42 |
openstackgerrit | Merged openstack/project-config master: Revert "Revert "Revert "Temporarily stop booting nodes in inap-mtl01""" https://review.opendev.org/c/openstack/project-config/+/771857 | 17:42 |
clarkb | artom: if you do ssh -v you can see the server-sig-algs get negotiated by the server if supported | 17:42 |
clarkb | which can help narrow down where the problem is. | 17:43 |
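To check whether a server advertises `server-sig-algs`, the `ssh -v` output can be scanned for it. A minimal sketch; it assumes the debug line contains `server-sig-algs=<alg1,alg2,...>`, which is how recent OpenSSH clients print it (the text before it varies between versions), and the helper names are hypothetical. No match is treated as "the server did not advertise the extension".

```python
import re
import subprocess
from typing import List, Optional

# Matches e.g. "debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,...>";
# the prefix before "server-sig-algs=" differs between OpenSSH versions.
_SIG_ALGS = re.compile(r"server-sig-algs=<([^>]*)>")


def server_sig_algs_from_debug(ssh_v_output: str) -> Optional[List[str]]:
    """Algorithms the server advertised, or None if none were logged."""
    match = _SIG_ALGS.search(ssh_v_output)
    if match is None:
        return None
    return [alg for alg in match.group(1).split(",") if alg]


def probe(host: str, port: int = 22) -> Optional[List[str]]:
    """Run `ssh -v` against host and parse its debug output (stderr).

    The advertisement arrives before user authentication, so it is logged
    even if login then fails. The host key must already be known: with
    BatchMode an unknown key aborts the connection before that point.
    """
    result = subprocess.run(
        ["ssh", "-v", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
         "-p", str(port), host, "true"],
        capture_output=True, text=True, timeout=60)
    return server_sig_algs_from_debug(result.stderr)
```

For Gerrit, the SSH port is usually 29418, e.g. `probe(host, 29418)`.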
clarkb | Or if you just want to make things work you can reenable ssh-rsa via ssh config changes on fedora33 or you can use ecdsa or ed25519 | 17:43 |
artom | I can auth to Gerrit just fine, btw :) | 17:43 |
clarkb | I think we're suggesting that people should use ecdsa or ed25519 to talk to our gerrit from fedora 33 rather than reduce the distro's security stance | 17:44 |
artom | This is a CI job thing... Tempest plugin Python code using paramiko SSHing into the nodeset VMs | 17:44 |
clarkb | that said, the reason I call it silly is I think fedora 33 should've updated openssh to fallback to rsa-sha2-512 instead of ssh-rsa since they have disabled ssh-rsa | 17:44 |
clarkb | their users won't be any worse off if the server doesn't negotiate rsa-sha2-512, it will fail if the server can't support it. But if the server can support it it should work | 17:45 |
clarkb | artom: ya in that case my bet is paramiko either doesn't know how to do rsa-sha2-* or paramiko isn't handling server-sig-algs properly | 17:45 |
clarkb | unless this is a cirros vm. maybe dropbear doesn't do kex extensions like gerrit | 17:46 |
clarkb | I could see that being the case since dropbear sshd is super tiny | 17:46 |
artom | I think the latter? Because with the converted key, SSHing with paramiko to localhost fails | 17:46 |
clarkb | artom: if paramiko has the equivalent of ssh -vvv turning that on and reviewing the log would be helpful. I can probably skim it too since I spent a bunch of time doing that recently | 17:47 |
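paramiko has no `-vvv` flag, but it logs through Python's standard `logging` module under logger names starting with `paramiko` (for example `paramiko.transport`), so raising those to DEBUG gives a comparable trace of key exchange and authentication. A minimal sketch; `enable_paramiko_debug` is a hypothetical helper, and paramiko's own `paramiko.util.log_to_file(path)` does much the same but writes to a file.

```python
import logging
from typing import Optional, TextIO


def enable_paramiko_debug(stream: Optional[TextIO] = None) -> logging.Logger:
    """Emit every paramiko.* log record at DEBUG or above to `stream`
    (StreamHandler uses stderr when stream is None)."""
    logger = logging.getLogger("paramiko")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    enable_paramiko_debug()
    # ...then create paramiko.SSHClient() and call connect() as usual; the
    # transport and auth messages now appear on stderr.
```

Call it before connecting; the authentication lines are the ones to compare with the sshd-side `PubkeyAcceptedKeyTypes` complaint.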
artom | With the server complaining about "userauth_pubkey: key type ssh-rsa not in PubkeyAcceptedKeyTypes [preauth]" | 17:47 |
artom | And setting PubkeyAcceptedKeyTypes in sshd.conf doesn't seem to affect anything | 17:47 |
clarkb | artom: oh ya localhost will fail if it is fedora 33 since it doesn't want to do ssh-rsa | 17:47 |
clarkb | modifying PubkeyAcceptedKeyTypes should work but it's apparently more complicated than people thought | 17:48 |
artom | So how come ssh -i <that same key> works? | 17:48 |
clarkb | because openssh `ssh` can tell the server it will do rsa-sha2-* with that same key | 17:48 |
clarkb | the on disk format doesn't affect the hash used, just the client and servers supported protocols | 17:48 |
clarkb | the hash is a hash of the stuff on disk and is calculated at runtime based on what they negotiate between themselves | 17:49 |
artom | Ah, so stacking issues here... | 17:49 |
clarkb | (the pem format thing makes this confusing because there is a separate file format issue but that is independent) | 17:49 |
artom | 1. paramiko doesn't understand the new openssh key format | 17:49 |
artom | And 2. It can't "match protocols" (sorry for the vulgar oversimplification) | 17:50 |
clarkb | that is a good simplification | 17:50 |
clarkb | (un)fortunately I have yet to debug this in the context of paramiko as the client so don't have any great pointers to a fix for paramiko | 17:51 |
clarkb | but hopefully the general problem description simplifies your debugging | 17:51 |
artom | It helps me understand the situation better | 17:51 |
artom | For debugging I'm just re-running my job with a Depends-on: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/771858 :) | 17:52 |
*** ralonsoh has quit IRC | 17:52 | |
*** lpetrut has quit IRC | 17:55 | |
clarkb | fungi: inap is in use in nodepool now, but no successful boots yet. Going to check logs momentarily | 18:06 |
fungi | fun | 18:08 |
clarkb | if I had to guess server launches timed out due to hypervisor image caches being stale, but not finding any evidence of that yet (or of any failures, need to dig more) | 18:09 |
clarkb | ya "openstack.exceptions.ResourceTimeout: Timeout waiting for the server to come up." appears to be the error which has been in the past due to stale images | 18:11 |
clarkb | I guess give it another hour and check back in | 18:11 |
clarkb | ya the in use number is non zero now | 18:16 |
clarkb | small, but trending the right way | 18:16 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: remove eselect python from gentoo element https://review.opendev.org/c/openstack/diskimage-builder/+/771861 | 18:16 |
clarkb | I have rechecked 775121 (the xenial openafs ppa cleanup in openafs-client role); it failed in the gate because a couple of jobs failed to add the ppa due to a timeout talking to gpg keyservers | 18:25 |
clarkb | *771521 | 18:25 |
*** dtantsur is now known as dtantsur|afk | 18:32 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: DNM: simplify updating python versions in gentoo https://review.opendev.org/c/openstack/diskimage-builder/+/771861 | 18:36 |
clarkb | anyone understand why a bunch of zuul changes are in the openstack zuul check queue? | 18:54 |
clarkb | they aren't running any jobs and seem to just be hanging out | 18:54 |
clarkb | all the queue length values are 0 | 18:55 |
*** andrewbonney has quit IRC | 18:55 | |
*** _mlavalle_1 has joined #opendev | 19:00 | |
fungi | i don't see them | 19:01 |
fungi | was it maybe momentary? | 19:01 |
*** mlavalle has quit IRC | 19:03 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 19:08 |
*** ShadowJonathan has quit IRC | 19:27 | |
*** bbezak has quit IRC | 19:28 | |
*** bbezak has joined #opendev | 19:28 | |
*** ShadowJonathan has joined #opendev | 19:28 | |
*** klonn has quit IRC | 19:33 | |
clarkb | fungi: ya I did a manual refresh and they have gone away | 19:51 |
*** slaweq has quit IRC | 19:55 | |
*** zoharm has quit IRC | 20:00 | |
*** hashar has quit IRC | 20:00 | |
clarkb | inap seems to have stabilized from a node launching perspective | 20:02 |
clarkb | now we just need to keep an eye out for the ip issues | 20:02 |
*** sboyron has quit IRC | 20:09 | |
fungi | last few times we tried it started to show up fairly quickly | 20:11 |
fungi | i think maybe i used the e-r signature for changed host keys to spot it | 20:11 |
clarkb | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_874/705258/25/check/system-config-run-refstack/8742657/job-output.txt is a post run failure in inap | 20:12 |
clarkb | looking at the log I don't see a clear reason for why it failed though | 20:13 |
clarkb | it says ok ok, skipping skipping, then not much info and it reports failed: 1 for each host | 20:13 |
clarkb | the job failed properly too though | 20:13 |
clarkb | so maybe a side effect from the runtime failure? | 20:13 |
*** tosky has quit IRC | 20:14 | |
*** tosky has joined #opendev | 20:15 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 20:16 |
*** eolivare has quit IRC | 20:17 | |
openstackgerrit | Ghanshyam proposed opendev/subunit2sql master: Fix compatibility with latest oslo.config https://review.opendev.org/c/opendev/subunit2sql/+/764832 | 20:30 |
*** klonn has joined #opendev | 20:34 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: DNM: simplify updating python versions in gentoo https://review.opendev.org/c/openstack/diskimage-builder/+/771861 | 20:45 |
*** klonn has quit IRC | 20:50 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: simplify updating python versions in gentoo https://review.opendev.org/c/openstack/diskimage-builder/+/771861 | 20:54 |
*** whoami-rajat__ has quit IRC | 21:04 | |
ianw | clarkb: if you have a little time, love to get your review on https://review.opendev.org/c/opendev/system-config/+/771738 and 771748 | 21:07 |
ianw | the gist is individual hosts can add a file in a directory that streams output that the backup script then looks for and runs, and puts in a separate archive | 21:08 |
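The mechanism ianw summarises (hosts drop files into a directory, and the backup script runs each one and stores its output as a separate archive) might look roughly like this. This is a hypothetical sketch, not the script from 771738: the `<host>-<name>-<timestamp>` naming mirrors the archive names in the earlier prune output, the directory path in the test data is made up, the repository is assumed to come from `BORG_REPO` (which the `::archive` shorthand uses), and `borg create ... -` reads the archive contents from stdin.

```python
import os
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, NamedTuple


class StreamPlan(NamedTuple):
    archive: str
    producer: List[str]  # command whose stdout is the data to back up
    consumer: List[str]  # borg command that reads that data from stdin


def find_stream_scripts(stream_dir: str) -> List[Path]:
    """Executable regular files in stream_dir, in a stable order."""
    return sorted(p for p in Path(stream_dir).iterdir()
                  if p.is_file() and os.access(p, os.X_OK))


def plan_stream_backups(scripts: Iterable[Path], host: str,
                        now: datetime) -> List[StreamPlan]:
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S")
    plans = []
    for script in scripts:
        archive = f"{host}-{script.name}-{stamp}"
        # "::name" takes the repository from BORG_REPO; "-" means stdin.
        plans.append(StreamPlan(archive, [str(script)],
                                ["borg", "create", f"::{archive}", "-"]))
    return plans


def run_stream(plan: StreamPlan) -> None:
    """Pipe the producer into borg, failing if either side fails."""
    producer = subprocess.Popen(plan.producer, stdout=subprocess.PIPE)
    try:
        subprocess.run(plan.consumer, stdin=producer.stdout, check=True)
    finally:
        producer.stdout.close()
        producer_rc = producer.wait()
    # borg commits the archive once it sees EOF, so a failed producer can
    # leave a truncated archive behind; at least report it loudly.
    if producer_rc != 0:
        raise subprocess.CalledProcessError(producer_rc, plan.producer)
```

A driver would call `find_stream_scripts`, then `run_stream` for each plan.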
ianw | fungi: i think we're back to 100% on afs ... even docs is running fine right? no other known issues ATM? | 21:08 |
ianw | clarkb: i can probably squash 771748 into it as well if we like; that just makes sure we prune the archives separately (otherwise you only keep one archive per day, which fails when you have a filesystem archive AND db archive) | 21:09 |
fungi | ianw: yep, i think we're all clear on afs today so can think about distro upgrades for the servers | 21:12 |
clarkb | ianw: at first glance squashing those makes sense. I'll have to do proper review of the setup in the parent though. One concern is that complicates getting complete backups setup for new hosts/services, but maybe we just do our best to document that | 21:20 |
ianw | clarkb: yeah, i should also add some documentation. i think it's likely new services liberally copy-paste things like the db backup bits so that might help | 21:21 |
clarkb | ianw: that might be a good one to get fungi and corvus to weigh in on too? just to make sure we're comfortable with split backups like that | 21:22 |
corvus | hioh | 21:22 |
ianw | corvus: https://review.opendev.org/c/opendev/system-config/+/771738 and 771748 are the question | 21:22 |
ianw | note they're not *totally* split; like it's still all in the one directory on the backup server, just different archive entries | 21:23 |
clarkb | they are split on the front end though so you have to configure them separately | 21:23 |
clarkb | which is I think my biggest concern | 21:24 |
clarkb | (it adds an extra step to getting proper backups) | 21:24 |
corvus | why is the prune separate? is it because you can't have the db stream and the filesystem in the same archive? | 21:24 |
clarkb | corvus: that is my understanding of it. They are separate archives so separate prunes | 21:25 |
clarkb | if you try to prune together with the shared prefix it prunes things improperly | 21:25 |
mordred | ianw: you may want to add --skip-extended-insert | 21:25 |
ianw | mordred: yeah, i was thinking you might have some ideas on the most effective way to dump :) | 21:26 |
mordred | otherwise you get giant insert lines that might be bad for differential backups? or maybe it's ok if borg is smarter than diff | 21:26 |
*** owalsh has quit IRC | 21:26 | |
ianw | there's also a --compact | 21:26 |
mordred | yah - I think you want the opposite of that | 21:26 |
clarkb | thinking out loud this will take us from 1 db dump a day to 3 | 21:27 |
mordred | assuming a diff-like behavior, you'd want the largest number of lines | 21:27 |
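mordred's point about `--skip-extended-insert` is easiest to see with the line-oriented diff he assumes. The sketch renders a toy table both ways (one multi-row INSERT, as mysqldump emits by default with `--extended-insert`, versus one INSERT per row), changes a single row, and counts the bytes in changed lines. `dump_lines` and `changed_bytes` are hypothetical helpers; real mysqldump also caps the length of each extended INSERT, and borg deduplicates by content-defined chunks rather than lines, so this only shows the direction of the effect.

```python
import difflib
from typing import List, Sequence, Tuple

Row = Tuple[int, str]


def dump_lines(table: str, rows: Sequence[Row], extended: bool) -> List[str]:
    """Render rows as mysqldump-style INSERT statements (toy version: no
    escaping and no splitting of long extended inserts)."""
    values = [f"({pk},'{val}')" for pk, val in rows]
    if extended:
        # One statement carrying every row, like --extended-insert.
        return [f"INSERT INTO `{table}` VALUES " + ",".join(values) + ";"]
    # One statement per row, like --skip-extended-insert.
    return [f"INSERT INTO `{table}` VALUES {v};" for v in values]


def changed_bytes(old: List[str], new: List[str]) -> int:
    """Bytes (including newlines) in lines of `new` that a line diff
    against `old` reports as inserted or replaced."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    total = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += sum(len(line) + 1 for line in new[j1:j2])
    return total


if __name__ == "__main__":
    rows = [(pk, f"row{pk}") for pk in range(1000)]
    edited = [(pk, "CHANGED" if pk == 500 else val) for pk, val in rows]
    for extended in (True, False):
        cost = changed_bytes(dump_lines("t", rows, extended),
                             dump_lines("t", edited, extended))
        print(f"extended={extended}: {cost} bytes in changed lines")
```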
clarkb | any concern with that? | 21:27 |
ianw | and i saw something that mentioned you should order by primary key to try and keep it stable | 21:27 |
fungi | presumably if we wanted database dumps in the same archive the solution would be to back them up from the filesystem, the attempt here is to be more space efficient on the archive end? | 21:27 |
corvus | sake of argument: what if we didn't compress the mysqldumps (and had fewer of them; like... maybe one); would that make borg happy and we could have one archive? | 21:27 |
clarkb | corvus: yes, but some services db backups are too big for that to fit on their disk iirc | 21:27 |
ianw | corvus: on etherpad, even having one uncompressed dump is getting tight | 21:27 |
clarkb | we'd need to figure that out to make it work that way | 21:28 |
mordred | ianw: innodb tables are naturally sorted by primary key | 21:28 |
mordred | so it should dump in that order regardless | 21:28 |
corvus | clarkb: why 3/day? | 21:28 |
clarkb | corvus: 1 for the local dump, then once a day to each of our different borg targets | 21:28 |
ianw | corvus: we backup to 2 servers, and i've left the local compressed jobs too | 21:28 |
fungi | mysqldump to the fs, and separately to both backup sites | 21:28 |
fungi | = 3 | 21:29 |
clarkb | we could make the stream script do a zcat of the local dump instead of dumping during the backup? | 21:29 |
clarkb | then we'd be back down to 1 dump a day | 21:29 |
ianw | clarkb: yep, although i guess that would need some locking to be 100% sure we never sent something corrupt | 21:30 |
clarkb | hrm ya | 21:31 |
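A minimal sketch of the "zcat the local dump" idea with the locking ianw mentions, assuming the nightly job writes a gzip'd dump to a local file. The sidecar lock file, the paths and both function names are hypothetical (the existing cron job does not take any lock); the advisory `fcntl.flock` calls are POSIX-only. The writer holds an exclusive lock while dumping and the stream step holds a shared lock while decompressing, so the stream never reads a half-written file.

```python
import fcntl
import gzip
import shutil

# Hypothetical locations; the real dump path is not given in the discussion.
DUMP_PATH = "/var/backups/mysql/dump.sql.gz"
LOCK_PATH = DUMP_PATH + ".lock"


def write_local_dump(sql_bytes, dump_path=DUMP_PATH, lock_path=LOCK_PATH):
    """Stand-in for the nightly dump job: hold an exclusive lock while writing."""
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            with gzip.open(dump_path, "wb") as dst:
                dst.write(sql_bytes)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def stream_local_dump(out, dump_path=DUMP_PATH, lock_path=LOCK_PATH):
    """The 'zcat' step: a shared lock guarantees no writer is mid-dump."""
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)
        try:
            with gzip.open(dump_path, "rb") as src:
                # Decompress into any binary stream, e.g. sys.stdout.buffer
                # feeding the backup client.
                shutil.copyfileobj(src, out)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

If the writer dies partway through, `gzip.open(...).read` raises `EOFError` on the truncated file, so the stream fails loudly instead of sending a partial dump. Writing to a temporary file and renaming it into place with `os.replace` is a common lock-free alternative.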
mordred | ianw: yeah - I think --skip-extended-insert should be all you need for good differential backup storage. oh - also - add --skip-dump-date | 21:31 |
corvus | it's worth considering the (lack of) atomicity between the db and fs. that's true today anyway (we dump the db to the fs, then back that up, so the host's filesystem is always ahead of the db in a backup). i don't think this substantially changes anything related to that (other than we have a few more timestamps involved) | 21:31 |
mordred | (no need to put a "this was dumped on $date" line into the dump) | 21:31 |
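For reference, the flag combination under discussion might be assembled like this. It is a sketch only: `--opt` is already the mysqldump default so it is left out, `--skip-extended-insert` writes one `INSERT` per row, and `--skip-dump-date` is optional because corvus argues for keeping the date line. Connection options such as credentials are omitted, and the function names are made up for this example.

```python
import subprocess


def mysqldump_command(database, keep_dump_date=True):
    """Build a mysqldump argv tuned for deduplicating backups."""
    cmd = [
        "mysqldump",
        # --opt is enabled by default, so it is not repeated here.
        # One INSERT statement per row, instead of large multi-row statements,
        # keeps unchanged rows byte-identical from one dump to the next.
        "--skip-extended-insert",
    ]
    if not keep_dump_date:
        # Omit the trailing "-- Dump completed on <date>" comment.
        cmd.append("--skip-dump-date")
    cmd.append(database)
    return cmd


def dump_to(stream, database):
    """Run mysqldump and write its output to an open binary stream."""
    subprocess.run(mysqldump_command(database), stdout=stream, check=True)
```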
corvus | mordred: actually, re my last line, that might be handy? :) | 21:32 |
mordred | oh - yeah | 21:32 |
corvus | unless we think it's going to explode the backup size -- i'm not sure what the de-dup window is like with borg? | 21:32 |
mordred | and it's not that much data that would be diffed each time or anything | 21:32 |
corvus | right, assuming dedup is sane, i think it would be an asset to keep it. | 21:33 |
ianw | corvus: it's configurable, about 2mb by default i think | 21:33 |
corvus | oh | 21:33 |
corvus | mordred: does the datestamp come at the start? | 21:33 |
ianw | so on a test yesterday of the etherpad db, basically back-to-back updates were a diff of ~250mb | 21:33 |
corvus | oh, it looks like it's at the end of the file! which is the most likely to be different anyway, so i think we're good to keep dump-date | 21:33 |
ianw | (this is better than the gzip'd on-disk, which are 5gb with nothing to de-dup) | 21:33 |
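One plausible reason extended inserts dedup so poorly, shown with a toy line-level comparison rather than borg's real content-defined chunker: mysqldump packs rows into multi-row statements up to a size limit (its `net_buffer_length`), so adding one row early in a table tends to shift where every later statement starts and ends, and all of that SQL text changes. With one row per statement only the new row's line is new. The table, row format and 1024-byte budget below are invented for illustration.

```python
PREFIX = "INSERT INTO `t` VALUES "


def render(rows, extended, budget=1024):
    """Render rows as INSERT statements, one per row or greedily packed."""
    if not extended:
        return [f"{PREFIX}{row};" for row in rows]
    statements, batch, size = [], [], len(PREFIX)
    for row in rows:
        # Close the current statement once this row would push it past budget.
        if batch and size + len(row) + 1 > budget:
            statements.append(PREFIX + ",".join(batch) + ";")
            batch, size = [], len(PREFIX)
        batch.append(row)
        size += len(row) + 1
    if batch:
        statements.append(PREFIX + ",".join(batch) + ";")
    return statements


def new_lines(old, new):
    """Count statements in `new` that appear nowhere in `old`."""
    seen = set(old)
    return sum(1 for line in new if line not in seen)


old_rows = [f"({i:06d},'some text')" for i in range(1, 5001)]
new_rows = ["(000000,'some text')"] + old_rows  # one row added at the front

for extended in (False, True):
    before, after = render(old_rows, extended), render(new_rows, extended)
    print(f"extended={extended}: {new_lines(before, after)} of {len(after)} statements are new")
```

Running it prints `extended=False: 1 of 5001 statements are new` and `extended=True: 107 of 107 statements are new`. Real rows vary in length, so packed statement boundaries can occasionally line up again, but the effect points the same way.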
ianw | i can try with a few of these options and see if we get better | 21:34 |
fungi | for the etherpad example, the skew between db and fs is likely unimportant (except maybe around upgrades with a schema transition or something) as all the data is in the db. for a service like mediawiki which has a db and associated user-submitted files (images, et cetera) outside the db it could be somewhat more relevant | 21:34 |
corvus | fungi: and gerrit | 21:34 |
ianw | but even so, keeping 7*250mb + 12*250 isn't too much of an overhead | 21:34 |
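Taking the counts in that estimate at face value (7 and 12 retained archives; the exact retention schedule is not spelled out here), the arithmetic, compared with each of those archives instead carrying a fresh ~5 GB gzip'd dump that cannot be deduplicated:

$$
\begin{aligned}
7 \times 250\,\text{MB} + 12 \times 250\,\text{MB} &= 19 \times 250\,\text{MB} = 4750\,\text{MB} \approx 4.75\,\text{GB} \\
19 \times 5\,\text{GB} &= 95\,\text{GB}
\end{aligned}
$$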
clarkb | corvus: fungi ya but we've always had that gap I think | 21:35 |
clarkb | I guess removing as much gap as possible is a nice improvement though | 21:35 |
fungi | yeah, gerrit is a better example than mediawiki, but we have very little of importance in mysql there now | 21:35 |
clarkb | in fact we could stop backing up the mysql db and we in theory only lose what files people have previously reviewed | 21:35 |
corvus | probably don't even need to back it up? | 21:35 |
clarkb | my last statement was for gerrit specifically | 21:35 |
corvus | yep | 21:35 |
mordred | random thought - for the services where we're running container + db - should we add a table to the db to record the container sha we're running with? that way a db dump would also carry which version of the container it was created from? | 21:36 |
clarkb | or add a docker ps -a and docker image list to the stream? | 21:37 |
clarkb | (I assume we can sneak that in as a comment) | 21:38 |
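A sketch of clarkb's variant, assuming the backup stream is plain SQL so that lines starting with `-- ` are ignored on restore: capture `docker ps -a` and `docker image list`, prefix every output line, and emit that header ahead of the dump itself. Both function names are hypothetical.

```python
import subprocess


def as_sql_comment(title, text):
    """Turn arbitrary command output into SQL '-- ' comment lines."""
    lines = [f"-- {title}"]
    lines += [f"-- {line}" for line in text.splitlines()]
    return "\n".join(lines) + "\n"


def container_header():
    """Collect container and image state to prepend to a database dump."""
    parts = []
    for cmd in (["docker", "ps", "-a"], ["docker", "image", "list"]):
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        parts.append(as_sql_comment(" ".join(cmd), out))
    return "".join(parts)
```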
ianw | mordred: so you think "--opt" ON but then "--skip-extended-insert"? | 21:38 |
clarkb | that's probably the sort of thing to figure out once this is running and we are happy with it though | 21:38 |
mordred | clarkb: ++ | 21:38 |
mordred | ianw: yah. although --opt defaults to on - so you don't really need it anymore | 21:38 |
ianw | oh --opt "This option, enabled by default" | 21:38 |
ianw | yeah, jinx | 21:39 |
mordred | it doesn't hurt - you can leave it in | 21:39 |
ianw | i'm running a baseline with --skip-extended-insert now, and then will run basically a zero delta and see how it goes. 208.56 MB is the number to beat :) | 21:40 |
ianw | oh and for reference the old dump was 15.92 GB | 21:41 |
fungi | i feel like i should be deciding whether to place my chips on red or black | 21:41 |
corvus | come on! big data! big data! | 21:43 |
*** owalsh has joined #opendev | 21:44 | |
ianw | This archive: 17.33 GB 4.24 GB 4.24 GB | 22:00 |
ianw | interesting, slightly bigger but compresses smaller | 22:00 |
*** DSpider has quit IRC | 22:14 | |
*** owalsh has quit IRC | 22:16 | |
ianw | This archive: 17.33 GB 4.24 GB 22.33 MB | 22:17 |
ianw | ok, mordred wins, 22mb v 250mb for a zero-ish delta | 22:17 |
ianw | and it compresses smaller too | 22:17 |
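The pasted lines look like the "This archive:" row of `borg create --stats`, whose columns are original size, compressed size and deduplicated size. A small parser makes the comparison explicit, assuming borg prints decimal units (1 GB = 10^9 bytes); the function name is made up for this example.

```python
import re

# Decimal multipliers (an assumption about how borg formats sizes).
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_archive_stats(line):
    """Parse 'This archive: <original> <compressed> <deduplicated>' into bytes."""
    sizes = re.findall(r"([\d.]+) (B|kB|MB|GB|TB)", line)
    if len(sizes) != 3:
        raise ValueError(f"unexpected stats line: {line!r}")
    keys = ("original", "compressed", "deduplicated")
    return {key: float(num) * UNITS[unit] for key, (num, unit) in zip(keys, sizes)}


delta = parse_archive_stats("This archive: 17.33 GB 4.24 GB 22.33 MB")
# Compare against the 208.56 MB zero-ish delta seen with extended inserts.
print(round(208.56e6 / delta["deduplicated"], 1))
```

The ratio comes out at about 9.3, which is the "order of magnitude" improvement noted here.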
mordred | \o/ | 22:17 |
fungi | an order of magnitude? wow! | 22:18 |
*** owalsh has joined #opendev | 22:25 | |
JayF | I always learn the most interesting tricks watching you all work in here :) | 22:26 |
*** d34dh0r53 has quit IRC | 22:37 | |
*** d34dh0r53 has joined #opendev | 22:48 | |
*** brinzhang has quit IRC | 23:00 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Set stop_grace_period on nodepool-builder containers https://review.opendev.org/c/opendev/system-config/+/771899 | 23:14 |
clarkb | corvus: ianw ^ maybe we just try that | 23:14 |
corvus | clarkb: think that's long enough? | 23:16 |
clarkb | corvus: it probably depends on how full those dirs are | 23:17 |
clarkb | corvus: cleaning up multiples of them yesterday definitely took longer than 90s, but when I did a single one it was reasonably quick | 23:17 |
clarkb | ianw: ^ may have a better sense for timing on that though | 23:17 |
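For context, `stop_grace_period` in a compose file sets how long `docker stop` waits after sending SIGTERM before falling back to SIGKILL (Docker's default is 10 seconds). The same pattern, sketched here with Python's `subprocess` module against a local process rather than a container, shows why a builder still removing its temporary directories is cut off when the window is too short. The function name is hypothetical.

```python
import subprocess


def stop_with_grace(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL.

    Returns True if the process exited on its own within the grace period.
    """
    proc.terminate()  # SIGTERM on POSIX: the process may start cleaning up
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: any cleanup still in progress is abandoned
        proc.wait()
        return False
```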
JayF | eiddccidrhjiviiikiukfijbhebltdeunkckugcfveru | 23:19 |
JayF | whoops, sorry | 23:19 |
ianw | clarkb: yeah, if it's not done by 90s it probably isn't going to get done :) ... | 23:20 |
*** brinzhang has joined #opendev | 23:34 | |
mordred | JayF: I couldn't agree more | 23:42 |