fungi | i'm still around but seems like the deployment backlog isn't close to reaching 720302 yet, still 7 changes ahead of it | 00:40 |
---|---|---|
fungi | so i have doubts i'll be awake by then | 00:40 |
fungi | a lot of deploy jobs are failing or hitting timeouts too | 00:41 |
corvus | fungi, clarkb: we've disabled ansible, so everything should timeout. we'll let it continue doing that overnight, then run stuff manually tomorrow. | 00:42 |
fungi | ahh, okay. so i should probably just apply 733673 by hand at this point | 00:43 |
fungi | though i've technically already missed the 15th for deleting that ml now, i doubt anyone is put out by it | 00:45 |
fungi | #status log deleted user-committee ml from openstack mailman site on lists.o.o | 00:48 |
openstackstatus | fungi: finished logging | 00:48 |
auristor | ianw: how long did it take to release mirror.fedora after the rsync without -t ? | 00:52 |
fungi | #status log ze04 rebooted to clear inconsistent afs rw volume access following saturday's outage | 00:53 |
openstackstatus | fungi: finished logging | 00:53 |
ianw | auristor: i ran a zero-delta update and it took about 10 seconds :) | 00:54 |
auristor | I think we've found your smoking gun. | 00:55 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and standarise flags https://review.opendev.org/735753 | 00:55 |
ianw | auristor/fungi: ^ and that's what i've come up with | 00:55 |
auristor | rsync -t isn't helping anything because it always finds a time difference | 00:56 |
ianw | right, it always tries to set tv_nsec now; possibly previously (i mean, this has been a long time) upstream mirror was ext3 or nfs or something that wasn't reporting ns precision | 00:57 |
auristor | do you care about typos in commit messages? | 00:57 |
ianw | auristor: yep; i'll fix any pointed out :) | 00:57 |
ianw | like standarise :) | 00:57 |
auristor | ivestigating :) | 00:58 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent https://review.opendev.org/735753 | 00:58 |
fungi | looking at the rsync manpage, it has a --modify-window option to make the timestamp comparisons flexibly fuzzy, though i have no idea if that would have helped | 00:59 |
auristor | a window of 1s would have helped | 00:59 |
ianw | aiui its default window does ignore ns ... but it's the fact it tries to *update* the files on the sync part (rather than the "what should i sync" part) | 01:00 |
ianw | again, aiui, it uses the modification time as a shortcut ... if modification time is == then file hasn't changed | 01:01 |
ianw | but without "--times" it falls back to its old logic of ctime and file size | 01:01 |
auristor | ianw: sorry but there are two more typos | 01:01 |
auristor | I think you need a new keyboard, more coffee or more beer | 01:01 |
auristor | true, it's the setting of time not the comparison that is the problem | 01:03 |
*** Meiyan has joined #opendev | 01:03 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent https://review.opendev.org/735753 | 01:03 |
ianw | heh, just need to slow down, and pay attention to flyspell | 01:04 |
auristor | a problem for openafs that is. either an auristorfs client or an auristorfs fileserver would handle it | 01:04 |
fungi | and yeah, that's why i figured altering the comparison window wouldn't do any good | 01:04 |
fungi | ianw: not sure if you saw, i left comments/questions on patch #2 | 01:05 |
auristor | now that syncing will transfer the correct amount of data perhaps replicas can be added back to afs01.ord | 01:05 |
ianw | fungi: yeah, that -p was missed thanks | 01:06 |
ianw | fungi: yeah, we had the "-i" to debug what rsync thought it was touching | 01:07 |
fungi | ahh, okay | 01:07 |
ianw | however, it *was* touching the timestamp, and not reporting that | 01:07 |
fungi | so we'll likely roll those back to just -v? | 01:07 |
ianw | (which i am poking at the rsync source about) | 01:08 |
ianw | we can; let me update that too | 01:08 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent https://review.opendev.org/735753 | 01:10 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Fix namespace speicification in collect-kubernetes-logs role https://review.opendev.org/735755 | 01:17 |
ianw | https://git.samba.org/?p=rsync.git;a=patch;h=0f8e9e2d8638e47d646a6baba694b303ac84e695;hp=c4a3f55be35726d0a033996dc37b0fb248b45cb5 | 01:41 |
ianw | fungi/auristor: ^ and here's the change that fixes it ... | 01:42 |
ianw | which made it into 3.1.3 ... and bionic has ... you guessed it ... 3.1.2 | 01:43 |
ianw | also, it seems "If you repeat the option, unchanged files will also be output," is mentioned for "-i" ... so if we had "-ii" we would have actually seen the itemized output saying "t", so it's just a bug with 3.1.2 we didn't see that which would have tipped me off without having to strace and blah blah blah | 01:46 |
ianw | actually, no, that's not true. it doesn't show up with "-ii" either | 01:47 |
ianw | http://paste.openstack.org/show/794789/ | 01:48 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent https://review.opendev.org/735753 | 01:51 |
*** ysandeep is now known as ysandeep|away | 02:03 | |
auristor | ianw: I think that rsync fix is wrong | 02:25 |
auristor | The clock resolution of afs3 is 1s just as the clock resolution of FAT is 2s. rsync should be querying the clock resolution of the source and destination filesystems and use them to decide what a matching timestamp is and not set the modify time if there was a time match. | 02:29 |
ianw | but how do you really know what filesystem you're writing to? | 02:33 |
auristor | if the version of linux you are using doesn't support fsinfo then you do what df does and use a table of file system names as reported by stat | 02:35 |
ianw | mirror runs are failing on mirror01.ca-ymq-1.vexxhost.opendev.org "src file does not exist, use "force=yes" if you really want to create the link: /afs/openstack.org/mirror/logs" ... i'm guessing this is the same "directory gone" issue | 02:42 |
ianw | yeah, it's the same odd mix of missing stuff fungi reported; i'll take the same approach and reboot it | 02:44 |
auristor | https://bugzilla.redhat.com/show_bug.cgi?id=1672779 | 02:44 |
openstack | bugzilla.redhat.com bug 1672779 in rsync "Rsync bug resets modification time of every destination file that has not changed" [Medium,Closed: errata] - Assigned to mruprich | 02:44 |
*** Meiyan has quit IRC | 02:45 | |
*** Meiyan has joined #opendev | 02:46 | |
ianw | auristor: yeah, i saw that, that's asking for a backport of that change | 02:46 |
auristor | the fix is from 2001 | 02:47 |
ianw | i have also added the credentials i missed for https://review.opendev.org/#/c/728739/ to hopefully fix the bridge run | 02:48 |
auristor | the overlayfs problem is that it's reporting nanosecond timestamp support even though the underlying filesystem might be 1s or 2s granularity. however, the fix there should be in overlayfs to ignore the setting of the timestamp when the underlying filesystem doesn't support the required resolution. | 02:50 |
ianw | #status log rebooted mirror01.ca-ymq-1.vexxhost.opendev.org for afs connection issues | 02:53 |
openstackstatus | ianw: finished logging | 02:53 |
ianw | static, backup, etc seem to have timed out .. | 02:56 |
ianw | 2020-06-15 23:09:27,070 DEBUG zuul.AnsibleJob.output: [e: 13020d2bc71446c89ab132b3449dccdf] [build: da25d932c7914344af04e1b55a1a5488] Ansible output: b"TASK [Make sure a manaul maint isn't going on path=/home/zuul/DISABLE-ANSIBLE, state=absent, timeout=3600, sleep=10] ***" | 03:02 |
ianw | 2020-06-15 23:39:02,272 WARNING zuul.AnsibleJob: [e: 13020d2bc71446c89ab132b3449dccdf] [build: da25d932c7914344af04e1b55a1a5488] Ansible timeout exceeded: 1781.1864149570465 | 03:02 |
ianw | ok, so corvus' "Make disable-ansible fancier" seems to address the issue of this file being in place and people who are out of sync, like me :) not knowing what's going on | 03:07 |
ianw | Jun 16 10:42:41 <corvus> fungi, clarkb: we've disabled ansible, so everything should timeout. we'll let it continue doing that overnight, then run stuff manually tomorrow | 03:12 |
*** ysandeep|away is now known as ysandeep | 04:17 | |
*** ykarel|away is now known as ykarel | 04:28 | |
openstackgerrit | Merged zuul/zuul-jobs master: Fix namespace speicification in collect-kubernetes-logs role https://review.opendev.org/735755 | 05:00 |
*** DSpider has joined #opendev | 05:21 | |
*** hashar has joined #opendev | 06:02 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: openafs-client: Use PPA for Xenial ARM64 https://review.opendev.org/735055 | 06:10 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Acutally run system-config arm64 test on an arm64 node https://review.opendev.org/735281 | 06:10 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: mirror-update: mirror Fedora 32 https://review.opendev.org/735773 | 06:16 |
*** hashar has quit IRC | 06:21 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Remove the -plain job variants https://review.opendev.org/735774 | 06:23 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Turn -plain nodes down to min-ready 0 https://review.opendev.org/735777 | 06:30 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove plain images https://review.opendev.org/735778 | 06:30 |
openstackgerrit | Merged openstack/project-config master: Retire Tricircle projects: finish infra todo https://review.opendev.org/728902 | 06:42 |
*** hashar has joined #opendev | 06:48 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/735782 | 06:48 |
*** rpittau|afk is now known as rpittau | 06:57 | |
*** sgw1 has quit IRC | 07:01 | |
openstackgerrit | Merged zuul/zuul-jobs master: Return upload_results in test-upload-logs-swift role https://review.opendev.org/735503 | 07:11 |
*** tosky has joined #opendev | 07:31 | |
*** Meiyan has quit IRC | 07:40 | |
*** Meiyan has joined #opendev | 07:41 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** ykarel is now known as ykarel|lunch | 08:08 | |
*** lpetrut has joined #opendev | 08:16 | |
*** hashar_ has joined #opendev | 08:26 | |
*** hashar has quit IRC | 08:27 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas project https://review.opendev.org/735807 | 08:38 |
*** hashar_ is now known as hashar | 08:43 | |
*** priteau has joined #opendev | 08:45 | |
*** ysandeep is now known as ysandeep|lunch | 08:46 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Rename neutron-fwaas and neutron-fwaas-dashboard to x/ namespace https://review.opendev.org/735812 | 08:53 |
*** ykarel|lunch is now known as ykarel | 08:57 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects https://review.opendev.org/735807 | 09:08 |
AJaeger | infra-root, please review https://review.opendev.org/735832 for infra-manual to better document project removal. | 09:29 |
jrosser | AJaeger: can you give me any advice on the openstack-tox-docs failure here https://review.opendev.org/#/c/735805/ | 09:35 |
jrosser | i must be missing something but can't see an obvious problem | 09:35 |
AJaeger | jrosser: I hope it's fixed by 735801 which just merged | 09:36 |
AJaeger | jrosser: let me check the logs to see whether that is the same problem | 09:36 |
AJaeger | jrosser: yes, that's the problem that 735801 should fix. | 09:37 |
jrosser | AJaeger: excellent thanks | 09:37 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Retire neutron-fwaas and neutron-fwaas-dashboard projects https://review.opendev.org/735812 | 09:47 |
openstackgerrit | Merged zuul/zuul-jobs master: Terraform roles and jobs. https://review.opendev.org/733675 | 09:48 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects https://review.opendev.org/735807 | 09:51 |
*** diablo_rojo has quit IRC | 09:54 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects https://review.opendev.org/735807 | 09:55 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Readd publish-to-pypi for neutron-fwaas and dashboard https://review.opendev.org/735850 | 10:00 |
*** ysandeep|lunch is now known as ysandeep | 10:05 | |
*** hashar_ has joined #opendev | 10:10 | |
*** hashar has quit IRC | 10:11 | |
*** Meiyan has quit IRC | 10:11 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects https://review.opendev.org/735807 | 10:17 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Readd publish-to-pypi for neutron-fwaas and dashboard https://review.opendev.org/735850 | 10:17 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Readd publish-to-pypi for neutron-fwaas and dashboard https://review.opendev.org/735850 | 10:19 |
*** hashar__ has joined #opendev | 10:20 | |
*** hashar__ is now known as hashar | 10:24 | |
*** hashar_ has quit IRC | 10:24 | |
*** lpetrut has quit IRC | 10:33 | |
*** rpittau is now known as rpittau|bbl | 10:34 | |
*** calcmandan has quit IRC | 10:36 | |
*** calcmandan has joined #opendev | 10:39 | |
*** sshnaidm is now known as sshnaidm|afk | 10:45 | |
*** hashar has quit IRC | 10:50 | |
*** lpetrut has joined #opendev | 11:14 | |
*** sshnaidm|afk is now known as sshnaidm | 11:40 | |
openstackgerrit | Merged openstack/diskimage-builder master: Add .eggs to gitignore https://review.opendev.org/734469 | 11:43 |
*** priteau has quit IRC | 11:47 | |
openstackgerrit | Emilien Macchi proposed openstack/project-config master: paunch: don't run publish-to-pypi template https://review.opendev.org/735889 | 12:13 |
openstackgerrit | Emilien Macchi proposed openstack/project-config master: paunch: don't run publish-to-pypi template https://review.opendev.org/735889 | 12:26 |
openstackgerrit | Emilien Macchi proposed openstack/project-config master: Revert "Deprecate Paunch" https://review.opendev.org/735893 | 12:26 |
*** rpittau|bbl is now known as rpittau | 12:28 | |
*** tkajinam has quit IRC | 12:31 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: paunch: don't run publish-to-pypi template https://review.opendev.org/735889 | 12:43 |
*** rchurch has quit IRC | 12:49 | |
*** rchurch has joined #opendev | 12:51 | |
*** ysandeep is now known as ysandeep|afk | 12:51 | |
mordred | AJaeger: the "build-python-release" job seems sad with the new image | 12:56 |
*** sgw1 has joined #opendev | 12:56 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add ensure-pip to build-python-release https://review.opendev.org/735904 | 12:57 |
AJaeger | mordred: ;/ | 12:58 |
AJaeger | mordred: we had a few interesting cases that needed fixing ;/ | 12:58 |
AJaeger | mordred: LGTM, thanks | 12:59 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add ensure-pip to build-sphinx-docs https://review.opendev.org/735908 | 13:10 |
*** mtreinish has quit IRC | 13:13 | |
*** mtreinish has joined #opendev | 13:14 | |
openstackgerrit | Merged opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent https://review.opendev.org/735753 | 13:15 |
openstackgerrit | Merged openstack/project-config master: paunch: don't run publish-to-pypi template https://review.opendev.org/735889 | 13:15 |
openstackgerrit | Emilien Macchi proposed openstack/project-config master: Revert "Deprecate Paunch" https://review.opendev.org/735893 | 13:17 |
openstackgerrit | Emilien Macchi proposed openstack/project-config master: Revert "Deprecate Paunch" https://review.opendev.org/735893 | 13:18 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add ensure-pip and ensure-virtualenv to build-sphinx-docs https://review.opendev.org/735908 | 13:18 |
openstackgerrit | Merged zuul/zuul-jobs master: Add ensure-pip to build-python-release https://review.opendev.org/735904 | 13:20 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add ensure-pip and ensure-virtualenv to build-sphinx-docs https://review.opendev.org/735908 | 13:21 |
*** ysandeep|afk is now known as ysandeep | 13:27 | |
*** mtreinish has quit IRC | 13:36 | |
*** mtreinish has joined #opendev | 13:38 | |
*** mtreinish has quit IRC | 13:43 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: DNM: test ensure-sphinx role https://review.opendev.org/735919 | 13:49 |
*** mtreinish has joined #opendev | 13:49 | |
*** tkajinam has joined #opendev | 13:57 | |
*** mlavalle has joined #opendev | 13:58 | |
*** roman_g has joined #opendev | 14:06 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead https://review.opendev.org/735923 | 14:12 |
*** hashar has joined #opendev | 14:12 | |
*** jbryce_ has joined #opendev | 14:13 | |
*** mnaser has quit IRC | 14:14 | |
*** zbr_ has joined #opendev | 14:15 | |
*** auristor has quit IRC | 14:15 | |
*** jbryce has quit IRC | 14:16 | |
*** zbr has quit IRC | 14:16 | |
*** jbryce_ is now known as jbryce | 14:16 | |
*** zbr_ is now known as zbr | 14:16 | |
*** mnaser has joined #opendev | 14:17 | |
*** hrw has quit IRC | 14:18 | |
*** hrw has joined #opendev | 14:18 | |
*** auristor has joined #opendev | 14:23 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Record artifact checksums and signatures to stdout https://review.opendev.org/735929 | 14:40 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Simplify twine invocation for PyPI uploads https://review.opendev.org/735932 | 14:50 |
clarkb | gitea is down to a single 1.12.0 milestone bug | 15:02 |
clarkb | and it is the bug that discussion says is probably not a bug | 15:02 |
AJaeger | clarkb: that's what I would try as well to get a release out of the door ;) | 15:04 |
clarkb | ha | 15:05 |
clarkb | in this case I've read the comments and I think they are right? it's about a gitea push hook not running on new project create. The reason for that is the project creation hook runs and a push hook is a different event | 15:05 |
AJaeger | yeah, looks different | 15:06 |
openstackgerrit | Merged zuul/zuul-jobs master: Add ensure-pip and ensure-virtualenv to build-sphinx-docs https://review.opendev.org/735908 | 15:09 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead https://review.opendev.org/735923 | 15:09 |
clarkb | https://github.com/go-gitea/gitea/issues/11534 is the bug fwiw | 15:10 |
mordred | clarkb: I agree, that does not sound like a bug | 15:18 |
*** ykarel is now known as ykarel|away | 15:23 | |
AJaeger | infra-root, please review https://review.opendev.org/735832 for infra-manual to better document project removal. | 15:23 |
clarkb | infra-root re Zuul restart is the plan there to keep zuul ansible disabled, stop zuul, run playbook for zuul and zk manually then start zuul? | 15:40 |
clarkb | I'm making tea and breakfast but will be around to help in just a few minutes | 15:41 |
fungi | that's what last night's plan sounded like, at least | 15:41 |
clarkb | AJaeger: is openstack-tox-docs expected to run on eg cinder master now? | 15:43 |
mordred | clarkb: yes, I believe so (re restart plan) | 15:43 |
AJaeger | clarkb: since ages | 15:43 |
clarkb | AJaeger: ok was following up on https://zuul.opendev.org/t/openstack/build/28e2129d52564def8c473b2545f32b1f which failed ~7 hours ago. I'll let them know to recheck | 15:44 |
AJaeger | clarkb: the switch to those jobs was done during stein cycle across all projects | 15:44 |
AJaeger | clarkb: ah, yes, that is fixed | 15:44 |
AJaeger | clarkb: I thought you were talking about 735937 | 15:44 |
clarkb | AJaeger: sorry I mean is it expected to be successful after the pip and virtualenv fallout | 15:44 |
clarkb | sounds like yes to both things :) | 15:44 |
AJaeger | clarkb: got it ;) | 15:44 |
AJaeger | clarkb: and yes to both | 15:44 |
AJaeger | clarkb: I was talking about 735923 and 735937, reviews welcome ;) | 15:45 |
*** lpetrut has quit IRC | 15:49 | |
*** ysandeep is now known as ysandeep|brb | 15:51 | |
*** rpittau is now known as rpittau|afk | 16:12 | |
*** ysandeep|brb is now known as ysandeep | 16:23 | |
corvus | clarkb, fungi, mordred: i'm around and ready to do the restart | 16:24 |
fungi | i too am around and basically free (just polishing off the last of my lunch) | 16:24 |
clarkb | ya I'm sipping tea and reading too much about python packaging :) happy for the distraction | 16:25 |
*** diablo_rojo has joined #opendev | 16:27 | |
* mordred is also here | 16:27 | |
clarkb | I notified the openstack release team earlier today and they said they would hold off on releases too | 16:28 |
corvus | cool | 16:28 |
corvus | just checking on the load -- we're busy, but not too backlogged | 16:28 |
corvus | there is a release-post job | 16:28 |
fungi | and i just got the last of the release failure fallout from the afs outage cleaned up | 16:29 |
fungi | (had to retrieve some files from pypi and manually recreate pgp sigs) | 16:29 |
corvus | and it's done | 16:29 |
*** ysandeep is now known as ysandeep|away | 16:30 | |
corvus | mordred: is there a playbook to update the git repos on bridge? | 16:30 |
corvus | maybe we just manually update p-c and s-c and that's good enough? | 16:31 |
mordred | corvus: yes, I believe that | 16:32 |
fungi | does bridge even care about p-c? do we copy that onto the servers from bridge or fetch it on them? | 16:32 |
mordred | (there is not a playbook, but just manually updating should be fine) | 16:32 |
corvus | fungi: no idea, better safe than sorry :) | 16:33 |
mordred | we clone project-config ourselves | 16:33 |
mordred | in playbooks/roles/sync-project-config/tasks/main.yaml | 16:33 |
corvus | i ran: "git pull https://opendev.org/opendev/system-config" | 16:33 |
fungi | corvus: wfm | 16:33 |
corvus | head looks right | 16:33 |
mordred | into /opt/project-config | 16:33 |
mordred | so - system-config is all we need - but you can also pull project-config in /home/zuul to be safe :) | 16:34 |
corvus | i did that too :) | 16:34 |
fungi | oh, right, at one point we were pushing zuul refs for p-c onto bridge and it was in turn pushing those to servers, i guess | 16:34 |
mordred | yeah - and - we can actually go back to that now ... | 16:35 |
mordred | we stopped because we didn't have the serial pipeline manager | 16:35 |
mordred | and things would get enqueued out of order | 16:35 |
mordred | but - this also seems fine | 16:35 |
fungi | mordred: well, i think we still have that issue because deploy and periodic pipelines can race | 16:36 |
corvus | status notice Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes. Patches uploaded or approved during that time will need to be rechecked. | 16:36 |
corvus | how's that look? | 16:37 |
mordred | fungi: oh right | 16:37 |
mordred | corvus: ++ | 16:37 |
fungi | corvus: lgtm | 16:37 |
corvus | #status notice Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes. Patches uploaded or approved during that time will need to be rechecked. | 16:37 |
openstackstatus | corvus: sending notice | 16:37 |
-openstackstatus- NOTICE: Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes. Patches uploaded or approved during that time will need to be rechecked. | 16:37 | |
corvus | i have saved queues; will stop zuul now | 16:37 |
mordred | corvus: are you screening? | 16:38 |
corvus | no, maybe i should going forward | 16:38 |
corvus | i've started a root screen on bridge | 16:39 |
corvus | the zuul stop is still running in an unscreened window, i'll let you know when it's done | 16:39 |
corvus | but next we should stop nodepool | 16:40 |
*** sgw1 has quit IRC | 16:40 | |
openstackstatus | corvus: finished sending notice | 16:40 |
corvus | anyone understand that failure? | 16:42 |
mordred | corvus: uhm | 16:42 |
corvus | oh, is it because of containers? | 16:43 |
mordred | corvus: oh - I think it's trying to stop nodepool launcher | 16:43 |
mordred | but we don't have that anymore? | 16:43 |
corvus | we don't have nodepool-launchers? | 16:43 |
mordred | we don't have service: nodepool-launcher | 16:43 |
corvus | ok, yeah, so the container stuff | 16:43 |
fungi | we talked about a service wrapper around docker-compose, but that doesn't exist (yet) | 16:43 |
corvus | i think we need to find a better way to keep these ad-hoc playbooks current :/ | 16:43 |
mordred | I think we updated the zuul one and not the nodepool one | 16:44 |
corvus | are all of our launchers in containers? are all of our builders in containers? | 16:44 |
mordred | looking | 16:45 |
mordred | launchers are all in containers | 16:45 |
clarkb | nb03 is not in container | 16:45 |
mordred | builders are a mix - nb03 is not container nodepool-builder_opendev is container | 16:45 |
clarkb | nb01 and nb02 are in containers | 16:45 |
corvus | is that "include_role" "tasks_from: stop" thing going to work for the nodepool containers? | 16:46 |
mordred | (nodepool-builder_opendev is a group containing the list of container builder hosts) | 16:46 |
mordred | corvus: no. but we should make that work | 16:46 |
mordred | corvus: I will make a patch to update the nodepool roles to support that pattern | 16:46 |
corvus | okay, so what i'm getting is the best way to make sure nodepool is stopped is to log into all the machines and make it be stopped? | 16:46 |
clarkb | we can probably do a simple ansible command? | 16:47 |
mordred | corvus: we can make a quick local playbook - or we can log in to them all | 16:47 |
mordred | we want to do docker-compose down on nodepool-builder_opendev and nodepool-launcher groups - and service stop on nb03 | 16:48 |
corvus | mordred: okay, want to make that playbook then? | 16:50 |
corvus | mordred: you can drive the screen session | 16:50 |
mordred | http://paste.openstack.org/show/794825/ | 16:51 |
mordred | corvus: k. driving | 16:51 |
mordred | corvus: you want me to put the nodepool stop into the zuul_stop playbook for now then? | 16:51 |
corvus | nope | 16:51 |
corvus | how about that one :) | 16:51 |
corvus | if it's just what's in paste, we can just cat> | 16:52 |
mordred | how's that | 16:54 |
corvus | ok let's try it | 16:54 |
corvus | the nb03 not existing is concerning? | 16:54 |
mordred | yeah ... | 16:54 |
corvus | .openstack? | 16:55 |
corvus | yeah, that's right | 16:55 |
mordred | maybe it's disabled | 16:55 |
corvus | ok, i guess i'll log in manually | 16:55 |
corvus | it's been disabled a long time | 16:56 |
mordred | yes - it is disabled | 16:56 |
mordred | - nb03.openstack.org # ianw 2020-05-20 hand edits applied to dib to build focal on xenial | 16:56 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Run restart playbooks to test they work https://review.opendev.org/735969 | 16:56 |
corvus | and it's about to not be able to come back up | 16:56 |
corvus | because its config is going to be wrong | 16:56 |
corvus | i guess we can talk about it at the meeting | 16:56 |
mordred | you know ... | 16:56 |
mordred | I think those edits have all landed to dib | 16:56 |
corvus | so i'll just stop it for now, and not bring it back up | 16:56 |
mordred | kk | 16:56 |
clarkb | corvus: I think that is fine | 16:56 |
clarkb | the existing images won't go away | 16:56 |
corvus | ok it is stopped | 16:57 |
clarkb | also do these changes force tls for everything? or can it still talk via not tls? | 16:57 |
corvus | i think that means all of nodepool is stopped; zuul is almost finished stopping | 16:57 |
corvus | clarkb: force tls | 16:57 |
clarkb | fwiw I expect that https://review.opendev.org/735969 will fail due to the problems we discovered. We can squash fixes into that or I'll rebase on the fixes. But i think that will help us keep those playbooks running | 16:58 |
corvus | now i think we can run the zk playbook | 16:58 |
mordred | corvus: ++ | 16:58 |
mordred | clarkb: I think that patch is a great idea - I'll update it to fix the start/stop stuff once this is done | 16:59 |
fungi | the only thing which still has not-tls left over as far as i could find are the firewall rules (fixed when 735740 merges) and the nodepool conffiles in project-config (which ansible overwrites) | 16:59 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run restart playbooks to test they work https://review.opendev.org/735969 | 17:03 |
mordred | clarkb: ^^ updated | 17:03 |
clarkb | corvus: that looked happy | 17:03 |
corvus | agreed | 17:03 |
mordred | \o/ | 17:03 |
corvus | now we need to restart all the zk docker containers | 17:03 |
mordred | yup | 17:03 |
mordred | corvus: it's /etc/zookeeper-compose/ | 17:04 |
mordred | corvus: those look good | 17:05 |
corvus | kk, running the stop | 17:05 |
corvus | 2020-06-16 17:05:48,010 [myid:1] - INFO [/23.253.236.126:3888:UnifiedServerSocket$UnifiedSocket@273] - Accepted TLS connection from zk03.openstack.org/23.253.90.246:37594 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 | 17:06 |
clarkb | neat | 17:06 |
mordred | corvus: do we expect: 2020-06-16 17:05:48,238 [myid:3] - ERROR [QuorumPeer[myid=3](plain=disabled)(secure=[0:0:0:0:0:0:0:0]:2281):QuorumPeer@1619] - Error writing next dynamic config file to disk: | 17:06 |
clarkb | mordred: yes I think that is expected | 17:06 |
corvus | yep | 17:06 |
mordred | I thought so - but thought I'd check | 17:06 |
clarkb | we thought it may have been causing the problems we had when we switched to containers but turns out it was a bug in zk itself that was worked around by using IP addrs in config instead of dns names | 17:07 |
mordred | ++ | 17:07 |
corvus | all 3 look happy | 17:08 |
clarkb | I've just confirmed that port 2181 isn't listening on zk01 but 2281 is (confirms the assertion earlier that non tls is disabled) | 17:08 |
corvus | let's bring nodepool back up? | 17:08 |
clarkb | corvus: ++ | 17:08 |
fungi | yep, looks right | 17:08 |
mordred | corvus: I added nodepool start tasks in https://review.opendev.org/735969 | 17:08 |
corvus | actually | 17:09 |
corvus | oh we haven't run the nodepool playbook yet | 17:09 |
mordred | that's important | 17:09 |
corvus | let's do that now | 17:09 |
mnaser | (something i may get around contributing to opendev is a custom 503 page) | 17:09 |
corvus | look good? | 17:09 |
fungi | hrm, yeah nodepool configs still showed the old port | 17:09 |
mordred | corvus: while you're doing that - I can make a nodepool_start.yaml playbook for you | 17:09 |
corvus | mordred: k thx; running now | 17:10 |
clarkb | corvus: yes lgtm | 17:10 |
fungi | at least nl01 still has 2181 in nodepool.yaml | 17:10 |
corvus | fungi: yep, that should get fixed by the current playbook run | 17:10 |
fungi | that's what i figured | 17:10 |
mordred | corvus: playbooks/nodepool_start.yaml should be updated | 17:11 |
corvus | woot | 17:11 |
fungi | mnaser: also i've been thinking if statusbot wrote status messages to a published file on eavesdrop (rather than just to the wiki and twitter) we could probably easily transclude the most recent few into the main opendev.org page and even color-code or filter them by severity | 17:11 |
fungi | mnaser: and then custom 503 pages or similar could include a message linking there | 17:13 |
corvus | fungi: /var/lib/statusbot/www/alert.json | 17:13 |
fungi | aha, right, so half of that is already done ;) | 17:13 |
mnaser | is statusbot hosted on the same machine as zuul.opendev.org apache frontend? | 17:13 |
clarkb | mnaser: no | 17:14 |
mnaser | aw ok | 17:14 |
clarkb | status bot is on eavesdrop.openstack.org | 17:14 |
mnaser | could have been nice, if that file is created, we could have given 503 if it exists | 17:14 |
corvus | i manually killed ze04; i suspect it was dead on reboot due to the gearman cert issue | 17:16 |
corvus | all of zuul is stopped now | 17:16 |
fungi | mnaser: well, it could still be included with some js, if we served it from a vhost on eavesdrop.o.o | 17:16 |
fungi | /etc/nodepool/nodepool.yaml on nl01 looks correct now | 17:16 |
mnaser | fungi: yeah -- but i was thinking if file exists locally => serve 503 else serve usual 503 bc backend down -- so we don't end up showing a "in maintenance" page if the reality of things is "zuul-web is down" | 17:17 |
mnaser | but i'll leave that discussion for after the restart :) | 17:17 |
fungi | corvus: yeah, that sounds likely. i rebooted it late last night just so we wouldn't forget and accidentally bring it back up with broken afs access | 17:17 |
corvus | cool, i think we can restart nodepool now | 17:17 |
mordred | corvus: ++ | 17:17 |
clarkb | ++ | 17:18 |
corvus | 2020-06-16 17:18:07,468 [myid:2] - INFO [nioEventLoopGroup-4-3:X509AuthenticationProvider@172] - Authenticated Id 'CN=nl01.openstack.org,OU=Org,O=Company Name,L=Oakland,ST=California,C=US' for Scheme 'x509' | 17:18 |
mordred | \o./ | 17:18 |
mordred | that's so exciting | 17:18 |
clarkb | hrm didn't realize we still have nb04 up (shouldn't be a problem for this) | 17:18 |
corvus | nl01 appears active and dealing with requests | 17:18 |
fungi | mnaser: i figured we could do something simpler, like a bit of js on the opendev.org main page to include the last few status updates, and then our 503 page for zuul could suggest people look at opendev.org for possible maintenance in progress | 17:19 |
mnaser | fungi: ah yes, that works too -- or simple js to check if alert.json contains something | 17:19 |
mnaser | if we serve that off eavesdrop | 17:19 |
mordred | or our 503 page could have javascript and fetch from the status | 17:19 |
mordred | yeah | 17:19 |
corvus | hrm, a connection reset just happened | 17:19 |
mordred | uhoh | 17:19 |
corvus | it recovered, but it's curious | 17:20 |
mordred | we should keep our eyes on that | 17:20 |
fungi | zk connection reset? | 17:20 |
mordred | corvus: is tobiash already running tls zk at bmw? | 17:20 |
corvus | yeah, on nl01 | 17:20 |
corvus | i think? | 17:20 |
corvus | and another one | 17:20 |
tobiash | weird, we saw connection resets on staging as well | 17:21 |
tobiash | Not yet running it in prod | 17:21 |
fungi | but not in production? | 17:21 |
clarkb | nl03 and nl04? | 17:21 |
corvus | i think just nl01 so far? | 17:22 |
corvus | nope, others too | 17:22 |
tobiash | Just remembered the reason for this was too many builds in one path in zk in our case | 17:22 |
clarkb | 2020-06-16 17:19:25,229 WARNING kazoo.client: Connection dropped: socket connection error: The operation did not complete (read) (_ssl.c:2607) from nl03 | 17:23 |
tobiash | We had an image build failure loop there | 17:23 |
corvus | perhaps the tls adapters for either kazoo or zookeeper can't handle large data? | 17:24 |
clarkb | that seems to be happening pretty regularly in nl03's log | 17:25 |
fungi | that would be unfortunate for us | 17:25 |
corvus | clarkb: nl01 too | 17:25 |
clarkb | https://github.com/python-zk/kazoo/issues/587 | 17:25 |
clarkb | unfortunately not much additional info there, but looks like upstream is aware of the problem and would like help debugging it | 17:26 |
corvus | this seems like it may not be viable; and i don't see an obvious immediate emergency fix | 17:26 |
mordred | corvus: I agree | 17:26 |
corvus | i think we will need to revert those patches out of system-config and re-run the process so far | 17:27 |
clarkb | ya I think that should do it. We'll end up with potentially stale CA info but that will get fixed when we next try this | 17:27 |
fungi | too bad, but i concur | 17:27 |
mordred | corvus: yah - and the gearman cert update will still be in place, so we should be able to start | 17:27 |
clarkb | and reverting the change should reset the zk and nodepool configs to talk 2181 | 17:27 |
mordred | yah | 17:27 |
corvus | yep. i'll look into whether we can run the zk cluster in dual mode | 17:28 |
corvus | that may help us with testing, and maybe we can take over that gh issue | 17:28 |
clarkb | ooh ya that may help with reproduction | 17:28 |
clarkb | ++ | 17:28 |
mordred | ++ | 17:28 |
corvus | just revert 29825ac18b58145f007f64b2998357445b8fdd91 ? | 17:29 |
clarkb | yes I think so | 17:30 |
mordred | corvus: yes, I think that's right | 17:30 |
clarkb | that'll update the zk configs across the board to go back to 2181 without tls | 17:30 |
clarkb | tobiash: your docker images are different than ours right? I wonder if it could be a openssl or kazoo version thing | 17:32 |
clarkb | tobiash: iirc we are both running python3.6 though | 17:32 |
tobiash | yes, ours are bionic based | 17:33 |
clarkb | though you probably use the same images in your staging as in production? | 17:33 |
tobiash | yes | 17:33 |
mordred | clarkb: kazoo should be coming in via pip - so I imagine we'd have the same kazoo. could be different openssl | 17:35 |
mordred | tobiash: are you using the upstream zookeeper images? or building your own? | 17:35 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Set root dir to zuul build uuid https://review.opendev.org/735980 | 17:35 |
mordred | cause in the list of things we should check - we've got zookeeper server, zookeeper client/kazoo and OS things like openssl | 17:36 |
corvus | np seems happy, i'll run service-zuul while it idles and we confirm it's stable | 17:43 |
*** sgw has quit IRC | 17:47 | |
mordred | corvus: watching that ansible is not super exciting | 17:50 |
corvus | booooorrrriiinngggg | 17:51 |
corvus | shouldn't have turned cowsay off | 17:51 |
clarkb | Moo | 17:52 |
*** sgw has joined #opendev | 17:53 | |
mordred | corvus: looks done | 17:54 |
fungi | and nodepool configs are back to before | 17:55 |
clarkb | fungi: ya I think we're all the way up to zuul has been updated (but not started) | 17:55 |
corvus | i'll run the zuul start playbook now | 17:57 |
corvus | so far so good | 17:59 |
clarkb | looks like jobs have queued up | 18:01 |
clarkb | haven't seen any starting yet | 18:02 |
clarkb | and now it looks like a bunch have started | 18:03 |
corvus | status notice Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked. | 18:03 |
corvus | how's that look? | 18:03 |
clarkb | that looks good to me | 18:03 |
mordred | corvus: ++ | 18:04 |
mordred | corvus: and .... boo that we have a weird zk scale/connection/tls issue to investigate | 18:05 |
corvus | #status notice Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked. | 18:05 |
openstackstatus | corvus: sending notice | 18:05 |
-openstackstatus- NOTICE: Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked. | 18:06 | |
corvus | yep. on the plus side, the deployment (at least up through nodepool) went flawlessly because of all that gate testing :) | 18:06 |
mordred | corvus: \o/ | 18:06 |
* mordred now sandwiches | 18:06 | |
auristor | are the current releases of mirror.centos, mirror.epel, mirror.yum-puppetlabs, and mirror.opensuse from rsyncs including -t or excluding -t? I ask because they appear to be transferring the contents of the entire volume. | 18:07 |
openstackstatus | corvus: finished sending notice | 18:09 |
clarkb | auristor: while https://review.opendev.org/#/c/735753/ has merged I don't think we've applied it to the server yet as we were in a limbo state overnight (pacific time) | 18:09 |
clarkb | auristor: we're just about to get past that limbo state (we need to merge another revert I think) then that server should get updated and we can see if things improve | 18:10 |
*** mugsie has quit IRC | 18:11 | |
clarkb | corvus: ^ do we need to push and approve/merge that revert for the zk update? | 18:13 |
corvus | clarkb: yes, will do in just a sec | 18:14 |
corvus | i've reset the repo on bridge to current HEAD | 18:14 |
*** mugsie has joined #opendev | 18:15 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Revert "Add Zookeeper TLS support" https://review.opendev.org/735990 | 18:15 |
clarkb | +2 thanks | 18:16 |
corvus | clarkb, fungi, mordred: ^ we should merge that asap, then we can re-enable ansible | 18:16 |
fungi | approved | 18:22 |
openstackgerrit | Merged openstack/project-config master: Revert "Deprecate Paunch" https://review.opendev.org/735893 | 18:24 |
*** tosky_ has joined #opendev | 18:26 | |
*** tosky has quit IRC | 18:28 | |
*** iurygregory has quit IRC | 18:33 | |
*** tosky_ is now known as tosky | 18:38 | |
frickler | something looks very broken now https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_53b/735536/3/gate/tempest-full/53be1fa/job-output.txt | 19:15 |
*** hashar has quit IRC | 19:15 | |
mordred | frickler: I agree | 19:17 |
clarkb | did ansible update as part of our zuul changes? | 19:20 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Make sure pip is installed for python releases https://review.opendev.org/736001 | 19:21 |
fungi | parsing error around https://opendev.org/openstack/devstack-gate/src/branch/master/roles/test-matrix/tasks/main.yaml#L24 | 19:22 |
fungi | could it be the trailing +? | 19:22 |
clarkb | fungi: the error was: template error while templating string: no filter named 'match' | 19:22 |
fungi | hrm, no, last touched two years ago | 19:22 |
clarkb | I think it's mad about the match | 19:22 |
ianw | fungi: isn't it "no filter named 'match'" | 19:23 |
clarkb | newer ansible does a thing where you can't | some things | 19:23 |
clarkb | you have to is them or something | 19:23 |
fungi | ahh, yeah, around line 29 | 19:25 |
fungi | i guess we don't record the ansible version in our build inventory either | 19:26 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Make sure pip and wheel are installed for python releases https://review.opendev.org/736001 | 19:30 |
ianw | fungi: isn't that 2020-06-16 18:44:04.254108 | Ansible Version: 2.9.5 | 19:36 |
fungi | oh, cool so we do log it at least | 19:38 |
fungi | and, wow, i guess we're using very new ansible there as clarkb predicted | 19:38 |
mordred | 2.9.9 is latest ansible 2.9 | 19:39 |
mordred | so 2.9.5 is from feb 13 - I'd expect no behavior change | 19:40 |
fungi | yeah, but new as in 2.9 and not our platform default (still 2.7 right?) | 19:40 |
ianw | https://zuul.openstack.org/builds?job_name=tempest-full -- the ones that passed before were on 2.8.8 | 19:40 |
fungi | ahh, 2.8 | 19:40 |
fungi | so something seems to have switched that job to use 2.0 | 19:40 |
fungi | 2.9 | 19:40 |
ianw | between 2020-06-16 13:51:38 and 2020-06-16 19:16:23 | 19:41 |
clarkb | we updated zuul | 19:43 |
clarkb | which may have updated the default from 2.8 to 2.9? | 19:43 |
ianw | https://review.opendev.org/#/c/736006/ switches it to "is" which i think is the fix | 19:45 |
clarkb | ianw: ya that seems right from memory | 19:46 |
ianw | https://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#test-syntax | 19:46 |
ianw | i guess that makes it interesting that we might now have pip/venv and ansible issues all together | 19:47 |
frickler | there's more warnings about 2.9 breaking things like https://zuul.opendev.org/t/openstack/build/806eae234ba646409c0a73833797d080/log/job-output.txt#4865 | 19:52 |
clarkb | mordred: https://zuul.opendev.org/t/openstack/build/6728299a161b44d9ab25ed85b9c941bb/log/applytest/puppetapplytest30.final.out.FAILED any idea why we are getting those? that should be ansible talking to localhost right? | 19:52 |
ianw | frickler: hrm, nice catch ... i guess the pain of updating this is more useful than working around it | 19:54 |
ianw | looks like --become is drop in for --sudo, so that change switches that now too | 19:58 |
corvus | that --sudo thing isn't because of the zuul upgrade, right? that's the internal devstack-gate ansible command | 19:58 |
*** iurygregory has joined #opendev | 19:59 | |
ianw | true, i wonder if we pin the ansible we install there | 19:59 |
clarkb | ianw: we do | 20:00 |
corvus | are there any errors related to zuul's upgrade to ansible 2.9? | 20:01 |
ianw | ANSIBLE_VERSION=${ANSIBLE_VERSION:-2.7.14} | 20:01 |
clarkb | corvus: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_53b/735536/3/gate/tempest-full/53be1fa/job-output.txt | 20:01 |
clarkb | corvus: that is the only error/failure I'm aware of so far | 20:01 |
corvus | oookay, so it's part of https://review.opendev.org/736006 | 20:02 |
ianw | yeah, i can split that into two to put the --become thing after, if we like | 20:03 |
corvus | i just saw a bunch of command line stuff, i missed the test_matrix bit | 20:03 |
corvus | ianw: meh, if it passes tests, i say we leave it alone :) | 20:03 |
ianw | (it seems we'll eventually hit the opposite issue of 2.7 breaking and not being supported, and having to update that, at some point) | 20:04 |
corvus | presumably that should all work since it's emitting the warning | 20:04 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead https://review.opendev.org/735923 | 20:05 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Make sure pip and wheel are installed for python releases https://review.opendev.org/736001 | 20:05 |
mordred | clarkb: I am baffled as to why that isn't working | 20:09 |
mordred | corvus: which change is that for? | 20:10 |
mordred | gah | 20:10 |
mordred | clarkb: | 20:10 |
clarkb | mordred: cool not just me then | 20:10 |
clarkb | I rechecked it for that reason, going to see if it is more consistent | 20:10 |
mordred | clarkb: I saw that issue crop up yesterday I think - so we might actually need to investigate something | 20:11 |
mordred | clarkb: OR | 20:11 |
mordred | that might be motivation to move those out of site.pp and into their own job using the real stuff | 20:11 |
* frickler is done for the day, hoping things look better tomorrow | 20:12 | |
clarkb | frickler: thanks! I'll try to catch up on devstack reviews after lunch | 20:12 |
clarkb | and now it is time for lunch | 20:13 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Replace build-sphinx-docs jobs https://review.opendev.org/736016 | 20:21 |
AJaeger | mordred: I suggest to be friendly to those projects using build-sphinx - even if none of them merged a change for over a year ;( ^ | 20:22 |
openstackgerrit | Merged opendev/system-config master: Revert "Add Zookeeper TLS support" https://review.opendev.org/735990 | 20:24 |
mordred | AJaeger: ++ | 20:30 |
clarkb | infra-root ^ has merged, should we rm the DISABLE-ANSIBLE file? | 20:36 |
clarkb | Looks like that revert change queued up all the things too | 20:36 |
*** roman_g has quit IRC | 20:36 | |
mordred | clarkb: then yeah - as long as the revert change is what's queued up | 20:36 |
clarkb | https://review.opendev.org/#/c/735990/ is what i see as queued up | 20:37 |
mordred | yup. same | 20:37 |
mordred | let's remove it | 20:37 |
ianw | it'd be worth keeping an eye; yesterday i noticed for example the mirror job failing due to our weird blank afs volumes on some hosts | 20:37 |
clarkb | and that is the revert | 20:37 |
corvus | clarkb: ++ you doing it? | 20:37 |
clarkb | corvus: mordred I can do that now | 20:37 |
mordred | ++ | 20:37 |
clarkb | done | 20:37 |
clarkb | hrm the is vs | match thing is in devstack too? | 20:40 |
clarkb | considering that is affecting the devstack uwsgi fixes we may want to set openstack tenant to ansible 2.8 default | 20:40 |
clarkb | that way we can land the uwsgi fixes then the ansible fixes | 20:41 |
ianw | clarkb: no, i don't think so | 20:41 |
ianw | i'm trying to catch up on where the other fixes are at | 20:42 |
clarkb | ianw: https://zuul.opendev.org/t/openstack/build/53be1fab81f643aab08e84fb51126a9c/console that seems to be a native zuul devstack job with the same failure | 20:42 |
clarkb | ianw: I think the code may have been copy pastad from d-g to devstack? | 20:42 |
ianw | huh, ok. i think i'd be of the opinion we could force merge a fix of "|" to "is" if required | 20:43 |
clarkb | ya we could also do that | 20:43 |
ianw | ahh, perhaps this is only on old branches | 20:45 |
clarkb | ianw: fwiw I think the actual fixes for uwsgi seem to be fine | 20:45 |
clarkb | ianw: its just a matter of landing them in a bottom up fashion so that grenade is happy | 20:45 |
clarkb | prod install ansible job completed successfully | 20:46 |
clarkb | prod base is running now | 20:46 |
ianw | clarkb: yeah, that's right, so test-matrix is gone on master. on older branches, it's the 'test-matrix' role that comes from d-g | 20:48 |
clarkb | aha | 20:48 |
ianw | i.e. 736006 should fix what you posted above | 20:48 |
clarkb | its actually pulling it from d-g? | 20:48 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add stop and start playbooks for nodepool https://review.opendev.org/736031 | 20:48 |
mordred | clarkb, corvus: ^^ there's the re-org we discussed during the maint | 20:49 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Ensure pip is installed for propose-update-constraints https://review.opendev.org/736032 | 20:49 |
corvus | mordred: should there be a nodepool_launcher/start.yaml ? | 20:51 |
ianw | clarkb: The error appears to be in <blah> opendev.org/openstack/devstack-gate/roles/test-matrix/tasks/main.yaml': line 24, column 3, but may ... so yeah | 20:51 |
mordred | corvus: there already is | 20:52 |
mordred | corvus: we almost even sort of maybe got this somewhat right :) | 20:53 |
clarkb | ianw: ya I'm trying to find where we run test-matrix role on the devstack side and failing but I expect you are right | 20:53 |
clarkb | ianw: should we maybe promote the d-g change to the gate? | 20:53 |
ianw | clarkb: it actually passed the bits it changed, right? | 20:54 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Trigger zuul and nodepool on start/stop playbook changes https://review.opendev.org/736039 | 20:55 |
mordred | clarkb, corvus: ^^ that said, we should do that too | 20:55 |
clarkb | ianw: thats a tricky question :/ | 20:56 |
clarkb | we run a bunch of non legacy jobs against d-g :/ | 20:57 |
ianw | clarkb: and not actually "tempest-full" | 20:58 |
clarkb | ianw: seems like https://29c14fb02adb24d88a67-f81dd8fd4f9add75179602ffbcf81b7d.ssl.cf1.rackcdn.com/736006/2/check/legacy-tempest-neutron-full-stable/24b8db6/logs/devstacklog.txt indicates it failed on uwsgi as expected | 20:58 |
clarkb | which implies it did get past the test-matrix thing? | 20:58 |
clarkb | ianw: but also it runs neutron's multinode job which is failing on a venv thing? I'm going to look at the venv thing now I guess. But suspect we should maybe force merge the d-g change then enqueue uwsgi fixes to the gate | 20:59 |
clarkb | nevermind the multinode thing failed on uwsgi | 21:00 |
ianw | legacy-tempest-dsvm-neutron-full-centos-7 (3. attempt) ... that might be showing an issue? if it's in a pre playbook | 21:01 |
ianw | "msg": "No package matching 'python-pip' found available, installed or updated" on that ... urgh some other issue then | 21:03 |
clarkb | #status log Rebooted logstash-worker02 and 13 as ansible base.yaml complained it could not reach them | 21:03 |
openstackstatus | clarkb: finished logging | 21:03 |
clarkb | if anyone is wondering why the base run is taking so long ^ | 21:03 |
clarkb | those servers seemed to be out to lunch | 21:03 |
clarkb | 02 is back, still waiting on 13 | 21:03 |
ianw | i'm struggling to see any of the devstack-gate tests that have actually tested devstack-gate :/ | 21:05 |
clarkb | ianw: the centos job is failing on no python-pip package available | 21:08 |
ianw | yeah, i'll have to look at that. i guess epel is involved | 21:08 |
clarkb | but again no complaints with your change itself. I think we may be ok, but hard to know for sure | 21:08 |
clarkb | I'm still somewhat inclined to merge it, then roll forward on the uwsgi fixes | 21:09 |
mordred | ++ | 21:10 |
mordred | I think that sounds right | 21:10 |
ianw | clarkb: we could depends-on https://review.opendev.org/#/c/735536/ to it, and make sure it gets going, then merge it | 21:11 |
clarkb | ianw: that works, then we could also direct enqueue 735536 to the gate once the d-g change is in | 21:11 |
ianw | i think you have to have a stable/train run to see it | 21:11 |
clarkb | lets do that | 21:11 |
clarkb | we also need train to merge before anything else as the uwsgi fixes need to be bottom up | 21:12 |
clarkb | I like that plan | 21:12 |
clarkb | are you updating that change or should I? | 21:12 |
clarkb | (with the depends-on I mean) | 21:12 |
ianw | clarkb: umm, i can, just a sec | 21:12 |
ianw | ok, https://review.opendev.org/735536 updated with depends-on https://review.opendev.org/736006 | 21:15 |
ianw | we can watch the tempest-full job and should see the d-g changes apply early in the run, then be confident it's safe to merge | 21:15 |
clarkb | infra-root the base job timed out (due to the ssh'ing issues) | 21:17 |
clarkb | I expect a rerun of that now would be happier since I restarted those servers. But I wonder if we can make it timeout ssh attempts much more quickly? | 21:18 |
ianw | getting some breakfast, will come back to check on d-g stuff | 21:18 |
clarkb | mordred: ^ do you know? I kinda think zuul must do something along those lines talking to zuul test nodes? | 21:18 |
clarkb | ianw: https://zuul.opendev.org/t/openstack/stream/cfd2de2f98ff4e62afd7e56d12428621?logfile=console.log that job | 21:31 |
clarkb | it isn't quite started up yet but should have a console log soon | 21:31 |
clarkb | ianw: also service-bridge is running right now which should configure the dns stuff if you have a moment to check that when it is done | 21:32 |
clarkb | ianw: it succeeded and the cron job is installed | 21:37 |
clarkb | do we want to run it in the foreground early to ensure it works? | 21:37 |
clarkb | overall things are looking good \o/ | 21:38 |
clarkb | ianw: we have console log on that job now | 21:41 |
clarkb | it looks like test matrix ran successfully | 21:46 |
clarkb | ianw: want to confirm? | 21:46 |
ianw | clarkb: looking | 21:48 |
ianw | 2020-06-16 21:44:30.393760 | TASK [test-matrix : Append neutron to configs for stable/ocata+] | 21:50 |
ianw | 2020-06-16 21:44:31.186053 | controller | ok | 21:50 |
ianw | the become stuff should be coming in a tic | 21:51 |
ianw | https://zuul.openstack.org/stream/7aba2eec60134836bc520c196fffcf34?logfile=console.log has used the --become flags too | 21:55 |
ianw | clarkb: so i agree, the changed bits of 736006 have run successfully so if you agree i'll force merge it | 21:56 |
clarkb | I'm double checking the --become now | 21:57 |
clarkb | ianw: have a timestamp for --become the search function doesn't seem to work | 21:57 |
ianw | clarkb: ~ 2020-06-16 21:54:45.083996 | primary | + /opt/stack/new/devstack-gate/devstack-vm-gate.sh:setup_ssh:L81: /tmp/ansible/bin/ansible all --become -f 5 -i /home/zuul/workspace/inventory -m file -a 'path='\''/root/.ssh'\'' mode=0700 state=directory' | 22:01 |
*** Eighth_Doctor has quit IRC | 22:01 | |
clarkb | oh its a different job that makes sense | 22:01 |
clarkb | ianw: yup I agree that is all good we should merge it now I think | 22:02 |
ianw | ok i'll do that now | 22:03 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: WIP: prepare-workspace: Set root dir to zuul build uuid https://review.opendev.org/735980 | 22:04 |
*** rchurch has quit IRC | 22:06 | |
*** factor has joined #opendev | 22:07 | |
*** rchurch has joined #opendev | 22:09 | |
ianw | that's merged | 22:16 |
clarkb | I think we can enqueue https://review.opendev.org/#/c/735536/ and https://review.opendev.org/#/c/735523/ to the gate now? | 22:17 |
*** Eighth_Doctor has joined #opendev | 22:17 | |
clarkb | I guess that second one doesn't have enough +2's yet technically (or approval) | 22:17 |
clarkb | but I think we can apply that too | 22:18 |
ianw | yeah, ++ | 22:22 |
ianw | i just have to run to school and back, back in about 15 min | 22:23 |
clarkb | k I'll enqueue the first one now | 22:23 |
*** icarusfactor has joined #opendev | 22:26 | |
*** factor has quit IRC | 22:26 | |
*** Eighth_Doctor has quit IRC | 22:27 | |
mordred | clarkb: WHY THE PUPPET JOB FAILED AGAIN ARGHHHHH | 22:29 |
clarkb | mordred: well I've just finished with the devstack things, but I think I need a break for a minute but I can probably help with that next | 22:29 |
clarkb | mordred: we probably want to confirm what it is sshing to? like should it be (is it) using local connection? | 22:29 |
mordred | clarkb: I'm honestly not 100% sure - this is one of those things using infra-spec-helper | 22:30 |
mordred | clarkb: I'm about at EOD and definitely don't have enough brain pellets for debugging this | 22:30 |
mordred | clarkb: BUT | 22:30 |
mordred | clarkb: what I think might be more useful is first thing in the morning when my trough of pellets is more full, I crank through a patch to split the puppets into their system-config-run jobs | 22:31 |
mordred | oh - and one to be more specific with inventory file matchers | 22:32 |
clarkb | mordred: are you a traeger now? | 22:32 |
mordred | clarkb: yes | 22:32 |
corvus | okay, it looks like we can run zk with ssl and plain in parallel. i'll work on splitting out my zk-all-the-things change to just add ssl listening to the zk server | 22:51 |
*** diablo_rojo has joined #opendev | 22:52 | |
*** Eighth_Doctor has joined #opendev | 22:54 | |
*** mnasiadka has joined #opendev | 23:02 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Make sure wheel is installed for python releases https://review.opendev.org/736001 | 23:03 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead https://review.opendev.org/735923 | 23:03 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Make sure wheel is installed for python releases https://review.opendev.org/736001 | 23:08 |
ianw | infra-root: bridge.openstack.org is really dying with a bunch of what seem to be old ansible-playbook services | 23:13 |
ianw | i'm going to kill all the ones from jun 13,14,15 | 23:13 |
clarkb | ianw: I wonder if those timed out like I saw base do today and sometimes they don't die | 23:14 |
mordred | they're all remote_puppet_else | 23:14 |
mordred | oh - that's not true | 23:14 |
mordred | a few are base | 23:14 |
mordred | but most of them are remote_puppet_else | 23:14 |
mordred | ianw: I support the killing - but I think we should now keep our eyes on this and try to figure out why these are hung | 23:15 |
ianw | host 23.253.242.14 | 23:15 |
ianw | 14.242.253.23.in-addr.arpa domain name pointer logstash-worker15.openstack.org. | 23:15 |
ianw | appears to be what hung them up... | 23:15 |
clarkb | ianw: 02 and 13 I rebooted in response to seeing them fail ssh in today's base log | 23:15 |
clarkb | perhaps that set of servers got live migrated or something and tripped up ansible | 23:16 |
clarkb | I do think reducing ansible ssh timeout would be good if possible | 23:16 |
mordred | yeah - ssh-ing to logstash-worker15.openstack.org is hanging from there | 23:16 |
mordred | clarkb: ++ | 23:16 |
mordred | sshing to logstash-worker15.openstack.org from my laptop is also hanging | 23:17 |
ianw | yeah there's 36 hung ssh's to it on bridge | 23:17 |
clarkb | k let me reboot it like the other 2 | 23:17 |
mordred | ianw: we should kill those ssh's | 23:17 |
ianw | yep, doing that now | 23:17 |
mordred | -o ConnectTimeout=10 | 23:18 |
clarkb | reboot issued | 23:18 |
mordred | that doesn't seem to have done anything | 23:18 |
mordred | but it's in our ssh commands | 23:18 |
mordred | oh | 23:18 |
clarkb | mordred: "This value is used only when the target is down or really unreachable, not when it refuses the connection." | 23:18 |
mordred | clarkb: we have a control persist target | 23:18 |
mordred | clarkb: nod | 23:19 |
ianw | $ ps -aef | grep "/bin/bash" | wc -l | 23:20 |
ianw | 1812 | 23:20 |
ianw | ummm | 23:20 |
mordred | wow | 23:21 |
ianw | i think something has fork bombed | 23:21 |
mordred | clarkb: what's the deal with stable branch devstack? | 23:21 |
corvus | i closed all my terminals yesterday | 23:21 |
clarkb | ianw: the earliest one seems owned by init | 23:22 |
clarkb | mordred: well the fixes were enqueued to the gate where they seem to have failed | 23:22 |
clarkb | mordred: HTTPError: 404 Client Error: Not Found for url: http://mirror.gra1.ovh.opendev.org/wheel/ubuntu-16.04-x86_64/pkg-resources/ so maybe not the fault of the changes | 23:23 |
mordred | clarkb: awesome | 23:24 |
clarkb | ianw: the bashes have gone away did you do anything? | 23:24 |
ianw | i kill -9 1018 | 23:24 |
clarkb | rgr | 23:24 |
ianw | which was the one owned by init at the top | 23:24 |
mordred | ssh to logstash15 no longer hangs | 23:25 |
mordred | tons of bash again | 23:25 |
mordred | ianw: it's owned by you | 23:26 |
ianw | i think rax-dns-backup is a fork bomb | 23:26 |
ianw | it is ... it calls itself ... wtf | 23:26 |
openstackgerrit | Merged zuul/zuul-jobs master: Make sure wheel is installed for python releases https://review.opendev.org/736001 | 23:26 |
clarkb | wasn't it python? | 23:26 |
clarkb | ianw: we must've gotten the file resource wrong in ansible? | 23:27 |
ianw | content: rax-dns-backup | 23:28 |
ianw | yeah... | 23:28 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: rax-dns-backup: fix copy file typo https://review.opendev.org/736063 | 23:29 |
ianw | clarkb: ^ i feel like that might work better | 23:29 |
mordred | ianw: +A | 23:34 |
mordred | ianw: you might have to fight fork-bombs until that lands | 23:34 |
ianw | mordred: it should only trigger at 2am; i was just going to run it manually once to confirm it | 23:35 |
mordred | also - TIL file content can be a script | 23:35 |
mordred | wait - isn't it .... OOHHHHHHHHHHHHH | 23:35 |
mordred | I understand | 23:35 |
mordred | ianw: maybe rm the file that's on disk anyway :) | 23:36 |
ianw | good idea, done | 23:36 |
auristor | looks like all of the volume transactions finally completed | 23:46 |
clarkb | auristor: ianw also we should have updated our rsync commands at this point | 23:47 |
ianw | yeah, agree, next runs should drop "-t" | 23:49 |
*** tosky has quit IRC | 23:50 | |
ianw | http://grafana.openstack.org/d/ACtl1JSmz/afs?viewPanel=12&orgId=1 should (hopefully) show a marked decline soon ... | 23:51 |
auristor | what triggers the sync? | 23:51 |
ianw | auristor: they are just cron jobs | 23:53 |
ianw | auristor: installed @ https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/tasks/rsync.yaml#L64 , specifically | 23:54 |
auristor | if everything on afs02.dfw will be replicated to afs01.ord, vicepa on afs01.ord will need to be increased | 23:54 |
clarkb | auristor: I don't think it will be because it would never finish | 23:56 |
auristor | it doesn't finish because each vos release is transferring the entire contents of each volume | 23:57 |
fungi | even just initially, we have around 3tib of data we'd need to sync, and pushing that even halfway across the usa (from dallas to chicago) will take a very long time | 23:57 |
fungi | though we could slowly add volumes to the set we replicate there and knock it out eventually, even just something as innocuous as ubuntu making a new point release will mean a days-long vos release to copy the updated data | 23:59 |
auristor | sure. the way you start it is by "vos dump" to a local file. compress the file. scp it. decompress and feed it to "vos restore" to create the volume on the afs01.ord. "vos addsite" and then the next incremental will send just the diff from the version that was restored. | 23:59 |
fungi | ahh, yeah that might not be so bad | 23:59 |
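
The seeding procedure auristor describes at 23:59 (dump, compress, copy, decompress, restore, addsite) can be sketched as an ordered plan. This is only an illustration under assumptions: the function name `seed_replica_plan`, the file paths, and the idea of tagging each step with the side it runs on are hypothetical, and the exact `vos restore` options needed for the restored copy to count as a replica site (for example a read-only flag or an explicit volume ID) are not stated in the chat and should be checked against the OpenAFS `vos` documentation. The sketch only builds command lines; it does not run anything.

```python
import posixpath
import shlex


def seed_replica_plan(volume, server, partition, dump_file, remote_host, remote_dir):
    """Return (where, argv) steps for seeding a remote replica by hand.

    "source" steps run where the volume currently lives; "dest" steps run
    on the new replica server. All paths and names are illustrative.
    """
    gz = dump_file + ".gz"
    remote_gz = posixpath.join(remote_dir, posixpath.basename(gz))
    remote_dump = remote_gz[: -len(".gz")]
    return [
        # 1. Full dump of the volume to a local file.
        ("source", ["vos", "dump", "-id", volume, "-file", dump_file]),
        # 2. Compress the dump before it crosses the WAN.
        ("source", ["gzip", dump_file]),
        # 3. Copy the compressed dump to the destination host.
        ("source", ["scp", gz, f"{remote_host}:{remote_gz}"]),
        # 4. Decompress on the destination.
        ("dest", ["gunzip", remote_gz]),
        # 5. Recreate the volume from the dump (flags are an assumption).
        ("dest", ["vos", "restore", "-server", server, "-partition", partition,
                  "-name", volume, "-file", remote_dump]),
        # 6. Register the new site so later releases can be incremental.
        ("dest", ["vos", "addsite", "-server", server, "-partition", partition,
                  "-id", volume]),
    ]


if __name__ == "__main__":
    plan = seed_replica_plan("mirror.fedora", "afs01.ord", "vicepa",
                             "/tmp/mirror.fedora.dump", "afs01.ord", "/tmp")
    for where, argv in plan:
        print(f"[{where}] {shlex.join(argv)}")
```

The point of the ordering is the one made in the chat: once a restored copy exists on the far side and is registered as a site, the next release only has to send the difference from that version rather than the roughly 3 TiB fungi mentions.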
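
The rax-dns-backup fork bomb found between 23:26 and 23:29 came from a file task whose `content:` was the literal string `rax-dns-backup`, so the installed script consisted of nothing but its own name and re-invoked itself each time it ran. A minimal sketch of a pre-deploy sanity check for that failure mode follows. The function name `looks_self_invoking` and the install path are hypothetical, and the check is a simple heuristic (first word of a line matches the script's own basename), not something the chat says was adopted.

```python
import posixpath
import shlex


def looks_self_invoking(install_path, content):
    """Heuristic: does any command line in `content` start with the script's own name?"""
    name = posixpath.basename(install_path)
    for line in content.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and the shebang.
        if not stripped or stripped.startswith("#"):
            continue
        try:
            words = shlex.split(stripped)
        except ValueError:
            # Unbalanced quotes or similar; not something this check can judge.
            continue
        if words and posixpath.basename(words[0]) == name:
            return True
    return False


# The broken deployment: the file's entire content is its own name.
print(looks_self_invoking("/usr/local/bin/rax-dns-backup", "rax-dns-backup"))
# A script that does real work under the same name is not flagged.
print(looks_self_invoking("/usr/local/bin/rax-dns-backup",
                          "#!/bin/bash\nset -e\necho backing up\n"))
```

A check like this would not catch indirect recursion through another script, but it is cheap and would have flagged the exact typo fixed in https://review.opendev.org/736063.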