fungi | ahh, yeah, so it's taking a long time even when rsync hasn't run | 00:00 |
ianw | https://lists.openafs.org/pipermail/openafs-info/2019-September/042865.html includes a link to afs audit logs from an rsync run | 00:00 |
fungi | ahh, right, this is ringing a bell now | 00:02 |
fungi | but https://review.opendev.org/681367 didn't actually solve it? | 00:02 |
ianw | fungi: it seems not, the graph is showing every release takes ~8 hours | 00:05 |
ianw | this is what led to the path of doing the release via ssh and -localauth | 00:07 |
ianw | fungi: perhaps we should leave mirror-update off for a bit and investigate again? | 00:09 |
fungi | yeah, not a bad idea | 00:10 |
ianw | for a start, when fedora gets in sync, we could turn on file auditing and run a "vos release" with a zero-delta and see what happens | 00:10 |
openstackgerrit | Merged openstack/project-config master: Drop pip-and-virtualenv from images https://review.opendev.org/734428 | 00:29 |
openstackgerrit | Merged openstack/project-config master: Use https apt mirrors for image builds https://review.opendev.org/735362 | 00:30 |
auristor | ianw fungi: afs vice partitions should be noatime but that won't alter the contents of the incremental dumps. | 00:48 |
auristor | A third fileserver should be added so that there is always a redundant clone in case of a failure of afs01.dfw | 00:49 |
*** Meiyan has joined #opendev | 01:01 | |
*** xiaolin has joined #opendev | 01:04 | |
ianw | auristor: it's probably a bit of a moot point though when it's basically in a constant state of "vos release" (i.e. the next one starts immediately after the previous one finishes) | 01:06 |
*** xiaolin has quit IRC | 01:11 | |
auristor | not really. The point is that while afs01 is updating afs02, there is no valid copy on afs02; the only consistent copy is on afs01, which is at 100% network capacity, so sending fetches there from clients only makes things slower. If afs03 existed, then either afs02 or afs03 would be online with a self-consistent copy while the release was taking place. | 01:22 |
ianw | we have afs01.ord too, i'm not sure if that's deliberate or just an accident of history | 01:26 |
auristor | is anything replicated to it? mirror.centos for example is not | 01:27 |
clarkb | ianw: ord was used until we hit the window sizing issues | 01:28 |
clarkb | the idea was to be offsite for resiliency but that meant copies took forever | 01:28 |
ianw | hrm, i don't remember that but ok; that probably explains the odd mix of replications we have | 01:29 |
auristor | docs, docs.dev, mirror, project, project.airship, root.afs, root.cell, and user.corvus | 01:29 |
auristor | if throughput to ord is a problem, then I suggest standing up an afs03.dfw. | 01:30 |
auristor | I really wish we could figure out some way that auristorfs could be used to host this cell | 01:30 |
ianw | we have updating to bionic and 1.8 as an increasingly insistent todo | 01:31 |
auristor | openafs 1.8 will help a bit with rx issues but it isn't going to fix most of the underlying issues | 01:32 |
ianw | auristor: it's still true we shouldn't mix 1.6 and 1.8 servers? i think that's the assumption we've been working under | 01:34 |
auristor | absolutely not | 01:34 |
ianw | auristor: sorry, we absolutely should not mix them, or it's ok to? :) | 01:36 |
auristor | there are no data format or wire protocol changes between 1.6 and 1.8. mix and match to your heart's content. | 01:44 |
auristor | what command line options are passed to rsync? | 01:52 |
ianw | auristor: rsync -rltDiz | 01:53 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/fedora-mirror-update#L101 | 01:53 |
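For reference, the flags in that script decode as follows; a minimal sketch, with the upstream rsync URL as a placeholder:

```shell
# -r recurse, -l copy symlinks as symlinks, -t preserve mtimes,
# -D preserve devices/specials, -i itemize changes, -z compress in transit
rsync -rltDiz \
    rsync://upstream.example.org/fedora/ \
    /afs/.openstack.org/mirror/fedora/
```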
ianw | looks like everything has released now | 02:04 |
auristor | vos status afs02.dfw reports no transactions | 02:05 |
ianw | i can try a release on fedora now and see what happens | 02:06 |
ianw | since the update server is shutdown, nothing has written to it | 02:06 |
ianw | if we want i can restart with audit logging | 02:08 |
auristor | I don't think there is any interesting audit logging for the release. it's the rsync that is interesting from my perspective. | 02:08 |
fungi | yep, confirmed, the fedora and opensuse volume releases did finally complete some time in the last few minutes | 02:11 |
auristor | As we discussed many months ago, the vos release is going to send all directories and any files that changed from five minutes before the last release time. The last release time was 2s after the last update time. | 02:11 |
auristor | s/five minutes/fifteen minutes/ | 02:15 |
ianw | auristor: yeah, that's why we put in the sleep https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/fedora-mirror-update#L152 | 02:21 |
ianw | istr we did try that experiment, running multiple releases | 02:23 |
auristor | ianw: instead of performing a "vos release" that will require network bandwidth and taking the afs02.dfw volume offline, could you execute | 02:23 |
auristor | vos size -server afs01.dfw.openstack.org -part a -id 536871007 -dump -time "2020-06-13 15:04" | 02:23 |
ianw | Volume: 536871007 | 02:24 |
ianw | dump_size: 306725041646 | 02:24 |
auristor | and remove the -time switch and parameter | 02:25 |
auristor | That is effectively the entire volume | 02:26 |
ianw | Volume: 536871007 | 02:26 |
ianw | dump_size: 306840822582 | 02:26 |
ianw | echo $(( 115780936 / 8 / 1024 / 1024)) | 02:27 |
ianw | 13 | 02:27 |
ianw | ~13 gb difference ? | 02:27 |
auristor | why dividing by 8? | 02:28 |
ianw | oh it's bytes | 02:28 |
auristor | 110MB difference which is nothing | 02:29 |
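The corrected arithmetic (the dump sizes are reported in bytes, so no division by 8):

```shell
echo $(( (306840822582 - 306725041646) / 1024 / 1024 ))
# => 110   (~110 MiB out of a ~306 GB volume)
```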
auristor | if you specify the time as "2020-06-14" what do you get? | 02:29 |
ianw | Volume: 536871007 | 02:30 |
ianw | dump_size: 15613187 | 02:30 |
auristor | the times listed by vos examine are local times. So I'm giving you EDT. Use vos examine mirror.fedora from the machine the vos size command is being executed on and use that time | 02:31 |
auristor | Last Update time | 02:31 |
ianw | ah, all the hosts run in UTC | 02:31 |
auristor | vos doesn't | 02:32 |
ianw | i'm doing this on afs01 | 02:32 |
auristor | I'm not on afs01. So my Last Update Sat Jun 13 15:04:11 2020 | 02:32 |
ianw | Last Update Sat Jun 13 19:04:11 2020 | 02:33 |
auristor | Provide that time to vos size | 02:33 |
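A recap of the probe sequence, assuming admin tokens (or -localauth) on the host running it; vos prints times in the server's local timezone, so the Last Update stamp must be read on the same machine that runs vos size:

```shell
vos examine mirror.fedora | grep 'Last Update'
#    Last Update Sat Jun 13 19:04:11 2020

# size of an incremental dump containing only changes since that stamp:
vos size -server afs01.dfw.openstack.org -part a -id 536871007 \
    -dump -time "2020-06-13 19:04:11"
```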
ianw | ianw@afs01:~$ vos size -server afs01.dfw.openstack.org -part a -id 536871007 -dump -time "2020-06-13 19:04:11" | 02:34 |
ianw | Volume: 536871007 | 02:34 |
ianw | dump_size: 15613266 | 02:34 |
auristor | 14MB which will be the size of the directories | 02:34 |
auristor | subtract 15m from that time and what do you get? | 02:35 |
ianw | $ vos size -server afs01.dfw.openstack.org -part a -id 536871007 -dump -time "2020-06-13 18:45" | 02:35 |
ianw | Volume: 536871007 | 02:35 |
ianw | dump_size: 15613266 | 02:35 |
auristor | the problem isn't the incremental dump | 02:36 |
auristor | rsync the content from mirror.fedora.readonly to mirror.fedora. That should be "no change" Then perform the "vos size with -time "2020-06-13 18:45"" again | 02:38 |
ianw | umm, ok, i want to be very careful i don't destroy things with an errant command :) | 02:40 |
auristor | you can copy mirror.fedora to a new volume | 02:40 |
auristor | vos copy -id mirror.fedora -fromserver 104.130.138.161 -frompart a -toname test.fedora -toserver 104.130.138.161 -topart a | 02:43 |
auristor | then mount test.fedora so you can rsync to it | 02:43 |
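Those two steps together; the mount point path is hypothetical:

```shell
# copy the RW volume to a scratch volume on the same server/partition
vos copy -id mirror.fedora -fromserver 104.130.138.161 -frompart a \
    -toname test.fedora -toserver 104.130.138.161 -topart a

# mount the copy somewhere writable so rsync can target it
fs mkmount /afs/.openstack.org/test-fedora test.fedora
```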
ianw | ok, i just have a dry-run going anyway to see what it thinks about things | 02:44 |
ianw | rsync -avz --dry-run /afs/openstack.org/mirror/fedora/ /afs/.openstack.org/mirror/fedora/ reports nothing to do | 02:45 |
auristor | those aren't the rsync options you indicated earlier | 02:46 |
auristor | of -rltDiz the most interesting is -t | 02:47 |
ianw | https://static.opendev.org/mirror/logs/rsync-mirrors/fedora.log | 02:50 |
ianw | does have verbose logging on that should show if rsync touches anything | 02:50 |
ianw | that's the itemize changes (-i) which will show why it updated files | 02:51 |
auristor | the behavior I observed was that rsync didn't update the data but it set the last update time on files it didn't modify | 02:52 |
ianw | the vos copy i guess will take a while | 02:55 |
auristor | sadly it's performed via rx over loopback | 02:55 |
ianw | i can strace the rsync to see exactly what it touches | 02:56 |
auristor | the fileserver audit log would tell as well | 02:56 |
ianw | right, i'm pretty sure that's what i got @ http://people.redhat.com/~iwienand/fedora-mirror-11-09-2019.tar.gz | 02:57 |
auristor | I wonder if this is the problem with the openafs client | 02:58 |
auristor | ip->i_mtime.tv_sec = vp->va_mtime.tv_sec; | 02:58 |
auristor | /* Set the mtime nanoseconds to the sysname generation number. | 02:58 |
auristor | * This convinces NFS clients that all directories have changed | 02:58 |
auristor | * any time the sysname list changes. | 02:58 |
auristor | */ | 02:58 |
auristor | ip->i_mtime.tv_nsec = afs_sysnamegen; | 02:58 |
auristor | in other words, the nsec component of the mtime reported by the openafs client is not going to match the nsec time that rsync obtains from the source | 02:59 |
auristor | if the data hasn't changed, rsync won't rewrite it. but with -t it will try to fix the mtime | 03:00 |
auristor | In the FileAuditLog you are looking for AFS_SRX_StStat events | 03:01 |
ianw | i feel like that would show in itemized-changes | 03:02 |
auristor | AFS_SRX_StStat events for a FID without a AFS_SRX_StData event | 03:02 |
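A rough first pass over a FileAuditLog, a sketch only since the exact record layout varies between OpenAFS versions:

```shell
# many store-status (mtime) events with few or no store-data events
# would confirm rsync is rewriting timestamps rather than file contents
grep -c AFS_SRX_StStat FileAuditLog
grep -c AFS_SRX_StData FileAuditLog
```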
ianw | i think maybe if i bring mirror-update back online, and get in there fast and take the update lock, then i should be able to run the exact rsyncs under strace | 03:04 |
ianw | that seems the lowest impact way to get data right now | 03:04 |
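A sketch of that tracing setup; the upstream URL is a placeholder, and -ff writes one output file per process, matching the rsync.<pid> files referenced below:

```shell
strace -ff -e trace=lstat,utimensat -o ~/rsync-run/rsync \
    rsync -rltDiz \
    rsync://upstream.example.org/fedora/ \
    /afs/.openstack.org/mirror/fedora/
```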
auristor | ok | 03:06 |
ianw | ok, i've commented out the cron run and will update the script and run manually | 03:11 |
ianw | it's running in a screen on mirror-update | 03:16 |
ianw | logging to ~ianw/rsync-run | 03:17 |
ianw | lstat("Modular/x86_64/os/Packages/p/perl-Time-Piece-1.31-415.module_2570+32b47dc0.x86_64.rpm", {st_mode=S_IFREG|0644, st_size=43780, ...}) = 0 | 03:17 |
ianw | utimensat(AT_FDCWD, "Modular/x86_64/os/Packages/p/perl-Time-Piece-1.31-415.module_2570+32b47dc0.x86_64.rpm", [UTIME_NOW, {tv_sec=1544180283, tv_nsec=202155000} /* 2018-12-07T10:58:03.202155000+0000 */], AT_SYMLINK_NOFOLLOW) = 0 | 03:17 |
ianw | is basically it | 03:17 |
ianw | this isn't a zero delta, it's bringing in a bunch of stuff from upstream | 03:21 |
ianw | ok, it's into that "+ sleep 1200" period | 03:22 |
ianw | ianw@afs01:~$ vos size -server afs01.dfw.openstack.org -part a -id 536871006 -dump | 03:42 |
ianw | Volume: 536871006 | 03:42 |
ianw | dump_size: 306816062285 | 03:42 |
ianw | ianw@afs01:~$ vos size -server afs01.dfw.openstack.org -part a -id 536871006 -dump -time "2020-06-15 03:00" | 03:42 |
ianw | Volume: 536871006 | 03:42 |
ianw | dump_size: 306700281349 | 03:42 |
ianw | i don't know if that is right, but that's 110mb difference from before and now | 03:42 |
auristor | -time "2020-06-13 18:45" | 03:44 |
ianw | $ vos size -server afs01.dfw.openstack.org -part a -id 536871007 -dump -time "2020-06-13 18:45" | 03:46 |
ianw | Volume: 536871007 | 03:46 |
ianw | dump_size: 15613266 | 03:46 |
auristor | you want the incremental dump of the RW | 03:47 |
ianw | well the release has started | 03:49 |
ianw | i've put mirror-update in emergency so the cron job doesn't come back | 03:52 |
*** ykarel|away is now known as ykarel | 03:55 | |
auristor | I'm done for the night. | 03:59 |
ianw | auristor: thanks, i think if we do some manual tracing of zero-delta updates we can get some more info to go off | 04:01 |
AJaeger | ianw: what kind of cleanup is needed after https://review.opendev.org/735301? | 05:47 |
openstackgerrit | Merged openstack/project-config master: Add github sync job for tricircle https://review.opendev.org/735417 | 05:54 |
AJaeger | ianw: I see you left the plain ones in - ok, so no need for cleanup *yet*. | 05:55 |
ianw | AJaeger: yeah, i'll get rid of everything after it's settled | 06:00 |
ianw | delete the zuul-jobs testing, then the nodes can go | 06:01 |
AJaeger | ok | 06:07 |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: Return upload_results in upload-logs-swift role https://review.opendev.org/733564 | 06:19 |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: Return upload_results in test-upload-logs-swift role https://review.opendev.org/735503 | 06:19 |
*** ysandeep is now known as ysandeep|afk | 06:31 | |
*** priteau has joined #opendev | 06:34 | |
AJaeger | infra-root, I just saw "Could not connect to mirror.mtl01.inap.opendev.org:443 (198.72.125.6), connection timed out" ;( | 06:50 |
AJaeger | Log: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_91d/735494/5/check/tempest-full-py3/91dbf66/job-output.txt | 06:50 |
AJaeger | happens in https://b7511727a7deb59d79f6-083f3205a01a368b196dd0a8486413e5.ssl.cf2.rackcdn.com/735494/5/check/neutron-tempest-linuxbridge/cfb66a5/job-output.txt as well | 06:51 |
ianw | AJaeger: hrm, it's up and i can talk to it | 06:51 |
AJaeger | ianw: I cannot from here | 06:52 |
ianw | yeah, apache not talking but the host is | 06:52 |
ianw | it's been up 200+ days, i'm going to reboot it | 06:53 |
ianw | there's nothing in dmesg for over a month | 06:53 |
AJaeger | thanks | 06:53 |
ianw | ok responding now | 06:55 |
ianw | #status log rebooted mirror.mtl01.inap.opendev.org due to unresponsive apache processes | 06:56 |
openstackstatus | ianw: finished logging | 06:56 |
*** ykarel is now known as ykarel|afk | 06:56 | |
ianw | fungi/auristor: i think the nanosecond comment is homing in on the problem; there are constant calls to utimensat() on no-op rsyncs | 06:57 |
ianw | mirror-update.opendev.org:~ianw/rsync-run/rsync.3911 is an example | 06:58 |
*** ykarel|afk is now known as ykarel | 07:00 | |
*** iurygregory has joined #opendev | 07:11 | |
*** tosky has joined #opendev | 07:28 | |
*** DSpider has joined #opendev | 07:40 | |
-openstackstatus- NOTICE: uWSGI made a new release that breaks devstack, please refrain from rechecking until a devstack fix is merged. | 07:41 | |
*** rpittau|afk is now known as rpittau | 08:00 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** ykarel is now known as ykarel|lunch | 08:04 | |
ianw | fungi/auristor: i think that's the smoking gun -- http://paste.openstack.org/show/794754/ -- that just uses utimensat to update the mtime. it's always "1". i have to think about the implications | 08:09 |
ianw | is it as easy as dropping "-t"? | 08:10 |
frickler | #status log force-merged https://review.opendev.org/735517 and https://review.opendev.org/577955 to unblock devstack and all its consumers after a new uwsgi release | 08:15 |
openstackstatus | frickler: finished logging | 08:15 |
hrw | morning | 08:34 |
*** ysandeep|afk is now known as ysandeep | 08:41 | |
*** ykarel|lunch is now known as ykarel | 08:49 | |
*** priteau has quit IRC | 09:11 | |
*** priteau has joined #opendev | 09:21 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 09:25 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 09:35 |
hrw | 4 days of weekend were great. but had to end. | 09:46 |
hrw | http://mirror.regionone.linaro-us.opendev.org/ feels weird. does not list anything anymore (did in past). something changed? | 09:49 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 09:50 |
ykarel | looks like centos mirrors are gone again https://mirror.ca-ymq-1.vexxhost.opendev.org/ or it was not fixed for the provider | 09:54 |
ykarel | just seen in a job https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_552/727200/73/check/tripleo-ci-centos-8-containers-multinode/5524ccf/job-output.txt | 09:54 |
AJaeger | infra-root, any idea? Looks good on https://mirror.mtl01.inap.opendev.org/centos/ | 09:56 |
hrw | hm. looks like the mirrors are in a weird state or sth. | 09:58 |
hrw | linaro-us one feels empty | 09:58 |
priteau | Is Zuul a bit slow today? It took 6 minutes between W+1 and starting gate jobs on https://review.opendev.org/#/c/734040/ | 10:01 |
*** rpittau is now known as rpittau|bbl | 10:03 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 10:03 |
*** Meiyan has quit IRC | 10:05 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 10:13 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 10:28 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 10:35 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 10:49 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 11:02 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 11:12 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 11:18 |
*** hashar has joined #opendev | 11:20 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 11:28 |
*** ykarel is now known as ykarel|afk | 11:30 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 11:35 |
*** nautics889 has joined #opendev | 11:45 | |
*** nautics889 has quit IRC | 11:55 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 11:58 |
*** rpittau|bbl is now known as rpittau | 12:04 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 12:07 |
*** ysandeep is now known as ysandeep|afk | 12:07 | |
ianw | hrw: i dunno ... ls on /afs/openstack.org times out | 12:07 |
ianw | there's a lot of messages in there about dropped connections | 12:08 |
AJaeger | argh ;( | 12:11 |
ianw | it's annoying and i just rebooted it ... another afs issue to investigate longer term :/ | 12:12 |
AJaeger | thanks, ianw | 12:19 |
hrw | ianw: thanks | 12:35 |
*** ysandeep|afk is now known as ysandeep | 12:45 | |
*** ykarel|afk is now known as ykarel | 12:46 | |
auristor | ianw: I'm just returning to my desk. From my reading of the rsync repository the nanosec comparison is a fairly recent addition and -t sends the timestamp to the remote for time optimization. If -t is not set, then the timestamp comparison optimization is ignored and comparison of the data contents is used exclusively. In the case of rsync and /afs the timestamp comparison doesn't work anyway so I think leaving it off is the right choice. | 12:52 |
*** priteau has quit IRC | 12:54 | |
*** priteau has joined #opendev | 12:55 | |
fungi | ianw: should i check all the mirror frontends to make sure there's not more of them hung, or have you already? | 13:10 |
hrw | https://review.opendev.org/#/c/730331 got refreshed so Kolla now uses wheel cache first and then pypi mirror as a fallback. | 13:24 |
mordred | hrw: cool | 13:26 |
*** hashar has quit IRC | 13:33 | |
fungi | infra-root: seems there are some jobs failing on afs writes from the zuul executors. i'm going through and checking them one by one, so far i've shutdown the zuul-executor service on ze01 | 13:39 |
hrw | checking build time difference now | 13:39 |
fungi | er, sorry, on ze04 | 13:39 |
fungi | okay, ze04 seems to have been the only one which couldn't ls /afs/.openstack.org/docs/ | 13:40 |
corvus | fungi: i'm around - need anything? | 13:41 |
fungi | similar to the mirrors ianw was looking at, `ls /afs/.openstack.org/` on ze04 is empty | 13:41 |
fungi | corvus: sanity checks maybe | 13:41 |
fungi | still just cleaning up from the afs01.dfw outage late saturday utc | 13:42 |
auristor | fs checkservers -all | 13:42 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add tests for upload-docker-image https://review.opendev.org/735402 | 13:42 |
auristor | fs checkvolumes | 13:42 |
fungi | auristor: sadly, those give me "All servers are running." and "All volumeID/name mappings checked." but `ls /afs/.openstack.org/` is still coming back empty | 13:43 |
fungi | (on this particular client that is) | 13:43 |
fungi | interestingly dmesg there doesn't report any "lost contact" log entries from around or after the outage | 13:45 |
*** ysandeep is now known as ysandeep|afk | 13:46 | |
fungi | i have a feeling if i restarted afsd and possibly also did an rmmod/modprobe of the openafs lkm, this would go back to normal | 13:47 |
fungi | rebooting the other clients which exhibited similar issues with the ro replicas seemed to solve it, but unfortunately doesn't tell us much about what the actual problem was | 13:48 |
openstackgerrit | David Moreau Simard proposed openstack/project-config master: Create a new project for recordsansible/ara-collection https://review.opendev.org/735439 | 13:49 |
fungi | though this particular client is one out of a redundant cluster of a dozen servers, so we can more easily keep it like this for a bit to poke around | 13:49 |
fungi | interestingly it sees the read-only tree under /afs/openstack.org/ just not the read/write tree under /afs/.openstack.org/ | 13:50 |
openstackgerrit | Drew Walters proposed openstack/project-config master: Add missing project to Airship doc job https://review.opendev.org/734874 | 13:52 |
corvus | fungi: i don't have any other ideas | 13:54 |
corvus | fungi: i agree that a client restart may be in order | 13:55 |
fungi | being down one out of twelve executors for a bit is likely fine, so i'm happy leaving it like this in case there are other ideas of things we want to check first | 13:57 |
corvus | fungi: i ran 'fs flush /afs/.openstack.org' and things have improved | 14:00 |
fungi | corvus: oh, indeed, that seems to now be returning expected content | 14:00 |
fungi | so was it possible it cached an empty state for the cell root? | 14:00 |
corvus | that's what it looks like | 14:01 |
corvus | auristor: ^ fyi | 14:01 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Forward user-committee ML to openstack-discuss https://review.opendev.org/733673 | 14:04 |
corvus | docs volume under that looks fine | 14:05 |
fungi | `ls /afs/.openstack.org/mirror/` on ze04 is taking several minutes to complete so far | 14:05 |
hrw | 0:05:21.497262 versus 0:23:49.306074 is a nice improvement | 14:07 |
fungi | hrw: is that the speedup from using prebuilt wheels? | 14:08 |
hrw | fungi: yes | 14:08 |
fungi | significant! | 14:08 |
hrw | we have two images which suck time. waiting for second one | 14:08 |
corvus | fungi: well, that might call for a reboot :/ | 14:09 |
fungi | corvus: yeah, it's still blocking... | 14:10 |
fungi | i mean, technically the executor shouldn't need to write to /afs/.openstack.org/mirror/ at the moment (though when we get the wheel builder jobs reworked it will) | 14:11 |
fungi | i'm just more worried it's indicative of deeper problems | 14:11 |
*** priteau has quit IRC | 14:12 | |
corvus | fungi: agreed. at this point, i'd suggest we restart the client or reboot (reboot since it's more thorough and no less disruptive) | 14:14 |
fungi | it just now returned | 14:16 |
fungi | after spitting out "ls: cannot access '/afs/.openstack.org/mirror/fedora': Resource temporarily unavailable" | 14:16 |
AJaeger | also: I presented something when I visited Amundi in February. Do you need anything else? | 14:16 |
AJaeger | fungi, https://mirror.ord.rax.opendev.org/centos/7/os/x86_64/Packages/virt-what-1.18-4.el7.x86_64.rpm is failing to download | 14:17 |
AJaeger | gives a forbidden ;( | 14:17 |
AJaeger | (ignore my first pasto :( | 14:17 |
fungi | fungi@mirror01:~$ ls /afs/openstack.org/mirror/centos/ | 14:18 |
fungi | ls: cannot access '/afs/openstack.org/mirror/centos/': Connection timed out | 14:18 |
corvus | 'fs checkservers' is unhappy here | 14:19 |
fungi | checkservers on mirror01.ord.rax.opendev.org is taking a while | 14:20 |
corvus | These servers unavailable due to network or server problems: mirror01.ord.rax.opendev.org. | 14:20 |
corvus | slightly counterintuitive message :/ | 14:20 |
fungi | that looks like the 127.0.1.1 problem showing up again | 14:20 |
corvus | iiuc, that was a volume which had its vldb entry set to 127.0.1.1 ? | 14:21 |
corvus | those were all fixed, right? | 14:21 |
fungi | that's what i thought | 14:21 |
corvus | dmesg says: [Jun13 20:45] afs: Lost contact with file server 104.130.138.161 in cell openstack.org (code -1) (all multi-homed ip addresses down for the server) | 14:22 |
corvus | and no "back up" message | 14:22 |
fungi | that's when afs01.dfw hung, yeah | 14:22 |
corvus | maybe we should go with a reboot here too? | 14:22 |
corvus | or afsd restart | 14:23 |
fungi | i can give that a shot first | 14:23 |
corvus | k | 14:23 |
hrw | should I use http://mirror.regionone.linaro-us.opendev.org:8080/wheel/debian-10-aarch64/ or http://mirror.regionone.linaro-us.opendev.org/wheel/debian-10-aarch64/ on CI? | 14:24 |
hrw | :8080 gives 403 ;( | 14:24 |
openstackgerrit | Monty Taylor proposed opendev/zone-opendev.org master: Add review-test https://review.opendev.org/735600 | 14:25 |
fungi | hrw: the wheel cache is served over 80 and 443, 8080 is a proxy | 14:26 |
hrw | fungi: thanks. was not sure | 14:26 |
hrw | updated patch | 14:27 |
fungi | corvus: i ended up rebooting it because afsd wouldn't stop | 14:27 |
corvus | i suspected as much :) | 14:28 |
fungi | #status log rebooted mirror01.ord.rax.opendev.org to clear hung openafs client state | 14:28 |
openstackstatus | fungi: finished logging | 14:28 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make a review-test that we run ansible on https://review.opendev.org/735602 | 14:28 |
hrw | fungi: need to check whether the requirements-tox-py3x-check-uc* jobs in openstack/requirements use the cache too | 14:30 |
mordred | corvus, fungi: those two patches ^^ should help me finish standing up review-test so that I can rsync / mysqldump the existing prod content over. I made a private hostvars file for it with what I think is the bare minimum of secrets (we don't need a bunch of the prod ones for this) - and I moved group_vars/review.yaml to host_vars/review01.openstack.org.yaml | 14:31 |
corvus | mordred: i guess we want to keep review-dev around for testing without production-copy data, which is why this is a new server and not repurposing that? | 14:32 |
fungi | i was about to hard reboot mirror01.ord.rax.opendev.org via api, but oob console just showed it finally giving up waiting on [something] to terminate | 14:32 |
AJaeger | config-core, please review https://review.opendev.org/#/c/734874/ - the starlingx team needs this to prepare for the election | 14:33 |
mordred | corvus: yeah - although I think we could also consider merging the two ideas at some point - now that we don't replicate to github, I think we could move gtest to the production gerrit and then have a review-dev like the one I'm setting up for review-test that gets a periodic data rsync from review | 14:34 |
mordred | but I didn't want to block upgrade testing on getting that done | 14:34 |
corvus | ++ | 14:34 |
fungi | mirror01.ord.rax.opendev.org is back online now and i can `ls /afs/openstack.org/mirror/centos/` successfully | 14:35 |
hrw | looks like I will have a change which touch all jobs | 14:35 |
openstackgerrit | Marcin Juszkiewicz proposed zuul/zuul-jobs master: pip.conf: use wheel cache first and fallback to pypi mirror https://review.opendev.org/735606 | 14:40 |
hrw | can config-core take a look at ^^? | 14:40 |
hrw | I hope that commit message is clear enough | 14:40 |
fungi | hrw: it's not clear to me why that's necessary. pip doesn't try things in sequence, it pulls all the indices and then decides what to download | 14:42 |
fungi | extra-index-url isn't a "fallback" it's just yet another index it incorporates | 14:42 |
fungi | otherwise our wheel cache wouldn't work for any architecture | 14:43 |
hrw | ah. so maybe I mixed it up with :8080 being used for the cache at the same time | 14:44 |
hrw | dropped | 14:45 |
fungi | yeah, our "pypi_mirror" is a caching proxy, our "wheel_mirror" is served directly by apache | 14:45 |
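The resulting pip configuration shape, with illustrative URLs and file path; pip merges candidates from every configured index rather than trying them in order:

```shell
cat > /etc/pip.conf <<'EOF'
[global]
index-url = https://mirror.regionone.linaro-us.opendev.org/pypi/simple
extra-index-url = https://mirror.regionone.linaro-us.opendev.org/wheel/debian-10-aarch64/
EOF
```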
hrw | thanks fungi | 14:45 |
openstackgerrit | Merged openstack/project-config master: Add missing project to Airship doc job https://review.opendev.org/734874 | 14:46 |
hrw | INFO:kolla.common.utils.openstack-base: Downloading http://mirror.regionone.linaro-us.opendev.org/wheel/debian-10-aarch64/setproctitle/setproctitle-1.1.10-cp37-cp37m-linux_aarch64.whl (37 kB) | 14:47 |
*** sgw has quit IRC | 14:47 | |
hrw | yes ;D | 14:47 |
mnaser | hi friends -- appreciate reviews on https://review.opendev.org/#/c/735478/ | 14:48 |
fungi | hrw: yeah, if you want to see the details, the mirror servers' use this vhost configuration: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2 | 14:48 |
*** sgw has joined #opendev | 14:50 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302 | 14:50 |
fungi | hrw: according to that config, we actually make the pypi proxy available over 80/443/8080/4443 (because the proxy statements for it are included in both the basemirror and proxymirror macros) | 14:53 |
fungi | though we likely set the mirror vars to use 8080/4443 because the 80/443 proxies are just backward compatibility from when we used to host our own mirror of pypi (before it got far too large) | 14:54 |
hrw | https://7a40b7c4a1adb2feec0f-f29e759a440a8c469e5909803b48c54b.ssl.cf1.rackcdn.com/735599/1/check-arm64/requirements-tox-py38-check-uc-aarch64/3038d7d/tox/py38-check-uc-1.log works lovely ;D | 14:55 |
fungi | excellent | 14:55 |
clarkb | fungi: corvus I don't think centos was one with 127.0.1.1. The centos wheel mirror for arm64 was. The centos wheel mirror for x86 was not but it was accidentally cleaned up and recreated | 14:55 |
auristor | fungi: sorry, I had to step away. I wonder if the location server list for the cell became corrupted. | 14:55 |
*** ysandeep|afk is now known as ysandeep | 14:56 | |
*** ykarel is now known as ykarel|away | 14:56 | |
fungi | clarkb: yes. entirely possible checkservers was still trying to find 127.0.1.1 even though we deleted and recreated those volumes | 14:56 |
mnaser | thanks corvus and AJaeger :D | 14:56 |
clarkb | fungi: but that volume was never part of the 127.0.1.1 problem? | 14:56 |
hrw | fungi: does RETRY_LIMIT on https://zuul.openstack.org/builds?job_name=requirements-tox-py38-check-uc-aarch64 mean 'we need more hosts'? | 14:56 |
clarkb | or do you think that could have affected other volumes somehow? | 14:57 |
fungi | clarkb: right, that volume wasn't, i was just speculating on why the checkservers command was reporting the local hostname for the client as unavailable | 14:57 |
clarkb | got it | 14:57 |
fungi | not necessarily related to the volume access issue | 14:57 |
fungi | auristor: how do i query the location server list? | 14:57 |
* fungi checks docs | 14:57 | |
AJaeger | hrw: RETRY_LIMIT normally means: pre-playbook failed, was retried and Zuul gave up after three tries | 14:58 |
fungi | ahh, the vldb | 14:58 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add review-test https://review.opendev.org/735600 | 14:59 |
hrw | AJaeger: thx | 14:59 |
fungi | the sites listed for mirror.fedora look correct (rw and ro on afs01.dfw.openstack.org, ro on afs02.dfw.openstack.org) | 14:59 |
auristor | afs clients do not forget fileserver addresses once they've been told about them. only a restart will clear the known fileserver list | 14:59 |
auristor | there is no 127.0.1.1 fileserver entry in the VLDB at this time | 15:00 |
fungi | auristor: got it, that likely explains the checkservers error hanging around | 15:00 |
auristor | fs checkvolumes should discard the known volume to fileserver address bindings. | 15:00 |
auristor | if /afs/.openstack.org/ is not accessible that sounds like a bug in the dynamic root logic. fs flush /afs or fs flush /afs/.openstack.org might clear it. | 15:03 |
auristor | I don't remember if "fs lsm /afs/.openstack.org" works for OpenAFS on dynamic root entries. | 15:03 |
fungi | auristor: yes, `fs flush /afs/.openstack.org` did clear it according to corvus | 15:03 |
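For the record, the client-side sequence tried here; only the final flush actually cleared the stale entry:

```shell
fs checkservers -all           # probe every fileserver the client knows about
fs checkvolumes                # discard volume-to-fileserver bindings
fs flush /afs/.openstack.org   # drop the cached (empty) cell root entry
```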
openstackgerrit | Merged openstack/project-config master: Add vexxhost/atmosphere https://review.opendev.org/735478 | 15:05 |
*** mlavalle has joined #opendev | 15:05 | |
*** sgw1 has joined #opendev | 15:06 | |
auristor | That sounds like corruption of the dynamic root entry | 15:08 |
openstackgerrit | Merged zuul/zuul-jobs master: Add namespace in the collect-k8s-logs role https://review.opendev.org/731319 | 15:08 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add playbook for syncing state from review to review-test https://review.opendev.org/735610 | 15:10 |
mordred | corvus: ^^ does that seem like a sane sync playbook? | 15:11 |
mordred | corvus: my thinking is that if we shut down gerrit, sync the git repos, the indexes and the caches, apply the most recent mysqldump - we should be in a pretty equivalent state, yeah? | 15:16 |
mordred | so we can then do a test migration, see how it goes, then just do a state sync | 15:16 |
mordred | and do it again | 15:17 |
mordred | (I was originally thinking about using cloud snapshots - but I think that's too complicated honestly - because rebooting into a snapshot does stuff with ephemeral, so we'd need to invent some automation around launch-node tasks that would need to be re-done - and I think rsync will do it) | 15:17 |
corvus | mordred: things may be a little out of sync in terms of the mysqldump being behind the current prod git repo state. do you think that would be a problem? i think it would be really important to have the 2 in sync for the notedb migration, but maybe just going to 2.16 it's not as important? | 15:25 |
corvus | mordred: if we do think it's important, we could shut down prod gerrit briefly, take a mysql dump, and do a final incremental rsync. outage should only be a few minutes? | 15:26 |
clarkb | corvus: mordred: maybe as a first step having an in sync point in time we can restore is sufficient? | 15:26 |
clarkb | then once we'd decided if upgrading in sequence with online upgrades or doing one major upgrade is better we can refine that specific option with more up to date data? | 15:26 |
corvus | clarkb: sorry, i'm not following -- i'm wondering whether we need to have the mysqldb and the git repos in sync on review-test, or if having a db that's slightly older than the git repos is okay | 15:27 |
clarkb | corvus: ya I was more addressing the automation around launch node. eg we don't need a full proper sync each time we launch a new review-test. We only need one that we can copy and restore | 15:28 |
clarkb | assuming that we decide a full sync is necessary | 15:28 |
corvus | clarkb: oh yeah, i think mordred intends to keep review-test persistent; i think that playbook is an ad-hoc playbook | 15:29 |
clarkb | ah | 15:29 |
corvus | i think mordred's approach is probably okay, but we're going to have change refs for changes that aren't in the db, so pushing up new changes is almost certainly a bad idea. but just to test/time re-indexing, etc, it's probably sufficient. | 15:32 |
mordred | corvus: yeah - that. I think we could also create a point-in-time snapshot like you suggest | 15:43 |
mordred | corvus: perhaps once we're happy with an upgrade procedure we can create a consistent snapshot to test and upgrade that as a more final test before we go - so that we can test pushing up changes and stuff | 15:44 |
corvus | mordred: sounds good | 15:53 |
corvus | an outage for a PIT should be fairly short. | 15:53 |
mordred | yah | 15:54 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302 | 16:01 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add playbook for syncing state from review to review-test https://review.opendev.org/735610 | 16:04 |
mordred | corvus: ^^ I lost to the whitespace gods | 16:05 |
*** sshnaidm_ is now known as sshnaidm | 16:05 | |
fungi | the only way to win is not to play | 16:06 |
*** ysandeep is now known as ysandeep|away | 16:13 | |
*** rpittau is now known as rpittau|afk | 16:17 | |
mnaser | hi friends | 16:27 |
mnaser | is there any chance that zuul logging is borked because of some ooms? | 16:27 |
fungi | i can look | 16:28 |
mnaser | https://zuul.opendev.org/t/vexxhost/status <= my jobs here when clicking go straight to END OF STREAM | 16:28 |
mnaser | (could also be something else, but i can't tell really) | 16:28 |
fungi | most recent oom on any executor was 2020-03-29 on ze02 | 16:31 |
fungi | well, on any running executor (there was one from april on ze04 but it's currently down for evaluation) | 16:32 |
clarkb | there is a period of time between the node being assigned and the job actualyl starting on the remote node where there is no stream content | 16:32 |
fungi | we restarted our executor services on 2020-05-26 so i don't think any log streamers have been sacrificed in an oom event since then | 16:33 |
*** diablo_rojo has joined #opendev | 16:34 | |
clarkb | mnaser: fungi both jobs seem to have content now | 16:35 |
clarkb | I think the gap between node assignment and the job getting far enough to have a streamer running is likely the cause here | 16:36 |
*** diablo_rojo has quit IRC | 16:39 | |
clarkb | corvus: did you see my question on https://review.opendev.org/#/c/730929/6 ? | 16:44 |
clarkb | also I'm double checking that we merged all the changes from friday's renaming and it appears we have. If you've got any still open please let me/us know | 16:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Don't install puppet modules when we don't need them https://review.opendev.org/735642 | 16:46 |
mordred | clarkb: ^^ I just noticed that when looking at a test run that timed out - we're installing all of the puppet modules from git in every job even when those jobs don't run puppet | 16:46 |
mordred | (it's only taking 2 minutes - but still, that's 2 completely wasted minutes in most of our jobs) | 16:47 |
corvus | clarkb: ah yeah, looks like a rebase snafu | 16:56 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Fake zuul_connections for gate https://review.opendev.org/730929 | 16:57 |
*** diablo_rojo has joined #opendev | 16:59 | |
mordred | corvus: stop using backend hostname should be safe to land yes? | 17:00 |
mordred | (I mean, it looks that way, just checking to make sure) | 17:01 |
corvus | mordred: yeah, i think it's all good up to the WIP zookeeper change | 17:01 |
mordred | cool | 17:01 |
clarkb | https://review.opendev.org/#/c/734711/ is an easy puppet code deletion if anyone has a quick moment | 17:01 |
clarkb | and https://review.opendev.org/#/c/734647/ will update a number of docker images, but helps make our python3 auditing cleaner | 17:02 |
mordred | clarkb: done on both | 17:02 |
mordred | clarkb: did we just switch out nodes to ones without virtualenv pre-installed? | 17:03 |
clarkb | gitea's 1.12.0 milestone is down to a single issue without an open PR. The other issue has an open PR that passes testing and needs review | 17:04 |
clarkb | mordred: we did | 17:04 |
mordred | because I just got a failure on system-config-legacy-logstash-filters: https://zuul.opendev.org/t/openstack/build/876b22c1c06649ea8aaea5f0733a7937 | 17:04 |
mordred | AWESOME | 17:04 |
clarkb | mordred: ianw did that during australia monday | 17:04 |
mordred | I'll get up a fix | 17:04 |
clarkb | thanks | 17:04 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use python3 -m venv instead of virtualenv https://review.opendev.org/735643 | 17:06 |
mordred | infra-root: ^^ fix gate break | 17:06 |
mordred | clarkb: I'm excited we're close to 1.12 | 17:07 |
clarkb | mordred: hrm for the venv fix I think that may still not work on xenial because xenial's pip isn't able to handle our wheel mirror config? I could be wrong about that (testing should tell us) | 17:07 |
clarkb | if it does fail due to the wheel mirror being present we can just add the ensure-virtualenv role to the job | 17:07 |
fungi | mordred: shouldn't that use -m ? | 17:09 |
fungi | at least testing locally, `python3 -v venv foo` doesn't seem to create a venv | 17:10 |
fungi | "python3: can't open file 'venv': [Errno 2] No such file or directory" | 17:10 |
clarkb | fungi: yes, the commit message got it right | 17:11 |
fungi | indeed, seems so | 17:12 |
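The working invocation for contrast; -m runs the stdlib venv module, while a bare "venv" argument is treated as a script path:

```shell
python3 -m venv --system-site-packages /tmp/testenv
/tmp/testenv/bin/pip --version   # present when ensurepip seeded the venv
```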
clarkb | oh neat looks like the other issue associated with 1.12 is maybe not a bug | 17:13 |
clarkb | I wonder if this means we could have a 1.12.0 release this week | 17:13 |
fungi | that would be exciting | 17:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use python3 -m venv instead of virtualenv https://review.opendev.org/735643 | 17:13 |
mordred | fungi, clarkb: yup. I can't type :) | 17:14 |
fungi | no worries, me neither | 17:14 |
fungi | half the time i'm lucky i can even read | 17:14 |
mordred | fungi: I think it's unreasonable to expect a single person to be able to both read AND write | 17:16 |
fungi | sometimes i can append, does that count? | 17:17 |
corvus | as i read this conversation, the word 'truncate' comes to mind | 17:18 |
mordred | corvus: that sounds like truculence to me | 17:28 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make a review-test that we run ansible on https://review.opendev.org/735602 | 17:30 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add playbook for syncing state from review to review-test https://review.opendev.org/735610 | 17:30 |
corvus | mordred: is that when a big-rig driver .... nevermind | 17:31 |
mordred | corvus: yes | 17:34 |
dmsimard | mordred: would love a refresh of your +2 on https://review.opendev.org/#/c/735439/ <3 | 17:42 |
hrw | https://marcin.juszkiewicz.com.pl/2020/06/15/opendev-ci-speed-up-for-aarch64/ | 17:44 |
mordred | dmsimard: done | 17:46 |
dmsimard | \o/ thanks | 17:46 |
AJaeger | hrw: thanks, nice numbers on speed improvement! | 17:53 |
mordred | hrw: nice! | 17:53 |
hrw | 2020-05/#opendev:22 14:47 < hrw> I should probably find it 2-3 years ago ;D | 17:56 |
hrw | ;D | 17:56 |
*** hashar has joined #opendev | 17:57 | |
fungi | excellent article | 17:58 |
openstackgerrit | Merged openstack/project-config master: Create a new project for recordsansible/ara-collection https://review.opendev.org/735439 | 18:00 |
hrw | thx | 18:00 |
hrw | should have some links in it but I care less about seo than before ;D | 18:01 |
clarkb | it is always great to see how changes we've made help | 18:01 |
mordred | clarkb, fungi: *wat* - https://zuul.opendev.org/t/openstack/build/aee168e0d87c4dbf9337c4bee692104b | 18:18 |
mordred | does python3 -m venv not produce a venv with a working pip in it? | 18:18 |
corvus | \o/ zuul with zk tls started! https://zuul.opendev.org/t/openstack/build/dbff561b77214db19a05d9711a09634a/log/zuul01.openstack.org/debug.log | 18:19 |
clarkb | mordred: ya I think that was what I was trying to describe earlier | 18:19 |
clarkb | mordred: you may need to use ensure-virtualenv on xenial to work around python silliness on ubuntu | 18:19 |
mordred | clarkb: ok. I'm going to do that | 18:19 |
fungi | i'm dubious that's the cause, but doing some local testing now | 18:20 |
clarkb | the problem I remember had to do with it using old pip | 18:20 |
clarkb | I would've expected a pip in the virtualenv though | 18:20 |
clarkb | possible that site packages changes the behavior there | 18:20 |
clarkb | and if you don't have a python3 pip installed in the system you get no pip in the venv? | 18:21 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Add Zookeeper TLS support https://review.opendev.org/720302 | 18:21 |
fungi | yeah, i don't get the behavior there. on debian/ubuntu with distro packaged python3, either you have python3-venv installed which depends on a wheel bundle including pip, or you get an error about ensurepip failing | 18:21 |
fungi | i thought maybe there was a chance --system-site-packages changed that behavior, but it doesn't seem to for me | 18:22 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use ensure-virtualenv in legacy puppet jobs https://review.opendev.org/735643 | 18:22 |
fungi | there is a --without-pip option to the venv module | 18:22 |
fungi | maybe somehow it's defaulting on | 18:22 |
fungi | more testing | 18:23 |
* mordred isn't going to lose a lot of sleep on it - these jobs need to diaf anyway | 18:23 | |
fungi | at least in debian/sid it's installing pip into the venv for me even using distro-packaged python3-venv | 18:24 |
mordred | fungi: maybe it's clarkb's thing - if you don't have python3-pip installed do you wind up with no pip? | 18:25 |
fungi | i did not install python3-pip | 18:26 |
fungi | and did not have it installed | 18:26 |
mordred | yeah. I agree - I just did that locally too | 18:26 |
fungi | python3-venv pulls in python3.8-venv and python-pip-whl, the latter has wheel bundles for stuff including pip | 18:26 |
mordred | and I happily have pip in the venv | 18:26 |
openstackgerrit | Ghanshyam Mann proposed openstack/project-config master: Retire Tricircle projects: finish infra todo https://review.opendev.org/728902 | 18:27 |
mordred | fungi: *WEIRD* | 18:27 |
fungi | this failed on xenial though | 18:28 |
fungi | so maybe it's older behavior? | 18:28 |
clarkb | fungi: I was just going to ask python3.8 isn't on xenial | 18:28 |
clarkb | fungi: yes that is my hunch | 18:28 |
clarkb | ianw discovered xenial to be weird | 18:28 |
fungi | yeah, i was testing on debian/sid since it's what i have locally | 18:28 |
fungi | i thought this was how the python3-venv package had worked for a while, but perhaps not so long as xenial | 18:29 |
fungi | though it looks the same from a deps standpoint | 18:30 |
fungi | python3-venv on xenial depends on python3.5-venv which depends on python-pip-whl | 18:30 |
fungi | and it in turn only depends on ca-certificates, no python3-pip or python3.8-pip or anything of the sort | 18:31 |
* mordred just tried it in a xenial container | 18:31 | |
mordred | and it worked just fine | 18:31 |
fungi | and python-pip-whl only installs .whl files under /usr/share/python-wheels/ nothing directly importable or executable | 18:31 |
clarkb | mordred: ya I just did that too | 18:31 |
fungi | https://packages.ubuntu.com/xenial/all/python-pip-whl/filelist | 18:31 |
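A quick way to verify that on a node, as a sketch:

```shell
dpkg -L python-pip-whl         # ships only .whl files, nothing importable
ls /usr/share/python-wheels/   # the bundles the venv module seeds pip from
```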
fungi | so that build failure is *very* puzzling | 18:32 |
mordred | I'm almost interested in holding a node | 18:32 |
fungi | could the python3 -m venv call have failed but not returned an error somehow? | 18:32 |
mordred | maybe? | 18:33 |
clarkb | fungi: that seems plausible | 18:33 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Make a review-test that we run ansible on https://review.opendev.org/735602 | 18:33 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add playbook for syncing state from review to review-test https://review.opendev.org/735610 | 18:33 |
clarkb | if venv wasn't installed we'd get an error (just tested this on xenial container) | 18:34 |
clarkb | so venv needs to be there in some capacity to have it be silent like that | 18:34 |
clarkb | are we only serving the arm64 wheels on the arm64 mirror? | 18:40 |
clarkb | I guess that kinda makes sense | 18:40 |
clarkb | but with things like zuuls cross arch docker image builds we may want to put the contents for all the arches in all the mirrors | 18:40 |
mordred | clarkb: that's a good point | 18:43 |
mordred | although won't that require some logistical reworking? | 18:43 |
clarkb | mordred: I don't think so since everything is path scoped by arch already | 18:44 |
clarkb | mordred: I think it may just be a matter of having the correct symlinks on disk and apache config? | 18:44 |
mordred | nod | 18:44 |
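A hypothetical sketch of that; every path here is illustrative, but since the trees are already arch-scoped it could amount to extra symlinks plus matching apache aliases:

```shell
# expose both arch wheel trees from one mirror's docroot
for arch in x86_64 aarch64; do
    ln -s "/afs/openstack.org/mirror/wheel/debian-10-${arch}" \
          "/var/www/mirror/wheel/debian-10-${arch}"
done
```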
mnaser | has anyone looked at system-config-legacy-logstash-filters or not yet? :> | 18:57 |
mnaser | i can try fixing it myself if no one is on it | 18:58 |
clarkb | mnaser: mordred is | 18:58 |
clarkb | mnaser: https://review.opendev.org/735643 that change | 18:58 |
mnaser | ok, cool -- /me can help if needed | 18:59 |
mordred | mnaser: it _should_ be fixed by that | 19:02 |
mordred | mnaser: and one day I'll get around to killing that job | 19:02 |
mnaser | \o/ | 19:03 |
fungi | mordred: i also noticed over the weekend pbr's unit and devstack/tempest jobs are busted too, though i haven't had time to dig into that yet | 19:04 |
clarkb | fungi: the python2 failure is using the stestr constraint for version 3.0.1 which is python3 only | 19:07 |
fungi | yeah, i'm unsurprised there | 19:07 |
clarkb | and python3 failed on some virtualenv thing which may be related to new images? though the timestamp is such that I don't think so | 19:07 |
clarkb | AttributeError: module 'virtualenv' has no attribute 'create_environment' | 19:08 |
clarkb | possible that is related to virtualenv 3 updates? | 19:08 |
fungi | oh, maybe | 19:13 |
mordred | clarkb: feel like a +A on https://review.opendev.org/#/c/735602 ? | 19:15 |
clarkb | I'll take alook after lunch | 19:15 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Don't install puppet modules when we don't need them https://review.opendev.org/735642 | 19:35 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install pip3 on codesearch https://review.opendev.org/735668 | 19:35 |
mordred | clarkb, fungi: more fallout from new nodes ^^ | 19:35 |
openstackgerrit | Merged opendev/system-config master: Use ensure-virtualenv in legacy puppet jobs https://review.opendev.org/735643 | 19:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add bit more info on disabling ansible runs https://review.opendev.org/735246 | 19:42 |
mordred | fungi: ^^ I rebased that on the logstash filters fix and added reference to disable-ansible script | 19:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Switch prep-apply.sh to use python3 https://review.opendev.org/729543 | 19:43 |
clarkb | centos 8.2 has been released. Another thing to keep an eye on if/when failures happen | 19:46 |
clarkb | mordred: is review-test a full size node? | 20:04 |
clarkb | also looking at it we don't use the review group for group vars. We use hostvars and you've trimmed the hostvars down for review-test. Is that sufficient to ensure that things like gerritbot and launchpad syncing won't try to run in both places at once? | 20:06 |
clarkb | (we want to prevent that and want to make sure we've considered it and I think the split host vars does that?) | 20:06 |
corvus | clarkb, mordred, fungi: https://review.opendev.org/720302 zk tls is ready -- do we want to think about doing that on friday? | 20:07 |
clarkb | corvus: I'll be around and able to help | 20:08 |
corvus | i'll add an item to the mtg agenda | 20:08 |
fungi | yeah, i can do friday, no problem | 20:09 |
openstackgerrit | Merged opendev/system-config master: Add tool to export Rackspace DNS domains to bind format https://review.opendev.org/728739 | 20:10 |
mordred | clarkb: yes | 20:13 |
mordred | clarkb: as is the rax db I made | 20:13 |
clarkb | mordred: cool so the 48g heap size won't cause problems then. What about the other thing? | 20:14 |
mordred | clarkb: well - before I did private hostvar surgery on bridge, we actually used group_vars for review for settings | 20:14 |
mordred | clarkb: but - I believe with the secrets being in host-specific files we will not be putting any secrets on review-test that would allow those services to operate | 20:15 |
clarkb | cool | 20:15 |
clarkb | that was my read of it too, just double checking | 20:15 |
clarkb | what about email | 20:15 |
mordred | (I'm pretty sure this first ansible run won't even finish because it'll be missing some required secrets) | 20:15 |
clarkb | are we concerend about gerrit sending people email? | 20:15 |
mordred | hrm. that's a good question | 20:15 |
mordred | it should really only send mail on patchset upload right? | 20:15 |
clarkb | ya I think upload and merge | 20:16 |
clarkb | as long as we avoid updating random changes we're probably fine | 20:16 |
mordred | like - as long as we're not pushing changes to it or merging changes there it _SHOULD_ be fine? | 20:16 |
mordred | yeah | 20:16 |
clarkb | I've +2'd the change though zuul is unhappy with it | 20:16 |
mordred | \o/ | 20:16 |
clarkb | possibly due to the host vars | 20:16 |
mordred | let's see what's broken this time | 20:16 |
*** hashar has quit IRC | 20:17 | |
mordred | Data could not be sent to remote host "198.72.124.215". Make sure this host can be reached over ssh: ssh: connect to host 198.72.124.215 port 22: No route to host | 20:17 |
mordred | clarkb: it seems to have been unhappy trying to talk to fake review-dev | 20:17 |
clarkb | ah so maybe just a recheck? | 20:18 |
mordred | clarkb: yeah - I'll try that | 20:18 |
mordred | clarkb: oh - also - if you have a sec ... | 20:18 |
mordred | clarkb: check the most two recent commits in private hostvars and make sure I didn't derp? | 20:18 |
clarkb | mordred: it looks right to me. You renamed group_vars/review.yaml to host_vars/review01.openstack.org.yaml and added host_vars/review-test.opendev.org.yaml with minimal content | 20:20 |
mordred | clarkb: \o/ | 20:27 |
clarkb | I did git log -2 -p fwiw | 20:28 |
clarkb | corvus: I had the zuul tls changes under CD topic. Would you like me to drop it there and discuss it as a separate item or collapse under that heading? | 20:40 |
clarkb | I'm getting ready to send the agenda out and want to make sure its got the proper attention | 20:40 |
corvus | clarkb: your choice | 20:40 |
corvus | sorry i missed it was already there | 20:41 |
clarkb | no worries | 20:41 |
clarkb | you added more info :) | 20:41 |
openstackgerrit | Merged opendev/system-config master: Add bit more info on disabling ansible runs https://review.opendev.org/735246 | 20:41 |
openstackgerrit | Merged opendev/system-config master: Switch prep-apply.sh to use python3 https://review.opendev.org/729543 | 20:41 |
openstackgerrit | Merged opendev/system-config master: Install pip3 on codesearch https://review.opendev.org/735668 | 20:44 |
*** rchurch has quit IRC | 20:44 | |
*** rchurch has joined #opendev | 20:45 | |
mordred | corvus, fungi: if either of you have a sec: https://review.opendev.org/#/c/735642/ is easy | 20:46 |
*** mlavalle has quit IRC | 20:47 | |
mordred | infra-root: the ensure-virtualenv patch landed and jobs work again - I have rechecked the system-config patches that had failed due to that | 20:48 |
clarkb | mordred: thanks | 20:48 |
clarkb | (I had a couple get caught by it) | 20:48 |
mordred | yeah | 20:49 |
mordred | it was actually quite the carnage - there are 7 changes in recheck right now | 20:49 |
clarkb | I'm around but will need to transition to dadops in about half an hour to run kids' remote class thing | 20:49 |
clarkb | (as general heads up) | 20:49 |
mordred | I'm around and unlikely to go anywhere for a bit as I am beset on all sides by a pile of sleeping kittens | 20:50 |
clarkb | mordred: just noticed you update https://review.opendev.org/735246 thanks for catching that | 20:57 |
fungi | mordred: i've approved it, in case you survive burial by kitten | 20:57 |
mordred | clarkb: sure nuff | 20:58 |
mordred | fungi: \o/ | 20:58 |
*** mlavalle has joined #opendev | 20:59 | |
clarkb | infra-root I've discovered https://review.opendev.org/#/c/686237/ in my spring cleaning and wonder if I should either abandon that because we don't want the behavior or push a new patchset to do that with ansible now that we use ansible to deploy zuul? | 21:07 |
mordred | clarkb: well - maybe we should just land the "use docker for executors" patch | 21:08 |
mordred | clarkb: https://review.opendev.org/#/c/733967/ | 21:08 |
clarkb | mordred: ++ I'll abandon my change | 21:09 |
clarkb | mordred: also I think that old change not being merge conflicted implies we can clean up some puppet things? | 21:10 |
clarkb | I'll look into that while dealing with kids' school stuff | 21:10 |
mordred | clarkb: ++ | 21:10 |
openstackgerrit | Merged opendev/system-config master: Forward user-committee ML to openstack-discuss https://review.opendev.org/733673 | 21:14 |
openstackgerrit | Merged opendev/system-config master: Change launch scripts to python3 shebangs https://review.opendev.org/734345 | 21:14 |
corvus | i have a zuul enqueue-ref command that is hung; something fishy may be going on | 21:16 |
corvus | um. the gearman certificate has expired. | 21:22 |
corvus | ha, it's the ca cert that expired | 21:24 |
corvus | the client/server certs have 10-year lives | 21:24 |
corvus | the ca only 3 | 21:24 |
corvus | istr we lost our ca infrastructure somewhere along the line | 21:24 |
corvus | but i can use zk-ca.sh to make new certs easily | 21:25 |
corvus | however, we'll need a full system restart to use them | 21:25 |
mordred | corvus: yes - I believe we decided we didn't need to replace the ca infrastructure with LE and zk-ca | 21:25 |
corvus | mordred: well, we decided that after it was removed, but yes :) | 21:26 |
mordred | corvus: yeah | 21:26 |
mordred | it wasn't like an _active_ choice ;) | 21:26 |
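A quick way to confirm which certificate in the chain actually expired, assuming the certs are PEM files on disk (the paths here are hypothetical; the real locations depend on how the gearman certs are deployed):

    # print subject and notAfter for each cert
    for f in /etc/zuul/ssl/ca.pem /etc/zuul/ssl/server.pem /etc/zuul/ssl/client.pem; do
        openssl x509 -noout -subject -enddate -in "$f"
    done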
corvus | mordred, clarkb, fungi: perhaps we should go ahead and merge https://review.opendev.org/720302 and do the zk tls and gearman rekey all at once? | 21:27 |
clarkb | the existing connections are fine? | 21:27 |
corvus | clarkb: yeah, as long as they aren't interrupted | 21:27 |
corvus | we probably won't be bringing any of those offline ze's back online till then though | 21:27 |
mordred | corvus: wcpgw? | 21:28 |
clarkb | corvus: I'm not opposed to bundling those changes since they are related (they share a CA right?) | 21:28 |
clarkb | but I'm not really able to help at this very moment | 21:29 |
corvus | clarkb: the thing they *most* share in common is they need a full restart | 21:29 |
clarkb | corvus: got it | 21:29 |
mordred | corvus: so - I'm game | 21:29 |
openstackgerrit | Merged opendev/system-config master: Don't install puppet modules when we don't need them https://review.opendev.org/735642 | 21:29 |
openstackgerrit | Merged opendev/system-config master: uwsgi-base: drop packages.txt https://review.opendev.org/735473 | 21:29 |
mordred | corvus: while we're full system restarting - should we land the z-e docker change? | 21:29 |
mordred | (that might be a bit much though - and we really can do that executor at a time to make sure) | 21:30 |
corvus | mordred: let's not -- we can restart executors one-at-a-time and reduce risk there | 21:30 |
mordred | corvus: oh headdesk. there's another puppet failure in the stack. looking | 21:31 |
corvus | oy i just saw that | 21:31 |
mordred | corvus: I think it's unrelated | 21:31 |
mordred | https://zuul.opendev.org/t/openstack/build/42cb6c4d6188486eae6dd2b7a05a6b5c/log/applytest/puppetapplytest07.final.out.FAILED | 21:31 |
mordred | is the failure | 21:31 |
corvus | unfortunately, that means we'll have a full re-run cycle for that | 21:31 |
mordred | yeah | 21:31 |
mordred | we could cheat | 21:31 |
corvus | mordred: and? | 21:31 |
corvus | we could force-merge all 3 changes | 21:32 |
mordred | yeah | 21:32 |
corvus | we can't enqueue-to-gate though because the zuul cli is out of commission | 21:32 |
mordred | yea. given that I don't think we want to spend _hours_ in the current situation | 21:32 |
mordred | and we do have green runs of the jobs that are actually relevant | 21:33 |
corvus | but these do all have good check results, so seems like good risk/reward. | 21:33 |
mordred | yah | 21:33 |
corvus | clarkb: are you reviewing https://review.opendev.org/720302 ? | 21:33 |
mordred | clarkb: ? | 21:33 |
clarkb | I can review but not help with the change landing itself /me looks | 21:33 |
fungi | i've resurfaced from making/consuming evening sustenance... catching up but can definitely help with a zuul restart for new certs | 21:34 |
mordred | corvus: the zk patch isn't going to fix the gearman cert though | 21:34 |
clarkb | I'm reviewing the change | 21:34 |
mordred | corvus: but that's just a zk-ca and updating private hostvars, right? | 21:34 |
corvus | mordred: correct | 21:34 |
corvus | i can do that now so that it gets incorporated into the next run | 21:35 |
fungi | also the executor daemon for ze04 is still stopped, we haven't rebooted that server yet. i didn't know if ianw might want to take a look, but we should either avoid restarting the executor on it or reboot the server | 21:35 |
mordred | cool. so - yeah - I think the sequence would be land all the patches, shut down zuul, update hostvars, run service-zuul and service-zk and then re-start yes? | 21:35 |
mordred | I suppose we can update the hostvars before doing the shutdown | 21:36 |
mordred | in fact, you could probably go ahead and update the hostvars | 21:36 |
corvus | i'd like to update hostvars; merge patches, wait for playbook completion, then restart | 21:36 |
mordred | yes | 21:36 |
mordred | I was just about to write that same thing | 21:36 |
mordred | I think it's correct - I blame the clowder of kittens for making me take a while to reach that conclusion | 21:37 |
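The agreed order, as a rough shell sketch run from bridge (the playbook names below follow the service-zuul/service-zk shorthand used in the discussion and are assumptions, not verified filenames):

    # 1. update private hostvars with the freshly generated certs (done by hand)
    # 2. merge the pending patches and wait for the deploy playbooks to finish
    # 3. converge both services, then do the full zuul restart:
    ansible-playbook playbooks/service-zookeeper.yaml   # name is an assumption
    ansible-playbook playbooks/service-zuul.yaml        # name is an assumption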
fungi | 720302 is safe to merge, it just won't take effect until restarts, yeah? | 21:38 |
corvus | fungi: probably? :) | 21:38 |
fungi | but yeah, i agree with hostvars first | 21:38 |
corvus | if it breaks, that's our signal to restart anyway :) | 21:39 |
fungi | no argument there ;) | 21:39 |
corvus | installing new certs now | 21:39 |
fungi | $CAROOT looks like a hipster vegetable | 21:39 |
ianw | fungi: ze04 having the same afs issues? | 21:39 |
ianw | as linaro last night i mean | 21:39 |
ianw | s/night/your local time/ :) | 21:40 |
fungi | ianw: yes, i left it stopped since it was something we could safely leave broken to evaluate | 21:40 |
fungi | the rax-ord mirror also needed a reboot | 21:40 |
openstackgerrit | Merged opendev/system-config master: Cleanup old puppet management of release-volumes.py https://review.opendev.org/734711 | 21:40 |
fungi | for similar reasons | 21:40 |
mordred | corvus: oh - fwiw - executor on ze01 is stopped because I was doing the afs+docker testing - but I think it's fine to restart when we do the restart | 21:40 |
mordred | I do not think we need it to remain stopped | 21:40 |
fungi | ianw: fs flush got the cell root browseable again, but trying to look at some subpaths of the tree timed out read operations | 21:41 |
ianw | fungi: yeah, i did poke around on the linaro mirror and didn't see anything other than a lot of disconnection/connection logs | 21:41 |
fungi | ianw: the difference with ze04 is it was having trouble getting to the rw tree rather than the ro tree | 21:41 |
fungi | also dmesg on rax-dfw mirror showed the loss of connectivity with afs01.dfw but never logged it coming back into service | 21:42 |
fungi | er, rax-ord mirror i mean | 21:42 |
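For reference, the client-side OpenAFS commands in play here (fs flush and fs checkservers are standard OpenAFS client utilities; the cell path is the one under discussion):

    fs flush /afs/openstack.org   # drop cached data for the cell root
    fs checkservers               # list fileservers the client believes are down
    dmesg | grep -i afs           # kernel log of lost/regained server connectivity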
clarkb | corvus: couple of questions on https://review.opendev.org/#/c/720302/17 but lgtm otherwise | 21:42 |
fungi | ianw: anyway, i spotted ze04 because a bunch of publication jobs were failing rsync calls | 21:42 |
ianw | fungi: yeah, not sure i have anything else i know to look at | 21:43 |
fungi | ianw: in that case i guess we can just make sure to reboot ze04 when we're restarting the rest of the executors | 21:44 |
corvus | clarkb: replied | 21:44 |
fungi | oh, and i did try restarting afsd on the rax-ord mirror, but it got stuck stopping | 21:44 |
fungi | and was unkillable | 21:44 |
clarkb | corvus: rgr +2 | 21:45 |
ianw | fungi: so i'm just getting reset, not sure if you saw the scrollback about the utimensat() calls, openafs not updating the ns timestamps for files, and "-t" | 21:45 |
corvus | #status log re-keyed gearman tls certs (they expired) | 21:45 |
openstackstatus | corvus: finished logging | 21:45 |
ianw | fungi: yeah, i've never had any luck with anything but rebooting | 21:45 |
fungi | ianw: yep, i followed that. mismatch in timestamp expectations between openafs and rsync sounds plausible. did you try dropping -t? | 21:46 |
clarkb | ianw: I tried to understand what that meant for us, do we update our rsync flags? | 21:46 |
ianw | fungi: i plan to do a manual run under strace without the "-t" to rsync and see what happens | 21:46 |
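A sketch of that manual run — tracing only utimensat makes the difference between -rltDiz and -rlDiz obvious (the module URL and target path are placeholders, not the real mirror config):

    # with -t, rsync calls utimensat() on every file to copy mtimes;
    # without it those calls should vanish from the trace
    strace -f -e trace=utimensat \
        rsync -rlDiz rsync://mirrors.example.org/fedora/releases/31/ \
        /afs/.openstack.org/mirror/fedora/releases/31/
    # caveat raised later in this log: without -t, mtime can't be used to
    # detect non-size-changing updates; -c (checksums) covers that, at the
    # cost of reading every file on both ends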
corvus | mordred, fungi: i will force-merge now | 21:46 |
fungi | ianw: cool, i notice the cronjob has been running all day | 21:46 |
ianw | the fedora cron job should be commented out ... i hope at least | 21:47 |
fungi | ianw: also should we go ahead and turn mirror-update.openstack.org back on too before the reprepro mirrors fall too far behind? | 21:47 |
openstackgerrit | Merged opendev/system-config master: Stop using backend hostname in zuul testinfra tests https://review.opendev.org/733409 | 21:47 |
openstackgerrit | Merged opendev/system-config master: Fake zuul_connections for gate https://review.opendev.org/730929 | 21:47 |
ianw | fungi: oh yeah, i think so, i saw your initial work to migrate that which is great too | 21:47 |
fungi | it's way incomplete, i need to find time to make progress on it | 21:48 |
openstackgerrit | Merged opendev/system-config master: Add Zookeeper TLS support https://review.opendev.org/720302 | 21:48 |
mordred | corvus: woot! | 21:48 |
fungi | ianw: oh, and somebody said a new centos 8.x release dropped today, so... probably a lot of rsync going on for that too | 21:48 |
clarkb | fungi: ya 8.2 (was me) | 21:48 |
fungi | thanks clarkb! today has been a complete blur | 21:48 |
fungi | i may declare wednesday as sunday and make myself scarce ;) | 21:49 |
mordred | fungi: wednesday fednesday right? | 21:49 |
* fungi boots mirror-update.ostack.o back up | 21:49 | |
fungi | mordred: something like that, yep | 21:49 |
corvus | mordred, fungi: looks like there's a deploy backlog; i'm going to afk for 30m | 21:50 |
fungi | corvus: cool, i'll be around for a while still when you get back | 21:51 |
mordred | corvus: kk | 21:53 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Cleanup from ZK TLS transition https://review.opendev.org/735740 | 21:55 |
*** sgw has quit IRC | 21:59 | |
*** sgw has joined #opendev | 22:01 | |
fungi | just the firewall rules for now, but we can dogpile other cleanup into that if anyone knows of more | 22:02 |
fungi | we still need to update the ports in the nodepool confs in project-config, right? | 22:05 |
fungi | or is there a separate change already up to do that? | 22:05 |
clarkb | fungi: I think that is in the change that merged; it loads the file, edits it, then writes it back out again | 22:05 |
fungi | #status log started mirror-update01.openstack.org | 22:05 |
openstackstatus | fungi: finished logging | 22:05 |
fungi | clarkb: oh! so directly modifies the configs at runtime, okay | 22:06 |
clarkb | fungi: yes I think so | 22:06 |
fungi | clarkb: i'm not finding anywhere in 720302 which modifies the nodepool configs, i may be overlooking something | 22:08 |
clarkb | fungi: https://review.opendev.org/#/c/720302/17/playbooks/roles/nodepool-base/tasks/main.yaml line 69 | 22:09 |
fungi | oh, in playbooks/roles/nodepool-base/tasks/main.yaml there's a task to "Overwrite zookeeper-servers" and another from a previous change to "Write nodepool config" | 22:10 |
clarkb | and https://review.opendev.org/#/c/720302/17/playbooks/roles/nodepool-base/library/make_nodepool_zk_hosts.py | 22:10 |
fungi | so i guess we don't directly write out the nodepool configs from the project-config repo | 22:10 |
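One way to confirm what ansible actually injected, since the deployed file rather than the project-config copy is now the source of truth (the launcher hostname and config path are assumptions):

    ssh nl01.openstack.org "grep -A4 'zookeeper-servers' /etc/nodepool/nodepool.yaml"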
mnaser | hi all. Could we add me (or tc members) to https://review.opendev.org/#/admin/groups/441,members to help with the retirement of tricircle ? | 22:10 |
mordred | mnaser: done | 22:11 |
mnaser | mordred: thanks! | 22:12 |
fungi | clarkb: aha, i see we've actually had that implemented since april via https://review.opendev.org/720709 | 22:12 |
fungi | i wonder if we should either update the configs in project-config for the new ports to avoid confusion, or better yet remove the zookeeper servers from it entirely and substitute a comment saying that we inject them with ansible now | 22:13 |
fungi | having zk connection details in those configs when we're not actually relying on them is just begging someone to make updates in the wrong place down the road | 22:14 |
clarkb | fungi: ya that seems reasonable. Or maybe go back to consuming it from project-config once the dust settles on this | 22:14 |
fungi | clarkb: well, the change went in originally to support production-like test environments for our integration testing jobs | 22:15 |
fungi | so i expect we'd want to keep the capability | 22:15 |
clarkb | fungi: with /etc/hosts being written now we may be able to do that without editing the configs? | 22:15 |
clarkb | though we probably only do a single zk server I guess | 22:16 |
clarkb | (rather than 3) | 22:16 |
mnaser | mordred: i'm sorry, do you mind adding me to https://review.opendev.org/#/admin/groups/1706,members too? | 22:25 |
mnaser | seems like client core != project core | 22:25 |
mnaser | (or any other infra admin around) | 22:26 |
ianw | mnaser: done | 22:26 |
ianw | fungi: ok, i've re-run manually our fedora rsync without the "-t" | 22:26 |
mnaser | thank you ianw | 22:27 |
ianw | it's got all the lstats, but none of the utimensat() calls | 22:28 |
fungi | ianw: that's promising... any noticeable difference in rsync runtime (like is it significantly slower without -t?) | 22:29 |
ianw | 2020-06-15 22:22:06 | Running rsync for releases/31.. | 22:31 |
ianw | 2020-06-15 22:22:30 | Running rsync for updates/30... | 22:31 |
mnaser | last request in helping retire, trivial change: https://review.opendev.org/#/c/728902/4 | 22:31 |
ianw | like 24 seconds | 22:31 |
ianw | 2020-06-08 06:43:34 | Running rsync for updates/30... | 22:32 |
clarkb | any concern landing mnaser's change ^? specifically the zuul config update at https://review.opendev.org/#/c/728902/4/zuul/main.yaml while we fix zuul things? | 22:32 |
corvus | back | 22:32 |
ianw | 2020-06-08 06:44:11 | Running rsync for updates/31... | 22:33 |
mnaser | yeah we may hold off on that then, it's not _that_ urgent but worth deferring for later if things are going on | 22:33 |
ianw | fungi: so yeah, in the noise | 22:34 |
fungi | ianw: in that case, sounds like we should just drop -t from our rsync calls | 22:35 |
*** ysandeep|away is now known as ysandeep | 22:43 | |
corvus | mordred, fungi: it might be worth a bit of analysis to find out why https://review.opendev.org/733673 is running all the jobs | 22:49 |
corvus | it's been running for 1.5 hours, and is maybe 1/3 through the list | 22:49 |
corvus | then there are 4 changes after it, then finally the 3 changes we need for zuul :/ | 22:50 |
mordred | corvus: because it touches inventory | 22:51 |
corvus | mordred: do we need to adjust that matcher after the recent reorg? | 22:51 |
corvus | or is that intentional? | 22:51 |
mordred | corvus: yes - but I think we still have work to do there | 22:51 |
corvus | because any job can reference the inventory hostvars of any group.... | 22:52 |
mordred | yeah. I think we're defaulting to safe currently | 22:52 |
corvus | if we're adding inventory/ to everything because of that ^ then i think maybe we'd be better off just running one job that does everything, because everything is going to touch inventory | 22:52 |
mordred | corvus: I actually thought we had some smaller matchers already | 22:53 |
corvus | but i think we can maybe narrow that down | 22:53 |
corvus | like, service-zuul should be able to say "inventory/zuul" + "inventory/zookeeper" or whatever | 22:53 |
mordred | yes - that | 22:53 |
ianw | fungi: i think dropping -t means that it doesn't detect non-size changing updates? | 22:54 |
mordred | corvus: yeah - I think we have better matchers on the CI jobs ... but haven't done the same for the prod jobs | 22:54 |
mordred | corvus: s/I think// | 22:55 |
mordred | corvus: it's totally all inventory/ in the prod jobs | 22:55 |
corvus | mordred: ok, well the good news is that none of the other changes before the zuul changes touch inventory (though one will probably run puppet-else); our final change does touch inventory though | 22:55 |
mordred | corvus: so - I think we go through and match up the file matchers we use for CI jobs with the prod versions | 22:55 |
corvus | mordred: ++ | 22:55 |
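A sketch of the narrowing being proposed, using Zuul's per-job files matchers (the job name and paths are illustrative, not the real opendev matchers):

    - job:
        name: infra-prod-service-zuul
        files:
          # only run when zuul- or zookeeper-related config changes,
          # instead of matching all of inventory/
          - playbooks/service-zuul.yaml
          - inventory/service/group_vars/zuul.*
          - inventory/service/host_vars/ze\d+\..*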
corvus | this is just a swag, we might be looking at another 5 hours before the zuul change is deployed | 22:56 |
corvus | i will not be in a position to help then | 22:56 |
corvus | mordred: perhaps we should disable something and come back tomorrow? | 22:56 |
mordred | corvus: well ... we could touch disable-ansible - that'll block all future runs | 22:57 |
mordred | so as soon as the current job is done there will be no more ansible running | 22:57 |
mordred | and we could just do a git pull and then run the relevant playbooks | 22:57 |
corvus | and then rely on -hourly to catch up whatever else was in the queue? | 22:58 |
mordred | yeah | 22:58 |
corvus | sounds like a plan | 22:58 |
mordred | the currently queued jobs will time out after an hour iirc | 22:58 |
mordred | well - each one will block for an hour | 22:58 |
mordred | oh - but then we'll restart zuul - so they will go away | 22:58 |
clarkb | you're going to restart zuul | 22:58 |
mordred | yeah | 22:58 |
clarkb | ya | 22:58 |
mordred | so yeah - I think that'll totally work | 22:59 |
mordred | want me to run disable-ansible now? | 22:59 |
corvus | where's our docs for disabling ansible? | 22:59 |
corvus | https://docs.openstack.org/infra/system-config/sysadmin.html#disable-enable-puppet | 22:59 |
corvus | that's all i've found :/ | 22:59 |
mordred | clarkb just updated them | 22:59 |
mordred | but I think the patch is one of the ones landing | 22:59 |
clarkb | they are in bridge's page | 23:00 |
mordred | corvus: https://review.opendev.org/#/c/735246/ | 23:00 |
clarkb | I added them to our sysadmins page as that is where we have preexisting stuff | 23:00 |
mordred | yeah - so you would find them there eventually | 23:00 |
corvus | where's bridge's page? | 23:00 |
mordred | corvus: oh - you're looking at openstack docs | 23:01 |
mordred | corvus: https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#disable-enable-ansible | 23:01 |
corvus | we moved that without a redirect or delete? | 23:01 |
mordred | it certainly seems that way, yes. we should fix that | 23:02 |
mordred | anyway - clarkb's change still hasn't published there - so the proper instructions are still missing | 23:02 |
mordred | but they reduce to "run the disable-ansible script" | 23:02 |
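What "run the disable-ansible script" reduces to on bridge, per the discussion above (the script's install path is an assumption):

    # drops the flag file that blocks future ansible runs; the currently
    # executing playbook still finishes
    sudo /usr/local/bin/disable-ansible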
corvus | i wonder how we could add a redirect? | 23:02 |
fungi | i think we can add one to the htaccess file in openstack-manuals | 23:03 |
fungi | looking | 23:03 |
corvus | #status log disabled ansible on bridge due to 5+ hour backlog with potentially breaking change at end | 23:03 |
openstackstatus | corvus: finished logging | 23:03 |
*** mlavalle has quit IRC | 23:04 | |
mordred | corvus: cool | 23:04 |
fungi | corvus: looks like we did it for infra-manual thusly: https://opendev.org/openstack/openstack-manuals/src/branch/master/www/.htaccess#L263-L266 | 23:04 |
*** tosky has quit IRC | 23:04 | |
fungi | there's also a corresponding ci test for that redirect | 23:04 |
fungi | i'll propose a similar one for system-config now | 23:04 |
corvus | fungi: thanks! | 23:04 |
mordred | fungi: ++ | 23:05 |
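A redirect modeled on the infra-manual entry linked above (RedirectMatch is a standard mod_alias directive; the exact source-path regex here is an assumption — the real change is 735747):

    # www/.htaccess in openstack-manuals
    RedirectMatch 301 ^/infra/system-config/?(.*)$ https://docs.opendev.org/opendev/system-config/latest/$1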
fungi | as soon as i finish cloning that massive repo | 23:05 |
fungi | so... slow... | 23:06 |
fungi | and cloned | 23:13 |
fungi | wow | 23:13 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Make disable-ansible fancier https://review.opendev.org/735745 | 23:15 |
corvus | fungi, mordred: ^ that's the result of a mental simulation i just performed about possible outcomes from leaving DISABLE-ANSIBLE in place overnight. | 23:15 |
mordred | corvus: yes. | 23:19 |
fungi | 23:27 <openstackgerrit> Jeremy Stanley proposed openstack/openstack-manuals master: Redirect infra/system-config to docs.opendev.org https://review.opendev.org/735747 | 23:28 |
fungi | there was some sitemap cleanup to do at the same time | 23:28 |
corvus | fungi: thanks! | 23:35 |
openstackgerrit | Merged opendev/system-config master: Be explicit about using python3 in docker images https://review.opendev.org/734647 | 23:37 |
*** DSpider has quit IRC | 23:38 | |
clarkb | are we restarting services or leaving ansible disabled then picking it up tomorrow? | 23:52 |
openstackgerrit | Merged opendev/system-config master: Make disable-ansible fancier https://review.opendev.org/735745 | 23:54 |