*** abhishekk is now known as akekane|home | 05:21 | |
*** akekane|home is now known as abhishekk | 05:22 | |
*** ykarel_ is now known as ykarel | 05:43 | |
*** jpena|off is now known as jpena | 07:35 | |
*** rpittau|afk is now known as rpittau | 07:52 | |
*** akekane_ is now known as abhishekk | 08:26 | |
*** ykarel is now known as ykarel|lunch | 08:32 | |
*** ykarel|lunch is now known as ykarel | 10:06 | |
*** jcapitao is now known as jcapitao_lunch | 10:41 | |
*** rlandy is now known as rlandy|ruck | 11:08 | |
yoctozepto | I am wondering why Masakari renos do not show up https://docs.openstack.org/releasenotes/masakari-dashboard/unreleased.html the promote job has run successfully but its effects cannot be seen | 11:35 |
*** jpena is now known as jpena|lunch | 11:39 | |
fungi | yoctozepto: do you see them being built in the docs job? | 11:40 |
fungi | er, nevermind. openstack projects seem to use a separate releasenotes job | 11:41 |
*** hberaud_ is now known as hberaud | 11:44 | |
fungi | ahh, the releasenotes are built as part of the docs job but then the promote-openstack-releasenotes job pulls the release notes from the created docs tarball artifact and publishes those separately | 11:47 |
fungi | for example, https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2e4/798842/3/gate/build-openstack-releasenotes/2e4842e/docs-html.tar.gz is the one https://zuul.opendev.org/t/openstack/build/7d5734d34dff4f669f0b5584aff2e15e tried to publish them from | 11:48 |
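A quick way to confirm the note actually made it into that build artifact before publication is to list the tarball's contents; a minimal sketch using the URL quoted above:

```sh
# Sketch: confirm the release notes pages are present in the build artifact
# that the promote job publishes from.
artifact_url='https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2e4/798842/3/gate/build-openstack-releasenotes/2e4842e/docs-html.tar.gz'
curl -sL "$artifact_url" | tar tzf - | grep unreleased.html
```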
fungi | and i see a release note in the unreleased.html that tarball includes, which is not appearing on the docs site | 11:51 |
fungi | i can see the new note and a build timestamp from saturday in /afs/.openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html (the read-write volume path) | 11:56 |
fungi | but the docs site publishes from a read-only replica at /afs/openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html which lacks that note and has a build timestamp from months ago (2021-05-24) | 11:57 |
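The two AFS paths compared here differ only in the leading dot; a stale read-only replica shows up directly when diffing the served copy against the read-write one:

```sh
# The dotted path is the read-write volume the promote job writes to; the
# undotted path is the read-only replica the docs site serves from. A
# non-empty diff means the replica has not been released since the write.
diff -q /afs/.openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html \
        /afs/openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html
```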
fungi | so looks like that volume may have stale content, vos release could be broken, i'll look into that possibility | 11:58 |
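A minimal sketch of checking whether replication is stuck, assuming the volume is simply named "docs":

```sh
# If the replicas are stale, the update/release timestamps reported here will
# disagree; a locked VLDB entry is a hint that an earlier vos release was
# interrupted and never completed.
vos examine docs
vos listvldb -name docs
```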
fungi | ianw: if you're still around, where did the vos release cronjob for static site volumes move to? | 12:04 |
yoctozepto | thanks fungi for handling the issue | 12:05 |
fungi | oh, right, we moved it from cron to ansible, so i guess bridge initiates and logs it now | 12:06 |
fungi | looks like vos release has been failing for the docs volume for at least a week (that's as far back as our log retention goes) | 12:10 |
fungi | i suppose it's possible we rebooted afs01.dfw in the middle of a vos release of the docs volume when we were doing kernel updates a few weeks back | 12:14 |
fungi | there's a tarballs volume update in progress since a few minutes ago, but once it completes i'll take the lock for that cronjob in a root screen session on mirror-update and start trying to manually release the docs volume | 12:19 |
fungi | yeah, vos release says the vldb entry is already locked. i'll try to clear it | 12:27 |
fungi | unlocked it, vos release is running with -localauth under a root screen session on afs01.dfw now | 12:28 |
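Roughly the recovery sequence described here, assuming the volume name is "docs"; -localauth uses the server key, which is why it is run on afs01.dfw itself:

```sh
# Clear the stale lock left behind by the interrupted release, then push the
# read-write contents out to the read-only replicas.
vos unlock docs -localauth
vos release docs -localauth -verbose
```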
*** jpena|lunch is now known as jpena | 12:32 | |
fungi | it hasn't said anything to the tty yet though, leading me to wonder if there's an existing transaction it's waiting to see complete | 12:48 |
fungi | if this takes much longer i'll propose a patch to switch to serving sites from the read-write volume until we can get the read-only replicas back in sync | 12:50 |
yoctozepto | thanks | 13:04 |
fungi | yoctozepto: it finished. check that https://docs.openstack.org/releasenotes/masakari-dashboard/unreleased.html now shows what you expect when refreshed | 13:13 |
fungi | #status log Deleted stale vldb entry for AFS docs volume and ran vos release manually to catch up the read-only replicas | 13:13 |
opendevstatus | fungi: finished logging | 13:13 |
yoctozepto | fungi: thanks! I assume I should be expecting it now to always stay in sync? or does it not run on promote, only on some schedule? | 14:23 |
fungi | yoctozepto: the promote job writes to the read-write volume. every 5 minutes a cronjob runs to sync that read-write volume to read-only replicas. this can take upwards of a few minutes (and occasionally much longer if queued up behind a particularly large tarballs site update or something), but generally within 10 minutes after completion of the promote job you should see it reflected | 14:25 |
fungi | on the website | 14:25 |
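A sketch of the periodic sync described above; the actual OpenDev job is driven from the bridge host, but the effect is a regular vos release of the docs volume, e.g.:

```sh
# Illustrative cron entry only (lock path and schedule are examples): every
# five minutes, skip if a release is already in progress, otherwise push the
# read-write docs volume to its read-only replicas.
*/5 * * * * root flock -n /var/run/release-docs.lock vos release docs -localauth
```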
yoctozepto | fungi: thanks, that's really helpful; I will report if it ever takes longer than one hour for these to appear | 14:26 |
fungi | thanks | 14:28 |
yoctozepto | I have a related question too; does some process try to reconcile failures such as: https://zuul.opendev.org/t/openstack/build/d0cbe655424e4b1995fdcd39eae43221 or is it only best-effort under the assumption that some later promote happens and succeeds? | 14:30 |
fungi | yoctozepto: it's assumed that a subsequent release notes publication job will succeed later and include the earlier content. also while those jobs are triggered on each branch, they all build and upload notes for all branches so as long as the job succeeds later on some branch the content should be incorporated | 14:34 |
fungi | as for that exact failure, i wonder if there's a problem with the afs driver on one of the executors... looking into it now | 14:36 |
yoctozepto | fungi: thanks and thanks | 14:36 |
fungi | looks like it ran from ze12 according to the inventory | 14:36 |
fungi | no recent afs errors in its dmesg though | 14:37 |
fungi | oh, i should have looked at https://zuul.opendev.org/t/openstack/build/d0cbe655424e4b1995fdcd39eae43221/console#1/0/28/localhost closely | 14:38 |
fungi | rsync: rename "/afs/.openstack.org/docs/releasenotes/kolla-ansible/.victoria.html.mxBbLh" -> "victoria.html": No such file or directory (2) | 14:38 |
fungi | we sometimes see that when multiple release notes jobs are running at the same time and trying to update the same path | 14:39 |
fungi | and one build deletes the other build's tempfile | 14:39 |
yoctozepto | makes sense; would need a mutex to solve for good | 14:40 |
yoctozepto | anyhow, good to know the details | 14:40 |
fungi | yep, the 803845,1 build for stable/wallaby was running at exactly the same time | 14:40 |
fungi | we'd need to do cross-branch mutexes to avoid this, right | 14:41 |
fungi | basically there were updates for the release notes on the stable/ussuri and stable/wallaby branches of the same project trying to rsync --delete at the exact same time | 14:42 |
fungi | so one deleted the other's tempfiles because they weren't expected content | 14:42 |
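The failure mode comes from two concurrent publication steps rsyncing into the same target directory; a simplified illustration (paths are examples, not the exact job invocation):

```sh
# Each build runs roughly this against the shared read-write volume:
rsync -a --delete docs/releasenotes/html/ \
    /afs/.openstack.org/docs/releasenotes/kolla-ansible/
# rsync writes each file to a dotted temporary name (e.g. .victoria.html.mxBbLh)
# and renames it into place; a concurrent --delete pass from the other build
# can remove that temp file first, so the rename fails with
# "No such file or directory", as in the error quoted above.
```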
yoctozepto | can't be truer | 14:43 |
*** ykarel is now known as ykarel|away | 14:48 | |
*** jpena is now known as jpena|off | 15:32 | |
*** rpittau is now known as rpittau|afk | 16:26 | |
zul | Where would I be adding an envlist for this one? https://zuul.opendev.org/t/openstack/build/005f12d679ef4891ae00d87008ca681f | 16:59 |
clarkb | zul: if you go to https://zuul.opendev.org/t/openstack/build/005f12d679ef4891ae00d87008ca681f/console (it's the console button on the right side of the link you gave above) you'll see it is the tox role that is failing (it's the 'tox:' prefix that gives that away). Then you can look up that role in https://opendev.org/zuul/zuul-jobs to find the args it takes: | 17:02 |
clarkb | https://opendev.org/zuul/zuul-jobs/raw/branch/master/roles/tox/README.rst | 17:02 |
clarkb | looks like you want to set a var called tox_envlist on the job | 17:03 |
zul | thanks | 17:03 |
*** rlandy|ruck is now known as rlandy|drappt | 17:21 | |
*** rlandy|drappt is now known as rlandy | 19:43 | |
*** rlandy is now known as rlandy|ruck | 19:45 |