*** rlandy|ruck is now known as rlandy|out | 00:57 | |
corvus | we are at > T+2h since the rolling restart, and everything seems nominal | 01:10 |
fungi | yeah, things are looking clean to me | 01:18 |
wxy-xiyuan | hi ianw, the openEuler label is there https://zuul.opendev.org/t/openstack/labels but there are no ready nodes. I assume there is some build/launch error in nodepool? Could you please take a look, or how can I debug it? Thanks. | 02:00 |
ianw | wxy-xiyuan: ahh, sorry i meant to check back on that : you can see the build @ https://nb03.opendev.org/openEuler-20.03-LTS-SP2-arm64-0000000023.log | 02:01 |
ianw | it's failing in our project-config elements | 02:01 |
ianw | i have to admit that totally slipped my mind | 02:02 |
wxy-xiyuan | Nice, this log is what I need. Checking. Big thanks | 02:02 |
ianw | wxy-xiyuan: the elements in https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements will need updating | 02:05 |
wxy-xiyuan | xinliang https://nb03.opendev.org/openEuler-20.03-LTS-SP2-arm64-0000000023.log | 02:05 |
wxy-xiyuan | dib-run-parts Running /tmp/in_target.d/install.d/20-iptables | 02:05 |
wxy-xiyuan | echo 'Unsupported operating system openeuler' | 02:06 |
wxy-xiyuan | ianw ++ | 02:06 |
wxy-xiyuan | xinliang: https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/nodepool-base/install.d/20-iptables as ianw said, maybe not only iptables, but also other base elements need updating | 02:07 |
xinliang | wxy-xiyuan: thanks, looking at it | 02:07 |
xinliang | wxy-xiyuan: these elements haven't been tested before | 02:09 |
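For context, the failure above is in project-config's nodepool-base element, whose install.d/20-iptables script switches on $DISTRO_NAME and falls through to the "Unsupported operating system" error for openeuler. A rough sketch of the kind of branch a change like 821794 would need to add, assuming openEuler is dnf-based like recent Fedora/CentOS (package names and the exact structure are assumptions, not the actual patch):

```bash
#!/bin/bash
# Hypothetical excerpt of nodepool-base/install.d/20-iptables showing how an
# openeuler branch could be added; package and command names are assumptions.
set -o errexit -o nounset -o pipefail

case "$DISTRO_NAME" in
    ubuntu|debian)
        apt-get -y install iptables
        ;;
    fedora|centos|openeuler)
        # Assumes openEuler ships iptables-services via dnf like recent
        # Fedora/CentOS; verify the package name for the target release.
        dnf -y install iptables-services
        ;;
    *)
        echo "Unsupported operating system $DISTRO_NAME"
        exit 1
        ;;
esac
```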
fungi | system-config-run-mirror-update seems like it may have started consistently timing out in the run phase | 02:31 |
opendevreview | wangxiyuan proposed openstack/project-config master: Add openEuler disto support for elements https://review.opendev.org/c/openstack/project-config/+/821794 | 02:33 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 02:50 |
ianw | fungi: hrm -- i think that does mount afs as part of it. that's the only part that i think would cause problems | 02:53 |
ianw | 2021-12-15 02:02:59.852337 | bridge.openstack.org | 2021-12-15 01:50:38,158 INFO ansible: changed: [mirror-update01.opendev.org] => { | 03:02 |
ianw | 2021-12-15 02:02:59.867179 | bridge.openstack.org | 2021-12-15 01:58:35,723 INFO ansible: changed: [mirror-update01.opendev.org] => { | 03:03 |
ianw | it took about 8 minutes to build openafs | 03:03 |
ianw | that ... seems about normal, i guess | 03:03 |
ianw | https://03eb8bb46d4e2a6a232a-dc3e65ccae23bb6c49297bc4ac109b91.ssl.cf5.rackcdn.com/820899/6/check/system-config-run-mirror-update/3e8178e/job-output.txt | 03:03 |
ianw | 2021-12-15 02:03:00.611982 | bridge.openstack.org | 2021-12-15 02:02:43,597 INFO ansible: mirror-update01.opendev.org : ok=233 changed=121 unreachable=0 failed=0 skipped=28 rescued=0 ignored=12 | 03:05 |
ianw | the whole thing finished 4-ish minutes after that, so nothing blowing out there | 03:05 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 03:16 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 03:35 |
opendevreview | chzhang8 proposed openstack/project-config master: register and bring back tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/800442 | 03:42 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] add vm element to 9-stream image test to test bootloader https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 03:51 |
fungi | second recheck seems to have made it back into the gate too, so i guess it's not 100% failing | 04:28 |
*** ysandeep|out is now known as ysandeep | 04:50 | |
*** pojadhav- is now known as pojadhav | 05:12 | |
ykarel | rax-iad and rax-dfw still affected with pypi issues | 05:13 |
ykarel | ianw, you around? | 05:13 |
ykarel | or anyone else from infra who can help in clearing this | 05:14 |
ykarel | one option i see is to try a revert of https://review.opendev.org/c/openstack/project-config/+/760495 | 05:16 |
ykarel | as the public mirrors of these look good, it's the internal ones that are impacted | 05:16 |
ykarel | or we can try -XPURGE against those internal mirrors to see if that resolves it, but that needs to be done from somewhere they are reachable | 05:17 |
ianw | ykarel: I am for just a bit | 06:02 |
ykarel | ianw, ack np i am trying to clear with hack https://review.opendev.org/c/openstack/neutron/+/821798 | 06:03 |
ykarel | clear rax-iad | 06:03 |
ykarel | now waiting for node on rax-dfw | 06:03 |
ianw | i'm not sure what that internal mirror would have to do with it? that is just using the rax local network to access the mirror, but it's the same node | 06:04 |
ianw | i.e. mirror-int.iad.rax.opendev.org == mirror.iad.rax.opendev.org, just one is the internal interface | 06:05 |
ianw | as mentioned with pypi, we are only a proxy... | 06:05 |
frickler | also, although the nodes are called "mirror", they are actually just caching proxies | 06:06 |
ykarel | ianw, but i can see failures with mirror-int.iad.rax.opendev.org, while with mirror.iad.rax.opendev.org the installation goes fine | 06:06 |
ianw | hrm, that is *probably* a red-herring issue -- they both go to exactly the same apache process | 06:06 |
frickler | ykarel: do you have links to the failures? | 06:06 |
ianw | (i mean, i will never say never, but I'd highly doubt that is actually the issue) | 06:07 |
ykarel | frickler, https://86804121d4d0f7ba6424-61662cfb64be48a1e2663c2773bf553c.ssl.cf2.rackcdn.com/821414/2/check/neutron-tempest-plugin-api/c4e6f3a/job-output.txt | 06:08 |
ykarel | there were many failures http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22The%20user%20requested%20(constraint)%5C%22 | 06:09 |
ykarel | yesterday other providers were also impacted but fungi cleared those by running -XPURGE against those | 06:10 |
ianw | 2021-12-15 03:53:43.468529 | controller | ERROR: Cannot install neutron==19.1.0.dev278 because these package versions have conflicting dependencies. | 06:10 |
ykarel | but today only seeing failures in rax-iad and rax-dfw | 06:10 |
ykarel | and the common factor among those was that they use mirror-int | 06:10 |
ianw | 2021-12-15 03:53:43.468804 | controller | neutron 19.1.0.dev278 depends on pecan>=1.3.2 | 06:10 |
ykarel | so likely yesterday those were not cleared | 06:11 |
ianw | 2021-12-15 03:53:43.468828 | controller | The user requested (constraint) pecan===1.4.1 | 06:11 |
ykarel | ianw, yes rax-iad affected with ^ and rax-dfw with pyjwt | 06:11 |
frickler | likely yet another CDN hiccup for pypi. give me a bit to set up some local testing | 06:11 |
ykarel | frickler, please try with rax-dfw | 06:12 |
ykarel | rax-iad seems to be fixed after i ran -XPURGE | 06:12 |
ianw | so the real problem is that pip gets an error fetching pecan, but reports it as that conflict error? | 06:13 |
ykarel | ianw, yes right | 06:13 |
ykarel | 1.4.1 not available | 06:13 |
ykarel | older version can be installed | 06:13 |
frickler | the error that pip usually doesn't log is that it didn't find any version at all in the index | 06:17 |
ykarel | frickler, i found a running job on rax-dfw, is it possible to log into it ? | 06:22 |
ykarel | ip 104.130.132.107 | 06:22 |
frickler | ykarel: both iad and dfw seem to be working for me. either my testing is wrong or the XPURGE is indeed global as I assumed | 06:22 |
ykarel | frickler, you used mirror-int? | 06:23 |
frickler | ykarel: no, I agree with ianw that there is no plausible explanation for it to behave differently | 06:24 |
ykarel | frickler, may be can check on 104.130.132.107 node to confirm? | 06:24 |
ykarel | in a venv with latest pip just need to run MIRROR=mirror-int.dfw.rax.opendev.org | 06:24 |
ykarel | pip install --index-url="https://${MIRROR}/pypi/simple" --extra-index-url="https://$MIRROR/wheel/ubuntu-20.04-x86_64" --trusted-host=$MIRROR -c https://raw.githubusercontent.com/openstack/requirements/master/upper-constraints.txt pyjwt | 06:25 |
ykarel | if ^ fails try with public one | 06:25 |
frickler | working just fine for me | 06:27 |
ykarel | frickler, ohkk then likely it got fixed, you ran on 104.130.132.107 only, right? | 06:28 |
frickler | yep | 06:28 |
ykarel | ack Thanks for checking, will keep an eye if it happens again | 06:29 |
frickler | oh, wait, I tested without u-c. it only finds pyjwt==2.2.0, 2.3.0 is missing | 06:31 |
ykarel | ahh then it's affected, can try the same command as above | 06:31 |
ykarel | first with mirror-int and then with public one | 06:31 |
frickler | ykarel: o.k., indeed it works without the -int, this is very weird, need to look into what is happening on the proxy | 06:38 |
ykarel | frickler, okk good | 06:39 |
ykarel | for now maybe we can just run PURGE against mirror-int.dfw.rax.opendev.org to clear CI | 06:39 |
frickler | ykarel: it doesn't work like that, the purge is against pypi, not the proxy. | 06:42 |
ykarel | frickler, but if you fire the request at the proxy, doesn't it go to pypi? | 06:42 |
frickler | but maybe the proxy has different local caching for -int and external URLs | 06:42 |
ykarel | i ran it for rax-iad and it seemed to work | 06:42 |
ykarel | i ran purge there and after some time module installed fine | 06:43 |
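For reference, what a manual purge looks like: PyPI's Fastly CDN honours PURGE requests on its URLs, so clearing one stale package index is roughly the following (pyjwt as the example). This clears the CDN copy only; the Apache cache on our mirrors is separate, which is where htcacheclean comes in below.

```bash
# Ask the Fastly CDN in front of pypi.org to drop its cached copy of one
# stale package index page (here: pyjwt). Running it against the mirror's
# /pypi/simple/ URL instead appears to relay to the same origin.
curl -X PURGE https://pypi.org/simple/pyjwt/
```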
frickler | o.k., indeed the proxy caches by URL, so mirror and mirror-int are different | 06:53 |
frickler | running "htcacheclean -A -v -p /var/cache/apache2/proxy/ 'https://mirror-int.dfw.rax.opendev.org:443/pypi/simple/pyjwt/?'" has resolved the issue for that pkg | 06:53 |
ykarel | frickler, Thanks | 06:54 |
frickler | infra-root: for reference, this is what I did in detail, still need to look into decoding the timestamps https://paste.opendev.org/show/811679/ | 06:57 |
frickler | ykarel: thanks for being so persistent, I'll see if we can better tune the proxy cache | 06:58 |
ykarel | frickler, Thanks, btw you are in what timezone? | 07:02 |
frickler | ykarel: nominally UTC+1 currently, but I don't always stick to that ;) | 07:04 |
ykarel | yes seems so as it's too early for you now :) | 07:05 |
elodilles | fungi corvus : ack, thanks! | 07:09 |
dulek | Hey folks! I see another set of dependency issues in the OpenStack jobs, this time Keystone installation fails on pyjwt. | 07:16 |
dulek | Does it make sense to recheck these jobs now? | 07:16 |
frickler | dulek: if the errors were happening on rax-dfw or rax-int, yes. otherwise please link to a failure | 07:28 |
frickler | infra-root: it seems that sometimes we have "stuck" cache entries with an expiry of 24h instead of the expected 5m, see the first timestamps in my above paste for the mirror-int entry | 07:29 |
dulek | Here's one on rax-iad: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0a9/821442/7/check/openstack-tox-pep8/0a92d7c/job-output.txt | 07:35 |
dulek | And that's it, rest failed on rax-dfw. | 07:37 |
*** ysandeep is now known as ysandeep|brb | 08:12 | |
*** ysandeep|brb is now known as ysandeep | 08:22 | |
*** sshnaidm is now known as sshnaidm|afk | 09:33 | |
*** ysandeep is now known as ysandeep|afk | 09:57 | |
*** redrobot6 is now known as redrobot | 10:34 | |
*** ysandeep|afk is now known as ysandeep | 11:08 | |
*** rlandy is now known as rlandy|ruck | 11:13 | |
*** sshnaidm|afk is now known as sshnaidm | 11:26 | |
anbanerj|ruck | Hi, | 11:39 |
anbanerj|ruck | We have a gate blocker. Patches 816991,16 and 821778, which fix bugs, need to go first to unblock the rest. Can someone please get these two patches to the top of the queue? | 11:39 |
anbanerj|ruck | https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/821699/ | 11:39 |
anbanerj|ruck | https://review.opendev.org/c/openstack/tripleo-ci/+/821778/ | 11:39 |
anbanerj|ruck | fungi, clarkb ^ when you get some time | 11:40 |
anbanerj|ruck | thanks | 11:40 |
*** pojadhav is now known as pojadhav|afk | 11:46 | |
anbanerj|ruck | Also 821538 (https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/821538) pls. thanks | 11:52 |
*** tkajinam is now known as Guest8517 | 11:58 | |
anbanerj|ruck | Hey clarkb, fungi: Sorry, pls ignore the previous patches. The correct patches that have to go first to unblock the gate are 821538, 821778, 821699, in that order. Could you pls put these at the top of the tripleo gate queue? Thanks! | 12:07 |
*** pojadhav|afk is now known as pojadhav | 12:41 | |
*** kopecmartin_ is now known as kopecmartin | 13:41 | |
*** lbragstad8 is now known as lbragstad | 14:09 | |
fungi | frickler: i suspect some fastly cdn endpoints occasionally serving pypi's fallback mirror could explain that, if their fallback sets different cache parameters on responses | 14:12 |
frickler | fungi: indeed, I made some further tests and the cache timeout (default seems to be mostly 10m) is being sent from pypi and not specified on our side | 14:14 |
frickler | fungi: meanwhile the broken index responses also seem to be sent with a broken or very long timeout. we should consider setting a maximum timeout of maybe 1h or less to reduce the impact of those | 14:15 |
frickler | https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cachemaxexpire would be the option to set | 14:17 |
fungi | so basically when a fastly endpoint decides it can't reach the real pypi backend and serves something from pypi's backup mirror instead, it's also including a much longer cache timeout which results in us serving that stale data from our proxies for even longer | 14:17 |
fungi | anbanerj|ruck: i've put 821538,2 821778,1 821699,2 (in that order) as the first three items in the tripleo shared gate queue now | 14:25 |
anbanerj|ruck | fungi, thank you! | 14:25 |
fungi | no problem | 14:26 |
*** pojadhav is now known as pojadhav|afk | 14:32 | |
frickler | fungi: that's just my current assumption based on what I saw and the data from the cache I posted. might need further watching. | 14:43 |
frickler | fungi: otoh enforcing a limit of 1h when the default we see is 10m might be agreeable already | 14:43 |
frickler | fungi: it's also not clear whether that timeout is set by fastly or comes from the backends | 14:44 |
fungi | yeah, i'd be up for making it 10min even | 14:44 |
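If we go that route, it would be a small addition to the Apache cache configuration on the mirror proxy vhosts; a minimal sketch with a 10 minute cap (illustrative only, not our actual vhost layout):

```apache
# Cap how long any cached response may be reused, even if the upstream
# (pypi/fastly) supplied a much longer expiry. 600 seconds = 10 minutes.
CacheMaxExpire 600
# Optionally also pin the default used when upstream sends no expiry info.
CacheDefaultExpire 600
```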
frickler | another thing I noticed: we run a htcacheclean daemon for /var/cache/apache2/mod_cache_disk but that dir is empty, the proxy cache is in /var/cache/apache2/proxy , which we don't seem to actively clean at all | 14:49 |
*** ysandeep is now known as ysandeep|dinner | 14:55 | |
*** simondodsley_ is now known as simondodsley | 14:58 | |
*** mnaser_ is now known as mnaser | 14:58 | |
*** ildikov_ is now known as ildikov | 15:00 | |
*** johnsom_ is now known as johnsom | 15:00 | |
*** clayg_ is now known as clayg | 15:00 | |
*** bbezak_ is now known as bbezak | 15:00 | |
*** erbarr_ is now known as erbarr | 15:01 | |
*** parallax_ is now known as parallax` | 15:01 | |
*** walshh__ is now known as walshh_ | 15:01 | |
*** davidlenwell_ is now known as davidlenwell | 15:01 | |
*** JpMaxMan_ is now known as JpMaxMan | 15:02 | |
*** parallax` is now known as parallax | 15:02 | |
*** parallax is now known as Guest8535 | 15:03 | |
Clark[m] | One of those cache paths and its htcacheclean is the default you get with Ubuntu packaging. The other is our own path, necessary due to the cinder volumes in use on some hosts. Both should get cache cleaning via cron jobs. | 15:09 |
Clark[m] | Note a 10m expiry won't be very effective due to how often htcacheclean runs for keeping disk use down. But I think apache will refresh data it sees as stale, it just won't delete it as quickly as we might expect | 15:10 |
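For reference, the cron-style (non-daemon) cleaning Clark describes is just a periodic htcacheclean pass over the path we actually use; a hypothetical /etc/cron.d entry (the size limit and schedule here are made up for illustration):

```
# Hypothetical cron.d entry: prune the proxy cache hourly, keeping it under
# a size limit; -n runs nicely, -t removes empty directories afterwards.
17 * * * * root htcacheclean -n -t -p /var/cache/apache2/proxy -l 20480M
```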
*** clarkb is now known as Guest8536 | 15:11 | |
frickler | Clark[m]: the 10m would be mostly to reduce the impact of caching broken indices, not to reduce disk usage | 15:13 |
frickler | might be interesting to check whether we could actually make that specific only for indices and have longer timeouts for wheels/tars | 15:14 |
fungi | yeah, we want to turn over indices fairly quickly, but the packages don't ever change so can be cached for as long as we have space | 15:22 |
fungi | however, the packages are technically proxied from a different site name entirely, so we can probably leverage that difference? | 15:22 |
*** ysandeep|dinner is now known as ysandeep | 15:25 | |
*** parallax_ is now known as parallax | 15:30 | |
frickler | Context: server config, virtual host, directory, .htaccess | 15:31 |
frickler | doesn't seem to work by location :( | 15:31 |
frickler | the last thing to note, which both ianw and i were wrong about: the cache works per target URL, not per source, so for rax the cache for the -int hostnames is indeed distinct from the one seen via the public hostname | 15:33 |
*** dviroel|rover is now known as dviroel|rover|lunch | 15:47 | |
elodilles | fungi corvus : i'm about to run the branch delete script now. i'll let you know when i reach the part where multiple branches will be deleted in a short time | 15:49 |
fungi | elodilles: sounds great, i'm around and can keep an eye on things as well | 15:49 |
elodilles | ack, let's see | 15:50 |
*** rlandy|ruck is now known as rlandy|ruck|brb | 15:50 | |
*** rlandy|ruck|brb is now known as rlandy|ruck | 16:12 | |
*** Guest8536 is now known as clarkb | 16:18 | |
opendevreview | Merged opendev/system-config master: Copy Exim logs in system-config-run jobs https://review.opendev.org/c/opendev/system-config/+/820899 | 16:32 |
*** dviroel|rover|lunch is now known as dviroel|rover | 16:32 | |
*** marios is now known as marios|out | 16:33 | |
fungi | that took far more rechecks than i would have expected | 16:37 |
*** ysandeep is now known as ysandeep|out | 16:54 | |
clarkb | it just occurred to me that we should update the limnoria bot when meetings aren't happening | 17:00 |
clarkb | for that reason my next bullseye update will be the matrix eavesdrop bot instead of limnoria | 17:00 |
clarkb | corvus: ^ fyi I'm approving that update. I don't expect trouble since that bot doesn't rely on debian user space for much | 17:00 |
clarkb | for limnoria if we land that today we'll want to do it after the swift meeting at 2100 UTC | 17:02 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add firewall behavior assertions to test_bridge https://review.opendev.org/c/opendev/system-config/+/821780 | 17:07 |
clarkb | I think ^ might actually end up doing what we want, but I'm doing another forced failure to be sure | 17:11 |
*** sshnaidm is now known as sshnaidm|afk | 17:14 | |
fungi | looks like the other test-related change in topic:mailman-lists will merge shortly, and then i'll start approving the ones which might (though highly unlikely) have production impact | 17:19 |
clarkb | cool. I'll be around if I can help | 17:20 |
corvus | i'm not quite sure why i'm not seeing a login button on zuul... i'll try to look into that today | 17:22 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 17:23 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 17:23 |
elodilles | fungi: 10 branches have been deleted in the last ~3 minutes | 17:28 |
elodilles | currently i see 10 management events | 17:29 |
clarkb | now to see how long it takes them to exit the queue | 17:29 |
fungi | yeah, in theory it should only spend time on the first and last ones, right? | 17:30 |
corvus | seeing 10 events in the queue is expected; they should all be processed together (or at least 1 and then 9 more together) | 17:30 |
corvus | 2021-12-15 17:28:42,098 INFO zuul.Scheduler: Tenant reconfiguration beginning for openstack due to projects {('opendev.org/openstack/openstack-ansible-os_barbican', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_ceilometer', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-openstack_openrc', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_gnocchi', 'stable/ocata'), | 17:31 |
corvus | ('opendev.org/openstack/openstack-ansible-os_horizon', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_designate', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_aodh', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_cinder', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_heat', 'stable/ocata'), ('opendev.org/openstack/openstack-ansible-os_glance', 'stable/ocata')} | 17:31 |
corvus | that is very promising though :) | 17:31 |
elodilles | :] | 17:31 |
fungi | so either it will drop from 10 to 9 to 0, or from 10 to 1 to 0 | 17:31 |
corvus | or maybe 10 to 0 if we're lucky | 17:31 |
fungi | presumably the latter | 17:31 |
fungi | ahh, okay | 17:32 |
elodilles | 0 \o/ | 17:32 |
fungi | much better! | 17:33 |
fungi | thanks again corvus!!! | 17:33 |
elodilles | yepp, thanks for the fix! | 17:33 |
corvus | no prob, thanks for finding the bug :) | 17:33 |
elodilles | i'll queue up some more branches to delete now | 17:33 |
fungi | also i'm enjoying the cute icons for the pipeline manager types now | 17:33 |
corvus | that's mhu's handiwork | 17:34 |
fungi | it's lovely | 17:34 |
elodilles | another 16 branches deleted, and now: Queue lengths: 4 events, 14 management events. | 17:40 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 17:43 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 17:43 |
opendevreview | Merged opendev/system-config master: Update matrix-eavesdrop image to bullseye https://review.opendev.org/c/opendev/system-config/+/821332 | 17:46 |
opendevreview | Merged opendev/system-config master: Collect mailman logs in deployment testing https://review.opendev.org/c/opendev/system-config/+/821112 | 17:46 |
elodilles | Queue lengths: 6 events, 7 management events. | 17:46 |
elodilles | (there were 2 management events for a while, but now it's 0 \o/) | 17:59 |
elodilles | (and that was all, every EOL'd branch has been deleted now, i think) | 18:00 |
fungi | that's awesome, thanks for working through that with us elodilles! | 18:03 |
elodilles | :] | 18:05 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add firewall behavior assertions to testinfra testing https://review.opendev.org/c/opendev/system-config/+/821780 | 18:06 |
clarkb | fungi: ^ I think that is mergeable now and addresses the confusion around editing those extra files with extra commas | 18:07 |
clarkb | I've approved the zp01 dns record update change | 18:17 |
fungi | thanks, i agree putting the new file in the list of those triggering the jobs we're interested in is a better way to go about it | 18:17 |
opendevreview | Merged opendev/zone-opendev.org master: Try to make zuul-preview records more clear https://review.opendev.org/c/opendev/zone-opendev.org/+/821743 | 18:24 |
opendevreview | Merged opendev/system-config master: Make sure /usr/bin/python is present for mailman https://review.opendev.org/c/opendev/system-config/+/821095 | 18:27 |
opendevreview | Merged opendev/system-config master: Add "mailman" meta-list to lists.katacontainers.io https://review.opendev.org/c/opendev/system-config/+/821775 | 18:31 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add firewall behavior assertions to testinfra testing https://review.opendev.org/c/opendev/system-config/+/821780 | 18:31 |
clarkb | fungi: ^ the point you made about the inventory changing is a good one. That attempts to avoid problems | 18:31 |
fungi | clarkb: inspection of the logs indicates 821144 is working now as written. if you're okay with it let's get it and the one after it merged and then we're caught up to working for the current mailman deployment and i can start on migrating foundation mailing lists to the new site | 18:32 |
clarkb | fungi: +2'd if you want to approve | 18:33 |
fungi | thanks! will do | 18:34 |
opendevreview | Merged opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 19:05 |
opendevreview | Merged opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 19:09 |
clarkb | The meeting about openstack health is happening tomorrow at 16:30 utc. I'll try to attend that | 19:10 |
clarkb | Should be able to as that is just late enough to not interfere with my morning tasks | 19:10 |
fungi | qa team meeting? is it irc? | 19:12 |
gmann | http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026250.html | 19:13 |
gmann | fungi: clarkb ^^ | 19:13 |
clarkb | it is on google meet | 19:13 |
fungi | ahh, okay | 19:13 |
gmann | we discussed it in the QA IRC meeting and set up the scheduled time with the tripleo team on google meet | 19:13 |
fungi | it overlaps with the cd foundation interop sig call i usually join | 19:14 |
fungi | but if clarkb's on then i'm redundant anyway | 19:14 |
opendevreview | Ghanshyam Mann proposed openstack/project-config master: Add openstack-venus irc channel in access an gerrit bot https://review.opendev.org/c/openstack/project-config/+/821875 | 20:08 |
ianw | frickler: thanks for looking in on the cache stuff, sorry i had to disappear. the -int thing is interesting -- told you i never say never :) | 20:35 |
ianw | is the end result that we should be talking to pypi about their backend having different timeouts to fastly? | 20:37 |
ianw | maybe it is something they do, so that if the cdn is down, they don't get hit so hard on the backend server? | 20:37 |
ianw | but if the backend is serving bad data (again) ... | 20:37 |
Clark[m] | We brought the general issue up to them when it was last hitting us frequently. The plan at the time was to try and make the backup far more up to date, aiui. But the data here seems to indicate that hasn't happened yet | 20:39 |
Clark[m] | Constraints is the main reason it affects openstack. Most other installs will just use an old package version. But that represents security risks. Bringing up that angle might be productive | 20:40 |
ianw | ok, the last time i remember we found the backend was actually out of disk and was seriously out of date. but that was a while ago | 20:43 |
ianw | https://github.com/pypa/warehouse/issues/8568 16-sep-2020 to be exact :) | 20:44 |
Clark[m] | Ya I think they realized they had to address it in general, but it seems not to have happened given the occurrence rate for us. But really openstack using constraints is what exposes this, so they may not be aware it has gotten bad again | 20:44 |
Clark[m] | And maybe arguing this exposes their users to pulling old insecure packages is a good angle for why this isn't just an "openstack is weird" issue | 20:45 |
clarkb | I think what I'm getting at is it would be better for them to fail than to fallback | 20:47 |
fungi | well, the last time we brought it up with them the biggest issue was that the fallback lacked python_requires metadata in the indices, and the plan was to get that added (which i think they did?) | 20:49 |
clarkb | fungi: right, there were a number of issues. That was one of them; the other was that it hadn't updated in months | 20:49 |
fungi | or maybe that was the time before last | 20:49 |
clarkb | In this case it seems the backup doesn't update for weeks at least based on some of the errors we have observed | 20:49 |
clarkb | but regardless it seems that returning a 400/500 error to the client when the cdn can't find the data is a better response, due to the security issues this potentially presents | 20:50 |
fungi | the debate raging over pep 665 "lockfiles" might present an opportunity to point out that this will become increasingly painful for users | 20:50 |
clarkb | OpenStack has essentially opted into those errors via constraints, but I'm saying everyone should see them instead | 20:50 |
clarkb | If pypi cannot serve a correct and up to date index that may include important security updates then they should fail and pass that to the user | 20:52 |
clarkb | that is my tldr | 20:52 |
clarkb | Unrelated: the swift meeting is about to start at 21:00 UTC. I need to pick up kids from school at about 22:15 UTC and won't be back until close to 23:00 UTC. If I approve the limnoria bullseye update at ~22:00 UTC and can't debug until ~23:00 UTC is that a problem for anyone? | 20:53 |
fungi | not a problem for me. but also i plan to be around and am happy to help debug | 20:53 |
clarkb | ok I'll plan to approve it at 22:00 UTC or when I notice the swift meeting is over | 20:54 |
fungi | looks like in order to redirect foo@lists.bar to foo@lists.baz we may need to add a new kind of aliases router which can match on whole addresses with the domain part rather than just the local part | 21:00 |
fungi | luckily i just realized i already have one written for my personal mailserver | 21:09 |
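For the curious, the shape of such a router: a redirect router limited to the old list domain that looks up the whole address (local part plus domain) in an alias file. A sketch under assumed file paths and example domains, not the eventual system-config change:

```
# Hypothetical exim router matching whole addresses so foo@lists.old.example
# can be redirected to foo@lists.new.example.
whole_address_aliases:
  driver = redirect
  domains = lists.old.example
  data = ${lookup{$local_part@$domain}lsearch{/etc/exim4/whole_address_aliases}}
  # The alias file then contains lines keyed on the full address, e.g.:
  #   foo@lists.old.example: foo@lists.new.example
```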
*** dviroel|rover is now known as dviroel|out | 21:11 | |
clarkb | ianw: https://review.opendev.org/c/opendev/system-config/+/821780 is what I ended up with for testing firewall rules externally. I don't think it is perfect (there is a todo in there) but it seems to function and do the checking we want to have | 21:18 |
opendevreview | Ghanshyam Mann proposed openstack/project-config master: Add openstack-venus irc channel in access an gerrit bot https://review.opendev.org/c/openstack/project-config/+/821875 | 21:35 |
opendevreview | Ghanshyam Mann proposed opendev/system-config master: Add openstack-venus channel in statusbot https://review.opendev.org/c/opendev/system-config/+/821882 | 21:35 |
ianw | clarkb: I think you can just match testinfra_hosts on the zk hosts, and then anything you run is running on bridge | 21:42 |
ianw | similar to how the screenshots work; selenium is running on bridge -- we just use things in "host.X()" context to run on the remote host? | 21:42 |
ianw | does that make sense? | 21:42 |
clarkb | hrm, isn't the host that gets passed in one of the remote testinfra_hosts entries, so it would be zk instead? | 21:43 |
clarkb | I guess I could implement my own checker for tcp connectivity is what you are saying and not use the host argument? | 21:44 |
clarkb | then the actual test case is running from bridge so it would always be external connectivity. Just need to implement our own checks | 21:45 |
clarkb | the swift meeting has ended. I'm approving the limnoria bullseye change now | 21:45 |
clarkb | ianw: ya so I think what would work is to set testinfra_hosts to zk or whatever and move the test case into test_zookeeper.py. Then ignore the host var that is passed to the testcase except for getting its IP address. Then implement our own checker? | 21:49 |
timburke | thanks for waiting for us :-) | 21:51 |
opendevreview | Ghanshyam Mann proposed openstack/project-config master: Mark openstack-placement IRC channel as retired https://review.opendev.org/c/openstack/project-config/+/821889 | 21:53 |
ianw | clarkb: yep -- basically if you have a test_zookeeper.py as "usual", if you run, say, requests.* in there it's running on bridge. it's only if you use something like "host.cmd()" that it's actually running on the remote server | 22:03 |
clarkb | right. I'll take a look at doing that refactor later today | 22:04 |
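A rough sketch of that shape (host names, the port, and the expectation that the port is filtered from bridge are all assumptions here; 821780 is the authoritative version):

```python
# Sketch of a check that runs on bridge while testinfra_hosts selects the
# zookeeper node: the test body executes locally, so a plain socket
# connection exercises the firewall from outside the target host.
import socket

import pytest

testinfra_hosts = ['zk01.opendev.org']  # assumed inventory name


def test_zk_client_port_filtered_from_bridge(host):
    # Use the remote host object only to learn which address to probe; the
    # connection attempt below originates from the node running the tests.
    addr = socket.gethostbyname(host.backend.get_hostname())
    # Assumes the intended policy is that 2181 is not reachable from bridge.
    with pytest.raises(OSError):
        socket.create_connection((addr, 2181), timeout=5)
```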
opendevreview | Ghanshyam Mann proposed opendev/system-config master: Fix command for setting the entry message for IRC channel https://review.opendev.org/c/opendev/system-config/+/821913 | 22:07 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add a domain aliases mechanism to lists.o.o https://review.opendev.org/c/opendev/system-config/+/821914 | 22:10 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Create an OpenInfra Foundation staff ML https://review.opendev.org/c/opendev/system-config/+/821915 | 22:10 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Forward messages for OpenInfra Foundation staff ML https://review.opendev.org/c/opendev/system-config/+/821916 | 22:10 |
opendevreview | Merged opendev/system-config master: Update limboria ircbot to bullseye https://review.opendev.org/c/opendev/system-config/+/821330 | 22:27 |
opendevreview | Merged opendev/system-config master: Fix command for setting the entry message for IRC channel https://review.opendev.org/c/opendev/system-config/+/821913 | 22:27 |
clarkb | Need to generate some text here to check if limnoria is working | 22:54 |
clarkb | that last message shows up in the text log. | 22:55 |
clarkb | I'll try a test meeting momentarily | 22:55 |
clarkb | do we have a test meeting entry? | 22:55 |
clarkb | https://meetings.opendev.org/meetings/test/ looks like yes | 22:56 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add a domain aliases mechanism to lists.o.o https://review.opendev.org/c/opendev/system-config/+/821914 | 22:56 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Create an OpenInfra Foundation staff ML https://review.opendev.org/c/opendev/system-config/+/821915 | 22:56 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Forward messages for OpenInfra Foundation staff ML https://review.opendev.org/c/opendev/system-config/+/821916 | 22:56 |
clarkb | https://meetings.opendev.org/meetings/test/2021/test.2021-12-15-22.56.txt I think the new image is happy. Note I didn't land the install-from-upstream update yet as that one seems more scary. I was going to check if they have a stable branch or release tags or something as an alternative | 22:57 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Switch docs theme to RTD https://review.opendev.org/c/zuul/zuul-jobs/+/821918 | 22:59 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Switch docs theme to RTD https://review.opendev.org/c/zuul/zuul-jobs/+/821918 | 23:08 |
corvus | zuul changes merged... i could do a rolling restart, but i have to head out in <2 hours, so i'll plan on doing it tomorrow (unless other folks plan on being around today and would rather i do it now) | 23:12 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add firewall behavior assertions to testinfra testing https://review.opendev.org/c/opendev/system-config/+/821780 | 23:17 |
clarkb | corvus: I've got family stuff this evening so I'd prefer tomorrow, but no objections if fungi and/or ianw would like to do it | 23:17 |
clarkb | ianw: ^ I think that is what you were suggesting for the connectivity testing | 23:17 |
ianw | should we merge the log format update too before the restart? https://review.opendev.org/c/opendev/system-config/+/821508 | 23:18 |
ianw | is it scheduler or complete restart? | 23:18 |
clarkb | ianw: I've approved the log formatter change. I believe this restart is a rolling restart of schedulers (and maybe web?) | 23:19 |
ianw | clarkb: yep, that's almost exactly what i was thinking :) | 23:20 |
fungi | i can be around for a restart if that's preferable | 23:21 |
ianw | i'll be happy to do it, just wait for the log format changes to apply? | 23:22 |
*** rlandy|ruck is now known as rlandy|ruck|bbl | 23:23 | |
corvus | it would be a rolling scheduler+web restart | 23:26 |
ianw | ++ | 23:27 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: centos: work around 9-stream BLS issues https://review.opendev.org/c/openstack/diskimage-builder/+/821772 | 23:27 |
corvus | okay, i'll go ahead and do the restart under the assumption that fungi/ianw will be around to check on it later. | 23:27 |
corvus | i'm still around for the next hour or so | 23:28 |
corvus | pulling images now | 23:28 |
corvus | killing 02 | 23:31 |
fungi | yeah, i'm still around and will keep an eye on things | 23:38 |
corvus | okay 02 is fully up; going to restart 01 now | 23:42 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Switch docs theme to RTD https://review.opendev.org/c/zuul/zuul-jobs/+/821918 | 23:59 |
corvus | it's up; i don't see any concerning errors in the scheduler logs | 23:59 |
corvus | will restart web now | 23:59 |