ianw | again with that 500 error. we don't collect the devstack logs so it's very hard to tell what's going on | 00:42 |
---|---|---|
openstackgerrit | Merged openstack/diskimage-builder master: Ensure redhat efi packages are reinstalled during finalise https://review.opendev.org/c/openstack/diskimage-builder/+/786804 | 01:08 |
*** hemanth_n has joined #opendev | 02:52 | |
*** ysandeep|away is now known as ysandeep | 03:55 | |
*** vishalmanchanda has joined #opendev | 04:52 | |
*** ykarel has joined #opendev | 05:29 | |
*** slaweq has joined #opendev | 05:35 | |
*** ykarel_ has joined #opendev | 05:52 | |
*** ykarel has quit IRC | 05:54 | |
*** sboyron has joined #opendev | 05:59 | |
*** ralonsoh has joined #opendev | 06:01 | |
*** marios has joined #opendev | 06:16 | |
*** eolivare has joined #opendev | 06:29 | |
*** lbragstad_ has joined #opendev | 06:36 | |
*** amoralej|off is now known as amoralej | 06:36 | |
*** lbragstad has quit IRC | 06:40 | |
*** dtantsur|afk is now known as dtantsur | 06:46 | |
*** ykarel_ is now known as ykarel | 06:50 | |
*** iurygregory has joined #opendev | 06:58 | |
openstackgerrit | Merged openstack/diskimage-builder master: dib-run-parts: stop leaving PROFILE_DIR behind https://review.opendev.org/c/openstack/diskimage-builder/+/787303 | 07:00 |
*** pongboom has joined #opendev | 07:06 | |
*** andrewbonney has joined #opendev | 07:07 | |
*** rpittau|afk is now known as rpittau | 07:12 | |
openstackgerrit | Merged opendev/system-config master: Remove IRC bots from #ara https://review.opendev.org/c/opendev/system-config/+/787891 | 07:14 |
*** ysandeep is now known as ysandeep|mtg | 07:19 | |
*** fressi has joined #opendev | 07:19 | |
*** tosky has joined #opendev | 07:42 | |
*** openstack has joined #opendev | 07:51 | |
*** ChanServ sets mode: +o openstack | 07:51 | |
*** brinzhang0 has joined #opendev | 07:52 | |
openstackgerrit | Merged openstack/project-config master: Remove IRC bots from #ara https://review.opendev.org/c/openstack/project-config/+/787892 | 07:53 |
openstackgerrit | Merged openstack/project-config master: Make gerritbot linter know all the supported events https://review.opendev.org/c/openstack/project-config/+/787894 | 07:53 |
*** jpena|off is now known as jpena | 07:54 | |
*** brinzhang_ has quit IRC | 07:55 | |
*** DSpider has joined #opendev | 08:01 | |
*** parallax has joined #opendev | 08:08 | |
*** openstackgerrit has quit IRC | 08:11 | |
*** ysandeep|mtg is now known as ysandeep | 08:11 | |
*** ysandeep is now known as ysandeep|lunch | 08:15 | |
*** calcmandan has quit IRC | 08:16 | |
*** calcmandan has joined #opendev | 08:17 | |
*** ildikov has quit IRC | 08:24 | |
*** ildikov has joined #opendev | 08:29 | |
*** hamalq has joined #opendev | 08:33 | |
*** hrw has joined #opendev | 08:36 | |
hrw | morning | 08:37 |
hrw | fungi: have you had time to find out why python3 failed on centos-8-stream-aarch64 nodes? | 08:37 |
*** hamalq has quit IRC | 08:38 | |
*** ykarel is now known as ykarel|lunch | 09:16 | |
*** ysandeep|lunch is now known as ysandeep | 09:19 | |
*** dtantsur is now known as dtantsur|bbl | 10:08 | |
*** ykarel|lunch is now known as ykarel | 10:29 | |
*** hamalq has joined #opendev | 10:34 | |
*** hamalq has quit IRC | 10:39 | |
*** jpena is now known as jpena|lunch | 11:26 | |
*** amoralej is now known as amoralej|lunch | 12:09 | |
*** dtantsur|bbl is now known as dtantsur | 12:10 | |
kopecmartin | hi, have there been any changes lately which could affect email notifications from review.opendev.org? .. e.g. notifications from reviews (owned, or cced, as reviewer ...) on new commits, zuul results, comments etc? | 12:10 |
*** jpena|lunch is now known as jpena | 12:28 | |
*** openstackgerrit has joined #opendev | 12:28 | |
openstackgerrit | Radosław Piliszek proposed openstack/project-config master: Make gerritbot notify about V-2 on kolla https://review.opendev.org/c/openstack/project-config/+/787887 | 12:28 |
yoctozepto | kopecmartin: they work for me today; any specific details you want to share? | 12:30 |
kopecmartin | yoctozepto: i've just noticed that i haven't received any email from gerrit since 24 Apr .. and I created 1 review, rechecked several ones and got cced in at least one ... and i haven't done any changes in my profile | 12:32 |
kopecmartin | very weird | 12:32 |
kopecmartin | i don't know what to check / verify | 12:32 |
yoctozepto | yeah, you spammed me as well | 12:32 |
kopecmartin | :D | 12:32 |
yoctozepto | I know it's the dreadful question but... you sure the e-mail address is correct? :D | 12:33 |
*** hamalq has joined #opendev | 12:35 | |
kopecmartin | yoctozepto: yes :) .. i'm going through all mail now to verify my filters didn't betray me | 12:38 |
*** lpetrut has joined #opendev | 12:38 | |
*** hamalq has quit IRC | 12:39 | |
yoctozepto | oh those treacherous filters! | 12:41 |
fungi | hrw: no, i haven't found time to work out how ansible is invoking python to be able to more directly recreate the fault and load the coredump into gdb | 12:47 |
fungi | kopecmartin: i suppose one possibility is your mailserver is rejecting messages from gerrit. if you can't find them being deleted or misfiled after delivery, i could check our mta logs for outbound delivery errors related to your address | 12:49 |
*** lbragstad_ is now known as lbragstad | 12:51 | |
*** amoralej|lunch is now known as amoralej | 13:02 | |
fungi | #status log Requested Spamhaus PBL delisting for the IPv4 address of review.opendev.org, which should take effect within the hour | 13:04 |
openstackstatus | fungi: finished logging | 13:04 |
fungi | kopecmartin: ^ see if you start receiving notifications by ~14:00 utc | 13:04 |
kopecmartin | fungi: thank you | 13:04 |
fungi | thanks for mentioning you were having a problem! | 13:04 |
fungi | now if we can just figure out how it keeps getting readded to the pbl (or rather how its exclusion keeps getting removed) | 13:05 |
*** sshnaidm|afk is now known as sshnaidm | 13:10 | |
fungi | infra-root: just a heads up, i intend to try to stay away from the computer for a lot of this week so i can catch up on a backlog of tasks around the house, but i'll probably quietly knock out a few lingering tasks here as well and keep an ear to the ground in case of emergencies | 13:17 |
*** vishalmanchanda has quit IRC | 13:41 | |
*** d34dh0r53 has quit IRC | 13:48 | |
frickler | fungi: spamhaus says "NOTE: Exclusions are only valid for 1 year", not sure how to best handle that but I'll make a note in my calendar for next year. we also should check whether we can get some non-PBLed address if we ever move the host again, like to vexxhost | 13:48 |
frickler | indeed, the last time this happened was 2020/04/17 according to my logs | 13:49 |
frickler | so one year plus 7 days grace period sounds very plausible | 13:50 |
*** d34dh0r53 has joined #opendev | 13:57 | |
*** ysandeep is now known as ysandeep|afk | 14:15 | |
*** slaweq has quit IRC | 14:16 | |
*** slaweq_ has joined #opendev | 14:16 | |
*** slaweq_ is now known as slaweq | 14:16 | |
openstackgerrit | Merged openstack/project-config master: Make gerritbot notify about V-2 on kolla https://review.opendev.org/c/openstack/project-config/+/787887 | 14:18 |
*** vishalmanchanda has joined #opendev | 14:21 | |
*** openstackgerrit has quit IRC | 14:23 | |
fungi | frickler: oh, thanks i hadn't noticed the exclusions expire. yeah we probably need to go back through and check a bunch of addresses periodically in that case. and yes this is really just rackspace i think, they blanket pbl listings for their entire address space so they can not have to deal with spam complaints about compromised customers or folks abusing their services | 14:23 |
*** fressi has quit IRC | 14:24 | |
*** whoami-rajat has joined #opendev | 14:25 | |
*** lpetrut has quit IRC | 14:30 | |
*** hamalq has joined #opendev | 14:36 | |
hrw | fungi: thanks for info | 14:36 |
*** hamalq has quit IRC | 14:40 | |
*** lassimus has quit IRC | 14:50 | |
clarkb | ianw: we probably need to (temporarily) collect some openstack logs in the nodepool/dib jobs to try and track that down | 14:50 |
clarkb | I'm catching up on email and scroll back this morning, but I'm not seeing anything that would indicate today is a bad day to start the zk swaps. I do have one errand I think I need to run this morning but after that I can dive into zk | 14:51 |
*** ysandeep|afk is now known as ysandeep | 15:14 | |
*** ykarel has quit IRC | 15:21 | |
frickler | #status log Requested Spamhaus PBL delisting for the IPv4 address of mirror01.dfw.rax.opendev.org | 15:23 |
openstackstatus | frickler: finished logging | 15:23 |
frickler | #status log Requested Spamhaus PBL delisting for the IPv4 address of nb01.opendev.org | 15:35 |
openstackstatus | frickler: finished logging | 15:35 |
frickler | fungi: ^^ fyi these are the ones I found in my log from february last year that spamhaus reported in PBL again | 15:36 |
fungi | frickler: thanks! those are probably less critical since the only e-mail they send is to our sysadmins, but still good to have clean. my biggest worries are mailing list servers, gerrit, mediawiki, storyboard... anything which sends notifications or other messages to application users | 15:37 |
*** amoralej is now known as amoralej|off | 15:45 | |
*** mlavalle has joined #opendev | 15:49 | |
*** mnaser has joined #opendev | 15:53 | |
mnaser | infra-root: https://opendev.org/openstack/cinder/commit/f4359c523f4cf47eabad7fdcfa3f35c22ebc619e cinder has decided to bump ussuri oslo.serialization==3.1.2, but our wheels have not updated -- http://mirror.ca-ymq-1.vexxhost.opendev.org/wheel/ubuntu-18.04-x86_64/oslo-serialization/ | 15:54 |
mnaser | how can i trigger a build? | 15:54 |
*** rpittau is now known as rpittau|afk | 15:55 | |
clarkb | mnaser: they should run daily, so if that release is more than a day old we may need to check why the existing builds haven't succeeded first | 15:55 |
clarkb | oh unless we only build what is constrained and we just skipped over it maybe | 15:56 |
mnaser | clarkb: this seems to have merged 2 months ago :-P | 15:56 |
mnaser | https://review.opendev.org/c/openstack/cinder/+/774680 | 15:56 |
clarkb | mnaser: right I'm saying that it seems likely the reason it isn't there is not that it needs to be manually triggered but some other fundamental problem | 15:57 |
clarkb | and we should address that | 15:57 |
*** ykarel has joined #opendev | 15:58 | |
*** hamalq has joined #opendev | 15:58 | |
mnaser | where are the jobs listed? | 15:58 |
*** hemanth_n has quit IRC | 15:58 | |
clarkb | they are in openstack/openstack-zuul-jobs as build-wheel-cache-* | 15:58 |
clarkb | I think anyway, still digging that up | 15:59 |
clarkb | mnaser: https://pypi.org/project/oslo.serialization/3.1.2/#files that has a wheel already, so I think we don't cache that separately | 15:59 |
mnaser | ah, yes, i think my issues are stemming from the fact i have a cached upper-constraints | 16:01 |
clarkb | mnaser: roughly the way things should work is if there is already a wheel then don't bother building and caching one locally. I don't understand why 4.1.0 has a wheel there, but it shouldn't if I remember how this works properly | 16:02 |
*** hamalq has quit IRC | 16:03 | |
clarkb | publish-wheel-cache-ubuntu-bionic is the actual job. the build jobs are there to test the tooling to do the publication | 16:03 |
*** ykarel has quit IRC | 16:03 | |
clarkb | https://zuul.opendev.org/t/openstack/build/c6f4c525447b4513994e6dddc184c89d/logs is the most recent build if you want to look it over, but ya I don't think we expect to cache that because it has a wheel on pypi | 16:03 |
*** hamalq has joined #opendev | 16:04 | |
*** ysandeep is now known as ysandeep|away | 16:04 | |
clarkb | basically the wheel cache is there primarily to avoid building wheels at runtime which can be expensive for things like lxml, cryptography and friends, etc. If there is already a wheel on pypi we don't bother to rebuild it | 16:09 |
clarkb | in the case of cryptography we've shifted to helping them build the wheels that go on pypi so everyone benefits too | 16:09 |
*** roman_g has joined #opendev | 16:23 | |
dtantsur | ianw or anyone: could you please release glean with the latest fixes? | 16:25 |
roman_g | Good morning, team. I'm experiencing problems with CityCloud KNA1. Could you have a look if there are problems on OpenDev Zuul side, please? Thank you! https://grafana.opendev.org/d/QQzTp6EGz/nodepool-airship-citycloud?orgId=1&from=now-7d&to=now | 16:26 |
roman_g | Seems that some instances are stuck, and no new instances are being able to launch there. | 16:27 |
clarkb | roman_g: without looking I'm fairly confident in saying it is most likely the lack of hypervisors capable of scheduling the requested nodes. But I'll check | 16:27 |
clarkb | dtantsur: I think there were a couple more changes ianw wanted to get in? | 16:27 |
roman_g | clarkb Error Node Launch Attempts is zero, which confuses me. And graphs show that instances are stuck there. | 16:28 |
roman_g | Since Thursday evening. There was PTG, so probably no one noticed issue earlier. | 16:28 |
roman_g | dtantsur o/ | 16:29 |
fungi | clarkb: i'm not here at the moment, but one possibility for the airship-kna1 lack of nodes could be "stuck" node requests, i've noticed over the past month the launchers (especially in the wake of high launch error counts) have a tendency to not complete turning over node requests they lock after they've satisfied or decided to decline them | 16:29 |
clarkb | I see the issue | 16:30 |
clarkb | it's a config problem | 16:30 |
dtantsur | clarkb: sure, but the work seems stuck there | 16:30 |
clarkb | dtantsur: we've all been incredibly busy :/ | 16:30 |
dtantsur | I imagine | 16:30 |
dtantsur | that's why I wonder if we could have a release now. It fixes a serious issue for one of the features in ironic. | 16:30 |
fungi | dtantsur: care to skim the commits since the last tag and suggest what the next appropriate semver version should be for it? | 16:31 |
roman_g | clarkb would it be solved by you, or do I need to contact CityCloud and ask them to check and do something? | 16:31 |
dtantsur | sure | 16:31 |
*** openstackgerrit has joined #opendev | 16:32 | |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Fix nl02's provider config list https://review.opendev.org/c/openstack/project-config/+/788044 | 16:32 |
clarkb | roman_g: ^ thats the fix | 16:32 |
roman_g | Thank you, clarkb ! | 16:32 |
dtantsur | fungi: things there are essentially bug fixes, but they're quite substantial, so it may be worth bumping the minor version to highlight it. | 16:32 |
clarkb | ya we can check with ianw if he is comfortable with a release as is before landing the additional refactors. I suspect so since everything should be fairly forward/backward compatible | 16:33 |
clarkb | I wonder if that yaml would've errored if we weren't doing the reserialization stuff | 16:34 |
fungi | duplicate keys aren't verboten by pyyaml's parser | 16:35 |
fungi | you have to jump through some hoops to detect them at all | 16:35 |
clarkb | ya you have to subclass some portion of the parser to check for key existence before adding and raise an exception if you hit that ? | 16:39 |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: debian-minimal: Set bullseye version https://review.opendev.org/c/openstack/diskimage-builder/+/787665 | 16:43 |
clarkb | this change depends on a modification to the nodepool jobs now to try and debug those nova 500 errors | 16:44 |
clarkb | dtantsur: did you want to review ianw's glean stack to ensure it doesn't conflict with ironic needs? https://review.opendev.org/c/opendev/glean/+/782010/4 is the bottom of that stack | 16:45 |
dtantsur | clarkb: I've seen them, and I *think* they're fine, but to say for sure I need to conduct pretty cumbersome testing | 16:45 |
clarkb | it is a refactor to try and make things more clear as to what runs and when they run | 16:46 |
dtantsur | which I don't have much time for :( | 16:46 |
clarkb | I think it could make your CI testing take longer because we'll be back to running multiple invocations of things for configuring a single interface, but it should force ordering and make it clear that is what is happening | 16:46 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Roles to create, cleanup and promote snapshots in ec2 https://review.opendev.org/c/zuul/zuul-jobs/+/787677 | 16:48 |
*** marios is now known as marios|out | 16:48 | |
openstackgerrit | Merged openstack/project-config master: Fix nl02's provider config list https://review.opendev.org/c/openstack/project-config/+/788044 | 16:50 |
*** whoami-rajat has quit IRC | 16:50 | |
clarkb | roman_g: ^ that should apply in the next 10 minutes or so | 16:51 |
roman_g | clarkb thank you! | 16:51 |
*** marios|out has quit IRC | 16:52 | |
roman_g | clarkb deploy failed https://zuul.opendev.org/t/openstack/build/7f51faaa68be4680a8417c3d93af2e57/console | 17:06 |
clarkb | I bet it is nb03 still causing problems. Let me check the specific fix | 17:07 |
clarkb | roman_g: I think the fix applied successfully to nl02 which is all we should need | 17:07 |
roman_g | clarkb I see builds starting. Thank you. | 17:08 |
*** jpena is now known as jpena|off | 17:10 | |
*** sshnaidm is now known as sshnaidm|afk | 17:12 | |
*** ralonsoh has quit IRC | 17:20 | |
*** roman_g has quit IRC | 17:30 | |
*** andrewbonney has quit IRC | 17:37 | |
*** dtantsur is now known as dtantsur|afk | 17:41 | |
*** sboyron has quit IRC | 17:50 | |
*** vishalmanchanda has quit IRC | 18:21 | |
*** eolivare has quit IRC | 18:33 | |
*** stand has joined #opendev | 18:42 | |
*** gouthamr has joined #opendev | 19:01 | |
*** DSpider has joined #opendev | 19:38 | |
*** amoralej|off has quit IRC | 20:31 | |
*** jpena|off has quit IRC | 20:32 | |
*** fbo has quit IRC | 20:33 | |
*** fbo has joined #opendev | 20:37 | |
*** slaweq has quit IRC | 21:06 | |
*** stevebaker has joined #opendev | 21:21 | |
clarkb | ok that errand took much much longer than I anticipated and I haven't even had lunch yet :/ | 21:27 |
clarkb | I'm not feeling in a great spot to do the zk thing now, first thing tomorrow though | 21:27 |
fungi | i expect to be around more tomorrow. doing more stuff inside the house and fewer errands outside | 21:38 |
fungi | can arrange to be around for zk swaps | 21:38 |
clarkb | cool, I'll try to dive right into that when my day starts | 21:40 |
ianw | clarkb: i guess we should probably do a release of glean with dtantsur's fix and then put the other bits in and do another release. they're more code motion, and that will give people a point to pin to in an emergency | 22:02 |
ianw | (not that i expect one, but, you never know) | 22:02 |
clarkb | ianw: cool, that worked for me if it also worked for you | 22:06 |
ianw | can do soon | 22:07 |
clarkb | ianw: I got a change up to collect logs in the nodepool changes that I depends-on'd to the dib change. | 22:10 |
clarkb | Haven't caught a nova failure yet though | 22:10 |
ianw | oh great, thanks. that's one less thing on the todo list :) | 22:10 |
clarkb | feel free to remove the depends on if you want to try your luck instead (I don't expect we'll be merging the nodepool side change) | 22:10 |
fungi | clarkb: ianw: i'm struggling a bit to work out what ansible might be calling during fact getting which would crash the python3 interpreter on centos 8 for arm64, wanting to come up with a clean reproducer to generate a core dump on the node i've held... you don't happen to know how to emulate fact gathering from the local command line? | 22:10 |
clarkb | fungi: ansible -i localhost, -m setup | 22:11 |
clarkb | er ansible localhost -i localhost, -m setup | 22:11 |
clarkb | you need to specify the target in addition to the inventory | 22:11 |
fungi | the comma there is intentional? | 22:11 |
clarkb | yes it tells ansible to treat the arg as the actual inventory and not a file to find the inventory in | 22:11 |
ianw | fungi: ansible -i <host>, <host> -m setup | 22:12 |
clarkb | and then you can see what that setup module is doing, but I'm not sure of the exact behavior | 22:12 |
ianw | heh or what clarkb said | 22:12 |
fungi | i guess i need to actually install ansible on the held node in that case | 22:14 |
clarkb | or setup the inventory on some other host and target that node | 22:14 |
clarkb | but it may be easier to track what is happening if everything is on one machine | 22:15 |
fungi | was more wondering if i could run something involving one of these python zipballs ansible stuck in /tmp | 22:15 |
clarkb | in theory I think you can, but you'd have to figure out which one corresponded to the setup module and possibly hack it up to only run that | 22:15 |
clarkb | also it may expect some sort of command and control input | 22:15 |
ianw | i'd probably be tempted in this case to replace python with a shell script that either runs it under strace to start, or gdb if that doesn't help | 22:16 |
fungi | yeah, they don't make this simple to debug in isolation | 22:16 |
fungi | alternatively, any idea where it would have stuck the core it claims to have dumped? | 22:17 |
ianw | i feel like with default ulimits that would not be done? | 22:17 |
ianw | my method of debugging the debian distro checking was to place a syntaxerror with debugging info in, which i could partially see in the output of "-vvv" | 22:18 |
ianw | so yeah, i'm also not sure how you're supposed to get info out of the remote side :) | 22:18 |
* fungi shakes fist at ulimit rules preventing saving critical debugging data | 22:19 | |
fungi | i remember the days when /var/crash was sacrosanct | 22:20 |
fungi | openbsd still lets things write to /var/crash by default | 22:23 |
* fungi is not bitter | 22:23 | |
fungi | openbsd 6.9 should officially happen this weekend though | 22:24 |
fungi | 50th release! | 22:24 |
ianw | clarkb: logs LGTM, what do you think about putting it under /var/log/nodepool/devstack/ though to keep them separate? | 22:28 |
clarkb | ianw: that's fine, I just did the most likely to succeed thing since I suspect that zuulians may not want to merge that change at all | 22:32 |
*** zimmerry has quit IRC | 22:49 | |
*** zimmerry has joined #opendev | 22:53 | |
*** tosky has quit IRC | 23:01 | |
ianw | sigh, nb03 has filled up its disk | 23:08 |
clarkb | ianw: I was worried that may happen with all of the new image builds there | 23:09 |
clarkb | and adding raw | 23:09 |
clarkb | if we can fit just raw onto that host then switching linaro-us over to raw may be a good option? | 23:09 |
ianw | kevinz: ^ can/should we use raw images with linaro-us? | 23:10 |
ianw | hrm we have at least an old 100gb volume to remove | 23:12 |
ianw | it seems we have quota for another 400, which doubles the current size. i'll attach that | 23:15 |
clarkb | sounds good | 23:16 |
ianw | #status log nb03 : doubled /opt volume to 800g to allow for more images after enabling raw with https://review.opendev.org/c/opendev/system-config/+/787293 | 23:20 |
openstackstatus | ianw: finished logging | 23:20 |
ianw | ok rebooted for good measure and it's trying again | 23:27 |
clarkb | fungi: one other thought on the segfaulting. You can limit which facts are gathered. You might be able to narrow down what is causing it using that? | 23:32 |
clarkb | fungi: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/setup_module.html#parameter-gather_subset | 23:32 |
clarkb | basically say gather_subset: !all,!min,foo then !all,!min,bar and so on | 23:33 |
clarkb | and if you trip with one and not the others that would be the thing to look into | 23:33 |
fungi | aha, yeah thanks, that could help | 23:40 |
ianw | other than nb03, we don't have any other ZK hosts outside RAX DFW, do we? | 23:43 |
clarkb | ianw: I don't think so | 23:43 |
clarkb | the launchers, builders, and scheduler are currently it I think. But the executors will be added soon | 23:43 |
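
A note on the 16:35 to 16:39 exchange about duplicate YAML keys: fungi and clarkb describe subclassing part of PyYAML's loading machinery to check for an existing key and raise an exception. The following is a minimal sketch of that idea, assuming PyYAML is installed. The `UniqueKeyLoader` and `load_unique` names and the `providers` sample document are hypothetical illustrations, not taken from the nodepool or project-config code.

```python
from collections.abc import Hashable

import yaml
from yaml.constructor import ConstructorError

# Tag PyYAML assigns to the "<<" merge key; it is skipped because it is
# expanded by the base class rather than stored as an ordinary key.
MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that rejects repeated mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            if key_node.tag == MERGE_TAG:
                continue
            key = self.construct_object(key_node, deep=deep)
            if not isinstance(key, Hashable):
                # Leave unhashable keys for the base class to report.
                continue
            if key in seen:
                raise ConstructorError(
                    "while constructing a mapping",
                    node.start_mark,
                    f"found duplicate key {key!r}",
                    key_node.start_mark,
                )
            seen.add(key)
        # Defer the real construction to SafeLoader once keys are unique.
        return super().construct_mapping(node, deep=deep)


def load_unique(text):
    """Parse YAML text, raising ConstructorError on duplicate keys."""
    return yaml.load(text, Loader=UniqueKeyLoader)
```

With the stock `yaml.safe_load`, a document that repeats a key parses without complaint and the later value silently wins, which matches fungi's remark that duplicates are not rejected by default. `load_unique` raises on the same input instead.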
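
A note on the 22:11 and 23:32 suggestions for reproducing the python3 crash: clarkb proposes running the setup module locally with `ansible localhost -i localhost, -m setup` and then narrowing the cause with `gather_subset: !all,!min,<subset>`. The sketch below loops over a few subsets and records which ones make the command exit non-zero. It assumes the `ansible` CLI is installed on the node. The helper names, the chosen subset list, and the added `--connection local` option are assumptions for illustration, not part of the commands quoted in the log.

```python
import subprocess

# A few subset names accepted by the setup module; extend as needed.
CANDIDATE_SUBSETS = ("hardware", "network", "virtual")


def build_command(subset, host="localhost"):
    """Return the ansible ad-hoc command that gathers one fact subset."""
    return [
        "ansible", host,
        "-i", f"{host},",           # trailing comma: inline inventory, not a file
        "--connection", "local",    # assumption: run on this machine without ssh
        "-m", "setup",
        "-a", f"gather_subset=!all,!min,{subset}",
    ]


def probe_subsets(subsets=CANDIDATE_SUBSETS):
    """Run each subset in turn and return those whose command failed."""
    failing = []
    for subset in subsets:
        result = subprocess.run(
            build_command(subset), capture_output=True, text=True
        )
        if result.returncode != 0:
            failing.append(subset)
    return failing
```

Calling `probe_subsets()` on the held node would return the subsets that failed. A subset that fails while the others pass is the one to look into, as clarkb describes.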