Wednesday, 2021-05-26

<openstackgerrit> Merged opendev/system-config master: static site
[02:47] <clarkb> fwiw storyboard seems to have a happy cert now
[02:47] <clarkb> I'll follow up with translate tomorrow
[02:49] <clarkb> hrm, storyboard actually seems to be serving me both certs. May need to more forcefully restart apache. But again, tomorrow :)
[02:50] <corvus> i got an le cert
[02:51] <clarkb> corvus: ya I refreshed and got the old one
[02:51] <clarkb> it happens when the apache reload doesn't force all the children to die and respawn
[02:51] <clarkb> doing a restart of apache clears it up usually
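clarkb's diagnosis is that a reload can leave some Apache children still presenting the old certificate. One way to check this from outside is to connect several times and compare the certificates that come back. The following is a minimal Python sketch, assuming direct TLS access to the host on port 443. The function names are illustrative and are not part of any OpenDev tooling.

```python
import hashlib
import ssl


def fingerprint(pem: str) -> str:
    """Return the SHA-256 fingerprint of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def served_fingerprints(host: str, port: int = 443, attempts: int = 10) -> set:
    """Open several TLS connections and collect the distinct certificates seen.

    More than one fingerprint suggests that some server processes are still
    presenting an older certificate.
    """
    seen = set()
    for _ in range(attempts):
        # Each call opens a fresh connection, which may be handled by a
        # different Apache child process.
        pem = ssl.get_server_certificate((host, port))
        seen.add(fingerprint(pem))
    return seen


if __name__ == "__main__":
    import sys

    for fp in sorted(served_fingerprints(sys.argv[1])):
        print(fp)
```

Two fingerprints would match what clarkb saw. A single fingerprint is only suggestive, not conclusive, because each connection is scheduled independently.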
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Use containers for functional testing
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Use containers for functional testing
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Use containers for functional testing
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Use containers for functional testing
[04:20] <prometheanfire> freenode is hijacking the old gentoo channels, so...
[04:21] <prometheanfire> hmm, can I move an openstack project's channel to another network?
[04:22] <prometheanfire> asking for a... friend
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Remove octvia-v1-dsvm-* jobs
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Use containers for functional testing
<openstackgerrit> Matthew Thode proposed openstack/diskimage-builder master: Do not uninstall non-installed packages
[04:57] <frickler> prometheanfire: didn't gentoo move to libera? also, did you see ?
[05:01] <frickler> prometheanfire: other than that, each project is free to move their channel anywhere they want, but we can only run our service bots in one network. we are preparing to move those to oftc, but no decision has yet been made afaict
[05:03] <prometheanfire> we did
[05:03] <prometheanfire> freenode decided to take over some of our channels after we mentioned the move in our /topic
[05:04] <prometheanfire> oftc is ok, maybe not what I would have personally chosen, but I've been there before :P
[05:06] <frickler> freenode taking over old channels is a known "issue", one of the reasons to motivate a fast move IMO. just happened for #systemd, too
[05:08] <prometheanfire> heheheh, I mean it's really unfortunate how deep they stepped in it
[05:08] <prometheanfire> like watching a train wreck
[05:08] <frickler> oh, and also ubuntu it seems
[05:09] <prometheanfire> we had a june 14th date for our deadline, I feel like we will be pushing that up now
[06:54] <ianw> #status log retired and redirected to static site
[06:55] <openstackstatus> ianw: finished logging
[06:55] <ianw> i'll shut down and make a note to remove it in a week or so in case of any last minute thoughts
[06:55] <ianw> vale ask.openstack.org
<openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Use containers for functional testing
[07:46] <paladox> they are calling people who move “mentally ill”
[08:52] <mnasiadka> Morning - can anybody look at zuul jobs for 792999,2? one job has been hanging in the queued state for over 1h30m
[10:06] <jonher> paladox yeah.. they are basically referring to "policy" but there is no policy against mentioning that you moved to another network in case people are running old versions with old irc info. it feels like censorship at this point, and they are not addressing questions about it in #freenode and set it to +z so people can't talk about it openly
[10:07] <jonher> no matter the "fight" that led to this, this is not good and makes me want to leave freenode but not the openstack community
[10:13] <tosky> johnsom: totally correct, just one thing: #freenode iirc has always been set so that only moderators can read all messages
[10:18] <yoctozepto> infra-root: would it be possible to touch as it has not run one job at all
[10:20] <jonher> tosky: there was activity there before i think, and today when i re-joined i noticed an op saying: "<@eskimo> oh this chan is still +z lol" so i think that suggests it is not usually on, but idk about that part, i'm just upset about the "censorship" and unwritten policies being applied regardless of the previous issues
[10:25] *** sshnaidm|afk is now known as sshnaidm
[11:48] <yoctozepto> infra-root config-core please feel very welcome to join TC today at 14:30 UTC
[12:46] <fungi> #status log Restarted the nodepool launcher container on nl02 in order to free stuck node request locks
[12:46] <openstackstatus> fungi: finished logging
[12:46] <fungi> yoctozepto: it's got a node assignment now
[12:49] <fungi> there were a few other changes in a similar situation and they all seem to be progressing now
[13:07] <yoctozepto> fungi: thanks
[13:07] <fungi> any time
[13:57] <ysandeep|ruck> #opendev we are noticing some jobs failing due to network related issues..
<ysandeep|ruck> for example:
[13:57] <ysandeep|ruck> Failed to establish a new connection: [Errno 101] Network is unreachable',))
<ysandeep|ruck> another example:
[13:58] <ysandeep|ruck> Error: Failed to download metadata for repo 'centos-8-fix': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
[13:59] <ysandeep|ruck> Many jobs in post_failure too.
[13:59] *** whoami-rajat_ has joined #opendev
[14:07] <fungi> ysandeep|ruck: i'm about to head to an appointment but can take a look when i get back if nobody else beats me to it. do they seem to be primarily in a particular provider?
[14:08] <ysandeep|ruck> fungi, i found them in 2 providers already: limestone and gra1.ovh
[14:08] <ysandeep|ruck> [Could not resolve host:]
[14:08] <fungi> unfortunately the log urls lack a lot of the build metadata, it would help if you linked to the build report instead so we don't have to dig the build uuid out of the inventory and construct the build url ourselves
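fungi's point about digging the build UUID out of the inventory can be sketched in a few lines of Python. This sketch assumes the job inventory (typically published in the logs as zuul-info/inventory.yaml) has already been parsed into a dict. It also assumes the Zuul variables sit under all/vars/zuul, and that the dashboard is at https://zuul.opendev.org with build pages at /t/<tenant>/build/<uuid>. The function name is hypothetical.

```python
from urllib.parse import quote

# Assumption: OpenDev's Zuul dashboard; adjust for other deployments.
ZUUL_WEB = "https://zuul.opendev.org"


def build_url(inventory: dict, zuul_web: str = ZUUL_WEB) -> str:
    """Turn a parsed Zuul inventory into a link to the build report page.

    Assumes the job variables are exposed under all -> vars -> zuul and
    include the tenant name and the build UUID.
    """
    zuul_vars = inventory["all"]["vars"]["zuul"]
    tenant = quote(zuul_vars["tenant"], safe="")
    build = quote(zuul_vars["build"], safe="")
    return f"{zuul_web.rstrip('/')}/t/{tenant}/build/{build}"


if __name__ == "__main__":
    # Hypothetical, heavily trimmed inventory after YAML parsing.
    inventory = {"all": {"vars": {"zuul": {
        "tenant": "openstack",
        "build": "0123456789abcdef0123456789abcdef",
    }}}}
    print(build_url(inventory))
```

Linking the build report directly, as fungi asks, avoids this step entirely. The sketch only shows what reviewers otherwise have to reconstruct by hand.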
[14:08] <ysandeep|ruck> Timeout was reached for [Operation too slow. Less than 1000 bytes/sec transferred the last 30
[14:10] <ysandeep|ruck> ack o/
[14:11] <ysandeep|ruck> fungi, i will share the build report once zuul reports back
[14:14] <ysandeep|ruck> also too many jobs hitting post_failure.. logs not available.
[14:16] <fungi> infra-root: ^ if someone has time to look before i get done with my errands
[14:40] <clarkb> The post failures look like issues hitting ovh keystone. I can hit it from here currently. I think we wait on that and see if it persists
[14:40] <clarkb> if it persists or gets worse we can pull ovh swift from log destinations
[14:41] <clarkb> ysandeep|ruck: for the other issues they all look similar but different and in different clouds. For example I can currently hit from here (note you can always test that yourself too to see if it is persistent)
[14:42] <clarkb> ysandeep|ruck: for the upper constraints fetch why are your jobs not using the repo in zuul?
[14:43] <clarkb> ysandeep|ruck: is also accessible for values of X 1-8. These are our backends for opendev.org
[14:44] * ysandeep|ruck in a mtg currently, will ping back shortly
[14:44] <clarkb> ysandeep|ruck: in general the internet is known to be flaky and that is why we suggest jobs rely on local resources as much as possible. This includes using constraints from the local openstack/requirements repo
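clarkb's suggestion to rely on the local openstack/requirements checkout can be sketched in Python. This sketch assumes Zuul has prepared the repo under ~/src/opendev.org/openstack/requirements, which is where required-projects typically end up on OpenDev test nodes. It falls back to the published constraints URL only when that checkout is missing. The helper names are hypothetical.

```python
import sys
from pathlib import Path

# Assumption: Zuul has prepared openstack/requirements on the test node at
# this path (for example because it is listed in the job's required-projects).
LOCAL_REQUIREMENTS = Path("~/src/opendev.org/openstack/requirements").expanduser()

# Network fallback, used only when no local checkout is present.
REMOTE_CONSTRAINTS = "https://releases.openstack.org/constraints/upper/master"


def constraints_source(local_repo: Path = LOCAL_REQUIREMENTS,
                       remote: str = REMOTE_CONSTRAINTS) -> str:
    """Prefer the local upper-constraints.txt; fall back to the remote copy."""
    local_file = local_repo / "upper-constraints.txt"
    if local_file.is_file():
        return str(local_file)
    return remote


def pip_install_cmd(packages, local_repo: Path = LOCAL_REQUIREMENTS) -> list:
    """Build a pip command line constrained by whichever source was chosen."""
    return [sys.executable, "-m", "pip", "install",
            "-c", constraints_source(local_repo), *packages]
```

The local file also matches the exact requirements commit Zuul is testing, including any speculative changes. The remote URL cannot provide that, and it adds exactly the kind of network fetch that failed above.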
[14:47] <clarkb> infra-root all of the storyboard apache processes seem recent now so I guess apache aged them out automatically
[14:50] <clarkb> I have approved the translate LE switch change
[14:52] * ysandeep|ruck back
[14:53] <ysandeep|ruck> clarkb, thanks for looking, yes i can also hit and locally, so it was just a transient network issue.
[14:54] <clarkb> ysandeep|ruck: basically all of those look like similar but different "the internet is unreliable" issues. If they persist we can dig in further but as is, it's hard to say anything actually went wrong on our end. repomd.xml being too slow could be the backend we proxy to. the post failures were an issue with ovh keystone that seems to be working now and so on
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Update certcheck additional domains
[14:55] <clarkb> if they persist then we can dig in further and see if we need to mitigate or report upstream etc
[14:55] *** noonedeadpunk_ has joined #opendev
[14:55] <ysandeep|ruck> clarkb, thanks for the recommendation about using constraints from the local openstack/requirements repo. I will discuss moving to it instead of pulling over the network within the tripleo-ci team.
[14:57] <ysandeep|ruck> clarkb, we will report here if too many jobs hit the network issue or if the issue persists.
[14:57] <ysandeep|ruck> clarkb++ fungi++
<openstackgerrit> Merged opendev/system-config master: Switch translate to LE cert
[15:25] <fungi> clarkb: catching up, but was it timeouts or error responses from ovh keystone? if the latter (which we see for brief periods occasionally) there's usually a ticket listed for it on their service status site
[15:26] <clarkb> fungi: I think it was a timeout establishing a tcp connection
[15:27] <fungi> ahh, okay. from where? could it be a recurrence of the cogent routing problems we saw with vexxhost a few days ago?
[15:27] <clarkb> fungi: ze04 was one of the nodes. that is rax dfw
[15:28] <clarkb> I suppose if cogent is between ovh and rax then ya, it's possible
[15:28] <clarkb> and might explain some of the other networking issues we saw
[15:28] *** ysandeep|ruck is now known as ysandeep|away
[15:32] <fungi> nah, i didn't even see rax traverse cogent to reach vexxhost, i don't think they peer with them
[15:34] <fungi> i suppose it's possible we might transit cogent between rax's backbone peers and ovh's, but i think we wind up going through zayo and abovenet
[15:34] <fungi> i'd have to do some traceroutes to confirm
[15:36] <clarkb> could be a blip in another provider that ovh and rax peer with
[15:36] <clarkb> we saw other similar connection type issues in the jobs too (hitting things like the proxy caches)
[15:38] <clarkb> fungi: is an easy followup after we LE'd some services. I'm going to look at openstackid next too
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Switch the IRC access check to OFTC
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Revert "Accessbot OFTC channel list stopgap"
<openstackgerrit> Jeremy Stanley proposed opendev/system-config master: Retool accessbot for OFTC
[17:33] <fungi> that's ^ coming along but i'm still trying to wrap my head around specifically how to integrate per-channel admins and ops
<openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Remove AJaeger from accessbot
<openstackgerrit> Merged opendev/system-config master: Remove AJaeger from statusbot
<frickler> fungi: not sure if you looked into the failures for your oftc patch yet, those seem unrelated to me, looks like a regression with the latest pbr meeting older python
[18:38] <fungi> frickler: neat, no i haven't yet, thanks for checking in on it
[18:39] *** dmellado has quit IRC
[18:39] <fungi> i saw 793213 failed project-config-irc-access as well and that change should have been basically non-impacting
[18:40] <frickler> fungi: the latter is a different thing: 'openstackinfra does not have permissions on #rdo'
[18:41] <fungi> neat, i wonder why that's suddenly a problem on freenode
[18:41] <frickler> because freenode admins took over the channel, I'd assume
[18:42] <fungi> ahh, fun. yeah i guess we need to clean it out of our config for now
[18:43] <fungi> (that'll mean yet still more rebases for my wip stack on project-config, but c'est la vie)
<frickler> oh, they are even listed in the error
[18:44] <frickler> maybe if we don't manage to move fast, just disable this check until we're on the other side
[18:45] <fungi> well, my wip stack also removes #rdo because they registered it separately some time ago on oftc so our opendevaccess account doesn't have master role on it there yet either. i have a followup task to get with their current channel admins to add us
[18:46] <clarkb> fungi: we probably shouldn't be managing their IRC channels anyway?
[18:46] <fungi> channels in a similar state are #edeploy, #openstack-sahara and #puppet-openstack
[18:47] <fungi> though that's probably an easy set of tasks for some other volunteers to take up (look at the current access lists for those and try to get their admins to add opendevaccess with master access)
[18:48] <fungi> if they get fixed before this weekend we can stop removing them in 792842 and also take them out of the revert of that in 792857
[18:49] <fungi> clarkb: i think we were previously managing #rdo because they wanted our gerritbot and maybe also meetbot functionality in there
[18:49] <fungi> (possibly statusbot as well)
<openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Remove AJaeger from accessbot
[18:50] <clarkb> I see
[18:52] <frickler> humm, the one thing I missed saving from zuul01 is my bash history, which I used to look up the details of how to submit a hold ;-S
[18:54] <frickler> but that's likely to be some docker invocation now anyway, right?
[18:55] <frickler> "sudo docker exec -it zuul-scheduler_scheduler_1 zuul autohold-list --tenant openstack" seems to work
[18:57] <frickler> although it's weird that I need to specify a tenant and the result still lists multiple tenants
[18:58] <fungi> frickler: perhaps easier, `sudo docker-compose exec zuul-scheduler zuul autohold-list --tenant openstack`
[19:01] <frickler> fungi: ah, yes, that's a bit simpler. anyway, holding a node for looking at the pbr failures now, not sure if I'll still do it today, though
[19:04] <fungi> frickler: also we'll probably soon have a `zuul` wrapper which calls `docker-compose exec zuul-scheduler zuul $@` but long term using zuul-client may make more sense (can it do autoholds already?)
[19:09] *** whoami-rajat_ has quit IRC
[19:10] *** iurygregory has quit IRC
[19:10] *** iurygregory has joined #opendev
[19:20] *** akurbatov has quit IRC
[19:21] *** avass has joined #opendev
[19:46] <clarkb> the openstackid LE stuff is a bit more involved. And I'm juggling new computer fun, but will try to get patches up today
[20:12] *** hamalq has joined #opendev
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Provision LE certs for
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Switch openstackid to LE certs
[20:28] <clarkb> I need to update dns before we can land those ^ doing that now
[22:13] <ianw> clarkb: sorry! accidentally left the window open when i fiddled the dns yesterday
[22:19] <clarkb> thanks! I will go update openstackid dns momentarily
[22:23] <clarkb> infra-root ok DNS records for should be in place now
[22:41] <clarkb> ianw: and are easy reviews. The openstackid LE changes are up now too. I'm happy to approve those tomorrow after they have had some review and can be monitored
[23:06] <ianw> lgtm, thanks! be nice to have that additional domains list empty! :)
<openstackgerrit> Merged opendev/system-config master: Stop testing nodepool when puppet changes
<openstackgerrit> Merged opendev/system-config master: Update certcheck additional domains
[23:38] <ianw> "sudo docker-compose exec zuul-scheduler zuul autohold-list" doesn't seem to work for me
[23:38] <ianw> on new zuul server
[23:39] <clarkb> ianw: are you running that within the dir with the docker compose file?
[23:39] <corvus> don't run zuul-scheduler
[23:39] <clarkb> oh also that
[23:39] <corvus> it's just 'zuul autohold-list'
[23:39] <clarkb> oh no I think that is right it says exec in the zuul-scheduler container and run zuul autohold-list
[23:40] <corvus> oh sorry
[23:40] <ianw> yeah i am in the docker-compose directory
[23:41] <clarkb> I've only used docker exec with it and that worked ok
[23:41] <ianw> i get a 1 exit but nothing suggesting why ...
[23:42] <clarkb> `sudo docker exec zuul-scheduler_scheduler_1 zuul autohold-list --tenant openstack` works
[23:43] <clarkb> corvus: ah the container is a different name
[23:43] <ianw> yep, both of those work
[23:43] <corvus> if you want to use the compose method, it's the name of the service in the compose file, which is 'scheduler'
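fungi floated a `zuul` wrapper around `docker-compose exec`, and corvus pointed out that the compose service is named `scheduler`. Putting the two together gives the Python sketch below. The compose directory path is a hypothetical placeholder, and the script is an illustration rather than the wrapper OpenDev actually deployed.

```python
import subprocess
import sys

# Assumptions: the compose file lives in this (hypothetical) directory and
# defines the scheduler under the service name 'scheduler'.
COMPOSE_DIR = "/etc/zuul-scheduler"
SERVICE = "scheduler"


def zuul_command(args, service: str = SERVICE) -> list:
    """Build the argv that runs the zuul CLI inside the compose service."""
    # -T disables pseudo-TTY allocation so output can be piped or captured.
    return ["docker-compose", "exec", "-T", service, "zuul", *args]


def main(argv) -> int:
    # docker-compose looks for the compose file in the working directory.
    return subprocess.run(zuul_command(argv), cwd=COMPOSE_DIR).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

With docker-compose v1, container names follow `<project>_<service>_<index>`, and the project name defaults to the compose directory's name. That explains why `zuul-scheduler_scheduler_1` works with plain `docker exec`. It also explains why `zuul-scheduler` is not a valid service name for `docker-compose exec`.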
[23:43] <ianw> yeah; i was pasting in fungi's suggested command above :)
[23:44] <ianw> alright, yak shaving achieved, i might be able to put a node on hold now to debug the actual problem
[23:45] <ianw> clarkb: do you still need "clarkb inspecting mailman ansible work"?
[23:46] <clarkb> ianw: I do want to hold a set of those jobs so that I can test the in-place upgrade but it's probably best to do that with a new hold. You can clean that one up
[23:46] <clarkb> (or I can if you prefer)
[23:46] <fungi> aha, thanks, i was guessing based on frickler's docker command, guess i had it almost right ;)
[23:46] <ianw> clarkb: there's also "clarkb testing review-test replication"
[23:47] <ianw> corvus: and "corvus debug gerrit" which looks old too
[23:47] <clarkb> ianw: that one can be cleaned up too, it's set up as the replication target from review-test, but I stopped services on review-test in prep for further cleanup (further cleanup has not happened yet)
[23:47] <fungi> ianw: i may still have a node held troubleshooting the zuul nodepool functional job, if so it can be deleted; i'll make another one when i'm freed up enough to resume that effort
[23:48] <corvus> i need 0 held nodes at this point
[23:48] <ianw> cool, there is now only an old one for vexxhost and an eavesdrop one for frickler

