*** xek has joined #openstack-infra | 00:01 | |
*** whoami-rajat has quit IRC | 00:01 | |
*** yamamoto has joined #openstack-infra | 00:02 | |
openstackgerrit | Merged zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs https://review.opendev.org/669939 | 00:15 |
*** jistr has quit IRC | 00:15 | |
*** jistr has joined #openstack-infra | 00:15 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 00:21 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Enable debug logs for openstack-functional tests https://review.opendev.org/672412 | 00:23 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element https://review.opendev.org/669787 | 00:23 |
*** larainema_ has joined #openstack-infra | 00:48 | |
*** larainema_ is now known as larainema | 00:49 | |
*** gyee has quit IRC | 00:49 | |
*** ricolin has joined #openstack-infra | 00:55 | |
ianw | clarkb: http://logs.openstack.org/87/669787/9/check/nodepool-functional-openstack-src/235e201/nodepool/nodepool-launcher.log | 00:56 |
ianw | @ around 2019-07-25 00:52:37,654 ... sending the systemd output to the journal, it gets captured ok ... i think that will be helpful in general for any such future issues | 00:57 |
*** igordc has quit IRC | 01:04 | |
*** yamamoto has quit IRC | 01:04 | |
clarkb | ya that bit was working iirc | 01:07 |
*** slaweq has joined #openstack-infra | 01:11 | |
*** slaweq has quit IRC | 01:15 | |
*** tdasilva has quit IRC | 01:20 | |
*** tdasilva has joined #openstack-infra | 01:21 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: journal-to-console: element to send systemd journal to console https://review.opendev.org/669784 | 01:25 |
*** mriedem has quit IRC | 01:49 | |
*** Frootloop has quit IRC | 02:09 | |
*** jcoufal has joined #openstack-infra | 02:19 | |
*** jcoufal has quit IRC | 02:33 | |
*** yamamoto has joined #openstack-infra | 02:46 | |
*** bhavikdbavishi has joined #openstack-infra | 02:51 | |
*** bhavikdbavishi1 has joined #openstack-infra | 02:54 | |
*** bhavikdbavishi has quit IRC | 02:55 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 02:55 | |
*** ykarel|away has joined #openstack-infra | 02:56 | |
*** whoami-rajat has joined #openstack-infra | 03:06 | |
*** slaweq has joined #openstack-infra | 03:11 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove gitea02 from inventory so we can replace it https://review.opendev.org/672621 | 03:13 |
clarkb | fungi: ^ head start on tomorrow | 03:13 |
*** slaweq has quit IRC | 03:16 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Functional testing: add journal-to-console element https://review.opendev.org/669787 | 03:35 |
*** eernst has joined #openstack-infra | 03:36 | |
*** psachin has joined #openstack-infra | 03:38 | |
*** yamamoto has quit IRC | 03:42 | |
*** yamamoto has joined #openstack-infra | 03:46 | |
*** yamamoto has quit IRC | 03:51 | |
*** yamamoto has joined #openstack-infra | 03:53 | |
*** rcernin has quit IRC | 03:55 | |
*** yamamoto has quit IRC | 03:57 | |
*** yamamoto has joined #openstack-infra | 04:02 | |
*** lmiccini has quit IRC | 04:04 | |
*** lmiccini has joined #openstack-infra | 04:05 | |
*** udesale has joined #openstack-infra | 04:06 | |
*** dchen has quit IRC | 04:07 | |
*** ykarel|away has quit IRC | 04:08 | |
*** dchen has joined #openstack-infra | 04:10 | |
*** yolanda has quit IRC | 04:21 | |
*** yolanda has joined #openstack-infra | 04:22 | |
*** ykarel|away has joined #openstack-infra | 04:34 | |
*** pcaruana has joined #openstack-infra | 04:44 | |
*** pcaruana has quit IRC | 04:56 | |
*** slittle1 has joined #openstack-infra | 05:04 | |
*** slittle1 has quit IRC | 05:09 | |
*** slaweq has joined #openstack-infra | 05:11 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 05:15 |
*** slaweq has quit IRC | 05:16 | |
*** kopecmartin|offf is now known as kopecmartin | 05:18 | |
*** eernst has quit IRC | 05:22 | |
*** rcernin has joined #openstack-infra | 05:33 | |
*** ykarel|away is now known as ykarel | 05:42 | |
*** dchen has quit IRC | 05:50 | |
openstackgerrit | Merged openstack/diskimage-builder master: Enable nodepool debugging for functional tests https://review.opendev.org/672608 | 06:00 |
*** kjackal has joined #openstack-infra | 06:01 | |
*** jaosorior has quit IRC | 06:03 | |
*** rcernin has quit IRC | 06:03 | |
*** yamamoto has quit IRC | 06:07 | |
*** dchen has joined #openstack-infra | 06:09 | |
*** slaweq has joined #openstack-infra | 06:11 | |
*** slaweq has quit IRC | 06:15 | |
openstackgerrit | Kartikeya Jain proposed openstack/diskimage-builder master: Adding new dib element https://review.opendev.org/578773 | 06:18 |
*** yamamoto has joined #openstack-infra | 06:18 | |
*** rcernin has joined #openstack-infra | 06:18 | |
*** jaosorior has joined #openstack-infra | 06:20 | |
*** pcaruana has joined #openstack-infra | 06:21 | |
*** rcernin has quit IRC | 06:21 | |
*** rcernin has joined #openstack-infra | 06:21 | |
*** jaicaa has quit IRC | 06:28 | |
AJaeger | infra-root, I cannot log in to Zanata at translate.openstack.org, is our openid somehow broken? I do not get a login screen at all ;( | 06:30 |
*** dpawlik has joined #openstack-infra | 06:31 | |
*** jaicaa has joined #openstack-infra | 06:31 | |
*** joeguo has quit IRC | 06:33 | |
*** slaweq has joined #openstack-infra | 06:33 | |
*** udesale has quit IRC | 06:33 | |
*** udesale has joined #openstack-infra | 06:34 | |
*** cshen has joined #openstack-infra | 06:36 | |
cshen | morning, is opendev.org DOWN? | 06:36 |
AJaeger | https://opendev.org/ is up - what exactly is failing for you? | 06:37 |
*** abhishekk has joined #openstack-infra | 06:38 | |
AJaeger | infra-root, do we have gitea problem again? | 06:38 |
openstackgerrit | Kartikeya Jain proposed openstack/diskimage-builder master: Adding support for SLES 15 in element 'sles' https://review.opendev.org/619186 | 06:38 |
AJaeger | I get: "fatal: unable to access 'https://opendev.org/openstack/openstack-manuals.git/': Empty reply from server" | 06:38 |
AJaeger | cshen: is that your problem as well? ^ | 06:38 |
AJaeger | infra-root, this is running a git pull from opendev | 06:38 |
abhishekk | hi, I am not able to access https://opendev.org/openstack/glance/ or https://opendev.org/openstack/glance_store/ | 06:39 |
abhishekk | is there any problem? | 06:39 |
cshen | AJaeger: opendev.org is not accessible. | 06:39 |
AJaeger | abhishekk: seems so, see the last lines | 06:39 |
*** marios|ruck has joined #openstack-infra | 06:39 | |
AJaeger | cshen: Which URL exactly? The git clone or anything else? | 06:40 |
cshen | just our luck, it happened right when we started our major upgrade :-D | 06:40 |
abhishekk | AJaeger, ack | 06:40 |
cshen | AJaeger: basically, the whole site is not accessible. | 06:40 |
AJaeger | cshen: for me https://opendev.org/ works at the top level, so are you running into the same problem with git cloning that abhishekk and I see, or is there another one? How exactly can we reproduce? | 06:41 |
cshen | AJaeger: git clone failed for me as well. | 06:42 |
AJaeger | #infra log cloning with git from opendev is failing | 06:42 |
yoctozepto | AJaeger: does not work from here either | 06:42 |
yoctozepto | not via browser either | 06:42 |
yoctozepto | seems like a connection issue? | 06:42 |
cshen | yoctozepto: it seems that the site is down. | 06:43 |
yoctozepto | cshen: AJaeger has just claimed it works for him :D | 06:43 |
yoctozepto | top-level, from browser, does not load for me | 06:43 |
cshen | yoctozepto: I can't access opendev.org from Germany right now. neither HTTP nor git clone. | 06:44 |
yoctozepto | Poland here | 06:44 |
yoctozepto | Podlachia region (north east) | 06:44 |
AJaeger | yoctozepto: git cloning fails for me, https://opendev.org (top-level) works but nothing git related like browsing repositories - from Germany | 06:45 |
abhishekk | me From India - Asia | 06:45 |
abhishekk | not able to clone or access via browser | 06:45 |
AJaeger | #status alert The git service on opendev.org is currently down. | 06:46 |
openstackstatus | AJaeger: sending alert | 06:46 |
* AJaeger sends an alert to reduce questions ;) | 06:46 | |
*** rlandy has joined #openstack-infra | 06:46 | |
AJaeger | I think we can all agree that git is broken - and without an admin around, nothing we can do until the US wakes up. So, this might take another 5 hours... | 06:47 |
AJaeger | yoctozepto, cshen , abhishekk, thanks for reporting - and sorry for this. But nothing we can do right now | 06:48 |
*** pgaxatte has joined #openstack-infra | 06:48 | |
abhishekk | AJaeger, ack | 06:48 |
-openstackstatus- NOTICE: The git service on opendev.org is currently down. | 06:49 | |
*** ChanServ changes topic to "The git service on opendev.org is currently down." | 06:49 | |
yoctozepto | AJaeger: roger that, git is definitely down when all http is down :-) | 06:49 |
yoctozepto | it's odd | 06:50 |
yoctozepto | I debugged it | 06:50 |
*** dpawlik has quit IRC | 06:50 | |
yoctozepto | http does a redirect to https | 06:50 |
yoctozepto | https negotiates tls session | 06:50 |
yoctozepto | and hangs | 06:51 |
yoctozepto | after tunnel is established | 06:51 |
yoctozepto | should be region independent | 06:51 |
yoctozepto | http://paste.openstack.org/show/754833/ | 06:52 |
*** jpena|off is now known as jpena | 06:52 | |
yoctozepto | could it be that it banned us at app level? ;d | 06:52 |
*** jpena is now known as jpena|mtg | 06:53 | |
openstackstatus | AJaeger: finished sending alert | 06:53 |
cshen | AJaeger: ack, any backup git repo which we could check out? | 06:53 |
yoctozepto | cshen: review.opendev.org seems to still work | 06:54 |
cshen | yoctozepto: same here | 06:54 |
yoctozepto | cshen: cool, I meant you can use the repos via gerrit | 06:54 |
Tengu | wait, comodo CA is still alive ?! | 06:56 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Zuul CLI: allow access via REST https://review.opendev.org/636315 | 06:56 |
yoctozepto | Tengu: that's what it seems, at least for this cert | 06:57 |
Tengu | surprising..... didn't they get an intrusion and the CA stolen? | 06:57 |
Tengu | (now, wondering why not using something free like «let's encrypt» :D) | 06:58 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Add Authorization Rules configuration https://review.opendev.org/639855 | 06:58 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Web: plug the authorization engine https://review.opendev.org/640884 | 06:59 |
cshen | yoctozepto: could you give me an example of repo url? | 06:59 |
yoctozepto | Tengu: yup, as long as you don't need EV (i.e. you are not a payment processing org) | 06:59 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint https://review.opendev.org/641099 | 06:59 |
Tengu | yoctozepto: of course :). | 06:59 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry https://review.opendev.org/642408 | 06:59 |
yoctozepto | cshen: sure, it requires you to be a registered user though: | 06:59 |
yoctozepto | [remote "gerrit"] | 07:00 |
yoctozepto | url = ssh://yoctozepto@review.opendev.org:29418/openstack/kolla-ansible.git | 07:00 |
yoctozepto | fetch = +refs/heads/*:refs/remotes/gerrit/* | 07:00 |
*** apetrich has quit IRC | 07:00 | |
yoctozepto | change to your username obviously | 07:01 |
Tengu | https://review.opendev.org/openstack/tripleo-ci also | 07:01 |
Tengu | anonymous | 07:01 |
Tengu | and http(s) | 07:01 |
cshen | or maybe use the repos in github.com? | 07:02 |
cshen | it seems to be 1:1 mirrored. | 07:02 |
yoctozepto | Tengu: it said 'not found'? | 07:02 |
yoctozepto | cshen: yeah, openstack/ are | 07:02 |
Tengu | o_O that's the link provided within the project listing of gerrit | 07:03 |
yoctozepto | though I wonder whether the opendev.org outage stopped the sync at some point | 07:03 |
Tengu | for instance: https://review.opendev.org/#/admin/projects/openstack/tripleo-ci | 07:03 |
*** odicha has joined #openstack-infra | 07:03 | |
yoctozepto | Tengu: yeah, it worked now | 07:03 |
*** jamesmcarthur has joined #openstack-infra | 07:04 | |
Tengu | but the git link doesn't.... | 07:04 |
Tengu | that's interesting. | 07:04 |
yoctozepto | it works from git, not browser, just checked | 07:04 |
Tengu | hmm.... didn't work for me using git. | 07:04 |
yoctozepto | then it's magic | 07:04 |
Tengu | {"changed": false, "cmd": ["/bin/git", "fetch", "origin"], "msg": "Failed to download remote objects and refs: fatal: remote error: Git repository not found\n"} | 07:05 |
Tengu | unless... wait. | 07:05 |
yoctozepto | $ git clone https://review.opendev.org/openstack/tripleo-ci | 07:05 |
yoctozepto | Cloning into 'tripleo-ci'... | 07:05 |
yoctozepto | remote: Counting objects: 13343, done | 07:05 |
yoctozepto | remote: Finding sources: 100% (13343/13343) | 07:05 |
yoctozepto | remote: Total 13343 (delta 6671), reused 11016 (delta 6671) | 07:05 |
yoctozepto | Receiving objects: 100% (13343/13343), 5.99 MiB | 3.27 MiB/s, done. | 07:05 |
yoctozepto | Resolving deltas: 100% (6671/6671), done. | 07:05 |
yoctozepto | so anonymous https works too via gerrit | 07:05 |
yoctozepto | good to know | 07:05 |
Tengu | oh, my fault. | 07:05 |
yoctozepto | next time gitea refuses to work | 07:05 |
Tengu | was still using the old project "openstack-infra". | 07:06 |
*** rcernin has quit IRC | 07:06 | |
*** rlandy is now known as rlandy|mtg | 07:07 | |
yoctozepto | AJaeger: wonder if you can send an announcement about the availability of the git repos via gerrit? | 07:07 |
yoctozepto | should make ppl happier | 07:07 |
yoctozepto | the path seems to be exactly the same | 07:08 |
ianw | hrrm, this is definitely not my area of knowledge with the changes going on atm | 07:09 |
yoctozepto | #status info The git service on review.opendev.org can be used in place of opendev.org's - project paths are preserved | 07:12 |
yoctozepto | (was worth trying ;D ) | 07:12 |
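For reference, a minimal sketch of the workaround described above - pointing an existing clone at the Gerrit mirror while gitea is down (this assumes the project path really is identical, as claimed; tripleo-ci is just the repo used in this exchange):

    # re-point the existing "origin" remote at Gerrit's anonymous https mirror
    git remote set-url origin https://review.opendev.org/openstack/tripleo-ci
    git fetch origin
    # or clone fresh
    git clone https://review.opendev.org/openstack/tripleo-ci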
*** tesseract has joined #openstack-infra | 07:15 | |
*** iurygregory has joined #openstack-infra | 07:15 | |
*** udesale has quit IRC | 07:16 | |
*** iokiwi has quit IRC | 07:17 | |
*** adriant has quit IRC | 07:17 | |
*** dpawlik has joined #openstack-infra | 07:17 | |
*** udesale has joined #openstack-infra | 07:18 | |
*** iokiwi has joined #openstack-infra | 07:18 | |
*** adriant has joined #openstack-infra | 07:18 | |
*** gfidente has joined #openstack-infra | 07:20 | |
ianw | [12660704.934832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. | 07:21 |
ianw | [12660825.726429] INFO: task jbd2/vda1-8:248 blocked for more than 120 seconds. | 07:21 |
ianw | [12660825.732761] Not tainted 4.15.0-45-generic #48-Ubuntu | 07:21 |
ianw | this is on the gitea-lb01 console | 07:22 |
ianw | http://paste.openstack.org/show/754834/ for posterity | 07:22 |
ianw | i think it needs a reboot ... i guess it can't make it worse | 07:22 |
*** aedc has joined #openstack-infra | 07:22 | |
*** rpittau|afk is now known as rpittau | 07:22 | |
*** raissa has quit IRC | 07:23 | |
*** raissa has joined #openstack-infra | 07:24 | |
*** raissa has joined #openstack-infra | 07:25 | |
ianw | great, now it is in error state | 07:26 |
yoctozepto | life is full of surprises | 07:27 |
cshen | ianw: do we have only one server for serving git service? | 07:27 |
ianw | cshen: one load balancer, anyway :/ | 07:28 |
yoctozepto | cshen: but review.opendev.org works with the same paths | 07:30 |
*** Goneri has joined #openstack-infra | 07:30 | |
yoctozepto | so it's a no-brainer actually to replace ;D | 07:30 |
ianw | i think this is a problem on vexxhost that i can't solve | 07:30 |
yoctozepto | discussed a bit above | 07:30 |
yoctozepto | cshen: change opendev.org to review.opendev.org and it should magically work (for git) | 07:31 |
cshen | yoctozepto: yes, I checked, I even checked out from github.com. But the upgrade scripts have some dependencies on opendev.org. | 07:32 |
*** kobis1 has joined #openstack-infra | 07:32 | |
yoctozepto | cshen: which scripts are you talking about? | 07:32 |
ianw | i don't think there's much i can do at this point. either vexxhost need to look at what's going on in the backend and recover the server, or we need to build a new one | 07:34 |
ianw | mnaser: ^ | 07:35 |
*** dchen has quit IRC | 07:35 | |
cshen | yoctozepto: https://github.com/openstack/openstack-ansible/blob/master/scripts/bootstrap-ansible.sh | 07:39 |
yoctozepto | ah, osa | 07:40 |
cshen | it pulls a lot of things from opendev.org | 07:40 |
*** ykarel is now known as ykarel|lunch | 07:41 | |
noonedeadpunk | guilhermesp probably you can help with opendev thing ^ | 07:42 |
yoctozepto | yeah, kolla's CI does too, it is broken for the moment | 07:43 |
yoctozepto | mostly due to redirect from upper-constraints to opendev | 07:43 |
yoctozepto | ;D | 07:43 |
*** priteau has joined #openstack-infra | 07:45 | |
*** marekchm has joined #openstack-infra | 07:50 | |
*** tkajinam has quit IRC | 07:53 | |
*** tkajinam has joined #openstack-infra | 07:53 | |
AJaeger | yoctozepto: upper-constraints should be downloaded from releases.openstack.org | 07:57 |
*** jaosorior has quit IRC | 07:57 | |
AJaeger | yoctozepto: e.g. https://releases.openstack.org/constraints/upper/master | 07:57 |
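A minimal sketch of the usage AJaeger points to, constraining pip against the published upper-constraints URL rather than a hard-coded opendev.org path (note the caveat below that this URL currently redirects to opendev.org anyway):

    # install requirements constrained by the published upper-constraints file
    pip install -c https://releases.openstack.org/constraints/upper/master -r requirements.txt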
AJaeger | ianw: do you know how to take git01 out of haproxy? | 07:58 |
ianw | #status log sent email update about opendev.org downtime, appears to be vexxhost region-wide http://lists.openstack.org/pipermail/openstack-infra/2019-July/006426.html | 07:58 |
openstackstatus | ianw: finished logging | 07:58 |
AJaeger | ianw: thanks ! | 07:58 |
yoctozepto | AJaeger: yeah and that REDIRECTS ;D | 07:58 |
ianw | AJaeger: ^ see above email. not only does the load-balancer have issues, but the gitea backend servers also have kernel errors about storage. i think it's a region-wide issue on vexxhost | 07:58 |
ianw | so yeah, just rebuilding the lb somewhere else won't help | 07:59 |
yoctozepto | to opendev which is utterly broken atm ;/ | 07:59 |
AJaeger | yoctozepto: oh, it redirects? didn't know that ;( | 08:00 |
AJaeger | ianw: argh ;/ | 08:00 |
yoctozepto | AJaeger: yeah, unfortunately, someone even suggested it was inefficient when it was proposed | 08:00 |
yoctozepto | forgot it could also be "unstable" | 08:01 |
ianw | AJaeger: yeah sorry i've got to step away, but i think the most practical thing is to wait for vexxhost to confirm issues | 08:03 |
*** dtantsur|afk is now known as dtantsur | 08:11 | |
*** pkopec has joined #openstack-infra | 08:11 | |
*** lucasagomes has joined #openstack-infra | 08:12 | |
*** pkopec has quit IRC | 08:12 | |
*** pkopec has joined #openstack-infra | 08:12 | |
*** ralonsoh has joined #openstack-infra | 08:13 | |
AJaeger | ianw: I'm in meetings all day, so not much time either (and even fewer options than you have). Is the alert good enough or do you have a proposal to change it? | 08:13 |
jamesmcarthur | ianw: yeah, openstack.org, etc... are all down as well | 08:16 |
jamesmcarthur | if anyone is asking :| | 08:16 |
yoctozepto | jamesmcarthur, ianw, AJaeger: oh, that escalated pretty quickly | 08:21 |
*** apetrich has joined #openstack-infra | 08:24 | |
*** fdegir has joined #openstack-infra | 08:24 | |
AJaeger | So, is the following ok to send out "Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers." ? | 08:26 |
*** siqbal has joined #openstack-infra | 08:26 | |
ianw | jamesmcarthur: yeah, i guess that goes through the same lb | 08:27 |
yoctozepto | AJaeger: looks fine | 08:27 |
*** panda has quit IRC | 08:28 | |
yoctozepto | guys, https://review.opendev.org/671178 , are cyclic dependencies possible? | 08:29 |
yoctozepto | I get no error but it does not seem to be picked up | 08:29 |
yoctozepto | ;/ | 08:29 |
AJaeger | #status alert Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers. | 08:29 |
openstackstatus | AJaeger: sending alert | 08:29 |
AJaeger | yoctozepto: cyclic dependencies are not fine - Zuul will refuse to test these since it cannot put them in any sequential order | 08:30 |
*** tosky has joined #openstack-infra | 08:30 | |
*** panda has joined #openstack-infra | 08:31 | |
-openstackstatus- NOTICE: Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers. | 08:32 | |
*** ChanServ changes topic to "Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers." | 08:32 | |
* AJaeger sends an email to openstack-discuss now as well... | 08:32 | |
noonedeadpunk | so, seems that mnaser just fixed balancer | 08:33 |
AJaeger | cool! | 08:34 |
cshen | thanks, better now. | 08:34 |
AJaeger | are we green again? | 08:34 |
AJaeger | looks good on my end... | 08:35 |
cshen | I'm bootstraping. | 08:35 |
AJaeger | noonedeadpunk: thanks for telling us | 08:35 |
yoctozepto | looks green | 08:35 |
AJaeger | ok, then I'll send the "ok" ;) | 08:35 |
*** ysastri has joined #openstack-infra | 08:36 | |
*** wpp has joined #openstack-infra | 08:36 | |
AJaeger | #status ok The problem in our cloud provider has been fixed, services should be working again | 08:36 |
openstackstatus | AJaeger: finished sending alert | 08:36 |
*** tkajinam has quit IRC | 08:36 | |
openstackstatus | AJaeger: sending ok | 08:36 |
AJaeger | mnaser: thanks for fixing! | 08:36 |
*** kobis1 has quit IRC | 08:37 | |
noonedeadpunk | AJaeger: I guess you should have sent the alert a bit earlier - probably we'd have gotten a solution faster :P | 08:38 |
*** sshnaidm has quit IRC | 08:38 | |
*** dkopper has joined #openstack-infra | 08:39 | |
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://opendev.org/opendev/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/" | 08:39 | |
AJaeger | noonedeadpunk: first alert was sent two hours ago - as soon as it was reported... | 08:39 |
-openstackstatus- NOTICE: The problem in our cloud provider has been fixed, services should be working again | 08:39 | |
jamesmcarthur | appears everything is back online now | 08:39 |
noonedeadpunk | ah... | 08:40 |
AJaeger | jamesmcarthur: thanks for confirming. | 08:41 |
* AJaeger is offline again... | 08:41 | |
*** sshnaidm has joined #openstack-infra | 08:42 | |
cshen | me is still working. | 08:42 |
openstackstatus | AJaeger: finished sending ok | 08:43 |
mnaser | sorry about that, this should not have happened and I'm a bit embarrassed at how it all went down | 08:46 |
mnaser | And sorry for the lack of communication on my side. | 08:46 |
mnaser | Also, is it possible to drop max-servers to 0 in sjc for now? | 08:47 |
*** jamesmcarthur has quit IRC | 08:48 | |
ianw | mnaser: np, stuff happens! yep we can, is it a fast-merge situation? | 08:53 |
*** apetrich has quit IRC | 08:53 | |
mnaser | ianw: I mean I kinda disabled the user already on my side | 08:53 |
mnaser | So not really unless it breaks you a whole ton having the OpenStack Jenkins user disabled | 08:54 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Disable sjc https://review.opendev.org/672662 | 08:54 |
AJaeger | ianw: want to fast-merge ^ | 08:54 |
AJaeger | and apply on the server directly? | 08:54 |
ianw | AJaeger: heh, you beat me to it :) | 08:55 |
ianw | AJaeger: umm i can, maybe it will miss a puppet run. with the remote end disabled we'll just timeout | 08:55 |
AJaeger | you're the expert ;) | 08:56 |
ianw | i'd never claim that :) but i've set it to zero on nl03 for the mean time anyway | 08:57 |
*** ykarel|lunch is now known as ykarel | 08:57 | |
*** jtomasek has joined #openstack-infra | 09:01 | |
*** joeguo has joined #openstack-infra | 09:01 | |
*** kobis1 has joined #openstack-infra | 09:02 | |
*** siqbal90 has joined #openstack-infra | 09:02 | |
*** apetrich has joined #openstack-infra | 09:02 | |
*** siqbal has quit IRC | 09:04 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Return dependency cycle failure to user https://review.opendev.org/672487 | 09:12 |
*** lpetrut has joined #openstack-infra | 09:15 | |
*** lpetrut has quit IRC | 09:16 | |
*** lennyb has joined #openstack-infra | 09:16 | |
*** lpetrut has joined #openstack-infra | 09:16 | |
*** kobis1 has quit IRC | 09:24 | |
openstackgerrit | Merged openstack/project-config master: Disable sjc https://review.opendev.org/672662 | 09:24 |
*** e0ne has joined #openstack-infra | 09:32 | |
*** yamamoto has quit IRC | 09:39 | |
*** apetrich has quit IRC | 09:42 | |
*** ysastri has quit IRC | 09:52 | |
*** bhavikdbavishi has quit IRC | 09:52 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Fix reference pipelines syntax coloration for Pagure driver https://review.opendev.org/672677 | 09:54 |
*** Lucas_Gray has joined #openstack-infra | 09:55 | |
*** Lucas_Gray has quit IRC | 10:06 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add reference pipelines file for Gerrit driver https://review.opendev.org/672683 | 10:12 |
*** yamamoto has joined #openstack-infra | 10:17 | |
*** yamamoto has quit IRC | 10:27 | |
*** yamamoto has joined #openstack-infra | 10:27 | |
*** siqbal has joined #openstack-infra | 10:33 | |
*** siqbal90 has quit IRC | 10:34 | |
*** abhishekk has quit IRC | 10:38 | |
*** ykarel is now known as ykarel|afk | 10:43 | |
*** jaosorior has joined #openstack-infra | 10:47 | |
*** yamamoto has quit IRC | 10:54 | |
*** yamamoto has joined #openstack-infra | 11:01 | |
*** yamamoto has quit IRC | 11:06 | |
*** adriant has quit IRC | 11:07 | |
*** adriant has joined #openstack-infra | 11:07 | |
*** jaosorior has quit IRC | 11:08 | |
*** udesale has quit IRC | 11:13 | |
*** marekchm has quit IRC | 11:13 | |
*** cshen has quit IRC | 11:25 | |
*** cshen has joined #openstack-infra | 11:28 | |
*** yamamoto has joined #openstack-infra | 11:32 | |
*** rh-jelabarre has joined #openstack-infra | 11:35 | |
*** yamamoto has quit IRC | 11:37 | |
*** stakeda has quit IRC | 11:39 | |
*** pcaruana has quit IRC | 11:42 | |
*** bhavikdbavishi has joined #openstack-infra | 11:42 | |
*** igordc has joined #openstack-infra | 11:43 | |
*** mriedem has joined #openstack-infra | 11:51 | |
*** apetrich has joined #openstack-infra | 11:58 | |
*** armax has quit IRC | 11:58 | |
*** armax has joined #openstack-infra | 11:59 | |
*** ykarel|afk is now known as ykarel | 12:00 | |
*** lmiccini has quit IRC | 12:02 | |
*** dpawlik has quit IRC | 12:02 | |
*** lmiccini has joined #openstack-infra | 12:08 | |
*** iurygregory has quit IRC | 12:11 | |
*** yamamoto has joined #openstack-infra | 12:11 | |
*** iurygregory has joined #openstack-infra | 12:11 | |
*** lmiccini has quit IRC | 12:15 | |
*** yamamoto has quit IRC | 12:17 | |
*** yamamoto has joined #openstack-infra | 12:18 | |
*** dpawlik has joined #openstack-infra | 12:21 | |
*** pcaruana has joined #openstack-infra | 12:22 | |
*** aedc has quit IRC | 12:25 | |
*** yamamoto has quit IRC | 12:27 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Improve SQL query performance in some cases https://review.opendev.org/672606 | 12:31 |
*** jcoufal has joined #openstack-infra | 12:34 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add reference pipelines file for Github driver https://review.opendev.org/672712 | 12:41 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add change replacement field in doc for start-message https://review.opendev.org/665974 | 12:44 |
*** joeguo has quit IRC | 12:47 | |
*** aaronsheffield has joined #openstack-infra | 12:56 | |
*** yamamoto has joined #openstack-infra | 13:00 | |
*** ekultails has joined #openstack-infra | 13:01 | |
*** gtarnaras has joined #openstack-infra | 13:06 | |
*** rfarr has joined #openstack-infra | 13:07 | |
*** rfarr_ has joined #openstack-infra | 13:07 | |
*** bhavikdbavishi has quit IRC | 13:07 | |
*** bhavikdbavishi has joined #openstack-infra | 13:10 | |
*** yamamoto has quit IRC | 13:14 | |
*** udesale has joined #openstack-infra | 13:16 | |
*** ykarel is now known as ykarel|away | 13:22 | |
*** jhesketh has quit IRC | 13:22 | |
*** jaosorior has joined #openstack-infra | 13:23 | |
*** jhesketh has joined #openstack-infra | 13:26 | |
*** ykarel_ has joined #openstack-infra | 13:27 | |
petevg | I've got a question about the outage earlier: I had a change that got merged around the same time as the outage, and it seems to have been merged to gerrit's view of the master branch, but not to origin's view of the master branch. | 13:29 |
petevg | This is https://opendev.org/x/microstack | 13:29 |
*** ykarel|away has quit IRC | 13:29 | |
petevg | My local view of the change that didn't get merged to origin looks like this: | 13:29 |
petevg | commit 59551ca2cdf387fb3a1e857f3aeb89912731e3f2 (HEAD -> master, gerrit/master, multipass-testing-support) | 13:30 |
petevg | As opposed to my local view of the last change to appear in "origin's" master: | 13:30 |
petevg | commit 8ea5dc8679eea1921888fec1a3d468c0b3ae09ce (origin/master, origin/HEAD) | 13:30 |
petevg | Does anybody have a suggestion for a fix? I'm thinking of just running git review on my local copy of master, which I've manually pulled from gerrit, to see if that triggers the gate to fix things ... | 13:31 |
*** goldyfruit has joined #openstack-infra | 13:32 | |
AJaeger | petevg: what is link for the change? | 13:32 |
*** ykarel_ has quit IRC | 13:32 | |
petevg | AJaeger: https://review.opendev.org/#/c/672586/ | 13:32 |
AJaeger | petevg: where exactly are you missing it? | 13:33 |
petevg | AJaeger: if I git clone https://opendev.org/x/microstack.git, the change doesn't show up in the master branch. | 13:34 |
petevg | AJaeger: (also, if I just "git pull origin master" on the previously cloned repo.) | 13:34 |
AJaeger | petevg: I see it on https://opendev.org/x/microstack - let me check cloning | 13:34 |
AJaeger | petevg: I just downloaded and it's there... | 13:35 |
AJaeger | it's also here https://opendev.org/x/microstack/commit/59551ca2cdf387fb3a1e857f3aeb89912731e3f2 | 13:35 |
petevg | AJaeger: yeah. I see it there, too. That's why I pasted the commit lines from git log above. It's in a weird state where it's merged to HEAD and gerrit/master, but not to origin/master. | 13:35 |
petevg | I'll try recloning. Maybe it fixed itself while I was poking at it. | 13:36 |
*** ricolin has quit IRC | 13:36 | |
petevg | AJaeger: nope. It's still not there when you clone. | 13:36 |
AJaeger | It is fine on my end - but we have a git farm. So, if it still fails for you, we need help from an admin to check each of the systems in the git farm - maybe you hit one that is out of sync | 13:36 |
fungi | it's possible that some gitea backends are missing some objects which should have been replicated at the time | 13:37 |
petevg | AJaeger: that would make sense. Just to verify, when you say "download", do you mean that you grabbed a tarball, or that you cloned w/ git? | 13:37 |
fungi | probably best if we force replication to all of them from gerrit just to be sure | 13:37 |
AJaeger | cloned with git - git clone https://opendev.org/x/microstack | 13:37 |
AJaeger | fungi: yeah... | 13:37 |
petevg | AJaeger: cool. fungi: thank you! | 13:38 |
fungi | you can reach them individually without going through the lb like http://gitea08.opendev.org:3080/x/microstack | 13:38 |
petevg | fungi: ooh, cool. I can self service on the troubleshooting next time :-) | 13:39 |
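A rough sketch of the self-service check fungi describes, asking each backend directly whether it has the ref; the assumption that the backends are named gitea01 through gitea08 is based only on the hosts mentioned in this log:

    # compare the master tip reported by each gitea backend
    for i in 01 02 03 04 05 06 07 08; do
      echo "gitea${i}:"
      git ls-remote "http://gitea${i}.opendev.org:3080/x/microstack" refs/heads/master
    done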
fungi | anyway, mass replicating to all of them is likely a good precaution but it will take some hours to complete and will delay replication of newer refs | 13:40 |
petevg | fungi: If I've got a new ref ready to merge, will that fix it? | 13:40 |
petevg | Because I'm selfishly okay w/ that. I don't know whether anybody else was affected, though. | 13:41 |
fungi | petevg: for that one repo, it should | 13:41 |
fungi | odds are there are plenty of missing refs if there's at least one | 13:41 |
*** wpp has quit IRC | 13:41 | |
petevg | Yeah ... | 13:41 |
*** rfarr_ has quit IRC | 13:41 | |
*** rfarr has quit IRC | 13:41 | |
petevg | I won't complain about any delays when/if you decide to kick off the mass replication, then. I have a lot of meetings today, anyway :-) | 13:42 |
fungi | i'll give #openstack-release a heads up so they don't approve any openstack release changes while this is still going on | 13:42 |
*** jaosorior has quit IRC | 13:43 | |
*** apetrich has quit IRC | 13:43 | |
*** yamamoto has joined #openstack-infra | 13:44 | |
fungi | ~17k gerrit replication tasks queued | 13:47 |
*** apetrich has joined #openstack-infra | 13:47 | |
*** yamamoto has quit IRC | 13:48 | |
AJaeger | thanks! | 13:48 |
openstackgerrit | Merged opendev/system-config master: Remove gitea02 from inventory so we can replace it https://review.opendev.org/672621 | 13:54 |
*** iurygregory has quit IRC | 13:59 | |
*** iurygregory has joined #openstack-infra | 14:02 | |
*** eernst has joined #openstack-infra | 14:02 | |
openstackgerrit | Merged openstack/project-config master: Cleanup in-tree removed jobs https://review.opendev.org/671412 | 14:03 |
*** yamamoto has joined #openstack-infra | 14:04 | |
*** yamamoto has quit IRC | 14:04 | |
*** goldyfruit has quit IRC | 14:07 | |
*** ykarel_ has joined #openstack-infra | 14:08 | |
*** wpp has joined #openstack-infra | 14:09 | |
clarkb | fungi: one trick to make it go faster is to only replicate to the gitea backends (then github and local /p are left alone) | 14:13 |
fungi | that's what i did | 14:14 |
*** gtarnaras has quit IRC | 14:14 | |
*** gtarnaras has joined #openstack-infra | 14:14 | |
fungi | in retrospect i should have skipped 02 since we're about to rip it out | 14:14 |
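For context, the forced replication mentioned above is driven by Gerrit's replication plugin over SSH; a hypothetical sketch of limiting it to the gitea targets, as clarkb suggests (the admin account placeholder and the "gitea" URL pattern are assumptions, not taken from this log):

    # re-replicate every project, but only to remotes whose URL matches "gitea"
    ssh -p 29418 <admin>@review.opendev.org replication start --all --url gitea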
*** ian-pittwood has joined #openstack-infra | 14:15 | |
*** goldyfruit has joined #openstack-infra | 14:16 | |
*** wpp has quit IRC | 14:18 | |
*** bobh has joined #openstack-infra | 14:23 | |
*** dpawlik has quit IRC | 14:28 | |
ian-pittwood | I'm currently stumped by a problem I am having with Zuul. I have a tox job that I need to run in a py36 environment. I know that Zuul uses py35 by default so I added a line to set the bindep_profile to use py36. Unfortunately that didn't seem to help as the job still fails, stating that py36 wasn't found. Does anyone know what I might be missing? | 14:30 |
ian-pittwood | Here's the zuul.yaml in question https://review.opendev.org/#/c/672599/4/.zuul.yaml | 14:30 |
clarkb | ian-pittwood: You likely need to change the nodeset. Ubuntu xenial has py35 but not 36. Bionic has py36. There should be existing py36 jobs you can use too | 14:31 |
ian-pittwood | Ok, I'll give that a try. Thank you | 14:32 |
*** ccamacho has joined #openstack-infra | 14:32 | |
clarkb | but this specific issue is related to your nodeset | 14:32 |
*** ysastri has joined #openstack-infra | 14:40 | |
*** yikun has quit IRC | 14:40 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 14:42 |
*** yamamoto has joined #openstack-infra | 14:43 | |
*** eernst has quit IRC | 14:47 | |
*** yamamoto has quit IRC | 14:53 | |
*** ccamacho has quit IRC | 14:53 | |
*** jjohnson42 has joined #openstack-infra | 14:58 | |
jjohnson42 | So I have an issue where it says 'Change has been successfully merged by Zuul' but I don't see it in the opendev git repo? | 14:59 |
*** roman_g has quit IRC | 14:59 | |
AJaeger | jjohnson42: we had some downtime this morning and are currently replicating everything to our git farm to ensure the servers are in sync. So, I hope this will be fixed in a few hours... | 15:00 |
*** ricolin_phone has joined #openstack-infra | 15:00 | |
fungi | jjohnson42: yeah, we're down to 1.25k replication tasks queued so should be caught up in the next couple hours | 15:01 |
fungi | er, 12.5k i mean | 15:01 |
jjohnson42 | ok, figured it would be something well known, just asking to double check, thanks for the info | 15:01 |
fungi | what's an order of magnitude among friends? ;) | 15:01 |
mordred | fungi: I dunno, joey vs chandler? | 15:02 |
fungi | i'm doing my best to forget that i have context to parse that punchline | 15:02 |
*** rlandy|mtg has quit IRC | 15:05 | |
*** jpena|mtg is now known as jpena|off | 15:07 | |
*** siqbal has quit IRC | 15:12 | |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: test operator / iptables https://review.opendev.org/672755 | 15:17 |
*** dklyle has quit IRC | 15:17 | |
*** _erlon_ has joined #openstack-infra | 15:18 | |
*** dklyle has joined #openstack-infra | 15:18 | |
*** dkopper has quit IRC | 15:20 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 15:21 |
*** larainema has quit IRC | 15:21 | |
*** ricolin has joined #openstack-infra | 15:22 | |
*** siqbal has joined #openstack-infra | 15:23 | |
*** e0ne has quit IRC | 15:23 | |
*** kopecmartin is now known as kopecmartin|off | 15:24 | |
mordred | fungi: this punchline is cut in half. I'd like to exchange it for a punchline that is NOT ... cut in half. | 15:24 |
*** pgaxatte has quit IRC | 15:25 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 15:25 |
*** gfidente has quit IRC | 15:26 | |
clarkb | 11k tasks. I do wonder if it goes faster when we do them one or two at a time | 15:27 |
clarkb | still waiting for gitea02 removal to show up on bridge (likely due to the replication backlog) | 15:27 |
*** odicha has quit IRC | 15:28 | |
clarkb | that must be gerrit's way of telling me to go on an early bike ride | 15:29 |
*** Goneri has quit IRC | 15:31 | |
*** ricolin_ has joined #openstack-infra | 15:33 | |
*** siqbal has quit IRC | 15:34 | |
*** ricolin_phone has quit IRC | 15:34 | |
*** ricolin has quit IRC | 15:36 | |
*** ricolin_ is now known as ricolin | 15:38 | |
*** adriancz has quit IRC | 15:39 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Assure ensure-tox installs latest tox version https://review.opendev.org/672760 | 15:39 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases https://review.opendev.org/672606 | 15:39 |
zbr_ | clarkb: mordred ^ i hope I explained the reasoning for the ensure-tox change well. i am curious what you think. | 15:40 |
AJaeger | do we need to sync to codesearch as well? Or will it be updated once the replication is done? | 15:41 |
clarkb | zbr: that would break any users that might preselect a working tox in their image builds | 15:42 |
clarkb | AJaeger: I think codesearch pulls from opendev.org on its own so should self correct once opendev is up to date | 15:42 |
clarkb | (codesearch is the #3 requestor to opendev when I looked) | 15:42 |
zbr_ | clarkb: depends how they call it. if they call it with full path, it should not. | 15:43 |
AJaeger | clarkb: great, thanks | 15:43 |
clarkb | zbr: unless that path is in the user-install venv | 15:43 |
clarkb | zbr: we have had to do this a couple of times in the past due to changes in tox breaking backward compat | 15:44 |
zbr_ | yep, and I already see jobs failing. any ideas? | 15:44 |
*** gyee has joined #openstack-infra | 15:44 | |
clarkb | I would add a separate upgrade tox step to jobs that know they always want the latest version | 15:44 |
zbr_ | i could add a variable that tells it to update or not, default not to. | 15:44 |
zbr_ | in fact it is even worse: i need to remove the system one to be sure it will work. | 15:44 |
zbr_ | clarkb: i discovered an hour ago that i was not able to add new stuff to a tox.ini file because the repository was running tox-docs on centos7, which happens to have tox 1.6. | 15:46 |
clarkb | running the job on a different node type is probably the quickest path forward there | 15:46 |
corvus | zbr_: seems to me that maybe someone setting that job up wanted to make sure that development could happen on centos7? | 15:46 |
zbr_ | so I am trying to find a solution that would not break existing systems | 15:47 |
clarkb | corvus: ya that is similar to my other concern | 15:47 |
clarkb | basically that tox version choice may be intentional | 15:47 |
zbr_ | clarkb: it is not intentional in this case. so is it ok if I add a parameter to change the behavior? so only those wanting the latest would get it. | 15:48 |
*** altlogbot_0 has quit IRC | 15:48 | |
*** ykarel_ is now known as ykarel|away | 15:49 | |
clarkb | a flag to opt into upgrading would probably be ok | 15:49 |
*** altlogbot_2 has joined #openstack-infra | 15:49 | |
*** marios|ruck has quit IRC | 15:49 | |
*** tesseract has quit IRC | 15:50 | |
zbr_ | here is an interesting finding: upgrading tox as a user breaks tox on systems that do not have ~/.local/bin in PATH (aka CentOS7, newer ones do have it) | 15:52 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: test operator / iptables https://review.opendev.org/672755 | 15:53 |
zbr_ | so in this particular case one user cannot have a working system-tox and a working tox-in-user-dir -- one of them will fail to import. | 15:53 |
zbr_ | workarounds: calling tox with `python -m tox` | 15:53 |
zbr_ | or removing the old one. me being inclined to like the module-calling method in general. | 15:54 |
*** ginopc has quit IRC | 15:54 | |
zbr_ | only the script is broken, module works fine, both versions. | 15:54 |
zbr_ | another approach would be to check if ~/.local/bin is in PATH and add it before calling tox, but it is a bit ugly. | 15:55 |
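A minimal sketch of the workaround zbr_ settles on, upgrading tox into the user site and invoking it as a module so the stale system entry point and PATH ordering stop mattering (the -e target is only an example):

    # upgrade tox for the current user only
    pip install --user --upgrade tox
    # call it via the interpreter instead of the (possibly broken) tox script
    python -m tox -e pep8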
*** siqbal has joined #openstack-infra | 15:57 | |
openstackgerrit | Merged zuul/zuul-jobs master: Skip test-setup.sh in pep8 jobs https://review.opendev.org/670133 | 15:57 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Assure ensure-tox installs latest tox version https://review.opendev.org/672760 | 15:58 |
*** cdent has joined #openstack-infra | 16:05 | |
yoctozepto | jjohnson42: re: opendev.org - I reconfigured my repos to use review.opendev.org, also wanted to report my repos are not in sync | 16:06 |
cdent | how long does it normally take for a patch to show up in opendev.org master? https://review.opendev.org/#/c/672298/ is in gerrit/master but not origin/master (where origin is opendev.org) | 16:07 |
cdent | ah. | 16:07 |
cdent | seems it is already being discussed | 16:07 |
fungi | cdent: yoctozepto: yep, we're down to 9.6k remaining replication tasks in the queue | 16:08 |
*** gtarnaras has quit IRC | 16:08 | |
cdent | I assume that's fallout from the earlier disk issues? | 16:08 |
fungi | yep, since there were block device problems in the provider hosting the gitea servers, they ended up missing some git objects, so i initiated a full replication of all repositories to them to make sure any missing objects are fixed | 16:09 |
fungi | but that causes all replication for new refs to queue up behind that | 16:09 |
*** wpp has joined #openstack-infra | 16:10 | |
yoctozepto | fungi: thanks for background | 16:11 |
cdent | ditto | 16:11 |
mnaser | fungi: i wonder if long term, it would be faster to replicate to a 'master' gitea node that then replicates to a bunch of other ones | 16:11 |
mnaser | eliminating latency and reducing load on the gerrit server too | 16:11 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 16:12 |
fungi | mnaser: long term we want gitea servers to be able to share a backend | 16:13 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: test operator / iptables https://review.opendev.org/672755 | 16:13 |
fungi | mnaser: but there are some enhancements it needs to be able to support that | 16:13 |
mnaser | Gotcha | 16:13 |
fungi | our original deployment model involved only replicating to one, and it mostly worked accidentally | 16:14 |
fungi | but gitea isn't actually designed for that (yet) so it stopped working when we upgraded | 16:14 |
*** mattw4 has joined #openstack-infra | 16:14 | |
fungi | and so the current design with independent backends is a workaround for now | 16:15 |
corvus | work is in progress to support that | 16:15 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 16:16 |
cdent | thank fungi, now back to my reguarly scheduled assorted manyness | 16:17 |
*** iurygregory has quit IRC | 16:20 | |
*** lucasagomes has quit IRC | 16:21 | |
*** ykarel|away has quit IRC | 16:22 | |
*** mattw4 has quit IRC | 16:23 | |
*** mattw4 has joined #openstack-infra | 16:23 | |
*** rascasoft has quit IRC | 16:23 | |
*** rascasoft has joined #openstack-infra | 16:27 | |
*** lpetrut has quit IRC | 16:29 | |
mordred | mnaser: in fact, once the work in progress to support single-shared-gitea is done, it would be made even better by manila-cephfs - so there are several future improvement possibilities | 16:33 |
mnaser | mordred: forever hinting at the need/want of manila-cephfs :P | 16:33 |
mnaser | soon(tm) | 16:34 |
mnaser | :p | 16:34 |
mordred | mnaser: it's how I let you know I care ;) | 16:34 |
*** cdent has left #openstack-infra | 16:35 | |
*** rpittau is now known as rpittau|afk | 16:36 | |
*** ricolin has quit IRC | 16:39 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: install-openshift: bump version to 3.11.0 https://review.opendev.org/672785 | 16:40 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add clear-firewall role https://review.opendev.org/672786 | 16:41 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 16:41 |
openstackgerrit | Tristan Cacqueray proposed zuul/nodepool master: DNM: test openshift version bump https://review.opendev.org/672788 | 16:43 |
*** ykarel|away has joined #openstack-infra | 16:45 | |
*** pkopec has quit IRC | 16:48 | |
*** dtantsur is now known as dtantsur|afk | 16:49 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 16:50 |
*** chandankumar is now known as raukadah | 16:52 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add telnet to Docker Image https://review.opendev.org/672791 | 16:53 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add telnet to Docker Image https://review.opendev.org/672791 | 16:56 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: install-openshift: bump version to 3.11.0 https://review.opendev.org/672785 | 16:57 |
*** igordc has quit IRC | 16:58 | |
*** igordc has joined #openstack-infra | 16:58 | |
openstackgerrit | Tristan Cacqueray proposed zuul/nodepool master: DNM: test openshift version bump https://review.opendev.org/672788 | 16:58 |
*** ysastri has quit IRC | 16:59 | |
*** jcoufal_ has joined #openstack-infra | 17:03 | |
fungi | it's under 9000 | 17:04 |
*** jcoufal has quit IRC | 17:07 | |
*** roman_g has joined #openstack-infra | 17:08 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 17:10 |
*** diablo_rojo has joined #openstack-infra | 17:11 | |
*** armax has quit IRC | 17:13 | |
*** ian-pittwood has quit IRC | 17:19 | |
*** odicha has joined #openstack-infra | 17:19 | |
*** betherly has joined #openstack-infra | 17:19 | |
*** odicha_ has joined #openstack-infra | 17:21 | |
*** odicha__ has joined #openstack-infra | 17:22 | |
*** ralonsoh has quit IRC | 17:24 | |
*** betherly has quit IRC | 17:24 | |
*** igordc has quit IRC | 17:28 | |
*** odicha__ has quit IRC | 17:28 | |
*** odicha has quit IRC | 17:28 | |
*** odicha_ has quit IRC | 17:28 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 17:30 |
*** odicha has joined #openstack-infra | 17:33 | |
*** odicha has quit IRC | 17:33 | |
*** udesale has quit IRC | 17:34 | |
*** odicha has joined #openstack-infra | 17:36 | |
*** siqbal has quit IRC | 17:36 | |
*** bobh has quit IRC | 17:39 | |
*** weifan has joined #openstack-infra | 17:45 | |
*** odicha_ has joined #openstack-infra | 17:46 | |
clarkb | bringing the security group discussion here. Historically the two major issues with them have been 1) rax didn't support security groups and 2) they were very inefficient with group-to-group rules (which we'd need to rely on for multinode testing and the like) on the database. I believe rax has security groups now and that the database is no longer as sad about security groups | 17:47 |
clarkb | I think that means we could reconsider them as an option for preventing open dns resolvers and such on the internet then remove our firewall rules from the test nodes entirely | 17:47 |
*** goldyfruit has quit IRC | 17:49 | |
clarkb | Then zuul testing and everyone else's testing doesn't have to worry about modifying firewall rules at job time | 17:49 |
*** psachin has quit IRC | 17:50 | |
*** armax has joined #openstack-infra | 17:50 | |
weifan | Has there been any changes to tag pushing? | 17:50 |
weifan | I was trying to push a new tag using following remote, which used to work.. | 17:50 |
weifan | ssh://<username>@review.opendev.org:29418/x/<project_name> | 17:50 |
weifan | Right now it says the push is completed, and I could also find it on pypi. But I don't see the tag on opendev for some reason.. | 17:50 |
clarkb | weifan: there was a cloud outage a little while ago that prevented us from replicating gerrit repo data to the opendev backends. That outage has been corrected and we are now in the process of re-replicating everything to gitea to ensure it is up to date | 17:51 |
clarkb | weifan: when that process completes your tag should be present on opendev, but until then it is somewhere in the queue | 17:51 |
weifan | i see, thanks :) | 17:51 |
clarkb | at this rate I'm guessing a few more hours? | 17:51 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 17:54 |
clarkb | I think we could turn on security groups with our existing images (we'll just be double firewalled), then if that doesn't break anything remove the firewalls from the images. The transition should be fairly safe (and if adding security groups does break something, revert the cloud launcher change) | 17:55 |
fungi | clarkb: yeah, it's possible we could orchestrate whitelist security groups over each of the job node tenant networks... as long as things like temporary docker registries coexist in the same region as the builds which connect to them | 17:56 |
fungi | otherwise i think we're stuck with a blacklist model instead | 17:56 |
clarkb | fungi: I believe zuul enforces that requirement currently, but good point we should double check that | 17:56 |
fungi | basically if we can assume that builds which interact with each other will only attempt to connect to job nodes in the same provider/region then it's probably pretty straightforward | 17:57 |
clarkb | I'm 99% sure zuul does enforce that locality requirement (probably because we were thinking about stuff like this) | 17:57 |
clarkb | corvus would likely know 100% | 17:58 |
*** odicha has quit IRC | 17:58 | |
*** odicha_ has quit IRC | 17:59 | |
clarkb | and we should double check that security groups do work on rax (their docs say you can do that with public cloud so I expect it to work) | 17:59 |
*** jtomasek has quit IRC | 18:01 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 18:03 |
*** goldyfruit has joined #openstack-infra | 18:04 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 18:05 |
mordred | clarkb, fungi: what's the email address we're using for when we need an opedev root email address? infra-root@openstack.org still? | 18:11 |
*** bobh has joined #openstack-infra | 18:11 | |
clarkb | mordred: yes | 18:11 |
*** mattw4 has quit IRC | 18:11 | |
mordred | clarkb: thx | 18:11 |
*** mattw4 has joined #openstack-infra | 18:11 | |
*** dklyle has quit IRC | 18:11 | |
clarkb | fungi: re locality I remember why we enforce that, it is because some clouds have ipv6 only and others are ipv4 only so we can't assume they can talk to each other even if firewalls are wide open | 18:11 |
*** dklyle has joined #openstack-infra | 18:12 | |
clarkb | the firewalls are 1980s wood paneling | 18:12 |
mordred | such lovely wood paneling | 18:12 |
*** priteau has quit IRC | 18:13 | |
fungi | yup | 18:13 |
fungi | okay, so a fairly simple (22/tcp from everywhere) whitelist is probably sufficient? | 18:13 |
clarkb | fungi: and an in group wide open rule (security group members can talk to themselves) | 18:14 |
fungi | though to allow instance-to-instance traffic we have to add the instances to groups | 18:14 |
fungi | yeah, that | 18:14 |
clarkb | that is a thing you can express in the rules too | 18:14 |
fungi | is there a default group they appear in automatically? | 18:14 |
clarkb | there is a default group | 18:14 |
clarkb | and by default that group has the talk to myself rule (but our cloud launcher removes it currently) | 18:15 |
*** auristor has quit IRC | 18:15 | |
*** jamesmcarthur has joined #openstack-infra | 18:17 | |
*** auristor has joined #openstack-infra | 18:17 | |
clarkb | we also need to open the zuul console log port | 18:17 |
clarkb | ssh + console log port + in group connectivity. Anything else missing? | 18:17 |
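A hypothetical sketch of those rules with the openstack CLI; the group name, the wide-open 0.0.0.0/0 scope, and 19885 as the Zuul console log port are assumptions for illustration, not decisions from this discussion:

    # group for test nodes: ssh and the Zuul console log stream from anywhere
    openstack security group create test-nodes
    openstack security group rule create --protocol tcp --dst-port 22 --remote-ip 0.0.0.0/0 test-nodes
    openstack security group rule create --protocol tcp --dst-port 19885 --remote-ip 0.0.0.0/0 test-nodes
    # members of the group can reach each other on any protocol
    openstack security group rule create --protocol any --remote-group test-nodes test-nodes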
*** bobh has quit IRC | 18:17 | |
openstackgerrit | Merged zuul/zuul master: Improve SQL query performance in some cases https://review.opendev.org/672606 | 18:18 |
*** dims has quit IRC | 18:19 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 18:21 |
*** roman_g has quit IRC | 18:22 | |
*** igordc has joined #openstack-infra | 18:22 | |
*** roman_g has joined #openstack-infra | 18:23 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Use cloud security groups for test node isolation https://review.opendev.org/672806 | 18:28 |
clarkb | fungi: mordred ^ that's roughly what it would look like (and applied to vexxhost mtl1 only in that change if we want to merge it; only gpu test nodes reside there currently) | 18:29 |
*** dims has joined #openstack-infra | 18:29 | |
clarkb | I believe the default ruleset is applied to instances by default if you don't specify one, so nothing in nodepool would have to change either | 18:30 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 18:30 |
*** weifan has quit IRC | 18:35 | |
mordred | clarkb: udp? | 18:36 |
clarkb | mordred: we don't need udp inbound do we? | 18:36 |
mordred | oh- default group rule is typeless | 18:36 |
clarkb | (I think iptables treats udp as "stateful" so the outbound dns requests should get responses) | 18:36 |
clarkb | mordred: ya | 18:36 |
mordred | (was more thinking instance-to-instance traffic) | 18:36 |
*** goldyfruit has quit IRC | 18:38 | |
clarkb | 5.9k tasks to go now | 18:39 |
*** eharney has quit IRC | 18:39 | |
*** betherly has joined #openstack-infra | 18:41 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 18:42 |
fungi | yeah, we're down to ~1/3 of the replication backlog remaining | 18:44 |
fungi | going to try and knock out some yardwork so that my evening is free to work on gitea server replacement stuff | 18:44 |
*** betherly has quit IRC | 18:45 | |
*** fdegir has quit IRC | 18:45 | |
*** fdegir has joined #openstack-infra | 18:46 | |
*** ykarel|away has quit IRC | 18:49 | |
clarkb | I think if we want to move ahead with that change the next two things to do would be to confirm it doesn't break anything (by applying it to vexxhost as proposed) and also to try and apply it to the rax regions | 18:50 |
clarkb | since "will it work with rax" and "will it not break existing jobs" are the two big questions | 18:50 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 18:51 |
corvus | fungi, clarkb: okay so on the firewall thing -- let me summarize and see if we're on the same page: 1) the firewall is good because it's easy for folks to mess up and accidentally create an open proxy/resolver/etc. 2) we give folks root, they can disable it if they need to. 3) it's good to have that speedbump though so that they have to think about it, so we should not remove it from the base | 18:51 |
corvus | images. 4) it is reasonable to disable the firewall for the k8s case because the very next step is that k8s is going to create a bunch of firewall rules that are not going to allow undue external access. 5) we could consider using security groups in our providers as a replacement for the firewall (but that's going to take some careful engineering since we have jobs which communicate cross-region) | 18:51 |
clarkb | corvus: yes basically and maybe 6) the major historical reasons for not using security groups are no longer present (according to neutron and rax docs) | 18:52 |
clarkb | corvus: what jobs communicate cross region? I seem to recall we couldn't do that due to ipv6 and ipv4 only clouds existing | 18:52 |
*** boden has joined #openstack-infra | 18:52 | |
fungi | i concur with the summary | 18:53 |
boden | hi... wondering if anyone has any pointers on a functional job failure related to "Error when trying to get requirement for VCS system" as shown in http://logs.openstack.org/25/672725/5/check/neutron-classifier-functional-dsvm/5eb2c85/job-output.txt.gz#_2019-07-25_18_41_29_579110 | 18:53 |
boden | is this because keystone is not in the test-requirements.txt maybe? | 18:54 |
*** mriedem has quit IRC | 18:54 | |
mordred | boden: that's not actually an error | 18:55 |
clarkb | boden: http://logs.openstack.org/25/672725/5/check/neutron-classifier-functional-dsvm/5eb2c85/job-output.txt.gz#_2019-07-25_18_40_32_918662 is the error | 18:56 |
mordred | boden: it's an unfortunate error printed by pip because of the lack of origin remote in the repos - but is harmless ... ^^ what clarkb said | 18:56 |
corvus | clarkb: i think jobs that use the buildset registry may do that (and yes, it's a pita) | 18:56 |
clarkb | you are running into ERROR_ON_CLONE because devstack needs to clone some repos but we've told it that isn't allowed. The way to address that is to add them to the required projects of the job or remove those services from the devstack config | 18:56 |
clarkb | boden: ^ | 18:56 |
boden | clarkb mordred thanks for that | 18:57 |
yoctozepto | did iad.rax experience issues with the epel mirror around 16:50 UTC? because different images failed to build due to different 404 packages | 18:57 |
clarkb | corvus: in cases where we pause a job with a buildset registry then other jobs consume from that? for some reason I thought we did restrict that to the same region | 18:57 |
mordred | yeah - I thought the same thing | 18:57 |
mordred | but I am most likely just wrong | 18:57 |
clarkb | yoctozepto: that is our kafs canary, that implies the fixes for falling back to the second afs server are not working | 18:57 |
clarkb | yoctozepto: can you provide direct links to where that happens? it will help us and possibly the kernel devs debug | 18:58 |
corvus | clarkb, fungi, mordred: multinode jobs are restricted to the same region, but jobs which depend on other jobs aren't | 18:58 |
clarkb | corvus: got it | 18:58 |
fungi | if a job paused to serve a registry in limestone (global v6 access only) and then the build trying to use that ran in ovh (no global ipv6 egress routing) they'd be unable to talk | 18:58 |
clarkb | corvus: considering that we can't rely on that cross cloud region communication working anyway (regardless of where we put the firewall) I think we may want to fix that anyway? | 18:58 |
*** cshen has quit IRC | 18:59 | |
yoctozepto | clarkb: e.g. here http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/logs/build/ - timestamps on _FAILED_ | 18:59 |
yoctozepto | though it only pinpoints the time | 18:59 |
yoctozepto | 404 is generic ;-) | 18:59 |
clarkb | yoctozepto: why do your log files not have timestamps in them? | 18:59 |
yoctozepto | clarkb: this one does: http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/job-output.txt.gz | 19:00 |
yoctozepto | though it's all-in-one | 19:00 |
clarkb | yoctozepto: 404 is generic but we know it happens in kafs when the filesystem is being updated and clients are supposed to fall back to the secondary fs, however kafs wasn't doing that and we are running proposed changes that are supposed to fix that in kafs which I'm guessing they don't. That feedback is useful to the kernel | 19:00 |
yoctozepto | ok, then I pinpoint the time for ya | 19:01 |
clarkb | and ya if we have the timestamp we can check if the fs was updating at that time to correlate the two events | 19:01 |
yoctozepto | grep "HTTP Error 404" on http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/job-output.txt.gz | 19:01 |
*** betherly has joined #openstack-infra | 19:01 | |
yoctozepto | nice timestamps | 19:01 |
clarkb | yoctozepto: note you can direct link to the timestamps on that file | 19:02 |
corvus | clarkb, fungi: maybe we need to fix that by getting ipv6 in ovh | 19:02 |
clarkb | http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/job-output.txt.gz#_2019-07-25_16_44_54_226419 for example | 19:02 |
clarkb | corvus: and inap iirc | 19:02 |
clarkb | and rax | 19:02 |
clarkb | (we only support ipv6 on rax on debuntu hosts) | 19:02 |
yoctozepto | more here: http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-binary/0d3fc67/job-output.txt.gz | 19:03 |
*** mriedem has joined #openstack-infra | 19:03 | |
zbr_ | AJaeger: clarkb: i made the required changes to ensure-tox, if you can have another look it would be great. | 19:03 |
zbr_ | https://review.opendev.org/#/c/672760/ | 19:03 |
yoctozepto | clarkb: thanks, you are right, though there are many to share | 19:03 |
clarkb | yoctozepto: we only need the one probably | 19:03 |
clarkb | just enough to correlate to an updating afs volume | 19:03 |
corvus | zbr_, AJaeger: that sort of change should have a test job | 19:04 |
yoctozepto | http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-binary/0d3fc67/job-output.txt.gz#_2019-07-25_16_39_23_606826 | 19:04 |
yoctozepto | ^ earliest probably | 19:04 |
yoctozepto | seems it hit epel only | 19:04 |
yoctozepto | centos mirror seems to have worked | 19:04 |
clarkb | yoctozepto: they are separate afs volumes iirc (though I'll double check that when I look at this more closely) | 19:04 |
clarkb | currently about to consume lunch | 19:04 |
*** bobh has joined #openstack-infra | 19:05 | |
*** betherly has quit IRC | 19:05 | |
*** weifan has joined #openstack-infra | 19:07 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running https://review.opendev.org/670395 | 19:08 |
corvus | clarkb, fungi, mordred: jobs which depend on other paused jobs *request* nodes from the same provider, and will get them if that provider is still online. | 19:08 |
corvus | clarkb, fungi, mordred: so that case should usually not be a problem | 19:08 |
corvus | only in weird edge cases (like a provider going offline during a buildset) | 19:08 |
corvus | (in that case, it'll fall back on letting any provider fulfill it) | 19:09 |
*** igordc has quit IRC | 19:09 | |
*** tosky has quit IRC | 19:10 | |
corvus | clarkb, fungi, mordred: and we're talking nodepool provider here, so that's a cloud-region combo | 19:10 |
corvus | could come from a different 'pool' though | 19:11 |
*** bobh has quit IRC | 19:11 | |
*** weifan has quit IRC | 19:11 | |
clarkb | our nodepool providers are per region | 19:12 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 19:13 |
clarkb | From that I think we'd be ok except for the fallback case that we also risk breaking in the ipv4 vs ipv6 case, however with security groups that would be a hard fail all the time rather than a sometimes fail | 19:14 |
zbr_ | corvus: done, added test jobs and referenced it with needed-by. see https://review.rdoproject.org/r/#/c/21594/ | 19:14 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job https://review.opendev.org/672756 | 19:15 |
*** xek_ has joined #openstack-infra | 19:15 | |
* clarkb lunches | 19:16 | |
*** xek has quit IRC | 19:17 | |
*** bhavikdbavishi has quit IRC | 19:18 | |
*** igordc has joined #openstack-infra | 19:25 | |
*** dims has quit IRC | 19:30 | |
*** goldyfruit has joined #openstack-infra | 19:32 | |
*** igordc has quit IRC | 19:32 | |
AJaeger | zbr_: we have in-tree test jobs nowadays in zuul-jobs, have a look at the zuul-tests.d/ directory | 19:38 |
*** rfarr has joined #openstack-infra | 19:38 | |
*** rfarr has quit IRC | 19:38 | |
*** e0ne has joined #openstack-infra | 19:39 | |
*** jamesmcarthur has quit IRC | 19:40 | |
*** jamesmcarthur has joined #openstack-infra | 19:41 | |
*** joeguo has joined #openstack-infra | 19:44 | |
*** rascasoft has quit IRC | 19:45 | |
*** jamesmcarthur has quit IRC | 19:46 | |
*** rascasoft has joined #openstack-infra | 19:47 | |
zbr_ | AJaeger: no problem with me, so you want one more job that uses this new param and triggers when someone edits this role, right? | 19:48 |
zbr_ | i personally prefer using molecule to test ansible roles, as I can easily test lots of usecases in seconds, and locally too. maybe I should make a demonstration | 19:49 |
fungi | what's nice about the existing jobs is they exercise these roles the way they'll be used in ci jobs, rather than in an abstract framework | 19:59 |
*** michael-beaver has joined #openstack-infra | 20:02 | |
*** betherly has joined #openstack-infra | 20:02 | |
*** betherly has quit IRC | 20:07 | |
*** jcoufal_ has quit IRC | 20:07 | |
*** igordc has joined #openstack-infra | 20:08 | |
*** jamesmcarthur has joined #openstack-infra | 20:11 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add clear-firewall role https://review.opendev.org/672786 | 20:15 |
corvus | zbr_: a third-party test is great, but how about a first party test? :) AJaeger had some suggestions there | 20:17 |
corvus | zbr_: this may be a candidate for testing on different platforms too; there are examples for that | 20:18 |
*** jamesmcarthur has quit IRC | 20:19 | |
zbr_ | corvus: sure. which platforms/versions you want me to cover? | 20:20 |
corvus | zbr_: at least ubuntu-bionic (the default) plus any you don't want to break. since centos7 was a concern, you may want to include that. | 20:20 |
corvus | zbr_: there's a special macro you can use if you think it should be tested on all platforms | 20:21 |
corvus | zbr_: http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-July/000973.html has more info too | 20:21 |
corvus | zbr_: i'm writing a patch for zuul-jobs to update the docs with the info in that ml post | 20:21 |
zbr_ | cool, that was what I expected. i will read that too. | 20:22 |
*** gyee has quit IRC | 20:22 | |
clarkb | I have WIP'd https://review.opendev.org/#/c/672806/1 given that buildset registries may run in different clouds | 20:25 |
corvus | clarkb: did you see my update? | 20:25 |
corvus | clarkb: you were 99% right about that (and i was 1% right) | 20:25 |
clarkb | oh no I missed it then | 20:25 |
corvus | so i don't think it's a problem we need to concern ourselves with | 20:26 |
corvus | see 19:08-19:11 in here; i think you were getting lunch | 20:26 |
fungi | (depends on a provider outage or similar immediate catastrophy) | 20:26 |
corvus | fungi: right | 20:26 |
clarkb | oh neat. Should I remove the WIP then? I guess the question now becomes: do we think that this is worth pursuing as it will take some measured rollout | 20:26 |
corvus | clarkb: i kinda think so? i like the idea of having a cleaner test env | 20:27 |
clarkb | k I'll remove the WIP then as I think the current ps is a good starting point for testing a rollout | 20:28 |
fungi | it does mean that, e.g., someone manually troubleshooting a job who wants to initiate connections to it other than those allowed by the security groups we apply will be unable to (aside from reverse tunneling or similar complexity). not sure if that's a concern | 20:28 |
fungi | i'm not personally concerned by that aspect, fwiw | 20:29 |
*** weifan has joined #openstack-infra | 20:29 | |
clarkb | looking up epel afs volume update times now | 20:29 |
corvus | i have held nodes running a docker registry and performed local actions from my workstation against them to debug. this would make that harder. not sure if that's a deal killer. | 20:29 |
*** jamesmcarthur has joined #openstack-infra | 20:30 | |
clarkb | ya you'd likely end up doing ssh -L type proxying | 20:30 |
corvus | yep. should suffice i think. | 20:30 |
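A hedged illustration of the ssh -L workaround mentioned above, for the held-node registry case (port and address purely illustrative):

```sh
# Forward local port 5000 to a registry on the held node's loopback, tunnelled
# over ssh; 22/tcp stays open in the security group, so this keeps working
ssh -L 5000:localhost:5000 root@HELD_NODE_IP
# then point local tooling at localhost:5000 on the workstation
```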
fungi | gerrit replication backlog is under 3k now | 20:31 |
fungi | i think we're on track for an 8 hour completion time, which implies that it currently takes ~1 hour to perform full replication to a single gitea backend | 20:32 |
corvus | aren't they in parallel? | 20:32 |
fungi | estimating completion around 21:40z | 20:32 |
clarkb | http://paste.openstack.org/show/754873/ does seem to coincide with http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-binary/0d3fc67/job-output.txt.gz#_2019-07-25_16_39_23_606826 | 20:33 |
clarkb | ianw: ^ re kafs I don't think the fixes for falling back to other servers are working properly | 20:33 |
clarkb | yes it is in parallel | 20:33 |
clarkb | there are N threads per replication target | 20:33 |
clarkb | However, I think it may be faster if we do them one by one? seems like it didn't take me that long to run through them all after OOMs | 20:34 |
*** zbr_ has quit IRC | 20:35 | |
clarkb | I wonder if that implies we should have fewer replication threads (contention being a likely cause of slowdown when run in parallel?) | 20:35 |
fungi | ahh, yeah that i don't know about. because i issued replication commands for each of them one by one (so as to exclude local and github... i couldn't manage to get a glob/regex working) that might have caused them to get serialized? hard to tell from what's left in the backlog at this point but can probably suss it out from cacti graphs | 20:35 |
*** raissa has joined #openstack-infra | 20:36 | |
*** zbr has joined #openstack-infra | 20:37 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Update testing section https://review.opendev.org/672820 | 20:37 |
corvus | AJaeger, zbr: ^ | 20:38 |
*** diablo_rojo has quit IRC | 20:40 | |
*** cshen has joined #openstack-infra | 20:40 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add clear-firewall role https://review.opendev.org/672786 | 20:41 |
*** harlowja has joined #openstack-infra | 20:43 | |
fungi | corvus: clarkb skimming the active replication processes in the queue, they do appear to be parallelized (~4 active per destination) | 20:45 |
clarkb | looks like we do set it to 4 threads per gitea backend | 20:46 |
clarkb | that is in system-config/modules/openstack_project/manifests/review.pp | 20:47 |
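The backlog numbers quoted throughout come from watching the Gerrit work queue; a hedged example of inspecting it over the ssh API (account name is a placeholder, and the grep is only a rough way to count replication pushes):

```sh
# List pending/active tasks, including replication pushes per gitea backend
ssh -p 29418 user@review.openstack.org gerrit show-queue --wide
# Rough count of remaining replication tasks targeting the gitea backends
ssh -p 29418 user@review.openstack.org gerrit show-queue --wide | grep -c gitea
```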
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 20:49 |
*** mriedem has quit IRC | 20:52 | |
*** mriedem has joined #openstack-infra | 20:53 | |
*** bobh has joined #openstack-infra | 20:53 | |
*** bobh has quit IRC | 20:59 | |
*** gyee has joined #openstack-infra | 20:59 | |
*** jamesmcarthur has quit IRC | 21:00 | |
*** Lucas_Gray has joined #openstack-infra | 21:00 | |
*** jamesmcarthur has joined #openstack-infra | 21:01 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version https://review.opendev.org/672760 | 21:02 |
*** betherly has joined #openstack-infra | 21:03 | |
zbr | corvus: thanks for documenting this, I will try to use it tomorrow as it is 10pm here. For the moment i enabled the tox-molecule job for testing that role (just to compare the two approaches) | 21:04 |
*** jjohnson42 has quit IRC | 21:05 | |
*** cshen has quit IRC | 21:07 | |
*** betherly has quit IRC | 21:08 | |
*** zbr has quit IRC | 21:11 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 21:11 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Trim some bazel flags https://review.opendev.org/672274 | 21:12 |
*** ekultails has quit IRC | 21:12 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running https://review.opendev.org/670395 | 21:12 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add clear-firewall role https://review.opendev.org/672786 | 21:13 |
mordred | corvus, clarkb: https://review.opendev.org/#/c/671457 is ready for re-review - I think I took care of the review comments | 21:13 |
*** jamesmcarthur has quit IRC | 21:13 | |
clarkb | mordred: safe to approve since nothing is using it yet right? | 21:14 |
*** slaweq has quit IRC | 21:15 | |
mordred | clarkb: that's right | 21:15 |
corvus | i agree | 21:15 |
clarkb | done | 21:15 |
mordred | woot! | 21:15 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Update testing section https://review.opendev.org/672820 | 21:17 |
*** cshen has joined #openstack-infra | 21:18 | |
*** diablo_rojo has joined #openstack-infra | 21:19 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Add note to clear-firewall docs https://review.opendev.org/672829 | 21:20 |
*** cshen has quit IRC | 21:23 | |
*** zbr has joined #openstack-infra | 21:26 | |
*** whoami-rajat has quit IRC | 21:28 | |
*** pcaruana has quit IRC | 21:28 | |
fungi | replication backlog is nearly down to 1k. gonna go grab dinner and by the time i'm done hopefully the haproxy config change will have taken effect and i can rip out gitea02 and start building its replacement | 21:29 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Add note to clear-firewall docs https://review.opendev.org/672829 | 21:30 |
*** zbr has quit IRC | 21:32 | |
*** boden has quit IRC | 21:32 | |
*** panda has quit IRC | 21:34 | |
*** panda has joined #openstack-infra | 21:34 | |
openstackgerrit | Merged zuul/zuul-jobs master: Add clear-firewall role https://review.opendev.org/672786 | 21:34 |
clarkb | mriedem: thank you for calling out the nova memcache thing on the config drive bug | 21:44 |
clarkb | mriedem: I left a note on it suggesting that having devstack just do it when memcache is enabled would be great | 21:44 |
*** jamesmcarthur has joined #openstack-infra | 21:46 | |
mriedem | \o/ | 21:46 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running https://review.opendev.org/670395 | 21:48 |
openstackgerrit | Merged zuul/zuul-jobs master: Add note to clear-firewall docs https://review.opendev.org/672829 | 21:50 |
*** jamesmcarthur has quit IRC | 21:51 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running https://review.opendev.org/670395 | 21:55 |
*** e0ne has quit IRC | 21:56 | |
*** rascasoft has quit IRC | 21:56 | |
openstackgerrit | Merged opendev/system-config master: Build docker images of gerrit https://review.opendev.org/671457 | 21:58 |
*** rascasoft has joined #openstack-infra | 21:58 | |
*** slaweq has joined #openstack-infra | 22:11 | |
*** bdodd_ has joined #openstack-infra | 22:12 | |
*** bdodd_ has quit IRC | 22:13 | |
clarkb | we are now processing replication events from after the great enqueuing | 22:15 |
*** slaweq has quit IRC | 22:16 | |
*** betherly has joined #openstack-infra | 22:16 | |
*** rcernin has joined #openstack-infra | 22:16 | |
clarkb | and we are caught up | 22:21 |
*** betherly has quit IRC | 22:21 | |
clarkb | I think we are about half an hour from bridge's system-config updating based on where it is in the loop | 22:25 |
clarkb | hrm | 22:26 |
clarkb | except https://opendev.org/opendev/system-config/commits/branch/master is still out of date | 22:26 |
clarkb | I wonder if all of these have corrupt root disks like 06 did around the summit :/ | 22:26 |
* clarkb checks them individually | 22:27 | |
ianw | clarkb: were they rebooted after the outage? | 22:27 |
ianw | they all had various kernel messages with things like "vda" in them | 22:28 |
clarkb | ianw: I don't know | 22:28 |
clarkb | 01 and 08 have the latest system-config refs but none of the others do | 22:28 |
clarkb | I'm going to try replicating system-config to gitea02 | 22:29 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Remember tab location on build page https://review.opendev.org/672836 | 22:29 |
clarkb | unless things are cached I don't think that is working | 22:30 |
clarkb | which is very similar to the behavior we observed in gitea06 | 22:30 |
ianw | clarkb: looks like no ... gitea02 for example systemd has decided the journal is corrupt at least | 22:31 |
ianw | although, rebooting it might make it worse if it doesn't want to mount the disk any more | 22:31 |
clarkb | ianw: I guess we remove it from haproxy, reboot it, retrigger replication and see if that helps? | 22:31 |
clarkb | fungi: ^ are you back yet? | 22:32 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use base 1 line number anchors in log view https://review.opendev.org/672837 | 22:33 |
ianw | clarkb: i looped through the gitea* servers last night and they all had similar things; especially the systemd journal unhappiness | 22:33 |
ianw | but then again, they haven't logged anything since, so maybe it's recovered | 22:35 |
clarkb | except that replication doesn't work | 22:35 |
clarkb | but maybe a reboot will solve that? | 22:35 |
clarkb | I'll remove 02 from the haproxy and reboot it | 22:37 |
*** jamesmcarthur has joined #openstack-infra | 22:38 | |
fungi | clarkb: back now | 22:38 |
clarkb | 02 has been removed | 22:38 |
ianw | clarkb: the rax rescue image thing would be good to try a fsck on the disk and see what that thinks ... | 22:39 |
ianw | not sure how to do that | 22:39 |
fungi | i did check the gitea servers and none seemed to have marked their root filesystems read-only | 22:39 |
clarkb | ianw: https://docs.openstack.org/infra/system-config/gitea.html#backend-maintenance | 22:39 |
fungi | which i would have expected if they had irrecoverable i/o errors | 22:39 |
clarkb | I'm checking gitea docker logs now to see that connections have stopped | 22:39 |
clarkb | fungi: maybe you want to grab a db backup or 10 just in case these filesystems are really unhappy? | 22:40 |
ianw | clarkb: oh i mean more mount the disk from outside and check it | 22:40 |
*** armax has quit IRC | 22:40 | |
clarkb | last request at 2019-07-25 22:38:15 so going to reboot now | 22:40 |
clarkb | ianw: oh | 22:40 |
clarkb | sorry skipped the fsck message | 22:40 |
clarkb | lets reboot since that is easy, rereplicate and check | 22:41 |
fungi | just copy the last nightly backup from one? should be fine since we haven't created new projects | 22:41 |
clarkb | fungi: ya | 22:41 |
ianw | although agree with fungi, they didn't offline themselves. and also it seemed to be a pretty hard shutoff, so it's not like some writes were getting through, but others weren't | 22:42 |
fungi | we can experiment with 02 presumably | 22:42 |
clarkb | I think the writes are happening | 22:42 |
clarkb | but you can't read them back again | 22:42 |
clarkb | anyways rebooting 02 now | 22:42 |
fungi | if this comes right back up, maybe we need to touch /forcefsck | 22:42 |
clarkb | it came right back up | 22:43 |
clarkb | waiting for docker to show happy containers then will try rereplicating | 22:43 |
*** jamesmcarthur has quit IRC | 22:44 | |
fungi | but you did confirm it had missing git objects? | 22:45 |
clarkb | fungi: yes | 22:45 |
clarkb | er not after reboot | 22:45 |
clarkb | I haven't replicated yet | 22:45 |
clarkb | panic: Failed to execute 'git config --global core.quotepath false': error: could not lock config file /data/git/.gitconfig: File exists | 22:45 |
clarkb | I am going to delete that file | 22:45 |
fungi | curious if they were still missing after a reboot too | 22:45 |
fungi | not that i have high hopes | 22:46 |
fungi | are all 8 affected, or just some of them? any idea? | 22:46 |
clarkb | er the .lock file | 22:46 |
clarkb | fungi: 01 and 08 have the system-config refs, none of the others do | 22:46 |
clarkb | I don't know if that means 01 and 08 are ok or if it is per repo problem | 22:46 |
fungi | yeah | 22:46 |
fungi | well, i've got nothing better to do with my evening than churn through gitea server rebuilds. yardwork is done, dinner is behind me | 22:48 |
fungi | and we've ironed out most of the gotchas as of yesterday | 22:48 |
clarkb | https://gitea02.opendev.org:3000/opendev/system-config/commits/branch/master is serving content again (old content) going to trigger replication now | 22:50 |
clarkb | after triggering replication those refs are present | 22:51 |
clarkb | given that should we rotate through all 8, reboot them all, then trigger replication again? | 22:51 |
clarkb | I'm adding 02 back to haproxy since its reboot is done | 22:52 |
clarkb | I'm going to remove 03 now | 22:54 |
clarkb | any objections to proceeding to do all of these? maybe I should start with 01? | 22:54 |
ianw | clarkb: i'm happy to help ... would a little playbook help? | 22:55 |
clarkb | ianw: maybe? the tricky bit with a playbook will be clearing the .gitconfig.lock file but only if gitea fails to start | 22:55 |
auristor | ianw: was the "5.3.0-rc1-afs-next-48c7a244 : volume is offline messages during release" e-mail sent due to additional failures of the mirror? | 22:57 |
clarkb | decided to start with 01 | 22:57 |
ianw | auristor: it was mostly an update, but i think we have a case reported above of a file that seemed missing during a release. i need to correlate it all into something readable, will respond to your mail :) | 22:58 |
clarkb | ianw: I think it may be quicker to just do it given how complicated checking that lock file may be? | 22:58 |
clarkb | (would have to check docker logs output after determining the container id to find if there are errors around the lockfile?) | 22:58 |
clarkb | 01 is back up and didn't have lock errors. Putting it back in haproxy again | 22:59 |
clarkb | I wonder if that lockfile is gonna be the canary for broken gitea replication | 22:59 |
ianw | clarkb: yep, sure. if you want to log the steps i can follow along and help out with some of the others in due course | 22:59 |
*** weifan has quit IRC | 22:59 | |
clarkb | ianw: run the disable commands in that link I pasted earlier on the load balancer. Log into giteaXY and do `docker ps -a` then `docker logs --tail 50 $ID_FOR_GITEA` (the ID comes from the previous command's output). When you see no new connections, reboot | 23:01 |
clarkb | then on start do the docker ps -a and docker logs again to see if it is sad about the lock file | 23:02 |
clarkb | if it is the file to delete is in /var/haproxy/data/git/.gitconfig.lock | 23:02 |
clarkb | docker should try again and it will succeed after that file is gone, then you can enable the host in haproxy as per my link earlier | 23:02 |
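A rough shell version of that per-backend procedure, assuming the haproxy backend/server names and socket path used by the gitea maintenance docs (those names are assumptions here and should be checked before pasting):

```sh
# On the load balancer: take the backend out of rotation (names assumed)
echo "disable server balance_git_https/gitea02.opendev.org" | \
    sudo socat stdio /var/haproxy/run/stats

# On the gitea host: find the gitea container, watch for traffic to stop, reboot
docker ps -a
docker logs --tail 50 GITEA_CONTAINER_ID
sudo reboot

# After reboot: if gitea panics on the stale lock file, remove it and let
# docker restart the container
docker logs --tail 50 GITEA_CONTAINER_ID
sudo rm /var/gitea/data/git/.gitconfig.lock

# Back on the load balancer: re-enable the server
echo "enable server balance_git_https/gitea02.opendev.org" | \
    sudo socat stdio /var/haproxy/run/stats
```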
ianw | ok, should i try 08? | 23:02 |
fungi | clarkb: cycling through all of them makes sense. we should hold off on rebuilds i guess | 23:02 |
clarkb | sorry /var/gitea/data/git | 23:02 |
fungi | maybe i can knock some out tomorrow and over the weekend | 23:03 |
clarkb | ianw: yup I am on 03 and it is failing on the lockfile | 23:03 |
ianw | ok, bringing up some windows ... | 23:03 |
clarkb | I bet 08 doesn't fail on the lockfile because it had the system-config ref | 23:03 |
fungi | it's an interesting theory, but still hard to know for sure it's not missing something else | 23:04 |
clarkb | ya :/ | 23:04 |
clarkb | but a lock file may prevent replication from succeeding maybe? | 23:04 |
fungi | certainly possible | 23:05 |
*** aaronsheffield has quit IRC | 23:05 | |
clarkb | 03 is done, doing 04 now | 23:06 |
*** _erlon_ has quit IRC | 23:07 | |
clarkb | 04 also had lock problem | 23:08 |
clarkb | gitea logs when it starts listening on 3000 too | 23:10 |
clarkb | though the new health checks should make that a non-issue if we want to enable early? | 23:10 |
fungi | just wondering if we should take this opportunity to yank several of the problem servers out of rotation and rebuild them in parallel while volume is low | 23:10 |
ianw | ok 08 rebooted, back in rotation and i can't see anything bad in logs | 23:11 |
clarkb | fungi: to do that the "right" way we have to update system-config which requires working gitea | 23:11 |
fungi | true | 23:11 |
clarkb | 04 is back up, doing 05 now | 23:12 |
ianw | i'll do 07 | 23:12 |
clarkb | 05 too had lockfile problems | 23:13 |
clarkb | (the correlation seems very strong) | 23:14 |
fungi | i guess the "wrong" way would be to put the haproxy server into emergency disable and then manually tweak the config to remove those from pools | 23:14 |
fungi | or use the command socket | 23:14 |
clarkb | fungi: using the command socket should be safe without emergency updates | 23:14 |
clarkb | the problem is we can't use the inventory we have to launch nodes until we remove the nodes we want to replace | 23:14 |
fungi | right | 23:15 |
clarkb | 05 is done. doing 06 | 23:16 |
ianw | 07 has the lockfile issue | 23:17 |
clarkb | 06 did too | 23:17 |
*** mriedem has quit IRC | 23:18 | |
clarkb | ianw: when 07 is happy let me know and I think I'll trigger system-config replication on all the giteas | 23:18 |
*** betherly has joined #openstack-infra | 23:18 | |
clarkb | then we can check if they have all updated, if they have then I think we trigger replication again globally | 23:18 |
clarkb | (maybe do it one gitea at a time to see if that is faster than all at once?) | 23:18 |
fungi | yeah, should definitely see if there's any speedup | 23:19 |
fungi | if it takes roughly an hour to replicate one, then the parallel replication is apparently not buying us any performance increase | 23:19 |
*** jamesmcarthur has joined #openstack-infra | 23:20 | |
fungi | and we ought to focus first on replicating to the ones we suspect are broken before the rest | 23:20 |
clarkb | 07 looks up? | 23:20 |
fungi | we could even take some/all of the ones we think have stale state out of the haproxy pools in the interim | 23:21 |
ianw | yep just came back into rotation and seems ok | 23:21 |
clarkb | alright I'm going to trigger system-config replication to all giteas now | 23:21 |
fungi | not going to try one at a time after all? | 23:21 |
clarkb | just system-config | 23:21 |
fungi | oh, right | 23:21 |
fungi | so with that we can still take a few out of the inventory and replace them while we replicate to the others | 23:22 |
*** weifan has joined #openstack-infra | 23:22 | |
clarkb | all 8 render the latest commit of system-config now | 23:22 |
*** betherly has quit IRC | 23:23 | |
clarkb | for replication should we do 01 then 03-08 in that order? skipping 02 since it is going to be replaced? | 23:23 |
clarkb | I'll trigger 01 replication now if so | 23:23 |
fungi | i'd say 01 and 06 first? | 23:23 |
fungi | since 06 will also not be rebuilt | 23:23 |
clarkb | oh good point | 23:23 |
clarkb | ya 01, 06, 03, 04, 05, 07, 08 in that order | 23:24 |
clarkb | triggering 01 now | 23:24 |
fungi | i mean, serially still if you want | 23:24 |
clarkb | yes serially | 23:24 |
*** jamesmcarthur has quit IRC | 23:24 | |
clarkb | 01 is in progress now. ~2100 tasks | 23:25 |
fungi | but yeah, the new servers first, and we could work on replacing 02,03,04 together or something | 23:25 |
fungi | and then replace 05,07,08 in a second batch | 23:25 |
clarkb | fungi: we'll need a new change to the inventory if we want ot batch them | 23:25 |
clarkb | at this point unlikely to get any of them done today? so maybe we push that up as prep for tomorrow? | 23:26 |
fungi | that's fine too. i'm willing to work on some server replacements this evening but just as happy to save them for tomorrow when more folks are on hand | 23:26 |
fungi | and when we're not conflating today's incident with issues we might create with server replacements | 23:27 |
clarkb | fungi: well I don't want you to feel pressured to do that. I think we'll be ok to limp into tomorrow if these replications work | 23:27 |
*** weifan has quit IRC | 23:27 | |
clarkb | I'm going to have to make dinner in the near future: curry too so won't be able to type and eat :) | 23:27 |
fungi | ahh, yes, let's not get in the way of curry ;) | 23:28 |
* fungi is envious | 23:28 | |
ianw | (not something i want to take on while you're all away, not quite across it well enough) | 23:29 |
*** armax has joined #openstack-infra | 23:30 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Parse log file in action module https://review.opendev.org/672839 | 23:30 |
*** tjgresha has quit IRC | 23:31 | |
clarkb | already down to 1100 tasks | 23:31 |
clarkb | at this rate serializing will be done in ~12 minutes? | 23:32 |
clarkb | (maybe we should reduce the thread count then) | 23:32 |
*** weifan has joined #openstack-infra | 23:32 | |
fungi | i also wonder if it just goes faster when nobody's using gerrit | 23:34 |
clarkb | could be | 23:36 |
*** weifan has quit IRC | 23:37 | |
fungi | i basically started the mass replication just when the bulk of our activity was climbing for the day | 23:38 |
clarkb | and time | 23:38 |
clarkb | about 14-15 minutes? | 23:39 |
clarkb | starting 06 now | 23:39 |
clarkb | `ssh -p 29418 user@review.openstack.org replication start --url gitea06.opendev.org` is the command I'm running | 23:39 |
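To run the same trigger serially for the remaining backends, one option is a small loop like the sketch below; it assumes the replication plugin's --wait flag (which blocks until the pushes for that URL finish) and follows the order agreed above:

```sh
# Trigger replication to each remaining backend one at a time; --wait (assumed
# flag) makes each invocation block until that backend's pushes complete
for host in gitea03 gitea04 gitea05 gitea07 gitea08; do
    ssh -p 29418 user@review.openstack.org \
        replication start --wait --url ${host}.opendev.org
done
```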
fungi | yeah, that was waaaay faster than earlier | 23:40 |
*** goldyfruit has quit IRC | 23:40 | |
*** sshnaidm is now known as sshnaidm|off | 23:43 | |
*** dchen has joined #openstack-infra | 23:45 | |
*** dchen has quit IRC | 23:45 | |
*** dchen has joined #openstack-infra | 23:46 | |
fungi | already more than halfway done with 06 | 23:46 |
fungi | wonder how fast it goes with two at a time | 23:47 |
clarkb | fungi: I can do 03 and 04 together next | 23:47 |
fungi | though another possibility is that 01 and 06 are faster than the rest? | 23:48 |
clarkb | could be | 23:48 |
fungi | (on faster storage owing to being created more recently) | 23:48 |
clarkb | fungi: and proper journal size | 23:48 |
fungi | yeah | 23:49 |
clarkb | however I Think we should do 03 and 04 together for science | 23:49 |
fungi | for science, yes | 23:49 |
*** jamesmcarthur has joined #openstack-infra | 23:50 | |
clarkb | 03 and 04 started | 23:53 |
*** jamesmcarthur has quit IRC | 23:55 | |
*** smcginnis has quit IRC | 23:56 |