*** tosky has quit IRC | 00:01 | |
*** smarcet has quit IRC | 00:07 | |
clarkb | melwitt: there is definitely something going on there and I understand it less after looking at your example | 00:08 |
clarkb | later in the console log it outputs the route table and ip-route:default via 10.1.0.1 dev eth0 is in there | 00:09 |
clarkb | which is what I think WARN: failed: route add -net "0.0.0.0/0" gw "10.1.0.1" would give you | 00:09 |
clarkb | and if that can happen without the disk trouble then maybe that is just noise | 00:09 |
clarkb | now I wonder if there is a neutron bug | 00:10 |
clarkb | and I just got distracted by shiny error and warning messages | 00:10 |
melwitt | yeah, I noticed that too in the route table dump but didn't understand it | 00:10 |
clarkb | melwitt: we may want to get someone from neutron to look at some of these with fresh eyes and see if they notice anything less shiny and distracting | 00:11 |
melwitt | haha, indeed those errors and warnings are shiny | 00:11 |
*** _alastor_ has joined #openstack-infra | 00:11 | |
melwitt | ok. I think sean-k-mooney might be able to help here, so I'll ask tomorrow | 00:12 |
clarkb | ianw: I am able to login to github with the new account given the details you provided | 00:13 |
clarkb | ianw: so that seems to work | 00:13 |
clarkb | melwitt: also given the number of hits changing the query finds (and them all being failures) maybe we should update the bug and query | 00:13 |
clarkb | it is odd that if the route add was unrelated that they would all be failures so maybe there is a thread there to pull on | 00:13 |
*** wolverineav has quit IRC | 00:14 | |
melwitt | I was thinking similar, that if the WARN is unrelated, why is it all failures. I did notice a lot of dupes in the build uuids though, so I'm not sure if there's anything I can do to reduce those with the query too | 00:14 |
melwitt | lots of hits were on the same build uuid I mean | 00:15 |
clarkb | e-r will dedup them when it makes the graphs iirc | 00:15 |
clarkb | and ya that will happen if multiple cirros boots have the same issue in one job | 00:15 |
melwitt | k | 00:15 |
clarkb | there isn't a built in uniq filter we can use with elasticsearch though | 00:15 |
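Since the search itself cannot return unique builds, the dedup has to happen client-side, after the hits come back. A minimal sketch of that step, assuming each hit is a dict carrying a `build_uuid` field; the helper name and sample data are hypothetical, and this is not elastic-recheck's actual code:

```python
def unique_builds(hits):
    """Collapse search hits to one entry per build UUID, keeping the first seen.

    Several cirros boots failing in one job produce several hits that share a
    build UUID; counting builds instead of hits avoids inflating the numbers.
    """
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for hit in hits:
        seen.setdefault(hit["build_uuid"], hit)
    return list(seen.values())


# Hypothetical hits: two boots in build-1 hit the same WARN line.
hits = [
    {"build_uuid": "build-1", "message": 'WARN: failed: route add -net "0.0.0.0/0"'},
    {"build_uuid": "build-1", "message": 'WARN: failed: route add -net "0.0.0.0/0"'},
    {"build_uuid": "build-2", "message": 'WARN: failed: route add -net "0.0.0.0/0"'},
]
print(len(unique_builds(hits)))  # 2
```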
*** smarcet has joined #openstack-infra | 00:18 | |
*** wolverineav has joined #openstack-infra | 00:18 | |
clarkb | in ansible, if a var is set in one role via defaults and vars dir yaml files, are those vars not available to subsequent roles? | 00:21 |
clarkb | http://logs.openstack.org/25/624525/1/check/openstack-infra-multinode-integration-centos-7/05ed114/job-output.txt.gz#_2018-12-11_23_58_47_946410 implies that that may be the case | 00:21 |
clarkb | dmsimard: pabelanger ^ for some reason I thought vars were global but I must be misunderstanding? | 00:21 |
*** dklyle has quit IRC | 00:21 | |
openstackgerrit | melanie witt proposed openstack-infra/elastic-recheck master: Update query for bug 1808010 https://review.openstack.org/624533 | 00:21 |
openstack | bug 1808010 in OpenStack-Gate "Tempest cirros boots fail due to lack of disk space" [Undecided,New] https://launchpad.net/bugs/1808010 | 00:21 |
*** wolverineav has quit IRC | 00:23 | |
melwitt | clarkb: you know what though, we only index the cirros log if a job fails, so we would only see the WARN message on a job failure, even if it's unrelated to the fail (for example, if it's always emitted) | 00:25 |
melwitt | *we only collect the cirros log if a job fails | 00:25 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Use mirrors if available when installing OVS on centos https://review.openstack.org/624525 | 00:27 |
clarkb | melwitt: oh good point since tempest dumps that data out on failure only | 00:28 |
clarkb | so ya could be entirely unrelated and we have some other networking bug there | 00:28 |
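melwitt's point is a selection effect: if the console log is only collected when a job fails, every indexed WARN hit is a failure even when the WARN line is emitted on every boot. A toy model of that, with an invented job population (the 10% failure rate is an assumption for illustration only):

```python
def indexed_hits(jobs):
    """Jobs whose cirros console log would be searchable and contain the WARN line.

    Models the collection rule from the log: the console log is only dumped
    (and therefore only indexed) when the job fails.
    """
    return [job for job in jobs if job["failed"] and job["warn_emitted"]]


# Hypothetical population: WARN is emitted on every boot, pass or fail,
# and one job in ten fails for some unrelated reason.
jobs = [{"failed": i % 10 == 0, "warn_emitted": True} for i in range(100)]
hits = indexed_hits(jobs)
print(len(hits), all(job["failed"] for job in hits))  # 10 True
```

All 100 jobs emit the warning, yet the search only ever sees the 10 failures, so "all hits are failures" says nothing about causation here.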
melwitt | yeah | 00:29 |
clarkb | we probably do want to sort out the cirros issues even if they aren't failures (so that they don't become shiny fly traps for us in the future) | 00:29 |
clarkb | but definitely lower priority if that is the case | 00:29 |
*** dklyle has joined #openstack-infra | 00:34 | |
*** xek__ has joined #openstack-infra | 00:37 | |
*** tpsilva has quit IRC | 00:37 | |
*** smarcet has quit IRC | 00:39 | |
*** xek_ has quit IRC | 00:39 | |
clarkb | melwitt: ok, rereading the tempest portion of the log from logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/ it appears paramiko says it is connecting so tcp works | 00:43 |
clarkb | the failure is with ssh public key auth | 00:43 |
clarkb | melwitt: that is the same as your example | 00:43 |
clarkb | melwitt: failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key is an error from yours | 00:46 |
clarkb | melwitt: is config drive enabled on your job? | 00:46 |
*** wolverineav has joined #openstack-infra | 00:48 | |
*** Swami has quit IRC | 00:49 | |
*** armax has joined #openstack-infra | 00:49 | |
clarkb | b'failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys' error on mine | 00:53 |
clarkb | now I'm fairly certain we are seeing a bug in the instance -> neutron metadata api -> nova (metadata) api services | 00:53 |
clarkb | I've got to go now to keep an eye on kids, but hopefully ^ is useful | 00:56 |
*** jamesmcarthur has joined #openstack-infra | 00:57 | |
ianw | clarkb: thanks, if we get votes on https://review.openstack.org/#/c/624531/ i'll consider that a sign to add permissions to the account | 00:58 |
*** jamesmcarthur has quit IRC | 00:59 | |
*** agopi has joined #openstack-infra | 01:03 | |
*** jamesmcarthur has joined #openstack-infra | 01:04 | |
*** dklyle has quit IRC | 01:04 | |
*** rlandy has quit IRC | 01:05 | |
*** dave-mccowan has joined #openstack-infra | 01:15 | |
*** sthussey has quit IRC | 01:19 | |
*** mriedem has quit IRC | 01:23 | |
*** rh-jelabarre has quit IRC | 01:24 | |
*** _alastor_ has quit IRC | 01:25 | |
*** markvoelker has quit IRC | 01:41 | |
*** jamesmcarthur has quit IRC | 01:41 | |
*** jamesmcarthur has joined #openstack-infra | 01:42 | |
*** anteaya has quit IRC | 01:45 | |
*** jamesmcarthur has quit IRC | 01:46 | |
*** hwoarang has quit IRC | 01:48 | |
*** hwoarang has joined #openstack-infra | 01:53 | |
*** jamesmcarthur has joined #openstack-infra | 02:00 | |
*** armax has quit IRC | 02:02 | |
*** jamesmcarthur has quit IRC | 02:05 | |
*** pots has joined #openstack-infra | 02:05 | |
*** mrsoul has quit IRC | 02:09 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Use mirrors if available when installing OVS on centos https://review.openstack.org/624525 | 02:11 |
clarkb | mwhahaha: ^ I think I got it right that time | 02:11 |
clarkb | melwitt: my ssh debugging script that I added to tempest a while back should try and print the authorized user ssh keys | 02:15 |
clarkb | melwitt: mriedem looking at http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/job-output.txt.gz#_2018-12-11_11_17_44_640445 there are no contents in that so I think that confirms the issue is related to the metadata service not providing that data back | 02:16 |
*** wolverineav has quit IRC | 02:17 | |
*** wolverineav has joined #openstack-infra | 02:18 | |
*** armax has joined #openstack-infra | 02:18 | |
*** xarses has joined #openstack-infra | 02:21 | |
*** wolverineav has quit IRC | 02:22 | |
*** xarses_ has joined #openstack-infra | 02:22 | |
*** jrist has quit IRC | 02:25 | |
*** xarses has quit IRC | 02:26 | |
*** fuentess has quit IRC | 02:26 | |
clarkb | what is really weird is the q-meta log file shows that path being GETted and responding with a 200 | 02:27 |
clarkb | however it looks like it took 10 seconds, which is maybe beyond the cirros timeout? | 02:27 |
clarkb | melwitt: mriedem: I wonder if the slowness logged at http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/controller/logs/screen-n-api-meta.txt.gz#_Dec_11_10_46_57_286705 is leading the cirros client to time out and give up | 02:29 |
clarkb | any idea where to look for that lost time? | 02:29 |
*** bobh has quit IRC | 02:30 | |
clarkb | https://github.com/XANi/cirros/blob/master/src/usr/bin/ec2metadata#L5 the cirros timeout is 10 seconds so we are just over | 02:35 |
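The failure mode clarkb describes (the server logs a 200 while the client has already given up) can be reproduced locally. A minimal sketch, assuming a fake metadata service on 127.0.0.1 and timings scaled down from the ~10 s in the log to fractions of a second; the `/fast` and `/slow` paths and the key body are invented, and this is not how cirros' `ec2metadata` is implemented:

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Scaled-down stand-ins for ~10 s of server time vs. a 10 s client timeout.
CLIENT_TIMEOUT = 0.2  # seconds the client is willing to wait
SLOW_DELAY = 0.5      # seconds the fake metadata service spends on /slow


class FakeMetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/slow"):
            time.sleep(SLOW_DELAY)  # e.g. slow backend calls before answering
        body = b"ssh-rsa AAAAexample test-key\n"
        try:
            self.send_response(200)  # the server still "succeeds" with a 200
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already hung up

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, timeout):
    """Return the response body, or None if the client times out first."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except socket.timeout:
        return None
    except urllib.error.URLError as err:
        if isinstance(err.reason, socket.timeout):
            return None
        raise


server = ThreadingHTTPServer(("127.0.0.1", 0), FakeMetadataHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

fast = fetch(base + "/fast/public-keys/0/openssh-key", CLIENT_TIMEOUT)
slow = fetch(base + "/slow/public-keys/0/openssh-key", CLIENT_TIMEOUT)
print(fast is not None, slow is None)  # True True
server.shutdown()
server.server_close()
```

The slow request produces exactly the mismatch seen in the logs: a successful response on the server side and an empty key on the guest side.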
*** hongbin has joined #openstack-infra | 02:44 | |
*** bhavikdbavishi has joined #openstack-infra | 02:48 | |
*** jamesmcarthur has joined #openstack-infra | 02:56 | |
*** ykarel|away has joined #openstack-infra | 02:58 | |
*** agopi has quit IRC | 02:59 | |
*** xarses_ has quit IRC | 03:01 | |
*** bobh has joined #openstack-infra | 03:03 | |
*** ykarel|away has quit IRC | 03:05 | |
*** agopi has joined #openstack-infra | 03:06 | |
*** bobh has quit IRC | 03:07 | |
*** apetrich has quit IRC | 03:15 | |
*** jamesmcarthur has quit IRC | 03:24 | |
*** jamesmcarthur has joined #openstack-infra | 03:25 | |
*** armax has quit IRC | 03:27 | |
*** psachin has joined #openstack-infra | 03:27 | |
melwitt | clarkb: thanks for all that info. this is the code that could get time consuming https://github.com/openstack/nova/blob/master/nova/api/metadata/base.py#L117-L119 it makes at least one call to neutron for security groups. looking through it now (not already familiar with it) | 03:29 |
*** wolverineav has joined #openstack-infra | 03:30 | |
*** jamesmcarthur has quit IRC | 03:31 | |
*** bobh has joined #openstack-infra | 03:32 | |
*** bobh has quit IRC | 03:36 | |
*** jamesmcarthur has joined #openstack-infra | 03:36 | |
*** jamesmcarthur has quit IRC | 03:42 | |
*** jamesmcarthur has joined #openstack-infra | 03:44 | |
*** Tengu has quit IRC | 03:51 | |
*** dave-mccowan has quit IRC | 03:52 | |
*** hwoarang has quit IRC | 03:56 | |
*** Tengu has joined #openstack-infra | 03:58 | |
*** yamamoto has quit IRC | 04:02 | |
*** hwoarang has joined #openstack-infra | 04:03 | |
*** yamamoto has joined #openstack-infra | 04:06 | |
*** ykarel|away has joined #openstack-infra | 04:08 | |
*** bobh has joined #openstack-infra | 04:14 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Use mirrors if available when installing OVS on centos https://review.openstack.org/624525 | 04:17 |
*** udesale has joined #openstack-infra | 04:17 | |
*** bobh has quit IRC | 04:18 | |
*** jamesmcarthur has quit IRC | 04:23 | |
*** bobh has joined #openstack-infra | 04:23 | |
*** jamesmcarthur has joined #openstack-infra | 04:24 | |
*** bobh has quit IRC | 04:27 | |
*** bobh has joined #openstack-infra | 04:31 | |
*** yamamoto has quit IRC | 04:31 | |
*** bobh has quit IRC | 04:34 | |
*** pots has quit IRC | 04:43 | |
*** bobh has joined #openstack-infra | 04:43 | |
*** pots has joined #openstack-infra | 04:44 | |
*** bobh has quit IRC | 04:48 | |
*** yamamoto has joined #openstack-infra | 04:50 | |
*** yamamoto has quit IRC | 04:52 | |
*** yamamoto has joined #openstack-infra | 04:53 | |
*** hongbin has quit IRC | 04:55 | |
openstackgerrit | Michael Johnson proposed openstack-infra/project-config master: Add publish-to-pypi for octavia-lib https://review.openstack.org/624574 | 04:56 |
prometheanfire | johnsom: :D | 04:57 |
*** jamesmcarthur has quit IRC | 04:58 | |
johnsom | prometheanfire: coming to a g-r near you soon... | 04:58 |
*** xarses has joined #openstack-infra | 04:59 | |
*** xarses has quit IRC | 04:59 | |
*** xarses has joined #openstack-infra | 04:59 | |
*** bobh has joined #openstack-infra | 05:00 | |
*** bobh has quit IRC | 05:04 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: mirror-workspace-git-repos: Explicitly show HEAD of checked out branches https://review.openstack.org/621840 | 05:06 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Update test-mirror-workspace-git-repos, add test https://review.openstack.org/624575 | 05:06 |
ianw | AJaeger: to your question yesterday, i think we need to merge ^ and then it will test ^^ | 05:07 |
*** ykarel|away has quit IRC | 05:09 | |
*** bobh has joined #openstack-infra | 05:09 | |
*** bobh has quit IRC | 05:14 | |
*** wolverineav has quit IRC | 05:17 | |
*** dhellmann has quit IRC | 05:19 | |
*** rtjure has quit IRC | 05:19 | |
*** dhellmann has joined #openstack-infra | 05:20 | |
*** rtjure has joined #openstack-infra | 05:22 | |
*** wolverineav has joined #openstack-infra | 05:24 | |
*** ykarel|away has joined #openstack-infra | 05:25 | |
*** ykarel|away is now known as ykarel | 05:26 | |
*** jamesmcarthur has joined #openstack-infra | 05:34 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Add a note on testing https://review.openstack.org/624578 | 05:44 |
*** bobh has joined #openstack-infra | 05:46 | |
*** bobh has quit IRC | 05:51 | |
*** dklyle has joined #openstack-infra | 05:51 | |
*** dklyle has quit IRC | 05:56 | |
*** _alastor_ has joined #openstack-infra | 06:06 | |
*** wolverineav has quit IRC | 06:09 | |
*** hwoarang has quit IRC | 06:16 | |
*** hwoarang has joined #openstack-infra | 06:22 | |
*** dayou has quit IRC | 06:22 | |
*** dayou has joined #openstack-infra | 06:23 | |
*** bobh has joined #openstack-infra | 06:26 | |
*** slaweq has joined #openstack-infra | 06:29 | |
*** yboaron_ has joined #openstack-infra | 06:29 | |
*** lpetrut has joined #openstack-infra | 06:30 | |
*** bobh has quit IRC | 06:31 | |
*** ramishra has quit IRC | 06:32 | |
*** ramishra has joined #openstack-infra | 06:33 | |
*** bobh has joined #openstack-infra | 06:34 | |
*** bhavikdbavishi1 has joined #openstack-infra | 06:36 | |
*** bhavikdbavishi has quit IRC | 06:37 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 06:37 | |
*** bobh has quit IRC | 06:39 | |
*** jamesmcarthur has quit IRC | 06:42 | |
*** _alastor_ has quit IRC | 06:45 | |
*** kjackal has joined #openstack-infra | 06:52 | |
openstackgerrit | Merged openstack-infra/project-config master: Add openstack/os-api-ref to #openstack-doc https://review.openstack.org/623013 | 07:05 |
openstackgerrit | Merged openstack-infra/project-config master: Remove openstack/osc-placement from #openstack-nova https://review.openstack.org/622987 | 07:05 |
*** quiquell|off is now known as quiquell | 07:13 | |
*** e0ne has joined #openstack-infra | 07:13 | |
*** masayukig[m] has joined #openstack-infra | 07:13 | |
*** bobh has joined #openstack-infra | 07:15 | |
*** xarses_ has joined #openstack-infra | 07:17 | |
*** jamesmcarthur has joined #openstack-infra | 07:18 | |
*** bobh has quit IRC | 07:19 | |
*** jamesmcarthur has quit IRC | 07:23 | |
*** wolverineav has joined #openstack-infra | 07:25 | |
*** openstackgerrit has quit IRC | 07:29 | |
*** dklyle has joined #openstack-infra | 07:29 | |
*** wolverineav has quit IRC | 07:29 | |
*** rcernin has quit IRC | 07:30 | |
*** alexchadin has joined #openstack-infra | 07:31 | |
*** jamesmcarthur has joined #openstack-infra | 07:40 | |
*** jamesmcarthur has quit IRC | 07:44 | |
*** lpetrut has quit IRC | 07:46 | |
*** pgaxatte has joined #openstack-infra | 07:50 | |
*** xarses has quit IRC | 07:50 | |
*** xarses has joined #openstack-infra | 07:50 | |
*** bobh has joined #openstack-infra | 07:55 | |
*** apetrich has joined #openstack-infra | 07:55 | |
*** bobh has quit IRC | 08:00 | |
*** ahosam has joined #openstack-infra | 08:01 | |
*** jamesmcarthur has joined #openstack-infra | 08:01 | |
*** ahosam has quit IRC | 08:01 | |
*** ahosam has joined #openstack-infra | 08:01 | |
*** ahosam has quit IRC | 08:02 | |
*** shardy has joined #openstack-infra | 08:05 | |
*** jtomasek has joined #openstack-infra | 08:05 | |
*** jamesmcarthur has quit IRC | 08:05 | |
*** dayou has quit IRC | 08:10 | |
*** bobh has joined #openstack-infra | 08:13 | |
*** openstackgerrit has joined #openstack-infra | 08:16 | |
openstackgerrit | Merged openstack-infra/system-config master: Mirror Stein on Ubuntu from Cloud Archive https://review.openstack.org/621231 | 08:16 |
*** bobh has quit IRC | 08:17 | |
*** yboaron_ has quit IRC | 08:18 | |
*** shardy has quit IRC | 08:21 | |
*** shardy has joined #openstack-infra | 08:22 | |
*** jamesmcarthur has joined #openstack-infra | 08:22 | |
*** imacdonn has quit IRC | 08:23 | |
*** imacdonn has joined #openstack-infra | 08:23 | |
*** jamesmcarthur has quit IRC | 08:26 | |
*** dayou has joined #openstack-infra | 08:28 | |
*** e0ne has quit IRC | 08:31 | |
*** dklyle has quit IRC | 08:32 | |
evrardjp | is https://review.openstack.org/#/c/624484/2 scary, or a welcome addition? | 08:33 |
*** bhavikdbavishi has quit IRC | 08:35 | |
*** ccamacho has joined #openstack-infra | 08:37 | |
*** dayou has quit IRC | 08:43 | |
*** markvoelker has joined #openstack-infra | 08:44 | |
*** dayou has joined #openstack-infra | 08:44 | |
*** bobh has joined #openstack-infra | 08:45 | |
*** tosky has joined #openstack-infra | 08:46 | |
*** markvoelker has quit IRC | 08:49 | |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/zuul-jobs master: Add docker insecure registries feature https://review.openstack.org/624484 | 08:51 |
*** jpena|off is now known as jpena | 08:55 | |
*** jonher_ has joined #openstack-infra | 08:57 | |
*** fresta_ has joined #openstack-infra | 08:57 | |
*** jonher has quit IRC | 09:00 | |
*** jonher_ is now known as jonher | 09:00 | |
*** jpich has joined #openstack-infra | 09:01 | |
*** fresta has quit IRC | 09:01 | |
*** jamesmcarthur has joined #openstack-infra | 09:04 | |
*** yamamoto has quit IRC | 09:06 | |
*** yboaron_ has joined #openstack-infra | 09:07 | |
*** jamesmcarthur has quit IRC | 09:08 | |
*** mtreinish has quit IRC | 09:11 | |
*** yamamoto has joined #openstack-infra | 09:11 | |
*** dtantsur|afk is now known as dtantsur | 09:19 | |
*** bhavikdbavishi has joined #openstack-infra | 09:19 | |
*** jamesmcarthur has joined #openstack-infra | 09:20 | |
*** bhavikdbavishi has quit IRC | 09:23 | |
*** jamesmcarthur has quit IRC | 09:24 | |
quiquell | Good morning | 09:31 |
amorin | morning | 09:31 |
quiquell | We are having issues with fedora28 nodepool images | 09:31 |
quiquell | amorin: o/ | 09:31 |
Tengu | ah.... cool, I was wondering if it was due to my patch not being in-sync with master. | 09:32 |
quiquell | amorin: Do you know something about them ? | 09:32 |
amorin | nop, sorry | 09:34 |
quiquell | amorin: Who can help me with that ? | 09:34 |
*** psachin is now known as psachin|session | 09:35 | |
*** jamesmcarthur has joined #openstack-infra | 09:36 | |
amorin | maybe frickler ? | 09:36 |
amorin | or fungi, but they are on another tz | 09:37 |
*** markvoelker has joined #openstack-infra | 09:39 | |
*** jamesmcarthur has quit IRC | 09:41 | |
frickler | quiquell: maybe you could start by explaining what you think the issue might be | 09:41 |
*** lpetrut has joined #openstack-infra | 09:41 | |
frickler | amorin: did you see my remarks yesterday? I'm not sure how we should proceed now, does it still make sense to continue with the 20-nodes-benchmarking setup? | 09:42 |
amorin | frickler: yes I saw it | 09:43 |
quiquell | frickler: yep sorry, the thing is that we have a script at tripleo with the shebang "python3 -s" | 09:43 |
quiquell | frickler: -s modifier means don't use user site-packages, meaning don't use pip stuff | 09:43 |
quiquell | frickler: but fedora28 has setuptools from pip since it's excluded from dnf.conf | 09:43 |
amorin | so, for now I think it's not needed to move forward until we fix our RAM issue | 09:43 |
quiquell | frickler: so it fails using "pkg_resources", which is in setuptools | 09:44 |
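Strictly, `-s` only stops Python from adding the user site directory (typically under `~/.local`) to `sys.path`; whether a pip-installed setuptools disappears under it depends on where pip put it, which the log does not say. A small probe to check that on a given image, assuming only the current interpreter (`sys.executable`) is of interest:

```python
import os
import subprocess
import sys

# Child-side probe: is the user site dir skipped, and where would setuptools load from?
PROBE = (
    "import importlib.util, sys; "
    "spec = importlib.util.find_spec('setuptools'); "
    "print(sys.flags.no_user_site); "
    "print(spec.origin if spec else None)"
)


def probe(flags):
    """Run a child interpreter with the given flags and report what it sees."""
    env = dict(os.environ)
    env.pop("PYTHONNOUSERSITE", None)  # this variable has the same effect as -s
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        check=True, capture_output=True, text=True, env=env,
    )
    no_user_site, setuptools_origin = result.stdout.splitlines()
    return int(no_user_site), setuptools_origin


for flags in ([], ["-s"]):
    print(flags, probe(flags))
```

If the second line prints `None` (or a different path) only under `-s`, setuptools lives in the user site directory on that image.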
amorin | anyway I moved your instances, one per host as said | 09:44 |
*** derekh has joined #openstack-infra | 09:44 | |
*** lpetrut has quit IRC | 09:45 | |
frickler | amorin: so your plan would be to install more RAM into the hosts? or adapt the quota settings to the RAM that is actually installed? I'm assuming the latter would be faster | 09:46 |
frickler | quiquell: is this a new issue? also, it sounds to me like the solution would be to either amend your script or the fedora setup. I'm guessing the former might be easier. also ianw seems to be our resident fedora expert, maybe he has some further insight | 09:48 |
amorin | the plan is a little bit different, because we have something that is leaking some ram on the host itself, not related to instances | 09:48 |
amorin | but the side effect is that instances wont have enough ram available | 09:48 |
frickler | amorin: ah, o.k. | 09:48 |
amorin | so adding more ram or setting quota is not the solution | 09:48 |
*** bhavikdbavishi has joined #openstack-infra | 09:48 | |
amorin | we need to work deeper on our leak | 09:48 |
quiquell | frickler: We have started to test python3 stuff using fedora28 for future centos versions, so it's new | 09:49 |
quiquell | frickler: I suppose the "-s" modifier is there for a reason, have to ask around | 09:49 |
quiquell | frickler: But I don't know why the nodepool f28 image is cooked with exclusions in dnf.conf and uses pip setuptools | 09:49 |
quiquell | frickler: Someone told me about, people not messing around with the images or the like | 09:50 |
frickler | quiquell: I can only guess that we need to avoid pip >= 10 because it broke devstack | 09:50 |
*** jamesmcarthur has joined #openstack-infra | 09:51 | |
quiquell | frickler: ok, so it would be ok for a job to modify dnf.conf and install setuptools from dnf ? | 09:51 |
quiquell | frickler: in case it's not using devstack | 09:52 |
frickler | amorin: o.k., so I'm going to assume that there is nothing to be done currently on infra-root side, we'll wait for your further feedback. | 09:52 |
quiquell | ssbarnea|rover: ^ | 09:53 |
*** gfidente has joined #openstack-infra | 09:53 | |
amorin | frickler: yes | 09:53 |
frickler | quiquell: in theory anything that works for your job and doesn't break things globally would seem fine I guess | 09:53 |
quiquell | frickler: fair enough, thanks! | 09:55 |
*** jamesmcarthur has quit IRC | 09:56 | |
quiquell | frickler: Do you have a pointer to the part of the code that cooks the f28 image and adds the exclusion to dnf.conf? | 09:57 |
quiquell | ssbarnea|rover: ^ | 09:57 |
frickler | quiquell: looks like this is what is installing the exclude: http://git.openstack.org/cgit/openstack/diskimage-builder/tree/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#n145 | 09:59 |
quiquell | frickler: thanks so much | 10:00 |
ssbarnea|rover | frickler: am I correct to assume that the only reason for installing from pip and doing the exclude was because it was not available on base os? | 10:01 |
ssbarnea|rover | so if we change the logic into: "install from rpm if available, else install from pip and add exclude" would be fine, right? | 10:02 |
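ssbarnea's proposal is a per-package branch. A sketch that only spells out the decision, assuming Fedora-style `python3-<name>` RPM names; `plan_install` is a hypothetical helper that runs nothing and is not the diskimage-builder element's logic:

```python
def plan_install(package, rpm_available):
    """Return the steps the proposal implies for one package (hypothetical helper).

    Prefer the distro RPM; only fall back to pip, and only then add the dnf
    exclude, so dnf cannot later overwrite the pip-installed copy.
    """
    rpm_name = f"python3-{package}"  # assumption: Fedora naming convention
    if rpm_available:
        return [f"dnf install -y {rpm_name}"]
    return [
        f"pip install {package}",
        f"add {rpm_name} to exclude= in /etc/dnf/dnf.conf",
    ]


print(plan_install("setuptools", rpm_available=True))
print(plan_install("setuptools", rpm_available=False))
```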
AJaeger | quiquell: ianw and pabelanger did most of the fedora work, best discuss with them | 10:03 |
*** electrofelix has joined #openstack-infra | 10:04 | |
*** sshnaidm|afk is now known as sshnaidm | 10:05 | |
quiquell | AJaeger: ack thanks | 10:05 |
*** e0ne has joined #openstack-infra | 10:08 | |
*** jamesmcarthur has joined #openstack-infra | 10:12 | |
*** jamesmcarthur has quit IRC | 10:17 | |
*** jamesmcarthur has joined #openstack-infra | 10:19 | |
*** quiquell is now known as quiquell|brb | 10:19 | |
*** yamamoto has quit IRC | 10:21 | |
*** pbourke has quit IRC | 10:26 | |
*** pbourke has joined #openstack-infra | 10:28 | |
*** jamesmcarthur has quit IRC | 10:28 | |
*** jamesmcarthur has joined #openstack-infra | 10:35 | |
frickler | quiquell|brb: the main issue is that the os version might be too new for what is needed in testing, i.e. the pip >= 10 issue I was mentioning earlier | 10:39 |
*** jamesmcarthur has quit IRC | 10:40 | |
*** quiquell|brb is now known as quiquell | 10:40 | |
quiquell | frickler: the pip >= 10 is only affecting devstack ? | 10:41 |
*** jamesmcarthur has joined #openstack-infra | 10:43 | |
frickler | quiquell: most likely not, but for devstack I know that in the current state it would hard fail. see https://github.com/pypa/pip/issues/4805 for some context | 10:43 |
*** abregman has joined #openstack-infra | 10:46 | |
abregman | hey, what does this mean? "Incompatible requirement found!" | 10:50 |
*** jamesmcarthur has quit IRC | 10:53 | |
*** yamamoto has joined #openstack-infra | 10:54 | |
frickler | abregman: you probably need to give us a bit more context in order to be able to come up with a helpful answer | 10:58 |
abregman | frickler: requirements-check fails for this patch https://review.openstack.org/#/c/624521 | 11:01 |
*** jamesmcarthur has joined #openstack-infra | 11:01 | |
*** udesale has quit IRC | 11:02 | |
*** udesale has joined #openstack-infra | 11:03 | |
frickler | abregman: so that's mainly a question for the requirements team, but it seems to show the details two lines earlier http://logs.openstack.org/21/624521/6/check/requirements-check/4bb9da8/job-output.txt.gz#_2018-12-12_09_58_02_847672 | 11:03 |
*** lpetrut has joined #openstack-infra | 11:03 | |
*** yamamoto has quit IRC | 11:03 | |
*** jamesmcarthur has quit IRC | 11:06 | |
frickler | abregman: IIUC the short answer is that requirements should go into the global list first before you use them in your project. folks in #openstack-requirements probably can explain this in more detail | 11:06 |
*** abregman has quit IRC | 11:07 | |
*** abregman has joined #openstack-infra | 11:07 | |
abregman | frickler: sorry, I disconnected. did you write something? | 11:08 |
frickler | abregman: yes I did, sadly eavesdrop seems to have dropped it, too: | 11:14 |
frickler | abregman: so that's mainly a question for the requirements team, but it seems to show the details two lines earlier http://logs.openstack.org/21/624521/6/check/requirements-check/4bb9da8/job-output.txt.gz#_2018-12-12_09_58_02_847672 | 11:14 |
frickler | abregman: IIUC the short answer is that requirements should go into the global list first before you use them in your project. folks in #openstack-requirements probably can explain this in more detail | 11:14 |
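The rule frickler summarizes can be sketched as a set difference: any project name in the repo's requirements that is not already in the global list gets flagged. This is a simplified, hypothetical illustration (the real requirements-check also compares version specifiers); `shiny_new_lib` and the version pins are invented:

```python
import re

NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines):
    """Extract normalized project names from requirements-style lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0]  # drop comments
        match = NAME_RE.match(line)
        if match:
            # PEP 503 normalization: runs of -, _ and . become "-", lowercase.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def missing_from_global(project_lines, global_lines):
    """Names a project uses that the global list does not know about yet."""
    return sorted(requirement_names(project_lines) - requirement_names(global_lines))


global_reqs = ["pbr!=2.1.0,>=2.0.0", "requests>=2.14.2"]
project_reqs = ["pbr>=2.0.0", "shiny_new_lib>=1.0  # hypothetical new dependency", ""]
print(missing_from_global(project_reqs, global_reqs))  # ['shiny-new-lib']
```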
*** jamesmcarthur has joined #openstack-infra | 11:14 | |
*** jamesmcarthur has quit IRC | 11:18 | |
*** bhavikdbavishi has quit IRC | 11:19 | |
*** yamamoto has joined #openstack-infra | 11:24 | |
openstackgerrit | Natal Ngétal proposed openstack/diskimage-builder master: [Configuration] Add missing py37 and corrected default envlist. https://review.openstack.org/624670 | 11:25 |
aspiers | fungi: I notice that gerritbot doesn't announce changes to openstack/governance-* anywhere (except -uc) - should I fix that? | 11:26 |
*** yamamoto has quit IRC | 11:29 | |
*** jamesmcarthur has joined #openstack-infra | 11:29 | |
*** markvoelker has quit IRC | 11:30 | |
*** e0ne has quit IRC | 11:30 | |
*** rfolco has joined #openstack-infra | 11:30 | |
aspiers | it could announce in #openstack-{dev,tc}, since openstack/governance changes are already announced there | 11:30 |
*** jamesmcarthur has quit IRC | 11:34 | |
*** smarcet has joined #openstack-infra | 11:37 | |
*** jamesmcarthur has joined #openstack-infra | 11:37 | |
openstackgerrit | Merged openstack-infra/system-config master: Ectomy some Jenkins out of the docs https://review.openstack.org/436452 | 11:53 |
*** witek has quit IRC | 11:55 | |
*** witek has joined #openstack-infra | 11:55 | |
*** jamesdenton has quit IRC | 11:56 | |
*** dtantsur is now known as dtantsur|brb | 11:56 | |
*** smarcet has quit IRC | 11:56 | |
*** tpsilva has joined #openstack-infra | 11:57 | |
*** jamesdenton has joined #openstack-infra | 12:00 | |
*** markvoelker has joined #openstack-infra | 12:05 | |
*** yamamoto has joined #openstack-infra | 12:08 | |
*** yamamoto has quit IRC | 12:11 | |
*** yamamoto has joined #openstack-infra | 12:11 | |
*** jamesmcarthur has quit IRC | 12:16 | |
*** jamesmcarthur has joined #openstack-infra | 12:17 | |
*** jamesmcarthur has quit IRC | 12:21 | |
*** abregman has quit IRC | 12:24 | |
*** jamesmcarthur has joined #openstack-infra | 12:25 | |
*** bhavikdbavishi has joined #openstack-infra | 12:39 | |
pabelanger | frickler: quiquell: AJaeger: ssbarnea|rover: There is a long history of how we install pip / setuptools / virtualenv on images, you can see some of that in the DIB url linked above. I too am actually running into this issue with dnf.conf excludes, my thought is to modify dnf.conf during the pre.yaml run and remove the excludes, however I have no idea what this is going to break. | 12:39 |
pabelanger | I think the better solution might be to drop excludes from dnf.conf and switch to using dnf versionlock, where we can pin the version of packages we have installed: https://dnf-plugins-core.readthedocs.io/en/latest/versionlock.html | 12:40 |
pabelanger | however, I don't believe we are going to produce images with versions of pip / setuptools from distro over latest, as we want to keep versions the same across distros for testing reasons | 12:41 |
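pabelanger's pre-run idea amounts to deleting the `exclude=` lines from dnf.conf. A line-based sketch that keeps every other line untouched; the sample file contents are illustrative, not the exact line diskimage-builder writes, and wiring it into a job's pre.yaml is left out:

```python
import re

EXCLUDE_LINE = re.compile(r"^\s*exclude\s*=\s*(.*)$")


def drop_excludes(conf_text):
    """Return (new_text, excluded_packages) with every exclude= line removed.

    Line-based on purpose: comments, ordering and spacing of the other
    lines in dnf.conf are preserved exactly.
    """
    kept, excluded = [], []
    for line in conf_text.splitlines(keepends=True):
        match = EXCLUDE_LINE.match(line)
        if match:
            # dnf accepts space- or comma-separated package lists.
            excluded.extend(match.group(1).replace(",", " ").split())
        else:
            kept.append(line)
    return "".join(kept), excluded


# Illustrative dnf.conf; the real exclude list on the images may differ.
sample = "[main]\ngpgcheck=1\nexclude=python3-pip python3-setuptools\n"
text, excluded = drop_excludes(sample)
print(repr(text), excluded)
```

This only answers "how", not pabelanger's open question of what removing the excludes breaks.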
*** abregman has joined #openstack-infra | 12:41 | |
pabelanger | ianw: ^ | 12:41 |
abregman | frickler: thanks, I'll check with them | 12:42 |
quiquell | pabelanger: The pinning of RPM package would be better solution | 12:42 |
pabelanger | quiquell: yes, I think so also, right now an exclude results in 404 from dnf for packages, breaking cfgmgmt if trying to install those packages | 12:43 |
*** jpena is now known as jpena|lunch | 12:43 | |
ssbarnea|rover | i doubt pinning would work, i think that test images should respect default distro setup/behavior and try not to alter it, or the result of "testing the code on xyz distro" would not be trustable. It is worth nothing to know that a tool works on a specific distro if you didn't use distro packages. | 12:44 |
ssbarnea|rover | pip got smarter recently and now is able to fail to upgrade a distro package instead of breaking the distro package. in the past it was upgrading distro packages and causing very hard to discover bugs, most often when the distro package was updated, yum was failing to update that package. | 12:46 |
*** yamamoto has quit IRC | 12:49 | |
ssbarnea|rover | this conflict between pip and the system packager is a permanent source of issues. if i remember correctly debian did something smart: system packages are in a different location than pip installed ones, so you cannot really override system packages with pip. | 12:49 |
quiquell | ssbarnea|rover: well I suppose --user should be the default for pip so they get separated | 12:49 |
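The separation quiquell mentions is visible from Python itself: the interpreter's default install target and the per-user site directory that `pip install --user` writes to are different paths. A quick look using only the standard library; the exact directories vary by distro (Debian-family builds use `dist-packages`):

```python
import site
import sysconfig


def install_locations():
    """Default library install dir vs. the per-user site dir used by --user."""
    return {
        "purelib": sysconfig.get_paths()["purelib"],  # where a plain install lands
        "user_site": site.getusersitepackages(),      # where --user installs land
    }


for name, path in install_locations().items():
    print(f"{name:10} {path}")
```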
ssbarnea|rover | clearly I am in favour of eradicating use of excludes. | 12:50 |
*** rh-jelabarre has joined #openstack-infra | 12:50 | |
quiquell | ssbarnea|rover: ack, let's play a little removing it to see if it affects our tripleo jobs or not | 12:50 |
*** bobh has quit IRC | 12:51 | |
*** bobh has joined #openstack-infra | 12:55 | |
*** bobh has quit IRC | 12:58 | |
*** jamesmcarthur has quit IRC | 12:59 | |
*** xarses has quit IRC | 13:03 | |
*** jamesmcarthur has joined #openstack-infra | 13:06 | |
*** abregman has quit IRC | 13:06 | |
*** bobh has joined #openstack-infra | 13:06 | |
*** trown|outtypewww is now known as trown | 13:07 | |
*** bobh has quit IRC | 13:11 | |
*** jamesmcarthur has quit IRC | 13:14 | |
*** jamesmcarthur has joined #openstack-infra | 13:15 | |
*** bobh has joined #openstack-infra | 13:15 | |
*** yamamoto has joined #openstack-infra | 13:16 | |
*** yamamoto has quit IRC | 13:18 | |
frickler | infra-root: devstack and tempest have now switched to running on bionic per default on master, as a result we are currently using 150 bionic vs. 200 xenial nodes. I'm thinking we should adjust the min-ready count somehow, maybe set to 10 for both as a first step? | 13:19 |
*** bobh has quit IRC | 13:20 | |
*** dtantsur|brb is now known as dtantsur | 13:20 | |
pabelanger | frickler: min-ready doesn't really affect how many jobs get launched to use images, it is more about first job for node request. There is also new logic for relative priority, so smaller projects using xenial will likely be run first before those changed to use bionic | 13:20 |
pabelanger | basically, if we wait and watch, it will all level out | 13:21 |
pabelanger | I'd actually be okay with dropping min-ready down to like 1, our clouds boot pretty fast these days | 13:23 |
openstackgerrit | Filippo Inzaghi proposed openstack-infra/python-storyboardclient master: Don't quote {posargs} in tox.ini https://review.openstack.org/609176 | 13:23 |
*** yamamoto has joined #openstack-infra | 13:23 | |
frickler | pabelanger: I would subscribe to the latter, rax is around 5 minutes, ovh currently bursting up to 20 mins | 13:24 |
*** jamesmcarthur has quit IRC | 13:24 | |
*** jamesmcarthur has joined #openstack-infra | 13:24 | |
pabelanger | frickler: sure, in the case of openstack, the system is so busy that min-ready has minimal effect on making jobs faster. It only matters if there are zero patches in the queue and the number of nodes running is < min-ready. Otherwise, PRs just need to wait x minutes for resources to boot | 13:25 |
fungi | yeah, min-ready only really matters when we're not using all our capacity, then nodepool pre-boots some instances to have them ready for new requests. more often than not when people really care about responsiveness is when we're already under a backlog anyway. the main risk is in having min-ready too high for node types we almost never use, so they sit there chewing up some of our capacity to no | 13:25 |
fungi | purpose | 13:25 |
fungi | but yeah, when there are already backlogged requests for those node types, min-ready is entirely irrelevant | 13:26 |
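The point fungi and pabelanger make can be sketched as a toy model. This is not nodepool's actual launcher logic, and `LabelState`, `nodes_to_preboot` and the `free_capacity` parameter are hypothetical names. In this model min-ready only tops up an idle pool when a label has no backlog, so with requests queued it contributes nothing.

```python
from dataclasses import dataclass


@dataclass
class LabelState:
    """One node label (e.g. "ubuntu-bionic") in a toy model of nodepool."""
    min_ready: int         # configured min-ready for the label
    ready_unassigned: int  # booted nodes sitting idle, not yet given to a request
    pending_requests: int  # node requests already queued for this label


def nodes_to_preboot(state: LabelState, free_capacity: int) -> int:
    """How many idle nodes this toy model boots ahead of demand.

    With queued requests, every node that boots goes straight to a waiting
    request, so min-ready contributes nothing. Otherwise the idle pool is
    topped up to min-ready, limited by free quota.
    """
    if state.pending_requests > 0:
        return 0
    shortfall = max(0, state.min_ready - state.ready_unassigned)
    return min(shortfall, max(0, free_capacity))
```

The same model shows the cost fungi mentions: for a rarely used label, a high min-ready keeps that many nodes booted and idle, using capacity for nothing.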
*** markvoelker has quit IRC | 13:28 | |
pabelanger | +1 | 13:31 |
*** rlandy has joined #openstack-infra | 13:33 | |
frickler | o.k., so leave things as is? or make ubuntu-xenial==1, too, to avoid it looking more equal than others? | 13:36 |
*** jamesmcarthur has quit IRC | 13:38 | |
*** jamesmcarthur has joined #openstack-infra | 13:38 | |
pabelanger | let's confirm with clarkb, but my preference would be to set all to min-ready: 1 and let nodepool boot them as needed. Our system is busy enough not to really benefit from it now with the new node request system | 13:40 |
*** eharney has joined #openstack-infra | 13:40 | |
*** jamesmcarthur has quit IRC | 13:43 | |
*** bhavikdbavishi has quit IRC | 13:44 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-dev/hacking master: Fix coverage job https://review.openstack.org/624699 | 13:48 |
*** jpena|lunch is now known as jpena | 13:48 | |
*** jamesmcarthur has joined #openstack-infra | 13:49 | |
*** bobh has joined #openstack-infra | 13:53 | |
*** dtantsur is now known as dtantsur|brb | 13:56 | |
*** markvoelker has joined #openstack-infra | 14:01 | |
*** aojea has joined #openstack-infra | 14:01 | |
openstackgerrit | Merged openstack-infra/elastic-recheck master: add query for os-vif pyroute2 open files https://review.openstack.org/624412 | 14:04 |
*** kgiusti has joined #openstack-infra | 14:13 | |
*** mriedem has joined #openstack-infra | 14:13 | |
*** fresta has joined #openstack-infra | 14:17 | |
*** jonher_ has joined #openstack-infra | 14:17 | |
*** jonher has quit IRC | 14:21 | |
*** jonher_ is now known as jonher | 14:21 | |
*** fresta_ has quit IRC | 14:21 | |
*** tk81 has joined #openstack-infra | 14:30 | |
*** irclogbot_1 has quit IRC | 14:32 | |
*** jrist has joined #openstack-infra | 14:37 | |
*** e0ne has joined #openstack-infra | 14:38 | |
*** irclogbot_1 has joined #openstack-infra | 14:41 | |
*** yamamoto has quit IRC | 14:44 | |
*** ykarel has quit IRC | 14:47 | |
dmsimard | Can anyone remind me how to check if there are any meetbot meetings in progress other than with the calendar ? | 14:52 |
fungi | it's hard to know now that lots of teams are running meetings in their own channels | 14:52 |
fungi | i used to just look at the meeting channels since i lurk in all of them | 14:53 |
fungi | but that doesn't work any longer | 14:53 |
dmsimard | fungi: right, that was my point | 14:53 |
*** Miouge has quit IRC | 14:55 | |
fungi | i don't know of any other way than by parsing the calendar data, no. maybe a quick python script to check whether a specific time overlaps any of the time ranges in the yaml data would be useful to add to the yaml2ical repo or the meetings repo | 14:55 |
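A minimal sketch of the "does this time overlap any scheduled slot" check fungi suggests. It assumes the meeting YAML has already been loaded into dicts and that each schedule entry carries `day`, `time` (UTC, `HHMM`), `irc` and an optional `duration` in minutes (defaulted to 60 here). Those field names and the default are assumptions to check against the actual irc-meetings/yaml2ical schema, and `meetings_running_at` is a hypothetical name. Meeting frequency (biweekly and so on) is ignored, so alternating-week meetings can show up as false positives.

```python
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MINUTES_PER_WEEK = 7 * 24 * 60


def minute_of_week(day: str, hhmm: str) -> int:
    """Convert a weekday name and an 'HHMM' UTC string to minutes since Monday 00:00."""
    return DAYS.index(day) * 1440 + int(hhmm[:2]) * 60 + int(hhmm[2:])


def meetings_running_at(meetings, when: datetime):
    """Return (meeting_id, channel) pairs whose weekly slot covers `when`.

    `when` should be timezone-aware; it is converted to UTC. Each meeting is
    assumed to look like {"meeting_id": ..., "schedule": [{"day": "Wednesday",
    "time": "1900", "irc": "openstack-meeting", "duration": 60}]}.
    """
    when = when.astimezone(timezone.utc)
    now = when.weekday() * 1440 + when.hour * 60 + when.minute
    hits = []
    for meeting in meetings:
        for slot in meeting.get("schedule", []):
            # zfill copes with times that YAML loaded as integers, e.g. 900
            start = minute_of_week(slot["day"], str(slot["time"]).zfill(4))
            duration = int(slot.get("duration", 60))
            # Minutes since the slot started, wrapping around the end of the week
            elapsed = (now - start) % MINUTES_PER_WEEK
            if elapsed < duration:
                hits.append((meeting["meeting_id"], slot["irc"]))
    return hits
```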
*** irclogbot_1 has quit IRC | 14:55 | |
*** jamesmcarthur has quit IRC | 14:55 | |
fungi | alternatively, it might be possibel to parse the meetbot daemon's log directly on eavesdrop.o.o to see if it records any meetings starting which it didn't record ending | 14:56 |
fungi | er, possible | 14:56 |
fungi | depending on whether you care about finding running meetings which aren't on the schedule or which have run over their allotted times | 14:56 |
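The log-parsing alternative amounts to pairing `#startmeeting` with `#endmeeting` per channel, which also catches meetings that are off-schedule or running over. A minimal sketch, assuming the caller has already turned the meetbot daemon's log into `(channel, message)` pairs in log order. The log format on eavesdrop.o.o is not shown here, so that extraction step is left out, and `open_meetings` is a hypothetical name.

```python
def open_meetings(events):
    """Return {channel: meeting_name} for meetings started but not yet ended.

    `events` is an iterable of (channel, message_text) pairs in log order.
    """
    running = {}
    for channel, text in events:
        words = text.strip().split()
        if not words:
            continue
        command = words[0].lower()
        if command == "#startmeeting":
            # Remember the meeting name (possibly empty) started in this channel
            running[channel] = " ".join(words[1:])
        elif command == "#endmeeting":
            running.pop(channel, None)
    return running
```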
*** Miouge has joined #openstack-infra | 14:57 | |
*** ykarel has joined #openstack-infra | 14:58 | |
*** jamesmcarthur has joined #openstack-infra | 15:04 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck master: Add query for network-vif-plugged timeout bug 1808171 https://review.openstack.org/624728 | 15:09 |
openstack | bug 1808171 in neutron "TaggedBootDevicesTest.test_tagged_boot_devices intermittently fails waiting for network-vif-plugged event which neutron does not send" [Medium,Confirmed] https://launchpad.net/bugs/1808171 | 15:09 |
*** jamesmcarthur has quit IRC | 15:09 | |
evrardjp | I am trying to use Zuul's tools/encrypt_secret.py; can someone tell me what I should use as the endpoint for infra? http://susepaste.org/view//5313605 | 15:16 |
mriedem | ssbarnea|rover: were you going to update this e-r query? https://review.openstack.org/#/c/621004/ | 15:17 |
*** markvoelker has quit IRC | 15:18 | |
fungi | evrardjp: https://zuul.openstack.org/ should work, but your project name is openstack/openstack-helm-images not just openstack-helm-images right? | 15:18 |
evrardjp | fungi: oh right. Thanks! | 15:18 |
evrardjp | me silly. | 15:18 |
ssbarnea|rover | mriedem: planning to, just too many things to do. if you want it faster feel free to update it. | 15:18 |
mriedem | ok | 15:19 |
fungi | evrardjp: https://docs.openstack.org/infra/manual/zuulv3.html#secret-variables | 15:20 |
*** irclogbot_1 has joined #openstack-infra | 15:20 | |
*** yamamoto has joined #openstack-infra | 15:20 | |
evrardjp | fungi: it's just me being silly -- I went through that doc, just ... my bad. | 15:21 |
evrardjp | oh wait. | 15:21 |
evrardjp | I didn't see that docs | 15:21 |
evrardjp | but yeah, it's still my bad. | 15:21 |
fungi | evrardjp: the bit in the infra manual is specific to our deployment of zuul, so that you know the correct tenant name and url | 15:21 |
*** jrist has quit IRC | 15:21 | |
fungi | it does link to the more general feature info in zuul's documentation for those who are interested in the details though | 15:22 |
*** alexchadin has quit IRC | 15:22 | |
evrardjp | yeah :) it's easier to copy and paste and not make mistakes like a silly eventingmonkey | 15:22 |
evrardjp | woops | 15:22 |
evrardjp | evrardjp | 15:22 |
evrardjp | bad tab! | 15:22 |
evrardjp | sorry eventingmonkey | 15:22 |
*** lpetrut has quit IRC | 15:22 | |
evrardjp | seems a bad end of day for me | 15:22 |
fungi | or a good time to end your day perhaps? ;) | 15:23 |
fungi | evrardjp: i also like using --infile and --outfile options with that tool to cut down on copy/paste issues with the plaintext and encrypted secret data too | 15:24 |
evrardjp | fungi: Yeah I am concerned about those :) | 15:25 |
*** dtantsur|brb is now known as dtantsur | 15:25 | |
evrardjp | fungi: on top of that it's not me who'll do said copy and paste -- intermediaries are generally bad for those things | 15:25 |
evrardjp | thanks for the advice fungi ! | 15:28 |
fungi | any time! | 15:28 |
evrardjp | (including the end of day one, but sadly I can't take that one) | 15:28 |
*** jamesmcarthur has joined #openstack-infra | 15:31 | |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for glance-api proxy error bug 1808063 https://review.openstack.org/624524 | 15:37 |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for network-vif-plugged timeout bug 1808171 https://review.openstack.org/624728 | 15:37 |
openstack | bug 1808063 in OpenStack-Gate "glanceclient.exc.HTTPBadGateway: 502 Proxy Error during server snapshot" [Undecided,Confirmed] https://launchpad.net/bugs/1808063 | 15:37 |
openstack | bug 1808171 in neutron "TaggedBootDevicesTest.test_tagged_boot_devices intermittently fails waiting for network-vif-plugged event which neutron does not send" [Medium,Confirmed] https://launchpad.net/bugs/1808171 | 15:37 |
openstackgerrit | Frank Kloeker proposed openstack-infra/irc-meetings master: [trivial] fix I18n meeting-id https://review.openstack.org/624742 | 15:40 |
*** lpetrut has joined #openstack-infra | 15:48 | |
*** armax has joined #openstack-infra | 15:53 | |
*** jrist has joined #openstack-infra | 15:54 | |
*** lpetrut has quit IRC | 15:56 | |
*** yamamoto has quit IRC | 15:58 | |
*** yamamoto has joined #openstack-infra | 15:58 | |
*** kjackal has quit IRC | 16:04 | |
*** kjackal has joined #openstack-infra | 16:07 | |
*** ykarel is now known as ykarel|away | 16:10 | |
*** smarcet has joined #openstack-infra | 16:13 | |
*** gyee has joined #openstack-infra | 16:19 | |
*** ykarel|away has quit IRC | 16:19 | |
*** udesale has quit IRC | 16:19 | |
*** smarcet has quit IRC | 16:20 | |
JpMaxMan | Hey fungi | 16:23 |
JpMaxMan | I had a couple of questions about https://review.openstack.org/#/c/624523/ - should I ask here or message you directly? | 16:24 |
*** smarcet has joined #openstack-infra | 16:26 | |
clarkb | JpMaxMan: usually best to ask the channel then others can answer too if they are able | 16:27 |
JpMaxMan | cool - so with regards to the Missing zuul/main.yaml change - I'm not entirely sure what that is referring to.... | 16:28 |
*** rfolco is now known as rfolco_doctor | 16:28 | |
clarkb | https://docs.openstack.org/infra/manual/creators.html#add-project-to-zuul is the piece of documentation that explains that. Basically zuul takes action on repos that it has been told about, so we have to edit the zuul config to tell it about your new project | 16:29 |
clarkb | (otherwise zuul ignores events from that project repo) | 16:29 |
JpMaxMan | since it is a sandbox I think we would want Zuul to ignore it right? | 16:30 |
*** bhavikdbavishi has joined #openstack-infra | 16:31 | |
clarkb | possibly. At the very least you need zuul to run the noop jobs to get your Verified +1 and +2 votes so that things can merge | 16:31 |
clarkb | but zuul involvement can stop there | 16:31 |
clarkb | (though you may want to have zuul do other useful tasks) | 16:32 |
*** Alvass has joined #openstack-infra | 16:32 | |
*** _alastor_ has joined #openstack-infra | 16:32 | |
JpMaxMan | alright - I'll add that - open to suggestions if you think it should do more? | 16:32 |
Alvass | Hi, I might have found a bug with zuul-executor executor/server.py, not sure where I should report this | 16:33 |
JpMaxMan | my second question was regarding specifying just the master branch. how do I specify that in the upstream: parameter of projects.yaml | 16:33 |
clarkb | Alvass: zuul tracks bugs on https://storyboard.openstack.org and has #zuul on freenode for IRC discussion. There is also a zuul-discuss mailing list at lists.zuul-ci.org | 16:34 |
clarkb | Alvass: any one of those three locations would be a good place to start | 16:34 |
Alvass | clarkb thanks | 16:34 |
fungi | JpMaxMan: sorry, in a meeting. our automation is going to perform its initial import from all branches and tags from whatever repo you list in the "upstream" parameter | 16:37 |
fungi | you could create a fork and delete all the branches/tags you don't want imported, then list that instead | 16:37 |
JpMaxMan | got it - so it's up to the upstream repo to be clean | 16:38 |
JpMaxMan | sounds good | 16:38 |
JpMaxMan | ok I'll work on it - thanks! | 16:38 |
*** _alastor_ has quit IRC | 16:40 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck master: Update query for bug 1721093 https://review.openstack.org/621004 | 16:40 |
openstack | bug 1721093 in OpenStack-Gate "Zuul v3 tasks can end up in an UNREACHABLE state" [Undecided,Confirmed] https://launchpad.net/bugs/1721093 | 16:40 |
*** Alvass has quit IRC | 16:47 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: [trivial] fix I18n meeting-id https://review.openstack.org/624742 | 16:52 |
*** electrofelix has quit IRC | 16:53 | |
openstackgerrit | Jp Maxwell proposed openstack-infra/project-config master: Adding the netlify-sandbox project https://review.openstack.org/624523 | 16:53 |
*** yboaron_ has quit IRC | 16:55 | |
*** pgaxatte has quit IRC | 16:57 | |
*** ykarel has joined #openstack-infra | 17:03 | |
*** amuller has joined #openstack-infra | 17:03 | |
amuller | hi ho, hi ho. I'm trying to have a gate job require a github project (in this case skydive) | 17:03 |
amuller | check out: https://review.openstack.org/#/c/624494/5/.zuul.yaml | 17:03 |
openstackgerrit | Jp Maxwell proposed openstack-infra/project-config master: Adding the netlify-sandbox project https://review.openstack.org/624523 | 17:03 |
amuller | getting this error: http://logs.openstack.org/94/624494/4/check/neutron-tempest-plugin-dvr-multinode-scenario/9f22757/job-output.txt.gz#_2018-12-12_15_08_47_958015 | 17:04 |
amuller | any idea what is the syntax to add a github project in the 'required projects' list? | 17:04 |
fungi | amuller: they would need to release packages to pypi | 17:05 |
amuller | fungi: so they do have a python client on pypi | 17:05 |
amuller | I'm getting that via requirements.txt | 17:06 |
amuller | what I need in the job is the devstack plugin they defined in their github repo | 17:06 |
fungi | oh, required-projects in zuul, not global requirements in openstack, got it | 17:06 |
*** quiquell is now known as quiquell|off | 17:06 | |
fungi | pretty sure to be able to have it as a required-project (so that you can take advantage of cross-repo dependencies) it needs to be included in the zuul "openstack" tenant's main list of repositories for the github.org connection | 17:07 |
amuller | I don't need cross-repo dependencies, I don't think | 17:07 |
amuller | really what I'm trying to do is to pull in the skydive devstack plugin | 17:07 |
fungi | er, i guess it's github.com (you can see how much time i spend on the site) | 17:07 |
clarkb | I'm guessing the underlying issue here is that set error on clone | 17:08 |
clarkb | so the github plugin can't be cloned from github | 17:08 |
amuller | so should I set ERROR_ON_CLONE to False for jobs that want to use skydive? | 17:09 |
clarkb | amuller: a better approach may be to clone skydive into the correct location outside of devstack first | 17:10 |
clarkb | then the devstack check will still apply to everything else, but you can get the repo on disk | 17:10 |
amuller | hmm | 17:11 |
amuller | not sure how to do that and still to get the devstack plugin executed on stack.sh | 17:11 |
clarkb | ya add a pre-run playbook that clones the repo to /opt/stack/skydive (or whatever the path is supposed to be) | 17:11 |
amuller | it's weird, locally it all works fine | 17:11 |
clarkb | amuller: it's a sanity check for gating because devstack doesn't know how to set up repos to test changes under test | 17:11 |
clarkb | amuller: instead zuul does that for devstack, then we disable any cloning in devstack so devstack doesn't undo zuul's work | 17:12 |
clarkb | I think corvus has talked about making required projects less strict so you can require some project from github and have zuul set it up for you, but not pull in the other implied zuul config stuff | 17:13 |
clarkb | if/when this happens it will make this simpler, but isn't implemented yet | 17:13 |
amuller | new to zuul and these types of changes... so I add a new playbook in the repo, and add it to the pre-run list for the jobs I'm interested in? | 17:15 |
amuller | and the playbook will git clone the github repo to the correct dir | 17:15 |
amuller | and how do I ensure that devstack will use that plugin? | 17:15 |
clarkb | amuller: yup I think that will make devstack happy in that context because skydive will be present and no cloning will need to happen | 17:16 |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Update query for bug 1721093 https://review.openstack.org/621004 | 17:16 |
clarkb | amuller: you enable it as before. The difference is devstack will notice the repo already exists and won't have to clone it so no error happens | 17:16 |
openstack | bug 1721093 in OpenStack-Gate "Zuul v3 tasks can end up in an UNREACHABLE state" [Undecided,Confirmed] https://launchpad.net/bugs/1721093 | 17:16 |
amuller | yeah got it | 17:16 |
amuller | ok | 17:16 |
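Why the pre-run clone helps can be sketched as a toy model in Python. It mirrors the behaviour clarkb describes rather than reproducing devstack's actual shell function, and `ensure_repo` and `CloneNotAllowed` are hypothetical names. `/opt/stack/skydive` is the path clarkb guessed at above, not a confirmed location.

```python
import os


class CloneNotAllowed(RuntimeError):
    """Raised when a repo is missing and cloning is disabled (ERROR_ON_CLONE)."""


def ensure_repo(dest: str, error_on_clone: bool, clone) -> str:
    """Toy model of the ERROR_ON_CLONE guard (not devstack code).

    An existing checkout on disk is used as-is. A missing one may only be
    cloned when error_on_clone is False; `clone` is a callable that would
    perform the actual clone into `dest`.
    """
    if os.path.isdir(dest):
        return "existing"
    if error_on_clone:
        raise CloneNotAllowed(f"{dest} not found and cloning is disabled")
    clone(dest)
    return "cloned"
```

In the real job the pre-existing directory would come from the pre-run Ansible playbook, so the guard never reaches the clone branch and `enable_plugin` works as before.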
amuller | let's see if I can Google my way to get the Ansible playbook to do what it needs to do :) | 17:16 |
amuller | clarkb++ | 17:17 |
clarkb | amuller: feel free to push something up even if it doesn't seem quite right. We can probably help once we've got error messages from ansible :) | 17:17 |
amuller | for the ansible playbook I should be able to develop that locally instead of pushing 10 patchsets | 17:17 |
amuller | hopefully zuul will run it the same way | 17:18 |
amuller | as far as users and whatever | 17:18 |
*** graphene has joined #openstack-infra | 17:18 | |
fungi | but also, once you do push it, zuul can run it on your proposed addition and provide you feedback too | 17:18 |
*** psachin|session has quit IRC | 17:20 | |
*** e0ne has quit IRC | 17:21 | |
*** jamesmcarthur has quit IRC | 17:22 | |
*** jamesmcarthur has joined #openstack-infra | 17:22 | |
clarkb | frickler: mriedem melwitt to TL;DR the cirros ssh issues, they all root cause down to metadata being slow, but in the case of running out of disk we are attempting to use config drive first (which would ignore the metadata server), but since disk is full that fails then we fall back to metadata then that times out and fails? | 17:25 |
clarkb | so thats actually a couple bugs in one. That is a fun one :) | 17:25 |
mriedem | metadata api being slow? | 17:26 |
mriedem | as in the guest doesn't get network info fast enough from the meta api? | 17:26 |
*** jpich has quit IRC | 17:26 | |
clarkb | mriedem: yup, cirros has a 10 second timeout on that network request, and nova was taking 10.something seconds to respond | 17:26 |
melwitt | I couldn't pinpoint what part of it was taking > 10 seconds. but I left notes on the launchpad bug | 17:26 |
*** dtantsur is now known as dtantsur|afk | 17:26 | |
melwitt | I assumed a call to neutron but was having trouble matching up the metadata API log to the neutron log | 17:27 |
clarkb | mriedem: in the case of the disk issues in the logs, we are trying config-drive first and that fails so we fall back on metadata after | 17:27 |
mriedem | ok, there is a known db query perf bug in the metadata api where we're doing some unnecessary joins | 17:27 |
mriedem | maybe removing that would speed things up | 17:27 |
clarkb | but in the case melwitt found I don't think we attempt config drive at all so that explains why we see the same problem in both cases | 17:27 |
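The failure chain clarkb summarizes can be written as a toy model. This is not cirros' actual init code; the 10 second figure comes from this discussion, and `first_boot_datasource` and its parameters are hypothetical names. Setting `try_config_drive=False` corresponds to the case melwitt found, where config drive is never attempted.

```python
METADATA_TIMEOUT = 10.0  # seconds; the cirros timeout mentioned in the discussion


def first_boot_datasource(config_drive_ok: bool, metadata_latency: float,
                          try_config_drive: bool = True):
    """Toy model: try config drive first, then fall back to the metadata server.

    Returns the datasource that succeeded, or None if the guest gives up
    (which is when the ssh key never gets installed).
    """
    if try_config_drive and config_drive_ok:
        return "config-drive"
    # Config drive absent or broken (e.g. disk full): fall back to the network
    if metadata_latency < METADATA_TIMEOUT:
        return "metadata"
    return None
```

Under this model both failure reports reduce to the same condition: the metadata response takes longer than the timeout and nothing earlier in the chain succeeded.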
mriedem | clarkb: want to link nova to that bug? | 17:27 |
clarkb | mriedem: sure | 17:27 |
mriedem | we likely need some profiling of the metadata api code since it's rarely touched | 17:27 |
clarkb | done | 17:27 |
clarkb | it sounded like frickler was talking to cirros about fixing the config-drive failures on their end | 17:28 |
melwitt | mriedem: yeah, it's really hard to tell because each thing is just a singular log message | 17:28 |
melwitt | as far as looking at it from a failed run sans profiling | 17:28 |
mriedem | https://bugs.launchpad.net/nova/+bug/1799298 was the other thing | 17:29 |
openstack | Launchpad bug 1799298 in OpenStack Compute (nova) rocky "Metadata API cross joining instance_metadata and instance_system_metadata" [Medium,Triaged] | 17:29 |
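Why cross joining two child tables hurts can be shown with a small sqlite3 sketch: when one instance is joined to both metadata tables independently, the result has m × n rows for that single instance. The tables are simplified stand-ins and `joined_row_count` is a hypothetical helper; this is not nova's schema or the query the metadata API actually runs.

```python
import sqlite3


def joined_row_count(n_metadata: int, n_system_metadata: int) -> int:
    """Count rows when one instance is LEFT JOINed to two independent child tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE instances (id INTEGER PRIMARY KEY);
        CREATE TABLE instance_metadata (instance_id INTEGER, key TEXT);
        CREATE TABLE instance_system_metadata (instance_id INTEGER, key TEXT);
        INSERT INTO instances (id) VALUES (1);
    """)
    db.executemany("INSERT INTO instance_metadata VALUES (1, ?)",
                   [(f"m{i}",) for i in range(n_metadata)])
    db.executemany("INSERT INTO instance_system_metadata VALUES (1, ?)",
                   [(f"s{i}",) for i in range(n_system_metadata)])
    # Each metadata row pairs with every system_metadata row: m * n results
    (count,) = db.execute("""
        SELECT COUNT(*) FROM instances i
        LEFT JOIN instance_metadata m ON m.instance_id = i.id
        LEFT JOIN instance_system_metadata s ON s.instance_id = i.id
    """).fetchone()
    db.close()
    return count
```

Whether this alone explains a >10 second response is exactly what melwitt doubts below; the sketch only shows how the row count grows.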
clarkb | and the note about the route add failing seems to be just noise since the route is there when the route table is echoed | 17:29 |
mriedem | workday reported that | 17:29 |
mriedem | ok added a note to https://bugs.launchpad.net/openstack-gate/+bug/1808010 | 17:31 |
openstack | Launchpad bug 1808010 in OpenStack-Gate "Tempest cirros boots fail due to lack of disk space" [Undecided,New] | 17:31 |
clarkb | I should update the bug title too | 17:31 |
fungi | heading out for a brisk, chilly walk but will be back well before the storyboard meeting at 19:00 | 17:32 |
clarkb | new title is a bit long but hopefully captures what is going on | 17:32 |
melwitt | this bug is heating up | 17:34 |
melwitt | I read through the db cross join bug and ML thread, kinda hard to believe that could take > 10 seconds. but what do I know | 17:35 |
clarkb | melwitt: we may want to check against the dstat information to see if the system is under heavy load at that point, could explain the extra slowness | 17:36 |
melwitt | good thinkin, I shall look | 17:36 |
clarkb | I've been using https://lamada.eu/dstat-graph/ you can dump the dstat.csv file from devstack into there | 17:36 |
*** graphene has quit IRC | 17:38 | |
*** ianychoi has joined #openstack-infra | 17:38 | |
*** graphene has joined #openstack-infra | 17:40 | |
*** trown is now known as trown|lunch | 17:41 | |
clarkb | fungi: that sounds really nice actually. I kinda want to take a walk, but everyone just left for a dentist visit so I'll enjoy the peace and quiet at home instead | 17:42 |
melwitt | oh, neat | 17:43 |
*** panda is now known as panda|off | 17:43 | |
clarkb | fungi: can you let me know when you are back? I'd like to approve cmurphy's puppet4 futureparser change for lists.o.o but figure you know that service much better than I do so your help if something goes sideways would be nice :) | 17:46 |
*** ginopc has quit IRC | 17:46 | |
mriedem | clarkb: the metadata api does need to query neutron to get security group information, so that's one slow thing | 17:53 |
clarkb | mriedem: does the api need to do that when asking for public keys? | 17:53 |
mriedem | and another long-standing known api performance bug, that nova doesn't cache the security group info | 17:53 |
mriedem | clarkb: no, it's just part of the metadata response | 17:53 |
mriedem | probably to match ec2 | 17:54 |
clarkb | ah | 17:54 |
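To illustrate what caching the security group lookup could mean, here is a minimal time-to-live cache sketch. `TTLCache`, `get_or_fetch` and the 30 second TTL in the usage are hypothetical and not nova code, and whether slightly stale security group data is acceptable in the metadata response is a design question this sketch does not answer.

```python
import time


class TTLCache:
    """Tiny time-based cache (hypothetical sketch, not nova code)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch(key)` only on a miss or expiry."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch(key)  # e.g. the slow call out to neutron
        self._entries[key] = (now + self.ttl, value)
        return value
```

Usage would look like `cache.get_or_fetch(instance_uuid, lookup_security_groups)`, where `lookup_security_groups` is a hypothetical stand-in for the neutron query.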
* clarkb finds breakfast | 17:57 | |
*** _alastor_ has joined #openstack-infra | 18:03 | |
*** derekh has quit IRC | 18:04 | |
*** kjackal has quit IRC | 18:04 | |
*** boden has joined #openstack-infra | 18:09 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Set iptables forward drop by default https://review.openstack.org/624501 | 18:10 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 18:10 |
clarkb | I think I've got ^ correct this time | 18:10 |
*** graphene has quit IRC | 18:11 | |
*** rfolco_doctor is now known as rfolco | 18:12 | |
*** jpena is now known as jpena|off | 18:17 | |
melwitt | clarkb: that dstat graph site is awesome. looking at the data from the run, cpu usr is at 100% at 10:46:57 when the > 10 secs metadata API reply happened, and also memory usage nearly maxed http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/controller/logs/dstat-csv_log.txt | 18:18 |
*** mriedem is now known as mriedem_lunch | 18:21 | |
clarkb | corvus: mordred: can you check my comment on https://review.openstack.org/#/c/622964/2 about that inventory generation picking up the magnum nodes? I don't think we want that but maybe it will be fine | 18:21 |
clarkb | melwitt: not surprising :/ probably want to make any easy performance improvements to metadata retrieval, then if this persists reduce general system overhead too | 18:22 |
melwitt | aye | 18:22 |
melwitt | mriedem_lunch is investigating | 18:23 |
mordred | clarkb: we probably do want a flag. that said - there is at least one update to the magnum hosts we'll want to make with ansible before we go live with them for anything | 18:23 |
mordred | clarkb: so ansibling our magnum hosts in general may not be crazy | 18:23 |
clarkb | mordred: ya I'm more thinking we don't want our install exim and iptables stuff running on them | 18:23 |
clarkb | so we need to split them off | 18:23 |
mordred | clarkb: but - I'm sure things will blow up ... yeah | 18:23 |
*** aojea has quit IRC | 18:23 | |
clarkb | melwitt: mriedem_lunch sounds good, thanks | 18:24 |
*** bhavikdbavishi has quit IRC | 18:25 | |
*** gfidente is now known as gfidente|afk | 18:30 | |
*** jrist has quit IRC | 18:33 | |
*** shardy has quit IRC | 18:35 | |
clarkb | mordred: do we want to merge the inventory script as is (I mean it works and won't get auto applied to anything) or should we refine it further? | 18:36 |
mordred | clarkb: I think lets just land it and we can always make it better | 18:37 |
mordred | clarkb: my hunch is that we'll find it more pleasant to just add an entry to the inventory file by hand when making a new node - and that script is really just there for in-case-we-need-it | 18:38 |
clarkb | mordred: k I'll approve | 18:38 |
*** jamesmcarthur has quit IRC | 18:44 | |
*** Swami has joined #openstack-infra | 18:44 | |
Shrews | fungi: left you a comment on https://review.openstack.org/623211 | 18:45 |
*** wolverineav has joined #openstack-infra | 18:46 | |
*** wolverineav has quit IRC | 18:46 | |
*** wolverineav has joined #openstack-infra | 18:47 | |
*** smarcet has quit IRC | 18:49 | |
*** trown|lunch is now known as trown | 18:50 | |
fungi | clarkb: cmurphy: yeah, i'm around now to watch lists.o.o | 18:52 |
clarkb | fungi: ok I'll go ahead and reapprove that now I guess | 18:54 |
openstackgerrit | Jp Maxwell proposed openstack-infra/project-config master: Adding the netlify-sandbox project https://review.openstack.org/624523 | 18:54 |
*** dabukalam has joined #openstack-infra | 18:55 | |
*** wolverineav has quit IRC | 18:57 | |
fungi | Shrews: thanks! i've replied | 18:58 |
Shrews | fungi: ah, for some reason i assumed task #'s may not be unique | 19:00 |
Shrews | (thus the need to include Story) | 19:00 |
fungi | they are globally unique | 19:01 |
fungi | at least within a given sb deployment | 19:01 |
fungi | i think what we'd like (but isn't implemented in its-storyboard yet) is for the story footer to trigger a story comment, a la "related-bug" | 19:02 |
*** wolverineav has joined #openstack-infra | 19:03 | |
fungi | so people can use it when they want a commit to refer to a story without addressing a particular task within that story | 19:03 |
fungi | many of us thought that was already implemented since it was part of the original specification | 19:03 |
fungi | but it seems to have been left as a future exercise for someone who enjoys java | 19:03 |
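The behaviour fungi describes (a `Story:` footer without a task triggering a story comment) starts with parsing the footers. its-storyboard itself is a Java Gerrit plugin; this Python sketch only shows the parsing and the decision. `storyboard_refs` and `wants_story_comment` are hypothetical names, and accepting an optional `#` before the number is an assumption about how people write the footers.

```python
import re

# Footer-style lines such as "Story: 2004567" or "Task: #28574"
FOOTER_RE = re.compile(r"^(Story|Task):[ \t]*#?(\d+)[ \t]*$", re.MULTILINE)


def storyboard_refs(commit_message: str) -> dict:
    """Collect Story and Task ids from Story:/Task: lines in a commit message."""
    refs = {"Story": [], "Task": []}
    for kind, number in FOOTER_RE.findall(commit_message):
        refs[kind].append(int(number))
    return refs


def wants_story_comment(refs: dict) -> bool:
    """The unimplemented case: a story is referenced but no task within it is."""
    return bool(refs["Story"]) and not refs["Task"]
```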
Shrews | fungi: i do not enjoy java, ftr | 19:06 |
Shrews | infra history has documented this fact :) | 19:07 |
*** wolverineav has quit IRC | 19:07 | |
*** wolverineav has joined #openstack-infra | 19:09 | |
fungi | heh, yes i was sort of harkening back to the age of the wip feature | 19:09 |
*** ykarel has quit IRC | 19:10 | |
*** mriedem_lunch is now known as mriedem | 19:11 | |
openstackgerrit | Merged openstack-infra/system-config master: Add a script to generate the static inventory https://review.openstack.org/622964 | 19:12 |
mriedem | clarkb: corvus: can one of you link me to the zuul queueing change made recently to put tripleo in its own queue or whatever | 19:12 |
mriedem | i want to send out a gate status update while we have some fixes approved | 19:12 |
clarkb | melwitt: mriedem want to update https://review.openstack.org/#/c/624533/1 with my comment? | 19:13 |
clarkb | mriedem: https://review.openstack.org/#/c/624246/ is the tripleo-ci change, not merged yet | 19:13 |
clarkb | corvus was also mentioning we might be better off setting that in project-config | 19:13 |
clarkb | (I'm happy to start with ^ and if that doesn't work move to project-config though) | 19:13 |
fungi | mriedem: https://review.openstack.org/623595 | 19:13 |
clarkb | ya ^ is the zuul feature and https://review.openstack.org/#/c/624246/ uses that new feature to group things | 19:14 |
fungi | mriedem: at least for the feature implementation in zuul | 19:14 |
mriedem | ok | 19:14 |
fungi | and then we got the scheduler restarted with that in place so we can configure | 19:14 |
melwitt | clarkb: oh, yup, will do | 19:16 |
openstackgerrit | melanie witt proposed openstack-infra/elastic-recheck master: Update query for bug 1808010 https://review.openstack.org/624533 | 19:19 |
openstack | bug 1808010 in OpenStack-Gate "Tempest cirros ssh setup fails due to lack of disk space causing config-drive setup to fail forcing fallback to metadata server which fails due to hitting 10 second timeout." [Undecided,New] https://launchpad.net/bugs/1808010 | 19:19 |
*** e0ne has joined #openstack-infra | 19:20 | |
clarkb | infra-root https://review.openstack.org/#/c/605585/16 and its parent both pass ansible testinfra integration testing now. The parent change switches our default FORWARD rule to DROP from ACCEPT | 19:25 |
clarkb | I believe this to be safe but we should watch it carefully. Then 605585 is the base change to get docker onto servers so we can start dockering services | 19:25 |
*** e0ne_ has joined #openstack-infra | 19:27 | |
amuller | clarkb: of course it fails: http://logs.openstack.org/94/624494/6/check/neutron-tempest-plugin-dvr-multinode-scenario/d2d8b5a/job-output.txt.gz#_2018-12-12_18_32_52_015837 | 19:27 |
*** e0ne has quit IRC | 19:27 | |
amuller | do I need the playbook to become a certain user? | 19:28 |
clarkb | amuller: I think that dir will be owned by the stack user | 19:28 |
clarkb | you should be able to set become: yes and whatever the flag is to set user to stack and then it will work | 19:28 |
Shrews | clarkb: what is /etc/docker/daemon.json? | 19:29 |
Shrews | is that created when the service starts? | 19:29 |
amuller | clarkb: thanks I'll try that | 19:29 |
clarkb | Shrews: that is the docker daemon configuration file | 19:30 |
*** e0ne_ has quit IRC | 19:30 | |
clarkb | Shrews: I use it locally to point docker at my zfs pool for volume use, we seem to be using it to allow for ipv6 | 19:30 |
Shrews | clarkb: ah i see it used in the playbooks now. funny i've never noticed that file | 19:31 |
Shrews | clarkb: do we want to test that the service is actually running? | 19:32 |
clarkb | maybe? (I've mostly picked this change up by way of mordred and ianw to get the existing tests working around firewall rules) | 19:33 |
clarkb | I think we are relying on the packaging to do that for us but double checking it starts docker daemon may be a good idea | 19:34 |
*** e0ne has joined #openstack-infra | 19:34 | |
clarkb | let me add that | 19:34 |
Shrews | clarkb: http://git.openstack.org/cgit/openstack-infra/system-config/tree/testinfra/test_base.py#n104 | 19:36 |
Shrews | for an example | 19:36 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 19:36 |
clarkb | the service is called docker on my local machine but process is dockerd. I think I got ^ correct | 19:36 |
Shrews | clarkb: lgtm otherwise. i'll +2 if the new check passes | 19:38 |
Shrews | i have a feeling we'll need a check for distro on the service name though (just from past experience) | 19:39 |
*** e0ne_ has joined #openstack-infra | 19:39 | |
clarkb | Shrews: currently the test is scoped to bionic only | 19:40 |
Shrews | clarkb: oh, you only test on bionic. that should be fine then :) | 19:40 |
clarkb | but ya if we add more platforms we'll need that maybe | 19:40 |
clarkb | (it is using upstream packaging not distro packaging too, so may be consistent across platforms) | 19:40 |
*** e0ne has quit IRC | 19:41 | |
Shrews | yeah | 19:41 |
clarkb | fungi: cmurphy just a few minutes away from merging the lists.o.o future parser change | 19:44 |
*** e0ne has joined #openstack-infra | 19:44 | |
clarkb | then we wait for puppet to run there | 19:44 |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Update query for bug 1808010 https://review.openstack.org/624533 | 19:44 |
openstack | bug 1808010 in OpenStack-Gate "Tempest cirros ssh setup fails due to lack of disk space causing config-drive setup to fail forcing fallback to metadata server which fails due to hitting 10 second timeout." [Undecided,New] https://launchpad.net/bugs/1808010 | 19:44 |
*** e0ne_ has quit IRC | 19:45 | |
*** e0ne has quit IRC | 19:45 | |
*** e0ne has joined #openstack-infra | 19:46 | |
fungi | clarkb: cool, i'm tailing syslog on lists.o.o watching for puppet activity | 19:46 |
mriedem | so i think i'm going to just start pushing patches to skip cinder tests that have had reported bugs forever that no one is working on | 19:48 |
clarkb | mriedem: in tempest? | 19:48 |
mriedem | http://status.openstack.org/elastic-recheck/ - look for 'backup' | 19:48 |
mriedem | or just move cinder backup tests to their own job so they aren't part of the integrated gate | 19:49 |
*** tk81 has quit IRC | 19:49 | |
fungi | jungleboyj might be interested in a consolidated list of those | 19:49 |
fungi | and yeah, testing them only against cinder changes could be a viable compromise | 19:50 |
mriedem | i already dumped a bit in -cinder a few minutes ago | 19:50 |
mriedem | i will send a separate proposal to the ML | 19:50 |
fungi | ahh, yep i see that. i lurk in there but wasn't paying close attention | 19:51 |
* jungleboyj hangs my head in shame | 19:51 | |
fungi | thanks mriedem! | 19:51 |
clarkb | fwiw I really do think there is value in debugging these failures. Seems like in the last week you and I and ovh and inap and tripleo have identified a bunch of things that can be improved in a variety of places | 19:52 |
clarkb | basically the test failures are valuable information if we act on them | 19:52 |
clarkb | the problem is we don't act on them often | 19:52 |
mriedem | i've debugged a few of the cinder bugs, | 19:52 |
AJaeger | clarkb: are you fine adding another repo to openstack-infra? See https://review.openstack.org/#/c/624523/ for netlify-sandbox | 19:52 |
jungleboyj | mriedem: Yes, thank you for your help. | 19:52 |
mriedem | several of them are due to the fact that cinder-api does rpc calls to cinder-volume, | 19:52 |
mriedem | which times out the REST API response | 19:52 |
fungi | getting someone active in cinder involved in helping debug those seems like a reasonable expectation | 19:52 |
mriedem | or the RPC response | 19:52 |
jungleboyj | The problem we are having is no one else picking these up and helping. | 19:52 |
mriedem | cinder can't scale with RPC blocking calls everywhere | 19:52 |
clarkb | AJaeger: yes, I mentioned yesterday that given the similar desire for other docs hosting with zuul and opendev I don't mind hosting it under -infra | 19:53 |
jungleboyj | fungi: Agreed. | 19:53 |
AJaeger | clarkb: thanks, then I'll +2 | 19:53 |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for lists.openstack.org https://review.openstack.org/615656 | 19:53 |
clarkb | AJaeger: It's not a perfect location today, but it will work and it should be valuable to us. | 19:53 |
jungleboyj | Also have noticed more check and gate issues popping up. | 19:53 |
clarkb | jungleboyj: yes I've been banging that drum since about the PTG now | 19:53 |
mriedem | jungleboyj: yes, see the email i just sent | 19:53 |
AJaeger | clarkb: great | 19:53 |
*** e0ne has quit IRC | 19:54 | |
*** Adri2000 has quit IRC | 19:54 | |
clarkb | jungleboyj: I think the objective data collection shows our testing (and software?) is less reliable in recent months than in the past | 19:54 |
fungi | AJaeger: yeah, i double-checked with clarkb before suggesting the infra namespace, just haven't had a chance to follow up on the review | 19:54 |
clarkb | there are a variety of reasons for that, but I think it mostly has to do with people not actively identifying and fixing issues as they come up | 19:54 |
jungleboyj | clarkb: Sadly you are right. | 19:55 |
fungi | for intermittent/nondeterministic failures it's all too easy to "recheck" and hope it becomes someone else's problem to solve | 19:55 |
jungleboyj | clarkb: In Cinder we have seen a focus on users fixing issues in their drivers, which is good, but not in fixing general issues. | 19:55 |
clarkb | mriedem: thanks for the email update, the bug by bug rundown is good stuff | 19:56 |
*** Adri2000 has joined #openstack-infra | 19:56 | |
fungi | once folks are trained to recheck until things pass, they recheck new nondeterministic bugs into the tree until we grind to a halt because jobs won't succeed anymore | 19:56 |
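A back-of-the-envelope model of why this compounds, assuming each job in a run fails independently at the same nondeterministic rate (the rates used below are made up for illustration): the chance that a whole run is green falls geometrically with the number of jobs, and every flaky bug let in by rechecking raises the effective rate further.

```python
def pass_probability(jobs: int, flake_rate: float) -> float:
    """Chance that every job in one run passes, assuming independent failures."""
    return (1.0 - flake_rate) ** jobs


def expected_runs(jobs: int, flake_rate: float) -> float:
    """Expected number of runs (initial run plus rechecks) to get one all-green result.

    Each run is an independent trial with success probability p, so the
    number of runs needed is geometric with mean 1 / p.
    """
    return 1.0 / pass_probability(jobs, flake_rate)
```

For example, with 10 jobs each flaking 2% of the time, `pass_probability(10, 0.02)` is about 0.82; at 5% it drops to about 0.60, so roughly 1.7 runs are needed on average per change, before counting gate resets.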
dansmith | clarkb: mriedem ++ | 19:56 |
AJaeger | thanks, fungi | 19:57 |
*** smarcet has joined #openstack-infra | 19:58 | |
clarkb | fungi: lists.o.o will get puppeted on the run starting at 2000UTC | 19:58 |
* jungleboyj read mriedem's email. | 19:59 | |
clarkb | fwiw I've also got a fix for an issue affecting a bunch of tripleo jobs up and dmsimard has suggested a simpler/better way to fix it and has offered a patch for that. So we're tracking things down outside of the integrated gate or even just nova as well | 19:59 |
fungi | clarkb: thanks. most recent puppetage on lists.o.o completed at 19:51:44 | 19:59 |
clarkb | OVH thinks they may have tracked down the source of slowness there; we have to wait for them to fix it though | 20:00 |
fungi | mriedem: i like "fracas" | 20:00 |
fungi | not a word i get to see very often | 20:01 |
mriedem | jungleboyj: cinder-only version just sent | 20:01 |
clarkb | fungi: once lists.o.o is confirmed happy I need to pop out for lunch and dinner prep. We are trying to do a family thing today since we are leaving town for the holidays to see other family | 20:02 |
clarkb | so I may become a bit more afk as the day wears on | 20:02 |
fungi | clarkb: noted, thanks for the heads up | 20:02 |
*** vkmc has quit IRC | 20:03 | |
*** wolverineav has quit IRC | 20:05 | |
mriedem | fungi: i use fracas as much as humanly possible | 20:06 |
fungi | good call | 20:07 |
*** vkmc has joined #openstack-infra | 20:07 | |
dansmith | fungi: don't encourage him. | 20:08 |
*** bobh has quit IRC | 20:08 | |
clarkb | mriedem: total brainstorm thinking out loud mode here. But maybe you, I, and say frickler can ask for 1-3 volunteers who are interested in walking through the process of digging into failures, writing e-r bugs, then fixing things, and try to start building a group of people who can jump in when things get really bad (and maybe periodically jump in to fix stuff and keep things running | 20:09 |
clarkb | happily) | 20:09 |
clarkb | Problem is I'm getting on a plane next week then holidays happen so now isn't a great time, but maybe early january ish we do that? | 20:09 |
clarkb | Another idea is maybe we take the feature freeze/RC period seriously and really push on this stuff then | 20:09 |
clarkb | I think some of the problem definitely is that OpenStack is this giant piece of machinery and understanding the moving parts is hard | 20:10 |
*** kmalloc is now known as notmorgan | 20:10 | |
*** notmorgan is now known as morgan | 20:10 | |
fungi | yeah, having a succession plan so mriedem doesn't feel obligated to jump on this stuff constantly would be great | 20:11 |
clarkb | so if we can get people moving past that obstacle maybe we get more help on this | 20:11 |
jungleboyj | mriedem: clarkb We do have a person that has been digging into Cinder issues more lately. | 20:11 |
mriedem | ¯\_(ツ)_/¯ | 20:11 |
rm_work | could someone possibly poke at https://review.openstack.org/#/c/624574/ ? :P | 20:11 |
mriedem | i'm out the week of xmas and the week of jan 7 | 20:12 |
mriedem | our spec freeze in nova is jan 10 | 20:12 |
jungleboyj | Had been asking whoami-rajat to spend time looking at bugs. Sounds like his time might be best focused on the check and gate issues? | 20:12 |
mriedem | and i'm already behind on a bunch of crap for the people that pay my bills | 20:12 |
jungleboyj | mriedem: Feel your pain there. | 20:12 |
*** bobh has joined #openstack-infra | 20:12 | |
mriedem | so sure i'm willing to help there, and we've had some summit talks about this as well | 20:13 |
clarkb | jungleboyj: maybe not best focused, but I do think it helps overall since nova needs cinder to be reliable to merge code and vice versa | 20:13 |
fungi | the descendants of sdague, j0g0 and mtreinish. sjmsquad | 20:13 |
mriedem | but dealing with the gate is just kind of my extracurricular right now while i'm procrastinating on blueprint work | 20:13 |
clarkb | jungleboyj: basically we don't want people to be super siloed on this type of work. The tests test "OpenStack"'s IaaS and if things don't work together well no one is happy | 20:13 |
mriedem | http://status.openstack.org/elastic-recheck/#1763712 is an excellent example of that | 20:14 |
mriedem | nova volume attach tests fail b/c cinder times out | 20:14 |
mriedem | b/c of rpc call | 20:14 |
rm_work | thanks clarkb :) | 20:14 |
fungi | yeah, having people with a strong background in cinder looking at these problems would be great. having them focus exclusively on cinder bugs less so | 20:14 |
dansmith | mriedem: on that one, do we time out waiting for them to wait, or would the rpc heartbeat help them help us? | 20:14 |
mriedem | dansmith: c-api gets a messaging timeout from c-vol, which returns a 500 response to nova-compute | 20:15 |
*** e0ne has joined #openstack-infra | 20:15 | |
dansmith | so, could help... | 20:15 |
mriedem | n-cpu isn't polling cinder-api for state changes | 20:15 |
mriedem | i don't think so, | 20:15 |
mriedem | default http response time is 60 seconds isn't it? | 20:15 |
dansmith | default where? | 20:15 |
mriedem | i believe for one of the cinder backup bugs i identified that the long rpc timeout could help | 20:16 |
jungleboyj | fungi: clarkb I wasn't saying having them focus on Cinder bugs. Focus on check/gate issues that we are causing. | 20:16 |
mriedem | b/c it was an rpc timeout between c-bak and c-vol over rpc | 20:16 |
*** bobh has quit IRC | 20:16 | |
dansmith | if we're timing out at 60s then we'd generally not see the 500 | 20:16 |
*** e0ne has quit IRC | 20:16 | |
jungleboyj | mriedem: Not seeing the Cinder list in e-mail. Did you send it? | 20:16 |
clarkb | jungleboyj: ya, I'm saying that longer term having them also debug neutron or nova or whatever is also valuable. For example my involvement in this has largely been to rule out/fix infra issues. Through that I've learned quite a bit about debugging openstack in general and so try to help in general too | 20:16 |
dansmith | so I said it could help because if ours is 120s or something, it might actually complete with the longer timeout | 20:17 |
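dansmith's reasoning can be made concrete with a simplified model: a client-side HTTP timeout and the server-side RPC timeout race, and whichever is shorter decides what nova-compute sees. The durations below are hypothetical, and the model ignores retries, RPC heartbeats and network latency.

```python
from typing import Optional


def outcome(rpc_seconds: float, rpc_timeout: float,
            http_timeout: Optional[float]) -> str:
    """Classify what the HTTP caller observes for one request.

    rpc_seconds:  how long c-vol actually takes to answer c-api
    rpc_timeout:  how long c-api waits on the RPC call before giving up
    http_timeout: how long the caller waits for the HTTP response
                  (None means wait indefinitely)
    """
    # c-api responds either with the real answer or, once the RPC times
    # out, with a 500, whichever happens first.
    api_responds_at = min(rpc_seconds, rpc_timeout)
    if http_timeout is not None and http_timeout < api_responds_at:
        return "client timeout"
    if rpc_seconds > rpc_timeout:
        return "500 from MessagingTimeout"
    return "success"
```

`outcome(90, 60, None)` gives the 500 mriedem describes (the client outlasts the RPC timeout), while `outcome(90, 120, None)` is the longer-timeout case where the call would complete, which would also hide whatever makes it slow.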
mriedem | jungleboyj: yar | 20:17 |
mriedem | jungleboyj: http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000868.html | 20:17 |
mriedem | didn't tag for [cinder] | 20:17 |
clarkb | jungleboyj: so maybe the starting point is "here are cinder specific check/gate issues" but if we can build that into "here are check/gate issues" that's even better | 20:17 |
jungleboyj | Ah, I just had to ask, it showed up. | 20:17 |
mriedem | b/c it's more than just cinder | 20:17 |
*** jcoufal has joined #openstack-infra | 20:17 | |
mriedem | dansmith: i thought there used to be a thing for response timeouts in the api, but maybe that was when we had eventlet | 20:18 |
mriedem | mordred: does ksa define any kind of http response timeout? | 20:18 |
clarkb | fungi: lists should puppet soon | 20:18 |
clarkb | (I'm watching on the bridge side) | 20:18 |
fungi | yep, still watching in one of my numerous array of terminals | 20:18 |
dansmith | mriedem: what's the operation we're waiting for, by the way? | 20:18 |
dansmith | because timeout or not, we shouldn't be making synchronous calls that take that long anyway | 20:18 |
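One common alternative to a long synchronous call is to have the API return quickly and let the caller poll for a state change with a deadline. A generic sketch follows; `get_status` is a hypothetical callable standing in for something like a GET on the attachment, and the status strings are illustrative rather than Cinder's actual attachment states.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], wanted: str,
                    timeout: float, interval: float = 1.0,
                    error_states: frozenset = frozenset({"error"})) -> str:
    """Poll get_status() until it returns `wanted`, an error state, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in error_states:
            raise RuntimeError(f"resource went to {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        time.sleep(interval)
```

The deadline lives with the caller, so a slow backend shows up as a clear timeout on a named resource instead of a 500 from an RPC timeout several hops away.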
mriedem | dansmith: this https://developer.openstack.org/api-ref/block-storage/v3/#update-an-attachment | 20:19 |
mriedem | which is essentially the new version of os-initialize_connection, | 20:19 |
dansmith | mriedem: why does that take a long time? | 20:19 |
mriedem | which creates the export | 20:19 |
mriedem | idk | 20:19 |
mriedem | this is why i would like cinder people to figure it out :) | 20:19 |
jungleboyj | eharney: smcginnis: Can you guys take a quick look at the discussion between mriedem, clarkb and me above? | 20:20 |
mriedem | it's been a while since i tried digging into where time is spent there, | 20:20 |
mriedem | maybe c-vol is using a lock in a bad way, | 20:20 |
mriedem | e.g. maybe something is holding a lock during a periodic task or something and we're stuck waiting on that | 20:20 |
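The lock hypothesis is easy to model: if a periodic task holds a lock longer than a request is prepared to wait, the request times out even though nothing on its own code path is slow. A toy threading sketch; it says nothing about where c-vol actually takes locks, and the durations are arbitrary.

```python
import threading
import time


def request_gets_lock(hold_seconds: float, request_timeout: float) -> bool:
    """Return True if a request acquires the lock before its timeout.

    A background "periodic task" grabs the lock first and holds it for
    hold_seconds; the request then waits at most request_timeout for it.
    """
    lock = threading.Lock()
    holding = threading.Event()

    def periodic_task():
        with lock:
            holding.set()            # signal that the lock is now held
            time.sleep(hold_seconds)

    worker = threading.Thread(target=periodic_task)
    worker.start()
    holding.wait()                   # ensure the task holds the lock first
    got_it = lock.acquire(timeout=request_timeout)
    if got_it:
        lock.release()
    worker.join()
    return got_it
```

`request_gets_lock(0.5, 0.1)` returns False while `request_gets_lock(0.05, 2.0)` returns True, which is why timing how long a request waits to acquire a lock, not just how long the work takes, can narrow this kind of bug down.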
dansmith | mriedem: yeah, just trying to think of what could be taking that long legitimately | 20:20 |
dansmith | yeah, sounds fishy to me | 20:20 |
mriedem | i think we've also at times seen weird things where tgtadm takes more than a minute | 20:21 |
dansmith | jungleboyj: ^ | 20:21 |
jungleboyj | I have seen all kinds of weird behavior when running on slow nodes. | 20:21 |
jungleboyj | It is hard to debug/fix though. | 20:21 |
*** auristor has quit IRC | 20:22 | |
clarkb | jungleboyj: if we can at least debug the source that helps us identify what we can change to reduce the slowness | 20:22 |
jungleboyj | clarkb: Agreed. | 20:22 |
clarkb | even if we can't fix that in the software itself, we might be able to take that info back to the cloud (as we've done with OVH) | 20:22 |
clarkb | that might also indicate a case where we want to isolate that specific test or set of tests so that they aren't competing for cpu time or memory etc | 20:23 |
clarkb | it may still be slow, but probably more reliable if we can do that | 20:23 |
clarkb | (basically there are things we can do if we identify the source of the slowness, but I agree it isn't always easy) | 20:24 |
mriedem | dansmith: this is the backup bug where i identified we could use long_rpc_timeout https://bugs.launchpad.net/cinder/+bug/1739482 | 20:24 |
openstack | Launchpad bug 1739482 in Cinder "test_snapshot_backup fails to build backup due to MessagingTimeout" [Medium,Confirmed] | 20:24 |
mriedem | see comment 2 | 20:24 |
jungleboyj | The case where I saw issues here was due to disk contention. Tried setting up an OpenStack cloud in a way that wasn't working well. Pounding one disk too much. | 20:24 |
dansmith | locally I sometimes see weird iscsi behavior when the source/dest are the same node | 20:24 |
dansmith | so I wonder if we're causing ourselves pain by doing all-in-one here | 20:24 |
clarkb | fungi: seems to have been a noop. So I think we are good on that futureparser change? | 20:25 |
jungleboyj | dansmith: Interesting. I haven't seen that so much. | 20:25 |
clarkb | fungi: if you confirm I'm gonna go make lunch | 20:25 |
dansmith | like tgtd and iscsiadm fighting for locks or buffer flushes or something | 20:25 |
fungi | clarkb: complete and total noop, yes. go enjoy lunch! | 20:25 |
*** bobh has joined #openstack-infra | 20:27 | |
jungleboyj | mriedem: I am going to try and get eharney smcginnis e0ne and whoami-rajat together to discuss these issues and put together a plan to help with them. That is the best place to start. | 20:27 |
mriedem | thanks | 20:28 |
mriedem | dansmith: maybe i was making stuff up about ksa / response timeouts https://github.com/openstack/keystoneauth/blob/ccf6cb79033b2083d9177823094f7836eb68ae0d/keystoneauth1/session.py#L248 | 20:28 |
mriedem | ksa sessions have a timeout, but default to nothing | 20:28 |
openstackgerrit | Merged openstack-infra/project-config master: Add publish-to-pypi for octavia-lib https://review.openstack.org/624574 | 20:28 |
* dansmith nods | 20:28 | |
dansmith | mriedem: the individual clients might set that I guess, but.. | 20:28 |
mriedem | although the CLI sets 600 https://github.com/openstack/keystoneauth/blob/ebe781a3ea0386d6ff088a84e8dde26e538b856d/keystoneauth1/loading/session.py#L116 | 20:28 |
mriedem | but that's way high | 20:29 |
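The practical difference between ksa's session default (`timeout=None`, wait indefinitely) and an explicit timeout can be seen with any HTTP client. A stdlib-only sketch, using a deliberately slow local handler as a stand-in for a blocked API call; the 2-second delay, the timeouts and the helper names are arbitrary choices for illustration, not anything ksa itself does.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for an API call that blocks on a slow backend."""

    def do_GET(self):
        time.sleep(2)  # simulated backend delay
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client gave up before we answered

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, timeout):
    """Return the body, or 'timed out' if no response arrives in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except (socket.timeout, TimeoutError):
        return "timed out"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timed out"
        raise


def demo(timeout):
    """Start the slow server on a free local port and fetch from it once."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_address[1]}/", timeout)
    finally:
        server.shutdown()
        server.server_close()
```

`demo(0.5)` gives up after half a second, while `demo(None)` waits the full two seconds for "ok"; with no timeout at all, a hung backend would block the caller indefinitely.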
dansmith | mriedem: anyway, it doesn't sound like waiting longer for that really makes sense in the long term if there's some silly issue we'd just be covering up | 20:29 |
*** jrist has joined #openstack-infra | 20:30 | |
mriedem | this also reminds me that we used to get debug logs in nova's logs about calls using cinderclient but we don't get that debug logging anymore, | 20:31 |
mriedem | making it hard to trace requests across services | 20:32 |
mriedem | i think something might have changed in ksa recently with how that logging is setup | 20:32 |
*** bobh has quit IRC | 20:32 | |
*** _alastor_ has quit IRC | 20:32 | |
*** dklyle has joined #openstack-infra | 20:39 | |
* mriedem moves to cinder | 20:39 | |
*** bobh has joined #openstack-infra | 20:41 | |
*** wolverineav has joined #openstack-infra | 20:44 | |
*** bobh has quit IRC | 20:46 | |
*** wolverineav has quit IRC | 20:52 | |
*** bobh has joined #openstack-infra | 20:52 | |
*** wolverineav has joined #openstack-infra | 20:55 | |
*** smarcet has quit IRC | 20:55 | |
*** yboaron_ has joined #openstack-infra | 20:56 | |
*** bobh has quit IRC | 20:57 | |
*** rcernin has joined #openstack-infra | 21:06 | |
slaweq | mordred: hi | 21:07 |
slaweq | mordred: can You help me with one thing? | 21:07 |
slaweq | mordred: we have in neutron neutron-functional job, which is still using legacy-dsvm-base as parent, I now want to move it to zuulv3, do You know what job template should I use for it? Should it be devstack-tox-functional ? | 21:09 |
slaweq | or maybe someone else from infra team can help me? | 21:09 |
ianw | frickler: oh, hrm, switching devstack testing to bionic has caused dib issues because zypper isn't available on bionic. i don't know what we're going to do about building opensuse-minimal on bionic hosts, this has been a known issue that i just forgot about | 21:11 |
*** bobh has joined #openstack-infra | 21:11 | |
clarkb | ianw: possible we can use the xenial zypper package on bionic? | 21:13 |
ianw | clarkb: when i looked, it's all a huge bundle of c++ ... and i'm assuming abi's are such that it won't work | 21:13 |
clarkb | slaweq: what is the general setup of the neutron functional job? It runs devstack in a minimal setup then executes tox or tempest or something else? | 21:14 |
ianw | slaweq / clarkb: yeah looks like it runs tox at the end? http://git.openstack.org/cgit/openstack/neutron/tree/neutron/tests/contrib/post_test_hook.sh#n64 | 21:15 |
ianw | i think you're in the ballpark with devstack-tox-functional | 21:15 |
clarkb | ++ | 21:15 |
clarkb | ianw: re zypper maybe suse ships a statically compiled option to bootstrap things? | 21:16 |
*** bobh has quit IRC | 21:16 | |
*** amuller has quit IRC | 21:16 | |
ianw | hrm, when i looked previously i never saw anything like that but maybe | 21:16 |
slaweq | clarkb: currently it clones devstack but then runs our script: https://github.com/openstack/neutron/blob/master/neutron/tests/contrib/gate_hook.sh#L69 and then run dsvm-functional tests with tox | 21:16 |
clarkb | the new devstack jobs won't run devstack-gate hooks anymore. I think the idea there is to have your run stage execute that. But maybe that happens after tox is invoked? That is the piece that might need figuring out | 21:17 |
ianw | i don't want to blow out testing, but we should maybe run bionic and xenial tests for the nodepool dsvm dib tests ... since all our builders are actually xenial, but it would also be nice to have bionic gated too for future sanity | 21:18 |
clarkb | ianw: ++ | 21:18 |
*** xek__ has quit IRC | 21:18 | |
slaweq | ianw: clarkb: if the whole devstack is run, that is also fine for our tests and should work too. Thx, I will play with this devstack-tox-functional job then | 21:18 |
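For reference, a Zuul v3 job along the lines discussed might look roughly like the following. It is built as a Python dict and dumped as JSON (which YAML parsers accept) only to keep to one language here; in `.zuul.yaml` it would normally be written as YAML. The parent comes from the discussion, but the `tox_envlist` value, the `pre-run` playbook path and the description are assumptions about how the final job could be wired, not the actual neutron job.

```python
import json

# Hypothetical zuul v3 definition for neutron-functional.
neutron_functional = {
    "job": {
        "name": "neutron-functional",
        # Parent suggested in the discussion above.
        "parent": "devstack-tox-functional",
        "description": "Run neutron functional tests on a devstack node.",
        # Hypothetical playbook doing what gate_hook.sh did under devstack-gate.
        "pre-run": "playbooks/neutron-functional/pre.yaml",
        "vars": {
            # tox env the legacy job ran; tox_envlist is the variable the
            # zuul-jobs tox role reads (assumed to be honoured by the parent).
            "tox_envlist": "dsvm-functional",
        },
    }
}

if __name__ == "__main__":
    # A .zuul.yaml file holds a list of such items.
    print(json.dumps([neutron_functional], indent=2))
```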
ianw | i guess we could upgrade builders, but we'd have to isolate opensuse-minimal to an older xenial host, in a similar way i guess to how we isolate arm64 builds to nb03 | 21:19 |
rm_work | could someone poke along https://review.openstack.org/#/c/624551/ ? | 21:19 |
rm_work | pretty please? (octavia-lib release) | 21:20 |
ianw | oh, but we're trying for puppet free bionic hosts, which means a whole thing around ansible-ising dib install etc etc ... hrm | 21:20 |
clarkb | rm_work: you'll want to ping the release team about that. #openstack-release | 21:20 |
rm_work | clarkb: ah ok, i looked quickly at past +2s and it seemed mostly people here :P | 21:22 |
rm_work | i'll go poke them there tho | 21:22 |
pabelanger | ianw: we had the same issue with zypper missing in xenial, but somebody fixed it. Maybe dirk, cannot remember | 21:22 |
*** yboaron_ has quit IRC | 21:24 | |
*** bobh has joined #openstack-infra | 21:26 | |
ianw | pabelanger: hrm, i guess I'll just be thankful that I have no memory of that :) | 21:26 |
ianw | it doesn't sound like fun | 21:26 |
pabelanger | ianw: I want to say somebody had commit rights for ubuntu, but can't completely remember | 21:27 |
*** bobh has quit IRC | 21:30 | |
ianw | i thought there was a bug, maybe not, so i filed https://bugs.launchpad.net/ubuntu/+source/zypper/+bug/1808230 which is content free, but at least something we can refer to | 21:30 |
openstack | Launchpad bug 1808230 in zypper (Ubuntu) "Zypper unavailable on bionic" [Undecided,New] | 21:30 |
openstackgerrit | Hongbin Lu proposed openstack-infra/project-config master: Rename neutron ryu jobs https://review.openstack.org/624814 | 21:30 |
*** jamesmcarthur has joined #openstack-infra | 21:33 | |
*** jcoufal has quit IRC | 21:34 | |
*** bobh has joined #openstack-infra | 21:40 | |
*** bobh has quit IRC | 21:44 | |
*** lpetrut has joined #openstack-infra | 21:44 | |
*** _alastor_ has joined #openstack-infra | 21:47 | |
cmurphy | clarkb: fungi sorry i wasn't around, i assume it went okay with lists.o.o? | 21:47 |
cmurphy | there are a bunch more queued up if you're feeling adventurous | 21:48 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Vendor the RDO repository configuration for installing OVS https://review.openstack.org/624817 | 21:49 |
fungi | cmurphy: yep, totally fine--thanks again!!! | 21:49 |
*** slaweq has quit IRC | 21:49 | |
*** bobh has joined #openstack-infra | 21:51 | |
*** bobh has quit IRC | 21:55 | |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Vendor the RDO repository configuration for installing OVS https://review.openstack.org/624817 | 21:57 |
*** bobh has joined #openstack-infra | 21:58 | |
*** dklyle has quit IRC | 22:00 | |
*** bobh has quit IRC | 22:03 | |
*** trown is now known as trown|outtypewww | 22:04 | |
*** slaweq has joined #openstack-infra | 22:06 | |
ianw | dmsimard: heh, great minds think alike :) | 22:07 |
clarkb | hrm http://logs.openstack.org/85/605585/17/check/system-config-run-docker/77e61f6/job-output.txt.gz#_2018-12-12_19_48_59_704238 indicates the docker service wasn't running, but reading the docker-ce packaging it should be | 22:08 |
*** jamesmcarthur has quit IRC | 22:08 | |
clarkb | "Setting up docker-ce (5:18.09.0~3-0~ubuntu-bionic) ...", "update-alternatives: using /usr/bin/dockerd-ce to provide /usr/bin/dockerd (dockerd) in auto mode", "Created symlink /etc/systemd/system/multi-user.target.wants/docker.service -> /lib/systemd/system/docker.service.", "Job for docker.service failed because the control process exited with error code.", "See \"systemctl status docker.service\" | 22:09 |
clarkb | and \"journalctl -xe\" for details." | 22:09 |
clarkb | ianw: ^ any ideas? Also Shrews ++ for the suggestion to test this, which catches ^ | 22:09 |
*** slaweq has quit IRC | 22:10 | |
ianw | hrm, i saw something like this when i put an invalid ipv6 config in maybe? | 22:10 |
ianw | i think these jobs should probably start capturing syslog and things from the host | 22:11 |
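A minimal sketch of the kind of capture ianw suggests, assuming a systemd host where `journalctl` is on the PATH; the function names and destination path are hypothetical, and in these jobs the real thing would more likely be an Ansible task than Python.

```python
import shutil
import subprocess
from pathlib import Path


def journal_command(unit: str) -> list[str]:
    """journalctl invocation that dumps one unit's log without a pager."""
    return ["journalctl", "--no-pager", "-u", unit]


def capture_unit_log(unit: str, dest: Path) -> bool:
    """Write the unit's journal to dest; return False if journalctl is missing."""
    if shutil.which("journalctl") is None:
        return False
    result = subprocess.run(journal_command(unit), capture_output=True, text=True)
    # Keep stderr too: "no journal files" style errors are useful in job logs.
    dest.write_text(result.stdout + result.stderr)
    return True
```

Running `capture_unit_log("docker.service", Path("docker.log"))` after a failed start would save the same output that the packaging message points at with `journalctl -xe`, scoped to the docker unit.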
clarkb | ianw: there is the ipv6 config in there still, any idea if the current ps has a valid config? | 22:11 |
ianw | i thought it was valid, but i have been known to be wrong :) | 22:12 |
* clarkb googles docker.json config | 22:12 | |
clarkb | ooh we can also set iptables to false | 22:13 |
*** jamesmcarthur has joined #openstack-infra | 22:15 | |
*** jamesmcarthur has quit IRC | 22:15 | |
*** slaweq has joined #openstack-infra | 22:15 | |
*** jamesmcarthur has joined #openstack-infra | 22:18 | |
*** slaweq has quit IRC | 22:20 | |
clarkb | I think the ipv6 setting is valid. Now wondering if we have to specify some other config that we aren't specifying | 22:21 |
*** bobh has joined #openstack-infra | 22:21 | |
*** boden_ has joined #openstack-infra | 22:21 | |
ianw | i hate json config files | 22:21 |
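Since the failure mode here is a daemon that refuses to start on a bad config, a quick pre-flight check can at least rule out malformed JSON and wrong value types. This is a hypothetical helper, not part of the install-docker role; it only knows about the `ipv6`, `fixed-cidr-v6` and `iptables` keys mentioned in this discussion, and dockerd itself remains the authority on what it accepts.

```python
import ipaddress
import json


def check_daemon_config(text: str) -> list[str]:
    """Return a list of problems found in a daemon.json document."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    # These options take JSON booleans, not strings like "true".
    for key in ("ipv6", "iptables"):
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} must be true or false")
    if "fixed-cidr-v6" in config:
        try:
            net = ipaddress.ip_network(config["fixed-cidr-v6"])
            if net.version != 6:
                problems.append("fixed-cidr-v6 must be an IPv6 network")
        except (TypeError, ValueError):
            problems.append("fixed-cidr-v6 is not a valid CIDR")
    return problems
```

For example, a stray trailing comma after the last key is reported as invalid JSON up front, rather than only surfacing as a failed docker.service.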
ianw | clarkb: is the best thing to do get log capturing at this point? seems like it will be useful work in the future anyway | 22:22 |
clarkb | ianw: likely | 22:22 |
*** mnasiadka_ has joined #openstack-infra | 22:23 | |
*** benj_- has joined #openstack-infra | 22:23 | |
clarkb | I really need to figure out apparmor and libvirt locally so that I can boot VMs again | 22:25 |
* clarkb does that now since it's impacted a couple things at this point | 22:25 | |
*** bobh has quit IRC | 22:26 | |
*** _alastor_ has quit IRC | 22:26 | |
*** boden_ has quit IRC | 22:26 | |
*** uberjay has joined #openstack-infra | 22:29 | |
*** mnasiadka has quit IRC | 22:29 | |
*** andreykurilin has quit IRC | 22:29 | |
*** boden has quit IRC | 22:29 | |
*** rh-jelabarre has quit IRC | 22:29 | |
*** verdurin has quit IRC | 22:29 | |
*** masayukig[m] has quit IRC | 22:29 | |
*** eumel8 has quit IRC | 22:29 | |
*** lewo has quit IRC | 22:29 | |
*** uberjay_ has quit IRC | 22:29 | |
*** smcginnis has quit IRC | 22:29 | |
*** benj_ has quit IRC | 22:29 | |
*** logan- has quit IRC | 22:29 | |
*** mnasiadka_ is now known as mnasiadka | 22:29 | |
*** rh-jelabarre has joined #openstack-infra | 22:29 | |
*** wolverineav has quit IRC | 22:31 | |
*** logan- has joined #openstack-infra | 22:31 | |
*** verdurin has joined #openstack-infra | 22:31 | |
*** gouthamr has quit IRC | 22:32 | |
*** wolverineav has joined #openstack-infra | 22:32 | |
*** lpetrut has quit IRC | 22:32 | |
*** irdr has quit IRC | 22:33 | |
*** irdr has joined #openstack-infra | 22:34 | |
*** gouthamr has joined #openstack-infra | 22:35 | |
*** bobh has joined #openstack-infra | 22:38 | |
*** jamesmcarthur has quit IRC | 22:40 | |
*** slaweq has joined #openstack-infra | 22:42 | |
*** bobh has quit IRC | 22:42 | |
*** slaweq has quit IRC | 22:47 | |
*** kgiusti has left #openstack-infra | 22:49 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Collect syslogs from nodes in ansible tests https://review.openstack.org/624827 | 22:54 |
*** bobh has joined #openstack-infra | 23:00 | |
*** jamesmcarthur has joined #openstack-infra | 23:00 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] Collect syslogs from nodes in ansible tests https://review.openstack.org/624827 | 23:03 |
clarkb | ok I think I have working libvirt again. Now just waiting on bionic image to download | 23:03 |
* clarkb reviews ^ | 23:03 | |
*** bobh has quit IRC | 23:04 | |
clarkb | ianw: left a note, I think the second stat may overwrite the previous? | 23:06 |
*** smcginnis has joined #openstack-infra | 23:07 | |
*** rkukura_ has joined #openstack-infra | 23:08 | |
ianw | oh yeah, good point, it only works if the first is a negative result and the second is positive | 23:09 |
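The problem clarkb spotted is general Ansible behaviour: two tasks that register into the same variable keep only the last result, so a found file followed by a missing one reads as missing. The Python model below mimics that; the variable name and paths are hypothetical. The usual fixes are distinct `register` names, or a single looped `stat` task whose per-item `results` can be checked with `any`.

```python
def register_each(results_by_path: dict[str, bool]) -> dict:
    """Mimic separate `stat` tasks that all `register: syslog_stat`."""
    facts = {}
    for path, exists in results_by_path.items():
        # Each task overwrites whatever the previous one registered.
        facts["syslog_stat"] = {"path": path, "stat": {"exists": exists}}
    return facts


def register_loop(results_by_path: dict[str, bool]) -> dict:
    """Mimic one looped `stat` task: one entry per item under `results`."""
    return {"syslog_stat": {"results": [
        {"item": path, "stat": {"exists": exists}}
        for path, exists in results_by_path.items()
    ]}}


paths = {"/var/log/syslog": True, "/var/log/messages": False}

overwritten = register_each(paths)
looped = register_loop(paths)
print(overwritten["syslog_stat"]["stat"]["exists"])  # False: syslog's result was lost
print(any(r["stat"]["exists"] for r in looped["syslog_stat"]["results"]))  # True
```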
*** rkukura has quit IRC | 23:11 | |
*** rkukura_ is now known as rkukura | 23:11 | |
*** bobh has joined #openstack-infra | 23:12 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Vendor the RDO repository configuration for installing OVS https://review.openstack.org/624817 | 23:14 |
*** bobh has quit IRC | 23:16 | |
openstackgerrit | Merged openstack-infra/system-config master: Prefix install_openstacksdk variable https://review.openstack.org/621462 | 23:19 |
openstackgerrit | Merged openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible https://review.openstack.org/611228 | 23:19 |
openstackgerrit | Merged openstack-infra/system-config master: Enable ARA reports for system-config bridge CI jobs https://review.openstack.org/617216 | 23:19 |
dmsimard | ianw: ^ yay, thanks for your help | 23:19 |
ianw | if we're going to start collecting even more logs, we should start from that base | 23:20 |
*** gfidente|afk has quit IRC | 23:24 | |
*** jamesmcarthur has quit IRC | 23:28 | |
*** jamesmcarthur_ has joined #openstack-infra | 23:28 | |
*** rlandy is now known as rlandy|bbl | 23:28 | |
*** jamesmcarthur_ has quit IRC | 23:30 | |
*** jamesmcarthur has joined #openstack-infra | 23:30 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] Collect syslogs from nodes in ansible tests https://review.openstack.org/624827 | 23:31 |
*** dklyle has joined #openstack-infra | 23:35 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] Collect syslogs from nodes in ansible tests https://review.openstack.org/624827 | 23:43 |
*** dklyle has quit IRC | 23:45 | |
*** weshay is now known as weshay_pto | 23:46 | |
*** diablo_rojo has joined #openstack-infra | 23:47 | |
*** auristor has joined #openstack-infra | 23:49 | |
jonher | zuul is not doing gate checks for https://review.openstack.org/624482 am I missing something? | 23:50 |
*** bobh has joined #openstack-infra | 23:52 | |
*** jamesmcarthur has quit IRC | 23:53 | |
*** bobh has quit IRC | 23:56 |