*** armax has quit IRC | 00:42 | |
*** diablo_rojo has quit IRC | 00:48 | |
*** mriedem has quit IRC | 01:03 | |
*** whoami-rajat has joined #openstack-release | 01:16 | |
*** ekcs has quit IRC | 01:25 | |
*** ricolin has joined #openstack-release | 02:06 | |
*** ykarel|away has joined #openstack-release | 02:07 | |
*** gmann_afk is now known as gmann | 02:16 | |
*** lbragstad has quit IRC | 03:08 | |
*** whoami-rajat has quit IRC | 03:35 | |
*** ykarel|away has quit IRC | 03:40 | |
*** udesale has joined #openstack-release | 03:58 | |
*** whoami-rajat has joined #openstack-release | 04:06 | |
*** ykarel|away has joined #openstack-release | 04:24 | |
*** ykarel_ has joined #openstack-release | 04:30 | |
*** ykarel|away has quit IRC | 04:32 | |
openstackgerrit | Merged openstack/releases master: [keystone] create pike-em tag against final releases https://review.opendev.org/652905 | 05:08 |
openstackgerrit | Merged openstack/releases master: [Telemetry] final releases for pike https://review.opendev.org/652882 | 05:08 |
*** e0ne has joined #openstack-release | 05:10 | |
*** e0ne has quit IRC | 05:17 | |
*** ykarel_ is now known as ykarel | 05:53 | |
*** electrofelix has joined #openstack-release | 06:11 | |
*** d34dh0r53 has quit IRC | 06:22 | |
*** pcaruana has joined #openstack-release | 06:24 | |
*** egonzalez has quit IRC | 07:03 | |
*** egonzalez has joined #openstack-release | 07:04 | |
*** amoralej has joined #openstack-release | 07:25 | |
*** tosky has joined #openstack-release | 07:26 | |
*** hberaud has joined #openstack-release | 07:39 | |
*** dtantsur|afk is now known as dtantsur | 07:40 | |
*** ykarel is now known as ykarel|lunch | 07:53 | |
*** jpich has joined #openstack-release | 08:00 | |
ttx | tonyb[m]: we should really hold off on releases until we figure out the twine issue. | 08:15 |
ttx | Currently all those tarballs are lost and all those release jobs need to be reenqueued | 08:16 |
*** ykarel|lunch is now known as ykarel | 08:30 | |
*** jpich has quit IRC | 08:56 | |
*** jpich has joined #openstack-release | 08:56 | |
*** jpich has quit IRC | 08:57 | |
*** jpich has joined #openstack-release | 09:04 | |
*** dtantsur is now known as dtantsur|brb | 09:19 | |
*** e0ne has joined #openstack-release | 09:20 | |
*** d34dh0r53 has joined #openstack-release | 09:46 | |
*** dirk has quit IRC | 09:54 | |
*** gmann has quit IRC | 09:54 | |
*** dirk has joined #openstack-release | 09:54 | |
*** vdrok has quit IRC | 09:55 | |
*** rm_work has quit IRC | 09:55 | |
*** gmann has joined #openstack-release | 09:56 | |
*** vdrok has joined #openstack-release | 09:57 | |
*** ykarel_ has joined #openstack-release | 10:01 | |
*** ykarel has quit IRC | 10:04 | |
*** rm_work has joined #openstack-release | 10:06 | |
*** hberaud is now known as hberaud|lunch | 10:09 | |
*** ykarel_ is now known as ykarel | 10:14 | |
*** ykarel_ has joined #openstack-release | 10:19 | |
*** ykarel has quit IRC | 10:22 | |
*** ykarel_ is now known as ykarel | 10:23 | |
*** ykarel is now known as ykarelaway | 10:33 | |
*** dtantsur|brb is now known as dtantsur | 10:35 | |
*** gmann has quit IRC | 10:44 | |
*** ykarelaway is now known as ykarel | 11:05 | |
*** udesale has quit IRC | 11:16 | |
*** hberaud|lunch is now known as hberaud | 11:28 | |
*** amoralej is now known as amoralej|lunch | 12:06 | |
*** udesale has joined #openstack-release | 12:29 | |
*** lbragstad has joined #openstack-release | 12:44 | |
*** irclogbot_3 has quit IRC | 12:55 | |
*** irclogbot_3 has joined #openstack-release | 12:56 | |
*** altlogbot_1 has quit IRC | 12:57 | |
*** altlogbot_0 has joined #openstack-release | 12:58 | |
smcginnis | fungi: Did anything more happen with tracking down the source of the ensure-twine issues? | 13:03 |
fungi | smcginnis: i haven't given up but i'm still searching. on the lp error ttx posted to the ml, i think that one's clear-cut and easy to fix (see my reply there) | 13:09 |
smcginnis | Yeah, the lp one should be fixed, but that's not the blocker. | 13:09 |
fungi | for ensure-twine, i think we need to figure out whether pip is actually being invoked from within a virtualenv there, or whether it is confused by something making it think it's using a virtualenv. we ought to be able to use the ensure-twine role in a check job proposed as a do-not-merge change and see what it does, then maybe add some debugging around it | 13:14 |
smcginnis | I have https://review.opendev.org/655241 up, but it sounded like it would fail due to being in a read-only environment (if I understood correctly), so not sure if that's really needed or not. | 13:15 |
smcginnis | Would be easy enough to add a debug print for sys.real_prefix. | 13:15 |
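A minimal sketch of the kind of debug print being discussed here, assuming it would be dropped into whatever Python the ensure-twine role ends up running; sys.real_prefix is only set by the old virtualenv tool, so a stdlib-venv check is included as well:

    import os
    import sys

    # virtualenv (the third-party tool) sets sys.real_prefix; the stdlib venv
    # module instead makes sys.prefix differ from sys.base_prefix.
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)

    print("executable: ", sys.executable)
    print("prefix:     ", sys.prefix)
    print("base_prefix:", getattr(sys, "base_prefix", None))
    print("real_prefix:", getattr(sys, "real_prefix", None))
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
    print("inside a virtualenv:", in_virtualenv)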
fungi | i spent a good chunk of yesterday looking into anything which might have changed on or shortly before the 17th when we saw the first case of this, but didn't get to do any further debugging what with also trying to look into network instability and rogue vm issues in our providers most of the day | 13:16 |
fungi | trying to debug with something like 655241 is going to be harder, since we have to land changes to it and exercise them with test releases | 13:16 |
*** amoralej|lunch is now known as amoralej | 13:17 | |
smcginnis | I'd be glad to help, but need some hand holding to know how. | 13:17 |
fungi | but as for the role itself, either it really *is* calling pip inside the ansible virtualenv (in which case the job shouldn't be able to pip install anything into it) or it's confused about what its situation is | 13:18 |
fungi | if we take the error on its face, a possible way out is to rebuild the ansible venvs with system site packages enabled, at which point pip install --user in the workspace should work, but i don't know whether there could be security ramifications like jobs being able to silently replace ansible modules | 13:19 |
smcginnis | What if we always set up twine in a venv and make sure any use of it is via that venv? | 13:20 |
fungi | we could also try sanitizing the calling environment and setting the interpreter to /usr/bin/python3, or creating a twine venv in the workspace and using that | 13:20 |
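One of the options fungi mentions is a dedicated twine venv in the workspace. A rough Python illustration of that idea (the real ensure-twine role is Ansible, and the workspace path below is an assumption, not the job's actual layout):

    import subprocess
    import venv
    from pathlib import Path

    # Build an isolated venv for twine inside the job workspace (path assumed).
    twine_env = Path.home() / "workspace" / ".twine-venv"
    venv.EnvBuilder(with_pip=True).create(twine_env)

    pip = twine_env / "bin" / "pip"
    twine = twine_env / "bin" / "twine"

    # Install twine into its own venv and always invoke it from there, so it
    # neither depends on nor touches the executor's ansible virtualenv.
    subprocess.run([str(pip), "install", "twine"], check=True)
    subprocess.run([str(twine), "--version"], check=True)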
* fungi will be back in just a minute | 13:21 | |
*** ykarel is now known as ykarel|away | 13:23 | |
fungi | well, a few minutes | 13:24 |
dhellmann | fungi : is ansible on the workers using a virtualenv by default? if so, that might explain why this is a "new" problem, since those may not have had a "python3" executable on the old images | 13:34 |
dhellmann | being explicit about using a twine virtualenv is probably the quickest route back to making things work, but it would be nice to understand why they're failing | 13:35 |
fungi | there are no "images" relevant to this as twine is being installed and run on the executor | 13:41 |
fungi | ansible is installed in and run from virtualenvs on the executor so we can have multiple versions of ansible for zuul to select from, but that's been the case since well before stein was released, so it's not recent enough to be the cause | 13:41 |
fungi | though i do wonder if there could be a new ansible regression which is leaking envvars into the pip install shell task; i'm going to see if recent ansible release history lines up with our timeline for this | 13:42 |
fungi | if zuul, for some reason, got new versions of ansible at different times on different executors, that could account for the random behavior we were seeing early on | 13:43 |
smcginnis | Not sure if it will help, but I put up https://review.opendev.org/655437 | 13:44 |
fungi | (i'm also not finished catching up on overnight irc scrollback and e-mail either, so please bear with me) | 13:46 |
*** ykarel|away has quit IRC | 13:46 | |
smcginnis | dhellmann: I haven't wanted to bring it up yet, but see last comment on https://review.opendev.org/#/c/654627/3 for another potential issue. | 13:47 |
openstackgerrit | Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Queens 13.0.2 https://review.opendev.org/655440 | 13:49 |
dhellmann | smcginnis : if we have a git repo sync issue, we should be more aggressive about cloning/updating in our jobs. I think only the release jobs are sensitive to missing tags like that. | 13:50 |
openstackgerrit | Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Rocky 14.0.2 https://review.opendev.org/655442 | 13:51 |
dhellmann | fungi : sure, I don't know how that stuff is set up so by "images" I just meant "the OS on the hosts running the ansible jobs" and it sounds like there may have been some changes in that content (at least to ansible itself, if not to the version of python used in those virtualenvs) | 13:51 |
*** gmann has joined #openstack-release | 13:51 | |
dhellmann | smcginnis : our clone_repo.sh script *should* be doing a lot of extra fetching and pulling already, but maybe we're missing something? | 13:52 |
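For the missing-tag concern, a hedged sketch of the sort of extra fetching dhellmann is describing; the actual clone_repo.sh is a shell script, and the repo path and remote name below are placeholders:

    import subprocess

    def refresh_repo(repo_dir: str, remote: str = "origin") -> None:
        """Force-refresh branches and tags so a release job sees newly pushed tags."""
        subprocess.run(["git", "-C", repo_dir, "remote", "update", "--prune"], check=True)
        # --force lets a re-pushed or late-replicated tag replace a stale local one.
        subprocess.run(["git", "-C", repo_dir, "fetch", remote, "--tags", "--force"], check=True)

    refresh_repo("/path/to/checked-out/repo")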
* dhellmann has to drop offline for a bit | 13:52 | |
*** mriedem has joined #openstack-release | 13:53 | |
openstackgerrit | Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Stein 15.0.1 https://review.opendev.org/655447 | 13:54 |
fungi | yeah, the virtualenvs themselves were built in mid-march across all the executors with the same versions (latest) of virtualenv and pip, which are a couple months old at this point too | 13:54 |
fungi | but we do upgrade ansible in the virtualenvs whenever there's a new point release | 13:54 |
smcginnis | Looks like 2.7.10 was released 21 days ago. | 13:57 |
fungi | yeah, so in checking my assumptions, it seems we have *not* been upgrading ansible 2.7 at least. the executors are all still using 2.7.9 from march 14 (latest at the time their virtualenvs were built on march 18) | 13:58 |
fungi | same with ansible 2.6.15 from march 15 | 13:59 |
fungi | so there goes that theory | 13:59 |
smcginnis | So ansible, python, and pip have all been the same since well before this started happening? | 14:00 |
fungi | yes, since well before stein release even | 14:03 |
fungi | as have the jobs and roles in use | 14:03 |
fungi | which is why this is so maddening | 14:03 |
smcginnis | Underlying platform on the executors? | 14:04 |
fungi | we have unattended-upgrades pulling in security updates for ubuntu xenial (16.04 lts) so in theory it could be something which upgraded around that time. i'll take a look at the dpkg log on an executor | 14:05 |
fungi | there was a glibc update on the 16th | 14:06 |
*** udesale has quit IRC | 14:06 | |
*** udesale has joined #openstack-release | 14:07 | |
*** udesale has quit IRC | 14:08 | |
*** udesale has joined #openstack-release | 14:08 | |
fungi | http://paste.openstack.org/show/749704 is a list of the packages which were updated on ze01.openstack.org since the start of april | 14:09 |
smcginnis | Nothing in that list that looks obvious to me that could have an impact on python execution. | 14:10 |
fungi | i was wrong about glibc upgrading, i think those were just trigger debug lines | 14:14 |
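A rough sketch of how a list like the one in that paste could be pulled from the dpkg log; the log path and line format match stock Ubuntu, but this is an assumption about how such a list might be produced, not a record of the actual command used:

    from datetime import datetime

    CUTOFF = datetime(2019, 4, 1)

    # dpkg.log lines look like: "2019-04-16 06:28:01 upgrade pkg:arch old-ver new-ver"
    with open("/var/log/dpkg.log") as log:
        for line in log:
            fields = line.split()
            if len(fields) < 6 or fields[2] != "upgrade":
                continue
            when = datetime.strptime(fields[0], "%Y-%m-%d")
            if when >= CUTOFF:
                print(when.date(), fields[3], fields[4], "->", fields[5])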
smcginnis | So it sounds like we may need that debugging to figure out what virtual environment we are in. If the host, ansible, python, and pip all have not really changed, then the only other thing I can think of is that some other task was added that is somehow getting us into a venv. | 14:15 |
*** ykarel has joined #openstack-release | 14:18 | |
ttx | yeah, we at least need to figure out whether the error is correct or whether it's misleading | 14:19 |
openstackgerrit | Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Stein 15.0.1 https://review.opendev.org/655447 | 14:30 |
openstackgerrit | Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Rocky 14.0.3 https://review.opendev.org/655442 | 14:33 |
*** electrofelix has quit IRC | 14:37 | |
*** mlavalle has joined #openstack-release | 14:45 | |
fungi | right, the output of the additional debugging should hopefully confirm or refute the basis of that error, and then we'll have a better idea of where/how to try to fix it | 14:46 |
openstackgerrit | Matt Riedemann proposed openstack/releases master: [nova] final releases for pike https://review.opendev.org/652868 | 14:52 |
openstackgerrit | Matt Riedemann proposed openstack/releases master: [nova] create pike-em tag against final releases https://review.opendev.org/652869 | 14:52 |
smcginnis | ttx: I have that lp change just about ready if you have not started it yet. | 14:52 |
ttx | done | 14:53 |
ttx | sorry | 14:53 |
ttx | https://review.opendev.org/655465 | 14:53 |
smcginnis | No worries. Was just writing the commit message when I saw your ML response. :) | 14:53 |
openstackgerrit | Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Stein 15.0.1 https://review.opendev.org/655447 | 14:59 |
*** dave-mccowan has joined #openstack-release | 15:01 | |
*** pcaruana has quit IRC | 15:06 | |
*** dave-mccowan has quit IRC | 15:07 | |
*** amotoki_ is now known as amotoki | 15:08 | |
*** diablo_rojo has joined #openstack-release | 15:15 | |
*** ricolin has quit IRC | 15:17 | |
*** ricolin has joined #openstack-release | 15:28 | |
*** armax has joined #openstack-release | 15:29 | |
*** pcaruana has joined #openstack-release | 15:46 | |
*** diablo_rojo has quit IRC | 15:53 | |
*** diablo_rojo has joined #openstack-release | 15:56 | |
*** tosky has quit IRC | 16:00 | |
smcginnis | Status update on debugging the post-release failures with ensure-twine. | 16:02 |
smcginnis | Added debug output of the environment and got one passing, one failing. | 16:02 |
smcginnis | Pass: http://logs.openstack.org/77/654477/1/check/test-release-openstack/24f58a8/ara-report/result/8537481b-3908-4016-8165-0b4650aeb758/ | 16:03 |
smcginnis | Fail: http://logs.openstack.org/59/59ce65e9e66fe3ea203b77812ee14e69ebdb192a/release/release-openstack-python/781edd8/ara-report/result/a6f1d958-2d17-4b8a-a6bf-97cb1bf6e31d/ | 16:03 |
smcginnis | Pass shows "Host: ubuntu-bionic", which would indicate ansible was ssh'ing into the machine and running there, where it ended up not being within a virtualenv. | 16:03 |
smcginnis | Fail shows "Host: localhost" which would indicate it was just running locally, and output shows it is within ansible's virtualenv. | 16:04 |
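The failing case lines up with the guard pip applies on its own: a --user install is refused when the interpreter is inside a virtualenv. A simplified sketch of that logic (not pip's actual code, and the error text is only an approximation of what the job reported):

    import sys

    def user_install_allowed() -> bool:
        """Mimic pip's refusal of --user installs inside a virtualenv."""
        in_old_virtualenv = hasattr(sys, "real_prefix")
        in_stdlib_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
        return not (in_old_virtualenv or in_stdlib_venv)

    if not user_install_allowed():
        # Roughly the situation the release job hit once the ensure-twine task
        # ran on localhost inside the executor's ansible virtualenv.
        raise SystemExit("Can not perform a '--user' install. "
                         "User site-packages are not visible in this virtualenv.")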
*** ekcs has joined #openstack-release | 16:05 | |
*** altlogbot_0 has quit IRC | 16:09 | |
*** ykarel is now known as ykarel|away | 16:11 | |
*** altlogbot_2 has joined #openstack-release | 16:12 | |
*** ianychoi has quit IRC | 16:14 | |
*** ianychoi has joined #openstack-release | 16:15 | |
smcginnis | It is looking like https://opendev.org/zuul/zuul/commit/70ec13a7caf8903a95b0f9e08dc1facd2aa75e84 is the cause of the problems. | 16:17 |
smcginnis | That was 3 weeks ago, so it wouldn't have been picked up until more executors were restarted. | 16:17 |
smcginnis | Which a lot of them were around the 16th/17th. | 16:18 |
smcginnis | Going to revert that change. | 16:18 |
openstackgerrit | Hervé Beraud proposed openstack/releases master: Introduce a new yamlutils available by using oslo.serizalization https://review.opendev.org/648133 | 16:19 |
*** ricolin has quit IRC | 16:20 | |
ttx | smcginnis: nice catch | 16:20 |
smcginnis | ttx: Definitely team effort with the infra and zuul folks. Just glad we're finally narrowing things down and have a good hypothesis. | 16:22 |
smcginnis | Revert - https://review.opendev.org/#/c/655491/ | 16:22 |
smcginnis | We'll run another test after that lands. I think we also need to restart the executors to make sure they pick that up. | 16:22 |
*** dtantsur is now known as dtantsur|afk | 16:29 | |
*** jpich has quit IRC | 16:33 | |
*** altlogbot_2 has quit IRC | 16:43 | |
*** altlogbot_0 has joined #openstack-release | 16:44 | |
*** altlogbot_0 has quit IRC | 16:53 | |
openstackgerrit | Matt Riedemann proposed openstack/releases master: [nova] final releases for pike https://review.opendev.org/652868 | 16:56 |
openstackgerrit | Matt Riedemann proposed openstack/releases master: [nova] create pike-em tag against final releases https://review.opendev.org/652869 | 16:56 |
mriedem | i think ^ should be good now | 16:56 |
*** altlogbot_2 has joined #openstack-release | 16:56 | |
smcginnis | Thanks mriedem | 16:57 |
*** hberaud is now known as hberaud|gone | 17:01 | |
*** e0ne has quit IRC | 17:02 | |
*** amoralej is now known as amoralej|off | 17:18 | |
*** electrofelix has joined #openstack-release | 17:46 | |
*** udesale has quit IRC | 17:49 | |
fungi | yeah, i suspect if we go back through system logs we'll find that one or more (but not all) executors were restarted around the 16th/17th, and so release-openstack-python jobs which got scheduled to one of those executors hit this regression, while reenqueuing them often resulted in running on a different executor which was not yet running with that change | 18:17 |
fungi | and the point at which the behavior became consistent was coincident with when we did a full coordinated restart of all executors | 18:18 |
fungi | so as long as it goes away once we restart all the executors again with the revert in place, i won't lose any more sleep over it | 18:20 |
*** ykarel|away has quit IRC | 18:22 | |
smcginnis | Now we just need that revert to make it through. | 18:31 |
*** e0ne has joined #openstack-release | 18:48 | |
*** openstackgerrit has quit IRC | 18:57 | |
*** electrofelix has quit IRC | 19:03 | |
dhellmann | isn't that revert just going to break other things, though? | 19:03 |
smcginnis | It was needed for ara, but they have a plan for handling it better there rather than forcing all jobs to be executed from ansible's venv. | 19:04 |
smcginnis | Doesn't sound like it was a service affecting ara issue either. | 19:05 |
dhellmann | ok, good | 19:14 |
*** e0ne has quit IRC | 19:32 | |
*** e0ne has joined #openstack-release | 19:35 | |
*** dave-mccowan has joined #openstack-release | 19:56 | |
*** e0ne has quit IRC | 20:08 | |
*** pcaruana has quit IRC | 20:39 | |
*** whoami-rajat has quit IRC | 21:25 | |
smcginnis | tonyb[m]: Still issues with releasing python projects. | 21:43 |
smcginnis | tonyb[m]: We think we need https://review.opendev.org/#/c/655491/ to merge and the zuul executor nodes to be restarted before we can do releases without failures. | 21:43 |
smcginnis | tonyb[m]: So probably best to hold off on any more releases until we get the infra all clear. | 21:43 |
*** mlavalle has quit IRC | 22:13 | |
dhellmann | maybe we should apply a procedural -2 to all of those so their authors know the status | 22:17 |