Tuesday, 2020-07-21

*** openstack has joined #opendev07:30
*** ChanServ sets mode: +o openstack07:30
ianw#status log rebooted eavesdrop01.openstack.org as it was not responding to network or console07:32
openstackstatusianw: finished logging07:32
*** dougsz has quit IRC07:36
*** dougsz has joined #opendev07:36
openstackgerritIan Wienand proposed opendev/system-config master: Add borg-backup roles  https://review.opendev.org/74136607:37
*** tosky has joined #opendev07:37
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS  https://review.opendev.org/74186807:44
openstackgerritIan Wienand proposed openstack/diskimage-builder master: [wip] Drop dib-python requirement from several elements  https://review.opendev.org/74187707:44
*** dtantsur|afk is now known as dtantsur07:55
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** priteau has joined #opendev08:24
ttxhmm, jobs in the kata-containers tenant seem to have stopped running on Jul 17: https://zuul.opendev.org/t/kata-containers/builds?pipeline=SoB-check08:38
ttxAnyone know of any obvious culprit, before I dive deeper?08:38
ttxseems to match the last reconfiguration/restart time08:42
openstackgerritMerged opendev/irc-meetings master: Change Neutron L3 Sub-team Meeting frequency  https://review.opendev.org/74187608:51
*** xiaolin has joined #opendev09:47
openstackgerritFabien Boucher proposed opendev/gear master: Bump crypto requirement to accomodate security standards  https://review.opendev.org/74211710:35
openstackgerritDaniel Bengtsson proposed openstack/diskimage-builder master: Update the tox minversion parameter.  https://review.opendev.org/73875411:26
*** sshnaidm|afk is now known as sshnaidm11:41
*** iurygregory has quit IRC11:49
*** tkajinam has quit IRC11:56
*** iurygregory has joined #opendev11:59
openstackgerritFabien Boucher proposed opendev/gear master: use python3 as context for build-python-release  https://review.opendev.org/74216511:59
fungittx: i can take a look in a few hours12:09
ttxfungi: thx!12:12
ttxI looked but there seems to be no trail at all12:12
openstackgerritFabien Boucher proposed opendev/gear master: Bump crypto requirement to accomodate security standards  https://review.opendev.org/74211712:16
fungittx: yeah, i expect i'll have to check the scheduler debug log to see if we're getting any events from github at all12:17
*** ryohayakawa has quit IRC12:25
*** Eighth_Doctor is now known as Conan_Kudo12:59
*** Conan_Kudo is now known as Eighth_Doctor12:59
*** gema has joined #opendev13:01
*** fressi has joined #opendev13:21
*** fressi has quit IRC13:22
clarkbfungi: ttx github is deprecating some bits of its application webhooks but that isn't scheduled until october 1st so doubt that is it13:33
clarkbseems like something to do with the zuul scheduler update13:33
clarkbtobiash: ^ you're probably more up to date on changes that may affect github? have you seen any problems?13:33
fungiyeah, i'm digging in logs now, looks like we stopped getting any trigger events from their repos but i haven't yet cross-checked our other github connection repos13:34
tobiashwas there a scheduler restart on jul 17?13:35
clarkbtobiash: yes13:35
tobiashare there github related exceptions?13:35
clarkbwe restarted to pick up zookeeper tls connection setup with the kazoo update. I suppose that could potentially be related too (though doubtful since other aspects are happy)13:36
tobiashone potential change that alters github stuff has been merged on jul 16: https://review.opendev.org/710034 but this should have been a noop refactoring13:36
fungii'm checking now to identify the timestamp of the last GithubTriggerEvent in our debug logs, but... they're large (and compressed)13:38
fungiokay, last GithubTriggerEvent we saw for any project (not just limited to the kata-containers tenant) was 2020-07-17 22:28:4413:39
tobiashquick testing with the github-debugging script shows no general issue against a real github13:40
fungi2020-07-17 22:32:59,413 INFO zuul.Scheduler: Starting scheduler13:41
tobiashand if that got broken you would see exception traces or github auth failures13:41
fungiso yes, we stopped seeing GithubTriggerEvent at restart13:41
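The search fungi describes amounts to something like this on the scheduler host; the exact log paths are assumptions:
    # find the last GithubTriggerEvent across the rotated, compressed debug logs
    zgrep -h 'GithubTriggerEvent' /var/log/zuul/debug.log.*.gz | tail -n 1
    # and correlate it with the scheduler start time
    grep 'Starting scheduler' /var/log/zuul/debug.log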
clarkbI wonder if we needed to restart zuul-web but didn't?13:41
clarkbsince the github events are all funneled through that13:42
tobiashshould be compatible actually, but maybe13:42
fungioh, i should be looking at the zuul-web logs right? i forgot about these coming in via webhooks13:43
clarkbfungi: well both sets of logs13:43
clarkbthe entry to the system is zuul-web though13:43
clarkband it was started in june13:43
fungifinding a github-related traceback in the scheduler logs will take a while... with all the other tracebacks the scheduler throws it's needle-in-haystack stuff13:44
fungibut maybe the zuul-web logs will get me there faster13:44
clarkbya may give you event ids to grep on13:45
clarkbsince you should be able to trace that from the entry point to job completion and back to reporting13:45
fungiargh, i think zoom webclient in chromium has locked up the display on my workstation13:47
openstackgerritKendall Nelson proposed openstack/project-config master: Updates Message on PR Close  https://review.opendev.org/74219413:50
fungizuul-web is definitely calling into the rpcclient to notify the scheduler of webhook events13:52
clarkbwe have had to restart zuul-web in the past for it to work after a scheduler restart. Maybe this is that class of problem and we should go ahead and restart it?13:53
clarkbI think that will pick up the PF4 changes too13:53
fungithe only exceptions in the web-debug.log today are websocket errors13:57
fungitrying to see if the scheduler is logging those rpc commands at all13:59
clarkbI wonder if that has started the zk transition13:59
clarkband old zuul-web is talking gearman while scheduler is checking zk13:59
*** roman_g has joined #opendev14:00
fungiyeah, nothing in the scheduler debug log about rpc calls from the zuul-web14:01
fungicorvus: when you're around, do you have an opinion on trying a zuul-web restart to see if we start getting github webhook events again? or is there additional troubleshooting we should consider first?14:02
AJaegerfungi, clarkb, ianw: ricolin proposes in https://review.opendev.org/#/c/742090/1 a couple of arm64 jobs, some advice from you would be welcome there, please14:08
corvusfungi: so we see the events in zuul-web, but not on the scheduler?14:17
*** markmcclain has quit IRC14:23
fungicorvus: tobiash helped me find that geard logs them14:24
fungiand zuul.GithubEventProcessor handles them but then does nothing14:24
*** markmcclain has joined #opendev14:24
*** mlavalle has joined #opendev14:38
clarkbinfra-root fyi http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016022.html some of those changes seem like reasonable improvements. I think my concern is mostly for https://review.opendev.org/#/c/737043/16 which adds centos support back into base. We finally simplified down to not needing to have support for multiple platforms :/15:57
clarkbOur shift away from trying to provide general purpose roles is a direct consequence of never really having had much help for that so we ended up doing extra work for ourselves and didn't see much benefit. In this case I suppose if rdo is using it then we may get that help (basically we have users before the extra work)15:58
corvuswe've been down this road before15:59
corvuswe specifically put roles in playbooks/roles because we are targeting no external users16:02
clarkbyup, because we had struggled mightily with trying to accommodate external users with the puppet stuff16:03
corvusit does look well tested16:04
corvusthat might help make this effort more successful than the last 216:04
clarkbI think my biggest concern from a mirror perspective is we expect them to be used by our system only. That means we can and do change paths and deprecate and remove old stuff16:06
fungieven so, i'm dubious... could the modularity of the existing orchestration be improved so that they can plug in support for the stuff they need which we aren't running? that way they can maintain it separately16:06
clarkbmaking those changes safely for external users becomes much more difficult (though they could guard against issues with testing)16:06
fungialso worried that this could be the tip of the iceberg and we wind up back in puppet-openstackci territory again16:06
*** marios has quit IRC16:07
clarkbfor example once we drop xenial test jobs I'd like to force https on our mirrors16:07
clarkbbut that may break external usage of the role if people are consuming http and can't https (as in the case of xenial apt)16:08
fungiyeah, that's the main thing for me. we need to be able to make breaking changes to this without coordinating with downstream users16:08
clarkbI'll write up a response with our concerns and why we've structured things the way we have16:09
clarkband see what Javier thinks about that16:09
fungiright now, only accepting patches for improvements in what we're running that we can take advantage of helps reinforce that people shouldn't expect help trying to reuse it16:09
mnaseri'm using the python builder images and i had some really tricky issue which is -- i need to install Cython -- to install the packages that i need16:14
mnaseris this a pattern that may have been addressed in another way?16:14
mnaserthe trick is we build things inside a venv, and that's created inside `assemble`16:15
clarkbmnaser: bindep?16:15
clarkbbasically the bindep compile target should get installed prior to installing pip packages16:15
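For reference, the compile profile clarkb mentions is declared like this in bindep.txt (package names here are illustrative):
    # packages tagged [compile] are installed before pip builds any wheels
    gcc [compile]
    libffi-dev [compile platform:dpkg]
    python3-dev [compile platform:dpkg]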
mnaserclarkb: well, i dont think i can do python3-cython in bindep16:16
clarkbwhy not?16:16
mnaserbecause that would install in system python, not the one the image uses inside /usr/local/bin/python16:16
clarkbI could be wrong about this but I think it should find your system install via PATH and use it from there16:17
clarkbwhether or not that would actually work for compiling the cython bits I'm not sure16:17
mnaserif i do that i end up with two pythons16:17
mnaserthe python3 that ships in the docker image is built from source into /usr/local/bin/python16:17
mnaserif i install cython bits, it will end up installing debian shipped python16:18
corvushow is one expected to install cython when using a built-from-source python?16:18
mnasercorvus: pip install Cython worked just fine in the container16:18
*** chandankumar is now known as raukadah16:19
mnaser(i think they ship a lot of wheels so it works™)16:19
clarkbok so the actual problem is that installing python3-cython depends on python3 under debian and that mixes up the python install16:19
clarkbwe do install a dpkg override file thing in those images to avoid that I thought16:19
clarkbmaybe we need to add more content to that list?16:20
mnaseri could install Cython from pip16:20
mnaserbut it would have to be two staged16:20
mnaserinstall Cython into build venv, then actually install the pkgs i need16:20
corvusit doesn't work to just put it first in requirements.txt?16:20
clarkbwe do the control file for python3-dev16:21
clarkbmaybe adding one for python3 is sufficient? and that addresses a larger global problem with the images?16:21
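One general technique for that kind of override is an equivs dummy package that satisfies apt's python3 dependency without pulling in the distro interpreter; this is a sketch of the approach, not necessarily what the opendev images actually ship:
    # build and install a dummy python3 package so apt dependencies resolve
    # against the /usr/local interpreter instead of installing Debian's
    cat > python3-dummy.ctl <<'EOF'
    Section: python
    Priority: optional
    Standards-Version: 3.9.2
    Package: python3
    Version: 99.0
    Description: dummy python3; the real interpreter is built into /usr/local
    EOF
    equivs-build python3-dummy.ctl
    dpkg -i python3_99.0_all.deb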
mnaser(trying with Cython first)16:22
mnaserWARNING: Cython is not installed.16:23
mnaserERROR: Cannot find Cythonized file rados.c16:23
mnaserit's almost as if it was a pre-build step or something16:24
clarkbanother approach could be to install a wheel of your cythonized package16:24
mnaserbuilding this https://github.com/ceph/ceph/tree/master/src/pybind/rados16:25
*** roman_g has quit IRC16:25
mnaser'cythonize' is called in setup.cfg16:25
corvusmnaser: what line in assemble fails?16:27
mnaser`/tmp/venv/bin/pip install -c /tmp/src/upper-constraints.txt --cache-dir=/output/wheels Cython git+https://opendev.org/openstack/glance@stable/ussuri git+https://github.com/ceph/ceph@octopus#subdirectory=src/pybind/rados PyMySQL python-memcached`16:27
mnaserhttp://paste.openstack.org/show/796175/16:28
mnaserthis is the dockerfile (but takes a little bit to build because ceph clone)16:28
*** priteau has quit IRC16:28
mnaserwith this bindep - http://paste.openstack.org/show/796176/16:29
corvusi didn't know assemble took args16:29
corvusi see that now16:29
*** priteau has joined #opendev16:29
fungimnaser: yeah, cython is going to have to be preinstalled for setuptools to make use of it, i guess pip install could be run twice16:30
mnaserfungi: pip install didn't work because we build wheels inside a venv16:30
mnaserhttps://opendev.org/opendev/system-config/src/branch/master/docker/python-builder/scripts/assemble#L9016:30
fungican't create the venv, pip install cython in it, then build the wheels in it?16:31
corvusmnaser: oooh so this is all just in assemble.16:31
mnaseryeah :) because we build them out of repo so its a little tricky16:31
clarkbemail reply sent to javier16:31
corvusmnaser: i think maybe we should just change how assemble works :)16:31
*** dougsz has quit IRC16:32
mnaserim hoping to eventually use zuul-checked-out code to build the images but that's a todo16:32
clarkbmnaser: was it confirmed that using debian cython doesn't work?16:32
clarkbit wasn't clear to me if that was tried and failed or assumed to fail16:32
corvusmnaser: maybe just add a "--pre-install" argument to assemble, so it installs cython in the venv in a separate step?16:32
clarkbI do think updating assemble is likely the best longer term option but if system cython works that may get you moving quicker16:32
mnaserclarkb: it installs a whole set of python packages and you end up with two pythons, and the build tools (assemble) doesn't actually use that 'newly installed python'16:33
mnaserclarkb: i shortcircuited all of that and straight up added python3-rados but that pulled python and wasn't even usable in /usr/local/bin/python3 (but it was in /usr/bin/python3)16:34
clarkbmnaser: right but cython is a command line tool isn't it? so would be  in $PATH? and -builder's contents are thrown away except for the built wheels16:34
*** priteau has quit IRC16:34
clarkbthe end result should be fine assuming cython can execute properly16:34
clarkbits clunky for sure though16:34
mnaseri dont think cython is a command line tool because it's imported in setup.py16:34
clarkbah16:34
mnaseri mean, it might have one, but the setup.py seemed to import it and do things with the api16:34
clarkbya the package itself may call it via imports not cli16:35
corvusmaybe just stick something like lines 1-3 into assemble?  http://paste.openstack.org/show/796178/16:35
mnaseryeah that's what im thinking.  probably the most annoying part is adding argument parsing there16:35
mnaser:P16:35
corvusmnaser: yeah, i was like "let's just pseudocode this" :)16:36
corvusanyway, that sounds like a good addition to assemble to me; we'll probably want a nice long comment about why it's necessary16:37
corvuscause it sure did take me a few minutes to achieve understanding :)16:37
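A rough bash sketch of what corvus is proposing (not the code that eventually merged in https://review.opendev.org/742249):
    # assemble: install build-time-only deps (e.g. Cython) into the venv
    # before the real packages, so their setup.py can import them
    PRE_INSTALL=()
    while getopts 'p:' opt; do
        case "$opt" in
            p) PRE_INSTALL+=("$OPTARG") ;;
            *) echo 'usage: assemble [-p pkg]... [pkg]...' >&2; exit 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    python3 -m venv /tmp/venv
    if [ "${#PRE_INSTALL[@]}" -gt 0 ]; then
        /tmp/venv/bin/pip install "${PRE_INSTALL[@]}"
    fi
    /tmp/venv/bin/pip install "$@"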
mnasercorvus: this worked locally -- http://paste.openstack.org/show/796179/ so ill integrate it16:48
corvusmnaser: sweet!  my mental pattern matcher says there's a 60% chance that's bash and 40% it's perl.  ;)16:50
mnaserhahaha.16:50
mnasergetopt is so not my thing :(16:50
openstackgerritMohammed Naser proposed opendev/system-config master: python-builder: allow installing packages prior to build  https://review.opendev.org/74224916:53
mnasercorvus, clarkb, fungi ^ you can see the paste to toy around with expected behaviour, cause i dont think we really test these images16:54
mnaseri have a change i can test it in but unfortunately its in the vexxhost tenant16:55
corvusmnaser: i think the gerrit images in opendev depend on python-builder, so we can do a noop depends-on with that to at least exercise it16:55
corvus(because the gerrit images build in jeepyb)16:56
mnaseractually, i think the uwsgi images will get rebuilt too and that's a good exercise too16:56
corvusk16:56
mnaseras that uses the 'arguments' thing i think16:56
mnaserlike the assemble uWSGI -- i think16:56
mnaserill make a gerrit one too16:56
mnaserk that job is already gonna rebuild uwsgi base image so just gotta do a gerrit one16:57
openstackgerritMohammed Naser proposed opendev/jeepyb master: DNM: testing python-builder image changes  https://review.opendev.org/74225117:00
*** priteau has joined #opendev17:00
mnaserand that's a test17:00
*** bolg has quit IRC17:01
*** dtantsur is now known as dtantsur|afk17:15
*** priteau has quit IRC17:21
mnasercorvus, clarkb, fungi: the docker image change built uwsgi and jeepyb just fine, the integration job failed but that seems unrelated: `/usr/bin/python3: No module named pip`17:56
mnaserand the uwsgi image build also uses 'assemble uWSGI' so that means we didn't break that17:58
fungimnaser: yeah, needs ensure-pip probably17:59
clarkbhttps://review.opendev.org/#/c/741277/3 fixes the integration job17:59
clarkbfungi: yup18:00
mnaserok, that solved my issue when building locally too18:13
mnaseri do have another issue but that's unrelated :)18:13
corvusi'm going to restart all of zuul now18:30
corvus(i confirmed the promote job ran for https://review.opendev.org/742229 )18:30
corvusi'm going to save queues, then run zuul_restart.yaml playbook18:32
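Roughly, that sequence is the following; the tool and playbook paths here are assumptions (the real ones live in the zuul and system-config repos):
    # dump re-enqueue commands for everything currently in the pipelines
    python3 tools/zuul-changes.py https://zuul.opendev.org > re-enqueue.sh
    # stop and start all zuul services across the cluster
    ansible-playbook playbooks/zuul_restart.yaml
    # once the scheduler is back up, restore the queues
    bash re-enqueue.sh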
clarkbcorvus: did the fix for that land to restart executors?18:32
clarkbI think it did but I guess we know where to look if that bit fails. Oh also did we ever remove ze* from the emergency file?18:32
clarkbif not then they may not automatically pull the new images? /me is checking18:32
corvusi believe we removed them in order to switch to containers18:33
clarkbya I don't see them in emergency anymore so should be good18:33
corvusokay, i'll execute now18:33
clarkbthough we update images hourly? so may still be behind?18:33
corvusi thought we update when we start?18:33
clarkbno I think we docker-compose pull during main.yaml, and the start and stop playbooks only start and stop18:34
clarkbhowever does a docker-compose up imply a pull?18:34
fungiup does, start does not i think?18:35
clarkbah18:35
corvusso i'll hit the button?18:35
clarkbya I think so I'm just looking at docker-compose docs and it does seem that it does pull18:35
clarkb--quiet-pull is an option to it18:36
corvusit's running18:36
corvusthis playbook is not an optimal stop/start sequence18:37
fungiany moment someone's going to ask in here whether zuul is down ;)18:38
corvuswe seem to be stuck stopping a merger?  i don't know what it's doing.18:39
corvusit's gathering facts on mergers18:39
corvusand it said ok for zm01,2,5,6,818:39
corvusapparently we are unable to gather facts on zm03,4,7?18:40
corvusi can't ssh to any of those18:40
corvusmaybe our playbook should start with "gather facts on all zuul hosts" before it even starts.18:40
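What corvus suggests could be a first play in the restart playbook; a sketch under that assumption:
    # fail fast on unreachable hosts before any service is stopped
    - hosts: zuul
      gather_facts: true
      any_errors_fatal: true
      tasks: []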
corvusi'm going to abort this, add those 3 hosts to emergency, and re-run18:41
clarkb3 and 4 report connection reset by peer. 07 seems to hang for me18:41
corvusok we've moved on to stopping ze18:43
corvusoh and we don't wait for executors to stop18:43
corvusi don't know if that's okay18:43
corvusif it works, it's going to make for confusing log files18:43
clarkbya that will probably end up starting a second set of processes18:44
clarkbwhich may be a problem for the built in merger?18:44
corvusi only see one container process running18:44
clarkboh I wonder if we are less graceful with containers18:44
corvusyeah that's what i'm thinking18:44
corvusthat's probably okay for this18:45
clarkblooks like jobs are starting18:49
corvusre-enqueing18:50
fungiskimming cacti, zm03 died almost 24 hours ago, zm04 roughly 20 hours ago, and zm08 is still responding to snmp18:50
clarkbfungi: 07 not 0818:50
clarkbthe new web ui is just different enough that you notice :)18:50
fungioh, yep, misread. zm07 died around the same time as zm0418:51
fungiso all three went within a few hours of each other18:51
fungithat's an odd coincidence18:51
clarkbshared hypervisor?18:51
fungipossibly18:52
clarkbI'm guessing we will want to reboot them, if services come up on them I'm betting they will be the older version which is actually ok for mergers (but we should probably update anyway?)18:53
corvusthe git sha it's reporting that it's running -- i can't find it in the history18:53
corvusis that because we promoted a zuul merge commit?18:53
corvusZuul version: 3.19.1.dev125 6f0e46ce18:54
clarkbcorvus: that would be my hunch (it is the downside to using artifacts built in the gate)18:54
corvusgit describe says 3.19.0-141-g9a9b690dc18:54
clarkbsince timestamp is part of the commit hashing if a merge was used and not a fast forward we'll end up with different hashes18:54
corvusi can't reconcile the numbers either18:54
corvusshouldn't dev125 == 141?18:54
clarkbsort of related I miss that cgit showed refs in the commit view18:56
clarkbcorvus: yes I would expect the 125 to be 14118:56
corvusi'm wondering if pbr counts differently; i'm re-installing in a venv to ask it18:56
fungi`cd /etc/zuul-scheduler/;sudo docker-compose exec scheduler pbr freeze` says "zuul==3.19.1.dev125  # git sha 6f0e46ce"18:58
corvusyes, that's what the scheduler says, i'm trying to match that up to the actual source tree18:58
fungiso that at least suggests zuul is interpreting what pbr claims18:59
clarkbwe should copy the whole tree into the docker image during build time, but then on the actual prod image we just use the resulting wheel right?18:59
corvusyeah, i know that the version at the bottom of the status page is the pbr version18:59
corvusi just don't know what that version *means*18:59
fungioh, right, this is going to be built from a temporary merger state, not the branch history18:59
clarkbfungi: ya but we would've expected the commit delta count to be closer18:59
corvusso i'm currently updating my local zuul install so i can run pbr freeze in a source tree that i know is up to date.19:00
corvusit's just really friggin slow19:00
clarkbalso its meeting time, not sure if we want to take this over into the meeting and do lunch/breakfast debug meeting or go with our regularly scheduled content19:01
corvuszuul==3.19.1.dev137  # git sha 9a9b690dc19:01
corvusi have no idea what we're running.19:01
fungicurrent master branch tip installed in a venv here also claims "zuul==3.19.1.dev137  # git sha 9a9b690dc"19:02
clarkbmay need to go to the build job in the gate? and work forward from there?19:02
clarkbis it possible we promoted the stable/3.x branch fix and are running that?19:03
clarkbwhich would in theory have a lower delta count from 3.19?19:03
corvusi'm checking that theory now19:04
corvusit seems plausible19:04
fungilooks like the devNNN should be two less than `git log --oneline 3.19.0..HEAD|wc -l`19:04
corvustox has decided to recreate again, so it'll be a minute19:05
fungistable/3.x should in theory claim 3.19.1.dev119:05
clarkbfungi: its ahead of the tag though19:06
clarkb(at least minimally)19:06
fungianyway nowhere near 12519:06
fungihuh, nevermind. zuul==3.19.1.dev3  # git sha 5d911942519:06
corvusfungi: where's that from?19:07
fungipip install of origin/stable/3.x19:08
fungiso i wonder why pbr thinks an install of master is dev137 when it has 141 commits since 3.19.019:08
clarkbhttps://zuul.opendev.org/t/zuul/build/719c6a2ca02b40c88bd5e1e26da8673c/log/job-output.txt#1743 <- that should be for the master change19:08
corvusyes i got that too19:09
clarkbhttps://zuul.opendev.org/t/zuul/build/6b898f8f37f84f589aa796132ff6aa24/log/job-output.txt#1535 <- stable change19:09
clarkbI wonder if docker-compose up doesn't do what we think it does19:09
clarkbhow old is that image?19:09
corvuszuul/zuul-executor   latest              6fbba1285c10        4 days ago          1.43GB19:10
corvusthat seems to be the problem.  sigh.19:10
fungiso up doesn't pull?19:10
clarkbfungi: apparently not, maybe it only pulls if there are missing images19:11
clarkbout of date images may not be sufficient?19:11
clarkbI mean we should try a manual pull and see if it gets anything19:11
clarkbif not then we may have a image promotion issue/19:11
corvuspulling on ze0119:11
* clarkb will context switch to meeting more now that we have a thread to pull on19:12
corvuszuul/zuul-executor   latest              50e6f6d1f5eb        About an hour ago   1.43GB19:12
corvusso yeah, we need to pull everywhere.19:13
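In other words, picking up a promoted image takes an explicit pull; with a local image already present, up -d alone just restarts the existing tag:
    cd /etc/zuul-executor    # likewise for the scheduler, web, and mergers
    docker-compose pull      # fetch the newly promoted image
    docker-compose down
    docker-compose up -d     # now starts from the fresh image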
clarkbmaybe do the reboots of the mergers now and they can be fixed with everything else?19:14
fungior wait for the periodic pull?19:14
fungiand yeah, i'm working on the mergers now19:14
fungijust wanted to check a console for one first19:15
funginot that i expect to find much19:15
corvusfungi: thanks; i'm not going to wait for them -- i'll proceed with pulling and restarting everything else19:15
fungicorvus: sounds good19:15
fungithey're in emergency disable anyway19:15
clarkbthanks!19:16
fungionce they're up i'll stop the merger on them as fast as i can and pull19:16
fungiyeah, nothing useful on console, some bursts of messages about hung kernel tasks but no idea how old those are since the timestamps are seconds since boot19:18
fungino, carriage return does not produce new login prompts though19:19
fungianyway, will proceed with those three reboots (zm03,4,7)19:19
fungi#status log performed hard reboot of zm03,4,7 via api after they became unresponsive earlier today19:21
openstackstatusfungi: finished logging19:21
fungithe containers on those three mergers are downed and pulling now19:24
fungiand done19:24
ianwfungi: last acme.sh update is "[Fri Jul 17 06:30:38 UTC 2020] Skip, Next renewal time is: Sun Jul 19 19:07:05 UTC 2020"19:24
ianwit doesn't appear to have run since19:24
fungiianw: yeah, that's why i wondered if networking issues could be causing ansible trouble running it19:24
ianwFriday 17 July 2020  06:31:22 +0000 (0:00:01.038)       0:03:01.934 ***********19:27
fungiokay, i've done docker-compose up -d on the errant mergers now19:27
ianwthat's from letsencrypt.yaml -- that seems to be the last time it ran19:27
fungiso they should be back in the fold19:27
ianwso it might be more a bridge problem, and linaro is the canary19:27
fungicorvus: do you want me to leave those in the emergency disable list for now, or go ahead and take them back out?19:27
fungidon't want to further complicate the restart19:28
corvusfungi: i'll handle it19:28
corvusi'll take them out of emergency and run a pull on them19:28
fungithanks19:28
fungioh, i already ran a pull on them19:28
corvusthen it'll noop :)19:29
fungidowned them with docker compose, did pull, then up -d19:29
fungicool19:29
corvusall done19:29
fungithanks!19:29
corvusi'll now do the full restart again19:29
corvusZuul version: 3.19.1.dev137 aaff0a0219:36
corvusthat looks to be expected for master19:36
fungiagreed, knowing that the id won't match, the count looks correct19:36
fungithough i still don't quite get why it's 137 and not 141, i don't feel like digging in pbr internals right now to find out19:37
corvusZuul version: 3.19.1.dev13719:37
corvusthat's from the executor19:37
corvusso even if we did get the streams crossed, we ended up with the master version both places it matters19:38
corvusre-enqueing, and i gave the release team the all-clear19:38
fungithanks! i'll keep an eye out for github events in the log and see if things have gone back to normal there19:41
corvus#status restarted all of zuul at 9a9b690dc22c6e6fec43bf22dbbffa67b6d92c0a to fix github events and shell task vulnerability19:47
openstackstatuscorvus: unknown command19:47
corvus#status log restarted all of zuul at 9a9b690dc22c6e6fec43bf22dbbffa67b6d92c0a to fix github events and shell task vulnerability19:47
openstackstatuscorvus: finished logging19:47
fungiso i lied, i did want to go digging in pbr internals apparently (or at least i couldn't resist the urge)19:50
fungipbr is basically iterating over the output from `git log --decorate --oneline` looking for the earliest tag it encounters20:13
fungiin the case of the current zuul master branch tip the commit tagged 3.19.0 appears on line 138 of the log, so the dev count is 137 commits since that tag20:14
fungihowever, if you `git log --oneline 3.19.0..HEAD` you get 141 commits20:15
fungiso git is including 4 additional commits since that tag which don't appear above it in the log output20:17
fungiif you ask git to give you the log for 3.19.0..HEAD it includes 170542089, 1d2fd6ff4, 25ccee91a and 99c6db089 because they're not in the history of 3.19.0 and are in the history of master, but because git log sorts them below (earlier chronologically compared to) the tagged commit, pbr isn't counting them toward its dev count20:24
fungithis seems like a fundamental flaw in how pbr counts commits "since" a tag20:25
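fungi's finding can be reproduced in a zuul checkout at that commit:
    # commits reachable from HEAD but not from the tag: 141
    git log --oneline 3.19.0..HEAD | wc -l
    # line on which the tag first appears in the linearized log: 138,
    # so pbr counts 137 commits "since" the tag and misses the 4
    # merge-graph commits that sort below it
    git log --oneline --decorate | grep -n 'tag: 3.19.0'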
ianw... so back to the letsencrypt thing ... do we not run it daily.  i thought we did, but maybe not20:40
ianwnah, it's in periodic, so why hasn't it run ...20:41
clarkbmaybe ssh failed due to ipv4 problems20:43
clarkbwe converted our inventory to ipv4 not ipv6 addrs20:43
clarkbdue to rax ipv6 routing problems20:43
ianwhttps://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-letsencrypt ... it's getting skipped the last few days20:46
ianwdependencies are : infra-prod-install-ansible (soft)20:47
ianwbut that doesn't seem to be failing...20:48
ianw(be cool if that dependency link were clickable :)20:48
ianw infra-prod-install-ansible: TIMED_OUT20:51
ianw2020-07-21 19:53:15.312219 |20:54
ianw2020-07-21 19:53:15.312511 | TASK [Make sure a manaul maint isn't going on]20:54
ianw2020-07-21 20:22:57.172003 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/opendev/system-config/playbooks/zuul/run-production-playbook.yaml@master]20:54
ianw... we've stopped it?20:54
ianw2020-07-17T21:02:56+00:0020:55
ianwze rollout corvus20:55
ianwso ... that explains that at least20:55
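The guard that timed out waits while a DISABLE-ANSIBLE flag file exists on bridge; a sketch of its shape (module choice and path are assumptions about system-config's run-production-playbook.yaml):
    - name: Make sure a manual maint isn't going on
      wait_for:
        path: /home/zuul/DISABLE-ANSIBLE
        state: absent
        timeout: 1800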
ianwcorvus: ^ are we ok to remove this now?20:55
*** roman_g has joined #opendev20:58
clarkboh20:58
*** roman_g has quit IRC20:58
clarkbianw I suspect so as that was for the zuul zk tls rollout which is complete now20:59
*** Dmitrii-Sh has quit IRC21:09
*** Dmitrii-Sh has joined #opendev21:10
corvusyes21:20
clarkbianw: what makes https://mirror.regionone.linaro-us.opendev.org/wheel/ have more content than https://mirror.regionone.limestone.opendev.org/wheel/? with our emulated cross builds I think we should make all the arches available everywhere, but it's not clear to me how we distinguish that21:20
corvussorry about that.  good thing we have a comment :)21:20
clarkbwe don't have any special rewrite rules for aarch64 that I see and the symlink to the web dir is for the parent wheels/ dir21:34
clarkbso how do we end up seeing different things there21:34
clarkbproject rename announcement sent21:39
ianwclarkb: they should be in sync ...21:43
ianwclarkb: do you see afs errors on limestone?21:43
clarkbhrm I had to reboot my laptop and haven't reloaded my key. Let me see21:44
ianwls: cannot access 'centos-8-x86_64': No such device21:44
ianwthe answer is yes :/21:44
clarkbI want to say others showed similar too but I haven't done a complete audit21:45
ianw[Thu Jun 25 06:41:09 2020] afs: Lost contact with file server 23.253.73.143 in cell openstack.org (code -1) (all multi-homed ip addresses down for the server)21:45
ianw[Thu Jun 25 06:41:47 2020] afs: file server 23.253.73.143 in cell openstack.org is back up (code 0) (multi-homed address; other same-host interfaces may still be down)21:45
ianwlast messages on limestone21:45
clarkbso maybe its a reboot and then happy situation? or reload afs kernel module?21:45
ianwi've never managed to get it to reload; i think a reboot is the easiest21:46
ianwinfra-root: just to confirm, i'm going to remove the DISABLE-ANSIBLE file?21:46
clarkbianw: ++ re disable file21:46
ianwok done, that should get certs renewed on linaro21:47
ianwat next periodic run21:47
corvusianw: ++ remove disable (ftr)21:52
clarkbgra1.ovh, bhs1.ovh, ord.rax, sjc1.vexxhost, and regionone.limestone all exhibit the lack of dirs problem under wheels/ via webserver22:00
clarkbthe others (rax, inap, and vexxhost regions as well as the linaro mirror) seem fine22:01
ianwthat's a good mix :/22:01
clarkbconsidering we already disrupted jobs with the earlier zuul restart maybe give it until tomorrow before I reboot things22:01
ianwthere's something maybe to try ... umm let me look at notes22:01
ianwfs checkvolumes22:02
clarkboh right we've done that before to address mismatched cache info22:02
clarkband that runs on the mirror side right?22:02
ianwhttps://mirror.regionone.limestone.opendev.org/wheel/ ... yeah that's done it22:02
clarkbcool I'm loading up my key now and can get the others22:03
clarkbjust root fs checkvolumes on the mirror?22:03
ianwumm i kinit /aklogged first22:03
clarkbk22:04
clarkbianw: but you did that on mirror not the fs server?22:04
ianwyes on the mirror node22:06
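The recovery used here, run on each affected mirror (the principal name is a placeholder):
    kinit admin@OPENSTACK.ORG && aklog       # get AFS tokens first
    fs checkvolumes                          # drop the client's stale volume mappings
    ls /afs/openstack.org/mirror/wheel/      # confirm the directories are back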
clarkbsjc1.vexxhost is now happy22:07
clarkbthanks! I'll finish up the rest of the ones I found22:07
clarkband now we should be able to use region specific mirrors for aarch64 cross arch builds22:10
clarkbnot that I'm ready to do that given the issues with missing cryptography package but afterwards maybe22:11
ianwright, wheel builds ... let me cycle back to that22:18
ianwhttp://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-20.log.html#t2020-07-20T01:55:35 was the last discussion22:20
ianwit looks like the release job ran https://zuul.openstack.org/build/679fdc41cf9846579f9f2bc7100af1f222:21
clarkbianw: when that was happening I noticed difficulty reaching the mirror from home. I don't have ipv6 so attributed it to the same issue. More recently I haven't noticed that22:21
clarkbbut I may be getting lucky with network connectivity too22:21
ianwreleases look good on https://grafana01.opendev.org/d/LjHO9YGGk/afs?orgId=122:23
ianwclarkb: so basically if your wheel was going to be built, i would assume it would be done now ... cryptography right?22:23
clarkbyes cryptography, but maybe they released the new wheel after the job to update it ran22:24
clarkbpublish-wheel-cache-debian-buster-arm64 /me looks for those logs22:24
ianwhttps://zuul.openstack.org/build/b10b1629af454b7f9e45c6178ef38a1b22:25
ianwthat's the latest run22:25
ianwcryptography\=\=\=2.9.222:26
clarkbhttps://zuul.openstack.org/build/b10b1629af454b7f9e45c6178ef38a1b/log/python3/wheel-build.sh.log ya that doesn't show 3.0.0 so I expect tomorrow's run in about 8 hours will get it22:26
clarkboh wait22:28
clarkbdoes this use constraints?22:28
clarkbor just requirements?22:28
clarkblooks like it uses constraints so that may be the issue22:28
clarkbthis is going to be the major issue with trying to use openstack's wheel cache22:29
ianwyeah it runs over upper-constraints.txt22:31
clarkbcorvus: ^ the intermediate cache layer may be worth looking into further now22:32
clarkbI know you're distracted with the zuul release, but wanted to call that out22:32
ianwwe could do another run for the latest stuff -- but it already ran @ 1 hr 58 mins 49 secs and that's just doing the latest 2 branches22:32
clarkbwell openstack constraints hasn't updated yet22:33
clarkbbut zuul/nodepool don't use constraints so get the latest version22:33
corvusclarkb: i think i'm going to have to catch up with you on this tomorrow22:33
clarkbwe could run a zuul specific job to add wheels22:33
clarkbcorvus: no worries, my day is near the end anyway. These early mornings are hard22:33
ianwclarkb: we'd probably need to run it on a separate host22:43
clarkbianw: ya as a separate job and maybe even order them so that we don't conflict on writes?22:43
clarkbsome wheels may not build deterministically and could end up with different shas or whatever?22:44
ianwyeah basically no wheels build deterministically22:44
clarkbanother option is to use openstack constraints file for arm64 builds22:45
clarkbas sort of a hack to make jobs go faster22:45
ianwone of the issues in https://review.opendev.org/#/c/703916/1/specs/wheel-modernisation.rst22:46
ianwwe might be able to make the job multi-node ... it already runs under parallel22:47
ianwthat might almost give us a bunch of "free" builds within the same overall time period, because i think there's some packages that take a very long time to build22:48
clarkbya they are slow under buildx but I imagine not really quick on the actual hardware either22:48
ianwso if one host was doing them, the other host might be zooming through the smaller packages22:48
clarkbfwiw there is also a bug in pypa's manylinux repo about doing arm stuff and tl;dr is it's hard for them to make it more generic due to all the flavors22:48
clarkbone approach I considered was working with them to just make manylinux builds easier since all these deps are manylinux on x86 anyway22:49
fungii have an open change to attempt to add some determinism, but as ianw rightly pointed out, that depends on makefiles consistently treating cflags overrides, which is far from the case (some replace cflags, some append, some make up their own alternate xcflags and so on)22:49
clarkbthat said there is a manylinux for aarch64 and we could probably talk to cryptography about running those jobs maybe, either via qemu or a cloud provider with arm hardware22:49
ianwfungi: yeah, also add to that list there's no standard way to "-j"22:49
ianwhttps://github.com/pyca/cryptography/issues/529222:53
ianwit might be worth getting some numbers on how fast linaro runs the test case22:53
*** tkajinam has joined #opendev22:55
*** DSpider has quit IRC23:03
*** Dmitrii-Sh has quit IRC23:05
*** Dmitrii-Sh has joined #opendev23:06
*** mlavalle has quit IRC23:08
clarkbthe point about apple hardware is a fun one23:10
ianwi'm running tox now, it's collected ~10,000 tests23:13
ianwit's not running in parallel though afaics23:13
ianwcoverage run --parallel-mode ... it says it is, but it's not using all the cpus23:15
openstackgerritPierre-Louis Bonicoli proposed zuul/zuul-jobs master: Use ansible_distribution* facts instead of ansible_lsb  https://review.opendev.org/74231023:15
ianwmight actually need --concurrency=multiprocessing23:16
ianw782.05user 18.82system 13:21.25elapsed 99%CPU (0avgtext+0avgdata 2212248maxresident)k23:21
ianw8inputs+1171456outputs (0major+1625364minor)pagefaults 0swaps23:21
ianwi guess i'll offer to hook zuul up for them; i think all that would need to happen is allow the bot23:27
ianw%Cpu(s): 93.9 us ... using xdist, that's more like it23:34
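For reference, an xdist run that uses all the cpus looks something like this (the exact invocation ianw used is an assumption):
    pip install pytest-xdist
    pytest -n auto tests/    # one worker process per cpu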
*** iurygregory has quit IRC23:42
ianw 574.65s (0:09:34)23:42
ianwnot as much as i'd thought23:42
*** tosky has quit IRC23:47
openstackgerritPierre-Louis Bonicoli proposed zuul/zuul-jobs master: Avoid to use 'length' filter with null value  https://review.opendev.org/74231623:52
