pabelanger | maybe it isn't fixed | 00:00 |
pabelanger | ianw: mind adding http://logs.openstack.org/16/502316/3/check/gate-openstackci-beaker-ubuntu-trusty/241fbcf/console.html#_2017-10-11_23_55_43_441780 to your list of things to fix? | 00:00 |
pabelanger | that is blocking system-config patches from landing | 00:00 |
pabelanger | I'll look at mirror-update.o.o again | 00:01 |
*** vhosakot has joined #openstack-infra | 00:01 | |
ianw | hmm, ok | 00:02 |
jeblair | pabelanger: let me know what you see -- i don't understand what i'm seeing in /var/log/reprepro/ubuntu-mirror.log | 00:02 |
pabelanger | jeblair: I deleted the lockfile and I'm manually running reprepro update on ubuntu mirror | 00:03 |
*** yamahata has quit IRC | 00:03 | |
pabelanger | I _think_ we need to increase our timeout from 30 mins to something longer | 00:04 |
jeblair | pabelanger: ok, that explains the abbreviated output | 00:04 |
pabelanger | which then kills reprepro and leaves lockfile | 00:04 |
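A minimal sketch of the interaction being described, assuming a wrapper shaped roughly like the mirror-update cron job (the 30-minute limit, the kill grace period and the db/lockfile path are assumptions, not the real script): when timeout kills reprepro mid-run, reprepro's own lock is left behind and the next run refuses to start until it is deleted by hand, as above.

```bash
# Hypothetical wrapper shape: after 30 minutes timeout sends SIGTERM
# (then SIGKILL after the -k grace period), so reprepro never gets to
# remove the lock it keeps in its database directory.
timeout -k 1m 30m reprepro -VVV --confdir /etc/reprepro/ubuntu update

# Manual recovery after a killed run, matching what is described above:
# remove the stale lock and rerun the update by hand.
rm /afs/.openstack.org/mirror/ubuntu/db/lockfile   # assumed lock location
reprepro -VVV --confdir /etc/reprepro/ubuntu update
```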
*** dingyichen has joined #openstack-infra | 00:04 | |
*** srobert_ has joined #openstack-infra | 00:04 | |
jeblair | pabelanger: we still only release the volume if it's successful, right? (so i'm curious how we're getting out of sync) | 00:04 |
pabelanger | but, a few hours ago, I did run reprepro check and checkpool fast, and things looked correct | 00:04 |
pabelanger | jeblair: right, should only vos release when check / checkpool pass | 00:05 |
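A rough sketch of the guard being described (the volume name and exact flags are assumptions; the real mirror script may differ): the read-only AFS replicas that jobs consume are only published after reprepro's own consistency checks pass, which is why a half-finished update should not normally leak out.

```bash
#!/bin/bash
set -e   # any failing check below aborts before the release

# reprepro's consistency checks against the read-write volume
reprepro --confdir /etc/reprepro/ubuntu check
reprepro --confdir /etc/reprepro/ubuntu checkpool fast

# Only reached when both checks succeed: publish the read-write volume
# to the read-only replicas.  "mirror.ubuntu" is an assumed volume name.
vos release mirror.ubuntu -verbose
```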
pabelanger | processing updates for 'xenial-security|main|amd64' | 00:05 |
pabelanger | currently | 00:05 |
*** gmann_afk is now known as gmann | 00:06 | |
*** ijw has quit IRC | 00:07 | |
*** ijw has joined #openstack-infra | 00:08 | |
*** sree has joined #openstack-infra | 00:08 | |
*** srobert has quit IRC | 00:08 | |
*** Swami has quit IRC | 00:08 | |
ianw | pabelanger: the fact this is in keyring & cryptography ... possibly related to missing packages? | 00:08 |
ianw | it does not trivially reproduce on a trusty node in a virtualenv | 00:09 |
*** vhosakot has quit IRC | 00:09 | |
pabelanger | ianw: ya, I think I'm going to see about an autohold, once I figure out reprepro issue | 00:10 |
ianw | let me add that and kick one off | 00:10 |
*** sree has quit IRC | 00:12 | |
*** yamahata has joined #openstack-infra | 00:12 | |
*** sbezverk has joined #openstack-infra | 00:16 | |
*** edmondsw has joined #openstack-infra | 00:16 | |
ianw | pabelanger: http://paste.openstack.org/show/623394/ | 00:17 |
ianw | error in cryptography setup command: Invalid environment marker: python_version < '3' | 00:17 |
*** Goneri has quit IRC | 00:17 | |
clarkb | oh swift was running into that too | 00:17 |
notmyname | yup | 00:18 |
notmyname | had to update setuptools to a newer-than-distro version | 00:18 |
ianw | hmm, the root cause seems to be pip 7 ish | 00:18 |
ianw | why is the latest pip not on trusty | 00:18 |
ianw | to the build logs! | 00:19 |
ianw | http://logs.openstack.org/16/502316/3/check/gate-openstackci-beaker-ubuntu-trusty/241fbcf/console.html#_2017-10-11_23_51_08_431588 | 00:20 |
ianw | sudo pip install 'pip<8' 'virtualenv<14' | 00:20 |
ianw | why would you do that | 00:20 |
*** edmondsw has quit IRC | 00:21 | |
ianw | because http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/macros.yaml#n569 ... hmmm | 00:21 |
ianw | because of this https://review.openstack.org/#/c/270995/ ~2 year old patch | 00:22 |
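For reference, the error being chased here comes from build tooling that is too old to understand environment markers such as python_version < '3' in cryptography's metadata; a hedged sketch of how one might confirm and clear it on a held Trusty node (illustrative commands, not the fix that was ultimately merged):

```bash
# Inside the job's virtualenv on the held node: see what is actually in use.
pip --version
python -c 'import setuptools; print(setuptools.__version__)'

# The pinned pip<8 (and the setuptools it drags in) predates environment
# markers like  python_version < '3' , hence "Invalid environment marker".
# Illustrative remedy: upgrade the build tooling before installing.
pip install -U pip setuptools
pip install cryptography
```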
jeblair | does that mean that the ubuntu mirror is actually okay? | 00:22 |
*** gouthamr has joined #openstack-infra | 00:24 | |
pabelanger | no, I think there is an issue with the reprepro database for xenial-security | 00:26 |
*** vhosakot has joined #openstack-infra | 00:27 | |
jeblair | pabelanger: why's that? | 00:27 |
jeblair | tell me what you're looking at and what you see | 00:27 |
pabelanger | sure, 1 sec | 00:28 |
pabelanger | let me get pastebin | 00:28 |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs" https://review.openstack.org/511360 | 00:29 |
ianw | jeblair: remember what that's all about ^ ? i'm guessing no | 00:29 |
*** jkilpatr has quit IRC | 00:29 | |
jeblair | ianw: nope, sorry. | 00:30 |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs" https://review.openstack.org/511360 | 00:30 |
pabelanger | jeblair: http://paste.openstack.org/show/623396/ | 00:30 |
pabelanger | processing updates for 'u|xenial-security|main|amd64' | 00:30 |
pabelanger | doesn't look correct | 00:30 |
*** caphrim007 has joined #openstack-infra | 00:31 | |
*** andreas_s has joined #openstack-infra | 00:31 | |
pabelanger | and reprepro update command appears to have hung on what I posted | 00:31 |
pabelanger | I did use strace to look at pid, but I didn't see much going on | 00:31 |
ianw | pabelanger: dmesg ... i had an issue with zero sized files on AFS the other day doing the ceph stuff | 00:32 |
ianw | ? | 00:32 |
ianw | i had to clear everything out | 00:32 |
jeblair | i'm stracing it now, and it does not seem to be working -- i haven't even gotten the current system call returned. | 00:32 |
pabelanger | ianw: clear out from where? | 00:32 |
ianw | sorry, clear out the mirror and restart | 00:33 |
ianw | but this was just for the ceph luminous, so not big | 00:33 |
pabelanger | ah | 00:33 |
pabelanger | yah, I hope we don't need to do the same | 00:33 |
mnaser | xenial-security is probably not that big, unless you have to wipe everything :( | 00:34 |
pabelanger | right, I _think_ i've already cleared the files on xenial-security, but update still not happy | 00:34 |
openstackgerrit | Ian Wienand proposed openstack-infra/openstack-zuul-jobs master: Remove pin pip from beaker legacy jobs https://review.openstack.org/511361 | 00:34 |
pabelanger | which makes me think that when timeout killed reprepro before, something may have gotten corrupted in the database | 00:35 |
pabelanger | which, is possible, according to the warning it prints | 00:35 |
jeblair | pabelanger: so let's assume there's something unhappy about the afs client on mirror-update that has it stuck. how is it possible that we released an inconsistent volume? | 00:35 |
*** andreas_s has quit IRC | 00:35 | |
ianw | jeblair: of course, if we merge that unpin ... then do we break things even worse? :/ | 00:35 |
*** Apoorva_ has joined #openstack-infra | 00:36 | |
jeblair | ianw: zuulv3 would tell us pre-merge :/ | 00:36 |
pabelanger | jeblair: I don't think we did. I mean, I manually released a few hours ago, because I thought things were okay. But I believe the actual issue is that we built newer images with new packages, but our indexes were still old | 00:36 |
jeblair | pabelanger: ooh... so the underlying fix we need is to use our own mirrors when building images? | 00:37 |
pabelanger | and when we run apt-get install on bindep-fallback, it fails because we still point to old packages, while it expects newer | 00:37 |
pabelanger | jeblair: Yah, that is possible | 00:37 |
pabelanger | it would help prevent something like this I think | 00:37 |
jeblair | pabelanger: what do you mean by 'point to old packages'? | 00:37 |
pabelanger | I am speculating here, but give me a second | 00:38 |
*** srobert_ has quit IRC | 00:38 | |
pabelanger | jeblair: http://logs.openstack.org/c7/c722a78bea5d1a75cb204cc783b2480131bd5bc4/post/static-election-publish/d11a220/console.html#_2017-10-11_01_54_31_755588 | 00:38 |
jeblair | pabelanger: do you mean the index on the image is out of date? cause i thought the first thing we do after configure-mirrors is to apt-get update. the only thing i could see that could cause an inconsistency is to actually have a package installed on the image that causes a conflict | 00:38 |
pabelanger | that error to me means we already have libcurl4-gnutls-dev installed, but it is a newer version than what our index is saying it should be | 00:39 |
*** Apoorva has quit IRC | 00:39 | |
pabelanger | I think the indexes on the image are newer than the AFS mirrors, but because we apt-get clean in configure_mirror today, the image boots properly | 00:40 |
pabelanger | then, once we hit the old indexes, apt-get gets confused | 00:40 |
jeblair | apt-get clears out the index; clean should just clear out cached packages, i think. | 00:40 |
*** Apoorva_ has quit IRC | 00:40 | |
jeblair | i looked on a ready xenial node and see libcurl3-gnutls:amd64 7.47.0-1ubuntu2.2 installed, no libcurl4 | 00:41 |
pabelanger | the odd thing is, I _think_ this might work on zuulv3 jobs. I say think because I thought I saw a job pass properly an hour ago | 00:41 |
* EmilienM online for the next hour if needed | 00:41 | |
pabelanger | okay, it is possible I am wrong. So, please look and see if you find anything | 00:42 |
EmilienM | pabelanger: see #tripleo when you can | 00:43 |
jeblair | pabelanger: is that the current repo error, or the previous one? | 00:45 |
*** thorst has joined #openstack-infra | 00:46 | |
*** thorst has quit IRC | 00:46 | |
pabelanger | jeblair: I believe that has been the issue all along, clarkb right? | 00:46 |
jeblair | i just ran those commands on the xenial node i logged into and they worked | 00:47 |
pabelanger | ya | 00:47 |
pabelanger | http://logs.openstack.org/55/511255/1/check/legacy-devstack-gate-tox-py3-run-tests/8134612/job-output.txt.gz#_2017-10-11_15_52_26_842050 | 00:47 |
jeblair | pabelanger: those are both old runs though, are we sure that's still a problem? | 00:48 |
pabelanger | jeblair: which cloud? | 00:48 |
jeblair | pabelanger: rax-ord | 00:48 |
pabelanger | jeblair: no, I'm not 100% it is an issue still. | 00:49 |
pabelanger | I thought I fixed it a few hours ago | 00:49 |
pabelanger | but, when I started running reprepro update manually, and it stopped (hung), I assumed it still wasn't fixed | 00:49 |
pabelanger | so, possible this is a 2nd (new) issue | 00:50 |
jeblair | okay here's the recent error: http://logs.openstack.org/56/511356/1/check/gate-election-python35/e52bb05/console.html | 00:50 |
*** rook is now known as rook-afk | 00:50 | |
jeblair | https://etherpad.openstack.org/p/fkQc9nXfgN | 00:51 |
pabelanger | that is also ovh | 00:51 |
mnaser | jeblair i think what happens is 2.2 is installed, but then if you try and do apt-get install libcurl3-gnutls-devel, it will try to pull 2.3 | 00:51 |
*** s-shiono has joined #openstack-infra | 00:51 | |
mnaser | because it tries to install libcurl3-gnutls-devel-<whatever>-2.3 | 00:51 |
mnaser | and that wants libcurl3-gnutls-<foo>-2.3 which does not exist in the mirrors | 00:52 |
*** LindaWang has joined #openstack-infra | 00:52 | |
mnaser | (or didn't this morning at least) | 00:52 |
clarkb | the gnutls one is the one we've had all along | 00:52 |
jeblair | mnaser: right, though i just ran those commands on a 30m old rax-ord node and it only wanted to install 2.2 | 00:52 |
mnaser | apt-get update before doing that jeblair ? | 00:52 |
jeblair | so the question i now have is: under what circumstances does it want to install 2.3 | 00:53 |
jeblair | mnaser: yes | 00:53 |
jeblair | pabelanger seems to be suggesting we should look at the cloud region as a nexus | 00:53 |
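A sketch of the kind of check being run on these nodes to see when apt starts wanting the 2.3 packages (package names come from the failing logs above; no particular mirror URL is implied):

```bash
# Refresh indexes from whatever mirror configure-mirrors set up.
sudo apt-get update

# Compare installed versions against the mirror's candidates.  If the
# image was built against newer upstream indexes, the installed
# libcurl3-gnutls can be 7.47.0-1ubuntu2.3 while the mirror still only
# offers ...2.2, and the -dev package needs an exactly matching library,
# so the install below fails.
apt-cache policy libcurl3-gnutls libcurl4-gnutls-dev
apt-cache madison libcurl3-gnutls

# The step that fails in the jobs, for reference:
sudo apt-get install -y libcurl4-gnutls-dev
```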
tonyb | pabelanger: I'm still getting the gnutls issue :( | 00:53 |
mnaser | you're bringing up a good point here, the volume would have not been released | 00:53 |
mnaser | tonyb do you have logs of a failed job? | 00:53 |
pabelanger | yah, we are debugging now | 00:54 |
jeblair | mnaser, tonyb: i started an etherpad and put tonyb's links there: https://etherpad.openstack.org/p/fkQc9nXfgN | 00:54 |
tonyb | mnaser: reykjavik | 00:54 |
jeblair | both ovh-gra1 | 00:54 |
*** lewo` has quit IRC | 00:54 | |
tonyb | mnaser: http://logs.openstack.org/56/511356/1/check/gate-election-python27-ubuntu-xenial/695ae09/console.html#_2017-10-12_00_09_31_881052 stupid clipboard | 00:54 |
mnaser | tonyb np :) | 00:55 |
mnaser | jeblair ill try and search elasticsearch and see if the theory of ovh-gra1 only holds | 00:55 |
tonyb | jeblair: Thanks. | 00:55 |
jeblair | mnaser: cool, i'll try to see what i can find out about those nodes and image build times | 00:55 |
*** sbezverk has quit IRC | 00:55 | |
jeblair | we put the image build information on the node. we don't output it in all jobs. :( | 00:56 |
*** dhinesh has quit IRC | 00:56 | |
mnaser | jeblair do you build the image multiple times for different formats in nodepool | 00:57 |
mnaser | or build once then convert | 00:57 |
pabelanger | Oh | 00:57 |
clarkb | mnaser: once and convert | 00:57 |
jeblair | mnaser: once then convert | 00:57 |
pabelanger | jeblair: I think rax are using the old images | 00:57 |
jeblair | pabelanger: that's not surprising | 00:58 |
pabelanger | jeblair: I can see in nodepool we are still trying to upload xenial images | 00:58 |
jeblair | pabelanger: so the cloud connection is "broken everywhere but rax" | 00:58 |
pabelanger | if so, they were the last good images before the breakage | 00:58 |
jeblair | i'm going to assume that's the case for the moment and stop my investigations | 00:58 |
dmsimard | Wow, achievement unlocked. ARA mentioned in top comment of a frontpage HackerNews thread (without getting thrashed) https://news.ycombinator.com/item?id=15450594 | 00:58 |
pabelanger | kk | 00:59 |
*** priteau has joined #openstack-infra | 00:59 | |
jeblair | dmsimard: congrats! | 00:59 |
*** cuongnv has joined #openstack-infra | 00:59 | |
jeblair | i'm going to see if i can get myself on a non-rax node | 00:59 |
mnaser | elasticsearch isn't cooperating, it shows bars for events but the messages are not showing things :< | 00:59 |
mnaser | or at least it's taking a loooong time to load | 01:00 |
*** namnh has joined #openstack-infra | 01:00 | |
clarkb | mnaser: ya I noticed e-r is out to lunch too will have to investigate in the morning | 01:00 |
jeblair | ii curl 7.47.0-1ubuntu2.3 amd64 command line tool for transferring data with URL syntax | 01:00 |
mnaser | there we have it | 01:00 |
dmsimard | Sorry for distracting from the issues, I'll go back to my cave | 01:00 |
* tonyb is going to take a tangent and try to create a minimal bindep.txt for the election repo | 01:01 | |
jeblair | okay, so it does look like the problem is that images are newer than mirrors | 01:01 |
*** jcoufal has joined #openstack-infra | 01:01 | |
jeblair | tonyb: time well spent regardless! | 01:01 |
pabelanger | so, when I checked a few hours ago, I must have been looking at a rax node | 01:01 |
tonyb | jeblair: Yeah I've been putting it off as 'hard' | 01:01 |
jeblair | so fixes are: short-term: get a mirror update finished and released. long-term: build images with our mirrors | 01:01 |
mnaser | jeblair: compounded alongside the mirrors failing to update, boo | 01:01 |
fungi | tonyb: not hard at all. adding a bindep.txt is self-testing | 01:02 |
tonyb | fungi: hehe okay | 01:02 |
*** aeng has quit IRC | 01:02 | |
pabelanger | Yah, and reprepro update still hasn't moved past the pastebin from above, so I am guessing we've corrupted something with the timeout command | 01:03 |
jeblair | pabelanger: if you think afs is being weird, how about we reboot mirror-update? | 01:03 |
fungi | tonyb: just take the http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/data/bindep-fallback.txt and whittle it down to the things you think your jobs for that repo will need from a distro package perspective. odds are, on the election repo, the answer is "very little" | 01:03 |
pabelanger | jeblair: ya, happy to try that | 01:03 |
jeblair | pabelanger: the main process is still stuck and doing nothing | 01:03 |
jeblair | i'm less inclined to think it's corruption and more inclined to think it's afs | 01:03 |
*** priteau has quit IRC | 01:04 | |
pabelanger | sure, lets reboot | 01:04 |
*** jcoufal_ has joined #openstack-infra | 01:04 | |
tonyb | fungi: Yeah. I'm going to try with an empty one ;P | 01:04 |
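For context, a repo-local bindep.txt of the sort being discussed is just a list of distro packages with optional platform markers; the example below is purely illustrative (package names are assumptions, not what the election repo merged):

```bash
# Illustrative only: a minimal bindep.txt replacing the huge fallback
# list; the bracketed markers select the package-manager family.
cat > bindep.txt <<'EOF'
libxml2-dev [platform:dpkg]
libxslt1-dev [platform:dpkg]
libxml2-devel [platform:rpm]
libxslt-devel [platform:rpm]
EOF

# Locally (or in a job) this lists whichever of those are still missing:
bindep -b -f bindep.txt
```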
jeblair | the only other thing i see running is npm-mirror-update running since apr 14 | 01:04 |
jeblair | i think i should just issue 'reboot' now. any objections? | 01:05 |
pabelanger | ++ | 01:05 |
jeblair | and there goes bandersnatch. | 01:05 |
jeblair | i'll wait till it's done, then reboot immediately. | 01:05 |
*** liusheng has joined #openstack-infra | 01:05 | |
pabelanger | dmsimard: I agree with comment, tower has a lot of moving parts | 01:05 |
jeblair | i'm not going to cast any stones at CI/CD systems for having lots of moving parts. | 01:06 |
jeblair | rebooting | 01:06 |
SamYaple | oh you just found the issue... | 01:06 |
*** kiennt26 has joined #openstack-infra | 01:06 | |
SamYaple | i was going to pop on to say ovh is a valid mirror | 01:06 |
SamYaple | it just looks like you have newer packages than ovh has already installed | 01:07 |
SamYaple | always the slow poke | 01:07 |
jeblair | SamYaple: yep! | 01:07 |
jeblair | SamYaple: all mirrors are old | 01:07 |
pabelanger | mirror-update.o.o back | 01:07 |
*** jcoufal has quit IRC | 01:07 | |
jeblair | all images are new, except rax. so rax is the only thing working now (because we're unable to upload there atm) | 01:07 |
jeblair | pabelanger: you want to do the rerpreprepepro thing? | 01:07 |
pabelanger | jeblair: yah | 01:08 |
*** sbezverk has joined #openstack-infra | 01:08 | |
SamYaple | well my gates are working too, but thats because i build everything in docker containers | 01:08 |
SamYaple | thats what got me looking down the versions-too-new path | 01:08 |
jeblair | the systemic fix is to build our images with our mirrors so they can't get ahead of each other | 01:08 |
SamYaple | yea | 01:09 |
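One way that systemic fix is commonly implemented, sketched under the assumption that the images come from diskimage-builder's Ubuntu elements (the real nodepool configuration is not shown in this log): point the image build at the same mirror the jobs use, so the indexes baked into the image can never get ahead of it.

```bash
# Assumed values; the per-cloud mirror hostname and release differ.
export DIB_RELEASE=xenial
export DIB_DISTRIBUTION_MIRROR=http://mirror.example.openstack.org/ubuntu

# ubuntu-minimal honours DIB_DISTRIBUTION_MIRROR for the sources.list
# used during the build, so packages installed into the image come from
# the same snapshot the jobs later install from.
disk-image-create -o ubuntu-xenial ubuntu-minimal vm
```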
*** namnh has quit IRC | 01:09 | |
dmsimard | SamYaple: I know I wanted to ask you something earlier but I forget what :/ | 01:09 |
mnaser | question | 01:09 |
mnaser | dont we want to fix the upload-to-rax problem first | 01:10 |
pabelanger | okay, reprepro running now | 01:10 |
jeblair | pabelanger: that is a lot of 'v's :) | 01:10 |
pabelanger | moar v's | 01:10 |
mnaser | or otherwise we'll have a significantly smaller portion of ci that is functioning | 01:10 |
pabelanger | reading '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_xenial-security_main_amd64_Packages' | 01:10 |
pabelanger | last thing in console ATM | 01:10 |
SamYaple | jeblair: another option would be to just run apt-get with the option "-t=xenial" as that will stomp and downgrade things as needed | 01:10 |
*** hemna_ has quit IRC | 01:10 | |
*** yamahata has quit IRC | 01:10 | |
SamYaple | that might cause other problems though, something to keep in mind | 01:11 |
jeblair | mnaser: we generally expect images to be out of date -- we try not to rely on them being current | 01:11 |
SamYaple | it comes in handy in a pinch | 01:11 |
mnaser | jeblair gotcha, and actually i realized that curl will update gracefully if the mirrors are okay now | 01:11 |
SamYaple | dmsimard: was it "SamYaple: how are you so successful and attractive?" | 01:11 |
mnaser | (of course i always realize these things after speaking up) | 01:11 |
pabelanger | file looks valid | 01:11 |
tonyb | https://review.openstack.org/#/c/511365/ \o/ Possibly more than needed and won't work for rpms but I'll merge it anyway | 01:12 |
pabelanger | jeblair: are you seeing anything in strace? | 01:12 |
jeblair | pread(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 90521600) = 4096 | 01:12 |
jeblair | that's the last line | 01:13 |
jeblair | reprepro 2003 root 5u REG 0,25 90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db | 01:13 |
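The inspection pattern used here, collected in one place for reference (PID 2003 and fd 5 come from the paste above; they will differ on any other run):

```bash
PID=2003   # the stuck reprepro process

# Follow the process and any children; no further system calls after a
# pread() full of zero bytes suggests it is spinning in userspace.
sudo strace -f -p "$PID"

# Map the fd in that pread() back to a path, two different ways.
sudo lsof -p "$PID"
ls -l /proc/"$PID"/fd/5

# Userspace spinning with no syscalls shows up as ~100% CPU here.
ps -o pid,pcpu,stat,wchan,cmd -p "$PID"
```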
dmsimard | SamYaple: nope. And it's annoying the hell out of me now :( | 01:13 |
*** liusheng has quit IRC | 01:13 | |
jeblair | i can access that file okay | 01:13 |
jeblair | pabelanger: any chance that's a potentially corrupted file? | 01:13 |
SamYaple | dmsimard: sounds like me yea | 01:14 |
jeblair | pabelanger: reprepro is at 100% cpu | 01:14 |
pabelanger | jeblair: possible, however I believe we can regenerate it with another reprepro command | 01:14 |
pabelanger | let me check man page | 01:14 |
mnaser | btw, not sure if anyone knows this or not but -fF with strace is quite useful | 01:14 |
mnaser | it'll actually hop into subprocesses/threads | 01:14 |
*** cuongnv has quit IRC | 01:14 | |
jeblair | pabelanger: i feel like 100% cpu and no system call activity after reading a bunch of null data from a file looks a lot like "infinite loop because of bad data" | 01:14 |
jeblair | mnaser: i used -f but not -ff | 01:15 |
mnaser | jeblair im old, man says => "This option is now obsolete and it has the same functionality as -f." | 01:15 |
mnaser | old typing habits die hard i guess | 01:15 |
*** liusheng has joined #openstack-infra | 01:15 | |
pabelanger | reprepro collectnewchecksums | 01:15 |
pabelanger | I think that is the command | 01:15 |
pabelanger | jeblair: yah, seems to make sense | 01:15 |
ianw | pabelanger / jeblair : can we do https://review.openstack.org/#/c/511360/ to unblock system-config, and i'll jump on any further issues? | 01:15 |
pabelanger | reprepro _listchecksums should show what current checksums are | 01:16 |
pabelanger | jeblair: I'm going to kill reprepro and try _listchecksums | 01:16 |
jeblair | pabelanger: ++ | 01:17 |
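The two recovery commands being discussed, as a hedged sketch (same confdir as elsewhere in this log; whether they can do anything useful against a zero-filled database file was exactly the open question):

```bash
CONF=/etc/reprepro/ubuntu

# Dump the checksums reprepro currently knows about; if checksums.db is
# readable this prints one line per pool file.
reprepro --confdir "$CONF" _listchecksums | head

# Recalculate and record any checksums missing from checksums.db.
reprepro --confdir "$CONF" collectnewchecksums
```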
jeblair | ianw: +2 | 01:17 |
dmsimard | SamYaple: OH I remember now | 01:18 |
*** baoli has joined #openstack-infra | 01:18 | |
dmsimard | SamYaple: remember how we discussed bindep supporting different dances for sources and things | 01:18 |
pabelanger | appears to be running, will let it finish | 01:18 |
jeblair | ianw, pabelanger: i need to afk. aiui next steps are 1) fix pip stuff 2) fix reprepro and release the mirror 3) approve and enqueue 511260 into zuulv2 gate | 01:18 |
dmsimard | SamYaple: or different profiles and such | 01:19 |
jeblair | i think if we do all of those things, we can zuulv3? | 01:19 |
SamYaple | dmsimard: https://review.openstack.org/#/c/506502/ ? | 01:19 |
pabelanger | jeblair: okay, I'll keep working on reprepro | 01:19 |
SamYaple | (please some one +3 that patch, im begging) | 01:19 |
dmsimard | SamYaple: I was wondering whether that was still relevant with zuul v3, considering roles (and their dependencies) should likely be self contained | 01:19 |
dmsimard | so if you need something in a role, it should likely be installed inside that role | 01:19 |
SamYaple | dmsimard: so actually that patch is to use bindep in docker containers for image building (which we are currently doing in LOCI) | 01:20 |
SamYaple | significantly different use case to the gate | 01:20 |
dmsimard | SamYaple: oh, huh, interesting. | 01:20 |
SamYaple | dmsimard: https://github.com/openstack/loci/blob/master/bindep.txt | 01:20 |
dmsimard | SamYaple: sort of makes sense I guess | 01:21 |
SamYaple | dmsimard: it makes image building very very clean. if i can get the bindep syntax changed from the above patch, then i can do https://review.openstack.org/#/c/506823/3/bindep.txt | 01:21 |
dmsimard | I wouldn't have thought about bindep for installing packages in containers :p | 01:21 |
SamYaple | which is even more better | 01:21 |
SamYaple | well its great because its one stop for all rpm/deb/pacman/emerge | 01:21 |
SamYaple | and with different architectures | 01:22 |
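A sketch of the container-image pattern being described (the "keystone" profile name and the apt-based base image are assumptions, not the actual LOCI build):

```bash
# During the image build: install only what bindep says is missing for
# the selected profile, then drop the apt lists to keep the layer small.
apt-get update
pip install bindep
bindep -b keystone | xargs -r apt-get install -y --no-install-recommends
rm -rf /var/lib/apt/lists/*
```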
dmsimard | not unlike ansible but I guess ansible is more verbose | 01:22 |
ianw | jeblair: ok, i'll try to push all that along | 01:22 |
SamYaple | no duplication in the case where there are same-named packages across multiple distros | 01:22 |
dmsimard | I should look at ansible-container again, I try to look at least once every 3 months | 01:22 |
SamYaple | all in all, weve been very happy with it | 01:22 |
dmsimard | they haven't yet fulfilled my dream | 01:22 |
SamYaple | heh you and me both | 01:22 |
ianw | the logs i assume we're just pruning as fast as we can | 01:23 |
dmsimard | SamYaple: https://github.com/ansible/ansible-container/issues/399#issuecomment-316109193 | 01:23 |
*** mrunge has quit IRC | 01:24 | |
SamYaple | dmsimard: i actually dont mind running systemd as pid 1. docker did/does have the pid reaping problem (there are docker daemon options to fix that now) | 01:24 |
SamYaple | but i dont want to include systemd in my image because it adds like 80mb | 01:24 |
SamYaple | im building keystone in less than a 40mb layer. i dont need to triple that | 01:25 |
dmsimard | SamYaple: oh, it's just a bit awkward to do it but that's not ansible-container's fault | 01:25 |
*** cuongnv has joined #openstack-infra | 01:25 | |
openstackgerrit | Tony Breeds proposed openstack-infra/irc-meetings master: bindep: Supply a bindep.txt file to avoid the 'global' set https://review.openstack.org/511369 | 01:25 |
SamYaple | its a cool idea, i just find it hard to imagine it becoming practical | 01:25 |
*** yamamoto has quit IRC | 01:26 | |
dmsimard | SamYaple: the use case is mostly to take a role that already exists and works with modern distros and use it to build an image with ansible-container | 01:26 |
dmsimard | but there's all sort of things that make this awkward | 01:26 |
SamYaple | yes. and that i agree with as a migration step | 01:26 |
SamYaple | but i dont really like it as a "long-term" solution | 01:27 |
dmsimard | migration ? there's no migration, if you want to install on a bare metal, vm or container image you use the same role with the same params and everything :p | 01:27 |
SamYaple | no i get that, im just not sure i get the benefit at that point is *my* point | 01:27 |
*** baoli has quit IRC | 01:27 | |
fungi | s/migration/cross-platform portability/ ? | 01:27 |
dmsimard | fungi: :) | 01:28 |
dmsimard | If I do a service task in ansible that says start the service, it better be able to start the darn service :) | 01:28 |
*** kiennt26 has quit IRC | 01:29 | |
SamYaple | i do understand the feeling :) | 01:29 |
dmsimard | in the meantime, I'll keep cursing at this elk container thing | 01:31 |
openstackgerrit | Merged openstack-infra/irc-meetings master: bindep: Supply a bindep.txt file to avoid the 'global' set https://review.openstack.org/511369 | 01:32 |
pabelanger | ianw: 260GB is the size of the ubuntu mirror | 01:33 |
pabelanger | would take a bit to re-mirror i think | 01:34 |
*** vhosakot has quit IRC | 01:38 | |
*** fanzhang has joined #openstack-infra | 01:38 | |
*** baoli has joined #openstack-infra | 01:38 | |
*** kiennt26 has joined #openstack-infra | 01:41 | |
*** ijw has quit IRC | 01:41 | |
*** ijw has joined #openstack-infra | 01:41 | |
ianw | :/ | 01:42 |
pabelanger | currently running reprepro export xenial | 01:42 |
pabelanger | in an effort to see if we can regenerate | 01:42 |
*** larainema has joined #openstack-infra | 01:43 | |
*** kiennt26 has quit IRC | 01:43 | |
*** baoli has quit IRC | 01:43 | |
*** kiennt26 has joined #openstack-infra | 01:43 | |
*** ijw has quit IRC | 01:44 | |
*** sdague has quit IRC | 01:44 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add documentation on force-merging a change https://review.openstack.org/511248 | 01:45 |
*** jcoufal_ has quit IRC | 01:45 | |
*** kiennt26 has quit IRC | 01:46 | |
*** kiennt26 has joined #openstack-infra | 01:46 | |
*** kiennt26 has quit IRC | 01:47 | |
*** kaisers has quit IRC | 01:48 | |
*** kaisers has joined #openstack-infra | 01:49 | |
*** psachin has joined #openstack-infra | 01:49 | |
*** kiennt26 has joined #openstack-infra | 01:49 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add documentation on force-merging a change https://review.openstack.org/511248 | 01:50 |
*** edmondsw has joined #openstack-infra | 01:51 | |
*** rosmaita has quit IRC | 01:53 | |
*** nikhil has quit IRC | 01:54 | |
*** hongbin has joined #openstack-infra | 01:55 | |
openstackgerrit | Merged openstack-infra/project-config master: Online inap-mtl01 region https://review.openstack.org/511328 | 01:57 |
*** kukacz has quit IRC | 02:00 | |
*** dhinesh has joined #openstack-infra | 02:00 | |
*** kukacz has joined #openstack-infra | 02:01 | |
openstackgerrit | Merged openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs" https://review.openstack.org/511360 | 02:02 |
*** thorst has joined #openstack-infra | 02:02 | |
ianw | ok, will try system-config in a bit with ^ and see where we're at | 02:03 |
*** thorst has quit IRC | 02:07 | |
*** baoli has joined #openstack-infra | 02:07 | |
*** jascott1 has quit IRC | 02:08 | |
*** jascott1 has joined #openstack-infra | 02:08 | |
*** hichihara has joined #openstack-infra | 02:10 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Create alternate time for Neutron Drivers meeting https://review.openstack.org/511293 | 02:11 |
*** baoli has quit IRC | 02:12 | |
*** jascott1 has quit IRC | 02:12 | |
*** kiennt26 has quit IRC | 02:13 | |
*** kiennt26 has joined #openstack-infra | 02:15 | |
*** thorst has joined #openstack-infra | 02:17 | |
dmsimard | clarkb: I'm done fighting with my local elk instance for tonight, I wanted to test the type things we've talked about.. I'll look at it some more tomorrow. | 02:18 |
*** thorst has quit IRC | 02:19 | |
dmsimard | It might be that I'm testing with stuff that is too up to date compared to what we're running on logstash.o.o. | 02:20 |
clarkb | oh ya we are old for reasons | 02:20 |
clarkb | mostly of the javascript variety | 02:21 |
*** gildub has quit IRC | 02:25 | |
*** baoli has joined #openstack-infra | 02:25 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Update Neutron team meeting chairperson https://review.openstack.org/511303 | 02:28 |
pabelanger | clarkb: still working on getting the ubuntu mirror working again, it is possible we may need to rebuild it from scratch, but doing so might take ~2 days. We have 260GB to deal with | 02:28 |
pabelanger | I don't plan on deleting anything, but something we might need to dig into in the morning | 02:28 |
ianw | can i help? | 02:29 |
clarkb | pabelanger: do we think it is reprepro or afs or both? | 02:29 |
pabelanger | ianw: right now, waiting for reprepro export to finish, then going to try running update again | 02:29 |
pabelanger | ianw: however, I might have to pass off to you shortly, getting late | 02:29 |
pabelanger | clarkb: we rebooted mirror-update to make sure afs was good | 02:29 |
pabelanger | but same issues | 02:30 |
pabelanger | current hope is export regenerates everything we need | 02:30 |
pabelanger | so update cmd works | 02:30 |
pabelanger | its been reading files on disk for a while now | 02:30 |
clarkb | export of reprepro? | 02:30 |
pabelanger | https://mirrorer.alioth.debian.org/reprepro.1.html | 02:31 |
pabelanger | yah | 02:31 |
*** liujiong has joined #openstack-infra | 02:31 | |
pabelanger | reprepro export xenial | 02:31 |
*** gouthamr has quit IRC | 02:31 | |
clarkb | gotcha thats like an in place rebuild | 02:32 |
pabelanger | yah, I hope | 02:33 |
*** coolsvap has joined #openstack-infra | 02:33 | |
pabelanger | I also just found https://github.com/esc/reprepro/blob/master/docs/recovery | 02:34 |
*** srobert has joined #openstack-infra | 02:38 | |
*** markvoelker has quit IRC | 02:39 | |
ianw | the only other thing i can think to do as a prophylactic is maybe attach a volume and get the data on afs01 so if it needs to be imported, it's there? | 02:39 |
ianw | but if it slows things down, it would be even worse | 02:39 |
*** mrunge has joined #openstack-infra | 02:39 | |
*** andreas_s has joined #openstack-infra | 02:40 | |
*** gildub has joined #openstack-infra | 02:42 | |
*** srobert has quit IRC | 02:43 | |
*** gcb has joined #openstack-infra | 02:43 | |
*** dfflanders has joined #openstack-infra | 02:44 | |
pabelanger | Oh interesting | 02:48 |
pabelanger | http://paste.openstack.org/show/623401/ | 02:48 |
pabelanger | I just got that on export | 02:48 |
pabelanger | and see some afs warnings in dmesg | 02:49 |
*** andreas_s has quit IRC | 02:49 | |
*** dfflanders has quit IRC | 02:49 | |
*** yamahata has joined #openstack-infra | 02:51 | |
ianw | oh dear | 02:52 |
*** junbo has quit IRC | 02:52 | |
*** edmondsw has quit IRC | 02:53 | |
ianw | things that handle an error from close() are few and far between too | 02:53 |
*** nicolasbock has quit IRC | 02:53 | |
ianw | handle it properly, anyway | 02:53 |
pabelanger | I'm starting down the recovery doc now | 02:54 |
pabelanger | rereference first | 02:54 |
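The first steps of that recovery sequence, sketched from the commands that appear in this log (whether any of them help against a corrupted checksums.db was still unknown at this point):

```bash
CONF=/etc/reprepro/ubuntu

# Rebuild the references database from the distributions' package lists.
reprepro --confdir "$CONF" rereference

# Re-export the index files (Packages/Sources/Release) for one suite.
reprepro --confdir "$CONF" export xenial

# Then retry a normal update.
reprepro -VVV --confdir "$CONF" update
```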
tonyb | project-config is still frozen correct? | 02:57 |
mnaser | tonyb v2 changes likely wont merge, v3 changes welcome afaik | 02:57 |
tonyb | mnaser: okay I was hoping to do both but I can just make a note and do the v3 change after the switch | 02:58 |
*** andreas_s has joined #openstack-infra | 02:59 | |
mnaser | tonyb you can still propose the v3 change and it will be reviewed and merged | 02:59 |
pabelanger | ianw: ya, something appears to be up with AFS | 02:59 |
*** priteau has joined #openstack-infra | 03:00 | |
ianw | i've been pinging afs01 from mirror-update | 03:00 |
ianw | no dropped packets, a few quite high spikes though (~8ms) | 03:00 |
tonyb | mnaser: okay, I'll think on that for a bit | 03:00 |
ianw | seeing as basically nothing has changed, gotta feel like it's network between the two | 03:00 |
pabelanger | ianw: okay, I have to call it. But xenial-updates and xenial-security both have issues | 03:01 |
pabelanger | xenial and xenial-backports update properly | 03:01 |
pabelanger | so, it is possible we could just try first deleting ubuntu-security from reprepro, then mirror it | 03:01 |
pabelanger | then, if it works, we do the same for -updates | 03:02 |
*** mtreinish has quit IRC | 03:02 | |
pabelanger | ianw: good luck, I'll read up on backscroll in the morning | 03:02 |
ianw | alright, let me think about it before i do anything :) | 03:03 |
*** hichihara has quit IRC | 03:03 | |
*** priteau has quit IRC | 03:05 | |
*** mtreinish has joined #openstack-infra | 03:07 | |
ianw | interesting, we don't make /var/run/reprepro on reboot i guess | 03:10 |
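On hosts where /var/run is a tmpfs it is emptied on every reboot, so a lock directory created once by hand disappears; a hedged sketch of two conventional fixes (the system-config change proposed just below may well do this differently, e.g. via puppet):

```bash
# Simplest: recreate the directory at the top of the mirror script.
mkdir -p /var/run/reprepro

# On systemd-based hosts the same thing can be declared once instead:
cat > /etc/tmpfiles.d/reprepro.conf <<'EOF'
d /var/run/reprepro 0755 root root -
EOF
systemd-tmpfiles --create /etc/tmpfiles.d/reprepro.conf
```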
*** kiennt26 has quit IRC | 03:11 | |
*** gouthamr has joined #openstack-infra | 03:12 | |
*** andreas_s has quit IRC | 03:12 | |
*** thorst has joined #openstack-infra | 03:12 | |
*** thorst has quit IRC | 03:12 | |
*** kiennt26 has joined #openstack-infra | 03:16 | |
*** baoli has quit IRC | 03:17 | |
*** Srinivas has joined #openstack-infra | 03:20 | |
Srinivas | hi all, i am facing this while running jobs in jenkins: "ERROR! Unexpected Exception: 'module' object has no attribute '_vendor'" - does anyone know this issue? | 03:20 |
*** links has joined #openstack-infra | 03:25 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Create flock directories in /var/run https://review.openstack.org/511380 | 03:28 |
*** udesale has joined #openstack-infra | 03:35 | |
*** yamamoto has joined #openstack-infra | 03:39 | |
*** yamamoto_ has joined #openstack-infra | 03:44 | |
*** yamamoto has quit IRC | 03:47 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy job from Oslo.log https://review.openstack.org/511384 | 03:52 |
*** dave-mccowan has quit IRC | 03:54 | |
*** ykarel|afk has joined #openstack-infra | 03:56 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Oslo.log legacy job https://review.openstack.org/511385 | 03:56 |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Oslo.log legacy job https://review.openstack.org/511385 | 03:58 |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy job from Oslo.log https://review.openstack.org/511384 | 04:01 |
*** sree has joined #openstack-infra | 04:03 | |
*** yamamoto_ has quit IRC | 04:03 | |
*** hongbin has quit IRC | 04:04 | |
ianw | pabelanger: ok, as suggested, i removed xenial-updates & xenial-security -- i dropped them from distributions (/etc/reprepro/ubuntu/distributions.ianw) and ran $REPREPRO --delete clearvanished | 04:04 |
ianw | it seemed to remove a bunch of things | 04:04 |
ianw | see logs in /tmp/ianw/out.log (sorry it's just a huge stream) | 04:04 |
ianw | i put them back, and am rerunning a "normal" update | 04:04 |
SamYaple | ianw: are you saying that updates and security wont be mirrored anymore? | 04:04 |
SamYaple | oh ok | 04:05 |
SamYaple | phew. scared me for a second | 04:05 |
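The procedure ianw describes, written out as a sketch (the temporarily edited distributions file is specific to this incident; paths as used elsewhere in this log):

```bash
CONF=/etc/reprepro/ubuntu

# 1. Temporarily remove the xenial-updates and xenial-security stanzas
#    from the distributions file (done by hand above).

# 2. Drop everything reprepro tracks for suites that no longer appear in
#    that file, deleting now-unreferenced pool files as well.
reprepro --delete --confdir "$CONF" clearvanished

# 3. Restore the stanzas and re-run a normal update so the two suites
#    are mirrored again from scratch.
reprepro -VVV --confdir "$CONF" update
```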
ianw | i don't know what it's doing, it's sitting there at 100% cpu with -VVV not saying anything | 04:05 |
ianw | i am going to go do something else for about 45 minutes and not look at it, see if something happens | 04:05 |
SamYaple | ianw: you dont think this has to do with the docker mirror we added do you? | 04:05 |
SamYaple | seems like a lot of this started right after that | 04:06 |
ianw | SamYaple: no, my guess is that transient network errors have introduced AFS issues, which have corrupted reprepro's state somehow | 04:06 |
SamYaple | got it | 04:06 |
ianw | the only thing more obscure than AFS internals is reprepro internals, which makes for an interesting combo | 04:06 |
SamYaple | :) | 04:07 |
SamYaple | i really have to finish my apt mirroring utility | 04:07 |
SamYaple | ive never found a really good one | 04:07 |
SamYaple | and i like to push to my ceph radosgw without having an intermediate clone locally, which *nothing* does | 04:07 |
*** armax has quit IRC | 04:08 | |
*** armax has joined #openstack-infra | 04:08 | |
*** armax has quit IRC | 04:08 | |
*** armax has joined #openstack-infra | 04:09 | |
SamYaple | is there something on paper about how infra is going to solve the unsigned mirrors for apt issue? | 04:09 |
*** armax has quit IRC | 04:09 | |
SamYaple | we could just re-sign the Release file after the mirroring | 04:09 |
*** armax has joined #openstack-infra | 04:10 | |
ianw | i don't think there's anything to solve, i don't think we want it signed to avoid it being used as public mirrors | 04:10 |
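For context, this is how an unsigned internal mirror is typically consumed on the client side; the hostname below is an assumption, not necessarily what configure-mirrors actually writes:

```bash
# Marking the internal mirror as trusted makes apt accept its unsigned
# Release files; leaving the mirror unsigned in turn discourages anyone
# outside the CI from pointing at it.
cat > /etc/apt/sources.list <<'EOF'
deb [trusted=yes] http://mirror.regionone.example.openstack.org/ubuntu xenial main universe
deb [trusted=yes] http://mirror.regionone.example.openstack.org/ubuntu xenial-updates main universe
deb [trusted=yes] http://mirror.regionone.example.openstack.org/ubuntu xenial-security main universe
EOF
apt-get update
```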
*** armax has quit IRC | 04:10 | |
ianw | as jeblair noted, you can't seem to strace this process. or at least it doesn't seem to be doing anything | 04:10 |
ianw | i installed gdb in a hail-mary to see if i can see what's going on | 04:10 |
Srinivas | SamYaple: hi all, i am facing this while running jobs in jenkins: "ERROR! Unexpected Exception: 'module' object has no attribute '_vendor'" - does anyone know this issue? | 04:10 |
ianw | i haven't bothered with symbols -> http://paste.openstack.org/show/623403/ | 04:11 |
ianw | it's somewhere doing something in db code every time | 04:11 |
ianw | the dbs it has open are | 04:12 |
ianw | reprepro 17829 root 5u REG 0,25 42790912 2537578 /afs/.openstack.org/mirror/ubuntu/db/references.db | 04:12 |
ianw | reprepro 17829 root 6u REG 0,25 90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db | 04:12 |
ianw | reprepro 17829 root 7u REG 0,25 485736448 2537576 /afs/.openstack.org/mirror/ubuntu/db/contents.cache.db | 04:12 |
ianw | i think if they are corrupt, we are SOL basically | 04:12 |
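Since reprepro's *.db files are Berkeley DB databases, one further low-level check is to copy a suspect file off AFS and run the db utilities against it; a sketch, assuming the db-util tools are installed (binary names vary slightly between db5.x packages):

```bash
# Copy the suspect database off AFS first so the check itself cannot
# hang on the filesystem.
cp /afs/.openstack.org/mirror/ubuntu/db/checksums.db /tmp/checksums.db

# A clean exit means the file structure is intact; errors point at
# real corruption rather than an AFS/client problem.
db_verify /tmp/checksums.db || echo "checksums.db is damaged"

# db_stat can show whether the file is mostly empty pages, which would
# match the long run of zero bytes seen in the strace output above.
db_stat -d /tmp/checksums.db | head
```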
clarkb | ianw: I'm just happy that "hail mary" as a long-shot term transcends 'murican football | 04:13 |
SamYaple | would it be so wrong to purge it all and completely resync? | 04:13 |
ianw | this might have been a hospital pass from pabelanger :) | 04:13 |
SamYaple | i know it will take time | 04:13 |
ianw | not sure if that term transcends | 04:13 |
ianw | it's 250something gb over afs ... that is our last option | 04:14 |
ianw | if i had to learn something from this now, it's that i think we should get things pointing to reverse proxies | 04:15 |
*** claudiub|2 has joined #openstack-infra | 04:15 | |
ianw | that way, we can at least roll out a config to point it to upstream if this happens again | 04:15 |
SamYaple | reverse proxies work great for non-https things | 04:15 |
SamYaple | but some repos are only https | 04:15 |
clarkb | you can totally http -> https | 04:16 |
SamYaple | yea thats true | 04:16 |
SamYaple | i guess that wouldn't be so bad, we are already doing custom urls. wouldn't be much different | 04:16 |
clarkb | ianw we could do that if we need to | 04:16 |
SamYaple | as far as a workflow goes i mean | 04:17 |
clarkb | in this case can we get by if we rebuild images against the mirror while we rebuild it? | 04:19 |
ianw | clarkb: i think reverse proxies would be more reliable | 04:20 |
*** yamamoto has joined #openstack-infra | 04:21 | |
ianw | of course right now, we can't merge system-config until https://review.openstack.org/511360 is deployed. i don't know why zuul hasn't reconfigured, it seems like it's been ages | 04:22 |
SamYaple | i do apt-cacher-ng at my house with pretty good success | 04:22 |
SamYaple | i would be ok with reverse proxies | 04:22 |
SamYaple | it would save a great deal of space too | 04:23 |
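A rough shape of the reverse-proxy idea, assuming Apache with mod_proxy and mod_cache on the mirror hosts (hostname and upstream are illustrative, not a tested config): jobs keep talking plain http to a local name while the proxy fetches from the real, possibly https-only, upstream and caches on disk.

```bash
cat > /etc/apache2/sites-available/mirror-proxy.conf <<'EOF'
<VirtualHost *:80>
    ServerName mirror.regionone.example.openstack.org

    # Talk https to the upstream while serving plain http locally.
    SSLProxyEngine on
    ProxyPass        /ubuntu/ https://upstream.example.com/ubuntu/
    ProxyPassReverse /ubuntu/ https://upstream.example.com/ubuntu/

    # Cache fetched packages on disk so repeated downloads stay local.
    CacheEnable disk /ubuntu/
    CacheRoot /var/cache/apache2/proxy
</VirtualHost>
EOF
a2enmod proxy proxy_http ssl cache cache_disk
a2ensite mirror-proxy
service apache2 reload
```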
*** markvoelker has joined #openstack-infra | 04:39 | |
*** edmondsw has joined #openstack-infra | 04:39 | |
*** edmondsw has quit IRC | 04:44 | |
adriant | are we still having issues with Zuul? I've got a patch I +2'd and +1 workflowed and it doesn't seem to want to merge :( | 04:45 |
adriant | https://review.openstack.org/#/c/509016/ | 04:45 |
*** bhavik1 has joined #openstack-infra | 04:49 | |
clarkb | adriant: it needs to be +1'd by jenkins first. a recheck should get it going | 04:50 |
adriant | clarkb: ty! | 04:50 |
adriant | clarkb: although I'd have assumed the zuul +1 was enough :( | 04:51 |
*** stakeda has joined #openstack-infra | 04:53 | |
ykarel|afk | clarkb, why hasn't jenkins posted a +1 on https://review.openstack.org/#/c/510735/, any idea? | 04:55 |
clarkb | it wouldve been if we managed to keep using zuulv3 for gating but we had to roll back | 04:55 |
*** gouthamr has quit IRC | 04:56 | |
*** thorst has joined #openstack-infra | 04:56 | |
ykarel|afk | the patch has workflow +1 but gate jobs are not running | 04:56 |
clarkb | ykarel|afk: it did, look at the comments (toggle ci if you need to) | 04:56 |
*** ykarel|afk is now known as ykarel | 04:56 | |
ykarel | clarkb, yes it's there but gate jobs are not there in http://status.openstack.org/zuul/ | 04:58 |
ykarel | not running | 04:58 |
clarkb | it is there when I look | 04:59 |
ykarel | clarkb, yes it's there, sorry | 05:00 |
ykarel | i mislooked | 05:00 |
*** thorst has quit IRC | 05:00 | |
*** priteau has joined #openstack-infra | 05:00 | |
*** bhavik1 has quit IRC | 05:01 | |
*** eumel8 has joined #openstack-infra | 05:03 | |
*** priteau has quit IRC | 05:05 | |
ykarel | clarkb, how tarballs are pushed, is there some issue, i cannot find https://tarballs.openstack.org/puppet-tripleo/puppet-tripleo-5.6.4.tar.gz | 05:05 |
ianw | ok, i think reprepro is dead, nothing has happened | 05:07 |
ykarel | looks like there is some issue that's why some reverts are going: https://review.openstack.org/#/q/status:merged+project:openstack/releases+branch:master+topic:newton/tripleo | 05:08 |
*** CHIPPY has joined #openstack-infra | 05:11 | |
*** markvoelker has quit IRC | 05:14 | |
*** stakeda has quit IRC | 05:18 | |
*** ykarel_ has joined #openstack-infra | 05:20 | |
*** ykarel has quit IRC | 05:22 | |
ianw | i sent out a note in reply to mordred. i'm running out of ideas if i can't get system-config changes merged | 05:29 |
*** jtomasek has joined #openstack-infra | 05:33 | |
*** bhavik1 has joined #openstack-infra | 05:35 | |
*** CHIPPY has quit IRC | 05:36 | |
*** mrunge has quit IRC | 05:44 | |
*** eumel8 has quit IRC | 05:45 | |
*** cshastri has joined #openstack-infra | 05:45 | |
*** threestrands has quit IRC | 05:48 | |
*** dhajare has joined #openstack-infra | 05:53 | |
*** e0ne has joined #openstack-infra | 05:55 | |
*** eumel8 has joined #openstack-infra | 05:55 | |
*** lewo has joined #openstack-infra | 05:56 | |
*** e0ne has quit IRC | 05:59 | |
*** udesale__ has joined #openstack-infra | 06:03 | |
*** martinkopec has joined #openstack-infra | 06:03 | |
*** udesale has quit IRC | 06:03 | |
*** udesale has joined #openstack-infra | 06:06 | |
*** mrunge has joined #openstack-infra | 06:06 | |
*** sshnaidm|off is now known as sshnaidm | 06:06 | |
*** udesale__ has quit IRC | 06:07 | |
*** martinkopec has quit IRC | 06:08 | |
*** martinkopec has joined #openstack-infra | 06:09 | |
*** markvoelker has joined #openstack-infra | 06:11 | |
*** kjackal_ has joined #openstack-infra | 06:12 | |
*** bhavik1 has quit IRC | 06:16 | |
*** pahuang has quit IRC | 06:18 | |
*** yamahata has quit IRC | 06:20 | |
ianw | ok, stracing reprepro the last entry is | 06:23 |
ianw | 3170 pread(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 90521600) = 4096 | 06:23 |
ianw | which lsof tells me | 06:23 |
ianw | reprepro 3170 root 6u REG 0,25 90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db | 06:23 |
sshnaidm | infra-root, zuulv3 can't install ansible properly and fails, is it a known issue? I didn't see it in the etherpad before: fatal error: openssl/opensslv.h: No such file or directory: http://logs.openstack.org/07/472607/102/check/legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-puppet/a263fa3/job-output.txt.gz#_2017-10-12_06_13_19_532007 | 06:24 |
*** pahuang has joined #openstack-infra | 06:27 | |
*** edmondsw has joined #openstack-infra | 06:28 | |
ianw | i'm running "find pool -type f -print | reprepro --confdir /etc/reprepro/ubuntu -b . _detect" which can hopefully recreate it? | 06:29 |
*** pgadiya has joined #openstack-infra | 06:30 | |
*** Swami has joined #openstack-infra | 06:31 | |
*** edmondsw has quit IRC | 06:32 | |
ianw | oh jeez, if this has to checksum the whole thing, over afs ... | 06:32 |
ianw | it's up to 1mb, the old file was 80mb | 06:33 |
*** Swami has quit IRC | 06:33 | |
AJaeger | oops ;( | 06:33 |
ianw | so let's say 5 minutes a megabyte, 5*80 == 6 hours? | 06:34 |
*** yamahata has joined #openstack-infra | 06:34 | |
ianw | the old checksum file is still there | 06:34 |
*** udesale has quit IRC | 06:34 | |
AJaeger | jlk: something is wrong with the periodic translation jobs, see http://logs.openstack.org/periodic/git.openstack.org/openstack/glance/stable/newton/propose-translation-update/2302cc1/ - I expected that to be a *master* job since we only converted master... | 06:35 |
*** rcernin has joined #openstack-infra | 06:35 | |
*** udesale has joined #openstack-infra | 06:35 | |
AJaeger | jlk: That one failed to install packages as well. Missing root? | 06:37 |
*** armaan has joined #openstack-infra | 06:38 | |
*** srobert has joined #openstack-infra | 06:40 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Install zanata dependencies as root https://review.openstack.org/511396 | 06:40 |
AJaeger | jlk, ianw, quick fix for the second problem ^ | 06:40 |
ianw | AJaeger: see my notes though, not sure how much will merge | 06:41 |
*** markvoelker has quit IRC | 06:44 | |
*** dhinesh has quit IRC | 06:44 | |
*** srobert has quit IRC | 06:44 | |
eumel8 | AJaeger: There are more tasks in this role which require root | 06:45 |
*** pgadiya has quit IRC | 06:45 | |
AJaeger | eumel8: want to do a followup fix? | 06:46 |
chandankumar | ianw: hello | 06:46 |
AJaeger | ianw: those jobs look fine - but in general I agree ;( Thanks for wading through it | 06:47 |
chandankumar | ianw: how to add initial core reviewers for this http://git.openstack.org/cgit/openstack/python-tempestconf/ ? we need to add 4 people for the same. | 06:47 |
ianw | chandankumar: i added you as core, you can now add as you want | 06:49 |
eumel8 | AJaeger: just wondering if this full playbook doesn't run as root | 06:49 |
chandankumar | ianw: thanks :-) | 06:49 |
chandankumar | ianw: one more help i need on this review https://review.openstack.org/#/c/511194/ | 06:50 |
AJaeger | eumel8: it does not run as root, see the link above if you want to check | 06:51 |
chandankumar | ianw: https://review.openstack.org/#/admin/groups/1842,members i am not able to add other core reviewers | 06:51 |
chandankumar | ianw: my email-id is chkumar@redhat.com | 06:51 |
AJaeger | chandankumar: log out and in again | 06:52 |
*** priteau has joined #openstack-infra | 06:52 | |
AJaeger | chandankumar: and add first thing the QA PTL, please - the repo is part of QA | 06:52 |
AJaeger | ianw: so, 511396 passed tests | 06:53 |
* AJaeger will be offline for a couple of hours now | 06:54 | |
chandankumar | AJaeger: it is a part of Refstack, i will add hogepodge, but I'm still facing the same issue after logging out and logging in again | 06:54 |
eumel8 | ok | 06:54 |
chandankumar | AJaeger: some people also complain that they are not able to add me as a reviewer | 06:54 |
ianw | infra-root / pabelanger: http://lists.openstack.org/pipermail/openstack-infra/2017-October/005610.html is likely my last update on the reprepro thing. it's currently trying to recreate the checksums.db as described. if that doesn't work, i'm out of ideas for now | 06:55 |
*** priteau has quit IRC | 06:55 | |
*** priteau has joined #openstack-infra | 06:55 | |
ianw | chandankumar: i don't know, i'm almost out. want me to add someone else? | 06:56 |
chandankumar | ianw: please add luigi toscano | 06:56 |
chandankumar | and Chris Hoge | 06:57 |
*** thorst has joined #openstack-infra | 06:57 | |
ianw | chandankumar: what's you account id? | 06:57 |
ianw | click on your name and settings | 06:57 |
chandankumar | ianw: username chkumar246 | 06:58 |
chandankumar | Username | 06:58 |
chandankumar | chkumar246 | 06:58 |
chandankumar | Full Name Chandan Kumar | 06:58 |
chandankumar | Email Address chkumar@redhat.com | 06:58 |
ianw | Account ID below that | 06:58 |
chandankumar | Account ID 12393 | 06:58 |
ianw | Oct 12, 2017 5:48 PM Added Chandan Kumar (8944) | 06:59 |
*** priteau has quit IRC | 06:59 | |
ianw | that's the problem. i think someone will have to manually delete the account. check back in US hours for another infra root, i've got to EOD sorry | 06:59 |
chandankumar | ianw: no problem thanks :-) | 07:00 |
*** thorst_ has joined #openstack-infra | 07:00 | |
chandankumar | take rest, have a nice night ahead :-) | 07:00 |
*** eumel8 has quit IRC | 07:01 | |
*** Dinesh_Bhor has quit IRC | 07:01 | |
*** slaweq has joined #openstack-infra | 07:01 | |
*** thorst has quit IRC | 07:01 | |
*** thorst_ has quit IRC | 07:05 | |
*** Dinesh_Bhor has joined #openstack-infra | 07:07 | |
*** vsaienk0 has joined #openstack-infra | 07:09 | |
*** dingyichen has quit IRC | 07:10 | |
*** Hal has joined #openstack-infra | 07:13 | |
*** Hal is now known as Guest66337 | 07:14 | |
sshnaidm | is the "permission denied" issue back? http://logs.openstack.org/84/508884/1/check/legacy-tripleo-ci-centos-7-nonha-multinode-oooq/f7a99fa/logs/devstack-gate-setup-workspace-new.txt | 07:14 |
sshnaidm | I thought it was solved yesterday | 07:14 |
*** yamahata has quit IRC | 07:15 | |
*** pcaruana has joined #openstack-infra | 07:17 | |
*** florianf has joined #openstack-infra | 07:20 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove unused kuryr-libnetwork jobs https://review.openstack.org/511404 | 07:22 |
*** aviau has quit IRC | 07:23 | |
*** aviau has joined #openstack-infra | 07:24 | |
*** gildub has quit IRC | 07:24 | |
*** armaan has quit IRC | 07:24 | |
*** jpich has joined #openstack-infra | 07:29 | |
*** shardy has joined #openstack-infra | 07:30 | |
*** florianf has quit IRC | 07:32 | |
*** tesseract has joined #openstack-infra | 07:32 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: test containers update https://review.openstack.org/511175 | 07:32 |
*** florianf has joined #openstack-infra | 07:32 | |
*** rwsu has joined #openstack-infra | 07:33 | |
*** andreas_s has joined #openstack-infra | 07:35 | |
*** ykarel__ has joined #openstack-infra | 07:38 | |
AJaeger | sshnaidm: see the status emails by ianw and monty on openstack-dev, this is not solved yet. You know, when it rains, it pours... ;( | 07:39 |
ethfci | guys i feel it is high time for a 'stop the line'? | 07:40 |
ethfci | Jenkins and Zuul have been dead for days... | 07:40 |
*** ykarel_ has quit IRC | 07:41 | |
*** markvoelker has joined #openstack-infra | 07:41 | |
ethfci | still facing the 'libcurl4-gnutls-dev' issue... | 07:43 |
*** stakeda has joined #openstack-infra | 07:45 | |
*** egonzalez has joined #openstack-infra | 07:45 | |
AJaeger | ethfci: http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html | 07:46 |
*** hashar has joined #openstack-infra | 07:57 | |
*** d0ugal has joined #openstack-infra | 07:59 | |
*** eumel8 has joined #openstack-infra | 08:00 | |
*** gildub has joined #openstack-infra | 08:03 | |
*** dbecker has joined #openstack-infra | 08:05 | |
openstackgerrit | Tovin Seven proposed openstack-infra/openstack-zuul-jobs master: Remove legacy oslo.db job https://review.openstack.org/511412 | 08:05 |
openstackgerrit | Tovin Seven proposed openstack-infra/project-config master: Remove legacy oslo.db job https://review.openstack.org/511414 | 08:05 |
*** yamamoto has quit IRC | 08:05 | |
*** yamamoto has joined #openstack-infra | 08:08 | |
*** markvoelker has quit IRC | 08:14 | |
*** edmondsw has joined #openstack-infra | 08:15 | |
*** s-shiono has quit IRC | 08:17 | |
*** priteau has joined #openstack-infra | 08:18 | |
*** edmondsw has quit IRC | 08:20 | |
*** shardy has quit IRC | 08:29 | |
*** shardy has joined #openstack-infra | 08:29 | |
kazsh | AJaeger: G'day, got PTL's +1 accordingly, please check https://review.openstack.org/#/c/509119/ | 08:31 |
*** ralonsoh has joined #openstack-infra | 08:33 | |
*** lucas-afk is now known as lucasagomes | 08:34 | |
*** tosky has joined #openstack-infra | 08:35 | |
*** derekh has joined #openstack-infra | 08:38 | |
tosky | AJaeger: hi! Going back to the previous questions about python-tempestconf by chandankumar: I'm adding the missing people, but when the project is approved under refstack, will we add refstack-core instead of specifying for example the PTL directly in python-tempestconf? | 08:49 |
*** spectr has quit IRC | 08:49 | |
AJaeger | tosky: that all depends on how the refstack team wants to have it working ;) | 08:49 |
tosky | ack | 08:50 |
AJaeger | You could add refstack-core (or could have created the repo with reusing the refstack ACLs). | 08:50 |
AJaeger | Or add a subteam of refstack... I would either add refstack-core or the PTL - and let the PTL decide the rest (after discussion with the team obviously) | 08:50 |
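For reference, the ACL-style alternative mentioned here would look roughly like the snippet below in project-config's acl files, granting the existing refstack-core group core rights on the new repo (group and file names are assumptions; the actual choice was left to the refstack team):

```bash
cat > gerrit/acls/openstack/python-tempestconf.config <<'EOF'
[access "refs/heads/*"]
abandon = group refstack-core
label-Code-Review = -2..+2 group refstack-core
label-Workflow = -1..+1 group refstack-core
EOF
```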
AJaeger | kazsh: thanks, will review later | 08:51 |
*** yamamoto has quit IRC | 08:55 | |
tosky | sure, I added the PTL for now | 08:56 |
*** thorst has joined #openstack-infra | 09:02 | |
*** e0ne has joined #openstack-infra | 09:04 | |
*** thorst has quit IRC | 09:05 | |
*** gildub has quit IRC | 09:09 | |
*** jascott1 has joined #openstack-infra | 09:10 | |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Karbor https://review.openstack.org/511432 | 09:12 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Karbor https://review.openstack.org/511433 | 09:12 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Karbor https://review.openstack.org/511432 | 09:14 |
*** jascott1 has quit IRC | 09:14 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: Configure OVB jobs to use local mirrors for images https://review.openstack.org/511434 | 09:14 |
*** yamamoto has joined #openstack-infra | 09:15 | |
*** yamamoto has quit IRC | 09:15 | |
*** ykarel__ is now known as ykarel|lunch | 09:17 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Use native propose-translation jobs https://review.openstack.org/511435 | 09:17 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix propose-translation-update https://review.openstack.org/511436 | 09:17 |
*** spectr has joined #openstack-infra | 09:19 | |
*** ociuhandu has quit IRC | 09:23 | |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Murano https://review.openstack.org/511438 | 09:24 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Murano https://review.openstack.org/511439 | 09:24 |
*** chenying_ has joined #openstack-infra | 09:25 | |
*** logan- has quit IRC | 09:27 | |
AJaeger | jlk, mordred, pleaes review https://review.openstack.org/511435 https://review.openstack.org/511436 and https://review.openstack.org/511396 - and review whether we need root access in other places as well. Hope that gets us moving forward with translations... | 09:31 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Solum https://review.openstack.org/511440 | 09:31 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Solum https://review.openstack.org/511441 | 09:31 |
*** yamamoto has joined #openstack-infra | 09:44 | |
*** kiennt26 has quit IRC | 09:53 | |
*** priteau has quit IRC | 09:53 | |
*** electrofelix has joined #openstack-infra | 10:00 | |
*** thorst has joined #openstack-infra | 10:02 | |
*** liujiong has quit IRC | 10:02 | |
*** sree has quit IRC | 10:02 | |
*** sree has joined #openstack-infra | 10:03 | |
*** egonzalez has quit IRC | 10:03 | |
*** edmondsw has joined #openstack-infra | 10:04 | |
*** andreas_s has quit IRC | 10:06 | |
*** rpittau has quit IRC | 10:06 | |
*** andreas_s has joined #openstack-infra | 10:06 | |
*** rpittau has joined #openstack-infra | 10:06 | |
*** edmondsw has quit IRC | 10:08 | |
*** thorst has quit IRC | 10:08 | |
*** andreas_s has quit IRC | 10:11 | |
*** spectr has quit IRC | 10:12 | |
*** markvoelker has joined #openstack-infra | 10:12 | |
*** pbourke has quit IRC | 10:15 | |
*** ykarel|lunch is now known as ykarel | 10:15 | |
*** egonzalez has joined #openstack-infra | 10:17 | |
*** pbourke has joined #openstack-infra | 10:17 | |
*** andreas_s has joined #openstack-infra | 10:20 | |
*** sbezverk has quit IRC | 10:21 | |
*** spectr has joined #openstack-infra | 10:25 | |
*** sree has quit IRC | 10:26 | |
*** mikal has quit IRC | 10:28 | |
*** thingee has quit IRC | 10:28 | |
*** thingee has joined #openstack-infra | 10:28 | |
*** mikal has joined #openstack-infra | 10:30 | |
*** udesale has quit IRC | 10:30 | |
*** gcb has quit IRC | 10:30 | |
*** panda|rover|off is now known as panda|rover | 10:30 | |
*** andreas_s has quit IRC | 10:33 | |
*** boden has joined #openstack-infra | 10:34 | |
*** andreas_s has joined #openstack-infra | 10:39 | |
*** armaan has joined #openstack-infra | 10:40 | |
*** florianf has quit IRC | 10:40 | |
*** florianf has joined #openstack-infra | 10:40 | |
*** priteau has joined #openstack-infra | 10:42 | |
*** armaan has quit IRC | 10:43 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: Use infra proxy server for trunk.r.o in delorean-deps https://review.openstack.org/508884 | 10:45 |
*** markvoelker has quit IRC | 10:45 | |
*** clayton has quit IRC | 10:49 | |
*** clayton has joined #openstack-infra | 10:51 | |
*** andreas_s has quit IRC | 10:53 | |
*** edmondsw has joined #openstack-infra | 10:53 | |
*** florianf has quit IRC | 10:54 | |
*** andreas_s has joined #openstack-infra | 10:54 | |
*** florianf has joined #openstack-infra | 10:55 | |
*** andreas_s has quit IRC | 10:56 | |
*** andreas_s has joined #openstack-infra | 10:56 | |
*** edmondsw has quit IRC | 10:56 | |
*** logan- has joined #openstack-infra | 10:57 | |
*** sambetts|afk is now known as sambetts | 11:01 | |
*** zoli is now known as zoli|lunch | 11:02 | |
*** zoli|lunch is now known as zoli | 11:02 | |
*** priteau has quit IRC | 11:02 | |
*** priteau has joined #openstack-infra | 11:03 | |
*** dave-mccowan has joined #openstack-infra | 11:04 | |
*** andreas_s has quit IRC | 11:07 | |
*** priteau has quit IRC | 11:08 | |
*** shardy is now known as shardy_lunch | 11:08 | |
*** priteau has joined #openstack-infra | 11:10 | |
*** wolverineav has joined #openstack-infra | 11:10 | |
*** sdague has joined #openstack-infra | 11:11 | |
*** gildub has joined #openstack-infra | 11:12 | |
*** florianf has quit IRC | 11:13 | |
*** florianf has joined #openstack-infra | 11:14 | |
*** priteau has quit IRC | 11:16 | |
*** gmann is now known as gmann_afk | 11:17 | |
*** jkilpatr has joined #openstack-infra | 11:23 | |
*** Srinivas has quit IRC | 11:26 | |
*** cuongnv has quit IRC | 11:26 | |
*** yamamoto has quit IRC | 11:27 | |
*** martinkopec has quit IRC | 11:29 | |
*** priteau has joined #openstack-infra | 11:35 | |
*** andreas_s has joined #openstack-infra | 11:35 | |
*** ykarel has quit IRC | 11:37 | |
*** ykarel has joined #openstack-infra | 11:38 | |
*** nicolasbock has joined #openstack-infra | 11:39 | |
*** andreas_s has quit IRC | 11:39 | |
*** andreas_s has joined #openstack-infra | 11:40 | |
*** markvoelker has joined #openstack-infra | 11:43 | |
*** nicolasbock has quit IRC | 11:45 | |
strigazi | Hello AJaeger, Can we merge magnum's zuulv3 patch? https://review.openstack.org/#/c/508676/ I'm a little lost with all these new fast failures/RETRY_LIMIT | 11:46 |
*** andreas_s has quit IRC | 11:49 | |
*** yamamoto has joined #openstack-infra | 11:50 | |
mordred | infra-root: I've got a doctor's appointment this morning - out for the next few hours | 11:52 |
*** coolsvap has quit IRC | 11:55 | |
AJaeger | mordred: all the best! | 11:55 |
jamespage | morning/afternoon all - we had some issues with installation of libcurl4-gnutls-dev in our check/gate jobs yesterday which I understood were due to an ubuntu archive cache problem | 11:56 |
jamespage | still seeing some of those today - https://review.openstack.org/#/c/504310/ | 11:56 |
jamespage | do we still have some inconsistency somewhere? | 11:56 |
AJaeger | strigazi: the syntax and setup look fine, otherwise Zuul would have complained - it was able to run the jobs successfully. There's no regression from legacy to your jobs, so the migration looks fine. With all the problems, the -1s are to be expected. I think you can merge the change but I doubt you will be able to since there's no +1 by Jenkins yet. | 11:56 |
AJaeger | jamespage: http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html | 11:57 |
AJaeger | jamespage: basically: the problem is not fixed yet | 11:57 |
*** nicolasbock has joined #openstack-infra | 11:58 | |
*** stakeda has quit IRC | 11:59 | |
*** andreas_s has joined #openstack-infra | 11:59 | |
*** dprince has joined #openstack-infra | 12:00 | |
*** hashar has quit IRC | 12:01 | |
*** trown|outtypewww is now known as trown | 12:02 | |
*** lucasagomes is now known as lucas-hungry | 12:02 | |
*** andreas_s has quit IRC | 12:03 | |
*** andreas_s has joined #openstack-infra | 12:04 | |
*** thorst has joined #openstack-infra | 12:06 | |
*** andreas_s has quit IRC | 12:08 | |
*** andreas_s has joined #openstack-infra | 12:08 | |
*** shardy_lunch is now known as shardy | 12:09 | |
*** edmondsw has joined #openstack-infra | 12:10 | |
*** edmondsw_ has joined #openstack-infra | 12:10 | |
*** gcb has joined #openstack-infra | 12:12 | |
*** edmondsw has quit IRC | 12:14 | |
*** markvoelker has quit IRC | 12:16 | |
sambetts | AJaeger: are the zuul v2 docs completely gone now?? I'm trying to change a configuration in our third party CI and all the zuul docs are for zuul v3 now | 12:19 |
*** lifeless has quit IRC | 12:24 | |
AJaeger | sambetts: do you mean the infra-manual? You can check it out and build locally... | 12:25 |
*** gmann_afk is now known as gmann | 12:26 | |
openstackgerrit | Major Hayden proposed openstack-infra/project-config master: Remove OpenStack/Ceph/Virt repo from CentOS https://review.openstack.org/493003 | 12:26 |
*** hrybacki|trainin is now known as hrybacki | 12:26 | |
sambetts | AJaeger: so it's not published any more? not planning on doing https://docs.openstack.org/infra/zuul/v2 like the other projects have ocata/pike etc | 12:26 |
*** eharney has joined #openstack-infra | 12:28 | |
*** lifeless has joined #openstack-infra | 12:31 | |
*** markvoelker has joined #openstack-infra | 12:33 | |
*** rosmaita has joined #openstack-infra | 12:33 | |
efried | Good morning infra. Is this a known issue? I'm seeing quite a bit of it: http://logs.openstack.org/06/502306/10/check/gate-nova-specs-docs-ubuntu-xenial/e9f94d9/console.html | 12:33 |
*** udesale has joined #openstack-infra | 12:36 | |
*** kgiusti has joined #openstack-infra | 12:36 | |
*** udesale has quit IRC | 12:37 | |
*** udesale has joined #openstack-infra | 12:37 | |
rosmaita | also on https://review.openstack.org/#/c/493654/7 | 12:38 |
rosmaita | actually, https://review.openstack.org/#/c/493654/8 | 12:39 |
*** ociuhandu has joined #openstack-infra | 12:40 | |
*** adarazs is now known as adarazs_brb | 12:41 | |
vsaienk0 | efried: we need to switch to bindep to fix it https://review.openstack.org/#/c/444201/ | 12:41 |
*** links has quit IRC | 12:41 | |
efried | vsaienk0 That needs to be done for every project? | 12:42 |
vsaienk0 | looks like the upstream deb repo is broken, and we install the default package list, which actually is not needed for ironic tests, so adding a bindep file to our project with exact dependencies fixes the problem | 12:42 |
eumel8 | efried, rosmaita: those are known issues. look at http://lists.openstack.org/pipermail/openstack-dev/2017-October/123489.html | 12:42 |
*** andreas_s has quit IRC | 12:43 | |
*** LindaWang has quit IRC | 12:43 | |
*** andreas_s has joined #openstack-infra | 12:43 | |
*** liusheng has quit IRC | 12:45 | |
*** chenying_ has quit IRC | 12:45 | |
vsaienk0 | efried: ideally each project should have its own bindep file with only the exact dependencies it needs. Not having this file forces jobs to install the default package list | 12:45 |
*** liusheng has joined #openstack-infra | 12:45 | |
*** bobh has joined #openstack-infra | 12:47 | |
*** mriedem has joined #openstack-infra | 12:47 | |
*** florianf has quit IRC | 12:52 | |
*** florianf has joined #openstack-infra | 12:52 | |
openstackgerrit | Stephen Finucane proposed openstack-dev/pbr master: Discover Distribution through the class hierarchy https://review.openstack.org/399188 | 12:54 |
openstackgerrit | Stephen Finucane proposed openstack-dev/pbr master: Remove unnecessary 'if True' https://review.openstack.org/510806 | 12:54 |
stephenfin | dhellmann, mordred: Want to take a look at those? Think they can merge now (setuptools had changed stuff under the hood) | 12:54 |
*** LindaWang has joined #openstack-infra | 12:56 | |
*** andreas_s has quit IRC | 12:57 | |
*** lucas-hungry is now known as lucasagomes | 12:58 | |
*** andreas_s has joined #openstack-infra | 13:01 | |
*** adarazs_brb is now known as adarazs | 13:02 | |
*** jcoufal has joined #openstack-infra | 13:03 | |
*** esberglu has quit IRC | 13:03 | |
openstackgerrit | Merged openstack-infra/tripleo-ci master: Remove unnecessary scripts from tripleo-ci https://review.openstack.org/510828 | 13:06 |
AJaeger | sambetts: ah, that's what you mean - better ask the zuul folks, can't help with that one | 13:10 |
AJaeger | efried, rosmaita http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html | 13:11 |
AJaeger | argh, wrong link - I see eumel8 gave the correct one... | 13:11 |
rosmaita | AJaeger thanks | 13:11 |
sambetts | AJaeger: what channel can I find zuul folks in? here or do they have a separate one? | 13:12 |
*** mat128 has joined #openstack-infra | 13:13 | |
AJaeger | sambetts: #zuul ;) | 13:13 |
*** baoli has joined #openstack-infra | 13:13 | |
sambetts | ah not openstack-zuul (tried that one and it didn't exist) | 13:13 |
AJaeger | sambetts: but here as well - just give them a chance to wake up and drink their morning coffee, please :) | 13:13 |
sambetts | of course :D | 13:13 |
AJaeger | rosmaita: regarding bindep: Yes, that might help in this case - you might want to review https://review.openstack.org/#/c/468159/ | 13:15 |
aspiers | FYI, how Google uses Gerrit: https://gitenterprise.me/2017/10/10/gerrit-user-summit-gerrit-at-google/ | 13:15 |
aspiers | pretty interesting setup at scale | 13:15 |
AJaeger | jamespage: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123489.html is the link I wanted to point you to earlier | 13:15 |
*** camunoz has joined #openstack-infra | 13:17 | |
*** esberglu has joined #openstack-infra | 13:18 | |
*** edmondsw_ is now known as edmondsw | 13:19 | |
*** ykarel has quit IRC | 13:20 | |
*** ykarel has joined #openstack-infra | 13:20 | |
*** esberglu has quit IRC | 13:21 | |
*** esberglu has joined #openstack-infra | 13:23 | |
*** tikitavi has joined #openstack-infra | 13:23 | |
*** chlong has joined #openstack-infra | 13:25 | |
*** cshastri has quit IRC | 13:27 | |
*** jaosorior has quit IRC | 13:27 | |
*** sree has joined #openstack-infra | 13:27 | |
*** dbecker has quit IRC | 13:27 | |
*** dbecker has joined #openstack-infra | 13:27 | |
*** sree has quit IRC | 13:31 | |
*** gmann is now known as gmann_afk | 13:31 | |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/irc-meetings master: Moving the OpenStack-Ansible meeting time and channel https://review.openstack.org/511479 | 13:33 |
*** sree has joined #openstack-infra | 13:37 | |
*** sbezverk has joined #openstack-infra | 13:39 | |
fungi | okay, i'm here and catching up on scrollback now. i can already see we're still out of inodes on the logs volume even though the tempfile deletion pass and expired log purging are still going from before i went to sleep | 13:41 |
*** sree has quit IRC | 13:41 | |
AJaeger | still out of inodes? Ooops ;( And Ubuntu mirror also still broken ;( | 13:42 |
AJaeger | fungi: you didn't sleep long enough ;) Good morning! | 13:42 |
AJaeger | fungi, jeblair, good news: We had periodic jobs running with Zuul v3 for the first time. Read the backscroll and etherpad for some of the issues it uncovered ;) | 13:43 |
*** cshastri has joined #openstack-infra | 13:43 | |
*** kashyap has joined #openstack-infra | 13:44 | |
kashyap | Can anyone link to the upstream live migration job, please? | 13:47 |
fungi | ykarel: the missing tripleo release tarballs from yesterday are due to the stale ubuntu mirror issue. we'll rerun the jobs to build and publish them as soon as that's sorted out | 13:48 |
*** gouthamr has joined #openstack-infra | 13:49 | |
*** kiennt26 has joined #openstack-infra | 13:50 | |
*** nikhil_ has joined #openstack-infra | 13:52 | |
*** kiennt26 has quit IRC | 13:52 | |
*** nikhil_ is now known as Guest48516 | 13:52 | |
*** kiennt26 has joined #openstack-infra | 13:53 | |
*** Guest48516 is now known as nikhil_k | 13:53 | |
*** andreas_s has quit IRC | 13:54 | |
*** andreas_s has joined #openstack-infra | 13:55 | |
*** kiennt26 has quit IRC | 13:55 | |
*** kiennt26 has joined #openstack-infra | 13:55 | |
ykarel | fungi, Ok | 13:56 |
*** srobert has joined #openstack-infra | 13:56 | |
*** srobert has quit IRC | 13:56 | |
*** smatzek has joined #openstack-infra | 13:56 | |
fungi | the last mirror-update pass ianw started seems to still be running | 13:57 |
*** srobert has joined #openstack-infra | 13:57 | |
openstackgerrit | Petr Kovar proposed openstack-infra/irc-meetings master: Update chair for doc team meeting https://review.openstack.org/511484 | 13:57 |
*** andreas_s has quit IRC | 13:57 | |
AJaeger | fungi, ianw sent a status report via email - on the infra list | 13:57 |
*** andreas_s has joined #openstack-infra | 13:57 | |
fungi | AJaeger: thanks, i'm caught up on irc scrollback so proceeding to e-mail backlog next | 13:58 |
fungi | also, i seem to be having some massive packet loss from here today... not sure what's up | 13:58 |
fungi | (my home broadband uplink i mean) | 13:58 |
AJaeger | not good ;( | 13:59 |
*** sree has joined #openstack-infra | 14:00 | |
Shrews | fungi: Oh no! Pack loss i not goo . Hope t gets b tter fo you. | 14:03 |
fungi | ;) | 14:03 |
*** gildub has quit IRC | 14:06 | |
*** jaosorior has joined #openstack-infra | 14:09 | |
AJaeger | :) | 14:09 |
*** hamzy has quit IRC | 14:12 | |
*** hongbin has joined #openstack-infra | 14:13 | |
*** chlong has quit IRC | 14:13 | |
*** eumel8 has quit IRC | 14:14 | |
*** florianf has quit IRC | 14:15 | |
*** florianf has joined #openstack-infra | 14:15 | |
*** chlong has joined #openstack-infra | 14:19 | |
pabelanger | ianw: thanks | 14:20 |
pabelanger | Ya, inodes look bad still | 14:21 |
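For reference, the quickest way to see the inode situation being described here is an inode query with df; the mount point below is an assumption based on the log paths discussed later in this log.

```bash
# Show inode (not byte) usage on the logs filesystem; path is an assumption.
df -i /srv/static/logs
```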
jeblair | i'm on mirror-update and looking through the screen windows and don't see one running reprepro | 14:22 |
fungi | infra-root: making a judgement call here, i think we're going to need to drop our retention on the logs site | 14:22 |
*** hemna_ has joined #openstack-infra | 14:23 | |
fungi | jeblair: one of the screen windows was when i looked a moment ago | 14:23 |
jeblair | nor do i see a reprepro process running | 14:23 |
pabelanger | jeblair: yes, ianw's last update is that it failed to write to AFS directories | 14:23 |
*** yamamoto has quit IRC | 14:23 | |
pabelanger | /tmp/ianw contains logs | 14:23 |
fungi | jeblair: no, wait, you're right. that was the flock busywait i saw | 14:23 |
jeblair | pabelanger: where's that update? last i saw ianw said it was still running. | 14:24 |
fungi | jeblair: infra mailing list | 14:24 |
pabelanger | jeblair: ianw posted a reply to infra ML | 14:24 |
pabelanger | I'm still getting up to speed myself | 14:24 |
jeblair | "I restarted for good luck," | 14:24 |
*** cshastri has quit IRC | 14:25 | |
fungi | may as well have been "for great justice" | 14:25 |
jeblair | that makes me think we should see either a process running, or a prompt right after a process died | 14:25 |
jeblair | i can find neither | 14:25 |
*** rbrndt has joined #openstack-infra | 14:27 | |
jeblair | also, do we still have the old images? can we delete the new ones from nodepool to get things working again? | 14:27 |
*** psachin has quit IRC | 14:28 | |
pabelanger | looking | 14:29 |
pabelanger | we still haven't upload to rackspace, so we should be good there for now | 14:29 |
pabelanger | I don't think our other clouds have a good image anymore | 14:29 |
jeblair | okay. we should have deleted the new images yesterday, as soon as this started. | 14:30 |
*** armax has joined #openstack-infra | 14:30 | |
fungi | but does nodepool remove the old image from disk if it hasn't been able to upload newer images? | 14:30 |
pabelanger | since the AFS read-only mirror is working, we could also do the work needed to have DIBs use them | 14:30 |
jeblair | fungi: good point, it does not. | 14:30 |
pabelanger | which should fix our issues for now, but pin us at a specific version of xenial | 14:30 |
pabelanger | fungi: yah | 14:30 |
fungi | thinking rackspace may have saved us here | 14:31 |
jeblair | Shrews: around? | 14:31 |
fungi | since there are several people focusing on the mirror situation, i'll focus on the logs site | 14:31 |
jeblair | fungi: ++ | 14:31 |
jeblair | fungi: i support whatever retention period you want to use :) | 14:31 |
fungi | infra-root: last call for objections, i'm planning to reduce our log retention from 4 to 3 weeks (and if that doesn't help fast enough, i'll drop it to 2) | 14:32 |
pabelanger | ++ | 14:32 |
Shrews | jeblair: yes | 14:32 |
clarkb | fungi: seems reasonable, its still just an inode problem right? | 14:32 |
jeblair | Shrews: can you and pabelanger work on getting the old rax images uploaded everywhere? | 14:33 |
fungi | clarkb: yes, but we have so many that traversing them to find tempfiles or 4-week old files seems to be taking too long, so we need something with a higher hit-rate for now i expect | 14:33 |
jeblair | this is a tricky thing we've never done before, so best to proceed carefully | 14:33 |
clarkb | fungi: maybe before deleting the old stuff really quickly do an inode count between some common jobs like tempest just to make sure we haven't regressed there too? | 14:34 |
pabelanger | jeblair: Shrews: if we manually upload, we could use cloud-images section in nodepool? | 14:34 |
*** andreas_s has quit IRC | 14:34 | |
pabelanger | otherwise, will defer to Shrews | 14:34 |
*** andreas_s has joined #openstack-infra | 14:35 | |
clarkb | though that will just tell us if it has changed, not what or why (so maybe less important) | 14:35 |
fungi | clarkb: i'll see if i can spot anything real quick | 14:35 |
Shrews | pabelanger: not sure. lemme catch up on things | 14:35 |
fungi | clarkb: but my guess is that all these other unrelated issues are just resulting in higher log volume as jobs fail more quickly and people are rechecking them all | 14:36 |
clarkb | ya | 14:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 14:37 |
jeblair | pabelanger, Shrews: rolling forward is a good option too :) | 14:38 |
pabelanger | jeblair: clarkb: fungi: ^in case we want to go this route. Should make our DIB builds use AFS mirrors for ubuntu | 14:38 |
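Roughly, the change being proposed (511492) points the image builds at the regional Ubuntu mirror. A minimal sketch of the same idea using diskimage-builder directly is below; the element name and mirror URL are assumptions drawn from later discussion, not the exact contents of the patch.

```bash
# Sketch only: build a Xenial image against the AFS-backed Ubuntu mirror.
# DIB_DISTRIBUTION_MIRROR is honored by diskimage-builder's ubuntu-minimal
# element; the mirror URL is the one mentioned later in this log.
export DIB_RELEASE=xenial
export DIB_DISTRIBUTION_MIRROR=http://mirror.dfw.rax.openstack.org/ubuntu
disk-image-create -o ubuntu-xenial ubuntu-minimal vm
```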
fungi | clarkb: actually, i'm not even sure i can easily get a representative sample since we're not successfully uploading logs | 14:38 |
*** xarses has joined #openstack-infra | 14:38 | |
clarkb | fungi: oh right ugh | 14:39 |
fungi | i need to find one which successfully managed to upload all its files | 14:39 |
pabelanger | jeblair: clarkb: fungi: regardless what we do, I believe we also want to disable nb03.o.o, as it has a slow uplink to clouds. Takes upwards of 6 hours to upload, vs 30mins on nb04.o.o | 14:39 |
*** caphrim007 has quit IRC | 14:39 | |
*** links has joined #openstack-infra | 14:39 | |
*** andreww has joined #openstack-infra | 14:41 | |
pabelanger | I've stopped nb03 for now | 14:41 |
*** andreww has quit IRC | 14:41 | |
fungi | clarkb: i expect that it will be easier to find the culprit(s) and address the issue once we have working log uploads again, so i'm going ahead with the >3-week purge now | 14:42 |
pabelanger | Okay, I think ubuntu-xenial-0000001137 is the DIB we want to save | 14:42 |
pabelanger | that is our oldest ubuntu-xenial image | 14:42 |
Shrews | pabelanger: so cloud-images is a feature/zuulv3 thing (builders aren't running that), so that's a no-go | 14:42 |
pabelanger | kk | 14:43 |
*** xarses has quit IRC | 14:43 | |
pabelanger | ubuntu-xenial-0000001138 is also an option I think | 14:43 |
*** andreww has joined #openstack-infra | 14:44 | |
*** supertakumi86 has joined #openstack-infra | 14:44 | |
pabelanger | and I believe that is what we are booting in rackspace now | 14:44 |
pabelanger | trying to confirm | 14:44 |
Shrews | pabelanger: is rebuilding a new image an option? | 14:44 |
openstackgerrit | Michael Turek proposed openstack/diskimage-builder master: Add iscsi-boot element https://review.openstack.org/511494 | 14:44 |
andreykurilin | hi folks! There are a lot of POST_FAILURES in zuul_v2. Is it ok? | 14:44 |
pabelanger | Shrews: it is, but we'll need 511492 | 14:44 |
pabelanger | I'm happy to give it a try | 14:44 |
*** supertakumi86 has quit IRC | 14:45 | |
openstackgerrit | Michael Turek proposed openstack/diskimage-builder master: Add iscsi-boot element https://review.openstack.org/511494 | 14:45 |
pabelanger | infra-root: do we want to roll forward an image with 511492 first? | 14:45 |
jeblair | clarkb: want to work on a status alert? | 14:45 |
pabelanger | but, we should first copy ubuntu-xenial-0000001138 or ubuntu-xenial-0000001137 to be safe | 14:46 |
*** jkilpatr has quit IRC | 14:46 | |
clarkb | jeblair: sure | 14:46 |
jeblair | pabelanger: whatever you and Shrews think is safest and quickest | 14:46 |
fungi | for the current 3-week expiration pass i've also switched from -exec rm {} \; to -delete and removed the check for removing empty directories for now in hopes this will cover ground more quickly | 14:46 |
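As an illustration of the switch described above (the path and the 21-day window are examples, not the exact maintenance job):

```bash
# Before: forks one rm process per expired file.
find /srv/static/logs -type f -mtime +21 -exec rm {} \;
# After: unlink from within find itself, saving a fork/exec per file.
find /srv/static/logs -type f -mtime +21 -delete
```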
*** iyamahat has joined #openstack-infra | 14:46 | |
pabelanger | jeblair: ack | 14:47 |
pabelanger | Shrews: okay, so sounds like you want to try new image? I'll let you review 511492 and I'll save our DIBs | 14:47 |
clarkb | how does "Job log retention is being reduced to get inode consumption under control. Separately we are updating job instance images to use our ubuntu mirrors temporarily addressing the problems with Xenial packaging." | 14:48 |
clarkb | er how does that look | 14:48 |
Shrews | pabelanger: i think so. if we delete the bad images, i think a new one is just going to be built anyway | 14:48 |
clarkb | maybe too verbose | 14:48 |
jeblair | clarkb: maybe cover the problem and symptoms first. i don't think folks need to know what we're doing | 14:48 |
fungi | s/updating/reverting/ maybe | 14:48 |
fungi | but yeah | 14:49 |
pabelanger | Shrews: okay, lets land 511492 and give it a go | 14:49 |
Shrews | pabelanger: +A'd | 14:49 |
*** iyamahat_ has joined #openstack-infra | 14:49 | |
clarkb | "Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow." | 14:50 |
clarkb | that better? | 14:50 |
jeblair | clarkb: wfm | 14:50 |
fungi | ship it | 14:50 |
pabelanger | ++ | 14:50 |
Shrews | pabelanger: if nb03 is stopped, 0000001137 and 0000001138 will not be deleted, but good to make a backup anyway | 14:50 |
pabelanger | Shrews: ya, doing that in /opt/nodepool_dib.backup-pabelanger now | 14:51 |
*** iyamahat__ has joined #openstack-infra | 14:51 | |
*** Swami has joined #openstack-infra | 14:51 | |
Shrews | pabelanger: once your change lands, we'll delete 0000001140 and 0000001141 | 14:51 |
clarkb | #status alert Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow. | 14:51 |
openstackstatus | clarkb: sending alert | 14:51 |
jeblair | i'm going to add a new volume to the afs server because the debian volume has 7G free. then i will increase the quota on that volume. then i will reboot both the mirror-update server and afs01.dfw. then i will attempt the mirror repair again. | 14:51 |
pabelanger | clarkb: jobs that don't need gnutls, could stop using bindep-fallback.txt and properly add their own bindep.txt also | 14:51 |
pabelanger | not long term, but help mitigate the issue | 14:52 |
*** iyamahat has quit IRC | 14:52 | |
*** jkilpatr has joined #openstack-infra | 14:53 | |
*** andreas_s has quit IRC | 14:53 | |
*** iyamahat__ has quit IRC | 14:53 | |
*** yamahata has joined #openstack-infra | 14:53 | |
*** iyamahat__ has joined #openstack-infra | 14:53 | |
-openstackstatus- NOTICE: Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow. | 14:53 | |
*** ChanServ changes topic to "Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow." | 14:54 | |
clarkb | Anything that uses cryptography needs it though right? That is going to be many things | 14:54 |
clarkb | the subset that doesn't need it is probably small enough that identifying it is too much work | 14:54 |
*** rcernin has quit IRC | 14:55 | |
*** iyamahat_ has quit IRC | 14:55 | |
*** hamzy has joined #openstack-infra | 14:55 | |
openstackstatus | clarkb: finished sending alert | 14:57 |
*** iyamahat__ has quit IRC | 14:58 | |
*** iyamahat has joined #openstack-infra | 14:58 | |
pabelanger | clarkb: Shrews: 511492 failed with POST_FAILURE. I suggest we add nodepool to emergency file, and manually apply. While we attempt to recheck it | 14:58 |
clarkb | pabelanger: wfm, though its builders that need it? | 14:59 |
jeblair | clarkb, pabelanger: is the npm mirror in production? | 14:59 |
pabelanger | no, there is a patch up to remove it from AFS | 15:00 |
pabelanger | clarkb: ya, nb04 | 15:00 |
jeblair | okay, i'll kill the process | 15:00 |
Shrews | pabelanger: are you confident that will fix the image build? because the other option here is to just pause image builds, delete the bad images, and run with the older images for a while | 15:01 |
jeblair | oh that's nice, the npm mirror releases even if the process is killed | 15:01 |
*** eharney has quit IRC | 15:02 | |
*** chlong has quit IRC | 15:02 | |
*** andreas_s has joined #openstack-infra | 15:02 | |
Shrews | but if we know it will fix it, would be best to move forward with the fix, IMO | 15:02 |
clarkb | Shrews: I dont think any of our images are old enough to work at this point | 15:03 |
jeblair | clarkb: rax | 15:03 |
pabelanger | Shrews: no, we need to first upload the good images, only rackspace today have them | 15:03 |
clarkb | right except in rax | 15:03 |
pabelanger | so, that process needs to be manual | 15:03 |
jeblair | clarkb: we have the images on disk | 15:03 |
clarkb | oh right we keep all the formats | 15:03 |
clarkb | until all formats can be removed | 15:04 |
jeblair | and this is why | 15:04 |
Shrews | oh, rax has the oldest ones. got it | 15:05 |
*** jaosorior has quit IRC | 15:05 | |
pabelanger | okay, patch manually applied | 15:07 |
pabelanger | ready to image-build xenial | 15:07 |
pabelanger | clarkb: Shrews:^ | 15:07 |
*** dhajare has quit IRC | 15:08 | |
Shrews | pabelanger: cool. kick it off | 15:08 |
pabelanger | started | 15:09 |
evrardjp | hey, is there an env var I can use in my job to check if I am under zuul v3 or jenkins? | 15:09 |
pabelanger | evrardjp: I think we said $(whoami)? If user is jenkins, then you are jenkins | 15:10 |
pabelanger | otherwise it would be zuul | 15:10 |
evrardjp | ok | 15:10 |
AJaeger | pabelanger: NO! | 15:10 |
pabelanger | evrardjp: listen to AJaeger | 15:10 |
evrardjp | I couldn't use that :) | 15:10 |
AJaeger | evrardjp: check email by Monty, let me find it quickly... | 15:10 |
*** annp has joined #openstack-infra | 15:11 | |
evrardjp | AJaeger: let me search then | 15:11 |
pabelanger | Shrews: k, we are pulling packages from http://mirror.dfw.rax.openstack.org/ubuntu now | 15:11 |
AJaeger | evrardjp: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123049.html | 15:11 |
evrardjp | wow that was fast | 15:11 |
AJaeger | ;) | 15:11 |
AJaeger | bbl, shutting down here so won't be able to read backscroll for some time... | 15:12 |
*** sree has quit IRC | 15:12 | |
*** AJaeger has quit IRC | 15:12 | |
evrardjp | AJaeger but that's only openstack's zuul's behavior, I don't have a magic variable I can use outside if need be | 15:13 |
evrardjp | at least I have something. | 15:13 |
evrardjp | thanks! | 15:13 |
Shrews | pabelanger: i think we'll need to delete 1141 before the new one will upload. i see 1141 still uploading to rax too | 15:14 |
*** iyamahat has quit IRC | 15:14 | |
*** yamahata has quit IRC | 15:14 | |
*** sadasu has joined #openstack-infra | 15:15 | |
Shrews | (my weechat session is picking a very poor time to randomly freeze on me) | 15:15 |
*** andreas_s has quit IRC | 15:15 | |
*** eharney has joined #openstack-infra | 15:15 | |
pabelanger | k, we had a minor issue with --allow-unauthenticated, working on patch | 15:16 |
*** chlong has joined #openstack-infra | 15:16 | |
jeblair | is the rubygems mirror in production? | 15:19 |
pabelanger | no, we are using reverse proxy for that also | 15:20 |
jeblair | okay we *really* need to clean this stuff up | 15:20 |
fungi | infra-root: the three-week expiration is not gaining traction fast enough to keep pace with new log uploads either. i'm going to switch to a two-week expiration as a last-ditch effort before we have to consider disabling uploads for a while to bring utilization back down or randomly deleting trees of the filesystem, since using find to stat modify times is just too slow | 15:20 |
jeblair | npm and ruby are both making this work very difficult | 15:20 |
jeblair | infra-root: i'm going to delete both from afs | 15:21 |
fungi | jeblair: sounds good | 15:21 |
clarkb | fungi: do you think stat might not be able to keep up? or are older logs less inody? | 15:22 |
fungi | clarkb: i expect it's a combination of both of those plus we're uploading a lot more logs with zuul v3 also running check jobs | 15:22 |
fungi | problem is finding those newer inody job logs and purging them is at least as expensive if not moreso than the date-based expirations | 15:23 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 15:23 |
fungi | if we had the tree sharded by date/time somehow this would be a cinch | 15:23 |
pabelanger | clarkb: Shrews: ^updates needed for AFS mirrors in DIBs. Logic come from nodepool-dsvm jobs | 15:23 |
pabelanger | helps if I git add | 15:24 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 15:24 |
*** yamamoto has joined #openstack-infra | 15:24 | |
fungi | clarkb: and also as i said earlier, all the recent issues in general are causing people to recheck changes far more frequently in vain hope that they'll suddenly work | 15:24 |
clarkb | pabelanger: looks like it should work | 15:25 |
pabelanger | clarkb: I think we might have to ask tripleo to drop more things, /etc for example | 15:25 |
pabelanger | with 2 zuuls running and uploading to logs.o.o, I won't be surprised if tripleo jobs are eating all the inodes now | 15:26 |
clarkb | pabelanger: ya once we've got some breathing room we'll need to gather data and see where inode usage is | 15:26 |
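A sketch of the kind of data gathering clarkb mentions, once there is headroom; the path is assumed, and a full walk of a tree this size is slow, so this is a one-off audit rather than something to run routinely.

```bash
# Count files (inodes) per top-level directory of the logs tree and show
# the heaviest ones; adjust the path/depth for per-job breakdowns.
for d in /srv/static/logs/*/; do
  printf '%s ' "$d"
  find "$d" -type f | wc -l
done | sort -k2 -n | tail -20
```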
jeblair | i'm going to restart afs02.dfw | 15:27 |
*** udesale has quit IRC | 15:27 | |
*** electrofelix has quit IRC | 15:30 | |
fungi | i have my fingers crossed on the current >2-week purge, but it's not looking good so far and i'm afraid we're going to have to do recursive deletes based on some sort of filesystem glob rather than by age to get things back to sanity before we can make further progress through less disruptive means | 15:30 |
jeblair | i'm going to restart afs01.dfw now | 15:31 |
*** yamamoto has quit IRC | 15:31 | |
fungi | like, i can delete _all_ logs for specific jobs by name, or just remove jobs at random by wiping a high-level subdirectory or two | 15:31 |
jeblair | and mirror-update | 15:31 |
*** dangers has joined #openstack-infra | 15:32 | |
clarkb | fungi: we might be able to construct some deletes based on change numbers to roughly correlate to dates? | 15:32 |
*** LindaWang has quit IRC | 15:32 | |
fungi | that would be a very rough correlation | 15:32 |
fungi | like, what, delete jobs for any change id numbers below a certain threshold? | 15:34 |
*** andreas_s has joined #openstack-infra | 15:34 | |
*** Swami has quit IRC | 15:34 | |
*** Swami has joined #openstack-infra | 15:35 | |
*** hashar has joined #openstack-infra | 15:36 | |
fungi | a quick listing of 6-digit change ids prior to 500000 says there are 4646 of those | 15:36 |
clarkb | ya | 15:36 |
clarkb | though at this point those may actually be relatively active as they would've aged out already on their own otherwise | 15:37 |
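For illustration, the change-number cutoff being weighed here could look something like the dry run below; the two-level <last-two-digits>/<change-id> layout is taken from the globs used elsewhere in this log, and 500000 is just the example threshold from above.

```bash
# Dry run: print the rm commands instead of executing them.
for d in /srv/static/logs/??/??????; do
  change=$(basename "$d")
  if [ "$change" -lt 500000 ] 2>/dev/null; then
    echo rm -rf "$d"
  fi
done
```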
*** iyamahat has joined #openstack-infra | 15:37 | |
*** vsaienk0 has quit IRC | 15:37 | |
*** ykarel has quit IRC | 15:37 | |
*** annp has quit IRC | 15:37 | |
pabelanger | I'm starting to like the idea of top-level hash by UTC date again | 15:38 |
*** iyamahat has quit IRC | 15:38 | |
*** iyamahat has joined #openstack-infra | 15:38 | |
jeblair | let's not redesign the system now | 15:38 |
*** ykarel has joined #openstack-infra | 15:38 | |
pabelanger | okay, ubuntu-xenial now building properly with latest 511492 applied | 15:39 |
pabelanger | in devstack-cache element now | 15:39 |
*** e0ne has quit IRC | 15:40 | |
*** andreas_s has quit IRC | 15:43 | |
clarkb | find gate-tripleo-ci-centos-7-3nodes-multinode-nv | grep -v '/tmp/ansible' | wc -l reports 39549 /me finds some other comparisons | 15:43 |
*** yamamoto has joined #openstack-infra | 15:45 | |
*** yamamoto has quit IRC | 15:45 | |
clarkb | er that was for multiple runs, it's 9889 for a single run. 917 for a single run of the tempest multinode job | 15:46 |
pabelanger | 100x | 15:46 |
pabelanger | :( | 15:46 |
*** links has quit IRC | 15:47 | |
*** andreas_s has joined #openstack-infra | 15:48 | |
clarkb | legacy multinode job under zuulv3 is roughly in that 900 range too | 15:48 |
jeblair | pabelanger: how do i run the reprepro _detect command? | 15:48 |
pabelanger | jeblair: it would be: reprepro _detect | 15:49 |
pabelanger | reprepro --configdir /etc/reprepro/ubuntu _detect maybe | 15:49 |
jeblair | pabelanger: i ran: | 15:50 |
jeblair | cd /afs/.openstack.org/mirror/ubuntu | 15:50 |
jeblair | find pool -type f -print | reprepro -b . _detect | 15:50 |
jeblair | and then got: | 15:50 |
jeblair | Error opening config file './conf/distributions': No such file or directory(2) | 15:50 |
jeblair | pabelanger: so i'm looking for the exact command you or ianw ran | 15:50 |
jeblair | should i not be doing the find thing? | 15:51 |
jeblair | that's what it said in step 3 of https://github.com/esc/reprepro/blob/master/docs/recovery which ianw linked | 15:51 |
ihrachys | all those POST_FAILURE failures that happened on all my patches yesterday, are those gone and we can recheck? | 15:51 |
ihrachys | failures as in https://review.openstack.org/#/c/507966/ | 15:51 |
jeblair | ihrachys: no, see channel topic | 15:51 |
ihrachys | ok thanks | 15:52 |
*** kiennt26 has quit IRC | 15:52 | |
Shrews | pabelanger: so... i'm not sure how any image uploads are working at all. i'm seeing shade exceptions during image upload in builder logs | 15:52 |
pabelanger | Shrews: not sure either, I haven't looked at logs yet | 15:53 |
*** vsaienk0 has joined #openstack-infra | 15:54 | |
pabelanger | jeblair: hmm, let me see, I didn't try _detect | 15:54 |
pabelanger | jeblair: I think you need to pass --confdir /etc/reprepro/ubuntu to your reprepro command | 15:55 |
*** yamahata has joined #openstack-infra | 15:55 | |
jeblair | pabelanger: okay -- is that the command i should be running? | 15:55 |
*** egonzalez has quit IRC | 15:55 | |
jeblair | i'm basically trying to just follow whatever instructions you and ianw gave. i thought it was clear, but it's becoming less so | 15:55 |
*** priteau has quit IRC | 15:55 | |
jeblair | pabelanger: like, i'm very confused why you don't think i should run _detect. what do you think i should do? | 15:56 |
pabelanger | jeblair: I am not sure, which document are you looking at currently? What is it you want reprepro to do? | 15:56 |
*** kashyap has left #openstack-infra | 15:56 | |
jeblair | https://github.com/esc/reprepro/blob/master/docs/recovery | 15:56 |
jeblair | pabelanger: ianw said checksums.db was bad. that says that's what you do when checksums.db is bad. | 15:56 |
jeblair | pabelanger: is that not what you were doing yesterday? | 15:56 |
pabelanger | no, I only tried step 1 (rereference) before I passed the torch to ianw | 15:57 |
pabelanger | so, this is new process for me also | 15:57 |
jeblair | pabelanger: how did you determine referencesdb was bad? | 15:57 |
pabelanger | jeblair: I didn't, I never deleted referencedb, but just tried rereference to see if there was any corruption | 15:58 |
pabelanger | the command worked properly | 15:58 |
jeblair | pabelanger: thanks. i'll take it from here. | 15:58 |
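Pieced together from the exchange above and the linked recovery doc, the checksums.db rebuild being attempted looks roughly like this; the paths and the explicit --confdir are a best guess rather than the exact command that was run.

```bash
# Step 3 of the reprepro recovery doc: re-detect every pool file so the
# checksums database can be rebuilt, then audit the result as suggested.
cd /afs/.openstack.org/mirror/ubuntu
find pool -type f -print | reprepro --confdir /etc/reprepro/ubuntu -b . _detect
reprepro --confdir /etc/reprepro/ubuntu -b . check
reprepro --confdir /etc/reprepro/ubuntu -b . checkpool fast
```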
pabelanger | okay | 15:59 |
openstackgerrit | Brian Rosmaita proposed openstack-infra/project-config master: Remove workflow +1 on glance_store from swift-core https://review.openstack.org/511517 | 15:59 |
*** andreas_s has quit IRC | 16:00 | |
*** links has joined #openstack-infra | 16:01 | |
*** ralonsoh has quit IRC | 16:01 | |
*** chlong has quit IRC | 16:03 | |
*** tikitavi has quit IRC | 16:04 | |
jeblair | #status log removed mirror.npm volume from afs | 16:04 |
openstackstatus | jeblair: finished logging | 16:04 |
*** ykarel is now known as ykarel|afk | 16:05 | |
*** edmondsw has quit IRC | 16:05 | |
pabelanger | Shrews: it looks like maybe just the rackspace uploads have the issue in shade. I can see an inap upload working in the debug log | 16:06 |
*** erlon has quit IRC | 16:07 | |
smatzek | The trove gate has been broken by one thing or another since the PTG. I've been working for 2-3 weeks trying to fix it up. In the past couple days I've seen errors like this from the gate-trove-python27-ubuntu-xenial checks whereas the openstack-tox-py27 runs clean. Is this a known issue with v3? "libcurl4-gnutls-dev : Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2.2) but 7.47.0-1ubuntu2.3 is to be installed" | 16:08 |
smatzek | http://logs.openstack.org/87/507087/15/check/gate-trove-python27-ubuntu-xenial/3168c43/console.html | 16:08 |
jeblair | smatzek: we're not running v3. it's a known issue. see topic. | 16:08 |
Shrews | pabelanger: i think it's only providers using tasks to upload images | 16:09 |
pabelanger | Shrews: okay, I think that is only rackspace for us | 16:09 |
*** dangers has quit IRC | 16:09 | |
smatzek | thanks, I read the upload issue but glazed over the gnutls | 16:09 |
*** masber has quit IRC | 16:09 | |
clarkb | fungi: poking around the tmp/ansible fix definitely seems to have cut down on tripleo inode consumption but they are still about an order of magnitude more inodes per job run than say multinode devstack + tempest | 16:10 |
pabelanger | we're just compressing ubuntu-xenial DIB now, shouldn't be much longer before we start uploads | 16:10 |
*** andreas_s has joined #openstack-infra | 16:10 | |
pabelanger | smatzek: hopefully not much longer, new images should be coming online in the next hour | 16:10 |
*** dangers has joined #openstack-infra | 16:10 | |
*** camunoz has quit IRC | 16:11 | |
clarkb | and are significantly smaller under zuulv3 I think because log collection is broken there for some reason for them | 16:11 |
fungi | clarkb: good to know. we _could_ just delete the logs for those jobs specifically. is there a solid file glob i could match on to get all those? | 16:11 |
*** edmondsw has joined #openstack-infra | 16:12 | |
fungi | trying to do it by arbitrary pattern matching with find is not going to be fast enough | 16:12 |
clarkb | fungi: gate-tripleo-ci- is the job name prefix | 16:12 |
clarkb | the ara report at the top level of all zuulv3 jobs is coming in at 400-600 inodes depending on the job, it looks like | 16:13 |
clarkb | which may significantly bump inode overhead for all the things that weren't really copying many files before hand | 16:13 |
Shrews | pabelanger: i am REALLY confused as to how rax has _any_ uploads | 16:13 |
*** dhinesh has joined #openstack-infra | 16:14 | |
*** andreas_s has quit IRC | 16:14 | |
clarkb | 455 inodes for nova pep8 under zuulv3, 24 under v2 | 16:15 |
jeblair | pabelanger: i have a dumb question -- is it possible for us to just copy the db files from the most recent read-only release into place, and then run reprepro normally? | 16:15 |
clarkb | 441 of the v3 side is ara | 16:15 |
jeblair | fungi, clarkb: ^ do you know enough about reprepro to know if that's okay? | 16:15 |
*** AJaeger has joined #openstack-infra | 16:16 | |
*** edmondsw has quit IRC | 16:16 | |
fungi | jeblair: that _seems_ like it should be okay, but i don't really know. i get the impression that's where it keeps its state anyway so makes sense that it should be able to roll forward again from there | 16:16 |
pabelanger | jeblair: I don't see why we can't try. reprepro should be smart enough to detect differences and update where needed. reprepro check and reprepro checkpool should be how we audit | 16:16 |
*** priteau has joined #openstack-infra | 16:16 | |
jeblair | ya -- like maybe it re-downloads some new files or something. that'd be fine. | 16:16 |
jeblair | okay, i'll give that a shot | 16:16 |
jeblair | and verify with pabelanger's suggested commands | 16:17 |
fungi | pabelanger: unfortunately the checks seem to be designed to retrieve every file from the filesystem (so over slow udp datagrams in the case of afs) to recalculate checksums, right? | 16:17 |
jeblair | fungi: well, that's been the repair process to date anyway | 16:17 |
fungi | or does it have a check mode to just verify filenames? | 16:18 |
* SpamapS peeking back in and seeing reprepro and inode issues... quickly retreats like a groundhog seeing his shadow | 16:18 | |
*** dangers` has joined #openstack-infra | 16:19 | |
*** AJaeger has quit IRC | 16:19 | |
*** AJaeger has joined #openstack-infra | 16:19 | |
*** dangers has quit IRC | 16:20 | |
fungi | clarkb: are there specific paths under those gate-tripleo-ci-* log trees i should remove, or are those scattered and better to just remove the entire tree for each job matching that name pattern? | 16:20 |
SpamapS | reprepro's check process likely is also reading all of the metadata from every package. | 16:20 |
clarkb | fungi: logs/undercloud/tmp/ansible logs/ara_oooq logs/undercloud/etc seem to be large hitters | 16:21 |
*** AJaeger has quit IRC | 16:21 | |
*** sadasu has quit IRC | 16:21 | |
clarkb | fungi: I'm currently trying to get a count for what I hope is a representative nova change to see if we should expect to be able to store 4 weeks of nova change logs | 16:21 |
clarkb | change 509039 has ~157208 inodes | 16:22 |
mordred | morning all | 16:22 |
*** AJaeger has joined #openstack-infra | 16:22 | |
clarkb | we have inodes for about 5122 nova changes if we treat that as representative | 16:22 |
*** camunoz has joined #openstack-infra | 16:22 | |
mordred | clarkb: holy crap! - 157208 is a lot of inodes for one change | 16:22 |
pabelanger | fungi: yah, it was a slow process last night when I did reprepro export, that walked all the files in the pool for generating indexes | 16:22 |
clarkb | mordred: ya I'm going to start trying to break that down | 16:23 |
*** andreas_s has joined #openstack-infra | 16:24 | |
fungi | clarkb: thanks, i'm still waiting for ls | wc -l to return a count for the pattern /srv/static/logs/??/??????/*/*/gate-tripleo-ci-* | 16:24 |
fungi | well, ls -d specifically | 16:24 |
pabelanger | clarkb: as I am watching this DIB rebuild again, I'm noticing we're approaching 1 hour build times again. I think it is possible we might want to delete our git cache on builders for a fresh (and smaller) start shortly | 16:26 |
stephenfin | dhellmann: Got a few mins? | 16:27 |
stephenfin | Curious about the rationale behind pre/post-versioning in pbr | 16:27 |
dhellmann | stephenfin : I'm on a call. ~30 min? | 16:27 |
* stephenfin didn't know you could do 'Sem-Ver:' trailers | 16:27 | |
stephenfin | dhellmann: I'll probably be gone by then. Tomorrow is fine :) | 16:27 |
dhellmann | stephenfin : ok. or email to the -dev list (this sounds like something others might be interested in and have input into) | 16:28 |
jeblair | it looks like the checksumsdb on the ro volume is corrupt, so i'm back to running the find | _detect command | 16:28 |
AJaeger | fungi, what about removing /srv/static/logs/0b0bbd59a9be905da869ace3797919f9cd6217/ etc - those are logs that nobody finds... | 16:28 |
AJaeger | these came from initial Zuul v3 logs | 16:28 |
clarkb | mordred: check's patchset 4 of that change is 116373 | 16:28 |
clarkb | looks like ~8 rechecks | 16:28 |
clarkb | I think there is a lot of weight behind the "constant rechecks are just making it worse" theory based on ^ | 16:29 |
jeblair | i am running that command with a copy of the db directory on local disk, so if there is any trouble writing to the fileserver, we shouldn't lose the whole operation. | 16:29 |
mordred | clarkb: yah - that makes an amount of sense | 16:29 |
pabelanger | clarkb: Shrews: ubuntu-xenial DIB finished, we've started uploads | 16:29 |
jeblair | but as ianw calculated, best case for this is probably 6 hours | 16:30 |
Shrews | pabelanger: ++ | 16:30 |
clarkb | openstack-tox-pep8 is ~1800 and old pep8 job is ~200 over those rechecks | 16:30 |
clarkb | maybe we should consider not building a static ara for every build? we could maybe just upload them on failures? | 16:30 |
*** jkilpatr_ has joined #openstack-infra | 16:31 | |
*** jpich has quit IRC | 16:31 | |
fungi | AJaeger: sure, i can do that and maybe it'll free up some as well | 16:31 |
pabelanger | clarkb: wow, large difference | 16:32 |
*** jkilpatr has quit IRC | 16:32 | |
pabelanger | clarkb: sounds like we need to update ARA regardless. But today I only look at it if there is a failure | 16:32 |
*** andreas_s has quit IRC | 16:33 | |
*** trown is now known as trown|lunch | 16:33 | |
clarkb | pabelanger: ya me too, which is why I had that idea:) it is really handy for understanding failures, but skimming successes tends to happen in the job output log for me (or job specific logs) | 16:33 |
pabelanger | clarkb: maybe we can see how much effort would be involved from dmsimard to clean up little files | 16:34 |
Shrews | pabelanger: I'm actually unclear as to why those are already uploading since 1141 is less than 24 hours old | 16:34 |
*** ykarel|afk has quit IRC | 16:34 | |
*** sambetts is now known as sambetts|afk | 16:34 | |
pabelanger | Shrews: not sure myself | 16:34 |
SpamapS | is there a summary of why some jobs take up so many inodes? Purely curious. | 16:35 |
*** Apoorva has joined #openstack-infra | 16:35 | |
*** Guest66337 has quit IRC | 16:35 | |
clarkb | SpamapS: in the general case, ara seems to be a big hitter. In specific cases some tripleo jobs were copying all of ansibles /tmp contents | 16:36 |
*** hashar is now known as hasharAway | 16:36 | |
clarkb | SpamapS: there are also bits of some jobs like tripleo that copy a good chunk of /etc, which, if it grabs /etc/selinux, adds about an ara's worth of files too | 16:36 |
*** vsaienk0 has quit IRC | 16:36 | |
pabelanger | 2017-10-12 16:36:50,057 INFO nodepool.builder.UploadWorker.0: Image build ubuntu-xenial-0000001155 in ovh-bhs1 is ready | 16:37 |
pabelanger | OVH nice and fast :) | 16:37 |
clarkb | oh also tripleo has its internal ara_oooq which is much larger than the zuul ara | 16:37 |
jeblair | clarkb: ara is optional in zuulv3; the process to disable it is just to uninstall it and restart the executors. | 16:38 |
pabelanger | clarkb: we could propose disabling ara_oooq now, as it wouldn't be needed under zuulv3? | 16:38 |
Shrews | pabelanger: oh, nm. uploads happen anytime. dib rebuilds happen only after 24 hours | 16:38 |
SpamapS | clarkb: if it's just for debugging and not quick viewing... tar instead of rsync? | 16:38 |
jeblair | SpamapS: that would likely be so difficult to use as to not be worthwhile; i would find it easier to just read the raw json. | 16:39 |
pabelanger | SpamapS: clarkb: I think stackviz has the right idea of what it does, a single json file IIRC, to render data from | 16:39 |
boden | hi, recently I've seen a number of failures in the v2 jobs "ERROR: These requested packages were not installed.." is this a known issue? | 16:39 |
pabelanger | boden: it is known, we are pushing up new images now to try and fix it | 16:39 |
jeblair | i think i can channel dmsimard here and say the right way to use ara in this situation is with a centralized reporting database. | 16:39 |
dmsimard | I don't believe there is a short term opportunity to reduce the amount of files generated in an ARA report, it's basically one html file per result/host/etc. The best option is to consider a centralized instance of sorts (not unlike openstack-health) | 16:40 |
fungi | SpamapS: if i were to redesign this entire system, i'm starting to think that it would have made more sense to archive a tarball of logs from each build and then have a human-friendly frontend which temporarily unpacks and serves it on demand... but then you also get the benefits that anyone or any system who wants to grab all the logs for a build can just pull that tarball | 16:40 |
dmsimard | jeblair: thou hath summoned me | 16:40 |
AJaeger | we also store both job-output.json.gz and job-output.txt.gz - and the json is 3 times as large as the txt | 16:40 |
jeblair | dmsimard: nailed it! | 16:40 |
boden | pabelanger: ack | 16:40 |
boden | thanks | 16:40 |
pabelanger | jeblair: dmsimard: woah | 16:40 |
jeblair | AJaeger: the json has all the information in it; we still haven't gotten the text quite right yet. it very frequently does not have info we need to diagnose errors. | 16:41 |
SpamapS | jeblair: I was just thinking for the instances where people try to grab all of /etc, not ARA. | 16:41 |
AJaeger | jeblair: ack | 16:41 |
jeblair | SpamapS: ah | 16:41 |
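A minimal sketch of the tar-instead-of-rsync idea for debug-only trees like /etc: one archive (one inode) on the log server instead of thousands of tiny files. The destination path is hypothetical, and whether that tradeoff is acceptable is exactly the open question here.

```bash
# Pack /etc once on the node, then publish the single tarball with the
# rest of the job logs instead of rsyncing the tree file-by-file.
mkdir -p /tmp/job-logs
sudo tar -czf /tmp/job-logs/etc.tar.gz -C / etc
```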
dmsimard | The thing about a centralized instance is that we don't want every ansible run to synchronously report each task/result to a central (mysql) database remotely, just the added latency I suspect would be noticeable -- especially for regions farther away from the database server. | 16:42 |
dmsimard | We would need to asynchronously import the data, somehow. | 16:42 |
clarkb | SpamapS: well we've also asked that that stop happening and it has improved over time, but still finding cases here and there where I think it must be a blacklist instead of a whitelist of copies | 16:42 |
dmsimard | A quick hack would likely be to recover the sqlite database and import it in a central location a bit like we trigger logstash things | 16:43 |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: be more prescriptive in log collection https://review.openstack.org/511526 | 16:43 |
SpamapS | clarkb: maybe we should du -i in the executor too. | 16:43 |
SpamapS | actually | 16:43 |
SpamapS | s/maybe/ | 16:43 |
SpamapS | / | 16:43 |
SpamapS | bah | 16:43 |
dmsimard | So we wouldn't generate the report on each job but we would copy the sqlite database. The database is pretty small. | 16:43 |
SpamapS | DiskAccountant should have stopped this. | 16:43 |
SpamapS | We're limiting on storage bytes, but we should also limit inodes. | 16:44 |
jeblair | SpamapS: our limit is super high until after the cutover because we have to support some legacy jobs that use a lot of space | 16:44 |
jeblair | SpamapS: also, we still haven't cut over | 16:44 |
SpamapS | this happened on log storage from check jobs yeah? | 16:44 |
dmsimard | I'd need to think about the process involved in importing databases over and over. | 16:44 |
SpamapS | And just the duplicated check load? | 16:44 |
jeblair | so i'd like to suggest that we focus this conversation right now to whether we need to make any emergency changes to zuulv3 right now to stop our inode use? | 16:44 |
jeblair | because the ci system has had at least a partial outage for over a day | 16:45 |
jeblair | and we should focus on nothing other than correcting that now. | 16:45 |
jeblair | when we clear that status alert, we can talk about what to do later in v3 | 16:45 |
clarkb | jeblair: I think it is definitely a large overhead for the previously "small" jobs | 16:45 |
*** yamamoto has joined #openstack-infra | 16:45 | |
clarkb | er ara by default for each job is | 16:45 |
dmsimard | ara is probably a nice to have, if it can help alleviate the load we can toggle it off -- no need to do it by uninstalling ara from the executor IMO, we could just disable the generation from inside the role that does the generation | 16:46 |
pabelanger | If ARA is a large amount of inodes over v2, then (reluctantly) I'd be in favor of disabling it on zuulv3 for now | 16:46 |
clarkb | I'd be happy trying it with just failed jobs to start if that is easy | 16:46 |
SpamapS | Right, I was suggesting we have DiskAccountant kill jobs that abuse the inode table of the executor. But I guess the problem is actually not the executor running out of inodes, but the end target running out. | 16:46 |
jeblair | how about we just turn off the v3 check pipeline right now? | 16:46 |
*** armaan has joined #openstack-infra | 16:46 | |
clarkb | that works too | 16:46 |
*** andreas_s has joined #openstack-infra | 16:46 | |
pabelanger | sure | 16:46 |
fungi | that may help drop some load. even with `rm -rf /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/{undercloud/tmp/ansible,ara_oooq,undercloud/etc}` going, we're still not gaining ground | 16:46 |
*** smatzek has quit IRC | 16:46 | |
jeblair | mordred: are you available? | 16:46 |
jeblair | guess not | 16:47 |
*** smatzek has joined #openstack-infra | 16:47 | |
jeblair | clarkb: you want to disable v3 check? | 16:47 |
SpamapS | Yeah I think that's the thing to do. | 16:47 |
dmsimard | I'll go ahead and propose a toggle to disable the ara report generation just in case, it could come handy in the future | 16:47 |
clarkb | jeblair: ya I think we should to try and get logs fs into a happier state | 16:48 |
fungi | basically, i think short of burning down whole swaths of the logs tree, i don't think we can delete files faster than we're uploading them at the moment | 16:48 |
jeblair | clarkb: sorry, i meant, are you available to make that change? | 16:48 |
clarkb | oh yes, I can | 16:48 |
*** dimak has quit IRC | 16:48 | |
jeblair | clarkb: cool, it's yours | 16:48 |
clarkb | I'll push that up momentarily | 16:48 |
fungi | i suppose another option is we could artificially constrain our nodepool quota so we run fewer jobs at a time | 16:49 |
fungi | i mean, after the stop the v3 check pipeline | 16:49 |
*** leyal has quit IRC | 16:49 | |
*** lihi has quit IRC | 16:50 | |
*** oanson has quit IRC | 16:50 | |
*** oanson has joined #openstack-infra | 16:50 | |
fungi | but let's see if this makes a significant dent first, i guess | 16:51 |
*** smatzek has quit IRC | 16:51 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Disable zuulv3 check pipeline https://review.openstack.org/511527 | 16:51 |
clarkb | does that look right? | 16:51 |
pabelanger | looking | 16:52 |
*** yamamoto has quit IRC | 16:52 | |
*** lucasagomes is now known as lucas-afk | 16:52 | |
*** Swami has quit IRC | 16:53 | |
mordred | jeblair: yes! I am here | 16:53 |
pabelanger | yah, think so | 16:53 |
mordred | +2 on turning off v3 check | 16:54 |
*** dimak has joined #openstack-infra | 16:54 | |
*** leyal has joined #openstack-infra | 16:54 | |
*** lihi has joined #openstack-infra | 16:55 | |
jeblair | clarkb: zuul reported back with an expected post-fail. that means the syntax check passed. i say force-merge it now. | 16:56 |
mordred | ++ | 16:56 |
clarkb | ok on it | 16:56 |
*** smatzek has joined #openstack-infra | 16:57 | |
pabelanger | periodic is also large on zuulv3 (303 patches), so zuulv3 might start processing them with no check | 16:57 |
clarkb | do we want to disable periodic too? | 16:57 |
jeblair | good point. i'm in favor of disabling periodic | 16:58 |
openstackgerrit | Merged openstack-infra/project-config master: Disable zuulv3 check pipeline https://review.openstack.org/511527 | 16:58 |
clarkb | ok working on a patch for periodic now | 16:58 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add a toggle to disable ARA static report generation https://review.openstack.org/511528 | 16:58 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add a toggle to enable saving the ARA sqlite database https://review.openstack.org/511529 | 16:58 |
dmsimard | infra-root ^ | 16:58 |
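For context, a minimal sketch of what such a toggle could look like inside a log-collection post role. The variable names, the `ara generate html` invocation, and the sqlite path are illustrative assumptions, not the actual contents of 511528/511529:

```yaml
# Hypothetical post-run tasks on the executor, guarded by toggles.
- name: Generate the ARA static HTML report
  command: "ara generate html {{ zuul.executor.log_root }}/ara"
  when: ara_generate_report | default(true) | bool

- name: Save only the raw ARA sqlite database instead of a full report
  copy:
    src: "{{ ansible_user_dir }}/.ara/ansible.sqlite"
    dest: "{{ zuul.executor.log_root }}/ara-report/ansible.sqlite"
  when: ara_save_database | default(false) | bool
```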
*** derekh has quit IRC | 16:58 | |
*** andreas_s has quit IRC | 17:00 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Similarly to disabling check, disable periodic https://review.openstack.org/511530 | 17:00 |
clarkb | there is periodic | 17:00 |
*** Goneri has joined #openstack-infra | 17:01 | |
*** baoli has quit IRC | 17:01 | |
jeblair | clarkb: i think you need to leave the pipelines and change the trigger to "trigger: {}" | 17:02 |
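A rough sketch of that shape: the pipeline stays defined so project references remain valid, but its trigger is emptied so nothing gets enqueued (surrounding pipeline options abbreviated and assumed):

```yaml
- pipeline:
    name: periodic
    manager: independent
    precedence: low
    # an empty trigger means no events ever enqueue changes here
    trigger: {}
```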
clarkb | thanks | 17:02 |
clarkb | (just saw the error with jobs trying to use a pipeline that no longer exists) | 17:02 |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Similarly to disabling check, disable periodic https://review.openstack.org/511530 | 17:03 |
*** panda|rover is now known as panda|rover|off | 17:04 | |
mordred | dmsimard: both lgtm | 17:04 |
*** caphrim007 has joined #openstack-infra | 17:05 | |
*** eroux has joined #openstack-infra | 17:06 | |
*** priteau has quit IRC | 17:06 | |
*** baoli has joined #openstack-infra | 17:07 | |
*** jbadiapa has quit IRC | 17:07 | |
*** baoli has quit IRC | 17:09 | |
*** baoli has joined #openstack-infra | 17:10 | |
*** dangers` has quit IRC | 17:10 | |
*** iyamahat has quit IRC | 17:10 | |
*** iyamahat has joined #openstack-infra | 17:10 | |
clarkb | 511530 has no post failures, ready for me to merge it? | 17:10 |
pabelanger | ++ | 17:11 |
*** baoli has quit IRC | 17:11 | |
*** tesseract has quit IRC | 17:11 | |
*** baoli has joined #openstack-infra | 17:11 | |
mordred | clarkb: wfm | 17:12 |
openstackgerrit | Merged openstack-infra/project-config master: Similarly to disabling check, disable periodic https://review.openstack.org/511530 | 17:12 |
clarkb | mordred: want me to remove you from project bootstrappers when I remove myself? | 17:12 |
*** sree has joined #openstack-infra | 17:12 | |
*** dangers has joined #openstack-infra | 17:13 | |
*** caphrim007_ has joined #openstack-infra | 17:13 | |
pabelanger | just citycloud-sto2 and infracloud (both regions) left for latest ubuntu-xenial DIB | 17:13 |
mordred | clarkb: yes please | 17:13 |
pabelanger | we should be seeing some results of bindep-fallback.txt already | 17:13 |
clarkb | mordred: done | 17:13 |
pabelanger | going to try and find a log | 17:13 |
*** ociuhandu has quit IRC | 17:14 | |
pabelanger | boden: which review did you see the failure on? | 17:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Temporarily disable volume and os_image functional tests https://review.openstack.org/508156 | 17:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Fix image task uploads https://review.openstack.org/511532 | 17:15 |
mordred | Shrews: ^^ those should help/fix the image upload issue | 17:15 |
pabelanger | boden: okay, I found 510224 | 17:16 |
boden | pabelanger anything recent in neutron-lib… for example https://review.openstack.org/#/c/502416/ https://review.openstack.org/#/c/510224/ | 17:16 |
*** caphrim007 has quit IRC | 17:16 | |
pabelanger | boden: thanks | 17:16 |
mordred | Shrews: or at least help with the log spam - since I think the uploads are actually accidentally occurring correctly whilst we log a bunch of errors - but logging errors while a thing works cloud-side means finding real errors is unpossible | 17:17 |
clarkb | df shows IFree appears to be slowly increasing | 17:17 |
clarkb | course now that i have said that... | 17:17 |
*** sree has quit IRC | 17:17 | |
clarkb | also would be nice if du had -i on that server, oh well | 17:18 |
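Lacking `du -i`, one rough workaround (a sketch assuming GNU find and shell access on the server) is to count directory entries per top-level directory as a proxy for inode usage:

```shell
# Entry counts per top-level log directory, largest first.
cd /srv/static/logs
for d in */; do
    printf '%10d %s\n' "$(find "$d" | wc -l)" "$d"
done | sort -rn | head -20
```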
mordred | clarkb: yah- it rises for a bit and then gets cratered | 17:18 |
*** dhinesh has quit IRC | 17:18 | |
boden | pabelanger: also still seeing issues with vmware-nsx… ex: https://review.openstack.org/#/c/509661/ | 17:19 |
clarkb | mordred: its over 100k now at least | 17:19 |
mordred | clarkb: \o/ | 17:19 |
pabelanger | boden: okay, just confirming it is fixed with 510224 | 17:19 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Add override-branch to all periodic jobs https://review.openstack.org/511533 | 17:19 |
pabelanger | boden: I would also suggest adding your own bindep.txt to both projects, and figuring out which OS packages you need bindep to install. Possibly you might be able to mitigate the errors from today, since you are using bindep-fallback.txt right now | 17:20 |
boden | pabelanger: ok, I’ll have to read up on that | 17:21 |
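For reference, bindep.txt is just a list of distro package names with optional platform profiles; the entries below are purely illustrative, not the actual dependencies of neutron-lib or vmware-nsx:

```
gcc
libffi-dev [platform:dpkg]
libffi-devel [platform:rpm]
libssl-dev [platform:dpkg]
openssl-devel [platform:rpm]
```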
clarkb | wow and back down to 80k ish | 17:21 |
AJaeger | we also had broken periodic jobs in v3, see 511533 - we didn't specify override-branch and just ran the job for each branch... | 17:21 |
pabelanger | will need to recheck 510224, all jobs ran on rackspace | 17:22 |
*** slaweq_ has joined #openstack-infra | 17:23 | |
inc0 | good morning, zuulv3 is down? | 17:23 |
pabelanger | yes, we are stopping check pipelines until we can recover logs.o.o | 17:23 |
pabelanger | (on zuulv3) | 17:23 |
inc0 | ok | 17:23 |
pabelanger | clarkb: should we restart zuulv3 to dump pipelines? Or let it run out | 17:23 |
clarkb | oh right it won't dump them on its own | 17:24 |
clarkb | jeblair: ^ what do you think? | 17:24 |
AJaeger | team, we have 303 changes currently in periodic and 117 in check | 17:24 |
jeblair | clarkb: yes we should | 17:24 |
AJaeger | pabelanger: yes! | 17:24 |
*** slaweq_ has quit IRC | 17:24 | |
jeblair | i will do it | 17:24 |
clarkb | jeblair: thanks | 17:24 |
pabelanger | ++ | 17:24 |
clarkb | and now down to 45k-ish inodes | 17:24 |
clarkb | so ya not keeping up | 17:25 |
pabelanger | Shrews: just citycloud-sto2 left for ubuntu-xenial DIB | 17:25 |
pabelanger | and rackspace of course | 17:25 |
jeblair | zuulv3 restarting | 17:25 |
*** felipemonteiro has joined #openstack-infra | 17:26 | |
*** links has quit IRC | 17:27 | |
clarkb | we are back down to 0 free inodes | 17:27 |
pabelanger | okay, confirmed gnutls package is no longer breaking on xenial with ovh | 17:27 |
pabelanger | telnet://158.69.88.129:19885 | 17:27 |
pabelanger | clarkb: care to +3 511492 | 17:28 |
*** shardy has quit IRC | 17:29 | |
pabelanger | that's what we used for ubuntu DIBs | 17:29 |
jeblair | as best as i can tell, the checksum correction will take more than 7 more hours. | 17:29 |
pabelanger | kk | 17:29 |
clarkb | pabelanger: done | 17:30 |
fungi | pabelanger: excellent job! | 17:30 |
clarkb | watching df -i output I am imagining a game of hungry hungry hippos | 17:30 |
fungi | i take it we're still waiting on uploads to complete elsewhere | 17:30 |
fungi | clarkb: yes, that's a great analogy | 17:31 |
jlk | AJaeger: reviewed. I don't think anything else needs root. | 17:31 |
*** pblaho has quit IRC | 17:31 | |
pabelanger | Yah, as long as we don't vos release our ubuntu AFS mirror, we should be protected until we can repair read/write. Just means we're pinned for a bit | 17:31 |
AJaeger | jlk: great, thanks | 17:31 |
fungi | clarkb: heh, one of my polls (i have watch reporting once a minute on inode count for that filesystem) showed 1 free inode | 17:32 |
pabelanger | we also could pause ubuntu DIBs too, if we felt the need | 17:32 |
jlk | AJaeger: You'll need infra root to push it through though. | 17:32 |
*** sree has joined #openstack-infra | 17:32 | |
AJaeger | jlk: why that? | 17:32 |
clarkb | is there an easy way to strace ssh on static.o.o and filter writes to logs/? | 17:32 |
clarkb | and maybe we can see in real time what is going into the fs? | 17:33 |
jlk | or maybe not infra root, but people with more voting rights than I have :D | 17:33 |
mordred | jlk, AJaeger: which change? | 17:33 |
clarkb | clarkb@static:/srv/static/logs/61$ find 509761 | wc -l returns 2840985 | 17:33 |
fungi | clarkb: sshd forks on each incoming connection, so you'd need to -f | 17:33 |
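One possible incantation along those lines (an untested sketch; the exact syscall set and pgrep usage are assumptions): attach to the listening sshd, follow its per-connection forks, and filter for path-bearing calls under the logs tree:

```shell
sudo strace -f -e trace=openat,rename,mkdir,unlinkat \
    -p "$(pgrep -o -x sshd)" 2>&1 \
    | grep --line-buffered '/srv/static/logs/'
```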
AJaeger | jlk: yes, I know... | 17:33 |
pabelanger | clarkb: Oh, I guess we should also disable the check-tripleo pipelines for zuulv3 | 17:33 |
pabelanger | since they will run tripleo jobs | 17:33 |
mordred | clarkb: holy crap | 17:34 |
jeblair | pabelanger: please update commit message on https://review.openstack.org/473911 so we can merge the change. i've already manually removed the crontab entries. | 17:34 |
AJaeger | mordred: https://review.openstack.org/511396 , https://review.openstack.org/511436 , https://review.openstack.org/511435 | 17:34 |
pabelanger | jeblair: ack | 17:34 |
clarkb | a single tripleo change is .4% of our inode total | 17:34 |
SamYaple | haha wow | 17:35 |
*** sshnaidm is now known as sshnaidm|off | 17:35 | |
AJaeger | jeblair: let me do a proposal for periodic translation jobs... | 17:36 |
jeblair | AJaeger: let's check with mordred and see which approach he favors; he's thought about this more in the context of the migrated jobs | 17:36 |
*** ociuhandu has joined #openstack-infra | 17:36 | |
clarkb | I need to pop out for breakfast now. If someone else is able to get check-tripleo in v3 that would be great | 17:36 |
jeblair | clarkb: can you elaborate? i don't know what you're asking | 17:37 |
mordred | AJaeger, jeblair: those patches look good to me - what's the other approach? | 17:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Remove npm / rubygem crontab entries https://review.openstack.org/473911 | 17:37 |
pabelanger | jeblair: fungi: ^more info in commit message for rubygems / npm | 17:38 |
jeblair | mordred: https://review.openstack.org/511533 i think is the change we're discussing with AJaeger | 17:38 |
AJaeger | mordred, jeblair what about http://paste.openstack.org/show/623489/ instead of https://review.openstack.org/#/c/511436/1/zuul.d/jobs.yaml ? | 17:38 |
jeblair | i've re-enabled all of the crontab entries on mirror-update except ubuntu | 17:38 |
AJaeger | jeblair: problem with that approach is that we have some repos that don't run the job on all branches | 17:38 |
*** felipemonteiro has quit IRC | 17:39 | |
jeblair | AJaeger: you're saying some projects only run propose-translation-update on master, but some run on all branches? in which case, put no branch matcher on the job, but do add one to the project-pipeline invocation of the job. | 17:42 |
jeblair | AJaeger: also, branches can be a yaml list, so you don't have to do regexes any more | 17:42 |
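A sketch of the form being described (project, job, and branch names are placeholders): the job itself carries no branch matcher, and the project-pipeline invocation limits where it runs:

```yaml
- project:
    name: openstack/example-project
    periodic:
      jobs:
        - propose-translation-update:
            branches:
              - master
              - stable/pike
```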
*** SumitNaiksatam has joined #openstack-infra | 17:43 | |
AJaeger | jeblair: most projects run the translation proposal only on master, some on stable/pike and ocata, some only on pike, others only on ocata | 17:43 |
clarkb | jeblair: pabelanger pointed out we are still running check tripleo in zuulv3 | 17:43 |
pabelanger | clarkb: mordred: what if you did http://logs.openstack.org/07/472607/ ? what is the number of inodes on that? | 17:43 |
AJaeger | all depending on whether translations were ready at that time. | 17:43 |
jeblair | this is certainly how new periodoc jobs should be constructed. the question i'd like mordred to weigh in on is whether we should go ahead and do this for these broken legacy jobs, or is the other approach better. | 17:44 |
*** trown|lunch is now known as trown | 17:44 | |
jeblair | mordred you +2d 511533 without any feedback on my comments, so i guess that means he favors your approach. that's fine. | 17:44 |
mordred | wait | 17:44 |
jeblair | wow that didn't make sense | 17:44 |
jeblair | mordred you +2d 511533 without any feedback on my comments, so i guess that means you favor AJaeger'sapproach. | 17:45 |
jeblair | i did not switch conversation partners well | 17:45 |
jeblair | anyway, i need to focus on fires | 17:45 |
mordred | jeblair, AJaeger: I'm sorry, I feel completely lost. I do not understand how branch matchers can have any impact on periodic jobs | 17:45 |
jeblair | so let's pin this for later. | 17:45 |
jeblair | clarkb: i'll take care of it | 17:45 |
*** apetrich has quit IRC | 17:45 | |
mordred | ok. I've removed my vote (which I gave missing jeblair's comments) - and yes, I'd like to pin it for later because I'm not being a good participant in it right now it seems | 17:46 |
jeblair | we need to focus on the 30-hour long ci outage now. | 17:46 |
*** armaan_ has joined #openstack-infra | 17:46 | |
AJaeger | jeblair: ok - we can discuss later, I'll add to etherpad | 17:46 |
mordred | yup | 17:46 |
*** edmondsw has joined #openstack-infra | 17:46 | |
pabelanger | okay, gnutls is under control now. I've seen a few projects working properly again with bindep-fallback.txt | 17:48 |
*** apetrich has joined #openstack-infra | 17:48 | |
pabelanger | what can I help with now? | 17:48 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Disable trigger for v3 check-tripleo pipeline https://review.openstack.org/511540 | 17:48 |
jeblair | pabelanger: did you end up creating and uploading new images? | 17:49 |
*** Swami has joined #openstack-infra | 17:49 | |
pabelanger | jeblair: yes, we used AFS mirrors for packages on the xenial DIB; once we built and uploaded new images, the issue resolved itself | 17:49 |
*** armaan has quit IRC | 17:49 | |
*** dhinesh has joined #openstack-infra | 17:49 | |
jeblair | pabelanger: and theyre uploaded to all regions? | 17:49 |
pabelanger | jeblair: except rackspace, because of a shade bug | 17:50 |
jeblair | ok. and that's fine because they have working old images | 17:50 |
pabelanger | and rackspace is running 3 day old images, that are not affected | 17:50 |
pabelanger | yah | 17:50 |
*** sree has quit IRC | 17:50 | |
Shrews | They should eventually complete in rax | 17:50 |
*** rwsu has quit IRC | 17:51 | |
pabelanger | once https://review.openstack.org/511492/ lands, we also can remove nb04 from emergency file | 17:51 |
pabelanger | I recommend we keep nb03 disabled, and potentially rebuild it in rackspace for faster uploads of DIBs | 17:51 |
*** sree has joined #openstack-infra | 17:51 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add new translation templates https://review.openstack.org/511541 | 17:52 |
mordred | the longest portion of rackspace 'upload' time is actually the image import step I believe (which is where the bug is) | 17:52 |
jeblair | pabelanger, mordred: +3 https://review.openstack.org/511540 ? | 17:52 |
jeblair | fungi: do you need any help with logs? | 17:53 |
mordred | jeblair: done | 17:53 |
pabelanger | should I look at updating base jobs in zuulv3 with ARA disabled? | 17:53 |
jeblair | pabelanger: no | 17:53 |
pabelanger | kk | 17:53 |
jeblair | i'm not ready to talk about anything that isn't directly related to clearing that status message | 17:54 |
*** dhinesh has quit IRC | 17:54 | |
*** dhinesh has joined #openstack-infra | 17:54 | |
*** armaan_ has quit IRC | 17:54 | |
fungi | jeblair: i think i've got it deleting the most effective two things we can hope for at the moment (subtrees of tripleo jobs clarkb identified, and any logs older than 2 weeks). it's _almost_ keeping up now, and as we finish winding down the other high-use pipelines on zuulv3 i have hopes it'll finally gain ground | 17:54 |
*** armaan has joined #openstack-infra | 17:55 | |
*** slaweq_ has joined #openstack-infra | 17:55 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Add new translation templates https://review.openstack.org/511541 | 17:55 |
fungi | i haven't seen it break 100k free inodes yet (polling once a minute) but it's been a little while since i've seen it at 0 free (i've seen a few sub-1k though) | 17:56 |
*** sree has quit IRC | 17:56 | |
*** baoli has quit IRC | 17:56 | |
fungi | as opposed to earlier where it was basically pegged to 0 free on every poll | 17:56 |
fungi | no, wait, there it just came back 0 again | 17:56 |
SamYaple | fungi: you jinxed it! | 17:56 |
fungi | indeed :/ | 17:56 |
pabelanger | okay, I'll work on getting system-config jobs working | 17:56 |
*** baoli has joined #openstack-infra | 17:57 | |
pabelanger | looks like openstackci-beaker is failing | 17:57 |
jeblair | pabelanger: thanks | 17:57 |
*** jascott1 has joined #openstack-infra | 17:57 | |
mordred | as soon as the latest patch from jeblair lands we should likely restart zuul again to clear that pipeline yeah? | 17:57 |
jeblair | mordred: there's only one thing in it now, so if it lands soon, no big deal | 17:57 |
fungi | probably so | 17:57 |
mordred | oh - ok. cool | 17:57 |
fungi | oh, good point we already restarted so that cleared it out anyway | 17:58 |
*** camunoz has quit IRC | 17:58 | |
fungi | and there hasn't been a lot of time for it to accumulate again | 17:58 |
*** armaan has quit IRC | 17:59 | |
*** trown is now known as trown|brb | 18:00 | |
mordred | Shrews: https://review.openstack.org/#/c/508156/ came back green - wanna +A it? | 18:01 |
mordred | Shrews: the follow up also is mostly green - the red is POST_FAILURE from the current incident | 18:01 |
*** dangers has quit IRC | 18:02 | |
*** eharney has quit IRC | 18:03 | |
openstackgerrit | Paul Belanger proposed openstack-infra/puppet-openstack_infra_spec_helper master: Cap signet < 0.8.0 https://review.openstack.org/511543 | 18:03 |
fungi | huh, no more rsync processes on static.o.o for the past few minutes, and now we're up over 100k free inodes | 18:04 |
fungi | over 200k free | 18:05 |
*** dprince has quit IRC | 18:05 | |
mordred | fungi: I haven't seen that number over 200k in a WHILE | 18:07 |
pabelanger | jeblair: okay, I've confirmed ubuntu-trusty is also affected by the gnutls issue. I've started an image-build for ubuntu-trusty now | 18:07 |
*** dangers has joined #openstack-infra | 18:08 | |
*** ijw has joined #openstack-infra | 18:08 | |
*** dhinesh has quit IRC | 18:08 | |
*** dhinesh has joined #openstack-infra | 18:09 | |
*** trown|brb is now known as trown | 18:09 | |
jeblair | i'm looking into what's holding up 511396 | 18:09 |
*** dhinesh has quit IRC | 18:09 | |
pabelanger | Yah, was just taking a peek myself | 18:10 |
pabelanger | we have a lot of ready nodes on nl01, and few currently building | 18:10 |
jeblair | slow building node in inap-mtl01 | 18:11 |
pabelanger | mordred: fungi: clarkb: https://review.openstack.org/511543/ is the fix for system-config openstackci-beaker jobs, if you'd like to review. I'm working on fixing ubuntu-trusty issue now | 18:11 |
pabelanger | jeblair: maybe trying to boot new xenial image | 18:11 |
fungi | thanks pabelanger! | 18:11 |
jeblair | pabelanger: does that take 30m? | 18:12 |
pabelanger | jeblair: yah, I've seen upwards of an hour | 18:12 |
mgagne | jeblair: have new images been uploaded? is Nodepool playing catch up since yesterday? | 18:12 |
jeblair | we should maybe set the launch timeout to 10m there like rax | 18:12 |
fungi | i suppose it could if there's a thundering herd on the storage distribution network warming nova image caches | 18:12 |
pabelanger | jeblair: +1 | 18:13 |
jeblair | mgagne: yes new images | 18:13 |
pabelanger | inap boots fast, so 10mins should be plenty | 18:13 |
*** camunoz has joined #openstack-infra | 18:14 | |
Shrews | mordred: +2'd the first, the follow up has a comment from me | 18:14 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Set v3 nodepool inap timeout to 600 https://review.openstack.org/511545 | 18:14 |
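For context, a trimmed illustration of the nodepool.yaml knob that review touches; everything except the relevant key is omitted:

```yaml
providers:
  - name: inap-mtl01
    # give slow-booting nodes up to 10 minutes before giving up
    launch-timeout: 600
```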
mordred | Shrews: thanks | 18:14 |
pabelanger | +2 | 18:15 |
jeblair | okay, that should eventually clear. i don't think disabling v3 check-tripleo is urgent enough to do anything other than just check back in a bit. | 18:15 |
pabelanger | great | 18:16 |
jeblair | fungi: how much headroom do you think we need before we can send an all-clear? | 18:16 |
fungi | watching what jobs are uploading logs in real-time (by grepping the process list for rsync) i just saw a gate-tripleo-ci-centos-7-3nodes-multinode-nv build suck up 10k inodes | 18:16 |
Shrews | mordred: oh, maybe what you've done in that exception format will work (assumes 'message' is an attribute, right?) | 18:17 |
fungi | jeblair: a sane amount would be when we get down to 99% inode consumption maybe? like around 7.7m free | 18:17 |
fungi | jeblair: right now we're at 0.02% free | 18:18 |
*** jdandrea has quit IRC | 18:18 | |
openstackgerrit | Paul Belanger proposed openstack-infra/puppet-openstack_infra_spec_helper master: Add bindep.txt file https://review.openstack.org/511546 | 18:19 |
mordred | Shrews: yes - just replied with that - status is a dict with message as a key - but we could change it to the other syntax if you prefer | 18:19 |
Shrews | mordred: yeah, i'd rather be explicit than clever :) | 18:19 |
openstackgerrit | Merged openstack-infra/project-config master: Install zanata dependencies as root https://review.openstack.org/511396 | 18:20 |
openstackgerrit | Merged openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 18:20 |
fungi | a gate-tripleo-ci-centos-7-containers-multinode build just now uploaded 8k files | 18:20 |
clarkb | almost 2k of that is various ara things iirc, and etc is another 1k or so; then multiply etc by the number of nodes | 18:21 |
fungi | and right after that i saw a gate-tempest-dsvm-py35-ubuntu-xenial job upload 600 files | 18:21 |
fungi | so we're talking over an order of magnitude higher inode counts from tripleo jobs than devstack-gate jobs | 18:22 |
pabelanger | yah | 18:22 |
fungi | i saw a gate-tempest-dsvm-ironic-ipa-partition-bios-pxe_ipmitool-coreos-src-ubuntu-xenial build upload 500 files | 18:23 |
fungi | (that's one heck of a job name!) | 18:23 |
clarkb | ya multinode tempest/grenade is in the 1k range | 18:24 |
mordred | Shrews: kk. update coming | 18:25 |
*** anupn has quit IRC | 18:26 | |
*** baoli_ has joined #openstack-infra | 18:31 | |
*** dhinesh has joined #openstack-infra | 18:31 | |
pabelanger | mordred: clarkb: care to +3 https://review.openstack.org/511545/ for inap launch-timeout 600 | 18:31 |
openstackgerrit | Merged openstack-infra/project-config master: Disable trigger for v3 check-tripleo pipeline https://review.openstack.org/511540 | 18:32 |
mordred | pabelanger: done | 18:32 |
*** baoli has quit IRC | 18:33 | |
fungi | i just saw a legacy-tempest-dsvm-neutron-ovsfw build upload logs... i guess the orphaned nodes from the zuulv3 restart are still chugging along | 18:34 |
fungi | that might explain why cleanup hasn't sped up just yet | 18:34 |
openstackgerrit | Merged openstack-infra/project-config master: Set v3 nodepool inap timeout to 600 https://review.openstack.org/511545 | 18:36 |
jeblair | fungi: ah, yeah, we may still have the bug where a scheduler restart doesn't abort executor jobs | 18:36 |
jeblair | though it should cause them to get deleted. | 18:36 |
jeblair | the nodes i mean | 18:37 |
pabelanger | ubuntu-trusty DIB now compressing | 18:37 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Fix image task uploads https://review.openstack.org/511532 | 18:38 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 18:38 |
jeblair | fungi: at this point you should see no more legacy- jobs upload | 18:38 |
mordred | Shrews: k. that should fix your comment | 18:38 |
jeblair | fungi: the only nodepool v3 nodes in use are for infra-post jobs | 18:38 |
jeblair | fungi: and the change to disable check-tripleo in v3 has landed | 18:38 |
jeblair | i'm going to afk for about an hour for lunch, etc. | 18:39 |
mordred | jeblair: have good lunching | 18:39 |
clarkb | what time did the tripleo ansible tmp fix get in yesterday? mordred do you recall? | 18:40 |
mordred | clarkb: I do not - I can go look though | 18:40 |
mordred | clarkb: I made the Stop collecting ephemeral temp dirs patch at around 22:16 - which is right around the time we force-merged the other tmp patches | 18:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Allow domain_id for roles https://review.openstack.org/496992 | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Move role normalization to normalize.py https://review.openstack.org/500170 | 18:42 |
clarkb | mordred: thanks | 18:43 |
mordred | Shrews: ^^ if you get a sec, those 4 have been on hold due to other gate issue, but would be nice to have if we're gonna cut a new release for the upload bug | 18:43 |
mordred | Shrews: (3 of them are for fixing bugs humans have reported running in to) | 18:43 |
clarkb | gate-tripleo-ci-centos-7-ovb-ha-oooq has added more /etc collection in the last 7 days or so | 18:43 |
clarkb | overcloud-*/etc seems to be the bulk of it that is new | 18:44 |
*** vsaienk0 has joined #openstack-infra | 18:44 | |
clarkb | went from ~4k to ~37k | 18:45 |
clarkb | 23k or so of that is the ansible tmp stuff | 18:45 |
clarkb | then good chunk of the rest looks like etc | 18:45 |
Shrews | mordred: ack | 18:46 |
clarkb | also /var/log/extra and /var/log/config-data | 18:48 |
clarkb | we are copying all of the apache modules multiple times (basically once per openstack service?) | 18:49 |
clarkb | EmilienM: logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/nova/etc/httpd/conf.modules.d is probably an easy sort of thing to just stop collecting | 18:50 |
clarkb | EmilienM: are we using a whitelist for logs yet? | 18:50 |
clarkb | but its in logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/heat_api/etc/httpd/conf.modules.d as well and so on | 18:51 |
openstackgerrit | Merged openstack-infra/shade master: Temporarily disable volume and os_image functional tests https://review.openstack.org/508156 | 18:52 |
clarkb | logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/keystone/etc/httpd/conf.modules.d | 18:53 |
*** vsaienk0 has quit IRC | 18:54 | |
clarkb | looks like we also copy the system systemd units | 18:55 |
EmilienM | o/ | 18:55 |
EmilienM | clarkb: yes we have whitelist and exclude | 18:55 |
clarkb | EmilienM: why are we copying all of the apache modules multiple times then? | 18:55 |
EmilienM | weshay|ruck: can you take a look please? I'm in a call right now | 18:55 |
clarkb | and systemd system units? | 18:55 |
EmilienM | clarkb: I don't know now | 18:55 |
weshay|ruck | aye | 18:56 |
weshay|ruck | EmilienM, k | 18:56 |
EmilienM | ty | 18:56 |
weshay|ruck | clarkb, I have an email out for review w/ a few patches for logs on openstack-dev | 18:57 |
pabelanger | likely drop SSH host keys in http://logs.openstack.org/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/etc/ssh/ | 18:57 |
pabelanger | :( http://logs.openstack.org/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/etc/sysconfig/network-scripts/ | 18:58 |
pabelanger | don't think we need all of sysconfig/network-scripts too | 18:58 |
*** camunoz has quit IRC | 18:58 | |
pabelanger | weshay|ruck: clarkb: lets create a topic in gerrit so we can review them | 18:59 |
weshay|ruck | https://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/collect-logs.yml#L151 | 18:59 |
weshay|ruck | ya.. I'll nuke that | 18:59 |
*** slaweq_ has quit IRC | 18:59 | |
pabelanger | Yah, I think we should be whitelisting specific files, not just directories | 18:59 |
clarkb | logs/61/509761/2/check/gate-tripleo-ci-centos-7-containers-multinode/99f9196/logs/subnode-2/etc/selinux is another big consumer | 19:00 |
clarkb | pabelanger: yes that is what we've been asking for since like march | 19:00 |
pabelanger | clarkb: agree | 19:00 |
pabelanger | okay, I have to run out for an errand | 19:00 |
pabelanger | I will try to be back shortly | 19:00 |
fungi | worth noting, tripleo isn't the only team with high-inode-count build logs... i just saw a gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial build upload 5k files | 19:02 |
Shrews | mordred: reviewed the shade changes. all look good except for one | 19:03 |
*** ihrachys_ has joined #openstack-infra | 19:03 | |
*** harlowja has quit IRC | 19:03 | |
*** slaweq_ has joined #openstack-infra | 19:03 | |
clarkb | fungi: do you have the full path to that? i'd be curious to go see what tehy are grabbing | 19:03 |
fungi | clarkb: /srv/static/logs/44/479844/6/check/gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial/1541ded/ | 19:03 |
clarkb | thanks | 19:03 |
dmsimard | fungi: OSA have heavy playbooks and use ARA so there's likely a lot of files | 19:03 |
*** rbrndt has quit IRC | 19:03 | |
dmsimard | (because of ARA) | 19:03 |
*** ihrachys has quit IRC | 19:03 | |
fungi | noted | 19:03 |
Shrews | mordred: i think you need to s/payload/kwargs/ in 511305 ? | 19:04 |
clarkb | fungi: ara is 2800 of that | 19:04 |
fungi | yikes. still a lot of files, but ara is over 50%? | 19:04 |
dmsimard | fungi: that's probably not even the heaviest one, gate-openstack-ansible-openstack-ansible-aio-ubuntu-trusty is likely heavier than that | 19:05 |
clarkb | they are also grabbing a lot of stuff out of etc that shouldn't be grabbed | 19:05 |
*** lukebrowning has quit IRC | 19:05 | |
clarkb | and looks like redundant sets possibly | 19:05 |
dmsimard | fungi: wait, wrong job name, hang on. | 19:05 |
*** masber has joined #openstack-infra | 19:05 | |
dmsimard | fungi: gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial http://logs.openstack.org/21/474721/7/check/gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial/d8cdf1d/logs/ara/ | 19:06 |
*** slaweq_ has quit IRC | 19:06 | |
*** baoli_ has quit IRC | 19:06 | |
*** slaweq_ has joined #openstack-infra | 19:06 | |
dmsimard | that one likely has a bunch of files :( | 19:07 |
fungi | dmsimard: oh, yeah, /srv/static/logs/21/474721/7/check/gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial/d8cdf1d contains 10k files, so rivalling tripleo jobs | 19:07 |
*** AJaeger has quit IRC | 19:07 | |
*** lukebrowning has joined #openstack-infra | 19:07 | |
dmsimard | gate-openstack-ansible-openstack-ansible-ceph-ubuntu-xenial should be on about the same level | 19:07 |
*** SumitNaiksatam has quit IRC | 19:08 | |
pabelanger | okay, ubuntu-trusty DIBs uploading | 19:08 |
fungi | and we're back under 10k free inodes. so we're really still not keeping pace with the rate at which new builds are uploading logs (or maybe only barely) | 19:08 |
pabelanger | afk now | 19:08 |
clarkb | maybe it's worth a general email explaining that we shouldn't be copying all of /etc | 19:09 |
clarkb | but ya ara is the bigger chunk of the pie for osa at least | 19:09 |
*** AJaeger has joined #openstack-infra | 19:09 | |
dmsimard | I'll try and think of the plumbing involved in shifting from static reports to sqlite to something central | 19:09 |
*** camunoz has joined #openstack-infra | 19:10 | |
*** masber has quit IRC | 19:10 | |
fungi | how terrible would ara performance be if the report files were passed around as a bundle (tarball or something) and unpacked on the fly? | 19:11 |
*** baoli has joined #openstack-infra | 19:11 | |
fungi | i guess you'd need some backend support to deal with that, or end up transferring all the data to the browser as a giant blob up-front | 19:12 |
clarkb | odyssey4me: ^ fyi, any chance you can workto clean up the collection of /etc in your jobs? | 19:12 |
fungi | 3 inodes free :/ | 19:12 |
dmsimard | fungi: unpacked on the fly? I've never done something like this before -- right now every file is gzipped individually and then there are the necessary mime types to make the webserver extract them on the fly | 19:13 |
fungi | yeah, we're back to returning POST_FAILURE again | 19:13 |
clarkb | odyssey4me: http://logs.openstack.org/44/479844/6/check/gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial/1541ded/logs/etc/ has 2111 inodes in use and a bunch of that is copying stuff that isn't really relevant to the jobs | 19:13 |
dmsimard | fungi: We would need some sort of middleware ? | 19:13 |
dmsimard | evrardjp, cloudnull ^ see clarkb's question | 19:13 |
fungi | dmsimard: yeah, probably unless you serialized all the report data into a single file | 19:13 |
fungi | which would likely be a big hit browser-side, i'm guessing | 19:14 |
fungi | (hit to performance, not hit on the solid gold singles chart) | 19:14 |
* cloudnull reading | 19:15 | |
*** gouthamr has quit IRC | 19:15 | |
dmsimard | fungi: It's not very realistic, no, there's too much data to display everything in one single file. I'll try and think of something relative to the sqlite database instead. The sqlite database is several orders of magnitude smaller than even the gzipped static report, not to mention it's just one file. | 19:16 |
clarkb | oh nice that sounds like win win win | 19:16 |
clarkb | dmsimard: is that a useable feature today? we just have to turn it on? | 19:16 |
cloudnull | ^ I can go kick that out to the tests repo, if so | 19:17 |
fungi | dmsimard: oh! neat, i didn't realize sqlite could do multiple tables in one file, but i'll admit i've done very little with it so far | 19:17 |
cloudnull | clarkb: we have the log/config collection tasks within the tests role. are we needing to just prune that back? | 19:18 |
dmsimard | clarkb: Well the sqlite database already exists, that's where the callback saves its data and from where the web interface reads it. The static report generation is more or less a crawler that crawls all the pages of the interface and generates static files out of every page. | 19:18 |
clarkb | cloudnull: ya if we could stop grabbing all of /etc and multiple copies of it that would be good. | 19:19 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs https://review.openstack.org/511555 | 19:19 |
evrardjp | cloudnull: I am off for today, but if you're doing something to turn the whole etc collection into an archive, that would be great | 19:19 |
clarkb | cloudnull: it's fine to copy things relevant to the job, openstack service logs and config or whatever | 19:19 |
evrardjp | clarkb: it's multiple copies because it's multiple "hosts" | 19:19 |
clarkb | cloudnull: but do you really need all of rc.* and fonts and logrotate and so on | 19:19 |
dmsimard | fungi: BonnyCI ran with a mysql database of like 42 000 playbook runs :) | 19:19 |
evrardjp | we generally need it, but it can definitely be an archive | 19:19 |
clarkb | evrardjp: right, the problem is that stuff like ^ is all going to be identical and has no relevance to the job really | 19:19 |
clarkb | evrardjp: it's fine to copy the bits that are relevant to the job and different, like openstack service config | 19:20 |
evrardjp | exactly | 19:20 |
evrardjp | I agree | 19:20 |
evrardjp | plus one or two locations, like apt sources | 19:20 |
evrardjp | or yum repos | 19:20 |
dmsimard | fungi: the challenge here is to go from a sqlite database saved on logs.o.o to an interface, somehow -- whether that's a centralized instance, or something generated on the fly from that database | 19:20 |
evrardjp | the rest doesn't matter | 19:20 |
evrardjp | and in all cases we can iterate later to add some small stuff | 19:20 |
cloudnull | yea I think we did an /etc/.* just because it was easy, we could be a lot more tactical | 19:21 |
evrardjp | but I think we should generally not ship those files directly | 19:21 |
evrardjp | we should just archive those | 19:21 |
dmsimard | fungi: ultimately, it's sqlalchemy with a sqlite connection string -- sqlite is usually on the filesystem. Maybe we can work out something that uses the sqlite database over http or something like that. | 19:21 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs https://review.openstack.org/511555 | 19:21 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-python{34,35} jobs https://review.openstack.org/511557 | 19:21 |
evrardjp | if we want the detail, we download and unarchive | 19:21 |
clarkb | evrardjp: sounds like a plan then? prune and archive? | 19:21 |
evrardjp | archive and don't even collect | 19:21 |
evrardjp | so when the instance is destroyed, we don't care anymore | 19:22 |
fungi | dmsimard: yeah, still needs some backend support, like a filter callout in apache maybe (that's how we do the fancy clickable log stuff with os_loganalyze) | 19:22 |
evrardjp | I had a very long day, so I'd be happy if someone can take over this... cloudnull? | 19:22 |
cloudnull | sure thing | 19:23 |
cloudnull | looks like we just need to adjust https://github.com/openstack/openstack-ansible-tests/blob/master/test-log-collect.sh#L40-L62 | 19:23 |
cloudnull | which will take the pressure off every one of our roles | 19:23 |
cloudnull | for now we could comment all that out | 19:23 |
cloudnull | and then work it back in | 19:24 |
evrardjp | same for var log, we can only keep what's interesting for us | 19:24 |
evrardjp | anything that can unblock others is good... But we still need logs in the end, because losing them reduces our ability to use gate results, and we don't want to spend cycles for nothing either :p | 19:26 |
jeblair | fungi: back. looks like there's little progress on inodes? | 19:26 |
clarkb | ya there is a balance to be reached | 19:26 |
clarkb | with devstack-gate we try to add things when we notice we need them and be specific | 19:26 |
clarkb | and we remove things as we notice they aren't useful too | 19:27 |
clarkb | rather than just wholesale copying (so I think pruning and archiving to a single file is a big win there, thanks) | 19:27 |
*** andreww has quit IRC | 19:27 | |
evrardjp | clarkb: yeah, I guess here we noticed "we need /etc/<something> " | 19:27 |
evrardjp | and then yes we need /etc/<somethingelse> | 19:27 |
fungi | jeblair: yes and no. i saw it fall all the way back to 0 but now we're nearing 300k free again | 19:27 |
evrardjp | and then it finished to be a boatload of things | 19:27 |
evrardjp | which is obviously wrong :p | 19:27 |
jeblair | fungi: what's the next most dramatic step we can take? | 19:28 |
*** xarses has joined #openstack-infra | 19:28 | |
fungi | jeblair: a few potential options: artificially constrain our nodepool quota, disable some of the top-offender jobs, or delete entire subtrees of the filesystem | 19:28 |
jeblair | fungi: oh! what if i produced a list of v3 check jobs from zuul logs and we just rm-rfd those paths? | 19:29 |
*** eharney has joined #openstack-infra | 19:29 | |
clarkb | jeblair: ++ | 19:29 |
*** baoli has quit IRC | 19:29 | |
fungi | jeblair: maybe... deleting jobs by name still seems to go pretty slowly mainly because there are a lot of wildcarded parent directories to get to the job names | 19:29 |
cloudnull | evrardjp: https://review.openstack.org/511560 | 19:30 |
jeblair | fungi: i'm talking exact paths | 19:30 |
jeblair | fungi: i should have said 'build' rather than 'job' :) | 19:30 |
*** baoli has joined #openstack-infra | 19:30 | |
fungi | jeblair: oh, yeah i could probably loop over those pretty easily | 19:30 |
jeblair | lemme see what i can produce | 19:30 |
cloudnull | we can add it back once there's less pressure, and we have some time to think about everything that we might really need. | 19:30 |
*** camunoz has quit IRC | 19:31 | |
fungi | thanks jeblair! | 19:32 |
evrardjp | cloudnull: ok. Alternative would be to tar them | 19:32 |
evrardjp | let's already do this | 19:32 |
fungi | i still worry that tarring up files you don't know for sure you need just avoids doing the actual work of figuring out what information is actually useful | 19:32 |
cloudnull | ++ | 19:33 |
cloudnull | I think it'd be better to get a list together of what we really need | 19:33 |
fungi | and makes it easier to never get around to working through that | 19:33 |
evrardjp | fungi: oh yes, I mean tarring only what's useful | 19:33 |
fungi | oh, got it. that would be cool as long as they're not useful to browse directly on a frequent basis | 19:34 |
evrardjp | cloudnull: I updated the commit message with the reason and let's do it | 19:34 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 19:34 |
cloudnull | we could do something like an include list from file and use that with our existing rsync commands so that its easy to add and remove as needed. | 19:34 |
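One way that include-list idea could look, as a sketch only; the list contents, the `TEST_HOST`/`WORKSPACE` variables, and the rsync flags are assumptions rather than what openstack-ansible-tests actually does:

```shell
# Pull only explicitly listed paths from the test host instead of all of /etc.
cat > /tmp/log-collect-list.txt <<'EOF'
etc/nova/nova.conf
etc/neutron/neutron.conf
var/log/nova
EOF
# --files-from disables the recursion implied by -a, so pass -r explicitly
# if listed directories should be copied in full.
rsync -az -r --files-from=/tmp/log-collect-list.txt \
    "${TEST_HOST}:/" "${WORKSPACE}/logs/"
```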
ianw | jeblair / pabelanger : looks like current status on the mirror is trying to rebuild the checksums.db, is that right? | 19:34 |
evrardjp | cloudnull: yes, that's what I thought, basically saying WHAT we really want to collect. | 19:34 |
cloudnull | but it'd be good to circulate that with the osa community so that we make sure we get everything useful for our folks | 19:34 |
evrardjp | let me merge your patch quick then | 19:34 |
jeblair | ianw: yes. i have added space to partition and added quota to volume, rebooted all servers involved, and am running the checksum rebuild with a db directory on local disk. | 19:35 |
cloudnull | evrardjp: ok. | 19:35 |
*** andreas_s has joined #openstack-infra | 19:35 | |
jeblair | ianw: (that way we avoid afs write errors on the db). | 19:36 |
cloudnull | assuming jenkins doesn't kick us in the teeth that should be merged soon, which will have immediate impact on ALL of our role jobs. | 19:36 |
clarkb | I'm semi-manually going through and clearing out tmp/ansible from tripleo change logs | 19:36 |
clarkb | cloudnull: thanks | 19:36 |
clarkb | cloudnull: I think we are making progress on the inode front so optimistic it will get through | 19:37 |
fungi | oh, wow, we're up over half a million inodes free now! i suspect we owe some of this to job volume falling now that zuul is no longer backlogged | 19:37 |
jeblair | ianw: the immediate gnutls issue has been resolved by uploading new images built from our mirror. | 19:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 19:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 19:37 |
clarkb | fungi: ya and I've cleared out about 100k so far | 19:37 |
cloudnull | sorry for the issues fungi clarkb. | 19:37 |
*** florianf has quit IRC | 19:38 | |
jeblair | clarkb, fungi: should we delete all v3 check and check-tripleo pipeline builds? | 19:38 |
ianw | jeblair: excellent, thanks; i am glad that works. dib's "use this mirror during build" is maybe not as robust as i'd like | 19:39 |
jeblair | ianw: you can see recent project-config changes merged to do that if you want to retro-review it | 19:39 |
clarkb | jeblair: I think that would make a significant impact, I'd be in favor | 19:39 |
mordred | jeblair: yah. I'm also in favor | 19:40 |
fungi | jeblair: sure, i expect a majority of them to exhibit failures for issues we've since fixed | 19:40 |
*** andreas_s has quit IRC | 19:40 | |
jeblair | i have a list of 132272 zuulv3 builds. 119842 of which are check* | 19:40 |
fungi | so probably could stand fresh check results anyway | 19:40 |
ianw | and i see system-config seems green, so that's good too | 19:40 |
jeblair | fungi: static.openstack.org:~corvus/log-delete | 19:42 |
fungi | we've just reached 0.1% free inodes now, so about 1/10th of what i'd like to see freed up before we #status ok | 19:42 |
fungi | thanks jeblair! i'll start culling those | 19:42 |
jeblair | fungi: cool, thanks | 19:42 |
fungi | worth noting, my deletion of /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/{und | 19:42 |
fungi | ercloud/tmp/ansible,ara_oooq,undercloud/etc} | 19:42 |
fungi | finally completed | 19:42 |
ianw | cool, the only reason not to use DIB_DISTRIBUTION_MIRROR is that it leaves the mirror behind in the image. which doesn't matter in this case, but i didn't feel was suitable for the general case | 19:42 |
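For reference, a minimal sketch of driving that variable by hand; the mirror URL and element list are illustrative:

```shell
# Point the ubuntu-minimal element at a nearby mirror for the build.
# Unless cleaned up in a later phase, the same URL stays in the image's
# sources.list, which is the caveat being discussed here.
export DIB_DISTRIBUTION_MIRROR=http://mirror.example.openstack.org/ubuntu
disk-image-create -o ubuntu-xenial ubuntu-minimal simple-init vm
```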
mordred | fungi: \o/ | 19:42 |
fungi | yay stray newlines in my clipboard | 19:42 |
dmsimard | clarkb, fungi: I was discussing with a colleague.. looking at https://github.com/openstack-infra/puppet-openstackci/blob/master/files/log_archive_maintenance.sh#L4-L10 would it make sense to do a .tar.gz archive of the whole job logs instead of just gzipping every file ? | 19:43 |
clarkb | dmsimard: the reason to not do that is for browseability in your web browser | 19:43 |
dmsimard | yeah, I get that | 19:43 |
clarkb | if you tarball everything the nyou have to download and extract locally | 19:44 |
dmsimard | but past a certain threshold, meh, I don't know | 19:44 |
clarkb | fungi: oh I think ansible should have been ansible* | 19:44 |
fungi | dmsimard: they cease to be browsable but i suppose we could consider doing that for logs over a week old or something | 19:44 |
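A rough sketch of that "archive anything older than a week" idea; the depth, age, and paths are guesses for illustration, not what the maintenance script currently does:

```shell
# Tar up individual build directories older than 7 days and drop the
# originals; -execdir writes the tarball next to the directory it replaces.
find /srv/static/logs -mindepth 6 -maxdepth 6 -type d -mtime +7 \
    -execdir sh -c 'tar -czf "$1.tar.gz" "$1" && rm -rf "$1"' _ {} \;
```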
ianw | jeblair: was I right that zuulv2 didn't reload to see https://review.openstack.org/#/c/511360/? | 19:44 |
dmsimard | otherwise, we could consider rotating logs off to another node (cold storage) or something | 19:44 |
fungi | clarkb: thanks, i'll add that in a separate pass | 19:44 |
dmsimard | just trying to think of different other things that could help | 19:44 |
clarkb | fungi: actually you can just rm undercloud/tmp | 19:45 |
clarkb | fungi: since the only content there is the ansible related stuff | 19:45 |
dmsimard | fungi: maybe a threshold between 10 and 30 days, I don't know. Just saying the likelihood of someone looking at logs >1 week old gets increasingly smaller, and on that topic, it'd probably be interesting to look at apache logs to get some stats on what people are looking at. | 19:45 |
fungi | clarkb: thanks, that'll help | 19:45 |
ianw | dmsimard: heh, .tar.gz is my oldest "one day i'll fix this" change -> https://review.openstack.org/#/c/122615/ (not related to reducing inodes though) | 19:46 |
fungi | dmsimard: we already were only keeping 30 days and that had us at 95% blocks used | 19:46 |
ianw | "log packages are around 7MiB from my testing. This is big but not ridiculous." i think this is no longer true | 19:46 |
fungi | so we needed to reduce retention for now anyway (and i've effectively dropped it to 14 days with the pass currently underway) | 19:46 |
dmsimard | ianw: that review is very interesting, we were actually discussing something like that earlier fungi and I | 19:47 |
dmsimard | ianw: the problem with ARA is that while it's not big, it's a lot of smaller files and we could perhaps make a .tar.gz and serve that instead | 19:47 |
dmsimard | However I'm looking at another possibility right now, involving just having to store the sqlite database | 19:48 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add a toggle to disable ARA static report generation https://review.openstack.org/511528 | 19:48 |
jeblair | ianw: re zuulv2 and pip8; i don't know. i forgot about that one. | 19:49 |
fungi | jeblair: i'm using your list thusly: cd /srv/static/logs/ ; cat ~corvus/log-delete | xargs rm -rf | 19:49 |
jeblair | fungi: that sounds about right | 19:49 |
fungi | actually, i think i'm going with my earlier plan for safety | 19:50 |
clarkb | we just passed 1 million free | 19:51 |
jeblair | fungi: for loop? | 19:51 |
*** vhosakot has joined #openstack-infra | 19:51 | |
fungi | jeblair: sed s,^,/srv/static/logs/, ~corvus/log-delete | xargs rm -rf | 19:51 |
jeblair | fungi: heh, that one makes me nervous; i'd sed that to a new file and just xargs from that file | 19:52 |
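Spelled out, the safer two-step variant being suggested might look like this (scratch file name assumed):

```shell
sed 's,^,/srv/static/logs/,' ~corvus/log-delete > /tmp/log-delete-abs
# Audit before deleting: no ".." and no doubled slashes from absolute paths.
grep -e '\.\.' -e '//' /tmp/log-delete-abs || echo "list looks clean"
xargs rm -rf < /tmp/log-delete-abs
```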
fungi | jeblair: fair, i just re-audited the file to make sure it contains no absolute paths | 19:52 |
fungi | and no ".." | 19:53 |
fungi | deletion underway just catting the file and treating them as relative paths | 19:53 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Add publish-deploy-guide job https://review.openstack.org/511563 | 19:54 |
fungi | we're finally well over a million inodes free | 19:54 |
*** baoli has quit IRC | 19:55 | |
fungi | so i have hopes the current several patterns/lists under deletion will get us to our 1% free #status ok in relatively short order now | 19:55 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: convert deploy-guide to native zuul v3 https://review.openstack.org/511564 | 19:57 |
mordred | jeblair: good idea with the v3 job list! | 19:57 |
*** harlowja has joined #openstack-infra | 19:57 | |
fungi | definitely. this should knock out a good chunk | 19:57 |
fungi | along with the adjusted pattern for tripleo-ci ansible tempfiles and halving retention | 19:58 |
*** pcaruana has quit IRC | 19:58 | |
mordred | ++ | 19:58 |
*** baoli has joined #openstack-infra | 19:59 | |
fungi | and with a few teams making headway on reducing the number of files they're collecting, we should be in better shape in a couple weeks when we get back to a month of logs | 19:59 |
*** baoli has quit IRC | 19:59 | |
SamYaple | just switch to btrfs with dynamic inodes. simple. there have never been issues at scale with btrfs. | 20:00 |
mordred | SamYaple: yah - I can't see any potential issues with that at all | 20:01 |
jeblair | 2m inodes now, i clocked it at +125826 inodes/sec | 20:01 |
fungi | SamYaple: reiserfs also didn't have an inode maximum | 20:01 |
jeblair | er | 20:01 |
jeblair | 2m inodes now, i clocked it at +125826 inodes/min | 20:01 |
jeblair | /sec would be truly impressive. | 20:01 |
mordred | jeblair: that other rate would hve been amazing | 20:01 |
jeblair | so maybe 45m to get to 7m free? | 20:02 |
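That estimate roughly checks out against the numbers mentioned earlier (about 2m free now, ~7.7m needed for 1% free, ~125,826 freed per minute):

```shell
echo $(( (7700000 - 2000000) / 125826 ))   # => 45 (minutes)
```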
SamYaple | fungi: i dont want to... murder... performance though | 20:02 |
fungi | my freighter can make the kessel run in 125826 parsecs | 20:02 |
fungi | SamYaple: ooh, too soon | 20:02 |
clarkb | I've really been enjoying zfs locally | 20:02 |
clarkb | but scrubbing 12 TB of logs is probably very slow :/ | 20:03 |
SamYaple | yea zfs is da bomb for alot of things. but it has its weaknesses | 20:03 |
dmsimard | clarkb, fungi: I was reading ianw's middleware patch for serving tgz's ( https://review.openstack.org/#/c/122615/ ) and it gave me an idea.. how about we always create the /ara/ log directory with the ara sqlite database in it, and then a middleware intercepts requests to that directory: if a static report is not generated, it generates it? It doesn't sound overly complex to achieve and it would make it so the ara reports would only be generated on demand and if required | 20:03 |
fungi | clarkb: really, the slowness is the nearly a billion inodes | 20:03 |
fungi | not so much the block size | 20:03 |
clarkb | fungi: ya though it checksums all the data too iirc | 20:04 |
clarkb | at a block level | 20:04 |
openstackgerrit | Merged openstack-infra/shade master: Fix image task uploads https://review.openstack.org/511532 | 20:04 |
SamYaple | all data and metadata, yup | 20:04 |
fungi | clarkb: oh, so bandwidth hit | 20:04 |
clarkb | its so stupid simple to use | 20:04 |
clarkb | really like the simplicity of it | 20:04 |
SamYaple | and it can generate block devices | 20:04 |
SamYaple | its really nice | 20:05 |
SamYaple | thinly or thick provisioned block devices at that | 20:05 |
clarkb | would be two commands to have our current logs lvm set in place | 20:05 |
fungi | clarkb: i'll reserve "simple to use" for something with mainline kernel support i can use for my boot/rootfs | 20:06 |
SamYaple | fungi: 16.04 started including it | 20:07 |
SamYaple | fungi: so you can totally do that by default | 20:07 |
fungi | i suppose if i were to switch to freebsd it would be mainline | 20:07 |
clarkb | I mean ext4 + lvm is also relatively simple, just more verbose | 20:08 |
fungi | well, ubuntu is shipping out-of-tree kernel drivers for zfs, right? i thought the cddl was incompatible with the gplv2 | 20:08 |
clarkb | fungi: ya its a module | 20:08 |
SamYaple | there was a big license uproar about it, but it landed as "meh" | 20:08 |
SamYaple | the whole thing was "prove damages" and no one could | 20:08 |
SamYaple | so it didnt really go anywhere | 20:08 |
fungi | clarkb: yeah, all my personal systems boot from lvm2 | 20:08 |
clarkb | ya Fontana had a talk about it at seagl | 20:09 |
fungi | grub has fine support for searching logical volumes these days | 20:09 |
SamYaple | and zfs ;) | 20:09 |
clarkb | tldr: hard to show damages because source is provided on both sides and they aren't charging money for it that you'd be able to charge elsewhere | 20:09 |
SamYaple | that was my takeaway too | 20:09 |
clarkb | fungi: my zfs box is booting lvm + ext4 on a dedicated device which then mounts the zfs pool | 20:09 |
fungi | heh | 20:09 |
clarkb | approaching 2 million inodes | 20:10 |
clarkb | also scrubs are auto niced for you | 20:11 |
fungi | it's like we've finally reached warp factor 2 | 20:11 |
clarkb | so in theory they don't have major impact | 20:11 |
fungi | and the reactor hasn't even shaken apart | 20:12 |
SamYaple | warp 2 on which scale? | 20:12 |
SamYaple | this is important | 20:12 |
fungi | oh, cochrane scale, sorry | 20:13 |
clarkb | jeblair: what was the magic sauce for handling all those emails back in the day and inode counts? I imagine that was a very high inode to disk ratio? | 20:14 |
dmsimard | before we got sidetracked by filesystems discussion I was trying to brainstorm about solutions to help with the unfortunate contribution of ARA to the inode exhaustion :p Another low hanging fruit would be to consider generating an ara report only when there is a job failure | 20:14 |
clarkb | dmsimard: ya that was the idea pabelanger had earlier, I like it because I really only look at ara when things have broken | 20:14 |
pabelanger | and back | 20:14 |
pabelanger | catching up on backscroll | 20:14 |
clarkb | that seems like a relatively easy intermediate fix | 20:14 |
fungi | clarkb: "back in the day" your mailbox was one file which kept getting appended to | 20:15 |
SamYaple | was this inode issue the underlying problem with the mirror? | 20:15 |
dmsimard | clarkb, jeblair: would a post job know that the job is going to fail ? | 20:15 |
dmsimard | I guess the executor knows, but it's probably not passed on as a piece of information to the post jobs | 20:15 |
pabelanger | ianw: ya, the configure-mirror role should protect us with DIB_DISTRIBUTION_MIRROR; however, we could also fix it in finalize.d if we wanted | 20:15 |
*** gouthamr has joined #openstack-infra | 20:16 | |
mordred | dmsimard: yah - I believe there is a status variable that the post job should know about | 20:16 |
clarkb | SamYaple: no they were separate issues | 20:16 |
mordred | dmsimard: it's called 'success' | 20:16 |
clarkb | SamYaple: mirror is on afs, logs inode is a 12TB ext4 fs | 20:16 |
SamYaple | got it. and also ouch | 20:17 |
clarkb | ya when it rains it pours | 20:17 |
SamYaple | all this unrelated to rolling out zuulv3 ya? | 20:17 |
pabelanger | okay, ubuntu-trusty DIB uploading to rackspace and citycloud-kna1 still | 20:17 |
dmsimard | mordred: zuul_success ? | 20:18 |
dmsimard | mordred: like http://git.openstack.org/cgit/openstack-infra/project-config/tree/roles/submit-logstash-jobs/tasks/main.yaml#n6 | 20:18 |
clarkb | SamYaple: other than zuulv3 double running jobs potentially adding inodes to the logs fs, correct | 20:18 |
mordred | dmsimard: yes! | 20:18 |
fungi | SamYaple: correct (discounting that we were adding a few additional build logs from running extra copies of a lot of jobs under v3 which likely didn't help matters) | 20:18 |
SamYaple | man. craziness | 20:18 |
dmsimard | mordred: ok, I'll send a patch. | 20:18 |
*** hasharAway has quit IRC | 20:18 | |
mordred | dmsimard: cool - I think that'll buy us time to think about some of the other options | 20:19 |
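For context, the approach being discussed is a conditional in the log-collection post playbook: only invoke the ARA report role when the job did not succeed. A minimal sketch of that pattern, assuming a playbook running on the executor and a placeholder role name (the real zuul-jobs role and its exact layout may differ):

```yaml
# Sketch only: gate ARA report generation on the job result.
# 'generate-ara-report' is a placeholder role name, not the actual zuul-jobs role.
- hosts: localhost
  tasks:
    - name: Generate the ARA report only when the job failed
      include_role:
        name: generate-ara-report
      # zuul_success is the result variable mordred refers to; it may arrive
      # as a string, so cast it with the bool filter before negating.
      when: not (zuul_success | bool)
```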
pabelanger | mordred: Shrews: do you think the new shade release will be out today to address rackspace uploads? I'm asking because we might have to remove ubuntu-trusty from rackspace, since it is broken | 20:19 |
*** esberglu has quit IRC | 20:19 | |
fungi | pabelanger: honestly, we run so few jobs on trusty at this point that having it missing from a few regions for a while won't hurt us much | 20:20 |
mordred | pabelanger: yes - it should be not too much longer | 20:20 |
pabelanger | fungi: yah, that is true | 20:20 |
fungi | i would just go ahead and delete it there regardless of the timeline for getting a replacement uploaded | 20:20 |
pabelanger | fungi: I'll propose the patch | 20:21 |
fungi | pabelanger: we can't just delete the trusty images there and wait for a corrected upload to eventually work? | 20:21 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add a toggle to enable saving the ARA sqlite database https://review.openstack.org/511529 | 20:22 |
*** esberglu has joined #openstack-infra | 20:22 | |
pabelanger | fungi: Hmm, I think we could | 20:22 |
pabelanger | that might be better | 20:22 |
pabelanger | fungi: I'll start with rax-ord and see | 20:23 |
*** ijw has quit IRC | 20:25 | |
*** ijw has joined #openstack-infra | 20:25 | |
mnaser | when all this zuulv3 stuff settles, i would like to work with some infra core to add some monitoring, these things are so much easier to solve when you know they're coming up beforehand :( | 20:26 |
mnaser | i can share some of the stuff we do and the tooling of how to do it in a distributed way (mostly stateless sensu-server, servers define their own checks in sensu-client) .. but yeah, i think it'd make all of our lives easier to find out about issues in advance (hopefully) | 20:27 |
clarkb | mnaser: at the ptg the rough plan was to have a spec detailing the options available then going from there (just because there are so many tools and they have their own strengths and weaknesses) | 20:27 |
clarkb | mnaser: I think we were initially wary of sensu due to its open core nature and the need to run a message bus for it | 20:28 |
clarkb | (but it should be on the list of options probably) | 20:28 |
mnaser | clarkb i have a document going over many of the OSS monitoring tools and why we ended up at sensu so i'll find that and share it | 20:28 |
*** AJaeger has quit IRC | 20:29 | |
clarkb | 2.5million | 20:29 |
*** kgiusti has left #openstack-infra | 20:29 | |
*** rbrndt has joined #openstack-infra | 20:29 | |
mnaser | yeah... but honestly, we haven't run into any issues where we were like "dang, we'd want the enterprise for this one" .. the nice thing is that you don't have to maintain the checks in the server (unlike most other tools) but at the client, which makes it cleaner in writing puppet manifests and what not, but anyways, /me puts name down for that | 20:29 |
*** rbrndt has quit IRC | 20:29 | |
dmsimard | mnaser: I'm accountable for drafting a spec to do proactive monitoring | 20:30 |
dmsimard | mnaser: I signed up for that :) | 20:30 |
mnaser | oh even better :> | 20:30 |
*** jkilpatr_ has quit IRC | 20:30 | |
pabelanger | Shrews: when you have a moment, I'm not sure why image-delete is not working. I get back 'Image upload not found' | 20:30 |
pabelanger | Shrews: sudo -H -u nodepool nodepool image-delete --provider rax-ord --image ubuntu-trusty --upload-id 0000002659 --build-id 0000000001 | 20:30 |
dmsimard | mnaser: I believe it was part of when we discussed https://etherpad.openstack.org/p/queens-infra-metric-collection | 20:31 |
* fungi would prefer to see something based on an established standard, ideally snmp, but is willing to entertain other options | 20:31 | |
pabelanger | I don't mind nagios, after all these years | 20:31 |
fungi | definitely not a fan of monitoring systems which need server-side agents speaking nonstandard protocols | 20:31 |
fungi | i did deal with nrpe for many years on that front, but having proper snmp backends to check this is so much nicer | 20:32 |
*** e0ne has joined #openstack-infra | 20:32 | |
fungi | and net-snmp is very extensible if you want to write your own extensions for custom mibs | 20:32 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs https://review.openstack.org/511555 | 20:33 |
Shrews | pabelanger: hrm, not sure. may need to do some digging | 20:33 |
clarkb | fun question time. Assuming we've got the inodes and ubuntu mirror stuff under control, is the last outstanding item for v3 re-rollout 511260 to fix cache usage? | 20:33 |
fungi | and anyway, the two major issues we have would have been spotted by 1. trending inode usage for filesystems (there is a standard oid for that, easy enough to check over snmp) and 2. evaluating the last updated timestamps on our mirrors (these can be polled over http and analyzed quite trivially) | 20:34 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-.*python{34,35} jobs https://review.openstack.org/511557 | 20:34 |
fungi | clarkb: as far as i know, yes (well, and getting check/periodic pipelines added back i guess so people can dry-run their v3 jobs again) | 20:35 |
jeblair | infra-root, dmsimard: when folks have a moment, i'd like to have some semi-structured conversation about 1) options for ara in v3 followed by 2) refreshing the rollout plan for v3 | 20:36 |
clarkb | I'm good now | 20:36 |
jeblair | assuming we think that fires are out enough we can do that while we wait for bg tasks to complete | 20:36 |
*** rbrndt has joined #openstack-infra | 20:37 | |
fungi | jeblair: yep, i think we're in a good place for that now. inode usage has been dropping steadily rather than increasing for a while | 20:37 |
dmsimard | jeblair: I am working on a patch for emit-ara-html to be able to only generate a report on job failure, it's a low hanging fruit that we can put through fairly easily. | 20:37 |
fungi | thanks dmsimard! | 20:37 |
jeblair | let's use this etherpad: https://etherpad.openstack.org/p/hdYC2ZKfWd | 20:37 |
dmsimard | jeblair: Beyond that, it requires a bit of thinking outside the box -- whether that's figuring out how to translate a sqlite database to an ara interface on the fly somehow, or use a centralized instance, etc. | 20:37 |
jeblair | dmsimard: yeah -- let me articulate my current thinking: | 20:38 |
jeblair | point 1: we think running ara on every v3 job is bad for inodes | 20:38 |
jeblair | point 2: we want to roll out v3 soon | 20:38 |
jeblair | point 3: we should come up with short-term solutions to give us breathing room to roll out v3 | 20:39 |
Shrews | pabelanger: oh, i think you have upload-id and build-id backwards | 20:39 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Remove ubuntu-trusty from rackspace https://review.openstack.org/511570 | 20:39 |
pabelanger | Shrews: oh, maybe | 20:39 |
jeblair | point 4: there are long term changes that may make this better | 20:39 |
pabelanger | Shrews: let me test | 20:39 |
jeblair | so i'm thinking we mostly need to decide on a short term solution now to give us room to roll out v3 and implement long-term solutions | 20:39 |
pabelanger | Shrews: Better! Thanks | 20:40 |
jeblair | assuming we accept that point 1 is valid :) | 20:40 |
mordred | I agree with those 4 points and the goal | 20:40 |
* Shrews debugs pabelanger | 20:40 | |
*** hamzy has quit IRC | 20:40 | |
mordred | (like, I think that making it so that we can keep running ara on every change IS a thing that we want to do - but that is also likely to take slightly longer) | 20:40 |
pabelanger | thanks! it is now deleting | 20:40 |
*** Apoorva_ has joined #openstack-infra | 20:41 | |
dmsimard | I agree with that as well, however I'd appreciate highlighting that while ara contributes to the issue it is not solely responsible :( | 20:41 |
SamYaple | silly question, what is the ARA thing? | 20:41 |
clarkb | SamYaple: ara has a floor of like 400 files per job and a ceiling much higher depending on the job, so it uses a lot more inodes than before, when jobs might have had a couple files logged | 20:41 |
dmsimard | SamYaple: this: http://logs.openstack.org/21/474721/7/check/gate-openstack-ansible-openstack-ansible-ceph-ubuntu-xenial/779e047/logs/ara/ | 20:41 |
*** e0ne has quit IRC | 20:41 | |
clarkb | In theory our jobs succeed more than they fail and ara is a useful debugging tool. I'd be inclined to start with ara only on failure and if that isn't enough then possibly just remove it by default? | 20:42 |
SamYaple | oh i see | 20:42 |
clarkb | would it be possible to have ara locally accept the json file and emit a report? | 20:42 |
dmsimard | Going back to my previous statement, I'd like to make sure that we follow up with the other projects to make sure they are not needlessly logging things | 20:42 |
*** eharney has quit IRC | 20:42 | |
clarkb | so that we can continue logging the json file and then only feed it to ara if you know you want it? | 20:42 |
clarkb | dmsimard: yes we should continue pushing on that too | 20:43 |
jeblair | yes, though we also have longer term plans to rework logging so we may not care as much. | 20:43 |
dmsimard | clarkb: I am looking into doing something a bit like what you are proposing but with the sqlite database instead. Running off of the JSON would require more work. | 20:43 |
*** eharney has joined #openstack-infra | 20:43 | |
mordred | clarkb, dmsimard: I think that falls into the category of "medium to longer term we can make improvements to how we're using ARA or how ARA works or whatnot" | 20:43 |
*** Apoorva has quit IRC | 20:43 | |
jeblair | for instance, following mordred's proposal to its conclusion means we get to the point where we say "every job gets 100MB. put whatever you want in there. it goes in swift. we don't care" | 20:44 |
jeblair | so i think it's worthwhile to push back on some really large inode jobs, but i think that's not a long-term sustainable strategy. | 20:44 |
* dirk has a few inodes to give away | 20:44 | |
* mordred takes dirk's inodes | 20:44 | |
clarkb | jeblair: swift too has inode like limits last I looked into it | 20:45 |
mordred | jeblair, clarkb, dmsimard: so far I'm the biggest fan of dmsimard's change to run ara only on failure | 20:45 |
clarkb | jeblair: basically there is a ceiling on reasonable number of objects within a container to maintain performance | 20:45 |
pabelanger | I like #2 so far, but this isn't long term right? | 20:45 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 20:45 |
mordred | clarkb: nod - but once we get to that point we'll have a good central place in-which to place limits | 20:45 |
mordred | pabelanger: right - only short term | 20:45 |
jeblair | what's the collecting the sqlite file option? | 20:45 |
notmyname | clarkb: that "ceiling" is rather large, and it doesn't affect client performance | 20:46 |
jeblair | i also favor ara-on-failure at the moment, but i do want to make sure we survey the options | 20:46 |
mordred | jeblair: ++ | 20:46 |
clarkb | notmyname: I think it's roughly in the range of our current inode limit though, ~1 billion | 20:46 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 20:47 |
fungi | jeblair: doing something server-side with ara to render reports out of a database file in a similar way to how we use osla to render log files | 20:47 |
mordred | jeblair: dmsimard was investigating something wsgi-like to do the report generation only on-demand | 20:47 |
mordred | fungi: jinx | 20:47 |
notmyname | clarkb: but it's a per-container thing. so if you're doing a new container per job, that's ok. or container per project may be better | 20:47 |
jeblair | ah, interesting. i feel like that's probably a long-term thing. like, we should be weighing that against running a central server, (or static generation into swift) | 20:48 |
fungi | notmyname: agreed, and we could be doing something similar with local filesystems, but what we're doing now is akin to dumping them all in one container | 20:48 |
jeblair | dmsimard: is server-side sqlite generation a 2 day project or longer? | 20:48 |
mordred | yah - I think that's not a 'by tomorrow' kind of option and might take a little longer for us to be comfortable with it- especially since the main value in the ara reports is helping to diagnose job issues | 20:48 |
mordred | so if we need to make sure the on-the-fly report generation is solid ... | 20:49 |
dmsimard | jeblair: I'm not sure if it's the best approach, it would mean logs.o.o would be using its cpu for generating reports | 20:49 |
notmyname | ack. the current feature/deep work (hoped for by early next year) will solve that once and for all (ie N billions of objects per container is no problem, only limited by your installed hardware capacity) | 20:49 |
pabelanger | doesn't need to be logs.o.o, we could stand up another server | 20:49 |
mordred | notmyname: cool | 20:49 |
mordred | pabelanger: the files are sqlite | 20:49 |
clarkb | oh that is good to know re swift | 20:49 |
fungi | notmyname: neat! | 20:49 |
jeblair | pabelanger: it'd have to get the sqlite file from logs.o.o, which is not scalable | 20:49 |
dmsimard | jeblair: The direction I'm looking at, is more like... ara.openstack.org/?database=path/to/sqlite/in/logs.sqlite or something | 20:50 |
mordred | I think it's likely better use of hacking resources to figure out a centralized ara than an on-demand ara | 20:50 |
dmsimard | I don't know | 20:50 |
mordred | I could be wrong about that - but it's an unknown enough I doubt it's the solution for this week | 20:50 |
dmsimard | A centralized ara is not complicated, we just need to figure out how to feed the data back to the instance -- because we don't want to have the callback do a call to a remote mysql server on each task. Just the latency from farther nodepool regions would not be good. | 20:51 |
jeblair | dmsimard: i put that as long term option #3 | 20:51 |
mordred | dmsimard: yah - well, there's also grouping issues we'd need to figure out too with centralized | 20:51 |
dmsimard | An example would be to have a "post subunit gearman" thing and then have that import data back into a central instance | 20:51 |
dmsimard | mordred: right, that too. | 20:51 |
pabelanger | mqtt could be a long(er) term option too | 20:52 |
mordred | dmsimard: having one ara with 10000 playbook runs called "run.yaml" ... :) | 20:52 |
dmsimard | there's permalinks for playbooks but not for groups of playbooks | 20:52 |
fungi | oh, i just realized this is specifically long-term options for ara, not more general options for increasing inode capacity/decreasing inode usage on the logs site | 20:52 |
jeblair | pabelanger: how does mqtt help? | 20:52 |
mordred | fungi: well - no, option 1 is not about ara | 20:52 |
mordred | fungi: long term #1 is about offloading caring about inodes to swift - but has a few steps between us and it to be viable | 20:52 |
dmsimard | I have to step away momentarily... not going to have time to finish the patch for only running on failure, if anyone can put that up I can review in ~1hr | 20:53 |
dirk | Is moving stuff into a readonly squashfs mirror an option? That should save plenty of inodes | 20:53 |
pabelanger | jeblair: in my brain, we publish to mqtt, instead of sqlite, then have a series of ara collectors, to generate static bits, then upload some place. However, would be a lot more moving parts | 20:53 |
dmsimard | nevermind, looks like I got time | 20:54 |
mordred | dmsimard: if you run out of time I can take over | 20:54 |
jeblair | dirk: re-architecting log storage is a long-term option. that would be up there with "use ceph" and "use swift" | 20:55 |
pabelanger | fungi: Shrews: I've deleted ubuntu-trusty images from rax-ord | 20:55 |
mordred | dmsimard: or I can write it if you'd like to think about other things | 20:55 |
jeblair | fungi: that docs-draft thing is worth keeping in mind as it's a medium-term mitigation. | 20:55 |
dmsimard | mordred: nope, I'll have time to finish it. | 20:56 |
dmsimard | mordred: also, it's something we have to be very cautious about.. remember it's in the base job and it's not something that is integration tested :( | 20:57 |
*** caphrim007_ has quit IRC | 20:57 | |
mordred | dmsimard: ++ | 20:57 |
fungi | jeblair: agreed | 20:57 |
clarkb | for 4 weeks of logs ~28760941 is the number of inodes we can use per day | 20:57 |
*** caphrim007 has joined #openstack-infra | 20:58 | |
jeblair | okay, we only have two short term options: #2 run ara on success. #3 don't run ara at all. anything else we can do short-term? | 20:58 |
clarkb | if we want to run 25k jobs per day that is about 1150 inodes per job average | 20:58 |
clarkb | (25k per day is our rough peak from a year ago iirc) | 20:58 |
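Spelling out the arithmetic behind those figures, assuming a 4-week (28-day) retention window; the total inode count for the volume is inferred from the per-day number rather than stated in the log:

```latex
% inodes available per day, given roughly 8.05e8 total inodes on the volume
\frac{8.05\times10^{8}\ \text{inodes}}{28\ \text{days}} \approx 2.88\times10^{7}\ \text{inodes/day}
% per-job budget at the ~25k jobs/day peak
\frac{2.88\times10^{7}\ \text{inodes/day}}{25{,}000\ \text{jobs/day}} \approx 1150\ \text{inodes/job}
```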
mordred | jeblair: those are the only things I can think of right now for short term | 20:58 |
*** dangers is now known as dangers_away | 20:58 | |
jeblair | clarkb: that makes me think reducing log retention should be in the short-term list | 20:58 |
*** caphrim007_ has joined #openstack-infra | 20:59 | |
clarkb | I think 10-15k is likely a more reasonable current average jobs per day, which will roughly double the per-job inode budget | 20:59 |
*** caphrim007_ has quit IRC | 20:59 | |
jeblair | should we also consider reducing to 3 weeks retention as a short-term solution? | 21:00 |
*** iyamahat has quit IRC | 21:00 | |
mordred | ++ | 21:00 |
clarkb | devstack, grenade, tempest, tox related jobs all fit into that set of limitations based on my scanning. But osa, tripleo, and potentially others don't | 21:00 |
clarkb | jeblair: ya I think so | 21:00 |
fungi | ranking preferences, i would have to say i vote 2,3,4,1 | 21:00 |
pabelanger | I like #2 if we can swing it | 21:00 |
pabelanger | but, understand if we have to do #3 | 21:00 |
mordred | well - also - the osa/tripleo inode counts are the same v2 and v3 | 21:00 |
dmsimard | mordred: what are the values for zuul_success ? It seems like it's either undefined or true | 21:01 |
mordred | the main thing is the additional inodes from ara run by v3's ansible | 21:01 |
clarkb | mordred: yes, roughly the same | 21:01 |
mordred | dmsimard: yah - use the | boolean filter | 21:01 |
dmsimard | mordred: ok | 21:01 |
mordred | clarkb: I _think_ a normal zuul-generated-ara report is around 500 inodes | 21:01 |
fungi | we're up over 3m inodes free now, btw | 21:01 |
*** iyamahat has joined #openstack-infra | 21:01 | |
*** caphrim007 has quit IRC | 21:02 | |
clarkb | mordred: ~400 seems to be the low end | 21:02 |
clarkb | mordred: for eg pep8 jobs | 21:02 |
mordred | (as opposed to the multi-10k reports from some of the larger jobs that are using ara in their job content | 21:02 |
Shrews | wait, "run ara on success"? not failure? i must've missed something, cause failure is where ara is most helpful, yeah? | 21:02 |
jeblair | i was leaning toward 2+4 together then fallback to 3. | 21:02 |
mordred | Shrews: on failure | 21:02 |
*** trown is now known as trown|outtypewww | 21:02 | |
clarkb | jeblair: ya I think that is my preference too | 21:02 |
fungi | Shrews: run on failure, using the zuul_success variable to determine whether there was a failure | 21:02 |
clarkb | 3 is fallback from 2 | 21:02 |
mordred | my prefernce is also 2+4 and fallback to 3 | 21:03 |
jeblair | fungi: if i interpret your earlier statement, you'd prefer to keep retention at 4 weeks even if it means not running ara at all? | 21:03 |
fungi | i'm uncertain option 4 is strictly necessary (aside from the one-time expiration i'm doing right now to deal with the current crisis) | 21:04 |
dmsimard | What we need to keep in mind is that we'll need to retrofit the 'generate ara only on failure' to openstack-ansible, tripleo and kolla-ansible as well, they are using it outside of zuul v3 | 21:04 |
fungi | but yeah, i see options 3 and 4 as roughly equal preference | 21:04 |
*** edmondsw has quit IRC | 21:05 | |
clarkb | dmsimard: I think thats a separate concern of continuing to work with various projects to prune and curate the logs they collect | 21:05 |
fungi | so maybe my preference is 2,3|4,3&4,1 | 21:05 |
clarkb | dmsimard: that might involve only running ara on failure along with cleaning up etc/ and so on | 21:05 |
jeblair | fungi, clarkb, mordred: i think the compromise position then is 2, then 4, then 3. how's that sound? | 21:06 |
fungi | wfm | 21:06 |
dmsimard | I'll have a patch up for #2 soon.. just being extra careful about it and testing every bit of it | 21:06 |
mordred | ++ | 21:06 |
fungi | dmsimard: appreciated! | 21:06 |
clarkb | jeblair: sounds like a plan | 21:06 |
jeblair | okay, i put that in the etherpad | 21:07 |
jeblair | the next thing, while we're all here, is how we should proceed with v3 rollout | 21:07 |
dmsimard | mordred: zuul_success is undefined on failure, right ? (double making sure) | 21:07 |
jeblair | i'm inclined to say that we should allocate tomorrow as a day for continued stabilization | 21:07 |
*** jkilpatr_ has joined #openstack-infra | 21:08 | |
jeblair | the mirror issue may not be resolved by tonight, or even by tomorrow | 21:08 |
SamYaple | jeblair: but i want my zuulv3 for the weekend :( | 21:08 |
clarkb | 511260 finally appears close to merging | 21:08 |
dmsimard | +1, rolling out on a friday is not a good idea | 21:08 |
*** thorst has quit IRC | 21:08 | |
clarkb | jeblair: do we want to maybe turn check et al back on in v3 and watch it? | 21:08 |
fungi | jeblair: yes, at this point i'd be concerned about rolling back onto v3 on a friday | 21:08 |
clarkb | maybe after the ara on failure thing is in place | 21:09 |
mordred | dmsimard, jeblair: two things - a) it's always defined in post playbooks b) it doesn't get set to false if a pre-playbook or a previous post playbook failed | 21:09 |
dmsimard | mordred: ok. | 21:09 |
mordred | so I think we should also make a patch to zuul to make sure we set either zuul_success or a new variable if ANY of the playbooks fail | 21:09 |
jeblair | mordred: maybe we need a new var? | 21:09 |
mordred | jeblair: yah. let's do that ... I can make that patch | 21:09 |
jeblair | yeah, one of those things. i'm not sure which yet. :) | 21:10 |
fungi | clarkb: i could see turning check pipelines back on in v3 sometime tomorrow, as overal ci volume tends to trail off around 16:00z or so | 21:10 |
*** kjackal_ has quit IRC | 21:10 | |
fungi | on fridays | 21:10 |
jeblair | even with ara-on-failure in place, are we worried about general additional volume? | 21:10 |
pabelanger | jeblair: fungi: if we did roll out on friday, there would possibly be low job volume over the weekend for fixes eager people would want to make | 21:10 |
pabelanger | but, agree we should stabilize first | 21:10 |
jeblair | it's not a lot, though it will run continuously over the weekend and still emit more than normal log volume | 21:10 |
fungi | pabelanger: but fewer of _us_ around to deal with lurking bugs in zuul v3 we have yet to uncover | 21:10 |
jeblair | also, we've had >24h partial outage; i kinda don't want to push it. | 21:11 |
jeblair | it might be nice for folks to be able to land changes for a few minutes. :) | 21:11 |
fungi | yup | 21:11 |
*** eharney has quit IRC | 21:12 | |
clarkb | ya I also don't like feeling compelled to firefight over the weekend :) | 21:12 |
jeblair | i'm going to start a new section on that etherpad at the bottom | 21:12 |
*** gouthamr has quit IRC | 21:13 | |
fungi | i so hope it's a section listing new drinking games i can try over the weekend | 21:14 |
jeblair | i don't normally like to say "can i please work on the weekend?" but at this point would be willing to help flip the switch sunday evening. | 21:14 |
*** eharney has joined #openstack-infra | 21:14 | |
mordred | yah. I think a sunday rollout is not a terrible idea | 21:14 |
jeblair | and of course, our 1100utc plan for monday or tuesday would be fine too. | 21:14 |
fungi | i could see doing it late sunday _if_ ianw and yolanda are handy to keep an eye on things | 21:15 |
mordred | yes - all three work for me- I'll be flying starting afternoon local time on monday ... | 21:15 |
mordred | so if we do monday morning I may be less available to help than other times | 21:15 |
clarkb | ya I don't mind a later sunday | 21:15 |
fungi | otherwise it'll be one of those where i wake up and never get a chance to pour a cup of coffee | 21:15 |
mordred | I think my preference is sunday, tuesday, monday | 21:15 |
mordred | (or, rather, those are ordered by amount of time/effort I'll be able to directly contribute) | 21:16 |
clarkb | I like sunday because we'll be able to hopefully sort out any issues without the full load of the system on it | 21:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Switch success to false if a post playbook fails https://review.openstack.org/511619 | 21:17 |
clarkb | it was definitely easier to fix problems over the weekend after the last rollout | 21:17 |
dmsimard | what's the rush? | 21:17 |
dmsimard | why sunday ? | 21:17 |
fungi | dmsimard: the longer we wait, the later into the release cycle we creep with these disruptions | 21:18 |
mordred | dmsimard: job config is effectively frozen for people | 21:18 |
mordred | and yah - what fungi said | 21:18 |
dmsimard | okay, that's fair | 21:18 |
mordred | dmsimard, jeblair, clarkb: https://review.openstack.org/511619 is the zuul_success patch | 21:19 |
clarkb | I've also got to start prepping for event things (summit mostly) so earlier the better for me | 21:19 |
jeblair | okay, to my secret disappointment, no one has vetoed sunday :) | 21:19 |
mordred | jeblair: :) | 21:19 |
jeblair | what time sunday works for east-coasters? | 21:19 |
dmsimard | mordred: I'm covering for the edge case where it might not be defined | 21:19 |
jeblair | and, erm, central coasters? | 21:19 |
mordred | dmsimard: that's great - that patch is just making sure that if a post playbook fails that subsequent post playbooks get success==false | 21:20 |
dmsimard | My mother is coming to visit this weekend, I'm east coaster and can respond to pings but might not be available for longer periods of sustained work | 21:20 |
fungi | i have no hard scheduled obligations next week, and am happy to support whatever/whenever people want to do the next v3 rollout attempt | 21:20 |
fungi | i can be around as late as, say, 04:00z | 21:21 |
jeblair | dmsimard: well, i'm not expecting us to do sustained work on sunday, more like perform the transition so that people start the day on v3 | 21:21 |
fungi | (which is technically utc monday, not sunday, but whatevs) | 21:21 |
jeblair | start monday on v3 that is | 21:21 |
mordred | jeblair: agree | 21:21 |
pabelanger | I'm traveling on Wed, Thur, Friday next week too | 21:21 |
mordred | turns out flipping the switch itself is actually not too hard | 21:21 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure https://review.openstack.org/511622 | 21:21 |
dmsimard | mordred, jeblair, fungi, clarkb, pabelanger ^ | 21:22 |
dmsimard | I *think* that's okay, but please do review it carefully, no integration tests and all | 21:22 |
Shrews | i can be around late sunday, but am having my tooth cleaned monday morning so less around then | 21:22 |
*** aviau has quit IRC | 21:22 | |
fungi | Shrews: just the one tooth, eh? | 21:22 |
dmsimard | lol | 21:22 |
*** aviau has joined #openstack-infra | 21:22 | |
*** jascott1 has quit IRC | 21:22 | |
* clarkb somewhat arbitrarily throws out 2200UTC | 21:23 | |
Shrews | fungi: of course. all i need | 21:23 |
clarkb | that is mid afternoon for pacific coasters and evening for eastern/central coasters | 21:23 |
fungi | Shrews: as long as you can open a beer with it, you're all set i suppose | 21:23 |
jeblair | clarkb: that wfm | 21:23 |
mordred | clarkb: 2200 wfm | 21:23 |
fungi | i'm cool with a 22-23:00z window | 21:24 |
dmsimard | jeblair: vars across jobs are merged or replaced ? | 21:24 |
fungi | that's like 6-7pm local here so early by my standards | 21:24 |
dmsimard | jeblair: I mean, if we put 'ara_generate_report: failure' as a var in the base job(s), it's going to stick around, right ? | 21:24 |
jeblair | dmsimard: merged | 21:24 |
dmsimard | jeblair: ok, nice. | 21:24 |
*** mat128 has quit IRC | 21:25 | |
jeblair | dmsimard: yes. it will be overridable by children, but i'm not worried about folks overriding that. for now. | 21:25 |
mordred | ++ | 21:25 |
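What dmsimard describes would look roughly like the following in the base job definition: the toggle set in vars so every child job inherits it unless it explicitly overrides it. This is a sketch of the idea, not the merged change:

```yaml
# Sketch: default the toggle on the base job so all children inherit it.
# Children could still override or drop the value, as discussed above.
- job:
    name: base
    vars:
      ara_generate_report: failure
```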
clarkb | ok should we call it 2200UTC sunday then? | 21:25 |
mordred | ++ | 21:25 |
clarkb | use the rest of today and tomorrow to stabilize | 21:25 |
jeblair | dmsimard: let's just stick in some documentation asking folks to please not override it. :) | 21:25 |
jeblair | clarkb: ++ | 21:25 |
fungi | clarkb: sgtm | 21:25 |
dmsimard | jeblair: I was more worried about someone declaring just 'vars' and removing it more than someone overriding the default value | 21:26 |
clarkb | mordred: do you want to follow up to your thread about the mirror and devstack-gate with a zomg inodes but now we are looking to be in a better place and aiming for sunday rollout? | 21:26 |
*** aeng has joined #openstack-infra | 21:26 | |
mordred | sure! | 21:26 |
openstackgerrit | David Moreau Simard proposed openstack-infra/project-config master: Test ARA report generation only on failure in base-test https://review.openstack.org/511624 | 21:26 |
dmsimard | ^ testing the toggle in base-test | 21:27 |
fungi | we're just over 4m inodes free now, so about halfway to where we want to be for a 1% free cushion | 21:27 |
clarkb | oh maybe wait for the cushion before emailing | 21:27 |
clarkb | and possibly sneak in a "please review your logs and remove unnecessary things like logging all of etc or selinux or systemd units" | 21:27 |
fungi | yes, we can gloat about what a great position we're in with only 99% inode utilization on that volume ;) | 21:28 |
clarkb | I could also delete the logs for this one change and free up 2.8 million inodes | 21:28 |
clarkb | to get us there quicker >_> | 21:28 |
fungi | deletion of /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/ercloud/tmp completed a little while ago, looks like | 21:29 |
clarkb | fungi: hrm I was still seeing them | 21:29 |
clarkb | was it undercloud and not ercloud? | 21:30 |
fungi | er, that's a terrible pattern | 21:30 |
fungi | i must have missed an un when i edited that line. restarting :/ | 21:30 |
fungi | "und" added and rerunning. /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/undercloud/tmp this time | 21:31 |
mordred | fungi: that seems gooder | 21:31 |
pabelanger | fungi: clarkb: jeblair: gnutls issue on ubuntu-trusty fixed now too | 21:31 |
clarkb | fungi: oh I think there is a bug in that too | 21:31 |
fungi | pabelanger: excellent! | 21:31 |
pabelanger | and bad images from rackspace deleted | 21:31 |
clarkb | fungi: needs to be gate-tripleo-ci-*/*/logs/undercloud/tmp | 21:31 |
pabelanger | should be re-uploaded once shade has been released | 21:32 |
clarkb | fungi: to get the build uuid | 21:32 |
fungi | clarkb: yep, you're right. fixing | 21:32 |
pabelanger | fungi: clarkb: which means we can then land https://review.openstack.org/511543/ to fix system-config | 21:32 |
*** thorst has joined #openstack-infra | 21:33 | |
clarkb | pabelanger: I've approved it so should be fine if the recheck comes around good | 21:33 |
mnaser | yay, things getting fixed | 21:34 |
pabelanger | distributed sysops | 21:34 |
mordred | dmsimard: one comment | 21:34 |
*** eharney has quit IRC | 21:34 | |
mordred | dmsimard: otherwise looks good to me | 21:34 |
mordred | clarkb: I would not oppose you deleting all the logs for that one job :) | 21:35 |
pabelanger | clarkb: I've also just removed nb04.o.o from emergency file | 21:35 |
*** thorst has quit IRC | 21:36 | |
*** lifeless has quit IRC | 21:37 | |
dmsimard | mordred: the reason it took longer to get the patch up is that I was testing exactly your comment | 21:39 |
dmsimard | mordred: the problem is that false and failure have different behaviors | 21:39 |
dmsimard | mordred: false == never generate, failure == only generate on failure | 21:39 |
mordred | dmsimard: yes - they do - but the first condition is checking for true | 21:40 |
dmsimard | true would be == always generate | 21:40 |
mordred | dmsimard: so == true and | bool should both have the same effect | 21:40 |
mordred | you're using | bool on the false branch already | 21:40 |
dmsimard | oh | 21:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 21:40 |
dmsimard | let me test | 21:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 21:40 |
mordred | dmsimard: it's possible this is dumb - my python brain is rejecting == true - but maybe in jinja == true is ok? | 21:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Allow domain_id for roles https://review.openstack.org/496992 | 21:41 |
clarkb | 511260 should enter the gate shortly, last job is running against it now | 21:41 |
dmsimard | mordred: yeah, == true is okay and works | 21:41 |
clarkb | did we decide on whether or not we should reenable v3 pipelines? | 21:41 |
dmsimard | mordred: but | bool also works | 21:41 |
clarkb | jeblair: ^ | 21:41 |
dmsimard | mordred: I was afraid of using | bool for what might end up being a string (that would evaluate to true) | 21:42 |
dmsimard | mordred: in python, a non-empty string is true | 21:42 |
mordred | dmsimard: no - | bool on a string is false (I just checked that in ansible) | 21:42 |
*** srobert_ has joined #openstack-infra | 21:42 | |
dmsimard | mordred: but in the jinja bool filter, it looks like a non-empty string (that is not 'true') evaluates to false | 21:42 |
mordred | dmsimard: BUT ... if == true works in jinja, let's do it | 21:42 |
openstackgerrit | Merged openstack-infra/puppet-openstack_infra_spec_helper master: Cap signet < 0.8.0 https://review.openstack.org/511543 | 21:42 |
dmsimard | mordred: | bool is fine, I tested it and it works.. it's just my brain confusing python and ansible/jinja booleans :/ | 21:43 |
mordred | dmsimard: or, rather, I'm fine either way now that we've verified that | bool and == true both have the same impact | 21:43 |
clarkb | pabelanger: ^ there we go | 21:43 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Move role normalization to normalize.py https://review.openstack.org/500170 | 21:43 |
jeblair | clarkb: drat. we did not. | 21:43 |
mordred | maybe re-enable tomorrow once the system has stabilized more? | 21:43 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure https://review.openstack.org/511622 | 21:43 |
dmsimard | mordred: ^ now with | bool | 21:43 |
jeblair | mordred: yeah, that sounds like a plan | 21:44 |
clarkb | wfm | 21:44 |
pabelanger | clarkb: yah, rechecking system-config now | 21:44 |
clarkb | over 5 million now | 21:44 |
dmsimard | mordred: also added a comment for posterity | 21:44 |
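To make the | bool versus == true exchange concrete, the toggle logic under discussion amounts to something like the tasks below. Role, variable, and value names follow the conversation but are assumptions about the patch, not quotes from it; per the testing above, | bool and == true behaved the same for the values involved:

```yaml
# Sketch of the three toggle behaviors: true = always generate,
# 'failure' = only on failure, false = never (neither condition matches).
- name: Generate the report whenever the toggle is simply true
  include_role:
    name: generate-ara-report          # placeholder role name
  when: ara_generate_report | bool

- name: Generate the report only when the toggle is 'failure' and the job failed
  include_role:
    name: generate-ara-report
  when: ara_generate_report == 'failure' and not (zuul_success | bool)
```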
dmsimard | clarkb: wow that's over 9000 | 21:44 |
fungi | mordred: jeblair: clarkb: i suggested tomorrow as well, mainly because around 16:00z on a friday utilization will start to trail off heading into the weekend so we can continue to make progress on inode cleanup in the background | 21:44 |
openstackgerrit | Mohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Drop tox_constraints_file from include_role for release notes https://review.openstack.org/511627 | 21:45 |
openstackgerrit | Mohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Move tox_envlist into job variables for releasenote jobs https://review.openstack.org/511628 | 21:45 |
*** srobert has quit IRC | 21:45 | |
mnaser | i identified two issues with releasenote jobs, very quick review mordred ^ | 21:45 |
fungi | somewhere around 16-19:00z at any rate | 21:45 |
mnaser | you can also see the failure happening here - http://logs.openstack.org/54/511054/1/check/build-openstack-releasenotes/6aeff53/job-output.txt.gz | 21:45 |
fungi | but for now, i need to disappear for a while. back later | 21:46 |
mnaser | (sorry just stomping here mid-discussion) | 21:46 |
*** jcoufal has quit IRC | 21:46 | |
*** srobert_ has quit IRC | 21:47 | |
mordred | mnaser: yes. great patches | 21:47 |
*** esberglu has quit IRC | 21:47 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Switch success to false if a post playbook fails https://review.openstack.org/511619 | 21:48 |
*** esberglu has joined #openstack-infra | 21:48 | |
mnaser | should i add those to any etherpad that's being used so they get eyes, or maybe someone around here can give them a quick review (i haven't been following much today) | 21:48 |
*** iyamahat_ has joined #openstack-infra | 21:48 | |
*** boden has quit IRC | 21:49 | |
mordred | clarkb, pabelanger, fungi: mnaser's patches above lgtm - could use an extra set of eyeballs | 21:49 |
*** iyamahat has quit IRC | 21:49 | |
*** slaweq_ has quit IRC | 21:50 | |
pabelanger | +3 | 21:50 |
jeblair | mordred: i know we all have brainhurt, but do you want to chat about periodic jobs and branches now? | 21:50 |
mordred | jeblair: yes - although I just had an idea I want to float first ... | 21:51 |
jeblair | https://review.openstack.org/511533 was the change which brought it up | 21:51 |
mordred | jeblair: does abandoning a change emit an event zuul can act on? | 21:51 |
jeblair | mordred: yes | 21:51 |
jeblair | (whether it correctly does act on it in either master or v3 atm, i could not say for certain) | 21:52 |
pabelanger | clarkb: zomg, centos-7 failed again with ssh key | 21:52 |
mordred | jeblair: what if we made a 'cleanup' pipeline that ran when a change is abandoned, and had something that would delete the logs for an abandoned change ... | 21:52 |
pabelanger | clarkb: looks like I'll be debugging that next | 21:52 |
mordred | (came to mind as I just abandoned a bunch of DNM test patches that each had a ton of tests associated with them) | 21:52 |
*** esberglu has quit IRC | 21:52 | |
jeblair | mordred: the v3 logs have the abandoned tests enabled, so it should be functioning. | 21:53 |
pabelanger | for now, I have to run. | 21:53 |
jeblair | mordred: interesting. i *think* that would take code changes. | 21:53 |
jeblair | mordred: i believe we have zuul hard-coded to remove abandoned changes from pipelines | 21:53 |
jeblair | mordred: so we'd have to drop that and rely on the 'status:open' pipeline requirement | 21:53 |
jeblair | mordred: i think it's feasible; couple hours of work. | 21:54 |
*** threestrands has joined #openstack-infra | 21:54 | |
jeblair | (tbh, i think that's the better long-term structure for the code anyway) | 21:55 |
*** Apoorva_ has quit IRC | 21:55 | |
jeblair | (i think the hard-coding predates pipeline requirements) | 21:55 |
mordred | jeblair: maybe let's put that on the backburner for next time we feel like hacking in such areas ... if we did that, I think perhaps we rename the merge-check template to "system-required" and put the 'delete logs' job into that - so that the openstack story is "you always have to have the system-required template" | 21:55 |
*** gouthamr has joined #openstack-infra | 21:55 | |
jeblair | ++ | 21:56 |
mordred | maybe we should do that second part anyway | 21:56 |
jeblair | ++ | 21:56 |
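A rough sketch of the 'cleanup' pipeline idea in Zuul pipeline syntax, assuming the gerrit driver's change-abandoned event. As jeblair notes, the scheduler currently hard-removes abandoned changes from pipelines, so this illustrates the concept rather than something that would work without the code changes discussed; the log-deletion job itself is not shown:

```yaml
# Concept sketch: a pipeline that reacts to abandoned changes so a
# log-cleanup job (not shown) could run against them.
- pipeline:
    name: cleanup
    manager: independent
    trigger:
      gerrit:
        - event: change-abandoned
```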
mordred | jeblair: ok - so - branches | 21:57 |
jeblair | the idea in v3 is that periodic jobs are just like regular jobs. so instead of putting "periodic-foo-master" and "periodic-foo-pike" on a project, you just put "foo" | 21:57 |
mordred | jeblair: yes! this I agree with wholeheartedly | 21:58 |
jeblair | zuul emits trigger events for every project-branch combination | 21:58 |
mordred | jeblair: and now I believe I understand what you were saying | 21:58 |
*** wolverineav has quit IRC | 21:58 | |
jeblair | so if you add a periodic job to a project, it'll run on all that project's branches | 21:58 |
jeblair | so if you only want it to run on a subset of branches, you just use branch matchers in the project-pipeline in the regular way | 21:58 |
mordred | jeblair: so instead of putting branch-override ... yah. that | 21:58 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Drop tox_constraints_file from include_role for release notes https://review.openstack.org/511627 | 21:58 |
mordred | jeblair: cool. that all makes sense to me- and yes, I think that's definitely the way to go for new jobs | 21:59 |
jeblair | (or, you can put the branch matcher on the job definition itself, if that's something you can say globally) | 21:59 |
jeblair | mordred: yep | 21:59 |
*** yamamoto has joined #openstack-infra | 21:59 | |
*** lifeless has joined #openstack-infra | 22:00 | |
mordred | jeblair: for *legacy* jobs... I think ajaeger's patch - except s/override-branch/branches/ is the right thing | 22:00 |
mordred | jeblair: because those generated jobs are all expecting to only be triggered for the branch in question | 22:00 |
*** wolverineav has joined #openstack-infra | 22:00 | |
mordred | jeblair: and we should definitely replace them all with new v3 jobs that are done correctly - but I don't think we should try to correct them semantically in place | 22:00 |
jeblair | okay, that works for me. | 22:01 |
mordred | (I worry that if we tried to collapse at that scale we'd get something weirdly wrong) | 22:01 |
jeblair | good point | 22:01 |
mordred | branches can take a scalar right? | 22:01 |
jeblair | mordred: a scalar or a list | 22:01 |
mordred | so "branches: master" works? awesome | 22:01 |
mordred | I'll modify that patch real quick | 22:01 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 22:02 |
*** esberglu has joined #openstack-infra | 22:02 | |
jeblair | yep. as does "branches: [stable/ocata, stable/pike]". this is, i think, going to be a big improvement. :) | 22:02 |
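Put together, the periodic-job branch matching jeblair describes might look like the following project stanza; the project, job, and pipeline names here are placeholders:

```yaml
# Sketch: per-branch matchers on jobs attached to a periodic pipeline,
# showing both the scalar and the list form of 'branches'.
- project:
    name: openstack/example-project
    periodic:
      jobs:
        - legacy-periodic-example-job:
            branches: master
        - legacy-periodic-other-job:
            branches:
              - stable/ocata
              - stable/pike
```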
jeblair | the checksums process is 93% complete; i'm afk for 20m. | 22:03 |
*** iyamahat_ has quit IRC | 22:03 | |
*** iyamahat__ has joined #openstack-infra | 22:03 | |
mordred | jeblair: while I'm doing that - you feel like restarting v3 scheduler to pick up the zuul_success fix so we can test dmsimard's patch? | 22:03 |
mordred | or - afk - that's better | 22:03 |
clarkb | pabelanger: I have a change up to help debug that by dumping data from config drive and /home/root/.ssh/authorized_keys, not sure if it merged | 22:04 |
dmsimard | I have to relocate, I'll be back in >1hr | 22:06 |
*** esberglu has quit IRC | 22:06 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add branches to all periodic jobs https://review.openstack.org/511533 | 22:07 |
*** rbrndt has quit IRC | 22:08 | |
openstackgerrit | Merged openstack-infra/system-config master: Add documentation on force-merging a change https://review.openstack.org/511248 | 22:08 |
openstackgerrit | Merged openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 22:10 |
*** rbrndt has joined #openstack-infra | 22:10 | |
*** jascott1 has joined #openstack-infra | 22:12 | |
*** bobh has quit IRC | 22:12 | |
*** mriedem1 has joined #openstack-infra | 22:14 | |
*** mriedem has quit IRC | 22:15 | |
*** jascott1 has quit IRC | 22:16 | |
*** jascott1 has joined #openstack-infra | 22:16 | |
*** Keitaro has quit IRC | 22:20 | |
*** jascott1 has quit IRC | 22:21 | |
*** gildub has joined #openstack-infra | 22:22 | |
clarkb | pabelanger: https://review.openstack.org/#/c/501887/ | 22:25 |
*** jascott1 has joined #openstack-infra | 22:25 | |
clarkb | I've rechecked it, I was sort of hoping we'd catch a failure premerge | 22:26 |
clarkb | we are at 99% used now | 22:26 |
clarkb | fungi: ^ fyi | 22:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use native propose-translation jobs https://review.openstack.org/511435 | 22:28 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add branches to all periodic jobs https://review.openstack.org/511533 | 22:28 |
clarkb | I've approved 511260 now | 22:28 |
*** threestrands has quit IRC | 22:28 | |
mordred | jeblair, clarkb: ^^ updated both patches - they should be correct for v3 now | 22:28 |
clarkb | I'm reviewing the ara on failure related changes now | 22:30 |
mordred | cool | 22:30 |
jeblair | mordred: back. i think that's an executor fix. we can leave the scheduler and restart the execs | 22:30 |
mordred | jeblair: oh! good point | 22:30 |
jeblair | mordred: i will do that | 22:30 |
mordred | jeblair: thanks | 22:30 |
*** gouthamr has quit IRC | 22:30 | |
jeblair | okay, i ran "service zuul-executor stop" on all ze machines, and they all stopped cleanly | 22:32 |
jeblair | mordred: there's something i think i need your help with though | 22:33 |
jeblair | mordred: every time zuul is installed, it seems to ignore the "-e git+https" requirement and re-installs the new version of gitpython | 22:33 |
jeblair | mordred: i think clarkb said pbr may somehow be involved | 22:33 |
fungi | i just knew if i stopped to eat something, i'd miss the great unveiling of the 99% | 22:34 |
clarkb | jeblair: mordred yes it is, basically setuptools doesn't understand those requirements so when pbr reads those reqs into setuptools it strips all the git stuff out and uses the egg as is | 22:34 |
clarkb | jeblair: mordred you have to install requirements with pip directly to get it to do what you expect | 22:34 |
*** Keitaro has joined #openstack-infra | 22:34 | |
clarkb | mordred: dmsimard re https://review.openstack.org/#/c/511624/1 what tests use base-test? I'd like to see them not include ara reports on success | 22:35 |
jeblair | clarkb: yeah, but we re-install zuul on every commit. so that means we're now uninstalling our gitpython fork on every commit. | 22:35 |
clarkb | jeblair: probably the thing to do is make zuul install a pip install -U /path/to/zuul && pip install -U /path/to/zuul/requirements.txt ? | 22:35 |
ianw | jeblair: i see the checksums has now exceeded the size of the original. that's good, i guess? | 22:36 |
jeblair | clarkb: the procedure is to land a change to base-test, then create a DNM change which reparents any job (say, unittests) to base-test, and examine the results. if that works, then we copy the change from base-test to base. | 22:36 |
*** lin_yang has joined #openstack-infra | 22:36 | |
jeblair | clarkb: nothing normally uses base-test. so it's safe to land changes to it as long as they look reasonable. | 22:37 |
clarkb | jeblair: gotcha, so we have to merge the things first | 22:37 |
jeblair | yep. we should put this in some documentation around there :) | 22:37 |
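The reparenting step jeblair describes is just a job variant pointing at base-test; a minimal sketch of such a DNM test change, with a placeholder job name:

```yaml
# Sketch: temporarily reparent a job to base-test to exercise the new
# base-job behavior before copying the change from base-test to base.
- job:
    name: example-unittests-job
    parent: base-test
```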
jeblair | ianw: neat! | 22:37 |
*** threestrands has joined #openstack-infra | 22:37 | |
clarkb | I've approved the parent of https://review.openstack.org/#/c/511624/1 if we can get a second review on that change it would be great to get this tested soon | 22:38 |
jeblair | done | 22:39 |
clarkb | tyty | 22:39 |
mordred | jeblair: I agree with the thing clarkb said - we could also do pip install -U /path/to/zuul/requirements.txt && pip install --no-deps -U /path/to/zuul | 22:40 |
jeblair | clarkb: yeah, i think your install procedure will work | 22:40 |
jeblair | i just tested that manually... | 22:40 |
jeblair | mordred: should i do clarkb's thing or try yours? | 22:40 |
clarkb | mordreds is likely a bit quicker | 22:41 |
mordred | jeblair: try mine - it's less churn - clarkb's will work - but will result in gitpython temporarily being changed | 22:41 |
fungi | and does also cause some dependencies to be installed and then reinstalled i guess | 22:41 |
clarkb | seems like inode cleanup is really flying now. I wonder if that is the addition of the undercloud/tmp cleanup? | 22:42 |
fungi | (in the case of the ones which have to be installed from git urls) | 22:42 |
fungi | clarkb: i have a feeling it's ci load dropping off for the evening | 22:42 |
fungi | we tend to make more progress on bulk cleanup tasks like this on the logs site off-hours and on weekends | 22:43 |
*** mriedem1 has quit IRC | 22:43 | |
clarkb | ah | 22:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/puppet-zuul master: Split zuul and requirements install https://review.openstack.org/511637 | 22:43 |
fungi | whereas during peak load it's lucky to be marching in place | 22:43 |
mordred | jeblair, clarkb: ^^ like that | 22:43 |
jeblair | oh that's easier than what i was about to do. :) | 22:43 |
jeblair | mordred: needs a "-r" though, right? | 22:44 |
mordred | jeblair: were you going to make a puppet resource dependency graph? | 22:44 |
jeblair | mordred: yes | 22:44 |
mordred | jeblair: YES | 22:44 |
openstackgerrit | Monty Taylor proposed openstack-infra/puppet-zuul master: Split zuul and requirements install https://review.openstack.org/511637 | 22:44 |
jeblair | lgtm | 22:45 |
clarkb | approved | 22:45 |
mordred | jeblair: does https://review.openstack.org/#/c/511533 and its parent look better to you now? | 22:45 |
jeblair | i re-installed deps manually and restarted all ze machines | 22:46 |
jeblair | mordred: yep | 22:47 |
clarkb | because I was curious I checked my local 4TB zpool's inode count and it has more than 7 times the number of inodes of our 12TB fs | 22:47 |
clarkb | (and that was just with default fs creation commands) | 22:47 |
clarkb | if we ever get around to moving this filesystem maybe we should multiply the inode count by some big number | 22:48 |
jeblair | ya | 22:48 |
mordred | who would ever need more than 640k of inodes | 22:48 |
mordred | s/0// | 22:48 |
mordred | bleh | 22:48 |
*** rbrndt has quit IRC | 22:52 | |
jeblair | fungi: want to send the status ok? | 22:54 |
*** iyamahat__ has quit IRC | 22:54 | |
jeblair | ianw: however, being at 105% of completion has thrown off my time estimates. :| | 22:54 |
ianw | i got all day :) | 22:55 |
ianw | i just hope it fixes it | 22:55 |
clarkb | jeblair: that is a neat trick | 22:55 |
jeblair | the file it currently has open is in the "/u/" directory | 22:56 |
jeblair | i have no idea how close to alphabetical it is though. | 22:56 |
fungi | jeblair: you bet | 22:56 |
jeblair | next time (please no) -- find > file, then reprepro < file. | 22:57 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure https://review.openstack.org/511622 | 22:57 |
*** iyamahat has joined #openstack-infra | 22:58 | |
fungi | status ok Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now | 22:59 |
fungi | that look okay? | 22:59 |
jeblair | ++ | 23:00 |
clarkb | yes | 23:00 |
ianw | jeblair: combined with "pv" in the middle, it might even come up with an accurate % | 23:00 |
fungi | #status ok Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now | 23:00 |
openstackstatus | fungi: sending ok | 23:00 |
*** aeng has quit IRC | 23:01 | |
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://git.openstack.org/cgit/openstack-infra/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/" | 23:03 | |
-openstackstatus- NOTICE: Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now | 23:03 | |
jeblair | ianw: oh it finished! | 23:03 |
jeblair | 13484 files were added but not used. | 23:03 |
jeblair | The next deleteunreferenced call will delete them. | 23:03 |
ianw | :/ that seems like ... a lot | 23:03 |
*** xarses has quit IRC | 23:04 | |
jeblair | ianw: i'm a little worried that maybe i should have started with step 1? | 23:04 |
ianw | ? i maybe wouldn't run the /usr/local/bin script, isn't deleteunreferenced a separate step there (checking ...) | 23:04 |
jeblair | i was assuming that references.db was okay, but i don't know that. (i'm not actually sure how to know that) | 23:04 |
openstackgerrit | Merged openstack-infra/puppet-zuul master: Split zuul and requirements install https://review.openstack.org/511637 | 23:05 |
ianw | hmm, that one i regenerated yesterday afternoon | 23:05 |
jeblair | ianw: okay, assuming that's okay, then i think we've done step1 and step2 | 23:06 |
openstackstatus | fungi: finished sending ok | 23:06 |
openstackgerrit | Merged openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 23:06 |
jeblair | ianw: there is no step 3 | 23:06 |
clarkb | can we serve the read write volume before releasing it? | 23:07 |
jeblair | ianw: should i try running "reprepro update" now? | 23:07 |
ianw | jeblair: i would just run "k5start -t -f /etc/reprepro.keytab service/reprepro -- reprepro --confdir /etc/reprepro/ubuntu update" by hand | 23:07 |
ianw | yeah, that :) | 23:07 |
jeblair | clarkb: yeah, but we're not there yet i don't think | 23:07 |
openstackgerrit | Merged openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 23:07 |
clarkb | gotcha | 23:07 |
ianw | oh with a -VVVV | 23:07 |
openstackgerrit | Merged openstack-infra/shade master: Allow domain_id for roles https://review.openstack.org/496992 | 23:07 |
openstackgerrit | Merged openstack-infra/shade master: Move role normalization to normalize.py https://review.openstack.org/500170 | 23:07 |
ianw | *hopefully* it does something other than peg at 100% cpu saying nothing | 23:07 |
jeblair | okay, i will copy the db files from the local disk into afs, delete *.old (i still have them on the local disk), then run that. | 23:08 |
jeblair | ianw: reprepro --confdir /etc/reprepro/ubuntu -VVVV update | 23:10 |
jeblair | look right? | 23:10 |
ianw | yep | 23:10 |
*** wolverineav has quit IRC | 23:10 | |
*** wolverineav has joined #openstack-infra | 23:10 | |
mordred | pabelanger, Shrews: remote: https://review.openstack.org/511643 Release 1.24.0 of shade ... patch submitted to cut new release of shade | 23:11 |
jeblair | good news! it did not hang. | 23:12 |
*** bobh has joined #openstack-infra | 23:12 | |
jeblair | the bad news: File "pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb" is already registered with different checksums! | 23:12 |
mordred | jeblair: grumble | 23:12 |
clarkb | that is a neat trick | 23:12 |
ianw | haha, i knew it was a Farnsworth "good news, everybody!" | 23:13 |
jeblair | http://paste.openstack.org/show/623503/ | 23:14 |
*** rwsu has joined #openstack-infra | 23:14 | |
jeblair | the "expected" values match what's in afs | 23:15 |
clarkb | pabelanger: first recheck of that ssh debugging change didn't fail, trying again | 23:15 |
*** hongbin has quit IRC | 23:15 | |
dmsimard | clarkb: did you get your answer for base-test ? | 23:15 |
dmsimard | clarkb: I figure we could just create an adhoc no-op job based off of base-test if there wasn't any. | 23:16 |
clarkb | dmsimard: ya we need to push a change to eg ozj once things merge to reparent to base-test | 23:16 |
clarkb | jeblair: does that imply the got: side is what is already registered? | 23:16 |
jeblair | clarkb: ya... i'm trying to extract a text version of the checksum db to examine | 23:17 |
*** aeng has joined #openstack-infra | 23:18 | |
jeblair | pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb :1:1c15ec44003064eb9e664462f764e98aa5e9d36c :2:e4ca9514867498531f1feea0d081c1d3df8d91d9b8bc0f353315e9a9a362e2a2 d7159bf89cc9df87ba64db43c7d8bd1a 1669873 | 23:18 |
ianw | the sizes aren't even close? | 23:18 |
jeblair | that also seems to match "expected" | 23:19 |
jeblair | what does "got" mean? | 23:19 |
jeblair | from whence was it "got"? | 23:19 |
clarkb | dmsimard: the update to base-test is in the gate now, so once that merges we just push a change to ozj or whatever to use base-test in some test (probably a test that runs against ozj) | 23:19 |
*** felipemonteiro has joined #openstack-infra | 23:19 | |
clarkb | jeblair: perhaps got is what the upstream mirror gave it? | 23:19 |
dmsimard | clarkb: right, I won't be at a keyboard for a while. If we want to test it soon, someone else can do it. | 23:20 |
clarkb | dmsimard: ok I'll push one up then | 23:20 |
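For reference, the reparenting change clarkb pushes shortly afterwards (511646) amounts to a one-line parent swap in the ozj job definitions; a minimal sketch, with the job name taken from the failure log linked a bit further down rather than from the actual patch:

```yaml
# Sketch only: point an integration job at base-test so proposed changes
# to the base job's playbooks get exercised before being copied to "base".
- job:
    name: base-integration-centos-7   # job name assumed for illustration
    parent: base-test                 # normally inherits from "base"
```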
jeblair | what is our upstream? | 23:20 |
dmsimard | clarkb: thanks! | 23:20 |
ianw | I think it means "I read the info from disk and got this value, so that's what i expect, but the database told me this other value"? | 23:20 |
jeblair | the lines before that were | 23:21 |
jeblair | processing updates for 'trusty-security|main|amd64' | 23:21 |
jeblair | reading '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_trusty-security_main_amd64_Packages' | 23:21 |
jeblair | so was 'got' from that file? | 23:21 |
jeblair | yes | 23:22 |
jeblair | the values in that file match "got" | 23:22 |
openstackgerrit | Merged openstack-infra/project-config master: Test ARA report generation only on failure in base-test https://review.openstack.org/511624 | 23:22 |
*** Apoorva has joined #openstack-infra | 23:23 | |
ianw | jeblair: it seems to be the wrong size ... see one i download from upstream in /tmp | 23:23 |
jeblair | ianw: yeah, that matches the index | 23:23 |
jeblair | so the file we have in our archive is wrong | 23:23 |
ianw | so maybe we just do this over and over, copying in the wrong stuff? | 23:23 |
pabelanger | mordred: I really like the idea of an abandon pipeline, that's a great idea. | 23:24 |
jeblair | or if we remove that file, and remove its entry from checksums.db, will it re-download it and add it? | 23:24 |
openstackgerrit | Clark Boylan proposed openstack-infra/openstack-zuul-jobs master: Reparent ozj integration jobs to base-test for testing https://review.openstack.org/511646 | 23:24 |
ianw | it's probably a lot easier to just wget it in than fiddle the db? | 23:24 |
clarkb | there is the ara test I think | 23:24 |
tonyb | Before I write one, is there a tool that will take a repo name and grovel around in the zuul(v2) config data and list all the jobs that will be run? (bonus points if it can check for branch exclusions) | 23:24 |
jeblair | ianw: i think if we wget it, we still have to fiddle the db to add the newly correct entry to checksums | 23:25 |
fungi | tonyb: not only project and target branch but also changed files in the diff can determine which jobs will run | 23:25 |
jeblair | ianw: there are commands to both remove a single entry from checksums, as well as add/update one. | 23:25 |
pabelanger | clarkb: cool, I'll add some rechecks to that too. We might also want to setup autohold of the job too | 23:26 |
fungi | tonyb: as well as the pipeline into which the ref is enqueued | 23:26 |
pabelanger | and caught up on backscroll | 23:26 |
ianw | jeblair: should we just delete the file and rerun the update, and see if it just downloads it? | 23:26 |
tonyb | fungi: That is true but for my use case today I don't think that matters | 23:26 |
ianw | maybe it can recover from that | 23:26 |
ianw | if not, move on to replacing | 23:27 |
tonyb | fungi: Hmm I will need to consider the pipeline | 23:27 |
jeblair | ianw: i'm pretty sure we need a db fiddle either way, cause i *think* what's happening here is a comparison of checksums.db with the package list. i don't know if it's going out to actual files at all. | 23:27 |
jeblair | ianw: so i think we should either remove the file and remove the checksums.db entry; or replace the file and replace the checksums.db entry. | 23:27 |
pabelanger | mordred: 511643 has some errors | 23:28 |
jeblair | ianw: i'm hopeful that if we did the second thing, it would auto-correct. i have no basis other than hope for that though. | 23:28 |
EmilienM | I know you're very busy but if someone can ping me when release-tarball jobs are kicked off again, thanks a lot | 23:28 |
ianw | jeblair: i'm just hoping it stat()s the file or something (i have no idea). i think start simple, remove the file from disk and try update, see what happens | 23:28 |
ianw | just reading up on the checksum remove cmds now | 23:29 |
*** Swami has quit IRC | 23:29 | |
clarkb | ianw: not to completely distract you from the ubuntu mirror but do you know if there is a fix for http://logs.openstack.org/46/511646/1/infra-check/base-integration-centos-7/910a17b/job-output.txt.gz#_2017-10-12_23_27_37_155946 pushed up yet? | 23:29 |
jeblair | ianw: okay, i'm happy to try 1) remove file; rerun. then if that fails, 2) also remove checksum; rerun. | 23:29 |
mordred | pabelanger: so it does :) | 23:29 |
*** caphrim007 has joined #openstack-infra | 23:29 | |
clarkb | ianw: need to check ansible_default_ipv6 is defined instead of ansible_default_ipv6.address is defined | 23:29 |
*** aeng has quit IRC | 23:30 | |
jeblair | ianw: i'll wait for you to finish reading before i execute. | 23:30 |
fungi | EmilienM: they should be safe to run now. i know there were several tripleo releases i need to reenqueue but was hoping someone on the release team could put together a list of all releases that need reenqueuing besides those so i can do them all in one batch (per my e-mail to the dev list) | 23:30 |
ianw | jeblair: ++ on that plan | 23:30 |
jeblair | tonyb: what's the use case (curiosity) | 23:30 |
*** caphrim007_ has joined #openstack-infra | 23:30 | |
ianw | clarkb: ahh, no i haven't. does that explain the occasional errors? | 23:30 |
clarkb | ianw: I think so | 23:30 |
clarkb | ianw: should I go ahead and push a patch or do you want to? | 23:30 |
ianw | that would be nice, i wasn't looking forward to debugging that | 23:30 |
ianw | i can, let me pull it up | 23:31 |
tonyb | jeblair: I want a list of all the jobs (and nodes) that tripleo runs to help understand the impact of keeping stable/newton around for longer | 23:31 |
ianw | clarkb: that's weird though, i thought that variable was always defined, and just blank | 23:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Let v2 publish shade releases again https://review.openstack.org/511649 | 23:31 |
mordred | pabelanger: ^^ | 23:31 |
ianw | i couldn't find it documented though, maybe i didn't look hard enough | 23:31 |
clarkb | ianw: seems to imply it's not defined if there's no ipv6 | 23:31 |
tonyb | jeblair: if it were 1 or 2 repos I'd just do it by hand but ... | 23:31 |
mordred | pabelanger: we need to re-enable publish-to-pypi for shade in v2 :) | 23:32 |
pabelanger | clarkb: I've added auto-hold for gate-infra-puppet-apply-3-centos-7 on nodepool.o.o | 23:32 |
jeblair | tonyb: gotcha. fwiw, i expect us to have a rest api in zuulv3 in a couple of months that would help with this sort of thing. | 23:32 |
tonyb | jeblair: \o/ | 23:32 |
pabelanger | mordred: ack | 23:32 |
clarkb | tonyb: globally tripleo was ~1/3 of all jobs run when I last checked | 23:32 |
mordred | clarkb, jeblair, ianw: if you have a sec - https://review.openstack.org/511649 is needed for us to cut a shade release | 23:32 |
jeblair | ianw: no change after rming the file | 23:32 |
pabelanger | mordred: do we need to disable in zuulv3? | 23:33 |
jeblair | ianw: will proceed to checksums.db surgery | 23:33 |
mordred | pabelanger: no - we have those pipelines disabled in v3 anyway | 23:33 |
tonyb | clarkb: Wow | 23:33 |
clarkb | tonyb: er not jobs run | 23:33 |
clarkb | tonyb: sorry it was a cpu time calculation | 23:33 |
clarkb | tripleo was 1/3 of all cpu usage | 23:33 |
tonyb | clarkb: Ahh okay. that's less shocking ;P | 23:34 |
pabelanger | mordred: kk | 23:34 |
*** caphrim007 has quit IRC | 23:34 | |
ianw | clarkb: do you know if ansible is a "&&" or a "&" ? i.e. is "- ansible_default_ipv6 is defined" then "- ansible_default_ipv6.address is defined" going to bail too? | 23:34 |
openstackgerrit | Lin Yang proposed openstack-infra/project-config master: Add OpenStack client check to python-rsdclient https://review.openstack.org/511650 | 23:35 |
ianw | cause i'm sure i saw that being blank in the no-routable-address case | 23:35 |
clarkb | ianw: I'm not sure if it will short circuit | 23:35 |
clarkb | dmsimard: pabelanger mordred ^ do you know? | 23:35 |
jeblair | ianw: it is now doing more things. | 23:35 |
jeblair | looks like it's actually downloading package files. | 23:36 |
ianw | \o/ | 23:36 |
pabelanger | clarkb: looking | 23:36 |
jeblair | reprepro --confdir /etc/reprepro/ubuntu -VVVV _forget pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb | 23:36 |
jeblair | was the command i ran to drop the checksums.db entry for that, btw | 23:36 |
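Putting the recovery steps together, the sequence that worked was roughly the following; the pool path on the RW volume and the k5start wrapping are assumptions based on the paths and commands quoted earlier, since the exact interactive invocations aren't all shown in channel:

```shell
# 1. Remove the bad package file from the pool on the RW AFS tree
#    (path assumed from the lists path quoted above).
rm /afs/.openstack.org/mirror/ubuntu/pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb

# 2. Drop its stale checksums.db entry so reprepro forgets the old sums.
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu -VVVV \
    _forget pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb

# 3. Re-run the update; reprepro re-downloads the package and registers
#    the correct checksums.
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu -VVVV update
```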
*** thorst has joined #openstack-infra | 23:37 | |
dmsimard | ianw: I don't understand the question, probably missing context | 23:37 |
ianw | when: | 23:37 |
ianw | - ansible_default_ipv6 is defined | 23:37 |
ianw | - ansible_default_ipv6.address is defined | 23:37 |
ianw | dmsimard: ^ does that work when ansible_default_ipv6 is not defined at all is basically the question | 23:37 |
dmsimard | If you're only interested in the second condition, it should work by itself without the first one but I'd test it first to make sure | 23:38 |
clarkb | dmsimard: we think the first is required because of http://logs.openstack.org/46/511646/1/infra-check/base-integration-centos-7/910a17b/job-output.txt.gz#_2017-10-12_23_27_37_155946 | 23:38 |
ianw | dmsimard: empirical evidence shows it doesn't, but i agree i thought it would too :) | 23:39 |
*** bobh has quit IRC | 23:39 | |
mnaser | but is it possible that ansible_default_ipv6 is defined but without an address? | 23:39 |
ianw | mnaser: i'm pretty sure i saw that with ipv6 but no routable address | 23:39 |
pabelanger | ianw: clarkb: we might want to use nodepool.public_ipv6 | 23:39 |
pabelanger | which we setup in inventory with zuul | 23:40 |
clarkb | mnaser: the log message is "'ansible_default_ipv6' is undefined" | 23:40 |
clarkb | mnaser: implying it's the root var that does not exist | 23:40 |
pabelanger | then you can when: nodepool.public_ipv6 | 23:40 |
dmsimard | ianw, clarkb: I have a sandbox on my laptop just to keep testing this sort of junk with conditionals and other things. It's never straightforward :( | 23:40 |
pabelanger | http://logs.openstack.org/69/511069/1/infra-check/project-config-nodepool/d1f74c9/zuul-info/host-info.ubuntu-xenial.yaml | 23:40 |
pabelanger | ansible_default_ipv6: {} | 23:41 |
mnaser | clarkb: which is why i'm saying if its possible that "ansible_default_ipv6" is defined with "ansible_default_ipv6.address" undefined. if that's the case, the second conditional can be dropped | 23:41 |
ianw | pabelanger: that's cool ... i mean this "default_ipv6" *should* be exactly what we want to express, as it seems to only put a routable ipv6 in there | 23:41 |
mnaser | but i guess pabelanger just confirmed it can be | 23:41 |
openstackgerrit | Merged openstack-infra/project-config master: Let v2 publish shade releases again https://review.openstack.org/511649 | 23:41 |
pabelanger | so, need to check if ansible_default_ipv6 is empty | 23:42 |
pabelanger | which means, no ipv6 | 23:42 |
clarkb | I think the two checks ianw has above should cover all cases | 23:42 |
pabelanger | or, nodepool.public_ipv6 | 23:42 |
pabelanger | which is str | 23:42 |
*** thorst has quit IRC | 23:42 | |
clarkb | it's just a matter of knowing if ansible short-circuits or not | 23:42 |
clarkb | I guess we might also want to check that the address is not empty | 23:42 |
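As a sketch of where this is heading (not a tested fix, and it assumes Ansible combines when: list entries with a short-circuiting "and", which is exactly the open question here), the guard could spell out every condition explicitly, or lean on the nodepool-provided inventory variable pabelanger mentions:

```yaml
# Sketch only: skip IPv6-dependent work on hosts where ansible_default_ipv6
# is missing entirely, or present but empty ({}), or has a blank address.
- name: Do something that needs a routable IPv6 address
  debug:
    msg: "default IPv6 address is {{ ansible_default_ipv6.address }}"
  when:
    - ansible_default_ipv6 is defined
    - ansible_default_ipv6.address is defined
    - ansible_default_ipv6.address | length > 0

# Alternative discussed above, using the string set up in the inventory:
#   when: nodepool.public_ipv6 != ""
```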
mnaser | and until the new release on ansible changes the behaviour too (ha, ha, ha :-P) | 23:42 |
ianw | clarkb: such confusion! i think i will propose an update to the ansible doc page when we figure this out | 23:43 |
clarkb | mnaser: indeed | 23:43 |
*** aeng has joined #openstack-infra | 23:43 | |
* mnaser goes back to getting pdfs to accountant who is unable to unzip an archived file | 23:43 | |
ianw | mnaser: just fax them | 23:44 |
mnaser | so his specific request: please attach every single pdf into the email without a zip file, because that stuff is complicated .. | 23:44 |
clarkb | fungi: is the undercloud/tmp delete done? I'm not seeing it in ps | 23:45 |
pabelanger | clarkb: ianw: you can use with_dict: ansible_default_ipv6 | 23:45 |
pabelanger | then {{ item.address }} | 23:45 |
pabelanger | and should do the right thing | 23:45 |
*** markvoelker has quit IRC | 23:46 | |
*** gouthamr has joined #openstack-infra | 23:46 | |
ianw | and check if that's defined? | 23:46 |
pabelanger | ya, or | default({}) | 23:46 |
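pabelanger's default({}) idea can also be used directly in the conditional instead of through with_dict (where the loop variable would actually be item.key/item.value pairs rather than item.address); a minimal sketch, assuming the goal is just a defined, non-empty address:

```yaml
# Sketch only: coerce a missing ansible_default_ipv6 to an empty dict so
# the .address lookup never touches an undefined variable.
- name: Do something that needs a routable IPv6 address
  debug:
    msg: "default IPv6 address is {{ ansible_default_ipv6.address }}"
  when: (ansible_default_ipv6 | default({})).address | default('') | length > 0
```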
openstackgerrit | Merged openstack-infra/system-config master: Remove npm / rubygem crontab entries https://review.openstack.org/473911 | 23:46 |
*** bobh has joined #openstack-infra | 23:47 | |
EmilienM | fungi: ack, thx for the update | 23:48 |
*** tosky has quit IRC | 23:49 | |
fungi | clarkb: still running. it's in a window of the root screen session there | 23:49 |
fungi | i've cycled to that window now, you should see it if you attach | 23:50 |
clarkb | fungi: thanks | 23:50 |
clarkb | fungi: that is the log archive maintenance script? | 23:51 |
fungi | nope | 23:51 |
*** wolverineav has quit IRC | 23:51 | |
fungi | when you pointed out there was a missing subdirectory level, i went back to the original set of three patterns to delete and added the additional */ | 23:51 |
ianw | why did i start looking : https://github.com/ansible/ansible/issues/23675 | 23:51 |
clarkb | apparently screen -x maintains different windows on different attaches | 23:51 |
fungi | oh neat | 23:51 |
fungi | well, anyway, it's one of the three windows under that root screen session | 23:52 |
fungi | the other two are the v3 log deletion and the 2-week expiration | 23:53 |
clarkb | yup I see it now | 23:53 |
jeblair | ianw: reprepro finished | 23:54 |
jeblair | the file i deleted exists now and is the correct size | 23:55 |
ianw | yay team! | 23:55 |
jeblair | 818 files lost their last reference. | 23:55 |
jeblair | (dumpunreferenced lists such files, use deleteunreferenced to delete them.) | 23:55 |
pabelanger | great work | 23:55 |
jeblair | okay, what do we want to do next? | 23:55 |
jeblair | clarkb: i think you suggested that we switch one or more mirrors to serving from the rw volume, yeah? | 23:56 |
jeblair | that's basically a single character apache config change | 23:56 |
clarkb | jeblair: ya and then make sure that jobs are happy with the mirror before we commit to it via vos release | 23:56 |
pabelanger | we should increase the timeout kill value too, I don't think 30m is enough time | 23:56 |
pabelanger | maybe make it 90 | 23:56 |
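For context on what that change touches: assuming the cron wrapper runs reprepro under GNU timeout (the actual script contents aren't quoted in channel, so the flags here are illustrative), the bump would look roughly like:

```shell
# Illustrative only: extend the wall-clock limit from 30 to 90 minutes so a
# long update isn't killed partway through, leaving the lockfile behind.
timeout -k 2m 90m \
    k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu update
```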
clarkb | jeblair: ianw do we want to consider rerunning reprepro again and seeing if it mostly noops? | 23:57 |
jeblair | pabelanger: can you write that change? let's get it merged before we turn cron back on | 23:57 |
ianw | i would; run the full /usr/local/bin script | 23:57 |
pabelanger | sure | 23:57 |
jeblair | ianw: that will do the vos release which i don't want to do just yet | 23:57 |
clarkb | fungi: looking at ps I'm worried that that rm has been spending all of its time globbing things? | 23:57 |
jeblair | ianw: but i can do the rest of the reprepro steps there | 23:57 |
ianw | oh, yeah, with that commented out, and maybe the timeout commented out too | 23:57 |
jeblair | ianw: so maybe deleteunreferenced next? | 23:57 |
clarkb | fungi: we might need to run that through find too instead? | 23:58 |
clarkb | fungi: strace seems to agree that it's just sitting there for the most part | 23:58 |
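If the shell glob really is the bottleneck, one alternative (the path and pattern here are purely illustrative, since the real ones aren't quoted in channel) is to let find stream matches straight into the delete instead of expanding one huge glob up front:

```shell
# Illustrative only: find prunes each matching directory as it goes rather
# than building a single enormous argument list in the shell first.
find /path/to/logs -type d -name 'tmp' -prune -exec rm -rf {} +
```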
ianw | jeblair: i think so, since better to know what happens now than when cron hits it | 23:58 |
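Per reprepro's own hint a few lines up, inspecting and then clearing those 818 orphaned pool files would be roughly the following (same confdir as the earlier commands; the k5start wrapping is again an assumption):

```shell
# List pool files that no longer have any reference, then delete them.
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu dumpunreferenced
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu deleteunreferenced
```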