*** armax has quit IRC | 00:00 | |
*** aaronsheffield has quit IRC | 00:01 | |
*** jrist has quit IRC | 00:01 | |
donnyd | So I am 100% confident my bottleneck is no longer storage | 00:02 |
---|---|---|
donnyd | Just fired up a vm on a loaded server, and here are the results | 00:03 |
donnyd | https://www.irccloud.com/pastebin/9OMKNFQH/ | 00:03 |
*** jrist has joined #openstack-infra | 00:03 | |
*** sgw has joined #openstack-infra | 00:04 | |
*** jrist has quit IRC | 00:04 | |
donnyd | I am not sure what other clouds are doing, but I would be curious to see what one would get in this instance type. Maybe I am still short | 00:05 |
*** betherly has joined #openstack-infra | 00:06 | |
*** lseki has quit IRC | 00:09 | |
donnyd | So I am not sure why tempest runs keep timing out on FN... It appears to me like things are running as they should from the infra perspective | 00:09 |
*** jrist has joined #openstack-infra | 00:10 | |
*** betherly has quit IRC | 00:11 | |
clarkb | donnyd: I think it is possible that the swapping is just really hurting it | 00:12 |
donnyd | Here are stats from my edge router showing we aren't short there... The only thing left it could be is my CPUs are too old and slow.. or too overcommitted https://usercontent.irccloud-cdn.com/file/WoVNW88l/Screenshot%20from%202019-07-31%2020-10-29.png | 00:12 |
*** michael-beaver has quit IRC | 00:13 | |
*** jrist has quit IRC | 00:15 | |
*** jrist has joined #openstack-infra | 00:17 | |
*** ekultails has quit IRC | 00:18 | |
donnyd | clarkb: I will see if turning down the cpu ratios helps at all.. but I can't do much more to make this thing go any faster | 00:20 |
donnyd | Maybe the context switching on the hypervisor is slowing things to a crawl | 00:20 |
*** sgw has quit IRC | 00:20 | |
donnyd | or something like that | 00:20 |
clarkb | donnyd: ok, I think part of the problem is definitely the software being slow too (for example that api stuff we were discussing can be much quicker) | 00:21 |
donnyd | yea, that would surely help | 00:21 |
donnyd | I have a tv calling me, but I do appreciate everyone ( clarkb fungi mordred ianw ) for helping me to get FN back online | 00:23 |
donnyd | Have a great night | 00:23 |
fungi | we appreciate your help too, thanks donnyd! | 00:23 |
clarkb | oh right that is happening again tonight | 00:24 |
clarkb | donnyd: thank you! and I should go find the tv too | 00:24 |
*** armax has joined #openstack-infra | 00:44 | |
*** gregoryo has joined #openstack-infra | 00:47 | |
*** efried has quit IRC | 00:51 | |
*** efried has joined #openstack-infra | 00:59 | |
*** ricolin has joined #openstack-infra | 01:04 | |
*** betherly has joined #openstack-infra | 01:08 | |
*** betherly has quit IRC | 01:13 | |
*** eernst_ has quit IRC | 01:19 | |
*** igordc has quit IRC | 01:31 | |
*** bobh has joined #openstack-infra | 01:34 | |
*** happyhemant has quit IRC | 01:35 | |
*** armax has quit IRC | 01:39 | |
*** betherly has joined #openstack-infra | 01:40 | |
*** dychen has joined #openstack-infra | 01:41 | |
*** dchen has quit IRC | 01:44 | |
*** betherly has quit IRC | 01:44 | |
*** bhavikdbavishi has joined #openstack-infra | 01:48 | |
*** eernst has joined #openstack-infra | 01:50 | |
*** bhavikdbavishi1 has joined #openstack-infra | 01:51 | |
*** bhavikdbavishi has quit IRC | 01:52 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 01:52 | |
*** eernst has quit IRC | 01:58 | |
*** gyee has quit IRC | 02:00 | |
*** betherly has joined #openstack-infra | 02:01 | |
*** betherly has quit IRC | 02:05 | |
*** slaweq has joined #openstack-infra | 02:11 | |
*** slaweq has quit IRC | 02:15 | |
*** bobh has quit IRC | 02:19 | |
*** betherly has joined #openstack-infra | 02:25 | |
*** bobh has joined #openstack-infra | 02:26 | |
*** dingyichen has joined #openstack-infra | 02:26 | |
*** betherly has quit IRC | 02:29 | |
*** dychen has quit IRC | 02:30 | |
*** ykarel|away has joined #openstack-infra | 02:30 | |
*** ykarel|away has quit IRC | 02:35 | |
*** bhavikdbavishi has quit IRC | 02:35 | |
*** ramishra has joined #openstack-infra | 02:41 | |
*** bobh has quit IRC | 02:56 | |
*** kjackal has joined #openstack-infra | 03:04 | |
*** eernst has joined #openstack-infra | 03:05 | |
*** rlandy|bbl has quit IRC | 03:10 | |
*** kjackal has quit IRC | 03:17 | |
*** notmyname has quit IRC | 03:20 | |
*** notmyname has joined #openstack-infra | 03:20 | |
*** ykarel|away has joined #openstack-infra | 03:26 | |
*** diablo_rojo has quit IRC | 03:27 | |
*** bhavikdbavishi has joined #openstack-infra | 03:29 | |
*** bobh has joined #openstack-infra | 03:33 | |
*** bobh has quit IRC | 03:38 | |
*** whoami-rajat has joined #openstack-infra | 03:41 | |
*** ykarel|away is now known as ykarel | 03:43 | |
*** hongbin has joined #openstack-infra | 03:45 | |
*** hongbin has quit IRC | 03:46 | |
*** udesale has joined #openstack-infra | 03:46 | |
*** psachin has joined #openstack-infra | 03:55 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** ramishra has quit IRC | 04:17 | |
*** slaweq has quit IRC | 04:17 | |
ianw | hrm not getting a console stream on http://zuul.openstack.org/stream/ec45ac1d208343b58dee520053e8caee?logfile=console.log | 04:29 |
*** ramishra has joined #openstack-infra | 04:30 | |
ianw | went to ze02, job-output.txt is flowing through ok but not console | 04:32 |
*** n-saito has joined #openstack-infra | 04:45 | |
*** eernst has quit IRC | 04:55 | |
*** raukadah is now known as chandankumar | 04:59 | |
*** goldyfruit has joined #openstack-infra | 04:59 | |
*** ykarel is now known as ykarel|afk | 05:00 | |
*** gfidente has joined #openstack-infra | 05:02 | |
*** goldyfruit has quit IRC | 05:04 | |
*** ykarel|afk has quit IRC | 05:04 | |
ianw | infra-root: so it looks like the log daemon on ze01 & ze02 has disappeared; nothing listening on port 7900. i've looked through the logs for anything with "^Traceback" but i can't see anything obvious as to why it would stop | 05:06 |
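The "nothing listening on port 7900" check above can be reproduced with a short script. This is a minimal sketch, not the tooling ianw used: the helper name `is_listening` is hypothetical, and it only tests whether a TCP connection succeeds, which says nothing about why a daemon stopped.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it does not inspect
    which process owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused connection does not raise an exception.
        return sock.connect_ex((host, port)) == 0
```

Run on an executor, `is_listening("localhost", 7900)` would be expected to return `False` while the streaming daemon is down, assuming it normally accepts connections on that port over IPv4.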
*** n-saito has quit IRC | 05:09 | |
*** diablo_rojo has joined #openstack-infra | 05:21 | |
*** ociuhandu has joined #openstack-infra | 05:22 | |
*** dingyichen has quit IRC | 05:23 | |
*** pkopec has joined #openstack-infra | 05:23 | |
*** dchen has joined #openstack-infra | 05:23 | |
*** diablo_rojo has quit IRC | 05:25 | |
*** dpawlik has joined #openstack-infra | 05:26 | |
*** jamesmcarthur has quit IRC | 05:26 | |
*** dansmith has quit IRC | 05:26 | |
*** ociuhandu has quit IRC | 05:27 | |
*** dansmith has joined #openstack-infra | 05:28 | |
*** kopecmartin|off is now known as kopecmartin | 05:46 | |
*** jaosorior has quit IRC | 05:46 | |
*** ykarel|afk has joined #openstack-infra | 05:51 | |
*** ccamacho has quit IRC | 05:54 | |
*** ykarel|afk is now known as ykarel | 05:58 | |
*** slaweq has joined #openstack-infra | 06:04 | |
ianw | #status log restarted ze02 to get log streaming working | 06:07 |
openstackstatus | ianw: finished logging | 06:07 |
ianw | infra-root: ^ see my notes in #zuul -- ze01 is still not streaming but i'm not going to touch it right now. i don't know if the "Unable to find worker for job" is just part of the startup now, or there's something else going on | 06:07 |
ianw | ze02 appears to be processing jobs but i don't want to risk making anything worse | 06:08 |
*** slaweq has quit IRC | 06:09 | |
*** slaweq has joined #openstack-infra | 06:11 | |
*** takamatsu has joined #openstack-infra | 06:13 | |
*** slaweq has quit IRC | 06:16 | |
*** xek has joined #openstack-infra | 06:17 | |
*** bobh has joined #openstack-infra | 06:19 | |
*** jtomasek has joined #openstack-infra | 06:21 | |
*** janki has joined #openstack-infra | 06:22 | |
*** bobh has quit IRC | 06:23 | |
*** pgaxatte has joined #openstack-infra | 06:26 | |
*** ricolin_ has joined #openstack-infra | 06:26 | |
*** xek has quit IRC | 06:27 | |
*** iurygregory has joined #openstack-infra | 06:29 | |
*** ricolin has quit IRC | 06:29 | |
*** jlufr has joined #openstack-infra | 06:40 | |
*** odicha has joined #openstack-infra | 06:45 | |
openstackgerrit | Luigi Toscano proposed zuul/zuul-jobs master: fetch-subunit-output: collect additional subunit files https://review.opendev.org/673885 | 06:56 |
*** e0ne has joined #openstack-infra | 06:57 | |
*** e0ne has quit IRC | 06:59 | |
*** slaweq has joined #openstack-infra | 07:04 | |
*** ginopc has joined #openstack-infra | 07:12 | |
*** pcaruana has quit IRC | 07:12 | |
*** rpittau|afk is now known as rpittau | 07:13 | |
*** ccamacho has joined #openstack-infra | 07:18 | |
*** tesseract has joined #openstack-infra | 07:20 | |
*** fdegir has joined #openstack-infra | 07:24 | |
*** jtomasek has quit IRC | 07:29 | |
*** jpena|off is now known as jpena | 07:34 | |
*** Goneri has joined #openstack-infra | 07:35 | |
*** ykarel is now known as ykarel|lunch | 07:37 | |
*** AJaeger has quit IRC | 07:37 | |
*** ralonsoh has joined #openstack-infra | 07:39 | |
*** ociuhandu has joined #openstack-infra | 07:39 | |
*** ralonsoh has quit IRC | 07:40 | |
*** ralonsoh has joined #openstack-infra | 07:40 | |
*** pcaruana has joined #openstack-infra | 07:42 | |
*** AJaeger has joined #openstack-infra | 07:48 | |
*** gregoryo has quit IRC | 07:51 | |
*** jtomasek has joined #openstack-infra | 07:52 | |
*** e0ne has joined #openstack-infra | 07:53 | |
*** tosky has joined #openstack-infra | 07:54 | |
*** ociuhandu has quit IRC | 07:56 | |
*** electrofelix has joined #openstack-infra | 07:58 | |
*** lpetrut has joined #openstack-infra | 08:03 | |
*** lpetrut has quit IRC | 08:04 | |
*** goldyfruit has joined #openstack-infra | 08:04 | |
*** lpetrut has joined #openstack-infra | 08:04 | |
*** dchen has quit IRC | 08:07 | |
*** jtomasek has quit IRC | 08:09 | |
*** goldyfruit has quit IRC | 08:09 | |
*** happyhemant has joined #openstack-infra | 08:09 | |
*** lucasagomes has joined #openstack-infra | 08:11 | |
*** ykarel|lunch is now known as ykarel|away | 08:21 | |
*** pgaxatte has quit IRC | 08:22 | |
*** derekh has joined #openstack-infra | 08:25 | |
*** ykarel|away has quit IRC | 08:27 | |
*** tkajinam has quit IRC | 08:27 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 08:28 |
*** lpetrut has quit IRC | 08:29 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 08:31 |
*** pgaxatte has joined #openstack-infra | 08:52 | |
*** priteau has joined #openstack-infra | 08:53 | |
*** psachin has quit IRC | 09:00 | |
*** bobh has joined #openstack-infra | 09:07 | |
*** janki has quit IRC | 09:08 | |
*** janki has joined #openstack-infra | 09:09 | |
*** bobh has quit IRC | 09:12 | |
*** jlufr has quit IRC | 09:13 | |
openstackgerrit | Sorin Sbarnea proposed opendev/base-jobs master: [POC] Execute linters using pre-commit tool https://review.opendev.org/673969 | 09:17 |
*** jchhatbar has joined #openstack-infra | 09:17 | |
*** janki has quit IRC | 09:17 | |
openstackgerrit | Carlos Goncalves proposed openstack/diskimage-builder master: Reduce yum-minimal based OS install size footprint https://review.opendev.org/672329 | 09:19 |
*** ginopc has quit IRC | 09:19 | |
*** ginopc has joined #openstack-infra | 09:25 | |
*** jchhatbar has quit IRC | 09:25 | |
*** e0ne_ has joined #openstack-infra | 09:36 | |
*** yamamoto has joined #openstack-infra | 09:36 | |
*** e0ne has quit IRC | 09:37 | |
*** roman_g has joined #openstack-infra | 09:38 | |
*** priteau has quit IRC | 09:43 | |
*** priteau has joined #openstack-infra | 09:43 | |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: Fix ACL for compute-hyperv https://review.opendev.org/673988 | 09:47 |
*** priteau has quit IRC | 09:50 | |
*** priteau has joined #openstack-infra | 09:51 | |
*** goldyfruit has joined #openstack-infra | 09:55 | |
*** bhavikdbavishi has quit IRC | 09:56 | |
*** ociuhandu has joined #openstack-infra | 09:57 | |
*** goldyfruit has quit IRC | 10:00 | |
*** ociuhandu has quit IRC | 10:01 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: POC: Run linters via pre-commit https://review.opendev.org/667699 | 10:01 |
*** prometheanfire has quit IRC | 10:04 | |
*** rpittau is now known as rpittau|bbl | 10:04 | |
*** yamamoto has quit IRC | 10:07 | |
*** sshnaidm|afk is now known as sshnaidm | 10:25 | |
*** ricolin__ has joined #openstack-infra | 10:28 | |
*** dtantsur|afk is now known as dtantsur | 10:29 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 10:29 |
*** yamamoto has joined #openstack-infra | 10:30 | |
*** yamamoto has quit IRC | 10:30 | |
*** ricolin_ has quit IRC | 10:31 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 10:33 |
*** georgk has quit IRC | 10:34 | |
*** georgk has joined #openstack-infra | 10:35 | |
*** georgk has quit IRC | 10:35 | |
*** georgk has joined #openstack-infra | 10:37 | |
openstackgerrit | Sorin Sbarnea proposed opendev/system-config master: Recognize DISK_FULL failure messages (review_dev) https://review.opendev.org/673893 | 10:47 |
*** prometheanfire has joined #openstack-infra | 10:50 | |
*** yamamoto has joined #openstack-infra | 10:51 | |
frickler | ianw: I've commented on the ethercalc regarding devstack OSC timings, if you remove the two results from clarkb's tests, the stddev goes way down and the improvement resulting from them gets much clearer IMHO | 10:54 |
*** yamamoto has quit IRC | 10:55 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 10:55 |
*** dpawlik has quit IRC | 10:58 | |
*** rcernin has quit IRC | 10:58 | |
*** dpawlik has joined #openstack-infra | 11:06 | |
*** yamamoto has joined #openstack-infra | 11:10 | |
*** goldyfruit has joined #openstack-infra | 11:14 | |
*** ociuhandu has joined #openstack-infra | 11:20 | |
*** ociuhandu has quit IRC | 11:21 | |
*** goldyfruit has quit IRC | 11:21 | |
*** panda is now known as panda|eat | 11:25 | |
*** dpawlik has quit IRC | 11:27 | |
*** e0ne has joined #openstack-infra | 11:29 | |
*** e0ne_ has quit IRC | 11:30 | |
*** bhavikdbavishi has joined #openstack-infra | 11:35 | |
*** bhavikdbavishi has quit IRC | 11:37 | |
*** bhavikdbavishi has joined #openstack-infra | 11:38 | |
*** bobh has joined #openstack-infra | 11:45 | |
*** priteau has quit IRC | 11:46 | |
*** tdasilva has joined #openstack-infra | 11:47 | |
*** udesale has quit IRC | 11:49 | |
*** bobh has quit IRC | 11:50 | |
*** udesale has joined #openstack-infra | 11:50 | |
*** jamesmcarthur has joined #openstack-infra | 11:51 | |
*** dpawlik has joined #openstack-infra | 11:57 | |
*** pgaxatte has quit IRC | 11:58 | |
*** ociuhandu has joined #openstack-infra | 12:01 | |
*** dpawlik has quit IRC | 12:01 | |
*** ociuhandu has quit IRC | 12:05 | |
*** jamesmcarthur has quit IRC | 12:06 | |
*** ociuhandu has joined #openstack-infra | 12:15 | |
*** bobh has joined #openstack-infra | 12:16 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Optionally allow zoned executors to process unzoned jobs https://review.opendev.org/673840 | 12:19 |
*** Lucas_Gray has joined #openstack-infra | 12:22 | |
*** ociuhandu has quit IRC | 12:25 | |
*** ociuhandu has joined #openstack-infra | 12:27 | |
*** dpawlik has joined #openstack-infra | 12:29 | |
*** ociuhandu has quit IRC | 12:31 | |
*** rlandy has joined #openstack-infra | 12:31 | |
*** ociuhandu has joined #openstack-infra | 12:31 | |
*** Lucas_Gray has quit IRC | 12:32 | |
*** ekultails has joined #openstack-infra | 12:32 | |
*** panda|eat is now known as panda | 12:34 | |
*** jpena is now known as jpena|off | 12:34 | |
*** Lucas_Gray has joined #openstack-infra | 12:35 | |
*** bhavikdbavishi1 has joined #openstack-infra | 12:37 | |
*** bhavikdbavishi has quit IRC | 12:37 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:37 | |
*** Lucas_Gray has quit IRC | 12:42 | |
*** jamesmcarthur has joined #openstack-infra | 12:44 | |
*** aaronsheffield has joined #openstack-infra | 12:51 | |
gmann | infra team, patrole stable/stein was created by mistake, I have fixed that on release side (https://review.opendev.org/#/c/670942/) can you delete the branch now - https://opendev.org/openstack/patrole/src/branch/stable/stein | 12:53 |
mordred | it's currently on 58ec6210c6a121743e6bc3217d1388962f0647c3 | 12:54 |
mordred | done | 12:54 |
*** rpittau|bbl is now known as rpittau | 12:57 | |
*** jaosorior has joined #openstack-infra | 12:58 | |
*** pgaxatte has joined #openstack-infra | 13:03 | |
gmann | mordred: thanks | 13:05 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 13:07 |
*** tosky has quit IRC | 13:10 | |
*** tosky has joined #openstack-infra | 13:10 | |
*** bobh has quit IRC | 13:12 | |
donnyd | in the interest of getting to the bottom of job performance I am curious to know if increasing the available memory for each instance will in fact improve performance. | 13:15 |
*** jpena|off is now known as jpena | 13:16 | |
donnyd | I think at this point the jobs that time out on FN are fairly well established, so i propose increasing memory for each instance on FN from 8G to 16G to see if that helps jobs move any faster. With the understanding that we aren't planning to ask for more from the cloud providers, I just want to get to the bottom of getting the absolute most out of each job. | 13:18 |
donnyd | With the proper data, I think we can do better at optimizing the jobs if we can definitively say that the jobs are using too much memory and find a way to make them better | 13:20 |
*** yamamoto has quit IRC | 13:21 | |
*** ociuhandu has quit IRC | 13:26 | |
*** ociuhandu has joined #openstack-infra | 13:27 | |
*** mriedem has joined #openstack-infra | 13:27 | |
*** jcoufal has joined #openstack-infra | 13:27 | |
*** priteau has joined #openstack-infra | 13:27 | |
*** ginopc has quit IRC | 13:29 | |
*** ociuhandu has quit IRC | 13:31 | |
*** lseki has joined #openstack-infra | 13:33 | |
*** ginopc has joined #openstack-infra | 13:33 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 13:37 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Make tenant and pipeline optional in zuul-changes https://review.opendev.org/674034 | 13:38 |
*** ginopc has quit IRC | 13:40 | |
*** yamamoto has joined #openstack-infra | 13:43 | |
*** witek has joined #openstack-infra | 13:44 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Move fingergw config to fingergw https://review.opendev.org/664949 | 13:45 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Route streams to different zones via finger gateway https://review.opendev.org/664965 | 13:45 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/664950 | 13:45 |
*** eharney has joined #openstack-infra | 13:46 | |
frickler | donnyd: if you want to abandon https://review.opendev.org/673838 you should see an "abandon" button there where you can do that yourself | 13:47 |
*** ginopc has joined #openstack-infra | 13:48 | |
frickler | donnyd: we generally want to keep the memory per instance identical on all providers. otherwise jobs will start to fail for lack of memory based on provider selection, which would be bad | 13:48 |
donnyd | frickler: well they already do that | 13:50 |
donnyd | The understanding is that we know they fail, but is it from lack of memory and going into swap? | 13:50 |
donnyd | Not a permanent thing, just testing to see if the instances that are provided more memory stop failing, or build faster | 13:51 |
witek | hello Infra Team | 13:52 |
frickler | donnyd: hmm, maybe wait for feedback from other infra-root, for me I'd like to see such a test made with dedicated job runs, not with a random subset of our general job queue | 13:52 |
*** priteau has quit IRC | 13:53 | |
witek | we see Monasca tempest jobs failing since Tuesday with strange errors when stacking DevStack | 13:54 |
witek | I don't think we have changed anything in the configuration though | 13:54 |
frickler | witek: is that with nova.conf missing? do you have a sample log pointer? | 13:54 |
witek | frickler: yes | 13:55 |
witek | https://logs.opendev.org/16/674016/1/check/monasca-tempest-python3-influxdb/3b8a695/controller/logs/devstacklog.txt.gz#_2019-08-01_12_38_53_231 | 13:55 |
frickler | witek: yes, that's a known issue in tempest, sadly the revert keeps failing in gate, see https://review.opendev.org/673784 | 13:56 |
witek | there are some other errors in other jobs as well | 13:56 |
donnyd | I surely agree that across the cloud providers we should be doing exactly the same thing, but there are a certain batch of jobs that fail on the regular. The other purpose in doing it more broadly is to see if all jobs are completed faster or it's only certain ones that can actually make use of the extra memory. If yes, then we can report back to the communities with complete data saying that we need to find a better | 13:56 |
donnyd | way to more efficiently use the memory we have | 13:56 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add telnet to Docker Image https://review.opendev.org/672791 | 13:57 |
witek | frickler: thanks for info | 13:57 |
donnyd | But I am thinking a temporary test could give us the data we need | 13:57 |
donnyd | I agree we should wait to see what everyone else thinks | 13:58 |
*** lbragstad has joined #openstack-infra | 13:59 | |
donnyd | frickler: thanks for reminding me to abandon my review, still learning how to gerrit and such | 14:00 |
*** priteau has joined #openstack-infra | 14:01 | |
*** michael-beaver has joined #openstack-infra | 14:02 | |
frickler | #status log Force-Merged openstack/tempest master: Revert "Use memcached based cache in nova in all devstack-tempest jobs" https://review.opendev.org/673784 as requested by gmann to unblock other projects | 14:04 |
openstackstatus | frickler: finished logging | 14:04 |
gmann | frickler: thanks | 14:04 |
*** ociuhandu has joined #openstack-infra | 14:07 | |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Avoid openstacksdk image delete bug https://review.opendev.org/674043 | 14:14 |
mriedem | clarkb: the devstack memcache + nova meta api patch https://review.opendev.org/#/c/674025/ | 14:15 |
*** ociuhandu has quit IRC | 14:16 | |
*** ociuhandu has joined #openstack-infra | 14:19 | |
*** jhesketh has quit IRC | 14:23 | |
*** ociuhandu has quit IRC | 14:23 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add spec for enhanced regional executor distribution https://review.opendev.org/663413 | 14:34 |
jroll | neat, pypi added upload-scoped API tokens, we should consider moving to that for uploads: https://pyfound.blogspot.com/2019/07/pypi-now-supports-uploading-via-api.html | 14:38 |
jroll | it also means we'd be able to put 2FA on our pypi account(s) :) | 14:39 |
corvus | jroll: that looks perfect | 14:39 |
corvus | supports scoping to all packages maintained by an account | 14:40 |
corvus | so we can still just have one for all of openstack | 14:40 |
jroll | yep | 14:40 |
corvus | most of that work will be enhancements to the roles in zuul-jobs to support tokens. anyone can do that. once that's in place, an infra-root will need to go into our account and get a token. | 14:43 |
*** dpawlik has quit IRC | 14:43 | |
mordred | ++ | 14:43 |
*** ykarel has joined #openstack-infra | 14:43 | |
corvus | we need to restart zuul to pick up the log_url patch; earlier the better on that, so i'll get started on that | 14:45 |
*** jaosorior has quit IRC | 14:46 | |
fungi | corvus: jroll: the topic came up in #zuul last week, and it was unclear what benefit that provided over using a dedicated account for performing uploads (which is basically what we do now) | 14:46 |
jroll | fungi: can that account do anything besides upload? | 14:47 |
jroll | if not, I guess I don't see any benefit either | 14:48 |
fungi | what else important is there to do on pypi besides uploading? | 14:48 |
corvus | fungi: i think it means if the creds were compromised, they couldn't be used to remove or replace old releases, or change ownership of a project | 14:48 |
fungi | well, pypi already doesn't allow replacing old releases | 14:49 |
corvus | at least, i really hope that's the case, otherwise they shouldn't call them 'upload' tokens :) | 14:49 |
corvus | fungi: if they allow deleting old releases (which i'm pretty sure we've learned to our chagrin they do) and they allow uploading, then they do, right? | 14:49 |
jroll | right, uploads are probably the most interesting thing to do with a stolen credential anyway, so it isn't a large benefit IMO. but things like changing ownership are also bad | 14:49 |
fungi | the feature seemed to be mostly so that for users who interact with the warehouse webui and so want to use 2fa because they're at risk for leaking their credentials, but also use the same account to upload files and need some means of doing that non-interactively | 14:50 |
fungi | corvus: they allow you to delete old releases, but they don't allow you to reupload a previously-deleted release | 14:50 |
corvus | fascinating | 14:51 |
fungi | they indefinitely track the old release/file names | 14:51 |
fungi | even after deletion | 14:51 |
fungi | (to close the loophole you describe) | 14:51 |
corvus | then i agree this doesn't significantly change our security posture except in relation to ownership changes (though to really effect that, we'd have to revoke the ownership of our names by the patchwork of people who additionally own them now) | 14:51 |
fungi | yeah, i mean i think from a zuul-jobs perspective supporting tokens is great, but i believe because it reuses the existing credential options for twine it would already work today | 14:52 |
corvus | (zuul is restarting, btw) | 14:53 |
corvus | it just uses the password field? | 14:53 |
corvus | or, username + password | 14:54 |
fungi | right, and a generic username of "token" (or something like that) | 14:54 |
corvus | ok, then i guess it's at most a docs change :) | 14:54 |
fungi | token goes in the password field, username is a generic one which indicates the warehouse auth api should do a token lookup | 14:54 |
fungi | so right, if we wanted opendev's pypi uploads to use tokens, pretty sure it's mostly a matter of issuing one with our account and then updating the secret in project-config | 14:55 |
fungi | so maybe still worth doing if we think the slight reduction in permissions is worthwhile | 14:56 |
*** priteau has quit IRC | 14:56 | |
fungi | the main reason the announcement is so broad is that pypi plans to make upload tokens mandatory for accounts which have enabled two-factor authentication (though no indication of when that's planned yet, and the implication is that non-2fa accounts can continue to upload with their normal account username+password for the foreseeable future) | 14:57 |
fungi | maybe someday they'll also make upload tokens mandatory for non-2fa accounts, but i've not seen anyone suggest that | 14:58 |
corvus | i'm re-enqueing now; the executors haven't finished stopping yet, which is interesting. they're extra slow today. | 15:01 |
jroll | corvus: fungi: good info, thanks. it does seem like it isn't super useful, but a good item for someone to pick up if they're bored :) | 15:02 |
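The token scheme fungi describes (token in the password field, a fixed generic username) can be sketched as follows. PyPI's documented literal username for API tokens is `__token__`, and `TWINE_USERNAME` / `TWINE_PASSWORD` are environment variables twine reads; the function names and the placeholder token below are assumptions for illustration and are not taken from the zuul-jobs roles.

```python
import os
import subprocess


def twine_token_env(api_token: str) -> dict:
    """Build an environment for twine that authenticates with an API token.

    Hypothetical helper: the token replaces the password, and the
    username is the fixed string "__token__".
    """
    env = dict(os.environ)
    env["TWINE_USERNAME"] = "__token__"
    env["TWINE_PASSWORD"] = api_token
    return env


def upload(dist_files: list, api_token: str) -> None:
    """Upload the given distribution files (assumes twine is installed)."""
    subprocess.run(
        ["twine", "upload", *dist_files],
        env=twine_token_env(api_token),
        check=True,
    )
```

Because only the credential values change, an existing username+password secret could in principle be swapped for a token without changing the upload command, which matches the "at most a docs change" conclusion in the discussion.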
*** yamamoto has quit IRC | 15:02 | |
clarkb | corvus: did you see ianw's notes about ze01 and ze02 not streaming logs? | 15:02 |
corvus | clarkb: yes | 15:03 |
clarkb | maybe related to slowness stopping? | 15:03 |
openstackgerrit | Natal Ngétal proposed openstack/reviewstats master: Load subproject data from governance https://review.opendev.org/653024 | 15:03 |
openstackgerrit | Natal Ngétal proposed openstack/reviewstats master: Raise hacking version and fix pep8 errors https://review.opendev.org/655911 | 15:03 |
corvus | ze02 stopped, and ze01 just did | 15:03 |
openstackgerrit | Natal Ngétal proposed openstack/reviewstats master: Switch to stestr https://review.opendev.org/655506 | 15:03 |
openstackgerrit | Natal Ngétal proposed openstack/reviewstats master: Drop pypy default tox env https://review.opendev.org/655912 | 15:03 |
*** yamamoto has joined #openstack-infra | 15:04 | |
*** yamamoto has quit IRC | 15:04 | |
*** yamamoto has joined #openstack-infra | 15:05 | |
corvus | neat, ze07 hit the oom killer | 15:05 |
corvus | and the whole executor process died | 15:05 |
corvus | i suspect that's what happened to 01 and 02, except that it only killed the streaming daemon | 15:06 |
corvus | the extra slowness is probably from the other executors being overloaded | 15:06 |
clarkb | perhaps related to cmurphy's keystone console logs that killed my desktop | 15:06 |
clarkb | if someone was trying to view them in the live streamed console? | 15:06 |
*** pkopec has quit IRC | 15:06 | |
corvus | all is back up now | 15:06 |
fungi | ugh, our broadband provider seems to have lost contact with the mainland at 15:00z precisely (i'm back on through a wireless modem for now) | 15:06 |
clarkb | fungi: mine texted me several hours ago about an outage today too | 15:07 |
clarkb | seems to be working so Im hoping it is all behind us | 15:07 |
*** ykarel is now known as ykarel|away | 15:08 | |
*** yamamoto has quit IRC | 15:09 | |
*** diga has joined #openstack-infra | 15:10 | |
*** ociuhandu has joined #openstack-infra | 15:10 | |
diga | Hello Everyone | 15:10 |
diga | I want to create new project in storyboard and under Open Infra topic | 15:11 |
diga | what's the process for it ? | 15:11 |
*** yamamoto has joined #openstack-infra | 15:13 | |
*** yamamoto has quit IRC | 15:13 | |
*** yamamoto has joined #openstack-infra | 15:14 | |
AJaeger | diga: what kind of new project? What's the purpose? | 15:14 |
*** ykarel|away has quit IRC | 15:17 | |
*** ykarel|away has joined #openstack-infra | 15:17 | |
*** ykarel|away has quit IRC | 15:18 | |
*** yamamoto has quit IRC | 15:18 | |
paladox | fungi you live on an island? | 15:19 |
*** ykarel|away has joined #openstack-infra | 15:19 | |
*** ykarel|away has quit IRC | 15:20 | |
*** ykarel|away has joined #openstack-infra | 15:20 | |
fungi | paladox: yup, but it's only ~15km offshore | 15:20 |
fungi | (barrier island separating an inland brackish body of water from the atlantic) | 15:21 |
*** odicha has quit IRC | 15:21 | |
donnyd | isle of mann fungi ? | 15:22 |
*** Goneri has quit IRC | 15:23 | |
fungi | heh, wrong side of the atlantic | 15:23 |
donnyd | lol | 15:23 |
fungi | bodie (pronounced "body") island, part of the northern stretch of the outer banks of north carolina | 15:23 |
paladox | lol | 15:24 |
donnyd | i was going to keep guessing :) | 15:24 |
donnyd | and now you have taken the fun out of my day | 15:26 |
* paladox will be near the isle of man in 2 weeks | 15:26 | |
paladox | i won't be on the isle, but near it :) | 15:26 |
donnyd | Next year I am going to try to make the TT... greatest show on earth | 15:27 |
*** gyee has joined #openstack-infra | 15:27 | |
paladox | TT? | 15:29 |
knikolla | mordred sir, just got word that app creds should be enabled now on the moc. | 15:30 |
mordred | knikolla: woot! | 15:32 |
mordred | knikolla: I can confirm that this works! | 15:33 |
* knikolla makes a happy dance. | 15:33 | |
*** jpena is now known as jpena|off | 15:36 | |
*** ykarel|away has quit IRC | 15:36 | |
*** e0ne has quit IRC | 15:37 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add clouds.yaml entry for MOC control plane project https://review.opendev.org/671463 | 15:38 |
mordred | infra-root: that ^^ is ready for review | 15:38 |
donnyd | It's a motorcycle race @paladox | 15:38 |
paladox | ah ok | 15:39 |
*** sthussey has joined #openstack-infra | 15:39 | |
clarkb | donnyd: I share frickler's concern. Basically we want to avoid having code merge that depends on that increase in memory. If we want to set up a new label temporarily that fn services to test if it makes a difference that would be fine | 15:39 |
clarkb | donnyd: the basic process there is updating that nl02.openstack.org.yaml file to have another ubuntu-bionic image + flavor + label combo. Then we can push a change to tempest that uses a different nodeset to run its jobs on that label | 15:40 |
*** tdasilva has quit IRC | 15:43 | |
*** ccamacho has quit IRC | 15:43 | |
donnyd | I am happy to do whatever infra thinks is a good path forward. Just so I understand what you are saying, if we bump the memory up: a job will succeed and code will be merged based on that. When the job runs again somewhere else it won't succeed the next time because it will get scheduled somewhere that does not have that setup. Trying to understand the why clarkb frickler | 15:44 |
clarkb | mordred: is there just the one account? I notice the all_clouds file only adds one | 15:45 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Create bindep_virtualenv_python for bindep role https://review.opendev.org/658439 | 15:46 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add test-bindep job https://review.opendev.org/674078 | 15:46 |
*** jamesmcarthur has quit IRC | 15:46 | |
*** ociuhandu has quit IRC | 15:47 | |
openstackgerrit | Luigi Toscano proposed zuul/zuul-jobs master: fetch-subunit-output: collect additional subunit files https://review.opendev.org/673885 | 15:49 |
*** priteau has joined #openstack-infra | 15:50 | |
*** whoami-rajat has quit IRC | 15:51 | |
*** jtomasek has joined #openstack-infra | 15:51 | |
*** dpawlik has joined #openstack-infra | 15:52 | |
*** lucasagomes has quit IRC | 15:54 | |
clarkb | donnyd: yup that is the concern. Not necessarily that it will happen but that it could happen | 15:54 |
mordred | clarkb: yes - so far - I haven't requested the second project yet because I wanted to get the first one working with app creds first | 15:54 |
clarkb | gotcha | 15:55 |
donnyd | ok, now i see what clarkb and frickler are saying and it makes sense to me | 15:55 |
*** ginopc has quit IRC | 15:55 | |
*** ginopc has joined #openstack-infra | 15:55 | |
*** armax has joined #openstack-infra | 15:58 | |
*** ginopc has quit IRC | 15:59 | |
*** dpawlik has quit IRC | 15:59 | |
*** sgw has joined #openstack-infra | 16:00 | |
*** pgaxatte has quit IRC | 16:02 | |
*** michael-beaver has quit IRC | 16:04 | |
*** jpena|off is now known as jpena | 16:05 | |
openstackgerrit | Merged zuul/zuul-jobs master: fetch-subunit-output: collect additional subunit files https://review.opendev.org/673885 | 16:07 |
*** jpena is now known as jpena|off | 16:07 | |
*** igordc has joined #openstack-infra | 16:08 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint https://review.opendev.org/641099 | 16:19 |
openstackgerrit | James E. Blair proposed zuul/zuul master: authentication config: add optional token_expiry https://review.opendev.org/642408 | 16:19 |
fungi | donnyd: also i don't know that we really need experimental data on that. we already have data which indicates that those jobs used to work in 8gb of ram and are now using increasing amounts of swap space | 16:20 |
fungi | i mean, increasing ram to find out if they still time out might confirm for us that it's the swapping which is slowing them down, but we need them to stop swapping heavily regardless | 16:20 |
openstackgerrit | Merged openstack/reviewstats master: Mailing lists change openstack-dev to openstack-discuss https://review.opendev.org/668446 | 16:23 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: Sketch of a registry test which uses swift https://review.opendev.org/653797 | 16:24 |
*** lbragstad has quit IRC | 16:29 | |
cloudnull | hey all , we've been seeing the following error with the docker registry reverse proxy on OVH - https://logs.opendev.org/20/673920/4/check/tripleo-ci-centos-7-scenario004-standalone/59f618e/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz?level=ERROR | 16:29 |
cloudnull | when I curl http://mirror.bhs1.ovh.openstack.org:8082/v2/tripleomaster/centos-binary-nova-compute/blobs/sha256:2efded40b28a63edb701aef3f646be560c1d938199334b173f496db2ca7285b1 - I get the same 401 | 16:29 |
cloudnull | so i' | 16:29 |
clarkb | cloudnull: what happens if you request the same object from the backend? | 16:30 |
cloudnull | So i'm wondering if this is something you've seen or may have some insight on ? | 16:30 |
clarkb | cloudnull: but this is likely the thing that sshnaidm brought up a little while ago | 16:30 |
clarkb | cloudnull: docker requires auth tokens even if doing anonymous access | 16:30 |
clarkb | tripleo uses a python script that doesn't get a token (or maybe the token is expiring?) | 16:31 |
cloudnull | ah ha ! | 16:31 |
clarkb | you basically request an anonymous token that they can use to track the session/requestor without authenticating | 16:31 |
cloudnull | interesting. I will look into that. | 16:31 |
* cloudnull TIL | 16:31 | |
clarkb | cloudnull: let me see if I can find the docs on that | 16:31 |
clarkb | I had them found when sshnaidm was looking at it | 16:32 |
clarkb | cloudnull: https://docs.docker.com/registry/spec/auth/token/ | 16:32 |
clarkb | I was able to reproduce the 401's locally with curl at the time not authing. Then when I auth'd it worked | 16:32 |
cloudnull | ok, cool . that should be an easy enough fix | 16:33 |
cloudnull | thanks cloudnull | 16:33 |
cloudnull | bah... | 16:33 |
cloudnull | clarkb | 16:33 |
fungi | there was an example where we already do that in another script/job, right? | 16:34 |
fungi | i just don't recall which one now | 16:34 |
corvus | these roles: https://zuul-ci.org/docs/zuul-jobs/container-roles.html | 16:34 |
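
A minimal sketch of the anonymous-token flow clarkb describes, following the token spec linked above: the registry answers an unauthenticated request with a 401 and a `WWW-Authenticate: Bearer ...` challenge, and the client asks the advertised realm for a token before retrying. The helper names `parse_bearer_challenge` and `token_url` are hypothetical, and nothing here is taken from the tripleo script or the zuul-jobs roles.

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header):
    """Parse a 'Bearer key="value",...' WWW-Authenticate value into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError("not a Bearer challenge: %r" % header)
    # Each parameter is key="value"; quoted values may contain commas.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge):
    """Build the URL to GET an (anonymous) token from."""
    # service and scope are passed through when the challenge provides them.
    query = {k: challenge[k] for k in ("service", "scope") if k in challenge}
    return challenge["realm"] + "?" + urlencode(query)
```

A caller would GET that URL, read the `token` field of the JSON response, and retry the blob request with an `Authorization: Bearer <token>` header. Repeating the same steps whenever a 401 comes back would also cover the case where a token expires partway through a run.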
*** dtantsur is now known as dtantsur|afk | 16:38 | |
*** rpittau is now known as rpittau|afk | 16:40 | |
*** witek has quit IRC | 16:41 | |
clarkb | corvus: ttx and jroll have asked that we add them to the openstack org as admins so they can begin cleaning it up. They have promised to be careful. But I remember you were working with jroll before and wanted to make sure you didn't have some other plan in mind | 16:44 |
clarkb | if not I'll go ahead and add them | 16:44 |
*** sshnaidm is now known as sshnaidm|afk | 16:46 | |
corvus | clarkb: no other plan; sgtm | 16:47 |
mordred | clarkb: also - we should point jroll and ttx at my crappy scripts from yesterday | 16:48 |
jroll | mordred: heh, I've already written another crappy script | 16:48 |
mordred | awesome! | 16:48 |
jroll | but more crappy code is always welcome | 16:48 |
mordred | jroll: well, https://review.opendev.org/#/c/673831/ is what I used to retire and archive the stuff in openstack-infra that had moved to opendev/ | 16:48 |
corvus | mostly i wanted to highlight how very dangerous it is to use the same github account for regular github work and also give it admin perms in the openstack org | 16:49 |
mordred | jroll: if it's useful to you in any way, awesome. if not - also awesome :) | 16:49 |
jroll | cool, thanks mordred | 16:49 |
corvus | i lost track of that thread and am unsure if a solution to that was ever arrived at | 16:49 |
corvus | ie, it should really either be a dedicated account, or an account owned by someone who doesn't actually use github. | 16:51 |
jroll | corvus: we plan to use our accounts for now (as neither of us use/clone from/etc github for openstack work), and move to a shared account if we feel that's needed to add more people to help | 16:51 |
donnyd | clarkb: Do you want me to setup a custom resource for jobs known to fail on FN? | 16:51 |
corvus | jroll: sounds reasonable | 16:51 |
jroll | see also https://etherpad.openstack.org/p/openstack-repos-on-github | 16:51 |
clarkb | donnyd: up to you if you want to proceed with testing that further. I'm fairly happy with the results we've got so far and have identified deficiencies in the software that should make it go quicker | 16:52 |
clarkb | donnyd: I can assist with that if you are interested | 16:52 |
donnyd | I should have been more specific, this job we are trying to get to fail correct? https://review.opendev.org/#/c/673923/ | 16:53 |
clarkb | donnyd: ya if we recheck it a few times hopefully one of them runs on fn and then fails and we get logs | 16:53 |
donnyd | OH ic | 16:53 |
clarkb | jroll: invite sent | 16:54 |
clarkb | jroll: you can check your email or go to https://github.com/openstack to accept says github's banner | 16:54 |
clarkb | ttx: ^ you too | 16:55 |
jroll | clarkb: got it, thanks | 16:55 |
donnyd | I wouldn't mind a pointer on what to change to setup a custom resource. Do I just give it a different name, say like ubuntu-bionic-testonly | 16:55 |
clarkb | donnyd: ya pretty much. We have an example with the vexxhost gpu flavor type I can get links for. one sec | 16:56 |
cloudnull | so when I start looking at our test results for that 401 error it looks like all of our failures for the last week have been in OVH. | 16:56 |
cloudnull | http://logstash.openstack.org/#/dashboard/file/logstash.json?query=(message:%20%5C%22401%20Client%20Error:%20Unauthorized%5C%22)%20AND%20tags:%20%5C%22console%5C%22%20AND%20voting:1%20AND%20(project:%20*tripleo*) | 16:56 |
clarkb | donnyd: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L48-L53 and https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L237-L251 are the config you need | 16:57 |
cloudnull | 116 instances of that 401 error over the last 7 days | 16:57 |
clarkb | cloudnull: yes but the error is reproducible from anywhere including my desktop | 16:57 |
cloudnull | but seems odd that we're not hitting it on the other node providers ? | 16:58 |
clarkb | cloudnull: that was why I theorized maybe the token is expiring | 16:58 |
cloudnull | hum . I will continue digging . | 16:58 |
clarkb | could be that the script is requesting a token but that token expires before all requests are complete possibly due to transatlantic data transit | 16:58 |
cloudnull | ++ | 16:58 |
cloudnull | that very well could be | 16:58 |
clarkb | cloudnull: the dockerhub stuff is just a proxy so if we are not hitting cached data it could theoretically be slow to pull data down to france | 16:59 |
clarkb | (maybe, it's one thing to look into at least) | 16:59 |
cloudnull | I will see if I can make our tools reauth on 401 | 16:59 |
clarkb | cloudnull: that sounds like a great idea | 17:00 |
fungi | yeah, my theory to the intermittent nature of it was that mostly jobs are hitting cached copies but when there's a cache miss only jobs which are getting tokens manage to re-warm the cache | 17:00 |
* mordred hands cloudnull +2 Chain Mail of Awesomeness | 17:00 | |
clarkb | fungi: cloudnull oh right the other theory was that docker itself may be pulling down those images which caches them. Then our proxy doesn't care about the auth token | 17:00 |
cloudnull | fungi ++ | 17:00 |
fungi | mordred: ooh, is it mithril? | 17:00 |
clarkb | fungi: cloudnull so depending on job ordering and mirror expiry demands (we have 24 hour refresh but also expire due to disk space) we could see it in one cloud more frequently | 17:01 |
mordred | fungi: yes, and also covered in feathers | 17:01 |
*** derekh has quit IRC | 17:01 | |
fungi | righteous | 17:01 |
* cloudnull tarred and feathered | 17:01 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Don't compare to literal True/False https://review.opendev.org/667697 | 17:01 |
openstackgerrit | Donny Davis proposed openstack/project-config master: Adding another pool to FN for expanded memory test https://review.opendev.org/674091 | 17:02 |
fungi | one reason ovh is hitting it more often could be that we're running a larger number of builds there relative to the single mirror in each region, so caching a diversity of things more quickly and expiring some images out of the cache faster as a result of pressure for the cache space | 17:02 |
*** igordc has quit IRC | 17:03 | |
cloudnull | that does make some sense | 17:03 |
fungi | though it also could simply be that we're failing to cache (a subset of?) images in ovh for some reason | 17:03 |
fungi | or registering cache misses when we shouldn't | 17:04 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Be consistent about spaces before and after vars https://review.opendev.org/667698 | 17:04 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support https://review.opendev.org/674092 | 17:04 |
*** priteau has quit IRC | 17:05 | |
*** smrcascao has quit IRC | 17:05 | |
*** electrofelix has quit IRC | 17:06 | |
donnyd | clarkb: I think i got it | 17:06 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support https://review.opendev.org/674092 | 17:07 |
clarkb | donnyd: couple of comments inline | 17:10 |
*** bobh has joined #openstack-infra | 17:11 | |
*** bobh has quit IRC | 17:11 | |
*** udesale has quit IRC | 17:14 | |
*** whoami-rajat has joined #openstack-infra | 17:15 | |
*** ramishra has quit IRC | 17:18 | |
*** ralonsoh has quit IRC | 17:20 | |
openstackgerrit | Donny Davis proposed openstack/project-config master: Adding another pool to FN for expanded memory test https://review.opendev.org/674091 | 17:20 |
donnyd | oh i messed that one up | 17:21 |
*** kopecmartin is now known as kopecmartin|off | 17:24 | |
donnyd | Ok, so maybe my browser was just caching something... weird. | 17:27 |
*** igordc has joined #openstack-infra | 17:27 | |
*** jcoufal_ has joined #openstack-infra | 17:30 | |
*** ricolin__ is now known as ricolin | 17:30 | |
*** igordc has quit IRC | 17:30 | |
*** jcoufal has quit IRC | 17:32 | |
openstackgerrit | Merged zuul/zuul master: Try out reporting the build page https://review.opendev.org/673863 | 17:32 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Change operator namespace to zuul-ci.org https://review.opendev.org/674100 | 17:37 |
clarkb | donnyd: one more small thing but otherwise that looks about ready | 17:41 |
*** slaweq has quit IRC | 17:42 | |
*** igordc has joined #openstack-infra | 17:42 | |
*** ociuhandu has joined #openstack-infra | 17:44 | |
openstackgerrit | Donny Davis proposed openstack/project-config master: Adding another pool to FN for expanded memory test https://review.opendev.org/674091 | 17:46 |
clarkb | donnyd: that lgtm | 17:48 |
*** ociuhandu has quit IRC | 17:48 | |
*** betherly has joined #openstack-infra | 17:49 | |
donnyd | only took me three tries | 17:51 |
*** betherly has quit IRC | 17:54 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Revert "fetch-subunit-output: collect additional subunit files" https://review.opendev.org/674102 | 17:55 |
AJaeger | tosky, corvus , I see failures and propose a revert ^ | 17:56 |
tosky | AJaeger: sure, failures where? | 17:56 |
AJaeger | example is https://logs.opendev.org/01/673401/2/check/openstacksdk-tox-py36-tips/0e0bed9/job-output.txt.gz#_2019-08-01_17_04_29_549629 | 17:56 |
tosky | otherwise I can't fix it | 17:56 |
*** yamamoto has joined #openstack-infra | 17:57 | |
tosky | AJaeger: there is an inconsistency in the way zuul_work_dir is defined, it seems | 17:57 |
*** slaweq has joined #openstack-infra | 17:57 | |
AJaeger | tosky: lots of "post_failures" on openstacksdk jobs, have a look at zuul.opendev.org | 17:57 |
AJaeger | tosky: here as well: https://logs.opendev.org/87/673987/1/gate/cross-cinder-py27/9b81c84/job-output.txt.gz#_2019-08-01_17_55_02_291391 | 17:58 |
tosky | AJaeger: I can't fix them there, but I think there is an inconsistency in the usage of relative paths and full paths | 17:58 |
tosky | so sure, revert, but the issue is somewhere else | 17:58 |
AJaeger | infra-root, should we revert - and then fix? | 17:59 |
corvus | we should certainly revert | 17:59 |
corvus | whether the fix is to correct all instances of zuul_work_dir, or rather to make the consuming role more forgiving is an open question | 17:59 |
AJaeger | thanks, clarkb | 17:59 |
AJaeger | tosky: then let's work on figuring out the best way forward... | 18:00 |
tosky | tomorrow | 18:01 |
*** tosky has quit IRC | 18:01 | |
*** gfidente has quit IRC | 18:03 | |
*** yamamoto has quit IRC | 18:04 | |
fungi | i think, given that it's a change in zuul-jobs, we probably need to make the role more forgiving (or do a lot of communication to downstream consumers in advance of the behavior change) | 18:04 |
fungi | because "we" can't fix all uses of it, given we are probably not even aware of some of them and have no visibility into them | 18:05 |
*** witek has joined #openstack-infra | 18:10 | |
openstackgerrit | Merged zuul/zuul-jobs master: Revert "fetch-subunit-output: collect additional subunit files" https://review.opendev.org/674102 | 18:12 |
openstackgerrit | Merged openstack/project-config master: Adding another pool to FN for expanded memory test https://review.opendev.org/674091 | 18:13 |
AJaeger | we have a couple of places that use a relative zuul_work_dir, so those jobs will fail. I agree, we need to make it more robust | 18:13 |
clarkb | we could use a tmpdir maybe? | 18:14 |
clarkb | or possibly even append on the executor before upload to the log server? | 18:14 |
*** jcoufal_ has quit IRC | 18:14 | |
AJaeger | looks like 32 or so repos use that in master - so if we require absolute paths, we have to backport them. http://codesearch.openstack.org/?q=zuul_work_dir%3A%20src&i=nope&files=&repos= | 18:15 |
AJaeger | 32 lines - not repos | 18:15 |
*** diablo_rojo has joined #openstack-infra | 18:17 | |
openstackgerrit | Matt McEuen proposed openstack/project-config master: New project request: airship/kubernetes-entrypoint https://review.opendev.org/673900 | 18:18 |
*** jamesmcarthur has joined #openstack-infra | 18:23 | |
*** betherly has joined #openstack-infra | 18:30 | |
*** eharney has quit IRC | 18:32 | |
*** diablo_rojo has quit IRC | 18:32 | |
*** jcoufal has joined #openstack-infra | 18:34 | |
*** betherly has quit IRC | 18:35 | |
*** eharney has joined #openstack-infra | 18:36 | |
fungi | AJaeger: and that's just in opendev, to say nothing of other possible users outside opendev | 18:37 |
*** diablo_rojo has joined #openstack-infra | 18:42 | |
*** diga has quit IRC | 18:44 | |
AJaeger | fungi: yeah. As tosky pointed out, the role itself says it needs to be an absolute path - but we have other places where we do not require that ;( So, I would be fine making that change *after* everything is fixed and an announcement went out since the risk of breakage is high... | 18:44 |
* AJaeger is surprised to see zuul_work_dir: "{{ zuul.project.src_dir }}" and in other files: zuul_work_dir: "{{ ansible_user_dir }}/{{ zuul.project.src_dir }}" | 18:45 | |
AJaeger | Which one should we use? | 18:45 |
AJaeger | and then zuul_work_dir: "src/{{ zuul.project.canonical_name }}" | 18:46 |
AJaeger | even openstack-zuul-jobs has one occurrence of "zuul_work_dir: src/opendev.org/zuul/zuul" | 18:46 |
AJaeger | and I see zuul_work_dir: "{{ zuul.executor.work_root }}/{{ zuul.project.src_dir }}" | 18:47 |
*** jcoufal has quit IRC | 18:49 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Use absolute zuul_work_dir https://review.opendev.org/674108 | 18:52 |
AJaeger | fungi: that is one place to update ^ | 18:52 |
*** betherly has joined #openstack-infra | 18:58 | |
AJaeger | argh, that is not an absolute path either ;( | 19:01 |
AJaeger | so, even more relative zuul_work_dirs since I ignored that pattern | 19:02 |
*** bhavikdbavishi has quit IRC | 19:02 | |
AJaeger | zuul.project.src_dir is relative | 19:03 |
*** betherly has quit IRC | 19:03 | |
* AJaeger waves good night | 19:03 | |
fungi | have a good night AJaeger! | 19:04 |
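
One way to picture the "more forgiving" option discussed above is a helper that accepts either form of `zuul_work_dir` and always hands back an absolute path. This is only a sketch of the logic in Python; the real change would live in the role's Ansible/Jinja, and both the function name `absolute_work_dir` and the choice to resolve relative values against the remote user's home directory are assumptions, not something the log settles.

```python
import posixpath


def absolute_work_dir(zuul_work_dir, ansible_user_dir):
    """Return zuul_work_dir as an absolute, normalized path.

    Assumption of this sketch: a relative value is relative to the
    remote user's home directory (ansible_user_dir).
    """
    if posixpath.isabs(zuul_work_dir):
        # Already absolute: only tidy up redundant separators.
        return posixpath.normpath(zuul_work_dir)
    return posixpath.normpath(posixpath.join(ansible_user_dir, zuul_work_dir))
```

With this, `src/opendev.org/zuul/zuul` and `/home/zuul/src/opendev.org/zuul/zuul` resolve to the same directory when the home directory is `/home/zuul`, so jobs using either style would keep working.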
*** bhavikdbavishi has joined #openstack-infra | 19:04 | |
*** jtomasek has quit IRC | 19:05 | |
*** michael-beaver has joined #openstack-infra | 19:22 | |
*** tesseract has quit IRC | 19:23 | |
*** eernst has joined #openstack-infra | 19:25 | |
*** dpawlik has joined #openstack-infra | 19:32 | |
clarkb | anyone know how to link to multiple cacti graphs? | 19:35 |
clarkb | I can't seem to figure it out | 19:35 |
clarkb | I can change the graphs in my browser but they don't seem to have links | 19:35 |
clarkb | anyways gitea08 OOM'd again today | 19:36 |
clarkb | no other hosts have OOM'd since the replacements | 19:36 |
clarkb | cacti shows large number of connections and high cpu, memory, swap usage | 19:36 |
clarkb | I'm going to trigger gerrit replication against gitea08 now just to be sure we don't have missing stuff there | 19:36 |
clarkb | but then I guess I'll need to look at logs to see what was going on? | 19:37 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint https://review.opendev.org/641099 | 19:39 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry https://review.opendev.org/642408 | 19:39 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: [WIP] admin REST API: zuul-web integration https://review.opendev.org/643536 | 19:39 |
*** bhavikdbavishi has quit IRC | 19:40 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Use a requests session to simplify auth'd calls https://review.opendev.org/670511 | 19:40 |
clarkb | looks like gitea actually crashed or was killed by oomkiller according to the logs. Not sure which yet | 19:41 |
clarkb | but basically goroutines complain about lack of memory and then there are logs of it starting again | 19:41 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: add-build-sshkey: add centos/rhel-8 support https://review.opendev.org/674092 | 19:41 |
fungi | clarkb: so the lack of screaming means our haproxy health checks dtrt and took it out of the pool at least? | 19:42 |
clarkb | fungi: ya I'm assuming so | 19:43 |
clarkb | it should be back in the pool now | 19:43 |
fungi | awesome | 19:43 |
fungi | thanks | 19:43 |
fungi | you couldn't tell from dmesg whether the oom killer got it? | 19:43 |
clarkb | oh I didn't readd it, I think haproxy checks should've passed and stuck it back into the rotation | 19:43 |
clarkb | fungi: I could if I want to read the logs thoroughly | 19:43 |
fungi | no worries | 19:44 |
fungi | i'm taking a quick look to see if i can tell | 19:44 |
clarkb | I'm trying to figure out what triggered this now | 19:44 |
clarkb | and I'm remembering it is difficult to read the logs for that (that proxy protocol thing might be a good idea) | 19:44 |
fungi | [Thu Aug 1 11:36:15 2019] Out of memory: Kill process 14505 (gitea) score 949 or sacrifice child | 19:46 |
fungi | [Thu Aug 1 11:36:15 2019] Killed process 24769 (git) total-vm:686700kB, anon-rss:23304kB, file-rss:0kB, shmem-rss:0kB | 19:47 |
fungi | so gitea needed memory, git was ultimately sacrificed | 19:47 |
clarkb | fungi: oomkiller was invoked a bunch though | 19:47 |
clarkb | the gitea crash seems to be closer to 11:47 | 19:47 |
fungi | oh | 19:48 |
fungi | yep | 19:48 |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: builder: Remove recency table logging https://review.opendev.org/674124 | 19:48 |
fungi | all the "Killed process" lines are git processes, except for one pandoc | 19:49 |
fungi | suggesting that gitea was not killed directly via oom-killer but crashed or was terminated for some other reason | 19:49 |
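
The check fungi did by eye (which processes the OOM killer actually killed) can be sketched as a small tally over dmesg-style lines. The function name `count_oom_victims` is hypothetical, and the regular expression only assumes the "Killed process PID (name)" wording quoted above.

```python
import re
from collections import Counter

# Matches the kernel's "Killed process <pid> (<name>)" message.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def count_oom_victims(lines):
    """Count how many times each process name was killed by the OOM killer."""
    counts = Counter()
    for line in lines:
        match = KILLED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Lines of the "Out of memory: Kill process ..." form are deliberately not counted, since they name the candidate the kernel scored rather than the process that was finally killed.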
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: add-build-sshkey: add centos/rhel-8 support https://review.opendev.org/674092 | 19:55 |
clarkb | k | 19:55 |
clarkb | somehow emc, novell, ibm, and rackspace IPs all land on this one backend | 19:56 |
clarkb | I suppose that is deterministic hashing for the how | 19:56 |
clarkb | in the ~4 hour time period surrounding that increase in resource usage emc is the biggest number of connections | 19:57 |
clarkb | followed closely by novell | 19:57 |
clarkb | then rackspace half as many as them and ibm half as many as rax | 19:57 |
*** jtomasek has joined #openstack-infra | 19:57 | |
clarkb | (the rax IP is not ours according to our inventory) | 19:57 |
*** jcoufal has joined #openstack-infra | 19:57 | |
fungi | any signs of similar spikes on other backends in the same timeframe? | 19:58 |
clarkb | my grep | sed | uniq -c is not sophisticated enough for that :) | 19:59 |
clarkb | though now that you mention it maybe we want a python script that can make bar graphs or something | 19:59 |
*** betherly has joined #openstack-infra | 20:00 | |
clarkb | I need to eat lunch so will have to take a break | 20:00 |
clarkb | my homedir on gitea08 has some files I've trimmed down from a copy of syslog | 20:00 |
clarkb | `cut -d' ' -f 7 gitea08_curated_haproxy_logs | sed -ne 's/\(:[0-9]\+\)$//p' | sort | uniq -c | sort -n -r | head` will give you numbers | 20:00 |
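
A rough Python equivalent of that pipeline, as a sketch: take the seventh space-separated field of each line, strip a trailing `:port`, and count occurrences per address. The name `top_clients` is hypothetical and the field position is simply copied from the command above.

```python
import re
from collections import Counter

# Trailing ":<digits>" is the client port, as in the sed expression.
PORT_SUFFIX = re.compile(r":[0-9]+$")


def top_clients(lines, field=7, limit=10):
    """Return (address, count) pairs for the most frequent clients."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")  # single-space split, like cut -d' '
        if len(parts) < field:
            continue
        addr, stripped = PORT_SUFFIX.subn("", parts[field - 1])
        if stripped:  # like sed -n ... p: keep only fields that ended in a port
            counts[addr] += 1
    return counts.most_common(limit)
```

Running the same function over the logs of each backend would give comparable per-address counts, which is one way to look for similar spikes elsewhere.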
*** eharney has quit IRC | 20:03 | |
*** betherly has quit IRC | 20:04 | |
*** nhicher has quit IRC | 20:04 | |
clarkb | the emc IP shows up with multiple connections to gerrit with about 5 different accounts so likely a NAT addr | 20:05 |
*** jtomasek has quit IRC | 20:06 | |
clarkb | all of the accounts talking to gerrit from that ip are dell emc third party ci accounts | 20:07 |
clarkb | I don't know yet that they are at fault, but they are making a ton of requests and wouldn't surprise me if there is a relationship there | 20:08 |
clarkb | ok eating lunch now | 20:08 |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: builder: Log all deletions of image upload records https://review.opendev.org/674126 | 20:10 |
*** tdasilva has joined #openstack-infra | 20:14 | |
*** jamesmcarthur has quit IRC | 20:16 | |
*** tosky has joined #openstack-infra | 20:17 | |
openstackgerrit | Merged zuul/nodepool master: Avoid openstacksdk image delete bug https://review.opendev.org/674043 | 20:18 |
*** jcoufal has quit IRC | 20:22 | |
fungi | there is a great python library which will do "ascii" (more unicode) line/bar graphs, but i'm on the wrong computer to find that tab i had up | 20:34 |
* fungi is only half here at the moment, between breaks pushing around a mower | 20:34 | |
*** diablo_rojo has quit IRC | 20:35 | |
smcginnis | fungi: This? https://github.com/lord63/ascii_art | 20:36 |
*** e0ne has joined #openstack-infra | 20:40 | |
fungi | nope, but that looks neat | 20:47 |
*** witek has quit IRC | 20:54 | |
*** betherly has joined #openstack-infra | 20:59 | |
*** smrcascao has joined #openstack-infra | 21:02 | |
*** betherly has quit IRC | 21:03 | |
*** witek has joined #openstack-infra | 21:05 | |
*** cloudnull has quit IRC | 21:06 | |
*** cloudnull has joined #openstack-infra | 21:07 | |
*** Lucas_Gray has joined #openstack-infra | 21:13 | |
*** betherly has joined #openstack-infra | 21:19 | |
*** betherly has quit IRC | 21:24 | |
*** witek has quit IRC | 21:25 | |
clarkb | `goaccess` can apparently be convinced to do things with haproxy via a log format string | 21:26 |
clarkb | so I'm about to fiddle with that | 21:26 |
ianw | frickler: ahh, great point on removing the numbers from the test runs from the timing results! | 21:30 |
ianw | lies, damned lies and statistics :) | 21:30 |
ianw | clarkb: did you see https://review.opendev.org/#/c/673724/ & https://review.opendev.org/#/c/673739/1 from the other day -- your intuition on the recommends install is i believe correct | 21:37 |
*** eernst has quit IRC | 21:38 | |
*** ekultails has quit IRC | 21:38 | |
clarkb | ianw: I did not | 21:40 |
clarkb | sorry I've been all over the place the last week or so | 21:40 |
ianw | no worries :) i think it does solve the mystery of why it works in the gate but not on new servers, though | 21:46 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Support Rackspace in upload-logs-swift https://review.opendev.org/674136 | 21:47 |
*** betherly has joined #openstack-infra | 21:50 | |
*** mriedem has quit IRC | 21:53 | |
clarkb | ugh goaccess has trouble parsing haproxy logs because haproxy writes ipv6 addresses without []s | 21:53 |
clarkb | but even then it seems to not do greedy matching so it fails | 21:53 |
*** betherly has quit IRC | 21:55 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Support Rackspace in upload-logs-swift https://review.opendev.org/674136 | 21:59 |
*** tosky has quit IRC | 22:07 | |
*** slaweq has quit IRC | 22:10 | |
*** betherly has joined #openstack-infra | 22:11 | |
*** slaweq has joined #openstack-infra | 22:11 | |
clarkb | ianw: I think https://review.opendev.org/#/c/673739/1 is inverted. The test nodes already don't install recommends but production images to | 22:12 |
clarkb | s/images to/images do/ | 22:12 |
*** betherly has quit IRC | 22:15 | |
*** slaweq has quit IRC | 22:16 | |
clarkb | ok I finally figured out how to get goaccess to read the logs (I ended up wrapping all ip addrs in []s via sed, then I could update the format for goaccess appropriately) | 22:19 |
clarkb | Gives some interesting data | 22:19 |
clarkb | the emc ip I identified doesn't do anywhere near the bulk of the data transfer but it does do the third most cumulative connection time | 22:20 |
clarkb | so whatever they are doing is expensive? | 22:20 |
clarkb | Now I need to trim my logs a bit so that I'm only looking at the time period I care about (goaccess dashboard doesn't make that configurable) | 22:20 |
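
The bracket-wrapping step can be sketched in Python as well. clarkb did it with sed and the exact expression is not in the log, so this is an illustration of the idea rather than his command: split on the last colon so that IPv6 addresses, which contain colons themselves, keep their port separate. The name `bracket_client` is hypothetical.

```python
def bracket_client(addr_port):
    """Turn 'addr:port' into '[addr]:port' for both IPv4 and IPv6 clients."""
    # The port is whatever follows the last colon.
    addr, sep, port = addr_port.rpartition(":")
    if not sep or not port.isdigit():
        return addr_port  # not an addr:port field, leave it untouched
    return "[%s]:%s" % (addr, port)
```

Applied to the client field of every log line, this gives a parser an unambiguous delimiter around the address regardless of address family.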
ianw | clarkb: that's right, so that's why in base-server i've added the don't install flags so it applies on our control-plane/production servers? | 22:22 |
*** diablo_rojo has joined #openstack-infra | 22:22 | |
ianw | so they act the same as the testing nodes (and, if/when we get nodepool uploads working, the dib control plane nodes) | 22:22 |
clarkb | oh for some reason I read that as a test playbook/role | 22:23 |
clarkb | any concern with things breaking because we'll stop installing all the packages we depend on if we do that? | 22:23 |
ianw | clarkb: i don't think so ... i mean we would presumably only notice when rolling out new servers, and new servers should have pretty good gate coverage | 22:24 |
clarkb | I guess unattended upgrades will continue to update installed packages for the older stuff so that is fine | 22:26 |
ianw | and even all the old puppet testing has been running in the gate on nodes with it disabled via dib | 22:26 |
ianw | i think it is definitely an openafs packaging bug that it seems to start the client and systemd seems to think it started although the modules aren't built | 22:27 |
ianw | but still, i think we're also better being more homogeneous between testing and production anyway | 22:28 |
*** smrcascao has quit IRC | 22:28 | |
*** jamespage_ has joined #openstack-infra | 22:29 | |
*** dustinc_ has joined #openstack-infra | 22:29 | |
*** sdoran_ has joined #openstack-infra | 22:29 | |
*** mnasiadka_ has joined #openstack-infra | 22:30 | |
*** jrosser_ has joined #openstack-infra | 22:30 | |
*** kmalloc_ has joined #openstack-infra | 22:30 | |
*** dtantsur has joined #openstack-infra | 22:30 | |
clarkb | an engineering college in india did the largest amount of data transfer during that ~4 hour window with the largest cumulative time spent against gitea08 | 22:31 |
*** betherly has joined #openstack-infra | 22:31 | |
clarkb | ~5.5x more data transferred than emc | 22:31 |
clarkb | and about .25 of a day more time spent talking to the gitea08 backend | 22:32 |
clarkb | did class start and everyone was told to clone nova at the same time or something? | 22:32 |
clarkb | their connections do begin in earnest around that 8:40UTC timeperiod | 22:35 |
clarkb | I'm going to try and map that onto gitea connection logs now | 22:35 |
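The per-client comparison described above (total data transferred and cumulative connection time per client) can be sketched roughly as follows. This is only an illustration, not the analysis clarkb actually ran: the record shape `(client, bytes_sent, seconds)` and the function name `summarize_clients` are hypothetical, and real load balancer or gitea logs would first need to be parsed into that form.

```python
from collections import defaultdict


def summarize_clients(records):
    """Sum bytes and connection time per client.

    `records` is an iterable of (client, bytes_sent, seconds) tuples.
    This shape is an assumption for illustration, not a real log format.
    Returns a list of (client, total_bytes, total_seconds), largest
    total_bytes first.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for client, bytes_sent, seconds in records:
        totals[client][0] += bytes_sent
        totals[client][1] += seconds
    rows = [(client, b, s) for client, (b, s) in totals.items()]
    # Sort so the heaviest transferrers appear at the top.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    sample = [
        ("client-a", 500, 2.0),
        ("client-b", 100, 1.0),
        ("client-a", 50, 0.5),
    ]
    for client, total_bytes, total_seconds in summarize_clients(sample):
        print(client, total_bytes, total_seconds)
```

Sorting by total bytes puts the heaviest clients first; sorting by the third field instead would rank them by cumulative time spent against the backend.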
*** jpenag has joined #openstack-infra | 22:36 | |
*** betherly has quit IRC | 22:36 | |
*** jpena|off has quit IRC | 22:37 | |
*** dtantsur|afk has quit IRC | 22:37 | |
*** mordred has quit IRC | 22:37 | |
*** mnasiadka has quit IRC | 22:37 | |
*** jamespage has quit IRC | 22:37 | |
*** kmalloc has quit IRC | 22:37 | |
*** sdoran has quit IRC | 22:37 | |
*** dustinc has quit IRC | 22:37 | |
*** jrosser has quit IRC | 22:37 | |
*** mnasiadka_ is now known as mnasiadka | 22:37 | |
*** kmalloc_ is now known as kmalloc | 22:37 | |
*** jamespage_ is now known as jamespage | 22:37 | |
*** dustinc_ is now known as dustinc | 22:37 | |
*** jrosser_ is now known as jrosser | 22:37 | |
*** sdoran_ is now known as sdoran | 22:37 | |
*** panda has quit IRC | 22:41 | |
*** panda has joined #openstack-infra | 22:42 | |
*** mordred has joined #openstack-infra | 22:44 | |
*** e0ne_ has joined #openstack-infra | 22:45 | |
*** e0ne has quit IRC | 22:46 | |
*** e0ne has joined #openstack-infra | 22:47 | |
*** e0ne_ has quit IRC | 22:50 | |
*** tkajinam has joined #openstack-infra | 22:51 | |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Add swift base test job https://review.opendev.org/674143 | 22:51 |
*** e0ne has quit IRC | 22:52 | |
*** whoami-rajat has quit IRC | 22:55 | |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Upload to a swift at random https://review.opendev.org/674144 | 23:07 |
corvus | i think that ^ is going to be pretty cool :) | 23:08 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Support Rackspace in upload-logs-swift https://review.opendev.org/674136 | 23:10 |
*** igordc has quit IRC | 23:10 | |
*** ianychoi has quit IRC | 23:12 | |
*** betherly has joined #openstack-infra | 23:12 | |
clarkb | infra-root https://etherpad.openstack.org/p/debugging-gitea08-OOM | 23:13 |
clarkb | I've tried to collect my current thoughts on the problem there | 23:13 |
clarkb | corvus: re swift stuff neat. We could probably also weight the swift uploads by total storage size | 23:14 |
clarkb | so that we balance things out over time | 23:14 |
clarkb | (but random is probably close enough) | 23:14 |
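The idea of weighting uploads by total storage size, rather than picking uniformly at random, could look something like the sketch below. It is only an assumption about how such weighting might work, not what the proposed base-jobs change does: the region names, the capacity numbers, and the helper `pick_swift` are all hypothetical.

```python
import random


def pick_swift(capacities, rng=random):
    """Pick one swift target with probability proportional to capacity.

    `capacities` maps a target name to its total storage size (any
    consistent unit). Names and sizes here are hypothetical.
    """
    names = list(capacities)
    weights = [capacities[name] for name in names]
    # random.choices does weighted sampling with replacement.
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical targets: the second has three times the storage.
    targets = {"swift-small": 1, "swift-large": 3}
    print(pick_swift(targets))
```

With equal weights this reduces to the plain random choice, which is why random is probably close enough when the targets are similar in size.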
*** betherly has quit IRC | 23:18 | |
*** rcernin has joined #openstack-infra | 23:21 | |
clarkb | tl;dr is we have some users that fetch disproportionately more data per connection and spend more time per connection, and on top of that we have some very very long lived requests in gitea which I think may result in git processes living for extended periods using all the memory | 23:24 |
clarkb | I don't want to tell users to go away, which means our best option may be to have gitea timeout quicker if things are going south? | 23:24 |
clarkb | ianw: left a comment on https://review.opendev.org/#/c/673739/1 | 23:33 |
ianw | clarkb: yeah, i need to work on arm64 bionic control plane | 23:40 |
*** sthussey has quit IRC | 23:49 | |
*** aaronsheffield has quit IRC | 23:51 | |
*** betherly has joined #openstack-infra | 23:54 | |
*** trident has quit IRC | 23:54 | |
*** diablo_rojo is now known as diablo_rojo_ | 23:56 | |
*** diablo_rojo_ is now known as diablo__rojo_ | 23:56 | |
*** diablo__rojo_ is now known as diablo_rojoooooo | 23:57 | |
*** diablo_rojoooooo is now known as diablo_rojo | 23:57 | |
*** betherly has quit IRC | 23:58 | |
*** trident has joined #openstack-infra | 23:59 |