*** markvoelker has joined #openstack-infra | 00:00 | |
*** tkelsey has joined #openstack-infra | 00:00 | |
*** jimbaker has joined #openstack-infra | 00:00 | |
*** jimbaker has quit IRC | 00:00 | |
*** jimbaker has joined #openstack-infra | 00:00 | |
cloudnull | pabelanger mordred is there a way we could store the console log from a vm if it's marked error by nodepool? | 00:01 |
pabelanger | cloudnull: Ya, we don't have a way today to keep a node online with ready-script failure. Maybe jeblair has some thoughts on that | 00:01 |
cloudnull | i.e. boot; if it fails, store the console log, then delete? | 00:02 |
clarkb | now that's interesting, ubuntu does run an ntpdate on if-up | 00:02 |
clarkb | I wonder if it's failing to resolve DNS at that point due to unbound not being up? | 00:02 |
pabelanger | we could make our configure_mirror.sh script smarter | 00:02 |
*** Goneri has quit IRC | 00:03 | |
*** spzala has joined #openstack-infra | 00:03 | |
cloudnull | I guess I could enable deferred delete for a while and try to trap the log of instances. | 00:04 |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config: Further F24 kernel update https://review.openstack.org/353783 | 00:04 |
clarkb | it's tempting to just go back to ntpdate, deprecation or not; there don't seem to be any other sane tools to do this | 00:04 |
fungi | clarkb: not all of our providers (/me glares) provide nova console log access, so we haven't relied on it in nodepool previously | 00:05 |
fungi | er, cloudnull ^ | 00:05 |
fungi | sorry, clarkb | 00:05 |
* fungi failx0rz at teh tabcompletes | 00:05 | |
*** tkelsey has quit IRC | 00:05 | |
* cloudnull knows who to glare at... | 00:05 | |
mordred | cloudnull: we have the ability to hold nodes on error in nodepool - but it currently only works on job names | 00:06 |
fungi | so, yes, it's possible obviously. nodepool calls openstack apis, nodepool logs things, nodepool could call another api method and log the results | 00:06 |
fungi | it's "just" a matter of code, as they say | 00:06 |
* mordred muses about a feature to be able to grab an error node from a provider rather than a job | 00:06 | |
*** piet_ has joined #openstack-infra | 00:07 | |
*** baoli has joined #openstack-infra | 00:07 | |
cloudnull | I think for now i'll set the reclaim_instance_interval w/in the nova.conf to something like an hour or so. | 00:07 |
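An illustrative nova.conf snippet for what cloudnull describes; the value is an example rather than his exact setting:

```
[DEFAULT]
# Soft-delete instances and reclaim (really delete) them after about an hour,
# leaving a window to inspect failed nodes.
reclaim_instance_interval = 3600
```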
mordred | fungi: it would not be difficult "just code" to attempt a console log grab on node boot error | 00:08 |
fungi | agreed | 00:08 |
cloudnull | then next time we have an ssh timeout let me know and I can go look at the things. | 00:08 |
mordred | fungi: I'm cooking meat now - but I can make that patch tomorrow | 00:08 |
fungi | yeah, was going to say, as long as it ends up on someone's "just code" list, that's the tricky bit | 00:09 |
clarkb | ianw: pabelanger so I am open to ideas, but even using eg chrony on centos/fedora and ntp on ubuntu/debian isn't going to fix this for us I don't think | 00:09 |
*** tqtran has quit IRC | 00:09 | |
clarkb | ianw: pabelanger since we will continue to run into the problem of slowly skewing time rather than making a step at boot to avoid that | 00:09 |
fungi | speaking of "just code" third batch of contributor registration discount codes for barcelona just finished going out. 270 in ~3 weeks | 00:10 |
*** PalTale has joined #openstack-infra | 00:10 | |
ianw | clarkb: yes, i don't really see ntpdate actually being deprecated, despite what it says. the RH maintainer tells people it's about the only sane way to start ntp | 00:10 |
fungi | used latest state of 263971 for that (i also did lots of additional validation of results against older data to make sure it did what was expected of it) | 00:10 |
fungi | ianw: though the rh maintainer also said not relying on ntp was even saner on rh-derivatives since it's no longer default | 00:11 |
clarkb | ianw: even using chronyd you have to do non-default things to make it actually step, from my reading | 00:11 |
clarkb | basically the time sync services as implemented by these distros don't solve this problem | 00:12 |
clarkb | which is annoying | 00:12 |
harlowja | mordred ' You might even come to the conclusion that my personal preferences | 00:12 |
harlowja | or needs are not the most important thing. I' | 00:12 |
harlowja | but they are! | 00:12 |
harlowja | ha | 00:12 |
mordred | harlowja: :) | 00:12 |
harlowja | as long as your preferences are my preferences | 00:12 |
harlowja | lol | 00:12 |
*** spzala has quit IRC | 00:12 | |
openstackgerrit | Chris Krelle proposed openstack/diskimage-builder: WIP: A hardware burn-in element. https://review.openstack.org/355675 | 00:13 |
ianw | fungi: yes, that too | 00:13 |
mordred | harlowja: listen - you are entitled to your own wrong opinion | 00:13 |
fungi | clarkb: i wonder if the openntpd package for debian/ubuntu has a config option to start with -s | 00:13 |
harlowja | mordred not if donald gets elected, lol | 00:13 |
mordred | harlowja: I believe it's a basic human right | 00:13 |
* mordred steps away from election talk ... | 00:13 | |
harlowja | hahahahha | 00:13 |
ianw | clarkb: i believe "chronyc makestep" is the ntpdate equivalent | 00:14 |
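For context, the command ianw is referring to, run against an already-running chronyd:

```
# Hedged example: force an immediate clock step instead of slewing slowly.
chronyc makestep
```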
* harlowja goes right into the deep end | 00:14 | |
openstackgerrit | Abhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service https://review.openstack.org/355670 | 00:14 |
clarkb | ianw: yup but that doesn't happen for us on boot | 00:15 |
clarkb | ianw: so we would have to write our own service to do it or otherwise hack it in | 00:15 |
openstackgerrit | Merged openstack-infra/system-config: Fix firehose hostname on cacti hiera https://review.openstack.org/355671 | 00:19 |
ianw | clarkb: the config is | 00:22 |
ianw | # In first three updates step the system clock instead of slew | 00:22 |
ianw | # if the adjustment is larger than 1 second. | 00:22 |
ianw | makestep 1.0 3 | 00:22 |
clarkb | ianw: that's the chronyd default config on centos/fedora? | 00:23 |
*** Hal has quit IRC | 00:23 | |
ianw | clarkb: yes | 00:23 |
clarkb | ah ok | 00:23 |
ianw | so, of course systemd is in the mix here | 00:23 |
clarkb | yes systemd has its own service to do syncing | 00:24 |
clarkb | but it has almost no docs | 00:24 |
ianw | oh, that's not in use, but i think chrony does have network detection service bits | 00:24 |
clarkb | ianw: it's in use on ubuntu xenial :/ | 00:24 |
ianw | particularly http://pkgs.fedoraproject.org/cgit/rpms/chrony.git/tree/chrony-dnssrv@.service | 00:24 |
*** Swami has quit IRC | 00:24 | |
clarkb | by default | 00:24 |
openstackgerrit | YAMAMOTO Takashi proposed openstack-infra/project-config: networking-midonet: switch to python-db-jobs https://review.openstack.org/335551 | 00:25 |
*** gildub has joined #openstack-infra | 00:26 | |
ianw | clarkb: ah ... so now we have 3 methods to set the time | 00:27 |
clarkb | ianw: indeed :( | 00:27 |
*** fitoduarte has quit IRC | 00:27 | |
clarkb | ianw: though I am somewhat partial to just using one across the board if it can be made to work sanely | 00:27 |
clarkb | chronyd seems fine except ubuntu doesn't seem to have that makestep setup that centos/fedora do | 00:27 |
*** thorst_ has joined #openstack-infra | 00:28 | |
clarkb | I wonder if we can configure that via the default file somehow | 00:28 |
*** piet_ has quit IRC | 00:29 | |
*** woodster_ has quit IRC | 00:29 | |
*** signed8bit is now known as signed8bit_Zzz | 00:30 | |
ianw | clarkb: just looking at the deb packaging now... | 00:31 |
clarkb | ianw: I did confirm that an ubuntu-minimal build of xenial boots up with the systemd service running | 00:31 |
clarkb | and trusty doesn't have anything | 00:32 |
*** gildub_ has joined #openstack-infra | 00:32 | |
clarkb | ianw: I think the ideal situation would be to have, on each distro we run, something that does the equivalent of ntpdate first and then ntpd, then completely remove the ntp munging from devstack-gate. Sounds like centos/fedora do this with chrony, so we need to figure out an ubuntu/debian option that works | 00:33 |
*** xarses has quit IRC | 00:34 | |
clarkb | my reading of the ntp setup on ubuntu/debian is that it will try to run ntpdate on if-up, but that doesn't seem to be working for us? Maybe because of a race between unbound and networking coming up when resolving the ntp servers | 00:34 |
*** gongysh has joined #openstack-infra | 00:36 | |
clarkb | ianw: and the chrony package on ubuntu will do a burst but not a step from my reading of scripts | 00:38 |
ianw | clarkb: yeah, so config in https://launchpad.net/ubuntu/+archive/primary/+files/chrony_2.1.1-1.debian.tar.xz doesn't specify makestep as you say. to me, a bug saying "redhat does it, it would be nice to be consistent and it's probably what you want anyway" might be ok | 00:38 |
ianw | but time people also seem very, ahh, set in their ways | 00:38 |
ianw | so i expect that might also be closed with a flame to boot | 00:39 |
* fungi always boots in flames | 00:39 | |
fungi | and in flaming boots | 00:39 |
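A minimal sketch of carrying the Fedora-style step behaviour over to Ubuntu, assuming the Debian packaging's default config path and that nothing else already sets makestep:

```
# Append the step directive ianw quoted above to Ubuntu's chrony config and restart.
cat >> /etc/chrony/chrony.conf <<'EOF'
# Step the system clock in the first three updates if the offset exceeds 1 second
makestep 1.0 3
EOF
service chrony restart
```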
clarkb | might be worth attempting to trace the normal ntp boot up and see if ntpdate is in fact running and if it is failing due to other deps not being there at boot | 00:41 |
clarkb | I have some local VMs I can use to try and attempt that but need to go to dinner now | 00:41 |
*** rbuzatu has joined #openstack-infra | 00:41 | |
ianw | clarkb: i have an osic vm i've been pottering on for f24. let me rebuild that with a ubuntu image and see if anything pops out | 00:42 |
cloudnull | clarkb mordred pabelanger I have deferred deletes enabled now. if at all possible I'd love to know the next time an instance has ssh issues so I can go hunt down the specific failures and such. | 00:45 |
pabelanger | cloudnull: sure, I can check now | 00:46 |
*** rbuzatu has quit IRC | 00:46 | |
*** amotoki has joined #openstack-infra | 00:47 | |
pabelanger | cloudnull: 8098e5c0-125f-4fda-9887-496f8f7fdf7d | 00:48 |
pabelanger | cloudnull: just failed | 00:48 |
*** jamielennox is now known as jamielennox|away | 00:48 | |
cloudnull | ok | 00:48 |
ianw | clarkb: ok, so with ntpdate not in the base image, it's not starting on boot for sure | 00:49 |
*** spzala has joined #openstack-infra | 00:49 | |
*** jamielennox|away is now known as jamielennox | 00:49 | |
*** tonytan4ever has joined #openstack-infra | 00:50 | |
pabelanger | cloudnull: 7b3e102d-3f32-4a76-9e44-abf0b42dad4d is another | 00:50 |
ianw | clarkb: and when it is there, it is called in the network ifup scripts, but -> Aug 16 00:49:35 iwienand-f24-test ntpdate[816]: Can't find host 3.debian.pool.ntp.org: Name or service not known (-2) | 00:50 |
*** csmart has quit IRC | 00:52 | |
*** csmart has joined #openstack-infra | 00:53 | |
cloudnull | pabelanger: idk if it's related but both of those instances are 16.04? do we generally see these ssh failures more on 16.04 than not? | 00:54 |
cloudnull | or is 16.04 just what's more common now? | 00:55 |
pabelanger | cloudnull: let me check, I have logs. we are doing more and more xenial | 00:55 |
cloudnull | also both are using config_drive, is that the default? | 00:56 |
cloudnull | i'd like to spin up lots of tests to reproduce this issue without continuing to bother you :) | 00:56 |
*** fguillot_ has quit IRC | 00:56 | |
openstackgerrit | fumihiko kakuma proposed openstack-infra/devstack-gate: Enable to add sudo permission to tempest user https://review.openstack.org/355682 | 00:57 |
*** gyee has quit IRC | 00:59 | |
*** fguillot_ has joined #openstack-infra | 00:59 | |
pabelanger | cloudnull: you are correct, it looks to be only xenial failing | 01:00 |
pabelanger | cloudnull: let me manually launch one and see why | 01:00 |
*** rbuzatu has joined #openstack-infra | 01:04 | |
*** aeng has quit IRC | 01:04 | |
*** gongysh has quit IRC | 01:05 | |
*** aeng has joined #openstack-infra | 01:05 | |
*** zhurong has joined #openstack-infra | 01:05 | |
ianw | clarkb: so here's how i think it goes on trusty. ./network/if-up.d/ntpdate gets called by ifup ... but dhclient is still working at that point. that's ok, because ./dhcp/dhclient-exit-hooks.d/ntpdate will be called when we actually have network | 01:05 |
ianw | clarkb: none of this happens on boot of our trusty images, because ntpdate isn't installed | 01:05 |
*** esberglu has joined #openstack-infra | 01:06 | |
ianw | which is probably the fault of puppet-ntp ... i don't think ntpdate is really an optional component | 01:07 |
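A quick hedged check for the situation ianw describes, using the hook paths mentioned above:

```
# Is ntpdate installed, and are its network hooks present on the node?
dpkg -l ntpdate
ls -l /etc/network/if-up.d/ntpdate /etc/dhcp/dhclient-exit-hooks.d/ntpdate
```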
*** pahuang has joined #openstack-infra | 01:07 | |
*** rbuzatu has quit IRC | 01:08 | |
*** tqtran has joined #openstack-infra | 01:09 | |
clarkb | aha! | 01:09 |
*** baoli has quit IRC | 01:10 | |
*** julim has joined #openstack-infra | 01:10 | |
pabelanger | clarkb: cloudnull: okay, reproduced the failure of host git.openstack.org in osic-cloud1. I think we have a race condition, if I ran the command 1min later, it worked | 01:11 |
*** adrian_otto has quit IRC | 01:11 | |
clarkb | or maybe a NAT issue? | 01:11 |
pabelanger | possible | 01:12 |
pabelanger | let me force ipv6 and reboot | 01:12 |
*** baoli has joined #openstack-infra | 01:12 | |
pabelanger | going to also check that sshd depends on unbound too | 01:12 |
*** tqtran has quit IRC | 01:13 | |
cloudnull | maybe we can add something like this to the script http://cdn.pasteraw.com/cs48x75pis3n67r63j5mgc0a3fsscur ? | 01:14 |
pabelanger | ya, unbound is taking a while to start | 01:14 |
cloudnull | then it can try for a min or two before failing ? | 01:14 |
*** weshay has quit IRC | 01:15 | |
pabelanger | http://paste.openstack.org/show/557770/ | 01:16 |
pabelanger | unbound is taking about 1 min to start | 01:16 |
pabelanger | err | 01:16 |
pabelanger | yes, 1 min | 01:16 |
*** esberglu has quit IRC | 01:17 | |
*** gildub_ has quit IRC | 01:18 | |
*** gildub has quit IRC | 01:18 | |
*** jimbaker has quit IRC | 01:18 | |
*** gildub has joined #openstack-infra | 01:19 | |
cloudnull | pabelanger: rather... http://cdn.pasteraw.com/n57rvu8vw6w3q8mzd9s0hiua5i8v677 -- forgot an import loop there ;) | 01:19 |
*** Apoorva_ has joined #openstack-infra | 01:19 | |
pabelanger | cloudnull: Ya, we could try polling a few times. Let me see why unbound is taking 1 min to start | 01:20 |
*** asettle has joined #openstack-infra | 01:22 | |
*** jimbaker has joined #openstack-infra | 01:22 | |
*** jimbaker has quit IRC | 01:22 | |
*** jimbaker has joined #openstack-infra | 01:22 | |
*** Apoorva has quit IRC | 01:23 | |
*** Apoorva_ has quit IRC | 01:24 | |
cloudnull | going to grab a bite, back in a while. | 01:24 |
*** rajinir has quit IRC | 01:25 | |
*** spzala has quit IRC | 01:29 | |
*** spzala has joined #openstack-infra | 01:30 | |
pabelanger | cloudnull: clarkb: So, I think unbound is blocking on key generation: http://paste.openstack.org/show/557771/ waiting for randomness from the kernel | 01:31 |
pabelanger | cloudnull: clarkb: so, we can either make configure_mirror.sh smarter by polling the unbound service status every 30 seconds, up to 10 times: http://paste.openstack.org/show/557773/ | 01:32 |
pabelanger | cloudnull: clarkb: see if we can preseed the key, or disable the key | 01:32 |
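A minimal sketch of the polling option; the pasted scripts above are not archived here, so this is an assumption about their shape rather than a copy:

```
# Poll the local resolver until it answers, giving unbound time to finish starting;
# give up after 10 attempts spaced 30 seconds apart.
for attempt in $(seq 1 10); do
    if host git.openstack.org >/dev/null 2>&1; then
        break
    fi
    sleep 30
done
```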
*** asettle has quit IRC | 01:32 | |
*** aeng has quit IRC | 01:33 | |
*** spzala has quit IRC | 01:34 | |
*** dkehn_ has quit IRC | 01:34 | |
*** dkehn has quit IRC | 01:34 | |
*** thorst_ has quit IRC | 01:38 | |
*** thorst_ has joined #openstack-infra | 01:39 | |
*** rfolco has quit IRC | 01:39 | |
*** hparekh has quit IRC | 01:40 | |
*** nwkarsten has joined #openstack-infra | 01:43 | |
*** baoli has quit IRC | 01:43 | |
*** amotoki has quit IRC | 01:43 | |
*** Sukhdev has quit IRC | 01:44 | |
*** gongysh has joined #openstack-infra | 01:45 | |
*** elo has quit IRC | 01:45 | |
*** dkehn has joined #openstack-infra | 01:47 | |
*** dkehn_ has joined #openstack-infra | 01:47 | |
*** thorst_ has quit IRC | 01:48 | |
*** yanyanhu has joined #openstack-infra | 01:48 | |
*** larainema has quit IRC | 01:49 | |
openstackgerrit | Tim Burke proposed openstack-dev/hacking: Add optional H203 to check that assertIs(Not)None is used https://review.openstack.org/276517 | 01:50 |
*** baoli has joined #openstack-infra | 01:51 | |
*** vinaypotluri has quit IRC | 01:51 | |
*** hparekh has joined #openstack-infra | 01:51 | |
*** gongysh has quit IRC | 01:55 | |
*** tkelsey has joined #openstack-infra | 02:02 | |
*** thorst_ has joined #openstack-infra | 02:02 | |
*** thorst_ has quit IRC | 02:03 | |
*** inc0 has joined #openstack-infra | 02:03 | |
openstackgerrit | James Polley proposed openstack-dev/pbr: Fix handling of old git log output https://review.openstack.org/339392 | 02:03 |
*** gongysh has joined #openstack-infra | 02:04 | |
*** dimtruck is now known as zz_dimtruck | 02:05 | |
*** rbuzatu has joined #openstack-infra | 02:05 | |
*** zz_dimtruck is now known as dimtruck | 02:05 | |
openstackgerrit | zhangyanxian proposed openstack-infra/project-config: Fix typo in the Pypi-extract-name.py https://review.openstack.org/355692 | 02:05 |
*** tkelsey has quit IRC | 02:06 | |
*** jamielennox is now known as jamielennox|away | 02:07 | |
*** xarses has joined #openstack-infra | 02:07 | |
openstackgerrit | zhangyanxian proposed openstack-infra/project-config: Fix typo in the Pypi-extract-name.py https://review.openstack.org/355692 | 02:09 |
openstackgerrit | zhangyanxian proposed openstack-infra/project-config: Fix typo in the pypi-extract-name.py https://review.openstack.org/355692 | 02:09 |
*** rbuzatu has quit IRC | 02:10 | |
openstackgerrit | Merged openstack-infra/project-config: Further F24 kernel update https://review.openstack.org/353783 | 02:10 |
openstackgerrit | James Polley proposed openstack-dev/pbr: Fix handling of old git log output https://review.openstack.org/339392 | 02:11 |
*** pradk has quit IRC | 02:12 | |
openstackgerrit | James Polley proposed openstack-dev/pbr: Fix handling of old git log output https://review.openstack.org/339392 | 02:18 |
*** aeng has joined #openstack-infra | 02:19 | |
*** gongysh has quit IRC | 02:20 | |
openstackgerrit | Timothy R. Chavez proposed openstack-infra/jenkins-job-builder: Use xml_jobs not jobs https://review.openstack.org/355694 | 02:20 |
*** baoli has quit IRC | 02:21 | |
*** elo has joined #openstack-infra | 02:22 | |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config: Add smarter dns checking for configure_mirror.sh https://review.openstack.org/355695 | 02:22 |
pabelanger | cloudnull: clarkb: fungi: So, that should fix our launch node errors around DNS not working ^. In the case of osic-cloud1 and ubuntu-xenial, we are SSHing into the node and running host git.openstack.org before unbound has finished starting | 02:24 |
*** raunak has quit IRC | 02:25 | |
*** jamielennox|away is now known as jamielennox | 02:26 | |
timrc | zxiiro: Hi... it looks like 80aa5266166dfcc84be765060cae7c6eac363ecd caused a regression. See: https://review.openstack.org/#/c/355694/ | 02:27 |
*** mriedem is now known as mriedem_away | 02:27 | |
timrc | zxiiro: Use of --delete-old with commit 80aa5266166dfcc84be765060cae7c6eac363ecd will delete every job. | 02:27 |
fungi | pabelanger: what are the odds that we're not preinstalling haveged on our nodes, resulting in entropy starvation? | 02:29 |
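A hedged way to check for that on a booted node; whether haveged is actually missing from the images is exactly what fungi is asking:

```
# Low values here suggest the kernel entropy pool is starved shortly after boot.
cat /proc/sys/kernel/random/entropy_avail
# If haveged really is absent, installing it keeps the pool topped up.
apt-get install -y haveged
```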
fungi | timrc: i thought they fixed that last week? | 02:29 |
zxiiro | i thought we fixed it too. I've been using it on my systems with no issue | 02:30 |
zxiiro | i'm not sure what the difference between passing xml_jobs instead of jobs is. Jobs is what is returned from jenkins as the list of all jobs that were updated, hence they shouldn't be deleted. xml_jobs should be the same too? | 02:31 |
timrc | Not what I'm seeing... | 02:31 |
zxiiro | timrc: how are you running your command? jenkins-jobs update --delete-old jjbs/ ? | 02:33 |
*** asettle has joined #openstack-infra | 02:33 | |
timrc | zxiiro: Essentially, e.g. jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs/servers/`hostname` --delete-old | 02:34 |
*** netsin has quit IRC | 02:34 | |
*** signed8bit_Zzz is now known as signed8bit | 02:37 | |
timrc | zxiiro: From my console, running the script that runs whenever our jobs repo changes... http://paste.openstack.org/show/557860/ | 02:38 |
zxiiro | timrc: well let me test it real quick and if it works for me I'll merge it | 02:39 |
*** mdrabe has joined #openstack-infra | 02:39 | |
*** asettle has quit IRC | 02:40 | |
*** tphummel has quit IRC | 02:40 | |
*** vinaypotluri has joined #openstack-infra | 02:42 | |
timrc | zxiiro: I think the jobs list that gets returned by update_jobs is just the list of jobs that changed.. so if no jobs changed, for example, it returns []. That empty list gets passed as the "keeps" list. Since no jobs are in that list, they all get removed. | 02:44 |
*** hongbin has joined #openstack-infra | 02:44 | |
*** bin_ has quit IRC | 02:44 | |
timrc | If we use xml_jobs the "keeps" list will always be every job in config, regardless of whether it changed or not. | 02:45 |
timrc | Which is exactly what we want, I think. | 02:45 |
*** zhenguo has joined #openstack-infra | 02:46 | |
timrc | --delete-old should presumably just delete the jobs which are no longer in config. | 02:46 |
zxiiro | timrc: yeah i'm testing that theory now. I want to make sure we understand the difference between the 2 | 02:48 |
zxiiro | timrc: i suspect i didn't catch it in testing because i run my system with ignore_cache=True | 02:48 |
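To illustrate the failure mode and zxiiro's cache point with concrete invocations (paths are examples, and the workaround is a suggestion rather than something tested here):

```
# With a warm cache and no job changes, update returns an empty "changed" list,
# so --delete-old treats every job on the master as deletable:
jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs --delete-old
# Bypassing the cache (equivalent to ignore_cache=True) avoids tripping the bug:
jenkins-jobs --ignore-cache --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs --delete-old
```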
*** gongysh_ has joined #openstack-infra | 02:48 | |
*** yuanying has quit IRC | 02:49 | |
zxiiro | timrc: Ok I just confirmed it | 02:49 |
zxiiro | timrc: you're right. jobs returns only updated jobs, so if you cached and your jobs didn't update they won't be in the list. xml_jobs is the right thing to use | 02:50 |
zxiiro | timrc: can you update the commit message to explain that? | 02:50 |
zxiiro | timrc: I'll approve the change right away once you do that | 02:50 |
*** elo has quit IRC | 02:51 | |
*** yuanying has joined #openstack-infra | 02:52 | |
*** jimbaker has quit IRC | 02:53 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config: Add a script to list change owner statistics https://review.openstack.org/263971 | 02:53 |
*** yamahata has quit IRC | 02:54 | |
*** inc0 has quit IRC | 02:55 | |
*** elo has joined #openstack-infra | 02:57 | |
*** jimbaker has joined #openstack-infra | 02:57 | |
*** jimbaker has quit IRC | 02:57 | |
*** jimbaker has joined #openstack-infra | 02:57 | |
*** gongysh_ has quit IRC | 02:57 | |
ianw | is it possible there's something up with the nodepool builder? | 02:58 |
*** signed8bit is now known as signed8bit_Zzz | 03:00 | |
pabelanger | fungi: I am not sure, I'd have to check. I've never used haveged before either | 03:00 |
pabelanger | ianw: I kicked off a build an hour or so ago | 03:01 |
pabelanger | looks like ubuntu-xenial is just finishing up | 03:01 |
pabelanger | actually, done now | 03:01 |
*** baoli has joined #openstack-infra | 03:01 | |
ianw | pabelanger: ahh, yeah, sorry should have checked the debug log | 03:01 |
*** krtaylor has joined #openstack-infra | 03:02 | |
pabelanger | fungi: it is installed for ubuntu-xenial | 03:02 |
*** thorst_ has joined #openstack-infra | 03:03 | |
*** yamahata has joined #openstack-infra | 03:03 | |
*** dimtruck is now known as zz_dimtruck | 03:07 | |
ianw | pabelanger: what's up with that -> OpenStackCloudException: Image creation failed: delete() takes exactly 2 arguments (1 given) | 03:07 |
*** raunak has joined #openstack-infra | 03:08 | |
*** signed8bit_Zzz is now known as signed8bit | 03:09 | |
pabelanger | ianw: never seen that | 03:10 |
pabelanger | ianw: I did delete some old DIB images from nodepool.o.o tonight however | 03:10 |
*** apetrich has joined #openstack-infra | 03:10 | |
*** netsin has joined #openstack-infra | 03:10 | |
*** nwkarste_ has joined #openstack-infra | 03:11 | |
pabelanger | ianw: looks like a bug in shade | 03:11 |
pabelanger | mordred: ^ | 03:11 |
ianw | pabelanger: yeah ... odd traceback | 03:11 |
ianw | http://paste.openstack.org/show/557882/ | 03:11 |
*** thorst_ has quit IRC | 03:11 | |
*** elo has quit IRC | 03:12 | |
zxiiro | timrc: looks like you're not here. I'll update the commit message | 03:12 |
*** nwkarsten has quit IRC | 03:13 | |
*** fguillot_ has quit IRC | 03:14 | |
openstackgerrit | Thanh Ha proposed openstack-infra/jenkins-job-builder: Use xml_jobs not jobs https://review.openstack.org/355694 | 03:14 |
*** elo has joined #openstack-infra | 03:17 | |
*** nwkarste_ has quit IRC | 03:18 | |
ianw | pabelanger: that tb really makes no sense ... i get the feeling the builder process might not be running the same code as on disk... | 03:19 |
pabelanger | ianw: possible, you can restart it if you want, I am done for the night | 03:20 |
*** raunak has quit IRC | 03:20 | |
ianw | pabelanger: ok, no worries, i'll see, numbers might make sense on an older release | 03:21 |
*** raunak has joined #openstack-infra | 03:21 | |
timrc | zxiiro: Sorry, was putting my daughter to sleep. Reading up | 03:22 |
zxiiro | timrc: no worries. once jenkins returns I will merge it | 03:24 |
*** psilvad has quit IRC | 03:25 | |
timrc | zxiiro: Excellent. Thanks! | 03:25 |
zxiiro | timrc: no, thank you for reporting and fixing the issue! | 03:25 |
*** baoli has quit IRC | 03:26 | |
*** rbuzatu has joined #openstack-infra | 03:26 | |
ianw | pabelanger: to answer my own question, the shade .py files are from the 13th, and the builder was started on the 12th. so yeah, the numbers don't line up in the tb | 03:27 |
*** shashank_hegde has joined #openstack-infra | 03:30 | |
*** rbuzatu has quit IRC | 03:31 | |
ianw | yep, 1.9.0 makes much more sense | 03:32 |
*** signed8bit has quit IRC | 03:34 | |
*** signed8bit has joined #openstack-infra | 03:34 | |
*** shashank_hegde has quit IRC | 03:36 | |
beagles | meh, still have really weird issues with zuul ansible on ubuntu. "async task produced unparseable results" shows up in the ansible log and the job fails | 03:38 |
*** signed8bit has quit IRC | 03:38 | |
*** signed8b_ has joined #openstack-infra | 03:39 | |
*** julim has quit IRC | 03:42 | |
*** vikrant has joined #openstack-infra | 03:42 | |
*** yamahata has quit IRC | 03:43 | |
*** roxanaghe has joined #openstack-infra | 03:43 | |
*** roxanaghe has quit IRC | 03:43 | |
*** hongbin has quit IRC | 03:45 | |
*** ramishra has quit IRC | 03:45 | |
*** rajinir has joined #openstack-infra | 03:45 | |
*** nwkarsten has joined #openstack-infra | 03:46 | |
beagles | pabelanger, you still around - if so, should that possible fix to ^^^ have propagated through to where it'd get picked up on a recheck? | 03:47 |
ianw | beagles: we were having issues with that on fedora, which had to do with locales on the host and it outputting error messages that got things confused | 03:47 |
beagles | ouch. how did you resolve it? | 03:48 |
clarkb | beagles: ianw my understanding is jeblair kicked off some restarts to pick up new ansible today | 03:48 |
*** yuanying has quit IRC | 03:48 | |
*** roxanaghe has joined #openstack-infra | 03:48 | |
clarkb | the ansible fix merged but is not yet released | 03:48 |
jeblair | yeah, should be in place. we may need to hold a node to debug further. (i can't do that now) | 03:48 |
beagles | clarkb, awwww okay | 03:48 |
ianw | beagles: fixed the locales in the image build :) but yeah, ansible did fix it in a later release | 03:48 |
beagles | clarkb, I had sifted through IRC backlog and misunderstood - thought it was "in the mix" | 03:49 |
openstackgerrit | Merged openstack-infra/jenkins-job-builder: Use xml_jobs not jobs https://review.openstack.org/355694 | 03:49 |
jeblair | beagles: it is in place -- we are running unreleased ansible to get it | 03:49 |
*** ramishra has joined #openstack-infra | 03:51 | |
*** yuanying has joined #openstack-infra | 03:51 | |
beagles | jeblair, okay nice.. how long ago would it have been available? Just want to confirm these jobs were launched before they would've gotten the fix | 03:51 |
jeblair | beagles: i think i status logged it... 1 sec | 03:52 |
jeblair | beagles: https://wiki.openstack.org/wiki/Infrastructure_Status says | 03:52 |
jeblair | 2016-08-15 20:34:14 UTC Installed ansible stable-2.1 branch on zuul launchers to pick up https://github.com/ansible/ansible/commit/d35377dac78a8fcc6e8acf0ffd92f47f44d70946 | 03:52 |
*** nwkarsten has quit IRC | 03:52 | |
*** nwkarsten has joined #openstack-infra | 03:53 | |
beagles | jeblair, crap.. then unless I'm missing something it should've been picked up.. 1s | 03:54 |
beagles | jeblair, is there something in the ansible logs, etc. I can spot to check what version was being used? | 03:55 |
*** signed8bit has joined #openstack-infra | 03:56 | |
*** asettle has joined #openstack-infra | 03:56 | |
*** nwkarsten has quit IRC | 03:57 | |
*** signed8b_ has quit IRC | 03:58 | |
*** winggundamth has quit IRC | 03:59 | |
*** asettle has quit IRC | 04:01 | |
prometheanfire | think I may have found a bug in git-review/gerrit | 04:06 |
prometheanfire | maybe | 04:06 |
prometheanfire | can you git-review to the same change-id but a different branch? | 04:07 |
prometheanfire | huh, you can | 04:07 |
prometheanfire | nvm then lol | 04:07 |
prometheanfire | https://review.openstack.org/#/q/I67d7a5000bfe0c98717d3e29d23edc9c6117e765,n,z | 04:07 |
*** thorst_ has joined #openstack-infra | 04:10 | |
*** tqtran has joined #openstack-infra | 04:10 | |
beagles | jeblair, actually .. what I'm looking at looks like a timeout... wow | 04:10 |
clarkb | prometheanfire: yes change ids are not unique | 04:12 |
clarkb | prometheanfire: the unique tuple is project, branch, change id | 04:13 |
*** hichihara has joined #openstack-infra | 04:13 | |
prometheanfire | just realized that :D | 04:13 |
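For example (branch names are placeholders), the same Change-Id can be proposed to two branches, and gerrit keys each change on (project, branch, Change-Id):

```
# Push the same commit (same Change-Id footer) to two branches of one project.
git review master
git review stable/mitaka
```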
*** winggundamth has joined #openstack-infra | 04:14 | |
prometheanfire | toabctl: for you to watch I guess https://review.openstack.org/353349 https://review.openstack.org/355711 | 04:14 |
*** tqtran has quit IRC | 04:14 | |
prometheanfire | bah | 04:14 |
prometheanfire | toabctl: sorry, mistype | 04:14 |
prometheanfire | tonyb: for you to watch I guess https://review.openstack.org/353349 https://review.openstack.org/355711 | 04:15 |
prometheanfire | tonyb: though you might be done working | 04:15 |
openstackgerrit | kyle liu proposed openstack-infra/project-config: Add new project networking-zte https://review.openstack.org/355278 | 04:16 |
prometheanfire | also, if someone has some time to review... https://review.openstack.org/#/c/310865/ | 04:17 |
*** thorst_ has quit IRC | 04:17 | |
*** rlandy has quit IRC | 04:19 | |
*** jimbaker has quit IRC | 04:20 | |
*** links has joined #openstack-infra | 04:20 | |
*** sflanigan has joined #openstack-infra | 04:22 | |
*** sflanigan has joined #openstack-infra | 04:22 | |
*** raunak has quit IRC | 04:22 | |
*** jimbaker has joined #openstack-infra | 04:23 | |
*** jimbaker has quit IRC | 04:23 | |
*** jimbaker has joined #openstack-infra | 04:23 | |
*** raunak has joined #openstack-infra | 04:24 | |
*** javeriak has joined #openstack-infra | 04:26 | |
openstackgerrit | Ian Wienand proposed openstack-infra/shade: Use "image" as argument for Glance V1 upload error path https://review.openstack.org/355715 | 04:27 |
*** tonytan4ever has quit IRC | 04:27 | |
ianw | pabelanger: ^ re that error. | 04:27 |
ianw | that fixed, i'll restart the builder now since it's quiet and so it's running the same code that's actually on disk :) | 04:28 |
*** javeriak has quit IRC | 04:31 | |
*** kzaitsev_mb has joined #openstack-infra | 04:38 | |
ianw | i wonder why "nodepool image-build fedora-24" gets stuck? | 04:42 |
*** Sukhdev has joined #openstack-infra | 04:43 | |
*** javeriak has joined #openstack-infra | 04:45 | |
*** rbuzatu has joined #openstack-infra | 04:48 | |
*** pgadiya has joined #openstack-infra | 04:48 | |
*** sarob has joined #openstack-infra | 04:49 | |
*** signed8bit has quit IRC | 04:52 | |
*** sarob has quit IRC | 04:53 | |
*** mdrabe has quit IRC | 04:54 | |
*** rbuzatu has quit IRC | 04:54 | |
*** psachin has joined #openstack-infra | 04:59 | |
*** arnewiebalck has quit IRC | 05:00 | |
*** jimbaker has quit IRC | 05:00 | |
*** tonytan4ever has joined #openstack-infra | 05:03 | |
*** kzaitsev_mb has quit IRC | 05:03 | |
*** jimbaker has joined #openstack-infra | 05:04 | |
*** jimbaker has quit IRC | 05:04 | |
*** jimbaker has joined #openstack-infra | 05:04 | |
*** elo has quit IRC | 05:04 | |
*** raunak has quit IRC | 05:05 | |
*** raunak has joined #openstack-infra | 05:06 | |
*** thorst_ has joined #openstack-infra | 05:15 | |
*** senk_ has joined #openstack-infra | 05:16 | |
*** _nadya_ has joined #openstack-infra | 05:19 | |
*** raunak has quit IRC | 05:20 | |
*** raunak has joined #openstack-infra | 05:21 | |
*** thorst_ has quit IRC | 05:22 | |
*** _nadya_ has quit IRC | 05:24 | |
*** Sukhdev has quit IRC | 05:26 | |
*** kushal has joined #openstack-infra | 05:29 | |
*** jaosorior has joined #openstack-infra | 05:30 | |
*** raunak has quit IRC | 05:35 | |
*** hichihara has quit IRC | 05:36 | |
*** baoli has joined #openstack-infra | 05:38 | |
*** rbuzatu has joined #openstack-infra | 05:39 | |
*** ccamacho has joined #openstack-infra | 05:40 | |
*** shashank_hegde has joined #openstack-infra | 05:42 | |
*** baoli has quit IRC | 05:42 | |
*** M-docaedo_vector has quit IRC | 05:43 | |
*** raunak has joined #openstack-infra | 05:43 | |
*** senk_ has quit IRC | 05:45 | |
*** roxanaghe has quit IRC | 05:45 | |
*** r-mibu has quit IRC | 05:46 | |
*** tonytan4ever has quit IRC | 05:46 | |
beagles | is it possible to log in to a node and see what's going on if it looks like jobs are hung? | 05:47 |
openstackgerrit | guo yunxian proposed openstack/os-testr: Add support for Python versions https://review.openstack.org/355730 | 05:48 |
*** dkehn_ has quit IRC | 05:48 | |
*** dkehn has quit IRC | 05:49 | |
*** shashank_hegde has quit IRC | 05:49 | |
*** raunak has quit IRC | 05:50 | |
*** markusry has joined #openstack-infra | 05:50 | |
openstackgerrit | guo yunxian proposed openstack/os-testr: Add support for Python versions https://review.openstack.org/355730 | 05:51 |
*** tonytan4ever has joined #openstack-infra | 05:54 | |
*** rajinir has quit IRC | 05:55 | |
*** raunak has joined #openstack-infra | 05:55 | |
*** dkehn has joined #openstack-infra | 05:55 | |
ianw | beagles: yes, we can hold a node and give you a login, but it's a manual process | 05:56 |
*** slaweq_ has joined #openstack-infra | 05:57 | |
*** oanson has joined #openstack-infra | 05:58 | |
*** markvoelker has quit IRC | 05:58 | |
*** dkehn_ has joined #openstack-infra | 06:01 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config: Pre-install python2-requests package for Fedora https://review.openstack.org/355731 | 06:01 |
*** sandanar has joined #openstack-infra | 06:02 | |
*** pabelanger has quit IRC | 06:02 | |
*** pabelanger has joined #openstack-infra | 06:03 | |
*** ccamacho is now known as ccamacho|afk | 06:04 | |
*** tonytan4ever has quit IRC | 06:04 | |
*** tkelsey has joined #openstack-infra | 06:05 | |
*** r-mibu has joined #openstack-infra | 06:06 | |
*** raunak has quit IRC | 06:09 | |
*** florianf has joined #openstack-infra | 06:09 | |
*** tkelsey has quit IRC | 06:09 | |
*** M-docaedo_vector has joined #openstack-infra | 06:10 | |
*** tqtran has joined #openstack-infra | 06:11 | |
*** markusry has quit IRC | 06:11 | |
*** jimbaker has quit IRC | 06:13 | |
*** rcernin has joined #openstack-infra | 06:14 | |
*** tqtran has quit IRC | 06:15 | |
*** elo has joined #openstack-infra | 06:16 | |
*** jimbaker has joined #openstack-infra | 06:17 | |
*** raunak has joined #openstack-infra | 06:17 | |
*** jimbaker has quit IRC | 06:17 | |
*** jimbaker has joined #openstack-infra | 06:17 | |
*** javeriak has quit IRC | 06:18 | |
yolanda | good morning | 06:19 |
*** thorst_ has joined #openstack-infra | 06:20 | |
*** raunak has quit IRC | 06:21 | |
*** raunak has joined #openstack-infra | 06:25 | |
*** shashank_hegde has joined #openstack-infra | 06:26 | |
*** kzaitsev_mb has joined #openstack-infra | 06:27 | |
*** elo has quit IRC | 06:27 | |
*** elo has joined #openstack-infra | 06:27 | |
*** thorst_ has quit IRC | 06:27 | |
*** csomerville has quit IRC | 06:29 | |
*** cody-somerville has joined #openstack-infra | 06:30 | |
*** cody-somerville has joined #openstack-infra | 06:30 | |
*** Jeffrey4l has joined #openstack-infra | 06:30 | |
*** liusheng has quit IRC | 06:30 | |
*** spzala has joined #openstack-infra | 06:31 | |
*** liusheng has joined #openstack-infra | 06:31 | |
*** spzala has quit IRC | 06:35 | |
*** raunak has quit IRC | 06:35 | |
*** savihou has joined #openstack-infra | 06:36 | |
*** gildub has quit IRC | 06:37 | |
*** kushal has quit IRC | 06:39 | |
*** vsaienko has quit IRC | 06:42 | |
*** markusry has joined #openstack-infra | 06:46 | |
*** raunak has joined #openstack-infra | 06:47 | |
*** ihrachys has joined #openstack-infra | 06:47 | |
yolanda | ianw, around? care reviewing https://review.openstack.org/353994 ? | 06:49 |
*** martinkopec has joined #openstack-infra | 06:50 | |
*** raunak has quit IRC | 06:50 | |
*** markvoelker has joined #openstack-infra | 06:51 | |
*** markusry has quit IRC | 06:52 | |
*** yamahata has joined #openstack-infra | 06:53 | |
*** tkelsey has joined #openstack-infra | 06:54 | |
*** rbuzatu has quit IRC | 06:57 | |
*** rbuzatu has joined #openstack-infra | 06:58 | |
*** jtomasek|afk is now known as jtomasek | 07:00 | |
openstackgerrit | Vitaly Gridnev proposed openstack-infra/project-config: don't run tempest tests in sahara grenade https://review.openstack.org/354700 | 07:01 |
*** yamahata has quit IRC | 07:02 | |
*** savihou has quit IRC | 07:07 | |
*** thorongil has joined #openstack-infra | 07:10 | |
*** jpich has joined #openstack-infra | 07:11 | |
*** ccamacho|afk is now known as ccamacho | 07:13 | |
openstackgerrit | Merged openstack-infra/project-config: fix typo in comment https://review.openstack.org/355153 | 07:14 |
*** shashank_hegde has quit IRC | 07:18 | |
openstackgerrit | Merged openstack-infra/project-config: Fix syntax error in ironic-python-agent post job https://review.openstack.org/355487 | 07:18 |
*** dizquierdo has joined #openstack-infra | 07:19 | |
*** tonytan4ever has joined #openstack-infra | 07:22 | |
*** nmagnezi has joined #openstack-infra | 07:23 | |
*** tonytan4ever has quit IRC | 07:26 | |
*** e0ne has joined #openstack-infra | 07:28 | |
*** hichihara has joined #openstack-infra | 07:28 | |
*** thorst_ has joined #openstack-infra | 07:28 | |
*** raunak has joined #openstack-infra | 07:30 | |
*** thorst_ has quit IRC | 07:32 | |
akscram | Guys, I want to add the puppet-check-jobs group and make it non-voting but I do not know how to do it properly: https://review.openstack.org/#/c/355265/ | 07:33 |
akscram | Could someone advise me on how to enable it? | 07:34 |
*** raunak has quit IRC | 07:37 | |
*** bauzas_off is now known as bauzas | 07:38 | |
*** ifarkas_afk is now known as ifarkas | 07:40 | |
*** javeriak has joined #openstack-infra | 07:42 | |
*** dkehn has quit IRC | 07:43 | |
*** dkehn_ has quit IRC | 07:43 | |
*** raunak has joined #openstack-infra | 07:45 | |
*** savihou has joined #openstack-infra | 07:45 | |
openstackgerrit | Changcheng Intel proposed openstack-infra/jenkins-job-builder: add compress-log option to compress log https://review.openstack.org/354138 | 07:49 |
*** dkehn has joined #openstack-infra | 07:50 | |
*** matthewbodkin has joined #openstack-infra | 07:50 | |
*** baoli has joined #openstack-infra | 07:50 | |
*** chlong has quit IRC | 07:50 | |
*** raunak has quit IRC | 07:52 | |
*** kzaitsev_mb has quit IRC | 07:52 | |
*** yanyanhu has quit IRC | 07:52 | |
openstackgerrit | Changcheng Intel proposed openstack-infra/jenkins-job-builder: add post-send script option https://review.openstack.org/355135 | 07:53 |
*** hwoarang has joined #openstack-infra | 07:53 | |
*** baoli has quit IRC | 07:54 | |
openstackgerrit | Changcheng Intel proposed openstack-infra/jenkins-job-builder: use base_email_create to customize email flexible https://review.openstack.org/355139 | 07:54 |
*** sshnaidm|afk is now known as sshnaidm | 07:55 | |
*** kzaitsev_mb has joined #openstack-infra | 07:55 | |
*** zzzeek has quit IRC | 08:00 | |
*** zzzeek has joined #openstack-infra | 08:00 | |
*** pilgrimstack has joined #openstack-infra | 08:01 | |
*** dkehn_ has joined #openstack-infra | 08:01 | |
*** markvoelker has quit IRC | 08:01 | |
*** Mmike has quit IRC | 08:02 | |
*** Mmike has joined #openstack-infra | 08:02 | |
*** pilgrimstack has quit IRC | 08:05 | |
*** afred312 has quit IRC | 08:05 | |
*** raunak has joined #openstack-infra | 08:06 | |
*** afred312 has joined #openstack-infra | 08:06 | |
*** pilgrimstack has joined #openstack-infra | 08:07 | |
*** raunak has quit IRC | 08:11 | |
*** esikachev has joined #openstack-infra | 08:13 | |
*** matrohon has joined #openstack-infra | 08:19 | |
*** yanyanhu has joined #openstack-infra | 08:20 | |
*** asettle has joined #openstack-infra | 08:20 | |
*** sshnaidm has quit IRC | 08:21 | |
*** lucas-dinner is now known as lucasagomes | 08:21 | |
*** sshnaidm has joined #openstack-infra | 08:21 | |
*** tonytan4ever has joined #openstack-infra | 08:23 | |
openstackgerrit | Matthew Bodkin proposed openstack-infra/storyboard-webclient: Make side bar the same length as navbar https://review.openstack.org/355554 | 08:26 |
*** Goneri has joined #openstack-infra | 08:27 | |
*** Na3iL has joined #openstack-infra | 08:27 | |
*** tonytan4ever has quit IRC | 08:28 | |
*** chem has joined #openstack-infra | 08:29 | |
*** thorst_ has joined #openstack-infra | 08:30 | |
*** electrofelix has joined #openstack-infra | 08:33 | |
*** bethwhite_ has joined #openstack-infra | 08:33 | |
*** kzaitsev_mb has quit IRC | 08:34 | |
*** sandanar_ has joined #openstack-infra | 08:34 | |
*** thorst_ has quit IRC | 08:37 | |
*** sandanar has quit IRC | 08:38 | |
*** tkelsey has quit IRC | 08:39 | |
*** yaume has joined #openstack-infra | 08:40 | |
openstackgerrit | Ivan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects https://review.openstack.org/347047 | 08:40 |
*** mhickey has joined #openstack-infra | 08:41 | |
*** yamamoto has quit IRC | 08:44 | |
*** acoles_ is now known as acoles | 08:49 | |
*** sarob has joined #openstack-infra | 08:51 | |
*** dkehn_ has quit IRC | 08:51 | |
*** dkehn has quit IRC | 08:51 | |
*** bethwhite__ has joined #openstack-infra | 08:53 | |
*** sarob has quit IRC | 08:55 | |
*** Na3iL has quit IRC | 08:56 | |
*** dkehn has joined #openstack-infra | 08:58 | |
*** Julien-zte has joined #openstack-infra | 08:59 | |
*** Goneri has quit IRC | 09:01 | |
*** Goneri has joined #openstack-infra | 09:01 | |
*** markvoelker has joined #openstack-infra | 09:02 | |
*** derekh has joined #openstack-infra | 09:03 | |
*** dkehn_ has joined #openstack-infra | 09:04 | |
*** markvoelker has quit IRC | 09:07 | |
*** sambetts|afk is now known as sambetts | 09:07 | |
*** Na3iL has joined #openstack-infra | 09:11 | |
*** vinaypotluri has quit IRC | 09:11 | |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 09:18 |
*** eranrom has quit IRC | 09:20 | |
*** markmcd has joined #openstack-infra | 09:20 | |
*** _nadya_ has joined #openstack-infra | 09:22 | |
*** _nadya_ has quit IRC | 09:22 | |
*** _nadya_ has joined #openstack-infra | 09:22 | |
*** infra-red has joined #openstack-infra | 09:23 | |
openstackgerrit | Merged openstack/diskimage-builder: Allow to skip kernel cleanup https://review.openstack.org/353994 | 09:24 |
*** dtardivel has joined #openstack-infra | 09:28 | |
*** eranrom has joined #openstack-infra | 09:30 | |
*** yamamoto has joined #openstack-infra | 09:31 | |
*** thorst_ has joined #openstack-infra | 09:35 | |
*** ociuhandu has joined #openstack-infra | 09:40 | |
*** nwkarsten has joined #openstack-infra | 09:40 | |
*** dtantsur|afk is now known as dtantsur | 09:40 | |
*** thorst_ has quit IRC | 09:41 | |
*** yamamoto has quit IRC | 09:41 | |
*** ramishra has quit IRC | 09:42 | |
*** ramishra has joined #openstack-infra | 09:44 | |
*** nwkarsten has quit IRC | 09:44 | |
*** yamamoto has joined #openstack-infra | 09:46 | |
*** yamamoto has quit IRC | 09:46 | |
*** dmellado has quit IRC | 09:46 | |
*** amoralej has quit IRC | 09:46 | |
*** geguileo has quit IRC | 09:46 | |
*** kzaitsev_mb has joined #openstack-infra | 09:48 | |
*** yamamoto has joined #openstack-infra | 09:48 | |
*** tosky has joined #openstack-infra | 09:49 | |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config: Add new project "os-failures" https://review.openstack.org/355819 | 09:53 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 09:54 |
*** hichihara has quit IRC | 09:54 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 09:55 |
*** dmellado has joined #openstack-infra | 09:56 | |
*** ihrachys has quit IRC | 09:58 | |
*** javeriak has quit IRC | 10:00 | |
*** zhurong has quit IRC | 10:01 | |
*** markvoelker has joined #openstack-infra | 10:03 | |
*** jed56 has joined #openstack-infra | 10:03 | |
*** sandanar__ has joined #openstack-infra | 10:03 | |
*** sandanar_ has quit IRC | 10:07 | |
openstackgerrit | Julien Danjou proposed openstack-infra/project-config: Teach some Telemetry jobs about Gnocchi stable/2.2 branch https://review.openstack.org/355828 | 10:07 |
*** markvoelker has quit IRC | 10:08 | |
*** kushal has joined #openstack-infra | 10:09 | |
*** tqtran has joined #openstack-infra | 10:12 | |
*** ihrachys has joined #openstack-infra | 10:14 | |
*** pt_15 has quit IRC | 10:16 | |
*** Julien-zte has quit IRC | 10:17 | |
*** tqtran has quit IRC | 10:17 | |
*** _degorenko|afk is now known as degorenko | 10:18 | |
sshnaidm | do you know why in some projects when I set "closes-bug" it doesn't affect bugs in launchpad? Does something special need to be configured for this feature? | 10:18 |
*** asettle has quit IRC | 10:22 | |
*** sdague has joined #openstack-infra | 10:23 | |
*** ihrachys has quit IRC | 10:23 | |
*** yanyanhu has quit IRC | 10:24 | |
*** tonytan4ever has joined #openstack-infra | 10:24 | |
*** yamamoto has quit IRC | 10:25 | |
*** mhickey has quit IRC | 10:25 | |
*** kushal has quit IRC | 10:26 | |
*** kushal has joined #openstack-infra | 10:27 | |
*** tonytan4ever has quit IRC | 10:28 | |
*** javeriak has joined #openstack-infra | 10:29 | |
*** rbuzatu has quit IRC | 10:29 | |
*** cdent has joined #openstack-infra | 10:30 | |
cdent | I'm trying to figure out how to integrate the api-wg gerrit with the (newer) launchpad bugs collection it has. I can see from the docs that jeepyb does it, but I'm missing the bit on what to change in config to turn it on. halp? | 10:31 |
*** spzala has joined #openstack-infra | 10:31 | |
*** boogibugs has joined #openstack-infra | 10:32 | |
*** florianf has quit IRC | 10:33 | |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Adding support for Manual Build Trigger https://review.openstack.org/202543 | 10:34 |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Consolidate trigger-manual and trigger-parameterized-builds https://review.openstack.org/314108 | 10:34 |
*** spzala has quit IRC | 10:36 | |
*** boogibugs has quit IRC | 10:36 | |
*** boogibugs has joined #openstack-infra | 10:36 | |
*** Na3iL has quit IRC | 10:38 | |
*** florianf has joined #openstack-infra | 10:38 | |
*** narayrak has joined #openstack-infra | 10:39 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla https://review.openstack.org/355839 | 10:40 |
*** thorst_ has joined #openstack-infra | 10:41 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla https://review.openstack.org/355839 | 10:44 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/system-config: Correct public IP for baremetal00 https://review.openstack.org/355841 | 10:46 |
*** thorst_ has quit IRC | 10:47 | |
*** bethwhite_ has quit IRC | 10:48 | |
electrofelix | zxiiro waynr: given TOX_TESTENV_PASSENV works for https://review.openstack.org/271244, perhaps I should just change that review to update documentation when testing and add a comment instead of explicitly allowing proxy variables to be passed through? | 10:48 |
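A hedged example of the documentation-only approach being floated, assuming the repo's tox.ini interpolates TOX_TESTENV_PASSENV into passenv (which is what review 271244 relies on, per the message above):

```
# Pass proxy settings through to the tox test environments without patching tox.ini.
TOX_TESTENV_PASSENV="http_proxy https_proxy no_proxy" tox -e py27
```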
*** rhallisey has joined #openstack-infra | 10:49 | |
*** sarob has joined #openstack-infra | 10:52 | |
openstackgerrit | yolanda.robla proposed openstack-infra/puppet-infracloud: Fix bridge creation when no vlan is involved https://review.openstack.org/355845 | 10:54 |
*** sarob has quit IRC | 10:56 | |
openstackgerrit | Benny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest https://review.openstack.org/355846 | 11:01 |
openstackgerrit | Benny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest https://review.openstack.org/355846 | 11:03 |
*** yamamoto has joined #openstack-infra | 11:03 | |
*** dmsimard is now known as dmsimard|afk | 11:03 | |
*** azvyagintsev_h has joined #openstack-infra | 11:03 | |
*** markvoelker has joined #openstack-infra | 11:04 | |
*** markvoelker has quit IRC | 11:08 | |
*** asettle has joined #openstack-infra | 11:09 | |
*** Na3iL has joined #openstack-infra | 11:09 | |
*** locust has joined #openstack-infra | 11:12 | |
*** baoli has joined #openstack-infra | 11:14 | |
openstackgerrit | Ryan Hallisey proposed openstack-infra/project-config: Few changed to the kolla-kubernetes job https://review.openstack.org/355199 | 11:15 |
*** florianf has quit IRC | 11:17 | |
*** baoli has quit IRC | 11:19 | |
*** ociuhandu has quit IRC | 11:20 | |
*** florianf has joined #openstack-infra | 11:21 | |
openstackgerrit | Sean Dague proposed openstack-infra/project-config: Prime pip cache https://review.openstack.org/355854 | 11:22 |
*** jkilpatr has joined #openstack-infra | 11:23 | |
*** dizquierdo is now known as dizquierdo_afk | 11:29 | |
*** rbuzatu has joined #openstack-infra | 11:29 | |
*** asettle has quit IRC | 11:30 | |
*** ramishra has quit IRC | 11:30 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci: DONT MERGE: test periodic job https://review.openstack.org/355859 | 11:31 |
*** ccamacho is now known as ccamacho|lunch | 11:31 | |
cdent | sdague: since you appear to be awake maybe you know the answer to my question above: "I'm trying to figure out how to integrate the api-wg gerrit with the (newer) launchpad bugs collection it has. I can see from the docs that jeepyb does it, but I'm missing the bit on what to change in config to turn it on. halp?" | 11:31 |
*** ramishra has joined #openstack-infra | 11:32 | |
*** pbourke has joined #openstack-infra | 11:32 | |
pbourke | hi, wondering if the repos at http://mirror.ord.rax.openstack.org/ubuntu/dists/xenial/ are signed, and if so, where can I find the key? | 11:33 |
openstackgerrit | Fathi Boudra proposed openstack-infra/jenkins-job-builder: builders: add 'publish over ssh' support as a build step https://review.openstack.org/98437 | 11:34 |
*** rbuzatu has quit IRC | 11:34 | |
*** thorst_ has joined #openstack-infra | 11:35 | |
*** jaosorior has quit IRC | 11:35 | |
*** jaosorior has joined #openstack-infra | 11:36 | |
*** sdake has joined #openstack-infra | 11:36 | |
*** berendt has joined #openstack-infra | 11:39 | |
*** rfolco has joined #openstack-infra | 11:41 | |
openstackgerrit | Merged openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla https://review.openstack.org/355839 | 11:41 |
*** sfinucan has quit IRC | 11:41 | |
*** tpsilva has joined #openstack-infra | 11:41 | |
*** asettle has joined #openstack-infra | 11:44 | |
*** sfinucan has joined #openstack-infra | 11:44 | |
dtantsur | hi folks! could you please merge https://review.openstack.org/#/c/354608/ ? it's blocking Ironic stable gate | 11:45 |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master https://review.openstack.org/293631 | 11:46 |
sdague | cdent: what is the old bug group, and what is the new one? | 11:47 |
*** matbu is now known as matbu|lunch | 11:47 | |
sdague | dtantsur: +A | 11:47 |
cdent | sdague: there was no previous association with launchpad. The new launchpad is: https://bugs.launchpad.net/openstack-api-wg https://launchpad.net/~openstack-api-wg-drivers | 11:48 |
odyssey4me | yolanda if you have a moment, reviews of https://review.openstack.org/355434 & https://review.openstack.org/355491 would be appreciated | 11:48 |
*** sarob has joined #openstack-infra | 11:50 | |
dtantsur | sdague, thanks! | 11:50 |
sdague | cdent: I think it's the 'groups' field | 11:51 |
*** rodrigods has quit IRC | 11:51 | |
*** asettle has quit IRC | 11:51 | |
*** rodrigods has joined #openstack-infra | 11:51 | |
sdague | https://github.com/openstack-infra/project-config/blob/c5ed5d0c03c337c8834cb153de78459f4d802dda/gerrit/projects.yaml#L4220 | 11:51 |
sdague | anteaya, is that right? ^^^ | 11:51 |
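A guess at the shape of the projects.yaml change being discussed; the real entry lands in review 355885, and the project/group names below are assumptions based on the launchpad links cdent gave:

```
# gerrit/projects.yaml sketch: jeepyb uses the "groups" list to decide which
# Launchpad project(s) to update from Closes-Bug style commit footers.
- project: openstack/api-wg
  groups:
    - openstack-api-wg
```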
*** asettle has joined #openstack-infra | 11:52 | |
*** sshnaidm is now known as sshnaidm|lnch | 11:52 | |
sdague | are we really imbalanced on xenial nodes? | 11:53 |
*** baoli has joined #openstack-infra | 11:53 | |
*** baoli_ has joined #openstack-infra | 11:54 | |
*** sarob has quit IRC | 11:54 | |
*** tonytan4ever has joined #openstack-infra | 11:55 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 11:56 |
*** acabot has quit IRC | 11:57 | |
*** baoli has quit IRC | 11:58 | |
*** rbuzatu has joined #openstack-infra | 11:58 | |
*** tonytan4ever has quit IRC | 11:59 | |
openstackgerrit | Merged openstack-infra/project-config: Ensure we alway build old Ironic ramdisk https://review.openstack.org/354608 | 12:00 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 12:00 |
beagles | ianw: sorry I ran off on you... had to catch some Zzz's | 12:00 |
beagles | ianw, these jobs seem to be largely hanging while cloning repos... if not hanging, then at least slowing wwaaaayyyyyy down | 12:00 |
openstackgerrit | Jim Rollenhagen proposed openstack-infra/project-config: Ironic: multitenant job should not run on stable https://review.openstack.org/355880 | 12:01 |
openstackgerrit | Merged openstack-infra/project-config: Implement Swift pypy experimental check https://review.openstack.org/355491 | 12:02 |
openstackgerrit | Sam Betts proposed openstack-infra/project-config: Prevent Ironic multitenancy job running on old versions https://review.openstack.org/355881 | 12:02 |
jroll | sambetts: you're too slow :) | 12:02 |
sambetts | jroll: apparently so :-P | 12:02 |
*** dprince has joined #openstack-infra | 12:03 | |
*** ldnunes has joined #openstack-infra | 12:03 | |
*** markvoelker has joined #openstack-infra | 12:05 | |
*** sigmavirus|away is now known as sigmavirus | 12:05 | |
*** lucasagomes is now known as lucas-hungry | 12:06 | |
*** mriedem_away has quit IRC | 12:07 | |
*** markvoelker has quit IRC | 12:09 | |
*** kgiusti has joined #openstack-infra | 12:09 | |
openstackgerrit | Julia Kreger proposed openstack-infra/project-config: Rename bifrost integration test job https://review.openstack.org/355652 | 12:09 |
openstackgerrit | Chris Dent proposed openstack-infra/project-config: Set the launchpad name for api-wg https://review.openstack.org/355885 | 12:09 |
openstackgerrit | Matthew Bodkin proposed openstack-infra/storyboard: Fixing docs so it is easy to understand https://review.openstack.org/355886 | 12:10 |
*** acabot has joined #openstack-infra | 12:10 | |
*** psachin has quit IRC | 12:10 | |
azvyagintsev_h | Folks, could you please suggest how I should fix the templates for https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? if I remove the check/gate section the tests fail ;( | 12:11 |
*** vrovachev has joined #openstack-infra | 12:11 | |
vrovachev | Hello all, please take a look at https://review.openstack.org/#/c/355382/ | 12:12 |
*** rbuzatu has quit IRC | 12:13 | |
*** yaume has quit IRC | 12:13 | |
*** rbuzatu has joined #openstack-infra | 12:14 | |
*** narayrak has quit IRC | 12:15 | |
*** locust has quit IRC | 12:17 | |
*** weshay has joined #openstack-infra | 12:17 | |
*** javeriak has quit IRC | 12:21 | |
*** matbu|lunch is now known as matbu | 12:21 | |
openstackgerrit | Dmitry Tantsur proposed openstack-infra/project-config: Make the grenade job voting on ironic-inspector https://review.openstack.org/355894 | 12:22 |
*** javeriak has joined #openstack-infra | 12:24 | |
*** gordc has joined #openstack-infra | 12:24 | |
beagles | ianw: is there something particular with these jobs (osic cloud jobs?) that could slow down stuff like git clone operations | 12:25 |
EmilienM | to give a bit more precision than beagles, we are seeing a persistent problem when cloning repositories with zuul-cloner, when running ubuntu nodes on osic-cloud1 | 12:25 |
beagles | yeah, what he said | 12:25 |
beagles | :) | 12:25 |
EmilienM | are we aware about any downtime on osic ? | 12:25 |
*** markvoelker has joined #openstack-infra | 12:26 | |
*** pradk has joined #openstack-infra | 12:26 | |
*** burgerk has joined #openstack-infra | 12:27 | |
*** mdrabe has joined #openstack-infra | 12:29 | |
*** gouthamr has joined #openstack-infra | 12:32 | |
*** apetrich has quit IRC | 12:32 | |
pleia2 | mtreinish: so this time it really was getting stuck on the fact that the new/ directory existed and immediately failing, I manually removed it and let it run at :20, other.html now exists: http://status.openstack.org/elastic-recheck/data/other.html | 12:33 |
*** yamamoto has quit IRC | 12:33 | |
pleia2 | mtreinish: should probably sort out the naming though :) http://status.openstack.org/elastic-recheck/ links to other.html and that exists, but it's inconsistent with our others.html template | 12:34 |
odyssey4me | EmilienM afaik it's running well... but you may need to know that it's running IPv6 and that its DNS resolver is configured to use 127.0.0.1 to point at a locally running unbound service... so your tests may appear to have dns resolution errors | 12:34 |
odyssey4me | EmilienM also, if your tests can't use IPv6 for external connectivity, then that may also be an issue | 12:34 |
EmilienM | odyssey4me: zuul-cloner takes forever | 12:35 |
EmilienM | odyssey4me: 355612 | 12:35 |
EmilienM | err | 12:35 |
EmilienM | http://logs.openstack.org/35/355235/1/check/gate-puppet-openstacklib-puppet-beaker-rspec-ubuntu-trusty/730b053/console.html#_2016-08-16_10_38_40_231581 | 12:35 |
odyssey4me | EmilienM yeah, that could relate to DNS resolution... we've seen slowness in odd places too | 12:36 |
openstackgerrit | Benny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest https://review.openstack.org/355846 | 12:36 |
odyssey4me | basically OSIC is configured to use unbound, RAX has something in place which overwrites the nodepool config and uses the RAX DNS... | 12:37 |
*** apetrich has joined #openstack-infra | 12:37 | |
odyssey4me | so we're seeing inconsistencies and odd slowness here and there too | 12:37 |
*** ccamacho|lunch is now known as ccamacho | 12:38 | |
*** yamamoto has joined #openstack-infra | 12:39 | |
*** sandanar__ has quit IRC | 12:39 | |
openstackgerrit | Brad P. Crochet proposed openstack-infra/tripleo-ci: Use tripleo-build-images for CI https://review.openstack.org/336312 | 12:41 |
mordred | EmilienM, odyssey4me: for the slow cloning ... is there any chance that there is some weird routing which is causing routing between OSIC and RAX to go strange? the git mirrors are all in RAX | 12:42 |
odyssey4me | mordred hmm, good question - not one I have the answer to, but that would explain how slow the cloning is | 12:43 |
*** yamamoto has quit IRC | 12:43 | |
odyssey4me | I'm surprised that we don't have regional git endpoints too. :) | 12:44 |
odyssey4me | perhaps cloudnull can provide some insight when he comes online | 12:44 |
mordred | yah - well, so far it hasn't been an issue :) | 12:44 |
sdague | mordred: I did some poking around on my devstack | 12:44 |
mordred | yeah? | 12:44 |
odyssey4me | mordred ah of course, the local git cache is useful to speed things up | 12:44 |
sdague | the pip cache used by devstack is actually the one owned by the root user | 12:45 |
sdague | because sudo | 12:45 |
sdague | so https://review.openstack.org/#/c/355854/ might be all that we need | 12:45 |
*** raildo has joined #openstack-infra | 12:45 | |
sdague | I don't know how one actually validates a thing like that before it goes into production | 12:45 |
*** rlandy has joined #openstack-infra | 12:45 | |
* mordred looks | 12:46 | |
mordred | sdague: yesterday, I noticed in this change: http://logs.openstack.org/05/351905/7/check/check-osc-plugins/71038e2/console.html#_2016-08-15_17_51_18_401054 | 12:46 |
mordred | (which does happen to be on OSIC) | 12:47 |
mordred | that every remote update action took 4 seconds | 12:47 |
mordred | sdague: root owns a pip cache? | 12:48 |
sdague | sudo pip install foo | 12:48 |
dtantsur | folks, jroll, the check-osc-plugin seems broken for ironic: http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/. is it something known? | 12:48 |
*** jheroux has joined #openstack-infra | 12:48 | |
sdague | will put that content into ~/.cache/pip | 12:48 |
sdague | for root | 12:48 |
sdague | /root/.cache/pip | 12:49 |
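A minimal sketch of how to confirm which cache devstack is actually filling (the package name is illustrative; the paths are pip's defaults on these images):

    # devstack runs "sudo pip install ...", so the cache that matters is root's:
    sudo -H pip install oslo.config
    sudo find /root/.cache/pip -type f | head   # populated by the sudo install
    find ~/.cache/pip -type f | head            # the unprivileged user's cache stays untouched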
kgiusti | folks: the oslo.messaging team is experiencing frequent failures of the same 3 tempest tests: http://status.openstack.org/openstack-health/#/job/gate-oslo.messaging-src-dsvm-full-zmq | 12:49 |
sdague | mordred: 4 seconds for a git operation does not seem completely out of bounds | 12:49 |
*** devkulkarni has joined #openstack-infra | 12:49 | |
kgiusti | similarish to bug: https://bugs.launchpad.net/openstack-gate/+bug/1449136 | 12:49 |
openstack | Launchpad bug 1449136 in OpenStack-Gate "OpenStack pypi mirrors disconnecting connections" [Undecided,New] | 12:49 |
*** matt-borland has joined #openstack-infra | 12:49 | |
kgiusti | same failures, but not against pypi host but against localhost http server | 12:50 |
kgiusti | known issue? | 12:50 |
*** bswartz has joined #openstack-infra | 12:50 | |
*** ociuhandu has joined #openstack-infra | 12:50 | |
*** itisha has quit IRC | 12:50 | |
sdague | kgiusti: I think we're feeding it the icon on git.openstack.org for the http image registration | 12:51 |
sdague | so that really means git.openstack.org is dropping requests | 12:51 |
*** devkulkarni has quit IRC | 12:53 | |
*** devkulkarni has joined #openstack-infra | 12:54 | |
mordred | sdague: I think a consistent 4 seconds to check whether there are any new refs to pull in repos that should be no more than a day out of date is exceptionally long | 12:54 |
mordred | sdague: that said - I have verified that the root pip caching works - so neat | 12:54 |
kgiusti | sdague: are the three failing tests the only ones that query git.openstack.org? I ask because only those three tests consistently fail - all others have passed without incident. | 12:55 |
*** asettle has quit IRC | 12:55 | |
mordred | sdague: I don't think your patch is going to work, because install is going to want to build them, and we don't have the bindep depends installed at that point | 12:55 |
mordred | sdague: if we want to prime the cache, using pip download I think may be better? but now I need to check if that also does cache things ... | 12:56 |
mordred | yah. it does (just checked) | 12:57 |
*** ociuhandu has quit IRC | 12:57 | |
jroll | dtantsur: that's new to me | 12:57 |
rcarrillocruz | o/ | 12:58 |
rcarrillocruz | i'm around today (yesterday was bank holiday in Spain) | 12:58 |
*** asettle has joined #openstack-infra | 12:58 | |
sdague | mordred: pip download won't prime the cache | 12:58 |
mordred | I just tested that it will | 12:58 |
sdague | I got a wildly smaller cache with it locally | 12:58 |
*** vikrant has quit IRC | 12:59 | |
sdague | mordred: it will only try to build if the wheels aren't there, right? | 12:59 |
sdague | we're hitting the wheel mirror with this, right? | 12:59 |
mordred | mordred@camelot:~/src/openstack-infra/nodepool$ sudo -H pip install -d . paramz | 12:59 |
mordred | Collecting paramz | 12:59 |
rcarrillocruz | doh | 12:59 |
mordred | Using cached paramz-0.6.1.tar.gz | 12:59 |
rcarrillocruz | yolanda , mordred , pabelanger : http://paste.openstack.org/show/558377/ | 12:59 |
mordred | that was the second time I ran it, after deleting the tarball from the local dir | 12:59 |
*** julim has joined #openstack-infra | 12:59 | |
rcarrillocruz | glean cruft on writing interfaces file | 12:59 |
rcarrillocruz | but yeah, i can deploy servers with bifrost | 13:00 |
rcarrillocruz | i'll see what's up with glean | 13:00 |
Zara | hm, should gerrit search autocomplete for stories and tasks now? aiui we need config to enable gerrit-updating-storyboard per project, as per the commit message here: https://review.openstack.org/#/c/347486/ but are tasks and stories now indexed in gerrit search? | 13:01 |
mordred | sdague: http://paste.openstack.org/show/558378/ | 13:01 |
rcarrillocruz | huh | 13:01 |
rcarrillocruz | also, the interface is set to dhcp, but should not | 13:01 |
*** apetrich has quit IRC | 13:02 | |
mordred | Zara: I'm not sure they autocomplete - but https://review.openstack.org/#/q/bug:2000522 works | 13:02 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 13:02 |
mordred | Zara: so adding a bug:2000522 to the search finds the thing by story id | 13:03 |
*** xyang1 has joined #openstack-infra | 13:03 | |
*** _ari_ has joined #openstack-infra | 13:03 | |
sdague | mordred: can you rm -rf ~/.cache/pip and try that again? | 13:04 |
*** javeriak has quit IRC | 13:04 | |
*** woodster_ has joined #openstack-infra | 13:04 | |
*** kbaegis has joined #openstack-infra | 13:05 | |
Zara | mordred: oh, aha. I thought it needed 'story:2000522' but that was probably just me misinterpreting the expected behaviour. found the docs now and they do say 'bug:' and 'tr:' so whoops. | 13:06 |
*** javeriak has joined #openstack-infra | 13:06 | |
sdague | because when I use -d, my pip cache remains empty | 13:06 |
mordred | sdague: sure | 13:06 |
sdague | with pip 8.1.2 | 13:06 |
*** yamamoto has joined #openstack-infra | 13:07 | |
tosky | sdague: now that devstack switched to neutron by default, how to enable nova-network in gate jobs (for a poor old Sahara job that I'd like to kill sooner than later)? | 13:07 |
*** yamamoto has quit IRC | 13:07 | |
sdague | tosky: the gate doesn't really change, it's always had explicit service lists | 13:07 |
odyssey4me | yolanda if you have a moment, a review of https://review.openstack.org/355434 would be appreciated | 13:07 |
*** andymaier has joined #openstack-infra | 13:08 | |
mordred | sdague: yes. it works | 13:08 |
yolanda | odyssey4me, back from lunch, i'll take a look in a while | 13:08 |
*** ociuhandu has joined #openstack-infra | 13:08 | |
Zara | (yes, bug:$task_id will also find storyboard tasks, ace) | 13:09 |
*** sshnaidm|lnch is now known as sshnaidm | 13:09 | |
*** devkulkarni has quit IRC | 13:10 | |
*** lucas-hungry is now known as lucasagomes | 13:10 | |
mordred | sdague: http://paste.openstack.org/show/558383/ | 13:10 |
penguinolog | Hello! Could anybody help with https://review.openstack.org/#/c/355382/ - it's blocker for the parallel team | 13:10 |
*** edmondsw has joined #openstack-infra | 13:11 | |
*** lifeless has quit IRC | 13:11 | |
persia | Zara: Do we run any risk of collision between LP bug# and SB task#? There's a gap in SB stories to avoid LP bugs, but I don't think there is one for tasks. | 13:11 |
*** mriedem has joined #openstack-infra | 13:12 | |
*** Julien-zte has joined #openstack-infra | 13:12 | |
*** andymaier has quit IRC | 13:13 | |
mordred | sdague, odyssey4me: I tested git remote operations on an osic node and they all took less than a second as expected ( doing git remote update origin pointed at git.o.o) | 13:13 |
mordred | so it doesn't seem to be routing issues | 13:13 |
*** javeriak has quit IRC | 13:14 | |
Zara | persia: yes, I think so, though just when searching for them. so if two commits pop up when someone searches, it should be fairly quick to find the right one since I'd imagine they'd be about totally different things. | 13:14 |
tosky | sdague: I see, thanks | 13:15 |
*** nmagnezi has quit IRC | 13:15 | |
*** apetrich has joined #openstack-infra | 13:15 | |
persia | Zara: I was worried more about comments being posted to unrelated stories that might trigger email as a result of subscriptions. Maybe I lack context. | 13:15 |
sdague | mordred: pip --version? | 13:15 |
Zara | persia: ah, that's a separate thing. this is just for searching things in gerrit. the plugin should use storyboard-specific syntax in the commit message | 13:18 |
azvyagintsev_h | Folks, could you please suggest how i should fix the templates for https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? if i remove the check/gate section, the tests fail ;( | 13:18 |
persia | Ah, cool. I was missing context :) | 13:18 |
mordred | sdague: pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7) | 13:19 |
*** spzala_ has joined #openstack-infra | 13:19 | |
sdague | mordred: ok, well | 13:19 |
mordred | sdague: I'm not sure why it's not working for you - but I think it has the better chance of working, since we know install won't work | 13:20 |
*** ianychoi has quit IRC | 13:20 | |
Zara | (so 'closes-bug: $id' will close a lp bug, 'task: $id' will affect sb task status; both are searchable in gerrit with 'bug:$id'. so in practice I think the tricky bit will be that we'll probably see people using lp notation to try to change sb task status, but that's one for the future) | 13:20 |
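A hedged sketch pulling together the notations Zara describes (the bug and story numbers are reused from examples elsewhere in this log; the task id is hypothetical):

    Commit message footers:
        Closes-Bug: #1613749    (closes the Launchpad bug when the change merges)
        Task: 4567              (updates the StoryBoard task's status)
    Gerrit search, which matches both LP bug numbers and SB story/task ids:
        bug:1613749
        status:open bug:2000522 project:openstack-infra/storyboard-webclient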
*** zhurong has joined #openstack-infra | 13:21 | |
*** cdent has left #openstack-infra | 13:22 | |
sdague | mordred: this is what I get - http://paste.openstack.org/show/558386/ | 13:22 |
mordred | sdague: you need to do find /root/.cache/pip | 13:22 |
mordred | not /home/sdague | 13:23 |
sdague | I'm not running as root | 13:23 |
*** kushal has quit IRC | 13:23 | |
mordred | hrm. weird. try as root and see if you get my behavior? | 13:23 |
mordred | (since that's the important one for this) | 13:23 |
mordred | sdague: got it | 13:25 |
mordred | sdague: adding the index prevents the cache | 13:25 |
openstackgerrit | Oleksii Zamiatin proposed openstack-infra/project-config: Remove n-net related gates https://review.openstack.org/355919 | 13:25 |
sdague | mordred: gah, really? | 13:25 |
mordred | sdague: yup | 13:25 |
openstackgerrit | Merged openstack-infra/project-config: Implement LXD hypervisor experimental check https://review.openstack.org/355434 | 13:25 |
sdague | so that means this doesn't work at all because we're using alternative indexes? | 13:25 |
mordred | it will neither download to the cache nor use things in the cache | 13:25 |
mordred | yah | 13:25 |
mordred | at least, according to my test just now | 13:26 |
mordred | I haven't poked more extensively | 13:26 |
openstackgerrit | Merged openstack-infra/project-config: fuel-qa: stable-mu branches for maintenance and stable for upgrades https://review.openstack.org/355382 | 13:26 |
sdague | so... we know that's not entirely true during runs, because we definitely only download each package once | 13:27 |
mordred | weird | 13:27 |
mordred | well, local testing with your command line resulted in nothing being cached | 13:27 |
sdague | yeh, install with index still builds the cache | 13:28 |
sdague | it's just download that doesn't | 13:28 |
mordred | sigh | 13:28 |
mordred | that seems like a pip bug | 13:28 |
mordred | oh - so - this is going to run during image build | 13:28 |
*** lifeless has joined #openstack-infra | 13:28 | |
*** rbuzatu_ has joined #openstack-infra | 13:28 | |
mordred | which means it should be hitting pypi, not pip mirrors | 13:29 |
mordred | yeah? | 13:29 |
yolanda | rcarrillocruz, can it be a race? i ran glean several times on my environment and i get good results | 13:29 |
mordred | or do we set it to use the dfw mirror during image buids (/me can't remember) | 13:29 |
sdague | mordred: that actually will defeat the purpose of the patch if it does | 13:29 |
mordred | why? | 13:29 |
mordred | it's during image build - it'll download and cache the things using download. then, during devstack run, the cache will be populated and the intsall command will be using install so it should read the cache | 13:30 |
sdague | because if we hit pypi and download, then we'll get numpy as source | 13:30 |
sdague | which means we have to spend 4 minutes compiling it on the node | 13:30 |
mordred | oh. right. bother | 13:30 |
mordred | so - I guess we just have to get download honoring caches | 13:30 |
*** rlandy is now known as rlandy|mtg | 13:31 | |
mordred | dstufft: ^^ whence you awaken ... tl;dr pip download with -i option for an alternate index does not populate or consume cache. pip install with -i option does | 13:32 |
*** andymaier has joined #openstack-infra | 13:32 | |
*** ianychoi has joined #openstack-infra | 13:32 | |
mordred | that said - I do not believe we point at pip mirrors until the image boots | 13:32 |
*** rbuzatu has quit IRC | 13:33 | |
mordred | so we'd also want to explore setting a mirror location during image build | 13:33 |
fungi | mordred: does changing the mirror url after boot still pose a problem? | 13:34 |
fungi | if so, we're sort of stuck unless we want to build different images for every provider/region | 13:34 |
*** inc0 has joined #openstack-infra | 13:35 | |
mordred | fungi: no, I do not believe it does | 13:35 |
fungi | so if we, say, set it to the dfw pypi mirror before caching packages, then we can update it to a different mirror later and it'll still use the cache? | 13:36 |
*** esberglu has joined #openstack-infra | 13:36 | |
jroll | mordred: when you have a sec, this looks like a similar thing you were looking at yesterday, is it just a timeout or something else? http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/_zuul_ansible/ansible_log.txt . no errors in the console log http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/console.html | 13:37 |
sdague | fungi: ug, you might be right | 13:38 |
*** rbuzatu_ has quit IRC | 13:38 | |
mordred | jroll: yah - 2016-08-16 11:33:55,481 p=6961 u=zuul | fatal: [node]: FAILED! => {"async_result": {"ansible_job_id": "344516947230.6884", "changed": false, "finished": 0, "invocation": {"module_args": {"jid": "344516947230.6884", "mode": "status"}, "module_name": "async_status"}, "started": 1}, "changed": false, "failed": true, "msg": "async task produced unparseable results"} | 13:39 |
fungi | sdague: mordred: i mean, maybe we can "transform" the cache when we reset the mirror url, as an alternative. though that's getting into implementation details of pip's cache that probably aren't a guaranteed stable api | 13:39 |
mordred | jroll: that just ran a couple of hours ago, didn't it? | 13:39 |
jroll | mordred: looks like it, yeah | 13:40 |
mordred | fungi: I think it's caching by content hash, not by name | 13:40 |
*** nwkarsten has joined #openstack-infra | 13:40 | |
sdague | mordred: I'm not so sure | 13:40 |
fungi | oh, so if the content hash is consistent (which it would be across our mirrors unless there's an update) then we might be fine | 13:40 |
fungi | but if it mixes other data into that hash, like the url or something, then that gets tricky | 13:41 |
mordred | sdague: pip install with a -i doesn't cache for me with install either | 13:41 |
*** hichihara has joined #openstack-infra | 13:41 | |
fungi | huh. apparently crowbar is still under active development? just saw a cve request to the oss-security ml because they were setting a known default admin account password in it | 13:42 |
mordred | fungi: yah - it's the basis of rob's current company | 13:43 |
fungi | oic | 13:43 |
*** zhurong has quit IRC | 13:43 | |
*** markusry has joined #openstack-infra | 13:43 | |
sdague | anyway, I need to get back to release things. Once dstufft is up he can probably just tell us all our silliness instead of us guessing | 13:43 |
jroll | mordred: so "yah" meaning "yah that is similar" or "yah that is a timeout" or? :) looking for something actionable I can do here | 13:43 |
Shrews | mordred: that ansible error... looks like a genuine timeout | 13:43 |
mordred | sdague: http://paste.openstack.org/show/558391/ | 13:43 |
sdague | my quick git grepping in pip source isn't finding payload | 13:43 |
Shrews | mordred: TASK [zuul_runner with 1547 second timeout] | 13:43 |
jroll | oh wait, timestamps | 13:44 |
* jroll feels dumb | 13:44 | |
*** rbuzatu has joined #openstack-infra | 13:44 | |
fungi | Shrews: so the behavior i was seeing in various job logs yesterday were legitimate timeouts ending with an ansible json parse failure | 13:44 |
*** zhurong has joined #openstack-infra | 13:44 | |
*** ramishra has quit IRC | 13:44 | |
mordred | fungi: yah. that's what we fixed yesterday | 13:44 |
sdague | mordred: you delete the venv | 13:44 |
*** dizquierdo_afk is now known as dizquierdo | 13:44 | |
Shrews | fungi: yeah, i can't explain why a timeout causes that | 13:44 |
mordred | sdague: I do - then I re-make it | 13:44 |
sdague | ah, right | 13:44 |
*** ramishra has joined #openstack-infra | 13:44 | |
jroll | Shrews: so this is just pip being super slow, I guess | 13:44 |
fungi | as if one of the things ansible was failing to parse was the json coming from jobs that timed out, not that ansible was responsible for the timeout | 13:44 |
Shrews | jroll: probably? | 13:45 |
sdague | mordred: can you do that without the ^C? | 13:45 |
mordred | sdague: sure | 13:45 |
jroll | Shrews: all that job does, if you look at the console, is install a bunch of OSC plugins | 13:45 |
jroll | :) | 13:45 |
Shrews | fungi: yeah, i suspect nothing is written to the async file if the job doesn't finish, thus the unparseable | 13:45 |
fungi | empty != json | 13:45 |
fungi | indeed | 13:45 |
fungi | jroll: did that run in rax-ord? | 13:46 |
jroll | fungi: nope, osic http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/console.html | 13:46 |
mordred | Shrews, fungi: hrm. that would be annoying | 13:46 |
pabelanger | cloudnull: I updated ubuntu-xenial in osic-cloud1 and confirmed DNS is running on ipv6. Other images will be updated today | 13:46 |
fungi | jroll: oh, okay. we did just up the quota significantly there... lemme check a few things | 13:46 |
Shrews | mordred: we *might* be able to recognize the timeout in ansible and write empty json to solve that | 13:46 |
mordred | Shrews: there is an if/else case that I thought was related to timeout | 13:47 |
* Shrews looks | 13:47 | |
jroll | fungi: cool, I'm going to recheck that unless you think there's reason not to | 13:47 |
jroll | I guess we had two in a row, though | 13:47 |
fungi | unfortunately https://review.openstack.org/355580 hasn't merged yet, so we're going to have a relatively hard time figuring out if we're taxing that mirror | 13:48 |
mordred | Shrews: line 603 in lib/ansible/executor/task_executor.py | 13:48 |
jroll | both osic cloud | 13:48 |
Shrews | mordred: ah, it still depends on 'parsed' being there | 13:48 |
Shrews | which it won't be if it didn't actually finish | 13:48 |
fungi | jroll: might only be coincidence, but i'll get our mirror there into cacti in moments and see what else i can find in the meantime | 13:48 |
*** kushal has joined #openstack-infra | 13:49 | |
jroll | fungi: cool, thank you :) | 13:49 |
mordred | Shrews: why not though? the async_runner should be the thing writing the status to the file | 13:49 |
Shrews | mordred: apparently it isn't. 'parsed' is not in the output you just pasted in channel | 13:49 |
rcarrillocruz | so yeah | 13:52 |
rcarrillocruz | fungi: we are bifrosting | 13:52 |
rcarrillocruz | i just redeployed a server with paul via screen session now | 13:53 |
fungi | rcarrillocruz: rock on! that's awesome news | 13:53 |
*** tonytan4ever has joined #openstack-infra | 13:53 | |
*** permalac has joined #openstack-infra | 13:53 | |
mordred | \o/ | 13:54 |
rcarrillocruz | we needed several bifrost fixes | 13:54 |
rcarrillocruz | and i spotted a couple glean things | 13:54 |
rcarrillocruz | we'll go thru in a bit | 13:54 |
pabelanger | mordred: fungi: clarkb: Would love some feedback on: https://review.openstack.org/#/c/355695/ to fix some launch node failures with ubuntu-xenial | 13:54 |
* rcarrillocruz goes for coffee now | 13:54 | |
*** burgerk has quit IRC | 13:54 | |
*** vikrant has joined #openstack-infra | 13:55 | |
*** infra-red has quit IRC | 13:55 | |
azvyagintsev_h | fungi craige Folks, could you please suggest how i should fix the templates for https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? if i remove the check/gate section, the tests fail ;( | 13:55 |
*** yamahata has joined #openstack-infra | 13:55 | |
mordred | pabelanger: wow | 13:55 |
*** infra-red has joined #openstack-infra | 13:56 | |
mordred | pabelanger: just out of idle curiosity - (patch looks fine) - I wonder if we could get the ssh daemon to not start until unbound is started | 13:56 |
pabelanger | mordred: yes, I thought of that too. I haven't looked into that yet | 13:56 |
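One way that could look on a systemd (xenial) node, as a hedged sketch of mordred's idea rather than anything actually deployed here: a drop-in that orders ssh after unbound.

    # illustrative drop-in; file path and contents are an assumption, not deployed config
    sudo mkdir -p /etc/systemd/system/ssh.service.d
    printf '[Unit]\nAfter=unbound.service\nWants=unbound.service\n' | \
        sudo tee /etc/systemd/system/ssh.service.d/wait-for-unbound.conf
    sudo systemctl daemon-reload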
fungi | pabelanger: speaking of xenial, snmpd won't start on firehose01... i suspect we need to tweak our config for it | 13:57 |
pabelanger | mordred: as for the problem: http://paste.openstack.org/show/557771/ I _think_ unbound is waiting for random to initialize in the kernel, before doing things with its root.key | 13:57 |
*** vikrant has quit IRC | 13:57 | |
*** markusry has quit IRC | 13:57 | |
*** markusry has joined #openstack-infra | 13:57 | |
pabelanger | fungi: sounds like we need to get an etherpad to track our xenial issues | 13:57 |
*** thorongil has quit IRC | 13:58 | |
fungi | rcarrillocruz: looks (from all the sudospam i've received) that baremetal00 is having trouble resolving its own hostname. may need /etc/hosts fixed? | 13:58 |
*** rbrndt has joined #openstack-infra | 13:59 | |
fungi | rcarrillocruz: or maybe you already fixed that... last entry i have was 07:47:53 utc | 13:59 |
*** pgadiya has quit IRC | 14:00 | |
*** jimbaker has quit IRC | 14:00 | |
*** nmagnezi has joined #openstack-infra | 14:01 | |
*** markusry has quit IRC | 14:02 | |
*** andymaier has quit IRC | 14:02 | |
*** bin_ has joined #openstack-infra | 14:02 | |
*** jistr is now known as jistr|debug | 14:03 | |
*** rlandy|mtg is now known as rlandy | 14:04 | |
*** jimbaker has joined #openstack-infra | 14:04 | |
*** jimbaker has quit IRC | 14:04 | |
*** jimbaker has joined #openstack-infra | 14:04 | |
rcarrillocruz | fungi: https://review.openstack.org/#/c/355778/ | 14:05 |
*** zhurong has quit IRC | 14:06 | |
rcarrillocruz | fungi: essentially, the install playbook on bifrost hardcodes /etc/hostname on 127.0.0.1 | 14:06 |
*** zhurong has joined #openstack-infra | 14:06 | |
rcarrillocruz | which breaks fqdn resolution | 14:06 |
rcarrillocruz | and breaks puppet apply runs | 14:07 |
rcarrillocruz | other thing i've noticed is that puppet sets /etc/resolv.conf to nameserver 127.0.0.1, not sure if that's some unbound thing on the node declaration | 14:07 |
rcarrillocruz | pabelanger: ^ | 14:07 |
*** yamamoto has joined #openstack-infra | 14:07 | |
*** hichihara has quit IRC | 14:08 | |
fungi | rcarrillocruz: yeah, that's because we run unbound on all our servers to provide a local resolver cache | 14:08 |
rcarrillocruz | that's a problem, since baremetal00 runs dnsmasq itself | 14:09 |
fungi | oh, so port conflict i guess | 14:09 |
rcarrillocruz | possibly, although i see unbound set to false on the node declaration | 14:09 |
rcarrillocruz | O_O | 14:09 |
rcarrillocruz | i'll wait for paul, i remember he changed the unbound setting on this node for a reason | 14:10 |
fungi | rcarrillocruz: likely we need to specify a remote resolved if we're not installing unbound | 14:10 |
fungi | er, remote resolver | 14:10 |
rcarrillocruz | i set by hand /etc/resolv.conf to 8.8.8.8 :/ | 14:10 |
*** armax has joined #openstack-infra | 14:10 | |
fungi | pabelanger: digging into the snmpd issue, the commands in our initscript seem fine, but apparently it's not used because there's a systemd unit for it which takes precedence | 14:10 |
* fungi blames lennart | 14:10 | |
rcarrillocruz | but yeah, conflicts with puppet, since that changes it back to 127.0.0.1 which breaks install playbook that pulls things from IntarWeb | 14:11 |
rcarrillocruz | it can't resolve | 14:11 |
mtreinish | pleia2: I thought I moved everything to use others.html now | 14:11 |
pleia2 | mtreinish: shrug | 14:11 |
*** _ari_ has quit IRC | 14:11 | |
* rcarrillocruz is procrastinating learning of systemd | 14:11 | |
mtreinish | pleia2: ugh, no the template for integrated gate and the output file look like it's other.html still | 14:13 |
*** yamamoto has quit IRC | 14:13 | |
pleia2 | mtreinish: well, at least it generates now, just some final tweaks to tidy this up then | 14:14 |
*** tqtran has joined #openstack-infra | 14:14 | |
jeblair | rcarrillocruz: why is baremetal00 running dnsmasq as a nameserver? | 14:15 |
rcarrillocruz | jeblair: it's what it uses to pxe boot servers | 14:15 |
*** pgadiya has joined #openstack-infra | 14:16 | |
rcarrillocruz | bifrost that is | 14:16 |
*** xarses has quit IRC | 14:16 | |
jeblair | rcarrillocruz: why is a name server needed for that? | 14:16 |
rcarrillocruz | it's a bifrost dependency | 14:16 |
rcarrillocruz | it's also used as a pxe/tftp server, not just a nameserver | 14:17 |
*** jaosorior has quit IRC | 14:17 | |
jeblair | yeah, the parts that are not a dns resolver make sense. i'm just wondering why it's also configured as a dns resolver | 14:17 |
rcarrillocruz | if you want historical reasons why it was decided to be used , TheJulia may be best to answer that | 14:17 |
jeblair | i'm not sure i'm stating the question in a way that is conveying my meaning | 14:17 |
fungi | is it possible to use it without having it serve as a dns resolver? | 14:17 |
TheJulia | it is, the configuration just needs to be disabled for dns resolution | 14:18 |
*** asettle has quit IRC | 14:18 | |
TheJulia | that is in what is put in place for dnsmasq's main config file | 14:18 |
fungi | that way it wouldn't conflict with the local resolver cache service we want to run on the same machine | 14:18 |
*** tqtran has quit IRC | 14:18 | |
*** asettle has joined #openstack-infra | 14:18 | |
jeblair | dnsmasq is a server which supports lots of protocols. do we need to use its dns resolver as opposed to just the other (pxe/tftp) bits? | 14:18 |
rcarrillocruz | is there a flag available to disable it ? | 14:19 |
rcarrillocruz | TheJulia: ^ | 14:19 |
rcarrillocruz | the dns part | 14:19 |
TheJulia | rcarrillocruz: in bifrost, not presently | 14:19 |
rcarrillocruz | what i thought | 14:19 |
rcarrillocruz | i mean, it wouldn't be complex to push | 14:19 |
*** edtubill has joined #openstack-infra | 14:19 | |
rcarrillocruz | s/push/patch | 14:19 |
TheJulia | no, it should be extremely simple | 14:19 |
rcarrillocruz | TheJulia: do nodes need any dns resolving from the bifrost controller during the IPA loading etc | 14:20 |
rcarrillocruz | ? | 14:20 |
*** jcoufal has joined #openstack-infra | 14:20 | |
rcarrillocruz | if DNS was never needed, i'm curious why it wasn't just disabled from the beginning | 14:20 |
*** asselin has joined #openstack-infra | 14:21 | |
TheJulia | rcarrillocruz: if someone decides to use names in the config handed to ironic in terms of URLs, then dns resolution is required | 14:21 |
TheJulia | but if only IPs are used, then it is not required | 14:21 |
rcarrillocruz | ah | 14:21 |
*** gongysh has joined #openstack-infra | 14:21 | |
rcarrillocruz | hmmm | 14:21 |
jeblair | we *do* have a dns resolver :) | 14:21 |
TheJulia | Well, there you go, the correct dns resolver just needs to be offered out for dhcp requests then | 14:22 |
*** devkulkarni has joined #openstack-infra | 14:22 | |
jeblair | {% if disable_dnsmasq_dns %} | 14:22 |
jeblair | bifrost may already have the option :) | 14:22 |
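A minimal sketch of what that toggle presumably renders into dnsmasq's config (these are standard dnsmasq options; the DHCP range and resolver address are placeholders, not the real deployment values):

    # dnsmasq.conf fragment, illustrative only
    port=0                                      # turn off dnsmasq's DNS server entirely
    enable-tftp                                 # keep the PXE/TFTP pieces bifrost needs
    dhcp-range=192.0.2.10,192.0.2.100           # deployment-specific
    dhcp-option=option:dns-server,192.0.2.1     # hand clients a real resolver, per TheJulia's note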
openstackgerrit | Matthew Treinish proposed openstack-infra/elastic-recheck: Make everything plural https://review.openstack.org/355967 | 14:23 |
rcarrillocruz | doing ironic node-show blah, i only see IPs there | 14:23 |
rcarrillocruz | so i think we should be good | 14:23 |
*** asettle has quit IRC | 14:23 | |
*** edtubill has quit IRC | 14:23 | |
*** _ari_ has joined #openstack-infra | 14:23 | |
*** hongbin has joined #openstack-infra | 14:25 | |
fungi | anybody happen to know how to get systemd to tell you the location on disk of the unit it's using for a particular service? | 14:28 |
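For reference, a couple of standard systemctl invocations that answer this (not from the channel discussion):

    systemctl show -p FragmentPath snmpd.service   # prints the path of the loaded unit file
    systemctl cat snmpd.service                    # prints the unit file(s), path included
    systemctl status snmpd.service                 # the "Loaded:" line also shows the path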
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/puppet-infracloud: Disable DNS resolver on Bifrost dnsmasq server https://review.openstack.org/355973 | 14:29 |
rcarrillocruz | jeblair: ^ | 14:30 |
rcarrillocruz | fungi: ^ | 14:30 |
*** tosky has quit IRC | 14:30 | |
*** edtubill has joined #openstack-infra | 14:31 | |
*** zz_dimtruck is now known as dimtruck | 14:31 | |
openstackgerrit | Ivan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects https://review.openstack.org/347047 | 14:32 |
*** tosky has joined #openstack-infra | 14:33 | |
azvyagintsev_h | fungi will you have some time to help me with https://review.openstack.org/#/c/353861 ? i cannot figure out what i'm missing there.. :( | 14:34 |
fungi | just a heads up, we're discussing some job failures in #openstack-qa that look like they're a result of remote http(s) calls out of bluebox are failing with some consistency | 14:34 |
*** rajinir has joined #openstack-infra | 14:34 | |
openstackgerrit | Ivan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects https://review.openstack.org/347047 | 14:34 |
fungi | azvyagintsev_h: no clue what you're talking about, or why you're asking me directly. can you elaborate? | 14:34 |
*** mdrabe has quit IRC | 14:34 | |
fungi | azvyagintsev_h: is it related to something i was already working on? | 14:34 |
*** mdrabe has joined #openstack-infra | 14:35 | |
rajinir | fungi: The cell patch was reverted https://review.openstack.org/#/c/355599/1 | 14:35 |
*** jed56 has quit IRC | 14:35 | |
*** asettle has joined #openstack-infra | 14:35 | |
fungi | rajinir: the patch which was causing the failure you were seeing, right? | 14:35 |
rajinir | fungi: yes | 14:35 |
fungi | i didn't follow that very closely, just saw it was also severely impacting nova cells tests in the upstream ci as well | 14:36 |
*** pgadiya has quit IRC | 14:36 | |
*** burgerk has joined #openstack-infra | 14:37 | |
azvyagintsev_h | fungi i guess not :) should i directly ask/wait for Craig or someone else? (i'm asking you just because you are the guru) | 14:37 |
*** amitgandhinz has quit IRC | 14:38 | |
*** links has quit IRC | 14:39 | |
*** amitgandhinz has joined #openstack-infra | 14:39 | |
*** kushal has quit IRC | 14:40 | |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config: Add new project "os-failures" https://review.openstack.org/355819 | 14:40 |
betherly | hi there! getting ready to release ironic-ui. i have a patch for openstack-releases but do i also need to tag the release? | 14:40 |
*** kushal has joined #openstack-infra | 14:41 | |
betherly | the route for releasing eslint was quite different from what i did with the ironic-ui last time so got a bit confused re what i need to do this time round | 14:41 |
fungi | betherly: you probably want to ask in #openstack-release but i believe for projects under release management you submit a patch to the releases repo and then a release manager runs a script after it merges and pushes a tag for you | 14:41 |
*** xarses has joined #openstack-infra | 14:42 | |
betherly | ah sorry fungi!! | 14:42 |
betherly | that would make sense! thank you so much :) | 14:42 |
fungi | betherly: you're welcome | 14:42 |
*** florianf has quit IRC | 14:44 | |
jeblair | fungi: 355580 was killed by the problem it attempts to debug | 14:44 |
Shrews | fungi: so, hopefully this change will make those ansible timeouts actually be reported as timeouts and not unparseable: https://github.com/ansible/ansible/pull/17104 | 14:44 |
openstackgerrit | Matthew Treinish proposed openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 14:44 |
openstack | bug 1613749 in OpenStack-Gate "Git timeouts from bluebox" [Undecided,New] https://launchpad.net/bugs/1613749 | 14:44 |
beagles | rcarrillocruz, is https://review.openstack.org/#/c/355973/ supposed to help with some of the jobs failing because of stuff like slow git repo cloning, etc? | 14:45 |
beagles | just seeking clarification as the issue is atm near and dear to my heart :) | 14:45 |
jeblair | beagles: no; but the ansible fix from yesterday was not sufficient | 14:45 |
jeblair | beagles: https://github.com/ansible/ansible/pull/17104 | 14:46 |
rcarrillocruz | beagles: no, that change is unrelated | 14:46 |
beagles | rcarrillocruz, thanks | 14:46 |
beagles | also jeblair thanks | 14:46 |
jeblair | beagles: so the next iteration of that fix is in progress. it hasn't landed in ansible yet, but when it does, we'll redeploy | 14:47 |
jeblair | beagles: however, it's looking like most of the instances of this error are actually timeouts -- did you say you were thinking that was the case with your job? | 14:47 |
rcarrillocruz | that change is about infracloud, a pool of servers we'll manage to run a cloud for CI | 14:47 |
Shrews | jeblair: to be fair, that PR will just (hopefully) get timeouts reported as timeouts. it doesn't actually fix the timeout | 14:47 |
jeblair | Shrews: right :) | 14:47 |
beagles | jeblair, yeah, I was just going to say.. the parsing thing is just how the info was represented - it's the timeouts I'm wondering about | 14:48 |
beagles | not the timeouts themselves actually, but *why* those particular things are taking so long :) | 14:49 |
beagles | jeblair, ultimately, I want the ansible fix to be unnecessary :) | 14:49 |
fungi | azvyagintsev_h: explaining in channel what you're trying to figure out and what potential issues you've eliminated already is usually a faster way to get help, rather than just pasting a link. i've skimmed the change and it seems you're proposing creation of a new project/repo but are having trouble with the layout job. the console log from it indicates you're trying to configure zuul to run jobs you | 14:49 |
fungi | haven't defined (e.g., gate-murano-pkg-check-python27-ubuntu-trusty). i see a typo which i've marked inline on your change that would account for it | 14:49 |
*** dprince has quit IRC | 14:49 | |
*** pt_15 has joined #openstack-infra | 14:50 | |
*** jistr|debug is now known as jistr | 14:50 | |
jeblair | mordred: your 4-minute git thing was on osic? | 14:51 |
sdague | jeblair: 4 seconds, right? | 14:52 |
jeblair | sdague: those are different than minutes? :) yeah, 4-something. i guess i'm asking what that value is too. i may be confused because i'm staring at a log that took 5 minutes for each remote update. | 14:53 |
jeblair | on osic. | 14:54 |
sdague | jeblair: I thought he said seconds | 14:54 |
sdague | 4 minutes would be an issue, I agree | 14:54 |
jeblair | sdague: a full clone of git://git.openstack.org/openstack/python-aodhclient took 1 second, so even 4 seconds is :( | 14:55 |
*** tongli has joined #openstack-infra | 14:56 | |
*** vinaypotluri has joined #openstack-infra | 14:56 | |
*** Julien-zte has quit IRC | 14:57 | |
* jeblair fixes he.net tunnel | 14:57 | |
*** permalac has quit IRC | 14:58 | |
*** florianf has joined #openstack-infra | 14:58 | |
*** jimbaker has quit IRC | 14:59 | |
jeblair | telnet 2001:4800:1ae1:18:f816:3eff:fe13:4660 1885 | 14:59 |
jeblair | ga | 14:59 |
jeblair | that's even the wrong port | 14:59 |
mordred | jeblair: yes. 4 minutes | 14:59 |
jeblair | mordred: not seconds? :) | 14:59 |
mordred | :) | 14:59 |
*** yaume has joined #openstack-infra | 15:00 | |
mordred | nope. 4 minutes :) | 15:00 |
jeblair | mordred: osic? | 15:00 |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:00 |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master https://review.openstack.org/293631 | 15:00 |
mordred | jeblair: yup | 15:01 |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:01 |
*** amitgandhinz has quit IRC | 15:02 | |
*** amitgandhinz has joined #openstack-infra | 15:02 | |
wznoinsk | hi infra | 15:03 |
wznoinsk | et al | 15:03 |
*** jimbaker has joined #openstack-infra | 15:03 | |
*** jimbaker has quit IRC | 15:03 | |
*** jimbaker has joined #openstack-infra | 15:03 | |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:03 |
wznoinsk | did anyone see a situation where static-network-up is emitted before all the interfaces get their IPs and their /run/network/ifup.* files get created? | 15:04 |
jeblair | mordred, sdague: oh, huh, it's not every job on osic. i just watched one breeze right through a git clone. | 15:04 |
*** ifarkas is now known as ifarkas_afk | 15:04 | |
wznoinsk | that's ubuntu 14.04, troubleshooting cloud-init init kicking off too early (before the network is actually up) | 15:04 |
*** dizquierdo is now known as dizquierdo_afk | 15:05 | |
mordred | jeblair: yah - I jumped on an osic node earlier and tried some manual updates and they worked as expected | 15:05 |
mordred | jeblair: I have not yet been able to find the pattern | 15:05 |
jeblair | grr. | 15:06 |
pabelanger | mordred: jeblair: fungi: So, here is the boot process on ubuntu-xenial visualized: http://imgh.us/filename_3.svg check out unbound | 15:06 |
pabelanger | 1min 155ms to start | 15:06 |
pabelanger | I don't know why yet | 15:06 |
*** martinkopec has quit IRC | 15:06 | |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 15:06 |
jeblair | cloudnull: if we collect instance ids from jobs which have very slow interactions with our git farm, can you correlate those and see if there is a host/network pattern on the cloud side? | 15:07 |
pabelanger | it does look like it is waiting for random | 15:07 |
fungi | speaking of osic, it looks like we also have devstack jobs failing there because glance isn't responding on 127.0.0.1:9292 when told to listen on 0.0.0.0:9292 (baffling) | 15:07 |
fungi | and i notice traceroute6 out from job nodes there to git.o.o coming back blank | 15:07 |
jeblair | pabelanger: is that first or second boot? | 15:08 |
*** mhickey has joined #openstack-infra | 15:08 | |
*** itisha has joined #openstack-infra | 15:09 | |
jeblair | fungi: i just did 'traceroute6 git.openstack.org' from a node (which ran a job where git worked fine) and got data | 15:09 |
jeblair | fungi: so maybe on the nodes where git takes 4+ minutes for each operation, traceroute6 git.o.o also fails? | 15:09 |
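A rough way to reproduce the comparison from a held node, using the same repo and hostnames discussed here:

    time git clone git://git.openstack.org/openstack/python-aodhclient   # ~1s on a healthy node
    cd python-aodhclient
    time git remote update origin      # anywhere from ~4s to several minutes on the suspect nodes
    traceroute6 git.openstack.org      # reportedly came back blank on some of those nodes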
*** hockeynut has joined #openstack-infra | 15:10 | |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:10 |
pabelanger | jeblair: in this case, 2nd boot (I disabled the puppet service on boot). I can redo on first boot if needed | 15:10 |
fungi | jeblair: perhaps. here's one log we were looking at with the localhost glance weirdness http://logs.openstack.org/55/352455/3/gate/gate-tempest-dsvm-cells/7881266/console.html#_2016-08-16_12_27_19_830897 | 15:10 |
jeblair | pabelanger: nah, that's okay. 2nd is more interesting to me. | 15:10 |
*** vhosakot has joined #openstack-infra | 15:13 | |
*** devkulkarni has quit IRC | 15:13 | |
*** jcoufal has quit IRC | 15:13 | |
*** devkulkarni has joined #openstack-infra | 15:13 | |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config: Disable puppet service on boot https://review.openstack.org/356004 | 15:13 |
pabelanger | disabled puppet service on boot^ | 15:14 |
*** hockeynu_ has joined #openstack-infra | 15:15 | |
pabelanger | jeblair: I've updated our configure_mirror.sh (355695) to better handle the delayed dns on ubuntu-xenial, since we have a large number of launch failures because of it. This could also explain why the ubuntu-trusty ready node count is much higher than ubuntu-xenial's during the day | 15:15 |
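Not necessarily what 355695 does, but one common shape for making a script like configure_mirror.sh tolerate a slow-starting resolver is a bounded wait before anything needs DNS; a sketch:

    # wait up to ~60 seconds for name resolution to start working
    for i in $(seq 1 12); do
        getent hosts git.openstack.org >/dev/null && break
        sleep 5
    done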
*** devkulkarni has quit IRC | 15:16 | |
*** devkulkarni has joined #openstack-infra | 15:16 | |
jeblair | pabelanger: does that always happen, or just sometimes? | 15:16 |
jeblair | pabelanger: i wonder if it's the same problem as git. | 15:17 |
*** Goneri has quit IRC | 15:17 | |
pabelanger | jeblair: Yes, I've also seen it on multiple clouds, osic-cloud1 and bluebox, in sampling ubuntu-xenial syslogs | 15:17 |
*** dprince has joined #openstack-infra | 15:18 | |
jeblair | pabelanger: oh, so not just osic | 15:18 |
pabelanger | let me check others quickly | 15:18 |
pabelanger | jeblair: right | 15:18 |
*** jcoufal has joined #openstack-infra | 15:18 | |
*** hockeynut has quit IRC | 15:18 | |
clarkb | pabelanger: and that is with the pre-ipv6-resolver config, right? (that change hasn't gotten onto our images yet) | 15:18 |
*** Goneri has joined #openstack-infra | 15:18 | |
jeblair | cloudnull: is there any commonality between instances a4b575fe-b043-4775-9d8e-286c04f03a9f and 9b3bf68f-b08b-4851-a22a-d2f6a5247982 ? | 15:19 |
pabelanger | clarkb: no, I build and uploaded a ubuntu-xenial image to osic last night, same issue | 15:19 |
jeblair | cloudnull, pabelanger: i don't think we actually needed the ipv6 resolver change -- osic has a 6 to 4 gateway | 15:19 |
jeblair | clarkb: ^ | 15:19 |
*** markusry has joined #openstack-infra | 15:19 | |
jeblair | shouldn't hurt | 15:20 |
clarkb | jeblair: correct we don't need it for stuff to function properly. Just trying to make sure this isn't somehow a regression related to that change | 15:20 |
*** markusry has quit IRC | 15:20 | |
fungi | yeah, we don't _need_ it for osic, but we do need it in case we end up with a provider with no ipv4 routing at all for our job nodes in the future | 15:20 |
fungi | so not urgent, but not entirely useless | 15:20 |
*** mtanino has joined #openstack-infra | 15:20 | |
mtreinish | jeblair: fwiw, we're tracking 2 failures on the tempest glance tests. One on osic and the other on bluebox | 15:20 |
mtreinish | it's the same tests, but they manifest a little differently | 15:21 |
pabelanger | clarkb: jeblair: that is from internap http://imgh.us/filename_4.svg | 15:22 |
pabelanger | I have a change out to disable puppet on boot | 15:23 |
pabelanger | 356004 | 15:23 |
*** derekh has quit IRC | 15:24 | |
clarkb | pabelanger: how long does it take if you stop the unbound service then start it? | 15:24 |
*** oanson has quit IRC | 15:24 | |
clarkb | is this only present on boot or any time the service starts? | 15:24 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/puppet-infracloud: Switch to infra-cloud-bridge element https://review.openstack.org/356009 | 15:24 |
pabelanger | clarkb: after it has started properly, restarts are instant | 15:24 |
fungi | i have a feeling it's generating a local key for dnssec at first start | 15:24 |
*** ccamacho is now known as ccamacho|out | 15:24 | |
pabelanger | right, I think that too | 15:25 |
fungi | though why it takes that long to do so is worth asking | 15:25 |
openstackgerrit | Merged openstack-infra/system-config: Add mirror.regionone.osic-cloud1.o.o to cacti https://review.openstack.org/355580 | 15:25 |
fungi | haveged starts well before udevd according to that visualization | 15:25 |
rcarrillocruz | some entropy delay ^ | 15:25 |
pabelanger | clarkb: http://paste.openstack.org/show/557771/ is always the order when unbound starts processing | 15:25 |
clarkb | "With cache restoration turned on, my system reboot would take forever, because of unbound hanging/processing a maybe corrupt cache-file." is from a random pfsense forum post | 15:25 |
pabelanger | rcarrillocruz: likely ^ | 15:25 |
fungi | er, i mean haveged starts well before unbound | 15:26 |
fungi | oh, that's worth checking | 15:26 |
pabelanger | clarkb: So.... | 15:26 |
pabelanger | there is some chroot logic in unbound too | 15:26 |
clarkb | pabelanger: apparently we can turn off cache restoration which should be fine for our single use nodes | 15:26 |
clarkb | (if that is indeed related) | 15:26 |
pabelanger | okay, I can try that | 15:27 |
*** tongli has quit IRC | 15:27 | |
pabelanger | any docs on how to do that? | 15:27 |
*** tongli has joined #openstack-infra | 15:27 | |
clarkb | not seeing it in the unbound.conf man page | 15:28 |
* clarkb keeps digging | 15:28 | |
*** esikachev has quit IRC | 15:29 | |
fungi | yeah, i've been through the manpages for unbound, unbound.conf and unbound-control so far, to no avail | 15:30 |
pabelanger | I think it is a manual process | 15:30 |
jeblair | fungi, pabelanger: i thought this graph was second boot? | 15:31 |
clarkb | pabelanger: ya looks like its part of unbound-control so would probably be part of the unit files if being done on ubuntu | 15:31 |
pabelanger | jeblair: first svg was 2nd boot, second svg was first boot | 15:32 |
jeblair | pabelanger: either way, it's a long startup both times, ya? | 15:32 |
*** matrohon has quit IRC | 15:32 | |
jeblair | pabelanger: do you have a node where this is slow? | 15:33 |
jeblair | i just restarted unbound on a xenial node and it was fast | 15:33 |
pabelanger | jeblair: yes, 2001:4800:1ae1:18:f816:3eff:fe8e:9a3e | 15:33 |
pabelanger | I manually launched that is osic-cloud1 | 15:34 |
pabelanger | jeblair: feel free to reboot if needed | 15:34 |
jeblair | pabelanger: cool, thanks | 15:34 |
clarkb | I guess the other thing to check is logs? is unbound logging to journald here? | 15:35 |
jeblair | unbound restarts instantly there. i'll reboot | 15:35 |
pabelanger | jeblair: right on first boot check the status of the service | 15:36 |
pabelanger | I have not stopped and started right after a boot | 15:36 |
pabelanger | clarkb: we'd have to enable debugging, which I can | 15:36 |
jeblair | pabelanger: yeah, there's no delay when doing a stop/start | 15:36 |
jeblair | there is some logging to syslog | 15:36 |
clarkb | Aug 16 15:07:56 ubuntu unbound-anchor: fail: the anchor is NOT ok and could not be fixed | 15:36 |
pabelanger | jeblair: on first boot? | 15:37 |
*** tosky_ has joined #openstack-infra | 15:37 | |
clarkb | though on that random host I am looking at syslog for it seems to have started in about 2 seconds | 15:37 |
*** zhurong has quit IRC | 15:37 | |
clarkb | Aug 16 15:07:55 ubuntu systemd[1]: Starting unbound.service... to Aug 16 15:07:56 ubuntu systemd[1]: Started unbound.service. | 15:37 |
*** _nadya_ has quit IRC | 15:38 | |
pabelanger | clarkb: which host is that? | 15:38 |
clarkb | ubuntu-xenial-rax-ord-3521279 | 15:38 |
clarkb | just a random one I grabbed out of the nodepool list | 15:38 |
jeblair | strace -p 1852 | 15:38 |
jeblair | strace: Process 1852 attached | 15:38 |
jeblair | getrandom( | 15:38 |
clarkb | so this isn't consistent | 15:38 |
*** edtubill has quit IRC | 15:38 | |
jeblair | so yes, waiting for getrandom | 15:38 |
fungi | cat /proc/sys/kernel/random/entropy_avail | 15:39 |
pabelanger | jeblair: neat | 15:39 |
jeblair | 2374 | 15:39 |
jeblair | i'll reboot again and repeat | 15:39 |
fungi | also does ps suggest haveged is running? | 15:39 |
clarkb | there is a haveged on my host that did it quickly | 15:39 |
fungi | haveged should be keeping the entropy pool nice and full | 15:39 |
*** tosky has quit IRC | 15:40 | |
pabelanger | fungi: I see it running | 15:40 |
*** amotoki has joined #openstack-infra | 15:40 | |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config: Add new project "os-failures" https://review.openstack.org/355819 | 15:41 |
pabelanger | root 700 0.2 0.0 12204 6584 ? Ss 15:39 0:00 /usr/sbin/haveged --Foreground --verbose=1 -w 1024 | 15:41 |
dstufft | mordred: sdague I am awake now, what's up? | 15:41 |
jeblair | hrm. haveged was running while unbound-anchor was waiting. the pool had 2496 | 15:41 |
*** andreykurilin has quit IRC | 15:41 | |
jeblair | now that unbound-anchor completed the pool is at 2369 | 15:41 |
fungi | jeblair: yeah, that's a ton of available entropy | 15:41 |
mordred | dstufft: pip download with an alternate index does not store into or retreive from cache. pip download without an alternate index does. pip install writes to and reads from cache in both cases | 15:42 |
mordred | dstufft: should I file a bug for that? | 15:42 |
*** rbuzatu has quit IRC | 15:42 | |
*** amotoki has quit IRC | 15:43 | |
*** amotoki has joined #openstack-infra | 15:43 | |
dstufft | mordred: is this alternate index available publically? can I repro it on my desktop? | 15:43 |
mordred | dstufft: yup! | 15:43 |
clarkb | jeblair: fungi pabelanger I wonder if /usr/share/dns/root.key's key is just old and stale? the unbound-anchor manpage warns against this | 15:43 |
mordred | dstufft: pip install --trusted-host mirror.gra1.ovh.openstack.org -i http://mirror.gra1.ovh.openstack.org/pypi/simple paramz | 15:44 |
mordred | dstufft: is what we've been using | 15:44 |
mordred | dstufft: (obviously in the various different combinations) | 15:44 |
clarkb | it then does an update because that file is not valid | 15:44 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard: Describe Storyboard in more detail https://review.openstack.org/356021 | 15:44 |
*** rcernin has quit IRC | 15:46 | |
clarkb | it does fetch things from the internet in that case | 15:46 |
*** rbuzatu has joined #openstack-infra | 15:46 | |
pabelanger | clarkb: but once the file is updated once, shouldn't the next reboot be good? | 15:47 |
rcarrillocruz | TheJulia, cinerama : are we good to land https://review.openstack.org/#/c/353990/ and https://review.openstack.org/#/c/354615/ ? | 15:48 |
*** e0ne has quit IRC | 15:49 | |
clarkb | pabelanger: maybe? Probably not if it copies the bad one over again | 15:50 |
jeblair | clarkb, pabelanger: when i strace unbound-anchor at boot, it's sitting at getrandom, and stays there until the kernel says: Aug 16 15:49:39 ubuntu kernel: [ 62.801497] random: nonblocking pool is initialized | 15:50 |
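The diagnosis above, consolidated into the commands used (the PID is whatever pgrep reports on the node):

    pgrep -a unbound-anchor                        # find the helper holding up unbound
    sudo strace -p <pid>                           # shows it parked in getrandom(...)
    cat /proc/sys/kernel/random/entropy_avail      # plenty of entropy thanks to haveged...
    journalctl -k | grep 'random:'                 # ...but getrandom() only returns once the kernel
                                                   # logs "random: nonblocking pool is initialized"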
dstufft | mordred: Hmm, well pip install --trusted-host doesn't populate the cache at all for me here, and I think that's by design (trying to remember back to when we implemented it). The comments in the code suggest we purposely only cache valid HTTPS to prevent semi persistent poisoning of the cache and requiring manual eviction, also http://mirror.gra1.ovh.openstack.org/pypi/simple/paramz/ doesn't have cache control headers so even if you | 15:51 |
dstufft | had valid HTTPS it wouldn't do a no-network cache hit (it does have an ETag header, so it'll do a conditional GET though) (same is true for the files themselves) | 15:51 |
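A quick way to see the behaviour dstufft describes, using the same mirror and package mordred tested with (cache paths are pip 8.x defaults):

    rm -rf ~/.cache/pip
    pip install --trusted-host mirror.gra1.ovh.openstack.org \
        -i http://mirror.gra1.ovh.openstack.org/pypi/simple paramz
    find ~/.cache/pip/http -type f | wc -l    # 0: plain-HTTP responses are not cached
    pip uninstall -y paramz
    pip install paramz                        # default HTTPS index (pypi)
    find ~/.cache/pip/http -type f | wc -l    # non-zero: valid HTTPS responses are cached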
pabelanger | jeblair: I'd like to try something quickly, can we set ROOT_TRUST_ANCHOR_UPDATE=false in /etc/default/unbound and restart? | 15:52 |
pabelanger | jeblair: then run strace | 15:52 |
pabelanger | # Whether to automatically update the root trust anchor file. | 15:52 |
pabelanger | ROOT_TRUST_ANCHOR_UPDATE=true | 15:52 |
jeblair | pabelanger: go for it | 15:52 |
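The experiment pabelanger proposes, spelled out (the boot charts above look like systemd-analyze output, but that is an assumption):

    sudo sed -i 's/^ROOT_TRUST_ANCHOR_UPDATE=.*/ROOT_TRUST_ANCHOR_UPDATE=false/' /etc/default/unbound
    sudo reboot
    # after the node comes back up:
    systemd-analyze blame | grep unbound     # unbound.service drops from ~1min to ~138ms
    systemd-analyze plot > boot.svg          # regenerate the visualization, if desired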
mordred | dstufft: ah. you're right - I must have done one of the combinations wrong :( | 15:52 |
mordred | dstufft: if we did a download without the alternate index just pointing to normal pypi | 15:53 |
mordred | dstufft: and then did a subsequent install with the trusted index ... should we expect it to read from the cache? | 15:53 |
dstufft | mordred: No, cache keys are full URLs | 15:53 |
mordred | gotcha | 15:53 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/subunit2sql: Fix typo in test_attr_list handling https://review.openstack.org/355385 | 15:53 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/subunit2sql: Remove the test_attr_prefix before injecting https://review.openstack.org/355393 | 15:53 |
jeblair | $ reboot | 15:53 |
jeblair | Failed to connect to bus: No such file or directory | 15:53 |
*** ganesan has joined #openstack-infra | 15:53 | |
dstufft | mordred: so my recommendation would be A) Throw letsencrypt on the mirror B) setup cache-control | 15:54 |
jeblair | love it | 15:54 |
clarkb | jeblair: need more sudo | 15:54 |
jeblair | clarkb: indeed | 15:54 |
openstackgerrit | Beth Elwell proposed openstack-infra/project-config: Add release notes jobs for ironic-ui https://review.openstack.org/356029 | 15:54 |
mordred | dstufft: nod. cool. thanks! | 15:54 |
fungi | clarkb: sure, but that error message is beyond vague | 15:54 |
mordred | fungi: ^^ see convo with dstufft | 15:54 |
clarkb | fungi: ya its systemctl failing to talk to systemd due to perms | 15:54 |
*** hieulq_ has joined #openstack-infra | 15:55 | |
fungi | mordred: but if cache keys are the full urls, then we're still not going to end up being able to do much to prepopulate the cache | 15:55 |
mordred | fungi: agree | 15:55 |
fungi | since we need a different mirror url in each provider | 15:55 |
mordred | yup | 15:55 |
dstufft | mordred: fungi if this is a systemd using machine I have a couple of systemd unit files and a cron job you can use to keep LE up to date | 15:55 |
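dstufft's unit files aren't pasted into the channel, but the usual shape of unattended Let's Encrypt renewal is a periodic renew plus a webserver reload. A minimal cron sketch follows; the client name, schedule, and reload command are illustrative assumptions, not what dstufft actually uses:

```bash
# /etc/cron.d/letsencrypt-renew (sketch)
# certbot only renews certificates that are close to expiry, so running twice
# a day is cheap; reload apache afterwards so it picks up the new cert.
0 */12 * * * root certbot renew --quiet --post-hook "service apache2 reload"
```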
pabelanger | jeblair: clarkb: okay, that booted a little faster, jeblair I missed the strace, do you mind trying? | 15:55 |
dstufft | ah yea | 15:56 |
dstufft | that's harder | 15:56 |
mordred | yah. I think that's the real issue | 15:56 |
fungi | i suppose i can buy certs for the mirrors... i'm hesitant to have letsencrypt breaking our mirrors at random when it tries to renew certs | 15:56 |
mordred | fungi: I don;'t think it'll get us anywhere | 15:56 |
dstufft | it's a proper HTTP cache, so it treats different URLs as distinct | 15:56 |
fungi | dstufft: is there a good way to transform the cache? i suppose you're using a one-way hash so we can't reverse it to update the urls? | 15:57 |
mordred | actually - how is this working on normal devstack runs then? | 15:57 |
mordred | sdague said earlier that we do see only one download of a given thing in our devstack jobs | 15:57 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 15:57 |
mordred | but if install is not supposed to cache when we have trusted-host set | 15:57 |
mordred | those two things are potentially at odds | 15:57 |
clarkb | its wheel caching in devstack iirc | 15:58 |
sdague | clarkb: it's just pip | 15:58 |
dstufft | mordred: it's possible that wheel caching didn't get the same treatment | 15:58 |
mordred | got it. because those are locally built wheels | 15:58 |
mordred | or whatnot | 15:58 |
sdague | mordred: except they aren't | 15:58 |
pabelanger | clarkb: jeblair: fungi: Oh, ya. Way faster now: http://imgh.us/filename_5.svg That is with ROOT_TRUST_ANCHOR_UPDATE=false | 15:58 |
sdague | we're mostly downloading wheels | 15:58 |
pabelanger | unbound.service (138ms) | 15:59 |
mordred | so wheel cache potentially not caching the same as tarballs is the thing saving us there | 15:59 |
*** matthewbodkin has quit IRC | 16:00 | |
sdague | clarkb / fungi - I have suspicions that some of our odd fails in the last day are related to this - https://review.openstack.org/#/c/356010/ | 16:00 |
dstufft | fungi: it is a one way hash, I'm still kinda asleep (woo waking up at 11am), but off the top of my head it might be reasonable to implement some sort of aliasing thing. a la "treat domain x, y, z as domain a" | 16:00 |
sdague | which increased keystone debug logs by 2 orders of magnitude | 16:00 |
dstufft | when it comes to caching | 16:00 |
sdague | that's the revert | 16:00 |
sdague | any chance we could pop it into the top of gate? | 16:00 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 16:00 |
*** yaume has quit IRC | 16:00 | |
*** Sukhdev has joined #openstack-infra | 16:01 | |
sdague | I've definitely seen a bunch of odd keystone token lookup fails since that merged | 16:01 |
dstufft | we should probably make the wheel cache and the http cache consistent though | 16:01 |
dstufft | it's weird that it's not | 16:01 |
*** xarses has quit IRC | 16:01 | |
sdague | plus, until it's reverted, keystone logs are about ~1G uncompressed | 16:01 |
*** xarses has joined #openstack-infra | 16:01 | |
dstufft | either skip the lack of caching on http or make wheel cache not cache on http | 16:01 |
fungi | sdague: ouch | 16:01 |
pabelanger | jeblair: clarkb: maybe not. still takes 1min for host git.openstack.org to resolve | 16:01 |
jeblair | pabelanger: can you put that host back so i can continue to debug? | 16:01 |
sdague | fungi: it was an attempt to narrow some issues in the revoke code, I think the full extent of the fallout wasn't anticipated | 16:02 |
pabelanger | jeblair: yes, just rebooted back to original settings | 16:02 |
mordred | dstufft: I agree on making them consistent | 16:03 |
jeblair | pabelanger, clarkb: looking at another host which is not slow to boot, i see: | 16:03 |
jeblair | Aug 16 15:55:19 ubuntu-xenial-osic-cloud1-3521020 kernel: [ 3.906606] random: nonblocking pool is initialized | 16:03 |
*** jcoufal_ has joined #openstack-infra | 16:03 | |
jeblair | note that's at 3.9 seconds from boot, as opposed to 62 seconds on pabelanger's host | 16:03 |
fungi | sdague: openstack/keystone 356010,1 is at the top of the integrated gate change queue now | 16:03 |
fungi | stevemar: dstanek: ^ | 16:04 |
*** edtubill has joined #openstack-infra | 16:04 | |
stevemar | fungi: thank you | 16:04 |
stevemar | sdague is padding his stats with reverts again :) | 16:05 |
*** Sukhdev has quit IRC | 16:05 | |
zaro | morning | 16:05 |
jeblair | pabelanger, clarkb: putting everything together so far -- it's deciding to fetch a new anchor file, and is using openssl for that which needs some random which is taking 60+ seconds | 16:05 |
clarkb | jeblair: that sounds correct to me | 16:05 |
*** xarses has quit IRC | 16:05 | |
mordred | that also sounds correct to me | 16:05 |
mordred | based on reading | 16:05 |
*** xarses has joined #openstack-infra | 16:06 | |
jeblair | pabelanger: i can't log into 2001:4800:1ae1:18:f816:3eff:fe8e:9a3e | 16:06 |
*** jcoufal has quit IRC | 16:06 | |
jeblair | pabelanger: nm | 16:06 |
*** aaltman has joined #openstack-infra | 16:07 | |
aaltman | Hey guys, I had a quick question about nodepool if anyone has a moment | 16:07 |
*** xarses has quit IRC | 16:07 | |
*** xarses has joined #openstack-infra | 16:07 | |
pabelanger | I still don't understand why it gets a new root.key on reboot | 16:07 |
mordred | aaltman: just shoot - we'll respond in time | 16:07 |
*** amotoki has quit IRC | 16:07 | |
pabelanger | I would expect that to persist | 16:07 |
clarkb | pabelanger: it's copying the bad one; if it does that unconditionally, the fixed one will be overwritten. would be easy to check that in syslog | 16:08 |
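Checking that theory is just a grep on an affected node; the exact message text isn't known here, so the pattern below is deliberately loose:

```bash
# Look for evidence of the anchor file being copied or updated during boot.
sudo grep -iE 'root\.key|unbound-anchor|trust anchor' /var/log/syslog | tail -n 20
```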
*** infra-red has quit IRC | 16:09 | |
*** sdake has quit IRC | 16:09 | |
*** xyang1 has quit IRC | 16:09 | |
aaltman | okay cool: so when I boot a vm w/ nodepool and have it configured w/ Jenkins - what is the expectation for those two to connect? Who is performing the registration? I thought it happened over Gearman, but does the jenkins ssh key and username need to be enabled on the vm or can it use something like cloud-user | 16:09 |
*** hockeynu_ has quit IRC | 16:10 | |
*** sshnaidm is now known as sshnaidm|afk | 16:11 | |
*** matbu is now known as matbu|afk | 16:11 | |
mgagne | aaltman: looks to be done via Jenkins API: https://github.com/openstack-infra/nodepool/blob/master/nodepool/myjenkins.py#L132-L133 | 16:12 |
pabelanger | clarkb: Yup, I see that on first boot. root.key copied, follow up reboots, key has content | 16:12 |
mordred | aaltman: the expectation is that once nodepool spins up a node it'll have account/public-keys on it such that jenkins can connect to it - we bake those in as part of our base-image build process | 16:12 |
mordred | and then yes, as mgagne says, nodepool uses the Jenkins API to attach the slave to jenkins | 16:12 |
aaltman | mgagne: okay great. I think I can replicate that and see what's going on. May be an SSL issue w/ our jenkins since it's self signed for dev. | 16:13 |
*** asettle has quit IRC | 16:13 | |
sdague | stevemar: hey, it counts as commit in keystone, so I get to vote for ptl again :) | 16:14 |
stevemar | sdague: haha, uh oh ... :) | 16:14 |
jeblair | aaltman: do the nodes show up in jenkins at all? if not, the nodepool log may have information as to why | 16:14 |
dstanek | sdague: now i see your game :-) | 16:14 |
aaltman | mordred: okay. So that may be missing as well, we are generating a key, uploading to openstack on container entry, and nodepool can access, but Jenkins shouldn't be able to w/ that model | 16:14 |
aaltman | jetblair: they don't | 16:15 |
aaltman | Jetblair: we checked the logs thoroughly and don't see anything suspicious | 16:15 |
*** dizquierdo_afk is now known as dizquierdo | 16:15 | |
aaltman | jetblair: there's auth exec related to root/fedora/ubuntu logins until it hits cloud-user, which works fine and finishes out the setup | 16:15 |
*** dtantsur is now known as dtantsur|afk | 16:18 | |
*** xyang1 has joined #openstack-infra | 16:18 | |
jeblair | aaltman: what you describe sounds like the snapshot image build process, where nodepool boots a node from a base image, customizes it, then takes a snapshot of it. the actual test nodes are built from the snapshot. | 16:18 |
jeblair | aaltman: nodepool and jenkins both need to have the private ssh key for the same account which should be installed on the snapshot. nodepool uses it to log in immediately after a node boots to make sure that it worked, then it attaches it to jenkins | 16:19 |
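One way to sanity-check the setup jeblair describes is to attempt the same login Jenkins would make, using whatever account and key were baked into the snapshot; the username, key path, and node address below are placeholders, not values from aaltman's deployment:

```bash
# If this manual login fails, Jenkins will not be able to attach the node either.
ssh -i ~/.ssh/jenkins_id_rsa -o StrictHostKeyChecking=no jenkins@NODE_IP true \
  && echo "jenkins account login ok"
```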
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config: Make gate-tempest-dsvm-multinode-live-migration gating for nova https://review.openstack.org/356043 | 16:20 |
aaltman | jetblair: Okay, so the boot process seems* to go fine, it's the handoff to Jenkins, which I suspect is an SSL cert issue that I currently do not see in the log and then in addition matching the keys | 16:20 |
aaltman | That should be enough information to go off of | 16:21 |
*** adrian_otto has joined #openstack-infra | 16:21 | |
aaltman | I'll give those two things a try. Thanks for the help! | 16:21 |
fungi | aaltman: worth noting, back when we still used jenkins we had self-signed certs on our jenkins masters | 16:21 |
fungi | i don't recall if we had to do anything special to "trust" those, or were simply relying on older python 2.7 not actually validating server certs | 16:22 |
aaltman | fungi: hmmm that's interesting | 16:22 |
*** markusry has joined #openstack-infra | 16:22 | |
kgiusti | fungi: just fyi oslo.messaging is hitting the exact same tempest failures as the gate-tempest-dsvm-cells you mentioned | 16:22 |
pabelanger | Aug 16 16:07:16 ubuntu kernel: [ 15.415094] random: nonblocking pool is initialized | 16:22 |
*** pilgrimstack has quit IRC | 16:23 | |
kgiusti | fungi: but we never see the issue running the same test on the centos box FWIW | 16:23 |
*** Sukhdev has joined #openstack-infra | 16:23 | |
*** yamahata has quit IRC | 16:23 | |
fungi | kgiusti: the ones where it fails to reach glance on 127.0.0.1:9292? | 16:23 |
pabelanger | jeblair: that was 2 reboots ago, did you make a change on the node? | 16:23 |
pabelanger | ^ | 16:23 |
*** piet_ has joined #openstack-infra | 16:23 | |
pabelanger | only time random has started below 60 seconds | 16:23 |
kgiusti | fungi: http://logs.openstack.org/90/349290/3/check/gate-oslo.messaging-src-dsvm-full-zmq/dd1de25/console.html#_2016-08-16_09_59_04_230462 | 16:24 |
fungi | kgiusti: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9292): Max retries exceeded with url: /v1/images (Caused by ReadTimeoutError("HTTPConnectionPool(host='127.0.0.1', port=9292): Read timed out. (read timeout=60)",)) | 16:24 |
fungi | so, yep | 16:24 |
jeblair | pabelanger: no | 16:25 |
kgiusti | fungi: yay it's not just me! :) | 16:25 |
fungi | kgiusti: and in osic, so this pattern seems consistent | 16:25 |
cloudnull | afternoons. sorry have been mostly AFK so far today. | 16:26 |
kgiusti | fungi: agreed. | 16:26 |
* cloudnull reading back | 16:26 | |
fungi | cloudnull: have fun, you have several nick highlights in here ;) | 16:26 |
*** savihou has quit IRC | 16:28 | |
cloudnull | mordred jeblair if we can get a list of instance id's I can go and track them down and see if there are specific issues with a given host. | 16:28 |
openstackgerrit | Matthew Treinish proposed openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 16:28 |
openstack | bug 1613749 in OpenStack-Gate "Git timeouts from bluebox" [Undecided,New] https://launchpad.net/bugs/1613749 | 16:28 |
mtreinish | fungi: ^^^ there is the bug and e-r query we're using to track it | 16:29 |
cloudnull | pabelanger: do we think that the DNS resolver issues are what is causing the slowdown folks have been mentioning? | 16:30 |
*** jpich has quit IRC | 16:30 | |
cloudnull | mordred: whats with the 4 min to resolve git.openstack.org? is that something on the OSIC side that is causing that slowdown or is that a known routing issue? | 16:31 |
*** gongysh has quit IRC | 16:31 | |
*** baoli_ has quit IRC | 16:31 | |
pabelanger | cloudnull: we've had ipv6 dns on ubuntu-xenial since last night | 16:32 |
cloudnull | still not happy ? | 16:32 |
pabelanger | cloudnull: however, I haven't followed the git issue much this morning | 16:32 |
cloudnull | ok | 16:32 |
jeblair | cloudnull: i believe we're working on 2 simultaneous issues, only one of which is osic-specific | 16:33 |
cloudnull | did the unbound start issues get resolved? | 16:33 |
jeblair | (the other is xenial specific) | 16:33 |
cloudnull | jeblair: which one is the osic specific issue? -- sorry likely missed the message in scroll back | 16:33 |
jeblair | cloudnull: the 'git' issue is that it takes 4 minutes to perform git operations from osic to git.openstack.org | 16:33 |
jeblair | cloudnull: and that's the one where i sent you two instance ids to see if there is any correlation | 16:33 |
cloudnull | well the message is likely there, i just missed reading it | 16:34 |
cloudnull | :) | 16:34 |
* cloudnull looking into those instances now | 16:34 | |
greghaynes | jeblair: Random depends-on question - in the case of there being multiple changesets which match a change-id in a depends-on (in two different projects) does zuul depend on both? | 16:34 |
jeblair | cloudnull: the other issue is that unbound sometimes takes a while to start, but i don't think that's xenial related | 16:34 |
jeblair | greghaynes: yes | 16:34 |
greghaynes | good deal :) | 16:34 |
jeblair | greghaynes: whew! :) | 16:34 |
cloudnull | I saw the patch from pabelanger last night regarding that issue giving the process start and g.o.o resolv a wait. | 16:35 |
cloudnull | does that fix the unbound problem ? | 16:35 |
jeblair | cloudnull: i don't know about that. pabelanger ? | 16:37 |
pabelanger | jeblair: cloudnull: 355695 will just make our configure_mirror.sh script more robust, it doesn't address the actual unbound delay issue | 16:38 |
cloudnull | yes that one, https://review.openstack.org/#/c/355695/ -- do we think thats an entropy issue? | 16:39 |
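The contents of 355695 aren't shown here, but "more robust" in this context generally means waiting for the local resolver to come up instead of failing on the first lookup. A sketch of that kind of retry loop (the timeout and hostname are arbitrary choices for illustration):

```bash
# Wait up to ~60 seconds for the local unbound resolver to answer before
# letting configure_mirror.sh proceed (or fail loudly).
for i in $(seq 1 60); do
    host git.openstack.org >/dev/null 2>&1 && break
    sleep 1
done
```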
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 16:39 |
openstack | bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] https://launchpad.net/bugs/1613749 | 16:39 |
jeblair | grr. all of the google results about "nonblocking pool is initialized" are related to the fact that it's a late kernel message, so it's what people see when their systems are borked and hang | 16:39 |
jeblair | i can't actually find what prints it | 16:39 |
* fungi wishes linux would just switch to a continuously-seeded high-quality nonblocking prng as its /dev/random backend, like all the *bsds have done for years | 16:40 | |
*** florianf has quit IRC | 16:40 | |
*** hockeynut has joined #openstack-infra | 16:40 | |
persia | Isn't /dev/urandom very close to that? | 16:40 |
jeblair | fungi: i'm still really confused since haveged is running and the pool has entropy. | 16:40 |
fungi | jeblair: agreed | 16:41 |
fungi | no clue why it thinks it needs to wait still. unless aslr is grabbing priority over the available entropy pool soon after boot while other stuff is being loaded into memory? | 16:41 |
mgagne | pabelanger: I'm considering contributing to grafyaml. I found that Grafana 3.x supports more features but also changes the syntax of some options. How should grafyaml be updated so it doesn't break the world of 2.x? | 16:42 |
fungi | persia: yeah, except they continue to claim /dev/urandom is not secure for things like key generation. consensus among cryptographers is that you don't really "use up" entropy you accumulate, and so the linux entropy pool design is a bit of a fiction | 16:42 |
fungi | you should be able to reuse the same entropy once you have it, as long as it's presented through an appropriately turbulent prng algorithm | 16:43 |
persia | Claiming /dev/urandom isn't secure is just FUD. It's usually at least as good as using the HWRNG on a TPM module, or similar. | 16:44 |
*** berendt has quit IRC | 16:44 | |
*** tosky_ has quit IRC | 16:44 | |
fungi | agreed | 16:44 |
*** rbuzatu has quit IRC | 16:45 | |
persia | Well, if you need to generate a one-time pad vs. an adversary with unlimited resources, /dev/random *might* be better, if you have good sources of true entropy, but in that situation, you probably shouldn't be using an operating system you didn't hand-code from scratch... | 16:45 |
*** florianf has joined #openstack-infra | 16:45 | |
fungi | there's a bit of stockholm syndrome going on with linux's /dev/random though. other operating systems have moved past that thinking | 16:45 |
cloudnull | jeblair: regarding those two instances, nothing stands out between the two nodes, they landed on different compute nodes w/in different cabinets and both compute nodes have other vms running on them which are functional. | 16:46 |
fungi | jeblair: cloudnull: we do have a second (suspected) osic-specific issue, which could be related but also could be distinct: specific tests timing out trying to connect to a service listening on 127.0.0.1. we're only seeing this failure manifest in osic (so far anyway) | 16:46 |
jeblair | cloudnull: gr. | 16:46 |
*** piet_ has quit IRC | 16:46 | |
cloudnull | fungi: hum... | 16:47 |
*** kcobb has quit IRC | 16:47 | |
*** kcobb has joined #openstack-infra | 16:48 | |
fungi | example failure is http://logs.openstack.org/55/352455/3/gate/gate-tempest-dsvm-cells/7881266/console.html#_2016-08-16_13_38_04_769628 | 16:48 |
cloudnull | fungi: anything strange or did something change in the hosts file making it not resolve? | 16:48 |
fungi | cloudnull: not entirely sure yet, though our network diagnostics at the start of the log also show traceroute6 timing out trying to get to git.o.o | 16:49 |
fungi | it resolves via dns correctly, but sees no icmp responses from any hop | 16:49 |
jeblair | fungi, cloudnull: the two instances of slow git operations also have traceroute6 git.o.o timeouts | 16:49 |
kgiusti | fungi: cloudnull: fyi: https://bugs.launchpad.net/openstack-gate/+bug/1613749 | 16:51 |
openstack | Launchpad bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] | 16:51 |
fungi | kgiusti: i think that's separate | 16:52 |
fungi | oh, maybe it's not | 16:52 |
fungi | the bluebox one might want to be separated out though since the symptoms are distinct | 16:53 |
*** sambetts is now known as sambetts|afk | 16:53 | |
*** jerryz has joined #openstack-infra | 16:54 | |
kgiusti | mtreinish: ^^^ | 16:54 |
fungi | but thinking through that test, _if_ glance thinks it's serving the image from a remote location on git.o.o, then that could account for the timeout for the test's api calls to 127.0.0.1:9292 | 16:54 |
*** infra-red has joined #openstack-infra | 16:54 | |
cloudnull | kgiusti fungi jeblair: We have an SSD-specific AZ; if we think that the speed of writes is what's causing that issue, we could switch to using that AZ to see if more iops fixes it? | 16:54 |
*** nwkarsten has quit IRC | 16:54 | |
*** bin_ has quit IRC | 16:54 | |
mtreinish | fungi: the working theory right now is it actually might not be network related. we're waiting on sdague's revert to see if the load being generated by keystone logging constantly was causing these issues | 16:55 |
mtreinish | because there are some keystone token errors in the glance logs before things start getting weird | 16:55 |
fungi | mtreinish: still strange we would only see it manifest that way in osic | 16:55 |
mordred | cloudnull: I think sorting out the network issue first is more likely to be a win | 16:55 |
cloudnull | ++ | 16:55 |
cloudnull | fungi: maybe we're seeing it in the osic more due to it now having more tests run within the cloud ? | 16:56 |
mtreinish | fungi: well if it's load related then the hardware and/or cloud config comes into play more | 16:56 |
*** javeriak has joined #openstack-infra | 16:57 | |
fungi | cloudnull: well, our osic quota is still a minority of our overall aggregate quota, so we should be seeing it in other providers besides just osic. so far i haven't found any though | 16:58 |
fungi | mtreinish: agreed | 16:58 |
*** nwkarste_ has joined #openstack-infra | 16:58 | |
*** infra-red has quit IRC | 16:58 | |
*** infra-red has joined #openstack-infra | 16:58 | |
openstackgerrit | Merged openstack-infra/elastic-recheck: Make everything plural https://review.openstack.org/355967 | 16:58 |
*** edtubill has quit IRC | 17:00 | |
*** lucasagomes is now known as lucas-dinner | 17:02 | |
*** javeriak_ has joined #openstack-infra | 17:02 | |
*** javeriak has quit IRC | 17:02 | |
krotscheck | Any infra-core around to add a +A to https://review.openstack.org/#/c/346130/ ? I already have 2 +2's. pabelanger, in particular, as you can verify that the bindep changes have landed. | 17:02 |
*** nwkarste_ has quit IRC | 17:03 | |
*** hockeynut has quit IRC | 17:03 | |
krotscheck | Also, I'm trying to get our JS DSVM job landed... https://review.openstack.org/#/c/348056/8 | 17:03 |
*** xarses has quit IRC | 17:04 | |
*** javeriak has joined #openstack-infra | 17:04 | |
fungi | cloudnull: jeblair: picking jobs running at random in osic, `traceroute6 git.openstack.org` seems to be broken in all of them at the start of jobs. so maybe we have an early race with something in the network there if it's working later on? | 17:05 |
fungi | also i just found one i can't connect to the console for | 17:05 |
*** yamahata has joined #openstack-infra | 17:05 | |
pabelanger | fungi: jeblair: sudo apt-get install rng-tools | 17:06 |
pabelanger | fungi: jeblair: I do not know why yet, but that is making random start faster | 17:06 |
fungi | `nc 2001:4800:1ae1:18:f816:3eff:fe3b:53ef 19885` is just dead for me. should be running gate-tempest-dsvm-neutron-full-ubuntu-xenial | 17:06 |
*** devkulkarni has quit IRC | 17:07 | |
pabelanger | Aug 16 17:06:38 ubuntu kernel: [ 24.471992] random: nonblocking pool is initialized | 17:07 |
pabelanger | lowest was: Aug 16 17:04:20 ubuntu kernel: [ 7.680334] random: nonblocking pool is initialized | 17:07 |
*** javeriak_ has quit IRC | 17:07 | |
jeblair | pabelanger: what image are you using for your tests? | 17:07 |
jeblair | pabelanger: i notice that apparmor is not installed | 17:07 |
zxiiro | electrofelix: I agree with you. I think we should just make it documented and a comment in the tox file | 17:07 |
pabelanger | jeblair: template-ubuntu-xenial-1471316598 | 17:08 |
*** oanson has joined #openstack-infra | 17:09 | |
*** _nadya_ has joined #openstack-infra | 17:09 | |
fungi | pabelanger: rng-tools _can_ be configured to feed /dev/urandom in as a mock hardware rng. probably worth double-checking its config but that might be what it's doing. otherwise it's likely getting passthrough entropy from the hypervisor host | 17:09 |
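For reference, the knob fungi mentions is the rngd device setting on Debian/Ubuntu; pointing it at /dev/urandom is possible but largely defeats the purpose, whereas a virtio-rng device normally shows up as /dev/hwrng. Paths and variable names below are the usual package defaults, not verified against these images:

```bash
sudo apt-get install -y rng-tools
# Which source is rngd configured to feed from? (unset means it probes /dev/hwrng)
grep -i '^HRNGDEVICE' /etc/default/rng-tools || echo "HRNGDEVICE not set"
# And the kernel's current entropy estimate.
cat /proc/sys/kernel/random/entropy_avail
```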
*** david-lyle_ has joined #openstack-infra | 17:09 | |
*** infra-red has quit IRC | 17:10 | |
*** tonytan4ever has quit IRC | 17:11 | |
*** e0ne has joined #openstack-infra | 17:11 | |
pabelanger | jeblair: I didn't know we installed apparmor explicitly | 17:11 |
fungi | cloudnull: yeah, 2001:4800:1ae1:18:f816:3eff:fe3b:53ef is just plain unreachable, but should still be up. uuid is c6dd7d7d-2797-47a5-b7b1-ed3cc917a4cc according to nodepool | 17:11 |
*** ansmith has joined #openstack-infra | 17:12 | |
cloudnull | fungi: looking now | 17:13 |
*** david-lyle has quit IRC | 17:13 | |
*** david-lyle_ is now known as david-lyle | 17:13 | |
fungi | openstack server list isn't showing that uuid as existing at all for me though | 17:13 |
fungi | oh, now nodepool's deleted it too | 17:14 |
cloudnull | yup deleted. | 17:15 |
cloudnull | :'( | 17:15 |
cloudnull | sorry i was too slow | 17:15 |
jeblair | fungi, pabelanger: unbound (via openssl) may actually be using urandom. the getrandom(2) call reads that by default, but even the urandom pool needs to be initialized, and getrandom will block until urandom has been initialized. | 17:15 |
jeblair | fungi, pabelanger: the fact that it waits until the kernel prints 'nonblocking pool is initialized' reinforces that for me | 17:16 |
jeblair | though i have not checked the openssl code to verify the flags | 17:16 |
fungi | jeblair: seems a likely explanation | 17:16 |
*** _nadya_ has quit IRC | 17:16 | |
jeblair | fungi, pabelanger: finally found the print statement: http://lxr.free-electrons.com/source/drivers/char/random.c?v=4.4#L684 | 17:17 |
pabelanger | fungi: jeblair: nice work on finding the reason | 17:17 |
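On a slow node the sequence jeblair pieced together can be confirmed just by lining up timestamps; the log strings below are the ones quoted earlier in this channel, so treat them as assumptions on other images:

```bash
# When did the kernel declare the nonblocking pool initialized?
dmesg | grep 'random: nonblocking pool is initialized'
# And when did unbound (whose startup runs unbound-anchor) actually come up?
sudo journalctl -b -u unbound --no-pager | head -n 20
```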
openstackgerrit | Eddie Ramirez proposed openstack-infra/project-config: Add craton-dashboard repository (Horizon Plugin) https://review.openstack.org/354274 | 17:18 |
*** rbuzatu has joined #openstack-infra | 17:18 | |
*** javeriak has quit IRC | 17:19 | |
*** tqtran has joined #openstack-infra | 17:19 | |
*** javeriak has joined #openstack-infra | 17:19 | |
jeblair | i don't know why it's taking so long to initialize with haveged running and, according to the kernel, 2300+ bits of entropy | 17:20 |
jeblair | when apparently 128 bits is needed to call it initialized | 17:20 |
fungi | cloudnull: jeblair: okay, some random spot checking turned up an example where a job in osic is successfully doing a traceroute6 to git.openstack.org, so this certainly seems inconsistent (could still be a startup race i suppose?) | 17:20 |
jeblair | fungi: yeah, if you're thinking a startup race, it could be affected by how long the node sat idle before launching the job | 17:21 |
cloudnull | fungi: the only way I'm able to reproduce this issue is to break the resolvers. | 17:21 |
cloudnull | :( | 17:21 |
fungi | right, exactly what i'm wondering | 17:21 |
*** _nadya_ has joined #openstack-infra | 17:21 | |
fungi | cloudnull: strangely, the example logs i have, dns resolution of git.openstack.org is fine, but traceroute responses aren't coming in | 17:21 |
fungi | owing in part, i think, to the fact that nodepool ready scripts do a dns lookup of that name before ever declaring the node fit for use, so it should have resolution already cached | 17:22 |
fungi | however, also dns lookups are happening via ipv4, so wouldn't be broken by ipv6 routing issues | 17:23 |
jeblair | until that change lands | 17:23 |
jeblair | (though it will still fall back on v4) | 17:23 |
*** fguillot has quit IRC | 17:24 | |
*** piet_ has joined #openstack-infra | 17:25 | |
cloudnull | maybe this is an issue with the neutron router for IPv4 traffic? The v6 network is dual stack in the OSIC and the v4 interface is part of a neutron router. potentially, we're pushing the router farther than it wants to be pushed or its slow to be programmed which is causing the various timeouts? | 17:25 |
*** tonytan4ever has joined #openstack-infra | 17:26 | |
fungi | cloudnull: so what i find particularly strange is that when traceroute6 works we get a response back from what appears to be the global address of the default gateway (2001:4800:1ae1:18::3), but when traceroute6 doesn't work, i don't even get a response from that one indicating an issue with neutron, or neighbor discovery (even though the fe80::def linklocal for that gateway is showing up as having a | 17:27 |
fungi | valid hw address like 00:05:73:a0:00:06), or the local layer 2 maybe? | 17:27 |
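The checks fungi is doing by hand could be folded into the job's network diagnostics for more data points; the gateway addresses referenced are the OSIC-specific ones quoted above, so the grep is only illustrative:

```bash
traceroute6 -n git.openstack.org
# Is there a default v6 route, and does the gateway's linklocal neighbor entry
# have a valid hardware address?
ip -6 route show default
ip -6 neigh show | grep -i 'fe80::def' || ip -6 neigh show
```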
*** aaltman has quit IRC | 17:27 | |
*** gomarivera has joined #openstack-infra | 17:27 | |
pabelanger | fungi: jeblair: if rng-tools is a hardware based generator, doesn't it make more sense to use that? I admit, haveged and rng-tools are new to me | 17:27 |
fungi | pabelanger: the "hardware" based entropy sources supported by rng-tools may include things that are not actual hardware (especially on virtual machines). but regardless i'm fine with using it | 17:29 |
*** hieulq_ has quit IRC | 17:29 | |
jeblair | pabelanger: i don't know (yet). pabelanger, fungi: i'm still digging, and i have found that /proc/sys/kernel/random/entropy_avail is the amount of entropy in the input pool, which i believe feeds the urandom pool, which is what we're waiting on being initialized. so that at least partially explains how the value in proc is high while we're still waiting for initialization. it doesn't explain *why*. | 17:30 |
*** gyee has joined #openstack-infra | 17:30 | |
fungi | pabelanger: haveged provides a nice fallback when there are no rng devices available, since it attempts to extract entropy from other timing-related sources | 17:30 |
openstackgerrit | Jim Rollenhagen proposed openstack-infra/project-config: Make ironic job non-voting on Neutron https://review.openstack.org/356072 | 17:31 |
openstackgerrit | Sai Sindhur Malleni proposed openstack-infra/project-config: Adding Ansible jobs for Browbeat https://review.openstack.org/356073 | 17:32 |
jroll | ^ 356072 is a fairly easy review so we don't end up blocking neutron this close to release | 17:32 |
*** baoli has joined #openstack-infra | 17:33 | |
pabelanger | fungi: okay thanks for the info. | 17:33 |
*** mhickey has quit IRC | 17:34 | |
pabelanger | But it does seem to be related to which cloud we start ubuntu-xenial on | 17:34 |
pabelanger | Aug 16 17:21:31 ubuntu kernel: [ 3.385322] random: nonblocking pool is initialized | 17:34 |
pabelanger | that is from rackspace | 17:34 |
pabelanger | nice and fast | 17:34 |
jeblair | pabelanger: we don't have a lot of these log lines in logstash, but there are some | 17:34 |
jeblair | pabelanger: i see 30 seconds in ovh, 60 seconds in bluebox | 17:35 |
*** florianf has quit IRC | 17:35 | |
fungi | jeblair: here's another fun anecdote related to urandom initialization times http://haypo-notes.readthedocs.io/summary_python_random_issue.html | 17:35 |
pabelanger | internap is 30sec too | 17:35 |
fungi | seems consistent with what we're suspecting | 17:35 |
*** hieulq_ has joined #openstack-infra | 17:36 | |
*** acoles is now known as acoles_ | 17:36 | |
pabelanger | fungi: oh, nice | 17:36 |
*** florianf has joined #openstack-infra | 17:36 | |
*** thorongil has joined #openstack-infra | 17:36 | |
pabelanger | http://bugs.python.org/issue26839#msg264121 | 17:36 |
*** electrofelix has quit IRC | 17:37 | |
*** ccamacho|out has quit IRC | 17:38 | |
*** thorongil has quit IRC | 17:38 | |
fungi | at least on some platforms, some pseudo-random data gets written out on shutdown and then read in at startup to quickly seed /dev/urandom. we might be able to dump something into /var/lib/random-seed in our job node images | 17:38 |
*** tphummel has joined #openstack-infra | 17:38 | |
*** thorongil has joined #openstack-infra | 17:38 | |
fungi | ahh, that's an rh-ism. debian derivatives use /var/lib/urandom/random-seed to the same ends however | 17:39 |
*** thorongil has quit IRC | 17:40 | |
*** thorongil has joined #openstack-infra | 17:40 | |
*** thorongil has quit IRC | 17:41 | |
*** thorongil has joined #openstack-infra | 17:42 | |
*** shashank_hegde has joined #openstack-infra | 17:43 | |
*** thorongil has quit IRC | 17:43 | |
*** thorongil has joined #openstack-infra | 17:44 | |
*** thorongil has quit IRC | 17:45 | |
*** nwkarste_ has joined #openstack-infra | 17:45 | |
*** thorongil has joined #openstack-infra | 17:45 | |
*** nwkarst__ has joined #openstack-infra | 17:46 | |
*** thorongil has quit IRC | 17:47 | |
*** thorongil has joined #openstack-infra | 17:47 | |
*** thorongil has quit IRC | 17:48 | |
*** devkulkarni has joined #openstack-infra | 17:49 | |
*** nwkarste_ has quit IRC | 17:50 | |
*** hieulq_ has quit IRC | 17:50 | |
fungi | pabelanger: that makes sense, so starting in linux 3.17 we're getting that behavior, which explains why it's impacting xenial and not trusty or centos 7 | 17:51 |
fungi | i would guess recent fedoras are impacted as well | 17:52 |
*** oanson has quit IRC | 17:52 | |
fungi | debian jessie is one kernel rev too old to see it | 17:52 |
pabelanger | ya, we can check fedora-24 | 17:52 |
pabelanger | we have a node online | 17:52 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: [WIP] Add scheduling thread to nodepool builder https://review.openstack.org/356079 | 17:52 |
*** Sukhdev has quit IRC | 17:53 | |
pabelanger | fungi: so we have a few workarounds for now: rng-tools, smarter configure_mirror.sh. I'll wait until jeblair is finished before moving forward on that front | 17:54 |
SpamapS | Hey. I just wanted to offer some public praise. Thanks for all the hard work everyone in infra has put in on zuul and nodepool. :-D http://zuul.cloud-ci.ibmcis.com/ | 17:54 |
fungi | pabelanger: jeblair: i guess we could also pick haypo's brain in #openstack-oslo about this since he seems to have dig into it quite a bit | 17:54 |
mordred | SpamapS: woot! | 17:54 |
tlbr | infra-core could you please review merge https://review.openstack.org/#/c/347047/ ? | 17:54 |
pabelanger | SpamapS: yay | 17:54 |
fungi | er, dug | 17:55 |
fungi | SpamapS: thanks! | 17:55 |
SpamapS | We're pipelining and jobbing and really just happy as clams to have CI that works like upstream. :-D | 17:55 |
*** sdake has joined #openstack-infra | 17:55 | |
*** tqtran has quit IRC | 17:55 | |
Shrews | mmm, clams | 17:55 |
kgiusti | +1 what SpamapS said - thanks muchly! | 17:56 |
*** baoli has quit IRC | 17:56 | |
*** ccamacho has joined #openstack-infra | 17:57 | |
*** baoli has joined #openstack-infra | 17:57 | |
SpamapS | Shrews: oh heck yeah, clams would be great | 17:57 |
SpamapS | steamed in a little white wine sauce. :) | 17:57 |
Shrews | oh yeah | 17:58 |
tlbr | mordred, could you please also review https://review.openstack.org/#/c/347047/ ? We want to start work on this projects as soon as possible :) | 17:58 |
*** tqtran has joined #openstack-infra | 17:58 | |
Shrews | mordred: jeblair: notmorgan: look, pretty diagrams https://review.openstack.org/#/c/356079/1/doc/source/devguide.rst | 17:58 |
*** andrey-mp has joined #openstack-infra | 17:58 | |
*** gomarivera has quit IRC | 17:58 | |
*** tonytan4ever has quit IRC | 17:59 | |
mordred | Shrews: woot | 18:00 |
jeblair | SpamapS: thanks! | 18:00 |
jeblair | Shrews: nice! | 18:00 |
*** dmsimard|afk is now known as dmsimard | 18:01 | |
*** ganesan has quit IRC | 18:01 | |
*** rcernin has joined #openstack-infra | 18:01 | |
*** tqtran has quit IRC | 18:03 | |
*** rbrndt has quit IRC | 18:04 | |
*** kzaitsev_mb has quit IRC | 18:06 | |
rcarrillocruz | nice :-) | 18:07 |
*** dprince has quit IRC | 18:07 | |
*** ociuhandu has quit IRC | 18:07 | |
openstackgerrit | Merged openstack-infra/puppet-infracloud: Switch to infra-cloud-bridge element https://review.openstack.org/356009 | 18:07 |
rcarrillocruz | \o/ ^ | 18:08 |
*** ccamacho has quit IRC | 18:08 | |
jeblair | pabelanger, fungi: i booted the machine without haveged and verified that unbound will continue to sit there waiting because there is no entropy. so i know that we are getting entropy from haveged. i then ran haveged in the foreground which immediately (<1s) provided entropy to the pool. yet it still took 95 seconds for the pool to be initialized | 18:08 |
jeblair | er, sorry, it took an additional 30 seconds to be initialized | 18:08 |
jeblair | (i waited 60 seconds to start) | 18:08 |
rcarrillocruz | pabelanger: i'm going to wipe the deploy dib image to get the bridge element in | 18:09 |
rcarrillocruz | in case you want to run it to see how it goes | 18:09 |
rcarrillocruz | ? | 18:09 |
rcarrillocruz | well nm, it seems you all are hooked with the entropy thing, sorry for the noise | 18:09 |
fungi | jeblair: what about with haveged removed but rng-tools installed? in theory (if this is qemu-based at least) there'll be a virt-rng it uses to get extra entropy | 18:11 |
*** tqtran has joined #openstack-infra | 18:12 | |
jeblair | fungi: i don't know, but i'm not quite ready to try that yet; still trying to understand the sequence with haveged | 18:12 |
mordred | also - rackspace isn't qemu based | 18:12 |
jeblair | i'm running on osic | 18:12 |
jeblair | i think :) | 18:12 |
fungi | mordred: yeah, not sure how this will vary from provider to provider | 18:13 |
mordred | I know - I was just responding to fungi in that we have to make sure that fixing osic doesn't break rax | 18:13 |
mordred | fungi: ++ | 18:13 |
jeblair | ya | 18:13 |
*** xarses has joined #openstack-infra | 18:13 | |
fungi | we already know there's a significant timing variance for this across providers. seems to block longer on some than others | 18:13 |
*** degorenko is now known as _degorenko|afk | 18:13 | |
jeblair | fungi's question is a good one -- essentially in my mind as "if haveged is doing its thing quickly, which seems to be the case, why does rng-tools appear to be faster" | 18:14 |
jeblair | i just think i have a bit more data i can pull out of this configuration before i start to examine the delta with that one | 18:14 |
fungi | i'm also curious if it's got a /var/lib/urandom/random-seed it's reading in at boot | 18:15 |
*** _nadya_ has quit IRC | 18:15 | |
fungi | if we're seeing this delay on successive reboots then the on-disk seed likely isn't going to help | 18:15 |
jeblair | cat: /var/lib/urandom/random-seed: No such file or directory | 18:15 |
sdague | mtreinish: http://logs.openstack.org/10/352610/4/gate/gate-tempest-dsvm-cells/1d2248f/console.html still failing even after the keystone revert, so I think osic issues are still a real thing | 18:16 |
fungi | jeblair: oh, i wonder if it's not saving one for some reason, or if it's been moved | 18:16 |
jeblair | fungi: /var/lib/systemd/random-seed exists | 18:16 |
fungi | aha, now all restaurants are taco bell^W^Wsystemd | 18:17 |
*** fguillot has joined #openstack-infra | 18:17 | |
*** javeriak_ has joined #openstack-infra | 18:17 | |
jeblair | fungi: there is a /lib/systemd/system/systemd-random-seed.service | 18:18 |
jeblair | Description=Load/Save Random Seed | 18:18 |
*** javeriak has quit IRC | 18:18 | |
fungi | yeah, so sounds like it's there and doesn't help speed up urandom initialization at boot | 18:18 |
jeblair | i would like to verify it's working | 18:18 |
*** tqtran has quit IRC | 18:19 | |
mordred | jeblair: is it enabled? | 18:19 |
*** pvaneck has joined #openstack-infra | 18:19 | |
jeblair | # service random-seed status | 18:19 |
jeblair | ● random-seed.service Loaded: not-found (Reason: No such file or directory) Active: inactive (dead) | 18:19 |
mordred | well, there we go | 18:19 |
jeblair | mordred: that looks somewhat negative | 18:20 |
mordred | yah | 18:20 |
mordred | is /var/lib/urandom present? | 18:20 |
openstackgerrit | Abhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service https://review.openstack.org/355670 | 18:20 |
jeblair | yes | 18:20 |
mordred | nod | 18:20 |
fungi | huh | 18:20 |
fungi | that's odd | 18:20 |
*** jaosorior has joined #openstack-infra | 18:21 | |
Shrews | i like the "active-inactive-dead" text there. not confusing at all | 18:21 |
fungi | ELENNART | 18:21 |
sdague | does someone have a summary of the current best theory around the osic issues? | 18:21 |
jeblair | Shrews: 2 out of 3 are negative | 18:22 |
fungi | mtreinish: i found the root of our snmpd issue on xenial... https://review.openstack.org/10112 says "This can go away after everything is upgraded to precise..." | 18:22 |
sdague | as I'm trying to ponder if there is a short term mitigation on the qa side? | 18:22 |
Shrews | jeblair: ah, it's a proportional failure message. got it :) | 18:22 |
jeblair | sdague: i think you're referring to the 'glance' issue? | 18:23 |
*** gomarivera has joined #openstack-infra | 18:23 | |
sdague | yeh | 18:23 |
jeblair | sdague: or do you mean the 'git' issue? | 18:23 |
mordred | jeblair: the internet tells me the active: inactive (dead) thing may be the result of getting the name wrong in the status command | 18:23 |
sdague | well... the glance issue is coupled to the git issue, right? | 18:23 |
andrey-mp | hi! is there a document about the gate/integrated queue? I want to understand how it works... the job for changeset 352455 has been going for about 10 hours. it stops at the end and begins again... | 18:24 |
fungi | sdague: that's unclear. the glance issue in bluebox does seem to be related to being unable to directly reach git.openstack.org because it's being treated as a "fake" glance remote location | 18:24 |
mordred | jeblair: I think you want "systemctl status systemd-random-seed.service" | 18:24 |
jeblair | mordred: aha you and the internet are right | 18:24 |
mordred | woot! | 18:24 |
pabelanger | rcarrillocruz: okay | 18:25 |
fungi | sdague: the glance issue we're seeing in osic seems to be that calls to the local glance service on the job node time out (so maybe behind the scenes, glance is acting as a sort of proxy to that remote file on git.o.o still?) | 18:25 |
mordred | although amusingly on my laptop I get a different error when I do that wrong | 18:25 |
jeblair | mordred: Active: active (exited) since Tue 2016-08-16 18:06:16 UTC; 18min ago | 18:25 |
jeblair | that seems more better | 18:25 |
mordred | jeblair: that's good | 18:25 |
*** jaosorior has quit IRC | 18:25 | |
jeblair | now i wonder if there are any logs | 18:25 |
jeblair | cause i would like to have a timestamp for when it ran/exited | 18:25 |
pabelanger | nice work | 18:26 |
jeblair | journalctl -u systemd-random-seed.service | 18:27 |
jeblair | Aug 16 18:06:16 ubuntu systemd[1]: Started Load/Save Random Seed. | 18:27 |
*** dprince has joined #openstack-infra | 18:27 | |
*** gomarivera has quit IRC | 18:27 | |
jeblair | that's about t+0 seconds for this boot | 18:27 |
jeblair | so it seems to have run as expected | 18:28 |
*** berendt has joined #openstack-infra | 18:28 | |
*** berendt has quit IRC | 18:28 | |
*** baoli has quit IRC | 18:29 | |
*** baoli has joined #openstack-infra | 18:29 | |
*** cody-somerville has quit IRC | 18:29 | |
*** cody-somerville has joined #openstack-infra | 18:29 | |
*** csomerville has joined #openstack-infra | 18:30 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/puppet-snmpd: Remove initscript https://review.openstack.org/356090 | 18:30 |
openstackgerrit | Henry Gessau proposed openstack-infra/project-config: Use python-db-jobs for networking-sfc https://review.openstack.org/354358 | 18:30 |
fungi | mtreinish: pabelanger: ^ | 18:31 |
*** Jeffrey4l has quit IRC | 18:31 | |
*** rbrndt has joined #openstack-infra | 18:32 | |
*** ociuhandu has joined #openstack-infra | 18:32 | |
*** tqtran has joined #openstack-infra | 18:32 | |
*** cody-somerville has quit IRC | 18:34 | |
*** abregman has joined #openstack-infra | 18:36 | |
*** chem` has joined #openstack-infra | 18:37 | |
*** chem has quit IRC | 18:38 | |
*** _nadya_ has joined #openstack-infra | 18:40 | |
*** amotoki has joined #openstack-infra | 18:45 | |
mtreinish | fungi: heh, that would do it | 18:45 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config: Add ansible-role-jobs for browbeat https://review.openstack.org/356093 | 18:45 |
openstackgerrit | Vasyl Saienko proposed openstack-infra/devstack-gate: DO NOT REVIEW https://review.openstack.org/356094 | 18:46 |
*** _nadya_ has quit IRC | 18:48 | |
*** amotoki has quit IRC | 18:48 | |
mtreinish | sdague: ok, sure. At least we know now | 18:49 |
mtreinish | fungi: especially since, with the whole systemd thing on xenial, an init script is even less useful there :) | 18:49 |
mtreinish | fungi: do we need an equiv systemd unit file for xenial or does it come with the package? | 18:50 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config: Add networking-odl for grafana dashboard https://review.openstack.org/353226 | 18:51 |
fungi | mtreinish: the snmpd package on xenial ships with an initscript still | 18:51 |
mtreinish | heh, ok | 18:52 |
*** abregman has quit IRC | 18:52 | |
*** ryanpetrello has quit IRC | 18:54 | |
*** ryanpetrello has joined #openstack-infra | 18:55 | |
*** chem` has quit IRC | 18:55 | |
*** Goneri has quit IRC | 18:55 | |
*** tonytan4ever has joined #openstack-infra | 18:58 | |
*** devkulkarni1 has joined #openstack-infra | 18:59 | |
*** Sukhdev has joined #openstack-infra | 18:59 | |
fungi | it's that (weekly infra team meeting) time again! find us in #openstack-meeting for the next hour | 19:00 |
*** Na3iL has quit IRC | 19:00 | |
*** e0ne has quit IRC | 19:01 | |
*** Goneri has joined #openstack-infra | 19:01 | |
*** andrey-mp has left #openstack-infra | 19:01 | |
*** devkulkarni has quit IRC | 19:01 | |
*** baoli has quit IRC | 19:01 | |
*** edtubill has joined #openstack-infra | 19:02 | |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config: networking-odl: cover more combinations of version https://review.openstack.org/347045 | 19:02 |
*** camunoz has joined #openstack-infra | 19:04 | |
*** edtubill has quit IRC | 19:04 | |
openstackgerrit | Joost van der Griendt proposed openstack-infra/jenkins-job-builder: Add support for stash-pullrequest-builder plugin Although the application has now been renamed/merge to BitBucket, it is still sensible to keep the Stash name for now. As there are already plugins named BitBucket, which are purely targeting the cloud solu https://review.openstack.org/355211 | 19:05 |
*** fifieldt has quit IRC | 19:06 | |
*** gomarivera has joined #openstack-infra | 19:08 | |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config: Add networking-odl for grafana dashboard https://review.openstack.org/353226 | 19:11 |
*** edtubill has joined #openstack-infra | 19:13 | |
*** sdake_ has joined #openstack-infra | 19:14 | |
*** asettle has joined #openstack-infra | 19:14 | |
*** sdake has quit IRC | 19:14 | |
*** Apsu has left #openstack-infra | 19:15 | |
*** sdake_ has quit IRC | 19:15 | |
*** docaedo has quit IRC | 19:16 | |
*** sdake has joined #openstack-infra | 19:16 | |
*** dtardivel has quit IRC | 19:17 | |
openstackgerrit | Merged openstack-infra/system-config: Pre-install python2-requests package for Fedora https://review.openstack.org/355731 | 19:17 |
*** edtubill has quit IRC | 19:18 | |
*** fifieldt has joined #openstack-infra | 19:19 | |
*** _nadya_ has joined #openstack-infra | 19:20 | |
*** martinkopec has joined #openstack-infra | 19:20 | |
*** martinkopec has quit IRC | 19:21 | |
anteaya | sdague: I'm at a loss about http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-08-16.log.html#t2016-08-16T11:51:36 | 19:21 |
*** asettle has quit IRC | 19:21 | |
anteaya | sdague: you link to a line that says groups: | 19:21 |
anteaya | - labs | 19:21 |
sdague | anteaya: I assumed you'd be up at that timezone, and it was a project-config question | 19:21 |
anteaya | and ask about xenial nodes | 19:21 |
anteaya | did you get an answer? | 19:22 |
anteaya | I have been offline most of today | 19:22 |
sdague | about how to set the launchpad bug project page | 19:22 |
sdague | I did not, but cdent is really the one that needs to know | 19:22 |
anteaya | yes, that is the way to set a launchpad group for bugs | 19:22 |
anteaya | this is the best documentation for setting up launchpad: http://docs.openstack.org/infra/manual/creators.html#set-up-launchpad | 19:23 |
anteaya | as the group and who owns the group is important | 19:23 |
*** iurygregory has joined #openstack-infra | 19:25 | |
*** Hal has joined #openstack-infra | 19:26 | |
*** edtubill has joined #openstack-infra | 19:27 | |
*** edtubill has quit IRC | 19:28 | |
*** xyang1 has quit IRC | 19:29 | |
*** Hal has quit IRC | 19:30 | |
*** edtubill has joined #openstack-infra | 19:30 | |
*** tqtran has quit IRC | 19:34 | |
*** tongli has quit IRC | 19:35 | |
*** Goneri has quit IRC | 19:36 | |
karthikp_ | clarkb: afazekas: sdague, ianw Please could you help me review these changes to the infra in your free time? we need this to test the multinode grenade job for Cinder | 19:37 |
karthikp_ | Thanks in advance | 19:37 |
*** gomarivera has quit IRC | 19:40 | |
karthikp_ | https://review.openstack.org/#/c/355678/ | 19:41 |
sdague | clarkb: is there a patch up already to move cells & ceph jobs to xenial? | 19:41 |
*** _nadya_ has quit IRC | 19:42 | |
openstackgerrit | Joost van der Griendt proposed openstack-infra/jenkins-job-builder: Adding support for Hidden parameter plugin https://review.openstack.org/355209 | 19:42 |
*** docaedo has joined #openstack-infra | 19:44 | |
*** markusry has quit IRC | 19:44 | |
*** tqtran has joined #openstack-infra | 19:47 | |
*** florianf has quit IRC | 19:48 | |
*** markusry has joined #openstack-infra | 19:48 | |
tlbr | infra-team could you please merge https://review.openstack.org/#/c/347047/ ? | 19:48 |
*** hockeynut has joined #openstack-infra | 19:48 | |
openstackgerrit | yolanda.robla proposed openstack-infra/system-config: Bump version of rabbitmq module https://review.openstack.org/356117 | 19:52 |
*** hockeynut has quit IRC | 19:53 | |
*** kzaitsev_mb has joined #openstack-infra | 19:54 | |
*** hockeynut has joined #openstack-infra | 19:54 | |
*** Apoorva has joined #openstack-infra | 19:55 | |
mtreinish | jeblair, fungi: how difficult would it be to get the node type into the metadata we pass to logstash and subunit2sql? | 19:56 |
*** nwkarst__ has quit IRC | 19:57 | |
*** nwkarsten has joined #openstack-infra | 19:57 | |
*** asettle has joined #openstack-infra | 19:59 | |
*** asettle has quit IRC | 19:59 | |
*** tqtran has quit IRC | 19:59 | |
*** camunoz has quit IRC | 19:59 | |
*** annegentle has joined #openstack-infra | 19:59 | |
*** nwkarste_ has joined #openstack-infra | 20:00 | |
jpmaxman | Krenair: I think your config is more correct - keep in mind this was patched together going from older distribution / apache. I'm assuming you started fresh with trusty / apache 2.4 | 20:01 |
*** nwkarsten has quit IRC | 20:02 | |
Krenair | Fresh Trusty, then I applied puppet which gave me apache etc. | 20:02 |
fungi | Krenair: yeah, this was me porting jpmaxman's apache config changes to production. i didn't really try to whittle them down. what you have in your change is likely sufficient | 20:02 |
*** julim has quit IRC | 20:03 | |
fungi | that diff was simply between what we had on the production server and what i found on the upgrade test server. so it's known-working, but almost certainly could be improved/tightened | 20:04 |
yolanda | hi, so no time in the meeting... i wanted to raise the topic of the mid-cycle sprint. OPNFV people are interested in adding a slot to the agenda, they requested it in the etherpad | 20:04 |
*** e0ne has joined #openstack-infra | 20:05 | |
jeblair | mtreinish: possible; have the ansible launch server return it in the zmq event it sends | 20:05 |
*** tqtran has joined #openstack-infra | 20:05 | |
mtreinish | jeblair: ok do you have a link to where to start looking? That way I can take a detailed look after the tc meeting | 20:06 |
fungi | yolanda: opnfv people from the qa team? or we have opnfv people on the infra team? | 20:07 |
jeblair | mtreinish: yep, right here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n857 | 20:07 |
mtreinish | jeblair: cool, thanks | 20:07 |
*** edtubill has quit IRC | 20:07 | |
fungi | mtreinish: so you mean some different node type than what we already record in logstash? | 20:08 |
yolanda | opnfv people from infra. Actually Fatih is interested in coming | 20:08 |
yolanda | i'm collaborating with them in infracloud deployment efforts on opnfv | 20:09 |
fungi | yolanda: cool, i didn't know we had people in infra helping with that | 20:09 |
jeblair | fungi, pabelanger: i believe i have found that restoring entropy data from a file in the manner of systemd (or init scripts that use dd) does put entropy into the pool, but does *not* update the entropy count. | 20:09 |
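That matches how the kernel's write path works: bytes written to /dev/urandom are mixed into the pool, but the entropy estimate is only credited through the RNDADDENTROPY ioctl (which rngd uses and a plain file write does not). The init-script style restore jeblair refers to looks roughly like this, using the Debian path fungi mentioned earlier:

```bash
# Mixes the saved seed back in, but does not credit the entropy count --
# exactly the behavior described above.
dd if=/var/lib/urandom/random-seed of=/dev/urandom bs=512 count=1 2>/dev/null
```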
fungi | yolanda: so it's really less an opnfv topic, and more a making infra-cloud reconsumable downstream topic? | 20:09 |
*** matrohon has joined #openstack-infra | 20:09 | |
anteaya | yolanda: what is Fatih's irc nick? | 20:10 |
mtreinish | fungi: we have the build_node and the node_provider today. I just want like trusty, or xenial-2-node or something like that | 20:10 |
*** vhosakot has quit IRC | 20:10 | |
anteaya | I've been bemoaning the lack of new women lately | 20:10 |
yolanda | they have interest in infra-cloud, they need some specific features, being more reconsumable, having ha, some more network configs | 20:10 |
anteaya | great to see more | 20:10 |
yolanda | but also a specific opnfv topic, about how we can collaborate better | 20:10 |
mtreinish | fungi: so we can more easily see if a failure is isolated to a specific distro or something like that | 20:10 |
yolanda | fungi, Fatih nick is fdegir | 20:10 |
anteaya | yolanda: thanks | 20:11 |
openstackgerrit | Merged openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 20:11 |
openstack | bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] https://launchpad.net/bugs/1613749 | 20:11 |
jpmaxman | so Krenair Fungi I'm not super familiar with Puppet - so really with your changes as far as I can tell they look good. I'd be more capable of judging by looking at a resulting server that was spun up from these puppet scripts. I was actually hoping to do that myself, but it's a little silly to hold this up for that. I'm hopeful to get more familiar with | 20:11 |
jpmaxman | puppet in general and be able to be more helpful with that side of things moving forward. | 20:11 |
*** vhosakot has joined #openstack-infra | 20:11 | |
jeblair | mtreinish: 'node_image' is what i would recommend for naming that with specificity | 20:11 |
fungi | mtreinish: oh, indeed, for some reason i thought we had the base node label as a parameter there already | 20:11 |
fungi | but on inspection i see it's definitely noy | 20:12 |
fungi | not | 20:12 |
mtreinish | I prefer noy :) | 20:12 |
jeblair | jpmaxman: cool -- if it's at all helpful, there's a bit of a walkthrough here about how to run infra puppet on a vm: http://docs.openstack.org/infra/system-config/sysadmin.html#making-a-change-in-puppet | 20:13 |
yolanda | fungi, anteaya, so well, i wanted to raise the attention on the etherpad, requesting that slot to be added if there is time, so Fatih can come to the mid-cycle if there is interest on it | 20:13 |
fungi | yolanda: testing additional deployments of our infra-cloud manifests sounds like something some of the attendees might find interesting, but i would avoid spinning it as the infra team helping opnfv deploy a cloud | 20:16 |
Krenair | There's a story somewhere about having a wiki-dev server | 20:16 |
Krenair | maybe the puppet changes could be applied there and tested properly? | 20:16 |
*** tqtran has quit IRC | 20:16 | |
Krenair | It could be that my puppet changes don't cover everywhere and there's still some things to do that I didn't find | 20:17 |
*** gomarivera has joined #openstack-infra | 20:17 | |
Krenair | e.g. you made some changes for ReCaptcha I think? | 20:17 |
fungi | Krenair: yep, i expect that we'll do that as the next step after we merge those changes. puppet is currently disabled for the production server since the upgrade | 20:17 |
*** e0ne has quit IRC | 20:18 | |
fungi | Krenair: http://paste.openstack.org/show/558529 was the change i applied to Settings.php (with credentials redacted) | 20:18 |
*** yaume has joined #openstack-infra | 20:18 | |
yolanda | fungi, i would not say "infra team helping them", but propose as some ways to collaborate or join efforts | 20:19 |
Krenair | fungi, yeah it seems we're going to have quite a few extra things to puppetise | 20:19 |
fungi | Krenair: again, just directly ported from the upgrade test server jpmaxman worked on, with some whitespace cleanup to reduce the diff as much as possible | 20:19 |
Krenair | why was MF removed? | 20:20 |
fungi | Krenair: it allowed account creation outside openid previously | 20:20 |
fungi | it's possible in 1.27 that's no longer the case | 20:20 |
Krenair | okay but isn't disabling that a separate patch? why was it included in a wiki-upgrade change? | 20:21 |
yolanda | anyway, i have to leave, i'll try to attend the next infra meeting and propose an item for the agenda, to see whether there is interest in having a slot for it or not | 20:22 |
jpmaxman | Krenair: also when I enabled it the wiki error'd out | 20:22 |
jpmaxman | I didn't dig into it too deep | 20:23 |
fungi | yeah, we discussed the possibility of reenabling it again once we work out what's needed | 20:23 |
fungi | this was fairly rushed as we're still scrambling to get the spam problem under control | 20:23 |
*** baoli has joined #openstack-infra | 20:24 | |
fungi | so having a wiki with limited incoming spam was prioritized over some previous features we had | 20:24 |
*** pfallenop has quit IRC | 20:24 | |
fungi | similar for file uploads | 20:24 |
Krenair | okay well | 20:24 |
pabelanger | jeblair: good to know, thanks for the update | 20:25 |
Krenair | there's no safe way you can just send these commits through and apply puppet in prod; it's going to have to go through a wiki-dev server | 20:25 |
fungi | Krenair: yes, that's what i'm expecting | 20:25 |
Krenair | too many unknowns created by working on servers without using puppet | 20:26 |
fungi | Krenair: we have puppet entirely disabled for the production server for now so we can work through massive refactoring of that puppet module in safety on a dev deployment | 20:26 |
cloudnull | fungi pabelanger mordred jeblair: Just as an update, we've found that the VLAN that was supposed to be running on all of our compute nodes wasn't trunked to all of the required switch ports. so that is likely a major part of the recent raft of failures. I **Believe** this is fixed now. we're rerunning some tests and I'll let you know what I find out. | 20:26 |
Krenair | okay. what will it take to get a -dev server? | 20:26 |
Krenair | I assume these servers are all just instances in a cloud somewhere, right? you don't have to procure hardware for this | 20:26 |
jpmaxman | right - I think dev-wiki is the next step: get that where we want it to be with the functionality we want | 20:27 |
*** jordanP has joined #openstack-infra | 20:27 | |
jpmaxman | Krenair: correct | 20:27 |
krotscheck | mordred: I'm going through these cloud-config things here- what's the point of having the API version in clouds.config? Shouldn't the SDK be the thing that knows what language it can talk? | 20:27 |
pabelanger | cloudnull: Nice, thanks for the update | 20:27 |
fungi | Krenair: i (or another of our ~dozen root admins) needs to launch one. this is a priority for me, but it's competing with a number of other priorities so i can't promise it in the next 24-48 hours | 20:27 |
*** inc0 has quit IRC | 20:27 | |
krotscheck | Any infra cores around that can +A this patch? I've got 2x+2's, Ajaeger is on vacation, and I don't really want to sit on this for the next two weeks. | 20:28 |
Krenair | okay well, don't let me rush you :) | 20:28 |
krotscheck | https://review.openstack.org/#/c/346130/ | 20:28 |
*** piet_ has quit IRC | 20:28 | |
fungi | krotscheck: specifically we'll need to create a server instance for it, a trove instance to hold its database, a cinder volume for the file content mounted in the appropriate place on the fs, add some dns records, and we're probably at a minimum also lacking some glue in the system-config repo to instantiate the mediawiki module for that new server name | 20:28 |
fungi | er, Krenair ^ | 20:29 |
fungi | sorry krotscheck | 20:29 |
*** piet has joined #openstack-infra | 20:29 | |
* krotscheck lays claim on the tab-completion scope of the letter K! | 20:29 | |
Zara | :) | 20:29 |
* fungi is now known as krugerand | 20:30 | |
fungi | oh, i guess it has two r's | 20:30 |
fungi | well, three in total | 20:30 |
*** tqtran has joined #openstack-infra | 20:31 | |
*** kgiusti has left #openstack-infra | 20:31 | |
*** gouthamr has quit IRC | 20:33 | |
*** tqtran has quit IRC | 20:34 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Deploy minimal services in multinode job https://review.openstack.org/355097 | 20:34 |
*** pfallenop has joined #openstack-infra | 20:35 | |
*** xyang1 has joined #openstack-infra | 20:36 | |
ianw | krotscheck: it's waiting on depends-on's ? | 20:36 |
krotscheck | ianw: Hrm. | 20:37 |
krotscheck | ianw: Ah, right. So https://review.openstack.org/#/c/334873/ is a review that I don't have any other cores on | 20:37 |
*** rbuzatu has quit IRC | 20:37 | |
Krenair | fungi, okay, well, let me know when it's up? | 20:37 |
Krenair | I'm on holiday next week but other than that I should be available | 20:38 |
*** javeriak has joined #openstack-infra | 20:40 | |
ianw | krotscheck: is that list in 334873 curated in any way, or just grabbed from somewhere? i mean i don't mind just putting it in as is, the only problem would be that it does too much | 20:40 |
krotscheck | ianw: You'd have to ask AJaeger, I think it's a list of default dependencies from project ¯\_(ツ)_/¯ | 20:41 |
*** pfallenop has quit IRC | 20:41 | |
*** tqtran has joined #openstack-infra | 20:42 | |
*** piet has quit IRC | 20:44 | |
*** javeriak_ has quit IRC | 20:44 | |
*** armax has quit IRC | 20:45 | |
*** jheroux has quit IRC | 20:45 | |
*** e0ne has joined #openstack-infra | 20:45 | |
fungi | Krenair: will do. your work on this so far is much appreciated too! | 20:47 |
*** ansmith has quit IRC | 20:48 | |
*** pfallenop has joined #openstack-infra | 20:48 | |
*** tqtran has quit IRC | 20:48 | |
*** kbaegis has quit IRC | 20:48 | |
*** jheroux has joined #openstack-infra | 20:49 | |
jpmaxman | yes Krenair thank you! | 20:51 |
*** jordanP has quit IRC | 20:52 | |
*** edtubill has joined #openstack-infra | 20:52 | |
*** yaume has quit IRC | 20:54 | |
*** Apoorva_ has joined #openstack-infra | 20:54 | |
*** kbaegis has joined #openstack-infra | 20:55 | |
*** matrohon has quit IRC | 20:55 | |
*** piet has joined #openstack-infra | 20:56 | |
*** javeriak has quit IRC | 20:57 | |
*** Apoorva has quit IRC | 20:57 | |
*** rbrndt has quit IRC | 20:59 | |
*** tonytan4ever has quit IRC | 20:59 | |
*** dprince has quit IRC | 21:00 | |
*** raildo has quit IRC | 21:01 | |
cloudnull | fungi pabelanger mordred jeblair: So I've now built a VM on every compute node using the V6 network and pinged it. Additionally I've added user data to the VMs to install and run traceroute(6) against git.o.o, and from a spot check of many of the instances' console logs they're all able to get there. so I **hope** this "resolves" the issue with instances + busted v6 networks. In testing, I've found a few misbehaving hosts | 21:01 |
cloudnull | and have pulled them from the available pool. | 21:01 |
fungi | cloudnull: thanks! | 21:02 |
*** thorst_ has quit IRC | 21:02 | |
mtreinish | cloudnull: did you check it from a trusty vm by any chance? | 21:02 |
fungi | mtreinish: sdague: ^ keep an eye out for continued hits | 21:02 |
cloudnull | IDK if that makes the localhost routing thing happy, but getting there. | 21:02 |
cloudnull | mtreinish: no, i did it w/ xenial | 21:02 |
mtreinish | cloudnull: because that was another side of the equation we saw. The failures were only happening on trusty jobs | 21:03 |
cloudnull | I can do it w/ trusty | 21:03 |
cloudnull | mtreinish: the localhost failures were on trusty ? | 21:04 |
mtreinish | cloudnull: yep | 21:04 |
cloudnull | ok. i'll give that a go too | 21:04 |
*** javeriak has joined #openstack-infra | 21:04 | |
*** sdague has quit IRC | 21:04 | |
*** tqtran has joined #openstack-infra | 21:07 | |
*** yamamoto has joined #openstack-infra | 21:08 | |
openstackgerrit | Merged openstack-infra/project-config: Added documentation draft jobs for nodejs-based projects https://review.openstack.org/346130 | 21:09 |
*** gomarivera has quit IRC | 21:10 | |
fungi | cloudnull: `nc 2001:4800:1ae1:18:f816:3eff:fed4:f536 198851` to see a log of an instance which showed failing traceroute6 as recently as 10 minutes ago. uuid is 553a91ef-fe3d-4c14-965a-419cf93acbba | 21:10 |
openstackgerrit | Merged openstack/python-jenkins: Remove discover from test-requirements https://review.openstack.org/345764 | 21:10 |
*** Apoorva_ has quit IRC | 21:11 | |
cloudnull | I can ping that node . | 21:12 |
fungi | cloudnull: i ssh'd into it, and `traceroute6 git.openstack.org` continues to fail for me there | 21:12 |
*** Apoorva has joined #openstack-infra | 21:12 | |
fungi | want me to hold it? | 21:12 |
cloudnull | hum. are the routes set? | 21:12 |
cloudnull | if you could | 21:12 |
fungi | okay, it's held | 21:13 |
cloudnull | does ``host git.openstack.org`` work? | 21:13 |
*** dizquierdo has quit IRC | 21:13 | |
fungi | yeah, and it also resolved it correctly for the traceroute6 | 21:13 |
fungi | just gets no responses back to its datagram probes | 21:14 |
fungi | default via fe80::def dev eth0 proto ra metric 1024 expires 1787sec hoplimit 64 | 21:14 |
fungi | which i take it is the linklocal of the next hop | 21:14 |
cloudnull | hum. | 21:14 |
fungi | fe80::def dev eth0 lladdr 00:05:73:a0:00:06 router REACHABLE | 21:14 |
* cloudnull looking at the compute node | 21:14 | |
*** yamamoto has quit IRC | 21:15 | |
fungi | `ping6 git.openstack.org` from it works fine | 21:16 |
*** Hal has joined #openstack-infra | 21:16 | |
cloudnull | well thats odd. | 21:16 |
fungi | it's possible that these failing v6 traceroutes are correlated to "trusty in osic" and the job failures are also correlated to "trusty in osic" but that the two behaviors are unrelated | 21:18 |
*** matrohon has joined #openstack-infra | 21:18 | |
cloudnull | fungi: its missing all the hops ? or just fails all together? | 21:18 |
*** gyee has quit IRC | 21:20 | |
*** edtubill has quit IRC | 21:20 | |
cloudnull | also does cloning from git.o.o work and _not_ take the 4 some odd minutes. | 21:20 |
*** sarob has joined #openstack-infra | 21:20 | |
*** gomarivera has joined #openstack-infra | 21:20 | |
*** e0ne has quit IRC | 21:20 | |
fungi | cloudnull: missing all hops | 21:21 |
*** jcoufal_ has quit IRC | 21:22 | |
fungi | we have trusty servers elsewhere with working ipv6 and the basic ip6tables -L output matches | 21:22 |
fungi | and i can successfully traceroute6 to stuff from those | 21:22 |
*** jordanP has joined #openstack-infra | 21:23 | |
*** jkilpatr has quit IRC | 21:23 | |
*** spzala_ has quit IRC | 21:23 | |
cloudnull | does it miss all of the hops w/ something else, like google.com ? | 21:23 |
fungi | so it doesn't seem to be a misconfigured firewall rule or trusty-specific bug | 21:23 |
*** spzala has joined #openstack-infra | 21:23 | |
*** gyee has joined #openstack-infra | 21:23 | |
fungi | yep | 21:24 |
cloudnull | and from the sounds of it, everything is working? | 21:24 |
cloudnull | besides the traceroute that is | 21:24 |
fungi | right, so i think the traceroute errors we're getting in the logs may be unrelated to the slow git clones and to the glance-related errors in devstack | 21:25 |
mtreinish | fungi: heh, that'd be too much of a coincidence for me to have 2 separate issues with trusty + osic involving talking to git.o.o | 21:25 |
*** sarob has quit IRC | 21:25 | |
fungi | it's something we can (and should) dig into, but i'm unconvinced it's a marker for the other issues | 21:25 |
*** sarob has joined #openstack-infra | 21:26 | |
fungi | mtreinish: well, i have a node held where traceroute6 to git.o.o times out, but pin6 to it works fine and git cloning from it works fine | 21:26 |
fungi | s/pin6/ping6/ | 21:26 |
*** rcernin has quit IRC | 21:27 | |
fungi | i'm going to try to find more examples of traceroute6 _working_ from osic, and see if any of them are on trusty | 21:27 |
*** javeriak has quit IRC | 21:28 | |
*** spzala has quit IRC | 21:28 | |
cloudnull | it seems suspect, but im going to rope in some of our network folks to see whats what, | 21:28 |
cloudnull | maybe a misconfiguration somewhere in the path. | 21:29 |
fungi | i have to step away for a bit though and eat dinner. bbiab | 21:30 |
cloudnull | kk, ttyl | 21:32 |
cloudnull | enjoy dinner. | 21:32 |
*** jordanP has quit IRC | 21:32 | |
*** matrohon has quit IRC | 21:32 | |
*** annegentle has quit IRC | 21:32 | |
*** baoli has quit IRC | 21:33 | |
jeblair | fungi, pabelanger: i'm still not quite at the bottom of the rabbit hole, but i think i'm getting close. neither systemd nor haveged alone is sufficient to initialize urandom. systemd's entropy is not counted at all. during the initialization phase, all system entropy goes to the urandom pool, *unless* it comes in via ioctl, which is what haveged does. in that case, it goes straight to the input pool, and either none of it, or at ... | 21:33 |
jeblair | ... least not enough of it spills over (really, this is a thing) into the nonblocking (urandom) pool for it to be initialized. eventually, it's regular system entropy which pushes it over the 128 bit threshold. | 21:33 |
jeblair | i have good news though | 21:34 |
jeblair | ted ts'o ripped all of this out last month: https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=e192be9d9a30555aae2ca1dc3aad37cba484cd4a | 21:35 |
jeblair | so it's going to get better | 21:35 |
jeblair | i have one more kernel recompile i want to do, then i think i'll be ready to try the experiment with the other generator | 21:36 |
*** tqtran has quit IRC | 21:37 | |
*** tqtran has joined #openstack-infra | 21:37 | |
cloudnull | fungi: when you get back, if you would not mind, ``traceroute -6 -T git.openstack.org`` which forces TCP instead of the assumed UDP | 21:38 |
cloudnull | also same for -I | 21:38 |
*** jheroux has quit IRC | 21:38 | |
cloudnull | which is forcing ICMP, maybe the UDP packets are getting dropped/deprioritized in the path? | 21:38 |
cloudnull | I'd be curious if that too fails | 21:39 |
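A small helper in the spirit of that suggestion (a hedged sketch: it assumes the Linux traceroute that understands -6/-T/-I, and TCP/ICMP probes generally need root): run the same destination with UDP, TCP, and ICMP probes and count how many hops answered, to tell protocol-specific filtering apart from a genuinely dead path.

```python
# Hedged sketch: compare UDP (default), TCP (-T) and ICMP (-I) traceroute
# probes to the same host. -T and -I usually require root.
import subprocess

def answered_hops(output):
    # A hop that never answered prints only asterisks after its hop number.
    hops = 0
    for line in output.splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[0].isdigit() and set(parts[1:]) != {"*"}:
            hops += 1
    return hops

def compare(host="git.openstack.org"):
    for label, extra in (("udp", []), ("tcp", ["-T"]), ("icmp", ["-I"])):
        proc = subprocess.run(["traceroute", "-6"] + extra + [host],
                              capture_output=True, text=True, timeout=300)
        print("%s probes: %d hops answered" % (label, answered_hops(proc.stdout)))

if __name__ == "__main__":
    compare()
```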
*** thorst_ has joined #openstack-infra | 21:39 | |
*** gomarivera has quit IRC | 21:40 | |
*** rhallisey has quit IRC | 21:41 | |
*** tqtran has quit IRC | 21:42 | |
dmsimard | o/ I'm trying to find where the Cirros image gets pre-cached in the nodepool images. I searched for "cirros-0.3.4-x86_64-disk.img" in project-config and system-config but no luck :( | 21:42 |
dmsimard | I know it ends up in '~/cache/files' but I want to know how. | 21:43 |
*** sarob has quit IRC | 21:43 | |
*** thorst_ has quit IRC | 21:43 | |
*** ldnunes has quit IRC | 21:43 | |
jeblair | dmsimard: devstack i think | 21:44 |
mtreinish | fungi, rcarrillocruz: https://github.com/eclipse/mosquitto/commit/ba2de8879008f6df90a0d6af5902926483051124 the mosquitto bug got fixed | 21:44 |
*** sarob has joined #openstack-infra | 21:44 | |
mtreinish | jeblair: yeah, devstack has a command which exports a list of images to precache and the nodepool image scripts call that | 21:44 |
dmsimard | jeblair: ah, found it, ty https://github.com/openstack-dev/devstack/blob/06f3639a70dc5884107a4045bef5a9de1fb725a5/stackrc#L645 | 21:44 |
*** nmagnezi has quit IRC | 21:45 | |
beagles | the irony would be this network thing being MTU and neutron related | 21:45 |
mtreinish | jeblair, dmsimard: http://git.openstack.org/cgit/openstack-dev/devstack/tree/tools/image_list.sh | 21:45 |
mtreinish | dmsimard: and http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/cache_devstack.py | 21:46 |
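Roughly how that pre-cache step fits together, as a minimal sketch rather than the real cache_devstack.py: ask devstack's tools/image_list.sh for the URLs it would download (cirros included) and stash each file under ~/cache/files. The devstack checkout path and the exact output format of image_list.sh are assumptions here.

```python
# Minimal sketch (not the real cache_devstack.py): pre-download the images
# devstack says it needs into ~/cache/files so jobs can use local copies.
import os
import subprocess
import urllib.request

CACHE_DIR = os.path.expanduser("~/cache/files")
DEVSTACK = "/opt/git/openstack-dev/devstack"   # placeholder checkout path

def image_urls():
    # tools/image_list.sh prints the image URLs devstack would fetch;
    # split on whitespace/commas in case the list is comma-separated.
    out = subprocess.check_output(
        [os.path.join(DEVSTACK, "tools", "image_list.sh")])
    return [u for u in out.decode().replace(",", "\n").split()
            if u.startswith("http")]

def cache_images():
    os.makedirs(CACHE_DIR, exist_ok=True)
    for url in image_urls():
        dest = os.path.join(CACHE_DIR, os.path.basename(url))
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    cache_images()
```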
*** admcleod_ has joined #openstack-infra | 21:46 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Deploy minimal services in multinode job https://review.openstack.org/355097 | 21:46 |
*** rbrndt has joined #openstack-infra | 21:46 | |
*** matrohon has joined #openstack-infra | 21:46 | |
*** weshay has quit IRC | 21:47 | |
*** nwkarste_ has quit IRC | 21:48 | |
*** nwkarsten has joined #openstack-infra | 21:48 | |
*** njohnston has joined #openstack-infra | 21:48 | |
cloudnull | beagles: you may be onto something there. we're using an MTU of 9000 on the hosts, maybe something is off on nodes that are showing signs of failure. | 21:48 |
*** admcleod has quit IRC | 21:49 | |
* beagles facepalm | 21:49 | |
*** fguillot has quit IRC | 21:50 | |
njohnston | Hi, I have a quick question about change 339246 - it has been sitting in the zuul UI in the check queue, all tests having completed, for over an hour now I believe. Will it ever post its results so the change can move on to the gate queue? | 21:50 |
*** matt-borland has quit IRC | 21:51 | |
*** gomarivera has joined #openstack-infra | 21:51 | |
*** annegentle has joined #openstack-infra | 21:51 | |
cloudnull | beagles: sadly not the problem | 21:51 |
cloudnull | :'( | 21:52 |
*** ggillies has joined #openstack-infra | 21:52 | |
cloudnull | i kinda wish it was it would've been simple to fix... | 21:52 |
*** nwkarsten has quit IRC | 21:52 | |
beagles | :( | 21:52 |
fungi | cloudnull: yeah, same behavior with traceroute6 -T as with the default method. however slightly different behavior with -I... first attempt none of the hops gave a response except git.openstack.org and it only responded to the second probe, but then rerunning with -I a second time worked correctly (-T and default protocols still do not however) | 21:53 |
*** thorst_ has joined #openstack-infra | 21:54 | |
cloudnull | fungi: traceroute6 or traceroute -6 ? | 21:54 |
*** tqtran has joined #openstack-infra | 21:54 | |
fungi | cloudnull: traceroute6 | 21:55 |
fungi | trying now with traceroute -6 and various options (i didn't know traditional traceroute grew a -6 option) | 21:55 |
cloudnull | yea, that was news to me today too :) | 21:56 |
pabelanger | jeblair: wow, that is a rabbit hole | 21:56 |
openstackgerrit | Merged openstack-infra/system-config: Disable puppet service on boot https://review.openstack.org/356004 | 21:56 |
*** annegentle has quit IRC | 21:58 | |
fungi | jeblair: yeah, i'm aware ted ts'o has been heavily revamping the entropy gathering and rng stuff kernel-side. very excited for that to finally land | 21:58 |
*** sarob has quit IRC | 21:58 | |
*** edmondsw has quit IRC | 21:58 | |
*** thiagop has quit IRC | 21:58 | |
*** amitgandhinz has quit IRC | 21:58 | |
fungi | it's been all abuzz on the post-cypherpunks crypto lists | 21:59 |
*** tqtran has quit IRC | 21:59 | |
*** piet has quit IRC | 22:00 | |
jeblair | fungi, pabelanger: on boot, the first pull from urandom results in a transfer of 0 bits of entropy from the input pool to the nonblocking pool. | 22:00 |
*** spzala has joined #openstack-infra | 22:00 | |
fungi | that brings a tear to my eye | 22:00 |
jeblair | fungi, pabelanger: that transfer of 0 bits causes a timer to start which protects urandom from draining the input pool too quickly. | 22:00 |
jeblair | fungi, pabelanger: which means that later, after haveged dumps 4096 bits of entropy into the input pool, the system waits 60 seconds before it will allow a transfer from input to nonblocking for urandom to reseed | 22:01 |
*** nwkarsten has joined #openstack-infra | 22:01 | |
jeblair | which is why we were seeing an almost exactly 60 second delay | 22:01 |
jeblair | and when i turned off haveged, the 90 seconds was just how long it took to naturally accumulate entropy from interrupts one bit at a time | 22:02 |
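That sequence can be sanity-checked with a toy model (not kernel code, just the behaviour described above): the first urandom read arms a 60-second reseed timer even though it transfers 0 bits, so a large credited contribution at ~2s sits in the input pool until the timer expires.

```python
# Toy model of the pre-4.8 behaviour described above; deliberately simplified
# and not the actual drivers/char/random.c logic.
RESEED_INTERVAL = 60   # seconds between allowed input->nonblocking pulls
INIT_THRESHOLD = 128   # bits the nonblocking pool needs to report "initialized"

class ToyPools:
    def __init__(self):
        self.input_bits = 0
        self.nonblocking_bits = 0
        self.last_pull = None

    def ioctl_add_entropy(self, bits):
        # RNDADDENTROPY (haveged/rngd) credits the *input* pool.
        self.input_bits += bits

    def read_urandom(self, now):
        # A read pulls from the input pool only if the interval has passed;
        # crucially, even a 0-bit pull arms the timer.
        if self.last_pull is None or now - self.last_pull >= RESEED_INTERVAL:
            self.nonblocking_bits += self.input_bits
            self.input_bits = 0
            self.last_pull = now
        return self.nonblocking_bits >= INIT_THRESHOLD

pools = ToyPools()
pools.read_urandom(0)            # early-boot read: 0 bits moved, timer armed
pools.ioctl_add_entropy(4096)    # haveged dumps entropy at ~2s
print(pools.read_urandom(30))    # False: still inside the 60s window
print(pools.read_urandom(61))    # True: pull allowed, pool "initializes"
```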
fungi | cloudnull: confirmed, traceroute -6 $* gives me identical behavior to traceroute6 $* | 22:02 |
cloudnull | bummer. | 22:02 |
fungi | right down to the strangeness with -I | 22:02 |
cloudnull | off to the next rabbit hole | 22:02 |
*** gomarivera has quit IRC | 22:03 | |
bkero | jeblair: Trying to gather entropy inside a VM? | 22:03 |
*** mriedem has quit IRC | 22:04 | |
*** gomarivera has joined #openstack-infra | 22:05 | |
fungi | bkero: specifically, trying to get unbound to not wait 60 seconds from boot before it can start, since that causes all other services starting and trying to resolve names via dns to bomb | 22:05 |
fungi | and unbound wants a working /dev/urandom to be able to do stuff for dnssec | 22:06 |
*** nwkarsten has quit IRC | 22:06 | |
fungi | and the kernel makes /dev/urandom basically useless for a full minute after boot starting with linux 3.17 | 22:06 |
anteaya | njohnston: all tests are not complete on 339246, | 22:07 |
anteaya | njohnston: all tests are not complete on 339246, | 22:07 |
anteaya | njohnston: all tests are not complete on 339246, | 22:08 |
*** devkulkarni1 has quit IRC | 22:08 | |
anteaya | njohnston: all tests are not complete on 339246, | 22:08 |
* fungi thinks anteaya is caught in a loop | 22:08 | |
anteaya | njohnston: all tests are not complete on 339246, one test | 22:08 |
bkero | fungi: weird. i would have assumed that urandom would be (as the name says) unblocking | 22:08 |
anteaya | njohnston: all tests are not complete on 339246, one test is waiting for a node: | 22:08 |
anteaya | njohnston: all tests are not complete on 339246, one test is waiting for a node: | 22:08 |
*** rbuzatu has joined #openstack-infra | 22:08 | |
anteaya | gate-tempest-dsvm-neutron-full-ubuntu-xenial | 22:08 |
anteaya | sorry for the multiple spam | 22:09 |
anteaya | my laptop was doing something weird with pasting | 22:09 |
anteaya | and I had scrolled up | 22:09 |
anteaya | my apologies | 22:09 |
fungi | bkero: yep, the kernel wants it to be safely seeded so processes don't rely on it before it's sufficiently entropic | 22:09 |
*** tqtran has joined #openstack-infra | 22:10 | |
fungi | and manages that by blocking on reads during that time | 22:10 |
bkero | fungi: Huh, man urandom has a little shell script to carry that randomness between reboots. Cute. | 22:11 |
jeblair | bkero: read scrollback from me today to understand why that doesn't help | 22:11 |
openstackgerrit | Merged openstack-infra/tripleo-ci: Use geard with keepalives https://review.openstack.org/352566 | 22:12 |
anteaya | fungi: and you had pinged me yesterday that the patch merged to allow anyone to compose electoral rolls, thank you for all your work on that | 22:12 |
anteaya | and zaro too | 22:12 |
*** rbuzatu has quit IRC | 22:13 | |
fungi | anteaya: yw! that and also the patch to expose submitted date via the rest api in change details are both in production, so the script is a good bit simpler now | 22:14 |
anteaya | yay simpler scripts! | 22:15 |
*** esberglu has quit IRC | 22:16 | |
bkero | jeblair: read scrollback. That's just unfun. | 22:16 |
jeblair | bkero: okay, well, i mean, i've been digging into a seriously complex subject all day. i'm not sure if you're trying to help or not. | 22:16 |
fungi | cloudnull: so, more spot checking, every trusty node i've found in osic has broken traceroute6, every xenial node i've found in osic seems to have a working traceroute6 | 22:16 |
*** mdrabe has quit IRC | 22:17 | |
beagles | is there a way to get a packet trace on hosts that are doing the 4 minute clone thing | 22:17 |
bkero | jeblair: ignore me, just sympathies | 22:17 |
beagles | mmm | 22:17 |
bkero | If you're at the point of recompiling kernels to add printk()s I'm not going to be much help. | 22:17 |
beagles | actually that probably wouldn't help - a retransmit doesn't tell you why | 22:17 |
notmorgan | bkero: oh god | 22:17 |
fungi | beagles: what's a packet trace? do you mean route trace or a packet capture? | 22:17 |
jeblair | bkero: if you would like to help, i'm happy to have it, just not sure to what degree i should invest in bootstrapping you -- not reading scrollback suggests you may not be very invested. :) | 22:18 |
notmorgan | bkero: recompiling kernels.... i... nooooooo | 22:18 |
fungi | beagles: sounds like you meant a packet capture | 22:18 |
beagles | fungi, I was referring to capture | 22:18 |
beagles | yeah | 22:18 |
bkero | jeblair: I meant I did read scrollback and was offering sympathies | 22:18 |
jeblair | bkero: ah, thanks on all accounts then :) | 22:18 |
fungi | beagles: we'd need to catch one of those instances while the job was running, since nodepool deletes them immediately on failure | 22:18 |
jeblair | bkero: i read your 'read' as 'read' when you meant 'read. | 22:19 |
jeblair | bkero: more like "i just read scrollback" and less like "what? me read scrollback" :) | 22:19 |
bkero | yeah | 22:19 |
bkero | My phrasing could have been better | 22:19 |
beagles | fungi, I could probably point you in the right direction there.. I don't know if a packet trace will help or not, but it might provide some kind of clue as to what the "profile" of the poor connection is | 22:20 |
fungi | beagles: we're running down alternate theories involving other anomalous symptoms we're able to observe, in hopes that they're related enough to provide an indicator | 22:20 |
beagles | fungi, ack | 22:20 |
*** tqtran has quit IRC | 22:20 | |
fungi | beagles: we've also got glance doing something odd in certain devstack jobs only on trusty nodes in osic, and traceroute6 not working correctly on trusty in osic (while xenial seems to be doing fine) | 22:21 |
jeblair | fungi, bkero, pabelanger: i've moved on to investigating why rng-tools makes this better -- somehow it has tickled a code path where entropy is transferred from the input pool to the nonblocking pool more often than once per 60 seconds | 22:21 |
jeblair | so i may not understand that timer fully... | 22:21 |
jeblair | (also, that timer value can be set in proc, but that's not the solution i'd like to take) | 22:22 |
*** gordc has quit IRC | 22:22 | |
bkero | Hahaha, wow. The char/random.c is copyright Matt Mackall of Mercurial fame. | 22:22 |
fungi | jeblair: there was an ubuntu bug that talked some about that. lemme see if i can dig it back up. something about the kernel not allowing userspace to advance the entropy pool directly, but the method rng-tools uses bypasses that | 22:22 |
fungi | the idea is that less privileged processes should be able to add to entropy, while still not trusted to actually provide good quality entropy | 22:24 |
jeblair | fungi: hrm, it *looks* like it's using the same ioctl that haveged is using... but yeah, that may be helpful | 22:24 |
jhesketh | Morning | 22:25 |
* bkero reading random.c, looks like adding entropy might not necessarily trigger 'crediting' the pool size. That might have to be done manually depending on the method used to add it. | 22:25 | |
jeblair | bkero: yes, that's why the 'save script' doesn't work. but haveged (and presumably rng tools) use the ioctl which does credit | 22:26 |
jeblair | bkero: sarch for RNDADDENTROPY: | 22:26 |
jeblair | bkero: hower, that goes to the *input* pool instead of the nonblocking (urandom) pool. so the thing that's missing is triggering a transfer from input to nonblocking | 22:26 |
bkero | Ahhh okay | 22:26 |
jeblair | sarch=search; hower=however | 22:27 |
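For reference, this is the general shape of that ioctl from userspace (a hedged sketch: the request value is _IOW('R', 0x03, int[2]) as it works out on x86-64, it needs CAP_SYS_ADMIN, and feeding os.urandom back in is only for illustration). Unlike a plain write to /dev/random, RNDADDENTROPY both mixes the bytes in and credits the input pool's entropy count.

```python
# Hedged sketch of the RNDADDENTROPY ioctl haveged/rngd use: mixes bytes in
# *and* credits the input pool, unlike a bare write to /dev/random.
import fcntl
import os
import struct

RNDADDENTROPY = 0x40085203   # _IOW('R', 0x03, int[2]) on x86-64 (assumption)

def add_credited_entropy(data, entropy_bits):
    # struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }
    payload = struct.pack("ii", entropy_bits, len(data)) + data
    with open("/dev/random", "wb") as dev:
        fcntl.ioctl(dev, RNDADDENTROPY, payload)

if __name__ == "__main__":
    # Needs CAP_SYS_ADMIN; credits 4096 bits in one call, roughly what the
    # discussion above describes haveged doing.
    add_credited_entropy(os.urandom(512), 4096)
```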
*** tqtran has joined #openstack-infra | 22:27 | |
rcarrillocruz | mtreinish: nice! | 22:27 |
*** sdake has quit IRC | 22:27 | |
*** rbuzatu has joined #openstack-infra | 22:28 | |
*** sdake has joined #openstack-infra | 22:28 | |
*** rbuzatu has joined #openstack-infra | 22:29 | |
anteaya | morning jhesketh | 22:29 |
*** dimtruck is now known as zz_dimtruck | 22:29 | |
*** zz_dimtruck is now known as dimtruck | 22:29 | |
jeblair | bkero, fungi, pabelanger: oh -- i think pulls can only happen once per 60 seconds, but i think if you add entropy with the ioctl, and the input pool is full, then it can schedule a transfer from input to nonblocking pools | 22:31 |
*** yamahata has quit IRC | 22:31 | |
jeblair | bkero, fungi, pabelanger: it's looking like rng-tools does multiple ioctls to add entropy -- along with, on my test system, haveged adding one of its own | 22:31 |
jeblair | so that's how adding rng-tools makes initialization happen faster | 22:31 |
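Taking that hypothesis at face value, a tiny model (again not kernel code) shows why many small credited adds could behave differently from one big one: once the input pool is full, the excess from each further ioctl would spill into the nonblocking pool and cross the 128-bit init threshold without waiting out the pull timer.

```python
# Toy illustration of the rngd-vs-haveged hypothesis above; not real kernel
# logic, just "overflow from a full input pool spills to the urandom pool".
INPUT_CAPACITY = 4096
INIT_THRESHOLD = 128

def initializes(ioctl_credits):
    input_bits, nonblocking_bits = 0, 0
    for bits in ioctl_credits:
        room = INPUT_CAPACITY - input_bits
        input_bits += min(bits, room)
        nonblocking_bits += max(bits - room, 0)   # spillover past a full pool
        if nonblocking_bits >= INIT_THRESHOLD:
            return True
    return False

print(initializes([4096]))        # one big haveged-style dump: no spill -> False
print(initializes([512] * 10))    # repeated rngd-style adds: spills -> True
```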
bkero | hm, ok | 22:32 |
* bkero looking what nonblocking_pool.initialized does | 22:32 | |
jeblair | presumably, convincing haveged to push more entropy than required (possibly via multiple ioctls) may do the same | 22:32 |
fungi | sounds like something worth testing | 22:33 |
jeblair | bkero: credit_entropy_bits has both the part i just described as well as the initialization threshold ("> 128") | 22:33 |
*** annegentle has joined #openstack-infra | 22:34 | |
*** tqtran has quit IRC | 22:36 | |
openstackgerrit | Varun Gadiraju proposed openstack-infra/project-config: Step 1 patch to project-config from bug #1609573 https://review.openstack.org/354344 | 22:36 |
openstack | bug 1609573 in Ironic "Ironic gate jobs should not pass configs through devstack-gate when possible" [Undecided,New] https://launchpad.net/bugs/1609573 - Assigned to Varun Gadiraju (varun-gadiraju) | 22:36 |
*** adrian_otto has quit IRC | 22:38 | |
*** tqtran has joined #openstack-infra | 22:39 | |
*** dimtruck is now known as zz_dimtruck | 22:39 | |
*** gouthamr has joined #openstack-infra | 22:39 | |
*** gouthamr_ has joined #openstack-infra | 22:40 | |
*** matrohon has quit IRC | 22:40 | |
*** tqtran has quit IRC | 22:43 | |
*** gouthamr has quit IRC | 22:44 | |
craige | o/ | 22:44 |
*** gouthamr_ is now known as gouthamr | 22:44 | |
*** yamamoto has joined #openstack-infra | 22:46 | |
openstackgerrit | Sai Sindhur Malleni proposed openstack-infra/project-config: Adding Ansible jobs for Browbeat https://review.openstack.org/356073 | 22:47 |
*** Apsu has joined #openstack-infra | 22:48 | |
pabelanger | jeblair: great info, thanks | 22:48 |
*** thorst_ is now known as thorst | 22:49 | |
*** yamahata has joined #openstack-infra | 22:51 | |
bkero | jeblair: could always ping mpm on freenode :) he wrote the code | 22:51 |
*** burgerk has quit IRC | 22:51 | |
bkero | jeblair: Could it be initialized, but maybe prandom_reseed_late() is being set too high? | 22:53 |
bkero | jeblair: I'm curious what prandom_seed_full_state() and prandom_bytes_state() would return | 22:54 |
jeblair | bkero: well, we see the "random: nonblocking pool is initialized" line in the logs right around 63 seconds, then urandom starts working. | 22:54 |
*** hockeynut has quit IRC | 22:55 | |
bkero | jeblair: That print happens after the timer is set | 22:55 |
bkero | I'm tracing prandom_reseed_late(); in random.c line 682 in v4.7 | 22:55 |
jeblair | bkero: oh! look at 4.4 | 22:56 |
jeblair | bkero: all this gets way better after https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=e192be9d9a30555aae2ca1dc3aad37cba484cd4a | 22:56 |
*** vhosakot has quit IRC | 22:56 | |
jeblair | bkero: but that's not what we're running :( | 22:56 |
bkero | jeblair: Still the same in 4.4 | 22:56 |
bkero | line 681 | 22:56 |
jeblair | bkero: that's the part where it initializes the urandom rng, right? | 22:57 |
bkero | jeblair: despite the name it looks like it actually seeds for the first time, so seed + reseed | 22:58 |
jeblair | bkero: you were asking if it could be initialized -- but we don't see the initialized line until 60+ seconds in (when the pull timer has expired) | 22:58 |
jeblair | so prandom_reseed_late isn't going to be called until then | 22:59 |
bkero | jeblair: I'm assuming this thing spins with the wake_up_all() on line 683 until rng is initialized, then prints out the message | 22:59 |
jeblair | bkero: i don't think this function spins at all | 23:00 |
bkero | That credit_entropy_bits codeblock section does: prandom_reseed_late(), process_random_ready_list(), wake_up_all(), then prints the message | 23:00 |
*** spzala has quit IRC | 23:01 | |
bkero | __prandom_reseed has a spinlock | 23:01 |
*** spzala has joined #openstack-infra | 23:01 | |
jeblair | bkero: that only happens after nonblocking pool gets 128 bits | 23:01 |
bkero | Yeah, I'm assuming that's happened. Maybe that's a false assumption. | 23:02 |
jeblair | bkero: it hasn't happened, because the only thing that can feed the nonblocking pool is entropy from interrupts (~ one bit per second) or a transfer from the input pool. | 23:03 |
bkero | I'd think the timers would be adding it too. http://lxr.free-electrons.com/source/drivers/char/random.c?v=4.4#L804 | 23:04 |
*** asettle has joined #openstack-infra | 23:05 | |
openstackgerrit | Matthew Treinish proposed openstack-infra/devstack-gate: SUPER WIP: Use new tempest run workflow https://review.openstack.org/355666 | 23:05 |
jeblair | bkero: not in practice on this system. but theoretically yes. | 23:05 |
*** xarses has quit IRC | 23:05 | |
*** hongbin has quit IRC | 23:05 | |
*** tqtran has joined #openstack-infra | 23:05 | |
*** spzala has quit IRC | 23:06 | |
cloudnull | jeblair fungi: i did some more tests of the traceroute issues using vanilla trusty 14.04 -- I built 127 vms, passed user data to them to install and use traceroute(6), and from the looks of it, it all works. | 23:06 |
cloudnull | VMS: http://cdn.pasteraw.com/rdq27ar1tcxag4zjal72vciufu2r28c | 23:06 |
cloudnull | oops, that console data shows the traceroute | 23:07 |
cloudnull | VMS http://cdn.pasteraw.com/7pgwxkqhmfvzc8pz4y3uwk9y6v6z58b | 23:07 |
cloudnull | all using trusty | 23:07 |
cloudnull | mtreinish: -cc ^ | 23:07 |
*** Hal has quit IRC | 23:08 | |
*** tpsilva has quit IRC | 23:08 | |
cloudnull | simple userdata passed in http://cdn.pasteraw.com/7n0b3yeculm5zl5w4g5y5indfn4izhx | 23:08 |
*** rbrndt has quit IRC | 23:09 | |
*** asettle has quit IRC | 23:09 | |
*** Hal has joined #openstack-infra | 23:09 | |
cloudnull | I also made sure all of the VMs were built on different compute nodes. | 23:09 |
fungi | cloudnull: yeah, this could be something odd with our image. i'm starting to dig around with tcpdump | 23:09 |
cloudnull | I have another battery of tests to run using our various AZs and other networks just to make sure everything is on the up and up , but im kinda at a loss... :'( | 23:11 |
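For anyone wanting to repeat that kind of sweep, here is a hedged sketch of the idea using python-novaclient: boot one probe VM per hypervisor with user-data that writes a traceroute6 to the console, then read each console log back. The credentials, image/flavor IDs, and user-data script are placeholders, and host pinning via the "nova:<host>" availability zone form needs admin rights.

```python
# Hedged sketch of a per-hypervisor IPv6 probe sweep; all credentials and
# IDs below are placeholders, not real infra values.
from keystoneauth1 import loading, session
from novaclient import client as nova_client

USERDATA = """#!/bin/bash
apt-get update && apt-get -y install traceroute
traceroute6 git.openstack.org > /dev/console 2>&1
"""

def make_nova():
    loader = loading.get_plugin_loader("password")
    auth = loader.load_from_options(
        auth_url="https://keystone.example.com/v3",   # placeholder
        username="admin", password="secret", project_name="admin",
        user_domain_name="Default", project_domain_name="Default")
    return nova_client.Client("2", session=session.Session(auth=auth))

def sweep(image_id, flavor_id):
    # One small VM per hypervisor, pinned with the admin-only nova:<host> form.
    nova = make_nova()
    servers = []
    for hyp in nova.hypervisors.list():
        servers.append(nova.servers.create(
            name="v6-probe-%s" % hyp.hypervisor_hostname,
            image=image_id, flavor=flavor_id, userdata=USERDATA,
            availability_zone="nova:%s" % hyp.hypervisor_hostname))
    return servers

def show_consoles(servers):
    # Read back the console output that the user-data script wrote.
    nova = make_nova()
    for srv in servers:
        print(srv.name)
        print(nova.servers.get(srv.id).get_console_output(length=40))
```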
bkero | jeblair: have you measured how many interrupts are being thrown at the boot of the system? Maybe it's not that many. | 23:11 |
*** xarses has joined #openstack-infra | 23:13 | |
cloudnull | fungi: this is the image i've been using http://cdn.pasteraw.com/eey9gn9gggaxyimu64dd79xwu0vjble | 23:14 |
jeblair | bkero: it's about 1 per second | 23:14 |
*** xyang1 has quit IRC | 23:14 | |
bkero | jeblair: If the only entropy source is interrupts, and add_interrupt_randomness triggered for each, that would only add 60 bits of randomness per minute | 23:15 |
bkero | add_interrupt_randomness() sets credit=0, and calls credit_entropy_bits(r, credit + 1). Since credit is never set except for on seed generators (PPC only) it's always 1. | 23:16 |
bkero | where 1 = 1 bit | 23:17 |
jeblair | bkero: yep. i'd expect it to initialize after 128 seconds. in practice, i saw it initialize after 90 seconds with no help. i can't immediately account for the 30 second discrepancy, but could reboot with both rng and haveged disabled to find out if it might be important. | 23:17 |
*** thorst has quit IRC | 23:17 | |
cloudnull | I've got to relocate home. bbl | 23:17 |
*** thorst has joined #openstack-infra | 23:17 | |
bkero | jeblair: I'm curious why add_timer_randomness() isn't working too | 23:17 |
bkero | NO_HZ? | 23:18 |
bkero | Maybe try nohz=off in the cmdline? | 23:18 |
fungi | cloudnull: fyi, our current suspect image is b9cb5844-82a6-4034-9d09-d651ec019c7b | 23:18 |
jeblair | bkero: possibly state->dont_count_entropy is true? | 23:19 |
jeblair | bkero: i only instrumented the entropy credit, not the mix_pool_bytes func | 23:19 |
*** tqtran has quit IRC | 23:19 | |
jeblair | bkero: so i know that it's not crediting, but i don't know whether the add_timer_randomness func is being called | 23:20 |
jeblair | bkero: here's the most recent boot: http://paste.openstack.org/show/558614/ this is with rng-tools rngd adding entropy starting around 19s | 23:21 |
jeblair | fungi, pabelanger, bkero: i strongly suspect the difference between rngd and haveged is that rngd writes data in smaller chunks which allows the overflow routine to happen to push the nonblocking pool over the limit and initialize | 23:24 |
*** adriant has joined #openstack-infra | 23:24 | |
bkero | jeblair: I think it would help a lot to print which entropy pool was being credited | 23:24 |
bkero | Can just print the memory address, there should only be a few. | 23:24 |
jeblair | bkero: credit pool nonblocking nbits 1 | 23:24 |
jeblair | bkero: nonblocking is a variable sub there, it's the name of the pool | 23:25 |
bkero | Hmm, what is entropy_count? | 23:25 |
jeblair | "credit from interrupt / credit pool nonblocking nbits 1 / credit entropy_count 6 / credit entropy_total 64" are all one event, and that's the order to read them in | 23:26 |
*** thorst has quit IRC | 23:26 | |
bkero | Aug 16 23:06:32 ubuntu kernel: [ 2.026124] random: write nonblocking pool 512 <-- doesn't look like that's getting credited. | 23:26 |
bkero | I'm betting that's systemd's seed thing | 23:27 |
jeblair | bkero: entropy_count is the value of that local variable before "if (unlikely(entropy_count < 0)) {" | 23:27 |
*** fguillot has joined #openstack-infra | 23:27 | |
jeblair | bkero: that's it exactly | 23:27 |
jeblair | bkero: that's a write to /dev/random rather than an ioctl, so it's added but not credited | 23:27 |
openstackgerrit | Abhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service https://review.openstack.org/355670 | 23:28 |
bkero | hrm | 23:28 |
bkero | That sounds like a bug | 23:28 |
bkero | Or maybe they don't want to credit userspace additions as a matter of security | 23:28 |
bkero | If that were the case I'd hope they would leave a comment though. | 23:28 |
jeblair | bkero: there are some comments that allude to that | 23:28 |
jeblair | considering it's systemd, i think it could have used the ioctl, but all these bugs are fixed in newer kernels anyway :) | 23:29 |
bkero | jeblair: RNDADDENTROPY should be crediting it unless write_pool()s return value is 0, but according to your log it's 0. | 23:31 |
* bkero reads the systemd source | 23:32 | |
jeblair | bkero: RNDADDENTROPY is the ioctl (haveged and rngd use it). random_write is the entry point for "cat > /dev/random" which is what systemd does. those are the write_pool calls that i have wrapped with those debug lines (the ones that print return codes). | 23:33 |
*** sarob has joined #openstack-infra | 23:33 | |
bkero | Yeah, systemd v229 just does: r = loop_write(random_fd, buf, (size_t) k, false); | 23:35 |
bkero | blah | 23:36 |
* bkero disappears for a bit | 23:38 | |
bkero | jeblair: Good luck figuring it out :( | 23:38 |
jeblair | bkero: thanks :) | 23:38 |
jeblair | mordred, fungi, pabelanger: rngd doesn't seem to mind if there is no hardware rng. it prints some error lines and continues. | 23:38 |
bkero | jeblair: maybe make a systemd unit file to call the ioctl with a few bytes to do things correctly? | 23:39 |
*** jklare has quit IRC | 23:39 | |
*** zz_dimtruck is now known as dimtruck | 23:40 | |
fungi | jeblair: yeah, in some cases it may consume from virt-rng i think. i'm not super familiar with what happens if that's not available either | 23:40 |
jeblair | bkero: yeah, possibly with the help of rngd or haveged | 23:41 |
jeblair | fungi: do you know how to tell if it's doing that? | 23:41 |
fungi | jeblair: i do not, no | 23:42 |
*** csomerville has quit IRC | 23:42 | |
*** aviau has quit IRC | 23:43 | |
*** aviau has joined #openstack-infra | 23:43 | |
jeblair | fungi: it seems to behave the same in rax as on osic | 23:43 |
*** moravec has quit IRC | 23:44 | |
*** cody-somerville has joined #openstack-infra | 23:45 | |
*** tqtran has joined #openstack-infra | 23:46 | |
fungi | virtio-rng i guess | 23:48 |
*** zhurong has joined #openstack-infra | 23:49 | |
*** moravec has joined #openstack-infra | 23:49 | |
fungi | qemu/kvm passthrough... i though xen had something similar | 23:49 |
fungi | thought | 23:49 |
*** bswartz has quit IRC | 23:49 | |
*** annegentle has quit IRC | 23:50 | |
*** ihrachys has joined #openstack-infra | 23:50 | |
*** tqtran has quit IRC | 23:51 | |
*** jerryz has quit IRC | 23:51 | |
*** tqtran has joined #openstack-infra | 23:54 | |
*** kbaegis has quit IRC | 23:55 | |
*** apetrich has quit IRC | 23:55 | |
*** kbaegis has joined #openstack-infra | 23:56 | |
mtreinish | jeblair: so I'm looking at the zuul snippet you pointed me to, and do you know if there is an example of what the job.arguments dict looks like or just the job object that gets passed to launch() | 23:56 |
mtreinish | because I'm not exactly sure what I have to work with for adding the node_image to the metadata there | 23:57 |
jeblair | mtreinish: i may be in a better position to help tomorrow; i don't think i can context switch right now, sorry. | 23:58 |
mtreinish | jeblair: ok, no worries | 23:58 |
*** jklare has joined #openstack-infra | 23:58 | |
*** amitgandhinz has joined #openstack-infra | 23:59 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!