*** clarkb has joined #openstack-infra | 00:06 | |
*** tosky has quit IRC | 00:30 | |
*** mgoddard has quit IRC | 00:51 | |
*** mgoddard has joined #openstack-infra | 00:57 | |
*** dviroel has quit IRC | 01:06 | |
*** noonedeadpunk has quit IRC | 01:19 | |
*** noonedeadpunk has joined #openstack-infra | 01:21 | |
*** zxiiro has quit IRC | 01:45 | |
*** hamalq has quit IRC | 01:47 | |
*** dmsimard has quit IRC | 02:16 | |
*** mtreinish has quit IRC | 02:16 | |
*** dmsimard has joined #openstack-infra | 02:17 | |
*** rcernin has quit IRC | 02:31 | |
*** zzzeek has quit IRC | 02:53 | |
*** zzzeek has joined #openstack-infra | 02:55 | |
*** prometheanfire has quit IRC | 02:58 | |
*** prometheanfire has joined #openstack-infra | 03:01 | |
*** prometheanfire has quit IRC | 03:22 | |
*** rcernin has joined #openstack-infra | 03:23 | |
*** rcernin has quit IRC | 03:28 | |
*** rcernin has joined #openstack-infra | 03:28 | |
*** prometheanfire has joined #openstack-infra | 03:36 | |
*** ykarel has joined #openstack-infra | 03:52 | |
*** ykarel_ has joined #openstack-infra | 03:58 | |
*** ykarel has quit IRC | 04:01 | |
*** ysandeep|away is now known as ysandeep|ruck | 04:54 | |
*** ykarel_ has quit IRC | 04:57 | |
*** vishalmanchanda has joined #openstack-infra | 05:27 | |
*** ianw has quit IRC | 06:18 | |
*** ianw has joined #openstack-infra | 06:19 | |
*** zzzeek has quit IRC | 06:26 | |
*** zzzeek has joined #openstack-infra | 06:29 | |
*** slaweq has joined #openstack-infra | 06:35 | |
*** mugsie has quit IRC | 06:37 | |
tkajinam | is anybody from the infra team around? | 06:47
tkajinam | I noticed some of the CI jobs are not being invoked, although we haven't made any changes to the zuul configuration in the individual repos | 06:48
tkajinam | https://review.opendev.org/c/openstack/puppet-nova/+/764763 | 06:48 |
tkajinam | for example, puppet-openstack-integration-* jobs are now not executed in the check queue | 06:48
tkajinam | was there any change made at the infra layer? | 06:48
*** ysandeep|ruck is now known as ysandeep|lunch | 07:00 | |
*** jcapitao has joined #openstack-infra | 07:12 | |
*** amoralej|off is now known as amoralej | 07:37 | |
*** eolivare has joined #openstack-infra | 07:49 | |
*** sboyron_ has joined #openstack-infra | 08:10 | |
*** piotrowskim has joined #openstack-infra | 08:14 | |
*** andrewbonney has joined #openstack-infra | 08:23 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 08:24 | |
*** rpittau|afk is now known as rpittau | 08:37 | |
*** rcernin has quit IRC | 08:40 | |
*** tosky has joined #openstack-infra | 08:44 | |
*** hashar has joined #openstack-infra | 08:45 | |
*** dtantsur|afk is now known as dtantsur | 08:53 | |
*** xek has joined #openstack-infra | 08:54 | |
*** jpena|off is now known as jpena | 08:59 | |
*** lucasagomes has joined #openstack-infra | 09:08 | |
*** yamamoto has quit IRC | 09:13 | |
*** rcernin has joined #openstack-infra | 09:18 | |
*** rcernin has quit IRC | 09:35 | |
*** yamamoto has joined #openstack-infra | 09:48 | |
*** zzzeek has quit IRC | 09:53 | |
*** zzzeek has joined #openstack-infra | 09:53 | |
*** yamamoto has quit IRC | 10:01 | |
*** derekh has joined #openstack-infra | 10:13 | |
*** yamamoto has joined #openstack-infra | 10:18 | |
*** derekh has quit IRC | 10:29 | |
*** derekh has joined #openstack-infra | 10:29 | |
*** ociuhandu has joined #openstack-infra | 10:31 | |
*** ociuhandu has quit IRC | 10:40 | |
*** ociuhandu has joined #openstack-infra | 10:46 | |
*** ociuhandu has quit IRC | 10:46 | |
*** ociuhandu has joined #openstack-infra | 10:47 | |
*** yamamoto has quit IRC | 10:49 | |
*** ociuhandu has quit IRC | 10:52 | |
*** yamamoto has joined #openstack-infra | 10:53 | |
*** yamamoto has quit IRC | 10:55 | |
*** yamamoto has joined #openstack-infra | 10:58 | |
*** dviroel has joined #openstack-infra | 11:01 | |
*** ociuhandu has joined #openstack-infra | 11:01 | |
*** yamamoto has quit IRC | 11:02 | |
*** gfidente has joined #openstack-infra | 11:07 | |
*** ociuhandu has quit IRC | 11:13 | |
*** mugsie has joined #openstack-infra | 11:13 | |
*** ociuhandu has joined #openstack-infra | 11:17 | |
*** yamamoto has joined #openstack-infra | 11:21 | |
*** vishalmanchanda has quit IRC | 11:22 | |
*** vishalmanchanda has joined #openstack-infra | 11:29 | |
*** yamamoto has quit IRC | 11:32 | |
*** ysandeep|ruck is now known as ysandeep|afk | 11:34 | |
*** yamamoto has joined #openstack-infra | 11:39 | |
*** yamamoto has quit IRC | 11:44 | |
*** yamamoto has joined #openstack-infra | 11:46 | |
*** jcapitao is now known as jcapitao_lunch | 12:12 | |
*** ociuhandu has quit IRC | 12:13 | |
*** ociuhandu has joined #openstack-infra | 12:14 | |
*** ysandeep|afk is now known as ysandeep|ruck | 12:16 | |
*** ociuhandu has quit IRC | 12:21 | |
*** slaweq has quit IRC | 12:23 | |
*** mtreinish has joined #openstack-infra | 12:27 | |
*** jpena is now known as jpena|lunch | 12:27 | |
*** rlandy has joined #openstack-infra | 12:32 | |
*** ramishra has quit IRC | 12:41 | |
*** ramishra has joined #openstack-infra | 12:41 | |
*** ramishra has quit IRC | 12:42 | |
*** ramishra has joined #openstack-infra | 12:44 | |
*** redrobot has joined #openstack-infra | 13:01 | |
*** yamamoto has quit IRC | 13:07 | |
*** jcapitao_lunch is now known as jcapitao | 13:08 | |
*** amoralej is now known as amoralej|lunch | 13:12 | |
*** yamamoto has joined #openstack-infra | 13:13 | |
*** yamamoto has quit IRC | 13:18 | |
*** yamamoto has joined #openstack-infra | 13:18 | |
*** yamamoto has quit IRC | 13:18 | |
*** ociuhandu has joined #openstack-infra | 13:23 | |
*** jpena|lunch is now known as jpena | 13:28 | |
*** rcernin has joined #openstack-infra | 13:33 | |
*** rcernin has quit IRC | 13:38 | |
*** yamamoto has joined #openstack-infra | 13:56 | |
*** tbarron|out has quit IRC | 13:59 | |
*** nweinber has joined #openstack-infra | 14:07 | |
*** yamamoto has quit IRC | 14:08 | |
*** amoralej|lunch is now known as amoralej | 14:17 | |
fungi | tkajinam: i'll take a look and see if i can spot the problem | 14:17 |
tkajinam | fungi, thanks a lot ! | 14:18 |
fungi | tkajinam: can you be specific about which jobs you expected aren't running for that change? | 14:19 |
fungi | it looks like it's running 4 jobs on that change | 14:19 |
fungi | oh, sorry, you said puppet-openstack-integration-* | 14:20
tkajinam | fungi, this is an example which doesn't get the expected ci jobs https://review.opendev.org/c/openstack/puppet-ceilometer/+/775730 | 14:20 |
tkajinam | we expect items like this https://review.opendev.org/c/openstack/puppet-ceilometer/+/765100 | 14:20 |
* fungi drinks some more coffee and tries again ;) | 14:20 | |
*** slaweq has joined #openstack-infra | 14:21 | |
tkajinam | fungi, I sent an email to openstack-discuss. I'll be away soon because it is getting late here, so it would be nice if you could share your findings in a reply to that email | 14:21
fungi | tkajinam: yep, thanks i'll follow up there | 14:21 |
tkajinam | fungi, one more input. It seems that it is affecting not only master but also some stable branches | 14:23 |
tkajinam | https://review.opendev.org/c/openstack/puppet-oslo/+/774118 | 14:23 |
fungi | i see puppet-openstack-integration-6-scenario001-tempest-ubuntu-bionic running against a puppet-ceilometer change yesterday | 14:23 |
fungi | and on a puppet-nova change over the weekend | 14:24 |
tkajinam | it should have puppet-openstack-unit-* and puppet-openstack-integration-6-scenario00*-tempest-centos-8 | 14:24 |
tkajinam | yeah | 14:24 |
tkajinam | interesting thing is that the expected jobs were triggered for 765100 when we merged it at 18:00 UTC | 14:25
tkajinam | but when I submitted 775730 at 02:00 UTC, some jobs were missing | 14:26
tkajinam | and there were no changes made in the puppet repos between these two timestamps | 14:26
tkajinam | iiuc | 14:26 |
fungi | tkajinam: keep in mind that not all jobs run on all changes; it can depend on how the jobs or their parent jobs are defined and on which files are being changed | 14:27
fungi | for example this is an ancestor of that job: https://zuul.opendev.org/t/openstack/job/puppet-openstack-integration-run-base | 14:27 |
tkajinam | fungi, yeah I understand that point | 14:28 |
*** ociuhandu has quit IRC | 14:28 | |
fungi | if the only files being changed are in the irrelevant_files list there, and the descendants of that job don't override it, then the job won't be included | 14:28
tkajinam | fungi, but unfortunately I don't see anything clearly explaining the difference. | 14:28 |
tkajinam | I mean, both commits make changes to files under the manifests directory, so they should trigger that job | 14:29
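For readers following along: the filtering fungi describes is expressed in the Zuul job definition itself. A minimal sketch, with made-up file patterns rather than the actual puppet-openstack-integration config:

```yaml
# Hypothetical sketch of irrelevant_files filtering in a Zuul job.
# If every file a change touches matches one of these patterns, Zuul
# skips the job; child jobs inherit the list unless they override it.
- job:
    name: puppet-openstack-integration-run-base
    parent: base
    irrelevant_files:
      - ^.*\.md$
      - ^doc/.*$
      - ^releasenotes/.*$
```

A change touching files under manifests/ would not match any of these, so, as tkajinam notes, the filter alone cannot explain the missing jobs.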
*** ociuhandu has joined #openstack-infra | 14:29 | |
fungi | looks like those jobs chain up to puppet-openstack-integration-base which is no longer defined? if i click on it in the job browser zuul gives an error: https://zuul.opendev.org/t/openstack/job/puppet-openstack-integration-base | 14:30 |
fungi | i'll see if i can tell where that is/was defined | 14:31 |
tkajinam | fungi, it is defined here https://github.com/openstack/puppet-openstack-integration/blob/master/zuul.d/base.yaml#L3 | 14:31 |
fungi | tkajinam: aha, here we are: https://zuul.opendev.org/t/openstack/config-errors | 14:31 |
fungi | look in there for "puppet" | 14:31 |
fungi | "Unknown projects: openstack/tempest-horizon" | 14:31 |
tkajinam | ahh ok | 14:32 |
tkajinam | that makes sense | 14:32
fungi | that project got retired recently, and it's still listed as a required-project | 14:32 |
tkajinam | we should remove that because it was retired | 14:32 |
tkajinam | yeah | 14:32 |
tkajinam | that explains why things were broken without any change in the puppet repos | 14:32
fungi | yep | 14:32 |
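To make the failure mode concrete: when a job's required-projects names a repository that no longer exists in the tenant, Zuul treats the whole job definition as a config error, and every job inheriting from it silently stops running. A sketch, assuming a definition shaped like the one in puppet-openstack-integration's zuul.d/base.yaml:

```yaml
# Sketch of the broken state. Once openstack/tempest-horizon was retired
# and removed from the tenant, Zuul could no longer load this job and
# reported "Unknown projects: openstack/tempest-horizon". The fix is to
# drop the stale entry.
- job:
    name: puppet-openstack-integration-base
    required-projects:
      - openstack/puppet-openstack-integration
      - openstack/tempest-horizon   # retired; remove this line
```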
tkajinam | fungi, thanks a lot. I'll look into it | 14:33 |
fungi | looks like openstack-ansible and vexxhost's openstack-operator are also affected: https://codesearch.opendev.org/?q=openstack/tempest-horizon | 14:33 |
tkajinam | yeah, I still see some projects with references to it | 14:37
*** ysandeep|ruck is now known as ysandeep|away | 14:43 | |
*** slaweq has quit IRC | 14:46 | |
*** sboyron_ is now known as sboyron | 14:51 | |
*** ociuhandu has quit IRC | 14:52 | |
*** ociuhandu has joined #openstack-infra | 14:52 | |
*** ociuhandu has quit IRC | 14:57 | |
*** jcapitao_ has joined #openstack-infra | 15:03 | |
*** jcapitao has quit IRC | 15:06 | |
*** ociuhandu has joined #openstack-infra | 15:08 | |
*** jcapitao_ is now known as jcapitao | 15:09 | |
*** hashar has quit IRC | 15:29 | |
*** rcernin has joined #openstack-infra | 15:34 | |
*** dklyle has joined #openstack-infra | 15:37 | |
*** rcernin has quit IRC | 15:39 | |
frickler | jrosser: noonedeadpunk: that could explain the jobs not running you mentioned earlier ^^ | 15:39 |
noonedeadpunk | we don't use tests from https://opendev.org/openstack/openstack-ansible-os_horizon/src/branch/master/tests/os_horizon-overrides.yml#L26-L29 | 15:41 |
noonedeadpunk | if you're about that | 15:41 |
fungi | yeah, there are no related zuul config errors for ansible, i just happened to notice that file when doing a codesearch so thought i'd point it out as potential cleanup | 15:42 |
fungi | the errors seem to all be for various puppet-openstack branches and also some old horizon branches | 15:43 |
noonedeadpunk | we've added a test here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/utility_all.yml#L103 but it seems that `tempest_test_whitelist` does not have it for some reason | 15:43
fungi | and i think at least one (if not both) horizon branches affected are slated to be deleted | 15:43 |
frickler | o.k., I didn't look at it in that much detail, I just matched the mention of horizon tests not running with the above | 15:46
noonedeadpunk | gotcha:) well yes, we've also just added that new tempest test so it's kind of related, yes:) | 15:47 |
*** jcapitao is now known as jcapitao|off | 16:05 | |
*** yamamoto has joined #openstack-infra | 16:06 | |
*** rpittau is now known as rpittau|afk | 16:10 | |
*** yamamoto has quit IRC | 16:17 | |
*** amoralej is now known as amoralej|off | 16:21 | |
*** hashar has joined #openstack-infra | 16:22 | |
*** gyee has joined #openstack-infra | 16:27 | |
*** ociuhandu has quit IRC | 16:41 | |
*** ociuhandu has joined #openstack-infra | 16:41 | |
*** ociuhandu has quit IRC | 16:44 | |
*** ociuhandu has joined #openstack-infra | 16:44 | |
*** lucasagomes has quit IRC | 17:09 | |
*** zul has joined #openstack-infra | 17:21 | |
*** slaweq has joined #openstack-infra | 17:24 | |
*** slaweq has quit IRC | 17:24 | |
*** slaweq has joined #openstack-infra | 17:26 | |
*** ociuhandu has quit IRC | 17:30 | |
*** ociuhandu has joined #openstack-infra | 17:31 | |
*** ociuhandu has quit IRC | 17:31 | |
*** slaweq has quit IRC | 17:31 | |
*** ociuhandu has joined #openstack-infra | 17:31 | |
*** rcernin has joined #openstack-infra | 17:34 | |
*** piotrowskim has quit IRC | 17:35 | |
*** gfidente is now known as gfidente|afk | 17:37 | |
*** rcernin has quit IRC | 17:39 | |
*** eolivare has quit IRC | 17:44 | |
*** jcapitao|off has quit IRC | 17:51 | |
*** ociuhandu_ has joined #openstack-infra | 17:57 | |
*** ociuhandu has quit IRC | 18:01 | |
*** ociuhandu_ has quit IRC | 18:02 | |
*** hashar is now known as hasharDinner | 18:07 | |
*** jpena is now known as jpena|off | 18:24 | |
*** ralonsoh has quit IRC | 18:25 | |
*** derekh has quit IRC | 18:32 | |
*** hamalq has joined #openstack-infra | 18:35 | |
openstackgerrit | Merged openstack/ptgbot master: Bot is now openinfraptg on #openinfra-events https://review.opendev.org/c/openstack/ptgbot/+/774863 | 18:46 |
*** andrewbonney has quit IRC | 18:54 | |
*** slaweq has joined #openstack-infra | 19:07 | |
*** dtantsur is now known as dtantsur|afk | 19:20 | |
*** jcapitao|off has joined #openstack-infra | 19:24 | |
*** hasharDinner is now known as hashar | 19:27 | |
*** rcernin has joined #openstack-infra | 19:35 | |
*** rcernin has quit IRC | 19:40 | |
*** hashar has quit IRC | 19:45 | |
*** hashar has joined #openstack-infra | 19:45 | |
clarkb | dansmith: melwitt: if we have a nova flavor that says we get an 80GB disk and a ~9GB image that ends up with only ~15GB of disk in reality is there a nova behavior that would explain that? | 19:46 |
clarkb | I suspect that our images may actually pad out to 15GB, which may explain that choice. But growing the filesystem seems to occasionally not work in a particular cloud region, and I'm wondering if nova will happily give you an instance even though the flavor's disk size can't be met | 19:46
clarkb | corvus: fungi: ^ possible that dansmith and melwitt may recognize this behavior? | 19:47 |
frickler | clarkb: in all cases that I saw, the disk according to ansible was 100G, only the fs was stuck at 15G | 19:50 |
clarkb | frickler: oh where was ansible reporting that? In the facts collection file? | 19:50 |
clarkb | (I had missed that and want to take a look) | 19:50 |
frickler | yeah | 19:50 |
clarkb | cool let me take a look at that | 19:50 |
frickler | https://ccc35cfa38f56032f297-95ab28bd06b01f2c7089eb38812248e0.ssl.cf2.rackcdn.com/759091/5/gate/nova-grenade-multinode/8997228/zuul-info/host-info.primary.yaml is what I looked at on Friday, but that gives me conn refused currently | 19:52 |
fungi | clarkb: something i was considering was adding an lsblk call in validate-host along with the df, but if we already have an example which proves the block device is larger, then what we really need is whatever errors growroot would have emitted | 19:52 |
clarkb | frickler: that link loads for me | 19:53 |
fungi | which may be included in the syslog | 19:53 |
fungi | or journal | 19:53 |
fungi | but we don't normally collect those if we fail in pre | 19:53 |
clarkb | and ya that shows vda is 100GB | 19:53 |
clarkb | but https://ccc35cfa38f56032f297-95ab28bd06b01f2c7089eb38812248e0.ssl.cf2.rackcdn.com/759091/5/gate/nova-grenade-multinode/8997228/zuul-info/zuul-info.primary.txt is only 15GB so ya growroot fs failures? | 19:53 |
clarkb | or as tobiash suggests possibly it has not finished yet for some reason | 19:54 |
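The state being described can be confirmed on an affected node with the two commands fungi mentioned earlier; the values shown here are illustrative of this incident rather than captured output:

```shell
# Block device resized by the cloud, root filesystem never grown:
lsblk /dev/vda   # SIZE column would show ~100G for vda
df -h /          # would show / at only ~15G
```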
clarkb | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/growroot/init-scripts/systemd/growroot.service is the unit we should be running to growroot | 20:00 |
clarkb | that specifically says WantedBy=multi-user.target | 20:00 |
clarkb | I think that rules out tobiash suggestion as long as systemd is working properly | 20:01 |
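For context, the unit clarkb links is roughly the following shape. Only the WantedBy line is confirmed from the log; the rest is a sketch of a typical growroot-style oneshot unit, not a verbatim copy of the DIB element:

```ini
[Unit]
Description=Grow the root partition and filesystem on first boot

[Service]
Type=oneshot
# Hypothetical script path; the real element's script typically runs
# growpart on the root partition and then resize2fs/xfs_growfs.
ExecStart=/usr/local/sbin/growroot
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```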
fungi | yeah, zuul shouldn't be able to ssh in before that, right? | 20:01 |
clarkb | sort of | 20:01 |
clarkb | I think networking is also wanted by multi-user.target so there may be a race there | 20:01
* clarkb checks the local system units | 20:02 | |
fungi | i thought only root was supposed to be able to authenticate before multi-user.target was reached | 20:02
clarkb | my sshd says WantedBy=multi-user.target too | 20:02 |
fungi | yeah, but if you try to ssh in after sshd starts and before multi-user.target i think that's when you get the "system is still starting up, try again shortly" sort of message | 20:04
clarkb | ah | 20:05 |
clarkb | that would imply growroot is actually failing. So ya adding a journalctl -u growroot log to things may be a good idea? | 20:05 |
clarkb | ianw: ^ fyi that may interest you from an image building and dib perspective | 20:08 |
ianw | ugh. it should be captured in syslog? | 20:18 |
fungi | yeah, that's what i was suggesting too, try to grep it out of syslog or something | 20:18 |
fungi | or on debian derivatives, it may also be in boot.log | 20:18 |
*** gfidente|afk has quit IRC | 20:19 | |
fungi | mm, maybe only up through ubuntu xenial and debian stretch timeframe | 20:20 |
fungi | looks like that probably went away with the switch to systemd | 20:21 |
fungi | or soon thereafter | 20:21 |
fungi | should hopefully still get copied to /var/log/syslog though yes | 20:23 |
clarkb | maybe in our prerun when we gather system info we can just do journalctl -u growroot? | 20:33 |
ianw | ++, will try poking at some hosts | 20:35 |
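A minimal sketch of clarkb's proposal as Ansible tasks in the pre-run host-info collection; task names and the destination path are illustrative, not the actual playbook:

```yaml
# Hypothetical tasks: capture the growroot unit's journal alongside the
# other host info, so a failed grow is visible even when a job dies in pre.
- name: Collect growroot journal
  command: journalctl -u growroot --no-pager
  become: true
  register: growroot_journal
  failed_when: false   # tolerate images that lack the unit

- name: Save growroot journal with the other host info
  copy:
    content: "{{ growroot_journal.stdout }}"
    dest: /var/log/growroot-journal.txt   # illustrative destination
  become: true
```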
*** hashar has quit IRC | 20:35 | |
*** zul has quit IRC | 20:42 | |
melwitt | clarkb: off the top of my head no, I don't know of a behavior like that. and nova shouldn't be giving out an instance without honoring the flavor... I don't know why that's happening. I'll look around and see if I can find anything related | 20:50 |
fungi | melwitt: could be entirely unrelated to nova, just grasping at straws until we have more useful output captured from early boot | 20:52 |
melwitt | understood | 20:52 |
fungi | problem is, it happens infrequently. same provider, image and flavor works fine most of the time | 20:52 |
melwitt | fungi: to keep grasping at straws, I was just reading this https://docs.openstack.org/image-guide/openstack-images.html#disk-partitions-and-resize-root-partition-on-boot-cloud-init - is it possible that some images may not have the necessary cloud-initramfs-growroot package in them? | 20:58
melwitt | but you'd think nova would fail the boot if that grow couldn't happen. I'll look and see if/where we check that, if we can check it | 21:00 |
fungi | yeah, we see it fail intermittently with the same image which also sometimes succeeds | 21:00 |
fungi | even in the same provider | 21:01 |
melwitt | ok, I see | 21:01 |
*** rcernin has joined #openstack-infra | 21:20 | |
*** rcernin has quit IRC | 21:29 | |
*** rcernin has joined #openstack-infra | 21:34 | |
*** rcernin has quit IRC | 21:39 | |
*** nweinber has quit IRC | 21:47 | |
*** sboyron has quit IRC | 21:47 | |
dansmith | clarkb: sorry, just saw this, but no I don't know of any behavior that would cause that | 21:49 |
fungi | so far we've seen different host-ids associated with each failure, so i'm doubtful it's anything specific to nova or the hypervisor layer | 21:50 |
dansmith | clarkb: does the device appear to be 80G with a 15G partition in it, or does the device show 80 and it was only grown to 15? | 21:50 |
fungi | but they at least are all happening in the same provider | 21:51 |
fungi | dansmith: 100gb block device with a 15gb filesystem | 21:51 |
dansmith | is that provider running libvirt? | 21:51 |
clarkb | I believe it is yes | 21:51 |
clarkb | (and this should be after growroot ran) | 21:51 |
dansmith | fungi: okay | 21:51 |
dansmith | any chance they're using encrypted storage? | 21:53 |
dansmith | that just adds more layers on top of things which sometimes have some restrictions | 21:53 |
dansmith | but it would be a lot of overhead to be using that stuff for root unless you're paying for it | 21:54 |
fungi | dansmith: i would not be surprised if it's encrypted storage, this is in citynetwork and their target clientele are exceedingly security-conscious | 21:54 |
fungi | but i don't know either way | 21:54 |
clarkb | ya I don't know either | 21:55 |
fungi | noonedeadpunk might know | 21:55 |
dansmith | okay, I think that the libvirt driver will often grow the disk _and_ filesystem if it's clear, but I guess I'm not positive.. but it can't always do that if there is encryption going on | 21:55 |
dansmith | yeah, | 21:59 |
dansmith | so that could explain the difference I guess. some image backend configuration that differs from what you're normally getting on other hosts | 21:59 |
clarkb | that is helpful to know as a possibility, thank you | 22:00 |
dansmith | https://github.com/openstack/nova/blob/7b5ac717bd338be32414ae25f60a4bfe4c94c0f4/nova/virt/disk/api.py#L110-L158 | 22:00 |
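Roughly, the logic dansmith links makes host-side filesystem growth conditional. This pseudocode is a hedged paraphrase, not the actual nova code; see the linked nova/virt/disk/api.py for the real checks:

```python
# Hypothetical paraphrase: nova resizes the disk image to the flavor's
# size, then grows the filesystem itself only when it can safely inspect
# it. Otherwise the guest (cloud-init/growroot) is expected to do the grow.
def maybe_grow_filesystem(image, encrypted, inspectable):
    resize_image_to_flavor(image)      # hypothetical helper
    if encrypted or not inspectable:
        return  # leave growth to the guest; may explain the stuck 15G fs
    grow_filesystem(image)             # hypothetical helper
```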
*** jcapitao|off has quit IRC | 22:03 | |
*** yamamoto has joined #openstack-infra | 22:11 | |
*** xek has quit IRC | 22:19 | |
*** vishalmanchanda has quit IRC | 22:22 | |
*** rcernin has joined #openstack-infra | 22:24 | |
*** slaweq has quit IRC | 22:30 | |
*** arxcruz|rover has quit IRC | 22:46 | |
*** arxcruz has joined #openstack-infra | 22:47 | |
*** tosky has quit IRC | 23:57 |