*** eernst has quit IRC | 00:00 | |
*** felipemonteiro has joined #openstack-infra | 00:00 | |
*** yamamoto has quit IRC | 00:00 | |
*** eernst has joined #openstack-infra | 00:00 | |
*** heyongli has quit IRC | 00:01 | |
*** heyongli has joined #openstack-infra | 00:01 | |
*** dingyichen has joined #openstack-infra | 00:03 | |
*** SumitNaiksatam has quit IRC | 00:03 | |
corvus | that job is now running | 00:06 |
---|---|---|
*** dhill_ has quit IRC | 00:07 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/job/{job_name} route https://review.openstack.org/550978 | 00:07 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/projects and /{tenant}/project/{project} routes https://review.openstack.org/550979 | 00:08 |
*** r-daneel has quit IRC | 00:08 | |
*** felipemonteiro has quit IRC | 00:09 | |
clarkb | corvus: do you know how early the failures were happening? | 00:09 |
ianw | fungi / clarkb: i've updated the project references in hiera, luckily fungi templated them out differently to start with. let's see if that makes future puppet runs happier... | 00:09 |
corvus | clarkb: i think we need a devstack-multinode job for that | 00:10 |
clarkb | corvus: oh | 00:10 |
corvus | (ie, the new thing; i'm guessing this is the old one, otherwise everything would have been broken?) | 00:11 |
*** heyongli has quit IRC | 00:11 | |
fungi | ianw: oh, no need for a template change then? excellent! | 00:11 |
clarkb | hrm no I think we only run multinode on a small subset of stuff and it's mostly non-voting? | 00:11 |
*** heyongli has joined #openstack-infra | 00:11 | |
clarkb | but I will get link to devstack-multinode too | 00:11 |
clarkb | https://zuul.openstack.org/stream.html?uuid=959ee2a9b7f142298f39f0166a834b48&logfile=console.log devstack-multinode against same change as above | 00:12 |
*** rlandy is now known as rlandy|afk | 00:13 | |
*** felipemonteiro has joined #openstack-infra | 00:14 | |
*** felipemonteiro_ has joined #openstack-infra | 00:16 | |
ianw | does --os-project-name on the command line not override the values in clouds.yaml? | 00:18 |
clarkb | ianw: probably not if it is set in clouds.yaml explicitly | 00:19 |
clarkb | I think those flags are for selecting the right cloud in clouds.yaml mostly | 00:19 |
*** felipemonteiro has quit IRC | 00:20 | |
*** heyongli has quit IRC | 00:21 | |
*** heyongli has joined #openstack-infra | 00:21 | |
*** rossella_s has quit IRC | 00:21 | |
ianw | i'm deleting the servers out of the open-infra project (including the mirror) | 00:22 |
*** aeng has quit IRC | 00:24 | |
*** Swami has quit IRC | 00:25 | |
*** Swami_ has quit IRC | 00:25 | |
*** r-daneel has joined #openstack-infra | 00:25 | |
*** felipemonteiro_ has quit IRC | 00:25 | |
*** rossella_s has joined #openstack-infra | 00:25 | |
clarkb | I'm going to start dinner prep ping me if zuul needs attention | 00:25 |
clarkb | corvus: fungi ianw ^ | 00:26 |
*** aeng has joined #openstack-infra | 00:26 | |
*** sthussey has quit IRC | 00:26 | |
fungi | k | 00:27 |
*** r-daneel_ has joined #openstack-infra | 00:29 | |
*** r-daneel has quit IRC | 00:29 | |
*** r-daneel_ is now known as r-daneel | 00:29 | |
*** heyongli has quit IRC | 00:31 | |
*** heyongli has joined #openstack-infra | 00:32 | |
*** aeng has quit IRC | 00:32 | |
*** rossella_s has quit IRC | 00:33 | |
*** rossella_s has joined #openstack-infra | 00:35 | |
*** heyongli has quit IRC | 00:42 | |
*** heyongli has joined #openstack-infra | 00:42 | |
openstackgerrit | Merged openstack/diskimage-builder master: Add log directory option to functional tests https://review.openstack.org/570095 | 00:43 |
*** annp has joined #openstack-infra | 00:48 | |
*** rossella_s has quit IRC | 00:50 | |
*** r-daneel has quit IRC | 00:50 | |
*** heyongli has quit IRC | 00:52 | |
*** heyongli has joined #openstack-infra | 00:52 | |
spsurya | clarkb: hi... | 00:53 |
clarkb | spsurya: hello | 00:53 |
*** rossella_s has joined #openstack-infra | 00:53 | |
spsurya | clarkb: currently in Infra's CI/CD, infra deploys services into VMs, not into containers. | 00:53 |
spsurya | right ? | 00:53 |
spsurya | if in VMs, is there any discussion going on in the infra team about deploying and testing with containerized services ? | 00:53 |
clarkb | spsurya: there are probably two separate but possibly overlapping things here. The first is our control plane and the other is the test instances that run our tests | 00:54 |
clarkb | spsurya: the control plane runs on VMs though there is early work to spec out running services for the control plane in containers | 00:55 |
clarkb | spsurya: for the test instances we also run those in VMs but we give you root in those instances and you can use them how you like including running containers | 00:55 |
clarkb | many people do do this | 00:55 |
*** yamamoto has joined #openstack-infra | 00:56 | |
clarkb | as for running tests directly in containers I think that may still be a way off as we add functionality to zuul for that. Also it complicates test isolation and security concerns | 00:56 |
clarkb | spsurya: hopefully that helps | 00:57 |
clarkb | spsurya: is there something specific you are looking to do? | 00:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add initial GraphQL controller https://review.openstack.org/574625 | 00:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: use a declarative base model https://review.openstack.org/575275 | 00:57 |
*** eharney has quit IRC | 00:58 | |
spsurya | clarkb: I am asking about gate jobs to run it faster | 00:58 |
spsurya | thanks for detailed info | 00:59 |
fungi | i'm not sure how containers would suddenly make jobs run faster | 00:59 |
spsurya | clarkb: does zuul currently have CI/CD containerized ? | 01:01 |
*** aeng has joined #openstack-infra | 01:01 | |
spsurya | fungi: thanks for raising the doubt | 01:02 |
*** yamamoto has quit IRC | 01:02 | |
clarkb | spsurya: there is a crio driver iirc but it is not yet merged. Work is in progress to make sure we support both containers that look like VMs and more application-container-like systems like k8s | 01:02 |
*** heyongli has quit IRC | 01:02 | |
clarkb | and making sure it all works together | 01:02 |
*** heyongli has joined #openstack-infra | 01:02 | |
spsurya | fungi: AFAIK we use infra resources by booting VMs in some way | 01:03 |
spsurya | also wanted to reduce the use of resources | 01:04 |
spsurya | if we do it with containers | 01:04 |
fungi | yes, the goal is to support using zuul/nodepool in environments which have container management systems rather than virtual machine management systems, though i'm not really sure the efficiency will be much different either way | 01:04 |
*** pahuang has quit IRC | 01:04 | |
fungi | virtual machines and containers have converged quite a lot on performance as containers have realized the need for better isolation and virtualization hypervisors have found improved efficiencies | 01:05 |
spsurya | clarkb: thank you very much for the detailed info, may i get the WIP links, whatever is available? | 01:06 |
*** aeng has quit IRC | 01:07 | |
*** rossella_s has quit IRC | 01:08 | |
ianw | fungi: ok, i'm close to getting stuck on what networks should be setup in these projects | 01:08 |
clarkb | https://review.openstack.org/#/c/560136/ | 01:09 |
mnaser | gerrit seems kinda slow | 01:09 |
clarkb | https://review.openstack.org/#/c/565550/ | 01:10 |
*** rossella_s has joined #openstack-infra | 01:10 | |
clarkb | spsurya: those two changes are the two specs in progress | 01:10 |
*** rlandy|afk is now known as rlandy | 01:10 | |
fungi | ianw: i don't see where we were provided with any additional network detail. i assumed (no doubt in error) that shade would be able to figure it out | 01:11 |
spsurya | fungi: thanks, i understand your point that it will not make much difference in efficiency. But we can reduce the boot time and resource usage of infra by starting containers in place of VMs, please correct me if my understanding is wrong | 01:11 |
ianw | fungi: no, the new projects don't have a network associated. i'm trying to copy what open-infra has setup, see if it works ... | 01:11 |
spsurya | clarkb: thanks for providing the link | 01:12 |
*** heyongli has quit IRC | 01:12 | |
*** heyongli has joined #openstack-infra | 01:13 | |
fungi | spsurya: depends on the difference in your vm boot time and container start time, but also that ultimately just translates to some percentage overhead in your overall quota since nodepool already has provisions to keep nodes prepared in advance of assignment | 01:14 |
fungi | so under ideal circumstances the vm or container is already up and prepared to assign to a build by the time it's requested | 01:15 |
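A toy model of fungi's point about nodes being prepared in advance: if a ready node is already waiting, the request sees no boot delay whether the node is a VM or a container, and boot time only matters once the ready pool is exhausted. The function name, its parameters, and the numbers are hypothetical; this is not nodepool's code, and it ignores the pool being replenished in the background.

```python
from typing import List


def request_wait_seconds(ready_nodes: int, requests: int,
                         boot_seconds: float) -> List[float]:
    """Return the wait seen by each of `requests` simultaneous node requests.

    Hypothetical model: a request served from the ready pool waits 0 seconds;
    once the pool is empty, each further request waits one full boot.
    """
    waits = []
    for _ in range(requests):
        if ready_nodes > 0:
            ready_nodes -= 1          # hand out a node that was booted in advance
            waits.append(0.0)
        else:
            waits.append(boot_seconds)  # nothing ready, so pay the boot time
    return waits


# Illustrative numbers only: two ready nodes, three requests, 90 s boot time.
example_waits = request_wait_seconds(ready_nodes=2, requests=3, boot_seconds=90.0)
print(example_waits)
```

In this model a faster-starting container only shortens the wait for the requests that miss the ready pool, which matches the "percentage overhead in your overall quota" framing above.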
*** pahuang has joined #openstack-infra | 01:16 | |
*** heyongli has quit IRC | 01:23 | |
*** heyongli has joined #openstack-infra | 01:23 | |
ianw | sometimes these tools feel like being about half a step away from raw sql queries | 01:25 |
fungi | ianw: it's almost like you've seen the true face of openstack? | 01:26 |
*** rpioso is now known as rpioso|afk | 01:26 | |
*** r-daneel has joined #openstack-infra | 01:32 | |
*** heyongli has quit IRC | 01:33 | |
*** heyongli has joined #openstack-infra | 01:33 | |
spsurya | fungi: understood, thanks for the info, maybe i need to go through the current WIPs on control plane containerization; the specification for using containers as build resources looks interesting https://review.openstack.org/#/c/560136/ | 01:33 |
*** rlandy has quit IRC | 01:35 | |
*** r-daneel has quit IRC | 01:37 | |
*** r-daneel has joined #openstack-infra | 01:40 | |
*** rossella_s has quit IRC | 01:40 | |
*** gyee has quit IRC | 01:41 | |
*** rossella_s has joined #openstack-infra | 01:42 | |
*** heyongli has quit IRC | 01:43 | |
*** heyongli has joined #openstack-infra | 01:43 | |
*** rossella_s has quit IRC | 01:47 | |
*** rossella_s has joined #openstack-infra | 01:49 | |
*** boris_42_ has quit IRC | 01:50 | |
*** bobh has joined #openstack-infra | 01:51 | |
*** heyongli has quit IRC | 01:53 | |
*** heyongli has joined #openstack-infra | 01:54 | |
*** s-shiono has joined #openstack-infra | 01:58 | |
*** yamamoto has joined #openstack-infra | 01:58 | |
*** hongbin has joined #openstack-infra | 02:01 | |
mnaser | is it possible to use `project-template` in combination with manually defined jobs | 02:02 |
mnaser | ex: a project template that defines a set of generic jobs, but inside a project, overriding a job to make it non-voting | 02:03 |
*** yamahata has quit IRC | 02:03 | |
*** heyongli has quit IRC | 02:04 | |
*** heyongli has joined #openstack-infra | 02:04 | |
*** yamamoto has quit IRC | 02:04 | |
*** iyamahat_ has quit IRC | 02:10 | |
*** felipemo_ has joined #openstack-infra | 02:12 | |
*** heyongli has quit IRC | 02:14 | |
*** heyongli has joined #openstack-infra | 02:14 | |
*** hemna_ has quit IRC | 02:15 | |
*** ramishra has joined #openstack-infra | 02:22 | |
*** neiloy has joined #openstack-infra | 02:23 | |
openstackgerrit | Merged openstack/diskimage-builder master: Rename output log files https://review.openstack.org/570096 | 02:23 |
openstackgerrit | Merged openstack/diskimage-builder master: Don't install zypper on bionic https://review.openstack.org/570500 | 02:23 |
*** rossella_s has quit IRC | 02:24 | |
*** pahuang has quit IRC | 02:24 | |
*** heyongli has quit IRC | 02:24 | |
*** heyongli has joined #openstack-infra | 02:24 | |
*** kjackal has joined #openstack-infra | 02:25 | |
*** heyongli has quit IRC | 02:34 | |
*** heyongli has joined #openstack-infra | 02:35 | |
*** pahuang has joined #openstack-infra | 02:36 | |
openstackgerrit | Nguyen Van Trung proposed openstack/diskimage-builder master: Fix 'Operation not supported' issue for setfiles https://review.openstack.org/575315 | 02:43 |
*** heyongli has quit IRC | 02:45 | |
*** heyongli has joined #openstack-infra | 02:45 | |
*** pbourke has quit IRC | 02:47 | |
*** pbourke has joined #openstack-infra | 02:48 | |
*** bobh has quit IRC | 02:50 | |
corvus | mnaser: yes -- any jobs added directly to a project pipeline will be merged with the same jobs added via the template. so the voting attribute would override what's in the template | 02:53 |
mnaser | corvus: wonderful. Thank you. | 02:54 |
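A rough way to picture the merge corvus describes: the sketch below models each pipeline's jobs as plain dicts and lets attributes set directly on the project win over those coming from the template. The `merge_pipeline_jobs` helper, the job names, and the dict layout are hypothetical illustrations, not Zuul's actual implementation or API.

```python
from typing import Any, Dict

JobAttrs = Dict[str, Any]


def merge_pipeline_jobs(template_jobs: Dict[str, JobAttrs],
                        project_jobs: Dict[str, JobAttrs]) -> Dict[str, JobAttrs]:
    """Merge jobs from a template with jobs listed directly on a project.

    Hypothetical model: jobs with the same name are combined, and attributes
    given on the project override the template's values.
    """
    # Copy so the template definition itself is left untouched.
    merged = {name: dict(attrs) for name, attrs in template_jobs.items()}
    for name, attrs in project_jobs.items():
        merged.setdefault(name, {}).update(attrs)
    return merged


# Made-up job names: the template defines two voting jobs for "check".
template_check = {
    "example-unit": {"voting": True},
    "example-functional": {"voting": True, "timeout": 3600},
}
# The project re-lists one of them only to flip the voting attribute.
project_check = {"example-functional": {"voting": False}}

merged_check = merge_pipeline_jobs(template_check, project_check)
for name, attrs in sorted(merged_check.items()):
    print(name, attrs)
```

Only `voting` changes for the overridden job; the other attribute from the template carries through, and the job that the project does not mention stays as the template defined it.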
*** heyongli has quit IRC | 02:55 | |
*** heyongli has joined #openstack-infra | 02:55 | |
openstackgerrit | Nguyen Van Trung proposed openstack/diskimage-builder master: Add iscsi-boot element https://review.openstack.org/511494 | 02:57 |
openstackgerrit | Nguyen Van Trung proposed openstack/diskimage-builder master: Add iscsi-boot element for CentOS images https://review.openstack.org/542708 | 02:58 |
gmann | corvus: a gate job against lib repo patches always fetches that lib from src + that patch change, right? not from pypi. meaning we do not need to list the current repo (where the job is running, not where the job is defined) in the required_projects list ? | 02:59 |
*** yamamoto has joined #openstack-infra | 03:00 | |
*** heyongli has quit IRC | 03:05 | |
*** heyongli has joined #openstack-infra | 03:05 | |
*** yamamoto has quit IRC | 03:05 | |
*** SumitNaiksatam has joined #openstack-infra | 03:13 | |
*** heyongli has quit IRC | 03:15 | |
*** heyongli has joined #openstack-infra | 03:16 | |
*** heyongli has quit IRC | 03:26 | |
*** heyongli has joined #openstack-infra | 03:26 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: sql: use a declarative base model https://review.openstack.org/575275 | 03:26 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add initial GraphQL controller https://review.openstack.org/574625 | 03:26 |
*** yamahata has joined #openstack-infra | 03:35 | |
*** heyongli has quit IRC | 03:36 | |
*** heyongli has joined #openstack-infra | 03:36 | |
*** yamamoto has joined #openstack-infra | 03:38 | |
*** yamahata has quit IRC | 03:43 | |
*** dave-mccowan has quit IRC | 03:45 | |
*** heyongli has quit IRC | 03:46 | |
*** heyongli has joined #openstack-infra | 03:46 | |
*** sree has joined #openstack-infra | 03:48 | |
*** yamahata has joined #openstack-infra | 03:51 | |
*** udesale has joined #openstack-infra | 03:56 | |
*** heyongli has quit IRC | 03:56 | |
*** heyongli has joined #openstack-infra | 03:57 | |
*** andreww has quit IRC | 03:57 | |
*** xarses has joined #openstack-infra | 03:58 | |
*** annp has quit IRC | 04:00 | |
*** annp has joined #openstack-infra | 04:00 | |
*** heyongli has quit IRC | 04:07 | |
*** heyongli has joined #openstack-infra | 04:07 | |
*** dhajare-brb has joined #openstack-infra | 04:10 | |
*** germs has quit IRC | 04:12 | |
*** ykarel|away has joined #openstack-infra | 04:16 | |
*** ykarel|away is now known as ykarel | 04:16 | |
*** heyongli has quit IRC | 04:17 | |
*** heyongli has joined #openstack-infra | 04:17 | |
*** hongbin has quit IRC | 04:21 | |
*** eernst has quit IRC | 04:21 | |
*** lifeless_ has quit IRC | 04:22 | |
*** stakeda has joined #openstack-infra | 04:23 | |
*** heyongli has quit IRC | 04:27 | |
*** heyongli has joined #openstack-infra | 04:27 | |
*** agopi has quit IRC | 04:31 | |
*** heyongli has quit IRC | 04:37 | |
*** heyongli has joined #openstack-infra | 04:38 | |
*** threestrands has quit IRC | 04:47 | |
*** heyongli has quit IRC | 04:48 | |
*** heyongli has joined #openstack-infra | 04:48 | |
*** e0ne has joined #openstack-infra | 04:53 | |
*** janki has joined #openstack-infra | 04:57 | |
*** heyongli has quit IRC | 04:58 | |
*** heyongli has joined #openstack-infra | 04:58 | |
*** e0ne has quit IRC | 05:00 | |
*** links has joined #openstack-infra | 05:00 | |
*** felipemo_ has quit IRC | 05:07 | |
*** heyongli has quit IRC | 05:08 | |
*** heyongli has joined #openstack-infra | 05:08 | |
*** pcaruana has quit IRC | 05:09 | |
*** dhajare-brb has quit IRC | 05:12 | |
*** lifeless has joined #openstack-infra | 05:17 | |
*** heyongli has quit IRC | 05:18 | |
*** heyongli has joined #openstack-infra | 05:19 | |
*** dhajare-brb has joined #openstack-infra | 05:29 | |
*** heyongli has quit IRC | 05:29 | |
*** heyongli has joined #openstack-infra | 05:29 | |
*** kzaitsev_pi has quit IRC | 05:33 | |
*** heyongli has quit IRC | 05:39 | |
*** heyongli has joined #openstack-infra | 05:39 | |
*** pcichy has quit IRC | 05:40 | |
*** jaosorior has quit IRC | 05:41 | |
*** heyongli has quit IRC | 05:49 | |
*** heyongli has joined #openstack-infra | 05:49 | |
*** slaweq has quit IRC | 05:53 | |
*** cshastri has joined #openstack-infra | 05:55 | |
*** pcichy has joined #openstack-infra | 05:55 | |
*** heyongli has quit IRC | 05:59 | |
*** heyongli has joined #openstack-infra | 06:00 | |
*** dhajare-brb has quit IRC | 06:02 | |
*** hjensas has quit IRC | 06:05 | |
AJaeger | mnaser: keep in mind that we have voting jobs in both check and gate - and non-voting only in check queue. So, while you can override a voting job with non-voting this way, it's not nice | 06:07 |
*** heyongli has quit IRC | 06:10 | |
*** heyongli has joined #openstack-infra | 06:10 | |
*** jesslampe has joined #openstack-infra | 06:12 | |
*** pcaruana has joined #openstack-infra | 06:17 | |
*** mtreinish has quit IRC | 06:19 | |
*** heyongli has quit IRC | 06:20 | |
*** heyongli has joined #openstack-infra | 06:20 | |
*** rajinir has quit IRC | 06:22 | |
*** AJaeger has quit IRC | 06:24 | |
*** threestrands has joined #openstack-infra | 06:27 | |
*** dhajare has joined #openstack-infra | 06:28 | |
*** anteaya has quit IRC | 06:29 | |
*** heyongli has quit IRC | 06:30 | |
*** heyongli has joined #openstack-infra | 06:30 | |
*** jesslampe has quit IRC | 06:31 | |
*** jesslampe has joined #openstack-infra | 06:31 | |
*** aojea has joined #openstack-infra | 06:33 | |
*** hashar has joined #openstack-infra | 06:33 | |
*** jesslampe has quit IRC | 06:36 | |
*** shardy has joined #openstack-infra | 06:38 | |
*** dingyichen has quit IRC | 06:39 | |
*** heyongli has quit IRC | 06:40 | |
*** heyongli has joined #openstack-infra | 06:41 | |
*** iyamahat has joined #openstack-infra | 06:48 | |
*** heyongli has quit IRC | 06:51 | |
*** heyongli has joined #openstack-infra | 06:51 | |
*** ccamacho has joined #openstack-infra | 06:54 | |
*** slaweq has joined #openstack-infra | 06:56 | |
*** AJaeger has joined #openstack-infra | 06:57 | |
*** mtreinish has joined #openstack-infra | 06:58 | |
*** heyongli has quit IRC | 07:01 | |
*** heyongli has joined #openstack-infra | 07:01 | |
*** jcoufal has joined #openstack-infra | 07:07 | |
*** heyongli has quit IRC | 07:11 | |
*** heyongli has joined #openstack-infra | 07:11 | |
*** evrardjp_ is now known as evrardjp | 07:12 | |
*** amoralej|off is now known as amoralej | 07:12 | |
*** pblaho has quit IRC | 07:13 | |
*** tesseract has joined #openstack-infra | 07:16 | |
*** heyongli has quit IRC | 07:21 | |
*** heyongli has joined #openstack-infra | 07:22 | |
*** pblaho has joined #openstack-infra | 07:23 | |
*** efried has quit IRC | 07:26 | |
*** efried has joined #openstack-infra | 07:27 | |
*** rcernin has quit IRC | 07:27 | |
*** lifeless has quit IRC | 07:28 | |
*** gfidente has joined #openstack-infra | 07:29 | |
*** gfidente has joined #openstack-infra | 07:29 | |
*** jpena|off is now known as jpena | 07:30 | |
*** heyongli has quit IRC | 07:32 | |
*** heyongli has joined #openstack-infra | 07:32 | |
*** lifeless has joined #openstack-infra | 07:35 | |
*** tosky has joined #openstack-infra | 07:37 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Move zuul_log_id injection to command action plugin https://review.openstack.org/575351 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix log streaming for delegated hosts https://review.openstack.org/575352 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Revert "Temporarily override Ansible linear strategy" https://review.openstack.org/575353 | 07:38 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove extra argument when logging logger timeout https://review.openstack.org/575354 | 07:38 |
*** heyongli has quit IRC | 07:42 | |
*** heyongli has joined #openstack-infra | 07:42 | |
*** zoli is now known as zoli|wfh | 07:43 | |
*** zoli|wfh is now known as zoli | 07:43 | |
*** flaper87 has quit IRC | 07:49 | |
*** armaan has joined #openstack-infra | 07:49 | |
*** heyongli has quit IRC | 07:52 | |
*** janki has quit IRC | 07:52 | |
*** heyongli has joined #openstack-infra | 07:52 | |
*** jpich has joined #openstack-infra | 07:57 | |
*** flaper87 has joined #openstack-infra | 07:58 | |
*** kzaitsev_pi has joined #openstack-infra | 07:58 | |
*** florianf has joined #openstack-infra | 08:00 | |
*** heyongli has quit IRC | 08:02 | |
*** heyongli has joined #openstack-infra | 08:03 | |
*** kamren has quit IRC | 08:03 | |
*** janki has joined #openstack-infra | 08:09 | |
*** heyongli has quit IRC | 08:13 | |
*** heyongli has joined #openstack-infra | 08:13 | |
*** hamzy_ has joined #openstack-infra | 08:18 | |
*** ykarel is now known as ykarel|lunch | 08:19 | |
*** hamzy has quit IRC | 08:20 | |
*** owalsh has joined #openstack-infra | 08:21 | |
*** heyongli has quit IRC | 08:23 | |
*** heyongli has joined #openstack-infra | 08:23 | |
*** armaan has quit IRC | 08:24 | |
*** armaan has joined #openstack-infra | 08:26 | |
*** bdodd_ has joined #openstack-infra | 08:28 | |
*** owalsh has quit IRC | 08:28 | |
*** alexchadin has joined #openstack-infra | 08:29 | |
*** bdodd has quit IRC | 08:30 | |
*** armaan has quit IRC | 08:30 | |
*** sree has quit IRC | 08:32 | |
*** electrofelix has joined #openstack-infra | 08:33 | |
*** heyongli has quit IRC | 08:33 | |
*** heyongli has joined #openstack-infra | 08:33 | |
*** armaan has joined #openstack-infra | 08:34 | |
*** ianychoi has quit IRC | 08:39 | |
*** owalsh has joined #openstack-infra | 08:40 | |
*** armaan has quit IRC | 08:41 | |
*** heyongli has quit IRC | 08:43 | |
*** heyongli has joined #openstack-infra | 08:44 | |
*** derekh has joined #openstack-infra | 08:45 | |
*** iyamahat has quit IRC | 08:51 | |
*** heyongli has quit IRC | 08:54 | |
*** heyongli has joined #openstack-infra | 08:54 | |
*** yamahata has quit IRC | 08:54 | |
*** armaan has joined #openstack-infra | 08:57 | |
*** s-shiono has quit IRC | 09:02 | |
*** heyongli has quit IRC | 09:04 | |
*** heyongli has joined #openstack-infra | 09:04 | |
*** ykarel|lunch is now known as ykarel | 09:09 | |
*** armaan has quit IRC | 09:11 | |
*** armaan has joined #openstack-infra | 09:13 | |
*** heyongli has quit IRC | 09:14 | |
*** heyongli has joined #openstack-infra | 09:14 | |
*** jaosorior has joined #openstack-infra | 09:15 | |
*** dougsz has joined #openstack-infra | 09:22 | |
*** zhangfei has joined #openstack-infra | 09:23 | |
dougsz | Any ideas why there is no tarball for http://tarballs.openstack.org/monasca-thresh/ ? | 09:23 |
dougsz | (the main thing that differentiates it is that it's a Java project (!)) | 09:24 |
*** heyongli has quit IRC | 09:24 | |
*** heyongli has joined #openstack-infra | 09:25 | |
*** kamren has joined #openstack-infra | 09:27 | |
*** e0ne has joined #openstack-infra | 09:28 | |
*** udesale_ has joined #openstack-infra | 09:30 | |
*** chkumar246 has joined #openstack-infra | 09:30 | |
*** dhajare_ has joined #openstack-infra | 09:30 | |
*** dtantsur|afk is now known as dtantsur | 09:31 | |
*** cshastri_ has joined #openstack-infra | 09:32 | |
*** links has quit IRC | 09:33 | |
*** links has joined #openstack-infra | 09:33 | |
*** chandankumar has quit IRC | 09:33 | |
*** udesale__ has joined #openstack-infra | 09:33 | |
*** udesale has quit IRC | 09:34 | |
*** cshastri has quit IRC | 09:34 | |
*** dhajare has quit IRC | 09:34 | |
frickler | dougsz: might well be that the release jobs are broken for it, but since the last release seems to have been 8 weeks ago, the logs have expired, so it is difficult to check this | 09:34 |
*** heyongli has quit IRC | 09:35 | |
*** heyongli has joined #openstack-infra | 09:35 | |
frickler | dougsz: the existence of this directory makes it very likely to me that something is broken with the release jobs: http://tarballs.openstack.org/monasca-thresh/$ZUUL_SHORT_PROJECT_NAME/ | 09:35 |
*** chkumar246 has quit IRC | 09:35 | |
*** dhajare_ has quit IRC | 09:36 | |
*** udesale_ has quit IRC | 09:36 | |
*** chandankumar has joined #openstack-infra | 09:39 | |
AJaeger | dougsz, frickler, looking at the job configuration, I see as post job "legacy-monasca-thresh-localrepo-upload" | 09:39 |
AJaeger | we had many broken legacy upload jobs, I will just assume this never worked and needs porting to the new way of uploading... | 09:40 |
frickler | AJaeger: yes, according to zuul it ran with status success on 2018-04-10T15:39:43 which seems to be when the last release was tagged, but one would need the logs in order to dig deeper I think. | 09:41 |
frickler | but I'm not sure how to really debug this other than tagging some new release and possibly repeating that until one can fix it | 09:42 |
AJaeger | frickler: post job - runs after each merge | 09:42 |
AJaeger | frickler: so, just merge something ;) | 09:42 |
AJaeger | and last merge was 10th of April... | 09:42 |
AJaeger | frickler: I suggest to just rewrite that job from scratch - but cannot help. Perhaps mordred can? | 09:43 |
*** kamren has quit IRC | 09:44 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Move zuul_log_id injection to command action plugin https://review.openstack.org/575351 | 09:44 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix log streaming for delegated hosts https://review.openstack.org/575352 | 09:44 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Revert "Temporarily override Ansible linear strategy" https://review.openstack.org/575353 | 09:44 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove extra argument when logging logger timeout https://review.openstack.org/575354 | 09:44 |
frickler | AJaeger: oh, indeed, I just saw the last tag was created 8 weeks ago and assumed that that was the event | 09:44 |
*** heyongli has quit IRC | 09:45 | |
*** heyongli has joined #openstack-infra | 09:45 | |
*** dhajare_ has joined #openstack-infra | 09:49 | |
*** threestrands has quit IRC | 09:54 | |
*** heyongli has quit IRC | 09:55 | |
*** heyongli has joined #openstack-infra | 09:56 | |
*** heyongli has quit IRC | 10:05 | |
*** heyongli has joined #openstack-infra | 10:06 | |
e0ne | hi. could anybody please help me with configuring a new grenade job? | 10:08 |
e0ne | it fails, there are almost no logs :( | 10:08 |
e0ne | e.g.: http://logs.openstack.org/15/575115/6/check/grenade-vitrage/25b6457/ | 10:08 |
dougsz | frickler, AJaeger, thanks for the insight. It's good to at least know there should be a tarball | 10:15 |
*** heyongli has quit IRC | 10:16 | |
*** heyongli has joined #openstack-infra | 10:16 | |
*** kjackal has quit IRC | 10:22 | |
dougsz | perhaps I can use the jar from the Maven build for now, just need to find where it's published. (I'm adding support for deploying monasca-thresh to kolla). | 10:22 |
*** hjensas has joined #openstack-infra | 10:24 | |
jamespage | morning - is there a way to see why https://review.openstack.org/#/c/573217/ did not post-commit publish to https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/ ? | 10:25 |
*** heyongli has quit IRC | 10:26 | |
*** gnuoy has joined #openstack-infra | 10:26 | |
*** heyongli has joined #openstack-infra | 10:26 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Move zuul_log_id injection to command action plugin https://review.openstack.org/575351 | 10:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix log streaming for delegated hosts https://review.openstack.org/575352 | 10:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Revert "Temporarily override Ansible linear strategy" https://review.openstack.org/575353 | 10:28 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove extra argument when logging logger timeout https://review.openstack.org/575354 | 10:28 |
AJaeger | e0ne: better ask on #openstack-qa | 10:35 |
e0ne | AJaeger: got it, thanks! | 10:35 |
*** heyongli has quit IRC | 10:36 | |
*** heyongli has joined #openstack-infra | 10:36 | |
AJaeger | jamespage: https://wiki.openstack.org/wiki/Infrastructure_Status shows that a couple of hours afterwards we restarted zuul, it might be that the post job was lost. | 10:37 |
*** nicolasbock has joined #openstack-infra | 10:37 | |
AJaeger | but let's check... | 10:37 |
jamespage | ta | 10:37 |
AJaeger | jamespage: http://zuul.openstack.org/builds.html?job_name=publish-deploy-guide&project=openstack%2Fcharm-deployment-guide | 10:39 |
AJaeger | so, no runs in June - probably the restart | 10:39 |
jamespage | AJaeger: OK - best to shove through another commit? | 10:40 |
AJaeger | jamespage: yeah, if possible. An infra-root can also manually trigger it if needed | 10:40 |
AJaeger | jamespage: remove the index from the main page - http://logs.openstack.org/17/573217/3/check/build-openstack-deploy-guide/a3e26ef/html/genindex.html gives 404 ;) | 10:41 |
jamespage | AJaeger: ack will do | 10:43 |
stephenfin | mtreinish: How difficult would it be for stestr to accept a pytest-style path to a function instead of a Python module path? e.g. nova/tests/unit/virt/test_hardware.py::VirtNUMATopologyCellUsageTestCase | 10:44 |
stephenfin | (asking here as I don't know where to ask those questions) | 10:44 |
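Until a runner accepts pytest-style IDs directly, the translation stephenfin asks about can be done outside the runner. The helper below is a minimal sketch that only rewrites the string; `pytest_id_to_dotted` is a hypothetical name, it does not call or modify stestr, and it assumes a relative path that maps one-to-one onto the package layout.

```python
def pytest_id_to_dotted(path_id: str) -> str:
    """Convert 'pkg/mod.py::Class::test' into 'pkg.mod.Class.test'.

    Hypothetical helper: assumes a relative file path whose directories
    correspond directly to Python packages.
    """
    file_part, _, rest = path_id.partition("::")
    if file_part.endswith(".py"):
        file_part = file_part[:-3]                     # drop the file extension
    module = file_part.replace("\\", "/").strip("/").replace("/", ".")
    names = [name for name in rest.split("::") if name]  # class and/or test names
    return ".".join([module] + names)


example_id = "nova/tests/unit/virt/test_hardware.py::VirtNUMATopologyCellUsageTestCase"
print(pytest_id_to_dotted(example_id))
```

The resulting dotted path is the form that module-path based test selection expects, so the output can be passed on as the test filter.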
*** kjackal has joined #openstack-infra | 10:46 | |
*** heyongli has quit IRC | 10:46 | |
*** heyongli has joined #openstack-infra | 10:47 | |
*** dtantsur is now known as dtantsur|brb | 10:50 | |
openstackgerrit | boden proposed openstack-infra/project-config master: add lower constraints to vmware-nsx gate pipeline https://review.openstack.org/575399 | 10:53 |
*** heyongli has quit IRC | 10:57 | |
*** jpena is now known as jpena|lunch | 10:57 | |
*** heyongli has joined #openstack-infra | 10:57 | |
*** annp has quit IRC | 11:02 | |
*** yamamoto has quit IRC | 11:02 | |
*** heyongli has quit IRC | 11:07 | |
*** heyongli has joined #openstack-infra | 11:07 | |
*** zoli is now known as zoli|lunch | 11:09 | |
*** alexchadin has quit IRC | 11:11 | |
*** armaan has quit IRC | 11:15 | |
*** heyongli has quit IRC | 11:17 | |
*** heyongli has joined #openstack-infra | 11:18 | |
*** heyongli has quit IRC | 11:27 | |
*** heyongli has joined #openstack-infra | 11:28 | |
*** ykarel_ has joined #openstack-infra | 11:31 | |
*** ykarel has quit IRC | 11:34 | |
*** ykarel_ is now known as ykarel | 11:34 | |
*** ldnunes has joined #openstack-infra | 11:35 | |
*** heyongli has quit IRC | 11:38 | |
*** heyongli has joined #openstack-infra | 11:38 | |
mnaser | AJaeger: ok, interesting. I was thinking it could be a quick temporary fix, but it’s all part of one big long-term effort to improve OSA CI | 11:39 |
*** nicolasbock has quit IRC | 11:42 | |
*** udesale_ has joined #openstack-infra | 11:42 | |
AJaeger | mnaser: often those temporary hacks stay for ever ;/ - but yes, it could be used as a temporary band-aid ;) | 11:44 |
*** yamamoto has joined #openstack-infra | 11:44 | |
mnaser | AJaeger: my goal is for OSA to use project-template only so that we disable in one repo only and it will disable everywhere so we will always notice that it must be fixed | 11:45 |
mnaser | And when fixing it will be one place only rather than be scattered and inconsistent | 11:45 |
*** udesale__ has quit IRC | 11:45 | |
AJaeger | mnaser: good plan | 11:46 |
* AJaeger loves templates ;) | 11:46 | |
*** dtantsur|brb is now known as dtantsur | 11:46 | |
mnaser | you and I both :) | 11:46 |
*** heyongli has quit IRC | 11:48 | |
*** heyongli has joined #openstack-infra | 11:48 | |
*** yamamoto has quit IRC | 11:48 | |
*** udesale_ has quit IRC | 11:48 | |
*** sthussey has joined #openstack-infra | 11:50 | |
*** gfidente has quit IRC | 11:53 | |
*** roman_g has joined #openstack-infra | 11:54 | |
*** gfidente has joined #openstack-infra | 11:56 | |
*** heyongli has quit IRC | 11:58 | |
*** heyongli has joined #openstack-infra | 11:58 | |
*** rlandy has joined #openstack-infra | 12:01 | |
*** boden has joined #openstack-infra | 12:01 | |
*** jpena|lunch is now known as jpena | 12:03 | |
boden | AJaeger hi.. thx for review on https://review.openstack.org/#/c/575399/ but your comments don’t match the discussion I had the other day with other infra folks: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-06-07.log.html#t2018-06-07T17:35:41 | 12:03 |
boden | so admittedly I’m confused | 12:03 |
*** amoralej is now known as amoralej|lunch | 12:03 | |
AJaeger | boden: I'm talking about the vmware job only - it should really have been done in your tree | 12:03 |
boden | AJaeger I understand, I’m just telling you I was told to put them in projects.yaml | 12:04 |
AJaeger | boden: I expect that clarkb was not aware that the job is in your tree | 12:04 |
*** yamamoto has joined #openstack-infra | 12:06 | |
frickler | AJaeger: boden: oh, then this was done wrong already in https://review.openstack.org/#/c/573386/1/zuul.d/projects.yaml | 12:08 |
boden | frickler, yeah I’ll submit a patch to remove that as per AJaeger comments | 12:08 |
*** heyongli has quit IRC | 12:08 | |
AJaeger | frickler: yes, agreed | 12:09 |
*** heyongli has joined #openstack-infra | 12:09 | |
*** yamamoto has quit IRC | 12:10 | |
*** nicolasbock has joined #openstack-infra | 12:11 | |
openstackgerrit | boden proposed openstack-infra/project-config master: remove lower constraints to vmware-nsx gate pipeline https://review.openstack.org/575399 | 12:12 |
boden | frickler ^ | 12:12 |
AJaeger | boden: thanks | 12:12 |
boden | AJaeger thanks.. sorry for confusion, I should’ve seen the local defs for those jobs | 12:13 |
AJaeger | boden: happy to review your change for vmware-nsx - feel free to CC me on it. | 12:14 |
AJaeger | boden: found it ;) | 12:15 |
*** heyongli has quit IRC | 12:19 | |
*** heyongli has joined #openstack-infra | 12:19 | |
*** tpsilva has joined #openstack-infra | 12:21 | |
*** kgiusti has joined #openstack-infra | 12:21 | |
*** dhajare_ has quit IRC | 12:22 | |
tosky | frickler: regarding that issue with orchestrate-devstack, I confirm that it's solved now | 12:23 |
*** felipemonteiro has joined #openstack-infra | 12:26 | |
*** trown|outtypewww is now known as trown | 12:27 | |
*** heyongli has quit IRC | 12:29 | |
*** heyongli has joined #openstack-infra | 12:29 | |
pabelanger | heads up, 2 weeks until SSL certs expire: Jun 30 23:59:59 2018 GMT according to inbox | 12:33 |
*** zoli|lunch is now known as zoli|wfh | 12:35 | |
*** zoli|wfh is now known as zoli | 12:35 | |
*** myoung|off is now known as myoung | 12:36 | |
fungi | yup | 12:38 |
fungi | i believe clarkb is preparing to replace them in the next week-ish | 12:38 |
*** heyongli has quit IRC | 12:39 | |
*** heyongli has joined #openstack-infra | 12:39 | |
*** yamamoto has joined #openstack-infra | 12:40 | |
*** dhajare has joined #openstack-infra | 12:42 | |
*** yamamoto has quit IRC | 12:45 | |
*** ianychoi has joined #openstack-infra | 12:46 | |
*** felipemonteiro has quit IRC | 12:47 | |
*** lifeless has quit IRC | 12:48 | |
*** lifeless has joined #openstack-infra | 12:49 | |
*** heyongli has quit IRC | 12:49 | |
*** gfidente has quit IRC | 12:49 | |
*** heyongli has joined #openstack-infra | 12:50 | |
openstackgerrit | Merged openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 12:51 |
*** gfidente has joined #openstack-infra | 12:51 | |
*** gfidente has joined #openstack-infra | 12:51 | |
*** cshastri_ has quit IRC | 12:53 | |
*** edmondsw has joined #openstack-infra | 12:54 | |
*** florianf has quit IRC | 12:55 | |
*** esarault has joined #openstack-infra | 12:56 | |
*** florianf has joined #openstack-infra | 12:58 | |
*** heyongli has quit IRC | 13:00 | |
*** heyongli has joined #openstack-infra | 13:00 | |
*** camunoz has joined #openstack-infra | 13:00 | |
*** mriedem has joined #openstack-infra | 13:03 | |
*** iyamahat has joined #openstack-infra | 13:04 | |
*** lihi has quit IRC | 13:06 | |
*** lihi has joined #openstack-infra | 13:06 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove extra argument when logging logger timeout https://review.openstack.org/575354 | 13:07 |
flaper87 | fungi: I re ran the tripleo-ci-centos-7-scenario009-multinode-oooq job for 574233 this morning to test another PS. Is the autohold still on? Or does it have to be enabled on demand? | 13:08 |
flaper87 | you can release the 2 servers you held for me yday | 13:09 |
mnaser | clarkb: is there a way to force a recheck on the kata repos, i'd like to iterate on your work | 13:09 |
*** dave-mccowan has joined #openstack-infra | 13:09 | |
*** felipemonteiro has joined #openstack-infra | 13:09 | |
*** eernst has joined #openstack-infra | 13:09 | |
*** heyongli has quit IRC | 13:10 | |
*** heyongli has joined #openstack-infra | 13:10 | |
*** armaan has joined #openstack-infra | 13:15 | |
mnaser | is there a way for us to get zuul to hold a vm the next time a job fails? | 13:15 |
mnaser | the `openstack-ansible-deploy-ceph-ubuntu-xenial` job fails non-deterministically and we've gathered a lot of data but it still doesn't seem to be enough :( | 13:16 |
*** felipemonteiro has quit IRC | 13:16 | |
mnaser | ending up with "IOError: [Errno 28] No space left on device" even though there is plenty of space left on the device according to df | 13:16 |
fungi | flaper87: i set it for a single count. looking to see if it held any more but i wouldn't expect it to until i delete the old ones | 13:17 |
flaper87 | fungi: understood. Let me know, worst case, I'll recheck it | 13:17 |
fungi | flaper87: okay, old held nodes have been deleted and a fresh autohold has been set. recheck as needed | 13:19 |
flaper87 | fungi: done, thanks! | 13:20 |
*** heyongli has quit IRC | 13:20 | |
*** heyongli has joined #openstack-infra | 13:20 | |
*** Goneri has joined #openstack-infra | 13:21 | |
*** hemna_ has joined #openstack-infra | 13:21 | |
*** dhill_ has joined #openstack-infra | 13:22 | |
*** yamamoto has joined #openstack-infra | 13:22 | |
*** jamesdenton has joined #openstack-infra | 13:23 | |
*** r-daneel has quit IRC | 13:26 | |
*** agopi has joined #openstack-infra | 13:27 | |
*** yamamoto has quit IRC | 13:27 | |
*** armaan has quit IRC | 13:29 | |
*** armaan has joined #openstack-infra | 13:29 | |
frickler | mnaser: that's for project "openstack-ansible" correct? | 13:29 |
*** amoralej|lunch is now known as amoralej | 13:29 | |
*** eernst has quit IRC | 13:29 | |
*** eernst has joined #openstack-infra | 13:30 | |
*** heyongli has quit IRC | 13:30 | |
*** heyongli has joined #openstack-infra | 13:31 | |
*** armaan has quit IRC | 13:32 | |
*** armaan has joined #openstack-infra | 13:32 | |
*** eernst has quit IRC | 13:34 | |
*** armaan has quit IRC | 13:37 | |
*** yamamoto has joined #openstack-infra | 13:38 | |
*** florianf has quit IRC | 13:40 | |
*** heyongli has quit IRC | 13:41 | |
*** heyongli has joined #openstack-infra | 13:41 | |
frickler | mnaser: I did set an autohold and it caught a node almost immediately. so with https://github.com/mnaser.keys you should be able to access root@104.130.163.27 now for further debugging | 13:41 |
*** armaan has joined #openstack-infra | 13:42 | |
*** florianf has joined #openstack-infra | 13:42 | |
*** yamamoto has quit IRC | 13:42 | |
*** iyamahat has quit IRC | 13:43 | |
frickler | mnaser: which is a bit strange, because I cannot see results for this job yet ... hmm | 13:44 |
*** armaan has quit IRC | 13:46 | |
*** shaner has quit IRC | 13:46 | |
*** shaner has joined #openstack-infra | 13:46 | |
*** eharney has joined #openstack-infra | 13:49 | |
fungi | frickler: was the job perhaps already running for another change? did you limit the autohold by change as well as project and job? | 13:49 |
*** heyongli has quit IRC | 13:51 | |
*** heyongli has joined #openstack-infra | 13:51 | |
*** ccamacho has quit IRC | 13:51 | |
frickler | fungi: mnaser didn't mention a change, so I was assuming that the failure was happening for any job. but I was still thinking that I should find the job for the held node at http://zuul.openstack.org/builds.html?job_name=openstack-ansible-deploy-ceph-ubuntu-xenial | 13:52 |
fungi | ahh | 13:52 |
*** dave-mccowan has quit IRC | 13:52 | |
*** Tahvok has quit IRC | 13:54 | |
fungi | `grep 0000123384 /var/log/zuul/debug.log` on zuul01 | 13:55 |
*** jesslampe has joined #openstack-infra | 13:55 | |
fungi | looks like it was running for 559452,1 | 13:56 |
*** r-daneel has joined #openstack-infra | 13:56 | |
EmilienM | did anything outstanding change in the doc jobs lately? no job has been running on tripleo-docs repo for the last 3 days | 13:57 |
EmilienM | I've checked project-config, nothing much on that regard lately | 13:57 |
*** linkmark has quit IRC | 13:57 | |
fungi | EmilienM: could it be due to the files vs irrelevant-files behavior change in zuul? | 13:58 |
EmilienM | I guess it could | 13:58 |
EmilienM | weshay|ruck, mwhahaha ^ fyi | 13:58 |
*** r-daneel_ has joined #openstack-infra | 13:59 | |
fungi | that went into effect late monday | 13:59 |
EmilienM | I guess it's related | 14:00 |
*** r-daneel has quit IRC | 14:00 | |
*** r-daneel_ is now known as r-daneel | 14:00 | |
EmilienM | but we don't have zuul layout in the repo | 14:00 |
fungi | rm_work: weshay|ruck: mwhahaha: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131304.html | 14:00 |
fungi | for a refresher | 14:00 |
*** shardy has quit IRC | 14:01 | |
*** heyongli has quit IRC | 14:01 | |
*** rajinir has joined #openstack-infra | 14:01 | |
*** heyongli has joined #openstack-infra | 14:01 | |
frickler | fungi: oh, so the builds list only gets updated when all jobs for a change have finished it seems | 14:02 |
*** Tahvok has joined #openstack-infra | 14:02 | |
fungi | frickler: i guess so, as the db inserts are via a reporter | 14:03 |
frickler | mnaser: sadly this one seems to have failed early with a different error: http://logs.openstack.org/52/559452/1/check/openstack-ansible-deploy-ceph-ubuntu-xenial/42af3d3/job-output.txt.gz#_2018-06-14_13_30_04_625435 | 14:03 |
fungi | e-mail reporter sends a message when all builds complete, gerrit/github reporters leave a comment when all builds complete, so i suppose the mysql reporter performs db inserts once all builds complete | 14:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Don't follow symlinks when setting log permissions https://review.openstack.org/575439 | 14:04 |
*** jcoufal has quit IRC | 14:04 | |
frickler | fungi: makes sense, yes, just was a bit unexpected for me | 14:04 |
*** ykarel is now known as ykarel|away | 14:04 | |
fungi | me too! | 14:04 |
mordred | fungi, frickler: ^^ found that in tracking down why https://review.openstack.org/#/c/551989/ keeps failing in post | 14:04 |
fungi | i hadn't considered it | 14:04 |
mordred | oh - my commit message is wrong | 14:05 |
fungi | mordred: random behavior change in a minor ansible release? did you find it in the release notes, or is it more likely a regression? | 14:05 |
mordred | fungi: it's a change - I was looking at the 2.4 docs on the default by mistake | 14:06 |
mordred | https://docs.ansible.com/ansible/latest/modules/file_module.html | 14:06 |
*** r-daneel_ has joined #openstack-infra | 14:06 | |
mordred | shows that the default changed in 2.5 | 14:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Don't follow symlinks when setting log permissions https://review.openstack.org/575439 | 14:07 |
mordred | fungi: fixed commit message to be accurate | 14:07 |
fungi | this seems likely to catch a lot of jobs off guard | 14:07 |
*** r-daneel has quit IRC | 14:07 | |
*** r-daneel_ is now known as r-daneel | 14:07 | |
fungi | i'm surprised we haven't seen more reports of it before now | 14:07 |
mordred | possibly - although you kind of have to work at it to make symlinks in the logs that do sensible things | 14:08 |
*** yamamoto has joined #openstack-infra | 14:08 | |
* mordred has some in the new zuul multi-tenant web dashboard job to simulate what apache rewrite rules would be doing :) | 14:08 | |
mordred | that said - yes - I'm also surprised | 14:08 |
fungi | mordred: did you see ianw's e-mail about shade behavior with network guessing in our packethost environment? | 14:09 |
fungi | curious whether you have any ideas there | 14:10 |
mordred | fungi: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.5.html#noteworthy-module-changes does list the change, so there is that | 14:10 |
mordred | fungi: I did not - which list? | 14:10 |
mordred | oh - just direct | 14:10 |
mordred | one sec | 14:10 |
fungi | yeah. was just a private e-mail to me, clarkb and studarus, cc'd to you | 14:11 |
fungi | not sure why he was concerned about putting that on the infra ml | 14:11 |
*** heyongli has quit IRC | 14:11 | |
*** heyongli has joined #openstack-infra | 14:12 | |
mordred | WELL | 14:12 |
*** yamamoto has quit IRC | 14:12 | |
mordred | I have never seen the values "Internal" and "External" for router:external before - those are usually true or false | 14:12 |
mordred | slaweq: ^^ is this a neutron change? | 14:12 |
mordred | slaweq: (seeing a cloud that returns Internal or External for router:external) | 14:13 |
*** links has quit IRC | 14:15 | |
*** jesslampe has quit IRC | 14:15 | |
*** jesslampe has joined #openstack-infra | 14:16 | |
frickler | mordred: that sounds broken to me, do you really see that in the api response? | 14:18 |
*** yamamoto has joined #openstack-infra | 14:20 | |
*** yamamoto has quit IRC | 14:20 | |
frickler | mordred: OSC does seem to translate the bool, though | 14:21 |
mordred | frickler: yes. ianw got it from packethost | 14:21 |
mordred | frickler: really? | 14:21 |
mordred | frickler: *headdesk* | 14:21 |
*** jesslampe has quit IRC | 14:21 | |
mordred | that's actually extremely unhelpful | 14:21 |
fungi | nutty | 14:21 |
frickler | openstackclient/network/v2/network.py: return 'External' if item else 'Internal' | 14:22 |
mordred | fungi: in any case, I think we should just do with packethost the same thing we do for internap - mark the networks internal and external | 14:22 |
*** heyongli has quit IRC | 14:22 | |
mordred | frickler: that is upsetting to me | 14:22 |
*** heyongli has joined #openstack-infra | 14:22 | |
mordred | frickler: I guess it's been there since 2015 though - so such is life | 14:23 |
AJaeger | mordred: could you review https://review.openstack.org/#/c/570260/ , please? It looks sane to me but I know nothing about npm... | 14:23 |
mordred | the thing is - it's not even an accurate translation - since the router:external attribute does not actually mean internal/external | 14:23 |
mordred | frickler: so that translation actually increases the confusion about what that means | 14:23 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: Add anchors to link to specific parts of index https://review.openstack.org/575445 | 14:24 |
fungi | mordred: so was this a misconfiguration in the deployment itself? or a bug in neutron? (or the latter making the former possible?) | 14:25 |
mordred | fungi: I now no-longer know why shade can't figure it out | 14:25 |
*** dave-mccowan has joined #openstack-infra | 14:25 | |
mordred | and will have to debug further | 14:25 |
EmilienM | infra-core: mwhahaha and myself are looking at our gate issues and would like to know if you have hits on docker mirror provided on 8081 port. e.g. mirror.mtl01.inap.openstack.org:8081 - our goal is to make sure we use the infra mirror and not docker.io | 14:26 |
mordred | but we can still put in entries to clouds.yaml similar to internap to unblock us | 14:26 |
*** hongbin has joined #openstack-infra | 14:26 | |
frickler | mordred: I agree, if you compare it to the description in the api-ref, "External" is ok-ish, but to name the complement "Internal" is pretty misleading | 14:26 |
frickler | "Indicates whether the network has an external routing facility that’s not managed by the networking service." | 14:26 |
EmilienM | e.g. #2: http://mirror.mtl01.inap.openstack.org:8081/registry-1.docker/ | 14:26 |
mordred | AJaeger: lgtm | 14:27 |
mordred | frickler: yah | 14:27 |
mordred | frickler: "router:external" really means "can have a neutron router attached to it, oh, and also btw implies shared=True" | 14:27 |
mordred | what it decidedly does NOT mean is "this network is external" | 14:28 |
*** owalsh has quit IRC | 14:29 | |
mordred | so router:external = True can be used to determine that a network is probably to be used for routing externally, but router:external = False actually cannot be counted on to communicate the same thing, as the network in question could be, for instance, a provider network to be used for external traffic | 14:30 |
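The distinction mordred draws above can be sketched in Python. This is a minimal illustration, not shade's or OSC's actual code: `osc_display` mirrors only the one-line formatter frickler quoted, and `likely_external` is a hypothetical helper name. It assumes a network is represented as a plain dict with a boolean `router:external` key.

```python
from typing import Optional


def osc_display(router_external: bool) -> str:
    """Mirror the formatter quoted above: 'External' if item else 'Internal'."""
    return 'External' if router_external else 'Internal'


def likely_external(network: dict) -> Optional[bool]:
    """Hypothetical helper: say only what router:external can actually tell us.

    True  -> the network is probably meant for external routing.
    None  -> unknown; a False value does not prove the network is internal,
             since it could be a provider network carrying external traffic.
    """
    if network.get('router:external'):
        return True
    return None
```

Returning `None` rather than `False` keeps the "cannot be counted on" case explicit, which is also why an explicit entry in clouds.yaml is the fallback being discussed.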
AJaeger | thanks, mordred | 14:31 |
fungi | EmilienM: 198.72.124.176 - - [14/Jun/2018:14:26:36 +0000] "GET /registry-1.docker/v2/tripleomaster/centos-binary-nova-compute-ironic/blobs/sha256:d82d6152d6a63608951fcc3c36dd01a66d12bc2ba41e8d4ab034ba4cc3d05806 HTTP/1.1" 307 736 "-" "docker/1.13.1 go/go1.9.4 kernel/3.10.0-862.3.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" | 14:31 |
fungi | mwhahaha: ^ | 14:31 |
*** heyongli has quit IRC | 14:32 | |
*** heyongli has joined #openstack-infra | 14:32 | |
*** zhangfei has quit IRC | 14:35 | |
mnaser | frickler: thanks for the node hold, it was a different failure but one that seems to repeat pretty often | 14:35 |
mnaser | are there problems with our xenial mirrors? | 14:36 |
mnaser | http://paste.openstack.org/show/723477/ | 14:36 |
mtreinish | stephenfin: you can use --no-discover/-n on stestr run to specify running a specific file by path: https://stestr.readthedocs.io/en/latest/MANUAL.html#running-tests | 14:36 |
* mordred hugs mtreinish | 14:37 | |
mtreinish | it doesn't take a class arg iirc because the test runner just loads the module | 14:37 |
stephenfin | mtreinish: That's almost exactly what I was looking for. I assume we can't specify a test class that way | 14:37 |
*** ramishra has quit IRC | 14:37 | |
stephenfin | Beat me to it | 14:37 |
mwhahaha | fungi: so 307 is a cache hit? is there any way to see, like, the number of requests/cache hits for docker fetches? | 14:38 |
mtreinish | we probably could look at adding that, but it might mean we need to patch the underlying subunit.run command | 14:38 |
stephenfin | mtreinish: Eh, this gets me 80% of where I need to go | 14:38 |
ttx | Hi! Did y'all have any plans to normalize IRC configuration ? It's a bit spread out onto lots of files today. I'm interested as I'm looking into publishing a reference list of current IRC channels on eavesdrop.o.o | 14:38 |
mordred | ttx: only in as much as there is a plan to consolidate from multiple bots to a single bot | 14:39 |
mordred | ttx: but there is no current configuration normalization plan that I am aware of | 14:39 |
ttx | mordred: any decision as to which bot that would be ? | 14:40 |
mordred | ttx: yes - there is a spec even ... one sec | 14:40 |
* ttx likes gerritbot config file better than that hiera file statusbot uses for channel list | 14:40 | |
mordred | ttx: http://specs.openstack.org/openstack-infra/infra-specs/specs/irc.html | 14:41 |
frickler | mnaser: that kind of error tends to happen if the mirrors are not up to date and your image has newer packages pre-installed than what the mirror has. | 14:42 |
frickler | mnaser: maybe some other infra-root can continue here, /me needs to leave soon | 14:42 |
mnaser | frickler: darn, okay, also i don't need `104.130.163.27` anymore | 14:42 |
ttx | mordred: thanks! | 14:42 |
mnaser | but if we can keep an autohold for the next failure in line :( | 14:42 |
*** heyongli has quit IRC | 14:42 | |
*** heyongli has joined #openstack-infra | 14:42 | |
frickler | mnaser: o.k., deleted the old node and did set an autohold for another three, just in case | 14:43 |
mnaser | frickler: thank you very much | 14:43 |
fungi | mwhahaha: on a provider-by-provider basis we could probably run some numbers by manually analyzing apache logs. what specifically are you looking for? | 14:43 |
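As a rough sketch of the "manually analyzing apache logs" idea, the following Python tallies HTTP status codes for requests under the `/registry-1.docker/` prefix, assuming access-log lines shaped like the one fungi pasted above. The function name and regex are illustrative assumptions. The tally does not by itself say whether a response was served from cache, so it is a starting point rather than a hit-rate measurement.

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format access log line,
# e.g. "GET /registry-1.docker/v2/... HTTP/1.1" 307 736
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def status_counts(lines, prefix="/registry-1.docker/"):
    """Count status codes for requests whose path starts with prefix."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        # Skip lines that do not parse or that belong to other mirror paths.
        if match and match.group("path").startswith(prefix):
            counts[match.group("status")] += 1
    return counts
```

It could be fed an open log file directly, for example `status_counts(open(path))`, where `path` is whichever access log the mirror writes.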
*** hamzy_ is now known as hamzy | 14:44 | |
fungi | mwhahaha: also, i don't know that tripleo jobs are the only ones using the dockerhub proxy | 14:44 |
mwhahaha | fungi: we were seeing increased time loading containers over the last week and want to make sure the caching is still working and if we had added additional strain by switching to a containerized undercloud | 14:44 |
*** ociuhandu_ has joined #openstack-infra | 14:45 | |
*** iyamahat has joined #openstack-infra | 14:45 | |
mwhahaha | fungi: we've confirmed that we're still using the proxy, so we're working our way back to the source to try and determine what's going on | 14:45 |
fungi | mwhahaha: it may be useful for me to see if i can check the churn rate on the cache. we only set aside ~50gb of space for apache to cache things on our mirrors, if memory serves, so if the variety of what's being cached is too great we might be overrunning that and making the cache basically useless | 14:46 |
mtreinish | stephenfin: fwiw, you could use the python path (so '.' separated) and specify the class that way (not sure if it works down to a single method though), but for the file path interpolation it only works to the module level I think | 14:46 |
mwhahaha | yea that's what i'm afraid of | 14:46 |
mtreinish | stephenfin: and please feel free to open issues for quirks or improvement suggestions on this. Any suggestions on making it easier to use would be appreciated | 14:47 |
stephenfin | mtreinish: Yeah, that's what I have been doing (tox -e py27 nova.tests.unit.network.neutronv2.SomeClass.test_something) but it's tedious converting paths to python paths | 14:47 |
fungi | and unfortunately, increasing that cache space is counter-productive even if we do have space to do it, because we reach the point where apache can't remove less-frequently-accessed content as fast as new content is being requested (it does this in an asynchronous fashion) so we risk filling up the filesystem | 14:47 |
stephenfin | mtreinish: will do (y) | 14:47 |
mwhahaha | fungi: ideally having a local container registry that we could push to in each cloud would reduce this requirement, but i'm not aware if this is on anyone's radar | 14:48 |
*** iyamahat has quit IRC | 14:48 | |
*** r-daneel_ has joined #openstack-infra | 14:48 | |
mtreinish | stephenfin: ah, that's slightly different pattern. With '-n' you're giving it a python object and it's run without doing a discovery. Without any args it does a regex match on the string after doing discovery (which imports all the modules to build a list) | 14:49 |
*** iyamahat has joined #openstack-infra | 14:49 | |
*** r-daneel has quit IRC | 14:49 | |
*** r-daneel_ is now known as r-daneel | 14:49 | |
mtreinish | stephenfin: if you don't mind taking the discovery hit (which can take a few secs depending on your system's io) you can give it a smaller string that uniquely identifies the test you care about | 14:49 |
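For the "tedious converting paths to python paths" step stephenfin mentions, a small helper can do the conversion. This is a hypothetical convenience function, not part of stestr or tox; the file path in the usage note is made up for illustration.

```python
import os


def path_to_dotted(path: str, test_class: str = "", test_method: str = "") -> str:
    """Turn a test file path into a dotted Python path.

    Optionally appends a class and method name, giving the
    module.Class.test_method form used in the tox example above.
    """
    module, _ext = os.path.splitext(os.path.normpath(path))
    parts = [p for p in module.split(os.sep) if p not in ("", ".")]
    if test_class:
        parts.append(test_class)
        # A method only makes sense when a class was given.
        if test_method:
            parts.append(test_method)
    return ".".join(parts)
```

For example, `path_to_dotted("nova/tests/unit/network/test_neutronv2.py", "SomeClass", "test_something")` yields a dotted name that can be passed as the test selector.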
fungi | mwhahaha: even at that, our test environment isn't really tuned to support nodes requesting many gigabytes of data over the network, whether it's hosted nearby or not | 14:49 |
fungi | we discussed in the past (now long past) the possibility of preparing, snapshotting, cloning and attaching block devices to test nodes to reduce the network load for such things, but in some of our providers available block storage bandwidth may not be any better than local network bandwidth (and in fact they're often one and the same anyway because of relying on iscsi) | 14:52 |
*** heyongli has quit IRC | 14:52 | |
mtreinish | mordred: on the topic of irc bots, want to take a look at: https://review.openstack.org/#/q/status:open+branch:master+topic:even-more-firehose :) | 14:53 |
*** heyongli has joined #openstack-infra | 14:53 | |
mwhahaha | so i think that's a different solution than being proposed, a docker registry would be no different than the existing distro mirrors today | 14:53 |
mwhahaha | because it's not a single file, it's a bunch of layers | 14:54 |
*** rpioso|afk is now known as rpioso | 14:54 | |
fungi | sure, but also constrained by many of the same challenges we have with our existing mirroring solutions | 14:54 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Collect the coverage report for npm test jobs https://review.openstack.org/570260 | 14:54 |
*** Swami has joined #openstack-infra | 14:54 | |
mwhahaha | the existing distro mirrors seem to be fine, so i'm unsure why you think local in dc repository wouldn't be an improvement | 14:55 |
fungi | we can do synchronous mirror updates via afs, but it's not really fit for replicating many gigabytes of high-churn data halfway around the world | 14:55 |
mordred | we looked at local in-dc docker repo before doing the apache route | 14:55 |
* mwhahaha shrugs | 14:56 | |
mordred | it is, unfortunately, more complicated to get it going and working properly than was feasible to deal with at that point in time | 14:56 |
mordred | there are some other changes and things coming down the pipeline that should improve that and make it reasonable to re-assess that | 14:56 |
mordred | but nothing that would make much improvement this week | 14:56 |
fungi | mwhahaha: any idea what the turnover rate is on those docker images? and how much space we'd need to store them? just trying to get a feel for the scale of the problem | 14:56 |
mwhahaha | fungi: at most once every 8 hours i think | 14:57 |
mwhahaha | fungi: but we haven't had any new ones in some time | 14:57 |
fungi | if it's 3 or 4 gigabytes changing every week or two that seems doable | 14:57 |
mwhahaha | the problem is likely that it's more than just one branch because it's master/queens/pike | 14:57 |
mwhahaha | i need to get some actual numbers on the total size of a container set for a release | 14:58 |
fungi | if it's on the order of tens of gigabytes and changing daily (or faster) then that would need some significant engineering to solve | 14:58 |
mwhahaha | i don't think so because we don't necessarily need infra to solve the replication | 14:58 |
mwhahaha | we could do the pushes and coordination of tagging ourselves | 14:58 |
mwhahaha | so i think it can be solved simply but we don't have a location to push to | 14:58 |
fungi | pushes to where? directly into the mirrors? and what's the plan to deal with them getting out of sync? | 14:59 |
mwhahaha | so we just need a stable base to start from, as long as it's in sync of a region there's not a problem | 14:59 |
mwhahaha | from our stand point we query the registry for the latest | 15:00 |
mwhahaha | so if we're pushing and don't update the tag until we're done, the jobs would continue to function | 15:00 |
mordred | fungi: so that if there was a docker registry in each cloud-region that had per-project space in it, they could have a publication job that pushed content into all of the mirrors | 15:00 |
mwhahaha | we're handling patching the containers in the jobs which includes updating them | 15:00 |
fungi | more concerned about the troubleshooting required if uploads to one or more regions get stuck/fail and you end up with jobs running against different versions of images for a significant period of time depending on where they're running | 15:01 |
mwhahaha | but that's on the project | 15:01 |
mwhahaha | we already have processes for uploading containers | 15:01 |
mordred | the challenge there is having a per-region registry that supports enough namespacing that a given project could have space to push things without affecting other projects | 15:01 |
mwhahaha | so for us it's just additional locations and folks solve the failures as they happen | 15:02 |
fungi | yeah, i really don't want to engineer "the tripleo docker registry network" | 15:02 |
mwhahaha | for that it would be quota setting and i'm not sure what's available from the various registry solutions | 15:02 |
*** heyongli has quit IRC | 15:03 | |
*** e0ne has quit IRC | 15:03 | |
mwhahaha | but it's also the kolla registry network | 15:03 |
fungi | if we build something, it would need to be generalized for any project that wants to put things in it | 15:03 |
*** heyongli has joined #openstack-infra | 15:03 | |
mordred | right. one of the challenges so far is that the more advanced registry solutions all assume you are going to run your container registry system in containers itself | 15:03 |
mordred | so far we do not use containers to run any services - although I am working on a spec about opening that up | 15:03 |
mordred | this is what I was getting at before - when we looked at this before the rabbit hole got pretty deep ... but we're getting close to the point where some of the previous blockers may have solutions | 15:04 |
mwhahaha | i know dmsimard has had decent luck with the atomic registry | 15:04 |
mordred | yes. that is one of the ones I believe we'd entertain | 15:05 |
mwhahaha | so maybe he has some input on this and the feasibility to offer something | 15:05 |
mordred | but atomic registry assumes you are running it in containers | 15:05 |
dmsimard | atomic registry is unfortunately not really a thing anymore | 15:05 |
mordred | which we do not currently support | 15:05 |
clarkb | do we have a list of issues that the current caching proxies don't solve that running a new service everywhere would? | 15:05 |
dmsimard | at least last I know | 15:05 |
clarkb | also tumbleweed deleted xmonad so I'm like a fish out of water right now | 15:05 |
dmsimard | mordred: we are running the RDO openshift (standalone registry implementation) on a single virtual machine | 15:05 |
mordred | dmsimard: awesome | 15:05 |
dmsimard | mordred: it's not containerized | 15:05 |
fungi | clarkb: sounds like it's probably mostly related to wanting faster and more reliable access to very large files which change very frequently | 15:05 |
mwhahaha | dmsimard: oh sorry i thought it was the atomic registry, it's the openshift one? | 15:06 |
clarkb | fungi: right I don't expect a new service would address that | 15:06 |
openstackgerrit | Merged openstack-infra/project-config master: Don't follow symlinks when setting log permissions https://review.openstack.org/575439 | 15:06 |
mwhahaha | they aren't very large files | 15:06 |
mwhahaha | containers aren't a single file | 15:06 |
fungi | large sets of data then | 15:06 |
clarkb | network bw is the issue | 15:06 |
dmsimard | mwhahaha: atomic registry is an implementation of openshift standalone registry but it's deprecated afaik | 15:07 |
clarkb | and running a new service won't fix that | 15:07 |
mwhahaha | the issue is network bw out of a DC | 15:07 |
mordred | clarkb: I think the thing a registry in each region would solve that passthrough caching doesn't is the ability to choose which things are put into the registries (the things people care about - or maybe also built artifacts) | 15:07 |
mwhahaha | and a new service in dc would fix that | 15:07 |
mwhahaha | or at least improve it | 15:07 |
clarkb | mwhahaha: you still have to copy it to all DCs | 15:07 |
frickler | mordred: ianw: fungi: with this patch to the clouds yaml I was able to successfully start an instance running /home/ianw/start.py http://paste.openstack.org/show/723478/ | 15:07 |
mordred | passthrough caching can get blown out by contention and random queries for less important things | 15:07 |
mwhahaha | having to go to external origin because caching rate is not sufficient seems to be the issue | 15:07 |
clarkb | thats wan and effectively the same issue | 15:08 |
mwhahaha | we're trying to be smart about the loading of teh data because we know what we need | 15:08 |
frickler | bbl | 15:08 |
dmsimard | mordred: fwiw openshift has pull-through caching support https://docs.openshift.com/container-platform/3.9/install_config/registry/extended_registry_configuration.html#middleware-repository-pullthrough | 15:08 |
mwhahaha | the caching leaves that up to disruption | 15:08 |
clarkb | mordred: that implies you'll find terabytes of local storage for the registry in each cloud? otherwise you still have that problem | 15:08 |
mwhahaha | we don't need terabytes | 15:08 |
mwhahaha | it's a rotating set of probably 10-20g | 15:09 |
clarkb | mwhahaha: you have 100gb today iirc | 15:09 |
dmsimard | mordred: so for example, you can automatically mirror the "centos" image in the openshift registry and then pull that | 15:09 |
mordred | clarkb: I don't think the request is for terabytes - I think the request is for a specific set of base images to always be in the cache and never get expired | 15:09 |
mwhahaha | clarkb: 100gbs where? | 15:09 |
clarkb | mwhahaha: of dockerhub cache in each region | 15:09 |
mwhahaha | if you're referring to the apache stuff, that's shared between containers, rpms, etc | 15:09 |
mordred | because if they get expired, then the pullthrough cache needs to refresh from the open internet | 15:09 |
*** iyamahat has quit IRC | 15:09 | |
mwhahaha | clarkb: so whats the cache hit on that | 15:10 |
clarkb | mordred: right so that's sort of what I was trying to get at: if the problem is how we cache, we can address that pretty easily | 15:10 |
clarkb | if the problem is network we cant | 15:10 |
clarkb | mwhahaha: I havent looked in a long time so not sure | 15:10 |
*** dizquierdo has joined #openstack-infra | 15:11 | |
mordred | clarkb: yes - the problem is how we cache | 15:11 |
mordred | the problem is not network | 15:11 |
*** dhajare has quit IRC | 15:11 | |
mordred | or - the hypothesis is that the problem is how we cache | 15:11 |
clarkb | mordred: we can increase the length of time we keep objects (I think it is 24 hours today) | 15:11 |
mwhahaha | so there are two issues | 15:11 |
mwhahaha | (at least) | 15:12 |
mwhahaha | the caching of stuff for 24 hours may not be correct | 15:12 |
mwhahaha | because the rotation of the images may occur more frequently | 15:12 |
mordred | clarkb: in a perfect world, projects would be able to say "we never want this set of objects to be expired from cache" | 15:12 |
mwhahaha | which is why i asked about being able to push to a registry because we would handle the next cache loading in the container build process | 15:12 |
mordred | clarkb: but the logistics of that become obviously complicated | 15:13 |
*** heyongli has quit IRC | 15:13 | |
mwhahaha | with caching, we're at the mercy of other things as well because the stale content may continue to live long after it's no longer valid reducing the viability of the cache | 15:13 |
*** heyongli has joined #openstack-infra | 15:13 | |
fungi | also if the files all update at once, then you get a thundering herd sort of problem | 15:13 |
mwhahaha | also misses are more painful because you end up with multiple origin hits | 15:13 |
clarkb | mwhahaha: stale data in this context shouldn't be a big problem since it is all sha256 addressed | 15:14 |
clarkb | and it acts as a fifo | 15:14 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Add buildset-artifacts-location https://review.openstack.org/530679 | 15:14 |
mwhahaha | ok it would be nice to have some visibility on the health of these caches to understand increased requests/misses | 15:14 |
fungi | would exposing apache mod_status help, i wonder? | 15:18 |
mwhahaha | possibly to start | 15:20 |
clarkb | I was trying to find if something like that would report cache stats | 15:20 |
*** yamamoto has joined #openstack-infra | 15:20 | |
clarkb | I dont think status does | 15:21 |
Diabelko | hello again, I've stumbled upon "parent-change-enqueued" event and seems like a great idea to solve my problem (B depends on A, both got -1 because of failure in A, A is fixed and B is not getting a re-run), but I don't see it configured anywhere in your check or gate pipelines in openstack-infra/project-config | 15:22 |
Diabelko | is there a tricky part there somewhere? | 15:22 |
Diabelko | some unintended behavior? | 15:22 |
*** heyongli has quit IRC | 15:23 | |
*** lpetrut has joined #openstack-infra | 15:23 | |
*** heyongli has joined #openstack-infra | 15:23 | |
*** pcaruana has quit IRC | 15:24 | |
*** yamamoto has quit IRC | 15:25 | |
*** krtaylor has quit IRC | 15:25 | |
clarkb | looks like you can add cache status logging to the log format | 15:25 |
fungi | Diabelko: we've taken the stance in the past that actions on a change should be necessary to trigger new jobs in independent pipelines like our check or experimental pipelines (new patchset, change restored, explicit recheck comment). also parent-change-enqueued only gets you that behavior for explicit change series but not zuul's cross-repository or cross-connection dependencies | 15:29 |
fungi | Diabelko: | 15:30 |
fungi | er, sorry, stray carriage return | 15:31 |
*** ykarel_ has joined #openstack-infra | 15:31 | |
fungi | oh, you're talking about a zuul internal event, i was completely misunderstanding and confusing that with one of the gerrit event stream events | 15:32 |
fungi | so i think we do rely on that to enqueue in dependent pipelines | 15:32 |
fungi | and it does get you cross-repo/conn dependencies | 15:33 |
*** heyongli has quit IRC | 15:33 | |
*** ykarel|away has quit IRC | 15:34 | |
*** heyongli has joined #openstack-infra | 15:34 | |
*** eernst has joined #openstack-infra | 15:34 | |
fungi | oh, actually no that behavior is simply implicit in dependent pipelines, as zuul evaluates the entire set of dependent changes to see which are ready to enqueue when it gets an enqueuing event for any one of them | 15:35 |
*** anteaya has joined #openstack-infra | 15:35 | |
fungi | looks like we actually added that event type in https://review.openstack.org/112411 nearly 4 years ago | 15:38 |
*** lpetrut has quit IRC | 15:38 | |
*** krtaylor has joined #openstack-infra | 15:40 | |
*** hashar is now known as hasharAway | 15:40 | |
*** e0ne has joined #openstack-infra | 15:40 | |
*** krtaylor has quit IRC | 15:42 | |
fungi | Diabelko: so anyway, yes matching on parent-change-enqueued to enqueue changes for retesting may make sense in some environments. i think in ours i wouldn't want to see that because we have rather a lot of churn and often very long dependent series where forcing retesting could lead to a lot of additional utilization, but it's worth entertaining | 15:43 |
clarkb | ya the internet seems to think that %{Age} is the way to track this via the log | 15:43 |
clarkb | then we can produce a report like we do for docs 404s likely | 15:43 |
*** heyongli has quit IRC | 15:44 | |
*** krtaylor has joined #openstack-infra | 15:44 | |
*** krtaylor has quit IRC | 15:44 | |
*** heyongli has joined #openstack-infra | 15:44 | |
clarkb | or cache_status https://httpd.apache.org/docs/2.4/mod/mod_cache.html#status | 15:45 |
*** krtaylor has joined #openstack-infra | 15:45 | |
*** krtaylor has quit IRC | 15:49 | |
*** lihi has quit IRC | 15:50 | |
*** lpetrut has joined #openstack-infra | 15:50 | |
*** dtantsur is now known as dtantsur|afk | 15:51 | |
*** kgiusti has left #openstack-infra | 15:52 | |
*** germs has joined #openstack-infra | 15:52 | |
*** germs has quit IRC | 15:52 | |
*** germs has joined #openstack-infra | 15:52 | |
*** lihi has joined #openstack-infra | 15:52 | |
fungi | that seems like it would be useful | 15:53 |
*** linkmark has joined #openstack-infra | 15:53 | |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Rename the API-WG to API-SIG https://review.openstack.org/575478 | 15:54 |
*** heyongli has quit IRC | 15:54 | |
*** heyongli has joined #openstack-infra | 15:54 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Log cache misses on caching mirror proxies https://review.openstack.org/575479 | 15:54 |
clarkb | mwhahaha: fungi ^ start with something like that maybe | 15:55 |
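
Once cache status is being logged, a report along the lines clarkb describes could be produced by tallying the status field per request. The following is only a minimal sketch: it assumes each access-log line ends with one of `hit`, `miss`, `revalidate` or `invalidate` as its last whitespace-separated field, which is a hypothetical layout and not necessarily the format used in the change under review. The function names are illustrative.

```python
from collections import Counter

# Status tokens assumed to appear as the last field of each log line
# (hypothetical layout; adjust to whatever the real LogFormat emits).
KNOWN = {"hit", "miss", "revalidate", "invalidate"}


def cache_status_counts(lines):
    """Tally the trailing cache-status token of each non-empty line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        status = fields[-1].lower()
        counts[status if status in KNOWN else "unknown"] += 1
    return counts


def hit_ratio(counts):
    """Fraction of recognised requests answered from cache.

    Revalidated responses are counted as served from cache; lines with
    an unrecognised status are left out of the denominator.
    """
    served = counts["hit"] + counts["revalidate"]
    total = sum(counts[s] for s in KNOWN)
    return served / total if total else 0.0
```

Feeding it a day of mirror access logs per region would give a rough per-region hit ratio to compare against the job timeouts.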
*** sshnaidm has quit IRC | 15:55 | |
*** dizquierdo has quit IRC | 15:56 | |
pabelanger | I'm just looking into nodepool errors in nl03, and seeing: shade.exc.OpenStackCloudTimeout: Timeout waiting for the server to come up for limestone wonder if we need to bump boot-timeout a little here | 15:58 |
pabelanger | was originally looking to see the OVH-gra1 failures | 15:59 |
Diabelko | fungi: oh, interesting. Our dependencies usually don't go with more than 5 reviews, but then again I think only situation it can bite you is that if change B (C, D..) requires modifications as well | 16:01 |
logan- | pabelanger: timestamps line up with the spike here? http://grafana.openstack.org/dashboard/db/nodepool-limestone?panelId=13&fullscreen | 16:01 |
Diabelko | I'm interested in that mostly because of the users though | 16:01 |
Diabelko | got asked multiple times why CI didn't run on a change X | 16:01 |
pabelanger | logan-: 2018-06-14 11:37:20,798 ERROR nodepool.NodeLauncher-0000121748: Launch attempt 1/3 failed for node 0000121748: | 16:02 |
pabelanger | logan-: seems too | 16:02 |
pabelanger | logan-: did we upload new images? | 16:03 |
pabelanger | and hitting force raw convert? | 16:03 |
*** krtaylor has joined #openstack-infra | 16:03 | |
*** myoung is now known as myoung|lunch | 16:04 | |
*** heyongli has quit IRC | 16:04 | |
*** heyongli has joined #openstack-infra | 16:05 | |
*** krtaylor has quit IRC | 16:06 | |
*** panda is now known as panda|off | 16:06 | |
*** lifeless has quit IRC | 16:07 | |
*** jpich has quit IRC | 16:08 | |
*** lifeless has joined #openstack-infra | 16:08 | |
logan- | pabelanger: yeah either that or the daily osa deploy run is what i was thinking | 16:09 |
pabelanger | k | 16:10 |
logan- | thanks was just curious. osa deploy was 8:15 - 9:50 so it must have been new images | 16:12 |
*** jesslampe has joined #openstack-infra | 16:13 | |
*** heyongli has quit IRC | 16:14 | |
pabelanger | logan-: can you confirm if nova is converting them to raw? | 16:14 |
*** heyongli has joined #openstack-infra | 16:15 | |
logan- | ya the images in /var/lib/nova/instances/_base appear to be raws based on qemu-img info output | 16:16 |
logan- | http://paste.openstack.org/raw/723482/ | 16:17 |
pabelanger | k, that might explain it | 16:17 |
pabelanger | logan-: do they need to be raw? | 16:18 |
pabelanger | if so, we could upload raw directly | 16:18 |
pabelanger | otherwise, we should force qcow2 in nova.conf | 16:18 |
*** gyee has joined #openstack-infra | 16:19 | |
*** udesale has joined #openstack-infra | 16:20 | |
*** jesslampe has quit IRC | 16:22 | |
*** yamamoto has joined #openstack-infra | 16:22 | |
logan- | they don't need to be raw, but iirc the defaults are the way they are because theres a performance hit to the instance disk i/o when the base is not raw? uploading raw would probably not be preferable because you would take a hit on boot times downloading images from glance then. | 16:22 |
logan- | im thinking youre right to increase the ready timeout | 16:23 |
*** krtaylor has joined #openstack-infra | 16:23 | |
pabelanger | logan-: which would take longer, downloading raw from glance, or all compute nodes converting to raw? | 16:24 |
*** krtaylor has quit IRC | 16:24 | |
*** jesslampe has joined #openstack-infra | 16:24 | |
*** heyongli has quit IRC | 16:25 | |
pabelanger | logan-: 14GB for .raw / 8.5GB for .qcow2 | 16:25 |
*** heyongli has joined #openstack-infra | 16:25 | |
logan- | depends how many nodes we have :) i have no idea w/ the current setup but as you add more nodes its not like the glance bw throughput will scale along with the hv count. if you want to upload raws and find out, im not opposed to that | 16:27 |
*** yamamoto has quit IRC | 16:27 | |
*** pbourke has quit IRC | 16:28 | |
pabelanger | clarkb: corvus: Shrews: we seem to have a fair bit (10) ready locked nodes in nodepool right now, all seem to be above 15 hours. Only mention since node-requests are climbing today and could use all the nodes when possible. | 16:28 |
pabelanger | 1 seems to be 2 days | 16:28 |
*** udesale has quit IRC | 16:28 | |
*** jesslampe has quit IRC | 16:29 | |
fungi | also raw means much longer upload times to glance, and greater opportunity they're disrupted and have to be retried | 16:31 |
fungi | and greater bandwidth utilization | 16:32 |
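
The transfer side of the raw versus qcow2 trade-off can be put into rough numbers. This sketch uses only the image sizes quoted above (14 GB raw, 8.5 GB qcow2); the link speed and hypervisor count are hypothetical inputs, and it deliberately ignores the time nova spends converting qcow2 to raw, which was not measured in this discussion.

```python
RAW_GB = 14.0     # raw image size quoted above
QCOW2_GB = 8.5    # qcow2 image size quoted above


def transfer_seconds(size_gb, link_gbit_per_s):
    """Seconds to move size_gb (decimal gigabytes) over one link."""
    return size_gb * 8 / link_gbit_per_s


def fanout_seconds(size_gb, hypervisors, link_gbit_per_s):
    """Worst case when every hypervisor pulls the image through one
    shared glance link, i.e. throughput does not scale with node count."""
    return hypervisors * transfer_seconds(size_gb, link_gbit_per_s)


# How much more data a raw upload moves than a qcow2 upload.
raw_overhead = RAW_GB / QCOW2_GB
```

On an assumed 1 Gbit/s link, `transfer_seconds(RAW_GB, 1.0)` is 112 seconds against 68 seconds for qcow2, so raw only wins if per-node conversion costs more than that difference multiplied by however the downloads contend.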
corvus | pabelanger, clarkb, Shrews: it would be useful if someone has the time to track down what holds the locks and why. unfortunately, i don't at the moment. | 16:32 |
*** germs has quit IRC | 16:32 | |
Shrews | I'm afk for lunch | 16:33 |
corvus | i plan on performing a full zuul restart today (a fuul restart?), so we'll probably lose that data soon. on the plus side, we'll probably release the locks. | 16:34 |
pabelanger | I might be able to dig more in later this evening, but not right now. I just wanted to point it out as I was looking into nodepool failures | 16:34 |
*** Swami has quit IRC | 16:35 | |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Migrate the API-SIG to StoryBoard https://review.openstack.org/575120 | 16:35 |
*** heyongli has quit IRC | 16:35 | |
*** heyongli has joined #openstack-infra | 16:35 | |
corvus | though, iirc, we may still have an open bug (from even before the 3.0 release) about builds/nodes not being cleaned up correctly if an executor dies; there could still be an edge case in there somewhere. | 16:35 |
corvus | we've had a lot of executor restarts lately | 16:35 |
*** SumitNaiksatam has quit IRC | 16:36 | |
*** iyamahat has joined #openstack-infra | 16:36 | |
pabelanger | Yah, there is also a bug if zuul does a reload, and removes a job, we leak the node and don't properly clean up. | 16:37 |
*** zzzeek has quit IRC | 16:38 | |
*** dhajare has joined #openstack-infra | 16:39 | |
*** lyarwood has quit IRC | 16:39 | |
*** lyarwood has joined #openstack-infra | 16:39 | |
*** sshnaidm has joined #openstack-infra | 16:40 | |
*** germs has joined #openstack-infra | 16:41 | |
*** germs has quit IRC | 16:41 | |
*** germs has joined #openstack-infra | 16:41 | |
*** krtaylor has joined #openstack-infra | 16:42 | |
*** zzzeek has joined #openstack-infra | 16:44 | |
*** heyongli has quit IRC | 16:45 | |
*** heyongli has joined #openstack-infra | 16:45 | |
*** krtaylor has quit IRC | 16:46 | |
*** germs has quit IRC | 16:47 | |
*** e0ne has quit IRC | 16:48 | |
clarkb | we should be able to manually delete the nodes though right? | 16:53 |
clarkb | especially if they are that old | 16:53 |
*** dougsz has quit IRC | 16:55 | |
mnaser | have we ever mirrored git repos locally? | 16:55 |
pabelanger | clarkb: no, lock held by zuul, can't do it via CLI. Maybe we should add a --force flag | 16:55 |
mnaser | spice-html5 decided "fu github" and is now running on an awful gitlab instance that is working half the time | 16:55 |
*** heyongli has quit IRC | 16:55 | |
*** heyongli has joined #openstack-infra | 16:56 | |
clarkb | mnaser: no, one idea around that was to add the repos to zuul and it would push things into the test nodes | 16:56 |
clarkb | (and maintain the cache on the zuul nodes) | 16:56 |
*** krtaylor has joined #openstack-infra | 16:58 | |
*** zoli is now known as zli|gone | 16:59 | |
*** zli|gone is now known as zoli|gone | 16:59 | |
*** zoli|gone is now known as zoli | 16:59 | |
*** e0ne has joined #openstack-infra | 17:00 | |
*** derekh has quit IRC | 17:00 | |
*** trown is now known as trown|lunch | 17:01 | |
mnaser | https://gitlab.freedesktop.org/spice/spice-html5 like this was 503ing a whole bunch of times | 17:02 |
*** SumitNaiksatam has joined #openstack-infra | 17:02 | |
pabelanger | isn't it published to npm? | 17:02 |
mnaser | ooo | 17:03 |
mnaser | that's a good idea | 17:03 |
*** krtaylor has quit IRC | 17:03 | |
pabelanger | yah, you'll get the regional reverse proxy cache that way | 17:03 |
mnaser | a whole bunch of commits missing from master tho | 17:03 |
mnaser | gr | 17:03 |
mnaser | maybe they publish snapshots to npm hm | 17:04 |
clarkb | mnaser: thanks for the comments about docker install yesterday, looking at the log now and http://logs.openstack.org/74/74/c356c2eeb28c1dcd4deb2b00fcd896c57d66284c/third-party-check/kata-runsh/bae0973/job-output.txt.gz#_2018-06-13_22_39_08_695464 looks suspiciously unhappy | 17:04 |
clarkb | corvus: looking at that log path I think the 74/74/sha1/ maybe in error? I think we want 74/c3/sha1 ? | 17:04 |
mnaser | clarkb: yeah, docker-ce would make a difference, can i iterate on the job you were building out? | 17:05 |
mnaser | the prep script is the same they use across all vms | 17:05 |
clarkb | mnaser: definitely, but also docker-ce doesn't have bionic package :/ | 17:05 |
mnaser | you have to add a repo | 17:05 |
clarkb | mnaser: ya setup.sh is running but there are only older ubuntu packages | 17:05 |
clarkb | this is why I manually install docker as part of the job from the system | 17:05 |
*** heyongli has quit IRC | 17:06 | |
clarkb | mnaser: they have a bionic repo but no packages in it | 17:06 |
mnaser | right but you have to add the docker.io repos | 17:06 |
mnaser | OH | 17:06 |
mnaser | OH | 17:06 |
clarkb | yes I know, those are empty :) | 17:06 |
mnaser | i'm sorry | 17:06 |
mnaser | okay, now i get it | 17:06 |
mnaser | sorry potato brain | 17:06 |
*** heyongli has joined #openstack-infra | 17:06 | |
clarkb | ya I don't know if that is just lag or if they don't publish packages while the distro has up to date packages itself or what | 17:06 |
pabelanger | https://github.com/docker/for-linux/issues/290 | 17:06 |
mnaser | i think there is another path | 17:06 |
clarkb | we could try the xenial packages | 17:06 |
pabelanger | or artful? | 17:07 |
mnaser | or maybe we could run their ci on xenial vms? | 17:08 |
clarkb | mnaser: ya I think the other path works as far as "oh docker is already installed" but doesn't work for the configure docker stuff | 17:08 |
clarkb | mnaser: ya we could do that, though I figured starting on newer distro would be nice for other reasons (but docker-ce sort of gets in the way I guess) | 17:08 |
mnaser | involves changing up the nodesets but it is a lot less 'breaking' their ci | 17:08 |
mnaser | they do testing on 16.04, 17.10 (dont ask me), centos 7 and fedora 27 | 17:09 |
clarkb | was thinking newer kernel and everything else may be beneficial to them but maybe worry about that later | 17:10 |
*** e0ne has quit IRC | 17:10 | |
mnaser | ++ | 17:10 |
clarkb | let me put up some patches to xenial | 17:11 |
*** myoung|lunch is now known as myoung | 17:12 | |
*** e0ne has joined #openstack-infra | 17:13 | |
*** jpena is now known as jpena|off | 17:14 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Add xenial node for vexxhost kata testing https://review.openstack.org/575502 | 17:15 |
clarkb | mnaser: ^ that is first step | 17:15 |
openstackgerrit | Merged openstack-infra/project-config master: Remove glance legacy job https://review.openstack.org/551016 | 17:15 |
mnaser | lgtm | 17:16 |
*** heyongli has quit IRC | 17:16 | |
*** heyongli has joined #openstack-infra | 17:16 | |
pabelanger | +3 | 17:16 |
*** amoralej is now known as amoralej|off | 17:16 | |
dhellmann | could I get 1 more reviewer to take a look at https://review.openstack.org/574842 please? I would like to be able to test out that new check job but I can't do it speculatively because the change is in project-config | 17:17 |
openstackgerrit | Clark Boylan proposed openstack-infra/openstack-zuul-jobs master: Improve kata-runsh job https://review.openstack.org/573748 | 17:18 |
clarkb | pabelanger: mnaser ^ that is the consumption of it, I have just been testing that in a self testing manner though | 17:18 |
clarkb | so doesn't need review yet | 17:18 |
*** krtaylor has joined #openstack-infra | 17:19 | |
corvus | dhellmann, AJaeger: why is that in project-config instead of ozj? | 17:19 |
dhellmann | because it uses part of the tarball playbook and the real publish job needs to be in project-config because it uses secrets | 17:19 |
*** e0ne has quit IRC | 17:19 | |
corvus | thx | 17:19 |
mnaser | clarkb: this is the ready script right now in jenkins that is working - http://paste.openstack.org/show/723489/ | 17:20 |
dhellmann | keeping the 2 things together felt like the less confusing way to do it, rather than duplicating the playbook and role | 17:20 |
dhellmann | *2 jobs together | 17:20 |
mnaser | you can skip the java part obviously because that's for the jenkins slave stuff | 17:20 |
mnaser | also the whole unattended stuff is probably useless in our case | 17:20 |
*** krtaylor has quit IRC | 17:21 | |
clarkb | mnaser: ya also setup.sh does the docker install if no docker installed so I think it has that covered. Also we don't need java for jenkins | 17:21 |
mnaser | oh the setup.sh does docker install? ok interesting didnt know that | 17:21 |
mnaser | i copypasta'd old config | 17:21 |
clarkb | mnaser: ya, it was one of the first things that failed for me because of the lack of bionic packages | 17:21 |
mnaser | aaaah | 17:22 |
clarkb | got past that then ran into the nested virt problem and it wouldn't run past that | 17:22 |
*** krtaylor has joined #openstack-infra | 17:22 | |
*** krtaylor has quit IRC | 17:22 | |
mnaser | gotcha | 17:23 |
clarkb | but what is old is new again now that nested virt is addressed for their use case | 17:23 |
*** yamamoto has joined #openstack-infra | 17:23 | |
*** e0ne has joined #openstack-infra | 17:24 | |
clarkb | mnaser: https://github.com/kata-containers/proxy/pull/74 is the pull request I have been pushing to to test changes to https://review.openstack.org/573748 if you want to also try that feel free. Though we have to wait for the xenial stuff to get to nodepool first | 17:24 |
mnaser | ooh that's how you've been doing it | 17:25 |
mnaser | i see | 17:25 |
clarkb | mnaser: right now only that one project in github talks to zuul | 17:25 |
mnaser | are you manually re-enqueuing or 'recheck' | 17:25 |
*** felipemonteiro has joined #openstack-infra | 17:25 | |
clarkb | mnaser: I'm pushing new commits to the PR because I don't think recheck will work there until they accept the updated perms requirements | 17:25 |
clarkb | mnaser: usually I just edit the commit message then push :) | 17:26 |
*** heyongli has quit IRC | 17:26 | |
mnaser | ok cool | 17:26 |
*** heyongli has joined #openstack-infra | 17:26 | |
mnaser | once we get the base going i think it might be relatively easy to move things across | 17:27 |
clarkb | ya shouldn't be too bad. The biggest thing will be adding a tenant for them in zuul if they want to use zuul | 17:27 |
*** e0ne has quit IRC | 17:27 | |
*** yamamoto has quit IRC | 17:29 | |
*** tesseract has quit IRC | 17:30 | |
*** dave-mccowan has quit IRC | 17:30 | |
clarkb | mwhahaha: so that I understand properly, the symptom of failure you are observing is that multiple docker image pulls take long enough to cause jobs to timeout? Also, you are pulling the same images in all cases so they should be cached? | 17:30 |
clarkb | or rather pulling the same images within a single job so subsequent pulls should be quicker | 17:31 |
mwhahaha | yea | 17:31 |
corvus | dhellmann: i feel like there's probably a way to either use inheritance or ansible roles so that the bulk of the job is in ozj, and then there's a smaller thing in project-config which adds the secret into the mix for the "real" job. but probably the best thing to do is to +3 that change and then think about refactoring later (since it shouldn't be hard) rather than blocking on perfect design. :) | 17:31 |
*** janki has quit IRC | 17:31 | |
clarkb | mwhahaha: cool thanks for confirming. In that case I agree checking cache usage is going to be helpful. I think causes of that could be not using the cache at all (we have checked that we are using the proxies right?), the proxies not caching or not using cached data for some reason, or we are caching but network bandwidth (possibly disk io?) is the bottleneck | 17:32 |
clarkb | my change to the apache mirror config should help address the second item there | 17:32 |
mwhahaha | clarkb: we checked the logs and it's referencing the mirrors so it should be going through the proxies | 17:33 |
mwhahaha | so i'm just trying to work through the flow to understand where there might be issues. i know that the transit to the origin would be a problem. It's not consistent enough to look like we're hitting something there but may be related to efficiency of the caches | 17:34 |
clarkb | mwhahaha: is any one region worse than the others (that could point to network/disk bottlenecks) | 17:35 |
mwhahaha | not really | 17:35 |
mwhahaha | weshay|ruck had more data around that | 17:35 |
mwhahaha | we were seeing issues in rax and in limestone | 17:35 |
mwhahaha | and maybe vexxhost | 17:35 |
mwhahaha | at least thats what he mentioned to me yesterday but it wasn't consistent | 17:36 |
weshay|ruck | what's up | 17:36 |
mwhahaha | i noticed there seemed to be a slight pattern in timeouts but it wasn't exactly every N hours | 17:36 |
*** heyongli has quit IRC | 17:36 | |
mwhahaha | weshay|ruck: looking into container fetching and caching efficiencies | 17:36 |
weshay|ruck | k | 17:37 |
*** heyongli has joined #openstack-infra | 17:37 | |
corvus | i'm going to directly re-enqueue some zuul gate changes (starting at 575351) so that we can restart it with them today | 17:38 |
*** akhilaki has joined #openstack-infra | 17:38 | |
clarkb | ok | 17:38 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Added endpoint to get current selection plan by status https://review.openstack.org/575509 | 17:38 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Added endpoint to get current selection plan by status https://review.openstack.org/575509 | 17:39 |
clarkb | I'm getting htcacheclean to print everything we have in the cache on limestone | 17:42 |
*** florianf has quit IRC | 17:43 | |
*** pcaruana has joined #openstack-infra | 17:43 | |
clarkb | limestone has 6718 docker images cached | 17:44 |
clarkb | or at least that many /cloudfront/registry-v2/docker/registry/v2/blobs/$sha256 objects | 17:45 |
*** e0ne has joined #openstack-infra | 17:45 | |
clarkb | that implies we are at least caching them | 17:45 |
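
A count like the one above can be reproduced from a cache listing by filtering on the registry blob path. This is a sketch under assumptions: the input is taken to be one cached URL per line, possibly followed by other whitespace-separated fields, and the marker is the path fragment quoted in the log. The function name is illustrative.

```python
# Path fragment taken from the blob URLs mentioned above.
BLOB_MARKER = "/docker/registry/v2/blobs/"


def count_blob_urls(lines):
    """Count distinct cached URLs that point at registry blobs.

    Assumes the URL is the first whitespace-separated field on each
    line (a hypothetical listing layout).
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if fields and BLOB_MARKER in fields[0]:
            seen.add(fields[0])
    return len(seen)
```

Because blobs are layers rather than whole images, the result is a count of cached layer objects, not of images.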
*** heyongli has quit IRC | 17:47 | |
*** heyongli has joined #openstack-infra | 17:47 | |
corvus | clarkb: do we log cache status? https://httpd.apache.org/docs/2.4/mod/mod_cache.html#status | 17:47 |
clarkb | corvus: not yet https://review.openstack.org/575479 is up to start doing that | 17:49 |
clarkb | the biggest object in the cache is 429MB and is an opendaylight tarball. The next biggest objects are in the 330MB range and are all container images | 17:50 |
*** diablo_rojo has joined #openstack-infra | 17:51 | |
openstackgerrit | Merged openstack-infra/project-config master: add a job to check the metadata for python packages https://review.openstack.org/574842 | 17:52 |
openstackgerrit | Merged openstack-infra/project-config master: remove publish-openstack-python-tarball https://review.openstack.org/574859 | 17:52 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Add cache status into mirror access log https://review.openstack.org/575514 | 17:54 |
corvus | clarkb: ^ i was thinking of an alternate approach, what do you think? | 17:54 |
clarkb | 43.9GB of 52GB of cache is just docker body content | 17:54 |
clarkb | corvus: looking | 17:55 |
clarkb | corvus: ya that will get us more info. I was wanting to avoid needing to sift too much data but that can all be done after the fact +2 | 17:56 |
EmilienM | is gerrit only slow for me? | 17:56 |
clarkb | EmilienM: it wasn't slow for me reviewing that change just now. Is it the web ui that is slow or pushing code? maybe both? | 17:56 |
EmilienM | both | 17:56 |
EmilienM | it's probably canadian mega firewalls | 17:56 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix paused handler exception handling https://review.openstack.org/575515 | 17:57 |
*** heyongli has quit IRC | 17:57 | |
clarkb | we aren't garbage collecting according to melody | 17:57 |
*** heyongli has joined #openstack-infra | 17:57 | |
*** ykarel_ has quit IRC | 17:58 | |
clarkb | mwhahaha: corvus I was also wrong about having 100GB available for the cache. We do have at least that but then htcacheclean is set to trim at 50GB because it lags behind apache | 17:58 |
clarkb | we may be able to increase that number, cache cleaning performance appears much better after we changed the number of levels in the cache | 17:58 |
clarkb | 80GB maybe | 17:58 |
clarkb | mwhahaha: weshay|ruck if you can find the sha256sum of one of your images that should not have changed recently I can double check if it is in the cache too | 18:00 |
dhellmann | corvus : I stepped away from lunch, so I'm just coming back to your comment. I'll be happy to work with you on a refactoring once I have the job working. | 18:01 |
*** pcaruana has quit IRC | 18:01 | |
clarkb | dockerhub search doesn't work on sha256s apparently | 18:03 |
*** krtaylor has joined #openstack-infra | 18:04 | |
*** jesslampe has joined #openstack-infra | 18:04 | |
*** krtaylor has quit IRC | 18:06 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add buildset-artifacts-location https://review.openstack.org/530679 | 18:06 |
clarkb | they don't print the sha256sum either | 18:06 |
*** pcaruana has joined #openstack-infra | 18:06 | |
*** krtaylor has joined #openstack-infra | 18:07 | |
*** heyongli has quit IRC | 18:07 | |
clarkb | aha as of docker 1.10 the layers are sha256 addressable | 18:07 |
*** heyongli has joined #openstack-infra | 18:08 | |
clarkb | and no longer 1:1 mapped with images | 18:08 |
clarkb | still would be nice to be able to look things up this way | 18:08 |
clarkb | (but understand why it is more difficult if everyone shares a layer) | 18:08 |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Rename the API-WG to API-SIG https://review.openstack.org/575478 | 18:08 |
clarkb | mwhahaha: weshay|ruck but ya if you can docker inspect one of your images that hasn't updated recently then I can check for whether or not the layers are cached | 18:09 |
*** krtaylor has quit IRC | 18:11 | |
*** owalsh has joined #openstack-infra | 18:12 | |
*** aojea has quit IRC | 18:14 | |
*** lpetrut has quit IRC | 18:14 | |
*** eharney has quit IRC | 18:15 | |
*** heyongli has quit IRC | 18:17 | |
*** heyongli has joined #openstack-infra | 18:18 | |
*** electrofelix has quit IRC | 18:19 | |
*** felipemonteiro has quit IRC | 18:19 | |
*** trown|lunch is now known as trown | 18:19 | |
*** eharney has joined #openstack-infra | 18:20 | |
openstackgerrit | Brianna Poulos proposed openstack-infra/openstack-zuul-jobs master: Remove glance legacy job https://review.openstack.org/551019 | 18:21 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Increase apache mirror cache to 70GB https://review.openstack.org/575520 | 18:23 |
clarkb | corvus: mwhahaha ^ something like that may also help | 18:23 |
*** sshnaidm is now known as sshnaidm|off | 18:26 | |
*** e0ne has quit IRC | 18:27 | |
*** heyongli has quit IRC | 18:28 | |
*** r-daneel has quit IRC | 18:28 | |
*** heyongli has joined #openstack-infra | 18:28 | |
*** r-daneel has joined #openstack-infra | 18:28 | |
*** yamamoto has joined #openstack-infra | 18:34 | |
openstackgerrit | Merged openstack-infra/zuul master: Move zuul_log_id injection to command action plugin https://review.openstack.org/575351 | 18:36 |
openstackgerrit | Merged openstack-infra/zuul master: Fix log streaming for delegated hosts https://review.openstack.org/575352 | 18:36 |
openstackgerrit | Merged openstack-infra/zuul master: Revert "Temporarily override Ansible linear strategy" https://review.openstack.org/575353 | 18:36 |
openstackgerrit | Merged openstack-infra/zuul master: Remove extra argument when logging logger timeout https://review.openstack.org/575354 | 18:36 |
*** heyongli has quit IRC | 18:38 | |
*** heyongli has joined #openstack-infra | 18:38 | |
*** yamamoto has quit IRC | 18:39 | |
*** germs has joined #openstack-infra | 18:43 | |
*** germs has quit IRC | 18:43 | |
*** germs has joined #openstack-infra | 18:43 | |
*** germs has quit IRC | 18:47 | |
*** heyongli has quit IRC | 18:48 | |
*** heyongli has joined #openstack-infra | 18:48 | |
*** yamamoto has joined #openstack-infra | 18:49 | |
*** camunoz has quit IRC | 18:50 | |
*** Goneri has quit IRC | 18:52 | |
*** dave-mccowan has joined #openstack-infra | 18:52 | |
*** yamamoto has quit IRC | 18:53 | |
clarkb | mwhahaha: corvus my reading of the cache timestamps is that we have plenty of older cached data in the cache. So increasing the size may not actually help | 18:54 |
*** camunoz has joined #openstack-infra | 18:55 | |
clarkb | (that implies we are keeping old data then checking if it has been modified since and not deleting it due to disk pressure) | 18:57 |
clarkb | the logging change from corvus should tell us more though | 18:57 |
*** yamamoto has joined #openstack-infra | 18:58 | |
*** heyongli has quit IRC | 18:58 | |
*** heyongli has joined #openstack-infra | 18:59 | |
*** sthussey has quit IRC | 18:59 | |
*** gfidente is now known as gfidente|afk | 19:00 | |
dhellmann | corvus : I expected this patch to run the new test release job and I don't know why it didn't. Is there some way to ask zuul? https://review.openstack.org/#/c/574916/ | 19:02 |
dhellmann | oh, nevermind, I see why | 19:03 |
dhellmann | missing a depends-on | 19:03 |
*** yamamoto has quit IRC | 19:07 | |
hogepodge | Sorry for the silly question, but how do I switch a gate job from optional to required? | 19:08 |
*** heyongli has quit IRC | 19:09 | |
clarkb | hogepodge: depends on what you mean by optional. Is it currently non voting? or does it only run when certain files are modified? | 19:09 |
fungi | hogepodge: not silly, but i'm having trouble parsing it. is the job in question being run now but reported as "non-voting" with the result? | 19:09 |
*** heyongli has joined #openstack-infra | 19:09 | |
*** eernst has quit IRC | 19:09 | |
hogepodge | clarkb: fungi: I want to change the job loci-requirements from non-voting to voting https://review.openstack.org/#/c/575174/ | 19:10 |
hogepodge | The rest can remain non-voting | 19:10 |
*** eernst has joined #openstack-infra | 19:11 | |
clarkb | hogepodge: https://git.openstack.org/cgit/openstack/loci/tree/.zuul.d/base.yaml#n6 sets that voting value to false globally in the loci jobs. You can override that at https://git.openstack.org/cgit/openstack/loci/tree/.zuul.d/requirements.yaml#n12 to force that one job to be voting | 19:12 |
clarkb | just add a voting: True | 19:14 |
*** e0ne has joined #openstack-infra | 19:15 | |
*** felipemonteiro has joined #openstack-infra | 19:15 | |
hogepodge | Thanks clarkb. | 19:15 |
*** yamamoto has joined #openstack-infra | 19:16 | |
*** yamamoto has quit IRC | 19:16 | |
*** eernst has quit IRC | 19:16 | |
*** eernst has joined #openstack-infra | 19:16 | |
*** heyongli has quit IRC | 19:19 | |
*** heyongli has joined #openstack-infra | 19:19 | |
*** sthussey has joined #openstack-infra | 19:23 | |
clarkb | any other infra-root willing to review https://review.openstack.org/#/c/575514/1 ? | 19:25 |
clarkb | adds cache status logging to our caching mirror proxies | 19:25 |
*** bobh has joined #openstack-infra | 19:27 | |
*** heyongli has quit IRC | 19:29 | |
*** heyongli has joined #openstack-infra | 19:29 | |
prometheanfire | dhellmann: might as well ask here | 19:30 |
prometheanfire | how long is the hound re-index expected to take? | 19:31 |
clarkb | prometheanfire: I want to say just a few minutes, like 5-10 | 19:32 |
fungi | 2018/06/14 14:57:51 Rebuilding airship-drydock for b4a31a79de5096ffb39c21c52544da515a7e06be | 19:36 |
fungi | 2018/06/14 14:57:51 open /tmp/csearch078190157: too many open files | 19:36 |
fungi | from the end of /var/log/hound.log | 19:36 |
prometheanfire | ah | 19:36 |
fungi | looks like it may have been stuck partway through a reindex for the past ~5.5 hours | 19:37 |
*** lifeless has quit IRC | 19:37 | |
*** eharney has quit IRC | 19:37 | |
*** heyongli has quit IRC | 19:39 | |
*** salv-orlando has joined #openstack-infra | 19:40 | |
*** heyongli has joined #openstack-infra | 19:40 | |
dhellmann | heh, I assumed it was just taking a long time because we have a lot of data | 19:41 |
fungi | looking for how to kick it manually, though i want to say we've hit "too many open files" with hound in the past; just don't remember if we made any adjustments which we've now started to outgrow | 19:43 |
mtreinish | fungi: sounds like ulimit | 19:44 |
clarkb | mtreinish: agreed | 19:44 |
*** salv-orlando has quit IRC | 19:44 | |
*** eernst has quit IRC | 19:45 | |
fungi | restarting the hound service seems to be how we update it | 19:46 |
fungi | tailing the log now | 19:46 |
fungi | and yeah, we likely made some ulimit changes in the initscript, i'm just checking to see what | 19:47 |
fungi | ulimit -n 2048 | 19:47 |
fungi | in the start function | 19:47 |
fungi | so i guess we've outgrown it | 19:47 |
fungi | i'll get a patch together after my next conference call | 19:48 |
*** heyongli has quit IRC | 19:50 | |
fungi | 2018/06/14 19:50:08 All indexes built! | 19:50 |
*** heyongli has joined #openstack-infra | 19:50 | |
*** kjackal has quit IRC | 19:53 | |
*** lifeless has joined #openstack-infra | 19:53 | |
fungi | quite a number of "too many open files" errors in hound.log going back at least a week (the extent of our log retention on it) | 19:53 |
*** jesslampe has quit IRC | 19:54 | |
*** eernst has joined #openstack-infra | 19:54 | |
*** eernst has quit IRC | 19:55 | |
*** eernst has joined #openstack-infra | 19:55 | |
clarkb | AJaeger: dirk I managed to rescue my xmonad install on tumbleweed by using https://build.opensuse.org/project/show/devel:languages:haskell:lts:11 the joys of a rolling release I guess. I'm pinging you because I can't find why the xmonad packages were removed from the distro proper, any idea how I figure that out? I'd be willing to maintain packages if that is what is needed | 19:55 |
mtreinish | clarkb: heh, its just not hipster enough for xmonad :) | 19:58 |
*** heyongli has quit IRC | 20:00 | |
*** heyongli has joined #openstack-infra | 20:00 | |
*** dpawlik has quit IRC | 20:03 | |
*** agopi has quit IRC | 20:09 | |
*** agopi has joined #openstack-infra | 20:09 | |
*** heyongli has quit IRC | 20:10 | |
*** heyongli has joined #openstack-infra | 20:10 | |
*** iyamahat has quit IRC | 20:14 | |
clarkb | mtreinish: at this point it feels less hipster and more silly old functional programming people related | 20:16 |
*** felipemonteiro has quit IRC | 20:16 | |
clarkb | which is fine by me | 20:16 |
*** yamamoto has joined #openstack-infra | 20:16 | |
*** heyongli has quit IRC | 20:20 | |
*** heyongli has joined #openstack-infra | 20:21 | |
*** yamamoto has quit IRC | 20:22 | |
*** e0ne_ has joined #openstack-infra | 20:26 | |
*** e0ne has quit IRC | 20:29 | |
*** iyamahat has joined #openstack-infra | 20:29 | |
*** heyongli has quit IRC | 20:31 | |
*** heyongli has joined #openstack-infra | 20:31 | |
openstackgerrit | Merged openstack-infra/project-config master: Add xenial node for vexxhost kata testing https://review.openstack.org/575502 | 20:33 |
*** esarault has quit IRC | 20:34 | |
mtreinish | clarkb: heh, ok | 20:38 |
*** salv-orlando has joined #openstack-infra | 20:40 | |
*** heyongli has quit IRC | 20:41 | |
*** heyongli has joined #openstack-infra | 20:41 | |
clarkb | mtreinish: are you in the i3 camp? I seem to recall you had something going on your laptop too | 20:44 |
*** salv-orlando has quit IRC | 20:45 | |
*** eernst has quit IRC | 20:45 | |
mtreinish | I run openbox on my laptop and desktop | 20:47 |
mtreinish | I could get into the full tiling wm thing | 20:47 |
*** eernst has joined #openstack-infra | 20:47 | |
*** eernst has joined #openstack-infra | 20:47 | |
* fungi still uses raptioson on all his x11 sessions | 20:47 | |
mtreinish | I did want to put i3 on my gemini, but the linux side of that is still a mess so I couldn't get it working | 20:48 |
fungi | er, ratpoison | 20:48 |
fungi | no idea how my fingers turned that into raptioson | 20:48 |
zigo | Hi there ! | 20:48 |
zigo | Any idea why https://review.openstack.org/#/c/575168/ isn't launching its tests? | 20:48 |
zigo | Did I do something wrong? | 20:48 |
fungi | zigo: http://zuul.openstack.org/ shows it enqueued in the check pipeline for right at 2 hours now | 20:50 |
zigo | fungi: So, it's just busy infra? | 20:50 |
fungi | we're just busy at the moment, yeah | 20:50 |
zigo | Ok. | 20:51 |
*** heyongli has quit IRC | 20:51 | |
zigo | I just found a wrong test in tempest... | 20:51 |
zigo | http://logs.openstack.org/62/575262/2/check/puppet-openstack-integration-4-scenario003-tempest-debian-stable/fb52e26/job-output.txt.gz#_2018-06-14_14_47_34_908771 | 20:51 |
zigo | Looks like " instead of ' ... :P | 20:51 |
fungi | looks like there are changes just a few minutes ahead of it getting node assignments in check now, so it'll probably start running jobs momentarily | 20:51 |
*** heyongli has joined #openstack-infra | 20:51 | |
zigo | If I fix that one, then I get a 2nd puppet-openstack scenario passing ! :) | 20:52 |
fungi | excellent | 20:52 |
zigo | fungi: https://review.openstack.org/#/c/575262/ <--- The first scenario (the one that everyone cares about) is now green there... | 20:52 |
*** dhajare has quit IRC | 20:53 | |
zigo | I'm not sure what's going on with fwaas though, but it's definitely the cause of the last issue. | 20:53 |
fungi | and looks like 575168,5 has node assignments rolling in now | 20:57 |
*** caphrim007 has joined #openstack-infra | 20:58 | |
openstackgerrit | Merged openstack-infra/grafyaml master: fix tox python3 overrides https://review.openstack.org/574333 | 21:00 |
*** eernst has quit IRC | 21:00 | |
*** eernst has joined #openstack-infra | 21:00 | |
ianw | mordred: are you thinking the clouds.yaml would look like -> http://paste.openstack.org/show/723511/ ? | 21:00 |
ianw | cause i still can't get it to automatically choose :/ | 21:01 |
clarkb | mnaser: I just pushed to my kata proxy PR so we should queue up a job on xenial | 21:01 |
ianw | i feel slightly better that it was not at least trivially obvious | 21:01 |
*** heyongli has quit IRC | 21:01 | |
*** heyongli has joined #openstack-infra | 21:02 | |
mnaser | clarkb: oh cool i'll keep an eye out | 21:02 |
*** eernst has quit IRC | 21:02 | |
*** eernst has joined #openstack-infra | 21:06 | |
*** gfidente|afk has quit IRC | 21:07 | |
ianw | ok, i think it's default we want | 21:08 |
ianw | which leads to "The cloud returned multiple addresses, and none of them seem to work. That might be what you wanted, but we have no clue what's going on, so we just picked one at random" | 21:09 |
ianw | which sounds like a mordred error message to me :) but it comes up | 21:09 |
ianw | fungi / clarkb : want to checkout 147.75.38.146 for sanity as a testing node; if it's good we're closer to getting packethost up | 21:10 |
*** bobh has quit IRC | 21:10 | |
*** eernst has quit IRC | 21:10 | |
fungi | i can certainly reach it | 21:10 |
fungi | 75gb rootfs with 63gb available might be tight | 21:11 |
clarkb | fungi: I think that is what we get in other clouds | 21:11 |
fungi | i thought it was 80, but maybe close enough | 21:12 |
*** SumitNaiksatam has quit IRC | 21:12 | |
fungi | 4 vcpus | 21:12 |
*** heyongli has quit IRC | 21:12 | |
*** heyongli has joined #openstack-infra | 21:12 | |
ianw | oohh, i might have selected m1.large actually | 21:12 |
ianw | i think there's a zuul flavor | 21:12 |
clarkb | fungi: I think once you account for Gb vs GB and fs overhead we end up with ~75GB usable then our git caches fill up a ton of space alongside the distro | 21:12 |
fungi | sure | 21:13 |
clarkb | but ya 4vcpus looks potentially short what we want, but if we aren't oversubscribed or those cpus are fast it could be ok | 21:13 |
ianw | 548f2da8-edb4-440f-8f64-b661223f572c | zuul-flavor | 8192 | 80 | 0 | 8 | True | 21:13 |
clarkb | osic was 4vcpu for a while | 21:13 |
ianw | clarkb: ^ yep, the zuul flavor has 8, but is otherwise the same | 21:13 |
clarkb | ianw: that looks closer to what we want | 21:13 |
clarkb | ianw: maybe run tests on both flavors | 21:13 |
fungi | and yeah, no other block devices besides the rootfs as far as i can see | 21:13 |
clarkb | if 4vcpu is enough we use that otherwise use 8 | 21:13 |
ianw | i can bring up a zuul flavor ... hang on | 21:13 |
*** iyamahat has quit IRC | 21:13 | |
fungi | /dev/vda1 * 2048 167772126 167770079 80G 83 Linux | 21:14 |
fungi | you're right clarkb | 21:14 |
fungi | Disk /dev/vda: 80 GiB, 85899345920 bytes, 167772160 sectors | 21:15 |
*** ldnunes has quit IRC | 21:15 | |
ianw | 147.75.38.147 is a zuul-flavor | 21:15 |
fungi | yeah, looks basically the same except for double the vcpu count | 21:16 |
clarkb | ya I would say run some representative test on it (tempest has been good in the past) and we can compare them and go from there | 21:17 |
fungi | i suppose we can try m1.large and if jobs run too slowly we switch to the zuul flavor? | 21:17 |
clarkb | fungi: in the past I've run tempest on flavors until we find one that works | 21:17 |
clarkb | its a bit more difficult now that we don't have a reproduce.sh that works out of the box | 21:17 |
clarkb | but if you modify the zuul ref stuff you can get it to still function I think | 21:17 |
clarkb | (you just point it at master) | 21:18 |
ianw | well we have enough quota to run 100 of the zuul flavor, so i'd say go with that | 21:18 |
clarkb | ianw: that works too | 21:18 |
ianw | otherwise we run out of ram before cpu | 21:18 |
clarkb | good point | 21:18 |
clarkb | in that case we can just test zuul-flavor | 21:18 |
clarkb | make sure it gets jobs done reasonably quickly and then add it to the pool | 21:18 |
openstackgerrit | Merged openstack-infra/system-config master: Add cache status into mirror access log https://review.openstack.org/575514 | 21:18 |
clarkb | ianw: I want to say if you take a current devstack-gate reproduce.sh and hack it to have a zuul cloner and update the zuul refs it will still work | 21:19 |
*** yamamoto has joined #openstack-infra | 21:19 | |
*** roman_g has quit IRC | 21:19 | |
zigo | fungi: The day we get our public cloud deployed, I'll make sure we give a few VMs to the infra. What's the requirement? Is there a minimum? I guess it's a nice stress test to run infra jobs, no? :) | 21:19 |
*** trown is now known as trown|outtypewww | 21:20 | |
fungi | zigo: it's a very effective stress-test, we've been told | 21:20 |
zigo | :) | 21:20 |
zigo | And then we get free monitoring by real humans ... :P | 21:21 |
fungi | a minimum of 25 nodes worth of quota i think we've said in the past, but preferably around 100 nodes of quota or more | 21:21 |
fungi | 25 is where it starts becoming more work for us to track than we benefit from | 21:21 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add default network to packethost https://review.openstack.org/575547 | 21:21 |
zigo | Ok. | 21:22 |
*** heyongli has quit IRC | 21:22 | |
clarkb | ianw: also ansible-clouds.yaml or whatever that file is called | 21:22 |
zigo | fungi: And for us, it's just giving out a tenant, right? | 21:22 |
*** heyongli has joined #openstack-infra | 21:22 | |
clarkb | it is in the same dir as all-clouds.yaml | 21:22 |
fungi | zigo: basically, yes. details are at https://docs.openstack.org/infra/system-config/contribute-cloud.html | 21:23 |
*** eernst has joined #openstack-infra | 21:23 | |
fungi | zigo: and if we can get 100 nodes of quota then we list the provider at https://www.openstack.org/foundation/companies/#infra-donors as well as on posters and slides in common areas and between talks at some of our conferences/events | 21:24 |
*** yamamoto has quit IRC | 21:24 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add default network to packethost https://review.openstack.org/575547 | 21:25 |
ianw | clarkb: ^ now with more overrides :) | 21:25 |
fungi | zigo: and we track and graph our interactions with per-provider dashboards like http://grafana.openstack.org/dashboard/db/nodepool-rackspace too so you can see what we're seeing | 21:25 |
clarkb | ianw: +2 thanks | 21:26 |
zigo | k | 21:26 |
*** eernst has quit IRC | 21:27 | |
fungi | the "time to ready" graph for example shows how long it's taking from when we put in a nova boot call until the node is available and reachable | 21:27 |
fungi | and the various api operations graphs show response times for the api to get back to us with responses for each of those kinds of calls | 21:28 |
zigo | We're kind of planning on super fast infra... | 21:29 |
zigo | 4 x 10 Gbits bgp to the host... | 21:29 |
*** slaweq has quit IRC | 21:29 | |
zigo | The only thing is that I haven't found so many docs on bgp-to-the-host setups. | 21:30 |
zigo | :/ | 21:30 |
*** slaweq has joined #openstack-infra | 21:30 | |
*** eernst has joined #openstack-infra | 21:31 | |
*** aeng has joined #openstack-infra | 21:32 | |
*** heyongli has quit IRC | 21:32 | |
*** heyongli has joined #openstack-infra | 21:32 | |
fungi | that would be pretty awesome. ibgp i'm assuming, not ebgp | 21:33 |
fungi | though hopefully bgp6 | 21:33 |
fungi | i never really dealt with ibgp back in my providers days since we were pretty firmly entrenched in ospf (and some inherited eigrp) for our igp, though did a lot of ebgp at least | 21:35 |
*** caphrim007 has quit IRC | 21:35 | |
*** eernst has quit IRC | 21:35 | |
*** eernst has joined #openstack-infra | 21:37 | |
*** hasharAway has quit IRC | 21:37 | |
*** camunoz has quit IRC | 21:38 | |
fungi | i can imagine some very interesting global redundancy services you could sell to customers by advertising their same ip addresses from different servers in different facilities | 21:39 |
*** salv-orlando has joined #openstack-infra | 21:41 | |
ianw | fungi / clarkb : i'm feeling like if we merge the default network thing, we just hand-hack in a max servers of 5 or something to get a few jobs and see? | 21:41 |
clarkb | ianw: probably not the worst thing | 21:42 |
*** eernst has quit IRC | 21:42 | |
*** heyongli has quit IRC | 21:42 | |
*** heyongli has joined #openstack-infra | 21:43 | |
*** eernst has joined #openstack-infra | 21:43 | |
*** prometheanfire has quit IRC | 21:43 | |
*** dave-mccowan has quit IRC | 21:45 | |
*** diablo_rojo has quit IRC | 21:45 | |
*** myoung is now known as myoung|off | 21:45 | |
*** salv-orlando has quit IRC | 21:46 | |
*** slaweq has quit IRC | 21:46 | |
fungi | wfm | 21:48 |
*** eernst has quit IRC | 21:48 | |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Migrate the API-SIG to StoryBoard https://review.openstack.org/575120 | 21:49 |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Rename the API-WG to API-SIG https://review.openstack.org/575478 | 21:49 |
*** eernst has joined #openstack-infra | 21:49 | |
*** lifeless has quit IRC | 21:49 | |
*** sthussey has quit IRC | 21:49 | |
*** lifeless has joined #openstack-infra | 21:50 | |
*** eernst has quit IRC | 21:51 | |
*** heyongli has quit IRC | 21:53 | |
*** eernst has joined #openstack-infra | 21:53 | |
*** heyongli has joined #openstack-infra | 21:53 | |
*** edmondsw has quit IRC | 21:54 | |
*** eernst has quit IRC | 21:55 | |
*** eernst has joined #openstack-infra | 21:55 | |
fungi | we're finally back down under 100 node requests | 21:59 |
fungi | er, 1000 | 21:59 |
*** diablo_rojo has joined #openstack-infra | 22:01 | |
fungi | looks like we're spending good chunks of time with no executors accepting, though we're managing to saturate our quotas so i don't guess we need more executors yet | 22:02 |
*** heyongli has quit IRC | 22:03 | |
*** heyongli has joined #openstack-infra | 22:03 | |
*** jbadiapa_ has joined #openstack-infra | 22:03 | |
*** jesslampe has joined #openstack-infra | 22:04 | |
*** jbadiapa has quit IRC | 22:06 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/puppet-openstack_infra_spec_helper master: Use system-config script to install puppet https://review.openstack.org/481943 | 22:06 |
*** akhilaki has quit IRC | 22:11 | |
*** heyongli has quit IRC | 22:13 | |
*** heyongli has joined #openstack-infra | 22:14 | |
ianw | yeah a new cloud would be nice about now :) | 22:16 |
fungi | heh | 22:16 |
fungi | should we bypass check for 575547? | 22:16 |
johnsom | I have a question about a gate run. We got ERROR Unable to find playbook , though it's there in the master branch but this patch was not rebased onto that yet. Is it the case that the playbooks must be in patch history? It seems odd that it would find the job but not the playbook. | 22:18 |
johnsom | This is the patch: https://review.openstack.org/#/c/558962/ | 22:18 |
*** iyamahat has joined #openstack-infra | 22:19 | |
*** rcernin has joined #openstack-infra | 22:20 | |
clarkb | johnsom: where is the change that added the job and the one that adds the playbook | 22:20 |
*** yamamoto has joined #openstack-infra | 22:20 | |
johnsom | clarkb https://review.openstack.org/549654 | 22:20 |
ianw | fungi: will the check actually check anything? i'm not sure we even run a syntax check over those files | 22:21 |
clarkb | johnsom: its possible the branches specifier may be confusing things since this is a branched repo | 22:22 |
corvus | infra-root: any objection to me restarting zuul now? (full restart) | 22:23 |
fungi | ianw: other than maybe a yaml syntax check, doubtful. regardless we're just going to run all the same jobs again in the gate so skipping check doesn't lose us anything | 22:23 |
fungi | corvus: no objection | 22:23 |
*** heyongli has quit IRC | 22:23 | |
johnsom | clarkb Did we do this wrong? branches: ^(?!stable/(ocata|queens)).*$ Both patches were on the master branch | 22:23 |
clarkb | johnsom: looking at time stamps it may also be a race in config generation possibly | 22:24 |
*** heyongli has joined #openstack-infra | 22:24 | |
clarkb | johnsom: in general you don't need branches: specifiers on repos with branches. Instead you just have per branch configs on each branch | 22:24 |
*** e0ne_ has quit IRC | 22:24 | |
clarkb | johnsom: this allows your config to evolve on master but be stable on stable branches just by virtue of git branch branching | 22:24 |
clarkb | corvus: not from me | 22:24 |
*** yamamoto has quit IRC | 22:25 | |
clarkb | johnsom: the behavior of branches: can get really confusing as you branch over time and try to update things | 22:25 |
johnsom | clarkb I wondered about that. So really we don't need those "branches" config lines? That makes more sense to me really | 22:25 |
clarkb | zuul applies it deterministically but it does so in a way that people don't expect unless they consider all branches together | 22:25 |
clarkb | johnsom: ya, instead you just configure the jobs you want in each branch's config | 22:26 |
johnsom | clarkb Golden, thanks! I will clean those up at some point. | 22:26 |
*** boden has quit IRC | 22:27 | |
*** hongbin has quit IRC | 22:27 | |
clarkb | johnsom: as for the error you saw, I think it is possible that there was a race between merging trees for the executors and merging trees for the scheduler. The scheduler merged stuff and added the new job. executor merged stuff and didn't add new job and new playbook | 22:27 |
corvus | clarkb: the executor should merge exactly what the scheduler did | 22:28 |
clarkb | corvus: https://review.openstack.org/#/c/558962/ last comment from zuul there is what we are discussing fwiw | 22:29 |
clarkb | corvus: there definitely appears to be a mismatch | 22:29 |
clarkb | https://review.openstack.org/#/c/549654/37 is the change that added the job and playbook | 22:29 |
clarkb | not seeing anything other than the branches: specifiers that looks out of place. The paths seem to match up | 22:31 |
corvus | clarkb: oh i understand what you're saying. yes, there was likely a mismatch between the scheduler's config and the change itself. if the repo state for the change was frozen by the scheduler, then the change that added the job landed, then we might expect to see this error. | 22:32 |
*** ociuhandu has joined #openstack-infra | 22:32 | |
*** ociuhandu_ has quit IRC | 22:33 | |
corvus | clarkb: (the repo states are still consistent between what the scheduler originally merged for the change and what it executed. the delta is between both of those states and the running config. interestingly, a change which touches .zuul.yaml is probably immune to this fault) | 22:33 |
*** heyongli has quit IRC | 22:34 | |
*** heyongli has joined #openstack-infra | 22:34 | |
clarkb | in that case I think the remedy for now is to recheck | 22:34 |
corvus | yep | 22:34 |
clarkb | johnsom: ^ | 22:34 |
*** nicolasbock has quit IRC | 22:34 | |
johnsom | Ok | 22:35 |
corvus | if we wanted to treat this as a bug, we might be able to fix it by resetting the buildset when jobs are added to a buildset, and those jobs are defined in a repo in the current dependency chain. | 22:36 |
*** iyamahat has quit IRC | 22:42 | |
*** salv-orlando has joined #openstack-infra | 22:42 | |
*** heyongli has quit IRC | 22:44 | |
*** heyongli has joined #openstack-infra | 22:44 | |
corvus | okay, so i think what i'm going to do this time is shut the executors and mergers down, wait until they're stopped, then save queues and restart the scheduler (and friends), then start mergers and executors | 22:44 |
corvus | i think that will minimize perceived downtime while avoiding the issue of the scheduler timing out during startup | 22:45 |
fungi | what were the circumstances of the scheduler timeout last time? | 22:45 |
clarkb | ok, and full restart is for pre release burn in? | 22:45 |
corvus | fungi: if it takes more than 5m for a merger to pick up an initial "cat" job, the scheduler will stop. | 22:46 |
*** salv-orlando has quit IRC | 22:46 | |
*** jbadiapa_ has quit IRC | 22:46 | |
fungi | aha, right, and the mergers/executors were still all stopping when the scheduler got restarted? | 22:47 |
corvus | (i could stop everything, then start the mergers along with the scheduler but not the executor. i think that would work just as well) | 22:47 |
clarkb | I think that is how I've done it before | 22:47 |
corvus | yeah, i think fundamentally the mistake was not starting the mergers with the scheduler. that's minimally required. executors are optional. | 22:48 |
fungi | anyway, sounds safe enough. thanks for the reminder! | 22:48 |
*** boris_42_ has joined #openstack-infra | 22:48 | |
corvus | oh, actually, let me tweak this a bit | 22:49 |
fungi | separately, i wonder if it would make sense for the scheduler to bring a merger to the party the way each executor does | 22:49 |
corvus | https://etherpad.openstack.org/p/aIJSh0cGks | 22:49 |
*** tosky has quit IRC | 22:49 | |
clarkb | that lgtm | 22:50 |
corvus | okay, two procedures there. i think they both would work (i tweaked my original suggestion slightly) | 22:52 |
corvus | with the first procedure, we still want to keep the mergers online as long as possible so that zuul can continue to process events and add new items/jobs to the queues | 22:53 |
clarkb | procedure 2 is the one I've used in the past and seemed to work well enough | 22:54 |
*** heyongli has quit IRC | 22:54 | |
fungi | the main difference with #2 is that the executors are continuing to stop while you're bringing everything else back up i guess? | 22:54 |
*** heyongli has joined #openstack-infra | 22:54 | |
fungi | that means less time wasted waiting since you can wait for them to stop while you wait for the config to get loaded by the scheduler? | 22:55 |
corvus | yes, and the scheduler will come online slower with #2, which means more new events will end up first in the queue ahead of the things you saved | 22:55 |
fungi | good point | 22:55 |
fungi | those people just win the lottery, that's all ;) | 22:55 |
corvus | well, i think either way, the total time taken is going to be driven by the executor restart cycle | 22:55 |
*** threestrands has joined #openstack-infra | 22:56 | |
*** threestrands has quit IRC | 22:56 | |
*** threestrands has joined #openstack-infra | 22:56 | |
*** dklyle has quit IRC | 22:56 | |
corvus | i think the event ordering is the only real substantial difference (even with only the mergers, the scheduler will come online before the executors are stopped) | 22:56 |
corvus | i'm going to try #1 just to try it out | 22:56 |
fungi | wfm | 22:57 |
*** dklyle has joined #openstack-infra | 22:57 | |
clarkb | sounds good | 22:57 |
*** threestrands has quit IRC | 22:57 | |
corvus | i gave a heads up to -release | 22:57 |
*** threestrands has joined #openstack-infra | 22:57 | |
corvus | stopping executors | 22:58 |
corvus | while i'm looking at the post pipeline... clarkb you may be interested in reviewing https://review.openstack.org/571932 | 22:59 |
fungi | oh! i didn't see that one come in | 23:00 |
clarkb | oh indeed. | 23:01 |
corvus | we've gone 6 years with only two pipeline managers. it will be exciting to have a third :) | 23:01 |
*** jbadiapa_ has joined #openstack-infra | 23:02 | |
*** heyongli has quit IRC | 23:04 | |
*** heyongli has joined #openstack-infra | 23:05 | |
pabelanger | Oooh, looking forward to ^ | 23:11 |
clarkb | corvus: when not restarting zuul I left some comments, one of which I think may need attention (at the very least to get the logging right) | 23:12 |
corvus | looks like all executors have stopped, so i'm proceeding now | 23:12 |
*** heyongli has quit IRC | 23:15 | |
*** heyongli has joined #openstack-infra | 23:15 | |
corvus | okay, a downside of process #1 is extra nodepool thrashing | 23:15 |
corvus | after stopping the executors, the scheduler recycles all the nodes, but then once the scheduler restarts, they all have to be recycled again | 23:17 |
corvus | (sorry cloud providers) | 23:17 |
corvus | that might be a reason to favor #2 | 23:17 |
fungi | yeah | 23:17 |
*** yamamoto has joined #openstack-infra | 23:21 | |
EmilienM | is the queue going to come back? | 23:23 |
corvus | fungi, clarkb: responded to comments; can you check me on that? | 23:23 |
fungi | EmilienM: yes | 23:24 |
EmilienM | ok | 23:24 |
corvus | EmilienM: yep, just started re-enqueuing (was waiting for zuul to come online) | 23:24 |
*** heyongli has quit IRC | 23:25 | |
*** heyongli has joined #openstack-infra | 23:25 | |
fungi | corvus: thanks for the detailed response. makes sense and i hadn't considered how the queue ordering actually worked | 23:26 |
*** yamamoto has quit IRC | 23:26 | |
corvus | fungi: me neither -- at least not with respect to starvation | 23:26 |
fungi | it's certainly an interesting dynamic | 23:27 |
fungi | on the other hand, at least in our model, the items should only enqueue as fast as new merges can trigger them | 23:27 |
corvus | i think maybe a way things could be made more fair (for supercedent and dependent) would be to have the queue processor order the queues based on the enqueue time of the head of each queue. | 23:28 |
*** prometheanfire has joined #openstack-infra | 23:28 | |
fungi | yeah, i'm trying to think back to fair queuing algorithms i've had familiarity with in the past and whether any are relevant to this case | 23:29 |
fungi | the packet forwarding field is full of fair queuing designs | 23:29 |
clarkb | corvus: aha, I think that may be worth a comment as its non obvious if just looking at it like queue operations. Of course that could be the allergy headache talking | 23:30 |
clarkb | corvus: basically 0th and last are special | 23:30 |
fungi | clarkb: my left ear has been stopped up since vancouver. i feel for you :/ | 23:30 |
*** jesslampe has quit IRC | 23:30 | |
clarkb | fungi: the last few days have been really bad, yesterday was ok though. We had a reasonably dry may, then rain and now sun again and I think that triggered whatever my body hates to add pollen to the air | 23:31 |
clarkb | the rain then sun combo is so bad | 23:31 |
fungi | unless you're grass | 23:31 |
clarkb | I have a hunch it's what I think is lilac in the backyard | 23:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add supercedent pipeline manager https://review.openstack.org/571932 | 23:32 |
clarkb | but I don't want to take the tree out until I am sure because its a nice tree | 23:32 |
*** heyongli has quit IRC | 23:35 | |
*** heyongli has joined #openstack-infra | 23:35 | |
*** r-daneel has quit IRC | 23:37 | |
*** jesslampe has joined #openstack-infra | 23:39 | |
*** lifeless_ has joined #openstack-infra | 23:42 | |
*** salv-orlando has joined #openstack-infra | 23:43 | |
*** lifeless has quit IRC | 23:43 | |
*** heyongli has quit IRC | 23:45 | |
*** heyongli has joined #openstack-infra | 23:46 | |
*** markvoelker has quit IRC | 23:46 | |
*** salv-orlando has quit IRC | 23:47 | |
*** bobh has joined #openstack-infra | 23:52 | |
*** caphrim007 has joined #openstack-infra | 23:52 | |
*** rpioso is now known as rpioso|afk | 23:54 | |
*** heyongli has quit IRC | 23:56 | |
*** heyongli has joined #openstack-infra | 23:56 | |
*** caphrim007 has quit IRC | 23:57 | |
*** caphrim007 has joined #openstack-infra | 23:58 |
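
Note on the `branches:` discussion around 22:23: the pattern johnsom quoted, `^(?!stable/(ocata|queens)).*$`, uses a negative lookahead, so it accepts every branch name that does not start with `stable/ocata` or `stable/queens`. The sketch below only illustrates that regex with Python's `re` module. The branch list and helper names are hypothetical, and exactly how Zuul applies the pattern internally is an assumption not confirmed by this log.

```python
import re

# Pattern quoted by johnsom in the log. Everything else in this block
# (function names, the sample branch list) is illustrative only.
BRANCH_PATTERN = re.compile(r"^(?!stable/(ocata|queens)).*$")


def branch_matches(branch: str) -> bool:
    """Return True if the branch name is accepted by the pattern."""
    return BRANCH_PATTERN.match(branch) is not None


def matching_branches(branches):
    """Keep only the branch names the pattern accepts, preserving order."""
    return [b for b in branches if branch_matches(b)]


if __name__ == "__main__":
    sample = ["master", "stable/ocata", "stable/pike",
              "stable/queens", "stable/rocky"]
    for name in sample:
        print(name, branch_matches(name))
```

Because the pattern is an exclusion list, any stable branch created later would also match, which may be part of what clarkb means when he says the behavior gets confusing as a repo branches over time.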
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!
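
Note on corvus's fairness idea at 23:28 (ordering queues by the enqueue time of each queue's head item): a minimal sketch of that ordering rule follows. The `Item` and `Queue` classes and the `order_queues` function are hypothetical stand-ins invented for this illustration. They are not Zuul's actual classes, and the log does not say how, or whether, this was implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Hypothetical queue item; enqueue_time is seconds since some epoch."""
    name: str
    enqueue_time: float


@dataclass
class Queue:
    """Hypothetical change queue; items[0] is the head of the queue."""
    name: str
    items: List[Item] = field(default_factory=list)


def order_queues(queues: List[Queue]) -> List[Queue]:
    """Return non-empty queues, oldest head item first.

    Empty queues have no head and are skipped. sorted() is stable, so
    queues whose heads share an enqueue time keep their original order.
    """
    non_empty = [q for q in queues if q.items]
    return sorted(non_empty, key=lambda q: q.items[0].enqueue_time)


if __name__ == "__main__":
    queues = [
        Queue("a", [Item("a1", 30.0), Item("a2", 31.0)]),
        Queue("b", [Item("b1", 10.0)]),
        Queue("c"),
        Queue("d", [Item("d1", 20.0)]),
    ]
    print([q.name for q in order_queues(queues)])
```

Under this rule the queue whose head has waited longest is processed first, regardless of how many items sit behind it.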