*** dchen has joined #openstack-infra | 00:02 | |
*** armax has quit IRC | 00:09 | |
*** tkajinam has quit IRC | 00:12 | |
*** takamatsu has quit IRC | 00:18 | |
openstackgerrit | Merged opendev/irc-meetings master: Updated the PowerVM Chair Person Info https://review.opendev.org/671929 | 00:18 |
*** dklyle has joined #openstack-infra | 00:28 | |
*** bobh has joined #openstack-infra | 00:28 | |
*** armax has joined #openstack-infra | 00:32 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add some pointers on the OpenDev PPA https://review.opendev.org/670952 | 00:59 |
*** mgoddard has quit IRC | 01:01 | |
*** mgoddard has joined #openstack-infra | 01:05 | |
*** imacdonn has quit IRC | 01:17 | |
*** imacdonn has joined #openstack-infra | 01:18 | |
*** ruffian_sheep has joined #openstack-infra | 01:25 | |
*** ruffian_sheep has quit IRC | 01:41 | |
*** ruffian_sheep has joined #openstack-infra | 01:46 | |
*** ruffian_sheep has quit IRC | 02:08 | |
*** zul has quit IRC | 02:09 | |
*** ruffian_sheep has joined #openstack-infra | 02:21 | |
*** ruffian_sheep has quit IRC | 02:31 | |
*** bobh has quit IRC | 02:34 | |
*** ruffian_sheep has joined #openstack-infra | 02:37 | |
*** ruffian_sheep27 has joined #openstack-infra | 02:38 | |
*** ruffian_sheep has quit IRC | 02:41 | |
*** ykarel|away has joined #openstack-infra | 02:43 | |
*** ykarel|away has quit IRC | 02:56 | |
*** yamamoto has joined #openstack-infra | 03:00 | |
ianw | i do not like the full set of POST_FAILURES i just got for that ... :/ | 03:04 |
ianw | nor this -- [Mon Jul 22 00:45:20 2019] EXT4-fs (dm-2): Remounting filesystem read-only | 03:05 |
ianw | /srv/static/logs is offline ... looking into it | 03:08 |
ianw | i'm going to reboot the host with the mount commented in fstab; it could do with new kernel updates anyway. we can buffer logs to local disk while i presumably run a prolonged fsck | 03:09 |
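The sequence ianw describes can be sketched as shell. This is a guess at the exact commands (device `/dev/mapper/logs` and the mount points come from the log); everything that would touch a real system is shown as comments, and the fstab edit runs against a scratch copy so the sketch is safe to run anywhere:

```shell
# Scratch copy of the fstab entry for the failed volume (illustrative
# mount options; the real line lives in /etc/fstab on logs.o.o).
printf '%s\n' '/dev/mapper/logs /srv/static/logs ext4 defaults,noatime 0 2' > /tmp/fstab.sample

# 1. Comment the broken volume out so the host can boot without it:
sed -i 's|^/dev/mapper/logs|#&|' /tmp/fstab.sample

# 2. After reboot, buffer new logs on local disk (as root on the host):
#      mkdir -p /opt/logs-buffer
#      mount --bind /opt/logs-buffer /srv/static/logs
# 3. Run the prolonged fsck detached so it survives the SSH session:
#      screen -dmS fsck fsck.ext4 -y /dev/mapper/logs

cat /tmp/fstab.sample
```

The bind mount is what lets jobs keep writing to `/srv/static/logs` while the real volume is offline; the buffered files then get copied back once the fsck finishes, as happens later in the log.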
*** yamamoto has quit IRC | 03:10 | |
*** yamamoto has joined #openstack-infra | 03:14 | |
*** yamamoto has quit IRC | 03:21 | |
*** yamamoto has joined #openstack-infra | 03:26 | |
*** yamamoto has quit IRC | 03:26 | |
*** yamamoto has joined #openstack-infra | 03:27 | |
*** ykarel|away has joined #openstack-infra | 03:31 | |
*** psachin has joined #openstack-infra | 03:32 | |
*** bhavikdbavishi has joined #openstack-infra | 03:34 | |
*** ruffian_sheep27 is now known as ruffian_sheep | 03:36 | |
*** yamamoto has quit IRC | 03:37 | |
*** yamamoto has joined #openstack-infra | 03:39 | |
*** bhavikdbavishi1 has joined #openstack-infra | 03:41 | |
*** bhavikdbavishi has quit IRC | 03:42 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:42 | |
*** yamamoto has quit IRC | 03:44 | |
*** yamamoto has joined #openstack-infra | 03:47 | |
*** yamamoto has quit IRC | 03:48 | |
*** yamamoto has joined #openstack-infra | 03:55 | |
ianw | #status log logs.o.o : /srv/static/logs bind mounted to /opt for buffering, recovery of /dev/mapper/logs proceeding in a root screen session | 04:02 |
openstackstatus | ianw: finished logging | 04:02 |
*** udesale has joined #openstack-infra | 04:02 | |
ianw | infra-root: ^ fyi | 04:02 |
ianw | infra-root: note mounts recorded in /etc/fstab; will need to be restored | 04:04 |
*** ykarel|away has quit IRC | 04:05 | |
*** raukadah is now known as chandankumar | 04:10 | |
ianw | argh i just realised i should have copied out the errors too; there were no i/o type errors in the logs at the time, just some directory-related ext4 issues and then it went r/o per above at 00:45 | 04:15
*** jamesmcarthur has joined #openstack-infra | 04:21 | |
*** ykarel|away has joined #openstack-infra | 04:23 | |
ianw | i think it's gone, oh well :/ | 04:24 |
*** ramishra has joined #openstack-infra | 04:27 | |
*** yamamoto has quit IRC | 04:30 | |
openstackgerrit | Merged opendev/system-config master: Add some pointers on the OpenDev PPA https://review.opendev.org/670952 | 04:39 |
*** yamamoto has joined #openstack-infra | 04:46 | |
*** threestrands has joined #openstack-infra | 05:00 | |
*** jamesmcarthur has quit IRC | 05:04 | |
*** Lucas_Gray has joined #openstack-infra | 05:06 | |
*** jamesmcarthur has joined #openstack-infra | 05:06 | |
*** jamesmcarthur has quit IRC | 05:10 | |
*** Lucas_Gray has quit IRC | 05:44 | |
*** lmiccini has joined #openstack-infra | 05:46 | |
*** ykarel|away is now known as ykarel | 05:49 | |
*** AJaeger has quit IRC | 05:53 | |
ianw | ahh, it's probably worth alerting that *old* logs aren't there for a bit while we reset | 05:56 |
openstackgerrit | Merged opendev/system-config master: Allow to rsync Centos Software Collections repo https://review.opendev.org/671449 | 06:00 |
ianw | #status alert Due to a failure on the logs.openstack.org volume, old logs are unavailable while partition is recovered. New logs are being stored. ETA for restoration probably ~Mon Jul 22 12:00 UTC 2019 | 06:01 |
openstackstatus | ianw: sending alert | 06:01 |
ianw | well there you go, i send that and the fsck just finished | 06:02 |
-openstackstatus- NOTICE: Due to a failure on the logs.openstack.org volume, old logs are unavailable while partition is recovered. New logs are being stored. ETA for restoration probably ~Mon Jul 22 12:00 UTC 2019 | 06:04 | |
*** ChanServ changes topic to "Due to a failure on the logs.openstack.org volume, old logs are unavailable while partition is recovered. New logs are being stored. ETA for restoration probably ~Mon Jul 22 12:00 UTC 2019" | 06:04 | |
*** whoami-rajat has joined #openstack-infra | 06:05 | |
openstackstatus | ianw: finished sending alert | 06:08 |
ianw | i'm just copying buffered logs | 06:13 |
*** udesale has quit IRC | 06:16 | |
*** piotrowskim has joined #openstack-infra | 06:17 | |
*** AJaeger has joined #openstack-infra | 06:18 | |
*** kopecmartin|off is now known as kopecmartin | 06:18 | |
*** AJaeger has quit IRC | 06:18 | |
*** AJaeger has joined #openstack-infra | 06:19 | |
*** ricolin__ is now known as ricolin | 06:20 | |
ianw | that's done. i guess we're ahead of schedule :) | 06:20 |
ianw | #status ok logs.openstack.org volume has been restored. please report any issues in #openstack-infra | 06:20 |
openstackstatus | ianw: sending ok | 06:20 |
AJaeger | thanks, ianw ! | 06:21 |
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://opendev.org/opendev/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/" | 06:23 | |
-openstackstatus- NOTICE: logs.openstack.org volume has been restored. please report any issues in #openstack-infra | 06:23 | |
*** jaosorior has joined #openstack-infra | 06:25 | |
openstackstatus | ianw: finished sending ok | 06:28 |
*** markvoelker has quit IRC | 06:32 | |
*** pcaruana has joined #openstack-infra | 06:32 | |
*** jpena has joined #openstack-infra | 06:34 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: files.o.o : publish .log as text/plain https://review.opendev.org/671963 | 06:37 |
AJaeger | config-core, please review https://review.opendev.org/671117 | 06:43 |
*** udesale has joined #openstack-infra | 06:47 | |
*** yamamoto has quit IRC | 06:47 | |
*** dchen has quit IRC | 06:51 | |
*** dchen has joined #openstack-infra | 06:51 | |
*** odicha has joined #openstack-infra | 06:52 | |
*** xek has joined #openstack-infra | 06:56 | |
*** Goneri has joined #openstack-infra | 06:56 | |
*** slaweq has joined #openstack-infra | 06:57 | |
*** joeguo has quit IRC | 06:57 | |
*** tesseract has joined #openstack-infra | 07:01 | |
*** ginopc has joined #openstack-infra | 07:03 | |
*** markvoelker has joined #openstack-infra | 07:04 | |
*** rpittau|afk is now known as rpittau | 07:05 | |
*** yamamoto has joined #openstack-infra | 07:07 | |
*** rcernin has quit IRC | 07:07 | |
*** jamesmcarthur has joined #openstack-infra | 07:07 | |
*** pgaxatte has joined #openstack-infra | 07:08 | |
*** tkajinam has joined #openstack-infra | 07:08 | |
*** xek has quit IRC | 07:09 | |
*** jbadiapa has joined #openstack-infra | 07:09 | |
*** xek has joined #openstack-infra | 07:10 | |
ianw | AJaeger: 671117 -- how does changing the secret change the publishing path? or did i miss another change? | 07:14 |
AJaeger | ianw: changing the secrets let it publish to a different server, see how the other jobs look like. But let me double check... | 07:16 |
ianw | yeah, i can see that would allow it to write to /afs/openstack.org/docs instead of /afs/openstack.org/developer-docs ... but i'm not sure how it makes up the path | 07:17 |
AJaeger | ianw: check zuul.d/secrets.yaml, it has "path: /afs/.openstack.org/docs" for the new secret and "path: /afs/.openstack.org/developer-docs" for the old one | 07:17 |
AJaeger | That's the magic ;) | 07:17 |
ianw | ooohhhh, right, yep that explains it, thanks | 07:17 |
AJaeger | thanks for double checking! | 07:18 |
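The "magic" AJaeger points at: the two secrets differ in their `path`, so switching which secret a job uses changes where the docs are published. A rough sketch of the shape (field values beyond the two paths quoted in the log are illustrative — see `zuul.d/secrets.yaml` in openstack/project-config for the real definitions):

```yaml
# Illustrative fragment only, not copied from the real file.
- secret:
    name: afsdocs_secret
    data:
      path: /afs/.openstack.org/docs             # new location
      keytab: !encrypted/pkcs1-oaep [ ... ]

- secret:
    name: afsdeveloperdocs_secret
    data:
      path: /afs/.openstack.org/developer-docs   # old location
      keytab: !encrypted/pkcs1-oaep [ ... ]
```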
*** jaosorior has quit IRC | 07:18 | |
*** iurygregory has joined #openstack-infra | 07:19 | |
*** jtomasek has joined #openstack-infra | 07:21 | |
*** jtomasek has quit IRC | 07:21 | |
*** tosky has joined #openstack-infra | 07:21 | |
*** jhesketh has quit IRC | 07:22 | |
*** jtomasek has joined #openstack-infra | 07:22 | |
*** jhesketh has joined #openstack-infra | 07:24 | |
*** apetrich has joined #openstack-infra | 07:24 | |
*** jamesmcarthur has quit IRC | 07:26 | |
*** jamesmcarthur has joined #openstack-infra | 07:26 | |
*** tkajinam has quit IRC | 07:36 | |
*** ykarel is now known as ykarel|lunch | 07:45 | |
*** pgaxatte has quit IRC | 07:51 | |
*** lucasagomes has joined #openstack-infra | 07:54 | |
*** pgaxatte has joined #openstack-infra | 07:54 | |
*** jamesmcarthur has quit IRC | 07:56 | |
*** jamesmcarthur has joined #openstack-infra | 07:58 | |
*** iurygregory has quit IRC | 08:01 | |
*** ralonsoh has joined #openstack-infra | 08:04 | |
*** yolanda has joined #openstack-infra | 08:06 | |
*** roman_g has joined #openstack-infra | 08:07 | |
*** dtantsur|afk is now known as dtantsur | 08:13 | |
*** jamesmcarthur has quit IRC | 08:13 | |
*** hwoarang_ has quit IRC | 08:14 | |
*** threestrands has quit IRC | 08:14 | |
*** threestrands has joined #openstack-infra | 08:14 | |
*** dchen has quit IRC | 08:14 | |
*** kjackal has joined #openstack-infra | 08:22 | |
*** zbr|out is now known as zbr | 08:22 | |
*** odicha has quit IRC | 08:25 | |
*** pkopec has joined #openstack-infra | 08:28 | |
*** gtarnaras has joined #openstack-infra | 08:28 | |
*** kjackal has quit IRC | 08:33 | |
*** odicha has joined #openstack-infra | 08:35 | |
*** kjackal has joined #openstack-infra | 08:36 | |
*** ociuhandu has joined #openstack-infra | 08:37 | |
*** ociuhandu has quit IRC | 08:39 | |
*** ociuhandu has joined #openstack-infra | 08:39 | |
*** iurygregory has joined #openstack-infra | 08:41 | |
*** yamamoto has quit IRC | 08:42 | |
*** Goneri has quit IRC | 08:45 | |
*** yamamoto has joined #openstack-infra | 08:50 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Install gettext for translation jobs https://review.opendev.org/671992 | 08:53 |
*** jpena is now known as jpena|brb | 08:56 | |
*** Goneri has joined #openstack-infra | 08:58 | |
*** hwoarang has joined #openstack-infra | 09:01 | |
*** mgoddard has quit IRC | 09:01 | |
*** threestrands has quit IRC | 09:03 | |
*** e0ne has joined #openstack-infra | 09:04 | |
*** ykarel|lunch is now known as ykarel | 09:06 | |
*** bhavikdbavishi has quit IRC | 09:11 | |
*** ginopc has quit IRC | 09:15 | |
*** ruffian_sheep has quit IRC | 09:24 | |
*** priteau has joined #openstack-infra | 09:29 | |
*** ginopc has joined #openstack-infra | 09:38 | |
*** udesale has quit IRC | 09:40 | |
*** udesale has joined #openstack-infra | 09:40 | |
*** udesale has quit IRC | 09:42 | |
*** udesale has joined #openstack-infra | 09:42 | |
*** FlorianFa has joined #openstack-infra | 09:42 | |
*** siqbal has joined #openstack-infra | 09:42 | |
*** ociuhandu has quit IRC | 09:45 | |
*** jpena|brb is now known as jpena | 09:46 | |
*** ociuhandu has joined #openstack-infra | 09:48 | |
*** adriancz has joined #openstack-infra | 09:49 | |
*** pgaxatte has quit IRC | 10:02 | |
*** yamamoto has quit IRC | 10:05 | |
*** siqbal90 has joined #openstack-infra | 10:06 | |
*** siqbal has quit IRC | 10:07 | |
*** pgaxatte has joined #openstack-infra | 10:09 | |
*** pgaxatte has quit IRC | 10:16 | |
*** pgaxatte has joined #openstack-infra | 10:16 | |
*** joeguo has joined #openstack-infra | 10:24 | |
stephenfin | fungi: RE: doc8, the email doesn't seem to be appearing in the archives yet, but Ian said he was happy to either transfer doc8 to his personal GitHub account (sigmavirus24) or to add you to the doc8 team in the PyCQA org so you can move it directly there | 10:24 |
stephenfin | fungi: Happy to help wherever I can. Just tell me what I've to do :) | 10:25 |
*** udesale has quit IRC | 10:28 | |
*** ccamacho has joined #openstack-infra | 10:29 | |
*** ccamacho has quit IRC | 10:29 | |
*** ccamacho has joined #openstack-infra | 10:29 | |
*** yamamoto has joined #openstack-infra | 10:39 | |
*** jaosorior has joined #openstack-infra | 10:41 | |
*** jaosorior has quit IRC | 10:43 | |
*** jaosorior has joined #openstack-infra | 10:44 | |
*** joeguo has quit IRC | 10:45 | |
*** ykarel is now known as ykarel|afk | 10:47 | |
AJaeger | config-core, please review https://review.opendev.org/671117 and https://review.opendev.org/671121 to further deprecate api-site repo. | 10:49 |
*** yamamoto has quit IRC | 10:49 | |
sshnaidm | where is Dockerfile for nodepool container? There is an error to fix.. | 10:53 |
*** siqbal90 has quit IRC | 10:55 | |
*** siqbal has joined #openstack-infra | 10:55 | |
*** joeguo has joined #openstack-infra | 10:56 | |
sshnaidm | as I understand nodepool containers are not used anywhere and not tested, otherwise it'd fail on first run.. | 10:58 |
*** joeguo has quit IRC | 10:59 | |
*** joeguo has joined #openstack-infra | 10:59 | |
*** ruffian_sheep has joined #openstack-infra | 11:02 | |
openstackgerrit | Sagi Shnaidman proposed zuul/nodepool master: Fix nodepool container failure https://review.opendev.org/672012 | 11:04 |
*** rosmaita has joined #openstack-infra | 11:07 | |
*** ruffian_sheep has quit IRC | 11:08 | |
*** gtarnaras has quit IRC | 11:12 | |
*** apetrich has quit IRC | 11:15 | |
*** betherly has joined #openstack-infra | 11:19 | |
*** yamamoto has joined #openstack-infra | 11:22 | |
*** apetrich has joined #openstack-infra | 11:24 | |
*** betherly has quit IRC | 11:24 | |
*** rh-jelabarre has joined #openstack-infra | 11:28 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** jcoufal has joined #openstack-infra | 11:34 | |
*** kaisers has quit IRC | 11:35 | |
*** kaisers has joined #openstack-infra | 11:36 | |
*** apetrich has quit IRC | 11:37 | |
*** apetrich has joined #openstack-infra | 11:37 | |
*** jaosorior has quit IRC | 11:38 | |
*** yamamoto has quit IRC | 11:38 | |
*** jaosorior has joined #openstack-infra | 11:39 | |
*** kjackal has quit IRC | 11:39 | |
*** gtarnaras has joined #openstack-infra | 11:40 | |
*** kjackal has joined #openstack-infra | 11:41 | |
*** psachin has quit IRC | 11:50 | |
*** betherly has joined #openstack-infra | 11:50 | |
*** yamamoto has joined #openstack-infra | 11:55 | |
*** betherly has quit IRC | 11:55 | |
*** ykarel|afk is now known as ykarel | 12:05 | |
*** udesale has joined #openstack-infra | 12:07 | |
*** joeguo has quit IRC | 12:08 | |
*** lucasagomes has quit IRC | 12:14 | |
*** lucasagomes has joined #openstack-infra | 12:19 | |
*** _erlon_ has joined #openstack-infra | 12:21 | |
*** rfolco|rover has joined #openstack-infra | 12:28 | |
*** rlandy has joined #openstack-infra | 12:31 | |
*** tdasilva has joined #openstack-infra | 12:33 | |
*** dklyle has quit IRC | 12:36 | |
*** david-lyle has joined #openstack-infra | 12:36 | |
*** gtarnara_ has joined #openstack-infra | 12:39 | |
*** yamamoto has quit IRC | 12:39 | |
*** irclogbot_3 has quit IRC | 12:39 | |
*** electrofelix has joined #openstack-infra | 12:40 | |
*** yamamoto has joined #openstack-infra | 12:41 | |
*** gtarnaras has quit IRC | 12:42 | |
*** irclogbot_2 has joined #openstack-infra | 12:42 | |
*** ykarel has quit IRC | 12:43 | |
*** roman_g has quit IRC | 12:46 | |
*** jpena|lunch is now known as jpena | 12:47 | |
*** yamamoto has quit IRC | 12:54 | |
*** jaosorior has quit IRC | 12:59 | |
fungi | ianw: thanks for taking care of the logs volume! sorry i wasn't on hand, had already fallen asleep by that point | 12:59 |
fungi | stephenfin: i'll check the moderation queue here in a moment | 12:59 |
*** siqbal has quit IRC | 13:02 | |
*** siqbal has joined #openstack-infra | 13:02 | |
openstackgerrit | Merged openstack/project-config master: Publish api-ref/api-guide to docs.o.o https://review.opendev.org/671117 | 13:05 |
*** eharney has joined #openstack-infra | 13:10 | |
*** mriedem has joined #openstack-infra | 13:11 | |
*** yamamoto has joined #openstack-infra | 13:12 | |
*** yamamoto has quit IRC | 13:14 | |
fungi | stephenfin: i didn't find anything in the openstack-discuss moderation queue from him. are you sure it was sent to the list address? | 13:15 |
stephenfin | fungi: Sorry, wasn't clear at all /o\ I was referring to the pycqa archives | 13:16 |
fungi | oh, got it | 13:16 |
stephenfin | I'm guessing my messages to that list haven't been approved. In any case, as noted he's onboard with moving to his account and letting him do the shuffle to PyCQA | 13:18 |
fungi | stephenfin: if he temporarily adds the openstackadmin user to the org, i can transfer the repo directly into it with that account | 13:19 |
*** yamamoto has joined #openstack-infra | 13:19 | |
*** aaronsheffield has joined #openstack-infra | 13:20 | |
stephenfin | fungi: I'll ask him to do that now (y) | 13:20 |
fungi | as soon as it's done i'll do the transfer and then he can remove the permission | 13:20 |
AJaeger | mnaser: thanks for reviewing, could you look at https://review.opendev.org/#/c/671118/ also, please? | 13:20 |
*** zzehring has joined #openstack-infra | 13:26 | |
*** siqbal has quit IRC | 13:31 | |
AJaeger | fungi, if you have time for review, please? ^ | 13:33 |
fungi | sure, image encryption meeting wrapped up early, so i have a couple minutes | 13:33 |
fungi | heh, i shouldn't have tried to pull those depends-on entries in gertty | 13:34 |
*** gtarnara_ has quit IRC | 13:35 | |
*** needscoffee is now known as kmalloc | 13:36 | |
AJaeger | ;) | 13:39 |
fungi | fetching changes for openstack-manuals is almost as rough as fetching changes for nova | 13:40 |
*** gtarnaras has joined #openstack-infra | 13:43 | |
*** beekneemech is now known as bnemec | 13:43 | |
AJaeger | ;/ | 13:45 |
AJaeger | fungi: here's another change without dependencies - installing gettext as part of the role - that you might be interested in (no urgency). https://review.opendev.org/671992 Thanks! | 13:46 |
*** tesseract has quit IRC | 13:55 | |
*** ykarel has joined #openstack-infra | 13:59 | |
*** gtarnaras has quit IRC | 14:00 | |
*** tesseract has joined #openstack-infra | 14:00 | |
*** bhavikdbavishi has joined #openstack-infra | 14:00 | |
*** gtarnaras has joined #openstack-infra | 14:01 | |
*** yamamoto has quit IRC | 14:03 | |
*** bhavikdbavishi1 has joined #openstack-infra | 14:04 | |
*** bhavikdbavishi has quit IRC | 14:05 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 14:05 | |
*** yamamoto has joined #openstack-infra | 14:05 | |
johnsom | I have a question about gerrit ACLs. I see that there are "included groups" in the GUI ACLs, but I don't see those in the ACL definitions in the project-config repo. What is the process to update the "included groups" in the gerrit ACLs? | 14:05 |
AJaeger | johnsom: just add them in the UI like you add individual users | 14:06 |
fungi | johnsom: someone with ownership of the group in gerrit makes those changes (either through the gerrit webui or api) | 14:06 |
*** yamamoto has quit IRC | 14:06 | |
fungi | if the group is self-owned then any member of the group can do that | 14:06 |
*** yamamoto has joined #openstack-infra | 14:07 | |
fungi | if the group is owned by another group, then a member of that other owner group has to do it | 14:07 |
fungi | you can find the group owner on the group's general properties page in the gerrit webui | 14:07 |
johnsom | Ah, ok, so I likely can manage those. Thanks! | 14:07 |
fungi | you're welcome! | 14:07 |
fungi | if you need more details, let us know the name of the group and we can work out what's needed | 14:07 |
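The lookup fungi describes can also be done over Gerrit's REST API with `GET /groups/<group-id>/owner`. Gerrit prefixes JSON responses with `)]}'` to defeat XSSI, so the first line must be stripped. The group name and canned response below are hypothetical; the live `curl` is shown as a comment:

```shell
# Live query (needs network; group name is a made-up example):
#   curl -s 'https://review.opendev.org/groups/octavia-core/owner'
# Canned response used here so the sketch runs offline:
resp=')]}'\''
{"name": "octavia-release", "id": "abc123", "group_id": 42}'

# Strip the )]}' guard line, then pull the owning group name out.
owner=$(printf '%s\n' "$resp" | tail -n +2 | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p')
echo "$owner"
```

The same information is on the group's "General" page in the web UI, which is usually the easier route for a one-off check.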
*** michael-beaver has joined #openstack-infra | 14:12 | |
openstackgerrit | Merged openstack/project-config master: Remove publish-openstack-manuals-developer-lang https://review.opendev.org/671118 | 14:12 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Install gettext for translation jobs https://review.opendev.org/671992 | 14:13 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Add build releasenotes py3 template https://review.opendev.org/672053 | 14:19 |
*** yamamoto has quit IRC | 14:22 | |
*** smarcet has joined #openstack-infra | 14:22 | |
*** eernst has joined #openstack-infra | 14:25 | |
*** smarcet has quit IRC | 14:28 | |
clarkb | sshnaidm: questions like that are probably best in #zuul but I believe the answer is in the nodepool repo and that those images are used in the zuul quickstart job | 14:43 |
clarkb | ianw: thank you for taking care of that | 14:45 |
AJaeger | infra-root, could you copy most of http://files.openstack.org/developer-docs/api-ref/ to http://files.openstack.org/docs/api-ref/ , please? In that case I can give you a list of directories to copy... | 14:48 |
clarkb | AJaeger: this is part of the developer.o.o clean up? | 14:50 |
AJaeger | clarkb: yes | 14:51 |
*** yamamoto has joined #openstack-infra | 14:51 | |
AJaeger | might be easier than waiting for all to publish to new content - but nothing that needs done immediately. | 14:51 |
* AJaeger can prepare a small script with dirs to copy | 14:51 | |
clarkb | in theory we can also give AJaeger write perms on docs/ in afs? | 14:52 |
fungi | yeah, as long as he has kerberos working | 14:53 |
*** jamesmcarthur has joined #openstack-infra | 14:53 | |
AJaeger | clarkb: that means I would need to setup AFS which is not worth it for the 2 times I need it... | 14:53 |
fungi | (and afs obviously) | 14:53 |
*** jaosorior has joined #openstack-infra | 14:53 | |
clarkb | AJaeger: ya and my opensuse install stopped working (which is why I was so excited for kafs) | 14:54 |
fungi | i'm happy to do the file copies though, if i get a list/script containing them | 14:54 |
clarkb | me too (I'll just hop on one of our afs servers and do it from there) | 14:55 |
fungi | i have a meeting starting in 5 minutes, but could tackle it after | 14:55 |
AJaeger | thanks, fungi and clarkb - will provide a script in a few minutes | 14:55 |
*** yamamoto has quit IRC | 14:56 | |
AJaeger | fungi, clarkb, http://paste.openstack.org/show/754722/ is the script - needs adjustment in line 43 for the AFS location. Thanks! | 14:57 |
AJaeger | fungi, clarkb, please figure out who does it ;) | 14:57 |
AJaeger | wait - I forgot api-guide... | 14:58 |
AJaeger | now with all content - http://paste.openstack.org/show/754723/ | 14:59 |
AJaeger | needs three lines to adjust for AFS paths | 14:59 |
AJaeger | if you want me to edit the script, tell me - I'll be offline now for an hour or two and can update later. | 15:01 |
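The paste contents (754723) aren't in the log, so this is only a guess at the script's shape: copy a list of api-ref directories from the old AFS location to the new one. The directory names are placeholders, and `SRC`/`DST` default to scratch paths here so the sketch is safe to run; the real run would target the RW volumes under `/afs/.openstack.org/`:

```shell
# Hypothetical shape of the api-ref copy script being discussed.
SRC=${SRC:-/tmp/developer-docs/api-ref}   # real: old AFS docs location
DST=${DST:-/tmp/docs/api-ref}             # real: new AFS docs location
DIRS_REF="compute baremetal"              # placeholder directory list

mkdir -p "$DST"
for d in $DIRS_REF; do
    mkdir -p "$SRC/$d"                    # scratch setup only, not in the real script
    cp -a "$SRC/$d" "$DST/"
done
ls "$DST" | sort
```

Keeping the directory list in a variable is what lets fungi drop `baremetal-introspection` from the run later in the log without touching the copy loop.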
*** jamesmcarthur has quit IRC | 15:03 | |
*** jamesmcarthur has joined #openstack-infra | 15:04 | |
*** ccamacho has quit IRC | 15:05 | |
openstackgerrit | Sagi Shnaidman proposed zuul/nodepool master: Fix nodepool container failure https://review.opendev.org/672012 | 15:07 |
*** gyee has joined #openstack-infra | 15:08 | |
*** jamesmcarthur has quit IRC | 15:08 | |
*** smarcet has joined #openstack-infra | 15:08 | |
*** odicha has quit IRC | 15:13 | |
clarkb | gitea01 and 02 have OOM'd since adding the swapfile | 15:14 |
clarkb | so 1GB swap isn't enough for all the demand there I guess | 15:14 |
mordred | corvus: I have NO CLUE why the gerrit image isn't building in zuul. it's working locally | 15:14 |
mordred | corvus: I'm going to put in a hold | 15:14 |
clarkb | I'll trigger replication against them just to be sure nothing was missed. But I guess we should start looking at rebuilding those as the next step | 15:15 |
mordred | corvus: also - I'm not sure if this is intended design or not: the image build fails, but we don't see a failure until the post job tries to upload it to the changeset registry | 15:15 |
corvus | mordred: i didn't intend for that :( | 15:16 |
mordred | kk. I'll look in to that after I figure out why it's not working in the first place | 15:17 |
*** priteau has quit IRC | 15:20 | |
*** gtarnaras has quit IRC | 15:23 | |
*** ginopc has quit IRC | 15:24 | |
*** yamamoto has joined #openstack-infra | 15:25 | |
*** pgaxatte has quit IRC | 15:29 | |
*** lpetrut has joined #openstack-infra | 15:30 | |
*** gtarnaras has joined #openstack-infra | 15:30 | |
*** ykarel is now known as ykarel|away | 15:35 | |
fungi | clarkb: were you already working on the api-site file copies, or shall i do them now? | 15:43 |
clarkb | fungi: sorry got nerd sniped digging into neutrons gate job changes so no haven't pulled up afs credentials | 15:44 |
clarkb | you should go ahead and do them. I need breakfast too | 15:44 |
*** kjackal has quit IRC | 15:45 | |
fungi | on it, thanks! | 15:45 |
*** kjackal has joined #openstack-infra | 15:47 | |
fungi | AJaeger: i'm going to assume that line 55 of that paste should have been DIRS_GUIDE rather than DIRS and am adjusting the script accordingly | 15:47 |
*** gfidente has joined #openstack-infra | 15:50 | |
AJaeger | fungi: yes, sorry | 15:51 |
*** gtarnaras has quit IRC | 15:52 | |
*** kjackal has quit IRC | 15:55 | |
*** kjackal has joined #openstack-infra | 15:55 | |
fungi | AJaeger: also, some of those target directories already exist. is the cp still going to work fine or will we wind up copying to a subdirectory? | 15:55 |
AJaeger | fungi: only baremetal-inspection exists - want to remove it from the script? | 15:56 |
AJaeger | fungi: the other existing dir (network) I already removed and baremetal-inspection is new | 15:56 |
AJaeger | (newly published) | 15:56 |
fungi | cool, i've taken baremetal-introspection out of the list to copy | 15:57 |
AJaeger | thanks | 15:57 |
fungi | copying starting now | 15:57 |
AJaeger | \o/ | 15:58 |
fungi | it will likely take a while | 15:58 |
fungi | since it's tromboning through my home broadband connection | 15:59 |
AJaeger | oh | 15:59 |
AJaeger | ok | 15:59 |
openstackgerrit | Merged zuul/nodepool master: Fix nodepool container failure https://review.opendev.org/672012 | 16:01 |
*** jpena is now known as jpena|off | 16:02 | |
*** e0ne has quit IRC | 16:05 | |
AJaeger | fungi: I see the first dirs on files.openstack.org ... | 16:07 |
*** udesale has quit IRC | 16:08 | |
fungi | yeah, it's gotten to data-protection-orchestration so far | 16:10 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66635&rra_id=all I think the spike there shows why 1GB swap isn't enough | 16:13 |
clarkb | If we are going to start replacing some of those servers we should stop adding new projects temporarily | 16:13 |
clarkb | AJaeger: ^ any concerns with that? | 16:14 |
AJaeger | clarkb: nothing in the queue for new projects - so go ahead. | 16:15 |
clarkb | Then we want to boot instances with an 80GB volume using the existing flavor of the other nodes. Will need to use the flag added in https://review.opendev.org/#/c/667548/ as we don't have ping6 installed on the image. Also need to remove the existing gitea0X's from the ansible inventory before booting the new instances (as they will conflict with launch node) | 16:15
clarkb | fungi: ^ is that something you were still interested in doing? | 16:16 |
*** rascasoft has quit IRC | 16:16 | |
*** lpetrut has quit IRC | 16:16 | |
* clarkb writes change to remove gitea01 from inventory temporarily | 16:16 | |
*** jaosorior has quit IRC | 16:17 | |
*** lucasagomes has quit IRC | 16:17 | |
*** smarcet has left #openstack-infra | 16:17 | |
sshnaidm | I see more and more errors, especially from OVH BHS1, like: Image prepare failed: 401 Client Error: Unauthorized for url: http://mirror.bhs1.ovh.openstack.org:8082/v2/tripleomaster/centos-binary-nova-compute-ironic/blobs/sha256:6d3a23ca3a1378376ca4268c06d7c7da7b25358e69ff389475e5a30b78549fbb | 16:17
sshnaidm | it failed a few gate jobs | 16:18 |
clarkb | sshnaidm: can you link to the job logs? | 16:18 |
sshnaidm | is there something to do with that? | 16:18 |
*** rascasoft has joined #openstack-infra | 16:18 | |
fungi | clarkb: sorry, stepped away for a sec, but yeah let me catch up and i can start work on replacing gitea01 | 16:18 |
sshnaidm | clarkb, http://logs.openstack.org/26/671526/4/gate/tripleo-ci-centos-7-undercloud-containers/5610169/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz?level=ERROR | 16:18 |
*** dtantsur is now known as dtantsur|afk | 16:18 | |
sshnaidm | clarkb, or this: http://logs.openstack.org/26/671526/4/gate/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a371bbe/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz?level=ERROR | 16:19 |
sshnaidm | clarkb, I have a few like that | 16:19 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove gitea01 from inventory so we can replace it https://review.opendev.org/672083 | 16:20 |
clarkb | fungi: I think something like ^ is the 0th step | 16:20 |
*** rpittau is now known as rpittau|afk | 16:21 | |
clarkb | sshnaidm: those urls are just proxied to dockerhub. My best guess is that object was made private? | 16:21 |
clarkb | sshnaidm: it's also possible they changed their cdn again and nothing is working | 16:21
sshnaidm | clarkb, nope, it's downloaded fine in other jobs | 16:21 |
*** michael-beaver has quit IRC | 16:21 | |
clarkb | however the fact that we get back json saying we are not authorized implies to me that its private | 16:22 |
clarkb | sshnaidm: open the link in your browser | 16:22 |
clarkb | sshnaidm: it's definitely not working generally | 16:22
clarkb | and if you request the same path from multiple mirrors you get the same result | 16:22 |
fungi | clarkb: also isn't there something manual we need to do about clearing cached facts after we remove it from the inventory? | 16:23 |
*** tdasilva_ has joined #openstack-infra | 16:23 | |
clarkb | fungi: maybe? I'm not sure if that was strictly required or I was just confused because we actually use the global inventory when launching nodes | 16:23 |
fungi | ahh, okay. we can give it a shot and find out, i guess | 16:24 |
clarkb | sshnaidm: see also https://hub.docker.com/search?q=tripleomaster%2Fcentos-binary-nova-compute-ironic&type=image | 16:24 |
clarkb | fungi: ya it should be safe because the launch script creates a one off ssh key that won't be able to log into the existing server | 16:25 |
*** piotrowskim has quit IRC | 16:26 | |
*** tdasilva has quit IRC | 16:26 | |
sshnaidm | clarkb, it works there: http://logs.openstack.org/96/669596/4/gate/tripleo-ci-centos-7-containers-multinode/25ffdee/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz#_2019-07-22_14_15_47_219 | 16:26 |
clarkb | sshnaidm: http://mirror.sjc1.vexxhost.openstack.org:8082/v2/tripleomaster/centos-binary-nova-compute-ironic/blobs/sha256:6d3a23ca3a1378376ca4268c06d7c7da7b25358e69ff389475e5a30b78549fbb currently fails | 16:27 |
*** mattw4 has joined #openstack-infra | 16:27 | |
clarkb | so something may have changed? | 16:27 |
clarkb | did the image get deleted? | 16:27 |
clarkb | or switched to a non public image? | 16:28 |
sshnaidm | clarkb, https://hub.docker.com/r/tripleomaster/centos-binary-nova-compute-ironic | 16:28 |
*** iurygregory has quit IRC | 16:28 | |
clarkb | huh, why doesn't that show up in a search? | 16:29 |
fungi | i don't have docker installed... does a docker pull of that directly from dockerhub work? | 16:30 |
*** mattw4 has quit IRC | 16:30 | |
sshnaidm | clarkb, it was there at least for last year, I don't think something changed and other jobs pass on that. | 16:31 |
*** mattw4 has joined #openstack-infra | 16:31 | |
sshnaidm | clarkb, "non authorized" errors usually come from bad proxy, kinda misleading error message | 16:31 |
clarkb | sshnaidm: except our proxy shouldn't be writing the json | 16:32 |
clarkb | sshnaidm: if it was just an http status code I would agree, but that json must be coming from docker hub | 16:32 |
sshnaidm | clarkb, mm.. why does dockerhub answer instead of the proxy? or am I missing something | 16:32 |
clarkb | sshnaidm: you talk to the proxy, the proxy talks to dockerhub, docker hub returns a 401 + json document, proxy returns the 401 and json document to you | 16:33 |
clarkb | if it was just the proxy at fault I would expect only the 401 and not the json document because it's a simple apache setup | 16:33 |
sshnaidm | clarkb, why does proxy talk to dockerhub if it has this image cached? | 16:33 |
fungi | the cache eventually expires | 16:34 |
fungi | and has to be refreshed | 16:34 |
clarkb | yes the cache entries are only allowed to live for 24 hours or something | 16:34 |
clarkb | our proxy is not writing json documents | 16:34 |
fungi | (also, we have limited cache space on these, so have to expire cached objects to keep from overrunning available space) | 16:34 |
clarkb | it is possible that dockerhub changed their url paths on the backend and the proxy is no longer requesting things at valid addresses which could lead to this | 16:35 |
sshnaidm | hmm.. so in this case it doesn't use the cached image and turns to dockerhub which returns json error? so it's dockerhub issue.. | 16:35 |
clarkb | sshnaidm: that is my current understanding of the problem (and the json document is the key to that because I'm 99% sure nothing in our apache config knows how to write out json for http status codes) | 16:36 |
sshnaidm | clarkb, is it usual thing for dockerhub? I'm not really familiar with their backend | 16:36 |
clarkb | https://registry-1.docker.io/v2/tripleomaster/centos-binary-nova-compute-ironic/blobs/sha256:6d3a23ca3a1378376ca4268c06d7c7da7b25358e69ff389475e5a30b78549fbb ya that gives me the same result and is where we proxy you | 16:37 |
clarkb | sshnaidm: dockerhub is pretty bad at being proxy friendly | 16:37 |
*** eernst has quit IRC | 16:38 | |
sshnaidm | clarkb, actually it is 401 error: HTTPError: 401 Client Error: Unauthorized for url: http... | 16:38 |
*** kopecmartin is now known as kopecmartin|off | 16:38 | |
clarkb | sshnaidm: yes it is a 401 AND a json document | 16:38 |
clarkb | the AND a json document was the clue to me that it wasn't our proxy originating the error and the url above confirms it | 16:39 |
clarkb | so the problem is the destination we proxy requests to is no longer valid it appears | 16:39 |
clarkb | (at least for that image) | 16:39 |
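clarkb's inference above can be sketched: a bare Apache proxy error page is not parseable JSON, while Docker Hub's registry backend returns a structured error document. A minimal illustration, assuming the registry v2 error format (the sample body below is illustrative, not the exact document the failing jobs saw):

```python
import json

# Illustrative 401 body in the registry v2 error format; the real
# "detail" contents vary per request.
body = json.dumps({"errors": [{"code": "UNAUTHORIZED",
                               "message": "authentication required",
                               "detail": None}]})

def registry_errors(text):
    # A plain Apache error page fails json.loads(); a parseable "errors"
    # list means the response originated at the registry backend, not
    # the proxy in between.
    try:
        return [e["code"] for e in json.loads(text).get("errors", [])]
    except ValueError:
        return None

print(registry_errors(body))      # ['UNAUTHORIZED'] -> came from docker hub
print(registry_errors("<html>"))  # None -> opaque proxy/HTML error
```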
sshnaidm | seems like that, Host registry-1.docker.io | 16:39 |
*** ykarel|away has quit IRC | 16:40 | |
sshnaidm | any ideas how we can prevent this? | 16:40 |
clarkb | someone will need to do a docker pull and trace it out to see where requests are supposed to go to then update the proxy config | 16:40 |
clarkb | (assuming it did move and this isn't dockerhub being broken) | 16:41 |
fungi | https://hub.docker.com/r/tripleomaster/centos-binary-nova-compute-ironic says it was updated 2 hours ago... was the failure for that sha256 before or after? | 16:41 |
clarkb | sshnaidm: can you try the docker pull that fungi suggested locally without a proxy and see if that works? | 16:41 |
fungi | if before, then this may be a wild goose chase | 16:41 |
clarkb | fungi: well that blob shouldn't go away even if it is older | 16:42 |
clarkb | (however I think that blob can be deleted and if that happened then ya) | 16:42 |
fungi | ahh, it retains old blobs on replace | 16:42 |
fungi | ? | 16:42 |
clarkb | fungi: ya my understanding is that it won't ever delete those blobs for you automatically | 16:43 |
clarkb | because the idea is you can rollback or whatever to known good state | 16:43 |
clarkb | however you as the user could elect to delete known bad blobs aiui | 16:43 |
sshnaidm | clarkb, fungi sorry, which command? | 16:43 |
sshnaidm | if this image is 2 hours old, maybe it wasn't cached yet.. | 16:43 |
clarkb | sshnaidm: `docker pull tripleomaster/centos-binary-nova-compute-ironic` (and specify the sha maybe?) | 16:43 |
clarkb | sshnaidm: the blob that is failing is a week old | 16:43 |
sshnaidm | clarkb, it doesn't have "latest", need to specify a tag | 16:44 |
clarkb | sshnaidm: ok however you need to do it | 16:44 |
clarkb | the idea is to test if it works at all | 16:44 |
*** eernst has joined #openstack-infra | 16:44 | |
clarkb | we know the url above bypassing the proxy does not work | 16:44 |
clarkb | what we don't know is if it is supposed to work via some other path or not | 16:44 |
sshnaidm | works fine: docker pull tripleomaster/centos-binary-nova-compute-ironic:69cab4cd5356e6c5314a103ca760d531d004dc5a_802178e9 | 16:44 |
fungi | AJaeger: copies into /afs/.openstack.org/docs/api-guide/ have finished now. | 16:44 |
clarkb | doing a docker pull should indicate that to us | 16:45 |
fungi | AJaeger: let me know if anything seems to have been obviously missed there | 16:45 |
clarkb | sshnaidm: ok that implies the destination we are proxying to is no longer correct because dockerhub changed something | 16:45 |
clarkb | sshnaidm: so the next step is to trace out that pull and see where requests are going | 16:45 |
clarkb | sshnaidm: I think if you enable debug logging it will show up in your dockerd logs | 16:45 |
sshnaidm | clarkb, should it be in proxy logs? | 16:45 |
*** bhavikdbavishi has quit IRC | 16:45 | |
sshnaidm | ah, you mean locally | 16:45 |
clarkb | sshnaidm: no the proxy is requesting things at the bad location | 16:45 |
clarkb | we need someone to request it via working tooling and see what it talks to | 16:46 |
fungi | clarkb: sshnaidm: well, the blob in the failure is sha256:6d3a23ca3a1378376ca4268c06d7c7da7b25358e69ff389475e5a30b78549fbb so not the same? | 16:46 |
*** roman_g has joined #openstack-infra | 16:46 | |
fungi | or is that the same as 69cab4cd5356e6c5314a103ca760d531d004dc5a_802178e9 | 16:46 |
sshnaidm | fungi, I hardly understand what this "sha256" is | 16:46 |
clarkb | fungi: I think the sha you have is for a specific filesystem layer and the one sshnaidm has is for a manifest that includes potentially many filesystem layers | 16:46 |
fungi | sshnaidm: a checksum | 16:46 |
clarkb | fungi: so it's possible they are the same but someone would have to check that manifest | 16:47 |
fungi | ahh | 16:47 |
clarkb | s/the same/overlapping/ | 16:47 |
sshnaidm | fungi, clarkb let's take from logs: http://logs.openstack.org/26/671526/4/gate/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/a371bbe/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz?#_2019-07-22_09_29_50_583 | 16:48 |
sshnaidm | docker pull docker.io/tripleomaster/centos-binary-swift-object:4cadc580aed3cde73c487f827f76bf7b92b4d1e5_10e135ca | 16:48 |
sshnaidm | works fine | 16:48 |
clarkb | sshnaidm: ok you need to log what that does locally | 16:48 |
*** eernst has quit IRC | 16:48 | |
clarkb | and compare it against what the proxy does | 16:49 |
clarkb | (and update if necessary) | 16:49 |
clarkb | reading their specs I don't think the api paths would have changed so it must be the hostname | 16:50 |
*** eernst has joined #openstack-infra | 16:50 | |
clarkb | https://docs.docker.com/registry/spec/api/#pulling-an-image | 16:51 |
clarkb | https://registry-1.docker.io/v2/tripleomaster/centos-binary-nova-compute-ironic/manifests/current-tripleo-rdo gives me the same 401 and json document | 16:52 |
clarkb | also possible that there must be some token generated? and straight up anonymous access doesn't work anymore? however you'd expect docker pull from client to figure that out regardless of destination | 16:53 |
clarkb | pabelanger: ^ how did you trace this in the past? | 16:54 |
fungi | also you'd think all proxied docker access would be failing similarly if that were the case | 16:54 |
*** igordc has joined #openstack-infra | 16:54 | |
clarkb | fungi: its possible that it is | 16:55 |
*** eernst has quit IRC | 16:55 | |
fungi | https://registry-1.docker.io/v2/zuul/zuul/manifests/latest | 16:56 |
fungi | is that the equivalent for the zuul/zuul container? | 16:57 |
*** eernst has joined #openstack-infra | 16:57 | |
clarkb | yes I think so | 16:57 |
clarkb | level=debug msg="Trying to pull tripleomaster/centos-binary-nova-compute-ironic from https://registry-1.docker.io v2" is what my dockerd says | 16:57 |
*** ricolin has quit IRC | 16:57 | |
fungi | seems that 401 json response is consistent for made-up manifests too https://registry-1.docker.io/v2/foo/bar/manifests/baz | 16:58 |
fungi | so i suspect this is its general "i have no idea what you just asked me for" response | 16:58 |
sshnaidm | something like that happens: https://paste.fedoraproject.org/paste/OgVKGHTKoLWUUYElUNGcug | 16:59 |
clarkb | https://registry-1.docker.io/v2/tripleomaster/centos-binary-swift-object/manifests/4cadc580aed3cde73c487f827f76bf7b92b4d1e5_10e135ca which shows up in that log is a 401 for me | 17:01 |
fungi | yep, i just saw/tried the same, with the same effect | 17:02 |
*** eernst has quit IRC | 17:02 | |
mordred | I see authz mentions in the log - are we _sure_ that nothing accidentally got pushed that's marked as needing authentication? | 17:02 |
clarkb | mordred: no I don't think we are sure of that | 17:03 |
* mordred doesn't know - mostly just looking for things it might be | 17:03 | |
mordred | nod | 17:03 |
*** eernst has joined #openstack-infra | 17:03 | |
clarkb | mordred: as a sanity check you might want to try pulling zuul from one of our mirrors? | 17:03 |
AJaeger | fungi: thanks, will double check in a bit - first look is fine | 17:04 |
clarkb | fwiw nothing in their spec seems to say you should need auth to pull manifests | 17:05 |
*** e0ne has joined #openstack-infra | 17:05 | |
*** jtomasek has quit IRC | 17:05 | |
clarkb | www-authenticate header does say Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:tripleomaster/centos-binary-nova-compute-ironic:pull" | 17:08 |
*** eernst has quit IRC | 17:08 | |
clarkb | and apparently that is how you are instructed on what to do if you get a 401 /me reads more | 17:08 |
Shrews | fungi: so about the b0rked image leak problem we discussed last week... i see some examples where we repeatedly try deleting the upload, but then we give up for some reason and delete the zk record. So i was both right AND wrong about nodepool repeating the operation. I can't quite see that code path in the source. I may need to add some logging to the builders and restart them. | 17:10 |
*** eernst has joined #openstack-infra | 17:10 | |
fungi | Shrews: at least that would explain why it seems to be infrequent | 17:10 |
clarkb | I read that as you need a token scoped to that action. I would expect the dockerd/the thing pulling to request one of those? | 17:10 |
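The www-authenticate value clarkb quotes above can be split mechanically into the challenge parameters a client needs to request a scoped token. A minimal parser sketch (real clients should handle quoting per RFC 7235; this regex only covers the simple form docker hub returns):

```python
import re

def parse_bearer_challenge(header):
    # Split 'Bearer realm="...",service="...",scope="..."' into a dict.
    scheme, _, params = header.partition(" ")
    if scheme != "Bearer":
        raise ValueError("not a Bearer challenge")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))

# The header clarkb pasted above:
challenge = ('Bearer realm="https://auth.docker.io/token",'
             'service="registry.docker.io",'
             'scope="repository:tripleomaster/centos-binary-nova-compute-ironic:pull"')
print(parse_bearer_challenge(challenge)["realm"])  # https://auth.docker.io/token
```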
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build a docker images of gerrit https://review.opendev.org/671457 | 17:11 |
clarkb | https://auth.docker.io/token does generate 5 minute tokens for me | 17:11 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build a docker images of gerrit https://review.opendev.org/671457 | 17:13 |
mordred | clarkb, corvus: it's possible that might go green ^^ - I give it at least a 50/50 chance | 17:14 |
fungi | clarkb: sshnaidm: mordred: one interesting detail... i searched logstash for occurrences of message:"Image prepare failed: 401 Client Error: Unauthorized for url" over the past 2 days and all 30 matches were for builds which ran in ovh-bhs1 or ovh-gra1 | 17:14 |
fungi | not sure if that's a clue | 17:14 |
*** eernst has quit IRC | 17:15 | |
sshnaidm | fungi, I see the same results, that's why I mentioned OVH bhs1 | 17:15 |
mordred | well ... if these are jobs using the intermediate registry, network flakiness with the proxy can cause the docker daemon to try to talk to dockerhub but using the credentials for the intermediate registry | 17:15 |
clarkb | sshnaidm: fungi I'm wary of blaming ovh since we get the exact same behavior if talking directly to docker hub | 17:15 |
clarkb | fwiw if I add an authorization header with the token content I still get a 401 so I'm probably doing that wrong | 17:16 |
mordred | but yeah - I think digging further into what clarkb is saying is more likely to bear fruit right now | 17:16 |
*** chandankumar is now known as raukadah | 17:18 | |
clarkb | the rfc says proxies must not modify authorization headers if provided and I doubt apache breaks that which implies to me that dockerd isn't setting that up at all | 17:19 |
clarkb | the logs certainly don't make mention of it that I can see | 17:19 |
sshnaidm | clarkb, maybe auth error is just a default when not finding the image | 17:19 |
clarkb | sshnaidm: ya that could be | 17:20 |
sshnaidm | security etc | 17:20 |
clarkb | you would expect `curl https://registry-1.docker.io/v2/` to work at a minimum reading their api spec | 17:21 |
*** eernst has joined #openstack-infra | 17:21 | |
clarkb | but that also returns the same error | 17:21 |
clarkb | https://status.docker.com/ claims no errors | 17:22 |
*** eernst has quit IRC | 17:25 | |
*** panda has quit IRC | 17:25 | |
*** panda has joined #openstack-infra | 17:26 | |
clarkb | https://quay.io/v2/ returns "true" instead of go away like https://registry-1.docker.io/v2/ so I'm not completely off base in how this is expected to work I think | 17:28 |
clarkb | https://quay.io/v2/calico/node/manifests/master also returns data | 17:30 |
clarkb | I think that means we've got the rough gist of how the api is supposed to work down | 17:30 |
clarkb | I need to pop out for a bit and get a bike ride in before it turns into an oven outside | 17:30 |
clarkb | I'll be back in a bit | 17:31 |
fungi | one thread is that the failures have been in providers with our old puppeted mirror hosts, not in providers with our newer ansible-only mirror hosts, so i'm comparing the two in hopes of spotting any potential differences in dockerhub proxying rules | 17:31 |
*** eernst has joined #openstack-infra | 17:32 | |
clarkb | fungi: fwiw if I switched the name out for an opendev mirror I got the same failure | 17:32 |
fungi | k | 17:33 |
fungi | same error from docker pull or from direct url requests? | 17:33 |
fungi | also, no, i don't see any obvious differences in configuration | 17:34 |
clarkb | direct url requests | 17:36 |
*** eernst has quit IRC | 17:37 | |
fungi | there are more consumers of these dockerhub v2 mirrors than just tripleo, right? | 17:37 |
*** eernst has joined #openstack-infra | 17:38 | |
fungi | message:"401 Client Error: Unauthorized for url" has 517 hits in logstash currently, and all appear to be for tripleo-ci jobs | 17:39 |
clarkb | zuul, our gitea stuff, and maybe kolla/loci/helm/airship? | 17:42 |
fungi | looks like these exceptions are being raised when _copy_registry_to_registry() is calling requests with the broken url from tripleo_common/image/image_uploader.py | 17:42 |
*** eernst has quit IRC | 17:43 | |
*** ralonsoh has quit IRC | 17:43 | |
*** eernst has joined #openstack-infra | 17:45 | |
*** ociuhandu_ has joined #openstack-infra | 17:47 | |
fungi | it builds source_config_url based on some info passed into that method, and then asks requests to fetch it | 17:49 |
*** ociuhandu has quit IRC | 17:49 | |
fungi | so... this doesn't look like `docker pull ...` failing | 17:49 |
fungi | could it be that tripleo-common is making broken assumptions about dockerhub urls? | 17:50 |
*** eernst has quit IRC | 17:50 | |
*** eernst has joined #openstack-infra | 17:51 | |
*** ociuhandu_ has quit IRC | 17:52 | |
*** rlandy has quit IRC | 17:52 | |
clarkb | maybe the same ones we are running into doing direct requests | 17:53 |
fungi | given that it's forming its own dockerhub urls and querying those, yes perhaps | 17:56 |
fungi | https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/image_uploader.py#L1256 | 17:56 |
*** eernst has quit IRC | 17:56 | |
*** diablo_rojo has joined #openstack-infra | 17:56 | |
*** eernst has joined #openstack-infra | 17:58 | |
*** tesseract has quit IRC | 18:01 | |
*** eernst has quit IRC | 18:02 | |
*** jtomasek has joined #openstack-infra | 18:04 | |
*** eernst has joined #openstack-infra | 18:04 | |
*** eernst has quit IRC | 18:09 | |
*** eernst has joined #openstack-infra | 18:10 | |
*** eernst has quit IRC | 18:11 | |
*** electrofelix has quit IRC | 18:12 | |
*** smarcet has joined #openstack-infra | 18:13 | |
AJaeger | fungi: my automatic tests for api-site rename all passed, so I declare it as done - thanks! | 18:14 |
*** pkopec has quit IRC | 18:16 | |
fungi | AJaeger: my pleasure, thanks for making it easy! | 18:16 |
*** pkopec has joined #openstack-infra | 18:17 | |
*** diablo_rojo has quit IRC | 18:23 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build a docker images of gerrit https://review.opendev.org/671457 | 18:38 |
donnyd | power just went out here, and my generator is still not hooked up. | 18:42 |
donnyd | :( | 18:42 |
donnyd | its on my list | 18:42 |
*** smarcet has quit IRC | 18:42 | |
donnyd | I might get lucky and have it come back on in time... have about 25 minutes of backup UPS power | 18:43 |
*** smarcet has joined #openstack-infra | 18:47 | |
gouthamr | hi, i'm trying to convert a legacy job in openstack/manila to zuulv3; and had a question: How can i run something after devstack, but before tempest tests are kicked off? | 18:50 |
fungi | donnyd: thanks for the heads up. should we emergency zero the quota there? | 18:51 |
donnyd | not yet | 18:51 |
donnyd | if its not back on in about 3 minutes, then yes | 18:51 |
fungi | i mean, worst case zuul loses contact with nodes there and requeues the builds they were running | 18:52 |
donnyd | Well that never happens.. Its back on | 18:52 |
fungi | hah | 18:52 |
donnyd | When I have luck, its usually bad | 18:52 |
donnyd | I really need to get that generator hooked up.. LOL | 18:52 |
fungi | gouthamr: that might be a better question for #openstack-qa... the playbooks for devstack and tempest are maintained in their respective repos | 18:53 |
gouthamr | ah, will ask there, ty fungi! | 18:53 |
fungi | of course! | 18:54 |
AJaeger | gouthamr: I suggest you write a new job from scratch ;) | 18:54 |
gouthamr | AJaeger: trying to align to the rest of the projects, so we aren't a special snowflake :) so hope that's not necessary | 18:56 |
*** tdasilva_ has quit IRC | 18:56 | |
AJaeger | gouthamr: talk with QA team. The framework is quite different and my udnerstanding is that rewriting from scratch based on the new framework should be one option to evaluate. | 18:57 |
*** Vadmacs has joined #openstack-infra | 18:58 | |
gouthamr | AJaeger: sure thing. i'll do that, ty... | 18:59 |
clarkb | fungi: any luck with the docker thing? | 18:59 |
clarkb | I'm back now and can help | 18:59 |
fungi | clarkb: no, my only theory so far is that sometimes tripleo-ci's image uploader builds invalid urls | 19:00 |
*** e0ne has quit IRC | 19:01 | |
*** eernst has joined #openstack-infra | 19:01 | |
fungi | (the error isn't coming from a docker pull or anything like that as far as i can tell, just the urls tripleo-ci is assembling) | 19:01 |
clarkb | fungi: ya the odd thing is those urls appear valid if I'm reading the dockerhub api spec correctly | 19:01 |
*** eernst has quit IRC | 19:05 | |
*** smarcet has quit IRC | 19:07 | |
*** eernst has joined #openstack-infra | 19:07 | |
* clarkb configures local dockerd to go through proxy | 19:07 | |
fungi | are you using a local proxy to trace the urls it requests, or testing with one of the ci proxies? | 19:09 |
clarkb | I was just gonna check if it works successfully to docker pull through one of our mirrors | 19:10 |
clarkb | if it does then ya I think it is as you suspect: problem with their script | 19:10 |
fungi | ahh | 19:10 |
fungi | i have a feeling docker pull is working or else we'd have gotten a lot more complaints. the 401 errors from tripleo-ci jobs stretch back at least a week according to logstash | 19:11 |
clarkb | ya I just want to be sure of it | 19:12 |
*** eernst has quit IRC | 19:12 | |
clarkb | yup confirmed docker pull works | 19:13 |
clarkb | sshnaidm: ^ so the problem is with your script I think | 19:13 |
clarkb | sshnaidm: and talking directly to the backend with the urls your script fails on produces the same results | 19:13 |
clarkb | so this isn't a problem of the proxy | 19:13 |
sshnaidm | EmilienM, ^^ | 19:13 |
*** eernst has joined #openstack-infra | 19:14 | |
fungi | yeah, it does seem that the 401 auth required errors are how dockerhub responds to any unknown url, so odds are the script is sometimes assembling an invalid combination of parameters for the url | 19:16 |
fungi | perhaps wrong sha256 checksum? | 19:17 |
clarkb | fungi: maybe? though that manifests url should work | 19:19 |
*** eernst has quit IRC | 19:19 | |
clarkb | possible you need a token even if anonymous and docker pull does that | 19:20 |
*** eernst has joined #openstack-infra | 19:20 | |
mordred | corvus, clarkb: GERRIT BUILD WORKED!!!!! | 19:21 |
mordred | paladox: thanks for the help - I borrowed a few of your settings | 19:21 |
*** rh-jelabarre has quit IRC | 19:21 | |
paladox | you're welcome :) | 19:21 |
clarkb | anyone else want to review https://review.opendev.org/#/c/672083/ so we can get the ball rolling on gitea server replacements? | 19:22 |
*** rh-jelabarre has joined #openstack-infra | 19:22 | |
mordred | paladox: sadly I also had to constrain it to 4k RAM and a single core - because any more than that it would either segfault or run out of memory | 19:22 |
paladox | :( | 19:22 |
mordred | clarkb: +A | 19:23 |
mordred | paladox: yeah. tell me about it | 19:23 |
clarkb | mordred: they? | 19:23 |
paladox | I've only built gerrit on large ram machines | 19:23 |
*** eharney has quit IRC | 19:23 | |
mordred | paladox: these are 8G VMs with 8 vcpus ... but that is apparently unhappy making | 19:24 |
mordred | clarkb: they? | 19:24 |
paladox | wow | 19:24 |
clarkb | mordred: from your message they would either segfault or run out of memory. wonder what they is | 19:24 |
paladox | mordred you have more power then the vps i did it on :P | 19:24 |
paladox | but that did cause jenkins to OOM | 19:24 |
*** eernst has quit IRC | 19:25 | |
mordred | clarkb: oh - the bazel workers | 19:25 |
mordred | clarkb: bazel something something | 19:25 |
mordred | gets really grumpy on our vms for some reason | 19:25 |
mordred | works FINE on my chromebook | 19:25 |
AJaeger | mordred: are you reviewing service-types etc? Please review https://review.opendev.org/672139 and https://review.opendev.org/672136 and https://review.opendev.org/#/c/672131/ and https://review.opendev.org/#/c/672138/ - all change links from developer.o.o to docs.o.o | 19:26 |
clarkb | mordred: clearly we should make a chromeos test image then | 19:26 |
mordred | clarkb: wcpgw? | 19:26 |
mordred | AJaeger: on it | 19:26 |
*** eernst has joined #openstack-infra | 19:26 | |
paladox | mordred what segfault did it give you? I've never experienced a segfault with bazel. | 19:27 |
clarkb | if anyone is wondering the tripleo rdo image for nova-compute-ironic is 1.784GB | 19:27 |
AJaeger | thanks, mordred ! | 19:27 |
*** igordc has quit IRC | 19:27 | |
clarkb | EmilienM: sshnaidm https://docs.docker.com/registry/spec/api/#api-version-check may be useful documentation | 19:28 |
clarkb | specifically it details what they claim the 401 not authorized is meant to mean and how you can deal with it | 19:29 |
fungi | clarkb: wow... could the error they're getting be a normally unexercised code path which only triggers when a download initially fails? could be unpredictable network performance in ovh triggering that is what causes it to only appear on builds in that provider? a stretch, i know | 19:29 |
clarkb | EmilienM: sshnaidm https://docs.docker.com/registry/spec/auth/token/ that too | 19:29 |
clarkb | fungi: perhaps? | 19:30 |
mordred | paladox: http://logs.openstack.org/57/671457/8/check/system-config-build-image-gerrit/300d08d/job-output.txt.gz#_2019-07-20_18_18_09_540913 is an example of a memory error ... | 19:30 |
*** igordc has joined #openstack-infra | 19:30 | |
mordred | paladox: http://logs.openstack.org/57/671457/7/check/system-config-build-image-gerrit/95547e8/job-output.txt.gz#_2019-07-20_15_38_17_543596 is an example of one of the bazel worker processes just going away mid-process | 19:30 |
*** eernst has quit IRC | 19:31 | |
clarkb | fungi: though I think it likely that they just need to request a token first | 19:31 |
clarkb | though perhaps that is happening but failing, which leads the pull request to fail | 19:31 |
paladox | ah | 19:31 |
sshnaidm | clarkb, but we don't authenticate afaik, it's public containers | 19:31 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: update version of open-iscsi that is installed on musl https://review.opendev.org/672152 | 19:32 |
clarkb | sshnaidm: yes I know, my theory is they require tokens for anonymous access too | 19:32 |
clarkb | I'm working to test that | 19:32 |
sshnaidm | clarkb, that would be weird, give the fact in other cases it worked | 19:33 |
*** eernst has joined #openstack-infra | 19:33 | |
sshnaidm | s/give/given | 19:33 |
sshnaidm | what is interesting, it's same container in two last cases | 19:33 |
sshnaidm | and same proxy | 19:33 |
paladox | mordred apparently we give the docker slave that the gerrit job runs on, 36gb of ram and 8cpu (https://tools.wmflabs.org/openstack-browser/server/integration-slave-docker-1052.integration.eqiad.wmflabs) | 19:34 |
mordred | paladox: that's incredible | 19:35 |
paladox | indeed, i didn't know that until now :) | 19:35 |
clarkb | sshnaidm: remember the same error happens if you make that request to docker directly | 19:36 |
clarkb | sshnaidm: so while yes maybe the proxy is a clue that request fails when made without a proxy | 19:36 |
clarkb | yup confirmed | 19:37 |
*** eernst has quit IRC | 19:37 | |
clarkb | if I manually curl out a token and use that with authorization header it works fine | 19:37 |
clarkb | sshnaidm: so I think you need to implement https://docs.docker.com/registry/spec/auth/token/ | 19:38 |
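The flow clarkb is pointing at — an anonymous GET gets a 401 with a Bearer challenge, the client fetches a short-lived token from the challenge's realm, then retries with an Authorization header — can be sketched with just the stdlib. `token_url` and `fetch_manifest` are hypothetical helper names, and the realm/service values are the ones from the challenge header quoted earlier; this is not tripleo-common's actual code:

```python
import json
import urllib.parse
import urllib.request

AUTH_REALM = "https://auth.docker.io/token"  # realm from the Bearer challenge
SERVICE = "registry.docker.io"               # service from the Bearer challenge

def token_url(repo, action="pull"):
    # Build the anonymous-token request URL per the token auth spec.
    query = urllib.parse.urlencode(
        {"service": SERVICE, "scope": f"repository:{repo}:{action}"})
    return f"{AUTH_REALM}?{query}"

def fetch_manifest(repo, ref):
    # 1) get an anonymous (roughly 5-minute) token,
    # 2) retry the manifest GET with it as a Bearer Authorization header.
    with urllib.request.urlopen(token_url(repo)) as resp:
        token = json.load(resp)["token"]
    req = urllib.request.Request(
        f"https://registry-1.docker.io/v2/{repo}/manifests/{ref}",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# e.g. fetch_manifest("tripleomaster/centos-binary-swift-object",
#                     "4cadc580aed3cde73c487f827f76bf7b92b4d1e5_10e135ca")
```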
*** Lucas_Gray has joined #openstack-infra | 19:39 | |
sshnaidm | clarkb, but how then it works in all other jobs? | 19:39 |
*** eernst has joined #openstack-infra | 19:39 | |
sshnaidm | like in the same jobs, just different runs | 19:39 |
clarkb | I can only tell you what I observe talking from my machine directly to dockerhub | 19:39 |
clarkb | I reproduce the failure trivially and if I add a bearer token it works | 19:39 |
clarkb | sshnaidm: it is possible that all of the other jobs are pulling data which is still cached | 19:40 |
clarkb | and only ovh has expired the cache on those objects so far | 19:40 |
clarkb | if this was a recent change made by docker that would be possible | 19:40 |
*** jtomasek has quit IRC | 19:41 | |
*** eernst has quit IRC | 19:44 | |
*** eernst has joined #openstack-infra | 19:46 | |
clarkb | and now I've successfully followed through: a GET against the object, then a GET against the URL it redirects me to | 19:46 |
clarkb | sshnaidm: that is my best guess. Differences in cache expiration have exposed this in some regions and not others and for some images and not others | 19:47 |
clarkb | sshnaidm: within ~24 hours when the entire cache is refreshed this will likely be a more global problem and you'll need to implement the bearer token stuff even for anonymous gets | 19:47 |
fungi | i have a feeling we may be caching those when they get requested through the proxy by a docker pull or similar tool which uses a token | 19:49 |
clarkb | fungi: oh good point | 19:49 |
fungi | and then subsequent direct requests get satisfied out of the cache until expiraiton | 19:49 |
fungi | which could explain the intermittent nature of the failures | 19:50 |
clarkb | that certainly fits the behavior really well | 19:50 |
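The theory fits this toy model: a tokened docker pull through the mirror warms the cache, later anonymous requests are served from it, and once the roughly 24-hour entry expires an anonymous refetch hits docker hub and gets the 401. Illustrative only — the real mirrors are Apache mod_cache, not this class:

```python
class TTLCache:
    # Toy model of the mirror under this theory. put() stands in for a
    # tokened docker pull populating the cache; get() for a later
    # anonymous request through the proxy.
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}

    def put(self, url, body, now):
        self.store[url] = (body, now)

    def get(self, url, now):
        hit = self.store.get(url)
        if hit and now - hit[1] < self.ttl:
            return hit[0]              # served from cache: succeeds
        return "401 UNAUTHORIZED"      # anonymous upstream refetch fails

cache = TTLCache(ttl=24 * 3600)
cache.put("/v2/x/blobs/sha256:abc", "blob-bytes", now=0)
print(cache.get("/v2/x/blobs/sha256:abc", now=3600))       # blob-bytes
print(cache.get("/v2/x/blobs/sha256:abc", now=25 * 3600))  # 401 UNAUTHORIZED
```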
openstackgerrit | Merged opendev/system-config master: Remove gitea01 from inventory so we can replace it https://review.opendev.org/672083 | 19:50 |
*** e0ne has joined #openstack-infra | 19:50 | |
*** eernst has quit IRC | 19:50 | |
fungi | once that ^ takes effect i can work on booting the replacement? | 19:51 |
clarkb | yup | 19:51 |
*** pkopec has quit IRC | 19:51 | |
*** eernst has joined #openstack-infra | 19:52 | |
clarkb | the next steps are roughly: boot new node, add it to inventory (but not haproxy) and remove it from the create gitea repos playbook (I forget the name, it should be in the docs), then recover from db backup, then click button to make all repos, then replicate, then add to haproxy and remove the exception from the ansible playbook | 19:52 |
fungi | also getting close to time for me to start making dinner, so it may happen after | 19:52 |
clarkb | hopefully that is all clear in the docs and if not I'll help interpret them and fix them | 19:52 |
fungi | also update dns entries | 19:53 |
clarkb | oh ya that too | 19:53 |
*** Lucas_Gray has quit IRC | 19:53 | |
clarkb | sshnaidm: an easy way to confirm would be to switch to docker hub directly rather than our mirror in your script | 19:54 |
*** eernst has quit IRC | 19:56 | |
fungi | clarkb: any special steps for booting a new gitea server? these go in vexxhost-sjc1 right? so bfv... you did mention at least adding the option to skip ping6 tests | 19:56 |
fungi | ahh, sorry, found https://docs.openstack.org/infra/system-config/gitea.html#deploy-a-new-backend | 19:57 |
* fungi should rtfm before asking questions | 19:57 | |
*** smarcet has joined #openstack-infra | 19:58 | |
*** eernst has joined #openstack-infra | 19:58 | |
clarkb | fungi: ya in sjc1 and I used an 80GB disk for gitea06 because the 30GB or whatever they are now is way too small particularly if we want swap | 20:00 |
sshnaidm | clarkb, heh.. dockerhub will fail in half of cases :) not the best server performance there | 20:00 |
*** pcrews has joined #openstack-infra | 20:00 | |
clarkb | fungi: you have to exclude ping6 because our minimal image is quite minimal and doesn't have that tooling. You should use the same image that gitea06 was booted off of (this has correct ext4 journal size) | 20:00 |
*** Vadmacs has quit IRC | 20:01 | |
*** Goneri has quit IRC | 20:01 | |
fungi | awesome, thanks | 20:01 |
fungi | i guess i can just pass that in by uuid | 20:01 |
clarkb | that image is the one I uploaded from nodepool builders separately so that we could clean up the control plane nodepool builder stuff | 20:02 |
clarkb | fungi: or name the image can't be deleted because it is in use | 20:02 |
fungi | right | 20:02 |
clarkb | I should double check the nodepool builder cleanup work was a success, I'll do that after lunch | 20:02 |
*** eharney has joined #openstack-infra | 20:02 | |
*** jcoufal has quit IRC | 20:02 | |
*** igordc has quit IRC | 20:03 | |
*** eernst has quit IRC | 20:03 | |
*** e0ne has quit IRC | 20:04 | |
*** eernst has joined #openstack-infra | 20:05 | |
*** EmilienM is now known as EmilienM|pto | 20:05 | |
*** mattw4 has quit IRC | 20:08 | |
*** igordc has joined #openstack-infra | 20:08 | |
*** mattw4 has joined #openstack-infra | 20:08 | |
*** eernst has quit IRC | 20:10 | |
fungi | any atypical parameters i need to add besides --boot-from-volume, --volume-size=80 and --ignore_ipv6? | 20:10 |
clarkb | I dont think so | 20:10 |
fungi | cool | 20:11 |
*** eernst has joined #openstack-infra | 20:11 | |
*** michael-beaver has joined #openstack-infra | 20:11 | |
*** eernst has quit IRC | 20:16 | |
*** eernst has joined #openstack-infra | 20:18 | |
*** eernst has quit IRC | 20:22 | |
*** eernst has joined #openstack-infra | 20:24 | |
*** eernst has quit IRC | 20:25 | |
*** smarcet has left #openstack-infra | 20:28 | |
*** tdasilva has joined #openstack-infra | 20:38 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP download and display a log file https://review.opendev.org/671912 | 20:42 |
*** xek has quit IRC | 20:44 | |
corvus | clarkb, fungi: i'm sticking my head up after being in a javascript hacking hole and have not been paying attention to scrollback; anything you need me for? | 20:44 |
*** xek has joined #openstack-infra | 20:45 | |
fungi | corvus: i don't think so. gonna try building a replacement gitea01 in a little while. we were also trying to confirm whether there was an issue with our dockerhub proxy | 20:47 |
clarkb | corvus: I'm goood. I think I managed to figure out the dockerhub oddity | 20:48 |
clarkb | basically you have to get a token even for anonymous access (not sure if that is new or not but tripleo ran into it) | 20:48 |
corvus | clarkb: that sounds familiar? i think that may be encoded into the docker roles. also, that can vary based on which of the services you hit (it's a mess) | 20:49 |
clarkb | corvus: cool so not completely odd. Basically their jobs fail when trying to request the data directly with a python script in some cases | 20:49 |
clarkb | corvus: fungi's theory is that depending on what we have cached it may work at times and not work at others. And I was able to manually confirm using curl that the token was required | 20:50 |
clarkb | pretty sure it isn't a fault of the proxies at least | 20:50 |
fungi | yeah, tripleo is doing direct access to dockerhub via python-requests and custom-assembled urls | 20:50 |
*** joeguo has joined #openstack-infra | 20:52 | |
corvus | we do some actions against "hub.docker.com" and others against "registry.hub.docker.com" | 20:52 |
corvus | and that's not by accident | 20:53 |
corvus | look at this one: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/promote-retag-inner.yaml | 20:53 |
corvus | bearer token against registry; JWT against hub | 20:53 |
clarkb | ah yup that includes the bearer token | 20:54 |
corvus | the list tags against 'hub.docker.com' here doesn't require auth: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/promote-cleanup.yaml | 20:54 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Cleanup nodepool builder clouds.yaml https://review.opendev.org/665018 | 20:55 |
*** jeremy_houser has joined #openstack-infra | 20:55 | |
clarkb | gerrit said ^ cannot merge so I rebased it | 20:55 |
clarkb | that is the next step required in the cleanup of the nodepool builders | 20:56 |
clarkb | corvus: ^ if you have a moment to review that that would help | 20:56 |
clarkb | corvus: and also https://review.opendev.org/#/c/671858/ if you've got all the js paged in would be good | 20:56 |
clarkb | mordred: fungi: you too on https://review.opendev.org/665018 | 20:57 |
jeremy_houser | currently attempting to build a gate for my repository, where are parent projects like "openstack-tox-py36" defined? I'm trying to understand how .zuul.yaml hooks my tox.ini to run what I have written there | 20:57 |
clarkb | jeremy_houser: openstack/openstack-zuul-jobs should have most if not all of the openstack- prefixed jobs in it | 20:58 |
jeremy_houser | fantastic, thank you | 20:58 |
clarkb | jeremy_houser: then zuul/zuul-jobs contains a lot of very generic stuff which the openstack-zuul-jobs may build on | 20:58 |
corvus | jeremy_houser: this may be helpful if you ignore the zuulv2 bits: https://docs.openstack.org/infra/manual/zuulv3.html#how-jobs-are-defined-in-zuul-v3 | 21:00 |
corvus | it's probably time to fold that into the rest of the document normally.... | 21:01 |
*** whoami-rajat has quit IRC | 21:01 | |
jeremy_houser | thank you | 21:02 |
*** xek has quit IRC | 21:03 | |
*** xek has joined #openstack-infra | 21:04 | |
*** mattw4 has quit IRC | 21:05 | |
*** mattw4 has joined #openstack-infra | 21:05 | |
clarkb | and another trick is to use http://codesearch.openstack.org | 21:09 |
*** tdasilva_ has joined #openstack-infra | 21:09 | |
*** pcaruana has quit IRC | 21:10 | |
*** tdasilva has quit IRC | 21:12 | |
*** jeremy_houser has quit IRC | 21:13 | |
*** kjackal has quit IRC | 21:17 | |
*** mattw4 has quit IRC | 21:20 | |
clarkb | corvus: fungi ty ty for those reviews | 21:20 |
clarkb | fungi: https://review.opendev.org/#/c/667474/ is the change I used to enroll new gitea06 into inventory. Note the delta in https://review.opendev.org/#/c/667474/1/playbooks/remote_puppet_git.yaml | 21:21 |
*** tdasilva_ has quit IRC | 21:22 | |
*** eernst has joined #openstack-infra | 21:31 | |
*** mattw4 has joined #openstack-infra | 21:32 | |
*** eernst has quit IRC | 21:35 | |
*** eernst has joined #openstack-infra | 21:37 | |
*** eernst has quit IRC | 21:41 | |
*** eernst has joined #openstack-infra | 21:43 | |
smcginnis | Anyone know what happened with this release job? http://lists.openstack.org/pipermail/release-job-failures/2019-July/001193.html | 21:44 |
smcginnis | Unfortunately I need to drop, but will check back if anyone has any pointers. | 21:45 |
clarkb | smcginnis: zuul is claiming the merge failed. Is that job run against openstack/releases? | 21:45 |
smcginnis | That was a post-release job. | 21:45 |
clarkb | ya merge failure might more generally be categorized as git had a sad setting up the repo | 21:46 |
clarkb | what repo does that job run against? openstack/releases? | 21:46 |
smcginnis | I believe the tagging is run from openstack/releases, but then clones the repo being released to add the tag to it. | 21:47 |
clarkb | ok that helps as now I can search for the build in zuul | 21:47 |
clarkb | (though maybe it didn't get that far given the merge failure message?) | 21:47 |
*** eernst has quit IRC | 21:48 | |
smcginnis | Hard to tell I think. I guess it would have come from the openstack/releases repo since if it got past that initial setup we would at least have log messages showing the failure steps? | 21:48 |
clarkb | yes | 21:48 |
clarkb | and sure enough there are no failures logged by zuul's builds page | 21:49 |
clarkb | http://zuul.openstack.org/builds?project=openstack%2Freleases&pipeline=release-post | 21:49 |
*** eernst has joined #openstack-infra | 21:50 | |
smcginnis | Maybe try a reenqueue? Not sure what next steps are from here. Shouldn't really be any way for a merge failure to be an issue at that point, so I'm guessing it was a quirk. | 21:52 |
openstackgerrit | Merged opendev/system-config master: Cleanup nodepool builder clouds.yaml https://review.opendev.org/665018 | 21:52 |
smcginnis | Sorry, I really need to step away now. I'll be back later. | 21:52 |
clarkb | I'm trying to figure out why it thought the merge failed | 21:53 |
clarkb | looking at https://opendev.org/openstack/releases/commits/branch/master both manila tags in the last hour are present and accounted for | 21:55 |
*** eernst has quit IRC | 21:55 | |
clarkb | ok it didn't fail against releases, it failed against manila | 21:55 |
*** eernst has joined #openstack-infra | 21:56 | |
clarkb | gitea doesn't show us git tag refs huh? | 21:57 |
* clarkb clones manila | 21:57 | |
*** betherly has joined #openstack-infra | 22:00 | |
*** eernst has quit IRC | 22:01 | |
*** slaweq has quit IRC | 22:01 | |
clarkb | now I am extra confused the git logs say that the release-post pipeline didn't match for manila | 22:02 |
*** eernst has joined #openstack-infra | 22:02 | |
clarkb | neither tag did | 22:03 |
clarkb | aha that is because it wasn't a release-post failure ... | 22:03 |
clarkb | this is the tag pipeline failing | 22:03 |
*** betherly has quit IRC | 22:05 | |
*** eernst has quit IRC | 22:07 | |
*** eernst has joined #openstack-infra | 22:09 | |
*** mattw4 has quit IRC | 22:10 | |
clarkb | corvus: is there any easy way to map a gearman merger job to the merger that ran it? | 22:10 |
clarkb | corvus: the unique key seems to maybe not be unique (I found evidence of the job running successfully after the failure a minute later than the merge failure is logged) | 22:11 |
clarkb | 1.5 minutes actually | 22:11 |
clarkb | http://paste.openstack.org/show/754740/ | 22:13 |
*** mattw4 has joined #openstack-infra | 22:13 | |
*** eernst has quit IRC | 22:13 | |
clarkb | oh I'm a derp | 22:14 |
clarkb | the logs had scrolled by and it showed me the retry I think but if I scroll up the failure is there :/ | 22:14 |
*** eernst has joined #openstack-infra | 22:15 | |
clarkb | http://paste.openstack.org/show/754741/ too many connections to gerrit | 22:15 |
clarkb | we are leaking those I guess? | 22:16 |
fungi | okay, evening sustenance has been prepared and consumed. catching up and then i'll look into the gitea01 replacement assuming no emergencies | 22:19 |
*** eernst has quit IRC | 22:19 | |
clarkb | I'm not seeing evidence of 64 connections between those two hosts (checking netstat -np --wide on both sides) | 22:19 |
clarkb | checking the gerrit connection list via gerrit cli now) | 22:19 |
fungi | yeah, that's where the 64 limit comes in | 22:20 |
clarkb | no evidence of the limit being hit there either | 22:21 |
clarkb | and the mergers don't run concurrently right? | 22:21 |
fungi | concurrent gerrit ssh api connections in its connections list are limited to 64 per account if | 22:21 |
fungi | accont id | 22:21 |
clarkb | oh account id | 22:21 |
fungi | i'm giving up typing accuratekly | 22:21 |
clarkb | ok so the problem may be the zuul user | 22:21 |
* clarkb greps for connection count by zuul user | 22:21 | |
*** eernst has joined #openstack-infra | 22:22 | |
fungi | that's also configurable, but has proven a useful defense against runaway third-party ci systems killing gerrit | 22:22 |
clarkb | there are currently only 6 | 22:23 |
clarkb | maybe we should bump to 96? | 22:23 |
*** eernst has quit IRC | 22:23 | |
clarkb | give us a bit more room, but at least for right now it seems fine | 22:23 |
*** eernst has joined #openstack-infra | 22:23 | |
*** betherly has joined #openstack-infra | 22:23 | |
clarkb | doesn't look like we currently set that configuration option so 64 must be the default | 22:26 |
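For reference, the option under discussion is Gerrit's `sshd.maxConnectionsPerUser`, which does default to 64. Setting it explicitly in `gerrit.config` would look something like this (96 is the value floated above, shown here purely as an illustration):

```
[sshd]
  # per-account concurrent SSH API connection cap; Gerrit's default is 64
  maxConnectionsPerUser = 96
```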
clarkb | corvus: we have 8 (zm) + 12 (ze) mergers and they each run serially right? so in theory 20 should be enough | 22:26 |
clarkb | am I missing anything obvious for why we may need more? | 22:26 |
clarkb | those release jobs in particular don't rely on the zuul user's credentials right (that would bump up potential connection count at that time) | 22:27 |
corvus | clarkb: hrm, i think the change to set up repos in parallel on the executor may end up using more than one connection | 22:28 |
corvus | if that's correct then it could be one connection per starting-job from each executor | 22:30 |
*** betherly has quit IRC | 22:30 | |
*** rcernin has joined #openstack-infra | 22:30 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Increate gerrit user connection limit by 50% https://review.opendev.org/672188 | 22:30 |
clarkb | corvus: uhm thats hundreds potentially right? | 22:31 |
clarkb | I don't think our 50% increase is sufficient in that case | 22:31 |
clarkb | I've pushed that change though if we want to consider its merits there | 22:31 |
clarkb | looks like our peak for starting builds is actually ~7 according to grafana | 22:32 |
clarkb | should the number be 12 * 7 (ze) + 8 (zm) then? That is actually really close to 96 | 22:33 |
clarkb | 92 | 22:33 |
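Spelling out the back-of-the-envelope budget here (the numbers are the ones quoted in this exchange: 12 executors, 8 mergers, and a peak of ~7 jobs starting at once per executor from grafana):

```python
# Worst-case concurrent Gerrit connections from Zuul, per this discussion.
EXECUTORS = 12
MERGERS = 8
PEAK_STARTING_JOBS_PER_EXECUTOR = 7  # observed peak in grafana

# Parallel repo setup means each starting job on an executor may hold its
# own connection; each standalone merger runs serially (one connection).
peak = EXECUTORS * PEAK_STARTING_JOBS_PER_EXECUTOR + MERGERS
print(peak)  # 92, just under the proposed 96 limit
```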
fungi | does the job in question directly update git repositories from gerrit? | 22:34 |
corvus | clarkb: yeah, that's what i'm thinking | 22:34 |
clarkb | fungi: that isn't where the failure was from so not sure. The failure was on zm02 updating the manila repo to then run the jobs | 22:35 |
fungi | ahh, okay | 22:35 |
corvus | this has been in production for a few months, so if this is what caused it, it's probably pretty rare for us to get that high | 22:35 |
fungi | with that i agree | 22:35 |
clarkb | corvus: ya looking at grafana it seems rare that more than one executor would be starting more than a small number of jobs | 22:36 |
fungi | first i've seen it reported at the very least | 22:36 |
clarkb | so 96 should actually be a good amount of breathing room | 22:36 |
fungi | i think that's global? but probably also still sufficiently low to catch runaway ci systems | 22:36 |
corvus | that may be something to think about for zuulv4 -- having executors/mergers coordinate to limit the overall number of connections | 22:36 |
clarkb | ya it appears to be global | 22:36 |
corvus | tobiash: ^ fyi | 22:37 |
fungi | also we have conntract set to reject more than 100 concurrent ssh api sessions from the same source ip address | 22:37 |
fungi | er, conntrack | 22:37 |
clarkb | fungi: which in the case of zuul shouldn't ever happen since its 20 hosts limited to 64 (and probably 96 soon) connections | 22:37 |
fungi | yup | 22:37 |
clarkb | that conntrack limit was in place to address the problem of leaky connectors | 22:37 |
clarkb | (across multiple users) | 22:38 |
fungi | just thinking in terms of where our various accidental denial of service mitigations are relative to one another | 22:38 |
fungi | ~100 api connections per address and ~100 api connections per account are fairly compatible | 22:39 |
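A rule of the kind fungi describes might look like the following. This is a hypothetical sketch using the iptables connlimit match against Gerrit's standard SSH API port (29418), not the actual rule from the servers:

```
# reject more than 100 concurrent connections to the Gerrit SSH API
# from any single source address
iptables -A INPUT -p tcp --dport 29418 --syn \
    -m connlimit --connlimit-above 100 -j REJECT --reject-with tcp-reset
```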
*** mriedem has quit IRC | 22:39 | |
clarkb | https://opendev.org/openstack/manila/src/tag/8.0.1 is the tag we need to rerun 'tag' pipeline jobs for | 22:39 |
*** tkajinam has joined #openstack-infra | 22:39 | |
clarkb | should we wait for smcginnis to return and confirm that is correct before proceeding? | 22:40 |
fungi | what were the failed tag pipeline builds? | 22:40 |
clarkb | no idea | 22:40 |
clarkb | zuul didn't get that far because it couldn't get a config built | 22:40 |
clarkb | possibly none | 22:40 |
fungi | if they weren't anything crucial, they probably are covered by the next tag (may be just release notes) | 22:40 |
clarkb | looks like at least release-notes-jobs | 22:42 |
clarkb | and looking at other tags that is the only job | 22:44 |
*** rascasoft has quit IRC | 22:46 | |
*** rascasoft has joined #openstack-infra | 22:47 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP download and display a log file https://review.opendev.org/671912 | 22:49 |
fungi | yeah, that's the only one i'm aware of generally running in the tag pipeline | 22:49 |
*** tosky has quit IRC | 22:49 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP download and display a log file https://review.opendev.org/671912 | 22:49 |
clarkb | and we run the job there instead of release because it may be a tag that isn't for a release (in which case do release notes make sense?) | 22:50 |
fungi | clarkb: i think it was because $foo-eol tags don't match pre-release nor release pipline expressions | 22:51 |
clarkb | ah | 22:52 |
smcginnis | Back and just caught up on scrollback. | 22:52 |
smcginnis | So sounds like we can just ignore that failure then? | 22:52 |
smcginnis | Just wait until the next patch to get the release notes updated. | 22:52 |
clarkb | smcginnis: unless the release notes are critical and https://review.opendev.org/672188 should help prevent that from happening again (will need a gerrit restart too though) | 22:54 |
smcginnis | Thanks for digging in to that. I don't think it's critical. If someone comes complaining, we can do an update then to get them out there. | 22:55 |
fungi | i'm happy to reenqueue the ref in the tag pipeline if that happens, yes | 22:56 |
smcginnis | OK, thanks! I'm guessing that will not be necessary, but good to know we can if needed. | 22:56 |
*** eernst has quit IRC | 23:01 | |
*** eernst has joined #openstack-infra | 23:08 | |
*** eernst has quit IRC | 23:12 | |
*** rh-jelabarre has quit IRC | 23:13 | |
*** eernst has joined #openstack-infra | 23:14 | |
*** eernst has quit IRC | 23:18 | |
*** eernst has joined #openstack-infra | 23:20 | |
clarkb | donnyd: looks like there are 2 timeouts in fn over the last 12 hours. At least one of those appears to have been due to unhappy cloud under test (so not sure we can blame fn in that case) | 23:22 |
clarkb | donnyd: http://logs.openstack.org/53/671853/2/check/tempest-slow-py3/b70e8e1/job-output.txt is the other. That one actually fails to set up devstack but then runs tempest anyway | 23:25 |
clarkb | curious | 23:25 |
clarkb | all that to say I don't think we can blame fn in either case which is a great improvement | 23:25 |
fungi | it's a very determined job | 23:25 |
*** eernst has quit IRC | 23:25 | |
clarkb | gmann: any idea what is happening in http://logs.openstack.org/53/671853/2/check/tempest-slow-py3/b70e8e1/job-output.txt#_2019-07-22_18_59_50_267474 that failed to run devstack but then starts tempest anyway | 23:25 |
fungi | #status log deleted gitea01.opendev.org instance from vexxhost-sjc1 in preparation for replacement | 23:25 |
openstackstatus | fungi: finished logging | 23:25 |
*** eernst has joined #openstack-infra | 23:27 | |
donnyd | clarkb: it seems that just a few jobs don't like to run on fn. I don't really have a good answer as to why. They should work. I have worked out the performance issues, so I am kinda thinking it may be a ipv6 only thing maybe | 23:28 |
*** dchen has joined #openstack-infra | 23:28 | |
donnyd | Also that same aio job for Osa fails too | 23:29 |
fungi | clarkb: `openstack server show gitea06.opendev.org` has "image" as blank... suggestions? | 23:30 |
fungi | from the `openstack image list` output i'm guessing it's infra-ubuntu-bionic-minimal-20190612 (and not one of the two images simply named "ubuntu-bionic-minimal") | 23:31 |
*** eernst has quit IRC | 23:31 | |
clarkb | fungi: I think if you do a volume show on that hosts volume you get the image name | 23:32 |
fungi | i suspect if i iterate through `openstack volume list` to see which volume that instance's uuid has in use and then check what image it's based on... | 23:32 |
clarkb | fungi: the image would be on server show if it wasn't bfv | 23:32 |
fungi | ahh | 23:32 |
donnyd | I will keep my eyes on the failing jobs to see if it's just some particular ones, or if its totally random | 23:32 |
clarkb | fungi: and ya the one I uploaded would've had a date iirc | 23:33 |
* clarkb checks bridge command history | 23:33 | |
fungi | yeah, luckily `openstack volume list` has a total of only 27 entries in that region | 23:33 |
clarkb | fungi: ya the other two images are from february and are likely what mordred used | 23:35 |
clarkb | infra-ubuntu-bionic-minimal-20190612 is the one to use | 23:35 |
fungi | | 921f1962-2046-4644-bc2e-ba22d7a4947f | | 23:36 |
fungi | | in-use | 80 | Attached to gitea06.opendev.org on /dev/vda | | 23:36 |
fungi | so that would be the volume to track back from i guess | 23:36 |
clarkb | ya | 23:36 |
*** eernst has joined #openstack-infra | 23:36 | |
clarkb | volume show on that volume and in a json blob should be the name of the image iirc | 23:36 |
fungi | yup. confirmed | 23:36 |
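The lookup that just happened, condensed into commands. This is a sketch that needs the cloud credentials to run; the names and UUIDs are the ones from this log, and `volume_image_metadata` is where Cinder records the source image of a boot-from-volume volume:

```
# server show leaves "image" blank for boot-from-volume instances;
# the attached volume id is what to chase
openstack server show gitea06.opendev.org -f value -c volumes_attached

# the volume's json blob names the image it was created from
openstack volume show 921f1962-2046-4644-bc2e-ba22d7a4947f -f json \
    | jq -r '.volume_image_metadata.image_name'
# per the confirmation above, this prints infra-ubuntu-bionic-minimal-20190612
```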
*** eernst has quit IRC | 23:40 | |
fungi | /opt/system-config/launch/launch-node.py gitea01.opendev.org --flavor=v2-highcpu-8 --cloud=openstackci-vexxhost --region=sjc1 --image=infra-ubuntu-bionic-minimal-20190612 --boot-from-volume --volume-size=80 --ignore_ipv6 | 23:41 |
fungi | that look right, clarkb? | 23:41 |
*** eernst has joined #openstack-infra | 23:42 | |
clarkb | yes | 23:43 |
*** eernst has quit IRC | 23:47 | |
*** eernst has joined #openstack-infra | 23:49 | |
*** aaronsheffield has quit IRC | 23:49 | |
clarkb | fungi: corvus have a quick moment for https://review.opendev.org/#/c/672188/ ? can try to sneak in a gerrit restart in the near future if that gets in | 23:50 |
fungi | Multiple possible networks found, use a Network ID to be more specific | 23:50 |
fungi | bah, i guess i need to pick a network too | 23:50 |
clarkb | fungi: oh you probably have to delete the volume too | 23:50 |
*** eernst has quit IRC | 23:51 | |
clarkb | I don't think launch node cleans up after itself in the bfv case | 23:51 |
fungi | ahh | 23:51 |
fungi | no entries in volume list mention gitea01 though | 23:51 |
fungi | also `openstack network list` complains "got an unexpected keyword argument 'rate_limit'" | 23:52 |
clarkb | ya I think it tries but it races? I remember needing to clean it up once but I had a few failed attempts | 23:52 |
fungi | i guess it doesn't like our clouds.yaml on bridge.o.o? | 23:52 |
clarkb | fungi: 0048fce6-c715-4106-a810-473620326cb0 | public I get that with my osc venv | 23:52 |
fungi | ahh | 23:53 |
fungi | --network=0048fce6-c715-4106-a810-473620326cb0 seems to have gotten me further, yes | 23:54 |
fungi | openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: https://block-storage-sjc1.vexxhost.us/v3/462ecebbb6e34add9eeeae3936aa6cb9/volumes/ea7cd896-f78a-4e2a-ac98-8e62a7d275c2, Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be | 23:55 |
fungi | disassociated from snapshots after volume transfer. | 23:55 |
fungi | i left my wizard's staff back at basecamp | 23:55 |
*** eernst has joined #openstack-infra | 23:55 | |
fungi | (and maybe i memorized the wrong spells) | 23:56 |
clarkb | That is a new one to me | 23:56 |
clarkb | does a volume list show a volume? | 23:57 |
fungi | volume list shows a few dozen volumes, but none mention attachment to gitea01 | 23:57 |
fungi | i can just try the launch script again and see if this is repeatable | 23:58 |
clarkb | fungi: ea7cd896-f78a-4e2a-ac98-8e62a7d275c2 | 23:58 |
fungi | took a few minutes for it to get to that point the first time through though | 23:58 |
clarkb | I think that is your volume but it is available so even more confusing | 23:58 |
fungi | yeah, same error again | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!