*** AJaeger has joined #openstack-infra-incident | 15:30 | |
*** jeblair has joined #openstack-infra-incident | 15:58 | |
*** ChanServ changes topic to "Gerrit upgrade starting at 17:00 UTC" | 15:58 | |
anteaya | it is my belief that this is the etherpad we are working from for the upgrade, yes? https://etherpad.openstack.org/p/gerrit-2.11-upgrade | 16:00 |
---|---|---|
*** fungi has joined #openstack-infra-incident | 16:16 | |
jeblair | anteaya: cool i was about to ask about that | 16:16 |
jeblair | anteaya: do you have a link to the mailing list post? | 16:16 |
anteaya | :) | 16:16 |
* fungi catches up on the channel log | 16:16 | |
anteaya | http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 16:16 |
anteaya | been handing it out liberally | 16:16 |
jeblair | status notice Gerrit will be offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 16:17 |
jeblair | anteaya: fungi how's that look? | 16:17 |
anteaya | I like it | 16:17 |
jeblair | #status notice Gerrit will be offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 16:19 |
openstackstatus | jeblair: sending notice | 16:19 |
*** clarkb has joined #openstack-infra-incident | 16:19 | |
*** jroll has joined #openstack-infra-incident | 16:20 | |
* jroll lurks | 16:20 | |
-openstackstatus- NOTICE: Gerrit will be offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 16:20 | |
fungi | jeblair: lgtm | 16:20 |
openstackstatus | jeblair: finished sending notice | 16:21 |
*** igorbelikov has joined #openstack-infra-incident | 16:22 | |
*** ChanServ changes topic to "Gerrit upgrade starting at 17:00 UTC | Etherpad https://etherpad.openstack.org/p/gerrit-2.11-upgrade" | 16:22 | |
fungi | i guess we should go ahead and just stop the ansible puppet cron across the board so that it has time to go dormant? | 16:25 |
fungi | any objections to me commenting it out on the puppetmaster now? | 16:25 |
clarkb | none here | 16:26 |
anteaya | no objections | 16:26 |
clarkb | will allow us to get the changes merged without applying them until we are ready too | 16:26 |
fungi | looks like i caught it when it wasn't already running, and have commented it out | 16:27 |
anteaya | off to a good start | 16:27 |
*** olaph has joined #openstack-infra-incident | 16:27 | |
fungi | most of the work, unsurprisingly, is happening in a shell on review.o.o... who from infra-root was volunteering to drive that part? want to do it in a root screen session? | 16:29 |
anteaya | mordred is in the defcore meeting in -3 if anyone is interested | 16:31 |
jeblair | i added some zuul tasks to the etherpad | 16:47 |
*** nibalizer has joined #openstack-infra-incident | 16:47 | |
*** zaro has joined #openstack-infra-incident | 16:47 | |
nibalizer | hello | 16:47 |
zaro | hi | 16:47 |
*** mordred has joined #openstack-infra-incident | 16:47 | |
jeblair | zuul is pretty far from idle at the moment :) | 16:48 |
jeblair | and if i had to guess, i'd say we probably lost the tripleo cloud | 16:48 |
AJaeger | only 15 in the post queue - that's short ;) | 16:48 |
fungi | yeah, tripleo cloud errors were rampant in yesterday's nodepool logs | 16:48 |
fungi | i assume it's hard-down | 16:48 |
fungi | for some extended period | 16:49 |
anteaya | zaro: can you go over current open patches that need to be merged during the upgrade | 16:49 |
jeblair | perhaps we should remove it from the config. | 16:49 |
anteaya | zaro: and ensure they have what they need to be merged? | 16:49 |
anteaya | zaro: since in 11 minutes most of us can't see them | 16:49 |
zaro | open patches are not crossed out: https://etherpad.openstack.org/p/gerrit-2.11-upgrade | 16:49 |
jeblair | (gertty ftw!) | 16:49 |
anteaya | zaro: yes, can you ensure they have the reviews they need? | 16:50 |
jeblair | any reason not to merge https://review.openstack.org/235079 now? | 16:50 |
anteaya | no objection | 16:51 |
zaro | nope, merge it | 16:51 |
jeblair | zaro: what's the status of https://review.openstack.org/258088 ? | 16:51 |
zaro | anteaya: there are only 3 left that need to merge with upgrade process | 16:51 |
anteaya | zaro: wonderful | 16:51 |
zaro | jeblair: it works, just doesn't pass bashate test | 16:52 |
zaro | so can use it or just use what's on master | 16:52 |
zaro | bashate doesn't like the long lines. i tried to fix that but couldn't get it to work in time. | 16:53 |
clarkb | I am noticing the etherpad doesn't call out merging the remaining changes | 16:53 |
jeblair | fungi: i think you volunteered to drive the screen session on review.o.o :) | 16:53 |
clarkb | should we go ahead and do that since puppet cron is disabled? | 16:54 |
jeblair | seems safe and reasonable | 16:54 |
jeblair | zaro: can you unwip https://review.openstack.org/241309 | 16:54 |
fungi | jeblair: wfm | 16:54 |
zaro | done | 16:55 |
jeblair | all changes approved | 16:55 |
jeblair | the toggleci change failed a test | 16:56 |
jeblair | ah, apt mirror errors | 16:56 |
jeblair | shall i force-merge it? | 16:56 |
AJaeger | what is with https://review.openstack.org/258088 ? That fails... | 16:56 |
zaro | AJaeger: bashate doesn't like the long lines. | 16:57 |
AJaeger | ;( | 16:57 |
zaro | i tried to fix but didn't get it to work in time. | 16:57 |
clarkb | jeblair: ya I don't think apt mirror fails are related to a few js line change | 16:57 |
fungi | jeblair: yeah, i'd just cram it in | 16:57 |
zaro | AJaeger: the current PS works, but bashate hates it | 16:57 |
fungi | also i'm fine ignoring/disabling bash8 or skipping line length checks | 16:57 |
AJaeger | zaro: disable it in bashate | 16:58 |
zaro | AJaeger: how? | 16:58 |
AJaeger | zaro: exclude it in tools/run-bashate.sh | 16:58 |
jeblair | 235079 merged | 16:58 |
AJaeger | e.g. |grep -v name-of-your-script | 16:58 |
jeblair | shall i force-merge the other two since we're running short on time? | 16:59 |
anteaya | no objections | 16:59 |
zaro | yes | 16:59 |
fungi | anyway, infra-root people who feel the need to keep tabs on progress within cli steps, feel free to screen -x as root on review.o.o | 16:59 |
fungi | though i'll also update the channel with my progress at each step | 16:59 |
jeblair | okay, all the approved changes are merged; i'm not going to merge the bash8 failing change since it's failing tests and would break everything | 17:01 |
*** _david_ has joined #openstack-infra-incident | 17:01 | |
fungi | ready for a #status alert? | 17:01 |
zaro | i've crossed out merged changes on etherpad | 17:01 |
jeblair | status alert Gerrit is offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 17:01 |
zaro | ready | 17:01 |
jeblair | like that ^? | 17:01 |
anteaya | wfm | 17:01 |
fungi | jeblair: lgtm | 17:01 |
olaph | short and sweet | 17:01 |
jeblair | #status alert Gerrit is offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 17:02 |
openstackstatus | jeblair: sending alert | 17:02 |
-openstackstatus- NOTICE: Gerrit is offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html | 17:04 | |
*** ChanServ changes topic to "Gerrit is offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html" | 17:04 | |
*** jswarren has joined #openstack-infra-incident | 17:05 | |
fungi | need to top up my coffee while the zuul queues stuff is being done... who's got that part? | 17:05 |
jeblair | fungi: i'm on it | 17:05 |
fungi | cool, thanks! | 17:05 |
jeblair | zuul is stopped | 17:06 |
fungi | and my caffeine is at the ready | 17:07 |
openstackstatus | jeblair: finished sending alert | 17:07 |
jeblair | fungi: you're clear to proceed | 17:07 |
fungi | and there's our friendly bot! stopping gerrit now | 17:07 |
jeblair | (i'll purge nodepool) | 17:07 |
fungi | gerrit is down. i'll start backing up the database and review_site tree into ~root (or somewhere else, after i make sure we have space) | 17:07 |
fungi | correction... into /opt | 17:08 |
fungi | ;) | 17:08 |
clarkb | is / too full? | 17:08 |
jeblair | (nodepool is deleting all nodes) | 17:08 |
zaro | fungi: you may need to use the updated cleanup script; https://review.openstack.org/#/c/258088/7/tools/gerrit-2.8.4-cleanup.sh | 17:09 |
pleia2 | (updating the etherpad accordingly) | 17:09 |
fungi | clarkb: 8gb free. i don't trust it | 17:09 |
anteaya | hey pleia2 | 17:10 |
zaro | i forgot the one in master requires a host (-h) param for all of the mysql commands since you'll be running from review.o.o | 17:10 |
fungi | zaro: or a my.cnf with the correct host set in it | 17:11 |
clarkb | of which there is one in ~root/ | 17:11 |
zaro | yeah, that would work as well. your choice. | 17:11 |
clarkb | but with a non default name iirc | 17:11 |
jeblair | i note that 'review_site' backup and 'git repo backup' are both on the list, but 'git repos' are inside of 'review_site' | 17:12 |
fungi | clarkb: yeah, root has a ~/.gerrit_db.cnf | 17:13 |
jeblair | i also note that 'review_site backup' is on there twice. | 17:13 |
fungi | jeblair: agreed, i was just going to backup the review_site into a tarball | 17:13 |
fungi | okay, database backup complete. working on review_site backup now (to the same location) | 17:13 |
jeblair | i have struck out the lines that i think are not necessary, does that look correct? | 17:13 |
jeblair | or do we really mean to do those twice? | 17:14 |
jeblair | we do have 2 database backups | 17:14 |
zaro | backup twice is to save before and after cleanup, but probably not necessary | 17:14 |
jeblair | does anything between the first review_site backup and the second one change anything in review_site? | 17:15 |
fungi | zaro: that's a third backup though | 17:15 |
zaro | you are correct, not necessary. | 17:15 |
jeblair | so i guess i had 2 questions: | 17:15 |
fungi | yeah, i can see doing a second db backup after the cleanup before the migration | 17:16 |
fungi | that makes plenty of sense | 17:16 |
jeblair | 1) is the separate git repo backup necessary -- i think we answered this: no | 17:16 |
clarkb | the second db backup was so that we would have the post db cleanup db available too | 17:16 |
jeblair | 2) is the second review_site directory backup necessary | 17:16 |
jeblair | i think 2 is still unanswered? | 17:16 |
clarkb | I do not think the second review_site directory backup is necessary | 17:16 |
clarkb | actually | 17:16 |
jeblair | [and i guess 3) is the second db backup necessary -- yes, because we are changing the db] | 17:16 |
clarkb | the cleanup script moves at least one repo akanada | 17:17 |
fungi | the only things changing in it are the akanada typo correction looks like | 17:17 |
zaro | fungi: makes sense to me | 17:17 |
fungi | right, that | 17:17 |
clarkb | so second review_site backup not a nop but also not super necessary | 17:17 |
fungi | everything else is either happening in the db or on the git farm | 17:17 |
fungi | the edits it would encompass are trivially reversible | 17:17 |
fungi | two mv commands | 17:18 |
fungi | non-lossy transformation | 17:18 |
jeblair | i would be okay dropping the extra backup step based on that | 17:18 |
zaro | i agree | 17:18 |
fungi | same | 17:18 |
pleia2 | yeah | 17:18 |
clarkb | sounds good | 17:18 |
fungi | also saves us some many minutes waiting for a second tar to complete | 17:19 |
jeblair | brb | 17:19 |
fungi | review_site backup completed | 17:21 |
fungi | who wants the git farm steps from https://git.openstack.org/cgit/openstack-infra/system-config/tree/tools/gerrit-2.8.4-cleanup.sh | 17:21 |
fungi | i'll get started on the db and review.o.o filesystem updates | 17:22 |
clarkb | I can git farm | 17:22 |
fungi | thanks! | 17:22 |
jeblair | re | 17:22 |
clarkb | so there is already an akanda.git in openstack-attic | 17:23 |
clarkb | I am going to mv that into ~root then rename akanada | 17:23 |
fungi | clarkb: i have a feeling it got created by jeepyb | 17:23 |
clarkb | ya I think thats right and likely not needed | 17:23 |
fungi | i expect to find the same on review.o.o | 17:23 |
jeblair | the number of 0 rows affected by the db cleanup script is curious | 17:26 |
fungi | zaro: most of the mysql delete queries are matching 0 rows | 17:26 |
fungi | yeah, that | 17:26 |
jeblair | zaro: did you see that in your tests with the db snapshot? | 17:26 |
zaro | fungi: ok. that's odd. | 17:26 |
jeblair | (i'll copy the results from prod to the etherpad) | 17:27 |
zaro | fungi: no it did remove with the data i used | 17:27 |
fungi | it's things like no submodule_subscriptions or account_project_watches entries so that seems likely | 17:27 |
fungi | we're getting matches on the changes table for those | 17:27 |
fungi | just not the others | 17:27 |
zaro | ohh, wait yeah, a few of them might be noops | 17:27 |
fungi | i mean, it's not at all uncommon when we do the equivalent update queries for those tables during project renames that they match 0 rows | 17:28 |
fungi | just because we don't have submodules, and few people use the gerrit subscription features | 17:28 |
zaro | but they should not be all noops | 17:28 |
fungi | right, there were changes table matches for each of the three being cleaned up | 17:29 |
fungi | i'm about to move forward with the update queries for the akanada typo in there next | 17:29 |
fungi | on the assumption this is expected behavior | 17:29 |
jeblair | zaro: do you have a log of the output from running your test? | 17:30 |
jeblair | oh | 17:30 |
zaro | looking | 17:30 |
clarkb | ok I think git01-08 are good | 17:31 |
jeblair | fungi, zaro: yeah, i'm inclined to agree this is expected behavior -- just surprising that we had noops in a script with a dry run. will still be good for zaro to confirm if he can, but i think we can move on regardless. | 17:32 |
jeblair | (nodepool is idle with ready nodes) | 17:34 |
*** jswarren has quit IRC | 17:36 | |
fungi | okay, confirmed the git trees for akanda have been safely cleaned up and moved into place | 17:37 |
zaro | i've looked thru my test runs and it looks like there's no logging coming from my script unless there is an error. so i believe i ran the script and manually checked the db to verify. I was being overzealous with the removal for some reason. | 17:37 |
fungi | starting the second (post-cleanup) db backup now | 17:37 |
fungi | okay, that's done. all ready to delete the gerrit cache | 17:41 |
fungi | zaro: just to confirm, there's no reason to save that cache, right? | 17:42 |
zaro | nope, you should remove it | 17:42 |
fungi | okay, that's done then | 17:43 |
zaro | just a safeguard for possible issue i saw on gerrit ML. | 17:43 |
fungi | thanks | 17:43 |
fungi | so... these seem to be all the prep steps completed | 17:44 |
jeblair | do we run puppet to execute the upgrade? | 17:44 |
fungi | you beat me to the question ;) | 17:44 |
zaro | yes. | 17:45 |
fungi | also do i manually upgrade the javamelody plugin before the gerrit war upgrade, or after? | 17:45 |
zaro | before is fine | 17:45 |
fungi | doing now | 17:45 |
jeblair | i will update git repos on puppetmaster | 17:46 |
jeblair | HEAD is at Merge "Upgrade review.o.o to Gerrit 2.11" | 17:46 |
fungi | zaro: we have a javamelody-4744bfb.jar and javamelody.jar.disabled in /home/gerrit2/review_site/plugins | 17:47 |
zaro | ohh. hmm. maybe just start fresh. | 17:47 |
fungi | zaro: do either of those need to be removed? your instructions have me creating a file just named javamelody.jar | 17:47 |
zaro | remove them and get the one from etherpad | 17:47 |
zaro | or just copy them off to another location. | 17:48 |
*** ociuhandu has joined #openstack-infra-incident | 17:48 | |
zaro | *copy/move | 17:48 |
fungi | the other part of my question though, is should we end up with /home/gerrit2/review_site/plugins/javamelody.jar (like in the etherpad) or /home/gerrit2/review_site/plugins/javamelody-3fefa35.jar instead? | 17:48 |
zaro | javamelody.jar | 17:48 |
fungi | the one that seems to be in use now is javamelody-4744bfb.jar not javamelody.jar | 17:49 |
zaro | that's what the puppet does | 17:49 |
zaro | i have no idea how that happened. | 17:49 |
fungi | so puppet changed that behavior after it installed javamelody-4744bfb.jar i guess? | 17:49 |
jeblair | clarkb, mordred: i'm a little confused by ansible/puppet | 17:49 |
jeblair | nibalizer: ^ | 17:49 |
jeblair | it's supposed to rsync the system-config repo from puppetmaster to the hosts, right? | 17:50 |
nibalizer | jeblair: it works in agent mode right now | 17:50 |
nibalizer | it also rsyncs but that code doesn't do anything | 17:50 |
zaro | the puppet will get javamelody-<version>.jar then copy to /home/gerrit2/review_site/plugins/javamelody.jar | 17:50 |
fungi | zaro: mostly just making sure we're not going to end up with two javamelody plugins in there with different names once puppet gets reenabled | 17:50 |
jeblair | but looking on some hosts, /opt/system-config/production is at a comment from dec 1 | 17:50 |
nibalizer | then possibly we do not have that working as well as we thought | 17:50 |
fungi | zaro: okay, i'll keep an eye out for it after we run puppet just in case | 17:50 |
jeblair | nibalizer: okay, so for now, ignore that, and run 'puppet agent' ? | 17:50 |
zaro | fungi: yeah, puppet will overwrite /home/gerrit2/review_site/plugins/javamelody.jar with newer versions | 17:51 |
nibalizer | yes | 17:51 |
jeblair | nibalizer: k, thx | 17:51 |
zaro | when we do upgrades, that's how its setup to work | 17:51 |
clarkb | yup nibalizer's got it | 17:51 |
jeblair | we don't have any puppet module or ansible role upgrades as part of this, right? | 17:51 |
zaro | or should work, https://git.openstack.org/cgit/openstack-infra/puppet-gerrit/tree/manifests/plugin.pp#n70 | 17:52 |
clarkb | jeblair: all of those should've gone in already | 17:52 |
jeblair | so i only need to update system-config on puppetmaster, not run install_modules, etc. correct? | 17:52 |
clarkb | there were several puppet-gerrit changes but we merged them since they were backward and forward compat | 17:52 |
clarkb | jeblair: yes | 17:52 |
jeblair | fungi: cool, so i think you are set to run 'puppet agent --test' on review.o.o whenever you are ready | 17:53 |
fungi | okay, latest javamelody.jar is downloaded to /home/gerrit2/review_site/plugins/javamelody.jar and ownership/permissions have been made consistent with its predecessor | 17:53 |
fungi | here goes. hold onto your seats | 17:54 |
* anteaya holds onto her seat | 17:54 | |
* _david_ holds onto his seat | 17:54 | |
jeblair | stuff is happening! | 17:54 |
anteaya | yay stuff | 17:54 |
anteaya | _david_: thanks for being here | 17:54 |
fungi | and no errors (yet!) | 17:54 |
_david_ | ;-) | 17:55 |
pleia2 | :) | 17:55 |
fungi | i take it we're reasonably confident the puppet exec timeout won't kill this before it completes? | 17:55 |
zaro | uhh, what's the timeout? | 17:55 |
fungi | by default i think it's something like 10 minutes, but it might be overridden in the module. i haven't looked | 17:55 |
zaro | it took 1 hr to reindex on my HP bigVM | 17:55 |
fungi | but you were running it under puppet to trigger the upgrade right? | 17:56 |
fungi | so presumably the exec timeout must be overridden | 17:56 |
jeblair | i want to say our reindexes take 15m? | 17:56 |
zaro | ohh, i think that's a non-upgrade reindex. upgrade reindex takes 4x | 17:57 |
nibalizer | default timeout is 300 seconds | 17:57 |
fungi | anyway, i expect it'll be here for a while... Info: /Stage[main]/Gerrit/Exec[install-core-plugins]: Scheduling refresh of Exec[gerrit-start] | 17:57 |
jeblair | oh fun. | 17:57 |
zaro | no, was not running with puppet | 17:57 |
jeblair | so... what happens if a reindex is killed? | 17:57 |
jeblair | because it seems like that's about to happen. | 17:57 |
fungi | wait, you didn't test the upgrade under puppet but the plan was to upgrade via puppet? | 17:57 |
fungi | yeah, we may be staring down the barrel of an imminent cleanup and retry | 17:58 |
zaro | oops, sorry i didn't mention that earlier. | 17:59 |
* fungi should have been a little more concerned about the fact that the upgrade plan didn't include the actual upgrade command | 17:59 | |
mordred | this upgrade doesn't actually contain upgrades | 17:59 |
* anteaya puts on the kettle for more tea | 18:00 | |
zaro | fwiw, i've stopped and started reindex a few times and it doesn't seem to be an issue. like reindex just ran again without issues. | 18:00 |
fungi | well, i guess let's wait for puppet to hit the timeout and see if it kills the reindex or keeps running orphaned | 18:00 |
jeblair | and there it goes | 18:00 |
fungi | and there it goes | 18:00 |
fungi | yep | 18:00 |
zaro | what? reindex? | 18:00 |
jeblair | i don't see a java process anymore | 18:00 |
jeblair | so i think it was killed | 18:01 |
fungi | presumably we need to run these steps by hand | 18:01 |
zaro | are you talking about the DB upgrade or reindex? | 18:01 |
fungi | zaro: do you have some crib notes on how you were actually running the upgrade that you can add to the etherpad? | 18:01 |
zaro | yes. | 18:01 |
fungi | zaro: i don't actually know what i'm talking about. i'm talking about the black box step of running puppet | 18:01 |
fungi | so "whatever puppet was doing" | 18:01 |
*** alop has joined #openstack-infra-incident | 18:02 | |
fungi | that aborted early because the exec took more than 300 seconds to return | 18:02 |
jeblair | fungi: i'm going to copy some stuff out of screen | 18:02 |
pleia2 | created a little bit in the etherpad for "Manual commands" | 18:02 |
fungi | thanks jeblair | 18:02 |
fungi | looks like it might have been in the middle of schema migrations | 18:02 |
fungi | hopefully those are resumable | 18:03 |
fungi | (at least they usually are in most applications) | 18:03 |
jeblair | (i'm hoping to determine which things it actually finished) | 18:03 |
mordred | fungi: if not, we can manually fix them | 18:03 |
*** greghaynes has joined #openstack-infra-incident | 18:04 | |
_david_ | or restore db from the backup? | 18:04 |
fungi | _david_: yes, that's an option | 18:05 |
zaro | fungi: manual commands are at the bottom of etherpad | 18:05 |
fungi | we can drop and re-source the db and then start the upgrade step over | 18:05 |
jeblair | fungi: finished copying; transferring to new etherpad | 18:05 |
fungi | thanks zaro! | 18:05 |
mordred | that actually might be a safer bet, since no further data has gone in - DDL in mysql is not transactional | 18:05 |
_david_ | zaro, "init --batch" does it always update installed plugins? | 18:06 |
zaro | _david_: no it doesn't install plugins | 18:06 |
jeblair | https://etherpad.openstack.org/p/fjOwsR2NVY | 18:06 |
jeblair | there is the log from the puppet run ^ | 18:06 |
mordred | fungi: do you feel comfortable doing that db thing? | 18:06 |
_david_ | zaro, I fixed that upstream on stable-2.11, but not sure if you guys cherry-picked that change | 18:06 |
_david_ | zaro, How do we make sure, that installed plugins got updated? | 18:07 |
mordred | jeblair, fungi: it LOOKS like the migration completed | 18:07 |
fungi | mordred: yeah, just 'drop reviewdb; source "/path/to/backup.sql";' right? | 18:07 |
mordred | since the scheduling refresh happened | 18:07 |
fungi | if we end up needing it | 18:07 |
* _david_ looking the migration log | 18:07 | |
jeblair | mordred, fungi: i agree, i think the db migration finished | 18:07 |
mordred | so I don't think it's necessary to drop/reload | 18:07 |
zaro | _david_: puppet unzips the jars to plugins folder | 18:07 |
fungi | yay, we won't lose quite as much time at least | 18:07 |
zaro | _david_: we are upgrading javamelody manually | 18:08 |
fungi | rather, i already manually upgraded the javamelody plugin | 18:08 |
_david_ | can someone check the schema version in the database? | 18:08 |
fungi | _david_: i'm happy to if you have the command handy | 18:09 |
_david_ | fungi, let me figure it out | 18:09 |
_david_ | select * from schema_version; | 18:10 |
jeblair | mysql> select * from schema_version; | 18:10 |
jeblair | +-------------+-----------+ | 18:10 |
jeblair | | version_nbr | singleton | | 18:10 |
jeblair | +-------------+-----------+ | 18:10 |
jeblair | | 107 | X | | 18:10 |
fungi | jeblair seems to have beat us to the punch | 18:10 |
fungi | also 107 is what the migration log implies | 18:10 |
fungi | so presumably we're confirmed good for that step | 18:11 |
_david_ | Jepp, we don't need to restore the db. Looks sane to me. | 18:11 |
zaro | yeah, it looks good to me | 18:11 |
fungi | so if i'm reading correctly, i need to manually repeat the reindex | 18:12 |
mordred | that's my take | 18:12 |
zaro | yes. | 18:12 |
jeblair | ++ | 18:12 |
fungi | so i need to run that as the gerrit2 user, correct? | 18:13 |
zaro | yes | 18:13 |
clarkb | fungi: should be the same as when we rename projects so yes | 18:13 |
fungi | it's going now | 18:13 |
_david_ | zaro, offline reindex time is around 1 hour? | 18:15 |
zaro | you may already know this but it will take 3/4 of the time to get to 99% then spend 1/4 of the time before 100%. | 18:15 |
jeblair | looking at the log and the puppet manifest, i think that's the only thing -- i believe the other exec steps completed | 18:15 |
zaro | _david_: it was for me on my big test VM | 18:15 |
clarkb | that last 1% is bigger than all the other percents | 18:16 |
zaro | reindex after upgrade takes about 15-20 mins for me. | 18:16 |
* mordred thinks percent does not mean what you think it means | 18:17 | |
jeblair | all percents are equal. some percents are more equal than others. | 18:17 |
fungi | hrm, we got a missing blob | 18:17 |
fungi | org.eclipse.jgit.errors.MissingObjectException: Missing blob e1b60551d607142580d962122399613060afde69 | 18:17 |
pleia2 | :\ | 18:18 |
fungi | that may just be a long-standing corruption in our git repos? | 18:18 |
zaro | yeah, take a look at the bottom of etherpad | 18:18 |
zaro | it's expected. i think overly verbose logging. | 18:18 |
fungi | zaro: hah, thanks! | 18:18 |
pleia2 | phew :) | 18:18 |
fungi | there was also this: | 18:19 |
* nibalizer creates a WANTED poster for the missing blob | 18:19 | |
fungi | ERROR com.google.gerrit.server.change.MergeabilityCacheImpl : Error checking mergeability of 08489f39fc2292b7a0b2315da3b11a623c47cf85 into 6e5059a7c6dcfa6aee85b894139a5afdd0335035 (MERGE_IF_NECESSARY) | 18:19 |
clarkb | fungi: I wonder if that is just a timeout on the merge op | 18:20 |
zaro | fungi: same thing. | 18:20 |
fungi | com.google.gerrit.server.git.MergeException: Cannot merge 08489f39fc2292b7a0b2315da3b11a623c47cf85 | 18:20 |
fungi | okay, so it's related to the missing blob? | 18:20 |
zaro | yes, it's in the discussion thread. | 18:20 |
fungi | rather, related to the missing blob errors | 18:20 |
fungi | okay, cool then | 18:20 |
zaro | i've investigated that and was able to retrieve those changes after upgrade without problems. | 18:21 |
jeblair | 08489f39fc2292b7a0b2315da3b11a623c47cf85 is 239488,1 | 18:21 |
jeblair | which is an abandoned openstack/fuel-plugin-contrail change | 18:23 |
fungi | and nothing of value was lost ;) | 18:23 |
fungi | current reindex status... Reindexing changes: projects: 54% (471/858), 50% (128372/256443) | 18:23 |
fungi | we're past the halfway mark, if that's to be believed | 18:23 |
zaro | yeah, all those errors happened on changes that were abandoned. | 18:24 |
AJaeger | we shouldn't merge those ;) | 18:24 |
zaro | anybody know what's our largest repo? | 18:31 |
jeblair | nova? | 18:31 |
clarkb | zaro: I think it is either nova or neutron, likely nova | 18:31 |
fungi | also depends on how you calculate "largest" | 18:33 |
fungi | in terms of disk space utilized, i believe it's openstack-manuals since they had some very large files in their git history (multiple versions of rendered pdfs and images mostly, i think?) | 18:34 |
AJaeger | interesting: | 18:34 |
AJaeger | 137M nova | 18:34 |
AJaeger | 55M neutron | 18:34 |
AJaeger | 451M openstack-manuals/ | 18:34 |
AJaeger | if you take disk file usage... | 18:34 |
clarkb | nova has diskimages in it | 18:35 |
AJaeger | fungi: yes, that happened in the past but not anymore | 18:35 |
clarkb | but apparently small ones :) | 18:35 |
fungi | AJaeger: however, the past is still reflected in git, by design | 18:35 |
AJaeger | indeed, fungi | 18:36 |
jeblair | wow so many merge check failures | 18:36 |
AJaeger | if we look at total commits overall - according to stackalytics - the top three are: | 18:36 |
fungi | yeah, i'm trying to keep an eye on those and make sure they're not drowning out other more legitimate issues with the reindex | 18:37 |
AJaeger | nova: 21762; neutron 7153; openstack-manuals 7025 | 18:37 |
fungi | AJaeger: though i believe if you look at the neutron project (not repo) compared to the nova project you'll find neutron is ahead there | 18:37 |
fungi | however, the docs project likely has them both beat? | 18:38 |
fungi | (maybe infra too, we're crazy like that) | 18:38 |
zaro | for those that are interested, this is probably why reindex stays at 99% for a long time: https://groups.google.com/d/msg/repo-discuss/Ux6mr9jvuUA/oTs623GiBAAJ | 18:39 |
fungi | Reindexing changes: projects: 99% (854/858), 95% (243737/256443) | 18:40 |
fungi | (speaking of) | 18:40 |
AJaeger | neutron official: 11164; nova official: 23625; docs: 10914; infra: 16566 | 18:40 |
fungi | AJaeger: oh, i bet the numbers i was remembering were over the liberty dev cycle, not all time | 18:40 |
AJaeger | let me check to entertain the waiting time - tell me to shut off if this is distracting, please | 18:41 |
fungi | well, it was a statistic that got announced in a keynote at the last summit anyway | 18:42 |
fungi | so, you know, take that with a grain of salt ;) | 18:42 |
zaro | hopefully offline reindex can become a thing of the past after 2.11. there's a new online reindex feature that promises to allow upgrading without offline reindex. | 18:42 |
jeblair | fungi: remember: infra doesn't count for keynotes | 18:42 |
AJaeger | neutron: 3345 nova: 1705 docs: 1983 infra: 3922 | 18:42 |
AJaeger | congrats to all of us ;) | 18:43 |
fungi | jeblair: right, we're uninteresting | 18:43 |
jeblair | AJaeger: what's that number? | 18:43 |
fungi | i sort of like being uninteresting | 18:43 |
AJaeger | http://stackalytics.com/?metric=commits&release=liberty&module=infrastructure-group | 18:44 |
fungi | jeblair: liberty cycle commits merged i believe | 18:44 |
AJaeger | jeblair: as fungi said | 18:44 |
jeblair | nice | 18:44 |
fungi | by project-team | 18:44 |
fungi | we probably have a larger number of contributors than any other team's repos, just by nature of what we're doing with them | 18:45 |
fungi | (self-service, with a lot of selves) | 18:46 |
AJaeger | fungi: infra 489 different committers, docs: 344, neutron 354, nova: 292 | 18:48 |
AJaeger | for liberty | 18:49 |
clarkb | the stuck at 99% progress bar reminds me of old video game installs on windows | 18:51 |
pleia2 | heh | 18:51 |
mordred | s/old video game// | 18:51 |
pleia2 | indeed, ps4 isn't much better ;) | 18:51 |
clarkb | pleia2: I don't really notice on ps4 because it does it all in the background now | 18:52 |
nibalizer | pleia2: oh god, 'oh did you buy that disc with the game on it? lol i gotta do installations' | 18:52 |
clarkb | I have an update, reboot, done | 18:52 |
nibalizer | 'oh did you want to play? better play the update game' | 18:52 |
pleia2 | clarkb: yeah, mine tends to get turned off now and then and I don't notice until I turn it back on to play and need to update x_x | 18:52 |
clarkb | nibalizer: that is less of a problem on current consoles if you let them stay alive in the low power mode | 18:52 |
clarkb | nibalizer: since they do all that in the background and all you do is acknowledge the change and reboot if necessary | 18:53 |
clarkb | not perfect but much much better | 18:53 |
nibalizer | ah im still at ps3 | 18:53 |
nibalizer | my favorite ps4 fact is that it can't dlna or play a video from a usb drive | 18:53 |
pleia2 | ps3 is just my netflix player now | 18:53 |
fungi | pleia2: i like that even though the ps4 will go into a suspend state without terminating the running game, it will still prompt you to update and restart it when you resume if there's a new version uploaded | 18:53 |
clarkb | fungi: you can tell it not to | 18:54 |
zaro | is puppeting gerrit using the default puppet timeout? i don't see timeout set in the puppet-gerrit manifests | 18:54 |
clarkb | zaro: yes | 18:54 |
nibalizer | zaro: from what i can tell yes | 18:54 |
fungi | pleia2: clarkb though i like more that it gives you the option to _not_ do that and just go back to what you were doing, yes | 18:54 |
zaro | and that was working in the past? | 18:54 |
fungi | zaro: i believe it has never worked, no | 18:55 |
clarkb | fungi: it worked way back when | 18:55 |
clarkb | then the external index became a thing | 18:55 |
fungi | zaro: rather, before there was a lucene index it worked, but yeah | 18:55 |
fungi | the exec to upgrade dates from before upgrading a production gerrit took more than a minute or two to complete | 18:56 |
zaro | ahh, so we never used gerrit puppet other than applying minor changes? | 18:56 |
fungi | also we had a lot fewer repos and changes and a much smaller database back then | 18:56 |
zaro | *used/use as in currently | 18:56 |
fungi | i think it's working fine on review-dev simply because it has a comparatively small dataset | 18:57 |
fungi | and it's convenient to have there | 18:57 |
clarkb | I seem to recall one of the baldurs gate games requiring a CD change at 99% | 18:58 |
fungi | hah | 18:58 |
clarkb | like it had to load the next pane of the installer off the first disk before it could continue | 18:58 |
fungi | almost like swapping floppies | 18:58 |
fungi | ooh, we're up to 97% of changes reindexed now! | 18:59 |
mordred | fungi: you know the best thing about waiting for this reindex? | 18:59 |
clarkb | I had a carmen san diego game on 5.25" floppies and I couldn't play it after that drive died or went away | 18:59 |
clarkb | was a sad day | 18:59 |
fungi | mordred: you can do things other than code review without feeling guilty? | 18:59 |
jeblair | anyone want a ps2? | 18:59 |
pleia2 | hehe | 19:00 |
zaro | what games? | 19:00 |
fungi | jeblair: i recently replaced my old ps2 with one of the really compact models they have now, so nope | 19:00 |
jeblair | um. really fun ones. i'm sure. classics. | 19:00 |
zaro | i have 1st gen wii, with 2 games. so probably an upgrade | 19:01 |
clarkb | also I learned recently that rollercoaster tycoon was written in assembly | 19:01 |
olaph | nes ftw | 19:01 |
pleia2 | I sold mine when I moved west, along with all my FF games, and PS2 Monkey Island | 19:01 |
fungi | i keep it around because the ps1/ps2 were the definitive platform for dance dance revolution. the one they tried to do for the ps3 was not great and their fans had moved on to other titles anyway | 19:01 |
pleia2 | I do miss monkey island | 19:01 |
clarkb | pleia2: the FFVII remake looks promising | 19:01 |
pleia2 | clarkb: right, I'm excited :) | 19:02 |
fungi | yep, planning to reply vii when that comes out. it's been a while | 19:02 |
fungi | er, replay | 19:02 |
pleia2 | clarkb: wow re: rollercoaster tycoon | 19:03 |
fungi | Paint peeling: 99% (857/858), 97% (249178/256443) | 19:03 |
clarkb | pleia2: it was released in 1999 too which is crazy | 19:03 |
zaro | ohh one project left, which one could that be? | 19:04 |
fungi | zaro: yep, it's been on that one project for the last ~15 minutes i think | 19:04 |
zaro | someone really likes assembly | 19:04 |
pleia2 | hehe | 19:05 |
mordred | fungi: SO CLOSE | 19:05 |
AJaeger | do we know which project it is? Just curious... | 19:06 |
zaro | i believe you've already answered your Q, most changes | 19:06 |
AJaeger | So, indeed nova? | 19:08 |
zaro | actually most patchsets | 19:08 |
zaro | yeah, probably the same. | 19:08 |
jeblair | wow it keeps lots of files open | 19:08 |
jeblair | lsof does not tell me the answer | 19:08 |
nibalizer | pleia2: so ps3 as netflix machine is interesting to me | 19:09 |
nibalizer | because after the 360 and ps3 came out, i felt that a LOT of ps2s hung out in dorm rooms playing audio cds and dvds | 19:09 |
pleia2 | my ps2 wasn't doing so well when I got rid of it, was never a good dvd player | 19:10 |
anteaya | if the offered ps2 plays dvds or cds I am happy to take it and will pay for shipping | 19:10 |
fungi | optical media players are notoriously prone to degradation (mostly mechanical/alignment issues) | 19:11 |
pleia2 | fungi: nods | 19:11 |
pleia2 | I read somewhere that I could get the laser realigned, but the cost was more than the system, unless I did it myself (hah hah) | 19:11 |
fungi | this is why i enjoy restoring cartridge-based console platforms | 19:11 |
fungi | they're much easier to work on | 19:11 |
clarkb | fungi: how are your save game battery replacement skills? | 19:12 |
fungi | i don't even bother to do more than very minor repair/reconditioning on optical consoles | 19:12 |
fungi | clarkb: pretty good, depending on whether you want the memory preserved through the operation | 19:12 |
fungi | i tend to desolder and replace the batteries on those cartridges when i get them | 19:13 |
fungi | so most of mine are reasonably fresh | 19:13 |
zaro | how small are the batteries? | 19:13 |
fungi | though i've seen them hold save memory well through 20 years | 19:13 |
fungi | zaro: button cells | 19:13 |
fungi | they're pretty much always solder-tab too, not clip | 19:14 |
zaro | you talking like atari or nintendo? | 19:14 |
fungi | nes/famicom were the earliest cartridges i've seen battery-backed ram used for | 19:15 |
fungi | atari carts were all stateless afaik | 19:15 |
pleia2 | I don't think atari had memory like that | 19:15 |
pleia2 | yeah | 19:15 |
clarkb | fungi: next time you are in town I may sit you down with my brother who is afraid of playing some games because he expects batteries to die | 19:15 |
fungi | heh, you bet | 19:15 |
fungi | the trick to preserving the memory through a battery replacement is to hopefully find good anode/cathode sites to tie in a donor in parallel while you remove and replace the original cell | 19:16 |
fungi | and try not to lose contact through that process | 19:17 |
fungi | but even then, it's not trivial, and even things like a little static discharge from your fingers can corrupt the ram | 19:17 |
clarkb | zaro: the batteries are good for about 20 ish years iirc | 19:17 |
clarkb | zaro: so in the last several years it has become a big problem for people that collect and play old games | 19:18 |
fungi | however, at least for old stuff like nes/snes i've found that it's just as fun starting over from scratch and not worrying about your 20-year-old saves | 19:18 |
clarkb | ya I think my brother would be ok losing saves, just doesnt want to start a new game and a week later have a dead battery | 19:18 |
fungi | right, replacing the batteries is trivial. i do it all the time to recondition carts i find at thrift stores and stuff | 19:19 |
zaro | i don't think i've ever played a game that lasted over 1hr. | 19:19 |
fungi | just usually requires some careful coordination and a good temperature-controlled soldering station | 19:20 |
clarkb | such 99% | 19:20 |
fungi | Watched kettle: 99% (857/858), 97% (250585/256443) | 19:20 |
pleia2 | heh | 19:20 |
jeblair | i'm going to see if i can acquire food before it finishes | 19:22 |
fungi | at this pace, you probably can | 19:23 |
*** peter_ has joined #openstack-infra-incident | 19:23 | |
fungi | especially if there's a taco truck just down the block | 19:23 |
nibalizer | mmmm | 19:26 |
nibalizer | i'll do the same, back in a bit | 19:26 |
* pleia2 sticks around | 19:29 | |
fungi | yeah, i'm sticking around too | 19:31 |
fungi | the changes counter finally ticked over to 98% | 19:31 |
fungi | catching up on ml replies and voting on upcoming release names | 19:32 |
pleia2 | :) | 19:32 |
fungi | actually kinda hoping "nameless" wins as our next release name | 19:34 |
fungi | a lovely logical contradiction | 19:34 |
pleia2 | the operators would be thrilled, I'm sure | 19:34 |
* AJaeger put nameless first as well... | 19:35 | |
fungi | we can use caricatures of clint eastwood in his role as "the man with no name" | 19:36 |
fungi | nothing says texas like a classic cowboy western, after all | 19:36 |
anteaya | there is a heck of a lot of O names | 19:37 |
fungi | though nameless also conjures images of "hastur the unspeakable, he who is not to be named" | 19:38 |
anteaya | I think I like Om best | 19:38 |
fungi | yes, om would be a marvellous successor to nameless as well. continues down a sort of zen naming path | 19:40 |
fungi | though if that's a theme, then we just blew our opportunity to have a release named mu | 19:41 |
anteaya | ha ha ha | 19:41 |
pleia2 | btw, the civs polls landed in my spam box (gmail) | 19:42 |
pleia2 | was there an announcement elsewhere that the polls had gone out? (I couldn't find one) | 19:42 |
AJaeger | I didn't see one. mordred, did you send one out? I was also surprised to see these... | 19:42 |
fungi | my spamassassin was pretty certain they weren't even close to being spam | 19:42 |
anteaya | I was pleasantly surprised | 19:43 |
fungi | AJaeger: the messages should be From: "Monty Taylor (CIVS poll supervisor)" <andru@cs.cornell.edu> | 19:43 |
AJaeger | fungi, I have it... | 19:43 |
AJaeger | I didn't see an announcement | 19:44 |
fungi | AJaeger: oh! | 19:44 |
fungi | right, i don't think there was an announcement to the ml that they were going out. if you haven't been following the tc meetings then you probably wouldn't have known | 19:44 |
fungi | presumably he'll send an announcement now that all the batches have gone out | 19:45 |
jeblair | fried chicken sandwich successfully acquired | 19:45 |
fungi | that must be one awesome taco truck | 19:45 |
fungi | bone-in fried chicken on white bread with lots of grease, right? | 19:46 |
anteaya | and mayonnaise? | 19:47 |
anteaya | I think I want to go here when we are in spain next year: https://en.wikipedia.org/wiki/Museum_of_the_Americas_%28Madrid%29 | 19:47 |
fungi | i don't remember harold's chicken shack having mayonnaise, but i probably wasn't paying close attention | 19:47 |
jeblair | fungi: boneless, but now that you describe that sandwich, i kind of want to live where you imagine i live :) | 19:47 |
anteaya | a chicken sandwich with no mayonnaise? | 19:48 |
anteaya | mind you it has been a long time since I had one | 19:48 |
fungi | fries and cole slaw yes though | 19:48 |
anteaya | maybe that is the way they are done now | 19:48 |
fungi | jeblair: chicago, south side | 19:48 |
anteaya | fungi: whew | 19:48 |
jeblair | the one i have is boneless, buttermilk, slaw (sans-mayo), bun | 19:48 |
pleia2 | I'd love to see what kinds of bugs we find by calling a release Null | 19:49 |
anteaya | pleia2: ah aha ha | 19:49 |
fungi | pleia2: same, that was my second pick | 19:49 |
jeblair | pleia2: or "None" | 19:49 |
anteaya | NaN? | 19:49 |
fungi | it's too bad none wasn't on the list of options | 19:49 |
fungi | mmm... naan | 19:49 |
jeblair | we've made fungi hungry | 19:49 |
anteaya | how's that paint coming along? | 19:50 |
jeblair | i think i can eat this sandwich before it will be done | 19:50 |
fungi | 99% (857/858), 98% (253194/256443) | 19:50 |
fungi | you can almost certainly, yes | 19:50 |
anteaya | progress obviously | 19:51 |
zaro | yeah, take your time | 19:51 |
mordred | I went and got sushi and came back | 19:53 |
mordred | it was tasty | 19:53 |
anteaya | yum | 19:53 |
anteaya | did we break civs yesterday? | 19:53 |
anteaya | did you get an email from the maintainer? | 19:54 |
*** dims has joined #openstack-infra-incident | 19:55 | |
mordred | I did not get an email from the maintainer | 19:55 |
mordred | and people with broken links yesterday seem to find that they work today | 19:55 |
anteaya | okay, that's good | 19:56 |
jeblair | aww better luck next time | 19:56 |
anteaya | mine worked | 19:56 |
fungi | yes, i was unable to get to civs for the n naming poll last night, but the same url worked for me today (and also the o release naming poll worked fine today) | 19:57 |
* mordred is pleased that civs has upgraded since we did this last | 19:58 | |
mordred | last time I had to batch the names | 19:58 |
mordred | this time I uploaded all of them at once | 19:58 |
mordred | much less harder | 19:58 |
fungi | oh, really?!? | 19:58 |
fungi | no more splitting at or under 1000 addresses? | 19:58 |
fungi | ooh, 99% (857/858), 99% (254008/256443) | 19:59 |
fungi | almost, almost, almost there now! | 19:59 |
anteaya | yay | 19:59 |
mordred | fungi: yah. it's amaze now | 19:59 |
anteaya | and yay again for no more splitting at 1000 addresses | 19:59 |
_david_ | Guys, we can use inline edit feature now? Unbelievable ;-) | 20:01 |
fungi | well, we did use inline edit for commit messages in 2.8. but yeah this will be fancy x100 | 20:02 |
zaro | _david_: lets not get ahead of ourselves, not done yet | 20:02 |
fungi | heh | 20:02 |
fungi | touché | 20:03 |
_david_ | zaro, all we have to do is start gerrit, done | 20:03 |
dims | ooh shiny :) | 20:03 |
jeblair | mmm. tasty. | 20:03 |
_david_ | jeblair, This called just in time ready ;-) | 20:04 |
zaro | _david_: we start gerrit and see if we can stay on new gerrit | 20:04 |
_david_ | zaro, ,-) | 20:04 |
_david_ | zaro, Last time that "and see if we can stay on new gerrit" took two days? | 20:04 |
_david_ | time enough to use inline edit feature... | 20:05 |
AJaeger | So, instead of giving -1 for a typo, you can fix it directly in place ;9 | 20:05 |
anteaya | well we timed this one to be mid-week | 20:05 |
anteaya | so we can test under load right away | 20:05 |
fungi | yeah, last time it was 2 days mainly because we did it on saturday and so didn't know how bad it really was until monday | 20:06 |
fungi | hopefully our entire community (at least americas still around for a few hours and apac waking up) will bum rush it | 20:07 |
mordred | yah | 20:07 |
fungi | and it's finished! | 20:07 |
mordred | WOOT | 20:07 |
anteaya | yay! | 20:07 |
AJaeger | Yeah! | 20:07 |
mordred | I'm sure it'll get immediately slammed | 20:07 |
fungi | Reindexed 254762 changes in 6817.5s (37.4/s) | 20:07 |
fungi | so, do we want to apply puppet again just to be sure there weren't other things it skipped when it aborted the reindex exec? | 20:08 |
jeblair | (except at the end when it was like 2/s) | 20:08 |
jeblair | fungi: i think that should be safe and sounds like a good idea... | 20:08 |
fungi | i'll do that now | 20:08 |
_david_ | fungi, Puppet would repeat init? | 20:09 |
fungi | it may also try to start gerrit once it finishes | 20:09 |
mordred | it may | 20:09 |
fungi | _david_: it shouldn't no | 20:09 |
fungi | and it didn't repeat it | 20:09 |
mordred | infra-root: got a message from bluebox about replacing our CPUs | 20:09 |
_david_ | One dumb question: Can someone check, that installed plugins got updated? | 20:09 |
jeblair | i think it mostly reverted gerrit's rewriting of the config files... | 20:10 |
mordred | jeblair: my favorite feature | 20:10 |
jeblair | fungi: (maybe run once more to make sure it's steady-state?) | 20:10 |
_david_ | We shot ourselves in the foot last time (our admins) by leaving outdated plugins in place | 20:10 |
anteaya | mordred: can you put that bit in the -infra channel? | 20:10 |
anteaya | mordred: if I try to search for it I won't think to search this channel log | 20:11 |
jeblair | yeah, i think this channel is back to serious mode :) | 20:11 |
mordred | anteaya: nod | 20:11 |
fungi | _david_: the last modified dates on the plugins in /home/gerrit2/review_site/plugins/ are dated december 11, but presumably that's the last modified time they have in the archive | 20:11 |
anteaya | thanks | 20:11 |
nibalizer | mordred: the suspense is killing me | 20:11 |
nibalizer | what did it say | 20:11 |
_david_ | fungi, sounds good then | 20:11 |
zaro | dec 11 date sounds correct | 20:12 |
fungi | unfortunately my screen buffer doesn't go far enough back to see what the dates on them were prior to starting this | 20:12 |
mordred | nibalizer: I have continued the conversation in the other channel | 20:12 |
zaro | or at least that matches review-dev.o.o which has been updated with same gerrit ver | 20:12 |
fungi | they're all Dec 11 22:17 (utc) except javamelody.jar which i downloaded manually | 20:12 |
jeblair | we could md5sum them | 20:12 |
jeblair | javamelody differs from review-dev | 20:13 |
jeblair | others look same | 20:14 |
fungi | added to the bottom of the main etherpad | 20:14 |
fungi | javamelody was obtained from http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-3fefa35.jar | 20:14 |
fungi | double-checking its md5sum now | 20:14 |
jeblair | it matches what i wget | 20:15 |
jeblair | perhaps it's just newer than the one on review-dev? | 20:15 |
zaro | we had to manually update javamelody because the puppet for non-core plugins does not work. | 20:15 |
jeblair | zaro: yeah, but why is it different than review-dev? | 20:15 |
fungi | yeah, md5sum locally on static.o.o has 4f406e94158d9267e2c36a3a0dfcd243 | 20:15 |
zaro | correct one should be http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-3fefa35.jar | 20:17 |
jeblair | review-dev has de58dfb0f1a71ae4de41eda649cd029e | 20:17 |
zaro | might not want to believe the one on review-dev.o.o. i think i was testing. | 20:17 |
fungi | i don't find any on tarballs.o.o with that checksum | 20:17 |
fungi | hand-built jar maybe? | 20:18 |
*** ociuhandu has quit IRC | 20:18 | |
*** dims has quit IRC | 20:18 | |
zaro | the one currently on review-dev.o.o is ver 2.8 | 20:18 |
zaro | javamelody-e00d5af.jar | 20:18 |
zaro | i mean http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-e00d5af.jar | 20:19 |
*** dims has joined #openstack-infra-incident | 20:19 | |
jeblair | it didn't match that md5sum either | 20:19 |
zaro | so yeah, i was testing the upgrade from 2.8->2.11 | 20:19 |
fungi | which has md5sum cbb5855865b4983ab5604448018f9e85 not de58dfb0f1a71ae4de41eda649cd029e | 20:19 |
fungi | as i said, i don't see any javamelody plugin jar on tarballs.o.o which has an md5sum matching the one on review-dev | 20:20 |
jeblair | zaro: do you want to test http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-3fefa35.jar on review-dev real quick? | 20:20 |
fungi | so either it came from somewhere else, or was modified after retrieval, or was somehow deleted from our tarballs site | 20:20 |
zaro | it's in review-dev.o.o's /home/gerrit-plugins/javamelody-e00d5af.jar md5sum is de58dfb0f1a71ae4de41eda649cd029e | 20:20 |
zaro | i've already tested, both versions work fine with gerrit 2.11 actually | 20:21 |
zaro | i can do it now as well | 20:21 |
jeblair | zaro: are you sure? because the version installed on review-dev is not one from tarballs.o.o. if you say you tested it a while ago and then replaced it with a hand-built one, i'll believe you. but if you think the one there now is something from tarballs.o.o, then i think we should re-test. | 20:22 |
pleia2 | now that everyone else is back, I have a lunch meeting to run off to | 20:22 |
anteaya | pleia2: enjoy | 20:22 |
fungi | yeah, the md5sum of /home/gerrit-plugins/javamelody-e00d5af.jar on review-dev doesn't match the md5sum of http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-e00d5af.jar | 20:22 |
clarkb | I need to head to optometrist shortly as well | 20:22 |
anteaya | clarkb: hope it goes well | 20:23 |
fungi | good luck pleia2, clarkb! | 20:23 |
fungi | thanks for the help | 20:23 |
*** mihgen has joined #openstack-infra-incident | 20:23 | |
zaro | testing now | 20:24 |
zaro | https://review-dev.openstack.org/#/admin/plugins/ | 20:26 |
zaro | https://review-dev.openstack.org/monitoring | 20:26 |
_david_ | As the very last option we could rebuild the plugin, right? | 20:26 |
_david_ | Capability viewPlugins is required to access this resource | 20:26 |
zaro | that's with javamelody from /home/gerrit-plugins/javamelody-3fefa35.jar | 20:26 |
zaro | _david_: yeah, you cannot access only admins, sorry | 20:27 |
jeblair | 4e161fcf8ff2525ff373f9c469f711c3 /home/gerrit2/review_site/plugins/javamelody.jar | 20:27 |
fungi | javamelody has some features not safe for public access | 20:27 |
_david_ | n.p. | 20:27 |
clarkb | like thread killing | 20:27 |
_david_ | one day i will join this team i guess, to get all the credentials ;-) | 20:28 |
fungi | we want 4f406e94158d9267e2c36a3a0dfcd243 | 20:28 |
fungi | whatever's at /home/gerrit2/review_site/plugins/javamelody.jar is not the one you had us download from http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-3fefa35.jar | 20:28 |
_david_ | fungi, But where did this version come from? Puppet? | 20:29 |
fungi | _david_: i don't think so, no | 20:29 |
zaro | what? ohh, i think it's because the build is on periodic pipeline | 20:29 |
zaro | so new build everyday | 20:29 |
_david_ | so it's even more fresh? | 20:30 |
fungi | that would make sense. -rw-rw-r-- 1 jenkins jenkins 1910323 Dec 16 06:10 javamelody-3fefa35.jar | 20:30 |
fungi | that's what we have on the tarballs server | 20:30 |
zaro | fungi: but its the same, to be safe you can copy it from review.o.o's /home/gerrit2/review_site/plugins/ folder | 20:30 |
fungi | so updated ~14 hours ago | 20:31 |
fungi | zaro: i assumed you were going to redownload http://tarballs.openstack.org/ci/gerrit/plugins/javamelody/javamelody-3fefa35.jar to test on review-dev | 20:31 |
jeblair | zaro: why don't you just do ^ | 20:31 |
zaro | ok, will do that now. | 20:32 |
fungi | so that we're sure we're testing the same one we're about to roll in production | 20:32 |
fungi | we've definitely seen subtle build failure false positives in the past which sneak by | 20:32 |
fungi | so good to confirm the actual build we're about to use, not just the same main commit | 20:32 |
_david_ | fungi, That surprises me. Gerrit Code Review is using the best build tool chain in the wild | 20:33 |
fungi | _david_: i know! at least that's what the buck developers say anyway | 20:34 |
_david_ | fungi, ;-) | 20:34 |
zaro | ok. md5 for one on review-dev is 4f406e94158d9267e2c36a3a0dfcd243 | 20:34 |
fungi | looks right now. thanks zaro! | 20:34 |
zaro | seems to work as well | 20:34 |
fungi | probably being overly pedantic, but our paranoia is rooted in prior experience | 20:35 |
jeblair | ready to start gerrit now? | 20:35 |
fungi | i'll puppet one last time as jeblair suggested to confirm it's a no-op | 20:35 |
fungi | yep, other than vcsrepo which always wants to transition even when there's no update | 20:36 |
_david_ | "<fungi> probably being overly pedantic, but our paranoia is rooted in prior experience" ... that was caused by severe JGit synchronization bug, and not some nuances of Javamelody plugin | 20:36 |
jeblair | we have had more than one prior experience :) | 20:36 |
fungi | _david_: i meant prior experiences from build failures (long before that issue) | 20:36 |
_david_ | *lol* | 20:36 |
fungi | yeah, what jeblair said | 20:36 |
fungi | anyway, looks like we're all set | 20:37 |
fungi | anything else before i start gerrit? | 20:37 |
fungi | we're still ~20 minutes ahead of schedule here | 20:37 |
_david_ | database check? | 20:37 |
jeblair | fungi: all clear here | 20:37 |
*** jswarren has joined #openstack-infra-incident | 20:37 | |
fungi | admittedly we've burned most of our scotty factor | 20:37 |
mordred | fungi: maybe do a rain dance? | 20:37 |
_david_ | jeblair still schema at 107 in the db? | 20:37 |
olaph | or the rob dance? | 20:37 |
fungi | _david_: still 107, yep | 20:38 |
fungi | just confirmed with a new query | 20:38 |
fungi | okay, here goes nothing | 20:38 |
_david_ | ok, let's start gerrit then? | 20:38 |
fungi | (and here comes 2.11) | 20:38 |
jeblair | (wow, nodepool has _only_ ready nodes for like the first time ever) | 20:39 |
anteaya | ha ha ha | 20:39 |
_david_ | can we check the logs after starting it? | 20:39 |
fungi | claims gerrit is started now | 20:39 |
fungi | tailing logs | 20:39 |
_david_ | logs? | 20:39 |
anteaya | <- 503 | 20:39 |
jeblair | i will restart apache to clear the 503 | 20:39 |
jeblair | actually | 20:39 |
_david_ | https://review.openstack.org/ => 503 | 20:39 |
jeblair | i have stopped apache | 20:40 |
jeblair | because i'm watching fungi look at errors | 20:40 |
anteaya | yup | 20:40 |
_david_ | fungi, what gerrit logs are saying? | 20:40 |
fungi | unfortunately we have so many connection errors raising exceptions in the error log it's hard to know | 20:40 |
fungi | needle in haystack | 20:40 |
jeblair | :( | 20:40 |
fungi | org.apache.sshd.common.SshException: Received 96 on unknown channel 48 | 20:41 |
mordred | that's | 20:41 |
jeblair | https://issues.apache.org/jira/browse/SSHD-535 | 20:41 |
jeblair | so, er, a *new* kind of harmless connection error spam? | 20:41 |
fungi | [2015-12-16 20:40:27,421] WARN com.google.gerrit.sshd.GerritServerSession : Exception caught | 20:41 |
fungi | java.io.IOException: Connection reset by peer | 20:41 |
_david_ | but gerrit is up and running? | 20:42 |
jeblair | yeah, and we have many ssh connections | 20:42 |
fungi | aside from a handful of those, nothing above info level | 20:42 |
jeblair | so i'll start apache now? | 20:42 |
_david_ | re-start apache then? | 20:43 |
fungi | yep, i think we're safe | 20:43 |
jeblair | up | 20:43 |
anteaya | renders | 20:43 |
_david_ | Powered by Gerrit Code Review (2.11.4-11-ga14450f) | 20:43 |
_david_ | YAY | 20:43 |
olaph | yay! | 20:43 |
fungi | powerful! | 20:43 |
jeblair | i have logged in | 20:43 |
anteaya | logged in | 20:43 |
_david_ | congrats! | 20:43 |
_david_ | Who will do the first inline edit change/patch set??? | 20:44 |
fungi | notmorgan's apache hackarounds seem to have definitely done the trick for the openid redirect bug | 20:44 |
jeblair | my gertty has synced | 20:44 |
_david_ | Before we would need to think about the downgrade? | 20:44 |
zaro | yep, redirect works for me | 20:45 |
AJaeger | logged in - but my default query does not work ;( | 20:45 |
zaro | gitweb links work | 20:45 |
jeblair | AJaeger: what query? | 20:45 |
AJaeger | https://review.openstack.org/#/q/is:watched status:open label:Code-Review=0,self,n,z fails | 20:45 |
fungi | AJaeger: try + instead of space | 20:45 |
jeblair | AJaeger: https://review.openstack.org/#/q/is:watched+status:open+label:Code-Review%253D0%252Cself | 20:45 |
fungi | AJaeger: on, the ,n,z is likely no longer needed | 20:46 |
jeblair | i got that by putting the query into the search box and then copying the resulting url | 20:46 |
AJaeger | entered into the search box and got what jeblair has - thanks | 20:46 |
AJaeger | jeblair: you're too fast for me;) | 20:46 |
jeblair | shall i start zuul? | 20:46 |
_david_ | Wau ! | 20:46 |
_david_ | Tested inline edit: https://review.openstack.org/#/c/248975/1/tests/base.py,edit | 20:47 |
fungi | yeah, some of the exact query urls that 2.8 accepted may not work in 2.11 | 20:47 |
_david_ | Just created very first edit | 20:47 |
fungi | ooh | 20:47 |
AJaeger | And I have already the first email in my INBOX from somebody that +2A a change | 20:47 |
_david_ | it worked | 20:47 |
_david_ | Publish announcement to IRC and dev ML, that upgrade was successful and outage is over? | 20:48 |
fungi | yeah, i'm not seeing any obvious broken. probably time to fire zuul back up and watch for more subtle breakage? | 20:48 |
zaro | has zuul started? | 20:48 |
jeblair | starting zuul now | 20:48 |
fungi | _david_: pretty much, and ask people to report any issues they see to us at quickly as possible | 20:49 |
fungi | er, as quickly | 20:49 |
fungi | [2015-12-16 20:49:13,350] WARN com.google.gerrit.server.patch.IntraLineLoader : 5000 ms timeout reached for IntraLineDiff in project openstack/swift-specs on commit dc35f427c5ab172dcfd5a6f62026b4d0886aa2fd for path specs/in_progress/container_sharding.rst comparing 6dce029237c3fdc0990e54b4776f3fe9528bf0f1..12baf6bc5cddd3b6999f28119b568c4cf03bb965 | 20:49 |
_david_ | fungi, something wrong in logs? Can we try tail again? | 20:49 |
_david_ | ^^^ Thanks ;-) | 20:49 |
jeblair | re-enqueuing changes | 20:50 |
fungi | that's no longer a permanent failure, right? (doesn't cause the repo to get marked corrupt any longer) | 20:50 |
_david_ | fungi, Yeah, that was fixed, but still, why don't we increase timeout form 5 sec to say 15? | 20:50 |
_david_ | s/form/from | 20:50 |
fungi | could probably stand some tuning, sure | 20:51 |
mordred | oh boo. clicking on a project name in the list of changes now no longer takes you to open changes for that project, just changes for that project | 20:51 |
jeblair | 5 seconds is a really long time to generate a diff. | 20:51 |
_david_ | mordred, yeah, that was changed | 20:51 |
jeblair | https://review.openstack.org/#/c/218738/ | 20:51 |
jeblair | i think that's the change from the error above | 20:51 |
_david_ | Is it big? | 20:52 |
jeblair | 46 lines total | 20:52 |
mordred | jeblair: it's snappy now | 20:52 |
mordred | it does have 4 binary iages in it | 20:52 |
jeblair | wait | 20:52 |
mordred | images | 20:52 |
_david_ | Well, we don't care, it's switching from Myers diff to Histogram diff in this case. | 20:52 |
jeblair | the filename links are to the _edit_ page? | 20:52 |
mordred | jeblair: they take me to diff view | 20:53 |
fungi | it happened right when zuul started reenqueuing changes, so might have slammed gerrit with queries a little or might have to do with the change being "bigger" than it seems because of the png images | 20:53 |
jeblair | oh weird | 20:54 |
jeblair | that was because of the link that _david_ shared earlier... | 20:54 |
fungi | it was carting the ,edit around on the url? | 20:54 |
jeblair | apparently somehow i ended up with my screen in "edit files" mode | 20:54 |
jeblair | fungi: yeah that | 20:54 |
mordred | weird | 20:55 |
_david_ | jeblair, Yes. That happens, when you push Edit button on the change screen, and then click on a file | 20:55 |
mordred | I cannot reproduce that behavior | 20:55 |
mordred | of going to that link then going to a different change and seeing it be in edit mode | 20:55 |
jeblair | status ok Gerrit has been upgraded to 2.11. Please report any issues in #openstack-infra as soon as possible. | 20:55 |
jeblair | mordred: i don't know how i got there, sorry. | 20:55 |
mordred | jeblair: darn. I'd a fascinating thing | 20:56 |
mordred | s/I'd/it's/ | 20:56 |
anteaya | jeblair: looks good to me | 20:57 |
fungi | jeblair: wfm | 20:57 |
jeblair | #status ok Gerrit has been upgraded to 2.11. Please report any issues in #openstack-infra as soon as possible. | 20:57 |
openstackstatus | jeblair: sending ok | 20:57 |
*** mihgen has left #openstack-infra-incident | 20:58 | |
jeblair | this timing is way too close for comfort. | 20:58 |
_david_ | 4 minutes before the deadline | 20:58 |
zaro | peeps will just think we got better at estimating :) | 20:58 |
*** ChanServ changes topic to "Gerrit upgrade starting at 17:00 UTC | Etherpad https://etherpad.openstack.org/p/gerrit-2.11-upgrade" | 21:00 | |
-openstackstatus- NOTICE: Gerrit has been upgraded to 2.11. Please report any issues in #openstack-infra as soon as possible. | 21:00 | |
_david_ | Joke time: | 21:00 |
_david_ | LibreOffice admin told me: they only think about Gerrit upgrade to 2.11, when a) OpenStack upgrade went smoothly b) 48 hours gone after the upgrade | 21:01 |
fungi | hah! | 21:01 |
fungi | can't say i blame them | 21:01 |
_david_ | ;-))) | 21:02 |
zaro | toggleci button is working | 21:02 |
openstackstatus | jeblair: finished sending ok | 21:03 |
zaro | _david_: why they pick on us specifically? | 21:03 |
_david_ | Well, because Wikimedia guys are still at 2.8 or something? | 21:03 |
_david_ | zaro, And because you guys have downgrade strategy ;-) | 21:03 |
*** AJaeger has quit IRC | 21:04 | |
fungi | less of a strategy and more of a stragedy | 21:04 |
fungi | as in having to downgrade again would be a strategic tragedy | 21:05 |
fungi | okay, who's tackling the maintenance completion e-mail to the ml? | 21:05 |
_david_ | fungi, Have you noticed my comment in downgrade section? | 21:06 |
fungi | heh, yes | 21:06 |
pleia2 | woo | 21:09 |
*** alop has left #openstack-infra-incident | 21:18 | |
*** _david_ has quit IRC | 21:27 | |
*** dims has quit IRC | 23:01 | |
*** dims has joined #openstack-infra-incident | 23:24 |
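
Editor's note on the javamelody checksum discussion (20:12 to 20:34): the manual md5sum comparison between the installed jar and the one published on tarballs.openstack.org could be scripted roughly as below. This is only a sketch, not something anyone in the channel ran; the function names `md5_of_file` and `matches_expected` are made up for illustration, and the path and checksum in the usage comment are simply the values quoted in the log.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 against an expected hex string (hypothetical helper)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage sketch, with the path and checksum taken from the log above:
# matches_expected("/home/gerrit2/review_site/plugins/javamelody.jar",
#                  "4f406e94158d9267e2c36a3a0dfcd243")
```

MD5 is fine here for spotting an accidental mismatch between builds, which is what the channel was doing; it is not a defence against deliberate tampering.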