Update ElasticSearch snapshots

From Archivematica
Revision as of 08:29, 27 February 2019 by Scollazo (talk | contribs) (→‎ElasticSearch update)

ElasticSearch update

1. Check that ES is up and running, and note the results:

 curl -X GET "localhost:9200/_cat/indices?v"
 health status index     pri rep docs.count docs.deleted store.size pri.store.size 
 yellow open   aips        5   1      24816            1    175.4mb        175.4mb 
 yellow open   transfers   5   1          0            0       720b           720b 

2. Create an ElasticSearch backup (called a snapshot) using the following commands:

a. Remove and recreate the folder that stores the backup

 sudo rm -rf /var/lib/elasticsearch/backup-repo/
 sudo mkdir -p /var/lib/elasticsearch/backup-repo/
 sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backup-repo/

b. Allow ElasticSearch to write files to the backup folder

 echo 'path.repo: ["/var/lib/elasticsearch/backup-repo"]' |sudo tee -a /etc/elasticsearch/elasticsearch.yml
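Because `tee -a` appends unconditionally, repeating this step adds a second `path.repo` line to the configuration file, which ElasticSearch may reject at startup. A minimal sketch of an idempotent variant follows. The function name `append_line_once` and the `SUDO` variable are hypothetical helpers introduced here for illustration; they are not part of Archivematica or ElasticSearch.

```shell
# Append LINE to FILE only if FILE does not already contain that exact line.
# Set SUDO=sudo when FILE can only be read or written by root.
append_line_once() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if ${SUDO:-} grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" | ${SUDO:-} tee -a "$file" > /dev/null
}

# Usage for this step:
# SUDO=sudo append_line_once /etc/elasticsearch/elasticsearch.yml \
#     'path.repo: ["/var/lib/elasticsearch/backup-repo"]'
```

The same helper could be reused for the `reindex.remote.whitelist` line added later in this procedure.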

c. Restart ElasticSearch and wait for it to start

 sudo service elasticsearch restart
 sleep 60s
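A fixed 60-second sleep may be longer than needed on a fast machine and too short on a slow one. As an alternative, the sketch below polls until a probe command succeeds. `retry_until` is a hypothetical helper name, and the choice of the `_cluster/health` endpoint as the probe is an assumption; any request that only succeeds once ElasticSearch is answering would do.

```shell
# Run a probe command up to MAX_TRIES times, sleeping DELAY seconds after
# each failed attempt. Returns 0 as soon as the probe succeeds, 1 otherwise.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Usage for this step (up to 30 probes, 2 seconds apart):
# retry_until 30 2 curl -fsS "localhost:9200/_cluster/health"
```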

d. Configure the ES backup

 curl -X PUT "localhost:9200/_snapshot/backup-repo" -H 'Content-Type: application/json' -d \
   '{
        "type": "fs",
        "settings": {
            "location": "./",
            "compress": true
        }
    }'

e. Take the actual backup, and copy it to a safe place

 curl -X PUT "localhost:9200/_snapshot/backup-repo/am_indexes_backup?wait_for_completion=true"
 cp -rf /var/lib/elasticsearch/backup-repo ~/elasticsearch-backup # Use sudo if needed
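It can be worth confirming that the snapshot completed before relying on the copy. The sketch below reads the JSON returned by the snapshot API and checks for a `"state": "SUCCESS"` field. `snapshot_succeeded` is a hypothetical helper, and the plain-text match is an assumption made to avoid depending on a JSON parser; it is a rough check, not a full validation of the response.

```shell
# Succeeds only if the JSON on stdin reports a snapshot in state SUCCESS.
snapshot_succeeded() {
    grep -Eq '"state"[[:space:]]*:[[:space:]]*"SUCCESS"'
}

# Usage for this step:
# curl -sS "localhost:9200/_snapshot/backup-repo/am_indexes_backup" \
#     | snapshot_succeeded && echo "snapshot OK"
```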


3. Check which ElasticSearch version is installed

 Redhat/Centos: rpm -qa | grep elasticsearch
 Ubuntu: dpkg -l | grep elasticsearch

4. Download a temporary copy of ElasticSearch (the same version as the one installed)

 wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.5.tar.gz
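To keep the downloaded version in line with the installed one, the URL can be built from the package version rather than typed by hand. This is a sketch: `es_tarball_url` and `ES_VERSION` are hypothetical names, and it assumes the URL pattern shown in this step also holds for other versions. The package version string may also carry a distribution suffix that would need trimming.

```shell
# Build the download URL for a given ElasticSearch version, following the
# URL pattern used in this step.
es_tarball_url() {
    printf 'https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-%s.tar.gz\n' "$1"
}

# Installed package version; pick the line that matches your distribution.
# ES_VERSION=$(dpkg-query -W -f='${Version}' elasticsearch)   # Ubuntu
# ES_VERSION=$(rpm -q --qf '%{VERSION}' elasticsearch)        # Redhat/Centos
# wget "$(es_tarball_url "$ES_VERSION")"
```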


5. Extract the downloaded tar.gz and change into the extracted directory

 tar zxvf elasticsearch-1.7.5.tar.gz
 cd elasticsearch-1.7.5


6. Copy the backup you just created

 mkdir -p data
 cp -rf /var/lib/elasticsearch/backup-repo data # Use sudo if needed

7. Adjust file permissions

 sudo chown -R <your user>:<your group> data

8. Open another shell and launch the temporary ElasticSearch instance

 cd elasticsearch-1.7.5
 ES_JAVA_OPTS="-Xms2g -Xmx2g" ./bin/elasticsearch -p elastic-tmp.pid -Des.http.port=9500 -Des.path.repo=data/backup-repo

9. Tell the temporary ElasticSearch instance about the backup files, and restore them:

 curl -X PUT "localhost:9500/_snapshot/backup-repo" -H 'Content-Type: application/json' -d \
   '{
        "type": "fs",
        "settings": {
            "location": "./",
            "compress": true
        }
    }'
 curl -X POST "localhost:9500/_snapshot/backup-repo/am_indexes_backup/_restore?wait_for_completion=true"

10. Verify that the temporary ElasticSearch service has the same content as the system one:

 curl -X GET "localhost:9500/_cat/indices?v"
 health status index     pri rep docs.count docs.deleted store.size pri.store.size 
 yellow open   aips        5   1      24816            1    175.4mb        175.4mb 
 yellow open   transfers   5   1          0            0       720b           720b 
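Comparing the two listings by eye is error-prone when there are many indexes. The sketch below reduces each `_cat/indices?v` listing to sorted "index docs.count" pairs so the two instances can be compared mechanically. `docs_by_index` and the file names `system.txt` and `temporary.txt` are hypothetical; the column positions are read from the header row rather than assumed.

```shell
# Print "index docs.count" pairs, sorted by index name, from the output of
# `_cat/indices?v` on stdin. Column positions come from the header row.
docs_by_index() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "index") ic = i
                if ($i == "docs.count") dc = i
            }
            next
        }
        NF > 0 { print $ic, $dc }
    ' | sort
}

# Usage for this step:
# curl -sS "localhost:9200/_cat/indices?v" | docs_by_index > system.txt
# curl -sS "localhost:9500/_cat/indices?v" | docs_by_index > temporary.txt
# cmp system.txt temporary.txt && echo "indexes match"
```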


11. Remove the system's ElasticSearch 1.7.5 and its files

 Redhat: sudo yum remove elasticsearch
 Ubuntu: sudo apt-get remove --purge elasticsearch
 sudo mv /var/lib/elasticsearch /var/lib/elasticsearch-1.7.5
 sudo mv /etc/elasticsearch /etc/elasticsearch-1.7.5

12. Upgrade Archivematica and install ElasticSearch 6

 ansible-playbook -i hosts singlenode.yml --tags=elasticsearch,archivematica-src

13. Configure ElasticSearch 6 to allow reindexing from the temporary ElasticSearch instance

 echo 'reindex.remote.whitelist: localhost:9500' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
 sudo service elasticsearch restart

14. Migrate the indexes

 sudo -u archivematica bash -c " \
 set -a -e -x
 source /etc/default/archivematica-dashboard || \
 source /etc/sysconfig/archivematica-dashboard \
       || (echo 'Environment file not found'; exit 1)
 cd /usr/share/archivematica/dashboard
 /usr/share/archivematica/virtualenvs/archivematica-dashboard/bin/python manage.py reindex_from_remote_cluster -t 60 -s 1 http://localhost:9500
 ";

The command should finish with:

 All reindex requests ended successfully!

Otherwise, you may need to adjust the timeout (-t) and size (-s) parameters and run the command again.


15. Verify that the new indexes were created and populated:

 curl -X GET "localhost:9200/_cat/indices?v" 
 health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
 yellow open   transferfiles jlu2d2yZQpWwpKKT1ZzMAA   5   1          0            0      1.1kb          1.1kb
 yellow open   aips          VAvONAByRBuVcWBYhlhUmw   5   1        116            0     66.9mb         66.9mb
 yellow open   transfers     OhfLVBzuRqCNUYRCSSTidg   5   1          0            0      1.1kb          1.1kb
 yellow open   aipfiles      clX1gcqnT9CTdwTm8jeatw   5   1      24700            0     70.6mb         70.6mb

You should now have 4 indexes. The sum of the docs.count values for the aips and aipfiles indexes should equal the docs.count of the aips index in ElasticSearch 1.7.5 (here, 116 + 24700 = 24816). The same holds for the transfers and transferfiles indexes, whose sum should equal the docs.count of the old transfers index.
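The sum can be computed from the listing instead of by hand. The following sketch adds up docs.count for the index names given as arguments; `sum_docs_count` is a hypothetical helper, and it assumes every listed index row has all columns populated, as in the output above.

```shell
# Sum the docs.count column for the named indexes, reading the output of
# `_cat/indices?v` on stdin. Usage: sum_docs_count INDEX [INDEX...]
sum_docs_count() {
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        NR == 1 {
            # Locate the relevant columns from the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "index") ic = i
                if ($i == "docs.count") dc = i
            }
            next
        }
        (($ic) in want) { total += $dc }
        END { print total + 0 }
    '
}

# Usage for this step:
# curl -sS "localhost:9200/_cat/indices?v" | sum_docs_count aips aipfiles
# curl -sS "localhost:9200/_cat/indices?v" | sum_docs_count transfers transferfiles
```

Each result can then be compared with the docs.count noted in step 1 for the aips and transfers indexes.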

16. Stop the temporary ElasticSearch service

 kill $(cat ~/elasticsearch-1.7.5/elastic-tmp.pid)

17. Restart all Archivematica services

 sudo service archivematica-dashboard restart
 sudo service archivematica-mcp-server restart
 sudo service archivematica-mcp-client restart
 sudo service archivematica-storage-service restart
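The four restarts can also be run as a loop that stops at the first failure. This is a sketch: `restart_archivematica` and the `DRY_RUN` variable are hypothetical, and it assumes `sudo` is required, as for the other service commands in this procedure.

```shell
# Restart every Archivematica service named in this step, stopping at the
# first failure. Set DRY_RUN=1 to print the commands instead of running them.
restart_archivematica() {
    for svc in archivematica-dashboard archivematica-mcp-server \
               archivematica-mcp-client archivematica-storage-service; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo service $svc restart"
        else
            sudo service "$svc" restart || return 1
        fi
    done
}

# Usage:
# restart_archivematica
```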