Update mapping and reindex Elasticsearch indices
This is now part of our official upgrading docs: https://www.archivematica.org/en/docs/latest/admin-manual/installation-setup/upgrading/upgrading/.
---
Backup Elasticsearch data
The easiest way to back up the Elasticsearch data is to copy the data directory:
sudo service elasticsearch stop
tar cvfz var_lib_elasticsearch.tgz /var/lib/elasticsearch
sudo service elasticsearch start
List indices to resize the Elasticsearch heap size when needed
Use the following command to list indices:
curl -s -X GET 'http://localhost:9200/_cat/indices/%2A?v=&s=index:desc'
The output should show something like this:
root@archivematica-test-server:~# curl -s -X GET 'http://localhost:9200/_cat/indices/%2A?v=&s=index:desc'
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   transfers     lYqkYjwZRy2XG8CP_3S3PQ   5   1          0            0      1.2kb          1.2kb
yellow open   transferfiles K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
yellow open   aips          yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
yellow open   aipfiles      TVrrX8jkRhWWxGfvK_M6zg   5   1      11987            0      2.9gb          2.9gb
Read the Elasticsearch heap size from /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS):
root@ny-gclibrary-test-release-1:~# grep ES_JAVA_OPTS= /etc/default/elasticsearch
#ES_JAVA_OPTS=
ES_JAVA_OPTS="-Xms2g -Xmx2g"
The heap size in this example is 2 GB.
Ensure your Elasticsearch heap size is greater than the largest store.size in the indices list. In our example the largest index (aipfiles) is 2.9gb, so the heap should be at least 3 GB.
- Edit /etc/default/elasticsearch or /etc/sysconfig/elasticsearch.
- Change ES_JAVA_OPTS to a bigger value, in our example ES_JAVA_OPTS="-Xms3g -Xmx3g".
- Restart the Elasticsearch service for the changes to take effect (sudo service elasticsearch restart).
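The comparison between heap size and index size can be scripted. The following is a minimal sketch, not part of the original procedure: the helper names (heap_to_bytes, max_store_bytes, xmx_from_opts) are hypothetical, and it assumes the cat indices API is called with h=index,store.size and bytes=b so that sizes are reported as plain byte counts.

```shell
#!/bin/bash
# Sketch: helpers to compare the largest index with the configured heap.
# All function names are illustrative assumptions.

# Convert a JVM size such as "2g", "512m" or "1024k" to bytes.
heap_to_bytes() {
  local value="$1"
  local number="${value%[kKmMgG]}"
  case "$value" in
    *[gG]) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *[mM]) echo $(( number * 1024 * 1024 )) ;;
    *[kK]) echo $(( number * 1024 )) ;;
    *)     echo "$number" ;;
  esac
}

# Read "index bytes" lines on stdin and print the largest byte count.
max_store_bytes() {
  awk '$2 > max { max = $2 } END { printf "%.0f\n", max }'
}

# Read a config file on stdin and print the -Xmx value of the last
# uncommented ES_JAVA_OPTS line (for example "2g").
xmx_from_opts() {
  sed -n 's/^ES_JAVA_OPTS=.*-Xmx\([0-9]*[kKmMgG]\{0,1\}\).*/\1/p' | tail -n 1
}

# Example wiring (assumes Elasticsearch on localhost:9200 and the Ubuntu path):
#   largest=$(curl -s 'http://localhost:9200/_cat/indices?h=index,store.size&bytes=b' | max_store_bytes)
#   heap=$(heap_to_bytes "$(xmx_from_opts < /etc/default/elasticsearch)")
#   [ "$heap" -gt "$largest" ] || echo "Heap is not larger than the largest index"
```

On CentOS, the same wiring would read /etc/sysconfig/elasticsearch instead.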
Run script to reindex and use new mappings
Use the following script:
#!/bin/bash

es_url="http://localhost:9200"
index_list='aips aipfiles transfers transferfiles'

echo -e "\nIndex list before reindexing:\n"
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"

# Clone indices with _reindex API call:
for index in $index_list; do
  echo "Reindex ${index} in ${index}_new..."
  curl -s -X POST \
    "${es_url}/_reindex" \
    -H 'Content-Type: application/json' \
    -d '{
      "source": {
        "index": "'"${index}"'"
      },
      "dest": {
        "index": "'"${index}_new"'"
      }
    }' > /dev/null
done

echo -e "\n\n"
echo -e "Index list after tmp indices creation\n"
indices_output=$(curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc")
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"

# Delete old indices
for index in $index_list; do
  echo "Deleting ${index}..."
  curl -s -X DELETE "${es_url}/${index}" > /dev/null
done

# Restart archivematica-dashboard to create indices with new mappings
echo -e "\nRestarting archivematica-dashboard"
sudo service archivematica-dashboard restart

# Wait 30 seconds
echo "Wait 30 seconds to ensure dashboard has created the empty indices with new mapping"
sleep 30
echo -e "\n"

# When an index has no docs, the reindex doesn't create the new index
# (typically the transferfiles index), so check that the new index
# has been created before reindexing.
# Reindex from *_new indices:
for index in $index_list; do
  if echo "$indices_output" | grep "${index}_new" > /dev/null; then
    echo "Indexing ${index} using ${index}_new ..."
    curl -s -X POST \
      "${es_url}/_reindex" \
      -H 'Content-Type: application/json' \
      -d '{
        "source": {
          "index": "'"${index}_new"'"
        },
        "dest": {
          "index": "'"${index}"'"
        }
      }' > /dev/null
  fi
done
echo -e "\n"

# Delete tmp indices
for index in $index_list; do
  if echo "$indices_output" | grep "${index}_new" > /dev/null; then
    echo "Deleting ${index}_new..."
    curl -s -X DELETE "${es_url}/${index}_new" > /dev/null
  fi
done

echo -e "\n\nReindexing done:\n"
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"
In our example it takes about 11 minutes, and this is the output:
root@archivematica-test-server:~# time ./script_reindex_new_map.sh

Index list before reindexing:

health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   transfers     lYqkYjwZRy2XG8CP_3S3PQ   5   1          3            0     11.6kb         11.6kb
yellow open   transferfiles K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
yellow open   aips          yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
yellow open   aipfiles      TVrrX8jkRhWWxGfvK_M6zg   5   1      12905            0      2.6gb          2.6gb

Reindex aips in aips_new...
Reindex aipfiles in aipfiles_new...
Reindex transfers in transfers_new...
Reindex transferfiles in transferfiles_new...

Index list after tmp indices creation

health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   transfers_new     gdFevH8yRdiNTdrPcfo8Lg   5   1          0            0       460b           460b
yellow open   transfers         lYqkYjwZRy2XG8CP_3S3PQ   5   1          3            0     11.6kb         11.6kb
yellow open   transferfiles     K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
yellow open   aips_new          uJ-ehaYLTfe_1lOSErfu3Q   5   1         17            0     96.8mb         96.8mb
yellow open   aips              yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
yellow open   aipfiles_new      00Xxu7v2QvWsq92gM247xQ   5   1      12905            0      3.1gb          3.1gb
yellow open   aipfiles          TVrrX8jkRhWWxGfvK_M6zg   5   1      12905            0      2.6gb          2.6gb

Deleting aips...
Deleting aipfiles...
Deleting transfers...
Deleting transferfiles...

Restarting archivematica-dashboard
Wait 30 seconds to ensure dashboard has created the empty indices with new mapping

Indexing aips using aips_new ...
Indexing aipfiles using aipfiles_new ...
Indexing transfers using transfers_new ...

Deleting aips_new...
Deleting aipfiles_new...
Deleting transfers_new...

Reindexing done:

health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   transfers     FC7aSVPmSmmCc_LTv1AQRA   5   1          3            0      1.2kb          1.2kb
yellow open   transferfiles 5JMAft3FQwmosZQFi7eJNw   5   1          0            0      1.2kb          1.2kb
yellow open   aips          EtwXG3-4SO2Px-4QMRufXA   5   1         17            0    102.1mb        102.1mb
yellow open   aipfiles      -PFuzslgTeWJ4CWny8VZoA   5   1      12905            0        3gb            3gb

real    10m47.114s
user    0m0.068s
sys     0m0.032s
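The script waits a fixed 30 seconds for the dashboard to recreate the indices. As a possible alternative, the wait could poll until each index exists. This is only a sketch: wait_until and index_exists are hypothetical helpers, and it assumes that a HEAD request on the index URL returns HTTP 200 once the index exists.

```shell
#!/bin/bash
# Sketch: poll for a condition instead of sleeping for a fixed time.
# Function names are illustrative assumptions.

# Run a command repeatedly until it succeeds or the timeout (seconds) expires.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Succeed when a HEAD request on ES_URL/INDEX returns HTTP 200.
# Usage: index_exists ES_URL INDEX
index_exists() {
  local code
  code=$(curl -s -o /dev/null -I -w '%{http_code}' "$1/$2")
  [ "$code" = "200" ]
}

# Example wiring, replacing "sleep 30" in the script:
#   for index in $index_list; do
#     wait_until 60 2 index_exists "$es_url" "$index" || echo "Index ${index} was not created"
#   done
```

The 60-second timeout and 2-second interval are arbitrary values chosen for illustration.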
NOTE: The script can fail if the Java heap runs out of memory (check the Elasticsearch logs, typically under /var/log/elasticsearch/). In that case the indices will be empty, so restore /var/lib/elasticsearch from the backup, increase the Elasticsearch Java heap size and try again.
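A restore from the archive created in the backup step might look like the sketch below. It assumes the archive was made with GNU tar, which strips the leading "/" from member names, so the archive is extracted relative to a root directory. The function name extract_backup and the ".failed" directory name are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: restore the Elasticsearch data directory from the backup archive.

# Extract a backup archive under the given root directory (default: /).
# Members are stored as var/lib/elasticsearch/..., so extracting under "/"
# puts them back in /var/lib/elasticsearch.
# Usage: extract_backup ARCHIVE [ROOT]
extract_backup() {
  local archive="$1"
  local root="${2:-/}"
  tar xzf "$archive" -C "$root"
}

# Typical sequence, run as root (commented out so that loading this file
# does not touch a running system):
#   service elasticsearch stop
#   mv /var/lib/elasticsearch /var/lib/elasticsearch.failed
#   extract_backup var_lib_elasticsearch.tgz /
#   service elasticsearch start
```

Moving the failed data directory aside rather than deleting it keeps a copy available until the restored indices have been checked.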
The script uses the Elasticsearch API and performs the following actions:
- Reindex the transfers, transferfiles, aips and aipfiles indices into new temporary indices
- Delete original indices
- Restart archivematica-dashboard service to create empty indices with new mappings
- Reindex from temporary indices
- Delete temporary indices
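The script deletes the original indices without checking that the temporary copies are complete. A check of document counts could be added before the delete step. The sketch below is an assumption, not part of the original script: find_count_mismatches is a hypothetical helper that reads "index docs.count" lines, such as those returned by the cat indices API with h=index,docs.count.

```shell
#!/bin/bash
# Sketch: verify that each *_new index holds as many documents as its source.

# Read "index docs.count" lines on stdin. For every index name given as an
# argument, print the name if its *_new copy has a different document count.
# Returns 1 if any mismatch was found, 0 otherwise.
# An empty source index without a copy is skipped, since reindexing an
# empty index does not create the destination.
find_count_mismatches() {
  awk -v list="$*" '
    { count[$1] = $2 }
    END {
      n = split(list, names, " ")
      bad = 0
      for (i = 1; i <= n; i++) {
        src = names[i]
        tmp = src "_new"
        if (!(tmp in count) && count[src] == 0) continue
        if (count[src] != count[tmp]) { print src; bad = 1 }
      }
      exit bad
    }'
}

# Example wiring (refresh first so that recent documents are counted):
#   curl -s -X POST "http://localhost:9200/_refresh" > /dev/null
#   curl -s "http://localhost:9200/_cat/indices?h=index,docs.count" \
#     | find_count_mismatches aips aipfiles transfers transferfiles \
#     || echo "Counts differ, not deleting the original indices"
```

In the sample run above, transfers_new is listed with 0 documents right after reindexing although the final transfers index holds 3. A refresh before counting, as in the commented example, is intended to avoid that kind of false mismatch.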
Restore Elasticsearch heap size when needed
- If you raised the heap size for the reindex, edit /etc/default/elasticsearch or /etc/sysconfig/elasticsearch again.
- Set ES_JAVA_OPTS back to its previous value.
- Restart the Elasticsearch service for the change to take effect (sudo service elasticsearch restart).