Update mapping and reindex Elasticsearch indices

From Archivematica
Revision as of 08:37, 21 December 2020 by Artefactual

Backup Elasticsearch data

The easiest way to back up the Elasticsearch data is to copy the data directory while the service is stopped:

 sudo service elasticsearch stop
 sudo tar cvfz var_lib_elasticsearch.tgz /var/lib/elasticsearch
 sudo service elasticsearch start
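
Before relying on the archive, it can be worth confirming that it is readable. The following is a minimal sketch, assuming a tar with gzip support; the function name backup_dir and its arguments are illustrative and not part of Archivematica:

```shell
#!/bin/bash
# Hypothetical helper: archive a directory and confirm the archive is readable.
# Usage: backup_dir <source_dir> <archive.tgz>
backup_dir() {
  local src="$1" archive="$2"
  # Create the compressed archive, as in the tar command above.
  tar czf "$archive" "$src" || return 1
  # Listing the archive reads it end to end; a non-zero status means it is damaged.
  tar tzf "$archive" > /dev/null
}

# Example (as root, with Elasticsearch stopped):
# backup_dir /var/lib/elasticsearch var_lib_elasticsearch.tgz && echo "backup OK"
```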

List indices to resize the Elasticsearch heap size when needed

Use the following command to list indices:

 curl -s -X GET 'http://localhost:9200/_cat/indices/%2A?v=&s=index:desc'

The output should show something like this:


 root@archivematica-test-server:~# curl -s -X GET 'http://localhost:9200/_cat/indices/%2A?v=&s=index:desc'
 health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
 yellow open   transfers     lYqkYjwZRy2XG8CP_3S3PQ   5   1          0            0      1.2kb          1.2kb
 yellow open   transferfiles K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
 yellow open   aips          yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
 yellow open   aipfiles      TVrrX8jkRhWWxGfvK_M6zg   5   1      11987            0      2.9gb          2.9gb

Read the Elasticsearch heap size from /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS):

 root@ny-gclibrary-test-release-1:~# grep ES_JAVA_OPTS= /etc/default/elasticsearch 
 #ES_JAVA_OPTS=
 ES_JAVA_OPTS="-Xms2g -Xmx2g"

In this example the heap size is 2 GB.

Make sure the Elasticsearch heap size is greater than the largest store.size in the indices list. In this example the largest index is aipfiles at 2.9gb, so the heap should be raised to more than 3 GB.
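
The comparison can also be scripted. The sketch below is an illustration, not part of the original procedure: it assumes the h= and bytes=b query parameters of the _cat/indices API to get plain byte counts, and the helper names largest_index_bytes and heap_bytes are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the size in bytes of the largest index.
# Reads "index store.size" pairs on stdin, as produced by
#   curl -s 'http://localhost:9200/_cat/indices?h=index,store.size&bytes=b'
largest_index_bytes() {
  awk '$2 > max { max = $2 } END { printf "%.0f\n", max }'
}

# Hypothetical helper: convert a -Xmx value such as 2g or 512m to bytes.
heap_bytes() {
  local value number
  value=$(printf '%s' "$1" | tr 'A-Z' 'a-z')   # normalise 2G to 2g
  number="${value%[kmg]}"                      # strip the unit suffix if present
  case "$value" in
    *g) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *m) echo $(( number * 1024 * 1024 )) ;;
    *k) echo $(( number * 1024 )) ;;
    *)  echo "$number" ;;
  esac
}

# Example:
# largest=$(curl -s 'http://localhost:9200/_cat/indices?h=index,store.size&bytes=b' | largest_index_bytes)
# [ "$(heap_bytes 2g)" -gt "$largest" ] || echo "Heap is not larger than the largest index"
```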

  • Edit /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS).
  • Change ES_JAVA_OPTS to the new heap size.
  • Restart the Elasticsearch service (sudo service elasticsearch restart).
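
For the example above, the edited line might look as follows. The 4g value is an assumption chosen only because it exceeds the 2.9gb index; pick a value that fits the memory available on your server. Setting -Xms and -Xmx to the same value mirrors the existing configuration.

```shell
# /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS)
# 4g is an illustrative value, not a recommendation.
ES_JAVA_OPTS="-Xms4g -Xmx4g"
```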

Run script to reindex and use new mappings

Use the following script:

#!/bin/bash


es_url="http://localhost:9200"

index_list='aips aipfiles transfers transferfiles'

echo -e "\nIndex list before reindexing:\n"
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"

#Clone indices with _reindex API call:
for index in $index_list;do 
    echo "Reindex ${index} in ${index}_new..."
    curl -s -X POST \
      ${es_url}/_reindex \
      -H 'Content-Type: application/json' \
      -d '{
      "source": {
        "index": "'"${index}"'"
      },
      "dest": {
        "index": "'"${index}_new"'"
      }
    }' > /dev/null
done

echo -e "\n\n"

echo -e "Index list after tmp indices creation\n"
indices_output=$(curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc")
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"

#Delete old indices
for index in $index_list;do
  echo "Deleting ${index}..."
  curl -s -X DELETE ${es_url}/${index} > /dev/null
done

#Restart archivematica-dashboard to create indices with new mappings
echo -e "\nRestarting archivematica-dashboard"
sudo service archivematica-dashboard restart

#Wait 30 seconds
echo "Wait 30 seconds to ensure dashboard has created the empty indices with new mapping"
sleep 30
echo -e "\n"

#When an index has no docs, the reindex call doesn't create the new index (typically the transferfiles index).
#Check that the *_new index was created before reindexing from it.
#Reindex from *_new indices:
for index in $index_list;do
  if echo "$indices_output" | grep ${index}_new >/dev/null; then
    echo "Indexing ${index} using ${index}_new ..."
    curl -s -X POST \
      ${es_url}/_reindex \
      -H 'Content-Type: application/json' \
      -d '{
      "source": {
        "index": "'"${index}_new"'"
      },
      "dest": {
        "index": "'"${index}"'"
      }
    }' > /dev/null
  fi
done

echo -e "\n"

#Delete tmp indices
for index in $index_list;do
  if echo "$indices_output" | grep ${index}_new >/dev/null; then
     echo "Deleting ${index}_new..."
     curl -s -X DELETE ${es_url}/${index}_new > /dev/null
  fi
done

echo -e "\n\nReindexing done:\n"
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"
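
The fixed 30-second wait in the script is an estimate of how long the dashboard needs. A possible alternative, sketched here and not part of the original script, is to poll until each index exists. The helper names index_exists and wait_for are hypothetical; the sketch relies on Elasticsearch answering a HEAD request for an index with status 200 when the index exists and 404 when it does not.

```shell
#!/bin/bash
es_url="http://localhost:9200"

# Hypothetical helper: succeed if the named index exists.
index_exists() {
  [ "$(curl -s -o /dev/null -I -w '%{http_code}' "${es_url}/$1")" = "200" ]
}

# Hypothetical helper: retry a command once per second, up to a limit.
# Usage: wait_for <max_attempts> <command> [args...]
wait_for() {
  local attempts="$1"; shift
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example, in place of "sleep 30":
# for index in aips aipfiles transfers transferfiles; do
#   wait_for 60 index_exists "$index" || echo "Timed out waiting for ${index}"
# done
```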

In our example the script took about 11 minutes. This is the output:

 root@archivematica-test-server:~# time ./script_reindex_new_map.sh

 Index list before reindexing:

 health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
 yellow open   transfers     lYqkYjwZRy2XG8CP_3S3PQ   5   1          3            0     11.6kb         11.6kb
 yellow open   transferfiles K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
 yellow open   aips          yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
 yellow open   aipfiles      TVrrX8jkRhWWxGfvK_M6zg   5   1      12905            0      2.6gb          2.6gb

 Reindex aips in aips_new...
 Reindex aipfiles in aipfiles_new...
 Reindex transfers in transfers_new...
 Reindex transferfiles in transferfiles_new...

 Index list after tmp indices creation

 health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
 yellow open   transfers_new gdFevH8yRdiNTdrPcfo8Lg   5   1          0            0       460b           460b
 yellow open   transfers     lYqkYjwZRy2XG8CP_3S3PQ   5   1          3            0     11.6kb         11.6kb
 yellow open   transferfiles K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
 yellow open   aips_new      uJ-ehaYLTfe_1lOSErfu3Q   5   1         17            0     96.8mb         96.8mb
 yellow open   aips          yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
 yellow open   aipfiles_new  00Xxu7v2QvWsq92gM247xQ   5   1      12905            0      3.1gb          3.1gb
 yellow open   aipfiles      TVrrX8jkRhWWxGfvK_M6zg   5   1      12905            0      2.6gb          2.6gb

 Deleting aips...
 Deleting aipfiles...
 Deleting transfers...
 Deleting transferfiles...

 Restarting archivematica-dashboard
 Wait 30 seconds to ensure dashboard has created the empty indices with new mapping

 Indexing aips using aips_new ...
 Indexing aipfiles using aipfiles_new ...
 Indexing transfers using transfers_new ...

 Deleting aips_new...
 Deleting aipfiles_new...
 Deleting transfers_new...

 Reindexing done:

 health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
 yellow open   transfers     FC7aSVPmSmmCc_LTv1AQRA   5   1          3            0      1.2kb          1.2kb
 yellow open   transferfiles 5JMAft3FQwmosZQFi7eJNw   5   1          0            0      1.2kb          1.2kb
 yellow open   aips          EtwXG3-4SO2Px-4QMRufXA   5   1         17            0    102.1mb        102.1mb
 yellow open   aipfiles      -PFuzslgTeWJ4CWny8VZoA   5   1      12905            0        3gb            3gb

 real    10m47.114s
 user    0m0.068s
 sys     0m0.032s
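
In the listing taken right after the temporary indices were created, transfers_new shows 0 documents while transfers holds 3, which may simply mean the new index had not been refreshed when the listing was taken. Because the script deletes the original indices next, comparing document counts first can be a useful safeguard. The following is a minimal sketch, not part of the original script: it assumes the _refresh and _count endpoints of the Elasticsearch REST API, and parse_count and doc_count are hypothetical helper names.

```shell
#!/bin/bash
es_url="http://localhost:9200"

# Hypothetical helper: extract the number from a _count response such as
# {"count":17,"_shards":{...}} read on stdin.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: refresh an index, then print its document count.
doc_count() {
  curl -s -X POST "${es_url}/$1/_refresh" > /dev/null
  curl -s "${es_url}/$1/_count" | parse_count
}

# Example check before deleting the original index:
# if [ "$(doc_count aips)" = "$(doc_count aips_new)" ]; then
#   echo "aips and aips_new match"
# fi
```

For an empty source index no _new index is created, so the comparison should be skipped in that case, as the original script already does for the reindex and delete steps.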

Restore Elasticsearch heap size when needed

  • Edit /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS).
  • Change ES_JAVA_OPTS back to its previous value.
  • Restart the Elasticsearch service (sudo service elasticsearch restart).