Data Migration

Bulk record modification

If you need to modify values in all records of an instance (e.g. change the $schema URL), you can tweak this script and apply those changes directly in the database.
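
As an illustration of the kind of change meant here, the following is a minimal sketch of rewriting the $schema URL in record JSON documents. It is not the script referenced above: the two URLs and the function names are hypothetical, and the sketch works on plain dictionaries, assuming the records are loaded from and written back to the database by separate code.

```python
import copy

# Hypothetical URLs, used only for illustration.
OLD_SCHEMA = "https://old.example.org/schemas/records/doc_v1.0.0.json"
NEW_SCHEMA = "https://new.example.org/schemas/records/doc_v1.0.0.json"


def update_schema(record_json, old=OLD_SCHEMA, new=NEW_SCHEMA):
    """Return (record, changed) for one record's JSON document.

    The input is never mutated; a modified copy is returned only
    when the record currently points at the old schema URL.
    """
    if record_json.get("$schema") != old:
        return record_json, False
    updated = copy.deepcopy(record_json)
    updated["$schema"] = new
    return updated, True


def bulk_update(records):
    """Apply update_schema to every record and count the modified ones."""
    results = []
    changed_count = 0
    for record in records:
        updated, changed = update_schema(record)
        results.append(updated)
        if changed:
            changed_count += 1
    return results, changed_count
```

Returning the number of modified records makes it easy to compare against the number of rows expected to change before committing anything to the database.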

Might require reindexing

Changing data at the database level, especially in the json field, might require the data to be reindexed so that searches return relevant results.

Reindex all records

If a mapping has to be changed in a backward-compatible way, creating a new file with a different version (for both the mapping and the JSON schema) is enough. However, if the changes are not backward compatible, all records belonging to that index must be reindexed to avoid complete search failures.

Steps to reindex:

  1. Check the SQL database of the corresponding endpoint and grant access to the user/host in the pg_hba.conf file.

  2. Delete the index from Elasticsearch.

  3. Create the new index.
  4. In a properly configured CSaS instance (default index and document, CSaS instance, ES host, SQL URI, etc.) execute the following commands:
$ invenio utils reindex -t recid (optional: --doc-type doc_v1.0.0)
$ invenio utils runindex
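
Steps 2 and 3 can be carried out with any Elasticsearch client. The following is a minimal sketch using only the Python standard library and the Elasticsearch REST endpoints for deleting (DELETE /<index>) and creating (PUT /<index>) an index. The host and index name are assumptions for illustration, and the body passed at creation time is expected to come from the new mapping file, whose exact layout depends on the Elasticsearch version in use.

```python
import json
import urllib.request

# Assumptions for illustration only; use the values of your own instance.
ES_HOST = "http://localhost:9200"
INDEX = "records-doc_v1.0.0"


def delete_index_request(host, index):
    """Build (but do not send) the request that deletes an index."""
    return urllib.request.Request(f"{host.rstrip('/')}/{index}", method="DELETE")


def create_index_request(host, index, body):
    """Build (but do not send) the request that creates an index.

    `body` is the dictionary loaded from the new mapping file.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{host.rstrip('/')}/{index}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Building the requests separately from sending them allows the target URL to be inspected first, which helps avoid deleting an index on the wrong host.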

It is advised to reindex these records in an instance with the SQL URI pointing to production and the ES host pointing to QA, to check that the number of records produced is the expected one (the same as the number existing in prod).
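
The count check can be sketched as follows, using the Elasticsearch _count endpoint, which returns a JSON document with a "count" field. The host and index name are supplied by the caller, and the SQL count is assumed to be obtained separately (for example with a SELECT COUNT(*) on the records table); the function names are hypothetical.

```python
import json
import urllib.request


def es_count_url(host, index):
    """Return the URL of the _count endpoint for the given index."""
    return f"{host.rstrip('/')}/{index}/_count"


def parse_es_count(response_body):
    """Extract the document count from a _count response body."""
    return int(json.loads(response_body)["count"])


def fetch_es_count(host, index):
    """Query Elasticsearch and return the number of indexed documents."""
    with urllib.request.urlopen(es_count_url(host, index)) as response:
        return parse_es_count(response.read())


def compare_counts(sql_count, es_count):
    """Return (ok, message) describing whether both counts agree."""
    if sql_count == es_count:
        return True, f"OK: {es_count} records indexed"
    missing = sql_count - es_count
    return False, f"Mismatch: SQL has {sql_count}, ES has {es_count} (difference {missing})"
```

A mismatch does not say which records are missing, only that the reindex did not produce the expected total, so it is a quick sanity check rather than a full verification.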