Data Migration
Bulk record modification
If you need to modify values in all records of an instance (e.g. change the $schema
URL), you can tweak this script and perform those changes in the database.
Might require reindexing
Changing data at the database level, especially in the json
field, might require the data to be reindexed so that searches return relevant results.
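As an illustration of the kind of per-record change involved, the following is a minimal sketch of rewriting the $schema URL of a record's JSON. It is not the script referred to above: the function name `rewrite_schema_url`, the example URLs, and the sample records are all hypothetical, and the sketch only transforms in-memory dictionaries. How the result is written back to the database depends on the instance.

```python
import copy


def rewrite_schema_url(record, old_prefix, new_prefix):
    """Return a copy of `record` whose $schema URL uses `new_prefix`.

    Records whose $schema does not start with `old_prefix` (or that
    have no $schema at all) are returned unchanged.
    """
    updated = copy.deepcopy(record)  # never mutate the input record
    schema = updated.get("$schema")
    if isinstance(schema, str) and schema.startswith(old_prefix):
        updated["$schema"] = new_prefix + schema[len(old_prefix):]
    return updated


# Hypothetical sample data standing in for the json field of two records.
records = [
    {"$schema": "https://old.example.org/schemas/doc_v1.0.0.json", "title": "A"},
    {"$schema": "https://other.example.org/schemas/doc_v1.0.0.json", "title": "B"},
]

migrated = [
    rewrite_schema_url(r, "https://old.example.org/", "https://new.example.org/")
    for r in records
]
```

Keeping the transformation as a pure function makes it easy to try on a handful of records before applying it to the whole database.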
Reindex all records
If a mapping has to be changed in a backward-compatible way, creating a new file with a different version (for both the mapping and the jsonschema) is enough. However, if the changes are not backward compatible, all records belonging to that index must be reindexed to avoid search failures.
Steps to reindex:
- Check the SQL database of the corresponding endpoint and give access to the user/host in the pg_hba.conf file.
- Delete the index from Elasticsearch.
- Create the new index.
- In a properly configured CSaS instance (default index and document, CSaS instance, ES host, SQL URI, etc.) execute the following commands:
$ invenio utils reindex -t recid (optional: --doc-type doc_v1.0.0)
$ invenio utils runindex
It is advisable to reindex these records in an instance with the SQL URI pointing to production and the ES host pointing to QA, to check that the number of records produced is as expected (the same as the number existing in prod).
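The check described above can be sketched as a comparison between the record identifiers read from the SQL database and those found in the new index. This is only an illustrative helper: `compare_record_ids` and the sample identifiers are hypothetical, and fetching the two lists of identifiers (from the database and from Elasticsearch) is left to whatever tooling the instance already provides.

```python
def compare_record_ids(db_ids, es_ids):
    """Compare record identifiers from the database with those in the index.

    Returns the number of distinct identifiers on each side, plus the
    identifiers that are missing from, or unexpected in, the index.
    """
    db_set, es_set = set(db_ids), set(es_ids)
    return {
        "db_count": len(db_set),
        "es_count": len(es_set),
        "missing_in_es": sorted(db_set - es_set),
        "unexpected_in_es": sorted(es_set - db_set),
    }


# Hypothetical identifiers: record "2" was not indexed.
report = compare_record_ids(["1", "2", "3"], ["1", "3"])

if report["missing_in_es"] or report["unexpected_in_es"]:
    print("Index does not match the database:", report)
else:
    print("Index matches the database:", report["db_count"], "records")
```

Comparing identifiers rather than only totals also shows which records failed to index when the counts differ.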