Rebuilding Shopify Products After Data Loss

Recovering from a bad sync with accounting software that destroyed the product database.

A client who uses AccountEdge accounting software alongside Shopify recently suffered a bad case of data loss that took some clever thinking to recover from.

Their primary product database lives in AccountEdge – product names, SKUs, costs and pricing, inventory – but metadata such as descriptions, images, SEO content, and Shopify metafields existed only in Shopify. We use metafields to store labels and descriptions for the product options, without which customers can't make a purchase.

A typo in the name of a product variant within AccountEdge caused it to suddenly treat every Shopify product as unknown. The next time the app was launched and attempted to sync with Shopify, it deleted every product and recreated it with the new typo-variant. The recreated products had no images, metafields, etc., rendering the entire ecommerce platform unusable. This client had subscribed to a Shopify backup app, but after running a restore we discovered two big issues:

  • We now had two copies of all products.
  • Metafield data hadn't been restored at all.

After several rounds of back-and-forth, the backup service was ultimately unable to provide any metafield data, and we had to form a new plan of attack.

Rebuild and Restore

Our new strategy:

  1. Write a script to back up the "restored" Shopify product data.
     • Collections, products, and collects saved to a local MongoDB.
     • Image files copied from the Shopify CDN to S3.
  2. Delete everything from Shopify.
  3. Have the client do a fresh sync from AccountEdge to Shopify.
  4. Write a script to update all Shopify products with the metadata from step 1 (including the URLs of the images at S3).
  5. Restore metafields. This was easier said than done, but in this case the bulk of the data was restored by parsing CSVs that the client maintained with their product information, which saved a lot of time.
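One wrinkle in the update script is that a fresh sync gives every product a new Shopify ID, so the backed-up records have to be matched to the new products by something that survives the round trip. The sketch below is not the client's code; it assumes variant SKUs are that stable key, that each backup record carries a hypothetical `s3_image_urls` list written by the backup script, and that `body_html`, `tags`, and `images` are the fields worth restoring.

```python
from typing import Any

Product = dict[str, Any]


def index_by_sku(products: list[Product]) -> dict[str, Product]:
    """Map each non-empty variant SKU to its parent product record."""
    index: dict[str, Product] = {}
    for product in products:
        for variant in product.get("variants", []):
            sku = (variant.get("sku") or "").strip()
            if sku:
                # Keep the first product seen for a SKU (the restore left duplicates).
                index.setdefault(sku, product)
    return index


def build_updates(
    backup_products: list[Product], fresh_products: list[Product]
) -> tuple[dict[int, Product], list[int]]:
    """Return (update payloads keyed by new product ID, IDs with no backup match)."""
    backup_index = index_by_sku(backup_products)
    updates: dict[int, Product] = {}
    unmatched: list[int] = []
    for product in fresh_products:
        skus = [(v.get("sku") or "").strip() for v in product.get("variants", [])]
        source = next((backup_index[s] for s in skus if s in backup_index), None)
        if source is None:
            unmatched.append(product["id"])
            continue
        updates[product["id"]] = {
            "product": {
                "id": product["id"],
                "body_html": source.get("body_html", ""),
                "tags": source.get("tags", ""),
                # `s3_image_urls` is a hypothetical field added at backup time.
                "images": [{"src": url} for url in source.get("s3_image_urls", [])],
            }
        }
    return updates, unmatched
```

Returning the unmatched IDs separately makes it easy to review the leftovers by hand before anything is written back to the store.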

Once the details of the process were worked out and a proof of concept using Shopify's API was prototyped, the rollout was relatively quick. It took only seconds to grab all collections and thousands of collects, and about 20 minutes to grab several thousand products and copy all of their images. Restoring took a little longer, as each product update request takes anywhere from 200 to 2,000 ms and Shopify's rate limits aren't negligible. The site was back up and running, and the client suffered minimal downtime.
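For anyone writing a similar restore loop, the rate limits are the part most worth planning for. Shopify's REST Admin API answers a throttled request with HTTP 429 and a `Retry-After` header. Here is a minimal sketch of a retry wrapper built around that behavior; the `send` callable, the attempt limit, and the 2-second fallback delay are illustrative assumptions rather than what our script actually does.

```python
import time
from typing import Callable, Mapping, Tuple

# (HTTP status code, response headers) as returned by the hypothetical `send`.
Response = Tuple[int, Mapping[str, str]]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it is not rate limited; return the final status code."""
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429:
            return status
        if attempt < max_attempts:
            # Wait as long as the server asks, or 2 seconds if it gives no hint.
            sleep(float(headers.get("Retry-After", 2.0)))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `send` and `sleep` in as arguments keeps the wrapper independent of any particular HTTP library and lets it be exercised without touching the network.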

As a bonus, we now have an in-house Shopify backup solution that includes all content (theme, pages, products, collections, orders) along with critical product data (images, metafields, other metadata). We can store this data in a JSON file, SQLite, MongoDB, or anything else that a client's use case requires. More importantly, the data is accessible: it can be easily browsed to verify its integrity and restored when needed.
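To give a feel for the SQLite option, here is a small sketch that stores each product as a JSON payload and reads it back, plus a check for the gap that bit us with the third-party backup. The table layout and the assumption that metafields are embedded under a `metafields` key in each record are illustrative, not a description of our actual schema.

```python
import json
import sqlite3
from typing import Any

Product = dict[str, Any]


def save_products(conn: sqlite3.Connection, products: list[Product]) -> None:
    """Store each product as a JSON payload keyed by its Shopify ID."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "id INTEGER PRIMARY KEY, handle TEXT, payload TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO products (id, handle, payload) VALUES (?, ?, ?)",
        [(p["id"], p.get("handle"), json.dumps(p, sort_keys=True)) for p in products],
    )
    conn.commit()


def load_products(conn: sqlite3.Connection) -> list[Product]:
    """Read every stored product back, ordered by ID."""
    rows = conn.execute("SELECT payload FROM products ORDER BY id")
    return [json.loads(payload) for (payload,) in rows]


def missing_metafields(products: list[Product]) -> list[int]:
    """Return IDs of products whose backup record has no metafields."""
    return [p["id"] for p in products if not p.get("metafields")]
```

Comparing what `load_products` returns against the data that went in is a cheap stand-in for a test restore, and `missing_metafields` flags incomplete records before they are needed in an emergency.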

This was a relatively elegant and interesting solution to a catastrophic problem. I don't want to share all of the code here since it's tailored specifically to this client, but if you need help with something like this, feel free to contact me.

Lesson learned: a backup is only as good as a test restore has proven it to be.