How to Audit a Shopify Product Feed Before You Push to Google Merchant Center
The Pattern Every Merchant Hits Eventually
You connect Shopify to Google Merchant Center, kick off the first sync, and the dashboard lights up red. Hundreds of disapprovals. Some say "missing GTIN." Some say "image too small." Some say "invalid product category." A few are genuinely cryptic. You sift through the list, fix what you can, push another sync, and the count changes but never reaches zero.
I've audited a lot of feeds. The most common mistake is treating Google Merchant Center as the validator. It is not. By the time GMC tells you a product is disapproved, you've already burned the slow async loop: push the feed, wait for review, read the rejection, fix the source data, push again. On a 10,000-product catalog that loop is days, not minutes.
The fix is to audit the feed before you push it. Catch the obvious failures locally, ship a clean catalog, and let GMC review only the genuinely ambiguous edge cases. This post is the audit checklist I use on real Shopify catalogs.
What "A Feed" Actually Means Here
When I say "feed" I mean the flat row-per-variant export you'd hand to GMC: one row for each sellable SKU, with columns like id, title, description, link, image_link, price, availability, brand, gtin, mpn, identifier_exists, google_product_category, and a handful of attribute fields like color, size, material, age_group, and gender.
Shopify doesn't expose this shape natively. You build it from the GraphQL admin API, a CSV export, or whatever sync tool you're using in between. The audit happens after that flattening step and before the GMC push.
The Eight Checks That Catch 90% of Disapprovals
1. Required field presence
Every row needs id, title, description, link, image_link, availability, and price. Run a presence check on those columns across every row. The output should be zero empty cells. If you see 414 empty titles in a 51,000-product feed, you have a transformation bug, not a content gap. Most likely a variant-option name (like "Title") is clobbering the product title during the flatten.
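A minimal sketch of that presence check, assuming the feed has already been loaded as a list of dicts keyed by column name (for example via csv.DictReader). The function name missing_required is hypothetical, not part of any tool mentioned here.

```python
from typing import Iterable

# The required columns named in the text above.
REQUIRED = ("id", "title", "description", "link", "image_link", "availability", "price")


def missing_required(rows: Iterable[dict]) -> dict[str, list[int]]:
    """Map each required column to the indexes of rows where it is empty."""
    missing: dict[str, list[int]] = {col: [] for col in REQUIRED}
    for index, row in enumerate(rows):
        for col in REQUIRED:
            value = row.get(col)
            # Treat absent keys, None, and whitespace-only strings as empty.
            if value is None or not str(value).strip():
                missing[col].append(index)
    # Only report columns that actually have gaps.
    return {col: indexes for col, indexes in missing.items() if indexes}
```

An empty dict means the check passed. A single column with hundreds of hits points at the flatten step rather than at individual products.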
2. GTIN check-digit validation
A GTIN is not just any number. It is 8, 12, 13, or 14 digits long, and the last digit is a checksum computed from all the digits before it (the first twelve, for a GTIN-13). Half the GTINs I see in the wild fail the checksum because someone hand-typed them or scraped them from a supplier sheet without verifying. GMC will reject these as "invalid GTIN" without telling you which digit is wrong, only that the value isn't found in their database.
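Compute the check digit yourself. The algorithm is short: drop the last digit, then working from right to left multiply the remaining digits alternately by 3 and 1, starting with 3. Sum the products. The check digit is whatever you need to add to reach the next multiple of 10 (0 if the sum is already a multiple of 10). For a 13-digit GTIN this is the same as weighting odd positions by 1 and even positions by 3, counting from the left. Reject any row where the computed digit doesn't match the last digit on file. Surface those rows to the merchant with the original value and the corrected one as a side-by-side comparison.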
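Here is one way to sketch the check-digit computation. It uses the standard GS1 weighting applied from the right, so it works for 8, 12, 13, and 14-digit values. The function names are hypothetical. The "corrected" value assumes the error is in the final digit, which is only a suggestion for the merchant to verify, not a guaranteed fix.

```python
from typing import Optional, Tuple


def gtin_check_digit(body: str) -> int:
    """Check digit for a GTIN given all of its digits except the last."""
    total = 0
    # Walk from the right: the digit next to the check digit gets weight 3,
    # then the weights alternate 1, 3, 1, ...
    for offset, char in enumerate(reversed(body)):
        weight = 3 if offset % 2 == 0 else 1
        total += int(char) * weight
    # Amount needed to reach the next multiple of 10 (0 if already there).
    return (10 - total % 10) % 10


def audit_gtin(value: str) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, suggested_value); the suggestion is None if malformed."""
    digits = value.strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 12, 13, 14):
        return (False, None)
    suggested = digits[:-1] + str(gtin_check_digit(digits[:-1]))
    return (suggested == digits, suggested)
```

For example, audit_gtin("4006381333932") reports the value as invalid and suggests 4006381333931, which gives you the side-by-side pair to show the merchant.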
3. identifier_exists consistency
GMC requires either a valid GTIN/MPN/brand combination or an explicit identifier_exists=no flag for products without manufacturer identifiers (custom-made goods, vintage items, generic supplies). If a row has no GTIN and no MPN but the brand is set, GMC will treat it as a missing identifier and disapprove. The fix is to set identifier_exists=no on those rows, which signals "this product genuinely has no identifier" rather than "we forgot one."
Audit for the inconsistent state: rows with brand but no GTIN/MPN and no identifier_exists=no. Those are the disapprovals waiting to happen.
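The inconsistent state can be found with a simple filter. This is a sketch that assumes rows are dicts keyed by column name. It treats both "no" and "false" as an opt-out, which is my assumption about how your feed encodes the flag. The helper names are hypothetical.

```python
def _has_value(row: dict, column: str) -> bool:
    """True when the column is present and not blank."""
    return bool(str(row.get(column) or "").strip())


def inconsistent_identifier_rows(rows: list[dict]) -> list:
    """IDs of rows with a brand, no GTIN, no MPN, and no explicit opt-out."""
    flagged = []
    for row in rows:
        flag = str(row.get("identifier_exists") or "").strip().lower()
        opted_out = flag in ("no", "false")  # assumed encodings of the flag
        if (
            _has_value(row, "brand")
            and not _has_value(row, "gtin")
            and not _has_value(row, "mpn")
            and not opted_out
        ):
            flagged.append(row.get("id"))
    return flagged
```

Whether the right fix for a flagged row is to add the real identifier or to set identifier_exists=no is a merchant decision. The audit only produces the list.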
4. Image URL reachability
GMC fetches every image_link on every row. If the URL 404s, returns a redirect chain, or serves an image smaller than 100x100 pixels (250x250 for apparel), the product gets disapproved. Run a HEAD request against every unique image URL in the feed. Capture the status code, final URL after redirects, and content-length. A HEAD request can't report pixel dimensions, so content-length stands in as a rough size proxy. Flag anything that's not a 200 with a content-length above ~10KB.
The redirect case is the sneaky one. A Shopify CDN URL that resolves through a 301 to the canonical asset will load fine in a browser, but GMC sometimes treats the redirect chain as a soft failure. Put the final, resolved URL in the feed instead.
5. google_product_category coverage
Google has an exhaustive product taxonomy. Many attribute fields in the feed are optional, but the category is what powers Shopping ad targeting. A product without a category gets vague, poorly targeted placement. A product with the wrong category gets shown to the wrong shoppers and produces zero conversions.
Two checks: every row has a category, and the category exists in Google's current taxonomy. Google publishes the full list as a downloadable text file. Download it, load it into a hash set for fast lookup, and reject any row whose category isn't in the set. Bonus: flag rows where the category is the top-level catch-all (like "Apparel & Accessories" with no subcategory). Those won't get disapproved, but they will underperform.
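A sketch of the lookup, under a few assumptions: the taxonomy file has one category path per line with levels separated by " > ", comment lines start with "#", and the with-IDs variant prefixes each path with a number and " - ". It also assumes your feed stores the category as the full text path rather than the numeric ID. Function names are hypothetical.

```python
from typing import Iterable, Optional


def load_taxonomy(lines: Iterable[str]) -> set[str]:
    """Build a set of category paths from the lines of the taxonomy file."""
    categories = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the version comment
        # Strip a leading "123 - " if the with-IDs variant was downloaded.
        head, separator, tail = line.partition(" - ")
        if separator and head.isdigit():
            line = tail
        categories.add(line)
    return categories


def audit_category(value: Optional[str], taxonomy: set[str]) -> str:
    """Classify one row's category as missing, unknown, top-level, or ok."""
    value = (value or "").strip()
    if not value:
        return "missing"
    if value not in taxonomy:
        return "unknown"
    if " > " not in value:
        return "top-level"  # valid, but likely to underperform
    return "ok"
```

Typical use would be to open the downloaded file, pass the file object to load_taxonomy, and then tally audit_category results across the feed.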
6. Price and availability sanity
Price must be a positive number with a currency code. Availability must be one of "in stock", "out of stock", "preorder", or "backorder". The two interact: a $0 price with "in stock" is a disapproval. A negative inventory count flattened to "in stock" is a disapproval. A free product is allowed but only with the right structured signaling.
Audit the joint distribution. Count rows by availability and price-bucket. Anything surprising (a thousand $0 in-stock rows, or a hundred negative-inventory in-stock rows) is a transformation bug.
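One way to sketch the joint count, assuming prices are formatted as a number followed by a three-letter currency code (for example "9.99 USD") and rows are dicts keyed by column name. The bucket boundaries are arbitrary choices for illustration, and the function names are hypothetical.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation
from typing import Optional

ALLOWED_AVAILABILITY = {"in stock", "out of stock", "preorder", "backorder"}


def price_bucket(price: Optional[str]) -> str:
    """Classify a price string such as '9.99 USD' into a coarse bucket."""
    parts = (price or "").split()
    if len(parts) != 2 or len(parts[1]) != 3 or not parts[1].isalpha():
        return "malformed"
    try:
        amount = Decimal(parts[0])
    except InvalidOperation:
        return "malformed"
    if not amount.is_finite():  # rejects NaN and Infinity
        return "malformed"
    if amount < 0:
        return "negative"
    if amount == 0:
        return "zero"
    if amount < 10:
        return "under 10"
    if amount < 100:
        return "10 to 99.99"
    return "100 plus"


def joint_distribution(rows: list[dict]) -> Counter:
    """Count rows by (availability, price bucket)."""
    counts: Counter = Counter()
    for row in rows:
        availability = str(row.get("availability") or "").strip().lower()
        if availability not in ALLOWED_AVAILABILITY:
            availability = "invalid"
        counts[(availability, price_bucket(row.get("price")))] += 1
    return counts
```

Printing counts.most_common() gives a short table. The cells to look at first are ("in stock", "zero"), anything "malformed", and anything "invalid". Negative inventory has to be checked against the source data, since the flattened feed no longer carries the count.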
7. Title and description quality
Beyond presence, GMC has soft quality rules. Titles longer than 150 characters get truncated in search results. Titles in all caps trigger spam filters. Descriptions under 80 characters look thin and reduce ad rank. HTML tags in either field are disallowed (and Shopify descriptions are HTML by default, so the flatten step needs to strip them).
Audit for: title length distribution, all-caps ratio, description character count, and presence of < or > in either field. The transforms are easy. The audit is what tells you which products to apply them to.
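A per-row sketch of those four checks. The 150 and 80 character limits come from the text above. The 0.8 caps-ratio threshold is my own assumption and should be tuned to your catalog. The function name quality_flags is hypothetical.

```python
def quality_flags(title: str, description: str, caps_threshold: float = 0.8) -> list[str]:
    """Return the soft-quality flags that apply to one row."""
    flags = []
    if len(title) > 150:
        flags.append("title over 150 chars")
    # Ratio of uppercase letters to all letters; digits and punctuation ignored.
    letters = [char for char in title if char.isalpha()]
    if letters:
        caps_ratio = sum(char.isupper() for char in letters) / len(letters)
        if caps_ratio >= caps_threshold:  # threshold is an assumed default
            flags.append("title mostly caps")
    if len(description) < 80:
        flags.append("description under 80 chars")
    # A literal angle bracket usually means HTML survived the flatten step.
    for name, text in (("title", title), ("description", description)):
        if "<" in text or ">" in text:
            flags.append(f"angle bracket in {name}")
    return flags
```

Short legitimate acronym titles (a product literally named "USB") will trip the caps check, so treat the output as a review list rather than an automatic rewrite queue.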
8. Variant-level uniqueness
Every row in the feed needs a unique id. In Shopify, that means variant-level, not product-level. A product with three sizes and two colors becomes six rows in the feed, each with a unique variant ID. If your transform is collapsing variants or duplicating them, GMC will either dedupe to one (losing your color/size coverage) or reject the duplicates.
Count distinct IDs against total row count. They should match. If they don't, your flatten step has a join bug. Common cause: an inventory-location join that fans out one row per location per variant, producing 4x or 8x the expected row count.
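The count comparison is a few lines. This sketch assumes rows are dicts with an id column, and the function name is hypothetical. It returns the duplicated IDs with their counts, because the repeat factor is often the clue to the cause.

```python
from collections import Counter


def duplicate_ids(rows: list[dict]) -> dict[str, int]:
    """Map each id that appears more than once to its occurrence count."""
    counts = Counter(str(row.get("id") or "").strip() for row in rows)
    return {row_id: n for row_id, n in counts.items() if n > 1}
```

If nearly every ID shows the same count (say, 4), that matches the fan-out pattern described above, with one row emitted per inventory location. Note that rows with a blank id are grouped under the empty string, so missing IDs surface here too.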
Running the Audit Without Building Your Own Tool
You can do all of this in a Google Sheet with formulas, and that's the path I recommend for first-time audits. Pull your feed into a sheet, run the eight checks as columns, and you'll find your problems in 20 minutes. The blockers I see most often (empty titles, invalid GTINs, missing categories) jump out immediately.
I packaged the full set of checks into a Google Sheets add-on called Feed Audit Tool. It runs 130+ automated checks across the eight categories above, plus platform-specific checks for Meta and Shopify, and produces a color-coded report with cell-level clickable hyperlinks so you can jump to the exact problem cell. I built it for my own catalogs and sell it on Gumroad for $29 because it saves me an hour every time I run it.
The bigger lesson is that the audit is a habit, not a one-time event. Catalogs drift. Suppliers change SKUs. Photographers re-upload images at lower resolution. Whoever controls the source data will, at some point, make a change that breaks the feed. Run the audit on every push, not just the first one.
When the Audit Catches Something That Isn't Your Fault
Some failures live in the sync layer, not the source data. If your Shopify catalog has a custom.color metafield that holds the right value, but the feed sent to GMC has empty color, that's a mapping bug between Shopify and your sync tool. The audit will flag it as a missing field, but the fix is in the transformation logic, not in Shopify.
I built SnowPipe for exactly this case: a Shopify-to-GMC sync tool that handles metafield-to-attribute mapping, GTIN validation, auto-removal of deleted products, and the row-limiting strategies that prevent half-syncs on large catalogs. If you're tired of debugging your sync tool's transformations and want one that's been audited against the eight checks above, that's what it's for.
But you don't need a sync tool to start. Audit your feed today, fix the obvious failures, and watch the GMC disapproval count drop before you change anything else.