[AI] Extraction: Automatically Enrich Product Attributes

Summary

What is it?

The Extraction Module helps you automatically enrich missing product attributes by analyzing text, such as product names and descriptions, as well as assets, such as PDF spec sheets and product images (beta).

This means you can now unlock valuable product information that was previously trapped in files, such as dimensions, materials, or technical specifications, and have it filled directly into your product records.

Why it matters

  • Save time: Reduce tedious manual data entry by letting AI scan and extract data from PDFs, images and text.
  • Improve data quality: Minimize errors and inconsistencies by relying on automated extraction.
  • Speed up onboarding: Get new supplier products into your catalog faster with richer, more complete product data.

Example use case

A supplier sends you a spreadsheet of product SKUs, plus a folder of PDF spec sheets and product images. Instead of manually copying details like materials or dimensions, you let the Extraction Module analyze the assets and fill in these attributes for you.

How does it work?

The AI analyzes the available product information and fills in missing attributes, like dimensions or specifications. If the AI has low confidence in an extracted value, the attribute is flagged in orange for review. Images from the input file can also be displayed to help you confirm or edit attributes.

Extraction from public asset URLs pointing to PDFs or images is also supported.

Key Sections

  • Fields Section: On the left, you can see which sources the AI uses.
  • Attributes Section: Here, you confirm and save the attributes that have been filled in automatically.

Attribute Types

  • Select Attributes: Predefined values you select from a list.
  • Textual Attributes: You can input text or numbers manually.

Attribute Statuses

  • Mandatory (Red Star): Must be filled before moving to the next step.
  • Important (Orange Dot): Alerts you if empty but doesn't block progress.
  • Optional: Non-blocking and not represented visually.
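
The three statuses above can be pictured with a small sketch. It is purely illustrative and is not the product's API: the names `Importance`, `Attribute`, and `review_step` are hypothetical, and the only behavior modeled is what the list describes (empty mandatory attributes block the next step, empty important attributes only raise an alert).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Importance(Enum):
    MANDATORY = "mandatory"  # red star: blocks the next step when empty
    IMPORTANT = "important"  # orange dot: alerts when empty, does not block
    OPTIONAL = "optional"    # no visual marker, never blocks


@dataclass
class Attribute:
    name: str
    importance: Importance
    value: Optional[str] = None

    @property
    def is_empty(self) -> bool:
        # Treat missing and whitespace-only values as empty (an assumption).
        return self.value is None or self.value.strip() == ""


def review_step(attributes: List[Attribute]) -> Tuple[bool, List[str], List[str]]:
    """Return (can_proceed, blocking_names, warning_names)."""
    blocking = [a.name for a in attributes
                if a.importance is Importance.MANDATORY and a.is_empty]
    warnings = [a.name for a in attributes
                if a.importance is Importance.IMPORTANT and a.is_empty]
    return (not blocking, blocking, warnings)
```

In this sketch, a product with an empty important attribute can still move on, while a single empty mandatory attribute makes `can_proceed` false.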

Filters

  • Origin Filter: Audit only AI-completed attributes, marked with a robot icon.
  • Importance Filter: View attributes based on their level of importance (mandatory, important, or optional).

Configuring data sources for Extraction

The Extraction Module can work with different types of sources to enrich product attributes:

  • Text sources: product titles, descriptions, or other text fields.
  • (beta) PDF sources: supplier spec sheets or documentation.
  • (beta) Image sources: product images provided by suppliers.

You can configure which sources the AI will use directly in the app:

  1. Go to Workflow > Settings > Steps.
  2. Select the Extraction step you want to customize.
  3. Choose the sources you want the AI to analyze (text fields, PDFs, images).

Once configured, the AI will automatically pull information from these sources to complete missing product attributes.
For more details, check out this page: Changing the attributes used as a reference by the AI model.

Limitations

To make sure the extraction works smoothly, keep in mind these limits:

  • PDFs: Only the first PDF linked to a product is analyzed, and only its first 20 pages are processed.
  • Images: The first 5 images linked to a product are scanned.
  • Products per job: Up to 10,000 products (rows).
  • Attributes per product: Up to 50 attributes.
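
If you prepare your input file with a script, a quick pre-flight check against these limits can catch surprises before you start a job. The sketch below is only an illustration and does not call the product. It assumes each product is a dictionary with hypothetical keys `sku`, `pdf_urls`, and `image_urls`, and the function name `check_job` is made up for this example. It does not check the 20-page limit, since that would require opening each PDF.

```python
# Limits taken from the list above.
MAX_PDFS_PER_PRODUCT = 1
MAX_IMAGES_PER_PRODUCT = 5
MAX_PRODUCTS_PER_JOB = 10_000
MAX_ATTRIBUTES_PER_PRODUCT = 50


def check_job(products, attribute_names):
    """Return a list of human-readable notes; an empty list means no issue found.

    `products` is a list of dicts with hypothetical keys
    'sku', 'pdf_urls' and 'image_urls' (assumed layout, not a product format).
    """
    notes = []
    if len(products) > MAX_PRODUCTS_PER_JOB:
        notes.append(
            f"{len(products)} products exceed the limit of "
            f"{MAX_PRODUCTS_PER_JOB} products per job"
        )
    if len(attribute_names) > MAX_ATTRIBUTES_PER_PRODUCT:
        notes.append(
            f"{len(attribute_names)} attributes exceed the limit of "
            f"{MAX_ATTRIBUTES_PER_PRODUCT} attributes per product"
        )
    for product in products:
        sku = product.get("sku", "<unknown>")
        if len(product.get("pdf_urls", [])) > MAX_PDFS_PER_PRODUCT:
            notes.append(f"{sku}: only the first PDF will be analyzed")
        if len(product.get("image_urls", [])) > MAX_IMAGES_PER_PRODUCT:
            notes.append(
                f"{sku}: only the first {MAX_IMAGES_PER_PRODUCT} images will be scanned"
            )
    return notes
```

Notes about extra PDFs or images are informational: under the limits above, the additional files are simply not analyzed.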

💡 Best practice: For the best results, we recommend testing with a smaller batch (fewer than 200 products and under 20 attributes) before scaling up. This makes it easier to review the extracted data and adjust settings if needed.
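
One way to build such a test batch from a larger product list is sketched below. This is an assumption-laden illustration, not a product feature: `pilot_batch` is a hypothetical helper, and it simply keeps the first 199 products and the first 19 attributes so that both stay under the recommended thresholds.

```python
def pilot_batch(products, attribute_names, max_products=199, max_attributes=19):
    """Return a smaller (products, attribute_names) pair for a first test run.

    The defaults stay below the recommended thresholds
    (fewer than 200 products, under 20 attributes).
    """
    return products[:max_products], attribute_names[:max_attributes]
```

Taking the first rows is the simplest choice. If your file is sorted by category or supplier, a random sample may give a more representative test.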