Accelerate data initialization with the Data Architect Agent

Summary

The Data Architect Agent (DAA) empowers new customers to quickly establish their product data structure in Akeneo PIM in under a day.

By leveraging AI, it automatically generates an accurate data model based on your catalog extract, streamlining PIM initialization and reducing the time to go live.

 

Creating your product data structure in the PIM can be broken down into three steps:

Import your products

Upload your product catalog as a flat file exported from an e-commerce platform or ERP system. The AI uses this data to generate your custom model, based on Akeneo’s best practices and key concepts.

How does it work?

  • Check that your role has full permissions for the Data Architect Agent under System > Roles > Administrator (or your own role) > Data Architect Agent.

 

  • Then, click Settings and find the Data Architect Agent under the Automation section, at the bottom of the page.

 

  • Fill in the context text box to guide the AI model and get the most out of it.

 

The Context box will allow the AI model to generate the most accurate version of your catalog according to your business requirements.

You will find examples of the kind of information that helps the AI model create the best version of your data in the section ‘What kind of context should I give to the AI model?’

 

 

  • Then, upload the product data file(s) extracted from your ERP or e-commerce platform. The accepted formats are XLS, XLSX and CSV, and each file can be up to 20 MB.

 

Please note that we recommend the following structure for files:

  • Format: XLS, XLSX or CSV files containing product information (title, description, any attributes you already have, etc.) and nothing else. Your file does not need to be exhaustive: a few hundred products are more than enough.
  • One product per row
  • One attribute per column
  • Names of the attributes in the first row
  • Attribute values start from the second row
  • Max size of imported files: 20 MB per file
  • Max number of imported files: 10

Please note that the model usually samples these files to build the data model.
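
If you prepare your files with a script, the recommendations above can be checked before uploading. The following Python sketch is not part of the Data Architect Agent: the function names are hypothetical, it assumes 1 MB = 1024 × 1024 bytes, and it only inspects the layout of CSV files, assuming they are comma-separated and UTF-8 encoded.

```python
import csv
from pathlib import Path

# Limits taken from the recommendations above.
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILES = 10
ALLOWED_EXTENSIONS = {".xls", ".xlsx", ".csv"}


def check_csv_layout(path, delimiter=",", encoding="utf-8"):
    """Return a list of layout problems found in one CSV file.

    The delimiter and encoding defaults are assumptions; adjust them
    to match your own export.
    """
    problems = []
    with open(path, newline="", encoding=encoding) as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    if len(rows) < 2:
        return [f"{path}: expected a header row and at least one product row"]
    header = rows[0]
    # Attribute names are expected in the first row.
    if any(not name.strip() for name in header):
        problems.append(f"{path}: empty attribute name in the first row")
    # One attribute per column: every record should match the header width.
    for record_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"{path}: record {record_number} has {len(row)} columns, "
                f"expected {len(header)}"
            )
    return problems


def check_upload_batch(paths):
    """Check file count, extension and size, plus CSV layout where possible."""
    problems = []
    if len(paths) > MAX_FILES:
        problems.append(
            f"{len(paths)} files selected, at most {MAX_FILES} are accepted"
        )
    for raw_path in paths:
        path = Path(raw_path)
        extension = path.suffix.lower()
        if extension not in ALLOWED_EXTENSIONS:
            problems.append(f"{path}: unsupported format {extension or '(none)'}")
            continue
        if path.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{path}: larger than 20 MB")
        if extension == ".csv":
            problems.extend(check_csv_layout(path))
    return problems
```

Calling `check_upload_batch(["catalog.csv"])` (the file name is only an example) returns an empty list when nothing looks wrong, or one message per problem found.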

 

 

  • Explore and refine

After processing, the tool creates a data model with families, attributes, and options. You can explore and adjust every generated entity (families, attributes, codes, labels, etc.).

 

Key concepts

  • Families

A family is a defined set of attributes that products automatically inherit when assigned to this family. While a product can only be part of one family (or none, if it's a unique item without default attributes), this structure helps to manage and track a product's data completeness.

Find out more with our dedicated Help Center article about Families.

  • Family Variants

The model is able to generate family variants based on your input file. Family variants help manage products with different variations (e.g. a couch that comes in different colors and sizes).

Find out more with our dedicated Help Center article about Family Variants.

  • Attributes

The model is able to generate attributes based on your input file. Attributes help by providing specific characteristics and details for each product, enabling richer descriptions, improved searchability, and better organization of your product information within the PIM system.

Find out more with our dedicated Help Center article about Attributes.

 

We currently support 10 types of attributes within the Data Architect Agent. The model is capable of generating these attributes, whether as suggestions or during the re-processing of the input file.

 

 

Attribute type | Status
Assets collection | Not supported
Date | Supported
File | Not supported
Identifier | Supported
Image | Not supported
Metric | Supported
Multi select | Supported
Number | Supported
Price | Supported
Product link | Not supported
Reference entity multiple links | Not supported
Reference entity single link | Not supported
Simple select | Supported
Table | Not supported
Text | Supported
Text Area | Supported
Yes/No (boolean) | Supported
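
If you already have a list of attribute types you plan to use, the table above can be turned into a simple lookup. This is only an illustrative Python sketch: the dictionary mirrors the table as published here, and the function name is hypothetical rather than anything provided by the PIM.

```python
# Support status per attribute type, copied from the table above.
ATTRIBUTE_TYPE_SUPPORT = {
    "Assets collection": False,
    "Date": True,
    "File": False,
    "Identifier": True,
    "Image": False,
    "Metric": True,
    "Multi select": True,
    "Number": True,
    "Price": True,
    "Product link": False,
    "Reference entity multiple links": False,
    "Reference entity single link": False,
    "Simple select": True,
    "Table": False,
    "Text": True,
    "Text Area": True,
    "Yes/No (boolean)": True,
}


def split_by_support(attribute_types):
    """Split type names into (supported, unsupported, unknown) lists.

    Matching ignores letter case and surrounding spaces.
    """
    lookup = {name.lower(): ok for name, ok in ATTRIBUTE_TYPE_SUPPORT.items()}
    supported, unsupported, unknown = [], [], []
    for name in attribute_types:
        status = lookup.get(name.strip().lower())
        if status is None:
            unknown.append(name)
        elif status:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported, unknown
```

For instance, `split_by_support(["Text", "Image"])` places "Text" in the first list and "Image" in the second, which tells you which attributes you would need to create by other means.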

 

As of today, the Data Architect Agent does not support the following entities:

  • Product models
  • Categories
  • Channels
  • Attribute groups
  • Association types
  • Groups
  • Workflows
  • Rules
 

 

Issues

We've introduced a new Issues tab to help you quickly identify and address problems within your catalog's structure. This dedicated tab centralizes all identified issues, making them easier to manage.

For each issue, you'll see details about the affected entity and a clear error message. To resolve an issue, simply click the ‘EDIT’ button. This will automatically take you to the specific location of the problem, allowing you to identify and correct it.

 

Product previews

The Product previews tab gives you a first look at your products, showing you how they'll appear in the PIM.

We generate a sample set of products from your uploaded file, based on the families and attributes that have been suggested. You can easily navigate through your products by using the family filter or the search bar.

The preview is designed to give you a quick projection of your data.

Please note that:

  • The preview currently supports text, text area, simple select, and multi select attributes. Other attribute types may not be displayed as expected in the preview.
  • The product preview does not guarantee that every attribute will have a value, nor that every line in your uploaded file will necessarily result in a product in the preview tab.
  • The product preview may take a few seconds to load.

 

We will soon release a way to import these products directly into the PIM, to help you with your catalog initialization.

 

 

History

You can find a version history in the last tab of the feature. This allows you to restore a previous version of your data model if needed. The history displays a maximum of 25 revisions.

 

Additional capabilities

  • Download

In the top right corner of the DAA, you'll see additional actions, including the ability to download your files. 

You can download separate CSV files for:

  • Families
  • Family variants
  • Attributes
  • Attribute Options

 

  • Cancel and reset

You also have the option to cancel and reset your model.

 

Specifications

  • Depending on several factors, generating a data model can take anywhere from minutes to hours. A waiting message “Model generation in progress” will be displayed on your screen, and the page will be automatically refreshed to show the model generated. A confirmation email is also sent when the model has been generated.
  • Family and attribute codes are generated in English by the model, but their labels are generated in each of the locales selected on the first page when initializing the model. Please note that if you deselect English as one of the locales, your codes will still be created in English. In that case, however, labels will not be translated into English and will only be available in the other locales you chose.
  • The LLM used for data model generation is Gemini 2.5 Flash. Data sent to Google is neither stored on their servers nor used to train their models.
     

Limitations

  • The data model you're working on is stored only in your browser. For the moment, a data model generated with the Data Architect Agent cannot be accessed from another user’s browser. We recommend downloading your files regularly to ensure they're safely stored.
  • Multiple users in the same PIM environment can generate and store separate data models.
     

Please note that if you clear your browser’s data, you might lose the generated data model.

 

 

Implementing the model

When you are satisfied with your data structure, you can implement your model into your PIM to create your entities, using the ‘Implement’ button on the top right of the screen.

 

We highly recommend reviewing your data model with a Professional Services consultant or Partner before applying it to your PIM, to make sure it fits your business and the future evolution of your catalog.

 

 

What kind of context should I give to the AI model?

Our first recommendation is to be as precise as possible, so that the model fully grasps the characteristics of your catalog.

Here are the key characteristics that are essential when you start writing a prompt:

  • Be clear by providing precise context. The AI agent cannot know your intentions, so it is essential that you clearly state what needs to be created and why.
    • Give concrete details about the product data structure you have now and the one you want in the future, so that the model can generate a more accurate response to your context and input file (e.g. ‘I want to group these types of products in the same family’; ‘I want to have a dedicated family for my kids clothing items’; ‘I want the Shorts family to vary by color and size’).
    • Detail, if possible, how your product information should be organized using Akeneo entities (families, attributes and so on)
       
  • Include constraints by specifying naming conventions, business rules, or technical requirements.
    • Give concrete details about your current and desired taxonomy by providing specific examples from your existing product data (e.g. ‘All attribute labels should be in lowercase with an underscore separating words if needed.’).
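
A naming convention such as the lowercase-with-underscores example above is easier to state precisely, and to verify afterwards, if you can test it. The Python sketch below is one possible interpretation of that convention, restricted to ASCII letters and digits; the function names are hypothetical and the pattern is an assumption, not a rule enforced by the Data Architect Agent.

```python
import re

# Assumed reading of the convention: lowercase ASCII letters or digits,
# with single underscores between words.
CONVENTION = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def follows_convention(label):
    """Return True when the label already matches the assumed convention."""
    return CONVENTION.fullmatch(label) is not None


def to_convention(label):
    """Rewrite a label to the assumed convention.

    Any run of characters other than ASCII letters and digits is treated
    as a word separator, so accented characters are dropped.
    """
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "_".join(word.lower() for word in words)
```

With this reading, `follows_convention("frame_material")` is true, `follows_convention("Frame Material")` is false, and `to_convention("Frame Material")` gives `frame_material`. You could run such a check on the labels in the CSV files you download from the DAA.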

 

To learn more about prompting, check out our dedicated Akademy course. Additionally, your Professional Services consultant or Partner can help you create an accurate prompt for your needs.

 

 

We recommend the following structure:

  1. Overview of your business and distribution context
    Example: 
    I am a B2B distributor specializing in bicycle components and accessories. I distribute my products mainly in France, but I also sell in other European countries (Belgium, Germany).
     
  2. Overview of your product range, with technical details if needed. Future expansion needs are also welcome, if they apply to your business
    Example: 
    My families should represent the products intended for professional bike workshops (e.g., bulk spare parts, repair kits) and those for retailers selling to end consumers (e.g., packaged accessories, branded components).

    I also plan to expand my catalog in the future to include e-bike components and possibly outdoor cycling gear. The generated data model should be flexible enough to accommodate these additions.
     
  3. Detailed overview of your key products, with their essential characteristics that the AI model should take into account
    Example: 
    For technical products, details like compatibility (e.g., Shimano or SRAM drivetrain compatibility), material (e.g., aluminum, carbon, steel), and sizing (e.g., tire sizes 700x25c, 29x2.2) are crucial. For accessories, attributes like color, weight, and mounting system should be considered.
     
  4. Final details about your audience and customers
    Example: 
    Make sure the generated model aligns with the needs of B2B buyers who require structured, precise data to facilitate procurement and sales.
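
If you draft your context outside the PIM, the four-part structure above can be assembled with a small helper before pasting the result into the context box. This Python sketch is only an illustration: `build_context` is a hypothetical name, the optional `conventions` parameter is an addition for naming rules, and the sample text is taken from the examples in this article.

```python
def build_context(business, product_range, key_products, audience, conventions=""):
    """Join the sections into one context text, in the recommended order.

    Empty sections are skipped and sections are separated by a blank line.
    """
    sections = [business, product_range, key_products, audience, conventions]
    return "\n\n".join(part.strip() for part in sections if part.strip())


context = build_context(
    business=(
        "I am a B2B distributor specializing in bicycle components and "
        "accessories. I distribute my products mainly in France, but I also "
        "sell in other European countries (Belgium, Germany)."
    ),
    product_range=(
        "My families should represent the products intended for professional "
        "bike workshops and those for retailers selling to end consumers."
    ),
    key_products=(
        "For technical products, details like compatibility, material, and "
        "sizing are crucial."
    ),
    audience=(
        "Make sure the generated model aligns with the needs of B2B buyers "
        "who require structured, precise data."
    ),
    conventions=(
        "All attribute labels should be in lowercase with an underscore "
        "separating words if needed."
    ),
)
```

Keeping each section in its own variable makes it easier to revise one part of the context and regenerate the model without rewriting the rest.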

 

We also recommend including naming conventions and guidance on the format of the codes and/or labels that the model should generate.
