Software and Animation

raFTI: Matching "Chaotic" Wine Names
Hi, I'm Vit Glinca, a backend programmer at Deeplace, a company that actively works in winetech, among other areas. I'd like to present our latest feature in this area: raFTI.v5.3, a full-text search system.
When working with wine catalogs, one quickly encounters an unexpected problem. The same wine may appear under several completely different names. For example:
Barolo Riserva 2016
Barolo 2016 Riserva
Barolo DOCG Riserva 2016
Sometimes the differences are even more significant:
different languages;
abbreviations;
additional descriptive words;
typing errors.
As a result, what seems to be a simple search task turns into a text interpretation problem. Several years ago, I started experimenting with a system for matching wine names. Over time, it evolved into a rather unusual solution built around a large collection of heuristics and domain-specific rules. The system was eventually named raFTI.v5.3 (Relate Assemblies for Full Text Indexation).
The Chaos of Wine Names
Wine names often resemble loosely structured text rather than formal identifiers. They may contain: grape varieties / regions / wineries / wine styles / vintages / colors / …
Moreover, the order of these elements may vary. Sometimes part of the information is missing altogether. In other cases, marketing terms are added. As a result, multiple textual representations may refer to exactly the same wine.
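A naive baseline helps show where the difficulty starts. The sketch below is not part of raFTI; it assumes a hypothetical helper, order_free_key, that simply treats a name as an unordered set of lower-cased tokens. Reordering is absorbed, but a single added word such as DOCG already breaks the match.

```python
def order_free_key(name):
    """Lower-case the name and return its tokens as an unordered set."""
    return frozenset(name.lower().split())


names = [
    "Barolo Riserva 2016",
    "Barolo 2016 Riserva",
    "Barolo DOCG Riserva 2016",
]
keys = [order_free_key(n) for n in names]

# Pure reordering is absorbed by the set representation.
same_after_reordering = keys[0] == keys[1]

# The third variant differs only by an extra descriptive token.
extra_tokens = keys[2] - keys[0]
```

Here same_after_reordering is true, while extra_tokens contains the single token "docg", which is exactly the kind of difference that needs domain knowledge rather than string comparison.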
Anchors and Modifiers
Some terms in a name act as anchors, while others act as modifiers. Anchors define the main semantic points of the name, while modifiers refine them.
From Groups to Templates
The first versions of the system relied on simple term groups. For example:
grape varieties / regions / wineries / …
Over time, however, it became clear that wine names are not assembled randomly. Most of them follow recurring structural patterns. The next step was therefore to identify term subgroups and build name templates. For example:
[Winery] [Wine name] [Color] [Type]
or
[Region] [Grape] [Typology]
When the internal structure of a wine name is not explicitly visible, such templates begin to serve as matching anchors.
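As an illustration only, template matching can be sketched as follows. The tiny LEXICON, the helper names, and the order-insensitive comparison are all assumptions made for this example; the real dictionaries and matching rules are much richer.

```python
# Hypothetical mini-lexicon: token -> term type.
LEXICON = {
    "chianti": "Region",
    "barolo": "Region",
    "sangiovese": "Grape",
    "nebbiolo": "Grape",
    "riserva": "Typology",
}

# The two templates quoted in the text above.
TEMPLATES = [
    ("Winery", "Wine name", "Color", "Type"),
    ("Region", "Grape", "Typology"),
]


def type_sequence(name):
    """Map every token to its term type; unknown tokens become '?'."""
    return tuple(LEXICON.get(tok, "?") for tok in name.lower().split())


def matching_templates(name):
    """Return templates that contain the same term types, in any order."""
    seq = sorted(type_sequence(name))
    return [t for t in TEMPLATES if sorted(t) == seq]
```

With this sketch, both "Chianti Sangiovese Riserva" and "Riserva Chianti Sangiovese" resolve to the [Region] [Grape] [Typology] template, while a name with an unknown token matches nothing.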
Synonym Mania
One of the core ideas behind raFTI is the aggressive use of synonyms. Over time, this mechanism evolved into several distinct layers.
Primary Synonymy
At the token level, different forms of the same concept are treated as equivalent:
grammatical variations;
common spelling mistakes;
translations of individual words.
All such variants receive the same base index component.
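A minimal sketch of this layer, assuming a hypothetical BASE_FORM table that maps each surface form to a shared base component (the table contents and the function name are illustrative, not taken from raFTI):

```python
# Hypothetical variant table: surface form -> base index component.
BASE_FORM = {
    # translations of "red"
    "rosu": "red", "rosso": "red", "rouge": "red", "red": "red",
    # translations of "dry"
    "sec": "dry", "secco": "dry", "dry": "dry",
    # a common spelling mistake
    "cabarnet": "cabernet", "cabernet": "cabernet",
}


def base_components(name):
    """Replace each token with its base component; unknown tokens pass through."""
    return [BASE_FORM.get(tok, tok) for tok in name.lower().split()]
```

Under these assumptions, "Cabarnet ROSU sec" and "Cabernet Rosso Secco" produce the same component list, so they land on the same index entries.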
Secondary Synonymy
During index construction, additional synonyms are generated automatically:
word permutations;
alternative term combinations;
different name forms.
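Word permutations are the easiest of these to illustrate. The sketch below is an assumption about how such variants could be generated, not a description of raFTI's index builder; the max_tokens cap is a hypothetical guard against factorial growth on long names.

```python
from itertools import permutations


def permutation_variants(name, max_tokens=4):
    """Return all word orders of a short name as index-time synonyms.

    Names longer than max_tokens are returned unchanged, because the
    number of permutations grows factorially with the token count.
    """
    tokens = name.lower().split()
    if len(tokens) > max_tokens:
        return {" ".join(tokens)}
    return {" ".join(p) for p in permutations(tokens)}


variants = permutation_variants("Barolo Riserva 2016")
```

For a three-word name this yields six variants, including "barolo 2016 riserva".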
Tertiary Synonymy
This layer is derived from dictionaries and object catalogs:
alternative names;
historical names;
abbreviations.
Quaternary Synonymy
Some synonym relationships emerge naturally during data processing. For example, winery names frequently appear together with city names. Such associations may later be used as additional matching hints.
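One plausible way to collect such associations is simple co-occurrence counting. The sketch below assumes hypothetical (winery, city) pairs extracted from processed records and a hypothetical min_count threshold; the placeholder names are not real data.

```python
from collections import Counter


def cooccurrence_hints(pairs, min_count=2):
    """Keep (winery, city) pairs that were seen together at least min_count times."""
    counts = Counter(pairs)
    return {pair for pair, n in counts.items() if n >= min_count}


# Placeholder observations, for illustration only.
observed = [
    ("winery a", "city x"),
    ("winery a", "city x"),
    ("winery b", "city y"),
    ("winery a", "city z"),
]
hints = cooccurrence_hints(observed)
```

Only the pair seen twice survives the threshold, so a one-off coincidence does not become a matching hint.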
Semantic Triggers
Over time, it became clear that excessive synonymy creates side effects of its own. To control this process, a mechanism called semantic triggers was introduced.
A trigger is a contextual condition that allows or blocks the use of specific synonyms and/or activates a group of attributes. For example:
some synonyms are valid only in wine names;
others are valid only in region names;
some are allowed only near specific anchor terms;
the token "prosecco" automatically activates: color(white), co2(sparkling), country(Italy).
Over time, this mechanism evolved into a powerful semantic description tool in its own right.
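The two kinds of trigger can be sketched with plain dictionaries. The "prosecco" entry comes from the example above; the data layout, the function names, and the "st" to "saint" rule are assumptions made purely for illustration.

```python
# Attribute trigger from the example above; the dict layout is an assumption.
ATTRIBUTE_TRIGGERS = {
    "prosecco": {"color": "white", "co2": "sparkling", "country": "Italy"},
}

# Hypothetical context rule: (token, context) -> synonym valid only in that context.
CONTEXT_SYNONYMS = {
    ("st", "region"): "saint",
}


def activated_attributes(tokens):
    """Collect all attributes activated by trigger tokens in the name."""
    attrs = {}
    for tok in tokens:
        attrs.update(ATTRIBUTE_TRIGGERS.get(tok, {}))
    return attrs


def resolve(token, context):
    """Apply a synonym only if a trigger allows it in the given context."""
    return CONTEXT_SYNONYMS.get((token, context), token)
```

In this sketch, resolve("st", "region") returns "saint", while resolve("st", "wine") leaves the token untouched, which is the "allow or block" behavior described above.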
Decision Signature
To understand why the system chooses one match over another, every candidate is represented by a compact 11-digit diagnostic code. Each position reflects the contribution of a specific component:

FF NGV RMC SAU
Component	Meaning
FF	Final score
N	Name: completeness of the name template
G	Grape (high impact on score)
V	Vineyard, i.e. the winery (high impact on score)
R	Region (medium impact on score)
M	Modifiers for class (low impact on score)
C	Color (high impact on score)
S	Sparkling: CO₂ (high impact on score)
A	Anchor: class type (high impact on score)
U	Unique words (high impact on score)

A value of 3 is neutral; a value above 3 is a bonus, and a value below 3 is a penalty. Components marked (high impact on score) receive an additional penalty when their value falls below 3. Region is an important characteristic, but its impact is deliberately weakened because region data in real sources is heavily contaminated with errors and inaccuracies. Example:
(search sample) VIN CHATEAU VARTELY TARABOSTE ROSU SEC 0.75L
(comparison results with wines from the database)

score	vineyard	wine name
19 339 339 934	Chateau Vartely	Taraboste Pinot Noir
9 339 349 932	Chateau Vartely	Taraboste Pur Aristocratic Rosu
9 339 339 924	Chateau Vartely	Taraboste Reserva Cabernet Sauvignon & Merlot

(explanations) The first result received the maximum score for matching winery, color, and CO₂ (characteristics not reflected in the name are taken from the database). Grape varieties and region are not specified in the search sample and do not affect the result. A match on the unique word Taraboste is rewarded when the name contains no extra words. The second result was penalized for extra words. The third result was penalized for containing the important word Reserva (Anchor: class type), which is absent from the search sample.
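Reading such a code by hand takes some practice, so a small decoder can help. The sketch below is a hypothetical helper, not part of raFTI; it assumes the FF NGV RMC SAU layout described above and that a one-digit final score is printed without its leading zero.

```python
FIELDS = ["N", "G", "V", "R", "M", "C", "S", "A", "U"]
NEUTRAL = 3  # per the text: above 3 is a bonus, below 3 is a penalty


def parse_signature(code):
    """Split a signature such as '9 339 349 932' into named components."""
    digits = code.replace(" ", "").zfill(11)  # assumption: leading zero of FF dropped
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError(f"not an 11-digit signature: {code!r}")
    parsed = {"FF": int(digits[:2])}
    parsed.update({name: int(ch) for name, ch in zip(FIELDS, digits[2:])})
    return parsed


def deviations(code):
    """Label every non-neutral component as a bonus or a penalty."""
    parsed = parse_signature(code)
    return {
        name: "bonus" if parsed[name] > NEUTRAL else "penalty"
        for name in FIELDS
        if parsed[name] != NEUTRAL
    }
```

Applied to the second row of the table, deviations("9 339 349 932") reports bonuses for V, M, C, and S and a penalty for U, which agrees with the explanation that this candidate lost points for extra words.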
When a person makes a mistake
(search sample) DIVUS Rara Neagra rosu sec 0.75l, anul 2022
(comparison results with wines from the database)

score	vineyard	wine name
24 588 339 933	Divus Winery	Rara Neagra
12 582 369 933	Gitana Winery	Rara Neagra Rosu Sec

(explanations) At first glance, the second option seems convincing. When the words Rosu Sec are stated explicitly in the name, they serve as both characteristics and modifiers. As characteristics, they earn bonuses for both candidates; as modifiers, they earn a bonus only for the second candidate, but that candidate is penalized more heavily for not matching the winery. (The operator chose the second option based on the number of word matches.)
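To see where two candidates diverge, their signatures can be compared position by position. The helper below is a hypothetical diagnostic sketch that assumes the FF NGV RMC SAU layout and zero-pads a one-digit final score.

```python
FIELD_ORDER = "NGVRMCSAU"


def components(code):
    """Return the per-component digits of a signature as a dict."""
    digits = code.replace(" ", "").zfill(11)
    return dict(zip(FIELD_ORDER, map(int, digits[2:])))


def signature_diff(first, second):
    """Map each differing component to its (first, second) values."""
    a, b = components(first), components(second)
    return {k: (a[k], b[k]) for k in FIELD_ORDER if a[k] != b[k]}


diff = signature_diff("24 588 339 933", "12 582 369 933")
```

For the two rows above, only V and M differ: V drops from 8 to 2 (winery mismatch) while M rises from 3 to 6 (modifier match), which is the trade-off described in the explanation.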

Comparing Matchers
During development, raFTI was continuously compared against traditional full-text search engines. In particular, several experiments were performed using MySQL full-text indexes (MySQL.FTI). In practice, a number of characteristic differences emerged. Full-text search performs well when:
word order is stable;
identical terms are used;
the text structure is relatively simple.
Wine names, however, often violate all three assumptions. Typical issues include:
shuffled term order;
partially missing information;
additional descriptive words.
In such situations, conventional search tends to produce a large number of irrelevant candidates. raFTI addresses this problem through:
structural templates;
term typology;
category-specific bonuses and penalties.
As a result, the candidate space becomes significantly smaller and the matching results more stable. Numerous statistically significant comparisons of the matching results yield the following distribution of correct answers:

Matcher	Result
MySQL.FTI	40 % correct answers
raFTI.v5.3	96 % correct answers

The Anti-Combinatorial Effect
At first glance, a system built around numerous heuristics should suffer from combinatorial explosion. In practice, the opposite effect is observed. Real-world wine names are generated within a relatively small set of structural patterns. As a result, additional rules do not expand the search space. Instead, they help eliminate impossible combinations at an earlier stage. Rather than increasing complexity, many heuristics act as filters that progressively narrow the set of plausible candidates.
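The filtering view can be sketched in a few lines. Everything here is illustrative: the candidate records, the two rules, and the narrow helper are assumptions, chosen only to show how each added rule shrinks the candidate set instead of enlarging it.

```python
def narrow(candidates, filters):
    """Apply filters in order; return survivors and the set size after each stage."""
    sizes = [len(candidates)]
    for keep in filters:
        candidates = [c for c in candidates if keep(c)]
        sizes.append(len(candidates))
    return candidates, sizes


# Placeholder candidates, for illustration only.
candidates = [
    {"name": "A", "color": "red", "co2": "still"},
    {"name": "B", "color": "white", "co2": "still"},
    {"name": "C", "color": "red", "co2": "sparkling"},
    {"name": "D", "color": "white", "co2": "sparkling"},
]

# Hypothetical rules derived from a query for a still red wine.
filters = [
    lambda c: c["color"] == "red",
    lambda c: c["co2"] == "still",
]

survivors, sizes = narrow(candidates, filters)
```

The sizes list shrinks from 4 to 2 to 1, so each extra heuristic reduces the work left for later stages.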
The Core Idea Behind raFTI
raFTI is based on the Relate Assemblies (RA) methodology. Among other concepts, the methodology includes:
structural object indexing;
meta-description indexing;
multi-layer data integrity.
In particular, two complementary forms of referential integrity are used:
Forward Reference Integrity;
Reverse Reference Integrity.
The broader RA methodology is discussed in more detail elsewhere.
Final Thoughts
In the age of neural networks, raFTI may appear somewhat old-fashioned. That is probably true. Nevertheless, it remains an effective approach for domains where transparency, controllable knowledge representation, and predictable behavior are more important than raw statistical inference. Several ideas turned out to be considerably more useful than originally expected:
structural name templates;
multi-level synonymy;
semantic triggers;
compact decision signature.
This article describes only a subset of the mechanisms used by the system. Many implementation details were intentionally omitted to avoid turning the discussion into technical documentation. Practical use of the system continues to reveal new directions for development. In particular, the accumulated synonym data has proven valuable not only for improving recognition quality, but also for discovering missing relationships in the domain model and refining heuristics for rare or erroneous data forms.
Comments, criticism, alternative approaches, or simply a fresh perspective on the problem are always welcome.
TECHNOLOGIES CHANGE, THE CRYSTALLIZATION OF DATA DOES NOT
(Help, translations, graphics by ChatGPT)
