{"id":148,"date":"2011-07-25T09:33:21","date_gmt":"2011-07-25T08:33:21","guid":{"rendered":"http:\/\/iamwcew\/blog\/?p=148"},"modified":"2011-07-25T09:33:21","modified_gmt":"2011-07-25T08:33:21","slug":"oracle-text-setup-for-people-search","status":"publish","type":"post","link":"https:\/\/gpmfactory.com\/index.php\/2011\/07\/25\/oracle-text-setup-for-people-search\/","title":{"rendered":"Oracle text setup for People Search"},"content":{"rendered":"<p>This post is a compilation of information about the use of Oracle Text.\u00a0The particular context of Name Matching is illustrated by some features extracted from the documentation. 28 June 2011 &#8211; Oracle V 0.1<\/p>\n<h2>Index Setup<\/h2>\n<h3>Index types<\/h3>\n<p>Two index types are suited to optimizing searches on text:<\/p>\n<ul>\n<li>CONTEXT<\/li>\n<li>CTXCAT<\/li>\n<\/ul>\n<p>In principle, CTXCAT is better suited to indexing short text fragments (names, for example), although this does not rule out using the first type, CONTEXT.<\/p>\n<p>The\u00a0CTXCAT\u00a0indextype is well-suited for indexing small text fragments and related information. 
If created correctly, this type of index can provide better structured query performance over a\u00a0CONTEXT\u00a0index.<\/p>\n<p>&nbsp;<\/p>\n<h3>The <em>lexer<\/em> and grammatical variations<\/h3>\n<p>When you create a CONTEXT index, you usually specify a set of preferences that governs the behavior of later searches, in particular with respect to grammatical rules.<\/p>\n<p>The most important preferences are:<\/p>\n<ul>\n<li>The list of stopwords <a href=\"http:\/\/download.oracle.com\/docs\/cd\/E11882_01\/text.112\/e16593\/cdatadic.htm\">(stoplist)<\/a><\/li>\n<li><a href=\"http:\/\/download.oracle.com\/docs\/cd\/E11882_01\/text.112\/e16593\/cdatadic.htm\">The lexer<\/a> (accented characters, how text is split into tokens, etc.)<\/li>\n<li>The stemmer: search for inflected forms (plural and conjugated forms), set through the <a href=\"http:\/\/download.oracle.com\/docs\/cd\/E11882_01\/text.112\/e16593\/cdatadic.htm\">BASIC_WORDLIST<\/a> preference<\/li>\n<\/ul>\n<p>For the stemmer, it is best to enable a multi-language stemmer. To check that it works, search for <em>cheval<\/em> (horse): the search should also return texts containing <em>chevaux<\/em> (horses). The default behavior depends on the database language.<\/p>\n<p>In Oracle Text, the stemming operator is $.<\/p>\n<p>The search above would be written as follows:<\/p>\n<p><span style=\"font-family: Courier New;\">SELECT * FROM test WHERE CONTAINS(espece, '$cheval') &gt; 0;<\/span><\/p>\n<p>As for the stoplist, keep in mind that the searches apply to proper names, so every word is significant. 
Whereas the article <em>de<\/em> is normally treated as a stopword, in a phone book it must remain searchable. The stoplist is therefore disabled for this index (for example by specifying <span style=\"font-family: Courier New;\">stoplist ctxsys.empty_stoplist<\/span> in the index parameters).<br \/>\nBelow is an extract from the Oracle Text documentation on the handling of inflected forms.<br \/>\nThe plural forms of a word are handled correctly for French (inflectional stemming).<\/p>\n<h3>Stemming<\/h3>\n<p>Stemming expands a search to include all words with the same linguistic root, by performing a morphological analysis of the word which allows for a search on both the root form of a word, and its inflected or derived forms.<\/p>\n<p>1. Inflectional Stemming: For all the supported languages, the stemmers return standard inflected forms of a word. In English, an inflection is a change in number, such as the plural form of a noun (dog \u2192 dogs), or a change in tense, such as the conjugated forms of a verb (to run \u2192 run, runs, running, ran).<br \/>\n2. Derivational Stemming: 
For English and French, the stemmer also returns standard derived forms of a word.<br \/>\nSearching for plural forms works correctly for French.<br \/>\nTo return to the horse example, a search on CHEVAL returns CHEVAL but also CHEVAUX, provided that the indexing options have been configured beforehand.<\/p>\n<h3>Example of an Oracle Text setup<\/h3>\n<p><span style=\"font-family: Courier New;\">create table ttext (lib varchar2(2000));<br \/>\nexec ctx_ddl.create_preference('basic_wtest', 'BASIC_WORDLIST')<br \/>\nexec ctx_ddl.set_attribute('basic_wtest', 'STEMMER', 'FRENCH')<br \/>\nexec ctx_ddl.set_attribute('basic_wtest', 'FUZZY_MATCH', 'FRENCH')<\/span><\/p>\n<p><span style=\"font-family: Courier New;\">create index ttext_i on ttext(lib) indextype is ctxsys.context ONLINE<br \/>\nparameters ('wordlist basic_wtest sync (on commit)');<\/span><\/p>\n<p><span style=\"font-family: Courier New;\">insert into ttext values ('cheval');<br \/>\ninsert into ttext values ('chevaline');<br \/>\ninsert into ttext values ('chevaux');<br \/>\n-- SYNC (ON COMMIT) indexes the new rows when they are committed<br \/>\ncommit;<\/span><\/p>\n<p>Checking that it works:<\/p>\n<p><span style=\"font-family: Courier New;\">select * from ttext<br \/>\nwhere contains (lib, '$cheval') &gt; 0;<\/span><\/p>\n<p>Result:<br \/>\n<span style=\"font-family: Courier New;\">LIB<br \/>\n----------<br \/>\ncheval<br \/>\nchevaux<\/span><\/p>\n<h3>Diacritical characters (accents, etc.)<\/h3>\n<p>Some languages contain characters with diacritical marks such as tildes, umlauts, and accents. When your indexing operation converts words containing diacritical marks to their base letter form, queries need not contain diacritical marks to score matches. 
For example in Spanish with a base-letter index, a query of\u00a0energ\u00eda\u00a0matches\u00a0energ\u00eda\u00a0and\u00a0energia\u00a0in the index.<\/p>\n<p>However, with base-letter indexing disabled, a query of\u00a0energ\u00eda\u00a0matches only\u00a0energ\u00eda.<\/p>\n<p>You can enable and disable base-letter indexing for your language with the\u00a0base_letter\u00a0attribute of the\u00a0BASIC_LEXER\u00a0preference type.<\/p>\n<h3>Fuzzy searches<\/h3>\n<p>Fuzzy matching enables you to match similarly spelled words in queries.<\/p>\n<p>Fuzzy matching and stemming are automatically enabled in your index if Oracle Text supports this feature for your language.<\/p>\n<p>Fuzzy matching is enabled with default parameters for its similarity score lower limit and for its maximum number of expanded terms. At index time you can change these defaults through the FUZZY_SCORE and FUZZY_NUMRESULTS attributes of the BASIC_WORDLIST preference.<\/p>\n<p>To improve the performance of stem queries, create a stem index by enabling the\u00a0index_stems\u00a0attribute of the\u00a0BASIC_LEXER.<\/p>\n<h3>Wildcard characters<\/h3>\n<p>Wildcard queries enable you to enter left-truncated, right-truncated and doubly truncated queries, such as\u00a0%ing,\u00a0cos%, or\u00a0%benz%. With normal indexing, these queries can sometimes expand into large word lists, degrading your query performance.<\/p>\n<p>Wildcard queries have better response time when token prefixes and substrings are recorded in the index.<\/p>\n<p>By default, token prefixes and substrings are not recorded in the Oracle Text index. If your query application makes heavy use of wildcard queries, consider indexing token prefixes and substrings. To do so, set the PREFIX_INDEX and SUBSTRING_INDEX attributes of a BASIC_WORDLIST preference. The trade-off is a bigger index for improved wildcard searching.<\/p>\n<p>The following example sets the wordlist preference for prefix and substring indexing. 
Having a prefix and sub-string component to your index improves performance for wildcard queries.<\/p>\n<p>For prefix indexing, the example specifies that Oracle Text create token prefixes between three and four characters long:<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"color: black; font-family: Courier New;\">begin<br \/>\nctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');<br \/>\nctx_ddl.set_attribute('mywordlist', 'PREFIX_INDEX', 'TRUE');<br \/>\nctx_ddl.set_attribute('mywordlist', 'PREFIX_MIN_LENGTH', '3');<br \/>\nctx_ddl.set_attribute('mywordlist', 'PREFIX_MAX_LENGTH', '4');<br \/>\nctx_ddl.set_attribute('mywordlist', 'SUBSTRING_INDEX', 'YES');<br \/>\nend;<br \/>\n\/<\/span><\/p>\n<p>The next block creates a lexer preference with base-letter conversion and a wordlist preference with French stemming and fuzzy matching:<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">begin<br \/>\n-- lexer: index accented words under their base-letter form<br \/>\nctx_ddl.create_preference('BASE_LETTER_PREF', 'BASIC_LEXER');<br \/>\nctx_ddl.set_attribute('BASE_LETTER_PREF', 'BASE_LETTER', 'YES');<br \/>\n-- wordlist: French stemming and fuzzy matching<br \/>\nctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST');<br \/>\nctx_ddl.set_attribute('STEM_FUZZY_PREF', 'FUZZY_MATCH', 'FRENCH');<br \/>\n-- FUZZY_SCORE 0 keeps all fuzzy matches; FUZZY_NUMRESULTS 5000 allows the maximum number of expansions<br \/>\nctx_ddl.set_attribute('STEM_FUZZY_PREF', 'FUZZY_SCORE', '0');<br \/>\nctx_ddl.set_attribute('STEM_FUZZY_PREF', 'FUZZY_NUMRESULTS', '5000');<br \/>\nctx_ddl.set_attribute('STEM_FUZZY_PREF', 'STEMMER', 'FRENCH');<br \/>\nend;<br \/>\n\/<\/span><\/p>\n<p>A multi-language variant uses a MULTI_STOPLIST, automatic stemming and fuzzy matching, and a French sub-lexer:<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">begin<br \/>\nctx_ddl.create_stoplist('liste_stop', 'MULTI_STOPLIST');<br \/>\nctx_ddl.create_preference('WL_MULTI', 'BASIC_WORDLIST');<br \/>\nctx_ddl.set_attribute('WL_MULTI', 'STEMMER', 'AUTO');<br \/>\nctx_ddl.set_attribute('WL_MULTI', 'FUZZY_MATCH', 'AUTO');<br \/>\nctx_ddl.create_preference('FRENCH_BASIC_LEXER', 'BASIC_LEXER');<br \/>\nctx_ddl.set_attribute('FRENCH_BASIC_LEXER', 'BASE_LETTER', 'YES');<br \/>\n-- assumes GLOBAL_LEXER already exists as a MULTI_LEXER preference<br \/>\nctx_ddl.add_sub_lexer('GLOBAL_LEXER', 'f', 'FRENCH_BASIC_LEXER');<br \/>\nend;<br \/>\n\/<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"margin-left: 36pt;\">The following code forces the index to be rebuilt with settings specific to French. It modifies the WKSYS.WK_WORDLIST and WKSYS.WK_BASIC_LEXER preferences used by Oracle Ultra Search, then rebuilds the WK_TEST.WK$DOC_PATH_IDX index with them.<\/p>\n<p><span style=\"font-family: Courier New;\">BEGIN<br \/>\nctx_ddl.set_attribute('WKSYS.WK_WORDLIST', 'STEMMER', 'FRENCH');<br \/>\nEND;<br \/>\n\/<\/span><\/p>\n<p><span style=\"font-family: Courier New;\">BEGIN<br \/>\nctx_ddl.set_attribute('WKSYS.WK_BASIC_LEXER', 'INDEX_THEMES', 'YES');<br \/>\nctx_ddl.set_attribute('WKSYS.WK_BASIC_LEXER', 'CONTINUATION', '-');<br \/>\nctx_ddl.set_attribute('WKSYS.WK_BASIC_LEXER', 'THEME_LANGUAGE', 'FRENCH');<br \/>\nEND;<br \/>\n\/<br \/>\nALTER INDEX WK_TEST.WK$DOC_PATH_IDX<br \/>\nREBUILD PARAMETERS ('REPLACE LEXER WKSYS.WK_BASIC_LEXER<br \/>\nWORDLIST WKSYS.WK_WORDLIST');<\/span><\/p>\n<p>&nbsp;<\/p>\n<h4>Syntax for creating a text index:<\/h4>\n<p>&nbsp;<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">CREATE INDEX [schema.]index ON [schema.]table(txt_column)<br \/>\n<\/span><\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\"> INDEXTYPE IS ctxsys.context [ONLINE]<br \/>\n<\/span><\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\"> [FILTER BY filter_column[, filter_column]...]<br \/>\n<\/span><\/p>\n<p 
style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\"> [ORDER BY oby_column[desc|asc][, oby_column[desc|asc]]...]<br \/>\n<\/span><\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\"> [LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]<br \/>\n<\/span><\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">[, PARTITION [partition] [PARAMETERS('paramstring')]])]<br \/>\n<\/span><\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE]];<br \/>\n<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3>Examples of search syntax<\/h3>\n<div>\n<table style=\"border-collapse: collapse;\" border=\"0\">\n<colgroup>\n<col style=\"width: 119px;\" \/>\n<col style=\"width: 155px;\" \/>\n<col style=\"width: 235px;\" \/>\n<col style=\"width: 85px;\" \/><\/colgroup>\n<tbody valign=\"top\">\n<tr style=\"background: #cccccc;\">\n<td style=\"border: outset 0.75pt; padding: 2px;\">Query<\/td>\n<td style=\"border-top: outset 0.75pt; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">Example predicate (native syntax)<\/p>\n<\/td>\n<td style=\"border-top: outset 0.75pt; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">Comments<\/p>\n<\/td>\n<td style=\"border-top: outset 0.75pt; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">Operator type<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with rice<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p 
style=\"text-align: justify;\">Riz<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">single word<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\"><\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with sardines<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">$sardine<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">stemming operator ($) to search for inflected forms (plural forms in this case)<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">expansion<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with the verb fr\u00e9mir<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">$fr\u00e9mir<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">stemming operator to search for conjugated forms (<em>inflectional stemming<\/em>). 
<em>Derivational stemming<\/em> does not seem to be supported for French.<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">expansion<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with radishes but WITHOUT butter<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">radis NOT beurre<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">NOT operator<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">boolean<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with radishes AND butter<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">radis &amp; beurre<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">&amp; or AND operator<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">boolean<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes preferably containing both radishes and butter<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">radis,beurre<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">the comma (ACCUM operator) acts on the score: recipes containing both terms score higher, so ordering by score returns them first.<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">score<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with radishes, preferably without butter<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">radis - beurre<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">the hyphen (MINUS operator) should rank first the recipes with radishes but without butter. 
(The example did not seem to work!)<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">score<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes containing the fragment sardi<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">%sardi%<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">the wildcard character is % and not *<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">expansion<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with the terms cuill\u00e8re and arachide fairly close to each other<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">cuill\u00e8re;arachide<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">NEAR proximity operator, or its shorthand ;<br \/>\nmaximum span of 100 words by default<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: 
justify;\">proximity<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with the terms cuill\u00e8re and arachide close to each other<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">NEAR((cuill\u00e8re,arachide), 10, TRUE)<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">proximity operator<br \/>\nthe terms must fall within a span of at most 10 words, in the order given<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">proximity<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">Recipes with pomme or pome<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">?pome<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">fuzzy operator, for misspelled words or words with missing letters<\/p>\n<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 2px;\">\n<p style=\"text-align: justify;\">expansion<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<h3>Name matching<\/h3>\n<p>Someone accustomed to the spelling rules of one culture can have difficulty applying those same rules to a name originating from a
different culture. Name matching provides a solution to match proper names that might differ in spelling due to orthographic variation. It also enables you to search for somewhat inaccurate data, such as might occur when a record&rsquo;s first name and surname are not properly segmented.<\/p>\n<p>ndata_thesaurus<\/p>\n<p style=\"margin-left: 36pt;\">Specify the name of the thesaurus used for alternate name expansion. The indexing engine expands names in documents using synonym rings in the thesaurus. Use the homographic disambiguation feature of the thesaurus to distinguish common nicknames.<\/p>\n<p style=\"margin-left: 36pt;\">An example is:<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">Albert<br \/>\nSYN Al<br \/>\nSYN Bert<br \/>\nAlfred<br \/>\nSYN Al<br \/>\nSYN Fred<\/span><\/p>\n<p style=\"margin-left: 36pt;\">A simple definition such as the above puts Albert, Alfred, Al, Bert, and Fred into the same synonym ring. This causes an unexpected expansion: the expansion of Bert includes Fred. To prevent this, you can use homographic disambiguation as in:<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">Albert<br \/>\nSYN Al (Albert)<br \/>\nSYN Bert (Albert)<br \/>\nAlfred<br \/>\nSYN Al (Alfred)<br \/>\nSYN Fred (Alfred)<\/span><\/p>\n<p style=\"margin-left: 36pt;\">This forms two synonym rings, Albert-Al-Bert and Alfred-Al-Fred. Thus, the expansion of Bert no longer includes Fred. 
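<\/p>\n<p>To make the synonym-ring behavior concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not the way Oracle Text stores or expands a thesaurus: each entry is assumed to be a headword with its SYN terms, a qualified term such as Al (Albert) is treated as a node distinct from Al (Alfred), and the helpers build_rings, surface and expand are hypothetical names.<\/p>

```python
# Toy model of thesaurus synonym rings (an illustration, not Oracle Text code).
# A qualified term such as 'Al (Albert)' is a node distinct from 'Al (Alfred)'.


def build_rings(entries):
    # Union-find: every SYN term joins the ring of its headword.
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for head, syns in entries.items():
        find(head)
        for syn in syns:
            parent[find(syn)] = find(head)

    rings = {}
    for term in list(parent):
        rings.setdefault(find(term), set()).add(term)
    return list(rings.values())


def surface(term):
    # 'Al (Albert)' becomes 'Al': the qualifier only disambiguates the node.
    return term.split(' (')[0]


def expand(word, rings):
    # Every surface form sharing a ring with a node spelled like word.
    result = set()
    for ring in rings:
        if any(surface(t) == word for t in ring):
            result |= {surface(t) for t in ring}
    return result


plain = {'Albert': ['Al', 'Bert'], 'Alfred': ['Al', 'Fred']}
qualified = {
    'Albert': ['Al (Albert)', 'Bert (Albert)'],
    'Alfred': ['Al (Alfred)', 'Fred (Alfred)'],
}

print(sorted(expand('Bert', build_rings(plain))))
print(sorted(expand('Bert', build_rings(qualified))))
```

<p>With the plain entries, the shared term Al links the two headwords into one ring, so Bert expands to Al, Albert, Alfred, Bert and Fred. With the qualified entries there are two rings, and Bert expands only to Al, Albert and Bert. In this model, expanding the bare word Al still reaches both rings, because it matches both qualified nodes.<\/p>\n<p style=\"margin-left: 36pt;\">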
A more detailed example is:<\/p>\n<p style=\"margin-left: 36pt;\"><span style=\"font-family: Courier New;\">begin<br \/>\n ctx_ddl.create_preference('NDATA_PREF', 'BASIC_WORDLIST');<br \/>\n ctx_ddl.set_attribute('NDATA_PREF', 'NDATA_ALTERNATE_SPELLING', 'FALSE');<br \/>\n ctx_ddl.set_attribute('NDATA_PREF', 'NDATA_BASE_LETTER', 'TRUE');<br \/>\n -- assumes a thesaurus named NICKNAMES has already been loaded<br \/>\n ctx_ddl.set_attribute('NDATA_PREF', 'NDATA_THESAURUS', 'NICKNAMES');<br \/>\nend;<br \/>\n\/<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>A sample thesaurus for names can be found in the\u00a0$ORACLE_HOME\/ctx\/sample\/thes directory, in the file\u00a0dr0thsnames.txt.<\/p>\n<p>&nbsp;<\/p>\n<p>ndata_join_particles<\/p>\n<p style=\"margin-left: 36pt;\">Specify a list of colon-separated name particles that can be joined with a name that follows them. A name particle, such as da, is written separately from or joined with its following name, as in da Vinci or daVinci. When the indexing engine finds a name particle specified in this preference, it generates index data for both the separated and the joined versions of the name. The same happens during query processing, for better recall.<\/p>\n<h3>NDATA<\/h3>\n<p>Use the\u00a0NDATA\u00a0operator to find matches that are spelled in a similar way or where rearranging the terms of the specified phrase is useful. 
It is helpful for finding more accurate results when there are frequent misspellings (or inaccurate orderings) of name data in the document set. This operator can be used only on defined\u00a0NDATA\u00a0sections. The\u00a0NDATA\u00a0syntax enables you to rank the result set so that documents that contain words with high orthographic similarity are scored higher than documents with lower similarity.<\/p>\n<p>Normalization<\/p>\n<p>A lexer does not process\u00a0NDATA\u00a0query phrases. Users can, however, set base letter and alternate spelling attributes for a particular section group containing\u00a0NDATA\u00a0sections. Query case is normalized and non-character data (except for white space), such as digits or punctuation, is removed.<\/p>\n<p><a name=\"sthref1082\"><\/a>Syntax<\/p>\n<p><span style=\"font-family: Courier New;\">ndata(sectionname, phrase [,order][,proximity])<\/span><\/p>\n<div>\n<table style=\"border-collapse: collapse;\" border=\"0\">\n<colgroup>\n<col style=\"width: 85px;\" \/>\n<col style=\"width: 94px;\" \/>\n<col style=\"width: 426px;\" \/><\/colgroup>\n<tbody valign=\"top\">\n<tr>\n<td style=\"border: outset 0.75pt; padding: 3px;\" valign=\"bottom\">Parameter Name<\/td>\n<td style=\"border-top: outset 0.75pt; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\" valign=\"bottom\">Default Value<\/td>\n<td style=\"border-top: outset 0.75pt; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\" valign=\"bottom\">Parameter Description<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">sectionname<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\"><\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">Specify the name of a defined\u00a0NDATA\u00a0section 
to query (that is, section_name)<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">phrase<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\"><\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\"><p>Specify the phrase for the name data query.<\/p>\n<p>The phrase parameter can be a single word or a phrase, or a string of words in free text format.<\/p>\n<p>The score returned is a relevance score.<\/p>\n<p>Oracle Text ignores any query operators that are included in\u00a0phrase.<\/p>\n<p>The phrase should be a minimum of two characters in length and should not exceed 4000 characters in length.<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">order<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">NOORDER<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\"><p>Specify whether individual tokens (terms) in a query should be matched in-order or in any order. 
The order parameter provides a primary filter for matching candidate documents.<\/p>\n<p>ORDER\u00a0or\u00a0O\u00a0&#8211; The query terms are matched in-order.<\/p>\n<p>NOORDER\u00a0or\u00a0N\u00a0[DEFAULT] &#8211; The query terms are matched in any order.<\/td>\n<\/tr>\n<tr>\n<td style=\"border-top: none; border-left: outset 0.75pt; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">proximity<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\">NOPROXIMITY<\/td>\n<td style=\"border-top: none; border-left: none; border-bottom: outset 0.75pt; border-right: outset 0.75pt; padding: 3px;\"><p>Specify whether the proximity of terms should influence the similarity score of candidate matches. That is, if the proximity parameter is enabled, non-matching additional terms between matching terms will reduce the similarity score of candidate matches.<\/p>\n<p>PROXIMITY\u00a0or\u00a0P\u00a0&#8211; The similarity score is influenced by the proximity of query terms in candidate matches.<\/p>\n<p>NOPROXIMITY\u00a0or\u00a0N\u00a0[DEFAULT] &#8211; The similarity score is not influenced by the proximity of query terms in candidate matches.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>This post is a compilation of information about the use of Oracle Text.\u00a0The particular context of Name Matching is illustrated by some features 
extracted&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,22,17],"tags":[],"ppma_author":[150],"class_list":["post-148","post","type-post","status-publish","format-standard","hentry","category-dev","category-francais","category-setup"],"authors":[{"term_id":150,"user_id":1,"is_guest":0,"slug":"admin8700","display_name":"Patrick","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/209d5ed69b74d288390621ab4c1d3773?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/posts\/148","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/comments?post=148"}],"version-history":[{"count":0,"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/posts\/148\/revisions"}],"wp:attachment":[{"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/media?parent=148"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/categories?post=148"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/tags?post=148"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/gpmfactory.com\/index.php\/wp-json\/wp\/v2\/ppma_author?post=148"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}