web planner - ignore apostrophe character in online search text

Andrew Heard shared this problem

In Progress

I was searching for "Saint John's Point, Donegal, Ireland". It does not appear in the list in the first screen capture because (apparently) I left out the apostrophe character (') - duh.

[screen capture]

When I type the exact name as shown on the map on the right-hand side (found with Google Maps, by the way), the online search puts the correct place at the top of the list.

[screen capture]

Other equivalent search terms would be:

St Johns Point
St John's Point

Replies (4)

Radim Večerek

Hello Andrew. This will sound silly, but the problem here is that we don't know that the name of the lighthouse is in English. The vast majority of names in the OSM dataset we use do not specify their language, so we cannot use language-specific mechanisms. The apostrophe is not the problem; that is treated correctly. The problem is that 'john' is not equal to 'johns' in general. In English, yes, like (rock, rocks); in other languages, no. The proper, world-wide solution is close to impossible, I am afraid. But some assumptions based on geography could be used in Europe, the USA, etc. That is a point to consider.
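
To make the 'john' versus 'johns' point concrete, here is a minimal sketch in Python. It is not the Locus search code: the `tokenize` helper and the choice to drop the possessive "'s" are assumptions made purely for illustration. It shows how two spellings can still produce different tokens after the apostrophe has been handled.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer, for illustration only."""
    # Assumption: the possessive "'s" is dropped, so "John's" becomes "john".
    text = re.sub(r"['\u2019]s\b", "", text.lower())
    # Split the remaining text into word tokens.
    return re.findall(r"\w+", text)


with_apostrophe = tokenize("Saint John's Point")
without_apostrophe = tokenize("Saint Johns Point")

print(with_apostrophe)
print(without_apostrophe)
# Without language-specific rules, the two token lists are compared literally.
print(with_apostrophe == without_apostrophe)
```

Under these assumptions the first query yields the token `john` and the second yields `johns`, so a literal token comparison treats them as different words unless something language-specific maps one onto the other.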

Andrew Heard

>The proper, world-wide solution is close to impossible

It would appear Google Maps has done the impossible?

Maybe this is my simplistic, English-speaking quick hack, but if the apostrophe character were removed from the search string, wouldn't the match have been made? Surely that is better than the current frustrating situation? Ironically (sadly), since the introduction of Locus online search in LM4.17 I've been using Google Maps more and more rather than less and less.

Radim Večerek

No, this simplistic hack a) is already done early in the chain of events anyway, and b) does not address the problem.

Andrew Heard

Radim, if that is already done, why are the search results for "Saint John's Point" very different from those for "Saint Johns Point"?

Radim Večerek

The problem here is the absence of what is called 'stemming' or 'lemmatization'. 'Boulders' has the same stem as 'boulder'; 'Johns' has the same stem as 'John'. This is of course done properly in cases where we know for certain that the language used is en, de, es, fr or one of a few others. But in most cases the language is unknown, because the mappers did not tell us specifically, so a trailing 's' is not recognized as just a plural form. I agree it would be appropriate to assume en is used in England and de is used in Germany. If we use OSM data, there is really no other option. But doing this language geo-localization, or even recognition from the searched string, in India or Africa? Very hard, imho. This problem is unfortunately much deeper than some 'bug' to fix. The first step in a solution: building a map of the most used languages per region. Then decisions like: yes, en in England is a safe assumption. India: better to leave this map blank. And so on.
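
The region-based idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Locus works: `REGION_LANGUAGE`, `stem_en`, and `stem_tokens` are hypothetical names, the region codes and their language assignments are example guesses, and the "stemmer" is a deliberately naive plural stripper rather than a real algorithm such as Porter or Snowball.

```python
# Hypothetical map of "safe" language assumptions per region code.
# Regions that are left out (for example "IN") get no stemming at all.
REGION_LANGUAGE = {"GB": "en", "US": "en", "IE": "en"}  # illustrative guesses


def stem_en(token: str) -> str:
    """Naive English plural stripping; a stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


# Only languages with a known stemmer are listed here.
STEMMERS = {"en": stem_en}


def stem_tokens(tokens: list[str], region: str) -> list[str]:
    """Stem tokens only when the region maps to a language we can handle."""
    stemmer = STEMMERS.get(REGION_LANGUAGE.get(region))
    if stemmer is None:
        # Unknown region or language: leave the tokens untouched.
        return list(tokens)
    return [stemmer(token) for token in tokens]


print(stem_tokens(["saint", "johns", "point"], "IE"))
print(stem_tokens(["saint", "johns", "point"], "IN"))
```

With the example map, 'johns' is reduced to 'john' for a place in a region assumed to be English-speaking, while the same tokens pass through unchanged for a region where the map is left blank.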
