Ticket #5231 (closed Bug: fixed)

Opened 10 years ago

Last modified 7 years ago

Non-ASCII characters aren't indexed correctly

Reported by: jammon Owned by: hannosch
Priority: critical Milestone: 2.5.1
Component: General Version:
Keywords: Cc:


The search box in Plone 2.1.2 returns no results when the search phrase contains non-ASCII characters. While typing, everything works up to and including the non-ASCII character. As soon as the next character is typed, the search reports "no matching results found."

You can try it at plone.org. Type "hä" into the search box and you get a list of pages, including that of PrimaGIS 0.4.0 by Kai Hänninen. But when you continue to "hän", no results are found.

Change History

comment:1 Changed 10 years ago by hannosch

  • Priority changed from minor to critical
  • Component changed from Search to Catalog
  • Summary changed from Search box fails with non-ASCII characters to Non-ASCII characters aren't indexed correctly

The real problem here is that non-ASCII characters like 'ä' aren't indexed correctly. A word split is introduced after them, so for 'Hänninen' the catalog ends up with two words, 'Hä' and 'nninen', and a search for the full name matches neither. For some reason the metadata is fine though.
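One plausible way to get a split like this is a splitter that sees the UTF-8 bytes of 'ä' as two single-byte characters, the second of which is not a word character. The sketch below only illustrates that mechanism in modern Python; the `split_words` helper and the Latin-1 misreading are assumptions for illustration, not the catalog's actual splitter or a confirmed cause.

```python
import re

def split_words(text):
    # Hypothetical splitter: a word is any run of Unicode word characters.
    return re.findall(r"\w+", text)

name = "Hänninen"

# Assumed failure mode: the UTF-8 bytes for 'ä' (0xC3 0xA4) are read as two
# Latin-1 characters. 0xC3 is a letter, 0xA4 is a currency sign, so the
# second byte acts as a word break.
misread = name.encode("utf-8").decode("latin-1")

print(split_words(name))     # one word when the text is proper Unicode
print(split_words(misread))  # two fragments when the bytes are misread
```

With proper Unicode input the name stays a single token; with the misread bytes it breaks into a short prefix and 'nninen', which matches the symptom described above.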

comment:2 Changed 10 years ago by alecm

  • Milestone changed from 2.5.x to 2.5.1

comment:3 Changed 10 years ago by alecm

  • Owner changed from somebody to hannosch

comment:4 Changed 10 years ago by hannosch

  • Status changed from new to closed
  • Resolution set to fixed

(In [10362]) Added a workaround for erroneous indexing behavior for words containing non-ascii characters. These were treated as word breaks so far. The code works for a site encoding of 'utf-8' now as well as proper unicode usage. This closes #5231.
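The general idea of handling both an encoded byte string and proper unicode can be sketched as follows. This is not the code from [10362]; `to_text`, `index_words`, and the `site_encoding` parameter are hypothetical names, and the sketch simply assumes that byte input is decoded with the site encoding before words are split.

```python
import re

def to_text(value, site_encoding="utf-8"):
    # Decode byte strings with the (assumed) site encoding; pass text through.
    if isinstance(value, bytes):
        return value.decode(site_encoding)
    return value

def index_words(value, site_encoding="utf-8"):
    # Split only after the value is real text, so 'ä' counts as a letter.
    return re.findall(r"\w+", to_text(value, site_encoding).lower())

print(index_words("Kai Hänninen".encode("utf-8")))  # byte string input
print(index_words("Kai Hänninen"))                  # unicode input
```

Both calls produce the same word list, so a search for "hän" can match a prefix of a complete word.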

comment:5 Changed 7 years ago by hannosch

  • Component changed from Catalog to Infrastructure

comment:6 Changed 4 years ago by davisagli

  • Component changed from Infrastructure to General