Ticket #7728 (closed Bug: fixed)

Opened 6 years ago

Last modified 6 years ago

Turning on "Link using UIDs" breaks indexing of rich text fields that contain Unicode characters outside the ASCII range (code points 128 and above)

Reported by: greenman Owned by: duncan
Priority: minor Milestone: 3.0.6
Component: Visual Editor Version:
Keywords: Cc: matt@…

Description

The problem is described here:

 http://groups.google.com/group/plone-users/browse_thread/thread/5cfb778fd4b5e454

See the second post for the resolution.

Plone 3.0.4, Kupu 1.4.7

Change History

comment:1 Changed 6 years ago by duncan

It sounds like the output transform is throwing an exception, but the Field.py code is swallowing the traceback. Could you replace the pdb.set_trace() with a raise and then post the resulting traceback?
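For comparison, here is a minimal Python 3 sketch of the alternative to swallowing the error: catch it, log the full traceback with the standard `logging` module, and fall back to an empty value so indexing can continue. `get_indexable_safely` is a hypothetical name; this is not the Archetypes Field.py code.

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("indexing-sketch")


def get_indexable_safely(get_value):
    """Hypothetical wrapper: call get_value() and, if it fails, log the
    exception together with its traceback instead of discarding it, then
    return '' so that indexing can carry on."""
    try:
        return get_value()
    except Exception:
        # logger.exception() logs at ERROR level and appends the traceback.
        logger.exception("could not build indexable value")
        return ""


# Encoding non-ASCII text with the ascii codec reproduces the reported error;
# the traceback goes to the log and the call returns ''.
print(repr(get_indexable_safely(lambda: "We\u2019ve".encode("ascii"))))
```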

comment:2 Changed 6 years ago by greenman

I think it's a little different from that. In getIndexable in Field.py there is the following line:

        f = self.get(instance)

If "Link using UIDs" is set, f is returned as unicode, e.g.:

u'\r\n<p>We\u2019ve come a long way since the Oxo stock cube or Maggi powder ....

This means the str() conversion in the getIndexable method will raise an exception, because str() on a unicode value implicitly encodes it with the default ASCII codec.

If "Link using UIDs" is unset, f = self.get(instance) returns a UTF-8 encoded byte string:

'\r\n<p>We\xe2\x80\x99ve come a long way since the Oxo stock cube or Maggi powder ....

str(f) handles this fine, since it is already a byte string.
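A minimal Python 3 sketch of the two cases, assuming that Python 2's str() on a unicode value behaves like encoding it with the default 'ascii' codec. `ascii_str` is a hypothetical stand-in for that Python 2 behaviour, not Archetypes code, and the sample text is the start of the field value quoted above.

```python
# Start of the rich text value with "Link using UIDs" on (text) and
# off (UTF-8 encoded bytes).
uid_on = "\r\n<p>We\u2019ve come a long way"
uid_off = uid_on.encode("utf-8")


def ascii_str(value):
    """Hypothetical stand-in for Python 2's str(): byte strings pass through
    unchanged, text is encoded with the default 'ascii' codec."""
    if isinstance(value, bytes):
        return value
    return value.encode("ascii")


print(ascii_str(uid_off)[:12])  # b'\r\n<p>We\xe2\x80\x99ve'

try:
    ascii_str(uid_on)
except UnicodeEncodeError as exc:
    # Same codec and position as the reported error (index 7 is U+2019).
    print(exc)
```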

The use of str() in this method feels wrong. ZCTextIndex is Unicode-aware, so why is it not:

unicode(f) 

and, later in the same method,

unicode(datastream)

The only exceptions raised are of this form: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128).
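One caveat with a bare unicode(f): in Python 2 it would in turn fail on the UTF-8 byte string produced when the option is off, because it decodes with the default ascii codec. A minimal Python 3 sketch of normalizing both cases to text, assuming the site charset is UTF-8; `to_index_text` is a hypothetical helper, not the Archetypes API.

```python
def to_index_text(value, charset="utf-8"):
    """Normalize a field value to text before handing it to a text index.

    Hypothetical helper: byte strings are assumed to be in the site charset
    (UTF-8 here) and are decoded explicitly; text passes through unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(charset)
    return str(value)


uid_on = "\r\n<p>We\u2019ve come a long way"
uid_off = uid_on.encode("utf-8")

# Both settings now yield the same text for the index.
print(to_index_text(uid_on) == to_index_text(uid_off))  # True
```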

comment:3 Changed 6 years ago by duncan

Yes, you already posted the exception, but not the traceback. That's why I asked if you could collect the traceback: so I could see the context of the exception.

Assuming you are correct, I guess I need to either change the transform so that it UTF-8 encodes its result, or get someone to change getIndexable. Clearly returning unicode doesn't break other things (like page rendering), and the indexing code really should be able to cope.

comment:4 Changed 6 years ago by greenman

OK. Since the exception just gets swallowed again further up if we re-raise it, here is the stack trace from pdb instead:

(Pdb) w   
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZServer/PubCore/ZServerPublisher.py(25)__init__()
-> response=b)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/Publish.py(401)publish_module()
-> environ, debug, request, response)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/Publish.py(202)publish_module_standard()
-> response = publish(request, module_name, after_list, debug=debug)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/pdbdebugmode/PDBDebugMode/__init__.py(47)pdb_publish()
-> mapply=mapply, )
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/Publish.py(119)publish()
-> request, bind=1)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/mapply.py(88)mapply()
-> if debug is not None: return debug(object,args,context)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/pdbdebugmode/PDBDebugMode/pdbzope/runcall.py(60)pdb_runcall()
-> return call_object(object, args, request)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/Publish.py(42)call_object()
-> result=apply(object,args) # Type s<cr> to step into published object.
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/FSControllerPageTemplate.py(90)__call__()
-> return self._call(FSControllerPageTemplate.inheritedAttribute('__call__'), *args, **kwargs)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/BaseControllerPageTemplate.py(28)_call()
-> return self.getNext(controller_state, REQUEST)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/ControllerBase.py(231)getNext()
-> return next_action.getAction()(controller_state)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/Actions/TraverseTo.py(38)__call__()
-> REQUEST, bind=1)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/mapply.py(88)mapply()
-> if debug is not None: return debug(object,args,context)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/Publish.py(42)call_object()
-> result=apply(object,args) # Type s<cr> to step into published object.
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/FSControllerPythonScript.py(106)__call__()
-> return self.getNext(result, self.REQUEST)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/ControllerBase.py(231)getNext()
-> return next_action.getAction()(controller_state)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/Actions/TraverseTo.py(38)__call__()
-> REQUEST, bind=1)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/mapply.py(88)mapply()
-> if debug is not None: return debug(object,args,context)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/ZPublisher/Publish.py(42)call_object()
-> result=apply(object,args) # Type s<cr> to step into published object.
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/FSControllerPythonScript.py(104)__call__()
-> result = FSControllerPythonScript.inheritedAttribute('__call__')(self, *args, **kwargs)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFFormController/Script.py(145)__call__()
-> return BaseFSPythonScript.__call__(self, *args, **kw)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFCore/FSPythonScript.py(140)__call__()
-> return Script.__call__(self, *args, **kw)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Shared/DC/Scripts/Bindings.py(313)__call__()
-> return self._bindAndExec(args, kw, None)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Shared/DC/Scripts/Bindings.py(350)_bindAndExec()
-> return self._exec(bound_data, args, kw)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFCore/FSPythonScript.py(196)_exec()
-> result = f(*args, **kw)
  /Users/matt/development/endev/hfm/hfm.site.development/Script (Python)(1)content_edit()
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFCore/FSPythonScript.py(140)__call__()
-> return Script.__call__(self, *args, **kw)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Shared/DC/Scripts/Bindings.py(313)__call__()
-> return self._bindAndExec(args, kw, None)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Shared/DC/Scripts/Bindings.py(350)_bindAndExec()
-> return self._exec(bound_data, args, kw)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFCore/FSPythonScript.py(196)_exec()
-> result = f(*args, **kw)
  /Users/matt/development/endev/hfm/hfm.site.development/Script (Python)(13)content_edit_impl()
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/Archetypes/BaseObject.py(655)processForm()
-> REQUEST=REQUEST, values=values)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/Archetypes/BaseObject.py(646)_processForm()
-> self.reindexObject()
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/Archetypes/CatalogMultiplex.py(114)reindexObject()
-> c.catalog_object(self, url, idxs=lst)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/CMFPlone/CatalogTool.py(386)catalog_object()
-> update_metadata, pghandler=pghandler)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/distros/CacheSetup/patch.py(96)catalog_object()
-> uid, idxs, update_metadata, pghandler)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/distros/CacheSetup/patch_utils.py(6)call()
-> return getattr(self, PATTERN % __name__)(*args, **kw)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/pdbdebugmode/PDBDebugMode/zcatalog.py(18)pdb_catalog_object()
-> update_metadata=update_metadata, pghandler=pghandler)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Products/ZCatalog/ZCatalog.py(535)catalog_object()
-> update_metadata=update_metadata)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Products/ZCatalog/Catalog.py(360)catalogObject()
-> blah = x.index_object(index, object, threshold)
  /Users/matt/development/endev/hfm/hfm.site.development/parts/zope2/lib/python/Products/ZCTextIndex/ZCTextIndex.py(187)index_object()
-> text = text()
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/Archetypes/BaseObject.py(545)SearchableText()
-> datum =  method()
  /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/Archetypes/Field.py(1170)<lambda>()
-> return lambda: self.getIndexable(instance)
> /Users/matt/development/endev/hfm/hfm.site.development/parts/plone/Archetypes/Field.py(1193)getIndexable()
-> str(f),

While UTF-8 encoding doesn't seem right, some uses of the catalog appear to work the same way. For example, entering Unicode text into Plone's site-wide search ends up with the following in SearchableText:

'SearchableText': 'We\xe2\x80\x99ve*'

This successfully matches indexed content (while "Link using UIDs" is off).

So while I think it's wrong that UTF-8 appears in SearchableText and is used to represent index values, it does seem to succeed in this common use case, i.e. site search of indexed attributes.

zope.formlib, however, does do the conversion, so in some of my search forms I get 'SearchableText': u'We\u2019ve' ..., which will (I assume) miss content indexed using UTF-8 encoded representations.
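The suspected mismatch can be illustrated with a toy term set in Python 3. This is a hypothetical stand-in, not the ZCTextIndex lexicon, and the trailing `*` wildcard from the site-search query is dropped. It only shows that a byte-string term and a text term never compare equal, so both sides must use the same representation.

```python
# Toy term index (not ZCTextIndex): content indexed as UTF-8 bytes,
# as happens with "Link using UIDs" off.
indexed_terms = {"We\u2019ve".encode("utf-8"), b"come", b"long"}

site_search_query = b"We\xe2\x80\x99ve"  # UTF-8 bytes, as from site-wide search
formlib_query = "We\u2019ve"             # text, as from a zope.formlib form


def matches(term, index):
    """Exact term lookup; bytes and text are distinct keys."""
    return term in index


print(matches(site_search_query, indexed_terms))              # True
print(matches(formlib_query, indexed_terms))                  # False
print(matches(formlib_query.encode("utf-8"), indexed_terms))  # True
```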

comment:5 Changed 6 years ago by duncan

  • Status changed from new to closed
  • Resolution set to fixed

Fixed in SVN revision 51112. The transform now UTF-8 encodes its result (if it was unicode), which seems to avoid upsetting the getIndexable method.
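A minimal Python 3 sketch of the behaviour described for the fix, where Python 2 `unicode` corresponds to Python 3 `str`. `encode_transform_result` is a hypothetical name; the actual change is the one committed in SVN revision 51112, not this code.

```python
def encode_transform_result(result, charset="utf-8"):
    """Sketch of the described fix: if the output transform produced text,
    encode it so that downstream str() calls see a byte string.
    Byte strings are returned unchanged."""
    if isinstance(result, str):
        return result.encode(charset)
    return result


transformed = "\r\n<p>We\u2019ve come a long way"
encoded = encode_transform_result(transformed)
print(encoded[:12])  # b'\r\n<p>We\xe2\x80\x99ve'
```

Because bytes pass through untouched, applying the step to output that is already encoded is harmless.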
