Ticket #9209 (closed Bug: fixed)

Opened 3 years ago

Last modified 2 years ago

CMFDiffTool with double-byte character

Reported by: terapyon
Owned by: alecm
Priority: minor
Milestone: 4.0
Component: Versioning
Keywords: diff double-byte Japanese
Cc:

Description

The diff view ("difftool view") shows strange characters because it splits a double-byte character into two separate pieces.
Changing the code as follows would handle double-byte languages better.

TextDiff.py (Plone 3.3), current code:

    def html_diff(self, context=True, wrapcolumn=40):
        """Return an HTML table showing differences"""
        a = [safe_utf8(i) for i in self._parseField(self.oldValue)]
        b = [safe_utf8(i) for i in self._parseField(self.newValue)]

TextDiff.py (Plone 3.3), proposed change:

    from Products.CMFDiffTool.utils import safe_utf8, safe_unicode
    def html_diff(self, context=True, wrapcolumn=40):
        """Return an HTML table showing differences"""
        a = [safe_unicode(i) for i in self._parseField(self.oldValue)]
        b = [safe_unicode(i) for i in self._parseField(self.newValue)]



TextDiff.py (Plone 3.2.1), current code:

    def html_diff(self, context=True, wrapcolumn=40):
        """Return an HTML table showing differences"""
        a = [str(i) for i in self._parseField(self.oldValue)]
        b = [str(i) for i in self._parseField(self.newValue)]

TextDiff.py (Plone 3.2.1), proposed change:

    def html_diff(self, context=True, wrapcolumn=40):
        """Return an HTML table showing differences"""
        a = [i.decode('utf-8', 'replace') for i in self._parseField(self.oldValue)]
        b = [i.decode('utf-8', 'replace') for i in self._parseField(self.newValue)]

Change History

comment:2 Changed 3 years ago by alecm

I understand the issue with the 3.2 version, though it is unlikely there will be any further 3.2 releases. The patch you provide for 3.2 is not acceptable because the field value may already be unicode. For 3.3 I don't understand why the current use of safe_utf8 causes issues. That method simply uses safe_unicode (as your patch does), followed by an explicit encode into UTF-8. Is there some problem with performing the UTF-8 conversion? Unfortunately, passing unicode directly into the template is likely to cause problems in some configurations.

Could you provide a test that demonstrates the issue for the Plone 3.3, 0.5 branch of CMFDiffTool?
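The objection to the 3.2 patch (the value may already be unicode) and the described relationship between the two helpers can be illustrated with a minimal sketch. It is written for Python 3, where `bytes` and `str` stand in for Python 2's `str` and `unicode`. The names `to_text` and `to_utf8` are hypothetical and are not the actual `safe_unicode` / `safe_utf8` code in CMFDiffTool.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only when it is still bytes.

    Hypothetical stand-in for a "safe unicode" helper; undecodable
    bytes become U+FFFD instead of raising.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    # Already text (or some other object): never call .decode() on it.
    return str(value)


def to_utf8(value):
    """Normalize to text first, then encode explicitly as UTF-8 bytes."""
    return to_text(value).encode("utf-8")


already_text = "日本語"
still_bytes = already_text.encode("utf-8")

# Both input types end up as the same text value.
same_text = to_text(already_text) == to_text(still_bytes)

# Both input types end up as the same UTF-8 byte string.
same_bytes = to_utf8(already_text) == to_utf8(still_bytes)
```

An unconditional `i.decode('utf-8', 'replace')` skips the type check: under Python 2 it fails on a unicode value containing non-ASCII characters, because the value is implicitly encoded as ASCII first.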

comment:3 Changed 3 years ago by alecm

BTW, if you need this addressed for Plone 3.2, you can simply use the latest 0.5 release of CMFDiffTool with Plone 3.2 (installed manually or by updating the version in buildout). If the current 0.5 release doesn't address this issue, please provide a test or details on how to reproduce the problem. I think 0.5 should handle this issue since the template should be rendering utf-8 correctly, unless you are using some alternate encoding for your template output, which AFAIR is not really supported in Plone.

comment:4 Changed 2 years ago by alecm

  • Status changed from new to closed
  • Resolution set to fixed
  • Milestone changed from 3.x to 4.x

Fixed using the above patch for Plone 4.0 only. It appears that difflib in Python 2.4 does not support unicode strings because of its internal use of cStringIO, so the fix is not appropriate for any release requiring Python 2.4. The difflib HTML table generator appears to split multi-byte characters; using unicode directly fixes this in Python 2.5 and 2.6.
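The splitting described in this comment can be reproduced in isolation. The sketch below is written for Python 3 (`bytes` plays the role of a Python 2 UTF-8 `str`), and `wrap_at` is a hypothetical helper that only imitates a fixed-column wrap; it is not difflib's own wrapping code. The last lines pass text to `difflib.HtmlDiff` with a small `wrapcolumn`.

```python
import difflib


def wrap_at(value, column):
    """Cut a sequence into fixed-width pieces (hypothetical helper)."""
    return [value[i:i + column] for i in range(0, len(value), column)]


text = "日本語のテキスト"      # 8 characters
raw = text.encode("utf-8")    # 24 bytes, 3 bytes per character

# Wrapping the encoded bytes at column 4 cuts the second character in
# the middle; decoding each piece then yields replacement characters.
byte_pieces = [piece.decode("utf-8", "replace") for piece in wrap_at(raw, 4)]

# Wrapping the text itself counts characters, so nothing is split.
text_pieces = wrap_at(text, 4)

# Passing text lines to difflib's HTML table generator.
old_lines = [text]
new_lines = ["日本語の文章"]
table = difflib.HtmlDiff(wrapcolumn=4).make_table(old_lines, new_lines)
```

Here `byte_pieces[0]` is the first character followed by U+FFFD, while `text_pieces` rejoins to the original string unchanged.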

comment:5 Changed 2 years ago by hannosch

  • Milestone changed from 4.x to 4.0