Ticket #9497 (closed Bug: fixed)

Opened 5 years ago

Last modified 4 years ago

PortalTransforms scalability problems (performance improvement, a patch for _findPath)

Reported by: sig Owned by: hannosch
Priority: critical Milestone: 3.3.5
Component: Archetypes Version:
Keywords: archetypes, portaltransforms, _findPath, _getPaths, scalability, performance, optimization Cc:

Description

Products.PortalTransforms.TransformEngine has a terrible _findPath method in terms of performance and scalability (especially against the number of mimetypes supported by transforms). For instance, if you have an image transformer which supports, let's say, 15 mimetypes in input and outputs, then the current naive algorithm will do about 15! (factorial 15 = 1.307.674.368.000) loops.

Here is a discussion thread mentioning the horror in _findPath/_getPaths :  http://plone.org/support/forums/general#nabble-td310736

Here is a patch which proposes 2 levels of optimization :

1) simple optimization heuristics (early path pruning) in _getPaths (changes the semantic of _getPaths a bit but this shouldn't matter)

2) a better shortest-path finding algorithm in _findPath (which, BTW, does not call _getPaths any more and _getPaths might then be removed)

If ever you don't want to deal with the proposed path-finding algorithm, please note that you may ignore this second level of optimization (changes in _findPath) and just commit the first one (changes in _getPaths). The improvement in scalability and performance will already be significant.

What do you think?

Attachments

PortalTransforms-patchWithoutHeaders.diff Download (7.5 KB) - added by sig 5 years ago.
A patch for PortalTransforms.TransformEngine, which improves its scalability and performance
patch.diff Download (11.6 KB) - added by sig 4 years ago.
new revision of the proposed patch, now fixed and contains a unit test.

Change History

Changed 5 years ago by sig

A patch for PortalTransforms.TransformEngine, which improves its scalability and performance

comment:1 Changed 5 years ago by esrever_otua

Thank god. This has bitten me on a stock-ish Plone 3.2.2 site with 7 image_to_whatever transforms registered. Pages took 30+ seconds to render because of _getPaths

comment:2 Changed 4 years ago by hannosch

  • Owner changed from nouri to hannosch
  • Status changed from new to assigned

comment:3 Changed 4 years ago by hannosch

Applied the patch as is to PortalTransforms trunk in  http://dev.plone.org/archetypes/changeset/12017. This will make it into the 2.0b1 release.

comment:4 Changed 4 years ago by hannosch

  • Status changed from assigned to closed
  • Resolution set to fixed

Backported to the maintenance branch in  http://dev.plone.org/archetypes/changeset/12018 and released as part of version 1.6.5.

comment:5 Changed 4 years ago by hannosch

  • Status changed from closed to reopened
  • Resolution fixed deleted

I had to revert the patch from both trunk and the maintenance branch.

Even in some simple cases the new algorithm didn't provide correct results. We got a test failure in linkintegrity showing that the reStructuredText transform wasn't called anymore. This is a rather simple transform chain from:

text/x-rst -> text/html -> text/x-html-safe

With the first done by the rest transform and the second done by the safe-html transform. The new algorithm only found the first step, but failed at the second.

comment:6 Changed 4 years ago by sig

Crap. There is an obvious bug in this patch, sorry. I am on it.

Details : We are iterating over typesToStartFrom (line 402 of the patched version) while removing items from this typesToStartFrom list (line 421). I am checking whether the removal line (typesToStartFrom.remove(startingType)) can simply be removed or if something else is needed. I keep you informed.

comment:7 Changed 4 years ago by hannosch

Cool. Since the algorithm isn't quite straight forward, it would be good to get a decent test coverage of it. Something simple that just tests the correct results for a given set of transforms and various transform paths. I think test_engine would be a good place for that.

Changed 4 years ago by sig

new revision of the proposed patch, now fixed and contains a unit test.

comment:8 Changed 4 years ago by sig

Here (attachment above) is a revised version of the patch. It fixes the previous bug it had (see my comment above). And it also contains a unit test. This unit test was successful when run against the original version and against the patched version as well (which is just a performance optimization after all, and performance is not test in this unit test). This unit test fails with the first buggy version of this patch. Other PortalTransforms and linkintegrity tests also succeed with this new version of the patch. It should be fine. If not, the unit test will help capturing any further regression.

Please confirm this patch is now OK for you, too.

comment:9 Changed 4 years ago by hannosch

Awesome, thanks for the new patch and tests! I applied the new patch to trunk.

This time I'll give it a bit of exposure in trunk and backport things to the stable branch used for Plone 3.3 after it has seen some more testing.

comment:10 Changed 4 years ago by esrever_otua

Backport to 3.3 can't happen soon enough. A relatively standard Plone 3.3.2 install will take multiple tens of seconds for page saving because of this bug.

comment:11 Changed 4 years ago by hannosch

  • Priority changed from minor to critical

There's a new version of PortalTransforms (1.6.7) for Plone 3.3 with these changes backported. Plone 3.3.4 doesn't come with this yet.

I'd appreciate some feedback from someone trying this out in real sites.

comment:12 Changed 4 years ago by cillian

 http://pypi.python.org/pypi/Products.PortalTransforms doesn't list 1.6.7 are you waiting for feedback before doing a release?

comment:13 Changed 4 years ago by cillian

Ah sorry, I see it is released. Thanks.

comment:14 Changed 4 years ago by hannosch

  • Status changed from reopened to closed
  • Resolution set to fixed

It's all released now and will also be part of Plone 3.3.5.

Note: See TracTickets for help on using tickets.