Ticket #9328 (closed PLIP: wontfix)

Opened 5 years ago

Last modified 3 years ago

content im-/export

Reported by: csenger Owned by: csenger
Priority: minor Milestone: 4.x
Component: General Version:
Keywords: Cc: plip-advisories@…, myroslav@…, grahamperrin@…, piv@…, cshenton, koval@…, chervol@…, kroman0@…, duffyd, sargo, mylanium

Description (last modified by csenger) (diff)

Motivation

Content ex-/import is an important functionality for different tasks, e.g. using a dedicated editing site during development and transferring the content into a production site without risking migration issues. It is also important if an in-site migration is not possible. This might be the case for a Plone release after 4.x. Currently used solutions include

All of these solutions have their problems and are incomplete, underdocumented, difficult to set up or not flexible enough.

Definitions

transmogrifier vocabulary

pipeline
A sequence of sections that is processed.
section
A section consists of a blueprint and optional configuration variables
blueprint
A class that provides ISectionBlueprint and implements ISection. In fact it is just a callable that returns an object implementing __iter__, so it can be used with Python's iteration protocol.
source
A blueprint that reads in data that will be used by another blueprint in the pipeline. There can be more than one source where the second source injects new items into the pipeline.
constructor
A blueprint that reads the data and constructs an object.
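
The vocabulary above can be illustrated with a small sketch in plain Python. It does not import collective.transmogrifier: the classes ExampleSource and UppercaseTitle and the helper run_pipeline are hypothetical names made up for illustration. Only the constructor arguments (transmogrifier, name, options, previous) and the use of __iter__ follow the shape real blueprints have.

```python
class ExampleSource:
    """Hypothetical source: passes on upstream items, then injects new ones."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        self.size = int(options.get("size", 3))

    def __iter__(self):
        # Items from earlier sections come first, so a second source
        # can add to what the first one produced.
        for item in self.previous:
            yield item
        for i in range(self.size):
            yield {"_path": "/item-%d" % i, "title": "item %d" % i}


class UppercaseTitle:
    """Hypothetical transform: changes each item and hands it on."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous

    def __iter__(self):
        for item in self.previous:
            item["title"] = item["title"].upper()
            yield item


def run_pipeline(sections):
    """Chain (blueprint, options) pairs; each section wraps the previous one."""
    pipeline = iter(())  # the first section starts from an empty iterator
    for index, (blueprint, options) in enumerate(sections):
        pipeline = blueprint(None, "section %d" % index, options, pipeline)
    return list(pipeline)


items = run_pipeline([(ExampleSource, {"size": "2"}), (UppercaseTitle, {})])
```

Because every section only iterates over the one before it, items flow through the whole pipeline one at a time and nothing has to be held in memory between sections.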

Proposal

This PLIP aims to provide a solution for Plone that

  • can be used to export the out-of-the-box and most add-on content types
  • is extensible so add-ons can add ex-/import data that cannot be covered by a generic solution
  • is ready to use for an administrator out-of-the-box
  • is integrated into the control panel
  • can be used by developers to write a custom import for external data

Why a Proposal for Plone 4?

  • It should be the canonical ex-/import mechanism that add-on developers extend if the generic part does not cover enough data.
  • With dexterity and plone.app.content, there are other ways than archetypes to construct content. It seems impossible to support them and maintain the code outside of Plone core.
  • It's regularly requested, and import is one of the problems people face when it comes to migrating external content to Plone.

It could just as well be added in a later Plone 4.x release, since it does not need changes to Plone core and doesn't introduce backward incompatibility, but I submit it for Plone 4.0 to begin with.

Assumptions

The im-/export system covered by this PLIP handles only archetypes content and a few special cases like comments. Generic blueprints for handling zope 3 schemata are not part of this PLIP.

Implementation

The export will be implemented with collective.transmogrifier. The main reasons are that it is extensible and fast, and that most of the necessary blueprints are already implemented in plone.app.transmogrifier, quintagroup.transmogrifier and collective.blueprint.translationlinker. These include handling of Archetypes and ATCT + topics and their criteria (a port of gxml I think), references, comments, translation links, browser defaults and workflow state.

quintagroup.transmogrifier already implements a working ex-/import into a tarball. It uses the atxml handler from Products.Marshall to export an archetypes object to xml (example output). To write and read the data, it uses GenericSetup's TarballExportContext and TarballImportContext (with two small monkey patches). The structure of the tarball is similar to that of the generic setup content im-/export step and contains folders and xml files:

  • structure/
    • .objects.xml
      <?xml version="1.0" ?>
      <manifest> ... <record type="Document">front-page</record> ...
    • .properties.xml
      Xml produced with GenericSetup's propertymanager support and contains properties like default_page.
    • front-page/
      • .marshall.xml
        See atxml's example output.
  • news/
    • ...
    • aggregator/
      • .properties.xml
      • .objects.xml
      • .marshall.xml
      • criteffective_ATSortCriterion/
        • .marshall.xml
      • ...
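
As a rough illustration of such a layout, the following sketch writes a few files into an in-memory tarball with Python's standard tarfile module. It is a stand-in, not GenericSetup's TarballExportContext; the function names build_export_tarball and list_tarball and the shortened file contents are assumptions made for the example.

```python
import io
import tarfile


def build_export_tarball(files):
    """Write a {path: text} mapping into an in-memory gzipped tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)  # the size must be set before adding
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_tarball(raw):
    """Return the member names of a gzipped tarball given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as archive:
        return archive.getnames()


# Shortened, illustrative contents; real files would hold the full
# manifest and the atxml output.
example_files = {
    "structure/.objects.xml": (
        '<?xml version="1.0" ?>\n'
        '<manifest><record type="Document">front-page</record></manifest>\n'
    ),
    "structure/front-page/.marshall.xml": "<!-- atxml output goes here -->\n",
}
names = list_tarball(build_export_tarball(example_files))
```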

The work on this PLIP is split into two major steps:

  1. Get a reliable, complete, hard wired content im-/export for an out-of-the-box plone site
  2. Make the system flexible enough to support add-on products and maybe TTW configuration of the export process.

1. Out-of-the-box Plone im-/export

This already supports add-ons as long as all information is saved in archetype schemata.

  1. Review the existing blueprints
  2. see what information we additionally need to export and write the missing blueprints
  3. write a pipeline configuration for import and for export that works within a Plone version.
  4. write a utility and a basic export control panel
  5. Get all used packages into the collective or the plone repository where they can be maintained.

2. Flexibility to support add-ons and configuration

A transmogrifier pipeline consists of many sections, where every section defines the blueprint to use and a number of configuration variables.

>>> exampleconfig = """\
... [transmogrifier]
... pipeline =
...     section 1
...     section 2
...     
... [section 1]
... blueprint = collective.transmogrifier.tests.examplesource
... size = 5
... 
... [section 2]
... blueprint = collective.transmogrifier.tests.exampletransform
... """
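
To make the relation between the pipeline option and the section definitions concrete, here is a sketch that reads the same kind of configuration with Python's standard configparser. collective.transmogrifier has its own configuration loading, so this is only an approximation; read_pipeline is a hypothetical helper name.

```python
import configparser

EXAMPLE_CONFIG = """\
[transmogrifier]
pipeline =
    section 1
    section 2

[section 1]
blueprint = collective.transmogrifier.tests.examplesource
size = 5

[section 2]
blueprint = collective.transmogrifier.tests.exampletransform
"""


def read_pipeline(text):
    """Return (section name, options) pairs in the order the pipeline lists them."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # The pipeline option is a multi-line value with one section name per line.
    names = [
        line.strip()
        for line in parser["transmogrifier"]["pipeline"].splitlines()
        if line.strip()
    ]
    return [(name, dict(parser[name])) for name in names]


sections = read_pipeline(EXAMPLE_CONFIG)
```

All option values come back as strings, so a blueprint that needs a number (like size here) has to convert it itself.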

We split the configuration into PloneTransmogrifierConfigProviders. They provide

  • information for the user interface (Title, Description)
  • one or more sections together with information
    • which kind of blueprint the section contains (source, transformer, writer; reader, transformer, constructor)
    • the priority of the section (like init scripts) within the group

The utility that composes the pipeline can then order the sections it receives from different ConfigProviders without knowing more about them. If an add-on registers a ConfigProvider, it can be integrated into the pipeline with a low chance to break the export.
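
A minimal sketch of how such a composing utility might order sections is shown below. Nothing here exists yet: SectionInfo, ConfigProvider, compose_pipeline, the group names and the example.blueprints.* identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed group order for an export pipeline, following the kinds listed above.
EXPORT_GROUPS = ("source", "transformer", "writer")


@dataclass
class SectionInfo:
    name: str
    blueprint: str
    group: str          # one of EXPORT_GROUPS
    priority: int = 50  # lower runs earlier within its group
    options: dict = field(default_factory=dict)


@dataclass
class ConfigProvider:
    title: str
    description: str
    sections: list


def compose_pipeline(providers, groups=EXPORT_GROUPS):
    """Order all sections by group first, then by priority within the group."""
    sections = [s for provider in providers for s in provider.sections]
    return sorted(
        sections,
        key=lambda s: (groups.index(s.group), s.priority, s.name),
    )


core = ConfigProvider(
    "Core content",
    "Walks the site and writes the archive.",
    [
        SectionInfo("walk-site", "example.blueprints.sitewalker", "source", 10),
        SectionInfo("write-archive", "example.blueprints.writer", "writer", 90),
    ],
)
addon = ConfigProvider(
    "Ratings add-on",
    "Adds rating data to each item.",
    [SectionInfo("export-ratings", "example.blueprints.ratings", "transformer")],
)

ordered = [s.name for s in compose_pipeline([addon, core])]
```

The registration order of the providers does not matter here; the add-on section ends up between the core source and the core writer purely because of its group.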

Why not have one config provider per available blueprint?

  1. One or more sections (blueprints) are bound together if they do one thing at different points in the pipeline. An example is one blueprint that reads the information which object is the canonical version of a translation and a second blueprint that links the objects together after they were constructed by another blueprint.
  2. One blueprint can also be used several times, like one that transforms parts based on a regular expression, so more than one PloneTransmogrifierConfigProvider can use the same blueprint.

Configurability

PloneTransmogrifierConfigProviders can also be used to give the user the option to disable or configure certain tasks. Every provider could contain a zope schema to display an edit form with an option to disable it. Whether this generally makes sense has to be explored.

Another option would be to write a set of configurable filter blueprints that let the user choose, e.g., the set of content types that are removed before the export archive is generated or the imported data is written to the database.
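
Such a filter could look roughly like the following sketch, written in plain Python without importing transmogrifier. The class name ContentTypeFilter and the option name exclude-types are hypothetical, and the sketch assumes that each item carries its portal type under the key _type.

```python
class ContentTypeFilter:
    """Hypothetical filter section: drops items of the configured types."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # One type name per line, so names with spaces ("News Item") work.
        self.excluded = {
            line.strip()
            for line in options.get("exclude-types", "").splitlines()
            if line.strip()
        }

    def __iter__(self):
        for item in self.previous:
            if item.get("_type") in self.excluded:
                continue  # the item never reaches later sections
            yield item


incoming = [
    {"_path": "/front-page", "_type": "Document"},
    {"_path": "/news/launch", "_type": "News Item"},
    {"_path": "/events/sprint", "_type": "Event"},
]
options = {"exclude-types": "News Item\nEvent"}
kept = list(ContentTypeFilter(None, "filter", options, iter(incoming)))
```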

Risks

The key component for reading/writing archetypes content is atxml from the Products.Marshall package. This package is kept in a working state, but is not well maintained. The unit tests of the package are not working. This seems to be an acceptable risk, as it has been the case for a long time and the package seems to be used by many people.

The package might not be finished within the 4.0 release cycle. Besides the glue code, there are lots of details to be implemented and tested. But it's no problem to introduce the package in a later Plone 4.x release.

Deliverables

  • Consolidate blueprint packages
  • A plone package that contains the configuration backend and the control panel
  • ConfigProviders partly in the plone package, partly in external packages that implement the blueprints
  • Unit tests
  • Developer and end user documentation

Participants

Carsten Senger (csenger)

Progress and further information

See PlipContentImExport

Change History

comment:1 follow-up: ↓ 2 Changed 5 years ago by davisagli

Overall, this is a fairly well-thought-out PLIP. And the problem it tackles is certainly one that gets lots of attention (fixing it is the second most popular feature request at  http://plone.uservoice.com)! A few comments:

  • I'd like to hear more about what format will be produced by this export and whether it will be something that can be consumed by other systems besides Plone.
  • It would be good to check with someone who knows GenericSetup well (maybe wichert?) to make sure this isn't reinventing any wheels that are already part of our codebase.
  • I would encourage you to create wireframes of the UI that will support this and seek feedback from real-life integrators and Plone site admins.
  • I would not like to keep the import/exporter separate from the Plone egg rather than introducing new interdependencies.
  • I agree that this may be too ambitious for the Plone 4.0 timeframe, but could probably go in a 4.1 or later minor release.

comment:2 in reply to: ↑ 1 Changed 5 years ago by csenger

  • Status changed from new to assigned
  • Owner set to csenger
  • Description modified (diff)

Thanks for your comments.

Replying to davisagli:

  • I'd like to hear more about what format will be produced by this export and whether it will be something that can be consumed by other systems besides Plone.

I added information to the "Implementation" section.

  • It would be good to check with someone who knows GenericSetup well (maybe wichert?) to make sure this isn't reinventing any wheels that are already part of our codebase.

I will bring up the topic if there are major design decisions. I did not write it explicitly in the initial proposal, but kudos go to the quintagroup developers that did the biggest part of the job already. They use GenericSetup components and atxml for major tasks so their package does a lot while being surprisingly slim and easy to understand. I hope they contribute to the integration work that this PLIP is aimed at.

  • I would encourage you to create wireframes of the UI that will support this and seek feedback from real-life integrators and Plone site admins.

I thought about doing that for the PLIP during initial planning. What held me back from doing that is the flexibility of transmogrifier. Data that is collected by one blueprint can be transformed by a second, filtered by a third and so on. I have not yet worked out to what extent we can configure an export process, and will follow your advice before implementing a configuration schema.

  • I would not like to keep the import/exporter separate from the Plone egg rather than introducing new interdependencies.

I will try to keep the number of packages small and move code to plone.app.* where possible.

comment:3 Changed 5 years ago by elvix

a big +1 to this. If Transmogrifier can be at the core of the import/export story, we have great flexibility for migrating from other systems to Plone too. Adding a roundtrip XML formatted export/import and usable UI on top of it would rock!

We will need a reliable default export/import story to migrate Plone sites to Plone 5 as there seems to be no intention for in-place upgrades. If all migrations will depend on it, it should get as much field-practice as possible before that. Jarn will support this. We did the original Transmogrifier.

comment:4 Changed 5 years ago by erikrose

  • Owner csenger deleted

Clearing Owner field of 4.0 PLIPs so we can use it to mean "implementor". (Many of these owners were automatically assigned from choosing a Component that had a default owner.)

comment:5 Changed 5 years ago by alecm

  • Status changed from assigned to new
  • Owner set to csenger

+1

This is an important and ambitious PLIP. In the not unlikely event that it isn't fully baked for Plone 4.0, it would certainly make sense for inclusion later in the 4.x series.

comment:6 follow-ups: ↓ 7 ↓ 16 Changed 5 years ago by optilude

I really really want this, and I really really hope Carsten will champion this. :-)

A few concerns just as I'm reading it:

  • Depending on GenericSetup's tarball contexts (and monkey patches?) seems a bit fragile. Probably better to fork these and bring them into the package if need be.
  • Storing hidden files like .objects.xml is very obtuse. On unix, people can't see these files. Rather we call them _objects.xml, which is a discouraged-or-invalid name in Zope so we won't get collisions, but they show up at the top of a folder listing.

I'd also consider how we deal with partial import and export. It's going to be quite common to import/export a particular folder and its children, for example.

comment:7 in reply to: ↑ 6 Changed 5 years ago by csenger

Replying to optilude:

I really really want this, and I really really hope Carsten will champion this. :-)

No, just wrote the PLIP cause I want it too ;)

A few concerns just as I'm reading it:

  • Depending on GenericSetup's tarball contexts (and monkey patches?) seems a bit fragile. Probably better to fork these and bring them into the package if need be.

I think the patch can go upstream. I'd like to avoid forking it unless we need functionality that is incompatible with GS. The nice thing about using ExportContexts from GS is that you can replace the generic setup content step with the transmogrifier-based one and export the content along with the other configuration into one tarball.

ExportContexts have a simple interface and implementation that mostly abstracts the writing with writeDataFile(...) from the storage implementation (tarball, snapshot, directory). We only use this method, nothing else.

I'll add an item to the list to implement an alternative writer with no Plone dependencies, to which we can switch anytime.
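
A sketch of what such an alternative writer might look like: a plain directory writer that offers the same writeDataFile(filename, text, content_type, subdir=None) call that export contexts provide, so a writer blueprint could use either. The class name DirectoryWriter is hypothetical, and the content type is accepted but not stored.

```python
import os
import tempfile


class DirectoryWriter:
    """Hypothetical stand-in for an export context, using only the stdlib."""

    def __init__(self, root):
        self.root = root

    def writeDataFile(self, filename, text, content_type, subdir=None):
        # content_type is part of the call signature but unused here.
        if subdir is None:
            target_dir = self.root
        else:
            target_dir = os.path.join(self.root, subdir)
        os.makedirs(target_dir, exist_ok=True)
        data = text.encode("utf-8") if isinstance(text, str) else text
        with open(os.path.join(target_dir, filename), "wb") as handle:
            handle.write(data)


with tempfile.TemporaryDirectory() as root:
    writer = DirectoryWriter(root)
    writer.writeDataFile(
        "_objects.xml", "<manifest/>", "text/xml", subdir="structure/front-page"
    )
    target = os.path.join(root, "structure", "front-page", "_objects.xml")
    with open(target, "rb") as handle:
        written = handle.read()
```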

  • Storing hidden files like .objects.xml is very obtuse. On unix, people can't see these files. Rather we call them _objects.xml, which is a discouraged-or-invalid name in Zope so we won't get collisions, but they show up at the top of a folder listing.

That's a good idea. They probably adopted it from the GS content handler.

I'd also consider how we deal with partial import and export. It's going to be quite common to import/export a particular folder and its children, for example.

I agree that it's one of the important options. I'll create a wiki page to maintain a list of desired configuration options and other details and add it.

comment:8 Changed 5 years ago by smcmahon

  • Cc plip-advisories@… added

comment:9 Changed 5 years ago by MatthewWilkes

FWT Vote: +1

comment:10 Changed 5 years ago by rossp

FWT vote: -1

My biggest concern here is the impact on add-on developers incurred by introducing a new migration requirement that fights against the merits of the ZODB. So I'm strongly opposed to developing a migration infrastructure that requires add-on developers to not only make the necessary updates to their Zope/Plone code but also to integrate with and maintain support for the new migration infrastructure.

I also think there's significant risk in defining an export/import format island. There seems to be noise out in the CMS world about import/export/interchange formats. Now surely these discussions will be lengthy and won't come to fruition any time soon, but I do think effort might be better placed in participating in those discussions. In particular, I'm worried about introducing a content import/export infrastructure that developers and integrators would start investing in that might subsequently be replaced by something that is less of an island.

comment:11 follow-up: ↓ 12 Changed 5 years ago by elvix

Ross, the cool thing about using transmogrifier for the export/import is that the framework supports all formats. We are already using it on Oracle, MySQL and differently formatted static sites. By having export/import an integral and maintained part of Plone it'll be simpler to add multiple importers to the fray. Adding a Plone-specific xml dialect for roundtrip export-import will just give us the content dump and reload possibility all RDBMS-using CMSes have (and use) already…

comment:12 in reply to: ↑ 11 ; follow-up: ↓ 13 Changed 5 years ago by rossp

Replying to elvix:

Well then I'll change my FWT vote to +1 since this would be very valuable. But I'm strongly opposed to putting another requirement on add-on developers and fighting the strengths of the ZODB by making Plone migration depend on this.

comment:13 in reply to: ↑ 12 Changed 5 years ago by elvix

Replying to rossp:

Well then I'll change my FWT vote to +1 since this would be very valuable. But I'm strongly opposed to putting another requirement on add-on developers and fighting the strengths of the ZODB by making Plone migration depend on this.

The dependency on export/import for migrations is a future (Plone 5 PLIPs) discussion IMO. Migrations will *probably* have to depend on this from Plone 4 to Plone 5 since there is no in-place migration story between those major versions.

comment:14 Changed 5 years ago by davisagli

FWT vote: a big +1.

comment:15 Changed 5 years ago by calvinhp

FWT Vote: +1

comment:16 in reply to: ↑ 6 Changed 5 years ago by bohdan_koval

  • Cc koval.bogdan@… added

Replying to optilude:

I'd also consider how we deal with partial import and export. It's going to be quite common to import/export a particular folder and its children, for example.

Partial export and import can be easily done, because transmogrifier is implemented as an adapter for the IFolderish interface and can be called in any folderish context.

Another good thing about a transmogrifier-based ex-/import system is that it can be well tested. A transmogrifier pipeline consists of small sections, which can be individually tested. When I was writing the quintagroup.transmogrifier package it was very easy to create unit tests for all blueprints.

comment:17 Changed 5 years ago by csenger

I added PlipContentImExport to the wiki to record the progress and collect suggestions and information

comment:18 Changed 5 years ago by csenger

  • Description modified (diff)

Add link to PlipContentImExport to Description

comment:19 Changed 5 years ago by raphael

FWT Vote: +1 although I think that can just as well be an independent add-on

comment:20 Changed 5 years ago by interra

  • Cc myroslav@… added

comment:21 Changed 5 years ago by esteele

Approved by FWT vote.

comment:22 Changed 5 years ago by csenger

  • Status changed from new to assigned

comment:23 Changed 5 years ago by grahamperrin

  • Cc grahamperrin@… added

comment:24 Changed 5 years ago by mj

Downsides are: Transmogrifier has no final release, no end user interface or documentation and is complex.

Both collective.transmogrifier 1.0 and plone.app.transmogrifier 1.0 were released last weekend at the Bristol Balloon Sprint. Both have extensive developer documentation.

I'd prefer to see a simple XML format for storing transmogrifier pipeline keys and values, where only simple python types (unicode strings, encoded strings, integers, floats, and such) are stored, perhaps in a compressed file format.
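
As a rough sketch of that idea, the following uses only the standard library to store one pipeline item's keys and simple values as XML and gzip the result. The element and attribute names (item, field, name, type) and the helper names are made up for the example; no such format has been agreed on in this ticket.

```python
import gzip
import xml.etree.ElementTree as ET

# Only simple types are supported, mirroring the restriction suggested above.
CASTS = {"str": str, "int": int, "float": float}


def item_to_xml(item):
    """Serialize a flat dict of simple values to XML bytes."""
    root = ET.Element("item")
    for key in sorted(item):
        value = item[key]
        type_name = type(value).__name__
        if type_name not in CASTS:
            raise TypeError("unsupported value type: %s" % type_name)
        node = ET.SubElement(root, "field", name=key, type=type_name)
        node.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def xml_to_item(raw):
    """Rebuild the dict, converting each value back to its recorded type."""
    root = ET.fromstring(raw)
    return {
        node.get("name"): CASTS[node.get("type")](node.text or "")
        for node in root
    }


item = {"_path": "/front-page", "title": "Welcome", "position": 3, "score": 0.5}
compressed = gzip.compress(item_to_xml(item))
restored = xml_to_item(gzip.decompress(compressed))
```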

comment:25 Changed 5 years ago by piv

  • Cc piv@… added

comment:26 Changed 5 years ago by cshenton

  • Cc cshenton added

comment:27 Changed 5 years ago by esteele

  • Milestone changed from 4.0 to 4.x

comment:28 Changed 4 years ago by limi

  • Component changed from Unknown to Infrastructure

comment:29 follow-up: ↓ 32 Changed 4 years ago by tomster

Here's a (belated) update on the progress that csenger and I made on this PLIP during the Cathedral Sprint in Cologne.

We have decided on moving forward using quintagroup.transmogrifier since it currently offers the most complete solution of all the approaches evaluated. Our assessment was that while it seems a bit 'rough' and doesn't have a lot of test coverage it is the only solution that currently actually produces workable results. OOTB we were able to export a sample Plone 3 site and re-import it into a vanilla Plone 3 site, however importing that export into a vanilla Plone 4 site did not work.

We then spent the remainder of the sprint working on improving quintagroup.transmogrifier so that importing into Plone 4 would be possible. Since the code lives in quintagroup's read-only repository and we wanted to get started immediately, we created a  clone on github and worked on a  Plone 4 branch, where we now have complete tests for exporting and importing a plone site.

Additionally, we added some fixes to  Products.Marshall (including moving some of quintagroup's fixes in their Marshaller, which we then could remove there).

Long story short: we are now in a state where we can successfully export a Plone 3 site using quintagroup.transmogrifier and then import it into a vanilla Plone 4 site :-)

The next step would be to create releases of Products.Marshall and quintagroup.transmogrifier. I will contact their owners and request a review of our work and a release.

comment:30 follow-up: ↓ 31 Changed 4 years ago by limi

Nice, didn't know that you were working on this too at the Cathedral sprint.

Could we ask Quintagroup to donate it to the Foundation? If we're going to do this, it should be in the Plone repository — or at the very least in the Collective. Private repositories are painful. :)

Thanks for the update!

comment:31 in reply to: ↑ 30 Changed 4 years ago by tomster

Replying to limi:

Could we ask Quintagroup to donate it to the Foundation?

you just did... all three developers have added themselves cc: to this ticket already some time ago :)

If we're going to do this, it should be in the Plone repository — or at the very least in the Collective. Private repositories are painful. :)

indeed. seeing that none of them has responded yet (neither to this thread nor to my PM) we might consider copying the necessary bits from quintagroup.transmogrifier into plone.app.transmogrifier (q.transmogrifier contains more features than we really need for this plip, anyway.)

also, from a UI perspective i've been thinking of adding a 'proper' "click here to dump" and "click here to import your old site" controlpanel instead of sending people to the ZMI.

and that could live in p.a.transmogrifier, or we create a new package p.a.contentmigration or somesuch.

IMHO we should have two separate products, one which you install into your old site for exporting and one for importing that dump which you install into your plone 4 (or later plone 5) site.

any feedback on this approach?

comment:32 in reply to: ↑ 29 ; follow-up: ↓ 33 Changed 4 years ago by interra

  • Cc koval@…, chervol@…, kroman0@… added; koval.bogdan@… removed

Replying to tomster:

We have decided on moving forward using quintagroup.transmogrifier since it currently offers the most complete solution of all the approaches evaluated. Our assessment was that while it seems a bit 'rough' and doesn't have a lot of test coverage it is the only solution that currently actually produces workable results. OOTB we were able to export a sample Plone 3 site and re-import it into a vanilla Plone 3 site, however importing that export into a vanilla Plone 4 site did not work.

We then spent the remainder of the sprint working on improving quintagroup.transmogrifier so that importing into Plone 4 would be possible. Since the code lives in quintagroup's read-only repository and we wanted to get started immediately, we created a  clone on github and worked on a  Plone 4 branch, where we now have complete tests for exporting and importing a plone site.

We went on and evaluated the Plone 4 branch. It is a vast step forward. However, the Plone-4 test fails with a testcaselayer.ptc import error and the Plone-3.3 test fails with an extra transmogrifier(u'quintagroup.transmogrifier.tests.datacorrector') test.

Let's clean up this stuff first and move on with collective initiative then.

comment:33 in reply to: ↑ 32 ; follow-up: ↓ 35 Changed 4 years ago by tomster

Replying to interra:

We went on and evaluated the Plone 4 branch. It is a vast step forward. However, the Plone-4 test fails with a testcaselayer.ptc import error

that seems easy to fix. i'll look into that ASAP.

and  Plone-3.3 test fails with extra transmogrifier(u'quintagroup.transmogrifier.tests.datacorrector') test.

i encountered this, too. it seems the order is returned differently in Plone 3 and 4. shouldn't be too difficult to make the test robust...

Let's clean up this stuff first and move on with collective initiative then.

absolutely. i'll look into it ASAP and update here.

comment:34 Changed 4 years ago by duffyd

  • Cc duffyd added

I just received notification from Volodymyr @ Quintagroup indicating that they're ready to move quintagroup.transmogrifier to the Collective, but due to some failing tests in the Plone 4 github branch from the Cathedral sprint (referenced above) they have halted this process. How are things progressing with this? It would be great to have q.transmogrifier moved to the Collective so more people could collaborate on it.

Thanks, Tim

comment:35 in reply to: ↑ 33 ; follow-up: ↓ 36 Changed 4 years ago by tomster

Replying to tomster:

Replying to interra:

We went on and evaluated the Plone 4 branch. It is a vast step forward. However, the Plone-4 test fails with a testcaselayer.ptc import error

that seems easy to fix. i'll look into that ASAP.

i've added an explicit dependency on ptc for testing in  http://github.com/tomster/quintagroup.transmogrifier/commit/b05d712dc71ff6b502a3334e711f627fe40dbe68

can you check, if that fixed it?

and  Plone-3.3 test fails with extra transmogrifier(u'quintagroup.transmogrifier.tests.datacorrector') test.

i encountered this, too. it seems the order is returned differently in Plone 3 and 4. shouldn't be too difficult to make the test robust...

how are you guys running the tests under Plone 3.3? I currently cannot get them to run here at all. I've tried using plonenext and  http://svn.plone.org/svn/plone/buildouts/plone-coredev/branches/3.3 but i wasn't able to create a working bin/test for either.

do you have a plone3.3 buildout you could point me to, so i could reproduce the test failure?

Let's clean up this stuff first and move on with collective initiative then.

absolutely. i'll look into it ASAP and update here.

comment:36 in reply to: ↑ 35 Changed 4 years ago by interra

We still have the same errors. See:

There is a chance we should be testing in some other way (at the moment we do bin/test -s quintagroup.transmogrifier)?

Replying to tomster:

Replying to tomster:

Replying to interra:

We went on and evaluated the Plone 4 branch. It is a vast step forward. However, the Plone-4 test fails with a testcaselayer.ptc import error

that seems easy to fix. i'll look into that ASAP.

i've added an explicit dependency on ptc for testing in  http://github.com/tomster/quintagroup.transmogrifier/commit/b05d712dc71ff6b502a3334e711f627fe40dbe68

can you check, if that fixed it?

[snip]

how are you guys running the tests under Plone 3.3? I currently cannot get them to run here at all. I've tried using plonenext and  http://svn.plone.org/svn/plone/buildouts/plone-coredev/branches/3.3 but i wasn't able to create a working bin/test for either.

do you have a plone3.3 buildout you could point me to, so i could reproduce the test failure?

[snip]

comment:37 Changed 4 years ago by sargo

  • Cc sargo added

comment:39 Changed 3 years ago by mylanium

  • Cc mylanium added

comment:40 Changed 3 years ago by rossp

  • Status changed from assigned to closed
  • Resolution set to wontfix

PLEASE READ THIS AND RE-OPEN VALID PLIPS!

As we launch the new PLIP process we'd like to see which PLIPs:

  • are still appropriate/needed
  • still have owners/proposers/champions
  • still have available implementers

If this PLIP should still be considered for future releases of Plone please do re-open this ticket and assign an appropriate milestone. If it should be considered for the next release of Plone, use the 4.2 milestone. Also be sure to update the PLIP description, requester, owner, etc. and include a comment detailing recent progress and new plans. We will use all these details in the new continuous PLIP process.

comment:41 Changed 22 months ago by davisagli

  • Component changed from Infrastructure to General