
Universal Wiki Converter

The UWC is an extensible framework which can be improved to help you move your content to Confluence.

 


 


Devel Info

Overview

The UWC was designed to be extensible, so that support can be added for as many wikis as people want. Sometimes, too, an existing wiki converter module will not be sufficient for a particular user's needs. For any number of reasons, you may find yourself wanting to extend the UWC.

Let's talk about some concepts.

Exporting and Converting

Most of the UWC can be divided into 2 major concepts: Exporting and Converting.

When we talk about exporting, we mean the process you take to get your data from your old wiki into a format that the UWC can process. When we talk about converting, we mean the process the UWC uses to transform your old wiki's exported data into Confluence pages.

Converters

We talk about two types of converters:

  • Wiki Converters
  • Syntax Converters

A wiki converter is the set of syntax converters used to turn your old wiki's syntax into Confluence syntax.
The Mediawiki converter is a wiki converter. It is made up of dozens of syntax converters. Each syntax converter handles one type of wiki syntax. So, for example, a syntax converter might handle transforming bold Mediawiki syntax into bold Confluence syntax.

In addition, we sometimes talk about non-converter properties. These are properties that live in the same properties file as syntax converters, but they do something other than convert syntax. Usually, they set some global state, for example turning on the UWC Hierarchy Builder Framework.

Developing for the UWC

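Your development efforts will probably fall into one of 5 (progressively more complicated) categories:

  1. #Improving an Existing Wiki Converter
  2. #Adding a New Wiki Converter
  3. #Adding or Changing a framework
  4. #Adding a new UI
  5. #Underlying Structural Changes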

Uploading vs. the Output Directory

The UWC uploads pages to Confluence by default. It also outputs a copy of every page as a file to the directory output/output/. When you're developing syntax converters, we recommend turning off uploads, and just examining the contents of that output directory, as it will speed up your development.

To turn off uploads, see info on the Pages Will Be Sent to Confluence setting.

Source Control

Git

The latest source is now being hosted by AppFusions on Bitbucket with Git. To make contributions, please submit a pull request and we will contact you.

Build Instructions

To build the project, you'll need Ant and Java 6.

$shell$ cd devel/

To get a list of ant targets and their descriptions:

$shell$ ant -p       

To build the uwc so that it can be run using classes:

$shell$ ant

To run that build with the GUI:

$shell$ cd target/uwc
$shell$ ./run_uwc_devel.sh

To run that build with the CLI:

$shell$ cd target/uwc
$shell$ ./run_cmd_devel.sh 

To build the uwc so that it can be run using the uwc.jar, cd back to the devel dir:

$shell$ ant package    

To run the uwc.jar build:

$shell$ cd target/uwc

and then run one of the following:

$shell$ ./run_uwc.sh           //on linux
$shell$ ./run_uwc_on_mac.sh    //on mac
$shell$ run_uwc.bat            //on windows
$shell$ ./run_cmdline.sh       //to run the command line interface

depending on your system. See UWC Running for further details about the different shell scripts.

conf-local directory

When the UWC is built either with 'all' or 'package', the contents of the conf directory are sent to target/uwc/conf.
However, with the 'all' target, if there is a conf-local directory, then the contents of that directory overwrite target/uwc/conf.

Use if...

If you have certain local settings that you don't want to have to re-write every time you build the UWC, create a conf-local directory and put your versions of the files there. Now when you run ant all, your local versions of the conf files will be used by the built UWC.

Example:
Let's say I'm testing the tikiwiki converter's image handling. I don't want to have to set my database settings in settings.tikiwiki.properties every time I build, so I:

  1. create a directory conf-local (sibling directory to conf)

    $shell$ cd uwc/devel
    $shell$ mkdir conf-local
    $shell$ ls -l | egrep "^d"
    drwxr-xr-x    9 laura  laura      306 Jan 29 15:12 bin
    drwxr-xr-x   28 laura  laura      952 Jan 30 11:22 conf
    drwxr-xr-x    7 laura  laura      238 Jan 29 13:59 conf-local
    drwxr-xr-x    4 laura  laura      136 Oct 30 16:35 gensrc
    drwxr-xr-x    4 laura  laura      136 Oct 30 16:35 images
    drwxr-xr-x   32 laura  laura     1088 Dec  9 12:50 lib
    drwxr-xr-x    3 laura  laura      102 Jan 23 11:29 output
    drwxr-xr-x   19 laura  laura      646 Jan 30 11:20 sampleData
    drwxr-xr-x    6 laura  laura      204 Oct 30 16:36 src
    drwxr-xr-x    3 laura  laura      102 Jan 30 10:50 target
    
  2. cp conf's settings.tikiwiki.properties to conf-local

    $shell$ cp conf/settings.tikiwiki.properties conf-local/.
    
  3. edit conf-local/settings.tikiwiki.properties and add my local settings to that file
  4. Now, every time I run ant 'all'...

    $shell$ ant all
    
  5. ... the built UWC will use my local settings.tikiwiki.properties

    $shell$ cd target/uwc/
    $shell$ ./run_uwc_devel.sh
    

Improving an Existing Wiki Converter

Each wiki converter has a properties file that is located in the conf directory.
The naming convention for wiki converter properties files is
converter.xxx.properties, where xxx is the name of your wiki. For example, mediawiki's properties file is conf/converter.mediawiki.properties.

So, to make a change to an existing converter, you'll need to edit converter.xxx.properties, and add or change the syntax converters.

Each line in the properties file represents either a syntax converter or some sort of global conversion state. There are many different types of syntax converters, described in the #Anatomy of a Properties File section of this document, as well as a number of global settings, described in the #Nonconverter properties section of this document.

A Simple Example for improving an existing wiki converter

Let's say that in your wiki, you have a custom macro that always makes the text red, and there is no currently existing syntax converter for this syntax. If your macro looked like

sample text
[[mymacro]]This would be red text[[mymacro]]

Then you might add the following syntax converter to your properties file:

Mywiki.1234.mymacro.java-regex=\[\[mymacro\]\](.*?)\[\[mymacro\]\]{replace-with}{color:red}$1{color}

If you now ran your converter on the above sample text, you would end up with:

{color:red}This would be red text{color}
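
The search and replace halves of that property can be tried outside the UWC with plain java.util.regex. The following is a minimal sketch, not UWC code: the class name MyMacroRegexDemo is made up for illustration, and it assumes that {replace-with} simply separates a Java regex from a Java replacement string, so that String.replaceAll behaves the same way.

```java
final class MyMacroRegexDemo {
    // Text before {replace-with} in the property: the search pattern.
    static final String SEARCH = "\\[\\[mymacro\\]\\](.*?)\\[\\[mymacro\\]\\]";
    // Text after {replace-with} in the property: the replacement, where $1 is group 1.
    static final String REPLACE = "{color:red}$1{color}";

    // Applies the search/replace pair to the whole input.
    static String convert(String input) {
        return input.replaceAll(SEARCH, REPLACE);
    }

    public static void main(String[] args) {
        System.out.println(convert("[[mymacro]]This would be red text[[mymacro]]"));
    }
}
```

Note that the backslashes are doubled only because the pattern is written as a Java string literal here; in the sample property above each one appears once.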

To learn more, please see: #Anatomy of a Properties File.

Adding a New Wiki Converter

Let's say your wiki is not currently supported. In addition to the techniques used in #Improving an Existing Wiki Converter, you would need to:

  1. Create a properties file in the conf directory representing your wiki, with the filename converter.xxx.properties, where xxx is the name of your wiki.
  2. Edit the properties file, and add syntax converters to it for each of the syntaxes you wish to be handled. See #Improving an Existing Wiki Converter.
  3. If the filenames of your files are not what you want the Confluence pagenames to be, you would need to take advantage of the UWC Page Titles Framework to change the pagenames.
  4. Optionally, take advantage of any of the other #Frameworks.

Frameworks

The UWC has a number of different frameworks designed to help you automatically maintain data that goes beyond the page contents.

To learn how to use a framework, please see its doc pages.

Adding or Changing a framework
This will generally be a fairly involved process. Classes you'll want to get to know are ConverterEngine.java, and probably Page.java.

Remember when adding a new framework, if you want your code to be accepted, you'll want to include some opt-in way of turning your feature on. (Opt-out is discouraged.) Generally, we use #Nonconverter properties to accomplish this. You'll have to implement these directly in the ConverterEngine. Luckily, most of this code has already been written half a dozen times. Search for the word "hierarchy", for example, within the ConverterEngine to find the code chunks necessary to insert your new property.

Other Features
In addition to the frameworks, there are several user features that you should probably be aware of:

Properties Files

The conf directory has many types of properties files. The two most important are converter.xxx.properties and exporter.xxx.properties.

Anatomy of a Properties File

Here you'll find info on the converter.xxx.properties files, their naming conventions, and some special classes that can be used to help you with your conversions.

Each converter.xxx properties file contains a series of properties: key-value pairs that define what the wiki converter does to the pages you give it.

Property names

Your property name will look like:

Example property

Wikitype.xxxx-syntax_description.suffix

Let's go through that in order.

  • The Wikitype section of the property is arbitrary, but for consistency, name it after the wiki you are converting from
  • The xxxx section is a number. This is useful for helping to keep the converters in an understandable order. Essentially, the converters are run in ASCII Ascending alphabetical order. So, provided your Wikitype is the same for all converters, these numbers are going to determine the order the converters get run in
  • The syntax_description is just for ease of identifying what the converter does
  • The suffix will tell the ConverterEngine what type of property this is. Choices are:
    • class - Use this one if the converter will use a Java class that extends BaseConverter. See #Classes.
    • twiki-cleaner - Use this if the converter is a class, but you're just going to do a simple search/replace. See #Twiki cleaners.
    • java-regex - Use this one if the converter will do a simple search and replace with a Java regular expression. See #regular expressions for more info.
    • perl - Use this one if the converter will use perlish search and replace syntax. See #regular expressions for more info.
    • java-regex-tokenizer - Use this one if the converter will do a search and replace, and then tokenize the results so that they are no longer available for conversions. See #Tokenizing classes for more info.
    • xmlevent - Use this to associate xml parsers with xml tags. See #Xmlevent properties for more info.
    • A non-converter property

      Converters are run in ASCII alphabetical order by property name

      MyWiki.0100-stuff will get run before
      MyWiki.0200-stuff which will get run before
      MyWiki.0200-xyz
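
The ordering rule can be checked by sorting a few property names. This is a small sketch under the assumption that "ASCII alphabetical order" matches the natural ordering of Java strings (which compares character codes); the class name ConverterOrderDemo is hypothetical and the sort shown is not taken from the ConverterEngine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ConverterOrderDemo {
    // Returns the names in the order the converters would run,
    // assuming plain character-code (ASCII) ordering of the property names.
    static List<String> runOrder(List<String> propertyNames) {
        List<String> sorted = new ArrayList<String>(propertyNames);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "MyWiki.0200-xyz", "MyWiki.0100-stuff", "MyWiki.0200-stuff");
        for (String name : runOrder(names)) {
            System.out.println(name);
        }
    }
}
```

One consequence of character-code ordering is that upper-case letters sort before lower-case ones, so under this assumption a property starting with MyWiki would sort ahead of one starting with Mywiki regardless of the numbers. Keeping the Wikitype prefix identical across a file avoids that surprise.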

Property values

Property values for syntax converters can be #Classes (including a special variant called #Tokenizing classes), #regular expressions, and xml events. The idea for a converter property is that a syntax that needs to be converted gets a converter property which controls that conversion.

Property values for non-converter properties are tailored to the property in question (booleans, settings, classnames, etc.).

Converter Examples

Each converter property type is described in detail below, with examples. You can see additional examples here: UWC Conversion Line Examples.

Classes

If it's a class, the property value should point to a Java class that extends BaseConverter.

Example

MyWiki.0100-converting_stuff.class=com.atlassian.uwc.converters.ConvertingStuff

This class extends com.atlassian.uwc.converters.BaseConverter. The entry method is convert.
Basically, you should:

  • get the original text
  • Do something to it, maybe with a regular expression maybe not
  • set the page's converted text

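    public void convert(Page page) {
       // Start from the page's original wiki text
       String input = page.getOriginalText();
       // Apply whatever transformation this converter is responsible for
       String converted = doSomething(input);
       // Store the result on the page
       page.setConvertedText(converted);
    }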
    

Twiki Cleaners

If it's a class, but you just want to do a simple search/replace regular expression (or series of them), you can use a cleaner class.

The property will look like:

MyWiki.0100-mycleaner-class.twiki-cleaner=com.atlassian.uwc.converters.something.SomeCleaner

The class will have to implement com.atlassian.uwc.converters.twiki.cleaners.ContentCleaner
where the String results of the clean(String) method are the replacement text for your file.
Alternatively, since some implementations already exist, you could extend one of the following:

  • RegularExpressionCleaner - use the constructor to set up search and replacement text.

    public class ColorRed extends RegularExpressionCleaner {
        public ColorRed() {
            super("%RED%","{color:RED}");
        }
    }
    

    This is the entirety of the ColorRed class. If it is assigned as a twiki-cleaner converter property then every instance of the text %RED% in the input text will be replaced with {color:RED}.

  • CompositeContentCleaner - use the constructor to set up a list of ContentCleaners, then pass them to super to iterate through them. You can use this to quickly go through a set of RegularExpressionCleaner classes.

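    import java.util.ArrayList;
    import java.util.List;

    public class HtmlHeader extends CompositeContentCleaner
    {
       public HtmlHeader()
       {
          // super(...) must be the first statement, so the list is built in a helper
          super(buildCleaners());
       }

       private static List buildCleaners()
       {
          List cleaners = new ArrayList();
          cleaners.add(new RegularExpressionCleaner("<\\s*?[hH](\\d)\\s*?>", "h$1. "));
          cleaners.add(new RegularExpressionCleaner("<\\s*?(/[hH]\\d)\\s*?>", "\n"));
          return cleaners;
       }
    }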
    

    This class runs the input through each of those cleaners in the order they were added to the list.

regular expressions

If it's a regular expression, provide a search and replace string. If you are using the java-regex converter type, use the delimiter {replace-with} between your search and replace strings.
The following java-regex example takes characters surrounded by <nowiki> tags and replaces those tags with Confluence noformat macros.

java-regex example

Mediawiki.0200-re_noformat.java-regex=<nowiki>((?s).*?)</nowiki>{replace-with}{noformat}$1{noformat}


Here's a perl example. It looks like a perl regex. This one converts underlined text.

perl example
DokuWiki.1underlined.perl=s/__([^_]+)__/+$1+/g

UWC uses Jakarta's ORO project for perl handling

If you want to add your own regex converters but aren't sure what exactly is supported by the 'perl' engine, the underlying code is from the Jakarta ORO project and the regex flavor is documented here: Perl5Util

NEWLINE special replacement value

There's a special convenience replacement value for newlines.
If you use the word 'NEWLINE' in your replacement text, it will be replaced with a newline.
For example, to match and replace ABC with a newline character:

SomeWiki.newline-replace-example.java-regex=ABC{replace-with}NEWLINE

The text NEWLINE (in upper case) now resolves to a system dependent newline character.
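
As a rough illustration of that behavior, the sketch below expands the NEWLINE keyword in a replacement string before running the search and replace. It is only an assumption about how such a substitution could be done; the class and method names are hypothetical and this is not the UWC's own implementation.

```java
final class NewlineTokenDemo {
    // The special keyword recognised in replacement text.
    static final String NEWLINE_TOKEN = "NEWLINE";

    // Swaps every NEWLINE keyword for the platform's line separator.
    static String expandNewlines(String replacement) {
        return replacement.replace(NEWLINE_TOKEN, System.getProperty("line.separator"));
    }

    // Runs a java-regex style search/replace with the keyword expanded first.
    static String convert(String input, String search, String replacement) {
        return input.replaceAll(search, expandNewlines(replacement));
    }

    public static void main(String[] args) {
        // Mirrors: SomeWiki.newline-replace-example.java-regex=ABC{replace-with}NEWLINE
        System.out.print(convert("first lineABCsecond line", "ABC", "NEWLINE"));
    }
}
```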

Tokenizing classes

Let's say you want to convert something, but then not allow any further conversions. For example, converting the contents of <code> tags to {code} tags. You would then use the java-regex-tokenizer type. This would perform the search and replace and then tokenize those converted sections so that they were protected from further conversion.

java-regex-tokenizer example

Mediawiki.0095-re_code.java-regex-tokenizer=\<code\>((?s).*?)\<\/code\>{replace-with}{code}$1{code}

Tokenizer properties have a convenience option for when the developer wants "dotall" and "multiline" modes in effect. Use {replace-multiline-with} instead of {replace-with}:

replace-multiline-with compiles with DOTALL and MULTILINE

Mediawiki.0095-re_code.java-regex-tokenizer=\<code\>(.*?)\<\/code\>{replace-multiline-with}{code}$1{code}
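
The effect of those two modes can be seen with java.util.regex directly. The sketch below is an illustration with a hypothetical class name; it assumes {replace-multiline-with} corresponds to compiling the pattern with Pattern.DOTALL and Pattern.MULTILINE, as the description above suggests. The angle brackets are left unescaped here because they are not special characters in Java regexes.

```java
import java.util.regex.Pattern;

final class MultilineFlagsDemo {
    static final String SEARCH = "<code>(.*?)</code>";
    static final String REPLACE = "{code}$1{code}";

    // Default modes: '.' does not match line terminators.
    static String withoutFlags(String input) {
        return Pattern.compile(SEARCH).matcher(input).replaceAll(REPLACE);
    }

    // DOTALL lets '.' match newlines; MULTILINE makes ^ and $ match at line breaks.
    static String withFlags(String input) {
        return Pattern.compile(SEARCH, Pattern.DOTALL | Pattern.MULTILINE)
                .matcher(input).replaceAll(REPLACE);
    }

    public static void main(String[] args) {
        String input = "<code>line 1\nline 2</code>";
        System.out.println(withoutFlags(input)); // unchanged: the tags span two lines
        System.out.println(withFlags(input));    // tags replaced with {code}
    }
}
```

The earlier example gets the same dot-all behavior by writing (?s) inside the capturing group.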

To detokenize, you would add the following class to the end of your converter.properties:

If you use tokenizers, you must end your properties with this:

Mediawiki.2000-detokenize.class=com.atlassian.uwc.converters.DetokenizerConverter
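
To make the idea of tokenizing concrete, here is a simplified, self-contained sketch. It is not the UWC's TokenMap or DetokenizerConverter: the class TokenizeSketch, the token format, and the method names are all invented for illustration. Each match is converted, swapped for an opaque placeholder that later regexes will not recognise, and restored at the end.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeSketch {
    // Placeholder -> converted text, in insertion order.
    private final Map<String, String> tokens = new LinkedHashMap<String, String>();
    private int counter = 0;

    // Converts each match, then hides the converted text behind a placeholder.
    String tokenize(String input, String search, String replacement) {
        Pattern pattern = Pattern.compile(search);
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Re-apply the pattern to the matched text alone to build its conversion.
            String converted = pattern.matcher(m.group()).replaceFirst(replacement);
            String token = "~TOKEN" + (counter++) + "~";
            tokens.put(token, converted);
            m.appendReplacement(out, Matcher.quoteReplacement(token));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Puts the protected text back; meant to run after all other converters.
    String detokenize(String input) {
        String result = input;
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TokenizeSketch t = new TokenizeSketch();
        String text = t.tokenize("<code>''raw''</code> and ''italic''",
                "<code>(.*?)</code>", "{code}$1{code}");
        text = text.replaceAll("''(.*?)''", "_$1_"); // a later converter
        System.out.println(t.detokenize(text));
    }
}
```

In the usage above, the italics converter changes the text outside the code block but cannot touch the text inside it, because at that point the code block is only a placeholder.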

Xmlevent properties

If you want to take advantage of the built-in Xml Framework, you'll need to assign xmlevent properties. The purpose of these properties is to assign a parsing class to an xml tag. Such a property would look like this:

Xmlevent property
XmlTest.0300.bold.xmlevent={tag}b{class}com.atlassian.uwc.converters.xml.example.BoldParser

This would be used to transform output like:

<b>bold</b>

to

*bold*

If you use xmlevent properties, you'll need to also call a Converter class to use them. We recommend using the XmlConverter, like so:

This does the actual parsing, so you need to call this if you want the xmlevent properties to do anything.
XmlTest.0500.xmlconverter.class=com.atlassian.uwc.converters.xml.XmlConverter

To learn more, see UWC Xml Converter Framework.
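
For readers unfamiliar with event-based XML parsing, the sketch below shows the general technique of reacting to a tag's start and end events, using only the JDK's SAX API. It is not the UWC's BoldParser and makes no claim about which base classes the Xml Framework expects; the class name BoldSaxSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class BoldSaxSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("b".equals(qName)) out.append('*'); // opening <b> becomes '*'
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("b".equals(qName)) out.append('*'); // closing </b> becomes '*'
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length); // text content is copied through unchanged
    }

    // Parses a well-formed XML fragment and returns the converted text.
    static String convert(String xml) {
        BoldSaxSketch handler = new BoldSaxSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<p>some <b>bold</b> text</p>"));
    }
}
```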

Nonconverter properties

Non-converter properties are used to handle settings that would affect the conversion, but are not technically converters. They are often used to turn on or customize optional features.

Non-converter properties belong on top

We recommend that non-converter properties be set at the beginning of the properties file.

hierarchy

UWC Hierarchy Builder Framework
Description - The hierarchy framework provides functionality to allow the UWC to set parent-child relationships between pages.
Example

MyWiki.0001.switch.hierarchy-builder=UseBuilder
MyWiki.0002.classname.hierarchy-builder=com.atlassian.uwc.hierarchies.FilepathHierarchy

page histories

UWC Page History Framework
Description - The page histories framework provides the ability to maintain version histories for pages.
Example

MyWiki.0001.switch.page-history-preservation=true
MyWiki.0002.suffix.page-history-preservation=-#.txt

disabling illegal pagenames framework

UWC Illegal Pagenames Framework - Disabling
Description - The disabling illegal pagenames framework feature provides a way to turn off the default illegal pagenames handling.

Careful!

Allowing illegal pagenames to be uploaded to your Confluence could produce unknown behavior.

Example

Mywiki.0001.illegal-handling=false

auto detect spacekeys

UWC Auto Detect Spacekeys Framework
Description - The Auto Detect Spacekeys framework will detect and create spaces on the fly for your new Confluence pages.
Example

Mywiki.0001.autodetect-spacekeys=true

Miscellaneous Properties

UWC Miscellaneous Properties Framework
Description - Allows configuration by passing properties following a certain naming convention to various UWC objects. Anything following the convention: Wikiname.xxxx.key.property=value will be assigned as a key-value pair to a Properties object that will be injected into many UWC objects (converters, hierarchies, etc).
Example

MyWiki.0101.myprop.property=true
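
The naming convention can be read as "everything between the number and the .property suffix is the key". The sketch below illustrates that reading; it is an assumption about the convention rather than the ConverterEngine's parsing code, and the class name MiscPropertiesSketch and its regex are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MiscPropertiesSketch {
    // Wikiname.xxxx.key.property : group 1 captures the key.
    private static final Pattern MISC =
            Pattern.compile("[^.]+\\.\\d+\\.(.+)\\.property");

    // Collects every property that follows the convention into key/value pairs.
    static Properties collect(Map<String, String> converterProperties) {
        Properties misc = new Properties();
        for (Map.Entry<String, String> e : converterProperties.entrySet()) {
            Matcher m = MISC.matcher(e.getKey());
            if (m.matches()) {
                misc.setProperty(m.group(1), e.getValue());
            }
        }
        return misc;
    }

    public static void main(String[] args) {
        Map<String, String> file = new LinkedHashMap<String, String>();
        file.put("MyWiki.0101.myprop.property", "true");
        file.put("MyWiki.0100-converting_stuff.class", "com.example.Something");
        System.out.println(collect(file)); // only myprop is collected
    }
}
```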

Existing Global Configuration Properties

  • list-collisions
    Description: Used to control whether the UWC should detect and report namespace collisions.
    Default: true
    Example: testing.1234.list-collisions.property=false
  • allow tilde in links
    Description: Used to tell the Illegal Pagenames Framework to allow tilde characters in links.
    Default: false
    Example: Mediawiki.0001.allow-tilde-in-links.property=true
  • allow at in links
    Description: Used to tell the Illegal Pagenames Framework to allow at characters in links.
    Default: false
    Example: Test.0001.allow-at-in-links.property=true
  • turning on url decoding
    Description: Tells the Illegal Pagenames Framework to url decode the page title.
    Default: false
    Example: Mediawiki.0002.illegalnames-urldecode.property=true
  • filepath-hierarchy-ext
    Description: Sets the filepath extension explicitly, rather than letting the Filepath Hierarchy try to figure it out. Important if your extension is the empty string.
    Default: undefined
    Example: TestHierarchy.0001.filepath-hierarchy-ext.property=
  • attachment-upload-comment
    Description: Used to customize the upload comment associated with attachments.
    Default: Added by UWC, the Universal Wiki Converter
    Example: TestComment.0003.attachment-upload-comment.property=Foo Bar 123!

Filename extension stripping class

If you do not want the pages that Confluence imports to have the filename extension in the page title, add this class to the end of your converter.properties:

This will strip out filename extensions from page titles

Mediawiki.1000-remove-extension.class=com.atlassian.uwc.converters.ChopPageExtensionsConverter
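
The string handling involved is simple enough to sketch. The helper below is a guess at the general idea (drop everything from the last dot onward), written as a standalone method; it is not the source of ChopPageExtensionsConverter, and the real class may treat edge cases differently.

```java
final class ChopExtensionSketch {
    // Removes a trailing ".ext" from a page name.
    // Names with no dot, or with only a leading dot, are returned unchanged.
    static String chop(String pageName) {
        int dot = pageName.lastIndexOf('.');
        return dot > 0 ? pageName.substring(0, dot) : pageName;
    }

    public static void main(String[] args) {
        System.out.println(chop("Main_Page.txt"));
        System.out.println(chop("README"));
    }
}
```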

exporter.xxx.properties

Each wiki that the UWC provides an exporter for requires an exporter.xxx.properties file, where xxx is the name of the wiki. The name of the wiki used must match the corresponding converter.xxx.properties file.

For example, if you wanted to add an exporter for the vqwiki type, which currently has a converter but no exporter, you would examine the name of its converter properties file, which is converter.vqwiki.properties, and you would name your exporter file exporter.vqwiki.properties. You would not call it exporter.vqwiki-export.properties. Nor would you call it exporter.VQWIKI.properties. Just use the same keyword. This will allow the UWC to recognize that your wiki type has an exporter, and it will enable the export button when the vqwiki type is chosen.

Within the exporter.xxx.properties file, you need to define certain properties.

Required properties
You must define an exporter.class property, which will tell the UWC what class to instantiate to run your export. It must implement the interface com.atlassian.uwc.exporters.Exporter.

It will look something like this:

Example exporter.class property
exporter.class=com.atlassian.uwc.exporters.MediaWikiExporter

Optional properties
Other than the exporter class, all other properties are optional, depend on what you need from the user, and will be configured by and large by your user. We recommend at the least having some sort of output directory setting, so your exporter knows where to put the exported wiki files that the UWC will eventually use. Other common properties include database settings.

The rest of your exporter.xxx.properties file might look something like this:

Example optional properties
## database name is the name of the mediawiki database
databaseName=wikidb
## dbUrl is the JDBC connection url. The following is an example mysql url
dbUrl=jdbc:mysql://localhost:3306
# The JDBC driver you wish to use
# Note: You will have to provide the JAR, by placing it in the lib directory, unless you
# use MySQL. The Mysql driver has been provided. The following would be the class if you use MySQL.
jdbc.driver.class=com.mysql.jdbc.Driver

## Login info to connect to this database. You will need to replace this.
login=mylogin
password=mypassword

## This is the output directory. The export will send text files with Mediawiki syntax to this directory.
output=/Users/laura/Desktop/

These properties will be automatically loaded into your exporter class as a parameter passed to the export method, which is your entry point.

public void export(Map properties) throws ClassNotFoundException, SQLException;
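
As an illustration of how an exporter might consume that map, the sketch below reads the output setting and writes one text file per page, using the desired page title as the filename. Everything here is hypothetical: the class ExporterSketch, the writePage helper, and the assumption that property values arrive as strings are not taken from the Exporter interface, and database access is left out entirely.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;

final class ExporterSketch {
    // Writes one exported page into the directory named by the "output" property.
    static File writePage(Map<String, String> properties, String title, String wikiText) {
        String output = properties.get("output");
        if (output == null || output.trim().length() == 0) {
            throw new IllegalArgumentException("the 'output' property must be set");
        }
        File dir = new File(output);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IllegalStateException("could not create " + dir);
        }
        File page = new File(dir, title); // filename doubles as the page title
        try {
            FileWriter writer = new FileWriter(page);
            try {
                writer.write(wikiText);
            } finally {
                writer.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not write " + page, e);
        }
        return page;
    }
}
```

Failing early with a clear message when a required property is missing makes it easier for users to see which setting they still need to fill in.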

Remember to document the properties both in the exporter.xxx.properties file, and also in the corresponding wiki notes page on this website, especially which properties are a requirement for your users to set in order for the exporter to work and how to get the info they need to set those properties. You can find the notes page for your wiki (or create one) here: UWC Supported Wikis.

Mediawiki Exporter is a good example.

The Mediawiki exporter was the first exporter developed, and is a good example on which to base your exporters.
The relevant files are: conf/exporter.mediawiki.properties and com/atlassian/uwc/exporters/MediaWikiExporter.java
You can find the mediawiki notes page here: UWC Mediawiki Notes.

To learn more about exporters in general, see #Exporters.

settings.xxx.properties

Sometimes, you may find the need to have the user configure additional properties.

In the past, we might create an additional properties file. For example, for a number of reasons, we found it handy for the Tikiwiki's attachment handling syntax to have access to the tikiwiki's database. Therefore, we needed the user to provide database settings. These settings were added to a settings.tikiwiki.properties file.

That being said, we recommend that developers needing to pass properties to their converters take advantage of the UWC Miscellaneous Properties Framework instead.

confluenceSettings.properties

Certain user settings are saved to conf/confluenceSettings.properties. These are usually settings used by the User Interface. For example: address, login, wiki type, max attachment filesize setting, etc.

Unless you're changing the UI, you generally can ignore this file.

guisettings.properties

Used to define wikis that have special gui element disabling needs. See UWC GUI Disabling Framework.

settings.illegalcharmap.properties

Defines the alternatives used when translating illegal page names. See UWC Illegal Pagenames Framework.

Attachments

Attachments are a complex aspect of UWC development. Because every wiki seems to handle its attachments differently, there are limits to how generalized we can make attachment handling. Our approach works like so:

We provide an Attachments setting, which the user sets in the GUI. Often, this will be used to indicate a directory where the attachments are located.

You can then get this setting in any converter class which has extended BaseConverter with the following getter:

String attDir = this.getAttachmentDirectory();

You may not need an attachments setting for your wiki. For example, if every file that will become a page has an attachments directory in the same directory as the page file, you can just use the page's filepath to get to the attachments. The attachments setting is particularly useful if, instead, your attachments are maintained in a particular directory separate from your pages.

To add your attachments to your Page object within your converter class, you would then use the addAttachment method.
Here's a basic idea of how an AttachmentConverter might work:

public class AttachmentConverter extends BaseConverter {
   public void convert(Page page) {
     String attDir = this.getAttachmentDirectory();
     List<File> attachments = doSomething(attDir);
     for (File file : attachments) {
        page.addAttachment(file);
     }
   }
   ...
}
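
The doSomething(attDir) step above is left open because it depends on the wiki. One simple possibility, assuming every regular file directly inside the attachment directory should be attached, is sketched below. The class and method names are hypothetical, and a real converter would usually filter the files by what the page actually references.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AttachmentListingSketch {
    // Returns the regular files directly inside the given directory, sorted by path.
    // A missing, unreadable, or null directory yields an empty list.
    static List<File> listAttachments(String attachmentDirectory) {
        List<File> result = new ArrayList<File>();
        if (attachmentDirectory == null) {
            return result;
        }
        File[] children = new File(attachmentDirectory).listFiles();
        if (children == null) {
            return result; // not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isFile()) {
                result.add(child);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```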

For an example, I would recommend examining com.atlassian.uwc.converters.mediawiki.AttachmentConverter.java

Exporters

In order to convert your wiki data to Confluence data, you'll first need to export your wiki's data to a format the UWC can process. Sometimes the wiki's internal export process is fine. Sometimes, it's not. In those cases, we often provide an exporter that the UWC runs to get the data it needs to do a conversion. To learn more about UWC exporters, read #exporter.xxx.properties.

So what do we mean when we say "a format the UWC can process"?

The UWC expects that each page you want in Confluence will be represented by a text file on your file system. The contents of that file are the wiki syntax (or in some cases, xml) used by your old wiki for that page.

This is not to say that each file might not also have other metadata that you might want. It could. When you get to the conversion part of the process, you'll just need to do something with that metadata, and clean it out of the file using a syntax converter, or it will be uploaded to Confluence as part of the page data.

For example, if your converter.xxx.properties file contained no converters, Confluence would create (or update) a page that had the exact contents of your original file.

What about the page title?
Each file created in the export process will have a filename. If you're using an exporter to output this data, it's probably worth your time to make the filename of the file the same as the desired pagetitle you want in Confluence. This is because the UWC's default behavior is to make the incoming page have a page title that is the same as the filename of the input file. You can always change the pagetitle, using the UWC Page Titles Framework, but if you're setting the filename with a UWC Exporter class, then you might as well save yourself a step.

Adding a new UI

The UWC currently has two supported UIs: the GUI and the command line interface (see #Build Instructions for how to run each).

Disabling GUI Elements

If you find that your wiki converter's conversion strategy would be made easier by disabling elements in the gui, you can take advantage of the UWC GUI Disabling Framework. We used this with the Sharepoint Converter to encourage users to allow the UWC to automatically select the spacekeys, as multiple spacekeys could be necessary in a proper Sharepoint conversion.

Other UI Customizations

This is rare, so there are no frameworks to make it easier. If you're interested in this, I would recommend getting to know the following classes:

  • com.atlassian.uwc.ui.ConverterEngine
  • com.atlassian.uwc.ui.UWCForm3
  • com.atlassian.uwc.ui.UWCGuiModel
  • com.atlassian.uwc.ui.UWCUserSettings
  • anything in the com.atlassian.uwc.ui.listeners package

Underlying Structural Changes

Similar to #Other UI Customizations, this is fairly rare. If you're interested in this, you'll probably want to get to know the following classes:

  • com.atlassian.uwc.ui.ConverterEngine
    Where all the converters are run on all the pages, any hierarchies are applied, and all the pages are uploaded to Confluence
  • com.atlassian.uwc.ui.Page
    The object that is passed to each converter or hierarchy object which contains all the page data.
  • com.atlassian.uwc.ui.UWCUserSettings
    The object representing the user's settings such as address, login, etc.
  • com.atlassian.uwc.converters.Converter
    The interface that all Converters must implement
  • com.atlassian.uwc.converters.BaseConverter
    The standard converter that most converters extend.
  • com.atlassian.uwc.exporters.Exporter
    The interface that all exporters must implement.
  • com.atlassian.uwc.hierarchies.HierarchyBuilder
    The interface that all hierarchies must implement.
  • com.atlassian.uwc.util.TokenMap
    A helper class used to handle tokenization and detokenization as described in #Tokenizing classes. Tokenizing is used to freeze changes to a particular segment of text, so that no further changes will be applied until detokenized.

Architecture

The Universal Wiki Converter is a client side (standalone) application, written in Java 5. It converts files containing wiki markup from the original wiki and then sends those files directly to Confluence, using Confluence's Remote API (via XMLRPC).

The XMLRPC aspect of the UWC is handled by the Confluence Remote Java Wrapper library.

Test Strategy

See UWC Testing Strategy

Development Tips

  • Locate a wiki syntax page for your origin wiki and bookmark it since you'll be visiting it frequently.
  • Create a test page which shows all the most popular syntaxes for the origin wiki that you'll be converting. This is a file you can then run directly through the UWC to test.
  • If you can get your hands on some (or all) of the source files you'll be converting, that's very handy to have around.
  • Look for other existing converters from which you can borrow regular expressions. The currently supported wikis are listed here: UWC Supported Wikis
  • You do not need to restart the UWC to test changes to your properties file. The converters are reloaded from the properties file every time you convert.
  • We've had some issues with Java6. We recommend using Java 5 at this time.
  • If you're developing with IDEA and running through the debugger, in most cases code changes can be recompiled and reloaded by the IDE without restarting the UWC.
  • We recommend having a regression testing strategy as adding new converters can easily break old converters. See UWC Testing Strategy
  • Having a file such as SampleTikiwiki-Input2.txt and then an output file which correctly converted text such as SampleTikiwiki-Expected2.txt is helpful both to you and other developers. Please check such files in under sampleData/<wikiName>. Remember, anything in the Input file will be translated with all the converters, so if you want to maintain some bit of "before" text for clarification of what you're trying to do, you may want to surround that text with code blocks or your wiki's equivalent.
  • The UWC comes with a regex testing tool, if you don't have such a tool already handy. To access it, go to the Other Tools tab of the UWC, and click the Regex Tester button.
  • There are at least three 'kinds' of conversion regular expressions I find myself writing.
    1. conversions of things which don't want to be touched by other regular expressions. These include links, code blocks, and attachments, among others
    2. escapes - when the original wiki uses characters that aren't meaningful in that wiki but ARE meaningful in Confluence, you have to escape those characters or Confluence will take them as formatting and the page will generally look strange
    3. other conversions which just kind of stack up against each other...bold, italics, tables.
      What's working well is to order the above conversion types as shown - 1) 2) 3). For the first type you generally want to tokenize the matches by using the java-regex-tokenizer converter type. This way you convert something but then it gets tokenized so that it won't be touched by any other conversion until it is 'de-tokenized' at the end.
  • Regular Expression Reference Links:
  • When you're testing your converter, you do not need to import to Confluence. Go to the Other Tools tab and uncheck "Pages will be sent to Confluence...". After the conversion, examine the results of your converted pages in the output/output directory. Consider using a shell script to quickly compare the results with an "expected" file.
  • Examine one suggested directory and file structure for UWC consumption for an example of directory and file structure which will allow the UWC to easily consume external content.
  • When developing an exporter, you may have a choice between XML or wiki syntax data. The XML will be harder to convert, so we recommend using the wiki syntax when you have the option.
  • When creating a new converter, start by tokenizing dollar signs. Lots of regexes don't handle dollar signs gracefully, so doing this first can solve a lot of future headaches. Try:

    Mywiki.0010-tokenizedollars.java-regex-tokenizer=([$]){replace-with}$1
    # Add the rest of your converters in between these two
    Mywiki.9000-detokenize.class=com.atlassian.uwc.converters.DetokenizerConverter
    
  • Try to avoid a lot of backtracking when non-greedy matching will do. For example, if you have a multi-char delimiter, and you're trying to deal with the possibility of single chars within the syntax:

    Sample syntax:
    '''Triple apostrophes should become italics'''
    '''What happens if we're using a single apostrophe ?'''
    
    And you want the desired Confluence to be:
    _Triple apostrophes should become italics_
    _What happens if we're using a single apostrophe ?_
    
    Yes:
    (?s)'{3}(.*?)'{3}{replace-with}_$1_
    which says: Use dot all mode, match three apostrophes, and everything after non-greedily/until the next three apostrophes
    
    No:
    '''(([^']|''?[^'])*?)'''{replace-with}_$1_
    which uses pipes to force backtracking if it has fewer than three apostrophes. This can cause stack overflows with complicated inputs.
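
The recommended "Yes" pattern from the last tip can be checked against the sample syntax with a few lines of Java. This is a sketch with a hypothetical class name; it assumes the java-regex converter applies the pattern much as String.replaceAll does.

```java
final class TripleApostropheDemo {
    // Dot-all mode, three apostrophes, a non-greedy body, three apostrophes.
    static final String SEARCH = "(?s)'{3}(.*?)'{3}";
    static final String REPLACE = "_$1_";

    static String convert(String input) {
        return input.replaceAll(SEARCH, REPLACE);
    }

    public static void main(String[] args) {
        System.out.println(convert("'''Triple apostrophes should become italics'''"));
        System.out.println(convert("'''What happens if we're using a single apostrophe ?'''"));
    }
}
```

The single apostrophe in "we're" is simply consumed by the non-greedy body, so no alternation is needed to handle it.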