
Universal Wiki Converter

Got a question?
Check this FAQ and the UWC Forum to see if it's already been answered, or email for support.

 

Conversion


What wikis are currently supported?

Supported Wikis

These pages contain notes relevant to the specific wiki converters supported by the UWC

Where do I submit enhancements or bugs?

Also, see UWC Bug Reporting

Does the UWC support SSL?

The UWC supports connecting to SSL-protected Confluence instances, but you have to turn this feature on explicitly. Careful! If you choose the wrong settings, you could expose yourself to man-in-the-middle attacks.

For more details on how to turn on this feature, see UWC SSL Support.

Why do I see the html tag as part of the conversion?

Example
{html}
<b>Some html</b>
{html}

In many cases there is no Confluence syntax which matches the original wiki's syntax. When preserving that syntax is of key importance, it is converted to HTML, which Confluence can render when it is surrounded by the {html} macro. However, this macro is disabled by default. To enable it in your Confluence, go to Administration -> Plugins and enable it.

What can I do if the wiki I'm using isn't yet supported for Conversion? or There is no converter for the wiki I'm using. What should I do?

Please vote for your wiki on this page:
UWC Converter Vote

The code base is open, so you can develop a wiki converter yourself.

You can also engage us or one of Atlassian's Partners to create it for you. Send a note to cpteam@atlassian.com for more info.

How accurate are the conversions?

Realistically, there is always a bit more to do with a converter: most wikis have dozens of syntax statements, and people typically extend their base wiki syntax with plugins. That said, the converters listed above are currently very usable. Your mileage may vary, but much work has been put into many of these modules to make them as accurate as possible. In some cases there is no exact translation to Confluence syntax, in which case things will look a little different.

Why use this instead of one of the other converters?

Our goal is to have this tool be:

  • up to date
  • user friendly
  • a solid framework upon which to add new converters
  • well documented
  • able to handle metadata such as attachments, labels, and parent-child relationships

Why does a page I send to Confluence sometimes not show up as 'updated' in the Dashboard?

If you send a page to Confluence it sometimes takes the Dashboard a few seconds to update. Also if you've already sent the exact same page it won't show up as changed. However if you change a single character of the page and send it again it should show as updated.

Note:
Updating the Lucene index in your administration console will force the recent results in the dashboard and the browse space -> pages view to update.

The browse space -> tree view is always up to date, regardless of lucene's indexing status.

How technical do I need to be to use this tool?

To convert a supported wiki, in most cases you'll need the ability to install Java, do some Confluence administration tasks via the UI, and understand a bit about where your internal wiki stores its pages and attachments. You may have to edit a properties file in some cases. But the goal is that you don't need to be a programmer.

Is this tool free to use?

Yes

What other wikis have people asked about converting?

See UWC Converter Vote.

I have a website I'd like to pull into Confluence. Is this possible?

Not without some work on your part. The UWC Xml Converter Framework has a number of HTML converter parsers that you could take advantage of, but they will probably not cover all of the different tags your website uses. And you'd have to create your own converter.xxx.properties file and add the parsers you wanted to it. For examples, see conf/converter.mediawiki.properties and conf/converter.testxml.properties.

Alternatively, the Jotspot Converter and Sharepoint Converter have to handle a lot of HTML, so you could try starting with one of those.

How can I add a question to this F.A.Q.?

Either feel free to edit this page with the question or add a comment to the bottom of the page.

Where is the output from the conversion?

The output gets written to uwc/output/output, relative to wherever you run the UWC from. The UWC actually reads everything into memory and holds it there, but it writes these files so you can see what the converted syntax looks like if you wish. Changing the files on the file system will not affect what gets sent to Confluence.

I see the converted files in the output directory, can I change them before sending to Confluence?

Yes, but it takes two passes. When you run the converter everything is held in memory, so changing the converted output files on the file system halfway through the process will not have any effect. However you can:

  1. turn off file uploads by going to the Other Tools page and unchecking "Pages will be sent to Confluence..."
  2. run the converter
  3. change the files on disk
  4. send those changed "output" pages to Confluence, either by:
    • running the converter again, this time using the changed "output" files as your pages and choosing NoSyntaxConversions from the dropdown menu (on this second pass you're basically using the UWC just to send all the files to Confluence), or
    • using the Import Pages from Disk feature that comes with Confluence.

How tricky is it to convert a wiki?

With our current approach, a set of regular expressions is authored to convert the origin wiki's syntax to Confluence syntax. This can be (in some cases) a little harder than it sounds.

For one thing, people use syntax in the original wiki that is not meaningful there but is meaningful in Confluence. So, for instance, if someone writes

-here is a phrase-

in Confluence this text is rendered as "here is a phrase" in strikethrough, so the dashes need to be escaped. But you must escape this before you convert other syntax to Confluence, or you might accidentally escape converted syntax.

One helpful strategy for particularly challenging conversion problems is tokenization. For instance, when you have some text which will ultimately sit between {code} tags, you don't want to touch it in terms of escaping or anything else. But how can this be avoided? One strategy is to Base64 encode it near the beginning of the conversion and then decode it towards the end. Unfortunately, in practice Base64 encoding and decoding is pretty slow, and even more so when run against thousands of pages.

A strategy which works better in the case of {code} blocks is to swap out all the text between what becomes {code} tags into memory, replace with some kind of token which will not be touched, then at the end swap the text back in.
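The swap-out strategy described above can be sketched roughly as follows. This is a minimal illustration in Python, not the UWC's actual (Java) implementation; the token format `~UWCTOKEN<n>~` and the function names are made up for the example, and the dash-escaping step stands in for whatever conversion steps you need to keep away from {code} content.

```python
import re

# Matches a whole {code}...{code} block, including newlines inside it.
CODE_BLOCK = re.compile(r"\{code\}.*?\{code\}", re.DOTALL)


def protect_code_blocks(text):
    """Swap each {code}...{code} block for a placeholder token."""
    stash = []

    def _stash(match):
        stash.append(match.group(0))
        return f"~UWCTOKEN{len(stash) - 1}~"  # hypothetical token format

    return CODE_BLOCK.sub(_stash, text), stash


def restore_code_blocks(text, stash):
    """Swap the placeholder tokens back for the original blocks."""
    return re.sub(r"~UWCTOKEN(\d+)~", lambda m: stash[int(m.group(1))], text)


def escape_dashes(text):
    """Example conversion step: escape -phrase- so it is not read as strikethrough."""
    return re.sub(r"(?<!\S)-(\S(?:.*?\S)?)-(?!\S)", r"\\-\1\\-", text)


def convert(text):
    tokenized, stash = protect_code_blocks(text)   # 1. hide {code} content
    converted = escape_dashes(tokenized)           # 2. run the risky steps
    return restore_code_blocks(converted, stash)   # 3. swap content back in
```

The token only needs to be a string that no other conversion step will match or modify; anything that cannot occur in your source pages will do.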

Links are another challenge area where it's best to tokenize, since there is typically such a large variety of ways to express both internal and external links.

An example here is something like

[this is not a link in TWiki]

but it would be in Confluence so must be escaped to

\[this is not a link in TWiki\]

...however, such escaping must be done before the conversion to Confluence links is made, or you'll end up accidentally escaping valid links. Tokenizing, as mentioned above, does help a bit.
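As a rough illustration of the tokenizing variant, the Python sketch below converts real links first, hides them behind tokens, escapes whatever brackets are left, and only then restores the links. It assumes TWiki-style `[[target][label]]` and `[[target]]` links and a made-up `~LINK<n>~` token format; real converters have to cope with many more link forms.

```python
import re

# Assumed source link forms: [[target][label]] or [[target]]
SOURCE_LINK = re.compile(r"\[\[([^\]\[]+)\](?:\[([^\]\[]+)\])?\]")


def convert_links_then_escape(text):
    links = []

    def _tokenize(match):
        target, label = match.group(1), match.group(2)
        confluence = f"[{label}|{target}]" if label else f"[{target}]"
        links.append(confluence)
        return f"~LINK{len(links) - 1}~"  # hypothetical token format

    # 1. Convert real links and hide the results behind tokens.
    text = SOURCE_LINK.sub(_tokenize, text)
    # 2. Escape every bracket that is left; none of them belong to a link.
    text = re.sub(r"([\[\]])", r"\\\1", text)
    # 3. Put the converted links back.
    return re.sub(r"~LINK(\d+)~", lambda m: links[int(m.group(1))], text)
```

Because the converted links are hidden while step 2 runs, the escaping pass cannot damage them, whichever order the individual link rules were applied in.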

Another tricky problem is any syntax that allows multiple levels of nesting. (HTML, I'm looking at you.)

Regression testing is very important to make sure you don't accidentally backslide in such cases. One strategy here is to maintain a file of syntax examples, both pre-conversion and post-successful-conversion. When you add something to your converter, you can compare the new results against the known-good converted output with a good quality diff program (I use JEdit's excellent JDiff Plugin). Also see UWC Testing Strategy.
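If you would rather script the comparison than use a diff program, something along these lines works. It is a small Python sketch using the standard difflib module; the function name is invented for the example, and you would feed it the text of your known-good file and of your freshly converted file.

```python
import difflib


def diff_against_expected(actual, expected):
    """Return unified-diff lines; an empty list means no regression."""
    return list(difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))
```

An empty result means the converter still produces the known-good output; any returned lines show exactly where it backslid.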

Where can I ask questions about the UWC?

Here or the UWC Forum are good places.

During a conversion the UWC keeps getting stuck on the 1077th page (or some page). What is the problem?

I've seen something similar happen previously. Very occasionally the UWC gets stuck for some time on a single page which is either very large or contains a large number of some complex construct, such as tables, that it is trying to convert. Occasionally, Confluence will also reject a page that contains certain unexpected characters. (Vertical tabs are a common culprit in this uncommon scenario.)

When I've seen this in the past the page is not a user created page but rather a system created page...perhaps something like a history of all the edits for that wiki and several megabytes in size.

When the UWC gets stuck on that page, look at the DOS window or Unix shell you were running the UWC from. At the very bottom it should tell you which file it is currently trying to convert.

Then kill the UWC (close the window or press ctrl-break).

Then either move the problem file elsewhere or delete it (you could also just deselect it in the UWC), run the UWC again, and retry the conversion.

Essentially you probably want to eliminate that page from the conversion or handle it some other way. It's very likely you won't want that page anyway.

Will bugs I submit be fixed?

Possibly. The source code is available and version control accessible. There is a working ANT build system. So anyone is free to submit fixes.

See Also: UWC Bug Reporting

Can I convert HTML pages?

There isn't yet an HTML converter. However, the UWC Xml Converter Framework has several HTML handling converters available that you can take advantage of as a starting point. Since HTML syntax covers a wide range of possibilities (as well as possibly not being valid XML), you'll probably need to do a bit of development to get the conversion the way you want it.

Check out the conf/converter.mediawiki.properties file for examples of usage. (Search the file for .xmlevent converter properties.)

See also:

How do I build the UWC from the subversion source?

See: UWC Developer Documentation#Build Instructions

I'm getting a NoClassDefFoundError when I run the UWC, how can I fix this?

This means that something is wrong with the shell script.

Step 1: Complain loudly at the UWC Forum

Sometimes you can fix this yourself, but you'll need to figure out what library is missing, how to get it, and then you'll need to fix the shell script so that the classpath is updated. That can be quite a headache.

How can I combine Confluence spaces and rearrange pages post conversion?

Have a look at this video

The wiki I'm converting uses tags, labels, keywords or a similar type of metadata. How can I import those as Confluence labels?

As of v.50, the UWC provides a labels handling framework. Simply use the Page object's labels methods in your Converter class to add labels to a given Page object. For more details and examples, see UWC Labels Framework.

My wiki has parent-child relationships or a directory structure, but the converted pages are all orphaned with no relationships. What happened?

The wiki converter you're using is probably not taking advantage of the UWC Hierarchy Builder Framework. If the framework is not being used, the UWC will upload all the pages as orphaned children in the same space.

Is there a way to upload to multiple spaces?

Yes. Please see the UWC Auto Detect Spacekeys Framework.

How many pages should I try to upload at once?

We haven't run into an upper limit at this time. We've had success with conversions with as many pages as 10000, and we routinely do conversions of 6000+. So, go ahead and do your worst. If you get OutOfMemory errors, first double check the 3 more likely culprits:

If those don't help, then try scaling the number of pages back and see how it goes.

How big can the attachments be?

We haven't run into hard limits here, but the Remote Api has been known to have issues with uploading large attachments. As of Confluence 2.9, they write that uploading attachments takes about 4x the amount of memory as the size of the attachment. For this reason, we recommend limiting attachment size to 10M. You can limit attachment size in the UWC. On the Other Tools tab, there is a Restrict Upload Attachment Size feature. For more info on that feature, see Restrict Attachment Size Doc. Also, remember your Confluence's General Configuration can also limit the size of attachments you want to upload.

In addition, if you have the Confluence Webdav Plugin installed, you can turn on the UWC Webdav Attachments feature to use your webdav to upload attachments. This is faster and takes less memory than the Remote API.

Out of Memory Errors
If you do run out of memory while attempting to upload attachments, the UWC will report an Internal Server error. That looks like this:

BAD_FILE: Couldn't send attachment somefile.doc. Skipping attachment.
java.io.IOException: Server returned HTTP response code: 500 for URL: your.confluenceurl/rpc/xmlrpc

To confirm why you got the Internal Server Error, check your atlassian-confluence.log for errors like:

[[Standalone].[localhost].[/].[xmlrpc]] invoke Servlet.service() for servlet xmlrpc threw exception
java.lang.OutOfMemoryError: Java heap space

To accommodate your memory needs, you'll need to increase the memory your Confluence's Tomcat has, or use the UWC Webdav Attachments feature. If you're using the Remote API, try increasing your Tomcat's free memory to be at least 4x the size of your biggest attachment. If you're using webdav, you may still need to increase memory, but it will need less memory than the Remote API.
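As a back-of-the-envelope check, the 4x rule of thumb can be written down like this. It is a Python sketch; the function name is made up, and the factor of 4 is only the approximate figure quoted above, not a guarantee.

```python
def min_free_heap_mb(attachment_sizes_mb, factor=4):
    """Rough lower bound on free heap (in MB) needed for the largest attachment.

    factor=4 reflects the "about 4x" figure quoted above; treat it as an estimate.
    """
    return factor * max(attachment_sizes_mb, default=0)
```

For example, if your largest attachment is 10 MB, `min_free_heap_mb([2, 10, 7.5])` suggests leaving at least 40 MB of free heap for the upload.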

How big can the pages be?

Sometimes the UWC will choke on a page that's several MB in size. Usually pages that are this large are artifacts of the old wiki: system pages, not user content. In that case, you probably don't want them anyway. We recommend that you take a look through your system for pages of unusual size and double check that you want them before trying to convert them.

Does it matter what database Confluence is using?

Confluence's bundled database, Hypersonic, sometimes runs into trouble when the UWC is uploading large quantities of data. If you're going to be doing a conversion of a large wiki, we recommend that the database Confluence is running be a production-level database (MySQL, Oracle, Postgres, etc.).

I have a very large wiki to convert. Are there any gotchas I should look out for?

Yes. See the following:

How do I connect the UWC to Confluence through a proxy?

Try changing the java command line arguments in the run_uwc* script of your choice like so:

from
java -Xms512m -Xmx512m -jar uwc.jar

to
java -Dhttp.proxyHost=proxy.mycorp.com -Dhttp.proxyPort=8080 -Xms512m -Xmx512m -jar uwc.jar

See Also:

How is regression testing and unit testing handled in the UWC?

See UWC Testing Strategy for Info on how the UWC project tests its code and handles regression testing.

I want to maintain the author and/or last updated information. The UWC sets that info all to my login and the date of the import. How can I fix that?

You can use the UWC UDMF Framework to import that data. You'll need to install the UDMF Plugin in order to use this feature.

But can't I just change the database?

So, as I'm not a DBA, and as Atlassian does not recommend directly making changes to your Confluence database, and as we're often discussing production level data, I can't recommend this course of action, nor can I advise you on it. If you choose to go this route, the only recommendations I can make are (a) please triple check that your data is backed up before you do this, and (b) please develop your solution on a test Confluence.

Generally speaking, we feel that this solution is like playing Russian roulette with your data. Take that as you will.

I have a choice between exporting XML and exporting wiki syntax. Which should I choose?

If you decide to use xml, then you'll probably want to take advantage of the UWC Xml Converter Framework, which uses a SAX based event parsing model.
If you decide to use wiki syntax, you'll mostly be using regular expressions.

I would base the decision on a couple of questions:

  • Is there an existing converter that covers some of your existing syntax from which you can draw? For example, if you know that your wiki uses a subset of TWiki syntax, then it might be a good idea to start with the TWiki converter as an example, and therefore you'll want data that is similar to whatever TWiki exports.
  • Which sounds easier to you: "regular expressions" or "SAX event parsing"? Different people have different skill sets. If you already know a lot about regular expressions and not so much about SAX parsing, then maybe you should go with regular expressions.
  • Does the xml use html type tags? The Xml Framework comes with assorted HTML handling classes that you can just use (bold, lists, etc.). So, you might save yourself some work by using the Xml Framework if you're dealing with html.
  • Are you comfortable with java? You don't need any particular java development knowledge to use the .java-regex, .perl, and .java-regex-tokenizer converter types. Added bonus: These types don't require rebuilding the project to make changes, so development is quicker.

What can I expect the UWC to provide? What can't it provide?

The UWC has a number of wiki converter modules that are designed to convert the majority of page content and syntax that most users take advantage of with that particular wiki. But the UWC cannot make a 100% conversion. There will always be something that doesn't quite translate: your old wiki did something that Confluence doesn't, you were using custom macros or plugins that the developer of your wiki's converter module couldn't have known about, your users chose clever uses of syntax that are hard to anticipate, and so on. The UWC is a great way to start the process of moving your data from one wiki to another. It cannot automate every task. Your mileage will vary based on your particular needs.

Goals of the UWC project

  • To provide a GUI tool which makes converting a wiki to Confluence as easy as possible.
  • To provide a framework which handles common tasks such as sending pages and attachments to Confluence.
  • To provide a framework which makes it easy to re-use regular expressions others have written for converting other wikis to Confluence.
  • To provide a solid foundation for continued open source development on this project through sufficient documentation and version control.

Things that are currently beyond the scope of this product

  • Converting users or user profiles.
    There is a 'bulk user upload' plugin for Confluence worth checking out if you're interested in pursuing user conversion.
  • Wiki syntax/markup for which there is no Confluence equivalent. For instance, TWiki tables allow column spans while Confluence currently does not. Though no data should be lost, the conversion is imperfect in such cases.
  • Forums

How do I turn on DEBUG messages?

Sometimes, when one is having a problem getting the UWC to do something, it's helpful to turn on DEBUG messages in the console (and uwc.log). To turn them on, do the following:

  1. edit log4j.properties
  2. Change the line that looks like this:

    #log4j.logger.com.atlassian.uwc=DEBUG
    

    to this (ie. uncomment it):

    log4j.logger.com.atlassian.uwc=DEBUG
    
  3. If you're turning on debug messages because you're having trouble connecting to an SSL encrypted Confluence (or you suspect a non UWC library is having trouble), also change the line that looks like:

    log4j.rootCategory=INFO, A1, A2
    

    to this:

    log4j.rootCategory=DEBUG, A1, A2
    
  4. Re-run the UWC, and re-do the failing operation. Now, both in the console and the uwc.log file, there will be a larger number of messages indicating what's going on.

Where is the uwc.log file?

If you're having trouble finding the uwc.log, look for it in the same directory you ran the UWC from.

I'd like to submit some code to the UWC project. Who should I contact? What format should I send the code in?

We'd love to have your code. Thank you!

Please contact laura.kolker@gmail.com to discuss code submissions.

As for format, we are willing to accept entire files or zips of files. Patch files are also a good method.
Please consider the following guidelines when submitting:

  1. If your patch comprises more than one feature, each patch file (or zip of files) should handle no more than one feature. (A feature could be "New XYZ Framework", "Improve Syntax Converters for Mediawiki Tables", "New Wiki Converter" etc.)
  2. Include a note along with each patch file summarizing what it does.
  3. If the feature for a patch file is a refactor, include an explanation for how the refactor is beneficial to the project.
  4. If you are improving or adding syntax converters, please consider providing sample files, to make testing (and comprehension of your changes) easier. You can read more about our test strategies and the sort of files we would be looking for here: UWC Testing Strategy
  5. Please tell us which svn revision you worked on. You can find out by typing svn info in the directory you originally checked out:

    $shell$ svn info
    ...
    Revision: 12345
    ...
    
  6. Code you submit will be released under the standard UWC license. If there are potential licensing concerns that you (or perhaps your own clients) have, please resolve those before submitting your code. There's nothing we can do to help with that problem anyway, and we can't use code that's not being released to the public under the standard license.

Following these guidelines will make the process of incorporating your code go more smoothly and be more likely to succeed.

Again, Thanks!

I'd like to create a new converter.xxx.properties file or exporter.xxx.properties file. How do I get the UWC to recognize my new file, and show it as an option in the drop-down menu?

The GUI figures out which wikitypes to include by looking for converter.xxx.properties files in the conf/ directory. It enables the exporter button for wikis that have both converter and exporter.xxx.properties files. So just create your file in the conf directory, and make sure if you're making an exporter file that there's a comparable converter file to go with it.

The files don't have to have any particular content in order for the GUI to recognize them. For that purpose, the file can be empty.
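To make the naming rule concrete, here is a Python sketch of that kind of filename-based discovery. It only illustrates the rule described above and is not the UWC's own (Java) code; the function name and the returned structure are invented for the example.

```python
import re
from pathlib import Path

PROPS_NAME = re.compile(r"(converter|exporter)\.(.+)\.properties")


def discover_wikitypes(conf_dir):
    """Map each wiki type with a converter file to whether it also has an exporter."""
    converters, exporters = set(), set()
    for path in Path(conf_dir).iterdir():
        match = PROPS_NAME.fullmatch(path.name)
        if not match:
            continue  # not a converter/exporter properties file
        kind, wikitype = match.groups()
        (converters if kind == "converter" else exporters).add(wikitype)
    # An exporter file without a matching converter file is ignored.
    return {name: name in exporters for name in sorted(converters)}
```

Only the filenames are inspected, which is why an empty file is enough for the wiki type to be recognized.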

I'm getting a FileNotFoundException in the output/output directory. File permissions are correct. What's wrong?

Several of the UWC wiki converters use the ChopPageExtensionsConverter, which removes the ".xyz" at the end of the filename. The problem is that it doesn't know what to do if you give it a hidden file whose name starts with a dot. For example: .bash_profile

In that case, ChopPageExtensionsConverter gets rid of everything after the dot, which is everything, and then fails when it tries to write to a file with the resulting empty name.

So, if:

  1. you're getting FileNotFoundExceptions that look like this:

    2008-12-15 10:49:35,164 ERROR Thread-3 - Error writing to file output\output\.
    
    Note: Output file cannot be written to disk. Check permissions.
    java.io.FileNotFoundException: output\output (Access is denied)
    
  2. and your wiki converter uses the ChopPageExtensionsConverter

then check to see whether any of the pages you gave the UWC have filenames beginning with a dot. If so, you'll need to handle those files separately, assuming that you want them. There may be a small enough number of them that handling them manually is reasonable. Alternatively, you could add an additional converter to deal with the title issue using the UWC Page Titles Framework.
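The failure mode is easy to reproduce outside the UWC. The Python sketch below is not the ChopPageExtensionsConverter itself (which is Java); both functions are hypothetical stand-ins, and the first assumes the extension is cut at the last dot. It shows why a dot-file ends up with an empty name and one way a guard could avoid that.

```python
def chop_extension(filename):
    """Naive chop: drop everything from the last dot onward."""
    return filename.rsplit(".", 1)[0] if "." in filename else filename


def chop_extension_safely(filename):
    """Same idea, but leave the name alone if chopping would empty it."""
    head, dot, _ext = filename.rpartition(".")
    return head if dot and head else filename
```

With these definitions, `chop_extension("Page.txt")` gives `"Page"`, but `chop_extension(".bash_profile")` gives an empty string, which cannot be used as an output filename. The guarded version returns `".bash_profile"` unchanged.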

I'm getting a HeadlessException when I try to run the UWC. What's wrong?

If you're getting an Exception that looks like this:

Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException

then java is trying to tell you that it can't run a graphical user interface on the system you're trying to run it on.
Perhaps you're using a Linux box that doesn't have X set up?

Options you have:

  1. Try the UWC Command Line Interface.
  2. If you'd prefer to use the GUI, then you can't run it headless. Use a system that always handles GUIs (Windows, Mac, etc) or make sure you have X up and running.

I want to turn off the Auto Detect Spacekeys Feature. How do I do that? or The spacekey textfield is disabled. What's going on?

The UWC Auto Detect Spacekeys Framework is used by some wiki converters to handle the possibility of namespace collisions. For example, Sharepoint can have multiple wiki sites with pages that have the same name. In order not to overwrite non-unique pagenames by uploading them all to the same space, the UWC can be set to upload to multiple spaces by turning on the auto detect spacekeys feature.

If this feature is turned on for your wiki, and you want to turn it off, below are instructions for how to do that.

Careful!

Be sure your wiki's data does not fit the namespace collision scenario described above, or turning off the auto detect spacekeys feature might cause you to lose some of your data.

To turn it off:

  1. Comment out the setting in your converter's conf/converter.xxx.properties file. It will look like this:

    MyWiki.0001.autodetect-spacekeys=true
    

    This will turn off the auto detect spacekeys framework.

  2. Comment out your wiki's gui disabling setting in conf/guisettings.properties, which will look like this:

    mywiki=com.atlassian.uwc.ui.guisettings.MyWikiGuiDisabler
    

    This will enable the spacekeys textfield in the gui.

  3. Restart the UWC

I want to make changes to my files using my own script. Can I still leverage the UWC?

Yes. Remember, you can always pick and choose which converters you want the UWC to use by commenting out any unnecessary ones from your converter.xxx.properties file. There's nothing that says you can't pre-process your files any way you want. Just remember that your wiki's converter module may not be prepared to handle your processed input. So, remember to do some testing, if you're combining your own script with pre-existing UWC converter modules.

You can also use the UWC simply to upload your files by using the NoSyntaxConversions wiki type from the drop-down menu. This will do nothing to the files but upload them to the Confluence space you indicate. Note: you can use Confluence's existing Import Pages From Disk feature to upload files to Confluence without changes.

A good way to leverage the UWC with your own syntax converting script is to let the UWC handle metadata and attachments, which Confluence's Import Pages From Disk feature can't do for you. For example: let's say you have a perl script to handle all your syntax changes. You run the script on your pages, and they're now Confluence ready. Since you also happen to want to automatically detect and attach attachments, you comment out all the converters except the AttachmentConverter in your converter.xxx.properties file, and then run the UWC on your pages.

Does the UWC support Siteminder? or other SSO?

The UWC does not currently support SiteMinder or other SSO systems. That being said, the UWC can work in an environment where Siteminder is used to protect your Confluence. However, this requires Siteminder to be set up properly so as not to protect the xml-rpc or soap portions of the Confluence application, which it arguably shouldn't be doing anyway. (SSO systems often assume user interaction in order to work properly, which a Remote API by definition cannot provide.)

Things you can do if your Confluence's login process is managed by Siteminder:

Disable Siteminder Protection for your Confluence's Remote API
If you want to get the UWC working directly with your Siteminder protected Confluence, what you need to do is

  1. make sure that your Confluence's Remote API is not protected by Siteminder. Your Siteminder admin will need to add some sort of rule to disable Siteminder for the remote API. The Remote Api's url will be:

    your confluence url + "/rpc/xmlrpc"
    
  2. You'll also need to get a Confluence login (that isn't managed by Siteminder), so that you can give the UWC a login that it can use.

Alternatively:
Migrate locally - Export to prod via Web UI

  1. Install a local Confluence onto your desktop. (Use the same version of Confluence as your Siteminder protected one.)
  2. Use the UWC to migrate your wiki to this local Confluence. Migrate to a spacekey that your Siteminder Confluence doesn't currently use.
  3. Use Confluence's export/import features to export the space and then import it back into the Siteminder protected Confluence. (You'll need confluence-administrator access/cooperation to do this.)

How do I add a library jar to the UWC?

The UWC tries to be flexible. While we bundle the libraries we expect you to need with the UWC, sometimes we won't anticipate a customization that you'll need to do your migration. For example: Let's say you're migrating your mediawiki, but your mediawiki database isn't MySQL. The mediawiki exporter allows you to tell us what database to use but you'd have to provide the driver jar.

If you're already building the UWC from source, you can just copy your jar to the lib directory.

But if you're not building the UWC from source already, here's what you need to do:

  • Starting in your uwc directory, create a lib directory, and copy your jar to it

    $shell$ cd uwc
    $shell$ mkdir lib
    $shell$ cp path-to-jar/somelib.jar lib/.
    
  • create a classes directory, copy your uwc.jar to it, and explode it

    $shell$ mkdir classes
    $shell$ mv uwc.jar classes/.
    $shell$ cd classes
    $shell$ jar xvf uwc.jar
    $shell$ cd ..
    
  • Use the run_uwc_devel.sh script, which will use the classes directory to run the UWC, and add the contents of the lib directory to the classpath

    $shell$ ./run_uwc_devel.sh
    

    If you're on a Windows system that's not shell-script friendly, you'll need to create your own run.bat file which should look something like this:

    set UWCClasspath=classes
    set UWCClasspath=%UWCClasspath%;lib\dothislineforeachlibrary.jar
    
    java -Xms512m -Xmx512m -classpath %UWCClasspath% com.atlassian.uwc.ui.UWCForm3
    

    The first line adds the classes directory (with your exploded jar contents) to the classpath.
    The second line appends the location of the indicated jar to the classpath. You can repeat this line for every jar you're trying to add.
    The last line sets the classpath and calls the UWC.

I had two pages that had case sensitive names and now there's only one. What happened to the other one? or How do I handle namespace collisions? or what's a NAMESPACE_COLLISION error and why did I get one?

Different wikis sometimes handle namespace concepts differently. One wiki might allow you to have two pages with the same name as long as they have different parent pages or are in different "sites", while another wiki might not allow that. Some wikis might allow case sensitive naming ("PAGENAME" vs. "pagename"), while others will not. Even the terminology may be subtly different. Mediawiki has namespaces while Confluence has spaces. Are these just different names for the same thing? Or not?

As of UWC 3.5.0, the UWC will attempt to detect Namespace Collisions and provide errors regarding what pages are having problems. If you get a namespace collision, the error will look something like:

NAMESPACE_COLLISION: Potential namespace collision detected for pages:
sampleData/hierarchy/case-sensitivity/A/foo, sampleData/hierarchy/case-sensitivity/B/Foo

You can turn this option off, by setting the following property in your conf/converter.xxx.properties file:

MyWiki.0001.list-collisions.property=false

Generally, we recommend that you try to find out as much as possible about how your wikis differ from Confluence in how they define namespaces before attempting to migrate. This will help you isolate potential pitfalls before your migration.

Here are some things to know about Confluence's naming conventions:

  • Any page in a space must have a unique name. It doesn't matter if it has a different parent. The parent child relationships do not change this fundamental requirement. See CONF-2524 - Enable creation of same-named pages within a space for the long-running discussion on this issue.
  • Pagenames are case insensitive. If you try to migrate pages "Foo" and "foo" to Confluence, whichever one is migrated second will overwrite the former.
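Since both rules above can silently cost you a page, it can help to scan the exported files for colliding names before you convert. The following is a minimal sketch, not part of the UWC. It assumes each exported page is a single file whose file name is the page title, and the function name list_name_collisions is made up for illustration. It prints every file name that occurs more than once when case and parent directory are ignored.

```shell
# Hypothetical helper, not part of the UWC.
# Usage: list_name_collisions DIR
list_name_collisions() {
    find "$1" -type f |
        sed 's|.*/||' |                  # drop the parent directories, keep the file name
        tr '[:upper:]' '[:lower:]' |     # Confluence page names are case insensitive
        sort |
        uniq -d                          # print only names that occur more than once
}
```

For a tree containing A/foo and B/Foo, this would be expected to report foo, because the two files differ only by case and parent. File names containing newlines are not handled.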

If you suspect that you have "lost" a page due to a naming convention issue, it's probably in the history of the similarly named page.

The UWC tries on a wiki-by-wiki basis to help avoid these problems: sometimes by using the UWC Auto Detect Spacekeys Framework, sometimes by linking to preparation tools in the wiki specific notes. Sometimes you can take advantage of the UWC Hierarchy Framework's - Use Pagenames option. But even with these tools, there are limits to how much of the handling of namespace collisions can be automated. For example, if you have two pages "Foo" and "foo", the UWC can't know the "right" way to rename them so that they are meaningfully differentiated. Generally, that requires human intervention.

Don't forget links

If you're going to manually (or otherwise) change the page titles of pages with namespace collisions, don't forget to update the links to the changed pages.

What is the difference between stable, alpha, and user contributed wiki converters?

These aren't meant to be hard and fast terms. Generally, they tend to mean something along the following lines:

  • stable - developed with or tested heavily against real data (from an actual wiki of that type that had users, pages, etc.)
  • alpha - developed with or tested mostly against sample data. Alpha releases will be more likely to have syntax edge cases that haven't been anticipated.
  • user contributed - developed mostly by someone not typically associated with the UWC project. This is mostly an indication that the converter is a less well-known quantity, and it may be more difficult to get knowledgeable support as the current UWC team might not have in-depth knowledge of how this converter functions.

Some characters in my files are turned into question marks. What do I do? Or how do I fix a problem with the character encoding?

The UWC currently presumes your characters are encoded in UTF-8 or in an encoding that is a subset of UTF-8. If yours are not (ISO-8859-1, I'm looking at you), you may find your characters are not displayed correctly at the end of the conversion.

Use the Character Encoding Property

You can tell the UWC the correct character encoding to use with the UWC Character Encoding Feature.

Alternatively, you can transform your documents to UTF-8 using unix tools like perl and iconv.
If you use a Mac or Unix style system, you may already have these tools. Otherwise, you may need to install them, and possibly learn how they work.

Here's our recommendation for what to do.
To transform one file:

$shell$ iconv -c -f ISO-8859-1 -t UTF-8 in.txt > out.txt

which translates to: Use the iconv program to transform the file in.txt from a character encoding of ISO-8859-1 to a character encoding of UTF-8 and output this to the file out.txt.

Now, you're probably not going to want to do this individually for each file.
To transform all the .txt files in a branch of your file system:

$shell$ find . -type f -a -name '*.txt' | \
perl -nle 'rename $_, "$_.orig"; print "$_.orig"; print "$_"' | \
xargs -L2 sh -c 'iconv -c -f ISO-8859-1 -t UTF-8 "$1" > "$2"' -

which should find all the files in the current directory and all its children that end in .txt, rename each file to foo.txt.orig, and write a UTF-8 version of it under the original name, swallowing any encoding transformation errors as they occur (that is what the -c flag does). Note that this pipeline assumes your file names contain no whitespace.
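Before converting, it may be worth checking which files actually need it: running an ISO-8859-1 to UTF-8 conversion on a file that is already UTF-8 will double-encode its non-ASCII characters. A minimal sketch of such a check follows. The function name is_utf8 is invented here, and it relies only on iconv exiting with a non-zero status when the input is not valid in the stated source encoding.

```shell
# Hypothetical helper: succeeds if FILE decodes cleanly as UTF-8.
# Usage: is_utf8 FILE
is_utf8() {
    # No -c here: we want iconv to fail on invalid input, not skip it.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example use (commented out so the block only defines the helper):
# for f in *.txt; do is_utf8 "$f" || echo "not UTF-8: $f"; done
```

Plain ASCII files pass the check, since ASCII is a subset of UTF-8. A file that fails is not necessarily ISO-8859-1, so it is still up to you to know the real source encoding.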

How do I move pages, spaces or entire Confluence sites between Confluence servers?

First of all, there are better ways of moving content between Confluence servers than the UWC. The UWC has no built-in way to do this. One could be created, but there is no need.

Moving an entire Confluence site:

  • The easiest way to do this is to do a site backup and then restore (if it works). There are limitations on the size of what will work here - specifically 2GB is the limit. Attachments are what eat away at the limit. Optionally you can back up just the data and move the attachments manually. More info on that process here
  • Here is another page on doing it more manually. This is a bit more complicated, but it works in all scenarios - Migrating Confluence Between Servers

Copying an individual space or some pages

More information:

I get a BAD_SPACE or USER_NOT_PERMITTED error when I try to connect to Confluence. What could be wrong?

Here are the most likely problems:

  1. the spacekey doesn't exist on that Confluence
  2. the login you're using doesn't have page creation permissions for that spacekey.
  3. Your confluence is protected by a Single Sign On system. See Does the UWC support Siteminder? or other SSO?

Still having problems? Try turning on the UWC Auto Detect Spacekeys Framework. If it works, then you were having a user permissions or spacekey typo problem. If it doesn't work, then you're having some sort of system/networking problem.

What version of Java should I use?

Use Sun's Java 6.

Users report that GCJ doesn't load the GUI properly.
Fedora's IcedTea is untested.

Sun's Java 5 would theoretically work if you wanted to build the source (and the CRJW dependency library) yourself, but we're not supporting it at this time.

I get a 403 Forbidden error. What's wrong?

A 403 error usually means that the resource exists, but you can't have it for some reason.
What this probably means is the Remote API isn't turned on. See UWC Quick Start - Prerequisites.

How do I figure out the address or url setting?

The address setting needs to be the url to your confluence, the same way you'd type it in your browser, whatever that happens to be.
For example, let's say you login to Confluence, and go to your Dashboard. The url bar might have something that looks like:

http://localhost:8080/confluence/dashboard.action

Everything before the "/dashboard.action" is your confluence url. So, you would tell the UWC:

http://localhost:8080/confluence
or
localhost:8080/confluence

Example 2: Let's say your dashboard url looks like this:

https://mysite.com/dashboard.action

The https part says your confluence is protected by SSL. Be sure to put the protocol in your address setting, as this is required to tell the UWC to use SSL to connect. See also: UWC SSL Support

In addition, unlike the previous example, there's no context path (the /confluence). So your address setting would then be:

https://mysite.com
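Both examples follow the same rule: strip the trailing /dashboard.action and keep everything else, including the protocol and any context path. As a small illustration (the function name confluence_address is invented for this sketch, and it is not part of the UWC), shell suffix removal does exactly that:

```shell
# Hypothetical helper: derive the UWC address setting from a dashboard URL.
confluence_address() {
    # ${1%suffix} removes the suffix from the end of the first argument,
    # and leaves the value unchanged if the suffix is not there.
    printf '%s\n' "${1%/dashboard.action}"
}

confluence_address "http://localhost:8080/confluence/dashboard.action"
confluence_address "https://mysite.com/dashboard.action"
```

The two calls print http://localhost:8080/confluence and https://mysite.com, matching the address settings shown in the two examples.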

Where do I run the UWC?

Sometimes we hear the following:

My source wiki is on a server, and my confluence is on a different remote server. Where do I run the UWC? My local machine? One of the remote servers?

So there are several parts to this process.

Export
Each wiki's export process is different (see your wiki specific notes for details), but the goal will be to export your source wiki data to the machine you are running the UWC from. This could be your local machine, or the source wiki server, depending on your wiki, your organization, your level of administrative access, etc. Sometimes you will provide database settings, and the UWC will export data from your remote database to a file system. Sometimes you will need to copy your wiki's data from the server file system to your local file system. But the goal remains the same: export your data to the same machine that you will be running the UWC from.

Attachments
As with export, attachment settings are wiki specific. Again, the general idea is to get a directory of attachment data to the same machine you are running the UWC from. For example, mediawiki attachments are saved to the server's file system in a directory. If you were running the UWC from your local system, then you would need to copy that attachments directory to your local system.

Confluence Conversion and Upload
To run the converter and upload your data to your Confluence, you will provide your Confluence's url as well as your Confluence login information to the UWC. The UWC will use your Confluence's Remote API to send that data from your local file system over your network to your Confluence.

What if Confluence fails to upload the file?

Sometimes confluence will reject a converted page for some reason. You'll need to check:

  1. first, the uwc.log to see if it tells what the problem is.
  2. and often, second, the atlassian-confluence.log (in your confluence-home/logs dir) to see if it sheds additional light.

There can be many reasons confluence will reject a page. Here is an example of one such reason, and how to deal with it.

Example of how to deal with a confluence rejected page

You have an error in your uwc.log that looks like:

2010-03-03 15:15:33,034 ERROR [main] - Unknown problem occured while sending page 'PAGENAME'. See atlassian-confluence.log for more details.
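If a large conversion produces many such lines, a quick way to list the affected pages is to pull the page names out of the log. This is a sketch only: the pattern is based solely on the single sample line above, so adjust it if your uwc.log words the message differently. The function name failed_pages is made up for illustration.

```shell
# Hypothetical helper: print the page names from "while sending page '...'" errors.
# Usage: failed_pages uwc.log
failed_pages() {
    # -n plus the p flag prints only the lines where the substitution matched.
    sed -n "s/^.* ERROR .*while sending page '\([^']*\)'.*\$/\1/p" "$1"
}
```

Page names that themselves contain an apostrophe would be cut short by this pattern.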

You check your atlassian-confluence log, and it says during the proper time period (truncated unnecessary stack traces):

2010-03-04 13:51:14,193 ERROR [http-8082-5] [confluence.rpc.xmlrpc.XmlRpcServer] serviceXmlRpcRequest Exception
servicing XML-RPC request: org.apache.xmlrpc.ParseFailed: Character reference "&#7" is an invalid XML character.
...
2010-03-03 15:27:33,848 ERROR [http-8082-3] [confluence.rpc.xmlrpc.XmlRpcServer] service javax.servlet.ServletException:
org.apache.xmlrpc.ParseFailed: Character reference "&#7" is an invalid XML character.

Confluence tried to transform the page data into valid xml for its own reasons, but the page data had an illegal xml character. This usually means a control character (not letters, numbers, etc) that probably is in your data by mistake.

To fix the problem, I need to identify the control character that's causing a problem.

  1. I need to identify the section of the page with the bad character, so I copy the file, delete half of it, and run the converter on it. Do I get the same error? Yes? Delete another half of the file and try again. No? Revert the file to its former state, and delete half of the previously deleted content. The idea is to quickly narrow down which line has the offending character. You've found the problematic line when its inclusion causes the failure, and its exclusion fixes it.
  2. Now that I know which line is the problem, I run od (octal dump), a unix utility, on the file, like so:

    $shell$ od -c test.txt
    

    You'll see the contents of your file, and characters like newlines and other control characters will be represented. Here's an example of a text file with only the words: "testing 123" and a newline.

    Ed:~/tmp laura$ od -c abc.txt
    0000000    t   e   s   t   i   n   g       1   2   3  \n
    0000014
    
  3. Now that you can see the guts of your content, even the normally hidden ones, you can identify what needs to be removed, so that confluence will allow the page to be uploaded. Characters like \n, \r, \t are fine. They represent newlines, carriage returns and tabs. Most other characters starting with a backslash (\a, \b, \v) are potential problems. Also, if you see a bunch of numbers spaced very closely together (as opposed to the way they're spaced above), that's indicative of an encoded character that you might need to handle.
    See also control character table.
  4. So, you've identified the character. Next, you should compare that section of the exported file to the original wiki page to determine what you should do with it. Is it doing something useful? Or is it a hidden character?
  5. Chances are it's hidden, and you want to get rid of it. You can manually get rid of the character by editing your file, and you can also add a converter to have the UWC remove it for you. If it's a control character that can be represented with a backslash escape, like \a, you can add the following to your converter.xxx.properties:

    wiki.0001.control.java-regex=\a{replace-with}
    

    If it's not one of the control characters with that sort of shortcut, then you'll have to work out the hex value and use the \u option in your regular expression:

    wiki.0001.test.java-regex=\u00a0{replace-with}
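As a complement to the halving approach in step 1, the offending lines can often be located directly. The sketch below is an illustration rather than a UWC feature, and the function name find_control_chars is invented. It follows the rule of thumb from step 3: tabs and carriage returns are treated as harmless, and any other control character is reported by line number.

```shell
# Hypothetical helper: print the numbers of lines containing a control
# character other than tab or carriage return.
# Usage: find_control_chars FILE
find_control_chars() {
    # tr removes the harmless characters but keeps newlines, so line numbers
    # still match the original file. LC_ALL=C makes the tools work on bytes.
    LC_ALL=C tr -d '\t\r' < "$1" |
        LC_ALL=C grep -n '[[:cntrl:]]' |
        cut -d: -f1
}
```

You could then inspect a reported line with something like sed -n '12p' test.txt | od -c, where 12 stands for one of the reported line numbers. Files containing NUL bytes may make grep treat the input as binary, in which case this sketch will not report useful line numbers.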
    

I'm getting Out of Memory errors? What to do?

Note that the UWC is a separate java application from confluence.
So in short, if you get Out of Memory errors in your atlassian-confluence.log when you run the UWC, then you would increase the memory allocated to your confluence's tomcat.

But if you're getting Out of Memory errors in your uwc.log (or on the console when running the uwc), then you would need to allocate sufficient memory to the UWC itself.

To increase the memory for the UWC, edit whichever sh or bat file you are running (run_uwc.bat, run_cmdline.sh, etc.), and update the -Xmx setting there. For example, if I wanted to increase the memory to the UWC when running it using run_cmdline.sh, I would edit the file, search for the line with '-Xmx' (which represents max heap) and change that to:

java -Xms256m -Xmx1024m $APPLE_ARGS -classpath $CLASSPATH com.atlassian.uwc.ui.UWCCommandLineInterface $1 $2 $3 $4

Remember to think about how much memory you have on your system and what other applications are using that memory when choosing a new max heap value.
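If you prefer not to edit the script by hand, the same change can be sketched with sed. This assumes the launcher contains a -Xmx option written as a number followed by m or g, as in the line above. The function name set_max_heap is made up for illustration. It reads a script on standard input and writes the modified script to standard output, leaving the original file untouched.

```shell
# Hypothetical helper: rewrite the -Xmx value in a launcher script.
# Usage: set_max_heap 1024m < run_cmdline.sh > run_cmdline.new.sh
set_max_heap() {
    # Only -Xmx is matched, so the -Xms (initial heap) setting is left alone.
    sed "s/-Xmx[0-9][0-9]*[mMgG]/-Xmx$1/"
}
```

Review the generated file before using it in place of the original script.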

Where can I get my questions answered?