Cool! We Have a New Collaboration Tool… Now What About Data Migration?
When a new tool is introduced into an organization with the intent of replacing an existing tool, there are many things to consider. Yet the process of moving the existing data out of the old tool and into the new one is rarely at the forefront of those considerations, and it is often written off as a trivial procedure that requires little effort or thought. Existing tools are usually well entrenched in an organization’s current workflow, which means there is usually a massive amount of data that must be migrated into the new tool. This data is usually of great importance, and an organization wants to be assured that the historical effort spent building it up will not be wasted. Requiring an organization to start with an empty data set after the switch to a new tool is usually not an option.
Recently, we at Avant were presented with a client who intended to retool their software documentation knowledge base in order to allow for better collaboration on their existing documentation and to move away from a static, unchanging documentation model. This client decided that Atlassian Confluence was their new tool of choice. The idea was that once the existing documentation was migrated into the client’s Confluence instance, all employees would be able to collaborate on it immediately, turning it into an ever-evolving knowledge base that reflected the current state of the software it documented. Any content that was obscure or incorrect could be corrected as soon as it was encountered. Previously, an employee had to log an issue and wait for a colleague with the know-how to modify the documentation within the Mad Cap Flare editor and then publish the updated version for the rest of the organization to view.
Moving to Atlassian’s Confluence offering was an easy choice for our client, but migrating the existing documentation into Confluence from Mad Cap Flare was proving to be extremely problematic. After some initial research on the import options available within Confluence, the client took it upon themselves to attempt the migration of approximately 350 tightly coupled pages via Mad Cap’s Microsoft Word export process. Confluence provides simple Word document import functionality on a page-by-page basis, but this “connector” was intended for text only, which meant that all screenshots and icons within the original documentation would be lost during the import. It quickly became obvious to the client that this process was going to be very time consuming and would involve a large amount of manual work: none of the images in the exported Word document were visible in the resulting Confluence page, all links were broken, and the pages generally didn’t look great. The whole point of moving to Confluence from Mad Cap Flare was to improve the documentation and to use the collaboration tools and functionality within Confluence to the maximum extent possible without having to rewrite the existing documentation. After failing to come up with a reasonable strategy to migrate their existing documentation from Mad Cap Flare into Atlassian Confluence, the client came to Avant for our expert help and advice.
We Definitely Need an Automated Solution… But Where Do We Start?
When we at Avant were presented with our client’s migration situation, we initially thought that surely somebody somewhere had come up against this problem in the past. Our initial analysis revealed an existing wiki conversion tool called the Universal Wiki Converter (UWC), which is developed and maintained by AppFusions to convert a variety of wiki formats from their original source markup into the markup Confluence expects. However, after a thorough analysis of the UWC tool’s existing features and an exhaustive search of the discussions within the Atlassian Developer Portal and various other message boards and blogs, we came to the conclusion that nobody had efficiently migrated a complete documentation set from Mad Cap Flare into Atlassian Confluence without an immense amount of manual intervention, quite possibly performing every step of the conversion by hand. Our client was not the first to come up against this challenge: there were numerous questions about how to perform such a migration on many of the popular message boards and blogs relating to wiki migration. None of these conversations yielded anything of substantial use. There appeared to be some Perl APIs available for simple HTML parsing, but nothing that would produce the feature-rich Confluence knowledge base that we at Avant wanted to deliver to our client. High quality is what our clients expect when working with Avant. Turning back to the UWC tool, further analysis revealed that it was in fact open source, which gave us the ability to acquire the source code and extend the tool to support the exact conversion that we required.
By no means was this a straightforward task, but after gaining a thorough understanding of the tool’s source code (written in Java) along with the developer documentation available on Atlassian’s developer portal, we were able to add Mad Cap Flare to the list of supported wiki conversions by developing a number of Java-based custom converter and builder extension classes. Extending the UWC allowed us to focus on the conversion itself instead of on interacting with the XML-RPC API exposed by Confluence, which was the most enticing reason for our decision to extend the UWC rather than write a new tool from scratch.
The exported documentation set provided to us by our client was in a proprietary XHTML format and came with all of the page-referenced resources, such as images, PDF files, and video files (.swf, .wmv, .mov), necessary to rebuild the documentation in Confluence. The export also contained a few critical XML files that we needed in order to fully understand the layout of the table of contents, the relationships between pages, and the glossary items that rounded out the Mad Cap Flare output.
We now had a base tool to work with and expand, as well as detailed knowledge of how the source documentation data was output and displayed by Mad Cap Flare. We were ready to determine what changes needed to be made to the UWC tool in order to reduce the manual effort as much as possible within the time available. We identified that the following tasks would need to be automated via the UWC tool in order for us to efficiently and effectively migrate the documentation into Confluence:
Page Referenced Resources & Content
The Mad Cap Flare documentation output was highly screenshot based and heavily referenced both internal and external resources. One of the main shortcomings we identified with the existing Word import process was that it offered no way to automate the migration of any image, PDF or video resources included or referenced within a page. We definitely did not want to manually identify, upload and embed each individual resource on every page within the documentation set, as this would have been extremely time consuming. Our automated solution needed to identify every resource referenced within a page and ensure that it was uploaded as an attachment when the UWC uploaded the migrated page content to Confluence. For image and video content, we needed to ensure that the resource would appear correctly inline within the page content. For PDF documents, we needed to ensure that each link pointed to the correct attachment uploaded with the page. The UWC tool had existing converters for other wiki types that already automated this process, but they didn’t quite work for the XHTML used within the Mad Cap documentation export, so a new converter class was created to account for the differences. The main difference we observed was that Mad Cap used scripting to control the display of images: a higher resolution image appeared in a popup as the user hovered over a thumbnail. In Confluence, an embedded image can be given a defined size, and the full-sized image is shown when the image is clicked. Since Confluence requires only one image, we needed to ensure that the high resolution image, rather than the thumbnail, was uploaded to Confluence and embedded within the page.
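As a rough illustration of the resource-identification step, the sketch below scans a page for locally referenced images, PDFs and videos and builds Confluence wiki markup for an embedded image and an attachment link. It is written in Python for brevity and is not the Java converter class described above; the function names, the extension list and the sample paths are assumptions made for this example.

```python
import re
from html.parser import HTMLParser
from pathlib import PurePosixPath

# File types treated as attachable resources (assumed list for this sketch).
RESOURCE_EXTENSIONS = {".png", ".jpg", ".gif", ".pdf", ".swf", ".wmv", ".mov"}


class ResourceCollector(HTMLParser):
    """Collects local files referenced through src or href attributes."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", value):
                continue  # skip external URLs such as http: or mailto:
            path = value.split("#")[0]  # drop any in-page anchor
            suffix = PurePosixPath(path).suffix.lower()
            if suffix in RESOURCE_EXTENSIONS and path not in self.resources:
                self.resources.append(path)


def find_resources(xhtml):
    """Return the local resource paths referenced by a page, in order."""
    collector = ResourceCollector()
    collector.feed(xhtml)
    return collector.resources


def image_markup(path, width):
    """Wiki markup embedding an attached image at a fixed display width."""
    return "!{}|width={}!".format(PurePosixPath(path).name, width)


def attachment_link(text, path):
    """Wiki markup linking to a file attached to the current page."""
    return "[{}^{}]".format(text, PurePosixPath(path).name)
```

Each path returned by `find_resources` would then be uploaded as an attachment of the page, with the high resolution image passed to `image_markup` in place of the thumbnail.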
Table of Contents & Page Hierarchy
We knew that we needed to maintain the exact layout of the Table of Contents during the migration process, but we didn’t anticipate the problem we would come up against in doing so. The UWC tool already supported several page hierarchy schemas that work well for a variety of supported wiki conversions, and at first glance we thought we would be able to leverage the existing FilePathHierarchy builder class. The documentation states that this builder class automatically creates parent-child page relationships using the directory tree structure on your file system. Upon review of the file system layout of the HTML files within the Mad Cap export provided to us, this builder class looked like it would do the trick. However, once we dug deeper into the file structure and compared it against the layout of the Mad Cap Table of Contents output, it became obvious that the file system structure did not reflect the parent-child page relationships at all. Manually modifying the file structure to reflect the intended relationships was not an option, as the resource path links within the HTML files would have been broken. We quickly realized that none of the existing Hierarchy Builder classes were going to work for our purposes and that a new class would have to be created for this Mad Cap migration process. This new hierarchy builder class would customize the order in which the UWC tool uploaded the converted pages to Confluence to ensure that the page hierarchy was maintained after the migration. It did so by referencing the XML file within the Mad Cap export data that described the hierarchy of the original documentation. On the initial upload to Confluence, pages within each hierarchical level were ordered alphabetically by default. Unfortunately, there didn’t appear to be any way to control this on the initial page upload.
To remedy this issue, we implemented a post-upload routine to ensure that the uploaded pages kept their expected order within their hierarchical level. The Mad Cap Table of Contents XML file also gave us the full name of each page as shown in the original Table of Contents, which we used to give each Confluence page a friendly and unique name. This was also extremely helpful in converting internal page hyperlinks into Confluence markup, since we could no longer rely on the file name and path specified within the original source output.
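The idea of reading hierarchy, ordering and page names from a table of contents file can be sketched as follows. This is a minimal Python illustration, not the Java hierarchy builder itself, and the XML layout it assumes (nested `TocEntry` elements with `Title` and `Link` attributes) is hypothetical; the actual structure of the Mad Cap export file may differ.

```python
import xml.etree.ElementTree as ET


def read_toc(xml_text):
    """Return (title, parent_title, position, link) tuples in TOC order.

    Parents always appear before their children, so the list can be
    walked top to bottom when creating pages.
    """
    root = ET.fromstring(xml_text)
    pages = []

    def walk(node, parent_title):
        # position is the entry's index among its siblings in the TOC,
        # which is what a post-upload reordering step would restore.
        for position, entry in enumerate(node.findall("TocEntry")):
            title = entry.get("Title")
            pages.append((title, parent_title, position, entry.get("Link")))
            walk(entry, title)

    walk(root, None)
    return pages


def link_titles(pages):
    """Map each source file path to its page title, for rewriting links."""
    return {link: title for title, _parent, _position, link in pages}
```

In this sketch, the `position` value preserves the table of contents order that alphabetical sorting would otherwise lose, and `link_titles` supports replacing a file path in an internal hyperlink with the corresponding page title.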
Linking Related Content Across Pages
One of the main features of any decent documentation set or knowledge base is the ability to link related pages so that the user can quickly navigate to them, either directly or via keyword searching. Mad Cap let users navigate to related pages through a “See Also” link that, when clicked, launched a script-generated popup listing all of the pages related to the current page. Clicking on one of the linked topics would direct the user to the related page. We wanted the related pages to be more obvious in the migrated documentation than they had been with the Mad Cap functionality. After some analysis of the macros built into Confluence that work with page labels, we felt that the Content by Label Macro fit the bill perfectly. This macro lists all of the pages within the Confluence space that have a defined set of labels attached to them. It would also have the added benefit of dynamically picking up any new pages added to the documentation in the future.
In order to accomplish this functionality, we created a new UWC Converter that would effectively analyze each html source page to extract the related concept information and automatically create page labels based on the concept relationships. These labels would then need to be fed into the Content by Label Macro and inserted at the bottom of each migrated page output in the expected Confluence style markup for the macro. Adding the extracted labels to the final converted Confluence page in addition to using them within the macro would allow for fully indexed searching of all of the resulting pages using the default search capabilities built into Confluence.
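A minimal Python sketch of the label step is shown below. It assumes, purely for illustration, that related concepts appear in the source page as `<MadCap:concept term="…" />` elements and that labels should be lower-case with hyphens in place of spaces; the exported XHTML may encode concepts differently, and the function names are invented for this example. The macro parameter name `labels` should be checked against the Confluence version in use.

```python
import re


def to_label(term):
    """Lower-case a concept term and replace other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def extract_labels(xhtml):
    """Return the unique labels for a page, in order of first appearance."""
    # Hypothetical concept markup; adjust the pattern to the real export.
    terms = re.findall(r'<MadCap:concept\s+term="([^"]*)"', xhtml)
    labels = []
    for term in terms:
        label = to_label(term)
        if label and label not in labels:
            labels.append(label)
    return labels


def related_pages_macro(labels):
    """Build Content by Label wiki markup to append to the converted page."""
    if not labels:
        return ""
    return "{contentbylabel:labels=" + ",".join(labels) + "}"
```

The same list returned by `extract_labels` would be attached to the page as labels and passed to `related_pages_macro`, so that the page is both searchable by label and listed on the pages that share its concepts.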
Expandable Content Containers
Many of the Mad Cap documentation source pages made heavy use of expandable containers. Within Mad Cap, these containers were always closed when the user first navigated to the page, which allowed a lot of content to be made available on each page while only a subset was displayed initially. We felt that this really reduced the clutter on each individual page, and we needed to maintain this behavior when migrating the pages into Confluence. After reviewing the available Confluence built-in macros, we quickly found the Expand Macro to be a perfect fit for this functionality.
Automating the migration of the Mad Cap drop-down containers into Confluence Expand Macros was fairly straightforward, since we could identify these containers through a series of regular expressions and then perform a find and replace on every instance using the UWC Java-RegEx class converter. Once we had written the regular expressions needed to match the entire container and capture the internal data that we wanted the Expand Macro to contain, the only remaining work was to add a line item to the configuration file for the UWC to call on during the page conversion process. No new converter class was required for this portion of the migration, although we could easily have created one if the macro had needed changes that a straightforward find-and-replace routine couldn’t accomplish.
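The find-and-replace approach can be illustrated with a short Python sketch. The source markup it matches (a `MCDropDown` container holding a `MCDropDownHotSpot` link and a `MCDropDownBody` block) is an assumed shape for this example rather than a confirmed description of the export, and the pattern is not the regular expression used in the UWC configuration.

```python
import re

# Matches one drop-down container and captures its title and body.
# The class names and nesting are assumptions made for this sketch.
DROPDOWN = re.compile(
    r'<div class="MCDropDown">\s*'
    r'<a class="MCDropDownHotSpot"[^>]*>(?P<title>.*?)</a>\s*'
    r'<div class="MCDropDownBody">(?P<body>.*?)</div>\s*'
    r'</div>',
    re.DOTALL,
)


def convert_dropdowns(xhtml):
    """Replace each drop-down container with Expand macro wiki markup."""

    def replace(match):
        title = match.group("title").strip()
        body = match.group("body").strip()
        return "{expand:" + title + "}\n" + body + "\n{expand}"

    return DROPDOWN.sub(replace, xhtml)
```

A pattern like this stops at the first closing `div` inside the body, so a container whose body holds nested `div` elements would need a dedicated converter rather than a single expression.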
Highlighted Information Panels
Among the more prominent visual indications of important information within the Mad Cap source documentation were various highlighted text paragraphs, and we needed to ensure that they would carry the same visual importance after migration into Confluence. These highlighted paragraphs were heavily used in the client’s documentation. The Panel Macro was the perfect fit, allowing us to match the visual element almost exactly (and in our opinion, better).
A new UWC Converter class was created to identify the various ways in which these highlighted information panels were created in the Mad Cap source documentation. We were able to distinguish the panel types by their differing paragraph class attributes in order to style them with the appropriate colors in the resulting migrated Confluence page output.
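Distinguishing panel types by paragraph class could look something like the Python sketch below. The class names (`note`, `warning`) and the colour values are invented for this example and are not taken from the client’s documentation; only the general shape of the Panel macro markup with `bgColor` and `borderColor` parameters is assumed from Confluence wiki markup.

```python
import re

# Hypothetical mapping from source paragraph class to panel colours.
PANEL_STYLES = {
    "note": {"bgColor": "#FFFFCE", "borderColor": "#F0C000"},
    "warning": {"bgColor": "#FFE7E7", "borderColor": "#CC0000"},
}

PARAGRAPH = re.compile(r'<p class="(?P<cls>[^"]+)">(?P<text>.*?)</p>', re.DOTALL)


def convert_panels(xhtml):
    """Wrap highlighted paragraphs in Panel macro markup, leaving others alone."""

    def replace(match):
        style = PANEL_STYLES.get(match.group("cls"))
        if style is None:
            return match.group(0)  # not a highlighted paragraph
        params = "|".join("{}={}".format(k, v) for k, v in style.items())
        return "{panel:" + params + "}\n" + match.group("text").strip() + "\n{panel}"

    return PARAGRAPH.sub(replace, xhtml)
```

Keeping the class-to-style mapping in one table makes it easy to add another panel type when a new paragraph class turns up in the source pages.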
The End Result Is What Matters The Most…
In the end, our enhancements to the UWC tool allowed us to get approximately 99% of the conversion done in an automated fashion, as we expected. Because of the unstructured nature of HTML, it was inevitable that some content would still need to be adjusted manually. We were able to automate the migration process from Mad Cap Flare to Atlassian Confluence very effectively within the budget afforded to us, keeping in mind the high quality that our clients expect when working with Avant. We could have spent much more time refining and tweaking the tool and its associated converters to get that much closer to 100%, but we couldn’t justify the time and cost of doing so. With that in mind, a thorough quality assurance pass over each of the migrated pages within Confluence, with manual tweaks where needed, gave us 100% confidence in the deliverable that we provided to our client and ensured that they would be as happy with the resulting migration as we were!
|AppFusions Universal Wiki Converter||https://migrations.atlassian.net/wiki/display/UWC/Universal+Wiki+Converter|
|UWC Source Code Repository||https://bitbucket.org/appfusions/universal-wiki-converter/|
|UWC Hierarchy Builder Framework||https://migrations.atlassian.net/wiki/display/UWC/UWC+Hierarchy+Builder+Framework|