Migrating technical documentation to new software or to a component content management system (CCMS) is no easy task. It involves many steps, different programming languages, manual rework, and patience. More often than not, you will have to redo parts of the migration while it is still in progress.
Migrations require planning and project management. Think of questions like:
What should we migrate first?
How can we combine the migration with our daily work?
How much rework vs. how much rewriting do we want?
How much work should we do internally? How much work should we outsource?
Unexpected problems will turn up, for example with images or internal hyperlinks.
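Broken internal links are one such problem that a simple rule can detect before and after each migration run. The following is a minimal sketch, assuming DocBook-style XML in which targets carry an `id` attribute and cross-references point to them with `linkend`. The function name `find_broken_links` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def find_broken_links(xml_text):
    """Return the linkend values that do not match any id in the document."""
    root = ET.fromstring(xml_text)
    # Collect every id that can serve as a link target (assumed attribute name).
    ids = {e.get("id") for e in root.iter() if e.get("id")}
    # Collect every cross-reference target (assumed attribute name).
    targets = {e.get("linkend") for e in root.iter() if e.get("linkend")}
    return sorted(targets - ids)

# Hypothetical sample: "config" is referenced but never defined.
sample = """
<book>
  <section id="install"><para>See <xref linkend="config"/>.</para></section>
  <section id="setup"><para>See <xref linkend="install"/>.</para></section>
</book>
"""
print(find_broken_links(sample))  # ['config']
```

The same pattern could be extended to image references by comparing the referenced file names against the files actually present in the export.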
Generative AI, meaning popular large language models (LLMs), cannot help you much here: stochastic models are not reliable enough and can hallucinate or produce gibberish. What you need are rules, not inference.
As long as your content is structured, AI is not necessary: traditional programming languages and platforms like Java, .NET, or Python are more suitable. If you are migrating across XML schemas (for example, DocBook to DITA), then XSLT is a good bet.
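To illustrate what "rules, not inference" can look like, here is a minimal sketch in Python that renames elements according to a fixed table. The mapping `TAG_MAP` and the function `convert` are hypothetical, the input is assumed to be DocBook without namespaces, and the output is not valid DITA (a real topic needs an `id`, a body wrapper, and many structural rules). A production migration across XML schemas would more likely express such rules in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DocBook element names to DITA element names.
# A real project needs a much longer table plus structural rules.
TAG_MAP = {
    "section": "topic",
    "title": "title",
    "para": "p",
    "itemizedlist": "ul",
    "listitem": "li",
    "emphasis": "i",
}

def convert(elem):
    """Return a copy of elem with every tag renamed according to TAG_MAP."""
    if elem.tag not in TAG_MAP:
        # Fail loudly instead of guessing: every element needs an explicit rule.
        raise ValueError(f"no rule for element <{elem.tag}>")
    new = ET.Element(TAG_MAP[elem.tag], dict(elem.attrib))
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        new.append(convert(child))
    return new

docbook = "<section><title>Setup</title><para>Install <emphasis>first</emphasis>.</para></section>"
dita = convert(ET.fromstring(docbook))
print(ET.tostring(dita, encoding="unicode"))
# <topic><title>Setup</title><p>Install <i>first</i>.</p></topic>
```

Because the conversion is deterministic, the same input always produces the same output, and an element without a rule stops the run instead of being silently mangled.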
Word is often a special case: even though .docx files are, in fact, ZIP packages of XML files (Office Open XML), conversions via XSLT can be difficult. Word-to-XML applications are available on the market.
Unstructured content, like PDFs or Word documents without styles, cannot be migrated in a straightforward way, because the structure has to be recovered first. This recovery can be automated to some degree thanks to computer vision (another form of “AI”). For more information, see Structured content from old documents.
Set up a pilot migration with a sizable number of documents. It is important that you estimate the amount of pre-work and rework as precisely as possible.
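One way to put numbers on the rework estimate is to inventory the pilot documents and count every element that your conversion rules do not cover yet. The sketch below assumes XML source documents. The function `unmapped_elements`, the set `known`, and the two pilot documents are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def unmapped_elements(documents, known_tags):
    """Count elements in the pilot set that no conversion rule covers."""
    counts = Counter()
    for xml_text in documents:
        for elem in ET.fromstring(xml_text).iter():
            if elem.tag not in known_tags:
                counts[elem.tag] += 1
    return counts

# Hypothetical pilot set and rule coverage.
pilot = [
    "<section><title>A</title><para>Text</para><table/></section>",
    "<section><para>Text <footnote>Note</footnote></para><table/></section>",
]
known = {"section", "title", "para"}

report = unmapped_elements(pilot, known)
print(report.most_common())  # [('table', 2), ('footnote', 1)]
```

The most frequent uncovered elements are the first candidates for new rules or for manual cleanup before the migration.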
After some trial and error, you might discover that you need to prepare and clean up your documents for the migration. Can this pre-work be automated, and how? Application-specific languages can be of help, for example Visual Basic for Applications (VBA) in Word or ExtendScript/JavaScript in FrameMaker.