How to convert from PDF to Word format

When PDF files are converted to Word format for translation, several things must be taken into account, depending on the result the client wants.

For an example of client-specific instructions, see the end of this document.

These instructions are intended as a general checklist of things to consider for the different levels of end result. As a general rule, avoid text boxes whenever possible.

General things to consider

  • The files must be in docx format.
  • Avoid section breaks if possible, or at least keep them to a minimum.
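For a quick check before delivery, the sketch below (Python standard library only) opens a converted file, confirms that it is a .docx package, and reports how many section breaks it contains and whether it contains any text boxes. It is a minimal sketch that assumes the main document part has the usual name word/document.xml; audit_docx and audit_document_xml are hypothetical helper names, not part of any existing tool.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Union

# WordprocessingML namespace in ElementTree's {uri}tag notation.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def audit_document_xml(document_xml: Union[str, bytes]) -> dict:
    """Count section breaks and detect text boxes in the main document XML."""
    root = ET.fromstring(document_xml)
    # A <w:sectPr> inside a paragraph's <w:pPr> marks a section break.
    # The final <w:sectPr> directly under <w:body> only holds the settings
    # of the last section, so it is not counted as a break.
    section_breaks = sum(
        1 for ppr in root.iter(W_NS + "pPr") if ppr.find(W_NS + "sectPr") is not None
    )
    # Text box content is stored in <w:txbxContent> (DrawingML and VML boxes).
    has_text_boxes = next(root.iter(W_NS + "txbxContent"), None) is not None
    return {"section_breaks": section_breaks, "has_text_boxes": has_text_boxes}


def audit_docx(path: str) -> dict:
    """Check that a file is a .docx package, then audit its main document part."""
    # A legacy binary .doc file is not a ZIP package and fails this check.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a .docx (Office Open XML) package")
    with zipfile.ZipFile(path) as package:
        if "word/document.xml" not in package.namelist():
            raise ValueError(f"{path} has no word/document.xml part")
        xml_bytes = package.read("word/document.xml")
    return audit_document_xml(xml_bytes)
```

A result with section_breaks above zero, or with has_text_boxes set to True, is a signal to revisit the layout before delivery.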

Different levels of conversion

The client can ask for at least three different levels of conversion:

  • Only the text should be extracted; the layout is not important (level 1)
  • Everything should be extracted, but the layout need not match the original exactly (level 2)
  • The extracted file should look like the original (level 3)

Conversion for level 1

This is the easiest level. Since the text can be extracted without regard to layout or formatting, the main tasks are to remove unnecessary line breaks and to make sure the text flows in a logical order.
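Removing unnecessary line breaks can be partly automated. The sketch below assumes that the extracted text marks paragraph breaks with blank lines and that every other line break is a hard wrap left over from the PDF; it joins the wrapped lines of each paragraph with spaces. The function name unwrap_pdf_text is hypothetical. Words hyphenated at a line end are deliberately not rejoined, since a script cannot tell a wrap hyphen from a real one, and checking that the text flow makes sense remains a manual step.

```python
import re


def unwrap_pdf_text(text: str) -> str:
    """Join hard line breaks inside paragraphs; keep blank lines as paragraph breaks."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Strip each wrapped line and join the non-empty ones with a single space.
        # Hyphenated words split across lines are left as they are.
        lines = [line.strip() for line in para.splitlines()]
        cleaned.append(" ".join(line for line in lines if line))
    return "\n\n".join(cleaned)
```

For example, unwrap_pdf_text("This is a\nsentence that\nwraps.\n\nNew paragraph.") returns "This is a sentence that wraps.\n\nNew paragraph.".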

Conversion for level 2

At this level the extracted file should be closer to the original. All content, including graphics, should be extracted, but the layout does not need to match the original exactly as long as the text flow is similar and readable. The original number of pages should also be kept.

Examples of things that can be skipped or simplified:

  • Columns
    Multi-column layout can be removed
  • Flow charts
    Flow charts can be represented by, for example, a table
  • Drawings
    These usually come out as gibberish and are difficult to handle. Leave them as pictures; if translation is needed, it can be done in a separate file.

Conversion for level 3

The extracted file should look like the original. How this is achieved can vary, but since the file will be translated, it should be easy to work with.

When recreating the file in Word (unless instructed to use another program), keep the following recommendations in mind:

  • Preferably remove all section breaks and use Page Setup to apply the same settings to all pages
  • Use headers and footers
  • Use paragraph styles to control the general formatting of the text
  • If there is a table of contents, make it an automatic one
  • Set up tables as described under "Extracting tables" below
  • Do not leave text in text boxes unless absolutely necessary
  • Do not use tables to create columns
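Whether a table of contents is automatic can be checked in the file itself: an automatic table of contents is stored as a TOC field, while a typed one is ordinary text. The sketch below looks for a TOC field instruction in the document XML (word/document.xml inside the .docx package). It is a minimal sketch with a hypothetical function name, has_automatic_toc; it does not handle the uncommon case where Word splits the field instruction across several runs.

```python
import xml.etree.ElementTree as ET
from typing import Union

# WordprocessingML namespace in ElementTree's {uri}tag notation.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def has_automatic_toc(document_xml: Union[str, bytes]) -> bool:
    """Return True if the document XML contains a TOC field."""
    root = ET.fromstring(document_xml)
    # Complex fields keep their instruction (for example ' TOC \o "1-3" \h ')
    # in <w:instrText> elements.
    for instr in root.iter(W_NS + "instrText"):
        if (instr.text or "").strip().upper().startswith("TOC"):
            return True
    # Simple fields keep the instruction in the w:instr attribute of <w:fldSimple>.
    for field in root.iter(W_NS + "fldSimple"):
        if field.get(W_NS + "instr", "").strip().upper().startswith("TOC"):
            return True
    return False
```

A typed table of contents returns False, which means it should be replaced with an automatic one generated from the heading styles.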

Extracting tables

Depending on their contents, tables can be handled in two different ways.

  • Most cells contain text to be translated
    Extract the table as is, with all text included
  • A heading row to be translated, with the remaining rows containing numbers
    (or other text that does not need translation)
    Extract the heading row as text and insert the rest of the table as a picture
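The choice between the two approaches can be expressed as a simple rule. The sketch below assumes the table is available as rows of cell strings with the heading row first, and treats a cell as needing translation if it contains any letter. That heuristic is an assumption, not part of these instructions: units such as "mm" or part numbers would count as text. The function and constant names are hypothetical.

```python
from typing import List

WHOLE_TABLE = "extract the whole table as text"
HEADING_ONLY = "extract the heading row as text, insert the rest as a picture"


def needs_translation(cell: str) -> bool:
    # Heuristic (assumption): any letter means the cell may need translation.
    return any(ch.isalpha() for ch in cell)


def table_strategy(rows: List[List[str]]) -> str:
    """Choose how to extract a table whose first row is the heading row."""
    body = rows[1:]
    if any(needs_translation(cell) for row in body for cell in row):
        # Translatable text below the heading: keep everything editable.
        return WHOLE_TABLE
    # Only numbers (or similar) below the heading: a picture is enough.
    return HEADING_ONLY
```

The rule is deliberately conservative: if any cell below the heading contains text, the whole table stays editable so that nothing translatable ends up inside a picture.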

Example of client-specific instructions

  • Target language will be German.
  • Declaration of incorporation, certificate and declaration of conformity do not need translation and can be added as images in the Word files.
  • Do not create text boxes, except for captions/image texts.
  • Drawings do not need translation, but captions need to be editable.
  • Please only deliver Word files, not PDFs.
  • Double spaces and non-breaking spaces should be replaced by single, ordinary spaces in body text.
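The last instruction is easy to apply to extracted text with a regular expression. The sketch below replaces every run of ordinary spaces and no-break spaces with a single ordinary space. Including the narrow no-break space (U+202F) alongside the standard one (U+00A0) is an assumption about what the client counts as a non-breaking space. The function name normalize_spaces is hypothetical; it works on plain strings, leaves tabs and line breaks untouched, and restricting it to body text is up to the person doing the conversion.

```python
import re

# Ordinary space, no-break space (U+00A0) and narrow no-break space (U+202F).
SPACE_RUN = re.compile("[ \u00a0\u202f]+")


def normalize_spaces(text: str) -> str:
    """Collapse runs of spaces and non-breaking spaces into one ordinary space."""
    return SPACE_RUN.sub(" ", text)
```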

Read more in the Translators’ forum.