PDF converter to other document types?

While investigating PDF document manipulation, it crossed our minds that we could create a desktop and an online PDF converter.

The goal is a tool that scans your PDF documents and converts them into other (editable) file formats such as Word, Excel, rich text, HTML and so on.

Yes, there are other similar tools, but the ReplaceMagic desktop edition will let you run mass conversions in one go.
The concept will be the same as with ReplaceMagic.Total: you choose the folder(s)/drive(s) where your documents are and then decide which file format to export your PDFs to.
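The folder-based batch concept can be pictured as a planning step that collects every PDF under the chosen folders and pairs it with an output path. This is a minimal Python sketch under our own assumptions: `plan_batch_conversion` is a hypothetical helper, outputs are assumed to go next to their source files, and the conversion itself is left out because it depends on the converter.

```python
from pathlib import Path


def plan_batch_conversion(roots, target_ext):
    """Pair every PDF under the given folders with an output path.

    Hypothetical helper: `roots` are the folders/drives the user picked and
    `target_ext` is the chosen export format (e.g. "docx"). Only the plan is
    built here; the actual PDF conversion is not shown.
    """
    ext = target_ext if target_ext.startswith(".") else "." + target_ext
    jobs = []
    for root in roots:
        # rglob("*") walks the folder tree recursively
        for src in sorted(Path(root).rglob("*")):
            # Match .pdf case-insensitively (scanned archives often use .PDF)
            if src.is_file() and src.suffix.lower() == ".pdf":
                # Assumption: write the result next to the source file
                jobs.append((src, src.with_suffix(ext)))
    return jobs
```

A plan like this also makes it easy to show the user how many files will be converted before any work starts.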

Of course, we need to keep the same formatting as in your PDF documents.

Besides the desktop edition, we will also offer an online application with a simple layout: you upload your document, we convert it to the chosen file format and send it back to you by email.

The online version will not have the mass-conversion options of the desktop edition.

What do you think, does this make sense?
What else would you like to have (document signing, an approval process, team collaboration and document sharing options...)?

Btw, the reason to create something like this is that we also receive tons of PDF documents, and making changes to them is difficult (you either need another similar tool or have to retype the text). With ReplaceMagic.PDFConverter you will be able to export the text, make changes and then convert it back to PDF.

Approaches to processing documents


To process Office documents (for example, to make replacements in them), we can use different approaches:


  • Office Automation

  • Open XML (newer Office formats only)

  • ReplaceMagic.Total
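To make the Open XML option concrete: a .docx file is a ZIP package whose main text lives in `word/document.xml`, so a replacement can be done without Office by rewriting that part. The sketch below uses only the Python standard library rather than the Open XML SDK itself, and `replace_in_docx` is a hypothetical helper. It is deliberately naive: it only touches text inside `<w:t>` elements of the main body (not headers, footers or comments), and it misses phrases that Word has split across several runs.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# Matches <w:t>...</w:t> and <w:t xml:space="preserve">...</w:t>, but not <w:tab/>
_TEXT_ELEMENT = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def replace_in_docx(data: bytes, old: str, new: str) -> bytes:
    """Return a copy of a .docx package with `old` replaced by `new` in body text."""
    # Text inside the XML is escaped (& -> &amp;), so compare escaped forms
    old_x, new_x = escape(old), escape(new)

    def fix_text(m):
        # Only the text content changes; the surrounding tags are kept as-is
        return m.group(1) + m.group(2).replace(old_x, new_x) + m.group(3)

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = payload.decode("utf-8")
                payload = _TEXT_ELEMENT.sub(fix_text, xml).encode("utf-8")
            # Reusing the original ZipInfo keeps part names and their order
            dst.writestr(item, payload)
    return out.getvalue()
```

Even this small example shows why "only an SDK" is a lot of work: handling split runs, other parts and older binary formats is left entirely to the developer.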


The disadvantage of Office Automation is that it was never meant for server usage. The reasons include security, stability, performance, scalability and price.


Microsoft quote about security:

"Office Applications were never intended for use server-side, and therefore do not take into consideration the security problems that are faced by distributed components. Office does not authenticate incoming requests, and does not protect you from unintentionally running macros, or starting another server that might run macros, from your server-side code. Do not open files that are uploaded to the server from an anonymous Web! Based on the security settings that were last set, the server can run macros under an Administrator or System context with full privileges and compromise your network! In addition, Office uses many client-side components (such as Simple MAPI, WinInet, and MSDAIPP) that can cache client authentication information in order to speed up processing. If Office is being automated server-side, one instance may service more than one client, and because authentication information has been cached for that session, it is possible that one client can use the cached credentials of another client, and thereby gain non-granted access permissions by impersonating other users."


Microsoft quote about stability:


"Office 2000, Office XP, and Office 2003 use Microsoft Windows Installer (MSI) technology to make installation and self-repair easier for an end user. MSI introduces the concept of "install on first use", which allows features to be dynamically installed or configured at runtime (for the system, or more often for a particular user). In a server-side environment this both slows down performance and increases the likelihood that a dialog box may appear that asks for the user to approve the install or provide an appropriate install disk. Although it is designed to increase the resiliency of Office as an end-user product, Office's implementation of MSI capabilities is counterproductive in a server-side environment. Furthermore, the stability of Office in general cannot be assured when run server-side because it has not been designed or tested for this type of use. Using Office as a service component on a network server may reduce the stability of that machine and as a consequence your network as a whole. If you plan to automate Office server-side, attempt to isolate the program to a dedicated computer that cannot affect critical functions, and that can be restarted as needed."


Microsoft quote about stability and performance:

"Server-side components need to be highly reentrant, multi-threaded COM components with minimum overhead and high throughput for multiple clients. Office Applications are in almost all respects the exact opposite. They are non-reentrant, STA-based Automation servers that are designed to provide diverse but resource-intensive functionality for a single client. They offer little scalability as a server-side solution, and have fixed limits to important elements, such as memory, which cannot be changed through configuration. More importantly, they use global resources (such as memory mapped files, global add-ins or templates, and shared Automation servers), which can limit the number of instances that can run concurrently and lead to race conditions if they are configured in a multi-client environment. Developers who plan to run more than one instance of any Office Application at the same time need to consider "pooling" or serializing access to the Office Application to avoid potential deadlocks or data corruption."
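The "serializing access" advice can be illustrated with a generic pattern: route every call to the non-reentrant component through one dedicated worker thread. This is a minimal Python sketch under our own assumptions; `SerializedResource` is hypothetical and `fn` stands in for the real work, such as driving a single automated Office instance. A single-worker `ThreadPoolExecutor` runs jobs one at a time and always on the same thread, which is also what apartment-threaded (STA) COM objects expect.

```python
from concurrent.futures import ThreadPoolExecutor


class SerializedResource:
    """Run every call to a non-reentrant resource on one dedicated thread.

    `fn` is a hypothetical stand-in for the non-reentrant work; callers on
    any thread are queued and served strictly one at a time.
    """

    def __init__(self, fn):
        self._fn = fn
        # max_workers=1: jobs never overlap and always run on the same thread
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, *args):
        # Block the caller until its queued job has been executed
        return self._executor.submit(self._fn, *args).result()

    def close(self):
        # Wait for queued jobs to finish, then stop the worker thread
        self._executor.shutdown(wait=True)
```

The executor's internal queue provides the ordering, so callers need no explicit lock. "Pooling" would be the same idea with several such workers, each owning its own instance.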


On the other hand, price also has an impact: if an application uses Office Automation, Microsoft Office must be installed on every computer where the application is installed. With ReplaceMagic.Total, which we plan to release during 2015, you will not need to install Office at all. This makes ReplaceMagic a cost-effective way to make changes in your Office, PDF, email and OneNote documents.


The Open XML SDK works only with the newer Office document formats (for example, docx, xlsx, …); you cannot process older document types such as doc or xls.
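One practical consequence is that a tool has to tell the two generations apart before deciding how to open a file. A minimal Python sketch, assuming detection by content rather than by file extension: the legacy binary formats (doc, xls, ppt) are Compound File Binary files that start with the 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`, while Open XML files are ZIP packages containing a `[Content_Types].xml` part. `office_container` is a hypothetical helper name.

```python
import zipfile

# Compound File Binary signature used by legacy Office files (doc, xls, ppt, ...)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def office_container(path):
    """Classify a file as 'legacy-binary', 'ooxml' or 'other' from its contents."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == OLE2_MAGIC:
        # Pre-2007 binary container: the Open XML SDK cannot read these
        return "legacy-binary"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            # Every Open XML package has [Content_Types].xml at its root;
            # a plain ZIP archive does not
            if "[Content_Types].xml" in z.namelist():
                return "ooxml"
    return "other"
```

Checking content instead of the extension also catches files that were renamed, which is common in large document migrations.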


Of course, the biggest issue with both Office Automation and the Open XML SDK is that you only get an SDK and nothing more. ReplaceMagic.Total will offer a full application environment where you do not need to program anything. You can think of ReplaceMagic as a graphical interface between you and the changes in all your documents, with no programming required.


Btw, we published ReplaceMagic in 2007, based on Office Automation. It works very well and ReplaceMagic has been used in many document migrations, but now it is time to go to the next level, so we will introduce a new version of ReplaceMagic that will be thread-enabled and will not depend on Office Automation or Open XML.


What are the future plans for ReplaceMagic?

Well, from time to time we get questions from customers and prospective customers about what we plan for ReplaceMagic, how development will proceed and whether we have a roadmap.

In general, the most important thing for us is to support the document types of each new Office version as soon as it is published.

In the long term we plan to move away from Office Automation. It works fine and makes development easier: the document model is well supported, and we can support a new Office version without too much effort. But, as Microsoft itself says, this is not the best way to work with Office documents, so our long-term goal is to move away from it.

To make a long story short: we are currently building a new version of ReplaceMagic that will not need Office applications installed on the computer where ReplaceMagic runs.

Second, performance: we think ReplaceMagic is fast, but not fast enough. Comparing the time needed to make replacements manually with the time ReplaceMagic needs is like comparing someone on a bicycle with someone driving a fast car :).

Despite this speed-up, this is an area where we expect further improvements, and they will come from multi-threading.

Our initial tests with multiple threads (10 threads in our test) and without Office showed that the new ReplaceMagic will run several times faster. With our test set (~12,000 documents), we were able to open every document and check it for hyperlinks in less than 5 minutes.
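A test of that kind (10 worker threads, documents opened without Office, checked for hyperlinks) can be sketched as follows. This is not ReplaceMagic's code; it is a Python illustration under our own assumptions that handles only .docx files and only the main document part. In that format, external hyperlinks are stored as relationships of type `.../relationships/hyperlink` in `word/_rels/document.xml.rels`, so no Office instance or rendering is needed. `hyperlink_targets` and `scan` are hypothetical names.

```python
import xml.etree.ElementTree as ET
import zipfile
from concurrent.futures import ThreadPoolExecutor

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
HYPERLINK_TYPE = ("http://schemas.openxmlformats.org/officeDocument/"
                  "2006/relationships/hyperlink")


def hyperlink_targets(path):
    """Return the hyperlink targets of a .docx file's main document part."""
    try:
        with zipfile.ZipFile(path) as z:
            rels = ET.fromstring(z.read("word/_rels/document.xml.rels"))
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        # Not a ZIP package, no relationships part, or malformed XML
        return []
    return [rel.get("Target") for rel in rels.iter(RELS_NS + "Relationship")
            if rel.get("Type") == HYPERLINK_TYPE]


def scan(paths, workers=10):
    """Check many documents concurrently; returns {path: [targets]}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result
        return dict(zip(paths, pool.map(hyperlink_targets, paths)))
```

In CPython, threads mainly help by overlapping file I/O, while XML parsing still holds the GIL; `ProcessPoolExecutor` is a drop-in alternative if parsing dominates.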

The next big point will be extending the supported document types. Currently ReplaceMagic supports replacements in Microsoft Word, PowerPoint, Excel, Project and Visio documents and in Windows shortcuts. In the future we will also support replacements in PDF documents, OneNote and possibly emails (eml and Outlook files).

Almost forgot: the new version is a big jump, so we will also change the ReplaceMagic design. It currently looks like this:

We cannot estimate when this will be finished, as we are talking about a completely rewritten application, but we hope it will happen during 2015.

Stay tuned...