I have been using the Poppler library for some time, across several projects. It’s an open source set of libraries and command line tools, very useful for dealing with PDF files. Poppler is targeted primarily at the Linux environment, but the developers have included Windows support in the source code as well. Getting the executables (.exe) and/or DLLs for the latest version, however, is very difficult on Windows. So after years of pain, I jumped on oDesk and contracted Ilya Kitaev both to compile it with Microsoft Visual Studio and to prepare automated tools for easy compiling in the future. Update: MSVC isn’t very well supported, so these days the download is built with MinGW.
So now, you can run the following utilities from Windows!
- PDFToText (pdftotext.exe) – Extracts all the text from a PDF document. I suggest you use the -layout option (lowercase; the options are case-sensitive) to get the content in the right order.
- PDFToHTML (pdftohtml.exe) – Converts a PDF to HTML. I use it with the -xml option to get an XML file listing every text segment’s text, position and size, which is very handy for processing in C#.
- PDFToCairo (pdftocairo.exe) – Exports to image formats, including SVG!
- Many more smaller utilities
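As a rough sketch of how the first two utilities can be scripted, here is a small Python wrapper (Python rather than C#, purely for illustration). It assumes pdftotext and pdftohtml are on your PATH, that your build of pdftohtml supports the -stdout and -i options, and that the -xml output uses page and text elements with top, left, width and height attributes, which is what I see in the builds I have used. The function names are my own and not part of Poppler.

```python
import subprocess
import xml.etree.ElementTree as ET


def pdftotext_cmd(pdf_path):
    # "-layout" keeps the physical layout; "-" sends the text to stdout.
    return ["pdftotext", "-layout", pdf_path, "-"]


def pdftohtml_xml_cmd(pdf_path):
    # "-xml" switches to XML output, "-i" skips images,
    # "-stdout" is assumed to write the XML to stdout.
    return ["pdftohtml", "-xml", "-i", "-stdout", pdf_path]


def run(cmd):
    # Run a Poppler tool and return its raw stdout bytes.
    # Raises CalledProcessError if the tool exits with an error.
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def parse_segments(xml_data):
    # Turn pdftohtml -xml output into a list of dicts, one per text segment.
    # Assumes the position attributes are integers.
    root = ET.fromstring(xml_data)
    segments = []
    for page in root.iter("page"):
        for node in page.iter("text"):
            segments.append({
                "page": int(page.get("number")),
                "top": int(node.get("top")),
                "left": int(node.get("left")),
                "width": int(node.get("width")),
                "height": int(node.get("height")),
                # itertext() also picks up text inside nested <b>/<i> tags.
                "text": "".join(node.itertext()),
            })
    return segments
```

With a real file you would call something like parse_segments(run(pdftohtml_xml_cmd("input.pdf"))) for the positioned segments, or run(pdftotext_cmd("input.pdf")).decode("utf-8", errors="replace") for the plain text; "input.pdf" is just a placeholder name.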
Latest binary: poppler-0.51_x86.7z