The process described in this documentation reflects the way the team at Five Simple Steps worked, so it's best suited to people working on Macs, and it references the tools they used. Bear in mind that you'll need access to a variety of devices, at least an iPad and a Kindle, to test and view your ebook. All the information that you need to change in the template files appears in uppercase. You will need an edited and finalised manuscript, complete with all reference URLs.

Books in digital format, also known as e-books, can be read on devices lacking the power and screen space to afford a regular Web browser. Several publishers, not to mention projects such as Project Gutenberg, have provided thousands of new and classic titles in digital format.

The problem is that both the hardware, be it generic PDAs or dedicated devices, and the whole e-book publishing industry are much more fragmented than PCs and Web browsers. Therefore, the e-book you recently bought probably will not be readable ten years from now, or even tomorrow, should you decide to use a laptop or change PDAs.

If you go this route for conversion, you are able to do it today, and because it's an open format, 20 years from now too. Doc, recognizable by the extensions. This standard has spun off several variants, including MobiPocket, which adds embedded HTML markup tags to the basic format.

Each Palm e-book is divided into three sections: the header, a series of text records and a series of bookmark records. Normally, the header is 16 bytes wide.

Some Doc readers may extend the width at run time to hold additional custom information. By default, the header contains data such as the total length of the uncompressed text, the position currently viewed in the document and an array of two-byte unsigned integers giving the uncompressed size of each text record. Usually, the maximum size for this kind of record is 4,096 bytes, and each record is compressed individually. The bookmark records are composed of a 16-byte name and a 4-byte offset from the beginning of the text.
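The layout described above can be sketched in Python with the standard `struct` module. This is a minimal sketch, not a full Doc reader: it assumes the commonly documented 16-byte big-endian header and a bookmark record made of a 16-byte name plus a 4-byte offset. The field names, and the `DocHeader` and `Bookmark` types, are illustrative choices rather than anything defined by the format itself, and reader-specific header extensions are ignored.

```python
import struct
from typing import NamedTuple

# Assumed layout of the fixed 16-byte Doc header, big-endian.
HEADER_FORMAT = ">HHIHHI"
# Assumed bookmark record: 16-byte name, 4-byte offset into the text.
BOOKMARK_FORMAT = ">16sI"


class DocHeader(NamedTuple):
    compression: int   # commonly 1 = uncompressed, 2 = compressed
    reserved: int      # unused padding
    text_length: int   # total length of the uncompressed text
    record_count: int  # number of text records
    record_size: int   # maximum uncompressed record size, usually 4096
    position: int      # position currently viewed in the document


class Bookmark(NamedTuple):
    name: str
    offset: int


def parse_header(data: bytes) -> DocHeader:
    """Decode the fixed 16-byte part of the header."""
    size = struct.calcsize(HEADER_FORMAT)
    return DocHeader(*struct.unpack(HEADER_FORMAT, data[:size]))


def parse_bookmark(data: bytes) -> Bookmark:
    """Decode one 20-byte bookmark record."""
    size = struct.calcsize(BOOKMARK_FORMAT)
    raw_name, offset = struct.unpack(BOOKMARK_FORMAT, data[:size])
    # The name is padded with NUL bytes up to 16 bytes.
    name = raw_name.split(b"\x00", 1)[0].decode("latin-1")
    return Bookmark(name, offset)
```

Extracting the individual records from the surrounding Palm database file is a separate step that this sketch does not cover.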

Because bookmarks are optional, many Doc e-books don't contain them, and most Doc readers support alternative (that is, non-portable) methods to specify them. Other reader-specific extensions might include category, version numbers and links between e-books. Almost always, this information is stored outside the. Therefore, you should not expect to preserve this kind of data when converting your e-books.

Pyrite Publisher, formerly Doc Toolkit, is a set of content conversion tools for the Palm platform. Currently, only some text formats can be converted, but functionality can be extended to support new ones by way of Python plugins. Pyrite Publisher can download the documents to convert directly from the Web; it also can set bookmarks directly in the output database.

The package, which requires Python 2. The software is available for Linux and Windows in both source and binary format. Pyrite installed and ran flawlessly on Fedora Core 2. The name of the executable is pyrpub. The exact command for converting. Pyrite can be enough if all you want to do is quickly index a digital library. On the other hand, it is almost trivial to reformat the result to make it more readable in a browser. The snippet of Perl code in Listing 1, albeit ugly, was all it took to produce the version of Don Quixote shown in Figure 1.

The result then is printed to standard output and properly formatted as basic HTML. I have tried to do this with the 1.
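As an illustration of this kind of reformatting, here is a rough Python sketch. It is not the Perl code from Listing 1. It assumes the extracted text separates paragraphs with blank lines, and the function name `text_to_html` and the `title` parameter are made up for the example.

```python
import html
import re


def text_to_html(text: str, title: str = "Untitled") -> str:
    """Wrap blank-line-separated paragraphs of plain text in basic HTML."""
    # Split on one or more blank lines and drop empty chunks.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Join hard-wrapped lines and escape characters special to HTML.
    body = "\n".join(
        "<p>{}</p>".format(html.escape(" ".join(p.split())))
        for p in paragraphs
    )
    return (
        "<html>\n<head><title>{}</title></head>\n"
        "<body>\n{}\n</body>\n</html>\n"
    ).format(html.escape(title), body)
```

To use it as a filter, one could read the extracted text from standard input and print the result, for example with `sys.stdout.write(text_to_html(sys.stdin.read()))`.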

Hopefully, this functionality will be added to future versions. Pyrite Publisher is designed mainly to go from normal HTML or text files to the Palm platform, not the other way around.

The procedure discussed above is not really scalable to scenarios such as converting a great quantity of Palm e-books to customized HTML, with hyperlinks and metadata included. The P5-Palm set of modules includes classes for reading, processing and writing the.

RocketBook e-books have several interesting characteristics, including support for compressed HTML files and indexes containing a summary of paragraph formatting and the position of the anchor names. These and many more details on. Rocket Ebook and Mobipocket files can be disassembled with a set of command-line tools called Rbmake. Its home page offers source code, binary packages, a mailing list and contact information to report bugs.

To use rbmake, you need libxml2, version 2. To compile from source, at least on Fedora Core 2, it also is necessary to install the pcre-devel package separately. A nice feature of Rbmake is that the source code is structured in a modular manner. An entire library of object-oriented C routines can be compiled and linked independently from the rest of the package from any other program dealing with.

In this way, should you want to write your own super-customized Rocket Ebook converter or simply index all of your e-books into a database, you would need to use only the piece that actually knows how to read and write the. This chunk of code opens the file, returns a list of the sections composing the book and uncompresses on the fly only the ones actually required by the main program.

Should you need them, the library also includes functions to match and replace parts of the content through Perl-compatible regular expressions. Exhaustive HTML documentation also is included in the source tarball. The binary file able to generate HTML files is called rbburst. It extracts all the components (text, images and an info file) present in the original. Figure 2 shows, in two separate Mozilla windows, the cover page and the table of contents of the file generated by rbburst when run on The Invisible Man by H. G. Wells.

Microsoft's Reader files, recognizable by the. They also support keyword searching and hyperlinks, but they are locked in to one reader platform. The tool for converting these files is called, simply, Convert Lit. Running the program with the -help option lists, according to UNIX tradition, all the available command-line options.

This program has three modes of operation: explosion, downconversion and inscribing. Explosion is the one needed to convert an existing. Downconversion is the opposite process; it generates a.

Inscribing is when the downconversion attaches a user-defined label to the. The exact syntax is explained on the program's home page (see Resources). Here is the complete list for the example above: Contents. It is an XML container describing the structure and several portions of the original book's metadata. The extension OPF stands for open electronic book package format. The OPF file contains references to the other pieces of the e-book, as well as descriptions of their attributes.

To have a clearer idea of its role, a short excerpt of MidSummerNightDream. The practical consequence of this is that Convert Lit could be useful even if you wanted to leave all of your collection in a proprietary format. You still could run the program on all your. Then, any quick script or full-blown XML parsing utility could scan them and index everything into the database of your choice. And if you have Microsoft Reader e-books, you likely have a Microsoft Windows system and a licensed copy of Microsoft Reader.
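A quick script of that kind might look like the following Python sketch, which pulls the title, the creator and the list of referenced files out of an OPF document. The `SAMPLE_OPF` text is invented for illustration and is not the actual output of Convert Lit; the element names assume the usual OPF structure of Dublin Core metadata plus a manifest of `item` elements, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration only, not real Convert Lit output.
SAMPLE_OPF = """<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
      <dc:Title>A Midsummer Night's Dream</dc:Title>
      <dc:Creator>William Shakespeare</dc:Creator>
    </dc-metadata>
  </metadata>
  <manifest>
    <item id="content" href="content.html" media-type="text/x-oeb1-document"/>
  </manifest>
</package>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace, so '{uri}Title' becomes 'Title'."""
    return tag.rsplit("}", 1)[-1]


def summarize_opf(opf_text: str) -> dict:
    """Collect title, creator and manifest items from an OPF document."""
    root = ET.fromstring(opf_text)
    summary = {"title": None, "creator": None, "items": []}
    for element in root.iter():
        name = local_name(element.tag).lower()
        if name in ("title", "creator") and summary[name] is None:
            # Keep only the first title and creator found.
            summary[name] = (element.text or "").strip()
        elif name == "item":
            summary["items"].append(
                (element.get("href"), element.get("media-type"))
            )
    return summary
```

The resulting dictionary could then be written to whatever database or index is in use; matching on local names keeps the sketch tolerant of differing namespace prefixes.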

In general, we have discussed only command-line processing in this article. If, however, you have a whole collection of e-books in different formats, you can convert them all at one time with a simple shell script.
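The batch idea can also be sketched in Python rather than shell. This is only a sketch under stated assumptions: the `converters` mapping from file extension to command prefix is supplied by the caller, the function names are hypothetical, and the real command names and arguments for each tool must be taken from that tool's own documentation. The default is a dry run that only prints what would be executed.

```python
import subprocess
from pathlib import Path


def plan_conversions(directory, converters):
    """Return one argument list per e-book whose extension has a converter.

    `converters` maps a lowercase extension such as ".lit" to the command
    prefix to run; the file path is appended as the last argument.
    """
    jobs = []
    for path in sorted(Path(directory).iterdir()):
        prefix = converters.get(path.suffix.lower())
        if path.is_file() and prefix:
            jobs.append([*prefix, str(path)])
    return jobs


def run_conversions(jobs, dry_run=True):
    """Print each planned command, or execute it when dry_run is False."""
    for argv in jobs:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the batch at the first failing conversion.
            subprocess.run(argv, check=True)
```

For example, `run_conversions(plan_conversions("ebooks", {".lit": ["echo", "would convert"]}))` prints one line per `.lit` file in a directory named `ebooks`; both the directory name and the `echo` placeholder are stand-ins to be replaced with real values.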

You can add one or two lines to the loop to index with glimpse or ht://Dig, print everything as a single PostScript book and much more. A solution for putting e-books, at least the ones you will be able to get in the near future, into an open format is in the works. Its goal is to provide an XML-based specification, based on existing open standards, for providing content to multiple e-book platforms.

OEBPS, which has reached version 1. Time will tell what will come from this. Unicode is a family of encodings that enables computers to handle without ambiguity tens of thousands of characters. OEBPS has the potential to preserve all your e-books and make sure that the ones you download or buy will not vanish if any hardware or software company goes the way of the dodo.

Marco Fioretti is a hardware systems engineer interested in free software both as an EDA platform and, as the current leader of the RULE Project, as an efficient desktop. Marco lives with his family in Rome, Italy.

Figure 2. Rbmake extracts all the components of a RocketBook file, including text and images.

EPUB is an e-book file format that uses the ".epub" file extension. The term is short for electronic publication and is sometimes styled ePub. EPUB is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers. The Book Industry Study Group endorses EPUB 3 as the format of choice for packaging content and has stated that the global book publishing industry should rally around a single standard.

