ePub Overview [E-book Standard]

2009. 9. 3. 19:39

3 September 2009 19:19:49

EPUB

[Source] MobileRead

ePub is an open format defined by the Open eBook Forum of the International Digital Publishing Forum (IDPF).

ePub is based on XHTML and XML, along with optional style sheets.

Its predecessor is the OEB (Open eBook) standard.

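- Definition of ePub

".epub" is the file extension of an XML format for digital books and publications, one expected to come into wide use.

".epub" is composed of three open standards: OPS (Open Publication Structure), OPF (Open Packaging Format), and OCF (Open Container Format).

".epub" lets publishers deliver a digital publication to customers as a single file, and offers customers interoperability across hardware and software for unencrypted digital books.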

- Uses of ePub

ePub serves as both a source file format and an end-user format. To make this possible, the files are gathered into a single container for easy distribution. The container is typically a zip file.

- ePub reader software

• Adobe Digital Editions - Reads Adobe DRM ePUB ebooks as well as non-DRM ones.
• AZARDI - Reads ePUB and verifies conformance.
• Bookworm
• Calibre - Windows, MacOS X and Linux
• FBReader - Windows and more
• OpenBerg Lector - Browser plugin
• Stanza - Windows, MacOS X, and iPhone

- ePub hardware

• Sony Reader PRS-700 - Sony Readers also support Adobe ADEPT (DRM) technology (EPUB/PDF)
• Sony Reader PRS-505 - Among other things, the V1.1 (Jul'08) firmware update added ePub support to the 505.
• Hanlin V3 and all its clones (Bebook, EZ Reader, etc.) - No DRM. A new firmware release (August 2009) uses ADE software that can support DRM.
• JetBook - No DRM
• Bookeen Cybook Opus - Full support using ADE.

- ePub Creation software

• Adobe InDesign
• Atlantis Word Processor - can convert any TXT/RTF/DOC/DOCX document to ePub.
• Calibre - Click the "hammer" icon next to the search bar and set the output format to EPUB.
• CSS check tool
• DAISY Pipeline - creation and checking tools
• eCub - a simple to use EPUB and MobiPocket ebook creator
• ePUB check tool
• ePUB Tools - A collection of open source tools used to create and check ePUB
• EScape - an add-on for Open Office (ODT), not for commercial use.
• Convert uploads to ePUB at Feedbooks.com. You don't have to hit the "publish" button. Just edit your text, and download the ePub preview.
• OpenBerg Rector
• Python converter posted by MishaS - His latest version of oeb2epub is at his site
• Sigil is an editor (word processor) for directly changing or creating ePUB files.
• Stanza - converter for PC and Mac, typically strips formatting prior to conversion.
• Web2FB2 is a web site that will convert a URL to FB2 and ePUB format.

- Specifications
http://www.idpf.org/specs.htm contains the specifications for this format. In particular, check the version 2.0 OPS and OPF specs and the version 1.0 OCF spec. The informational documents are also quite useful for understanding the intent and content.
- OCF
A typical OCF is a zip file that might look like:
mimetype
META-INF/
    container.xml
    [manifest.xml]
    [metadata.xml]
    [signatures.xml]
    [encryption.xml]
    [rights.xml]
OEBPS/
    Great Expectations.opf
    cover.html
    chapters/
        chapter01.html
        chapter02.html
        … other HTML files for the remaining chapters …
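A Reading System locates the package file by reading META-INF/container.xml. The following is a minimal Python sketch of that lookup, assuming the standard OCF container namespace. The helper name find_opf_path and the sample container.xml contents are illustrative assumptions, not taken from the text above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in OCF.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub_file):
    """Return the full-path of the first rootfile declared in container.xml.

    find_opf_path is a hypothetical helper name used for illustration.
    """
    with zipfile.ZipFile(epub_file) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml declares no rootfile")
    return rootfile.get("full-path")


# Illustrative container.xml pointing at the OPF file from the layout above.
SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Great Expectations.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Build a tiny in-memory archive so the sketch runs without a real ebook.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("META-INF/container.xml", SAMPLE_CONTAINER)
    zf.writestr("OEBPS/Great Expectations.opf", "<package/>")

opf_path = find_opf_path(buffer)
print(opf_path)
```

The same function should work on a path to an .epub file on disk, since zipfile.ZipFile accepts either a filename or a file-like object.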
- mimetype
The first file in the ZIP Container MUST be a file by the ASCII name of ‘mimetype’ which holds the MIME type for the ZIP Container (i.e., “application/epub+zip” as a 20 character ASCII string; no padding, CR/LF, white-space or case change). The file MUST NOT be compressed nor encrypted and there MUST NOT be an extra field in its ZIP header.
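The mimetype rule can be sketched with Python's zipfile module. This is a rough illustration under stated assumptions: write_mimetype_first and check_mimetype are hypothetical helper names, and the check inspects the archive's central directory entries, which only approximates the rule about the ZIP header of the file itself.

```python
import io
import zipfile

EPUB_MIMETYPE = "application/epub+zip"  # 20 ASCII characters


def write_mimetype_first(zf):
    """Add the mimetype entry uncompressed; call this before adding anything else."""
    info = zipfile.ZipInfo("mimetype")
    info.compress_type = zipfile.ZIP_STORED  # must not be compressed
    zf.writestr(info, EPUB_MIMETYPE)  # no trailing newline or padding


def check_mimetype(epub_file):
    """Return True if the first entry looks like a valid mimetype entry.

    Approximation: entry order and the extra field are read from the
    central directory, not from the local file header.
    """
    with zipfile.ZipFile(epub_file) as zf:
        entries = zf.infolist()
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and first.extra == b""
            and first.flag_bits & 0x1 == 0  # bit 0 set would mean encrypted
            and zf.read(first) == EPUB_MIMETYPE.encode("ascii")
        )


# A container that follows the rule.
good = io.BytesIO()
with zipfile.ZipFile(good, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    write_mimetype_first(zf)
    zf.writestr("META-INF/container.xml", "<container/>")

# A container that breaks it: mimetype is not first, and it is deflated.
bad = io.BytesIO()
with zipfile.ZipFile(bad, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("META-INF/container.xml", "<container/>")
    zf.writestr("mimetype", EPUB_MIMETYPE)

print(check_mimetype(good), check_mimetype(bad))
```

Generic zip tools often compress every entry and sort names alphabetically, which is why the mimetype entry is usually written separately and first.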
- OPF
The Open Packaging Format (OPF) Specification defines the mechanism by which the various components of an OPS publication are tied together, and provides additional structure and semantics to the electronic publication.
Specifically, OPF:
• Describes and references all components of the electronic publication (e.g. markup files, images, navigation structures).
• Provides publication-level metadata, which should include Dublin Core formatted data.
• Specifies the linear reading order of the publication.
• Provides fallback information to use when unsupported extensions to OPS are employed.
• Provides a mechanism to specify a declarative table of contents (the NCX).
An example:
<package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Alice in Wonderland</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="ISBN">
      123456789X
    </dc:identifier>
    <dc:creator opf:role="aut">Lewis Carroll</dc:creator>
  </metadata>
  <manifest>
    <item id="intro" href="introduction.html"
          media-type="application/xhtml+xml" />
    <item id="c1" href="chapter-1.html"
          media-type="application/xhtml+xml" />
    <item id="c2" href="chapter-2.html"
          media-type="application/xhtml+xml" />
    <item id="toc" href="contents.xml"
          media-type="application/xhtml+xml" />
    <item id="oview" href="arch.png"
          media-type="image/png" />
    <!-- NCX item referenced by the toc attribute of the spine -->
    <item id="ncx" href="toc.ncx"
          media-type="application/x-dtbncx+xml" />
  </manifest>
  <spine toc="ncx">
    <itemref idref="intro" />
    <itemref idref="toc" />
    <itemref idref="c1" />
    <itemref idref="c2" />
    <itemref idref="oview" linear="no" />
  </spine>
</package>
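To show how the manifest and spine work together, here is a minimal Python sketch that resolves the linear reading order of a package. It uses a shortened copy of the example so that it runs on its own. The function name reading_order is a hypothetical helper, and skipping linear="no" items is one reasonable reading of "linear reading order" rather than a complete Reading System behavior.

```python
import xml.etree.ElementTree as ET

# Prefixes here are local to this script; only the namespace URIs matter.
OPF_NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened, well-formed copy of the OPF example (illustrative data).
SAMPLE_OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Alice in Wonderland</dc:title>
  </metadata>
  <manifest>
    <item id="intro" href="introduction.html" media-type="application/xhtml+xml"/>
    <item id="c1" href="chapter-1.html" media-type="application/xhtml+xml"/>
    <item id="oview" href="arch.png" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="intro"/>
    <itemref idref="c1"/>
    <itemref idref="oview" linear="no"/>
  </spine>
</package>"""


def reading_order(opf_text):
    """Return manifest hrefs in spine order, skipping linear="no" items."""
    root = ET.fromstring(opf_text)
    # Map each manifest id to its href.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    order = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        if ref.get("linear", "yes") == "no":
            continue  # auxiliary content, outside the main reading flow
        order.append(hrefs[ref.get("idref")])
    return order


title = ET.fromstring(SAMPLE_OPF).findtext(
    "opf:metadata/dc:title", namespaces=OPF_NS
)
order = reading_order(SAMPLE_OPF)
print(title, order)
```

The hrefs are relative to the location of the OPF file inside the container, so a real reader would still need to join them with the package directory.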
- OPS
The Open Publication Structure (OPS) Specification describes a standard for representing the content of electronic publications.
Specifically:
• The specification is intended to give content providers (e.g. publishers, authors, and others who have content to be displayed) and publication tool providers minimal and common guidelines that ensure fidelity, accuracy, accessibility, and adequate presentation of electronic content over various Reading Systems.
• The specification seeks to reflect established content format standards.
• The goal of this specification is to define a standard means of content description for use by purveyors of electronic books (publishers, agents, authors et al.), allowing such content to be provided to multiple Reading Systems and ensuring maximum presentational equivalence across Reading Systems.
- XHTML
A conforming OPS document must support the following XHTML constructions.
XHTML 1.1 Module Name | Elements (non-normative) | Notes
Structure | body, head, html, title | The default rendering for body is consistent with the CSS property page-break-before having been set to right (which behaves like always on one-page Reading Systems), but may be overridden by an appropriate style sheet declaration.
Text | abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var | The optional attribute cite may be used in blockquote, q, del and ins to provide a URI citation for the element contents. Reading Systems are not required to process or use the referenced URI resource, whether or not the resource is listed in the Manifest.
Hypertext | a | Reading Systems may use or render a URI referenced physical resource not listed in the Manifest (i.e., it is not a component of the Publication), but they are not required to do so.
List | dl, dt, dd, ol, ul, li |
Object | object, param | The object element is the preferred method for generic object inclusion. When adding objects whose data media type is not drawn from the OPS Core Media Type list or which reference an object implementation using the classid attribute, the object element must specify fallback information for the object, such as another object, an img element, or descriptive text.
Presentation | b, big, hr, i, small, sub, sup, tt |
Edit | del, ins |
Bidirectional Text | bdo |
Table | caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr |
Image | img | The inline element img should only be used to refer to images with OPS Core Media Types of GIF (http://www.w3.org/Graphics/GIF/spec-gif89a.txt), PNG (RFC 2083), JPG/JFIF (http://www.w3.org/Graphics/JPEG) or SVG (http://www.w3.org/TR/SVG11/). The required URI attribute, src, is used to reference the image resource, which must be listed in the Manifest. The required alt attribute should contain a brief and informative textual description of the image. This text may be used by Reading Systems as an alternative to, or in addition to, displaying the image. The text is also an acceptable fallback for an img with src referencing a non-OPS Core Media Type for which no viable fallback was found in the manifest.
Client-Side Image Map | area, map |
Meta-Information | meta |
Style Sheet | style | The type attribute of the style element is required and must be given the value of text/css or the deprecated text/x-oeb1-css.
Style Attribute (deprecated) | style attribute |
Link | link | The link element allows for the specification of various relationships with other documents. Reading Systems must recognize external style sheet references specified via the href attribute and the associated rel attribute (for the values rel="stylesheet" and rel="alternate stylesheet").
Base | base |
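The notes on the img element (a required src that is listed in the Manifest, and a required alt) lend themselves to an automated check. Below is a minimal Python sketch, assuming the content document is well-formed XHTML in the standard XHTML namespace. The function name check_images and the sample markup are illustrative, and the src comparison is a plain string match that ignores relative path resolution.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Illustrative content document: one conforming image, one problematic image.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <p><img src="arch.png" alt="Architecture overview"/></p>
    <p><img src="extra.gif"/></p>
  </body>
</html>"""


def check_images(xhtml_text, manifest_hrefs):
    """Return a list of problems found on img elements.

    manifest_hrefs is the set of href values taken from the OPF manifest.
    Simplification: src is compared as-is, without resolving relative paths.
    """
    problems = []
    root = ET.fromstring(xhtml_text)
    for img in root.iter(XHTML_NS + "img"):
        src = img.get("src")
        if src is None:
            problems.append("img without src")
            continue
        if img.get("alt") is None:
            problems.append(f"{src}: missing alt")
        if src not in manifest_hrefs:
            problems.append(f"{src}: not listed in manifest")
    return problems


problems = check_images(SAMPLE_XHTML, {"arch.png"})
print(problems)
```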

- Relationships
Relationship to NVDL
This specification uses the NVDL language (see http://standards.iso.org/ittf/PubliclyAvailableStandards/c038615_ISO_IEC_19757-4_2006(E).zip) as a means to unambiguously define the interaction between the various schemas used in this specification. NVDL allows for interaction and validation between various XML schema languages. See Appendix A for a normative NVDL definition of OPS.
This specification does not require the use of NVDL tools to validate OPS documents, although such tools are available and may be used for validation.
Relationship to XHTML and DTBook
This specification recognizes the importance of current software tools, legacy data, publication practices, and market conditions, and has therefore incorporated certain XHTML 1.1 Document Type Modules and DTBook as Preferred Vocabularies. This approach allows content providers to exploit current XHTML and DTBook content, tools, and expertise.
To minimize the implementation burden on Reading System implementers (who may be working with devices that have power and display constraints), the Preferred Vocabularies do not include all XHTML 1.1 elements and attributes. Further, the modules selected from the XHTML 1.1 specification were chosen to be consistent with current directions in XHTML.
Any construct deprecated in XHTML 1.1 is either deprecated or omitted from this specification; CSS-based equivalents are provided in most such cases. Style sheet constructs are also used for new presentational functionality beyond that provided in XHTML.
Relationship to CSS
This specification defines a style language based on CSS 2. (Note that the CSS 2.1 specification is currently still at "Working Draft" status.) The style sheet MIME type text/x-oeb1-css has been deprecated in favor of text/css.
Relationship to XML
OPS is based on XML because of its generality and simplicity, and because XML documents are likely to adapt well to future technologies and uses. XML also provides well-defined rules for the syntax of documents, which decreases the cost to implementers and reduces incompatibility across systems. Further, XML is extensible: it is not tied to any particular type of document or set of element types, it supports internationalization, and it encourages document markup that can represent a document’s internal parts more directly, making them amenable to automated formatting and other types of computer processing.
Reading Systems must be XML processors as defined in XML 1.1. All OPS Content Documents must be valid XML documents according to their respective schemas.
Relationship to XML Namespaces
Reading Systems must process XML namespaces according to the XML Namespaces Recommendation at http://www.w3.org/TR/xml-names11/. For example:
xmlns:ops="http://www.idpf.org/2007/ops"
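As a small illustration of namespace processing, the Python sketch below shows that the prefix is only a local alias and that the namespace URI is what identifies an element. The two fragments are made-up examples that bind different prefixes to the OPS namespace.

```python
import xml.etree.ElementTree as ET

OPS_NS = "http://www.idpf.org/2007/ops"

# Two fragments binding different prefixes to the same namespace URI.
a = ET.fromstring('<ops:switch xmlns:ops="http://www.idpf.org/2007/ops"/>')
b = ET.fromstring('<x:switch xmlns:x="http://www.idpf.org/2007/ops"/>')

# ElementTree expands both to the same {namespace-uri}local-name form.
print(a.tag)
same_element = a.tag == b.tag == "{%s}switch" % OPS_NS
print(same_element)
```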
- Tips
• It is possible to make an eBook that conforms to the standard by placing the entire book contents in one XHTML file, but performance will suffer. For best performance, a standard-size book should be divided into several files, because each file needs to be loaded into memory in full. This is usually accomplished by separating the files by chapter.
• Some mobile devices cannot handle large ePUB files. Generally this is caused by an XHTML file that is too large. If the ePUB can be unpacked, the large XHTML file can often be broken into multiple files.
• The ePub file format has proper support for a TOC through the use of TOC.NCX files. Not all reader applications support this currently. This is documented in the DTBook standard.
• Make sure all tags are complete (no dangling tags). htmltidy does a great job here.
• Get rid of as many tables as you can! A lot of these CHM-type files put the entire content of the page in one table, and that causes tons of problems.
• "Normal" tables tend to get truncated in the reader because they are too wide. Convert these tables to some intelligent lists with <hr/>'s around them.
• Play with the CSS to get the colors cleaned up. A lot of the "color" gets translated to light grey and looks bad. It is best to change everything you can to black.
• <pre> blocks of code can go off the page as well. Use the CSS to shrink their font size and, at worst, reformat the blocks to keep them within a 70-character width at 6pt.
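The last tip can be checked automatically before conversion. Here is a minimal Python sketch that reports lines inside pre blocks that exceed 70 characters. The class name PreLineChecker and the helper wide_pre_lines are hypothetical names, and the sketch relies on the lenient html.parser module, so it reports wide lines but does not validate the markup.

```python
from html.parser import HTMLParser

MAX_WIDTH = 70  # character width suggested in the tip above


class PreLineChecker(HTMLParser):
    """Collect lines inside <pre> blocks that are wider than MAX_WIDTH."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth of open <pre> elements
        self.buffer = []    # text collected for the current <pre> block
        self.too_wide = []  # offending lines, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Block finished: measure each of its lines.
                for line in "".join(self.buffer).splitlines():
                    if len(line) > MAX_WIDTH:
                        self.too_wide.append(line)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def wide_pre_lines(xhtml_text):
    checker = PreLineChecker()
    checker.feed(xhtml_text)
    checker.close()
    return checker.too_wide


# Illustrative input: one short line and one 80-character line inside <pre>,
# plus a long paragraph that should be ignored because it can reflow.
sample = (
    "<html><body><pre>short line\n" + "x" * 80 + "\n</pre>"
    "<p>" + "y" * 90 + "</p></body></html>"
)
wide = wide_pre_lines(sample)
print(len(wide))
```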


졸리운 곰의 정보기술 여행 [김성준]
